Pandas Core Structures
Python Data Manipulation
In the previous chapter, we saw why Pandas is one of the most important libraries for data handling in Python. Now it's time to look at the heart of Pandas: its core data structures.
Pandas mainly works with two powerful structures:
- Series – for one-dimensional data
- DataFrame – for two-dimensional, table-like data
Along with these, we must also understand indexing basics, because without proper indexing, working with data becomes confusing.
1. Pandas Series
A Series is the simplest Pandas data structure. You can think of it as an enhanced Python list that comes with labels (indexes).
Unlike a normal list, a Series:
- Stores data with an index
- Can handle missing values
- Supports fast numerical operations on all values at once, without explicit loops
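The last two points are easier to see with a small example. The sketch below uses pd.Series(), which is covered in the next subsection, and a made-up set of marks. None is used here to stand for a missing value, which pandas stores as NaN. Because of that, the values become floats.

```python
import pandas as pd

# Illustrative marks; None marks a missing value (stored as NaN)
marks = pd.Series([85, None, 90])

# Missing values are detected explicitly...
print(marks.isna().tolist())   # [False, True, False]

# ...and skipped by aggregations such as mean()
print(marks.mean())            # 87.5, the mean of 85 and 90

# Arithmetic is applied to every element at once; NaN stays NaN
print((marks + 5).tolist())    # [90.0, nan, 95.0]
```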
Creating a Series
To create a Series, we use pd.Series().
import pandas as pd
data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)
Output:
0    10
1    20
2    30
3    40
dtype: int64
Here, the numbers on the left (0, 1, 2, 3) are indexes, automatically created by Pandas.
Custom Index in Series
We can also define our own meaningful index values.
s = pd.Series([100, 200, 300], index=["Jan", "Feb", "Mar"])
print(s)
This makes data more readable and realistic, especially in real-world datasets.
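Here is a short sketch of what the custom index looks like once it is created. The month labels are the same illustrative values used above.

```python
import pandas as pd

# Custom labels replace the default 0, 1, 2 index
s = pd.Series([100, 200, 300], index=["Jan", "Feb", "Mar"])

# Prints each month label next to its value, followed by the dtype
print(s)

# The labels themselves live in s.index
print(list(s.index))   # ['Jan', 'Feb', 'Mar']
```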
Accessing Series Data
You can access values using:
- Index label
- Index position
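print(s["Jan"])    # by label: 100
print(s.iloc[1])   # by position: 200
For positions, use .iloc. Recent pandas versions warn when plain brackets are given an integer position on a label-based index like this one. Being able to use both labels and positions is one reason Pandas is preferred in data analysis.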
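The same two access styles also work for several values at once. The sketch below reuses the illustrative monthly Series. It shows one difference worth remembering: label slices include the end label, but position slices exclude the end position, just like Python lists.

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=["Jan", "Feb", "Mar"])

# A list of labels returns a smaller Series
first_and_last = s.loc[["Jan", "Mar"]]
print(first_and_last.tolist())       # [100, 300]

# A single position (positions start at 0)
print(s.iloc[1])                     # 200

# Label slice: the end label "Feb" IS included
print(s.loc["Jan":"Feb"].tolist())   # [100, 200]

# Position slice: the end position 1 is NOT included
print(s.iloc[0:1].tolist())          # [100]
```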
2. Pandas DataFrame
A DataFrame is the most commonly used Pandas structure. It represents data in rows and columns, similar to an Excel sheet or an SQL table.
In simple words:
- Each column is a Series
- All columns together form a DataFrame
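You can check the "each column is a Series" idea directly. The small two-column table below is illustrative. When you select one column, you get back an ordinary Series that shares the DataFrame's row index.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Amit", "Riya", "Suresh"],
    "Age": [21, 22, 23],
})

# Selecting a single column returns a Series
age = df["Age"]
print(isinstance(age, pd.Series))     # True

# The column's index is the DataFrame's row index
print(age.index.equals(df.index))     # True
```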
Creating a DataFrame
The most common way is to pass a dictionary: each key becomes a column name, and each list becomes that column's values.
data = {
"Name": ["Amit", "Riya", "Suresh"],
"Age": [21, 22, 23],
"Marks": [85, 90, 88]
}
df = pd.DataFrame(data)
print(df)
This creates a clean, structured table that is easy to analyze.
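A DataFrame can also hold different data types in different columns. The sketch below builds the same student table and inspects the type of each column. The name pandas prints for the text column's dtype varies between versions, so the checks use helper functions from pd.api.types instead of comparing dtype names.

```python
import pandas as pd

data = {
    "Name": ["Amit", "Riya", "Suresh"],
    "Age": [21, 22, 23],
    "Marks": [85, 90, 88],
}
df = pd.DataFrame(data)

# One dtype per column
print(df.dtypes)

# Type checks that do not depend on the exact dtype name
print(pd.api.types.is_string_dtype(df["Name"]))   # True
print(pd.api.types.is_integer_dtype(df["Age"]))   # True
```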
Understanding Rows and Columns
- Rows represent individual records
- Columns represent features or attributes
For example, here:
- Each student is one row
- Name, Age, and Marks are columns
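The number of rows and columns can be read from the DataFrame itself. This sketch uses the same student data as above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Amit", "Riya", "Suresh"],
    "Age": [21, 22, 23],
    "Marks": [85, 90, 88],
})

# (number of rows, number of columns)
print(df.shape)            # (3, 3)

# One row per student
print(len(df))             # 3

# Column names (the attributes)
print(list(df.columns))    # ['Name', 'Age', 'Marks']
```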
Accessing Columns
print(df["Name"])
print(df.Age)
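Both methods work here, but square brackets are safer. Attribute access (df.Age) fails for column names that contain spaces or that clash with existing DataFrame methods, while df["..."] works for any column name.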
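Two pitfalls of attribute access are shown below. The column names "Full Name" and "count" are hypothetical and were chosen only to trigger these problems. A space makes df.Full Name invalid syntax. A column called "count" is hidden behind the built-in DataFrame.count() method.

```python
import pandas as pd

# Hypothetical column names chosen to show the pitfalls
df = pd.DataFrame({
    "Full Name": ["Amit", "Riya"],
    "count": [3, 5],
})

# A name with a space can only be reached with brackets
print(df["Full Name"].tolist())   # ['Amit', 'Riya']

# df.count refers to the DataFrame.count() method, not the column
print(callable(df.count))         # True

# Brackets always return the column
print(df["count"].tolist())       # [3, 5]
```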
3. Indexing Basics in Pandas
Indexing is how we select specific data from a Series or DataFrame. Pandas provides powerful and flexible indexing methods.
Default Index
When no index is specified, Pandas automatically numbers the rows, beginning with zero.
print(df.index)
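For a DataFrame created without an explicit index, the default index is a RangeIndex: a compact representation of the numbers 0, 1, 2, and so on. The sketch below uses a one-column table for brevity.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Amit", "Riya", "Suresh"]})

# The automatic index is a RangeIndex
print(df.index)          # RangeIndex(start=0, stop=3, step=1)

# Its labels as a plain list
print(list(df.index))    # [0, 1, 2]
```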
Row Selection Using loc
loc selects rows by their index label.
print(df.loc[0])
This returns the entire row whose label is 0. With the default index, that label happens to belong to the first row.
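The label-based nature of loc is clearer when the labels are not numbers. This sketch uses set_index() to make the Name column the row index, so loc looks rows up by student name instead.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Amit", "Riya", "Suresh"],
    "Age": [21, 22, 23],
    "Marks": [85, 90, 88],
})

# Use the Name column as row labels instead of 0, 1, 2
by_name = df.set_index("Name")

# The whole row labelled "Riya" (Age and Marks)
print(by_name.loc["Riya"])

# A single value: row "Riya", column "Marks"
print(by_name.loc["Riya", "Marks"])   # 90
```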
Row Selection Using iloc
iloc works with integer positions.
print(df.iloc[1])
This returns the second row, because positions start at 0, just as in Python lists. Use iloc when you care about a row's position rather than its label.
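With the default index, loc[0] and iloc[0] return the same row, which can hide the difference between them. This sketch sorts the students by marks, which reorders the rows while keeping their original labels. After sorting, the two accessors pick different rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Amit", "Riya", "Suresh"],
    "Age": [21, 22, 23],
    "Marks": [85, 90, 88],
})

# Highest marks first; each row keeps its original label
ranked = df.sort_values("Marks", ascending=False)
print(list(ranked.index))          # [1, 2, 0]

# iloc[0]: the row in first position, i.e. the top scorer
print(ranked.iloc[0]["Name"])      # Riya

# loc[0]: the row whose label is 0, wherever it now sits
print(ranked.loc[0]["Name"])       # Amit
```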
Selecting Specific Rows and Columns
print(df.loc[0, "Name"])   # row label 0, column "Name" -> Amit
print(df.iloc[0, 2])       # row position 0, column position 2 ("Marks") -> 85
This level of control is what makes Pandas extremely powerful for data analysis.
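Both accessors also accept lists and slices, so you can select a whole block of rows and columns at once. The sketch below uses the student table again. As with a Series, a loc slice includes its end label, while an iloc slice excludes its end position.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Amit", "Riya", "Suresh"],
    "Age": [21, 22, 23],
    "Marks": [85, 90, 88],
})

# Rows labelled 0 and 2, only the Name and Marks columns
subset = df.loc[[0, 2], ["Name", "Marks"]]
print(subset)

# loc slice includes label 1; iloc slice stops before position 1
print(len(df.loc[0:1]))    # 2
print(len(df.iloc[0:1]))   # 1

# All rows, columns from position 1 onward
print(list(df.iloc[:, 1:].columns))   # ['Age', 'Marks']
```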
Why These Core Structures Matter
Understanding Series, DataFrame, and indexing is non-negotiable if you want to:
- Work with CSV or Excel files
- Clean and filter datasets
- Build data-driven applications
- Prepare data for Machine Learning
Without mastering these basics, advanced Pandas operations will feel confusing.
Chapter Summary
- Series handles one-dimensional labeled data
- DataFrame handles two-dimensional tabular data
- Indexing allows precise data selection
Multiple Choice Questions (MCQs)
This section helps you check how well you understand the core structures of Pandas. Try answering each question on your own before viewing the correct answer.
1. What is the main purpose of a Pandas Series?
- A. To store multi-dimensional data
- B. To represent one-dimensional labeled data
- C. To create charts
- D. To replace Python dictionaries
Correct Answer: B
2. Which function is used to create a Series in Pandas?
- A. pd.Data()
- B. pd.Table()
- C. pd.Series()
- D. pd.Frame()
Correct Answer: C
3. In a Pandas Series, what does the index represent?
- A. Memory location
- B. Column header
- C. Label for each value
- D. Data type
Correct Answer: C
4. What happens when no index is provided while creating a Series?
- A. An error is raised
- B. Data is skipped
- C. Pandas creates numeric labels automatically
- D. Index remains empty
Correct Answer: C
5. A Pandas DataFrame is best described as:
- A. A single list
- B. A one-dimensional structure
- C. A table with rows and columns
- D. A Python tuple
Correct Answer: C
6. Which data structure is most commonly used to create a DataFrame?
- A. List
- B. Tuple
- C. Dictionary
- D. Set
Correct Answer: C
7. Each column in a DataFrame is internally treated as:
- A. List
- B. Series
- C. Tuple
- D. Matrix
Correct Answer: B
8. Which method is used for label-based row selection?
- A. iloc
- B. select
- C. loc
- D. index
Correct Answer: C
9. Which method is used for position-based indexing?
- A. loc
- B. iloc
- C. at
- D. where
Correct Answer: B
10. What does df.iloc[1] return?
- A. First column
- B. Row with label 1
- C. Second row by position
- D. Second column
Correct Answer: C
11. Which is the safest way to access a DataFrame column?
- A. df.columnName
- B. df->columnName
- C. df["columnName"]
- D. df(columnName)
Correct Answer: C
12. Why is indexing important in Pandas?
- A. It increases execution time
- B. It formats the output
- C. It allows precise data selection
- D. It converts file types
Correct Answer: C
13. Which Pandas object is best suited for Excel-like data?
- A. Series
- B. List
- C. DataFrame
- D. Dictionary
Correct Answer: C
14. What type of data can a DataFrame handle?
- A. Only numbers
- B. Only text
- C. Mixed data types
- D. Only boolean values
Correct Answer: C
15. Which statement is TRUE?
- A. Series can have multiple columns
- B. DataFrames do not support indexing
- C. DataFrame works with rows and columns
- D. Indexing is optional and unused
Correct Answer: C
Regular practice of these questions will strengthen your understanding of Pandas fundamentals. In the next chapter, we will work with real datasets: loading data from files, filtering it, and performing real-world operations using Pandas.
Keep practicing small examples — Pandas becomes easier only through hands-on use.
