Pandas Core Structures

Python Data Manipulation 

In the previous chapter, we understood why Pandas is one of the most important libraries for data handling in Python. Now it’s time to understand the heart of Pandas – its core data structures.

Pandas mainly works with two powerful structures:

  • Series – for one-dimensional data
  • DataFrame – for two-dimensional, table-like data

Along with these, we must also understand indexing basics, because without proper indexing, working with data becomes confusing.


1. Pandas Series

A Series is the simplest Pandas data structure. You can think of it as an enhanced Python list that comes with labels (indexes).

Unlike a normal list, a Series:

  • Stores data with an index
  • Can handle missing values
  • Works efficiently with numerical operations
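The three points above can be seen in a few lines. This is a minimal sketch with made-up values; the variable names (prices, doubled, missing, total) are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; np.nan marks a missing entry.
prices = pd.Series([10.0, np.nan, 30.0])

doubled = prices * 2       # arithmetic applies to every element, no loop needed
missing = prices.isna()    # boolean Series: True where a value is missing
total = prices.sum()       # missing values are skipped by default

print(doubled)
print(missing)
print(total)               # 40.0
```

A plain Python list would need an explicit loop for the multiplication, and a missing entry would have to be handled by hand.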

Creating a Series

To create a Series, we use pd.Series().

import pandas as pd

data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)

Output:

0    10
1    20
2    30
3    40
dtype: int64

Here, the numbers on the left (0, 1, 2, 3) are the index labels, which Pandas created automatically. The values are on the right, and dtype: int64 is the data type of those values.

Custom Index in Series

We can also define our own meaningful index values.

s = pd.Series([100, 200, 300], index=["Jan", "Feb", "Mar"])
print(s)

This makes data more readable and realistic, especially in real-world datasets.
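To see what the custom index actually holds, the labels and values can be inspected separately. A short sketch reusing the monthly Series from above; the names labels, values, has_feb, and has_200 are illustrative only.

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=["Jan", "Feb", "Mar"])

labels = s.index.tolist()   # the custom labels
values = s.tolist()         # the stored values
has_feb = "Feb" in s        # `in` checks the index labels...
has_200 = 200 in s          # ...not the values

print(labels)    # ['Jan', 'Feb', 'Mar']
print(values)    # [100, 200, 300]
print(has_feb)   # True
print(has_200)   # False
```

Note that membership tests on a Series look at labels, much like `in` on a dictionary looks at keys.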

Accessing Series Data

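You can access values using:

  • Index label
  • Index position

print(s["Jan"])
print(s.iloc[1])

Positional access goes through iloc here because recent Pandas versions warn about, or reject, a bare s[1] on a Series whose index holds labels rather than integers.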

This flexibility is one reason Pandas is preferred in data analysis.
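The explicit accessors make the intent unambiguous: loc for labels, iloc for positions. The sketch below reuses the monthly Series and also shows one difference worth remembering when slicing; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=["Jan", "Feb", "Mar"])

by_label = s.loc["Jan"]             # value stored under the label "Jan"
by_position = s.iloc[1]             # second value, counting from zero

label_slice = s.loc["Jan":"Feb"]    # label slices include both endpoints
position_slice = s.iloc[0:2]        # position slices exclude the stop

print(by_label)                     # 100
print(by_position)                  # 200
print(label_slice.tolist())         # [100, 200]
print(position_slice.tolist())      # [100, 200]
```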


2. Pandas DataFrame

In practical data analysis work, the DataFrame is the structure you will use most often. It represents data as rows and columns, much like an Excel sheet or a SQL table.

In simple words:

  • Each column is a Series
  • All columns together form a DataFrame

Creating a DataFrame

The most common way is using a dictionary.

data = {
    "Name": ["Amit", "Riya", "Suresh"],
    "Age": [21, 22, 23],
    "Marks": [85, 90, 88]
}

df = pd.DataFrame(data)
print(df)

This creates a clean, structured table that is easy to analyze.
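A quick way to confirm what was built is to inspect the table's size, column names, and column types. This is a small sketch using the same student data; n_rows, n_cols, and column_names are illustrative names.

```python
import pandas as pd

data = {
    "Name": ["Amit", "Riya", "Suresh"],
    "Age": [21, 22, 23],
    "Marks": [85, 90, 88],
}
df = pd.DataFrame(data)

n_rows, n_cols = df.shape            # (rows, columns)
column_names = df.columns.tolist()   # dictionary keys become column names

print(n_rows, n_cols)                # 3 3
print(column_names)                  # ['Name', 'Age', 'Marks']
print(df.dtypes)                     # one data type per column
```

Each dictionary key becomes a column, and each list supplies that column's values, so all lists must have the same length.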

Understanding Rows and Columns

  • Rows represent individual records
  • Columns represent features or attributes

For example, here:

  • Each student is one row
  • Name, Age, and Marks are columns
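The row and column view can be checked directly: pulling out a single column or a single row gives back a Series in both cases. A minimal sketch with the student table; iloc, used here to pick a row by position, is covered in the indexing section.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Amit", "Riya", "Suresh"],
    "Age": [21, 22, 23],
    "Marks": [85, 90, 88],
})

marks = df["Marks"]      # one column: a Series indexed by row label
first_row = df.iloc[0]   # one record: a Series indexed by column name

print(type(marks).__name__)       # Series
print(marks.index.tolist())       # [0, 1, 2]
print(first_row.index.tolist())   # ['Name', 'Age', 'Marks']
print(first_row["Name"])          # Amit
```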

Accessing Columns

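print(df["Name"])
print(df.Age)

Both methods work, but square brackets are safer: dot access fails for column names that contain spaces or that match an existing DataFrame attribute or method, such as count.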
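The difference shows up as soon as a column name is awkward. The column names below ("count" and "Total Marks") are hypothetical, chosen only to illustrate where dot access breaks down.

```python
import pandas as pd

# Hypothetical column names: one clashes with a DataFrame method,
# the other contains a space.
df = pd.DataFrame({"count": [3, 5], "Total Marks": [85, 90]})

counts = df["count"]         # the column, as a Series
method = df.count            # the DataFrame.count method, not the column
total = df["Total Marks"]    # a name with a space only works with brackets

print(type(counts).__name__)   # Series
print(callable(method))        # True
print(total.tolist())          # [85, 90]
```

Square brackets always refer to the column, so they behave the same no matter what the column is called.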


3. Indexing Basics in Pandas

Indexing is how we select specific data from a Series or DataFrame. Pandas provides powerful and flexible indexing methods.

Default Index

When no index is specified, Pandas automatically numbers the rows, beginning with zero.

print(df.index)
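For a table built without an explicit index, the result is a RangeIndex. A brief sketch, assuming a three-row table like the one used in this chapter:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Amit", "Riya", "Suresh"]})

print(df.index)          # RangeIndex(start=0, stop=3, step=1)
print(list(df.index))    # [0, 1, 2]
```

The stop value is exclusive, as in Python's built-in range, so three rows get the labels 0, 1, and 2.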

Row Selection Using loc

loc selects by index label.

print(df.loc[0])

This returns the row whose label is 0. With the default index, that is also the first row.

Row Selection Using iloc

iloc works with integer positions.

print(df.iloc[1])

It is useful when you want position-based access, similar to lists.
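With the default index, loc and iloc look interchangeable because labels and positions coincide. The difference appears once the index is customized. The sketch below assumes hypothetical roll numbers (101, 102, 103) as row labels.

```python
import pandas as pd

# Hypothetical roll numbers used as row labels instead of 0, 1, 2.
df = pd.DataFrame(
    {"Name": ["Amit", "Riya", "Suresh"], "Marks": [85, 90, 88]},
    index=[101, 102, 103],
)

by_label = df.loc[101]      # row whose label is 101
by_position = df.iloc[0]    # first row, whatever its label is

try:
    df.loc[0]               # no row carries the label 0 here
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(by_label["Name"])     # Amit
print(by_position["Name"])  # Amit
print(label_zero_found)     # False
```

Here df.loc[0] fails because loc never falls back to positions, while df.iloc[0] keeps working regardless of the labels.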

Selecting Specific Rows and Columns

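print(df.loc[0, "Name"])
print(df.iloc[0, 2])

The first line returns "Amit" (row label 0, column Name). The second returns 85 (first row, third column).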

This level of control is what makes Pandas extremely powerful for data analysis.
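The same row-and-column pattern extends from single cells to whole blocks of a table. A minimal sketch with the student data; by_label and by_position are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Amit", "Riya", "Suresh"],
    "Age": [21, 22, 23],
    "Marks": [85, 90, 88],
})

# Labels 0 through 1 (both included), two named columns.
by_label = df.loc[0:1, ["Name", "Marks"]]

# Rows 0-1 and columns 0-1 by position (stop values excluded).
by_position = df.iloc[0:2, 0:2]

print(by_label)
print(by_position)
```

Both selections return a two-row, two-column DataFrame: the first holds Name and Marks, the second holds Name and Age.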


Why These Core Structures Matter

Understanding Series, DataFrame, and indexing is non-negotiable if you want to:

  • Work with CSV or Excel files
  • Clean and filter datasets
  • Build data-driven applications
  • Prepare data for Machine Learning

Without mastering these basics, advanced Pandas operations will feel confusing.


Chapter Summary

  • Series handles one-dimensional labeled data
  • DataFrame handles two-dimensional tabular data
  • Indexing allows precise data selection

Multiple Choice Questions (MCQs)

This section helps you check how well you understand the core structures of Pandas. Try answering each question on your own before viewing the correct answer.

1. What is the main purpose of a Pandas Series?

  • A. To store multi-dimensional data
  • B. To represent one-dimensional labeled data
  • C. To create charts
  • D. To replace Python dictionaries

Correct Answer: B

2. Which function is used to create a Series in Pandas?

  • A. pd.Data()
  • B. pd.Table()
  • C. pd.Series()
  • D. pd.Frame()

Correct Answer: C

3. In a Pandas Series, what does the index represent?

  • A. Memory location
  • B. Column header
  • C. Label for each value
  • D. Data type

Correct Answer: C

4. What happens when no index is provided while creating a Series?

  • A. An error is raised
  • B. Data is skipped
  • C. Pandas creates numeric labels automatically
  • D. Index remains empty

Correct Answer: C

5. A Pandas DataFrame is best described as:

  • A. A single list
  • B. A one-dimensional structure
  • C. A table with rows and columns
  • D. A Python tuple

Correct Answer: C

6. Which data structure is most commonly used to create a DataFrame?

  • A. List
  • B. Tuple
  • C. Dictionary
  • D. Set

Correct Answer: C

7. Each column in a DataFrame is internally treated as:

  • A. List
  • B. Series
  • C. Tuple
  • D. Matrix

Correct Answer: B

8. Which method is used for label-based row selection?

  • A. iloc
  • B. select
  • C. loc
  • D. index

Correct Answer: C

9. Which method is used for position-based indexing?

  • A. loc
  • B. iloc
  • C. at
  • D. where

Correct Answer: B

10. What does df.iloc[1] return?

  • A. First column
  • B. Row with label 1
  • C. Second row by position
  • D. Second column

Correct Answer: C

11. Which is the safest way to access a DataFrame column?

  • A. df.columnName
  • B. df->columnName
  • C. df["columnName"]
  • D. df(columnName)

Correct Answer: C

12. Why is indexing important in Pandas?

  • A. It increases execution time
  • B. It formats the output
  • C. It allows precise data selection
  • D. It converts file types

Correct Answer: C

13. Which Pandas object is best suited for Excel-like data?

  • A. Series
  • B. List
  • C. DataFrame
  • D. Dictionary

Correct Answer: C

14. What type of data can a DataFrame handle?

  • A. Only numbers
  • B. Only text
  • C. Mixed data types
  • D. Only boolean values

Correct Answer: C

15. Which statement is TRUE?

  • A. Series can have multiple columns
  • B. DataFrames do not support indexing
  • C. DataFrame works with rows and columns
  • D. Indexing is optional and unused

Correct Answer: C


Regular practice of these questions will strengthen your understanding of Pandas fundamentals. In the next chapter, we will move into data loading, filtering, and file operations on real datasets.

Keep practicing small examples — Pandas becomes easier only through hands-on use.
