Learn Pandas Series and DataFrames

Pandas is one of the most powerful and widely used Python libraries for data analysis and data manipulation. It provides easy-to-use data structures and functions needed to work efficiently with structured data. The two most important data structures in Pandas are the Series and the DataFrame.

1. Introduction to Pandas

Pandas is built on top of NumPy and is designed to handle tabular data, time series, and heterogeneous datasets efficiently.

Key Features:

- Easy handling of missing data
- Powerful grouping and aggregation
- Data alignment and indexing
- Integration with other libraries like Matplotlib and Scikit-learn

2. Understanding Pandas Series

A Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, etc.).
Creating a Series
import pandas as pd

data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)
Output:
0    10
1    20
2    30
3    40
dtype: int64
With Custom Index
import pandas as pd

data = [10, 20, 30, 40]
s = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(s)
Output:
a    10
b    20
c    30
d    40
dtype: int64
Key Characteristics of Series:
- One-dimensional
- Contains values and index
- Supports vectorized operations
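As a minimal sketch of these characteristics, the snippet below pulls the index and the values out of a Series separately and applies one vectorized comparison. The variable names (labels, values, is_large) are illustrative only.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# The two components of a Series: its index labels and its values
labels = list(s.index)
values = s.to_numpy()  # the underlying data as a NumPy array

# A vectorized comparison is applied to every element at once
# and returns a boolean Series that keeps the same index
is_large = s > 20

print(labels)
print(values)
print(is_large)
```

The comparison needs no explicit loop; the result has one boolean per label.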
Accessing Data
import pandas as pd

data = [10, 20, 30, 40]
s = pd.Series(data, index=['a', 'b', 'c', 'd'])

print(s['a'])  # Access by label
print(s.iloc[0])  # Access by position
Output:
10
10
Operations on Series
import pandas as pd

data = [10, 20, 30, 40]
s = pd.Series(data, index=['a', 'b', 'c', 'd'])

add = s + 10
print(add)

mul = s * 2
print(mul)
Output:
a    20
b    30
c    40
d    50
dtype: int64
a    20
b    40
c    60
d    80
dtype: int64
These operations are element-wise.
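Element-wise operations between two Series are matched by index label rather than by position, which is the data alignment feature listed earlier. The following sketch uses two hypothetical Series whose labels only partly overlap.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

# Values are paired by label: 'b' with 'b', 'c' with 'c'.
# Labels found in only one Series ('a' and 'd') produce NaN.
total = s1 + s2
print(total)
```

Because NaN is a floating-point value, the result is a float Series even though both inputs hold integers.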

3. Understanding Pandas DataFrame

A DataFrame is a two-dimensional labeled data structure with rows and columns, similar to a table or spreadsheet.
Creating a DataFrame
import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
print(df)
Output:
    Name  Age
0   John   25
1  Alice   30
2    Bob   22
Key Characteristics:
- Two-dimensional
- Columns can have different data types
- Labeled axes (rows and columns)
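These characteristics can be inspected directly on a DataFrame. A short sketch, reusing the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
})

print(df.shape)          # (number of rows, number of columns)
print(df.dtypes)         # one data type per column
print(list(df.columns))  # column labels
print(list(df.index))    # row labels
```

The exact dtype reported for the text column depends on the installed pandas version, so it is not shown here.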

4. Accessing Data in DataFrame

Column Selection
import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
col = df['Name']
print(col)
Output:
0     John
1    Alice
2      Bob
Name: Name, dtype: str
Multiple Columns
import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
mul_col = df[['Name', 'Age']]
print(mul_col)
Output:
    Name  Age
0   John   25
1  Alice   30
2    Bob   22
Row Selection
import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
by_label = df.loc[0]  # By index label
print(by_label)

by_row = df.iloc[0]  # By integer position
print(by_row)
print(by_row)
Output:
Name    John
Age       25
Name: 0, dtype: object
Name    John
Age       25
Name: 0, dtype: object
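With the default index, labels and positions are the same numbers, so loc and iloc return the same row. The difference shows up once the index holds other labels. The sketch below assumes hypothetical row labels 'x', 'y', 'z' purely for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['John', 'Alice', 'Bob'], 'Age': [25, 30, 22]},
    index=['x', 'y', 'z']  # hypothetical row labels
)

by_label = df.loc['y']     # the row whose label is 'y'
by_position = df.iloc[0]   # the first row, whatever its label

print(by_label['Name'])
print(by_position['Name'])
```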

5. DataFrame Operations

Adding a New Column
import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
df['Salary'] = [50000, 60000, 45000]  # Adding a new column

print(df)
Output:
    Name  Age  Salary
0   John   25   50000
1  Alice   30   60000
2    Bob   22   45000
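A new column can also be computed from an existing one. As a small sketch, the hypothetical MonthlySalary column below is derived element-wise from Salary:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Salary': [50000, 60000, 45000]
})

# Hypothetical derived column: the division is applied to every row
df['MonthlySalary'] = df['Salary'] / 12

print(df)
```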
Deleting a Column
import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
df.drop('Age', axis=1, inplace=True)  # Deleting a column

print(df)
Output:
    Name
0   John
1  Alice
2    Bob
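Without inplace=True, drop returns a new DataFrame and leaves the original untouched, which can be easier to reason about. A minimal sketch of that variant (the name without_age is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
})

# columns=... is an alternative to passing axis=1
without_age = df.drop(columns=['Age'])

print(without_age)
print(df)  # the original still has both columns
```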
Filtering Data
import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
filtered = df[df['Age'] > 25]  # Filtering

print(filtered)
Output:
    Name  Age
1  Alice   30
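Several conditions can be combined with & (and) and | (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A short sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
})

# Both conditions must hold
between = df[(df['Age'] > 22) & (df['Age'] < 30)]

# At least one condition must hold
either = df[(df['Age'] < 23) | (df['Name'] == 'Alice')]

print(between)
print(either)
```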

6. Handling Missing Data

Pandas represents missing values as NaN and provides methods to detect, remove, and replace them.
Sample DataFrame with Missing Values
import pandas as pd
import numpy as np

data = {
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, np.nan, 30, np.nan],
    'Marks': [80, 90, np.nan, 70]
}

df = pd.DataFrame(data)
print(df)
Output:
    Name   Age  Marks
0   Amit  25.0   80.0
1  Rahul   NaN   90.0
2   Sita  30.0    NaN
3  Geeta   NaN   70.0
1. df.isnull() → Detect Missing Values
import pandas as pd
import numpy as np

data = {
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, np.nan, 30, np.nan],
    'Marks': [80, 90, np.nan, 70]
}

df = pd.DataFrame(data)
print(df.isnull())
Output:
    Name    Age  Marks
0  False  False  False
1  False   True  False
2  False  False   True
3  False   True  False
2. df.dropna() → Remove Rows with Missing Values
import pandas as pd
import numpy as np

data = {
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, np.nan, 30, np.nan],
    'Marks': [80, 90, np.nan, 70]
}

df = pd.DataFrame(data)
print(df.dropna())
Output:
   Name   Age  Marks
0  Amit  25.0   80.0
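By default dropna removes a row if any column is missing, which here leaves a single row. The subset and how parameters narrow that rule. A sketch on the same sample data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, np.nan, 30, np.nan],
    'Marks': [80, 90, np.nan, 70]
})

# Only drop rows where Age is missing; a missing Marks value is tolerated
age_known = df.dropna(subset=['Age'])

# Only drop rows in which every column is missing (none in this data)
not_empty = df.dropna(how='all')

print(age_known)
print(len(not_empty))
```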
3. df.fillna(0) → Replace Missing Values
import pandas as pd
import numpy as np

data = {
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, np.nan, 30, np.nan],
    'Marks': [80, 90, np.nan, 70]
}

df = pd.DataFrame(data)
print(df.fillna(0))
Output:
    Name   Age  Marks
0   Amit  25.0   80.0
1  Rahul   0.0   90.0
2   Sita  30.0    0.0
3  Geeta   0.0   70.0
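Filling with 0 can distort later statistics, since an age of 0 is treated as a real value. One common alternative, sketched here as an illustration rather than a recommendation, is to fill each column with its own mean by passing a dictionary to fillna:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, np.nan, 30, np.nan],
    'Marks': [80, 90, np.nan, 70]
})

# mean() skips NaN by default, so each mean uses only the known values
filled = df.fillna({
    'Age': df['Age'].mean(),
    'Marks': df['Marks'].mean()
})

print(filled)
```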

7. Indexing and Selection

Setting Index
df.set_index('Name', inplace=True)
Resetting Index
df.reset_index(inplace=True)
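A complete sketch of both calls, using the non-inplace form so that each step returns a new DataFrame (the names indexed and restored are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
})

# Use the Name column as row labels
indexed = df.set_index('Name')
alice_age = indexed.loc['Alice', 'Age']  # look up a row by name
print(alice_age)

# Move Name back into a regular column and restore the default 0..n-1 index
restored = indexed.reset_index()
print(restored)
```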

8. Data Aggregation and Grouping

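Grouping splits the data into groups by a key column and applies a function to each group. Note that sum() on a text column concatenates the strings, as the Name column in the output below shows.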
import pandas as pd

data = {
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, 25, 30, 40],
    'Marks': [80, 90, 90, 70]
}

df = pd.DataFrame(data)
group_by_age_sum = df.groupby('Age').sum()

print(group_by_age_sum)
Output:
          Name  Marks
Age                  
25   AmitRahul    170
30        Sita     90
40       Geeta     70
Common aggregation functions:
- sum()
- mean()
- count()
- min(), max()
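Several of these functions can be applied in one pass with agg. The sketch below selects the numeric Marks column first, so the text column is left out of the aggregation:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, 25, 30, 40],
    'Marks': [80, 90, 90, 70]
})

# One row per Age group, one column per aggregation function
summary = df.groupby('Age')['Marks'].agg(['sum', 'mean', 'count', 'min', 'max'])

print(summary)
```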

9. Reading and Writing Data

Reading Files
df = pd.read_csv('data.csv')
df = pd.read_excel('data.xlsx')
Writing Files
df.to_csv('output.csv')
df.to_excel('output.xlsx')
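By default to_csv also writes the row index as an extra first column; passing index=False omits it. The round trip below is a self-contained sketch that writes to an in-memory buffer instead of a file, so it runs without any data.csv on disk.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice'], 'Age': [25, 30]})

# Write CSV text into an in-memory buffer, leaving out the row index
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back into a new DataFrame
loaded = pd.read_csv(io.StringIO(csv_text))
print(loaded)
```

The same index=False argument is accepted by to_excel.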

10. Difference Between Series and DataFrame

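| Feature    | Series               | DataFrame            |
|------------|----------------------|----------------------|
| Dimension  | One-dimensional      | Two-dimensional      |
| Structure  | Single column        | Multiple columns     |
| Data Types | Homogeneous or mixed | Mixed across columns |
| Use Case   | Simple data          | Complex datasets     |
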
Understanding Pandas Series and DataFrames is essential for anyone working in data science, machine learning, or data analysis. A Series provides a simple way to handle single-dimensional data, while a DataFrame offers a powerful structure for handling complex, tabular datasets.
Nagesh Chauhan
Principal Engineer | Java · Spring Boot · Python · Microservices · AI/ML

Principal Engineer with 14+ years of experience in designing scalable systems using Java, Spring Boot, and Python. Specialized in microservices architecture, system design, and machine learning.
