Learn Pandas Series and DataFrames

Pandas is one of the most powerful and widely used Python libraries for data analysis and data manipulation. It provides easy-to-use data structures and functions needed to work efficiently with structured data. The two most important data structures in Pandas are the Series and the DataFrame.

1. Introduction to Pandas

Pandas is built on top of NumPy and is designed to handle tabular data, time series, and heterogeneous datasets efficiently.

Key Features:

- Easy handling of missing data
- Powerful grouping and aggregation
- Data alignment and indexing
- Integration with other libraries like Matplotlib and Scikit-learn

2. Understanding Pandas Series

A Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, etc.).

Creating a Series

import pandas as pd

data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)

Output:

0    10
1    20
2    30
3    40
dtype: int64

With Custom Index

import pandas as pd

data = [10, 20, 30, 40]
s = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(s)

Output:

a    10
b    20
c    30
d    40
dtype: int64
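A Series can also be built directly from a Python dictionary. In that case the keys become the index labels and the values become the data. A minimal sketch; the dictionary contents are illustrative only:

```python
import pandas as pd

# Dictionary keys become the index labels, dictionary values become the data
scores = {'a': 10, 'b': 20, 'c': 30}
s = pd.Series(scores)
print(s)
```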

Key Characteristics of Series:

- One-dimensional
- Contains values and an index
- Supports vectorized operations

Accessing Data

import pandas as pd

data = [10, 20, 30, 40]
s = pd.Series(data, index=['a', 'b', 'c', 'd'])

print(s['a'])  # Access by label
print(s.iloc[0])  # Access by position

Output:

10
10
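The same two accessors also work with slices, but they treat the end point differently: a label slice with .loc includes the end label, while a position slice with .iloc excludes the end position, just like a Python list. A short sketch reusing the Series above:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Label-based slicing with .loc includes the end label: a, b, c
by_label = s.loc['a':'c']

# Position-based slicing with .iloc excludes the end position: a, b
by_position = s.iloc[0:2]

print(by_label)
print(by_position)
```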

Operations on Series

import pandas as pd

data = [10, 20, 30, 40]
s = pd.Series(data, index=['a', 'b', 'c', 'd'])

add = s + 10
print(add)

mul = s * 2
print(mul)

Output:

a    20
b    30
c    40
d    50
dtype: int64
a    20
b    40
c    60
d    80
dtype: int64

These operations are element-wise: each value is combined with the scalar, and the result keeps the original index labels.
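When both operands are Series, Pandas matches values by index label rather than by position (the "data alignment" listed among the key features). Labels that appear in only one Series produce NaN. The sketch below uses two small, made-up Series to show this, plus the add() method with fill_value, which treats a missing label as 0 instead:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are paired by label; 'a' and 'd' exist in only one Series, so they become NaN
total = s1 + s2
print(total)

# add() with fill_value substitutes 0 for a label missing on one side
filled = s1.add(s2, fill_value=0)
print(filled)
```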

3. Understanding Pandas DataFrame

A DataFrame is a two-dimensional labeled data structure with rows and columns, similar to a table or spreadsheet.

Creating a DataFrame

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
print(df)

Output:

    Name  Age
0   John   25
1  Alice   30
2    Bob   22

Key Characteristics:

- Two-dimensional
- Columns can have different data types
- Labeled axes (rows and columns)
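These characteristics can be inspected directly: shape gives the (rows, columns) size and dtypes reports one data type per column. A small sketch with the article's sample data; note that the dtype shown for the text column depends on the pandas version (object in older releases, str in newer ones):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
})

print(df.shape)             # (rows, columns)
print(df.dtypes)            # one dtype per column
print(df.columns.tolist())  # the column labels
```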

4. Accessing Data in DataFrame

Column Selection

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
col = df['Name']
print(col)

Output:

0     John
1    Alice
2      Bob
Name: Name, dtype: str

Multiple Columns

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
mul_col = df[['Name', 'Age']]
print(mul_col)

Output:

    Name  Age
0   John   25
1  Alice   30
2    Bob   22
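The number of brackets decides the result type: a single column name returns a Series, while a list of names (even a list with one name) returns a DataFrame. A quick check using the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
})

one_col = df['Age']      # single name -> Series
as_frame = df[['Age']]   # list of names -> DataFrame, even for one column

print(type(one_col).__name__)   # Series
print(type(as_frame).__name__)  # DataFrame
```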

Row Selection

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
by_label = df.loc[0]  # By label
print(by_label)

by_row = df.iloc[0]  # By position
print(by_row)

Output:

Name    John
Age       25
Name: 0, dtype: object
Name    John
Age       25
Name: 0, dtype: object
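In the example above, loc[0] and iloc[0] return the same row only because the default labels (0, 1, 2) happen to match the positions. Once rows are reordered, the two accessors diverge. A sketch using sort_values(), which keeps each row's original label:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
})

# After sorting by Age the labels are in the order 2, 0, 1
by_age = df.sort_values('Age')

print(by_age.loc[0, 'Name'])   # row labelled 0 -> John
print(by_age.iloc[0]['Name'])  # first row by position -> Bob
```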

5. DataFrame Operations

Adding a New Column

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
df['Salary'] = [50000, 60000, 45000]  # Adding a new column

print(df)

Output:

    Name  Age  Salary
0   John   25   50000
1  Alice   30   60000
2    Bob   22   45000
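A new column can also be computed from existing columns, using the same element-wise arithmetic shown for Series. In this sketch the column name Salary_k (salary in thousands) is a hypothetical example, not part of the original data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Salary': [50000, 60000, 45000]
})

# Derived column: every Salary value divided by 1000 (hypothetical column name)
df['Salary_k'] = df['Salary'] / 1000
print(df)
```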

Deleting a Column

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
df.drop('Age', axis=1, inplace=True)  # Deleting a column

print(df)

Output:

    Name
0   John
1  Alice
2    Bob
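Without inplace=True, drop() returns a new DataFrame and leaves the original untouched; the columns= keyword is an equivalent way of writing axis=1. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
})

# Returns a copy without Age; df itself still has both columns
names_only = df.drop(columns='Age')

print(names_only.columns.tolist())  # ['Name']
print(df.columns.tolist())          # ['Name', 'Age']
```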

Filtering Data

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
}

df = pd.DataFrame(data)
filtered = df[df['Age'] > 25]  # Filtering

print(filtered)

Output:

    Name  Age
1  Alice   30
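Several conditions can be combined with & (and) and | (or); Python's own and/or keywords do not work on a Series, and each condition needs its own parentheses. The isin() method tests membership in a list of values. A sketch with the same data; the age bounds are arbitrary:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
})

# Both conditions must be true; parentheses are required around each one
in_range = df[(df['Age'] >= 23) & (df['Age'] <= 30)]

# Keep rows whose Name is one of the listed values
selected = df[df['Name'].isin(['John', 'Bob'])]

print(in_range)
print(selected)
```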

6. Handling Missing Data

Missing values typically appear as NaN in Pandas, and the library provides methods to detect, remove, and replace them.

Sample DataFrame with Missing Values

import pandas as pd
import numpy as np

data = {
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, np.nan, 30, np.nan],
    'Marks': [80, 90, np.nan, 70]
}

df = pd.DataFrame(data)
print(df)

Output:

    Name   Age  Marks
0   Amit  25.0   80.0
1  Rahul   NaN   90.0
2   Sita  30.0    NaN
3  Geeta   NaN   70.0

1. df.isnull() → Detect Missing Values

import pandas as pd
import numpy as np

data = {
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, np.nan, 30, np.nan],
    'Marks': [80, 90, np.nan, 70]
}

df = pd.DataFrame(data)
print(df.isnull())

Output:

    Name    Age  Marks
0  False  False  False
1  False   True  False
2  False  False   True
3  False   True  False
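Because True counts as 1 when summed, chaining sum() onto isnull() gives the number of missing values in each column, which is often the first check on a new dataset. (isna() is an alias of isnull() and works the same way.)

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, np.nan, 30, np.nan],
    'Marks': [80, 90, np.nan, 70]
})

# Sum the boolean mask column by column to count missing entries
missing_per_column = df.isnull().sum()
print(missing_per_column)
```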

2. df.dropna() → Remove Rows with Missing Values

import pandas as pd
import numpy as np

data = {
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, np.nan, 30, np.nan],
    'Marks': [80, 90, np.nan, 70]
}

df = pd.DataFrame(data)
print(df.dropna())

Output:

   Name   Age  Marks
0  Amit  25.0   80.0
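Dropping every row that has any NaN can discard a lot of data. The subset parameter restricts the check to specific columns; in this sketch only rows with a missing Age are removed, while a missing Marks value is tolerated:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, np.nan, 30, np.nan],
    'Marks': [80, 90, np.nan, 70]
})

# Only the Age column is checked for missing values
with_age = df.dropna(subset=['Age'])
print(with_age)
```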

3. df.fillna(0) → Replace Missing Values

import pandas as pd
import numpy as np

data = {
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, np.nan, 30, np.nan],
    'Marks': [80, 90, np.nan, 70]
}

df = pd.DataFrame(data)
print(df.fillna(0))

Output:

    Name   Age  Marks
0   Amit  25.0   80.0
1  Rahul   0.0   90.0
2   Sita  30.0    0.0
3  Geeta   0.0   70.0
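Filling with 0 can distort averages, so a common alternative is to fill each numeric column with its own mean. fillna() accepts a Series keyed by column name for this; columns not in that Series (here Name) are left unchanged. A sketch with the same data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, np.nan, 30, np.nan],
    'Marks': [80, 90, np.nan, 70]
})

# mean() skips NaN: Age -> 27.5, Marks -> 80.0
means = df[['Age', 'Marks']].mean()

# Each column's NaN values are replaced by that column's mean
filled = df.fillna(means)
print(filled)
```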

7. Indexing and Selection

Setting Index

df.set_index('Name', inplace=True)  # use the Name column as the row labels

Resetting Index

df.reset_index(inplace=True)  # move the labels back into a column and restore 0, 1, 2, ...
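The two lines above assume an existing df. A self-contained sketch, written with reassignment instead of inplace=True (both styles work), shows that after set_index() a row can be looked up by name with loc, and reset_index() brings Name back as an ordinary column:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Amit', 'Rahul', 'Sita'],
    'Marks': [80, 90, 70]
})

# Name becomes the row index
df = df.set_index('Name')
rahul_marks = df.loc['Rahul', 'Marks']   # lookup by the new label
print(rahul_marks)

# Name goes back to being a column; the index is 0, 1, 2 again
df = df.reset_index()
print(df.columns.tolist())
```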

8. Data Aggregation and Grouping

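Grouping splits the rows into groups that share a value in one or more columns, applies a function such as sum() to each group, and combines the results into a new table.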

import pandas as pd

data = {
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, 25, 30, 40],
    'Marks': [80, 90, 90, 70]
}

df = pd.DataFrame(data)
group_by_age_sum = df.groupby('Age')[['Marks']].sum()  # sum only Marks; Name holds text, not numbers

print(group_by_age_sum)

Output:

     Marks
Age       
25     170
30      90
40      70

Common aggregation functions:

- sum()
- mean()
- count()
- min(), max()
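Several of these functions can be applied in one pass with agg(), which takes a list of function names and returns one column per function. A sketch on the grouping example's data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Amit', 'Rahul', 'Sita', 'Geeta'],
    'Age': [25, 25, 30, 40],
    'Marks': [80, 90, 90, 70]
})

# One row per Age group, one column per aggregation function
summary = df.groupby('Age')['Marks'].agg(['sum', 'mean', 'count', 'min', 'max'])
print(summary)
```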

9. Reading and Writing Data

Reading Files

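import pandas as pd

df = pd.read_csv('data.csv')     # load a comma-separated file into a DataFrame
df = pd.read_excel('data.xlsx')  # .xlsx files require an engine such as openpyxl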

Writing Files

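df.to_csv('output.csv')     # the row index is written too; pass index=False to omit it
df.to_excel('output.xlsx')  # also requires an Excel engine such as openpyxl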
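To try these functions without creating files on disk, both to_csv() and read_csv() also accept an in-memory text buffer. A sketch of a round trip through io.StringIO, writing with index=False so the row labels are not stored as an extra column:

```python
import io
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22]
})

# Write CSV text into memory instead of a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind to the start and read the text back as a DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```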

10. Difference Between Series and DataFrame

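Feature	Series	DataFrame
Dimension	One-dimensional	Two-dimensional
Structure	Single column of values with an index	Multiple columns sharing one row index
Data Types	One dtype (object dtype can hold mixed values)	One dtype per column, so columns can differ
Use Case	A single variable, such as one column	Tabular datasets with several variables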

Understanding Pandas Series and DataFrames is essential for anyone working in data science, machine learning, or data analysis. A Series provides a simple way to handle single-dimensional data, while a DataFrame offers a powerful structure for handling complex, tabular datasets.

Nagesh Chauhan
