Pandas
Pandas
DataFrames, indexing, groupby, merge, cleaning, and visualization — complete pandas reference.
📖 6 sections
⏰ 18 min read
✅ Quizzes included
01DataFrame Basics
PANDASCreating & loading
import pandas as pd

# Create DataFrame
df = pd.DataFrame({"name":["Ali","Sara"],"age":[25,30]})

# Load from file
df = pd.read_csv("file.csv")
df = pd.read_excel("file.xlsx")
df = pd.read_json("file.json")

# Basic info
df.head(5)          # first 5 rows
df.tail(5)          # last 5 rows
df.shape            # (rows, cols)
df.info()           # dtype, nulls
df.describe()       # stats summary
df.columns          # column names
df.dtypes           # data types
02Selection & Indexing
PANDASSelecting data
# Select column
df["age"]           # Series
df[["name","age"]]   # DataFrame

# iloc: integer location
df.iloc[0]          # row 0
df.iloc[0,1]        # row 0, col 1
df.iloc[:5,:]       # first 5 rows

# loc: label based
df.loc[0]           # row with index 0
df.loc[0,"name"]    # specific cell

# Boolean filtering
df[df["age"]>25]
df[(df["age"]>25) & (df["name"]=="Ali")]
df.query("age > 25")
03Data Cleaning
PANDASHandling missing data
df.isnull().sum()          # count nulls per column
df.dropna()                # drop rows with any null
df.dropna(subset=["age"])  # drop if age is null
df.fillna(0)               # fill nulls with 0
df.fillna(df.mean())       # fill with column mean
df["age"].fillna(df["age"].median(), inplace=True)

# Duplicates
df.duplicated().sum()      # count duplicates
df.drop_duplicates()       # remove duplicates

# Data types
df["age"] = df["age"].astype(int)
df["date"] = pd.to_datetime(df["date"])
04GroupBy & Aggregation
PANDASGroupBy operations
# Group by and aggregate
df.groupby("city")["salary"].mean()
df.groupby("city").agg({"salary":"mean","age":"max"})
df.groupby("city").size()  # count per group

# Pivot table
df.pivot_table(values="salary", index="dept", columns="gender", aggfunc="mean")

# Apply custom function
df["age_doubled"] = df["age"].apply(lambda x: x*2)
df.apply(lambda row: row["salary"]/row["age"], axis=1)
❓ Quiz
How do you select rows where age > 25?
Use boolean indexing: df[df["age"]>25]. The inner df["age"]>25 creates a boolean Series, used to filter rows.
05Merge & Join
PANDASMerging DataFrames
# Inner join (only matching keys)
result = pd.merge(df1, df2, on="id", how="inner")

# Left join (all from left)
result = pd.merge(df1, df2, on="id", how="left")

# Concatenate (stack)
result = pd.concat([df1, df2], axis=0)  # vertically
result = pd.concat([df1, df2], axis=1)  # horizontally

# Join on index
df1.join(df2, how="left")
💡
inner: only matching rows. left: all left rows, NaN for non-matching right. outer: all rows from both, NaN for non-matches.
06Export & Visualization
PANDASSaving & plotting
# Export
df.to_csv("output.csv", index=False)
df.to_excel("output.xlsx", index=False)
df.to_json("output.json")

# Quick plots (requires matplotlib)
df["age"].hist()            # histogram
df.plot(x="age",y="salary",kind="scatter")  # scatter
df["city"].value_counts().plot(kind="bar")  # bar chart

# Sort
df.sort_values("age", ascending=False)
df.sort_values(["dept","salary"])