Pandas
Pandas
DataFrames, indexing, groupby, merge, cleaning, and visualization — complete pandas reference.
01DataFrame Basics▼
PANDASCreating & loading
import pandas as pd
# Create DataFrame
df = pd.DataFrame({"name":["Ali","Sara"],"age":[25,30]})
# Load from file
df = pd.read_csv("file.csv")
df = pd.read_excel("file.xlsx")
df = pd.read_json("file.json")
# Basic info
df.head(5) # first 5 rows
df.tail(5) # last 5 rows
df.shape # (rows, cols)
df.info() # dtype, nulls
df.describe() # stats summary
df.columns # column names
df.dtypes # data types02Selection & Indexing▼
PANDASSelecting data
# Select column
df["age"] # Series
df[["name","age"]] # DataFrame
# iloc: integer location
df.iloc[0] # row 0
df.iloc[0,1] # row 0, col 1
df.iloc[:5,:] # first 5 rows
# loc: label based
df.loc[0] # row with index 0
df.loc[0,"name"] # specific cell
# Boolean filtering
df[df["age"]>25]
df[(df["age"]>25) & (df["name"]=="Ali")]
df.query("age > 25")03Data Cleaning▼
PANDASHandling missing data
df.isnull().sum() # count nulls per column df.dropna() # drop rows with any null df.dropna(subset=["age"]) # drop if age is null df.fillna(0) # fill nulls with 0 df.fillna(df.mean()) # fill with column mean df["age"].fillna(df["age"].median(), inplace=True) # Duplicates df.duplicated().sum() # count duplicates df.drop_duplicates() # remove duplicates # Data types df["age"] = df["age"].astype(int) df["date"] = pd.to_datetime(df["date"])
04GroupBy & Aggregation▼
PANDASGroupBy operations
# Group by and aggregate
df.groupby("city")["salary"].mean()
df.groupby("city").agg({"salary":"mean","age":"max"})
df.groupby("city").size() # count per group
# Pivot table
df.pivot_table(values="salary", index="dept", columns="gender", aggfunc="mean")
# Apply custom function
df["age_doubled"] = df["age"].apply(lambda x: x*2)
df.apply(lambda row: row["salary"]/row["age"], axis=1)❓ Quiz
How do you select rows where age > 25?
Use boolean indexing: df[df["age"]>25]. The inner df["age"]>25 creates a boolean Series, used to filter rows.
05Merge & Join▼
PANDASMerging DataFrames
# Inner join (only matching keys) result = pd.merge(df1, df2, on="id", how="inner") # Left join (all from left) result = pd.merge(df1, df2, on="id", how="left") # Concatenate (stack) result = pd.concat([df1, df2], axis=0) # vertically result = pd.concat([df1, df2], axis=1) # horizontally # Join on index df1.join(df2, how="left")
💡
inner: only matching rows. left: all left rows, NaN for non-matching right. outer: all rows from both, NaN for non-matches.
06Export & Visualization▼
PANDASSaving & plotting
# Export
df.to_csv("output.csv", index=False)
df.to_excel("output.xlsx", index=False)
df.to_json("output.json")
# Quick plots (requires matplotlib)
df["age"].hist() # histogram
df.plot(x="age",y="salary",kind="scatter") # scatter
df["city"].value_counts().plot(kind="bar") # bar chart
# Sort
df.sort_values("age", ascending=False)
df.sort_values(["dept","salary"])