Statistics for AI Cheatsheet

Probability, distributions, hypothesis testing, Bayesian thinking, and information theory for ML practitioners.

01 Probability Foundations

Sample space

Set of all possible outcomes.

Event

Subset of sample space.

P(A)

Probability of event A, between 0 and 1. P(certain)=1, P(impossible)=0.

Independence

P(A,B)=P(A)*P(B). Knowing that A occurred does not change the probability of B.

Conditional

P(A|B)=P(A,B)/P(B), defined when P(B)>0. Probability of A given that B occurred.

Bayes' Theorem

P(A|B)=P(B|A)*P(A)/P(B). Updates a prior belief P(A) into a posterior P(A|B) after observing evidence B.
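A quick numeric check of the conditional probability and independence rules, using a small joint probability table. The table values are made up for illustration and chosen so that A and B happen to be independent.

```python
import numpy as np

# Made-up joint distribution P(A, B) for two binary events.
# Rows: A false / A true; columns: B false / B true.
joint = np.array([[0.42, 0.18],
                  [0.28, 0.12]])

p_a = joint[1, :].sum()      # P(A): sum of the "A true" row (about 0.4)
p_b = joint[:, 1].sum()      # P(B): sum of the "B true" column (about 0.3)
p_ab = joint[1, 1]           # P(A, B)

p_a_given_b = p_ab / p_b     # P(A|B) = P(A,B) / P(B)

# Independent when P(A,B) = P(A)*P(B), equivalently P(A|B) = P(A)
independent = np.isclose(p_ab, p_a * p_b)
print(f"P(A|B) = {p_a_given_b:.2f}, P(A) = {p_a:.2f}, independent: {independent}")
```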

Bayes example

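# Medical test: 1% of people have the disease (prior)
# Test: 99% sensitivity (true positive rate), 1% false positive rate
P_disease = 0.01
P_pos_given_disease = 0.99
P_pos_given_healthy = 0.01

# Law of total probability: P(pos) = P(pos|disease)P(disease) + P(pos|healthy)P(healthy)
P_pos = P_pos_given_disease*P_disease + P_pos_given_healthy*(1 - P_disease)
P_disease_given_pos = P_pos_given_disease*P_disease/P_pos
print(P_disease_given_pos)  # 0.5: only a 50% chance of disease after a positive test!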
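Wrapping the same calculation in a function makes the role of the prior (base rate) visible. A minimal sketch: `posterior` is a hypothetical helper name, and the test characteristics (99% sensitivity, 1% false positive rate) are taken from the example above.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem (hypothetical helper)."""
    # Law of total probability: positives come from sick and healthy people
    p_pos = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_pos

# Same test characteristics as above, different base rates
for prior in (0.001, 0.01, 0.1, 0.5):
    print(f"prior={prior:<6} P(disease|pos)={posterior(prior, 0.99, 0.01):.3f}")
```

The lower the base rate, the less a positive result means: with a 0.1% prior the posterior is only about 9%, while a 50% prior gives 99%.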

02 Key Distributions

Distribution	PMF/PDF	Mean	Variance	ML use
Bernoulli(p)	p^x*(1-p)^(1-x), x in {0,1}	p	p(1-p)	Binary classification
Binomial(n,p)	C(n,k)*p^k*(1-p)^(n-k)	np	np(1-p)	Count successes in n trials
Gaussian N(mu,sigma^2)	(1/(sigma*sqrt(2*pi)))*exp(-((x-mu)/sigma)^2/2)	mu	sigma^2	Continuous features, noise and regression errors
Poisson(lambda)	lambda^k*e^(-lambda)/k!	lambda	lambda	Event counts
Exponential(lambda)	lambda*e^(-lambda*x), x>=0	1/lambda	1/lambda^2	Time between events
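The means and variances in the table can be checked with `scipy.stats`. The parameter values below are arbitrary examples; note that SciPy parameterises the Gaussian by `scale=sigma` (not sigma^2) and the exponential by `scale=1/lambda`. The last lines evaluate the Gaussian PDF formula by hand and compare it with SciPy.

```python
import math
from scipy import stats

# Arbitrary example parameters
p, n, mu, sigma, lam = 0.3, 10, 1.0, 2.0, 2.0

dists = {
    "Bernoulli": stats.bernoulli(p),
    "Binomial": stats.binom(n, p),
    "Gaussian": stats.norm(loc=mu, scale=sigma),   # scale is sigma, not sigma^2
    "Poisson": stats.poisson(lam),
    "Exponential": stats.expon(scale=1 / lam),     # scale is 1/lambda
}
for name, d in dists.items():
    mean, var = d.stats(moments="mv")
    print(f"{name:<12} mean={float(mean):.3f} variance={float(var):.3f}")

# Gaussian PDF formula evaluated by hand at x = 0.5
x = 0.5
manual_pdf = math.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))
print(manual_pdf, dists["Gaussian"].pdf(x))
```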

03 Statistical Inference

Confidence intervals & tests

# Confidence interval for the mean (population sigma known)
CI = x_bar +- z*(sigma/sqrt(n))
# 95% CI: z=1.96, 99% CI: z=2.576

# One-sample t-test (population sigma unknown; use sample std s)
t = (x_bar - mu0) / (s/sqrt(n))
# degrees of freedom = n-1

# p-value interpretation at significance level alpha = 0.05:
# p < 0.05: reject H0 (statistically significant)
# p >= 0.05: fail to reject H0 (this is not evidence that H0 is true)

# Effect size (Cohen's d)
d = (mean1 - mean2) / pooled_std
# Small: d=0.2, Medium: d=0.5, Large: d=0.8
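A runnable version of these formulas with NumPy and SciPy, on synthetic data generated for illustration only (the means, sizes, and seed are arbitrary). Because sigma is unknown here, the interval uses the t critical value from `stats.t.ppf` rather than z; for n=40 it is only slightly larger than 1.96.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic samples for illustration only (not from any real experiment)
a = rng.normal(loc=0.5, scale=1.0, size=40)
b = rng.normal(loc=0.0, scale=1.0, size=40)

# 95% CI for the mean of `a` with unknown sigma: t critical value, n-1 df
n = len(a)
x_bar, s = a.mean(), a.std(ddof=1)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (x_bar - t_crit * s / np.sqrt(n), x_bar + t_crit * s / np.sqrt(n))

# One-sample t-test against mu0 = 0: t = (x_bar - mu0) / (s/sqrt(n))
t_stat, p_val = stats.ttest_1samp(a, popmean=0.0)

# Cohen's d with pooled standard deviation (sample variances, ddof=1)
n1, n2 = len(a), len(b)
pooled_std = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2))
d = (a.mean() - b.mean()) / pooled_std

print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})  t={t_stat:.3f}  p={p_val:.4f}  d={d:.3f}")
```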

❓ Quiz

In ML, what does a p-value < 0.05 indicate?

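p < 0.05 means the result is statistically significant at the 5% level: if H0 were true, data at least as extreme as those observed would occur less than 5% of the time. It is evidence against H0, not the probability that H0 is true or that the results are due to chance.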
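One way to see what the 0.05 threshold controls: simulate many experiments in which H0 is actually true and count how often p < 0.05. A sketch with synthetic normal data; the number of experiments, sample size, and seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# 2000 simulated experiments of 30 samples each; H0 (true mean = 0) really holds
samples = rng.normal(loc=0.0, scale=1.0, size=(2000, 30))

# One t-test per row (axis=1), giving 2000 p-values
p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue
false_positive_rate = np.mean(p_values < 0.05)
print(false_positive_rate)  # close to 0.05
```

The fraction comes out near 5%: that is the false positive rate accepted by choosing alpha = 0.05. Testing many models or metrics at once multiplies these false positives.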

04 Information Theory

Entropy H(X)

-sum(p*log2(p)). Measures uncertainty/randomness.

High entropy

Uniform distribution. Maximum uncertainty.

Low entropy

Concentrated distribution. Predictable.

Cross-entropy

H(p,q)=-sum(p*log(q)). Loss function in classification.

KL divergence

D_KL(P||Q)=sum(p*log(p/q)). How different Q is from P; always >= 0, equal to 0 only when P=Q, and not symmetric.

Mutual information

I(X;Y)=H(Y)-H(Y|X). How much knowing X reduces uncertainty about Y; 0 when X and Y are independent.

Entropy calculation

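# Binary entropy: p=0.5 vs p=0.9
import numpy as np

def entropy(p):
    # Convention 0*log2(0) = 0, so a certain outcome (p=0 or p=1) has 0 bits
    if p in (0.0, 1.0):
        return 0.0
    return -p*np.log2(p) - (1-p)*np.log2(1-p)

entropy(0.5)  # 1.0 (maximum)
entropy(0.9)  # ~0.469 (less uncertain)
entropy(1.0)  # 0.0 (certain)

# Cross-entropy loss (classification): y_true one-hot, y_pred predicted probabilities
y_true = np.array([0, 1, 0])
y_pred = np.array([0.1, 0.7, 0.2])
loss = -np.sum(y_true * np.log(y_pred))  # natural log: -ln(0.7) ~ 0.357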
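Extending the binary example to the other quantities defined above, for small discrete distributions in bits (log base 2). A minimal sketch: the helper names are hypothetical, and the code assumes q > 0 wherever p > 0 (otherwise the KL divergence is infinite).

```python
import numpy as np

def entropy_bits(p):
    """H(P) in bits; terms with p=0 contribute 0 by convention."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def cross_entropy(p, q):
    """H(P, Q) in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(q[nz]))

def kl_divergence(p, q):
    """D_KL(P||Q) = H(P, Q) - H(P)."""
    return cross_entropy(p, q) - entropy_bits(p)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    h_x = entropy_bits(joint.sum(axis=1))
    h_y = entropy_bits(joint.sum(axis=0))
    return h_x + h_y - entropy_bits(joint.ravel())

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(entropy_bits(p))                           # 1.5
print(cross_entropy(p, q))                       # log2(3), about 1.585
print(kl_divergence(p, q), kl_divergence(q, p))  # both small, but not equal
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # Y copies X: 1 bit
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # independent: 0 bits
```

Because D_KL(P||Q) and D_KL(Q||P) differ, KL divergence is not a distance metric. With the natural log (as in most classification losses) the same quantities come out in nats instead of bits.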

05 Correlation & Regression

Correlation types

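import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.linear_model import LinearRegression
# x, y: 1-D arrays; X_train/X_test: 2-D (n_samples, n_features); y_train/y_test: targets

# Pearson r: linear correlation (np.cov uses ddof=1, so match it in std)
r = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
r, p = pearsonr(x, y)  # same r, plus a p-value

# Spearman: rank correlation (any monotonic relationship, linear or not)
r_s, p = spearmanr(x, y)

# Linear regression (ordinary least squares)
model = LinearRegression()
model.fit(X_train, y_train)
print(model.coef_, model.intercept_)

# R-squared on held-out data: proportion of variance explained
# (1.0 is perfect; can be negative if worse than predicting the mean)
model.score(X_test, y_test)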
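To see the Pearson/Spearman difference, compare both on a relationship that is perfectly monotonic but strongly non-linear. The data are synthetic; the manual line repeats the cov/std formula for Pearson r and should agree with `pearsonr`.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.linspace(0, 5, 50)
y = np.exp(x)               # perfectly monotonic, not a straight line

r_manual = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
r, _ = pearsonr(x, y)       # well below 1: the relationship is not linear
r_s, _ = spearmanr(x, y)    # 1: the ranks of y follow the ranks of x exactly
print(f"Pearson r = {r:.3f} (manual {r_manual:.3f}), Spearman rho = {r_s:.3f}")
```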

⚠

Correlation does NOT imply causation. Always check for confounding variables before drawing conclusions.
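A small simulation of a confounder: a hidden variable z drives both x and y, which never influence each other, yet x and y end up strongly correlated. The coefficients and seed are arbitrary. "Controlling for" z is done here by regressing it out of both variables and correlating the residuals (a partial correlation); `residual` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
z = rng.normal(size=n)                 # confounder: a hidden common cause
x = 2.0 * z + rng.normal(size=n)       # x depends on z only
y = 3.0 * z + rng.normal(size=n)       # y depends on z only, not on x

raw_r = np.corrcoef(x, y)[0, 1]
print(raw_r)                           # roughly 0.85 (theory: 6/sqrt(50))

def residual(v, z):
    # Remove the linear effect of z from v (least-squares line fit)
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)

partial_r = np.corrcoef(residual(x, z), residual(y, z))[0, 1]
print(partial_r)                       # near 0 once z is accounted for
```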