Probability and Stats Cheat Sheet (1)

0- Conventions

1- In matrix operations, r-vectors are treated as column vectors (r \times 1 matrices):

x\in \mathbb{R}^r\equiv\begin{bmatrix}x_1\\x_2\\x_3\\ \vdots \\ x_r \end{bmatrix}\in\mathbb{R}^{r\times 1} \implies x^{\rm T}=\begin{bmatrix}x_1&x_2&x_3& \dots & x_r \end{bmatrix}

2- Column vector inner/dot product:

x\cdot y \equiv x^\text Ty=y^\text Tx\ \in \mathbb{R} \quad \forall x,y\in \mathbb{R}^{r\times 1}
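
A quick numerical sanity check of these conventions (a sketch; NumPy and the chosen dimensions are my own additions, not part of the cheat sheet):

```python
import numpy as np

rng = np.random.default_rng(0)

# Per the convention above, r-vectors are r x 1 column matrices.
r = 4
x = rng.standard_normal((r, 1))
y = rng.standard_normal((r, 1))

# x^T y and y^T x are 1 x 1 matrices holding the same scalar.
assert np.allclose(x.T @ y, y.T @ x)
print((x.T @ y).item())  # the inner product x . y as a plain scalar
```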

1- Population expected value and covariance matrix

1- X a random r-vector [X_1, \dots, X_r]^\text{T} :

\mu_X= \text E[X]:=[\text E[X_1],\dots,\text E[X_r]]^\text T=[\mu_1,\dots,\mu_r]^\text T\\ \ \\ \text E[X^\text T]:=(\text E[X])^\text T\\ \ \\ \Sigma_{XX}:=\underset{\color{blue}{\text{or } \text{Var}(X)}}{\text{Cov}}(X,X):=\text E\big[(X-\mu_X)(X-\mu_X)^\text T\big]=\big[\sigma_{ij}^2\big]\in\mathbb{R}^{r\times r}

where \sigma_{ij}^2=\text{var}(X_i)=\text E[(X_i-\mu_i)^2] \ \text { for }i=j, and \sigma_{ij}^2=\text{cov}(X_i,X_j)=\text E[(X_i-\mu_i)(X_j-\mu_j)] \ \text { for } i\ne j.

\Sigma_{XX}=\text E[XX^\text T]-\mu_X\mu_X^\text T\\ \ \\ \Sigma_{XX}=\text E[XX^\text T] \iff \mu_X=0
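
A minimal sketch verifying \Sigma_{XX}=\text E[XX^\text T]-\mu_X\mu_X^\text T with sample moments (the distribution and sample size below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# n samples of a random 3-vector X; each column is one realization.
n = 200_000
X = rng.multivariate_normal(mean=[1.0, -2.0, 0.5],
                            cov=[[2.0, 0.3, 0.0],
                                 [0.3, 1.0, 0.4],
                                 [0.0, 0.4, 1.5]],
                            size=n).T                 # shape (3, n)

mu = X.mean(axis=1, keepdims=True)                    # sample mean of X
E_XXt = (X @ X.T) / n                                 # sample E[X X^T]
Sigma = E_XXt - mu @ mu.T                             # E[X X^T] - mu mu^T

# Agrees with the directly centred estimate E[X_c X_c^T].
Xc = X - mu
assert np.allclose(Sigma, (Xc @ Xc.T) / n)
```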

2- For random vectors X\in \mathbb{R}^{r\times 1} and Y\in\mathbb{R}^{s\times 1}:

\Sigma_{XY}:=\text{Cov}(X,Y):=\text E\big [(X-\mu_X)(Y-\mu_Y)^\text T\big]=\Sigma_{YX}^\text T\ \in\mathbb{R}^{r\times s}\\ \ \\ \Sigma_{XY}=\text E\big [XY^\text T\big] \iff \mu_X=\mu_Y=0

Remark: \Sigma_{XX} is a symmetric matrix, but \Sigma_{XY} is not necessarily symmetric (for r\ne s it is not even square).
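
A numerical check of \Sigma_{XY}=\Sigma_{YX}^\text T (a sketch; the way Y is correlated with X here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

r, s, n = 3, 2, 100_000
X = rng.standard_normal((r, n))
Y = 0.5 * X[:s, :] + rng.standard_normal((s, n))   # make Y correlated with X

Xc = X - X.mean(axis=1, keepdims=True)             # centre both vectors
Yc = Y - Y.mean(axis=1, keepdims=True)

Sigma_XY = (Xc @ Yc.T) / n      # r x s
Sigma_YX = (Yc @ Xc.T) / n      # s x r

assert np.allclose(Sigma_XY, Sigma_YX.T)           # Sigma_XY = Sigma_YX^T
```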


3- Y \in \mathbb {R}^{s \times 1} linearly related to X \in \mathbb {R}^{r \times 1} as Y=AX+b with A\in \mathbb {R}^{s \times r} and b\in \mathbb {R}^{s \times 1} a constant:

\mu_Y=A\mu_X + b \ \in \mathbb{R}^{s \times 1}\\ \ \\ \Sigma_{YY}=A\Sigma_{XX} A^\text T\ \in \mathbb{R}^{s \times s}
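
Both identities also hold exactly for sample moments, which makes them easy to check numerically (a sketch; A, b and the distribution are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

r, s, n = 3, 2, 100_000
A = rng.standard_normal((s, r))
b = rng.standard_normal((s, 1))

X = rng.standard_normal((r, n)) + np.array([[1.0], [2.0], [3.0]])
Y = A @ X + b                              # each column transformed as y = Ax + b

mu_X = X.mean(axis=1, keepdims=True)
mu_Y = Y.mean(axis=1, keepdims=True)
assert np.allclose(mu_Y, A @ mu_X + b)     # mu_Y = A mu_X + b

Sigma_XX = np.cov(X)                       # r x r sample covariance
Sigma_YY = np.cov(Y)                       # s x s sample covariance
assert np.allclose(Sigma_YY, A @ Sigma_XX @ A.T)   # Sigma_YY = A Sigma_XX A^T
```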

4- X,\ Y\in \mathbb {R}^{r \times 1} and Z\in \mathbb {R}^{s \times 1}:

\text {Cov}(X+Y,Z)=\Sigma_{XZ}+\Sigma_{YZ}\\ \ \\ \text {Cov}(X+Y,X+Y)=\Sigma_{XX}+\Sigma_{XY}+\Sigma_{YX}+\Sigma_{YY}
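
The additivity can be checked with a small helper for the sample cross-covariance (a sketch; `cov` is a hypothetical helper written here, not a library function):

```python
import numpy as np

rng = np.random.default_rng(3)

r, s, n = 3, 2, 100_000
X = rng.standard_normal((r, n))
Y = rng.standard_normal((r, n)) + 0.5 * X
Z = rng.standard_normal((s, n)) + X[:s, :]

def cov(U, V):
    """Sample estimate of E[(U - mu_U)(V - mu_V)^T], columns = realizations."""
    Uc = U - U.mean(axis=1, keepdims=True)
    Vc = V - V.mean(axis=1, keepdims=True)
    return (Uc @ Vc.T) / U.shape[1]

# Cov is additive in its first argument (and, symmetrically, in the second).
assert np.allclose(cov(X + Y, Z), cov(X, Z) + cov(Y, Z))
```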

5- X \in \mathbb {R}^{r \times 1}, Y\in \mathbb {R}^{s \times 1}, A\in \mathbb {R}^{p \times r} and B\in \mathbb {R}^{q \times s}:

\text {Cov}(AX,BY)=A\Sigma_{XY}B^\text T\ \in \mathbb {R}^{p \times q}

For the proof, expand \text E[(AX-\text E [AX])(BY-\text E [BY])^\text T].
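
Carrying out that expansion, with \text E[AX]=A\mu_X and linearity of expectation:

\text {Cov}(AX,BY)=\text E\big[A(X-\mu_X)\big(B(Y-\mu_Y)\big)^\text T\big]=A\,\text E\big[(X-\mu_X)(Y-\mu_Y)^\text T\big]B^\text T=A\Sigma_{XY}B^\text T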

As a special case:

\text {Var}(AX)=\text {Cov}(AX,AX)=A\Sigma_{XX}A^\text T\ \in \mathbb {R}^{p \times p}
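
Both the general rule and the special case hold exactly for sample cross-covariances (a sketch, reusing the hypothetical `cov` helper from item 4):

```python
import numpy as np

rng = np.random.default_rng(4)

r, s, p, q, n = 3, 2, 4, 5, 100_000
X = rng.standard_normal((r, n))
Y = rng.standard_normal((s, n)) + X[:s, :]   # correlate X and Y
A = rng.standard_normal((p, r))
B = rng.standard_normal((q, s))

def cov(U, V):
    Uc = U - U.mean(axis=1, keepdims=True)
    Vc = V - V.mean(axis=1, keepdims=True)
    return (Uc @ Vc.T) / U.shape[1]

# Cov(AX, BY) = A Sigma_XY B^T, a p x q matrix.
assert np.allclose(cov(A @ X, B @ Y), A @ cov(X, Y) @ B.T)
# Var(AX) = A Sigma_XX A^T is the special case B = A, Y = X.
assert np.allclose(cov(A @ X, A @ X), A @ cov(X, X) @ A.T)
```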

6- \Sigma_{XX} is positive semi-definite. Proof: show that \forall u\in \mathbb{R}^{r\times 1},\ u^\text T\Sigma_{XX}u \ge 0, using property 5.

u^\text T\Sigma_{XX}u=\text{Cov}(u^\text TX,\ u^\text TX)=\text{Var}(Z)\ge 0,\qquad Z:=u^\text TX \in \mathbb{R}
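
Numerically, positive semi-definiteness shows up as nonnegative eigenvalues of the (symmetric) covariance matrix (a sketch with an arbitrary sample):

```python
import numpy as np

rng = np.random.default_rng(5)

r, n = 4, 10_000
X = rng.standard_normal((r, n))
Sigma = np.cov(X)                        # symmetric r x r

# All eigenvalues are >= 0, up to floating-point round-off.
assert np.all(np.linalg.eigvalsh(Sigma) >= -1e-12)

# Any quadratic form u^T Sigma u is the variance of the scalar u^T X, hence >= 0.
u = rng.standard_normal((r, 1))
assert (u.T @ Sigma @ u).item() >= 0.0
```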

7- A random vector X is centred iff \mu_X=0. Any random vector can be centred by the transformation X_c=X-\mu_X. The following expressions are useful (the proofs follow by expanding the terms and using linearity of expectation):

\forall X\in \mathbb{R}^{r\times 1}, Y\in \mathbb{R}^{t\times 1}\\ \ \\ \Sigma_{XY}=\text E\big [(X-\mu_X)(Y-\mu_Y)^\text T\big]=\text E\big [X_cY_c^\text T\big]=\Sigma_{X_cY_c}\\ \ \\ \Sigma_{XX}=\text E\big [(X-\mu_X)(X-\mu_X)^\text T\big]=\text E\big [X_cX_c^\text T\big]=\Sigma_{X_cX_c}\\ \ \\ \text E[X_c^\text TX_c]=\text {tr}(\Sigma_{XX})\\ \ \\ \text E[X_c^\text TC Y_c]=\text {tr}(C\Sigma_{YX}) \qquad C\in \mathbb{R}^{r\times t}\\ \ \\ \text E[Y_c^\text T C^\text T C Y_c]=\text {tr}(C\Sigma_{YY}C^\text T)

The trace expressions above can be written in other equivalent forms using the properties of the trace, e.g. its cyclic property \text{tr}(AB)=\text{tr}(BA).
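
A numerical check of the trace identities (a sketch; C and the coupling between X and Y are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)

r, t, n = 3, 2, 100_000
X = rng.standard_normal((r, n))
Y = rng.standard_normal((t, n)) + X[:t, :]
C = rng.standard_normal((r, t))

Xc = X - X.mean(axis=1, keepdims=True)
Yc = Y - Y.mean(axis=1, keepdims=True)

Sigma_XX = (Xc @ Xc.T) / n
Sigma_YY = (Yc @ Yc.T) / n
Sigma_YX = (Yc @ Xc.T) / n

# E[X_c^T X_c] = tr(Sigma_XX)
assert np.allclose(np.mean(np.sum(Xc * Xc, axis=0)), np.trace(Sigma_XX))
# E[X_c^T C Y_c] = tr(C Sigma_YX)
assert np.allclose(np.mean(np.sum(Xc * (C @ Yc), axis=0)), np.trace(C @ Sigma_YX))
# E[Y_c^T C^T C Y_c] = tr(C Sigma_YY C^T)
assert np.allclose(np.mean(np.sum((C @ Yc) ** 2, axis=0)), np.trace(C @ Sigma_YY @ C.T))
```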

8- Conditional probability density function: