Miscellaneous Proofs for SSM (1)

1) The sum of squared distances is minimized by the vector mean

Prove \hat\mu=\frac{1}{n}\sum_{i=1}^{n}X_i=\argmin_{\mu}\sum_{i=1}^{n} \|X_i-\mu\|^2, where X_i and \mu are vectors and \|.\| is the norm induced by the inner product defined on the vector space: \|v\|^2=\lang v,v \rang.

Proof:

\begin{aligned} \argmin_{\mu} \sum_{i=1}^{n} \|X_i-\mu\|^2=\argmin_{\mu}\sum_{i=1}^{n} \lang X_i-\mu , X_i-\mu \rang \\ =\argmin_{\mu}\sum_{i=1}^{n} \lang X_i,X_i \rang -2\lang X_i,\mu \rang + \lang\mu,\mu \rang \\ \text{as } \lang X_i,X_i\rang \text{ is a constant not depending on } \mu \\ =\argmin_{\mu}\sum_{i=1}^{n} -2\lang X_i,\mu\rang + \lang \mu,\mu \rang\\ =\argmin_{\mu}(n\lang \mu,\mu \rang-2 \sum_{i=1}^{n} \lang X_i,\mu \rang)\\ \text{by the linearity of the inner product}\\ =\argmin_{\mu}(n\lang \mu,\mu \rang -2\lang\sum_{i=1}^{n} X_i,\mu\rang)\\ =\argmin_{\mu}(n\lang \mu,\mu \rang -2n\lang\frac{1}{n}\sum_{i=1}^{n} X_i,\mu\rang)\\ =\argmin_{\mu}(n\lang \mu,\mu \rang -2n\lang \overline{X},\mu\rang)\\ \text{dividing by the constant } n \text{ won’t affect the }\argmin\\ =\argmin_{\mu}(\lang \mu,\mu \rang -2\lang \overline{X},\mu\rang)\\ \text{adding the constant } \lang \overline{X},\overline{X} \rang \text{ won’t affect the } \argmin \text{ either, therefore: }\\ =\argmin_{\mu}(\lang \mu,\mu \rang -2\lang \overline{X},\mu\rang +\lang \overline{X},\overline{X} \rang)\\ =\argmin_{\mu}(\lang \mu - \overline{X}, \mu - \overline{X}\rang )\\ =\argmin_{\mu}(\| \mu - \overline{X}\|^2 )\\ \end{aligned}

Because \|.\| is greater than or equal to zero, the minimum of \| \mu - \overline{X}\|^2 is zero, attained at \mu=\overline{X}; hence:

\hat\mu=\argmin_{\mu}(\| \mu - \overline{X}\|^2 )=\overline{X}\qquad\checkmark

In the case where X_i and \mu are matrices, the norm is the Frobenius norm and the inner product is the Frobenius inner product.
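As a quick numerical sanity check of this result, the sketch below (assuming NumPy and arbitrary random data; all names are illustrative) confirms that perturbing the sample mean in random directions never decreases the sum of squared distances.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # n = 5 sample vectors in R^3
mu_hat = X.mean(axis=0)              # candidate minimizer: the vector mean

def objective(mu):
    """Sum of squared Euclidean distances from the sample vectors to mu."""
    return np.sum(np.linalg.norm(X - mu, axis=1) ** 2)

base = objective(mu_hat)
# Any perturbation d adds exactly n * ||d||^2 to the objective,
# so no perturbed point can do better than the mean.
for _ in range(100):
    assert objective(mu_hat + 0.1 * rng.normal(size=3)) >= base
```

The same check works verbatim for the matrix case by drawing `X` with shape `(n, k, m)` and summing squared Frobenius norms instead.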

2) Proof regarding the full GPA method

Proving the following:

\inf_{\beta_i,\Gamma_i,\gamma_i}\sum_{i=1}^{n} \|(\beta_i X_i\Gamma_i+1_k\gamma_i^\text{T}) - \frac{1}{n} \sum_{j=1}^{n} (\beta_j X_j\Gamma_j+1_k\gamma_j^\text{T})\|^2\\ =\\ \inf_{\beta_i,\Gamma_i,\gamma_i}\frac{1}{n}\sum_{i=1}^{n} \sum_{j=i+1}^{n}\|(\beta_i X_i\Gamma_i+1_k\gamma_i^\text{T}) - (\beta_j X_j\Gamma_j+1_k\gamma_j^\text{T})\|^2

For brevity, the \inf operator is not restated throughout the proof. Let A_i:=\beta_i X_i\Gamma_i+1_k\gamma_i^\text{T}. Therefore:

\sum_{i=1}^{n} \|A_i - \frac{1}{n} \sum_{j=1}^{n} A_j\|^2 =\frac{1}{n^2}\sum_{i=1}^{n} \|nA_i - \sum_{j=1}^{n} A_j\|^2\tag{I}

The Frobenius norm comes from the Frobenius inner product, i.e. \forall A\in \mathbb{R}^{k\times m},\ \|A\|^2=\lang A,A \rang. Therefore, the above continues as:

=\frac{1}{n^2}\sum_{i=1}^{n} \lang nA_i - \sum_{j=1}^{n} A_j,nA_i - \sum_{j=1}^{n} A_j\rang

The inner product is a symmetric bilinear form; hence, by its linearity and symmetry:

\begin{aligned} =\frac{1}{n^2}\sum_{i}( \lang nA_i - \sum_{j} A_j,nA_i\rang - \lang nA_i - \sum_{j} A_j,\sum_{j} A_j\rang)\\ =\frac{1}{n^2}\sum_{i}( \lang nA_i,nA_i\rang - \lang nA_i, \sum_{j} A_j\rang- \lang \sum_{j} A_j,nA_i\rang \\+ \lang \sum_{j} A_j,\sum_{j} A_j\rang)\\ =\frac{1}{n^2}\sum_{i}( n^2\lang A_i,A_i\rang - 2n\lang A_i, \sum_{j} A_j\rang + \lang \sum_{j} A_j,\sum_{j} A_j\rang) \end{aligned}

Again, by applying the linearity of the inner product to the finite sums:

=\frac{1}{n^2}\sum_{i}( n^2\lang A_i,A_i\rang - 2n\sum_{j}\lang A_i, A_j\rang + \sum_{l}\sum_{j}\lang A_l, A_j\rang)

Applying the outer \sum_{i}:

\begin{aligned} =\frac{1}{n^2}\sum_{i}( n^2\lang A_i,A_i\rang - 2n\sum_{j}\lang A_i, A_j\rang + \sum_{l}\sum_{j}\lang A_l, A_j\rang)\\ =\frac{1}{n^2}(n^2\sum_{i}\lang A_i,A_i\rang - 2n\sum_{i}\sum_{j}\lang A_i, A_j\rang\\ + \sum_{i}\sum_{l}\sum_{j}\lang A_l, A_j\rang)\\ =\frac{1}{n^2}(n^2\sum_{i}\lang A_i,A_i\rang - 2n\sum_{i}\sum_{j}\lang A_i, A_j\rang + n\sum_{l}\sum_{j}\lang A_l, A_j\rang)\\ \text{As the last two double-sigma sums are identical, they combine:}\\ =\frac{1}{n}(n\sum_{i}\lang A_i,A_i\rang - \sum_{i}\sum_{j}\lang A_i, A_j\rang )\tag{II} \end{aligned}
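Eq. II can already be checked numerically at this point. The sketch below (assuming NumPy and arbitrary random matrices A_i; all names are illustrative) compares the right-hand side of Eq. I with Eq. II.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 6, 4, 2
A = rng.normal(size=(n, k, m))       # n random k-by-m matrices A_i

frob = lambda X, Y: np.sum(X * Y)    # Frobenius inner product <X, Y>

# Right-hand side of Eq. I: (1/n^2) * sum_i ||n A_i - sum_j A_j||^2
S = A.sum(axis=0)
lhs = sum(frob(n * A[i] - S, n * A[i] - S) for i in range(n)) / n**2

# Eq. II: (1/n) * (n * sum_i <A_i, A_i> - sum_i sum_j <A_i, A_j>)
rhs = (n * sum(frob(A[i], A[i]) for i in range(n))
       - sum(frob(A[i], A[j]) for i in range(n) for j in range(n))) / n

assert np.isclose(lhs, rhs)
```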

The rest can be done by induction, but for simplicity we write out a number of terms to observe the pattern and reach the final result. Using \|A-B\|^2=\lang A,A\rang -2\lang A,B\rang + \lang B,B\rang and letting n=4, the expansion of the term in the parentheses in Eq. II becomes:

(4-1)\lang A_1,A_1\rang+(4-1)\lang A_2,A_2\rang+(4-1)\lang A_3,A_3\rang+(4-1)\lang A_4,A_4\rang\\ -2\lang A_1,A_2\rang -2\lang A_1,A_3\rang-2\lang A_1,A_4\rang -2\lang A_2,A_3\rang -2\lang A_2,A_4\rang\\ -2\lang A_3,A_4\rang\\ =\lang A_1,A_1\rang -2\lang A_1,A_2\rang +\lang A_2,A_2\rang\\ +\lang A_1,A_1\rang -2\lang A_1,A_3\rang +\lang A_3,A_3\rang\\ +\lang A_1,A_1\rang -2\lang A_1,A_4\rang +\lang A_4,A_4\rang\\ +\lang A_2,A_2\rang -2\lang A_2,A_3\rang +\lang A_3,A_3\rang\\ +\lang A_2,A_2\rang -2\lang A_2,A_4\rang +\lang A_4,A_4\rang\\ +\lang A_3,A_3\rang -2\lang A_3,A_4\rang +\lang A_4,A_4\rang\\ \begin{aligned} =\|A_1-A_2\|^2+\|A_1-A_3\|^2+\|A_1-A_4\|^2+\|A_2-A_3\|^2\\ +\|A_2-A_4\|^2+\|A_3-A_4\|^2 \\ =\sum_{i=1}^{4}\sum_{j=i+1}^{4}\|A_i-A_j\|^2 \end{aligned}

Consequently, for any number of terms, Eq. II becomes \frac{1}{n}\sum_{i=1}^{n}\sum_{j=i+1}^{n}\|A_i-A_j\|^2. Hence, recalling Eq. I, we complete the proof:

\sum_{i=1}^{n} \|A_i - \frac{1}{n} \sum_{j=1}^{n} A_j\|^2=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=i+1}^{n}\|A_i-A_j\|^2\qquad\checkmark
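Finally, the full identity can be verified end to end. The sketch below (assuming NumPy and arbitrary random k-by-m matrices; all names are illustrative) compares the sum of squared Frobenius distances to the average against the scaled sum over pairs.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, m = 5, 3, 2
A = rng.normal(size=(n, k, m))       # n random k-by-m configuration matrices
Abar = A.mean(axis=0)                # the average configuration

# Left-hand side: sum of squared Frobenius distances to the average
lhs = sum(np.sum((A[i] - Abar) ** 2) for i in range(n))

# Right-hand side: (1/n) * sum over pairs i < j of squared pairwise distances
rhs = sum(np.sum((A[i] - A[j]) ** 2)
          for i in range(n) for j in range(i + 1, n)) / n

assert np.isclose(lhs, rhs)
```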