# SSM Using Scalismo (5): Creating a point distribution model

Finally, I arrive at statistical shape models. I am going to create a point distribution model (PDM) of shapes. Reading some theory on the point distribution models, I need superimposed shapes and their mean shape. Superimposed shapes and their mean shapes are available from my previous tutorial on GPA with Scalismo. The outcomes of the full GPA is going to be utilized for creating a PDM. The PDM in this tutorial uses shapes (not forms), and it is based on the multivariate normal distribution of the points’ coordinates collected in a shape vector (Eq. 16 in this post).

This post contains code blocks; if your browser does not automatically display the code blocks, please click on the displayed dark narrow rectangular areas to display the containing code.

object PDM extends App {

import scalismo.geometry._
import scalismo.common._
import scalismo.ui.api._
import scalismo.mesh._
import scalismo.io.{StatisticalModelIO, MeshIO}
import scalismo.statisticalmodel._
import scalismo.registration._
import scalismo.statisticalmodel.dataset._

scalismo.initialize()
implicit val rng = scalismo.utils.Random(42)
// ui
val ui = ScalismoUI()

val drive = "I:/"
val meshDir = drive + "registration/Full GPA/"
val unsortedMeshFiles = new java.io.File(meshDir).listFiles()
unsortedMeshFiles.foreach(file => println(s"\n\$file"))
println(unsortedMeshFiles)
// specifying the file name of the mean mesh generated through the full GPA
val meanMeshName = "meanMesh.stl"
val meshFileNames = unsortedMeshFiles.map(file => file.getName())
val indexOfMeanMesh = meshFileNames.indexOf(meanMeshName)
// bringing the meanMesh to the head (first entry) of the meshFiles sequence
val meshFiles = unsortedMeshFiles.updated(0, unsortedMeshFiles(indexOfMeanMesh)).updated(0, unsortedMeshFiles(indexOfMeanMesh))

I have to choose a reference mesh and some training meshes. The full GPA mean mesh is chosen as the reference mes and the three (registered and superimposed) meshes as the training meshes.

// specifying the reference mesh and the training meshes; the full GPA mean mesh is set to be the reference mesh
val trainingMeshes = meshFiles.tail.map(file => MeshIO.readMesh(file).get)//the other 3 meshes

Once a reference mesh and training meshes are chosen, the deformation vector from each point of the reference mesh to its corresponding point of each training mesh should be calculated. Thereby, the deformation fields morphing the reference mesh to the training meshes are obtained.

 // [The meshes are already aligned using the full GPA, and in correspondence, so we can create a discrete GP]
// creating the deformation vector field
val deformationFields = trainingMeshes.map(mesh => {
val deformationVectors = refMesh.pointSet.pointIds.map { id =>
mesh.pointSet.point(id) - refMesh.pointSet.point(id)
}.toIndexedSeq
// a discrete deformation field needs a domain and the deformation vectors at each point of the domain
DiscreteField[_3D, UnstructuredPointsDomain[_3D], EuclideanVector[_3D]](refMesh.pointSet, deformationVectors)
})

The calculated deformation fields are samples (realizations) of a discrete random field. The domain of the random field is the reference mesh points. The random field of deformations is assumed to have a multivariate normal distribution. Here, the discrete random field is expressed by a random vector. Thereby, a shape, described by a shape vector, can be produced by adding the shape vector of the reference shape to the normally distributed random vector of deformations expanded by the deformation modes (this is explained by Eq. 16 in this post).

Technically, Scalismo works with the concept of Gaussian process (GP) which is a continuous random field whose any finite sub-collection of random variables (here deformations) has a multivariate normal (Gaussian) distribution. Therefore, continuous sample deformation fields should be approximated based on the known discrete realizations/samples of the random field. By interpolating a discrete field, Scalismo creates a continuous field. Different interpolation approaches are possible; Scalismo uses the Nearest Neighbor Interpolation method. The following code block defines a discrete GP (of the deformations) which is in fact a normally distributed random vector with elements indexed by the reference mesh points (a discrete domain). The discrete GP (random vector of deformations) is expressed in terms of its principal axes (modes). This is done using the PCA as described in this post (see Eq. 16).

  val interpolator = NearestNeighborInterpolator[_3D, EuclideanVector[_3D]]()
val continuousDefFields = deformationFields.map(field => field.interpolate(interpolator))
val gp = DiscreteLowRankGaussianProcess.createUsingPCA(refMesh.pointSet, continuousDefFields)

As I have 3 training meshes and therefore 3 training deformation fields, the rank of the sample covariance matrix of deformations is at most 3. This means that at most 3 principal axes are used to express the random vector of deformations.

I can sample the GP (random field) of deformations. By adding a sample deformation field to the reference mesh, I can generate a shape vector. This shape vector is a sample mesh from meshes whose points’ coordinates are normally distributed. Sampling some meshes from the mesh distribution, I’ll have the following realization:

  val PDMGroup = ui.createGroup("refMeshGroup")
val refMeshView = ui.show(PDMGroup, refMesh, "refMesh")
refMeshView.color = java.awt.Color.GREEN
val gpDefView = ui.addTransformation(PDMGroup, gp, "RefMeshDefrmdByGp")

The horizontal bar-like controls in the figures are proportional to the coefficients of the deformation modes (b_i=\alpha_i\sqrt{\lambda_i}). Each coefficient varies within three standard deviations (Eq. 4 in this post).

Scalismo assembles the reference mesh and the discrete GP as a statistical mesh model:

// [creating a statistical shape model (PDM)
val model = StatisticalMeshModel(refMesh, gp.interpolate(interpolator))

This is the statistical PDM.

# Point Distribution Models of Shapes and Forms

A point distribution model (PDM) is a model that locates (or determines the geometrical (spatial) distribution of) the points of the shapes or forms (size-and-shapes) in a particular shape or form population. The word distribution refers to the spatial distribution of the points; not a probability distribution. This is my best understanding of the definition of PDM’s so far.

A PDM approximates the geometrical distribution of the points of shapes/forms in a population. This means that a PDM is intended to model/express all the possible shapes/forms in the population.

A PDM entails shapes or forms. This means we are working with superimposed configurations modulo their location, rotation, and scale information for the case of shapes, and their location and rotation information in the case of forms. Therefore, we assume that shapes or forms are available through performing a full or partial generalized Procrustes analysis (GPA) on the given configurations. In other words, a shape is an instance of the full Procrustes fits, and a form is an instance of partial Procrustes fits. Henceforward, whatever includes the concept of a shape in this article also holds for a form unless declared.

As explained, a shape can be (approximately) represented using landmarks, i.e. discrete set of points. By choosing a 3D Cartesian coordinate system (CS) with orthogonal basis vectors and x,y, and z coordinates (axes), the landmark coordinates can then be collected in a matrix. One can readily think of collecting the coordinates in a column vector (in fact every matrix is isomorphic to a vector). Therefore, a shape in a set of shapes, all having the same number of N landmarks is represented as a vector called the shape vector [1]:

X_i=\big[x_{1x}^i,x_{1y}^i,x_{1z}^i,\dots,x_{Nx}^i,x_{Ny}^i,x_{Nz}^i\big]^\text T \in\mathbb{R}^{3N\times 1}\tag{1}

See this post for linear algebra conventions a identities.

A shape vector X is a member of the vector space \mathbb{R}^{3N\times 1}. Therefore, there are bases, and X can be written as a linear combination of the basis vectors. By default, the 3N-D X vector is born in the vector space spanned by the standard basis. Indeed, other bases are possible.

Using a basis \{v_1,\dots,v_{3N}\} for the vector space, we can write any shape vector as a deviation from an arbitrary constant (shape) vector a\in \mathbb{R}^{3N\times 1} :

It should be noted that, a is a point (position vector) and Vb is a free vector in the concept of geometrical modelling.

On the other hand, X is a random vector with with a mean \mu_X and a covariance \Sigma_{XX}. The constant vector a can be replaced with \mu_X. Any other shape vector, like a known arbitrary shape vector in the population, can also be substituted for a. The basis vectors can be chosen such that the linear projection of X onto them generates random variables each having maximum possible variance. These vectors are obtained by PCA as a variance-maximization method. Because \Sigma_{XX} is symmetric, it has 3N independent and mutually orthogonal eigenvectors, and hence any shape vector can be written as:

\begin{gathered} X=\mu_X+Vb=\mu_X+\sum_{i=1}^{3N}b_iv_i \tag{2} \\ \ \\ \text{s.t. }\ \Sigma_{XX}=V\varLambda V^\text T\\ \varLambda\text{ is diagonal and }\varLambda_{ii}=\lambda_i\quad\text{s.t. } \lambda_1\ge\lambda_2\ge \dots \ge \lambda_{3N}\ge 0 \end{gathered}

where, the term V\varLambda V^\text T is the singular value decomposition (SVD) of \Sigma_{XX} such that the eigenvalues, \lambda_i‘s, collected in the diagonal matrix \varLambda are sorted in the decreasing order. As the covariance matrix is symmetric, SVD is equivalent to the spectral/eigenvalue decomposition of the matrix.

So far we showed that a shape instance (represented by a shape vector) in the population can be generated by deforming the mean shape through a deformation field (vectors) which is determined by linear combinations of the eigenvectors of \Sigma_{XX} . The eigenvectors of \Sigma_{XX} are principal (3N dimensional) directions along which the variances of the linearly projected (random) shape vector are maximized. The principal directions are called deformation modes (deformations from the mean). The deformation modes are also called eigenshapes. The combining coefficients b_i‘s are called the shape model parameters. The deformation modes weighted by the shape model parameters adds up to produce a deformation vector that will be added to the mean shape vector to generate a shape. Such a representation is called the modal representation; this is the idea behind a PDM.

By choosing proper shape parameters, the model generates shapes in the shape population. It should be noted that there is generally no restrictions on the values of b_i‘s; in other words, the deformation modes can be combined arbitrarily. However, a generated shape out of an arbitrary values of b_i‘s may not fall into the shape population; this is not difficult to imagine.

If we let U:=X-\mu_X, then U is a random vector expressing the deviation of a shape from the mean shape of the population. I call this vector the deviation or the deformation shape vector. This vector contains deformation components (in three directions of the CS) of the mean shape at each of its points. The mean of the deformation shape vector is zero, \mu_U=0. In fact, \Sigma_{XX}=\Sigma_{UU}. This means that the elements of \Sigma_{XX} are variances (or covariances) of the deformation components. U can be linearly projected on the directions (unit vectors) defined by the eigenvectors of \Sigma_{XX}:

U_p=V^\text T U\ \in\mathbb{R}^{3N\times 1}

and by considering Eq. 2, we can conclude (it is simply the result of the linear projection):

U=Vb\implies U_p=V^\text T Vb=I_rb=b

Just as a side note: \mu_U=\mu_{U_p}=\mu_b=0.

On the other hand \Sigma_{bb}=\Sigma_{U_pU_p}=V \Sigma_{UU}V^\text T = \varLambda. This is what we can conclude:

\begin{gathered}X=\mu_X+Vb=\mu_X+\sum_{i=1}^{3N}b_iv_i \tag{3} \\ \ \\ \text{s.t. }\ \Sigma_{bb}=\varLambda= \begin{bmatrix}\lambda_1&0&\dots\ &0\\ 0 &\lambda_2\ &\dots\ &0\\ \vdots&\vdots &\ddots &\vdots\\ 0&0&0&\lambda_{3N} \end{bmatrix}\quad \text{and}\quad \mu_b=0\\ \ \\ \text{with}\\ \ \\ \|b_iv_i\|=|b_i|\end{gathered}

Note that \mu_b=0 because \mu_U=0.

It is obvious that the magnitude of b_i controls the magnitude of deformation of the mean shape by each mode. From above, the standard deviation of b_i is \sqrt{\lambda_i}. It is assumed that a PDM generates plausible shapes, or well behaved shapes, if the shape parameters are limited as:

-3\sqrt{\lambda_i}\le b_i\le +3\sqrt{\lambda_i}\tag{4}

A plausible shape is created by adding a plausible deformation shape vector to the mean shape vector; a plausible deformation shape vector is a deformation vector that generates shapes still belong to the shape family under consideration.

The covariance matrix \Sigma_{XX} (and also \Sigma_{bb}) is positive semi-definite; therefore, \lambda_i\ge 0. This implies that the variance of some b_i‘s can be zero, i.e. the zero-variance b_i‘s are constant. In that case, b_i=\mu_{b_i}=0 simply because no variation about the mean. Therefore, a deformation mode associated with a shape parameter b_i with zero variance (\lambda_i=0) vanishes from the modal representation. Assuming M zero eigenvalues (including algebraic multiplicity), we can write the modal representation of a shape (vector) as:

X=\mu_X+\sum_{i=1}^{3N-M}b_iv_i \tag{5}

### Shape approximation

A PDM is based on the modal representation of shapes (Eq. 2 or 3). This kind of representation is useful when it comes to approximation of a shape. In this regard, only some of the deformation modes being relatively more effective in forming a shape is kept and the rest of the modes are ignored. We already showed (Eq. 5) that zero eigenvalue of the covariance matrix leads to zero variation of the corresponding shape parameter and consequently vanishing of the associated mode in the modal representation. This motivates us to throw out the shape parameters with relatively small variances, hence, their associated modes; thereby, approximating a shape vector by a t-term compact modal representation:

\begin{gathered} X\approx \tilde X=\mu_X+\tilde Vb=\mu_X+\sum_{i=1}^{t}b_iv_i\\ \ \\ \text{s.t. }\ b\in \mathbb{R}^{t\times 1}\ , \tilde V=[v_1\quad v_2\ \dots\quad v_t] \in \mathbb{R}^{3N\times t}\ ,\text{ and } t<3N \tag{6} \end{gathered}

where v_i‘s are as in Eq.2.

Moreover, by the Least-squares optimality of PCA, we can approximate the shape vector as (see Eq. 3a in this post):

\begin{gathered} X\approx\tilde X =\mu_X + \tilde V\tilde V^\text T(X-\mu_X)= \mu_X +\sum_{i=1}^{t}v_iv_i^\text T(X-\mu_X) \tag{7} \\ \ \\ \text{s.t. }\ b\in \mathbb{R}^{t\times 1}\ , \tilde V=[v_1\quad v_2\ \dots\quad v_t] \in \mathbb{R}^{3N\times t}\ ,\text{ and } t<3N \end{gathered}

This indicates that for approximating a particular shape vector, the shape model parameters should be chosen as:

But, how many modes i.e. t do we need? it can be determined based on the goodness-of-fit measure defined for PCA as a dimension reduction tool (Eq. 7 in this post):

g(t) = \frac{\sum_{i=t+1}^{3N} \lambda_i}{\text{Trace}(\Sigma_{XX})=\sum_{i=1}^{3N} \lambda_i}\tag{9}

Therefore, we only need to keep the first t modes having large associated eigenvalues. Actually, we hope that the first t eigenvalues are substantially larger than the rest of the eigenvalues. As a suggestion g(t)\le 0.05 is ok.

### Sample shapes (data set)

In practice, the mean and covariance of the shape (random) vector X are unknown and should be estimated. Let S=\{S_1,\dots,S_n\} be a set of sample shapes (each represented by a shape vector S_i) from a particular shape family population. This is, in fact, a set of independent observations of identically distributed random variables. This set is called the training set. As a remark, it should be noted that we generally start with a set of configurations, X_c=\{C_1,\dots ,C_n\} out of a configuration population; for example the set of all dinosaurs. Then, we perform the full or the partial GPA to obtain a set of shapes/forms already denoted by S. Having said that, the following estimates are considered [2]:

\begin{gathered} \hat \mu_X=\bar S:=\frac{1}{n}\sum_{i=1}^{n} S_i \ \in \mathbb{R}^{3N\times 1}\tag{10}\\ \ \\ \hat\Sigma_{XX}= \frac{1}{n-1}\sum_{i=1}^n (S_i-\bar S)(S_i-\bar S)^\text T\ \in \mathbb{R}^{3N\times 3N} \end{gathered}

where n is the number of training shapes. The covariance estimate is a symmetric matrix, therefore, it has 3N independent eigenvectors, however, has at most rank n (properties of rank in this post), i.e. non-zero eigenvalues. Therefore, the modal representation of the training shapes and other shapes in the population predicted based on the model generated using sample shapes (training shapes) is:

X=\bar S+\sum_{i=1}^{n}b_i\hat v_i \tag{11}

where \hat v_i is an eigenvector of \hat\Sigma_{XX}.

When n is large enough, then we can use the following criterion to select the first t modes with relatively large eigenvalues:

g(t) = \frac{\sum_{i=t+1}^{n} \lambda_i}{\text{Trace}(\hat \Sigma_{XX})=\sum_{i=1}^{n} \lambda_i}\tag{12}

This criterion is similar to Eq. 9.

## PDM’s with normal distributions

It can be assumed that a shape vector X, being a random vector, has a multivariate normal distribution i.e. [1]:

X\sim \mathscr N(\mu_X,\Sigma_{XX})\tag{13}

This assumption can be manipulated toward a more practical form. Let’s consider the following proposition:

If Y\in \R^{r\times 1} has a multivariate normal distribution Y\sim \mathscr N(\mu_Y,\Sigma_{YY}), A\in \R^{s\times r}, and c\in \R^{s\times 1} being a constant vector, then, the linear transformation X=AY+c is distributed as X\sim \mathscr N(A\mu_Y+c,A\Sigma_{YY}A^\text T) .

Using the proposition and writing the shape vector in its modal form as X=\mu_X+Vb, we can conclude:

X=\mu_X+Vb\sim \mathscr N(\mu_X,\Sigma_{XX})\\ \ \\ \iff X\sim \mathscr N(V\mu_b+\mu_X,V\Sigma_{bb}V^\text T)\\ \ \\ \iff b\sim \mathscr N(0,\Sigma_{bb})

where \Sigma_{bb} is the diagonal matrix of the eigenvalues as in Eq. 3.

By the marginalization property of normal distributions:

b_i\sim \mathscr N(0,\lambda_i)

and, we can write:

\alpha_i:=\frac{b_i}{\sqrt{\lambda_i}}\implies \alpha_i\sim \mathscr N(0,1)

Therefore, the modal representation of the shape vector becomes:

X=\mu_X+\sum_{i=1}^{3N}\alpha_i\sqrt{\lambda_i}v_i\qquad \text{s.t }\ \alpha_i\sim \mathscr N(0,1) \tag{14}

For shape vector approximation and sample shape vector, the following equations (based on Eq. 6 and 11) are readily obtainable:

\tilde X=\mu_X+\sum_{i=1}^{t}\alpha_i\sqrt{\lambda_i}v_i\qquad \text{s.t }\ \alpha_i\sim \mathscr N(0,1) \tag{15} X=\bar S+\sum_{i=1}^{n}\alpha_i\sqrt{\lambda_i}\hat v_i\qquad \text{s.t }\ \alpha_i\sim \mathscr N(0,1) \tag{16}

## Shape space spanned by the PDM

A shape vector X\in \R^{3N} can be represented by any of the equations 6, 11, 15, or 16. In any of the equations, a 3N dimensional shape vector is presented by a linear combination of basis vectors whose number of them is less than the dimension of the shape vector. This means that a PDM spans a space (of shape vectors) which is a subspace of \R^{3N} . Therefore, a PDM model established by training shapes can only constitutes shapes born out of the linear combinations of the shape modes, \hat v_i‘s. Such a model then misses the shapes (in the population) which are not captured by the training set of shapes or cannot be represented by linear combinations of the shape modes.

#### References

[1] Gaussian Process Morphable Models, Marcel Luthi , Thomas Gerig, Christoph Jud, and Thomas Vetter. IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 40 , Issue: 8 , Aug. 1 2018).

[2] Image Processing and Analysis (Chapter 7: Model-Based Methods in Analysis of Biomedical Images). R. Baldock and J. Graham, Oxford University Press, 2000.