# Miscellaneous Proofs for SSM (1)

### 1) The sum of the squared of distances is minimized by the vector means

Prove . are vectors and is a vector norm coming from the inner product defined on the vector space; .

Proof:

\begin{aligned} \argmin_{\mu} \sum_{i=1}^{n} \|X_i-\mu\|^2=\argmin_{\mu}\sum_{i=1}^{n} \lang X_i-\mu , X_i-\mu \rang \\ =\argmin_{\mu}\sum_{i=1}^{n} \lang X_i,X_i \rang -2\lang X_i,\mu \rang + \lang\mu,\mu \rang \\ \text{as} \lang X_i,X_i\rang \text{ is a constant not depending on } \mu \\ =\argmin_{\mu}\sum_{i=1}^{n} -2\lang X_i,\mu\rang + \lang \mu,\mu \rang\\ =\argmin_{\mu}(n\lang \mu,\mu \rang-2 \sum_{i=1}^{n} \lang X_i,\mu \rang)\\ \text{by the linearity of inner product}\\ =\argmin_{\mu}(n\lang \mu,\mu \rang -2\lang\sum_{i=1}^{n} X_i,\mu\rang)\\ =\argmin_{\mu}(n\lang \mu,\mu \rang -2n\lang\frac{1}{n}\sum_{i=1}^{n} X_i,\mu\rang)\\ =\argmin_{\mu}(n\lang \mu,\mu \rang -2n\lang\ \overline{X},\mu\rang)\\ \text{dividing by a constant, n, won’t affect the }\argmin\\ =\argmin_{\mu}(\lang \mu,\mu \rang -2\lang\ \overline{X},\mu\rang)\\ \text{adding the constant} \lang \overline{X},\overline{X} \rang \text{ won’t affect the } \argmin \text{ therefore: }\\ =\argmin_{\mu}(\lang \mu,\mu \rang -2\lang\ \overline{X},\mu\rang +\lang \overline{X},\overline{X} \rang)\\ =\argmin_{\mu}(\lang \mu – \overline{X},\lang \mu – \overline{X}\rang )\\ =\argmin_{\mu}(\| \mu – \overline{X}\|^2 )\\ \end{aligned}

Because is larger than or equal zero, the minimum of the norm is zero, hence:

In a case that and are matrices, the norm is the Frobenius norm and the inner product is the Frobenius inner product.

### 2) Proof regarding the full GPA method

Proving the following:

\inf_{\beta_i,\Gamma_i,\gamma_i}\sum_{i=1}^{n} \|(\beta_i X_i\Gamma_i+1_k\gamma_i^\text{T}) – \frac{1}{n} \sum_{j=1}^{n} (\beta_j X_j\Gamma_j+1_k\gamma_j^\text{T})\|^2\\ =\\ \inf_{\beta_i,\Gamma_i,\gamma_i}\frac{1}{n}\sum_{i=1}^{n} \sum_{\underset{{j\lt n}}{j=i+1}}^{n}\|(\beta_i X_i\Gamma_i+1_k\gamma_i^\text{T}) – (\beta_j X_j\Gamma_j+1_k\gamma_j^\text{T})\|^2

The “inf” operator is not restated in the proof. Let . Therefore:

\sum_{i=1}^{n} \|A_i – \frac{1}{n} \sum_{j=1}^{n} A_j\|^2 =\frac{1}{n^2}\sum_{i=1}^{n} \|nA_i – \sum_{j=1}^{n} A_j\|^2\tag{I}

The Frobenius norm comes from the Frobenius inner product, i.e. . Therefore, above continues as:

=\frac{1}{n^2}\sum_{i=1}^{n} \lang nA_i – \sum_{j=1}^{n} A_j,nA_i – \sum_{j=1}^{n} A_j\rang

Inner product is a bi-linear form and hence by its linearity and commutative properties:

\begin{aligned} =\frac{1}{n^2}\sum_{i}( \lang nA_i – \sum_{j} A_j,nA_i\rang – \lang nA_i – \sum_{j} A_j,\sum_{j} A_j\rang)\\ =\frac{1}{n^2}\sum_{i}( \lang nA_i,nA_i\rang – \lang nA_i, \sum_{j} A_j\rang- \lang \sum_{j} A_j,nA_i\rang \\+ \lang \sum_{j} A_j,\sum_{j} A_j\rang)\\ =\frac{1}{n^2}\sum_{i}( n^2\lang A_i,A_i\rang – 2n\lang A_i, \sum_{j} A_j\rang + \lang \sum_{j} A_j,\sum_{j} A_j\rang) \end{aligned}

Again by applying the linearity property of inner product on the finite sums:

=\frac{1}{n^2}\sum_{i}( n^2\lang A_i,A_i\rang – 2n\sum_{j}\lang A_i, A_j\rang + \sum_{l}\sum_{j}\lang A_l, A_j\rang)

Applying the :

\begin{aligned} =\frac{1}{n^2}\sum_{i}( n^2\lang A_i,A_i\rang – 2n\sum_{j}\lang A_i, A_j\rang + \sum_{l}\sum_{j}\lang A_l, A_j\rang)\\ =\frac{1}{n^2}(n^2\sum_{i}\lang A_i,A_i\rang – 2n\sum_{i}\sum_{j}\lang A_i, A_j\rang\\ + \sum_{i}\sum_{l}\sum_{j}\lang A_l, A_j\rang)\\ =\frac{1}{n^2}(n^2\sum_{i}\lang A_i,A_i\rang – 2n\sum_{i}\sum_{j}\lang A_i, A_j\rang + n\sum_{l}\sum_{j}\lang A_l, A_j\rang)\\ \text{As the last two double-sigma terms are identical:}\\ =\frac{1}{n}(n\sum_{i}\lang A_i,A_i\rang – \sum_{i}\sum_{j}\lang A_i, A_j\rang )\tag{II} \end{aligned}

The rest can be done by induction, but for simplicity we continue with writing a number of terms to observe the pattern and reach the final result. With , and letting , the expansion of the above term in the parentheses (last equation) becomes:

(4-1)\lang A_1,A_1\rang+(4-1)\lang A_2,A_2\rang+(4-1)\lang A_3,A_3\rang+(4-1)\lang A_4,A_4\rang\\ -2\lang A_1,A_2\rang -2\lang A_1,A_3\rang-2\lang A_1,A_4\rang -2\lang A_2,A_3\rang -2\lang A_2,A_4\rang\\ -2\lang A_3,A_4\rang\\ =\lang A_1,A_1\rang -2\lang A_1,A_2\rang +\lang A_2,A_2\rang\\ +\lang A_1,A_1\rang -2\lang A_1,A_3\rang +\lang A_3,A_3\rang\\ +\lang A_1,A_1\rang -2\lang A_1,A_4\rang +\lang A_4,A_4\rang\\ +\lang A_2,A_2\rang -2\lang A_2,A_3\rang +\lang A_3,A_3\rang\\ +\lang A_2,A_2\rang -2\lang A_2,A_4\rang +\lang A_4,A_4\rang\\ +\lang A_3,A_3\rang -2\lang A_3,A_4\rang +\lang A_4,A_4\rang\\ \begin{aligned} =\|A_1-A_2\|^2+\|A_1-A_3\|^2+\|A_1-A_4\|^2+\|A_2-A_3\|^2\\ +\|A_2-A_4\|^2+\|A_3-A_4\|^2 \\ =\sum_{i=1}^{4}\sum_{\underset{{j\lt 4}}{j=i+1}}^{4}\|A_i-A_j\|^2 \end{aligned}

Consequently, for any number of terms Eq. II becomes . Hence, recalling Eq. I, we complete the proof:

\sum_{i=1}^{n} \|A_i – \frac{1}{n} \sum_{j=1}^{n} A_j\|^2=\frac{1}{n}\sum_{i=1}^{n}\sum_{\underset{{j\lt n}}{j=i+1}}^{n}\|A_i-A_j\|^2\qquad\checkmark

# SSM Using Scalismo (3): Non-rigid registration

## Non-rigid registration

The rigid registration preserves the shape and size (form) of objects by merely transforming them through rotation and translation. Unlike the rigid registration, non-rigid registration (NRR) deforms (morphs) an object/configuration to have it fitted onto another object. Here, an object is a valid surface that its geometry can be discretely represented by a triangular mesh. A configuration is the set (or subset) of points/nodes of the mesh of the object. In a word, the geometry of an object is closely approximated with a finite set of points and triangles connecting the points/nodes; all together a triangular mesh. The triangles linearly interpolate the actual geometry between the points/nodes. I am going to refer to a points of a triangular mesh as a node.

But why we want to morph an object onto another object. One application is to establish one-to-one correspondence between the configurations’ points. For statistical shape modelling/analysis correspondence is essential. Another difference between the rigid and NRR is that the rigid registration entails known landmark correspondence; however, NRR seeks correspondence. Briefly, the registration problem (NRR) is the problem of establishing correspondence between two objects (their points).

Imagine two objects having configurations/meshes X and Y with different number of points; one with m and the other n points. A 2D example is depicted below. Definitely, a bijective correspondence cannot be established between every pair of points of X and Y. So, we cannot think about some point-to-point solution. The correct way of establishing the correspondence between the geometry of two objects is to morph the mesh of one of them onto the other one. In the figure below, the configuration/mesh X is morphed onto the configuration/mesh Y. Thereby, X’ closely matches Y and also its points are in correspondence with X. We have to pay attention to the difference between each two of these three terms, and their individual role in the problem: reference mesh, target mesh, fitted mesh; here, X, Y, and X’ respectively.

Terms: By deformation of an object we mean the deformation vectors or field. By transformation of an object we mean the transformed geometry.

Here, two questions bug my brain (there must be some brain there..!): 1- how to morph?, 2- how much morph? Devising and utilizing a model simulating plausible deformations is the answer to the first question, and defining a distance function to measure the distance between two meshes answers the second one.

A reference mesh also acting as the domain of the functions, a target mesh , a model of deformation fields , a distance function , and a weighted regularization term on the deformations formulate the registration problem as the following optimization problem [1]:

\hat u=\argmin_u (D[\Gamma_R,\Gamma_T,u]+\eta R[u])\tag{1}

The deformation field is a function . The distance function eats two (constant) configurations, the reference and the target ones, and also the deformation field, then it computes the distance between the target mesh and the deformed reference mesh, i.e. . The regularization term involves the prior assumption. It penalises deformations that do not fit the the prior assumption [1]. This method is called parametric NRR.

The theory of parametric NRR is the content of another post. Here, I try to explain the practical steps toward putting the theory into practice with Scalismo.

### 1- Initialising Scalismo

In addition to the main code, I defined some functions and organised them in a package object. The package code will be at the end of the post. How to create a package, click here.

This post contains code blocks; if your browser does not automatically display the code blocks, please click on the displayed dark narrow rectangular areas to display the containing code.

object NRR extends App {
//To send the code selection to Scala Console, select the code in the Editor, press Ctrl + Shift + X
//imports
import scalismo.geometry._
import scalismo.common._
import scalismo.ui.api._
import scalismo.mesh._
import scalismo.registration.LandmarkRegistration
import scalismo.io.{MeshIO, LandmarkIO}
import scalismo.numerics.UniformMeshSampler3D
import scalismo.registration.{Transformation, RotationTransform, TranslationTransform, RigidTransformation}
import scalismo.numerics._
import scalismo.kernels._
import scalismo.statisticalmodel._
import scalismo.registration._
scalismo.initialize()
implicit val rng = scalismo.utils.Random(42)
// ui
val ui = ScalismoUI()

This is to include the package:

import custom.methods._

We use our rigidly aligned meshes of the femoral heads and their anatomical landmarks; available in the post SSM Using Scalismo (1): Rigid alignment. The initial rigid alignments of the meshes and some landmarks in correspondence are going to mitigate the registration problem. This time, we intend to fit/morph (NRR) the reference mesh to the target meshes.

 // [Loading aligned meshes and LM files]
val meshDir = "I:/registeration/AlignedData/Meshes/"
val LMDir = "I:/registeration/AlignedData/LMs/"
val registeredMeshDir = "I:/registeration/Registered/"
// meshes
val meshFiles = new java.io.File(meshDir).listFiles()
meshFiles.foreach(file => println(s"${file}\n")) val numOfMeshes = meshFiles.size println(s"number of files is${numOfMeshes}")

// locating the RefMesh file i.e mesh0.stl. RefMesh has to be named mesh0.stl
val refMeshIndex : Int = (0 until numOfMeshes).map(i => meshFiles(i).getName).indexOf("mesh0.stl")
if (refMeshIndex != 0) {println("Rename refMesh File"); sys.exit() }

val targetMeshes: Array[TriangleMesh[_3D]] = meshFiles.tail.map(meshFile => MeshIO.readMesh(meshFile).get) //meshes to get fitted to refMesh
val LMFiles = new java.io.File(LMDir).listFiles()
// Landmarks of the reference mesh should be in LMs(0)
val LMs = LMFiles.map(f => LandmarkIO.readLandmarksJson[_3D](f).get)

A reference mesh view if you want:

 val refMeshGroup = ui.createGroup("refMeshGroup")
val refMeshView = ui.show(refMeshGroup, refMesh, "refMesh")
refMeshView.color = java.awt.Color.GREEN

### 3- Defining a Gaussian process

The deformation field is going to be modelled by a Gaussian process (GP). GP outputs plausible/permissible deformation components at each point of the reference mesh. To define a GP we need a mean deformation field and a kernel (this post). The GP should produce deformation fields capturing the deformations needed to morph the reference mesh onto each of the target meshes. Looking at the reference mesh, we see a combination of global and local geometrical features; a dome-like shape with an indentation. So, maybe it is better to define a kernel taking different scale of deformations into account. The mean field is set to be a zero field.

val mean = VectorField(RealSpace[_3D], (_ : Point[_3D]) => EuclideanVector.zeros[_3D]) //
val kernel1 = DiagonalKernel[_3D](GaussianKernel(sigma = 3) * 0.1, outputDim = 3)
val kernel2 = DiagonalKernel[_3D](GaussianKernel(sigma = 7) * 2.0, outputDim = 3)
val kernel = kernel1 + kernel2
val GP = GaussianProcess(mean, kernel)
val lowRankGP = LowRankGaussianProcess.approximateGPCholesky(refMesh.pointSet,GP, 0.05, NearestNeighborInterpolator())

We can view samples of deformations of the reference mesh by the GP to check whether the defined GP kernel is suitable of capturing the needed deformations or not.

 val gpSampleGroup =  ui.createGroup("gpSampleGroup")
val refMeshView2 = ui.show(gpSampleGroup, refMesh, "refMesh")
// Transforming whatever mesh in the gpSampleGroup by the GP-sampled deformation fields
val gpDefView = ui.addTransformation(gpSampleGroup, lowRankGP, "RefMeshDefrmdByGp")
removeGroupView(gpSampleGroup) // see the definition in the package

The deformations seems ok; nothing weird like intersecting triangles occurred. Let’s hope that the kernel is going to work and continue.

### 4- Defining regression data for posterior GPs

Now, it is the moment that we use the known landmark correspondence information. On each mesh there are 3 (inter-mesh) corresponding landmarks. Therefore, there are known deformation vectors from the landmarks on the reference mesh to their corresponding landmarks on each of the target mesh. We want to incorporate these known deformations into our GP deformation model. This can be done through GP regression, and the resultant model, with the incorporation of the known deformations, is called the posterior GP model. The theory is touched in this post.

The landmarks have IDs and coordinates. The coordinates are not associated with the nodes. Therefore, we first have to associate/find a node closest to each landmark of the reference mesh. Here is how:

//[Defining posterior lowrankGp given the landmark transformations from reference to a target mesh]
// finding points (nodes) corresponding/associated to/with landmarks of the refMesh
// The nodes on the mesh closest to the landmarks are the nodes associated with the landmarks
val refLMs = LMs(0)
val refLMNodeIds = refLMs.map(eachLM => refMesh.pointSet.findClosestPoint(eachLM.point).id)

Generating the regression data, i.e. known deformation vectors (from the reference mesh to each target mesh).

 // generating regression data i.e known deformations from the LMs' corresponding points on the RefMesh to corresponding landmark points of the target mesh
//landmarks are already sorted in the order of correspondence
val knownDeformations =  (0 until numOfMeshes-1).map(i => {
(0 until LMs(0).size).map(j => (LMs(i+1)(j).point - refMesh.pointSet.point(refLMNodeIds(j))))
})
// obtaining the coordinates of the refMesh points associated with the refMesh landmarks
val refLMpoints = refLMNodeIds.toIndexedSeq.map(eachId =>refMesh.pointSet.point(eachId))

The known deformation fields (regarding each target mesh) at the reference nodes (associated with the reference landmarks) can be viewed:

// visualizing known deformation fields
// generating a domain of points needed for a vector field definition/construction
val refLMdomain =  UnstructuredPointsDomain(refLMpoints)
val knownDeformationFields = knownDeformations.map(eachDefSet => {
DiscreteField[_3D, UnstructuredPointsDomain[_3D], EuclideanVector[_3D]](refLMdomain, eachDefSet)
})
val knownDeformationGroup = ui.createGroup("knownDeformations")
knownDeformationFields.map(eachfield => ui.show(knownDeformationGroup, eachfield, "knownDefs"))
removeGroupView(knownDeformationGroup)

Because the known/observed deformations is usually polluted with sort of uncertainty, a noise term should be added onto the known deformations. We add a little bit of noise as:

val noise = MultivariateNormalDistribution(DenseVector.zeros[Double](3),0.001* DenseMatrix.eye[Double](3))

Each set of regression data (known/observed deformation fields plus noise) regarding each target mesh contains sequences of data in the tuple form of (data point, deformation, noise). Data point is the a point of the reference mesh at which the known noisy deformations are observed. We define the following function to generate sets of regression data regarding each target mesh. Then, we use a function to generate the sets of regression data regarding the target meshes.

def RegressionDataGenerator(dataPoints:IndexedSeq[Point[_3D]], deformations:IndexedSeq[EuclideanVector[_3D]],noise:MultivariateNormalDistribution)={
val regressionDataSet = (0 until dataPoints.size).map(i => (dataPoints(i),deformations(i),noise))
regressionDataSet //.toIndexedSeq
}
// regression data sets regarding target meshes
val regressionDataSets = (0 until numOfMeshes - 1).map(i =>  RegressionDataGenerator(refLMpoints,knownDeformations(i),noise))

### 5- Posterior GP and registration

The following steps are going to be taken for each target mesh, so I use a for loop:

for( n <- (0 until numOfMeshes - 1)) {

#### 5-1- Defining the Posterior GP based on each regression data

 val posteriorGP = lowRankGP.posterior(regressionDataSets(n))

Let’s view some samples of transformation of the reference mesh deformed by the posterior GP based on one of the target meshes regression data:

 val postGpGroup = ui.createGroup("postGpGroup")
val refMeshView3 = ui.show(postGpGroup, refMesh, "refMesh")
val PGpView = ui.addTransformation(postGpGroup, posteriorGP, s"TtransformedByPGp $n") removeGroupView(postGpGroup) #### 5-2- Solving the NRR problem To solve an NRR problem based on Eq. 1, an agent modelling the deformation fields, a distance function, a regularizer, and the weight of the regularizer are needed. Scalismo reduces the NRR problem into the following minimization problem [1]: \argmin_{\alpha_1,…,\alpha_r}D\Big[\Gamma_R,\Gamma_T,\mu+\sum_{i=1}^{r}\alpha_i\sqrt{\lambda_i}\phi_i\Big]+\eta \sum_{i=1}^{r}\alpha_i^2\tag{2} where, ‘s are registration coefficients minimizing the above expression, is the average of the GP (here zero every where), ‘s and ‘s are Eigen values and Eigen functions of the low-rank approximation of the GP (see this post), is the rank of the low-rank approximation of the GP .These Eignen values/functions are known as the GP or the posterior GP is known. is the weight of the regularization term . The regularization term is an L2 regularizer. The distance function measuring the distance between the transformed reference mesh and the target mesh can be defined as the mean squared distance. For a moment, let’s assume that the surfaces are continuous. A point-to-point distance function is defined as: \text{CP}_{\Gamma_T}(x):=\|x-\argmin_{x’ \in \Gamma_T} \|x-x’\|\| where a surface, and is the Euclidean vector norm. This function calculates the distance between each point on and its closest point on . The arg min operator finds the closest point on . The average of the distance function over the continuous domain represents the average distance between and : D[\Gamma_R, \Gamma_T,u]:=\frac{1}{S(\Gamma_R)}\int_{\Gamma_R}\big (\text{CP}_{\Gamma_T}(x+u(x))\big )^2 ds in which is the area of . This distance measure goes in Eq. 2 and is evaluated numerically. It is good to know that numerical evaluation of the above integral considering discrete surfaces, i.e. meshes, gets the following form: D[\Gamma_R, \Gamma_T,u]\approx \frac{1}{n} \sum_{i=1}^{n}\big (\text{CP}_{\Gamma_T}(x_i+u(x_i))\big )^2 such that , is the number of nodes/points of ; here the surfaces are represented by meshes. The integral can be evaluated at number of sampled points/nodes of ; in this case we replaces n with m. Minimizing Eq. 2 is through iteration. To start, we need initial values of the coefficients , number of sampled points/nodes for numerical evaluation of the distance measure, initial value for the regularization weight, and number of iterations. The initial values of are set to be zeros by a vector: val initialCoefficients = DenseVector.zeros[Double](posteriorGP.rank) Now, instead of working with fixed values for number of sampled points, regularization weight, and number of iterations, the iteration procedure is divided into several stages in order to fine-tune the minimization problem. Physically, finer geometrical features (up to the capability of the GP) are considered at at each stage. To this end, we define a sequence of registration parameters, which are, the reqularization weight, number of iterations, and number of sampled points/nodes. This is how it looks like:  val registrationParameters = Seq( RegistrationParameters(regularizationWeight = 1e-1, numberOfIterations = 20, numberOfSampledPoints = 1000), RegistrationParameters(regularizationWeight = 1e-2, numberOfIterations = 30, numberOfSampledPoints = 1000), RegistrationParameters(regularizationWeight = 1e-4, numberOfIterations = 40, numberOfSampledPoints = 2000), RegistrationParameters(regularizationWeight = 1e-6, numberOfIterations = 50, numberOfSampledPoints = 4000) ) where, “RegistrationParameters” is a case class; see the defining code in the package file at the end of the post. We consider Eq. 2 to clarify the above approach. When the regularization weight is relatively large, the minimization problem leads to some relatively small coefficients, therefore modes of deformations associated with relatively small Eigen values in , are ruled out. This means only modes having large effects in the deformations remain. These modes of deformations are more toward global deformations rather than local deformations leading to fine geometrical features. Having said that, it is reasonable to start the minimization procedure with less number of points, uniformly sampled from our discrete domain , and fewer number of iterations. Thereby, a set of coefficients appropriate for an approximate global fit (of the reference mesh to the target mesh) is calculated. Then, the regularization weight is decreased so that other deformation modes with relatively smaller Eigen values can effectively appear in the minimization. This means taking care of more local deformations hence finer geometrical features. As we want to consider smaller deformations, we have to fine-sample the points, i.e. more number of points. At this level, number of iterations should also increase. The output of the minimization is the alpha-coefficients. The procedure starts with an initial value and after each (sub) iteration the input coefficients are updated. Also, once each stage of iteration as specified above is done, its output coefficients should travel to the start of the next stage. This can be done by a single line in Scala:  val finalCoefficients = registrationParameters.foldLeft(initialCoefficients)((modelCoefficients, regParameters) => doRegistration(posteriorGP, refMesh, targetMeshes(n), regParameters, modelCoefficients)) The function doRegistration is elucidated in the package file. Finally, the minimizing coefficients are calculated and stored in finalCoefficients. ### 6- The fitted mesh The coefficients are now used to deform the reference mesh (all points) onto a fitted mesh approximating the target mesh. The following two lines calculates the deformation field based on with the calculated ‘s.  val transformationSpace = GaussianProcessTransformationSpace(posteriorGP) val registrationTransformation = transformationSpace.transformForParameters(finalCoefficients) Applying the transformation on the reference mesh to get the fitted mesh:  val fittedMesh = refMesh.transform(registrationTransformation) ### 7- Visualising the fitted mesh Here, the fitted mesh on to each target mesh is visualised.  val fittedMeshGroup = ui.createGroup("fittedMeshGroup") val fittedMeshView = ui.show(fittedMeshGroup, fittedMesh, "fittedMesh") val targetMeshView = ui.show(fittedMeshGroup, targetMeshes(n), s"TargetMesh$n")

The fitted mesh on to the first target mesh:

To visually assess the accuracy of the fit, we can scroll through the cross sectional contours (normal to the coordinate system’s axes) of the fitted and the target meshes simultaneously. For example:

Based on the visualization of the cross sectional contours, the fitted mesh closely approximates the target mesh except at some end cross sections where local geometrical variations are relatively steep.

The other target mesh is approximated/fitted as below:

The worst mismatches of cross sectional contours for this case.

To accept or not to accept a fitted mesh; this is the question.

It depends on how much discrepancy is allowed in a particular application. Nonetheless, life goes on regardless of our choice. If we are picky, we can play with the kernel (and the registration parameters) to get a better fit.

Average distance and the Hausdorff distance are available as quantitative measures of fit, in Scalismo:

 val AveDist  = scalismo.mesh.MeshMetrics.avgDistance(refMesh, fittedMesh)
val HsdrfDist  = scalismo.mesh.MeshMetrics.hausdorffDistance(refMesh, fittedMesh)
println(s"\nAverage distance is $AveDist") println(s"\nHsdrf distance is$HsdrfDist \n")

Distance of first target mesh to its fitted mesh:
Average: 1.2 mm
Hausdorff: 3.9 mm
Distance of second target mesh to its fitted mesh:
Average: 1.4 mm
Hausdorff : 4.9 mm

### 8- Writing the fitted mesh into a file

Before writing the fitted meshes to files, we can observe that the reference mesh and each fitted mesh have the same number of nodes (here, 4470); indeed all in (one-to-one) correspondence. Use the following to access the attribute containing the number of points of the reference mesh (or any other ones).

 refMesh.pointSet.numberOfPoints

Writing to STL files:

 val fittedMeshFile =  new java.io.File(registeredMeshDir + s"ftdMesh$n.stl") MeshIO.writeSTL(fittedMesh, fittedMeshFile) ### 9- Finishing the code We close the for-loop and use some lines to exit the run: // to remove the groupp view after each fitting procedure is done removeGroupView(fittedMeshGroup) } //closes the for-loop val finito = readLine("finito [y/n]? ") if (finito == "y") ui.close() } // closes the object [1] Gaussian Process Morphable Models, Marcel Luthi , Thomas Gerig, Christoph Jud, and Thomas Vetter. IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 40 , Issue: 8 , Aug. 1 2018). ### Appendix: package package custom import scalismo.geometry._ import scalismo.common._ import scalismo.ui.api._ import scalismo.mesh._ import scalismo.registration.LandmarkRegistration import scalismo.io.{MeshIO, LandmarkIO} import scalismo.numerics.UniformMeshSampler3D import breeze.linalg.{DenseMatrix, DenseVector, csvread} import scalismo.registration.{Transformation, RotationTransform, TranslationTransform, RigidTransformation} import scala.io.StdIn.{readLine, readInt} import scalismo.numerics._ import scalismo.kernels._ import scalismo.statisticalmodel._ import scalismo.registration._ package object methods { implicit val rng = scalismo.utils.Random(42) // removing group view def removeGroupView(groupView:scalismo.ui.api.Group) : Unit = { val s = readLine(s"remove${groupView.name} and/orJust continue [y/n]? ")
if (s == "y") groupView.remove()
}

// registration

case class RegistrationParameters(regularizationWeight : Double, numberOfIterations : Int, numberOfSampledPoints : Int)

def doRegistration(
lowRankGP : LowRankGaussianProcess[_3D, EuclideanVector[_3D]],
referenceMesh : TriangleMesh[_3D],
targetMesh : TriangleMesh[_3D],
registrationParameters : RegistrationParameters,
initialCoefficients : DenseVector[Double]
) : DenseVector[Double] =
{
val transformationSpace = GaussianProcessTransformationSpace(lowRankGP)
val fixedImage = referenceMesh.operations.toDistanceImage
val movingImage = targetMesh.operations.toDistanceImage
val sampler = FixedPointsUniformMeshSampler3D(referenceMesh, registrationParameters.numberOfSampledPoints)
val metric = MeanSquaresMetric(fixedImage,movingImage,transformationSpace,sampler)
val optimizer = LBFGSOptimizer(registrationParameters.numberOfIterations)
val regularizer = L2Regularizer(transformationSpace)
val registration = Registration(
metric,
regularizer,
registrationParameters.regularizationWeight,
optimizer
)
val registrationIterator = registration.iterator(initialCoefficients)
val visualizingRegistrationIterator = for ((it, itnum) <- registrationIterator.zipWithIndex) yield {
println(s"object value in iteration $itnum is${it.value}")
//gpView.coefficients = it.parameters
it
}
val registrationResult = visualizingRegistrationIterator.toSeq.last
registrationResult.parameters
}

}

# Procrustes Analyses

## Some definitions*

1) The Shape of an object is all the geometrical information that remains when, location, scale, and rotational effects are removed from and object. In other words, shape of an object is invariant under Euclidean similarity transformations, which are translation, isotropic scaling, and rotation.

2) The Form of an object is all the geometrical information that remains when location and rotational effect of an objects are removed. Form is also called size-and-shape. Form is invariant under rigid-body transformations. Here, we can say shape is the form with size removed (the size/scale information is removed).

3) Same form and same shape: two objects have the same form, size-and-shape, if they can be translated and rotated relative to each other so that they exactly match. Two objects have the same shape, if they can be translated, rotated and scaled relative to each other so that they exactly match.

The words matching, registration, superimposing and fitting are equivalent.

4) A landmark is a point of correspondence on each object. e.g tip of the nose of all the dinosaurs in a population/sample.

5) Geometric shape analysis: An object can be (discretely) represented by a collection of landmark (point cloud), therefore, the landmarks coordinates retain (continue to have) the geometry of the point-cloud configuration. This approach to shape analysis is called the geometric shape analysis.

6) The Configuration is a collection (set) of landmarks on/of a particular object. The configuration of a m-D object with k landmarks can be represented by a matrix called the configuration matrix. For a 3D object we have:

\begin{bmatrix} x_1&y_1&z_1\\ x_2&y_2&z_2\\ x_3&y_3&z_3\\ \vdots& \vdots&\vdots\\ x_k&y_k&z_k\\ \end{bmatrix}

7) A size measure of a configuration is any positive real-valued function of the configuration matrix, X, such that .

8) The centroid size is defined as the square root of the sum of squared Euclidean distances from each landmark to the centroid of a configuration:

S(X):=\sqrt{\sum_{i=1}^{k}{\|X_{i,}-\overline{X}\|^2}}\tag{1}

where is the i-th row of X, and . is the Euclidean vector norm.

The centroid size can be re-written using matrix algebra:

S(X)=\|CX\|\tag{2}

where with being the identity matrix and is called the centering matrix, and is the Frobenius norm, sometimes also called the Euclidean norm, of a matrix, i.e.
. If the centroid of is already located at the origin, then .

9) The Euclidean similarity transformations of a configuration matrix X are the set of translated, rotated, and isotropically scaled X. In a mathematical notation:

\{\beta X \Gamma + 1_k\gamma^{\text T}:\beta\in \mathbb{R}^+, \Gamma\in SO(m),\gamma\in \mathbb{R}^m\}\tag{3}

where, is the scale (factor), is the space of rotational matrices, is the translation vector. Vectors are considered as column vectors. is a column vector of ones. for 3D.

10) Rigid-body transformations of a configuration matrix X are the set of translated and rotated X. In a mathematical notation:

\{X \Gamma + 1_k\gamma^{\text T}:\beta\in \mathbb{R}^+, \Gamma\in SO(m),\gamma\in \mathbb{R}^m\}\tag{4}

11) The centered pre-shape of a configuration matrix X is obtained (defined) by removing the location and scale/size information of the configuration. Location can be removed by centering the configuration, i.e. translating the configuration by moving its centroid to the origin. Therefore, the translated configuration is . The size is removed by scaling the configuration by its size defined as Eq. 2. Therefore, the pre-shape of X is obtained as:

Z_C:=\frac{CX}{\|CX\|}=\frac{CX}{\|X_C\|}\tag{5}

Removing size or scale means scaling to the unit size. Thereby, two configurations with different sizes lose their size information and also their scale. The scale is a quantity that expresses the ratio between sizes of two configurations, either two with different shapes or two with the same shape (one is the scaled version of the other one). In another scenario (like the full Procrustes analyses), if two or more configuration are scaled with different scale (factors), they also lose their scale/size information (relative to each other) although their sizes may not be unit.

12) Reworded definition of a shape: A shape is an equivalence class of geometrical figures/objects/configurations modulo (what remains after) the similarity transformation. The shape of X is denoted by [X]. In order to visualise the shape of an configuration, a representative of the equivalence class is considered and called icon, denoted by [X].

Two configurations and have the same shape, i.e. iff there exist such that:

X’=\beta X \Gamma + 1_k\gamma^{\text T}

In other words, two configurations have the same shape if they perfectly match after the removal of their location, rotation, and size/scale.

13) Shape space is the space/set of all shapes, i.e. equivalence classes. For example . All have their locations, rotations, and size removed.

14) Reworded definition of form (size-and-shape): form is an equivalence class of geometrical configurations modulo translation and rotation. The form/size-and-shape of a configuration/object X is denoted by , i.e an icon for the class. Two configuration and have the same form, i.e. iff there exist such that:

X’=\Gamma X + 1_k\gamma^{\text T}

15) Size-and-Shape/Form space is the space/set of all forms, i.e. equivalence classes. For example .

16) By removing the size of a form (scaling to unit centroid size) of a configuration, the shape of the configuration is obtained.

17) Clarification on the term “removing”: let’s imagine a set of n configurations (). We want to remove the scale, location, and rotation(al) information of the configurations. Removal of scale/size is to scale all the configurations to unit size (each divided by for example its centroid size).

Removal of location is to translate all the configurations to a particular target point/location in the space by selecting the same single landmark or point of reference (like the centroid) of each configuration and (rigidly) translating the configuration along the vector from the point of reference to the target point. The target point can be the origin of the coordinate system/frame of reference.

Removal of rotation of the configurations is through optimising over rotations by minimising some criterion. For instance, to remove the rotations (rotational information) of two configurations with their locations (and maybe scales/sizes) already removed, it is needed to apply a particular relative rotation on them (e.g. keep one stationary and rotate the other one) so that they get closest, with respect to some definition of distance, as possible. Once the rotation is removed, then any deviation of one configuration from the other one originates from their locations, scales/sizes, and their shapes. To compare shapes, we also have to remove the locations and scales of the configurations. Then any deviation is due to the shape of the configurations. If only location and rotational information are removed, then any relative deviation originates from the sizes and shapes (form) of the configurations.

## Matching two configurations and measure of distance between shapes

Consider a collection/set of shapes, as defined above, all having the same number of landmarks and the landmarks are in one-to-one correspondence. An important question is how to define a distance between two shapes in the set. This is natural to define a measure between the members of a set. The notion of shape distance indicates the difference between two shapes, i.e. how far two shapes are far from each other in the shape space.

A natural approach is to match/fit/superimpose/register a configuration X to a configuration Y (same number of landmarks and in correspondence) using the similarity transformations applied on X, and then the magnitude of difference between the fitted configuration of X, i.e. and indicates the magnitude of difference in shape between the two configurations X and Y.

So how to match two configuration? The full ordinary Procrustes analysis (FOPA) is the method to match/register/superimpose/fit two (m-D) configurations with the same number of landmarks as closely as possible onto each other. This method applies the similarity transformations to one of the configurations to closely match the other one. But what is the measure of closeness? The definition of the method makes it perspicuous:

The method of FOPA matches two configurations through transforming one onto the other one (target configuration) up to the similarity transformations and minimizing the squared Frobenius/Euclidean norm of the difference between the transforming and the target configurations. This means to minimize the following expression:

D_{FOPA}^2(X_1,X_2):=\|X_2-(\beta X_1\Gamma+1_k\gamma^\text{T})\|^2\tag{6}

The minimum value of is a measure of similarity/dissimilarity. Observe that is the transforming configuration (input) and is the target configuration. Also, all the similarity transformations are involved in the registration; this is why the term full is used. In order to find the matching configuration, the similarity parameters should be determined, i.e.:

(\hat\beta, \hat\Gamma, \hat\gamma)=\argmin_{\beta,\Gamma,\gamma}\|X_2-\beta X_1\Gamma-1_k\gamma^\text{T}\|^2\tag{7}

It is good to know that if the configurations are already centered.

The full Procrustes fit of onto is then:

X_1^{FP}=\hat\beta X_1\hat\Gamma+1_k\hat\gamma^\text{T}\tag{8}

It is tempting to chose the minimum of Eq. 6 as a measure of distance between the shapes of the two configurations. However, , i.e. not commutative, unless the configurations both have the same size; for example having the unit size by getting pre-scaled. Therefore, the full Procrustes distance, between two shapes (of two configurations) is defined as:

d_F^2(X_1,X_2):=\min\ D_{FOPA}^2(\frac{X_1}{\|X_1\|},\frac{X_2}{\|X_2\|})\tag{9}

FOPA involves isotropic scaling of the input configuration, in addition to the other two similarity transformations, to register it onto the target configuration. This means that the fitted configuration does not have the same size/scale as its original configuration. Therefore, FOPA removes the size of the input configuration.

Another scenario of fitting the input configuration to the target one is to register while preserving the size/scale of the input, i.e. not to scale the input (the scale of the target is also preserved). In this regard, partial ordinary Procrustes analysis (POPA) is a method of registration only over translation and rotation to match two configurations. This involves the minimization of:

D_{POPA}^2(X_1,X_2):=\|X_2-(X_1\Gamma+1_k\gamma^\text{T})\|^2\tag{10}

The partial Procrustes fit of onto is then: (see Eq. 7 with ):

X_1^{PP}=X_1\hat\Gamma+1_k\hat\gamma^\text{T}\tag{11}

Based on POPA, the partial Procrustes distance can be defined. If the two configurations have the same size, like by getting pre-scaled to the unit size, i.e. the size is removed, then the following is defined as the the partial Procrustes distance between two shapes (of two configurations) :

d_P^2(X_1,X_2):=\min\ D_{POPA}^2(\frac{X_1}{\|X_1\|},\frac{X_2}{\|X_2\|})\tag{12}

Calculating POPA distance, unlike the FOPA distance, does not involve additional scaling of the the (pre-scaled) unit sized configurations, i.e. in the minimization of . Nevertheless, obtaining POPA distance still entails removing the scale of the configurations.

Remark: the two defined distances have different values.

## Mean shape

First some motivation from the arithmetic mean of numbers. Every dinosaur knows that the arithmetic mean/average of n real numbers is defined as: . Let’s define a function as . In other words:

g(\mu):=\frac{1}{n} \sum_{i=1}^{n} (x_i-\mu)

It is now obvious that if we let then . This is what we want out of defining the function. However, zero is not the minimum of as this function is an absolutely decreasing function, hence no local or global minimum (just take the derivative w.r.t ). Let’s define a function as:

f(\mu):=\frac{1}{n} \sum_{i=1}^{n} (x_i-\mu)^2

What minimizes ? Solving , we get:

\frac{\mathrm d}{\mathrm d \mu}f(\mu)=\frac{1}{n} \sum_{i=1}^{n} -2(x_i-\mu)=0 \newline \implies \frac{1}{n} \sum_{i=1}^{n} (x_i-\mu)=0

This is just which has the answer . The second derivative of is which indicates that minimizes .

Using the Euclidean distance/metric/norm of real numbers, i.e. with being the absolute value, we can re-define as:

f(\mu):=\frac{1}{n} \sum_{i=1}^{n} |x_i-\mu|^2=\frac{1}{n} \sum_{1}^{n} d_{E}^2(x_i,\mu)

This results in (it can be easily proved by equating the derivative with zero):

\hat\mu=\argmin_{\mu}\frac{1}{n} \sum_{i=1}^{n} d_{E}^2(x_i,\mu)

This motivation suggests that for a set of objects with a defined distance function/metric, a mean/average object can be imagined. This average object is such an object that minimizes the average sum of squared distances from all objects to the average object. The Mean shape of a set of shapes sampled from a population of shapes can be defined using the distance functions as defined by Eq. 9 and Eq. 12:

The sample full Procrustes mean shape is defined as:

\hat\mu_F:=\arg\inf_{\mu}\frac{1}{n} \sum_{i=1}^{n} d_{F}^2(X_i,\mu)\tag{13}

The sample partial Procrustes mean shape is defined as:

\hat\mu_P:=\arg\inf_{\mu}\frac{1}{n} \sum_{i=1}^{n} d_{P}^2(X_i,\mu)\tag{14}

In words, the mean shape is a shape with minimum distance from each configuration’s shape.

Note that in both definitions, the configurations will be pre-scaled to the unit size (due to the definition of the metrics).

Remark: the mean shape refers to the mean of the shapes not the mean of configurations.

## Matching more than two configurations: generalized Procrustes analyses

The term FOPA/POPA refers to Procrustes matching of one configuration
onto another. If there are more than two configurations, it is natural to register/match them all to a mean shape or mean form. To do so, generalized Procrustes analysis (GPA) is utilized to superimpose two or more configurations onto a common unknown mean shape/form. GPA has two methods: full and partial GPA’s. The GPA provides a practical method of computing the sample mean shape defined by Eq. 13 or 14. Unlike the full/partial OPA, the output of GPA (the Procrustes fits) does not depend on the order of superimposition.

### Full GPA

The method of full GPA transforms configurations up to the similarity transformations (scaling + rotation + translation) with respect to a common unknown mean shape in order to minimise the following total sum of squares with respect to the transformation parameters and , while imposing a size constraint to avoid a trivial solution. A trivial solution can be all close to zero (in order not to make a configuration vanished, scale is not allowed to be zero) and no translation; hence the mean shape will be something with size close to zero, i.e almost vanished.

G(X_1,…X_n):=\sum_{i=1}^{n} \|\beta_iX_i\Gamma_i+1_k\gamma_i^\text{T}-\mu\|^2\tag{15} \newline \sum_{i=1}^{n}S^2(\beta_iX_i\Gamma_i+1_k\gamma_i^\text{T})=\sum_{i=1}^{n}S^2(X_i)

In the above expression, are unknown. Once the expression is solved, the estimate of the optimum/minimizing parameters, , and estimate of the mean shape, are found. Note that if we already knew the population mean shape, we would just align each configuration separately to that mean shape.

Once the above minimization is solved, the full (generalized) Procrustes fit of each configuration (onto the estimated mean) is:

X_i^{FP}=\hat\beta_iX_i\hat\Gamma_i+1_k\hat\gamma_i^\text{T}\tag{16}

Note that the scale of each transformed configuration (full Procrustes fit) is not necessarily the same. Therefore, full GPA does not preserve the scale of the configurations through their optimal transformations. This means that if one configuration is times larger than the other one, they will be times of each other after the optimal transformation and getting fitted to the mean shape.

Assuming that minimize expression 15, we can write:

\hat\mu=\argmin_{\mu}\sum_{i=1}^{n} \|\hat\beta_iX_i\hat\Gamma_i+1_k\hat\gamma_i^\text{T}-\mu\|^2\text{ or}\tag{17} \newline \hat\mu=\argmin_{\mu}\sum_{i=1}^{n} \|X_i^{FP}-\mu\|^2

The solution is then (see the proof here):

\hat\mu=\frac{1}{n}\sum_{i=1}^{n} X_i^{FP}\tag{18}

which is the arithmetic mean of the Procrustes fits. This means that once the configurations are registered to the estimate of the unknown mean, their arithmetic mean/average equals the estimated mean.

Now we want to show that the estimated mean shape has the same shape as the sample full Procrustes mean shape defined by Eq. 13. Without loss of generality, we pre-scale the configurations to the unit size, and replace the size constraint with .

Since Eq. 15 is some of positive reals, the optimum transformation parameters should also minimize each term in the sum. This means that minimizes each . Therefore, following Eq. 13, the sample full Procrustes mean shape of the full Procrustes fit is:

\hat\mu_F:=\arg\inf_{\mu}\frac{1}{n} \sum_{i=1}^{n}\min_{\beta_i,\Gamma_i,\gamma_i}\|(\beta_i X_i\Gamma_i+1_k\gamma_i^\text{T})-\mu\|^2 \newline =\arg\inf_{\mu}\frac{1}{n} \sum_{i=1}^{n}\|(\hat\beta_i X_i\hat\Gamma_i+1_k\hat\gamma_i^\text{T})-\mu\|^2 \newline=(n^{-1})\arg\inf_{\mu}\sum_{i=1}^{n}\|X_i^{FP}-\mu\|^2

Therefore, (by Eq. 16 and 17), . These two have the same shape, which means there is a similarity transformation to perfectly superimpose them on each other. Here, it is obvious that only scaling by the factor of is needed.

Estimating the transformation parameters of full GPA method in order to estimate the sample full Procrustes mean shape and registering all configurations to the estimated mean shape, is then equivalent to the following minimization problem:

\inf_{\beta_i,\Gamma_i,\gamma_i}\sum_{i=1}^{n} \|(\beta_i X_i\Gamma_i+1_k\gamma_i^\text{T}) – \frac{1}{n} \sum_{j=1}^{n} (\beta_j X_j\Gamma_j+1_k\gamma_j^\text{T})\|^2\\ \sum_{i=1}^{n}S^2(\beta_iX_i\Gamma_i+1_k\gamma_i^\text{T})=\sum_{i=1}^{n}S^2(X_i)

It can be proved that above is equal to (See the proof here):

\inf_{\beta_i,\Gamma_i,\gamma_i}\frac{1}{n}\sum_{i=1}^{n} \sum_{\underset{{j\lt n}}{j=i+1}}^{n}\|(\beta_i X_i\Gamma_i+1_k\gamma_i^\text{T}) – (\beta_j X_j\Gamma_j+1_k\gamma_j^\text{T})\|^2\\ \newline \sum_{i=1}^{n}S^2(\beta_iX_i\Gamma_i+1_k\gamma_i^\text{T})=\sum_{i=1}^{n}S^2(X_i)

Above means that the full GPA minimizes the sum of pairwise squared Frobenius distances between all the transformed versions of the configurations ( (full Procrustes fits) in the sample. It should be noted that, the output of the full GPA is configurations that are translated, rotated, and scaled versions of the original configurations. Hence, the scale, location, and rotational information of each configuration is removed. The output configurations are registered onto each other such that the sum of pairwise squared Frobenius distances between all of them is minimized. If the scale of each output configuration is removed by re-scaling it to unit size, then full Procrustes distance between each two shapes (of the output configurations), and also the estimated mean shape (once re-scaled to unit size) can be calculated.

Remark: the full GPA does not lead to the minimized (squared) Frobenius distance between two full Procrustes fits , but it leads to the minimized sum of the Frobenius distances of each full Procrustes fit and the rest of them in the set. It does minimize the (squared) Frobenius distance between each full Procrustes fit and the (full Procrustes) mean shape though.

Remark: The full GPA removes the scale of the configurations by scaling them in order to make them closer to the mean shape. The scales are not the same for all the configurations and hence the full GPA does not preserve the (original) scale (relative size) of the configurations. The full Procrustes fits and the estimated mean shape may not have the unit size. However, they are relatively scaled in a way to minimize the sum of squared distances. This is the notion of removing the scale information (of the original configurations). Therefore, the Procrustes fits represent the shape (information) of the configurations and the differences between each of them (between corresponding landmarks) and the mean shape is purely due to their shape variations.

In summary, the full GPA of configurations leads to a mean shape and full Procrustes fits such that each full Procrustes fit is superimposed onto the mean shape through the similarity transformations; hence, whatever geometrical variations/discrepancies remains between each full Procrustes fit and the mean shape is purely due to their shapes.

An algorithm for full GPA is as follows:

1- Translations/Removing locations: remove the location of each configuration by centring each configuration (each configuration is translated to the origin by its centroid) and initially let each Procrustes fits be:

X_i^{FP}= CX_i\text{ for } i=1,…,n

2- estimate as .

3- Rotations and scales: perform a FOPA on each centred configuration to rotate and scale it to . Now each recently transformed configuration, , is fitted to the estimated mean.

{X}_i^{*FP}=\hat\beta_i X_i^{FP} \hat\Gamma_i

and are transformation parameters out of the FOPA. Note that the rotation is about the origin, and no translation is needed as the locations are already removed.

4- Update the mean shape: .

5- let . Re-centring is not necessary because the centroid of each configuration at this stage is already located at the origin and the rotations are also about the origin. Re-scaling won’t change the location of the centroid because it is isotropic.

6- Repeat 3 to 5 until everyone’s happy, i.e. the change in the Procrustes sum of squares () from one iteration to the next one becomes less than a particular tolerance.

Remark: The configurations can be pre-scaled to have the unit size but it is not necessary because their size is removed once they get scaled to better fit the estimated mean at each iteration.

### Partial GPA

Partial GPA involves merely registration of a set of configurations by translation and rotation (not scaling). Partial GPA preserves the scale/size of the configurations as opposed to full GPA. This method is appropriate for analysis of forms, i.e. size-and-shapes. Partial GPA is through minimization of the following sum with respect to the transformation parameters () and and unknown form . Size constraint is not needed as there is no scaling involved.

G_P(X_1,…X_n):=\sum_{i=1}^{n} \|X_i\Gamma_i+1_k\gamma_i^\text{T}-\mu\|^2\tag{19}

Once the above minimization is solved, the partial Procrustes fit of each configuration onto the common form is:

X_i^{PP}=X_i\hat\Gamma_i+1_k\hat\gamma_i^\text{T}\tag{20}

Similar to Eq. 17 and 18, we can write:

\hat\mu=\frac{1}{n}\sum_{i=1}^{n} X_i^{PP}\tag{21}

However, is not going to have the same shape as the mean shapes previously defined (sample full/partial Procrustes mean shape). is then a form that its Frobenius distance to each partial Procrustes fit is minimum. I and some dinosaur think that this partial GPA is appropriate to group-register a set of configurations onto each other, however, we don not know its application at this moment.

The algorithm for partial GPA is the same as that of the full GPA except that the scale is removed i.e. set as 1.

* The definitions follow the book “statistical shape analysis, Ian L. Dryden, Kanti V. Mardia”, and the article “Procrustes Methods in the Statistical Analysis of Shape, Colin Goodall”. I may not have paraphrased them all, hence, some sentences may be subjected to copy right and they are not intended to be reused in official publications.

# SSM Using Scalismo (2): GP mean and kernel

Statistical shape modeling (SSM) using Scalismo is based on Gaussian process (a random field. A Gaussian process, like normal probability distribution, is completely defined by its mean ( µ ) and covariance function ( k ). The covariance function is called kernel. We write GP(µ , k). The mean and kernel are both functions. In SSM, GP models shape deformations (a vector field) in a 3D (or 2D) space. It means that the GP tells us the probable or plausible deformation vector at each point in the space. The mean function eats a point in space, and output a real vector of deformations; kernel eats two points in the space and output a matrix. If the points in our 3D geometric space are represented by their coordinates, the aforementioned functions are:

\mu:\mathbb{R^3}\to\mathbb{R^3} \newline k:\mathbb{R^3}\times\mathbb{R^3}\to\mathbb{R^{3\times3}}\tag{1}

Once the mean and the kernel functions are determined, we can construct our SSM based on the GP. Statistical shape models based on GPs are called Gaussian Process Morphable Models (GPMM) [1]. A GPMM is composed of a GP describing the deformation field, and a reference shape. A reference shape is a subset of our geometric space. Here, the geometric space is a set of point. Describing the deformations through a GP, we can have the GP produce plausible deformations of the reference shape. The plausibility of the deformations is determined by the kernel. Hence, a shape produced/described by a GPMM is mathematically defined as [1]:

\Gamma_S=\{x+u(x)|x\in\Gamma_R\subset\Omega\text{ , }u:\Omega\to\mathbb{R^3}\text{ and }\Omega\supset\Gamma_R\}\tag{2} \newline \Omega \text{ open in }\mathbb{R^3}

The set of points of any plausible shape \Gamma_S is simply formed through giving each point of the reference shape a new location in the space. The new location of each point is obtained by the map . The map eats a point, translate it by (along) the vector , and output a point. Although both points and vectors are analytically recognized by tuples in , I need to pay attention to the conceptual difference between a point and a vector in the space.

Where is the GP? Here, where the GP comes into the scene. All plausible functions (maps), , are distributed according to a GP distribution:

u(x)\sim\mathnormal{GP}(\mu(x),k(x,x’))\text{ s.t }x\in\Omega\tag{3}

The mean (average) shape of the sets of all plausible shapes is naturally defined as:

\overline{\Gamma}=\{x+\mu(x)|x\in\Gamma_R\}\tag{4}

If, for simplicity, we let , we’ll get .

Now, the kernel. The kernel is actually the covariance function measuring the variability or correlation of the components of two deformation vectors at two points. As each deformation vector has 3 components, there will be 9 covariance values organized in a 3 by 3 matrix:

k(x,x’):=cov(u(x),u(x’))\tag{5}

where is a vector in . If we consider (and ) as a column vector in the space of 3 by 1 matrices ( ), we have:

cov(u(x),u(x’))=E[(u(x)-\mu(x))(u(x’)-\mu(x’))^\text{T}]\tag{6}

With being the transpose operator. Expanding the above, we’ll get the matrix of with the components:

k_{i,j}(x,x’)=cov(u_i(x),u_j(x’))=E[(u_i(x)-\mu_i(x))(u_j(x’)-\mu_j(x’))] \newline\tag{7}

Or in an expanded form with some (easy to capture) change of notations:

k_{i,j}(x,x’)=\begin{bmatrix} cov(u_x^1,u_{x’}^1)&cov(u_x^1,u_{x’}^2)&cov(u_x^1,u_{x’}^3)\\[0.3em] cov(u_x^2,u_{x’}^1)&cov(u_x^2,u_{x’}^2)&cov(u_x^2,u_{x’}^3)\\[0.3em] cov(u_x^3,u_{x’}^1)&cov(u_x^3,u_{x’}^2)&cov(u_x^3,u_{x’}^3)\end{bmatrix}\tag{8}

We can observe that the function is symmetric in the sense . Why? Because covariance is a symmetric function, i.e. for two random variables (NOT random vectors), a and b.

Remark: symmetry of does not imply that the matrix is symmetric as well. If , then it is not necessarily true that , and so on. Note that , , and are four different random variables.

However, the kernel leads to a symmetric matrix when evaluated at the same point, i.e. , which is equal to

#### What sort of kernel should we use?

Let’s define a scalar Gaussian kernel as [1]:

k_s(x,x’):=s\ \exp({-\frac{\|x-x’\|^2}{\sigma^2}})\tag{9}

where is the Euclidean norm expressing the distance between two (geometric) points in the space, s is a scale, and is a parameter defining the distance range over which the random variables are correlated. This kernel only depends on the distance of two points and not their locations. This means the correlation between the deformations at two points decreases as their distance increases.

One choice of generating the kernel matrix regarding the deformations is to assume that the deformation components are not correlated. Since we are working with a GPs, being uncorrelated implies independence. In mathematical notations:

cov(u_x^i,u_{x’}^j)=0\text{ for }i\neq j\text{ s.t }i,j\in\{1,2,3\}\tag{10}

which implies: for , where denotes the related probability distribution. As a side note and are random variables, therefore above follows this: for any random variables x and y.

Now we use the scalar kernel as the building component of our matrix kernel and define a diagonal kernel with the same parameters in each direction (may be I can call it a diagonal kernel with homogeneous parameters):

k_{i,j}(x,x’)=\begin{bmatrix} k_s(x,x’)&0&0\\[0.3em] 0&k_s(x,x’)&0\\[0.3em] 0&0&k_s(x,x’)\end{bmatrix}\tag{11}

Let’s puts all these into practice. We consider a 1 by 1 (unit of distance) plane/plate meshed with triangular elements (STL). A zero-mean GP on the deformations describes all plausible/admissible deformations of the plane. The diagonal kernel with homogeneous parameters is considered. The following figures demonstrate samples of the deformations (sample path or trajectory of the GP) with different kernel’s parameters.

The kernel determines how much and how far (the radius of effect) the deformations at a point in the space should affect the deformations at its neighboring points. Consider the plots of for different parameters (Fig. 2). Remember that provides the covariance values. The higher the covariance, the higher the dependency of deformations. Keeping constant and increasing s, increases the dependency while keeping the radius of effect constant (Fig. 2 left image). At a point itself, increasing s, increases the variance of deformations (around its mean) at that point; hence, larger deformations are expected. On the other hand, keeping the scale, s, constant and increasing leads to a larger radius of effect while a constant variance. This means the possible range of deformations (around the mean) does not change at a point, but its deformations (the deformations at that point), now, affect the deformations at the surrounding points to a larger distance, i.e. larger radius of effect. To summarize it, s determines the variance (possible ranges) of deformation magnitudes at a point, and controls how locally/globally the deformations at a point should affect its surrounding points’ deformations. Final word: here, working with GPs, zero kernel/zero covariance implies independency of deformations at two points; I deform this way, you deform the way you like. Non-zero kernel means I deform this way and you follow me as much as I say while deforming.

Another interesting observation that we can made is to use the kernel to eliminate the deformations in specific directions. It can be done by setting the GP mean function to zero, as already done, and also set the variance of the components of the deformations to zero. A random variable vanishes when its mean and its variance are both zero. Let’s eliminate the deformations in x and y directions being the in-plain directions and let the plane deform only out-of-plane, i.e. orthogonal to itself (the z direction). This enforcement of deformations resembles plate bending. Here, is the kernel:

k_{i,j}(x,x’)=\begin{bmatrix} 0&0&0\\[0.3em] 0&0&0\\[0.3em] 0&0&k_s(x,x’)\end{bmatrix}\tag{12}

#### Some power of kernel combination:

Sometimes it is needed to model different scale of deformations, i.e. modeling global and local deformations simultaneously. For example, consider a “global” bending of a piece of wiggly wire. The deformations are shown in Fig. 4.

To consider different scale of deformation we should let the deformations at a point, say point P, strongly affect the deformations of the nearby neighboring points in order to model the local deformations around that point. Also, the deformations at P, should affect the deformations of farther points to some extent. This is achieved by adding kernels (element-wise in the case of a matrix). For example, the following figure shows how a GP can model a global bending of a plate on which there are localized spiky round bumps.

The kernel used to model the deformations (Fig. 5) follows Eq. 12. The graphs of the scalar functions and their sum are as follows:

It may be interesting to observe different deformation modes, .i.e eigen functions. The following figure shows some of the modes:

Here is the code:

This post contains code blocks; if your browser does not automatically display the code blocks, please click on the displayed dark narrow rectangular areas to display the containing code.

import scalismo.geometry._
import scalismo.common._
import scalismo.ui.api._
import scalismo.mesh._
import scalismo.registration._
import scalismo.io.{MeshIO}
import scalismo.numerics._
import scalismo.kernels._
import scalismo.statisticalmodel._
import breeze.linalg.{DenseVector}
import java.awt.Color

scalismo.initialize()
implicit val rng = scalismo.utils.Random(42)
val ui = ScalismoUI()

//viewing the refMesh
val modelGroup = ui.createGroup("model")
val refMeshView = ui.show(modelGroup, refMesh, "referenceMesh")
val refMeshColor = new Color(115,238,70)
refMeshView.color = refMeshColor
// creating the mean vector fields
val mean = VectorField(RealSpace[_3D], (_ : Point[_3D]) => EuclideanVector.zeros[_3D])

Kernel form regarding Fig. 1:

val kernel = DiagonalKernel[_3D](GaussianKernel(sigma = 0.8) * 0.1,3)

Kernel form regarding Fig. 3:

val scalarValuedGaussianKernel3 : PDKernel[_3D]= GaussianKernel(sigma = 0.3)* 0.1
val scalarValuedGaussianKernel2 : PDKernel[_3D]= GaussianKernel(sigma = 0.1)* 0
val scalarValuedGaussianKernel1 : PDKernel[_3D]= GaussianKernel(sigma = 0.1)* 0
val kernel = DiagonalKernel( scalarValuedGaussianKernel1, scalarValuedGaussianKernel2, scalarValuedGaussianKernel3)

Kernel regarding Fig. 5:

val s1 = 0.009
val s2 = 0.15
val diagDernel1 = DiagonalKernel(GaussianKernel(sigma = 1)*0,GaussianKernel(sigma = 1)* 0,GaussianKernel(sigma = 0.032)* s1)
val diagDernel2 = DiagonalKernel(GaussianKernel(sigma = 1)*0,GaussianKernel(sigma = 1)* 0,GaussianKernel(sigma = 0.8)* s2)
val kernel = diagDernel1 + diagDernel2

The rest of the code:

// defining the GP
val gp = GaussianProcess(mean,  kernel)
val lowRankGP = LowRankGaussianProcess.approximateGPCholesky(refMesh.pointSet,  gp,  0.07,  NearestNeighborInterpolator())

// visualizing and sampling
val deformedGroup = ui.createGroup("deformed")
val refMeshView2 = ui.show(deformedGroup, refMesh, "referenceMesh")
refMeshView2.color = refMeshColor
val gpDefView = ui.addTransformation(deformedGroup, lowRankGP, "RefMeshDefrmdByGp")

#### References:

[1] Gaussian Process Morphable Models, Marcel Luthi , Thomas Gerig, Christoph Jud, and Thomas Vetter. IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 40 , Issue: 8 , Aug. 1 2018).

# SSM Using Scalismo (1): Rigid alignment

Hereby, I am going to use Scalismo to rigidly register/align/fit objects represented by triangular meshes onto a particular object. The alignment is based on partial ordinary Procrustes analysis method (POPA). POPA involves translation and rotation (not scaling) of an object with respect to another in order to align/fit the transforming object to the stationary one as close as possible. Read about POPA here. This initial alignment makes the non-rigid registration easier and faster later.

There are 3 geometries of femoral heads in the form of STL and I select one as the reference object (reference mesh) and I will make the other 2 aligned/registered on the reference one.

For each shape, I specified 3 anatomically-chosen landmarks (LMs) and had their coordinates (x,y,z) listed in separate CSV files. For example, one file contains:
51.5378,-65.1603,-33.8293
44.1069,-66.5612,-26.1846
56.3146,-67.5174,-30.7776

Remark: Scalismo recognizes/identifies a landmark by its ID (name). This means that landmarks pointing the same anatomical point/feature on different objects should have the same ID (although different coordinates).

I already set up Scala enviroment. A piece of code can be run by lines in Scala console (Ctrl+Shift+X in IntelliJ); in this way, use Ctrl+Enter for I/O in the console environment. The whole object can be rum by extending the App trait.

This post contains code blocks; if your browser does not automatically display the code blocks, please click on the displayed dark narrow rectangular areas to display the containing code.

object InitialRR extends App {
//To send the code selection to Scala Console, select the code in the Editor, press Ctrl + Shift + X
//imports
import scalismo.geometry._
import scalismo.common._
import scalismo.ui.api._
import scalismo.mesh._
import scalismo.registration.LandmarkRegistration
import scalismo.io.{MeshIO, LandmarkIO}
import scalismo.numerics.UniformMeshSampler3D
import scalismo.registration.{Transformation, RotationTransform, TranslationTransform, RigidTransformation}

scalismo.initialize()
implicit val rng = scalismo.utils.Random(42)
// ui
val ui = ScalismoUI()
//------------------------------------------------------------------------------------------------------------------//
// This is a function to remove a group view by asking the order from the user
def removeGroupView(groupView:scalismo.ui.api.Group) : Unit = {
val s = readLine(s"remove ${groupView.name} [y/n]? ") if (s == "y") groupView.remove() } // [Loading stl and LM files] val stlDir = "I:/registration/SampleData/Meshes/" val LMDir = "I:/registration/SampleData/LMs/" val meshFiles = new java.io.File(stlDir).listFiles() meshFiles.foreach(file => println(s"${file}\n"))

val numOfMeshes = meshFiles.size
println(s"number of files is ${numOfMeshes}") val originalData = ui.createGroup("Original Data") var i: Int = 1 val (meshes, meshViews) = meshFiles.map(meshFile => { val mesh = MeshIO.readMesh(meshFile).get val meshView = ui.show(originalData, mesh, "mesh" + i) i = i + 1 (mesh, meshView) // return a tuple of the mesh and the associated view }).unzip // take the tuples apart, to get a sequence of meshes and meshViews //meshViews(0).remove() removeGroupView(originalData) //------------------------------------------------------------------------------------------------------------------// // [removing centroid of each mesh i.e moving the centroid to (0,0,0), and hence the mesh by the same translation] val MeshPoints = meshes.map(mesh => (mesh.pointSet.points).toArray) //iterator to array def centroid(TriMeshPoints: Array[scalismo.geometry.Point[scalismo.geometry._3D]]): EuclideanVector[_3D] = { val vecrdPoints = TriMeshPoints.map { point => point.toVector } (vecrdPoints.reduce((v1, v2) => v1 + v2)) / TriMeshPoints.size //centroid } //finding centroids val MeshCentroids = MeshPoints.map(mesh => centroid(mesh)) // translating meshes to the origin i.e making the centroid equal to the origin val translations = MeshCentroids.map(centVec => TranslationTransform[_3D](centVec.*(-1))) val translatedMeshes = (0 until numOfMeshes).map(i => meshes(i).transform(translations(i))) val zeroCentData = ui.createGroup("ZeroCentData") val translatedMeshViews = (0 until numOfMeshes).map(i => ui.show(zeroCentData, translatedMeshes(i), "mesh" + i)) //selecting a RefMesh val refMesh = translatedMeshes(0) //------------------------------------------------------------------------------------------------------------------// //[generating landmark Objcs of each mesh] //reading landmark coordinates from files associated with data set files val LmFiles = new java.io.File(LMDir).listFiles LmFiles.foreach(file => println(s"${file}\n"))

val LmCrdnts: Array[DenseMatrix[Double]] = LmFiles.map(file => csvread(file, separator=',', skipLines=0))

// - creating point objects from the lm crdnts.A vector of vectors; each vector contains translated pointObjs of each set of Lms in a file

val LmPointObjs = (0 until numOfMeshes).map(i =>{
(0 until LmCrdnts(i).rows).map(row =>
translations(i)(Point( LmCrdnts(i)(row,0), LmCrdnts(i)(row,1),LmCrdnts(i)(row,2))))  // translating the pointObjs by the translations that move the meshes to their centroids
})

val LMs = LmPointObjs.map(eachSet => {
var n = 0
eachSet.map(pointObj => {n = n+1; Landmark(s"lm-{n}", pointObj)}) //Ids' names are important for registeration }) // a vector of vectors of landmarks. Each vector contains LmObjs of each set of Lms in a file val LMviews = LMs.map (eachSet => { eachSet.map(lm => ui.show(zeroCentData,lm,lm.id)) }) removeGroupView(zeroCentData) // -----------------------------------------------------------------------------------------------------------------// // [Rigid registration based on landmarks. Rigid (Procrustes) alignment of each mesh to the RefMesh] // RefLMs = LMs(0) val bestTransform = (1 until numOfMeshes).map (i => LandmarkRegistration.rigid3DLandmarkRegistration(LMs(i) ,LMs(0), center = Point(0, 0, 0))) //rigid3DLandmarkRegistration pares the corresponding landmarks based on their Ids (names) //------------------------------------------------------------------------------------------------------------------// // [Transforming the landmarks and the meshes onto the RefMesh based on the bestTransform] val alignedData = ui.createGroup("alignedData") //transforming LMs val rigidTransformedLms = (0 until numOfMeshes-1).map (i => { LMs(i+1).map (lm => lm.transform(bestTransform(i))) //(i+1) as the RefLms stay where they are }) val rigidTransformedLmsviews = rigidTransformedLms.map (eachSet => { eachSet.map(lm => ui.show(alignedData,lm,lm.id)) }) val RefLmsView = LMs(0).map(lm => ui.show(alignedData,lm,lm.id)) //transforming Meshes val alignedMeshes = (0 until numOfMeshes-1).map(i => translatedMeshes(i+1).transform(bestTransform(i))) // showing the meshes val alignedMeshesViews = (0 until numOfMeshes-1).map(i => ui.show(alignedData, alignedMeshes(i), "Amesh" + i)) val RefMeshesView = ui.show(alignedData, refMesh, "RefMesh") //------------------------------------------------------------------------------------------------------------------// //Saving aligned mesh and LMs to new files val outputMeshDir = drive + "I:/registeration/AlignedData/Meshes/" val outputLmDir = drive + "I:/registeration/AlignedData/LMs/" MeshIO.writeMesh(refMesh, new java.io.File(outputMeshDir + "mesh0.stl")) (0 until numOfMeshes-1).foreach(i => MeshIO.writeMesh(alignedMeshes(i), new java.io.File(outputMeshDir + s"mesh{i+1}.stl")))

LandmarkIO.writeLandmarksJson[_3D](LMs(0), new java.io.File(outputLmDir + "LM0.json"))
(0 until numOfMeshes-1).foreach(i => LandmarkIO.writeLandmarksJson[_3D](rigidTransformedLms(i), new java.io.File(outputLmDir + s"LM\${i+1}.json")))
// asking for closeing the ui
val finito = readLine("finito [y/n]? ")
if (finito == "y") ui.close()
}

Here is what the code does, POPA has rigidly registered two femoral heads onto the one selected as the reference.