SSM Using Scalismo (2): GP mean and kernel

Statistical shape modeling (SSM) using Scalismo is based on Gaussian process (a random field. A Gaussian process, like normal probability distribution, is completely defined by its mean ( µ ) and covariance function ( k ). The covariance function is called kernel. We write GP(µ , k). The mean and kernel are both functions. In SSM, GP models shape deformations (a vector field) in a 3D (or 2D) space. It means that the GP tells us the probable or plausible deformation vector at each point in the space. The mean function eats a point in space, and output a real vector of deformations; kernel eats two points in the space and output a matrix. If the points in our 3D geometric space are represented by their coordinates, the aforementioned functions are:

\mu:\mathbb{R^3}\to\mathbb{R^3} \newline k:\mathbb{R^3}\times\mathbb{R^3}\to\mathbb{R^{3\times3}}\tag{1}

Once the mean and the kernel functions are determined, we can construct our SSM based on the GP. Statistical shape models based on GPs are called Gaussian Process Morphable Models (GPMM) [1]. A GPMM is composed of a GP describing the deformation field, and a reference shape. A reference shape is a subset of our geometric space. Here, the geometric space is a set of point. Describing the deformations through a GP, we can have the GP produce plausible deformations of the reference shape. The plausibility of the deformations is determined by the kernel. Hence, a shape produced/described by a GPMM is mathematically defined as [1]:

\Gamma_S=\{x+u(x)|x\in\Gamma_R\subset\Omega\text{ , }u:\Omega\to\mathbb{R^3}\text{ and }\Omega\supset\Gamma_R\}\tag{2} \newline \Omega \text{ open in }\mathbb{R^3}

The set of points of any plausible shape \Gamma_S is simply formed through giving each point of the reference shape a new location in the space. The new location of each point is obtained by the map x+u(x). The map eats a point, translate it by (along) the vector u(x), and output a point. Although both points and vectors are analytically recognized by tuples in \mathbb{R^3}, I need to pay attention to the conceptual difference between a point and a vector in the space.

Where is the GP? Here, where the GP comes into the scene. All plausible functions (maps), u(x), are distributed according to a GP distribution:

u(x)\sim\mathnormal{GP}(\mu(x),k(x,x’))\text{ s.t }x\in\Omega\tag{3}

The mean (average) shape of the sets of all plausible shapes is naturally defined as:

\overline{\Gamma}=\{x+\mu(x)|x\in\Gamma_R\}\tag{4}

If, for simplicity, we let \mu(x)=0, we’ll get \overline{\Gamma}=\Gamma_R.

Now, the kernel. The kernel is actually the covariance function measuring the variability or correlation of the components of two deformation vectors at two points. As each deformation vector has 3 components, there will be 9 covariance values organized in a 3 by 3 matrix:

k(x,x’):=cov(u(x),u(x’))\tag{5}

where u(x) is a vector in  \mathbb{R^3} . If we consider u(x) (and \mu(x)) as a column vector in the space of 3 by 1 matrices (  \mathbb{R^{3\times1}} ), we have:

cov(u(x),u(x’))=E[(u(x)-\mu(x))(u(x’)-\mu(x’))^\text{T}]\tag{6}

With (.)^T being the transpose operator. Expanding the above, we’ll get the matrix of k(x,x') with the components:

k_{i,j}(x,x’)=cov(u_i(x),u_j(x’))=E[(u_i(x)-\mu_i(x))(u_j(x’)-\mu_j(x’))] \newline\tag{7}

Or in an expanded form with some (easy to capture) change of notations:

k_{i,j}(x,x’)=\begin{bmatrix} cov(u_x^1,u_{x’}^1)&cov(u_x^1,u_{x’}^2)&cov(u_x^1,u_{x’}^3)\\[0.3em] cov(u_x^2,u_{x’}^1)&cov(u_x^2,u_{x’}^2)&cov(u_x^2,u_{x’}^3)\\[0.3em] cov(u_x^3,u_{x’}^1)&cov(u_x^3,u_{x’}^2)&cov(u_x^3,u_{x’}^3)\end{bmatrix}\tag{8}

We can observe that the function k(x,x') is symmetric in the sense k(x,x') = k(x',x) . Why? Because covariance is a symmetric function, i.e. cov(a,b) = cov(b,a) for two random variables (NOT random vectors), a and b.

Remark: symmetry of k(x,x') does not imply that the matrix [k_{i,j}(x,x')] is symmetric as well. If x \neq x', then it is not necessarily true that cov(u_x^1,u_{x'}^2) \neq  cov(u_x^2,u_{x'}^1) , and so on. Note that u_x^1 , u_{x'}^2 , u_x^2 and u_{x'}^1 are four different random variables.

However, the kernel leads to a symmetric matrix when evaluated at the same point, i.e. k(x,x), which is equal to cov(u_i(x),u_j(x))

What sort of kernel should we use?

Let’s define a scalar Gaussian kernel  k_s:\mathbb{R^3}\times\mathbb{R^3}\to\mathbb{R} as [1]:

k_s(x,x’):=s\ \exp({-\frac{\|x-x’\|^2}{\sigma^2}})\tag{9}

where  \|.\| is the Euclidean norm expressing the distance between two (geometric) points in the space, s is a scale, and \sigma^2 is a parameter defining the distance range over which the random variables are correlated. This kernel only depends on the distance of two points and not their locations. This means the correlation between the deformations at two points decreases as their distance increases.

One choice of generating the kernel matrix regarding the deformations is to assume that the deformation components are not correlated. Since we are working with a GPs, being uncorrelated implies independence. In mathematical notations:

cov(u_x^i,u_{x’}^j)=0\text{ for }i\neq j\text{ s.t }i,j\in\{1,2,3\}\tag{10}

which implies:  \rho(u_x^i|u_{x'}^j)=\rho(u_x^i) for  i\neq j , where \rho(.) denotes the related probability distribution. As a side note  u_x^i  and  u_{x'}^j are random variables, therefore above follows this: \rho_{X|Y}(x|y)=\rho_{X}(x) for any random variables x and y.

Now we use the scalar kernel as the building component of our matrix kernel and define a diagonal kernel with the same parameters in each direction (may be I can call it a diagonal kernel with homogeneous parameters):

k_{i,j}(x,x’)=\begin{bmatrix} k_s(x,x’)&0&0\\[0.3em] 0&k_s(x,x’)&0\\[0.3em] 0&0&k_s(x,x’)\end{bmatrix}\tag{11}

Let’s puts all these into practice. We consider a 1 by 1 (unit of distance) plane/plate meshed with triangular elements (STL). A zero-mean GP on the deformations describes all plausible/admissible deformations of the plane. The diagonal kernel with homogeneous parameters is considered. The following figures demonstrate samples of the deformations (sample path or trajectory of the GP) with different kernel’s parameters.

Fig.1. Sampled deformations of a plane from a GP with a diagonal kernel having different homogeneous parameters.

The kernel determines how much and how far (the radius of effect) the deformations at a point in the space should affect the deformations at its neighboring points. Consider the plots of k_s for different parameters (Fig. 2). Remember that k_s provides the covariance values. The higher the covariance, the higher the dependency of deformations. Keeping \sigma constant and increasing s, increases the dependency while keeping the radius of effect constant (Fig. 2 left image). At a point itself, increasing s, increases the variance of deformations (around its mean) at that point; hence, larger deformations are expected. On the other hand, keeping the scale, s, constant and increasing \sigma leads to a larger radius of effect while a constant variance. This means the possible range of deformations (around the mean) does not change at a point, but its deformations (the deformations at that point), now, affect the deformations at the surrounding points to a larger distance, i.e. larger radius of effect. To summarize it, s determines the variance (possible ranges) of deformation magnitudes at a point, and  \sigma controls how locally/globally the deformations at a point should affect its surrounding points’ deformations. Final word: here, working with GPs, zero kernel/zero covariance implies independency of deformations at two points; I deform this way, you deform the way you like. Non-zero kernel means I deform this way and you follow me as much as I say while deforming.

Fig. 2. Effects of the scalar kernel parameters on the kernel values. Horizontal axis: distance between two points, vertical axis: scalar kernel values.

Another interesting observation that we can made is to use the kernel to eliminate the deformations in specific directions. It can be done by setting the GP mean function to zero, as already done, and also set the variance of the components of the deformations to zero. A random variable vanishes when its mean and its variance are both zero. Let’s eliminate the deformations in x and y directions being the in-plain directions and let the plane deform only out-of-plane, i.e. orthogonal to itself (the z direction). This enforcement of deformations resembles plate bending. Here, is the kernel:

k_{i,j}(x,x’)=\begin{bmatrix} 0&0&0\\[0.3em] 0&0&0\\[0.3em] 0&0&k_s(x,x’)\end{bmatrix}\tag{12}
Fig.3. Sampled deformations of a plane from a GP with a kernel that eliminates in-plane deformations.

Some power of kernel combination:

Sometimes it is needed to model different scale of deformations, i.e. modeling global and local deformations simultaneously. For example, consider a “global” bending of a piece of wiggly wire. The deformations are shown in Fig. 4.

Fig. 4. Bending of a piece of wiggly wire.

To consider different scale of deformation we should let the deformations at a point, say point P, strongly affect the deformations of the nearby neighboring points in order to model the local deformations around that point. Also, the deformations at P, should affect the deformations of farther points to some extent. This is achieved by adding kernels (element-wise in the case of a matrix). For example, the following figure shows how a GP can model a global bending of a plate on which there are localized spiky round bumps.

Fig. 5. Different scales of deformations: global bending and localized round bumps.

The kernel used to model the deformations (Fig. 5) follows Eq. 12. The graphs of the scalar functions and their sum are as follows:

Fig. 6. Scalar functions and their sum.

It may be interesting to observe different deformation modes, .i.e eigen functions. The following figure shows some of the modes:

Fig. 7. Some global and local deformation modes.

Here is the code:

This post contains code blocks; if your browser does not automatically display the code blocks, please click on the displayed dark narrow rectangular areas to display the containing code.

import scalismo.geometry._
import scalismo.common._
import scalismo.ui.api._
import scalismo.mesh._
import scalismo.registration._
import scalismo.io.{MeshIO}
import scalismo.numerics._
import scalismo.kernels._
import scalismo.statisticalmodel._
import breeze.linalg.{DenseVector}
import java.awt.Color

scalismo.initialize()
implicit val rng = scalismo.utils.Random(42)
val ui = ScalismoUI()

// loading our mesh file and let it be named refMesh
val refMesh = MeshIO.readMesh(new java.io.File("I:/Scalismo/plane.stl")).get

//viewing the refMesh
val modelGroup = ui.createGroup("model")
val refMeshView = ui.show(modelGroup, refMesh, "referenceMesh")
val refMeshColor = new Color(115,238,70)
refMeshView.color = refMeshColor
// creating the mean vector fields
val mean = VectorField(RealSpace[_3D], (_ : Point[_3D]) => EuclideanVector.zeros[_3D])

Kernel form regarding Fig. 1:

val kernel = DiagonalKernel[_3D](GaussianKernel(sigma = 0.8) * 0.1,3)

Kernel form regarding Fig. 3:

val scalarValuedGaussianKernel3 : PDKernel[_3D]= GaussianKernel(sigma = 0.3)* 0.1
val scalarValuedGaussianKernel2 : PDKernel[_3D]= GaussianKernel(sigma = 0.1)* 0
val scalarValuedGaussianKernel1 : PDKernel[_3D]= GaussianKernel(sigma = 0.1)* 0
val kernel = DiagonalKernel( scalarValuedGaussianKernel1, scalarValuedGaussianKernel2, scalarValuedGaussianKernel3)

Kernel regarding Fig. 5:

val s1 = 0.009
val s2 = 0.15
val diagDernel1 = DiagonalKernel(GaussianKernel(sigma = 1)*0,GaussianKernel(sigma = 1)* 0,GaussianKernel(sigma = 0.032)* s1)
val diagDernel2 = DiagonalKernel(GaussianKernel(sigma = 1)*0,GaussianKernel(sigma = 1)* 0,GaussianKernel(sigma = 0.8)* s2)
val kernel = diagDernel1 + diagDernel2

The rest of the code:

// defining the GP
val gp = GaussianProcess(mean,  kernel)
val lowRankGP = LowRankGaussianProcess.approximateGPCholesky(refMesh.pointSet,  gp,  0.07,  NearestNeighborInterpolator())

// visualizing and sampling
val deformedGroup = ui.createGroup("deformed")
val refMeshView2 = ui.show(deformedGroup, refMesh, "referenceMesh")
refMeshView2.color = refMeshColor
val gpDefView = ui.addTransformation(deformedGroup, lowRankGP, "RefMeshDefrmdByGp")

References:

[1] Gaussian Process Morphable Models, Marcel Luthi , Thomas Gerig, Christoph Jud, and Thomas Vetter. IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 40 , Issue: 8 , Aug. 1 2018).