# Scala tips

### Scala Option and Some

The absence of a value during runtime should be handled somehow; otherwise there will be a runtime error. The absent value is a `null` value. If we have a value of type `D` that may be absent (for example, in a method, function, or class constructor), Scala uses an instance (object) of the (abstract) class `Option[D]` as its container. The class `Option` has two subclasses, `Some` and `None`. Therefore, an instance of `Option` is an instance of either `Some` or `None`. For a variable `x` of type `D`, the factory `Option(x)` creates `Some(x)` if `x != null` or `None` if `x == null`.

The following are examples of situations to use `Option`:

1- Creating a type-Option val:

```scala
val optionalIntValue: Option[Int] = Some(1)
val optionalValue = Some("abc") // Scala infers the type
// or
// val optionalIntValue: Option[Int] = None // type is needed
```

2- When a function may or may not return a value; we want to give it the option to return a value or not:

```scala
def myFunc: Option[String] = { ... }
```

3- When a parameter of a function can be present or absent, i.e being optional to be given:

```scala
def myFunc(param1: Option[D]): ...
case class MyClass(param1: Option[D], param2: String)
```

Keeping in mind that `Option` is just a container type for a value that can be absent, there are several ways to use an `Option`-type variable. Examples:

1- The most general way is to use an `Option` with a `match`. We can define the output of a method as an `Option` and handle the output. For example, a function that reads user input: the user may just press Enter/Cancel and input nothing.

```scala
def readUserInput(): Option[processedUserInputType] = { ... }

readUserInput() match {
  case Some(outputOftheFunction) => // Do something with outputOftheFunction
  case None => // Do something when there is no output
}
```

This example shows how to make a parameter of a function optional for the user. If not provided, it takes the predefined value `None`.

```scala
def show(x: Option[String], y: Option[Int] = None) = {
  x match {
    case Some(s) => println(s)
    case None => println("?")
  }
  y match {
    case Some(number) => println(s"the number is ${number}")
    case None => println("no number")
  }
}

val myInput1 = Some("hello bear")
show(myInput1)          // hello bear / no number
show(None)              // ? / no number
show(None, y = Some(123)) // ? / the number is 123
```

2- Extracting the value of an `Option` using the `getOrElse` method. A default value for the `None` case is mandatory.

```scala
val someValue1 = Some(123)
val someValue2: Option[Int] = None
someValue1.getOrElse(100) // 123
someValue2.getOrElse(100) // 100
```

# Modeling data with traits

# Curves

**Note:** by \R^n, we indicate a Euclidean geometrical space, i.e. a space of (geometric) points with an origin, equipped with a normed vector space.

## 1- Parametrized smooth curves

**Definition:** a parametrized smooth curve is a function \gamma:I\sub\R\to\R^n where I=(a,b) is an open interval in extended \R and \gamma is a smooth function.

Note that a parametrized curve (always smooth) refers to the *function* \gamma(t)=(\gamma_1(t), \dots,\gamma_n(t))\in \R^n, not to the function's image \gamma(I), which is a set of points in \R^n. By a *curve* we mean a *parametrized smooth curve* unless otherwise stated. A curve in \R^n means a curve (function) whose image is in \R^n.

**Definition:** a parametrized curve \tilde{\gamma}:(\tilde{t_1},\tilde{t_2})\to\R^n is a reparametrization of the curve \gamma:(t_1,t_2)\to\R^n if there is a smooth bijection \phi:( \tilde{t_1},\tilde{t_2} )\to(t_1,t_2) such that the inverse map \phi^{-1}:( t_1,t_2)\to (\tilde{t_1},\tilde{t_2}) is also smooth and \gamma(\phi(\tilde{t}))=\tilde{\gamma}(\tilde{t}) \quad \forall \tilde{t} \in (\tilde{t_1},\tilde{t_2})\sub \R. Note that t = \phi(\tilde{t}). This indicates that \gamma is also a reparametrization of \tilde\gamma. Also note that \phi is monotonic. If it is decreasing, we have t_2 = \phi(\tilde{t_1}) \gt t_1 = \phi(\tilde{t_2})\text{ for } \tilde{t_1}\lt \tilde{t_2}. This indicates that a reparametrization can change the relative direction of a curve.

A curve and its reparametrizations have the same image and hence the same geometrical properties.

Note: \tilde{\gamma}(\tilde{t}):=\gamma(\phi(\tilde{t}))=(\gamma_1(\phi(\tilde{t})),\dots,\gamma_n(\phi(\tilde{t})))=(\tilde{\gamma_1}(\tilde{t}),\dots,\tilde{\gamma_n}(\tilde{t}))
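For a concrete instance of these definitions, consider the unit circle with a doubled parameter (an assumed example, not from the text above):

```latex
\gamma(t) = (\cos t, \sin t),\quad t \in (0, 2\pi) \\
\phi(\tilde{t}) = 2\tilde{t},\quad \phi : (0, \pi) \to (0, 2\pi),\quad \phi^{-1}(t) = t/2 \\
\tilde{\gamma}(\tilde{t}) = \gamma(\phi(\tilde{t})) = (\cos 2\tilde{t}, \sin 2\tilde{t})
```

Both \phi and \phi^{-1} are smooth bijections, so \tilde{\gamma} is a reparametrization of \gamma; both curves have the same image.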

#### Arc length and unit speed curves

For any parameter t, \dot\gamma(t):=\frac{d}{dt}\gamma(t).

The arc length function, starting at the point \gamma(t_0), is defined as:

s(t):=\int_{t_0}^{t}\|\dot\gamma(u)\|\,du

which can be negative or positive depending on whether t is larger or smaller than t_0.

From a physical point of view, the trajectory of a moving point can be represented by a curve; therefore, the speed of a moving point along a curve (defined as the rate of change of its distance along the curve) is:

\frac{ds}{dt}=\|\dot\gamma(t)\|

A unit-speed curve is by definition a curve with \|\dot{\gamma}(t)\|=1, \forall t\in I, i.e. \dot{\gamma}(t) is a unit vector. In applications for the sake of geometry, not physics, the speed of a curve is not important; therefore, considering a unit-speed parametrization of the curve simplifies formulas and proofs. For example, the following proposition is often useful in proofs:

**Proposition:** let n(t)\in\R^n\text{ with }\|n(t)\|=1,\forall t\in I be a smooth function of t; then n(t)\cdot\dot{n}(t)=0, meaning that \dot{n}(t) is either perpendicular to n(t) or zero for all t. The proof follows by differentiating n(t)\cdot n(t)=1. In particular, the proposition implies \dot{\gamma}(t)\cdot \ddot{\gamma}(t)=0 when the curve is unit-speed.

**Definition:** A point P:=\gamma(t) of a curve \gamma is called a regular point iff \dot\gamma(t)\neq 0; otherwise it is a singular point. A regular curve is a curve all of whose points are regular.

**Propositions:** A reparametrization of a regular curve is regular, and the arc length function of a regular curve is smooth. A regular curve has a unit-speed reparametrization, and a unit-speed curve is a regular curve. The arc length s is essentially the only unit-speed parameter of a regular curve: any unit-speed parameter u satisfies u=\pm s + c for some constant c.

From the above, the tangent vector of a regular curve (and the speed of an object moving along it; it's moving, not stationary) does not vanish at any point.

## 2- Curvature and torsion of a curve

The curvature is a measure of the extent to which a curve (as a geometric object) is not contained in a straight line. This imposes zero curvature for straight lines. Torsion quantifies the extent to which a curve is not contained in a plane. Hence, plane curves have zero torsion.

To find a measure of curvature, a plane curve is considered. The results can easily be extended to curves in \R^3 and higher-dimensional curves (by extending the definition).

Let \gamma(t)\in \R^2 be a unit-speed curve, i.e. t\equiv s. For any parameter value t corresponding to a point P=\gamma(t), a change \Delta t in the parameter gives another point of the curve P_{\Delta}:=\gamma(t+\Delta t). The distance between the latter point and the tangent line at the former point is defined as the shortest distance between the line and the point: the length of the perpendicular line segment from the point to the tangent line. Therefore, a measure of the distance is (\gamma(t+\Delta t)-\gamma(t))\cdot \hat n where \hat n is the unit vector perpendicular to the tangent line (\dot\gamma(t)\cdot \hat n=0). This is a signed distance. By Taylor's theorem:

\gamma(t+\Delta t)=\gamma(t)+\dot\gamma(t)\Delta t+\frac{1}{2}\ddot\gamma(t)(\Delta t)^2+\text{higher-order terms}

Therefore, (\gamma(t+\Delta t)- \gamma(t))\cdot \hat n \approx \frac{1}{2}(\Delta t)^2\ddot\gamma(t)\cdot\hat n. Because \gamma(t) is a unit-speed curve, we have \dot\gamma \perp \ddot\gamma, which indicates \ddot\gamma \parallel \hat n. Therefore:

(\gamma(t+\Delta t)-\gamma(t))\cdot \hat n \approx \pm\frac{1}{2}\|\ddot\gamma(t)\|(\Delta t)^2

So the magnitude of the distance, i.e. \frac{1}{2}\|\ddot\gamma(t)\|(\Delta t)^2, indicates the deviation of the curve from its tangent line (at a particular point). For a fixed change \Delta t in the parameter, and consequently in the point, the quantity \|\ddot\gamma(t)\| expresses the scale of the deviation from the tangent line, which is a straight line. This suggests the following definition:

**Definition:** for a unit-speed curve \gamma(t)\in \R^n, its curvature is defined as \kappa(t):=\|\ddot\gamma(t)\|.
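As a quick check of this definition (a standard worked example, not from the text above), a circle of radius R, parametrized at unit speed, has constant curvature 1/R:

```latex
\gamma(t) = \left(R\cos\frac{t}{R},\ R\sin\frac{t}{R}\right),\qquad
\dot{\gamma}(t) = \left(-\sin\frac{t}{R},\ \cos\frac{t}{R}\right),\qquad \|\dot{\gamma}(t)\| = 1 \\
\ddot{\gamma}(t) = -\frac{1}{R}\left(\cos\frac{t}{R},\ \sin\frac{t}{R}\right)
\quad\Rightarrow\quad \kappa(t) = \|\ddot{\gamma}(t)\| = \frac{1}{R}
```

The smaller the circle, the larger its curvature, while a straight line (the limit R\to\infty) has \kappa=0.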

For any regular curve (which can always be reparametrized to a unit-speed curve), the curvature at a point is the magnitude of the rate of change of the unit tangent vector at that point. For a unit-speed curve, it is simply \|\ddot\gamma(t)\|. For a general regular curve in \R^3 though:

\kappa(t)=\frac{\|\dot\gamma(t)\times\ddot\gamma(t)\|}{\|\dot\gamma(t)\|^3}

#### The direction of \ddot\gamma(t)

The direction of \ddot \gamma is the sense of the direction of the turning of the curve when traversed in its forward direction. The forward direction of a curve is the sense of moving from one point to the next point. The next point of a point \gamma(t) is \gamma(t+\delta t) for positive or negative \delta t.

The curvature of a straight line is zero: a unit-speed line has \dot\gamma=c for a constant unit vector c\neq 0, so \ddot\gamma=0 and hence \kappa=\|\ddot\gamma\|=0.

# Inquiring into a triangular mesh

All code is written in Python and based on the 3D plotting and mesh analysis library **PyVista**.

## Cells connected to a node

To speed up finding the cells connected to a node, I initialize a dictionary with items `{pointId: {cellIds}}`. Using a set for `cellIds` leads to quick updates and allows set operations such as unions.

```python
numOfPoints = mesh.n_points
meshPointIds = range(numOfPoints)
# each face is stored as [3, id0, id1, id2]; keep only the vertex ids
meshCells = mesh.faces.reshape(-1, 4)[:, 1:4]
pntIdCellIdDict = {pntId: set() for pntId in meshPointIds}
for cellIndx, cell in enumerate(meshCells):
    for pntId in cell:
        pntIdCellIdDict[pntId].add(cellIndx)
```
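A self-contained sketch of the same lookup, using a small hypothetical faces array in PyVista's flat format instead of a real mesh (three triangles sharing vertices), so it runs without PyVista:

```python
import numpy as np

# hypothetical flat faces array in PyVista's format [3, id0, id1, id2, ...]
faces = np.array([3, 0, 1, 2,   # cell 0
                  3, 1, 2, 3,   # cell 1
                  3, 2, 3, 4])  # cell 2
meshCells = faces.reshape(-1, 4)[:, 1:4]
numOfPoints = 5

pntIdCellIdDict = {pntId: set() for pntId in range(numOfPoints)}
for cellIndx, cell in enumerate(meshCells):
    for pntId in cell:
        pntIdCellIdDict[pntId].add(cellIndx)

# cells connected to node 2 (all three triangles share it)
print(pntIdCellIdDict[2])  # {0, 1, 2}
# union of the cells touching nodes 1 and 3
print(pntIdCellIdDict[1] | pntIdCellIdDict[3])  # {0, 1, 2}
```

The set union at the end is exactly the "union properties" the dictionary was built for, e.g. collecting all cells around an edge.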

# Python common tips

## Modules

Note that `from myModule import *` shouldn't be used. If done, all the global variables defined in `myModule` will appear in the current namespace. Modules should not pollute each other's namespaces.

## Lists and npArrays

**1-** Lists are (much) faster than npArrays when accessing or appending elements. However, npArrays are faster when doing linear algebra operations. It is more efficient to manipulate data in the form of lists and then convert them into npArrays when dealing with large data.
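A minimal sketch of that pattern (numpy assumed available): accumulate with cheap list appends, then convert once for the linear algebra.

```python
import numpy as np

# build the data incrementally with fast list appends
rows = []
for i in range(4):
    rows.append([i, i + 1, i + 2])

# convert once, then do the vectorized linear algebra on the array
A = np.array(rows)        # shape (4, 3)
col_sums = A.sum(axis=0)  # vectorized operation
print(A.shape)   # (4, 3)
print(col_sums)  # [ 6 10 14]
```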

## Set

**1- A set of sets:** `frozenset()`

```python
listOfSets1 = [{5}, {1, 2}, {3, 4}, {5, 7, 9}]
listOfSets2 = [{11}, {1, 2}, {12, 16}, {5, 7, 9}, {7, 23, 13}]
setOfSets1 = {frozenset(eachSet) for eachSet in listOfSets1}
setOfSets2 = {frozenset(eachSet) for eachSet in listOfSets2}
setOfSets1.intersection(setOfSets2)
# Out[0]: {frozenset({5, 7, 9}), frozenset({1, 2})}
```

## Functions

**1- Type hinting**

```python
def myFunction(text: str, flag: bool, name: str = "Aristotle") -> str:
    pass
```

## Operators

**1- The results of different operators on objects.** Objects (e.g. lists, dictionaries, other class objects, etc.) with the same value are usually stored at separate memory addresses.

```python
L1 = [1, 2, 3]
L2 = [1, 2, 3]
print(L1 == L2)      # True
print(L1 is L2)      # False
print(L1 is not L2)  # True

L1 = [1, 2, 3]
L2 = [6, 7, 8]
print(L1 != L2)      # True
print(L1 is not L2)  # True
```

**Useful Functions**

`zip()` pairs up iterables and gives an iterator (https://realpython.com/python-zip-function/).
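For instance:

```python
names = ["a", "b", "c"]
values = [1, 2, 3]
pairs = zip(names, values)  # an iterator of tuples
print(list(pairs))          # [('a', 1), ('b', 2), ('c', 3)]

# common use: parallel iteration over two iterables
for name, value in zip(names, values):
    print(f"{name} -> {value}")
```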

## I/O

**Save an array to a text file.**

Method 1: use `np.savetxt()`
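A minimal round-trip sketch (the file name is arbitrary, chosen here for illustration):

```python
import numpy as np

arr = np.array([[1.5, 2.0], [3.25, 4.0]])
# write the array as plain text; fmt controls the number format
np.savetxt("myArray.txt", arr, fmt="%.4f", delimiter=" ")

# read it back to verify the round trip
loaded = np.loadtxt("myArray.txt")
print(np.allclose(arr, loaded))  # True
```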

# Bayesian inference: Introduction

### On the notations

Consider the following function:

f(x;\theta)\ \text{ s.t. }\ x\in A,\ \theta\in\Theta

The set A is the domain of the function and x is its argument. The set \Theta is the parameter space, and the parameters \theta of the function belong to that set. For example, f(x,y;\alpha,\beta):=\alpha \cos(x^y/\beta)\ \text{ s.t. } (\alpha,\beta)\in \R^2 has two parameters. The following notation is sometimes used to differentiate between the argument and parameter variables of a function:

f(x|\theta)

It reads "*f* of *x* given \theta".

In probability theory, the concept of conditional probability also uses the same notation, i.e. P(A|B), which reads the probability of the event *A* given that the event *B* has occurred; or, the probability of (the event) A conditional on (the event) B. In the case of random variables/vectors, we write P(Y\in R|X\in S).

The definition of probability density/mass functions also uses the aforementioned notation. Consider a random variable/vector X with a probability density function (pdf) having some parameters. Then the pdf is written as:

f_X(x|\theta)

It should be understood from the context that \theta is not a random variable/vector, and hence the above does not imply a conditional pdf. As \theta pertains to a probability distribution (model), another letter is sometimes added to the above notation in order to include the model's type or some information on the model:

f_X(x|\theta,M)

A conditional pdf is identified with the same notation. Considering two random variables/vectors X and Y, we write:

f_{X|Y}(x|y)

which is a function of x if y is already fixed. This reads the pdf of X given -the observed and fixed value of- Y. Note that the notation f_{X|Y}(x|y) is just a function notation.

A conditional pdf (as a result of normalizing a joint pdf which has parameters) has parameters; if the parameters of the conditional pdf are collected in a vector denoted by \theta, the conditional pdf can be represented as:

f_{X|Y}(x|y;\theta)

such that it is a function of x for a fixed value of the random variable y and fixed values of the parameters \theta. So, the notation “|” is used to indicate both the parameters of a function and conditions on/of random variables involved in a pdf. Whatever precedes “|” is (are) the function argument(s). Henceforward, I use the notation “|” for both of the purposes and the context will imply the meaning.

### Parametric inference

For a continuous random vector X\in R\sub \R^n, a continuous joint pdf f_X(x) can be defined as:

f_X(x)\equiv f_X(x;\theta)

where \theta \in D_\theta is a set of *parameters* involved in the definition of the pdf.

The form f_X(x;\theta) can also be interpreted as a function of \theta at a constant x:

L(\theta|x):=f_X(x;\theta)

This function is called the **likelihood function**. The notation L(\theta |x) reads the likelihood of \theta given the (observed) data x. Because \theta is not a random variable/vector, the concept of a conditional pdf is irrelevant here. This is a function of the parameter \theta, not x; hence, it is not a density function. Depending on the variable in consideration, the function f_X(x;\theta) can be interpreted as either the density function or the likelihood function. It should be noted that the parameter(s) belong to a model; this is usually implied implicitly. For example, the pdf of a normally distributed random variable has two parameters \mu and \sigma^2.

Usually, the parameters of a probability distribution are unknown. Therefore, there is uncertainty about the parameters. The parameters can then be collected in a random vector \Theta. Now both the data and the parameters are random vectors (or variables). A joint pdf can be imagined for the data and the parameters of the pdf of the data:

f_{X,\Theta}(x,\theta|M)

The notation |M reads “given a/the model *M*“. This notation is used merely to convey the fact that \theta pertains to a model and also to express extra information about the model; indeed, it does not indicate a (quantitative) conditional on *M*.

Consequently, the following conditional pdf of \Theta given the data X=x can be written as:

f_{\Theta|X}(\theta|x,M)=\frac{f_{X,\Theta}(x,\theta|M)}{f_X(x|M)}

For a given x, i.e. data, the above is a function of \theta. Applying the conditional pdf law once more, we can write (Eq. 2):

f_{\Theta|X}(\theta|x,M)=\frac{f_{X|\Theta}(x|\theta,M)\,f_{\Theta}(\theta|M)}{f_X(x|M)}

*M* indicates that a model (joint distribution of data and parameters) is already assumed; it can be an implicit assumption. Also, any other background information about the model (e.g. the range of the parameters) is integrated into *M*. The terms are named and explained as:

**1)** **The likelihood:** f_{X|\Theta}(x|\theta) is called the **sampling distribution**. This is the distribution of the (observed) data conditional on the parameters. In other words, it tells us the pdf value of the (measured) data given some values of the parameters. This term is the pdf of the data, x, conditioned on \theta; however, once the data have been observed/measured/given and hence become a fixed variable, the term becomes a function of the parameters (\theta). Therefore, this term is referred to as the **likelihood of the data**. There is a subtle point here. The term is naturally a pdf in x and hence can be imagined as:

h_X(x;\theta):=f_{X|\Theta}(x|\theta)

Therefore, h_X(x;\theta) becomes a likelihood function if considered as a function of \theta, i.e. L(\theta|x):=h_X(x;\theta) (once the data have been observed/measured, it automatically becomes a function of the parameters). This likelihood function is the key function describing both the phenomenon (expressed by the model) and the measurements. Sometimes it is referred to as the likelihood of the parameters; but that is not really a suitable name because the likelihood uses the same functional form as the conditional pdf in the data. Moreover, the unit of f_{X|\Theta}(x|\theta) is D^{-1} where D is the unit of the data. This is because the unit of f_{\Theta}(\theta) is \frac{1}{M} and the unit of f_{X,\Theta}(x,\theta) is \frac{1}{D}\frac{1}{M}.

**2) The prior:** f_\Theta(\theta) is called the **prior distribution** of \Theta. This expresses the distribution of \Theta before observing the data (regardless of the data). It encapsulates the information we have or assume about the possible values of the model parameters regardless of the data.

**3) The evidence:** f_X(x) is the distribution (marginal pdf) of data. In mathematical notation: f_X(x|M)= \int_{\text{all values of }\theta} f_{\Theta,X}(\theta,x|M)d\theta. The evidence is not a function of the parameters, therefore, it is just a normalization constant for the posterior.

**4)** **The posterior:** f_{\Theta|X}(\theta|x) is called the **posterior distribution** (pdf) of \Theta. This is the pdf of the parameters \Theta (function of the parameters) given the model information and (observed) data.

Noting and keeping in mind that each of the different pdfs in Eq. 2 is a function of the variable preceding "|" and pertains to the pdf of the function's argument, we usually write the equation as (f is replaced with p) (Eq. 3):

p(\theta|x,M)=\frac{p(x|\theta,M)\,p(\theta|M)}{p(x|M)}

By the aforementioned notation, the term(s) after “|” conveys that the pdf is a conditional pdf if the term(s) pertains (be a value of) a random vector/variable. E.g: p(x|\theta,M)\equiv p_{X|\Theta}(x|\theta,M).

For given/measured data (fixed), the numerator of Eq. 3 is a function of the parameters while the denominator is a constant. Therefore, the following representations are defined and usually used in inference:

p(\theta|x,M)\propto p(x|\theta,M)\,p(\theta|M)\ ,\qquad \text{posterior}\propto\text{likelihood}\times\text{prior}
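To make Eq. 3 concrete, here is a hedged numerical sketch (a made-up example, not from the text): a coin's bias \theta on a discrete grid, a uniform prior, and a binomial likelihood for 7 heads in 10 tosses; the evidence is just the normalizing sum over the grid.

```python
from math import comb

# discrete grid over the parameter theta (coin bias)
thetas = [i / 100 for i in range(101)]
prior = [1 / len(thetas)] * len(thetas)          # uniform prior p(theta|M)

heads, tosses = 7, 10
likelihood = [comb(tosses, heads) * t**heads * (1 - t)**(tosses - heads)
              for t in thetas]                   # p(x|theta,M)

unnormalized = [l * p for l, p in zip(likelihood, prior)]
evidence = sum(unnormalized)                     # p(x|M), a constant
posterior = [u / evidence for u in unnormalized] # p(theta|x,M)

# the posterior is a proper (discrete) distribution peaking near 0.7
print(abs(sum(posterior) - 1.0) < 1e-12)        # True
print(thetas[posterior.index(max(posterior))])  # 0.7
```

Dividing by the evidence changes none of the relative weights, which is why the proportional form above suffices for comparing parameter values.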

### Application: Data modeling with parametric models

A **generative model** (also called the forward model) is the theoretical agent (usually a mathematical equation) that simulates/generates/predicts the observable data; the model is identified by its form (equation) and its parameters. A general form of a **non-stochastic** or **deterministic generative model** can be written as:

y_s=f(x;\theta_{f})

where y_s is the simulated data/measurements, x is the index at which the data is measured (e.g. time, a point in space, etc.), and \theta_{f} \in \R^k contains the parameters of the function (model). A **stochastic model** possesses some randomness; unlike a deterministic model that produces the same output for the same parameters (and initial conditions), a stochastic model produces different outputs. Stochastic models predict distribution(s) over the outputs.

A generative model predicts the data in the absence of measurement noise, i.e. y\approx y_s for measurements y. It means that the model gives the true value while the measurements are polluted with noise, introducing uncertainty into the measurements. We should distinguish between the **true/ideal data** and the **noisy data/measurements**. Noisy data (out of measurements) is what we usually have. The model form and its parameters are always determined based on the observed noisy data/measurements. Therefore, we should know how the measurement process influences the true data/values. To this end, a **measurement model**, also called a **noise model**, is also to be defined. This model contains the uncertainty/noise in the measured data. The measurement model can then be written as y=y_s+\epsilon. The variable \epsilon is actually a random variable/vector introducing the noise. Consequently, the measurement y becomes a random variable/vector as well. Having said that, and using capital or bold letters for random vectors/variables and small or non-bold letters for non-random variables, we write:

Y=f(x|\theta_{f})+\boldsymbol\epsilon\ ,\qquad \boldsymbol\epsilon:\Omega\to\R

where \Omega is a probability sample space. Note that f(x|\theta_{f}) is a function of x, not a pdf. \bold y is inherently a stochastic process/random field, i.e. at each x (index), a random variable exists. The noise \epsilon has a distribution and indeed a pdf, p_{\boldsymbol \epsilon}(\epsilon|\theta_{\epsilon}), with its own parameters \theta_{\epsilon}. In such a model, the noise average is set to be zero, i.e. E[\epsilon]=0; therefore, the mean of a measurement at each point is equal to the ideal data. This is simply because the simulating part is not a random vector, hence E[Y]=f(x;\theta_{f})+0.

Because y_s is not a random variable/vector, the probability distribution of Y and of the noise (model) are the same with different means. In such a data modeling, the noise distribution is assumed (known) and the simulating function parameters are sought. **This approach is not Bayesian**.

A **Bayesian approach to data modeling** arises when the parameters of the generative model (simulating function) are also considered to be random variables, \Theta_f; this means there is uncertainty introduced into the parameters, hence a probability model is assigned to the parameters. The data model can now be written as:

Y_s=f(x|\Theta_f)\ ,\qquad Y=Y_s+\boldsymbol\epsilon

where x indexes Y_s. Now the measurement model is the result of the sum of two random variables/vectors, i.e. a function of a random vector/variable, and the noise. The random variable/vector \Theta_f has a pdf with its own parameters:

p_{\Theta_f}(\theta_f|w)

where w is a vector of parameters. The assumption on the probability distribution of the parameters of a generative model enforces a prior on the model’s parameters. This prior does not depend on the data or the noise.

The probability distribution of the noise is usually/initially assumed. What remain unknown are the probability distributions of Y and \Theta_{f}. In this regard, the joint probability distribution and hence the joint pdf of these two unknown random variables/vectors can be considered at each index x:

p_{Y,\Theta_f}(y,\theta_f|M)

Using Eq. 3 (Bayes' theorem), we write:

p(\theta_f|y,M)=\frac{p(y|\theta_f,M)\,p(\theta_f|M)}{p(y|M)}

where the assumed/preset parameters (in Eq. 8) are integrated into the model information.

The term p(y|\theta_f,M), namely the likelihood of the data when observed, can be written in terms of the noise pdf if the noise random vector/variable is assumed to be independent of the parameters; it is usually the case. Denoting the cumulative distribution function by F_X(.), writing g(\Theta_f):=f(x|\Theta_f), and considering the measurement model at one fixed indexing point x, we can write:

F_{Y|\Theta_f}(y|\theta_f)=P(Y\le y|\Theta_f=\theta_f)=P(\boldsymbol\epsilon\le y-g(\theta_f))=F_{\boldsymbol\epsilon}(y-g(\theta_f))\ \Rightarrow\ p(y|\theta_f,M)=p_{\boldsymbol\epsilon}(y-g(\theta_f))

It is worthy of note that if the parameter vector \Theta_f is not a random vector (as in the deterministic model above), but just a deterministic parameter, denoted by \theta_f, then at a particular point indexed by x, we have Y=g(\theta_f)+\boldsymbol \epsilon. This Y is different from the random vector Y=Y_s+\boldsymbol \epsilon; therefore we denote it by Y^*. For a deterministic \theta_f, we have p_{Y^*}(y^*)=p_{\boldsymbol \epsilon}(y^*-g(\theta_f)), because:

F_{Y^*}(y^*)=P(g(\theta_f)+\boldsymbol\epsilon\le y^*)=P(\boldsymbol\epsilon\le y^*-g(\theta_f))=F_{\boldsymbol\epsilon}(y^*-g(\theta_f))

As \theta becomes the parameter of the function p_{Y^*}(y^*), we can write p_{Y^*}(y^*|\theta)= p_{\boldsymbol \epsilon}(y^*-g(\theta)). Comparing with the case of a probabilistic parameter, we can say that for a given value of \Theta, the random vector Y becomes Y^*, i.e. Y|\Theta=\theta with the pdf above. In other words,

Y|\Theta=\theta\ \sim\ p_{\boldsymbol\epsilon}(y-g(\theta))

Note that for any two random vectors X and Y, the term Y|X=x (or vice versa) is another random vector, like Z:=Y|X=x, with the support \{z=(s,y):s=x \text{ and } y\in \text{support of } Y\}.

In either approach to the parameters, Y or Y^* is used to model the data. Therefore, samples of observed data values are substituted for their values in order to find the model parameters.

It is common to assume a (multivariate) Gaussian distribution for the noise, \mathscr N(0,\Sigma_{\epsilon \epsilon}). The noise distribution then has a single parameter \Sigma_{\epsilon \epsilon}. Also, it is assumed that the noises (the random vector) of all measurements (a vector of measurements indexed by x) are *iid*, meaning that they are identically distributed, i.e. have the same variance, and probabilistically independent of each other; the independence implies that the covariance matrix of the noise is diagonal. In this regard (for y or y^*):

p(y|\theta_f,M)=\mathscr N(y-g(\theta_f);0,\Sigma_{\epsilon\epsilon})

and with the *iid* assumption for the noise, its covariance matrix becomes diagonal and isotropic, having \sigma_{\epsilon}^2 as its diagonal elements:

\Sigma_{\epsilon\epsilon}=\sigma_{\epsilon}^2 I

One of the simplest models is a straight line; it simulates 1D data, like temperatures measured at different moments. The model is

y_s=f(x;\theta_0,\theta_1)=\theta_0+\theta_1 x
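As a hedged sketch of the setup above for a straight-line generative model with iid Gaussian noise (made-up data points and an assumed noise level, for illustration only):

```python
from math import log, pi

# hypothetical 1D measurements (e.g. temperature vs time) and noise level
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]   # roughly the line 1 + 2x plus noise
sigma = 0.5                  # assumed known noise standard deviation

def log_likelihood(theta0, theta1):
    """log p(y|theta): sum of iid Gaussian log-densities of the residuals."""
    ll = 0.0
    for x, y in zip(xs, ys):
        resid = y - (theta0 + theta1 * x)  # y - y_s at this index
        ll += -0.5 * log(2 * pi * sigma**2) - 0.5 * (resid / sigma) ** 2
    return ll

# parameters close to the data-generating line score higher
print(log_likelihood(1.0, 2.0) > log_likelihood(0.0, 0.0))  # True
```

In the non-Bayesian approach one would maximize this likelihood over (\theta_0,\theta_1); in the Bayesian approach it would be multiplied by a prior p(\theta_0,\theta_1|M) to form the posterior.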

To be reviewed and continued…

# SSM Using Scalismo (5): Creating a point distribution model

Finally, I arrive at statistical shape models. I am going to create a point distribution model (PDM) of shapes. Reading some theory on point distribution models, I need **superimposed shapes** and their **mean shape**. Superimposed shapes and their mean shapes are available from my previous tutorial on GPA with Scalismo. The outcomes of the full GPA are going to be utilized for creating a PDM. The PDM in this tutorial uses shapes (not forms), and it is based on the multivariate normal distribution of the points' coordinates collected in a shape vector (Eq. 16 in this post).


I start with initializing Scalismo and loading the superimposed configurations, i.e. shapes. Following the previous tutorial, I use the femoral head shapes.

```scala
object PDM extends App {
  import scalismo.geometry._
  import scalismo.common._
  import scalismo.ui.api._
  import scalismo.mesh._
  import scalismo.io.{StatisticalModelIO, MeshIO}
  import scalismo.statisticalmodel._
  import scalismo.registration._
  import scalismo.statisticalmodel.dataset._
  import scala.io.StdIn.{readLine, readInt}

  scalismo.initialize()
  implicit val rng = scalismo.utils.Random(42)
  // ui
  val ui = ScalismoUI()

  // Loading data
  val drive = "I:/"
  val meshDir = drive + "registration/Full GPA/"
  val unsortedMeshFiles = new java.io.File(meshDir).listFiles()
  unsortedMeshFiles.foreach(file => println(s"\n$file"))

  // specifying the file name of the mean mesh generated through the full GPA
  val meanMeshName = "meanMesh.stl"
  val meshFileNames = unsortedMeshFiles.map(file => file.getName())
  val indexOfMeanMesh = meshFileNames.indexOf(meanMeshName)
  // bringing the meanMesh to the head (first entry) of the meshFiles sequence
  // by swapping it with the current first entry
  val meshFiles = unsortedMeshFiles
    .updated(0, unsortedMeshFiles(indexOfMeanMesh))
    .updated(indexOfMeanMesh, unsortedMeshFiles(0))
```

I have to choose a reference mesh and some training meshes. The full GPA mean mesh is chosen as the reference mesh and the three (registered and superimposed) meshes as the training meshes.

```scala
// specifying the reference mesh and the training meshes;
// the full GPA mean mesh is set to be the reference mesh
val refMesh = MeshIO.readMesh(meshFiles.head).get
val trainingMeshes = meshFiles.tail.map(file => MeshIO.readMesh(file).get) // the other 3 meshes
```

Once a reference mesh and training meshes are chosen, the deformation vector from each point of the reference mesh to its corresponding point of each training mesh should be calculated. Thereby, the deformation fields morphing the reference mesh to the training meshes are obtained.

```scala
// [The meshes are already aligned using the full GPA, and in correspondence,
// so we can create a discrete GP]
// creating the deformation vector fields
val deformationFields = trainingMeshes.map(mesh => {
  val deformationVectors = refMesh.pointSet.pointIds.map { id =>
    mesh.pointSet.point(id) - refMesh.pointSet.point(id)
  }.toIndexedSeq
  // a discrete deformation field needs a domain and the deformation
  // vectors at each point of the domain
  DiscreteField[_3D, UnstructuredPointsDomain[_3D], EuclideanVector[_3D]](refMesh.pointSet, deformationVectors)
})
```

The calculated deformation fields are samples (realizations) of a discrete *random field*. The domain of the random field is the reference mesh points. The random field of deformations is assumed to have a multivariate normal distribution. Here, the discrete random field is expressed by a random vector. Thereby, a shape, described by a shape vector, can be produced by adding the shape vector of the reference shape to the normally distributed random vector of deformations expanded by the deformation modes (this is explained by Eq. 16 in this post).

Technically, Scalismo works with the concept of a Gaussian process (GP), which is a continuous random field any finite sub-collection of whose random variables (here, deformations) has a multivariate normal (Gaussian) distribution. Therefore, continuous sample deformation fields should be approximated based on the known discrete realizations/samples of the random field. By interpolating a discrete field, Scalismo creates a continuous field. Different interpolation approaches are possible; Scalismo uses the nearest-neighbor interpolation method. The following code block defines a discrete GP (of the deformations), which is in fact a normally distributed random vector with elements indexed by the reference mesh points (a discrete domain). The discrete GP (random vector of deformations) is expressed in terms of its principal axes (modes). This is done using PCA as described in this post (see Eq. 16).

```scala
val interpolator = NearestNeighborInterpolator[_3D, EuclideanVector[_3D]]()
val continuousDefFields = deformationFields.map(field => field.interpolate(interpolator))
val gp = DiscreteLowRankGaussianProcess.createUsingPCA(refMesh.pointSet, continuousDefFields)
```

As I have 3 training meshes and therefore 3 training deformation fields, the rank of the sample covariance matrix of deformations is at most 3. This means that at most 3 principal axes are used to express the random vector of deformations.

I can sample the GP (random field) of deformations. By adding a sample deformation field to the reference mesh, I can generate a shape vector. This shape vector is a sample mesh from meshes whose points’ coordinates are normally distributed. Sampling some meshes from the *mesh distribution*, I’ll have the following realization:

```scala
val PDMGroup = ui.createGroup("refMeshGroup")
val refMeshView = ui.show(PDMGroup, refMesh, "refMesh")
refMeshView.color = java.awt.Color.GREEN
val gpDefView = ui.addTransformation(PDMGroup, gp, "RefMeshDefrmdByGp")
```

The horizontal bar-like controls in the figures are proportional to the coefficients of the deformation modes (b_i=\alpha_i\sqrt{\lambda_i}). Each coefficient varies within three standard deviations (Eq. 4 in this post).

Scalismo assembles the reference mesh and the discrete GP as a statistical mesh model:

```scala
// creating a statistical shape model (PDM)
val model = StatisticalMeshModel(refMesh, gp.interpolate(interpolator))
```

This is the statistical PDM.

# Point Distribution Models of Shapes and Forms

A **point distribution model (PDM)** is a **model** that locates (or determines the geometrical (spatial) distribution of) the points of the shapes or forms (size-and-shapes) in a particular shape or form population. The word *distribution* refers to the spatial distribution of the points, not a probability distribution. This is my best understanding of the definition of PDMs so far.

A PDM approximates the geometrical distribution of the points of shapes/forms in a population. This means that a PDM is intended to model/express all the possible shapes/forms in the population.

A PDM entails shapes or forms. This means we are working with **superimposed configurations** modulo their location, rotation, and scale information in the case of shapes, and modulo their location and rotation information in the case of forms. Therefore, we assume that shapes or forms are available through performing a full or partial generalized Procrustes analysis (GPA) on the given configurations. In other words, a shape is an instance of the full Procrustes fits, and a form is an instance of the partial Procrustes fits. Henceforth, whatever is said about shapes in this article also holds for forms unless stated otherwise.

As explained, a shape can be (approximately) represented using landmarks, i.e. a discrete set of points. By choosing a 3D Cartesian coordinate system (CS) with orthogonal basis vectors and *x*, *y*, and *z* coordinates (axes), the landmark coordinates can then be collected in a matrix. One can readily think of collecting the coordinates in a column vector (in fact, every matrix is isomorphic to a vector). Therefore, a shape in a set of shapes, all having the same number *N* of landmarks, is represented as a vector called the **shape vector** [1]:

X=[x_1\ y_1\ z_1\ \dots\ x_N\ y_N\ z_N]^\text T\in \mathbb{R}^{3N\times 1}\tag{1}

See this post for linear algebra conventions and identities.

A shape vector X is a member of the vector space \mathbb{R}^{3N\times 1}. Therefore, there are bases, and X can be written as a linear combination of the basis vectors. By default, the *3N*-D X vector is born in the vector space spanned by the **standard basis**. Indeed, other bases are possible.

Using a basis \{v_1,\dots,v_{3N}\} for the vector space, we can write any shape vector as a deviation from an arbitrary constant (shape) vector a\in \mathbb{R}^{3N\times 1} :

X=a+\sum_{i=1}^{3N}b_iv_i\ ,\ b_i\in \mathbb{R}\\ \text {or}\\ X=a+Vb\quad ,\quad \text{s.t. }\ V\in\mathbb{R}^{3N\times 3N},\quad V_{,i}=v_i,\quad v_i^\text Tv_j=\delta_{ij}

It should be noted that a is a *point* (position vector) and Vb is a *free vector* in the context of geometrical modelling.

On the other hand, X is a random vector with a mean \mu_X and a covariance \Sigma_{XX}. The constant vector a can be replaced with \mu_X; any other shape vector, like a known arbitrary shape vector in the population, can also be substituted for a. The basis vectors can be chosen such that the linear projection of X onto them generates random variables, each having the maximum possible variance. These vectors are obtained by PCA as a variance-maximization method. Because \Sigma_{XX} is symmetric, it has *3N* independent and mutually orthogonal eigenvectors, and hence any shape vector can be written as:

X=\mu_X+\sum_{i=1}^{3N}b_iv_i=\mu_X+Vb\quad \text{s.t. }\ \Sigma_{XX}=V\varLambda V^\text T\tag{2}

where the term V\varLambda V^\text T is the singular value decomposition (SVD) of \Sigma_{XX}, such that the eigenvalues \lambda_i, collected in the diagonal matrix \varLambda, are sorted in decreasing order. As the covariance matrix is symmetric, the SVD is equivalent to the spectral/eigenvalue decomposition of the matrix.
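As a toy, library-free illustration of this decomposition (all names here are mine, not Scalismo's), a 2×2 symmetric covariance matrix can be eigen-decomposed in closed form in plain Scala:

```scala
// Closed-form eigen-decomposition of a 2x2 symmetric matrix
// [[a, b], [b, c]] = V * diag(l1, l2) * V^T, with l1 >= l2.
def eigenSym2x2(a: Double, b: Double, c: Double): (Array[Double], Array[Array[Double]]) = {
  val tr   = a + c
  val det  = a * c - b * b
  val disc = math.sqrt(tr * tr / 4 - det)
  val l1   = tr / 2 + disc // largest eigenvalue
  val l2   = tr / 2 - disc
  // Eigenvector for l1: (b, l1 - a), unless b == 0 (matrix already diagonal)
  val v1 = if (b != 0) Array(b, l1 - a) else Array(1.0, 0.0)
  val n1 = math.sqrt(v1(0) * v1(0) + v1(1) * v1(1))
  val u1 = v1.map(_ / n1)
  val u2 = Array(-u1(1), u1(0)) // orthogonal complement in 2D
  (Array(l1, l2), Array(u1, u2))
}

// Sigma = [[2, 1], [1, 2]] has eigenvalues 3 and 1
val (lambdas, vs) = eigenSym2x2(2.0, 1.0, 2.0)
```

Reconstructing \Sigma_{XX}=\sum_i\lambda_iv_iv_i^\text T from the returned pairs recovers the original matrix, which is exactly the decomposition used in the modal representation.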

So far we have shown that a shape instance (represented by a shape vector) in the population can be generated by deforming the mean shape through a deformation field (vectors) determined by linear combinations of the eigenvectors of \Sigma_{XX}. The eigenvectors of \Sigma_{XX} are the principal (*3N*-dimensional) directions along which the variances of the linearly projected (random) shape vector are maximized. The principal directions are called **deformation modes** (deformations from the mean); they are also called **eigenshapes**. The combining coefficients b_i‘s are called the **shape model parameters**. The deformation modes weighted by the shape model parameters add up to a deformation vector that is added to the mean shape vector to generate a shape. Such a representation is called **the modal representation**; this is the idea behind a PDM.

By choosing proper shape parameters, the model generates shapes in the shape population. It should be noted that there are generally no restrictions on the values of b_i‘s; in other words, the deformation modes can be combined arbitrarily. However, a shape generated from arbitrary values of b_i‘s may not fall into the shape population; this is not difficult to imagine.

If we let U:=X-\mu_X, then U is a random vector expressing the deviation of a shape **from the mean shape** of the population. I call this vector the **deviation** or the **deformation shape vector**. This vector contains the deformation components (in the three directions of the CS) of the mean shape at each of its points. The mean of the deformation shape vector is zero, \mu_U=0; in fact, \Sigma_{XX}=\Sigma_{UU}. This means that the elements of \Sigma_{XX} are variances (or covariances) of the deformation components. U can be linearly projected onto the directions (unit vectors) defined by the eigenvectors of \Sigma_{XX}:

U_p=V^\text TU

and by considering Eq. 2, we can conclude (it is simply the result of the linear projection):

U=Vb\implies U_p=V^\text T Vb=I_{3N}\,b=b

Just as a side note: \mu_U=\mu_{U_p}=\mu_b=0.

On the other hand \Sigma_{bb}=\Sigma_{U_pU_p}=V \Sigma_{UU}V^\text T = \varLambda. This is what we can conclude:

\begin{gathered}X=\mu_X+Vb=\mu_X+\sum_{i=1}^{3N}b_iv_i \tag{3} \\ \ \\ \text{s.t. }\ \Sigma_{bb}=\varLambda= \begin{bmatrix}\lambda_1&0&\dots&0\\ 0&\lambda_2&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\dots&\lambda_{3N} \end{bmatrix}\quad \text{and}\quad \mu_b=0\\ \ \\ \text{with}\\ \ \\ \|b_iv_i\|=|b_i|\end{gathered}

Note that \mu_b=0 because \mu_U=0.

It is obvious that the magnitude of b_i controls the magnitude of deformation of the mean shape by each mode. From above, the standard deviation of b_i is \sqrt{\lambda_i}. It is assumed that a PDM generates **plausible shapes**, or **well-behaved shapes**, if the shape parameters are limited as:

-3\sqrt{\lambda_i}\le b_i\le 3\sqrt{\lambda_i}\tag{4}

A plausible shape is created by adding a plausible deformation shape vector to the mean shape vector; a plausible deformation shape vector is a deformation vector that generates shapes that still belong to the shape family under consideration.
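The three-standard-deviation box on the shape parameters can be sketched in plain Scala (toy values, illustrative names): out-of-range parameters are clamped back to |b_i|\le 3\sqrt{\lambda_i}.

```scala
// Clamp each shape parameter b_i into the plausibility box |b_i| <= 3*sqrt(lambda_i)
def clampToPlausible(b: Array[Double], lambdas: Array[Double]): Array[Double] =
  b.zip(lambdas).map { case (bi, li) =>
    val bound = 3.0 * math.sqrt(li)
    math.max(-bound, math.min(bound, bi))
  }

val lambdas = Array(4.0, 1.0)  // eigenvalues -> standard deviations 2 and 1
val b       = Array(7.0, -0.5) // first parameter exceeds 3 * 2 = 6
val bOk     = clampToPlausible(b, lambdas)
```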

The covariance matrix \Sigma_{XX} (and also \Sigma_{bb}) is positive semi-definite; therefore, \lambda_i\ge 0. This implies that the variance of some b_i‘s can be zero, i.e. the zero-variance b_i‘s are constant. In that case, b_i=\mu_{b_i}=0, simply because there is no variation about the mean. Therefore, a deformation mode associated with a shape parameter b_i with zero variance (\lambda_i=0) vanishes from the modal representation. Assuming *M* zero eigenvalues (counting algebraic multiplicity), we can write the modal representation of a shape (vector) as:

X=\mu_X+\sum_{i=1}^{3N-M}b_iv_i\tag{5}

### Shape approximation

A PDM is based on the modal representation of shapes (Eq. 2 or 3). This kind of representation is useful when it comes to approximating a shape. In this regard, only the deformation modes that are relatively more effective in forming a shape are kept, and the rest of the modes are ignored. We already showed (Eq. 5) that a zero eigenvalue of the covariance matrix leads to zero variation of the corresponding shape parameter and consequently the vanishing of the associated mode in the modal representation. This motivates us to throw out the shape parameters with relatively small variances, hence their associated modes, thereby approximating a shape vector by a *t*-term compact modal representation:

X\approx\tilde X=\mu_X+\sum_{i=1}^{t}b_iv_i\tag{6}

where v_i‘s are as in Eq.2.

Moreover, by the Least-squares optimality of PCA, we can approximate the shape vector as (see Eq. 3a in this post):

\begin{gathered} X\approx\tilde X =\mu_X + \tilde V\tilde V^\text T(X-\mu_X)= \mu_X +\sum_{i=1}^{t}v_iv_i^\text T(X-\mu_X) \tag{7} \\ \ \\ \text{s.t. }\ b\in \mathbb{R}^{t\times 1}\ ,\ \tilde V=[v_1\quad v_2\ \dots\quad v_t] \in \mathbb{R}^{3N\times t}\ ,\ \text{and } t<3N \end{gathered}

This indicates that for approximating a particular shape vector, the shape model parameters should be chosen as:

b=\tilde V^\text T(X-\mu_X)\qquad \text{or}\qquad b_i=v_i^\text T(X-\mu_X)\tag{8}

But how many modes, i.e. what value of *t*, do we need? It can be determined based on the **goodness-of-fit measure** defined for PCA as a dimension-reduction tool (Eq. 7 in this post):

g(t)=\frac{\sum_{i=t+1}^{3N}\lambda_i}{\sum_{i=1}^{3N}\lambda_i}\tag{9}

Therefore, we only need to keep the first *t* modes having large associated eigenvalues. Actually, we *hope* that the first *t* eigenvalues are substantially larger than the rest of the eigenvalues. As a suggestion g(t)\le 0.05 is ok.
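The mode-selection rule above can be sketched in plain Scala (toy eigenvalues, names of my own choosing): compute the fraction of total variance *not* captured by the first *t* modes and keep the smallest *t* that brings it under the threshold.

```scala
// Goodness-of-fit g(t): fraction of total variance NOT captured by the first t modes.
// Assumes the eigenvalues are sorted in decreasing order.
def g(lambdas: Seq[Double], t: Int): Double =
  lambdas.drop(t).sum / lambdas.sum

// Smallest t with g(t) <= threshold
def selectModes(lambdas: Seq[Double], threshold: Double = 0.05): Int =
  (1 to lambdas.length).find(t => g(lambdas, t) <= threshold).getOrElse(lambdas.length)

val lambdas = Seq(60.0, 25.0, 10.0, 3.0, 2.0) // total variance 100
val t = selectModes(lambdas)                  // g(3) = 5/100 = 0.05, so t = 3
```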

### Sample shapes (data set)

In practice, the mean and covariance of the shape (random) vector X are unknown and should be estimated. Let S=\{S_1,\dots,S_n\} be a set of sample shapes (each represented by a shape vector S_i) from a particular shape family population. This is, in fact, a set of independent observations of identically distributed random variables. This set is called the **training set**. As a remark, it should be noted that we generally start with a set of configurations, X_c=\{C_1,\dots ,C_n\}, out of a configuration population; for example, the set of all dinosaurs. Then, we perform the full or the partial GPA to obtain the set of shapes/forms already denoted by S. Having said that, the following estimates are considered [2]:

\bar S=\frac{1}{n}\sum_{i=1}^{n}S_i\qquad \text{and}\qquad \hat\Sigma_{XX}=\frac{1}{n}\sum_{i=1}^{n}(S_i-\bar S)(S_i-\bar S)^\text T\tag{10}

where *n* is the number of training shapes. The covariance estimate is a symmetric matrix; therefore, it has *3N* independent eigenvectors. However, its rank is at most *n* (properties of rank in this post), i.e. it has at most *n* non-zero eigenvalues. Therefore, the modal representation of the training shapes, and of other shapes in the population predicted by the model built from the sample (training) shapes, is:

X=\bar S+\sum_{i=1}^{n}b_i\hat v_i\tag{11}

where \hat v_i is an eigenvector of \hat\Sigma_{XX}.
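The estimates above can be sketched in plain Scala (shape vectors as plain arrays, illustrative helper names; the 1/n biased covariance is assumed here):

```scala
// Sample mean of n shape vectors in R^d
def mean(shapes: Seq[Array[Double]]): Array[Double] = {
  val d = shapes.head.length
  val m = Array.fill(d)(0.0)
  for (s <- shapes; i <- 0 until d) m(i) += s(i) / shapes.length
  m
}

// Biased (1/n) sample covariance matrix of n shape vectors in R^d
def covariance(shapes: Seq[Array[Double]]): Array[Array[Double]] = {
  val d = shapes.head.length
  val mu = mean(shapes)
  val cov = Array.fill(d, d)(0.0)
  for (s <- shapes; i <- 0 until d; j <- 0 until d)
    cov(i)(j) += (s(i) - mu(i)) * (s(j) - mu(j)) / shapes.length
  cov
}

// Three toy "shape vectors" in R^2
val shapes = Seq(Array(1.0, 2.0), Array(3.0, 2.0), Array(2.0, 5.0))
val mu  = mean(shapes)       // (2.0, 3.0)
val cov = covariance(shapes)
```

In a real setting the eigenvectors \hat v_i of this covariance matrix would then be extracted by PCA (e.g. via Scalismo, which hides all of this behind `createUsingPCA`).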

When *n* is large enough, we can use the following criterion to select the first *t* modes with relatively large eigenvalues:

g(t)=\frac{\sum_{i=t+1}^{n}\hat\lambda_i}{\sum_{i=1}^{n}\hat\lambda_i}\tag{12}

This criterion is similar to Eq. 9.

## PDM’s with normal distributions

It can be assumed that a shape vector X, being a random vector, has a multivariate normal distribution i.e. [1]:

X\sim \mathscr N(\mu_X,\Sigma_{XX})\tag{13}

This assumption can be manipulated toward a more practical form. Let’s consider the following proposition:

If Y\in \R^{r\times 1} has a multivariate normal distribution Y\sim \mathscr N(\mu_Y,\Sigma_{YY}), A\in \R^{s\times r}, and c\in \R^{s\times 1} being a constant vector, then, the linear transformation X=AY+c is distributed as X\sim \mathscr N(A\mu_Y+c,A\Sigma_{YY}A^\text T) .

Using the proposition and writing the shape vector in its modal form as X=\mu_X+Vb, we can conclude:

X=\mu_X+Vb\sim \mathscr N(\mu_X,\Sigma_{XX})\\ \ \\ \iff X\sim \mathscr N(V\mu_b+\mu_X,V\Sigma_{bb}V^\text T)\\ \ \\ \iff b\sim \mathscr N(0,\Sigma_{bb})

where \Sigma_{bb} is the diagonal matrix of the eigenvalues as in Eq. 3.

By the marginalization property of normal distributions:

b_i\sim \mathscr N(0,\lambda_i)

and we can write:

\alpha_i:=\frac{b_i}{\sqrt{\lambda_i}}\implies \alpha_i\sim \mathscr N(0,1)

Therefore, the modal representation of the shape vector becomes:

X=\mu_X+\sum_{i=1}^{3N}\alpha_i\sqrt{\lambda_i}v_i\qquad \text{s.t. }\ \alpha_i\sim \mathscr N(0,1) \tag{14}

For shape vector approximation and the sample shape vector, the following equations (based on Eq. 6 and 11) are readily obtainable:

\tilde X=\mu_X+\sum_{i=1}^{t}\alpha_i\sqrt{\lambda_i}v_i\qquad \text{s.t. }\ \alpha_i\sim \mathscr N(0,1) \tag{15}

X=\bar S+\sum_{i=1}^{n}\alpha_i\sqrt{\lambda_i}\hat v_i\qquad \text{s.t. }\ \alpha_i\sim \mathscr N(0,1) \tag{16}

## Shape space spanned by the PDM

A shape vector X\in \R^{3N} can be represented by any of the equations 6, 11, 15, or 16. In each of these equations, a 3*N*-dimensional shape vector is represented by a linear combination of basis vectors whose number is less than the dimension of the shape vector. This means that a PDM spans a space (of shape vectors) which is a subspace of \R^{3N}. Therefore, a PDM established by training shapes can only constitute shapes born out of linear combinations of the shape modes, \hat v_i‘s. Such a model then misses the shapes (in the population) which are not captured by the training set of shapes or cannot be represented by linear combinations of the shape modes.
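The generative use of the model (Eq. 14) can be sketched in plain Scala (toy dimensions, a seeded random generator, names of my own choosing): each coefficient \alpha_i is drawn from \mathscr N(0,1) and the weighted modes are added to the mean.

```scala
import scala.util.Random

// Draw a sample shape X = mu + sum_i alpha_i * sqrt(lambda_i) * v_i  (cf. Eq. 14)
def sampleShape(mu: Array[Double],
                lambdas: Array[Double],
                modes: Array[Array[Double]], // modes(i) is the unit eigenvector v_i
                rng: Random): Array[Double] = {
  val x = mu.clone()
  for (i <- lambdas.indices) {
    val alpha = rng.nextGaussian()        // alpha_i ~ N(0, 1)
    val w = alpha * math.sqrt(lambdas(i)) // b_i = alpha_i * sqrt(lambda_i)
    for (j <- x.indices) x(j) += w * modes(i)(j)
  }
  x
}

// Toy 2-D "shape" model with two orthonormal modes
val mu      = Array(0.0, 0.0)
val lambdas = Array(3.0, 1.0)
val modes   = Array(Array(1.0, 0.0), Array(0.0, 1.0))
val x       = sampleShape(mu, lambdas, modes, new Random(42))
```

This is, conceptually, what Scalismo does when sampling a `StatisticalMeshModel`, only in 3*N* dimensions with the PCA basis of the training deformations.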

#### References

[1] *Gaussian Process Morphable Models*, Marcel Luthi, Thomas Gerig, Christoph Jud, and Thomas Vetter. IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 40, Issue 8, Aug. 2018.

[2] *Image Processing and Analysis* (Chapter 7: *Model-Based Methods in Analysis of Biomedical Images*). R. Baldock and J. Graham, Oxford University Press, 2000.

# Linear Algebra Cheat Sheet (1)

**0- Notations and conventions**

A variable/quantity/element can be scalar, vector, or a matrix; its type should be understood from the context if not explicitly declared.

**0-1-** A vector z\in \mathbb{F}^{n} is considered as a column vector, i.e. a single-column matrix. Therefore:

z=[z_1\ z_2\ \dots\ z_n]^\text T\in \mathbb{F}^{n\times 1}

A row vector is therefore defined as w^\text T\in \mathbb{F}^{1\times n}.

**0-2-** Dot product of column vectors:

z\cdot w=z^\text Tw=\sum_{i=1}^{n}z_iw_i

**0-3-** The *ij*-th element of a matrix A is denoted by A_{ij}. The *i*-th column and *j*-th row of the matrix are respectively denoted by A_{,i} and A_{j,} .

**0-4-** Columns and rows of a matrix are column and row vectors. An m\times n matrix can then be represented as:

A=\begin{bmatrix} v_1& v_2& \dots & v_n\end{bmatrix}\ \text{ s.t. } v_i=A_{,i}

or

A=\begin{bmatrix}u_1^\text T\\ u_2^\text T\\ \vdots\\ u_m^\text T\end{bmatrix}\ \text{ s.t. } u_j^\text T=A_{j,}

**1- Orthogonal matrix**

A square matrix A\in \mathbb{R}^{n\times n} is said (by definition) to be **orthogonal** iff A^{-1}=A^\text T, provided that A^{-1}, the inverse of A, exists. As a result, AA^{-1}=AA^\text T = I_n.

The following are equivalent for A:

a) A is orthogonal.

b) the column vectors of A are ortho-normal.

c) the row vectors of A are ortho-normal.

d) A is size preserving: \|Ax\|=\|x\|, \|.\| being the Euclidean norm, and x\in\ \mathbb{R}^{n\times 1}.

e) A is dot product preserving: Ax\cdot Ay=x\cdot y .
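Properties (d) and (e) can be checked numerically with a 2D rotation matrix, the classic example of an orthogonal matrix (a small Scala sketch, all names illustrative):

```scala
// Apply the 2D rotation matrix [[cos, -sin], [sin, cos]] to a vector
def rotate(theta: Double)(v: (Double, Double)): (Double, Double) = {
  val (c, s) = (math.cos(theta), math.sin(theta))
  (c * v._1 - s * v._2, s * v._1 + c * v._2)
}

def norm(v: (Double, Double)): Double = math.hypot(v._1, v._2)
def dot(a: (Double, Double), b: (Double, Double)): Double = a._1 * b._1 + a._2 * b._2

val x = (3.0, 4.0)
val y = (-1.0, 2.0)
val rx = rotate(0.7)(x) // rotation preserves the norm of x (which is 5)
val ry = rotate(0.7)(y)
```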

**2- Some matrix multiplication identities**

A\in \mathbb{F}^{m\times n},\ x\in \mathbb{F}^{n\times 1}\ \text{ then } Ax=\sum_{i=1}^n x_iA_{,i}
A\in \mathbb{F}^{m\times n}\ , D\in \mathbb{F}^{n\times n}\ \text{ and diagonal }, B\in \mathbb{F}^{n\times k},\ \text{ s.t.}\\ \ \\
A=\begin{bmatrix} v_1& v_2& \dots & v_n\end{bmatrix}, \ B=\begin{bmatrix}u_1^\text T\\ \\ u_2^\text T\\ \vdots\\ u_n^\text T\end{bmatrix},\ D_{ij}= \begin{cases}
\lambda_i &\text{, } i=j \\
0 &\text{, } i\ne j
\end{cases}
\\ \ \\ \text{then }ADB=\sum_{i=1}^n \lambda_iv_i u_i^\text T
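The identity ADB=\sum_i\lambda_iv_iu_i^\text T can be verified numerically for small dense matrices (plain Scala, illustrative names):

```scala
// Naive dense matrix multiplication
def matMul(a: Array[Array[Double]], b: Array[Array[Double]]): Array[Array[Double]] =
  Array.tabulate(a.length, b(0).length)((i, j) =>
    a(i).indices.map(k => a(i)(k) * b(k)(j)).sum)

val A = Array(Array(1.0, 2.0), Array(3.0, 4.0))           // columns v_1, v_2
val D = Array(Array(5.0, 0.0), Array(0.0, 7.0))           // diag(lambda_1, lambda_2)
val B = Array(Array(1.0, 0.0, 2.0), Array(0.0, 1.0, 1.0)) // rows u_1^T, u_2^T

val adb = matMul(matMul(A, D), B)

// Right-hand side: sum over i of lambda_i * (i-th column of A) * (i-th row of B)
val lambdas = Array(5.0, 7.0)
val rhs = Array.tabulate(2, 3)((r, c) =>
  (0 until 2).map(i => lambdas(i) * A(r)(i) * B(i)(c)).sum)
```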
**3- Change of basis**

Let \beta:=\{b_1,\dots,b_n\} and \beta':=\{b'_1,\dots,b'_n\} be two bases of \mathbb R^n. Then the following holds for the coordinates of a vector v\in\mathbb R^n with respect to the two bases:

[v]_{\beta'}=P_{\beta\to\beta'}[v]_{\beta}\quad\text{s.t. } \\ \ \\ P_{\beta\to\beta'}=\begin{bmatrix} [b_1]_{\beta'}& [b_2]_{\beta'}& \dots & [b_n]_{\beta'}\end{bmatrix}

It can be proved that:

[v]_{\beta}=P_{\beta'\to\beta}[v]_{\beta'}\quad\text{s.t. } \\ \ \\ P_{\beta'\to\beta}=P^{-1}_{\beta\to\beta'}
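A change of basis in \mathbb R^2 can be worked out concretely (a plain Scala sketch using Cramer's rule; the basis and vector below are my own toy example):

```scala
// Coordinates of v (given in the standard basis) relative to the basis {b1, b2},
// obtained by solving [b1 b2] * c = v via Cramer's rule for a 2x2 system.
def coordsIn(b1: (Double, Double), b2: (Double, Double),
             v: (Double, Double)): (Double, Double) = {
  val det = b1._1 * b2._2 - b2._1 * b1._2
  require(det != 0, "basis vectors must be linearly independent")
  ((v._1 * b2._2 - b2._1 * v._2) / det,
   (b1._1 * v._2 - v._1 * b1._2) / det)
}

// v = (3, 5) in the standard basis, expressed in the basis {(1, 1), (1, -1)}:
// 4 * (1, 1) + (-1) * (1, -1) = (3, 5)
val c = coordsIn((1.0, 1.0), (1.0, -1.0), (3.0, 5.0))
```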