Lebesgue Integration

Terminology: we say the limit of a sequence or sum, or the Sup or Inf of a set, exists if it is finite (and unique). If it is +\infty or -\infty we say the limit/Sup/Inf is defined (i.e. unbounded) but does not exist. In any other case, e.g. \infty - \infty, we say it does not exist (or is undefined).

Definition: A partition of a set S is a collection of non-empty (pairwise) disjoint subsets of S whose union equals S.

Definition: A real-valued simple function is a function that takes a finite or countable number of real values (NOT extended real values including \infty), i.e. its range is finite or countable. Note that the definition does not put any restriction on the domain or codomain of the function.

For example f:\mathbb R \to \mathbb R defined as,

    \[f(x) = \begin{cases} 0 & x\le 1 \\ 2.5 & 1<x\le 5 \\ -2 & x>5 \end{cases} \]

is a simple function.

A simple function \varphi: I \to \mathbb R on an interval I\subset \mathbb R can be defined as,

    \[ \varphi(x)=\sum_{k=1}^n a_k \mathcal X_{E_k}(x)\]

where a_k are constants, E_k\in \{E_1,\cdots, E_n\} such that I=\bigcup_{i=1}^n E_i and E_i\cap E_j=\emptyset for i\ne j, and \mathcal X_{E_k} is called the characteristic function of E_k being defined as,

    \[\mathcal X_{E_k}(x) = \begin{cases} 1, & x\in E_k \\ 0, & x\notin E_k \end{cases}\]

In words, the interval I is partitioned into pairwise disjoint non-empty sets E_k.

The above definition is called the canonical representation of a simple function and is equivalent to,

    \[\varphi (x) = \begin{cases} a_1, & x\in E_1 \\ a_2, & x\in E_2 \\ \cdots & \cdots \\ a_n, & x\in E_n \end{cases}\]
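The canonical representation can be sketched in code as a sum of constants times characteristic functions. This is a minimal illustration only; the helper names `indicator` and `simple_function` are hypothetical and each set E_k is described by a membership test.

```python
def indicator(member):
    """Return the characteristic function of the set described by `member`."""
    return lambda x: 1 if member(x) else 0

def simple_function(terms):
    """Build phi(x) = sum_k a_k * X_{E_k}(x) from (a_k, membership test) pairs."""
    chis = [(a, indicator(member)) for a, member in terms]
    return lambda x: sum(a * chi(x) for a, chi in chis)

# The example f above: 0 on (-inf, 1], 2.5 on (1, 5], -2 on (5, inf).
phi = simple_function([
    (0.0, lambda x: x <= 1),
    (2.5, lambda x: 1 < x <= 5),
    (-2.0, lambda x: x > 5),
])
```

Because the sets are pairwise disjoint, exactly one indicator is non-zero at each point, so the sum returns the single value a_k attached to the set containing x.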

The domain of a simple function is not restricted to \mathbb R; a simple function can be \varphi: \Omega \to \mathbb R with its domain \Omega being any set.

Remark: a step function is a type of a simple function, i.e. a simple function is called a step function when \Omega\subset \mathbb R and each E_i is an interval of real numbers.

Definition: a measurable function is a simple measurable function if it is a simple function.

Approximating a function by simple function(s)

Let f:\Omega \to \mathbb R and let R:=f(\Omega) be the range of the function. If we partition the range as R=\cup_{i=1}^n I_i where the subsets are pairwise disjoint, we can construct a simple function \varphi: \Omega \to \mathbb R approximating f as,

    \[\varphi(\omega) = \sum_{i=1}^n a_i \mathcal X_{E_i}(\omega)\]

where a_i\in I_i, and E_i=f^{-1}(I_i). Note that \{E_i\}_1^n consists of pairwise disjoint sets because the I_i are pairwise disjoint. Also, \cup_{i=1}^n E_i=\Omega because the I_i cover the range.

To construct the simple function approximation, the range of the function was partitioned and then the domain was implicitly partitioned through the preimage. The reason is twofold. Firstly, partitioning the real line as the codomain/range can be readily performed. Secondly, this approach is the foundation of Lebesgue integration. Lebesgue integration partitions the range into intervals; each summand is then a number in one of the intervals of the partition times the measure of the preimage of that interval. This makes Lebesgue integration capable of handling functions that are not Riemann integrable, and gives it nice properties such as interchanging the limit and integration operators. Any real-valued measurable function can be written as the limit of a sequence of simple functions. For non-negative functions, however, the following theorem exists.

Theorem L1: Let f:\Omega \to \mathbb R^+ be a non-negative measurable function. Then, there exists a monotonically increasing sequence of non-negative measurable simple functions f_n:\Omega \to \mathbb R^+ (where n\in \mathbb N) such that \lim_{n\to \infty} f_n(\omega)=f(\omega),\ \forall \omega\in \Omega (this is pointwise convergence). The approximation of f by \{f_n\}_{n=1}^\infty is an approximation from below, i.e. f_n(\omega) \le f_{n+1}(\omega) \le f(\omega). Such a sequence can be constructed as follows (this proves the existence),

    \[f_n(\omega) = \sum_{i = 1}^{ n 2^n} \frac{i - 1}{2^n} \mathcal X_{E_{n, i}}(\omega) + n \mathcal X_{E_n}(\omega)\]


    \[\begin{split}E_{n,i}&:=\{\omega \mid \frac{i-1}{2^n}\le f(\omega)<\frac{i}{2^n} \},\quad i=1,2,\cdots, n2^n\\&\equiv f^{-1}\big ( [\frac{i-1}{2^n},\frac{i}{2^n}) \big )\\E_n&:=\{\omega \mid f(\omega)\ge n \}\equiv f^{-1}\big ([n,\infty) \big)\end{split}\]

This is explained as follows. Regarding each f_n, the codomain of f is partitioned as \mathbb R^+ = [0,n)\cup [n,\infty). Then, [0,n) is divided/partitioned into n2^n subintervals \{I_{n,i}\}_{i=1}^{n2^n} such that l(I_{n,i})=\frac{1}{2^n}, i.e.,

    \[[0,n) = [0,\frac{1}{2^n})\cup [\frac{1}{2^n},\frac{2}{2^n})\cup\cdots\cup [\frac{n2^n-1}{2^n},n)\]

The rest of the codomain is [n,\infty) for the given n. Then, the preimage (f^{-1}) of each subinterval is determined (on the domain) as E_{n,i}:= f^{-1}\big ( [\frac{i-1}{2^n},\frac{i}{2^n}) \big ) and E_n:= f^{-1}\big ( [n,\infty) \big ). These preimages partition the domain into pairwise disjoint sets because the partitioning sets of the codomain are disjoint. With these assumptions, each f_n is a simple function. Note that each f_n takes finitely many values and is therefore bounded (by n).
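The construction can be sketched numerically. The snippet below is an illustration under stated assumptions: `dyadic_approx` is a hypothetical helper name, and f(x)=x^2 on sample points of [0,3] is an arbitrary non-negative test function, not one taken from the text.

```python
import math

def dyadic_approx(f, n):
    """Return f_n: the value (i-1)/2^n on E_{n,i}, and n on E_n = {f >= n}."""
    def f_n(omega):
        y = f(omega)
        if y >= n:
            return float(n)
        # floor(y * 2^n) = i - 1, where (i-1)/2^n <= y < i/2^n
        return math.floor(y * 2**n) / 2**n
    return f_n

f = lambda x: x * x                        # hypothetical test function
points = [k / 10 for k in range(0, 31)]    # sample points in [0, 3]
```

At every sample point the values f_n(x) increase with n and stay below f(x), and once n exceeds f(x) the error is smaller than 2^{-n}, which is also the reason the convergence is uniform when f is bounded.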

Proposition L1: Any real-valued measurable function f can be approximated by monotonically increasing sequences of non-negative measurable simple functions if it is written as f=f^+ - f^- where f^+ := \max (f,0) and f^- := \max (-f,0). Note that both f^+ and f^- are non-negative and hence satisfy the conditions of Theorem L1.

Proposition L2: If f is bounded, then f_n \to f uniformly.

Integration with respect to a measure/ Lebesgue integral

Definition: Let (\Omega, \mathcal A, \mu) be a measure space and f:(\Omega, \mathcal A, \mu) \to (\mathbb R, \mathcal B) be a bounded measurable function. If S\subset \Omega, \mu(S) < \infty and \mathcal P = \{E_i\}_1^n is a disjoint measurable partition of S, i.e. S = \cup E_i and E_i\in \mathcal A, define,

    \[\begin{split}L(f,\mathcal P):=&\sum_{i=1}^n m_i \mu(E_i)\quad \text{lower sum}\\U(f,\mathcal P):=&\sum_{i=1}^n M_i \mu(E_i)\quad \text{upper sum}\\&m_i=\inf_{x\in E_i} f(x) \qquad M_i=\sup_{x\in E_i} f(x)\end{split}\]

Also, if \mathcal Q = \{H_i\}_1^m is another disjoint partition of S, we write \mathcal Q \succ \mathcal P and say \mathcal Q is a refinement of \mathcal P if every H_i is contained in some E_j.

Then, we can show,

  1. L(f, \mathcal P) \le U(f, \mathcal P).
  2. \mathcal Q \succ \mathcal P \implies L(f,\mathcal Q) \ge L(f, \mathcal P), i.e. lower sums increase under refinement.
  3. \mathcal Q \succ \mathcal P \implies U(f,\mathcal Q) \le U(f, \mathcal P), i.e. upper sums decrease under refinement.
  4. \underset{\mathcal P}{\sup}L(f,\mathcal P) \le \underset{\mathcal P}{\inf}U(f,\mathcal P), where the Sup and Inf are taken over all partitions \mathcal P.

We say f is Lebesgue integrable over S if \underset{\mathcal P}{\sup}L(f,\mathcal P) = \underset{\mathcal P}{\inf}U(f,\mathcal P). In this case we write,

    \[\int_S f\mathrm d\mu = \underset{\mathcal P}{\sup}L(f,\mathcal P) = \underset{\mathcal P}{\inf}U(f,\mathcal P)\]
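Lower and upper sums can be tried out on a small finite measure space. The following is a sketch, assuming a hypothetical set S=\{0,\cdots,5\} with a point mass of 0.5 at each element and f(x)=x^2; the function name `lower_upper` is also an assumption made for illustration.

```python
def lower_upper(f, mu, partition):
    """Return (L(f,P), U(f,P)) for a finite set with point masses mu[x]."""
    lower = upper = 0.0
    for block in partition:
        mass = sum(mu[x] for x in block)      # mu(E_i)
        values = [f(x) for x in block]
        lower += min(values) * mass           # m_i * mu(E_i)
        upper += max(values) * mass           # M_i * mu(E_i)
    return lower, upper

# Hypothetical example: S = {0,...,5}, mass 0.5 at every point, f(x) = x^2.
mu = {x: 0.5 for x in range(6)}
f = lambda x: x * x
P = [[0, 1, 2], [3, 4, 5]]
Q = [[0], [1, 2], [3, 4], [5]]                # Q refines P
singletons = [[x] for x in range(6)]          # finest partition
```

Here L(f,P)=13.5 and U(f,P)=43.5, while the refinement Q gives 22.5 and 32.5. On the finest partition both sums equal 27.5, which is the integral of f over S for this measure.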

Remark: \Omega can be a set of any objects (numbers, symbols, animals, etc).

The Lebesgue integration of a simple measurable function over a set of finite measure

Theorem L2: Let (\Omega, \mathcal A, \mu) be a measure space and let \varphi: S\subset\Omega \to \mathbb R, \varphi(x)=\sum_{i=1}^n a_i \mathcal X_{E_i}(x) with a_i \in \mathbb R, be a measurable simple function where S=\bigcup_i E_i, \mu (S) < \infty and the E_i are pairwise disjoint. Then, \varphi is Lebesgue integrable on S iff \sum_{i=1}^n a_i \mu(E_i) exists. In this case we write,

    \[\int_S \varphi \mathrm d\mu = \sum_{i=1}^n a_i \mu(E_i)\]

For the proof, note that \{E_i\}_1^n is a (disjoint and measurable) partition of S and m_i=M_i on each E_i.

In the above formulation, the sum is over the size of the partition, i.e. n, and the a_i are in one-to-one correspondence with the E_i. We can, however, write the sum over the range of the simple function. The range of a simple measurable function \varphi: S \to \mathbb R is a discrete (here finite) set R\subset \mathbb R. If R=\{b_1,\cdots,b_m\} has m distinct elements and \varphi(x)=\sum_{i=1}^n a_i \mathcal X_{E_i}(x), then,

    \[\int_S \varphi \mathrm d\mu = \sum_{i=1}^n a_i \mu(E_i) = \sum_{j=1}^m b_j\mu(\varphi^{-1}(b_j)) \equiv \sum_{b\in R} b\,\mu(\varphi^{-1}(b)) \]

Note that the range of the function is a set, so it contains distinct values; several E_i carrying the same value a_i are merged into one preimage.

Example: Let \mu be the Lebesgue measure and \varphi: (0,9) \to \mathbb R be defined as,

    \[\varphi(x) = \begin{cases}1.0, & 0<x\le 1\\ 2.5, & 1< x<4\\ 1.0, & 4 \le x <9\end{cases}\]

Then, Lebesgue integration by summing over the size of the partition gives,

    \[\int_{(0,9)}\varphi\mathrm d\mu =1.0\mu((0,1])+2.5\mu((1,4))+1.0\mu([4,9))=1.0(1)+2.5(3)+1.0(5)=13.5\]

And, the integration by summing over the range of the function (R = \{1.0,2.5\}) gives,

    \[\begin{split}\int_{(0,9)}\varphi\mathrm d\mu =1.0\mu(\varphi^{-1}(1.0)) + 2.5\mu(\varphi^{-1}(2.5))&=1.0\mu((0,1] \cup [4,9))+2.5\mu((1,4))\\&=1.0\big( \mu((0,1])+\mu([4,9))\big)+2.5\mu((1,4))=13.5\end{split}\]
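The two ways of summing can be checked with a few lines of code. This is a small sketch of the example above; the variable names are chosen here for illustration and each piece is stored as a (value, Lebesgue measure) pair.

```python
# Partition of (0, 9) as (value, Lebesgue measure of the piece).
pieces = [(1.0, 1.0),   # (0, 1]
          (2.5, 3.0),   # (1, 4)
          (1.0, 5.0)]   # [4, 9)

# Sum over the partition: sum_i a_i * mu(E_i).
by_partition = sum(a * m for a, m in pieces)

# Sum over the range: group the pieces by value, so that
# preimage_measure[a] = mu(phi^{-1}(a)).
preimage_measure = {}
for a, m in pieces:
    preimage_measure[a] = preimage_measure.get(a, 0.0) + m
by_range = sum(a * m for a, m in preimage_measure.items())
```

Both `by_partition` and `by_range` evaluate to 13.5; the grouping step merges (0,1] and [4,9) into the single preimage of the value 1.0, whose measure is 6.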

Example: Let (\mathbb R, \mathcal B, \mu) be a measure space and,

    \[\mu(I)=l(I\cap[-2,2])/4,\quad \text{so that}\quad \mu(I)=\begin{cases} l(I)/4, & I\subset[-2,2] \\0, & I\cap [-2,2]=\emptyset\end{cases}\]

If \varphi(x)=1.0\mathcal X_{(-\infty,-2)}+2.5\mathcal X_{[-2,1]}-1.5\mathcal X_{(1,+\infty)}, we calculate \int_{(-\infty, +\infty)}\varphi \mathrm d \mu as follows.

We note that the domain of integration has a finite measure with respect to the defined measure, because \mu((-\infty,+\infty))=1. Therefore, we can write,

    \[\begin{split}\int_{(-\infty,+\infty)}\varphi\mathrm d \mu &= 1.0\mu((-\infty, -2)) + 2.5\mu([-2,1])-1.5\mu((1,2])-1.5\mu((2,+\infty))\\&=1(0)+2.5(3/4)-1.5(1/4)-1.5(0)=1.5\end{split}\]
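The same computation can be sketched in code. The snippet assumes the measure is read as \mu(I)=l(I\cap[-2,2])/4, which is the reading the calculation above uses; the function name `mu` and the interval encoding by endpoints are illustrative choices (endpoints do not matter because single points have zero length).

```python
def mu(a, b):
    """Measure of an interval with endpoints a <= b:
    length of its part inside [-2, 2], divided by 4."""
    lo, hi = max(a, -2.0), min(b, 2.0)
    return max(hi - lo, 0.0) / 4.0

INF = float("inf")
terms = [(1.0, (-INF, -2.0)),    # 1.0 on (-inf, -2)
         (2.5, (-2.0, 1.0)),     # 2.5 on [-2, 1]
         (-1.5, (1.0, INF))]     # -1.5 on (1, +inf)
integral = sum(a * mu(*interval) for a, interval in terms)
```

With this reading, `mu(-INF, INF)` is 1 and `integral` is 1.5, in agreement with the hand calculation.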

Remark: Lebesgue integrability depends on the measure involved.

Remark: if \varphi(x)=\sum_{i=1}^n a_i \mathcal X_{E_i}(x) then for any set S, we can write \int_S \varphi \mathrm d\mu = \sum_{i=1}^n a_i \mu(S\cap E_i)

Proposition [L3]: The following properties hold for integration of simple measurable functions over sets of finite measure:

  1. \int_S (f+g)\mathrm d\mu = \int_S f \mathrm d\mu+ \int_S g \mathrm d\mu.
  2. If A\cap B =\emptyset then \int_{A\cup B} f \mathrm d\mu = \int_A f \mathrm d\mu + \int_B f \mathrm d\mu.
  3. | \int_S f\mathrm d\mu |\le \int_S |f| \mathrm d\mu.
  4. If f \le g on S, then \int_S f\mathrm d\mu \le \int_S g \mathrm d\mu.
  5. If f=g almost everywhere on S, then \int_S f\mathrm d\mu = \int_S g \mathrm d\mu.

The Lebesgue integration of a bounded measurable function over a set of finite measure

The general definition of Lebesgue integration and the formula for the Lebesgue integral of a simple function can be used to restate the definition of the Lebesgue integral of a bounded function on a set of finite measure. In this regard, the lower and upper Lebesgue integrals of f:\Omega \to \mathbb R on S with \mu(S) < \infty are defined as,

    \[\underline{\int_{S}} f :=\sup \Big \{\int_S \varphi\mathrm d \mu\ \Big | \varphi \text { is simple}, \varphi(\omega) \le f(\omega) \ \forall \omega \in S \Big \}\]


    \[\overline{\int_{S}} f :=\inf \Big\{\int_S \psi \mathrm d \mu\ \Big | \psi \text { is simple}, \psi (\omega) \ge f (\omega) \ \forall \omega \in S \Big \}\]

in which \mu is a measure on (\Omega, \mathcal A), and the Sup and Inf are over all simple measurable functions approximating f from below and from above, respectively. Both values are finite and \underline{\int_{S}} f \le \overline{\int_{S}} f.

By definition, if \underline{\int_{S}} f = \overline{\int_{S}} f, we say f is Lebesgue integrable over S and its integral equals the common value and is denoted as \int_{S} f \mathrm d \mu.

Theorem [Lebesgue integrability of functions]: If f:(\Omega, \mathcal A, \mu)\to (\mathbb R, \mathcal B) is a bounded measurable function on a set of finite measure, S\subset \Omega where \mu (S) < \infty, then f is Lebesgue integrable, i.e. \int_S f\mathrm d \mu exists. The converse is also true: if a bounded function on a set of finite measure is integrable (i.e. \int_S f\mathrm d \mu exists), then the function is a measurable function.

Proposition L4: Proposition L3 holds for the aforementioned functions, i.e. bounded measurable functions on sets of finite measure.

Theorem L3: Let f be a bounded measurable function on S with \mu(S) < \infty . If S=\underset{i\in J}{\bigcup} E_i where \{E_i\}_{i\in J} is at most a countable family of pairwise disjoint measurable sets, then \int_S f = \underset{i\in J}{\sum}\int_{E_i} f.

An example that can be solved by the above theorem is \int_{[0,1]} f \mathrm d \mu where \mu is the Lebesgue measure (\mu(I)=l(I) for intervals I\in \mathcal B), and f:[0,1]\to \mathbb R is such that f(x)=1\ \forall x\in \mathbb Q and f(x)=0 otherwise.
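One way to carry out this example is sketched below. The enumeration q_1, q_2, \cdots of the rationals in [0,1] and the set names E_0 and E_i are notation introduced here for illustration; the steps use only Theorem L3 and the fact that a single point has Lebesgue measure zero.

```latex
\[\begin{split}
[0,1] &= E_0 \cup \bigcup_{i=1}^\infty E_i, \qquad E_0 := [0,1]\setminus \mathbb Q, \quad E_i := \{q_i\} \\
\int_{[0,1]} f \mathrm d\mu &= \int_{E_0} f \mathrm d\mu + \sum_{i=1}^\infty \int_{E_i} f \mathrm d\mu
 = 0\cdot \mu(E_0) + \sum_{i=1}^\infty 1\cdot \mu(\{q_i\}) \\
 &= 0\cdot 1 + \sum_{i=1}^\infty 1\cdot 0 = 0
\end{split}\]
```

The function is therefore Lebesgue integrable with integral zero, although it is not Riemann integrable on [0,1].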

Proposition [Lebesgue and Riemann integrations]: Let f:(\Omega\subset \mathbb R^n, \mathcal B^n, \mu)\to (\mathbb R, \mathcal B) be a bounded measurable function where \mu is the Lebesgue measure (n-dimensional interval length). If f has a finite number of discontinuities and \mu(S) < \infty for S\subset \Omega, then the following integrals exist and are equal,

Lebesgue integration: \int_S f \mathrm d\mu = Riemann integration: \int_S f \mathrm dV

Riemann integration of a bounded function on a set of finite measure can be regarded as a particular case of the general Lebesgue integration. In fact, Riemann integration is based on subdividing the domain of a function whereas Lebesgue integration is based on subdividing the range of a function and using the inverse image to create measurable subdivision on the domain of the function.

The Lebesgue integration of an unbounded measurable function over a measurable set

Riemann integration is defined as the limit of the Riemann sum for bounded functions on bounded domains. For unbounded functions or domains, the Riemann integration is defined as limits:

1- If f:D\subset \mathbb R \to \mathbb R is continuous (up to a finite number of discontinuities) on I=[a,b)\subset D and unbounded at b as \lim_{x\to b^-} f(x)=\pm \infty, then the following Riemann integral is defined provided that the limit on the RHS exists (i.e. is finite),

    \[\int_a^b f(x) \mathrm d x := \lim_{t\to b^-} \int_a^t f(x) \mathrm d x\]

that is calculating the integral for a bounded function and then taking the limit.
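As a small numerical sketch of this limit, take the hypothetical example f(x)=1/\sqrt{1-x} on [0,1), which is unbounded at b=1. Its integral over [0,t] has the closed form 2-2\sqrt{1-t}, used below in place of a numerical quadrature.

```python
import math

def integral_up_to(t):
    """Riemann integral of 1/sqrt(1 - x) over [0, t] for 0 <= t < 1 (closed form)."""
    return 2.0 - 2.0 * math.sqrt(1.0 - t)

# Let t approach b = 1 from the left; the values approach 2.
values = [integral_up_to(1.0 - 10.0 ** (-k)) for k in range(1, 8)]
```

The values increase towards 2, so the improper integral exists and equals 2 even though the integrand is unbounded.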

If f is unbounded as \lim_{x\to a^+} f(x)=\pm \infty, we define \int_a^b f(x) \mathrm d x := \lim_{t\to a^+} \int_t^b f(x) \mathrm d x. And if f is unbounded at c\in (a,b), then we split the integral at c and take the limit on each side.

2- If the domain of integration is unbounded with respect to a variable, then the integration is defined as a limit if exists. For example, if f:\mathbb  R^3 \to \mathbb R continuous, then,

    \[\begin{split} \int_a^b\int_{0}^{+\infty}\int_{-\infty}^{+\infty} f(x,y,z)\mathrm d V&:=\lim_{t\to \infty}\lim_{s\to -\infty}\big (\int_a^b\int_{0}^{t}\int_{s}^{0} f(x,y,z)\mathrm d V\big ) \\ &+\lim_{t\to \infty}\lim_{r\to +\infty}\big (\int_a^b\int_{0}^{t}\int_{0}^{r} f(x,y,z)\mathrm d V\big )\end{split}\]

The above integrations are classified as improper Riemann integrations. Lebesgue integration approaches these unbounded cases in a natural (more general) way. Therefore, the term improper is not used with these cases of Lebesgue integration.

Definition: A measurable function f on a set S is of finite support if there is a set S_0 \subset S for which \mu(S_0) < \infty and f\equiv 0 on S\setminus S_0. The support of f then becomes the set over which the function does not vanish.

Proposition: Let f be a bounded measurable function on S with a finite support S_0, then \int_S f \mathrm d \mu = \int_{S_0} f\mathrm d \mu.

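Proof: by assumption f\equiv 0 on S\setminus S_0. The proposition is already proved if \mu (S\setminus S_0) < \infty. For the case that \mu (S\setminus S_0)=\infty, i.e. unbounded, and assuming that S\setminus S_0 can be split into countably many sets of finite measure (i.e. \mu is \sigma-finite), we can write,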

    \[\mu (S\setminus S_0)=\infty \implies \mu(S\setminus S_0)=\sum_{i=1}^\infty \mu(E_i)\quad \text{ s.t }\quad E_i\ne \emptyset, \ \mu(E_i)<\infty, \ E_i\underset{i\ne j}{\cap} E_j=\emptyset\]

Therefore, we can write f_{S\setminus S_0}\equiv 0 as a simple function,

    \[f_{S\setminus S_0}(x)= \lim_{n\to \infty}\sum_{i=1}^n 0\cdot \mathcal X_{E_i} (x)\equiv 0\]

by using the formula for Lebesgue integration of a simple function we conclude that,

    \[\int_{(S\setminus S_0)}f\mathrm d \mu=\lim_{n\to \infty}\sum_{i=1}^n 0\cdot\mu(E_i)=\lim_{n\to \infty}\sum_{i=1}^n0=0\]

Remark: in the above we showed that \int_S 0\cdot \mathrm d\mu=0 even if \mu(S) = \infty. Note that we did not write 0\cdot \infty=0; this expression is undefined. Instead, we used \lim_{n\to \infty} \sum_{i=1}^n 0\cdot c_i being equal to zero. In other words, one should not write \int_{S} f \mathrm d \mu= 0\cdot \mu(f^{-1}(0))= 0\cdot \mu(S)=0\cdot \infty, which is undefined. We should note that the theorem on the integration of simple functions holds for sets of finite measure.

Proposition: Let f be a bounded measurable function on S (with finite or infinite measure). If f \equiv 0 almost everywhere on S, then \int_S f\mathrm d \mu=0.

To move ahead and define Lebesgue integration for any measurable function, including unbounded ones, on any measurable support, non-negative functions are considered first. Considering f\ge 0 allows using lower approximations of the function by bounded functions of finite support.

Definition L1 [Lebesgue integration of non-negative functions]: For f \ge 0 on S, the integration is defined as,

    \[\int_S f\mathrm d \mu = \sup \Big\{ \int_S h \mathrm d \mu\ | h < \infty \text{ and of finite support and } 0\le h \le f \Big \}\]

The supremum is over all functions h as described. If the supremum of the above set of values exists, we say the function is integrable. If the value is infinity, the integral is defined and is unbounded. Note that h\le f is a pointwise expression meaning either h(x)<f(x) or h(x)=f(x) at each x. Also, each h is of finite support, meaning that except over some part of the domain with a finite measure, h vanishes over the rest of its domain.

The above definition can also be used to prove that if f\equiv 0 almost everywhere on S, then \int_S f \mathrm d \mu = 0. To this end let h(x)=0 \le f(x), \forall x\in S.

Definition L2 [Lebesgue integration of functions]: The Lebesgue integration for any measurable function f:\Omega \to \mathbb R and a measurable set S\subset \Omega is defined as,

    \[\int_S f\mathrm d \mu := \int_S f^+ \mathrm d \mu - \int_S f^- \mathrm d \mu\]

provided that at least one of the integrals is finite. If the integral equals \infty or -\infty, the integral is defined; however, we say the function is integrable only if the integral is finite. The case \infty - \infty is undefined.
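The decomposition into positive and negative parts can be illustrated on a finite measure space. The sketch below assumes a hypothetical set S=\{-2,\cdots,3\} with unit mass at each point and f(x)=x; the helper names are chosen for this example only.

```python
def positive_part(f):
    """f^+ = max(f, 0), evaluated pointwise."""
    return lambda x: max(f(x), 0.0)

def negative_part(f):
    """f^- = max(-f, 0), evaluated pointwise."""
    return lambda x: max(-f(x), 0.0)

def integrate(f, mu):
    """Integral of f over a finite set whose points carry the masses mu[x]."""
    return sum(f(x) * m for x, m in mu.items())

# Hypothetical example: S = {-2, -1, 0, 1, 2, 3} with unit mass at each point.
mu = {x: 1.0 for x in range(-2, 4)}
f = lambda x: float(x)
int_plus = integrate(positive_part(f), mu)     # 1 + 2 + 3
int_minus = integrate(negative_part(f), mu)    # 2 + 1
int_abs = integrate(lambda x: abs(f(x)), mu)
```

Both parts are non-negative and finite here, so the integral of f is `int_plus - int_minus`, while the sum `int_plus + int_minus` gives the integral of |f|.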

Theorem L4: Let \varphi(x)= \underset{i\in I}{\sum}  a_i \mathcal X_{E_i}(x) be a non-negative simple function and S=\underset{i\in I}{\bigcup} E_i. Then, \int_S \varphi \mathrm d \mu=\underset{i\in I}{\sum} a_i\mu(E_i)\equiv \underset{i\in I}{\sum}\int_{E_i}\varphi\mathrm d\mu.

Note that since \varphi \ge 0 the sum \underset{i\in I}{\sum} a_i\mu(E_i) is always defined, either finite or +\infty.

Theorem L5: Let f:(\Omega,\mathcal A,\mu)\to (\mathbb R, \mathcal B) be non-negative. Define \eta(A)=\int_A f(\omega)\mathrm d\mu for all A \in \mathcal A. Then

(a) \eta is countably additive on \mathcal A. I.e. \eta (\underset{i\in I}{\bigcup} A_i ) = \underset{ i\in I }{\sum} \eta(A_i) for I\subset \mathbb N and A=\underset{i\in I}{\bigcup} A_i as A_i \underset{i \neq j}{\cap} A_j=\emptyset. Which means,

    \[ \int_A f(\omega)\mathrm d\mu =  \int_{ \underset{i\in I}{\bigcup} A_i} f(\omega)\mathrm d\mu =  \sum_{i\in I} \int_{A_i} f(\omega)\mathrm d\mu \equiv \sum_{i=1}^\infty \int_{A_i} f(\omega)\mathrm d\mu \quad \text{if } I=\mathbb N\]

(b) for any not necessarily positive function, \eta is countably additive on \mathcal A if f is integrable on A.


Proof: (a) If \varphi:\Omega \to \mathbb R is a simple function such that \forall \omega , \ 0 \le \varphi \le f, by theorem L4, we can write, \int_A \varphi \mathrm d \mu = \underset{i\in I}{\sum}\int_{A_i}\varphi\mathrm d\mu \le \underset{i\in I}{\sum}\int_{A_i} f \mathrm d\mu. Therefore, by definition L1,

    \[\begin{split} \eta(A)&=\int_A f\mathrm d \mu = \sup \Big\{ \int_A \varphi \mathrm d \mu\ | \varphi < \infty \text{ and of finite support and } 0\le \varphi \le f \Big\}\\&= \sup \Big\{\underset{i\in I}{\sum} \int_{A_i} \varphi \mathrm d \mu\ | \varphi \quad " \Big \}\overset{(\text{as }\varphi\le f)}{\le} \underset{i\in I}{\sum} \int_{A_i} f \mathrm d \mu= \underset{i\in I}{\sum}\eta(A_i)\\&\therefore \eta(\underset{i\in I}\bigcup A_i)\le \underset{i\in I}{\sum}\eta(A_i)\end{split}\]

Because f\ge 0 and \mu is a non-negative (countably additive) measure, it is clear that \eta (A) \ge \eta (A_n) for any n\in I, since A_n\subset A. Now, if \eta (A_n)=+\infty for some n\in I, then \eta (A) = \infty and \sum \eta(A_i)=\infty, so \eta (A)=\sum \eta(A_i)=\infty and the claim is trivial.

So, we assume that \forall i\in I,\ \eta (A_i) < \infty, i.e. finite. Therefore, for any \varepsilon > 0 we can find a simple function \varphi with 0\le \varphi \le f (for instance the pointwise maximum of one simple function chosen for A_1 and one chosen for A_2) such that,

    \[\begin{split}\int_{A_1} \varphi\mathrm d \mu \ge\int_{A_1} f\mathrm d \mu - \varepsilon&, \quad \int_{A_2} \varphi\mathrm d \mu \ge\int_{A_2} f\mathrm d \mu - \varepsilon\\\implies \eta(A_1\cup A_2)= \int_{A_1\cup A_2} f\mathrm d \mu \ge \int_{A_1\cup A_2} \varphi\mathrm d \mu &= \int_{A_1} \varphi\mathrm d \mu + \int_{A_2} \varphi\mathrm d \mu\ge \eta(A_1) + \eta(A_2) - 2\varepsilon\end{split}\]

Because \varepsilon is arbitrary, above indicates that \eta(A_1\cup A_2) \ge \eta(A_1) + \eta(A_2). It follows that (by induction if you want), for any n,

    \[ \eta(\bigcup_{i=1}^n A_i) \ge  \sum_{i=1}^n \eta(A_i)\]

And because A\supset \bigcup_{i=1}^n A_i,

    \[\eta(A)\ge \underset{n\to \infty}{\lim} \sum_{i=1}^n  \eta(A_i) \equiv  \underset{i\in I}{\sum} \eta(A_i)\]

Therefore (considering the first part of the proof),

    \[ \eta(A=\cup A_i) = \underset{i\in I}{\sum} \eta(A_i) \quad \blacksquare\]

(b) If f is integrable, then the integrals of f^+ and f^- are both finite and the proof of (a) can be applied to each part.

Corollary L1: for sets A and B such that B\subset A and \mu (A\setminus B)=0, then \int_A f\mathrm d \mu = \int_B f \mathrm d \mu. This shows that a set of measure zero is negligible in integration.

Proposition: Let f be a non-negative bounded measurable function on S. If \int_S f = 0, then f\equiv 0 almost everywhere on S.

Theorem L6: If a measurable function f is (Lebesgue) integrable (finite in fact) with respect to a measure \mu on S, then |f| is (Lebesgue) integrable on S, and |\int_S f \mathrm d \mu| \le  \int_S |f| \mathrm d \mu.

Proof: Let S=A\cup B be a disjoint partition such that f \ge 0 on A and f < 0 on B. Then, by theorem L5,

    \[ \int_S |f| \mathrm d \mu = \int_A  |f| \mathrm d \mu +   \int_B  |f| \mathrm d \mu =  \int_A  f^+ \mathrm d \mu +   \int_B  f^- \mathrm d \mu < \infty \quad \text{as } f \text{ is integrable}. \]

For the second part, since -f, f\le |f| we can write \int_S f \mathrm d \mu \le \int_S  |f| \mathrm d \mu and -\int_S f \mathrm d \mu \le \int_S  |f| \mathrm d \mu which implies |\int_S f \mathrm d \mu| \le  \int_S |f| \mathrm d \mu \blacksquare.

By the above theorem, we see that integrability of f implies that of |f|. Because of that, the Lebesgue integral is called an absolutely convergent integral. It should be noted that Riemann integration is not necessarily absolutely convergent.

Theorem L7: For a measurable function f on S, if |f|\le g and g is integrable on S, then f is integrable on S.

Theorem L8 [Lebesgue’s monotone convergence theorem]: For a measurable set S, let \{f_i\} be a sequence of measurable functions, f_i: \Omega \to \mathbb R, such that

    \[0\le f_1(x) \le f_2(x) \le \cdots \quad x\in S\]

Let f be defined as the following pointwise convergence,

    \[f_n(x) \to f(x)\quad n\to \infty\quad \forall x\in S\]


Then,

    \[ \int_S f \mathrm d \mu = \int_S \underset{n\to \infty}{\lim} f_n \mathrm d \mu =  \underset{n\to \infty}{\lim} \int_S f_n \mathrm d \mu \]

Note that the sequence \{f_i\} converges to f from below. Also, f may or may not be bounded, and hence the integral may be finite or +\infty. For the proof see [WR].
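A concrete instance of monotone convergence is sketched below, assuming the hypothetical example f(x)=x^{-1/2} on (0,1] with the truncations f_n=\min(f,n). The integral of each f_n is written in closed form instead of being computed numerically.

```python
def integral_truncated(n):
    """Integral over (0, 1] of f_n = min(x**-0.5, n), for an integer n >= 1.

    f_n equals n on (0, 1/n^2] and x**-0.5 on (1/n^2, 1], so the integral is
    n * (1/n^2) + (2 - 2/n) = 2 - 1/n.
    """
    cut = 1.0 / n**2
    return n * cut + (2.0 - 2.0 * cut**0.5)

integrals = [integral_truncated(n) for n in range(1, 2001)]
```

The f_n increase pointwise to the unbounded function f, and their integrals 2-1/n increase to 2, which is the value of \int_{(0,1]} f \mathrm d\mu.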

Theorem L8: For a measurable set S, if \{f_n\}_1^\infty is a sequence of nonnegative measurable functions (\Omega \to \mathbb R) and

    \[ f(x) = \sum_{n=1}^\infty f_n(x)\quad \forall x\in S\]

then,

    \[\int_S f \mathrm d \mu = \int_S  \sum_{n=1}^\infty f_n \mathrm d \mu = \sum_{n=1}^\infty  \int_S f_n \mathrm d \mu \]

Note that \{f_n\} has to be nonnegative but doesn’t need to be monotone. The proof of this theorem follows by noting that the partial sums of the infinite sum form a monotonically increasing sequence and using Lebesgue’s monotone convergence theorem.

Theorem L9 [Fatou’s theorem]: For a measurable set S, if \{f_n\}_1^\infty is a sequence of nonnegative measurable functions and f(x) = \liminf_{n\to \infty} f_n(x)\quad \forall x \in S, then

    \[\int_S f \mathrm d \mu =  \int_S  \liminf_{n\to \infty} f_n(x)  \mathrm d \mu  \le   \liminf_{n\to \infty}   \int_S f_n  \mathrm d \mu \]

Note that \{f_i\} does not need to be monotone.
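The inequality in Fatou's theorem can be strict. A standard illustration, sketched here with hypothetical helper names, is f_n = n\mathcal X_{(0,1/n)} on (0,1] with the Lebesgue measure: every f_n has integral 1, while f_n(x)\to 0 at each fixed x.

```python
from fractions import Fraction

def f_n(n, x):
    """f_n = n on (0, 1/n) and 0 elsewhere on (0, 1]."""
    return float(n) if 0.0 < x < 1.0 / n else 0.0

def integral_f_n(n):
    """Lebesgue integral of f_n: the value n times the length 1/n of (0, 1/n)."""
    return Fraction(n) * Fraction(1, n)

x = 0.01
tail = [f_n(n, x) for n in range(100, 1000)]   # f_n(x) = 0 once 1/n <= x
```

Here the lim inf of the functions is 0 everywhere, so the left-hand side is 0, while the lim inf of the integrals is 1. The sequence is not monotone, so the monotone convergence theorem does not apply.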

From theorem L9, we can conclude that if a measurable function f(x) is the pointwise limit of a sequence of nonnegative measurable functions \{f_n\}, then \int_S f \mathrm d \mu \le   \liminf_{n\to \infty}   \int_S f_n  \mathrm d \mu. This is because whenever \lim_{n\to \infty} f_n(x) exists it equals \liminf_{n\to \infty} f_n(x). If the sequence is monotonically increasing, then the monotone convergence theorem (Theorem L8) applies and equality holds.

Theorem L10 [Lebesgue’s dominated convergence theorem]: For a measurable set S, let \{f_n\} be a sequence of measurable functions such that f(x) = \underset{n\to\infty}{\lim} f_n(x)\ ,x\in S. If there exists an integrable function g(x) on S such that |f_n(x)| \le g(x) for all n and x\in S, meaning that the sequence \{f_n\} is dominated by g, then

    \[ \underset{n\to\infty}{\lim}  \int_S f_n \mathrm d \mu =\int_S  \underset{n\to\infty}{\lim}  f_n \mathrm d \mu = \int_S f \mathrm d \mu \]

Corollary L2: If \mu (S)<+\infty, i.e. finite measure, and \{f_n\} is uniformly bounded on S, and f_n \to f on S, then theorem L10 holds.

[WR] Walter Rudin, Principles of Mathematical Analysis, Third Edition, McGraw-Hill (1976).

Tensors 2

Tensor as an element of tensor product of vector spaces

Before presenting another way of defining a tensor, we define a notation. A linear map and a bilinear form are respectively written as a linear combination of e_i\varepsilon^j and \varepsilon^i\varepsilon^j. Any of these (for any i,j \le the dimensions of the corresponding spaces) can be considered as one new object and denoted as, for example, \clubsuit_i^j:=e_i\varepsilon^j and \spadesuit^{ij}:=\varepsilon^i\varepsilon^j. The writing of the basis vectors and/or basis covectors adjacent to each other is usually denoted by e_i\otimes\varepsilon^j and \varepsilon^i\otimes\varepsilon^j. This notation is referred to as the tensor product of (basis) vectors. A general definition will be presented later. Using this notation, for now, we can write a linear map and a bilinear form as,

    \[T=T_j^ie_i\otimes\varepsilon^j\quad\quad \mathfrak B = \mathfrak B_{ij}\varepsilon^i\otimes\varepsilon^j\]

This notation can be extended to any finite linear combination of tensor products of basis vectors and/or covectors where the combination coefficients take indices following the index level convention. For example we can write,

    \[\mathcal T:=\mathcal T^{ijlt}_{ks} e_i\otimes e_j\otimes e_l\otimes e_t \otimes \varepsilon^k\otimes \varepsilon^s    \]

Let’s define tensor product of vectors and covectors and their rules.

Tensor product of vectors and covectors

Let u, v and w be vectors or covectors (not necessarily basis ones). Then we define the tensor product of each pair as uv \equiv u\otimes v, etc., with the following rules and operations,

  0. Order matters: in general u\otimes v \ne v\otimes u.
  1. Scalar multiplication: \alpha (u\otimes v) = (\alpha u) \otimes v = u\otimes (\alpha v).
  2. Addition: u\otimes (v+w)=u\otimes v + u \otimes w and (v+w)\otimes u = v\otimes  u + w \otimes u.

The above rules can be extended to tensor product of any number of vectors or covectors. For example,

1. Scalar multiplication:

    \[\alpha (u\otimes v\otimes w \otimes x \otimes y) = (\alpha u) \otimes v \otimes w \otimes x \otimes y= u\otimes (\alpha v) \otimes w \otimes x \otimes y = \cdots = u\otimes v \otimes w \otimes x \otimes (\alpha y)\]

2. Addition:

    \[u\otimes v \otimes w \otimes x \otimes y + u\otimes v \otimes z \otimes x \otimes y = u\otimes v \otimes (w+z)\otimes x \otimes y\]
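These rules can be checked on components. The sketch below represents u\otimes v by its array of components u^i v^j; this identification and the helper names `outer`, `add` and `scale` are assumptions made for illustration, with arbitrary integer vectors.

```python
def outer(u, v):
    """Components of u tensor v: entry (i, j) is u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]

def add(a, b):
    """Entrywise sum of two component arrays of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(alpha, a):
    """Multiply every component by the scalar alpha."""
    return [[alpha * x for x in row] for row in a]

u, v, w, alpha = [1, 2], [3, 4], [5, 6], 7
```

With these definitions, `outer(u, v)` and `outer(v, u)` differ, while scaling either factor or distributing over a sum gives the same components, as the rules state.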

The above rules can be used to construct vector spaces, called tensor-product vector spaces. For example, if v\in V and f\in V^*, then,

    \[v\otimes f \in V\bigotimes V^*\]

Any vector spaces can get into a tensor product. For example, V\bigotimes V\bigotimes V^*\bigotimes V\bigotimes V^* with members like u\otimes v \otimes f \otimes w \otimes h with u,v,w \in V and f,h\in V^*.

Note that the tensor product can be taken over totally different vector spaces over the same field, e.g. V\bigotimes W\bigotimes U^*.

Basis for a tensor product space

Let V and W be vector spaces with bases \{e_i\}_1^n and \{\zeta_j\}_1^m respectively. If v\in V and w\in W, we can write,

    \[v\otimes w = (v^ie_i)\otimes (w^j\zeta_j)=v^i w^j e_i\otimes \zeta_j\]

This states that any vector v\otimes w \in V\bigotimes W, and hence any finite sum of such products, can be written as a linear combination of e_i\otimes \zeta_j. Therefore,

  1. The set of vectors \{e_i\otimes \zeta_j | i=1,\cdots, n \text{ and } j=1,\cdots, m\} is a basis for the vector space V\bigotimes W.
  2. The dimension of the vector space V\bigotimes W is n\times m.

The above can be extended to tensor product of any number of vector spaces, i.e. the tensor product of the basis vectors of vector spaces creates a basis for the resultant tensor product space.
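The counting of basis elements and the components of v\otimes w can be sketched as follows. The dimensions n=2, m=3 and the component values are hypothetical, and basis elements e_i\otimes\zeta_j are represented only by their index pairs (i, j), numbered from 0.

```python
from itertools import product

n, m = 2, 3
v = [1, 2]          # components v^i with respect to {e_i}
w = [3, 4, 5]       # components w^j with respect to {zeta_j}

# One basis element e_i tensor zeta_j per index pair (i, j).
basis_labels = list(product(range(n), range(m)))

# Components of v tensor w in that basis: v^i * w^j.
components = {(i, j): v[i] * w[j] for i, j in basis_labels}
```

There are n times m = 6 index pairs, matching the dimension of the tensor product space, and each component of v\otimes w is the product of one component of v and one of w.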

Example: Let u=u^ie_i\in V and f=f_i\varepsilon^i,h=h_j\varepsilon^j \in V^* and \alpha\in \mathbb R.

Then, \{e_i\otimes \varepsilon^j\}_{i,j=1}^n is a basis for V\bigotimes V^*, and,

    \[\begin{split}V\bigotimes V^* &\ni u\otimes f + \alpha u\otimes h = u\otimes (f + \alpha h) \\& =u^ie_i\otimes(f_j\varepsilon^j + \alpha h_k\varepsilon^k)=u^if_je_i\otimes\varepsilon^j+\alpha u^ih_ke_i\otimes\varepsilon^k\\&\overset{k \text{ is dummy}}{=}u^if_je_i\otimes\varepsilon^j+\alpha u^ih_je_i\otimes\varepsilon^j=u^i(f_j +\alpha h_j)(e_i\otimes\varepsilon^j)\equiv \varphi^i_j e_i\otimes\varepsilon^j\end{split}\]

Tensor by tensor product

Definition (Tensor-product view): A tensor is a collection of vectors and covectors combined together by using the tensor product (of vectors and/or covectors). A tensor \mathcal T of type (r,s) is a member of the tensor product space,

    \[\underbrace{V^*\bigotimes \cdots\bigotimes V^*}_{\text{r times}}\bigotimes \underbrace{V\bigotimes \cdots\bigotimes V}_{\text{s times}}\]

and written as,

    \[\mathcal T = \mathcal T_{i_1\cdots i_r}^{j_1\cdots j_s}\varepsilon^{i_1}\otimes\cdots\otimes \varepsilon^{i_r}\otimes e_{j_1}\otimes\cdots\otimes e_{j_s}\]

Note that \mathcal T_{i_1\cdots i_r}^{j_1\cdots j_s} collects the components (coordinates) of the tensor \mathcal T with respect to the basis vectors \{\varepsilon^{i_1}\otimes\cdots\otimes \varepsilon^{i_r}\otimes e_{j_1}\otimes\cdots\otimes e_{j_s}\} or inherently \{\varepsilon^\}

In this view, a vector v=v^ie_i\in V is a (0,1) tensor, a covector f=f_i\varepsilon^i\in V^* is a (1,0) tensor. A linear map T=T_j^i\varepsilon^j\otimes e_i is a (1,1) tensor. A bilinear form T=T_{ij}\varepsilon^i\otimes \varepsilon^j is a (2,0) tensor. A bilinear map T=T_{ij}^k \varepsilon^i\otimes\varepsilon^j\otimes e_k is a (2,1) tensor.

Tensors 1


Einstein summation convention is used here. A matrix M is denoted as [M] and its ij-th element is referred to by [M]_{ij}. Quantities or coefficients are indexed as for example u^i, A_{ij} or A_i^j. These indices do not automatically pertain to row and column indices of a matrix, but the quantities can be presented by matrices through isomorphisms once their indices are freely interpreted as rows and columns of matrices.

Coordinates of a vector

Let V be an n-dimensional vector space and \mathcal B=\{e_1,\cdots, e_n\} with e_i\in V be a basis for V. Then, we define the coordinate function as,

    \[[\cdot]_{\mathcal B}:V\to \mathbb M^{n\times 1}\]

such that for a vector v\in V written by its components (with respect to \mathcal B) as v=v_ie_i the function acts as,

    \[[v]_{\mathcal B}=\begin{bmatrix}v_1 \\ \vdots \\ v_n \end{bmatrix}\]

The coordinate function is a linear map.

Change of basis for vectors

Let \mathcal B and \tilde {\mathcal B} be two bases for V, then,

\tilde e_i=F_{ji}e_j and e_i=B_{ji}\tilde e_j

where the indices of the scalar terms F_{ji} and B_{ji} are intentionally set this way. So, if all F_{mn} are collected into a matrix [F], then the sum F_{ji}e_j is over the rows of the matrix for a particular column. In other words, we can utilize the rule of matrix multiplication and write,

    \[\tilde e_i=F_{ji}e_j \iff \begin{bmatrix} \tilde e_1\\ \vdots \\ \tilde e_n \end{bmatrix}=[F]^{\rm T}\begin{bmatrix} e_1\\ \vdots \\ e_n \end{bmatrix}\]

The same is true for [B] with [B]_{ji}:=B_{ji}. In the above formulations, note that j is a dummy index (i.e. we can equivalently write \tilde e_j=F_{ij}e_i=F_{kj}e_k).

Setting \mathcal B as the initial (old) basis and writing the current (new) basis \tilde {\mathcal B} in terms of \mathcal B is referred to as the forward transform, denoted by F_{ij}. Correspondingly, B_{ij} is called the backward transform.

The relation between the forward and backward transforms is obtained as follows,

    \[\begin{split}e_i &= B_{ji} \tilde e_j=B_{ji}F_{kj}e_k\\&\implies B_{ji}F_{kj}=\delta_{ik}\\&\therefore [F]=[B]^{-1} \ , [B] = [F]^{-1}\end{split}\]

We now find how vector coordinates are transformed relative to different bases. A particular v\in V can be expressed by its components according to any of \mathcal B or \tilde{\mathcal B} basis, therefore,

    \[v=v_ie_i=\tilde v_i \tilde e_i\]

To find the relation between [v]_{\tilde{\mathcal B}} and [v]_{\mathcal B} we write,

    \[\begin{split}v&=v_ie_i=\tilde v_i\tilde e_i \implies v_ie_i = v_i B_{ji} \tilde e_j\equiv C_j\tilde e_j\implies C_j=\tilde v_j\\&\therefore \tilde v_i = B_{ij}v_j\equiv [B][v]_{\mathcal B}\\&\implies v_i = F_{ij}\tilde v_j\equiv [F][v]_{\tilde {\mathcal B}}\end{split}\]

As can be observed, the old basis is transformed to the new basis by the forward transform F_{ij}, while the old coordinates v_i are transformed to the new ones, \tilde v_i, by the backward transform B_{ij}. Because the coordinates of v behave contrary to the basis vectors in transformation, the coordinates or the scalar components are said to be contravariant. A vector can be called a contravariant object because its scalar components (coordinates) transform differently from the basis vectors whose linear combination equals the vector. Briefly,

Proposition: Let v=v_ie_i. Then, the scalar components/coordinates v_i are transformed by B_{ij} if and only if the basis vectors e_i are transformed by F_{ij}, such that B_{ji}F_{kj}=\delta_{ik}.

Later, a vector is called a contravariant tensor. For the sake of notation and to distinguish between the transformations of the basis and the coordinates of a vector, the index of a coordinate is written as a superscript to show it is contravariant. Therefore,

    \[v = v^ie_i=\tilde v^i\tilde e_i\]
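As a minimal numerical sketch of the contravariant rule, the following Python snippet uses a hypothetical change of basis in \mathbb R^2 (the matrix F and the coordinates are illustrative values, not taken from the text). It computes [B]=[F]^{-1}, transforms the coordinates by B, and checks that the same vector is recovered from the new coordinates and the new basis.

```python
# Old basis: the standard basis of R^2. New basis vectors in terms of the old ones
# (hypothetical values): e~_1 = 2 e_1 + 0 e_2, e~_2 = 1 e_1 + 1 e_2.
# Column i of F holds the old-basis coordinates of e~_i, i.e. F[j][i] = F_{ji}.
F = [[2.0, 1.0],
     [0.0, 1.0]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[ m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det,  m[0][0] / det]]

def mat_vec(m, v):
    """Matrix-vector product: (m v)_i = m_{ij} v_j."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

B = inverse_2x2(F)          # backward transform, [B] = [F]^{-1}

v_old = [3.0, 2.0]          # coordinates v^i with respect to the old basis
v_new = mat_vec(B, v_old)   # contravariant rule: v~^i = B_{ij} v^j

# Rebuild the vector from the new coordinates: v = v~^i e~_i,
# which in old-basis coordinates is [F] applied to the new coordinates.
v_rebuilt = mat_vec(F, v_new)
print(v_new, v_rebuilt)
```

The basis vectors are scaled and sheared by F while the coordinates move the opposite way through B, so the vector itself is unchanged.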

Linear maps and linear functionals

Definition: \mathcal L(V, W) is defined as the space of all linear maps V\to W where the domain and codomain are vector spaces.

It can be proved that \mathcal L is a vector space (\mathcal L, +, \cdot), hence, for T_1, T_2\in \mathcal L(V,W), \alpha\in \mathbb R and v\in V,

    \[(\alpha\cdot T_1)(v)=\alpha T_1(v)\quad , \quad (T_1+T_2)(v)=T_1(v)+T_2(v)\]

Note that the addition on the LHS is an operator in \mathcal L and the addition on the RHS is an operator in W.

Proposition 1: Let T\in \mathcal L(V, W), i.e a linear map from a vector space V to another one W. If \mathcal B=\{e_1, \cdots, e_n\} is a basis for V, and T(e_i)=w_i for w_i\in W and i=1,\cdots , n, then T is uniquely defined over V.

This proposition says that a linear map on a space is uniquely determined by its action on the basis vectors of that space. In other words, if T(e_i)=w_i and T^*(e_i)=w_i then \forall v\in V, \ T(v)=T^*(v). Proof: let T(e_i)=w_i (given by the nature of T). For v\in V such that v=v^ie_i, linearity gives T(v)=T(v^ie_i)=v^iT(e_i)=v^iw_i. Because the v^i‘s are unique for (a particular) v, the value v^iw_i is unique for v and hence T(v) is fixed for every v\in V. In other words, there is only one T over V such that T(e_i)=w_i.

As a side remark, if \mathcal B=\{e_1, \cdots, e_n\} is a basis for V, hence spanning V, then \{T(e_i)| i=1,\cdots n\} spans the range of T; the range of T is a subspace of W.

By this proposition, a matrix that completely determines a linear map can be obtained. Let V be n-dimensional with a basis \mathcal B=\{e_i\}_1^n, and W be m-dimensional with a basis \mathcal B'=\{e_i'\}_1^m. Then there are coefficients T_i^j such that,

    \[T(e_i)= T_i^j e_j'\]

In the notation T_i^j, the index j is a superscript because for a fixed e_i, and hence a fixed i, the term T_i^j is a coordinate of the vector T(e_i)\in W and is therefore contravariant (e.g. T(e_3)=T_3^j e_j'\equiv v^je_j').

For v\in V, and w=T(v), with the coordinates [v]_{\mathcal B} and [w]_{\mathcal B'}, we can show that,

    \[w^j = T_i^jv^i\]

This expression can be written as the matrix multiplication [w]_{\mathcal B'}=[T][v]_{\mathcal B}, where [T]:=\mathcal M(T)\in \mathbb M^{m\times n} is presented by its elements as,

    \[\begin{bmatrix} T_1^1 && T_2^1 && \cdots && T_n^1 \\T_1^2 && T_2^2 && \cdots && T_n^2 \\\vdots && \vdots && \cdots && \vdots\\T_1^m && T_2^m && \cdots && T_n^m \end{bmatrix}\]

As a remark, the above matrix can be viewed column by column and written as,

    \[[T]=\begin{bmatrix} [T(e_1)]_{\mathcal B'} && [T(e_2)]_{\mathcal B'} && \cdots && [T(e_n)]_{\mathcal B'} \end{bmatrix}\]
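The column view can be sketched in a few lines of Python. The images T(e_1) and T(e_2) below are hypothetical values for a map from a 2-dimensional V to a 3-dimensional W; the snippet assembles [T] from them and checks that applying the matrix agrees with using linearity directly.

```python
# Coordinates of the images of the basis vectors of V with respect to the
# basis B' of W (hypothetical values).
T_e1 = [1.0, 0.0, 2.0]     # [T(e_1)]_{B'}
T_e2 = [0.0, 1.0, -1.0]    # [T(e_2)]_{B'}
images = [T_e1, T_e2]

m, n = 3, 2                # dim W, dim V

# Row j, column i of [T] is T_i^j, the j-th coordinate of T(e_i),
# so the images of the basis vectors become the columns of the matrix.
T = [[images[i][j] for i in range(n)] for j in range(m)]

def apply_matrix(mat, v):
    """w^j = T_i^j v^i for a matrix stored as nested lists."""
    return [sum(mat[j][i] * v[i] for i in range(len(v))) for j in range(len(mat))]

v = [3.0, 4.0]             # coordinates of v with respect to B
w = apply_matrix(T, v)

# Same result from linearity alone: T(v) = v^1 T(e_1) + v^2 T(e_2).
w_by_linearity = [v[0] * T_e1[j] + v[1] * T_e2[j] for j in range(m)]
print(T, w, w_by_linearity)
```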

Linear functional (linear form or covector)

Definition: a linear functional on V is a linear map f\in V^* :=\mathcal L(V,\mathbb F). The space V^* is called the dual space of V.

Proposition: Let \mathcal B=\{e_1, \cdots, e_n\} and \varepsilon_i \in V^* be defined as \varepsilon_i(e_j):=\delta_{ij}. Then, \{\varepsilon_i\}_1^n called dual basis of \mathcal B, is a basis of V^*, and hence \dim V = \dim V^*.

Proof: first we show that the \varepsilon_i‘s are linearly independent, i.e. c_i\varepsilon_i=0 \implies c_i=0 \ \forall i=1, \cdots, n. Note that on the RHS, 0\in V^*. For a v\in V we can write c_i\varepsilon_i(v) and assume c_i\varepsilon_i(v)=0. Then,

    \[c_i\varepsilon_i(v)=c_i\varepsilon_i(v^je_j)=0\implies c_iv^j\varepsilon_i(e_j)=0\implies c_iv^j\delta_{ij}=0\implies c_iv^i=0\]

Since v is arbitrary, c_i=0 ■ .

Now we prove that \{\varepsilon_i\}_1^n spans V^*. I.e \forall f \in V^* \exists \{c_1, \cdots, c_n\} such that f=c_i\varepsilon_i. To this end, we apply both sides to a basis vector of V and write f(e_j)=c_i\varepsilon_i(e_j) which implies f(e_j)=c_j or explicitly c_j is found as c_j=f(e_j). Consequently, f=f(e_i)\varepsilon_i ■.

Consider V and \mathcal B. If f\in V^*, then the matrix of the linear functional/map f is

    \[[M]=\mathcal M(f)=\begin{bmatrix} f(e_1) && \cdots && f(e_n)\end{bmatrix}\in \mathbb M^{1\times n}\]

So, for v\in V as v=v^ie_i we can write,

    \[f(v)=[M][v]_\mathcal B\quad \in \mathbb R\]

Result: if the coordinates of a vector are shown by a column vector or single-column matrix (which is a vector in the space of \mathbb M^{n\times 1}), then a row vector or a single-row matrix represents the matrix of a linear functional.

Definition: a linear functional f\in V^*, which can be identified with a row vector as its matrix, is also called a covector.

Like vectors, a covector (and any linear map) is a mathematical object that is independent of a basis (i.e. invariant). The geometric representation of a vector in (or by an isomorphism in) \mathbb R^3 is an arrow in \mathbb E^3. For a covector isomorphic to \mathbb R^2, the representation is a set (stack) of planes in \mathbb E^3 that can be represented by iso lines in \mathbb E^2. A covector that is isomorphic to \mathbb R^3 can be represented by iso surfaces in \mathbb E^3.

Example: Let \mathcal B = \{e_1, e_2\} be a basis of V and [2,1] be the matrix of a covector f in some V^*. Then, if [x]_{\mathcal B} = [x_1,x_2]^{\rm T}, we can write,

    \[y=[2,1]\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \implies y = 2x_1 + x_2\]

which, for different values of y, is a set of (iso) lines in a Cartesian coordinate system defined by two axes x_1 and x_2 along e_{1g} and e_{2g} that are the geometric representations of e_1 and e_2. The Cartesian axes are not necessarily orthogonal.

If we choose any other basis \tilde {\mathcal B} = \{\tilde e_1, \tilde e_2\} for V, then the matrix of the covector f changes. The geometric representations of \{\tilde e_1, \tilde e_2\} are also different from e_{1g} and e_{2g}; the two changes compensate each other, and hence the geometric representation of the covector keeps the same shape.

Example: Let \mathcal B = \{e_1, e_2\} be a basis of V and \mathcal B^* = \{\varepsilon_1, \varepsilon_2\} be a basis for V^*. This means \varepsilon_i\in V^* and \varepsilon_i(e_j):=\delta_{ij}. Then, the matrix of each dual basis vector is,

    \[\mathcal M(\varepsilon_i)=\begin{bmatrix}\varepsilon_i(e_1) && \varepsilon_i(e_2)\end{bmatrix} = \begin{bmatrix}\delta_{i1} && \delta_{i2}\end{bmatrix}\]

Change of basis for covectors

Let \mathcal B = \{e_i\}_1^n and \tilde{\mathcal B} = \{\tilde e_i\}_1^n be two bases for V, and hence, \mathcal B^* = \{\varepsilon_i\}_1^n and \tilde{\mathcal B^*} = \{\tilde \varepsilon_i\}_1^n be two bases for V^*. Each dual basis vector \tilde \varepsilon_i can be written in terms of the (old) dual basis vectors by using a linear transformation as \tilde \varepsilon_i = Q_{ij}\varepsilon_j. Now, the coefficients Q_{ij} are to be determined as follows,

    \[\begin{split}\tilde \varepsilon_i(e_k) &= Q_{ij}\varepsilon_j(e_k)=Q_{ij}\delta_{jk}=Q_{ik}\\&\implies \tilde \varepsilon_i(e_k) = Q_{ik}\\&\therefore Q_{ij}=\tilde \varepsilon_i(e_j)\end{split}\]

Using the formula e_i=B_{ji}\tilde e_j regarding the change of basis of vectors, the above continues as,

    \[\begin{split}Q_{ij}&=\tilde \varepsilon_i(e_j)=\tilde \varepsilon_i(B_{kj}\tilde e_k)\\&\text{by linearity of covectors}= B_{kj}\tilde \varepsilon_i(\tilde e_k)=B_{kj}\delta_{ik}=B_{ij}\\\therefore Q_{ij}&=B_{ij}\end{split}\]

This indicates that the dual basis vectors are transformed by the backward transformation. Referring to the index convention, we use a superscript for components that are transformed through a backward transformation. Therefore,

    \[\tilde \varepsilon^i=B_{ij}\varepsilon^j\]

meaning that dual basis vectors are contravariant because they behave contrary to the basis vectors in transformation from e_i to \tilde e_i.

Now let f\in V^*. Writing f=c_i\varepsilon^i=\tilde c_j \tilde \varepsilon^j and using the above relation in its inverted form, \varepsilon^i=F_{ij}\tilde \varepsilon^j, we get,

    \[\begin{split}c_i\varepsilon^i&=\tilde c_j \tilde \varepsilon^j \implies c_i F_{ij}\tilde \varepsilon^j=\tilde c_j \tilde \varepsilon^j\\\tilde c_j &= F_{ij}c_i\end{split}\]

meaning that the coordinates of a covector transform in a covariant manner when the basis of the vector space changes from e_i to \tilde e_i.
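A quick way to see why the two rules fit together is to check that the scalar f(v) does not depend on the basis. The sketch below uses hypothetical values for F, c_i and v^i; B is entered by hand as the inverse of F.

```python
F = [[2.0, 1.0],
     [0.0, 1.0]]           # forward transform F_{ij} (hypothetical values)
B = [[0.5, -0.5],
     [0.0, 1.0]]           # backward transform, entered by hand as the inverse of F

n = 2
c_old = [2.0, 1.0]         # covector coordinates c_i in the old dual basis
v_old = [3.0, 2.0]         # vector coordinates v^i in the old basis

# Covariant rule: c~_j = F_{ij} c_i (sum over the first index of F).
c_new = [sum(F[i][j] * c_old[i] for i in range(n)) for j in range(n)]
# Contravariant rule: v~^i = B_{ij} v^j.
v_new = [sum(B[i][j] * v_old[j] for j in range(n)) for i in range(n)]

f_old = sum(c_old[i] * v_old[i] for i in range(n))   # f(v) = c_i v^i
f_new = sum(c_new[i] * v_new[i] for i in range(n))   # f(v) = c~_i v~^i
print(c_new, v_new, f_old, f_new)
```

The covariant change of c_i and the contravariant change of v^i cancel, so both sums give the same number.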

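Briefly, the following relations have been shown.

    \[\begin{split}\tilde e_i&=F_{ji}e_j, \quad e_i=B_{ji}\tilde e_j\\\tilde v^i &= B_{ij}v^j\\\tilde \varepsilon^i&=B_{ij}\varepsilon^j\\\tilde c_j &= F_{ij}c_i\end{split}\]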

Basis and change of basis for the space of linear maps \mathcal L(V, W)

As can be proved, \mathcal L(V,W) is a linear vector space and any linear map is a vector. Therefore, we should be able to find a basis for this space. If V is n-dimensional and W is m-dimensional, then \mathcal L(V,W) is mn-dimensional and hence its basis should have m\times n vectors, i.e. linear maps. Let’s enumerate the basis vectors of \mathcal L as \varphi_{ij}\in \mathcal L (V,W) for i=1, \cdots , m and j=1, \cdots , n, then any linear map T can be written as,

    \[ T = c_{ij}\varphi_{ij}\]

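By proposition 1, any linear map is uniquely determined by its action on the basis vectors of its domain. If \mathcal B = \{e_i\}_1^n is a basis for V, then for any basis vector e_k,

    \[T(e_k) = c_{ij}\varphi_{ij}(e_k)\]

Setting a basis for W as \mathcal B' = \{e_i'\}_1^m and denoting by a_{ik} the coordinates of T(e_k) with respect to \mathcal B', i.e. T(e_k)=a_{ik}e_i', the above equation becomes,

    \[a_{ik}e_i' = c_{ij}\varphi_{ij}(e_k)\]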

This equation holds if,

    \[\begin{matrix}c_{ij}=a_{ik} \text{ and }  \varphi_{ij}(e_k)=e_i' && \text{ if } k=j \\c_{ij} = 0 \text{ and } \varphi_{ij}(e_k)=0 && \text{ if } k\ne j\end{matrix}\]

Therefore, we can choose a set of m\times n basis vectors \varphi_{ij} for \mathcal L (V,W) as,

    \[\varphi_{ij}(e_k) =  \begin{cases}e_i'\text{ if }k=j\\0\text{ if } k\neq  j\end{cases}\]

By recruiting the basis of V^*, the above can be written as,

    \[\{\varphi_i^j=e_i'\varepsilon^j \mid i=1,\cdots, m \text{ and } j=1,\cdots, n\}\]

The term e_i'\varepsilon^j is obviously a linear map V\to W. It can be readily shown that the derived basis vectors \varphi_i^j=e_i'\varepsilon^j are linearly independent, i.e. if the linear combination c_{ij}\varphi_i^j=c_{ij}e_i'\varepsilon^j satisfies c_{ij}e'_i\varepsilon^j(v)=0(v) for any v\in V, then c_{ij}=0 for all i and j (here, note that 0\in \mathcal L).

So, a linear map T can be written as a linear combination T = c_{ij}e_i'\varepsilon^j. Here, it is necessary to use the index level convention. To this end, we observe that for a fixed i the term c_{ij} couples with \varepsilon^j and represents the coordinates of a covector. As coordinates of a covector are covariant, index j is written as subscript. For a fixed j though, the term c_{ij} couples with e_i' and represents the coordinates of a vector. As coordinates of a vector are contravariant, index i should rise. Therefore, we write,

    \[T = c_j^ie_i'\varepsilon^j\]

The coefficients c_j^i can be determined as,

    \[\forall e_k\in \mathcal B\quad T(e_k) = c_j^ie_i'\varepsilon^j(e_k)=c_k^ie_i'\]

stating that c_k^i are the coordinates of T(e_k) with respect to the basis of W, i.e. \mathcal B'. Comparing with what was derived as T(e_k)=T_k^ie_i', we can conclude that c_k^i = T_k^i. Therefore,

    \[T=T_j^i e_i'\varepsilon^j\]

The above result can also be derived from w^i = T_j^iv^j as follows.

    \[\begin{split} w^i &= T_j^iv^j \implies w = (T_j^iv^j)e_i' = T_j^i\varepsilon^j(v)e_i'\\&\therefore T(v) = T_j^i\varepsilon^j(v)e_i' \text{ or } T = T_j^i e_i'\varepsilon^j \end{split}\]

Change of basis of \mathcal L(V,W) is as follows.

For \mathcal L(V,W), let \mathcal B=\{e_i\}_1^n and \tilde {\mathcal B}=\{\tilde e_i\}_1^n be bases for V, and \mathcal B'=\{e_i'\}_1^m and \tilde {\mathcal  B}'=\{\tilde e_i'\}_1^m be bases for W. Also, \mathcal B^*=\{\varepsilon^i\}_1^n and \tilde {\mathcal B}^*=\{\tilde \varepsilon^i\}_1^n are the corresponding bases of V^*. Forward and backward transformation pairs in V and W are denoted as (F_{ij}, B_{ij}) and (F'_{ij}, B'_{ij}).

    \[\begin{split} T&=T_j^ie_i'\varepsilon^j = \tilde {T_j^i} \tilde e_i'\tilde \varepsilon^j \\&\implies T_j^ie_i'\varepsilon^j = \tilde {T_j^i} F'_{ki} e_k' B_{js}\varepsilon^s \implies T_s^k = \tilde {T_j^i} F'_{ki} B_{js}\\&(\text{ by } B_{nl}F_{lm}=\delta_{nm}) \ \implies B'_{lk}F_{sr}T_s^k = \tilde {T_j^i} \delta_{li}\delta_{jr}=\tilde T_r^l\\&\therefore \tilde {T_j^i} = B'_{ik}F_{sj}T_s^k\end {split}\]

Note that the coordinates T_s^k of a linear map need two transformations such that the covariant index s of T_s^k pertains to the forward transformation and the contravariant index k pertains to the backward transformation.

Example: let T\in \mathcal L(V,V), then,

    \[T = T_i^je_j\varepsilon^i \quad \tilde {T_s^t} = B_{tj}F_{is}T_i^j\]

If the matrices, [F], [F]^{-1}=[B], and [T] are considered, we can write,

    \[ [\tilde T] =  [F]^{-1}[T][F]\]
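The matrix rule can be checked numerically. In the sketch below, the entries of F and T are hypothetical; B is entered by hand as the inverse of F. The snippet forms [\tilde T]=[F]^{-1}[T][F] and verifies that applying [\tilde T] to the new coordinates of v gives the new coordinates of w=T(v).

```python
def mat_mul(a, b):
    """Matrix product of nested lists: (a b)_{ij} = a_{ik} b_{kj}."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_vec(a, v):
    """Matrix-vector product: (a v)_i = a_{ij} v_j."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

F = [[2.0, 1.0],
     [0.0, 1.0]]                        # forward transform (hypothetical values)
B = [[0.5, -0.5],
     [0.0, 1.0]]                        # backward transform, inverse of F
T = [[1.0, 2.0],
     [3.0, 4.0]]                        # matrix of T in the old basis

T_new = mat_mul(B, mat_mul(T, F))       # [T~] = [F]^{-1} [T] [F]

v_old = [3.0, 2.0]
w_old = mat_vec(T, v_old)               # w = T(v) in old coordinates
v_new = mat_vec(B, v_old)               # contravariant change of v
w_new_direct = mat_vec(T_new, v_new)    # transformed matrix on transformed coordinates
w_new_expected = mat_vec(B, w_old)      # contravariant change of w
print(T_new, w_new_direct, w_new_expected)
```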

Bilinear forms

A bilinear form is a bilinear map defined as T:V\times V\to \mathbb R. Setting a basis for V, a bilinear form can be represented by matrix multiplications on the coordinates of the input vectors. If \{e_i\}_1^n is a basis for V, then

    \[\begin{split} T(u,v)&=T(u^ie_i,v^je_j)=u^iv^jT(e_i, e_j)\\&\implies T(u,v)=u^iv^jT_{ij} \quad \text{ with } T_{ij}:=T(e_i, e_j) \end{split}\]

which can be written as,

    \[T(u,v)=[u]^{\rm T}[T][v]\]

where [T]\in\mathbb M^{n\times n} with [T]_{ij}=T_{ij}.

The expression u^iv^jT(e_i, e_j) indicates that a bilinear form is uniquely defined by its action on the basis vectors. This is the same as what was shown for linear maps by proposition 1. This comes from the fact that a bilinear form is a linear map with respect to one of its arguments at a time.

Now we seek a basis for the space of bilinear forms, i.e. \mathcal L_b(V\times V, \mathbb R). This is a vector space with the following defined operations.

    \[\begin{split}\forall B_1, B_2 \in \mathcal L_b\quad (B_1+B_2)(u,v) &= B_1(u,v) + B_2(u,v)\\\forall \alpha\in \mathbb R\quad (\alpha B)(u,v) &= \alpha B(u,v) = B(\alpha u, v)= B(u, \alpha v)\end{split}\]

The dimension of this space is n\times n, therefore, for any bilinear form T there are coefficients c_{ij} and basis bilinear forms \rho_{ij}\in \mathcal L_b such that,

    \[T = c_{ij}\rho_{ij}\]

From the result T(u,v)=u^iv^jT(e_i, e_j)=u^iv^jT_{ij} we can conclude that

    \[\begin{split}T(u,v)&=u^iv^jT_{ij}= \varepsilon^i(u)\varepsilon^j(v)T_{ij}\implies T(.,.) = T_{ij}\varepsilon^i(.)\varepsilon^j(.)\\\therefore c_{ij}&=T_{ij}, \quad \rho_{ij}= \varepsilon^i\varepsilon^j\quad \text {or } \rho_{ij}(e_s,e_t)=\begin{cases} 1& s=i \text { and } t=j\\ 0& \text {otherwise}\end{cases}\end{split}\]

Following the index level convention, the indices of T_{ij} should stay as subscripts because each index pertains to the covariant coordinates of a covector after fixing the other index.

If \mathcal B and \tilde {\mathcal B} are two bases for V, then the change of basis of the space of bilinear forms is as follows.

    \[\begin{split} T&=T_{ij}\varepsilon^i\varepsilon^j = \tilde T_{ij} \tilde \varepsilon^i\tilde \varepsilon^j = \tilde T_{ij}B_{is}\varepsilon^sB_{jt}\varepsilon^t\\&\implies T_{st}=\tilde T_{ij}B_{is}B_{jt}\\&\therefore \tilde T_{kl} = F_{sk}F_{tl}T_{st}\end {split}\]
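As a sketch in the matrix notation introduced earlier, the result can be read as a product of three matrices. Because the first index of F is the one summed with an index of T, a transpose appears,

    \[\tilde T_{kl} = F_{sk}T_{st}F_{tl} = ([F]^{\rm T})_{ks}[T]_{st}[F]_{tl} \implies [\tilde T] = [F]^{\rm T}[T][F]\]

This differs from the rule [\tilde T] = [F]^{-1}[T][F] obtained for a linear map V\to V, although both objects are stored in an n-by-n matrix.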

Example: the metric bilinear map (metric tensor)

Dot/inner product on the vector space V over \mathbb R is defined as a bilinear map \langle \cdot , \cdot \rangle : V\times V \to \mathbb R such that \langle u , v \rangle = \langle v , u \rangle and v\ne 0 \iff \langle v , v \rangle > 0. In this regard, two objects (that can have geometric interpretations for Euclidean spaces) are defined as,

1- Length of a vector \|v\|^2:=\langle v,v\rangle
2- Angle between two vectors \cos \theta :=\langle u/\|u\|,v/\|v\|\rangle

Let us see how the dot product is expressed through the coordinates of vectors. With \{e_i\}_1^n being a basis for V, we can write,

    \[u\cdot v :=\langle u, v \rangle = g(u,v) = u^iv^jg_{ij} \quad \text {s.t}\quad g_{ij}=e_i\cdot e_j\]

The term g_{ij} is called the metric tensor and its components can be presented by an n-by-n matrix as [g]_{ij}=e_i\cdot e_j.

If the basis is an orthonormal basis, i.e. e_i\cdot e_j=\delta_{ij} (mutually orthogonal unit vectors), then g_{ij}=\delta_{ij} and [g] is the identity matrix. Therefore, v\cdot u= u^iv^i and \|v\|^2 = v^iv^i.

Multilinear forms

A general multilinear form is a multilinear map defined as T:V_1\times V_2\times \cdots \times V_n\to \mathbb R, where V_i is a vector space. Particularly setting V_i=V leads to a simpler multilinear form as T:V\times V\times \cdots \times V\to \mathbb R.

Following the same steps as shown for a bilinear map, a multilinear form T:V\times V\times \cdots \times V\to \mathbb R can be written as,

    \[\begin{split} T(u,v,\cdots , z)&=T(u^ie_i,v^je_j, \cdots, z^ke_k)= u^iv^j\cdots z^k T(e_i,e_j, \cdots, e_k)\\&\implies T(u,v,\cdots , z)=u^iv^j\cdots z^kT_{ij\cdots k} \quad \text{with}\quad T_{ij\cdots k}:=T(e_i, e_j, \cdots e_k) \end{split}\]

which implies,

    \[T = T_{ij\cdots k}\varepsilon^i \varepsilon^j\cdots \varepsilon^k\]

showing that a multilinear form can be written as a linear combination of products of basis covectors.

Multilinear maps

A multilinear map is a map T:V_1\times \cdots \times V_n \to W. A multilinear map can be written in terms of vector and covector basis. For example, consider T:V\times V \to W as T(u,v)=w with \{e_i\}_1^n and \{e'_i\}_1^m being bases for V and W. We can write,

    \[ T(u,v)=T(u^ie_i,v^je_j)=u^iv^jT(e_i,e_j)\]

Because for each i and j we have T(e_i,e_j)\in W, we can write,

    \[ T(e_i,e_j)=T_{ij}^ke'_k\]

We write the indices i and j in T_{ij}^k as subscripts in accordance with their position on the LHS; however, we’ll see that T_{ij}^k is a coordinate of a covector for each i or j when k is fixed. Combining above, we can write,

    \[\begin{split}T(u,v)&=u^iv^jT(e_i,e_j)=u^iv^jT_{ij}^ke'_k=\varepsilon^i(u)\varepsilon^j(v)T_{ij}^ke'_k\\&\therefore T = \varepsilon^i\varepsilon^jT_{ij}^ke'_k\end{split}\]

The term T_{ij}^k collects n\times n \times m coefficients and uniquely defines the multilinear map. We can imagine this term as a 3-dimensional array/matrix. The above also shows that the multilinear map can be written as a linear combination of basis covectors and basis vectors.

Definition of a tensor

Defining the following terms,

  • Vector space V and basis \{e_i\}_1^n and another basis \{\tilde e_i\}_1^n.
  • Basis transformation as \tilde e_j=F_{ij}e_{i}, and therefore e_j=B_{ij}\tilde e_{i}.
  • The dual vector space of V as V^*.
  • Vector space V' and basis \{e_i'\}_1^m and another basis \{\tilde e_i'\}_1^m.
  • Basis transformation as \tilde e_j'=F'_{ij}e'_i, and therefore e_j'=B'_{ij}\tilde e'_i
  • Linear map T\in \mathcal L(V,V').
  • Bilinear form \mathfrak B \in \mathcal L(V,V; \mathbb R).

we concluded that,

    \[\begin{split}v=v^ie_i=\tilde v^i\tilde e_i &\implies \tilde v^i = B_{ij}v^j\quad \text{contravariantly}\\\varepsilon^i, \tilde \varepsilon^i \in V^*, \varepsilon^i(e_j)=\tilde \varepsilon^i(\tilde e_j)=\delta_{ij} &\implies \tilde \varepsilon^i=B_{ij}\varepsilon^j \quad \text{contravariantly}\\f=c_i\varepsilon^i=\tilde c_j \tilde \varepsilon^j&\implies \tilde c_j = F_{ij}c_{i}\quad \text{covariantly}\\T= T_j^ie_i'\varepsilon^j = \tilde {T_j^i} \tilde e_i'\tilde \varepsilon^j &\implies \tilde {T_j^i} = B'_{il}F_{kj}T_k^l\quad \text{contravariant- and covariantly}\\\mathfrak B=\mathfrak B_{ij}\varepsilon^i\varepsilon^j = \tilde {\mathfrak B}_{ij} \tilde \varepsilon^i\tilde \varepsilon^j &\implies \tilde {\mathfrak B}_{ij} = F_{ki}F_{lj}{\mathfrak B}_{kl} \quad \text{covariantly and covariantly}\end{split}\]

It is observed that if a vector v\in V is written as a single sum/linear combination of the basis vectors of V, then the components of the vector change contravariantly with respect to a change of basis. Then, the covectors are considered and it is observed that their components change covariantly upon a change of basis of V^* or V. A linear map can be written as a linear combination of vectors and covectors. The coefficients of this combination are seen to change both contra- and covariantly when the bases (of V and V') change. A bilinear form, though, can be written as a linear combination of covectors. The corresponding coefficients change covariantly with a change of basis. These results can be generalized toward an abstract definition of a mathematical object called a tensor. There are two approaches for algebraically defining a tensor.

Tensor as a multilinear form

Motivated by how linear maps, bilinear forms, and multilinear forms and maps can be written by combining basis vectors and covectors, a generalized combination of these vectors can be considered. For example,

    \[\mathcal T:=\mathcal T^{ij}{}_k{}^l{}_s{}^t e_i e_j\varepsilon^k e_l\varepsilon^s e_t\]

This object \mathcal T consists of a linear combination of a unified (merged) set of basis vectors and covectors e_i e_j\varepsilon^k e_l\varepsilon^s e_t (of V and V^*) with scalar coefficients \mathcal T^{ij}{}_k{}^l{}_s{}^t. According to the type of each basis vector, the corresponding index becomes a sub- or superscript, which determines the type of the transformation regarding that index. By reordering the basis vectors and covectors, we can write,

    \[\mathcal T:=\mathcal T_{ks}^{ijlt} \varepsilon^k \varepsilon^s e_i e_j e_l  e_t\]

Recalling that vector components can be written as v^i=\varepsilon^i (v) implies that there is a map T_{(v)} for a particular vector v such that,

    \[v^i=T_{(v)}(\varepsilon^i) \quad\text{s.t}\quad T_{(v)}:V^* \to \mathbb R\]

And also the components of a covector f\in V^* determined as,

    \[f_i=f(e_i)\equiv T_{(f)}(e_i)\]

motivates defining \mathcal T_{ks}^{ijlt} as the array (collection of the coefficients) of a multilinear form,

    \[T:V\times V\times V^*\times V^*\times V^*\times V^*\to \mathbb R \iff \mathcal T_{ks}^{ijlt}\equiv T_{ks}^{ijlt}=T(e_k,e_s,\varepsilon^i,\varepsilon^j,\varepsilon^l,\varepsilon^t)\]

Therefore, the object or the multi-dimensional array \mathcal T_{ks}^{ijlt}, which depends on the chosen bases of V and V^* and whose transformation rules are based on the types of the bases (or indices), can be intrinsically related to an underlying multilinear map T. By virtue of this observation, an object called a tensor is defined as follows.

Definition: A tensor of type (r,s) on a vector space V over a field \mathbb R (or \mathbb C) is a multilinear map as,

    \[\mathcal T: \underbrace{V\times \cdots\times V}_{\text{r times}}\times \underbrace{V^*\times \cdots\times V^*}_{\text{s times}} \to \mathbb R \text{ or } \mathbb C \]

The coordinates or the (scalar) components of a tensor \mathcal T can then be determined once a basis \{e_i\}_1^n for V and a basis \{\varepsilon^i\}_1^n for V^* are fixed. Therefore,

    \[\mathcal T_{i_1\cdots i_r}^{j_1\cdots j_s}\equiv \mathcal T (e_{i_1}, \cdots, e_{i_r}, \varepsilon^{j_1}, \cdots, \varepsilon^{j_s})\]

Note that r is the number of covariant indices and s is the number of contravariant indices. A tensor of type (r,s) can be imagined as an (r+s)-dimensional array of data containing (\text {dim} V)^{r+s} elements. Each index corresponds to a dimension of the data array.

By this definition, a vector v=v^ie_i is a (0,1) tensor as it can be viewed as,

    \[\begin{split}v&=v^ie_i = \varepsilon^i(v)e_i \implies v^i = \varepsilon^i(v)\\\mathcal T^i&:=v^i\implies \exists \mathcal T:V^*\to \mathbb R\quad \text {s.t}\quad \mathcal T(\varepsilon^i) = v^i\end{split}\]

This implies that for each v\in V there is a (multilinear with one input) map \mathcal T receiving a basis covector and returning the corresponding scalar component of the vector (\mathcal T:V^*\to \mathbb R). This corresponding map is unique for each vector and it is called a tensor.

A covector (dual vector or a linear form) f\in V^* is a (1,0) tensor because a covector f is a linear map \mathcal T:V\to \mathbb R.

A linear map T:V\to V (or V\to W) is a (1,1) tensor because the term T_j^i in T=T_j^ie_i\varepsilon^j pertains to a covector’s components for a fixed i and to a vector’s components for a fixed j. Therefore,

    \[T_j^i=\mathcal T_j^i=\mathcal T(e_j, \varepsilon^i)\]

In other words, a linear map is a tensor viewed as a multilinear map \mathcal T: V\times V^* \to \mathbb R. Here, the multilinear form can be taken as \mathcal T(v,f) = f\circ T(v), which gives T_j^i=\mathcal T_j^i=\mathcal T(e_j, \varepsilon^i). Note that T(e_j) returns a vector and f=\varepsilon^i extracts its i-th coordinate, and hence the array/matrix of the linear map is retrieved.
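The construction \mathcal T(v,f)=f\circ T(v) can be sketched in Python. The matrix entries below are hypothetical, vectors are stored by their coordinates, and covectors by their rows; feeding in basis vectors and dual basis covectors returns the matrix entries of T.

```python
T = [[1.0, 2.0],
     [3.0, 4.0]]   # row i, column j holds T_j^i (hypothetical values)
n = 2

def apply_map(v):
    """Coordinates of T(v): w^i = T_j^i v^j."""
    return [sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]

def tensor(v, f):
    """(1,1) tensor built from T: (v, f) -> f(T(v)), with f given by its row of coordinates."""
    w = apply_map(v)
    return sum(f[i] * w[i] for i in range(n))

basis = [[1.0, 0.0], [0.0, 1.0]]        # coordinates of e_1, e_2
dual_basis = [[1.0, 0.0], [0.0, 1.0]]   # rows of epsilon^1, epsilon^2

# tensor(e_j, epsilon^i) picks the i-th coordinate of T(e_j), i.e. T_j^i.
components = [[tensor(basis[j], dual_basis[i]) for j in range(n)] for i in range(n)]
print(components)
```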

A bilinear form \mathfrak B=\mathfrak B_{ij}\varepsilon^i\varepsilon^j is then a (2,0) tensor, where \mathfrak B_{ij}=\mathfrak B(e_i,e_j)

A bilinear map T = T_{ij}^k \varepsilon^i\varepsilon^j e'_k is a (2,1) tensor with T_{ij}^k=\mathcal T_{ij}^k=\mathcal T(e_i,e_j,\varepsilon^k), where \mathcal T:V\times V \times V^* \to \mathbb  R can be considered as \mathcal T (u,v,f)= f\circ T(u,v).

As an example, the cross product of two vectors u,v\in \mathbb R^3, defined as w=u\times v, is a bilinear map \mathbb R^3 \times \mathbb R^3 \to \mathbb R^3 and hence a (2,1) tensor.
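A minimal sketch of the cross product as a component array follows. It assumes a right-handed orthonormal basis of \mathbb R^3, in which the components T_{ij}^k coincide with the permutation (Levi-Civita) symbol; the helper names are illustrative.

```python
def levi_civita(i, j, k):
    """Permutation symbol for indices in {0, 1, 2}: +1, -1, or 0."""
    return (i - j) * (j - k) * (k - i) // 2

n = 3
# Component array T_{ij}^k of the cross product, stored as T[i][j][k].
T = [[[levi_civita(i, j, k) for k in range(n)] for j in range(n)] for i in range(n)]

def cross_from_tensor(u, v):
    """w^k = u^i v^j T_{ij}^k, summed over i and j."""
    return [sum(u[i] * v[j] * T[i][j][k] for i in range(n) for j in range(n))
            for k in range(n)]

def cross_direct(u, v):
    """Textbook component formula for the cross product, used as a cross-check."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

u = [1, 2, 3]
v = [4, 5, 6]
w = cross_from_tensor(u, v)
print(w, cross_direct(u, v))
```

The array T has 3\times 3\times 3 = 27 entries, of which only six are non-zero.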

By convention, scalars are (0,0) tensors.

Remark: for a tensor \mathcal T_{i_1\cdots i_r}^{j_1\cdots j_s} we can write,

    \[\mathcal T = \mathcal T_{i_1\cdots i_r}^{j_1\cdots j_s} \varepsilon^{i_1}\cdots \varepsilon^{i_r} e_{j_1}\cdots e_{j_s} = \mathcal T (e_{i_1}, \cdots, e_{i_r}, \varepsilon^{j_1}, \cdots, \varepsilon^{j_s})\varepsilon^{i_1}\cdots \varepsilon^{i_r} e_{j_1}\cdots e_{j_s}\]

Example: Stress tensor. The Cauchy stress tensor in mechanics is a linear map and hence a (1,1) tensor.

Rank of a tensor

The rank of an (r,s)-type tensor is defined as r+s. In this regard, tensors of different types can have the same rank. For example, tensors of types (1,1), (2,0), (0,2) have the same rank, being 2. Here we compare these tensors with each other.

A (1,1) tensor representing a linear map is \mathcal T_j^i where \mathcal T_j^i=\mathcal T(e_j, \varepsilon^i) with \mathcal T:V\times V^* \to \mathbb R.

A (2,0) tensor representing a bilinear form is \mathcal T_{ij} where \mathcal T_{ij}=\mathcal T(e_i, e_j) with \mathcal T:V\times V \to \mathbb R.

A (0,2) tensor is \mathcal T^{ij} where \mathcal T^{ij}=\mathcal T(\varepsilon^i, \varepsilon^j) with \mathcal T:V^*\times V^* \to \mathbb R.

The coefficients of each of the above tensors are collected in a 2-dimensional array/matrix; however, they follow different transformation rules based on their types.
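To make the difference concrete, the sketch below starts from one hypothetical 2-by-2 array and transforms it under each of the three rules. The (1,1) and (2,0) rules are the ones derived earlier; the (0,2) rule \tilde T^{ij}=B_{ik}B_{jl}T^{kl} is an assumption here, obtained by applying the backward transform to each contravariant index.

```python
def mat_mul(a, b):
    """Matrix product of nested lists: (a b)_{ij} = a_{ik} b_{kj}."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix stored as nested lists."""
    return [list(row) for row in zip(*a)]

F = [[2.0, 1.0],
     [0.0, 1.0]]    # forward transform (hypothetical values)
B = [[0.5, -0.5],
     [0.0, 1.0]]    # backward transform, inverse of F
M = [[1.0, 2.0],
     [3.0, 4.0]]    # the same stored array, read as three different tensor types

# (1,1): T~_j^i = B_{ik} F_{sj} T_s^k      ->  [B][M][F]
type_11 = mat_mul(B, mat_mul(M, F))
# (2,0): T~_{kl} = F_{sk} F_{tl} T_{st}    ->  [F]^T [M] [F]
type_20 = mat_mul(transpose(F), mat_mul(M, F))
# (0,2): T~^{ij} = B_{ik} B_{jl} T^{kl}    ->  [B][M][B]^T (assumed rule, see lead-in)
type_02 = mat_mul(B, mat_mul(M, transpose(B)))

print(type_11, type_20, type_02)
```

The three results differ, which is why the type (r,s), and not only the rank, has to be recorded with the array.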

Math interactive tools

Eigenvalues and eigenvectors of a 3-by-3 matrix.