Abstract
The assignment flow is a smooth dynamical system that evolves on an elementary statistical manifold and performs contextual data labeling on a graph. We derive and introduce the linear assignment flow, which still evolves nonlinearly on the manifold but is governed by a linear ODE on the tangent space. Various numerical schemes adapted to the mathematical structure of these two models are designed and studied for the geometric numerical integration of both flows: embedded Runge–Kutta–Munthe–Kaas schemes for the nonlinear flow, and adaptive Runge–Kutta schemes as well as exponential integrators for the linear flow. All algorithms are parameter-free, except for setting a tolerance value that controls adaptive step size selection by monitoring the local integration error, or for fixing the dimension of the Krylov subspace approximation. These algorithms provide a basis for applying the assignment flow to machine learning scenarios beyond supervised labeling, including unsupervised labeling and learning from controlled assignment flows.
1. Introduction
1.1. Overview, motivation
The assignment flow, recently introduced by [1] and detailed in section 2, denotes a smooth dynamical system evolving on an elementary statistical manifold, for the contextual classification of a finite set of data given on an arbitrary graph. Vertices of the graph are associated with the elements of the data set and, in typical applications, correspond to locations in space and/or time. Classification means to assign to each datum exactly one class representative, called a label, out of a finite set of predetermined labels. Contextual classification means that these decisions directly depend on each other, as encoded by the edges (adjacency relation) of the underlying graph. In the context of image analysis, classifying given image data on an image grid graph in this way is called the image labeling problem. We point out, however, that the assignment flow applies to arbitrary data represented on a graph.
A key property of the assignment flow is that decision variables do not live in the space used to model the data. Rather, a probability simplex is associated with each datum, on which a flow evolves until it converges to one of the vertices of the simplex that encode the labels. Each simplex is equipped with the Fisher–Rao metric, which turns the relative interior of the simplex into a smooth Riemannian manifold. It is this particular geometry that effectively promotes discrete decisions that interact in a smooth way. Replacing in addition the Riemannian (Levi-Civita) connection by the α-connection (with α = 1) introduced by Amari and Chentsov [2] makes it possible to carry out basic geometric operations in a computationally efficient way. Keeping the assignment flow as 'inference engine' separate from the data space and model allows it to be applied flexibly to a broad range of contextual data classification problems. We refer to [2, 3] as basic texts on information geometry and to [4] for more information on the image labeling problem.
From a more distant viewpoint, our work ties in with the recent trend to explore the mathematics of deep networks from a dynamical systems perspective [5]. A frequently cited paper in this respect is [6] where a connection was made between the so-called residual architecture of networks and explicit Euler integration steps of a corresponding system of nonlinear ordinary differential equations (ODEs). We refer to [7] for a good exposition. While this offers a novel and fresh perspective on the learning problem of network parameters, it does not alter the basic ingredients of such networks that apparently have been adopted in an ad-hoc way, like parametrized static layers connected by nonlinear transition functions, ReLU activations etc.
By contrast, the assignment flow provides a smooth dynamical system on a graph (network), where all ingredients coherently fit into the overall mathematical framework. Based on this, we recently showed how discrete graphical models for image labeling can be evaluated using the assignment flow [8], and how unsupervised labeling can be modeled by coupling the assignment flow and Riemannian gradient flows for label evolution on feature manifolds [9]. Our current work, to be reported elsewhere, studies machine learning problems based on controlling the assignment flow. Here, in particular, algorithms that accurately integrate the assignment flow numerically on the manifold where it evolves play a decisive role. A thorough study of such algorithms is the subject of the present paper.
1.2. Contribution, organization
This paper presents two interrelated contributions, as illustrated by figure 1.
- (1)We derive from the assignment flow—henceforth called the nonlinear assignment flow—the linear assignment flow, which is still nonlinear but governed by a linear ODE on the tangent space. This property is attractive for modeling tasks (e.g. parameter estimation and control) as well as for the design of numerical algorithms. In particular, our experiments show that the linear flow closely approximates the nonlinear flow, as far as the final labeling results are concerned.
- (2)We study a range of algorithms for numerically integrating both the nonlinear and the linear assignment flow, respectively, while properly taking into account the underlying geometry.
- (a)Regarding the nonlinear assignment flow, we adopt the machinery of Lie group methods for the numerical integration of ODEs on manifolds [10] and devise corresponding extensions of classical Runge–Kutta (RK) schemes, called RKMK schemes (Runge–Kutta–Munthe–Kaas) in the literature [11]. We combine pairs of these extensions to form embedded RKMK schemes for adaptive step size control, analogous to classical embedded RK schemes [12].
- (b)Regarding the linear assignment flow, we take advantage in two alternative ways of the linearity of the flow on the tangent space.
- (i)On the one hand, we derive a local error estimate in order to apply classical RK schemes [12] on the tangent space, with step sizes that adapt automatically.
- (ii)On the other hand, we evaluate the linear flow directly by an exponential integrator, based on a Krylov subspace approximation of the matrix exponential (section 4.2).
All these explicit numerical schemes are evaluated and discussed in section 6, using 'ground truth' flows as a baseline, computed using the implicit geometric Euler scheme with a sufficiently small step size. All iterative algorithms are parameter-free, except for specifying a single tolerance value for the local error, which governs adaptive step size selection. Our experiments indicate a value for this parameter that 'works' regarding integration accuracy and labeling quality, but is not too conservative (i.e. too small). In the case of the exponential integrator, we merely have to supply the final point of time T at which the linear assignment flow should be evaluated, in addition to the dimension of the Krylov subspace, which controls the quality of the approximation. We conclude with a synopsis of our results in section 7.
1.3. Basic notation
Index sets I and J index vertices of the underlying graph and labels , respectively. and denote the basic statistical manifolds that we work with, defined in section 2. Points are strictly positive probability vectors, and we denote efficiently by
componentwise multiplication for general vectors, and componentwise division only if . Likewise, functions like the exponential function and the logarithm with vectors as arguments apply componentwise,
and denote exponential mappings defined in section 2, whereas denotes the matrix exponential in section 4.2. The ordinary exponential function defined on the real line is always denoted by .
denotes the constant 1-vector with appropriate number of components depending on the context. We use the common shorthand with . denotes the -norm and the -norm if .
2. The assignment flow
We summarize the assignment flow introduced by [1] and related concepts required in this paper. Let be a given graph and let
be data given in a metric space
We call these image data even though the f_i typically represent features extracted from the raw image at pixel i as a preprocessing step. G may be a grid graph (with self-loops), as in low-level image processing, or a less structured graph with arbitrary connectivity in terms of the neighborhoods
We associate with each neighborhood weights satisfying
These weights parametrize the regularization property of the assignment flow and are assumed to be given. How to learn them from data in order to control the assignment flow will be reported elsewhere.
Along with we assume prototypical data
to be given, henceforth called labels. Each label g_j represents the data of class j. Image labeling denotes the problem of assigning class labels to image data depending on the local context encoded by the graph G. We refer to [8] for more details and background on the image labeling problem.
Assignments of labels to data are represented by discrete probability distributions
where
denotes the relatively open probability simplex equipped with the Fisher–Rao metric
which turns into a Riemannian manifold. In connection with , we define the
of , i.e. the uniform distribution, the orthogonal projection
and the linear replicator map
satisfying
Adopting the -connection with from information geometry as introduced by Amari and Chentsov [2, section 2.3], [3], the exponential map based on the corresponding affine geodesics reads
with inverse [1, appendix]
Specifically, we define
with inverse [1, appendix]
Remark 2.1. Calling (2.12b) the 'inverse' map is justified by the fact that does not depend on any constant component of the argument vector z. Yet, we choose as domain because will be applied to arbitrary distance vectors (see (2.17)) arising from given data, and the notation indicates that the implementation does not need to remove this component explicitly [1, remark 4].
These mappings naturally extend to the collections of assignment vectors (2.5), regarded as points on the
with tangent space
and the corresponding mappings
and similarly defined based on (2.11b), (2.12a) and (2.12b). Finally, we define the geometric mean of assignment vectors [1, lemma 5]
where is the componentwise exponentiation of W_k with w_ik.
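To make these geometric operations concrete, here is a minimal numerical sketch (plain NumPy; the function names are ours) of the replicator map, the lifting map and the weighted geometric mean in the closed forms stated in [1]:

```python
import numpy as np

def replicator(p, v):
    # replicator map: R_p v = Diag(p) v - <p, v> p, mapping v onto the
    # tangent space at p (components of the result sum to zero)
    return p * v - np.dot(p, v) * p

def exp_map(p, z):
    # lifting map exp_p(z) = p * e^z / <p, e^z> (closed form from [1]);
    # shifting z by its maximum cancels after normalization, avoiding overflow
    q = p * np.exp(z - np.max(z))
    return q / np.sum(q)

def geometric_mean(W, w):
    # weighted geometric mean of the rows W[k] with weights w[k], i.e. the
    # normalized componentwise exponential of sum_k w[k] * log W[k]
    g = np.exp(w @ np.log(W))
    return g / np.sum(g)
```

Note that all three maps keep their results on (or tangent to) the simplex by construction, which is exactly what the flow exploits.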
Using this setting, the assignment flow accomplishes image labeling as follows. Based on (2.1), (2.4), distance vectors
are defined and mapped to
where is a user parameter to normalize the distances induced by the specific features f i at hand. This representation of the data is regularized by local geometric smoothing to obtain
which in turn evolves the assignment vectors through the
Methods for numerically integrating this flow are examined in the following sections.
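As an illustration of how the pieces fit together, the following hedged sketch (uniform weights, plain NumPy; the names and data layout are our choices) assembles the right-hand side of the assignment flow from distance vectors: likelihoods via the lifting map, similarity vectors via the geometric mean over each neighborhood, and finally the replicator map:

```python
import numpy as np

def assignment_flow_rhs(W, D, neighbors, rho=1.0):
    # W: (num_nodes, num_labels) assignment vectors (rows on the simplex)
    # D: (num_nodes, num_labels) distance vectors; neighbors: index arrays
    def normalize(Q):
        return Q / Q.sum(axis=1, keepdims=True)
    # likelihoods: lifting map applied to the scaled negative distances
    L = normalize(W * np.exp(-D / rho))
    # similarity: geometric mean of likelihoods over each neighborhood
    # (uniform weights w_ik = 1/|N_i| assumed here)
    S = normalize(np.stack([np.exp(np.mean(np.log(L[nb]), axis=0))
                            for nb in neighbors]))
    # replicator map row-wise: dW_i/dt = W_i * S_i - <W_i, S_i> W_i
    return W * S - np.sum(W * S, axis=1, keepdims=True) * W
```

For uniform assignments and zero distances the field vanishes, i.e. the barycenter is stationary in the absence of data, while nonuniform distances drive each row toward the closest label.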
3. Geometric Runge–Kutta integration
We apply the general approach of [11] to our problem. For background and more details, we refer to [10] and [16, chapter 4].
3.1. General approach
In order to apply Lie group methods to the integration of an ODE on a manifold , one has to check first if the ODE can be represented properly. Let
denote the action of a Lie group on satisfying
Furthermore, let denote the Lie algebra of , the set of all smooth vector fields on ,
a smooth function and the induced map defined by
Then is a Lie algebra action if the induced map is a Lie algebra homomorphism, i.e. is linear and satisfies , with the Lie brackets on and on the left-hand side and the right-hand side, respectively. In particular, based on a Lie group action , a Lie algebra action is given by [11, lemma 4]
where denotes the exponential map of . Thus, for this choice of , the induced map (3.3) is given by [11, theorem 5]
Now, given an ODE on , the basic assumption underlying the application of Lie group methods is the existence of a function such that the ODE admits the representation
For sufficiently small t, the solution of (3.6) then can be parametrized as
where satisfies the ODE
with the inverse differential of evaluated at . A major advantage of the representation (3.7) is that the task of numerical integration concerns the ODE (3.7b) evolving on the vector space , rather than the original ODE evolving on the manifold . As a consequence, established methods can be applied to (3.7b), possibly after approximating by a truncated series in a computationally feasible form.
3.2. Application to the assignment flow
Assume an ODE on defined by (2.6) is given. The application of the approach of section 3.1 is considerably simplified by identifying with the flat tangent space (2.7) and consequently also . One easily verifies that the action defined as
with the right-hand side given by (2.12a), satisfies (3.1), i.e.
Proposition 3.1. The solution to assignment flow (2.20) emanating from any W0 = W(0) admits the representation
where solves
Proof. Since geodesics through in directions have the form , the differential of the exponential map of , , is the identity and thus (3.5) gives
with defined by (2.9b). As a result, the basic assumption (3.6) concerns ODEs on that admit the representation
for some function and some . Since by (3.4), the parametrization (3.7) reads
where solves
This setting extends to the assignment flow by defining (see (2.15)) and as
The basic assumption (3.6) then reads
which is the assignment flow (2.20). Due to (3.6), for any W0 = W(0), it admits the representation
where solves
which is (3.10). □
Remark 3.2. While the basic formulation (2.20) of the assignment flow is autonomous, we keep in what follows the explicit time dependency of the function of the parametrization (3.10), because in more advanced scenarios the flow may become non-autonomous. A basic example concerns unsupervised problems [9] where labels vary, and hence the distance vectors (2.17) and in turn the vector field defining the assignment flow depend on t.
Using the above representation and taking into account the simplifications of the general approach of section 3.1, the RKMK algorithm [11] for integrating the assignment flow from t = 0 to t = h is specified as follows. Let be the coefficients of an s-stage, qth order classical RK method satisfying the consistency condition [12, chapter II]. Starting from any point
the algorithm amounts to compute the vector fields
and the update
Replacing , computing the update and iterating this procedure generates the sequence which approximates .
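Under the simplification of section 3.2 (the relevant dexp-inverse is the identity), one explicit RKMK step reduces to classical RK stages on the tangent space with frozen base point, mapped back through the lifting map. A minimal sketch (plain NumPy; the tableau format and names are ours):

```python
import numpy as np

def lift(W, V):
    # row-wise lifting map: exp_{W_i}(V_i) = W_i e^{V_i} / <W_i, e^{V_i}>
    Q = W * np.exp(V - V.max(axis=1, keepdims=True))
    return Q / Q.sum(axis=1, keepdims=True)

def rkmk_step(W, F, h, A, b):
    # one explicit s-stage RKMK step (3.17): each stage evaluates the vector
    # field at a point lifted from the frozen base point W; the combined
    # tangent update is mapped back to the manifold at the end
    K = []
    for i in range(len(b)):
        V = h * sum(A[i][j] * K[j] for j in range(i)) if i else np.zeros_like(W)
        K.append(F(lift(W, V)))
    return lift(W, h * sum(bi * Ki for bi, Ki in zip(b, K)))
```

For example, A = [[0]], b = [1] gives the geometric explicit Euler scheme, while A = [[0, 0], [1, 0]], b = [0.5, 0.5] gives a second-order (Heun-type) RKMK scheme.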
An s-stage RKMK scheme is specified using the corresponding Butcher tableau of the form
Specifically, we consider the following explicit RKMK schemes of order 1, 2, 3, 4:
Note that the approximation order is raised by increasing the number of stages. This comes at a price, however, because each stage evaluates at step (3.17c) the right-hand side of (3.10b), which is the most expensive operation. As a consequence, it is not clear a priori whether a multi-stage scheme with a larger step size h is superior to a simpler scheme that is evaluated more frequently using a smaller step size.
In addition to the above explicit schemes, we consider the simplest implicit RKMK scheme
Implicit schemes are known to be stable for much larger step sizes. Yet, they require solving a fixed-point equation at every step, which is done by an iterative inner loop.
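A hedged sketch of this inner loop (plain NumPy; the warm start and tolerances are our choices, not the paper's): the implicit geometric Euler step solves V = h F(exp_W(V)) by fixed-point iteration before mapping back to the manifold:

```python
import numpy as np

def lift(W, V):
    # row-wise lifting map exp_{W_i}(V_i) = W_i e^{V_i} / <W_i, e^{V_i}>
    Q = W * np.exp(V - V.max(axis=1, keepdims=True))
    return Q / Q.sum(axis=1, keepdims=True)

def implicit_euler_step(W, F, h, tol=1e-12, max_iter=100):
    # solve the fixed-point equation V = h * F(lift(W, V)) by iteration
    # (contractive for sufficiently small h), then return the lifted point
    V = h * F(W)                      # warm start from the explicit step
    for _ in range(max_iter):
        V_new = h * F(lift(W, V))
        if np.linalg.norm(V_new - V) < tol:
            V = V_new
            break
        V = V_new
    return lift(W, V)
```

The contraction factor of the inner iteration scales with h, so moderate step sizes keep the inner loop short.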
The performances of these numerical schemes are examined in section 6.
4. Linear assignment flow, exponential integrator
The ODE (3.10b) which parametrizes the assignment flow together with (3.10a), evolves on a linear space but is a nonlinear system of ordinary differential equations. In this section, we provide an approximate representation of the assignment flow within time intervals through a linear ODE evolving on the tangent space (section 4.1), and a corresponding numerical scheme (section 4.2).
The resulting flow on the assignment manifold is still nonlinear, though. The basic idea is to capture locally a major part of the nonlinearity of the (full) assignment flow by a linear ODE on the tangent space, which makes it possible to apply alternative integration schemes.
4.1. Linear assignment flow
Our ansatz has two ingredients. Firstly, we adopt the parametrization
of the solution to the assignment flow by a trajectory in the tangent space , similar to (3.10a), except for using the 'true' exponential map (2.11a) and (2.15d), respectively, corresponding to the underlying affine connection. Secondly, we use an affine approximation of the vector field on the right-hand side of (2.20), that defines the assignment flow. The following corresponding definition generalizes the flow studied by [17] from the barycenter to arbitrary base points W0, and from a flow on to a flow on .
Definition 4.1 (linear assignment flow). We call linear assignment flow every flow induced by an ODE of the form
with a fixed vector s0 and a fixed matrix S0, for arbitrary W0.
An important property of the flow (4.2)—which explains its name—is the possibility to parametrize it by a linear ODE evolving on the tangent space .
Proposition 4.2. The linear assignment flow (4.2) admits the representation
where solves
Proof. Parametrization (4.1) yields
and by differentiation
Solving (4.2) for after inserting (2.15c), and substitution in the preceding equation gives
and since by (2.9b)
The initial condition follows from . □
Remark 4.3. Note that, despite the linearity of (4.3b), the resulting flow (4.3a) solving (4.2) is nonlinear. Thus, one may hope to capture the major nonlinearity of the full assignment flow (2.20) by a linear ODE on the tangent space, at least locally in some time interval. Within this interval, the evaluation of (2.19) is not required, and the linearity of the tangent space ODE (4.3b) can be exploited for integration.
We conclude this section by computing the natural choice
of the parameters of the linear assignment flow (4.2) in explicit form, where s0 is immediate due to (2.19), but the Jacobian of , evaluated at W0, is not.
Proposition 4.4. Let denote the global similarity vector obtained by stacking the local similarity vectors of (2.19). Then, with
and the replicator map defined by (2.9b), the Jacobian of at is given by
where the action of each block matrix has the form
and the non-zero entries if (using (4.7a))
The proof follows below after two preparatory lemmata.
Lemma 4.5. Let . Then the differential of at applied to is given by
Moreover, we have
Proof. Let be a smooth curve in T0 with and . Using (2.12a), we compute
which is (4.8). As for (4.9), using the representation
where the last equation takes into account remark 2.1, we obtain
□
We use this lemma to represent the similarity vectors in a convenient form for subsequently proving proposition 4.4.
Lemma 4.6. The similarity vectors (2.19) admit the representation
Proof. By (2.19) and (2.16), we obtain
using again for all (see remark 2.1)
□
Proof of proposition 4.4. Setting
we compute using a smooth curve in with and ,
Thus, using (4.15) and (4.8) gives
and using the linearity of the map and (4.16),
which proves (4.7c). Inserting due to (2.9b) yields (4.7d).
The following section specifies an alternative integration scheme for the linear assignment flow (4.2). Its approximation properties are numerically examined in section 6.
4.2. Exponential integrator
We focus on the linear ODE (4.3b) that together with (4.3a) determines the linear assignment flow due to (4.2). The solution to (4.3b) is given by Duhamel's formula [18],
which involves the matrix exponential of the matrix A, whose number of rows and columns equals the number of pixels times the number of labels and hence can be quite large in image labeling problems (10^4–10^7 variables). Explicitly computing the matrix exponential is neither feasible, because it is dense even if A is sparse, nor required in view of the multiplication with the vector a. Rather, taking into account and that uniformly converging series can be integrated term by term, we set t = T large enough and evaluate
where is the entire function
whose series representation yields valid expressions (4.19c) even if the matrix A is singular.
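For reference, a scalar sketch of this entire function (assuming, as is standard in the exponential-integrator literature, φ(z) = (e^z − 1)/z with φ(0) = 1):

```python
import math

def phi1(z):
    # phi(z) = (e^z - 1)/z, extended by phi(0) = 1; the Taylor series
    # sum_{k>=0} z^k / (k+1)!  is used near 0 to avoid cancellation
    if abs(z) < 1e-4:
        return 1.0 + z / 2.0 + z**2 / 6.0 + z**3 / 24.0
    return math.expm1(z) / z
```

The same series underlies the matrix-valued expressions, which is why singular A poses no difficulty.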
We refer to [19] for a detailed exposition of matrix functions and to [20] and [19, sections 10 and 13] for a survey of methods for computing the matrix exponential and the product of matrix functions times a vector. For large problem sizes, the established methods of the two latter references are known to deteriorate, however, and methods based on Krylov subspaces have been developed [13, 14] and become the method of choice in connection with exponential integrators [15].
We confine ourselves to sketching below a state-of-the-art method [21] for the approximate numerical evaluation of (4.19). The evaluation of its performance for integrating the linear assignment flow, and a comparison to the methods of section 5.1, are reported in section 6. A more comprehensive evaluation of further recent methods for evaluating (4.19) that also cope with large problem sizes (e.g. [22]) is beyond the scope of this paper.
In order to compute approximately , one considers the Krylov subspace
with orthogonal basis arranged as column vectors of an orthogonal matrix and computed using the basic Arnoldi iteration [13]. The action of A is approximated by
which in turn yields the approximation
where denotes the first unit vector and the last equality is implied by the Arnoldi iteration producing , which sets . Note that merely has to be applied to the much smaller matrix , which can be safely and efficiently computed using standard methods [19, 20]. The vector can be recovered [23, theorem 1] in form of the upper m entries of the last column of with the extended matrix
If the degree of the minimal polynomial of a with respect to A (i.e. the nonzero monic polynomial p of lowest degree such that p(A)a = 0) is equal to m, then the approximation (4.23) is even exact [13, theorem 3.6].
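The steps above can be sketched as follows (plain NumPy; the small matrix exponential is computed here by simple scaling and squaring with a truncated Taylor series, standing in for the standard methods of [19, 20]):

```python
import numpy as np

def expm_small(H, terms=20):
    # scaling and squaring with a truncated Taylor series; adequate for the
    # small (m x m) Hessenberg matrices produced by the Arnoldi iteration
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(H, 1), 1.0)))) + 1)
    A = H / 2.0**s
    E, T = np.eye(H.shape[0]), np.eye(H.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    for _ in range(s):
        E = E @ E
    return E

def arnoldi_expm_v(A, b, m):
    # Krylov approximation  exp(A) b  ~  beta * V_m expm(H_m) e_1  [13]
    n = b.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:           # happy breakdown: exact subspace
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    return beta * V[:, :m] @ expm_small(H[:m, :m])[:, 0]
```

Only matrix-vector products with A are needed, which is what makes the approach attractive for large sparse problems.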
5. Step sizes, adaptivity
We specify in this section the step size selection for the numerical RKMK schemes of section 3. In addition, for the linear assignment flow (section 4.1), we conduct a local error analysis in section 5.1 for RK schemes, based on the linearity of the tangent space ODE that governs this flow. A corresponding explicit error estimate makes it possible to determine a sequence of step sizes that ensures a prespecified local accuracy at each step k.
In order to determine step sizes for the nonlinear assignment flow (2.20), we proceed differently, because the corresponding vector field depends nonlinearly on the current iterate, and estimating local Lipschitz constants would be expensive and less sharp. We therefore adapt in section 5.2 classical methods for local error estimation and step size selection for nonlinear ODEs, based on embedded Runge–Kutta methods [12, section II.4], to the geometric RKMK methods of section 3.
The experimental evaluation of both approaches is reported in section 6.
5.1. Linear assignment flow
We focus on the linear ODE (4.3b) that together with (4.3a) determines the linear assignment flow (4.2). Due to its approximation property demonstrated in section 6.3.1, we only consider the linearization point . Since the ODE (4.3b) evolves on the linear space , we apply the classical s-stage explicit RK scheme, rather than the geometric s-stage RKMK scheme (3.17), to obtain
Specifically, regarding the explicit schemes listed at the end of section 3.2 in terms of their Butcher tableaus, consecutively inserting (5.1a) into (5.1b) yields with the shorthands defined by (4.18),
Comparison with (4.18) shows that due to the linearity of the ODE, each scheme results in a corresponding Taylor series approximation, depending on its order q, of the equation
that is,
with matrix-valued polynomials . Our strategy for choosing the step size h is based on the local error estimate specified below as theorem 5.2 and prepared by the following lemma.
Lemma 5.1 ([24, equation (8.4.8)]). with the incomplete Gamma function
Theorem 5.2. Let solve (4.2) with , and let be a sequence generated by a RK scheme (5.1) of order q. Set in (5.3). Then in (5.3) is the exact value of the linear assignment flow emanating from , and regarding (5.4) the local error estimate
holds, where is given by (5.6) and denotes the spectral norm of the matrix .
Proof. Using (5.3) and (5.4) and , we bound the local error by
and inserting the series (5.4a) gives
Both series absolutely converge for any h. By Lemma 5.1, we have
Applying these equations to (5.8b) yields
and substitution into (5.8)
which is (5.7a). To show (5.7b), we use the representation
and the lower bound [25, corollary of theorem 1]
that holds for all x > 0 and , with the Gamma function
Put
Since and , we set in view of (5.13). Furthermore, we have
and using , since q is integer,
Thus,
which after substituting (5.15) and in turn into (5.11), proves (5.7b). □
Theorem 5.2 makes it possible to control the local integration error through the choice of the step size h, using the simple form of the bound (5.7), which depends on the constants and the norm of the current iterate. Specifically, we choose
by (5.7), for some prespecified value . Inspecting the parametrization (4.3) shows that grows—and hence the step sizes (5.18) decrease—until is close enough to a vertex of (which represents a labeling) and satisfies a termination criterion that stops the chosen iterative RK scheme (5.1).
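For illustration only (the exact constants of (5.7) differ; this assumes a Taylor-remainder-type bound of the form ||A||^(q+1) h^(q+1) ||v|| / (q+1)!, which has the same leading behavior in h), the rule (5.18) amounts to solving the bound for the largest admissible step size:

```python
import math

def choose_step(tol, q, norm_A, norm_v):
    # largest h with  (norm_A * h)**(q + 1) / (q + 1)! * norm_v <= tol;
    # the constants here are illustrative, not those of (5.7)
    c = norm_v / math.factorial(q + 1)
    return (tol / c) ** (1.0 / (q + 1)) / norm_A
```

Consistent with figure 2, a higher order q admits a larger step size for the same tolerance, while a growing norm of the iterate shrinks it.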
In order to check how large then will be, assume
that is and if . Then with and by (4.3) and (2.12a)
and hence
Thus, as soon as the norm has grown to the order , a termination criterion that checks if is -close to some vertex of , will be satisfied.
Figure 2 quantitatively illustrates how much the factor of the upper bound (5.7) overestimates the exact factor (5.11) computed in the proof of theorem 5.2, and hence how conservatively (i.e. how small) the step size h_k will be chosen to achieve (5.18). The curves of figure 2 show that the estimate (5.7) is fairly tight and suited to adapting the step size. Furthermore, comparing the ordinate values of both panels for q = 1 and q = 4 shows that, in order to achieve a fixed accuracy in (5.18), using a higher-order RK scheme (5.1) permits a larger step size.
5.2. Nonlinear assignment flow
Similar to the preceding section, we wish to select step sizes in order to control the local error on the left-hand side of (5.7). Because an estimate like the right-hand side of (5.7), valid at each step k, is not available for the nonlinear assignment flow, we adapt established embedded RK methods [12, section II.5] to the geometric RKMK schemes (3.17).
The basic strategy is to evaluate twice step (3.17d)
using a second collection of coefficients , but with the same vector fields . Thus each embedded method can be specified by augmenting the corresponding Butcher tableau accordingly,
Proper embedded methods combine a pair of RK schemes of different order, so that indicates whether the step size h is small enough at each step k of the overall iteration. Since the vectors are used twice, this comes at little additional cost. We also point out that, unlike in the linear case, the magnitude of tangent vectors has much less influence, because the scheme (3.17) that is consecutively applied at each step k is based on (3.10b) with the initial condition . As a consequence, the magnitude of the update (3.17d) will be relatively small at each step k.
We list the tableaus of two embedded methods that we evaluate in section 6.
The first method combines the forward Euler scheme and Heun's method of order 2. The second method complements Heun's method of order 3 as specified on [12, p 166]. We call RKMK-1/2 and RKMK-3/2 the geometric versions of these schemes when they are applied in connection with (3.17).
We conclude this section by specifying the extension of (3.17) in order to include adaptive step size control.
The distance function d_I is defined by (6.1). Typical parameter values are . Starting with a small initial step size h_0, the algorithm adaptively generates a sequence (h_k) whose values increase whenever the local error estimate is much smaller (by a factor ) than the prescribed tolerance .
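The controller just described can be sketched as the classical rule from [12, section II.4] (the numerical values of the safety factor and the clipping bounds below are typical defaults, not the ones prescribed in the paper):

```python
def adapt_step(h, err, tol, q, safety=0.9, fac_min=0.2, fac_max=5.0):
    # rescale h by (tol/err)^(1/(q+1)), damped by a safety factor and
    # clipped so that consecutive step sizes change only gradually
    if err <= 0.0:
        return h * fac_max
    fac = safety * (tol / err) ** (1.0 / (q + 1))
    return h * min(fac_max, max(fac_min, fac))
```

When the local error estimate is far below the tolerance the step size grows (up to the clipping factor), and it shrinks when the tolerance is exceeded.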
6. Experiments and discussion
This section is organized as follows (see also figure 1). We specify details of our implementation in section 6.1. Section 6.2 reports the evaluation of the geometric RKMK schemes (3.17) with embedded step size control (algorithm 1), for integrating the nonlinear assignment flow. Section 6.3 is devoted to the linear assignment flow: assessment of how closely it approximates the nonlinear assignment flow, evaluation of the RK schemes (5.1) with adaptive step size selection (5.18), and evaluation of the exponential integrator introduced in section 4.2.
Algorithm 1. Embedded RKMK with adaptive step size control.
6.1. Implementation details
All algorithms were implemented and evaluated using Mathematica. We did not apply any assignment normalization as suggested by [1, section 3.3.1], since Mathematica can work with arbitrary numerical precision. Our experiments thus illustrate intrinsic properties of the assignment flow that hold on the open assignment manifold , and the reliability and efficiency of a variety of algorithms for integrating this flow numerically while respecting the underlying geometry. We do not examine the flow on the boundary of , which does not belong to the domain of our approach, by definition. For an investigation of a particular numerical scheme in this specific context, we refer to [26].
Throughout the experiments, we used uniform weights in (2.19) and (2.16), respectively, since how to choose these 'control variables' properly, depending on the application at hand, is the subject of our current work. Yet, we point out that the algorithms of the present paper do cover such more general scenarios. For example, the simplest geometric RKMK scheme (3.17) was recently used for integrating the assignment flow in unsupervised scenarios [9], where labels evolve and hence the distance vectors (2.17) are no longer static but vary with time, too.
6.1.1. Ground truth flows.
In order to obtain a baseline for assessing the performance of linearizations, of approximate numerical integration by various schemes, or both, we always solved the assignment flow (nonlinear or linear) with high numerical accuracy using the geometric implicit Euler scheme (nonlinear flow) or the Euclidean implicit Euler scheme (linear flow), with a sufficiently small step size h. This requires solving a fixed-point equation as part of every iteration k, involving the nonlinear mapping on the right-hand side of (3.10b) in the case of the nonlinear assignment flow, or the linear mapping on the right-hand side of (4.3b) in the case of the linear assignment flow. These fixed-point equations were solved iteratively as well, and the corresponding iterations were terminated when subsequent elements of the corresponding subsequences that measure the residual of the fixed-point equation satisfied
Starting these inner iterative loops with and terminating with , we set and continued with the outer iteration k + 1.
6.1.2. Termination criterion.
As suggested by [1], all iterative numerical schemes generating sequences (W(k)) were terminated when the average entropy of the assignment vectors dropped below the threshold
If this happens, then (possibly up to a tiny subset of pixels ) all assignment vectors W_i are very close to a unit vector and hence almost uniquely indicate a labeling.
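The criterion is cheap to evaluate; a sketch (plain NumPy; the threshold value below is a hypothetical placeholder, not the one used in the experiments):

```python
import numpy as np

def average_entropy(W):
    # mean Shannon entropy of the assignment vectors (rows of W); the small
    # epsilon guards the logarithm at numerically zero entries
    eps = 1e-30
    return -np.mean(np.sum(W * np.log(W + eps), axis=1))

def almost_labeled(W, threshold=1e-3):
    # fires once nearly all rows are close to a unit vector
    return average_entropy(W) < threshold
```

A uniform assignment has maximal average entropy log(number of labels), and the value decays to zero as the rows approach the simplex vertices.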
6.1.3. Data.
Besides using the one-dimensional signal shown by figure 7, which makes it possible to visualize the entire evolution of the flow as a plot in a single simplex (see figure 9), we used two further labeling scenarios for evaluating the numerical integration schemes.
Figure 3(a) shows a scenario adopted from [1, figure 6] using and . The input data (right panel) comprise 31 labels encoded as vertices (unit vectors) of a corresponding simplex. This results in uniform distances (2.17) and enables to assess in an unbiased way the effect of regularization by geometric diffusion in terms of the similarity map (2.19).
Figure 3(b) shows a color image together with four color vectors used as labels, as illustrated by the panel on the right. In contrast to the data of figure 3(a) with a high level of noise and a uniform data term (as motivated and explained above), the input data shown on the left of figure 3(b) are not noisy but comprise spatial structures at quite different scales (fine texture, large homogeneous regions), causing a nonuniform data term and a more complex assignment flow.
Both scenarios together provide a testbed for checking and comparing schemes for the numerical integration of the assignment flow.
6.2. Nonlinear flow: embedded RKMK-schemes
Figures 4 and 5 show the results of the two embedded RKMK schemes of section 5.2 used to integrate the full nonlinear assignment flow (2.20), for the data shown by left panels of figures 3(a) and (b).
The two embedded RKMK schemes combine RKMK schemes of different approximation order , 1/2 and 3/2, respectively, which reuse vector field evaluations (3.17) in order to produce sequences of tangent vectors that make it possible to estimate the local approximation error by monitoring the distances . As specified by algorithm 1, the step sizes hk adaptively increase provided that a prescribed error tolerance is not violated.
The parameter values (tolerance) and (tolerance factor), used to produce the results shown by figures 4 and 5, suffice to accurately integrate the full nonlinear assignment flow by the respective explicit schemes, as the comparison with the ground truth labeling generated by the implicit geometric Euler scheme shows. Since RKMK-3/2 has a higher order q than RKMK-1/2, larger step sizes can be tolerated. On the other hand, each iteration of RKMK-3/2 is about twice as expensive as one of RKMK-1/2.
Both plots of the step size sequences (hk) reveal that the initial step size h0 = 0.01 was much too small (conservative), and that a fixed value of hk is adequate for most of the iterations. This value was larger for the experiment corresponding to figure 4, due to the uniform data term (by construction, as explained in section 6.1.3) and the more uniform scale of spatial structures. By contrast, the presence of spatial structures at quite different scales in the data corresponding to figure 5 causes a more involved assignment flow to be integrated, and hence leads to a smaller step size after adaptation. Comparing the two rightmost panels of figure 5 shows that the strength of regularization (neighborhood size ) had little influence on the sequences of step sizes.
We never observed decreasing step sizes in these supervised scenarios, that is, the corresponding step of algorithm 1 was never active. This may change in more involved scenarios, however (see remark 3.2).
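The accept/grow and reject/shrink logic can be sketched as follows; the growth and shrink factors and the step size cap are illustrative assumptions, not the constants of algorithm 1.

```python
def adapt_step(h, err, tol, grow=2.0, shrink=0.5, h_max=1.0):
    """Schematic step size rule in the spirit of algorithm 1: accept the
    step and enlarge h while the local error estimate err stays below
    the tolerance, otherwise reject and shrink h. The constants grow,
    shrink and h_max are illustrative, not taken from the paper."""
    if err <= tol:
        return min(grow * h, h_max), True    # accept, increase step size
    return shrink * h, False                 # reject, decrease step size

h = 0.01                                     # conservative initial value h0
h, accepted = adapt_step(h, err=1e-6, tol=1e-4)
assert accepted and abs(h - 0.02) < 1e-15
h, accepted = adapt_step(h, err=1e-3, tol=1e-4)
assert not accepted and abs(h - 0.01) < 1e-15
```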
Overall, a few dozen explicit iterations suffice for accurate geometric numerical integration of the assignment flow, with well below 1% wrongly labeled pixels (figure 6). Each iteration may be implemented in a fine-grained parallel way and has computational costs roughly equivalent to those of a convolution, besides the mapping to the tangent space and back to the assignment manifold at each iteration.
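For illustration, the mapping back to the manifold can be realized per pixel by the softmax-like lifting map exp_p(v) = (p e^v)/⟨p, e^v⟩ of the assignment flow framework [1]; the sketch below assumes this form (up to possible notational differences), with a max-shift added only for numerical stability.

```python
import numpy as np

def lift(p, v):
    """Lifting map exp_p(v) = (p * e^v) / <p, e^v> from the tangent
    space back to the relative interior of the probability simplex,
    as used for the assignment flow [1]; the shift by v.max() only
    stabilizes the exponentials numerically."""
    q = p * np.exp(v - v.max())
    return q / q.sum()

p = np.full(3, 1.0 / 3.0)                 # barycenter
w = lift(p, np.array([4.0, 0.0, 0.0]))
assert abs(w.sum() - 1.0) < 1e-12 and w.argmax() == 0
```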
6.3. Linear assignment flow
The approach of section 4 involves two different approximations:
- (i)
- (ii)
Due to the remarkable approximation properties of the linear assignment flow when only a single linearization at the barycenter is used (section 6.3.1), we focused entirely on this flow when evaluating the numerical schemes (i) and (ii) in sections 6.3.2 and 6.3.3.
6.3.1. Approximation property.
We report a series of experiments for the 1D signal depicted by figure 7 using both the full and the linear assignment flow in order to check how closely the latter approximates the former. Then we discuss the linear assignment flow for the two 2D scenarios shown by figure 3.
The parameter value used for scaling the data in (2.18) was the same for all 1D experiments discussed below. This gave a larger weight to the 'data term', so that—in view of the noisy data (figure 7)—the regularization property of the assignment flow (2.20), in terms of the similarity vectors Si(W) interacting through (2.19), was essential for labeling.
We first explain how the linearizations of the assignment flow were controlled. According to proposition 4.2, using the parametrization (4.3a), the linear ODE (4.3b) is equivalent to the linear assignment flow (4.2). Using again the parametrization (4.3a) and repeating the proof of proposition 4.2 shows that the full assignment flow (2.20) is locally governed by the nonlinear ODE
Taking into account (4.6) and subtracting the right-hand side of the approximation (4.3b) from the above right-hand side gives
which shows that this approximation deteriorates with increasingly large tangent vectors . As a consequence, we first solved the linear flow (4.2) using (4.3) without updating the point of linearization, and fixed, after termination at (the number of required outer iterations), the constant
Then we solved the linear assignment flow again and updated the linearization point W0 in view of (6.4) whenever
using the parameter c to control the number of linearizations: a single linearization and no linearization update if c = 1, and an increasing number of updates for larger values of c. We updated components of the linearization point W0,i by only when , in order to keep the linearization points inside the simplex, in view of the entries (4.7d) of being normalized by components of W0,k.
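Schematically, the relinearization rule can be sketched as follows; the softmax-like lifting map, the reference norm `v_ref` fixed in the first run, and the reset of the tangent vector at the new base point are simplifying assumptions, and the paper's componentwise safeguard is omitted.

```python
import numpy as np

def lift(p, v):
    """Softmax-like lifting map onto the open simplex (illustrative)."""
    q = p * np.exp(v - v.max())
    return q / q.sum()

def maybe_relinearize(W0, V, c, v_ref):
    """Re-linearize once the tangent vector norm exceeds c times the
    reference norm v_ref fixed in a first run; returns the (possibly
    updated) base point, the (possibly reset) tangent vector, and a
    flag. The componentwise safeguard keeping the base point strictly
    inside the simplex is omitted in this sketch."""
    if np.linalg.norm(V) > c * v_ref:
        return lift(W0, V), np.zeros_like(V), True
    return W0, V, False

W0 = np.full(3, 1.0 / 3.0)
W0_new, V_new, updated = maybe_relinearize(W0, np.array([3.0, 0.0, 0.0]),
                                           c=1.0, v_ref=1.0)
assert updated and np.allclose(V_new, 0.0) and abs(W0_new.sum() - 1.0) < 1e-12
```

With c = 1 and a loose reference norm the condition never fires, reproducing the single-linearization regime discussed above.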
After termination, the induced labelings were compared to those of the full assignment flow, and the number of wrongly assigned labels was taken as a quantitative measure of the approximation property of the linear assignment flow: except for the minimal neighborhood size , a single linearization almost suffices to obtain a correct labeling. Overall, the maximal number of labeling errors, 3 (out of 192), is very small, and these errors merely correspond to shifts of the signal transition by a single pixel position in the case (see figure 8). We conclude that for supervised labeling, the linear assignment flow (4.2) (which is nonlinear(!)—see remark 4.3) indeed captures a major part of the nonlinearity of the full assignment flow (2.20). Figure 9 illustrates the similarity of the two flows (and the dissimilarity in the case ) in terms of all sequences , plotted as piecewise linear trajectories.
We cannot assure, however, that this approximation property persists in more general cases (see remark 3.2), whose study is beyond the scope of the present paper.
We now turn to the scenarios shown by figure 3. Figure 10 shows the results obtained using the implicit Euler scheme with the same parameter settings that were used to integrate the nonlinear flow, in order to obtain the ground truth flows and results depicted by the leftmost panels of figures 4 and 5. Comparing the labelings returned by the linear and the nonlinear assignment flow, respectively, confirms the discussion of the 1D experiments detailed above: the results agree except for a very small subset of pixels close to signal transitions, which are immaterial for subsequent image interpretation.
The results shown by figure 10 served as ground truth for studying the explicit numerical schemes of sections 6.3.2 and 6.3.3 for integrating the linear assignment flow.
6.3.2. Adaptive RK schemes.
We evaluated the adaptive RK schemes (FE) of order q = 1 and (RK4) of order q = 4, due to (5.1), for integrating the linear ODE (4.3b), after rearranging the polynomials of (5.1) in Horner form.
Figures 11 and 12 show the results for the linear assignment flow based on a single linearization at the barycenter, using the results shown by figure 10 as ground truth. The step sizes hk were computed at each iteration k using the local error estimate (5.7) such that , that is, on average for all pixels . The spectral norm was computed beforehand using the basic power iteration.
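The basic power iteration mentioned above can be sketched as follows; applying it to A^T A yields the largest singular value, i.e. the spectral norm (the iteration count and test matrix are illustrative).

```python
import numpy as np

def spectral_norm(A, iters=200, seed=0):
    """Largest singular value of A via power iteration on A^T A:
    repeatedly apply A^T A to a random vector and normalize; the
    iterate aligns with the dominant right singular vector."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])
    for _ in range(iters):
        x = A.T @ (A @ x)
        x = x / np.linalg.norm(x)
    return np.linalg.norm(A @ x)

# usage: diagonal test matrix with known spectral norm 3
A = np.array([[3.0, 0.0], [0.0, 1.0]])
assert abs(spectral_norm(A) - 3.0) < 1e-10
```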
As explained above when the criterion (5.18) was introduced, step sizes must decrease, due to the increasing norms , in order to keep the local integration error bounded. Furthermore, in agreement with figure 2, raising the order q of the integration scheme leads to significantly larger step sizes and hence to smaller total numbers of iterations, at the cost of more expensive iterative steps. Yet roughly taking these additional costs into account, by multiplying the total iteration numbers for q = 4 by 4, indicates that raising the order q reduces the overall costs. In this respect, our findings for the linear assignment flow differ from the corresponding findings for the nonlinear assignment flow and the embedded RKMK schemes, discussed in section 6.2.
Table 1. Approximation property of the linear assignment flow. Each entry 'x, y' specifies the number x of linearizations and the number y of wrongly assigned labels (out of 192 assigned labels), depending on the neighborhood size (strength of regularization, rows) and the parameter c specifying the tangent space threshold (6.6) (columns).

| | c = 1 | c = 2 | c = 3 | c = 4 | c = 5 |
|---|---|---|---|---|---|
| 3 | 1, 3 | 4, 3 | 7, 3 | 10, 1 | 13, 0 |
| 5 | 1, 0 | | | | |
| 9 | 1, 2 | 5, 0 | | | |
6.3.3. Exponential integrator.
For integrating the linear assignment flow with exponential integrators, we consider equation (4.19) and the Krylov space approximation (4.23)
As the evaluation of is explained in section 4.2, we only discuss here the choice of the parameters m and T.
The dimension m of the Krylov subspace controls the quality of the approximation, where larger values theoretically lead to a better approximation. In our experiments, rather small numbers, like m = 5, turned out to suffice for producing labelings very close to the ground truth labelings, which were generated by the implicit Euler method—see figures 13 and 14. As the runtime of the algorithm increases with growing m, this parameter should not be chosen too large.
Scaling A and a in (4.18) affects the vector field defining the linear ODE (4.3b). Hence, fixing any time point T depends on this scaling factor, too. As a consequence, since A and a depend on the problem data (4.18), the choice of T is problem dependent. On the other hand, the discussion following the proof of theorem 5.2 showed that increases with t, and T merely has to be chosen large enough such that defined by (4.3a) satisfies the termination criterion (6.2)—see (5.20c) for a rough estimate. Choosing T overly large will cause numerical underflow and overflow issues, however.
Almost all runtime is consumed by the Arnoldi iteration producing the subspace basis . Due to the small dimension m, the total runtime is very short, and the time required for the subsequent evaluation of the right-hand side of (6.7) is negligible.
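A self-contained sketch of this Krylov subspace technique might look as follows: the Arnoldi iteration builds an orthonormal basis V_m and a small Hessenberg matrix H_m, and the action of the large matrix exponential is replaced by an m-dimensional one. The dense exponential here uses simple scaling-and-squaring, and the random test matrix is purely illustrative; this is a sketch of the technique, not the paper's implementation.

```python
import numpy as np

def small_expm(M, s=20, terms=14):
    """Dense matrix exponential via scaling and squaring with a
    truncated Taylor series; adequate for the small matrices here."""
    M = M / (2.0 ** s)
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def krylov_expm(A, b, T, m=5):
    """Approximate exp(T*A) @ b in an m-dimensional Krylov subspace:
    Arnoldi yields A @ V_m ~ V_m @ H_m, so exp(T*A) @ b is replaced
    by ||b|| * V_m @ exp(T*H_m) @ e_1."""
    n = b.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(m):                       # Arnoldi iteration
        w = A @ V[:, j]
        for i in range(j + 1):               # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:              # happy breakdown
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m)
    e1[0] = 1.0
    return beta * V[:, :m] @ (small_expm(T * H[:m, :m]) @ e1)

# usage: compare against the dense exponential on a small random problem
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30)) / 6.0
b = rng.standard_normal(30)
exact = small_expm(2.0 * A) @ b
err = np.linalg.norm(krylov_expm(A, b, 2.0, m=20) - exact)
assert err < 1e-5
```

As in the experiments above, accuracy improves rapidly with m once it exceeds the effective norm of T·A, which is why small subspace dimensions already suffice.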
7. Conclusion
We investigated numerical methods for image labeling by integrating the large system of nonlinear ODEs defining the assignment flow (2.20), which evolves on the assignment manifold. All methods exactly respect the underlying geometry. Specifically, we adapted RKMK methods and showed that embedded RKMK methods work very well for automatically adjusting the step size, at negligible additional cost. Raising the order leads to larger step sizes, which is compensated by the higher computational costs per iteration, however. In either case, each iteration only involves convolution-like operations over local neighborhoods, together with pixelwise nonlinear function evaluations.
We derived and introduced the linear assignment flow, a nonlinear approximation of the (full) assignment flow that is governed by a large linear system of ODEs on the tangent space. Experiments showed that the approximation is remarkably close, as measured by the number of different label assignments.
We investigated two further families of numerical schemes for integrating the linear assignment flow: established RK schemes with adaptive step size selection based on a local integration error estimate, and exponential integrators for approximately evaluating Duhamel's integral using a Krylov subspace. In the former case, higher-order schemes really pay off, unlike for the RKMK schemes and the full assignment flow, as mentioned above. Choosing the classical RK scheme with q = 4, for example, a few dozen iterations suffice to reach the termination criterion, with high potential for parallel implementation. The exponential integrators, on the other hand, directly approximate Duhamel's integral and in this sense are non-iterative. Here, merely a Krylov subspace basis of low dimension has to be computed, using a standard iterative method. Even though this method differs mathematically from the RK schemes, it has potential for real-time implementation as well.
All methods provide a sound basis for more advanced image analysis tasks that involve labeling by evaluating the assignment flow as a subroutine. Accordingly, our future work concerns an extension of the unsupervised labeling approach [9], where label dictionaries are directly learned from data through label assignment. Furthermore, methods under investigation for learning to adapt regularization parameters of the assignment flow to specific image classes require advanced discretization and numerical methods based on the results reported in the present paper.
Acknowledgments
This work was supported by the German Research Foundation (DFG), Grant No. GRK 1653.