Abstract
The assignment flow is a smooth dynamical system that evolves on an elementary statistical manifold and performs contextual data labeling on a graph. We derive and introduce the linear assignment flow, which still evolves nonlinearly on the manifold but is governed by a linear ODE on the tangent space. Various numerical schemes adapted to the mathematical structure of these two models are designed and studied for the geometric numerical integration of both flows: embedded Runge–Kutta–Munthe–Kaas schemes for the nonlinear flow, and adaptive Runge–Kutta schemes as well as exponential integrators for the linear flow. All algorithms are parameter-free, except for setting a tolerance value that controls adaptive step size selection by monitoring the local integration error, or for fixing the dimension of the Krylov subspace approximation. These algorithms provide a basis for applying the assignment flow to machine learning scenarios beyond supervised labeling, including unsupervised labeling and learning from controlled assignment flows.
1. Introduction
1.1. Overview, motivation
The assignment flow, recently introduced by [1] and detailed in section 2, denotes a smooth dynamical system evolving on an elementary statistical manifold, for the contextual classification of a finite set of data given on an arbitrary graph. Vertices of the graph are associated with the elements of the data set and, in typical applications, correspond to locations in space and/or time. Classification means to assign to each datum exactly one class representative, called a label, out of a finite set of predetermined labels. Contextual classification means that these decisions directly depend on each other, as encoded by the edges (adjacency relation) of the underlying graph. In the context of image analysis, classifying given image data on an image grid graph in this way is called the image labeling problem. We point out, however, that the assignment flow applies to arbitrary data represented on a graph.
A key property of the assignment flow is that decision variables do not live in the space used to model the data. Rather, a probability simplex is associated with each datum, on which a flow evolves until it converges to one of the vertices of the simplex that encode the labels. Each simplex is equipped with the Fisher–Rao metric, which turns the relative interior of the simplex into a smooth Riemannian manifold. It is this particular geometry that effectively promotes discrete decisions that interact in a smooth way. Replacing in addition the Riemannian (Levi-Civita) connection by the α-connection (with α = 1) introduced by Amari and Chentsov [2] makes it possible to carry out basic geometric operations in a computationally efficient way. Keeping the assignment flow as 'inference engine' separate from the data space and model allows it to be applied flexibly to a broad range of contextual data classification problems. We refer to [2, 3] as basic texts on information geometry and to [4] for more information on the image labeling problem.
From a more distant viewpoint, our work ties in with the recent trend to explore the mathematics of deep networks from a dynamical systems perspective [5]. A frequently cited paper in this respect is [6] where a connection was made between the so-called residual architecture of networks and explicit Euler integration steps of a corresponding system of nonlinear ordinary differential equations (ODEs). We refer to [7] for a good exposition. While this offers a novel and fresh perspective on the learning problem of network parameters, it does not alter the basic ingredients of such networks that apparently have been adopted in an ad-hoc way, like parametrized static layers connected by nonlinear transition functions, ReLU activations etc.
By contrast, the assignment flow provides a smooth dynamical system on a graph (network), where all ingredients coherently fit into the overall mathematical framework. Based on this, we recently showed how discrete graphical models for image labeling can be evaluated using the assignment flow [8], and how unsupervised labeling can be modeled by coupling the assignment flow and Riemannian gradient flows for label evolution on feature manifolds [9]. Our current work, to be reported elsewhere, studies machine learning problems based on controlling the assignment flow. Here, in particular, algorithms that accurately integrate the assignment flow numerically on the manifold where it evolves play a decisive role. A thorough study of such algorithms is the subject of the present paper.
1.2. Contribution, organization
This paper presents two interrelated contributions, as illustrated by figure 1.
- (1)We derive from the assignment flow—henceforth called the nonlinear assignment flow—the linear assignment flow, which is still nonlinear but governed by a linear ODE on the tangent space. This property is attractive for modeling tasks (e.g. parameter estimation and control) as well as for the design of numerical algorithms. In particular, our experiments show that the linear flow closely approximates the nonlinear flow, as far as the final labeling results are concerned.
- (2)We study a range of algorithms for numerically integrating both the nonlinear and the linear assignment flow, respectively, while properly taking into account the underlying geometry.
- (a)Regarding the nonlinear assignment flow, we adopt the machinery of Lie group methods for the numerical integration of ODEs on manifolds [10] and devise corresponding extensions of classical Runge–Kutta (RK) schemes, called RKMK schemes (Runge–Kutta–Munthe–Kaas) in the literature [11]. We combine pairs of these extensions to form embedded RKMK schemes for adaptive step size control, analogous to classical embedded RK schemes [12].
- (b)Regarding the linear assignment flow, we take advantage in two alternative ways of the linearity of the flow on the tangent space.
- (i)On the one hand, we derive a local error estimate in order to apply classical RK schemes [12] on the tangent space, with step sizes that adapt automatically.
- (ii)On the other hand, we evaluate the linear flow directly by an exponential integrator, based on a Krylov subspace approximation of the matrix exponential (section 4.2).
All these explicit numerical schemes are evaluated and discussed in section 6, using 'ground truth' flows as a baseline, computed using the implicit geometric Euler scheme with a sufficiently small step size. All iterative algorithms are parameter-free, except for specifying a single tolerance value for the local error, which governs adaptive step size selection. Our experiments indicate a value for this parameter that 'works' regarding integration accuracy and labeling quality, but is not too conservative (i.e. too small). In the case of the exponential integrator, we merely have to supply the final point of time T at which the linear assignment flow should be evaluated, in addition to the dimension of the Krylov subspace, which controls the quality of the approximation. We conclude with a synopsis of our results in section 7.
1.3. Basic notation
Index sets I and J index vertices of the underlying graph and labels , respectively. and denote the basic statistical manifolds that we work with, defined in section 2. Points are strictly positive probability vectors, and we denote efficiently by
componentwise multiplication for general vectors, and componentwise division only if . Likewise, functions like the exponential function and the logarithm with vectors as arguments apply componentwise,
and denote exponential mappings defined in section 2, whereas denotes the matrix exponential in section 4.2. The ordinary exponential function defined on the real line is always denoted by .
denotes the constant 1-vector with appropriate number of components depending on the context. We use the common shorthand with . denotes the -norm and the -norm if .
2. The assignment flow
We summarize the assignment flow introduced by [1] and related concepts required in this paper. Let be a given graph and let
be data given in a metric space
We call these image data even though the f_i typically represent features extracted from the raw image at pixel i as a preprocessing step. G may be a grid graph (with self-loops), as in low-level image processing, or a less structured graph with arbitrary connectivity in terms of the neighborhoods
We associate with each neighborhood weights satisfying
These weights parametrize the regularization property of the assignment flow and are assumed to be given. How to learn them from data in order to control the assignment flow will be reported elsewhere.
Along with we assume prototypical data
to be given, henceforth called labels. Each label g_j represents the data of class j. Image labeling denotes the problem of assigning class labels to image data depending on the local context encoded by the graph G. We refer to [8] for more details and background on the image labeling problem.
Assignments of labels to data are represented by discrete probability distributions
where
denotes the relatively open probability simplex equipped with the Fisher–Rao metric
which turns into a Riemannian manifold. In connection with , we define the
of , i.e. the uniform distribution, the orthogonal projection
and the linear replicator map
satisfying
Adopting the -connection with from information geometry as introduced by Amari and Chentsov [2, section 2.3], [3], the exponential map based on the corresponding affine geodesics reads
with inverse [1, appendix]
Specifically, we define
with inverse [1, appendix]
Remark 2.1. Calling (2.12b) the 'inverse' map is justified by the fact that does not depend on any constant component of the argument vector z. Yet, we choose as domain because will be applied to arbitrary distance vectors (see (2.17)) arising from given data, and the notation indicates that the implementation does not need to remove this component explicitly [1, remark 4].
These mappings naturally extend to the collections of assignment vectors (2.5), regarded as points on the
with tangent space
and the corresponding mappings
and similarly defined based on (2.11b), (2.12a) and (2.12b). Finally, we define the geometric mean of assignment vectors [1, lemma 5]
where is the componentwise exponentiation of W_k with w_ik.
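To make these geometric operations concrete, here is a minimal numerical sketch (plain NumPy; the function names are ours) of the replicator map, the lifting map and the weighted geometric mean in the closed forms stated in [1]:

```python
import numpy as np

def replicator(p, v):
    # replicator map: R_p v = Diag(p) v - <p, v> p, mapping v onto the
    # tangent space at p (components of the result sum to zero)
    return p * v - np.dot(p, v) * p

def exp_map(p, z):
    # lifting map exp_p(z) = p * e^z / <p, e^z> (closed form from [1]);
    # shifting z by its maximum cancels after normalization, avoiding overflow
    q = p * np.exp(z - np.max(z))
    return q / np.sum(q)

def geometric_mean(W, w):
    # weighted geometric mean of the rows W[k] with weights w[k], i.e. the
    # normalized componentwise exponential of sum_k w[k] * log W[k]
    g = np.exp(w @ np.log(W))
    return g / np.sum(g)
```

Note that all three maps keep their results on (or tangent to) the simplex by construction, which is exactly what the flow exploits.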
Using this setting, the assignment flow accomplishes image labeling as follows. Based on (2.1), (2.4), distance vectors
are defined and mapped to
where is a user parameter to normalize the distances induced by the specific features f i at hand. This representation of the data is regularized by local geometric smoothing to obtain
which in turn evolves the assignment vectors through the
Methods for numerically integrating this flow are examined in the following sections.
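As an illustration of how the pieces fit together, the following hedged sketch (uniform weights, plain NumPy; the names and data layout are our choices) assembles the right-hand side of the assignment flow from distance vectors: likelihoods via the lifting map, similarity vectors via the geometric mean over each neighborhood, and finally the replicator map:

```python
import numpy as np

def assignment_flow_rhs(W, D, neighbors, rho=1.0):
    # W: (num_nodes, num_labels) assignment vectors (rows on the simplex)
    # D: (num_nodes, num_labels) distance vectors; neighbors: index arrays
    def normalize(Q):
        return Q / Q.sum(axis=1, keepdims=True)
    # likelihoods: lifting map applied to the scaled negative distances
    L = normalize(W * np.exp(-D / rho))
    # similarity: geometric mean of likelihoods over each neighborhood
    # (uniform weights w_ik = 1/|N_i| assumed here)
    S = normalize(np.stack([np.exp(np.mean(np.log(L[nb]), axis=0))
                            for nb in neighbors]))
    # replicator map row-wise: dW_i/dt = W_i * S_i - <W_i, S_i> W_i
    return W * S - np.sum(W * S, axis=1, keepdims=True) * W
```

For uniform assignments and zero distances the field vanishes, i.e. the barycenter is stationary in the absence of data, while nonuniform distances drive each row toward the closest label.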
3. Geometric Runge–Kutta integration
We apply the general approach of [11] to our problem. For background and more details, we refer to [10] and [16, chapter 4].
3.1. General approach
In order to apply Lie group methods to the integration of an ODE on a manifold , one has to check first if the ODE can be represented properly. Let
denote the action of a Lie group on satisfying
Furthermore, let denote the Lie algebra of , the set of all smooth vector fields on ,
a smooth function and the induced map defined by
Then is a Lie algebra action if the induced map is a Lie algebra homomorphism, i.e. is linear and satisfies , with the Lie brackets on and on the left-hand side and the right-hand side, respectively. In particular, based on a Lie group action , a Lie algebra action is given by [11, lemma 4]
where denotes the exponential map of . Thus, for this choice of , the induced map (3.3) is given by [11, theorem 5]
Now, given an ODE on , the basic assumption underlying the application of Lie group methods is the existence of a function such that the ODE admits the representation
For sufficiently small t, the solution of (3.6) then can be parametrized as
where satisfies the ODE
with the inverse differential of evaluated at . A major advantage of the representation (3.7) is that the task of numerical integration concerns the ODE (3.7b) evolving on the vector space , rather than the original ODE evolving on the manifold . As a consequence, established methods can be applied to (3.7b), possibly after approximating by a truncated series in a computationally feasible form.
3.2. Application to the assignment flow
Assume an ODE on defined by (2.6) is given. The application of the approach of section 3.1 is considerably simplified by identifying with the flat tangent space (2.7) and consequently also . One easily verifies that the action defined as
with the right-hand side given by (2.12a), satisfies (3.1), i.e.
Proposition 3.1. The solution to assignment flow (2.20) emanating from any W0 = W(0) admits the representation
where solves
Proof. Since geodesics through in directions have the form , the differential of the exponential map of , , is the identity and thus (3.5) gives
with defined by (2.9b). As a result, the basic assumption (3.6) concerns ODEs on that admit the representation
for some function and some . Since by (3.4), the parametrization (3.7) reads
where solves
This setting extends to the assignment flow by defining (see (2.15)) and as
The basic assumption (3.6) then reads
which is the assignment flow (2.20). Due to (3.6), for any W0 = W(0), it admits the representation
where solves
which is (3.10). □
Remark 3.2. While the basic formulation (2.20) of the assignment flow is autonomous, we keep in what follows the explicit time dependency of the function of the parametrization (3.10), because in more advanced scenarios the flow may become non-autonomous. A basic example concerns unsupervised problems [9] where labels vary, and hence the distance vectors (2.17) and in turn the vector field defining the assignment flow depend on t.
Using the above representation and taking into account the simplifications of the general approach of section 3.1, the RKMK algorithm [11] for integrating the assignment flow from t = 0 to t = h is specified as follows. Let be the coefficients of an s-stage, qth order classical RK method satisfying the consistency condition [12, chapter II]. Starting from any point
the algorithm amounts to compute the vector fields
and the update
Replacing , computing the update and iterating this procedure generates the sequence which approximates .
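Under the simplification of section 3.2 (the relevant dexp-inverse is the identity), one explicit RKMK step reduces to classical RK stages on the tangent space with frozen base point, mapped back through the lifting map. A minimal sketch (plain NumPy; the tableau format and names are ours):

```python
import numpy as np

def lift(W, V):
    # row-wise lifting map: exp_{W_i}(V_i) = W_i e^{V_i} / <W_i, e^{V_i}>
    Q = W * np.exp(V - V.max(axis=1, keepdims=True))
    return Q / Q.sum(axis=1, keepdims=True)

def rkmk_step(W, F, h, A, b):
    # one explicit s-stage RKMK step (3.17): each stage evaluates the vector
    # field at a point lifted from the frozen base point W; the combined
    # tangent update is mapped back to the manifold at the end
    K = []
    for i in range(len(b)):
        V = h * sum(A[i][j] * K[j] for j in range(i)) if i else np.zeros_like(W)
        K.append(F(lift(W, V)))
    return lift(W, h * sum(bi * Ki for bi, Ki in zip(b, K)))
```

For example, A = [[0]], b = [1] gives the geometric explicit Euler scheme, while A = [[0, 0], [1, 0]], b = [0.5, 0.5] gives a second-order (Heun-type) RKMK scheme.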
An s-stage RKMK scheme is specified using the corresponding Butcher tableau of the form
Specifically, we consider the following explicit RKMK schemes of order 1, 2, 3, 4:
Note that the approximation order is raised by increasing the number of stages. This comes at a price, however, because each stage evaluates at step (3.17c) the right-hand side of (3.10b), which is the most expensive operation. As a consequence, it is not clear a priori whether a multi-stage scheme with a larger step size h is superior to a simpler scheme that is evaluated more frequently using a smaller step size.
In addition to the above explicit schemes, we consider the simplest implicit RKMK scheme
Implicit schemes are known to be stable for much larger step sizes. Yet, they require solving a fixed-point equation at every step, which is done by an iterative inner loop.
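A hedged sketch of this inner loop (plain NumPy; the warm start and tolerances are our choices, not the paper's): the implicit geometric Euler step solves V = h F(exp_W(V)) by fixed-point iteration before mapping back to the manifold:

```python
import numpy as np

def lift(W, V):
    # row-wise lifting map exp_{W_i}(V_i) = W_i e^{V_i} / <W_i, e^{V_i}>
    Q = W * np.exp(V - V.max(axis=1, keepdims=True))
    return Q / Q.sum(axis=1, keepdims=True)

def implicit_euler_step(W, F, h, tol=1e-12, max_iter=100):
    # solve the fixed-point equation V = h * F(lift(W, V)) by iteration
    # (contractive for sufficiently small h), then return the lifted point
    V = h * F(W)                      # warm start from the explicit step
    for _ in range(max_iter):
        V_new = h * F(lift(W, V))
        if np.linalg.norm(V_new - V) < tol:
            V = V_new
            break
        V = V_new
    return lift(W, V)
```

The contraction factor of the inner iteration scales with h, so moderate step sizes keep the inner loop short.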
The performances of these numerical schemes are examined in section 6.
4. Linear assignment flow, exponential integrator
The ODE (3.10b) which parametrizes the assignment flow together with (3.10a), evolves on a linear space but is a nonlinear system of ordinary differential equations. In this section, we provide an approximate representation of the assignment flow within time intervals through a linear ODE evolving on the tangent space (section 4.1), and a corresponding numerical scheme (section 4.2).
The resulting flow on the assignment manifold is still nonlinear, though. The basic idea is to capture locally a major part of the nonlinearity of the (full) assignment flow by a linear ODE on the tangent space, which makes it possible to apply alternative integration schemes.
4.1. Linear assignment flow
Our ansatz has two ingredients. Firstly, we adopt the parametrization
of the solution to the assignment flow by a trajectory in the tangent space , similar to (3.10a), except for using the 'true' exponential map (2.11a) and (2.15d), respectively, corresponding to the underlying affine connection. Secondly, we use an affine approximation of the vector field on the right-hand side of (2.20), that defines the assignment flow. The following corresponding definition generalizes the flow studied by [17] from the barycenter to arbitrary base points W0, and from a flow on to a flow on .
Definition 4.1 (linear assignment flow). We call linear assignment flow every flow induced by an ODE of the form
with a fixed vector s0 and a fixed matrix S0, for arbitrary W0.
An important property of the flow (4.2)—which explains its name—is the possibility to parametrize it by a linear ODE evolving on the tangent space .
Proposition 4.2. The linear assignment flow (4.2) admits the representation
where solves
Proof. Parametrization (4.1) yields
and by differentiation
Solving (4.2) for after inserting (2.15c), and substitution in the preceding equation gives
and since by (2.9b)
The initial condition follows from . □
Remark 4.3. Note that, despite the linearity of (4.3b), the resulting flow (4.3a) solving (4.2) is nonlinear. Thus, one may hope to capture the major nonlinearity of the full assignment flow (2.20) by a linear ODE on the tangent space, at least locally in some time interval. Within this interval, the evaluation of (2.19) is not required, and the linearity of the tangent space ODE (4.3b) can be exploited for integration.
We conclude this section by computing the natural choice
of the parameters of the linear assignment flow (4.2) in explicit form, where s0 is immediate due to (2.19), but the Jacobian of , evaluated at W0, is not.
Proposition 4.4. Let denote the global similarity vector obtained by stacking the local similarity vectors of (2.19). Then, with
and the replicator map defined by (2.9b), the Jacobian of at is given by
where the action of each block matrix has the form
and the non-zero entries if (using (4.7a))
The proof follows below after two preparatory lemmata.
Lemma 4.5. Let . Then the differential of at applied to is given by
Moreover, we have
Proof. Let be a smooth curve in T0 with and . Using (2.12a), we compute
which is (4.8). As for (4.9), using the representation
where the last equation takes into account remark 2.1, we obtain
□
We use this lemma to represent the similarity vectors in a convenient form for subsequently proving proposition 4.4.
Lemma 4.6. The similarity vectors (2.19) admit the representation
Proof. By (2.19) and (2.16), we obtain
using again for all (see remark 2.1)
□
Proof of proposition 4.4. Setting
we compute using a smooth curve in with and ,
Thus, using (4.15) and (4.8) gives
and using the linearity of the map and (4.16),
which proves (4.7c). Inserting due to (2.9b) yields (4.7d).
The following section specifies an alternative integration scheme for the linear assignment flow (4.2). Its approximation properties are numerically examined in section 6.
4.2. Exponential integrator
We focus on the linear ODE (4.3b) that together with (4.3a) determines the linear assignment flow due to (4.2). The solution to (4.3b) is given by Duhamel's formula [18],
which involves the matrix exponential of the matrix A, whose number of rows and columns equals the number of pixels times the number of labels and hence can be quite large in image labeling problems (10^4–10^7 variables). Explicitly computing the matrix exponential is neither feasible, because it is dense even if A is sparse, nor required in view of the multiplication with the vector a. Rather, taking into account and that uniformly converging series can be integrated term by term, we set t = T large enough and evaluate
where is the entire function
whose series representation yields valid expressions (4.19c) even if the matrix A is singular.
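For reference, a scalar sketch of this entire function (assuming, as is standard in the exponential-integrator literature, φ(z) = (e^z − 1)/z with φ(0) = 1):

```python
import math

def phi1(z):
    # phi(z) = (e^z - 1)/z, extended by phi(0) = 1; the Taylor series
    # sum_{k>=0} z^k / (k+1)!  is used near 0 to avoid cancellation
    if abs(z) < 1e-4:
        return 1.0 + z / 2.0 + z**2 / 6.0 + z**3 / 24.0
    return math.expm1(z) / z
```

The same series underlies the matrix-valued expressions, which is why singular A poses no difficulty.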
We refer to [19] for a detailed exposition of matrix functions and to [20] and [19, sections 10 and 13] for a survey of methods for computing the matrix exponential and the product of matrix functions times a vector. For large problem sizes, the established methods of the two latter references are known to deteriorate, however, and methods based on Krylov subspaces have been developed [13, 14] and become the method of choice in connection with exponential integrators [15].
We confine ourselves to sketching below a state-of-the-art method [21] for the approximate numerical evaluation of (4.19). The evaluation of its performance for integrating the linear assignment flow, and a comparison to the methods of section 5.1, are reported in section 6. A more comprehensive evaluation of further recent methods for evaluating (4.19) that also cope with large problem sizes (e.g. [22]) is beyond the scope of this paper.
In order to compute approximately , one considers the Krylov subspace
with orthogonal basis arranged as column vectors of an orthogonal matrix and computed using the basic Arnoldi iteration [13]. The action of A is approximated by
which in turn yields the approximation
where denotes the first unit vector and the last equality is implied by the Arnoldi iteration producing , which sets . Note that merely has to be applied to the much smaller matrix , which can be safely and efficiently computed using standard methods [19, 20]. The vector can be recovered [23, theorem 1] in form of the upper m entries of the last column of with the extended matrix
If the degree of the minimal polynomial of a with respect to A (i.e. the nonzero monic polynomial p of lowest degree such that p(A)a = 0) is equal to m, then the approximation (4.23) is even exact [13, theorem 3.6].
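The steps above can be sketched as follows (plain NumPy; the small matrix exponential is computed here by simple scaling and squaring with a truncated Taylor series, standing in for the standard methods of [19, 20]):

```python
import numpy as np

def expm_small(H, terms=20):
    # scaling and squaring with a truncated Taylor series; adequate for the
    # small (m x m) Hessenberg matrices produced by the Arnoldi iteration
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(H, 1), 1.0)))) + 1)
    A = H / 2.0**s
    E, T = np.eye(H.shape[0]), np.eye(H.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    for _ in range(s):
        E = E @ E
    return E

def arnoldi_expm_v(A, b, m):
    # Krylov approximation  exp(A) b  ~  beta * V_m expm(H_m) e_1  [13]
    n = b.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:           # happy breakdown: exact subspace
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    return beta * V[:, :m] @ expm_small(H[:m, :m])[:, 0]
```

Only matrix-vector products with A are needed, which is what makes the approach attractive for large sparse problems.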
5. Step sizes, adaptivity
We specify in this section the step size selection for the numerical RKMK schemes of section 3. In addition, for the linear assignment flow (section 4.1), we conduct a local error analysis in section 5.1 for RK schemes, based on the linearity of the tangent space ODE that governs this flow. A corresponding explicit error estimate makes it possible to determine a sequence of step sizes that ensures a prespecified local accuracy at each step k.
In order to determine step sizes for the nonlinear assignment flow (2.20), we proceed differently, because the corresponding vector field depends nonlinearly on the current iterate, and estimating local Lipschitz constants would be expensive and less sharp. We therefore adapt in section 5.2 classical methods for local error estimation and step size selection for nonlinear ODEs, based on embedded Runge–Kutta methods [12, section II.4], to the geometric RKMK methods of section 3.
The experimental evaluation of both approaches is reported in section 6.
5.1. Linear assignment flow
We focus on the linear ODE (4.3b) that together with (4.3a) determines the linear assignment flow (4.2). Due to its approximation property demonstrated in section 6.3.1, we only consider the linearization point . Since the ODE (4.3b) evolves on the linear space , we apply the classical s-stage explicit RK scheme, rather than the geometric s-stage RKMK scheme (3.17), to obtain
Specifically, regarding the explicit schemes listed at the end of section 3.2 in terms of their Butcher tableaus, consecutively inserting (5.1a) into (5.1b) yields with the shorthands defined by (4.18),
Comparison with (4.18) shows that due to the linearity of the ODE, each scheme results in a corresponding Taylor series approximation, depending on its order q, of the equation
that is,
with matrix-valued polynomials . Our strategy for choosing the step size h is based on the local error estimate specified below as theorem 5.2 and prepared by the following lemma.
Lemma 5.1 ([24, equation (8.4.8)]). with the incomplete Gamma function
Theorem 5.2. Let solve (4.2) with , and let be a sequence generated by a RK scheme (5.1) of order q. Set in (5.3). Then in (5.3) is the exact value of the linear assignment flow emanating from , and regarding (5.4) the local error estimate
holds, where is given by (5.6) and denotes the spectral norm of the matrix .
Proof. Using (5.3) and (5.4) and , we bound the local error by
and inserting the series (5.4a) gives
Both series absolutely converge for any h. By Lemma 5.1, we have
Applying these equations to (5.8b) yields
and substitution into (5.8)
which is (5.7a). To show (5.7b), we use the representation
and the lower bound [25, corollary of theorem 1]
that holds for all x > 0 and , with the Gamma function
Put
Since and , we set in view of (5.13). Furthermore, we have
and using , since q is integer,
Thus,
which after substituting (5.15) and in turn into (5.11), proves (5.7b). □
Theorem 5.2 makes it possible to control the local integration error through the choice of the step size h, using the simple form of the bound (5.7), which depends on the constants and the norm of the current iterate. Specifically, we choose
by (5.7), for some prespecified value . Inspecting the parametrization (4.3) shows that grows—and hence the step sizes (5.18) decrease—until is close enough to a vertex of (which represents a labeling) and satisfies a termination criterion that stops the chosen iterative RK scheme (5.1).
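For illustration only (the exact constants of (5.7) differ; this assumes a Taylor-remainder-type bound of the form ||A||^(q+1) h^(q+1) ||v|| / (q+1)!, which has the same leading behavior in h), the rule (5.18) amounts to solving the bound for the largest admissible step size:

```python
import math

def choose_step(tol, q, norm_A, norm_v):
    # largest h with  (norm_A * h)**(q + 1) / (q + 1)! * norm_v <= tol;
    # the constants here are illustrative, not those of (5.7)
    c = norm_v / math.factorial(q + 1)
    return (tol / c) ** (1.0 / (q + 1)) / norm_A
```

Consistent with figure 2, a higher order q admits a larger step size for the same tolerance, while a growing norm of the iterate shrinks it.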
In order to check how large then will be, assume
that is and if . Then with and by (4.3) and (2.12a)
and hence
Thus, as soon as the norm has grown to the order , a termination criterion that checks if is -close to some vertex of , will be satisfied.
Figure 2 quantitatively illustrates how much the factor of the upper bound (5.7) overestimates the exact factor (5.11) computed in the proof of theorem 5.2, and hence how conservatively (i.e. how small) the step size h_k will be chosen to achieve (5.18). The curves of figure 2 show that the estimate (5.7) is fairly tight and suited to adapting the step size. Furthermore, comparing the ordinate values of both panels for q = 1 and q = 4 shows that, in order to achieve a fixed accuracy in (5.18), using a higher-order RK scheme (5.1) permits a larger step size.
5.2. Nonlinear assignment flow
Similar to the preceding section, we wish to select step sizes in order to control the local error on the left-hand side of (5.7). Because an estimate like the right-hand side of (5.7), valid at each step k, is not available for the nonlinear assignment flow, we adapt established embedded RK methods [12, section II.5] to the geometric RKMK schemes (3.17).
The basic strategy is to evaluate twice step (3.17d)
using a second collection of coefficients , but with the same vector fields . Thus each embedded method can be specified by augmenting the corresponding Butcher tableau accordingly,
Proper embedded methods combine a pair of RK schemes of different order, so that indicates whether the step size h is small enough at each step k of the overall iteration. Since the vectors are used twice, this comes at little additional cost. We also point out that, unlike in the linear case, the magnitude of tangent vectors has much less influence, because the scheme (3.17) that is consecutively applied at each step k is based on (3.10b) with the initial condition . As a consequence, the magnitude of the update (3.17d) will be relatively small at each step k.
We list the tableaus of two embedded methods that we evaluate in section 6.
The first method combines the forward Euler scheme and Heun's method of order 2. The second method complements Heun's method of order 3 as specified on [12, p 166]. We call RKMK-1/2 and RKMK-3/2 the geometric versions of these schemes when they are applied in connection with (3.17).
We conclude this section by specifying the extension of (3.17) in order to include adaptive step size control.
The distance function d_I is defined by (6.1). Typical parameter values are . Starting with a small initial step size h_0, the algorithm adaptively generates a sequence (h_k) whose values increase whenever the local error estimate is much smaller (by a factor ) than the prescribed tolerance .
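The controller just described can be sketched as the classical rule from [12, section II.4] (the numerical values of the safety factor and the clipping bounds below are typical defaults, not the ones prescribed in the paper):

```python
def adapt_step(h, err, tol, q, safety=0.9, fac_min=0.2, fac_max=5.0):
    # rescale h by (tol/err)^(1/(q+1)), damped by a safety factor and
    # clipped so that consecutive step sizes change only gradually
    if err <= 0.0:
        return h * fac_max
    fac = safety * (tol / err) ** (1.0 / (q + 1))
    return h * min(fac_max, max(fac_min, fac))
```

When the local error estimate is far below the tolerance the step size grows (up to the clipping factor), and it shrinks when the tolerance is exceeded.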
6. Experiments and discussion
This section is organized as follows (see also figure 1). We specify details of our implementation in section 6.1. Section 6.2 reports the evaluation of the geometric RKMK schemes (3.17) with embedded step size control (algorithm 1), for integrating the nonlinear assignment flow. Section 6.3 is devoted to the linear assignment flow: assessment of how closely it approximates the nonlinear assignment flow, evaluation of the RK schemes (5.1) with adaptive step size selection (5.18), and evaluation of the exponential integrator introduced in section 4.2.
Algorithm 1. Embedded RKMK with adaptive step size control.
6.1. Implementation details
All algorithms were implemented and evaluated using Mathematica. We did not apply any assignment normalization as suggested by [1, section 3.3.1], since Mathematica can work with arbitrary numerical precision. Our experiments thus illustrate intrinsic properties of the assignment flow that hold on the open assignment manifold , and the reliability and efficiency of a variety of algorithms for integrating this flow numerically while respecting the underlying geometry. We do not examine the flow on the boundary of , which does not belong to the domain of our approach, by definition. For an investigation of a particular numerical scheme in this specific context, we refer to [26].
Throughout the experiments, we used uniform weights in (2.19) and (2.16), respectively, since how to choose these 'control variables' properly, depending on the application at hand, is the subject of our current work. Yet, we point out that the algorithms of the present paper do cover such more general scenarios. For example, the simplest geometric RKMK scheme (3.17) was recently used for integrating the assignment flow in unsupervised scenarios [9], where labels evolve and hence the distance vectors (2.17) are no longer static but vary with time, too.
6.1.1. Ground truth flows.
In order to obtain a baseline for assessing the performance of linearizations, of approximate numerical integration by various schemes, or both, we always solved the assignment flow (nonlinear or linear) with high numerical accuracy using the geometric implicit Euler scheme (nonlinear flow) or the Euclidean implicit Euler scheme (linear flow), with a sufficiently small step size h. This requires solving a fixed-point equation as part of every iteration k, involving the nonlinear mapping on the right-hand side of (3.10b) in the case of the nonlinear assignment flow, or the linear mapping on the right-hand side of (4.3b) in the case of the linear assignment flow. These fixed-point equations were solved iteratively as well, and the corresponding iterations were terminated when subsequent elements of the corresponding subsequences that measure the residual of the fixed-point equation satisfied
Starting these inner iterative loops with and terminating with , we set and continued with the outer iteration k + 1.
6.1.2. Termination criterion.
As suggested by [1], all iterative numerical schemes generating sequences (W(k)) were terminated when the average entropy of the assignment vectors dropped below the threshold
If this happens, then (possibly up to a tiny subset of pixels ) all assignment vectors W_i are very close to a unit vector and hence almost uniquely indicate a labeling.
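The criterion is cheap to evaluate; a sketch (plain NumPy; the threshold value below is a hypothetical placeholder, not the one used in the experiments):

```python
import numpy as np

def average_entropy(W):
    # mean Shannon entropy of the assignment vectors (rows of W); the small
    # epsilon guards the logarithm at numerically zero entries
    eps = 1e-30
    return -np.mean(np.sum(W * np.log(W + eps), axis=1))

def almost_labeled(W, threshold=1e-3):
    # fires once nearly all rows are close to a unit vector
    return average_entropy(W) < threshold
```

A uniform assignment has maximal average entropy log(number of labels), and the value decays to zero as the rows approach the simplex vertices.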
6.1.3. Data.
Besides using the one-dimensional signal shown by figure 7, which makes it possible to visualize the entire evolution of the flow as a plot in a single simplex (see figure 9), we used two further labeling scenarios for evaluating the numerical integration schemes.
Figure 3(a) shows a scenario adopted from [1, figure 6] using and . The input data (right panel) comprise 31 labels encoded as vertices (unit vectors) of a corresponding simplex. This results in uniform distances (2.17) and enables to assess in an unbiased way the effect of regularization by geometric diffusion in terms of the similarity map (2.19).
Figure 3(b) shows a color image together with four color vectors used as labels, as illustrated by the panel on the right. In contrast to the data of figure 3(a) with a high level of noise and a uniform data term (as motivated and explained above), the input data shown on the left of figure 3(b) are not noisy but comprise spatial structures at quite different scales (fine texture, large homogeneous regions), causing a nonuniform data term and a more complex assignment flow.
Both scenarios together provide a testbed for checking and comparing schemes for the numerical integration of the assignment flow.
6.2. Nonlinear flow: embedded RKMK-schemes
Figures 4 and 5 show the results of the two embedded RKMK schemes of section 5.2 used to integrate the full nonlinear assignment flow (2.20), for the data shown by left panels of figures 3(a) and (b).
The two embedded RKMK schemes combine RKMK schemes of different approximation order , 1/2 and 3/2, respectively, which reuse vector field evaluations (3.17) in order to produce sequences of tangent vectors that make it possible to estimate the local approximation error by monitoring the distances . As specified by algorithm 1, the step sizes hk adaptively increase provided that a prescribed error tolerance is not violated.
The parameter values (tolerance) and (tolerance factor), used to produce the results shown by figures 4 and 5, suffice to accurately integrate the full nonlinear assignment flow by the respective explicit schemes, as the comparison with the ground truth labeling generated by the implicit geometric Euler scheme shows. Since RKMK-3/2 has a higher order q than RKMK-1/2, larger step sizes can be tolerated. On the other hand, each iteration of RKMK-3/2 is about twice as expensive as one of RKMK-1/2.
Both plots of the step size sequences (hk) reveal that the initial step size h0 = 0.01 was much too small (conservative), and that a fixed value of hk is adequate for most of the iterations. This value was larger for the experiment corresponding to figure 4, due to the uniform data term (by construction, as explained in section 6.1.3) and the more uniform scale of spatial structures. By contrast, the presence of spatial structures at quite different scales in the data corresponding to figure 5 causes a more involved assignment flow to be integrated, and hence leads to a smaller step size after adaptation. Comparing the two rightmost panels of figure 5 shows that the strength of regularization (neighborhood size ) had little influence on the sequences of step sizes.
We never observed decreasing step sizes in these supervised scenarios, that is, the corresponding step of algorithm 1 was never active. This may change in more involved scenarios, however (see remark 3.2).
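The accept/grow and reject/shrink logic can be sketched as follows; the growth and shrink factors and the step size cap are illustrative assumptions, not the constants of algorithm 1.

```python
def adapt_step(h, err, tol, grow=2.0, shrink=0.5, h_max=1.0):
    """Schematic step size rule in the spirit of algorithm 1: accept the
    step and enlarge h while the local error estimate err stays below
    the tolerance, otherwise reject and shrink h. The constants grow,
    shrink and h_max are illustrative, not taken from the paper."""
    if err <= tol:
        return min(grow * h, h_max), True    # accept, increase step size
    return shrink * h, False                 # reject, decrease step size

h = 0.01                                     # conservative initial value h0
h, accepted = adapt_step(h, err=1e-6, tol=1e-4)
assert accepted and abs(h - 0.02) < 1e-15
h, accepted = adapt_step(h, err=1e-3, tol=1e-4)
assert not accepted and abs(h - 0.01) < 1e-15
```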
Overall, a few dozen explicit iterations suffice for accurate geometric numerical integration of the assignment flow, with well below 1% wrongly labeled pixels (figure 6). Each iteration may be implemented in a fine-grained parallel way and has computational costs roughly equivalent to those of a convolution, besides the mapping to the tangent space and back to the assignment manifold at each iteration.
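For illustration, the mapping back to the manifold can be realized per pixel by the softmax-like lifting map exp_p(v) = (p e^v)/⟨p, e^v⟩ of the assignment flow framework [1]; the sketch below assumes this form (up to possible notational differences), with a max-shift added only for numerical stability.

```python
import numpy as np

def lift(p, v):
    """Lifting map exp_p(v) = (p * e^v) / <p, e^v> from the tangent
    space back to the relative interior of the probability simplex,
    as used for the assignment flow [1]; the shift by v.max() only
    stabilizes the exponentials numerically."""
    q = p * np.exp(v - v.max())
    return q / q.sum()

p = np.full(3, 1.0 / 3.0)                 # barycenter
w = lift(p, np.array([4.0, 0.0, 0.0]))
assert abs(w.sum() - 1.0) < 1e-12 and w.argmax() == 0
```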
6.3. Linear assignment flow
The approach of section 4 involves two different approximations:
- (i)
- (ii)
Due to the remarkable approximation properties of the linear assignment flow when only a single linearization at the barycenter is used (section 6.3.1), we focused entirely on this flow when evaluating the numerical schemes (i) and (ii) in sections 6.3.2 and 6.3.3.
6.3.1. Approximation property.
We report a series of experiments for the 1D signal depicted by figure 7 using both the full and the linear assignment flow in order to check how closely the latter approximates the former. Then we discuss the linear assignment flow for the two 2D scenarios shown by figure 3.
The parameter value used for scaling the data in (2.18) was the same for all 1D experiments discussed below. This gave a larger weight to the 'data term', so that—in view of the noisy data (figure 7)—the regularization property of the assignment flow (2.20), in terms of the similarity vectors Si(W) interacting through (2.19), was essential for labeling.
We first explain how the linearizations of the assignment flow were controlled. According to proposition 4.2, using the parametrization (4.3a), the linear ODE (4.3b) is equivalent to the linear assignment flow (4.2). Using again the parametrization (4.3a) and repeating the proof of proposition 4.2 shows that the full assignment flow (2.20) is locally governed by the nonlinear ODE
Taking into account (4.6) and subtracting the right-hand side of the approximation (4.3b) from the above right-hand side gives
which shows that this approximation deteriorates with increasingly large tangent vectors . As a consequence, we first solved the linear flow (4.2) using (4.3) without updating the point of linearization, and fixed, after termination at (the number of required outer iterations), the constant
Then we solved the linear assignment flow again and updated the linearization point W0 in view of (6.4) whenever
using the parameter c to control the number of linearizations: a single linearization and no linearization update if c = 1, and an increasing number of updates for larger values of c. We updated components of the linearization point W0,i by only when , in order to keep the linearization points inside the simplex, in view of the entries (4.7d) of being normalized by components of W0,k.
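Schematically, the relinearization rule can be sketched as follows; the softmax-like lifting map, the reference norm `v_ref` fixed in the first run, and the reset of the tangent vector at the new base point are simplifying assumptions, and the paper's componentwise safeguard is omitted.

```python
import numpy as np

def lift(p, v):
    """Softmax-like lifting map onto the open simplex (illustrative)."""
    q = p * np.exp(v - v.max())
    return q / q.sum()

def maybe_relinearize(W0, V, c, v_ref):
    """Re-linearize once the tangent vector norm exceeds c times the
    reference norm v_ref fixed in a first run; returns the (possibly
    updated) base point, the (possibly reset) tangent vector, and a
    flag. The componentwise safeguard keeping the base point strictly
    inside the simplex is omitted in this sketch."""
    if np.linalg.norm(V) > c * v_ref:
        return lift(W0, V), np.zeros_like(V), True
    return W0, V, False

W0 = np.full(3, 1.0 / 3.0)
W0_new, V_new, updated = maybe_relinearize(W0, np.array([3.0, 0.0, 0.0]),
                                           c=1.0, v_ref=1.0)
assert updated and np.allclose(V_new, 0.0) and abs(W0_new.sum() - 1.0) < 1e-12
```

With c = 1 and a loose reference norm the condition never fires, reproducing the single-linearization regime discussed above.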
After termination, the induced labelings were compared to those of the full assignment flow, and the number of wrongly assigned labels was taken as a quantitative measure of the approximation property of the linear assignment flow: except for the minimal neighborhood size , a single linearization almost suffices to obtain a correct labeling. Overall, the maximal number of labeling errors, 3 (out of 192), is very small, and these errors merely correspond to shifts of the signal transition by a single pixel position in the case (see figure 8). We conclude that for supervised labeling, the linear assignment flow (4.2) (which is nonlinear(!)—see remark 4.3) indeed captures a major part of the nonlinearity of the full assignment flow (2.20). Figure 9 illustrates the similarity of the two flows (and the dissimilarity in the case ) in terms of all sequences , plotted as piecewise linear trajectories.
We cannot assure, however, that this approximation property persists in more general cases (see remark 3.2), whose study is beyond the scope of the present paper.
We now turn to the scenarios shown by figure 3. Figure 10 shows the results obtained using the implicit Euler scheme with the same parameter settings that were used to integrate the nonlinear flow, in order to obtain the ground truth flows and results depicted by the leftmost panels of figures 4 and 5. Comparing the labelings returned by the linear and the nonlinear assignment flow, respectively, confirms the discussion of the 1D experiments detailed above: the results agree except for a very small subset of pixels close to signal transitions, which are immaterial for subsequent image interpretation.
The results shown by figure 10 served as ground truth for studying the explicit numerical schemes of sections 6.3.2 and 6.3.3 for integrating the linear assignment flow.
6.3.2. Adaptive RK schemes.
We evaluated the adaptive RK schemes (FE) of order q = 1 and (RK4) of order q = 4, due to (5.1), for integrating the linear ODE (4.3b), after rearranging the polynomials of (5.1) in Horner form.
Figures 11 and 12 show the results for the linear assignment flow based on a single linearization at the barycenter, using the results shown by figure 10 as ground truth. The step sizes hk were computed at each iteration k using the local error estimate (5.7) such that , that is, on average for all pixels . The spectral norm was computed beforehand using the basic power iteration.
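The basic power iteration mentioned above can be sketched as follows; applying it to A^T A yields the largest singular value, i.e. the spectral norm (the iteration count and test matrix are illustrative).

```python
import numpy as np

def spectral_norm(A, iters=200, seed=0):
    """Largest singular value of A via power iteration on A^T A:
    repeatedly apply A^T A to a random vector and normalize; the
    iterate aligns with the dominant right singular vector."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])
    for _ in range(iters):
        x = A.T @ (A @ x)
        x = x / np.linalg.norm(x)
    return np.linalg.norm(A @ x)

# usage: diagonal test matrix with known spectral norm 3
A = np.array([[3.0, 0.0], [0.0, 1.0]])
assert abs(spectral_norm(A) - 3.0) < 1e-10
```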
As explained above when the criterion (5.18) was introduced, step sizes must decrease, due to the increasing norms , in order to keep the local integration error bounded. Furthermore, in agreement with figure 2, raising the order q of the integration scheme leads to significantly larger step sizes and hence to smaller total numbers of iterations, at the cost of more expensive iterative steps. Yet roughly taking these additional costs into account, by multiplying the total iteration numbers for q = 4 by 4, indicates that raising the order q reduces the overall costs. In this respect, our findings for the linear assignment flow differ from the corresponding findings for the nonlinear assignment flow and the embedded RKMK schemes, discussed in section 6.2.
Table 1. Approximation property of the linear assignment flow. Each entry 'x, y' specifies the number x of linearizations and the number y of wrongly assigned labels (out of 192 assigned labels), depending on the neighborhood size (strength of regularization, rows) and the parameter c specifying the tangent space threshold (6.6) (columns).

| | c = 1 | c = 2 | c = 3 | c = 4 | c = 5 |
|---|---|---|---|---|---|
| 3 | 1, 3 | 4, 3 | 7, 3 | 10, 1 | 13, 0 |
| 5 | 1, 0 | | | | |
| 9 | 1, 2 | 5, 0 | | | |
6.3.3. Exponential integrator.
For integrating the linear assignment flow with exponential integrators, we consider equation (4.19) and the Krylov space approximation (4.23)
As the evaluation of is explained in section 4.2, we only discuss here the choice of the parameters m and T.
The dimension m of the Krylov subspace controls the quality of the approximation, where larger values theoretically lead to a better approximation. In our experiments, rather small numbers, like m = 5, turned out to suffice for producing labelings very close to the ground truth labelings, which were generated by the implicit Euler method—see figures 13 and 14. As the runtime of the algorithm increases with growing m, this parameter should not be chosen too large.
Scaling A and a in (4.18) affects the vector field defining the linear ODE (4.3b). Hence, fixing any time point T depends on this scaling factor, too. As a consequence, since A and a depend on the problem data (4.18), the choice of T is problem dependent. On the other hand, the discussion following the proof of theorem 5.2 showed that increases with t, and T merely has to be chosen large enough such that defined by (4.3a) satisfies the termination criterion (6.2)—see (5.20c) for a rough estimate. Choosing T overly large will cause numerical underflow and overflow issues, however.
Almost all runtime is consumed by the Arnoldi iteration producing the subspace basis . Due to the small dimension m, the total runtime is very short, and the time required for the subsequent evaluation of the right-hand side of (6.7) is negligible.
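A self-contained sketch of this Krylov subspace technique might look as follows: the Arnoldi iteration builds an orthonormal basis V_m and a small Hessenberg matrix H_m, and the action of the large matrix exponential is replaced by an m-dimensional one. The dense exponential here uses simple scaling-and-squaring, and the random test matrix is purely illustrative; this is a sketch of the technique, not the paper's implementation.

```python
import numpy as np

def small_expm(M, s=20, terms=14):
    """Dense matrix exponential via scaling and squaring with a
    truncated Taylor series; adequate for the small matrices here."""
    M = M / (2.0 ** s)
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def krylov_expm(A, b, T, m=5):
    """Approximate exp(T*A) @ b in an m-dimensional Krylov subspace:
    Arnoldi yields A @ V_m ~ V_m @ H_m, so exp(T*A) @ b is replaced
    by ||b|| * V_m @ exp(T*H_m) @ e_1."""
    n = b.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(m):                       # Arnoldi iteration
        w = A @ V[:, j]
        for i in range(j + 1):               # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:              # happy breakdown
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m)
    e1[0] = 1.0
    return beta * V[:, :m] @ (small_expm(T * H[:m, :m]) @ e1)

# usage: compare against the dense exponential on a small random problem
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30)) / 6.0
b = rng.standard_normal(30)
exact = small_expm(2.0 * A) @ b
err = np.linalg.norm(krylov_expm(A, b, 2.0, m=20) - exact)
assert err < 1e-5
```

As in the experiments above, accuracy improves rapidly with m once it exceeds the effective norm of T·A, which is why small subspace dimensions already suffice.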
7. Conclusion
We investigated numerical methods for image labeling by integrating the large system of nonlinear ODEs defining the assignment flow (2.20), which evolves on the assignment manifold. All methods exactly respect the underlying geometry. Specifically, we adapted RKMK methods and showed that embedded RKMK methods work very well for automatically adjusting the step size, at negligible additional cost. Raising the order leads to larger step sizes, which is compensated by the higher computational costs per iteration, however. In either case, each iteration only involves convolution-like operations over local neighborhoods, together with pixelwise nonlinear function evaluations.
We derived and introduced the linear assignment flow, a nonlinear approximation of the (full) assignment flow that is governed by a large linear system of ODEs on the tangent space. Experiments showed that the approximation is remarkably close, as measured by the number of different label assignments.
We investigated two further families of numerical schemes for integrating the linear assignment flow: established RK schemes with adaptive step size selection based on a local integration error estimate, and exponential integrators for approximately evaluating Duhamel's integral using a Krylov subspace. In the former case, higher-order schemes really pay off, unlike for the RKMK schemes and the full assignment flow, as mentioned above. Choosing the classical RK scheme with q = 4, for example, a few dozen iterations suffice to reach the termination criterion, with high potential for parallel implementation. The exponential integrators, on the other hand, directly approximate Duhamel's integral and in this sense are non-iterative. Here, merely a Krylov subspace basis of low dimension has to be computed, using a standard iterative method. Even though this method differs mathematically from the RK schemes, it has potential for real-time implementation as well.
All methods provide a sound basis for more advanced image analysis tasks that involve labeling by evaluating the assignment flow as a subroutine. Accordingly, our future work concerns an extension of the unsupervised labeling approach [9], where label dictionaries are directly learned from data through label assignment. Furthermore, methods under investigation for learning to adapt regularization parameters of the assignment flow to specific image classes require advanced discretization and numerical methods based on the results reported in the present paper.
Acknowledgments
This work was supported by the German Research Foundation (DFG), Grant No. GRK 1653.