Non-uniform Kozlov–Treschev averagings in the ergodic theorem

V. I. Bogachev

doi:10.1070/RM9940

This research was supported by the Russian Foundation for Basic Research (grants nos. 18-31-20008 and 20-01-00432), Moscow Center of Fundamental and Applied Mathematics, and the Foundation for the Advancement of Theoretical Physics and Mathematics BASIS (grant no. 18-1-6-83-1).

§ 1. Introduction

This paper is devoted to generalizations and refinements of several results of Kozlov and Treschev in [58]–[60], [56] connected with the study of convergence of non-uniform averagings in the situation of the individual ergodic theorem and its extensions. In its original setting their problem was concerned with classical dynamical systems on manifolds (see [55], [61], and [57]), but both the formulations of the results and the methods of proofs could be carried over directly to a considerably more general context of measure spaces, measurable transformations of them, and functions on them. Moreover, in this more general situation the formulations and proofs turned out to be even slightly simpler. Therefore, our discussion below follows this generality. The cited papers have attracted the attention of other researchers and have been further developed (see [15], [20], [21], [53], [54], and also the recent paper [67] in physics). Below we give a survey of this research along with some new results.

The aforementioned non-uniform averaging is defined by the formula

$\begin{equation} K_t^\nu f(x)=\int_0^{+\infty}f(g_{ts}(x))\,\nu(ds) \end{equation} \tag{ 1.1 }$

for integrable functions $f$ on a probability space $(X,\mathscr{B},\mu)$ on which a semigroup of measurable transformations $g_t$ , $t\geqslant 0$ , acts with preservation of the measure $\mu$ , that is, $g_0(x)=x$ , $g_{t+s}(x)=g_t (g_s(x))$ , and the map $(t,x)\mapsto g_t(x)$ is measurable, where $\nu$ is a Borel probability measure on $[0,+\infty)$ . A more general problem deals with the integrals

$\begin{equation} K_t^\nu f(x)=\int_0^{+\infty} T_{ts}f(x)\,\nu(ds), \end{equation} \tag{ 1.2 }$

where $\{T_t\}$ is a semigroup of bounded linear operators on the space $L^1(\mu)$ . The question is whether the limit of the function $K_t^\nu f$ exists almost everywhere or in the mean as $t\to+\infty$ . In addition, an interesting dynamics of measures on the space $X$ arises in this situation (see §5).

If $\nu$ is Lebesgue measure on the interval $[0,1]$ , then after the change of variable $r=ts$ we arrive at the classical averages

$\begin{equation*} A_t f(x)=\frac{1}{t}\int_0^{t} f(g_{r}(x))\,dr \end{equation*}$

or, in the operator case,

$\begin{equation*} A_t f(x)=\frac{1}{t}\int_0^{t} T_r f(x)\,dr. \end{equation*}$

For the former integral the Birkhoff–Khinchin theorem (see [15], [64], [75], [31], and [34]) gives convergence almost everywhere for any integrable function $f$ , to the function ${\mathsf{E}(f\mid \mathscr{I})}$ equal to the conditional expectation of $f$ with respect to the $\sigma$ -algebra $\mathscr{I}$ generated by all measurable functions invariant with respect to all the transformations $g_t$ . This conditional expectation is also the projection $\mathscr{P}f$ of $f$ on the subspace of functions invariant with respect to all the transformations $g_t$ . For the latter integral a number of convergence results are also known, which we present in §3. The Kozlov–Treschev theorem gives convergence of $K_t^\nu f(x)$ to the same limit for bounded measurable functions $f$ and absolutely continuous measures $\nu$ .
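As a hedged illustration (the specific system is our choice and is not taken from the cited papers), both kinds of averages can be computed in closed form for the rotation flow on the circle. Assume $X=[0,1)$ with Lebesgue measure $\mu$, $g_t(x)=x+t \pmod 1$, $f(x)=e^{2\pi i k x}$ with an integer $k\ne 0$ (complex-valued for brevity; the real and imaginary parts are treated alike), and $\nu(ds)=e^{-s}\,ds$.

```latex
\begin{align*}
K_t^\nu f(x) &= \int_0^{+\infty} e^{2\pi i k(x+ts)}\,e^{-s}\,ds
  = \frac{e^{2\pi i k x}}{1-2\pi i k t}, &
|K_t^\nu f(x)| &= \frac{1}{\sqrt{1+4\pi^2k^2t^2}}, \\
A_t f(x) &= \frac{1}{t}\int_0^{t} e^{2\pi i k(x+r)}\,dr
  = e^{2\pi i k x}\,\frac{e^{2\pi i k t}-1}{2\pi i k t}, &
|A_t f(x)| &\leqslant \frac{1}{\pi |k|\, t}.
\end{align*}
```

Both averages tend to $0$, which is the integral of $f$ with respect to $\mu$, as $t\to+\infty$. This agrees with the fact that only constants are invariant with respect to all rotations, so the conditional expectation with respect to $\mathscr{I}$ reduces to the integral here.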

There is a vast literature on individual ergodic theorems for operator semigroups (not necessarily generated by transformations of measure spaces, as in the first ergodic theorems) on spaces of integrable functions, beginning with the classical works [33] and [34] by Dunford and Schwartz, the main results in which will be recalled below. We note the survey [3], the papers [12], [5], [40], [90], [91], [51], [65], [44], [94], [95] investigating convergence of the ratios $A_tf(x)/A_tg(x)$ , which reduce to $A_tf(x)$ for $T_t1=1$ and $g=1$ but are more natural in other situations, the papers [35], [77], [80], [38], [39], [36], and [37] on convergence of resolvent averages of the form

$\begin{equation*} \lambda R_\lambda f(x)=\lambda\int_0^{+\infty}e^{-\lambda s}T_sf(x)\,ds, \end{equation*}$

which corresponds to the weight $e^{-s}$ and $\lambda\to 0$ (such averages were considered as early as [33] and [34]), the paper [1] on averaging with monotone weights, and also the papers [78], [79], [81], [82], [71], [13], and [96] on classical averaging or averaging with other special weights. However, it should be noted that the author of [12] already considered non-uniform averages of the form

$\begin{equation*} Z_tf(x)=\int_0^t T_sf(x)\,dv(s), \end{equation*}$

where $v$ is an increasing function of bounded variation satisfying the equation $v=1+v*w$ with some probability measure $w$ . In this setting the existence of a limit was proved for the ratio $Z_tf(x)/Z_tp(x)$ , where $f\in L^1(\mu)$ and $p$ is a probability density with respect to $\mu$ . If $T_t 1=1$ , then this case is similar to the classical situation with division of an integral by $t$ ; the only difference is that the integral over $[0,t]$ is taken not with respect to Lebesgue measure, but with respect to $dv$ . In addition, [28] and [29] dealt with non-uniform averages exactly of the form (1.1) with densities $\varrho$ having bounded supports, which has some specific features (results in these papers are mentioned below). In [6], sequences of non-uniform averages defined by sequences of different weights were studied. Many works have been devoted to non-uniform discrete averages for single transformations or operators (see [64], [11], [69], and the literature therein). Certain non-uniform averages of a different kind connected with ergodic group actions were considered in [87]–[89], [7], [70], and [42], where there are also additional references. There is an even more extensive literature on convergence of averages in the mean, but here we hardly touch upon this question.
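The correspondence between resolvent averages and the weight $e^{-s}$ mentioned above can be checked by a direct substitution. The following is a sketch under the assumption that $\nu(ds)=e^{-s}\,ds$ in (1.2) and that the integrals are understood for suitable versions of $T_sf(x)$.

```latex
\begin{equation*}
K_t^\nu f(x)=\int_0^{+\infty} T_{ts}f(x)\,e^{-s}\,ds
 =\frac{1}{t}\int_0^{+\infty} e^{-r/t}\,T_r f(x)\,dr
 =\lambda R_\lambda f(x), \qquad r=ts,\quad \lambda=\frac{1}{t}.
\end{equation*}
```

Thus the limit $t\to+\infty$ for $K_t^\nu f$ is the same as the limit $\lambda\to 0+$ for $\lambda R_\lambda f$.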

In §3 we discuss convergence of non-uniform averages for unbounded functions $f$ mostly in the second, more general, case of semigroups. In §4, non-uniform averaging of stochastic systems is considered. In §5, weak convergence of measures generated by the averaging measure is discussed.

§ 2. Notation, terminology and auxiliary results

Throughout, we use the basic standard notions and facts connected with the Lebesgue integral, the spaces $L^p$ with their classical norms $\|\,\cdot\,\|_p=\|\,\cdot\,\|_{L^p}$ , and linear operators. In some constructions and results we use certain more special concepts introduced below.

In particular, we shall use the Orlicz classes of integrable functions generated by convex functions (see [62] and [76]). An increasing convex function $V$ on $[0,+\infty)$ is called an $N$ -function if

$\begin{equation*} V(0)=0, \quad V(s)>0 \quad \text{whenever } s>0,\quad \lim_{s\to 0} \frac{V(s)}{s}=0,\quad \lim_{s\to+\infty} \frac{V(s)}{s}=+\infty. \end{equation*}$

In other words,

$\begin{equation*} V(s)=\int_0^s p(t)\,dt, \end{equation*}$

where $p\geqslant 0$ is a non-decreasing function on $[0,+\infty)$ such that

$\begin{equation*} \lim_{t\to+\infty} p(t)=+\infty,\quad \lim_{t\to 0+} p(t)=0. \end{equation*}$

An $N$ -function $V$ is said to satisfy the $\Delta_2$ -condition for large values if, for some $C>0$ and $s_0>0$ , we have

$\begin{equation*} V(2s)\leqslant CV(s) \quad \forall\,s\geqslant s_0. \end{equation*}$

For every $\lambda>1$ this implies the estimate

$\begin{equation*} V(\lambda s)\leqslant C(\lambda) V(s) \quad \forall\,s\geqslant s_0 \end{equation*}$

with some $C(\lambda)>0$ . If this condition holds for $s_0=0$ , then $V$ is said to satisfy the global $\Delta_2$ -condition.

For such a function we obtain

$\begin{equation*} V(t+s)\leqslant C V(s)+CV(t). \end{equation*}$

To satisfy the $\Delta_2$ -condition for large values it suffices to have the estimate

$\begin{equation*} sV_r'(s)\leqslant C_1 V(s) \end{equation*}$

for the right-hand derivative $V_r'$ of the function $V$ outside some interval (see [62], Chap. 1, §4).

Typical examples of $N$ -functions $V$ satisfying the global $\Delta_2$ -condition are $V(s)= s^p$ with $p>1$ (but not with $p=1$ ) and also $V(s)=s\log (s+1)$ .
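These examples can be verified directly; the short computation below is included only as a sanity check, and the last line shows why the case $p=1$ is excluded (the function $V(s)=s$ is not an $N$-function).

```latex
\begin{align*}
V(s)=s^p:&\quad V(2s)=2^p\,V(s), \\
V(s)=s\log(s+1):&\quad V(2s)=2s\log(2s+1)\leqslant 2s\log\bigl((s+1)^2\bigr)=4V(s), \\
V(s)=s:&\quad \lim_{s\to 0}\frac{V(s)}{s}=1\ne 0 .
\end{align*}
```

The second line uses the elementary inequality $2s+1\leqslant (s+1)^2$ for $s\geqslant 0$, so one can take $C=4$ and $s_0=0$.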

Below we use the known fact (see [62], §8 and Lemma 5.1 in §5) that, for every integrable function $f$ , there exists an $N$ -function $V$ satisfying the global $\Delta_2$ -condition for which the function $V\circ f$ is integrable.

For an $N$ -function $V$ on $[0,+\infty)$ the complementary (or convex conjugate) function $W$ is defined by

$\begin{equation*} W(u)=\max_{v\geqslant 0} [uv-V(v)]. \end{equation*}$

Hence Young's inequality $uv\leqslant V(v)+W(u)$ holds. The complementary function is also convex and increasing (and is an $N$ -function). For example, if $V(v)=v^p/p$ , where $p>1$ , then $W(u)=u^{q}/q$ , where $p^{-1}+q^{-1}=1$ .
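For the reader's convenience, here is the routine computation behind the last example. For fixed $u>0$ the function $v\mapsto uv-v^p/p$ is concave on $[0,+\infty)$, and its derivative $u-v^{p-1}$ vanishes at the point $v_0=u^{1/(p-1)}$, so the maximum is attained there.

```latex
\begin{equation*}
W(u)=uv_0-\frac{v_0^{\,p}}{p}
 =u^{p/(p-1)}-\frac{1}{p}\,u^{p/(p-1)}
 =\Bigl(1-\frac{1}{p}\Bigr)u^{q}=\frac{u^{q}}{q},
 \qquad q=\frac{p}{p-1}.
\end{equation*}
```

In this case Young's inequality takes the familiar form $uv\leqslant v^p/p+u^q/q$.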

The Orlicz class $L_V(\mu)$ generated by a given $N$ -function $V$ consists of the equivalence classes of $\mu$ -measurable functions $f$ for which $V\circ |f|\in L^1(\mu)$ . One has $L_V(\mu)\subset L^1(\mu)$ . It is known that the Orlicz class $L_V(\mu)$ is a convex set, but for an atomless measure $\mu$ it becomes a linear space precisely when the function $V$ satisfies the $\Delta_2$ -condition for large values. Even if $V$ does not satisfy this condition, one can introduce the Banach space $\overline{L}_V(\mu)$ of equivalence classes of measurable functions with finite Orlicz norm

$\begin{equation*} \|f\|_V=\sup\biggl\{\biggl|\int fg\,d\mu\biggr| \colon \int W(|g|)\,d\mu \leqslant 1\biggr\}, \end{equation*}$

where $W$ is the complementary function for $V$ . Then $L_V(\mu)\subset \overline{L}_V(\mu)$ , and if the $\Delta_2$ -condition holds for large values, then the two sets are equal (in the general case $\overline{L}_V(\mu)$ coincides with the linear span of the class $L_V(\mu)$ ). In addition, in the general case

$\begin{equation*} \int_X fg\,d\mu\leqslant \|f\|_V\|g\|_W \end{equation*}$

for $f\in \overline{L}_V(\mu)$ and $g\in \overline{L}_W(\mu)$ . The Orlicz norm is equivalent to the Luxemburg norm defined without the complementary function by the equality

$\begin{equation*} p_V(f)=\inf\biggl\{s>0\colon \int_X V\biggl(\frac{|f|}{s}\biggr)\,d\mu\leqslant 1\biggr\}. \end{equation*}$

One has $p_V(f)\leqslant \|f\|_V\leqslant 2p_V(f)$ . If the $\Delta_2$ -condition holds for large values, then convergence of a sequence $\{f_n\}$ to zero in one of these norms is equivalent to convergence to zero of the integrals of $V(|f_n|)$ (see [62], Theorem 9.4).
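A simple worked case may help to see how the Luxemburg norm is computed. Assume (for illustration only) that $f=I_A$ is the indicator of a measurable set $A$ with $\mu(A)=a\in(0,1]$. Since an $N$-function $V$ is continuous, strictly increasing, and maps $[0,+\infty)$ onto itself, the inverse function $V^{-1}$ exists.

```latex
\begin{equation*}
\int_X V\Bigl(\frac{I_A}{s}\Bigr)\,d\mu = a\,V\Bigl(\frac{1}{s}\Bigr)\leqslant 1
 \iff \frac{1}{s}\leqslant V^{-1}\Bigl(\frac{1}{a}\Bigr),
 \qquad\text{hence}\qquad
 p_V(I_A)=\frac{1}{V^{-1}(1/a)} .
\end{equation*}
```

For $V(s)=s^p$ this gives $p_V(I_A)=a^{1/p}=\|I_A\|_p$, as one would expect.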

A linear operator $T$ on the space $L^p(\mu)$ is said to be non-negative if $Tf\geqslant 0$ for all $f\geqslant 0$ .

The norm of an operator $T$ on $L^p(\mu)$ will be denoted by $\|T\|_{\mathscr{L}(L^p)}$ , and the norm of an operator $T$ between Banach spaces $E$ and $F$ is denoted by $\|T\|_{\mathscr{L}(E,F)}$ .

If $(X,\mathscr{B},\mu)$ is a probability space and $\mathscr{A}$ is a sub- $\sigma$ -algebra of $\mathscr{B}$ , then the conditional expectation of a $\mu$ -integrable function $f$ is the $\mu$ -integrable function $\mathsf{E}(f\mid\mathscr{A})$ which is measurable with respect to $\mathscr{A}$ and such that

$\begin{equation*} \int_X f g\,d\mu=\int_X\mathsf{E}(f\mid\mathscr{A})g\,d\mu \end{equation*}$

for all bounded $\mathscr{A}$ -measurable functions $g$ . One can also say that this is the projection of $f$ with some special properties on the subspace of $\mathscr{A}$ -measurable functions (for functions in $L^2(\mu)$ this is the usual orthogonal projection).
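The simplest instance, given here only as an illustration, is a $\sigma$-algebra $\mathscr{A}$ generated by a finite partition of $X$ into sets $A_1,\dots,A_m\in\mathscr{B}$ of positive measure (the partition is our assumption). Then the conditional expectation is the piecewise constant function of averages, and the defining identity is checked on the indicators $g=I_{A_k}$, whose linear combinations exhaust the bounded $\mathscr{A}$-measurable functions.

```latex
\begin{align*}
\mathsf{E}(f\mid\mathscr{A}) &= \sum_{j=1}^{m}
 \biggl(\frac{1}{\mu(A_j)}\int_{A_j} f\,d\mu\biggr) I_{A_j}, \\
\int_X \mathsf{E}(f\mid\mathscr{A})\,I_{A_k}\,d\mu
 &= \biggl(\frac{1}{\mu(A_k)}\int_{A_k} f\,d\mu\biggr)\mu(A_k)
 = \int_X f\,I_{A_k}\,d\mu .
\end{align*}
```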

Below we need the following Jensen inequality for operators, which can be extracted from the results in [43] but which for the reader's convenience we derive here from the usual Jensen inequality (we even give two derivations).

Lemma 2.1. Let $(X,\mathscr{B},\mu)$ be a probability space and let $T\colon L^\infty(\mu)\to L^\infty(\mu)$ be a non-negative operator with $\|T\|_{\mathscr{L}(L^\infty)}\leqslant 1$ . Suppose that $V$ is an increasing convex function on $[0,+\infty)$ with $V(0)=0$ . Then

$\begin{equation} V(|Tf|)\leqslant T(V\circ |f|)\quad \textit{almost everywhere}. \end{equation} \tag{ 2.1 }$

If $T$ extends to a continuous operator from $L^1(\mu)$ to $L^1(\mu)$ and $f\in L^1(\mu)$ is such that $V(|f|)\in L^1(\mu)$ , then this estimate remains valid.

Proof. Since $|Tf|\leqslant T|f|$ by the non-negativity of $T$ , we can assume that $f\geqslant 0$ . Moreover, it suffices to prove the assertion for bounded $f$ . Fix some $\mathscr{B}$ -measurable bounded versions of the functions $f_1=Tf$ and $f_2=T(V\circ f)$ .

For every $x$ , we can find measurable sets $B_n(x)$ such that for $\mu$ -almost all $x$ we get that $\mu(B_n(x))>0$ for all $n$ and

$\begin{equation*} f_1(x)=\lim_{n\to\infty}\frac{1}{\mu(B_n(x))}\int_{B_n(x)}f_1\,d\mu,\quad f_2(x)=\lim_{n\to\infty}\frac{1}{\mu(B_n(x))}\int_{B_n(x)}f_2\,d\mu. \end{equation*}$

For example, one can take the sets

$\begin{equation*} B_n(x)=\{y\colon |f_1(y)-f_1(x)|+|f_2(y)-f_2(x)|\leqslant n^{-1}\}. \end{equation*}$

Then for a fixed $n$ we cover $X$ by the measurable disjoint sets

$\begin{equation*} X_j=\{x\colon (f_1(x),f_2(x))\in K_j\}, \end{equation*}$

where a finite collection of disjoint Borel sets $K_j$ of diameter less than $1/n$ covers a square containing the values of the map $(f_1,f_2)$ , and we take only those sets $X_j$ which have positive $\mu$ -measure. For every point $x$ in such a set $X_j$ we have $X_j\subset B_n(x)$ , hence $\mu(B_n(x))>0$ . Moreover,

$\begin{equation*} \biggl|\int_{B_n(x)} f_i(y)\,\mu(dy)-f_i(x)\mu(B_n(x))\biggr|\leqslant n^{-1} \mu(B_n(x)), \qquad i=1,2, \end{equation*}$

which gives the indicated relations.

Now fix a point $x$ such that $\mu(B_n(x))>0$ for all $n$ and let $f_1(x)$ and $f_2(x)$ be the indicated limits. Consider the non-negative measure

$\begin{equation*} \sigma_{n,x}(B)=\frac{1}{\mu(B_n(x))}\int_{B_n(x)}T I_B\,d\mu= \frac{1}{\mu(B_n(x))}\int_B T^{*}I_{B_n(x)}\,d\mu, \end{equation*}$

where $T^*$ is the bounded operator on $L^\infty(\mu)$ adjoint to the operator $T$ on $L^1(\mu)$ . Thus, the measure $\sigma_{n,x}$ is given by the density $\mu(B_n(x))^{-1}T^{*}I_{B_n(x)}$ with respect to the measure $\mu$ . The integral of a bounded measurable function $\varphi$ with respect to the measure $\sigma_{n,x}$ is given by

$\begin{equation*} \int_X \varphi(y)\,\sigma_{n,x}(dy)= \frac{1}{\mu(B_n(x))}\int_{B_n(x)}T\varphi\,d\mu. \end{equation*}$

It is clear that $0\leqslant \sigma_{n,x}(X)\leqslant 1$ , since $T1\leqslant 1$ .

By the condition $V(0)=0$ and the convexity of $V$ , for the subprobability measure $\sigma_{n,x}$ we have Jensen's inequality

$\begin{equation*} V\biggl(\int_X f\,d\sigma_{n,x}\biggr)\leqslant \int_X V\circ f\,d\sigma_{n,x}. \end{equation*}$

By our choice of $x$ and the continuity of $V$ we obtain $V(Tf(x))$ on the left-hand side and $T(V\circ f)(x)$ on the right-hand side as $n\to\infty$ .

Let us consider another justification for the case of $L^\infty(\mu)$ . We recall (for example, see [27], §11.6) that the Gelfand transform $G$ defines a linear isometry between the complex Banach algebra $L^\infty(\mu)$ and the space $C(\Delta)$ of continuous complex functions on the compact space $\Delta$ of maximal ideals of this Banach algebra; moreover, $G$ is a homomorphism of algebras and preserves the non-negativity of elements. Hence, the non-negative operator $T$ on $L^\infty(\mu)$ generates a non-negative operator $S$ on $C(\Delta)$ by the formula $S\varphi= GTG^{-1}\varphi$ . By the Riesz theorem, for every $x\in \Delta$ there exists a Radon measure $m_x$ on $\Delta$ for which

$\begin{equation*} S\varphi(x)=\int_\Delta \varphi(y)\,m_x(dy), \qquad \varphi\in C(\Delta). \end{equation*}$

This measure is non-negative because $S\varphi(x)\geqslant 0$ for $\varphi\geqslant 0$ . In addition, $m_x(\Delta)\leqslant 1$ since $S1\leqslant 1$ , as follows from the estimate $T1\leqslant 1$ and the equality $G1=1$ . By the usual Jensen's inequality for subprobability measures we get that $V(S\varphi)\leqslant S(V\circ \varphi)$ for $\varphi\geqslant 0$ . Now it remains to observe that $G(V\circ f)=V(Gf)$ for all non-negative $f\in L^\infty(\mu)$ , since for every continuous function $H$ on the real line we have $G(H\circ f)=H(Gf)$ , because this is true for all polynomials $H$ and remains true for their uniform limits on a closed interval containing the values of $f$ . Finally, it is worth noting that in most applications the operators with the stated properties are given in integral form by means of families of measures, so for them the assertion follows directly from Jensen's inequality. $\square$
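The role of the condition $V(0)=0$ in passing from probability to subprobability measures can be made explicit. The following sketch assumes that $T$ is given by a kernel, $T\varphi(x)=\int \varphi\,dm_x$, with non-negative measures $m_x$ of total mass $c=m_x(X)\in(0,1]$ (a hypothetical integral representation, as in the closing comment of the proof), and that $f\geqslant 0$ is bounded.

```latex
\begin{align*}
V\bigl(Tf(x)\bigr)
 &= V\biggl(c\int_X f\,d\Bigl(\frac{m_x}{c}\Bigr)+(1-c)\cdot 0\biggr)
 \leqslant c\,V\biggl(\int_X f\,d\Bigl(\frac{m_x}{c}\Bigr)\biggr)+(1-c)\,V(0) \\
 &\leqslant c\int_X V\circ f\,d\Bigl(\frac{m_x}{c}\Bigr)
 = \int_X V\circ f\,dm_x = T(V\circ f)(x).
\end{align*}
```

The first inequality is the convexity of $V$, and the second is the usual Jensen inequality for the probability measure $m_x/c$ together with $V(0)=0$. If $c=0$, then both sides vanish.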

Remark 2.2. If instead of the condition $\|T\|_{\mathscr{L}(L^\infty(\mu))}\leqslant 1$ we impose the weaker condition $\|T\|_{\mathscr{L}(L^\infty(\mu))}\leqslant \lambda$ , where $\lambda>1$ , and the function $V$ is required to satisfy the global $\Delta_2$ -condition, then from (2.1) we obtain the estimate

$\begin{equation} V(|Tf|)\leqslant C(\lambda)T(V\circ |f|)\quad \text{almost everywhere}. \end{equation} \tag{ 2.2 }$
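The estimate (2.2) is obtained by rescaling. A brief sketch, assuming as in Lemma 2.1 that $T$ is non-negative: the operator $T_\lambda=\lambda^{-1}T$ is non-negative with norm at most $1$ on $L^\infty(\mu)$, so (2.1) applies to it, and the global $\Delta_2$-condition gives $V(\lambda s)\leqslant C(\lambda)V(s)$ for all $s\geqslant 0$.

```latex
\begin{equation*}
V(|Tf|)=V\bigl(\lambda\,|T_\lambda f|\bigr)
 \leqslant C(\lambda)\,V\bigl(|T_\lambda f|\bigr)
 \leqslant C(\lambda)\,T_\lambda(V\circ|f|)
 =\frac{C(\lambda)}{\lambda}\,T(V\circ|f|)
 \leqslant C(\lambda)\,T(V\circ|f|).
\end{equation*}
```

The last step uses $\lambda>1$ and $T(V\circ|f|)\geqslant 0$.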

Remark 2.3. (i) It is known (see [64], §4.1, Theorem 1.3, for example) that for every bounded operator $T$ on $L^\infty(\mu)$ there exists a non-negative operator $|T|$ with the same norm and defined on non-negative functions by

$\begin{equation*} |T|f=\sup\{|Tg|\colon |g|\leqslant f\} \end{equation*}$

such that for all functions in $L^\infty(\mu)$

$\begin{equation} \bigl|\,|T|(f)\bigr|\leqslant |T|(|f|). \end{equation} \tag{ 2.3 }$

For a contraction $T$ this gives the estimate

$\begin{equation*} V(|Tf|)\leqslant |T|(V\circ|f|). \end{equation*}$

(ii) For every bounded operator $T$ on $L^1(\mu)$ there also exists a non-negative operator $|T|$ on $L^1(\mu)$ with the same norm (and defined by the same formula) for which (2.3) is true. If the operator $T$ is a contraction on $L^\infty(\mu)$ , then so is $|T|$ . This was proved by Kantorovich [49] for more general lattices, and was later rediscovered by several authors in studying operators on $L^1$ (see [33], [30], and also the comments in [64], §4.1). Thus, any operator $T$ with finite norms on $L^1(\mu)$ and $L^\infty(\mu)$ is majorized by a non-negative operator $|T|$ with preservation of both norms. For a semigroup of operators $T_t$ , the operators $|T_t|$ constructed by the indicated formula do not always form a semigroup, but it was proved in [51] and [65] (see also [64], §7.2, Theorem 2.7) that if $\{T_t\}_{t\geqslant 0}$ is a contractive operator semigroup on $L^1(\mu)$ , then there exists a contractive semigroup $\{S_t\}_{t\geqslant 0}$ of non-negative operators for which $|T_tf|\leqslant S_t |f|$ , and moreover, $S_t\leqslant U_t$ for every contractive semigroup of non-negative operators $U_t$ with the property that $|T_tf|\leqslant U_t |f|$ . Unlike the case of a single operator, it is important here that the semigroup be contractive (see [51]). If the maps $t\mapsto T_t f$ are continuous on $(0,+\infty)$ for all $f\in L^1(\mu)$ , then this property is inherited by the semigroup $\{S_t\}_{t\geqslant 0}$ . Finally, it follows from the proofs given in these works that if $\|T_t\|_{\mathscr{L}(L^\infty)}\leqslant 1$ for all $t$ , then $\|S_t\|_{\mathscr{L}(L^\infty)}\leqslant 1$ .

The following technical lemma will be useful below.

Lemma 2.4. Let $\varphi \in L^{\infty} [0,+\infty)$ and $\varrho\in L^1[0,+\infty)$ . Then the function

$\begin{equation*} t\mapsto \int_0^{+\infty} \varphi(ts)\varrho(s)\,ds= \int_0^{+\infty} \varphi(s)t^{-1}\varrho(t^{-1}s)\,ds \end{equation*}$

is continuous on $(0,+\infty)$ . The same is true if $\varphi\in L^p[0,+\infty)$ and $\varrho\in L^q[0,+\infty)$ , where $p\in (1,+\infty)$ and $p^{-1}+q^{-1}=1$ , or if the function $\varrho$ is concentrated on a bounded interval and there is an $N$ -function $V$ such that $V\circ \varrho$ and $W\circ \varphi$ are integrable, where $W$ is the complementary function for $V$ .

If $\varphi$ is an arbitrary non-negative measurable function on $[0,+\infty)$ , then

$\begin{equation*} \sup_{t>0}\,\int_0^{+\infty} \varphi(ts)\varrho(s)\,ds= \sup_{t>0, t\in \mathbb{Q}}\,\int_0^{+\infty}\varphi(ts)\varrho(s)\,ds, \end{equation*}$

where infinite values of integrals and suprema are allowed.

Proof. It is well known that for every measurable function $f$ on $(0,+\infty)$ the functions $f(t_ns)$ converge to $f(ts)$ in measure on every compact interval as $t_n\to t$ in $(0,+\infty)$ . This implies the continuity of the map $t\mapsto t^{-1}f( t\,\cdot\,)$ from $(0,+\infty)$ to the space $L^1[0,+\infty)$ and also to the space $L^p[0,+\infty)$ or the Orlicz space $L_V$ provided that $f$ belongs to this space. Hence, the integral of a product with the function $\varphi$ is continuous in $t$ .

The last assertion of the lemma is valid for bounded functions $\varphi_n=\min(\varphi,n)$ by the continuity in $t$ of the corresponding integrals, which increase pointwise to the (possibly, infinite) integral containing $\varphi$ . Then the suprema for $\varphi_n$ increase to the supremum for $\varphi$ . $\square$
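A concrete instance, chosen by us for illustration, shows that the continuity in $t$ does not require any continuity of $\varphi$. Take $\varphi=I_{[0,1]}$ and $\varrho(s)=e^{-s}$.

```latex
\begin{equation*}
\int_0^{+\infty} \varphi(ts)\varrho(s)\,ds
 =\int_0^{1/t} e^{-s}\,ds = 1-e^{-1/t}, \qquad
\sup_{t>0}\bigl(1-e^{-1/t}\bigr)=\lim_{t\to 0+}\bigl(1-e^{-1/t}\bigr)=1 .
\end{equation*}
```

The resulting function of $t$ is continuous on $(0,+\infty)$ although $\varphi$ has a jump, and its supremum (which is not attained) is the same over all $t>0$ and over rational $t>0$.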

We recall that bounded countably additive measures on the Borel $\sigma$ -algebra $\mathscr{B}(X)$ of a topological space $X$ are called Borel measures. A non-negative Borel measure $\mu$ is called a Radon measure if for any Borel set $B$ and any $\varepsilon>0$ , there exists a compact set $K\subset B$ such that $\mu(B\setminus K)<\varepsilon$ . A signed Borel measure $\mu$ is said to be Radon if its total variation $|\mu|$ defined by $|\mu|=\mu^{+}+\mu^{-}$ is Radon, where $\mu^{+}$ and $\mu^{-}$ are the positive and negative parts of $\mu$ , respectively (information about measures can be found in [15]).

We also recall that a Hausdorff topological space is said to be Souslin if it is the image of some complete separable metric space under a continuous map. On Souslin spaces all Borel measures are Radon.

A sequence of Borel measures $\mu_n$ on a topological space $X$ is said to be weakly convergent to a Borel measure $\mu$ if for every bounded continuous function $f$ on $X$

$\begin{equation*} \int_X f\,d\mu=\lim_{n\to\infty} \int_X f\,d\mu_n. \end{equation*}$

This convergence is weaker than the convergence $\mu_n(B)\to \mu(B)$ for every Borel set. However, in the case of Radon probability measures on completely regular spaces weak convergence is equivalent to convergence on all Borel sets $B$ such that the topological boundary of $B$ (the difference between the closure and the interior) has zero $\mu$ -measure. About weak convergence of measures see [15] and [19].
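A standard example on $X=\mathbb{R}$, included here only to illustrate the difference between the two types of convergence, is given by the Dirac measures $\mu_n=\delta_{1/n}$ and $\mu=\delta_0$.

```latex
\begin{equation*}
\int_{\mathbb{R}} f\,d\delta_{1/n}=f\Bigl(\frac{1}{n}\Bigr)\to f(0)=\int_{\mathbb{R}} f\,d\delta_0
 \quad\text{for bounded continuous } f,
 \qquad\text{but}\qquad
 \delta_{1/n}(\{0\})=0\not\to 1=\delta_0(\{0\}).
\end{equation*}
```

There is no contradiction with the stated criterion: the boundary of the set $B=\{0\}$ is $\{0\}$ itself, which has $\delta_0$-measure $1$. For $B=(-\infty,a]$ with $a\ne 0$ the boundary $\{a\}$ has zero $\delta_0$-measure, and indeed $\delta_{1/n}(B)\to\delta_0(B)$.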

§ 3. Non-uniform Kozlov–Treschev averagings and the ergodic theorem

We have already stated the classical ergodic theorem on convergence of averages almost everywhere, that is, the so-called individual ergodic theorem (which differs from ergodic theorems on convergence in some norm). This theorem has interesting and non-trivial generalizations (proved by other means) to the case of a semigroup of bounded linear operators $T_t$ on $L^1(\mu)$ . We shall present several such generalizations. First we recall that a semigroup $\{T_t\}_{t\geqslant 0}$ of bounded linear operators on a Banach space $E$ is said to be strongly measurable if for every $f\in E$ the map $t\mapsto T_tf$ with values in $E$ is Lebesgue measurable. Such a semigroup is said to be strongly integrable on compact intervals if the indicated maps are Bochner integrable on every compact interval. Strong measurability implies the continuity of the map $t\mapsto T_t f$ for $t>0$ , but it does not imply the strong continuity of the semigroup, which is continuity of the map $t\mapsto T_t f$ also at zero (see [4]). It is known (see [34], Theorem III.11.17) that for any Bochner integrable map $\psi\colon [a,b]\to L^1(\mu)$ there exists a real function $h$ on $[a,b]\times X$ which is integrable with respect to the product of Lebesgue measure and $\mu$ (and which is measurable with respect to the product of the corresponding $\sigma$ -algebras) such that for almost every $t$ the function $x\mapsto h(t,x)$ is a representative of the equivalence class of the element $\psi(t)\in L^1(\mu)$ . 
In [23] (Exercise 1.8.14) a simple construction is described that, in the case of a separable measure $\mu$ (that is, a measure with separable $L^1(\mu)$ ) and a family of $\mu$ -integrable functions $\xi_s$ with $s\in S$ and $(S,\mathscr{S})$ a measurable space such that the integrals of the functions $\xi_s$ over sets in $\mathscr{B}$ are measurable in $s$ , enables one to construct an $\mathscr{S}\otimes \mathscr{B}$ -measurable function $(s,x)\mapsto \xi(s,x)$ with $\xi_s(x)=\xi(s,x)$ almost everywhere for every fixed $s$ . In place of separability of the measure $\mu$ it suffices to have separability of the subspace generated by all the $\xi_s$ , so this construction applies to any continuous map $s\mapsto \xi_s$ from $(0,+\infty)$ to $L^1(\mu)$ . With the aid of the indicated version of $T_tf$ one can give meaning to the integral of $T_tf(x)$ with respect to $t$ for any fixed $x$ . The situation is similar for general Borel measures on $(0,+\infty)$ instead of Lebesgue measure. Throughout, in integrals we mean such versions. It is useful to note that if a function on $\mathbb{R}\times X$ is continuous in the first argument and $\mathscr{B}$ -measurable in the second argument, then it is $\mathscr{B}(\mathbb{R})\otimes \mathscr{B}$ -measurable (see [15], Lemma 6.4.6). In addition, Lemma 2.4 enables us, under the conditions we are considering, to compute the maximal functions introduced below by using rational $t$ .

The following result was proved in the paper [33] by Dunford and Schwartz (which they included in §11 of Chap. IV and §7 of Chap. VIII of their well-known monograph [34]).

Theorem 3.1. Let $\{T_t\}_{t\geqslant0}$ be a strongly measurable operator semigroup on $L^1(\mu)$ such that

$\begin{equation*} \|T_t\|_{\mathscr{L}(L^1)}\leqslant 1 \quad\textit{and} \quad \|T_t\|_{\mathscr{L}(L^\infty)}\leqslant 1. \end{equation*}$

Then for every function $f\in L^1(\mu)$ the averages

$\begin{equation*} A_tf(x)=\frac{1}{t}\int_0^t T_sf(x)\,ds \end{equation*}$

converge $\mu$ -almost everywhere as $t\to+\infty$ .

The limit function $\mathscr{P}f$ is some projection of the function $f$ onto the closed subspace of all functions in $L^1(\mu)$ that are invariant with respect to the operators $T_t$ . However, one should bear in mind that, unlike the Hilbert space $L^2(\mu)$ , a projection onto a closed subspace in $L^1(\mu)$ need not exist or may be non-unique. If the $T_t$ are Markov operators, which means that $0\leqslant T_t f\leqslant 1$ for $0\leqslant f\leqslant 1$ and $T_t1=1$ , and the measure $\mu$ is invariant with respect to them, that is,

$\begin{equation*} \int_X T_tf\,d\mu=\int_X f\,d\mu, \end{equation*}$

which also gives the equality $\|T_t\|_{\mathscr{L}(L^1)}=\|T_t\|_{\mathscr{L}(L^\infty)}=1$ , then $\mathscr{P}f$ coincides with the conditional expectation of $f$ with respect to the $\sigma$ -algebra $\mathscr{I}$ generated by all functions in $L^1(\mu)$ that are invariant with respect to the $T_t$ . If only constants have this property, then the limit in the theorem equals the integral of $f$ . For general semigroups the limit function $\mathscr{P}f$ can differ from the conditional expectation. For example, if $T_tf=e^{-t}f$ , then the limit is zero.
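The last example can be written out explicitly. The semigroup $T_tf=e^{-t}f$ satisfies the norm bounds of Theorem 3.1, its only invariant function is zero, and the averages are computed as follows.

```latex
\begin{equation*}
A_tf(x)=\frac{1}{t}\int_0^t e^{-s}f(x)\,ds=\frac{1-e^{-t}}{t}\,f(x)\to 0
 \quad\text{as } t\to+\infty .
\end{equation*}
```

So $\mathscr{P}f=0$ for every $f\in L^1(\mu)$, whereas the conditional expectation of $f$ with respect to the $\sigma$-algebra generated by the invariant functions (here the trivial one) equals the integral of $f$, which need not vanish.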

The same authors (see [34], Theorem VIII.7.7) obtained the following useful estimate. It employs the following convention. It is known (see [15], Theorem 4.7.1) that, given a family $\{h_\alpha\}$ of $\mu$ -measurable functions for which there exists a $\mu$ -measurable function $h$ with the property that for each $\alpha$ we have ${h_\alpha(x)\leqslant h(x)}$ almost everywhere (such a function $h$ is called a lattice upper bound of the given family), there exists an at most countable set of indices $\alpha_n$ such that for each $\alpha$ we have $h_\alpha(x)\leqslant \sup_n h_{\alpha_n}(x)$ almost everywhere. Any other lattice upper bound of the given family is $\geqslant\sup_n h_{\alpha_n}$ almost everywhere. This enables one to define the lattice least upper bound (lattice supremum) $\sup_\alpha h_\alpha$ as a measurable function by taking the supremum over a countable subset. Of course, with this convention one can obtain a function that is smaller than the usual $\sup_\alpha h_\alpha(x)$ , and moreover, the usual supremum can give a non-measurable function. Nevertheless, if $\alpha\in\mathbb{R}$ and the function $h_\alpha(x)$ is measurable in both arguments, then the usual supremum with respect to $\alpha$ will also give a $\mu$ -measurable function (see [15], Corollary 2.12.8 and Exercise 6.10.42), but it is not always the lattice supremum (for example, if $f_\alpha(x)=1$ for $x=\alpha$ and $f_\alpha(x)=0$ for $x\not=\alpha$ on an interval with Lebesgue measure, then the usual supremum equals $1$ while the lattice supremum is $0$ ).

Theorem 3.2. Suppose that in the situation of the previous theorem $\{f_\alpha\}_{\alpha \in \mathfrak{A}}\subset L^p(\mu)$ , where $p\in [1,+\infty)$ , is a family of functions such that there exists a function $g\in L^p(\mu)$ with $|f_\alpha|\leqslant g$ for all $\alpha$ . Let

$\begin{equation*} f^{*}(x)=\sup_\alpha\,\sup_{t>0}|A_t f_\alpha(x)| \end{equation*}$

(in the sense of the lattice supremum). Then in the case $p>1$

$\begin{equation*} f^{*}\in L^p(\mu)\quad\textit{and} \quad \|f^*\|_{L^p(\mu)}\leqslant 2\biggl(\frac{p}{p-1}\biggr)^{1/p} \|g\|_{L^p(\mu)}. \end{equation*}$

In the case $p=1$ , and under the additional assumption that $g\max(\log g,0)\in L^1(\mu)$ ,

$\begin{equation*} \|f^*\|_{L^1(\mu)}\leqslant 2+2\int_X g\max (\log g, 0)\,d\mu. \end{equation*}$

Note that in the case of a single function $f$ (which is our interest here), to compute $f^{*}(x)$ we can take the supremum over rational $t$ by the continuity of $A_t f(x)$ in $t$ , and for this we do not need the general remark made above.

It is known that the estimates $\|T_t\|_{\mathscr{L}(L^1)}\leqslant 1$ and $\|T_t\|_{\mathscr{L}(L^\infty)}\leqslant 1$ imply the inequality $\|T_t\|_{\mathscr{L}(L^p)}\leqslant 1$ for all $p\in (1,+\infty)$ . It was noted in [37] (see also [2]) that in the case of non-negative operators the last inequality gives the estimate $\|f^*\|_p\leqslant p(p-1)^{-1}\|f\|_p$ .
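It may be useful to compare this constant with the one in Theorem 3.2 applied to a single function (so that $g=|f|$). The comparison below is our own elementary remark; it uses the notation $q=p/(p-1)$, the identity $1-1/p=1/q$, and the fact that the function $q\mapsto q^{1/q}$ on $(0,+\infty)$ attains its maximum $e^{1/e}$ at $q=e$.

```latex
\begin{equation*}
\frac{p(p-1)^{-1}}{2\bigl(p/(p-1)\bigr)^{1/p}}
 =\frac{q}{2\,q^{1/p}}=\frac{q^{1/q}}{2}
 \leqslant \frac{e^{1/e}}{2}<1,
 \qquad\text{for example, for } p=2:\quad 2<2\sqrt{2}.
\end{equation*}
```

Thus, for non-negative operators the constant $p(p-1)^{-1}$ is smaller than the constant in Theorem 3.2 for every $p\in(1,+\infty)$.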

In [41] estimates for $f^{*}$ were studied in the weighted space $L^p(w\cdot\mu)$ for a strongly continuous semigroup of non-negative operators $T_t$ on $L^p(\mu)$ , and it was shown that the boundedness of $f^{*}$ in $L^p(w\cdot\mu)$ is equivalent to the uniform boundedness of the norms of the operators $A_t$ themselves on $L^p(w\cdot\mu)$ .

In the study of convergence almost everywhere the following theorem of Banach [8] (see also [9]) has proved useful. In this theorem $L^0(\mu)$ is the space of equivalence classes of $\mu$ -measurable functions equipped with the topology of convergence in measure, which is generated by the metric

$\begin{equation*} d(f,g)=\int_X \min(|f-g|,1)\,d\mu, \end{equation*}$

making $L^0(\mu)$ a complete metric topological vector space.

Theorem 3.3. Let $E$ be a Banach space and let $U_n\colon E\to L^0(\mu)$ be a sequence of continuous linear operators such that for every $f$ in some dense set the limit $\lim_{n\to\infty}U_nf(x)$ exists for $\mu$ -almost all $x$ . Suppose also that for every $v\in E$

$\begin{equation*} \limsup_{n\to\infty} |U_n v(x)|<+\infty \quad \textit{$\mu$-almost everywhere.} \end{equation*}$

Then for every $v\in E$ the limit

$\begin{equation*} Uv(x)=\lim_{n\to\infty}U_nv(x) \end{equation*}$

exists almost everywhere, and $U\colon E\to L^0(\mu)$ is a continuous linear operator.

In addition, an analogous assertion is true for a family of continuous linear operators $U_\lambda\colon E\to L^0(\mu)$ , $\lambda\geqslant 0$ , with $n\to\infty$ replaced by $\lambda\to+\infty$ , under the additional condition of the uniform continuity of $U_\lambda v$ in measure on compact intervals, that is, under the condition that for every $L>0$ and every pair of numbers $\varepsilon>0$ and $R >0$ there exists a $\delta>0$ such that for every vector $v\in E$ with $\|v\|<\delta$ one has the estimate $\mu(x\colon |U_\lambda v(x)|>R)<\varepsilon$ for all $\lambda\leqslant L$ .

The last condition of uniform continuity in measure is satisfied in all the situations of interest for us, because for this condition to hold it suffices that the norm $\|U_\lambda\|_{\mathscr{L}(E,L^1)}$ be bounded by some number $C(L)$ on the interval $[0,L]$ , since by the Chebyshev inequality this gives us that

$\begin{equation*} \mu(x\colon |U_\lambda v(x)|>R)\leqslant \frac{\|U_\lambda v\|_{L^1(\mu)}}{R}\leqslant \frac{C(L)\|v\|}{R}\,, \qquad \lambda \in [0,L]. \end{equation*}$

If in this theorem of Banach one takes everywhere finite versions of the measurable functions $U_n v$ , then $\limsup_{n\to\infty}$ can be replaced by $\sup_n$ .

In [32], [33], and [34], Theorem IV.11.3, the reader can find the following generalization of Banach's result.

Theorem 3.4. Let $A_1\supset A_2\supset \cdots$ be countable sets. Suppose that for every $a\in A_1$ a continuous linear operator $T_a$ from a Banach space $E$ to $L^0(\mu)$ is given such that for every $v\in E$

$\begin{equation*} \sup_{a\in A_1}|T_a v(x)|<+\infty \quad \textit{$\mu$-almost everywhere}. \end{equation*}$

Suppose also that for every $f$ in some dense subset of $E$ ,

$\begin{equation} \lim_{n\to\infty}\,\sup_{a,b\in A_n}|T_af(x)-T_bf(x)|=0 \quad \textit{$\mu$-almost everywhere}. \end{equation} \tag{ 3.1 }$

Then (3.1) is true for all $f\in E$ .

In our situation this can be applied to the countable set of times $A_1=\mathbb{Q}\cap [0,+\infty)$ and its subsets $A_n=A_1\cap [n,+\infty)$ .

Below we consider applications of these theorems to generalizations of the Kozlov–Treschev theorem, but we note at once that assertion (i) of the next proposition can easily be obtained even without these rather subtle facts. Assertion (ii) follows from the Banach theorem (it is easy to verify that its hypotheses are satisfied).

Proposition 3.5. (i) Let $E$ be a Banach space continuously embedded in $L^1(\mu)$ , let $M$ be a Banach space continuously embedded in $L^1[0,+\infty)$ , and let $\{T_t\}_{t\geqslant0}$ be a family of continuous linear operators from $E$ to $L^0(\mu)$ such that the functions $s\mapsto T_{ts}v(x)\varrho(s)$ belong to $L^1[0,+\infty)$ for $\mu$ -almost all $x$ when $t\geqslant 0$ , $v\in E$ , and $\varrho\in M$ . Suppose that for every pair $f\in E$ and $\nu=\varrho\,ds$ with $\varrho\in M$ the estimate

$\begin{equation*} |K_t^\nu f(x)|\leqslant \|f\|_E\|\varrho\|_M \end{equation*}$

holds for all $t\geqslant 0$ and almost all $x$ . If the limit $\lim_{t\to+\infty} K_t^\nu f(x)$ exists almost everywhere for elements $f$ and $\varrho$ of sets dense in $E$ and $M$ , respectively, then this is true for all $f\in E$ and $\varrho\in M$ .

(ii) This assertion remains valid if the estimate above is replaced by the weaker estimate

$\begin{equation*} |K_t^\nu f(x)|\leqslant \Psi(\|f\|_E ,\|\varrho\|_M,x), \end{equation*}$

where $(u,v,x)\mapsto \Psi(u,v,x)$ is a non-negative function on $\mathbb{R}\times \mathbb{R}\times X$ that is continuous in $(u,v)$ and measurable in $x$ .

Example 3.6. The Kozlov–Treschev theorem follows from this proposition by the classical ergodic theorem if we take $E=L^\infty(\mu)$ and $M=L^1[0,+\infty)$ , since $L^1[0,+\infty)$ contains a dense subset of step functions with bounded support, that is, finite linear combinations of the indicator functions of intervals, and for the indicator function of an interval $[a,b]$ the assertion is true by the classical ergodic theorem (since it is true for the indicator functions of $[0,b]$ and $[0,a]$ , which can be verified by a change of variable, as in the case of the interval $[0,1]$ ). Moreover, it is obvious that

$\begin{equation*} |K_t^\nu f(x)|\leqslant \|f\|_{L^\infty}\|\varrho\|_{L^1}. \end{equation*}$

This observation clearly remains in force in the more general situation of Theorem 3.1. Actually, we only need here to have the conclusion of Theorem 3.1 for bounded functions $f$ together with the estimate $\|T_t f\|_{L^\infty}\leqslant C\|f\|_{L^\infty}$ .
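The change of variable mentioned in Example 3.6 can be checked numerically. The following Python sketch is only an illustration under assumptions that are not part of the text: the flow is taken to be the rotation $g_t(\theta)=\theta+t$ of the circle, $f(\theta)=\cos\theta$ , and the density is the normalized indicator function of $[a,b]$ with $a>0$ ; all function names are hypothetical. It compares a midpoint-rule value of $K_t^\nu f$ with the combination $(bA_{tb}f-aA_{ta}f)/(b-a)$ of classical averages.

```python
import math

def classical_average(theta, T):
    """A_T f(theta) = (1/T) * integral_0^T cos(theta + s) ds, in closed form."""
    return (math.sin(theta + T) - math.sin(theta)) / T

def interval_average(theta, t, a, b, steps=20000):
    """Midpoint-rule value of K_t f(theta) for the density 1_[a,b] / (b - a)."""
    h = (b - a) / steps
    total = sum(math.cos(theta + t * (a + (i + 0.5) * h)) for i in range(steps))
    return total * h / (b - a)

def via_classical(theta, t, a, b):
    """The same quantity written through A_{tb} f and A_{ta} f (requires a > 0)."""
    return (b * classical_average(theta, t * b)
            - a * classical_average(theta, t * a)) / (b - a)
```

In this toy case both expressions tend to $0$ , the mean of $\cos\theta$ over the circle, at the rate $O(1/t)$ .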

For $E$ and $M$ one can take suitable Orlicz spaces, provided that we have managed to estimate the maximal function $\sup_{t>0} |K_t^\nu f(x)|$ .

We note straightaway that convergence of $K_t^\nu f$ in $L^1(\mu)$ is not a problem, because if we have the estimate $\|T_t\|_{\mathscr{L}(L^1)}\leqslant C$ , then for every probability measure $\nu$ we get the estimate

$\begin{equation*} \|K_t^\nu\|_{\mathscr{L}(L^1)}\leqslant C, \end{equation*}$

which implies the convergence of $K_t^\nu f$ in $L^1(\mu)$ for all $f\in L^1(\mu)$ provided there is convergence for all elements $f$ in some dense set, say, for bounded functions or functions with finitely many values. Therefore, for every sequence of points $t_n$ tending to infinity there is a subsequence $\{n_j\}$ for which the functions $K_{t_{n_j}}^\nu f$ converge $\mu$ -almost everywhere. If $\|T_t\|_{\mathscr{L}(L^p)}\leqslant C$ and $f\in L^p(\mu)$ , then the same estimate implies convergence in $L^p(\mu)$ .

Here we are interested not in methods for giving an independent proof of the existence of a pointwise limit for non-uniform averages (although we will say something about such methods), but rather in ways to derive it from the case of the usual averaging. One such way was already pointed out by Dunford and Schwartz [33], who observed that for a monotonically decreasing positive integrable function $\beta$ on $[0,+\infty)$ ,

$\begin{equation*} \int_0^{+\infty} T_{s}f(x)\beta(s)\,ds \leqslant f^{*}(x)\int_0^{+\infty}\beta(s)\,ds, \end{equation*}$

where the classical maximal function generated by $\{T_t\}_{t\geqslant0}$ is defined by

$\begin{equation} f^*(x)=\sup_{t>0}|A_tf(x)|=\sup_{t>0}\frac{1}{t} \biggl|\int_0^t T_sf(x)\,ds\biggr|. \end{equation} \tag{ 3.2 }$

In our situation we have the same inequality:

$\begin{equation} \biggl|\int_0^{+\infty} T_{ts}f(x)\beta(s)\,ds\biggr| \leqslant f^{*}(x)\int_0^{+\infty}\beta(s)\,ds. \end{equation} \tag{ 3.3 }$

It follows from the case $t=1$ using the change of variable $r=ts$ . We explain the justification in this case. For a continuously differentiable function $\beta$ , we get by integrating by parts that

$\begin{equation*} \int_0^a T_{s}f(x)\beta(s)\,ds=\int_0^{a}\frac{d}{ds}(sA_sf(x))\beta(s)\,ds= a A_{a}f(x)\beta(a)-\int_0^{a}sA_sf(x)\beta'(s)\,ds, \end{equation*}$

which can be estimated from above by the quantity

$\begin{equation*} f^{*}(x)\biggl(a\beta(a)+\int_0^{a}s(-\beta'(s))\,ds\biggr)= f^{*}(x)\int_0^a \beta(s)\,ds. \end{equation*}$

Letting $a\to +\infty$ , we obtain the desired estimate for a continuously differentiable decreasing function $\beta$ , and the general case is deduced by means of approximations (though in the previous calculations we could use the Lebesgue–Stieltjes integral for a general monotone function $\beta$ ). The inequality for $-f$ gives (3.3) with the absolute value.
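The integration by parts above has an exact discrete counterpart (Abel summation), which can be tested numerically. The sketch below is an illustration under assumptions of our own: the integral is replaced by a Riemann sum with step $h$ , the flow is the rotation of the circle, the test function is $\cos\theta+\frac12\sin3\theta$ , and the function names are hypothetical. For samples $a_j$ and non-negative non-increasing weights $b_j$ it returns $|\sum_j a_jb_j|$ together with the bound $M\sum_j b_j$ , where $M$ is the maximum of the absolute values of the running averages of $a_j$ .

```python
import math

def weighted_sum_and_bound(values, weights):
    """Return (|sum a_j b_j|, M * sum b_j), M = max_n |(a_0 + ... + a_{n-1}) / n|.

    The weights must be non-negative and non-increasing; this is a discrete
    analogue of inequality (3.3), obtained by Abel summation.
    """
    assert len(values) == len(weights)
    assert all(w >= 0 for w in weights)
    assert all(weights[j] >= weights[j + 1] for j in range(len(weights) - 1))
    partial, maximal = 0.0, 0.0
    for n, a in enumerate(values, start=1):
        partial += a
        maximal = max(maximal, abs(partial) / n)  # discrete maximal function
    weighted = sum(a * b for a, b in zip(values, weights))
    return abs(weighted), maximal * sum(weights)

def rotation_samples(theta, t, h, count):
    """Samples a_j = f(theta + t*j*h) of f(u) = cos(u) + 0.5*sin(3u)."""
    f = lambda u: math.cos(u) + 0.5 * math.sin(3.0 * u)
    return [f(theta + t * j * h) for j in range(count)]
```

With the weights $b_j=e^{-jh}$ the first returned number should never exceed the second, whatever the value of $t$ .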

Thus, the result of Banach stated above implies the following assertion.

Theorem 3.7. Suppose that for a strongly measurable semigroup $\{T_t\}_{t\geqslant0}$ of operators on $L^1(\mu)$ the classical theorem on the existence of limits of the functions $A_tf(x)$ almost everywhere holds for all $f\in L^1(\mu)$ . If the density $\varrho$ of the measure $\nu$ is estimated from above by an integrable monotonically decreasing function, $\sup_{t\geqslant 0} \|T_t\|_{\mathscr{L}(L^\infty)}<+\infty$ , and for each $t$ the function $s\mapsto \|T_{ts}\|_{\mathscr{L}(L^1)}\varrho(s)$ is integrable on $[0,+\infty)$ , then for every function $f\in L^1(\mu)$

$\begin{equation} \lim_{t\to+\infty} K_t^\nu f(x)=\mathscr{P}f(x) \end{equation} \tag{ 3.4 }$

almost everywhere. If the measure $\mu$ is invariant with respect to the operators $T_t$ and there are no non-constant functions which are invariant with respect to the semigroup $\{T_t\}_{t\geqslant0}$ , then

$\begin{equation} \lim_{t\to+\infty} K_t^\nu f(x)=\int_X f\,d\mu \end{equation} \tag{ 3.5 }$

for $\mu$ -almost all $x$ .

Proof. The integrability of $\|T_{ts}\|_{\mathscr{L}(L^1)}\varrho(s)$ in $s$ ensures the existence of $K_t^\nu f(x)$ for almost all $x$ , since the integral of $|T_{ts}f(x)|\varrho(s)$ with respect to the measure $\mu\otimes ds$ turns out to be finite. The inequality (3.3) ensures that $\sup_t |K_t^\nu f(x)|<+\infty$ almost everywhere for all $f\in L^1(\mu)$ . In order to apply Banach's theorem, we have to verify that the assertion of the theorem is true for bounded $f$ . This follows from Proposition 3.5 applied to $E=L^\infty(\mu)$ and $M=L^1[0,+\infty)$ , where as a dense subset one should take the set of step functions $\varrho$ with bounded support (for them convergence is ensured by the classical theorem, the validity of which for the given semigroup is assumed in our hypotheses). $\square$

Corollary 3.8. Let $\{T_t\}_{t\geqslant0}$ be a strongly measurable semigroup of operators on $L^1(\mu)$ such that $\|T_t\|_{\mathscr{L}(L^1)}\leqslant 1$ and $\|T_t\|_{\mathscr{L}(L^\infty)}\leqslant 1$ . If the density $\varrho$ of the measure $\nu$ is estimated from above by an integrable monotonically decreasing function, then the conclusion of the previous theorem holds for every function $f\in L^1(\mu)$ .

Remark 3.9. The Banach theorem cited above leads to the following somewhat more general conclusion: if under the conditions on the semigroup indicated in Theorem 3.7 the equality (3.4) holds for all $f\in L^1(\mu)$ for some density $\varrho$ , then it remains valid for every density $\varrho_1\leqslant C\varrho$ , where $C$ is a constant. Note also that one can combine different conditions on the density $\varrho$ by writing it as a sum or by partitioning the half-line.

Unlike the case of the classical averaging, where the convergence is for all integrable functions $f$ , in the Kozlov–Treschev construction it is necessary to impose certain restrictions on $f$ or on the connection between $f$ and $\varrho$ . Let us consider a modification of the example in [20].

Example 3.10. Let $X$ be the unit circle with normalized Lebesgue measure $\mu$ , and let the transformation $T_t$ be rotation by the angle $-t$ , that is, let $T_tz=\exp(i\theta-it)$ for $z=\exp(i\theta)$ , where $\theta\in [0,2\pi)$ . Also, let

$\begin{equation*} f(z)=|\sin \theta|^{-\alpha}, \qquad \alpha\in \biggl(\frac{1}{2}\,,1\biggr). \end{equation*}$

There is an absolutely continuous probability measure $\nu$ with support in $[0,1]$ such that $\limsup_{n\to\infty} K_{n}^\nu f(z)=+\infty$ for all $z$ . Moreover, the density $\varrho$ of $\nu$ can be taken in $L^r[0,1]$ with some $r>1$ , and the function $f$ belongs to $L^{p}(\mu)$ with $p\in [1,1/\alpha)$ .

Indeed, let $\alpha=1/2+\delta$ and $\delta\in (0,1/2)$ . We take the probability measure $\nu$ with density $\varrho$ concentrated on the set

$\begin{equation*} \bigcup_{k=4}^{\infty}\biggl[\frac1k-\frac1{k^3},\frac1k\biggr] \end{equation*}$

by specifying

$\begin{equation*} \varrho(s):= ck^{2-\delta} \quad \text{if}\quad s\in \biggl[\frac{1}{k}-\frac{1}{k^3}\,,\frac{1}{k}\biggr],\quad k\geqslant 4. \end{equation*}$

At all other points in $[0,1]$ we set $\varrho(s)=0$ , and the constant $c$ is taken so that $\nu$ is a probability measure. Then

$\begin{equation*} \begin{aligned} \, K_{n}^\nu f(z)&=\int_0^{1}f(\exp(i\theta-ins))\,\varrho(s)\,ds \\ &=\sum_{k=4}^\infty ck^{2-\delta}\int_{1/k-1/k^3}^{1/k} |\sin(\theta-ns)|^{-\alpha}\,ds \\ &=\sum_{k=4}^\infty ck^{2-\delta}n^{-1}\int_{n/k-n/k^3}^{n/k} |\sin (\theta-u)|^{-\alpha}\,du. \end{aligned} \end{equation*}$

It is known (see [50], §8) that for every $\theta\in [0,2\pi)$ there exists an infinite set of pairs of natural numbers $n$ and $k$ such that

$\begin{equation*} \biggl|\theta-\frac{n}{k}\biggr|\leqslant \frac{1}{k^2}\,. \end{equation*}$

For such triples $\theta$ , $n$ , $k$ we find that

$\begin{equation*} \frac{n}{k}\leqslant 2\pi+1\quad\text{and}\quad |u-\theta|\leqslant \frac{2\pi+2}{k^2}\leqslant \frac{9}{k^2} \quad\text{for all}\quad u\in \biggl[\frac{n}{k}-\frac{n}{k^3}\,,\frac{n}{k}\biggr]. \end{equation*}$

Therefore, by the equality $2\alpha=1+2\delta$ we have

$\begin{equation*} \begin{aligned} \, K_{n}^\nu f(z)&\geqslant ck^{2-\delta} n^{-1} \int_{n/k-n/k^3}^{n/k}|\sin(u-\theta)|^{-\alpha}\,du \\ &\geqslant ck^{2-\delta} n^{-1}\int_{n/k-n/k^3}^{n/k} \frac{k^{2\alpha}}{9}\,du=\frac{c}{9}k^{\delta}\geqslant \frac{c}{9}\biggl(\frac{n}{2\pi+1}\biggr)^{\delta}. \end{aligned} \end{equation*}$

Hence $\limsup_{n\to\infty} K_{n}^\nu f(z)=+\infty$ . Note that $\varrho\in L^r[0,1]$ for $r<2/(2-\delta)$ .
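The lower bound in this example can be explored numerically. The Python sketch below is an illustration only; the function names, the choice $\theta=\sqrt2$ , and the crude midpoint quadrature are assumptions that do not come from the text. It searches for pairs $(n,k)$ with $|\theta-n/k|\leqslant k^{-2}$ by taking $n$ to be the integer nearest to $k\theta$ , and estimates the $k$ th summand of $K_n^\nu f(z)$ divided by $c$ , which according to the estimate above is at least $k^{\delta}/9$ .

```python
import math

def dirichlet_pairs(theta, k_max):
    """Pairs (n, k) with 4 <= k <= k_max, n >= 1 and |theta - n/k| <= 1/k^2."""
    pairs = []
    for k in range(4, k_max + 1):
        n = round(k * theta)  # nearest integer to k * theta
        if n >= 1 and abs(theta - n / k) <= 1.0 / k**2:
            pairs.append((n, k))
    return pairs

def kth_term(theta, n, k, delta, steps=200):
    """Midpoint estimate of k^(2-delta) / n times the integral of
    |sin(theta - u)|^(-alpha) over [n/k - n/k^3, n/k], alpha = 1/2 + delta."""
    alpha = 0.5 + delta
    left, width = n / k - n / k**3, n / k**3
    h = width / steps
    total = 0.0
    for i in range(steps):
        s = abs(math.sin(theta - (left + (i + 0.5) * h)))
        # the integrand is infinite where the sine vanishes
        total += s ** (-alpha) if s > 0.0 else float("inf")
    return k ** (2.0 - delta) / n * total * h
```

For $\theta=\sqrt2$ the pairs found include $(7,5)$ and $(17,12)$ , and the estimated terms grow with $k$ in accordance with the bound.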

Remark 3.11. The following result of Stein ([86], p. 73) is well known. Let $\{T_t\}_{t\geqslant0}$ be a strongly continuous operator semigroup on $L^1(\mu)$ such that $\|T_t\|_{\mathscr{L}(L^1)}\leqslant 1$ and $\|T_t\|_{\mathscr{L}(L^{\infty})}\leqslant 1$ and such that these operators are self-adjoint on $L^2(\mu)$ . Then the maximal function

$\begin{equation*} M_0 f(x):=\sup_{t>0}|T_tf(x)| \end{equation*}$

has the property that for every $p>1$ there exists a number $C_p> 0$ such that for all $f\in L^p(\mu)$

$\begin{equation*} \|M_0f\|_{L^p(\mu)}\leqslant C_p\|f\|_{L^p(\mu)}. \end{equation*}$

In our situation we obtain from this estimate the inequalities

$\begin{equation*} |K_t^\nu f(x)|\leqslant M_0 f(x)\quad\text{and} \quad \|K_t^\nu f\|_{L^p(\mu)}\leqslant C_p \|f\|_{L^p(\mu)}. \end{equation*}$

Thus, for every absolutely continuous measure $\nu$ and every function $f\in L^p(\mu)$ with $p>1$ one has convergence of $K_t^\nu f(x)$ almost everywhere. This follows from the Banach theorem (for which one only needs the estimate $M_0f(x)<+\infty$ almost everywhere), but is easily verified directly. However, the operators $T_t$ generated by transformations $g_t$ of the space $X$ are usually not symmetric on $L^2(\mu)$ . By contrast, general operator semigroups arising in applications are frequently symmetric. For example, the Ornstein–Uhlenbeck semigroup (see [18], [14], [16], and [17]) is defined by the formula

$\begin{equation*} T_tf(x)=\int_{\mathbb{R}}f\bigl(e^{-t}x-\sqrt{1-e^{-2t}}\,y\bigr)\,\gamma(dy) \end{equation*}$

on the space $L^1(\gamma)$ with respect to the standard Gaussian measure with density $(2\pi)^{-1/2}e^{-x^2/2}$ on the real line. The Ornstein–Uhlenbeck semigroup is non-negative and symmetric on the space $L^2(\gamma)$ , and moreover, it is contractive on all the spaces $L^p(\gamma)$ . The generator of this semigroup on $L^2(\gamma)$ is the Ornstein–Uhlenbeck operator given by $Lf(x)=f''(x)-xf'(x)$ on the space $W^{2,2}(\gamma)$ of Sobolev functions with respect to $\gamma$ . The measure $\gamma$ is invariant with respect to this semigroup, that is,

$\begin{equation*} \int T_tf\,d\gamma=\int f\,d\gamma \quad \forall\,f\in L^1(\gamma). \end{equation*}$

Note that in the results presented here we do not assume that the measure $\mu$ is invariant with respect to the operators $T_t$ (of course, for the operators generated by measure-preserving transformations $g_t$ this will be automatically the case), but if $\mu$ is nevertheless invariant for non-negative operators $T_t$ , then $\|T_t\|_{\mathscr{L}(L^1)}\leqslant 1$ . If, in addition, $T_t1=1$ (that is, the operators are Markov), then $\|T_t\|_{\mathscr{L}(L^1)}=1$ .
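The invariance of $\gamma$ with respect to the Ornstein–Uhlenbeck semigroup from Remark 3.11 is easy to test numerically. The following Python sketch is only an illustration; the trapezoid rule on $[-8,8]$ , the step count, and the function names are our assumptions. It evaluates $T_tf$ by the formula above and integrates it against $\gamma$ .

```python
import math

def gauss_integral(func, half_width=8.0, steps=200):
    """Trapezoid approximation of the integral of func against the standard Gaussian."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * func(y) * math.exp(-0.5 * y * y)
    return total * h / math.sqrt(2.0 * math.pi)

def ou_semigroup(f, t):
    """Return the function x -> T_t f(x) for the Ornstein-Uhlenbeck semigroup."""
    a = math.exp(-t)
    b = math.sqrt(1.0 - math.exp(-2.0 * t))
    return lambda x: gauss_integral(lambda y: f(a * x - b * y))
```

For $f(x)=x^2$ one has $T_tf(x)=e^{-2t}x^2+1-e^{-2t}$ , and the integrals of $f$ and $T_tf$ against $\gamma$ are both equal to $1$ , which the sketch reproduces up to quadrature error.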

The case of a measure $\nu$ with bounded support has some features making it possible to apply the classical ergodic theorem. Suppose that for a $\mu$ -integrable function $f$ we choose some $N$ -function $V$ on $[0,+\infty)$ satisfying the global $\Delta_2$ -condition and such that the function $V(f)$ is also integrable. Such a choice is always possible, as explained in §2. Suppose in addition that the following condition holds for a strongly measurable semigroup $\{T_t\}_{t\geqslant0}$ of bounded operators on $L^1(\mu)$ : for some $C_0\geqslant 0$

$\begin{equation} V(|T_t f|)\leqslant C_0 T_t(V\circ |f|)\quad\text{$\mu$-almost everywhere.} \end{equation} \tag{ 3.6 }$

This condition holds with $C_0=1$ by Lemma 2.1 if the operators $T_t$ are continuous and non-negative on $L^1(\mu)$ and are contractions on $L^\infty(\mu)$ . It holds with some $C_0=C(\lambda)$ if the operators $T_t$ are continuous and non-negative on $L^1(\mu)$ and their norms on $L^\infty(\mu)$ are bounded by $\lambda$ .

Lemma 3.12. Suppose that the condition (3.6) is satisfied and the operators $T_t$ are non-negative. Let $W$ be the complementary function for $V$ . If the density $\varrho$ on the interval $[0,\tau]$ is such that the function $W(\varrho)$ is integrable on $[0,\tau]$ , then $\mu$ -almost everywhere

$\begin{equation} \int_0^\tau |T_{ts}f(x)|\,\varrho(s)\,ds \leqslant C_0 \tau V(|f|)^{*}(x)+\int_0^\tau W(\varrho(s))\,ds. \end{equation} \tag{ 3.7 }$

Moreover, this inequality holds if $\|T_t\|_{\mathscr{L}(L^1)}\leqslant 1$ and $\|T_t\|_{\mathscr{L}(L^\infty)}\leqslant 1$ .

Proof. By Young's inequality, for every $t>0$

$\begin{equation*} \int_0^\tau |T_{ts}f(x)|\,\varrho(s)\,ds \leqslant \int_0^\tau V(|T_{ts}f(x)|)\,ds+\int_0^\tau W(\varrho(s))\,ds. \end{equation*}$

The first integral on the right-hand side does not exceed

$\begin{equation*} C_0\int_0^\tau|T_{ts}(V\circ|f|)(x)|\,ds= \frac{C_0}{t}\int_0^{\tau t}T_{r}(V\circ |f|)(x)\,dr\leqslant C_0\tau(V\circ |f|)^*(x), \end{equation*}$

which gives the desired estimate. In the case of a contractive semigroup on $L^1$ and $L^\infty$ we can take the dominating semigroup of non-negative operators $S_t$ indicated in Remark 2.3. These are contractions on $L^1$ and on $L^\infty$ , and hence they satisfy (3.6), so the inequality (3.7) holds for them, and then it holds also for the original semigroup. $\square$
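Young's inequality $ab\leqslant V(a)+W(b)$ used in this proof can be illustrated on a concrete pair of complementary functions. In the Python sketch below the choice $V(t)=t^3/3$ , $W(s)=s^{3/2}/(3/2)$ , the grid search for the supremum, and the function names are assumptions made for illustration only.

```python
import math

def complementary(V, s, t_max=50.0, steps=50000):
    """Grid approximation of W(s) = sup over t in [0, t_max] of (s*t - V(t))."""
    h = t_max / steps
    return max(s * (i * h) - V(i * h) for i in range(steps + 1))

def young_gap(V, W, a, b):
    """V(a) + W(b) - a*b, which Young's inequality asserts to be non-negative."""
    return V(a) + W(b) - a * b
```

For this pair the gap vanishes exactly when $b=V'(a)=a^2$ , for example for $a=3/2$ and $b=9/4$ .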

The Banach theorem stated above and the convergence of the classical averages for bounded functions $f$ imply the following assertion in view of Lemma 3.12.

Theorem 3.13. Let $\{T_t\}_{t\geqslant0}$ be a strongly measurable operator semigroup on $L^1(\mu)$ such that

$\begin{equation*} \|T_t\|_{\mathscr{L}(L^1)}\leqslant 1, \quad \|T_t\|_{\mathscr{L}(L^\infty)}\leqslant 1. \end{equation*}$

Suppose that the density $\varrho$ is concentrated on the interval $[0,\tau]$ and the function $W(\varrho)$ is integrable on $[0,\tau]$ , where $W$ is the complementary function for $V$ . Then the equality (3.4) holds for $\mu$ -almost all $x$ .

Below we will consider the maximal function also for non-uniform averagings.

Example 3.14. Let $\{T_t\}_{t\geqslant0}$ be a strongly measurable operator semigroup on $L^1(\mu)$ such that

$\begin{equation*} \|T_t\|_{\mathscr{L}(L^1)}\leqslant 1, \quad \|T_t\|_{\mathscr{L}(L^\infty)}\leqslant 1. \end{equation*}$

Suppose that $f\in L^p(\mu)$ for some $p\geqslant 1$ , and the density $\varrho\in L^q[a,b]$ is concentrated on the interval $[a,b]$ , where $p^{-1}+q^{-1}=1$ . Then (3.4) holds for $\mu$ -almost all $x$ .

This assertion is contained in [53], where the condition (3.6) was used without justification in the proof, but according to Lemma 2.1 this is legitimate (yet another inaccuracy in the proof in [53] is that it employs the estimate (3.8) below, for which the condition ${p^{-1}+q^{-1}<1}$ is needed, but according to what we proved above the result is true for ${p^{-1}+q^{-1}=1}$ as well).

It was shown above that the condition (3.6) enables us to obtain an estimate that can then be combined with the Banach theorem. However, the convergence we are interested in can be proved directly on the basis of (3.6), without using the maximal function (see below). For semigroups generated by transformations of the space $X$ Theorem 3.13 was proved in [20] (as noted above, it also follows from results in [28] and [29]).

Remark 3.15. (i) Suppose that the operators $T_t$ are Markov and the measure $\mu$ is invariant for them. We observe that in the proof of convergence the general case reduces to the case $g=\mathsf{E}(f\mid\mathscr{I})=0$ . Indeed, for $g$ convergence is trivial, since $T_{ts}g=g$ by the invariance of $\mathscr{I}$ -measurable integrable functions with respect to $T_t$ . Moreover, we have $\mathsf{E}(f-g\mid\mathscr{I})=0$ .

When we use a convex function $V$ satisfying the global $\Delta_2$ -condition and such that the function $V(f)$ is integrable, we find that once $f$ is replaced by the difference $f-g$ the function $V(|f-g|)$ will also be integrable, because

$\begin{equation*} V(|f-g|)\leqslant V(|f|+|g|)\leqslant V(2|f|)+V(2|g|)\leqslant C V(|f|)+C V(|g|) \end{equation*}$

by the $\Delta_2$ -condition, and the function $V(|g|)\leqslant V(\mathsf{E}(|f|\mid\mathscr{I}))$ is estimated from above by the integrable function $\mathsf{E}(V(|f|)\mid\mathscr{I})$ according to Jensen's inequality (see [15], Proposition 10.1.9).

(ii) Let us see how Theorem 3.13 can be proved directly in this situation without using the Banach theorem. Recall that the chosen function $V$ satisfies the $\Delta_2$ -condition. As indicated in (i), it suffices to consider the case where $\mathsf{E}(f\mid\mathscr{I})=0$ . For each $n\in \mathbb{N}$ we introduce the bounded cut-off functions

$\begin{equation*} f_n=\min(\max(f,-n),n), \end{equation*}$

which converge to $f$ pointwise and in $L^1(\mu)$ , and moreover, the compositions $V(f_n)$ converge to $V(f)$ in the same way. We write $f$ as

$\begin{equation*} f=f_n+g_n, \qquad g_n=f-f_n. \end{equation*}$

The functions $g_n$ converge pointwise to zero and do not exceed $|f|$ in absolute value. Hence, the integrals of the functions $V(|g_n|)$ converge to zero. Therefore, the integrals of the conditional expectations $\mathsf{E}(V(|g_n|)\mid\mathscr{I})$ also converge to zero. Passing to a subsequence, we can assume that $\mathsf{E}(V(|g_n|)\mid\mathscr{I})\to 0$ almost everywhere. The functions $\mathsf{E}(f_n\mid\mathscr{I})$ converge in $L^1(\mu)$ to $\mathsf{E}(f\mid\mathscr{I})$ , that is, to zero in our situation. Passing to a subsequence once again, we can assume that $\mathsf{E}(f_n\mid\mathscr{I})(x)\to 0$ almost everywhere.

For any $n$ the bounded function $f_n$ is such that $K_t^\nu f_n(x)\to \mathsf{E}(f_n\mid\mathscr{I})$ almost everywhere as $t\to +\infty$ . Let us take a set $\Omega$ of unit measure for all of whose points $x$ there is such convergence for each $n$ and also the convergence $\mathsf{E}(V(|g_n|)\mid\mathscr{I})(x)\to 0$ and the convergence $\mathsf{E}(f_n\mid\mathscr{I})(x)\to 0$ .

We show that $K_t^\nu f(x)\to 0$ for all $x\in\Omega$ . Let $\varepsilon>0$ . By the $\Delta_2$ -condition there exists a $\delta>0$ such that $\|\varphi\|_V<\varepsilon$ if the integral of $V(|\varphi|)$ is less than $\delta$ (see [62], Theorem 9.4). Fix $n$ such that

$\begin{equation*} |\mathsf{E}(f_n\mid\mathscr{I})(x)|< \varepsilon \quad\text{and}\quad \mathsf{E}(V(|g_n|)\mid\mathscr{I})(x)< \tau^{-1}\delta. \end{equation*}$

Next we take $t_1$ such that

$\begin{equation*} |K_t^\nu f_n(x)-\mathsf{E}(f_n\mid\mathscr{I})(x)|\leqslant \varepsilon \quad \forall\,t\geqslant t_1, \end{equation*}$

which is possible by the boundedness of $f_n$ . Hence

$\begin{equation*} |K_t^\nu f_n(x)|\leqslant 2\varepsilon \quad \forall\,t\geqslant t_1. \end{equation*}$

Further, we have the estimate

$\begin{equation*} \int_0^\tau V(|T_{ts} g_n(x)|)\,ds \leqslant \int_0^\tau T_{ts}(V\circ |g_n|)(x)\,ds= \frac{\tau}{\tau t}\int_0^{\tau t}T_{r}(V\circ |g_n|)(x)\,dr. \end{equation*}$

As $t\to+\infty$ , the right-hand side tends to $\tau\mathsf{E}(V(|g_n|)\mid\mathscr{I})<\delta$ . Hence, there exists a $t_2\geqslant t_1$ such that the left-hand side is less than $\delta$ for all $t>t_2$ . Therefore, for such $t$ the function $h_t(s)=T_{ts}g_n(x)$ satisfies $\|h_t\|_V<\varepsilon$ . We finally obtain

$\begin{equation*} |K_t^\nu f(x)|\leqslant|K_t^\nu f_n(x)|+|K_t^\nu g_n(x)|\leqslant 2\varepsilon+\|h_t\|_V\|\varrho\|_W\leqslant 2\varepsilon+\varepsilon\|\varrho\|_W, \end{equation*}$

which completes the proof.

In [53] there is a proof of the following assertion giving convergence for families of functions in the case of a measurable semiflow $\{g_t\}_{t\geqslant0}$ of measure-preserving transformations of the space $X$ .

Theorem 3.16. Let $\{f_{\alpha,\tau};\alpha,\tau\geqslant 0\}$ be some set of $\mu$ -measurable functions on the space $X$ such that $\sup_{\alpha,\tau}|f_{\alpha,\tau}|\in L^p(\mu)$ . Also, suppose that the function $(\alpha,\tau,x)\mapsto f_{\alpha,\tau}(x)$ is measurable with respect to $\mathscr{B}([0,+\infty))\otimes \mathscr{B}([0,+\infty))\otimes \mathscr{B}$ , the density $\varrho$ has support in an interval $[a,b]$ , $\varrho\in L^q[a,b]$ , where $p\in [1,+\infty]$ and $q=p/(p-1)$ , and $q=+\infty$ for $p=1$ . Let $\theta$ be a positive function on $(0,+\infty)$ such that $\theta(t)\to+\infty$ and $\theta(t)\leqslant Ct$ . If $f_{\alpha,\tau}(x)\to f(x)$ almost everywhere as $\alpha,\tau\to +\infty$ , then

$\begin{equation*} \lim_{t\to +\infty}\int_a^b f_{\theta(t)s,t}(g_{ts}(x))\varrho(s)\,ds= \mathsf{E}(f\mid\mathscr{I})(x) \end{equation*}$

for $\mu$ -almost all $x\in X$ .

We mention a result in [53] for maximal functions generated by a weight (it was assumed there that the operators are non-negative, but according to Remark 2.3 the estimate remains valid even without this assumption). We return to these functions below.

Theorem 3.17. Let $\{T_t\}_{t\geqslant 0}$ be a strongly measurable operator semigroup on $L^1(\mu)$ such that

$\begin{equation*} \|T_t\|_{\mathscr{L}(L^1)}\leqslant 1, \quad \|T_t\|_{\mathscr{L}(L^{\infty})}\leqslant 1. \end{equation*}$

In addition, suppose that $\varrho\in L^q[0,+\infty)$ , where $q\in (1,+\infty]$ , and that there exists a decreasing function $\beta\in L^1[0,\infty)$ such that for some $t_0\geqslant 0$

$\begin{equation*} \varrho(s)\leqslant\beta(s) \quad \forall\,s\in [t_0,+\infty). \end{equation*}$

Then for any function $f\in L^p(\mu)$ with $p^{-1}+q^{-1}<1$

$\begin{equation} \biggl\|\,\sup_t\int_0^{+\infty}T_{ts}f(x)\varrho(s)\,ds\biggr\|_{L^p(\mu)} \leqslant C(p)(t_0^{(q-1)/q}\|\varrho\|_{L^q[0,t_0]}+ \|\beta\|_{L^1(\lambda)})\|f\|_{L^p(\mu)}. \end{equation} \tag{ 3.8 }$

Note that $p^{-1}+q^{-1}<1$ by a condition of the theorem, and hence the weight $\varrho$ need not belong to $L^{p/(p-1)}$ .

According to what we said above, the supremum on the left-hand side of (3.8) can be taken over a countable dense set.

Now it is appropriate to mention a number of results in the papers [28] and [29] involving convergence of averages of the form

$\begin{equation*} A_t^\varphi f(x)=\frac{1}{t}\int_0^{+\infty}f(g_s(x)) \varphi\biggl(\frac{s}{t}\biggr)\,ds \end{equation*}$

defined by a probability density $\varphi$ with bounded support on the half-line in the case of a semiflow of transformations $g_s$ preserving the measure $\mu$ , and also convergence of more general averages of the form

$\begin{equation*} A_t^\varphi f(x)=\frac{1}{t^d}\int_{\mathbb{R}^d_{+}}f(g_s(x)) \varphi\biggl(\frac{s}{t}\biggr)\,ds \end{equation*}$

for transformations $g_s$ preserving the measure $\mu$ and indexed by points $s\in \mathbb{R}^d_{+}$ such that $g_{t+s}(x)=g_t(g_s(x))$ and $g_0(x)=x$ .

In the cited papers convergence of such averages was investigated using the Lorentz class $\Lambda_\mu(\psi)$ defined for any probability density $\psi$ on $[0,1]$ and consisting of equivalence classes of $\mu$ -measurable functions $f$ with finite quantity

$\begin{equation*} \|f\|_\psi=\int_0^1 \psi(s) f^{*}(s)\,ds, \end{equation*}$

where $f^{*}$ is the decreasing equimeasurable rearrangement of the function $|f|$ , that is, the decreasing right-continuous function on $(0,1)$ for which

$\begin{equation*} \mu(x\colon |f(x)|>s)=\lambda(u\colon f^{*}(u)>s), \qquad s\geqslant 0 \end{equation*}$

(where $\lambda$ is Lebesgue measure). On Lorentz classes of the indicated form see [72] and [63], Chap. II, §5. In the case where the function $\psi$ is decreasing (and only in this case), the Lorentz class $\Lambda_\mu(\psi)$ is a normed space with the norm $\|f\|_\psi$ (the fact that this is indeed a norm is not obvious and needs verification). Then this space is a Banach space.
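For a finite space the decreasing rearrangement and the quantity $\|f\|_\psi$ can be computed explicitly. The Python sketch below is an illustration under assumptions of our own: $X$ consists of $N$ points with the uniform probability measure, $\psi(s)=1/(2\sqrt{s})$ is taken as a decreasing probability density on $(0,1)$ , and the function names are hypothetical. In this case $f^{*}$ is the step function whose values are the numbers $|f(x)|$ sorted in decreasing order.

```python
import math

def decreasing_rearrangement(values):
    """Values of f* on the N equal subintervals of (0,1) for f on an N-point
    space with the uniform probability measure."""
    return sorted((abs(v) for v in values), reverse=True)

def lorentz_norm(values):
    """Integral of psi * f* over (0,1) for psi(s) = 1/(2*sqrt(s)); the integral
    of psi over [i/N, (i+1)/N] equals sqrt((i+1)/N) - sqrt(i/N)."""
    n = len(values)
    f_star = decreasing_rearrangement(values)
    return sum(v * (math.sqrt((i + 1) / n) - math.sqrt(i / n))
               for i, v in enumerate(f_star))
```

The value depends only on the distribution of $|f|$ , so permuting the points of $X$ does not change it, and for this decreasing $\psi$ the triangle inequality can be observed on examples.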

The following results were proved in [28].

Theorem 3.18. Let $\varphi$ be a probability density on the cube $[0,1]^d$ with Lebesgue measure, let $\varphi^{*}$ be its decreasing equimeasurable rearrangement on $(0,1)$ , and let $f\in \Lambda_\mu(\varphi^{*})$ . Then $\lim_{t\to +\infty}A_t^\varphi f(x)$ exists $\mu$ -almost everywhere.

The justification employs some estimates for the maximal function

$\begin{equation} M^\varphi f(x)=\sup_{t>0}|A_t^\varphi f(x)| \end{equation} \tag{ 3.9 }$

(the same as in Theorem 3.17).

Theorem 3.19. There is a number $c_d$ depending only on $d$ , with $c_1=1$ , such that for all non-negative functions $f\in \Lambda_\mu(\varphi^{*})$ and all $R>0$

$\begin{equation*} \mu(M^\varphi f>R)\leqslant \frac{c_d}{R}\int_0^{\mu(M^\varphi f>R)} f^{*}(u)\varphi^{*}\biggl(\frac{u}{\mu(M^\varphi f>R)}\biggr)\,du. \end{equation*}$

In place of the unit cube in $\mathbb{R}^d$ one can use any cube with normalized Lebesgue measure, and this also lets us use the function $\varphi^{*}$ . It is interesting to consider the case of a weight $\varphi$ with non-compact support (there are also Lorentz classes for such weights; see [72] and [63], Chap. II, §5).

In addition, it was shown in [28] that if $d=1$ and $\varphi$ is an increasing weight on $(0,1)$ , then for the existence of the limit the inclusion $f\in \Lambda_\mu(\varphi^{*})$ is necessary in the following sense: if $f\geqslant 0$ and $f\notin \Lambda_\lambda(\varphi^{*})$ , where $\lambda$ is Lebesgue measure, then there exists a function $g$ on $(0,1)$ which is equimeasurable with $f$ and such that, for the flow of shifts by $t$ (mod $1$ ), convergence of $A_t^\varphi g(x)$ fails for every $x\in (0,1)$ .

It would be interesting to obtain analogues of these results for operator semigroups on $L^1(\mu)$ . It is clear that Theorem 3.18 implies Theorem 3.13 in the case of the semigroup $\{T_t\}_{t\geqslant0}$ generated by the transformations $\{g_t\}_{t\geqslant 0}$ , because in the situation of Theorem 3.13 the integral of $\varphi(s) f^{*}(s)$ is estimated by the sum of the integrals of $V(f^{*}(s))$ and $W(\varphi(s))$ over the interval $[0,1]$ , where the integral of $V(f^{*}(s))$ over $[0,1]$ with respect to Lebesgue measure equals the integral of $V(f(x))$ with respect to the measure $\mu$ .

One can derive Theorem 3.18 from Theorem 3.19 by means of the Banach theorem. Indeed, the estimate for the maximal function shows that the measure of the set of points $x$ with $M^\varphi f(x)=+\infty$ is zero, for if it is equal to a number $q>0$ , then the integral on the right-hand side is not less than $c_d^{-1}qR$ , but this integral is not greater than the integral of $f^{*}(s)\varphi^{*}(s)$ because $\varphi^{*}$ is decreasing, and $f^{*}(s)\varphi^{*}(s)$ is integrable since $f\in \Lambda_\mu(\varphi^{*})$ , all of which leads to a contradiction as $R\to+\infty$ . Thus, the function $A_t^\varphi f(x)$ is bounded with respect to $t$ for almost all $x$ , and the bounded functions (for which the limit exists, as we know) are dense in the Banach space $\Lambda_\mu(\varphi^{*})$ (see [63], Chap. II, §5).

It would be interesting to continue the study of the maximal function (3.9), which is more naturally connected with the summation method for general densities. This is also important for obtaining conditions for convergence without using the classical averaging, which undoubtedly should lead to more general and sharper results.

The next assertion gives conditions enabling us to get rid of boundedness of the support or monotonicity of $\varrho$ by requiring some estimates at infinity which admit arbitrarily high peaks (see [20] for the proof).

Theorem 3.20. Suppose that the semigroup $\{T_t\}_{t\geqslant0}$ is generated by a semigroup of measure-preserving transformations $g_t$ of the space $X$ . Suppose also that the density $\varrho$ of the measure $\nu$ satisfies the following condition: there exist positive numbers $a(n)$ such that $\varrho(t)\leqslant a(n)$ for $t\in [n,n+1)$ , $n=0,1,2,\dots$ , and $\sum_{n=1}^\infty n a(n)<+\infty$ . Let $f$ be a $\mu$ -integrable function. Then the equality

$\begin{equation*} \lim_{t\to+\infty} K_t^\nu f(x)=\mathsf{E}(f\mid\mathscr{I})(x) \end{equation*}$

holds for $\mu$ -almost all $x\in X$ , and if the semiflow $\{g_t\}$ is ergodic, then (3.5) holds.

Is it possible to omit the absolute continuity of the averaging measure $\nu$ ? It is readily seen that in many cases the presence of atoms of the measure $\nu$ excludes convergence of $K_{t}^\nu f(x)$ . For example, if $\nu$ is just the Dirac measure at the point $1$ , then $K_{t}^\nu f(x)=f(g_t(x))$ . For the group of rotations $g_t$ of the circle by the angle $t$ there is no limit of $g_t(x)$ as $t\to+\infty$ , so it is easy to find even a continuous function $f$ for which $K_{t}^\nu f(x)$ has no limit for any $x$ . The case of an atomless measure is more interesting, but also here the limit can fail to exist even for bounded continuous functions in the same example of the group of rotations $g_t$ by the angle $t$ on the circle with Lebesgue measure $\mu$ . Indeed, it is known (see [73], §2.2) that there exists an atomless singular Borel probability measure $\nu$ on $[0,1]$ such that its Fourier transform $\widehat{\nu}$ has no limit at infinity. Then for the simplest function $f(z)=z$ we get that $K_{t}^\nu f(z)=z\widehat{\nu}(t)$ has no limit as $t\to+\infty$ for any $z$ . However, it is not clear whether there exists a singular probability measure $\nu$ on $[0,1]$ such that for the group of rotations under consideration the limit of $K_t^\nu f(x)$ exists $\mu$ -almost everywhere as $t\to+\infty$ for every bounded Borel function $f$ (or every function $f\in L^1(\mu)$ ). If we admit only continuous functions $f$ in this setting, then the answer is affirmative. Indeed, then it suffices to have a limit for all polynomials in $z$ and $z^{-1}$ , that is, it suffices that the Fourier transform of the measure $\nu$ have zero limit at infinity. It is well known that there exist singular measures $\nu$ with this property (see §2.2 in [73]). 
In this situation the existence of the limit of $K_t^\nu f(x)$ for all $f\in L^p(\mu)$ turns out to be equivalent to the property that the maximal function $\sup_{t>0} K_t^\nu |f|(x)$ is finite $\mu$ -almost everywhere for such $f$ . The case of continuous $f$ is discussed in §5. Of course, convergence of $K_t^\nu f$ in $L^1(\mu)$ to the mean value of $f$ can hold for singular measures $\nu$ too. For example, this is the case for the group of rotations of the circle with Lebesgue measure and every measure $\nu$ whose Fourier transform tends to zero at infinity. This is clear from the convergence already noted on continuous functions along with the estimate $\|K_t^\nu\|_{\mathscr{L}(L^1)}\leqslant 1$ , which holds for all probability measures $\nu$ .
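The cases distinguished above can be checked numerically. The following Python sketch is our illustration and is not taken from [73]. It evaluates $K_t^\nu f(z)=z\widehat{\nu}(t)$ for the rotation group and $f(z)=z$ with three measures chosen here as assumptions: the Dirac measure at $1$ , Lebesgue measure on $[0,1]$ , and the standard Cantor measure on $[0,1]$ , an atomless singular measure whose Fourier transform $e^{it/2}\prod_{k\geqslant 1}\cos(t/3^k)$ is approximated by a truncated product. All function names are hypothetical.

```python
import cmath
import math


def hat_dirac(t):
    """Fourier transform of the Dirac measure at the point 1."""
    return cmath.exp(1j * t)


def hat_uniform(t):
    """Fourier transform of Lebesgue measure on [0, 1]."""
    if t == 0.0:
        return 1.0 + 0.0j
    return (cmath.exp(1j * t) - 1.0) / (1j * t)


def hat_cantor(t, terms=80):
    """Truncated product exp(it/2) * prod_k cos(t / 3^k) for the Cantor measure."""
    value = cmath.exp(0.5j * t)
    for k in range(1, terms + 1):
        value *= math.cos(t / 3.0 ** k)
    return value


def average(z, t, nu_hat):
    """K_t^nu f(z) for f(z) = z and the rotation g_t(z) = exp(i t) z."""
    return z * nu_hat(t)


z = cmath.exp(0.3j)  # an arbitrary point of the unit circle
for n in (3, 6, 9):
    peak = 2.0 * math.pi * 3 ** n  # factors with k <= n equal cos(2 pi m) = 1
    zero = 0.5 * math.pi * 3 ** n  # the factor with k = n equals cos(pi / 2)
    print(n,
          abs(average(z, peak, hat_dirac)),
          abs(average(z, peak, hat_uniform)),
          abs(average(z, peak, hat_cantor)),
          abs(average(z, zero, hat_cantor)))
```

In this sketch the modulus stays equal to $1$ for the Dirac measure and is bounded by $2/t$ for Lebesgue measure, whereas for the Cantor measure it takes one and the same non-zero value at the points $t=2\pi 3^n$ and vanishes at the points $t=\pi 3^n/2$ , so that no limit exists.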

Korolev [54] obtained an analogue of the well-known Wiener–Wintner theorem for the case of non-uniform averaging. The latter theorem [93] asserts the following for the usual averages: for any ergodic semigroup $\{g_t\}_{t\geqslant 0}$ of measure-preserving transformations and every integrable function $f\in L^1(\mu)$ , there exists a set $A_f\subset X$ with $\mu(A_f)=1$ such that for all $x\in A_f$ and $\lambda\in[0,2\pi)$ the averages

$\begin{equation*} \frac{1}{T}\int_0^T e^{i\lambda s}f(g_s (x))\,ds \end{equation*}$

have a limit as $T\to +\infty$ , and if the semigroup $\{g_t\}_{t\geqslant 0}$ is weakly mixing and $\lambda\ne 0$ , then this limit is $0$ .

We recall that a semigroup $\{g_t\}_{t\geqslant 0}$ of measure-preserving transformations of the space $X$ is said to be weakly mixing (see [31], Chap. 1, §6) if for every two functions $\varphi,\psi\in L^2(\mu)$

$\begin{equation*} \lim_{t\to +\infty}\frac{1}{t}\int_0^t \biggl|\int_X \varphi(g_s(x))\psi(x)\,\mu(dx)- \int_X \varphi\,d\mu\int_X \psi\,d\mu\biggr|^2\,ds=0. \end{equation*}$

This property implies ergodicity. An analogous property is introduced for operator semigroups.

Consider the non-uniform averages

$\begin{equation*} F_{t,\lambda}f(x)=\int_0^{+\infty} e^{i\lambda ts}f(g_{ts}(x))\,\nu(ds). \end{equation*}$

Korolev [54] proved the following assertion.

Theorem 3.21. Let $\{g_t\}_{t\geqslant0}$ be a weakly mixing semigroup. Then for every function $f\in L^p(\mu)$ and every probability density $\varrho\in L^q[0,+\infty)$ with bounded support, where $p^{-1}+q^{-1}=1$ and $p,q\in [1,+\infty]$ , there exists a set $A\subset X$ such that $\mu(A)=1$ and for all $x\in A$ and $\lambda\in(0,2\pi)$

$\begin{equation*} \lim_{t\to +\infty}\int_0^{+\infty} e^{i\lambda ts} f(g_{ts}(x))\varrho(s)\,ds=0. \end{equation*}$
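The role of weak mixing can be seen on the rotation flow $g_u(z)=e^{iu}z$ of the unit circle, which is ergodic but not weakly mixing. The following Python sketch is our illustration and is not taken from [54]. It assumes the density $\varrho=I_{[0,1]}$ and the function $f(z)=z^{-1}$ , for which the integrand equals $e^{i(\lambda-1)ts}x^{-1}$ , and approximates the integral by the midpoint rule. The function name and the quadrature parameters are hypothetical.

```python
import cmath


def weighted_average(x, t, lam, n=20000):
    """Midpoint rule for int_0^1 exp(i*lam*t*s) f(g_{ts}(x)) ds.

    Assumptions of this sketch: f(z) = 1/z, g_u(z) = exp(i u) z, and the
    density rho is the indicator of [0, 1].
    """
    h = 1.0 / n
    total = 0.0 + 0.0j
    for k in range(n):
        s = (k + 0.5) * h
        point = cmath.exp(1j * t * s) * x  # the point g_{ts}(x)
        total += cmath.exp(1j * lam * t * s) / point
    return total * h


x = cmath.exp(0.7j)  # an arbitrary point of the unit circle
for t in (10.0, 50.0, 200.0):
    print(t, abs(weighted_average(x, t, 1.0)), abs(weighted_average(x, t, 2.0)))
```

For $\lambda=1$ the integrand is constant and the averages equal $x^{-1}$ for all $t$ , so the limit is non-zero, while for $\lambda=2$ the modulus is bounded by $2/t$ . Thus, in this example the conclusion of the theorem fails at $\lambda=1$ .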

In connection with the Wiener–Wintner theorem we mention the papers [83] and [10], and also [68] and [74], where singular averagings were considered. It would be interesting to extend these results to the case of operator semigroups. It is also of interest to study the rate of convergence of non-uniform averages, which has already been investigated for the usual averages (see [47], [48]). Finally, the question arises of possible analogues of results in [45] and [46] in the situation under consideration.

§ 4. Non-uniform averagings for stochastic systems

In [21] non-uniform averagings were considered for stochastic equations. The main distinction between the stochastic case and the deterministic case is the absence of the semigroup property with respect to time.

Let $x\mapsto A(x)=(a^{ij}(x))$ be a continuous map on $\mathbb{R}^d$ with values in the space of linear operators on $\mathbb{R}^d$ , let $b=(b^i)$ be a Borel vector field on $\mathbb{R}^d$ , and let $w(t)$ , $t\geqslant 0$ , be a $d$ -dimensional Wiener process. We shall assume that $w(t)$ , $t\geqslant 0$ , is the coordinate process on $(W,\mathscr{B}(W),P)$ , where $W:=C([0,+\infty),\mathbb{R}^d)$ is the space of all continuous trajectories on the half-line with the topology of uniform convergence on compact sets, $\mathscr{B}(W)$ is the Borel $\sigma$ -algebra, and $P$ is the Wiener measure.

We consider the stochastic differential equation

$\begin{equation} d\xi_t^x=A(\xi_t^x)\,dw_t+b(\xi_t^x)\,dt,\qquad \xi_0^x=x. \end{equation} \tag{ 4.1 }$

The formal generator of the diffusion defined by this equation has the form

$\begin{equation*} Lf=\frac{1}{2}\operatorname{trace}(AA^{*}D^2f)+\langle b,\nabla f\rangle. \end{equation*}$

Suppose that the following conditions are satisfied:

(i) $a^{ij}\in W^{p,1}_{\rm loc}(\mathbb{R}^d)$ , where $W^{p,1}_{\rm loc}(\mathbb{R}^d)$ is the local Sobolev class of functions in $L^p_{\rm loc}(\mathbb{R}^d)$ with first-order generalized derivatives in $L^p_{\rm loc}(\mathbb{R}^d)$ , and moreover, $p= 2d$ , $AA^{*}\geqslant cI$ , where $c$ is a positive constant, and $b$ is a locally bounded map;

(ii) there exists a function $V\in C^2(\mathbb{R}^d)$ such that the sets $\{V\leqslant c\}$ are compact and

$\begin{equation*} LV(x)\to -\infty \quad \text{as } |x|\to +\infty. \end{equation*}$
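A simple model case, chosen here for illustration and not taken from the cited papers, is the Ornstein–Uhlenbeck equation with $A=I$ and $b(x)=-x$ . For $V(x)=|x|^2$ one has $LV(x)=d-2|x|^2$ , so that condition (ii) holds, and condition (i) is obvious. The Python sketch below checks this identity for $d=1$ by central differences. The function `generator` and the step `h` are hypothetical names introduced only for this check.

```python
def generator(f, a, b, x, h=1e-3):
    """Central-difference approximation of Lf = 0.5*a(x)^2 f'' + b(x) f' for d = 1."""
    first = (f(x + h) - f(x - h)) / (2.0 * h)
    second = (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2
    return 0.5 * a(x) ** 2 * second + b(x) * first


def a(x):
    """Diffusion coefficient of the model equation (assumption: A = 1)."""
    return 1.0


def b(x):
    """Drift of the model equation (assumption: b(x) = -x)."""
    return -x


def V(x):
    """Candidate Lyapunov function with compact sublevel sets."""
    return x * x


for x in (0.0, 1.0, 10.0, 100.0):
    print(x, generator(V, a, b, x), 1.0 - 2.0 * x * x)
```

The two printed columns agree up to rounding, since central differences are exact for quadratic polynomials.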

With the aid of results in [92] one can show that there exists a strong solution $\xi_t^x$ of the indicated equation. It is also known (see [23] and [22]) that the process obtained has a unique invariant probability measure $\mu$ with a positive continuous density of class $W^{p,1}_{\rm loc}(\mathbb{R}^d)$ with respect to Lebesgue measure, and the strongly continuous semigroup $\{T_t\}_{t\geqslant0}$ on $L^1(\mu)$ generated by the process $\xi_t^x$ is strongly Feller, that is, it takes functions in $L^1(\mu)$ to continuous functions, and the measure $\mu$ is ergodic with respect to this semigroup, that is, if $T_tf=f$ for all $t$ , then the function $f$ coincides with a constant almost everywhere. For what follows, only the listed properties of the process and its transition semigroup will be essential.

We denote by $P_x$ the image of the Wiener measure under the map $\Psi_x\colon W\to W$ defined by $\Psi_x(w)(t)=\xi_{t}^x(w)$ . It is readily seen that the measures $P_x$ are weakly continuous in $x$ , and hence the map $x\mapsto P_x(B)$ is measurable for every set $B\in \mathscr{B}(W)$ . We define a probability measure $P_{\mu}$ on the space $(W,\mathscr{B}(W))$ by

$\begin{equation*} P_{\mu}(B):=\int_{\mathbb{R}^d}P_x(B)\,\mu(dx). \end{equation*}$

On the path space $W$ the semigroup of shifts $\Theta_t$ acts by the formula $(\Theta_t\xi)(s):=\xi(s+t)$ . It is known (see [85], Chap. 1, §1.2) that the measure $P_{\mu}$ on $(W,\mathscr{B}(W))$ is ergodic if and only if $\mu$ is ergodic, and this holds under our assumptions.

Let $f\in L^p(P_{\mu})$ , where $p\geqslant 1$ . It follows from the Birkhoff–Khinchin ergodic theorem that for $P_{\mu}$ -almost all $w\in W$

$\begin{equation*} \lim_{t\to +\infty}\frac{1}{t}\int_0^t f(\Theta_s w)\,ds=\int_{W} f(w)\,P_{\mu}(dw). \end{equation*}$

Let $\mathscr{T}$ be the class of all $\Theta_t$ -invariant sets. The ergodicity of $P_{\mu}$ implies that the sets in $\mathscr{T}$ have $P_{\mu}$ -measure $0$ or $1$ .

The following ergodic theorem with the usual averaging holds (variants of this theorem can be found in [85], Chap. 1, §1.2, Theorem 7, and [66], §1.3, and [21] contains a short proof in precisely the given formulation).

Theorem 4.1. Let the conditions (i) and (ii) hold. Then for every Borel function $f$ in $L^1(\mu)$

$\begin{equation} \lim_{t\to +\infty}\frac{1}{t}\int_0^t f(w(s))\,ds= \int_{\mathbb{R}^d} f(y)\,\mu(dy) \end{equation} \tag{ 4.2 }$

for $P_x$ -almost all $w\in W$ and for every $x\in \mathbb{R}^d$ .

Since $P_x=P\circ \Psi_x^{-1}$ , the previous theorem gives the following assertion.

Corollary 4.2. Let the conditions (i) and (ii) hold. Then for any Borel function $f$ in $L^1(\mu)$ and any $x\in \mathbb{R}^d$

$\begin{equation} \lim_{t\to +\infty}\frac{1}{t}\int_0^t f(\xi_s^x(w))\,ds= \int_{\mathbb{R}^d}f(y)\,\mu(dy) \end{equation} \tag{ 4.3 }$

for $P$ -almost all $w\in W$ .
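Equality (4.3) can be observed numerically for the one-dimensional equation $d\xi_t=dw_t-\xi_t\,dt$ , whose invariant measure is the centred Gaussian measure with variance $1/2$ . The following Python sketch is only an illustration under assumptions made here: the solution is replaced by its Euler–Maruyama discretization with step `h`, the random numbers come from a seeded generator, and the function name is hypothetical.

```python
import math
import random


def time_average(f, x0, t_end, h=0.01, seed=0):
    """Approximate (1/t) int_0^t f(xi_s) ds along one Euler-Maruyama path
    of d xi = dw - xi dt started at x0."""
    rng = random.Random(seed)
    steps = int(round(t_end / h))
    sqrt_h = math.sqrt(h)
    x = x0
    total = 0.0
    for _ in range(steps):
        total += f(x) * h                              # left Riemann sum
        x += -x * h + sqrt_h * rng.gauss(0.0, 1.0)     # Euler-Maruyama step
    return total / t_end


def square(x):
    """Unbounded test function, integrable with respect to the invariant measure."""
    return x * x


print(time_average(square, 3.0, 2000.0), 0.5)
```

For $f(y)=y^2$ the right-hand side of (4.3) equals $1/2$ , and the computed average is expected to be close to this value for large $t$ irrespective of the starting point, up to the discretization and sampling errors.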

We proceed to Kozlov–Treschev type averaging in the stochastic case. Let $\nu$ be an absolutely continuous probability measure on $[0,+\infty)$ with density $\varrho$ with respect to Lebesgue measure. For every function $f\in L^1(\mu)$ the averages

$\begin{equation*} F_t(x,w):=\int_0^{+\infty} f(\xi_{ts}^x(w))\,\nu(ds) \end{equation*}$

are defined for $P$ -almost all $w$ , where $P$ is the Wiener measure. For these averages $F_t(x,w)$ we obtain the following assertion.

Corollary 4.3. Let the conditions (i) and (ii) hold. Then for every bounded Borel function $f$ on $\mathbb{R}^d$ and for every $x\in \mathbb{R}^d$

$\begin{equation} \lim_{t\to +\infty}\int_0^{+\infty}f(\xi_{ts}^x(w))\varrho(s)\,ds= \int_{\mathbb{R}^d}f(y)\,\mu(dy) \end{equation} \tag{ 4.4 }$

for $P$ -almost all $w\in W$ .
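A numerical illustration of (4.4) can be obtained in the same model case $d\xi_t=dw_t-\xi_t\,dt$ with invariant measure $N(0,1/2)$ . The Python sketch below rests on assumptions made here and not in [21]: the density is $\varrho(s)=2s$ on $[0,1]$ , the function is $f=\cos$ , for which the integral with respect to $\mu$ equals $e^{-1/4}$ , and the solution is replaced by an Euler–Maruyama path. After the change of variable $u=ts$ the average becomes $t^{-1}\int_0^t f(\xi_u)\varrho(u/t)\,du$ , which is approximated by a Riemann sum. All names are hypothetical.

```python
import math
import random


def weighted_average(f, rho, x0, t, h=0.01, seed=0):
    """Approximate int_0^1 f(xi_{ts}) rho(s) ds = (1/t) int_0^t f(xi_u) rho(u/t) du
    along one Euler-Maruyama path of d xi = dw - xi dt started at x0.
    The density rho is assumed to vanish outside [0, 1]."""
    rng = random.Random(seed)
    steps = int(round(t / h))
    sqrt_h = math.sqrt(h)
    x = x0
    total = 0.0
    for k in range(steps):
        u = k * h
        total += f(x) * rho(u / t) * h                 # left Riemann sum in u
        x += -x * h + sqrt_h * rng.gauss(0.0, 1.0)     # Euler-Maruyama step
    return total / t


def rho(s):
    """Non-uniform probability density on [0, 1] (assumption of this sketch)."""
    return 2.0 * s if 0.0 <= s <= 1.0 else 0.0


print(weighted_average(math.cos, rho, 3.0, 2000.0), math.exp(-0.25))
```

The density chosen here gives more weight to late times, in contrast to the uniform averaging in (4.3), but the limit is expected to be the same.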

For functions $f$ in $L^p(\mu)$ the following assertion is valid.

Theorem 4.4. Let the conditions (i) and (ii) hold and let $f\in L^p(\mu)$ and $\nu=\varrho\,ds$ , where $\varrho\in L^q[0,+\infty)$ is a probability density and $p^{-1}+q^{-1}=1$ . Assume one of the conditions

(i) the density $\varrho$ has bounded support in the interval $[a,b]$ ,

(ii) $p>1$ and there exists a non-increasing function $\beta$ on $[0,+\infty)$ such that $\beta \geqslant 0$ , $\beta \in L^q[0,+\infty)$ , and $\varrho(t)\leqslant \beta(t)$ on $[t_0,+\infty)$ for some $t_0$ .

Then for any $x\in \mathbb{R}^d$ and for $P$ -almost all $w\in W$

$\begin{equation*} \lim_{t\to +\infty}\int_0^{+\infty}f(\xi_{ts}^x(w))\,\varrho(s)\,ds= \int_{\mathbb{R}^d} f\,d\mu. \end{equation*}$

Besides non-uniform averages of stochastic systems, it would be interesting to apply analogous ideas to convergence of solutions of non-linear parabolic equations to stationary distributions (see [24]–[26], [84]), and also to analysis of attractors (see [52]).

§ 5. Dynamics of measures

Non-uniform Kozlov–Treschev averagings generated by transformations $g_t$ of $X$ preserving the measure $\mu$ motivate the consideration of a family of probability measures $\nu_{t,x}$ on $X$ defined in the following way: the measure $\nu_{t,x}$ is the image of the measure $\nu$ under the map

$\begin{equation*} S_{t,x}\colon [0,+\infty)\to X,\qquad S_{t,x}(s):=g_{ts}(x). \end{equation*}$

By the definition of the image of a measure, the integral of a $\mathscr{B}$ -measurable bounded function $\varphi$ with respect to the measure $\nu_{t,x}$ is given by the formula

$\begin{equation*} \int_X \varphi(z)\,\nu_{t,x}(dz)=\int_0^{+\infty}\varphi(g_{ts}(x))\,\nu(ds). \end{equation*}$

Here we discuss the character of convergence of the measures $\nu_{t,x}$ to the measure $\mu$ . In the case of an ergodic system the result due to Kozlov and Treschev has some similarity to convergence of the measures $\nu_{t,x}$ to $\mu$ in the topology of convergence on every set. However, this convergence can fail literally, because convergence for almost all $x$ appears only after the integration of every fixed function $f$ , and therefore the corresponding measure-zero set can depend on $f$ . For example, consider the ergodic group $\{g_t\}$ of shifts along trajectories of the differential equation

$\begin{equation*} \frac{dz_1}{dt}=\alpha_1,\qquad \frac{dz_2}{dt}=\alpha_2 \end{equation*}$

(with incommensurable $\alpha_1$ and $\alpha_2$ ) on the two-dimensional torus $\mathbb{T}^2$ with the normalized Lebesgue measure $\mu$ , which is invariant with respect to $g_t$ . For every Borel probability measure $\nu$ on $[0,+\infty)$ , all the measures $\nu_{t,x}$ are singular with respect to $\mu$ , since every measure $\nu_{t,x}$ is concentrated on a curve of $\mu$ -measure zero. Hence for no $x$ can one have convergence of these measures to $\mu$ on every set (although for an absolutely continuous measure $\nu$ one has weak convergence, as we shall see below). In this circle of questions it turns out to be natural to employ the topological structure of the space $X$ , because we shall be discussing weak convergence of measures with respect to the duality with the space of bounded continuous functions (see §2). As above, we assume the joint measurability of $g_t(x)$ with respect to $(t,x)$ .

The next results were proved in [20].

Theorem 5.1. Let $\mu$ be a Radon probability measure on a completely regular topological space $X$ such that all compact sets in $X$ are metrizable (for example, $X$ is itself metrizable or is a Souslin space). Suppose that the semiflow $\{g_t\}_{t\geqslant 0}$ is ergodic. Let the measure $\nu$ be absolutely continuous with respect to Lebesgue measure on $[0,+\infty)$ . Then for $\mu$ -almost all $x$ the measures $\nu_{t,x}$ converge weakly to $\mu$ as $t\to +\infty$ .

We recall that a family $\mathscr{M}$ of Radon measures on a space $X$ is said to be uniformly tight if for any $\varepsilon>0$ there exists a compact set $K_\varepsilon\subset X$ such that

$\begin{equation*} |\mu|(X\setminus K_\varepsilon)\leqslant \varepsilon \quad \forall\,\mu\in \mathscr{M}. \end{equation*}$

According to the Prohorov theorem, every weakly convergent sequence of Borel measures on a complete separable metric space is uniformly tight. It is also known that every uniformly tight family of Radon measures that is bounded in variation on a completely regular space $X$ has a compact closure in the weak topology (see [15] or [19]). If the space $X$ is Souslin, then every sequence of measures in such a family contains a weakly convergent subsequence.

Theorem 5.2. Suppose that $X$ is a Souslin (or metric) space and $\mu$ is a Radon probability measure such that the semiflow $\{g_t\}_{t\geqslant 0}$ is ergodic. Let the measure $\nu$ be absolutely continuous. Then for any $\varepsilon >0$ there exists a compact set $X_{\varepsilon}\subset X$ such that $\mu(X_{\varepsilon})>1-\varepsilon$ and the family of measures $\nu_{t,x}$ with $t\geqslant \varepsilon$ and $x\in X_{\varepsilon}$ is uniformly tight.

Since $\nu_{0,x}$ is the Dirac measure at the point $x$ , in the case of a non-compact space $X$ one cannot manage without removing some part of the space and keeping $t$ bounded away from zero. However, it is not clear whether it is enough just to remove some part of the space.

We now discuss analogous questions for singular atomless measures $\nu$ on $[0,+\infty)$ with Fourier transform $\widetilde{\nu}$ given by

$\begin{equation*} \widetilde{\nu}(y)=\int_0^{+\infty} e^{iys}\,\nu(ds), \qquad y\in \mathbb{R}. \end{equation*}$

Example 5.3. Let $X=S$ be the unit circle with the normalized Lebesgue measure $\mu$ and let the transformation $g_t$ be rotation by the angle $t$ . If the Fourier transform of $\nu$ tends to zero at infinity, then the measures $\nu_{t,z}$ converge weakly to $\mu$ for all $z\in S$ as $t\to+\infty$ .

Conversely, if one has weak convergence for some $z$ , then the Fourier transform of $\nu$ tends to zero at infinity.

Indeed, for the function $f(z)=z$ or $f(z)=z^{-1}$ on $S$ we have, respectively,

$\begin{equation*} K_t^\nu f(z)=\int_S f(u)\,\nu_{t,z}(du)= f(z)\widetilde{\nu}(t)\quad\text{or}\quad K_t^\nu f(z)=f(z)\widetilde{\nu}(-t). \end{equation*}$

If for some $z$ we have weak convergence of the measures $\nu_{t,z}$ , then $\widetilde{\nu}$ has some limits $l_1$ and $l_2$ at $+\infty$ and $-\infty$ . We show that both limits are zero. It is known (see [73], Theorem 3.2.3) that the limit

$\begin{equation*} \lim_{T\to +\infty}\frac{1}{2T}\int_{-T}^{T} e^{-isy}\widetilde{\nu}(y)\,dy \end{equation*}$

equals the size of the jump at the point $s$ of the distribution function of the probability measure $\nu$ , which equals zero in our case due to the absence of atoms of $\nu$ . Taking $s=0$ , we see that this is possible only in the case when $l_1=-l_2$ . The convolution $\nu *\nu$ has no atom at zero either, so for its Fourier transform $\widetilde{\nu}^2$ the indicated limit at $s=0$ also equals zero. Hence $l_1^2=-l_2^2$ , and thus $l_1=l_2=0$ .

If we are given that $\widetilde{\nu}(y)\to 0$ as $|y|\to +\infty$ , then for the functions $f$ of the form $\exp (ik\theta)$ with non-zero integer $k$ the integrals with respect to the measures $\nu_{t,z}$ tend to zero. The integral of $f\equiv 1$ equals $1$ . Then convergence holds for finite linear combinations of such functions, but they uniformly approximate all continuous functions on the circle.
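The weak convergence in this example can also be observed on a continuous function that is not a trigonometric polynomial. The following Python sketch is our illustration. It assumes that $\nu$ is Lebesgue measure on $[0,1]$ , whose Fourier transform tends to zero, and takes $\varphi(e^{i\theta})=|\cos\theta|$ , for which the integral with respect to $\mu$ equals $2/\pi$ . The integral of $\varphi$ with respect to $\nu_{t,z}$ , $z=e^{i\theta_0}$ , is approximated by the midpoint rule. The function names and the number of nodes are hypothetical.

```python
import math


def integral_nu_tz(phi, theta0, t, n=200000):
    """Midpoint rule for int_0^1 phi(theta0 + t*s) ds, that is, the integral of
    phi with respect to nu_{t,z} for z = exp(i*theta0), rotation by the angle t*s,
    and nu equal to Lebesgue measure on [0, 1] (assumption of this sketch)."""
    h = 1.0 / n
    return h * sum(phi(theta0 + t * (k + 0.5) * h) for k in range(n))


def phi(theta):
    """Continuous function on the circle written in the angular coordinate."""
    return abs(math.cos(theta))


mean = 2.0 / math.pi  # integral of phi with respect to the normalized Lebesgue measure
for t in (1.0, 10.0, 100.0, 1000.0):
    print(t, integral_nu_tz(phi, 0.3, t), mean)
```

Since $|\cos\theta|$ has period $\pi$ , the exact integral differs from $2/\pi$ by at most $2/t$ , which agrees with the convergence for every starting point $z$ stated in the example.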

In the stochastic case considered in §4, as in the case of the deterministic semigroup, there arise the measures $\nu_{t,x,w}$ on $\mathbb{R}^d$ defined as the images of the measure $\nu$ under the maps

$\begin{equation*} S_{t,x,w}\colon [0,+\infty)\to \mathbb{R}^d,\quad S_{t,x,w}(s):=\xi_{ts}^x(w), \end{equation*}$

that is, the integral of a bounded Borel function $\varphi$ with respect to the measure $\nu_{t,x,w}$ is given by the formula

$\begin{equation*} \int_{\mathbb{R}^d} \varphi(z)\,\nu_{t,x,w}(dz)= \int_0^{+\infty}\varphi(\xi_{ts}^x(w))\,\nu(ds). \end{equation*}$

We mention two results obtained in [21]. As in §4, we assume that the probability space for our system is the path space $W$ with the Wiener measure $P$ .

Theorem 5.4. Suppose that the conditions (i) and (ii) after (4.1) hold, $\xi_t^x$ is the solution of (4.1), and $\mu$ is the corresponding invariant probability measure for the process. Let $\nu$ be an absolutely continuous Borel probability measure on $[0,+\infty)$ . Then for any $x\in \mathbb{R}^d$ and for $P$ -almost all $w$ the measures $\nu_{t,x,w}$ converge weakly to the measure $\mu$ as $t\to +\infty$ .

There is also an analogue of the theorem on uniform tightness.

Theorem 5.5. Under the hypotheses of the previous theorem, for any $\varepsilon >0$ there exists a compact set $B_{\varepsilon}\subset \mathbb{R}^d \times W$ such that $(\mu\otimes P)(B_{\varepsilon})>1-\varepsilon$ and the family of measures

$\begin{equation*} \{\nu_{t,x,w}\colon t\geqslant\varepsilon,\ (x,w)\in B_{\varepsilon}\} \end{equation*}$

is uniformly tight.

The author thanks F.-Y. Wang and S. V. Shaposhnikov for useful discussions.
