Zernike polynomials and their applications

The Zernike polynomials are a complete set of continuous functions orthogonal over a unit circle. Since Zernike first developed them in 1934, they have been widely used in fields ranging from optics and vision science to image processing. However, owing to the lack of a unified definition, many confusing indexing schemes have been used over the past decades, and the mathematical properties of the polynomials are scattered throughout the literature. This review provides a comprehensive account of Zernike circle polynomials and their noncircular derivatives, including their history, definitions, mathematical properties, role in wavefront fitting, relationships with optical aberrations, and connections with other polynomials. We also survey state-of-the-art applications of Zernike polynomials in a range of fields, including the diffraction theory of aberrations, optical design, optical testing, ophthalmic optics, adaptive optics, and image analysis. Owing to their elegant and rigorous mathematical properties, the range of scientific and industrial applications of Zernike polynomials is likely to expand. This review aims to clear up the confusion among different indexing schemes, provide a self-contained reference guide for beginners as well as specialists, and facilitate further developments and applications of the Zernike polynomials.

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Introduction
The Zernike polynomials are a sequence of continuous functions that form a complete orthogonal set over a unit disk. They are named after the optical physicist Frits Zernike (figure 1), winner of the 1953 Nobel Prize in Physics and inventor of phase-contrast microscopy. Since most optical systems have circular apertures, Zernike polynomials are useful for wavefront analysis and thus play an important role in optics. Zernike polynomials can generally be divided into two basic types, namely Zernike circle polynomials and Zernike annular polynomials, which are defined over a unit disk and a unit annulus, respectively. The Zernike circle polynomials were first introduced by Zernike in 1934 as eigenfunctions of a second-order, rotationally invariant partial differential equation to describe the phase-contrast method [1,2], and were derived by Bhatia and Wolf in 1954 from the requirements of orthogonality and invariance [3]. The Zernike annular polynomials first appeared in a report of the Perkin-Elmer Corporation in 1971 [4] and were discussed by Tatian in 1976 for aberration balancing in optical systems with annular pupils from the standpoint of lens design [5]. They were systematically studied and explicitly given by Mahajan in 1981 [4]. Interest in Zernike polynomials grew steadily after their introduction (figure 2), and they have found widespread applications in optics and image processing. In 1942, Bernard Nijboer, a PhD student of Zernike, expanded the aberration function of a symmetrical optical system into a series of Zernike polynomials and derived an efficient representation of the complex amplitude distribution in the image of a point object [6]. This work allows analytical evaluation of diffraction integrals and of the point spread function (PSF) of a general optical system, and is referred to as the Nijboer-Zernike theory.
However, the Nijboer-Zernike theory is valid only for small aberrations and produces accurate results only at positions close to the geometrical focus. Sixty years later, Janssen formulated a general expression in terms of power-Bessel series and extended the Nijboer-Zernike theory to optical systems with large aberrations [8]. The extended Nijboer-Zernike theory can analytically compute the PSF of an aberrated optical system described by Zernike coefficients, and it has accelerated further developments in focused-field diffraction theory. While developments in the diffraction theory of aberrations rely on analytical derivations, Zernike polynomial-based wavefront analysis depends on computers. In the 1970s, with the rise of adaptive optics, Noll proposed a modified set of Zernike polynomials, normalizing and sorting them for the statistical analysis of wavefront aberrations caused by atmospheric turbulence [9]. At about the same time, Loomis at the University of Arizona introduced a reordered subset of Zernike polynomials into the interferogram processing software FRINGE for wavefront analysis in interferometric measurements [10,11]. This subset, called the Zernike fringe set, contains only 37 terms but corresponds closely to the classical aberrations. In 1980, Teague extended the applications of Zernike polynomials from optics to image processing and pioneered Zernike moments, which are rotation invariant and can be used as shape descriptors for pattern recognition [12]. Zernike moments have since become a valuable shape descriptor for image analysis.
Since the turn of the 21st century, the theory of Zernike polynomials has gradually matured, and several Zernike sets have been standardized by the American National Standards Institute (ANSI) [13,14] and the International Organization for Standardization (ISO) [15][16][17] to promote effective communication (see figure 3).
The widespread use of Zernike polynomials stems from their unique mathematical properties. First, Zernike polynomials are orthogonal over a unit circle. Orthogonality makes the expansion coefficients of a wavefront independent of the number of terms used [18] and enables convenient mathematical manipulations of wavefronts, such as addition, subtraction, translation, rotation, and scaling. Second, although other polynomials orthogonal over a unit disk exist, Zernike polynomials are unique in that they correspond closely to classical aberrations, such as astigmatism, coma, and spherical aberration [19,20], which enables fast classification and quantification of wavefront aberrations. Third, Zernike polynomials make it easy to evaluate the image quality of an optical system, since the system PSF can be computed analytically from the Zernike expansion coefficients of the wavefront aberration using the (extended) Nijboer-Zernike theory [6,8,21]. In addition, Zernike polynomials can serve as a basis set for wavefront reconstruction in slope-sensitive wavefront sensors, such as the Shack-Hartmann wavefront sensor [22,23] and lateral shearing interferometers [24], which are important wavefront sensing tools in ophthalmic optics and adaptive optics.
Nowadays, a variety of indexing schemes for Zernike polynomials are used by authors and authorities around the world. Well-known ones include the Noll indices [9,25], the OSA/ANSI indices [13-15, 17, 26, 27], the Fringe/University of Arizona indices [10,19], the ISO 14999 indices [16,28], the Born and Wolf indices [29], and the Malacara indices [30,31]. Each scheme adopts a different naming, normalization, and ordering strategy, and even the coordinate system may differ, which causes great confusion for researchers working with the polynomials and hinders effective communication. Moreover, the mathematical properties of Zernike polynomials developed over the past few decades, such as derivatives, Fourier transforms, and recurrence relations, are scattered in the literature, and no single work summarizes these results. This motivated us to prepare a review of the Zernike polynomials with the aims of clearing up the confusion among different indexing schemes, summarizing their mathematical properties, surveying state-of-the-art applications, and providing a quick reference guide for scientists and engineers in this community.
The remainder of this review is organized as follows (see figure 4). Section 2 reviews the different indexing schemes for the Zernike circle polynomials, their mathematical properties, their role in wavefront fitting, their relationships with the classical Seidel aberrations and the Strehl ratio, and their connections with other important functions, such as the XY monomials and the Legendre polynomials. Section 3 discusses orthonormal polynomials over noncircular pupils derived from the Zernike circle polynomials, with an emphasis on the Zernike annular polynomials, whose definition, mathematical properties, and role in wavefront fitting are presented. Section 4 surveys state-of-the-art applications of Zernike polynomials in a range of fields, including diffraction theory, optical design, optical testing, ophthalmic optics, adaptive optics, and image analysis. Finally, section 5 presents concluding remarks. Table 1 lists the acronyms and symbols used in this review.

Definitions
Zernike polynomials over circular pupils are called Zernike circle polynomials or simply Zernike polynomials. They are defined over a unit disk and are most conveniently expressed in polar coordinates (ρ, θ), where ρ is the normalized radial coordinate (0 ⩽ ρ ⩽ 1) and θ is the polar angle measured counterclockwise from the +x-axis (0 ⩽ θ < 2π), as shown in figure 5(a). The polar coordinates ρ and θ can be converted to the Cartesian coordinates x and y using trigonometric functions:

$$ x=\rho\cos\theta,\qquad y=\rho\sin\theta. \tag{1} $$

Likewise, the Cartesian coordinates x and y can be converted to the polar coordinates ρ and θ by:

$$ \rho=\sqrt{x^2+y^2},\qquad \theta=\arctan\frac{y}{x}, \tag{2} $$

where the quadrant of θ is determined by the signs of x and y. It is worth noting that while most authors follow the convention that θ is positive when measured counterclockwise from the +x-axis, some authors, such as Born [32] and Malacara [30,31], measure the polar angle clockwise from the +y-axis [33]. This convention stems from early (precomputer) aberration theory and is not recommended [27].
Several indexing schemes for Zernike polynomials have emerged over time, which causes confusion for researchers, especially beginners. In this section, we classify the indexing schemes in the literature into six groups, namely the Noll indices, the OSA/ANSI indices, the Fringe indices, the ISO 14999 indices, the Born and Wolf indices, and the Malacara indices, and compare their naming, normalization, and ordering strategies.
2.1.1. The Noll indexing scheme.
When Zernike first introduced the orthogonal polynomials, the radial polynomials and azimuthal functions were given explicitly [1,2], but their normalization and ordering were not specified. In 1976, Noll sorted and normalized the Zernike polynomials to facilitate the statistical analysis of wavefront distortion caused by atmospheric turbulence [9]. His indexing scheme was later followed by many authors [34][35][36] and adopted as the 'standard' indices in commercial software such as Zemax [25]. Note that the 'standard' indices in Zemax are not associated with any ANSI or ISO standard. In this section, we summarize the Noll indexing scheme, discuss normalized and non-normalized Zernike circle polynomials, and extend the definitions from the real domain to the complex domain.
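2.1.1.1. Real Zernike circle polynomials.
In the Noll indices, the normalized (orthonormal) Zernike circle polynomials are defined as products of normalization factors, radial polynomials, and azimuthal (angular) functions [9]:

$$ Z_j(\rho,\theta)=\begin{cases}\sqrt{2(n+1)}\,R_n^m(\rho)\cos(m\theta), & m\neq0,\ j\ \text{even},\\ \sqrt{2(n+1)}\,R_n^m(\rho)\sin(m\theta), & m\neq0,\ j\ \text{odd},\\ \sqrt{n+1}\,R_n^0(\rho), & m=0,\end{cases} $$

where the index n is the degree of the radial polynomial $R_n^m(\rho)$; the index m is the azimuthal frequency, describing the repetition of the angular function; n and m are non-negative integers satisfying n − m ⩾ 0 with n − m even; and j is a mode-ordering number starting from 1, whose relationships with n and m are given in equation (8). There are a total of (n + 1)(n + 2)/2 linearly independent polynomials of degree ⩽ n. The radial polynomial $R_n^m(\rho)$ is defined as [1,6]:

$$ R_n^m(\rho)=\sum_{s=0}^{(n-m)/2}\frac{(-1)^s\,(n-s)!}{s!\left(\frac{n+m}{2}-s\right)!\left(\frac{n-m}{2}-s\right)!}\,\rho^{n-2s}. $$

The radial polynomials of the first few degrees are shown in figure 6. It is easy to verify the following relations:

$$ R_n^m(1)=1,\qquad R_n^n(\rho)=\rho^n. $$

The normalized Zernike circle polynomials satisfy the orthonormality condition

$$ \frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^1 Z_j(\rho,\theta)\,Z_{j'}(\rho,\theta)\,\rho\,\mathrm{d}\rho\,\mathrm{d}\theta=\delta_{jj'}, \tag{7} $$

where $\delta_{jj'}$ is the Kronecker delta.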
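As a minimal sketch of the definitions above, the Python snippet below evaluates the radial polynomial directly from its factorial sum and builds a Noll-normalized term from it. The function names `radial` and `noll_term` are our own illustrative choices, and the caller is assumed to supply a valid (n, m, j) triple; the code does not derive j.

```python
from math import cos, factorial, sin, sqrt


def radial(n: int, m: int, rho: float) -> float:
    """R_n^m(rho) from the explicit factorial sum; zero if m > n or n - m is odd."""
    m = abs(m)
    if m > n or (n - m) % 2:
        return 0.0
    total = 0.0
    for s in range((n - m) // 2 + 1):
        coeff = (-1) ** s * factorial(n - s) / (
            factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s)
        )
        total += coeff * rho ** (n - 2 * s)
    return total


def noll_term(n: int, m: int, j: int, rho: float, theta: float) -> float:
    """Orthonormal Noll term Z_j: cos(m*theta) for even j, sin(m*theta) for odd j (m > 0)."""
    if m == 0:
        return sqrt(n + 1) * radial(n, 0, rho)
    angular = cos(m * theta) if j % 2 == 0 else sin(m * theta)
    return sqrt(2 * (n + 1)) * radial(n, m, rho) * angular


# The Noll defocus term Z_4 (n = 2, m = 0) equals sqrt(3) at the pupil edge.
print(noll_term(2, 0, 4, 1.0, 0.0))
```

The direct sum is the simplest correct approach for low orders. For high orders, the recurrence relations discussed later in this section are preferable.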
The orthonormal Zernike circle polynomials can be sorted either by the single index j or by the double indices n and m. The former is convenient for labeling Zernike expansion coefficients, while the latter describes the functions unambiguously. To convert a given j to n and m, one can use the following relationships [36]:

$$ n=\left\lfloor\frac{\sqrt{8j-7}-1}{2}\right\rfloor,\qquad m=\begin{cases}2\left\lfloor\dfrac{2j+1-n(n+1)}{4}\right\rfloor, & n\ \text{even},\\[2mm] 2\left\lfloor\dfrac{2(j+1)-n(n+1)}{4}\right\rfloor-1, & n\ \text{odd},\end{cases} \tag{8} $$

where ⌊x⌋ denotes the floor function, which returns the greatest integer less than or equal to x. For example, ⌊2.4⌋ = 2.
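To convert given values of n and m to j, the following relationship can be used:

$$ j=\begin{cases}\dfrac{n(n+1)}{2}+1, & m=0,\\[2mm] \dfrac{n(n+1)}{2}+m\ \ \text{or}\ \ \dfrac{n(n+1)}{2}+m+1, & m>0,\end{cases} $$

where, for m > 0, the even candidate corresponds to the cos mθ term and the odd candidate to the sin mθ term. Table 2 lists the first 37 real orthonormal Zernike circle polynomials in polar and Cartesian coordinates, together with the values of n, m, and j.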
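The conversions between the Noll single index j and the double indices (n, m) can be sketched as follows. `noll_to_nm` mirrors equation (8), using `math.isqrt` so the floor of the square root is computed exactly. `nm_to_noll` picks the even candidate for cosine terms and the odd candidate for sine terms. Both function names and the `use_sin` flag are our own illustrative choices.

```python
from math import isqrt


def noll_to_nm(j: int) -> tuple:
    """Noll index j >= 1 -> (n, m) with m >= 0, following equation (8)."""
    if j < 1:
        raise ValueError("Noll indices start at 1")
    n = (isqrt(8 * j - 7) - 1) // 2          # floor((sqrt(8j - 7) - 1) / 2), exact integer arithmetic
    if n % 2 == 0:
        m = 2 * ((2 * j + 1 - n * (n + 1)) // 4)
    else:
        m = 2 * ((2 * (j + 1) - n * (n + 1)) // 4) - 1
    return n, m


def nm_to_noll(n: int, m: int, use_sin: bool = False) -> int:
    """(n, m) -> Noll j; for m > 0 the even candidate is the cosine term, the odd one the sine term."""
    if m == 0:
        return n * (n + 1) // 2 + 1
    base = n * (n + 1) // 2 + m
    return base if (base % 2 == 1) == use_sin else base + 1


print([noll_to_nm(j) for j in range(1, 7)])  # [(0, 0), (1, 1), (1, 1), (2, 0), (2, 2), (2, 2)]
```

Because (n, m) alone does not fix the cos/sin choice for m > 0, the reverse mapping needs the extra flag. This is why the Noll single index is often preferred for labeling coefficients.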
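The non-normalized real Zernike circle polynomials (denoted here by $U_j$ to distinguish them from the normalized $Z_j$) are obtained by dropping the normalization factors from the normalized Zernike circle polynomials:

$$ U_j(\rho,\theta)=\begin{cases}R_n^m(\rho)\cos(m\theta), & m\neq0,\ j\ \text{even},\\ R_n^m(\rho)\sin(m\theta), & m\neq0,\ j\ \text{odd},\\ R_n^0(\rho), & m=0.\end{cases} \tag{10} $$

They are related to the normalized polynomials by

$$ Z_j(\rho,\theta)=\sqrt{\frac{2(n+1)}{1+\delta_{m0}}}\;U_j(\rho,\theta). $$

The orthogonality of the non-normalized Zernike circle polynomials can be written as:

$$ \frac{\int_0^{2\pi}\!\int_0^1 U_j(\rho,\theta)\,U_{j'}(\rho,\theta)\,\rho\,\mathrm{d}\rho\,\mathrm{d}\theta}{\int_0^{2\pi}\!\int_0^1 \rho\,\mathrm{d}\rho\,\mathrm{d}\theta}=\frac{1+\delta_{m0}}{2(n+1)}\,\delta_{jj'}. \tag{12} $$

Note that the integral in the denominator equals π. Figures 7 and 8 show three-dimensional (3D) visualizations of the non-normalized Zernike circle polynomials up to the sixth degree and the corresponding interferometric fringe patterns as seen in optical testing [37].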

Complex Zernike circle polynomials.
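The Zernike circle polynomials in the complex domain were not defined in Noll's original work [9]. However, they can be obtained from Bhatia and Wolf's work [3] by replacing the azimuthal functions of the real Zernike circle polynomials with a complex exponential. The orthonormal complex Zernike circle polynomials can be written as [4]:

$$ Z_n^l(\rho,\theta)=\sqrt{n+1}\,R_n^{|l|}(\rho)\,e^{\mathrm{i}l\theta}, $$

where n is a non-negative integer, l is an integer, and n − |l| ⩾ 0 is even. The radial polynomial is defined as:

$$ R_n^{|l|}(\rho)=\sum_{s=0}^{(n-|l|)/2}\frac{(-1)^s\,(n-s)!}{s!\left(\frac{n+|l|}{2}-s\right)!\left(\frac{n-|l|}{2}-s\right)!}\,\rho^{n-2s}. $$

The normalized complex Zernike circle polynomials satisfy the orthonormality condition

$$ \frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^1 Z_n^l(\rho,\theta)\,\left[Z_{n'}^{l'}(\rho,\theta)\right]^{*}\rho\,\mathrm{d}\rho\,\mathrm{d}\theta=\delta_{nn'}\,\delta_{ll'}, \tag{15} $$

where * denotes the complex conjugate.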
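The non-normalized complex Zernike circle polynomials are defined as [32]:

$$ V_n^l(\rho,\theta)=R_n^{|l|}(\rho)\,e^{\mathrm{i}l\theta}. $$

Their orthogonality can be expressed as:

$$ \int_0^{2\pi}\!\!\int_0^1 V_n^l(\rho,\theta)\,\left[V_{n'}^{l'}(\rho,\theta)\right]^{*}\rho\,\mathrm{d}\rho\,\mathrm{d}\theta=\frac{\pi}{n+1}\,\delta_{nn'}\,\delta_{ll'}. \tag{17} $$

The complex and real definitions of Zernike circle polynomials are related via Euler's formula [3,32]. For l > 0, $R_n^l(\rho)\cos(l\theta)=\tfrac12\left(V_n^l+V_n^{-l}\right)$ and $R_n^l(\rho)\sin(l\theta)=\tfrac{1}{2\mathrm{i}}\left(V_n^l-V_n^{-l}\right)$. The complex version is useful for defining Zernike moments in image analysis, which are discussed in section 4.6.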

The OSA/ANSI indexing scheme.
The OSA/ANSI indices for Zernike circle polynomials were initially developed by an OSA Standards Taskforce in 1999 to reach consensus recommendations on definitions, conventions, and standards for reporting of optical aberrations of human eyes [26,27]. It was later standardized in ANSI Z80.28 [13,14] and ISO 24157 [15,17] and adopted in some commercial software, such as COMSOL Ray Optics Module [38].
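The Zernike circle polynomials in the OSA/ANSI indices employ a right-handed coordinate system, as shown in figure 9, and are defined in the real domain as [13,14,26,27]:

$$ Z_n^m(\rho,\theta)=\begin{cases}N_n^m\,R_n^{|m|}(\rho)\cos(m\theta), & m\ge0,\\ -N_n^m\,R_n^{|m|}(\rho)\sin(m\theta), & m<0,\end{cases} $$

where n is a non-negative integer, m is an integer, n − |m| ⩾ 0 is even, and j is a mode-ordering number starting from 0. The radial polynomial $R_n^{|m|}(\rho)$ is defined as:

$$ R_n^{|m|}(\rho)=\sum_{s=0}^{(n-|m|)/2}\frac{(-1)^s\,(n-s)!}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\,\rho^{n-2s}. \tag{19} $$

The normalization factor $N_n^m$ can be written as:

$$ N_n^m=\sqrt{\frac{2(n+1)}{1+\delta_{m0}}}. $$

The Zernike circle polynomials in the OSA/ANSI indices can be sorted either by the single index j or by the double indices n and m. To convert between these indices, one can use the following relationships [26,27]:

$$ j=\frac{n(n+2)+m}{2},\qquad n=\left\lceil\frac{-3+\sqrt{9+8j}}{2}\right\rceil,\qquad m=2j-n(n+2), $$

where ⌈x⌉ denotes the ceiling function, which returns the least integer greater than or equal to x. For example, when j = 4, n = ⌈1.7⌉ = 2 and m = 0. Table 3 lists the first 37 Zernike circle polynomials in the OSA/ANSI indices, together with the values of n, m, and j.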
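The OSA/ANSI conversions are simple enough to code directly. The sketch below implements the ceiling formula for j → (n, m) and the closed form for (n, m) → j. The helper names `osa_to_nm` and `nm_to_osa` are our own. Floating-point `sqrt` is adequate for the modest j values used in practice.

```python
from math import ceil, sqrt


def osa_to_nm(j: int) -> tuple:
    """OSA/ANSI single index j >= 0 -> (n, m) via n = ceil((-3 + sqrt(9 + 8j)) / 2)."""
    n = ceil((-3 + sqrt(9 + 8 * j)) / 2)
    m = 2 * j - n * (n + 2)
    return n, m


def nm_to_osa(n: int, m: int) -> int:
    """(n, m) -> OSA/ANSI single index j = (n(n + 2) + m) / 2 (always an integer)."""
    return (n * (n + 2) + m) // 2


print(osa_to_nm(4))  # (2, 0), the worked example in the text
```

Unlike the Noll scheme, the signed m makes the OSA/ANSI mapping one-to-one, so no extra cos/sin flag is needed.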

The Fringe indexing scheme.
The Zernike circle polynomials under the fringe indexing scheme (also known as the USAF set) were first developed by John Loomis for the interferogram analysis program FRINGE at the University of Arizona Optical Sciences Center in the 1970s [10,11,40,41]. They form a low-order Zernike set supplemented with higher-order radial polynomials, and they are preferred in lens design and optical metrology because they group terms according to wavefront aberration order [42,43].
The Zernike circle polynomials under the fringe indices have no normalization factors and can be written as:

$$ Z_j(\rho,\theta)=\begin{cases}R_n^{|m|}(\rho)\cos(m\theta), & m\ge0,\\ R_n^{|m|}(\rho)\sin(|m|\theta), & m<0,\end{cases} $$

where n is a non-negative integer, m is an integer, n − |m| ⩾ 0 is even, and j is a mode-ordering number starting from 0 (in CODE V and Zemax, j starts from 1 instead of 0). The radial polynomial $R_n^{|m|}(\rho)$ is the same as in equation (19). Note that the above formulas are modified from the Wyant and Creath formula [19] to facilitate comparisons with other indices. The final mathematical expression for each term, as listed in table 4, is the same as that in Wyant's notation.
Defining N = (n + |m|)/2, the Zernike fringe polynomials are sorted as follows: first arrange N in ascending order from 0 to 6; then sort n in ascending order for a given N; finally, sort m in descending order for given N and n. Compared with other Zernike sets, the Zernike fringe set is unique in that it has only 37 terms: all 36 terms with N ⩽ 5 plus the single rotationally symmetric term with N = 6 (n = 12, m = 0). This small polynomial set is useful for interferogram analysis and automatic lens design and is widely adopted in commercial optical software, such as Zemax [25], CODE V [29], OSLO [44,45], and MetroPro [46].
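The fringe ordering rule can be turned into a short generator. This is a sketch under the stated rule: N = (n + |m|)/2 ascending, then n ascending, then m descending. It uses all terms with N ⩽ 5 plus the single N = 6 spherical term. The function name `fringe_indices` is hypothetical, and indices are 0-based as in the text.

```python
def fringe_indices() -> list:
    """The 37 Fringe (n, m) pairs in order; list position = 0-based j."""
    order = []
    for N in range(6):                      # N = (n + |m|) / 2 = 0..5 gives 36 terms
        for n in range(N, 2 * N + 1):       # n ascending; |m| = 2N - n keeps n >= |m|
            mabs = 2 * N - n
            order.extend([(n, 0)] if mabs == 0 else [(n, mabs), (n, -mabs)])  # m descending
    order.append((12, 0))                   # the lone N = 6 term (Z37 in 1-based software)
    return order


terms = fringe_indices()
print(terms[:9])   # piston, tilts, defocus, astigmatism, coma, primary spherical
```

The first nine pairs reproduce the familiar piston, x/y tilt, defocus, astigmatism, coma, and spherical sequence of the fringe set.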

The ISO-14999 indexing scheme.
The ISO-14999 indices were first published by ISO in the technical report ISO/TR 14999-2 in 2005 [28] for describing wavefronts in the interferometric measurement of optical elements and optical systems, and were updated in 2019 [16].
The Zernike circle polynomials under the ISO-14999 indexing scheme have no normalization factors and can be written as [16,28]:

$$ Z_j(\rho,\theta)=\begin{cases}R_n^{|m|}(\rho)\cos(m\theta), & m\ge0,\\ R_n^{|m|}(\rho)\sin(|m|\theta), & m<0,\end{cases} $$

where n is a non-negative integer, m is an integer, n − |m| ⩾ 0 is even, and j is a mode-ordering number starting from 0 (j = 0, 1, 2, …, ∞). The radial polynomial $R_n^{|m|}(\rho)$ is the same as in equation (19). Defining N = n + |m|, the Zernike circle polynomials under the ISO-14999 indices are sorted as follows: first arrange N in ascending order from 0 to ∞; then sort n in ascending order for a given N; finally, sort m in descending order for given N and n. Note that the ISO-14999 indices share almost the same definition as the fringe indices, except that the former contain infinitely many terms. In fact, the fringe set is a subset of the ISO-14999 set, which CODE V calls the Extended Fringe Zernike Polynomials [29]. Table 5 lists the first 37 terms of the ISO-14999 set and the values of n, m, N, and j.
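A sketch of the ISO-14999 ordering follows. It uses N = n + |m|, which is always even because n − |m| is even, then sorts by n ascending and m descending. The name `iso14999_indices` is our own, and indices are 0-based.

```python
def iso14999_indices(num_terms: int) -> list:
    """First num_terms (n, m) pairs in ISO 14999 order, with N = n + |m| ascending."""
    order = []
    N = 0
    while len(order) < num_terms:
        for n in range(N // 2, N + 1):       # n ascending; |m| = N - n keeps n >= |m|
            mabs = N - n
            order.extend([(n, 0)] if mabs == 0 else [(n, mabs), (n, -mabs)])  # m descending
        N += 2                               # N only takes even values
    return order[:num_terms]


order = iso14999_indices(49)
print(order[34], order[35], order[36])   # (9, -1) (10, 0) (6, 6)
```

The first 36 entries coincide with the fringe set. The fringe set's extra n = 12, m = 0 term appears only later, at ISO j = 48, which illustrates the subset relationship described above.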

The Born and Wolf indexing scheme.
In the classic textbook Principles of Optics, Born and Wolf reviewed the definition of Zernike circle polynomials and used it to expand aberration functions [32,47]. Many authors [12,48] later followed the Born and Wolf definition. Under the Born and Wolf indexing scheme, the Zernike circle polynomials can be written as [3,32]:

$$ Z_j(\rho,\theta)=\begin{cases}R_n^{|m|}(\rho)\cos(m\theta), & m\ge0,\\ R_n^{|m|}(\rho)\sin(|m|\theta), & m<0,\end{cases} $$

where n is a non-negative integer, m is an integer, n − |m| ⩾ 0 is even, and j is a mode-ordering number starting from 1 (j = 1, 2, …, ∞). The radial polynomial $R_n^{|m|}(\rho)$ is the same as in equation (19). The Zernike circle polynomials under the Born and Wolf indices are sorted as follows: first arrange n in ascending order from 0 to ∞, and then sort m in descending order for a given n. The Born and Wolf indices are used by several authors [12,48] and in software [29]. For example, although CODE V does not explicitly define the standard Zernike polynomials, the polynomials tabulated in its manual [29] have the same expressions as those in the Born and Wolf indices. Table 6 lists the first 37 Zernike polynomials under the Born and Wolf indices and the values of n, m, and j.
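Because the Born and Wolf ordering only sorts by n ascending and then m descending, it reduces to a one-line comprehension. The function name `born_wolf_indices` is illustrative. The 1-based index j is the list position plus one.

```python
def born_wolf_indices(n_max: int) -> list:
    """(n, m) pairs in Born and Wolf order: n ascending, then m = n, n - 2, ..., -n."""
    return [(n, m) for n in range(n_max + 1) for m in range(n, -n - 1, -2)]


terms = born_wolf_indices(8)
print(terms[:6])   # j = 1..6
```

For example, j = 5 is (n, m) = (2, 0), i.e. the defocus term $2\rho^2-1$, consistent with the CODE V listing mentioned above.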

The Malacara indexing scheme.
Unlike the five indexing schemes above, the Malacara indices use a different coordinate convention, in which the polar angle θ is measured clockwise from the +y-axis. The Zernike circle polynomials under the Malacara indices have no normalization factors and can be written as:

$$ Z_j(\rho,\theta)=\begin{cases}R_n^{|m|}(\rho)\sin(m\theta), & m>0,\\ R_n^{|m|}(\rho)\cos(m\theta), & m\le0,\end{cases} $$

where n is a non-negative integer, m is an integer, n − |m| ⩾ 0 is even, and j is a mode-ordering number starting from 1 (j = 1, 2, …, ∞). The radial polynomial $R_n^{|m|}(\rho)$ is the same as in equation (19). The Zernike circle polynomials in the Malacara indices follow the same ordering scheme as the Born and Wolf indices: first arrange n in ascending order from 0 to ∞, and then sort m in descending order for a given n. The Malacara indices are mainly used in the first and second editions of the well-known book Optical Shop Testing [30,31], whereas the third edition defines the Zernike circle polynomials under the Noll indexing scheme [36]. Table 7 lists the first 37 Zernike circle polynomials under the Malacara indices and the values of n, m, and j.
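Translating data between the Malacara convention and the others starts with the angle. The sketch below converts the standard angle θ (counterclockwise from +x) to the Malacara angle (clockwise from +y). The conversion is $\theta_M = \pi/2 - \theta$, wrapped to [0, 2π). In that convention x = ρ sin θ_M and y = ρ cos θ_M. The helper name `to_malacara_angle` is our own.

```python
import math


def to_malacara_angle(theta: float) -> float:
    """Convert theta (CCW from +x) to theta_M (CW from +y), wrapped to [0, 2*pi)."""
    return (math.pi / 2 - theta) % (2 * math.pi)


x, y = 0.3, 0.4
rho, theta = math.hypot(x, y), math.atan2(y, x)
theta_m = to_malacara_angle(theta)
# In the Malacara convention, x = rho*sin(theta_M) and y = rho*cos(theta_M).
print(rho * math.sin(theta_m), rho * math.cos(theta_m))
```

Since sin(π/2 − θ) = cos θ, a first-order sine term in Malacara's coordinates corresponds to ρ cos θ in the standard convention. Terms with the same (n, m) label can therefore describe different aberration orientations across schemes.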

Comparisons.
The different indexing schemes of Zernike circle polynomials are compared and summarized in table 8 in terms of coordinate system, normalization, and ordering strategy. In particular, the ordering of the polynomials under the different indices is shown in figure 10 for the first few degrees. An illustration of the sources and applications of the six indexing schemes is presented in figure 11. For convenience, the Noll indices are used in the remainder of this article unless otherwise stated.

Mathematical properties
In this section, we review major mathematical properties of Zernike circle polynomials, including orthogonality, symmetry, Fourier transform, integral representation of radial polynomials, derivatives, and recurrence relations. For more properties, one can refer to [51,52].

Orthogonality.
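The orthogonality relationships of the real and complex Zernike circle polynomials have been presented in equations (7), (12), (15), and (17). Moreover, the radial and azimuthal functions of the Zernike circle polynomials are separately orthogonal and satisfy the following relationships [36]:

$$ \int_0^1 R_n^m(\rho)\,R_{n'}^m(\rho)\,\rho\,\mathrm{d}\rho=\frac{\delta_{nn'}}{2(n+1)}, $$

$$ \int_0^{2\pi}\cos(m\theta)\cos(m'\theta)\,\mathrm{d}\theta=\pi(1+\delta_{m0})\,\delta_{mm'},\qquad \int_0^{2\pi}\sin(m\theta)\sin(m'\theta)\,\mathrm{d}\theta=\pi\,\delta_{mm'}\ \ (m,m'>0),\qquad \int_0^{2\pi}\cos(m\theta)\sin(m'\theta)\,\mathrm{d}\theta=0. $$

Here the azimuthal function of $Z_j$ is cos mθ when j is even or m = 0, and sin mθ when j is odd and m ≠ 0. Hence the azimuthal integral equals π(1 + δ_m0)δ_mm′ when both terms are cosine terms, πδ_mm′ when both are sine terms, and 0 otherwise.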
Note that the Noll indices are used here.
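The radial orthogonality relation can be checked numerically. The sketch below uses Gauss-Legendre quadrature mapped to [0, 1] via `numpy.polynomial.legendre.leggauss`. With 40 nodes the rule integrates polynomials up to degree 79 exactly, which covers $R_n^m R_{n'}^m \rho$ for n, n′ ⩽ 39. The helper names `radial` and `radial_inner` are our own.

```python
import numpy as np
from math import factorial


def radial(n, m, rho):
    """Vectorized R_n^m(rho); zeros when m > n or n - m is odd."""
    rho = np.asarray(rho, dtype=float)
    m = abs(m)
    out = np.zeros_like(rho)
    if m > n or (n - m) % 2:
        return out
    for s in range((n - m) // 2 + 1):
        c = (-1) ** s * factorial(n - s) / (
            factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
        out = out + c * rho ** (n - 2 * s)
    return out


# Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1].
x, w = np.polynomial.legendre.leggauss(40)
rho_q, w_q = 0.5 * (x + 1.0), 0.5 * w


def radial_inner(n1, n2, m):
    """Quadrature value of the integral of R_n1^m R_n2^m rho over [0, 1]."""
    return float(np.sum(w_q * radial(n1, m, rho_q) * radial(n2, m, rho_q) * rho_q))


print(radial_inner(4, 4, 0), 1 / (2 * (4 + 1)))
```

Because the integrand is a polynomial, the quadrature is exact up to rounding. Any deviation from $\delta_{nn'}/[2(n+1)]$ would therefore point to an error in the radial polynomial itself.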

Symmetry.
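The symmetry of Zernike circle polynomials under inversion through the origin (a rotation by π) can be expressed as:

$$ Z_j(\rho,\theta+\pi)=(-1)^m\,Z_j(\rho,\theta)=(-1)^n\,Z_j(\rho,\theta), $$

where the second equality holds because n − m is even.

Fourier transform.
The two-dimensional Fourier transform of a Zernike circle polynomial over the unit disk is defined here as

$$ \tilde{Z}_j(u,v)=\iint_{x^2+y^2\le1} Z_j(x,y)\,e^{-\mathrm{i}2\pi(ux+vy)}\,\mathrm{d}x\,\mathrm{d}y, $$

where $\tilde{Z}_j$ is the Fourier transform of $Z_j$ and (u, v) are Cartesian coordinates in the frequency domain. Using (r, ϕ) to denote polar coordinates in the frequency domain and applying the transformations x = ρ cos θ, y = ρ sin θ, u = r cos ϕ, and v = r sin ϕ, the Fourier transform of the Zernike circle polynomials can be written as [9,53]:

$$ \tilde{Z}_j(r,\phi)=\sqrt{n+1}\,(-1)^{(n-m)/2}(-\mathrm{i})^m\,\frac{J_{n+1}(2\pi r)}{r}\times\begin{cases}\sqrt2\cos(m\phi), & m\neq0,\ j\ \text{even},\\ \sqrt2\sin(m\phi), & m\neq0,\ j\ \text{odd},\\ 1, & m=0,\end{cases} $$

where $J_n(x)$ is the nth-order Bessel function of the first kind, defined as [54]:

$$ J_n(x)=\frac{1}{\pi}\int_0^{\pi}\cos(n\tau-x\sin\tau)\,\mathrm{d}\tau. $$

The prefactor of $\tilde{Z}_j$ depends on the sign and scaling convention chosen for the Fourier kernel. The Fourier transform of Zernike circle polynomials is useful for converting between the Zernike coefficients and the Fourier series coefficients of a wavefront [53].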
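The core of the Fourier-transform result is the one-dimensional Hankel-type identity $\int_0^1 R_n^m(\rho)J_m(v\rho)\rho\,\mathrm{d}\rho=(-1)^{(n-m)/2}J_{n+1}(v)/v$. The sketch below checks it numerically using only NumPy. It evaluates $J_n$ through the Bessel integral definition given above and computes both integrals with Gauss-Legendre quadrature. All function names are our own, and the node counts (200 and 100) are an assumption chosen to be ample for the moderate arguments tested.

```python
import numpy as np
from math import factorial


def radial(n, m, rho):
    """Vectorized R_n^m(rho); zeros when m > n or n - m is odd."""
    rho = np.asarray(rho, dtype=float)
    m = abs(m)
    out = np.zeros_like(rho)
    if m > n or (n - m) % 2:
        return out
    for s in range((n - m) // 2 + 1):
        c = (-1) ** s * factorial(n - s) / (
            factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
        out = out + c * rho ** (n - 2 * s)
    return out


# Gauss-Legendre rules mapped to [0, pi] (Bessel integral) and [0, 1] (radial integral).
_t, _wt = np.polynomial.legendre.leggauss(200)
TAU, W_TAU = 0.5 * np.pi * (_t + 1.0), 0.5 * np.pi * _wt
_r, _wr = np.polynomial.legendre.leggauss(100)
RHO, W_RHO = 0.5 * (_r + 1.0), 0.5 * _wr


def bessel_j(n, x):
    """J_n(x) = (1/pi) * int_0^pi cos(n*tau - x*sin(tau)) dtau, for integer n >= 0."""
    x = np.asarray(x, dtype=float)
    phase = n * TAU - np.multiply.outer(x, np.sin(TAU))
    return np.cos(phase) @ W_TAU / np.pi


def radial_hankel(n, m, v):
    """Quadrature value of int_0^1 R_n^m(rho) J_m(v*rho) rho drho."""
    return float(np.sum(W_RHO * radial(n, m, RHO) * bessel_j(m, v * RHO) * RHO))


def radial_hankel_closed_form(n, m, v):
    """Closed form (-1)^((n - m)/2) J_{n+1}(v) / v."""
    return float((-1) ** ((n - m) // 2) * bessel_j(n + 1, v) / v)


print(radial_hankel(4, 2, 3.7), radial_hankel_closed_form(4, 2, 3.7))
```

In production code one would normally call a library Bessel routine instead. The integral definition is used here only to keep the example dependency-free and tied to the formula in the text.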

Integral representation of radial polynomials.
Substituting equation (35) into the inverse Fourier transform of the Zernike circle polynomials (equation (34)) yields an integral representation of the Zernike radial polynomials [2,6]:

$$ R_n^m(\rho)=(-1)^{(n-m)/2}\int_0^{\infty}J_{n+1}(s)\,J_m(\rho s)\,\mathrm{d}s,\qquad 0\le\rho<1. $$

This integral representation is useful for deriving a recurrence relation for the derivatives of the Zernike radial polynomials [9].

2.2.5. Derivatives.
The integral representation of the radial polynomials provides a good starting point for calculating derivatives. The derivatives of the radial polynomials can be written as a recursion relation [9]: In polar coordinates, the partial derivatives of the Zernike circle polynomials under the Noll indices with respect to x and y can be written as [18,55]: where the normalization factor and the azimuthal function are: In Cartesian coordinates, the partial derivatives of the Zernike circle polynomials under the OSA/ANSI indices with respect to x and y can be written as [14]: where n′ increases in steps of 2 in the summations, and the first summation vanishes if |m| + 1 is larger than n − 1. The Cartesian derivatives of Zernike circle polynomials can also be obtained using recurrence relations, as reported in [55][56][57]. The derivatives of Zernike circle polynomials are useful for certain problems, such as ray tracing in optical design [58] and wavefront reconstruction in wavefront sensing [22,24].
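One radial-derivative recurrence that can be verified independently is $\mathrm{d}R_n^m/\mathrm{d}\rho=n\left[R_{n-1}^{|m-1|}+R_{n-1}^{m+1}\right]+\mathrm{d}R_{n-2}^m/\mathrm{d}\rho$, with $R_n^m\equiv0$ for n < m. We state it here as an assumption; its form may differ from the one given in [9]. The sketch below builds each $R_n^m$ as a `numpy.polynomial.Polynomial` and compares the recurrence with exact polynomial differentiation. The function names are our own.

```python
import numpy as np
from math import factorial
from numpy.polynomial import Polynomial


def radial_poly(n, m):
    """R_n^m as a Polynomial in rho; the zero polynomial when undefined (n < |m| or parity mismatch)."""
    m = abs(m)
    if n < 0 or m > n or (n - m) % 2:
        return Polynomial([0.0])
    coef = np.zeros(n + 1)
    for s in range((n - m) // 2 + 1):
        coef[n - 2 * s] = (-1) ** s * factorial(n - s) / (
            factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
    return Polynomial(coef)


def radial_derivative_rec(n, m):
    """dR_n^m/drho from the assumed recurrence n*(R_{n-1}^{|m-1|} + R_{n-1}^{m+1}) + dR_{n-2}^m."""
    m = abs(m)
    if n <= 0 or m > n or (n - m) % 2:
        return Polynomial([0.0])
    return (n * (radial_poly(n - 1, abs(m - 1)) + radial_poly(n - 1, m + 1))
            + radial_derivative_rec(n - 2, m))


print(radial_derivative_rec(4, 0))   # should match radial_poly(4, 0).deriv() = 24 rho^3 - 12 rho
```

Because all coefficients are integers well below $2^{53}$, the comparison with `Polynomial.deriv` is exact rather than merely approximate.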

Recurrence relations.
Computing high-order Zernike circle polynomials is necessary in some applications, such as Zernike moment-based image analysis. Although the radial polynomials have an explicit formula (equation (19)), direct numerical evaluation suffers from low computational efficiency and possible cancellation errors [59][60][61]. To address these problems, various recurrence relations have been proposed for evaluating the radial polynomials [60][61][62][63][64]. Here we briefly review four widely used recurrence methods: the modified Kintner method [59,65], Prata's method [66], the q-recursive method [59], and the Shakibaei and Paramesran method [62]. The modified Kintner method was first proposed by Kintner in 1976 [65] and improved by Chong et al in 2003 [59], who added recurrence relations for the special cases n − m = 0 and 2. The improved recurrence relation can be expressed as:

$$ R_n^m(\rho)=\begin{cases}\rho^n, & n-m=0,\\ n\rho^n-(n-1)\rho^{n-2}, & n-m=2,\\ \dfrac{(K_2\rho^2+K_3)\,R_{n-2}^m(\rho)+K_4\,R_{n-4}^m(\rho)}{K_1}, & n-m\ge4,\end{cases} $$

where n and m are non-negative integers satisfying n − m ⩾ 0 with n − m even, and the coefficients are given by:

$$ K_1=\frac{(n+m)(n-m)(n-2)}{2},\quad K_2=2n(n-1)(n-2),\quad K_3=-m^2(n-1)-n(n-1)(n-2),\quad K_4=-\frac{n(n+m-2)(n-m-2)}{2}. $$

The modified Kintner method is a degree-varying (n-varying) approach that computes radial polynomials of higher order from those of lower order for a fixed m. Prata's method was proposed by Prata and Rusch in 1989 [66], and its recurrence relation can be written as:

$$ R_n^m(\rho)=L_1\,R_{n-1}^{|m-1|}(\rho)+L_2\,R_{n-2}^m(\rho)\quad(n>m),\qquad R_n^n(\rho)=\rho^n, $$

where

$$ L_1=\frac{2\rho n}{m+n},\qquad L_2=-\frac{n-m}{n+m}. $$

The q-recursive method was proposed by Chong et al [59], and its three-term recurrence relation can be written as:

$$ R_n^{m-4}(\rho)=H_1\,R_n^m(\rho)+\left(H_2+\frac{H_3}{\rho^2}\right)R_n^{m-2}(\rho), $$

where the coefficients are given by:

$$ H_1=\frac{m(m-1)}{2}-mH_2+\frac{H_3(n+m+2)(n-m)}{8},\quad H_2=\frac{H_3(n+m)(n-m+2)}{4(m-1)}+(m-2),\quad H_3=\frac{-4(m-2)(m-3)}{(n+m-2)(n-m+4)}, $$

with the starting values $R_n^n(\rho)=\rho^n$ and $R_n^{n-2}(\rho)=n\rho^n-(n-1)\rho^{n-2}$. Unlike the modified Kintner and Prata methods, the q-recursive method is an m-varying method that computes radial polynomials of lower m from those of higher m for a fixed radial order n. The Shakibaei and Paramesran method uses a particularly simple recursion, in which a radial polynomial is expressed as a linear combination of three previously computed radial polynomials [62]:

$$ R_n^m(\rho)=\rho\left[R_{n-1}^{|m-1|}(\rho)+R_{n-1}^{m+1}(\rho)\right]-R_{n-2}^m(\rho). $$

The recursion can be initialized with $R_0^0(\rho)=1$ and $R_n^m(\rho)\equiv0$ when n < m.
According to [67], this recursion outperforms both the Prata method and the q-recursive method in speed and accuracy in an image processing setting.
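To make two of these recurrences concrete, the sketch below implements the Shakibaei and Paramesran recursion as a table-filling loop and the modified Kintner recursion as an n-stepping loop. The test compares both against the explicit factorial sum. The recurrence forms follow the relations stated in this subsection, but the function names and the table layout (an extra m column so that $R_{n-1}^{m+1}$ stays in range) are our own.

```python
import numpy as np
from math import factorial


def radial_direct(n, m, rho):
    """Explicit factorial sum for R_n^m(rho) (reference values)."""
    rho = np.asarray(rho, dtype=float)
    out = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = (-1) ** s * factorial(n - s) / (
            factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
        out = out + c * rho ** (n - 2 * s)
    return out


def radial_table_sp(n_max, rho):
    """All R_n^m(rho) for 0 <= m <= n <= n_max via the Shakibaei-Paramesran recursion.

    Entries never assigned stay zero, which supplies R_n^m = 0 for n < m.
    """
    rho = np.asarray(rho, dtype=float)
    R = np.zeros((n_max + 1, n_max + 2) + rho.shape)
    R[0, 0] = 1.0
    for n in range(1, n_max + 1):
        for m in range(n % 2, n + 1, 2):
            older = R[n - 2, m] if n >= 2 else 0.0
            R[n, m] = rho * (R[n - 1, abs(m - 1)] + R[n - 1, m + 1]) - older
    return R


def radial_kintner(n, m, rho):
    """Modified Kintner recursion: step n upward in twos for fixed m."""
    rho = np.asarray(rho, dtype=float)
    r_older = rho ** m                                      # R_m^m
    if n == m:
        return r_older
    r_newer = (m + 2) * rho ** (m + 2) - (m + 1) * rho ** m  # R_{m+2}^m
    for k in range(m + 4, n + 1, 2):
        K1 = (k + m) * (k - m) * (k - 2) / 2
        K2 = 2 * k * (k - 1) * (k - 2)
        K3 = -m ** 2 * (k - 1) - k * (k - 1) * (k - 2)
        K4 = -k * (k + m - 2) * (k - m - 2) / 2
        r_older, r_newer = r_newer, ((K2 * rho ** 2 + K3) * r_newer + K4 * r_older) / K1
    return r_newer


rho = np.linspace(0.0, 1.0, 21)
R = radial_table_sp(10, rho)
print(np.allclose(R[6, 2], radial_direct(6, 2, rho)), np.allclose(radial_kintner(6, 2, rho), R[6, 2]))
```

The table-filling form suits applications such as Zernike moments that need every (n, m) up to some order. The Kintner loop is convenient when only one azimuthal order m is required.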

Summary.
The mathematical properties of Zernike circle polynomials are summarized in table 9.

Wavefront fitting
2.3.1. Mathematical formulation.
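A wavefront function W defined over a circular pupil can be represented by a linear combination of a finite number of Zernike circle polynomials as [31,34,68,69]:

$$ W(R\rho,\theta)=\sum_{j=1}^{J}a_j\,Z_j(\rho,\theta), $$

where R is the radius of the pupil, 0 ⩽ ρ ⩽ 1, J is the number of terms, $a_j$ are the expansion coefficients, and $Z_j$ is the jth Zernike circle polynomial. The equation can be equivalently expressed in Cartesian coordinates as:

$$ W(Rx,Ry)=\sum_{j=1}^{J}a_j\,Z_j(x,y),\qquad x^2+y^2\le1. $$

Written in discrete matrix form, equation (51) becomes:

$$ \mathbf{W}=\mathbf{Z}\,\mathbf{a}, \tag{52} $$

where

$$ \mathbf{W}=\begin{bmatrix}W_1\\ \vdots\\ W_K\end{bmatrix},\qquad \mathbf{Z}=\begin{bmatrix}Z_1(x_1,y_1)&\cdots&Z_J(x_1,y_1)\\ \vdots&\ddots&\vdots\\ Z_1(x_K,y_K)&\cdots&Z_J(x_K,y_K)\end{bmatrix},\qquad \mathbf{a}=\begin{bmatrix}a_1\\ \vdots\\ a_J\end{bmatrix}, $$

and K is the total number of data points within the unit circle. Generally, equation (52) is an overdetermined linear system with more equations (K) than unknowns (J). It can be converted into the normal equation [34,68]:

$$ \mathbf{Z}^{\mathrm T}\mathbf{Z}\,\mathbf{a}=\mathbf{Z}^{\mathrm T}\mathbf{W}, $$

where the superscript T denotes the matrix transpose. The solution can be obtained by matrix inversion as:

$$ \mathbf{a}=\left(\mathbf{Z}^{\mathrm T}\mathbf{Z}\right)^{-1}\mathbf{Z}^{\mathrm T}\mathbf{W}. $$

Zernike-based wavefront fitting has several useful properties [18]. First, truncating the expansion of a wavefront does not change the expansion coefficients. In other words, the coefficients are independent of each other, and each can be computed directly as an inner product:

$$ a_j=\frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^1 W(R\rho,\theta)\,Z_j(\rho,\theta)\,\rho\,\mathrm{d}\rho\,\mathrm{d}\theta. $$

Strictly, this holds for continuous data over the full unit disk; with discrete sampling, orthogonality and hence independence hold only approximately. Second, all Zernike terms except the piston term have a mean value of zero, so the mean value of a wavefront equals the piston coefficient:

$$ \overline{W}=\frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^1 W(R\rho,\theta)\,\rho\,\mathrm{d}\rho\,\mathrm{d}\theta=a_1. $$

Third, the wavefront variance equals the sum of the squares of the expansion coefficients, excluding the piston coefficient:

$$ \sigma_W^2=\frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^1\left[W(R\rho,\theta)-\overline{W}\right]^2\rho\,\mathrm{d}\rho\,\mathrm{d}\theta=\sum_{j=2}^{J}a_j^2. $$

The properties of Zernike-based wavefront fitting are summarized in table 10.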
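The least-squares fit above can be sketched with NumPy for the first six Noll terms, written out explicitly in Cartesian form. The synthetic coefficients, the 201 × 201 sampling grid, and the helper name `noll_first_six` are illustrative assumptions. `numpy.linalg.lstsq` is used instead of forming $(\mathbf{Z}^{\mathrm T}\mathbf{Z})^{-1}$ explicitly, which is numerically safer but solves the same least-squares problem.

```python
import numpy as np


def noll_first_six(x, y):
    """Columns Z_1..Z_6 (Noll, orthonormal) at Cartesian points inside the unit disk."""
    r2 = x ** 2 + y ** 2
    return np.column_stack([
        np.ones_like(x),                     # Z1: piston
        2.0 * x,                             # Z2: 2 rho cos(theta)
        2.0 * y,                             # Z3: 2 rho sin(theta)
        np.sqrt(3.0) * (2.0 * r2 - 1.0),     # Z4: sqrt(3)(2 rho^2 - 1), defocus
        np.sqrt(6.0) * 2.0 * x * y,          # Z5: sqrt(6) rho^2 sin(2 theta)
        np.sqrt(6.0) * (x ** 2 - y ** 2),    # Z6: sqrt(6) rho^2 cos(2 theta)
    ])


# Sample a square grid and keep the K points inside the unit circle.
g = np.linspace(-1.0, 1.0, 201)
xx, yy = np.meshgrid(g, g)
inside = xx ** 2 + yy ** 2 <= 1.0
x, y = xx[inside], yy[inside]

a_true = np.array([0.3, -0.1, 0.05, 0.2, -0.15, 0.08])
Z = noll_first_six(x, y)        # K x J matrix of equation (52)
W = Z @ a_true                  # synthetic, noise-free wavefront samples

a_fit, *_ = np.linalg.lstsq(Z, W, rcond=None)
print(np.round(a_fit, 6))
# Mean ~ piston coefficient; variance ~ sum of squares of the non-piston coefficients.
print(W.mean(), W.var(), np.sum(a_true[1:] ** 2))
```

With noise-free data the fit recovers the coefficients to rounding error. The mean and variance properties hold only approximately here because the disk is sampled on a discrete grid.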

Transformation of Zernike coefficients with pupil translation, rotation, or resizing.
Zernike polynomials and their associated coefficients are commonly used to quantify the wavefront aberrations of the eye. When the aberrations of different eyes, pupil sizes, or corrections are compared or averaged, it is important that the Zernike coefficients have been calculated for the correct position, orientation, and size of the pupil. In this section, we discuss transformation relationships of Zernike expansion coefficients for translated, rotated, and resized pupils, which are shown in figure 13.

Translation.
Translating a pupil (figure 13(b)) changes the expansion coefficients of a wavefront defined over it. Assuming small displacements ∆x and ∆y along the x and y axes, respectively, the translated wavefront can be expanded to first order in a Taylor series [14,70] as:

$$ W(x+\Delta x,\,y+\Delta y)=\sum_{n,m}b_n^m\,Z_n^m(x,y)\approx\sum_{n,m}a_n^m\left[Z_n^m(x,y)+\Delta x\,\frac{\partial Z_n^m(x,y)}{\partial x}+\Delta y\,\frac{\partial Z_n^m(x,y)}{\partial y}\right]. $$

The new wavefront expansion coefficients $b_n^m$ can be obtained by computing the first-order derivatives of the Zernike circle polynomials in equation (59), which can be expressed as linear combinations of the untransformed Zernike circle polynomials (see section 2.2.5).

Rotation.
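Rotating a pupil (figure 13(c)) similarly changes the expansion coefficients of a wavefront defined over it, which should be taken into account in applications such as vision correction surgery [71]. Consider a wavefront rotated counterclockwise by an angle α with respect to its original coordinate system, i.e. W′(ρ, θ) = W(ρ, θ − α). The transformed Zernike expansion coefficients then follow from the angle-addition formulas. In the OSA/ANSI indices, for m > 0,

$$ b_n^{m}=a_n^{m}\cos(m\alpha)-a_n^{-m}\sin(m\alpha),\qquad b_n^{-m}=a_n^{m}\sin(m\alpha)+a_n^{-m}\cos(m\alpha), $$

while the rotationally symmetric coefficients are unchanged, $b_n^0=a_n^0$.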
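The rotation rule follows from expanding cos(m(θ − α)) and sin(m(θ − α)), under the assumption W′(ρ, θ) = W(ρ, θ − α). The shared radial factor and normalization cancel, so the same 2 × 2 rotation applies to each cosine/sine pair. The sketch below implements it and checks it against a directly rotated angular profile. The function name `rotate_pair` is our own.

```python
import numpy as np


def rotate_pair(a_cos, a_sin, m, alpha):
    """Coefficients of the cos(m*theta)/sin(m*theta) pair after rotating W counterclockwise by alpha.

    Assumes W'(rho, theta) = W(rho, theta - alpha); for OSA/ANSI indices, a_cos = a_n^m and
    a_sin = a_n^{-m} (m > 0). Rotationally symmetric (m = 0) coefficients are unchanged.
    """
    c, s = np.cos(m * alpha), np.sin(m * alpha)
    return a_cos * c - a_sin * s, a_cos * s + a_sin * c


m, alpha = 2, np.deg2rad(30.0)
a_cos, a_sin = 0.4, -0.1
b_cos, b_sin = rotate_pair(a_cos, a_sin, m, alpha)
theta = np.linspace(0.0, 2.0 * np.pi, 13)
rotated = a_cos * np.cos(m * (theta - alpha)) + a_sin * np.sin(m * (theta - alpha))
print(np.allclose(rotated, b_cos * np.cos(m * theta) + b_sin * np.sin(m * theta)))
```

Each (n, |m|) pair transforms as a rotation by mα. Astigmatism (m = 2) therefore returns to itself after a 180° pupil rotation, while coma (m = 1) needs a full turn.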

Resizing.
Comparing the Zernike expansion coefficients of wavefronts measured over pupils of different physical sizes requires that the coefficients refer to the same aperture size. It is therefore necessary to calculate the expansion coefficients for an arbitrary smaller pupil from those of the full pupil. Many transformation relationships for pupil resizing have been developed [72][73][74][75][76][77][78][79][80][81], and two simpler methods were described by Dai [18,82] and Janssen [79,83]. Suppose there are two wavefronts, $W_1$ and $W_2$, defined over concentric pupils with radii $R_1$ and $R_2$, respectively, as shown in figures 13(a) and (d), where $W_2$ is the part of $W_1$ inside the smaller pupil and $R_2$ ⩽ $R_1$. The Zernike expansion of the wavefront $W_2$ can be written as:

$$ W_2(R_2\rho,\theta)=\sum_{n,m}b_n^m\,Z_n^m(\rho,\theta), \tag{61} $$

where 0 ⩽ ρ ⩽ 1 and $b_n^m$ (or $b_j$ in single-index form) are the expansion coefficients of $W_2$ in the OSA/ANSI indices. Defining a scale factor ε = $R_2/R_1$, the expansion can also be written as:

$$ W_2(R_2\rho,\theta)=W_1(R_1\varepsilon\rho,\theta)=\sum_{n,m}a_n^m\,Z_n^m(\varepsilon\rho,\theta), \tag{62} $$

where $a_n^m$ (or $a_j$) are the expansion coefficients of the wavefront $W_1$.
Combining equations (61) and (62), Dai obtained [18,82]:

$$ b_n^m=\sum_{i=0}^{\lfloor (n_{\max}-n)/2\rfloor}G_n^i(\varepsilon)\,a_{n+2i}^m, \tag{63} $$

where $n_{\max}$ is the maximum radial degree of the Zernike circle polynomials, ⌊x⌋ denotes the floor function, which returns the greatest integer less than or equal to x, and $G_n^i(\varepsilon)$ is a resizing factor defined as [18]:

$$ G_n^i(\varepsilon)=\sqrt{(n+1)(n+2i+1)}\;\varepsilon^{n}\sum_{k=0}^{i}\frac{(-1)^{i+k}\,(n+i+k)!}{k!\,(n+k+1)!\,(i-k)!}\,\varepsilon^{2k}. $$

These results show that the transformed coefficient $b_n^m$ is a linear combination of the untransformed coefficients $a_{n'}^m$ with the same m and n′ ⩾ n, so more untransformed coefficients are involved in the transformed coefficients of lower degrees. Table 11 lists the expressions for the resizing factor $G_n^i(\varepsilon)$ and the transformed expansion coefficients $b_n^m$ for $n_{\max}$ = 6. In addition to Dai's formula (equation (63)), a concise expression with an elegant proof was given by Janssen and Dirksen [79,83]:

$$ b_n^m=\sum_{n'}\sqrt{\frac{n'+1}{n+1}}\left[R_{n'}^{n}(\varepsilon)-R_{n'}^{n+2}(\varepsilon)\right]a_{n'}^m, $$

where n′ = n, n + 2, …, $n_{\max}$, and $R_n^{n+2}$ ≡ 0. The Janssen and Dirksen expression is mathematically equivalent to Dai's formula, but it is simpler and involves only radial polynomials, which can provide better numerical stability at high radial degrees.
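The sketch below implements both resizing formulas for the coefficients of a single azimuthal order m (OSA/ANSI normalization). It checks them against a direct evaluation of the scaled wavefront, since $W_1(R_1\varepsilon\rho,\theta)$ must equal $W_2(R_2\rho,\theta)$. The Janssen-Dirksen form follows the expression in the docstring, and Dai's factor is assumed to be $G_n^i(\varepsilon)=\sqrt{(n+1)(n+2i+1)}\,\varepsilon^n\sum_{k=0}^{i}(-1)^{i+k}(n+i+k)!\,\varepsilon^{2k}/[k!\,(n+k+1)!\,(i-k)!]$. The function names and the example coefficients are our own.

```python
import numpy as np
from math import factorial, sqrt


def radial(n, m, rho):
    """Vectorized R_n^m(rho); zero when m > n or n - m is odd (so R_n^{n+2} = 0)."""
    rho = np.asarray(rho, dtype=float)
    m = abs(m)
    out = np.zeros_like(rho)
    if m > n or (n - m) % 2:
        return out
    for s in range((n - m) // 2 + 1):
        c = (-1) ** s * factorial(n - s) / (
            factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
        out = out + c * rho ** (n - 2 * s)
    return out


def resize_janssen_dirksen(a, eps):
    """{n: a_n^m} -> {n: b_n^m} for one fixed m:
    b_n = sum_{n' >= n} sqrt((n'+1)/(n+1)) [R_{n'}^n(eps) - R_{n'}^{n+2}(eps)] a_{n'}."""
    return {n: sum(sqrt((n2 + 1) / (n + 1))
                   * float(radial(n2, n, eps) - radial(n2, n + 2, eps)) * a[n2]
                   for n2 in a if n2 >= n)
            for n in a}


def dai_factor(n, i, eps):
    """Assumed form of Dai's resizing factor G_n^i(eps)."""
    s = sum((-1) ** (i + k) * factorial(n + i + k)
            / (factorial(k) * factorial(n + k + 1) * factorial(i - k)) * eps ** (2 * k)
            for k in range(i + 1))
    return sqrt((n + 1) * (n + 2 * i + 1)) * eps ** n * s


def resize_dai(a, eps):
    """b_n^m = sum_i G_n^i(eps) a_{n+2i}^m for one fixed m."""
    return {n: sum(dai_factor(n, (n2 - n) // 2, eps) * a[n2] for n2 in a if n2 >= n)
            for n in a}


# Example: m = 2 coefficients over a 3 mm pupil rescaled to a 2 mm pupil (eps = 2/3).
m, eps = 2, 2.0 / 3.0
a = {2: 0.4, 4: -0.25, 6: 0.1, 8: 0.05}
b = resize_janssen_dirksen(a, eps)
rho = np.linspace(0.0, 1.0, 11)
# Radial profiles only: the common factor sqrt(2 / (1 + delta_m0)) of N_n^m cancels.
before = sum(a[n] * sqrt(n + 1) * radial(n, m, eps * rho) for n in a)
after = sum(b[n] * sqrt(n + 1) * radial(n, m, rho) for n in b)
print(np.allclose(before, after))
```

Neither resizing factor depends on m, so a full coefficient set can be processed one azimuthal order at a time.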
A simple numerical simulation is presented in figure 14 to demonstrate wavefront resizing. Figure 14(a) shows the original wavefront defined over a 3 mm-radius pupil, and its first 37 expansion coefficients $a_j$ in the OSA/ANSI indices are shown in figure 14(b). Figure 14(c) shows the Zernike expansion coefficients $b_j$ of the 2 mm-radius portion of the original wavefront, obtained with the conversion relationship in equation (63). Figures 14(d) and (e) show the wavefront reconstructed from the transformed coefficients $b_j$ and the ground truth, respectively, and the wavefront difference map is displayed in figure 14(f). The simulation suggests that equation (63) is effective for computing Zernike expansion coefficients over an arbitrary smaller pupil.

Table 11. Resizing factor $G_n^i(\varepsilon)$ and transformed expansion coefficients $b_n^m$ for $n_{\max}$ = 6 [18].
where l is a non-negative integer describing the dependence of the given term on the distance r of the image point from the axis, and n and m are two non-negative integers determining the type of aberration. The first two terms in equation (66) represent the transverse (W_111) and the longitudinal (W_020) focal shifts, respectively. The remaining aberration terms constrained by the relation l + n = 4 are called primary or Seidel aberrations, which include five monochromatic aberrations, namely spherical aberration (W_040), coma (W_131), astigmatism (W_222), field curvature (W_220), and distortion (W_311). These aberrations are sometimes called third-order aberrations because the corresponding ray aberrations, which are obtained as derivatives of the wavefront aberration, are of third order. For a fixed image point, r is a constant and can be absorbed into the coefficients. Assuming the relative aperture and the size of the field are such that higher-order terms can be ignored, the wavefront aberration in equation (66) reduces to [19,20]: Table 12 lists the first- and third-order aberrations. The wavefront aberration of a rotationally symmetric system can also be expanded in a Zernike series instead of a power series. If the first nine Zernike circle polynomials are used for the expansion, the wavefront aberration can be written as [20]:
$$W(\rho,\theta)=a_0+a_1\rho\cos\theta+a_2\rho\sin\theta+a_3(2\rho^2-1)+a_4\rho^2\cos2\theta+a_5\rho^2\sin2\theta+a_6(3\rho^3-2\rho)\cos\theta+a_7(3\rho^3-2\rho)\sin\theta+a_8(6\rho^4-6\rho^2+1). \qquad (68)$$
It can be further rearranged as [20]: where the expressions for the coefficients and phase angles are tabulated in table 13. The expansion in equation (69) has a form similar to that of equation (67), indicating that the coefficients of Seidel aberrations can be converted from Zernike expansion coefficients. However, one should keep in mind that, without considering field dependence, the terms in equation (69) are not true Seidel aberrations. Wavefront measurement using an interferometer provides data at only a single field point; for this reason, field curvature looks like defocus and distortion looks like tilt. A set of wavefronts from different object points must be measured to determine the Seidel aberrations unambiguously from a Zernike expansion.

Table 13. Aberration coefficients and phase angles expressed in terms of the Zernike expansion coefficients of equation (68) [20].
Aberrations | Coefficients | Phase
Piston | a_p = a_0 − a_3 + a_8 | -
… | … | …
Coma | … | ϕ_c = arctan(a_7/a_6)
Spherical | a_s = 6a_8 | -
a The sign in the defocus coefficient is chosen to minimize the magnitude of the coefficient [20].
b The sign in the astigmatism coefficient is chosen to be opposite to the sign in the defocus coefficient [20].
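Collecting like powers of ρ and like harmonics of θ in the nine-term expansion of equation (68) is one way to obtain the grouped form of equation (69). The sketch below is a minimal illustration, assuming the grouping piston + tilt·ρcos(θ − ϕ_t) + defocus·ρ² + astigmatism·ρ²cos²(θ − ϕ_a) + coma·ρ³cos(θ − ϕ_c) + spherical·ρ⁴; the symbols ϕ_t and ϕ_a and the function names are hypothetical, and `atan2` is used as the quadrant-safe version of arctan (so `coma_angle` corresponds to arctan(a_7/a_6) in table 13). Following the table footnotes, the defocus sign is chosen to minimize its magnitude and the astigmatism sign is opposite; in this sketch, choosing the "+" sign for defocus also rotates the astigmatism axis by π/2 so that both forms remain identical.

```python
import math


def zernike9(a, rho, theta):
    """Nine-term Zernike expansion of equation (68); a = [a0, ..., a8]."""
    return (a[0] + a[1] * rho * math.cos(theta) + a[2] * rho * math.sin(theta)
            + a[3] * (2 * rho**2 - 1)
            + a[4] * rho**2 * math.cos(2 * theta) + a[5] * rho**2 * math.sin(2 * theta)
            + a[6] * (3 * rho**3 - 2 * rho) * math.cos(theta)
            + a[7] * (3 * rho**3 - 2 * rho) * math.sin(theta)
            + a[8] * (6 * rho**4 - 6 * rho**2 + 1))


def seidel_from_zernike(a):
    """Group the nine Zernike terms into piston, tilt, defocus, astigmatism, coma, spherical."""
    tilt_x, tilt_y = a[1] - 2 * a[6], a[2] - 2 * a[7]
    astig_amp = math.hypot(a[4], a[5])        # amplitude of a4 cos2θ + a5 sin2θ
    astig_axis = 0.5 * math.atan2(a[5], a[4])
    base_focus = 2 * a[3] - 6 * a[8]
    if base_focus >= 0:                       # minus sign keeps |defocus| smallest
        focus, astig = base_focus - astig_amp, 2 * astig_amp
    else:                                     # plus sign; astigmatism sign flips, axis rotates 90°
        focus, astig = base_focus + astig_amp, -2 * astig_amp
        astig_axis += math.pi / 2
    return {
        "piston": a[0] - a[3] + a[8],
        "tilt": math.hypot(tilt_x, tilt_y), "tilt_angle": math.atan2(tilt_y, tilt_x),
        "defocus": focus,
        "astigmatism": astig, "astig_angle": astig_axis,
        "coma": 3 * math.hypot(a[6], a[7]), "coma_angle": math.atan2(a[7], a[6]),
        "spherical": 6 * a[8],
    }


def seidel_eval(s, rho, theta):
    """Evaluate the grouped (Seidel-like) form at a pupil point."""
    return (s["piston"] + s["tilt"] * rho * math.cos(theta - s["tilt_angle"])
            + s["defocus"] * rho**2
            + s["astigmatism"] * rho**2 * math.cos(theta - s["astig_angle"]) ** 2
            + s["coma"] * rho**3 * math.cos(theta - s["coma_angle"])
            + s["spherical"] * rho**4)
```

Evaluating `zernike9` and `seidel_eval` at the same pupil points is a quick way to confirm that the regrouping loses nothing; as the text stresses, the grouped terms are still single-field-point quantities, not true Seidel coefficients.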

Relation with Strehl ratio.
The Strehl ratio is defined as the ratio of the intensity I at the Gaussian image point in the presence of aberration to the intensity I_0 that would be obtained in the absence of aberration, as shown in figure 16. It is given by [19]: where W is the wavefront aberration with respect to the best reference sphere, in units of wavelength. The Strehl ratio is a good measure of image quality when an optical system is well corrected. For modest amounts of aberration, equation (70) can be approximated as [86,87]: where σ² is the variance of the wavefront across the pupil and is defined as [19]: The Strehl ratio therefore decreases as the wavefront variance increases, and this variance can be characterized directly by the Zernike expansion coefficients: for orthonormal Zernike polynomials, σ² is the sum of the squared coefficients excluding piston.
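Because the variance of an orthonormal Zernike expansion is just the sum of the squared non-piston coefficients, the Strehl ratio can be estimated directly from the coefficients. The sketch below assumes OSA/ANSI-ordered orthonormal coefficients in waves (index 0 = piston) and the widely used approximation S ≈ exp[−(2πσ)²], which may differ from the exact form of equation (71); it compares this estimate with a direct midpoint-rule evaluation of the standard Strehl integral |(1/π)∫∫exp(i2πW)ρdρdθ|². The function names are hypothetical.

```python
import cmath
import math


def rms_from_orthonormal(coeffs):
    """RMS wavefront error (waves) from orthonormal Zernike coefficients; coeffs[0] is piston."""
    return math.sqrt(sum(c * c for c in coeffs[1:]))


def strehl_approx(sigma):
    """Exponential approximation S ≈ exp[-(2πσ)^2] for small aberrations (σ in waves)."""
    return math.exp(-(2 * math.pi * sigma) ** 2)


def strehl_numeric(w, n_rho=200, n_theta=64):
    """|(1/π) ∫∫ exp(i 2π W) ρ dρ dθ|^2 over the unit disk by the midpoint rule; W in waves."""
    d_rho, d_theta = 1.0 / n_rho, 2 * math.pi / n_theta
    total = 0j
    for i in range(n_rho):
        rho = (i + 0.5) * d_rho
        for k in range(n_theta):
            theta = (k + 0.5) * d_theta
            total += cmath.exp(2j * math.pi * w(rho, theta)) * rho
    return abs(total * d_rho * d_theta / math.pi) ** 2
```

For 0.05 waves of OSA/ANSI defocus, Z_4 = √3(2ρ² − 1), `rms_from_orthonormal` gives σ = 0.05 and both Strehl estimates are close to 0.91; the two diverge as σ grows beyond the small-aberration regime.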
where n and m are non-negative integers and n ⩾ m. The XY monomials are also frequently used for representing wavefront aberrations, largely because they form a simple and complete set of basis functions. However, owing to their non-orthogonality, they have been less popular than Zernike polynomials, especially since the 1980s [88]. Conversions between wavefront expansion coefficients based on XY monomials and those based on Zernike polynomials have been discussed by several authors [88,89].

Jacobi polynomials.
The Jacobi polynomials are a class of classical orthogonal polynomials and can be defined by the Rodrigues formula as [51]: where α, β > −1. Their explicit expressions are given as [51]: They are orthogonal with respect to the weight (1 − x)^α (1 + x)^β on the interval [−1, 1]: The Zernike radial polynomials are a special case of the Jacobi polynomials multiplied by ρ^m, with [90]: The first few Jacobi polynomials are illustrated in figure 17. For more information about the Jacobi polynomials, see [51,91].
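The Jacobi connection can be verified numerically. The sketch below assumes the standard form R_n^m(ρ) = ρ^m P_{(n−m)/2}^{(0,m)}(2ρ² − 1), which is equivalent to (−1)^{(n−m)/2} ρ^m P_{(n−m)/2}^{(m,0)}(1 − 2ρ²) through the symmetry P_k^{(α,β)}(−x) = (−1)^k P_k^{(β,α)}(x); the relation cited above may be written in either convention. Jacobi polynomials are evaluated from the standard finite sum, restricted here to integer α, β ⩾ 0, and the function names are hypothetical.

```python
import math


def jacobi(k, alpha, beta, x):
    """Jacobi polynomial P_k^(alpha, beta)(x) from the explicit finite sum (integer alpha, beta >= 0)."""
    return sum(
        math.comb(k + alpha, k - s) * math.comb(k + beta, s)
        * ((x - 1) / 2) ** s * ((x + 1) / 2) ** (k - s)
        for s in range(k + 1)
    )


def zernike_radial(n, m, rho):
    """Zernike radial polynomial R_n^m(rho) from the standard finite sum (n - m even, m >= 0)."""
    return sum(
        (-1) ** s * math.factorial(n - s)
        / (math.factorial(s) * math.factorial((n + m) // 2 - s) * math.factorial((n - m) // 2 - s))
        * rho ** (n - 2 * s)
        for s in range((n - m) // 2 + 1)
    )


def zernike_radial_via_jacobi(n, m, rho):
    """R_n^m(rho) = rho^m * P_{(n-m)/2}^{(0, m)}(2 rho^2 - 1)."""
    return rho ** m * jacobi((n - m) // 2, 0, m, 2 * rho ** 2 - 1)
```

Writing R_n^m in terms of Jacobi polynomials makes the standard three-term Jacobi recurrences available for computing Zernike radial polynomials of high degree.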

Legendre polynomials.
The Legendre polynomials, sometimes called Legendre functions of the first kind, are solutions to the Legendre differential equation. They are the special case of the Jacobi polynomials with α = β = 0 and can be defined by the Rodrigues formula as [51]: Their explicit expressions are given as [51]: They relate to the Zernike radial polynomials via [21]: The first few Legendre polynomials are illustrated in figure 18. For more information about the Legendre polynomials, see [51,92].
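For m = 0 the Jacobi relation reduces to the Legendre case. The sketch below assumes the commonly quoted form R_{2k}^0(ρ) = P_k(2ρ² − 1) and evaluates P_k with Bonnet's three-term recurrence (k + 1)P_{k+1}(x) = (2k + 1)xP_k(x) − kP_{k−1}(x) rather than the Rodrigues formula, because the recurrence is cheap and numerically stable on [−1, 1]. The function names are hypothetical.

```python
def legendre(k, x):
    """Legendre polynomial P_k(x) via Bonnet's recurrence."""
    p_prev, p = 1.0, x
    if k == 0:
        return p_prev
    for i in range(1, k):
        p_prev, p = p, ((2 * i + 1) * x * p - i * p_prev) / (i + 1)
    return p


def zernike_radial_m0(n, rho):
    """Rotationally symmetric Zernike radial polynomial R_n^0(rho) for even n."""
    if n % 2:
        raise ValueError("R_n^0 requires even n")
    return legendre(n // 2, 2 * rho * rho - 1)
```

For example, `zernike_radial_m0(4, rho)` reproduces the primary spherical-aberration radial term 6ρ⁴ − 6ρ² + 1.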

Bessel functions.
The nth-order Bessel function of the first kind is defined as [92]: They relate to the Zernike radial polynomials via [32]: which is of great importance for the reduction of the diffraction integral in the Nijboer-Zernike theory [6,21]. The first few terms of the Bessel functions are illustrated in figure 19.
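The Bessel-function relation can be verified numerically without a special-function library. The sketch below assumes the classic identity ∫_0^1 R_n^m(ρ) J_m(vρ) ρ dρ = (−1)^{(n−m)/2} J_{n+1}(v)/v used in the Nijboer-Zernike theory. It evaluates J_n(x) for integer n with the trapezoidal rule applied to Bessel's integral J_n(x) = (1/2π)∫_0^{2π} cos(nτ − x sin τ) dτ, which converges very quickly because the integrand is periodic, and integrates over ρ with Simpson's rule. The function names are hypothetical.

```python
import math


def bessel_j(n, x, samples=64):
    """J_n(x) for integer n: trapezoidal rule on Bessel's integral over one period."""
    return sum(
        math.cos(n * t - x * math.sin(t))
        for t in (2 * math.pi * k / samples for k in range(samples))
    ) / samples


def zernike_radial(n, m, rho):
    """Zernike radial polynomial R_n^m(rho) (n - m even, m >= 0)."""
    return sum(
        (-1) ** s * math.factorial(n - s)
        / (math.factorial(s) * math.factorial((n + m) // 2 - s) * math.factorial((n - m) // 2 - s))
        * rho ** (n - 2 * s)
        for s in range((n - m) // 2 + 1)
    )


def simpson(f, a, b, intervals=400):
    """Composite Simpson rule (intervals must be even)."""
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def hankel_lhs(n, m, v):
    """Left-hand side: integral of R_n^m(rho) J_m(v rho) rho over [0, 1]."""
    return simpson(lambda r: zernike_radial(n, m, r) * bessel_j(m, v * r) * r, 0.0, 1.0)


def hankel_rhs(n, m, v):
    """Right-hand side: (-1)^((n-m)/2) J_{n+1}(v) / v."""
    return (-1) ** ((n - m) // 2) * bessel_j(n + 1, v) / v
```

With n = m = 0 the identity reduces to the familiar result ∫_0^1 J_0(vρ)ρ dρ = J_1(v)/v, which underlies the Airy amplitude 2J_1(v)/v of an unaberrated circular pupil.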

Chebyshev polynomials of the second kind.
The Chebyshev polynomials of the second kind relate to the Radon transforms of the Zernike radial polynomials, ℜ_n^m, via [93]: Equation (85) can be used to compute the Zernike radial polynomials for large values of the degree n [94]. The first few Chebyshev polynomials of the second kind are illustrated in figure 20.

Pseudo Zernike polynomials.
The pseudo Zernike polynomials (see table 14), first derived by Bhatia and Wolf in 1954 [3], are a set of polynomials orthogonal over a unit circle and analogous to the complex Zernike circle polynomials. They are obtained by eliminating the condition n − |l| = even from the definition of the complex Zernike circle polynomials in equation (16). Specifically, the pseudo Zernike polynomials are defined as: where n is a non-negative integer, l is an integer, and n − |l| ⩾ 0; the radial polynomials of the pseudo Zernike polynomials can be written as: The relation between the pseudo Zernike radial polynomials (equation (87)) and the Zernike radial polynomials (equation (19)) is given by [3]: The first few pseudo Zernike radial polynomials are illustrated in figure 21. Pseudo Zernike polynomials can be used for wavefront sensing [83] and to define pseudo Zernike moments, which can generate moment invariants as shape descriptors for pattern recognition (section 4.6.2).
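The orthogonality of the pseudo Zernike radial polynomials can be confirmed exactly with rational arithmetic. The sketch below assumes the widely used explicit form R_{nl}(ρ) = Σ_{s=0}^{n−|l|} (−1)^s (2n + 1 − s)! / [s!(n + |l| + 1 − s)!(n − |l| − s)!] ρ^{n−s} (equation (87) may be written differently), for which ∫_0^1 R_{nl} R_{n'l} ρ dρ = δ_{nn'}/[2(n + 1)], the same radial normalization as for the Zernike radial polynomials. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def pseudo_radial(n, l):
    """Pseudo Zernike radial polynomial R_{n,l}(rho) as {power: exact coefficient}."""
    l = abs(l)
    if l > n:
        raise ValueError("requires n - |l| >= 0")
    return {
        n - s: Fraction(
            (-1) ** s * factorial(2 * n + 1 - s),
            factorial(s) * factorial(n + l + 1 - s) * factorial(n - l - s),
        )
        for s in range(n - l + 1)
    }


def radial_inner(p, q):
    """Exact value of the integral of p(rho) q(rho) rho over [0, 1]; polynomials as {power: coeff}."""
    return sum(cp * cq / (i + j + 2) for i, cp in p.items() for j, cq in q.items())
```

For example, `pseudo_radial(1, 0)` gives 3ρ − 2, a term with no counterpart among the Zernike radial polynomials because n − |l| = 1 is odd.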

Zernike polynomials over arbitrary pupil shapes
Zernike circle polynomials are in widespread use for wavefront analysis in optical systems with circular pupils. They are unique in that they are not only orthogonal over a unit circle but also represent balanced aberrations with minimum variance. In practice, however, optical systems do not always have circular pupils. Non-circular pupils, such as annular, hexagonal, elliptical, rectangular, and square pupils, are also very common. For example, many telescopes, such as the Hubble Space Telescope, have annular pupils [95,96]; some large telescope mirrors are segmented into small hexagonal segments to facilitate fabrication, testing, and alignment [97]; the pupil of the human eye is slightly elliptical [98]; and rectangular or square optics are used in anamorphic optical systems [99,100] and high-power laser systems [101]. In such cases, the Zernike circle polynomials are no longer orthogonal over the pupil and their advantages are lost, so new orthonormal polynomials must be constructed for aberration representation. Methods for constructing such orthonormal polynomials mainly include the recursive Gram-Schmidt process [37] and the nonrecursive matrix approach [102]. The Gram-Schmidt orthogonalization approach is briefly summarized below.
Using the Gram-Schmidt orthonormalization process [103], a set of polynomials F_j(x, y) orthonormal over a noncircular pupil can be constructed from the Zernike circle polynomials as [4,37,104]: where ⟨Z_{j+1}F_i⟩ denotes the mean value of the product Z_{j+1}F_i over the pupil and is defined as: where A is the area of the region of integration. N_{j+1} is a normalization factor and can be expressed as: The constructed polynomials satisfy the following orthonormality condition: Since each orthonormal polynomial is a linear combination of Zernike circle polynomials (equation (89)), fitting a wavefront over a noncircular pupil with a set of orthonormal polynomials yields the same fitted wavefront as fitting it with the corresponding set of Zernike circle polynomials. However, the Zernike circle polynomials are not orthogonal over such a pupil: they do not represent balanced aberrations, and their expansion coefficients lack physical significance [105].
The orthonormal polynomials are constructed recursively, and each term F_j is a linear combination of the Zernike circle polynomials Z_1, …, Z_j. The Gram-Schmidt orthonormalization approach can be applied to construct orthonormal polynomials over any pupil shape [106,107]. Figure 22 presents five common noncircular pupils: annular, rectangular, square, hexagonal, and elliptical. Orthonormal polynomials over these noncircular pupils can be obtained using the Gram-Schmidt orthogonalization process and are tabulated in table 15.
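The recursive construction of equation (89) can be sketched numerically for the square pupil of figure 22. The sketch below rests on stated assumptions: the pupil is a square inscribed in the unit circle, the continuous mean ⟨·⟩ is replaced by an average over a uniform grid of sample points, and the first six Zernike circle terms (piston, tilts, defocus, and the two astigmatisms in Noll order, with normalization factors omitted because they cancel) are orthonormalized. The returned vectors are the sampled values of F_1, …, F_6; the helper names are hypothetical.

```python
import math

# First six Zernike circle terms in Cartesian form (Noll order, normalization omitted).
ZERNIKE_TERMS = [
    lambda x, y: 1.0,                        # piston
    lambda x, y: x,                          # x-tilt
    lambda x, y: y,                          # y-tilt
    lambda x, y: 2 * (x * x + y * y) - 1,    # defocus
    lambda x, y: 2 * x * y,                  # astigmatism at 45°
    lambda x, y: x * x - y * y,              # astigmatism at 0°
]


def square_pupil_points(n=41):
    """Uniform n x n grid over the square inscribed in the unit circle."""
    half = 1 / math.sqrt(2)
    step = 2 * half / (n - 1)
    return [(-half + i * step, -half + k * step) for i in range(n) for k in range(n)]


def mean(values):
    return sum(values) / len(values)


def gram_schmidt(terms, points):
    """Orthonormalize sampled terms w.r.t. the discrete mean <f g> (modified Gram-Schmidt)."""
    basis = []
    for term in terms:
        v = [term(x, y) for x, y in points]
        for b in basis:
            proj = mean([vi * bi for vi, bi in zip(v, b)])   # <Z_{j+1} F_i>
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(mean([vi * vi for vi in v]))       # equals 1 / N_{j+1}
        basis.append([vi / norm for vi in v])
    return basis
```

The modified form of Gram-Schmidt (projecting the partially orthogonalized vector at each step) is used because it is less sensitive to rounding errors than the classical form when many terms are involved.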

Zernike polynomials over annular pupils
Annular pupils play an important role in optical systems such as telescopes for astronomical observation [96] and stitching interferometers for testing aspheric wavefronts with annular sub-apertures [109][110][111]. Orthonormal Zernike polynomials over annular pupils, called Zernike annular polynomials, can be constructed from the Zernike circle polynomials using the Gram-Schmidt orthogonalization process. Zernike annular polynomials first appeared in a report of the Perkin-Elmer Corporation in 1971 [4], were later discussed by Tatian in 1976 [5], and were systematically studied and explicitly given by Mahajan in 1981 [4].
Zernike annular polynomials are defined over a unit annular disk with an obscuration ratio of ε (0 ⩽ ε < 1) and can be most conveniently expressed in polar coordinates (ρ, θ), where ρ is the normalized radial coordinate (ε ⩽ ρ ⩽ 1) and θ is the polar angle measured counterclockwise from the +x-axis (0 ⩽ θ < 2π), as shown in figure 23.

Real Zernike annular polynomials
Real Zernike annular polynomials have normalized and non-normalized forms. The normalized form defined under the Noll indexing scheme can be written as [4,35,112]: where the index n is the degree of the radial polynomial R_n^m(ρ; ε); the index m is the azimuthal frequency describing the repetition of the angular function; n and m are non-negative integers satisfying n − m ⩾ 0 and n − m = even; j is a mode-ordering number starting from 1; and ε is the obscuration ratio. There are a total of (n + 1)(n + 2)/2 linearly independent polynomials with degrees up to and including n. The radial polynomials R_n^m(ρ; ε) can be obtained by Gram-Schmidt orthogonalization and are given by equations (94) and (95), and the weighting factor ω_n^m can be determined from the orthogonality condition of the radial polynomials as: Exemplary profiles of the radial polynomials are shown in figure 24. It is easy to verify that when ε = 0, the Zernike annular polynomials reduce to the circle polynomials. The normalized Zernike annular polynomials satisfy the following orthonormality condition: where δ_jj′ is the Kronecker delta.
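The radial polynomials R_n^m(ρ; ε) can be generated directly by Gram-Schmidt orthogonalization. The sketch below writes R_n^m(ρ; ε) = ρ^m Q(ρ²), orthogonalizes 1, u, …, u^l (u = ρ², l = (n − m)/2) over (ε², 1) with weight u^m, and scales the result so that ∫_ε^1 [R_n^m(ρ; ε)]² ρ dρ = (1 − ε²)/[2(n + 1)], the normalization implied by orthonormality of the normalized annular polynomials over the annulus. This normalization convention and the function names are assumptions of the sketch. For ε = 0.5 it reproduces the annular defocus term (2ρ² − 1 − ε²)/(1 − ε²), and for ε = 0 it reduces to the circle polynomials.

```python
import math


def annular_radial(n, m, eps):
    """Coefficients c of R_n^m(rho; eps) = rho^m * sum_k c[k] * rho^(2k) (n - m even, m >= 0)."""
    l = (n - m) // 2

    def inner(p, q):
        # Integral of rho^(2m) p(rho^2) q(rho^2) rho over [eps, 1],
        # i.e. 1/2 times the integral of u^m p(u) q(u) over [eps^2, 1].
        total = 0.0
        for i, pi in enumerate(p):
            for j, qj in enumerate(q):
                e = i + j + m + 1
                total += pi * qj * (1 - eps ** (2 * e)) / e
        return total / 2

    basis = []
    for k in range(l + 1):
        v = [0.0] * k + [1.0]                 # monomial u^k
        for b in basis:
            c = inner(v, b) / inner(b, b)
            v = [vi - c * (b[i] if i < len(b) else 0.0) for i, vi in enumerate(v)]
        basis.append(v)
    target = (1 - eps ** 2) / (2 * (n + 1))
    scale = math.sqrt(target / inner(basis[-1], basis[-1]))
    return [scale * c for c in basis[-1]]


def eval_annular_radial(coeffs, m, rho):
    """Evaluate R_n^m(rho; eps) from the coefficients returned by annular_radial."""
    return rho ** m * sum(c * rho ** (2 * k) for k, c in enumerate(coeffs))
```

For m = n the construction gives ρ^n/(1 + ε² + … + ε^{2n})^{1/2}, so the annular terms with m = n differ from the circle terms only by a constant factor.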
Similar to Zernike circle polynomials, orthonormal Zernike annular polynomials can be sorted by either the single index, j, or the double indices, n and m. The former is useful for describing Zernike expansion coefficients, while the latter is useful for describing the functions unambiguously. To convert between the indices n, m, and j, one can use the relationships described in equations (8) and (9). Table 16 lists the first 28 orthonormal real Zernike annular polynomials in polar coordinates together with the values of n, m, and j; for terms up to the 45th, see tables 5-7 in [104]. The non-normalized Zernike annular polynomials can be obtained by dropping the normalization factors from the orthonormal Zernike annular polynomials as: They satisfy the following orthogonality condition: Figures 25 and 26 show 3D visualizations of the non-normalized Zernike annular polynomials up to the sixth degree for ε = 0.6 and their corresponding interferometric fringe patterns as observed in optical testing [37].
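Equations (8) and (9) are not reproduced in this section, so the sketch below instead implements the two ordering conventions as they are commonly stated: the OSA/ANSI single index j = [n(n + 2) + m]/2 (starting at 0, signed m), and the Noll index (starting at 1), in which |m| increases within each degree n, even j corresponds to cos mθ (m > 0), and odd j to sin mθ (m < 0). The function names are hypothetical, and the output should be checked against table 16 before it is relied on.

```python
import math


def osa_to_nm(j):
    """OSA/ANSI index j >= 0 -> (n, m), with j = (n(n + 2) + m) / 2."""
    n = (math.isqrt(8 * j + 1) - 1) // 2
    return n, 2 * j - n * (n + 2)


def nm_to_osa(n, m):
    return (n * (n + 2) + m) // 2


def noll_to_nm(j):
    """Noll index j >= 1 -> (n, m); m > 0 for cos m*theta (even j), m < 0 for sin (odd j)."""
    n = (math.isqrt(8 * j - 7) - 1) // 2
    p = j - n * (n + 1) // 2 - 1          # position within degree n, starting at 0
    m = 2 * ((p + 1) // 2) if n % 2 == 0 else 2 * (p // 2) + 1
    if m != 0 and j % 2 == 1:
        m = -m
    return n, m


def nm_to_noll(n, m):
    """Inverse of noll_to_nm, searching the n + 1 indices that belong to degree n."""
    first = n * (n + 1) // 2 + 1
    for j in range(first, first + n + 1):
        if noll_to_nm(j) == (n, m):
            return j
    raise ValueError("invalid (n, m) pair")
```

For example, `noll_to_nm(11)` returns `(4, 0)` (primary spherical aberration), whereas the same term has OSA/ANSI index `nm_to_osa(4, 0) == 12`.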
Moreover, the radial polynomials of Zernike annular polynomials are also orthogonal over the annular aperture and satisfy the following relationships [4]:

Recurrence relation.
The recurrence relationship for generating the radial polynomials of Zernike annular polynomials was derived by Tatian in 1974 [4,5] and can be written as: where n and m obey the same conditions as in the definition of the Zernike annular polynomials (n and m are non-negative integers, and n − m ⩾ 0 is even); k is a non-negative integer (k = 0, 1, 2, …); l = (n − m)/2; u = ρ²; Q_l^m(u) is a set of orthogonal polynomials obtained by orthogonalizing the sequence 1, u, …, u^l over the interval (ε², 1) with the weight function u^m and can be written as [4,5]: The coefficient h_l^m is: In particular, when m = n: The above recurrence relationship can be initialized with R_0^0(ρ; ε) = 1.

The Fourier transform.
The Fourier transform of the Zernike annular polynomials was derived by Dai and Mahajan [95] and can be written as: where (r, ϕ) denotes the polar coordinates in the frequency domain and: where J is the Bessel function of the first kind (equation (82)) and: A list of the first few terms of H_n^m, g_{n′}, and h_{n″} can be found in [95]. The expression in equation (111) reduces to the Fourier transform of the Zernike circle polynomials (equation (30)) when ε = 0.

Radial polynomials | Equation (94) | Equation (101)
Coordinate system
Ordering | n, m both in ascending order

Wavefront fitting.
The orthogonality of the Zernike annular polynomials makes them an excellent basis for wavefront analysis in annular optical systems. An annular wavefront can be represented by a linear combination of a finite number of Zernike annular polynomials as [113]: where J is the total number of polynomial terms, a_j are the expansion coefficients, and Z_j is the jth Zernike annular polynomial. The equation can be equivalently expressed in Cartesian coordinates as: Written in discrete matrix form, the Cartesian expansion becomes: where: where K is the total number of data points within the annular pupil. Generally, equation (117) is an overdetermined linear system, with more equations (K) than unknowns (J). It can be converted into the normal equations [34,68]: where the superscript T denotes the matrix transpose. The solution can be obtained by matrix inversion as: Figure 27 shows an example of annular wavefront decomposition using 28 orthonormal Zernike annular polynomials under the Noll indices. The magnitude of each coefficient indicates the strength of the corresponding aberration (table 16).
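The normal-equation solution can be sketched in a few lines. The example assumes the first four orthonormal Zernike annular polynomials in Noll order, Z_1 = 1, Z_2,3 = 2ρ(cos θ, sin θ)/√(1 + ε²), and Z_4 = √3(2ρ² − 1 − ε²)/(1 − ε²), which follow from Gram-Schmidt orthogonalization over the annulus. It fits a synthetic annular wavefront sampled on a polar grid and solves (ZᵀZ)a = Zᵀw by Gaussian elimination; the function names are hypothetical. For large J or poorly distributed samples, a QR- or SVD-based least-squares solver is usually preferred to forming ZᵀZ explicitly, because forming ZᵀZ squares the condition number.

```python
import math


def annular_terms(eps):
    """First four orthonormal Zernike annular polynomials (Noll j = 1..4) as functions of (rho, theta)."""
    s = math.sqrt(1 + eps ** 2)
    return [
        lambda r, t: 1.0,
        lambda r, t: 2 * r * math.cos(t) / s,
        lambda r, t: 2 * r * math.sin(t) / s,
        lambda r, t: math.sqrt(3) * (2 * r * r - 1 - eps ** 2) / (1 - eps ** 2),
    ]


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_annular(samples, eps):
    """Least-squares coefficients from samples [(rho, theta, w), ...] via Z^T Z a = Z^T w."""
    terms = annular_terms(eps)
    Z = [[f(r, t) for f in terms] for r, t, _ in samples]
    w = [s[2] for s in samples]
    J = len(terms)
    ZtZ = [[sum(row[i] * row[k] for row in Z) for k in range(J)] for i in range(J)]
    Ztw = [sum(row[i] * wi for row, wi in zip(Z, w)) for i in range(J)]
    return solve(ZtZ, Ztw)
```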

Applications
The unique properties of Zernike polynomials have made them an attractive mathematical tool in many fields. In this section, we survey their applications in a range of fields, including diffraction theory, optical design, optical testing, adaptive optics, ophthalmic optics, and image analysis, as illustrated in figure 28.
In a perfect optical imaging system, the light waves from a point object emerge in the image space as spherically convergent waves and form the well-known Airy pattern [32,47]. However, a perfect imaging system never exists in practice: waves emerging from a practical optical system deviate from a spherical wave and take complicated forms.
Consider the wave propagation model illustrated in figure 29, where an aberrated wavefront at the exit pupil converges to the image plane. Let W_a and W_r denote the aberrated wavefront and its Gaussian reference wavefront, respectively, both in units of length. The position in the exit pupil is defined by the Cartesian coordinates (x, y, z) or the cylindrical coordinates (ρ, θ, z); the position in the image space is defined by the Cartesian coordinates (ξ, η, ζ) or the cylindrical coordinates (r, ϕ, υ). The complex amplitude distribution at the exit pupil, called the pupil function, can be written as [114]: where A(ρ, θ) is the amplitude function and Φ is the phase function in the form of: where λ is the wavelength. According to the scalar Debye integral [32,114], the normalized complex amplitude, U(r, ϕ, υ), in the focal region is given by: where υ is defined as the negative axial coordinate (−ζ) normalized with respect to the axial diffraction unit, λ/(πs_0²), and s_0 is the numerical aperture (NA) of the focusing beam. In the Gaussian image plane (υ = 0), the complex amplitude U(r, ϕ, υ) in equation (123) reduces to the Fourier transform of the pupil function P(ρ, θ).
The PSF, defined as the diffraction pattern of a point object in the image plane, can be written as the squared modulus of the complex amplitude U [115,116], i.e.
For an incoherent, shift-invariant imaging system, the image of an extended object is the convolution of the object with the PSF of the system, which can be mathematically modeled as [117]: where f and g denote the object and the image, respectively, and * represents convolution. To understand the impact of wavefront aberrations on the final image quality, the PSF needs to be evaluated. In the next section, we briefly review the analytical PSF computation approaches first developed by Nijboer and Zernike [6,21] and later extended by Janssen [8], in which expanding the wavefront aberration at the exit pupil in Zernike circle polynomials is the key step.

PSF computation using the Nijboer-Zernike theory.
In general, analytical evaluation of the diffraction integrals in equation (123) is difficult except for some specific cases.
In 1942, Bernard Nijboer, a PhD student of Zernike, expanded the aberration function at the exit pupil into a series of Zernike circle polynomials and formulated an efficient representation of the complex amplitude distribution in the image plane [6,21]. This work allows analytical evaluation of the diffraction integral and the PSF of a general optical system and is referred to as the Nijboer-Zernike theory. For completeness, we briefly review the basic principle of the theory, which is well summarized in [114].
In the Nijboer-Zernike theory, the pupil function is assumed to be uniform in amplitude and thus can be written as a purely phase-aberrated function, i.e.
Expanding the pupil function into a Taylor series, the diffraction integral in equation (123) becomes: Expanding the phase function, Φ(ρ, θ), into a set of Zernike circle polynomials gives: Substituting the Zernike expansion into the integral (equation (127)) and performing the integration over θ using elementary Bessel function relations, we obtain: where α_n^m are the Zernike expansion coefficients and J_m is the Bessel function of the first kind of order m. Note that in this reduction, the phase aberration is assumed to be small enough that the infinite series in equation (128) can be truncated after the term k = 1. The above equation can be further reduced using the relationship in equation (83) as: where the prime indicates that the term m = n = 0 is excluded from the summation. The expression for the complex amplitude, U, provides an analytical method for evaluating the PSF of an optical system. Although elegant, the Nijboer-Zernike approach is not widely used in practice [114], largely because the derivation requires the amplitude over the pupil to be uniform and the wavefront aberration to be sufficiently small (on the order of a fraction of the wavelength [32]). Figure 30 illustrates the PSF of an optical system when only a single Zernike term (root mean square value: 0.1 µm, wavelength: 570 nm) is present in the wavefront aberration. Figure 31 presents an example showing how wavefront aberrations degrade the image quality of an optical system.
The extended Nijboer-Zernike (ENZ) theory [8,114,118,119] can analytically compute the PSF of an aberrated optical system described by Zernike coefficients and has accelerated further developments in focused-field diffraction theory.
The extended Nijboer-Zernike theory adopts a generalized definition of the pupil function and expands it using Zernike circle polynomials as [114]: The symbol (n k) denotes the binomial coefficient and is defined as: Note that equation (134) suffers from loss of significant digits and slow convergence for large υ in standard precision. An advanced version of the ENZ theory has been developed that virtually eliminates the convergence problem by replacing the power-Bessel series in equation (134) with a Bessel-Bessel series [120]. Using equation (134), one can compute the PSF of an optical system whose exit pupil is defined by a set of β coefficients (equation (131)). The extended Nijboer-Zernike theory has been used in several applications, such as aberration retrieval in high-NA optical lithography systems [121][122][123] and acoustic diffraction problems [124,125].

Optical design
Optical design is the process of designing an optical system to meet specific performance requirements and constraints. Owing to their unique properties, Zernike polynomials are beneficial to wavefront analysis and surface representation in modern lens design programs. In wavefront analysis, since Zernike expansion coefficients are independent and directly represent balanced aberrations, it is convenient to decompose wavefront aberrations of an optical system into a set of Zernike polynomials to evaluate the contribution of each aberration [25]. Moreover, the coefficients of Zernike polynomials can also be used as variables of the merit function of a lens system to facilitate system optimization [126].
In surface representation, Zernike polynomials have emerged as a means of describing the shape of freeform optical surfaces [127][128][129][130][131]. State-of-the-art lens design programs, such as Zemax and CODE V, allow optical designers to use Zernike polynomials to represent freeform surfaces, which are called Zernike surfaces. For example, Zernike phase surfaces and Zernike sag surfaces are defined and used in Zemax. The Zernike phase surfaces are standard surfaces, such as planes, spheres, and conics, superimposed with phase terms defined by Zernike polynomials [132]. The phase term can be written as: where m represents the diffraction order, Z_j are the Zernike circle polynomials, a_j are the expansion coefficients, ρ is the normalized radial coordinate, and θ is the polar angle. This surface type is well suited to modeling system aberrations for which measured interferometer data are available [132]. The Zernike sag surfaces are defined as a conic surface (figure 32) plus additional deformation terms given by even-order terms of a power series and a finite number of Zernike polynomial terms [58,132]. They are given by: where c denotes the curvature of the base conic; r = √(x² + y²) is the radial ray coordinate in lens units; k is the conic constant; α_j and a_j are the coefficients of the power series and the Zernike polynomials, respectively; J is the maximum number of Zernike terms; ρ is the normalized radial coordinate; and θ is the polar angle. The Zernike phase surfaces (equation (138)) describe the phase variation introduced by a surface, while the Zernike sag surfaces (equation (139)) characterize surface deformations. These Zernike surfaces can also employ Zernike annular polynomials to define the aspheric terms when an optical system has an annular pupil. Figure 33 shows an example of a long-wave infrared reflective imaging system optimized with Zernike surfaces [133].
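The sag of a Zernike sag surface can be evaluated term by term. The sketch below assumes the usual conic base sag z = cr²/[1 + √(1 − (1 + k)c²r²)], a power series Σ α_j r^{2j} that starts at r², and Zernike terms evaluated at ρ = r/r_norm, where r_norm is a normalization radius. These conventions, the Zernike ordering supplied by the caller, and the function name are assumptions for illustration; they are not the exact definitions used by any particular lens design program.

```python
import math


def zernike_sag(x, y, c, k, alphas, zernike_terms, coeffs, r_norm):
    """Sag z(x, y) = conic base + sum_j alpha_j r^(2j) + sum_j a_j Z_j(rho, theta).

    c: base curvature (1 / radius); k: conic constant; alphas[0] multiplies r^2;
    zernike_terms: callables Z_j(rho, theta); coeffs: matching a_j; r_norm: normalization radius.
    """
    r2 = x * x + y * y
    root = 1 - (1 + k) * c * c * r2
    if root < 0:
        raise ValueError("point lies outside the region where the conic is defined")
    z = c * r2 / (1 + math.sqrt(root))
    z += sum(alpha * r2 ** (j + 1) for j, alpha in enumerate(alphas))
    rho, theta = math.sqrt(r2) / r_norm, math.atan2(y, x)
    z += sum(a * zj(rho, theta) for a, zj in zip(coeffs, zernike_terms))
    return z
```

With no deformation terms and k = 0, the function returns the sphere sag R − √(R² − r²), which is a convenient sanity check before adding Zernike terms.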

Optical testing
Optical testing is concerned with evaluating the quality of optical components and systems by optical techniques [134,135]. The applications of Zernike polynomials in optical testing are mainly concentrated in optical surface or wavefront measurement by phase-shifting interferometry [136], the principal purpose of which is to determine the aberrations present in an optical component or an optical system [19]. Many different types of phase-shifting interferometers are used in practice, such as the Fizeau, Mach-Zehnder, and Twyman-Green interferometers [137][138][139][140]. Here we use a phase-shifting Twyman-Green interferometer as an example to demonstrate the usefulness of Zernike polynomials in precise surface figure measurement. A typical optical layout of a phase-shifting Twyman-Green interferometer is shown in figure 34. The beam emitted from a laser source is first collimated by a beam expander and then divided into two parts by a beam splitter. The reflected part (red) propagates to a reference mirror and is reflected back, serving as the reference beam. The transmitted part (blue), after passing through a compensation lens, is incident on the optical surface under test and is then reflected back along the same path. The reference beam and the measurement beam recombine at the beam splitter and interfere, producing fringe patterns with periodic intensity modulation. The fringe patterns carry the surface figure information of the optical component under test and are finally recorded by a charge-coupled device (CCD) detector. Phase shifting is achieved by moving the reference mirror by a controlled amount with a piezoelectric transducer (PZT).
The intensity of a fringe pattern can be mathematically modeled as [142][143][144]: where a(x, y) and b(x, y) are the background and modulation terms, respectively, and Φ(x, y) is the phase map to be recovered. Since there are three unknowns in equation (140), at least three phase-shifted interferograms are needed to recover the phase function; the corresponding algorithm is known as the three-step or three-bucket phase demodulation algorithm. In practice, many other phase-shifting algorithms are in use, such as the four-step, five-step, and least-squares algorithms [137,138,142]. Here, the four-step algorithm is introduced. Suppose that four interferograms with phase shifts of 0, π/2, π, and 3π/2 are collected. Their intensity functions can be written as: The phase, Φ(x, y), can then be calculated as: Since the four-quadrant inverse tangent returns values only within [−π, π], the calculated phase, Φ(x, y), is wrapped. An extra process, called phase unwrapping [145,146], is needed to obtain a continuous phase map.
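The four-step demodulation and a simple unwrapping step can be sketched on a one-dimensional line of pixels. The sketch assumes frames I_k = a + b cos(Φ + (k − 1)π/2), k = 1, …, 4, for which I_4 − I_2 = 2b sin Φ and I_1 − I_3 = 2b cos Φ, so that Φ = atan2(I_4 − I_2, I_1 − I_3) and the background a and modulation b cancel. Unwrapping is shown only in its simplest 1D (Itoh) form, which assumes the true phase changes by less than π between neighboring pixels; practical 2D unwrapping with noise requires the more robust methods of [145,146]. The function names and the synthetic data are illustrative.

```python
import math


def four_step_phase(i1, i2, i3, i4):
    """Wrapped phase from frames shifted by 0, π/2, π, 3π/2; values lie within [-π, π]."""
    return [math.atan2(d - b, a - c) for a, b, c, d in zip(i1, i2, i3, i4)]


def unwrap_1d(wrapped):
    """Itoh unwrapping: add multiples of 2π so neighboring samples differ by at most π."""
    out = [wrapped[0]]
    for p in wrapped[1:]:
        step = p - out[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))
        out.append(out[-1] + step)
    return out


# Synthetic line of pixels: quadratic phase, slowly varying background, constant modulation.
xs = [i / 199 for i in range(200)]
true_phase = [12 * x * x for x in xs]
frames = [
    [1.5 + 0.2 * x + 0.8 * math.cos(phi + k * math.pi / 2) for x, phi in zip(xs, true_phase)]
    for k in range(4)
]
wrapped = four_step_phase(*frames)
recovered = unwrap_1d(wrapped)
```

Here the true phase reaches 12 rad, so the wrapped map contains several 2π jumps that the unwrapping step removes.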
Generally, misalignment errors, such as tilt and defocus, are present in the phase function, Φ(x, y), and need to be removed to reveal the true surface figure. This can be achieved by expanding the phase function into a finite number of Zernike polynomial terms and eliminating the tilt and defocus contributions. Mathematically, the phase expansion can be written as: where a_j are the expansion coefficients, which can be computed by the least-squares method described in section 2.3.1. The final figure map of the surface under test can be obtained as [141,147]: where a_0, a_1, a_2, and a_3 represent the coefficients of the piston, x-tilt, y-tilt, and defocus terms of the Zernike expansion, respectively. The Zernike expansion coefficients, a_j, can be further used to calculate PVr (robust peak-to-valley) [148], a robust amplitude parameter for describing the figure error of the optical surface under test. The whole procedure for surface figure retrieval from a set of phase-shifted fringe patterns is illustrated in figure 35, and a state-of-the-art commercial Fizeau interferometer for optical testing is shown in figure 36.
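Removing the misalignment terms amounts to a small least-squares fit followed by a subtraction. The sketch below assumes the first four terms of the nine-term expansion in equation (68), namely 1, ρ cos θ, ρ sin θ, and 2ρ² − 1 (piston, x-tilt, y-tilt, and defocus, matching a_0 to a_3). It fits these terms to phase samples on a polar grid and subtracts the fitted surface; the conversion of the residual phase to surface height is omitted because it depends on the interferometer geometry. The function name is hypothetical.

```python
import math

# Piston, x-tilt, y-tilt, and defocus terms of the nine-term expansion (a0..a3).
ALIGNMENT_TERMS = [
    lambda r, t: 1.0,
    lambda r, t: r * math.cos(t),
    lambda r, t: r * math.sin(t),
    lambda r, t: 2 * r * r - 1,
]


def remove_alignment_terms(samples):
    """samples: [(rho, theta, phase)]; returns (a0..a3, residual phases) from a least-squares fit."""
    rows = [[f(r, t) for f in ALIGNMENT_TERMS] for r, t, _ in samples]
    phase = [p for _, _, p in samples]
    n = len(ALIGNMENT_TERMS)
    # Normal equations (Z^T Z) a = Z^T phase, augmented for Gauss-Jordan elimination.
    aug = [[sum(row[i] * row[k] for row in rows) for k in range(n)]
           + [sum(row[i] * p for row, p in zip(rows, phase))] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    coeffs = [aug[i][n] / aug[i][i] for i in range(n)]
    residual = [p - sum(a * v for a, v in zip(coeffs, row)) for row, p in zip(rows, phase)]
    return coeffs, residual
```

Because astigmatism, coma, and higher terms are orthogonal to the four fitted terms on a uniformly sampled circular grid, they pass through to the residual unchanged, which is exactly what the figure map should retain.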

Ophthalmic optics
The eye, like any other optical system, suffers from a number of specific optical aberrations [149]. Aberrations of eyes with refractive errors include lower-order and higher-order aberrations. Lower-order aberrations, such as myopia, hyperopia, and regular astigmatism, account for approximately 90% of the overall ocular aberration and are the most common causes of visual impairment [150]. In contrast, higher-order aberrations, such as spherical aberration, coma, and trefoil, account for less than 10% of ocular aberrations, but they may significantly affect visual performance when the pupil is large [149,151]. Measuring the aberrations of the human eye can provide objective and quantitative data for vision correction and is of critical importance to certain corrective procedures [23,152], such as wavefront-guided refractive surgery [153,154], which has been a paradigm shift in the field of refractive error correction. The most commonly used tool for measuring ocular aberrations is the Shack-Hartmann wavefront slope sensor [155], which was developed by Shack and Platt in the late 1960s [156,157] as an evolution of the Hartmann screen test. The Shack-Hartmann wavefront slope sensor can measure a wavefront like an interferometer but uses less expensive optical components.
The setup and principle for measuring the aberrations of the eye with a Shack-Hartmann wavefront slope sensor are illustrated in figure 37. An incident infrared light beam is reflected by a beam splitter and focused onto the retina. Since the beam diameter is small (approximately 1 mm), the light spot on the retina can be regarded as a point source that is essentially independent of the eye's aberrations. This point source emits spherical waves, which are affected by the eye's aberrations and leave the eye as aberrated, nominally planar waves. The aberrated waves pass through the beam splitter and are detected by a Shack-Hartmann wavefront slope sensor, which consists of a 2D microlens array and a CCD camera located at the focal plane of the microlenses. In this arrangement, the aberrated wavefront is divided into many small areas, each of which can be locally treated as a plane wave and is individually focused onto the CCD camera. When the eye is aberration-free, the outgoing wavefront is planar and the CCD camera detects a regular spot pattern (black dots in figure 37(b)). In contrast, when the eye has aberrations, the outgoing wavefront is aberrated and individual parts of the wavefront are tilted with respect to the reference wavefront, resulting in displaced focal spots (red dots in figure 37(b)) on the CCD camera. The displacements of the spots are proportional to the local tilts of the measured wavefront and can be used to recover the original wavefront using the algorithm described below.
In a Shack-Hartmann wavefront slope sensor, the relationship between the position shift of an actual spot and the slope of an aberrated wavefront can be written as [22,23]: where f is the focal length of the microlens array, and ∆x and ∆y denote the shifts of the actual spot with respect to its ideal position in the x and y directions, respectively. Based on these relationships, the aberrated wavefront, W(x, y), can be recovered using either zonal or modal algorithms [22,24]. In a modal reconstruction, the wavefront is represented by a finite number of Zernike circle polynomial terms as: where a_j are the expansion coefficients. Taking the derivatives of both sides of equation (146) with respect to x and y at each sampling point gives [23]: where K is the total number of sampling points. Substituting equation (145) into equation (147) yields a matrix equation: where s is a 2K × 1 column vector containing the measured wavefront slope data, a is a J × 1 column vector containing the unknown Zernike expansion coefficients, and A is a 2K × J coefficient matrix whose elements can be computed using the derivative formulas of Zernike polynomials in equation (42). Since A is not square, the unknown a is computed in the least-squares sense by (pseudo-)inversion as: Substituting the expansion coefficients (equation (152)) into equation (146) gives the wavefront aberration of the eye. The first measurement of ocular aberrations using a Shack-Hartmann wavefront slope sensor was performed by Liang et al in 1994 [23]. The wavefront sensor was later improved by increasing the sampling density to provide more complete descriptions of the aberrations of the eye, including irregular as well as classical aberrations [152]. Since then, measuring ocular aberrations with Shack-Hartmann wavefront slope sensors has become common in clinical practice. Ocular aberrations can also be measured by a wavefront curvature sensor, in which curvature polynomials can be used to obtain Zernike aberration coefficients [158].
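The modal reconstruction can be sketched end to end with synthetic data. The sketch assumes lenslet centers on a square grid inside the unit pupil, pupil coordinates normalized by the pupil radius R so that the physical slope is (1/R)∂W/∂x, spot shifts ∆x = f·(physical slope), and five non-normalized Zernike terms (x-tilt, y-tilt, defocus, and the two astigmatisms) with analytic derivatives. Piston is omitted because it produces no slope and therefore cannot be recovered. The structure of the slope matrix A and the least-squares solution follow the text; the values of f and R and the helper names are illustrative.

```python
import math

# (term, d/dx, d/dy) in normalized pupil coordinates; non-normalized Zernike terms.
TERMS = [
    (lambda x, y: x,                       lambda x, y: 1.0,   lambda x, y: 0.0),     # x-tilt
    (lambda x, y: y,                       lambda x, y: 0.0,   lambda x, y: 1.0),     # y-tilt
    (lambda x, y: 2 * (x * x + y * y) - 1, lambda x, y: 4 * x, lambda x, y: 4 * y),   # defocus
    (lambda x, y: x * x - y * y,           lambda x, y: 2 * x, lambda x, y: -2 * y),  # astig 0°
    (lambda x, y: 2 * x * y,               lambda x, y: 2 * y, lambda x, y: 2 * x),   # astig 45°
]


def lenslet_centers(n=11):
    """Centers of an n x n lenslet grid that fall inside the unit pupil."""
    pts = [(-1 + 2 * (i + 0.5) / n, -1 + 2 * (k + 0.5) / n) for i in range(n) for k in range(n)]
    return [(x, y) for x, y in pts if x * x + y * y <= 1]


def slope_matrix(points, radius):
    """2K x J matrix A: rows for the x-slopes of all lenslets, then rows for the y-slopes."""
    ax = [[dx(x, y) / radius for _, dx, _ in TERMS] for x, y in points]
    ay = [[dy(x, y) / radius for _, _, dy in TERMS] for x, y in points]
    return ax + ay


def reconstruct(shifts_x, shifts_y, points, focal_length, radius):
    """Least-squares Zernike coefficients from spot shifts via (A^T A) a = A^T s."""
    s = [d / focal_length for d in shifts_x + shifts_y]
    A = slope_matrix(points, radius)
    J = len(TERMS)
    aug = [[sum(row[i] * row[k] for row in A) for k in range(J)]
           + [sum(row[i] * si for row, si in zip(A, s))] for i in range(J)]
    for col in range(J):                              # Gauss-Jordan elimination
        piv = max(range(col, J), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(J):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][J] / aug[i][i] for i in range(J)]
```

In a real aberrometer, the spot shifts come from centroiding the CCD image, and the recovered coefficients would then be converted to the OSA/ANSI normalization before reporting.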
In addition to wavefront reconstruction in a Shack-Hartmann wavefront slope sensor, Zernike polynomials are also very useful in the analysis of the aberrations of the eye [159]. Since Zernike polynomials are orthogonal over a circular disk, their expansion coefficients contain a wealth of information from which measurable metrics, such as the root-mean-square (RMS) wavefront error, equivalent defocus, and spherocylindrical refraction values [155], can be derived for a more intuitive description of eye aberrations. Consensus recommendations on definitions, conventions, and standards of Zernike polynomials for reporting optical aberrations of the human eye were developed by OSA in 1999 [27]. The recommendations were later standardized in ANSI Z80.28 [14] and ISO 24157 [15,17] and have been widely accepted by the vision community. Figure 38 shows a photograph of a commercially available aberrometer, which uses a Shack-Hartmann wavefront slope sensor for aberration measurement. In such instruments, Zernike polynomials are used both for wavefront reconstruction and for reporting the aberrations of the eye [155].
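As a hedged illustration of how such metrics follow from the coefficients, the sketch below assumes OSA/ANSI-normalized coefficients in micrometres keyed by the OSA/ANSI single index j, a pupil radius r in millimetres, and the power-vector conversion widely used in the vision literature: M = −4√3 c(2,0)/r², J0 = −2√6 c(2,2)/r², J45 = −2√6 c(2,−2)/r², with equivalent defocus 4√3 RMS/r². The refraction uses second-order terms only, which is one of several conventions; the function name and dictionary keys are hypothetical.

```python
import math

def ocular_metrics(coeffs, pupil_radius_mm):
    """Summary metrics from OSA/ANSI-normalized Zernike coefficients.

    coeffs: dict mapping OSA/ANSI single index j -> coefficient in micrometres.
    pupil_radius_mm: pupil radius in millimetres.
    Micrometres per square millimetre equals 1/m, i.e. dioptres (D).
    """
    # RMS wavefront error: piston (j = 0) excluded, orthonormal basis assumed
    rms = math.sqrt(sum(c * c for j, c in coeffs.items() if j != 0))
    r2 = pupil_radius_mm ** 2
    equivalent_defocus = 4 * math.sqrt(3) * rms / r2
    # Power-vector components from the second-order terms only
    M = -4 * math.sqrt(3) * coeffs.get(4, 0.0) / r2      # j = 4: Z_2^0 (defocus)
    J0 = -2 * math.sqrt(6) * coeffs.get(5, 0.0) / r2     # j = 5: Z_2^2
    J45 = -2 * math.sqrt(6) * coeffs.get(3, 0.0) / r2    # j = 3: Z_2^-2
    # Negative-cylinder sphere, cylinder, and axis in degrees [0, 180)
    cyl = -2 * math.hypot(J0, J45)
    sph = M - cyl / 2
    axis = 0.0 if cyl == 0 else math.degrees(0.5 * math.atan2(J45, J0)) % 180.0
    return {"rms_um": rms, "equiv_defocus_D": equivalent_defocus,
            "sphere_D": sph, "cylinder_D": cyl, "axis_deg": axis}

# Hypothetical eye: 0.5 um defocus and 0.2 um astigmatism over a 3 mm pupil radius
print(ocular_metrics({4: 0.5, 5: 0.2}, 3.0))
```

For a wavefront containing only defocus, the equivalent defocus equals the magnitude of the spherical equivalent M, which is a quick consistency check.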

Adaptive optics
Ground-based telescopes are important tools for exploring the universe. Their image quality is critical to astronomical observations but can be degraded significantly by optical aberrations induced by atmospheric turbulence. Light from distant stars arrives as plane waves before reaching the Earth's atmosphere and could, in principle, form images limited only by diffraction. However, atmospheric turbulence distorts the wavefront as the light propagates through the atmosphere, degrading the image quality of the telescope. Adaptive optics is a technology that improves the performance of an astronomical telescope by using wavefront correctors to compensate for the wavefront aberrations induced by atmospheric turbulence [160,161]. The technique was first envisioned by Babcock in 1953 [162] but did not come into common use until the 1990s.
A typical adaptive optics system for an astronomical telescope consists of three principal subsystems: a wavefront sensor, a deformable mirror, and a control computer [163], as illustrated in figure 39. Its working principle is sketched in figure 40 and can be understood as follows. A telescope captures the light from the object of interest, such as a distant star or a satellite. Before being focused on the camera, the light is first sampled by a wavefront sensor, such as a Shack-Hartmann wavefront sensor, and the sampled data are transferred to a control computer. The control computer performs a mathematical reconstruction to recover the wavefront distortion of the sampled light and drives a servo system that controls the wavefront corrector, such as a deformable mirror, to compensate for the distortion. After compensation, the wavefront is less distorted, yielding images of improved quality at the camera. If the light from the object is too faint to determine the wavefront distortion, reference sources, such as nearby natural guide stars or artificial guide stars, can be used to facilitate the correction process. Figure 41 shows an example in which adaptive optics significantly improves the image quality of a telescope.

Figure 39. Schematic of an astronomical telescope equipped with an adaptive optics system, which contains a deformable mirror, a wavefront sensor, and a control computer. Reproduced with permission from [163].
The use of Zernike polynomials in adaptive optics has two aspects. On the one hand, Zernike polynomials provide a convenient set of functions for the representation, reconstruction, and analysis of wavefront distortions in adaptive optics. Generally, atmospheric turbulence, described by the Kolmogorov model [160], generates smoothly varying optical wavefronts [161], which can be decomposed into different modes using Zernike polynomials [9,161]. The decomposition makes it possible to use modal algorithms to reconstruct and analyze wavefronts measured by slope-sensitive wavefront sensors, such as the Shack-Hartmann wavefront slope sensor described in section 4.4. On the other hand, Zernike polynomials offer a modal basis for compensating wavefront distortions caused by atmospheric turbulence. In practice, both zonal and modal approaches are used for wavefront compensation in adaptive optics [161]. The zonal approach achieves compensation with an array of independent subapertures, while the modal approach compensates for distorted wavefronts over the whole aperture. High-order aberrations are better suited to the zonal method, whereas low-order aberrations can be compensated for more effectively by Zernike-based modal methods. Although Zernike polynomials are not statistically orthogonal and their coefficients are not statistically independent when used for turbulence compensation [161], they are near-optimal for low-order corrections [9,161,165].
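To illustrate the diminishing returns of correcting more and more Zernike modes, the sketch below uses Noll's large-J approximation for the residual phase variance of Kolmogorov turbulence after ideally correcting the first J Noll-ordered modes, Δ_J ≈ 0.2944 J^(−√3/2) (D/r0)^(5/3) rad², where D is the telescope diameter and r0 the Fried parameter, together with the extended Maréchal approximation S ≈ exp(−Δ_J) for the Strehl ratio. Both are approximations assumed here rather than results derived in this section; the formula is intended for large J, and the value of D/r0 is hypothetical.

```python
import math

def noll_residual_variance(num_corrected, d_over_r0):
    """Approximate residual phase variance (rad^2) after ideally correcting
    the first num_corrected Noll-ordered Zernike modes of Kolmogorov
    turbulence (Noll's large-J asymptotic form)."""
    if num_corrected < 1:
        raise ValueError("num_corrected must be >= 1")
    return 0.2944 * num_corrected ** (-math.sqrt(3) / 2) * d_over_r0 ** (5 / 3)

def marechal_strehl(variance_rad2):
    """Extended Marechal approximation of the Strehl ratio, exp(-sigma^2)."""
    return math.exp(-variance_rad2)

# Hypothetical telescope with D / r0 = 10
for J in (21, 55, 105, 210):
    var = noll_residual_variance(J, d_over_r0=10.0)
    print(f"J = {J:3d}: residual variance {var:.3f} rad^2, Strehl ~ {marechal_strehl(var):.3f}")
```

Because the residual falls only as J^(−√3/2), each additional block of high-order modes buys less improvement, which is consistent with using modal correction for the low orders.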

Image analysis
In addition to applications in optics, Zernike polynomials also play an important role in moment-based image analysis. Image moments are real- or complex-valued quantities used to characterize an image function and describe its features. Moments are commonly used in statistics to characterize the distribution of random variables and in mechanics to describe the mass distribution of a body. Their use in image analysis is straightforward if the pixel intensity of a binary or gray-level image is treated as a random variable. Image moments, M, can be considered as projections of an image function onto a set of basis functions and are mathematically defined as:

$$M = \iint f(x,y)\,\psi(x,y)\,dx\,dy, \tag{153}$$

where f(x, y) is the image function and ψ(x, y) is the basis function. Image moments have been intensively studied in image analysis because they can be used to construct moment-invariant features for the description and recognition of deformed objects and patterns. Common moments used for image analysis include geometric moments, rotational moments, complex moments, and orthogonal moments [48,179]. Among them, geometric moments, which use a power series as the basis function [ψ(x, y) = x^n y^l], are the earliest. Based on geometric moments, Hu first introduced moment invariants in 1962 using the theory of algebraic invariants and constructed seven moment invariants to translation, rotation, and scaling [180]. This work opened the door to image analysis and pattern recognition based on moment invariants. In contrast to geometric moments, orthogonal moments are a family of image moments that use orthogonal polynomials as the kernel. Compared with geometric moments, orthogonal moments have a simple inverse transform and minimal information redundancy, and they are widely used in practice. Zernike moments are an important type of orthogonal moments.
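As a small illustration of geometric-moment invariants, the sketch below computes central moments μ_pq, normalizes them as η_pq = μ_pq / μ_00^(1 + (p + q)/2), and forms the first two of Hu's seven invariants, φ1 = η20 + η02 and φ2 = (η20 − η02)² + 4η11². The test pattern and function names are hypothetical. Only a 90° rotation and an integer translation are checked, because these are exact on a pixel grid, whereas arbitrary rotations and scalings are only approximately invariant after resampling.

```python
import numpy as np

def central_moment(img, p, q):
    """Central geometric moment mu_pq of a gray-level image (x = column, y = row)."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.indices(img.shape)
    m00 = img.sum()
    xc, yc = (xs * img).sum() / m00, (ys * img).sum() / m00
    return ((xs - xc) ** p * (ys - yc) ** q * img).sum()

def hu_phi1_phi2(img):
    """First two of Hu's seven invariants, built from normalized central moments."""
    mu00 = central_moment(img, 0, 0)

    def eta(p, q):
        return central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return e20 + e02, (e20 - e02) ** 2 + 4 * e11 ** 2

rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[8:20, 5:14] = rng.random((12, 9))                      # hypothetical test pattern
print(hu_phi1_phi2(img))
print(hu_phi1_phi2(np.rot90(img)))                         # rotated by 90 degrees
print(hu_phi1_phi2(np.roll(img, (7, 11), axis=(0, 1))))    # translated, no wrap-around
```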

Zernike moments and fast calculation
Zernike moments for image analysis and pattern recognition were first introduced by Teague in 1980 [12]. They are defined over a unit circle by employing Zernike polynomials as the basis functions and can be written as [12]:

$$M_{nl} = \iint_{x^2+y^2 \le 1} f(x,y)\,\left[V_n^l(x,y)\right]^{*}\,dx\,dy, \tag{154}$$

where V_n^l is the non-normalized complex Zernike polynomial (equation (16)), the asterisk denotes the complex conjugate, and M_nl is the Zernike moment of degree n with repetition l. Here n is a non-negative integer, l is an integer, and n − |l| is non-negative and even. The completeness and orthogonality of V_n^l allow a square-integrable image function f(x, y) defined on a unit disk to be represented using Zernike polynomials as [181,182]:

$$f(x,y) = \sum_{n}\sum_{l} \frac{n+1}{\pi}\, M_{nl}\, V_n^l(x,y). \tag{155}$$
The expression in equation (155) suggests that the image f(x, y) can, in theory, be reconstructed from its Zernike moments. However, this property has limited practical importance, because moments are generally not a good tool for image compression [183]. For a digital image, equation (154) can be written in discrete form as:

$$M_{nl} = \sum_{x}\sum_{y} f(x,y)\,\left[V_n^l(x,y)\right]^{*}\,\Delta x\,\Delta y,$$

where x² + y² ⩽ 1 and Δx Δy is the area of one pixel in normalized coordinates.

Pseudo Zernike moments are defined analogously to Zernike moments (equation (154)), with pseudo Zernike polynomials as the kernel, and also hold the property of rotation invariance. However, they remove the constraint that n − |l| be even and thus have more moment invariants up to the same order n: pseudo Zernike moments contain (n + 1)² invariants, while Zernike moments have (n + 1)(n + 2)/2. It has been shown that pseudo Zernike moments are less sensitive to image noise than conventional Zernike moments [48]. Pseudo Zernike moments also have fast computation algorithms [196,197] and have been used in a range of image analysis and pattern recognition applications [198,199].
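The discrete form of equation (154) can be sketched as a direct pixel sum. This NumPy sketch assumes V_n^l(ρ, θ) = R_n^|l|(ρ) exp(ilθ) with the standard finite-sum radial polynomial, maps the pixel centres of a square N × N image onto (−1, 1) in both directions, and keeps only pixels with x² + y² ⩽ 1. It is not one of the fast computation algorithms mentioned in this section, and the function names and test image are hypothetical.

```python
import numpy as np
from math import factorial

def radial_poly(n, l, rho):
    """Zernike radial polynomial R_n^{|l|}(rho) from the standard finite sum."""
    l = abs(l)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - l) // 2 + 1):
        coef = (-1) ** s * factorial(n - s) / (
            factorial(s) * factorial((n + l) // 2 - s) * factorial((n - l) // 2 - s))
        out += coef * rho ** (n - 2 * s)
    return out

def zernike_moment(img, n, l):
    """Discrete Zernike moment M_nl of a square image mapped onto the unit disk."""
    if n < 0 or abs(l) > n or (n - abs(l)) % 2:
        raise ValueError("need n >= 0, |l| <= n and n - |l| even")
    img = np.asarray(img, dtype=float)
    N = img.shape[0]
    c = (2 * np.arange(N) + 1 - N) / N          # pixel centres in (-1, 1)
    x, y = np.meshgrid(c, c)                    # x along columns, y along rows
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    inside = rho <= 1.0                         # pixels with x^2 + y^2 <= 1
    V = radial_poly(n, l, rho[inside]) * np.exp(1j * l * theta[inside])
    dA = (2.0 / N) ** 2                         # pixel area in normalized coordinates
    return np.sum(img[inside] * np.conj(V)) * dA

rng = np.random.default_rng(1)
img = rng.random((64, 64))                      # hypothetical test image
for n, l in [(2, 0), (3, 1), (4, 2)]:
    a, b = zernike_moment(img, n, l), zernike_moment(np.rot90(img), n, l)
    print(n, l, abs(a), abs(b))                 # magnitudes agree; phases differ
```

Because this pixel grid maps onto itself under a 90° rotation, the magnitudes |M_nl| of the original and rotated images agree to rounding error and only the phase changes. For arbitrary rotation angles, resampling makes the invariance approximate.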

Discussion and conclusion
Although Zernike polynomials have been successfully used in a range of fields, it is important to be aware of potential pitfalls. First, Zernike circle polynomials are orthogonal only over a unit circle. For systems with noncircular pupils, such as annular and hexagonal pupils, Zernike circle polynomials are neither orthogonal nor do they represent balanced aberrations. In these cases, orthonormal polynomials can be constructed by orthogonalizing Zernike circle polynomials over the pupil [37,105], as discussed in section 3.1. Second, Zernike polynomials are orthogonal only in the continuous sense. In general, they are not strictly orthogonal over a discrete set of data points in numerical simulations or real experiments, so potential errors should be taken into consideration when data points are sparse or unevenly distributed [200,201]. Third, when comparing the Zernike expansion coefficients of two wavefronts, it is important to specify the pupil diameters, since the expansion coefficients vary with aperture size. This is especially true when comparing the aberrations of the eye from two measurements. Finally, Zernike polynomials may fail to represent some complex, irregular surfaces or shapes with a reasonable number of terms. Representative examples include fabrication errors introduced by single-point diamond turning [19] and irregular corneal aberrations of postsurgical or pathological eyes [202,203].
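The second pitfall can be checked numerically. The sketch below samples the first six OSA/ANSI-normalized Zernike polynomials (j = 0 to 5; the normalization and ordering are assumptions of this sketch) at pixel centres inside the unit disk and compares their discrete Gram matrix, whose entries approximate (1/π)∬ Z_i Z_j dx dy, with the identity matrix. The grid sizes and function names are hypothetical.

```python
import numpy as np

def osa_terms(x, y):
    """OSA/ANSI-normalized Zernike polynomials j = 0..5 in Cartesian form."""
    s3, s6 = np.sqrt(3.0), np.sqrt(6.0)
    r2 = x**2 + y**2
    return np.stack([
        np.ones_like(x),        # j = 0, piston
        2 * y,                  # j = 1, Z_1^-1
        2 * x,                  # j = 2, Z_1^1
        2 * s6 * x * y,         # j = 3, Z_2^-2
        s3 * (2 * r2 - 1),      # j = 4, Z_2^0 (defocus)
        s6 * (x**2 - y**2),     # j = 5, Z_2^2
    ])

def discrete_gram(n_per_axis):
    """Sampled Gram matrix; the mean over samples approximates (1/pi) times the disk integral."""
    c = (2 * np.arange(n_per_axis) + 1 - n_per_axis) / n_per_axis
    x, y = np.meshgrid(c, c)
    inside = x**2 + y**2 <= 1.0
    Z = osa_terms(x[inside], y[inside])    # shape (6, K)
    return Z @ Z.T / Z.shape[1]

for n in (8, 32, 128):
    G = discrete_gram(n)
    print(n, np.abs(G - np.eye(6)).max())  # largest departure from orthonormality
```

The departure from the identity shrinks as the grid is refined. On the coarsest grid, piston and defocus are noticeably correlated, which is the kind of error that leaks into least-squares coefficients when data are sparse.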
In conclusion, we provide a comprehensive account of the development of Zernike polynomials in the past several decades, including the history, definitions, mathematical properties, roles in wavefront fitting, relationships with associated physical concepts, and connections with other polynomials, and survey their state-of-the-art applications. Potential pitfalls when using the Zernike polynomials are also discussed.
For Zernike polynomials over circular pupils, at least six different indexing schemes are used by national and international standards, commercial software, and prominent scientists: the Noll, OSA/ANSI, Fringe (University of Arizona), ISO 14999, Born and Wolf, and Malacara indices. All of them share the same expression for the radial polynomials, which derive from the eigenfunctions of a second-order rotationally invariant partial differential equation [1,6]. However, they differ from each other in naming, normalization, and indexing strategy, as compared and summarized in table 8. Zernike polynomials possess rigorous mathematical properties, such as orthogonality and symmetry, and are closely related to other functions, such as XY monomials, Jacobi polynomials, Legendre polynomials, Bessel functions, and pseudo Zernike polynomials. Their Fourier transforms, integral representations, derivatives, and recurrence relations can be obtained explicitly to facilitate solving complex problems. Zernike polynomials are well suited to wavefront analysis in optics because they correspond closely to the Seidel aberrations. The wavefront fitting problem can be solved using the least-squares method. For normalized polynomials, the expansion coefficients represent the standard deviations of the corresponding aberration terms (except the piston term) and contain a wealth of information about the wavefront. The expansion coefficients can be easily transformed when the original wavefront is translated, rotated, or resized (section 2.3.2).
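Because the indexing schemes differ, converting between them is a frequent source of errors. The sketch below maps a double index (n, m) to the OSA/ANSI single index, j = [n(n + 2) + m]/2 starting at 0, and to the Noll index starting at 1, in which even j corresponds to cosine terms (m > 0) and odd j to sine terms (m < 0). The sign convention for m and the function names are assumptions of this sketch; table 8 remains the reference for the other schemes.

```python
def _check(n, m):
    """Validate a Zernike double index (n, m)."""
    if n < 0 or abs(m) > n or (n - m) % 2:
        raise ValueError(f"invalid Zernike index (n={n}, m={m})")

def osa_index(n, m):
    """OSA/ANSI single index, starting at j = 0 for piston."""
    _check(n, m)
    return (n * (n + 2) + m) // 2

def noll_index(n, m):
    """Noll single index, starting at j = 1 for piston.

    Within each radial order n, modes are ordered by increasing |m|; for
    m != 0 the even j of each pair is the cosine term (m > 0) and the odd j
    is the sine term (m < 0)."""
    _check(n, m)
    j = n * (n + 1) // 2 + abs(m)
    if m == 0:
        return j + 1
    if (m > 0) != (j % 2 == 0):
        j += 1
    return j

for n in range(4):
    for m in range(-n, n + 1, 2):
        print(f"(n={n}, m={m:+d})  OSA/ANSI j={osa_index(n, m):2d}  Noll j={noll_index(n, m):2d}")
```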
Zernike circle polynomials are orthogonal only over the interior of a unit circle. Polynomials orthogonal over noncircular pupils can be constructed from Zernike circle polynomials, most commonly by the recursive Gram-Schmidt orthogonalization method. Based on this method, orthonormal polynomials over five noncircular pupils common in optics (annular, rectangular, square, hexagonal, and elliptical) are discussed. The orthonormal polynomials over annular pupils, called Zernike annular polynomials, are reviewed in detail because of their practical significance. The Zernike annular polynomials are defined based on the Noll indices, and their recurrence relations and Fourier transforms are presented explicitly. Like Zernike circle polynomials, Zernike annular polynomials correspond closely to the Seidel aberrations and are well suited to wavefront analysis over annular pupils.
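A purely numerical version of the Gram-Schmidt construction can be sketched as follows. It is not the recursive closed-form approach described in the review: it applies modified Gram-Schmidt to sampled circle polynomials (piston, x-tilt, and defocus) over a hypothetical annulus with obscuration ratio ε = 0.5, using uniform weights so that the inner product approximates the average over the annular pupil, and compares the results with the Noll-normalized annular forms 2x/√(1 + ε²) and √3(2ρ² − 1 − ε²)/(1 − ε²). The grid size and names are hypothetical.

```python
import numpy as np

def gram_schmidt(Z, weights):
    """Modified Gram-Schmidt on the rows of Z (shape (J, K)), in row order,
    with respect to the weighted inner product <f, g> = sum_k w_k f_k g_k."""
    Q = []
    for z in Z:
        v = np.array(z, dtype=float)
        for q in Q:
            v = v - np.sum(weights * v * q) * q   # remove component along q
        v = v / np.sqrt(np.sum(weights * v * v))  # normalize to unit weighted norm
        Q.append(v)
    return np.array(Q)

eps, n = 0.5, 400                                  # hypothetical obscuration ratio, grid size
c = (2 * np.arange(n) + 1 - n) / n
x, y = np.meshgrid(c, c)
rho2 = x**2 + y**2
mask = (rho2 <= 1.0) & (rho2 >= eps**2)            # annular pupil eps <= rho <= 1
x, y, rho2 = x[mask], y[mask], rho2[mask]
w = np.full(x.size, 1.0 / x.size)                  # uniform weights: mean over the annulus

circle = np.stack([np.ones_like(x), 2 * x, np.sqrt(3) * (2 * rho2 - 1)])
annular = gram_schmidt(circle, w)

tilt_ref = 2 * x / np.sqrt(1 + eps**2)
defocus_ref = np.sqrt(3) * (2 * rho2 - 1 - eps**2) / (1 - eps**2)
print(np.abs(annular[1] - tilt_ref).max(), np.abs(annular[2] - defocus_ref).max())
```

Since the sampled inner product only approximates the integral over the annulus, the agreement with the analytic forms improves as the grid is refined.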
In addition, we survey state-of-the-art applications of Zernike polynomials in a range of fields, including the diffraction theory of aberrations, optical design, optical testing, ophthalmic optics, adaptive optics, and image analysis. In the diffraction theory of aberrations, Zernike polynomials are used to expand the wavefront aberration at the exit pupil of an optical system, and the corresponding expansion coefficients are used to compute the PSF at the image plane according to the (extended) Nijboer-Zernike theory. In optical design, Zernike polynomials are used to analyze the wavefront aberration of a designed optical system, represent freeform surfaces, and facilitate system optimization. In optical testing, Zernike polynomials are used to fit measured interferometric wavefronts and remove misalignment errors. In ophthalmic optics, Zernike polynomials are used to reconstruct ocular wavefronts measured by a Shack-Hartmann wavefront slope sensor and to report the optical aberrations of the eye. In adaptive optics, Zernike polynomials are used for the representation, reconstruction, and compensation of optical wavefronts distorted by atmospheric turbulence. In image analysis, Zernike polynomials are used to define Zernike moments and pseudo Zernike moments, which possess the property of rotation invariance and can be used as shape descriptors for pattern recognition.
This review aims to clear up the confusion among different indexing schemes, provide a self-contained reference guide for beginners as well as specialists, and facilitate further developments and applications of Zernike polynomials.

Data availability statement
The data generated and/or analyzed during the current study are not publicly available for legal/ethical reasons but are available from the corresponding author on reasonable request.