Optical Dirac equation

We write the charge-free Maxwell equations in a form analogous to that of the Dirac equation for a free electron. This allows us to apply to light some of the ideas developed for the relativistic theory of the electron. Valuable insight is gained, thereby, into the forms of the optical spin and orbital angular momenta.

Maxwellʼs equations are remarkable in that they have survived the introduction of quantum theory and of both the special and general theories of relativity. The form in which the equations are expressed has changed many times, however. It is interesting to note that in his original paper, Maxwell used no fewer than twenty symbols to represent his various fields [1]. It was Heaviside who, in the words of Sir Edmund Whittaker, 'clear(ed) away this accumulation of rubbish' and gave us Maxwellʼs equations in their now familiar form [2]. For the free electromagnetic field, that is in the absence of charges and currents, we write these as
\[ \nabla\cdot\mathbf{E} = 0, \qquad \nabla\cdot\mathbf{B} = 0, \qquad \nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}, \qquad \nabla\times\mathbf{B} = \frac{\partial\mathbf{E}}{\partial t}. \qquad (1) \]
Here we have used the natural system of units in which $\varepsilon_0$ and $\mu_0$ are both unity, as is the speed of light. The quantities E and B are the (real) electric field and the magnetic field (or perhaps, more properly, the magnetic flux density). It is common, also, to write these in a manifestly Lorentz-invariant form by use of the field tensor $F^{\mu\nu}$ or, indeed, in general covariant form to allow for the description of non-inertial frames of motion or the effects of a gravitational field [3,4].
We shall be interested in the links between Maxwellʼs equations and those arising in quantum mechanics and especially in relativistic quantum theory. Perhaps the first step in this direction was published in the earliest days of the old quantum theory, although such a link was not the initial aim. Riemann and Silberstein introduced the complex vector field
\[ \mathbf{F} = \frac{1}{\sqrt{2}}\left(\mathbf{E} + i\mathbf{B}\right), \]
the effect of which is to reduce Heavisideʼs four equations to just two [5-8]:
\[ \nabla\cdot\mathbf{F} = 0, \qquad \nabla\times\mathbf{F} = i\frac{\partial\mathbf{F}}{\partial t}. \]
The Riemann-Silberstein vector appears in some textbooks [9,10] and has been applied to a variety of problems in classical electromagnetism, including calculating the fields associated with a moving charge [11]. Of these two equations, the first may be viewed as a constraint on the forms of the allowed Riemann-Silberstein vectors, with the second providing the dynamics. We can write the second equation in a matrix form [12-16],
\[ i\frac{\partial\mathbf{F}}{\partial t} = -i\left(\boldsymbol{\tau}\cdot\nabla\right)\mathbf{F}, \]
in which F is treated as a three-component column vector and the components of τ are the spin-1 matrices with elements $(\tau_i)_{jk} = -i\epsilon_{ijk}$. This has the appearance of a quantum-mechanical wave equation. The resemblance is yet more apparent if we multiply by ℏ, introduce the momentum operator $\mathbf{p} = -i\hbar\nabla$ and identify $\boldsymbol{\tau}\cdot\mathbf{p}$ as our Hamiltonian. It is interesting to note the similarity between this equation and the two-component theory of Weyl, appropriate for neutrinos, which we can obtain by replacing the three-component field with a two-component one and the spin-1 τ matrices with the familiar Pauli matrices appropriate for spin-1/2 particles [18].
Although it is not our main aim, no account of this topic would be complete without a discussion of the photon wave function. It is not surprising, given the above analysis, that many authors have sought a wave function for the photon. Perhaps the most natural approach is to adopt the Riemann-Silberstein vector, or equivalently its column-vector form, for this purpose [12,13,15,16,19-21]. One objection we might raise to this is that the Riemann-Silberstein vector has the dimensions of the square root of an energy density rather than the square root of a (probability) density normally associated with a wave function.
It has been suggested that we can solve this problem by dividing the Riemann-Silberstein vector by $\sqrt{p}$ or $(-\hbar^2\nabla^2)^{1/4}$, so that each frequency component of the field is divided by $\sqrt{\hbar\omega}$ [22-26]. The problem with this, however, is that the resulting wave functions exhibit highly nonlocal behaviour and a number of other undesirable features [16]. We can take the Riemann-Silberstein vector as our wave function, but it is necessary to get the dimensions right either by dividing by the mean energy or by introducing the reciprocal of the Hamiltonian into the definition of our scalar product. These issues are discussed further in the excellent review article by Białynicki-Birula [16].
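The matrix form of the curl equation discussed above rests on the fact that the spin-1 matrices act as a vector product. The following Python fragment is a minimal numerical sketch of that fact. It assumes the Cartesian representation $(\tau_i)_{jk} = -i\epsilon_{ijk}$, which is one common convention and may differ from other representations by a unitary change of basis; all function and variable names are illustrative only.

```python
def levi_civita(i, j, k):
    """Antisymmetric symbol epsilon_{ijk} for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

# Assumed convention: (tau_i)_{jk} = -i * epsilon_{ijk}.
TAU = [[[-1j * levi_civita(i, j, k) for k in range(3)]
        for j in range(3)] for i in range(3)]

def cross(a, b):
    """Ordinary vector product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def tau_dot(a, v):
    """Apply the 3x3 matrix (tau . a) to the three-component vector v."""
    return [sum(a[i] * TAU[i][j][k] * v[k]
                for i in range(3) for k in range(3))
            for j in range(3)]

def max_diff(u, v):
    """Largest absolute difference between two vectors."""
    return max(abs(x - y) for x, y in zip(u, v))

# Check (tau . a) v = i (a x v) for arbitrary test vectors.
a = [0.3, -1.2, 0.7]
v = [1 + 2j, -0.5j, 0.25]
identity_error = max_diff(tau_dot(a, v), [1j * c for c in cross(a, v)])

# For a plane wave exp(i k.r - i w t), curl F = i dF/dt becomes (tau . k) F = w F.
# A circularly polarised amplitude travelling along z is an eigenvector with w = |k|.
kappa = 2.0
k_vec = [0.0, 0.0, kappa]
F = [2 ** -0.5, 1j * 2 ** -0.5, 0.0]
eigen_error = max_diff(tau_dot(k_vec, F), [kappa * f for f in F])
```

Both `identity_error` and `eigen_error` should vanish to rounding accuracy, which illustrates why $\boldsymbol{\tau}\cdot\mathbf{p}$ can play the role of a Hamiltonian for the field.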
It is not our main aim to investigate quantum optical effects and we neither require nor seek a photon wave function. Instead, we seek to exploit the analogy between Maxwellʼs equations and the Dirac equation to learn something about classical optics and, in particular, the mechanical properties of light. There exist a variety of Dirac-like formulations of Maxwellʼs equations, but the most appropriate for our purposes is essentially that proposed by Darwin [27,28] for a six-component 'spinor' ψ: and it follows that any spinor of this form will give equivalent results. The choice of spinor is, therefore, a conventional one and we opt to work with ψ as given in (8).
In this paper we apply to our optical Dirac equation some of the methods that have been developed for Diracʼs electron equation. We find, in particular, that the Foldy-Wouthuysen transformation and the elimination of optical Zitterbewegung lead, in a natural way, to familiar slowly-varying quantities.
The paper is intended to present the new material in its historical context and for this reason well-known material is blended with the less familiar and with material we believe to be original. It may help the more experienced and widely-read reader, however, to give an indication of where the novel material is to be found. The optical Foldy-Wouthuysen transformation has been adapted from one for particles with spin 0 or spin 1 [47] but has not, to my knowledge, been applied to optics. The associated splitting of the fields into positive and negative frequency parts and the consequent derivation of the paraxial wave equation is, I think, new, as is the approach to deriving slowly-varying mechanical properties. The spin and orbital angular momenta, helicity, zilches and infra-zilches have been given previously, but the novelty here is to show that they all arise naturally from the Dirac formalism.
It is pleasing to find, in particular, that many of the hard-won properties of optical angular momentum appear naturally from the optical Dirac equation.

The Dirac equation: a reminder
The Dirac equation was formulated to provide a quantum theory of the electron and a major motivation for this work was the existence of 'duplexity' phenomena, in which the number of observed energy eigenstates for an electron in an atom is twice that given by Schrödinger theory [48]. We note with interest, in light of our investigation of optical angular momentum, Diracʼs observation that 'the resultant orbital angular momentum of an electron moving in an orbit in a central field of force is not a constant'. In its place, it is the total angular momentum, comprised of both an orbital and a spin part that is conserved.
The Dirac equation is now a standard part of the physics curriculum and a discussion of it can be found in any of a number of now standard texts [49-53]. We present, in this section, only a brief review of this material as a precursor to studying the corresponding features of our optical Dirac equation. Further details can be found by consulting the cited texts. The electron Dirac equation has the form
\[ i\hbar\frac{\partial\psi}{\partial t} = \left(\boldsymbol{\alpha}\cdot\mathbf{p} + \beta m\right)\psi, \]
where m is the mass of the electron and we continue to work in units in which the speed of light is unity. The state ψ is a four-component spinor and the operators α and β are the 4 × 4 matrices
\[ \boldsymbol{\alpha} = \begin{pmatrix} 0 & \boldsymbol{\sigma} \\ \boldsymbol{\sigma} & 0 \end{pmatrix}, \qquad \beta = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}, \]
where σ is a vector, the components of which are the familiar Pauli matrices, 0 is the 2 × 2 zero matrix and I is the unit matrix. The square of each of these matrices is the unit matrix and they mutually anticommute:
\[ \alpha_i\alpha_j + \alpha_j\alpha_i = 2\delta_{ij}, \qquad \beta\alpha_i + \alpha_i\beta = 0. \]
We can identify the positive quantity $\rho = \psi^\dagger\psi$ with the probability density for the electron position by normalizing it so that integration over all space gives unity. This density obeys an associated continuity equation, which may be derived directly from the Dirac equation,
\[ \frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{j} = 0, \qquad (15) \]
where j is identified as the probability current
\[ \mathbf{j} = \psi^\dagger\boldsymbol{\alpha}\psi. \]
This leads us to identify α as the velocity operator for the electron. The eigenvalues of the three operators $\alpha_x$, $\alpha_y$ and $\alpha_z$ are all ±1, each of which corresponds to the velocity of light. We can resolve this apparent conflict with experience by noting that a wave-packet formed by superposing energy eigenstates moves with the expected group velocity, given by the ratio of the average momentum and the average energy. There is a remnant of the velocity eigenvalues in the rapid oscillations associated with the positive- and negative-energy components of the state. This Zitterbewegung has a frequency in excess of $2m/\hbar$ or, equivalently, a spatial period comparable to the Compton wavelength of the electron.
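The algebraic properties of the Dirac matrices quoted above are easily confirmed by direct multiplication. The sketch below does this in plain Python, assuming the standard (Dirac) representation in which α has the Pauli matrices in its off-diagonal blocks and β is diagonal; the helper names are hypothetical and chosen only for this illustration.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def zeros(n):
    return [[0] * n for _ in range(n)]

def block(tl, tr, bl, br):
    """Assemble a matrix from four equally sized square blocks."""
    top = [r1 + r2 for r1, r2 in zip(tl, tr)]
    bottom = [r1 + r2 for r1, r2 in zip(bl, br)]
    return top + bottom

def anticommutator(a, b):
    return add(matmul(a, b), matmul(b, a))

PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
I2, Z2 = identity(2), zeros(2)

# Assumed standard representation of the Dirac matrices.
ALPHA = [block(Z2, s, s, Z2) for s in PAULI]
BETA = block(I2, Z2, Z2, scale(-1, I2))
I4, Z4 = identity(4), zeros(4)

# alpha_i alpha_j + alpha_j alpha_i = 2 delta_ij
alpha_ok = all(
    anticommutator(ALPHA[i], ALPHA[j]) == scale(2 if i == j else 0, I4)
    for i in range(3) for j in range(3))

# beta^2 = 1 and beta anticommutes with every alpha_i
beta_ok = matmul(BETA, BETA) == I4 and all(
    anticommutator(BETA, a) == Z4 for a in ALPHA)

# Each alpha_i squares to unity and is traceless, so its eigenvalues are +1 and -1.
traceless = all(sum(a[n][n] for n in range(4)) == 0 for a in ALPHA)
```

Because each $\alpha_i$ squares to the unit matrix and has zero trace, its eigenvalues can only be +1 and -1, each doubly degenerate, which is the origin of the light-speed velocity eigenvalues noted in the text.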

Mechanical properties
The properties of an electron in relativistic quantum theory are represented by operators, just as in the more familiar non-relativistic theory. The position and momentum operators, x and p, satisfy the canonical commutation relation
\[ [x_i, p_j] = i\hbar\delta_{ij}. \]
With the Dirac Hamiltonian $H = \boldsymbol{\alpha}\cdot\mathbf{p} + \beta m$, the Heisenberg equation of motion for the position operator is
\[ \dot{\mathbf{x}} = \frac{i}{\hbar}[H, \mathbf{x}] = \boldsymbol{\alpha}, \]
which we recognize as the velocity operator. The momentum operator commutes with the Hamiltonian and so, unlike the velocity operator, it is a constant of the motion:
\[ \dot{\mathbf{p}} = \frac{i}{\hbar}[H, \mathbf{p}] = 0. \]
This demonstrates the conservation of momentum for a free electron. It is with the angular momentum that the surprises begin, as the orbital angular momentum for a free electron is not conserved. We can define an orbital angular momentum operator precisely as in non-relativistic quantum theory:
\[ \mathbf{L} = \mathbf{x}\times\mathbf{p}. \]
This operator does not commute with the Dirac Hamiltonian and so satisfies the non-trivial equation of motion
\[ \dot{\mathbf{L}} = \frac{i}{\hbar}[H, \mathbf{L}] = \boldsymbol{\alpha}\times\mathbf{p}. \]
In contrast with the non-relativistic theory, the orbital angular momentum of a free electron, or of one confined by a rotationally symmetric potential, is not a constant of the motion. Diracʼs resolution of this paradoxical situation was to add to the orbital angular momentum a spin angular momentum
\[ \boldsymbol{\Sigma} = \frac{\hbar}{2}\begin{pmatrix} \boldsymbol{\sigma} & 0 \\ 0 & \boldsymbol{\sigma} \end{pmatrix}. \]
Like the orbital angular momentum, this is not a constant of the motion, but satisfies the equation of motion
\[ \dot{\boldsymbol{\Sigma}} = \frac{i}{\hbar}[H, \boldsymbol{\Sigma}] = -\boldsymbol{\alpha}\times\mathbf{p}. \]
In relativistic quantum theory the spin and orbital angular momenta are naturally coupled together and it is only the total angular momentum, given by the sum of these, that is conserved. The total angular momentum operator
\[ \mathbf{J} = \mathbf{L} + \boldsymbol{\Sigma} \qquad (25) \]
is a constant of the motion. It is this quantity, rather than the orbital or spin angular momenta, that appears in the relativistic expressions for atomic spectra [50,51,54]. We should note that the separation implicit in (25) is not, itself, Lorentz covariant.
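The coupling between spin and orbital angular momentum described above can be illustrated numerically. The following sketch, which is not taken from the original analysis, checks the matrix part of the argument: for a fixed numerical momentum p it confirms that the commutator of the Dirac Hamiltonian with the spin operator equals $i(\boldsymbol{\alpha}\times\mathbf{p})$. It assumes units with ℏ = 1, the standard representation of the Dirac matrices and a spin operator equal to one half of the block-diagonal Pauli matrices; the test values of p and m are arbitrary.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def block(tl, tr, bl, br):
    return ([r1 + r2 for r1, r2 in zip(tl, tr)]
            + [r1 + r2 for r1, r2 in zip(bl, br)])

def commutator(a, b):
    return add(matmul(a, b), scale(-1, matmul(b, a)))

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
ALPHA = [block(Z2, s, s, Z2) for s in PAULI]
BETA = block(I2, Z2, Z2, scale(-1, I2))
SPIN = [scale(0.5, block(s, Z2, Z2, s)) for s in PAULI]  # hbar = 1

p = [0.4, -1.1, 0.6]   # arbitrary test momentum
m = 1.3                # arbitrary test mass

# Dirac Hamiltonian H = alpha . p + beta m for this momentum.
H = scale(m, BETA)
for i in range(3):
    H = add(H, scale(p[i], ALPHA[i]))

def alpha_cross_p(k):
    """Component k of alpha x p, using cyclic indices (k, i, j)."""
    i, j = (k + 1) % 3, (k + 2) % 3
    return add(scale(p[j], ALPHA[i]), scale(-p[i], ALPHA[j]))

# [H, Sigma_k] = i (alpha x p)_k, so d(Sigma)/dt = i [H, Sigma] = -(alpha x p).
spin_error = max(
    max_abs_diff(commutator(H, SPIN[k]), scale(1j, alpha_cross_p(k)))
    for k in range(3))
```

Since the orbital part evolves as $+\boldsymbol{\alpha}\times\mathbf{p}$, the two rates of change cancel in the sum, which is the numerical counterpart of the conservation of J.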

Foldy-Wouthuysen transformation
It is apparent that the velocity operator for a Dirac electron is very different from that in non-relativistic quantum theory. This feature, together with the non-conservation of spin and orbital angular momentum and a desire to understand the non-relativistic limit, motivated the discovery of a unitary transformation that diagonalizes the Dirac Hamiltonian [55]. The required unitary operator is generated by a Hermitian and time-independent operator, S, and has the form $U = e^{iS}$ [51,55]. The transformed Hamiltonian is
\[ H' = UHU^\dagger = \beta\sqrt{m^2 + p^2}. \]
The operator β has eigenvalues ±1, with the positive eigenvalue corresponding to the upper two components of ψ and the negative to the lower two. Clearly, the transformation has succeeded in separating the positive energy states from those with negative energy.
If we consider only low energies in the non-relativistic regime then the transformed Hamiltonian, on making a binomial expansion, reduces to
\[ H' \approx \beta\left(m + \frac{p^2}{2m}\right). \]
If we further restrict our attention only to the two-component positive energy part of the state and shift our zero of energy, so as to remove the constant m term, then we recover the familiar Schrödinger equation for a free particle. The operators representing the mechanical and other properties of the electron are also transformed by the Foldy-Wouthuysen transformation. In particular, the position operator x is transformed to the form given in [55]. The momentum operator commutes with S and so is unchanged by the transformation. The orbital and spin angular momenta are also changed by the Foldy-Wouthuysen transformation, with the orbital angular momentum operator becoming $\mathbf{x}'\times\mathbf{p}$, where $\mathbf{x}'$ denotes the transformed position operator. The total angular momentum commutes with S and so is unchanged by the transformation. We can readily use this fact to obtain the transformed spin angular momentum operator. It is important to realize that using the transformed Hamiltonian, states and observables is entirely equivalent to working with the untransformed quantities and the original Dirac equation.
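The diagonalization achieved by the Foldy-Wouthuysen transformation can be checked numerically for a single momentum. The sketch below assumes one standard textbook parametrization of the free-particle transformation, $U = \cos\theta + \beta(\boldsymbol{\alpha}\cdot\mathbf{p}/p)\sin\theta$ with $\tan 2\theta = p/m$, which may differ in notation from the form used in the cited references. It works in units with ℏ = c = 1, and the values of p and m are arbitrary test inputs.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def dagger(a):
    """Hermitian conjugate of a square matrix."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def block(tl, tr, bl, br):
    return ([r1 + r2 for r1, r2 in zip(tl, tr)]
            + [r1 + r2 for r1, r2 in zip(bl, br)])

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
ALPHA = [block(Z2, s, s, Z2) for s in PAULI]
BETA = block(I2, Z2, Z2, scale(-1, I2))

p = [0.4, -1.1, 0.6]   # arbitrary test momentum
m = 1.3                # arbitrary test mass
p_abs = math.sqrt(sum(c * c for c in p))
energy = math.sqrt(p_abs ** 2 + m ** 2)

alpha_dot_p = scale(0, I4)
for i in range(3):
    alpha_dot_p = add(alpha_dot_p, scale(p[i], ALPHA[i]))
H = add(alpha_dot_p, scale(m, BETA))

# Assumed parametrization: U = cos(theta) + beta (alpha . p / |p|) sin(theta).
theta = 0.5 * math.atan2(p_abs, m)
generator = matmul(BETA, scale(1.0 / p_abs, alpha_dot_p))
U = add(scale(math.cos(theta), I4), scale(math.sin(theta), generator))

H_prime = matmul(matmul(U, H), dagger(U))
diag_error = max_abs_diff(H_prime, scale(energy, BETA))
unitary_error = max_abs_diff(matmul(U, dagger(U)), I4)
```

To rounding accuracy, `H_prime` equals β multiplied by $\sqrt{p^2 + m^2}$, so the upper two components carry positive energy and the lower two negative energy, as stated in the text.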

Mean properties
In the non-relativistic limit the position and momentum operators are x and p, as they are in the Dirac theory. With this non-relativistic limit in mind, it is reasonable to ask what role these operators play in the transformed system. We note that in the transformed picture, with transformed Hamiltonian $H' = \beta E_p$ and $E_p = \sqrt{p^2 + m^2}$, the original position operator satisfies the equation of motion
\[ \dot{\mathbf{x}} = \frac{i}{\hbar}[H', \mathbf{x}] = \beta\,\frac{\mathbf{p}}{E_p}. \]
For a positive energy state this gives the velocity operator we might have expected: $\mathbf{p}/E_p$. For a negative energy state the velocity is the same apart from an overall minus sign. The Zitterbewegung, or rapid oscillation, associated with an instantaneous velocity equal to that of light, has disappeared. For this reason, Foldy and Wouthuysen refer to the operator x in their transformed system as the mean-position operator. Working with this mean-position operator, x, and the transformed Dirac equation (30) gives the familiar Schrödinger evolution in the non-relativistic limit. The other original operators L and Σ are similarly the mean-orbital and mean-spin angular momenta. It is important to note that it is these mean quantities that correspond to those familiar from non-relativistic quantum theory [55].
The mean quantities can also be used with the original Dirac equation by simply inverting the Foldy-Wouthuysen transformation. This means, in particular, that the mean position, orbital and spin angular momentum operators appropriate for use with the Dirac equation are, respectively, or, for the original Dirac Hamiltonian

Properties of the optical Dirac equation
It is natural to ask if the Dirac equation can be applied to particles with spins other than the 1/2 associated with the electron [58] and an early application was to the (bosonic) mesons [59]. Our interest, of course, is the representation of electromagnetic waves in Dirac form and to this end we rewrite our optical Dirac equation (9) in the form
\[ i\hbar\frac{\partial\psi}{\partial t} = \boldsymbol{\alpha}\cdot\mathbf{p}\,\psi, \]
where we associate the 'momentum' operator p with $-i\hbar\nabla$ and α is a vector the components of which are the 6 × 6 matrices
\[ \boldsymbol{\alpha} = \begin{pmatrix} 0 & \boldsymbol{\tau} \\ \boldsymbol{\tau} & 0 \end{pmatrix}. \]
To these three matrices we add the 6 × 6 form of Diracʼs β-matrix
\[ \beta = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}. \]
We have deliberately written these matrices using the same notation as that for the electron Dirac equation, so as to emphasize the similarities between them. The differences are apparent in the fact that the matrices for the optical Dirac equation are 6 × 6, rather than the 4 × 4 for the electron, and also that the commutation relations among the matrices are rather different [47,59-65]. These are summarized in appendix A. The optical Dirac equation leads to the same conservation law as for its electronic counterpart (15),
\[ \frac{\partial}{\partial t}\left(\psi^\dagger\psi\right) + \nabla\cdot\left(\psi^\dagger\boldsymbol{\alpha}\psi\right) = 0, \qquad (41) \]
but the relevant quantities are now the energy density, $\psi^\dagger\psi = \frac{1}{2}(E^2 + B^2)$, and the Poynting vector, $\psi^\dagger\boldsymbol{\alpha}\psi = \mathbf{E}\times\mathbf{B}$. To these we can add the density of the optical angular momentum, which we write as
\[ \mathbf{j} = \mathbf{r}\times(\mathbf{E}\times\mathbf{B}). \qquad (43) \]
We trust that no confusion will arise if we stick with established convention in which both the optical angular-momentum density and the electron probability current are represented by j.
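The identification of the energy density and Poynting vector with bilinear forms in the spinor can be illustrated with a short numerical check. The sketch below is written under explicit assumptions that are not confirmed by the surviving text: the spinor is taken to be $\psi = (\mathbf{E}, i\mathbf{B})^T/\sqrt{2}$, the α matrices carry the spin-1 matrices $(\tau_i)_{jk} = -i\epsilon_{ijk}$ in their off-diagonal blocks, and natural units are used. The field values are arbitrary test numbers.

```python
def levi_civita(i, j, k):
    return (i - j) * (j - k) * (k - i) // 2

# Assumed spin-1 matrices: (tau_i)_{jk} = -i * epsilon_{ijk}.
TAU = [[[-1j * levi_civita(i, j, k) for k in range(3)]
        for j in range(3)] for i in range(3)]
Z3 = [[0] * 3 for _ in range(3)]

def block(tl, tr, bl, br):
    return ([r1 + r2 for r1, r2 in zip(tl, tr)]
            + [r1 + r2 for r1, r2 in zip(bl, br)])

# Assumed 6x6 optical alpha matrices with tau in the off-diagonal blocks.
ALPHA6 = [block(Z3, t, t, Z3) for t in TAU]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def expectation(mat, vec):
    """Bilinear form vec^dagger mat vec."""
    n = len(vec)
    return sum(vec[a].conjugate() * mat[a][b] * vec[b]
               for a in range(n) for b in range(n))

E = [0.2, -0.7, 1.1]   # arbitrary real electric field at one point
B = [0.9, 0.4, -0.3]   # arbitrary real magnetic field at the same point

# Assumed spinor: psi = (E, iB) / sqrt(2).
root2 = 2 ** 0.5
psi = [complex(e) / root2 for e in E] + [1j * b / root2 for b in B]

energy_density = sum(abs(c) ** 2 for c in psi)
poynting = [expectation(a, psi) for a in ALPHA6]

expected_energy = 0.5 * (sum(e * e for e in E) + sum(b * b for b in B))
expected_flux = cross(E, B)
energy_error = abs(energy_density - expected_energy)
flux_error = max(abs(s - f) for s, f in zip(poynting, expected_flux))
```

Under these assumptions $\psi^\dagger\psi$ reproduces $\frac{1}{2}(E^2 + B^2)$ and $\psi^\dagger\boldsymbol{\alpha}\psi$ reproduces $\mathbf{E}\times\mathbf{B}$; a different but equivalent choice of spinor would require correspondingly modified matrices.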
Each of these mechanical quantities is globally conserved, of course, and we should confirm this fact using the properties of our optical Dirac equation. The global conservation of energy follows directly from the local conservation law (41):
\[ \frac{d}{dt}\int\psi^\dagger\psi\,d^3r = -\int\nabla\cdot\left(\psi^\dagger\boldsymbol{\alpha}\psi\right)d^3r = 0, \]
where we have used Gaussʼs theorem and the physical requirement that the fields tend to zero at infinity.
For the remaining properties we can proceed in the same way, using the Dirac equation for ψ and for ψ†. Alternatively, we can exploit further the analogy between the (classical) optical Dirac equation and the (quantum) Dirac equation for the electron, by introducing the 'Hamiltonian' operator
\[ H = \boldsymbol{\alpha}\cdot\mathbf{p} \]
and using Heisenbergʼs equation of motion. Let us emphasize that this is an entirely equivalent procedure to working with the optical Dirac equation itself and that, despite the appearance of ℏ, the description is essentially classical. The optical momentum, for example, is expressed in terms of the operator α, and the momentum density at position r therefore corresponds to the operator $\boldsymbol{\alpha}\,\delta(\mathbf{r} - \mathbf{x})$, where x is the position operator, as before [66-68]. Both approaches lead to the same result. For the electromagnetic momentum density we find
\[ \frac{\partial}{\partial t}(\mathbf{E}\times\mathbf{B})_i + \frac{\partial}{\partial r_j}T_{ji} = 0, \qquad (46) \]
where we have introduced the summation convention in which a repeated index implies a summation over the three cartesian coordinates and $T_{ji}$ is the familiar momentum flux density [36]
\[ T_{ji} = \frac{1}{2}\delta_{ij}\left(E^2 + B^2\right) - E_iE_j - B_iB_j. \]
In deriving the momentum continuity equation (46) we use the commutation properties of the Kemmer matrices, given in appendix A, and the transverse nature of E and B, which means that $\mathbf{p}\cdot\mathbf{E} = 0 = \mathbf{p}\cdot\mathbf{B}$. If we follow the same procedure for the operator $\mathbf{x}\times\boldsymbol{\alpha}$ then we find the familiar local conservation law for the electromagnetic angular momentum,
\[ \frac{\partial j_i}{\partial t} + \frac{\partial}{\partial r_j}M_{ji} = 0, \]
where j is given by (43) and $M_{ji}$ is the angular-momentum flux density [36,69]
\[ M_{ji} = \epsilon_{ilk}\,r_l\,T_{jk}. \]

Optical Foldy-Wouthuysen transformation
The expressions obtained in the preceding subsection are exact (within the Maxwell theory of the free field) and contain both rapidly and slowly oscillating contributions. For a monochromatic field of angular frequency ω, for example, the densities and flux densities of the energy, momentum and angular momentum include constant terms and terms oscillating at frequency 2ω. In the majority of situations in optics, this frequency is too high to be observed in the interaction with matter and it is appropriate to remove the rapidly oscillating contributions so as to arrive at slowly-varying or mean properties. This process is known by different terms in different parts of optics: we may think of it as averaging over many optical cycles, the slowly-varying amplitude approximation or, within quantum optics, as making the rotating-wave approximation in the interaction with a detector. The situation is reminiscent of, and indeed precisely analogous to, the Zitterbewegung apparent in the physical quantities calculated from the Dirac equation for the electron. It is natural to construct the appropriate cycle-averaged optical properties by adopting the same procedure as for the electron: we apply a Foldy-Wouthuysen transformation to diagonalize the Hamiltonian for our optical Dirac equation and then seek suitable operators from which to obtain mean mechanical properties.
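The separation into constant and rapidly oscillating parts described above can be made concrete with a simple example. The following sketch considers, as an assumption chosen purely for illustration, a monochromatic plane wave observed at a fixed point with equal electric and magnetic field magnitudes $E_0\cos\omega t$ in natural units, so that the energy density is $E_0^2\cos^2\omega t$. It samples one optical period, computes the cycle average and compares the remainder with a term oscillating at 2ω.

```python
import math

E0 = 1.5                       # illustrative field amplitude
omega = 2.0 * math.pi * 3.0    # illustrative angular frequency
N = 64                         # samples per optical period
period = 2.0 * math.pi / omega
times = [n * period / N for n in range(N)]

# Plane wave with |E| = |B| = E0 cos(omega t): w = (E^2 + B^2) / 2 = E^2.
w = [(E0 * math.cos(omega * t)) ** 2 for t in times]

# Cycle average over one full period of uniformly spaced samples.
mean_w = sum(w) / N

# What remains after subtracting the mean oscillates at twice the optical frequency.
residual = [wi - mean_w for wi in w]
expected_residual = [0.5 * E0 ** 2 * math.cos(2.0 * omega * t) for t in times]

mean_error = abs(mean_w - 0.5 * E0 ** 2)
residual_error = max(abs(r - e) for r, e in zip(residual, expected_residual))
```

The cycle average is $E_0^2/2$ and the discarded part is $\frac{1}{2}E_0^2\cos 2\omega t$. This is the optical analogue of the Zitterbewegung term that the Foldy-Wouthuysen procedure is designed to remove.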
The optical Foldy-Wouthuysen transformation we seek is the natural analogue of that for the electron [47], with the required unitary operator having the form $e^{iS}$, where θ is a function, to be determined, of the operator $p = |\mathbf{p}|$. We can expand the unitary operator (50) in a form analogous to that given for the electron transformation in (26), although the form is complicated, somewhat, by the algebra of the Kemmer, as opposed to Dirac, matrices. We find the form where we define the operator These positive and negative frequency parts of the field are those that arise naturally in the theory of optical coherence and are associated, in the quantum theory of light, with the photon annihilation and creation operators respectively [70-72]. The transformed spinor naturally separates these components, with positive frequency electric components in the top three entries and negative frequency components in the lower three. It is useful to note that the action of α × p p on our transformed spinor gives the minus complex conjugate and similarly α ψ We emphasize that the combination of this transformed spinor and the transformed Hamiltonian, H′, is exact and fully equivalent to our untransformed quantities.
We conclude this subsection by noting that the paraxial approximation, a mainstay of laser physics [73,74], is readily derived from our transformed optical Dirac equation. It follows from (53) and the transverse nature of the fields that our transformed Dirac equation has the form There is no mass term, as in the corresponding equation for the electron (30)  so that cartesian components of the positive and negative frequency parts of both the electric and magnetic fields obey, in this approximation, a suitable paraxial wave equation [73,74].

Mean optical properties
We can use our transformed spinor, ψ′, to arrive at mean mechanical properties by following the procedure Foldy and Wouthuysen applied to the electron Dirac equation [55]. We transform the operators corresponding to the mechanical properties of interest and drop terms that are of odd order in the α, as these introduce cross terms between the positive and negative frequency parts of the spinor. This serves to remove the rapidly oscillating contributions in the same way as the corresponding procedure for the electron removes the Zitterbewegung.
Let us begin by considering the energy density. We have transformed the spinor and our form for the energy density must be unchanged if we also transform the operator the expectation value of which is the energy density. It is straightforward to see that the operator for the energy density at position r must be δ − r x ( ), where x is the position operator, as 3 † Naturally, we obtain exactly the same result if we work with our transformed spinor ψ′ and the transformed energy density operator where we have used the fact that both ψ′ and ψ′ † contain only transverse fields so that  ψ′ = 0 2 and  ψ′ = 0 † 2 . We obtain the mean or cycle-averaged energy density by keeping only those terms that are of even order in α which is the required average. In reaching this result we have used the anti-commutation relation (A.1) and the complex-conjugation properties (58) and (59). This simple expression for the cycle-averaged energy density illustrates the benefit of working with the transformed spinor.
We can continue in the same way, by keeping terms of only even order in α, to find the averaged forms of our other mechanical properties. For the momentum density we find which is the required cycle-averaged form of Poyntingʼs vector. In reaching this expression, we have made use of the readily proved identity The averaged momentum density (66), although evidently correct, is not in its most useful form. We can obtain a form closer to that used in optics [75] by evaluating the operator product α(α × p) using (A.9). After a little work we find and so write this in the simpler form We emphasize that here ω is an operator and that no restriction to monochromatic or near-monochromatic fields is necessary. The action of the operator 1/ω simply divides each temporal Fourier component of the field by its associated angular frequency. The second line in (70) is a divergence and therefore does not contribute to the total momentum. It should be noted that this operator introduces some spatial averaging in that, for example, is not simply related to B at position r alone. The density of a mechanical property is only defined up to such a divergence and so we are at liberty to neglect this term. If this statement needs justification, we recall that conserved quantities, like momentum, may be obtained from symmetries by means of Noetherʼs theorem and that the resulting local conservation law only defines the density up to a divergence [76-79] (see footnote 2). We note, however, that retaining such divergence terms allows us to express the momentum density in a variety of suggestive forms [80], although, because of the non-uniqueness described above, it is difficult to assign to these any particular physical significance. If we drop the divergence term then we are left with the simple form which is reminiscent of the momentum density in non-relativistic quantum theory [81], although the presence of the frequency operator, ω, is an important difference.

Optical angular momentum
Our task, in this section, is to determine how the optical angular momentum, the orbital and spin components and the helicity arise from our optical Dirac equation. There are three reasons for undertaking this task. Firstly, of course, our discussion of the mechanical effects of light would not be complete without including angular momentum. Secondly, angular momentum played an important role in the development of Diracʼs theory of the electron, in particular in determining the necessity of the existence of electron spin and its intimate relationship to the more familiar orbital angular momentum [48]. Finally, optical angular momentum is currently a vibrant and active field of research [82-85] and we may hope that an enhanced understanding of optical angular momentum might be helpful in the further development of the field.
Footnote 2: It is beholden on me to alert the reader to two significant errors in the paper by Bliokh et al [79]. Firstly, it is stated in their equation (3.36) that the total spin and orbital angular momenta obtained from the conventional and dual-symmetric Lagrangians are different. They are not, as shown explicitly in [38]. Secondly, in their note added, the authors, referring to the conventional and dual-symmetric Lagrangians, state that the choice makes a difference and has important physical consequences. It does not; both lead to the same Maxwell equations and are, therefore, physically indistinguishable. They have the same symmetries and conserved quantities, as demonstrated in the appendix of [78].

Spin and orbital components of the angular momentum
The problem of separating optical angular momentum into spin and orbital parts has an interesting history. It has been suggested, on theoretical grounds, that such a separation is not physically meaningful [86][87][88][89][90][91], and such a conclusion is, at first sight, natural as these quantities are not separately conserved for the Dirac electron [48]. This position is somewhat at odds, however, with experimental work which suggested, at least in some circumstances, that precisely such a separation is meaningful [92,93]. The resolution of this apparent contradiction is that the separation of the total angular momentum into spin and orbital parts is indeed a physical one, but that neither the spin nor the orbital components is, by itself, an angular momentum [94,95]. Both the spin and orbital parts are the generators of rotations that are constrained by the requirement that the rotated fields must be transverse [38,39].
We have seen that the total angular momentum density is expressed, in the formalism of the optical Dirac equation in terms of the alpha matrices in the form Hence with the aid of elementary vector identities we find ⎛ 2 We can complete our calculation of the total angular momentum density by acting with α α × r p ( ) · to the, left on ψ † . The result is It is natural to identify these with the orbital, l, and spin, s parts of the angular momentum density and if we write these in terms of the fields and the potentials we find Σ ψ ϕ ψ ϕ where we have made use of (A.5). It follows that we can derive locally conserved quantities using ψ (1) in place of ψ or, indeed, in combination with it. We present just one example where Z 000 is one of the components of Lipkinʼs Zilch [102].
We can extend the family of analogous conserved quantities indefinitely [40,44,78]  are also solutions of the optical Dirac equation. The repeated action of Σ  p · 2 generates successive curl operations on the fields and thereby produces locally conserved quantities that depend on derivatives of the fields of ever higher order.
It is also possible to extend the set of conserved quantities in the other direction, with successive 'inverse-curl' operations to produce the infra-zilches [44,78].

Conclusion
The formal connection between Maxwellʼs equations and the Dirac equation was noticed very soon after the latter was first derived and there is an extensive literature on the topic, much of it aimed at formulating a wavefunction for the photon [16,21]. Our aim in this study has not been to address the quantum properties of light but rather to see what can be learnt about the classical electromagnetic field from the Dirac theory of the electron. We have seen that many of the properties of light emerge naturally from the optical Dirac equation, including its principal mechanical properties and the associated conservation laws.
where $\{\boldsymbol{\tau}\cdot\mathbf{a}, \boldsymbol{\tau}\cdot\mathbf{b}\}$ is a symmetric three by three matrix with elements We can combine this with the commutation relation (A.2) to write the product of two τ matrices in the form It is also useful to note a simple expression for products of three matrices [59]:
\[ \alpha_i\alpha_j\alpha_k + \alpha_k\alpha_j\alpha_i = \delta_{ij}\alpha_k + \delta_{jk}\alpha_i. \]
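The triple-product relation for the Kemmer-type matrices can be verified by brute force over all index combinations. The sketch below does so in plain Python under two assumptions that are not confirmed by the surviving text: the spin-1 matrices are taken in the Cartesian form $(\tau_i)_{jk} = -i\epsilon_{ijk}$, and the 6 × 6 matrices α carry τ in their off-diagonal blocks. The helper names are illustrative.

```python
def levi_civita(i, j, k):
    return (i - j) * (j - k) * (k - i) // 2

# Assumed spin-1 matrices: (tau_i)_{jk} = -i * epsilon_{ijk}.
TAU = [[[-1j * levi_civita(i, j, k) for k in range(3)]
        for j in range(3)] for i in range(3)]
Z3 = [[0] * 3 for _ in range(3)]

def block(tl, tr, bl, br):
    return ([r1 + r2 for r1, r2 in zip(tl, tr)]
            + [r1 + r2 for r1, r2 in zip(bl, br)])

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Assumed 6x6 optical alpha matrices.
ALPHA6 = [block(Z3, t, t, Z3) for t in TAU]

def triple(i, j, k):
    return matmul(matmul(ALPHA6[i], ALPHA6[j]), ALPHA6[k])

def delta(i, j):
    return 1 if i == j else 0

# alpha_i alpha_j alpha_k + alpha_k alpha_j alpha_i
#     = delta_ij alpha_k + delta_jk alpha_i, checked for all 27 index triples.
worst = 0.0
for i in range(3):
    for j in range(3):
        for k in range(3):
            lhs = add(triple(i, j, k), triple(k, j, i))
            rhs = add(scale(delta(i, j), ALPHA6[k]),
                      scale(delta(j, k), ALPHA6[i]))
            worst = max(worst, max_abs_diff(lhs, rhs))
```

A vanishing value of `worst` confirms the identity for this representation. Setting i = j = k shows in particular that each matrix satisfies $\alpha_i^3 = \alpha_i$, in contrast with the Dirac matrices, whose squares are the unit matrix.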