Modeling "quantum" field theory through "classical" cellular automaton

The purpose of this paper is to illustrate how a cellular automaton can be used to arrive at a discrete but "classical" model from which "quantum" field theory emerges.


Introduction
It is commonly assumed that one of the main problems in the interpretation of quantum mechanics is the incompatibility of wavefunction collapse with relativity. This, however, is not entirely true. One can claim that our spacetime is fundamentally non-relativistic, but that there is a "rule" that the "allowed" Lagrangians are the ones that can be written in relativistic form. Such Lagrangians would still produce relativistic phenomenology; at the same time, the superluminal signals responsible for wavefunction collapse would not be subject to said Lagrangians, which would allow them to propagate faster than light. Thus, there would be a preferred frame, which allows us to avoid the grandfather paradox. It is true that non-locality is counter-intuitive regardless of the presence or absence of relativity. But such non-locality can be avoided by realizing that measurement takes a finite amount of time, δt; thus, if the size of the universe (L) is also finite, then superluminal signals moving faster than L/δt would account for entanglement, while their speed remains finite. Finally, the collapse itself can be described by means of either Bohmian or GRW models, modified to account for the "finite" speed of the above signals.
There is, however, another issue that is problematic: what is the physical meaning of ψ, regardless of the presence or absence of its collapse? If there were no interference, we could have said that ψ refers to a probability. Since there is interference, we would like to instead say that it is analogous to the classical electromagnetic field (which is also subject to interference). What stops us from sticking with the second option is the fact that ψ is evaluated over Fock space, while the classical electromagnetic field lives in R^3. Apart from that, ψ taken over Fock space also forces us to accept a strictly infinite speed of superluminal signals, while in the single-particle case it could have been "very large but finite" per the earlier argument. While some might assume that in the QFT case locality is restored, this is not true either: despite the fact that propagators between spacelike-separated points are zero, the "definitions" of the in- and out-states that we are ultimately computing are still non-local.
The ultimate solution to these issues is to "encode" functions over Fock space in terms of functions over R^3. This is possible in principle: one can envision a computer, built in an R^3-based universe, which is programmed to model quantum field theory. We can get around the seeming obstacle that Fock space is "larger" than R^3 through discretization. In the context of QFT, each point in space has its own set of degrees of freedom. Thus, the finer the discretization scale, the more degrees of freedom we have. Therefore, we can trade these extra degrees of freedom for the ones needed for creation/annihilation of particles. In other words, a Fock space with a "rougher" space discretization can be modeled through wave functions living in a "finer-discretized" R^3. Such "wave functions", however, are not well behaved, since we would like every single new point in space to add a non-trivial piece of information that is to be utilized. In what follows we will describe how the encoding is done.

Quantum states as ψ(φ)-functionals
We recall that spin-0 QFT is simply the "quantum mechanics" of an infinite-dimensional harmonic oscillator. Furthermore, the n-dimensional harmonic oscillator establishes a one-to-one correspondence between the "Fock space" over n points and a wave function over R^n. Therefore, the Fock space of QFT should correspond to a wave function over R^∞, where ∞ is the number of points in R^3. The role of x_k in the harmonic oscillator is taken up by φ(x) in spin-0 QFT (the integer index k is replaced by the "label" x, while the "variable" x is replaced by φ). Thus, ψ(x, t) is replaced with ψ(φ, t), and R^∞ describes the set of all possible φ : R^3 → R. A "point" in R^∞ is one specific φ : R^3 → R, while its x_0-th coordinate is the value φ(x_0). One should note that the domain of φ is R^3 rather than R^4; in other words, we assume the existence of a "preferred frame".
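The oscillator correspondence above can be checked numerically. The following sketch (the function names and grid parameters are ours, purely illustrative) builds the wave function over R^k corresponding to a Fock state |n_1, ..., n_k⟩ of a k-point lattice as a product of 1D harmonic oscillator eigenfunctions, and verifies that the modes are orthonormal, so the correspondence is indeed one to one:

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

def osc_eigenfunction(n, phi):
    """n-th eigenfunction of a unit-frequency 1D harmonic oscillator."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    norm = 1.0 / sqrt(2.0**n * factorial(n) * sqrt(pi))
    return norm * hermval(phi, coeffs) * np.exp(-phi**2 / 2)

def fock_wavefunction(occupations, phis):
    """Wave function over R^k for the Fock state |n_1, ..., n_k> of a
    k-point lattice: a product of 1D eigenfunctions, one per point."""
    out = 1.0
    for n_i, phi_i in zip(occupations, phis):
        out = out * osc_eigenfunction(n_i, phi_i)
    return out

# orthonormality of the 1D modes on a fine grid
phi = np.linspace(-10, 10, 4001)
d = phi[1] - phi[0]
for m in range(3):
    for n in range(3):
        overlap = np.sum(osc_eigenfunction(m, phi) * osc_eigenfunction(n, phi)) * d
        assert abs(overlap - (1.0 if m == n else 0.0)) < 1e-6
```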
The "space derivative" terms imply a coupling between the x-th coordinate and the (x + δx)-th. It is easy to see that coupled harmonic oscillators can be represented as decoupled ones in a "rotated" reference frame; here, we rotate the {φ(x_1), φ(x_2), · · · }-frame into the {φ(p_1), φ(p_2), · · · }-frame. Of course, this rotation corresponds to a Fourier transform. We will discretize our theory and assume that the only "allowed" momenta are (p_1, −p_1, · · · , p_n, −p_n). Thus, for each given k we have a two-dimensional harmonic oscillator with coordinates (p_k, −p_k). The resulting ψ(φ) will simply be a product of n such 2D harmonic oscillators. The general state (Eq 1) is then a superposition of such products which, through Fourier series, corresponds to a functional of φ (Eq 2).
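The claim that the "space derivative" coupling is undone by the rotation to momentum space can be verified directly: the coupling matrix of a periodic chain of oscillators is circulant, and the discrete Fourier transform diagonalizes it. A minimal check (lattice size and parameter values are illustrative):

```python
import numpy as np

n, m2, dx = 16, 1.0, 1.0
# potential matrix of a periodic chain with nearest-neighbour coupling:
# V = (1/2) sum_j [ m2 phi_j^2 + ((phi_{j+1} - phi_j)/dx)^2 ]
K = np.zeros((n, n))
for j in range(n):
    K[j, j] = m2 + 2.0 / dx**2
    K[j, (j + 1) % n] -= 1.0 / dx**2
    K[j, (j - 1) % n] -= 1.0 / dx**2

F = np.fft.fft(np.eye(n)) / np.sqrt(n)   # unitary DFT: the "rotation"
D = F @ K @ F.conj().T                   # K in the {phi(p_k)} frame

assert np.max(np.abs(D - np.diag(np.diag(D)))) < 1e-10   # decoupled
# the decoupled frequencies follow the lattice dispersion relation
expected = m2 + (2.0 / dx**2) * (1 - np.cos(2 * np.pi * np.arange(n) / n))
assert np.allclose(np.sort(np.real(np.diag(D))), np.sort(expected))
```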

Encoding quantum states in terms of functions of x
The definition of quantum states above is "non-classical"; after all, "classical" fields (such as the electromagnetic field) are functions over a set of points rather than a set of functions. Therefore, our next challenge is to replace ψ(φ, t) with functions of (x, t). Thus, we replace the functional ψ(φ) with functions q(x), φ(x) and ψ(x). This restores classical realism, since we now have functions of points (similar to the classical electromagnetic field) as opposed to functions of functions (corresponding to quantum probability amplitudes). None of these three functions is "well behaved" (although they "jump" between two different "well-behaved" trajectories). In some sense, a "not well behaved" function is "more complex" than a "well behaved" one (in the case of a "well behaved" function, most of the information about its value at a point can be inferred from its values at neighboring points, which means that the former carries less nontrivial information). This "extra complexity" is what allows us to use a single "not well behaved" function in order to "encode" the probability amplitudes of two different "well behaved" ones at the same time; namely, ψ[sin x] and ψ[cos x]. This, however, comes with a price: each of these two functions is represented with the discretization interval 2δx instead of δx. This price, however, is not a big loss if the functions we are trying to "encode" (which in this case are sin x and cos x) are well behaved. If, on the other hand, we try to encode functions that are not well behaved, the information loss due to δx → 2δx becomes more important. However, we can still "not care" about it if we assume δx is smaller than the Planck scale.
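A minimal numerical sketch of this two-function encoding (grid size and tolerances are ours): sin x is stored on the even sublattice and cos x on the odd one, producing a single "not well behaved" field from which each function is recoverable at the coarser spacing 2δx:

```python
import numpy as np

# sin x on the even sublattice (q = 1), cos x on the odd one (q = 2)
N = 200
dx = 2 * np.pi / N
x = np.arange(N) * dx
q = np.where(np.arange(N) % 2 == 0, 1, 2)        # sublattice label
phi = np.where(q == 1, np.sin(x), np.cos(x))     # one "not well behaved" field

# each encoded function survives on its sublattice, at spacing 2*dx
sin_sites, sin_vals = x[q == 1], phi[q == 1]
assert np.allclose(sin_vals, np.sin(sin_sites))

# the halved resolution still reproduces the well-behaved function accurately
sin_interp = np.interp(x, sin_sites, sin_vals)
interior = x <= sin_sites[-1]                    # avoid edge extrapolation
assert np.max(np.abs(sin_interp[interior] - np.sin(x[interior]))) < 2e-3
```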
We can now relax the conditions on q and rewrite the above expression with q : S → {1, 2} being some unknown function over a discrete set S ⊂ R. In order to get a "sample" of both sin x and cos x over any given small region, we need to assume that q(x) takes both values 1 and 2 within any such region: in other words, q(x) is not well behaved. Let us now generalize the above to the situation where we want to define ψ(φ^(1)), · · · , ψ(φ^(M)), while replacing M = 2 with M ≫ 2, x ∈ R with x ∈ R^3, and stationary ψ(φ) with evolving ψ(φ, t). The condition on q(x) generalizes accordingly, and φ and ψ are defined in terms of q. We can now use Eq 8 in order to "convert" Eq 2 into a "local" framework. The result is a "realistic" definition of the general Fock space state described in Eq 1. The domain of ψ is now {x_1, · · · , x_N}, which is a subset of R^3, as desired! Now, in order for the above to approximate the functional ψ(φ), we need to make sure that {φ^(1), · · · , φ^(M)} sprinkles the range of φ densely enough. We accomplish this in the following way. First, we impose a constraint φ_min ≤ φ(x) ≤ φ_max, where φ_min and φ_max are chosen in such a way that the "tails" of ψ(φ) that are being "cut off" are "very small". After that, we discretize the set of trajectories by imposing the restriction φ ∈ {φ^(1), · · · , φ^(M)}. We also assume that the spacelike volume of the universe, V, is finite, and that the set of conditions in Eq 10 holds, where δv ≪ V is much smaller than the Planck volume, δφ ≪ φ_max − φ_min is the smallest detectable variation in φ, and N ≈ V/δv ≫ M is the total number of lattice points. This inequality statistically guarantees that any conceivable φ that obeys the specified upper and lower bounds can, in fact, be approximated by some sample of the M "choices" of φ.
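The density requirement on q(x) can be illustrated on a small 3D lattice. In the sketch below we use a deterministic interleaving pattern (purely for reproducibility; in the model q(x) is an arbitrary not-well-behaved function, and a random q satisfies the requirement only statistically) and verify that every small region samples all M sublattices:

```python
import numpy as np

M = 4            # number of allowed field configurations phi^(1..M)
n = 24           # lattice of n^3 points; "regions" are 4x4x4 blocks
i, j, k = np.indices((n, n, n))
# a toy sublattice label that changes from point to point and provably
# covers all M values within every region
q = ((i + j + k) % M) + 1

b = 4
for a in range(0, n, b):
    for c in range(0, n, b):
        for d in range(0, n, b):
            block = q[a:a+b, c:c+b, d:d+b]
            # every region samples every sublattice, as Eq 10 demands
            assert set(np.unique(block)) == set(range(1, M + 1))
```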
By referring back to the analogy of the harmonic oscillator, we see that the expectation value of φ grows when we have a larger number of particles. Thus, imposing boundaries on φ amounts to making the probability amplitude of "very large" particle numbers "even smaller" than it would have been otherwise. Such a probability amplitude will still be non-zero: after all, even if "most" of the support of ψ(φ) is outside the (φ_min, φ_max) range, the latter is still a subset of the former. On the other hand, if we take a few-particle state, ψ(φ) would still have very small "tails" outside (φ_min, φ_max), so it would still be affected. Nevertheless, it is still correct to say that the probability amplitudes of "very large" particle numbers get lowered a lot more than those of few-particle states. We can adjust φ_min and φ_max to make sure that the only states whose probability amplitudes are significantly affected are the ones that are never reached, such as having enough particles to fill the entire universe with solid matter.
Let us now go back and reflect on what our encoding has accomplished. It is easy to see that Eq 8 can be satisfied if and only if ψ(x_i, t) = ψ(x_j, t) whenever q(x_i) = q(x_j) (Eq 11). Since both ψ(x_i, t) and ψ(x_j, t) evolve in time, the above correlation can only hold in one preferred frame. Such a preferred frame is enforced by superluminal signals. We assume that superluminal signals move with finite speed, but fast enough to cross our universe within a very small time (the finite size of the universe follows from the fact that the number of points, N, is finite). Thus, the above correlation is approximate, but its error is too small to be detected. Intuitively, we have a hologram. A lattice, consisting of N points, is divided into M different sublattices: the q-th sublattice is {k | q(x_k) = q}. If we were to make the assumption ψ(x_k, t) = δ_{q(x_k), q(t)} (to be abandoned shortly), we would be "looking" at sublattice number q(t) ∈ N while "ignoring" all other sublattices. Changing the sublattice we are "looking" at will result in a perceived change of picture while, in reality, the pictures drawn on each sublattice are the same. This is in direct analogy to a hologram. Now, if we abandon ψ(x_k, t) = δ_{q(x_k), q(t)} and replace it with the "less restrictive" condition of Eq 11, we will "quantize" the hologram in the sense that we can be "looking" at several different "sub-pictures" at the same time, similarly to a particle going through both slits "at the same time". For any given k, the value of ψ(x_k) is the "probability amplitude" that we are looking at the entire sublattice in which point number k happens to reside. Eq 11 has to hold simply because the questions "are we looking at the sublattice in which point i resides?" and "are we looking at the sublattice in which point j resides?" are equivalent if i and j reside in the same sublattice; thus, the "probability amplitudes" assigned to the corresponding answers "yes" should be the same.
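The "quantized hologram" picture can be made concrete in a few lines (lattice size and amplitudes are illustrative): ψ(x_k) is constant on each sublattice, as Eq 11 demands, while the Kronecker-delta choice singles out exactly one sublattice:

```python
import numpy as np

M, N = 3, 12                                   # M sublattices over N points
q = np.array([k % M for k in range(N)]) + 1    # sublattice label of each point

# Eq 11: psi(x_k) depends on x_k only through its sublattice label
amp = {1: 0.5 + 0.1j, 2: -0.3j, 3: 0.2 + 0j}   # illustrative amplitudes
psi = np.array([amp[qk] for qk in q])

# points i and j in the same sublattice carry equal "probability amplitudes"
for i in range(N):
    for j in range(N):
        if q[i] == q[j]:
            assert psi[i] == psi[j]

# the un-quantized hologram: looking only at sublattice q(t) = 2
look = np.array([1.0 if qk == 2 else 0.0 for qk in q])  # delta_{q(x_k), q(t)}
assert look.sum() == N / M                     # exactly one sublattice lit up
```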
Now, from Section 2, we know that any "fixed" particle state in Fock space corresponds to "several sublattices looked at at the same time" as well. But if the "linear combination" of "sublattices" we are considering is different from the ones prescribed by any given state in Fock space, we obtain a configuration that does not correspond to any Fock space state. Finally, similarly to what we explained about δx and 2δx in the example of sin x and cos x, the roles of δx and 2δx are now played by δv/M and δv. However, this information loss won't be a problem if we assume that δv is below the Planck scale.

Desired "nonlocal" dynamics of ψ(x_k, t)

We recall that the Fock space and path integral formulations of QFT are equivalent. Therefore, after having convinced ourselves that {q(x_k), ψ(x_k), φ(x_k)} describes particle states, we can make them evolve per Hamiltonian evolution if and only if we can design ψ(x_k, t) in such a way that it reproduces the path integral. In light of the fact that we only have M allowed φ : R^3 → R, any trajectory is bound to look like a step function in time. This implies a discretization of time. For simplicity, let us for now assume continuous space and discrete time. We postulate the process of Eq 12, where the Lagrangian density L is given by

L(z, aδt; φ) = ∫ d³x d³y K(x, y, z, φ(x, (a − 1)δt), φ(y, (a − 1)δt), φ(z, aδt)).

For this reason we will refer to K as the Lagrangian generator. It turns out that there is a direct way of "reading off" the Lagrangian generator from the Lagrangian. In particular, a Lagrangian can be produced (up to O(δt α^{−1/2})) from the Lagrangian generator given in Eqs 17–19, where α is very large. We have thus established that Eq 12 is our version of the path integral. But, of course, that equation involves ψ(φ), which we "don't like". Therefore, we will use the previous section in "replacing" ψ(φ) with ψ(x_k), which we "like". We thus arrive at Eq 20. The last equation refers only to quantities living in R^3 and, at the same time, it reproduces the mathematical information of QFT up to the desired approximation.
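The discrete-time sum over field configurations can be sketched as a transfer matrix acting on a state vector over the M allowed configurations (the toy "Lagrangian" below is a random matrix, purely illustrative; no claim is made about the actual generator-built K):

```python
import numpy as np

rng = np.random.default_rng(1)
M, dt = 5, 0.1
# a toy "Lagrangian" between the M allowed field configurations
L = rng.normal(size=(M, M))
T = np.exp(1j * L * dt)               # one-step weights e^{i L dt}

psi0 = np.zeros(M, dtype=complex)
psi0[0] = 1.0                         # start in configuration phi^(1)

# two time steps of the discrete path integral: summing e^{i(L1+L2)dt}
# over the intermediate configuration is repeated matrix application
psi2 = T @ (T @ psi0)
direct = np.array([sum(T[qf, qm] * T[qm, 0] for qm in range(M))
                   for qf in range(M)])
assert np.allclose(psi2, direct)
```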
"Local" model of emergence of "nonlocal" properties of ψ(x_k, t)

So far we have solved part of the problem: Eq 20 has a "classical ontology" in the sense that it lives in R^3. The above equation, however, is nonlocal. So the next part of the problem is to come up with a "local" mechanism for its emergence. In order to do that, we will picture a cellular automaton. Each lattice point can perform only a very simple calculation and pass the result of that simple step to other lattice points, which perform other steps (the simplicity of the work of each given lattice point is due to the very small number of "parameters" it can operate with; after all, we would like to view the latter as discretized, non-well-behaved, "classical fields", and the number of fields in "classical physics" is small). The eventual outcome will be the emergence of Eq 20. The process is "local" in the sense that a point i can only "pass" information to point j if i ↔ j (in other words, i and j are connected by the same edge of the lattice), and it can do so within a time period δt or greater. However, it is superluminal; in other words, the "finite" amount of time for the above calculation over the entire lattice (which far exceeds N δt) is still very small. Let us now go ahead and describe the algorithm of "calculation". First, lattice points "evaluate" the sum inside the exponent, "store" it in their "memory", and after that they "exponentiate" it. Thus, we will introduce an additional parameter, q′(x_p), in addition to q(x_p). We then introduce "dynamical" parameters L(x_p, t) and S(x_p, t). Our goal is to define their dynamics in such a way that, at the equilibrium, they take the desired values (Eqs 21 and 22), where it is understood that K is "very small" unless x_i, x_j and x_k are "very close" (due to the e^{−αx²} coefficients in Eqs 17, 18 and 19); thus, |x_k − x_p| < ε implies that non-negligible contributions occur when x_i, x_j and x_k are all close to x_p.
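The "evaluate the sum inside the exponent locally, then exponentiate" step can be sketched on a one-dimensional ring (values are illustrative): a signal walks the lattice by nearest-neighbour moves, each point adding its local piece of the action, after which the accumulated sum is exponentiated:

```python
import cmath
import numpy as np

rng = np.random.default_rng(2)
N = 8
s = rng.normal(size=N)          # each point's local piece of the action

# a "signal" walks the ring; each point adds its own piece to the running
# sum it carries, so only nearest-neighbour information transfer is used,
# at the cost of N sequential steps
carried = 0.0
for p in range(N):
    carried += s[p]             # point p updates the passing signal
S = carried                     # the full sum, assembled by local moves only

amplitude = cmath.exp(1j * S)   # each point can now "exponentiate" its store
assert abs(S - s.sum()) < 1e-12
assert abs(amplitude - cmath.exp(1j * s.sum())) < 1e-12
```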
The q′ tells us that S(x_p) and L(x_p) specifically refer to "transitions" from φ^{(q′(x_p))} to φ^{(q(x_p))}. The counterpart of Eq 11 is Eq 23. In order to expect a large number of lattice points with any given pair (q, q′) within a volume ∆v, we assume

∆v = M δv, which together with Eq 10 gives N ≫ M² ≈ V/∆v.    (24)

Now, we know from "conventional" physics that ψ evolves while L does not. Therefore, we would like ψ(x_p, t) to change while (L(x_p), S(x_p)) does not. But our goal is to make everything "mechanical". Thus, instead of simply postulating the desired values of (L(x_p), S(x_p)), we will introduce a mechanism through which said values are reached. In other words, (L(x_p), S(x_p)) does evolve at first, but eventually the evolution becomes negligibly small once equilibrium is reached. Let's start from a toy example of such evolution (Eq 25), for some given set B = {b_1, · · · , b_p}. It can be shown by induction that the contribution of the initial value decays geometrically. If we assume n − m ≫ p, the first term on the right-hand side goes to 0; the element b_l appears approximately (n − m)/p times in the sum, so L approaches the average of the elements of B. If we "wait long enough", we reach the equilibrium situation, which produces the desired result if we set ε_1 = ε_2. Let us now return to L(x_p). Our first goal is to "generate" Eq 22. This immediately tells us which K-terms have to be summed. Thus, at any given time, point p adds one of these K-s per the prescription in Eq 25. Which of the K-s point p adds is determined by the three memory boxes point p has, in which it stores information about three other points. While, at any given time, point p can remember at most three points, it systematically deletes some points from its memory and replaces them with others. Thus, over time, it ends up adding all possible K-s, just as Eq 25 prescribes.
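The toy relaxation just described can be simulated directly (the values of B, the relaxation step and the step count are illustrative): an exponential moving average over a cyclically sampled set forgets its initial value and equilibrates near the average of B:

```python
import numpy as np

B = np.array([2.0, 5.0, -1.0, 4.0])   # the given set {b_1, ..., b_p}
p = len(B)
eps = 0.01                            # small relaxation step
L = 10.0                              # arbitrary initial value

# each step nudges L toward the current element of B; after n - m >> p
# steps the initial value is forgotten
for n in range(20000):
    L = (1 - eps) * L + eps * B[n % p]

assert abs(L - B.mean()) < 0.1        # equilibrium near the average of B
```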
Let us now be more specific. The three memory boxes that point p possesses are defined in terms of internal parameters attached to point p. Thus, memory boxes 1, 2 and 3 are represented by the left-hand sides of the corresponding equations, while the information stored in these boxes is represented by the right-hand sides, where t′ refers to a "slightly earlier" time than t due to the delays of superluminal signals; the specific choices of i, j and k that satisfy the above vary over time. Thus, Eq 25 is produced per the following algorithm. The information stored in the memory boxes changes whenever point x_p receives a signal from some other point x_l such that q(x_l) ∈ {q(x_p), q′(x_p)}. If q(x_l) = q′(x_p), then, in x_p's mind, j will replace i and l will replace j. If q(x_l) = q(x_p), then in x_p's mind l will replace k. All these operations, of course, are performed by point x_p. Therefore, point x_p needs to know the information about l before it performs the above operations. In fact, even if q(x_l) ∉ {q(x_p), q′(x_p)}, point p needs to first know the parameters of l in order to "realize" that it has to "ignore" l. Such "knowledge" that point x_p possesses is represented by parameters carrying the index 0, and the mechanism of "storing" it into one of the memory boxes is described by the following algorithm:

a) (q_1(x_p), q′_1(x_p), L_1(x_p), S_1(x_p), ψ_1(x_p), φ_1(x_p), x_1(x_p))(t + δt) = (q_0(x_p), q′_0(x_p), L_0(x_p), S_0(x_p), ψ_0(x_p), φ_0(x_p), x_0(x_p))(t)
b) (q_2(x_p), q′_2(x_p), L_2(x_p), S_2(x_p), ψ_2(x_p), φ_2(x_p), x_2(x_p))(t + δt) = (q_2(x_p), q′_2(x_p), L_2(x_p), S_2(x_p), ψ_2(x_p), φ_2(x_p), x_2(x_p))(t)
c) (q_3(x_p), q′_3(x_p), L_3(x_p), S_3(x_p), ψ_3(x_p), φ_3(x_p), x_3(x_p))(t + δt) = (q_3(x_p), q′_3(x_p), L_3(x_p), S_3(x_p), ψ_3(x_p), φ_3(x_p), x_3(x_p))(t)

Now we have to fill in a gap and explain how the said signals (described by the parameters with index 0) come about. In order for Rule 1 to agree with the process described in Eq 25, the information in the three memory boxes needs to appear at random.
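The replacement rules for the three memory boxes can be written out as a toy update function (the data layout, a dict of three labelled points, is ours; only the three remembered points and the two comparisons come from the rules above):

```python
# a toy version of the memory-box replacement rules described above
def update_boxes(boxes, q_p, qprime_p, l, q_l):
    """Point p remembers three points in boxes 'i', 'j', 'k'."""
    if q_l == qprime_p:        # j replaces i, the newcomer l replaces j
        boxes['i'], boxes['j'] = boxes['j'], l
    elif q_l == q_p:           # the newcomer l replaces k
        boxes['k'] = l
    # otherwise point p "realizes" it must ignore l
    return boxes

boxes = {'i': 0, 'j': 1, 'k': 2}
q_p, qprime_p = 7, 3
boxes = update_boxes(boxes, q_p, qprime_p, l=10, q_l=3)  # matches q'
assert boxes == {'i': 1, 'j': 10, 'k': 2}
boxes = update_boxes(boxes, q_p, qprime_p, l=11, q_l=7)  # matches q
assert boxes == {'i': 1, 'j': 10, 'k': 11}
boxes = update_boxes(boxes, q_p, qprime_p, l=12, q_l=5)  # ignored
assert boxes == {'i': 1, 'j': 10, 'k': 11}
```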
This can be achieved by letting each point emit a signal with its own period, which implies that the sequence of emitted signals changes over time. This can be enforced by assigning a "timer" θ(x_k, t) to a lattice point x_k, which emits a signal whenever θ(x_k, t) crosses a multiple of 2π. The difference in periodicity can be produced by having the "step size" (δθ)(x_k) differ from point to point. Thus, we propose the following algorithm: Rule 5: θ(x_k, t + δt) = θ(x_k, t) + (δθ)(x_k), where (δθ)(x_k) is different for each k.
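Rule 5 can be simulated directly (the step sizes are illustrative): points with different (δθ)(x_k) emit with different periods, so faster timers emit more often and the emission sequence shifts over time:

```python
import numpy as np

two_pi = 2 * np.pi
dtheta = np.array([0.50, 0.61, 0.73])   # a different "step size" per point
theta = np.zeros(3)
emissions = []                          # (time step, emitting point), in order

for t in range(200):
    theta += dtheta                     # Rule 5
    # a point emits whenever its timer crosses a multiple of 2*pi
    crossed = (theta // two_pi) > ((theta - dtheta) // two_pi)
    for k in np.flatnonzero(crossed):
        emissions.append((t, k))

order = [k for _, k in emissions]
assert sorted(order[:3]) == [0, 1, 2]   # every point fires once early on
assert order.count(2) > order.count(0)  # the faster timer emits more often
```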
The act of emitting a signal will be described by the parameter σ(x_k, t). Normally, σ(x_k, t) is 0, but during the time of signal emission it becomes −σ_max (Rule 6) at the point of emission. On the other hand, σ(x_l, t) becomes +σ_max (Rule 7) at the points past which the signal passes. In both cases, σ(x, t) will go on to return to 0 afterwards; but it can only do so in small steps (Rule 8). Now, a point l is not receptive to any signals unless σ(x_l, t) = 0 (Rules 9, 10, 11). The fact that it "takes time" for σ(x, t) to "become" zero implies that once a signal passes by a certain point, it can only return to that point after said period of time is over. Now, if the total number of points (N) is smaller than the number of steps needed for recovery (σ_max/δσ), the signal will cover all the points and "have nowhere to go". Thus, after passing through each point once, it will disappear. On the other hand, if σ_max/δσ is much smaller than