Investigation of a neural implicit representation tomography method for flow diagnostics

In this work, a new gridless approach to tomographic reconstruction of 3D flow fields is introduced and investigated. The approach, termed here as FluidNeRF, is based on the concept of volume representation through Neural Radiance Fields (NeRF). NeRF represents a 3D volume as a continuous function using a deep neural network. In FluidNeRF, the neural network is a function of 3D spatial coordinates in the volume and produces an intensity of light per unit volume at that position. The network is trained using the loss between measured and rendered 2D projections similar to other multi-camera tomography techniques. Projections are rendered using an emission-based integrated line-of-sight method where light rays are traced through the volume; the network is used to determine intensity values along the ray. This paper investigates the influence of the NeRF hyperparameters, camera layout and spacing, and image noise on the reconstruction quality as well as the computational cost. A DNS-generated synthetic turbulent jet is used as a ground-truth representative flow field. Results obtained with FluidNeRF are compared to an adaptive simultaneous algebraic reconstruction technique (ASART), which is representative of a conventional reconstruction technique. Results show that FluidNeRF matches or outperforms ASART in reconstruction quality, is more robust to noise, and offers several advantages that make it more flexible and thus suitable for extension to other flow measurement techniques and scaling to larger-scale problems.


Introduction
Three-dimensional (3D) flow diagnostics have received increasing attention and interest due to the spatial complexity that occurs in many real-world engineering applications. 3D tomography is a technique that reconstructs a 3D scene from a series of projections acquired by a distribution of 2D sensors surrounding a flow field. Image-based 3D flow diagnostic techniques that use tomography include computed tomography of chemiluminescence (CTC), laser-induced fluorescence (LIF) [1], optical pyrometry [2], tomographic absorption spectroscopy (TAS) [3], particle image velocimetry (PIV) [4,5], tomographic emission spectroscopy [6,7], background oriented Schlieren (BOS) [8], and many more. Tomographic reconstruction is generally an ill-posed and under-determined inverse problem, making it difficult to solve. While current methods have limitations, volumetric tomography plays a crucial role in understanding combustion processes and high-speed flows that are inherently 3D. Therefore, improvements to tomographic methods are necessary to capture the full 3D nature of these complex flows [9].
Among the most common tomography approaches for flow diagnostics are iterative reconstruction methods, primarily algebraic reconstruction techniques (ARTs) [10]. ART-based methods, including ART, simultaneous ART (SART) [11], and multiplicative ART (MART) [12], have been demonstrated to produce acceptable reconstructions with a limited number of views distributed around a wide arc. However, due to the nature of the ill-posed inverse problem, these methods are known for producing non-physical high-frequency content that must be truncated or filtered out. One way to overcome these challenges is by imposing physically-motivated a priori information/structure through classical regularization techniques including Tikhonov [13,14], total variation (TV) [15], and Bayesian formulations [16]. While regularization has improved reconstructions of less complex volumes, these techniques increase computational time and have been demonstrated to produce overly-smooth volumes and other artifacts for more complex scenes [9]. Another challenge faced by classical tomography techniques is the volume discretization required to form a discrete set of equations to solve. Volume discretization inherently limits the topology of the reconstruction and is prone to aliasing and discretization errors [9]. Additionally, there is a trade-off between volume resolution and computational cost, as computational time and storage requirements scale with the number of volume elements.
A recent and comprehensive review article by Grauer et al [9] summarizes the state-of-the-art in volumetric emission tomography for combustion applications, where the overall progress, potential, and widespread applicability of such measurements is quite clear. One promising active area in tomography research is the use of deep learning tools. Most preliminary attempts to use deep learning for tomography have implemented supervised learning techniques, where a deep neural network is trained on a ground-truth set of perspectives and the corresponding 3D field. The network is trained to relate projections to voxel intensities. The validity of these supervised tomographic methods is uncertain when applied to volumes outside the training set, where the imaging model and physics may no longer apply. Alternatively, an emerging semi-supervised deep learning method called physics-informed neural networks (PINNs), first introduced by Raissi et al [17], uses a deep neural network that relates space and time to relevant flow field parameters (u, v, w, p). Thus, the neural network of a PINN approximates the flow field with a continuous function. PINNs are useful for ensuring flow fields obey physical constraints by using the Navier-Stokes, advection-diffusion, and other governing equations. While PINNs have been employed mainly in the context of computational fluid dynamics, a PINN approach that combines measurement loss and physics loss for BOS measurements has been demonstrated by Molnar et al [18,19]. The use of PINNs for data assimilation and the post-processing of experimental particle tracking data, such as the work of Di Carlo et al [20] and Clark Di Leoni et al [21], is also noted and related to the current effort.
In this work, we draw our inspiration from the computer vision community, which has been developing techniques for shape representation and view synthesis as described by Mildenhall et al [22]. A neural implicit representation technique that has shown great promise for accurate and efficient view synthesis is neural radiance fields (NeRFs). NeRF represents an arbitrary volume using a neural network that is a function of 3D location within the volume and 2D view direction. This parallels the idea of the light field, where all of the light rays in a volume are represented by their position and direction of propagation. Within the computer vision community, NeRF has been shown to be capable of capturing very detailed and high-frequency spatial information about a scene from a collection of 2D images obtained from arbitrary directions. Similar to PINNs, NeRF removes the inherent topology limitation through its continuous volume representation, reduces memory costs by orders of magnitude, and provides modularity to incorporate physically realistic ray models, rendering techniques, other inputs, and physics-based loss functions. These methods are general and can leverage both observed data and the underlying physics to learn the system dynamics more efficiently, allowing them to solve both forward and inverse problems [17]. Additionally, the advances made for neural implicit representation methods and PINNs in general are largely application agnostic. Thus, the adoption of such an approach for tomographic 3D flow reconstruction stands to benefit from the immense advances taking place across the entire domain.
This work proposes and studies a NeRF-based 3D flow field tomography technique, referred to here as FluidNeRF. A preliminary proof-of-concept investigation of FluidNeRF was presented by Kelly and Thurow [23]. In addition, Zhang et al [24] concurrently demonstrated a similar NeRF-based tomography method that they referred to as the Neural Volume Reconstruction Technique, and Chu et al [25] presented a physics-based NeRF method for 3D smoke visualization. The current work is distinct in that it aims to significantly expand the understanding of NeRF-based techniques. Specifically, this work investigates the influence of hyperparameters, including network size, positional encoding, and spatial sampling techniques, on reconstruction accuracy. Then, key external factors for FluidNeRF's reconstruction accuracy are examined, including camera layout and image noise, with direct comparisons made to reconstructions performed using a conventional ART-based approach. This work is an instrumental step toward understanding the limits of tomography algorithms that use neural implicit representations, and it provides a framework that can lead to improved models and techniques for combustion and high-speed flow diagnostics.

Neural radiance fields for flow visualization
The FluidNeRF reconstruction algorithm is shown schematically in figure 1. First, a light ray is traced through the volume along a path corresponding to a pixel from one of the projection images. A user-defined number of query points are taken along the ray, with each point defined by its 3D object-space coordinate (x_j, y_j, z_j). These coordinates are passed through a positional encoding (γ) step to increase their dimensionality, which has been shown in the NeRF literature to be essential for the representation of high spatial frequency information when used in combination with ReLU activation functions [22]. The encoded coordinates are then passed to the multi-layer perceptron (MLP), with the activation function of the nodes indicated by σ. The MLP outputs the predicted volume luminance density (E_j) at each queried 3D coordinate. Once all query points along the ray are interrogated, a pixel value is rendered through numerical integration of the volume luminance along the ray. The measurement loss (L) that is used to update the MLP is then calculated using the mean square error (MSE) between the rendered pixel value (I_i) and the measured pixel value (Î_i). L is accumulated for a batch of rays before updating the MLP (volume).

Discrete ray tracing
The volume sampling method is a key component of rendering image projections from the reconstructed volume and directly impacts the reconstruction quality via L. In addition, the computational efficiency of the method is also affected, with computational time expected to scale proportionately with the number of points sampled. To produce efficient sampling along the ray, a multi-resolution technique is used. First, FluidNeRF produces a semi-random set of samples along a ray cast through the volume. Then, a second set of samples is calculated using the E_j values found from the semi-random set of query points. This two-stage sampling method is referred to as hierarchical sampling [22]. This method helps concentrate samples in areas of interest while limiting samples in locations with constant or no intensity. This can be important in reducing the total number of samples required to capture the important features of a volume that create the projections, as compared to using only uniform or semi-random sampling methods.
For this work, a simple pinhole camera model is used for tracing rays through the volume. Each ray has an origin (r_o), corresponding to the pinhole location in space relative to the volume, and a direction (r_d) given by a unit vector. Discrete samples along the ray can be found by r_o + s · r_d, where s is the distance from the ray query point to the camera pinhole.
To calculate the semi-random samples, the ray is first segmented into N_c evenly spaced regions between user-defined near-field and far-field points (s_near, s_far). Then a point along the ray is randomly sampled in each region as defined by equation (1) [22],

s_j ∼ U[ s_near + ((j − 1)/N_c)(s_far − s_near), s_near + (j/N_c)(s_far − s_near) ].  (1)

This sampling technique produces a semi-uniform sampling distributed along the ray, since a single sample is taken in each region.
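The stratified sampling of equation (1) can be sketched with a few lines of NumPy. This is an illustrative version only; the function name and interface are our own and not taken from the FluidNeRF implementation:

```python
import numpy as np

def stratified_samples(s_near, s_far, n_coarse, rng=None):
    """Draw one random sample per evenly spaced bin along a ray (equation (1))."""
    rng = np.random.default_rng() if rng is None else rng
    edges = np.linspace(s_near, s_far, n_coarse + 1)  # N_c + 1 bin edges
    lower, upper = edges[:-1], edges[1:]
    u = rng.random(n_coarse)                          # uniform in [0, 1)
    return lower + u * (upper - lower)                # one sample per bin

# Object-space sample points are then r_o + s * r_d for each returned distance s.
```

Because exactly one sample is taken per disjoint bin, the returned distances are strictly increasing, which keeps the downstream quadrature simple.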
After the first set of semi-random locations has been interrogated, E_j is calculated for each point through the MLP. This information is then used to produce another set of samples. For the current FluidNeRF method, the spatial gradient of intensity along the ray is used. The gradient is found numerically using dE_j = (E_{j+1} − E_j)/(s_{j+1} − s_j). While this yields an approximate gradient, automatic differentiation can be used to find the exact gradient at each point and will be incorporated in the future. The gradient magnitudes are normalized along the ray (w_j = |dE_j| / Σ_k |dE_k|), producing a piece-wise constant probability density function (PDF). With this PDF, steeper gradients have a higher probability of receiving a new sample. Then, N_f feature samples are drawn from the PDF. Thus, the final image is rendered using all (N_tot = N_c + N_f) samples.
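The gradient-weighted fine sampling can be sketched via inverse-transform sampling of the piece-wise constant PDF. This is a simplified illustration under our own naming, not the paper's code; the clamping constants are defensive choices of ours:

```python
import numpy as np

def gradient_pdf_samples(s, E, n_fine, rng=None):
    """Draw N_f extra samples where |dE/ds| is large (inverse-transform sampling)."""
    rng = np.random.default_rng() if rng is None else rng
    dE = np.abs(np.diff(E) / np.diff(s))            # finite-difference gradient per segment
    total = dE.sum()
    pdf = dE / total if total > 0 else np.full(len(dE), 1.0 / len(dE))
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])   # CDF over ray segments
    u = rng.random(n_fine)
    idx = np.clip(np.searchsorted(cdf, u, side="right") - 1, 0, len(dE) - 1)
    frac = (u - cdf[idx]) / np.maximum(cdf[idx + 1] - cdf[idx], 1e-12)
    return s[idx] + frac * (s[idx + 1] - s[idx])    # place sample within chosen segment
```

For a step-like intensity profile, all fine samples land in the single segment containing the step, which is the clustering behavior described above.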

Positional encoding
With the sampling along the ray completed, the 3D spatial coordinates are individually encoded to a higher dimension by applying a Fourier feature mapping as first demonstrated by Tancik et al [26]. Neural networks are notorious for converging to lower-order solutions, leading to poor performance in representing the high-frequency content of a volume. Thus, positional encoding uses a variant of Fourier feature mapping as shown in equation (2),

γ(p) = (sin(2^0 π p), cos(2^0 π p), ..., sin(2^{L−1} π p), cos(2^{L−1} π p)),  (2)

where L is an integer value that determines the dimensionality of the encoding. A higher value of L produces more values and increases the dimension of the input array. Not only does positional encoding improve the performance in capturing high-frequency information, it has also been shown to increase the training convergence rate. Thus, positional encoding is incorporated into FluidNeRF. We note that the use of different activation functions, such as the SIREN approach used in [25,27], may mitigate the need for positional encoding. This will be explored in future work.
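A minimal sketch of the encoding of equation (2) is given below. Whether the raw coordinates are also concatenated to the encoded vector varies between NeRF implementations; this sketch passes them through only when L = 0:

```python
import numpy as np

def positional_encoding(p, L):
    """Fourier-feature encoding of coordinates p (equation (2)).

    Each coordinate maps to 2L values: sin(2^l * pi * p) and cos(2^l * pi * p)
    for l = 0 .. L-1. With L = 0 the coordinates pass through unchanged.
    """
    p = np.atleast_1d(np.asarray(p, dtype=float))
    if L == 0:
        return p
    freqs = (2.0 ** np.arange(L)) * np.pi            # 2^l * pi
    angles = p[..., None] * freqs                    # broadcast over coordinates
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1).ravel()
```

A 3D point (x, y, z) with L = 8 therefore becomes 3 × 2 × 8 = 48 encoded input values to the MLP.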

Image rendering
Another critical step for an accurate reconstruction is the image rendering, as the imaging model (projection function [9]) should approximate the measurements in a physically realistic manner, since reconstructions use back-projection to solve for the volume of interest. In many applications of interest, it is reasonable to approximate the flow field as optically thin, in which case a simple emission-based projection rendering model can be applied. FluidNeRF incorporates a modified emission-based image rendering model as described by [28]. For this model, the light only originates from the volume that is bounded by s_near and s_far for a particular ray.
The light that reaches a pixel is calculated with equation (3),

I_i = ∫_{s_near}^{s_far} E(s) Ω(s) A_i(s) ds,  (3)

where I_i is the intensity of the ith pixel, Ω is the solid angle as a function of s, and A_i is the area of the ith pixel in object space as a function of s. Ω accounts for the capture area of the light due to the size of the lens aperture. The incorporation of depth-dependent area and solid-angle terms is unique to our method and better approximates the imaging model used in experiments. Equation (3) is estimated using the numerical methods described previously. A simple schematic of the ray tracing and rendering is shown in figure 2. In our method, this integral is approximated using the midpoint quadrature method as presented in equation (4),
I_i ≈ Σ_j E(s̄_j) Ω(s̄_j) A_i(s̄_j) Δs_j,  (4)

where s̄_j is the midpoint between the j and j + 1 samples, and Δs_j is the distance between midpoints (s̄_{j+1} − s̄_j). The area of a pixel can be found using the local magnification (M_j) at the s̄_j location. For the quadrature method, Ω is the normalized solid angle relative to the focal plane, calculated via (s_o/s̄_j)^2, where s_o is the object distance of the main lens. Note that the emission-based rendering model is only valid for optically thin volumes; however, other rendering techniques, such as the absorption-emission model [22], will be incorporated in the future.
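The midpoint quadrature of equation (4) can be sketched as follows. This is a deliberately simplified version: it omits the pixel-area term A_i, approximates E at midpoints by averaging adjacent samples, and uses segment lengths rather than midpoint spacings for Δs, so it illustrates the structure of the quadrature rather than reproducing the paper's exact rendering model:

```python
import numpy as np

def render_pixel(s, E, s_o):
    """Midpoint-quadrature emission rendering along one ray (cf. equation (4)).

    s   : sorted sample distances along the ray
    E   : luminance density at each sample
    s_o : object distance of the main lens (for the solid-angle weight)
    """
    s_mid = 0.5 * (s[:-1] + s[1:])            # segment midpoints
    E_mid = 0.5 * (E[:-1] + E[1:])            # luminance at midpoints (linear interp.)
    ds = np.diff(s)                           # segment lengths
    omega = (s_o / s_mid) ** 2                # normalized solid angle (s_o / s)^2
    return float(np.sum(E_mid * omega * ds))  # rendered pixel intensity
```

For a constant unit emission between s = 1 and s = 2 with s_o = 1, the sum approximates the analytic integral of (1/s)^2, i.e. 0.5.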

Updating the network
The MLP approximates the volume and is updated using a loss function that quantifies the difference between I_i and the measured pixel value Î_i. The MSE loss is given by equation (5),

L = (1/N_pix) Σ_{i=1}^{N_pix} (I_i − Î_i)^2,  (5)

where N_pix is the number of pixels used per iteration. MSE is a standard loss function that has been used for other NeRF variants [22,24]; however, the inclusion of other loss functions may improve performance [25] and can be investigated in the future.
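The batch loss of equation (5) is a one-liner in practice; a NumPy sketch (function name ours) is:

```python
import numpy as np

def mse_loss(I_rendered, I_measured):
    """Equation (5): mean square error over a batch of N_pix rendered pixels."""
    I_rendered = np.asarray(I_rendered, dtype=float)
    I_measured = np.asarray(I_measured, dtype=float)
    return float(np.mean((I_rendered - I_measured) ** 2))
```

In the TensorFlow implementation this quantity would be computed on tensors inside the training loop so that gradients can flow back to the MLP weights.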

Implementation
The FluidNeRF implementation used for this work is written in Python employing the TensorFlow 2.9 library. The default hyperparameters of the method are presented in table 1 and are used throughout this work unless otherwise indicated. The network size is a user-defined hyperparameter that is determined at run-time.
The activation function (σ) for this work is the rectified linear unit (ReLU). An iteration of FluidNeRF is completed once all images have been considered, where a random batch of pixels from each image is queried. For this work, a user-specified 1024 pixels from each image are selected during each iteration for training. Once an iteration is finished, the network is updated using the Adam optimizer with an initial learning rate of 5 × 10^−4 that decays exponentially to 5 × 10^−5 after 25 000 iterations. The Adam optimizer hyperparameters are kept at the defaults previously reported by [22]. A convergence criterion is used to stop training when L varies by less than 2% across a 2000-iteration stretch with a 200-iteration running average, similar to the implementation by Molnar et al [19].
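The stopping rule can be sketched as below. The exact bookkeeping of the paper's criterion is not spelled out, so this is an approximation under our own interpretation: smooth the loss history with a 200-sample running average, then stop when the smoothed loss varies by less than 2% over the trailing 2000 iterations:

```python
import numpy as np

def has_converged(loss_history, window=2000, avg=200, tol=0.02):
    """Stop when the smoothed loss varies by less than `tol` over `window` iterations."""
    if len(loss_history) < window + avg:
        return False                                      # not enough history yet
    recent = np.asarray(loss_history[-(window + avg):], dtype=float)
    kernel = np.ones(avg) / avg
    smoothed = np.convolve(recent, kernel, mode="valid")  # running average
    lo, hi = smoothed.min(), smoothed.max()
    return bool((hi - lo) / max(hi, 1e-12) < tol)         # relative variation test
```

A flat loss history triggers the stop, while a steadily decaying loss keeps training going.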
FluidNeRF is computed on a single NVIDIA Tesla T4 GPU that was provided by the Auburn University Easley Cluster.

Numerical validation
FluidNeRF is evaluated using a synthetic data set to provide a ground truth for comparison. This section covers the validation data set and the metrics that are used to assess performance.

Synthetic data set
The ground-truth volume used for validation was generated from a direct numerical simulation (DNS) of a high-pressure, turbulent mixing jet as shown in figure 3(a). The jet flow field is represented as a scalar field that has a value of 1.0 for pure fluid originating from the nozzle and 0.0 for the surrounding fluid, with values between 0 and 1 indicating the mixture of the jet and surrounding fluids. The jet originates from a circular nozzle with an exit diameter of 2.36 mm at a Reynolds number of 5000, as described by Sharan and Bellan [29]. The data are analogous to several flow diagnostic techniques, including LIF in combustion, or passive scalar flow visualizations such as those achieved with dye or smoke injection. The current data set has several features of practical interest, including (i) a laminar region with a top-hat cross section, (ii) a transition region with large-scale asymmetric flow structures forming around the periphery, and (iii) a turbulent region consisting of a broad range of spatial frequency content. These characteristics offer a practical volume for assessing the impact of hyperparameters on the spatial resolution of FluidNeRF's reconstruction.

Synthetic image generation
The synthetic perspectives of the DNS jet volume were generated using the Advanced Flow Diagnostics Laboratory's (AFDL) in-house tomography software [30,31] as shown in figure 3(b). Before the CFD values could be used in the AFDL software, they were interpolated from a variable-sized mesh grid onto a constant-sized voxel grid using a linear interpolation scheme. The jet volume was discretized into 801 × 801 × 801 volume elements with a voxel size of 0.05 mm. The AFDL software assumes that the optics can be approximated with the thin-lens equation and that the volume is optically thin. The synthetic images are formed by iteratively casting rays (N_rays) from each voxel position to the sensor using ray transfer matrices. Casting a discrete number of rays produces a type of shot noise that is typically found in imaging. For computationally efficient calculation of voxel/pixel intercepts, each voxel is approximated as a sphere, with the diameter of the sphere equal to the side length of the cubic voxel. The origin of each ray for a voxel is randomly generated on the surface of the sphere. The number of rays cast from each voxel is determined by the ratio between the collection angle at that location and the collection angle at the focal plane, where N_rays is the number of rays cast from each voxel at the focal plane. This renders an image while also accounting for the solid angle of the collection lens. From each origin, a ray is cast in a random direction within the collection angle of the main lens. Finally, as the rays intercept the sensor plane at integer pixel positions, the signal is accumulated with the intensity of each ray corresponding to the originating voxel value. The image rendering settings for this work are presented in table 2. These settings produced a maximum intensity representative of a 16-bit image.
A planar camera distribution and a spherical camera distribution, as shown in figure 4(a), are used to evaluate the impact of camera layout on reconstruction quality. The planar layout consists of up to 120 evenly-spaced perspectives, with the focal plane located at the center of the volume. The camera plane is perpendicular to the flow direction. Subsets of the 120 cameras are generated to evaluate the effect of the number of cameras. Note that a few of the 120 perspectives had distinct aliasing artifacts due to near-perfect alignment of the voxel grid coordinate system with the image sensor coordinates, which is a common issue with discretized schemes. Figure 5 shows an example of one such image, where the aliasing is present in the form of horizontal bands in the higher intensity regions. Plans are being made to address this aliasing issue in the future, but the images are retained here; we note that both FluidNeRF and ASART used the exact same set of images. The perspective locations for the spherical data set were calculated using a Fibonacci lattice to distribute camera locations around a sphere (figure 4(b)). After rendering the projections, the images were post-processed to apply varying levels of Gaussian image noise. The signal-to-noise ratio (SNR) of the images was determined relative to the maximum intensity in the image, with noise levels of 1%, 2.5%, 5%, and 10%.
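A Fibonacci lattice of the kind used for the spherical camera layout can be generated with a standard golden-ratio construction; the function below is an illustrative sketch (our naming, not the AFDL software's), returning unit-sphere positions that can be scaled to the desired camera radius:

```python
import numpy as np

def fibonacci_sphere(n, radius=1.0):
    """Distribute n camera positions near-uniformly on a sphere (Fibonacci lattice)."""
    i = np.arange(n)
    golden = (1.0 + np.sqrt(5.0)) / 2.0
    theta = 2.0 * np.pi * i / golden          # azimuth advanced by the golden angle
    z = 1.0 - (2.0 * i + 1.0) / n             # evenly spaced heights in (-1, 1)
    r = np.sqrt(1.0 - z ** 2)                 # radius of each latitude circle
    return radius * np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)
```

Each returned point can serve as a camera origin with the view direction pointed at the volume center.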

Grid-based reconstruction method
FluidNeRF is compared to a modern ART-based method called the adaptive simultaneous algebraic reconstruction technique (ASART). As noted before, ART-based methods have been the most popular iterative tomography methods for combustion and flow diagnostics. Simultaneous ART-based (SART) methods improve the performance of the original ART method by simultaneously updating the predicted volume across all projections, rather than using a projection-by-projection update. The simultaneous methods increase computational efficiency and are motivated by volumetric imaging [9]. ASART incorporates a modified multilevel access scheme to arrange the order of projection data, adaptively corrects the relaxation parameters that correct the discrepancy between actual and computed projections, and uses a column-sum substitution [32]. The initial prediction and updating equations for ASART are given by equations (6) and (7), where E_j^(k) is the intensity of a volume element (voxel) j at iteration k, w_ij is the weighting between pixel i and voxel j, P_i is the intensity on the image at pixel i, S is the number of pixels per projection, and μ is a relaxation factor (generally, 0 < μ ⩽ 2) that can enforce numerical stability. The weighting function w_ij is an approximation of the complex relationship between voxels and pixels, and w_ij is recalculated at each iteration due to the memory requirements of storing it. ASART has been shown to improve accuracy and convergence rate for scalar-field measurements compared to the family of ART algorithms [32,33].
Our ASART implementation uses a bilinear interpolation scheme to relate voxels to pixels (w_ij), where each voxel contributes intensity to a 2 × 2 grid of pixels for each camera. We employ μ = 0.5 for this work to ensure numerical stability. The volume was reconstructed at a resolution of 400^3 voxels, unless indicated otherwise. When making direct comparisons, FluidNeRF is interrogated at the same spatial location as the center of each voxel. The ASART implementation allows for a maximum of 250 iterations of the technique before stopping; however, a convergence criterion was used for early stopping since ART-based methods are semi-convergent [9]. For the convergence criterion, the error (P_i − Σ_j w_ij E_j^(k)) is accumulated for each iteration. The convergence criterion stops the reconstruction when the iteration error varies by less than 2% over the last 20 iterations using a sliding mean with a window of 4. The convergence criterion was selected to be similar to that used by FluidNeRF. Our ASART method is implemented in C/C++ and parallelized using OpenMP. ASART was processed on Intel Xeon Gold 6248R processors provided by the Auburn University Easley Cluster.
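For orientation, the basic simultaneous update underlying SART-family methods can be sketched as below. This is the plain SART form only; ASART's adaptive relaxation, multilevel projection ordering, and column-sum substitution [32] are deliberately omitted, and the dense weight matrix W is for illustration (real implementations compute w_ij on the fly, as noted above):

```python
import numpy as np

def sart_update(E, W, P, mu=0.5):
    """One simultaneous ART update of voxel intensities E (basic SART form).

    W : (n_pixels, n_voxels) weight matrix w_ij
    P : measured pixel intensities
    mu: relaxation factor (0 < mu <= 2)
    """
    residual = P - W @ E                          # discrepancy per pixel
    row_sums = np.maximum(W.sum(axis=1), 1e-12)   # normalize by per-ray weight sums
    col_sums = np.maximum(W.sum(axis=0), 1e-12)   # normalize by per-voxel weight sums
    correction = (W.T @ (residual / row_sums)) / col_sums
    return E + mu * correction                    # relaxed back-projection step
```

On a consistent system, repeated application drives the residual toward zero; with μ = 0.5 the error contracts geometrically in the simple diagonal case.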

Numerical metrics
In addition to visual inspection of the reconstructed volumes, two different performance metrics are used to quantify the accuracy of the reconstructions. Volume metrics compare the volume reconstruction to a ground truth, which is possible due to the synthetic data set. The first quantitative metric is the normalized root-mean-square error (NRMSE), given by equation (8),

NRMSE = sqrt( (1/N_vox) Σ_j (E_j − Ê_j)^2 ) / (max(Ê) − min(Ê)),  (8)

where E is the predicted volume value, Ê is the ground truth, and N_vox is the number of voxels or query points used. NRMSE is one of the most common accuracy metrics in volumetric imaging [9] and provides an averaged error quantification across the volume.
While not as common in volumetric imaging, the structural similarity index metric (SSIM) has been popular for comparing images in the computer vision community [22] and has been employed recently for tomography [24]. SSIM evaluates the luminance, contrast, and structure of the reconstruction relative to the ground truth, as shown in equation (9),

SSIM = (2 μ_x μ_y + c_1)(2 σ_xy + c_2) / [(μ_x^2 + μ_y^2 + c_1)(σ_x^2 + σ_y^2 + c_2)],  (9)

where μ_x and μ_y are the mean values across sample points, σ_x^2 and σ_y^2 are the variances of the intensity values, σ_xy is the covariance, and c_1 and c_2 are constants that stabilize SSIM. Luminance is compared to the ground truth through the means, the second parenthetical term in the denominator captures the contrast through σ_x^2 and σ_y^2, and the structural comparison is employed through σ_xy.
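Both metrics above can be sketched compactly. Two caveats on our assumptions: the NRMSE normalization here uses the ground-truth dynamic range (other normalizations are also common), and the SSIM sketch computes a single global score over the whole volume rather than the windowed average used in image processing; c_1 and c_2 use the conventional values for unit dynamic range:

```python
import numpy as np

def nrmse(E, E_true):
    """Root-mean-square error normalized by the ground-truth dynamic range."""
    rmse = np.sqrt(np.mean((E - E_true) ** 2))
    return float(rmse / (E_true.max() - E_true.min()))

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Global SSIM (no local windowing) of two volumes, cf. equation (9)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()            # covariance term sigma_xy
    return float((2 * mx * my + c1) * (2 * cov + c2)
                 / ((mx**2 + my**2 + c1) * (vx + vy + c2)))
```

A perfect reconstruction gives NRMSE = 0 and SSIM = 1 by construction.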

Results and discussion
A preliminary comparison between FluidNeRF and ASART is made by visualizing a central slice along the jet axis, as presented in figure 6. The overall structure of the jet is clearly reconstructed using both methods, with the differences being most apparent in the error plots shown in (d) and (e). Both methods clearly capture the laminar jet core, the transition region near the end of the jet core, and the diffuse, turbulent flow region downstream. The most significant difference between the two reconstructions is found in the laminar region, where ASART has difficulty capturing the sharp transition between the jet core and the ambient flow. Due to the necessary discretization of the volume, ASART has difficulty modeling the sharp gradient at the jet's edge and compensates by underpredicting the intensity in the core. FluidNeRF also appears to slightly outperform ASART in the transition region where sharp features are still prevalent; however, the differences between the two methods begin to diminish in the turbulent flow region where the flow is dominated by finer-scale, smoother structures. In this region, both methods still reconstruct the larger scales and the general distribution of jet fluid throughout the volume but struggle to capture the finest details, thus indicating the limit of the spatial resolution.
In the remainder of this section, a closer look at the performance of FluidNeRF is offered, beginning with the influence of the various hyperparameters.Once the hyperparameters have been selected, the impact of camera layout and image noise are investigated.

Hyperparameters
The choice of hyperparameters in the FluidNeRF reconstruction, such as network depth and height, can have a significant impact on the reconstruction quality, similar to voxel resolution in ART-based methods, as well as on the computational time. All of the hyperparameter results presented here are for a 15-camera, evenly-spaced planar camera layout. The first investigation is of the MLP network size. For this comparison, the other hyperparameters were held constant as shown in table 1. The ability of the network to approximate the radiance field is affected by network size. Figure 7 illustrates the reconstruction accuracy as a function of network depth at two different network heights. In general, a deeper network improves the approximation of the volume. For a height of 256 nodes, the reconstruction showed diminishing returns and leveled off after approximately eight layers. Reducing the network height by half does decrease the accuracy for a given depth, but the accuracy of the shorter network converges to that of the taller network with increasing depth. SSIM is equal for both heights at a depth of ten layers.
In addition to approximating the volume, the network depth also has a direct impact on computational efficiency, as the number of operations is directly proportional to the network size. For the case with ten layers, the taller network has 256 × 8 nodes compared to the 128 × 8 nodes of the shorter network. Fewer operations lead to improved computational efficiency in reconstruction, as depicted by the convergence time for each network size in figure 7(c). In almost every case, the computational time was reduced to less than 50% with a height reduction of 50%. These results show that quicker reconstructions can be achieved with a measured trade-off in reconstruction accuracy. For the rest of this work, a network size of 8 layers with 256 nodes each is chosen to maintain the focus on higher-fidelity reconstructions.
The next hyperparameter investigated is positional encoding. As discussed, increasing L increases the dimensionality of the inputs, allowing the MLP to better reconstruct the higher-frequency content in the volume. The free-jet volume is composed of different regions (laminar, transition, and turbulent) that have varying levels of spatial frequency content, as shown in figure 8(a). The reconstruction NRMSE for each positional encoding for the three regions is presented in figure 8(b). The turbulent and transition regions are characterized by higher spatial frequency content and are much more difficult to reconstruct, as was also shown in figure 6, and thus correspond to higher NRMSE. The laminar region has much lower NRMSE, with L having little influence on the accuracy. While increasing L is not needed there, it is also observed that increasing L does not negatively affect the NRMSE of the reconstruction. The transition and turbulent regions show similar trends, where increasing L improves performance (reduces NRMSE) until L = 8, after which further improvements are not observed. Therefore, L = 8 is considered here as the optimal encoding.
Another advantage of positional encoding is the improved convergence rate, as illustrated in figures 8(c)-(e). The termination point of each line indicates the convergence rate as a function of L, where increasing L reduces the number of iterations required for convergence. Note that the reconstruction was conducted on the full volume; thus, the termination point is the same across all regions. The laminar region converged quickly for all L, with little difference between the trends of each encoding. The transition and turbulent regions show similar results to the laminar region, but the improved convergence rate is more apparent with increasing L. In these regions, the NRMSE exhibits a negative initial slope that progressively becomes steeper with increasing L until approximately L = 8. After L = 8, the NRMSE lines coalesce to the same trend. Increasing L causes a <5% increase in computational time per iteration between L = 0 and L = 8; however, approximately 30% fewer iterations are required for convergence. Thus, increasing L improves the ability to accurately capture the volume and reduces the training time. It should be noted that the optimal value of L depends on the spatial complexity of the flow field of interest; however, L values greater than the optimum do not cause a reduction in accuracy.
The number of spatial samples along each ray used for rendering each perspective pixel is the final hyperparameter considered. Figure 9 shows the NRMSE and SSIM of the FluidNeRF reconstruction using the semi-random sampling (N c = N tot) and multi-resolution sampling (N c = N tot /2, N f = N tot /2) methods. The reconstruction accuracy quickly improves as N tot increases to 64. For N tot greater than this limit, the reconstruction quality appears to be limited by the MLP approximation of the volume rather than by the discrete ray tracing method, as indicated by a constant accuracy for N tot > 64. Figure 9(b) also illustrates that clustering sample points around the peak gradients in the volume only slightly improves the quality of the reconstruction, with the difference most notable at N tot of 32 and 64. Alternatively, SSIM indicates a much more pronounced difference for N tot ⩽ 32, with the coarse sampling outperforming the multi-resolution sampling. At low sample densities, the coarse samples N c do not seem to provide sufficient information to adequately determine where the fine samples should be focused. Based on these results, N tot should be equal to or greater than 128 to maximize the reconstruction quality. Note that this is lower than what traditional discretized methods require along the line-of-sight. Therefore, the optimal settings for maximizing reconstruction accuracy while reducing computational time are N c = 64 and N f = 64; increasing the sampling beyond this increases computational time without improving accuracy.
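The semi-random sampling and the emission-based line-of-sight rendering it feeds can be sketched as follows. The stratified one-sample-per-bin scheme and the Gaussian test emitter are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def stratified_samples(t_near, t_far, n_samples, rng):
    """Semi-random sampling: one uniform draw inside each of n_samples
    equal bins along the ray, so samples cover the ray without a fixed grid."""
    edges = np.linspace(t_near, t_far, n_samples + 1)
    return edges[:-1] + rng.random(n_samples) * (edges[1:] - edges[:-1])

def render_pixel(intensity_fn, origin, direction, t_near, t_far, n_samples, rng):
    """Emission-only line-of-sight integral: sum of intensity * segment length."""
    t = stratified_samples(t_near, t_far, n_samples, rng)
    points = origin + t[:, None] * direction      # (n_samples, 3) query points
    dt = (t_far - t_near) / n_samples             # constant segment length
    return np.sum(intensity_fn(points) * dt)

rng = np.random.default_rng(0)
# Hypothetical emitter: a Gaussian blob centered at the origin.
blob = lambda p: np.exp(-np.sum(p ** 2, axis=1))
I = render_pixel(blob, np.array([0.0, 0.0, -5.0]), np.array([0.0, 0.0, 1.0]),
                 0.0, 10.0, 128, rng)
# I approximates the analytic line integral sqrt(pi) through the blob.
```

In FluidNeRF the `intensity_fn` role is played by the MLP, so each rendered pixel costs N tot network queries, which is the scaling discussed in the computational efficiency section.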

Camera layout
To study the effect of camera layout on reconstruction accuracy, the volume was reconstructed with FluidNeRF using two camera configurations: (a) a 360 degree, evenly-spaced, planar layout representative of what might be expected in experiments and (b) a more challenging spherical layout with the same number of cameras that provides better coverage of the full angular space. Figure 10 shows the accuracy difference between the two layouts: both exhibit a similar trend, but the planar layout outperforms the spherical layout for both metrics, NRMSE and SSIM. This is perhaps counter-intuitive, as the increased angular separation of the spherical layout should increase accuracy due to the perspectives being more dissimilar, as discussed by [9]. This is the case for general tomography problems; however, the jet flow field utilized here is characterized by an axis of symmetry about which flow features are expected to be organized. Thus, the flow features are best distinguished by perspectives perpendicular to the jet axis. Additionally, the spherical layout leads to perspective views with a relatively longer line-of-sight (LOS) through the jet, leading to a less efficient use of image sensor resolution. Both configurations produce adequate reconstructions, but the planar configuration is superior for our case. Therefore, the rest of this work uses the planar configuration.
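The spherical layout of figure 4(b) is built from a Fibonacci lattice, which places an arbitrary number of points approximately evenly over a sphere. A minimal sketch, with the camera count and radius as illustrative placeholders:

```python
import numpy as np

def fibonacci_sphere(n, radius=1.0):
    """Approximately even point distribution on a sphere via a Fibonacci
    lattice: even spacing in z combined with a golden-ratio azimuth spiral."""
    golden = (1 + 5 ** 0.5) / 2
    i = np.arange(n)
    z = 1 - 2 * (i + 0.5) / n          # even spacing in z, offset from the poles
    phi = 2 * np.pi * i / golden       # golden-ratio rotation in azimuth
    r = np.sqrt(1 - z ** 2)
    return radius * np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

# Hypothetical example: 15 camera positions on a sphere of radius 2.
cams = fibonacci_sphere(15, radius=2.0)
```

Each row gives one camera position; in practice each camera would additionally be oriented to point at the volume center.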
With the planar layout, the effect of the number of perspectives on the reconstruction accuracy was investigated and compared to ASART, as presented in figure 11. For a practical camera layout with four cameras, typical for PIV/PTV, ASART and FluidNeRF performed about the same, with ASART performing slightly better as indicated by SSIM. Increasing the number of cameras to 8 and 12, common for camera configurations used in scalar-field reconstructions [9], drastically increases the accuracy of both methods. For these cases, FluidNeRF outperforms ASART. As expected, increasing the number of perspectives improves reconstruction quality for both methods, but shows diminishing returns after approximately 20 cameras for ASART and 30-40 cameras for FluidNeRF. More importantly, FluidNeRF outperforms ASART in all of these cases, with FluidNeRF at 8-12 perspectives producing equal or better quality than ASART with the maximum number of viewpoints considered. We also note that the trends for ASART are not always monotonic. The authors believe this is most likely associated with aliasing issues arising from the discrete nature of the ASART method, where a perfectly uniform distribution of cameras relative to the voxel grid can lead to numerical artifacts. Still, it is important to recognize that the same perspectives were used for both FluidNeRF and ASART. Regularization techniques could be included in ASART, at significant additional computational expense, to reduce the influence of these artifacts on the reconstruction. The lack of a discrete grid and the random sampling through the volume enable FluidNeRF to naturally avoid the same aliasing artifacts.

Computational efficiency
Given the disparity in programming languages and processors utilized in the current implementations of ASART and FluidNeRF, a direct comparison of computational time is not appropriate. However, it is useful to compare the trends of computational time for each algorithm as the overall scale (i.e. number of cameras, volume resolution) of the problem increases. The computational time associated with an iteration of each method is primarily expected to scale with O(N pix * N vox) and O(N pix * N tot) for ASART and FluidNeRF, respectively. The rate at which a method converges to a final solution, however, may not scale according to these metrics. To compare trends in the overall reconstruction times, both methods were run using 4, 8, 12, 20, 40, and 60 cameras, thus varying the amount of image data (i.e. N pix) considered. For each of the N pix cases, ASART reconstructions were computed at volume resolutions of 200³, 400³, 500³, and 800³ to represent a common choice required of the user of a volume-discretized technique. An equivalent choice is not necessary when using FluidNeRF, as it is a gridless technique, although one could potentially reduce the size of the MLP or N tot to achieve a similar effect. Figure 12 displays the convergence time for both FluidNeRF and ASART for various combinations. Note that the convergence time is provided for reference, but only the trends are considered here in the context of the scalability of each method to larger domains. For ASART, N pix = 4 * N cams because bilinear interpolation is employed to relate image data to each voxel. As expected, the ASART convergence time scales linearly with N pix * N vox due to the discrete relationship between voxels and pixels. FluidNeRF, on the other hand, is characterized by a convergence time that only modestly increases as the number of cameras or the ray sampling rate is increased. This indicates that FluidNeRF can effectively utilize the increased number of projections (N pix), where the convergence time only slightly increased relative to the gain in accuracy with increasing N pix (figure 11).
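The gap between the two per-iteration scalings can be made concrete with a back-of-the-envelope comparison; the camera count, image resolution, grid size, and sampling rate below are illustrative values, not measured figures from this work:

```python
# Illustrative per-iteration operation counts for the two scalings;
# all sizes are hypothetical and constant factors are ignored.
n_cams, pix_per_image = 15, 512 * 512
n_pix = n_cams * pix_per_image     # stand-in proxy for the total pixel count
asart_ops = n_pix * 800 ** 3       # O(N_pix * N_vox), 800^3 voxel grid
fluidnerf_ops = n_pix * 128        # O(N_pix * N_tot), N_tot = 128 samples/ray
print(asart_ops // fluidnerf_ops)  # the gap scales with N_vox / N_tot
```

Because N pix appears in both estimates, the ratio is governed by N vox / N tot, which is why the voxel-based method grows so much faster as the volume resolution increases.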
The relationship between the density of the voxel grid and the true spatial resolution in ASART is not straightforward, with a full analysis beyond the scope of this work. Nonetheless, it can generally be expected that a denser voxel grid intrinsically relaxes the limitation on spatial resolution. One challenge, however, is that increasing the grid density can also make the tomography problem more under-defined, which can induce numerical instabilities. Therefore, increasing resolution does not necessarily increase reconstruction quality. As a relevant aside, this was investigated by comparing the reconstruction accuracy for the 400³ and 800³ volume resolution cases. The ASART reconstruction accuracy stays relatively constant between the two resolutions. For example, the ASART reconstruction NRMSE using 15 cameras at 400³ and 800³ volume resolutions is 0.225 and 0.215, respectively. Thus, increasing the volume resolution slightly improved the reconstruction, but it does not reach the reconstruction accuracy of FluidNeRF (NRMSE = 0.165 using 15 cameras). Therefore, FluidNeRF can scale to higher-resolution cases while maintaining accuracy, in contrast to discretized methods.
Another important distinction between ASART and FluidNeRF is the memory and storage requirements for each reconstruction. With ASART, the memory requirements for the solution scale with N vox. For the 800³ volume resolution case, each volume requires 2 GB of memory. Comparatively, FluidNeRF requires 1.9 MB to store the weights of the neural network for the default hyperparameters. It is not clear whether a deeper network would be needed to represent other types of flow fields; nonetheless, FluidNeRF holds distinct advantages for investigating problems with increasing volume size or resolution requirements.
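The storage figures quoted above follow from simple arithmetic, assuming the voxel grid is stored as 4-byte single-precision floats:

```python
# ASART stores one value per voxel; assuming 4-byte single-precision floats:
voxel_bytes = 800 ** 3 * 4          # ~2.05 GB, matching the ~2 GB quoted above
network_bytes = 1.9e6               # FluidNeRF weight file size quoted above
print(voxel_bytes / network_bytes)  # the voxel grid is roughly 1000x larger
```

The gap also widens cubically with grid resolution, while the network size is fixed by its architecture rather than by the volume.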

Image noise
In this subsection, the impact of image noise on reconstruction quality is evaluated. Figure 13(a) shows the reconstruction quality at different levels of image noise for varying numbers of cameras. For FluidNeRF, the reconstruction quality deteriorates only slightly for noise levels up to 5%, after which the impact of noise is more pronounced. Even at 10% noise, FluidNeRF is able to produce adequate reconstructions. However, higher noise flattens the NRMSE curve, indicating that the noise reduces the effectiveness of increasing the number of perspectives.
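Noise of the kind applied here can be emulated by perturbing each synthetic projection with zero-mean Gaussian noise scaled to the image's dynamic range. A minimal sketch follows; the range-based normalization of the NRMSE and the stand-in random image are assumptions, not the paper's exact definitions:

```python
import numpy as np

def add_noise_and_nrmse(image, noise_frac, rng):
    """Add zero-mean Gaussian noise with standard deviation set to a
    fraction of the image's dynamic range, and return the NRMSE between
    the noisy and clean images (normalized by that same range)."""
    span = image.max() - image.min()
    noisy = image + rng.normal(0.0, noise_frac * span, image.shape)
    nrmse = np.sqrt(np.mean((noisy - image) ** 2)) / span
    return noisy, nrmse

rng = np.random.default_rng(0)
clean = rng.random((256, 256))   # stand-in for a rendered projection image
noisy, nrmse = add_noise_and_nrmse(clean, 0.05, rng)
# nrmse recovers roughly the injected 5% noise level
```

Because the noise is independent per pixel and per camera, additional perspectives average it out during training, which is consistent with the camera-count trends described above.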
Figure 13(b) presents a comparison between ASART and FluidNeRF with and without 5% Gaussian noise applied to the perspectives. It is immediately apparent that FluidNeRF continues to outperform ASART even in the presence of noise, with FluidNeRF maintaining a lower NRMSE value throughout. ASART has been shown to be more robust to noise than most ART-based methods; however, it is still quite susceptible compared to FluidNeRF. As expected, noise has a much larger impact on cases with fewer cameras. Increasing the number of cameras allows ASART to overcome some of the challenges associated with noise, but not to the same degree as FluidNeRF, which is shown here to be more robust. Overall, FluidNeRF is more applicable for experiments, since image noise is unavoidable. The effect of noise on the ASART and FluidNeRF reconstructions is qualitatively illustrated in figures 14 and 15, respectively, which show 2D cross-sectional slices of the jet in the turbulent region at different noise levels for different numbers of cameras. For FluidNeRF, the differences between reconstructions are rather subtle, with sharper features observed for the low-noise, high-camera cases. The ASART reconstructions, on the other hand, show more pronounced differences, with reconstruction artifacts typically associated with discrete volume representations and aliasing effects. This is most notable for the lowest number of cameras, where salt-and-pepper-type noise is prevalent.

Conclusions
In conclusion, a neural implicit representation tomography technique called FluidNeRF was introduced and compared to standard ART-based reconstruction methods. FluidNeRF's performance was demonstrated and characterized using synthetic images generated from a DNS volume of a turbulent jet, where it was found to perform comparably to or better than ASART. FluidNeRF is similar to ART-based methods in that a volume is reconstructed in an iterative fashion by comparing reconstructed projections with measured projections; however, it is distinct in that it approximates the reconstructed volume as a continuous function using a neural network and performs the reconstruction using standard machine learning training algorithms and volume rendering. NeRF-based methods have several advantages compared to traditional, ART-based tomography methods: (i) the volume is represented as a continuous function that can be arbitrarily interrogated without the need for discretization; (ii) the volume is stored in a compact manner, reducing memory requirements by orders of magnitude and thus enabling scalability to much higher resolutions and larger camera arrays; (iii) the use of standard machine learning tools facilitates future improvements that leverage the rapid advances taking place across the entire discipline of machine learning; and (iv) the overall machine learning architecture is naturally modular and can be relatively easily adapted to incorporate more realistic or computationally efficient ray models, volume rendering techniques, physics-based loss functions, and other inputs.
With the recent introduction of NeRF as a viable emission tomography method [23, 24], this work provides deeper insight into the effects of network hyperparameters and external parameters, including camera layout and image noise, on reconstruction accuracy. This work found that network size can impact the reconstruction quality, but with an associated impact on the computational time. Deep learning techniques have enabled the growth of network sizes to provide a volume approximation fidelity that surpasses discretized methods. For the flow field investigated here, a deeper, thinner network had a small effect on reconstruction quality: a network with half the height and two additional layers produced similar results while cutting computational time in half, due to the reduction in computations per input and in network unknowns. Positional encoding improved the ability of the network to capture high-frequency content in the volume. Increasing the positional encoding maximizes reconstruction quality in regions with high-frequency content and preserves the quality in low-frequency regions; an optimal encoding of L = 8 was found. Another important consideration is the spatial sampling along each camera ray used to render the image projection. A multi-resolution sampling technique enhances reconstruction accuracy for a given N tot, although only slightly for the parameters considered here. In terms of camera layout, a planar configuration oriented perpendicular to the mean flow path outperformed the spherical configuration, emphasizing the importance of incorporating a priori knowledge of the flow during experimental set-up.
Overall, the FluidNeRF technique has significant potential for 3D flow visualization, with this work being an important first step. In the future, other hyperparameters of the method should also be investigated, as this can further improve the spatial fidelity and computational efficiency of the method. This should include additional consideration of the activation function, image rendering models, and loss functions. More broadly speaking, the modularity of the FluidNeRF model offers even more potential for this method to be expanded to other flow fields and applications by the inclusion of time as an input, the development of more advanced and physically realistic imaging models, and the incorporation of physics-based constraints in the loss terms.

Figure 2. Ray tracing and image rendering method schematic. The solid rays correspond to rays cast from pixel centers, and the dashed lines correspond to boundaries between the pixels. The query points are indicated by the points along the ray.

Figure 3. (a) A central x-y slice of the non-dimensional passive scalar volume originating from the nozzle and an iso-surface of the scalar at a contour value of 0.2. The orange arrow indicates the flow direction (positive x-direction). (b) A line-of-sight integrated perspective image generated from the non-dimensional scalar value of the DNS volume.

Figure 4. (a) Planar camera layout with the jet marked in blue and the mean velocity coming into or out of the page, and (b) spherical camera positions calculated using a Fibonacci lattice, with the free jet volume indicated by the red box.

Figure 5. A line-of-sight integrated perspective image of the DNS free jet data. A small amount of aliasing, in the form of horizontal streaks, caused by the synthetic image rendering method is highlighted.

Figure 6. A central x-y slice of (a) the ground truth free jet, (b) the FluidNeRF reconstruction, and (c) the ASART reconstruction using 15 cameras in the planar configuration. The difference between the ground truth and the (d) FluidNeRF and (e) ASART reconstructions.

Figure 7. Reconstruction accuracy as a function of network depth at different heights (128 and 256) using (a) NRMSE and (b) SSIM. (c) Convergence time for each combination of network height and depth.

Figure 8. (a) x-y central slice of the ground truth volume with red boxes indicating the laminar, transition, and turbulent regions (left to right) utilized for the comparison. (b) NRMSE as a function of positional encoding for the three different regions. NRMSE as a function of reconstruction iterations for the (c) laminar, (d) transition, and (e) turbulent regions.

Figure 9. (a) NRMSE and (b) SSIM as a function of N tot samples using 15 cameras.

Figure 10. Reconstruction (a) NRMSE and (b) SSIM of FluidNeRF in the spherical and planar configurations with different numbers of perspectives.

Figure 11. (a) NRMSE and (b) SSIM for reconstructions using ASART and FluidNeRF with different numbers of cameras in the planar configuration.

Figure 12. Convergence time for (a) ASART as a function of N pix * N vox and (b) FluidNeRF as a function of N pix * N tot.

Figure 13. (a) Reconstruction accuracy of FluidNeRF at different image noise levels as a function of the number of cameras (color version required), and (b) comparison of NRMSE for FluidNeRF and ASART with 0% and 5% image noise using various numbers of cameras.

Table 1. Default hyperparameters of FluidNeRF unless specified otherwise.
…measured projection Îi. In this work, an MSE loss function is used as shown in equation (

Table 2. Camera settings for the free jet synthetic experiments.
Magnification | Pixel pitch | Focal length | f/# | Resolution | N rays