Highly immersive imaging: Depth of field effect implemented through ray tracing with multiple samples

This paper explores the implementation of depth of field effects through ray tracing with multiple samples. We focus on simulating the imaging process of the human eye and give greater consideration to the factors that influence the depth of field effect. Building upon ray tracing shading computations, we employ a multi-sampling technique for rendering. This approach restores spatial cues that are otherwise lost in the two-dimensional image, providing accurate depth of field effects. Moreover, because this process closely resembles the way the human eye naturally captures images, it results in a more immersive visual experience. Additionally, exploiting the advantageous parallelism of distributed ray tracing, we explore an implementation scheme for parallel computing acceleration with multiple buffers. As the number of sampling points increases, rendering quality improves significantly, accompanied by substantial acceleration. Finally, we discuss the limitations of our method, potential directions for improvement, and future application scenarios in the fields of computer graphics and mixed reality.


Introduction
In traditional rendering methods, the camera is always treated as a single point. However, in reality, the imaging process of a camera or the human eye can be approximated as light passing through a lens with a certain aperture size and projecting onto an image sensor or retina. Based on optical principles, there is only one plane in space that is truly in focus. Points on non-focal planes form circles of confusion on the image sensor or retina. The phenomenon of objects at different distances in a scene appearing with varying degrees of sharpness due to the focusing process is known as depth of field. The depth of field effect plays a crucial role, as it helps to compensate for the spatial information lost in two-dimensional images.
In order to achieve depth of field effects in the rendering process within a virtual environment, this article employs a method that simulates the camera and human eye imaging process. Building upon ray tracing, we establish multiple sampling points on the lens and combine the calculated results to obtain the final pixel color values. We implement real-time rendering using this method. To demonstrate its effectiveness, we conducted three experiments: the first two test and validate the depth of field effects in reflection and refraction scenes, and the final experiment tests the acceleration achieved through parallel computation using four buffers. Since this method aligns well with human visual habits, the resulting rendered images exhibit a high degree of realism and immersion.

Depth of field
Depth of field (DOF) refers to the range of distance within a scene, from the nearest to the farthest objects, that appears acceptably sharp and in focus. In other words, it is the area in front of and behind the subject that is perceived as sharp when viewed or recorded through a camera lens. A shallow depth of field results in a limited area of sharp focus, while a deep depth of field encompasses a broader range of sharpness throughout the scene.

Causes
In the real imaging process, light passes through the lens system and forms an image on the screen. When we focus, there is actually only one plane that is truly in focus, which we call the focal plane. Anything in front of or behind this plane is not in a focused state. On the focal plane, light rays from a point on an object converge to a single point on the image plane, whereas for unfocused objects, light rays from a point fall on different points on the image plane, forming a blurred circle. This circle is called the circle of confusion (CoC).[1]
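As a point of reference, a standard thin-lens approximation (our notation, not taken from the cited work) relates the CoC diameter $c$ to the aperture diameter $A$, the focal length $f$, the focus distance $d_f$, and the object distance $d$:

$$c = A \,\frac{|d - d_f|}{d}\,\frac{f}{d_f - f}$$

The blur diameter therefore grows with the aperture size and with the object's offset from the focal plane, which matches the qualitative behavior discussed in the next subsection.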

Effects
Due to the presence of depth of field, in images generated from the capture of a three-dimensional scene, only one plane is truly sharp, and objects that are farther from or closer to this plane exhibit increasing blurriness as their distance from that plane grows. When the focal plane is fixed and objects move from near to far, the objects undergo a process of transitioning from blurry to sharp and then back to blurry. On the other hand, when the scene objects are fixed and the focal plane moves from near to far (the focusing process), nearby objects gradually become blurry, while distant objects gradually become clear. This imparts additional depth attributes to two-dimensional images and is also an important way for the human eye to perceive the spatial relationships of objects within a scene. In terms of its impact on the degree of blurriness, aperture size is the primary factor. Images captured with a larger aperture exhibit a stronger depth of field effect, meaning that the degree of blurriness increases more rapidly with distance from the focal plane.

Ray tracing
Ray tracing is a computer graphics technique for creating highly realistic images. It works by simulating how light interacts with objects in a 3D scene.

Process
Turner Whitted introduced the concept of recursive ray tracing in 1980 and provided the implementation principles.[2] The process can be summarized as follows (a compact code sketch is given after the list):
1. Ray Generation: We start by casting rays from the viewer's perspective (or the camera) through each pixel on the image plane.
2. Intersection Testing: These rays are traced into the 3D scene to see if they hit any objects. If they do, we record this intersection.
3. Light Interaction: When a ray hits an object, we create secondary rays to simulate how light interacts with the object's surface. For instance, we can simulate reflections and refractions.
4. Recursive Ray Tracing: Secondary rays can generate more rays, creating a chain of interactions, such as reflections within reflections.
5. Shading and Lighting: At each intersection, we calculate how the object interacts with light. This considers factors like material properties, light sources, and the viewer's perspective.
6. Color Accumulation: The calculated colors accumulate along the rays, determining the pixel's color on the image.
7. Repeat for All Pixels: This entire process is repeated for every pixel to create a realistic image.
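The following is a minimal Shadertoy-style GLSL sketch of steps 1-7. The scene (one sphere and a ground plane), the material constants, and the helper names are illustrative only; GLSL has no recursion, so the recursion of step 4 is unrolled into a fixed bounce loop.

```glsl
// Minimal Whitted-style tracer sketch (illustrative scene and constants).
struct Hit { float t; vec3 n; vec3 albedo; float reflectivity; };

// Step 2: intersection test against a unit sphere at the origin and a ground plane.
bool intersectScene(vec3 ro, vec3 rd, out Hit h) {
    h.t = 1e9;
    bool found = false;
    float b = dot(ro, rd), c = dot(ro, ro) - 1.0;
    float disc = b * b - c;
    if (disc > 0.0) {
        float t = -b - sqrt(disc);
        if (t > 1e-3 && t < h.t) {
            h.t = t; h.n = normalize(ro + t * rd);
            h.albedo = vec3(0.9, 0.6, 0.1); h.reflectivity = 0.4;
            found = true;
        }
    }
    float tp = (-1.0 - ro.y) / rd.y;           // ground plane y = -1
    if (tp > 1e-3 && tp < h.t) {
        h.t = tp; h.n = vec3(0.0, 1.0, 0.0);
        h.albedo = vec3(0.5); h.reflectivity = 0.1;
        found = true;
    }
    return found;
}

// Steps 3-6: shade each hit, spawn a reflection ray, and accumulate color.
vec3 trace(vec3 ro, vec3 rd) {
    vec3 col = vec3(0.0);
    vec3 weight = vec3(1.0);
    vec3 lightDir = normalize(vec3(1.0, 1.0, -0.5));
    for (int bounce = 0; bounce < 3; ++bounce) {
        Hit h;
        if (!intersectScene(ro, rd, h)) { col += weight * vec3(0.6, 0.7, 0.9); break; }
        vec3 p = ro + h.t * rd;
        float diff = max(dot(h.n, lightDir), 0.0);            // step 5: Lambert shading
        col += weight * (1.0 - h.reflectivity) * h.albedo * diff;
        weight *= h.reflectivity;                              // step 6: accumulate along the ray
        ro = p + 1e-3 * h.n;                                   // offset to avoid self-intersection
        rd = reflect(rd, h.n);                                 // steps 3-4: secondary reflection ray
    }
    return col;
}

// Steps 1 and 7: one primary ray per pixel, repeated for every pixel.
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
    vec3 ro = vec3(0.0, 0.0, 3.0);
    vec3 rd = normalize(vec3(uv, -2.0));
    fragColor = vec4(trace(ro, rd), 1.0);
}
```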

Features
Ray tracing is known for its ability to produce highly realistic images with accurate lighting, shadows, and reflections. It is used in applications like video games, computer-generated movies, architectural visualization, and scientific simulations. While ray tracing can be computationally intensive and may require powerful hardware, it has become more accessible with the development of specialized graphics cards and software libraries that accelerate the rendering process.
In recent years, an increasing number of ray tracing technologies have been proposed and deployed. NVIDIA added dedicated RT cores to their graphics cards to specifically optimize ray tracing rendering performance, gaining a lot of attention and widespread adoption.[3] Nah et al. also introduced RayCore, a mobile ray tracing hardware architecture capable of achieving real-time Whitted ray tracing on mobile devices.[4] These technologies have greatly leveraged the advantages of ray tracing and have made more stunning and widely applicable real-time ray tracing possible.

Related research
Barsky and Kosloff published a research paper surveying depth of field approaches in computer graphics, including Kraus's algorithm, which achieves significantly improved image quality by employing GPU-based pyramid methods for image blurring and pixel disocclusion.[1,5] Lee later presented a real-time GPU-based post-filtering method for rendering acceptable depth-of-field effects suited for virtual reality, as well as a method that approximates different views by relying on a layered image-based scene representation.[6,7] Yu started with a single rasterized view of the scene and sampled the light field by warping the reference view to nearby views.[8] In recent years, there have also been developments and achievements in rendering depth of field effects. Selgrad proposed an algorithm that renders the scene from a single camera position and computes a layered image in a single pass by constructing per-pixel lists.[9] Zhang proposed a new filtering approach that takes approximated occluded pixels into account to synthesize DOF effects for images.[10] Franke presented a scattering-based method that supports settings with partial occlusion by relying on multiple layers of scene data.[11] Xu presented a new post-processing method that adaptively smooths the image frame with local depth and circle of confusion information based on a recursive filtering process.[12] Combining NVIDIA's summary of depth-of-field rendering techniques in GPU Gems, we can roughly classify these methods into five categories.[13] Among them, distributed ray tracing achieves the best depth-of-field rendering results.[14] However, it comes with significant computational overhead compared to other methods, and there has been relatively little research and expansion in this area. Now, with significant advancements in graphics rendering hardware, we aim to implement depth of field effects on the foundation of ray tracing and propose innovative ideas to achieve improved rendering quality, heightened realism, and enhanced performance.

Our Method: Ray tracing & Multiple sampling
Generally, this approach combines ray tracing and multiple sampling to achieve depth of field effects.
It is based on a straightforward and reliable ray tracing renderer, where the camera position is sampled within the aperture range on the lens and the direction of each ray is calibrated accordingly. The core idea of this method is to use a simple model to simulate the imaging process of the human eye as closely as possible.
To implement this method in code, this paper uses the WebGL tools provided by the Shadertoy platform and extends a previously written, working ray tracing shader. It simulates multiple sampled renders on the lens and synthesizes the final image. Moreover, due to the simplicity of the scene, this method allows for real-time rendering. Through this simulation, we validate the effectiveness and feasibility of achieving depth of field effects in ray tracing as we envisioned.
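A minimal sketch of this lens sampling, building on the trace() routine sketched earlier, is shown below. The sample count, aperture radius, focus distance, and hash function are illustrative values of ours, not the constants used in the published Shadertoy code.

```glsl
// Depth-of-field ray generation by sampling points on the lens aperture (sketch).
const int   SAMPLES    = 32;    // lens samples per pixel (illustrative)
const float APERTURE   = 0.15;  // lens radius in scene units (illustrative)
const float FOCUS_DIST = 5.0;   // distance from the lens to the focal plane (illustrative)

// cheap 2D hash used to place sample points on the lens disc
vec2 hash2(float n) {
    return fract(sin(vec2(n, n + 1.7)) * vec2(43758.5453, 22578.1459));
}

vec3 renderPixel(vec2 uv) {
    vec3 ro = vec3(0.0, 0.0, 3.0);              // lens center
    vec3 rd = normalize(vec3(uv, -2.0));        // pinhole direction for this pixel
    vec3 focalPoint = ro + FOCUS_DIST * rd;     // every lens sample is aimed at this point

    vec3 col = vec3(0.0);
    for (int i = 0; i < SAMPLES; ++i) {
        // place a sample on the aperture disc via a polar mapping
        vec2 h = hash2(float(i));
        float r   = APERTURE * sqrt(h.x);
        float phi = 6.2831853 * h.y;
        vec3 lensPos = ro + vec3(r * cos(phi), r * sin(phi), 0.0);

        // re-aim the ray at the focal point, so only geometry on the
        // focal plane stays perfectly sharp; everything else blurs
        vec3 sampleDir = normalize(focalPoint - lensPos);
        col += trace(lensPos, sampleDir);       // trace() as in the earlier sketch
    }
    return col / float(SAMPLES);
}
```

Averaging the traced colors over the lens samples is what produces the circle-of-confusion blur for points off the focal plane.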

Stage 1: Reflection
Stage 1 work
In Stage 1, we employed uniform 32-point sampling at the circular edge of the lens. In the scene, we set up three spheres and two mirror planes. Among the three spheres, one is a golden metal sphere, one is a red plastic sphere, and the third is a green plastic sphere that revolves around the golden sphere. Then, at a distance further away from these three positions, we placed a mirrored plane capable of reflecting rays.
In rendering based on the pinhole camera model, the virtual images formed by objects in the mirror are often incorrectly assumed to be located at the same depth as the mirror itself. In other words, when we focus on the virtual images in a mirror, the depth information is always considered to be at the position of the mirror in 3D space, leading to erroneous depth of field calculations. Consequently, different virtual images at varying distances from the mirror are not properly distinguished, causing them to appear undifferentiated at the same mirror position. However, our method aims to address this issue.
To better verify that virtual images formed by mirror reflections adhere to real-world principles, we allowed the focal plane to move between the nearest object (the physical golden sphere) and the farthest object (the virtual image of the golden sphere in the mirror). By moving the mouse, we could change the position of the focal plane, bringing it closer to or further from the viewpoint.
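In the sketch above, the constant FOCUS_DIST would be replaced by a mouse-driven value; one plausible Shadertoy-style mapping (the distance bounds are hypothetical, not the values in the published shader) is:

```glsl
// Drive the focal-plane distance with the mouse (illustrative bounds).
float nearFocus = 3.0;                                   // hypothetical distance to the gold sphere
float farFocus  = 12.0;                                  // hypothetical distance to its mirror image
float t = clamp(iMouse.x / iResolution.x, 0.0, 1.0);     // horizontal mouse position in [0, 1]
float focusDist = mix(nearFocus, farFocus, t);           // focal plane sweeps near -> far
```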

Stage 1 results
We have shared the source code on Shadertoy. Anyone can browse the source code and run it on their own device, rendering the resulting image through the following link: Reflection. With real-time rendering capabilities, we observed that as the focal plane moves, the scene undergoes a corresponding focusing process, resulting in a noticeable depth of field effect.
Through experimental results, we verified that rendering with sampling on the lens can achieve accurate depth of field effects. It is worth noting that using traditional post-processing methods would not be able to correctly render virtual images formed by mirrors.

Stage 2: Refraction
Stage 2 work
In the scene set up in Stage 1, we added a glass material sphere to verify whether the depth of field effect would still appear correctly with refraction included. This glass sphere has a refractive index of 0.8, and its position can be changed by the mouse.
In this experiment, we kept the focal plane fixed between the golden and red spheres. This means that when observing the closer golden sphere through the glass sphere, the blurriness of the depth of field effect should be reduced. On the other hand, when observing the farther red sphere, the blurriness of the depth of field effect should be enhanced.
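Inside the bounce loop of the tracer sketched earlier, a hit on the glass sphere would continue the ray through the surface instead of reflecting it. A minimal sketch is given below; it assumes the 0.8 stated above is interpreted as the eta ratio (incident over transmitted index) that GLSL's built-in refract() expects.

```glsl
// Refractive bounce for the glass sphere (sketch).
// eta is the ratio n_incident / n_transmitted expected by GLSL refract().
vec3 refractRay(vec3 rd, vec3 n, float eta) {
    vec3 t = refract(rd, n, eta);    // refract() returns vec3(0.0) on total internal reflection
    if (dot(t, t) == 0.0) {
        return reflect(rd, n);       // fall back to a mirror bounce when TIR occurs
    }
    return normalize(t);
}

// Inside the bounce loop, a glass hit would then continue as:
//   rd = refractRay(rd, h.n, 0.8);  // 0.8 as stated for the glass sphere in this scene
//   ro = p - 1e-3 * h.n;            // step slightly through the surface to avoid self-hits
```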

Stage 2 results
We have shared the source code on Shadertoy. Anyone can browse the source code and run it on their own device, rendering the resulting image through the following link: Refraction. We can observe that the glass sphere exhibits the characteristics of a convex lens. Through the glass sphere, the golden sphere appears very clear, while the red sphere becomes more blurred compared to direct observation.
The experimental results further validate that rendering with lens-based sampling still exhibits correct depth of field effects in both reflections and refractions. It is worth noting that, with traditional post-processing methods, objects made of the same transparent material would not show different depth of field effects.

Stage 3: Parallel Computing
Stage 3 work
Obviously, the scheme of sampling only at the circular edge of the lens is not entirely reasonable, because the color of each pixel should be influenced by all the light rays passing through the circular area determined by the aperture size. The reason for not adopting area sampling in the first two stages is that 32 sampling points would be far from sufficient for it and would result in very poor image quality, while increasing the number of sampling points would incur significant computational overhead in the ray tracing rendering process.
To achieve the goal of having more sampling points, we considered parallel computing. We set up four buffers, each generating the same number of random sampling points and performing rendering calculations synchronously. Ultimately, we summed the results to obtain an image with four times the sampling points of a single buffer. Additionally, we controlled the random sampling point generation process so that the positions of the sampling points differ within the same frame while remaining stable across different frames to reduce flickering, resulting in smooth real-time rendering.
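One way to realize "different within a frame, stable across frames" is to seed the sample hash with the pixel position, the sample index, and a per-buffer offset, but not with the frame number. The sketch below illustrates the idea; the BUFFER_ID constant, sample count, and seeding scheme are our assumptions rather than the published code.

```glsl
// Buffers A-D each define a different BUFFER_ID (0..3) and run this same code,
// so together they contribute four times the samples of a single buffer.
const int BUFFER_ID          = 0;    // 0 for Buffer A, 1 for B, 2 for C, 3 for D
const int SAMPLES_PER_BUFFER = 24;   // e.g. 4 buffers x 24 samples = 96 samples per pixel

// Seeded only by pixel position, sample index, and buffer id -- not by iFrame --
// so sample positions differ within a frame but stay fixed between frames (no flicker).
vec2 lensSample(vec2 fragCoord, int i) {
    float seed = dot(fragCoord, vec2(12.9898, 78.233))
               + 57.0 * float(i)
               + 1013.0 * float(BUFFER_ID);
    float r   = sqrt(fract(sin(seed) * 43758.5453));
    float phi = 6.2831853 * fract(sin(seed + 1.0) * 22578.1459);
    return r * vec2(cos(phi), sin(phi));   // point on the unit disc; scale by the aperture radius
}
```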
(a) Focus on the gold ball (b) Focus on the red plastic ball

Stage 3 results
We have shared the source code on Shadertoy. Anyone can browse the source code and run it on their own device, rendering the resulting image through the following link: Parallel computing.

Part 1:
We still use the scene from the first stage and retain the feature of controlling the focal plane position with the mouse. In the end, we obtained an image rendered and composited from 96 random sampling points. The depth of field effect can be observed to render correctly. However, due to the still limited number of sampling points and the substantial overhead from repeated calculations, only lower-quality and less responsive real-time rendering can be achieved.
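The final composition can then be a trivial pass over the four buffers. In Shadertoy terms, assuming Buffer A-D are bound to iChannel0-iChannel3 and each stores the mean color of its own lens samples (our assumption about the setup), the Image pass simply averages them:

```glsl
// Final Image pass: combine the partial results of the four parallel buffers.
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 uv = fragCoord / iResolution.xy;
    vec3 col = texture(iChannel0, uv).rgb
             + texture(iChannel1, uv).rgb
             + texture(iChannel2, uv).rgb
             + texture(iChannel3, uv).rgb;
    fragColor = vec4(col * 0.25, 1.0);   // average of the four per-buffer means
}
```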

Part 2:
To verify the acceleration achieved through parallel computing and analyze the benefits it brings, we conducted a comparative experiment. We collected frame rates for both direct rendering without buffer acceleration and rendering with parallel computing acceleration using four buffers, at various sample point counts. The results obtained are shown in the graph below. From the graph, we can observe that when the number of sample points is relatively low, the parallel computing acceleration with multiple buffers actually leads to a decrease in the real-time rendering frame rate. This is because, when using multiple buffers for computation, the CPU has to allocate computational resources, which, compared to rendering with a single buffer, results in a higher time cost for processing a single frame. However, as the number of sample points increases, the advantages of parallel computing acceleration with multiple buffers gradually become evident.
In this experiment, when the number of sample points is approximately 160 or more, rendering with four buffers shows a higher frame rate. Furthermore, as the number of sample points increases, the acceleration ratio also gradually improves, demonstrating a strong positive correlation between the two. Combined with our previous experimental results, it is evident that a higher number of sample points leads to a more realistic and detailed rendering effect. This indicates that the more one desires a high-quality rendering outcome, the greater the benefit our method derives from parallel computing acceleration.
Hence, we can conclude that using our method to achieve depth of field effects in ray tracing yields significant benefits from parallel acceleration computations. We believe this to be an excellent feature in distributed ray tracing, with substantial gains from parallel acceleration. Therefore, as hardware capabilities continue to support and optimize parallel computing, our method will exhibit excellent rendering efficiency and acceleration effects.

Conclusion
This method focuses on validating the feasibility and effectiveness of implementing depth of field effects through distributed ray tracing on non-pinhole lenses. Based on experiments conducted in three stages, we have devised a scheme for real-time rendering and arrived at preliminary conclusions regarding this method.
By considering that rays always pass through the aperture-sized lens, the color of each pixel is computed through shading calculations based on rays sampled at different points on the lens. This imaging approach better simulates the physical laws present in the real world. Additionally, it aligns more closely with the patterns observed by the human eye, resulting in a final rendered image that appears more realistic and immersive due to the accurate depth of field effect.
Furthermore, through comparison with other methods, we have found that this approach consistently produces correct results in situations involving reflections, refractions, and other optical phenomena. It boasts characteristics such as simplicity, strong interpretability, and adherence to physical principles, making it easy to extend and expand upon.

Limitations
This method produces highly realistic depth of field effects. However, the quality of the depth of field effect is strongly correlated with the number of sampling points, which inevitably leads to significant computational overhead. Furthermore, this problem exhibits diminishing returns: visual quality improves ever more slowly as sampling points are added, so each incremental improvement in quality requires a rapidly increasing computational cost.
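This behavior matches the usual Monte Carlo error bound (our framing, not a result derived in this paper): with $N$ lens samples per pixel, the standard error of the estimated pixel color falls only as

$$\sigma_N \approx \frac{\sigma}{\sqrt{N}},$$

where $\sigma$ is the per-sample standard deviation, so halving the remaining noise requires roughly four times as many samples.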
Furthermore, this method is a simplified simulation of human eye imaging, and many details have not been fully taken into account. For example, the path of light rays in the human eye is not entirely equivalent to passing through an ideal lens, different wavelengths of light separate when refracted (dispersion), and binocular vision also affects depth perception, among other factors.

Future work
This rendering technology opens up many development possibilities for current imaging solutions. With the rapid advancement of hardware performance, including GPUs, highly simulation-based, realistic graphics rendering solutions become feasible. Breaking free from the traditional methods of three-dimensional scene imaging based on the inherent single-point camera model, we can discover imaging approaches that better align with real visual perception. Leveraging the convenience and strong optical properties offered by ray tracing technology, we can apply more scientifically rigorous, physics-based rendering to the field of computer graphics. For instance, Schedl presented a generalized depth-of-field light-field rendering method.[15] In this method, plenoptic cameras together with advanced light-field rendering enable depth-of-field effects that go far beyond the capabilities of conventional imaging. Furthermore, more advanced optical models and refined rendering theories are the primary goals that our method aims to incorporate.
In the future, integrating this rendering technology with mixed-reality devices could offer incredibly realistic and immersive visual experiences while reducing motion sickness. Carnegie pointed out that a conflict between accommodation and vergence depth cues on stereoscopic displays is a significant cause of visual discomfort.[16] Their article describes the results of an evaluation used to judge the effectiveness of dynamic depth-of-field blur in reducing the discomfort caused by exposure to stereoscopic content on HMDs.
Meanwhile, display device technology is also advancing rapidly. In some of the latest research findings, display devices that are radically different from traditional displays have been designed and tested. Applying this rendering technology to display devices with spatial depth effects, such as the Split-Lohmann multifocal display designed by Qin,[17] would ensure more accurate optical properties, thereby reducing visual inconsistencies (such as reflection and refraction in transparent materials).
We believe that, with the development and integration of rendering and other technologies, the arrival of visual experiences that are indistinguishable from reality is inevitable.

Figure 3. Objects in front of the mirror

Figure 4. Virtual image reflected by a specular surface

Figure 5. Add a glass ball & keep the focus point between the red sphere and the gold sphere.
(a) Observe the gold sphere (b) Observe the red plastic sphere

Figure 6. Fix the focus point & observe objects through the glass ball.

Figure 8. Experimental results data for computation acceleration with multiple buffers.