Semi-reflective visor-based compact head-worn light field display

We propose a visor-based see-through head-worn light field display. The visor is a semi-reflective concave image combiner that overlays virtual imagery onto the user's visual field and has a toroidal surface profile for off-axis astigmatism correction. Virtual images are created at different depths using a classical light field assembly (LFA), which comprises a microlens array and a display source. The LFA is placed at an angle above the visor, clear of the user's line of sight, with its image plane near the focal plane of the visor. Since the LFA forms virtual images farther away than the plane of the display source, the LFA itself can be brought close to the visor, allowing for a more compact display system compared to conventional head-worn displays.


Introduction
Amongst the diverse optical designs for head-worn displays (HWDs), designs that incorporate semi-transparent reflectors as the combiner are among the earliest types of head-mounted displays [1][2][3]. These designs are sometimes referred to as bug-eye systems, due to the resemblance of the globular combiner shape to insect eyes. Despite advances in the design of HWDs that use diffractive, holographic, microstructured, and/or freeform optics [4][5][6][7][8][9], reflector-based designs have proliferated thanks in part to the relative simplicity of constructing a combiner out of a semi-reflective surface. This approach also allows for a large field of view (FOV) and a large eyebox, while accommodating glasses-wearers.
As such, they are found in a wide variety of applications, from military applications such as pilot helmets [10, 11] to consumer applications such as augmented reality headsets [12][13][14]. Some recent developments in reflector-based designs make use of deformable reflectors [15, 16] or foveated imaging techniques [17]. One considerable drawback, however, is that reflector-based systems tend to be bulky, which gives them poor aesthetics. Another set of recent works describes smaller form-factor systems using folded optics designs [18, 19], albeit at the cost of the FOV.
This work demonstrates a new configuration of head-worn light field displays (LFDs) that uses a visor-type image combiner. In general, LFDs are displays capable of generating images at varying depths from the eye by emulating the light field emanating from a volumetric scene. The advantages are largely two-fold. First, the virtual image-forming light field assembly (LFA) can be placed much closer to the combiner than the intrinsic focal plane of the image combiner. This reduces the headset size and allows the device to sit closer to the face than conventional designs, which are essential characteristics for building a smaller form-factor device compared to existing reflector-based HWDs [12][13][14][15][16][17]. Second, the pitches of the elemental image and the microlens array (MLA) are equal in our LFA, which makes our LF-HWD a telecentric system, i.e. the FOV is constant over the depth range with no vignetting at the peripheries of the virtual image. This allows us to make the LFA behave like a pupil expander that produces divergent beams, with a minimal amount of pixel redundancy between elemental images. (An elemental image is a section of a scene containing the light field corresponding to that section being viewed from a certain perspective and position.) These characteristics in turn enlarge the eyebox and increase the effective resolution of the virtual light field image. This is an advantage over traditional LFDs that do not use a combiner [20] and previous reflector-based integral imaging displays [21], which must deal with a trade-off between the eyebox size and the effective resolution. As well, using a semi-transparent combiner does not deteriorate the see-through imagery, unlike in direct-view type LFDs [22, 23] where the display and microstructure arrays are placed in the user's line of sight.
In the following sections, we describe in detail the imaging principle of the proposed display system and simulation results, as well as the implementation of the proposed system through a prototype and the characterization of that prototype.

Theory
The proposed display system is composed mainly of two optical components, the concave reflector and the LFA, which is an assembly of an MLA and a display panel. Figure 1 depicts the side view of the system. The concave reflector is tilted upwards at an angle θ rt in the yz-plane with respect to the line of sight of the user, and the LFA plane is tilted by 2θ rt . Figure 2 shows the LFA and the ray propagation through it in closer detail. The LFA is in a Galilean configuration [8, 24], which forms a virtual image on the opposite side of the MLA, rather than a real image between the reflector and the LFA. The virtual image plane of the LFA is placed at the focal plane of the reflector such that it introduces telecentricity. The reflector forms the final virtual image that is perceived by the user. In figure 2, the three green squares represent light-emitting pixels on the display panel, each in one of the three neighboring elemental images, that collectively form a single point image in the virtual image plane of the LFA. (In the figure, w e = p e ; the display panel is assumed infinitesimally thin, and the gaps between the three beam paths are assumed negligible with a 100% fill-factor MLA.) The point image is formed along the refracted rays, emanating from these pixels, that pass through the microlenses. The angle of incidence θ i of the chief ray shown in figure 2 is related to the MLA thickness d (assuming a negligible microlens sag) and the shift distance between adjacent elemental images w shift (in number of pixels):

tan θ i = N w shift p p / d, (1)

where p p is the pixel pitch and N is an integer denoting the Nth position of the microlens and elemental image pair from the center axis of the display (the on-axis position being the 0th position of the array). This yields the angle of refraction of the chief ray,

sin θ r = n sin θ i ≈ n θ i , (2)

where n is the refractive index of the MLA. Equations (1) and (2) hold under a paraxial condition in which p p w shift /d is small. Under the same paraxial condition, the LFA virtual image distance v (as shown in figure 2), measured from the display plane (assuming negligible display thickness), also depends on the microlens pitch p m :

v ≈ d p m / (n w shift p p ). (3)

Evidently from equation (3), we can create multiple image planes, as v varies with the change in w shift . In order for the reflector to form a virtual image over the entire range of v produced by the LFA, the plane of the farthest possible v (using the lowest possible w shift ) needs to be placed at the focal plane of the reflector. As a result, for the plane of the farthest v, the final virtual image formed by the reflector is at optical infinity; increasing w shift brings the virtual image closer to the user. Because the reflector is tilted with respect to the LFA, a significant amount of oblique astigmatism is introduced. As such, we use a simple toroidal reflector, having different spherical radii r s and r t in the sagittal and tangential planes, respectively, to correct for the off-axis reflection [25]. Such a reflector can readily be fabricated in-house using a 3D printing-based process (section 3.1). It is worth noting that the telecentric configuration of the system allows us to set p m equal to the elemental image pitch p e (or, equivalently in number of pixels, w e = p e ), unlike in direct-view and reflector-based LFDs [20, 21] where these quantities are typically dissimilar.
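As a quick numerical check, the paraxial relation of equation (3) can be evaluated for representative values. The numbers below are illustrative assumptions (a 1 mm thick PDMS-like MLA, 36 µm pixels, 612 µm lens pitch), not the exact design prescription:

```python
def lfa_virtual_image_distance(d_um, n, p_p_um, p_m_um, w_shift):
    """Paraxial LFA virtual image distance v measured from the display
    plane, per equation (3): v ~ d * p_m / (n * w_shift * p_p).
    All distances in micrometres."""
    return d_um * p_m_um / (n * w_shift * p_p_um)

# Larger w_shift pulls the LFA virtual image plane closer to the display,
# which (after reflection) brings the final virtual image nearer the user.
for w_shift in (3, 5):
    v = lfa_virtual_image_distance(d_um=1000, n=1.41, p_p_um=36,
                                   p_m_um=612, w_shift=w_shift)
    print(f"w_shift = {w_shift}: v ~ {v / 1000:.2f} mm")
```

Note how v scales inversely with w shift, which is the mechanism behind the multiple image planes discussed above.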
Setting p m = p e in this way enforces a coaxial and identical optical relationship between each microlens and its corresponding elemental image over the entire LFA, which helps to maintain a uniform FOV over the virtual image depth field, as well as uniform imaging conditions, such as brightness, across the virtual image. Figure 3 shows the ray propagation and the key parameters for estimating the horizontal and vertical FOV. In the paraxial limit, the horizontal FOV can be estimated as

ϕ h ≈ 2 arctan( l / (2 f s ) ), (4)

where the effective sagittal and tangential focal lengths of the reflector tilted by θ rt are

f s = r s / (2 cos θ rt ), (5)

f t = ( r t cos θ rt ) / 2. (6)

In equations (4)-(6), l is the width of the light field image on the display; h below denotes the corresponding height. The vertical FOV is approximated as

ϕ v ≈ 2 arctan( h / (2 f t ) ). (7)
Note that we assume both tangential and sagittal contours of the reflector are spherical. As well, we assume the convergence point F t (figure 3, right) is on-axis for convenience. The effective pixel resolution ER of the virtual image (also of the original image before light field conversion) is, in number of pixels,

ER = N max · w shift , (9)

where N max is the total number of elemental images in 1D, which is also equal to the display pixel resolution divided by w e . It should be emphasized that ER is a function of w shift , which suggests that the apparent size of the virtual image will differ at different depths. In other words, when the original image of a fixed pixel resolution (fixed size) is sampled with different w shift values for creating a virtual image at different depths, N max changes (increasing w shift decreases N max ). Since N max is directly proportional to the LFA width and height l and h, the virtual image size in terms of the FOV will decrease when w shift increases (virtual image at a nearer distance), and vice versa.
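The resolution bookkeeping above can be sketched numerically. The form ER = N max · w shift is paraphrased from the surrounding text, so treat it as an assumption; the 1440 px display and 17 px elemental image pitch are the prototype-scale values used for illustration:

```python
def effective_resolution(display_px, w_e, w_shift):
    """Effective 1D pixel resolution of the virtual image,
    assuming ER = N_max * w_shift with N_max = display_px // w_e
    (N_max = total number of elemental images across one dimension)."""
    n_max = display_px // w_e
    return n_max * w_shift

# A deeper (smaller w_shift) plane trades effective resolution for depth:
print(effective_resolution(1440, 17, 3))  # far plane
print(effective_resolution(1440, 17, 5))  # near plane
```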
The size of the eyebox,

w eyebox ≈ floor( w e / w shift ) × p m × f t / (d + v), (10)

assumes a 100% microlens fill-factor. Contrary to traditional MLA-based LF-HWDs such as [18] and [20], whose eyebox size is close to floor( w e / w shift ) × p m , the use of the reflector introduces the multiplier term f t /(d + v) to the eyebox size equation. A tangible interpretation of this term is that the width of the beam departing the MLA (figure 2) will have expanded by the multiplier term by the time it reaches the reflector, since it is diverging. With f t /(d + v) > 1, we can use a larger value for w shift , which increases the effective resolution according to equation (9). Note that the dependence of the eyebox size on floor( w e / w shift ) suggests that virtual images at different depths will have different eyebox sizes. Also, it should be noted that the ray convergence points F s and F t (figure 3) in the sagittal and tangential planes, respectively, may not be identical, depending on the curvature of the reflector in the respective planes.
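The eyebox estimate can be assembled from the terms quoted above; the sample values here (a 40 mm tangential focal length and a ~5 mm MLA-to-virtual-image separation) are illustrative assumptions rather than the prototype's measured quantities:

```python
import math

def eyebox_width_mm(w_e, w_shift, p_m_mm, f_t_mm, d_mm, v_mm):
    """Approximate eyebox width: floor(w_e / w_shift) * p_m, scaled by
    the beam-expansion factor f_t / (d + v) contributed by the reflector
    (form assembled from the text's description). Inputs in mm."""
    return math.floor(w_e / w_shift) * p_m_mm * f_t_mm / (d_mm + v_mm)

# With f_t / (d + v) well above 1, a larger w_shift remains affordable:
print(round(eyebox_width_mm(w_e=17, w_shift=3, p_m_mm=0.612,
                            f_t_mm=40.0, d_mm=1.0, v_mm=4.0), 2))
```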

Simulation
We use the raytracing software Zemax OpticStudio to validate the imaging theory outlined in section 2.1. Figure 4 shows the simulation setup in the non-sequential mode of OpticStudio. The setup consists of an LFA, a toroidal reflector, and an ideal eye to bring the reflector image into focus. Table 1 lists the parameters used in the simulation. Note that some parameters are fixed quantities; others are chosen with some freedom but are influenced by the fixed parameters, MLA fabrication constraints, and consideration of the eyebox size. For example, p m is chosen as such because it must be an integer multiple of p p and is also close to the maximum microlens base diameter we can reliably fabricate. A larger microlens is desired as it increases the eyebox size, which also varies with the value of w shift as per equation (10) and with the distance from the eye. The distance between the LFA and the reflector is mostly constrained by fabrication limitations. Ideally, the LFA should be very close to the visor for a couple of reasons. First, this reduces the form factor as much as possible. Second, it allows for the separation of the LFA virtual image plane from the physical LFA plane, to prevent the physical structure of the LFA from being seen when the virtual image is in focus. However, this requires microlenses with a longer focal length (larger ROC), and microlenses with too shallow a sag are not commercially available and cannot be made using our current fabrication process (explained further in section 3.2). As a result, the maximum LFA virtual image distance v from the display plane we can achieve is limited to only ∼6 mm, which also limits the size reduction of the prototype. Figure 5 shows the test image used in the simulation. The test image consists of two adjacent sub-images, each of which has been converted into a light field pattern from the test target image on the left side of figure 5, using w shift = 3 and 5, respectively.
This in turn forms virtual images at different distances from the eye; OpticStudio estimates that the larger left sub-target forms a virtual image at infinity, and the smaller right sub-target at ∼290 mm from the eye pupil. Figure 6 shows the simulated retinal images on the ideal eye, with the eye focused at near and far-field virtual images.
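The depth behavior follows the ordinary concave-mirror relation: an object plane at the focal plane maps to infinity, and an object plane pulled inside the focal length maps to a finite virtual image distance. A minimal sketch, using an assumed 25 mm reflector focal length (not the prototype's actual prescription):

```python
def final_image_distance_mm(f_mm, u_mm):
    """Magnitude of the virtual image distance behind a concave mirror
    for an object inside the focal length: |s'| = f*u / (f - u).
    u = f gives an image at optical infinity."""
    if abs(f_mm - u_mm) < 1e-12:
        return float("inf")
    return f_mm * u_mm / (f_mm - u_mm)

f = 25.0  # assumed reflector focal length, mm
print(final_image_distance_mm(f, 25.0))  # LFA image at focal plane -> inf
print(final_image_distance_mm(f, 23.4))  # LFA image ~1.6 mm inside f
```

The second case lands a few hundred millimetres from the mirror, the same order of magnitude as the ∼290 mm near plane reported by the simulation.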

Fabrication of the toroidal reflector
A toroidal reflector suited to our design parameters does not appear to be commercially available, so we fabricate our own by thermoforming a polyethylene terephthalate glycol (PETG) sheet over a mold. A similar approach is described in [17]; unlike [17], however, where the thermoformed plastic is temporarily used as an envelope to help an optical glue cure and smooth the surface, after which it is discarded, we use the thermoformed PETG sheet itself as the reflector. The mold is printed with an Anycubic Photon stereolithography printer, using their clear resin, and is designed such that the outer surface of the thermoformed reflector touches the mold. The surface of the mold is manually polished using 600-grit wet sandpaper, to prevent the thermoformed reflector from taking on the roughness of the 3D-printed mold surface. However, in our experiments, manual polishing (even with higher-grit sandpapers) does not provide adequate smoothness. As well, excessive polishing can distort the curvature of the mold and leave scratches on the surface. Therefore, after a quick polish, we lightly coat the surface of the mold with an epoxy resin (Max Clear resin from Polymer Composites Inc.). This seals the surface irregularities, resulting in a smoother surface due to surface tension. The curvature of the mold is compensated to account for the thickness of the plastic sheet and the resin coating, which combined are about 1 mm. The PETG sheet is then vacuum thermoformed over the mold using a generic thermoformer with a 600 W heating coil. After thermoforming, the inner surface (concave side) of the thermoformed reflector is sputter coated with a 10 nm layer of silver using an Angstrom Engineering PVD sputter coater. The resulting reflectance is about 50%, measured using an Avantes AvaSpec-2048L spectrometer. Figure 7 shows the mold after resin coating and the thermoformed reflector after silver deposition.

Fabrication of the MLA
The MLA is double-cast from a microfabricated MLA mold on a silicon wafer. Figure 8 shows the final double-cast MLA on a 51 mm × 76 mm glass slide with a 1 mm thickness. For fabrication of the MLA mold, we follow an identical photolithography and reflow process as shown in [26]. First, hexamethyldisilazane is vapor-deposited on a 4 inch silicon wafer to promote adhesion between the wafer and the photoresist. Next, a ∼35 µm thick photoresist film (AZ 4620 positive photoresist) is spin-coated using a two-step coating procedure (first layer spin-coated at 730 rpm and second layer at 3000 rpm using a Headway PWM32 spinner), and baked on a hot plate at 100 °C. Then, the photoresist is exposed through a photomask and developed, which results in an array of cylindrical photoresist islands. Finally, the wafer is heated to 145 °C to reflow the photoresist islands, at which point they become spherical caps (or lenses) due to the surface tension of the molten photoresist while maintaining their footprint. If the height of the spherical cap (sag height) is very small compared to the lens diameter, the reflow requires a substantial amount of lateral flow of highly viscous photoresist to reach a spherical cap, which can take a long time and carries the risk of the photoresist being damaged by long exposure to the high reflow temperature. To reduce this risk, conditions on the reflow, e.g. the aspect ratio between the sag height and the lens footprint, need to be met. Chapter 4 in [27] includes an in-depth study of considerations for the reflow process.

[Figure 7 caption: Epoxy-coated thermoforming mold (left) and the thermoformed, silver-deposited reflector. The excess resin did not drain completely and beaded partially at the edge of the mold (red circles); this is not critical to the functioning of the reflector. The reflector is semi-transparent, as the text underneath is visible, and is about 50 mm in diameter across the short side (sagittal plane) and 55 mm across the long side (tangential plane).]
The photoresist MLA has a 612 µm pitch (equal to w e × p p ) with individual microlenses having a ∼580 µm base diameter representing a fill-factor of ∼71%, and a 714 ± 25 µm ROC on average (the target ROC is 721.5 µm). The ROC was measured by scanning the surface profile of the microlenses at the four corners of the array using a Dektak XT mechanical profilometer, then fitting the profilometer data with a spherical contour using the Pratt method [28,29]. The variation in microlens ROC across the array is most likely due to the unevenness in photoresist film thickness after spin coating, which may vary by ±5% for films >8 µm in thickness, according to the manufacturer (Microchemicals GmbH). Figure 9 shows a couple of the scanned microlens profiles with spherical contours fitted to the data. Since the original photoresist MLA on silicon wafer is not visually clear, we use a double-casting process to make a clear replica. We first cast a negative mold of the photoresist MLA using the Max Cast flexible epoxy resin from Polymer Composites Inc. at room temperature for 24 h. Once the negative mold is cured, we cast with PDMS to get the final clear MLA. The PDMS is cured at room temperature for a minimum of 24 h.
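The ROC extraction from profilometer traces can be sketched as follows; here a simple Kåsa algebraic circle fit stands in for the Pratt fit cited above (both fit a circle to scattered profile points, differing mainly in their normalization), checked on synthetic data at the target 721.5 µm ROC:

```python
import numpy as np

def fit_circle(x, z):
    """Least-squares algebraic circle fit (Kasa fit, a stand-in for the
    Pratt method) to a 2D lens profile. x, z are 1D arrays of
    profilometer samples; returns center (xc, zc) and radius R."""
    # Linearize (x-xc)^2 + (z-zc)^2 = R^2 into A @ [xc, zc, c] = b
    A = np.column_stack([2 * x, 2 * z, np.ones_like(x)])
    b = x**2 + z**2
    (xc, zc, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    R = np.sqrt(c + xc**2 + zc**2)
    return xc, zc, R

# Synthetic check: an arc of a circle with R = 721.5 um centered at (0, -700)
theta = np.linspace(1.2, 1.9, 50)
x = 721.5 * np.cos(theta)
z = 721.5 * np.sin(theta) - 700.0
xc, zc, R = fit_circle(x, z)
print(round(R, 1))
```

On real, noisy traces the Pratt normalization is less biased for small arcs, which is presumably why it was chosen in the paper.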

Prototype
We build a prototype using the fabricated reflector and MLA. Note that this is a monocular system meant to be viewed with one eye. The majority of the other prototype components, such as the display frame, are 3D printed with a fused deposition modelling (FDM) printer. The MLA is mounted on an XYZ micro-positioning stage (Thorlabs DT12XYZ) for alignment of the MLA with the display; the mount also allows for some rotational freedom in the plane of the MLA. A Sharp LS029B3SX02 LCD panel with a pixel resolution of 1440 × 1440 pixels, a 36 µm pixel pitch, and a brightness of 300 cd m−2 is used as the display source. The 3D-printed display frame rests on a set of threaded nuts, one on each side of the frame, that ride a pair of bolts with a 10-32 thread. Turning the nuts adjusts the distance between the display and the reflector, which is mounted at the ends of said bolts. The completed prototype assembly is suspended from a retort stand, as shown in figure 10.

Image assessment
We evaluate the optical quality of the toroidal reflector by measuring the modulation transfer function (MTF) from the virtual images of line pair (lp) test targets formed by the toroidal reflector without the LFA in place (figure 11). Note that for this measurement we place the display source at the focal plane of the reflector. We use a Nikon D60 digital camera with a 35 mm lens to capture the virtual images of the test targets in a dark room. The distance between the display and the reflector is adjusted such that the virtual images are formed as close to optical infinity as possible, with the camera also focused at infinity and pointed at the center of the reflector. Figure 12 shows the comparison between the measured MTF and the OpticStudio estimate using a 4 mm pupil on the ideal eye, in both the tangential and sagittal planes. For the MTF, we measure the relative maximum/minimum luminance per pixel row/column (depending on the direction of the lps), then calculate the contrast and average it over the area of the lp targets. Note that the lp targets span about 3° × 3° to 6° × 6°, depending on the spatial frequency; the measured MTF therefore represents an averaged value within these angular ranges. The OpticStudio MTF estimate represents an upper bound, as it was determined using on-axis rays. Regardless, the measured MTF is expected to be lower due to the reflector surface irregularities introduced during fabrication.
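The contrast-averaging step of this measurement can be sketched as below, with a synthetic sinusoidal lp patch at 60% modulation standing in for a captured target (illustrative code, not the authors' processing pipeline):

```python
import numpy as np

def lp_contrast(row):
    """Michelson contrast of one pixel row/column across an lp target,
    from the relative max/min luminance."""
    lmax, lmin = row.max(), row.min()
    return (lmax - lmin) / (lmax + lmin)

def measured_mtf(patch, axis=0):
    """Average the per-row (or per-column) contrast over the lp area."""
    return float(np.mean([lp_contrast(r) for r in np.moveaxis(patch, axis, 0)]))

# Synthetic lp patch: sinusoid with 60% modulation around mean 0.5
x = np.linspace(0, 4 * np.pi, 64)
patch = 0.5 + 0.3 * np.sin(x)[None, :].repeat(8, axis=0)
print(round(measured_mtf(patch, axis=0), 2))  # ~0.6
```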
We switch to an LG V40 ThinQ phone camera for capturing the virtual images formed by the light field source images; the Nikon camera with its lens does not fit into the test setup to be positioned close to the reflector within the eye-relief distance. We first measure the FOV by displaying and capturing an image of concentric rings (figure 13), each of which represents a unique angle subtended from the camera.
The default LG V40 ThinQ phone camera has a FOV of about 67° × 53° and manual focus adjustability. The measured monocular FOV is approximately 40° × 31°, in line with what is estimated by equations (4)-(7). FOV measurements are also shown in figure 13. Note that this is only about half of the maximum FOV the Sharp display can support (about 73° × 78°, since the display width = height = 51.84 mm). This is because we are limited by the size of the MLA used, which is less than 30 mm × 30 mm. Figure 14 shows the virtual images we obtain from the test setup through the semi-transparent reflector, using the lp test target shown in figure 5. Because the fabricated MLAs have a fill-factor of ∼70%, stray light that propagates through the gaps between the microlenses causes glare and reduces the quality of the virtual images. To mitigate this, we implement a virtual aperture array in the elemental images, where each elemental image is formed into a circle with a diameter equal to or smaller than the base diameter of the microlenses, similar to a method introduced in [23]. However, constricting the aperture diameter too much reduces the brightness of the virtual image. We have experimented with different aperture diameters and found that setting the aperture diameter to about 60% of the 580 µm microlens diameter produces virtual images with a good balance between image quality and brightness. Figure 15(a) shows a test target consisting of two letter sequences, one in green and one in red. The light field-converted test target in figure 15(b) is prepared such that the two letter sequences appear at different distances in the virtual image. The difference in image quality with and without the virtual aperture array can be observed in figures 15(d)-(f). Figures 15(d) and (e) are camera-captured virtual images with the virtual aperture array applied; figure 15(f) is a captured virtual image using the elemental image of figure 15(b), without the virtual aperture array.
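A minimal sketch of such a virtual aperture array, assuming square elemental images of w_e pixels on a side (the 60% aperture fraction follows the experiment above; the tiling logic is an illustrative reconstruction, not the authors' code):

```python
import numpy as np

def apply_virtual_apertures(elemental_images, w_e, aperture_frac=0.6):
    """Zero out each w_e x w_e elemental image outside a centred circle
    whose diameter is aperture_frac * w_e pixels, mimicking the virtual
    aperture array that blocks stray light between microlenses."""
    yy, xx = np.mgrid[0:w_e, 0:w_e]
    c = (w_e - 1) / 2
    mask = (xx - c) ** 2 + (yy - c) ** 2 <= (aperture_frac * w_e / 2) ** 2
    out = elemental_images.copy()
    for j in range(out.shape[0] // w_e):
        for i in range(out.shape[1] // w_e):
            out[j * w_e:(j + 1) * w_e, i * w_e:(i + 1) * w_e] *= mask
    return out
```

Shrinking aperture_frac suppresses more glare at the cost of brightness, matching the trade-off described above.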
We identify a few shortcomings in the quality of the virtual images seen through the reflector. First, the virtual images appear grainy and 'squiggly,' as observed in figures 15(d) and (e). This is because the pixel density of the Sharp IPS panel is somewhat low, and the pixels are arranged in a zig-zag pattern, with each pixel being rather large at 36 µm in comparison to the microlenses. In addition, the microlenses magnify the pixel structure and make them more visible, as seen in figures 15(g) and (h). This can be mitigated by using a display with smaller pixels and/or using an intermediate diffusing layer between the MLA and the display. Second, when we shift the camera focus to near field, the MLA structures become visible, as in the left side of figure 14, because the distance at which the MLA structures are imaged by the reflector becomes close to the near field image. This can be remedied by using a higher fill-factor (ideally 100%) MLA such that the amount of stray light is reduced, thereby reducing the chances of the MLA being illuminated due to interreflection within the MLA. Also, moving the LFA closer to the reflector (with a longer MLA focal length), away from the intrinsic focal plane of the reflector, and potentially applying an anti-reflective coating to the MLA may help. Third, defective microlenses in the MLA make the virtual images look as if there are water droplets on them, as in figures 13 and 14. Other aberrations such as distortion are also visible, which should be addressed using a freeform reflector with a non-symmetrical and higher order surface profile, optimizing the microlenses by customizing the microlens focal lengths specific to the microlens location in the array [30], and image pre-processing techniques [31].

Conclusions
We present an LF-HWD using a semi-transparent toroidal reflector as the image combiner. We show that putting the LFA in a Galilean configuration can reduce the overall size of visor reflector-based LF-HWDs, and that matching the period of the elemental images to the MLA pitch makes for a telecentric system, such that the light field image FOV is constant over the focal depth range. We evaluate the optical quality of the in-house-made toroidal mirror by measuring the MTF using lp test targets. The MTF measurements, while lower than the MTF estimated in OpticStudio, are in line with expectations given the quality of the mirror. With the prototype assembly, we show virtual images generated from light field source images at different focal planes, captured using a camera. A few shortcomings are identified in the quality of the virtual images seen through the reflector, due to the low quality of the house-fabricated MLA and the characteristics of the display panel used in the prototype, such as the stray light propagating between the microlenses due to the low fill factor. Despite this, this work demonstrates the capabilities of our new LF-HWD concept through a proof-of-principle prototype.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).