Virtual reality simulation of epi-retinal prosthetic vision highlights the relevance of the visual angle

Objective Retinal prostheses hold the potential to restore artificial vision in blind patients suffering from outer retinal dystrophies. The optimal number, density, and coverage of the electrodes that a retinal prosthesis should have to provide adequate artificial vision in daily activities is still an open question and an important design parameter needed to develop better implants. Approach To address this question, we investigated the interaction between the visual angle, the pixel number and the pixel density without being limited by a small electrode count, as in previous research reports. We implemented prosthetic vision in a virtual reality environment in order to simulate the real-life experience of using a retinal prosthesis. We designed four different tasks simulating: object recognition, word reading, perception of a descending step and crossing a street. Main results The results of our study showed that in all the tasks the visual angle played the most significant role in improving the performance of the participant. Significance The design of new retinal prostheses should take into account the relevance of the restored visual angle to provide a helpful and valuable visual aid to profoundly or totally blind patients.


Introduction
Retinal prostheses are neuroprostheses able to revert blindness caused by outer retinal dystrophies, such as retinitis pigmentosa and dry age-related macular degeneration [1,2]. Over two decades, retinal prostheses yielded extraordinary results in patients [3][4][5]; however, despite the progresses made in the research field, there are still several limitations affecting the possibility to provide useful vision in daily life [6]. Blindness is defined as visual acuity of less than 20/400 or a corresponding visual field loss to less than 10 degrees, in the better eye with the best possible correction (World Health Organization). In North America and most of European countries, legal blindness is defined as visual acuity of 20/200 or visual field no greater that 20 degrees.
Although visual acuity is an important measure of the quality of vision (both natural and artificial), another very important feature is the size of the visual field, which might affect several visually-guided behaviours important in daily life. For retinal prostheses, as a rule of thumb, the visual field is linked to the retinal coverage of the prosthesis, while the visual acuity to the density of the stimulating electrodes. Therefore, the number, density, and coverage of the electrodes is an important design parameter for retinal prostheses to provide adequate artificial vision in daily life. The retinal prostheses in clinical use are limited by the number of pixels that can be wired to the implantable stimulator, which impose a small retinal coverage and/or a low pixel density. As a consequence, the clinical devices were intrinsically limited to either increase the pixel density in a small surface area [3] or enlarge the surface area at the expense of the pixel density [4,7].
An early study on healthy participants under pixelated vision indicated that an array of 25 × 25 pixels and 30 degrees of visual angle could provide adequate mobility skills [8]. It is worth noting that, combined together, these parameters (625 pixels and 30 degrees) are beyond what is offered by retinal prostheses in clinical use today. A follow up study confirmed the results reporting that a visual angle of 33 x 23 degrees was preferred by participants for mobility with more that 1,000 pixels to allow for safer decision-making [9]. Other studies estimated that, to be useful in daily life, a retinal prosthesis should have at least 500 pixels in the central area of about 5 mm in diameter [10,11]. More recently, a trial on healthy participants showed that the number of pixels required to recognize common objects is on the order of 3,000 to 5,000 [12]. Collectively, these results suggest that covering a large visual angle even with a limited resolution seems sufficient for mobility skills, while high pixel density is required for object recognition. Unfortunately, all of these previous studies focused on only one part of the problem, either the wide visual angle or the high pixel density. In addition, these studies were often based on a conversion of images into an idealized phosphene representation, where phosphenes are round and perfectly aligned. Although this might hold true for sub-retinal prostheses, this is not the case for epi-retinal devices. Due to the direct activation of axons in the nerve fibre layer, phosphenes are often elongated or with a complex shape [13,14]. Since, wide-field arrays have been so far designed for epiretinal placement only, the complex shape of phosphenes must be considered when simulating artificial vision.
Recently, photovoltaic retinal prostheses allowed for wireless retinal stimulation [15][16][17][18][19][20][21][22], thus overcoming in principle the problem of prioritizing between pixel density or surface area. Photovoltaic retinal prostheses have the ability to embed a large number of pixels, limited only by the manufacturing capability and the conversion efficiency of the pixel. Moreover, conformable wide-field retinal prostheses enabled the enlargement of the retinal coverage without affecting the pixel density [23].
These research results prompted us to investigate the interaction between the retinal coverage and the pixel density without being limited to a small number of electrodes. We took advantage of virtual reality (VR) software paired with a portable head mounted display (HMD) to monitor the performance of healthy participants under simulated prosthetic vision with variable pixel density and retinal coverage. The stimulation layout was mapped accordingly to the epi-retinal wide-field retinal prosthesis known as POLYRETINA [23], which allows in the same device both a high pixels density and a wide visual field. Since POLYRETINA is an epi-retinal prosthesis, the direct activation of axons resulting in elongated phosphenes was considered according to recent computational models [13]. Also, taking advantage of the flexibility allowed by VR, we tested behavioural performances not only for word and object recognition, but also in common daily activities, such as recognising a descending pair of steps or crossing a busy street.
Our results provide insights about the optimal number, density, and coverage required for a retinal prosthesis to be useful in daily life. In particular we found that the visual angle had the most significant effect in improving the performance of the tested participant. These results led to the conclusion that in order to be useful in daily life, retinal prostheses must restore artificial vision in a large visual angle so as to overcome limitations related to head movements for space scanning and mental reconstruction of the visual scene.

Experiment design.
The participants performed four behavioural tests, described as 'object recognition', 'word reading', 'step perception', and 'street crossing'.
The tests were performed using a Dell Precision 3630 desktop computer with an Intel Xeon E-2146G CPU (3.50 GHz) and an nVidia GeForce GTX 1080 GPU. To simulate prosthetic vision two commercial VR HMDs were used: the FOVE ('object recognition' and 'word reading') and the VIVE Pro Eye ('step perception' and 'street crossing'). The FOVE has a resolution of 2560 ´ 1440 pixels (1280 ´ 1440 pixels per eye), covers a visual field of approximately 95 degrees, and updates at 70 Hz. The HMD collects data about the participant's head orientation through internal gyroscopes, and tracks eye gaze through two integrated infrared eye-tracking cameras at a frequency of 120 Hz. Correspondingly, the VIVE Pro Eye has a resolution of 2880 x 1600 (1440 x 1600 pixels per eye), a maximum field of view of 110 degrees and a 90Hz refresh rate. Head orientation and position is provided using two external cameras aimed at the headset, and eye-tracking is similarly accomplished using integrated infrared cameras operating at 120 Hz.
During the 'object recognition' and 'word reading' tests, the participants were seated, wearing the FOVE HMD, and holding a game controller. Participants were then shown a series of objects or words and asked to identify them. When the participants felt like they had recognized the object or word, they would press a button on the game controller and then tell their guess to the examiner, who would mark the answer as correct or incorrect and proceed to the next item (minimum time interval of 1s). Participants were allowed to explore the objects or words via head movements and eye movements. For the objects, participants also had the ability to rotate the object with the game controller. Words were either presented in English, French or Italian at the preference of the participant from a list of commonly used words. The full list of 108 possible objects is shown in Fig. 1.
During the 'step perception' test, participants were asked to stand while wearing the VIVE Pro Eye HMD and holding two VR controllers. The virtual scenario involved the participants standing on a platform with two steps placed diagonally in front of them. The height of both steps was altered in each trial and participants were instructed to determine which step they thought was highest, or the easiest to step down on to. Participants were allowed to explore the environment via head and eye movements. Once they had made a decision, they would press a button on the corresponding left or right controller. Differences in height were 10, 20, 30 and 40 cm.
During the 'street crossing' test, the virtual environment replicated a trafficked street, with cars moving at a speed of 30 km hr -1 in both directions, and the participants standing on the edge of the sidewalk. Participants were allowed to explore the environment via head and eye movements. At random intervals between 8 and 12 seconds, there would be a gap of 2, 4, 6, or 8 seconds on the left side of the street and 4, 6, or 8 seconds on the right (note that the length of the opening on the two sides could be different) where no cars would be present.
If the participants felt that it was safe to cross the street, they would take two steps forward. The program would then determine that the participant had crossed the road, based on data received from the VIVE motion capture system. In case there was a car too close to the participant's position, the system would mark the crossing as failed and play the sound of a car crashing, otherwise the crossing would be marked as successful.

Simulation of prosthetic vision
All of the test environments as well as the simulation of prosthetic vision were developed using Unity (www.unity.com). Data collection was also performed in Unity. For tests using the FOVE headset, eyetracking information was obtained via the FOVE Software Development Kit (SDK). Otherwise, when using the VIVE Pro Eye HMD, the SRanipal eye-tracking SDK was used to track participants eye.
The simulated prosthetic vision was computed using Cg shaders, allowing the system to run in real-time.
Briefly, shaders are pieces of code that are run on a computer's graphics hardware for each output pixel (FOVE:1280x1440; VIVE Pro Eye: 1440x1600), every update frame (FOVE: 75Hz; VIVE Pro Eye: 90Hz).
To save on computing time, phosphene centres were pre-calculated and saved as a texture which could be accessed by the shaders. In this sense, the texture equated to a data matrix with a width and height equal to the resolution of the output image, where each element in the matrix held the coordinate of the centre of the closest phosphene to the pixel. Pixels were then activated in the Cg shader depending on the following criteria: a) they are inside the POLYRETINA field of view, b) they are inside their closest phosphene, c) their closest electrode is not "broken" (10% chance), and d) the luminance of the input image at the equivalent pixel exceeded 0.5.
Images were converted into phosphenes using four layouts based on the POLYRETINA prosthesis: 100/150, 80/120, 60/90, and 40/60 (electrode diameter in µm / electrode pitch in µm). The number of pixels for each layout is reported in Tab. 2 as a function of the visual angle. For each phosphene image, three sources of random variability were included: the phosphene size was randomly varied between -30% to +30% of the electrode size, the phosphene brightness was randomly varied between 50% (grey) to 100% (white) of the phosphene's default brightness, and 10% of the electrode were considered not functional. Lastly, image presentation was not continuous but pulsed. Ideally, we would have planned 10-ms image presentation followed by 40-ms of dark (equivalent to a 20 Hz repletion rate). However, due to the refresh rate of both the FOVE and VIVE Pro Eye being higher than 10 ms (13 ms for the FOVE and 11 ms for the VIVE Pro Eye), the black pause was 39 ms for the FOVE and 44 ms for the VIVE Pro Eye.
The process just described generates a phosphene image, often called a scoreboard (SB) model. However, based on the report from participants implanted with epi-retinal prostheses, phosphenes are more likely to be interpreted as curved oblong-like shapes [13]. A second Cg shader was used to account for the axonal fibre activation, having as an input the phosphene image. The values describing the trajectory of the axonal fibre passing through each pixel were pre-generated using a mathematical model [24] to reduce the amount of realtime processing. The shader then used these values to generate each phosphene's tail, following the underlying axon fibres. The tail's brightness would dissipate as a function of the distance from its phosphene, using the activation function described in [25]. Finally, the tails were superimposed on top of the original phosphene image. This image representation is called an axon map (AM) model. After this, one final Cg shader applies a small 3-pixel Gaussian blur to the image in order to soften the pixel edges for both the SB and SM models.
The image is then displayed in the right eye only of the VR headset.
For the 'object recognition', 'step perception' and 'street crossing' tests, an edge detection algorithm was also applied (for both the SB and AM model) just before the phosphene shader. The implementation is a modified version of the widely used Canny edge detection algorithm [26], except that instead of thinning the edges, it was preferable to thicken them to ensure that enough phosphenes were activated to represent the detected edge.
The entire code of the project is available online at https://github.com/lne-lab/polyretina_vr.

Ethical statement
Experiments were approved by the human research ethics committee of École polytechnique fédérale de

Results
Object recognition. The first test assesses the capability to recognize common objects (35 ° in size) under prosthetic vision (Fig. 1).
In a first cohort of participants (n = 4; participants 1 to 4), images were converted using the SB model and the layout with the lowest resolution (100/150). The participant's visual field was restricted to various visual angles: 5, 10, 15, 25, 35, and 45 ° (Fig. 2a). For each visual angle, 18 objects were randomly presented (108 objects per participant). We evaluated the success rate as percentage of correct answers (Fig. 2b) and the time required to provide a correct answer (Fig. 2c). In the latter, the time was not considered when the given answer was not correct. Both measures showed that the increase of the visual angle resulted in a higher success rate and lower time to response. For each participant, we quantified a performance index to account for both the success rate and the time taken to provide a correct answer [27,28], defined as in equation (1): For the 'object recognition' experiment, the chance level was 0 since the objects were unknown. A linear regression model showed that the performance index (Fig. 2d)  to highlight only the effect of the above-mentioned parameters. Four layouts (100/150, 80/120, 60/90, and 40/60) and three tail lengths (1, 2, 3 °) were randomly varied to obtain a total of 12 possible configurations ( Fig. 3a). For each configuration 9 different objects were randomly presented (108 different objects per participant). As before, we evaluated the success rate (Fig. 3b) and the time required to provide a correct answer (Fig. 3c). From those parameters we estimated the performance index (Fig. 3d). For each tail length, the linear regression model revealed that the slopes are not significantly different than zero, thus suggesting that the layout had not a relevant role once the visual angle was maximised (Tab. 4). Polling the data together, the differences between the slopes (obtained with the three tail lengths) are also not The 'object recognition' experiment revealed that the pixel layout had little or no effect. On the other hand, the performance is highly dependent on the visual angle. The tail length also affected object recognition, with longer tails reducing the performance.
Word reading. An advantage of a large visual field is the possibility to display entire words in the patient's field of view instead of single letters. This second test assesses the capability to read words under wide-field prosthetic vision. A list of words composed by 4 to 6 letters was generated in the mother language of each participant and presented with the AM model (tail lengths: 1, 2, 3 °).
In the first cohort of participants (n = 4, participants 1 to 4), images were converted using the 100/150 layout and the letter height was fixed to 7 °, thus each word occupied a visual field ranging from 15.6 ° (4 letters) to 40.6 ° (6 letters). The participant's visual field was restricted to various visual angles: 5, 10, 15, 25, 35, and 45 °. For each configuration (visual angle and tail length) 10 different words were randomly presented for a total of 180 different words per participant (Fig. 4a). Similar to the 'object recognition' test, both the success rate ( Fig. 4b) and the time required to provide a correct answer (Fig. 4c) showed that the increase of the visual angle resulted in a higher success rate and lower time to response. As a consequence, the performance index also increased with the visual angle (Fig. 4d). For the 'word reading' experiment, the chance level was 0 since the words were unknown. For each tail length, a nonlinear regression model (second order polynomial) revealed that the second order polynomials are significantly non-zero (B1 and B2 coefficients) thus suggesting that the visual angle significantly affected the performance index (Tab. 5). Polling the data together, one curve cannot fit all the data sets (p < 0.0001, F = 5.893, DFn = 6, DFd = 63) suggesting that the tail length also affected the performance index. The visual angle is a key parameter to improve the performance in reading words, so in the second cohort of participants (n = 10, participants 5 to 14), we tested the combined impact of the pixel layout, the tail length, and the letter height on the word recognition capability. The visual angle of the participant was kept unconstrained (45 °), to highlight only the effect of the above-mentioned parameters. The underlying hypothesis is that with a wide visual angle, the same word can be enlarged to maximise its readability. Five letter heights (from 3 ° to 7 °), four layouts (100/150, 80/120, 60/90, and 40/60), and three tail lengths (1, 2, and 3 °) were randomly varied to obtain a total of 60 possible configurations (Fig. 5). The words occupied an average visual field ranging from 6.4 ° (4 letters, 3 ° height) to 40.6 ° (6 letters, 7 ° height). For each configuration 10 different words were randomly presented for a total of 600 different words per participant.
The experiment was split in three session of 200 words each to limit fatigue.
Qualitatively, for all the three tail lengths considered, both the success rate (Fig. 6a) and the time required to provide a correct answer (Fig. 6b) improved by increasing the pixel density and the letter height, in particular for the smallest letter heights (3 ° and 4 °) and the sparser layouts (100/150 and 80/120). To quantitively estimate the effect of the pixel layout, we split the dataset in three groups based on the tail length and performed linear and nonlinear regression on the performance index (Fig. 6c-e). For each tail length, the two sparser layouts (100/150 and 80/120) were fitted by a nonlinear regression model (second order polynomial), while the two denser layouts (60/90 and 40/60) were fitted with a linear regression model. The nonlinear regression model revealed that the second order polynomials were significantly non-zero for all the tail lengths (Tab. 6).
This result suggested that for the sparser layouts increasing the letter height up to the maximum allowed by the visual angle is a good strategy to improve the reading performance. For the denser layouts, the linear regression model revealed that the slopes were not significantly different than zero (Tab. 6). Only for tail length 3° and layout 60/90 the slope is significantly not-zero. This result suggested that when the pixel number and density increased (i.e. the visual resolution increases) small characters can be easily recognised, and the increase in the letter height became less important. The 'word reading' experiment demonstrated that, similar to object recognition, the visual angle has the strongest effect on performance, in particular when the pixel layout does not allow for extremely high resolution. A large visual angle allows words to be presented with a larger letter's height, which is a good strategy to improve reading performance.
Step perception. The third test assesses the capability to identify shallow steps, which would be useful for safe ambulation. In this cohort of participants (n = 11, participants 5 to 11, 13 and 15 to 17), images were converted using the 100/150 layout and four tail lengths: 0, 1, 2, 3 °. The tail length equal to zero correspond to the SB model. The participant's visual angle was restricted to various visual angles: 5, 10, 15, 25, 35, and 45 °. Four differences in step height were tested: 10, 20, 30 and 40 cm. For each configuration (tail length, visual angle and step height) 4 repetitions were randomly presented for a total of 384 different conditions per participant (Fig. 7a). The participants were encouraged to take a break during the experiment to reduce fatigue. The participants were instructed to answer only if they were convinced, they knew the correct answer, otherwise to say 'I do not know' and the trial considered as failed. Similar to previous tests, both the percentage of correct answers (Fig. 7b) and the time required to provide a correct answer (Fig. 7c) showed that the increase of the visual angle resulted in a higher success rate and lower time to response. The performance index (Fig. 7d) were computed with a chance level of 50%, therefore the performance index was set to 0 when the success rate was 0%, as well as when the performance index was negative (i.e. success rate below the chance level).
For each tail length, a nonlinear regression model (second order polynomial) revealed that the second order polynomials are significantly non-zero (slope, B1 coefficient) thus suggesting that the visual angle significantly affected the performance index (Tab. 7). Polling the data together, one curve cannot fit all the data sets (p = 0.0134, F = 2.380, DFn = 9, DFd = 252) suggesting that the tail length also affected the performance index. The 'step perception' experiment confirmed that the visual angle has a strong effect for the increase of the performance.

Street crossing.
The last test is a representation of typical outdoor daily activity which requires orientation capability in a dynamic environment. In this cohort of participants (n = 9, participants 7 to 11, 13 and 15 to 17), images were converted using the 100/150 layout and four tail lengths: 0, 1, 2, 3 °. The tail length equal to zero corresponds to the SB model. The participant's visual angle was restricted to various visual angles: 10, 15, 25, 35, and 45 °. The smallest visual angle (5 °) was removed because it was not possible for participants to perform the test. For each configuration (tail length and visual angle) 6 repetitions were randomly presented for a total of 120 different conditions per participant (Fig. 8a-c). The participants were instructed to safely cross the street regardless of the time taken. Since the gaps were randomly presented, we quantified only the percentage of correct passages (Fig. 8b) and not the time required. For each tail length, the linear regression model revealed that the slopes are significantly non-zero for tail lengths 0 °, 2 ° and 3 °. The slope is not significantly different than zero for the tail length 1 °, for which the linear regression does not fit well the data (Tab. 9). Polling the data together, the differences between the slopes are not statistically significant (p = 0.7074, F = 1. As for the previous tests, also the 'street crossing' test highlighted the importance of the visual angle for users of retinal prostheses.

Discussion
In the past decade, several studies assessed the performance of participants under pixelated vision using either HMDs and computer monitors.
A first subset of studies focused mostly on resolution and the capability to recognise faces, objects or text [27,[29][30][31][32]. In these studies, the pixel number (m x n) was the parameter addressed the most, within a range from 8 x 8 to 40 x 50, and the effect of grey levels and pixel dropout was also evaluated. A common result among the studies is the increase in performance with an increase in the number of pixels. Remarkably, the visual angle was consistently not taken into account, even if it was not a fixed parameter since it was changing with the pixel number: if a 20 x 20 array would provide a 7 x 7 degrees of visual angle, a 34 x 34 array would provide a 12 x 12 degrees of visual angle [32]. In this latter report, the reading speed was found to decrease with the reduction of the visual angle. Other studies focused more on navigation skills and obstacle avoidance [8,9,33]. Here, the visual angle become an important factor to improve behavioural performance.
While both approaches (maximisation of the pixel number or the visual angle) are currently adopted respectively to improve of recognition skills or mobility skills, the development of large-field and high-density devices like POLYRETINA opens up the possibility to combine both features in a single device.
With conventional wired retinal prostheses, the limiting parameter was always the maximum number of electrodes which could fit in the device due to the presence of the implantable pulse generator, the efficiency of the wireless power transfer and the size of the trans-scleral cable. Therefore, while keeping the electrode number constant, it might be preferable to increase the electrode density at expenses of the visual angle since a large retinal coverage with poor electrode resolution will result in an increasing difficulty to recognise objects and shapes, which is essential also for safe navigation. The advantage of a photovoltaic prosthesis like POLYRETINA is the capability to include a theoretically unlimited number of pixels, and therefore to scale the number of pixels proportionally to the visual angle. Under such conditions, it appeared that the visual angle played a major role for object recognition, safe navigation and other common daily activities. Indeed, our results showed that the visual angle is consistently the most significant factor to increase behavioural performance.
Another important parameter is the tail length. It is intuitive that the distortion of the phosphene's shape impairs the behavioural performance. This distortion is caused by the activation of the axon of passages, which is a main limitation of epi-retinal prostheses. However, several approaches were recently reported to overcome the direct activation of retinal ganglion cells and favour the network-medicated activation, mainly based on waveform shaping [34,35] and input filters [36]. In humans, it was also demonstrated that long sinusoidal-like pulses promote the perception of circular phosphenes [37]. Improving the epi-retinal stimulation to avoid distorted phosphenes is a fundamental step to improve performance during daily activities.

Conclusion
We believe that profoundly and totally blind patients will benefit from the restoration of artificial vision within a substantially large visual angle. However, the visual angle cannot be decoupled from the pixel count and density. Wide-field and high-density photovoltaic prostheses possess the advantage of restoring a large visual angle with a moderate pixel resolution, and thus become a helpful and valuable visual aid to patients with retinitis pigmentosa. Experiments with simulated prosthetic vision might help in designing the layout of widefield and high-density to be tailored to the patient's needs. However, results from these models must be