The perceived effects of augmented trail sensing and mood recognition abilities in a human–fish biohybrid system

The use of technologies to enhance human and animal perception has been explored in pioneering research on artificial life and biohybrid systems. These attempts have revealed that augmented sensing abilities can emerge from new interactions between individuals within or across species. Nevertheless, the diverse effects of different augmented capabilities have been less examined and compared. In this work, we built a human–fish biohybrid system that enhanced the vision of ornamental fish by projecting human participants onto the arena background. In turn, human participants were equipped with a mixed-reality device that visualized individual fish trails (representing situation-oriented perception) and emotions (representing communication-oriented perception). We investigated the impacts of the two enhanced perceptions on the human side and documented the perceived effects from three aspects. First, both augmented perceptions considerably increased participants' attention toward the ornamental fish, and the impact of mood recognition was stronger than that of trail sensing. Second, the frequency of human–fish interactions increased with the equipped perceptions; the mood recognition ability on the human side could indirectly promote the recorded positive mood of the fish. Third, most participants reported that they felt closer to the fish when equipped with mood recognition ability, even when we deliberately introduced errors into the mood recognition accuracy, whereas the addition of trail sensing ability did not produce a similar effect on the mental bond. These findings reveal several differences in the perceived effects of enhancing communication-oriented versus situation-oriented perceptions.


Introduction
Natural organisms (animals, plants, microorganisms) have evolved with diverse sensing capabilities to interact with conspecifics and respond to the environment (Evans 1996, Bijlsma and Loeschcke 2013, Richardson et al 2014). These inherent perceptions are vital to organisms in nature because they directly determine the channels and amounts of information that can be acquired from the surroundings. Naturally, these perceptions may deteriorate over time due to aging or impairment (Roberts and Allen 2016). However, technology makes enhancement of perceptions possible, such that natural organisms can respond to their environment in a new way. Various studies (Macrae and Bodenhausen 2001, Corbetta and Shulman 2002, Bodenhausen and Hugenberg 2009, Rensink 2013, Stephenson et al 2021, Wolfe et al 2022) have shown that enhanced perceptions can concentrate human attention and change people's understanding of their social environment, but different perceptions have various effects on these changes. This article aims to investigate and compare how two distinct types of enhanced perception (one towards situational awareness and the other towards social communication) affect the cognition and behavior of humans when they interact with a tank of ornamental fish.

Background to perception
Perception is a natural organism's ability to organize, identify and interpret presented information gathered from its sensors for understanding its living surroundings. This ability is essential to most natural organisms, as it can directly determine cognition and behavior (Bruner and Postman 1949). For human beings, it is crucial to perceive detailed information about the behavioral semantics and emotional states of others for the achievement of self-regulation (Horne 2012). For other animals, such as elephants (e.g. smell to pick up unusual odors and avoid danger; Bates et al 2007) and rats (e.g. touch with their whiskers to find food and to communicate with conspecifics; Smith and Alloway 2013), perceptions are vital for survival and reproduction. Each creature has particular types of perceptions, by which a (limited) part of local information is extracted for the understanding of circumstances. Taking tactile perception for example, mimosas (Cahill Jr et al 2013) and snails (Logunov and Konnov 1983) can only sense the strength of pressure, ants (Ratnieks 2007) and bees (Esch et al 2001) have evolved to communicate through tactile perception, while gorillas (Clark et al 2019) and dogs (Byrne et al 2017) can identify intention and emotion through touch stimuli conducted from haptic cells in their skin.
In general, we can divide perceptions into situation-oriented and communication-oriented functions by the utility of the perceived information. Situation-oriented perception (situational awareness) refers to those functions concerning awareness of information about the surroundings (Bischoff and Graefe 1999). Trail sensing is one example of integrated situation-oriented perception that is well developed in predatory mammals such as bears, wolves and dogs (Standing et al 1970) and various species of ant. Communication-oriented perception (social perception) refers to the perceptions built up to achieve interaction, communication and social cognition between individuals of the same species (Mackie et al 2000), such as mood recognition and language acquisition. The retrieved semantics of these perceptions vary among individuals, because no uniform objective evaluation or measurement can be applied even if they occur under the same circumstances. For example, past working memory may have an influence on individuals' emotions when in a group of people in the same physical environment because the received perception stimuli may be augmented or neglected when they pass through the memory part of the brain (Kessel et al 2016). People's emotions may also be influenced by other people in the same environment, and interaction among individuals can reinforce group-level emotions (Smith and Mackie 2016).

Perception enhancement
From the perspective of individuals, specific sensory impairment, usually due to injury or aging, is common in nature (Roberts and Allen 2016), and the harm caused by the loss of certain perceptions differs from species to species. For example, deprivation of vision is fatal to shoal fish because vision helps them maintain coordinated movement, while the loss of visual ability can be less critical to ants, which can still survive with tactile sensing (Seidl and Wehner 2006, Clifton et al 2020). Nevertheless, although organisms can adapt to the loss of certain perceptions, the loss can still change individual cognition and behavior to some extent. A decline in taste perception in females can result in a change in eating habits (Pepino et al 2014), and older domesticated cats often suffer from feline cognitive dysfunction, which causes disturbances in sleeping patterns and reduced activity due to declines in sight and hearing (Chen et al 2015). Even for individuals with normal-level perceptions, enhanced perceptions can be advantageous in biological evolution, as they provide individuals with more information, enabling them to notice and understand nearby but neglected dangers. For human beings, perception enhancement can not only help people with perception impairment lead a normal life but can also lead to a deeper awareness and understanding of the surrounding environment (Tyan et al 2014, Chu et al 2018). The importance of perception enhancement to people with typical perception lies in a better sense of the environment, which changes cognition and behavior relative to normal levels of perception. In psychology, cognition is categorized into attention, social cognition, memory, executive function and psychomotor speed according to its diverse functions (Kihlstrom 1987, Adolphs 2006, Bodenhausen and Hugenberg 2009).
Attention is the process of allocating limited 'resources' to select and track a particular item based on perceived global importance, associated with the intensity of perception (Rensink 2013). Irrelevant information can be noise to perception, and enhancement of relevant perception can increase the attention level to the target. For object detection tasks, individuals can detect the target more precisely when provided with advance feature information (such as cueing) about the target location (Corbetta and Shulman 2002, Rensink 2013). The biological essence behind attentional shift is eye movement guided by peripheral information from enhanced perception (Wolfe et al 2022). On the other hand, social cognition is the observer's ability to detect the state of others, such as emotion, traits or thoughts (Bodenhausen and Hugenberg 2009). Perceptions play an essential role in bridging outside social targets and psychologically meaningful representations of inner experience, which directly determine social cognition (Bodenhausen and Hugenberg 2009). Research has also shown that category perception can simplify human understanding of the complex social world (Macrae and Bodenhausen 2001). In this article, we aim to investigate how human attention and social cognition can be changed by different enhanced perceptions.
To achieve multi-perception enhancement beyond nature, various high-tech devices with mixed-reality (MR) technology have been applied across domains. In medical practice, a surgery simulation system has been developed with augmented reality (AR) and HoloLens to enhance surgeons' visual and audio perceptions, mixing virtual surgery scenarios with real actions (Condino et al 2018). In the field of engineering, to obtain a better understanding of stiffness for tangible objects, an MR system renders stiffness with a two-degree-of-freedom wearable tactile display for the finger (De Tinguy et al 2018). In the field of education, MR/virtual reality (VR) devices are used in classes to increase students' attention levels, which enhances their learning experience and engages them in active learning (Azhar et al 2018).

Biohybrid systems
A biohybrid system is a system with biology-machine interaction containing both biological and nonbiological components. The concept comes from the idea of achieving collaboration between artificial systems and group-living animals by perceiving, communicating and interacting with animals (Halloy et al 2013). Biohybrid systems help us understand how a biological apparatus (e.g. muscles) can work with the interaction of integrated technologies. Biological behavioral characteristics can be revealed in biohybrid systems. Furthermore, in an artificial system robots can also be used to manipulate the behavior of groups of living organisms to accomplish particular tasks (Halloy et al 2013, Romano et al 2019). Two current topics in biohybrid system research are 'biohybrid organisms' and 'animal-robot mixed societies'. 'Biohybrid organisms' focusses on integrating artificial devices with individual living organisms and 'animal-robot mixed societies' focusses on adding artificial devices (usually biomimetic robots) to biological communities (Romano et al 2019). For biohybrid organisms, an integrated sensor array has been applied to enhance individuals' olfactory and tactile perception capability (Liu et al 2012, Lucarotti et al 2013). Other researchers have studied rodents (Nickell et al 2007, Guo et al 2016, Zhang et al 2018) and substitutes for real skin (Cheneler et al 2014, Low et al 2019, Jang et al 2020). However, this research has mainly focussed on improving the method by which such enhancements can be achieved and investigating the potential electrophysiological mechanisms, rather than studying the impact of the enhanced perceptions.
For animal-robot mixed societies, biomimetic robotic bees that can imitate the communication of bees (the waggle dance) (Michelsen et al 1992, Landgraf et al 2011, 2012, Griparić et al 2017, Lazic and Schmickl 2021) have been introduced into the hive to avoid contact with polluted nectar sources and enhance the bee colony's perception of local environments. This builds up an artificial ecological system, such as HIVEPOLIS (Ilgün et al 2021). Other attempts to promote human-fish interaction have been made by controlling the position and number of bubbles in a fish tank according to observers' actions ('Bubble Talk') (Ko et al 2018) and by displaying the inferred emotions of fish to enhance observers' emotional perception of the fish ('AffectiveNemo') (Isokawa et al 2019).

Subjects
This research recruited 34 participants (21 men and 13 women aged 22-44 years; mean age 29 years, standard deviation ±6 years). Twenty of the participants had pets, whereas 14 had never owned one. Twenty-five individuals had prior familiarity with VR or AR, while the remaining nine had no prior experience.

Apparatus
A square interaction platform (9 m²) was set on a 1.25 m high table, as shown in figure 1(a). Five goldfish (Carassius auratus) with distinct appearances resided in a glass tank (0.9 m wide × 0.45 m deep × 0.45 m high) with water between 15 °C and 20 °C, depicted in figure 1(d). An oxygen pump and an aquarium filter were installed to provide the fish with a suitable living environment (figure 1(b)). An LCD display (MI L43M5-EK) was set up behind the aquarium to give the fish an interaction stimulus.
Sensor arrays, including cameras, depth cameras and computers, were deployed in the environment, mounted either above or around the glass tank. As depicted in figure 2, these sensors were utilized to track the movements of humans and fish, analyze sensory data and enhance perceptions in real time. An ordinary camera (aoniA30 HD 1080P) fed data about humans to a computer at the edge of the tank. A depth camera (ZED 2) was mounted above the tank to gather information about the depth of the fish while simultaneously collecting live photographs of the tank. A computer (Intel(R) Core(TM) i7-9700K CPU@3.60 GHz, NVIDIA GeForce RTX 3080) used the artificial intelligence (AI) algorithm Yolo V5 to determine the location of the fish based on the depth information provided by the depth camera and to identify the position of participants wearing reflective vests with the ordinary camera.
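The mapping from a detected fish bounding box plus its depth reading to a metric position in the tank is not detailed in the text; assuming a standard pinhole camera model for the overhead depth camera, the conversion can be sketched as follows (the intrinsics `fx, fy, cx, cy` below are hypothetical placeholders, not values from the actual ZED 2 calibration):

```python
def pixel_to_tank(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with a metric depth reading into
    camera-frame coordinates using the pinhole model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)
```

A detection at the principal point maps onto the optical axis, so only its depth coordinate is nonzero; off-centre detections scale linearly with depth.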
An MR device (HoloLens 2) was used to blend the processed and inferred perceptual information with the real world and immersively enhance the participants' perceptions by allowing them to view the new perceptual information in the real world immediately. HoloLens 2 included a gaze tracker (60 FPS) that facilitated the generation of stimuli and the analysis of human physical attention.
(Displaced figure caption: The biohybrid system constructed a bridge to facilitate interactions between people and fish, in which AI evaluates the information gathered by a sensor array and feeds it back to enhance human and fish perception. The sensor array captured human and fish movement data and sent them to computers. The algorithm Yolo V5 was utilized to determine the locomotion of fish and infer their physiological states. A mixed-reality device, HoloLens 2, was used to enhance human situation-oriented and communication-oriented perceptions. The integrated gaze tracker helped monitor human attention during various settings in the experiments.)

Stimuli
There were two types of stimuli in the experiment, situation-oriented and communication-oriented, both generated by HoloLens 2. To aid in detecting the subject of human attention, each fish was wrapped in a cube (0.0675 m³). The gaze tracker of HoloLens 2 follows the fixations of participants, and HoloLens 2 displays the matching stimuli when the fixations fall inside the cube. The situation-oriented stimulus was a retracing trail that displayed the position and physical activity of the fish in the recent past. As shown in figure 3, different retracing durations were used to control the levels of the situation-oriented stimulus: (a) 2 s back (RT1), (b) 4 s back (RT2) and (c) 8 s back (RT3). Like the fish, the retracing trail comprises numerous cubes (8.4375 × 10⁻⁴ m³) to help identify attention. The retracing trail was shown immediately when fixations were on the fish and erased when fixations moved away.
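The retracing trail amounts to a rolling buffer of recent fish positions whose length corresponds to the RT level. A minimal sketch, assuming a fixed tracking frame rate (the 30 FPS default is an assumption, not a value stated in the text):

```python
from collections import deque

class RetracingTrail:
    """Rolling buffer of the most recent `window_s` seconds of positions."""

    def __init__(self, window_s, fps=30):
        # RT1/RT2/RT3 correspond to window_s = 2, 4, 8
        self.buf = deque(maxlen=int(window_s * fps))

    def update(self, position):
        self.buf.append(position)   # oldest samples are evicted automatically

    def trail(self):
        return list(self.buf)       # cube positions to render, oldest first
```

Switching between RT levels is then just a matter of constructing the buffer with a different window.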
The communication-oriented stimulus was a mood tag (shown as emojis) derived from the behavior of the fish. As shown in figure 4(a), five emojis indicating neophobic, aroused, freezing, fright and normal were used to express the physiological states of fish to humans (Laming and Savage 1980, Kim et al 2014). Randomness was implemented to regulate varying levels of expression accuracy. Emojis were selected randomly at three levels of expression accuracy: (a) randomly selecting one emoji out of the four emojis that do not include the one predicted from the physiological state of the fish (MT1), in which each of the four emojis has a 25% chance of being selected; (b) selecting one emoji out of five with proportional probability (MT2), in which the predicted emoji has a 50% chance of being chosen while each of the other four has a 12.5% chance; and (c) displaying the predicted emoji (MT3). The mood tag is likewise enclosed in a sphere (radius 0.0108 m) to aid in the placement of the individual's gaze. Similar to the situation-oriented stimulus, the mood tag was shown immediately when fixations were on the fish and disappeared when fixations moved away.
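The three accuracy levels define a simple categorical sampling rule. A sketch of the selection logic under the probabilities given above (the emoji names follow the five states listed in the text; the function name is ours):

```python
import random

EMOJIS = ["neophobic", "aroused", "freezing", "fright", "normal"]

def select_mood_tag(predicted, level, rng=random):
    """Pick the displayed emoji given the predicted physiological state
    and the accuracy level MT1/MT2/MT3."""
    if level == "MT1":
        # Never the predicted emoji: uniform (25% each) over the other four.
        return rng.choice([e for e in EMOJIS if e != predicted])
    if level == "MT2":
        # Predicted emoji 50%, each remaining emoji 12.5%.
        weights = [0.5 if e == predicted else 0.125 for e in EMOJIS]
        return rng.choices(EMOJIS, weights=weights, k=1)[0]
    return predicted  # MT3: always display the predicted emoji
```

The `rng` parameter makes the sampling reproducible in tests by passing a seeded `random.Random` instance.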
In addition to enhancing human perceptions, the fish were also exposed to stimuli to boost the efficiency of human-fish encounters and to determine the responses of the fish to human contact. Previous studies have demonstrated that light may alter fish behavior (Popper and Carlson 1998, Romano and Stefanini 2022b) and that fish can receive social and emotional support from robotic fish (Romano and Stefanini 2022a). To guarantee the validity of the interaction, we designed a means for people to interact with the fish, involving both of the stimuli above. The monitor's backdrop color featured two modes, dark and light. At the beginning of each trial, three artificial fish of various hues swam in response to human movement on a dark backdrop, and they remained until the end of the trial. As soon as the user lifted his or her hand, the computer identified the gesture captured by the ordinary camera and switched the monitor's backdrop between dark and light.
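The gesture-driven backdrop switch reduces to toggling a two-state variable on the rising edge of a hand-raise detection, so a held hand does not flip the backdrop every frame. A minimal sketch of that control logic (the gesture detection itself is assumed to come from the camera pipeline):

```python
def next_backdrop(state, hand_raised, prev_hand_raised):
    """Flip between 'dark' and 'light' only on the rising edge of the
    hand-raise gesture; otherwise keep the current backdrop."""
    if hand_raised and not prev_hand_raised:
        return "light" if state == "dark" else "dark"
    return state
```

The caller feeds consecutive frames' detections, remembering the previous frame's flag.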

Procedure
Participants wearing a reflective vest and HoloLens 2 were instructed to freely examine the fish tank and engage with the fish while standing in front of the apparatus. Participants were given a preset method of interaction prior to the experiments in which the backdrop color of the display behind the tank changed when a special gesture was performed. They could also opt to touch the aquarium to engage with the fish. All engagement behaviors were captured with the ordinary camera.
Four different combinations of situation-oriented and communication-oriented stimuli could be presented: (a) a single situation-oriented stimulus, Scene Trail, a retracing trail; (b) a single communication-oriented stimulus, Scene Mood, a mood tag presented alone; (c) both a situation-oriented and a communication-oriented stimulus, Scene M&T, in which both the retracing trail and the mood tag are shown; and (d) Scene Null, no stimulus but the real world. In the cross-modal condition, there are nine combinations of stimulus levels. In both Scene Trail and Scene Mood, participants were exposed to three 60 s sessions of the matching stimulus. In Scene M&T, three of the nine possible combinations were selected randomly for investigation so that participants would still experience three 60 s sessions. In Scene Null, participants experienced three 60 s sessions. The observations in each scene were made in succession, uninterrupted. After each scene, participants completed a subjective survey (appendix A). To counteract any possible behavioral or psychological bias associated with time, the four scenes, and the three sessions within Scene Trail, Scene Mood and Scene M&T, were randomly ordered. The whole process took around 25 min per person.
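The randomized ordering described above can be generated in a few lines. The sketch below assumes independent shuffling of scenes and of sessions within each scene, with three of the nine trail-mood combinations drawn for Scene M&T:

```python
import random

TRAIL_LEVELS = ["RT1", "RT2", "RT3"]
MOOD_LEVELS = ["MT1", "MT2", "MT3"]

def build_schedule(rng=random):
    """Return a randomized list of (scene, sessions) pairs:
    four scenes, three 60 s sessions each."""
    combos = [f"{t}&{m}" for t in TRAIL_LEVELS for m in MOOD_LEVELS]
    schedule = []
    for scene in rng.sample(["Null", "Trail", "Mood", "M&T"], 4):
        if scene == "Null":
            sessions = ["Null"] * 3          # no stimulus in any session
        elif scene == "Trail":
            sessions = rng.sample(TRAIL_LEVELS, 3)
        elif scene == "Mood":
            sessions = rng.sample(MOOD_LEVELS, 3)
        else:
            sessions = rng.sample(combos, 3)  # 3 of the 9 cross-modal combos
        schedule.append((scene, sessions))
    return schedule
```

Seeding the generator (e.g. `build_schedule(random.Random(participant_id))`) would make each participant's counterbalanced order reproducible.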
After each experiment, five types of information were gathered by the sensor arrays: (a) the gaze data from the gaze tracking of HoloLens 2, which contained participants' fixations during the experiment (in experiments in which the mood tag or the trace trail was not displayed, the participants' gaze fixations on the place where the mood tag or the trace trail was supposed to appear were taken into account); (b) the fish locomotion, which recorded the movement of fish by the depth camera; (c) the relative position of humans in front of the tank by the ordinary camera; (d) the time and movement of human engagement with fish recorded by the ordinary camera; and (e) the subjective surveys following each scene.

Physical attention varies with different enhanced perceptions
Identifying human gaze or eye movement serves the ultimate objective of determining an individual's attention. HoloLens 2 gaze data assisted in monitoring participants' attention to objects in the aquarium, such as the fish encased in cubes, the mood tags enclosed in spheres and the retracing trails comprising many cubes. The number of fixations inside the matched targets during each 60 s session was compared to indicate the attention level of participants.
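Counting fixations against an axis-aligned bounding cube is a simple containment test. A sketch, assuming fixations and cube centres are 3D points expressed in the same coordinate frame:

```python
def count_fixations_in_cube(fixations, center, edge):
    """Count gaze fixations that fall inside an axis-aligned cube of the
    given edge length centred on a target."""
    half = edge / 2.0
    return sum(
        all(abs(f[i] - center[i]) <= half for i in range(3))
        for f in fixations
    )
```

Running the same test against fish cubes, mood-tag volumes and trail cubes yields the per-target fixation counts compared across sessions.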
There are aspects that may influence attention, such as strong stimuli, movements, emotional strain, etc (Knudsen 2007). Our study suggests that situation-oriented perception may affect attention differently than communication-oriented perception. The retracing trails brought more attention to the fish itself, but the mood tag, regardless of whether the expression was correct, drew more attention to the mood of the fish rather than the fish itself. Consequently, when both perceptual enhancements were present, the focus was divided between the fish itself and the mood tag, producing a distinct outcome. Superposing the two perceptual enhancements might increase overall attention.
As indicated in figure 5, enhancing either situation-oriented stimuli (RT1, RT2, RT3) or communication-oriented stimuli (MT1, MT2, MT3) significantly increased participants' attention (p-values are shown in tables B1(a) and (b)). However, the intensity of each stimulus is not positively correlated with attention, which means that a trail traced farther back or a more accurate mood expression does not contribute to an increase in total attention. Regarding the situation-oriented stimulus, the retracing trail received twice as much attention when it was traced 4 s back (RT2) as in Scene Null. The trail retraced 8 s back (RT3) seemed to redirect participants' attention away from the fish and fish-related objects, resulting in a decrease in overall attention relative to RT1 and RT2. One potential explanation for this phenomenon is that retraced trails convey varying amounts of information to which humans can attend, and attention is diverted when the quantity of information is much less or greater than expected.
In contrast to the effect of retracing trails, the impact of mood tags on human attention was significantly greater than that of Scene Null (p-values shown in table B1(b)). Attention was largely diverted to the mood tags, while attention to the fish remained almost equal to that in Scene Null. Similar to the correlation found in Scene Trail, greater accuracy of mood expression does not correlate positively with attention either (MT1, MT2, MT3). Compared with Scene Null, humans place significantly more emphasis on the mood tags themselves, independent of the veracity of the expression.
The superposition of retracing trails and mood tags had distinct impacts compared with when just one of the perceptual enhancements was present. Compared with the sessions with only retracing trails (RT1, RT3), adding mood expression (RT1 vs RT1&MT1, RT1 vs RT1&MT3, RT3 vs RT3&MT1, RT3 vs RT3&MT3) significantly increased the overall attention (comparisons shown in tables B1(c)-(f)). In more detail, given the same retracing trail stimulus, the addition of a mood tag draws attention away from the fish itself and toward the mood of the fish. On the other hand, adding retracing trails (MT1 vs RT1&MT1, MT1 vs RT3&MT1, MT3 vs RT1&MT3, MT3 vs RT3&MT3) distracts human attention from the mood. Intensifying the situation-oriented stimulus helps increase human attention to the fish, but the increase is not as significant as intensifying the communication-oriented stimulus (p-values shown in tables B1(c)-(f)).
Therefore, situation-oriented perception affects human attention to the object itself, whereas communication-oriented perception affects human attention to the communication's semantics. The superposition of both does not equal the sum of the effects of adding either a situation-oriented or a communication-oriented stimulus alone: attention oscillates between the object and the communication semantics, resulting in a general decline in attention. Intensifying communication-oriented perception is more effective than intensifying situation-oriented perception for boosting attention to objects and object-related things. In the superposition condition, attention is less than in the single-perception condition.

Subjective attention varies with different enhanced perceptions
As well as measurable physical attention, there are internal factors in individuals which affect their attention, such as interests, the effort required by the task, trains of thought, etc (Knudsen 2007). The second question relating to attention (appendix A) was asked after each scene in the subjective surveys. The scoring range for the question is from 1 to 5 (strongly disagree to strongly agree).
Since the average scores for the attention-related question in each survey (appendix A), Scene Null (average 3.97), Scene Trail (average 3.76), Scene Mood (average 4.0) and Scene Mood&Trail (average 3.88), failed the Shapiro-Wilk normality test, the Friedman test was used to compare the four groups. According to table B3(a) in the appendix, these four groups share a common distribution. A Friedman post hoc Wilcoxon test was then used to compare each pair of groups. The subjective attention to fish in Scene Mood is higher than in the other three. The inclusion of mood tags and trails led participants to believe that they paid less attention to the fish than in Scene Null. In Scene Mood&Trail, participants reported paying somewhat more attention to the fish than they did in Scene Trail, which differs from our description of physical attention above. Individuals' concern for communication semantics, i.e. mood, may account for their heightened awareness of the fish rather than the retracing trails. Participants might treat the retracing trails as objects rather than as part of the fish. This also explains why participants believed they attended to the fish least in Scene Trail.
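The Friedman test used above ranks each participant's four scene scores and compares rank sums across scenes. A pure-Python sketch of the (tie-averaged) test statistic; in practice a library routine such as scipy.stats.friedmanchisquare would be used, and the chi-squared p-value step and the usual tie correction are omitted here:

```python
def _avg_ranks(row):
    """1-based ranks of a row, ties sharing the average rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1                           # extend over the tie group
        avg = (i + j) / 2.0 + 1.0            # mean rank of the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def friedman_statistic(data):
    """Friedman chi-squared statistic for `data`: one row per subject,
    one column per condition (scene)."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for j, r in enumerate(_avg_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) \
        - 3.0 * n * (k + 1)
```

With perfect agreement across subjects the statistic reaches its maximum n(k-1); identical scores everywhere give 0.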
In conclusion, focusing more on communication semantics is likely to humanize the interaction and give the impression of paying greater attention to the object. However, this effect is insufficient to counteract the mental distraction caused by other objects.

The subjective effect of social cognition before interaction with fish
In the studies, five emojis represented the physiological states of fish, which may also be inferred from the fish trails. As attention fluctuated between fish and fish-related objects, individuals were more inclined to engage with the fish. In this research, the willingness to interact with fish, which reflects humans' social cognition, was also considered. The third question in each survey (A.2.3, A.3.3, A.4.3, A.5.3 in appendix A) inquired about participants' readiness to interact with the goldfish. The scoring range is from 1 to 5 (strongly disagree to strongly agree). The four groups failed the Shapiro-Wilk test and showed significant differences after applying the Friedman test (p-values shown in table B3(b)). According to the Friedman post hoc Wilcoxon test results in table B3(b), participants' propensity to engage with fish without mood tags and trails is considerably greater than that of the other three groups. Even without significance, there was a difference between Scene Trail and Scene Mood, indicating that individuals are more likely to interact with fish in scenes including mood tags than in scenes containing simple trails. Since only one question was used to analyze participants' social cognition during the experiment, the result might be sensitive to the samples and individuals. More objective discussions about human-fish interactions are given in the following section.

Human-fish interaction
In human society, interaction is regarded as a dynamic, ever-changing series of social behaviors between individuals or groups. Likewise, for this biohybrid system, the interaction between humans and fish should be bidirectional, including not just human behavior to attract the attention of fish but also the changes in the physiological condition of the fish. As described in the preceding section, people could engage with the fish using a specific gesture or in any way they chose. During the experiment, the interactional behaviors of the participants were counted to characterize expression from humans to fish. Conversely, physiological variables were recorded and inferred to characterize the responsiveness of fish to human behavior. In addition to the data acquired from the sensor arrays, subjective questionnaires indicated changes in the mental states of participants, allowing us to fully examine the mental reactions of people to various perception enhancements.

Interaction behavior from humans to fish
Participants were informed that they could act freely during the experiment: they could interact or not, and could use the specific gesture or any other method to interact. Recorded data were analyzed by a third party to determine the number of times participants used other methods to interact with the fish.
With interaction through the specific gesture, a kernel density estimation (KDE) of the distribution of participants by the number of such interactions in each experiment is plotted in figure 6(a). It can be seen that most participants tended to interact through this method relatively few times (fewer than ten) in an experiment, regardless of the scene. Two other noticeable points are that participants were more likely to interact frequently (more than 20 times) in scenes with both perceptions enhanced and were more likely to interact infrequently (fewer than 10 times) in scenes where only mood perception was enhanced. However, according to the results of the significance test in table B2(a), these two points do not possess strong significance.
(Displaced figure caption: Such pattern similarities and differences between these patterns imply an underlying mechanism in the loop of perception, cognition, interaction and physiological response. A vs B indicates that scene A has more corresponding mood responses than scene B.)
With the other interaction methods that were freely used by participants, another KDE of the distribution of participants by the number of such interactions in each experiment is plotted in figure 6(b). This figure shows many similarities to figure 6(a). Most participants tended to interact through these methods relatively few times (fewer than 20) in an experiment, regardless of the scene. Participants were more likely to interact frequently (more than 40 times) in scenes with both perceptions enhanced and were more likely to interact infrequently (fewer than 20 times) in the other scenes, according to the significance test results in table B2(b).
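The KDE summaries in figure 6 are one-dimensional Gaussian kernel density estimates over per-participant interaction counts. A minimal sketch (the bandwidth choice is an assumption; the text does not state one):

```python
import math

def gaussian_kde(samples, x, bandwidth):
    """Evaluate a 1D Gaussian kernel density estimate at point x."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    ) / norm
```

Evaluating this over a grid of count values reproduces a curve like those in figure 6; a library routine such as scipy.stats.gaussian_kde would normally be used instead.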
Such inconsistency of significance across interaction methods, yet consistency in character, implies that enhancement of different perceptions may affect participants' behavior. However, for reasons which are not clear, this effect did not cause a significant difference for every interaction method. This may be a confounded effect with multiple causes: the experimental time may not have been long enough for participants to demonstrate a strong difference across perception enhancements, or the change in cognition, evidenced by the attention changes, may not always be reflected instantaneously or substantially in behavior.

Physiological states from fish to humans
During each experiment, which can be viewed as an interaction period, the change in the mood of the fish was recorded and the mood response analyzed. Figure 7 shows the difference in mood response between scenes (A vs B denotes the difference in mood responses between scene A and scene B). When comparing Scene Trail or Scene Mood against Scene Null (Trail vs Null and Mood vs Null), the two comparisons share a pattern: more freezing and neophobic responses correspond to fewer normal and fright responses. Likewise, when comparing Scene Mood&Trail against Scene Trail or Scene Mood (M&T vs Trail and M&T vs Mood), the comparisons share a pattern, with Scene Mood&Trail showing much less freezing in the mood response.
Such distinct synchronization in pattern may imply that, when only one perception was enhanced, participants tended to interrupt fish or cause neophobic behavior when the fish were in a normal or fright mood, as freezing 'is considered as a pause in ongoing behavior' (Goodman and Weinberger 1973, Laming and Savage 1980). When both perceptions were enhanced, participants tended to interrupt less at the cost of causing other moods. However, as this difference is itself small, further research is needed.

The subjective feeling from humans
The fifth, sixth and seventh questions in each survey (A.2.5, A.3.5, A.4.5, A.5.5; A.2.6, A.3.6, A.4.6, A.5.6; A.3.7, A.4.7 in appendix A) investigated participants' subjective opinions on interaction. The questions correspond, respectively, to participants' subjective opinion on the reaction of fish to their interaction actions, participants' belief that they understand fish mood, and participants' desire to obtain the corresponding perception enhancement capability.
In question 5 participants were asked whether they agreed that a fish was reacting to their interaction. The statistical analysis showed that when both perceptions were enhanced, participants were more likely to believe that fish reacted to their interaction actions compared with Scene Null; when only trail perception was enhanced, participants were less likely to believe so; and when only mood perception was enhanced, participants' cognition showed no significant difference compared with Scene Null, according to the significance test results in table B3(d).
In question 6 participants were asked whether they believed they understood fish mood. The scene with enhanced mood perception significantly increased participants' confidence in this belief. Scenes differing only in whether trail perception was enhanced showed no significant effect on participants' confidence, according to the significance test in table B3(e). This suggests that the presence of communication-oriented perception enhancement, whether correct or not, can directly produce a psychological-suggestion effect on people's cognition, as the mood information presented to participants deliberately contained some errors in some scenes. In contrast, the augmentation of situation-oriented perception (trail perception) did not show such an effect, even though the mood of fish can be inferred from their trails. These results suggest that although both perceptions were enhanced through vision, their difference in character, situation-oriented versus communication-oriented, may cause the perception information to be processed by different parts of the brain, leading to different effects on cognition.
Furthermore, evaluation of participants' perception preferences through question 7 shows that the perception that draws more attention, mood perception, is strongly preferred by participants, according to the significance test in table B3(f). This is not only consistent with the effect of these perceptions on participants' attention levels; combined with the findings of questions 5 and 6, it also suggests that participants prefer perceptions that enhance their social and communicative abilities. This may imply that, at a subconscious level, social cognition is more important to participants than context-related cognition.

Conclusion and discussion
This research explored the impacts of two distinct enhanced perceptions on people's attention and social cognition, respectively, and found that they result in a high rate of human-fish interaction with the support of MR devices and AI algorithms. These findings suggest that perception enhancement is an artificial way to construct mental bonds between humans and natural organisms and to realize a biohybrid ecosystem.
The results in section 3.1 suggest that perception enhancement (both situation- and communication-oriented) increases human attention, with different focuses in the experiment. Situation-oriented enhancement focuses human attention on relevant target items, while communication-oriented enhancement enriches the communication's semantics. However, the superposition of both perception enhancements is a nonlinear function of each individual perception stimulus's contribution to attention. A general decline in attention occurs when both situation and communication are enhanced, as participants' attention is divided between the object and the communication semantics. The experimental statistics imply that enhancing communication-oriented perception is more effective than enhancing situation-oriented perception for boosting attention to target objects. Similar results were reflected in the subjective questionnaires, with most participants agreeing that communication-oriented perception helped them concentrate more and gave them a greater interest in understanding the fish. This is because the presence of communication-oriented perception enhancement (i.e. the mood tag), whether correct or not, can directly produce a psychological-suggestion-like effect on people's understanding of fish and thus concentrates their attention. Besides the augmentation of attention, participants also reported in the subjective questionnaire a raised willingness to interact with fish when provided with both mood tags and trails, compared with trail information alone. In the experiments in section 3.2, this increased willingness triggered human-fish interaction. On one hand, participants were recorded interacting more frequently with fish when provided with both mood tags and trails. On the other hand, negative moods (i.e. freezing and neophobic) detected in the fish sharply decreased when the fish interacted with human participants who were provided with both mood tags and trail information.
This research is a trial of applying advanced devices in biosystems to augment organisms' perception with the support of Artificial Life technologies (such as MR and AI algorithms), and it can serve as a benchmark for building a new ecosystem-level biohybrid system. For instance, this research can be used to enhance mutual understanding between humans and fish, leading to the formation of mental bonds among species and creating a new concept of the metaverse. The traditional metaverse aims to build a virtual human society with enhanced human perceptions, while this research is dedicated to creating a virtual society that promotes interaction and mental bonds between natural organisms and humans via enhanced cross-species perceptions; this could be a blueprint for a cross-species metaverse. Such a cross-species metaverse would arouse human interest in, and increase humanity's understanding of, other species. With the mental bonds created, humans could better guide the behavior of endangered organisms in escaping environmental disasters, or in accomplishing tasks that protect local ecology and improve the natural environment, in future biohybrid research. Apart from these applications, this paper also provides insights into measuring human attention and divides human perception into situation-oriented and communication-oriented perception, so future discussion of human perception can be separated into these two classes. The method in this paper for building a human-fish biohybrid system takes advantage of the enhancement of visual perception, so the designed MR devices can be applied to other species with visual perception in both academic and industrial scenarios. For example, this work could inspire future research on enhancing cows' perception of living on open grassland with MR devices to increase milk production and quality. Enhancing the perception of disabled people with MR devices is another possible application.
However, for organisms that lack visual perception, such as some insects (e.g. ants) and plants, other perceptions need to be considered for enhancement to build a cross-species interaction biohybrid system.
These experiments investigated the impacts of different enhanced perceptions (situation- and communication-oriented) on increasing people's attention and promoting social understanding of natural organisms. However, the experiments are not perfect, and there is still ongoing work to complete and analyze with this benchmark. Further studies could investigate how perception-enhanced human-nature interaction influences the cognition and behavior of natural organisms (e.g. fish). From the perspective of the fish, interaction with the artificial system in this experiment occurred only through virtual 'fish' images on the back screen of the tank, which provided the living fish with additional visual stimulus alone. This visual stimulus may not be captured by the fish during experiments due to their poor eyesight. In the future, biomimetic robotic fish could replace the virtual 'fish' images, providing the fish with more aspects of enhanced perception when interacting with humans. From the aspect of user experience, all the computation and object-detection algorithms could be integrated into the MR device (HoloLens) without additional screens, enhancing the experiment's realism and letting users become better immersed in the biohybrid system.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: http://58.246.144.58:19700/d/710c4e9d398e45558a90/.