How to Fool Your Robot: Designing Exploitable Sensory Systems

Based on real-world interactions in our lives and in the lives of our ancestors, humans have developed a multitude of psychological, social, and reflexive actions for efficient living. We consider the integration of similar behaviours into embodied robots through the design of their sensory systems, evaluating their impact through a novel lens: how magicians exploit these human behaviours in order to fool their spectators into experiencing impossible events. We explore the consequences of designing agents which can experience magic effects, and argue that such design facilitates lifelike actions.


Introduction
Over the last 20 years, researchers have become increasingly interested in the psychology of magic and the psychological effects which have been empirically discovered and exploited by magicians [1,2]. In particular, misdirection - the intentional control of a spectator's focus - has proven a fruitful area for exploration and categorisation of psychological phenomena [3]. Magicians exploit learned social cues, psychological behaviours, and optical illusions [4,5,6,7], alongside a conscious suspension of disbelief similar to that employed when watching films or reading books [8]. The effect emerges from a mismatch between predicted and observed events [9], much like the psychology of humour [10,11].
During a similar time frame, the fields of robotics and artificial intelligence (AI) have continued to grow exponentially. Researchers have explored the uses of AI in performance-based creativity tasks including comedy, acting, and painting [12,13,14]. Few examples exist for magic performances: Zaghi et al. train deep neural networks to track and analyse simple coin tricks using object permanence [15], whilst Williams and McOwan use optimisation techniques to design the most deceptive puzzles [16,17]. However, magic relies on a knowledge and experience of physical laws, which is difficult for AIs to understand without a form of embodiment. Magicians exploit the assumptions and affordances which we learn and develop from real-world interactions [18], based on the information received by our sensory systems. Though this is most commonly visual and audio information, other senses are involved in similar processes, with pickpockets exploiting a form of tactile change blindness [19].
By considering the effect of magicians' techniques on physical robotic sensory systems, we set the stage for future developments of embodied AI. Examining the behaviours which lead to being fooled sheds light on biomimetic actions [20,21,22], thought processes [23], expectations [24], and perception [25], enabling the development of more lifelike robots. In this work we assume that our robotic agent has finite processing and storage capabilities, and is not omniscient, i.e. it has access only to the information collected by its sensors. Under these constraints, we focus on the practicalities of implementation, and how the placement of imperfections in our system could lead to our robot being fooled by conjuring techniques. To evaluate this, we must first define what we mean by 'fooling': Section 2 explores a number of suggestions. From here, we categorise and explore a selection of phenomena which lead to this experience in humans, discussing the practicalities and implications of reproducing these behaviours in our robotic designs.

Probabilistic Approaches
Ideally, we aim for a quantitative measure of 'being fooled' when watching a magic performance. Lamont and Wiseman [4] qualitatively consider a magician's spectator to be fooled when they perceive the effect and not the method. This is a good starting point for our considerations: the effect is a physically impossible set of events which the magician encourages by controlling and manipulating the available sensory data [26,27]. If our agent chooses this interpretation over the genuine actions then we consider it fooled; however, it is difficult to assign quantitative thresholds as to when this occurs, or whether one agent is more fooled than another.
With this in mind, a number of probabilistic models have been developed, building upon the popular 'Bayesian brain' model in which predictive coding deduces the likely source of incoming information [28,29,30]. This is achieved using prior knowledge and real-world experiences which, in order for a robot to have equivalent knowledge, must be similarly experienced or explicitly programmed. Grassi and Bartels [9] consider predictive coding in the context of magic & magicians' techniques, in which magicians aim to maximise the prediction error between an anticipated prior and the observed sensory input, manipulating these inputs alongside the spectator's memory-based expectations. In this way, fooling can be explicitly quantified as the Shannon surprise between the predicted and observed distributions [31]. However, this contrasts with the 'impossibility' models in that fooling corresponds to a near-zero, but still modelled, probability. Additionally, Grassi and Bartels note that 'while not all surprises are experienced as magic, all magic is experienced as surprise'. This suggests that there are additional contributors to the experience of magic and its enjoyment; Caffarrati et al. observe strong neural responses to magic tricks using scalp EEG recordings [32].
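As a minimal sketch of this quantification (with illustrative probabilities, not empirical data), the Shannon surprise of an observed outcome under a predicted distribution distinguishes a low-probability but modelled event from one the model never accounted for:

```python
import math

def surprise(predicted: dict, observed: str) -> float:
    """Shannon surprise (surprisal, in bits) of an observed outcome
    under the agent's predicted distribution."""
    p = predicted.get(observed, 1e-9)  # near-zero floor for unmodelled outcomes
    return -math.log2(p)

# Before the reveal, the agent strongly predicts the coin remains in the bag.
prior = {"coin in bag": 0.95, "coin palmed": 0.05}

print(surprise(prior, "coin palmed"))    # modelled but unlikely: ~4.3 bits
print(surprise(prior, "coin vanished"))  # unmodelled outcome: ~30 bits
```

The unmodelled 'vanish' produces vastly higher surprise than the merely unlikely 'palm', mirroring the distinction between a fooled agent and one that simply witnessed an improbable event.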
Nonetheless, throughout this work we consider a Bayesian framework, choosing to focus on the numerous factors which lead to this erroneous prediction, rather than those which evoke an emotional response from its occurrence. This framework helps to model a multitude of psychological effects which magicians have learned to exploit: for example, an observer believing that a bag already contains a coin will believe this even more strongly upon hearing a 'clink' when the bag is tapped with a solid pen. To our Bayesian agent, this aural information transforms the prior - a distribution skewed in favour of a coin's presence - into a more skewed posterior. When the bag is then seen to be empty, the effect is the same: the coin has vanished from the bag. However, by further skewing the distribution (using a technique referred to by magicians as a 'convincer'), the surprise between the anticipated event ('the coin is definitely in the bag') and the visual information of an empty bag is larger, leading to our agent being more fooled. Similarly, the model explains the difference in performing a trick to an audience of magicians or to an audience primed to be inherently suspicious: these experiences lead to different priors which require more observable evidence to shift in favour of effect over method [24,5].
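The convincer's effect on our agent can be sketched as a single Bayes' rule update (the likelihoods below are assumed for illustration, not measured):

```python
def bayes_update(prior: float, p_obs_given_h: float, p_obs_given_not_h: float) -> float:
    """Posterior P(H | observation) via Bayes' rule for a binary hypothesis H."""
    num = p_obs_given_h * prior
    return num / (num + p_obs_given_not_h * (1 - prior))

# Hypothesis H: 'the bag contains a coin'.
p_coin = 0.8                               # spectator already leans towards H
# Convincer: a 'clink' is heard when the bag is tapped with a solid pen.
# A clink is near-certain if the coin is present, rare otherwise.
p_coin = bayes_update(p_coin, 0.99, 0.05)
print(round(p_coin, 3))                    # posterior skewed further towards H
```

With these numbers the belief rises from 0.8 to roughly 0.99, so the later sight of an empty bag produces a correspondingly larger surprise.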
In these examples, the requirement of embodiment is not necessarily clear. A camera and microphone which feed into a network trained for object permanence could also be fooled by this series of actions, despite having little real-world experience. However, convincers occur to various extents throughout the entire effect: the dynamics of the coin as it leaves the magician's hand follow the expected parabolic arc; the bag deforms at the moment the coin hits its interior, with exactly the shape and magnitude expected of the perceived fabric and coin weight/velocity; the sound is exactly that which we expect from the ballpoint pen which the spectator is so used to using in their everyday life. Beyond extreme cases, a spectator does not consciously consider all of these variables, but each sensory input which matches their real-world expectations serves to shift the posterior distribution further from the final effect. As the tricks become more complex, these expectations become increasingly important in augmenting the effect: the magician relies on exploiting both the lower-level priors (e.g. 'the bag now contains a £1 coin') and the hyperprior understandings of the world learned from experience of physical laws (e.g. 'a £1 coin will make a clink when tapped through a felt bag with a ballpoint pen'). Embodiment is therefore necessary in developing this broad set of hyperpriors, but also in the experience of many tricks: we are especially surprised when a magician makes an object vanish from inside our own fist, since we are able to constantly feel and confirm its presence up until the final moment. Furthermore, we wish to consider the implementation of imperfect embodied agents: these may still have gaps in their hyperprior knowledge, as discussed in Section 3.3. Whilst the idealised Bayesian model assumes that an agent can consider every possible outcome in parallel, these gaps can lead to the observed outcomes being unaccounted for by the model; the agent does not know whether the outcome was likely or unlikely. Most robot control algorithms would consider an unknown state of this type to be highly undesirable, aiming to use the available sensory information to return to known conditions. Whether this state can be equated with the human experience of 'being fooled' or just 'being confused' is beyond the scope of this discussion. In Sections 3 & 4 we consider the sensory underloads & overloads which magicians exploit to induce fooling or confusion, and how their techniques are likely to affect an imperfect agent. Before this, we briefly examine the existing literature which categorises these techniques.

Existing Taxonomies
Broadly speaking, magicians control the sensory information perceived by their spectators using misdirection: a topic which has been widely discussed and debated in the magic literature. A popular manual [33] describes the principle: 'wherever you direct their attention, the audience will look there'. Nelms [8] subdivides misdirection into that of the mind, attention, and eye, whilst Corinda [34] distinguishes between misdirection achieved by speech and that achieved by action. A number of more thorough taxonomies and categorisations of misdirection techniques have been developed [35,36,4,37]: the reader is directed to Kuhn et al. [38] for a thorough analysis and comparison. At the highest level, this work separates perceptual, memory, and reasoning misdirection. Whilst all three levels contribute to the Bayesian framework described above, we focus here on the perceptual, which relates most strongly to an embodied agent's sensory collection and filtering. This is subdivided into attentional and non-attentional forms: using psychological phenomena to control what the spectator should and shouldn't perceive, respectively. Sections 3 & 4 take a selection of techniques from each category, examining the arguments for and against designing an agent which is vulnerable to each. Finally, Section 5 briefly considers the 'ruse/feigning' sublevel of reasoning misdirection, in which an agent is presented with false information.

Dealing with Incomplete Information
The first case to consider is that in which our agent is not presented with sufficient sensory information to strongly reach a particular conclusion, i.e. the Bayesian posterior has a large standard deviation, rather than a sharp peak. Examples include: concealment, in which an object is obscured from view and the observer assumes its location; and speed, in which visual information is presented faster than the framerate of the agent (equivalent to saying that 'the hand is quicker than the eye').
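The speed case can be made concrete with a small sketch (the timings are hypothetical): a secret move that is briefer than one frame interval can fall entirely between the agent's visual samples.

```python
def frames_capturing(move_start: float, move_duration: float, fps: float) -> int:
    """Count the camera frames that fall within a brief move (times in seconds)."""
    interval = 1.0 / fps
    # Index of the first frame sampled after the move begins.
    first = int(move_start // interval) + 1
    count, t = 0, first * interval
    while t < move_start + move_duration:
        count += 1
        t += interval
    return count

# A 20 ms sleight performed between the samples of a 30 fps camera:
print(frames_capturing(0.505, 0.02, 30))  # no frame captures the move
```

For such an agent the move simply never happened; only a higher frame rate (or a prior that secret moves exist) restores the missing information.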
An idealised Bayesian agent could simply consider all available scenarios at once, whilst keeping track of and updating their associated probabilities. However, the true parallel tracking of every possibility requires infinite bandwidth, and a practical implementation must instead 'fill in the gaps' to reduce the search space to a tractable size. The same is true of humans: we can't focus on every method, and instead take cues from the magician's behaviours and actions to fill in the gaps in our sensory inputs and draw our own conclusions. For example, we regularly rely on gestures and gaze direction in our social interactions to 'fill in' unknown information. If we have been told that there is a cat stuck in a tree which is obscured from our viewpoint, we deduce its location by following the direction in which a crowd of people looks and points. Though it is still possible that the cat is elsewhere, or that the cat does not even exist (see Section 5), we instinctively use these social cues as an input to reduce the standard deviation of our Bayesian posterior.
Similarly, if we missed the moment at which a magician picked up a coin, such that it could be in either hand, we instinctively focus on the clenched fist towards which the magician looks and points.Provided this is not done consciously, we soon forget to consider that the coin might not be in this hand, experiencing surprise when this is shown to be the case.
To be similarly fooled, our agent must have learned to instinctively recognise and follow these social cues; else it experiences minimal surprise when the hand is shown empty. As we strive to develop robots which can socially interact with humans, these are the types of developments which shift the robot out of the 'uncanny valley' [39] and into the realm of lifelike interactions. Conversely, we are not fooled by these same actions if we realise their deliberateness: if the magician intently stares at one fist and rigidly points with the other, we grow so suspicious as to shift the posterior in favour of the pointing hand [20]. It is not as clear that a socially-adept robot would implement this suspicion threshold unless it had been explicitly programmed against deception. It is therefore straightforward to program a robot which is never fooled by these actions - neglecting all social cues - or which is always fooled - believing all perceived social cues - but much more difficult to produce a lifelike balance between the two. The choice strongly depends on application: companionship robots could be designed to follow all social cues, whilst a safety-critical application might be less trusting.
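One hypothetical middle ground between these extremes is a cue-trust rule with a suspicion threshold; the update rule, threshold, and cue-strength scale below are our own illustrative assumptions, not an established model:

```python
def follow_cue(p_left: float, cue_strength: float, threshold: float = 0.9) -> float:
    """Update P(coin in left fist) given a gaze/point cue towards the left fist.

    cue_strength in [0, 1]: 0 = no cue, 1 = rigid, exaggerated staring/pointing.
    """
    if cue_strength > threshold:
        # Suspiciously deliberate cue: shift belief towards the *other* hand.
        return p_left * (1 - cue_strength)
    # Natural cue: shift belief towards the indicated hand.
    return min(1.0, p_left + cue_strength * (1 - p_left))

print(follow_cue(0.5, 0.4))   # natural glance: belief in the left fist rises
print(follow_cue(0.5, 0.95))  # exaggerated stare: belief in the left fist drops
```

The single `threshold` parameter then becomes the application-dependent design choice: high for a trusting companionship robot, low for a safety-critical one.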

Dynamic Expectations
Another exploitable shortcut made by our brains is the constant prediction of dynamics.We anticipate a ball's parabolic arc as soon as it is thrown, based on our prior observations of rigid body mechanics.We can approximate the position at which the ball should be caught just by observing its release, though in practice we continue to observe the trajectory in order to make closed-loop prediction updates.
Kuhn and Rensink [40] demonstrate how these short-term predictions can be exploited in a magic effect, causing a ball to seemingly vanish despite never being thrown, in the same way a dog begins to chase a tennis ball which has not been released. Before the vanish, a number of ordinary throws serve to prime the spectator's dynamic expectations, leading to reports of the ball vanishing mid-air. Interestingly, eye tracking shows that this deception does not occur in the oculomotor system, with the authors hypothesising that 'the illusory effect is caused by covert redirection of the attentional spotlight to the predicted position of the ball'. In a robotic implementation, this corresponds to the predicted dynamics of an object being weighted more heavily than the visual input of its motion. This might be beneficial if the visual systems operate with low frame rates, such that the current position could significantly deviate from the last observed position. Trajectory predictions are highly useful in a number of dynamic robots, and high surprise levels could be temporarily caused by a mismatch in prediction and observation, before the posterior is shifted to the ball remaining in the hand. Small mismatches could even be used to prevent fooling: if a thrown object does not follow the expected trajectory, an agent might infer the presence of threads or magnets which are supposed to remain hidden.
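The prediction-versus-observation weighting can be sketched as a simple fusion of a ballistic prior with a visual measurement (the weighting scheme and numbers are illustrative assumptions):

```python
G = 9.81  # gravitational acceleration, m/s^2

def predicted_height(v0: float, t: float) -> float:
    """Expected height of a thrown ball under the learned parabolic hyperprior."""
    return v0 * t - 0.5 * G * t ** 2

def fused_estimate(pred: float, obs: float, w_pred: float) -> float:
    """Blend the dynamic prediction with the visual input; an agent that weights
    the prediction too heavily can 'see' a ball that was never released."""
    return w_pred * pred + (1 - w_pred) * obs

pred = predicted_height(4.0, 0.2)      # the ball 'should' be ~0.6 m above the hand
obs = 0.0                              # visually, it never left the hand
print(fused_estimate(pred, obs, 0.9))  # prediction-dominated: ball appears airborne
```

With `w_pred` near 1 the fused estimate tracks the phantom trajectory, reproducing the vanish; with `w_pred` near 0 the agent is never fooled, and the residual between the two terms provides exactly the mismatch signal that could expose threads or magnets.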
The interesting question is then perhaps whether there are benefits to designing an agent which focuses on predictions and disregards its sensory input to experience the ball vanishing mid-air. Since Kuhn and Rensink find priming to increase the likelihood of this effect, we can consider a bio-inspired energy-efficient robot which lowers the processing power designated to repeated observations: if it has seen and predicted multiple throws of the ball, it could lower its frame rate during future throws. Magicians have known and exploited similar human behaviours for decades: in a chapter on controlling attention, Nelms [8] notes how 'monotony kills interest'.

Amodal Completion
Camí et al. [37] note the influence of unconscious amodal completion in deception: that is, perceiving an entire object despite its partial concealment. As well as forming the basis of many optical illusions, this effect can be exploited by magicians: if we see a pen being held horizontally by its centre, we visualise the concealed middle of the pen, based on our prior experience and knowledge of biros. If our embodied agent has learned the concept of a pen, its onboard vision system can use the available information to assign a high probability to the object being a pen, rather than two separate halves or a painted piece of wood. These deductions are vital in a robot which navigates and maps its surroundings based on the available information; it should not need to see every side of a chair to deduce the presence of a chair, assuming the floor space which it occupies. For any level of task-efficiency, the robot must rely on both its previous experience of physical interactions with common objects and on visual training data, rather than reconstructing these assumptions from scratch. Magicians are well aware of a similar necessity in humans, often relying on everyday objects being overlooked in order to facilitate deception [33].
Conversely, amodal absence can be exploited: if most of an empty hand is visible, we find it easy to imagine that the entire hand is empty. This effect has already been shown to be a contributor to road accidents [41], suggesting that similar psychological assumptions should not be programmed into self-driving cars or safety-critical applications.

Hyperprior Incompleteness
The final example of incomplete information which we consider is that of an incomplete or incorrect background knowledge. In the Bayesian framework, Grassi and Bartels [9] refer to 'deeply held hyperpriors' such as physical laws: we experience a high surprise if our sensor inputs suggest that these hyperpriors are violated. Robots do not necessarily need to follow the same hyperpriors as us, depending on their environmental niche [42]: an ice-hockey playing robot might experience high surprise if it encounters an object which does not glide smoothly along the floor, but its restriction to a 2D plane might mean that it requires no hyperpriors of gravitational laws, and experiences no impossibility when witnessing a levitation.
Lamont and Wiseman [4] identify a 'lack of scientific knowledge' as a potential area of exploitation for magicians: early fraudulent séances used phosphorescent paints to create ghostly apparitions which would be unlikely to faze modern audiences [43]. More recently, advances in self-healing materials might prove pivotal in the development of new methods for 'cut-and-restored' magic effects. Any agent without knowledge of these materials could be fooled by their demonstration - robots would need to be kept up to date with scientific breakthroughs to avoid this kind of deception. Similarly, Kuhn et al.'s taxonomy [38] considers 'false assumptions about magic' as a sublevel of reasoning misdirection.
On a psychological level, it is well observed that we often behave less randomly than we expect, such as the presence of biases when asked to select a random number [44]. Magicians use this in their favour to increase the likelihood of an impossible experience [34], though it seems unlikely that robotic spectators would be exploitable in this way: a robot can easily generate a random playing card, without a bias towards the Ace of Spades or Queen of Hearts.
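A robotic spectator's 'free choice' really is uniform, as a two-line sketch shows: every one of the 52 cards is equally likely, with none of the Ace-of-Spades skew observed in human selections.

```python
import random

# Build a standard 52-card deck and draw uniformly at random from it.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["S", "H", "D", "C"]
deck = [r + s for r in ranks for s in suits]

print(random.choice(deck))  # each card has probability exactly 1/52
```

Any forcing technique which relies on skewed human selection statistics therefore fails against such an agent, though physical forces (controlling which card is available to take) would still apply.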

Dealing with an Information Excess
We next consider the opposite case: rather than making assumptions to fill in the gaps from missing information, we now imagine that there is too much information for our robot to process in real-time, and that it must decide which sensory inputs to ignore, simplify, or not examine in depth. This threshold is of course dependent on the physical hardware - we assume that any practical implementation must have a finite limit, and cannot be omniscient. We briefly consider the effect of memory, or stored information, before looking at bio-inspired methods of sensory filtering.

Memory
We assume that our agent's on-board memory is sufficiently limited to prevent the storage of all sensory data for later processing; it is more efficient to store key moments, images, or conclusions.
According to Kuhn et al.'s taxonomy [38], memory misdirection can be subdivided into forgetting and mis-remembering, which can occur on different timescales. A thorough analysis of the role of memory in the reconstruction and recollection of magic performances is well beyond the scope of this work, though it is empirically observed that spectators of a magic effect will tend to augment the impossibility of their experience during recollections [4]. Unless an agent holds an incomplete or incorrect hyperprior (Section 3.3), a perfect storage of all available information should eventually enable deduction of the method used through replaying, especially with the knowledge of the final effect. This relates to the common advice for magicians to never repeat a trick to the same audience. We focus here on instantaneous fooling, i.e. the initial discrepancy between prediction and observation at the moment of the effect.

Sensory Filtering
We consider sensory filtering - the separation of useful information from available background noise - to occur in two parts. The first is physical and largely refers to sight, in which the vision system (a mounted camera or our head and eyes) can be redirected to select the perceived area. The second considers information which is physically received, but psychologically neglected: humans perform much of this filtering unconsciously, which could provide clues in the design of lifelike robots.
Many of the physical choices made for gaze direction are contextual, such as looking directly at a task being performed or a person to whom we are speaking. Within this area of interest, magicians seek to direct a spectator's gaze towards or away from a specific stimulus, much as with the social cues discussed in Section 3. Being asked a question or called by name can induce a brief moment of eye contact, which is exploited at the moment of a secret move [45]. Social robots without these inbuilt reflexive actions might seem distant or unengaged with human interactions. However, tasks which require extreme focus should not involve these reflexes, requiring some kind of hierarchy-based architecture for prioritisation.
Amongst other factors, movements play a large part in directing visual attention and in inducing change blindness [46,47]. Pickpockets differentiate between fast linear and slower curvilinear gesture trajectories in order to control attention [19], whilst magicians rely on larger gestures to cover smaller secret movements [8]. Again, the duplication of these behaviours in our robots would help them seem more natural. This alone is not a reason for implementation - these behaviours would also emerge from a bio-inspired algorithm which tracked objects which it believed to be of the highest interest, based on context, motions, history, and social cues. Choosing which objects to touch can also be grouped under our physical filtering category: physically interacting with an object provides details about its size, shape, and material properties. If an agent's fitness function were based solely on reducing uncertainty then these interactions would be constant. This is not what we see in humans, indicating a number of other factors in this motor control: social conformity, lack of interest, or belief that we already know the object's properties (Section 3.2).
The post-perception filtering of information can be either conscious - such as straining to hear a friend's conversation in a crowded room - or unconscious, such as failing to feel the glasses resting on your head during a prolonged search. A notable example is that of change blindness - the unawareness of a stimulus's visual change, despite its obviousness when attention is drawn to it. Its effect is widely studied beyond the scope of this work, but we briefly note how it has been observed to result from transient interruptions such as blinking, eye movements, or flickering [48,49]. Kuhn et al. investigate a spectator's experience of a magic trick in which the method is clearly visible, using social cues to draw attention away from this area [50]. If we continue to develop and incorporate bio-inspired robotic vision systems which filter the incoming information into regions and objects of interest, we would expect our systems to be similarly susceptible to change blindness in real-time if the system's rules are exploited to conceal stimuli.
Post-perception filtering need not necessarily be visual. Tactile information is similarly filtered - Macknik et al. analyse the pickpocket's exploitation of certain aspects of our somatosensory systems during the theft of a watch [19]: "To steal a watch directly from the wrist of a mark, the pickpocket might first squeeze the wrist while the watch is still on (invoking contrast gain adaptation). This has two effects. First, it makes a high contrast somatosensory impression that adapts the touch receptors in the skin, making them less sensitive to the subsequent light touches that are required to unbuckle and remove the watch. Second, the high contrast impression leaves behind a somatosensory afterimage, giving rise to the illusion that the watch is still on after it has been removed".
From this, we see that available tactile information - an object in contact with the skin - is filtered and ignored by the brain, given an initial stimulus by the pickpocket. This is in part achievable because of the wrist's insensitivity compared to the rest of the hand and fingertips: it is easier to steal a watch than a wedding ring. This is due to the tactile information which we receive being pre-filtered by the morphological distribution of sensors over our bodies [51]. In robotic systems, sensor morphology similarly impacts the available data. Robots with highly sensitive hands might learn similar grasping techniques to us, whilst the relocation of all tactile sensors to one finger would certainly lead to different developments and pickpocketing techniques.

Dealing with False Information
One sublevel of Kuhn et al.'s reasoning misdirection is 'ruse/feigning', in which a magician deliberately presents their spectator with false information. As previously discussed, the level to which an agent trusts the information presented to it should be highly application-dependent. An agent which knows it is watching a magician must assign some level of uncertainty to statements to avoid being fooled, else the statement 'this is an ordinary pack of cards' immediately assigns a zero probability to any kind of gimmicked deck. We can imagine such intrinsic dubiousness being present in any robot which regularly interacts with the public, even if only through acceptable action thresholds [52]: 'You must kill all humans.' False information is not necessarily verbal - see Section 2.1's discussion on convincers in the Bayesian framework. The example here focuses on false information which directly impacts our assumptions ('there is a coin in the bag'), though this can also be indirect: if we observe a magician struggle to cleanly shuffle a pack of cards we quickly discount the possibility of intricate sleight-of-hand moves. Avoiding these effects in robotic software design is a matter of assigning uncertainty to all information received and inferred. This is commonplace in many systems which employ Kalman filtering to counter the effects of measurement noise or sensor damage. Our considerations raise a difficult question in these systems: how do we determine that our sensors are damaged, and that a series of improbable events - deception - has not instead been correctly observed?
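The damage-versus-deception ambiguity can be made concrete with a scalar Kalman-style update (the state, noise values, and 3-sigma threshold below are illustrative assumptions): a large normalised innovation flags an improbable measurement, but cannot by itself say whether the sensor is broken or a 'vanish' was correctly observed.

```python
def kalman_update(x: float, p: float, z: float, r: float):
    """One scalar Kalman measurement update: state estimate x with variance p,
    measurement z with noise variance r.
    Returns (new_x, new_p, normalised innovation in sigmas)."""
    innovation = z - x
    s = p + r                  # innovation variance
    k = p / s                  # Kalman gain
    return x + k * innovation, (1 - k) * p, abs(innovation) / s ** 0.5

x, p = 1.0, 0.04   # believed coin position along the table (m), and its variance
z, r = 1.8, 0.01   # camera reports the coin 0.8 m away; sensor noise is low

x, p, nis = kalman_update(x, p, z, r)
if nis > 3.0:      # beyond 3 sigma: sensor fault, or a correctly observed vanish?
    print(f"improbable measurement (innovation {nis:.1f} sigma)")
```

The filter happily absorbs the measurement either way; distinguishing the two explanations requires information from outside this update, which is precisely the difficult question raised above.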

Conclusions
Using the actions and behaviours which allow us to perceive magic effects, we consider the design and filtering of a robot's visual and somatosensory information in the context of morphological design, probabilistic frameworks, and psychological reflexes. We see how physical information selection emerges from social cues, the incorporation of which would help to bring our designs out of the uncanny valley and towards natural interactions.
Post-perception, psychological studies provide a very useful basis for developing 'unconscious' filtering rules in the face of hardware limitations, though we must incorporate an awareness of the potential for these to be exploited. In this way, Bayesian frameworks enable explicit considerations of the uncertainties associated with conclusions, observations, and assumptions; their magnitudes must be application-specific, often compromising between social conformity and information certainty. As we continue to develop more general-purpose and complex designs, a robot's perception of magic effects can provide a useful set of benchmarks as to which end of the spectrum we operate at.