Open to Evolve Embodied Intelligence

For the goal of automatically evolving Embodied Intelligence (EI), we investigate an open software architecture inspired by the high surface area to volume ratio of animal lungs, which aims to avoid information theoretic limits on long term evolution experiments (LTEE) encountered with monolithic genetic programming trees. Instead individuals are teams composed of 1023 trees whose inputs and outputs are linked by a low entropy loss branching data (air) pathway. Most trees are shallow and software engineering's failed disruption propagation (FDP) is observed in the small fraction of deep trees. After initial search, most improvements are at intermediate depths and performance is still rising even after 100 000 generations. Despite the use of double precision for the bifurcating data interconnect, some information loss is seen, particularly in early generations. The static optimisation benchmark appears to encourage early convergence, which locks the population into possibly sub-optimal phenotypes. Later, thousands of small improvements, sometimes in large bloated ensemble members, appear to compensate for early overfitting. Using tournament fitness selection and subtree crossover, we target pure nested side-effect free floating point functions, which are known to have low FDP, and high fidelity data paths, in the hope of generating code which is not so robust as to prevent ongoing improvement. However, we again find genetic changes deep within trees are silent. For single precision, we find a maximum evolvability sweet spot with trees of depth 10 to 100. Accordingly, we suggest that to evolve the very large, very complex programs needed for Embodied Intelligence, an open structure with a high surface area, permitting most mutation sites to be within 10-100 levels of the organism's environment, and many better placed test oracles to monitor the impact of mutated code, will be needed.


Open Complex System
Efforts to evolve artificial intelligence are running into information theoretic limits on program depth. Instead, can we evolve software systems which are like cell interiors (e.g. Figure 1)? These could have processing concentrated in thin computing membranes in a permeating data interconnect environment. The high surface area membranes might be composed of very many small programs, each of limited depth, placed side-by-side. The membranes would form an open structure with many gaps between them. The gaps themselves would support high bandwidth communication with no, or little, processing ability and consequently little information loss. Figure 2 shows 1300 small programs arranged in an open structure.
Rich Lenski, in his long-term evolution experiment (LTEE) [3,4], has demonstrated that Nature can continue to innovate in a static environment for more than 75 000 generations. We have shown genetic programming [5] can do similarly for at least 100 000 generations. However, when evolving deep structures, progress slows dramatically and therefore we feel monolithic deep structures will not be sufficient to automatically evolve complex systems. Instead an open structure like Figure 2 may be needed.
Section 3 describes initial experiments using an architecture inspired by our lungs. Section 4 considers the discovery of the maximum evolvability sweet spot and how it may change over time or in other experiments. In Section 5 we consider ways to improve these experiments, and in Section 6 we conclude that although we do see continued fitness improvement, there is still a long way to go to demonstrate automatic Embodied Intelligence. But first, the next section describes recent work based on information theory, which led to the view that failed disruption propagation is inevitable in both human designed and automatically evolved digital software, and therefore to the need to control the depth of nesting in Embodied Intelligence.

Background: Evolution and Failed Disruption Propagation in Nested Software
Taking advantage of modern high performance parallel computers, we have been investigating the long term evolution of genetic programming (GP): firstly, using Poli's submachine code GP [6,7] to evolve large binary Boolean trees [8], and more recently exploiting SIMD Intel AVX and multi-core parallelism to evolve floating point GP [9,10]. Running for up to a million generations without size limits has generated, at two billion nodes, the biggest programs yet evolved and forced the development [11,12,13] of, at the equivalent of more than a trillion GP operations per second, the fastest GP system [14,15,16]. It has also prompted detailed analysis of programs [17], including from an information theoretic [18] perspective [19,20,21]. (Of course information theory has long been used with evolutionary computing, e.g. [22].) One immediately applicable result has been the realisation that in deep GP trees most changes have no impact on fitness and, once this has been proved for a given child, its fitness evaluation can be cut short and fitness simply copied from the parent. This can lead to enormous speed ups [23].
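The early-exit idea can be sketched as follows. This is a minimal illustration, not the implementation of [23]: the `Node` class, the three-function set and the per-node output cache are all assumptions made for the example. After a subtree is replaced, outputs are recomputed one ancestor at a time, and the walk stops as soon as an ancestor's outputs on every test case match its cached values, at which point the child's fitness must equal its parent's.

```python
import operator

# Hypothetical function set for illustration only (not the paper's Table 1).
OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


class Node:
    def __init__(self, op, children=(), value=None):
        self.op = op                  # "ADD", "SUB", "MUL", "X" or "CONST"
        self.children = list(children)
        self.value = value            # used only by "CONST" leaves
        self.parent = None
        self.cache = None             # this node's output on every test case
        for child in self.children:
            child.parent = self


def compute(node, xs):
    """Outputs of one node, reusing its children's cached outputs."""
    if node.op == "X":
        return list(xs)
    if node.op == "CONST":
        return [node.value] * len(xs)
    left, right = (child.cache for child in node.children)
    return [OPS[node.op](a, b) for a, b in zip(left, right)]


def evaluate_full(node, xs):
    """Evaluate a whole (sub)tree bottom-up, filling every cache."""
    for child in node.children:
        evaluate_full(child, xs)
    node.cache = compute(node, xs)


def replace_subtree(old, new, xs):
    """Swap `old` (not the root) for `new`; return True if the root's outputs change."""
    parent = old.parent
    evaluate_full(new, xs)
    parent.children[parent.children.index(old)] = new
    new.parent = parent
    node = parent
    while node is not None:
        fresh = compute(node, xs)
        if fresh == node.cache:
            return False              # disruption failed to propagate: stop early
        node.cache = fresh
        node = node.parent
    return True                       # disruption reached the root
```

For example, in the tree `(X - X) * (X + 1)` any replacement of the right-hand subtree is masked by the multiplication by zero, so `replace_subtree` returns `False` after recomputing only the root.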
Without insight [24] into the construction of the test suite (i.e. in black box testing), and where fitness testing is limited by failed disruption propagation (FDP) [25], simply increasing the number of tests suffers diminishing returns and their joint effectiveness at best increases by a slow logarithmic O(log n) factor when tests are independent [21]. Instead FDP suggests that, if possible, fitness tests should be augmented by checking values as close to the genetic change as possible. Alternatively, to alleviate FDP, we could move the genetic changes closer to the existing test oracle [26] (at the tree's root). However, this risks generating huge amounts of "dead" non-evolving deep code, which once created becomes fossilised and is never subsequently modified.
We have also considered traditional (human written) imperative programs and shown these to be much more robust than is often assumed [27,28]. Indeed we suggest that information theory provides a unified view of the difficulty of testing software [29,30,25], particularly of positioning test oracles [31,32,20].
The question of why fitness is so often exactly inherited [33,34,35], despite brutal genetic change, is answered by the realisation that without side effects, the disruption caused by mutation or crossover must be propagated up the tree through a long chain of irreversible operations to the root node. Being irreversible, each function in the chain can lose entropy. In many cases deeply nested functions progressively lose information about the perturbation, and the disruption fails to propagate to the program's output, giving rise to a deep neutral network. Thus the mutations and crossovers become invisible to fitness testing and their utility cannot be measured. Without fitness selective pressure, evolution degenerates into an undirected random walk [8,10].
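A toy illustration of how a chain of nested, many-to-one floating point operations can erase a perturbation is sketched below. The function `v/2 + 1` is a hypothetical example chosen only because it contracts differences; it is not taken from the evolved trees. Single precision is emulated by rounding Python doubles through `struct`.

```python
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def step(v):
    """One nested operation: a hypothetical contraction, v/2 + 1, in float32."""
    return to_float32(to_float32(v * 0.5) + 1.0)


def levels_until_lost(a, b, max_levels=1000):
    """Nesting levels needed before two inputs give bit-identical outputs."""
    a, b = to_float32(a), to_float32(b)
    for level in range(max_levels + 1):
        if a == b:
            return level
        a, b = step(a), step(b)
    return None  # the difference survived max_levels levels


if __name__ == "__main__":
    # A perturbation of 0.001 at the bottom of the chain eventually vanishes.
    print(levels_until_lost(1.0, 1.001))
```

Each level halves the difference between the two values until it falls below single precision resolution, after which no test at the output can distinguish the perturbed program from the original.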
In bloated structures information loss leads, from an evolutionary point of view, to extremely high resilience, robustness and so stasis. From the engineering point of view this is problematic, as then almost all genetic changes have no impact and evolutionary progress slows to a dawdle. Since all digital computing is irreversible [36], it inherently loses information and so, without short cuts, must lead to failed disruption propagation (FDP). We suggest that in order to evolve large complex systems it must be possible to measure the impact of genetic changes. Therefore we must control FDP, and in the next section we suggest that large systems to be evolved should be composed of many programs of limited nesting depth and structured to allow rapid communication of both inputs and outputs to the (fitness determining) environment.

Experiments with a High Surface Area Lung Like Architecture
We seek a software architecture with a large interface between thin evolvable code and I/O data. We take inspiration from lungs (Figure 3), which hold a large surface area between the animal's body, particularly its blood vessels, and the air, in order to permit exchange of gases such as carbon dioxide and oxygen out of and into the animal. Although the lungs of different mammals vary enormously in size, the airways show a common bifurcating pattern with the diameter of the airways decreasing as $2^{-n/3}$ for the first 17 branches. Branching continues to n=24 but with a slower power law [37].
To limit computational costs, in our initial experiment we limit our "lungs" to a bifurcation depth of n=10, giving 511 branch points (bronchioles) and 512 alveoli ends (total 1023, Figure 4). We anchor a traditional genetic programming (GP) tree [5] at each (giving 1023 evolvable trees per individual). We think of each tree as having the task of regulating the passage of data through the data channel beneath it. The output of each tree is discharged into the channel. In addition to the usual GP primitives (see Table 1), trees at branch points can monitor data upstream of them, data in the channel to their left and to their right. All the trees can read how deep they are in the "lung" using the special leaf D, which has a value 1 to 10. The 512 end point trees (D=10) in each individual are evaluated first. These are followed by the 256 trees closer to the "lung" "output" (D=9). Thus level 9 trees can access left, right and current values, which have been calculated from the outputs of the neighbouring level 10 trees. These are followed by the 128 level 8 trees, and so on until the level 1 tree.
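The tree counts and the deepest-first evaluation order can be checked with a short sketch. The 1-based heap indexing used here (tree `i` has left and right channels `2i` and `2i+1`) is an assumption made for illustration; the text does not say how the trees are stored, nor exactly how the "current", "left" and "right" values are computed.

```python
LEVELS = 10  # bifurcation depth used in the initial experiment


def trees_per_level(levels=LEVELS):
    """Number of trees at each depth D (1 = lung output, levels = alveoli ends)."""
    return {d: 2 ** (d - 1) for d in range(1, levels + 1)}


def depth_of(index):
    """Depth D of the tree stored at 1-based heap index `index` (assumed layout)."""
    return index.bit_length()


def children_of(index, levels=LEVELS):
    """Heap indices of the left and right channels feeding a branch point."""
    if depth_of(index) >= levels:
        return ()  # alveoli ends have no deeper channels
    return (2 * index, 2 * index + 1)


def evaluation_order(levels=LEVELS):
    """All tree indices, deepest level first, finishing with the level 1 tree."""
    total = 2 ** levels - 1
    return sorted(range(1, total + 1), key=depth_of, reverse=True)


if __name__ == "__main__":
    counts = trees_per_level()
    print(counts[10], sum(counts.values()) - counts[10], sum(counts.values()))
```

With this ordering every branch point is evaluated only after both of the channels feeding it, which is what allows level 9 trees to read values derived from level 10 outputs.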
To minimise entropy loss, double precision is used throughout the data channel, e.g. for calculating "current". However, as usual, single precision floating point is used inside the trees and they generate single precision outputs. Double precision is also used when calculating the mean of the 1023 outputs during fitness calculation.
Initial trees are each created with ramped half and half [39], with depth between 2 and 6. Crossover is 100% unbiased subtree crossover. Runs last 100 000 generations, with no size or depth limits. DIV is protected division: (y!=0) ? x/y : 1.0f.
For each test case, the GP's output is the mean of all 1023 trees. Fitness is the sum over the 48 test cases of the absolute difference between the mean output and the target value. Occasionally an individual tree gives a very bad non-finite value. This invalidates the mean of all the trees, but such individuals have very poor fitness, are seldom selected to be parents, and so are excluded from the population without any special processing.
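The protected division and the fitness calculation can be sketched as below. This is an illustration under stated assumptions: Python floats are double precision, whereas the trees themselves use single precision, and mapping a non-finite total to `math.inf` is an assumption of this sketch (the text notes that the original needs no special processing for such individuals). The helper names are hypothetical.

```python
import math


def protected_div(x, y):
    """DIV as defined in the text: x/y unless y is zero, in which case 1.0."""
    return x / y if y != 0 else 1.0


def team_output(tree_outputs):
    """Mean of the outputs of all trees in one team on one test case."""
    return sum(tree_outputs) / len(tree_outputs)


def fitness(outputs_per_case, targets):
    """Sum over test cases of |mean team output - target| (lower is better)."""
    total = 0.0
    for tree_outputs, target in zip(outputs_per_case, targets):
        total += abs(team_output(tree_outputs) - target)
    # Assumption: report any non-finite total as the worst possible fitness.
    return total if math.isfinite(total) else math.inf
```

In the experiments `outputs_per_case` would hold 48 lists of 1023 tree outputs each; a single non-finite tree output is enough to spoil the mean and hence the team's fitness.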

Initial Results
Figure 5 shows that, as hoped, the new architecture is able to solve the problem in a reasonable time and continues to show improvements even after many thousands of generations.
The runs show some aspects of GP convergence [10]. Although Figure 6 shows there is variation between runs, they all show a large increase in size (bloat). Across the ten runs, 86% of trees in the best individual in the population do not change throughout the last 90% of the run. The distribution of these "converged" trees has a long tail but most are small and contain only 1, 3 or 5 nodes (i.e. up to two functions and three leaves). Indeed, at the end of the runs, about half the best trees contain five or fewer nodes. In contrast, of those trees which do change their height in the last 90% of the run, 41% (5% of all trees) contain on average more than 10 levels by the end of their run. It is the largest trees amongst this 5% that give the great increase in size seen in Figure 6.
The location of the large trees is different in each run, and appears to be essentially random. Figure 7 shows, for the best individual in the first run, the depth and location of its constituent trees.
We expect our large evolved "lung" trees to behave as other large binary GP trees, so that on average the distance from crossover points to the trees' output is about half the depth of the tree [41,42,10]. For deep trees this means any disruption caused by crossover may fail to propagate to the root node and so the child's fitness will be equal to its parent's. Figure 8 reports failed disruption propagation (FDP) in the deep trees. The solid curve in Figure 8 is the empirical fit for FDP, which suggests at least half of crossovers deeper than 63 levels will make no difference. (With other mixtures of functions, e.g. containing logic variables or conditional statements, we expect the depth needed to be much less [8,20].) In the case of shallow trees, crossovers must be near the root node. However, Figure 9 shows that at the end of the runs, although almost all shallow crossovers do disrupt fitness, only a tiny fraction improve fitness. Indeed by the end of the 10 runs the 2300 trees composed of a single leaf were sufficiently converged that in the last 1000 generations all 1 122 850 crossovers gave back the same child. As expected, with trees containing at least one function (i.e. size 3) most crossovers are disruptive (until trees are deep enough to suffer FDP). These small converged trees are the most resistant to improvement, with only about 0.1% of children being better. However, intermediate converged trees are more evolvable as deeper crossovers become possible and the fraction of improvements increases to about 0.5%. The deepest improvement is 335 levels from the root node (Figure 9). But only three of the 6 845 crossovers deeper than 200 improve fitness.
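To give a feel for the FDP fit, the sketch below assumes a pure exponential decay with amplitude 1 whose half-depth is the 63 levels quoted above. The fitted amplitude and any offset are not given in this section, so the numbers it produces are indicative only.

```python
import math

HALF_DEPTH = 63  # depth at which half of crossovers fail to propagate (from the text)


def fraction_changing_fitness(depth, half_depth=HALF_DEPTH):
    """Assumed pure exponential decay: 1 at the root, 1/2 at `half_depth`."""
    return 0.5 ** (depth / half_depth)


def decay_length(half_depth=HALF_DEPTH):
    """Equivalent e-folding depth L, so that exp(-d/L) == 0.5 ** (d/half_depth)."""
    return half_depth / math.log(2)


if __name__ == "__main__":
    for depth in (10, 63, 100, 200):
        print(depth, round(fraction_changing_fitness(depth), 3))
```

Under this assumed form the e-folding depth is about 91 levels, and only roughly one crossover in nine at depth 200 would be expected to change fitness at all.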
Figure 9. Solid purple line gives, for the ten runs in the last 1000 generations, the fraction of crossovers whose disruption not only propagates to the root node but also improves fitness (see also Figure 8). For comparison the dashed blue line gives the fraction of trees by their average depth at the end of the ten runs (rescaled by 1/30 to plot on same axis). Note log horizontal scale.

Evolution of Maximum Evolvability
In the last 1000 generations there is more chance of crossovers at least 7 levels deep being successful (see Figure 9). It appears that this is because this late in the run the population is now good and so only has scope for small fitness improvements. It seems that deeper changes may find small phenotype changes which are sometimes beneficial, whereas code changes close to the root node may make larger phenotypic changes, which could overshoot the target output. This trend seems to continue to about depth 20. Still deeper, failed disruption propagation (FDP) starts preventing many crossovers changing the phenotype at all.
We anticipate that over considerably extended evolution, perhaps many millions of generations, continued improvement of the ensemble will mean the fitness step size continues to decrease, moving the "left cliff" of the evolvability sweet spot in Figure 9 to the right. However, the "right FDP" edge will not move, leading eventually to a narrowing of the maximum evolvability range. Possibly this will be accompanied by more team members moving into that narrowing range?
With a different mix of ingredients in our trees, especially if they include conditionals or Boolean operators, there may be considerably more entropy loss, so causing much more FDP. This will reduce the maximum depth limit on the location of maximum evolvability. (I.e. move the "FDP cliff" to the left.) To avoid pinching out our evolvability bubble, we now need to consider again its left hand edge. We need to ensure that the fitness function is not too discrete. Instead, the fitness calculation must be sufficiently graded, so that as we make progress and fitness improves, it is still possible for genetic changes to find small fitness improvements.

Discussion
In theoretical work on small digital circuits, Wright and Laue [43] argue that digital evolution supports an "arrow of complexity" whereby maximum Kolmogorov complexity increases with time. Note that by using Kolmogorov complexity they measure not increase in program size (i.e. bloat) but changes in functionality. Here we see huge overfitting and so are seeing Kolmogorov complexity increase. It is doubtful that this is useful, but our goal is not to solve the benchmark problem, but to use it to explore ways and difficulties of extended evolution [44,14].
Another interesting recent experiment, by Kelly, Smith, Heywood and Banzhaf [45], considers extended genetic programming evolution (up to 89 days) using their "Tangled Program Graphs" representation. They evolve agents to play aspects of the video game ViZDoom.
In terms of our experiments, it seems we need more open ended benchmarks. We have stuck with just one genetic change per fitness evaluation [46]. However, with increasing numbers of team members this begins to look infeasible. Certainly in human genetics we see genomes being split into multiple diploid chromosomes with multiple crossover points and mutation sites and a more equal contribution of identifiable genes from the two parents. In terms of computational experiments it may be feasible to use multiple fitness measurements, possibly including cheap ways [47] to abort complete fitness evaluation early [48], to complement an increased number of crossovers or mutations per individual per generation.
Although we should be wary of building too many preconceptions into any system which hopes to show extended evolution, having studied the smothering blanket of failed disruption propagation (FDP) in pure floating point code and also in integer [20] and logic [8] expressions, perhaps it is time to adopt more directed choices of genetic changes. Alternatively, rather than cooking in our choices, perhaps such a system could learn from its own past and be self adaptive.

Conclusions
In all runs we see a few trees growing to enormous depth, which become resistant to phenotypic change due to failed disruption propagation. It may be that future evolving "thin membrane" systems will have to include mechanisms to prevent such growths or include mechanisms to exorcise them. We also see shallow trees being quickly discovered but then becoming less evolvable as the team as a whole improves. Nevertheless we have demonstrated that a new many tree architecture, inspired by the branching air pathways in lungs, is capable of supporting extended evolution. As with many evolutionary systems (and indeed life on earth) it shows early lock-in followed by many small evolutionary improvements. For our pure floating point expressions, typically about 70 trees in the ensembles evolve into the sweet spot of maximum evolvability and have a depth between 10 and 100 nested levels.

Figure 1. Small fraction of one of the active membranes within eukaryotic cells [1].

Figure 2. Lung like open complex evolving system composed of 1300 individual genetic programming functions (average height 9.22). These compute elements are placed side-by-side to form an open structure. The gaps promote short-cut side effects between functions' inputs and outputs and the environment [2].

Figure 4. Lung like data structure. Data flows from 512 alveoli ends (right) collecting outputs of evolving trees and providing them with additional inputs. Ten of 1023 trees shown.

Figure 5. Evolution of mean absolute error in ten runs of Sextic polynomial [39] with population of 500, each team composed of 1023 trees. One crossover per team per generation, i.e. 1022 trees copied without modification from the 1st parent (mum).

Figure 8. Solid red line shows the fraction of crossovers in 21 trees deeper than 300 levels that do not change fitness. Dashed blue line shows those that make it worse. Dotted purple line shows those that improve fitness. Grouped by distance from crossover site to the root node. Above 500, the bin size is increased from 10 to 100. The smooth curve (black) is the best RMS fit of an exponential decay to the fraction of crossovers which do change fitness. (Data gathered from last 1000 generations in all ten runs.)