Comprehensive analysis of gene regulatory dynamics, fitness landscape, and population evolution during sexual reproduction

The fitness landscape is a critical concept in biophysics, evolutionary biology, and genetics that depicts fitness in the genotype space and visualizes the relationship between genotype and fitness. However, the fitness landscape is challenging to characterize because the quantitative relationships between genotype and phenotype and their association to fitness has not been comprehensively well described. To address this challenge, we adopted gene regulatory networks to determine gene expression dynamics. We analyzed how phenotype and fitness are shaped by the genotype in two-gene networks. A two-by-two matrix provided the two-gene regulatory network in which a vector with two angle values (Θ) was introduced to characterize the genotype. Mapping from this angle vector to phenotypes allowed for the classification of steady-state expression patterns of genes into seven types. We then studied all possible fitness functions given by the Boolean output from the on/off expression of the two genes. The possible fitness landscapes were obtained as a function of the genetic parameters Θ. Finally, the evolution of the population distribution under sexual reproduction was investigated in the obtained landscape. We found that the distribution was restricted to a convex region within the landscape, resulting in the branching of population distribution, including the speciation process.

Usually, the expression of RNAs and proteins is a complex and 23 dynamic process that determines a resultant phenotype. In other 24 words, the relationship between genotype and phenotype de-25 pends on complex gene expression dynamics. Fitness, then, is a 26 function of dynamically regulated phenotype. 27 We adopted gene regulatory networks (GRN) that describe 28 gene expression dynamics to address these questions about the 29 fitness landscape. First, we showed that the degree of activa-30 tion or inhibition in expression dynamics continuously defines 31 genotype parameters. Second, phenotypes, that is, protein ex-32 pression levels, are determined by gene expression dynamics; 33 whereas, genomes provide the gene regulatory network. Thus, a 34 complex genotype-phenotype relationship was obtained. Then,

38
In the first part, the phenotype and fitness are obtained from 39 the genotype in a two-gene network. Because the two-gene 40 network has only a few possible steady states, we classified these 41 Figure 2 Regulatory network of two genes. The arrows represent the transcribed regions and the boxes represent the promoter regions. The mRNA is transcribed from each of the X-and Y-transcribed regions, from which proteins X and Y are synthesized. These proteins bind to the promoter regions of the two genes and regulate their transcription. The translation is assumed to occur very quickly and is therefore, omitted from this figure. Because proteins X and Y regulate genes x and y, there are four interactions. The magnitudes of these interactions are indicated by J xx , J xy , J yx , and J yy .
steady states by introducing the angle vector Θ = (θ x , θ y ) from 42 the 2×2 gene regulation matrix. Θ provides the characteristic 43 parameters of the genotype. Next, we considered fitness as a 44 function of the on/off expression patterns to describe the fitness 45 landscape as a function of Θ.

46
The second part discusses the evolution of population distri-47 bution by sexual reproduction within all possible fitness land-48 scapes. In particular, when the high-fitness region consisted 49 of two disjointed parts, we found that the speciation of the 50 two groups occurred by sexual recombination, whereas con-51 vex regionalization from the non-concave fitted region was also 52 demonstrated. Finally, we discuss the relationship between the 53 fitness landscape and the convex regionalization of epistasis, 54 sexual reproduction, and speciation. proteins X and Y bind to a gene promoter, they either promote or is promoted, and when J ij is negative, it is inhibited.
1 When a protein binds to a promoter, gene transcription is 2 promoted or inhibited. For simplicity, we assumed that a gene is −x and −y on the right-hand side of the second term represent 10 protein X and Y degradation, respectively. Here, the expres-11 sion threshold was set at 0.5, to make the expression and non-12 expression states symmetric for later fitness function simplicity.

13
β of the sigmoid function was set to 100 to make the Hill function Attractors of gene-expression dynamics 16 Considering the relationship between genotype and phenotype, 17 we focused on where the expression levels x and y converged 18 to a fixed point or cycle. We investigated how phenotype (x, y) 19 was determined depending on the genotype, J ij (i, j = x, y).

20
To investigate the fixed point, we first obtained the nullclines 21 of x and y, respectively, which were curves that satisfiedẋ = 0 or 22ẏ = 0 in Eq.
(2). The nullclines were represented by the equation: which are illustrated in Fig.3 x ≈ 0, as well as x ≈ ζ = 0.5. In this paper, for simplicity, these    x = α, y = 0 or x = α, y = 1 is the fixed-point attractor, 50 where 0 < α < 1, depending on the initial conditions and 51 were defined as dynamics-type-Cx. x = 0, y = α or x = 1, y = α is the fixed-point attractor, 54 where 0 < α < 1, depending on the initial conditions. We 55 defined this a dynamics-type-Cy. we referred to this as dynamics-type-P.

63
Introduction of Θ

64
We were able to classify steady states that corresponded to phe-65 notypes and introduced a vector that characterized the genotype. 66 Originally, the genotype in the model was represented by a J ij in 67 the 2×2 real matrix. However, the GRN that adopted a sigmoid 68 function only uses two states, 0, 1, beside 0.5, which reduced the 69 dimension of the genotype. This two-dimensional parameter 70 represented the genotype's declination angle, Θ. 71 We then focused on the shape of the nullcline in Fig.3. The The dynamics and attractors were determined by using Θ =

90
(θ x , θ y ) 91 Θ torus and attractor classification 92 Using θ x and θ y , we could explicitly classify the attractors of the 93 teo-gene regulatory dynamics (Fig.4). This section introduces 94 the θ torus, which had a finite range of θ x and θ y and specified 95 each attractor type.

96
First, the behavior of the gene expression dynamics was de-97 termined by the direction of the J ij , Θ row vector. Here, the shows the relationship from Θ to the attractors in (x, y) and the 10 one between genotype and phenotype.

11
Defining the fitness function 12 We obtained the fitness landscape for Θ torus by first defining the As for the fitness function w(x, y), we introduced a continu-29 ous function to satisfy the Boolean function such that w(x, y) for 30 x, y = (0, 1) was either 0 or 1, and determined that the simplest  (1, 0) or (0, 1), the fitness had a maximum value of 1. By contrast, when (x, y) was (0, 0) or (1, 1), the fitness had a minimum value of 0. As the simplest form, This function required that the expression of only one gene, x or 44 y, was necessary for survival, but fitness was lost if both genes 45 were expressed (e.g., switching the two pathways by the inputs). 46 OR: requires the expression of either one of the two genes If at least one of x, y is 1, then the fitness had a maximum value of 1.
This corresponded to the case in which the expression of x or y 47 was needed for survival (e.g., both X and Y proteins have similar 48 functions).

49
Classification of the fitness landscape 50 We next explored the fitness landscape of genotype spaces that 51 depended on the fitness function represented by Θ torus. Here, 52 for a given Θ, we obtained the attractor of (x, y) for a fixed initial for (Q), ( 3 8 π, − 5 8 π) for (S), (− 3 8 π, 7 8 π) for (Cy), ( 7 8 π, − 3 8 π) for (Cx), ( 3 8 π, 3 8 π) for (P), and ( 7 8 π, 7 8 π) for (H). (ii) Classification of each dynamic category was based on (θ x , θ y ). The boundaries between the categories in the figure are θ y = θ x ± 3 2 π, θ y = θ x ± π 2 , θ y = −θ x ± π, θ y = ± 3 4 π, θ y = ± 1 4 π, θ x = ± 3 4 π, and θ x = ± 1 4 π, respectively.  Definition of mutation and recombination 66 Before discussing convex regionalization, we defined the procedure for genetic evolution. We considered two inheritance modes: asexual and sexual reproduction processes. In asexual reproduction, GRN J new in the next generation was chosen from a set of J fit with high fitness in the population. For mutations, the network adjacency matrix for the next generation, J new , was changed by adding a random value generated by a normal distribution with mean 0 and variance σ. Note that the mutation was not introduced to Θ but to J new . This was because Θ was an abstract characteristic value and not the actual value for simplicity. In sexual reproduction, two individuals, J fit1 and J fit2 , were selected from the set J fit to generate high fitness. The GRN in the next generation was then defined by row-wise mixing of the two highly fitted individuals. Therefore, reproduction with free recombination. In the Θ torus, sexual re-7 production could be expressed as parents with J fit1 : (θ fit1 x , θ fit1 y ) 8 and J fit2 : (θ fit2 x , θ fit2 y ), and with θ fit1 x ̸ = θ fit2 x and θ fit1 y ̸ = θ fit2 y . Ad-9 ditionally, suppose a rectangle with each side parallel to the θ x 10 or θ y axis (Fig.6). By mixing the row vectors of the adjacency 11 matrices with sexual reproduction, the children from (θ fit1 x , θ fit1 y ) 12 and (θ fit2 Transfer of genetic parameters (θ x , θ y ) to children through recombination. Sexual reproduction involves mixing the row vectors of the adjacency matrix. The vector of children from the parents (θ fit1 x , θ fit1 y ) and (θ fit2 x , θ fit2 y ) were randomly chosen from θ fit1

ONE RECTANGLE (seen in AND
x or θ fit1 x , and θ fit1 y or θ fit1 y . Graphically, two vertices can fall to a child on another diagonal from the parent rectangular axis.
Following the definition of speciation by hybrid sterility, it can 1 be concluded that speciation occurred in this case.

L-SHAPE fitness landscape:
The difference in population dis-3 tribution between asexual and sexual reproduction was also ob-4 served in the L-SHAPE fitness landscape, as shown in Fig.8(iii).

5
During asexual reproduction, the genotype population was 6 spread over the entire L-shaped area of the landscape; how-7 ever, during sexual reproduction, the population was biased 8 in one direction of the L-SHAPE. Here, the L-SHAPE fitness 9 landscape ( Fig.9) extended vertically and horizontally in two 10 directions, but the offspring from the parent between these two 11 directions (IV of Fig.9) were less fit as a result of the genetic 12 change in Fig.6. 13 Hence, the offspring produced from sexual reproduction be-14 tween the two branches of the L-SHAPE landscape (II and III in 15 Fig.9) shrank into a rectangular region, either vertically (I and III 16 in Fig.9) or horizontally (I and II in Fig.9). Such convex region-  Dobzhansky (1936Dobzhansky ( , 1937; Muller (1940Muller ( , 1942), which is 21 supposed to be an L-shaped fitness landscape. However, we 22 found that some individuals from the common square area (I 23 in Fig.9) of the two edges maintained high fitness and were not 24 reproductively isolated, indicating that complete speciation did 25 not occur. Only some of the two rectangular regions (II and III 26 in Fig.9) were not fit, as an offspring could be located in IV in 27 Fig.9. However, this convex regionalization was achieved so that  Comparison of the population distributions in two rectangular fitness landscapes with X ONLY(w(x, y) = x) fitness (a)during asexual reproduction (mutation only) or during sexual reproduction, where the population is branched into two cases, (b) and (c), which differ in each run of the evolution simulation. The initial expression was chosen as (x 0 , y 0 ) = (0.05, 0.5). The mutation rate was set to 0.01. "gen:" number is the generation number and "fit:" number is the average fitness of the population.  landscape. In the TWO RECTANGLES landscape, the popula-26 tion was limited to one of the two separate high-fitness regions, 27 which led to speciation. In the L-SHAPE landscape, complete 28 speciation was not achieved, but convex regionalization led to 29 two distinct population distributions, as in speciation.

30
When discussing the effects of multiple gene interactions, 31 epistasis is often adopted. Epistasis is defined as a nonlinear 32 change in fitness with multiple mutations. When the fitness 33 change is lower or higher than the addition of changes in mul-34 tiple mutations, it is called negative or positive epistasis, re-35 spectively. Epistasis is applied to local fitness changes, which 36 benefits the study of the effects of relatively small mutations. In also be applied to a system that interacts with a steep sigmoid 7 function. Here, the threshold ζ = 0.5 was used for simplicity and 8 symmetry, but even if the threshold for each genes is changed, 9 Θ can be used because the dynamics equivalent to this study 10 were obtained by the transformation of variables, even though 11 the classification of dynamics was more complex. In addition, Θ 12 can be extended to systems with more than two (N) genes. In  39 The authors declare that they have no conflicts of interest.