In-silico Gene Editing of LCYB in Musa acuminata and Its Functional Analysis Related to Lycopene Beta-Cyclase Pathway

Banana (Musa acuminata) has high biodiversity and is among the most important commodities after rice, particularly in Indonesia. Effective biofortification of bananas requires a thorough understanding of the fruit's genetic makeup, nutritional composition, and nutrient bioavailability. Incomplete or lacking studies on bananas can impede the development of biofortified varieties. LCYB is the gene that regulates the vitamin A pathway in banana. Therefore, this study aimed to design sgRNAs for activating the LCYB gene using CRISPR/Cas9 and to predict its gene and protein function in relation to the lycopene beta-cyclase pathway. We performed sequence analysis of LCYB (GenBank: KP406755.1) to construct sgRNAs for activating LCYB expression using in silico approaches. We also successfully amplified the LCYB gene in various accession collections. Based on in silico prediction of sgRNA activity, we found a total of 192 putative sgRNAs on the positive and negative strands of the M. acuminata LCYB gene sequence. We investigated three sgRNA target sequences related to MaLCYB activation, i.e., CTTTAGATGAGTCATACAAGGGG, ACGAGAGTTCACTACCCAAGAGG, and AGAATTGAGTTGCTCCACCGAGG, with efficiency scores of 73.23, 71.00, and 70.21%, respectively. Mutation of the gene could change the functional protein and influence the lycopene beta-cyclase pathway. In silico analysis is an important tool for predicting genome editing in M. acuminata and for minimizing technical problems in sgRNA construction in vivo.


Introduction
Banana is cultivated mostly in tropical areas with warm, wet weather and can adapt to subtropical areas, which are prone to abiotic stresses such as heat stress [1]. Lying in the tropics, Indonesia is one of the centers of banana (Musa acuminata) diversity in the world. There are more than 200 local banana cultivars [2] that can grow throughout Indonesia. Banana is the most important fruit and the fourth most important commodity after rice, wheat, and corn in Indonesia [3]. Therefore, this fruit is an important crop for supporting food security [4].
Furthermore, this fruit has potential for alleviating nutritional disorders such as vitamin A deficiency. Vitamin A is critical throughout life because it regulates diverse processes in the body [5]. Vitamin A is important for pregnant women and for combating stunting [6]. However, the vitamin A content of banana is often not utilized properly because of excessive processing of ingredients and the various diseases that attack banana plants.
In silico methods are computational or simulation-based approaches that use computer models, algorithms, and data analysis to study and predict various phenomena. These methods are particularly useful when experimental methods are costly, time-consuming, or impractical. In silico methods may be chosen for several reasons: cost-effectiveness, speed and efficiency, the ability to explore complex systems, and the ability to leverage existing data and knowledge to generate insights and make predictions, particularly for designing sgRNA [7]. The future goal of this research is to increase vitamin A in banana. However, the present research was limited to analyzing the LCYB sequence, in particular by bioinformatically editing the gene [8] with CRISPR/Cas9 [4] to manipulate the lycopene beta-cyclase pathway.
For many decades, banana breeding was unsuccessful, in particular because of polyploidy, sterility, and parthenocarpy. In the past 10 years, however, Musa researchers worldwide have made a number of important breakthroughs, and the first high-yielding, disease-resistant varieties are now available for wide-scale testing and distribution to farmers [9]; [10]. Nevertheless, it is clear that a significant banana research effort is still needed to address threats to production.

Figure 1. Lycopene Beta-Cyclase pathway
CRISPR soon displaced its predecessors due to its extraordinary efficiency, simplicity, and ability to target both DNA and RNA [11]. Benchling was designed specifically for pharma and biotech companies. Benchling creates CRISPR sgRNAs by assessing characteristics such as target location, specificity, and efficiency. This web application works in tandem with the CRISPR sgRNA design tool, allowing the user to select a gene or genome coordinates and automatically annotate a given genome with exonic information from 40+ reference genomes. The different potential sgRNAs can be organized as oligos using tags and folders and stored on Benchling. This information can subsequently be used for cloning the sgRNAs into CRISPR plasmids. Alternatively, sgRNAs, along with their scores and off-target sites, can be exported for use in one's own spreadsheets. Benchling can be accessed at https://benchling.com.
The guide RNA in the CRISPR-Cas9 system is made up of tracrRNA and crRNA. These two RNAs are combined to form a single-RNA chimera known as sgRNA (single-guide RNA), which retains the secondary structure of the dual tracrRNA:crRNA and is useful in genome editing. In rice, the sgRNA construct performed target gene deletion by the CRISPR-Cas9 system more efficiently than the dual crRNA:tracrRNA construct [12].
The unique sequence of nucleotides (8-12 nt) at the 3′ end of the sgRNA is an essential requirement for binding and cleavage of target sites. The uniqueness of the sgRNA influences both on-target and off-target effects. A GC content in the range of 30-80% was found to contribute to the efficiency of the sgRNA at the target site. The unique secondary structure of designed sgRNAs, in particular three crucial stem loops (the intact repeat and anti-repeat (RAR) stem loop, stem loop 2, and stem loop 3), is a major constraint that needs to be considered when choosing efficient sgRNAs for precise targeting [13]. inDelphi is a software tool specifically designed for the prediction and analysis of insertions and deletions (indels) in DNA sequences. It uses machine learning algorithms and statistical models to predict the likelihood and characteristics of indels in a given DNA sequence [12]; [14]. The method used in this study is a literature study, relying in particular on in silico data obtained with bioinformatics tools. Therefore, this research aimed to design sgRNAs and predict the LCYB gene mutation sites in Musa acuminata.
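The design constraints described above can be illustrated with a short sketch. It assumes a 20-nt protospacer followed by an NGG PAM (the SpCas9 convention, consistent with the 23-nt targets listed in the abstract) and applies the 30-80% GC window. The function names and the returned dictionary layout are hypothetical, and the scan is not a substitute for the scoring performed by Benchling or CHOPCHOP.

```python
import re

# Translation table for complementing DNA bases (N is kept as N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the GC content of a sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def find_sgrna_candidates(seq, spacer_len=20, gc_min=30.0, gc_max=80.0):
    """Scan both strands for protospacer + NGG sites within a GC window.

    Assumption: an SpCas9-style NGG PAM directly 3' of the protospacer.
    'start' is a 0-based index into the scanned strand; for the '-' strand
    this is an index into the reverse-complemented sequence.
    """
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    candidates = []
    for strand, template in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for match in pattern.finditer(template):
            spacer, pam = match.group(1), match.group(2)
            gc = gc_content(spacer)
            if gc_min <= gc <= gc_max:
                candidates.append(
                    {"strand": strand, "start": match.start(),
                     "spacer": spacer, "pam": pam, "gc": gc}
                )
    return candidates


# Example: the first target reported in the abstract (protospacer + PAM).
hits = find_sgrna_candidates("CTTTAGATGAGTCATACAAGGGG")
for hit in hits:
    print(hit["strand"], hit["spacer"], hit["pam"], f"{hit['gc']:.1f}%")
```

Running the scan on a full gene sequence would return every NGG-adjacent site on both strands; efficiency and off-target scores still have to come from a dedicated design tool.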

Materials
The materials needed included the reference DNA of Musa acuminata from GenBank, the UGENE software, and the CHOPCHOP database.

Methods
The nucleotide sequence of the LCYB gene was downloaded from NCBI using its accession number. The phylogenetic tree was built with MEGA11 software, with alignment by ClustalW and tree construction by the Parsimony method with 1000 bootstrap replicates. Gene structure prediction was performed with FGENESH+ (www.softberry.com/berry.phtml?topic=fgenesh&group=help&subgroup=gfind).
We performed functional analysis using inDelphi to analyze LCYB mutations. For input data preparation, we supplied the DNA sequences of LCYB in FASTA format to predict and analyze indels. We used the default sequence alignment settings and default scoring thresholds as parameters of the prediction model. The indel prediction was then started by running the analysis command in the software [14].

Evolution pathways of the LCYB gene in M. acuminata and M. troglodytarum
Figure 2 shows a molecular phylogenetic tree based on the nucleotide sequences of the LCYB gene in the Musaceae group. As shown in Figure 2, the LCYB gene of Musa acuminata AAA is evolutionarily more distant from that of Musa troglodytarum than from that of M. balbisiana. Nevertheless, all classes of bananas contain β-carotene and share the β-carotene biosynthetic pathway shown in Figure 1.
Nucleotide sequences were aligned with ClustalW in MEGA 11 software. Sequences that had many similarities were trimmed at both the 5' and 3' ends. Phylogenetic analysis used the Parsimony method with 1000 bootstrap replicates and the Kimura 2-parameter model. Bootstrapping with 1000 replicates was used to validate how well the data set supports the predicted phylogenetic tree. The bootstrap values are shown as numbers on the tree branches. The branch length describes the number of base substitutions, which can be either DNA polymorphisms or haplotypes.
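The Kimura 2-parameter model named above corrects the observed proportion of differences between two aligned sequences by treating transitions and transversions separately: d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional sites. The sketch below is only an illustration of that formula on a pairwise alignment; it is not MEGA's implementation, and the function name and the handling of gaps (skipping any column with a non-ACGT character) are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: columns with gaps or ambiguous bases are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites in the alignment")

    p = transitions / compared
    q = transversions / compared
    inner = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)


# Toy alignment: one transition (A<->G) in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A matrix of such pairwise distances is what distance-based tree methods operate on, whereas parsimony works directly on character changes, so the two are usually reported separately.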

LCYB Gene Sequences
LCYB gene sequence data for Musa acuminata are not yet available in the Benchling database. Therefore, we entered the data in FASTA format from NCBI. The sequence was then annotated according to the location of the LCYB gene in Musa acuminata, NC_025210.1 (18023820..18026039), in the NCBI gene database and adjusted to the gene structure (exons and introns) predicted by FGENESH+ (www.softberry.com/berry.phtml?topic=fgenesh&group=help&subgroup=gfind). The LCYB gene is located on chromosome 9 on the positive (forward) strand and has one long exon and one intron. The total length of the gene is 1633 bp, with the protein-coding sequence (CDS, exon sequence) spanning positions 220-1492 bp (Figure 3).

In-silico Analysis of LCYB and in-silico sgRNA Construction
A major concern with the CRISPR-Cas9 system is the relatively high number of off-target effects in tested organisms. Benchling's sgRNA suggestion ranking is based on a number of factors that are prioritized in order of importance: (i) efficiency score, (ii) off-target count (sites where the sgRNA binds DNA other than the target), and (iii) GC content. Benchling recommended as many as 100 sgRNA designs for LCYB gene editing at exon 1 sites from 220-1620. However, to increase the potential for knockout, sgRNAs in this study were selected on three considerations: (i) the top-ranking sgRNAs from Benchling's recommendations, (ii) sgRNAs located in the early exons, and (iii) sgRNAs with a GC content of more than 50%. sgRNAs that target exon regions encoding functional protein domains can increase the potential for null mutations (mutations that cause a gene to be expressed as a non-functional protein).
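The selection logic described above can be sketched as a filter followed by a sort. The sketch assumes each candidate carries an efficiency score, an off-target count, a GC percentage, and a cut position. The `Candidate` class, the field names, and all values in the toy list are hypothetical and are not results from this study; the exon window 220-1620 is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cut_position: int    # position of the predicted cut within the gene (bp)
    efficiency: float    # on-target efficiency score (higher is better)
    off_targets: int     # number of predicted off-target sites (lower is better)
    gc_percent: float    # GC content of the protospacer


def select_guides(candidates, exon_start, exon_end, min_gc=50.0, top_n=3):
    """Keep guides inside the exon window with GC above min_gc, then rank.

    Ranking priority: efficiency (descending), off-target count (ascending),
    and finally cut position (earlier in the exon first).
    """
    eligible = [
        c for c in candidates
        if exon_start <= c.cut_position <= exon_end and c.gc_percent > min_gc
    ]
    ranked = sorted(
        eligible,
        key=lambda c: (-c.efficiency, c.off_targets, c.cut_position),
    )
    return ranked[:top_n]


# Hypothetical toy candidates, for illustration only.
toy_candidates = [
    Candidate("toy_1", 300, 73.2, 2, 55.0),
    Candidate("toy_2", 250, 71.0, 0, 60.0),
    Candidate("toy_3", 1700, 90.0, 0, 55.0),  # outside the exon window
    Candidate("toy_4", 400, 80.0, 1, 40.0),   # GC content too low
    Candidate("toy_5", 500, 71.0, 3, 52.0),
]

selected = select_guides(toy_candidates, exon_start=220, exon_end=1620)
print([c.name for c in selected])
```

Placing the hard filters before the sort keeps the two kinds of criteria separate, so a threshold such as the GC cutoff can be changed without touching the ranking order.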
Selecting sgRNAs at the start of the exon sequence aims to increase the potential for reading frame shift and knockout mutations in the target gene of this study, LCYB. Reading frame shift mutations can occur if the number of inserted or deleted nucleotides at the double-strand break made by the CRISPR/Cas9 complex is not a multiple of three. According to [15], the earlier the mutation occurs, the more the protein function is predicted to change. All sgRNA candidates in this study had good precision scores, with sgRNA 1 having the highest precision score of 0.44 (Table 4). The precision score reflects the frequency distribution of indel mutation types after the double-strand break. A score close to 1 indicates that the prediction of the indel mutation type for the sgRNA candidate has high precision, because the predicted mutation type is dominant and homogeneous. This indicates that inDelphi's mutation prediction based on all sgRNA candidates in this study is considered accurate. The predicted genotypes and indel types with the highest frequency for all candidates can be seen in Figure 3. All sgRNA candidates in this study are predicted to cause deletion mutations upstream of the PAM, except for sgRNA 20, which is predicted to cause a 1 bp insertion. Predicted deletions can be microhomology deletions or non-microhomology deletions. The deletions in this study are predicted to be microhomology deletions, as indicated by good microhomology strength (Table 1). [12] stated that inDelphi predictions of 1 bp insertions and microhomology deletions can be trusted because the predicted data were consistent across repeated studies. The sgRNA predicted to cause a 1 bp insertion is also predicted to have a good reading frame shift mutation frequency. This is an additional consideration for using sgRNA 20 in LCYB gene editing. The predicted reading frame shift mutation frequency in inDelphi is calculated from the predicted frequencies of indel lengths that are not multiples of three and can therefore cause +1 and +2 reading frame shift mutations (a reading frame shift by one or two nucleotides). The reading frame shift mutation frequency prediction does not take into account the +0 frame type, namely indels judged not to cause a reading frame shift (the number of indel nucleotides is a multiple of three), which result in a wild-type LCYB protein. The prediction of reading frame shift mutations in inDelphi also does not take into account where the indel mutations occur, namely in exons or introns. sgRNA designs 1, 2, and 3 had good predicted reading frame shift mutation frequencies of 82.5%, 78.1%, and 67.5%, respectively (Table 2). These values are good because most of the indel lengths predicted for each sgRNA are not multiples of three.
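The frameshift arithmetic described above can be made concrete with a small sketch. It takes a predicted distribution of net indel lengths (negative for deletions, positive for insertions) and sums the share of outcomes whose length is not a multiple of three. The function names and the toy distribution are hypothetical and are not inDelphi output. The mapping of a net length change to the +1 or +2 frame (length modulo 3) is an assumed convention.

```python
def frame_distribution(indel_freqs):
    """Return the share of outcomes falling in frames +0, +1 and +2.

    indel_freqs maps a net length change in bp (negative = deletion,
    positive = insertion) to its predicted frequency in any unit.
    Assumed convention: frame = net length change modulo 3, so a 1 bp
    insertion is frame +1 and a 1 bp deletion is frame +2.
    """
    total = sum(indel_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")

    frames = {0: 0.0, 1: 0.0, 2: 0.0}
    for length, freq in indel_freqs.items():
        # Python's % always returns a non-negative result for a positive
        # modulus, so -1 % 3 == 2 and -3 % 3 == 0.
        frames[length % 3] += freq / total
    return frames


def frameshift_frequency(indel_freqs):
    """Percentage of predicted indels whose length is not a multiple of 3."""
    frames = frame_distribution(indel_freqs)
    return 100.0 * (frames[1] + frames[2])


# Hypothetical toy distribution of predicted indel lengths (percent).
toy_indels = {-1: 40.0, -2: 25.0, -3: 20.0, 1: 15.0}
print(f"{frameshift_frequency(toy_indels):.1f}%")
```

In this toy case only the 3 bp deletion keeps the reading frame, so the remaining outcomes make up the frameshift frequency.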

Conclusion
sgRNA sequences 1 and 2 are recommended for editing the LCYB gene in Musa acuminata. sgRNA 3, with the sequence 5'-agaattgagttgctccaccg-3' and PAM 5'-AGG-3', is predicted to guide a double-strand cut in the first exon and cause a 2 bp deletion. sgRNA 2, with the sequence 5'-cggcaaccccataacctcgg-3' and PAM 5'-TGG-3', is predicted to guide a double-strand cut in the first exon and cause an 8 bp deletion. Two pairs of primers based on sgRNA 1 and 2 have also been designed as oligoduplexes for the construction of the pRGEB32 plasmid as a vector for LCYB gene editing in Musa acuminata using the CRISPR/Cas9 system.
It is recommended to analyze the sgRNA design with other software to strengthen its validation, especially regarding the 2D and 3D structure of the predicted mutant LCYB protein. In addition, the sgRNA designs from this study can be used further in the construction of the pRGEB32 vector for editing the LCYB gene in Musa acuminata using the CRISPR/Cas9 system.
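The oligoduplex design mentioned in the conclusion can be sketched as follows: the forward oligo is a cloning overhang plus the 20-nt spacer, and the reverse oligo is a second overhang plus the reverse complement of the spacer. The function name is hypothetical, and the overhangs used in the example (GGCA and AAAC) are assumed values that must be verified against the pRGEB32 vector map before ordering oligos.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_oligo_pair(spacer: str, fwd_overhang: str, rev_overhang: str):
    """Build forward and reverse oligos (5'->3') for an annealed duplex.

    The overhangs are passed in explicitly because they depend on the
    vector and restriction enzyme; no default is assumed here.
    """
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set("ACGT"):
        raise ValueError("spacer must be 20 nt of A, C, G, T")
    forward = fwd_overhang.upper() + spacer
    reverse = rev_overhang.upper() + reverse_complement(spacer)
    return forward, reverse


# Example with the sgRNA 3 spacer given in the conclusion.
# ASSUMPTION: GGCA / AAAC overhangs; check them against the vector map.
fwd, rev = design_oligo_pair("agaattgagttgctccaccg", "GGCA", "AAAC")
print("forward:", fwd)
print("reverse:", rev)
```

Keeping the overhangs as arguments means the same function can be reused for another vector by changing only the two overhang strings.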

Figure 3. Representation of the exons (green box) and intron (gray line) of LCYB. The green triangle is the TSS, and the red rhombus is the poly-A tail.

Table 1. sgRNA design candidates for the LCYB gene in Musa acuminata

In silico Analysis for Predicting LCYB Gene Mutations in Musa acuminata
Data from the inDelphi analysis based on all sgRNA candidates included predictions of indel mutation types and post-mutation LCYB genotypes, indel length, total reading frame shift mutation frequency, type of reading frame shift mutation, precision score, and microhomology strength.

Table 2. Results of the inDelphi predictive analysis of LCYB gene mutations based on the sgRNA candidates