Analysis of the Chloroplast Genome Characteristics of Helianthus annuus J-01

Helianthus annuus J-01 was sequenced using a high-throughput sequencing platform, and the structural characteristics of its complete chloroplast genome sequence were analyzed. The results showed that the H. annuus J-01 chloroplast genome, like that of most higher plants, has a typical circular, double-stranded quadripartite structure. The chloroplast genome is 151142 bp long with a GC content of 38.99%, and a total of 127 genes were annotated, including 84 protein-coding genes, 35 tRNA genes and 8 rRNA genes. A total of 18 genes in the H. annuus J-01 chloroplast genome contain introns, of which ycf3 and clpP each contain two introns. Among the protein-coding genes, leucine is the most frequently encoded amino acid and cysteine is the least frequently encoded. The A/T content at the third codon position is 69.32%, indicating that the protein-coding genes of the H. annuus J-01 chloroplast genome prefer A/T at the third codon position.


Introduction
The chloroplast is an essential organelle in the plant cytoplasm. It carries out photosynthesis and provides energy for plant growth and development, which makes it the basis of plant growth and a main determinant of productivity [1][2]. In plants, the chloroplast genome contains a large amount of genetic information and is highly conserved, and the self-replication and evolution of this genome remain relatively independent. Therefore, the chloroplast genome is often used in genomics and bioinformatics to explore the origin, development and evolution of plants [3][4]. The complete chloroplast genome sequence has thus become a valuable tool for studying molecular phylogeny and molecular ecology.
Helianthus annuus, a member of the genus Helianthus in the family Asteraceae, is an annual herbaceous plant and one of the four major oil crops in the world [5]. This study takes the H. annuus J-01 chloroplast as its research object, determines the characteristics of the H. annuus J-01 chloroplast genome, and provides a reference for subsequent phylogenetic and genetic diversity analyses of Helianthus.

Total DNA extraction and sequencing
Fresh H. annuus J-01 leaves were washed with sterile water and placed in liquid nitrogen, and total DNA was extracted using a plant DNA extraction kit. The quality of the DNA was checked by agarose gel electrophoresis. DNA that met the sequencing requirements was sent to BGI for high-throughput sequencing, and the remainder was stored frozen for later use.

Assembly and annotation of the chloroplast genome sequence
The raw data were filtered with stqc software to remove adapters from the reads, bases with a quality score below 30 at both ends, reads containing the ambiguous base N, and reads shorter than 60 bp, yielding high-quality clean data. The clean data were assembled into contigs with SOAPdenovo, and the contigs were aligned with BLAT to the chloroplast reference genome of a closely related species to determine their relative positions. The contigs were then joined and corrected to obtain a full-length draft sequence, and the gaps in this draft were filled to obtain the complete circular chloroplast genome sequence. Genes were annotated with CpGAVAS. Finally, OrganellarGenomeDRAW was used to generate a complete annotated circular physical map of the H. annuus J-01 chloroplast genome.
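The read-filtering criteria described above can be illustrated with a minimal Python sketch. This is not the stqc implementation: the function names, the Phred+33 quality encoding and the order in which the filters are applied are assumptions made for illustration, and adapter removal is omitted.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality string to integer scores (assumes Phred+33)."""
    return [ord(char) - 33 for char in quality_string]


def trim_low_quality_ends(seq, quals, min_q=30):
    """Trim bases with quality below min_q from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def filter_read(seq, quals, min_q=30, min_len=60):
    """Return the cleaned (seq, quals) pair, or None if the read is rejected.

    Hypothetical helper: trims low-quality ends first, then discards reads
    that still contain N or are shorter than min_len.
    """
    seq, quals = trim_low_quality_ends(seq.upper(), quals, min_q)
    if "N" in seq:
        return None  # ambiguous base remains after trimming
    if len(seq) < min_len:
        return None  # too short after trimming
    return seq, quals
```

For example, `filter_read("A" * 70, decode_phred33("I" * 70))` keeps the read unchanged, whereas a read that is shorter than 60 bp after trimming is discarded.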

Basic characteristics of H. annuus J-01 chloroplast genome
The H. annuus J-01 chloroplast genome has a typical circular, double-stranded quadripartite structure, with a total length of 151142 bp and a GC content of 38.99%. It comprises a large single-copy region (LSC, 83534 bp), a small single-copy region (SSC, 18312 bp) and a pair of inverted repeats (IRa and IRb, 24648 bp each) (Figure 1). A total of 127 genes were annotated in the H. annuus J-01 chloroplast genome, including 84 protein-coding genes, 35 tRNA genes and 8 rRNA genes (Table 1).
The 84 protein-coding genes can be divided into three major categories. The first category includes 29 genes related to self-replication, comprising three groups (large ribosomal subunit, small ribosomal subunit and DNA-dependent RNA polymerase genes). The second category contains 45 genes related to photosynthesis (subunits of ATP synthase, NADH dehydrogenase, the cytochrome b/f complex, photosystem I, photosystem II and ribulose bisphosphate carboxylase/oxygenase). The third category includes 6 other protein-coding genes and 4 genes of unknown function.
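The figures reported in this section can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers given in the text; the variable and function names are illustrative, and the inverted repeat length is assumed to apply to each copy (IRa and IRb), which is the reading under which the regions sum to the total length.

```python
# Values copied from the text; the IR length is taken to be per copy.
TOTAL_BP = 151142
REGIONS_BP = {"LSC": 83534, "SSC": 18312, "IRa": 24648, "IRb": 24648}

GENE_COUNTS = {"protein-coding": 84, "tRNA": 35, "rRNA": 8}
TOTAL_GENES = 127

PROTEIN_CATEGORIES = {
    "self-replication": 29,
    "photosynthesis": 45,
    "other proteins": 6,
    "unknown function": 4,
}


def region_percentages(regions, total):
    """Share of the genome occupied by each region, in percent."""
    return {name: round(100.0 * length / total, 2)
            for name, length in regions.items()}


def is_consistent():
    """Check that the reported parts add up to the reported totals."""
    return (
        sum(REGIONS_BP.values()) == TOTAL_BP
        and sum(GENE_COUNTS.values()) == TOTAL_GENES
        and sum(PROTEIN_CATEGORIES.values()) == GENE_COUNTS["protein-coding"]
    )


if __name__ == "__main__":
    print("Consistent:", is_consistent())
    for name, pct in region_percentages(REGIONS_BP, TOTAL_BP).items():
        print(f"{name}: {pct}%")
```

All three checks hold for the reported values: 83534 + 18312 + 2 × 24648 = 151142, 84 + 35 + 8 = 127, and 29 + 45 + 6 + 4 = 84.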

H. annuus J-01 chloroplast genome intron information
A total of 18 genes in the H. annuus J-01 chloroplast genome contain introns, of which ycf3 and clpP each contain two introns. The intron of ndhA is the longest at 1088 bp, and the intron of trnL-UAA is the shortest at 437 bp (

Codon usage in the chloroplast genome of H. annuus J-01
Analysis of the codons of all protein-coding genes in the H. annuus J-01 chloroplast genome showed that Leu is the most frequently encoded amino acid, accounting for 10.61% (5099) of the codons, whereas Cys is the least frequently encoded, accounting for 1.13% (541) of the codons. The A/T content at the third codon position is 69.32%. This preference for A/T at the third codon position is also common in the chloroplast genomes of other higher plants (Table 3) [7].
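The statistics above (amino acid frequencies and A/T content at the third codon position) can be computed from a set of coding sequences along the following lines. This Python sketch is not the software used in this study: the function names are hypothetical, the standard codon assignments are assumed, and excluding stop codons from the totals is an assumption, since the text does not state how they were handled.

```python
from collections import Counter

# Standard codon assignments, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def count_codons(cds_sequences):
    """Count in-frame codons over all coding sequences (skips ambiguous ones)."""
    counts = Counter()
    for cds in cds_sequences:
        cds = cds.upper().replace("U", "T")
        for pos in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[pos:pos + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def _select(codon_counts, include_stop):
    """Drop stop codons unless include_stop is True (assumed convention)."""
    return {codon: n for codon, n in codon_counts.items()
            if include_stop or CODON_TABLE[codon] != "*"}


def amino_acid_percentages(codon_counts, include_stop=False):
    """Percentage of codons that encode each amino acid."""
    selected = _select(codon_counts, include_stop)
    total = sum(selected.values())
    if total == 0:
        raise ValueError("no codons to analyze")
    per_aa = Counter()
    for codon, n in selected.items():
        per_aa[CODON_TABLE[codon]] += n
    return {aa: 100.0 * n / total for aa, n in per_aa.items()}


def at3_percentage(codon_counts, include_stop=False):
    """Percentage of codons with A or T at the third position."""
    selected = _select(codon_counts, include_stop)
    total = sum(selected.values())
    if total == 0:
        raise ValueError("no codons to analyze")
    at3 = sum(n for codon, n in selected.items() if codon[2] in "AT")
    return 100.0 * at3 / total
```

Applied to the 84 protein-coding sequences, `amino_acid_percentages` would give the per-amino-acid shares (such as those reported for Leu and Cys) and `at3_percentage` the third-position A/T content, subject to the stop-codon assumption noted above.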