Assembly and annotation of chloroplast genomes
The assembly yielded a complete cp genome sequence of C. hirtinoda with a length of 139,561 bp (Fig. 1), consisting of an 83,166 bp large single-copy (LSC) region, a 20,811 bp small single-copy (SSC) region, and two inverted repeat (IR) regions of 21,792 bp each, forming the quadripartite structure typical of land plants. The cp genome of C. hirtinoda was annotated with 130 genes, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes (Table 1). Of 15 genes in the C. hirtinoda cp genome, most contain introns: 13 genes contain a single intron (atpF, ndhA, ndhB, petB, petD, rpl2, rpl16, rps16, trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, trnV-UAC), only ycf3 contains two introns, and clpP has lost its intron (Supplementary Table S1). The rps12 gene is present in two copies, and its three exons are joined by trans-splicing18.
The accD, ycf1, and ycf2 genes were missing from the cp genome of C. hirtinoda, and the introns in the clpP and rpoC1 genes were lost. This is consistent with previous systematic evolutionary studies of genome structure in the family Poaceae19. Gene loss has also been reported in other plants20,21,22,23.
The total GC content of the C. hirtinoda cp genome was 38.90%, and the contents of the four bases A, T, G, and C were 30.63%, 30.46%, 19.57%, and 19.33%, respectively (Table 2). The GC content of the LSC region (36.98%) and the SSC region (33.21%) was much lower than that of the IR region (44.23%), indicating a non-uniform base distribution across the cp genome, probably because the four rRNA genes located in the IR region raise its GC content. These values were similar to those previously reported for the cp genomes of some Poaceae plants24,25.
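The base composition and GC content described above can be computed directly from a sequence string. The following is a minimal sketch, not the procedure used in this study; the function names `base_composition` and `gc_content` are illustrative, and ambiguous bases such as N are simply ignored.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in seq (other symbols ignored)."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return {b: 100 * counts[b] / total for b in "ACGT"}


def gc_content(seq):
    """Return the GC content of seq as a percentage."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]
```

Applied separately to the LSC, SSC, and IR sequences, the same function would give the per-region values. For the whole-genome percentages reported in the text, G + C = 19.57% + 19.33% = 38.90%.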
Repeat sequences and codon analysis
Simple sequence repeats (SSRs) are short tandem base repeats, about 10 bp long, that are widely used to study phylogenetic evolution and genetic diversity26,27,28,29.
A total of 48 SSRs were detected in C. hirtinoda (Fig. 2A). The majority of SSRs (38; 79%) were located in the LSC region, while 6 SSRs (13%) were found in the IR region and 4 SSRs (8%) in the SSC region (Fig. 2B). Previous research suggests that the distribution of SSR numbers among regions and regional differences in GC content are related to expansion or contraction of the IR boundary30.
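To illustrate how SSRs can be located in a sequence, the sketch below scans for tandem repeats of 1 to 6 bp motifs with a regular expression. The minimum repeat counts in `MIN_REPEATS` are hypothetical thresholds chosen for illustration; they are not the settings used in this study, and the function name `find_ssrs` is likewise an assumption.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # Group 2 is the motif; \2{n,} requires at least n further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AA" reported as a dinucleotide repeat.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(1)) // unit_len))
    return sorted(hits)
```

Counting the hits whose start position falls within the LSC, SSC, or IR coordinates would give a regional distribution of the kind reported above (38, 4, and 6 of 48 SSRs).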
REPuter identified 61 repeats in the cp genome of C. hirtinoda, consisting of 15 palindromic and 19 direct repeats, with no reverse or complementary repeats (Fig. 3). Repeat analysis of the three Chimonobambusa species revealed 61–65 repeats per genome, with a single reverse repeat in C. hejiangensis. Most repeats were 30–100 bp long and were located in the IR or LSC region31 (Supplementary Table S2).
We identified 20,180 codons in the coding regions of C. hirtinoda (Fig. 4, Supplementary Table S3). AUU (Ile) was the most frequently used codon (817), and the termination codon UAG was the least used (19). Leu was the most frequently encoded amino acid (2,170), and termination codons (TER) were the least frequent (85). A Relative Synonymous Codon Usage (RSCU) value greater than 1.0 means that a codon is used more often than expected under equal usage of synonymous codons32. RSCU values exceeded 1 for 31 codons in the C. hirtinoda cp genome; of these, 29 (93.55%) ended in A/U at the third position. AUG (Met) and UGG (Trp), each the only codon for its amino acid, showed no bias (RSCU = 1).
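RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes it from a dictionary of codon counts, assuming the standard genetic code; the function name `rscu` and the input format are illustrative assumptions rather than the tool used in this study.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks termination codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(codon_counts):
    """Return RSCU per codon: count * (number of synonyms) / (family total)."""
    # Accept DNA or RNA codons; report results in RNA notation.
    counts = {c.upper().replace("T", "U"): n for c, n in codon_counts.items()}
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        for c in codons:
            values[c] = counts.get(c, 0) * len(codons) / total
    return values
```

Because Met and Trp each have a single codon, their RSCU is 1 whenever they occur, which is why AUG and UGG show no bias.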
Comparative analysis of genome structure
Nucleotide diversity (Pi) values among the cp genomes of the three Chimonobambusa species, calculated with DnaSP 5.10, ranged from 0 to 0.021, with a mean of 0.000544. Five peaks were observed in the two single-copy regions, and the highest peak was in the trnT-trnE-trnY region of the LSC (Fig. 5). Pi values for the LSC and SSC regions were markedly higher than those for the IR region. No highly divergent sequences were observed in the IR region, which is highly conserved. Such highly variable regions have been used in other plants for species identification, phylogenetic analysis, and population genetics research33,34,35.
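Pi is the average number of nucleotide differences per site between all pairs of sequences, usually reported along a sliding window. The sketch below shows the calculation for pre-aligned sequences of equal length; it is not a reimplementation of DnaSP, the window and step sizes are hypothetical defaults, and alignment gaps are treated like any other character for simplicity.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site for equal-length aligned sequences."""
    pairs = list(combinations(seqs, 2))
    length = len(seqs[0])
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


def sliding_pi(seqs, window=600, step=200):
    """Return (window midpoint, Pi) pairs along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        midpoint = start + len(chunk[0]) // 2
        results.append((midpoint, nucleotide_diversity(chunk)))
    return results
```

Plotting the second element of each pair against the midpoint gives a profile of the kind shown in Fig. 5, where peaks mark the most variable regions.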
Comparison of the complete cp genomes of the three Chimonobambusa species showed that sequences were conserved in most regions (Fig. 6). The LSC and SSC regions showed greater variation than the IR region, and non-coding regions were more variable than coding regions. In non-coding regions, loci around 7–9 kb, 28–30 kb, and 36 kb, among others, differed markedly. In protein-coding regions, genes such as rpoC2, rps19, and ndhJ differed. However, the tRNA and rRNA regions were 100% identical. A similar phenomenon has been reported by others36.
IR contraction and expansion in the chloroplast genome
Because of the circular structure of the cp genome, there are four junctions between the LSC, IRb, SSC, and IRa regions. During species evolution, expansion and contraction of the IR regions have to some extent maintained the sequence stability of the two IR regions, and this adjustment is the main cause of variation in chloroplast genome length37,38.
The IR/SC boundaries of the three Chimonobambusa chloroplast genomes were very similar in organization, gene content, and gene order. The size of the IR ranged from 21,797 bp (C. tumidissinoda) to 21,835 bp (C. hejiangensis). The ndhH gene spanned the SSC/IRa boundary, extending 181 to 224 bp into the IRa region in all three Chimonobambusa species. The rps19 gene lay in the IRb region, separated from the LSC region by a gap of 31–35 bp. The rpl12 gene was located in the LSC region of all genomes, 35–36 bp from the LSC/IRb boundary (Fig. 7).
The three Chimonobambusa chloroplast genomes were compared using Mauve alignment. All sequences showed perfectly conserved synteny, with no inversions or rearrangements (Fig. 8).
We performed a phylogenetic analysis using the complete chloroplast genomes and the matK gene to determine the phylogenetic position of C. hirtinoda. Maximum likelihood (ML) analysis based on complete chloroplast genomes recovered seven nodes with full support (100% bootstrap value). However, relationships among the three Chimonobambusa species were only moderately supported, probably because few samples were used; C. hirtinoda was more closely related to C. tumidissinoda than to C. hejiangensis, with a bootstrap value of 62%. The phylogenetic tree based on the matK gene also grouped the Chimonobambusa species in a single branch, consistent with the tree constructed from the complete cp genomes (Fig. 9). The results show that the complete chloroplast genome resolved closely related species better than the matK gene alone, consistent with a previous study39.