<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>The contribution of plasmids to trait diversity in a soil bacterium</title></titleStmt>
			<publicationStmt>
				<publisher>Oxford Academic</publisher>
				<date>02/01/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10548250</idno>
					<idno type="doi">10.1093/ismeco/ycae025</idno>
					<title level='j'>ISME Communications</title>
<idno>2730-6151</idno>
<biblScope unit="volume">4</biblScope>
<biblScope unit="issue">1</biblScope>					

					<author>Sarai S Finks</author><author>Pranav Moudgalya</author><author>Claudia Weihe</author><author>Jennifer B_H Martiny</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<title>Abstract</title> <p>Plasmids are so closely associated with pathogens and antibiotic resistance that their potential for conferring other traits is often overlooked. Few studies consider how the full suite of traits encoded by plasmids is related to a host’s environmental adaptation, particularly for Gram-positive bacteria. To investigate the role that plasmid traits might play in microbial communities from natural ecosystems, we identified plasmids carried by isolates of Curtobacterium (phylum Actinomycetota) from a variety of soil environments. We found that plasmids were common, but not ubiquitous, in the genus and varied greatly in their size and genetic diversity. There was little evidence of phylogenetic conservation among Curtobacterium plasmids even for closely related bacterial strains within the same ecotype, indicating that horizontal transmission of plasmids is common. The plasmids carried a wide diversity of traits that were not a random subset of the host chromosome. Furthermore, the composition of these plasmid traits was associated with the environmental context of the host bacterium. Together, the results indicate that plasmids contribute substantially to the microdiversity of a soil bacterium and that this diversity may play a role in niche differentiation and a bacterium’s adaptation to its local environment.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>A high degree of genetic variation is encompassed within traditional operational taxonomic units (OTUs) of bacteria <ref type="bibr">[1]</ref>. This so-called microdiversity encompasses an enormous amount of variability in traits that influence a bacterium's ecological role and its contributions to community functioning <ref type="bibr">[2]</ref><ref type="bibr">[3]</ref><ref type="bibr">[4]</ref>. Plasmids may contribute to this microdiversity as they can encode a diversity of traits <ref type="bibr">[5]</ref> that may allow a bacterium to adapt rapidly to environmental changes <ref type="bibr">[6]</ref>.</p><p>The most striking examples of this are the transfer of metal and antibiotic resistance, particularly in the human gut microbiome and clinical environments <ref type="bibr">[7]</ref><ref type="bibr">[8]</ref><ref type="bibr">[9]</ref><ref type="bibr">[10]</ref>. Beyond toxin resistance, however, evidence of the importance of plasmids to broader niche adaptation is sporadic <ref type="bibr">[11,</ref><ref type="bibr">12]</ref>. Most of what we currently know is based on a handful of well-represented genera (e.g., Vibrio, Pseudomonas, and Burkholderia) within the phylum Pseudomonadota (e.g., reference <ref type="bibr">[13]</ref>), and few studies consider Gram-positive bacteria (e.g., reference <ref type="bibr">[14]</ref>; but see Finks and Martiny, 2023 <ref type="bibr">[5]</ref>).</p><p>A general understanding of plasmid evolution, the diversity of traits that they carry, and their importance for adaptation in most bacterial communities thus remains elusive <ref type="bibr">[5,</ref><ref type="bibr">15]</ref>. To investigate these unknowns in a soil bacterium, we focused on the widespread genus Curtobacterium <ref type="bibr">[16]</ref>, for which we have isolated a number of closely related strains from the top layer of soil (plant litter) in different environments. 
Curtobacterium strains associated with plant disease can carry plasmids encoding putative virulence genes <ref type="bibr">[17]</ref>. However, plasmid prevalence and diversity for this genus, as in other soil bacteria, are largely uncharacterized.</p><p>Plasmids can mobilize across broad bacterial host ranges <ref type="bibr">[18]</ref>, interact with other types of mobile genetic elements <ref type="bibr">[19]</ref>, and recombine with their hosts <ref type="bibr">[20]</ref>. We thus expected that Curtobacterium plasmids would also be subject to a high degree of mobility and recombination.</p><p>However, plasmids are also vertically transmitted to daughter cells during host cell replication such that, at some level of genetic resolution, they will be phylogenetically conserved. Thus, plasmids might be conserved within Curtobacterium ecotypes, previously defined as genetic clades with similar phenotypes that are adapted to local environmental conditions including temperature and moisture <ref type="bibr">[21]</ref>. Alternatively, selection might act on plasmids separately from that of an ecotype's chromosome such that plasmid traits vary by environment rather than host phylogeny. To test these alternatives, here we asked: (1) Are plasmids within the Curtobacterium genus phylogenetically conserved? (2) What traits do the plasmids encode and how do these compare to the chromosome? (3) Are plasmid traits correlated with the environment from which they were isolated?</p><p>Long-read sequencing of 23 strains and additional reference genomes resulted in analysis of 26 putative plasmids from 18 Curtobacterium strains (Figure <ref type="figure">1</ref>; Supplemental Methods).</p><p>Three lines of evidence suggest that these sequences are indeed plasmids. 
First, the average plasmid GC content was approximately 7% lower than that of the host chromosomes (Figure <ref type="figure">1D</ref>). Second, the topology (usually circular) and replicon sizes (smaller than the chromosome) of the sequences are well-known signatures of plasmids <ref type="bibr">[22,</ref><ref type="bibr">23]</ref>. Third, all but one plasmid (pD03b) carried some kind of plasmid feature. Interestingly, two plasmids of strain P990 showed GC contents that were about half those of the other plasmids (32.3% and 35.3% versus ~67%; Table <ref type="table">S1</ref>), suggesting more recent acquisition of these mobile genetic elements. Approximately half of the plasmid sequences encoded genes for known plasmid replicon types (RepA-type, n=4; Table <ref type="table">S6</ref>) or MOB relaxases (MOBF or MOBP, n=12; Table <ref type="table">S7</ref>). In addition, some plasmids carried genes necessary for conjugative, cell-to-cell DNA transfer (e.g., trwC) and for partitioning to daughter cells during host replication and division (e.g., parA/B/G; Figure <ref type="figure">S1</ref>). Based on sequencing coverage, most plasmids appeared to be present in single copies, whereas some smaller ones were present in high copy numbers (Table <ref type="table">S1</ref>). None of the Curtobacterium plasmid sequences grouped into known plasmid taxonomic units (PTUs), although this is not surprising given the low representation of Actinomycetota in databases (Supplementary Methods <ref type="bibr">[18,</ref><ref type="bibr">24]</ref>).</p><p>Plasmids were common among Curtobacterium strains, but their distribution across the phylogeny was not random. Plasmids were notably absent from ecotype IV and very common in ecotype I (Figure <ref type="figure">1A</ref>). 
That said, plasmid size varied greatly even within clades (1.5-607 kb, mean = 136 kb), supporting the idea that plasmids are not phylogenetically conserved in this genus (Figure <ref type="figure">1C</ref>). Indeed, genetic (mash) similarity of the plasmids was not correlated with the genetic similarity of the host chromosomes (Figure <ref type="figure">1B</ref>; RELATE: r = 0.28; P = 0.08).</p><p>Curtobacterium plasmids encoded more than 4,000 gene calls that clustered into 2,396 distinct orthologous groups (Figure <ref type="figure">S1</ref>). Despite making up only 3% of the gene content of the entire dataset, this genetic diversity spanned 22 COG functional categories. Based on whole genome alignments, Curtobacterium plasmids did not appear to share a conserved backbone, such as is commonly observed for some IncF type plasmids found in Enterobacteriaceae <ref type="bibr">[25]</ref>.</p><p>Only one gene, lsr2 (a putative histone-like protein), was shared by 38% of the 26 plasmids, whereas most other genes were shared by fewer than 3 plasmids (Figure <ref type="figure">S1</ref>). BlastP searches of consensus amino acid sequence alignments of Lsr2 against the NCBI Reference Proteins (refseq_protein) database revealed that this small protein (~12 kDa) is ubiquitous throughout the genus. In M. smegmatis, this protein appears to be involved in the biosynthesis of mycolyldiacylglycerols, an apolar lipid in the cell wall, and to have a DNA-binding function with a transcriptional regulatory role <ref type="bibr">[26]</ref><ref type="bibr">[27]</ref><ref type="bibr">[28]</ref>.</p><p>The Curtobacterium plasmids encoded a diversity of traits that were not a random subset of chromosomal traits (G(21) = 1203.2, P &lt; 0.001; Figure <ref type="figure">2A</ref>). 
Not surprisingly, genes associated with the mobilome (prophages and transposons; X) were relatively more prevalent on plasmids than on the chromosome, but other functions, including those associated with cell motility (N), were also relatively more abundant on plasmids than on chromosomes (Figure <ref type="figure">2B</ref>). Conversely, carbohydrate transport and metabolism functions (G) were more prevalent on Curtobacterium chromosomes than on plasmids. Given their role in soil carbon cycling, it is notable that 11 plasmids carried 46 CAZyme (carbohydrate active enzyme) genes (Table <ref type="table">S9</ref>; Figure <ref type="figure">S2A</ref>), and in more than half of these cases, the CAZyme family was not present on the associated host chromosome. We also identified two genes encoding nitrate assimilation (narB) on a plasmid (Figure <ref type="figure">S2B</ref>).</p><p>Finally, plasmid trait composition differed significantly by the environment from which the host was isolated, explaining ~14% of variation in COG functional categories (PERMANOVA: Pseudo-F(7) = 1.424, P = 0.042). For instance, plasmids isolated from grassland and alpine environments encoded a higher prevalence of carbohydrate transport and metabolism (G) genes, whereas those isolated from two arid environments (Desert and Salton-Sea) encoded a relatively high number of genes associated with cell motility (N) and translation, ribosomal structure and biogenesis (J) (Figure <ref type="figure">2B</ref>).</p></div>
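<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the plasmid-versus-chromosome trait comparison, the following is a minimal Python sketch of a G-test of independence on a replicon-by-COG-category contingency table. It is not the authors' R code: the function name g_test and the example counts are hypothetical. For a table with 2 replicon types and 22 COG categories, the test has (2 - 1) x (22 - 1) = 21 degrees of freedom, which is consistent with the G(21) statistic reported above.</p><p><![CDATA[
```python
import math


def g_test(table):
    """Return (G, df) for a G-test of independence on a 2-D table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # the term 0 * ln(0) is taken as 0
            expected = row_totals[i] * col_totals[j] / grand_total
            g += observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g, df


# Hypothetical counts (not data from the study):
# rows = replicon type (plasmid, chromosome); columns = three COG categories.
counts = [
    [30, 10, 5],
    [200, 400, 300],
]
g_stat, dof = g_test(counts)
print(f"G({dof}) = {g_stat:.2f}")
```
]]></p><p>The P value would then be obtained from a chi-squared distribution with the returned degrees of freedom, for example with a statistics library.</p></div>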
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Our results indicate that plasmids contribute substantially to the microdiversity of Curtobacterium and that this diversity may play a role in its adaptation to the local environment.</p><p>Horizontal transfer appeared to break up any signal of vertical transmission of plasmids, even within Curtobacterium ecotypes. However, only about half the plasmids encoded genes known to facilitate mobility from one bacterium to another. This result is similar to that of marine Vibrio spp., where plasmids also appear to spread rapidly by horizontal gene transfer, many by unknown mechanisms <ref type="bibr">[29]</ref>. This work also highlights the paucity of knowledge about which plasmid traits will be favored in natural ecosystems. Models investigating the evolutionary mechanisms that sustain plasmid diversity suggest that they should encode traits, like antibiotic resistance, that are widely beneficial to many bacterial species and come under relatively strong selection <ref type="bibr">[30]</ref>. Future investigations into how, when, and where plasmid traits such as cell motility provide soil bacteria with an advantage would provide a more in-depth understanding of the eco-evolutionary role of these mobile genetic elements in soil.</p><note type="other">FIGURE LEGENDS</note><p>distribution, and GC content. (A) Cladogram of complete chromosomes of Curtobacterium constructed from a phylogenomic analysis of 916 single-copy core genes. All branches displayed represent bootstrap values of 95% confidence or greater. Bolded values next to strain identifiers are nucleotide lengths of plasmids in bp. The branches are colored by ecotype designation with adjacent color tiles indicating the environment from which the strain was isolated. Note: asterisks indicate the two plasmids in host P990 with relatively lower % GC content (see panel D) compared to the others. (B) Heatmap of plasmids constructed from mash pairwise similarities. The strain identifiers are listed by row and the plasmid identifiers (Table S1) by column. The color tiles beside the row labels indicate the environment as in panel A. (C) The frequency of plasmid sizes in kilobases across all strains, where the x-axis is the lower bound of each 25-kb bin. (D) Percent GC content for each plasmid and its corresponding chromosome.</p><p>and vary by environment. (A) Percentages (Log10 scaled) of COG functional category counts of the Curtobacterium plasmid (top) and chromosome (bottom) sequences. The total numbers of COG functions identified on Curtobacterium plasmids and chromosomes are shown in parentheses. No COG functions for category Z were identified on the plasmids, and plasmids pCff2, pCff3, and pD03b are not included as no COG functions were identified. (B) Normalized frequencies of COG categories encoded by the plasmids by environment. The counts of COG functions were first converted into proportional abundances within an environment after removal A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Percentage of COG by replicon COG category</head><p>[Figure panel text: pangenome display of Curtobacterium plasmids (2,396 gene clusters; 4,382 gene calls) with layers for ecotype (I, II, V, Other), COG20 functions, Pfam, number of genes per kbp, and singleton gene clusters; highlighted plasmid features are conjugative cell-to-cell DNA transfer proteins (e.g., TrwC, FtsK), chromosome/plasmid partitioning proteins (ParA, ParB, ParG), histone-like proteins, and CAZymes (e.g., GH4, GH65, GH35, GT2); panels labelled pG07 (82,765 bp, 68.8% GC) and pD35 (56,505 bp, 66.6% GC), with MOBF annotated.]</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>SUPPLEMENTAL METHODS</head><p>Culture collection and reference genomes. We long-read sequenced 23 Curtobacterium strains from our culture collection that were obtained from senescent plant litter (the top 0-5 cm of soil) along an elevation gradient in Southern California. The strains were stored in 25% v/v glycerol at -80 &#176;C and had been previously sequenced on an Illumina platform <ref type="bibr">[1]</ref><ref type="bibr">[2]</ref><ref type="bibr">[3]</ref>. In addition, we retrieved 14 complete plasmid sequences (and associated host chromosomes) representing diverse Curtobacterium spp. hosts that were deposited in NCBI GenBank and RefSeq databases on March 31, 2022. The search criteria we used included: 'Curtobacterium' and 'Plasmid' or 'Chromosome'. In total, we included 39 Curtobacterium genomes in our analyses (Table <ref type="table">S1</ref>).</p><p>Notably, several attempts were made to isolate plasmids from several strains in our culture collection using Qiagen&#174; Plasmid Maxi Kit (Qiagen, Hilden, Germany Sequencing Center (Pennsylvania, USA), generating 300 Mbp per isolate. Basecalling of the raw nanopore reads was performed using Guppy v5.0.16.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Sequence assemblies. De novo 'hybrid' assemblies of ONT- and Illumina-sequenced Curtobacterium strains were performed with quality-checked short and long reads using the default settings of Unicycler version 0.4.8 <ref type="bibr">[4]</ref>. Prior to assembly, the quality of both ONT and Illumina sequencing data was checked using FastQC version 0.11.9, and reports were compiled using MultiQC version 1.9 <ref type="bibr">[5]</ref>. For ONT-generated reads, low-quality (PHRED &lt; 8), adapter, and chimeric sequences were removed using Porechop version 0.2.4, along with sequences &lt; 2 kbp in length, per previously described methods <ref type="bibr">[6]</ref>. For Illumina-generated reads, low-quality (PHRED &lt; 30), adapter, and PhiX sequences were removed using FastP version 0.20.0 <ref type="bibr">[7]</ref>. The read quality of both ONT and Illumina quality-filtered reads was reassessed with FastQC and MultiQC. A 'hybrid' assembly (combining long and short read sequencing data) approach was used to obtain complete replicon assemblies, as many long reads can exceed the length of repeats in bacterial genomes, which are also a characteristic of many types of MGE, and short reads can improve accuracy of detecting plasmids in WGS data <ref type="bibr">[6]</ref>. Notably, for the Scrubland-52 (W52) and Pine-Oak-43 (P43) genomes, these hybrid assemblies failed, and long-read-only assemblies using Trycycler v0.5.3 were performed along with a final polishing step using Medaka version 1.6.0 <ref type="bibr">[8]</ref>. All assembly graphs were assessed using Bandage version 0.8.1 <ref type="bibr">[9]</ref>, and completeness of genome assemblies (e.g., contiguity, N50, and %GC) was determined using the web interface of Quast <ref type="bibr">[10]</ref>.</p><p>Phylogenomic analysis. 
To determine the similarity of NCBI-retrieved plasmid and chromosome sequences to previously described ecotypes (genetic clades with similar phenotypes that are adapted to local environmental conditions including temperature and moisture) of Curtobacterium <ref type="bibr">[1]</ref> from our culture collection, reference sequences were imported into Anvi'o version 7.0 <ref type="bibr">[11]</ref>. First, 916 single-copy core genes within chromosome sequences were identified and concatenated, and nucleotide positions that were gap characters in more than 50% of the sequences were removed using trimAl version 1.4.1. Next, IQ-TREE <ref type="bibr">[12,</ref><ref type="bibr">13]</ref> with the 'WAG' <ref type="bibr">[14]</ref> general matrix model was used to construct a maximum likelihood tree, which was visualized using iTOL version 5 <ref type="bibr">[15]</ref>. Except for three strains (AA3, BH2-1-1 and W02), the Curtobacterium strains in this study fell within five previously described ecotypes (based on clade designations).</p><p>Putative plasmids were identified as closed, circular sequences that were distinct from the chromosome (those having similar percent GC content to known Curtobacterium plasmids). No genes were conserved across all plasmid sequences, and the nucleotide lengths of putative plasmids varied significantly. Therefore, pairwise estimates of plasmid similarities were calculated using Mash version 2.3 <ref type="bibr">[16,</ref><ref type="bibr">17]</ref>. The parameters for calculating mash distances were as follows: k-mer size = 21 and minimum hashes per sketch = 1000 (Tables <ref type="table">S2</ref> and <ref type="table">S3</ref>). This comparison method was chosen because it allows for the similarity of the original sequences to be rapidly estimated with a bounded error. 
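</p><p>The MinHash estimate that underlies Mash can be sketched in a few lines of Python, using the k-mer size (21) and sketch size (1000) stated above. This is a minimal illustration under stated assumptions, not the Mash implementation: SHA-1 from the standard library stands in for the hash function used by the tool, strand independence is handled with canonical k-mers, and all function names are hypothetical.</p><p><![CDATA[
```python
import hashlib
import math
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k=21):
    """Yield the lexicographically smaller of each k-mer and its reverse complement."""
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing ambiguous bases
        yield min(kmer, kmer.translate(COMPLEMENT)[::-1])


def sketch(seq, k=21, size=1000):
    """Bottom-s sketch: the `size` smallest distinct k-mer hash values."""
    hashes = {
        int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        for kmer in canonical_kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=1000):
    """Estimate the Jaccard index from the smallest hashes of the merged sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(jaccard, k=21):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)), capped to the range [0, 1]."""
    if jaccard <= 0.0:
        return 1.0
    d = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return min(1.0, max(0.0, d))


# Hypothetical example: a random 5 kb sequence and a copy with 50 substitutions.
rng = random.Random(0)
original = "".join(rng.choice("ACGT") for _ in range(5000))
mutated = list(original)
for pos in rng.sample(range(len(mutated)), 50):
    mutated[pos] = rng.choice([b for b in "ACGT" if b != mutated[pos]])
mutated = "".join(mutated)

j = jaccard_estimate(sketch(original), sketch(mutated))
print(f"Jaccard estimate: {j:.3f}; Mash distance: {mash_distance(j):.4f}")
```
]]></p><p>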
This error bound depends only on the size of the sketch (i.e., the mash similarities are independent of the genome sizes), and Mash distances are strongly correlated with ANI <ref type="bibr">[16]</ref>.</p><p>Mash distances for chromosomal sequences were also calculated using the same approach as for plasmids (Table <ref type="table">S2</ref>). To evaluate whether putative plasmids of Curtobacterium grouped into known plasmid taxonomic units (380 PTUs constructed from 9,894 plasmid sequences from a curated reference database, RefSeq84), the web version of COPLA was used <ref type="bibr">[18]</ref>. To investigate whether any of the plasmids shared a conserved backbone region, as is common with other types of plasmids <ref type="bibr">[19]</ref>, whole genome alignments were performed using Mauve v1.1.3 <ref type="bibr">[20]</ref> with a seed weight set to 15 and minimum LCB score of 30,000.</p><p>Trait analyses. To determine the trait content of chromosomes and plasmids, gene calls were made in Anvi'o using Prodigal version 2.6.3 <ref type="bibr">[21]</ref> and searched against the COG20 (Clusters of Orthologous Groups of genes/proteins) <ref type="bibr">[22]</ref> and Pfam version 33.1 <ref type="bibr">[23]</ref> databases via DIAMOND v0.9.14 <ref type="bibr">[24]</ref> in sensitive mode (Tables <ref type="table">S4</ref> and <ref type="table">S5</ref>). Putative plasmid replicases (used in plasmid replicon typing/incompatibility grouping) were identified from hits to the Pfam database (Table <ref type="table">S6</ref>). Clustering analysis of plasmid and chromosome amino acid sequence similarities was performed in Anvi'o using the MCL algorithm <ref type="bibr">[25]</ref>, under the following parameters: exclude partial gene calls, minimum gene cluster occurrence = 1, and default settings for minbit heuristic and MCL inflation parameter. 
Gene clusters for plasmid and chromosome replicons were visualized via the anvi-display-pan feature of the interactive interface. All COG functions, Pfam hits, and corresponding gene calls were exported as tables from Anvi'o and merged into one data table before importing into R version 4.2.2 <ref type="bibr">[26]</ref> for statistical analysis. To determine the potential for plasmids to be mobilizable, sequences were searched for MOB family relaxases, enzymes essential for conjugative DNA processing <ref type="bibr">[27]</ref>, using MobScan (Table <ref type="table">S7</ref>) <ref type="bibr">[28]</ref>.</p><p>Additionally, chromosome and plasmid sequences were analyzed for genes involved in carbohydrate and nitrogen utilization. To identify carbohydrate active enzymes (CAZymes), we used run_dbcan 4.0.0 and dbCAN2 databases released in 2022 <ref type="bibr">[29]</ref>. Query matches were included if two or more of the three search tools (HMMER, DIAMOND, Hotpep) identified the same CAZyme family annotation per the developer's recommendation <ref type="bibr">[29]</ref>. Query results were included in analyses for HMMER searches of dbCAN and dbCAN-sub with E-values &lt; 1e-15 and coverage &gt; 0.35, and for DIAMOND searches of the CAZy database with E-values &lt; 1e-102 (Tables <ref type="table">S8</ref> and <ref type="table">S9</ref>). To identify genes associated with nitrogen-cycling pathways, BLASTp searches of queries against a curated database of nitrogen (N) gene families, the NCycDB release 2019 <ref type="bibr">[30]</ref> at 100% sequence identity, were performed, and gene calls meeting an E-value threshold of 1e-5 and &gt; 50% query coverage were included in the analyses (Table <ref type="table">S10</ref>).</p><p>Statistical analysis. 
To determine whether the pairwise similarities for plasmid and chromosome sequences varied by ecotype and/or environment type, similarity matrices for each sequence type were tested separately via permutational multivariate analysis of variance (PERMANOVA; permutations n = 999 with unrestricted permutations of raw data using type III sums of squares) in PRIMER-e version 6 <ref type="bibr">[31,</ref><ref type="bibr">32]</ref> with ecotype and/or environment designated as fixed factors. Distance-based tests for homogeneity of multivariate dispersions were also performed using the PERMDISP function in PRIMER-e, grouping by either ecotype or environment. To account for sampling biases for rare ecotypes (i.e., Curtobacterium chromosomes outside ecotype/clade I or V; Table <ref type="table">S1</ref>) and environments (i.e., Curtobacterium isolated from algae or unknown origins; Table <ref type="table">S1</ref>), the plasmids/chromosomes in these categories were grouped together into an 'Other' category. The estimated variance explained was determined by dividing terms with significant p-values plus the residual variation by the sum of the estimates of components of variation given as output from PRIMER-e. To test whether plasmid and chromosome genetic similarities varied similarly by ecotype and environment, a RELATE test <ref type="bibr">[32]</ref> using Spearman correlation was performed in PRIMER-e.</p><p>To determine whether the COG and CAZyme composition of plasmids and chromosomes varied by ecotype and/or environment type, Euclidean distances were calculated from COG and CAZyme counts using the vegdist function of the 'vegan' package in R <ref type="bibr">[33]</ref>, and PERMANOVA (<ref type="url">https://www.geneious.com</ref>). 
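</p><p>The logic of a PERMANOVA on Euclidean distances can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: a one-way design with unrestricted permutations of group labels, rather than the type III sums of squares of PRIMER-e or the vegan implementation, and the function names and count profiles are hypothetical.</p><p><![CDATA[
```python
import math
import random


def euclidean_matrix(profiles):
    """Pairwise Euclidean distances between count profiles (one list per replicon)."""
    n = len(profiles)
    return [[math.dist(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


def pseudo_f(dist, groups):
    """One-way pseudo-F statistic computed from a distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for label in labels:
        members = [i for i, g in enumerate(groups) if g == label]
        pair_sum = sum(
            dist[i][j] ** 2
            for pos, i in enumerate(members)
            for j in members[pos + 1:]
        )
        ss_within += pair_sum / len(members)
    ss_among = ss_total - ss_within
    df_among = len(labels) - 1
    df_within = n - len(labels)
    return (ss_among / df_among) / (ss_within / df_within)


def permanova(dist, groups, permutations=999, seed=0):
    """Return (pseudo-F, permutation P value) using shuffled group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed labelling counts as one permutation.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical COG count profiles (three categories) for six plasmids
# from two environments; these are not data from the study.
profiles = [
    [12, 3, 1], [10, 4, 2], [11, 2, 0],
    [2, 9, 7], [3, 11, 6], [1, 10, 8],
]
environments = ["grassland"] * 3 + ["desert"] * 3
f_stat, p_value = permanova(euclidean_matrix(profiles), environments)
print(f"pseudo-F = {f_stat:.2f}, P = {p_value:.3f}")
```
]]></p><p>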
Additionally, G-tests were performed on contingency tables of nonstandardized trait counts with rare traits (trait counts &lt; 6 across all environments) removed to confirm that trends were not stochastic attributes of these sequences.</p></div></body>
		</text>
</TEI>
