<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Systematic in vitro specificity profiling reveals nicking defects in natural and engineered CRISPR–Cas9 variants</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>03/21/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10318804</idno>
					<idno type="doi">10.1093/nar/gkab163</idno>
					<title level='j'>Nucleic Acids Research</title>
<idno>0305-1048</idno>
<biblScope unit="volume">49</biblScope>
<biblScope unit="issue">7</biblScope>					

					<author>Karthik Murugan</author><author>Shravanti K Suresh</author><author>Arun S Seetharam</author><author>Andrew J Severin</author><author>Dipali G Sashital</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Cas9 is an RNA-guided endonuclease in the bacterial CRISPR–Cas immune system and a popular tool for genome editing. The commonly used Streptococcus pyogenes Cas9 (SpCas9) is relatively non-specific and prone to off-target genome editing. Other Cas9 orthologs and engineered variants of SpCas9 have been reported to be more specific. However, previous studies have focused on specificity of double-strand break (DSB) or indel formation, potentially overlooking alternative cleavage activities of these Cas9 variants. In this study, we employed in vitro cleavage assays of target libraries coupled with high-throughput sequencing to systematically compare cleavage activities and specificities of two natural Cas9 variants (SpCas9 and Staphylococcus aureus Cas9) and three engineered SpCas9 variants (SpCas9 HF1, HypaCas9 and HiFi Cas9). We observed that all Cas9s tested could cleave target sequences with up to five mismatches. However, the rate of cleavage of both on-target and off-target sequences varied based on target sequence and Cas9 variant. In addition, SaCas9 and engineered SpCas9 variants nick targets with multiple mismatches but have a defect in generating a DSB, while SpCas9 creates DSBs at these targets. Overall, these differences in cleavage rates and DSB formation may contribute to varied specificities observed in genome editing studies.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>INTRODUCTION</head><p>Cas9 is the well-studied effector protein of type II CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR associated) bacterial immune systems <ref type="bibr">(1,</ref><ref type="bibr">2)</ref>. Cas9 is an endonuclease that uses a dual CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA) to bind dsDNA targets that are complementary to the guide region of the crRNA and adjacent to a short, conserved protospacer-adjacent motif (PAM) sequence <ref type="bibr">(3,</ref><ref type="bibr">4)</ref>. Two nuclease domains in Cas9, HNH and RuvC, cut the target and non-target strand respectively, generating a double-stranded break (DSB) in the dsDNA <ref type="bibr">(4)</ref> with little post-cleavage trimming <ref type="bibr">(5,</ref><ref type="bibr">6)</ref>. The dual RNAs can be combined into a single guide-RNA (sgRNA) and the targeting region can be varied, making Cas9-sgRNA a readily programmable, two component system for use in various biotechnological applications <ref type="bibr">(4,</ref><ref type="bibr">7)</ref>. In particular, DSB formation followed by DNA repair can lead to changes in genomic DNA sequence, enabling genome editing following Cas9 cleavage <ref type="bibr">(8,</ref><ref type="bibr">9)</ref>.</p><p>Cas9 can tolerate mismatches between the crRNA and the target DNA, which is consistent with its role as a bacterial immune system effector in facilitating defense against rapidly evolving bacteriophages (10-13). Cas9 generally tolerates multiple mismatches in the PAM-distal region while PAM-proximal "seed" mismatches reduce the cleavage activity <ref type="bibr">(14)</ref><ref type="bibr">(15)</ref><ref type="bibr">(16)</ref><ref type="bibr">(17)</ref><ref type="bibr">(18)</ref>. 
This low fidelity leads to off-target activity when used for genome editing applications, as Cas9 can create DSBs at sites with limited homology to the intended target <ref type="bibr">(16,</ref><ref type="bibr">17,</ref><ref type="bibr">19)</ref>. While the commonly used wildtype (WT) Streptococcus pyogenes Cas9 (SpCas9) can tolerate multiple mismatches in the target sequence, other naturally occurring Cas9 orthologs from Staphylococcus aureus, Neisseria meningitidis and Campylobacter jejuni are reported to have higher specificity in genome editing compared to SpCas9 <ref type="bibr">(20)</ref><ref type="bibr">(21)</ref><ref type="bibr">(22)</ref><ref type="bibr">(23)</ref>. Many other strategies have been developed to reduce off-target activity of Cas9 <ref type="bibr">(24)</ref>. SpCas9 has been engineered to improve the fidelity of target cleavage activity. Some mutations were designed to reduce DNA target interactions, making the requirement for complete complementarity with the crRNA more stringent <ref type="bibr">(25,</ref><ref type="bibr">26)</ref>. Mutations rationally introduced in the REC domain of SpCas9 prevent conformational changes required for nuclease domain activation when a target sequence with mismatches is encountered <ref type="bibr">(27,</ref><ref type="bibr">28)</ref>. Bacterial screens have also been used to select high-fidelity SpCas9 variants that maintain on-target cleavage but have reduced off-target cleavage activity <ref type="bibr">(29)</ref><ref type="bibr">(30)</ref><ref type="bibr">(31)</ref>.</p><p>Several methods have been developed to detect and study off-target activities of Cas9 <ref type="bibr">(24,</ref><ref type="bibr">(32)</ref><ref type="bibr">(33)</ref><ref type="bibr">(34)</ref><ref type="bibr">(35)</ref>. 
However, methods that measure Cas9 off-target editing in eukaryotic cells are limited because cellular factors like nucleosomes may sequester potential cleavage sites <ref type="bibr">(36,</ref><ref type="bibr">37)</ref>. DNA accessibility can also vary depending on cellular processes, which may change the outcome and detection of potential Cas9 off-target editing events. These methods also rely on DSBs in the DNA generated by Cas9 or postcleavage DNA repair and indel formation, which can vary among cell types and experiments <ref type="bibr">(24,</ref><ref type="bibr">(32)</ref><ref type="bibr">(33)</ref><ref type="bibr">(34)</ref><ref type="bibr">(35)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Cas9 expression and purification</head><p>All Cas9 proteins were expressed in Escherichia coli BL21 (DE3) cells. Overnight cultures of the cells carrying the expression plasmid were used to inoculate 2X TY broth supplemented with corresponding antibiotics in 1:100 ratio. The antibiotics used were kanamycin at 25 &#181;g/mL for SpCas9 (pMJ806), and at 50 &#181;g/mL for SaCas9 (pSV272 construct) and ampicillin at 100 &#181;g/mL for SpCas9-HF1 (pJSC111) and HypaCas9 (pJSC173). Cultures were grown at 37 &#176;C to an optical density (600 nm) of 0.5 -0.6 and IPTG was added to a final concentration of 0.2 mM to induce protein expression. The incubation was continued at 18 &#176;C overnight (~16 -18 hours) and harvested the next day for protein purification.</p><p>SpCas9 was purified using a previously established protocol <ref type="bibr">(43)</ref>. Cells were resuspended in Lysis Buffer I (20 mM Tris-HCl pH 8.0, 500 mM NaCl, 10 mM imidazole, and 10% glycerol) supplemented with PMSF. A sonicator was used to lyse the cells and the lysate was centrifuged to remove insoluble material. The clarified lysate was applied to a HisPur&#8482; Ni-NTA Resin (ThermoFisher Scientific) column.</p><p>After washing the column with Lysis Buffer I, the bound protein was eluted in Elution Buffer I (Lysis Buffer I + 250 mM imidazole final concentration). The Ni-NTA column eluent was concentrated and run on a HiLoad 16/600 Superdex 200 gel filtration column (GE Healthcare) pre-equilibrated with SEC Buffer A (20 mM Tris-HCl, pH 8.0, and 500 mM NaCl). TEV protease was added at 1:100 (w/w) ratio to the pools containing 6X His-MBP tagged Cas9 and incubated on ice, overnight at 4 &#176;C. Samples were reapplied to HisPur&#8482; Ni-NTA Resin (ThermoFisher Scientific) to remove the His-tagged TEV, free 6X His-MBP, and any remaining tagged protein. 
The flow-through was collected, concentrated and further purified by using a HiLoad 16/600 S200 gel filtration column in SEC Buffer B (20 mM Tris-HCl, pH 8.0, 200 mM KCl, and 1mM EDTA). Peak pools were analyzed on SDS-PAGE gels and the pools with Cas9 were combined, concentrated, flash frozen in liquid nitrogen and stored at -80&#176;C until further use. Cleavage activity of SpCas9 purified using this protocol was similar to commercially available SpCas9 (data not shown).</p><p>An alternative previously established purification protocol was used for all other Cas9 variants <ref type="bibr">(44)</ref>, with the exception of Alt-R &#174; S.p. HiFi Cas9, which was provided by Integrated DNA Technologies (IDT). Harvested cells were resuspended in Lysis Buffer II (20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5 mM imidazole), supplemented with protease inhibitors (PMSF, cOmplete&#8482; Protease Inhibitor Cocktail Tablet or Halt Protease Inhibitor Cocktail). A sonicator was used to lyse the cells and the lysate was centrifuged to remove insoluble material. The clarified lysate was applied to a HisPur&#8482; Ni-NTA Resin (ThermoFisher Scientific) column. After washing the column with 10 column volumes of Wash Buffer (Lysis Buffer + 15 mM imidazole final concentration), the bound protein was eluted in Elution Buffer I (Lysis Buffer II + 250 mM imidazole final concentration). Fractions containing Cas9 were pooled and TEV protease was added in a 1:100 (w/w) ratio and dialyzed in Dialysis Buffer (10 mM HEPES-KOH pH 7.5, 200 mM KCl, 1 mM DTT) at 4&#176;C overnight. The dialyzed protein was diluted 1:1 with 20 mM HEPES KOH (pH 7.5) and loaded on a HiTrap Heparin HP (GE Healthcare) column and washed with Buffer A (20 mM HEPES-KOH pH 7.5, 100 mM KCl). The protein was eluted with Buffer B (20 mM HEPES-KOH pH 7.5, 2 M KCl) by applying a gradient from 0% to 50% over a total volume of 60 ml. Eluted peak fractions were analyzed by SDS-PAGE and fractions with Cas9 were combined and concentrated. 
DTT was added to a final concentration of 1 mM. The protein was fractionated on a HiLoad 16/600 Superdex 200 gel filtration column (GE Healthcare), eluting with SEC buffer (20 mM HEPES-KOH pH 7.5, 500 mM KCl, 1 mM DTT).</p><p>Peak pools were analyzed on SDS-PAGE gels and the pools with Cas9 were combined, concentrated, flash frozen in liquid nitrogen and stored at -80&#176;C until further use.</p><p>Variations in Cas9 purification procedures could lead to differences in activity of the Cas9 variants. However, the level of purity was similar for all variants, and conditions were identical for all reactions (Fig. <ref type="figure">S1A</ref>) (see methods section: In vitro cleavage assay and analysis). All Cas9s were frozen as high concentration stocks (~61-200 &#181;M). Working stock concentrations of the proteins (5 or 10 &#181;M) were made in SEC buffer (20 mM HEPES-KOH pH 7.5, 500 mM KCl, 1 mM DTT).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Library creation</head><p>Target libraries were partially randomized to generate a pool of sequences containing mismatches <ref type="bibr">(45)</ref>. The following probability distribution function was used to determine the randomization/doping frequency,</p><p>$P(n, L, f) = \frac{L!}{n!\,(L-n)!}\, f^{n}\,(1-f)^{L-n}$ [Eq. 1]</p><p>where P is the fraction of the population (pool), L is the sequence length, n is the number of mutations per template and f is the probability of mutation per position (doping level or frequency). A randomization/doping frequency (f) of 15% results in a library containing a mixed pool of sequences of 20 nt (L) with a high representation of 2 to 4 mismatches (n). Single-stranded oligonucleotide libraries were ordered from IDT using hand-mixed pools (<ref type="url">https://www.idtdna.com/pages/products/custom-dna-rna/mixed-bases</ref>). For libraries with 15% randomization/doping frequency, if the target sequence has A at a given position, a mix of A:C:G:T would be dispensed in 85:5:5:5 ratio during oligonucleotide synthesis, resulting in 85% A at this position and 15% C, G or T (5% each).</p><p>The number of different mutation combinations (MMc) for a given number of mutations, n, and sequence length, L, regardless of the doping level/frequency, is determined by,</p><p>$MM_{c} = \frac{L!}{n!\,(L-n)!} \times 3^{n}$ [Eq. 2]</p><p>For a 20-nt target, the total number of unique target sequences with a single mismatch is 60, with 2 mismatches is 1,710, and with 3 mismatches is 30,780, etc. We used two library sequences that we previously tested for Cas12a <ref type="bibr">(42)</ref>: a modified protospacer 4 (PS4) sequence from the Streptococcus pyogenes CRISPR locus (55% GC) and an EMX1 gene target sequence (80% GC) (see Supplementary Table <ref type="table">1</ref> for target sequences).</p></div>
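<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a worked illustration of the library design arithmetic in this subsection, the following Python sketch evaluates the doping distribution and the number of mismatch combinations for a 20-nt target at 15% doping. It assumes that the doping distribution is binomial (each position mutated independently with probability f); the function names are hypothetical and are not part of the analysis scripts used in this study.</p>
<p><![CDATA[
```python
from math import comb


def doping_distribution(n, length=20, f=0.15):
    """Assumed binomial model: probability that a template of `length`
    positions carries exactly n mutations when each position is mutated
    independently with probability f."""
    return comb(length, n) * f**n * (1 - f) ** (length - n)


def mismatch_combinations(n, length=20):
    """Number of unique sequences with exactly n mismatches: choose the n
    mismatched positions, then one of the 3 alternative bases at each."""
    return comb(length, n) * 3**n


# Tabulate the first few mismatch classes for a 20-nt target at 15% doping.
for n in range(6):
    print(n, round(doping_distribution(n), 4), mismatch_combinations(n))
```
]]></p>
<p>Under these assumptions the distribution peaks at three mismatches, in line with the stated high representation of 2 to 4 mismatches, and the combination counts reproduce the values of 60, 1,710 and 30,780 quoted for one, two and three mismatches.</p></div>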
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Plasmid and nucleic acid preparation</head><p>All DNA oligonucleotides used in this study were synthesized by IDT or Thermo Scientific. RNAs (tracrRNA and crRNA) and single-stranded target or library oligonucleotides were ordered from IDT.</p><p>Supplementary Table <ref type="table">1</ref> lists the sequences of DNA and RNA oligonucleotides used in this study.</p><p>Gibson assembly was used to generate target (pTarget) and library (pLibrary) plasmids <ref type="bibr">(46)</ref>. The oligonucleotides for the targets or libraries were diluted to 0.2 &#181;M in 1X NEBuffer 2. pUC19 vector was amplified using primers listed in Supplementary table 1 via PCR to insert homology arms. The PCR reaction was subjected to DpnI digestion and PCR clean up (Promega Wizard SV Gel and PCR Clean-Up System), as per the manufacturer's protocol. 30 ng of PCR amplified pUC19, 5 &#181;L of oligonucleotide (0.2 &#181;M) and ddH2O to bring the volume to 10 &#181;L were mixed with 10 &#181;L 2X NEBuilder HiFi DNA Assembly Master mix (New England Biolabs) and incubated at 50 &#176;C for 1 hour. NEB Stable competent cells were transformed with 2 &#181;L of the assembled product, as per the manufacturer's protocol. Transformants were plated for plasmid preparation (for pTarget plasmids) or to assess transformation efficiencies (for pLibrary plasmids). For pTarget, starter cultures from individual colonies were used to inoculate 50 mL LB media with 100 &#181;g/mL ampicillin. For pLibrary, all of cells in the outgrowth media from the transformation recovery were used to inoculate 50 mL LB with 100 &#181;g/mL ampicillin. Cultures were grown overnight at 37 &#176;C for plasmid propagation and extraction using QIAGEN Plasmid Midi Kit. The following precautions were taken to ensure the plasmid remained supercoiled during plasmid extractions. Cells were cooled on ice before harvesting. 
All initial steps from lysis to neutralization for plasmid extractions were performed on ice with minimum mechanical stress. Plasmids were stored as aliquots that were used for up to 10 freeze-thaw cycles. Different pLibrary assembly reactions and preparations were used for the replicates of the in vitro cleavage assays (Fig. <ref type="figure">S1B</ref>). All pTarget sequences were verified by Sanger sequencing (Eurofins Genomics, Kentucky, USA). For controls, pUC19 was prepared by restriction enzyme digestion using BsaI-HF to linearize the plasmid and Nt.BspQI to nick the plasmid using the manufacturer's protocols (New England Biolabs).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>In vitro cleavage assay and analysis</head><p>The protocol was adapted from previously described methods <ref type="bibr">(47)</ref>. Cas9:tracrRNA:crRNA complex was formed by incubating Cas9 and tracrRNA:crRNA at a 1:1.5 ratio in reaction buffer (20 mM HEPES, pH 7.4, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, and 5% glycerol) at 37 &#176;C for 10 min. Cas9 RNP complex (final concentration 100 nM Cas9 and 150 nM tracrRNA:crRNA) was mixed with pTarget, pLibrary or empty plasmid (15 ng/&#181;L, ~9 nM) to initiate cleavage reactions at 37 &#176;C. Phenol-chloroform was used to quench reaction aliquots at 5, 10, 15, 30, 60, 300 and 1800 s for pTarget and at 1, 5, 30, 60 and 180 min for pLibrary. The aqueous layer was extracted and separated on a 1% agarose gel via electrophoresis and stained with SYBR Safe (Invitrogen) or RedSafe (Intron Bio) stain for dsDNA visualization. Excess tracrRNA:crRNA was used in cleavage assays to prevent any RNA-independent cleavage activity <ref type="bibr">(48)</ref>. All cleavage assays were performed in triplicate.</p><p>Bands were visualized and quantified with ImageJ (<ref type="url">https://imagej.nih.gov/ij/</ref>). Intensities of the bands (I) in the uncleaved (supercoiled, SC) and cleaved (nicked, N, and linearized, L) fractions were measured. Fractions (FR) cleaved and uncleaved were calculated as follows. The FRSC, FRN and FRL were determined for each of the time points 't'. FR for time point 0 (FR0) was determined for the negative control pLibrary (i.e. pLibrary run on a gel after preparation as represented in Fig. <ref type="figure">S1B</ref>).</p><p>$FR_{SC} = \frac{I_{SC}}{I_{SC} + I_{N} + I_{L}}$ [Eq. 3]; $FR_{N} = \frac{I_{N}}{I_{SC} + I_{N} + I_{L}}$ [Eq. 4]; $FR_{L} = \frac{I_{L}}{I_{SC} + I_{N} + I_{L}}$ [Eq. 5]</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>$\text{Fraction cleaved } (FR_{C}) = FR_{N} + FR_{L}$ [Eq. 6]</head><p>The apparent rates of pTarget and pLibrary cleavage were determined by fitting FRC to a one-phase association equation using GraphPad Prism v 8.4.3 (<ref type="url">https://www.graphpad.com/scientificsoftware/prism/</ref>).</p><p>$FR = FR_{0} + (FR_{final} - FR_{0})\,(1 - e^{-kt})$ [Eq. 7]</p><p>where t is time, FR is the appropriate FRC that starts from FR0 and goes to FRfinal (FR at the last time point), and k is the apparent rate constant.</p></div>
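<div xmlns="http://www.tei-c.org/ns/1.0"><p>The gel quantification and rate analysis described above can be sketched in Python as follows. This is an illustration only: the fitting in this study was done by nonlinear regression in GraphPad Prism, whereas the sketch estimates k with a simple log-linear least-squares fit, which is a swapped-in simplification. It also assumes that the fraction cleaved is the sum of the nicked and linear fractions. All function names are hypothetical.</p>
<p><![CDATA[
```python
import math


def band_fractions(i_sc, i_n, i_l):
    """Return (FR_SC, FR_N, FR_L, FR_C) from the supercoiled, nicked and
    linear band intensities. FR_C = FR_N + FR_L is an assumption here."""
    total = i_sc + i_n + i_l
    fr_sc, fr_n, fr_l = i_sc / total, i_n / total, i_l / total
    return fr_sc, fr_n, fr_l, fr_n + fr_l


def one_phase_association(t, fr0, fr_final, k):
    """One-phase association: FR(t) = FR0 + (FRfinal - FR0) * (1 - exp(-k t))."""
    return fr0 + (fr_final - fr0) * (1.0 - math.exp(-k * t))


def estimate_k(times, fractions, fr0, fr_final):
    """Rough estimate of the apparent rate constant k.

    Linearizes the model as ln((FRfinal - FR) / (FRfinal - FR0)) = -k t and
    fits a line through the origin by least squares. Points at t = 0 or at
    (or beyond) the plateau carry no rate information and are skipped.
    """
    num = 0.0
    den = 0.0
    for t, fr in zip(times, fractions):
        remaining = (fr_final - fr) / (fr_final - fr0)
        if t > 0 and remaining > 0:
            num += t * -math.log(remaining)
            den += t * t
    return num / den
```
]]></p>
<p>For noisy gel data a nonlinear fit is generally preferable, because the log transform amplifies the error of points close to the plateau.</p></div>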
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Library preparation for HTS</head><p>Agarose gel electrophoresis (as described above) was used to separate the plasmid library cleavage products into cleaved (linear and nicked) and uncleaved (supercoiled) products. The bands from the nicked and supercoiled pools from various time points were excised separately and were individually gel purified using QIAquick Gel Extraction Kit (Qiagen). Nextera Adapters (NEA) were designed to amplify across the target region in the pLibrary. Because the PCR primers amplified across the target region, Cas9-mediated linearization of the plasmid due to DSB formation at the target site did not yield any PCR product while Cas9 mediated nicked plasmid resulted in amplification of the target region via PCR.</p><p>Standard Nextera unique indices/barcodes were used to multiplex the samples and were added to the first PCR products using another round of PCR (see Supplementary Table <ref type="table">1</ref> for NEA primers). Samples were purified using QIAquick PCR Purification Kit (Qiagen) between the two PCR steps. The size of the PCR products was verified using Agilent 2100 Bioanalyzer. Pooled samples were subjected to NextSeq or MiSeq for paired-end reads of 75 cycles at Admera Health, LLC (New Jersey, USA) or Iowa State DNA Facility (Ames, IA). Samples were pooled and multiplexed to get an average of 100,000 reads per sample (Fig. <ref type="figure">S2D</ref>). To ensure coverage of each sample in a minimal number of NextSeq/MiSeq runs, we included two out of the three replicates performed for the pLibrary cleavage assays. 15% PhiX was spiked in to increase sequence diversity of the sample.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>HTS data analysis</head><p>Target sequences, read counts, and the number of mismatches per target sequence were extracted from the HTS data using custom bash scripts (see associated GitHub repository: <ref type="url">https://github.com/sashital-lab/Cas9_specificity</ref>). A simple workflow of the analysis is described in Supplementary Figure <ref type="figure">3</ref>, adapted from our previous study on Cas12a <ref type="bibr">(42)</ref>. The files containing the extracted target sequences and counts are available on Iowa State University Library's DataShare (see Availability for more information). After command-line processing, target sequence information was imported into Microsoft Excel or R for plotting and summarizing.</p><p>In each pool, the fraction of target sequences containing 'n' mismatches (MM) (Fn-MM) was calculated as follows.</p><p>$F_{n\text{-}MM} = \frac{\text{counts of target sequences with } n \text{ MM}}{\text{total counts of all target sequences in the pool}}$ [Eq. 8]</p><p>Fn-MM was normalized to the fraction (FR) of DNA present in the supercoiled or nicked fraction at a given time point 't' to generate an estimated abundance (EA) of a given set of sequences at a given timepoint. FR was calculated for each time point using equations 3 through 6 as described above.</p></div>
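<div xmlns="http://www.tei-c.org/ns/1.0"><p>The mismatch counting and the per-pool fraction Fn-MM described above can be illustrated with a short Python sketch. The study itself used custom bash scripts; the code below is a separate, minimal illustration with hypothetical function names, and it assumes that mismatches are counted position by position against a perfect target of the same length.</p>
<p><![CDATA[
```python
from collections import Counter


def count_mismatches(sequence, target):
    """Number of positions at which an extracted sequence differs from the
    perfect target (Hamming distance); both must have the same length."""
    if len(sequence) != len(target):
        raise ValueError("sequence and target must have the same length")
    return sum(1 for a, b in zip(sequence, target) if a != b)


def fraction_by_mismatch(read_counts, target):
    """Map n to F_n-MM: the fraction of all reads in one pool whose target
    sequence carries exactly n mismatches.

    `read_counts` maps each extracted target sequence to its read count.
    """
    per_n = Counter()
    for sequence, count in read_counts.items():
        per_n[count_mismatches(sequence, target)] += count
    total = sum(per_n.values())
    return {n: c / total for n, c in sorted(per_n.items())}
```
]]></p>
<p>The resulting dictionary can then be multiplied by the fraction of DNA in the corresponding gel band to obtain the estimated abundance for each mismatch class.</p></div>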
<div xmlns="http://www.tei-c.org/ns/1.0"><head>$EA_{n\text{-}MM} = (F_{n\text{-}MM} \text{ of } S \text{ at } t) \times (FR \text{ at } t)$ [Eq. 9]</head><p>These values were plotted against number of mismatches (n) to generate mismatch distribution curves.</p><p>The relative abundance (RA; enrichment and/or depletion) of a sequence containing 'n' mismatches at each time point 't' was calculated relative to the negative control (i.e. pLibrary run on a gel after preparation as represented in Fig. <ref type="figure">S1B</ref>).</p><p>$RA_{S} = \frac{EA_{n\text{-}MM} \text{ at } t}{EA_{n\text{-}MM} \text{ in the negative control}}$ [Eq. 10]</p><p>$\text{Log-fold change in abundance} = \log_{2}(RA_{S}) \text{ at } t$ [Eq. 11]</p><p>The RA for the perfect target sequence (0 MM), RA0MM, was calculated using equation 10, where n = 0 at the different time points. The RA for target sequences with 1 to 5 MM at each time point 't', RA1-5MM-t, was calculated by summing EA for 1 to 5 MM, EA1-5MM, at each time point, and normalizing to the sum of EA of 1 to 5 MM in the negative control (i.e. pLibrary run on a gel after preparation as represented in Fig. <ref type="figure">S1B</ref>) as shown below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>$RA_{1\text{-}5MM\text{-}t} = \frac{\sum_{n=1}^{5} EA_{n\text{-}MM} \text{ at } t}{\sum_{n=1}^{5} EA_{n\text{-}MM} \text{ in the negative control}}$ [Eq. 12]</head><p>The relative cleaved fraction of counts for on-target and off-targets (RAcleaved FR) was determined by subtracting RA0MM and RA1-5MM values, respectively, from 1 at each time point 't', as shown below and plotted against time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>$RA_{cleaved\,FR\text{-}on} = 1 - RA_{0MM} \text{ at } t$; $RA_{cleaved\,FR\text{-}off} = 1 - RA_{1\text{-}5MM} \text{ at } t$</head><p>The specificity score (SS) for Cas9 cleavage was calculated by dividing the on-target by off-target RAcleaved FR at each time point 't'.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>$SS = \frac{RA_{cleaved\,FR\text{-}on}}{RA_{cleaved\,FR\text{-}off}} \text{ at } t$</head><p>The specificity scores of SaCas9 and HF Cas9 variants were normalized to WT SpCas9 to determine relative specificity at each time point.</p><p>For the heatmaps, the estimated abundance (EA) of sequences containing a particular nucleotide (N = A, G, C, T) at a particular position (P = 1 to 20) for target sequences containing 'n' mismatches at each time point 't' was calculated as above. Relative abundance (RA) was calculated by normalizing EA against the pool of DNA in the original library to eliminate variability in aberrant nicking that may have occurred for individual pLibraries in the negative control.</p><p>For the supercoiled pool, we calculated the maximum change in relative abundance (RA) over time as max &#916;RAS-NP for each sequence containing a particular nucleotide (N = A, G, C, T) at a particular position (P = 1 to 20) for target sequences containing 'n' mismatches over all time points 't' (0 min, 1 min, 5 min, 30 min, 60 min and 180 min). Max &#916;RAS-NP is indicated as max &#916; abundance in the figures for simplicity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>$\text{Max } \Delta RA_{S\text{-}NP}$ [Eq. 17]</head><p>For the nicked pool, we calculated the average change in relative abundance (RA) over time as &#916;RAS-NP for each sequence containing a particular nucleotide (N = A, G, C, T) at a particular position (P = 1 to 20) for target sequences containing 'n' mismatches over time points 't' after Cas9 cleavage (1 min, 5 min, 30 min, 60 min and 180 min). &#916;RAS-NP is indicated as &#916; abundance in the figures for simplicity.</p><p>For the supercoiled pool, we defined the extent of cleavage of a target sequence as abundancemin by determining the minimum value of RAS-NP across all time points for that target sequence. For the nicked pool, we defined the extent of nicking of a target sequence as abundancemax by determining the maximum value of RAS-NP across all time points for that target sequence.</p><p>Abundancemin and abundancemax were normalized to the highest value across both pLibraries, all Cas9s and all mismatch numbers (1 to 5 MM), which allows comparison between Cas9s and mismatches. Using custom scripts in R, max &#916; abundance and abundancemin were used to plot the bubble heatmaps for the supercoiled pool, and &#916; abundance and abundancemax were used for the nicked pool. Max &#916; abundance and &#916; abundance defined the gradient colour, and abundancemin and abundancemax defined the bubble size.</p><p>For the analysis of target sequences with two mismatches, the sequences with 2 mismatches were extracted. The distance between the two mismatches and the total counts for sequences separated by that distance were determined. The counts were normalized to the number of possible ways the two mismatches can occur <ref type="bibr">(42)</ref>, and the max &#916; abundance, &#916; abundance, abundancemin and abundancemax were calculated similarly to equations 16, 17 and 18 and plotted versus distance between mismatches.</p></div>
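<div xmlns="http://www.tei-c.org/ns/1.0"><p>The abundance normalizations and the specificity score defined in the HTS data analysis can be summarized in a short Python sketch. It follows the verbal definitions given above (estimated abundance, relative abundance against the negative control, log-fold change, and the on-target to off-target ratio of relative cleaved fractions). The function names and the dictionary-based input layout are hypothetical and are not taken from the scripts used in this study.</p>
<p><![CDATA[
```python
import math


def estimated_abundance(f_n_mm, fr_pool):
    """EA: fraction of reads with n mismatches multiplied by the fraction
    of DNA present in that pool (supercoiled or nicked) at time t."""
    return f_n_mm * fr_pool


def relative_abundance(ea_t, ea_control):
    """RA: EA at time t divided by EA in the negative control."""
    return ea_t / ea_control


def log2_fold_change(ra):
    """Log-fold change in abundance relative to the negative control."""
    return math.log2(ra)


def specificity_score(ea_t, ea_control):
    """SS = (1 - RA_0MM) / (1 - RA_1-5MM) for one time point.

    `ea_t` and `ea_control` map the mismatch number n (0 to 5) to EA in the
    supercoiled pool. The score is undefined if no off-target was depleted
    (RA_1-5MM equal to 1), which this sketch does not handle.
    """
    ra_on = relative_abundance(ea_t[0], ea_control[0])
    off_t = sum(ea_t[n] for n in range(1, 6))
    off_control = sum(ea_control[n] for n in range(1, 6))
    ra_off = relative_abundance(off_t, off_control)
    return (1 - ra_on) / (1 - ra_off)
```
]]></p>
<p>A score above 1 indicates that the perfect target was depleted from the supercoiled pool to a greater extent than the mismatched targets at that time point.</p></div>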
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RESULTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Cleavage activity of Cas9 against target library</head><p>We sought to compare the cleavage activity and specificity of different Cas9 variants in a systematic manner. We performed a previously established in vitro plasmid library (pLibrary) cleavage assay with five Cas9 variants <ref type="bibr">(42)</ref>, WT SpCas9, WT SaCas9 and three high-fidelity variants of SpCas9 -SpCas9 HF1, HypaCas9, and Alt-R &#174; S.p. HiFi Cas9 (HiFi Cas9) (Fig. <ref type="figure">1A,</ref><ref type="figure">S1A</ref>) <ref type="bibr">(25,</ref><ref type="bibr">27,</ref><ref type="bibr">31)</ref>. The three high-fidelity variants of SpCas9 will be collectively referred to as HF Cas9 hereafter. For each Cas9 variant, we used two different crRNA sequences with partner tracrRNA and generated corresponding negatively supercoiled (nSC) plasmids containing the perfect target (pTarget) or target library (pLibrary) (see methods section -Plasmid and nucleic acid preparation) (Fig. <ref type="figure">S1B</ref>). The pLibraries contained a distribution of target sequences with between zero and ten mismatches to the crRNA guide sequence, with a maximum representation of target sequences with two to four mismatches in the libraries (Fig. <ref type="figure">S1C</ref>). The two crRNA and library sequences were designed based on protospacer 4 sequence from Streptococcus pyogenes CRISPR locus (55% G/C) and EMX1 gene target sequence (80% G/C), referred to as pLibrary PS4 and pLibrary EMX1 respectively. We employed the native dual crRNA and tracrRNA system for our assay to avoid any differences that may stem from single guide RNA design optimization <ref type="bibr">(49,</ref><ref type="bibr">50)</ref>. We used the differential migration of the nicked (n) and linear (li) cleavage products of negatively supercoiled (nSC) dsDNA plasmid on an agarose gel to analyze Cas9 cleavage activity (51) (Fig. 
<ref type="figure">S2A</ref>).</p><p>Linear products represent fully cleaved DNA, in which both strands were cleaved by Cas9. The accumulation of linear DNA over time was used to determine rates of cleavage for pTarget. Cleavage rates of pTarget varied significantly depending on both target sequence and Cas9 variant (Fig. <ref type="figure">1B,</ref><ref type="figure">C,</ref><ref type="figure">S2A,</ref><ref type="figure">B</ref>). SpCas9 cleaved pTarget PS4 ~3.6-fold faster than pTarget EMX1. A similar trend was observed for SaCas9, although this ortholog cleaved both pTargets ~7-fold slower than SpCas9 (Fig. <ref type="figure">1B,</ref><ref type="figure">C,</ref><ref type="figure">S2A,</ref><ref type="figure">B</ref>). Among the HF Cas9s, HiFi Cas9 had cleavage rates that were comparable to WT SpCas9. In contrast, SpCas9 HF1 and HypaCas9 cleaved pTarget PS4 ~36- and ~12-fold slower than SpCas9, respectively (Fig. <ref type="figure">1B</ref>, C, S2B), similar to previously reported cleavage defects for these two HF Cas9 variants <ref type="bibr">(52)</ref>. However, cleavage rates for SpCas9 HF1 and HypaCas9 were comparable to SpCas9 for pTarget EMX1 (Fig. <ref type="figure">1C</ref>, S2A, B), indicating that cleavage defects for HF Cas9 variants may vary based on target sequence.</p><p>For pLibrary cleavage assays, we observed a substantial amount of nicked product, resulting from incomplete cleavage of the target. We therefore determined the apparent rate of overall cleavage (nicked and linearized product) (Fig. <ref type="figure">1C</ref>, S2A, B; see methods section: In vitro cleavage assay and analysis). As expected, rates of pLibrary cleavage were substantially slower than for pTargets, due to the presence of mismatches in the target sequence (Fig. <ref type="figure">1B,</ref><ref type="figure">C,</ref><ref type="figure">S2A,</ref><ref type="figure">B</ref>). 
SpCas9 rapidly cleaved more than 50% of both negatively supercoiled pLibraries, with the vast majority of product DNA becoming linearized (Fig. <ref type="figure">1B,</ref><ref type="figure">C</ref>). In contrast, for SaCas9 and HF Cas9 variants, we observed greater accumulation of nicked plasmid, especially for pLibrary PS4 (Fig. <ref type="figure">1B,</ref><ref type="figure">C</ref>). On average, all other Cas9 variants accumulated significantly more nicked product for pLibrary PS4 than SpCas9 (Fig. <ref type="figure">1D</ref>). For pLibrary EMX1, SaCas9 and SpCas9 HF1 had significantly more accumulation of nicked product than SpCas9 (Fig. <ref type="figure">1D</ref>).</p><p>We also checked whether cleavage occurred outside of the target region during pLibrary cleavage by testing the cleavage activity of Cas9 against the empty plasmid backbone with and without the different crRNAs (Fig. <ref type="figure">S2C</ref>). The empty plasmid was minimally cleaved by Cas9-tracrRNA:crRNA, except in the case of SpCas9-EMX1 crRNA, where a substantial amount of nicked product was observed at the three-hour time point. However, we did not observe similar amounts of nicking of the pLibrary EMX1 by Cas9 (Fig. <ref type="figure">S2C</ref>), and further analysis indicated that pLibrary nicking is target-sequence dependent (see below).</p><p>To determine which sequences were cleaved by Cas9 variants, we extracted the plasmid DNA from the supercoiled and nicked pools and performed barcoded PCR amplification and multiplexed high-throughput sequencing (HTS) to get sufficient coverage of reads for each sample (Fig. <ref type="figure">1A</ref>, S2D; see methods section: Library preparation for HTS). Although we were unable to sequence the linearized pool using PCR amplicon sequencing, for our analysis, we assumed that target sequences absent from both the supercoiled and nicked pools were linearized. 
We determined the fraction of counts for the target sequences in the HTS data and normalized this fraction with the fraction of DNA present in the pool at a given time point (Fig. <ref type="figure">1B,</ref><ref type="figure">C</ref>) to represent an estimated abundance of given target sequences within the pool (see methods section -HTS analysis). Here, target sequences cleaved by Cas9 were depleted from the supercoiled pool while those nicked by Cas9 were enriched in the nicked pool.</p><p>We initially evaluated the cumulative effects of mismatches on the cleavage activity of each Cas9 variant by plotting the log-fold change in targets containing different numbers of mismatches with the crRNA guide sequence over time (Fig. <ref type="figure">2</ref>). We also plotted target abundance as mismatch distribution curves (Fig. <ref type="figure">S4</ref>). Together, the heatmaps and mismatch distribution curves enable overall comparison of cleavage for target sequences containing varying numbers of mismatches with the crRNA across Cas9 variants and across time points for each Cas9 variant (Fig. <ref type="figure">2,</ref><ref type="figure">S4</ref>). As expected, the perfect target (zero mismatch) was rapidly depleted from the supercoiled pool of the pLibrary (Fig. <ref type="figure">2A,</ref><ref type="figure">B,</ref><ref type="figure">S4A,</ref><ref type="figure">B</ref>). SpCas9 partially cleaved sequences with up to four mismatches in the first time point tested for both pLibraries, as observed in previous in vitro and in vivo studies on SpCas9 cleavage specificity <ref type="bibr">(17,</ref><ref type="bibr">18)</ref> (Fig. <ref type="figure">S4A,</ref><ref type="figure">B</ref>). This observation indicates that our in vitro pLibrary cleavage assay reproduced a similar specificity profile for SpCas9 as previous studies and can further be used to benchmark against SaCas9 and HF Cas9 variants. 
Like SpCas9, SaCas9 and HF Cas9 variants cleaved sequences containing up to four mismatches in both pLibraries, although the rate and extent of depletion of these sequences varied (Fig. <ref type="figure">2A</ref>, B and S4A, B). In general, variations in rates of depletion of mismatched sequences correlated with reduced rates of cleavage of the perfect target (Fig. <ref type="figure">2A,</ref><ref type="figure">B</ref>), with SpCas9 HF1 and HypaCas9 showing slowest depletion of all targets in pLibrary PS4 and SaCas9 showing slowest depletion of all targets in pLibrary EMX1.</p><p>We also observed substantial accumulation of nicked target sequences with two to five mismatches, especially for SaCas9 and HF Cas9 variants cleaving pLibrary PS4 (Fig. <ref type="figure">2C,</ref><ref type="figure">D,</ref><ref type="figure">S4C,</ref><ref type="figure">D</ref>).</p><p>These results suggest that SaCas9 and HF Cas9 variants are slower to fully cleave targets containing several mismatches than WT SpCas9, resulting in formation of nicks. Notably, some target sequences with one and two mismatches were initially nicked by SaCas9 or HF Cas9 variants, but subsequently depleted from the nicked pool due to completion of DSB formation (Fig. <ref type="figure">2C,</ref><ref type="figure">D,</ref><ref type="figure">S4C,</ref><ref type="figure">D</ref>). In addition, we observed differential amounts of accumulation of nicked DNA for targets containing three to five mismatches between the two pLibraries (Fig. <ref type="figure">2C,</ref><ref type="figure">D</ref>). Overall, these data suggest that Cas9 variant and crRNA sequence can affect the rate of second-strand cleavage at mismatched targets.</p></div>
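<div xmlns="http://www.tei-c.org/ns/1.0"><p>The abundance normalization described above can be illustrated with a minimal sketch. It assumes that normalizing means multiplying each target's fraction of HTS reads by the fraction of plasmid DNA present in that pool at the same time point; the exact procedure is the one given in the methods (HTS analysis). The function name, target labels and numbers are hypothetical and are not data from this study.</p>

```python
def estimated_abundance(read_counts, pool_fraction):
    """Scale each target's read fraction by the share of DNA in the pool.

    Assumption (not taken from the methods): abundance is computed as
    (reads for target / total reads in pool) * pool_fraction.
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("pool contains no reads")
    return {
        target: (count / total_reads) * pool_fraction
        for target, count in read_counts.items()
    }


# Hypothetical supercoiled pool that holds 40% of the plasmid DNA
# at one time point; the read counts are invented for illustration.
counts = {"perfect": 50, "one_mismatch": 30, "two_mismatches": 20}
abundance = estimated_abundance(counts, pool_fraction=0.4)
print(abundance)
```

<p>Under this assumption, a target that keeps the same share of reads still drops in estimated abundance when the supercoiled pool as a whole shrinks, which is what allows depletion to be compared across time points.</p></div>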
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Prolonged exposure reduces specificity of high-fidelity Cas9 variants</head><p>Our HTS data allow us to compare the overall cleavage efficiency and specificity of the Cas9 variants. We first determined the efficiency of cleavage of the perfect target and targets with multiple mismatches (one to five MM) in the pLibrary (Fig. <ref type="figure">3A-D</ref>) (see methods -HTS analysis). Cleavage efficiencies of the perfect target within pLibrary were similar to those observed for pTarget (Fig. <ref type="figure">1B,</ref><ref type="figure">C,</ref><ref type="figure">S2B</ref>). Analysis of mismatched targets indicated differences in cleavage efficiencies in comparison to the perfect target (Fig. <ref type="figure">3C,</ref><ref type="figure">D</ref>). For example, while HiFi Cas9 cleaved the PS4 perfect target with similar efficiency to SpCas9, we observed a marked reduction in cleavage of PS4 mismatched targets for HiFi Cas9 (Fig. <ref type="figure">3C</ref>).</p><p>To analyze these differences in cleavage efficiencies, we generated a specificity score that reports the relative efficiency of cleavage of on- and off-target sequences over time for the Cas9 variants relative to SpCas9 (Fig. <ref type="figure">3E,</ref><ref type="figure">F</ref>) (see methods -HTS analysis). For the two WT Cas9 orthologs, we did not observe significant differences in specificity scores, suggesting that SpCas9 and SaCas9 have similar specificities for the two target sequences. All three HF Cas9 variants had some significant differences in specificity scores relative to WT SpCas9 at early time points (Fig. <ref type="figure">3E</ref>). The relative specificity scores for HF Cas9 variants were substantially larger for pLibrary PS4 than for pLibrary EMX1. 
Notably, we did not observe significant differences in specificity scores for any Cas9 variants at later time points (&#8805; 30 min).</p><p>The lack of specificity differences between WT and HF Cas9 at longer time points indicates that prolonged exposure of HF Cas9 variants can eventually lead to off-target cleavage activity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sequence determinants of Cas9 cleavage activity and nicking defects</head><p>We next wanted to characterize the effects of mismatch position and type on Cas9 cleavage (Fig. <ref type="figure">3E,</ref><ref type="figure">F</ref>). We analyzed the sequences present in both the supercoiled and nicked pools and calculated the relative abundance of target sequences containing one to five mismatches over time (see methods section -HTS analysis). To visualize the effects of mismatches, we used bubble heatmaps that reveal the maximal extent of cleavage (defining bubble size) and the rate of cleavage (defining gradient color) for targets containing a given mismatch type at a given position of the target.</p><p>For the supercoiled pool, target sequences that were depleted over time represent sequences that can be cleaved by Cas9. Therefore, the minimum relative abundance value in the time course (abundance<hi rend="subscript">min</hi>) represents the extent of target sequence cleavage by Cas9 (Fig. <ref type="figure">4,</ref><ref type="figure">S5</ref>). To estimate the rate of depletion of sequences from the supercoiled pool, we calculated the maximal change in relative abundance between time points (max &#8710; abundance), colored as depleted (red) or unchanged (white). For SpCas9, the heatmaps reveal cleavage defects in the PAM-proximal "seed" region for target sequences with two to four mismatches, similar to previously reported seed regions comprising eight to ten PAM-proximal nucleotides <ref type="bibr">(6,</ref><ref type="bibr">17,</ref><ref type="bibr">18,</ref><ref type="bibr">53,</ref><ref type="bibr">54)</ref>. Seed defects for target sequences with one mismatch were less pronounced. The higher tolerance for a single mismatch in the seed sequence is likely due to the relatively high concentration of Cas9 used for pLibrary cleavage <ref type="bibr">(18)</ref>. Notably, while seed-dependent defects were evident for other Cas9 variants (Fig. 
<ref type="figure">4</ref>, S5), SpCas9 HF1 and HypaCas9 also had substantial cleavage defects for targets containing mismatches located outside of the seed for pLibrary PS4 (Fig. <ref type="figure">4</ref>). Mismatches located toward the middle of PS4 (positions 11 to 13) were particularly deleterious for HypaCas9 cleaving one to three mismatch targets, while PAM-distal mismatches as far as the second to last position (position 19) from the PAM were highly deleterious for SpCas9 HF1. These results suggest that mismatches are more uniformly deleterious throughout the target for some HF Cas9 variants, although this observation was dependent on target sequence. Despite differences in the rate and extent of cleavage, mismatch-specific effects were generally very similar among all Cas9 variants. These effects were more pronounced in the seed, where C-C or U-C mismatches were generally strongly deleterious (Fig. <ref type="figure">4,</ref><ref type="figure">S5</ref>). In contrast, G-T mismatches were tolerated well within the seed for all Cas9 variants. These mismatch identity observations for Cas9 are consistent with previous in vitro library studies <ref type="bibr">(6,</ref><ref type="bibr">40)</ref>.</p><p>As noted above, we observed significant accumulation of nicked plasmid for all Cas9 variants in comparison to SpCas9 for pLibrary PS4, and for SaCas9 and SpCas9 HF1 for pLibrary EMX1 (Fig. <ref type="figure">1D</ref>). This accumulation was likely due to a defect in cleavage of the second strand following an initial nicking event.</p><p>Our HTS data revealed that in the nicked pool, some target sequences initially had a high relative abundance that decreased over time, indicating the eventual formation of a DSB. In contrast, some target sequences were initially uncleaved but accumulated within the nicked pool over time. 
To visualize these effects, we plotted the maximum abundance (abundance<hi rend="subscript">max</hi>) to define the maximal extent of nicking and colored by the average change in abundance over time (&#8710; abundance) to define nicked targets that were depleted (red), accumulated (blue), or unchanged (white) following the first time point (Fig. <ref type="figure">5,</ref><ref type="figure">S6</ref>). These heatmaps reveal that second-strand cleavage defects are highly dependent on mismatch position, and in some cases on mismatch type. While some nicking defects for SaCas9 were caused by seed mismatches, the most notable nicking defects occurred for targets containing mismatches toward the middle of the target sequence (positions 9 to 12). For pLibrary PS4 targets, G-T or U-G mismatches within this region caused a nicking defect that was severely compounded upon addition of further mismatches. For sequences with one or two mismatches, targets containing these mismatches were initially nicked, but rapidly linearized, as visualized by large red circles (Fig. <ref type="figure">5</ref>). However, when present within three or four mismatch-containing targets, these mismatches caused the target to remain nicked for prolonged periods, as visualized by large blue circles. Similar positional defects in second-strand cleavage were observed for SaCas9 cleaving pLibrary EMX1, although the effects were less dependent on mismatch type and less substantial than for PS4 (Fig. <ref type="figure">5,</ref><ref type="figure">S6</ref>). Overall, these results suggest that mismatches toward the middle of the target can reduce second-strand cleavage by SaCas9.</p><p>For HF Cas9 variants, we observed a similar position-specific defect in second-strand cleavage for pLibrary PS4 (Fig. <ref type="figure">5,</ref><ref type="figure">S6</ref>). These defects were not correlated with any particular mismatch type and appeared to be dependent mainly on mismatch location. 
Mismatches located in the PAM-distal region, particularly positions 11 to 16, caused strong nicking defects for all three HF Cas9 variants for pLibrary PS4. A similar position dependence was observed for EMX1 for HF Cas9 variants, although less nicking was observed overall for this target (Fig. <ref type="figure">1D,</ref><ref type="figure">S4D,</ref><ref type="figure">S5</ref>). Notably, although we observed similar patterns of depletion of supercoiled DNA between SpCas9 and HiFi Cas9 (Fig. <ref type="figure">4</ref>), the mismatch position-dependent nicking defect was substantially greater for HiFi Cas9 than for SpCas9, especially for PS4 targets containing three or four mismatches (Fig. <ref type="figure">5</ref>). This suggests that while HiFi Cas9 can cleave target sequences with similar numbers and types of mismatches as the wild-type protein, accumulation of mismatches in the PAM-distal region results in a defect in cleavage of the second strand for HiFi Cas9 that is not observed for the wild-type protein.</p></div>
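<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two pairs of summary values behind the bubble heatmaps in this section can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the change between time points is taken between consecutive time points, the maximal change is the step with the largest magnitude, and the average change is the mean of those steps. The function names and the abundance values are hypothetical.</p>

```python
def _steps(abundances):
    """Differences in relative abundance between consecutive time points."""
    if not len(abundances) >= 2:
        raise ValueError("need at least two time points")
    return [later - earlier for earlier, later in zip(abundances, abundances[1:])]


def supercoiled_summary(abundances):
    """Return (abundance_min, max_delta) for one target in the supercoiled pool.

    abundance_min sets the bubble size (extent of cleavage); max_delta is the
    largest-magnitude step and sets the colour (negative means depleted).
    """
    return min(abundances), max(_steps(abundances), key=abs)


def nicked_summary(abundances):
    """Return (abundance_max, mean_delta) for one target in the nicked pool.

    abundance_max sets the bubble size (extent of nicking); mean_delta is
    negative when the nicked target is later linearized and positive when
    it accumulates in the nicked pool.
    """
    steps = _steps(abundances)
    return max(abundances), sum(steps) / len(steps)


# Invented time courses of relative abundance, for illustration only.
print(supercoiled_summary([1.0, 0.4, 0.1, 0.1]))  # target depleted early
print(nicked_summary([0.1, 0.3, 0.5]))            # target accumulating as nicked DNA
```

<p>In these terms, a large red bubble in the nicked-pool heatmaps corresponds to a high abundance<hi rend="subscript">max</hi> with a negative average change, and a large blue bubble to a high abundance<hi rend="subscript">max</hi> with a positive average change.</p></div>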
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Closely spaced mismatches compound overall and second-strand cleavage defects</head><p>For sequences containing multiple mismatches, it has previously been observed that the distance between mismatches can affect the level of cleavage defect by SpCas9 <ref type="bibr">(6,</ref><ref type="bibr">14,</ref><ref type="bibr">17,</ref><ref type="bibr">18,</ref><ref type="bibr">38,</ref><ref type="bibr">39,</ref><ref type="bibr">55)</ref>. We wished to determine the extent to which mismatch separation affects cleavage by all five Cas9 variants, as well as whether distance between mismatches influenced second-strand cleavage defects. We analyzed sequences containing two mismatches, which were highly represented in our target libraries (Fig. <ref type="figure">S1C</ref>). In a 20-nucleotide sequence, two mismatches can be separated by between 0 (i.e. mismatches located at adjacent positions) and 18 nucleotides (i.e. mismatches located at the beginning and end of the sequence). To determine how this distance affects the rate of cleavage for the Cas9 variants, we analyzed the supercoiled and nicked pools using bubble heatmaps as described above, but now based on the distance between the two mismatches and the location of the two mismatches. Double mismatches spaced close together (zero to four nucleotides separation) caused a substantial decrease in depletion from the supercoiled pool, consistent with previous reports that closely spaced mismatches are deleterious for Cas9-dependent cleavage <ref type="bibr">(17,</ref><ref type="bibr">18,</ref><ref type="bibr">38)</ref> (Fig. <ref type="figure">6A,</ref><ref type="figure">B</ref>). One exception was SaCas9, which did not display a defect for closely spaced double mismatches for pLibrary PS4 (Fig. <ref type="figure">6A</ref>), although this defect was apparent for pLibrary EMX1 (Fig. <ref type="figure">6B</ref>). 
Conversely, SpCas9 HF1 displayed substantial cleavage defects for double mismatches spaced further apart (14 to 17 nucleotides separation) for PS4 (Fig. <ref type="figure">6A</ref>), a defect that was not observed for EMX1 (Fig. <ref type="figure">6B</ref>). These results underscore the variability in mismatch effects based on target sequence. To determine whether the effect of mismatch spacing is also influenced by the position within the target, we analyzed the effects of two mismatches separated by between zero and eight nucleotides within the seed or PAM-distal region (Fig. <ref type="figure">6C,</ref><ref type="figure">D</ref>). Closely spaced mismatches (five or fewer nucleotides separation) were highly deleterious in the seed. In contrast, mismatch spacing had little impact in the PAM-distal region, where mismatches separated by any distance were similarly tolerated.</p><p>For the nicked pool, double mismatches caused similar amounts of nicking defects regardless of spacing across the whole target, as visualized by bubbles of similar sizes (Fig. <ref type="figure">6E,</ref><ref type="figure">F</ref>). However, the rate of nicking or linearization of targets was impacted to some degree by mismatch distance. This is especially apparent for SaCas9 and HiFi Cas9 cleaving pLibrary PS4 (Fig. <ref type="figure">6E</ref>). While most double mismatches led to eventual linearization by SaCas9 and HiFi Cas9, as visualized by bubbles with shades of red or white, bubbles for mismatches spaced 13 to 16 nucleotides apart were shades of blue, indicating accumulation of these targets in the nicked fraction due to a stronger nicking defect. This spacing necessarily places one mismatch within the PAM-distal region, consistent with the position-dependent nicking defect described above (Fig. <ref type="figure">5,</ref><ref type="figure">S6</ref>). Further analysis of mismatch distance in the PAM-distal region revealed marked distance-dependent effects (Fig. 
<ref type="figure">6G,</ref><ref type="figure">H</ref>). In general, for SaCas9 and HF Cas9 variants, mismatches spaced closer together in the PAM-distal region caused second-strand cleavage defects and accumulation in the nicked pool for pLibrary PS4 (Fig. <ref type="figure">6G</ref>). Double mismatches separated by four or fewer nucleotides in the PAM-distal region were especially deleterious for SpCas9 HF1 and HypaCas9 (Fig. <ref type="figure">6G</ref>). A similar defect for closely spaced double mismatches in the PAM-distal region was observed for SpCas9 HF1 for pLibrary EMX1, although the extent of the defect was less substantial (Fig. <ref type="figure">6H</ref>). For SaCas9 cleaving pLibrary PS4, mismatches spaced further apart (six to eight nucleotides) in either region caused a partial defect in second-strand cleavage resulting in delayed linearization, as visualized by large white or red bubbles (Fig. <ref type="figure">6G</ref>). Overall, these results indicate that multiple closely spaced mismatches within the PAM-distal region can cause reduced rates of second-strand cleavage for SaCas9 and HF Cas9 variants, albeit in a target-dependent manner.</p></div>
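<div xmlns="http://www.tei-c.org/ns/1.0"><p>The separation measure used in this section, in which adjacent mismatches are separated by 0 nucleotides and mismatches at the two ends of a 20-nucleotide target by 18, can be written as a short sketch. The function names and the 20-nucleotide sequence below are hypothetical placeholders rather than the PS4 or EMX1 targets, and positions are simply counted from 1 along the sequence as written, which is an assumption about orientation that does not affect the separation.</p>

```python
def mismatch_positions(perfect_target, library_target):
    """Return 1-based positions where a library target differs from the perfect target."""
    if len(perfect_target) != len(library_target):
        raise ValueError("sequences must have the same length")
    pairs = zip(perfect_target, library_target)
    return [pos for pos, (ref, alt) in enumerate(pairs, start=1) if ref != alt]


def double_mismatch_separation(perfect_target, library_target):
    """Number of matched nucleotides lying between exactly two mismatches."""
    positions = mismatch_positions(perfect_target, library_target)
    if len(positions) != 2:
        raise ValueError("expected exactly two mismatches")
    first, second = positions
    return second - first - 1


# Placeholder 20-nt sequence (not a target from this study).
perfect = "ACGTACGTACGTACGTACGT"
adjacent = "ACGTACGTAGCTACGTACGT"  # mismatches at positions 10 and 11
ends = "TCGTACGTACGTACGTACGA"      # mismatches at positions 1 and 20

print(double_mismatch_separation(perfect, adjacent))
print(double_mismatch_separation(perfect, ends))
```

<p>Grouping two-mismatch targets by this value, and separately by whether both positions fall in the seed or in the PAM-distal region, gives the bins compared in Fig. 6.</p></div>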
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Validating nicking defects against mismatched targets</head><p>Finally, to validate the nicking defect observed in pLibrary cleavage, we verified cleavage of individual target sequences containing two to five mismatches that were present in the nicked pool of pLibrary PS4 at the longest time point (three hours). Targets were subjected to cleavage by each Cas9 variant (Fig. <ref type="figure">7A</ref>) and the extent of nicking and linearization was quantified at 10 min and 3 h (Fig. <ref type="figure">7B</ref>). All Cas9 variants linearized targets with two or three mismatches after three hours of incubation. We observed small but significant differences in nicking and linearization between SpCas9 and other Cas9 variants for targets with two or three mismatches. The nicking defect was most notable for SaCas9 cleaving a target containing three mismatches in comparison to SpCas9, which mostly linearized this target. For targets with more than three mismatches, we observed substantially less linearization for all Cas9 variants. A target containing four mismatches within the seed (pTarget 4.1 MM) caused the strongest defect in any type of cleavage, although both SpCas9 and SaCas9 nicked 30 to 40% of the target by 3 h. Cleavage was significantly lower for all three HF Cas9 variants for this target. In contrast, all Cas9 variants cleaved target sequences with four or five mismatches in the PAM-distal region (pTarget 4.2 MM and 5 MM). As expected, based on our HTS analysis, these targets were nicked substantially but not linearized, indicating a second-strand cleavage defect. For pTarget 4.2 MM, SaCas9, HypaCas9 and HiFi Cas9 had significantly more nicked product and significantly less linearized product than SpCas9, indicating a stronger nicking defect for these variants. 
In contrast, SpCas9 accumulated significantly more nicked product than SaCas9, SpCas9 HF1 and HiFi Cas9 by 3 h for pTarget 5 MM, consistent with the overall lower specificity of SpCas9. Overall, these results validate stronger nicking and overall cleavage defects of SaCas9 and HF Cas9 variants in comparison to SpCas9 for the PS4 target.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>DISCUSSION</head><p>Cas9 specificity has been the subject of substantial investigation and engineering efforts, due to its importance for genome editing technologies <ref type="bibr">(6, 16-18, 25, 27, 31, 35, 39-41)</ref>. However, many previous studies investigated individual Cas9 variants separately, focusing on target binding and/or DSB formation by Cas9. Our in vitro library cleavage assay has enabled a comparative study of the cleavage specificity of Cas9 variants, revealing cleavage defects that have previously remained undetected <ref type="bibr">(6,</ref><ref type="bibr">14,</ref><ref type="bibr">15,</ref><ref type="bibr">38,</ref><ref type="bibr">40,</ref><ref type="bibr">52)</ref>. We find that engineered SpCas9 variants display higher specificity than wild-type SpCas9 in a target-dependent manner, although prolonged exposure reduces this specificity. Over time, all Cas9 variants can cleave sequences with up to five mismatches. However, while SpCas9 linearizes most target sequences with multiple mismatches as previously observed, SaCas9 and HF Cas9 variants often only nick these sequences. It is well established that Cas9 binds to sequences with limited similarity to the crRNA, although it has generally been concluded that cleavage may not occur at these sites <ref type="bibr">(6,</ref><ref type="bibr">40,</ref><ref type="bibr">(56)</ref><ref type="bibr">(57)</ref><ref type="bibr">(58)</ref>. Our results now reveal that partial cleavage can occur at off-target sites, although second-strand cleavage defects prevent DSB formation. Most previous specificity studies tested for DSB and/or indel formation at target and off-target sites <ref type="bibr">(6,</ref><ref type="bibr">25,</ref><ref type="bibr">27,</ref><ref type="bibr">31,</ref><ref type="bibr">35,</ref><ref type="bibr">39,</ref><ref type="bibr">40)</ref>. 
Although nicked DNA may be subject to error-prone DNA repair or lead to collapse of replisomes and potential mutagenesis <ref type="bibr">(59)</ref><ref type="bibr">(60)</ref><ref type="bibr">(61)</ref><ref type="bibr">(62)</ref><ref type="bibr">(63)</ref>, nicks may also be repaired by error-free DNA repair pathways. Thus, nicking defects may obscure cleavage that does occur at off-target sites, resulting in higher genome editing specificity for SaCas9 and HF Cas9 variants.</p><p>Recent studies have compared the binding and cleavage specificities of SpCas9 and HF variants, including SpCas9 HF1 and HypaCas9 <ref type="bibr">(6,</ref><ref type="bibr">52,</ref><ref type="bibr">64,</ref><ref type="bibr">65)</ref>. Although target binding defects were not observed for these variants, PAM-distal mismatches decreased the rate of cleavage for both variants in comparison to SpCas9. Our results reveal that PAM-distal mismatches not only slow the rate of overall cleavage but can also slow the rate of DSB formation for SaCas9 and HF Cas9 variants, leading to nick formation. This second-strand cleavage defect may be due to R-loop collapse and premature target release following nicking of one of the strands, as has been proposed for the overall decreased kinetics of off-target cleavage by HF Cas9 variants <ref type="bibr">(6,</ref><ref type="bibr">52)</ref>. Additional defects may be caused by decreased movement of the HNH domain, which is required for cleavage activation of both the HNH and RuvC catalytic domains <ref type="bibr">(4,</ref><ref type="bibr">27,</ref><ref type="bibr">28,</ref><ref type="bibr">(66)</ref><ref type="bibr">(67)</ref><ref type="bibr">(68)</ref>. Single-molecule studies of SpCas9 HF1 and HypaCas9 revealed that HNH domain movements were diminished in comparison to wild-type SpCas9, especially in the presence of PAM-distal mismatches <ref type="bibr">(27)</ref>. 
Together with our observation of nicking defects caused by PAM-distal mismatches, this suggests that cleavage by the HNH domain is impaired upon binding to targets with PAM-distal mismatches due to loss of domain rearrangements necessary to position the HNH active site for cleavage. However, sufficient HNH domain movement may occur to trigger cleavage of the non-target strand by the RuvC domain, leading to nicking of the non-target strand.</p><p>The natural role of Cas effectors is to provide defense against invading genetic elements. Specificity of these effectors has likely been tuned through evolutionary pressures exerted by rapidly evolving phages and other mobile genetic elements. Thus, it is surprising that natural orthologs of Cas effectors, including SaCas9 and various Cas12a orthologs, have been shown to have higher intrinsic genome editing specificity than SpCas9 <ref type="bibr">(21,</ref><ref type="bibr">23,</ref><ref type="bibr">(69)</ref><ref type="bibr">(70)</ref><ref type="bibr">(71)</ref>. In vitro investigations have been vital for defining the native cleavage specificities of these nucleases to understand their natural role as immune effectors.</p><p>We and others have observed that Cas9 and Cas12a have similar PAM-distal mismatch tolerance, similar defects for C mismatches and similar tolerance of T mismatches <ref type="bibr">(6,</ref><ref type="bibr">42)</ref>. These findings are consistent with the observation that mismatch position impacts the ability of phages to escape immunity <ref type="bibr">(10,</ref><ref type="bibr">11,</ref><ref type="bibr">13)</ref>, and suggest that the types of mutations that arise may be similarly consequential. We also previously observed that Cas12a, like SaCas9 and HF Cas9 variants, can cleave sequences with several mismatches, but displays a second-strand cleavage defect in the presence of multiple PAM-distal mismatches <ref type="bibr">(42)</ref>. 
The ability to nick target sequences with multiple mismatches may allow broader immunity against phages, as nicking within mutated target regions may reduce the rate of phage replication and could still enable target degradation by host nucleases <ref type="bibr">(10,</ref><ref type="bibr">72)</ref>. Non-specific nicking activities have also been reported for several Cas effector proteins <ref type="bibr">(42,</ref><ref type="bibr">72,</ref><ref type="bibr">73)</ref>, suggesting that DNA nicking is part of the vast repertoire of nucleic acid cleavage activities employed by CRISPR-Cas systems to neutralize phage infection. Future studies may determine whether single-strand breaks in the invading phage genome are sufficient for CRISPR-mediated immunity.</p></div>
</body>
		</text>
</TEI>
