Oligo Mapping of mRNA Digests Using a Novel Informatics Workflow
Abstract
Recent advances in LC-MS and informatics technologies are supporting scientists in their efforts to design innovative strategies for analyzing mRNA critical quality attributes. This application note describes a workflow for mRNA sequence mapping, from sample preparation using RNase T2 enzymes, to data acquisition using UPLC™-QTof MS, to data analysis with novel informatics. The combined use of novel RNA digestion enzymes and automated software processing for oligo mapping of green fluorescent protein (GFP) mRNA digests is discussed in detail. Improved sequence coverage, based on the unique accurate masses of oligonucleotide digestion products, is illustrated. In addition to sequence confirmation, we also demonstrate that 5’ capping efficiency analysis can benefit from the novel RNase T2 digestion approach.
Benefits
- A new informatics workflow featuring the waters_connect™ MAP Sequence App streamlines oligonucleotide mapping of mRNA digest data acquired using UPLC-QTof MS
- RNase T2 enzymes (RapiZyme MC1 and RapiZyme Cusativin) offer unique cleavage specificity and the opportunity to generate overlapping digestion products, achieving higher sequence coverage than conventional RNase T1 digestion
Introduction
The recent development and approval of two COVID-19 mRNA vaccines has brought RNA-based therapeutics to the forefront of the biopharma industry.1–3 The resulting need for rapid product development has driven the development of analytical methods for the precise characterization of mRNA critical quality attributes (CQAs), including sequence and modification integrity. Waters™ previously developed two workflows for directly assessing mRNA CQAs using the waters_connect INTACT Mass App, namely 5’ capping efficiency measurement and poly(A) tail heterogeneity analysis, while a different workflow using the MAP Sequence App was recently demonstrated for oligo mapping of 100-mer single guide RNAs (sgRNAs).4,5,6
Conventional nucleic acid sequencing techniques, such as Sanger sequencing and next-generation sequencing (NGS), have been applied to mRNA sequence analysis because of their cost effectiveness and throughput, but they cannot assess the many modifications that can arise during production and degradation of mRNA molecules. LC-MS-based approaches, known for their exceptional specificity, sensitivity, and quantitative performance, are becoming more popular and readily address base and backbone modifications.7–10 Overall, orthogonal techniques are desirable for characterizing complex biologics, to obtain a holistic view of product quality.
LC-MS workflows for RNA digest oligo mapping have historically been laborious and time-consuming, involving significant manual data analysis and curation. In a recent application note, we described a UPLC-MS and informatics workflow for automatically mapping sgRNA sequences following individual digestion by an array of enzymes, including RNase T1, the RNase T2 enzymes RapiZyme MC1 and RapiZyme Cusativin, and hRNase4.6 This application note extends the workflow (highlighted in Figure 1) to larger mRNA molecules digested by multiple RNA digestion enzymes, including the two recently launched RNase T2 enzymes (RapiZyme MC1 and RapiZyme Cusativin), whose unique cleavage patterns promote higher sequence coverage.6,11 The rapid (<2 minutes) automated assignment of sgRNA digestion products and the streamlined interface for data processing and review provided a significant advantage for sgRNAs, and should provide even greater efficiencies for the more complex task of mRNA characterization. In addition, digestion by RapiZyme MC1 and RapiZyme Cusativin is shown to give superior results compared with the widely used RNase T1 digestion approach for co-measurement of mRNA capping efficiency.
Experimental
Reagents and Sample Preparation
Dipropylethylamine (DPA, 99% purity, catalogue number D214752–500ML) and 1,1,1,3,3,3-hexafluoro-2-propanol (HFIP, 99% purity, catalogue number 105228–100G) were purchased from Millipore Sigma (St Louis, MO). Methanol (LC-MS grade, catalogue number 34966–1L) was obtained from Honeywell (Charlotte, NC). HPLC grade Type I deionized (DI) water was purified using a Milli-Q system (Millipore, Bedford, MA). Mobile phases were prepared fresh daily. Ultrapure nuclease-free water (catalogue number J71786.AE) for mRNA digestions was purchased from Thermo Fisher Scientific (Waltham, MA).
An mRNA construct based on the jellyfish green fluorescent protein (GFP) sequence was custom-synthesized by in vitro transcription (IVT) by Biosynthesis (Lewisville, TX). The mRNA molecule was synthesized with a Cap1 structure (7MeGpppA(2’-OMe), elemental composition C32H42N15O26P5), followed by 1019 nucleotides and no poly(A) tail.
Chromatographically purified, animal-free ribonuclease T1 (catalogue no. LS01490, 500 kU), isolated from Aspergillus oryzae, was purchased from Worthington Biochemical Corporation (Lakewood, NJ). The lyophilized enzyme was dissolved in 5 mL of 100 mM ammonium bicarbonate (catalogue no. 5.33005–50G, Millipore Sigma) to prepare a solution containing 100 units/µL. For mRNA digestion with RNase T1, 5 µL of 5 µM GFP mRNA was mixed with 25 µL of nuclease-free water and 10 µL of RNase T1 enzyme (1000 units), and digestion was allowed to proceed at 37 °C for 15 minutes. The digestion mixture was prepared in a QuanRecovery™ MaxPeak™ 300 µL vial. The digest was analyzed immediately by LC-MS using 5 µL injections.
RapiZyme MC1 (p/n: 186011190, 10000 units/tube) and RapiZyme Cusativin (p/n: 186011192, 10000 units/tube) are two novel RNA digestion enzymes recently introduced by Waters Corporation.6,7 The GFP mRNA digestion protocols used for RapiZyme MC1 and RapiZyme Cusativin differ only in buffer pH. For RapiZyme MC1, the GFP mRNA (10 µL, 5 µM solution) was denatured at 90 °C for 2 minutes in a buffer containing 200 mM ammonium acetate, pH 8.0. For RapiZyme Cusativin, the mRNA (10 µL, 5 µM solution) was denatured at 90 °C for 2 minutes in a buffer containing 200 mM ammonium acetate, pH 9.0. Both samples were cooled on ice and microcentrifuged to collect the sample droplets. After adding 50 units of digestion enzyme (1 µL of either RapiZyme MC1 or RapiZyme Cusativin) and 8 µL of nuclease-free water to obtain a final volume of ~20 µL, the mRNA was digested at 37 °C for 60 minutes in an Eppendorf thermomixer. The digestion was stopped by heating at 70 °C for 15 minutes to inactivate the enzymes. The digest was analyzed immediately by LC-MS using 5 µL injections.
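The quantities in the three digestion protocols can be cross-checked with simple arithmetic. The short Python sketch below is illustrative only: it derives the RNase T1 stock concentration, the amount of mRNA per reaction, the enzyme-to-substrate ratio, and the final mRNA concentration from the volumes and units stated above. The `Digest` class and the function names are hypothetical, and because unit definitions are specific to each enzyme and supplier, the units-per-pmol values should be compared only within one enzyme, not across enzymes.

```python
from dataclasses import dataclass


def stock_units_per_ul(total_units: float, volume_ml: float) -> float:
    """Enzyme stock concentration after reconstitution (units per µL)."""
    return total_units / (volume_ml * 1000.0)


@dataclass
class Digest:
    name: str
    mrna_volume_ul: float   # volume of mRNA solution added (µL)
    mrna_conc_um: float     # concentration of that solution (µM = pmol/µL)
    enzyme_units: float     # total enzyme units added to the reaction
    final_volume_ul: float  # total reaction volume (µL)

    @property
    def mrna_pmol(self) -> float:
        # µM × µL = pmol
        return self.mrna_volume_ul * self.mrna_conc_um

    @property
    def units_per_pmol(self) -> float:
        # Enzyme-to-substrate ratio; only comparable within one enzyme,
        # because unit definitions differ between enzymes and suppliers.
        return self.enzyme_units / self.mrna_pmol

    @property
    def final_mrna_conc_um(self) -> float:
        # pmol ÷ µL = µM
        return self.mrna_pmol / self.final_volume_ul


DIGESTS = [
    # RNase T1: 5 µL of 5 µM mRNA + 25 µL water + 10 µL enzyme (1000 U) = 40 µL
    Digest("RNase T1", 5, 5, 1000, 40),
    # RapiZyme MC1 and Cusativin: 10 µL of 5 µM mRNA, 50 U enzyme, ~20 µL final
    Digest("RapiZyme MC1", 10, 5, 50, 20),
    Digest("RapiZyme Cusativin", 10, 5, 50, 20),
]

if __name__ == "__main__":
    print(f"RNase T1 stock: {stock_units_per_ul(500_000, 5):.0f} U/µL")
    for d in DIGESTS:
        print(f"{d.name:20s} {d.mrna_pmol:5.1f} pmol  "
              f"{d.units_per_pmol:5.1f} U/pmol  {d.final_mrna_conc_um:.3f} µM")
```

Running it reports a 100 U/µL RNase T1 stock, 25 pmol of mRNA at 40 U/pmol for the RNase T1 digest, and 50 pmol at 1 U/pmol for each RapiZyme digest.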
All datasets were acquired with waters_connect UNIFI™ App version 3.6.0.21 and subsequently processed using the MAP Sequence App, assisted by the mRNA Cleaver and Coverage Viewer MicroApps.
LC Conditions
LC-MS system: Xevo™ G3 QTof LC-MS with ACQUITY™ Premier UPLC (Binary) System
Column: ACQUITY Premier Oligonucleotide BEH™ C18 FIT Column, 130 Å, 1.7 µm, 2.1 x 150 mm (p/n: 186009486)
Column temperature: 60 °C
Flow rate: 300 µL/min
Solvent A: 10 mM DPA (dipropylethylamine), 40 mM HFIP (1,1,1,3,3,3-hexafluoroisopropanol) in DI water, pH 8.5
Solvent B: 10 mM DPA, 40 mM HFIP in 50% methanol
Sample temperature: 8 °C
Sample vials: QuanRecovery MaxPeak HPS vials (p/n: 186009186)
Injection volume: 5 µL
Purge solvent: 50% MeOH
Sample Manager wash solvent: 50% MeOH
Seal wash: 20% acetonitrile in DI water
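Because the mobile phases are prepared fresh daily, the volume of each neat ion-pairing reagent can be estimated from its target concentration, molecular weight, density, and purity. This Python sketch is an illustration under stated assumptions: the molecular weights correspond to HFIP (C3H2F6O) and a C8H19N amine, while the densities, especially the DPA value, are placeholder assumptions to be replaced with values from the supplier's data sheet. The `reagent_volume_ml` function and the constant dictionaries are hypothetical names.

```python
def reagent_volume_ml(conc_mm: float, final_volume_l: float,
                      mw_g_per_mol: float, density_g_per_ml: float,
                      purity: float = 1.0) -> float:
    """Volume (mL) of a neat liquid reagent giving conc_mm (mM) in final_volume_l (L)."""
    mmol = conc_mm * final_volume_l              # mM × L = mmol
    grams = mmol * mw_g_per_mol / 1000.0         # mmol × g/mol ÷ 1000 = g
    return grams / (density_g_per_ml * purity)   # g ÷ (g/mL), corrected for assay


# ASSUMED constants for illustration; verify against the supplier's data sheet.
HFIP = {"mw": 168.04, "density": 1.596}  # 1,1,1,3,3,3-hexafluoro-2-propanol
DPA = {"mw": 129.24, "density": 0.75}    # C8H19N amine; density is a placeholder

if __name__ == "__main__":
    for name, reagent, conc_mm in (("HFIP", HFIP, 40), ("DPA", DPA, 10)):
        v = reagent_volume_ml(conc_mm, 1.0, reagent["mw"], reagent["density"],
                              purity=0.99)  # 99% assay, as stated for both reagents
        print(f"{name}: {v:.2f} mL per 1 L of Solvent A")
```

The same additive amounts apply per liter of Solvent B, since both solvents contain 10 mM DPA and 40 mM HFIP.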
Gradient Table
MS Conditions
MS system: Xevo™ G3 QTof Mass Spectrometer
Ionization mode: ESI(-)
Acquisition mode: MSE
Acquisition rate: 1 Hz
Capillary voltage: 2.5 kV
Cone voltage: 40 V
Source offset: 60 V
Source temperature: 120 °C
Desolvation temperature: 550 °C
Cone gas flow: 50 L/h
Desolvation gas flow: 600 L/h
TOF mass range: m/z 340–4000 (MSE acquisition)
Low energy CE: 6 V
High energy CE ramp: 25 to 50 V
Lock mass: 50 pg/µL Leu-enkephalin
Data acquisition: waters_connect 3.6.0.21
Data processing: waters_connect 3.6.0.21; mRNA Cleaver MicroApp v1.1.0
Results and Discussion
As biotherapeutics organizations advance their product development pipelines, increasing adoption of LC-MS-based methods for RNA sequence analysis helps achieve more thorough initial characterization and reliable ongoing sequence confirmation. Improvements in chromatographic reproducibility and resolution, the increased usability of high-resolution MS, and the improved sensitivity and mass accuracy of MS systems allow for confident sequence confirmation and detection of base modifications. Most importantly, new oligonucleotide informatics tools have made the daunting task of RNA map data analysis faster and easier.
While LC-MS peptide mapping of biopharmaceutical proteins is routine, applying an equivalent approach to RNA is more challenging. Unlike proteins, which are composed of 20 different amino acids, mRNAs contain only four building blocks (A, C, G, U), which makes it harder to generate oligonucleotide digestion products with unique masses that map to a single location in the mRNA. High-frequency cleavers, such as RNase T1 with its G-specific cleavage, generate many isomeric or even identical digestion products that result in ambiguous assignments. Enzymes with less frequent cleavage sites produce longer digestion products, especially when they also generate a population of missed-cleavage oligonucleotides. The novel RNase T2 enzymes employed in this work (RapiZyme MC1 and RapiZyme Cusativin) therefore have a greater chance of producing assignable digestion products with unique masses.
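The compositional degeneracy described above can be made concrete by counting. Sequences that share a base composition share an elemental formula and therefore an identical monoisotopic mass, so the number of compositions bounds how many distinct masses are possible at a given length. The Python sketch below (function names are illustrative) compares sequence and composition counts for a four-letter RNA alphabet with a 20-letter amino acid alphabet; it is a worst-case combinatorial illustration, not a model of any specific mRNA, and it treats all 20 amino acids as mass-distinct, ignoring the isobaric Leu/Ile pair.

```python
from math import comb


def n_sequences(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct linear sequences of the given length."""
    return alphabet_size ** length


def n_compositions(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct compositions (multisets) of the given length.

    All sequences with the same composition have the same elemental
    formula, hence the same monoisotopic mass."""
    return comb(length + alphabet_size - 1, alphabet_size - 1)


if __name__ == "__main__":
    print("length  RNA sequences  RNA compositions  peptide compositions")
    for n in (4, 8, 16):
        # Peptide column assumes 20 mass-distinct residues (Leu/Ile ignored)
        print(f"{n:6d}  {n_sequences(n):13d}  {n_compositions(n):16d}  "
              f"{n_compositions(n, 20):20d}")
```

For a 16-mer, more than four billion possible RNA sequences collapse onto only 969 base compositions, which is why isomeric digestion products are common.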
An informatics workflow (Figure 1) featuring the waters_connect MAP Sequence App for data processing and sequence assignment was designed to enable fast, automated processing of UPLC-MS datasets acquired from enzymatic digests.6 In the first step, the mRNA Cleaver MicroApp generates the predicted in silico digestion products of the specified mRNA molecule. In the second step, the waters_connect MAP Sequence App processes the UPLC-MS data and matches the predicted neutral monoisotopic masses of the digestion products to the experimental MS1 data. In the final step, the sequence coverage obtained for the mRNA digest is summarized and visualized using the Coverage Viewer MicroApp.
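The second step, matching predicted neutral masses to observed ions, can be illustrated with a minimal Python sketch. It converts a negative-mode [M - zH]z- ion to a neutral monoisotopic mass and reports every predicted product within a ppm tolerance. This is not the MAP Sequence App's algorithm; the 10 ppm tolerance, the product names, and their masses are hypothetical, chosen to show how two isomers yield an ambiguous double match while a C-to-U variant (about 0.984 Da heavier) does not match.

```python
PROTON = 1.007276467  # proton mass (Da)


def neutral_mass(mz: float, z: int) -> float:
    """Neutral monoisotopic mass from a negative-mode [M - zH]z- ion."""
    return z * (mz + PROTON)


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def match(mz: float, z: int, predicted: dict[str, float], tol_ppm: float = 10.0):
    """Return (product, ppm error) for every predicted product within tol_ppm.

    More than one hit means the assignment is ambiguous (e.g., isomers)."""
    m = neutral_mass(mz, z)
    hits = []
    for name, mono in predicted.items():
        err = ppm_error(m, mono)
        if abs(err) <= tol_ppm:
            hits.append((name, round(err, 2)))
    return hits


# Hypothetical in silico products (neutral monoisotopic masses, Da)
PREDICTED = {
    "isomer_a": 4967.6086,
    "isomer_b": 4967.6086,        # same composition, different sequence
    "c_to_u_variant": 4968.5926,  # one C replaced by U (+0.984 Da)
}

if __name__ == "__main__":
    print(match(1654.8623, 3, PREDICTED))
```

A real implementation would also require a given product to be found at several charge states and a consistent retention time before accepting it; those checks are omitted here.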
Because the workflow relies on accurate mass measurement of RNA digestion products for oligo mapping, it is critical that enzymatic digestion generates unambiguous, unique products, reducing the need for user intervention with MS2 fragmentation data to resolve ambiguous assignments. RapiZyme MC1 and RapiZyme Cusativin can produce longer unique digestion products through a combination of unique cleavage specificity and missed cleavages intentionally generated by regulating enzyme amount and digestion time. Data-independent MSE acquisition is used to confirm oligonucleotide digestion product assignments. The elevated-energy MSE fragmentation information will be incorporated into future releases of the App, particularly to address the remaining ambiguous assignments.
mRNA Oligo Mapping
Three independent LC-MSE oligo map datasets were acquired for the GFP mRNA sample digested with three different ribonucleases: RNase T1, RapiZyme MC1, and RapiZyme Cusativin. The three UPLC-MSE datasets were processed with the MAP Sequence App using the parameters highlighted in Figure 2, and the results were exported to the Coverage Viewer MicroApp for quick visualization of sequence coverage. The TIC chromatograms recorded for RNase T1 (Figure 3A), RapiZyme MC1 (Figure 4A), and RapiZyme Cusativin (Figure 5A) are shown. MC1 and Cusativin are known to produce a significantly larger number of missed cleavages than RNase T1, and this feature is clearly reflected in the complexity of the chromatograms in Figures 4A and 5A.6–9
A comparison of the sequence mapping results (Figures 3B, 4B, and 5B) shows that RNase T1 produced considerably lower unique coverage (~60%, Figure 3B) than MC1 (~97%, Figure 4B) and Cusativin (~85%, Figure 5B), because it generated many more ambiguously assigned digestion products. This is a direct result of the much higher cleavage frequency of RNase T1 (cleavage after every G residue), in contrast with MC1 and Cusativin, whose cleavage depends on specific dinucleotide motifs.6–9 MC1 cleaves at the 5’-end of uridine residues, with three major cleavage sites (A_U / C_U / U_U) and two minor cleavage sites (C_A / C_G). Cusativin cleaves at the 3’-end of cytidine residues, with four major cleavage sites (C_A / C_G / C_U / U_A) and three minor cleavage sites (A_U / G_U / U_U). While RNase T1 leaves a linear phosphate at the 3’-end of all its digestion products, MC1 and Cusativin produce mainly digestion products with a 3’ cyclic phosphate.
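The cleavage rules above can be expressed as dinucleotide motifs, where X_Y means the backbone between X and Y is cut. The following Python sketch performs a fully specific in silico digestion without missed cleavages, using only the major sites by default and labeling products in the first-residue:last-residue style used in this note (for example, U251:G266). It is a simplified, hypothetical stand-in for the mRNA Cleaver MicroApp: it ignores enzyme kinetics and 3’ end chemistry (linear versus cyclic phosphate), and the test sequence is invented.

```python
# Motif 'X_Y' = cut between X and Y (5'-X / Y-3'), as listed in the text.
RULES = {
    "RNase T1": {"major": {f"G_{n}" for n in "ACGU"}, "minor": set()},
    "RapiZyme MC1": {"major": {"A_U", "C_U", "U_U"},
                     "minor": {"C_A", "C_G"}},
    "RapiZyme Cusativin": {"major": {"C_A", "C_G", "C_U", "U_A"},
                           "minor": {"A_U", "G_U", "U_U"}},
}


def cut_sites(seq: str, enzyme: str, include_minor: bool = False) -> list[int]:
    """0-based indices i such that the bond between seq[i-1] and seq[i] is cut."""
    motifs = set(RULES[enzyme]["major"])
    if include_minor:
        motifs |= RULES[enzyme]["minor"]
    return [i for i in range(1, len(seq)) if f"{seq[i - 1]}_{seq[i]}" in motifs]


def digest(seq: str, enzyme: str,
           include_minor: bool = False) -> list[tuple[str, str]]:
    """Return (label, fragment) pairs; labels use 1-based 'X12:Y20' coordinates."""
    bounds = [0] + cut_sites(seq, enzyme, include_minor) + [len(seq)]
    products = []
    for start, end in zip(bounds, bounds[1:]):
        frag = seq[start:end]
        products.append((f"{frag[0]}{start + 1}:{frag[-1]}{end}", frag))
    return products


if __name__ == "__main__":
    demo = "AGGCUUACGUCA"  # hypothetical sequence
    for enzyme in RULES:
        print(enzyme, digest(demo, enzyme))
```

On this toy sequence, RNase T1 releases a single-nucleotide G product, whereas MC1 and Cusativin cut elsewhere, which mirrors how the choice of enzyme determines which regions yield retainable, assignable products.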
Compared to RNase T1, MC1 and Cusativin also have a higher capacity to generate controlled missed cleavages; therefore, up to four missed cleavages were allowed for these two enzymes in the mRNA Cleaver prediction tool, while only two were allowed for RNase T1. With more missed cleavages, both MC1 and Cusativin produced longer digestion products than RNase T1. These longer oligonucleotide products are more likely to have unique masses, reducing the chance of ambiguous assignments. As a result of these enzymatic features, the MC1 and Cusativin mapping results exhibit better coverage values.
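How missed cleavages raise unique coverage can be shown with a small worked example. In this Python sketch, a product counts as unique only if no other predicted product shares its base composition, a deliberate simplification that treats same-composition products as indistinguishable by accurate mass and ignores differences in terminal end chemistry. The nine-residue sequence, its RNase T1 cut positions, and the function names are hypothetical.

```python
from collections import Counter


def spans(seq_len: int, cuts: list[int], max_missed: int) -> list[tuple[int, int]]:
    """(start, end) spans, 0-based and end-exclusive, with up to max_missed missed cuts."""
    bounds = [0] + sorted(cuts) + [seq_len]
    out = []
    for i in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            j = i + missed + 1
            if j < len(bounds):
                out.append((bounds[i], bounds[j]))
    return out


def composition(fragment: str) -> tuple:
    """Base composition as a hashable key, e.g. (('A', 1), ('G', 2))."""
    return tuple(sorted(Counter(fragment).items()))


def unique_coverage(seq: str, cuts: list[int], max_missed: int) -> float:
    """Fraction of residues covered by products whose composition occurs once."""
    all_spans = spans(len(seq), cuts, max_missed)
    counts = Counter(composition(seq[s:e]) for s, e in all_spans)
    covered = set()
    for s, e in all_spans:
        if counts[composition(seq[s:e])] == 1:
            covered.update(range(s, e))
    return len(covered) / len(seq)


if __name__ == "__main__":
    seq = "AGUGAGUUG"    # hypothetical RNA
    t1_cuts = [2, 4, 6]  # RNase T1: after each G except the 3'-terminal one
    for m in range(3):
        print(m, round(unique_coverage(seq, t1_cuts, m), 3))
```

With zero, one, and two missed cleavages allowed, unique coverage rises from 5/9 to 7/9 to 9/9: the repeated AG products and the isomeric AGUG/UGAG pair remain ambiguous, but the longer products spanning them are unique.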
In the case of ambiguous sequence assignments, the waters_connect CONFIRM Sequence App can be used to further investigate the elevated-energy MSE data and assign the fragment ions to the most probable oligonucleotide sequence.12 To illustrate this approach, we selected a pair of 16-mer digestion products produced by RNase T1 digestion of GFP mRNA. The pair consists of the structural isomers U251:G266 (sequence UCC UUC CCC UCA CUC G) and C555:G570 (CCC UCC CCU CUA UCU G), which have the same nucleotide composition. As shown by the inset (Figure 6, top), MAP Sequence automatically found these two sequences based on multiple precursors (charge states 2–9) and labeled them as “alternative product assignments”. The extracted mass chromatogram of the corresponding triply charged precursor (m/z = 1654.86), also shown in Figure 6, contains a single, high-abundance chromatographic peak that could belong to one or both oligonucleotide products. CONFIRM Sequence was used to automatically assign the fragment ions detected in the elevated-energy ESI-MSE spectrum (Figure 7) recorded for this digestion product, confidently revealing a match to the C555:G570 sequence with 100% sequence coverage.
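The isomer pair can be checked directly from the sequences given above. This Python sketch assumes 5’-OH and 3’-linear-phosphate termini (the text notes that RNase T1 leaves a linear 3’ phosphate) and standard monoisotopic atomic masses; it confirms that both 16-mers have the same elemental formula and computes the [M-3H]3- m/z, which agrees with the reported precursor m/z of 1654.86 to two decimal places. The residue formulas are nucleoside monophosphates minus water, and the function names are illustrative.

```python
from collections import Counter

# Monoisotopic atomic masses (Da) and proton mass
ATOM = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}
PROTON = 1.007276467

# Elemental formulas of nucleotide residues (NMP minus H2O) in an RNA chain
RESIDUE = {
    "A": {"C": 10, "H": 12, "N": 5, "O": 6, "P": 1},
    "C": {"C": 9, "H": 12, "N": 3, "O": 7, "P": 1},
    "G": {"C": 10, "H": 12, "N": 5, "O": 7, "P": 1},
    "U": {"C": 9, "H": 11, "N": 2, "O": 8, "P": 1},
}


def formula(seq: str) -> Counter:
    """Elemental formula of a 5'-OH, 3'-linear-phosphate oligo."""
    f = Counter({"H": 2, "O": 1})  # one water closes the chain
    for base in seq:
        f.update(RESIDUE[base])
    return f


def mono_mass(f: Counter) -> float:
    return sum(ATOM[element] * n for element, n in f.items())


def mz_negative(mass: float, z: int) -> float:
    """m/z of the [M - zH]z- ion."""
    return (mass - z * PROTON) / z


A_SEQ = "UCCUUCCCCUCACUCG"  # U251:G266
B_SEQ = "CCCUCCCCUCUAUCUG"  # C555:G570

if __name__ == "__main__":
    fa, fb = formula(A_SEQ), formula(B_SEQ)
    print("same formula:", fa == fb)
    m = mono_mass(fa)
    print(f"M = {m:.4f} Da; [M-3H]3- m/z = {mz_negative(m, 3):.4f}")
```

Because the two formulas are identical, no accurate-mass measurement can separate these precursors, which is why fragment-ion evidence is needed for the assignment.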
Several examples of mRNA cap analysis have been described in previously published Waters application notes.5,13 The mRNA cap is added to the 5’-end of an mRNA and typically consists of a fully matured N7-methylguanosine that is 5’-5’-linked to the mRNA via a triphosphate linkage (m7Gppp). In the case of GFP mRNA, a Cap1 structure was used, in which the first transcribed nucleotide, an adenosine, is 2’-O-methylated. The Cap1 structure of the GFP mRNA is therefore denoted m7GpppAm.
Both the uncapped and capped 5’ digestion products of GFP mRNA were detected in the MC1 digest, as shown by the extracted mass chromatograms in Figure 8. Following MC1 digestion, the uncapped species (m/z = 1419.5302, -3) eluted at 22.0 min (Figure 8A, top trace), and the Cap1-modified species (m/z = 1700.8962, -3) eluted at 24.7 min (Figure 8B, bottom trace). Using the integrated peak areas of these triply charged XIC peaks, the capping efficiency was calculated to be 79.8%.
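The capping efficiency calculation reduces to a ratio of peak areas. The Python sketch below uses the common formulation capped / (capped + uncapped); this formula is an assumption about how the 79.8% value was derived, and it implicitly assumes a similar MS response for the two species at the chosen charge state. The function name and the example areas are hypothetical.

```python
def capping_efficiency(capped_area: float, uncapped_area: float) -> float:
    """Percent capped = capped / (capped + uncapped) × 100.

    Assumes comparable ionization efficiency for the capped and uncapped
    5' digestion products at the charge state used for the XIC."""
    if capped_area < 0 or uncapped_area < 0:
        raise ValueError("peak areas cannot be negative")
    total = capped_area + uncapped_area
    if total == 0:
        raise ValueError("at least one peak area must be positive")
    return 100.0 * capped_area / total


if __name__ == "__main__":
    # Hypothetical integrated XIC areas (arbitrary units)
    print(f"{capping_efficiency(798.0, 202.0):.1f}% capped")
```

The ratio is only meaningful when both species are retained and detected, which is why it cannot be applied to the RNase T1 digest of this mRNA, where the uncapped 5’ product is not retained.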
The identity of these oligonucleotide species is further confirmed by the isotopic distributions (Figure 9) recorded for the most abundant charge state ([M-3H]3-). The monoisotopic peak of the free (uncapped) 13-mer MC1 digestion product with the sequence GGA AGA CCC AAG C is highlighted in Figure 8A, while the monoisotopic peak of the Cap1-modified 13-mer oligonucleotide is shown in the bottom panel (Figure 8B). In this case, MC1 and Cusativin both produced the same digestion product (sequence GGA AGA CCC AAG C) from the 5’-terminus of GFP mRNA, enabling measurement of the mRNA capping efficiency. In the case of RNase T1 digestion, however, because the first mRNA residue at the 5’ end is a guanosine, the uncapped digestion product was too short (a single nucleotide) to be retained under the IP-RP conditions employed. Only the Cap1 digestion product was detected (sequence m7GpppAmGp, most abundant monoisotopic ion: m/z = 602.5482, -2, data not shown), so the mRNA capping efficiency could not be measured. Therefore, for GFP mRNA, only MC1 and Cusativin could be used successfully for capping efficiency calculations.
These results highlight the current utility of mass spectrometry for sequence confirmation of large mRNA molecules, although workflow efficiency and automation can still be improved and remain areas of active development. Continued development of software automation for processing these complex datasets, along with expansion of the digestion enzyme toolset, will be crucial for enhancing the utility of this emerging methodology.
Conclusion
- A new informatics workflow was demonstrated, featuring the waters_connect MAP Sequence App, which facilitated MS1 oligo mapping of mRNA digests using UPLC-MS data
- Owing to their unique cleavage specificity, two novel RNase T2 enzymes (RapiZyme MC1 and RapiZyme Cusativin) generated a greater population of fully digested and missed-cleavage oligonucleotides than RNase T1, resulting in higher and more confident overlapping sequence coverage
- Isomeric oligonucleotide digestion products can be differentiated using the waters_connect CONFIRM Sequence App, which matches elevated-energy fragment ions to the sequences of the isomeric oligos
References
- Xu S, Yang K, Li R, Zhang L. mRNA Vaccine Era-Mechanisms, Drug Platform and Clinical Prospection. Int J Mol Sci, 2020, 21 (18), 6582, doi:10.3390/ijms21186582.
- Verbeke R, Lentacker I, De Smedt SC, Dewitte H. The dawn of mRNA vaccines: The COVID-19 case. J Controlled Release, 2021, 333, 511–520, doi:10.1016/j.jconrel.2021.03.043.
- Jackson NA, Kester KE, Casimiro D, Gurunathan S, DeRosa F. The Promise of mRNA Vaccines: A Biotech and Industrial Perspective. npj Vaccines, 2020, 5, doi:10.1038/s41541-020-0159-8.
- Synthetic mRNA Oligo-Mapping Using Ion-Pairing Liquid chromatography and Mass Spectrometry. Waters application note 720007669, June 2022.
- RNA CQA Analysis using the BioAccord LC-MS System and INTACT Mass waters_connect Application. Waters application note 720008130, November 2023.
- RNA Digestion Product Mapping Using an Integrated UPLC-MS and Informatics Workflow. Waters application note 720008553, September 2024.
- Jiang T, Yu N, Kim J, Murgo JR, Kissai M, Ravichandran K, Miracco E, Presnyak V, Hua S. Oligonucleotide Sequence Mapping of Large Therapeutic mRNAs via Parallel Ribonuclease Digestions and LC-MS/MS. Anal Chem, 2019, 91, 8500–8506, doi:10.1021/acs.analchem.9b01664.
- Vanhinsbergh CJ, Criscuolo A, Sutton JN, Keely M, Williamson AJK, Cook K, Dickman M. Characterization and Sequence Mapping of Large RNA and mRNA Therapeutics using Mass Spectrometry. Anal Chem, 2022, 94, 7339–7349, doi:10.1021/acs.analchem.2c00765.
- Gau B, Dawdy AW, Wang HL, Bare B, Castaneda CH, Friese OV, Thompson MS, Lerch TF, Cirelli DJ, Rouse JC. Oligonucleotide Mapping via Mass Spectrometry to Enable Comprehensive Primary Structure Characterization of an mRNA Vaccine Against SARS CoV-2. Sci Rep, 2023, 13, 9038, doi:10.1038/s41598-023-36193-2.
- Tang S, Liu GY, Yan Y, Wang S, Li N. Development of a Flow Through-Based Limited Digestion Approach for High-Throughput and High-Sequence Coverage Mapping of Therapeutic mRNAs. Anal Chem, 2024, 96, 16944–17003, doi:10.1021/acs.analchem.4c04384.
- Tunable Digestion of RNA Using RapiZyme RNases to Confirm Sequence and Map Modifications. Waters application note 720008539, September 2024.
- CONFIRM Sequence: A waters_connect Application for Sequencing of Synthetic Oligonucleotide and Their Impurities. Waters application note 720007677, July 2022.
- Rapid Analysis of a Synthetic mRNA Cap Structure Using Ion-Pairing RPLC with the BioAccord LC-MS System. Waters application note 720007329, August 2021.
720008677, January 2025