Pathogen-sugar interactions revealed by universal saturation transfer analysis

Many pathogens exploit host cell-surface glycans. However, precise analyses of glycan ligands binding with heavily modified pathogen proteins can be confounded by overlapping sugar signals and/or compounded with known experimental constraints. Universal saturation transfer analysis (uSTA) builds on existing nuclear magnetic resonance spectroscopy to provide an automated workflow for quantitating protein-ligand interactions. uSTA reveals that early-pandemic, B-origin-lineage severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike trimer binds sialoside sugars in an “end-on” manner. uSTA-guided modeling and a high-resolution cryo–electron microscopy structure implicate the spike N-terminal domain (NTD) and confirm end-on binding. This finding rationalizes the effect of NTD mutations that abolish sugar binding in SARS-CoV-2 variants of concern. Together with genetic variance analyses in early pandemic patient cohorts, this binding implicates a sialylated polylactosamine motif found on tetraantennary N-linked glycoproteins deep in the human lung as potentially relevant to virulence and/or zoonosis. Description Glycans in the spotlight Viral infection of a cell requires a complex series of molecular recognition events, often mediated by glycoproteins and cell-surface glycans. Buchanan et al. developed a nuclear magnetic resonance analysis method to better study such interactions and applied it to influenza and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins binding sialoside glycans. For SARS-CoV-2 in particular, they found evidence for a sialoside-binding site in the N-terminal domain of the original B-origin lineage spike protein that was lost in subsequent variants. These results were corroborated by cryo–electron microscopy structures of the glycan-bound spike protein and genetic variation analysis from patients early in the pandemic, which uncovered host factors involved in glycosylation that potentially contributed to variation in disease severity. —MAF A multidisciplinary study of viral proteins uncovers binding sites and patterns for sialoside glycans. INTRODUCTION The surface proteins found on both pathogens and host cells mediate cell entry (and exit) and influence disease progression and transmission. Both types of proteins can bear host-generated posttranslational modifications, such as glycosylation, that are essential for function but can confound current biophysical methods used for dissecting key interactions. Several human viruses (including non-SARS coronaviruses) attach to host cell surface N-linked glycans that include forms of sialic acid (sialosides). There remains, however, conflicting evidence as to whether or how SARS-associated coronaviruses might use such a mechanism. In the absence of an appropriate biochemical assay, the ability to analyze the binding of such glycans to heavily modified proteins and resolve this issue is limited. RATIONALE We developed and demonstrated a quantitative extension of “saturation transfer” protein nuclear magnetic resonance (NMR) methods to a complete mathematical model of the magnetization transfer caused by interactions between protein and ligand. The designed method couples objective resonance identification and intensity measurement in NMR spectra (via a deconvolution algorithm) with Bloch-McConnell analysis of magnetization transfer (as judged by this resonance signal intensity) to enable a structural, kinetic, and thermodynamic analysis of ligand binding. Such quantification is beyond previously perceived limits of exchange rates, concentration, or system and therefore represents a potentially universal saturation transfer analysis (uSTA) method. RESULTS In an automated workflow, uSTA can be applied to a range of even heavily modified protein systems in a general manner to obtain quantitative binding interaction parameters (KD, kEx). uSTA proved critical in mapping direct interactions between sialoside sugar ligands and relevant virus surface attachment glycoproteins, including multiple variants of both severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein and influenza H1N1 hemagglutinin protein. It was successful in quantitating ligand NMR signals in spectral regions otherwise occluded by resonances from mobile protein glycans. In early-pandemic (December 2019) B-origin-lineage SARS-CoV-2 spike trimer, a clear “end-on” binding mode of sialoside sugars to spike was revealed by uSTA. This mode contrasted with “extended-surface side”-binding for heparin sugar ligands. uSTA-derived restraints used in structural modeling suggested sialoside-glycan binding sites in a β sheet–rich region of spike N-terminal domain (NTD), distant from the receptor-binding domain (RBD) that binds ACE2 co-receptor and that has been identified as the site for other sugar interactions. Consistent with this NTD site being a previously unknown sialoside sugar-binding pocket, uSTA-sialoside binding was minimally perturbed by antibodies that neutralize the ACE2-binding RBD domain. Strikingly, uSTA also shows that this sialoside binding is disrupted in spike from multiple variants of concern (B1.1.7/alpha, B1.351/beta, B.1.617.2/delta, and B.1.1.529/omicron) that emerged later in the pandemic (September 2020 onward). Notably, these variants possess multiple hotspot mutations in the NTD. End-on sialoside binding in a B-origin-lineage spike-NTD pocket was pinpointed by cryo-EM to a previously unknown site that is created from residues that are notably mutated or are in regions where mutations occur in variants of concern (e.g., His69, Val70, and Tyr145 in alpha and omicron). An analysis of beneficial genetic variances correlated with disease severity in cohorts of patients from early 2020 suggests a model in which this site in the NTD of B-origin-lineage SARS-CoV-2 (but not in later variants) may have exploited a specific sialylated polylactosamine motif found on tetraantennary human N-linked glycoproteins, known to be present in deeper human lung. CONCLUSION Together, these results confirm a distinctive sugar-binding mode mediated by the unusual NTD of B-origin-lineage SARS-CoV-2 spike protein that is lost in later variants. This may implicate modulation of binding by SARS-CoV-2 virus to human cell surface sugars as a determinant of virulence and/or zoonosis. More generally, because cell surface glycans are widely relevant to biology and pathology, the uSTA method can now provide ready, quantitative, widespread analysis of complex, host-derived, and posttranslationally modified proteins in their binding to putative ligands, which may be relevant to disease, even in previously confounding complex systems. uSTA identifies a sugar-binding site and pose in the unusual NTD of early-pandemic SARS-CoV-2 spike. (A) This site and pose, which were confirmed by cryo–electron microscopy, are heavily mutated in variants of concern. (B) Analyses reveal a loss of sialoside-spike binding, rationalized by clustering of mutations around the binding site. *All alpha-variant mutations are in >1 variant.

INTRODUCTION: The surface proteins found on both pathogens and host cells mediate cell entry (and exit) and influence disease progression and transmission. Both types of proteins can bear host-generated posttranslational modifications, such as glycosylation, that are essential for function but can confound current biophysical methods used for dissecting key interactions. Several human viruses (including non-SARS coronaviruses) attach to host cell surface N-linked glycans that include forms of sialic acid (sialosides). There remains, however, conflicting evidence as to whether or how SARS-associated coronaviruses might use such a mechanism. In the absence of an appropriate biochemical assay, the ability to analyze the binding of such glycans to heavily modified proteins and resolve this issue is limited. RATIONALE : We developed and demonstrated a quantitative extension of "saturation transfer" protein nuclear magnetic resonance (NMR) methods to a complete mathematical model of the magnetization transfer caused by interactions between protein and ligand. The designed method couples objective resonance identification and intensity measurement in NMR spectra (via a deconvolution algorithm) with Bloch-McConnell analysis of magnetization transfer (as judged by this resonance signal intensity) to enable a structural, kinetic, and thermodynamic analysis of ligand binding. Such quantification is beyond previously perceived limits of exchange rates, concentration, or system and therefore represents a potentially universal saturation transfer analysis (uSTA) method. RESULTS: In an automated workflow, uSTA can be applied to a range of even heavily modified protein systems in a general manner to obtain quantitative binding interaction parameters (K D , k Ex ). uSTA proved critical in mapping direct interactions between sialoside sugar ligands and relevant virus surface attachment glycoproteins, including multiple variants of both severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein and influenza H1N1 hemagglutinin protein. It was successful in quantitating ligand NMR signals in spectral regions otherwise occluded by resonances from mobile protein glycans. In early-pandemic (December 2019) B-origin-lineage SARS-CoV-2 spike trimer, a clear "end-on" binding mode of sialoside sugars to spike was revealed by uSTA. This mode contrasted with "extended-surface side"-binding for heparin sugar ligands. uSTAderived restraints used in structural modeling suggested sialoside-glycan binding sites in a b sheet-rich region of spike N-terminal domain (NTD), distant from the receptor-binding domain (RBD) that binds ACE2 co-receptor and that has been identified as the site for other sugar interactions. Consistent with this NTD site being a previously unknown sialoside sugar-binding pocket, uSTA-sialoside binding was minimally perturbed by antibodies that neutralize the ACE2-binding RBD domain. Strikingly, uSTA also shows that this sialoside binding is disrupted in spike from multiple variants of concern (B1.1.7/alpha, B1.351/beta, B.1.617.2/delta, and B.1.1.529/omicron) that emerged later in the pandemic (September 2020 onward). Notably, these variants possess multiple hotspot mutations in the NTD. End-on sialoside binding in a B-origin-lineage spike-NTD pocket was pinpointed by cryo-EM to a previously unknown site that is created from residues that are notably mutated or are in regions where mutations occur in variants of concern (e.g., His 69 , Val 70 , and Tyr 145 in alpha and omicron). An analysis of beneficial genetic variances correlated with disease severity in cohorts of patients from early 2020 suggests a model in which this site in the NTD of B-origin-lineage SARS-CoV-2 (but not in later variants) may have exploited a specific sialylated polylactosamine motif found on tetraantennary human N-linked glycoproteins, known to be present in deeper human lung.
CONCLUSION: Together, these results confirm a distinctive sugar-binding mode mediated by the unusual NTD of B-origin-lineage SARS-CoV-2 spike protein that is lost in later variants. This may implicate modulation of binding by SARS-CoV-2 virus to human cell surface sugars as a determinant of virulence and/or zoonosis. More generally, because cell surface glycans are widely relevant to biology and pathology, the uSTA method can now provide ready, quantitative, widespread analysis of complex, host-derived, and posttranslationally modified proteins in their binding to putative ligands, which may be relevant to disease, even in previously confounding complex systems. ▪ S ialosides are present in glycans that are anchored to human cells, and they mediate binding that is central to cell-cell communication in human physiology and that is at the heart of many host-pathogen interactions. One of the most well-known examples is that of influenza virus, which binds to sialosides with its hemagglutinin (HA or H) protein and cleaves off sialic acid from the infected cell with its neuraminidase (NA or N) protein; HxNx variants of influenza with different HA or NA protein types have a profound effect on zoonosis and pathogenicity (1).
The Middle East respiratory syndrome [MERS (2)] virus, which is related to severe acute respiratory syndrome coronavirus 1 and 2 (SARS-CoV-1 and -2), has been shown to exploit cell-surface sugar sialosides (2-6) as part of an attachment strategy. Both SARS-CoV-1 (7)(8)(9) and SARS-CoV-2 (10,11) are known to gain entry to host cells through the use of receptor-binding domains (RBDs) of their respective spike proteins that bind human cell-surface protein ACE2, but whether these viruses engage sialosides as part of the infection cycle has, despite predictions (6,12), remained unclear. Preliminary reports as to whether complex sialosides are or are not bound are contradictory and format-dependent (13)(14)(15). Glycosaminoglycans on proteoglycans such as heparin have been identified as a primary cooperative glycan attachment point (16,17). Studies reporting sugar binding have so far implicated binding sites in or close to the RBD of the spike protein. Surprisingly, the N-terminal domain (NTD, fig. S1), which has a putative glycan binding fold (10,18,19) and binds sialosides in other non-SARS coronaviruses (including MERS), has been less explored. The NTD has no confirmed function in SARS-CoV-2, and yet neutralization by antibodies against this domain suggests a potentially important function in viral replication. The unresolved role of host cell-surface sialosides for this pathogen has been noted as an important open question (17). The hypothesized roles for sugar interactions (20) in both virulence (21) and zoonosis (1) indicate that there is an urgent need for precise, quantitative, and robust methods for analysis.
In principle, magnetization transfer in protein nuclear magnetic resonance (NMR) spectroscopy could meet this need, as it can measure ligand binding in its native state without the need for additional labeling or modification of either ligand or protein (e.g., attachment to surface or sensor) (figs. S2 and S3; see text S1 for more details). Saturation transfer difference (STD) (22), which has been widely used to gauge qualitative ligand-protein interactions (23), detects the transfer of magnetization while they are bound via "cross-relaxation." In reality, complex, highly modified protein systems have proven difficult to analyze in a quantitative manner with current methods for several reasons. First, mammalian proteins (or those derived by pathogens from expression in infected mammalian hosts) often bear large, highly mobile glycans. Critically, in the case of glycoproteins such as SARS-CoV-2 spike that may themselves bind glycans, this leads to contributions to protein NMR spectra that may overlap with putative glycan ligand resonances, thereby obscuring needed signal. Second, the NMR spectra of glycan ligands are themselves complex, comprising many overlapped resonances as multiplets and limiting the accurate determination of signal intensities. Finally, STD is commonly described as limited to specific kinetic regimes and/or ligand-toprotein binding equilibrium positions (24). As a result, many regimes and systems have been considered inaccessible to STD.
Using a rigorous theoretical description, coupled with a computational approach based on a Bayesian deconvolution algorithm to objectively and accurately extract signal from all observed resonances, we have undertaken an optimized reformulation of the magnetization/ "saturation" transfer protocol (figs. S4, S5, S6, and S8). This approach reliably and quantitatively determines precise binding rates (k on , k off , k ex ), constants (K D ), and interaction "maps" across a wide range of regimes ( fig.  S4), including systems previously thought to be intractable.
Design of uSTA based on a comprehensive treatment of ligand-protein magnetization transfer While using existing STD methodology to study the interaction between the SARS-CoV-2 spike protein and sialosides, we noted several challenges that resulted in the development of uSTA (see text S3 for more details). Our theoretical analyses ( fig. S4) suggested that many common assumptions or limits that are thought to govern the applicability of magnetization transfer might in fact be circumvented, and we set out to devise a complete treatment that might accomplish this (figs. S5, S6, and S8). This resulted in five specific methodological changes that resulted in a more sensitive, accurate, quantitative, and general method for studying the interactions between biomolecules and ligands (summarized in figs. S5 and S8 and discussed in detail in text S4).
1) We noted the discrepancy between K D s determined using existing STD methods and those obtained using other biophysical methods (25). We performed a theoretical analysis using the Bloch-McConnell equations, a rigorous formulation for studying the evolution of magnetization in exchanging systems that has been widely used to analyze chemical exchange saturation transfer (26,27), dark-state exchange saturation transfer (28), and Carr-Purcell-Meiboom-Gill (29) NMR data to describe protein motion. This analysis not only allowed us to explain this discrepancy, but also enabled fitting of data to give accurate k on , k off , and K D values for protein-ligand interactions that were in excellent accord with alternative measurements ( Fig. 1G and fig. S13); we also found that the range of k on and k off in which the experiment is applicable is far wider than previously recognized ( fig. S4).
2) In mammalian proteins, contributions from glycans on the surface of the protein could not be removed from the spectrum by means of relaxation filters used in epitope mapping (30) without compromising the sensitivity of the experiment. We addressed this instead by applying baseline subtraction using data obtained from a protein-only sample.
3) The magnetization transfer, and hence the sensitivity of the experiment, will be higher when the excitation frequency of the saturation pulse is close to a maximum in the protein NMR spectrum. If any ligand resonances are outside of the bandwidth of the pulse, and if a "ligand-only" subtraction is applied, the magnetization transfer can be maximized. With this condition, the response for a given protein-ligand system in fact becomes invariant to the excitation frequency used (figs. S9 and S10). 4) In complex molecules, such as sialosides, NMR spectra are crowded and overlapped. To reliably obtain magnetization transfer measurements at all points in the ligand, we developed a peak-picking algorithm based on earlier work (31) that can automate the process, returning a list of peak locations and a simulated NMR experiment that can be directly compared to the data. The locations of the peaks are in excellent agreement with the locations for multiplets determined using stan-dard multidimensional approaches used for resonance assignment (see Fig. 1 ; see also  overlaps in all subsequent uSTA analyses and  table S7). 5) The uSTA software allowed the combination of intensity from scalar coupled multiplets following a user-input assignment, to provide "per resonance" measures of saturation transfer. These are provided as-is, and also as h1/r 6 i interpolated "binding maps" that represent the interaction on nearby heteroatoms, thereby allowing ready visual inspection of the binding pose of a molecule.

Testing of uSTA in model systems
The uSTA method (Fig. 1, A to C) was tested first in an archetypal, yet challenging, ligandto-protein interaction (Fig. 1, D to G). Implementation in an automated manner through software governing the uSTA workflow reduced artifacts arising from subjective, manual analyses ( fig. S6). The binding of L-tryptophan (Trp) to bovine serum albumin (BSA, Fig. 1D) is a long-standing benchmark (25) because of the supposed role of hydrophobicity in the plasticity of this interaction as well as a lack of corresponding fully determined, unambiguous three-dimensional (3D; e.g., crystal) structures. This is also a simpler amino acidprotein interaction system (less-modified protein, small ligand) that classical NMR/STD methods are perceived (24,32) to have already delineated well.
As for a standard STD experiment, 1D 1 H-NMR spectra were determined for both ligand and protein. In addition, mixed spectra containing both protein and excess ligand (P + L) were determined with and without excitation irradiation at frequencies corresponding to prominent resonance within the protein but far from any ligand (pulse "on") or where the center of the pulse was moved to avoid ligand and protein (pulse "off," labeled "1D" in the figures). Deconvolved spectra for ligand determined in the presence of protein were matched with high accuracy by uSTA (Fig. 1E). Moreover, uSTA generated highly consistent binding "heatmaps" comprising atom-specific magnetization transfer efficiencies (proton data mapped onto heteroatoms by taking a local 1/r 6 average to enable visual comparison) that described the pose of ligand bound to protein ( Fig. 1F; see also figs. S8 and S11). These were determined over a range of ligand concentrations even as low as 40 mM [ Fig. 1, E(iii) and F(iii)] where the ability of uSTA to extract accurate signal proved unprecedented and critical to quantitation of binding (see below). Binding maps were strikingly consistent across concentrations, indicating a single, consistent pose driven by the strongest interaction of protein with the heteroaromatic indole side chain of Trp. This not only proved consistent with x-ray crystal structures of BSA with other hydrophobic ligands, (33) it also revealed quantitative subtleties of this interaction at high precision: Protein "grip" is felt more at the distal edge of the indole moiety.
Next, with indications of expanded capability of uSTA in a benchmark system, we moved to first analyses of sugar-protein interactions. Sugar ligand trehalose (Tre) binds only weakly to trehalose repressor protein TreR and so proves challenging in ligand-to-protein interaction analysis (34). Nonetheless, the uSTA workflow again successfully and rapidly determined atom-specific transfer efficiencies with high precision and resolution ( fig. S3F). Atomprecise subtleties were revealed in this case as well: Hotspots of binding occur around OH-3/ OH-4 and graduate to reduced binding around both sugar rings, with only minimal binding of the primary OH-6 hydroxyl ( fig. S3F). Once more, this uSTA-mapped P + L interaction proved consistent with prior x-ray crystal structures (35).
Direct determination of ligand-protein K D using uSTA The precision of signal determination in uSTA critically allowed variation of ligand/protein concentrations even down to low levels (see above), enabling direct determination of binding constants in a manner not possible by classical methods. Following measurement of magnetization transfer between ligand and protein, variation with concentration ( Fig. 1E) was quantitatively analyzed using modified Bloch-McConnell equations (36) (see Methods). These accounted for intrinsic relaxation, crossrelaxation, and protein-ligand binding (Fig.  1A) to directly provide measurements of equilibrium binding K D and associated kinetics (k ex ). In the Trp/BSA system, this readily revealed K D = 38 ± 15 mM, k on = 1.6 (± 0.6) × 10 5 M -1 s -1 , and k off = 6.0 ± 2.0 s -1 (Fig. 1G), consistent with prior determinations of K D by other solution-phase methods [K D = 30 ± 9 mM by isothermal calorimetry (32)]. Note that this direct method proved to be possible only because of the ability of the uSTA method to deconvolute a true signal with sufficient precision, even at the lower concentrations used and consequently lower signal (Fig. 1E). Thus, uSTA enabled atom-mapping and quantitation for ligand binding that were improved over previous methods. Critically, these values were fully consistent with all observed NMR data and independently obtained measures of K D .

uSTA allows interrogation of designed crypticity in influenza HA virus attachment
Having validated the uSTA methodology, we next used it to interrogate sugar binding by viral attachment protein systems that have proved typically intractable to classical methods. The hemagglutinin (HA) trimer of influenza (A to C) Schematic of the process for uSTA that exploits comprehensive numerical analysis of relaxation and ligand binding kinetics (A) using full and automatically quantified signal intensities in NMR spectra (B) and calculates per-resonance transfer efficiencies (C). In (B), signal analysis determined the number of peaks that can give rise to the signal, and returned simulated spectrum by convolving these with peak shape function. The precise peak positions returned are in excellent agreement with the known positions of resonances identified using conventional means. Magnetization transfer NMR experiments compare two 1D NMR spectra, where the second involves a specific saturation pulse that aims to "hit" the protein but "miss" the ligand in its excitation. This is accomplished by acquiring the 1D spectrum with the saturation pulse held off resonance at -35 ppm such that it will not excite protons in either ligand or protein [labeled "1D" in (B) and (C)]. The uSTA method requires these two spectra to be analyzed as described in (B), in pairs, one that contains the raw signal, and the second that is the difference between the two. We define the "transfer efficiency" as the fractional signal that has passed from the ligand to the protein. (D to F) Application of uSTA to study the interaction between bovine serum albumin (BSA) and L-tryptophan (Trp). In (D), the 1D 1 H-NMR spectrum of the mixture at 200 mM Trp and 5 mM BSA (= P + L, blue) is dominated by ligand, yet the ligand (L) and protein (P) can still be deconvolved by universal deconvolution, using a reference obtained from a sample containing protein only. This reveals contributions from individual multiplets originating from the ligand (yellow) and the protein-only baseline (black), allowing precise recapitulation of the sum (red). In (E), application of universal deconvolution to STD spectra with varying concentrations of tryptophan allows uSTA using ligand resonances identified in (D). This in turn allows signal intensity in the STD spectrum (P + L STD, light blue) to be determined with high precision. Although signal-to-noise in the STD increases considerably with increasing ligand concentration, the measured atom-specific transfer efficiencies as determined by uSTA are consistent [(F), left, bar charts; right, transfer efficiency binding "maps"], showing that the primary contact between protein and ligand occurs on the distal edge (C-1, 2, 3, 4; N-7 and C-9 using the numbering shown) of the indole aromatic ring. Application of the same uSTA workflow also allowed precise determination of even weakly binding sugar ligand trehalose (Glc-a1, 1a-Glc) to E. coli trehalose repressor TreR. Again, uSTA allows determination of transfer efficiencies with atom-specific precision (see fig. S3F). (G) Quantitative analysis of the STD build-up curves using a modified set of Bloch-McConnell equations that account for binding and cross-relaxation allows us to determine thermodynamic and kinetic parameters that describe the BSA-Trp interaction, K D , k on , and k off . The values obtained are indicated and are in excellent accord with those obtained by other methods (25,32). Errors come from a bootstrapping procedure (see Methods).
A virus is known to be essential for its exploitation of sialoside binding (37); H1N1 has emerged as one of the most threatening variants in recent years. We took the H1 HA in both native form and a modified form, containing a non-natural sequon specific for N-glycosylation that was previously designed (38) to block intermolecular (in trans) sugar binding. This designed blocking in a so-called HA-DRBS variant (38) also notably creates an additional glycan beyond the existing, potentially confounding, glycosylation background. It therefore provided another test of uSTA's ability to delineate relevant sialoside ligand interactions in another important pathogen protein ( Fig. 2A). Despite this intended blocking, the precision of uSTA was such that residual in trans binding of sialotrisaccharide 2 could still be detected in HA-DRBS, albeit at a lower, modulated level [as expected by design (38)]. Although H1 HA is known to bind both sialo-trisaccharides 2 and 3, 2 is the less preferred (2,6-over 2,3-linked) ligand (39), and yet its binding could still be mapped (here, to a mode mediated primarily by the sialoside moiety). In this way, the sensitivity of uSTA to detect even lower sialoside binding to relevant proteins was confirmed.
uSTA reveals natural, cryptic sialoside binding by SARS-CoV-2 spike We next probed putative, naturally cryptic sialoside binding sites in SARS-CoV-2. Our analysis of the 1D protein 1 H-NMR spectrum of the purified prefusion-stabilized ectodomain construct (10) of intact trimeric SARS-CoV-2 spike attachment protein ( fig. S7) revealed extensive protein glycosylation with sufficient mobility to generate a strong 1 H-NMR resonance in the region 3.4 to 4.0 ppm (Fig. 3A). Although lacking detail, these resonances displayed chemical shifts consistent with the described mixed patterns of oligomannose, hybrid, and complex N-glycosylation found on SARS-CoV-2 spike after expression in human cells (40). As such, these mobile glycans on SARS-CoV-2 spike contain sialoside glycan residues that not only confound analyses by classical NMR methods but are also potential competing, "internal" (in cis) ligands for any putative attachment (in trans) interactions, as well as possible direct ligands for in trans interactions in their own right (41). Therefore, their presence in the protein NMR analysis presented clear confounding issues for typical classical STD analyses. As such, SARS-CoV-2 spike represented a stringent and important test of the uSTA method.
We used uSTA to evaluate a representative panel of both natural and site-specifically modified unnatural sialosides as possible ligands of spike ( Fig. 3 and figs. S3 and S12). Use of classical methods provided an ambiguous assessment ( fig. S8), but use of uSTA immediately revealed binding and nonbinding sugar ligands (Fig. 3, figs. S8, S10, and S11, and table S7). Initially, the simplest sialoside, N-acetylneuraminic acid (1), was tested as a mixture of its mutarotating anomers (1a ⇔ 1b). When Having confirmed simple, selective monosaccharide a-sialoside binding, we explored extended a-sialoside oligosaccharide ligands (compounds 2 and 3; Fig. 3, C and D) that would give further insight into the binding of natural endogenous human cell-surface sugars as well as unnatural variants (compounds 4 to 6; Fig. 3, F and G) that could potentially interrupt such binding. Sialosides are often found appended to galactosyl (Gal/GalNAc) residues in either a2,3-linked (2) or a2,6linked form (3). Both were tested (Fig. 3, C and D) and exhibited "end-on" binding consistent with that seen for N-acetyl-neuraminic acid (1) alone, but with more extended binding surfaces (Fig. 3D), qualitatively suggesting a stronger binding affinity (see below for quantitative analysis). Common features of all sialoside binding modes were observed: The NHAc-5 4 of 17  (38) DRBS variant is generated by the creation of an N-linked glycosylation site via the creation of needed sequon NQT from wild-type NQR by the R205T point mutation in HA adjacent to the sialic acid-binding site. In this way, disruption of sialic acid binding through designed "blocking" in HA-DRBS is intended to ablate the binding of HA wild-type to sialosides in this synthetic variant. (B and C) Notably, wild-type and DRBS variants of H1N1 HA in fact show a remarkably similar overall binding pattern (B) for the 2,3-sialo-trisaccharide 2 (focused through engagement with the sialoside) but a significant intensity moderation (C) for the DRBS variant, indicative of a partial (but not complete) loss of binding consistent with design (38). (D) Raw spectral data demonstrate that atom-specific differences in intensity in the 1D versus the difference spectra can be discerned using uSTA. Note that atom numbering shown and used here is generated automatically by uSTA.
acetamide of the terminal sialic acid (Sia) is a binding hotspot in 1, 2, and 3 that drives the "end-on" binding. Differences were also observed: The a2,6-trisaccharide (3) displayed a more extended binding face yet with less intense binding hotspots (Fig. 3D) engaging additionally the side-chain glycerol moiety (C7-C9) of the terminal Sia acid as well as the OH-4 C4 hydroxyl of the Gal residue. The interaction with a2,3-trisaccharide (2) was tighter and more specific to NHAc-5 of the Sia.
These interactions of the glycerol C7-C9 side chain detected by uSTA were probed further through construction (fig. S12) of unnatural modified variants (4 to 6; Fig. 3, F and G). Replacement of the OH-9 hydroxyl group of sialic acid with azide N 3 -9 in 4 [ Fig. 3F(iii) and table S7] was well tolerated, but larger changes (replacement with aromatic group biphenylcarboxamide BPC-9) in 5 and 6 led to an apparently abrupt shift in binding mode that was instead dominated by the unnatural hydrophobic aromatic modification ( Fig. 3G and Fig. 4C). As for native sugar 1, azide-modified sugar 4 also interacted with spike in a stereochemically specific manner with only the a-anomer . uSTA allowed precise dissection of interaction contributions in these unusual hybrid (natural-unnatural) sugar ligands that could not have been determined using classical methods (see text S5).
Using variable concentrations of the most potent natural ligand a2,3-trisaccharide 2 [6 mM spike, 2 at 60 mM, 200 mM, 1 mM, and 2 mM excitation at 5.3 ppm] and variable concentrations of spike protein, we used the uSTA method to directly determine solution-phase affinities ( Fig. 4A): K D = 32 ± 12 mM, k on = 6300 ± 2300 M -1 s -1 , and k off = 0.20 ± 0.08 s -1 . We also probed binding in a different mode by measuring the affinity of spike to 2 when displayed on a modified surface (fig. S13) using surface plasmon resonance (SPR) analysis. The latter generated a corresponding K D = 23.7 ± 3.6 mM (k on = 1004 ± 290 M -1 s -1 ). Such similar values for sialoside ligand in solution (by uSTA) or when displayed at a solid-solution interface (by SPR) suggested no substantial avidity gain from display of multiple sugars on a surface.
Structural insights from uSTA delineate binding to SARS-CoV-2 spike uSTA analyses consistently identified binding hotspots in sugars 1 to 4 providing the highest transfer efficiencies in an atom-specific manner, particular the "end" NHAc-5 acetamide methyl group of the tip sialic acid residue in all. A combination of uSTA with so-called high ambiguity-driven docking (HADDOCK) methods (42,43) was then used to probe likely regions in SARS-CoV-2 spike for this "end-on" binding mode via uSTA data-driven atomistic Buchanan   . The uSTA process of ligand peak assignment and deconvolution → P + L peak assignment and deconvolution → application to P + L STD yields precise atom-specific transfer efficiencies ( fig. S6). Note how in (ii) individual multiplet components, have been assigned (yellow); the back-calculated deconvolved spectrum (red) is an extremely close match for the raw data (purple). The spectrum is a complex superposition of the ligand spectrum (and protein only yet uSTA again accurately deconvolves the spectrum, revealing the contribution of protein-only (black) and the ligand peaks (yellow). Using these data, uSTA analysis of the STD spectrum pinpoints ligand peaks and signal intensities. Spectral atom numbering shown and used here is generated automatically by uSTA; all other numbering in sugars follows carbohydrate nomenclature convention. . Spike shows strong binding preference for the a anomers [F(i, iii) versus F(ii, iv)] despite this strong population difference. Binding surfaces were also highly similar to those of extended trisaccharides 2 and 3.
(G) Using these intensities, atom-specific transfer efficiencies can be determined with high precision, shown here for hybrid sialoside 5. The details of both the unnatural BPC moiety and the natural sialic acid moiety can be mapped; although the unnatural aromatic BPC dominates interaction, uSTA nonetheless delineates the subtleties of the associated contributions from the natural sugar moiety in this ligand (see also figs. S5 and S6). . The lower ligand concentrations yield data that are of lower sensitivity than higher concentrations. The transfer efficiencies will be higher in this case, as more molecules are effectively involved in the binding. Thus, data at lower concentrations will in general have more scatter and higher transfer efficiencies. These data points are desirable for the analysis, as it is here where we expected the greatest variation of transfer efficiency with ligand concentration. The analysis is applied globally and so the uncertainties in the final fitted parameters from the bootstrapping analysis (see Methods) provide a direct and confident measure of the goodness of fit. (B) Normalized uSTA transfer efficiencies of the NAc-5 methyl protons can be determined for each ligand studied here. This allowed relative contributions to "end on" binding to be assessed via uSTA in a "modespecific" manner. This confirmed strong aover b-sialoside selectivity. Errors were determined through a bootstrapping procedure where mixing times were sampled with replacement, allowing for the construction of histograms of values in the various parameters that robustly reflect their fitting errors. models. In each case, a cluster of likely poses emerged (Fig. 4D) for 1a, 2, and 3 (see fig. S14 for details) consistent with "end-on" binding where the acetamide NHAc-5 methyl group of the sialic acid moiety was held by the unusual b sheet-rich region of the NTD of SARS-CoV-2 spike. Under the restraints of uSTA and homology, a glycan-binding pocket was delineated by a triad of residues (Phe 79 , Thr 259 , and Leu 18 ) mediating aromatic, carbonyl hydrogen-bonding, and hydrophobic interactions, respectively. However, the sequence and structural homology to prior (i.e., MERS) coronavirus spike proteins in this predicted region was low; the MERS spike protein uses a corresponding NHAc-binding pocket characterized by an aromatic (Phe 39 ), hydrogen-bonding (Asp 36 ), and hydrophobic (Ile 132 ) triad to bind the modified sugar 9-O-acetyl-sialic acid (5). SARS-CoV-2 glycan attachment mechanisms have to date only identified a role for spike RBD in binding rather than NTD (15,17). We used uSTA to compare the relative potency of the sialoside binding identified here to previously identified (17) heparin binding motifs (Fig. 5, A  and B). Heparin sugars 7 and 8 of similar size to natural sialosides 3 and 4 were selected so as to allow a near ligand-for-ligand comparison based on similar potential binding surface areas. 7 and 8 also differed from each other only at a single glycan residue (residue 2) site to allow possible dissection of subtle contributions to binding. Unlike the "end-on" binding seen for sialosides 3 and 4, uSTA revealed an extended, nonlocalized binding interface for 7 and 8 consistent instead with "side-on" binding [ Fig. 5, A(ii) and B(ii)].
Finally, to explore the possible role of sialoside binding in relation to ACE2 binding, we also used uSTA to probe the effects upon binding of the addition of a known, potent neutralizing antibody of ACE2-spike binding, C5 (Fig.  5, D and E, and fig. S16) (45,46). Assessment of binding to sialoside 2 in the presence and absence of antibody at a concentration sufficient to saturate the RBD led to only slight reduction in binding. Uniformly modulated atomic transfer efficiencies and near-identical binding maps (Fig. 5, E and F) were consistent with a maintained sialoside-binding pocket with undisrupted topology and mode of binding.
Together, these findings allow us to conclude that the sialoside binding observed with uSTA involves a previously unidentified "endon" mechanism/mode that operates in addition to and potentially cooperatively with Buchanan   ACE2 binding in SARS-CoV-2. The primary sialoside glycan-binding site SARS-CoV-2 spike is distinct from that of heparin ("end-on" versus "side-on"), not in the RBD (not neutralized by RBD-binding antibody), and found instead in an unusual NTD region that has become altered in emergent variants (loss of binding in alpha and beta variants of concern). Cryo-EM pinpoints the sialoside-binding site in B-origin-lineage spike Structural analysis of the possible binding of sialosides has been hampered to date by the moderate resolution, typically less than 3 Å, of most SARS-CoV-2 spike structures. Currently deposited cryo-EM-derived coulombic maps show that the NTD of spike is often the least well-resolved region. In our initial attempts with native protein, large stretches of amino acids within the NTD were not experimentally located (45); the most disordered regions occur in the NTD regions that contribute to the surface of the spike. A stabilized closed mutant form (47) of the spike was examined and gave improvements, but the coulombic map was still too weak and noisy to permit tracing of many of the loops in the NTD. However, with a reported fatty acidbound form of the spike (48), which has shown prior improved definition of the NTD, we were able to collect a 2.3 Å dataset in the presence of the a2,3-sialo-trisaccharide 9. The map was clear for almost the entire structure including the previously identified linoleic acid (fig. S17D); only 13 N-terminal residues and two loops (residues 618 to 632 and 676 to 689) were not located. Although the density is weaker at the outer surface of the NTD than at the core of the structure ( fig. S17B), the map was of sufficient quality to model N-glycosylation at site Asn 149 , which is in a flexible region, and the fucosylation state of N-linked glycans at Asn 165 (Fig. 6A). We observed density in a pocket at the surface of the NTD lined by residues His 69 , Tyr 145 , Trp 152 , Gln 183 , Leu 249 , and Thr 259 (Fig. 2B and  fig. S17). This density is absent in other spike structures of higher than 2.7 Å resolution (PDB IDs 7jji, 7a4n, 7dwy, 6x29, 6zge, 6xlu, 7n8h, 6zb5, and 7lxy), even those (such as PDB IDs 7jji, 6zb5, and 6zge) that have a well-ordered NTD. The density when contoured at 2.6s is fitted by an a-sialoside consistent with the terminal residue of 9, with the distinctive glycerol and N-acetyl groups clear. To further strengthen our confidence in the identification of the sialic acid, we determined a native (unsoaked) structure to 2.4 Å using the same batch of protein ( fig. S17D). This structure showed no density in the sialic acid binding site supporting our assignment. The sialoside was therefore included in the refinement and the thermal factors (108 Å 2 ) were comparable to those for the adjacent protein residues (95 to 108 Å 2 ). In this position, the glycerol moiety makes a hydrogen bond with the side chain of Tyr 145 , the N-acetyl group with Ser 247 , and the carboxylate with Gln 183 , and there are hydrophobic interactions with Trp 152 . Lowering the map threshold to 1.6s would be consistent with a second pyranoside (e.g., galactoside) residue ( fig. S17C). At this contour level, the map clearly covers the axially configured carboxylate of the sialic acid ( fig. S17). The middle galactoside residue of 9 positioned in this density would make contacts with Arg 248 and Leu 249 .
Structural superposition of the NTD with that of the MERS spike (RCSB 6NZK) shows that although the sialic acid-binding pockets of both are on the outer surface, these pockets are 12 Å apart (as judged by the C2 atom of respective sialic acids; Fig. 6D). In MERS spike, sialic acid is bound at the edge of the central b sheet, whereas in SARS-CoV-2 the sugar is bound at the center of the sheet; thus, the pockets use different elements of secondary structure. Because of distinct changes in the structure of the loops connecting the strands, the sialic acid pocket from one protein is not present in the other protein. Several regions of additional density were not fitted by the model ( fig. S18).

Disclosure of sialoside trisaccharide as a ligand for B-origin-lineage SARS-CoV-2 correlates with clinical genetic variation in early-phase pandemic
A distinctive mode of sialoside binding by spike confirms a potential attachment point for SARS-CoV-2 found commonly on cell surfaces (sialosides are attached both as glycolipid and glycoprotein glycoconjugates), thus raising the question of whether glycosylation function in humans affects infection by SARS-CoV-2 and hence the presentation and pathology of COVID-19 disease. Analysis of whole-exome sequencing data of an early 2020 cohort of 533 COVID-19-positive patients (see table S1) identified two glycan-associated genes within the top five that were most influential upon disease severity. Specifically, recursive feature elimination applied to a LASSO (least absolute shrinkage and selection operator)-based (49) logistic regression model identified LGALS3BP (fourth of >18,000 analyzed genes) and B3GNT8 (fifth of >18,000) ( Fig. 7A and fig. S19). Variants in these two genes were beneficially associated with less severe disease outcome (Fig. 7, B and C; see also tables S2 to S6 for specific B3GNT8 and LGALS3BP genetic variants, B3GNT8 c 2 five categories, B3GNT8 c 2 2×2, LGALS3BP c 2 five categories, LGALS3BP c 2 2×2, respectively).

Discussion
Experimentally, there are still few, if any, orthogonal approaches to the useful surface display methods (e.g., "glycoarrays") currently used for readily surveying ligands that might be exploited by pathogens. Following validation in model ligand-protein systems, uSTA provided a ready method for identifying sugar ligands bound by pathogens, as well as their binding parameters and poses, even in posttranslationally modified (e.g., glycosylated) protein systems.
In an influenza virus HA protein variant designed to abolish binding through competition by an added glycan site on HA (55), uSTA was nonetheless able to unambiguously reveal and "map" residual sialoside binding despite the presence of an added protein-linked glycan as "internal blocker." This is a protein type that has been well-studied in array formats (39); we showed here that, even with a "blocked" HA (in a glycosylated state) and a non-preferred 2,3-sialoside ligand, binding could still be mapped by uSTA.
In the spike trimer of the B-origin lineage of SARS-CoV-2, despite the presence of mobile, protein-linked glycans, uSTA clearly revealed sialoside binding and, through mapping, revealed that this binding is more potent when the sialosyl moiety terminates galactosyl oligosaccharides. This pose is in agreement with our cryo-EM structures, which show that the NHAc-5 N-acetyl group at the sugar's tip is buried, a mode of binding we refer to as "endon." Prior modeling was partly misled by use of lower-resolution structures of spike, because the NTD is highly disordered in these initial structures. uSTA and cryo-EM also identified a second, differing mode of binding in SARS-CoV-2 spike only by hybrid, aromatic sugars (e.g., 5, 6, 9) driven by aromatic engagement, but the physiological relevance of this binding pocket is currently unclear. We cannot exclude the presence of another sialic acid-binding site (15), but there is no structural support for it.
The spike sialoside-binding site in the NTD is also coincident or near to numerous mutational and deletion hotspots in, for example, alpha and omicron (His 69 , Val 70 , Tyr 145 ) and beta (b strand Leu 241 -Leu 242 -Ala 243 -Leu 244 ) variants of concern (Fig. 5C). Changes remove either key interacting residues (alpha/omicron variant: residues 69, 70, 145) or perturb structurally important residues (beta variant: b strand) that form the pocket. The "end-on" binding we observed is quite different to the "side-on" binding observed for heparin (17). Heparins are often bound in a non-sequencespecific, charge-mediated manner, consistent with such a "side-on" mode. The location of three binding sites in the trimer, essentially at the extreme edges of the spike (Fig. 6C), also imposes substantial geometric constraints for avidity enhancement through multivalency (56,57).
We also find a clear link between our data and genetic analyses of patients that correlate with the severity of their disease. This association suggests potential roles in infection and disease progression for cell-surface glycans and the two glycan-associated genes that we have identified. Despite their independent identification here, both gene products interact around a common glycan motif: the poly-LacNAc chain-extension variants found in tetraantennary N-linked glycoproteins. Consistent with the sialoside ligands found here, these glycoproteins contain Sia-Gal-GlcNAc motifs within N-linked polyLacNAc chains. These motifs have recently been identified in the deeper human lung (58).
These data lead us to suggest that B-originlineage SARS-CoV-2 virus may have exploited glycan-mediated attachment to host cells (Fig.  7D) using N-linked polyLacNAc chains as a foothold. Reduction of Gal-3-BP function would allow its target, the lectin Gal-3, to bind more effectively to N-linked polyLacNAc chains, thereby competing with SARS-CoV-2 virus. Similarly, loss of b3GnT8 function would ablate the production of foothold N-linked poly-LacNAc chains, directly denying the virus a foothold. We cannot exclude other possible mechanisms including, for example, the role of N-linked polyLacNAc chains in T cell regulation (59) or glycolipid ligands (15). This analysis of the influence of genetic variation upon susceptibility to virus was confined to "first wave," early-pandemic patients infected with B-origin-lineage SARS-CoV-2. Our discovery here also that in B-lineage virus such Values were interpolated in a 1/r 6 manner onto heteroatoms following the same procedure according to the described NMR methods, and the color bar was scaled from zero to the maximum value. binding to certain sialosides was ablated in later phases of the pandemic (after September 2020) in variants of concern further highlights the dynamic role that sugar binding may play in virus evolution and may be linked, as has been previously suggested for H5N1 influenza A virus, to the "switching" of sugar-binding preferences by pathogens during or after zoonotic transitions (1). The focused "end-on" binding of the N-acetyl group in the N-acetyl-sialosides, which are found as the biosynthetically exclusive form of sialosides in humans (60), might have been a contributing factor in driving zoonosis. Finally, our data also raise the question of why binding might be ablated in later variants of SARS-CoV-2. Again by comparison with influenza, which uses neuraminidases for the purpose of "release" when budding from a host cell (61), we speculate that in the absence of its own encoded neuraminidase, SARS-CoV-2 must walk a tight balance between the ability to bind human host glycans (potentially useful in a zoonotic leap) and cell-tocell transmission (where release could become rate-limiting). One answer to this problem would be to ablate N-glycan binding via the sialoside motif subsequent to a successful zoonotic leap. This solution also has the advantage of removing a potential site for antibody neutralization for an interaction that might prove pivotal or critical in the context of zoonosis as a potentially global driver of virus fitness. Our combined data and models may therefore support decades-old hypotheses (20) proposing the benefit of cryptic sugar binding by pathogens that may be "switched on and off" to drive fitness in a different manner (e.g., in virulence or zoonosis) as needed.

Protein expression and purification: SARS-CoV-2 spike
The templates for wild type, alpha, and beta spike were kindly provided by P. Supasa and G. Screaton (University of Oxford). The gene encoding amino acids 1 to 1208 of the wild type, alpha, and beta SARS-CoV-2 spike glycoprotein ectodomain [with mutations of RRAR → GSAS at residues 682-685 (the furin cleavage site) and KV → PP at residues 986-987, as well as inclusion of a T4 fibritin trimerization domain] was cloned into the pOPINTTGneo-BAP vector using the forward primer (5′-GTCCAAG-TTTATACTGAATTCCTCAAGCAGGCCACCAT-GTTCGTGTTCCTGGTGCTG-3′) and the reverse primer (5′-GTCATTCAGCAAGCTTAAAAAGG-TAGAAAGTAATAC-3′), resulting in an aviTag/ Bap sequence plus 6His in the 3′ terminus of the construct. The template for B-lineageorigin (wild type) spike is previously described (62). The templates for the alpha and beta spike are in the supplementary materials.
For B-lineage-origin, alpha, and beta spike, Expi293 cells (Thermofisher Scientific) were used to express the Spike-Bap protein. The cells were cultured in Expi293 expression media (Thermofisher Scientific) and were transfected using PEI MAX 40kDa (Polyscience) if cells were >95% viable and had reached a density of 1.5 × 10 6 to 2 × 10 6 cells per ml. Following transfection, cells were cultured at 37°C and 5% CO 2 at 120 rpm for 17 hours. Enhancers (6 mM valproic acid, 6.5 mM sodium propionate, 50 mM glucose, all from Sigma) were then added and protein was expressed at 30°C for 5 days before purification.
For the purification of wild type, alpha, beta, delta, and omicron spike, the medium in which the spike protein was secreted was supplemented with 1× PBS buffer at pH 7.4 (1:1 v/v) and 5 mM NiSO 4 . The pH was adjusted with NaOH to pH 7.4 and filtered using a 0.8-mm filter. The mixture was stirred at 150 rpm for 2 hours at room temperature. The spike protein was purified on an Akta Express system (GE Healthcare) using a 5-ml His trap FF GE Healthcare column in PBS, 40 mM imidazole, pH 7.4, and eluted in PBS, 300 mM imidazole, pH 7.4. The protein was then injected onto either a Superdex 200 16/600 or 10/300 gel filtration column (GE Healthcare) in deuterated PBS buffer, pH 7.4. The eluted protein was concentrated using an Amicon Ultra-4 100kDa concentrator at 2000 rpm, 16°C (prewashed multiple times with deuterated PBS) to a concentration of roughly 1 mg/ml.

Errors
The errors in the transfer efficiencies were estimated using a bootstrapping procedure. Specifically sample STD spectra were assembled through taking random combinations with replacement of mixing times, and the analysis to Buchanan  identified, B3GNT8 and LGALS3BP produce gene products bGlcNAcT8 and Gal-3-BP, respectively, that manipulate and/or engage with processes associated with a common polyLacNAc-extended chain motif found on tetraantennary N-linked glycoproteins. A model emerges in which any associated loss of function from variance leads either to loss of polyLacNAc-extended chain (due to loss of initiation by bGlcNAcT8) or enhanced sequestration of by Gal-3 polyLacNAc-extended chain (which is antagonized by Gal-3-BP). Both would potentially lead to reduced access of virus spike to uSTA-identified motifs.
obtain the transfer efficiency was performed on each. This process was repeated 100 times to enable evaluation of the mean and standard deviation transfer efficiency for each residue. Mean values correspond well with the value from the original analysis, and so we take the standard deviation as our estimate in uncertainty, which further is in accord with values obtained from independent repeated measurements.

Protein NMR experiments
All NMR experiments in table S8 were conducted at 15°C on a Bruker AVANCE NEO 600 MHz spectrometer with CPRHe-QR-1H/19F/ 13C/15N-5mm-Z helium-cooled cryoprobe. Samples were stored in a Bruker SampleJet sample loader while not in magnet, at 4°C. 1D 1H NMR spectra with w5 water suppression were acquired using the Bruker pulse sequence zggpw5, using the smooth square Bruker shape SMSQ.10.100 for the pulsed-field gradients. The spectrum was centered on the water peak, and the receiver gain was adjusted. Typical acquisition parameters were sweep width of 9615.39 Hz, 16 scans per transient (NS), with four dummy scans, 32,768 complex points (TD), and a recycle delay (d1) of 1 s for a total acquisition time of 54 s. Reference 1D spectra of protein-only samples were acquired similarly with 16,384 scans per transient with a total acquisition time of 12.5 hours.
An STD experiment with excitation sculpted water suppression was developed from the Bruker pulse sequence stddiffesgp.2. The saturation was achieved using a concatenated series of 50-ms Gaussian-shaped pulses to achieve the desired total saturation time (d20). The shape of the pulses was specified by the Bruker shape file Gaus.1.1000, where the pulse is divided into 1000 steps and the standard deviation for the Gaussian shape is 165 steps. The field of the pulse was set to 200 Hz, which was calculated internally through scaling the power of the high-power 90°pulse.
The total relaxation delay was set to 5 s, during which the saturation pulse was applied. The data were acquired in an interleaved fashion, with each individual excitation frequency being repeated eight times (L4) until the total desired number of scans was achieved. Again, the spectrum was centered on the water peak, and the receiver gain was optimized. After recording of the free induction decay (FID), and prior to the recycle delay, a pair of waterselective pulses wee applied to destroy any unwanted magnetization. For all gradients (excitation sculpting and spoil), the duration was 3 ms using the smooth-square shape SMSQ10.100.
In a typical experiment, two excitation frequencies were required, one exciting protein, and one exciting far from the protein (+20,000 Hz, +33 ppm from the carrier). A range of mixing times were acquired to allow us to carefully quantify the buildup curve to obtain K D values. Off-and on-resonance spectra were acquired for 16 saturation times, giving a total acquisition time of 8.7 hours.
The experiment was acquired as a pseudo-3D experiment, with each spectrum being acquired at a chosen set of excitation frequencies and mixing times. Recycle delays were set to 10 s for BSA + tryptophan STDs, and were 5 s otherwise.
For STD 10-to 50-ms Gaussian experiments, the saturation times used were every other time from the default STD: 0.1 s, 0.5 s, 0.9 s, 1.3 s, 1.7 s, 2 s, 3 s, 4 s. List1: For STD var freq 1, the on-resonance frequencies in Hz relative to an offset of 2820. 61   Spectra were also acquired on a 600-MHz spectrometer with Bruker Avance III HD console and 5-mm TCI CryoProbe, running TopSpin 3.2.6, recorded in table S9, and a 950-MHz spectrometer with Bruker Avance III HD console and 5-mm TCI CryoProbe, running TopSpin 3.6.1, recorded in table S11. The 950-MHz spectrometer used a SampleJet sample changer. Samples were stored at 15°C. The parameters used for the STD experiments were the same as above, with the following varying by instrument: On the 600-MHz spectrometer, typical acquisition parameters were sweep width of 9615.39 Hz with typically 128 scans per transient (NS = 16 * L4 = 8), 32,768 complex points in the direct dimension and two dummy scans, executed prior to data acquisition.
On the 950-MHz spectrometer, typical acquisition parameters were sweep width of 15,243.90 Hz with typically 128 scans per transient (NS = 16 * L4 = 8), 32,768 complex points in the direct dimension and 2 dummy scans, executed prior to data acquisition.
uSTA data analysis NMR spectra with a range of excitation frequencies and mixing times were acquired on ligand-only, protein-only, and mixed protein/ ligand samples ( fig. S6).
To analyze an STD dataset, two projections were created by summing over all 1D spectra and summing over all corresponding STD spectra. These two projections provide exceptionally high signal-to-noise, suitable for detailed analysis and reliable peak detection. The UnidecNMR algorithm was first executed on the 1D "pulse off" spectra to identify peak positions and intensities. Having identified possible peak positions, the algorithm then analyzes the STD spectra but only allowing resonances in places already identified in the 1D spectrum. Both analyses are conducted using the protein-only baselines for accurate effective subtraction of the protein baseline without the need to use relaxation filters (fig. S8).
The ligand-only spectra were analyzed similarly and in each case, excellent agreement with the known assignments was obtained, providing us with confidence in the algorithm. The mixed protein/ligand spectrum was then analyzed, which returned results very similar to the ligand-only case. Contributions from the protein, although small, were typically evident in the spectra, justifying the explicit inclusion of the protein-only baseline during the analysis. When analyzing the mixture, we included the protein-only background as a peak shape whose contribution to the spectrum could be freely adjusted. In this way, the spectra of protein/ligand mixtures could be accurately and quickly deconvolved, with the identified ligand resonances occurring in precisely the positions expected from the ligand-only spectra. The results from the previous steps were then used to analyze the STD spectra. As these have much lower signal-to-noise, we fixed the ligand peak positions to be only those previously identified. Otherwise the protocol performed as described previously, where we used protein-only STD data to provide a baseline.
These analyses allow us to define a "transfer efficiency," which is simply the ratio of the signal from a given multiplet in the STD spectrum to the total expected in the raw 1D experiment. To obtain "per atom" transfer efficiencies, signals from the various pre-assumed components on the multiplets from each resonance were first summed before calculating the ratio. In the software, this is achieved by manually annotating the initial peak list using information obtained from independent-assignment experiments (see figs. S21 and S22).
Over the course of the project, it became clear that subtracting the transfer efficiencies obtained from a ligand-only sample was an essential part of the method (figs. S9 and S10).
Depending on the precise relationship among the chemical shift of excitation, the location of the ligand peaks, and the excitation profile of the Gaussian train, we observed small apparent STD transfer in the ligand-only sample that cannot be attributed to ligand binding, arising from a small residual excitation of ligand protons, followed by internal cross-relaxation. It is likely that this excitation occurs at least in part via resonances of the ligand that are exchange-broadened, such as OH protons, which are not directly observed in the spectrum. When exciting far from the protein, zero ligand excitation is observed, as we would expect, but when exciting close to the methyls, or in the aromatic region, residual ligand excitation could be detected in ligand-only samples (figs. S9 and S10). Without the ligand-only correction, the uSTA surface may appear to be highly dependent on choice of excitation frequency. However, with the ligand correction, the relative uSTA profiles become invariant with excitation frequency. In general, therefore, we advise acquiring these routinely, and so the uSTA analysis assumes the presence of these data (figs. S6 and S8). The invariance of relative transfer efficiency with excitation frequency suggests that the internal evolution of magnetization within the protein during saturation (likely on the micro-/millisecond time scales) is much faster than the effective cross-relaxation rate between protein and ligand (occurring on the seconds time scale).
Having identified the relevant resonances of interest and performed both a protein and residual ligand subtraction, we reanalyzed the spectra without first summing over the different mixing times, in order to develop the quantitative atom-specific build-up curves. These were quantitatively analyzed as described below to obtain K D and k off rates. The values we obtain performing this analysis on BSA/Trp closely match those measured by ITC, and the values we measure for ligand 2 and Spike are in good agreement with those measured by SPR as described in the text.
The coverage of protons over the ligands studied here was variable; for example, there are no protons on carboxyl groups. To enable a complete surface to be rendered, the transfer efficiencies for each proton were calculated as described above, and the value is then transferred to the adjacent heteroatom. For heteroatoms not connected to an observed proton, a 1/r 6 weighted average score was calculated. This approach allows us to define a unique surface. Caution should be exercised when quantitatively interpreting such surfaces where there are no direct measurements of the heteroatom.
In practice, raw unformatted FIDs are submitted to the uSTA pipeline, and the various steps described are performed largely automatically, where a user needs to manually adjust processing settings such as phasing and choosing which regions to focus on, iteratively adjust the peak shape to get a good match between the final reconvolved spectrum and the raw data, and input manual atomic assignments for each observed multiplet. The uSTA pipeline then provides a user with a report that shows the results of the various stages of analysis, and uses pymol to render the surfaces. The final transfer efficiencies delivered by the program can be combined with a folder containing a series of HADDOCK models to provide final structural models (Fig. 5).

Quantitative analysis via uSTA
In principle, a complete description of the saturation transfer experiment can be achieved via the Bloch-McConnell equations. If we can set up a density matrix describing all the spins in the system, their interactions, and their rates of chemical change in an evolution matrix R, then we can follow the system with time according to: The challenge comes from the number of spins that must be included and the need to accurately describe all the interactions between them, which will need to also include how these are modulated by molecular motions in order to get an accurate description of the relaxation processes. This is illustrated by the CORCEMA method (64) that takes a static structure of a protein/ligand complex and estimates STD transfers. The CORCEMA calculations performed to arrive at cross-relaxation rates assume the complex is rigid, which is often a poor approximation for a protein, and because of the large number of spins involved, the calculation is sufficiently intensive such that this calculation cannot be routinely used to fit to experimental data. It would be very desirable to extract quantitative structural parameters, as well as chemical properties such as interaction strengths and association/dissociation rates, directly from STD data. In what follows, we develop a simple quantitative model for the STD experiment to achieve this goal. We will treat the system as comprising just two spins, one to represent the ligand and one to represent the protein, and we allow the two spins to exist either in isolation or in a bound state. We can safely neglect scalar coupling and so we only need to allow the x, y, and z basis operators for each spin, together with an identity operator to ensure that the system returns to thermal equilibrium at long times. As such, our evolution matrix R will be a square matrix with 13 × 13 elements.
For the spin part, our model requires us to consider the chemical shift of the ligand in the free and bound states, and the chemical shifts of the protein in the free and bound states. In practice however, it is sufficient to set the free protein state on resonance with the pulse, and the free ligand chemical shift is set to a value that matches experiment.
The longitudinal and transverse relaxation rates are calculated for the free and bound states using a simple model assuming in each state there are two dipole-coupled spins separated by a distance R with similar Larmor frequency. In addition, cross-relaxation between ligand and protein is allowed only when the two are bound. The relaxation rates are characterized by an effective distance and an effective correlation time, which are each parameterized in terms of an interaction constant (depending on effective distance) and a spectral density function (depending on effective distance) The longitudinal and transverse relaxation rates R 1 and R 2 describe auto-relaxation of diagonal z elements and xy elements, respectively. The cross-relaxation rates s describe cross-relaxation and couple z elements between the ligand and protein in the bound state. We ensure that the system returns to equilibrium at long times by adding elements of the form R 1 M 0 or sM 0 linking the identify element and the z matrix elements. Overall, the relaxation part of the model is parameterized by two correlation times, one for the ligand and one for the protein/ complex, and three distances, one for the ligand auto-relaxation rates, one for protein autorelaxation rates, and one for the protein/ligand separation.
Finally, the chemical kinetics govern the rates at which the spins can interconvert. We will take a simple model where PL ⇆ P + L, whose dissociation constant is given by The free protein concentration can be determined from knowledge of the K D and the total ligand and protein concentrations: from which the bound protein concentration and the free and bound ligand concentrations can be easily calculated. The density matrix is initialized with the free and bound protein/ligand concentrations assigned to the relevant z operators. It was found to be important to additionally include a factor that accounts for the increased proton density within the protein. The saturation pulse is then applied either as a concatenated series of Gaussian pulses whose duration and peak power in Hz needs to be specified, exactly matching the pulse shapes and durations used in the experiment (see NMR methods above).
Build-up curves and transfer efficiencies can be easily simulated using this model and compared to data, and the various parameters can be optimized to fit to the data. In total, the model is characterized by eight parameters: K D , k off , the correlation times of the ligand and the protein, the three distances described above, and the proton density within the protein. There is substantial correlation between the effects of the various parameters, and care is needed using optimization to avoid local minima. By obtaining data at various protein and ligand concentrations, however, it is possible to break this degeneracy and obtain welldescribed values as in the text.
In practical terms, the initial rate of the buildup curve is predominantly affected by the crossrelaxation rate and the off rate, and the final height of the build-up curve is mostly influenced by the proton density in the protein and K D . Software to perform this analysis has been directly incorporated into the uSTA software.

Parameters fitted by the model
Overall the model is parameterized by a set of values that characterize the intrinsic and cross relaxation. From t G and r IS (ligand) we estimate R 1 and R 2 of the ligand; from t E and r IS (protein) we obtain R 1 and R 2 of the protein; and from t E and r IS (complex) we calculate the cross-relaxation rate. These values are combined with a factor that accounts for the larger number of spins present in the protein, "fac," and the on and off rates, to complete a set of eight parameters that specify our model. The distances should be considered "effective" values that parameterize the relaxation rates, although in principle it should be possible to obtain physical insights from their interpretation. The concentration-independent relaxation rates can be separated from the exchange rates by comparing the curves as a function of ligand and protein concentration. By treating the system as comprising two spins, we are effectively assuming that the cross-relaxation within the protein is very efficient. In the STD experiment, saturation pulses are applied for several seconds, which is sufficient for nearsaturating spin diffusion within a protein.
Because of the complexity of the model, optimization of the parameters via a gradient descent method can get stuck in local minima. In practical applications, it is advisable to start the optimization over a range of initial conditions, particularly in the rates, to ensure that the lowest possible c 2 is achieved.

SPR binding measurement assays
All experiments were performed on a Biacore T200 instrument. For the immobilization of SiaLac onto the sensor chip, a flow rate of 10 ml/min was used in a buffer solution of HBP-EP (0.01 M HEPES pH 7.4, 0.15 M NaCl, 3 mM EDTA, 0.005% v/v surfactant P20). A CM5 sensor chip (carboxymethylated dextran) was equilibrated with HBS-EP buffer at 20°C. The chip was activated by injecting a mixture of Nhydroxysuccinimide (50 mM) and EDC-HCl (200 mM) for 10 min followed by a 2-min wash step with buffer. Ethylenediamine (1 M in PBS) was then injected for 7 min followed by a 2-min wash step followed by ethanolamine-HCl (1 M, pH 8.5) for 10 min and then a further 1-min wash step. Finally, SiaLac-IME (5.6 mM in PBS) reagent 10 was injected over 10 min and a final 2-min wash step was performed (see fig. S13 and supplementary materials for further details).
For analysis of spike binding, a flow rate of 10 ml/min was used at 16°C. Serial dilutions of spike (0.19, 0.50, 1.36, and 3.68 mM) were injected for 30 s association and 150 s dissociation starting with the lowest concentration. Buffer-only runs were carried out before injection of spike and after the first two dilutions. BSA (3.03 mM in PBS) was used as a negative control, and a mouse serum in a 100-fold dilution was used as a positive control.

Analysis of SPR data
To analyze the SPR data, we assume an equilibrium of the form PL ⇆ P + L characterized by a dissociation constant To follow the kinetics of binding and dissociation, we assume that the SPR response G is proportional to the bound complex G = k[PL], which leads to the following kinetic equation: This can be solved when restrained by the total number of binding sites, L tot = [L] + [PL]. Under conditions of constant flow, we assume that the free protein concentration is constant, which leads to the following: And similarly, for dissociation where we take the concentration of free protein to be zero: The recovery of the chip was not complete after each protein concentration and wash step, as has been observed for shear-induced lectinligand binding with glycans immobilized onto a chip surface (65). Nonetheless, the data were well explained by a global analysis where the on and off rates were held to be identical for each replicate, but the value of k was allowed to vary slightly between runs, and an additional constant was introduced to G off to account for incomplete recovery of the SPR signal following standard approaches. Concentration of spike was insufficient to get the plateau region of the binding, and so the specific time values taken for the on rate affect the fitted values.

Modeling of the N-terminal domain of SARS-CoV-2 with glycans
We modeled the structure of the NTD on Protein Data Bank (PDB) entry 7c2l (19) because it provided much better coverage of the area of interest when compared to the majority of the templates available at PDB as of 15 July 2020. The models were created with Modeller (66), using the "automodel" protocol without refining the "loop." We generated 10 models and ranked them by their DOPE score (67), selecting the top five for ensemble docking.

Docking of 3′-sialyllactose to SARS-CoV-2 NTD
We docked 3′-sialyllactose to NTD with version 2.4 of the HADDOCK webserver (42,43). The binding site on NTD was defined by comparison with PDB entry 6q06 (5), a complex of MERS-CoV spike protein and 2,3-sialyl-Nacetyl-lactosamine. The binding site could not be directly mapped because of conformational differences between the NTDs of MERS-CoV and SARS-CoV-2, but by inspection a region with similar properties (aromatics, methyl groups, and positively charged residues) could be identified. We defined in HADDOCK the sialic acid as "active" and residues 18,19,20,21,22,68,76,77,78,79,244,254,255,256,258, and 259 of NTD as "passive," meaning the sialic acid needs to make contact with at least one of the NTD residues but there is no penalty if it doesn't contact all of them, thus allowing the compound to freely explore the binding pocket. Because only one restraint was used, we disabled the random removal of restraints. Following our small-molecule docking recommended settings (68), we skipped the "hot" parts of the semi-flexible simulated annealing protocol ("initiosteps" and "cool1_steps" set to 0) and also lowered the starting temperature of the last two substages to 500 and 300 K, respectively ("tadinit2_t" and "tadinit3_t" to 500 and 300, respectively). Clustering was performed based on "RMSD" with a distance cutoff of 2 Å, and the scoring function was modified to HADDOCKscore ¼ 1:0 * E vdW þ 0:1 * E elec þ 1:0 * E desol þ 0:1 * E AIR All other settings were kept to their default values. Finally, the atom-specific transfer efficiencies determined by uSTA were used to filter cluster candidates.

Cryo-EM analysis
SARS-CoV-2 spike protein, generated and purified as described (48), in 1.1 mg/ml was incubated with 10 mM ethyl(triiodobenzamide) siallyllactoside overnight at 4°C. A 3.5-ml sample was applied to glow-discharged Quantifoil gold R1.2/1.3 300-mesh grids and blotted for 3 s at 100% humidity and 6°C before vitrification in liquid ethane using Vitrobot (FEI). Two datasets were collected on Titan Krios equipped with a K2 direct electron detector at the cryo-EM facility (OPIC) in the Division of Structural Biology, University of Oxford. Both datasets were collected by SerialEM at a magnification of 165,000× with a physical pixel size of 0.82 Å per pixel. Defocus range was -0.8 mm to -2.4 mm. Total doses for the two datasets were 60 e/Å 2 (5492 movies) and 61 e/Å 2 (8284 movies), respectively.
Motion correction was performed by MotionCor2 (69). The motion-corrected micrographs were imported into cryoSPARC (70) and contrast transfer function values were estimated using Gctf (71) in cryoSPARC. Templates were produced by 2D classification from 5492 micrographs with particles auto-picked by Laplacian-of-Gaussian (LoG)-based algorithm in RELION 3.0 (72,73). Particles were picked from all micrographs using Template picker in cryoSPARC. Multiple rounds of 2D classification were carried out and the selected 2D classes (372,157 particles) were subjected to 3D classification (Heterogeneous Refinement in cryoSPARC) using six classes. One class was predominant after 3D classification. Nonuniform refinement (74) was performed for this class (312,018 particles) with C1 and C3 symmetry, respectively, yielding a 2.27 Å map for C3 symmetry and a 2.44 Å map for C1 symmetry. See also fig. S23 and table S12 for cryo-EM data collection, refinement, and validation statistics.

Genetic analysis of clinical samples
Variant calling: Reads were mapped to the hg19 reference genome by the Burrow-Wheeler aligner BWA. Variants calling was performed according to the GATK4 best practice guidelines. Namely, duplicates were first removed by MarkDuplicates, and base qualities were recalibrated using BaseRecalibration and ApplyBQSR. HaplotypeCaller was used to calculate Genomic VCF files for each sample, which were then used for multi-sample calling by GenomicDBImport and GenotypeGVCF. In order to improve the specificity-sensitivity balance, variants' quality scores were calculated by VariantRecalibrator and ApplyVQSR, and only variants with estimated truth sensitivity above 99.9% were retained. Variants were annotated by ANNOVAR.
Rare variant selection: Missense, splicing, and loss-of-function variants with a frequency lower than 0.01 according to ExAC_NFE (Non Finnish European ExAC Database) were considered for further analyses. A score of 0 was assigned to each sample where the gene is not mutated, and a score of 1 was assigned when at least one variant is present on the gene.
The cohort was distributed as follows. Ethnicity: 504 white, 4 Black, 5 Asian, 16 Hispanic ethnicity, 4 patients for which this information was not available. Sex: 317 male, 216 female. Age: minimum age 19 years, maximum age 99 years, mean age 62.5 years.

Gene prioritization by logistic regression
Discriminating genes in COVID-19 disease were interpreted in a framework of feature selection analysis using a customized feature selection approach based on the recursive feature elimination algorithm applied to the LASSO logistic regression model. Specifically, for a set of n samples {x i , y i } (i = 1, …, n), each of which consists of p input features x i,k ∈ c i, k = 1, …, p and one output variable y i ∈ Y, these features assumed the meaning of genes, whereas the samples were the patients involved in the study. The space c = c 1 × c 2 … × c p was denoted "input space," whereas the "hypothesis space" was the space of all the possible functions f: c → Y mapping the inputs to the output. Given that the number of features (p) is substantially higher than the number of samples (n), LASSO regularization (49) has the effect of shrinking the estimated coefficients to zero, providing a feature selection method for sparse solutions within the classification tasks. Feature selection methods based on such regularization structures (embedded methods) were most applicable to our scope because they were computationally tractable and strictly connected with the classification task of the ML algorithm.
As the baseline algorithm for the embedded method, we adopted the logistic regression (LR) model that is a state-of-the-art ML algorithm for binary classification tasks with probabilistic interpretation. It models the log-odds of the posterior success probability of a binary variable as the linear combination of the input: where x is the input vector, b k are the coefficients of the regression, and X and Y are the random variables representing the input and the output, respectively. The loss function to be minimized is given by the binary crossentropy loss À X n i¼1 y i logŷ i 1 À y i ð Þlog 1 Àŷ i ð Þ ½ whereŷ = Pr(Y = 1|X = x) is the predicted target variable and y is the true label. As already introduced, in order to enforce both the sparsity and the interpretability of the results, the model is trained with the additional LASSO regularization term In this way, the absolute value of the surviving weights of the LR algorithm was interpreted as the feature importance of the subset of most relevant genes for the task. Because a featureranking criterion can become suboptimal when the subset of removed features is large (75), we applied recursive feature elimination (RFE) methodology. For each step of the procedure, we fitted the model and removed the features with smallest ranking criteria in a recursive manner until a certain number of features was reached.
The fundamental hyperparameter of LR is the strength of the LASSO term tuned with a grid search procedure on the accuracy of the 10-fold cross-validation. The k-fold cross-validation provided the partition of the dataset into k batches, then exploited k -1 batches for the training and the remaining test batch as a test, by repeating this procedure k times. In the grid search method, a cross-validation procedure was carried out for each value of the regularization hyperparameter in the range [10 -4 , ..., 10 6 ]. Specifically, the optimal regularization parameter is chosen by selecting the most parsimonious parameter whose cross-validation average accuracy falls in the range of the best one along with its standard deviation. During the fitting procedure, the class unbalancing was tackled by penalizing the misclassification of minority class with a multiplicative factor inversely proportional to the class frequencies. For the RFE, the number of included features at each step of the algorithm as well as the final number of features was fixed at 100. All data preprocessing and the RFE procedure were coded in Python; the LR model was used, as included, in the scikit-learn module with the liblinear coordinate descent optimization algorithm.