Reliable In Silico Ranking of Engineered Therapeutic TCR Binding Affinities with MMPB/GBSA

Accurate and efficient in silico ranking of protein–protein binding affinities is useful for protein design with applications in biological therapeutics. One popular approach to rank binding affinities is to apply the molecular mechanics Poisson–Boltzmann/generalized Born surface area (MMPB/GBSA) method to molecular dynamics (MD) trajectories. Here, we identify protocols that enable the reliable evaluation of T-cell receptor (TCR) variants binding to their target, peptide-human leukocyte antigens (pHLAs). We suggest different protocols for variant sets with a few (≤4) or many mutations, with entropy corrections important for the latter. We demonstrate how potential outliers could be identified in advance and that just 5–10 replicas of short (4 ns) MD simulations may be sufficient for the reproducible and accurate ranking of TCR variants. The protocols developed here can be applied toward in silico screening during the optimization of therapeutic TCRs, potentially reducing both the cost and time taken for biologic development.


Additional methods
WT A6-TCR Solvation. The 3D-Reference Interaction Site Model 1,2 (3D-RISM) was used to predict the density distribution function (g(r)) for water oxygen atoms across the entire TCR-pHLA binding site of the WT A6 TCR-pHLA structure. For 3D-RISM calculations (performed with AmberTools18), the Kovalenko-Hirata (KH) closure method 3,4 was used, with all other settings kept as default. Following this, Placevent 5 was used to solvate the entire TCR-pHLA complex, by solvating the entire complex with waters molecules up to 5 Å away from any protein atom. Multiple cut-off g(r) values (point at which additional waters with smaller g (r) values are no longer added) were tested as the default value of 1.5 Å resulted in stability issues (due to "vacuum bubbles") during NPT simulations (due to too much space between solvent atoms in the initial structure). We found that appropriate solvation was achieved with a g(r) cut-off of 1.1 Å and following this we solvated the protein and these water molecules in a water box of size 7 Å (away from both any protein and 3D-RISM/Placevent water molecule). This distance of 7 Å was chosen as it gave a slightly bigger box size (in all dimensions) to the box size that would be generated when solvating just the protein in a 10 Å octahedral water box.
(This value is likely to be somewhat system dependent.) Structure equilibration procedure. The following procedure was used to prepare all systems simulated for production MD simulations in the NPT ensemble at 298 K and 1 atm. All dynamics steps applied the SHAKE algorithm to constrain all bonds containing a hydrogen atom. Replica simulations were initiated from the second heating step of the following protocol (with each replica assigned different random velocity vectors at this stage). Simulations performed in the NVT ensemble used Langevin temperature control (with a collision frequency of 1 ps −1 ) and used a simulation timestep of 1 fs. Simulations performed in the NPT ensemble used Langevin temperature control (collision frequency of 1 ps −1 ) and a Berendsen barostat (1 ps pressure relaxation time).
The equilibration protocol is as follows: First, hydrogens atoms and solvent molecules were energy minimized (using 500 steps of steepest descent followed by 500 steps of conjugate gradient minimization). To prevent the movement of non-hydrogen and non-solvent atoms during the minimization, 10 kcal mol −1 Å −1 positional restraints were used to keep all heavy atoms fixed. Then the solvent was heated rapidly from 50 K to 298 K (NVT ensemble, 1 fs timestep) over the course of 200 ps, with the previously described restraints still maintained.
The positional restraints were then replaced with 5 kcal mol −1 Å −1 positional restraints on only the Cα carbon atoms of each residue and subjected to another round of energy minimization (500 steps of steepest descent followed by 500 steps of conjugate gradient). Retaining these positional restraints, the system was heated from 25 K to 298 K over the course of 50 ps (NVT ensemble, 1 fs time step). Simulations were then performed in the NPT ensemble (1 atm, 298 K, 2 fs time step) by first gradually reducing the 5 kcal mol −1 Å −1 Cα carbon restraints over the course of 50 ps. This was done by reducing the restraint weight by 1 kcal mol −1 Å −1 every 10 ps. A final 1 ns long NVT MD simulation with no restraints placed on the system was then performed, with the final structure produced after this run, used as the starting point for production MD simulations. were performed only on the 1G4 TCR-pHLA complexes studied in this manuscript. Residues in the pHLA that were retained were HLA residues: 5-26, 33-47, 54-101, 112-118, 123-133, 139-174, and peptide residues: 1-9. Residues in the TCR that were retained were TCRα-chain: was used to retain 1000 binding site water molecules in each frame. The 1000 closest water molecules (water position determined using the oxygen atom) to the Cα of any of central peptide residues (residues 4, 5, 6 and 7) were used for this closest water molecules calculation.
We used a modified version of the Nmode program from AmberTools14, which allows for the use of the "ibelly" parameter which fixes selected atoms in place, meaning they have no (direct) contribution to the vibrational frequency calculation. The modified Nmode code (along with scripts to setup and analyse the obtained results) can be provided upon request to the corresponding authors.

Inclusion of Explicit Water Molecules for MMPB/GBSA Calculations. MMPB/GBSA
calculations performed with explicit solvent (either 10, 20, 30 or 50 water molecules) were selected for using the "closest" command with CPPTRAJ. 6 Water molecules were included in the complex and receptor calculations, and receptor residues at the binding site were used to select the x closest water molecules. Specifically, for WT 1G4, HLA residues: 19, 62, 65-66, Spearman's rank (rs) are provided for the full set of data (in black) and without the three identified outliers discussed in the main text (in red and in parenthesis).           a HID corresponds to a histidine residue which is singly protonated on its Nδ1 nitrogen.

Supporting tables
b HIE corresponds to a histidine residue which is singly protonated on their Nε2 nitrogen.     Figure S3, Figure S4) Figure S3, Figure S4). MADs determined here are for MMPB/GBSA calculations performed with a varying number of explicit water molecules present, but without any entropy corrections added.