
Title: Sequence patterns and signatures: Computational and experimental discovery of amyloid-forming peptides

Screening amino acid sequence space via experiments to discover peptides that self-assemble into amyloid fibrils is challenging. We have developed a computational peptide assembly design (PepAD) algorithm that enables the discovery of amyloid-forming peptides. Discontinuous molecular dynamics (DMD) simulation with the PRIME20 force field combined with the FoldAmyloid tool is used to examine the fibrilization kinetics of PepAD-generated peptides. PepAD screening of ∼10,000 7-mer peptides resulted in twelve top-scoring peptides with two distinct hydration properties. Our studies revealed that eight of the twelve in silico discovered peptides spontaneously form amyloid fibrils in the DMD simulations and that all eight have at least five residues that the FoldAmyloid tool classifies as being aggregation-prone. Based on these observations, we re-examined the PepAD-generated peptides in the sequence pool returned by PepAD and extracted five sequence patterns as well as associated sequence signatures for the 7-mer amyloid-forming peptides. Experimental results from Fourier transform infrared spectroscopy (FTIR), thioflavin T (ThT) fluorescence, circular dichroism (CD), and transmission electron microscopy (TEM) indicate that all the peptides predicted to assemble in silico assemble into antiparallel β-sheet nanofibers in a concentration-dependent manner. This is the first attempt to use a computational approach to search for amyloid-forming peptides based on customized settings. Our efforts facilitate the identification of β-sheet-based self-assembling peptides, and contribute insights towards answering a fundamental scientific question: “What does it take, sequence-wise, for a peptide to self-assemble?”

Publisher / Repository:
Oxford University Press
Journal Name:
PNAS Nexus
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The aggregation of monomeric amyloid β protein (Aβ) peptide into oligomers and amyloid fibrils in the mammalian brain is associated with Alzheimer’s disease. Insight into the thermodynamic stability of the Aβ peptide in different polymeric states is fundamental to defining and predicting the aggregation process. Experimental determination of Aβ thermodynamic behavior is challenging due to the transient nature of Aβ oligomers and the low peptide solubility. Furthermore, quantitative calculation of a thermodynamic phase diagram for a specific peptide requires extremely long computational times. Here, using a coarse-grained protein model, molecular dynamics (MD) simulations are performed to determine an equilibrium concentration and temperature phase diagram for the amyloidogenic peptide fragment Aβ16–22. Our results reveal that the only thermodynamically stable phases are the solution phase and the macroscopic fibrillar phase, and that there also exists a hierarchy of metastable phases. The boundary line between the solution phase and fibril phase is found by calculating the temperature-dependent solubility of a macroscopic Aβ16–22 fibril consisting of an infinite number of β-sheet layers. This in silico determination of an equilibrium (solubility) phase diagram for a real amyloid-forming peptide, Aβ16–22, over the temperature range of 277–330 K agrees well with fibrillation experiments and transmission electron microscopy (TEM) measurements of the fibril morphologies formed. This in silico approach of predicting peptide solubility is also potentially useful for optimizing biopharmaceutical production and manufacturing nanofiber scaffolds for tissue engineering.

  2. Peptide self-assembly, wherein molecule A associates with other A molecules to form fibrillar β-sheet structures, is common in nature and widely used to fabricate synthetic biomaterials. Selective coassembly of peptide pairs A and B with complementary partial charges is gaining interest due to its potential for expanding the form and function of biomaterials that can be realized. It has been hypothesized that charge-complementary peptides organize into alternating ABAB-type arrangements within assembled β-sheets, but no direct molecular-level evidence exists to support this interpretation. We report a computational and experimental approach to characterize molecular-level organization of the established peptide pair, CATCH. Discontinuous molecular dynamics simulations predict that CATCH(+) and CATCH(−) peptides coassemble but do not self-assemble. Two-layer β-sheet amyloid structures predominate, but off-pathway β-barrel oligomers are also predicted. At low concentration, transmission electron microscopy and dynamic light scattering identified nonfibrillar ∼20-nm oligomers, while at high concentrations elongated fibers predominated. Thioflavin T fluorimetry estimates rapid and near-stoichiometric coassembly of CATCH(+) and CATCH(−) at concentrations ≥100 μM. Natural abundance 13C NMR and isotope-edited Fourier transform infrared spectroscopy indicate that CATCH(+) and CATCH(−) coassemble into two-component nanofibers instead of self-sorting. However, 13C–13C dipolar recoupling solid-state NMR measurements also identify nonnegligible AA and BB interactions among a majority of AB pairs. Collectively, these results demonstrate that strictly alternating arrangements of β-strands predominate in coassembled CATCH structures, but deviations from perfect alternation occur. Off-pathway β-barrel oligomers are also suggested to occur in coassembled β-strand peptide systems.

  3. The aggregation of amyloids into toxic oligomers is believed to be a key pathogenic event in the onset of Alzheimer's disease. Peptidomimetic modulators capable of destabilizing the propagation of an extended network of β-sheet fibrils represent a potential intervention strategy. Modifications to amyloid-beta (Aβ) peptides derived from the core domain have afforded inhibitors capable of both antagonizing aggregation and reducing amyloid toxicity. Previous work from our laboratory has shown that peptide backbone amination stabilizes β-sheet-like conformations and precludes β-strand aggregation. Here, we report the synthesis of N-aminated hexapeptides capable of inhibiting the fibrillization of full-length Aβ42. A key feature of our design is N-amino substituents at alternating backbone amides within the aggregation-prone Aβ16–21 sequence. This strategy allows for maintenance of an intact hydrogen-bonding backbone edge as well as side chain moieties important for favorable hydrophobic interactions. An N-amino scan of Aβ16–21 resulted in the identification of peptidomimetics that block Aβ42 fibrillization in several biophysical assays.
  4. Collagen fibrils represent a unique case of protein folding and self‐association. We have recently successfully developed triple‐helical peptides that can further self‐assemble into collagen‐mimetic mini‐fibrils. The 35 nm axially repeating structure of the mini‐fibrils, which is designated the d‐period, is highly reminiscent of the well‐known 67 nm D‐period of native collagens when examined using TEM and atomic force spectroscopy. We postulate that it is the pseudo‐identical repeating sequence units in the primary structure of the designed peptides that give rise to the d‐period of the quaternary structure of the mini‐fibrils. In this work, we characterize the self‐assembly of two additional designed peptides: peptide Col877 and peptide Col108rr. The triple‐helix domain of Col877 consists of three pseudo‐identical amino acid sequence units arranged in tandem, whereas that of Col108rr consists of three sequence units identical in amino acid composition but different in sequence. Both peptides form stable collagen triple helices, but only the Col877 triple helices self‐associate laterally under fibril‐forming conditions to form mini‐fibrils having the predicted d‐period. The Col108rr triple helices, however, only form nonspecific aggregates having no identifiable structural features. These results further accentuate the critical involvement of the repeating sequence units in the self‐assembly of collagen mini‐fibrils; the actual amino acid sequence of each unit has only secondary effects. Collagen is essential for tissue development and function. This novel approach to creating collagen‐mimetic fibrils can potentially impact fundamental research and have a wide range of biomedical and industrial applications.

  5. This data set for the manuscript entitled "Design of Peptides that Fold and Self-Assemble on Graphite" includes all files needed to run and analyze the simulations described in this manuscript in the molecular dynamics software NAMD, as well as the output of the simulations. The files are organized into directories corresponding to the figures of the main text and supporting information. They include molecular model structure files (NAMD psf or Amber prmtop format), force field parameter files (in CHARMM format), initial atomic coordinates (pdb format), NAMD configuration files, Colvars configuration files, NAMD log files, and NAMD output including restart files (in binary NAMD format) and trajectories in dcd format (downsampled to 10 ns per frame). Analysis is controlled by shell scripts (Bash-compatible) that call VMD Tcl scripts or Python scripts. These scripts and their output are also included.

    Version: 2.0

    Changes versus version 1.0 are the addition of the free energy of folding, adsorption, and pairing calculations (Sim_Figure-7) and shifting of the figure numbers to accommodate this addition.

    Conventions Used in These Files

    Structure Files
    - graph_*.psf or sol_*.psf (original NAMD (XPLOR?) format psf file including atom details (type, charge, mass), as well as definitions of bonds, angles, dihedrals, and impropers for each dipeptide.)

    - graph_*.pdb or sol_*.pdb (initial coordinates before equilibration)
    - repart_*.psf (same as the above psf files, but the masses of non-water hydrogen atoms have been repartitioned by VMD script repartitionMass.tcl)
    - freeTop_*.pdb (same as the above pdb files, but the carbons of the lower graphene layer have been placed at a single z value and marked for restraints in NAMD)
    - amber_*.prmtop (combined topology and parameter files for Amber force field simulations)
    - repart_amber_*.prmtop (same as the above prmtop files, but the masses of non-water hydrogen atoms have been repartitioned by ParmEd)

    Force Field Parameters
    CHARMM format parameter files:
    - par_all36m_prot.prm (CHARMM36m FF for proteins)
    - par_all36_cgenff_no_nbfix.prm (CGenFF v4.4 for graphene) The NBFIX parameters are commented out since they are only needed for aromatic halogens and we use only the CG2R61 type for graphene.
    - toppar_water_ions_prot_cgenff.str (CHARMM water and ions with NBFIX parameters needed for protein and CGenFF included and others commented out)

    Template NAMD Configuration Files
    These contain the most commonly used simulation parameters. They are called by the other NAMD configuration files (which are in the namd/ subdirectory):
    - template_min.namd (minimization)
    - template_eq.namd (NPT equilibration with lower graphene fixed)
    - template_abf.namd (for adaptive biasing force)

    - namd/min_*.0.namd

    - namd/eq_*.0.namd

    Adaptive biasing force calculations
    - namd/eabfZRest7_graph_chp1404.0.namd
    - namd/eabfZRest7_graph_chp1404.1.namd (continuation of eabfZRest7_graph_chp1404.0.namd)

    Log Files
    For each NAMD configuration file given in the last two sections, there is a log file with the same prefix, which gives the text output of NAMD. For instance, the output of namd/eabfZRest7_graph_chp1404.0.namd is eabfZRest7_graph_chp1404.0.log.
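
    The naming rule above can be sketched in Python as follows. This is a minimal illustration, not a script from the data set: the helper name log_name_for is hypothetical, and the sketch returns only the log file name because the text does not say which directory the log files are stored in.

```python
from pathlib import PurePosixPath

def log_name_for(config_path):
    """Return the log file name sharing the prefix of a NAMD configuration file.

    Hypothetical helper: only the final ".namd" extension is swapped for ".log",
    so the run index (".0", ".1", ...) stays part of the prefix.
    """
    return PurePosixPath(config_path).with_suffix(".log").name

# Example taken from the text above.
print(log_name_for("namd/eabfZRest7_graph_chp1404.0.namd"))
# eabfZRest7_graph_chp1404.0.log
```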

    Simulation Output
    The simulation output files (which match the names of the NAMD configuration files) are in the output/ directory. Files with the extensions .coor, .vel, and .xsc are coordinates in NAMD binary format, velocities in NAMD binary format, and extended system information (including cell size) in text format. Files with the extension .dcd give the trajectory of the atomic coordinates over time (and also include system cell information). Due to storage limitations, large DCD files have been omitted or replaced with new DCD files having the prefix stride50_ that include only every 50th frame. The time between frames in these files is 50 * 50000 steps/frame * 4 fs/step = 10 ns. The system cell trajectory for the NPT runs is also included as output/eq_*.xst.
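
    The frame-spacing arithmetic for the stride50_ files can be checked with a short Python sketch. The three input values are the ones quoted above; the function and constant names are illustrative assumptions and do not come from the data set's scripts.

```python
# Values quoted in the text above.
STRIDE = 50               # every 50th frame is kept in the stride50_ files
STEPS_PER_FRAME = 50_000  # MD steps between frames in the original DCD files
TIMESTEP_FS = 4           # femtoseconds per MD step

def frame_interval_ns(stride, steps_per_frame, timestep_fs):
    """Time between consecutive frames of a downsampled trajectory, in ns."""
    interval_fs = stride * steps_per_frame * timestep_fs
    return interval_fs / 1_000_000  # 1 ns = 1,000,000 fs

interval_ns = frame_interval_ns(STRIDE, STEPS_PER_FRAME, TIMESTEP_FS)
print(interval_ns)  # 10.0
```

    With a stride of 1, the same function gives 0.2 ns per frame, which is the frame spacing of the original trajectories implied by the quoted numbers.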

    Files with the .sh extension can be found throughout. These usually provide the highest level control for submission of simulations and analysis. Look to these as a guide to what is happening. If there are scripts with step1_*.sh and step2_*.sh, they are intended to be run in order, with step1_*.sh first.
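
    As a rough illustration of the ordering convention, the Python sketch below sorts script names by their step number. The file names in the example are hypothetical (the text only gives the step1_*.sh and step2_*.sh patterns), and sorting numerically is an assumption that would only matter if higher step numbers exist.

```python
import re
from pathlib import PurePosixPath

# Matches names such as step1_<anything>.sh and captures the step number.
STEP_RE = re.compile(r"^step(\d+)_.*\.sh$")

def ordered_step_scripts(names):
    """Return the stepN_*.sh scripts from `names`, sorted by N (lowest first)."""
    steps = []
    for name in names:
        match = STEP_RE.match(PurePosixPath(name).name)
        if match:
            steps.append((int(match.group(1)), name))
    return [name for _, name in sorted(steps)]

# Hypothetical file names, for illustration only.
example = ["step2_analyze.sh", "run_all.sh", "step1_prepare.sh"]
print(ordered_step_scripts(example))
# ['step1_prepare.sh', 'step2_analyze.sh']
```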


    The directory contents are as follows. The directories Sim_Figure-1 and Sim_Figure-8 include README.txt files that describe the files and naming conventions used throughout this data set.

    Sim_Figure-1: Simulations of N-acetylated C-amidated amino acids (Ac-X-NHMe) at the graphite–water interface.

    Sim_Figure-2: Simulations of different peptide designs (including acyclic, disulfide cyclized, and N-to-C cyclized) at the graphite–water interface.

    Sim_Figure-3: MM-GBSA calculations of different peptide sequences for a folded conformation and 5 misfolded/unfolded conformations.

    Sim_Figure-4: Simulation of four peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) at the graphite–water interface at 370 K.

    Sim_Figure-5: Simulation of four peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) at the graphite–water interface at 295 K.

    Sim_Figure-5_replica: Temperature replica exchange molecular dynamics simulations for the peptide cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) with 20 replicas for temperatures from 295 to 454 K.

    Sim_Figure-6: Simulation of the peptide molecule cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) in free solution (no graphite).

    Sim_Figure-7: Free energy calculations for folding, adsorption, and pairing for the peptide CHP1404 (sequence: cyc(GTGSGTG-GPGG-GCGTGTG-SGPG)). For folding, we calculate the PMF as a function of RMSD by replica-exchange umbrella sampling (in the subdirectory Folding_CHP1404_Graphene/). We make the same calculation in solution, which required 3 separate replica-exchange umbrella sampling calculations (in the subdirectory Folding_CHP1404_Solution/). Both PMF of RMSD calculations for the scrambled peptide are in Folding_scram1404/. For adsorption, calculation of the PMF for the orientational restraints and the calculation of the PMF along z (the distance between the graphene sheet and the center of mass of the peptide) are in Adsorption_CHP1404/ and Adsorption_scram1404/. The actual calculation of the free energy is done by a shell script ("") in the 1_free_energy/ subsubdirectory. Processing of the PMFs must be done first in the 0_pmf/ subsubdirectory. Finally, files for free energy calculations of pair formation for CHP1404 are found in the Pair/ subdirectory.

    Sim_Figure-8: Simulation of four peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) where the peptides are far above the graphene–water interface in the initial configuration.

    Sim_Figure-9: Two replicates of a simulation of nine peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) at the graphite–water interface at 370 K.

    Sim_Figure-9_scrambled: Two replicates of a simulation of nine peptide molecules with the control sequence cyc(GGTPTTGGGGGGSGGPSGTGGC) at the graphite–water interface at 370 K.

    Sim_Figure-10: Adaptive biasing for calculation of the free energy of the folded peptide as a function of the angle between its long axis and the zigzag directions of the underlying graphene sheet.


    This material is based upon work supported by the US National Science Foundation under grant no. DMR-1945589. A majority of the computing for this project was performed on the Beocat Research Cluster at Kansas State University, which is funded in part by NSF grants CHE-1726332, CNS-1006860, EPS-1006860, and EPS-0919443. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562, through allocation BIO200030. 