-
Protein structure is central to biological function, and enabling multimodal protein models requires joint reasoning over sequence, structure, and function. A key barrier is the lack of principled protein structure tokenizers (PSTs): existing approaches fix token size or rely on continuous vector codebooks, limiting interpretability, multi-scale control, and transfer across architectures. We introduce GeoBPE, a geometry-grounded PST that transforms continuous, noisy, multi-scale backbone conformations into discrete "sentences" of geometry while enforcing global constraints. Analogous to byte-pair encoding, GeoBPE generates a hierarchical vocabulary of geometric primitives by iteratively (i) clustering Geo-Pair occurrences with k-medoids to yield a resolution-controllable vocabulary; (ii) quantizing each Geo-Pair to its closest medoid prototype; and (iii) reducing drift through differentiable inverse kinematics that optimizes boundary glue angles under an SE(3) end-frame loss. GeoBPE offers compression (>10x reduction in bits-per-residue at similar distortion rate), data efficiency (>10x less training data), and generalization (maintains a test/train distortion ratio of 1.0 to 1.1). It is architecture-agnostic: (a) its hierarchical vocabulary provides a strong inductive bias for coarsening residue-level embeddings from large PLMs into motif- and protein-level representations, consistently outperforming leading PSTs across 12 tasks and 24 test splits; (b) paired with a transformer, GeoBPE supports unconditional backbone generation via language modeling; and (c) tokens align with CATH functional families and support expert-interpretable case studies, offering functional meaning absent in prior PSTs.
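Steps (i) and (ii) of the abstract above can be illustrated with a minimal sketch. It is not the GeoBPE implementation: the two-number descriptors (a bond angle and a dihedral, in radians), the plain Euclidean distance (which ignores angle periodicity), the random initialization, and the function names `k_medoids` and `quantize` are all assumptions made for illustration. The abstract does not specify how Geo-Pairs are represented or compared.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length descriptor tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(points, k, iters=20, seed=0):
    """Alternating k-medoids: assign each point to its nearest medoid, then
    re-pick each medoid as the cluster member with the smallest total
    distance to the other members."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(iters):
        # Assignment step: group point indices by nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: distance(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid is always an actual data point.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster emptied
                new_medoids.append(m)
                continue
            best = min(
                members,
                key=lambda c: sum(distance(points[c], points[j]) for j in members),
            )
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [points[m] for m in medoids]


def quantize(points, vocab):
    """Map each descriptor to the index (token) of its closest medoid."""
    return [
        min(range(len(vocab)), key=lambda t: distance(p, vocab[t]))
        for p in points
    ]


# Hypothetical (bond angle, dihedral) descriptors forming two loose groups.
points = [
    (1.58, 0.87), (1.60, 0.90), (1.62, 0.93),
    (2.08, -2.95), (2.10, -2.90), (2.12, -2.85),
]
vocab = k_medoids(points, k=2)
tokens = quantize(points, vocab)
print(vocab, tokens)
```

Because medoids are drawn from the data, every token corresponds to an observed geometry rather than an averaged one. In this sketch, the number of clusters `k` plays the role of the resolution control mentioned in the abstract.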
-
Conventional inverse problems for cohesive zones often use homogenized responses of the effective media to identify a fixed set of material parameters prescribed a priori. However, under mixed-mode loading, composites and natural materials may exhibit interfacial relations that are difficult to anticipate. This article presents a model-discovery framework that identifies cohesive zone models directly from displacement fields across the interface, without committing to a specific form of the equations. We develop a differentiable version of the Material Point Method (MPM) with interface elements formulated to capture the traction-separation law at a pre-existing crack or bonded interface. Because the MPM solver is differentiable, the mismatch between simulated and measured (e.g., DIC/DVC) displacement fields can be backpropagated through the time integrator and interface physics. Using only kinematics and equilibrium as constraints, numerical experiments suggest that the method may recover (i) a Mode-I traction-separation curve in a double-cantilever-beam test and (ii) a mixed-mode law for a circular interface shear test. These numerical results indicate that displacement-only experiments, combined with a differentiable solver, offer a promising pathway for identifying rich and potentially nonparametric cohesive laws.
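The idea of differentiating a displacement mismatch through a time integrator can be shown on a toy problem that is far simpler than the framework in the abstract above. The sketch assumes a single mass held by a linear interface of unknown stiffness `k` (a parametric law, unlike the nonparametric laws the article targets), a symplectic Euler integrator instead of MPM, hand-written forward-mode sensitivities instead of reverse-mode backpropagation, and a Gauss-Newton update on the single parameter. All names and values are hypothetical, and the "measured" data are synthetic.

```python
def simulate(k, force=1.0, mass=1.0, dt=0.01, steps=100):
    """Integrate m*u'' = F - k*u with symplectic Euler.

    Returns the opening history u_n and its sensitivity du_n/dk, obtained by
    differentiating every integrator update with respect to k.
    """
    u = v = 0.0    # interface opening and its rate
    du = dv = 0.0  # sensitivities d(u)/dk and d(v)/dk
    history, sensitivity = [], []
    for _ in range(steps):
        # Differentiate the velocity update first: it needs the old u and du.
        dv += dt * (-u - k * du) / mass
        v += dt * (force - k * u) / mass
        # The position update uses the new velocity (and new dv).
        u += dt * v
        du += dt * dv
        history.append(u)
        sensitivity.append(du)
    return history, sensitivity


def identify_stiffness(measured, k_init=2.0, iterations=10):
    """Fit k to a measured opening history with Gauss-Newton steps."""
    k = k_init
    for _ in range(iterations):
        simulated, sens = simulate(k)
        residual = [s - m for s, m in zip(simulated, measured)]
        grad = sum(r * g for r, g in zip(residual, sens))  # half the loss gradient
        curvature = sum(g * g for g in sens)                # Gauss-Newton Hessian term
        k -= grad / curvature
    return k


# Synthetic "measurement": displacements produced by a known stiffness.
K_TRUE = 4.0
measured, _ = simulate(K_TRUE)
k_est = identify_stiffness(measured)
print(round(k_est, 4))
```

Only displacements enter the fit; the traction is never observed directly and is constrained solely through the equation of motion. That is the sense in which displacement-only data can inform an interface law.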