This content will become publicly available on March 27, 2026

Title: The Geometry of Concepts: Sparse Autoencoder Feature Structure
Abstract: Sparse autoencoders have recently produced dictionaries of high-dimensional vectors corresponding to the universe of concepts represented by large language models. We find that this concept universe has interesting structure at three levels: (1) The “atomic” small-scale structure contains “crystals” whose faces are parallelograms or trapezoids, generalizing well-known examples such as (man:woman::king:queen). We find that the quality of such parallelograms and associated function vectors improves greatly when projecting out global distractor directions such as word length, which is efficiently performed with linear discriminant analysis. (2) The “brain” intermediate-scale structure has significant spatial modularity; for example, math and code features form a “lobe” akin to functional lobes seen in neural fMRI images. We quantify the spatial locality of these lobes with multiple metrics and find that clusters of co-occurring features, at coarse enough scale, also cluster together spatially far more than one would expect if feature geometry were random. (3) The “galaxy”-scale large-scale structure of the feature point cloud is not isotropic, but instead has a power law of eigenvalues with steepest slope in middle layers. We also quantify how the clustering entropy depends on the layer.
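As a rough illustration of the distractor-removal step described in the abstract, the sketch below uses linear discriminant analysis to estimate a global distractor subspace (e.g. word length) and projects feature vectors onto its orthogonal complement before scoring an analogy parallelogram. The toy data, dimensions, and variable names are assumptions for illustration, not the paper's code.

```python
# Hypothetical sketch: project out a global distractor direction (e.g. word length)
# from SAE feature vectors with LDA, then score an analogy parallelogram.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 64))          # stand-in for SAE feature vectors
word_len = rng.integers(1, 10, size=1000)    # assumed distractor label per feature

# LDA finds the directions that best separate the distractor classes.
lda = LinearDiscriminantAnalysis().fit(feats, word_len)
D = lda.scalings_[:, :2]                     # leading distractor directions (64 x 2)

# Project every feature onto the orthogonal complement of the distractor subspace.
Q, _ = np.linalg.qr(D)                       # orthonormal basis of that subspace
feats_clean = feats - feats @ Q @ Q.T

def parallelogram_error(a, b, c, d):
    """Residual of the analogy a:b :: c:d, e.g. man:woman :: king:queen."""
    return np.linalg.norm((b - a) - (d - c))
```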
Award ID(s): 2019786
PAR ID: 10588349
Publisher / Repository: MDPI
Journal Name: Entropy
Volume: 27
Issue: 4
ISSN: 1099-4300
Page Range / eLocation ID: 344
Sponsoring Org: National Science Foundation
More Like this
  1. ABSTRACT We have not yet observed the epoch at which disc galaxies emerge in the Universe. While high-z measurements of large-scale features such as bars and spiral arms trace the evolution of disc galaxies, such methods cannot directly quantify featureless discs in the early Universe. Here, we identify a substantial population of apparently featureless disc galaxies in the Cosmic Evolution Early Release Science (CEERS) survey by combining quantitative visual morphologies of ~7000 galaxies from the Galaxy Zoo JWST CEERS project with a public catalogue of expert visual and parametric morphologies. While the highest redshift featured disc we identify is at z_phot = 5.5, the highest redshift featureless disc we identify is at z_phot = 7.4. The distribution of Sérsic indices for these featureless systems suggests that they truly are dynamically cold: disc-dominated systems have existed since at least z ~ 7.4. We place upper limits on the featureless disc fraction as a function of redshift, and show that up to 75 per cent of discs are featureless at 3.0 < z < 7.4. This is a conservative limit assuming all galaxies in the sample truly lack features. With further consideration of redshift effects and observational constraints, we find the featureless disc fraction in CEERS imaging at these redshifts is more likely ~29-38 per cent. We hypothesize that the apparent lack of features in a third of high-redshift discs is due to a higher gas fraction in the early Universe, which allows the discs to be resistant to buckling and instabilities.
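The fraction-versus-redshift estimate mentioned above can be illustrated with a few lines of code: the sketch below bins a catalogue by photometric redshift and attaches a binomial confidence interval to the featureless-disc fraction in each bin. The column names, bin edges, and confidence level are illustrative assumptions, not the survey's actual pipeline.

```python
# Illustrative sketch (not the paper's pipeline): featureless-disc fraction per
# redshift bin with a 95% binomial confidence interval.
import numpy as np
from scipy.stats import binomtest

def featureless_fraction(z_phot, is_featureless, bin_edges):
    """Print the featureless-disc fraction and its 95% CI in each redshift bin."""
    idx = np.digitize(z_phot, bin_edges) - 1
    for b in range(len(bin_edges) - 1):
        sel = idx == b
        n = int(sel.sum())
        if n == 0:
            continue
        k = int(is_featureless[sel].sum())
        ci = binomtest(k, n).proportion_ci(confidence_level=0.95)
        print(f"{bin_edges[b]:.1f} < z < {bin_edges[b+1]:.1f}: "
              f"{k}/{n} featureless ({ci.low:.2f}-{ci.high:.2f})")

# Example call with assumed arrays and bin edges:
# featureless_fraction(z_phot, is_featureless, bin_edges=[3.0, 4.0, 5.0, 6.0, 7.4])
```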
  2. Implicit solvent models divide solvation free energies into polar and nonpolar additive contributions, whereas polar and nonpolar interactions are in fact inseparable and nonadditive. We present a feature functional theory (FFT) framework to break this ad hoc division. The essential ideas of FFT are as follows: (i) representability assumption: there exists a microscopic feature vector that can uniquely characterize and distinguish one molecule from another; (ii) feature-function relationship assumption: the macroscopic features of a molecule, including its solvation free energy, are functionals of the microscopic feature vectors; and (iii) similarity assumption: molecules with similar microscopic features have similar macroscopic properties, such as solvation free energies. Based on these assumptions, solvation free energy prediction is carried out with the following protocol. First, we construct a molecular microscopic feature vector that efficiently characterizes the solvation process using quantum mechanics and Poisson–Boltzmann theory. Microscopic feature vectors are combined with macroscopic features, that is, physical observables, to form extended feature vectors. Next, we partition a solvation dataset into queries according to molecular composition. Then, for each target molecule, we adopt a machine learning algorithm to search for its nearest neighbors based on the selected microscopic feature vectors. Finally, from the extended feature vectors of the nearest neighbors, we construct a functional of solvation free energy, which is employed to predict the solvation free energy of the target molecule. The proposed FFT model has been extensively validated on a large dataset of 668 molecules. The leave-one-out test gives an optimal root-mean-square error (RMSE) of 1.05 kcal/mol. FFT predictions on the SAMPL0, SAMPL1, SAMPL2, SAMPL3, and SAMPL4 challenge sets deliver RMSEs of 0.61, 1.86, 1.64, 0.86, and 1.14 kcal/mol, respectively. Using a test set of 94 molecules and its associated training set, the present approach was carefully compared with a classic solvation model based on weighted solvent-accessible surface area. © 2017 Wiley Periodicals, Inc.
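A minimal sketch of the nearest-neighbour step of this protocol is given below, under the assumption that each molecule is already encoded as a microscopic feature vector. The simple distance-weighted average stands in for the learned functional of solvation free energy, and k and the weighting are illustrative choices rather than the paper's settings.

```python
# Simplified stand-in for the FFT nearest-neighbour prediction step.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def predict_solvation_energy(X_train, y_train, x_query, k=4):
    """Distance-weighted average of the solvation free energies of the k nearest
    neighbours of x_query in microscopic-feature space (a simplified functional)."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    dist, idx = nn.kneighbors(np.atleast_2d(x_query))
    w = 1.0 / (dist[0] + 1e-8)            # closer molecules contribute more
    return float(np.dot(w, y_train[idx[0]]) / w.sum())
```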
  3.
    Cells sense mechanical signals within the extracellular matrix, the most familiar being stiffness, but matrix stiffness cannot be simply described by a single value. Randomness in matrix structure causes stiffness at the scale of a cell to vary by more than an order of magnitude. Additionally, the extracellular matrix contains ducts, blood vessels, and, in cancer or fibrosis, regions with abnormally high stiffness. These different features could alter the stiffness sensed by a cell, but it is unclear whether the change in stiffness is large enough to overcome the noise caused by heterogeneity due to the random fibrous structure. Here we used a combination of experiments and modeling to determine the extent to which matrix heterogeneity disrupts the potential for cell sensing of a locally stiff feature in the matrix. Results showed that, at the scale of a single cell, spatial heterogeneity in local stiffness was larger than the increase in stiffness due to a stiff feature. The heterogeneity was reduced only for large length scales compared to the fiber length. Experiments verified this conclusion, showing spheroids of cells, which were large compared to the average fiber length, spreading preferentially toward stiff inclusions. Hence, the propagation of mechanical cues through the matrix depends on length scale, with single cells being able to sense only the stiffness of the nearby fibers and multicellular structures, such as tumors, also sensing the stiffness of distant matrix features. 
  4. Abstract An inadequate characterization of hydrogeological properties can significantly decrease the trustworthiness of subsurface flow and transport model predictions. A variety of data assimilation methods have been proposed to estimate hydrogeological parameters from spatially scarce data by incorporating them into the governing physical models. To quantify the accuracy of the estimations, several metrics have been used, such as rank histograms, root-mean-square error (RMSE), and ensemble spread. However, these commonly used metrics do not account for the spatial correlation of the aquifer's properties, so permeability fields with very different spatial structures can have similar histograms or RMSE. In this paper, we propose an approach based on color coherence vectors (CCV) for evaluating the performance of these estimation methods. CCV is a histogram-based technique for comparing images that incorporates spatial information. We represent estimated fields as digital three-channel images and use CCV to compare them and quantify the accuracy of the estimations. The appealing feature of this technique is that it considers the spatial structure embedded in the estimated fields, and this sensitivity to spatial information makes it a suitable metric for assessing the performance of data assimilation techniques. Under various factors, such as the number of measurements and the structural parameters of the log-conductivity field, we compare the performance of CCV with that of the RMSE.
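The sketch below illustrates a coherence-aware comparison in the spirit of CCV, treating an estimated field as a binned 2D image and counting coherent versus incoherent pixels per bin; the bin count, coherence threshold, and single-channel simplification are assumptions for illustration, not the paper's parameters.

```python
# Toy sketch of a color-coherence-vector style comparison of two estimated fields.
import numpy as np
from scipy.ndimage import label

def ccv(field, n_bins=8, tau=25):
    """Per-bin (coherent, incoherent) pixel counts for a 2D field.
    A pixel is 'coherent' if its same-bin connected component has >= tau pixels."""
    edges = np.linspace(field.min(), field.max(), n_bins + 1)[1:-1]
    binned = np.digitize(field, edges)            # bin index per pixel, 0 .. n_bins-1
    out = np.zeros((n_bins, 2), dtype=int)
    for b in range(n_bins):
        mask = binned == b
        labeled, _ = label(mask)                  # connected components of this bin
        sizes = np.bincount(labeled.ravel())[1:]  # component sizes (drop background)
        coherent = int(sizes[sizes >= tau].sum())
        out[b] = coherent, int(mask.sum()) - coherent
    return out

def ccv_distance(field_a, field_b, **kw):
    """L1 distance between the CCVs of two estimated fields."""
    return int(np.abs(ccv(field_a, **kw) - ccv(field_b, **kw)).sum())
```

Unlike a plain histogram distance, the coherent/incoherent split penalizes fields whose values are scattered rather than spatially clustered, which is the property the paper exploits.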
  5. Abstract. Most deep learning (DL) methods that are not end-to-end rely on several multi-scale, multi-type hand-crafted features, which makes the network harder to train, more computationally intensive, and more vulnerable to overfitting. Furthermore, reliance on empirically chosen feature dimensionality reduction may lead to misclassification. In contrast, efficient feature management can reduce storage and computational complexity, build better classifiers, and improve overall performance. Principal Component Analysis (PCA) is a well-known dimension-reduction technique that has been used for feature extraction. This paper presents a two-step PCA-based feature extraction algorithm that employs a variant of feature-based PointNet (Qi et al., 2017a) for point cloud classification. The paper extends the PointNet framework to large-scale aerial LiDAR data and contributes by (i) developing a new feature extraction algorithm, (ii) exploring the impact of dimensionality reduction in feature extraction, and (iii) introducing a non-end-to-end PointNet variant for per-point classification in point clouds. This is demonstrated on aerial laser scanning (ALS) point clouds. The algorithm successfully reduces the dimension of the feature space without sacrificing performance, as benchmarked against the original PointNet algorithm. When tested on the well-known Vaihingen data set, the proposed algorithm achieves an Overall Accuracy (OA) of 74.64% using 9 input vectors and 14 shape features, whereas with the same 9 input vectors and only 5 PCs (principal components built from the 14 shape features) it achieves a higher OA of 75.36%, which demonstrates the benefit of efficient dimensionality reduction.
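The dimensionality-reduction step can be sketched as follows: the 14 hand-crafted shape features per point are compressed into 5 principal components and concatenated with the 9 raw input vectors before classification. The array shapes, the standardization step, and the downstream classifier are assumptions for illustration; the PointNet variant itself is omitted.

```python
# Illustrative sketch of PCA compression of per-point shape features.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def reduce_shape_features(shape_feats, n_components=5):
    """Compress (n_points, 14) hand-crafted shape features into n_components PCs."""
    scaled = StandardScaler().fit_transform(shape_feats)
    return PCA(n_components=n_components).fit_transform(scaled)

# Assumed shapes: raw_inputs (n_points, 9), shape_feats (n_points, 14).
# The per-point classifier input is the 9 raw inputs plus the 5 PCs:
# features = np.hstack([raw_inputs, reduce_shape_features(shape_feats)])
```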