We develop theory to understand an intriguing property of diffusion models for image generation that we term critical windows. Empirically, it has been observed that there are narrow time intervals in sampling during which particular features of the final image emerge, e.g. the image class or background color (Ho et al., 2020b; Meng et al., 2022; Choi et al., 2022; Raya & Ambrogioni, 2023; Georgiev et al., 2023; Sclocchi et al., 2024; Biroli et al., 2024). While this is advantageous for interpretability as it implies one can localize properties of the generation to a small segment of the trajectory, it seems at odds with the continuous nature of the diffusion. We propose a formal framework for studying these windows and show that for data coming from a mixture of strongly log-concave densities, these windows can be provably bounded in terms of certain measures of inter- and intra-group separation. We also instantiate these bounds for concrete examples like well-conditioned Gaussian mixtures. Finally, we use our bounds to give a rigorous interpretation of diffusion models as hierarchical samplers that progressively "decide" output features over a discrete sequence of times. We validate our bounds with synthetic experiments. Additionally, preliminary experiments on Stable Diffusion suggest critical windows may serve as a useful tool for diagnosing fairness and privacy violations in real-world diffusion models. 
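As a toy illustration of such a window (our own construction, not an experiment from the paper), the sketch below runs the exact reverse diffusion for a one-dimensional mixture of two Gaussians and records when each trajectory commits to a component. The mixture parameters, noise schedule, and commitment threshold are all assumptions chosen for demonstration; most trajectories commit inside a narrow band of times.

```python
# Toy sketch: reverse diffusion for a 1-D two-Gaussian mixture, tracking when
# each sampling trajectory "commits" to a component. All constants are
# illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 3.0, 0.25                 # mixture components N(+/-mu, sigma2)
T, n_steps, n_traj = 5.0, 500, 2000
dt = T / n_steps

def diffused(t):
    """Mean offset and variance of each component after OU noising to time t."""
    m = mu * np.exp(-t)
    v = sigma2 * np.exp(-2.0 * t) + 1.0 - np.exp(-2.0 * t)
    return m, v

def score(x, t):
    """Exact score of the noised mixture 0.5 N(m, v) + 0.5 N(-m, v)."""
    m, v = diffused(t)
    return (m * np.tanh(m * x / v) - x) / v

x = rng.standard_normal(n_traj)        # initialize near the stationary N(0, 1)
commit = np.full(n_traj, np.nan)       # time at which each trajectory commits
for k in range(n_steps):
    t = T - k * dt
    # Euler-Maruyama step of the reverse SDE for dx = -x dt + sqrt(2) dW
    x += dt * (x + 2.0 * score(x, t)) + np.sqrt(2.0 * dt) * rng.standard_normal(n_traj)
    m, v = diffused(t - dt)
    post = 0.5 * (1.0 + np.tanh(m * x / v))   # P(component +mu | x_t)
    new = np.isnan(commit) & (np.abs(post - 0.5) > 0.45)
    commit[new] = t - dt
print("most trajectories decide their component between t =",
      np.nanpercentile(commit, 90), "and t =", np.nanpercentile(commit, 10))
```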
Learning Reduplication with a Neural Network that Lacks Explicit Variables
Reduplicative linguistic patterns have been used as evidence for explicit algebraic variables in models of cognition. Here, we show that a variable-free neural network can model these patterns in a way that predicts observed human behavior. Specifically, we successfully simulate the three experiments presented by Marcus et al. (1999), as well as Endress et al.’s (2007) partial replication of one of those experiments. We then explore the model’s ability to generalize reduplicative mappings to different kinds of novel inputs. Using Berent’s (2013) scopes of generalization as a metric, we claim that the model matches the scope of generalization that has been observed in humans. We argue that these results challenge past claims about the necessity of symbolic variables in models of cognition.
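To make the claim concrete, here is a minimal variable-free sketch of our own (not the authors' network): a single linear layer trained on distributed syllable vectors learns the copy mapping at the heart of reduplication and generalizes to novel syllables, but fails outside the span of its training features, loosely echoing Berent's scopes of generalization. The dimensions and encodings are arbitrary assumptions.

```python
# Toy sketch: a one-layer linear network (no explicit variables) learns the
# reduplicative "copy" mapping from distributed syllable vectors.
import numpy as np

rng = np.random.default_rng(1)
d, n_train = 16, 200
X = rng.standard_normal((n_train, d))       # training syllables (distributed codes)
Y = X.copy()                                # reduplication target: copy the input
W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # fit the linear map by least squares

X_novel = rng.standard_normal((50, d))      # novel syllables, same feature space
err = np.mean((X_novel @ W - X_novel) ** 2)
print(f"copy error on novel syllables: {err:.2e}")   # ~0: full generalization

# Berent-style scope probe: leave one feature unattested in training, then
# test on novel syllables that do use it.
X_sub = X.copy(); X_sub[:, -1] = 0.0        # last feature never seen in training
W_sub, *_ = np.linalg.lstsq(X_sub, X_sub, rcond=None)
err_out = np.mean((X_novel @ W_sub - X_novel) ** 2)
print(f"copy error outside training scope: {err_out:.2e}")   # fails on that feature
```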
- Award ID(s): 1650957
- PAR ID: 10356594
- Date Published:
- Journal Name: Journal of Language Modelling
- Volume: 10
- Issue: 1
- ISSN: 2299-856X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Statistical modeling is generally meant to describe patterns in data in service of the broader scientific goal of developing theories to explain those patterns. Statistical models support meaningful inferences when models are built so as to align parameters of the model with potential causal mechanisms and how they manifest in data. When statistical models are instead based on assumptions chosen by default, attempts to draw inferences can be uninformative or even paradoxical—in essence, the tail is trying to wag the dog. These issues are illustrated by van Doorn et al. (this issue) in the context of using Bayes Factors to identify effects and interactions in linear mixed models. We show that the problems identified in their applications (along with other problems identified here) can be circumvented by using priors over inherently meaningful units instead of default priors on standardized scales. This case study illustrates how researchers must directly engage with a number of substantive issues in order to support meaningful inferences, of which we highlight two: The first is the problem of coordination, which requires a researcher to specify how the theoretical constructs postulated by a model are functionally related to observable variables. The second is the problem of generalization, which requires a researcher to consider how a model may represent theoretical constructs shared across similar but non-identical situations, along with the fact that model comparison metrics like Bayes Factors do not directly address this form of generalization. For statistical modeling to serve the goals of science, models cannot be based on default assumptions, but should instead be based on an understanding of their coordination function and on how they represent causal mechanisms that may be expected to generalize to other related scenarios.
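A minimal sketch of the underlying point, with hypothetical numbers rather than any analysis from van Doorn et al.: under a normal approximation, a Bayes factor computed with a wide default prior can report much weaker evidence for the same observed effect than one computed with a prior scaled in meaningful units (here, milliseconds).

```python
# Toy sketch: normal-approximation Bayes factor for one effect under a default
# standardized prior vs. a prior in meaningful units. All numbers hypothetical.
import numpy as np

def bf10(b, se, tau):
    """BF10 for b | beta ~ N(beta, se^2), with H1: beta ~ N(0, tau^2), H0: beta = 0."""
    def logpdf(x, var):
        return -0.5 * (np.log(2 * np.pi * var) + x**2 / var)
    return np.exp(logpdf(b, tau**2 + se**2) - logpdf(b, se**2))

b, se = 20.0, 8.0            # observed priming effect: 20 ms, SE 8 ms
resid_sd = 150.0             # residual SD used to standardize effects

tau_default = 1.0 * resid_sd # default unit-scale prior on the standardized effect
tau_domain = 25.0            # domain knowledge: such effects rarely exceed ~50 ms
print(f"BF10, default standardized prior: {bf10(b, se, tau_default):.2f}")  # ~1.2
print(f"BF10, prior in milliseconds:      {bf10(b, se, tau_domain):.2f}")   # ~5.2
```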
Proceedings of the 29th International Conference on DNA Computing and Molecular Programming (DNA 29), Ho-Lin Chen and Constantine G. Evans (Eds.). We present an abstract model of self-assembly of systems composed of "crisscross slats", which have been experimentally implemented as a single-stranded piece of DNA [Minev et al., 2021] or as a complete DNA origami structure [Wintersinger et al., 2022]. We then introduce a more physically realistic kinetic model and show how important constants in the model were derived and tuned, and compare simulation-based results to experimental results [Minev et al., 2021; Wintersinger et al., 2022]. Using these models, we show how we can apply optimizations to designs of slat systems in order to lower the numbers of unique slat types required to build target structures. In general, we apply two types of techniques to achieve greatly reduced numbers of slat types. Similar to the experimental work implementing DNA origami-based slats, in our designs the slats oriented in horizontal and vertical directions are each restricted to their own plane, and sets of them overlap each other in square regions which we refer to as macrotiles. Our first technique extends their previous work of reusing slat types within macrotiles and requires analyses of binding domain patterns to determine the potential for errors consisting of incorrect slat types attaching at undesired translations and reflections. The second technique leverages the power of algorithmic self-assembly to efficiently reuse entire macrotiles which self-assemble in patterns following designed algorithms that dictate the dimensions and patterns of growth. Using these designs, we demonstrate that in kinetic simulations the systems with reduced numbers of slat types self-assemble more quickly than those with greater numbers. This provides evidence that such optimizations will also result in greater assembly speeds in experimental systems. Furthermore, the reduced numbers of slat types required have the potential to vastly reduce the cost and number of lab steps for crisscross assembly experiments.
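As a rough sketch of the kind of binding-domain analysis described above (our simplification, not the paper's algorithm), one can count coincident domain labels between slat pairs over all undesired translations and reflections and flag pairs that exceed an assumed error threshold:

```python
# Toy sketch: flag slat pairs that could erroneously attach at an undesired
# translation or reflection. Domain alphabet, slat length, and threshold are
# arbitrary assumptions for illustration.
import numpy as np

rng = np.random.default_rng(2)
n_slats, length, alphabet = 8, 32, 12
slats = rng.integers(0, alphabet, size=(n_slats, length))

def max_spurious_matches(a, b):
    """Most coincident domains over all nonzero shifts and the reflection."""
    best = 0
    for cand in (b, b[::-1]):                 # translation, then reflection
        for shift in range(-length + 1, length):
            if shift == 0 and cand is b:
                continue                      # skip the designed alignment
            lo, hi = max(0, shift), min(length, length + shift)
            overlap = a[lo:hi] == cand[lo - shift:hi - shift]
            best = max(best, int(overlap.sum()))
    return best

THRESH = 4   # assume >= 4 coincident domains risks a kinetically trapped error
for i in range(n_slats):
    for j in range(i, n_slats):
        m = max_spurious_matches(slats[i], slats[j])
        if m >= THRESH:
            print(f"slats {i},{j}: {m} coincident domains at some offset")
```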
Invariant Causal Prediction (Peters et al., 2016) is a technique for out-of-distribution generalization which assumes that some aspects of the data distribution vary across the training set but that the underlying causal mechanisms remain constant. Recently, Arjovsky et al. (2019) proposed Invariant Risk Minimization (IRM), an objective based on this idea for learning deep, invariant features of data which are a complex function of latent variables; many alternatives have subsequently been suggested. However, formal guarantees for all of these works are severely lacking. In this paper, we present the first analysis of classification under the IRM objective—as well as these recently proposed alternatives—under a fairly natural and general model. In the linear case, we give simple conditions under which the optimal solution succeeds or, more often, fails to recover the optimal invariant predictor. We furthermore present the very first results in the non-linear regime: we demonstrate that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution—this is precisely the issue that it was intended to solve. Thus, in this setting we find that IRM and its alternatives fundamentally do not improve over standard Empirical Risk Minimization.
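For concreteness, here is a small sketch of the IRMv1 penalty (Arjovsky et al., 2019) in the linear regime the paper analyzes; the data-generating process and constants are our own illustrative choices. The invariant predictor incurs near-zero penalty across environments, while a spurious predictor whose usefulness varies across environments is heavily penalized:

```python
# Toy sketch of the IRMv1 penalty for squared loss in the linear case.
import numpy as np

rng = np.random.default_rng(3)

def make_env(n, e):
    """y depends causally on x1; x2 is a spurious proxy whose strength varies."""
    x1 = rng.standard_normal(n)
    y = x1 + 0.1 * rng.standard_normal(n)
    x2 = y * e + 0.1 * rng.standard_normal(n)
    return np.stack([x1, x2], axis=1), y

envs = [make_env(5000, 1.0), make_env(5000, 0.2)]

def risk_and_penalty(beta):
    """Total squared risk plus the IRMv1 penalty |d/dw R_e(w * beta)|^2 at w=1."""
    risk, penalty = 0.0, 0.0
    for X, y in envs:
        pred = X @ beta
        risk += np.mean((pred - y) ** 2)
        grad_w = 2 * np.mean(pred * (pred - y))   # closed form for squared loss
        penalty += grad_w ** 2
    return risk, penalty

for name, beta in [("invariant [1, 0]", np.array([1.0, 0.0])),
                   ("spurious  [0, 1]", np.array([0.0, 1.0]))]:
    r, p = risk_and_penalty(beta)
    print(f"{name}: risk={r:.3f}  IRMv1 penalty={p:.4f}")
```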
Mode connectivity (Garipov et al., 2018; Draxler et al., 2018) is a surprising phenomenon in the loss landscape of deep nets. Optima—at least those discovered by gradient-based optimization—turn out to be connected by simple paths on which the loss function is almost constant. Often, these paths can be chosen to be piecewise linear, with as few as two segments. We give mathematical explanations for this phenomenon, assuming generic properties (such as dropout stability and noise stability) of well-trained deep nets, which have previously been identified as part of understanding the generalization properties of deep nets. Our explanation holds for realistic multilayer nets, and experiments are presented to verify the theory.
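The geometry can be illustrated on a toy two-dimensional loss (our example, not the paper's construction): between two optima on a ring-shaped valley, the straight path crosses a high-loss barrier, while a two-segment piecewise-linear path through a suitable bend point stays at low loss throughout.

```python
# Toy sketch: loss barrier along a direct vs. a two-segment piecewise-linear
# path between two optima of a ring-shaped loss surface.
import numpy as np

def loss(w):
    """Ring-shaped valley: zero loss on the unit circle."""
    return (w[..., 0] ** 2 + w[..., 1] ** 2 - 1.0) ** 2

theta_a = np.array([1.0, 0.0])   # two optima (stand-ins for trained nets)
theta_b = np.array([-1.0, 0.0])
bend = np.array([0.0, 1.0])      # bend point, also near the valley floor

def path_max_loss(points, n=200):
    """Maximum loss along the piecewise-linear path through the given points."""
    ts = np.linspace(0.0, 1.0, n)[:, None]
    segs = [a + ts * (b - a) for a, b in zip(points, points[1:])]
    return max(loss(seg).max() for seg in segs)

print("barrier, direct path:", path_max_loss([theta_a, theta_b]))        # 1.0
print("barrier, two-segment:", path_max_loss([theta_a, bend, theta_b]))  # 0.25
```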