Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

This paper studies the fundamental problem of multilayer generator models in learning hierarchical representations. The multilayer generator model that consists of multiple layers of latent variables organized in a topdown architecture tends to learn multiple levels of data abstraction. However, such multilayer latent variables are typically parameterized to be Gaussian, which can be less informative in capturing complex abstractions, resulting in limited success in hierarchical representation learning. On the other hand, the energybased (EBM) prior is known to be expressive in capturing the data regularities, but it often lacks the hierarchical structure to capture different levels of hierarchical representations. In this paper, we propose a joint latent space EBM prior model with multilayer latent variables for effective hierarchical representation learning. We develop a variational joint learning scheme that seamlessly integrates an inference model for efficient inference. Our experiments demonstrate that the proposed joint EBM prior is effective and expressive in capturing hierarchical representations and modeling data distribution.more » « lessFree, publiclyaccessible full text available September 1, 2024

The capability to generate responses with diversity and faithfulness using factual knowledge is paramount for creating a humanlike, trustworthy dialogue system. Common strategies either adopt a twostep paradigm, which optimizes knowledge selection and response generation separately and may overlook the inherent correlation between these two tasks, or leverage conditional variational method to jointly optimize knowledge selection and response generation by employing an inference network. In this paper, we present an endtoend learning framework, termed Sequential Posterior Inference (SPI), capable of se lecting knowledge and generating dialogues by approximately sampling from the posterior distribution. Unlike other methods, SPI does not require the inference network or assume a simple geometry of the posterior distribution. This straightforward and intuitive inference procedure of SPI directly queries the response generation model, allowing for accurate knowledge selection and generation of faithful responses. In addition to modeling contributions, our experimental results on two common dialogue datasets (Wizard of Wikipedia and HollE) demonstrate that SPI outperforms previous strong baselines according to both automatic and human evaluation metrics.more » « lessFree, publiclyaccessible full text available October 1, 2024

Free, publiclyaccessible full text available September 1, 2024

Generation of molecules with desired chemical and biological properties such as high druglikeness, high binding affinity to target proteins, is critical for drug discovery. In this paper, we propose a probabilistic generative model to capture the joint distribution of molecules and their properties. Our model assumes an energybased model (EBM) in the latent space. Conditional on the latent vector, the molecule and its properties are modeled by a molecule generation model and a property regression model respectively. To search for molecules with desired properties, we propose a sampling with gradual distribution shifting (SGDS) algorithm, so that after learning the model initially on the training data of existing molecules and their properties, the proposed algorithm gradually shifts the model distribution towards the region supported by molecules with desired values of properties. Our experiments show that our method achieves very strong performances on various molecule design tasks.more » « lessFree, publiclyaccessible full text available August 1, 2024

This paper studies the fundamental problem of learning multilayer generator models. The multilayer generator model builds multiple layers of latent variables as a prior model on top of the generator, which benefits learning complex data distribution and hierarchical representations. However, such a prior model usually focuses on modeling interlayer relations between latent variables by assuming noninformative (conditional) Gaussian distributions, which can be limited in model expressivity. To tackle this issue and learn more expressive prior models, we propose an energybased model (EBM) on the joint latent space over all layers of latent variables with the multilayer generator as its backbone. Such joint latent space EBM prior model captures the intralayer contextual relations at each layer through layerwise energy terms, and latent variables across different layers are jointly corrected. We develop a joint training scheme via maximum likelihood estimation (MLE), which involves Markov Chain Monte Carlo (MCMC) sampling for both prior and posterior distributions of the latent variables from different layers. To ensure efficient inference and learning, we further propose a variational training scheme where an inference model is used to amortize the costly posterior MCMC sampling. Our experiments demonstrate that the learned model can be expressive in generating highquality images and capturing hierarchical features for better outlier detection.more » « less

We consider concept generalization at a large scale in the diverse and natural visual spectrum. Established computational modes (i.e., rulebased or similaritybased) are primarily studied isolated and focus on confined and abstract problem spaces. In this work, we study these two modes when the problem space scales up, and the complexity of concepts becomes diverse. Specifically, at the representational level, we seek to answer how the complexity varies when a visual concept is mapped to the representation space. Prior psychology literature has shown that two types of complexities (i.e., subjective complexity and visual complexity) build an invertedU relation. Leveraging the Representativeness of Attribute (RoA), we computationally confirm the following observation: Models use attributes with high RoA to describe visual concepts, and the description length falls in an invertedU relation with the increment in visual complexity. At the computational level, we aim to answer how the complexity of representation affects the shift between the rule and similaritybased generalization. We hypothesize that categoryconditioned visual modeling estimates the cooccurrence frequency between visual and categorical attributes, thus potentially serving as the prior for the natural visual world. Experimental results show that representations with relatively high subjective complexity outperform those with relatively low subjective complexity in the rulebased generalization, while the trend is the opposite in the similaritybased generalization.more » « lessFree, publiclyaccessible full text available October 1, 2024

In this work, we tackle two widespread challenges in real applications for time series forecasting that have been largely understudied: distribution shifts and missing data. We propose SpectraNet, a novel multivariate timeseries forecasting model that dynamically infers a latent space spectral decomposition to capture current temporal dynamics and correlations on the recent observed history. A Convolution Neural Network maps the learned representation by sequentially mixing its components and refining the output. Our proposed approach can simultaneously produce forecasts and interpolate past observations and can, therefore, greatly simplify production systems by unifying imputation and forecasting tasks into a single model. SpectraNet achieves SoTA performance simultaneously on both tasks on five benchmark datasets, compared to forecasting and imputation models, with up to 92% fewer parameters and comparable training times. On settings with up to 80% missing data, SpectraNet has average performance improvements of almost 50% over the secondbest alternative.more » « less

Inspired by humans’ exceptional ability to master arithmetic and generalize to new problems, we present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines’ capability of learning generalizable concepts at three levels: perception, syntax, and semantics. In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images (i.e., perception), how multiple concepts are structurally combined to form a valid expression (i.e., syntax), and how concepts are realized to afford various reasoning tasks (i.e., semantics), all in a weakly supervised manner. Focusing on systematic generalization, we carefully design a fivefold test set to evaluate both the interpolation and the extrapolation of learned concepts w.r.t. the three levels. Further, we design a fewshot learning split to determine whether or not models can rapidly learn new concepts and generalize them to more complex scenarios. To comprehend existing models’ limitations, we undertake extensive experiments with various sequencetosequence models, including RNNs, Transformers, and GPT3 (with the chain of thought prompting). The results indicate that current models struggle to extrapolate to longrange syntactic dependency and semantics. Models exhibit a considerable gap toward humanlevel generalization when evaluated with new concepts in a fewshot setting. Moreover, we discover that it is infeasible to solve HINT by merely scaling up the dataset and the model size; this strategy contributes little to the extrapolation of syntax and semantics. Finally, in zeroshot GPT3 experiments, the chain of thought prompting exhibits impressive results and significantly boosts the test accuracy. We believe the HINT dataset and the experimental findings are of great interest to the learning community on systematic generalization.more » « less

Mathematical reasoning, a core ability of human intelligence, presents unique challenges for machines in abstract thinking and logical reasoning. Recent large pretrained language models such as GPT3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data. To fill the gap, we present Tabular Math Word Problems (TABMWP), a new dataset containing 38,431 opendomain gradelevel problems that require mathematical reasoning on both textual and tabular data. Each question in TABMWP is aligned with a tabular context, which is presented as an image, semistructured text, and a structured table. There are two types of questions: freetext and multichoice, and each problem is annotated with gold solutions to reveal the multistep reasoning process. We evaluate different pretrained models on TABMWP, including the GPT3 model in a fewshot setting. As earlier studies suggest, since fewshot GPT3 relies on the selection of incontext examples, its performance is unstable and can degrade to near chance. The unstable issue is more severe when handling complex problems like TABMWP. To mitigate this, we further propose a novel approach, PROMPTPG, which utilizes policy gradient to learn to select incontext examples from a small amount of training data and then constructs the corresponding prompt for the test example. Experimental results show that our method outperforms the best baseline by 5.31% on the accuracy metric and reduces the prediction variance significantly compared to random selection, which verifies its effectiveness in selecting incontext examples.more » « less

Equivariant representation is necessary for the brain and artificial perceptual systems to faithfully represent the stimulus under some (Lie) group transformations. However, it remains unknown how recurrent neural circuits in the brain represent the stimulus equivariantly, nor the neural representation of abstract group operators. The present study uses a onedimensional (1D) translation group as an example to explore the general recurrent neural circuit mechanism of the equivariant stimulus representation. We found that a continuous attractor network (CAN), a canonical neural circuit model, selfconsistently generates a continuous family of stationary population responses (attractors) that represents the stimulus equivariantly. Inspired by the Drosophila’s compass circuit, we found that the 1D translation operators can be represented by extra speed neurons besides the CAN, where speed neurons’ responses represent the moving speed (1D translation group parameter), and their feedback connections to the CAN represent the translation generator (Lie algebra). We demonstrated that the network responses are consistent with experimental data. Our model for the first time demonstrates how recurrent neural circuitry in the brain achieves equivariant stimulus representation.more » « less