skip to main content


Search for: All records

Creators/Authors contains: "Pang, B."

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. The capability to generate responses with diversity and faithfulness using factual knowledge is paramount for creating a human-like, trustworthy dialogue system. Common strategies either adopt a two-step paradigm, which optimizes knowledge selection and response generation separately and may overlook the inherent correlation between these two tasks, or leverage conditional variational method to jointly optimize knowledge selection and response generation by employing an inference network. In this paper, we present an end-to-end learning framework, termed Sequential Posterior Inference (SPI), capable of se- lecting knowledge and generating dialogues by approximately sampling from the posterior distribution. Unlike other methods, SPI does not require the inference network or assume a simple geometry of the posterior distribution. This straightforward and intuitive inference procedure of SPI directly queries the response generation model, allowing for accurate knowledge selection and generation of faithful responses. In addition to modeling contributions, our experimental results on two common dialogue datasets (Wizard of Wikipedia and Holl-E) demonstrate that SPI outperforms previous strong baselines according to both automatic and human evaluation metrics. 
    more » « less
    Free, publicly-accessible full text available October 1, 2024
  2. Generation of molecules with desired chemical and biological properties such as high drug-likeness, high binding affinity to target proteins, is critical for drug discovery. In this paper, we propose a probabilistic generative model to capture the joint distribution of molecules and their properties. Our model assumes an energy-based model (EBM) in the latent space. Conditional on the latent vector, the molecule and its properties are modeled by a molecule generation model and a property regression model respectively. To search for molecules with desired properties, we propose a sampling with gradual distribution shifting (SGDS) algorithm, so that after learning the model initially on the training data of existing molecules and their properties, the proposed algorithm gradually shifts the model distribution towards the region supported by molecules with desired values of properties. Our experiments show that our method achieves very strong performances on various molecule design tasks. 
    more » « less
    Free, publicly-accessible full text available August 1, 2024
  3. Learning energy-based model (EBM) requires MCMC sampling of the learned model as an inner loop of the learning algorithm. However, MCMC sampling of EBMs in high-dimensional data space is generally not mixing, because the energy function, which is usually parametrized by deep network, is highly multi-modal in the data space. This is a serious handicap for both theory and practice of EBMs. In this paper, we propose to learn EBM with a flow-based model (or in general latent variable model) serving as a backbone, so that the EBM is a correction or an exponential tilting of the flow-based model. We show that the model has a particularly simple form in the space of the latent variables of the generative model, and MCMC sampling of the EBM in the latent space mixes well and traverses modes in the data space. This enables proper sampling and learning of EBMs. 
    more » « less
  4. null (Ed.)
    Cyber-Physical Systems (CPS) are important components of critical infrastructure and must operate with high levels of reliability and security. We propose a conceptual approach to securing CPSs: the Cyber-Physical Immune System (CPIS), a collection of hardware and software elements deployed on top of a conventional CPS. Inspired by its biological counterpart, the CPIS comprises an independent network of distributed computing units that collects data from the conventional CPS, utilizes data-driven techniques to identify threats, adapts to the changing environment, alerts the user of any threats or anomalies, and deploys threat-mitigation strategies. 
    more » « less
  5. Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in generative modeling. Fueled by its flexibility in the formulation and strong modeling power of the latent space, recent works built upon it have made interesting attempts aiming at the interpretability of text modeling. However, latent space EBMs also inherit some flaws from EBMs in data space; the degenerate MCMC sampling quality in practice can lead to poor generation quality and instability in training, especially on data with complex latent structures. Inspired by the recent efforts that leverage diffusion recovery likelihood learning as a cure for the sampling issue, we introduce a novel symbiosis between the diffusion models and latent space EBMs in a variational learning framework, coined as the latent diffusion energy-based model. We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space. Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts. 
    more » « less
  6. We propose to learn energy-based model (EBM) in the latent space of a generator model, so that the EBM serves as a prior model that stands on the top-down networkofthegeneratormodel. BoththelatentspaceEBMandthetop-down network can be learned jointly by maximum likelihood, which involves short-run MCMC sampling from both the prior and posterior distributions of the latent vector. Due to the low dimensionality of the latent space and the expressiveness of the top-down network, a simple EBM in latent space can capture regularities in the data effectively, and MCMC sampling in latent space is efficient and mixes well. We show that the learned model exhibits strong performances in terms of image and text generation and anomaly detection. The one-page code can be found in supplementary materials. 
    more » « less
  7. Human trajectory prediction is critical for autonomous platforms like self-driving cars or social robots. We present a latent belief energy-based model (LB-EBM) for diverse human trajectory forecast. LB-EBM is a probabilistic model with cost function defined in the latent space to account for the movement history and social context. The low dimensionality of the latent space and the high expressivity of the EBM make it easy for the model to capture the multimodality of pedestrian trajectory distributions. LB-EBM is learned from expert demonstrations (i.e., human trajectories) projected into the latent space. Sampling from or optimizing the learned LB-EBM yields a belief vector which is used to make a path plan, which then in turn helps to predict a long-range trajectory. The effectiveness of LB-EBM and the two-step approach are supported by strong empirical results. Our model is able to make accurate, multi-modal, and social compliant trajectory predictions and improves over prior state-of-the-arts performance on the Stanford Drone trajectory prediction benchmark by 10:9% and on the ETH-UCY benchmark by 27:6%. 
    more » « less
  8. Latent variable models for text, when trained successfully, accurately model the data distribution and capture global semantic and syntactic features of sentences. The prominent approach to train such models is variational autoencoders (VAE). It is nevertheless challenging to train and often results in a trivial local optimum where the latent variable is ignored and its posterior collapses into the prior, an issue known as posterior collapse. Various techniques have been proposed to mitigate this issue. Most of them focus on improving the inference model to yield latent codes of higher quality. The present work proposes a short run dynamics for inference. It is initialized from the prior distribution of the latent variable and then runs a small number (e.g., 20) of Langevin dynamics steps guided by its posterior distribution. The major advantage of our method is that it does not require a separate inference model or assume simple geometry of the posterior distribution, thus rendering an automatic, natural and flexible inference engine. We show that the models trained with short run dynamics more accurately model the data, compared to strong language model and VAE baselines, and exhibit no sign of posterior collapse. Analyses of the latent space show that interpolation in the latent space is able to generate coherent sentences with smooth transition and demonstrate improved classification over strong baselines with latent features from unsupervised pretraining. These results together expose a well-structured latent space of our generative model. 
    more » « less
  9. This paper studies the fundamental problem of learning deep generative models that consist of multiple layers of latent variables organized in top-down architectures. Such models have high expressivity and allow for learning hierarchical representations. Learning such a generative model requires inferring the latent variables for each training example based on the posterior distribution of these latent variables. The inference typically requires Markov chain Monte Caro (MCMC) that can be time consuming. In this paper, we propose to use noise initialized non-persistent short run MCMC, such as nite step Langevin dynamics initialized from the prior distribution of the latent variables, as an approximate inference engine, where the step size of the Langevin dynamics is variationally optimized by minimizing the Kullback-Leibler divergence between the distribution produced by the short run MCMC and the posterior distribution. Our experiments show that the proposed method outperforms variational auto-encoder (VAE) in terms of reconstruction error and synthesis quality. The advantage of the proposed method is that it is simple and automatic without the need to design an inference model. 
    more » « less
  10. Abstract We present our current best estimate of the plausible observing scenarios for the Advanced LIGO, Advanced Virgo and KAGRA gravitational-wave detectors over the next several years, with the intention of providing information to facilitate planning for multi-messenger astronomy with gravitational waves. We estimate the sensitivity of the network to transient gravitational-wave signals for the third (O3), fourth (O4) and fifth observing (O5) runs, including the planned upgrades of the Advanced LIGO and Advanced Virgo detectors. We study the capability of the network to determine the sky location of the source for gravitational-wave signals from the inspiral of binary systems of compact objects, that is binary neutron star, neutron star–black hole, and binary black hole systems. The ability to localize the sources is given as a sky-area probability, luminosity distance, and comoving volume. The median sky localization area (90% credible region) is expected to be a few hundreds of square degrees for all types of binary systems during O3 with the Advanced LIGO and Virgo (HLV) network. The median sky localization area will improve to a few tens of square degrees during O4 with the Advanced LIGO, Virgo, and KAGRA (HLVK) network. During O3, the median localization volume (90% credible region) is expected to be on the order of $$10^{5}, 10^{6}, 10^{7}\mathrm {\ Mpc}^3$$ 10 5 , 10 6 , 10 7 Mpc 3 for binary neutron star, neutron star–black hole, and binary black hole systems, respectively. The localization volume in O4 is expected to be about a factor two smaller than in O3. We predict a detection count of $$1^{+12}_{-1}$$ 1 - 1 + 12 ( $$10^{+52}_{-10}$$ 10 - 10 + 52 ) for binary neutron star mergers, of $$0^{+19}_{-0}$$ 0 - 0 + 19 ( $$1^{+91}_{-1}$$ 1 - 1 + 91 ) for neutron star–black hole mergers, and $$17^{+22}_{-11}$$ 17 - 11 + 22 ( $$79^{+89}_{-44}$$ 79 - 44 + 89 ) for binary black hole mergers in a one-calendar-year observing run of the HLV network during O3 (HLVK network during O4). We evaluate sensitivity and localization expectations for unmodeled signal searches, including the search for intermediate mass black hole binary mergers. 
    more » « less