Creators/Authors contains: "Liu, B."

  1. Free, publicly-accessible full text available May 1, 2023
  2. Free, publicly-accessible full text available March 10, 2023
  3. It has been recognized that jobs across different domains are becoming more data driven, and that many aspects of the economy, society, and daily life depend increasingly on data. Undergraduate education offers a critical link in providing more data science and engineering (DSE) exposure to students and expanding the supply of DSE talent. The National Academies have identified that effective DSE education requires both appropriate classwork and hands-on experience with real data and real applications. Currently, significant progress has been made in classwork, while progress in hands-on research experience has been lacking. To fill this gap, we have proposed to create data-enabled engineering project (DEEP) modules based on real data and applications; the project is currently funded by the National Science Foundation (NSF) under the Improving Undergraduate STEM Education (IUSE) program. To achieve the project goal, we have developed two internet-of-things (IoT) enabled laboratory engineering testbeds (LETs) and generated real data under various application scenarios. In addition, we have designed and developed several sample DEEP modules in interactive Jupyter Notebook using the generated data. These sample DEEP modules will also be ported to other interactive DSE learning environments, including Matlab Live Script and R Markdown, for wide and easy adoption. Finally, we have conducted metacognitive awareness gain (MAG) assessments to establish a baseline for assessing the effectiveness of DEEP modules in enhancing students' reflection and metacognition. The DEEP modules currently being developed target students in Chemical Engineering, Electrical Engineering, Computer Science, and the MS program in Data Science at xxx University. The modules will be deployed in the Spring of 2021, and we expect them to have an immediate impact on the targeted classes and students.
     We also anticipate that the DEEP modules can be adopted without modification by other Engineering disciplines, such as Mechanical, Industrial, and Aerospace Engineering. They can also be easily extended to disciplines in other colleges, such as Liberal Arts, by incorporating real data and applications from the respective disciplines. In this work, we will share our ideas, the rationale behind the proposed approach, the planned tasks for the project, a demonstration of the modules developed, and potential dissemination venues.
    Free, publicly-accessible full text available July 26, 2022
  4. Training deep neural models in the presence of corrupted supervision is challenging, as the corrupted data points may significantly impact the generalization performance. To alleviate this problem, we present an efficient robust algorithm that achieves strong guarantees without any assumption on the type of corruption and provides a unified framework for both classification and regression problems. Unlike many existing approaches that quantify the quality of the data points (e.g., based on their individual loss values) and filter them accordingly, the proposed algorithm focuses on controlling the collective impact of data points on the average gradient. Even when a corrupted data point is not excluded by our algorithm, it will have a very limited impact on the overall loss compared with state-of-the-art filtering methods based on loss values. Extensive experiments on multiple benchmark datasets have demonstrated the robustness of our algorithm under different types of corruption.
  5. Boosting is a widely used technique in machine learning for solving classification problems. In boosting, one predicts the label of an example using an ensemble of weak classifiers. While boosting has shown tremendous success on many classification problems involving tabular data, it performs poorly on complex classification tasks involving low-level features, such as image classification. This drawback stems from the fact that boosting builds an additive model of weak classifiers, each of which has very little predictive power. Often, the resulting additive models are not powerful enough to approximate the complex decision boundaries of real-world classification problems. In this work, we present a general framework for boosting where, similar to traditional boosting, we aim to boost the performance of a weak learner and transform it into a strong learner. However, unlike traditional boosting, our framework allows for more complex forms of aggregation of weak learners. We specifically focus on one form of aggregation: function composition. We show that many popular greedy algorithms for learning deep neural networks (DNNs) can be derived from our framework using function compositions for aggregation. Moreover, we identify the drawbacks of these greedy algorithms and propose new algorithms that fix these issues. Using thorough empirical evaluation, we show that our learning algorithms have superior performance over traditional additive boosting algorithms, as well as existing greedy learning techniques for DNNs. An important feature of our algorithms is that they come with strong theoretical guarantees.
  6. The accurate simulation of additional interactions at the ATLAS experiment for the analysis of proton–proton collisions delivered by the Large Hadron Collider presents a significant challenge to the computing resources. During the LHC Run 2 (2015–2018), there were up to 70 inelastic interactions per bunch crossing, which need to be accounted for in Monte Carlo (MC) production. In this document, a new method to account for these additional interactions in the simulation chain is described. Instead of sampling the inelastic interactions and adding their energy deposits to a hard-scatter interaction one-by-one, the inelastic interactions are presampled, independent of the hard scatter, and stored as combined events. Consequently, for each hard-scatter interaction, only one such presampled event needs to be added as part of the simulation chain. For the Run 2 simulation chain, with an average of 35 interactions per bunch crossing, this new method provides a substantial reduction in MC production CPU needs of around 20%, while reproducing the properties of the reconstructed quantities relevant for physics analyses with good accuracy.
    Free, publicly-accessible full text available December 1, 2023
  7. Many applications of machine learning require a model to make accurate predictions on test examples that are distributionally different from training ones, while task-specific labels are scarce during training. An effective approach to this challenge is to pre-train a model on related tasks where data is abundant, and then fine-tune it on a downstream task of interest. While pre-training has been effective in many language and vision domains, it remains an open question how to effectively use pre-training on graph datasets. In this paper, we develop a new strategy and self-supervised methods for pre-training Graph Neural Networks (GNNs). The key to the success of our strategy is to pre-train an expressive GNN at the level of individual nodes as well as entire graphs, so that the GNN can learn useful local and global representations simultaneously. We systematically study pre-training on multiple graph classification datasets. We find that naïve strategies, which pre-train GNNs at the level of either entire graphs or individual nodes, give limited improvement and can even lead to negative transfer on many downstream tasks. In contrast, our strategy avoids negative transfer and improves generalization significantly across downstream tasks, yielding up to 9.4% absolute improvement in ROC-AUC over non-pre-trained models and achieving state-of-the-art performance for molecular property prediction and protein function prediction.