Creators/Authors contains: "Luo, Tianyi"

  1. Free, publicly-accessible full text available August 4, 2024
  2. The wisdom of the crowd (Surowiecki, 2005a) reveals a striking fact: the majority-vote answer from a crowd is usually more accurate than the answers of a few individual experts. The same pattern appears in machine learning. Ensemble methods (Dietterich, 2000) exploit this idea by combining multiple learning algorithms in settings such as supervised and semi-supervised learning, and by aggregating their predictions they achieve better performance than any constituent algorithm alone. However, the standard majority aggregation rule fails when the majority answer of the constituent algorithms is more likely to be wrong. In this paper, we extend the idea behind the Bayesian Truth Serum (Prelec, 2004), that “a surprisingly more popular answer is more likely to be the true answer instead of the majority one”, to two settings: supervised classification, where it improves the ensemble of final predictions, and semi-supervised classification (e.g., MixMatch (Berthelot et al., 2019)), where it improves the ensemble of data augmentations. The key challenge is to define, or detect, when an answer should be considered “surprising”. We present two machine-learning-aided methods that can reveal the truth when the minority, rather than the majority, holds the true answer, in both supervised and semi-supervised classification. We call our proposed method the Machine Truth Serum. Our experiments on a set of classification tasks (image, text, etc.) show that classification performance can be further improved by applying Machine Truth Serum in the ensemble-final-predictions step (supervised) and in the ensemble-data-augmentations step (semi-supervised).
  3. Semi-supervised learning (SSL) has demonstrated its potential to improve model accuracy for a variety of learning tasks when high-quality labeled data are severely limited. Although it is often established that SSL improves the average accuracy over the entire data population, it is unclear how SSL fares on different sub-populations. This question has substantial fairness implications when the sub-populations are demographic groups that we aim to treat fairly. In this paper, we reveal the disparate impact of deploying SSL: the sub-population with a higher baseline accuracy without SSL (the "rich" one) tends to benefit more from SSL, while the sub-population with a low baseline accuracy (the "poor" one) might even see a performance drop after the SSL module is added. We establish this observation theoretically and empirically for a broad family of SSL algorithms that explicitly or implicitly use an auxiliary "pseudo-label". Experiments on a set of image and text classification tasks confirm our claims. We introduce a new metric, the Benefit Ratio, and promote evaluating the fairness of SSL via the Equalized Benefit Ratio. We further discuss how the disparate impact can be mitigated. We hope our paper raises awareness of this potential pitfall of SSL and encourages a multifaceted evaluation of future SSL algorithms.
  4. Knowing whether a published research result can be replicated is important, but directly replicating published research is costly. Several efforts have used machine-learning-aided methods to predict the replicability of scientific claims. However, existing machine-learning-aided approaches use only hand-extracted statistical features, such as p-values and sample sizes, without using the text of the research papers, and they train on only a very small annotated dataset without exploiting the large number of unlabeled articles. It is therefore desirable to develop effective machine-learning-aided methods that extract text information as features automatically, so that we can benefit from Natural Language Processing techniques. We also aim for an approach that benefits from both the labeled data and the large amount of unlabeled data. In this paper, we propose two weakly supervised learning approaches that use automatically extracted text information from research papers to improve the prediction accuracy of research replication using both labeled and unlabeled datasets. Our experiments on real-world datasets show that our approaches obtain much better prediction performance than supervised models that use only statistical features and a small labeled dataset. Furthermore, we achieve an accuracy of 75.76% in predicting the replicability of research.
  5. Abstract

    Synthesis of a pentasil‐type zeolite with ultra‐small few‐unit‐cell crystalline domains, which we call FDP (few‐unit‐cell crystalline domain pentasil), is reported. FDP is made using bis‐1,5(tributyl ammonium) pentamethylene cations as the structure‐directing agent (SDA). This di‐quaternary ammonium SDA combines butyl ammonium groups, in place of the propyl ammonium commonly used for MFI synthesis, with a five‐carbon nitrogen‐connecting chain, in place of the six‐carbon connecting chains of SDAs that are known to fit well within the MFI pores. X‐ray diffraction analysis and electron microscopy imaging of FDP indicate ca. 10 nm crystalline domains organized into hierarchical micro‐/meso‐porous aggregates that exhibit mesoscopic order, with aggregate particle sizes of up to ca. 5 μm. Al and Sn can be incorporated into the FDP zeolite framework to produce active and selective catalysts for methanol‐to‐hydrocarbon conversion and glucose isomerization, respectively.

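The idea quoted in item 2, that a "surprisingly popular" answer beats the majority, can be illustrated with a simple selection rule: each respondent gives an answer plus a forecast of how the whole crowd will answer, and the rule picks the answer whose actual vote share most exceeds its average forecast share. This is only a minimal sketch of that generic rule under stated assumptions, not the Machine Truth Serum itself, which according to the abstract uses machine learning to decide when an answer is surprising. Where the forecasts would come from in an ensemble of classifiers is left open, and the function name `surprisingly_popular` and its input format are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def surprisingly_popular(votes: Sequence[str],
                         predicted_shares: Sequence[Dict[str, float]]) -> str:
    """Return the answer whose actual vote share most exceeds its forecast share.

    votes: each respondent's own answer.
    predicted_shares: each respondent's forecast of how the whole crowd will
        vote, as a mapping answer -> fraction (assumed to sum to 1).
    This is a hypothetical sketch of the generic rule, not the paper's method.
    """
    if not votes or len(votes) != len(predicted_shares):
        raise ValueError("need exactly one forecast per vote")
    n = len(votes)
    actual = Counter(votes)
    # Candidate answers: anything someone voted for or forecast.
    answers = set(actual)
    for forecast in predicted_shares:
        answers.update(forecast)

    def surprise(answer: str) -> float:
        actual_share = actual[answer] / n  # Counter returns 0 for unseen answers
        mean_forecast = sum(f.get(answer, 0.0) for f in predicted_shares) / n
        return actual_share - mean_forecast

    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(answers), key=surprise)


# Toy example: 6 of 10 vote "yes" and expect 90% "yes"; 4 vote "no" but
# expect 70% of the crowd to say "yes".
votes = ["yes"] * 6 + ["no"] * 4
forecasts = [{"yes": 0.9, "no": 0.1}] * 6 + [{"yes": 0.7, "no": 0.3}] * 4
print(surprisingly_popular(votes, forecasts))  # prints "no"
```

In this toy case the majority answer is "yes", but "no" is selected because its actual share (40%) exceeds its mean forecast share (18%), while "yes" falls short of its forecast (60% versus 82%).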
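Item 3 introduces the Benefit Ratio but its abstract does not state the formula. The sketch below assumes one plausible normalized-gain definition: the fraction of a sub-population's accuracy headroom, between a baseline trained on the small labeled set and a fully supervised "ideal" reference, that SSL recovers. Under this assumption, an Equalized Benefit Ratio means the ratios match across groups. The formula, the function names, and the accuracy numbers are all hypothetical and may differ from the paper's exact definition.

```python
from typing import Dict, Tuple


def benefit_ratio(acc_baseline: float, acc_ssl: float, acc_ideal: float) -> float:
    """Share of the accuracy headroom that SSL recovers for one sub-population.

    ASSUMED definition: (acc_ssl - acc_baseline) / (acc_ideal - acc_baseline),
    where acc_baseline uses only the small labeled set and acc_ideal is a
    fully supervised reference trained as if all data were labeled.
    A negative value means SSL hurt this group.
    """
    headroom = acc_ideal - acc_baseline
    if headroom <= 0:
        raise ValueError("acc_ideal must exceed acc_baseline")
    return (acc_ssl - acc_baseline) / headroom


def benefit_ratio_gap(per_group: Dict[str, Tuple[float, float, float]]) -> float:
    """Largest Benefit Ratio difference between groups (0 means equalized)."""
    ratios = [benefit_ratio(*accs) for accs in per_group.values()]
    return max(ratios) - min(ratios)


# Hypothetical (baseline, SSL, ideal) accuracies, not results from the paper.
groups = {"rich": (0.80, 0.88, 0.90), "poor": (0.60, 0.58, 0.80)}
for name, accs in groups.items():
    print(name, round(benefit_ratio(*accs), 2))
print("gap", round(benefit_ratio_gap(groups), 2))
```

Normalizing by headroom, rather than reporting raw accuracy gains, keeps groups with very different baselines on a comparable scale. The made-up "poor" group shows the kind of negative benefit the abstract warns about.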
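Item 4 does not say which two weakly supervised approaches it uses. As a generic, hypothetical illustration of how unlabeled articles can contribute to training, the sketch below shows self-training (pseudo-labeling): a classifier fit on the labeled papers labels the unlabeled ones, and only confident predictions are added back before refitting. The bag-of-words features, the nearest-centroid classifier, the cosine-similarity threshold, and every function name are assumptions chosen to keep the example dependency-free. They are not the paper's method.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bow(text: str) -> Counter:
    """Bag-of-words token counts (lowercased, whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroids(train: List[Tuple[Counter, int]]) -> Dict[int, Counter]:
    """Summed term counts per label (cosine ignores scale, so no averaging)."""
    sums: Dict[int, Counter] = {}
    for vec, label in train:
        sums.setdefault(label, Counter()).update(vec)
    return sums


def predict(cents: Dict[int, Counter], vec: Counter) -> Tuple[int, float]:
    """Nearest centroid by cosine similarity; returns (label, similarity)."""
    scores = {label: cosine(c, vec) for label, c in cents.items()}
    best = max(sorted(scores), key=scores.__getitem__)
    return best, scores[best]


def self_train(labeled: List[Tuple[str, int]], unlabeled: List[str],
               threshold: float = 0.5, rounds: int = 3) -> Dict[int, Counter]:
    """Grow the training set with confident pseudo-labels, then refit."""
    train = [(bow(text), label) for text, label in labeled]
    pool = [bow(text) for text in unlabeled]
    for _ in range(rounds):
        cents = centroids(train)
        remaining = []
        for vec in pool:
            label, score = predict(cents, vec)
            if score >= threshold:
                train.append((vec, label))  # accept the pseudo-label
            else:
                remaining.append(vec)
        if len(remaining) == len(pool):
            break  # no new confident predictions this round
        pool = remaining
    return centroids(train)


# Toy phrases (1 = replicates, 0 = does not); not data from the paper.
labeled = [("large sample preregistered replication", 1),
           ("small sample p value 0.049", 0)]
unlabeled = ["large preregistered sample", "small p value"]
model = self_train(labeled, unlabeled)
print(predict(model, bow("large preregistered")))
```

Cosine similarity is not a calibrated probability, so the 0.5 threshold is an arbitrary illustrative value. A real system would use a probabilistic classifier and tune the confidence cut-off on held-out labeled data.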