Creators/Authors contains: "Zhang, Jianyi"

  1. Mini-programs are a new form of mobile application that runs inside a larger host app and is built with HTML, CSS, and JavaScript web technology; in China they have become the way to do almost everything. Prior research has studied the mini-program ecosystem and its development practices, but permission management has not yet been investigated. In this paper, we present a systematic study of mini-program permission management across 9 popular mobile host app ecosystems that together host over 7 million mini-programs. After testing over 2,580 APIs, we extracted a common abstract model of mini-programs’ permission control and revealed six categories of potential security vulnerabilities caused by improper permission management. Alarmingly, each of the popular mobile app ecosystems (i.e., host apps) under study has at least one security vulnerability due to improper permission management of mini-programs. We present corresponding attack methods to dissect these weaknesses further and to show how the discovered vulnerabilities can be exploited. To demonstrate that the vulnerabilities may cause severe consequences in real-world use, we show three kinds of attacks that require neither privileges nor cracking the host apps. We have responsibly disclosed the newly discovered vulnerabilities, and two CVEs were issued. Finally, we put forward systematic suggestions to strengthen the standardization of mini-programs.
    Free, publicly-accessible full text available June 1, 2024
  2. Because edge devices often have limited communication bandwidth, most existing federated learning (FL) methods randomly select only a subset of devices to participate in training at each communication round. Compared with engaging all the available clients, such a random-selection mechanism could lead to significant performance degradation on non-IID (not independent and identically distributed) data. In this paper, we present our key observation that the main cause of this performance degradation is the class-imbalance of the grouped data from randomly selected clients. Based on this observation, we design an efficient heterogeneity-aware client sampling mechanism, namely, Federated Class-balanced Sampling (Fed-CBS), which can effectively reduce the class-imbalance of the grouped dataset from the intentionally selected clients. We first propose a measure of class-imbalance that can be derived in a privacy-preserving way. Based on this measure, we design a computation-efficient client sampling strategy such that the actively selected clients will generate a more class-balanced grouped dataset with theoretical guarantees. Experimental results show that Fed-CBS outperforms the status quo approaches in terms of test accuracy and the rate of convergence while achieving comparable or even better performance than the ideal setting where all the available clients participate in the FL training.
    Free, publicly-accessible full text available July 23, 2024
  3. Knowledge Distillation (KD) (Hinton et al., 2015) is one of the most effective approaches for deploying large-scale pre-trained language models in low-latency environments by transferring the knowledge contained in the large-scale models to smaller student models. Previous KD approaches use the soft labels and intermediate activations generated by the teacher to transfer knowledge to the student model parameters alone. In this paper, we show that having access to non-parametric memory in the form of a knowledge base with the teacher’s soft labels and predictions can further enhance student capacity and improve generalization. To enable the student to retrieve from the knowledge base effectively, we propose a new Retrieval-augmented KD framework with a loss function that aligns the relational knowledge in teacher and student embedding spaces. We show through extensive experiments that our retrieval mechanism can achieve state-of-the-art performance for task-specific knowledge distillation on the GLUE benchmark (Wang et al., 2018a).
    Free, publicly-accessible full text available July 9, 2024
  4. The rapid expansion of food and nutrition information requires new ways of data sharing and dissemination. Interactive platforms integrating data portals and visualization dashboards have been effectively utilized to describe, monitor, and track information related to food and nutrition; however, a comprehensive evaluation of emerging interactive systems is lacking. We conducted a systematic review of publicly available dashboards using a set of 48 evaluation metrics for data integrity, completeness, granularity, visualization quality, and interactivity based on 4 major principles: evidence, efficiency, emphasis, and ethics. We evaluated 13 dashboards, summarized their characteristics, strengths, and limitations, and provided guidelines for developing nutrition dashboards. We applied mixed effects models to summarize evaluation results adjusted for interrater variability. The proposed metrics and evaluation principles help to improve data standardization and harmonization as well as dashboard performance and usability, broaden information and knowledge sharing among researchers, practitioners, and decision makers in the field of food and nutrition, and accelerate data literacy and communication.
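The idea in record 2, that the class balance of the pooled data depends on which clients are selected, can be illustrated with a small sketch. This is not the Fed-CBS algorithm: the abstract does not define its imbalance measure or its sampling strategy, so the measure below (squared distance between pooled class proportions and the uniform distribution) and the greedy selection loop are assumptions made purely for illustration. The sketch also reads raw per-client label counts directly, with no privacy-preserving step, and the function names are hypothetical.

```python
from typing import List, Sequence


def class_imbalance(counts: Sequence[int]) -> float:
    """Assumed measure: squared distance between class proportions and uniform."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples to measure")
    num_classes = len(counts)
    return sum((c / total - 1.0 / num_classes) ** 2 for c in counts)


def pooled_counts(clients: Sequence[Sequence[int]]) -> List[int]:
    """Sum per-class label counts over a group of clients."""
    return [sum(column) for column in zip(*clients)]


def greedy_balanced_selection(clients: Sequence[Sequence[int]], m: int) -> List[int]:
    """Pick m clients one at a time, each time adding the client that
    leaves the pooled data least imbalanced (illustrative heuristic only)."""
    if m > len(clients):
        raise ValueError("cannot select more clients than are available")
    selected: List[int] = []
    remaining = list(range(len(clients)))
    for _ in range(m):
        best = min(
            remaining,
            key=lambda i: class_imbalance(
                pooled_counts([clients[j] for j in selected + [i]])
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-client label counts for a two-class problem.
clients = [[90, 10], [10, 90], [80, 20], [85, 15]]
chosen = greedy_balanced_selection(clients, 2)
print(chosen, class_imbalance(pooled_counts([clients[i] for i in chosen])))
```

With these made-up counts, the greedy loop pairs a client dominated by the first class with the one client dominated by the second, so the pooled data is far closer to balanced than a pair drawn from the three similar clients would be.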