Multi-label classification (MLC), which assigns multiple labels to each instance, is crucial to domains from computer vision to text mining. Conventional methods for MLC require huge amounts of labeled data to capture complex dependencies between labels. However, such labeled datasets are expensive, or even impossible, to acquire. Worse yet, these pre-trained MLC models can only be used for the particular label set covered in the training data. Despite this severe limitation, few methods exist for expanding the set of labels predicted by pre-trained models. Instead, we acquire vast amounts of new labeled data and retrain a new model from scratch. Here, we propose combining the knowledge from multiple pre-trained models (teachers) to train a new student model that covers the union of the labels predicted by this set of teachers. This student supports a broader label set than any one of its teachers without using labeled data. We call this new problem knowledge amalgamation for multi-label classification. Our new method, Adaptive KNowledge Transfer (ANT), trains a student by learning from each teacher’s partial knowledge of label dependencies to infer the global dependencies between all labels across the teachers. We show that ANT succeeds in unifying label dependencies among teachers, outperforming five state-of-the-art methods on eight real-world datasets. 
                    This content will become publicly available on January 22, 2026
                            
                            MuHBoost: Multi-Label Boosting For Practical Longitudinal Human Behavior Modeling
                        
                    
    
Longitudinal human behavior modeling has received increasing attention over the years due to its widespread applications to patient monitoring, dietary and lifestyle recommendations, and just-in-time intervention for at-risk individuals (e.g., problematic drug users and struggling students), to name a few. Using in-the-moment health data collected via ubiquitous devices (e.g., smartphones and smartwatches), this multidisciplinary field focuses on developing predictive models for certain health or well-being outcomes (e.g., depression and stress) in the short future given the time series of individual behaviors (e.g., resting heart rate, sleep quality, and current feelings). Yet, most existing models on these data, which we refer to as ubiquitous health data, do not achieve adequate accuracy. The latest works that yielded promising results have yet to consider realistic aspects of ubiquitous health data (e.g., containing features of different types and high rate of missing values) and the consumption of various resources (e.g., computing power, time, and cost). Given these two shortcomings, it is dubious whether these studies could translate to realistic settings. In this paper, we propose MuHBoost, a multi-label boosting method for addressing these shortcomings, by leveraging advanced methods in large language model (LLM) prompting and multi-label classification (MLC) to jointly predict multiple health or well-being outcomes. Because LLMs can hallucinate when tasked with answering multiple questions simultaneously, we also develop two variants of MuHBoost that alleviate this issue and thereby enhance its predictive performance. We conduct extensive experiments to evaluate MuHBoost and its variants on 13 health and well-being prediction tasks defined from four realistic ubiquitous health datasets.
Our results show that our three developed methods outperform all considered baselines across three standard MLC metrics, demonstrating their effectiveness while ensuring resource efficiency. 
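The abstract does not name the three MLC metrics used. As a minimal sketch, assuming three commonly used ones (Hamming loss, subset accuracy, and micro-averaged F1), the following shows how such metrics can be computed on binary label matrices. The function names and the toy data are illustrative and are not taken from the paper.

```python
from typing import Sequence

Labels = Sequence[Sequence[int]]  # one row per instance, one 0/1 entry per label


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / total


def subset_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of instances whose full label vector is predicted exactly."""
    exact = sum(list(tr) == list(pr) for tr, pr in zip(y_true, y_pred))
    return exact / len(y_true)


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """F1 computed from counts pooled over all instances and labels."""
    tp = fp = fn = 0
    for tr, pr in zip(y_true, y_pred):
        for t, p in zip(tr, pr):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example (hypothetical data): 3 instances, 3 outcomes each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]

print(hamming_loss(y_true, y_pred))
print(subset_accuracy(y_true, y_pred))
print(micro_f1(y_true, y_pred))
```

Hamming loss and micro-F1 give partial credit per label, while subset accuracy only rewards instances where every outcome is predicted correctly, so the three can rank methods differently.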
        
    
    
- PAR ID: 10611395
- Publisher / Repository: ICLR
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Online social communities are becoming windows for learning more about the health of populations, through information about our health-related behaviors and outcomes from daily life. At the same time, just as public health data and theory have shown that aspects of the built environment can affect our health-related behaviors and outcomes, it is also possible that online social environments (e.g., posts and other attributes of our online social networks) can shape facets of our life. Given the important role of the online environment in public health research and its implications, the factors that contribute to the generation of such data must be well understood. Here we study the role of the built and online social environments in the expression of dining on Instagram in Abu Dhabi: a ubiquitous social media platform, a city with a vibrant dining culture, and a topic (food posts) that has been studied in relation to public health outcomes. Our study uses available data on user Instagram profiles and their Instagram networks, as well as the local food environment measured through the dining types (e.g., casual dining restaurants, food court restaurants, lounges, etc.) by neighborhood. We find evidence that factors of the online social environment (profiles that post about dining versus profiles that do not post about dining) have different influences on the relationship between a user’s built environment and the social dining expression, with effects also varying by dining types in the environment and time of day. We examine the mechanism of the relationships via moderation and mediation analyses. Overall, this study provides evidence that the interplay of online and built environments depends on attributes of said environments and can also vary by time of day. We discuss implications of this synergy for precisely targeting public health interventions, as well as for using online data for public health research.
- Heart rate, a commonly accessible type of health data from most wearables, carries rich information about a person’s well-being, yet has seen limited use in deeper health applications because of the lack of ground truth on health events and their impact on heart rate patterns. Specifically, standard health analytics are usually designed for well-modeled health conditions, and thus rely on known data patterns and rich training data. To bridge the gap, we propose HeartInsightify, an exploratory framework that facilitates the process of deriving health-relevant measurable indicators from longitudinal heart rate data, without any of the above knowledge. HeartInsightify focuses on comparative and qualitative study, using model-free statistical methods such as conformal prediction to study similarities, perform clustering, detect outliers, and build multi-resolutional data summaries, allowing human experts to efficiently examine and verify their health relevance. We conduct extensive experiments to evaluate HeartInsightify using individuals’ free-living heart rate data collected through Fitbit over 6 years. We illustrate the process of analyzing heart rate data for its health relevance and demonstrate the effectiveness of HeartInsightify. We envision that HeartInsightify lays the groundwork for personalized health analytics with continuous monitoring data from wearables.
- Disease surveillance systems provide early warnings of disease outbreaks before they become public health emergencies. However, pandemic containment can be challenging due to the complex immunity landscape created by multiple variants. Genomic surveillance is critical for detecting novel variants with diverse characteristics and importation/emergence times. Yet, a systematic study incorporating genomic monitoring, situation assessment, and intervention strategies is lacking in the literature. We formulate an integrated computational modeling framework to study a realistic course of action based on sequencing, analysis, and response. We study the effects of the second variant’s importation time, its infectiousness advantage, and its cross-infection on the novel variant’s detection time, and the resulting intervention scenarios to contain epidemics driven by two-variant dynamics. Our results illustrate the limitation in the intervention’s effectiveness due to the variants’ competing dynamics and provide the following insights: i) There is a set of importation times that yields the worst detection time for the second variant, which depends on the first variant’s basic reproductive number; ii) When the second variant is imported relatively early with respect to the first variant, the cross-infection level does not impact the detection time of the second variant. We found that, depending on the target metric, the best outcomes are attained under different intervention regimes. Our results emphasize the importance of sustained enforcement of Non-Pharmaceutical Interventions in preventing epidemic resurgence due to importation/emergence of novel variants. We also discuss how our methods can be used to study when a novel variant emerges within a population.
- The increasing prevalence of wearable devices enables low-cost, long-term collection of health-relevant data such as heart rate, exercise, and sleep signals. Currently these data are used to monitor short-term changes with limited interpretation of their relevance to health. These data provide an untapped resource to monitor daily and long-term activity patterns. Changes and trends identified from such data can provide insights and guidance for the management of many chronic conditions that change over time. In this study we conducted a machine-learning-based analysis of longitudinal heart rate data collected over multiple years from Fitbit devices. We built a multi-resolutional pipeline for time series analysis, using model-free clustering methods inspired by the statistical conformal prediction framework. With this method, we were able to detect health-relevant events, their interesting patterns (e.g., daily routines, seasonal differences, and anomalies), and correlations to acute and chronic changes in health conditions. We present the results, lessons, and insights learned, and how to address the challenge of lack of labels. The study confirms the value of long-term heart rate data for health monitoring and surveillance, as complementary to extensive yet intermittent examinations by health care providers.
 An official website of the United States government
