Title: Data-Driven Optimal Transport Cost Selection For Distributionally Robust Optimization
Recent work has shown that several machine learning algorithms, such as square-root Lasso, Support Vector Machines, and regularized logistic regression, among many others, can be represented exactly as distributionally robust optimization (DRO) problems. The distributional uncertainty set is defined as a neighborhood centered at the empirical distribution, and the neighborhood is measured by an optimal transport distance. In this paper, we propose a methodology that learns such a neighborhood in a natural data-driven way. We show rigorously that our framework encompasses adaptive regularization as a particular case. Moreover, we demonstrate empirically that our proposed methodology improves upon a wide range of popular machine learning estimators.
Award ID(s):
1820942 1915967
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2019 Winter Simulation Conference
Page Range / eLocation ID:
3740 to 3751
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Several machine learning methods leverage the idea of locality by using k-nearest neighbor (KNN) techniques to design better pattern recognition models. However, the choice of KNN parameters such as k is often made experimentally, e.g., via cross-validation, leading to local neighborhoods without a clear geometric interpretation. In this paper, we replace KNN with our recently introduced polytope neighborhood scheme, Non-Negative Kernel regression (NNK). NNK formulates neighborhood selection as a sparse signal approximation problem and is adaptive to the local distribution of samples in the neighborhood of the data point of interest. We analyze the benefits of local neighborhood construction based on NNK. In particular, we study the generalization properties of local interpolation using NNK and present data-dependent bounds in the non-asymptotic setting. We also demonstrate the applicability of NNK in the transductive few-shot learning setting and for measuring the distance between two datasets. NNK exhibits robust, superior performance in comparison to standard locally weighted neighborhood methods.
  2. Human mobility analysis plays a crucial role in urban analysis, city planning, epidemic modeling, and even understanding neighborhood effects on individuals’ health. Often, these studies model human mobility in the form of co-location networks. We have recently seen the tremendous success of network representation learning models on several machine learning tasks on graphs. To the best of our knowledge, limited attention has been paid to identifying communities using network representation learning methods specifically for co-location networks. We attempt to address this problem and study user mobility behavior through the communities identified with latent node representations. Specifically, we select several diverse network representation learning models to identify communities from a real-world co-location network. We include both general-purpose representation models that make no assumptions on network modality and approaches designed specifically for human mobility analysis. We evaluate these different methods on data collected in the Adolescent Health and Development in Context study. Our experimental analysis reveals that a recently proposed method (LocationTrails) offers a competitive advantage over other methods with respect to its ability to represent and reflect community assignment that is consistent with extant findings regarding neighborhood racial and socio-economic differences in mobility patterns. We also compare the learned activity profiles of individuals by factoring in their residential neighborhoods. Our analysis reveals a significant contrast in the activity profiles of individuals residing in white-dominated versus black-dominated neighborhoods and advantaged versus disadvantaged neighborhoods in a major metropolitan city of the United States. We provide a clear rationale for this contrastive pattern through insights from the sociological literature.
  3. Broadband infrastructure in urban parks may serve crucial functions, including as an amenity to boost overall park use and as a bridge to propagate WiFi access into contiguous neighborhoods. The project SCC:PG Park WiFi as a BRIDGE to Community Resilience has developed a new model, Build Resilience through the Internet and Digital Greenspace Exposure, that leverages off-the-shelf WiFi technology, novel algorithms, community assets, and local partnerships to lower greenspace WiFi costs. This interdisciplinary work draws on computer science, information studies, landscape architecture, and public health. Collaboration methodologies and relational definitions across disciplines are still nascent, especially when paired with civic-engaged, applied research. Student researchers (undergraduate and graduate) are excellent partners in bridging disciplinary barriers and constraints. Their capacity to assimilate multiple frameworks has produced refinements to the project’s theoretical lenses and suggested novel improvements to its socio-technical methodology. Further, they are excellent ambassadors to community partners and stakeholders. In BRIDGE, we tested two mechanisms to augment student research participation. In both, we leveraged a classic, curriculum-based model, the Partnership for Action Learning in Sustainability (PALS) program. This campus-wide, community-engaged initiative pairs faculty and students with community partners. PALS curates economic, environmental, and social sustainability challenges and scopes projects to customize appropriate coursework that addresses identified challenges. Outcomes include literature searches, wireframes, and design plans that target solutions to civic problems. Constraints include the short semester timeframe and curriculum learning-outcome requirements. (1) On BRIDGE, Dr. Kweon ran a semester-based, 400-level Landscape Architecture PALS studio. Eighteen undergraduates conducted in-class and in-field work to assess community needs and proposed design solutions for future park-wide WiFi. Research topics included community-park history, neighborhood demographics, case-study analysis, and land-cover characteristics. The students conducted an in-park community engagement session, using interactive posterboard surveys, to gain input on which park amenities might be redesigned or added to promote WiFi use. The students then produced seven redesign plans; one included a café/garden with an eco-corridor that integrated technology with nature. (2) From the classic, curriculum-based PALS model, we created a summer intensive for our five research assistants to stimulate interdisciplinary collaboration in their research tasks and co-analysis of project data products: the experimental technical WiFi setup, community survey results, and stakeholder needs assessments. Students met weekly with each other and team leadership, exchanged journal articles, and attended joint research events. This model shows promise for integrating students more formally into an interdisciplinary research project. An end-of-intensive focus group highlighted, from the students’ perspective, the pros and cons of this model. Results: In contrasting the two mechanisms, we found that Model 1 is tried-and-true and produces standardized, reliable products. However, because the work is group-based, students have limited independence to explore topics and themes of interest. Civic groups are typically thrilled with the diversity of action plans produced. Model 2 provides greater independence in student learning outcomes, fosters interdisciplinary “dictionary-building” that can be used by the full team, deepens methodological approaches, and allows for student stipend payments. Lessons learned: the intensive time frame needed more research team support and ideally should be extended, when possible, over the full project span. UMD-IRB#1785365-4; NSF-award: 2125526.
  4. Algorithms provide powerful tools for detecting and dissecting human bias and error. Here, we develop machine learning methods to analyze how humans err in a particular high-stakes task: image interpretation. We leverage a unique dataset of 16,135,392 human predictions of whether a neighborhood voted for Donald Trump or Joe Biden in the 2020 US election, based on a Google Street View image. We show that by training a machine learning estimator of the Bayes optimal decision for each image, we can provide an actionable decomposition of human error into bias, variance, and noise terms, and further identify specific features (like pickup trucks) which lead humans astray. Our methods can be applied to ensure that human-in-the-loop decision-making is accurate and fair and are also applicable to black-box algorithmic systems.
  5. The human embryo is a complex structure that emerges and develops as a result of cell-level decisions guided by both intrinsic genetic programs and cell–cell interactions. Given the limited accessibility of human embryonic tissue samples and the associated ethical constraints, researchers have turned to human stem cells to generate embryo models for studying specific embryogenic developmental steps. However, to study complex self-organizing developmental events using embryo models, there is a need for computational and imaging tools for detailed characterization of dynamics at the single-cell level. In this work, we obtained live cell imaging data from a human pluripotent stem cell (hPSC)-based epiblast model that can recapitulate the lumenal epiblast cyst formation soon after implantation of the human blastocyst. By processing imaging data with a Python pipeline that incorporates both cell tracking and event recognition using a CNN-LSTM machine learning model, we obtained detailed temporal information on changes in cell state and neighborhood during the dynamic growth and morphogenesis of lumenal hPSC cysts. The use of this tool combined with reporter lines for cell types of interest will drive future mechanistic studies of hPSC fate specification in embryo models and will advance our understanding of how cell-level decisions lead to global organization and emergent phenomena. Insight, innovation, integration: Human pluripotent stem cells (hPSCs) have been successfully used to model and understand cellular events that take place during human embryogenesis. Understanding how cell–cell and cell–environment interactions guide cell actions within an hPSC-based embryo model is a key step in elucidating the mechanisms driving system-level embryonic patterning and growth. In this work, we present a robust video analysis pipeline that incorporates machine learning methods to fully characterize the process of hPSC self-organization into lumenal cysts, mimicking the lumenal epiblast cyst formation soon after implantation of the human blastocyst. This pipeline will be a useful tool for understanding cellular mechanisms underlying key embryogenic events in embryo models.