

Search results: all records where Creators/Authors contains "Howe, Bill"

  1. Perspectives on the role and responsibility of the data-management research community in designing, developing, using, and overseeing automated decision systems. 
  2. Neural methods are state-of-the-art for urban prediction problems such as transportation resource demand, accident risk, crowd mobility, and public safety. Model performance can be improved by integrating exogenous features from open data repositories (e.g., weather, housing prices, and traffic), but these uncurated sources are often too noisy, incomplete, and biased to use directly. We propose to learn integrated representations, called EquiTensors, from heterogeneous datasets that can be reused across a variety of tasks. We align datasets to a consistent spatio-temporal domain, then describe an unsupervised model based on convolutional denoising autoencoders to learn shared representations. We extend this core integrative model with adaptive weighting to prevent certain datasets from dominating the signal. To combat discriminatory bias, we use adversarial learning to remove correlations with a sensitive attribute (e.g., race or income). Experiments with 23 input datasets and 4 real applications show that EquiTensors could help mitigate the effects of the sensitive information embodied in the biased data. Meanwhile, applications using EquiTensors outperform models that ignore exogenous features and are competitive with "oracle" models that use hand-selected datasets. 
  3. The COVID-19 pandemic is compelling us to make crucial data-driven decisions quickly, bringing together diverse and unreliable sources of information without the usual quality control mechanisms we may employ. These decisions are consequential at multiple levels: They can inform local, state, and national government policy, be used to schedule access to physical resources such as elevators and workspaces within an organization, and inform contact tracing and quarantine actions for individuals. In all these cases, significant inequities are likely to arise and to be propagated and reinforced by data-driven decision systems. In this article, we propose a framework, called FIDES, for surfacing and reasoning about data equity in these systems. 
  4. Costa, Constantinos; Pitoura, Evaggelia (Eds.)
    Data-driven systems can be unfair in many different ways. All too often, as data scientists, we focus narrowly on one technical aspect of fairness. In this paper, we attempt to address equity broadly and to identify the many different ways in which it manifests in data-driven systems. 
  5. We propose JECL, a method for clustering image-caption pairs by training parallel encoders with regularized clustering and alignment objectives, simultaneously learning both representations and cluster assignments. These image-caption pairs arise frequently in high-value applications where structured training data is expensive to produce, but free-text descriptions are common. JECL trains by minimizing the Kullback-Leibler divergence between the distributions of the images and text and a combined joint target distribution, and by optimizing the Jensen-Shannon divergence between the soft cluster assignments of the images and text. Regularizers are also applied to JECL to prevent trivial solutions. Experiments show that JECL outperforms both single-view and multi-view methods on large benchmark image-caption datasets, and is remarkably robust to missing captions and varying data sizes. 
  6. The need for responsible data management intensifies with the growing impact of data on society. One central locus of the societal impact of data is Automated Decision Systems (ADS), socio-legal-technical systems that are used broadly in industry, non-profits, and government. ADS process data about people, help make decisions that are consequential to people's lives, are designed with the stated goals of improving efficiency and promoting equitable access to opportunity, involve a combination of human and automated decision making, and are subject to auditing for legal compliance and to public disclosure. They may or may not use AI, and may or may not operate with a high degree of autonomy, but they rely heavily on data. In this article, we argue that the data management community is uniquely positioned to lead the responsible design, development, use, and oversight of ADS. We outline a technical research agenda that requires that we step outside our comfort zone of engineering for efficiency and accuracy, to also incorporate reasoning about values and beliefs. This seems high-risk, but one of the upsides is being able to explain to our children what we do and why it matters. 
  7. Emerging transportation modes, including car-sharing, bike-sharing, and ride-hailing, are transforming urban mobility yet have been shown to reinforce socioeconomic inequity. These services rely on accurate demand prediction, but the demand data on which these models are trained reflect biases around demographics, socioeconomic conditions, and entrenched geographic patterns. To address these biases and improve fairness, we present FairST, a fairness-aware demand prediction model for spatiotemporal urban applications, with emphasis on new mobility. We use 1D (time-varying, space-constant), 2D (space-varying, time-constant), and 3D (both time- and space-varying) convolutional branches to integrate heterogeneous features, while including fairness metrics as a form of regularization to improve equity across demographic groups. We propose two spatiotemporal fairness metrics: region-based fairness gap (RFG), applicable when demographic information is provided as a constant for a region, and individual-based fairness gap (IFG), applicable when a continuous distribution of demographic information is available. Experimental results on bike share and ride share datasets show that FairST can reduce inequity in demand prediction for multiple sensitive attributes (i.e., race, age, and education level), while achieving better accuracy than even state-of-the-art fairness-oblivious methods. 
  8. This paper reviews the methods and findings of mobility equity studies, with a focus on new mobility. 
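The EquiTensors abstract (item 2) says datasets are first aligned to a consistent spatio-temporal domain before representation learning. The sketch below illustrates one plausible form of that alignment step under its own assumptions, not the paper's implementation: records arrive as hypothetical `(t, x, y, value)` tuples, the domain is a regular lattice of square cells with fixed-width time bins, and each cell is aggregated by sum (for counts) or mean (for readings). The function name `align_to_grid` and its parameters are illustrative.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical record layout: (time, x, y, value).
Record = Tuple[float, float, float, float]


def align_to_grid(
    records: Iterable[Record],
    t0: float, dt: float,
    x0: float, y0: float, cell: float,
    n_t: int, n_x: int, n_y: int,
    how: str = "sum",
) -> List[List[List[Optional[float]]]]:
    """Bin point records into an n_t x n_x x n_y grid indexed [time][x][y].

    how="sum" suits counts (e.g., trips); how="mean" suits readings
    (e.g., temperature). Records falling outside the grid are dropped.
    With how="mean", cells that received no records are None (missing).
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    sums = [[[0.0] * n_y for _ in range(n_x)] for _ in range(n_t)]
    counts = [[[0] * n_y for _ in range(n_x)] for _ in range(n_t)]
    for t, x, y, v in records:
        # Floor-divide offsets to get integer bin indices.
        ti = int((t - t0) // dt)
        xi = int((x - x0) // cell)
        yi = int((y - y0) // cell)
        if 0 <= ti < n_t and 0 <= xi < n_x and 0 <= yi < n_y:
            sums[ti][xi][yi] += v
            counts[ti][xi][yi] += 1
    if how == "sum":
        return sums
    # Mean per cell; empty cells become None rather than a fake zero.
    return [
        [[s / c if c else None for s, c in zip(srow, crow)]
         for srow, crow in zip(splane, cplane)]
        for splane, cplane in zip(sums, counts)
    ]
```

Stacking one such grid per dataset as a channel yields a tensor a convolutional autoencoder could consume; the cell size, time bin, and aggregation rule are modeling choices the abstract does not specify.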
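The JECL abstract (item 5) names two divergences: Kullback-Leibler (KL) toward a joint target distribution, and Jensen-Shannon (JS) between the soft cluster assignments of an image and its caption. As a minimal illustration of those quantities only (not JECL's encoders, target distribution, regularizers, or training loop), the sketch below computes both for discrete distributions given as plain Python lists. The function names, including the hypothetical `mean_pairwise_js` helper for averaging over a batch of pairs, are assumptions for illustration.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats. Assumes q[i] > 0 wherever p[i] > 0."""
    # Terms with p[i] == 0 contribute 0 by convention, so skip them.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def mean_pairwise_js(image_assign, text_assign) -> float:
    """Average JS divergence over paired soft assignments (one row per pair)."""
    pairs = list(zip(image_assign, text_assign))
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)
```

For example, an image assigned `[0.7, 0.2, 0.1]` over three clusters and its caption assigned `[0.6, 0.3, 0.1]` give a small JS value, indicating the two views largely agree; fully disjoint assignments reach the maximum of ln 2.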
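The FairST abstract (item 7) describes including fairness metrics in the training objective as a form of regularization, but it does not give the formula for the region-based fairness gap (RFG). The sketch below is therefore an illustrative stand-in, not FairST's definition: it assumes each region carries a binary group label (a hypothetical simplification of regionally constant demographic information), measures the gap as the absolute difference in mean per-capita predicted demand between the two groups, and adds that gap, scaled by a hypothetical weight `lam`, to a mean squared error loss.

```python
from typing import Sequence


def group_gap(preds: Sequence[float], population: Sequence[float],
              group: Sequence[int]) -> float:
    """Illustrative gap: |mean per-capita prediction, group 1 - group 0|."""
    per_capita = [p / n for p, n in zip(preds, population)]
    g1 = [v for v, g in zip(per_capita, group) if g == 1]
    g0 = [v for v, g in zip(per_capita, group) if g == 0]
    if not g1 or not g0:
        raise ValueError("both groups need at least one region")
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def regularized_loss(preds: Sequence[float], targets: Sequence[float],
                     population: Sequence[float], group: Sequence[int],
                     lam: float = 0.5) -> float:
    """Mean squared error plus lam times the illustrative group gap."""
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
    return mse + lam * group_gap(preds, population, group)
```

Raising `lam` pushes a model toward predictions whose per-capita demand is more similar across groups at some cost in raw accuracy; a trainable version would compute the gap on batches with a differentiable tensor library.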