

Title: Contextual Integrity Up and Down the Data Food Chain
According to the theory of contextual integrity (CI), privacy norms prescribe information flows with reference to five parameters — sender, recipient, subject, information type, and transmission principle. Because privacy is grasped contextually (e.g., health, education, civic life), the values of these parameters range over contextually meaningful ontologies — of information types (or topics) and actors (subjects, senders, and recipients), in contextually defined capacities. As an alternative to predominant approaches to privacy, which were ineffective against novel information practices enabled by IT, CI was able both to pinpoint sources of disruption and provide grounds for either accepting or rejecting them. Mounting challenges from a burgeoning array of networked, sensor-enabled devices (IoT) and data-ravenous machine learning systems, similar in form though magnified in scope, call for renewed attention to theory. This Article introduces the metaphor of a data (food) chain to capture the nature of these challenges. With motion up the chain, where data of higher order is inferred from lower-order data, the crucial question is whether privacy norms governing lower-order data are sufficient for the inferred higher-order data. While CI has a response to this question, a greater challenge comes from data primitives, such as digital impulses of mouse clicks, motion detectors, and bare GPS coordinates, because they appear to have no meaning. Absent a semantics, they escape CI’s privacy norms entirely.
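The five-parameter structure of a CI information flow lends itself to a simple record representation. The sketch below is purely illustrative and not from the Article: the class, field names, and the toy health-context norm are all assumptions introduced for demonstration.

```python
from dataclasses import dataclass

# Illustrative sketch only: a CI information flow as a five-parameter record.
@dataclass(frozen=True)
class InformationFlow:
    sender: str
    recipient: str
    subject: str
    information_type: str
    transmission_principle: str

# A toy contextual norm (hypothetical): in a health context, a patient's
# diagnosis may flow from physician to specialist only with patient consent.
def conforms_to_norm(flow: InformationFlow) -> bool:
    return (flow.sender == "physician"
            and flow.recipient == "specialist"
            and flow.information_type == "diagnosis"
            and flow.transmission_principle == "with patient consent")

flow = InformationFlow("physician", "specialist", "patient",
                       "diagnosis", "with patient consent")
print(conforms_to_norm(flow))  # this flow conforms to the toy norm
```

Changing any one parameter — say, the recipient to an insurer, or the transmission principle to "sold" — makes the flow violate the norm, which is how CI pinpoints a disruption.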
Award ID(s):
1650589
NSF-PAR ID:
10124891
Author(s) / Creator(s):
Date Published:
Journal Name:
Theoretical Inquiries in Law
Volume:
20
Issue:
1
ISSN:
1565-1509
Page Range / eLocation ID:
221-256
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  3. A recent technology breakthrough in spatial molecular profiling (SMP) has enabled the comprehensive molecular characterizations of single cells while preserving spatial information. It provides new opportunities to delineate how cells from different origins form tissues with distinctive structures and functions. One immediate question in SMP data analysis is to identify genes whose expressions exhibit spatially correlated patterns, called spatially variable (SV) genes. Most current methods to identify SV genes are built upon the geostatistical model with Gaussian process to capture the spatial patterns. However, the Gaussian process models rely on ad hoc kernels that could limit the models' ability to identify complex spatial patterns. In order to overcome this challenge and capture more types of spatial patterns, we introduce a Bayesian approach to identify SV genes via a modified Ising model. The key idea is to use the energy interaction parameter of the Ising model to characterize spatial expression patterns. We use auxiliary variable Markov chain Monte Carlo algorithms to sample from the posterior distribution with an intractable normalizing constant in the model. Simulation studies using both simulated and synthetic data showed that the energy‐based modeling approach led to higher accuracy in detecting SV genes than those kernel‐based methods. When applied to two real spatial transcriptomics (ST) datasets, the proposed method discovered novel spatial patterns that shed light on the biological mechanisms. In summary, the proposed method presents a new perspective for analyzing ST data.
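The key idea — using an Ising interaction term to measure spatial coherence of expression — can be sketched in a few lines. This is a generic Ising energy on a grid, not the authors' modified model or their MCMC sampler; the grid, parameter values, and patterns below are assumptions for illustration.

```python
import numpy as np

# Toy sketch: interaction energy of a standard Ising model on a grid of
# binarized expression states (+1 = high, -1 = low).
# H(s) = -theta * sum over neighbor pairs of s_i * s_j; a positive theta
# rewards spatially coherent (clustered) expression, the signature of an
# SV gene, by assigning it lower energy.
def ising_energy(states: np.ndarray, theta: float) -> float:
    right = np.sum(states[:, :-1] * states[:, 1:])  # horizontal neighbor pairs
    down = np.sum(states[:-1, :] * states[1:, :])   # vertical neighbor pairs
    return -theta * float(right + down)

clustered = np.array([[1, 1, -1, -1]] * 4)       # two coherent spatial blocks
checker = np.indices((4, 4)).sum(0) % 2 * 2 - 1  # alternating +1/-1 pattern
print(ising_energy(clustered, 1.0))  # -16.0: coherent pattern, low energy
print(ising_energy(checker, 1.0))    # 24.0: incoherent pattern, high energy
```

Because clustered patterns have lower energy, inference on the interaction parameter separates spatially variable genes from spatially random ones without committing to a particular kernel shape.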
  4. Anwer, Nabil (Ed.)
    Design documentation is presumed to contain massive amounts of valuable information and expert knowledge that is useful for learning from past successes and failures. However, the current practice of documenting design in most industries does not produce big data that can support a true digital transformation of the enterprise. Very little information on concepts and decisions in early product design has been digitally captured, and accessing and retrieving it via taxonomy-based knowledge management systems is very challenging because most rule-based classification and search systems cannot concurrently process heterogeneous data (text, figures, tables, references). When experts retire or leave a design unit, industry often cannot benefit from past knowledge for future product design and is left to reinvent the wheel repeatedly. In this work, we present AI-based Natural Language Processing (NLP) models trained to contextually represent technical documents containing text, figures, and tables, and to perform semantic search for the retrieval of relevant data across large corpora of documents. By connecting textual and non-textual data through an associative database, the semantic search question-answering system we developed can provide more comprehensive answers in the context of users' questions. For the demonstration and assessment of this model, the semantic search question-answering system is applied to the Intergovernmental Panel on Climate Change (IPCC) Special Report 2019, which is more than 600 pages long and difficult to read and understand, even for most experts. Users can input custom queries relating to climate change concerns and receive evidence from the report that is contextually meaningful.
We expect this method can transform current repositories of design documentation in heterogeneous data forms into structured knowledge bases that return relevant information efficiently and can evolve to embody manageable big data for the true digital transformation of design.
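The retrieval step in such a system ranks passages by vector similarity to the query. The sketch below is a minimal stand-in using bag-of-words vectors and cosine similarity; a real semantic search system would use learned embeddings of text, figures, and tables, and the corpus and query here are invented for illustration.

```python
import math
from collections import Counter

# Toy vectorizer: bag-of-words counts stand in for learned embeddings.
def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

# Cosine similarity between two sparse count vectors.
def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "sea level rise threatens coastal communities",
    "ocean acidification harms marine ecosystems",
    "gear tolerances in mechanical assemblies",
]
query = "rising sea level and coastal flooding"
best = max(corpus, key=lambda doc: cosine(vectorize(query), vectorize(doc)))
print(best)  # the passage most similar to the query is returned as evidence
```

Swapping the toy vectorizer for a trained document encoder gives the same ranking pipeline over heterogeneous report content.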
  5. Estimates of the onset of sediment motion are integral for flood protection and river management but are often highly inaccurate. The critical shear stress (τ*c) for grain entrainment is often assumed constant, but measured values can vary by almost an order of magnitude between rivers. Such variations are typically explained by differences in measurement methodology, grain size distributions, or flow hydraulics, whereas grain resistance to motion is largely assumed to be constant. We demonstrate that grain resistance varies strongly with the bed structure, which is encapsulated by the particle height above surrounding sediment (protrusion, p) and intergranular friction (ϕf). We incorporate these parameters into a novel theory that correctly predicts resisting forces estimated in the laboratory, field, and a numerical model. Our theory challenges existing models, which significantly overestimate bed mobility. In our theory, small changes in p and ϕf can induce large changes in τ*c without needing to invoke variations in measurement methods or grain size. A data compilation also reveals that scatter in empirical values of τ*c can be partly explained by differences in p between rivers. Therefore, spatial and temporal variations in bed structure can partly explain the deviation of τ*c from an assumed constant value. Given that bed structure is known to vary with applied shear stresses and upstream sediment supply, we conclude that a constant τ*c is unlikely. Values of τ*c are not interchangeable between streams, or even through time in a given stream, because they are encoded with the channel history.
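The claimed sensitivity — small changes in protrusion p and friction ϕf driving large changes in τ*c — can be illustrated with a generic force-balance toy model. The expression below is NOT the authors' theory; the functional form, coefficient, and parameter values are assumptions chosen only to show the direction and magnitude of the effect (more interlocking raises resistance, more exposure lowers it).

```python
import math

# Purely illustrative toy model: critical Shields stress rising with the
# intergranular friction angle phi_f and falling with relative protrusion p
# (fraction of grain height exposed to the flow). Coefficient k is arbitrary.
def tau_star_c(phi_f_deg: float, p: float, k: float = 0.1) -> float:
    return k * math.tan(math.radians(phi_f_deg)) * (1.0 - p)

sheltered = tau_star_c(phi_f_deg=60.0, p=0.1)  # buried, interlocked grain
exposed = tau_star_c(phi_f_deg=40.0, p=0.7)    # protruding, loosely packed grain
print(sheltered / exposed)  # modest parameter changes, several-fold tau*c contrast
```

Even in this crude form, plausible ranges of p and ϕf span a several-fold range in τ*c, consistent with the abstract's point that an order-of-magnitude spread between rivers need not reflect measurement differences.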