Creators/Authors contains: "Uhler, Caroline"

  1. Abstract

    Long-term sustained mechano-chemical signals in the tissue microenvironment regulate cell-state transitions. In recent work, we showed that laterally confined growth of fibroblasts induces dedifferentiation programs. However, the molecular mechanisms underlying such mechanically induced cell-state transitions are poorly understood. In this paper, we identify Lef1 as a critical somatic transcription factor for the mechanical regulation of de-differentiation pathways. Network optimization methods applied to time-lapse RNA-seq data identify Lef1-dependent signaling as a potential regulator of such cell-state transitions. We show that Lef1 knockdown results in the down-regulation of fibroblast de-differentiation and that Lef1 directly interacts with the promoter regions of downstream reprogramming factors. We also evaluate the potential upstream activation pathways of Lef1, including the Smad4, Atf2, NFkB and Beta-catenin pathways, and find that Smad4 and Atf2 may be critical for Lef1 activation. Collectively, we describe an important mechanotransduction pathway involving Lef1 that, when activated by progressive lateral cell confinement, results in fibroblast de-differentiation.

  2. Matrix completion problems arise in many applications including recommendation systems, computer vision, and genomics. Increasingly large neural networks have been successful in many of these applications, but at considerable computational cost. Remarkably, taking the width of a neural network to infinity allows for improved computational performance. In this work, we develop an infinite width neural network framework for matrix completion that is simple, fast, and flexible. Simplicity and speed come from the connection between the infinite width limit of neural networks and kernels known as neural tangent kernels (NTK). In particular, we derive the NTK for fully connected and convolutional neural networks for matrix completion. The flexibility stems from a feature prior, which allows encoding relationships between coordinates of the target matrix, akin to semisupervised learning. The effectiveness of our framework is demonstrated through competitive results for virtual drug screening and image inpainting/reconstruction. We also provide an implementation in Python to make our framework accessible on standard hardware to a broad audience.
    Free, publicly-accessible full text available April 19, 2023
  3. Free, publicly-accessible full text available April 1, 2023
  4. Free, publicly-accessible full text available March 1, 2023
  5. Cowen, Lenore (Ed.)
    Abstract Summary: Designing interventions to control gene regulation necessitates modeling a gene regulatory network by a causal graph. Currently, large-scale gene expression datasets from different conditions, cell types, disease states, and developmental time points are being collected. However, application of classical causal inference algorithms to infer gene regulatory networks based on such data is still challenging, requiring high sample sizes and computational resources. Here, we describe an algorithm that efficiently learns the differences in gene regulatory mechanisms between different conditions. Our difference causal inference (DCI) algorithm infers changes (i.e. edges that appeared, disappeared, or changed weight) between two causal graphs given gene expression data from the two conditions. This algorithm is efficient in its use of samples and computation since it infers the differences between causal graphs directly without estimating each possibly large causal graph separately. We provide a user-friendly Python implementation of DCI and also enable the user to learn the most robust difference causal graph across different tuning parameters via stability selection. Finally, we show how to apply DCI to single-cell RNA-seq data from different conditions and cell states, and we also validate our algorithm by predicting the effects of interventions. Availability and implementation: Python package freely available at http://uhlerlab.github.io/causaldag/dci. Supplementary information: Supplementary data are available at Bioinformatics online.
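The notion of a difference between two causal graphs (edges that appeared, disappeared, or changed weight) can be illustrated with a minimal sketch. This is not the DCI algorithm, which infers these differences from expression data, and it does not use the causaldag package. It only shows the target object, assuming both weighted graphs are already known. The function name `difference_graph`, the gene labels, and the edge weights are hypothetical.

```python
def difference_graph(edges_a, edges_b, tol=1e-8):
    """Classify edge-level differences between two weighted directed graphs.

    edges_a, edges_b: dicts mapping (parent, child) -> edge weight.
    A missing edge is treated as weight 0. Hypothetical helper for
    illustration only; DCI itself estimates these changes from data.
    """
    diff = {}
    for edge in sorted(set(edges_a) | set(edges_b)):
        w_a = edges_a.get(edge, 0.0)
        w_b = edges_b.get(edge, 0.0)
        if abs(w_a - w_b) <= tol:
            continue  # same mechanism in both conditions
        if abs(w_a) <= tol:
            diff[edge] = "appeared"
        elif abs(w_b) <= tol:
            diff[edge] = "disappeared"
        else:
            diff[edge] = "changed weight"
    return diff


# Made-up regulatory weights for two conditions (assumed, not from the paper).
control = {("G1", "G2"): 0.8, ("G2", "G3"): -0.5, ("G1", "G3"): 0.3}
treated = {("G1", "G2"): 0.8, ("G2", "G3"): -0.9, ("G3", "G4"): 0.4}

changes = difference_graph(control, treated)
for (parent, child), kind in changes.items():
    print(f"{parent} -> {child}: {kind}")
```

The unchanged edge G1 -> G2 does not appear in the result, which mirrors the point of the abstract: only the differences need to be represented, not each full graph.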
  6. Identifying computational mechanisms for memorization and retrieval of data is a long-standing problem at the intersection of machine learning and neuroscience. Our main finding is that standard overparameterized deep neural networks trained using standard optimization methods implement such a mechanism for real-valued data. We provide empirical evidence that 1) overparameterized autoencoders store training samples as attractors and thus iterating the learned map leads to sample recovery, and that 2) the same mechanism allows for encoding sequences of examples and serves as an even more efficient mechanism for memory than autoencoding. Theoretically, we prove that when trained on a single example, autoencoders store the example as an attractor. Lastly, by treating a sequence encoder as a composition of maps, we prove that sequence encoding provides a more efficient mechanism for memory than autoencoding.
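The single-example claim can be illustrated with a toy sketch under strong assumptions: a scalar "autoencoder" f(x) = w2 * tanh(w1 * x) with one hidden unit, trained by plain gradient descent to reproduce one training value. This is far smaller than the networks studied in the paper, and the architecture, initialization, learning rate, and sample value are all assumptions chosen for illustration. After training, iterating the learned map from a different starting point recovers the training value.

```python
import math


def train_scalar_autoencoder(x_star, steps=5000, lr=0.05):
    """Fit f(x) = w2 * tanh(w1 * x) so that f(x_star) is close to x_star.

    Gradient descent on the squared reconstruction error of one sample.
    Toy model assumed for illustration only.
    """
    w1, w2 = 0.5, 0.5
    for _ in range(steps):
        t = math.tanh(w1 * x_star)
        residual = w2 * t - x_star
        grad_w1 = 2.0 * residual * w2 * (1.0 - t * t) * x_star
        grad_w2 = 2.0 * residual * t
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return w1, w2


def iterate_map(w1, w2, x0, n_iter=200):
    """Apply the learned map repeatedly, starting from x0."""
    x = x0
    for _ in range(n_iter):
        x = w2 * math.tanh(w1 * x)
    return x


x_star = 2.0  # the single training example (assumed value)
w1, w2 = train_scalar_autoencoder(x_star)
recovered = iterate_map(w1, w2, x0=1.0)  # start away from the sample
print(f"reconstruction error: {abs(w2 * math.tanh(w1 * x_star) - x_star):.2e}")
print(f"iterate from 1.0 ends at: {recovered:.6f}")
```

For this particular map, the derivative at a nonzero fixed point equals 2u / sinh(2u) with u = w1 * x_star, which is below 1 in absolute value, so the stored value is an attracting fixed point.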
  7. Abstract

    Given the severity of the SARS-CoV-2 pandemic, a major challenge is to rapidly repurpose existing approved drugs for clinical interventions. While a number of data-driven and experimental approaches have been suggested in the context of drug repurposing, a platform that systematically integrates available transcriptomic, proteomic and structural data is missing. More importantly, given that SARS-CoV-2 pathogenicity is highly age-dependent, it is critical to integrate aging signatures into drug discovery platforms. We here take advantage of large-scale transcriptional drug screens combined with RNA-seq data of the lung epithelium with SARS-CoV-2 infection as well as the aging lung. To identify robust druggable protein targets, we propose a principled causal framework that makes use of multiple data modalities. Our analysis highlights the importance of serine/threonine and tyrosine kinases as potential targets that intersect the SARS-CoV-2 and aging pathways. By integrating transcriptomic, proteomic and structural data that is available for many diseases, our drug discovery platform is broadly applicable. Rigorous in vitro experiments as well as clinical trials are needed to validate the identified candidate drugs.

  8. Abstract Selecting the optimal Markowitz portfolio depends on estimating the covariance matrix of the returns of N assets from T periods of historical data. Problematically, N is typically of the same order as T, which makes the sample covariance matrix estimator perform poorly, both empirically and theoretically. While various other general-purpose covariance matrix estimators have been introduced in the financial economics and statistics literature for dealing with the high dimensionality of this problem, we here propose an estimator that exploits the fact that assets are typically positively dependent. This is achieved by imposing that the joint distribution of returns be multivariate totally positive of order 2 (MTP2). This constraint on the covariance matrix not only enforces positive dependence among the assets but also regularizes the covariance matrix, leading to desirable statistical properties such as sparsity. Based on stock market data spanning 30 years, we show that estimating the covariance matrix under MTP2 outperforms previous state-of-the-art methods including shrinkage estimators and factor models.
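Two ingredients of this abstract can be made concrete with a short sketch under stated assumptions. First, for a Gaussian distribution, MTP2 is equivalent to the inverse covariance (precision) matrix having no positive off-diagonal entries. Second, assuming the fully invested global minimum-variance portfolio as the Markowitz target, the weights are proportional to the row sums of the precision matrix. The sketch does not implement the MTP2 estimator itself, and the function names and the 3-asset precision matrix are hypothetical.

```python
def is_gaussian_mtp2(precision, tol=1e-12):
    """Check the Gaussian MTP2 condition: all off-diagonal entries of the
    precision matrix are non-positive (up to a tolerance)."""
    n = len(precision)
    return all(
        precision[i][j] <= tol
        for i in range(n)
        for j in range(n)
        if i != j
    )


def min_variance_weights(precision):
    """Global minimum-variance weights w = K 1 / (1' K 1), where K is the
    precision matrix. Assumes full investment and no other constraints."""
    row_sums = [sum(row) for row in precision]
    total = sum(row_sums)
    return [s / total for s in row_sums]


# Hypothetical positive definite precision matrix for three assets.
K = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

mtp2 = is_gaussian_mtp2(K)
weights = min_variance_weights(K)
print("Gaussian MTP2:", mtp2)
print("minimum-variance weights:", weights)
```

The zero entries in this matrix also hint at why the constraint acts as a regularizer: a precision matrix with sparse, non-positive off-diagonals has far fewer free parameters than an unconstrained one.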