  1. Importance: Autism detection early in childhood is critical to ensure that autistic children and their families have access to early behavioral support. Early correlates of autism documented in electronic health records (EHRs) during routine care could allow passive, predictive model-based monitoring to improve the accuracy of early detection. Objective: To quantify the predictive value of early autism detection models based on EHR data collected before age 1 year. Design, Setting, and Participants: This retrospective diagnostic study used EHR data from children seen within the Duke University Health System before age 30 days between January 2006 and December 2020. These data were used to train and evaluate L2-regularized Cox proportional hazards models predicting later autism diagnosis based on data collected from birth up to the time of prediction (ages 30-360 days). Statistical analyses were performed between August 1, 2020, and April 1, 2022. Main Outcomes and Measures: Prediction performance was quantified in terms of sensitivity, specificity, and positive predictive value (PPV) at clinically relevant model operating thresholds. Results: Data from 45 080 children, including 924 (1.5%) meeting autism criteria, were included in this study. Model-based autism detection at age 30 days achieved 45.5% sensitivity and 23.0% PPV at 90.0% specificity. Detection by age 360 days achieved 59.8% sensitivity and 17.6% PPV at 81.5% specificity and 38.8% sensitivity and 31.0% PPV at 94.3% specificity. Conclusions and Relevance: In this diagnostic study of an autism screening test, EHR-based autism detection achieved clinically meaningful accuracy by age 30 days, improving by age 1 year. This automated approach could be integrated with caregiver surveys to improve the accuracy of early autism screening. 
  2. Machine learning models are updated as new data is acquired or new architectures are developed. These updates usually increase model performance, but may introduce backward compatibility errors, where individual users or groups of users see their performance on the updated model adversely affected. This problem can also be present when training datasets do not accurately reflect overall population demographics, with some groups having overall lower participation in the data collection process, posing a significant fairness concern. We analyze how ideas from distributional robustness and minimax fairness can aid backward compatibility in this scenario, and propose two methods to directly address this issue. Our theoretical analysis is backed by experimental results on CIFAR-10, CelebA, and Waterbirds, three standard image classification datasets. 
  3. Image retrieval relies heavily on the quality of the data modeling and the distance measurement in the feature space. Building on the concept of an image manifold, we first propose to represent the feature space of images, learned via neural networks, as a graph. Neighborhoods in the feature space are then defined by the geodesic distance between images, represented as graph vertices or manifold samples. When only a limited number of images is available, this manifold is sparsely sampled, making the geodesic computation and the corresponding retrieval harder. To address this, we augment the manifold samples with geometrically aligned text, thereby using a plethora of sentences to teach us about images. In addition to extensive results on standard datasets illustrating the power of text to help in image retrieval, we introduce a new public dataset based on CLEVR to quantify the semantic similarity between visual data and text data. The experimental results show that the joint embedding manifold is a robust representation, making it a better basis for image retrieval given only an image and a textual instruction describing the desired modifications to that image. 
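
The graph-based neighborhood idea in the third abstract can be illustrated with a minimal sketch. It assumes a symmetric k-nearest-neighbor graph over feature vectors and Dijkstra shortest paths as the geodesic distance; the abstract does not specify either choice, and the function names (`knn_graph`, `geodesic_distances`, `retrieve`) and the toy 2-D points are hypothetical. The text augmentation step described in the abstract is not modeled here.

```python
import heapq
import math


def knn_graph(points, k):
    """Connect each point to its k nearest Euclidean neighbors (symmetric edges)."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(graph, source):
    """Shortest-path lengths from source along graph edges (Dijkstra)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def retrieve(points, query_index, k=2, top=3):
    """Rank the other points by geodesic distance from the query point."""
    graph = knn_graph(points, k)
    dist = geodesic_distances(graph, query_index)
    ranked = sorted((d, i) for i, d in dist.items() if i != query_index)
    return [i for _, i in ranked[:top]]


# Toy U-shaped "manifold": the two arms are close in Euclidean terms
# but far apart when distance is measured along the curve.
points = [(0, 0), (0, 1), (0, 2), (1.5, 2.5), (3, 2), (3, 1), (3, 0)]
geodesic_ranking = retrieve(points, query_index=0, k=2, top=6)
euclidean_ranking = sorted(
    range(1, len(points)), key=lambda j: math.dist(points[0], points[j])
)
```

In this toy example the point at the end of the opposite arm ranks fourth by Euclidean distance but last by geodesic distance, which is the kind of difference a manifold-aware neighborhood is meant to capture. With fewer samples the k-nearest-neighbor graph can disconnect or shortcut across the gap, which mirrors the sparse-sampling difficulty the abstract mentions.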
