Search for: All records

Creators/Authors contains: "Mirzasoleiman, Baharan"

MM-Gen: Principled and Generalizable Data Curation for Enhancing Task Performance in VLMs

Joshi, Siddharth; Nushi, Besmira; Balachandran, Vidhisha; Chandrasekaran, Varun; Vineet, Vibhav; Joshi, Neel; Mirzasoleiman, Baharan (September 2025, Journal of Data-centric Machine Learning Research (DMLR))

Free, publicly-accessible full text available September 25, 2026

Representations Shape Weak-to-Strong Generalization: Theoretical Insights and Empirical Predictions

Xue, Yihao; Li, Jiping; Mirzasoleiman, Baharan (July 2025, International Conference on Machine Learning (ICML))

Free, publicly-accessible full text available July 13, 2026

Synthetic Text Generation for Training Large Language Models via Gradient Matching

Nguyen, Dang; Li, Zeman; Bateni, Mohammadhossein; Mirrokni, Vahab; Razaviyayn, Meisam; Mirzasoleiman, Baharan (July 2025, International Conference on Machine Learning (ICML))

Free, publicly-accessible full text available July 13, 2026

Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks

Joshi, Siddharth; Ni, Jiayi; Mirzasoleiman, Baharan (April 2025, International Conference on Learning Representations (ICLR))

Free, publicly-accessible full text available April 11, 2026

Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures

Nguyen, Dang; Yang, Wenhan; Anand, Rathul; Yang, Yu; Mirzasoleiman, Baharan (April 2025, International Conference on Learning Representations (ICLR))

Free, publicly-accessible full text available April 11, 2026

Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization

Nguyen, Dang; Haddad, Paymon; Gan, Eric; Mirzasoleiman, Baharan (December 2024, Advances in Neural Information Processing Systems (NeurIPS))

Full Text Available

SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models

Yang, Yu; Mishra, Siddhartha; Chiang, Jeffery N; Mirzasoleiman, Baharan (December 2024, Advances in Neural Information Processing Systems (NeurIPS))

Full Text Available

Graph Contrastive Learning under Heterophily via Graph Filters

Yang, Wenhan; Mirzasoleiman, Baharan (July 2024, Conference on Uncertainty in Artificial Intelligence (UAI))

Full Text Available

Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks

Yang, Wenhan; Gao, Jingdong; Mirzasoleiman, Baharan (July 2024, International Conference on Machine Learning (ICML))

Full Text Available

Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise

Xue, Yihao; Whitecross, Kyle; Mirzasoleiman, Baharan (July 2024, Conference on Uncertainty in Artificial Intelligence (UAI))

Full Text Available
