NSF PAR Search | NSF Public Access Repository

STARK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

Wu, Shirley; Zhao, Shiyu; Yasunaga, Michihiro; Huang, Kexin; Cao, Kaidi; Huang, Qian; Ioannidis, Vassilis N; Subbian, Karthik; Zou, James; Leskovec, Jure (December 2024, Advances in neural information processing systems)

Full Text Available
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning

Wu, Shirley; Zhao, Shiyu; Huang, Qian; Huang, Kexin; Yasunaga, Michihiro; Cao, Kaidi; Ioannidis, Vassilis N; Subbian, Karthik; Leskovec, Jure; Zou, James (December 2024, Advances in neural information processing systems)

Full Text Available
TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs

Phothilimthana, Phitchaya Mangpo; Abu-El-Haija, Sami; Cao, Kaidi; Fatemi, Bahare; Burrows, Michael; Mendis, Charith; Perozzi, Bryan (May 2024, Proceedings of the 37th International Conference on Neural Information Processing Systems)

Full Text Available
Large Graph Property Prediction via Graph Segment Training

Cao, Kaidi; Phothilimthana, Phitchaya Mangpo; Abu-El-Haija, Sami; Zelle, Dustin; Zhou, Yanqi; Mendis, Charith; Leskovec, Jure; Perozzi, Bryan (May 2024, International Conference on Neural Information Processing Systems)

Full Text Available
Large Graph Property Prediction via Graph Segment Training

Cao, Kaidi; Phothilimthana, Phitchaya Mangpo; Abu-El-Haija, Sami; Zelle, Dustin; Zhou, Yanqi; Mendis, Charith; Leskovec, Jure; Perozzi, Bryan (December 2023, Advances in neural information processing systems)

Learning to predict properties of a large graph is challenging because each prediction requires the knowledge of an entire graph, while the amount of memory available during training is bounded. Here we propose Graph Segment Training (GST), a general framework that utilizes a divide-and-conquer approach to allow learning large graph property prediction with a constant memory footprint. GST first divides a large graph into segments and then backpropagates through only a few segments sampled per training iteration. We refine the GST paradigm by introducing a historical embedding table to efficiently obtain embeddings for segments not sampled for backpropagation. To mitigate the staleness of historical embeddings, we design two novel techniques. First, we finetune the prediction head to fix the input distribution shift. Second, we introduce Stale Embedding Dropout to drop some stale embeddings during training to reduce bias. We evaluate our complete method GST+EFD (with all the techniques together) on two large graph property prediction benchmarks: MalNet and TpuGraphs. Our experiments show that GST+EFD is both memory-efficient and fast, while offering a slight boost on test accuracy over a typical full graph training regime.
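The segment-sampling idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `gst_forward`, the mean pooling over segment embeddings, the dictionary used as the historical embedding table, and the dropout probability `drop_p` are all assumptions made for illustration, and no automatic differentiation is performed (comments only mark where gradients would flow in a real framework).

```python
import random

def gst_forward(segments, encode, table, graph_id,
                num_sampled=1, drop_p=0.5, rng=random):
    """One hypothetical GST-style forward pass over a pre-segmented graph.

    segments: list of graph segments (any objects accepted by `encode`)
    encode:   function mapping a segment to an embedding (list of floats)
    table:    dict acting as the historical embedding table
    """
    n = len(segments)
    sampled = set(rng.sample(range(n), num_sampled))
    embeddings = []
    for i, seg in enumerate(segments):
        key = (graph_id, i)
        if i in sampled:
            fresh = encode(seg)        # gradients would flow through this segment
            table[key] = fresh         # refresh the historical table
            embeddings.append(fresh)
        elif key in table and rng.random() >= drop_p:
            # Stale embedding reused without gradient; it is skipped with
            # probability drop_p (an assumed form of stale embedding dropout).
            embeddings.append(table[key])
    dim = len(embeddings[0])
    # Assumed aggregation: mean pooling over the embeddings that were kept.
    pooled = [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]
    return pooled, sampled
```

Because only `num_sampled` segments are encoded per call, the work per iteration does not grow with the number of segments, which is the property the abstract describes as a constant memory footprint.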
Full Text Available
AutoTransfer: AutoML with Knowledge Transfer - An Application to Graph Neural Networks

Cao, Kaidi; You, Jiaxuan; Liu, Jiaju; Leskovec, Jure (May 2023, International Conference on Learning Representations (ICLR))

AutoML has demonstrated remarkable success in finding an effective neural architecture for a given machine learning task defined by a specific dataset and an evaluation metric. However, most present AutoML techniques consider each task independently from scratch, which requires exploring many architectures and leads to high computational costs. We proposed AutoTransfer, an AutoML solution that improves search efficiency by transferring prior architectural design knowledge to the novel task of interest. Our key innovations include a task-model bank that captures the model performance over a diverse set of GNN architectures and tasks, and a computationally efficient task embedding that can accurately measure the similarity among different tasks. Based on the task-model bank and the task embeddings, our method estimates the design priors of desirable models for the novel task by aggregating a similarity-weighted sum of the top-K design distributions on tasks that are similar to the task of interest. The computed design priors can be used with any AutoML search algorithm. We evaluated AutoTransfer on six datasets in the graph machine learning domain. Experiments demonstrate that (i) our proposed task embedding can be computed efficiently, and that tasks with similar embeddings have similar best-performing architectures; (ii) AutoTransfer significantly improves search efficiency with the transferred design priors, reducing the number of explored architectures by an order of magnitude. Finally, we released GNN-BANK-101, a large-scale dataset of detailed GNN training information for 120,000 task-model combinations, to facilitate and inspire future research.
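The similarity-weighted aggregation of top-K design distributions mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the AutoTransfer code: cosine similarity as the task-similarity measure, normalization by the sum of the weights, and the names `cosine` and `design_prior` are all hypothetical choices. The sketch also assumes non-negative similarities so that the result stays a valid distribution.

```python
import math

def cosine(u, v):
    """Cosine similarity between two task embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def design_prior(new_embedding, bank, k=2):
    """Estimate a design prior for a new task.

    bank: list of (task_embedding, design_distribution) pairs, where each
          design distribution is a list of probabilities over design choices.
    """
    # Keep the K bank tasks most similar to the new task.
    top = sorted(bank, key=lambda item: cosine(new_embedding, item[0]),
                 reverse=True)[:k]
    weights = [cosine(new_embedding, emb) for emb, _ in top]
    total = sum(weights)
    dim = len(top[0][1])
    # Similarity-weighted sum of the selected distributions, renormalized.
    return [sum(w * dist[i] for w, (_, dist) in zip(weights, top)) / total
            for i in range(dim)]

bank = [
    ([1.0, 0.0], [0.8, 0.2]),
    ([0.9, 0.1], [0.6, 0.4]),
    ([0.0, 1.0], [0.1, 0.9]),
]
prior = design_prior([1.0, 0.0], bank, k=2)
```

In this toy bank the third task is dissimilar to the new task and is excluded by the top-K cut, so the prior is a blend of the first two distributions only.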
Full Text Available
Learning Backward Compatible Embeddings

https://doi.org/10.1145/3534678.3539194

Hu, Weihua; Bansal, Rajas; Cao, Kaidi; Rao, Nikhil; Subbian, Karthik; Leskovec, Jure (August 2022, Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)

Embeddings, low-dimensional vector representations of objects, are fundamental in building modern machine learning systems. In industrial settings, there is usually an embedding team that trains an embedding model to solve intended tasks (e.g., product recommendation). The produced embeddings are then widely consumed by consumer teams to solve their unintended tasks (e.g., fraud detection). However, as the embedding model gets updated and retrained to improve performance on the intended task, the newly-generated embeddings are no longer compatible with the existing consumer models. This means that either historical versions of the embeddings can never be retired, or all consumer teams have to retrain their models to make them compatible with the latest version of the embeddings, both of which are extremely costly in practice. Here we study the problem of embedding version updates and their backward compatibility. We formalize the problem where the goal is for the embedding team to keep updating the embedding version, while the consumer teams do not have to retrain their models. We develop a solution based on learning backward compatible embeddings, which allows the embedding model version to be updated frequently, while also allowing the latest version of the embedding to be quickly transformed into any backward compatible historical version of it, so that consumer teams do not have to retrain their models. Our key idea is that whenever a new embedding model is trained, we learn it together with a light-weight backward compatibility transformation that aligns the new embedding to the previous version of it. Our learned backward transformations can then be composed to produce any historical version of the embedding. Under our framework, we explore six methods and systematically evaluate them on a real-world recommender system application. We show that the best method, which we call BC-Aligner, maintains backward compatibility with existing unintended tasks even after multiple model version updates. Simultaneously, BC-Aligner achieves intended task performance similar to that of an embedding model solely optimized for the intended task.
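The composition of backward transformations described in this abstract can be shown with a short sketch. It assumes, purely for illustration, that each backward compatibility transformation is a linear map stored as a matrix; the names `matvec`, `to_version`, and `backward_maps` are hypothetical and do not come from the paper.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def to_version(embedding, backward_maps, current, target):
    """Map an embedding from version `current` back to version `target`.

    backward_maps[v] is the (assumed linear) transformation that aligns a
    version-v embedding with the version-(v-1) embedding space.
    """
    x = list(embedding)
    # Apply the per-version maps one step at a time: v -> v-1 -> ... -> target.
    for v in range(current, target, -1):
        x = matvec(backward_maps[v], x)
    return x

backward_maps = {
    2: [[2.0, 0.0], [0.0, 2.0]],   # version 2 -> version 1
    1: [[0.0, 1.0], [1.0, 0.0]],   # version 1 -> version 0
}
v0_embedding = to_version([1.0, 3.0], backward_maps, current=2, target=0)
```

A consumer model trained on version 0 would then read `v0_embedding` instead of the raw version-2 vector, so only one light-weight map per update needs to be stored.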
Full Text Available
Open-World Semi-Supervised Learning

Cao, Kaidi; Brbic, Maria; Leskovec, Jure (January 2022, International Conference on Learning Representations (ICLR))

A fundamental limitation of applying semi-supervised learning in real-world settings is the assumption that unlabeled test data contains only classes previously encountered in the labeled training data. However, this assumption rarely holds for data in the wild, where instances belonging to novel classes may appear at test time. Here, we introduce a novel open-world semi-supervised learning setting that formalizes the notion that novel classes may appear in the unlabeled test data. In this setting, the goal is to solve the class distribution mismatch between labeled and unlabeled data, where at test time every input instance either needs to be classified into one of the existing classes or a new unseen class needs to be initialized. To tackle this challenging problem, we propose ORCA, an end-to-end deep learning approach that introduces an uncertainty adaptive margin mechanism to circumvent the bias towards seen classes caused by learning discriminative features for seen classes faster than for novel classes. In this way, ORCA reduces the gap between the intra-class variance of seen classes and that of novel classes. Experiments on image classification datasets and a single-cell annotation dataset demonstrate that ORCA consistently outperforms alternative baselines, achieving 25% improvement on seen and 96% improvement on novel classes of the ImageNet dataset.
Full Text Available
Concept Learners for Few-Shot Learning

Cao, Kaidi; Brbic, Maria; Leskovec, Jure (January 2021, International Conference on Learning Representations (ICLR))
Developing algorithms that are able to generalize to a novel task given only a few labeled examples represents a fundamental challenge in closing the gap between machine- and human-level performance. The core of human cognition lies in the structured, reusable concepts that help us to rapidly adapt to new tasks and provide reasoning behind our decisions. However, existing meta-learning methods learn complex representations across prior labeled tasks without imposing any structure on the learned representations. Here we propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions. Instead of learning a joint unstructured metric space, COMET learns mappings of high-level concepts into semi-structured metric spaces, and effectively combines the outputs of independent concept learners. We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization, and cell type annotation on a novel dataset from a biological domain developed in our work. COMET significantly outperforms strong meta-learning baselines, achieving 6–15% relative improvement on the most challenging 1-shot learning tasks, while, unlike existing methods, providing interpretations behind the model’s predictions.
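The idea of combining independent concept learners can be sketched with a nearest-prototype classifier that sums distances over concept-specific feature subsets. This is a simplified stand-in, not COMET itself: the real method learns an embedding per concept, whereas this sketch uses raw feature slices, and the function name `concept_predict` and the squared Euclidean distance are assumptions.

```python
def concept_predict(query, support, concepts):
    """Classify `query` by summing per-concept distances to class prototypes.

    query:    feature vector (list of floats)
    support:  dict mapping class label -> list of labeled feature vectors
    concepts: list of index lists; each selects the features of one concept
    """
    scores = {}
    for label, examples in support.items():
        total = 0.0
        for idx in concepts:
            # Per-concept prototype: mean of the support features for this concept.
            prototype = [sum(x[i] for x in examples) / len(examples) for i in idx]
            total += sum((query[i] - p) ** 2 for i, p in zip(idx, prototype))
        scores[label] = -total  # a smaller summed distance gives a higher score
    return max(scores, key=scores.get)

support = {
    "a": [[0.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]],
    "b": [[5.0, 5.0, 5.0, 5.0]],
}
concepts = [[0, 1], [2, 3]]
label = concept_predict([0.0, 1.0, 0.0, 0.0], support, concepts)
```

Keeping the per-concept distances separate before summing is what would allow each concept's contribution to a prediction to be inspected.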
Full Text Available
