

Search for: All records

Award ID contains: 2211386


  1. Catastrophic forgetting remains an outstanding challenge in continual learning. Recently, methods inspired by the brain, such as continual representation learning and memory replay, have been used to combat catastrophic forgetting. Associative learning (retaining associations between inputs and outputs, even after good representations are learned) serves an important function in the brain; however, its role in continual learning has not been carefully studied. Here, we identified a two-layer neural circuit in the fruit fly olfactory system that performs continual associative learning between odors and their associated valences. In the first layer, inputs (odors) are encoded using sparse, high-dimensional representations, which reduces memory interference by activating nonoverlapping populations of neurons for different odors. In the second layer, only the synapses between odor-activated neurons and the odor’s associated output neuron are modified during learning; the rest of the weights are frozen to prevent unrelated memories from being overwritten. We prove theoretically that, under continual learning, these two perceptron-like layers help reduce catastrophic forgetting compared to the original perceptron algorithm. We then show empirically on benchmark data sets that this simple and lightweight architecture outperforms other popular neural-inspired algorithms when also using a two-layer feedforward architecture. Overall, fruit flies evolved an efficient continual associative learning algorithm, and circuit mechanisms from neuroscience can be translated to improve machine computation.

     
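    To make the circuit concrete, here is a minimal Python sketch of the two layers described above: a fixed sparse random expansion with top-k winner-take-all, followed by an associative layer in which only the synapses from active units to the correct output neuron are updated. All names, layer sizes, and the exact update rule are illustrative assumptions, not the paper's implementation.

        import numpy as np

        rng = np.random.default_rng(0)

        class FlyAssociativeLearner:
            """Hedged sketch: sparse high-dimensional coding (layer 1)
            plus an associative layer with mostly frozen weights (layer 2)."""

            def __init__(self, d_in, d_hidden=2000, k=100, n_classes=10, lr=0.1):
                # Layer 1: fixed sparse random projection (odor -> high-dim code)
                self.proj = (rng.random((d_hidden, d_in)) < 0.1).astype(float)
                self.k = k                                # active units kept per input
                self.W = np.zeros((n_classes, d_hidden))  # layer 2: associative weights
                self.lr = lr

            def encode(self, x):
                # Sparse code: keep only the top-k most activated hidden units,
                # so different inputs activate largely nonoverlapping populations.
                h = self.proj @ np.asarray(x, dtype=float)
                code = np.zeros_like(h)
                code[np.argsort(h)[-self.k:]] = 1.0
                return code

            def update(self, x, y):
                # Only synapses between active hidden units and the output neuron
                # for class y are modified; every other weight stays frozen.
                code = self.encode(x)
                if int(np.argmax(self.W @ code)) != y:
                    self.W[y] += self.lr * code

            def predict(self, x):
                return int(np.argmax(self.W @ self.encode(x)))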
  2. Finding optimal bipartite matchings—e.g., matching medical students to hospitals for residency, items to buyers in an auction, or papers to reviewers for peer review—is a fundamental combinatorial optimization problem. We found a distributed algorithm for computing matchings by studying the development of the neuromuscular circuit. The neuromuscular circuit can be viewed as a bipartite graph formed between motor neurons and muscle fibers. In newborn animals, neurons and fibers are densely connected, but after development, each fiber is typically matched (i.e., connected) to exactly one neuron. We cast this synaptic pruning process as a distributed matching (or assignment) algorithm, where motor neurons “compete” with each other to “win” muscle fibers. We show that this algorithm is simple to implement, theoretically sound, and effective in practice when evaluated on real-world bipartite matching problems. Thus, insights from the development of neural circuits can inform the design of algorithms for fundamental computational problems.

     
    Free, publicly-accessible full text available September 10, 2025
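    The "compete to win" dynamic above maps naturally onto auction-style assignment. The sketch below is a standard auction-type matching written in that vocabulary; it is not the paper's algorithm, and the one-fiber-per-neuron simplification, function name, and eps parameter are illustrative assumptions. Each neuron repeatedly bids on its currently most valuable fiber; prices rise until every fiber is held by exactly one neuron.

        import numpy as np

        def neuron_fiber_matching(value, eps=0.01):
            # value[i, j]: benefit of matching neuron i to fiber j
            # (assumes at least as many fibers as neurons, so all neurons match).
            n_neurons, n_fibers = value.shape
            price = np.zeros(n_fibers)                 # running cost of each fiber
            owner = np.full(n_fibers, -1, dtype=int)   # owner[j]: neuron holding fiber j
            unmatched = list(range(n_neurons))
            while unmatched:
                i = unmatched.pop()
                gain = value[i] - price                # net benefit of each fiber to neuron i
                j = int(np.argmax(gain))
                runner_up = np.partition(gain, -2)[-2] if n_fibers > 1 else gain[j]
                price[j] += gain[j] - runner_up + eps  # bid just above the runner-up
                if owner[j] != -1:
                    unmatched.append(owner[j])         # previous owner loses the fiber
                owner[j] = i
            return owner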
  3. Labeling data via rules of thumb and minimal label supervision is central to Weak Supervision, a paradigm subsuming subareas of machine learning such as crowdsourced learning and semi-supervised ensemble learning. By using this labeled data to train modern machine learning methods, the cost of acquiring large amounts of hand-labeled data can be reduced. Approaches to combining the rules of thumb fall into two camps, reflecting different ideologies of statistical estimation. The most common approach, exemplified by the Dawid-Skene model, is based on probabilistic modeling. The other, developed in the work of Balsubramani-Freund and others, is adversarial and game-theoretic. We provide a variety of statistical results for the adversarial approach under log-loss: we characterize the form of the solution, relate it to logistic regression, demonstrate consistency, and give rates of convergence. On the other hand, we find that probabilistic approaches for the same model class can fail to be consistent. Experimental results are provided to corroborate the theoretical results.
    Free, publicly-accessible full text available July 19, 2025
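    To give the adversarial formulation some shape, here is one hedged way to write the game under log-loss in LaTeX. The notation (rules h_j, accuracy bounds b_j, multipliers \lambda_j) is simplified and ours rather than the paper's exact setup; it is meant only to indicate why the minimax solution takes a logistic-regression-like form, as the abstract states.

        \min_{q \in [0,1]^n} \; \max_{p \in [0,1]^n} \;
          \sum_{i=1}^{n} \left[ p_i \log \frac{1}{q_i}
            + (1 - p_i) \log \frac{1}{1 - q_i} \right]
        \quad \text{s.t.} \quad
          \frac{1}{n} \sum_{i=1}^{n} h_j(x_i)\,(2 p_i - 1) \ge b_j,
          \qquad j = 1, \dots, m.

        % Under log-loss, the minimax predictions have a sigmoid-of-weighted-vote
        % form, which is what ties this adversarial approach to logistic regression:
        q_i = \sigma\!\left( \sum_{j=1}^{m} \lambda_j\, h_j(x_i) \right),
        \qquad \sigma(z) = \frac{1}{1 + e^{-z}}.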
  4. We consider k-means clustering in an online setting where each new data point is assigned to its closest cluster center and incurs a loss equal to the squared distance to that center, after which the algorithm is allowed to update its centers. The goal over a data stream X is to achieve a total loss that is not too much larger than L(X, OPT), the best possible loss using k fixed centers in hindsight. We give the first algorithm to achieve polynomial space and time complexity in the online setting. 
  5. We consider k-means clustering in an online setting where each new data point is assigned to its closest cluster center and incurs a loss equal to the squared distance to that center, after which the algorithm is allowed to update its centers. The goal over a data stream X is to achieve a total loss that is not too much larger than the best possible loss using k fixed centers in hindsight. Ours is the first algorithm to achieve polynomial space and time complexity in the online setting. We note that our results have implications for the related streaming setting, where one final clustering is output, and the no-substitution setting, where center selections are permanent. We show a general reduction between the no-substitution cost of a blackbox algorithm and its online cost. Finally, we translate our algorithm to the no-substitution and streaming settings, where it competes with and can outperform existing work.
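    To make the online protocol in records 4 and 5 concrete, the sketch below runs exactly the loop they describe: assign each arriving point to its nearest center, charge the squared distance as loss, and only then update. The center update shown is the classic sequential (running-mean) k-means rule, used here only as a placeholder; it does not carry the papers' guarantees relative to the best k fixed centers in hindsight.

        import numpy as np

        def online_kmeans(stream, k):
            """Online k-means protocol with a simple running-mean update."""
            centers, counts, total_loss = [], [], 0.0
            for point in stream:
                x = np.asarray(point, dtype=float)
                if len(centers) < k:                   # seed: first k points become centers
                    centers.append(x.copy())
                    counts.append(1)
                    continue
                d2 = [float(np.sum((x - c) ** 2)) for c in centers]
                j = int(np.argmin(d2))
                total_loss += d2[j]                    # loss is charged before the update
                counts[j] += 1
                centers[j] += (x - centers[j]) / counts[j]
            return centers, total_loss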