Search for: All records

Creators/Authors contains: "Chen, Justin"

Total Resources: 22

Resource Type
Conference Paper: 9
Conference Proceeding: 0
Dataset: 0
Journal Article: 13
Workshop Report: 0

Availability
Full Text / Resource Available: 21
Citation Only: 1

Improved Frequency Estimation Algorithms with and without Predictions

Aamand, Anders ; Chen, Justin Y. ; Nguyen, Huy ; Silwal, Sandeep ; Vakilian, Ali ( September 2023 , Advances in Neural Information Processing Systems)

Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically bound the error of the estimated frequencies for any possible input. The work of Hsu et al. (2019) introduced the idea of using machine learning to tailor sketching algorithms to the specific data distribution they are being run on. In particular, their learning-augmented frequency estimation algorithm uses a learned heavy-hitter oracle which predicts which elements will appear many times in the stream. We give a novel algorithm which, in some parameter regimes, already theoretically outperforms the learning-based algorithm of Hsu et al. without the use of any predictions. Augmenting our algorithm with heavy-hitter predictions further reduces the error and improves upon the state of the art. Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches.
Free, publicly-accessible full text available September 21, 2024
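
For readers unfamiliar with the baseline this abstract builds on, the following is a minimal sketch of a plain CountMin sketch, plus a wrapper in the spirit of the heavy-hitter-oracle idea the abstract attributes to Hsu et al. It is not the new algorithm from this paper. The class names, the salted use of Python's built-in hash, and the is_heavy predicate are all assumptions made for illustration.

```python
import random


class CountMinSketch:
    """Plain CountMin sketch: `depth` rows of `width` counters each."""

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One random salt per row stands in for an independent hash function
        # (an illustrative shortcut, not a pairwise-independent family).
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        return hash((self.salts[row], item)) % self.width

    def update(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += count

    def estimate(self, item):
        # Collisions can only add to a counter, so every row overestimates;
        # the minimum over rows is the tightest available estimate.
        return min(
            self.table[row][self._bucket(row, item)] for row in range(self.depth)
        )


class LearnedCountMin:
    """Predicted heavy hitters get exact counters; everything else is sketched.

    `is_heavy` is a hypothetical oracle (item -> bool) standing in for a
    learned model.
    """

    def __init__(self, is_heavy, width, depth, seed=0):
        self.is_heavy = is_heavy
        self.exact = {}
        self.sketch = CountMinSketch(width, depth, seed)

    def update(self, item, count=1):
        if self.is_heavy(item):
            self.exact[item] = self.exact.get(item, 0) + count
        else:
            self.sketch.update(item, count)

    def estimate(self, item):
        if self.is_heavy(item):
            return self.exact.get(item, 0)
        return self.sketch.estimate(item)
```

With non-negative updates the plain sketch never underestimates a frequency, while items routed to the exact counters are counted without error, which is the intuition behind separating out predicted heavy hitters.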
Data Structures for Density Estimation

Aamand, Anders ; Andoni, Alexandr ; Chen, Justin ; Indyk, Piotr ; Narayanan, Shyam ; Silwal, Sandeep ( January 2023 , International Conference on Machine Learning)

We study statistical/computational tradeoffs for the following density estimation problem: given k distributions v_1, ..., v_k over a discrete domain of size n, and sampling access to a distribution p, identify v_i that is “close” to p. Our main result is the first data structure that, given a sublinear (in n) number of samples from p, identifies v_i in time sublinear in k. We also give an improved version of the algorithm of (Acharya et al., 2018) that reports v_i in time linear in k. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.
Full Text Available
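
To make the problem statement in this abstract concrete, here is a naive baseline, not the data structure from the paper: it builds the empirical distribution from the samples and scans all k candidates for the one at the smallest L1 distance. The function name and the choice of L1 distance are assumptions for illustration. The scan costs time proportional to k times n, which is the kind of cost the paper's sublinear results aim to avoid.

```python
from collections import Counter


def closest_candidate(candidates, samples, n):
    """Return the index i of the candidate closest in L1 to the empirical distribution.

    candidates: list of k probability vectors, each of length n
    samples:    list of draws from the unknown distribution p, values in range(n)
    """
    counts = Counter(samples)
    m = len(samples)
    empirical = [counts.get(x, 0) / m for x in range(n)]

    def l1_distance(v):
        return sum(abs(a - b) for a, b in zip(v, empirical))

    # Linear scan over all k candidates (naive baseline).
    return min(range(len(candidates)), key=lambda i: l1_distance(candidates[i]))
```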
Learned Interpolation for Better Streaming Quantile Approximation with Worst-Case Guarantees

Schiefer, Nicholas ; Chen, Justin Y ; Indyk, Piotr ; Narayanan, Shyam ; Silwal, Sandeep ; Wagner, Tal ( January 2023 , SIAM Conference on Applied and Computational Discrete Algorithms)

An ε-approximate quantile sketch over a stream of n inputs approximates the rank of any query point q—that is, the number of input points less than q—up to an additive error of εn, generally with some probability of at least 1−1/poly(n), while consuming o(n) space. While the celebrated KLL sketch of Karnin, Lang, and Liberty achieves a provably optimal quantile approximation algorithm over worst-case streams, the approximations it achieves in practice are often far from optimal. Indeed, the most commonly used technique in practice is Dunning’s t-digest, which often achieves much better approximations than KLL on real-world data but is known to have arbitrarily large errors in the worst case. We apply interpolation techniques to the streaming quantiles problem to attempt to achieve better approximations on real-world data sets than KLL while maintaining similar guarantees in the worst case.
Full Text Available
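
The rank definition and the role of interpolation in this abstract can be illustrated with a small offline toy. This is not a streaming sketch and not the paper's algorithm: it keeps every step-th point of already-sorted data along with its exact rank, then estimates the rank of a query by linear interpolation between the two stored neighbors. All function names are assumptions, and the input values are assumed to be distinct.

```python
from bisect import bisect_left


def exact_rank(sorted_data, q):
    """Number of input points strictly less than q."""
    return bisect_left(sorted_data, q)


def build_summary(sorted_data, step):
    """Keep every `step`-th point with its rank; always keep the min and the max.

    Assumes distinct values, so the rank of sorted_data[i] is exactly i.
    """
    indices = list(range(0, len(sorted_data), step))
    if indices[-1] != len(sorted_data) - 1:
        indices.append(len(sorted_data) - 1)
    return [(sorted_data[i], i) for i in indices]


def interpolated_rank(summary, q, n):
    """Estimate rank(q) by linear interpolation between stored neighbors."""
    values = [v for v, _ in summary]
    j = bisect_left(values, q)  # values[j-1] < q <= values[j]
    if j == 0:
        return 0.0  # q is at or below the minimum: nothing is smaller
    if j == len(values):
        return float(n)  # q is above the maximum: all n points are smaller
    (v_lo, r_lo), (v_hi, r_hi) = summary[j - 1], summary[j]
    t = (q - v_lo) / (v_hi - v_lo)
    return r_lo + t * (r_hi - r_lo)
```

Because the true rank and the estimate both lie between the ranks of the two stored neighbors, the additive error is at most step, so choosing step = εn gives an ε-approximation in the sense defined above, at the cost of storing about 1/ε points.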
Data Structures for Density Estimation

Aamand, Anders ; Andoni, Alexandr ; Chen, Justin Y. ; Indyk, Piotr ; Narayanan, Shyam ; Silwal, Sandeep ( January 2023 , International Conference on Machine Learning, ICML 2023)

Full Text Available
Noetherian operators and primary decomposition

https://doi.org/10.1016/j.jsc.2021.09.002

Chen, Justin ; Härkönen, Marc ; Krone, Robert ; Leykin, Anton ( May 2022 , Journal of Symbolic Computation)

Full Text Available
Mobility analysis of nanocluster formation and growth from titanium tetraisopropoxide in a flow tube reactor

https://doi.org/10.1016/j.jaerosci.2022.105981

Qiao, Yuechen ; Li, Li ; Chen, Justin ; Yang, Suo ; Hogan, Christopher J. ( June 2022 , Journal of Aerosol Science)

Full Text Available
Site Isolation in Metal–Organic Layers Enhances Photoredox Gold Catalysis

https://doi.org/10.1021/jacs.2c03062

Zheng, Haifeng ; Fan, Yingjie ; Song, Yang ; Chen, Justin S. ; You, Eric ; Labalme, Steven ; Lin, Wenbin ( June 2022 , Journal of the American Chemical Society)

Full Text Available
(Optimal) Online Bipartite Matching with Degree Information

Aamand, Anders ; Chen, Justin Y ; Indyk, Piotr ( January 2022 , Conference on Neural Information Processing Systems)

We propose a model for online graph problems where algorithms are given access to an oracle that predicts (e.g., based on modeling assumptions or on past data) the degrees of nodes in the graph. Within this model, we study the classic problem of online bipartite matching, and a natural greedy matching algorithm called MinPredictedDegree, which uses predictions of the degrees of offline nodes. For the bipartite version of a stochastic graph model due to Chung, Lu, and Vu where the expected values of the offline degrees are known and used as predictions, we show that MinPredictedDegree stochastically dominates any other online algorithm, i.e., it is optimal for graphs drawn from this model. Since the “symmetric” version of the model, where all online nodes are identical, is a special case of the well-studied “known i.i.d. model”, it follows that the competitive ratio of MinPredictedDegree on such inputs is at least 0.7299. For the special case of graphs with power law degree distributions, we show that MinPredictedDegree frequently produces matchings almost as large as the true maximum matching on such graphs. We complement these results with an extensive empirical evaluation showing that MinPredictedDegree compares favorably to state-of-the-art online algorithms for online matching.
Full Text Available
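
One natural reading of the greedy rule described in this abstract is sketched below: when an online node arrives, match it to a still-unmatched offline neighbor with the smallest predicted degree. The function signature, the input encoding, and the tie-breaking (first neighbor listed wins) are assumptions for illustration and are not taken from the paper.

```python
def min_predicted_degree(arrivals, predicted_degree):
    """Greedy online bipartite matching guided by predicted offline degrees.

    arrivals:         iterable of (online_node, list_of_offline_neighbors),
                      in arrival order
    predicted_degree: dict mapping offline node -> predicted degree
    Returns a dict mapping each matched offline node to its online partner.
    """
    matched = {}
    for online, neighbors in arrivals:
        free = [u for u in neighbors if u not in matched]
        if free:
            # Prefer the offline node expected to have the fewest future options.
            best = min(free, key=lambda u: predicted_degree[u])
            matched[best] = online
        # If no neighbor is free, the online node stays unmatched.
    return matched
```

In a two-by-two example where offline node "a" neighbors both online nodes and "b" neighbors only the first, the rule matches the first arrival to "b" and leaves "a" free for the second, giving a perfect matching that a greedy choice of "a" would have missed.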
Noetherian operators in Macaulay2

https://doi.org/10.2140/jsag.2022.12.33

Chen, Justin ; Cid-Ruiz, Yairon ; Härkönen, Marc ; Krone, Robert ; Leykin, Anton ( January 2022 , Journal of Software for Algebra and Geometry)

Full Text Available
Using Quantitative Imaging for Personalized Medicine in Pancreatic Cancer: A Review of Radiomics and Deep Learning Applications

https://doi.org/10.3390/cancers14071654

Preuss, Kiersten ; Thach, Nate ; Liang, Xiaoying ; Baine, Michael ; Chen, Justin ; Zhang, Chi ; Du, Huijing ; Yu, Hongfeng ; Lin, Chi ; Hollingsworth, Michael A. ; et al ( April 2022 , Cancers)

As the most lethal major cancer, pancreatic cancer is a global healthcare challenge. Personalized medicine utilizing cutting-edge multi-omics data holds potential for major breakthroughs in tackling this critical problem. Radiomics and deep learning, two trendy quantitative imaging methods that take advantage of data science and modern medical imaging, have shown increasing promise in advancing the precision management of pancreatic cancer via diagnosing of precursor diseases, early detection, accurate diagnosis, and treatment personalization and optimization. Radiomics employs manually-crafted features, while deep learning applies computer-generated automatic features. These two methods aim to mine hidden information in medical images that is missed by conventional radiology and gain insights by systematically comparing the quantitative image information across different patients in order to characterize unique imaging phenotypes. Both methods have been studied and applied in various pancreatic cancer clinical applications. In this review, we begin with an introduction to the clinical problems and the technology. After providing technical overviews of the two methods, this review focuses on the current progress of clinical applications in precancerous lesion diagnosis, pancreatic cancer detection and diagnosis, prognosis prediction, treatment stratification, and radiogenomics. The limitations of current studies and methods are discussed, along with future directions. With better standardization and optimization of the workflow from image acquisition to analysis and with larger and especially prospective high-quality datasets, radiomics and deep learning methods could show real hope in the battle against pancreatic cancer through big data-based high-precision personalization.
Full Text Available