skip to main content

Search for: All records

Creators/Authors contains: "Hong, David"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract Subspace clustering is the unsupervised grouping of points lying near a union of low-dimensional linear subspaces. Algorithms based directly on geometric properties of such data tend to either provide poor empirical performance, lack theoretical guarantees or depend heavily on their initialization. We present a novel geometric approach to the subspace clustering problem that leverages ensembles of the $K$-subspace (KSS) algorithm via the evidence accumulation clustering framework. Our algorithm, referred to as ensemble $K$-subspaces (EKSSs), forms a co-association matrix whose $(i,j)$th entry is the number of times points $i$ and $j$ are clustered together by several runs of KSS with random initializations. We prove general recovery guarantees for any algorithm that forms an affinity matrix with entries close to a monotonic transformation of pairwise absolute inner products. We then show that a specific instance of EKSS results in an affinity matrix with entries of this form, and hence our proposed algorithm can provably recover subspaces under similar conditions to state-of-the-art algorithms. The finding is, to the best of our knowledge, the first recovery guarantee for evidence accumulation clustering and for KSS variants. We show on synthetic data that our method performs well in the traditionally challenging settings of subspaces withmore »large intersection, subspaces with small principal angles and noisy data. Finally, we evaluate our algorithm on six common benchmark datasets and show that unlike existing methods, EKSS achieves excellent empirical performance when there are both a small and large number of points per subspace.« less
  2. Baseline estimation is a critical task for commercial buildings that participate in demand response programs and need to assess the impact of their strategies. The problem is to predict what the power profile would have been had the demand response event not taken place. This paper explores the use of tensor decomposition in baseline estimation. We apply the method to submetered fan power data from demand response experiments that were run to assess a fast demand response strategy expected to primarily impact the fans. Baselining this fan power data is critical for evaluating the results, but doing so presents new challenges not readily addressed by existing techniques designed primarily for baselining whole building electric loads. We find that tensor decomposition of the fan power data identifies components that capture both dominant daily patterns and demand response events, and that are generally more interpretable than those found by principal component analysis. We conclude by discussing how these components and related techniques can aid in developing new baseline models.
  3. Autonomous vehicle (AV) software systems are emerging to enable rapidly developed self-driving functionalities. Since such systems are responsible for safety-critical decisions, it is necessary to secure them in face of cyber attacks. Through an empirical study of representative AV software systems Baidu Apollo and Autoware, we discover a common over-privilege problem with the publish-subscribe communication model widely adopted by AV systems: due to the coarse-grained message design for the publish-subscribe communication, some message fields are over-granted with publish/subscribe permissions. To comply with the least-privilege principle and reduce the attack surface resulting from such problem, we argue that the publish/subscribe permissions should be defined and enforced at the granularity of message fields instead of messages. To systematically address such publish-subscribe over-privilege problems, we present AVGuardian, a system that includes (1) a static analysis tool that detects over-privilege instances in AV software and generates the corresponding access control policies at the message field granularity, and (2) a low-overhead, module-transparent, runtime publish/subscribe permission policy enforcement mechanism to perform online policy violation detection and prevention. Using our detection tool, we are able to automatically detect 581 over-privilege instances in total in Baidu Apollo. To demonstrate the severity, we further constructed several concrete exploits thatmore »can lead to vehicle collision and identity theft for AV owners, which have been reported to Baidu Apollo and confirmed as valid. For defense, we prototype and evaluate the policy enforcement mechanism, and find that it has very low overhead, does not affect original AV decision logic, and also is resilient to message replay attacks.« less
  4. We present a mechanistic mathematical model of immune checkpoint inhibitor therapy to address the oncological need for early, broadly applicable readouts (biomarkers) of patient response to immunotherapy. The model is built upon the complex biological and physical interactions between the immune system and cancer, and is informed using only standard-of-care CT. We have retrospectively applied the model to 245 patients from multiple clinical trials treated with anti–CTLA-4 or anti–PD-1/PD-L1 antibodies. We found that model parameters distinctly identified patients with common ( n = 18) and rare ( n = 10) malignancy types who benefited and did not benefit from these monotherapies with accuracy as high as 88% at first restaging (median 53 days). Further, the parameters successfully differentiated pseudo-progression from true progression, providing previously unidentified insights into the unique biophysical characteristics of pseudo-progression. Our mathematical model offers a clinically relevant tool for personalized oncology and for engineering immunotherapy regimens.
  5. Principal Component Analysis (PCA) is a standard dimensionality reduction technique, but it treats all samples uniformly, making it suboptimal for heterogeneous data that are increasingly common in modern settings. This paper proposes a PCA variant for samples with heterogeneous noise levels, i.e., heteroscedastic noise, that naturally arise when some of the data come from higher quality sources than others. The technique handles heteroscedasticity by incorporating it in the statistical model of a probabilistic PCA. The resulting optimization problem is an interesting nonconvex problem related to but not seemingly solved by singular value decomposition, and this paper derives an expectation maximization (EM) algorithm. Numerical experiments illustrate the benefits of using the proposed method to combine samples with heteroscedastic noise in a single analysis, as well as benefits of careful initialization for the EM algorithm.
  6. Principal Component Analysis (PCA) is a standard dimensionality reduction technique, but it treats all samples uniformly, making it suboptimal for heterogeneous data that are increasingly common in modern settings. This paper proposes a PCA variant for samples with heterogeneous noise levels, i.e., heteroscedastic noise, that naturally arise when some of the data come from higher quality sources than others. The technique handles heteroscedasticity by incorporating it in the statistical model of a probabilistic PCA. The resulting optimization problem is an interesting nonconvex problem related to but not seemingly solved by singular value decomposition, and this paper derives an expectation maximization (EM) algorithm. Numerical experiments illustrate the benefits of using the proposed method to combine samples with heteroscedastic noise in a single analysis, as well as benefits of careful initialization for the EM algorithm. Index Terms— Principal component analysis, heterogeneous data, maximum likelihood estimation, latent factors