Title: Is Cosine-Similarity of Embeddings Really About Similarity?
Award ID(s):
1846210
PAR ID:
10520725
Author(s) / Creator(s):
; ;
Publisher / Repository:
ACM
Date Published:
ISBN:
9798400701726
Page Range / eLocation ID:
887 to 890
Format(s):
Medium: X
Location:
Singapore, Singapore
Sponsoring Org:
National Science Foundation
More Like this
  1. Objective: This study explores subjective and objective driving style similarity to identify how similarity can be used to develop driver-compatible vehicle automation. Background: Similarity in the ways that interaction partners perform tasks can be measured subjectively, through questionnaires, or objectively by characterizing each agent’s actions. Although subjective measures have advantages in prediction, objective measures are more useful when operationalizing interventions based on these measures. Showing how objective and subjective similarity are related is therefore prudent for aligning future machine performance with human preferences. Methods: A driving simulator study was conducted with stop-and-go scenarios. Participants experienced conservative, moderate, and aggressive automated driving styles and rated the similarity between their own driving style and that of the automation. Objective similarity between the manual and automated driving speed profiles was calculated using three distance measures: dynamic time warping, Euclidean distance, and time alignment measure. Linear mixed effects models were used to examine how different components of the stopping profile and the three objective similarity measures predicted subjective similarity. Results: Objective similarity using Euclidean distance best predicted subjective similarity. However, this was only observed for participants’ approach to the intersection and not their departure. Conclusion: Developing driving styles that drivers perceive to be similar to their own is an important step toward driver-compatible automation. In determining what constitutes similarity, it is important to (a) use measures that reflect the driver’s perception of similarity, and (b) understand what elements of the driving style govern subjective similarity.
  2. Similarity search is the basis for many data analytics techniques, including k-nearest neighbor classification and outlier detection. Similarity search over large data sets relies on i) a distance metric learned from input examples and ii) an index to speed up search based on the learned distance metric. In interactive systems, input to guide the learning of the distance metric may be provided over time. As this new input changes the learned distance metric, a naive approach would adopt the costly process of re-indexing all items after each metric change. In this paper, we propose the first solution, called OASIS, to instantaneously adapt the index to conform to a changing distance metric without this prohibitive re-indexing process. To achieve this, we prove that locality-sensitive hashing (LSH) provides an invariance property, meaning that an LSH index built on the original distance metric is equally effective at supporting similarity search using an updated distance metric as long as the transform matrix learned for the new distance metric satisfies certain properties. This observation allows OASIS to avoid recomputing the index from scratch in most cases. Further, for the rare cases when an adaptation of the LSH index is shown to be necessary, we design an efficient incremental LSH update strategy that re-hashes only a small subset of the items in the index. In addition, we develop an efficient distance metric learning strategy that incrementally learns the new metric as inputs are received. Our experimental study using real-world public datasets confirms the effectiveness of OASIS at improving the accuracy of various similarity search-based data analytics tasks by instantaneously adapting the distance metric and its associated index in tandem, while achieving a speedup of up to 3 orders of magnitude over state-of-the-art techniques.
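To illustrate the kind of LSH index that item 2 refers to, here is a minimal Python sketch of random-hyperplane LSH, a common LSH family for cosine similarity. It is not the OASIS method: the abstract does not say which LSH family OASIS uses, and the function names (`make_hyperplanes`, `signature`, `build_index`, `query`) and parameters are hypothetical, chosen only for this example. The sketch covers only the basic idea of hashing items into buckets so that a query is compared against one bucket rather than the whole data set.

```python
import numpy as np


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals in R^dim (assumed hash family)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))


def signature(planes, v):
    """One bit per hyperplane: 1 if v lies on its non-negative side, else 0."""
    v = np.asarray(v, dtype=float)
    return tuple(int(b) for b in (planes @ v >= 0))


def build_index(planes, items):
    """Map each signature to the list of item positions that hash to it."""
    index = {}
    for i, v in enumerate(items):
        index.setdefault(signature(planes, v), []).append(i)
    return index


def query(planes, index, q):
    """Return candidate item positions sharing the query's bucket."""
    return index.get(signature(planes, q), [])


planes = make_hyperplanes(dim=3, n_bits=8)
items = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
index = build_index(planes, items)
candidates = query(planes, index, [3.0, 0.0, 0.0])
```

Vectors that point in the same direction always share a bucket, because each bit depends only on the sign of a dot product. A real index would typically use several hash tables to trade recall against the number of candidates examined.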