skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Files of a Feather Flock Together? Measuring and Modeling How Users Perceive File Similarity in Cloud Storage
Prior work suggests that users conceptualize the organization of personal collections of digital files through the lens of similarity. However, it is unclear to what degree similar files are actually located near one another (e.g., in the same directory) in actual file collections, or whether leveraging file similarity can improve information retrieval and organization for disorganized collections of files. To this end, we conducted an online study combining automated analysis of 50 Google Drive and Dropbox users' cloud accounts with a survey asking about pairs of files from those accounts. We found that many files located in different parts of file hierarchies were similar in how they were perceived by participants, as well as in their algorithmically extractable features. Participants often wished to co-manage similar files (e.g., deleting one file implied deleting the other file) even if they were far apart in the file hierarchy. To further understand this relationship, we built regression models, finding several algorithmically extractable file features to be predictive of human perceptions of file similarity and desired file co-management. Our findings pave the way for leveraging file similarity to automatically recommend access, move, or delete operations based on users' prior interactions with similar files.  more » « less
Award ID(s):
1801663 1835890
PAR ID:
10227474
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Users face many challenges in keeping their personal file collections organized. While current file-management interfaces help users retrieve files in disorganized repositories, they do not aid in organization. Pertinent files can be difficult to find, and files that should have been deleted may remain. To help, we designed KondoCloud, a file-browser interface for personal cloud storage. KondoCloud makes machine learning-based recommendations of files users may want to retrieve, move, or delete. These recommendations leverage the intuition that similar files should be managed similarly. We developed and evaluated KondoCloud through two complementary online user studies. In our Observation Study, we logged the actions of 69 participants who spent 30 minutes manually organizing their own Google Drive repositories. We identified high-level organizational strategies, including moving related files to newly created sub-folders and extensively deleting files. To train the classifiers that underpin KondoCloud's recommendations, we had participants label whether pairs of files were similar and whether they should be managed similarly. In addition, we extracted ten metadata and content features from all files in participants' repositories. Our logistic regression classifiers all achieved F1 scores of 0.72 or higher. In our Evaluation Study, 62 participants used KondoCloud either with or without recommendations. Roughly half of participants accepted a non-trivial fraction of recommendations, and some participants accepted nearly all of them. Participants who were shown the recommendations were more likely to delete related files located in different directories. They also generally felt the recommendations improved efficiency. Participants who were not shown recommendations nonetheless manually performed about a third of the actions that would have been recommended. 
    more » « less
  2. null (Ed.)
    With the ubiquity of data breaches, forgotten-about files stored in the cloud create latent privacy risks. We take a holistic approach to help users identify sensitive, unwanted files in cloud storage. We first conducted 17 qualitative interviews to characterize factors that make humans perceive a file as sensitive, useful, and worthy of either protection or deletion. Building on our findings, we conducted a primarily quantitative online study. We showed 108 long-term users of Google Drive or Dropbox a selection of files from their accounts. They labeled and explained these files' sensitivity, usefulness, and desired management (whether they wanted to keep, delete, or protect them). For each file, we collected many metadata and content features, building a training dataset of 3,525 labeled files. We then built Aletheia, which predicts a file's perceived sensitivity and usefulness, as well as its desired management. Aletheia improves over state-of-the-art baselines by 26% to 159%, predicting users' desired file-management decisions with 79% accuracy. Notably, predicting subjective perceptions of usefulness and sensitivity led to a 10% absolute accuracy improvement in predicting desired file-management decisions. Aletheia's performance validates a human-centric approach to feature selection when using inference techniques on subjective security-related tasks. It also improves upon the state of the art in minimizing the attack surface of cloud accounts. 
    more » « less
  3. null (Ed.)
    With the ubiquity of data breaches, forgotten-about files stored in the cloud create latent privacy risks. We take a holistic approach to help users identify sensitive, unwanted files in cloud storage. We first conducted 17 qualitative interviews to characterize factors that make humans perceive a file as sensitive, useful, and worthy of either protection or deletion. Building on our findings, we conducted a primarily quantitative online study. We showed 108 long-term users of Google Drive or Dropbox a selection of files from their accounts. They labeled and explained these files’ sensitivity, usefulness, and desired management (whether they wanted to keep, delete, or protect them). For each file, we collected many metadata and content features, building a training dataset of 3,525 labeled files. We then built Aletheia, which predicts a file’s perceived sensitivity and usefulness, as well as its desired management. Aletheia improves over state-of-the-art baselines by 26% to 159%, predicting users’ desired file-management decisions with 79% accuracy. Notably, predicting subjective perceptions of usefulness and sensitivity led to a 10% absolute accuracy improvement in predicting desired file-management decisions. Aletheia’s performance validates a human-centric approach to feature selection when using inference techniques on subjective security-related tasks. It also improves upon the state of the art in minimizing the attack surface of cloud accounts. 
    more » « less
  4. Abstract Visualizing spatial assay data in anatomical images is vital for understanding biological processes in cell, tissue, and organ organizations. Technologies requiring this functionality include traditional one-at-a-time assays, and bulk and single-cell omics experiments, including RNA-seq and proteomics. The spatialHeatmap software provides a series of powerful new methods for these needs, and allows users to work with adequately formatted anatomical images from public collections or custom images. It colors the spatial features (e.g. tissues) annotated in the images according to the measured or predicted abundance levels of biomolecules (e.g. mRNAs) using a color key. This core functionality of the package is called a spatial heatmap plot. Single-cell data can be co-visualized in composite plots that combine spatial heatmaps with embedding plots of high-dimensional data. The resulting spatial context information is essential for gaining insights into the tissue-level organization of single-cell data, or vice versa. Additional core functionalities include the automated identification of biomolecules with spatially selective abundance patterns and clusters of biomolecules sharing similar abundance profiles. To appeal to both non-expert and computational users, spatialHeatmap provides a graphical and a command-line interface, respectively. It is distributed as a free, open-source Bioconductor package (https://bioconductor.org/packages/spatialHeatmap) that users can install on personal computers, shared servers, or cloud systems. 
    more » « less
  5. Recent robot collections provide various interactive tools for users to explore and analyze their datasets. Yet, the literature lacks data on how users interact with these collections and which tools can best support their goals. This late-breaking report presents preliminary data on the utility of four interactive tools for accessing a collection of robot hands. The tools include a gallery and similarity comparison for browsing and filtering existing hands, a prediction tool for estimating user impression of hands (e.g., humanlikeness), and a recommendation tool suggesting design features (e.g., number of fingers) for achieving a target user impression rating. Data from a user study with 9 novice robotics researchers suggest the users found the tools useful for various tasks and especially appreciated the gallery and recommendation functionalities for understanding the complex relationships of the data. We discuss the results and outline future steps for developing interface design guidelines for robot collections. 
    more » « less