Title: Using Honeybuckets to Characterize Cloud Storage Scanning in the Wild
In this work, we analyze to what extent actors target poorly-secured cloud storage buckets for attack. We deployed hundreds of AWS S3 honeybuckets with different names and content to lure and measure different scanning strategies. Actors exhibited clear preferences for scanning buckets that appeared to belong to organizations, especially commercial entities in the technology sector with a vulnerability disclosure program. Actors continuously engaged with the content of buckets by downloading, uploading, and deleting files. Most alarmingly, we recorded multiple instances in which malicious actors downloaded, read, and understood a document from our honeybucket, leading them to attempt to gain unauthorized server access.
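For concreteness, the sketch below (not taken from the paper) shows how a single S3 honeybucket could be stood up with server access logging so that scanner interactions are recorded. The bucket name, log bucket, region, and decoy object are hypothetical placeholders, and the log bucket is assumed to already exist with S3 log-delivery permissions granted.

import boto3

# Hypothetical names chosen for illustration only.
HONEYBUCKET = "example-corp-backups"     # decoy bucket meant to attract scanners
LOG_BUCKET = "example-honeybucket-logs"  # pre-existing bucket that receives access logs
REGION = "us-west-2"

s3 = boto3.client("s3", region_name=REGION)

# Create the decoy bucket.
s3.create_bucket(
    Bucket=HONEYBUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)

# Enable server access logging so every request against the decoy bucket
# (list, download, upload, delete) is written to the log bucket.
s3.put_bucket_logging(
    Bucket=HONEYBUCKET,
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": LOG_BUCKET,
            "TargetPrefix": f"{HONEYBUCKET}/",
        }
    },
)

# Seed the bucket with decoy content for actors to interact with.
# (Any public-access policy used to make the bucket reachable by
# unauthenticated scanners is omitted from this sketch.)
s3.put_object(
    Bucket=HONEYBUCKET,
    Key="credentials/server_access.txt",
    Body=b"decoy document -- contains no real credentials",
)

The delivered access logs can then be parsed to measure which bucket names draw requests and what actors do with the content once inside.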
Chen, Ke; Shao, Mingfu
(22nd International Workshop on Algorithms in Bioinformatics (WABI 2022))
Boucher, Christina; Rahmann, Sven (Eds.)
Many bioinformatics applications involve bucketing a set of sequences where each sequence is allowed to be assigned into multiple buckets. To achieve both high sensitivity and precision, bucketing methods are desired to assign similar sequences into the same bucket while assigning dissimilar sequences into distinct buckets. Existing k-mer-based bucketing methods have been efficient in processing sequencing data with low error rates, but encounter much reduced sensitivity on data with high error rates. Locality-sensitive hashing (LSH) schemes are able to mitigate this issue through tolerating the edits in similar sequences, but state-of-the-art methods still have large gaps. Here we generalize the LSH function by allowing it to hash one sequence into multiple buckets. Formally, a bucketing function, which maps a sequence (of fixed length) into a subset of buckets, is defined to be (d₁, d₂)-sensitive if any two sequences within an edit distance of d₁ are mapped into at least one shared bucket, and any two sequences with distance at least d₂ are mapped into disjoint subsets of buckets. We construct locality-sensitive bucketing (LSB) functions with a variety of values of (d₁, d₂) and analyze their efficiency with respect to the total number of buckets needed as well as the number of buckets that a specific sequence is mapped to. We also prove lower bounds of these two parameters in different settings and show that some of our constructed LSB functions are optimal. These results provide theoretical foundations for their practical use in analyzing sequences with high error rates while also providing insights for the hardness of designing ungapped LSH functions.
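In symbols, the sensitivity notion described above can be restated as follows (the notation Σⁿ for length-n sequences, B for the bucket set, and d(·,·) for edit distance is introduced here for clarity and is not necessarily the paper's own):

\[
  f : \Sigma^{n} \to 2^{B} \text{ is } (d_1, d_2)\text{-sensitive if, for all } x, y \in \Sigma^{n}:
\]
\[
  d(x, y) \le d_1 \;\Rightarrow\; f(x) \cap f(y) \neq \emptyset
  \qquad \text{and} \qquad
  d(x, y) \ge d_2 \;\Rightarrow\; f(x) \cap f(y) = \emptyset .
\]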
Chen, Ke; Shao, Mingfu
(Algorithms for Molecular Biology)
Abstract. Background: Many bioinformatics applications involve bucketing a set of sequences where each sequence is allowed to be assigned into multiple buckets. To achieve both high sensitivity and precision, bucketing methods are desired to assign similar sequences into the same bucket while assigning dissimilar sequences into distinct buckets. Existing k-mer-based bucketing methods have been efficient in processing sequencing data with low error rates, but encounter much reduced sensitivity on data with high error rates. Locality-sensitive hashing (LSH) schemes are able to mitigate this issue through tolerating the edits in similar sequences, but state-of-the-art methods still have large gaps. Results: In this paper, we generalize the LSH function by allowing it to hash one sequence into multiple buckets. Formally, a bucketing function, which maps a sequence (of fixed length) into a subset of buckets, is defined to be (d₁, d₂)-sensitive if any two sequences within an edit distance of d₁ are mapped into at least one shared bucket, and any two sequences with distance at least d₂ are mapped into disjoint subsets of buckets. We construct locality-sensitive bucketing (LSB) functions with a variety of values of (d₁, d₂) and analyze their efficiency with respect to the total number of buckets needed as well as the number of buckets that a specific sequence is mapped to. We also prove lower bounds of these two parameters in different settings and show that some of our constructed LSB functions are optimal. Conclusion: These results lay the theoretical foundations for their practical use in analyzing sequences with high error rates while also providing insights for the hardness of designing ungapped LSH functions.
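As a toy illustration of a bucketing function that maps one sequence into multiple buckets, the snippet below uses each k-mer of a sequence as a bucket identifier. This is ordinary k-mer bucketing, not one of the LSB constructions analyzed in the paper, but it shows the multi-bucket assignment the definition is built around.

def kmer_buckets(seq, k=4):
    # Assign the sequence to one bucket per k-mer it contains, so a single
    # sequence lands in multiple buckets and two sequences sharing any k-mer
    # share at least one bucket.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

a = kmer_buckets("ACGTACGA")
b = kmer_buckets("ACGTTCGA")  # one substitution relative to the first sequence
print(a & b)                  # {'ACGT'}: the pair still shares a bucket

With higher error rates, edits can wipe out every shared k-mer, which is exactly the sensitivity loss that the paper's LSB functions are designed to avoid while still keeping dissimilar sequences in disjoint buckets.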
Deirdre K. Mulligan; Kenneth A. Bamberger
(Berkeley Technology Law Journal)
This Article develops a framework for both assessing and designing content moderation systems consistent with public values. It argues that moderation should not be understood as a single function, but as a set of subfunctions common to all content governance regimes. By identifying the particular values implicated by each of these subfunctions, it explores the appropriate ways the constituent tasks might best be allocated: specifically, to which actors (public or private, human or technological) they might be assigned, and what constraints or processes might be required in their performance. This analysis can facilitate the evaluation and design of content moderation systems to ensure the capacity and competencies necessary for legitimate, distributed systems of content governance. Through a combination of methods, legal schemes delegate at least a portion of the responsibility for governing online expression to private actors. Sometimes, statutory schemes assign regulatory tasks explicitly. In others, this delegation often occurs implicitly, with little guidance as to how the treatment of content should be structured. In the law's shadow, online platforms are largely given free rein to configure the governance of expression. Legal scholarship has surfaced important concerns about the private sector's role in content governance. In response, private platforms engaged in content moderation have adopted structures that mimic public governance forms. Yet, we largely lack the means to measure whether these forms are substantive, effectively infusing public values into the content moderation process, or merely symbolic artifice designed to deflect much needed public scrutiny. This Article's proposed framework addresses that gap in two ways. First, the framework considers together all manner of legal regimes that induce platforms to engage in the function of content moderation. Second, it focuses on the shared set of specific tasks, or subfunctions, involved in the content moderation function across these regimes. Examining a broad range of content moderation regimes together highlights the existence of distinct common tasks and decision points that together constitute the content moderation function. Focusing on this shared set of subfunctions highlights the different values implicated by each and the way they can be "handed off" to human and technical actors to perform in different ways with varying normative and political implications. This Article identifies four key content moderation subfunctions: (1) definition of policies, (2) identification of potentially covered content, (3) application of policies to specific cases, and (4) resolution of those cases. Using these four subfunctions supports a rigorous analysis of how to leverage the capacities and competencies of government and private parties throughout the content moderation process. Such attention also highlights how the exercise of that power can be constrained, either by requiring the use of particular decision-making processes or through limits on the use of automation, in ways that further address normative concerns. Dissecting the allocation of subfunctions in various content moderation regimes reveals the distinct ethical and political questions that arise in alternate configurations.
Specifically, it offers a way to think about four key questions: (1) what values are most at issue regarding each subfunction; (2) which activities might be more appropriate to delegate to particular public or private actors; (3) which constraints must be attached to the delegation of each subfunction; and (4) where investments in shared content moderation infrastructures can support relevant values. The functional framework thus provides a means both for evaluating the symbolic legal forms that firms have constructed in service of content moderation and for designing processes that better reflect public values.
Hiaeshutter-Rice, Dan; Chinn, Sedona; Chen, Kaiping
(Frontiers in Political Science)
People are increasingly exposed to science and political information from social media. One consequence is that these sites play host to “alternative influencers,” who spread misinformation. However, content posted by alternative influencers on different social media platforms is unlikely to be homogenous. Our study uses computational methods to investigate how dimensions we refer to as audience and channel of social media platforms influence emotion and topics in content posted by “alternative influencers” on different platforms. Using COVID-19 as an example, we find that alternative influencers’ content contained more anger and fear words on Facebook and Twitter compared to YouTube. We also found that these actors discussed substantively different topics in their COVID-19 content on YouTube compared to Twitter and Facebook. With these findings, we discuss how the audience and channel of different social media platforms affect alternative influencers’ ability to spread misinformation online.
Han, Shanshan; Chakraborty, Vishal; Goodrich, Michael T; Mehrotra, Sharad; Sharma, Shantanu
(Proceedings of the ACM on Management of Data)
This paper addresses volume leakage (i.e., leakage of the number of records in the answer set) when processing keyword queries in encrypted key-value (KV) datasets. Volume leakage, coupled with prior knowledge about data distribution and/or previously executed queries, can reveal both ciphertexts and current user queries. We develop a solution to prevent volume leakage, entitled Veil, that partitions the dataset by randomly mapping keys to a set of equi-sized buckets. Veil provides a tunable mechanism for data owners to explore a trade-off between storage and communication overheads. To make buckets indistinguishable to the adversary, Veil uses a novel padding strategy that allows buckets to overlap, reducing the need to add fake records. Both theoretical and experimental results show that Veil significantly outperforms the existing state of the art.
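A simplified, unencrypted sketch of the bucketing-plus-padding idea follows; the function and parameter names are invented for illustration, and Veil's actual scheme additionally lets buckets overlap so that fewer fake records are needed.

import random
from collections import defaultdict

def build_buckets(kv_pairs, num_buckets, seed=0):
    # Randomly map each key to one bucket, then pad every bucket to the same
    # size so the answer volume for any single key is hidden behind the
    # common bucket size.
    rng = random.Random(seed)
    buckets = defaultdict(list)
    key_to_bucket = {}
    for key, value in kv_pairs:
        b = key_to_bucket.setdefault(key, rng.randrange(num_buckets))
        buckets[b].append((key, value))
    target = max(len(records) for records in buckets.values())
    for b in range(num_buckets):
        while len(buckets[b]) < target:
            buckets[b].append(("__dummy__", None))  # fake record used as padding
    return buckets, key_to_bucket

data = [("alice", 1), ("alice", 2), ("bob", 3), ("carol", 4)]
buckets, index = build_buckets(data, num_buckets=2)
print({b: len(records) for b, records in buckets.items()})  # all buckets equal-sized

Because every bucket is padded to the same size, retrieving the bucket for any queried key returns the same number of records, so the server observes no key-dependent volume.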
Izhikevich, Katherine, Voelker, Geoffrey M., Savage, Stefan, and Izhikevich, Liz. "Using Honeybuckets to Characterize Cloud Storage Scanning in the Wild." Proceedings of the 9th IEEE European Symposium on Security and Privacy (EuroS&P), 2024. https://par.nsf.gov/biblio/10587902. doi:10.1109/EuroSP60621.2024.00014.