skip to main content

Search for: All records

Creators/Authors contains: "Wright, Matthew"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Recent website fingerprinting attacks have been shown to achieve very high performance against traffic through Tor. These attacks allow an adversary to deduce the website a Tor user has visited by simply eavesdropping on the encrypted communication. This has consequently motivated the development of many defense strategies that obfuscate traffic through the addition of dummy packets and/or delays. The efficacy and practicality of many of these recent proposals have yet to be scrutinized in detail. In this study, we re-evaluate nine recent defense proposals that claim to provide adequate security with low-overheads using the latest Deep Learning-based attacks. Furthermore, we assess the feasibility of implementing these defenses within the current confines of Tor. To this end, we additionally provide the first on-network implementation of the DynaFlow defense to better assess its real-world utility.
    Free, publicly-accessible full text available April 1, 2024
  2. Malicious software (malware) classification offers a unique challenge for continual learning (CL) regimes due to the volume of new samples received on a daily basis and the evolution of malware to exploit new vulnerabilities. On a typical day, antivirus vendors receive hundreds of thousands of unique pieces of software, both malicious and benign, and over the course of the lifetime of a malware classifier, more than a billion samples can easily accumulate. Given the scale of the problem, sequential training using continual learning techniques could provide substantial benefits in reducing training and storage overhead. To date, however, there has been no exploration of CL applied to malware classification tasks. In this paper, we study 11 CL techniques applied to three malware tasks covering common incremental learning scenarios, including task, class, and domain incremental learning (IL). Specifically, using two realistic, large-scale malware datasets, we evaluate the performance of the CL methods on both binary malware classification (Domain-IL) and multi-class malware family classification (Task-IL and Class-IL) tasks. To our surprise, continual learning methods significantly underperformed naive Joint replay of the training data in nearly all settings ā€“ in some cases reducing accuracy by more than 70 percentage points. A simple approach ofmore »selectively replaying 20% of the stored data achieves better performance, with 50% of the training time compared to Joint replay. Finally, we discuss potential reasons for the unexpectedly poor performance of the CL techniques, with the hope that it spurs further research on developing techniques that are more effective in the malware classification domain.« less
    Free, publicly-accessible full text available August 22, 2023
  3. Deepfake videos are getting better in quality and can be used for dangerous disinformation campaigns. The pressing need to detect these videos has motivated researchers to develop different types of detection models. Among them, the models that utilize temporal information (i.e., sequence-based models) are more effective at detection than the ones that only detect intra-frame discrepancies. Recent work has shown that the latter detection models can be fooled with adversarial examples, leveraging the rich literature on crafting adversarial (still) images. It is less clear, however, how well these attacks will work on sequence-based models that operate on information taken over multiple frames. In this paper, we explore the effectiveness of the Fast Gradient Sign Method (FGSM) and the Carlini-Wagner šæ2-norm attack to fool sequence-based deepfake detector models in both the white-box and black-box settings. The experimental results show that the attacks are effective with a maximum success rate of 99.72% and 67.14% in the white-box and black-box attack scenarios, respectively. This highlights the importance of developing more robust sequence-based deepfake detectors and opens up directions for future research.
    Free, publicly-accessible full text available May 30, 2023
  4. End-to-end flow correlation attacks are among the oldest known attacks on low-latency anonymity networks, and are treated as a core primitive for traffic analysis of Tor. However, despite recent work showing that individual flows can be correlated with high accuracy, the impact of even these state-of-the-art attacks is questionable due to a central drawback: their pairwise nature, requiring comparison between N2 pairs of flows to deanonymize N users. This results in a combinatorial explosion in computational requirements and an asymptotically declining base rate, leading to either high numbers of false positives or vanishingly small rates of successful correlation. In this paper, we introduce a novel flow correlation attack, DeepCoFFEA, that combines two ideas to overcome these drawbacks. First, DeepCoFFEA uses deep learning to train a pair of feature embedding networks that respectively map Tor and exit flows into a single low-dimensional space where correlated flows are similar; pairs of embedded flows can be compared at lower cost than pairs of full traces. Second, DeepCoFFEA uses amplification, dividing flows into short windows and using voting across these windows to significantly reduce false positives; the same embedding networks can be used with an increasing number of windows to independently lower the falsemore »positive rate. We conduct a comprehensive experimental analysis showing that DeepCoFFEA significantly outperforms state-of-the-art flow correlation attacks on Tor, e.g. 93% true positive rate versus at most 13% when tuned for high precision, with two orders of magnitude speedup over prior work. We also consider the effects of several potential countermeasures on DeepCoFFEA, finding that existing lightweight defenses are not sufficient to secure anonymity networks from this threat.« less
    Free, publicly-accessible full text available May 1, 2023
  5. It is estimated that by the year 2024, the total number of systems equipped with voice assistant software will exceed 8.4 billion devices globally. While these devices provide convenience to consumers, they suffer from a myriad of security issues. This paper highlights the serious privacy threats exposed by information leakage in a smart assistant's encrypted network traffic metadata. To investigate this issue, we have collected a new dataset composed of dynamic and static commands posed to an Amazon Echo Dot using data collection and cleaning scripts we developed. Furthermore, we propose the Smart Home Assistant Malicious Ensemble model (SHAME) as the new state-of-the-art Voice Command Fingerprinting classifier. When evaluated against several datasets, our attack correctly classifies encrypted voice commands with up to 99.81% accuracy on Google Home traffic and 95.2% accuracy on Amazon Echo Dot traffic. These findings show that security measures must be taken to stop internet service providers, nation-states, and network eavesdroppers from monitoring our intimate conversations.
  6. Furnell, Steven (Ed.)
    A huge amount of personal and sensitive data is shared on Facebook, which makes it a prime target for attackers. Adversaries can exploit third-party applications connected to a userā€™s Facebook profile (i.e., Facebook apps) to gain access to this personal information. Usersā€™ lack of knowledge and the varying privacy policies of these apps make them further vulnerable to information leakage. However, little has been done to identify mismatches between usersā€™ perceptions and the privacy policies of Facebook apps. We address this challenge in our work. We conducted a lab study with 31 participants, where we received data on how they share information in Facebook, their Facebook-related security and privacy practices, and their perceptions on the privacy aspects of 65 frequently-used Facebook apps in terms of data collection, sharing, and deletion. We then compared participantsā€™ perceptions with the privacy policy of each reported app. Participants also reported their expectations about the types of information that should not be collected or shared by any Facebook app. Our analysis reveals significant mismatches between usersā€™ privacy perceptions and reality (i.e., privacy policies of Facebook apps), where we identified over-optimism not only in usersā€™ perceptions of information collection, but also on their self-efficacy in protectingmore »their information in Facebook despite experiencing negative incidents in the past. To the best of our knowledge, this is the first study on the gap between usersā€™ privacy perceptions around Facebook apps and the reality. The findings from this study offer directions for future research to address that gap through designing usable, effective, and personalized privacy notices to help users to make informed decisions about using Facebook apps.« less
  7. In this paper, a machine learning (ML) approach is proposed to detect and classify jamming attacks on unmanned aerial vehicles (UAVs). Four attack types are implemented using software-defined radio (SDR); namely, barrage, single-tone, successive-pulse, and protocol-aware jamming. Each type is launched against a drone that uses orthogonal frequency division multiplexing (OFDM) communication to qualitatively analyze its impacts considering jamming range, complexity, and severity. Then, an SDR is utilized in proximity to the drone and in systematic testing scenarios to record the radiometric parameters before and after each attack is launched. Signal-to-noise ratio (SNR), energy threshold, and several OFDM parameters are exploited as features and fed to six ML algorithms to explore and enable autonomous jamming detection/classification. The algorithms are quantitatively evaluated with metrics including detection and false alarm rates to evaluate the received signals and facilitate efficient decision-making for improved reception integrity and reliability. The resulting ML approach detects and classifies jamming with an accuracy of 92.2% and a false-alarm rate of 1.35%.
  8. Visually similar characters, or homoglyphs, can be used to perform social engineering attacks or to evade spam and plagiarism detectors. It is thus important to understand the capabilities of an attacker to identify homoglyphs - particularly ones that have not been previously spotted - and leverage them in attacks. We investigate a deep-learning model using embedding learning, transfer learning, and augmentation to determine the visual similarity of characters and thereby identify potential homoglyphs. Our approach uniquely takes advantage of weak labels that arise from the fact that most characters are not homoglyphs. Our model drastically outperforms the Normal-ized Compression Distance approach on pairwise homoglyph identification, for which we achieve an average precision of 0.97. We also present the first attempt at clustering homoglyphs into sets of equivalence classes, which is more efficient than pairwise information for security practitioners to quickly lookup homoglyphs or to normalize confusable string encodings. To measure clustering performance, we propose a metric (mBIOU) building on the classic Intersection-Over-Union (IOU) metric. Our clustering method achieves 0.592 mBIOU, compared to 0.430 for the naive baseline. We also use our model to predict over 8,000 previously unknown homoglyphs, and find good early indications that many of these may bemore »true positives. Source code and list of predicted homoglyphs are uploaded to Github: https://github.com/PerryXDeng/weaponizing_unicode.« less