
Title: Risky BIZness: risks derived from registrar name management
In this paper, we explore a domain hijacking vulnerability that is an accidental byproduct of undocumented operational practices between domain registrars and registries. We show how, over the last nine years, more than 512K domains have been implicitly exposed to the risk of hijacking, affecting names in most popular TLDs (including .com and .net) as well as legacy TLDs with tight registration control (such as .edu and .gov). Moreover, we show that this weakness has been actively exploited by multiple parties who, over the years, have assumed control over 163K domains without having any ownership interest in those names. In addition to characterizing the nature and size of this problem, we also report on the efficacy of remediation efforts in response to our outreach to registrars.
Journal Name:
Proceedings of the 21st ACM Internet Measurement Conference (IMC '21)
Page Range or eLocation-ID:
673 to 686
Sponsoring Org:
National Science Foundation
More Like This
  1. We identify over a quarter of a million domains used by medium and large companies within the .com registry. We find that for around 7% of these companies, very similar domain names have been registered with character changes that are intended to be indistinguishable at a casual glance. These domains would be suitable for use in Business Email Compromise frauds. Using historical registration and name server data, we identify the timing, rate, and movement of these look-alike domains over a ten-year period. This allows us to identify clusters of registrations which are quite clearly malicious and to show how the criminals have moved their activity over time in response to countermeasures. Although the malicious activity peaked in 2016, there is still sufficient ongoing activity to cause concern.
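The kind of look-alike detection the abstract describes can be approximated with homoglyph normalization plus edit distance. The sketch below is a hypothetical illustration (the homoglyph table and thresholds are assumptions, not the paper's method):

```python
# Hypothetical sketch: flag candidate look-alike domains by normalizing
# common visually-confusable substrings and comparing edit distance to a
# target brand label. Not the paper's actual detection pipeline.

HOMOGLYPHS = {"0": "o", "1": "l", "3": "e", "5": "s", "rn": "m", "vv": "w"}

def normalize(label: str) -> str:
    """Map visually confusable substrings to a canonical form."""
    label = label.lower()
    for glyph, canon in HOMOGLYPHS.items():
        label = label.replace(glyph, canon)
    return label

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def looks_alike(candidate: str, target: str, max_dist: int = 1) -> bool:
    """Suspicious if the candidate nearly matches after normalization,
    or matches exactly only once homoglyphs are canonicalized."""
    if candidate == target:
        return False
    norm_c, norm_t = normalize(candidate), normalize(target)
    return norm_c == norm_t or 0 < edit_distance(norm_c, norm_t) <= max_dist
```

Real systems would also need to handle Unicode confusables and keyboard-adjacency typos; this only covers ASCII substitutions.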
  2. Take-down operations aim to disrupt cybercrime involving malicious domains. In the past decade, many successful take-down operations have been reported, including those against the Conficker worm and, most recently, against VPNFilter. Although take-downs play an important role in fighting cybercrime, the procedure is still surprisingly opaque. There seems to be no in-depth understanding of how a take-down operation works and whether there is due diligence to ensure its security and reliability. In this paper, we report the first systematic study on domain take-down. Our study was made possible via a large collection of data, including various sinkhole feeds and blacklists, passive DNS data spanning six years, and historical WHOIS information. Over these datasets, we built a unique methodology that extensively used various reverse lookups and other data analysis techniques to address the challenges in identifying taken-down domains, sinkhole operators, and take-down durations. Applying the methodology to the data, we discovered over 620K taken-down domains and conducted a longitudinal analysis of the take-down process, thus facilitating a better understanding of the operation and its weaknesses. We found that more than 14% of domains taken down over the past ten months have been released back to the domain market and that some of the released domains have been repurchased by the malicious actor again before being captured and seized, either by the same or different sinkholes. In addition, we showed that the misconfiguration of DNS records corresponding to the sinkholed domains allowed us to hijack a domain that was seized by the FBI. Further, we found that expired sinkholes have caused the transfer of around 30K taken-down domains whose traffic is now under the control of new owners.
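One building block of the kind of sinkhole identification described above is matching a domain's name servers against known sinkhole zones. A minimal sketch, assuming a hypothetical suffix list and made-up passive-DNS records (none of these names come from the paper):

```python
# Hypothetical sketch: classify a domain as likely taken down if its
# authoritative name servers fall under a known sinkhole operator's zone.
# The suffix list and records below are invented for illustration.

SINKHOLE_NS_SUFFIXES = (
    "sinkhole.example-registrar.net",
    "takedown.example-lea.org",
)

def is_sinkholed(name_servers: list[str]) -> bool:
    """True if any NS record points into a known sinkhole zone."""
    return any(
        ns.lower().rstrip(".").endswith(suffix)
        for ns in name_servers
        for suffix in SINKHOLE_NS_SUFFIXES
    )

# Passive-DNS-style observations: domain -> NS records seen for it.
observed = {
    "bad-domain.example": ["ns1.sinkhole.example-registrar.net."],
    "benign.example": ["ns1.hosting.example"],
}

for domain, ns_set in observed.items():
    print(domain, is_sinkholed(ns_set))
```

The study's actual methodology also cross-references sinkhole feeds, blacklists, and WHOIS history; suffix matching alone would miss sinkholes operated on unlabeled infrastructure.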
  3. Identifying people in photographs is a critical task in a wide variety of domains, from national security [7] to journalism [14] to human rights investigations [1]. The task is also fundamentally complex and challenging. With the world population at 7.6 billion and growing, the candidate pool is large. Studies of human face recognition ability show that the average person incorrectly identifies two people as similar 20–30% of the time, and trained police detectives do not perform significantly better [11]. Computer vision-based face recognition tools have gained considerable ground and are now widely available commercially, but comparisons to human performance show mixed results at best [2,10,16]. Automated face recognition techniques, while powerful, also have constraints that may be impractical for many real-world contexts. For example, face recognition systems tend to suffer when the target image or reference images have poor quality or resolution, as blemishes or discolorations may be incorrectly recognized as false positives for facial landmarks. Additionally, most face recognition systems ignore some salient facial features, like scars or other skin characteristics, as well as distinctive non-facial features, like ear shape or hair or facial hair styles. This project investigates how we can overcome these limitations to support person identification tasks. By adjusting confidence thresholds, users of face recognition can generally expect high recall (few false negatives) at the cost of low precision (many false positives). Therefore, we focus our work on the "last mile" of person identification, i.e., helping a user find the correct match among a large set of similar-looking candidates suggested by face recognition. Our approach leverages the powerful capabilities of the human vision system and collaborative sensemaking via crowdsourcing to augment the complementary strengths of automatic face recognition.
The result is a novel technology pipeline combining collective intelligence and computer vision. We scope this project to focus on identifying soldiers in photos from the American Civil War era (1861–1865). An estimated 4,000,000 soldiers fought in the war, and most were photographed at least once, due to decreasing costs, the increasing robustness of the format, and the critical events separating friends and family [17]. Over 150 years later, the identities of most of these portraits have been lost, but as museums and archives increasingly digitize and publish their collections online, the pool of reference photos and information has never been more accessible. Historians, genealogists, and collectors work tirelessly to connect names with faces, using largely manual identification methods [3,9]. Identifying people in historical photos is important for preserving material culture [9], correcting the historical record [13], and recognizing contributions of marginalized groups [4], among other reasons.
  4. Margueron R; Holoch D (Eds.)
    Dynamic posttranslational modifications to canonical histones that constitute the nucleosome (H2A, H2B, H3, and H4) control all aspects of enzymatic transactions with DNA. Histone methylation has been studied heavily for the past 20 years, and our mechanistic understanding of the control and function of individual methylation events on specific histone arginine and lysine residues has been greatly improved over the past decade, driven by excellent new tools and methods. Here, we will summarize what is known about the distribution and some of the functions of protein methyltransferases from all major eukaryotic supergroups. The main conclusion is that protein, and specifically histone, methylation is an ancient process. Many taxa in all supergroups have lost some subfamilies of both protein arginine methyltransferases (PRMT) and the heavily studied SET domain lysine methyltransferases (KMT). Over time, novel subfamilies, especially of SET domain proteins, arose. We use the interactions between H3K27 and H3K36 methylation as one example for the complex circuitry of histone modifications that make up the "histone code," and we discuss one recent example (Paramecium Ezl1) for how extant enzymes that may resemble more ancient SET domain KMTs are able to modify two lysine residues that have divergent functions in plants, fungi, and animals. Complexity of SET domain KMT function in the well-studied plant and animal lineages arose not only by gene duplication but also acquisition of novel DNA- and histone-binding domains in certain subfamilies.
  5. The decompiler is one of the most common tools for examining executable binaries without the corresponding source code. It transforms binaries into high-level code, reversing the compilation process. Unfortunately, decompiler output is far from readable because the decompilation process is often incomplete. State-of-the-art techniques use machine learning to predict missing information like variable names. While these approaches are often able to suggest good variable names in context, no existing work examines how the selection of training data influences these machine learning models. We investigate how data provenance and the quality of training data affect performance, and how well, if at all, trained models generalize across software domains. We focus on the variable renaming problem using one such machine learning model, DIRE. We first describe DIRE in detail and the accompanying technique used to generate training data from raw code. We also evaluate DIRE's overall performance without respect to data quality. Next, we show how training on more popular, possibly higher quality code (measured using GitHub stars) leads to a more generalizable model because popular code tends to have more diverse variable names. Finally, we evaluate how well DIRE predicts domain-specific identifiers, propose a modification to incorporate domain information, and show that it can predict identifiers in domain-specific scenarios 23% more frequently than the original DIRE model.