Title: A Review of Dark Web: Trends and Future Directions
The dark web is often treated as a taboo subject by those unfamiliar with it. This paper examines the underlying structure of the dark web by synthesizing the findings of published research. The Onion Router (Tor) and the other browsers discussed are specialized web browsers that provide anonymity by routing traffic through multiple servers and encrypted networks between host and client, hiding the IP addresses of both ends. This makes the dark web difficult to control or monitor, contributing to its popularity in criminal underworlds. In this work, we provide an overview of the data mining and penetration testing tools that are widely used to crawl and collect data. We compare these tools, highlighting their strengths and weaknesses, and discuss the challenges of harvesting massive amounts of data from the dark web using crawlers, penetration testing tools, and machine learning (ML) techniques. Although efforts to crawl the dark web have progressed, there is still room to advance existing approaches to combat its ever-changing landscape.
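As a concrete illustration of the first step such crawlers automate, the minimal sketch below fetches a hidden-service page through a local Tor SOCKS5 proxy. It assumes a Tor daemon listening on the default port 9050 and the requests library with SOCKS support; the .onion address is a hypothetical placeholder, not one from this paper.

    # Minimal sketch: fetch a hidden-service page over Tor (assumes a local
    # Tor daemon on 127.0.0.1:9050 and `pip install requests[socks]`).
    import requests

    TOR_PROXY = "socks5h://127.0.0.1:9050"  # socks5h resolves hostnames via Tor
    PROXIES = {"http": TOR_PROXY, "https": TOR_PROXY}

    def fetch_onion(url: str, timeout: float = 60.0) -> str:
        """Fetch a page through the Tor proxy and return its HTML."""
        resp = requests.get(url, proxies=PROXIES, timeout=timeout)
        resp.raise_for_status()
        return resp.text

    if __name__ == "__main__":
        # Hypothetical onion address used purely for illustration.
        print(fetch_onion("http://exampleonionservicexyz.onion/")[:200])

A production crawler would layer link extraction, politeness delays, and circuit rotation on top of this fetch primitive.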
Award ID(s):
2100115 1723578
NSF-PAR ID:
10347029
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE Conference on Computers, Software & Applications
Page Range / eLocation ID:
1780-1785
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Online trackers are invasive as they track our digital footprints, many of which are sensitive in nature, and when aggregated over time, they can help infer intricate details about our lifestyles and habits. Although much research has been conducted to understand the effectiveness of existing countermeasures on the desktop platform, little is known about how mobile browsers have evolved to handle online trackers. With mobile devices now generating more web traffic than their desktop counterparts, we fill this research gap through a large-scale comparative analysis of mobile web browsers. We crawl 10K valid websites from the Tranco list on real mobile devices. Our data collection process covers both popular generic browsers (e.g., Chrome, Firefox, and Safari) and privacy-focused browsers (e.g., Brave, DuckDuckGo, and Firefox Focus). We use dynamic analysis of runtime execution traces and static analysis of source code to highlight the tracking behavior of invasive fingerprinters. We also find evidence of tailored content being served to different browsers. In particular, we note that Firefox Focus sees altered script code, whereas Brave and DuckDuckGo receive highly similar content. To test the privacy protection of browsers, we measure how each browser blocks trackers and advertisers and note the strengths and weaknesses of the privacy browsers. To establish ground truth, we use well-known block lists, including EasyList, EasyPrivacy, Disconnect, and WhoTracksMe, and find that Brave generally blocks the most content that should be blocked according to these lists. Focus performs better against social trackers, and DuckDuckGo restricts third-party trackers that perform email-based tracking.
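    To make the ground-truth comparison concrete, here is a simplified sketch of checking request URLs against EasyList-style rules. Real filter lists have a much richer syntax (exception rules, resource-type options, element hiding); this handles only domain-anchor rules of the form ||domain^, and the rules shown are illustrative, not taken from the study.

        # Simplified sketch of block-list matching against ||domain^ rules.
        from urllib.parse import urlparse

        def parse_domain_rules(lines):
            """Extract blocked domains from ||domain^ style rules."""
            domains = set()
            for line in lines:
                line = line.strip()
                if line.startswith("||") and line.endswith("^"):
                    domains.add(line[2:-1].lower())
            return domains

        def is_blocked(url, blocked_domains):
            """Block if the request host matches a rule domain or one of its subdomains."""
            host = (urlparse(url).hostname or "").lower()
            return any(host == d or host.endswith("." + d) for d in blocked_domains)

        rules = ["||tracker.example^", "||ads.example^"]  # hypothetical rules
        blocked = parse_domain_rules(rules)
        print(is_blocked("https://cdn.tracker.example/pixel.gif", blocked))  # True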
  2. Geotechnical data are increasingly utilized to aid investigations of coastal erosion and the development of coastal morphological models; however, measurement techniques are still challenged by environmental conditions and accessibility in coastal areas, particularly nearshore. These challenges are exacerbated in Arctic coastal environments. This article reviews existing and emerging data collection methods in the context of geotechnical investigations of Arctic coastal erosion and nearshore change. Specifically, it discusses the use of cone penetration testing (CPT), which can provide key data for mapping soil and ice layers as well as for assessing slope and block failures, and the use of free-fall penetrometers (FFPs) for rapid mapping of seabed surface conditions. Because the spatial coverage and number of available in situ point measurements by penetrometers are limited, data fusion with geophysical and remotely sensed data is considered. Offshore and nearshore, combining acoustic surveying with geotechnical testing can optimize large-scale seabed characterization, while onshore the most recent developments in satellite-based and unmanned-aerial-vehicle-based data collection offer new opportunities to enhance spatial coverage and to collect information on bathymetry and topography, among others. Emphasis is given to easily deployable and rugged techniques and strategies that can offer near-term opportunities to fill current gaps in data availability. This review suggests that fusing geotechnical in situ testing, using CPT to provide soil information at greater depths and even in the presence of ice and using FFPs to offer rapid, large-coverage geotechnical testing of surface sediments (i.e., the upper tens of centimeters to meters of sediment depth), with acoustic seabed surveying and emerging remote sensing tools has the potential to provide essential data to improve the prediction of Arctic coastal erosion, particularly where climate-driven changes in soil conditions may bias the use of historic erosion observations for future prediction.
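    One elementary building block of the data fusion advocated here is interpolating sparse in situ point measurements (e.g., FFP-derived sediment strength) onto the regular grid of an acoustic survey. The sketch below uses SciPy's griddata for that step; the coordinates and strength values are synthetic placeholders, not data from the review.

        # Minimal sketch: fuse sparse penetrometer point data with a gridded
        # acoustic survey by interpolating the points onto the survey grid.
        import numpy as np
        from scipy.interpolate import griddata

        # Sparse FFP stations: (easting, northing) and sediment strength (kPa)
        stations = np.array([[0.0, 0.0], [50.0, 10.0], [20.0, 80.0], [90.0, 60.0]])
        strength = np.array([12.0, 18.5, 9.3, 22.1])

        # Regular grid covering the acoustic survey footprint (meters)
        xi, yi = np.meshgrid(np.linspace(0, 100, 101), np.linspace(0, 100, 101))

        # Linear interpolation inside the convex hull of stations; NaN elsewhere
        strength_grid = griddata(stations, strength, (xi, yi), method="linear")

        print(np.nanmin(strength_grid), np.nanmax(strength_grid))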
  3. International dark web platforms operating across multiple geopolitical regions and languages host a myriad of hacker assets such as malware, hacking tools, hacking tutorials, and malicious source code. Cybersecurity analytics organizations employ machine learning models trained on human-labeled data to automatically detect these assets and bolster their situational awareness. However, the lack of human-labeled training data is prohibitive when analyzing foreign-language dark web content. In this research note, we adopt the computational design science paradigm to develop a novel IT artifact for cross-lingual hacker asset detection (CLHAD). CLHAD automatically leverages the knowledge learned from English content to detect hacker assets in non-English dark web platforms. CLHAD encompasses a novel adversarial deep representation learning (ADREL) method, which generates multilingual text representations using generative adversarial networks (GANs). Drawing upon the state of the art in cross-lingual knowledge transfer, ADREL automatically extracts transferable text representations and facilitates the analysis of multilingual content. We evaluate CLHAD on Russian, French, and Italian dark web platforms, demonstrate its practical utility in hacker asset profiling, and conduct a proof-of-concept case study. Our analysis suggests that cybersecurity managers may benefit most from focusing on Russian content to identify sophisticated hacking assets. In contrast, financial hacker assets are scattered among several dominant dark web languages. Managerial insights for security managers are discussed at the operational and strategic levels.
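    For intuition about the GAN-based alignment idea, here is a minimal PyTorch sketch in the spirit of ADREL, not the authors' implementation: a generator maps non-English text embeddings into the English embedding space while a discriminator tries to tell the two apart. Random tensors stand in for real text embeddings, and all dimensions are placeholders.

        # Minimal sketch of adversarial cross-lingual representation alignment.
        import torch
        import torch.nn as nn

        DIM = 300  # embedding size (placeholder)

        generator = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(), nn.Linear(DIM, DIM))
        discriminator = nn.Sequential(nn.Linear(DIM, 128), nn.ReLU(), nn.Linear(128, 1))

        bce = nn.BCEWithLogitsLoss()
        g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
        d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

        for step in range(200):
            en = torch.randn(64, DIM)        # stand-in for English embeddings
            xx = torch.randn(64, DIM) + 1.0  # stand-in for non-English embeddings

            # Discriminator: label English as 1, mapped non-English as 0
            d_opt.zero_grad()
            mapped = generator(xx).detach()
            d_loss = (bce(discriminator(en), torch.ones(64, 1)) +
                      bce(discriminator(mapped), torch.zeros(64, 1)))
            d_loss.backward()
            d_opt.step()

            # Generator: fool the discriminator into labeling mapped text as English
            g_opt.zero_grad()
            g_loss = bce(discriminator(generator(xx)), torch.ones(64, 1))
            g_loss.backward()
            g_opt.step()

    After training, a classifier fitted on English representations can, in principle, be applied to the mapped non-English representations, which is the transfer step CLHAD exploits.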
  4. X.509 certificates underpin the security of the Internet economy, notably secure web servers, and they need to be revoked promptly and reliably once they are compromised. The original revocation method specified in the X.509 standard, distributing certificate revocation lists (CRLs), is both old and untrustworthy: CRLs are susceptible to attacks such as man-in-the-middle and denial of service. The newer Online Certificate Status Protocol (OCSP) and OCSP stapling approaches have well-known drawbacks as well. The primary contribution of this paper is Secure Revocation as a Peer Service (SCRaaPS), an alternative, reliable way to support X.509 certificate revocation via the Scrybe secure provenance system. The blockchain support of Scrybe enables the creation of a durable, reliable revocation service that can withstand denial-of-service attacks and ensures non-repudiation of revoked certificates. We provide cross-CA revocation information and address the additional problem of intermediate-certificate revocation, with its knock-on effects on certificates derived therefrom. A Cuckoo filter provides quick, communication-free testing by servers and browsers against our current revocation list (with no false negatives). A further contribution of this work is that the revocation service can serve as a drop-in replacement for OCSP stapling, with superior performance and coverage for both servers and browsers. A potential revocation indicated by the Cuckoo filter is backed up by a rigorous service query to eliminate false positives. The Cuckoo filter parameters are also stored in our blockchain to provide open access to this algorithmic option for detection. We describe the advantages of using a blockchain-based system and, in particular, the approach to distributed ledger technology and lightweight mining enabled by Scrybe, which was designed with secure provenance in mind.
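    To illustrate how such a filter supports fast local checks, here is a minimal, self-contained cuckoo filter sketch using partial-key cuckoo hashing. The parameters and the example serial numbers are placeholders, not values from Scrybe or SCRaaPS.

        # Minimal cuckoo filter sketch (partial-key cuckoo hashing).
        # num_buckets must be a power of two so the XOR partner mapping
        # is an involution. 2-byte fingerprints keep the sketch short but
        # give a noticeable false-positive rate; real deployments tune this.
        import hashlib, random

        class CuckooFilter:
            def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
                self.num_buckets = num_buckets
                self.bucket_size = bucket_size
                self.max_kicks = max_kicks
                self.buckets = [[] for _ in range(num_buckets)]

            def _fingerprint(self, item):
                return hashlib.sha256(item.encode()).digest()[:2]

            def _index1(self, item):
                h = hashlib.sha256(item.encode()).digest()
                return int.from_bytes(h[2:6], "big") % self.num_buckets

            def _index2(self, i, fp):
                h = int.from_bytes(hashlib.sha256(fp).digest()[:4], "big")
                return (i ^ h) % self.num_buckets  # partner bucket: i XOR hash(fp)

            def insert(self, item):
                fp = self._fingerprint(item)
                i1 = self._index1(item)
                i2 = self._index2(i1, fp)
                for i in (i1, i2):
                    if len(self.buckets[i]) < self.bucket_size:
                        self.buckets[i].append(fp)
                        return True
                # Both buckets full: evict residents to their partner buckets.
                i = random.choice((i1, i2))
                for _ in range(self.max_kicks):
                    j = random.randrange(len(self.buckets[i]))
                    self.buckets[i][j], fp = fp, self.buckets[i][j]
                    i = self._index2(i, fp)
                    if len(self.buckets[i]) < self.bucket_size:
                        self.buckets[i].append(fp)
                        return True
                return False  # filter is effectively full; resize in practice

            def contains(self, item):
                fp = self._fingerprint(item)
                i1 = self._index1(item)
                return fp in self.buckets[i1] or fp in self.buckets[self._index2(i1, fp)]

        cf = CuckooFilter()
        cf.insert("serial:0451")            # hypothetical revoked serial
        print(cf.contains("serial:0451"))   # True
        print(cf.contains("serial:9999"))   # False (barring a false positive)

    A browser tests certificate serials against the filter locally; any hit is then confirmed with the revocation service, since the filter can return false positives but never false negatives for successfully inserted items.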
  5. The vastness of the web imposes a prohibitive cost on building large-scale search engines with limited resources. Crawl frontiers thus need to be optimized to improve the coverage and freshness of crawled content. In this paper, we propose an approach for modeling the dynamics of change in the web using archived copies of webpages. To evaluate its utility, we conduct a preliminary study on the scholarly web using 19,977 seed URLs of authors' homepages obtained from their Google Scholar profiles. We first obtain archived copies of these webpages from the Internet Archive (IA) and estimate when their actual updates occurred. Next, we apply maximum likelihood to estimate their mean update frequency (λ) values. Our evaluation shows that λ values derived from a short history of archived data provide a good estimate of the true update frequency in the short term, and that our method provides better estimations of updates at a fraction of the resources required by the baseline models. Based on this, we demonstrate the utility of archived data to optimize the crawling strategy of web crawlers, and we uncover important challenges that inspire future research directions.
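    For intuition, under a Poisson change model the probability of seeing at least one change in a snapshot interval of length I is 1 - e^(-λI), which yields a closed-form maximum-likelihood estimate of λ from equally spaced snapshots. The sketch below implements that textbook estimator; it is a simplification for regular intervals, not necessarily the paper's exact procedure.

        # Minimal sketch: MLE of a page's mean update frequency (lambda)
        # under a Poisson change model, from equally spaced snapshots.
        import math

        def estimate_lambda(changed_intervals: int, total_intervals: int,
                            interval_days: float) -> float:
            """lambda_hat = -ln(1 - X/n) / I, where X of n intervals of
            length I showed at least one change."""
            if changed_intervals >= total_intervals:
                raise ValueError("MLE diverges when every interval shows a change")
            p_change = changed_intervals / total_intervals
            return -math.log(1.0 - p_change) / interval_days

        # Example: 6 of 10 thirty-day snapshot intervals showed a change.
        lam = estimate_lambda(6, 10, 30.0)
        print(f"estimated updates per day: {lam:.4f}")  # ~0.0305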