Title: Data Augmentation for JPEG Steganalysis
Deep Convolutional Neural Networks (CNNs) perform remarkably well in JPEG steganalysis. However, they rely heavily on large datasets to avoid overfitting. Data augmentation is a popular technique for inflating the available datasets without collecting new images. In JPEG steganalysis, the augmentations researchers predominantly use are limited to rotations and flips (D4 augmentations), because most augmentations used in computer vision erase the stego signal. In this paper, we systematically survey a large number of other augmentation techniques and assess their benefit in JPEG steganalysis.
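The D4 augmentations the abstract refers to can be enumerated with pure array rotations and mirror flips, which permute pixels without resampling and therefore do not erase the stego signal. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def d4_augmentations(img):
    """Return the 8 images of the D4 symmetry group
    (4 rotations x optional mirror flip).

    These transforms only permute pixels -- no interpolation --
    which is why they are the standard augmentations in steganalysis.
    """
    out = []
    for k in range(4):                # 0, 90, 180, 270 degree rotations
        r = np.rot90(img, k)
        out.append(r)
        out.append(np.fliplr(r))      # mirrored version of each rotation
    return out
```

By contrast, interpolation-based augmentations (scaling, shearing, elastic warps) resample pixel values and destroy the fragile stego signal, which is the limitation the paper sets out to examine.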
Award ID(s):
2028119
PAR ID:
10301788
Author(s) / Creator(s):
Date Published:
Journal Name:
13th IEEE Workshop on Information Security and Forensics
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Deep learning approaches currently achieve the state-of-the-art results on camera-based vital signs measurement. One of the main challenges with using neural models for these applications is the lack of sufficiently large and diverse datasets. Limited data increases the chances of overfitting models to the available data, which in turn can harm generalization. In this paper, we show that the generalizability of imaging photoplethysmography models can be improved by augmenting the training set with "magnified" videos. These augmentations are specifically designed to reveal useful features for recovering the photoplethysmogram. We show that using augmentations of this form is more effective at improving model robustness than other commonly used data augmentation approaches. We show better within-dataset and especially cross-dataset performance with our proposed data augmentation approach on three publicly available datasets.
  2.
    In this paper, we investigate the effect of pretraining CNNs on ImageNet on their performance when refined for steganalysis of digital images. In many cases, it seems that just 'seeing' a large number of images helps the network converge during refinement, no matter what the pretraining task is. To achieve the best performance, the pretraining task should be related to steganalysis, even if it is done on completely mismatched cover and stego datasets. Furthermore, the pretraining does not need to be carried out for very long and can be done with limited computational resources. An additional advantage of the pretraining is that it is done on color images and can later be applied to steganalysis of color and grayscale images while still having on-par or better performance than detectors trained specifically for a given source. The refining process is also much faster than training the network from scratch. The most surprising part of the paper is that networks pretrained on JPEG images are a good starting point for spatial-domain steganalysis as well.
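The refinement setup this abstract describes, keeping the pretrained feature extractor and replacing the task head, can be sketched in a framework-agnostic way. All parameter names, shapes, and the two-key layout below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def init_for_refinement(pretrained, n_classes=2, seed=0):
    """Reuse a pretrained feature extractor and replace the task head
    with a freshly initialized one, as in transfer-learning refinement.
    Keys and shapes are illustrative, not from the paper."""
    rng = np.random.default_rng(seed)
    feat = pretrained["features_W"]
    return {
        # Feature extractor is kept and later fine-tuned on cover/stego pairs.
        "features_W": feat.copy(),
        # The pretraining head (e.g. 1000-way ImageNet) is discarded;
        # a small-variance cover-vs-stego head is trained from scratch.
        "head_W": rng.normal(scale=0.01, size=(feat.shape[1], n_classes)),
    }
```

The point the abstract makes is that the copied feature weights, not the pretraining task itself, carry most of the benefit, so even a mismatched pretraining source is a useful starting point.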
  3. NLP has achieved great progress in the past decade through the use of neural models and large labeled datasets. The dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks where significant time, money, or expertise is required to label massive amounts of textual data. Recently, data augmentation methods have been explored as a means of improving data efficiency in NLP. To date, there has been no systematic empirical overview of data augmentation for NLP in the limited labeled data setting, making it difficult to understand which methods work in which settings. In this paper, we provide an empirical survey of recent progress on data augmentation for NLP in the limited labeled data setting, summarizing the landscape of methods (including token-level augmentations, sentence-level augmentations, adversarial augmentations, and hidden-space augmentations) and carrying out experiments on 11 datasets covering topics/news classification, inference tasks, paraphrasing tasks, and single-sentence tasks. Based on the results, we draw several conclusions to help practitioners choose appropriate augmentations in different settings and discuss the current challenges and future directions for limited data learning in NLP.
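Of the method families this survey covers, token-level augmentations are the simplest to illustrate. A plain-Python sketch of two common variants, random deletion and random swap (parameters and the function name are illustrative, not any surveyed paper's exact recipe):

```python
import random

def token_augment(tokens, p_delete=0.1, n_swaps=1, seed=None):
    """Token-level augmentation: random deletion followed by random swap.
    A sketch of the simplest family of NLP augmentations."""
    rng = random.Random(seed)
    # Random deletion: drop each token with probability p_delete,
    # but never return an empty sequence.
    kept = [t for t in tokens if rng.random() > p_delete] or [rng.choice(tokens)]
    # Random swap: exchange two random positions n_swaps times.
    out = list(kept)
    for _ in range(n_swaps):
        if len(out) > 1:
            i, j = rng.sample(range(len(out)), 2)
            out[i], out[j] = out[j], out[i]
    return out
```

Sentence-level, adversarial, and hidden-space augmentations operate on paraphrases, gradient perturbations, and intermediate representations respectively, and need a trained model rather than a few lines of token surgery.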
  4.
    The JPEG compatibility attack is a steganalysis method for detecting messages embedded in the spatial representation of images under the assumption that the cover is a decompressed JPEG. This paper focuses on improving the detection accuracy for the difficult case of high JPEG qualities and content-adaptive stego algorithms. Close attention is paid to the robustness of the detection with respect to the JPEG compressor and DCT coefficient quantizer. A likelihood ratio detector derived from a model of quantization errors of DCT coefficients in the recompressed image is used to explain the main mechanism responsible for detection and to understand the results of experiments. The most accurate detector is an SRNet trained on a two-channel input consisting of the image and its SQ error. The detection performance is contrasted with the state of the art on four content-adaptive stego methods, a wide range of payloads, and quality factors.
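The quantization errors underlying the likelihood-ratio detector can be illustrated on a single 8x8 block: recompute the DCT of a (possibly decompressed) spatial block and measure how far each coefficient lies from the nearest quantization lattice point. For a genuine decompressed-JPEG block these errors are near zero; spatial-domain embedding perturbs them. A plain-NumPy sketch with an illustrative uniform quantization step (the paper's actual error model and detector are more involved):

```python
import numpy as np

def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (stand-in for the JPEG DCT)."""
    u = np.arange(8)
    # C[u, x] = a(u) * cos((2x + 1) * u * pi / 16), a(0) adjusted for orthonormality
    C = np.sqrt(2 / 8) * np.cos((2 * u[None, :] + 1) * u[:, None] * np.pi / 16)
    C[0, :] = np.sqrt(1 / 8)
    return C @ block @ C.T

def quantization_error(block, q=1.0):
    """Per-coefficient error e = c - q * round(c / q).

    For a decompressed JPEG the errors cluster near zero; embedding in the
    spatial domain disturbs them, which is the signal a compatibility
    detector models. q = 1 is an illustrative quantization step."""
    c = dct2_8x8(np.asarray(block, dtype=float))
    return c - q * np.round(c / q)
```

The errors are always bounded by q/2 in magnitude; the detection statistic comes from how closely they concentrate around zero across blocks.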
  5.
    In this paper, we study the EfficientNet family pre-trained on ImageNet when used for steganalysis with transfer learning. We show that certain "surgical modifications" aimed at maintaining the input resolution in EfficientNet architectures significantly boost their performance in JPEG steganalysis, thus establishing new benchmarks. The modified models are evaluated by their detection accuracy, number of parameters, memory consumption, and total floating point operations (FLOPs) on the ALASKA II dataset. We also show that, surprisingly, EfficientNets in their "vanilla form" do not perform as well as the SRNet on BOSSbase+BOWS2. This is because, unlike ALASKA II images, BOSSbase+BOWS2 contains aggressively subsampled images with more complex content. The surgical modifications in EfficientNet remedy this underperformance as well.