CNN Steganalyzers Leverage Local Embedding Artifacts

Yousfi, Yassine; Butora, Jan; Fridrich, Jessica

While convolutional neural networks have firmly established themselves as the superior steganography detectors, little human-interpretable feedback to the steganographer as to how the network reaches its decision has so far been obtained from trained models. The folklore has it that, unlike rich models, which rely on global statistics, CNNs can leverage spatially localized signals. In this paper, we adapt existing attribution tools, such as Integrated Gradients and Last Activation Maps, to show that CNNs can indeed find overwhelming evidence for steganography from a few highly localized embedding artifacts. We look at the nature of these artifacts via case studies of both modern content-adaptive and older steganographic algorithms. The main culprit is linked to “content creating changes” when the magnitude of a DCT coefficient is increased (Jsteg, –F5), which can be especially detectable for high-frequency DCT modes that were originally zeros (J-MiPOD). In contrast, J-UNIWARD introduces the smallest number of locally detectable embedding artifacts among all tested algorithms. Moreover, we find examples of inhibition that facilitate distinguishing between the selection channels of stego algorithms in a multi-class detector. The authors believe that identifying and characterizing local embedding artifacts provides useful feedback for future design of steganographic schemes.
