Matlab implementation of the ensemble classifier described in [1]. The first use of the ensemble in steganalysis (although not yet fully automated) appeared in [2].
There is no need to install anything; you can start using the function ensemble.m right away.
The usage of the program under different experimental setups is demonstrated in the attached example files. All needed feature files are also included. We highly recommend spending the time to go through these examples as they show how the program should be used for steganalysis experiments. Additional information can be found in the F.A.Q. section below.
The program is available for public use. Please remember to acknowledge our work by citing [1].
Thank you.
Today, the most accurate steganalysis methods for digital media are built as supervised classifiers on feature vectors extracted from the media. The tool of choice for the machine learning seems to be the support vector machine (SVM). In this paper, we propose an alternative and well known machine learning tool – ensemble classifiers – and argue that they are ideally suited for steganalysis. Ensemble classifiers scale much more favorably w.r.t. the number of training examples and the feature dimensionality with performance comparable to the much more complex SVMs. The significantly lower training complexity opens up the possibility for the steganalyst to work with rich (high-dimensional) cover models and train on larger training sets – two key elements that appear necessary to reliably detect modern steganographic algorithms. Ensemble classification is portrayed here as a powerful developer tool that allows fast construction of steganography detectors with markedly improved detection accuracy across a wide range of embedding methods. The power of the proposed framework is demonstrated on two steganographic methods that hide messages in JPEG images.
[1] J. Kodovský, J. Fridrich, and V. Holub, Ensemble Classifiers for Steganalysis of Digital Media. IEEE Transactions on Information Forensics and Security, Vol. 7, No. 2, pp. 432-444, April 2012.
[2] J. Kodovský and J. Fridrich, Steganalysis in high dimensions: fusing classifiers built on random subspaces. Proc. SPIE, Electronic Imaging, Media Watermarking, Security, and Forensics XIII, San Francisco, CA, January 23–26, 2011.
Q: Do I need any additional packages, libraries or Matlab toolboxes?
A: No.
Q: What is the format of features used by the ensemble implementation?
A: Conveniently, we use Matlab's *.mat files. Every feature file must contain two variables: F and names. The variable F is a data matrix containing features in a row-by-row manner, i.e. the number of rows corresponds to the number of samples and the number of columns is the feature space dimensionality. The variable names is a cell array whose length is equal to the height of the matrix F and contains the corresponding image filenames from which the features were extracted. See the included tutorial for more details.
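The format described above can be illustrated with a minimal sketch that builds and saves a feature file, then reloads it and checks its shape. The feature values are random placeholders, and the image filenames, the output file name cover_example.mat, and the use of tempdir are assumptions made for illustration only; they do not come from the included examples.

```matlab
% Minimal sketch (assumed names and sizes): create a feature file with
% the two required variables, F and names.
n_samples = 5;   % hypothetical number of images
dim       = 3;   % hypothetical feature space dimensionality

% F: one row per image, one column per feature (placeholder values).
F = rand(n_samples, dim);

% names: cell array with one image filename per row of F.
names = cell(n_samples, 1);
for i = 1:n_samples
    names{i} = sprintf('%d.jpg', i);   % hypothetical filenames
end

% Save both variables into a single *.mat file.
feature_file = fullfile(tempdir, 'cover_example.mat');
save(feature_file, 'F', 'names');

% Reload and verify that the file follows the expected layout.
S = load(feature_file);
assert(isnumeric(S.F) && ismatrix(S.F));
assert(iscell(S.names) && numel(S.names) == size(S.F, 1));
```

Checking that numel(names) matches size(F, 1) before running an experiment is a cheap way to catch feature files in which samples and filenames have fallen out of sync.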
Q: I created a useful extension and I would like to contribute and make it public.
A: Send us your extension, together with its description and with a well-commented example.
Q: Is the ensemble really as accurate as SVMs?
A: According to our experiments with features and stego algorithms in both the spatial and JPEG domains, the ensemble is in general as accurate as a linear SVM (or slightly better). Compared with the Gaussian SVM, there may be a slight drop in performance if the decision boundary is more complicated (non-linear).
Q: What are the main advantages of the ensemble over SVMs and other machine learning tools?
A: Speed. Period. The ensemble is more scalable w.r.t. the training set size and the feature space dimensionality: its complexity scales better with these two parameters (see [1]).