\section{A Cryptographic Approach for Steganography}
Our last reflections in the field of information hiding are theoretical ones,
discussing the relevance of the current security notions in this field. The results
of this study, published in a second accepted paper at IIHMSP'13
(9th Int. Conf. on Intelligent Information Hiding and Multimedia Signal Processing, Beijing, China),
are given thereafter~\cite{bgh13:ip}.
\subsection{Drawbacks of the stego-security notion}
Theoretically speaking, the stego-security notion matches well the idea of perfect
secrecy in the WOA category of attacks. However, its concrete verification raises
several technical problems that are difficult to circumvent. These difficulties drastically
impact the effective security of the scheme, as explained in~\cite{bgh13:ip}.
For instance, in a stego-secure scheme, the distribution of the set of watermarked
images must be the same as that of the original contents, whatever the chosen keys.
But \emph{how can the distribution of the original contents be determined in practice?}
Furthermore, claiming that Alice can constitute her own subset of well-chosen images having
the same ``good'' distribution is quite unreasonable in several contexts of
steganography: Alice does not \emph{always} have the choice of the supports. Moreover,
it introduces a kind of bias, as the warden may find such similarities surprising.
Suppose however that Alice is in the most favorable situation, that is, she has the possibility to
constitute the set of original contents herself. How can she proceed in practice
to be certain that all the media in this set follow the same distribution $p(X)$?
In the authors' opinion, Alice has two possible choices:
\begin{itemize}
\item Either she constitutes the set by testing, for each new content, whether this media follows the same distribution as the ones that have already been selected.
\item Or she directly forges new images from existing ones. For instance, she can replace all the
least significant bits of the original contents by the output of a good pseudorandom number generator.
\end{itemize}
In the first situation, Alice will carry out a $\chi^2$ test, or another statistical test of this kind,
to determine whether the considered image (its least significant bits, its low frequency coefficients, etc.) follows the same distribution as the images already selected. In that situation, Alice
does not have the liberty to choose the distribution, and it seems impossible to find a scheme
able to preserve any kind of distribution, for all secret keys and all hidden messages.
Furthermore, such statistical hypothesis tests are not ideal, as they only indicate whether a result is unlikely to have occurred by chance alone \emph{according to a pre-determined threshold probability} (the significance level). Errors of the first kind (false positives) and of the second kind (false negatives) necessarily occur, with a certain probability. In other words, with such an approach, Alice
cannot design a perfect set of cover contents all following the same probability distribution $p(X)$:
this process leads to a set of media whose distribution Alice does not have access to.
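For illustration purposes only, the following sketch shows how such a test could be run on the LSB plane of a grayscale image; the file name is a placeholder, and $3.841$ is the usual critical value of the $\chi^2$ distribution with one degree of freedom at the $0.05$ significance level. Note that it only checks uniformity of the LSBs, which is weaker than the agreement with the already selected images that Alice actually needs.
\begin{verbatim}
import numpy as np
from PIL import Image

def lsb_chi_squared(path):
    """Chi-squared statistic of the LSB plane of a grayscale image,
    tested against the uniform distribution on {0, 1}."""
    pixels = np.asarray(Image.open(path).convert("L"), dtype=np.uint8)
    lsbs = pixels & 1                    # least significant bit of each pixel
    n = lsbs.size
    x1 = int(lsbs.sum())                 # observed number of 1's
    x0 = n - x1                          # observed number of 0's
    expected = n / 2                     # expected count under uniformity
    return (x0 - expected) ** 2 / expected + (x1 - expected) ** 2 / expected

# Hypothetical candidate cover; 3.841 is the critical value at alpha = 0.05
# for one degree of freedom.
statistic = lsb_chi_squared("candidate_cover.png")
print("kept" if statistic <= 3.841 else "rejected", statistic)
\end{verbatim}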
The second situation seems more realistic; it is thus
further investigated in the next section~\cite{bgh13:ip}.
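As a side illustration of this second option, the sketch below forges a cover by overwriting every pixel LSB of a grayscale image with the output of a cryptographically strong random source; the use of Python's \texttt{secrets} module and the file names are arbitrary choices, not part of~\cite{bgh13:ip}.
\begin{verbatim}
import secrets
import numpy as np
from PIL import Image

def forge_cover(in_path, out_path):
    """Overwrite every pixel LSB of a grayscale image with a fresh bit
    coming from a cryptographically strong random source."""
    pixels = np.asarray(Image.open(in_path).convert("L"), dtype=np.uint8)
    random_bytes = np.frombuffer(secrets.token_bytes(pixels.size), dtype=np.uint8)
    random_bits = (random_bytes & 1).reshape(pixels.shape)
    forged = (pixels & 0xFE) | random_bits   # keep the 7 high bits, replace the LSB
    Image.fromarray(forged).save(out_path)

forge_cover("original.png", "forged_cover.png")   # hypothetical file names
\end{verbatim}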
% \section{New Scenarios of (Secure) Information Hiding}
% We give in this section some original situations that are
% obviously related to steganography, while being out of scope of
% the topics usually investigated by the information hiding community.
% These basic scenarios deal with hidden messages embedded into
% cover contents in a secure manner. In these scenarios, the
% channel obviously appears as sleazy, but the observer cannot
% deduce anything else (if he cannot break the channel, then
% he lost everything). The theoretical framework that will
% be presented in the next sections can be used to evaluate the
% security of the approaches proposed thereafter.
% \subsection{A First Theoretical Scenario}
% Let us suppose that the set of supports $\mathcal{A}$ is constituted by random binary words. In that condition, a perfect secret hiding can be obtained by replacing a given part of a support with the encrypted hidden message, where the encryption has been obtained by a one time pad method: a bitwise exclusive or between the hidden message and a given random sequence, called the keystream.
% If Alice and Bob have previously shared the keystream, they can thus attain by this way the level of perfect secrecy, as it is defined in the theory of Shannon, and applied in information
% hiding. It is true that this example is somewhat unrealistic:
% man can reasonably think that Alice should prefer to simply
% encrypt her messages instead of hiding them into encrypted data.
% However, it is a proof of concept in steganography showing
% that, if Alice has the choice of the cover set and Eve cannot
% break the channel, then Alice can reach a perfect secrecy in
% information hiding. And she can be certain that (1) it is
% impossible for the warden to determine which cover images
% contain hidden messages and (2) the content of these messages.
% Additionally of being a proof of concept, this example
% illustrate that a clear framework must be provided, taking
% into account who chooses the set of cover images, the
% objectives Eve wants to reach and the means she can use, and
% In practice, Alice and Bob can construct their set $\mathcal{A}$ by using several times a cryptographically secure pseudorandom number generator (PRNG) as ISAAC~\cite{Jenkins96} or BBS~\cite{BBS}.
% The keystream can be produced too by this kind of PRNG, leading to a practicable exchange of media that can embed, with no risk, hidden messages: the proposed stegosystem is obviously both realizable and cryptographically secure. Another
% possibility is to put such encrypted hidden messages into
% encrypted media, both encryption being realized by a
% cryptographically secure cryptosystem. Indeed, in these two
% situations, no one can
% make in practice the distinction between images of $\mathcal{A}$ with hidden messages and images without such messages.
% %An adversary can have some suspicions, because it is not very common to possess random images in his hard disk. However such suspicions can be at least limited by the following arguments. Firstly,
% Even though this first scenario do not pretend to be applicable
% for Internet exchanges, it can however provide a first
% concrete operational and secured steganographic scheme for
% file storage. Indeed,
% anyone has the right to possess encrypted data. The set $\mathcal{A}$ can be constituted by encrypted data and random data, some of these latter embedding hidden messages. Encrypted data can be corrupted: the encrypted data that contain hidden messages will not return a problem of wrong decryption key, but of corrupted file. Obviously, it is not the fault of Alice if her file system has some problems.
% Concerning random files, technicians, scientists, or engineers can justify their presence by experiments related to their works.
% Finally, when trying to recover data after a clash of the system, or an hard drive critical problem, recovered data are often partial or corrupted. Thus, a third possibility is to dissimulate the stegocontent into the free space of the disk, to make that it appears as a residue of a removed encrypted file. A practical solution is to remove the encrypted file after having dissimulated an hidden message in it: it will be possible to recover it by using appropriate recovery tools.
% \subsection{A Second Scenario}
% A second more realistic scenario to achieve in practice the
% cryptographically secures information hiding is
% to erase all the least significant bits (LSBs) of all the images (stored in a disk, exchanged through the Internet, etc.), and replace them by using a cryptographically secure PRNG.
% The cover set can be constituted by images that have LSBs
% closed to a random law.
% Then, some of these media will embed hidden messages in some places of their LSBs, the hidden messages being encrypted as previously.
% By doing so, the allure of the media can appears suspicious, but no adversary can deduce anything else: in practice, they cannot determine whether an image has an hidden message or not. This randomness in LSBs can be explained in several ways
% (images of low quality, filters, etc.). Moreover Alice, being accused by Eve to have possibly hidden messages in their suspicious images, can discover some unimportant secrets, without revealing important messages: Eve has no possibility to determine how much messages have been hidden.
% It is true that Eve can remove all these LSBs, because they are suspicious. Indeed, natural images have not, in general, random LSBs. For instance, in the image depicted in Fig.~\ref{Chibouki}, produced by a camera, the LSBs are not random.
% The number of 0's is 1825023 when the number of 1's is equal to 1876161. Thus a chisquared test of adaptation to an uniform distribution, with risk $\alpha=0.05$, fails completely, as
% $$ \dfrac{(x_0-\overline{x}_0)^2}{\overline{x}_0}+ \dfrac{(x_1-\overline{x}_1)^2}{\overline{x}_1} = 706.556 > 3.841,$$
% \noindent where $\overline{x}_0=\overline{x}_1=1850592$ are the intended occurrences of 0's and 1's in a binary sequence of size 3701184 when considering a uniform distribution, whereas $x_0$ and $x_1$ are for the obtained occurrences.
% \includegraphics{chibouki.jpg}
% \caption{Chibouki, the natural image}
% Thus Eve, finding surprisingly a uniform distribution in LSBs, can consider it as suspicious.
% And then, even if she cannot determine the subset of these suspicious images that contains hidden messages,
% she can at least remove the hidden channel (in case of communications through the Internet, for instance).
% However, Alice and Bob knows that LSBs are not the good place to hide their messages.
% Indeed, they can find bits in images that seems to be random
% (for instance, a well chosen subset of LSBs),
% systematically replace these bits
% by other ones produced using a cryptographically secure PRNG, and hide in some of them their encrypted
% messages. More concretely, let us consider the matrix H2 of the Daubechies db1 decomposition of Chibouki
% of level 2. For each coefficient of this matrix of size $444 \times 521$, we take the bit parity
% of its integral part. By doing so, we obtain a binary sequence that follows an uniform distribution
% on $\{0,1\}$, according to the chisquared test of risk 0.01:
% $$ \dfrac{(x_0-\overline{x}_0)^2}{\overline{x}_0}+ \dfrac{(x_1-\overline{x}_1)^2}{\overline{x}_1} = 0.073 \leqslant 6.635 .$$
% Thus these bits can be replaced by other ones produced by a cryptographically secure PRNG, and
% some of these modified images can embed securely encrypted images. To improve the robustness
% of the method, batteries of tests as TestU01 can be applied on bits candidates (in images, movies,
% sounds) to be replaced by a PRNG, to check if this designed channel can be used not only
% securely, but non suspiciously too.
% \emph{Mutatis mutandis}, the test can be replaced by the
% return of the steganalyzer of Eve: in a fair game that
% respects the Kerckhoffs' principle, all except the private
% keys are known by the two opposed parties.
\subsection{Toward a cryptographically secure hiding}
We recall in this section the theoretical framework for information
hiding security we have proposed in~\cite{bgh13:ip}, which more closely resembles
the usual approaches in cryptography, such as those presented for PRNGs in Definitions~\ref{cryptoPRNGdef} and~\ref{cryptoPRNGdef2}.
It makes it possible to define the
notion of steganalyzers, it is compatible with the new
scenarios of information hiding introduced in~\cite{bgh13:ip}, and it does not have the drawbacks
of the stego-security definition.
\subsubsection{Definition of a stegosystem}
\begin{Def}[Stegosystem]
Let $\mathcal{S}$, $\mathcal{M}$, and $\mathcal{K}=\mathds{B}^\ell$ be
three sets of words on $\mathds{B}$, called respectively the sets of
supports, of messages, and of keys (of size $\ell$).

A \emph{stegosystem} on $(\mathcal{S}, \mathcal{M}, \mathcal{K})$
is a tuple $(\mathcal{I},\mathcal{E}, inv)$ such that:
\begin{itemize}
\item $\mathcal{I}$ is a function from $\mathcal{S} \times \mathcal{M} \times \mathcal{K}$ to $\mathcal{S}$,
$(s,m,k) \longmapsto \mathcal{I}(s,m,k)=s'$,
\item $\mathcal{E}$ is a function from $\mathcal{S} \times \mathcal{K}$ to $\mathcal{M}$,
$(s,k) \longmapsto \mathcal{E}(s,k) = m'$,
\item $inv$ is a function from $\mathcal{K}$ to $\mathcal{K}$, such that $\forall k \in
\mathcal{K}, \forall (s,m)\in \mathcal{S}\times\mathcal{M},$ \linebreak
$ \mathcal{E}(\mathcal{I}(s,m,k),inv(k))=m$,
\item $\mathcal{I}(s,m,k)$ and $\mathcal{E}(s',k^\prime)$ can be computed in polynomial time.
\end{itemize}
$\mathcal{I}$ is called the insertion or embedding
function, $\mathcal{E}$ the extraction function, $s$ the host content,
$m$ the hidden message, $k$ the embedding key, $k'=inv(k)$ the extraction key, and
$s'$ the stego-content. If $\forall k \in \mathcal{K}, k=inv(k)$, the stegosystem is symmetric (private-key); otherwise
it is asymmetric (public-key).
\end{Def}
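To fix ideas, the following minimal sketch instantiates such a triple $(\mathcal{I},\mathcal{E}, inv)$ as a symmetric LSB-replacement scheme in which $inv$ is the identity; it only illustrates the interface of the definition, the hash-based keystream derivation being an arbitrary choice and not the construction of~\cite{bgh13:ip}.
\begin{verbatim}
import hashlib

def keystream(key, nbits):
    """Derive nbits pseudorandom bits from the key (illustrative, SHA-256 based)."""
    bits, counter = [], 0
    while len(bits) < nbits:
        block = hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        bits.extend((byte >> i) & 1 for byte in block for i in range(8))
        counter += 1
    return bits[:nbits]

def embed(support, message, key):
    """I: replace the first |message| LSBs of the support by message XOR keystream."""
    ks = keystream(key, len(message))
    stego = list(support)
    for i, (m, k) in enumerate(zip(message, ks)):
        stego[i] = (stego[i] & ~1) | (m ^ k)
    return stego

def extract(stego, key, length):
    """E: read the LSBs back and remove the keystream (here inv(k) = k)."""
    ks = keystream(key, length)
    return [(stego[i] & 1) ^ ks[i] for i in range(length)]

# Round trip on a toy support of byte values; the message is recovered exactly.
cover = [200, 13, 77, 91, 34, 158, 6, 250]
msg = [1, 0, 1, 1, 0, 1, 0, 0]
assert extract(embed(cover, msg, b"secret key"), b"secret key", len(msg)) == msg
\end{verbatim}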
\subsubsection{Heading notions}
\begin{Def}[$(T,\varepsilon)-$distinguishing attack]
Let $S=(\mathcal{I},\mathcal{E}, inv)$ be
a stegosystem on $(\mathcal{A}, \mathcal{M}, \mathcal{K})$, with
$\mathcal{A} \subset \mathds{B}^M$.
A $(T,\varepsilon)-$distinguishing attack on the stegosystem $S$ is a probabilistic
algorithm $\mathcal{D}:\mathcal{A} \longrightarrow \{0,1\}$ running in time $T$, such that
there exists $m \in \mathcal{M}$ satisfying
$$\left| Pr\left[\mathcal{D}\left(\mathcal{I}(s,m,k)\right)=1 \mid k \in_R \mathcal{K}, s \in_R \mathcal{A}\right]
-Pr\left[\mathcal{D}\left(x\right)=1 \mid x \in_R \mathcal{A}\right]\right|\geqslant \varepsilon ,$$
where the probability is also taken over the internal coin flips of $\mathcal{D}$,
and the notation $\in_R$ indicates the process of selecting an element at
random and uniformly over the corresponding set.
\end{Def}
\begin{Def}[$(T,\varepsilon)-$undistinguishability]
A stegosystem is $(T,\varepsilon)-$un\-dis\-tin\-gui\-sha\-ble if there exists no
$(T,\varepsilon)-$distinguishing attack on this stegosystem.
\end{Def}
Intuitively, this means that there exists no polynomial-time probabilistic algorithm
able to distinguish the host contents from the stego-contents~\cite{bgh13:ip}.
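As an illustration, the advantage appearing in this definition could be estimated empirically by Monte-Carlo sampling, as in the sketch below; the distinguisher, the cover sampler, and the embedding function passed as arguments are placeholders, not objects defined in~\cite{bgh13:ip}.
\begin{verbatim}
import random

def advantage(distinguisher, sample_cover, embed, keys, message, trials=10000):
    """Monte-Carlo estimate of
    |Pr[D(I(s, m, k)) = 1 | k, s uniform] - Pr[D(x) = 1 | x uniform]|,
    assuming the distinguisher returns 0 or 1."""
    hits_stego = sum(
        distinguisher(embed(sample_cover(), message, random.choice(keys)))
        for _ in range(trials)
    )
    hits_cover = sum(distinguisher(sample_cover()) for _ in range(trials))
    return abs(hits_stego - hits_cover) / trials
\end{verbatim}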
\subsubsection{A cryptographically secure information hiding scheme}
To show the effectiveness of this approach, we have provided and
proven in~\cite{bgh13:ip} a first cryptographically secure information
hiding scheme, according to the definition above.
Let $$\mathcal{S} = \left\{s_1^1, s_2^1,\ldots,s_{2^N}^1, s_1^2,
s_2^2,\ldots,s_{2^N}^2,\ldots,s_1^r, s_2^r,\ldots,s_{2^N}^r\right\}$$
be a subset of $\mathds{B}^M = \mathcal{A}$.
Consider $G:\mathds{B}^L \longrightarrow \mathds{B}^N$ a $(T,\varepsilon)-$secure
pseudorandom number generator, and set
$\mathcal{I}(s_j^i,m,k)=s^i_{m\oplus G(k)}$. Assuming that $r$ is a
constant, and that from $i,j$ one can
compute the image $s^i_j$ in time $T_1$, the stegosystem is
$(T-T_1-N-1,\varepsilon)$-secure.
Intuitively, $\mathcal{S}$ is built from $r$ images containing $N$ bits
of low information. The image $s^i_j$ corresponds to the $i$-th image where
the $N$ bits are set to $j$.
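A minimal sketch of this embedding and of the corresponding extraction, working directly on the indices $(i,j)$ of the images of $\mathcal{S}$, is given below; the generator $G$ stands for any $(T,\varepsilon)-$secure PRNG (the toy generator used here is not one), and the symmetric case $inv(k)=k$ is assumed.
\begin{verbatim}
def embed(i, m, k, G):
    """I(s_j^i, m, k) = s^i_{m XOR G(k)}: return the indices (i, j) of the stego image."""
    return i, m ^ G(k)

def extract(indices, k, G):
    """E(s', inv(k)): recover m = j XOR G(k), since inv(k) = k in the symmetric case."""
    _, j = indices
    return j ^ G(k)

# Toy instantiation with N = 8: G maps a key to an 8-bit output; here a keyed
# multiplicative hash stands in for a genuinely (T, eps)-secure generator.
G = lambda k: (k * 2654435761) & 0xFF
key, message = 42, 0b10110010
assert extract(embed(3, message, key, G), key, G) == message
\end{verbatim}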