\section{A Cryptographic Approach for Steganography}
Our last reflections in the field of information hiding are theoretical ones,
discussing the relevance of the current security notions in this field. The results
of this study, published in a second accepted paper at IIHMSP'13
(9th Int. Conf. on Intelligent Information Hiding and Multimedia Signal Processing, Beijing, China),
are given thereafter~\cite{bgh13:ip}.
\subsection{Drawbacks of the stego-security notion}
Theoretically speaking, the stego-security notion matches well the idea of perfect
secrecy in the WOA category of attacks. However, its concrete verification raises
several technical problems that are difficult to circumvent. These difficulties drastically
impact the effective security of the scheme, as explained in~\cite{bgh13:ip}.
For instance, in a stego-secure scheme, the distribution of the set of watermarked
images must be the same as that of the original contents, whatever the chosen keys.
But \emph{how can the distribution of the original contents be determined in practice?}
Furthermore, claiming that Alice can constitute her own subset of well-chosen images having
the same ``good'' distribution is quite unreasonable in several contexts of
steganography: Alice does not \emph{always} have the choice of the supports. Moreover,
it introduces a kind of bias, as the warden may find such similarities surprising.
Suppose however that Alice is in the most favorable situation, that is, she has the possibility to
constitute the set of original contents herself. How can she proceed in practice
to be certain that all the media in this set follow the same distribution $p(X)$?
In the authors' opinion, Alice has two possible choices:
\begin{itemize}
\item Either she constitutes the set by testing, for each new content, whether this media follows the same distribution as the ones that have already been selected.
\item Or she directly forges new images from existing ones. For instance, she can replace all the
least significant bits of the original contents by the output of a good pseudorandom number generator.
\end{itemize}
In the first situation, Alice will carry out a $\chi^2$ test, or another statistical test of this kind,
to determine whether the considered image (its least significant bits, its low frequency coefficients, etc.) follows the same distribution as the images already selected. In that situation, Alice
does not have the liberty to choose the distribution, and it seems impossible to find a scheme
able to preserve any kind of distribution, for all secret keys and all hidden messages.
Furthermore, such statistical hypothesis tests are not ideal, as they only indicate whether a result is unlikely to have occurred by chance alone \emph{according to a pre-determined threshold probability} (the significance level). Errors of the first kind (false positives) and of the second kind (false negatives) necessarily occur, with a certain probability. In other words, with such an approach, Alice
cannot design a perfect set of cover contents all following the same probability distribution $p(X)$:
this process leads to a set of media whose distribution Alice does not have access to.
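For illustration purposes only, the following sketch shows how such a test could be run on the LSB plane of a grayscale image; the file name is a placeholder, and $3.841$ is the usual critical value of the $\chi^2$ distribution with one degree of freedom at the $0.05$ significance level. Note that it only checks uniformity of the LSBs, which is weaker than the agreement with the already selected images that Alice actually needs.
\begin{verbatim}
import numpy as np
from PIL import Image

def lsb_chi_squared(path):
    """Chi-squared statistic of the LSB plane of a grayscale image,
    tested against the uniform distribution on {0, 1}."""
    pixels = np.asarray(Image.open(path).convert("L"), dtype=np.uint8)
    lsbs = pixels & 1                    # least significant bit of each pixel
    n = lsbs.size
    x1 = int(lsbs.sum())                 # observed number of 1's
    x0 = n - x1                          # observed number of 0's
    expected = n / 2                     # expected count under uniformity
    return (x0 - expected) ** 2 / expected + (x1 - expected) ** 2 / expected

# Hypothetical candidate cover; 3.841 is the critical value at alpha = 0.05
# for one degree of freedom.
statistic = lsb_chi_squared("candidate_cover.png")
print("kept" if statistic <= 3.841 else "rejected", statistic)
\end{verbatim}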
The second situation seems more realistic; it is thus
further investigated in the next section~\cite{bgh13:ip}.
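As a side illustration of this second option, the sketch below forges a cover by overwriting every pixel LSB of a grayscale image with the output of a cryptographically strong random source; the use of Python's \texttt{secrets} module and the file names are arbitrary choices, not part of~\cite{bgh13:ip}.
\begin{verbatim}
import secrets
import numpy as np
from PIL import Image

def forge_cover(in_path, out_path):
    """Overwrite every pixel LSB of a grayscale image with a fresh bit
    coming from a cryptographically strong random source."""
    pixels = np.asarray(Image.open(in_path).convert("L"), dtype=np.uint8)
    random_bytes = np.frombuffer(secrets.token_bytes(pixels.size), dtype=np.uint8)
    random_bits = (random_bytes & 1).reshape(pixels.shape)
    forged = (pixels & 0xFE) | random_bits   # keep the 7 high bits, replace the LSB
    Image.fromarray(forged).save(out_path)

forge_cover("original.png", "forged_cover.png")   # hypothetical file names
\end{verbatim}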
% \section{New Scenarios of (Secure) Information Hiding}
% We give in this section some original situations that are
% obviously related to steganography, while being out of scope of
% the topics usually investigated by the information hiding community.
% These basic scenarios deal with hidden messages embedded into
% cover contents in a secure manner. In these scenarios, the
% channel obviously appears as sleazy, but the observer cannot
% deduce anything else (if he cannot break the channel, then
% he lost everything). The theoretical framework that will
% be presented in the next sections can be used to evaluate the
% security of the approaches proposed thereafter.
% \subsection{A First Theoretical Scenario}
% Let us suppose that the set of supports $\mathcal{A}$ is constituted by random binary words. In that condition, a perfect secret hiding can be obtained by replacing a given part of a support with the encrypted hidden message, where the encryption has been obtained by a one time pad method: a bitwise exclusive or between the hidden message and a given random sequence, called the keystream.
% If Alice and Bob have previously shared the keystream, they can thus attain by this way the level of perfect secrecy, as it is defined in the theory of Shannon, and applied in information
% hiding. It is true that this example is somewhat unrealistic:
% man can reasonably think that Alice should prefer to simply
% encrypt her messages instead of hiding them into encrypted data.
% However, it is a proof of concept in steganography showing
% that, if Alice has the choice of the cover set and Eve cannot
% break the channel, then Alice can reach a perfect secrecy in
% information hiding. And she can be certain that (1) it is
% impossible for the warden to determine which cover images
% contain hidden messages and (2) the content of these messages.
% Additionally of being a proof of concept, this example
% illustrate that a clear framework must be provided, taking
% into account who chooses the set of cover images, the
% objectives Eve wants to reach and the means she can use, and
% In practice, Alice and Bob can construct their set $\mathcal{A}$ by using several times a cryptographically secure pseudorandom number generator (PRNG) as ISAAC~\cite{Jenkins96} or BBS~\cite{BBS}.
% The keystream can be produced too by this kind of PRNG, leading to a practicable exchange of media that can embed, with no risk, hidden messages: the proposed stegosystem is obviously both realizable and cryptographically secure. Another
% possibility is to put such encrypted hidden messages into
% encrypted media, both encryption being realized by a
% cryptographically secure cryptosystem. Indeed, in these two
% situations, no one can
% make in practice the distinction between images of $\mathcal{A}$ with hidden messages and images without such messages.
% %An adversary can have some suspicions, because it is not very common to possess random images in his hard disk. However such suspicions can be at least limited by the following arguments. Firstly,
% Even though this first scenario do not pretend to be applicable
% for Internet exchanges, it can however provide a first
% concrete operational and secured steganographic scheme for
% file storage. Indeed,
% anyone has the right to possess encrypted data. The set $\mathcal{A}$ can be constituted by encrypted data and random data, some of these latter embedding hidden messages. Encrypted data can be corrupted: the encrypted data that contain hidden messages will not return a problem of wrong decryption key, but of corrupted file. Obviously, it is not the fault of Alice if her file system has some problems.
% Concerning random files, technicians, scientists, or engineers can justify their presence by experiments related to their works.
% Finally, when trying to recover data after a clash of the system, or an hard drive critical problem, recovered data are often partial or corrupted. Thus, a third possibility is to dissimulate the stegocontent into the free space of the disk, to make that it appears as a residue of a removed encrypted file. A practical solution is to remove the encrypted file after having dissimulated an hidden message in it: it will be possible to recover it by using appropriate recovery tools.
% \subsection{A Second Scenario}
% A second more realistic scenario to achieve in practice the
% cryptographically secures information hiding is
% to erase all the least significant bits (LSBs) of all the images (stored in a disk, exchanged through the Internet, etc.), and replace them by using a cryptographically secure PRNG.
% The cover set can be constituted by images that have LSBs
% closed to a random law.
% Then, some of these media will embed hidden messages in some places of their LSBs, the hidden messages being encrypted as previously.
% By doing so, the allure of the media can appears suspicious, but no adversary can deduce anything else: in practice, they cannot determine whether an image has an hidden message or not. This randomness in LSBs can be explained in several ways
% (images of low quality, filters, etc.). Moreover Alice, being accused by Eve to have possibly hidden messages in their suspicious images, can discover some unimportant secrets, without revealing important messages: Eve has no possibility to determine how much messages have been hidden.
% It is true that Eve can remove all these LSBs, because they are suspicious. Indeed, natural images have not, in general, random LSBs. For instance, in the image depicted in Fig.~\ref{Chibouki}, produced by a camera, the LSBs are not random.
% The number of 0's is 1825023 when the number of 1's is equal to 1876161. Thus a chisquared test of adaptation to an uniform distribution, with risk $\alpha=0.05$, fails completely, as
% $$ \dfrac{(x_0-\overline{x}_0)^2}{\overline{x}_0}+ \dfrac{(x_1-\overline{x}_1)^2}{\overline{x}_1} = 706.556 > 3.841,$$
% \noindent where $\overline{x}_0=\overline{x}_1=1850592$ are the intended occurrences of 0's and 1's in a binary sequence of size 3701184 when considering a uniform distribution, whereas $x_0$ and $x_1$ are for the obtained occurrences.
% \includegraphics{chibouki.jpg}
% \caption{Chibouki, the natural image}
% Thus Eve, finding surprisingly a uniform distribution in LSBs, can consider it as suspicious.
% And then, even if she cannot determine the subset of these suspicious images that contains hidden messages,
% she can at least remove the hidden channel (in case of communications through the Internet, for instance).
% However, Alice and Bob knows that LSBs are not the good place to hide their messages.
% Indeed, they can find bits in images that seems to be random
% (for instance, a well chosen subset of LSBs),
% systematically replace these bits
% by other ones produced using a cryptographically secure PRNG, and hide in some of them their encrypted
% messages. More concretely, let us consider the matrix H2 of the Daubechies db1 decomposition of Chibouki
% of level 2. For each coefficient of this matrix of size $444 \times 521$, we take the bit parity
% of its integral part. By doing so, we obtain a binary sequence that follows an uniform distribution
% on $\{0,1\}$, according to the chisquared test of risk 0.01:
% $$ \dfrac{(x_0-\overline{x}_0)^2}{\overline{x}_0}+ \dfrac{(x_1-\overline{x}_1)^2}{\overline{x}_1} = 0.073 \leqslant 6.635 .$$
% Thus these bits can be replaced by other ones produced by a cryptographically secure PRNG, and
% some of these modified images can embed securely encrypted images. To improve the robustness
% of the method, batteries of tests as TestU01 can be applied on bits candidates (in images, movies,
% sounds) to be replaced by a PRNG, to check if this designed channel can be used not only
% securely, but non suspiciously too.
% \emph{Mutatis mutandis}, the test can be replaced by the
% return of the steganalyzer of Eve: in a fair game that
% respects the Kerckhoffs' principle, all except the private
% keys are known by the two opposed parties.
\subsection{Toward a cryptographically secure hiding}
We recall in this section the theoretical framework for information
hiding security we have proposed in~\cite{bgh13:ip}, which more closely resembles
the usual approaches in cryptography, such as those presented for PRNGs in Definitions~\ref{cryptoPRNGdef} and~\ref{cryptoPRNGdef2}.
It makes it possible to define the
notion of steganalyzers, it is compatible with the new
scenarios of information hiding introduced in~\cite{bgh13:ip}, and it does not have the drawbacks
of the stego-security definition.
\subsubsection{Definition of a stegosystem}
\begin{Def}[Stegosystem]
Let $\mathcal{S}$, $\mathcal{M}$, and $\mathcal{K}=\mathds{B}^\ell$ be
three sets of words on $\mathds{B}$, called respectively the sets of
supports, of messages, and of keys (of size $\ell$).

A \emph{stegosystem} on $(\mathcal{S}, \mathcal{M}, \mathcal{K})$
is a tuple $(\mathcal{I},\mathcal{E}, inv)$ such that:
\begin{itemize}
\item $\mathcal{I}$ is a function from $\mathcal{S} \times \mathcal{M} \times \mathcal{K}$ to $\mathcal{S}$,
$(s,m,k) \longmapsto \mathcal{I}(s,m,k)=s'$,
\item $\mathcal{E}$ is a function from $\mathcal{S} \times \mathcal{K}$ to $\mathcal{M}$,
$(s,k) \longmapsto \mathcal{E}(s,k) = m'$,
\item $inv$ is a function from $\mathcal{K}$ to $\mathcal{K}$, such that $\forall k \in
\mathcal{K}, \forall (s,m)\in \mathcal{S}\times\mathcal{M},$ \linebreak
$ \mathcal{E}(\mathcal{I}(s,m,k),inv(k))=m$,
\item $\mathcal{I}(s,m,k)$ and $\mathcal{E}(s',k^\prime)$ can be computed in polynomial time.
\end{itemize}
$\mathcal{I}$ is called the insertion or embedding
function, $\mathcal{E}$ the extraction function, $s$ the host content,
$m$ the hidden message, $k$ the embedding key, $k'=inv(k)$ the extraction key, and
$s'$ the stego-content. If $\forall k \in \mathcal{K}, k=inv(k)$, the stegosystem is symmetric (private-key); otherwise
it is asymmetric (public-key).
\end{Def}
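To fix ideas, the following minimal sketch instantiates such a triple $(\mathcal{I},\mathcal{E}, inv)$ as a symmetric LSB-replacement scheme in which $inv$ is the identity; it only illustrates the interface of the definition, the hash-based keystream derivation being an arbitrary choice and not the construction of~\cite{bgh13:ip}.
\begin{verbatim}
import hashlib

def keystream(key, nbits):
    """Derive nbits pseudorandom bits from the key (illustrative, SHA-256 based)."""
    bits, counter = [], 0
    while len(bits) < nbits:
        block = hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        bits.extend((byte >> i) & 1 for byte in block for i in range(8))
        counter += 1
    return bits[:nbits]

def embed(support, message, key):
    """I: replace the first |message| LSBs of the support by message XOR keystream."""
    ks = keystream(key, len(message))
    stego = list(support)
    for i, (m, k) in enumerate(zip(message, ks)):
        stego[i] = (stego[i] & ~1) | (m ^ k)
    return stego

def extract(stego, key, length):
    """E: read the LSBs back and remove the keystream (here inv(k) = k)."""
    ks = keystream(key, length)
    return [(stego[i] & 1) ^ ks[i] for i in range(length)]

# Round trip on a toy support of byte values; the message is recovered exactly.
cover = [200, 13, 77, 91, 34, 158, 6, 250]
msg = [1, 0, 1, 1, 0, 1, 0, 0]
assert extract(embed(cover, msg, b"secret key"), b"secret key", len(msg)) == msg
\end{verbatim}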
\subsubsection{Heading notions}
\begin{Def}[$(T,\varepsilon)-$distinguishing attack]
Let $S=(\mathcal{I},\mathcal{E}, inv)$ be
a stegosystem on $(\mathcal{A}, \mathcal{M}, \mathcal{K})$, with
$\mathcal{A} \subset \mathds{B}^M$.
A $(T,\varepsilon)-$distinguishing attack on the stegosystem $S$ is a probabilistic
algorithm $\mathcal{D}:\mathcal{A} \longrightarrow \{0,1\}$ running in time $T$, such that
there exists $m \in \mathcal{M}$ satisfying
$$\left| Pr\left[\mathcal{D}\left(\mathcal{I}(s,m,k)\right)=1 \mid k \in_R \mathcal{K}, s \in_R \mathcal{A}\right]
-Pr\left[\mathcal{D}\left(x\right)=1 \mid x \in_R \mathcal{A}\right]\right|\geqslant \varepsilon ,$$
where the probability is also taken over the internal coin flips of $\mathcal{D}$,
and the notation $\in_R$ indicates the process of selecting an element at
random and uniformly over the corresponding set.
\end{Def}
\begin{Def}[$(T,\varepsilon)-$undistinguishability]
A stegosystem is $(T,\varepsilon)-$un\-dis\-tin\-gui\-sha\-ble if there exists no
$(T,\varepsilon)-$distinguishing attack on this stegosystem.
\end{Def}
Intuitively, this means that there exists no polynomial-time probabilistic algorithm
able to distinguish the host contents from the stego-contents~\cite{bgh13:ip}.
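As an illustration, the advantage appearing in this definition could be estimated empirically by Monte-Carlo sampling, as in the sketch below; the distinguisher, the cover sampler, and the embedding function passed as arguments are placeholders, not objects defined in~\cite{bgh13:ip}.
\begin{verbatim}
import random

def advantage(distinguisher, sample_cover, embed, keys, message, trials=10000):
    """Monte-Carlo estimate of
    |Pr[D(I(s, m, k)) = 1 | k, s uniform] - Pr[D(x) = 1 | x uniform]|,
    assuming the distinguisher returns 0 or 1."""
    hits_stego = sum(
        distinguisher(embed(sample_cover(), message, random.choice(keys)))
        for _ in range(trials)
    )
    hits_cover = sum(distinguisher(sample_cover()) for _ in range(trials))
    return abs(hits_stego - hits_cover) / trials
\end{verbatim}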
\subsubsection{A cryptographically secure information hiding scheme}
To show the effectiveness of this approach, we have provided and
proven in~\cite{bgh13:ip} a first cryptographically secure information
hiding scheme, according to the definition above.
Let $$\mathcal{S} = \left\{s_1^1, s_2^1,\ldots,s_{2^N}^1, s_1^2,
s_2^2,\ldots,s_{2^N}^2,\ldots,s_1^r, s_2^r,\ldots,s_{2^N}^r\right\}$$
be a subset of $\mathds{B}^M = \mathcal{A}$.
Consider $G:\mathds{B}^L \longrightarrow \mathds{B}^N$ a $(T,\varepsilon)-$secure
pseudorandom number generator, and set
$\mathcal{I}(s_j^i,m,k)=s^i_{m\oplus G(k)}$. Assuming that $r$ is a
constant, and that from $i,j$ one can
compute the image $s^i_j$ in time $T_1$, the stegosystem is
$(T-T_1-N-1,\varepsilon)$-secure.
Intuitively, $\mathcal{S}$ is built from $r$ images containing $N$ bits
of low information. The image $s^i_j$ corresponds to the $i$-th image where
the $N$ bits are set to $j$.
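A minimal sketch of this embedding and of the corresponding extraction, working directly on the indices $(i,j)$ of the images of $\mathcal{S}$, is given below; the generator $G$ stands for any $(T,\varepsilon)-$secure PRNG (the toy generator used here is not one), and the symmetric case $inv(k)=k$ is assumed.
\begin{verbatim}
def embed(i, m, k, G):
    """I(s_j^i, m, k) = s^i_{m XOR G(k)}: return the indices (i, j) of the stego image."""
    return i, m ^ G(k)

def extract(indices, k, G):
    """E(s', inv(k)): recover m = j XOR G(k), since inv(k) = k in the symmetric case."""
    _, j = indices
    return j ^ G(k)

# Toy instantiation with N = 8: G maps a key to an 8-bit output; here a keyed
# multiplicative hash stands in for a genuinely (T, eps)-secure generator.
G = lambda k: (k * 2654435761) & 0xFF
key, message = 42, 0b10110010
assert extract(embed(3, message, key, G), key, G) == message
\end{verbatim}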