IH/di3.tex

   1
   2 \section{The $\mathcal{DI}_3$ Steganographic Process}
   3 \label{di3sec}
   4 In~\cite{bcfg12a:ip,bcfg12b:ip}, a new steganographic algorithm named  $\mathcal{DI}_3$ is presented. It is
   5 inspired from $\mathcal{CIW}_1$ and $\mathcal{CIS}_2$ respectively published
   6 in~\cite{fgb11:ip} and~\cite{gfb10:ip}, and recalled previously in this article.
   7 Compared to the first one, $\mathcal{DI}_3$ is a steganographic scheme,
   8 not just a watermarking technique. That is, in our understanding, it can embed more than one bit. Unlike
   9 $\mathcal{CIS}_2$, which requires embedding keys with three strategies, only one sequence
  10 is required for $\mathcal{DI}_3$, so it is easier to implement. Indeed
  11 $\mathcal{DI}_3$ is a faster instance of $\mathcal{CIS}_2$, as
  12 there is no message mixing in it.
  13  $\mathcal{DI}_3$ is well-defined mathematically and its security is evaluated in~\cite{bcfg12a:ip},
  14 whereas~\cite{bcfg12b:ip} provides algorithms and investigates its robustness, comparing it to
  15 some well-known watermarking schemes, namely the
  16 YASS~\cite{DBLP:conf/ih/SolankiSM07},
  17  nsF5~\cite{DBLP:conf/mmsec/FridrichPK07}, MMx~\cite{DBLP:conf/ih/KimDR06}, and HUGO~\cite{DBLP:conf/ih/PevnyFB10} algorithms
  18  detailed in the Appendix~\ref{AppendixIH}.
  19
  20
  21 \subsection{Mathematical definitions and notations}
  22 \label{sec:math-def}
  23
  24 New notations and terminologies must be introduced another time in order to be able to define
  25 mathematically the $\mathcal{DI}_3$ steganographic process. They are provided thereafter.
  26
  27
  28 \begin{Def}
  29 The \emph{support of a finite sequence} $S$ of $n$ terms is the finite set
  30 $\mathsf{S}(S)=\left\{ S^k, k < n \right\}$ containing all the distinct
  31 values of $S$. Its cardinality is s.t. $\#\mathsf{S}(S) \leqslant n$.
  32 \end{Def}
  33
  34 \begin{Def}\label{def:injective-sequence}
  35 A finite sequence $S \in \mathds{S}_\mathsf{N}$ of $n$ terms is \emph{injective}
  36 if $n = \#\mathsf{S}(S)$.
  37 It is \emph{onto}
  38 if $N = \#\mathsf{S}(S)$. Finally, it is bijective if and only if it is both
  39 injective and onto, so $n=N = \#\mathsf{S}(S)$.
  40 \end{Def}
  41
  42  ``$S$ is injective'' reflects the fact that  all the $n$ terms of
  43 the sequence $S$ are distinct, while ``$S$ is onto'' means
  44 that all the values of the set $\llbracket 1;\mathsf{N}\rrbracket$
  45 are reached at least once.
  46
  47
  48
  49
  50 \subsection{The new $\mathcal{DI}_3$ process}
  51 \label{sec:new-process-di-1}
  52
  53 In this section, the new algorithm introduced in~\cite{bcfg12a:ip} and studied
  54 in~\cite{bcfg12b:ip} is recalled.
  55 Let $\mathsf{P} \in \mathds{N}^\ast$ be the width, in term of bits, of the
  56   message to embed into the cover media.
  57  $\lambda \in \mathds{N}^{\ast}$ is the number of iterations to realize,
  58  which is s.t. $\lambda > \mathsf{P}$.
  59  $x^0 \in \mathbb{B}^\mathsf{N}$ is for the $\mathsf{N}$ LSCs of a given
  60   cover media $C$ supposed to be uniformly distributed.
  61  $m \in \mathbb{B}^\mathsf{P}$ is the message to hide into $x^0$.
  62 Finally, $S \in \mathbb{S}_\mathsf{P}$ is a strategy such that the
  63   finite sequence $\left\{S^k, k \in \llbracket \lambda - \mathsf{P}
  64   +1;\lambda \rrbracket\right\}$ is injective.
  65
  66
  67 \begin{Rem}
  68 The width $\mathsf{P}$ of the message to hide into the LSCs of the cover media
  69 $x^0$ has to be far smaller than the number of LSCs.
  70 \end{Rem}
  71
  72
  73
  74 The proposed information hiding scheme is defined by:
  75 \begin{Def}[$\mathcal{DI}_3$ Data hiding scheme]\
  76 \label{def:process-DI-1}
  77 $\forall (n,i,j) \in
  78 \mathds{N}^{\ast} \times \llbracket 0;\mathsf{N-1}\rrbracket \times \llbracket
  79 0;\mathsf{P-1}\rrbracket$:
  80
  81 \begin{equation*}
  82 \begin{array}{l}
  83 x_i^n=\left\{
  84 \begin{array}{ll}
  85 x_i^{n-1} & \text{ if }S^n\neq i \\
  86 m_{S^n} & \text{ if }S^n=i.
  87 \end{array}
  88 \right.
  89 \end{array}
  90 \end{equation*}
  91 \end{Def}
  92
  93
  94 The stego-content is the Boolean vector $y = x^\lambda \in
  95 \mathbb{B}^\mathsf{N}$, which will replace the former LSCs,
  96 that is, LSCs of the cover media are replaced by the
  97 vector $y$.
  98
  99
 100 \subsection{Security study}
 101
 102 A security study of the $\mathcal{DI}_3$ steganographic process has been realized
 103 in~\cite{bcfg12a:ip}. Conclusion of this study is summarized thereafter.
 104
 105 \begin{Prop}
 106 $\mathcal{DI}_3$ is stego-secure.
 107 \end{Prop}
 108
 109 This proof of this proposition, provided in~\cite{bcfg12a:ip}, holds for
 110 the following restrictive hypotheses:
 111
 112 \begin{description}
 113 \item[Distribution of LSCs.]
 114 We have supposed that $x^0 \sim
 115 \mathcal{U}\left(\mathbb{B}^\mathsf{N}\right)$
 116 to prove the stego-security of the data hiding process
 117 $\mathcal{DI}_3$.
 118 This hypothesis of the uniform distribution
 119 of the least significant coefficients is obviously
 120  the most restrictive one, but it can
 121 be obtained at least partially in two possible manners.
 122 Either a channel that appears to be random (for instance, when
 123 applying a chi squared test or for test batteries) can be found in the media. Or a systematic process
 124 can be applied on the images to obtain this uniformity, as follows.
 125 Before embedding the hidden message, all the original LSCs must be replaced by randomly
 126 generated ones, hoping so that such cover media will be considered to be noisy by any given attacker.
 127 Let us remark that, in
 128 the field of data anonymity for privacy on the
 129 Internet,  we are in the
 130 ``watermark-only attack'' framework.
 131 As it has been recalled  in
 132 Table~\ref{table:attack-classification},
 133 in that framework,
 134 the attacker has only access to stego-contents, having so no knowledge
 135 of the original media (\emph{i.e.}, before introducing the message in the LSCs random channel).
 136 % These considerations, which have been
 137 % deepened in later publications, will be discussed more largely at
 138 % the end of this chapter.
 139 \item[Distribution of the messages $m$.]
 140  In order to prove the stego-security of the data hiding
 141  process
 142 $\mathcal{DI}_3$, we have supposed that  $m \sim
 143 \mathcal{U}\left(\mathbb{B}^\mathsf{P}\right)$. This hypothesis
 144 of the uniform distribution of the message to hide is not
 145 really restrictive. Indeed, to encrypt the message before its
 146 embedding into the LSCs of cover media, which is usually required
 147 for obvious security reasons, is sufficient to achieve this goal. To say
 148 it different, in order to be in the conditions of applications of the process
 149 $\mathcal{DI}_3$, the hidden message must be encrypted.
 150 \item[Distribution of the strategies $S$.]
 151  To prove the stego-security of the data hiding
 152  process
 153 $\mathcal{DI}_3$, we have finally supposed that  $S \sim
 154 \mathcal{U}\left(\mathbb{S}_\mathsf{P}\right)$.  This hypothesis is not
 155 restrictive too, as any cryptographically secure
 156 pseudorandom generator (PRNG)
 157 satisfies this property. With such PRNGs, it is impossible in
 158 polynomial time, to make the distinction between  random numbers and  numbers provided by
 159 these generators. For instance, \emph{Blum Blum Shub
 160 (BBS)}~\cite{junod1999cryptographic}, \emph{Blum Goldwasser (BG)}~\cite{prng-Blum-goldwasser},
 161 or \emph{ISAAC}~\cite{Jenkins1996} are convenient here.
 162 \end{description}
 163
 164 After this theoretical study of the
 165 $\mathcal{DI}_3$ steganographic process realized in~\cite{bcfg12a:ip},
 166 we have investigated practical aspects, discussing about its
 167 concrete implementation and evaluating its robustness in~\cite{bcfg12b:ip},
 168 while article~\cite{bcfg12a:ip} already mentioned deals with its
 169 ability to face steganalyzers. These practical aspects are summarized below.
 170
 171
 172 \subsection{Implementing the $\mathcal{DI}_3$ scheme}
 173
 174
 175 In the algorithms recalled here, the following notations are used:
 176 $S$ denotes the embedding and extraction strategy,
 177  $H$ the host content or the stego-content depending of the context.
 178   $LSC$ stands for the old or new
 179 LSCs of the host or stego-content $H$ depending of the context too.
 180  $N$ denotes the number of LSCs,
 181  $\lambda$ the number of iterations to realize,
 182  $M$ the secret message, and
 183  $P$ the width of the message (number of bits).
 184
 185
 186 The $\mathcal{DI}_3$ scheme theoretically presented in~\cite{bcfg12a:ip} has been
 187 practically described by three main algorithms in~\cite{bcfg12b:ip}:
 188 \begin{enumerate}
 189   \item Algorithm~\ref{algo:strategy}
 190 generates the embedding strategy, part of the embedding key (with the LSCs and
 191 the number of iterations).
 192 \item Algorithm~\ref{algo:embed} embeds the message into
 193 the LSCs of the cover media using the strategy. The
 194 strategy has been generated by the first algorithm and the same number of
 195 iterations is used.
 196 \item Algorithm~\ref{algo:extract} extracts
 197 the secret message from the LSCs of the media (the stego-content) using the
 198 strategy, which constitutes with the message length the extraction key.
 199 \end{enumerate}
 200
 201 Two other complementary functions must be used:
 202
 203
 204 \begin{enumerate}
 205   \item Algorithm~\ref{algo:signification-function}, which
 206   allows to extract MSCs, LSCs, and passive coefficients from the host content.
 207   Its implementation is based on the concept of signification
 208   function described previously.
 209  \item Algorithm~\ref{algo:build-function}
 210  rebuilds the new host content (the stego-content) from the corresponding MSCs,
 211  LSCs, and passive coefficients. This function realizes the opposite operation of Algorithm~\ref{algo:signification-function}.
 212 \end{enumerate}
 213
 214 \begin{Rem}
 215 These two algorithms depend of the definition of the MSCs,
 216  LSCs, and passive coefficients, which can correspond to
 217 a spatial or frequency description of the host content. This is why they are
 218 not documented here.
 219 \end{Rem}
 220
 221
 222
 223 \begin{algorithm}[h]
 224 \tcc{$S$ is a sequence of
 225 integers into $\llbracket 0,P-1 \rrbracket$, such that $(S_{n_0},\ldots,S_{n_0+P-1})$ is injective on
 226 $\llbracket 0,P-1 \rrbracket$.}
 227 \KwResult{$S$: The strategy, integer sequence $(S_0,S_1,\ldots)$.}
 228 \Begin{
 229 $n_0 \longleftarrow L - P + 1$\;
 230 \If{$P > N$ OR $n_0 < 0$}{
 231 \Return{ERROR}}
 232 $S \longleftarrow$ Array of width $\lambda$, all values initialized to 0\;
 233 $cpt \longleftarrow 0$\;
 234 \While{$cpt < n_0$}{
 235 $S_{cpt} \longleftarrow $Random integer in $\llbracket 0,P-1
 236 \rrbracket$.\;
 237 $cpt \longleftarrow cpt + 1$\;}
 238 $A \longleftarrow$ We generate an arrangement of $\llbracket 0,P-1
 239 \rrbracket$\;
 240 \For{$k \in \llbracket 0,P-1\rrbracket$}{
 241 $S_{n_0 + k} \longleftarrow A_k$\;
 242 }
 243 \Return{$S$}
 244 }
 245 \caption{$strategy(N,P,\lambda)$}
 246 \label{algo:strategy}
 247 \end{algorithm}
 248
 249
 250
 251 \begin{algorithm}[h]
 252 \KwResult{New LSCs with embedded message.}
 253 \Begin{
 254 $N \longleftarrow$ Number of LSCs in $LSC$\;
 255 $P \longleftarrow$ Width of the message $M$\;
 256 \For{$k \in \llbracket 0,\lambda\rrbracket$}{
 257 $i \longleftarrow S_k$\;
 258 $LSC_{i} \longleftarrow M_i$\;
 259 }
 260 \Return{$LSC$}
 261 }
 262 \caption{$embed(LSC, M, S, \lambda)$}
 263 \label{algo:embed}
 264 \end{algorithm}
 265
 266 \begin{algorithm}[h]
 267 \KwResult{The message to extract from $LSC$.}
 268 \Begin{
 269 $RS \longleftarrow$ The strategy $S$ written in reverse order.\;
 270 $M \longleftarrow$ Array of width $P$, all values initialized to 0\;
 271 \For{$k \in \llbracket 0,\lambda\rrbracket$}{
 272 $i \longleftarrow RS_k$\;
 273 $M_{i} \longleftarrow LSC_i$\;
 274 }
 275 \Return{$M$}
 276 }
 277 \caption{$extract(LSC, S, \lambda,P)$}
 278 \label{algo:extract}
 279 \end{algorithm}
 280
 281
 282 \begin{algorithm}[h]
 283 \KwData{$H$: The original host content.}
 284 \KwResult{$MSC$: MSCs of the host content $H$.}
 285 \KwResult{$PC$: Passive coefficients of the host content $H$.}
 286 \KwResult{$LSC$: LSCs of the host content $H$.}
 287 \Begin{
 288 \tcc{Implemented by the user.}
 289 \Return{$(MSC,PC,LSC)$}
 290 }
 291 \caption{$significationFunction(H)$}
 292 \label{algo:signification-function}
 293 \end{algorithm}
 294
 295 \begin{algorithm}[h]
 296 \KwResult{$H$: The new rebuilt host content.}
 297 \Begin{
 298 \tcc{Implemented by the user.}
 299 \Return{$(MSC,PC,LSC)$}
 300 }
 301 \caption{$buildFunction(MSC,PC,LSC)$ )\label{algo:build-function}}
 302 \end{algorithm}
 303
 304
 305 \subsection{Evaluation against steganalyzers}\label{section:steganalysis}
 306
 307
 308 The steganographic scheme detailed in~\cite{bcfg12a:ip}
 309  has been compared to
 310 state of the art steganographic approaches, namely
 311 YASS~\cite{DBLP:conf/ih/SolankiSM07},
 312 HUGO~\cite{DBLP:conf/ih/PevnyFB10}, and
 313 nsF5~\cite{DBLP:conf/mmsec/FridrichPK07} detailed in the Appendix~\ref{AppendixIH}.
 314 This study, realized in~\cite{bcfg12a:ip}, is summarized thereafter.
 315
 316 The steganalysis is based on the BOSS image
 317 database~\cite{DBLP:conf/ih/BasFP11},
 318 which consists in a set of 10 000 512x512 greyscale images.
 319 We have randomly selected 50 of them to compute the cover set.
 320 Since YASS and nsF5 are dedicated to JPEG
 321 support, all these images have been firstly translated into JPEG format
 322 thanks to the \verb+mogrify+ command line.
 323 To allow the comparison between steganographic schemes, the relative payload
 324 is always set with 0.1 bit per pixel. Under that constrain,
 325 the embedded message $m$ is a sequence of 26214 randomly generated bits.
 326 This step has led to distinguish four sets of stego contents, one for
 327 each steganographic approach.
 328
 329 We have next used in~\cite{bcfg12a:ip} the steganalysis tool developed by the HugoBreakers
 330 team~\cite{holmes11,ensemble11} based on AI classifier and which won
 331 the BOSS competition~\cite{DBLP:conf/ih/BasFP11}.
 332 Table~\ref{table:holme} summarizes these steganalysis results expressed
 333 as the error probabilities of the steganalyser, as they are given in~\cite{bcfg12a:ip}. The errors are the mean
 334 of the false alarms and of the missed detection. An error that is closed
 335 to 0.5 signifies that deciding whether an image contains
 336 a stego content is a random choice for the steganalyser.
 337 Conversely, a tiny error denotes that the steganalyser can easily classify
 338 stego content and non stego content.
 339
 340 \begin{table}[ht]
 341 \begin{center}
 342 \begin{tabular}{|l|l|l|l|l|}
 343 \hline
 344 Steganographic Tool & $\mathcal{DI}_3$ & YASS & HUGO & NsF5 \\
 345 \hline
 346 Error Probability   & 0.4133& 0.0067& 0.495 & 0.47 \\
 347 \hline
 348 \end{tabular}
 349 \end{center}
 350 \caption{Steganalysis results of HugoBreakers steganalyser applied on
 351 steganographic scheme}\label{table:holme}
 352 \end{table}
 353
 354
 355 The best result is obtained by HUGO, which is closed to the perfect
 356 steganographic approach to the considered steganalyser, since the error is about 0.5. However, even if
 357 the approach detailed in~\cite{bcfg12a:ip} has no optimization, these
 358 first experiments shown promising results.
 359 % We finally notice that the HugoBreakers's steganalyser should outperform
 360 % these results  on larger image databases, \textit{e.g.}, when applied on the
 361 % whole BOSS image database.
 362 %
 363
 364
 365 \subsection{Robustness study}\label{sec:robustness-study}
 366
 367
 368 This section summarizes the robustness study presented in~\cite{bcfg12b:ip}.
 369 Each experiment is build another time on a set of 50 images, which are randomly selected
 370 among database taken from the BOSS contest~\cite{DBLP:conf/ih/BasFP11}.
 371 Each cover is a $512\times 512$ greyscale digital image.
 372 The relative payload
 373 is always set with 0.1 bit per pixel. Under that constrain,
 374 the embedded message $m$ still remains a sequence of 26214 randomly generated bits.
 375
 376 According to previous similar work in the
 377 field of information hiding, we have conducted in~\cite{bcfg12b:ip} our evaluation
 378 following a same canvas than other robustness studies documented previously in
 379  this article.
 380 We have firstly chosen some classical attacks like cropping,
 381 compression, and rotation ones. The robustness of $\mathcal{DI}_3$ has then been tested
 382 by successively applying on stego content these attacks. Differences
 383 between the message that is extracted from the attacked image and
 384 the original one are then computed and expressed as percentage.
 385
 386
 387 Different percentage of cropping
 388 (from 1\% to 81\%) have been firstly applied on the stego image in~\cite{bcfg12b:ip},
 389 Fig.~\ref{fig:robustness-results}~(c) recalls effects of such attacks.
 390 We have then addressed robustness against JPEG and JPEG 2000 compression,
 391 and results are summarized in Fig.~\ref{fig:robustness-results}~(a-b).
 392 Attacks based on geometric transformations have finally been addressed through
 393 rotations: as presented previously in this article, two opposite rotations
 394 of angle $\theta$ are successively applied around the center of the image.
 395 In these geometric transformations, angles range from 2 to 20
 396 degrees. Effects of such attacks are also recalled in
 397 Fig.~\ref{fig:robustness-results}~(d).
 398
 399
 400 \begin{figure*}[htb]
 401
 402 \begin{minipage}[b]{.45\linewidth}
 403   \centering
 404     \centerline{\includegraphics[width=7cm]{IH/graphs/atq-jpg}}
 405     \centerline{(a) JPEG effect.}
 406 \end{minipage}
 407 \hfill
 408 \begin{minipage}[b]{0.45\linewidth}
 409   \centering
 410     \centerline{\includegraphics[width=7cm]{IH/graphs/atq-jp2}}
 411     \centerline{(b) JPEG 2000 effect.}
 412 \end{minipage}
 413 \\
 414 \begin{minipage}[b]{.45\linewidth}
 415   \centering
 416     \centerline{\includegraphics[width=7cm]{IH/graphs/atq-dec}}
 417     \centerline{(c) Cropping attack.}
 418 \end{minipage}
 419 \hfill
 420 \begin{minipage}[b]{0.45\linewidth}
 421   \centering
 422     \centerline{\includegraphics[width=7cm]{IH/graphs/atq-rot}}
 423     \centerline{(d) Rotation attack.}
 424 \end{minipage}
 425 \caption{Robustness of $\mathcal{DI}_3$ scheme facing several attacks (50 images
 426 from the BOSS repository)}
 427 \label{fig:robustness-results}
 428 \end{figure*}
 429
 430
 431
 432
 433 From all these experiments, one can conclude that
 434 the steganographic scheme does not present obvious drawback and
 435 resists to all the attacks:
 436 all the percentage differences are so far less than 50\%.
 437
 438 % The comparison with robustness of other steganographic schemes exposed in the
 439 % work will be realize in a complementary study, and the best utilization of each
 440 % one in several context will be discuss.
 441
 442 \medskip
 443
 444 All researches presented in previous sections have started from the $\mathcal{CIW}_1$
 445 process, proceeding by successively correcting its drawbacks. By doing so, we have had
 446 a retreat from chaotic iterations. At the same time, the chaotic iterations based information
 447 hiding (dhCI) process, whose the $\mathcal{CIW}_1$ scheme historically arises from,
 448 continued to be investigated in parallel. Results of these investigations are detailed
 449 in the next section.