IH/ihtiap12.tex

   1 %\section{Application of Steganography for Anonymity through the Internet~\cite{bcfg12a:ip}}
   2
   3 \section{The $\mathcal{DI}_3$ steganographic process~\cite{bcfg12a:ip,bcfg12b:ip}}
   4
   5 In~\cite{bcfg12a:ip,bcfg12b:ip}, a new steganographic algorithm named  $\mathcal{DI}_3$ is presented. It is
   6 inspired from $\mathcal{CIW}_1$ and $\mathcal{CIS}_2$ respectively published\r
   7 in~\cite{fgb11:ip} and~\cite{gfb10:ip}, and recalled previously in this chapter.
   8 Compared to the first one, $\mathcal{DI}_3$ is a steganographic scheme,
   9 not just a watermarking technique. That is, in our understanding, it can embed more than one bit. Unlike\r
  10 $\mathcal{CIS}_2$, which requires embedding keys with three strategies, only one sequence\r
  11 is required for $\mathcal{DI}_3$, so it is easier to implement. Indeed
  12 $\mathcal{DI}_3$ is a faster instance of $\mathcal{CIS}_2$, as
  13 there is no message mixing in it.
  14  $\mathcal{DI}_3$ is well-defined mathematically and its security is evaluated in~\cite{bcfg12a:ip},
  15 whereas~\cite{bcfg12b:ip} provides algorithms and investigates its robustness, comparing it to
  16 some well-known watermarking schemes, namely the
  17 YASS~\cite{DBLP:conf/ih/SolankiSM07}, \r
  18  nsF5~\cite{DBLP:conf/mmsec/FridrichPK07}, MMx~\cite{DBLP:conf/ih/KimDR06}, and HUGO~\cite{DBLP:conf/ih/PevnyFB10} algorithms
  19  detailed in the Appendix~\ref{AppendixIH}.
  20
  21
  22 \subsection{Mathematical definitions and notations}
  23 \label{sec:math-def}
  24
  25 New notations and terminologies must be introduced another time in order to be able to define
  26 mathematically the $\mathcal{DI}_3$ steganographic process. They are provided thereafter.
  27
  28
  29 \begin{Def}
  30 The \emph{support of a finite sequence} $S$ of $n$ terms is the finite set
  31 $\mathsf{S}(S)=\left\{ S^k, k < n \right\}$ containing all the distinct
  32 values of $S$. Its cardinality is s.t. $\#\mathsf{S}(S) \leqslant n$.
  33 \end{Def}
  34
  35 \begin{Def}\label{def:injective-sequence}
  36 A finite sequence $S \in \mathds{S}_\mathsf{N}$ of $n$ terms is \emph{injective}
  37 if $n = \#\mathsf{S}(S)$.
  38 It is \emph{onto}
  39 if $N = \#\mathsf{S}(S)$. Finally, it is bijective if and only if it is both
  40 injective and onto, so $n=N = \#\mathsf{S}(S)$.
  41 \end{Def}
  42
  43  ``$S$ is injective'' reflects the fact that  all the $n$ terms of
  44 the sequence $S$ are distinct, while ``$S$ is onto'' means
  45 that all the values of the set $\llbracket 1;\mathsf{N}\rrbracket$
  46 are reached at least once.
  47
  48
  49
  50
  51 \subsection{The new $\mathcal{DI}_3$ process}
  52 \label{sec:new-process-di-1}
  53
  54 In this section, the new algorithm introduced in~\cite{bcfg12a:ip} and studied
  55 in~\cite{bcfg12b:ip} is recalled.
  56 Let $\mathsf{P} \in \mathds{N}^\ast$ be the width, in term of bits, of the
  57   message to embed into the cover media.
  58  $\lambda \in \mathds{N}^{\ast}$ is the number of iterations to realize,
  59  which is s.t. $\lambda > \mathsf{P}$.
  60  $x^0 \in \mathbb{B}^\mathsf{N}$ is for the $\mathsf{N}$ LSCs of a given
  61   cover media $C$ supposed to be uniformly distributed.
  62  $m \in \mathbb{B}^\mathsf{P}$ is the message to hide into $x^0$.
  63 Finally, $S \in \mathbb{S}_\mathsf{P}$ is a strategy such that the
  64   finite sequence $\left\{S^k, k \in \llbracket \lambda - \mathsf{P}
  65   +1;\lambda \rrbracket\right\}$ is injective.
  66
  67
  68 \begin{Rem}
  69 The width $\mathsf{P}$ of the message to hide into the LSCs of the cover media
  70 $x^0$ has to be far smaller than the number of LSCs.
  71 \end{Rem}
  72
  73
  74
  75 The proposed information hiding scheme is defined by:
  76 \begin{Def}[$\mathcal{DI}_3$ Data hiding scheme]\
  77 \label{def:process-DI-1}
  78 $\forall (n,i,j) \in
  79 \mathds{N}^{\ast} \times \llbracket 0;\mathsf{N-1}\rrbracket \times \llbracket
  80 0;\mathsf{P-1}\rrbracket$:
  81
  82 \begin{equation*}
  83 \begin{array}{l}
  84 x_i^n=\left\{
  85 \begin{array}{ll}
  86 x_i^{n-1} & \text{ if }S^n\neq i \\
  87 m_{S^n} & \text{ if }S^n=i.
  88 \end{array}
  89 \right.
  90 \end{array}
  91 \end{equation*}
  92 \end{Def}
  93
  94
  95 The stego-content is the Boolean vector $y = x^\lambda \in
  96 \mathbb{B}^\mathsf{N}$, which will replace the former LSCs,
  97 that is, LSCs of the cover media are replaced by the
  98 vector $y$.
  99
 100
 101 \subsection{Security study}
 102
 103 A security study of the $\mathcal{DI}_3$ steganographic process has been realized
 104 in~\cite{bcfg12a:ip}. Conclusion of this study is summarized thereafter.
 105
 106 \begin{Prop}
 107 $\mathcal{DI}_3$ is stego-secure.
 108 \end{Prop}
 109
 110 Thie proof of this proposition, provided in~\cite{bcfg12a:ip}, holds for
 111 the following restrictive hypotheses:
 112
 113 \begin{description}
 114 \item[Distribution of LSCs.]
 115 We have supposed that $x^0 \sim
 116 \mathcal{U}\left(\mathbb{B}^\mathsf{N}\right)$
 117 to prove the stego-security of the data hiding process
 118 $\mathcal{DI}_3$.
 119 This hypothesis of the uniform distribution
 120 of the least significant coefficients is obviously
 121  the most restrictive one, but it can
 122 be obtained at least partially in two possible manners.
 123 Either a channel that appears to be random (for instance, when
 124 applying a chi squared test, or for test batteries recalled in a
 125 previous chapter) can be found in the media. Or a systematic process
 126 can be applied on the images to obtain this uniformity, as follows.
 127 Before embedding the hidden message, all the original LSCs must be replaced by randomly
 128 generated ones, hoping so that such cover media will be considered to be noisy by any given attacker.
 129 Let us remark that, in
 130 the field of data anonymity for privacy on the
 131 Internet,  we are in the
 132 ``watermark-only attack'' framework.
 133 As it has been recalled  in
 134 Table~\ref{table:attack-classification},
 135 in that framework,
 136 the attacker has only access to stego-contents, having so no knowledge
 137 of the original media (\emph{i.e.}, before introducing the message in the LSCs random channel).
 138 These considerations, which have been
 139 deepened in later publications, will be discussed more largely at
 140 the end of this chapter.
 141 \item[Distribution of the messages $m$.]
 142  In order to prove the stego-security of the data hiding
 143  process
 144 $\mathcal{DI}_3$, we have supposed that  $m \sim
 145 \mathcal{U}\left(\mathbb{B}^\mathsf{P}\right)$. This hypothesis
 146 of the uniform repartition of the message to hide is not
 147 really restrictive. Indeed, to encrypt the message before its
 148 embedding into the LSCs of cover media, which is usually required
 149 for obvious security reasons, is sufficient to achieve this goal. To say
 150 it different, in order to be in the conditions of applications of the process
 151 $\mathcal{DI}_3$, the hidden message must be encrypted.
 152 \item[Distribution of the strategies $S$.]
 153  To prove the stego-security of the data hiding
 154  process
 155 $\mathcal{DI}_3$, we have finally supposed that  $S \sim
 156 \mathcal{U}\left(\mathbb{S}_\mathsf{P}\right)$.  This hypothesis is not
 157 restrictive too, as any cryptographically secure
 158 pseudorandom generator (PRNG)
 159 satisfies this property. With such PRNGs, it is impossible in
 160 polynomial time, to make the distinction between  random numbers and  numbers provided by
 161 these generators. For instance, \emph{Blum Blum Shub
 162 (BBS)}~\cite{junod1999cryptographic}, \emph{Blum Goldwasser (BG)}~\cite{prng-Blum-goldwasser},
 163 or \emph{ISAAC}~\cite{Jenkins1996}, recalled in the chapter focusing
 164 on PRNGs, are convenient here.
 165 \end{description}
 166
 167 \subsection{Evaluation against steganalyzers}\label{section:steganalysis}
 168
 169
 170 The steganographic scheme detailed in~\cite{bcfg12a:ip}
 171  has been compared to
 172 state of the art steganographic approaches, namely
 173 YASS~\cite{DBLP:conf/ih/SolankiSM07},
 174 HUGO~\cite{DBLP:conf/ih/PevnyFB10}, and
 175 nsF5~\cite{DBLP:conf/mmsec/FridrichPK07} detailed in the Appendix~\ref{AppendixIH}.
 176 This study, realized in~\cite{bcfg12a:ip}, is summarized thereafter.
 177
 178 The steganalysis is based on the BOSS image
 179 database~\cite{DBLP:conf/ih/BasFP11},
 180 which consists in a set of 10 000 512x512 greyscale images.
 181 We have randomly selected 50 of them to compute the cover set.
 182 Since YASS and nsF5 are dedicated to JPEG
 183 support, all these images have been firstly translated into JPEG format
 184 thanks to the \verb+mogrify+ command line.
 185 To allow the comparison between steganographic schemes, the relative payload
 186 is always set with 0.1 bit per pixel. Under that constrain,
 187 the embedded message $m$ is a sequence of 26214 randomly generated bits.
 188 This step has led to distinguish four sets of stego contents, one for
 189 each steganographic approach.
 190
 191 We have next used in~\cite{bcfg12a:ip} the steganalysis tool developed by the HugoBreakers
 192 team~\cite{holmes11,ensemble11} based on AI classifier and which won
 193 the BOSS competition~\cite{DBLP:conf/ih/BasFP11}.
 194 Table~\ref{table:holme} summarizes these steganalysis results expressed
 195 as the error probabilities of the steganalyser, as they are given in~\cite{bcfg12a:ip}. The errors are the mean
 196 of the false alarms and of the missed detections. An error that is closed
 197 to 0.5 signifies that deciding whether an image contains
 198 a stego content is a random choice for the steganalyser.
 199 Conversely, a tiny error denotes that the steganalyser can easily classify
 200 stego content and non stego content.
 201
 202 \begin{table}[ht]
 203 \begin{center}
 204 \begin{tabular}{|l|l|l|l|l|}
 205 \hline
 206 Steganographic Tool & $\mathcal{DI}_1$ & YASS & HUGO & NsF5 \\
 207 \hline
 208 Error Probability   & 0.4133& 0.0067& 0.495 & 0.47 \\
 209 \hline
 210 \end{tabular}
 211 \end{center}
 212 \caption{Steganalysis results of HugoBreakers steganalyser applied on
 213 steganographic scheme}\label{table:holme}
 214 \end{table}
 215
 216
 217 The best result is obtained by HUGO, which is closed to the perfect
 218 steganographic approach to the considered steganalyser, since the error is about 0.5. However, even if
 219 the approach detailed in~\cite{bcfg12a:ip} has not any optimization, these
 220 first experiments show promising results.
 221 We finally notice that the HugoBreakers's steganalyser should outperform
 222 these results  on larger image databases, \textit{e.g.}, when applied on the
 223 whole BOSS image database.