stopping.tex

   1
   2
   3
   4 Let thus be given such kind of map.
   5 This article focuses on studying its iterations according to
   6 the equation~(\ref{eq:asyn}) with a given strategy.
   7 First of all, this can be interpreted as walking into its iteration graph
   8 where the choice of the edge to follow is decided by the strategy.
   9 Notice that the iteration graph is always a subgraph of
  10 ${\mathsf{N}}$-cube augmented with all the self-loop, \textit{i.e.}, all the
  11 edges $(v,v)$ for any $v \in \Bool^{\mathsf{N}}$.
  12 Next, if we add probabilities on the transition graph, iterations can be
  13 interpreted as Markov chains.
  14
  15 \begin{xpl}
  16 Let us consider for instance
  17 the graph $\Gamma(f)$ defined
  18 in \textsc{Figure~\ref{fig:iteration:f*}.} and
  19 the probability function $p$ defined on the set of edges as follows:
  20 $$
  21 p(e) \left\{
  22 \begin{array}{ll}
  23 = \frac{1}{3} \textrm{ if $e=(v,v)$ with $v \in \Bool^3$,}\\
  24 = \frac{1}{3} \textrm{ otherwise.}
  25 \end{array}
  26 \right.
  27 $$
  28 The matrix $P$ of the Markov chain associated to the function $f^*$ and to its probability function $p$ is
  29 \[
  30 P=\dfrac{1}{3} \left(
  31 \begin{array}{llllllll}
  32 1&1&1&0&0&0&0&0 \\
  33 1&1&0&0&0&1&0&0 \\
  34 0&0&1&1&0&0&1&0 \\
  35 0&1&1&1&0&0&0&0 \\
  36 1&0&0&0&1&0&1&0 \\
  37 0&0&0&0&1&1&0&1 \\
  38 0&0&0&0&1&0&1&1 \\
  39 0&0&0&1&0&1&0&1
  40 \end{array}
  41 \right)
  42 \]
  43 \end{xpl}
  44
  45
  46 % % Let us first recall the  \emph{Total Variation} distance $\tv{\pi-\mu}$,
  47 % % which is defined for two distributions $\pi$ and $\mu$ on the same set
  48 % % $\Bool^n$  by:
  49 % % $$\tv{\pi-\mu}=\max_{A\subset \Bool^n} |\pi(A)-\mu(A)|.$$
  50 % % It is known that
  51 % % $$\tv{\pi-\mu}=\frac{1}{2}\sum_{x\in\Bool^n}|\pi(x)-\mu(x)|.$$
  52
  53 % % Let then $M(x,\cdot)$ be the
  54 % % distribution induced by the $x$-th row of $M$. If the Markov chain
  55 % % induced by
  56 % % $M$ has a stationary distribution $\pi$, then we define
  57 % % $$d(t)=\max_{x\in\Bool^n}\tv{M^t(x,\cdot)-\pi}.$$
  58 % Intuitively $d(t)$ is the largest deviation between
  59 % the distribution $\pi$ and $M^t(x,\cdot)$, which
  60 % is the result of iterating $t$ times the function.
  61 % Finally, let $\varepsilon$ be a positive number, the \emph{mixing time}
  62 % with respect to $\varepsilon$ is given by
  63 % $$t_{\rm mix}(\varepsilon)=\min\{t \mid d(t)\leq \varepsilon\}.$$
  64 % It defines the smallest iteration number
  65 % that is sufficient to obtain a deviation lesser than $\varepsilon$.
  66 % Notice that the upper and lower bounds of mixing times cannot
  67 % directly be computed with eigenvalues formulae as expressed
  68 % in~\cite[Chap. 12]{LevinPeresWilmer2006}. The authors of this latter work
  69 % only consider reversible Markov matrices whereas we do no restrict our
  70 % matrices to such a form.
  71
  72
  73
  74
  75
  76
  77
  78 This section considers functions $f: \Bool^{\mathsf{N}} \rightarrow \Bool^{\mathsf{N}} $
  79 issued from an hypercube where an Hamiltonian path has been removed.
  80 A specific random walk in this modified hypercube is first
  81 introduced. We further detail
  82 a theoretical study on the length of the path
  83 which is sufficient to follow to get a uniform distribution.
  84
  85
  86
  87
  88
  89 First of all, let $\pi$, $\mu$ be two distributions on $\Bool^{\mathsf{N}}$. The total
  90 variation distance between $\pi$ and $\mu$ is denoted $\tv{\pi-\mu}$ and is
  91 defined by
  92 $$\tv{\pi-\mu}=\max_{A\subset \Bool^{\mathsf{N}}} |\pi(A)-\mu(A)|.$$ It is known that
  93 $$\tv{\pi-\mu}=\frac{1}{2}\sum_{X\in\Bool^{\mathsf{N}}}|\pi(X)-\mu(X)|.$$ Moreover, if
  94 $\nu$ is a distribution on $\Bool^{\mathsf{N}}$, one has
  95 $$\tv{\pi-\mu}\leq \tv{\pi-\nu}+\tv{\nu-\mu}$$
  96
  97 Let $P$ be the matrix of a Markov chain on $\Bool^{\mathsf{N}}$. $P(X,\cdot)$ is the
  98 distribution induced by the $X$-th row of $P$. If the Markov chain induced by
  99 $P$ has a stationary distribution $\pi$, then we define
 100 $$d(t)=\max_{X\in\Bool^{\mathsf{N}}}\tv{P^t(X,\cdot)-\pi}.$$
 101
 102 and
 103
 104 $$t_{\rm mix}(\varepsilon)=\min\{t \mid d(t)\leq \varepsilon\}.$$
 105 % One can prove that
 106
 107 % $$t_{\rm mix}(\varepsilon)\leq \lceil\log_2(\varepsilon^{-1})\rceil t_{\rm mix}(\frac{1}{4})$$
 108
 109
 110
 111
 112 % It is known that $d(t+1)\leq d(t)$. \JFC{references ? Cela a-t-il
 113 % un intérêt dans la preuve ensuite.}
 114
 115
 116
 117 %and
 118 % $$t_{\rm mix}(\varepsilon)=\min\{t \mid d(t)\leq \varepsilon\}.$$
 119 % One can prove that \JFc{Ou cela a-t-il été fait?}
 120 % $$t_{\rm mix}(\varepsilon)\leq \lceil\log_2(\varepsilon^{-1})\rceil t_{\rm mix}(\frac{1}{4})$$
 121
 122
 123
 124 Let $(X_t)_{t\in \mathbb{N}}$ be a sequence of $\Bool^{\mathsf{N}}$ valued random
 125 variables. A $\mathbb{N}$-valued random variable $\tau$ is a {\it stopping
 126   time} for the sequence $(X_i)$ if for each $t$ there exists $B_t\subseteq
 127 (\Bool^{\mathsf{N}})^{t+1}$ such that $\{\tau=t\}=\{(X_0,X_1,\ldots,X_t)\in B_t\}$.
 128 In other words, the event $\{\tau = t \}$ only depends on the values of
 129 $(X_0,X_1,\ldots,X_t)$, not on $X_k$ with $k > t$.
 130
 131
 132 Let $(X_t)_{t\in \mathbb{N}}$ be a Markov chain and $f(X_{t-1},Z_t)$ a
 133 random mapping representation of the Markov chain. A {\it randomized
 134   stopping time} for the Markov chain is a stopping time for
 135 $(Z_t)_{t\in\mathbb{N}}$. If the Markov chain is irreducible and has $\pi$
 136 as stationary distribution, then a {\it stationary time} $\tau$ is a
 137 randomized stopping time (possibly depending on the starting position $X$),
 138 such that  the distribution of $X_\tau$ is $\pi$:
 139 $$\P_X(X_\tau=Y)=\pi(Y).$$
 140
 141
 142 \begin{Theo}
 143 If $\tau$ is a strong stationary time, then $d(t)\leq \max_{X\in\Bool^{\mathsf{N}}}
 144 \P_X(\tau > t)$.
 145 \end{Theo}
 146
 147
 148 %Let $\Bool^n$ be the set of words of length $n$.
 149 Let $E=\{(X,Y)\mid
 150 X\in \Bool^{\mathsf{N}}, Y\in \Bool^{\mathsf{N}},\ X=Y \text{ or } X\oplus Y \in 0^*10^*\}$.
 151 In other words, $E$ is the set of all the edges in the classical
 152 ${\mathsf{N}}$-cube.
 153 Let $h$ be a function from $\Bool^{\mathsf{N}}$ into $\llbracket 1, {\mathsf{N}} \rrbracket$.
 154 Intuitively speaking $h$ aims at memorizing for each node
 155 $X \in \Bool^{\mathsf{N}}$ which edge is removed in the Hamiltonian cycle,
 156 \textit{i.e.} which bit in $\llbracket 1, {\mathsf{N}} \rrbracket$
 157 cannot be switched.
 158
 159
 160
 161 We denote by $E_h$ the set $E\setminus\{(X,Y)\mid X\oplus Y =
 162 0^{{\mathsf{N}}-h(X)}10^{h(X)-1}\}$. This is the set of the modified hypercube,
 163 \textit{i.e.}, the ${\mathsf{N}}$-cube where the Hamiltonian cycle $h$
 164 has been removed.
 165
 166 We define the Markov matrix $P_h$ for each line $X$ and
 167 each column $Y$  as follows:
 168 $$\left\{
 169 \begin{array}{ll}
 170 P_h(X,X)=\frac{1}{{\mathsf{N}}} & \\
 171 P_h(X,Y)=0 & \textrm{if  $(X,Y)\notin E_h$}\\
 172 P_h(X,Y)=\frac{1}{{\mathsf{N}}} & \textrm{if $X\neq Y$ and $(X,Y) \in E_h$}
 173 \end{array}
 174 \right.
 175 $$
 176
 177 We denote by $\ov{h} : \Bool^{\mathsf{N}} \rightarrow \Bool^{\mathsf{N}}$ the function
 178 such that for any $X \in \Bool^{\mathsf{N}} $,
 179 $(X,\ov{h}(X)) \in E$ and $X\oplus\ov{h}(X)=0^{{\mathsf{N}}-h(X)}10^{h(X)-1}$.
 180 The function $\ov{h}$ is said {\it square-free} if for every $X\in \Bool^{\mathsf{N}}$,
 181 $\ov{h}(\ov{h}(X))\neq X$.
 182
 183 \begin{Lemma}\label{lm:h}
 184 If $\ov{h}$ is bijective and square-free, then $h(\ov{h}^{-1}(X))\neq h(X)$.
 185 \end{Lemma}
 186
 187 \begin{proof}
 188 Let $\ov{h}$ be bijective.
 189 Let $k\in \llbracket 1, {\mathsf{N}} \rrbracket$ s.t. $h(\ov{h}^{-1}(X))=k$.
 190 Then $(\ov{h}^{-1}(X),X)$ belongs to $E$ and
 191 $\ov{h}^{-1}(X)\oplus X = 0^{{\mathsf{N}}-k}10^{k-1}$.
 192 Let us suppose $h(X) = h(\ov{h}^{-1}(X))$. In such a case, $h(X) =k$.
 193 By definition of $\ov{h}$, $(X, \ov{h}(X)) \in E $ and
 194 $X\oplus\ov{h}(X)=0^{{\mathsf{N}}-h(X)}10^{h(X)-1} = 0^{{\mathsf{N}}-k}10^{k-1}$.
 195 Thus $\ov{h}(X)= \ov{h}^{-1}(X)$, which leads to $\ov{h}(\ov{h}(X))= X$.
 196 This contradicts the square-freeness of $\ov{h}$.
 197 \end{proof}
 198
 199 Let $Z$ be a random variable that is uniformly distributed over
 200 $\llbracket 1, {\mathsf{N}}$.
 201 For $X\in \Bool^{\mathsf{N}}$, we
 202 define, with $Z=i$,
 203 $$
 204 \left\{
 205 \begin{array}{ll}
 206 %f(X,Z)=X\oplus (0^{{\mathsf{N}}-i}10^{i-1}) & \text{if } b=1 \text{ and } i\neq h(X),\\
 207 f(X,Z)=X\oplus (0^{{\mathsf{N}}-i}10^{i-1}) & \text{if $i\neq h(X)$},\\
 208 f(X,Z)=X& \text{otherwise.}
 209 \end{array}\right.
 210 $$
 211
 212 The Markov chain is thus defined as
 213 $$
 214 X_t= f(X_{t-1},Z_t)
 215 $$
 216
 217
 218
 219 %%%%%%%%%%%%%%%%%%%%%%%%%%%ù
 220 %\section{Stopping time}
 221
 222 An integer $\ell\in \llbracket 1,{\mathsf{N}} \rrbracket$ is said {\it fair}
 223 at time $t$ if there
 224 exists $0\leq j <t$ such that $Z_{j+1}=\ell$ and $h(X_j)\neq \ell$.
 225 In other words, there exist a date $j$ before $t$ where
 226 the random variable $Z$ is  $l$
 227 (\textit{i.e.}, $l$ is the strategy at date $j$)
 228 and where the configuration $X_j$ allows to traverse the edge $l$.
 229
 230 Let $\ts$ be the first time all the elements of $\llbracket 1, {\mathsf{N}} \rrbracket$
 231 are fair. The integer $\ts$ is a randomized stopping time for
 232 the Markov chain $(X_t)$.
 233
 234
 235 \begin{Lemma}
 236 The integer $\ts$ is a strong stationary time.
 237 \end{Lemma}
 238
 239 \begin{proof}
 240 Let $\tau_\ell$ be the first time that $\ell$ is fair. The random variable
 241 $Z_{\tau_\ell}$ is of the form $\ell$ %with $\delta\in\{0,1\}$ and
 242 % such that
 243 % $b=1$ with probability $\frac{1}{2}$ and $b=0$ with probability
 244 % $\frac{1}{2}$.
 245 Since $h(X_{\tau_\ell-1})\neq\ell$ the value of the $\ell$-th
 246 bit of $X_{\tau_\ell}$
 247 is $0$ or $1$ with the same probability ($\frac{1}{2}$).
 248
 249  Moving next in the chain, at each step,
 250 the $l$-th bit  is switched from $0$ to $1$ or from $1$ to $0$ each time with
 251 the same probability. Therefore,  for $t\geq \tau_\ell$, the
 252 $\ell$-th bit of $X_t$ is $0$ or $1$ with the same probability, proving the
 253 lemma.\end{proof}
 254
 255 \begin{Theo} \label{prop:stop}
 256 If $\ov{h}$ is bijective and square-free, then
 257 $E[\ts]\leq {\mathsf{N}}^2+ (\mathsf{N}+2)(\ln(\mathsf{N})+2)$.
 258 \end{Theo}
 259
 260 For each $X\in \Bool^{\mathsf{N}}$ and $\ell\in\llbracket 1,{\mathsf{N}}\rrbracket$,
 261 let $S_{X,\ell}$ be the
 262 random variable that counts the number of steps
 263 from $X$ until we reach a configuration where
 264 $\ell$ is fair. More formally
 265 $$S_{X,\ell}=\min \{t \geq 1\mid h(X_{t-1})\neq \ell\text{ and }Z_t=\ell \text{ and } X_0=X\}.$$
 266
 267  We denote by
 268 $$\lambda_h=\max_{X,\ell} S_{X,\ell}.$$
 269
 270
 271 \begin{Lemma}\label{prop:lambda}
 272 If $\ov{h}$ is a square-free bijective function, then the inequality
 273 $E[\lambda_h]\leq 2{\mathsf{N}}^2$ is established.
 274
 275 \end{Lemma}
 276
 277 \begin{proof}
 278 For every $X$, every $\ell$, one has $\P(S_{X,\ell}\leq 2)\geq
 279 \frac{1}{{\mathsf{N}}^2}$.
 280 Let $X_0= X$.
 281 Indeed,
 282 \begin{itemize}
 283 \item if $h(X)\neq \ell$, then
 284 $\P(S_{X,\ell}=1)=\frac{1}{{\mathsf{N}}}\geq \frac{1}{{\mathsf{N}}^2}$.
 285 \item otherwise, $h(X)=\ell$, then
 286 $\P(S_{X,\ell}=1)=0$.
 287 But in this case, intutively, it is possible to move
 288 from $X$ to $\ov{h}^{-1}(X)$ (with probability $\frac{1}{N}$). And in
 289 $\ov{h}^{-1}(X)$ the $l$-th bit can be switched.
 290 More formally,
 291 since $\ov{h}$ is square-free,
 292 $\ov{h}(X)=\ov{h}(\ov{h}(\ov{h}^{-1}(X)))\neq \ov{h}^{-1}(X)$. It follows
 293 that $(X,\ov{h}^{-1}(X))\in E_h$. We thus have
 294 $P(X_1=\ov{h}^{-1}(X))=\frac{1}{{\mathsf{N}}}$. Now, by Lemma~\ref{lm:h},
 295 $h(\ov{h}^{-1}(X))\neq h(X)$. Therefore $\P(S_{x,\ell}=2\mid
 296 X_1=\ov{h}^{-1}(X))=\frac{1}{{\mathsf{N}}}$, proving that $\P(S_{x,\ell}\leq 2)\geq
 297 \frac{1}{{\mathsf{N}}^2}$.
 298 \end{itemize}
 299
 300
 301
 302
 303 Therefore, $\P(S_{X,\ell}\geq 3)\leq 1-\frac{1}{{\mathsf{N}}^2}$. By induction, one
 304 has, for every $i$, $\P(S_{X,\ell}\geq 2i)\leq
 305 \left(1-\frac{1}{{\mathsf{N}}^2}\right)^i$.
 306  Moreover,
 307 since $S_{X,\ell}$ is positive, it is known~\cite[lemma 2.9]{proba}, that
 308 $$E[S_{X,\ell}]=\sum_{i=1}^{+\infty}\P(S_{X,\ell}\geq i).$$
 309 Since $\P(S_{X,\ell}\geq i)\geq \P(S_{X,\ell}\geq i+1)$, one has
 310 $$E[S_{X,\ell}]=\sum_{i=1}^{+\infty}\P(S_{X,\ell}\geq i)\leq
 311 \P(S_{X,\ell}\geq 1)+\P(S_{X,\ell}\geq 2)+2 \sum_{i=1}^{+\infty}\P(S_{X,\ell}\geq 2i).$$
 312 Consequently,
 313 $$E[S_{X,\ell}]\leq 1+1+2
 314 \sum_{i=1}^{+\infty}\left(1-\frac{1}{{\mathsf{N}}^2}\right)^i=2+2({\mathsf{N}}^2-1)=2{\mathsf{N}}^2,$$
 315 which concludes the proof.
 316 \end{proof}
 317
 318 Let $\ts^\prime$ be the first time that there are exactly ${\mathsf{N}}-1$ fair
 319 elements.
 320
 321 \begin{Lemma}\label{lm:stopprime}
 322 One has $E[\ts^\prime]\leq (\mathsf{N}+2)(\ln(\mathsf{N})+2)$.
 323 \end{Lemma}
 324
 325 \begin{proof}
 326 This is a classical  Coupon Collector's like problem. Let $W_i$
 327 be the time to obtain the $i$-th fair bit
 328 after $i-1$ fair bits have been obtained.
 329 One has $\ts^\prime=\sum_{i=1}^{{\mathsf{N}}}W_i$.
 330
 331 At position $X$ with $i-1$ fair bits,
 332 we  do not obtain a new fair if $Z$ is one of the $i-1$ already fair bits
 333 or if $Z$ is a new fair bit but $h(X)$ is $Z$.
 334 This occures with probability
 335 $p
 336 = \frac{i-1}{{\mathsf{N}}} + \frac{n-i+1}{\mathsf{N}}.\frac{1}{\mathsf{N}}
 337 =\frac{i(\mathsf{N}-1) +1}{\mathsf{N^2}}
 338 $.
 339 The random variable $W_i$ has a geometric distribution
 340 \textit{i.e.}, $P(W_i = k) = p^{k-1}.(1-p)$ and
 341 $E(W_i) = \frac{\mathsf{N^2}}{i(\mathsf{N}-1) +1}$.
 342 Therefore
 343 $$E[\ts^\prime]=\sum_{i=1}^{{\mathsf{N}}}E[W_i]
 344 =\frac{\mathsf{N^2}}{\mathsf{N}(\mathsf{N}-1) +1}  + \sum_{i=1}^{{\mathsf{N}}-1}E[W_i].$$
 345
 346 A simple study of the function $\mathsf{N} \mapsto \frac{\mathsf{N^2}}{\mathsf{N}(\mathsf{N}-1) +1}$ shows that it is bounded by $\frac{4}{3} \leq 2$.
 347 For the second term, we successively have
 348 $$
 349 \sum_{i=1}^{{\mathsf{N}}-1}E[W_i]
 350 = \mathsf{N}^2\sum_{i=1}^{{\mathsf{N}}-1} \frac{1}{i(\mathsf{N}-1) +1}
 351 \leq \mathsf{N}^2\sum_{i=1}^{{\mathsf{N}}-1} \frac{1}{i(\mathsf{N}-1)}
 352 \leq \frac{\mathsf{N}^2}{\mathsf{N}-1}\sum_{i=1}^{{\mathsf{N}}-1} \frac{1}{i}
 353 \leq (\mathsf{N}+2)\sum_{i=1}^{{\mathsf{N}}-1} \frac{1}{i}
 354 $$
 355
 356
 357 It is well known that
 358 $\sum_{i=1}^{{\mathsf{N}}-1}\frac{1}{i}\leq 1+\ln({\mathsf{N}}-1)$.
 359 It follows that
 360 $2+(\mathsf{N}+2)\sum_{i=1}^{{\mathsf{N}}-1}\frac{1}{i}
 361 \leq
 362 2+(\mathsf{N}+2)(\ln(\mathsf{N}-1)+1)
 363 \leq
 364 (\mathsf{N}+2)(\ln(\mathsf{N})+2)$.
 365 \end{proof}
 366
 367 One can now prove Theorem~\ref{prop:stop}.
 368
 369 \begin{proof}
 370 One has $\ts\leq \ts^\prime+\lambda_h$. Therefore,
 371 Theorem~\ref{prop:stop} is a direct application of
 372 lemma~\ref{prop:lambda} and~\ref{lm:stopprime}.
 373 \end{proof}
 374
 375 Notice that the calculus of the stationary time upper bound is obtained
 376 under the following constraint: for each vertex in the $\mathsf{N}$-cube
 377 there are one ongoing arc and one outgoing arc that are removed.
 378 The calculus does not consider (balanced) hamiltonian cycles, which
 379 are more regular and more binding than this constraint.
 380 In this later context, we claim that the upper bound for the stopping time
 381 should be reduced.