stopping.tex

   1
   2
   3
   4 Let thus be given such kind of map.
   5 This article focuses on studying its iterations according to
   6 the equation~(\ref{eq:asyn}) with a given strategy.
   7 First of all, this can be interpreted as walking into its iteration graph
   8 where the choice of the edge to follow is decided by the strategy.
   9 Notice that the iteration graph is always a subgraph of
  10 ${\mathsf{N}}$-cube augmented with all the self-loop, \textit{i.e.}, all the
  11 edges $(v,v)$ for any $v \in \Bool^{\mathsf{N}}$.
  12 Next, if we add probabilities on the transition graph, iterations can be
  13 interpreted as Markov chains.
  14
  15 \begin{xpl}
  16 Let us consider for instance
  17 the graph $\Gamma(f)$ defined
  18 in \textsc{Figure~\ref{fig:iteration:f*}.} and
  19 the probability function $p$ defined on the set of edges as follows:
  20 $$
  21 p(e) \left\{
  22 \begin{array}{ll}
  23 = \frac{2}{3} \textrm{ if $e=(v,v)$ with $v \in \Bool^3$,}\\
  24 = \frac{1}{6} \textrm{ otherwise.}
  25 \end{array}
  26 \right.
  27 $$
  28 The matrix $P$ of the Markov chain associated to the function $f^*$ and to its probability function $p$ is
  29 \[
  30 P=\dfrac{1}{6} \left(
  31 \begin{array}{llllllll}
  32 4&1&1&0&0&0&0&0 \\
  33 1&4&0&0&0&1&0&0 \\
  34 0&0&4&1&0&0&1&0 \\
  35 0&1&1&4&0&0&0&0 \\
  36 1&0&0&0&4&0&1&0 \\
  37 0&0&0&0&1&4&0&1 \\
  38 0&0&0&0&1&0&4&1 \\
  39 0&0&0&1&0&1&0&4
  40 \end{array}
  41 \right)
  42 \]
  43 \end{xpl}
  44
  45
  46 % % Let us first recall the  \emph{Total Variation} distance $\tv{\pi-\mu}$,
  47 % % which is defined for two distributions $\pi$ and $\mu$ on the same set
  48 % % $\Bool^n$  by:
  49 % % $$\tv{\pi-\mu}=\max_{A\subset \Bool^n} |\pi(A)-\mu(A)|.$$
  50 % % It is known that
  51 % % $$\tv{\pi-\mu}=\frac{1}{2}\sum_{x\in\Bool^n}|\pi(x)-\mu(x)|.$$
  52
  53 % % Let then $M(x,\cdot)$ be the
  54 % % distribution induced by the $x$-th row of $M$. If the Markov chain
  55 % % induced by
  56 % % $M$ has a stationary distribution $\pi$, then we define
  57 % % $$d(t)=\max_{x\in\Bool^n}\tv{M^t(x,\cdot)-\pi}.$$
  58 % Intuitively $d(t)$ is the largest deviation between
  59 % the distribution $\pi$ and $M^t(x,\cdot)$, which
  60 % is the result of iterating $t$ times the function.
  61 % Finally, let $\varepsilon$ be a positive number, the \emph{mixing time}
  62 % with respect to $\varepsilon$ is given by
  63 % $$t_{\rm mix}(\varepsilon)=\min\{t \mid d(t)\leq \varepsilon\}.$$
  64 % It defines the smallest iteration number
  65 % that is sufficient to obtain a deviation lesser than $\varepsilon$.
  66 % Notice that the upper and lower bounds of mixing times cannot
  67 % directly be computed with eigenvalues formulae as expressed
  68 % in~\cite[Chap. 12]{LevinPeresWilmer2006}. The authors of this latter work
  69 % only consider reversible Markov matrices whereas we do no restrict our
  70 % matrices to such a form.
  71
  72
  73
  74
  75
  76
  77
  78 This section considers functions $f: \Bool^{\mathsf{N}} \rightarrow \Bool^{\mathsf{N}} $
  79 issued from an hypercube where an Hamiltonian path has been removed.
  80 A specific random walk in this modified hypercube is first
  81 introduced. We further detail
  82 a theoretical study on the length of the path
  83 which is sufficient to follow to get a uniform distribution.
  84 Notice that for a general references on Markov chains
  85 see~\cite{LevinPeresWilmer2006}
  86 , and particularly Chapter~5 on stopping times.
  87
  88
  89
  90
  91 First of all, let $\pi$, $\mu$ be two distributions on $\Bool^{\mathsf{N}}$. The total
  92 variation distance between $\pi$ and $\mu$ is denoted $\tv{\pi-\mu}$ and is
  93 defined by
  94 $$\tv{\pi-\mu}=\max_{A\subset \Bool^{\mathsf{N}}} |\pi(A)-\mu(A)|.$$ It is known that
  95 $$\tv{\pi-\mu}=\frac{1}{2}\sum_{X\in\Bool^{\mathsf{N}}}|\pi(X)-\mu(X)|.$$ Moreover, if
  96 $\nu$ is a distribution on $\Bool^{\mathsf{N}}$, one has
  97 $$\tv{\pi-\mu}\leq \tv{\pi-\nu}+\tv{\nu-\mu}$$
  98
  99 Let $P$ be the matrix of a Markov chain on $\Bool^{\mathsf{N}}$. $P(X,\cdot)$ is the
 100 distribution induced by the $X$-th row of $P$. If the Markov chain induced by
 101 $P$ has a stationary distribution $\pi$, then we define
 102 $$d(t)=\max_{X\in\Bool^{\mathsf{N}}}\tv{P^t(X,\cdot)-\pi}.$$
 103
 104 and
 105
 106 $$t_{\rm mix}(\varepsilon)=\min\{t \mid d(t)\leq \varepsilon\}.$$
 107 One can prove that
 108
 109 $$t_{\rm mix}(\varepsilon)\leq \lceil\log_2(\varepsilon^{-1})\rceil t_{\rm mix}(\frac{1}{4})$$
 110
 111
 112
 113
 114 % It is known that $d(t+1)\leq d(t)$. \JFC{references ? Cela a-t-il
 115 % un intérêt dans la preuve ensuite.}
 116
 117
 118
 119 %and
 120 % $$t_{\rm mix}(\varepsilon)=\min\{t \mid d(t)\leq \varepsilon\}.$$
 121 % One can prove that \JFc{Ou cela a-t-il été fait?}
 122 % $$t_{\rm mix}(\varepsilon)\leq \lceil\log_2(\varepsilon^{-1})\rceil t_{\rm mix}(\frac{1}{4})$$
 123
 124
 125
 126 Let $(X_t)_{t\in \mathbb{N}}$ be a sequence of $\Bool^{\mathsf{N}}$ valued random
 127 variables. A $\mathbb{N}$-valued random variable $\tau$ is a {\it stopping
 128   time} for the sequence $(X_i)$ if for each $t$ there exists $B_t\subseteq
 129 (\Bool^{\mathsf{N}})^{t+1}$ such that $\{\tau=t\}=\{(X_0,X_1,\ldots,X_t)\in B_t\}$.
 130 In other words, the event $\{\tau = t \}$ only depends on the values of
 131 $(X_0,X_1,\ldots,X_t)$, not on $X_k$ with $k > t$.
 132
 133
 134 Let $(X_t)_{t\in \mathbb{N}}$ be a Markov chain and $f(X_{t-1},Z_t)$ a
 135 random mapping representation of the Markov chain. A {\it randomized
 136   stopping time} for the Markov chain is a stopping time for
 137 $(Z_t)_{t\in\mathbb{N}}$. If the Markov chain is irreducible and has $\pi$
 138 as stationary distribution, then a {\it stationary time} $\tau$ is a
 139 randomized stopping time (possibly depending on the starting position $X$),
 140 such that  the distribution of $X_\tau$ is $\pi$:
 141 $$\P_X(X_\tau=Y)=\pi(Y).$$
 142
 143 A stopping time $\tau$ is a {\emph strong stationary time} if $X_{\tau}$ is
 144 independent of $\tau$.
 145
 146
 147 \begin{thrm}\label{thm-sst}
 148 If $\tau$ is a strong stationary time, then $d(t)\leq \max_{X\in\Bool^{\mathsf{N}}}
 149 \P_X(\tau > t)$.
 150 \end{thrm}
 151
 152
 153 %Let $\Bool^n$ be the set of words of length $n$.
 154 Let $E=\{(X,Y)\mid
 155 X\in \Bool^{\mathsf{N}}, Y\in \Bool^{\mathsf{N}},\ X=Y \text{ or } X\oplus Y \in 0^*10^*\}$.
 156 In other words, $E$ is the set of all the edges in the classical
 157 ${\mathsf{N}}$-cube.
 158 Let $h$ be a function from $\Bool^{\mathsf{N}}$ into $\llbracket 1, {\mathsf{N}} \rrbracket$.
 159 Intuitively speaking $h$ aims at memorizing for each node
 160 $X \in \Bool^{\mathsf{N}}$ which edge is removed in the Hamiltonian cycle,
 161 \textit{i.e.} which bit in $\llbracket 1, {\mathsf{N}} \rrbracket$
 162 cannot be switched.
 163
 164
 165
 166 We denote by $E_h$ the set $E\setminus\{(X,Y)\mid X\oplus Y =
 167 0^{{\mathsf{N}}-h(X)}10^{h(X)-1}\}$. This is the set of the modified hypercube,
 168 \textit{i.e.}, the ${\mathsf{N}}$-cube where the Hamiltonian cycle $h$
 169 has been removed.
 170
 171 We define the Markov matrix $P_h$ for each line $X$ and
 172 each column $Y$  as follows:
 173 \begin{equation}
 174 \left\{
 175 \begin{array}{ll}
 176 P_h(X,X)=\frac{1}{2}+\frac{1}{2{\mathsf{N}}} & \\
 177 P_h(X,Y)=0 & \textrm{if  $(X,Y)\notin E_h$}\\
 178 P_h(X,Y)=\frac{1}{2{\mathsf{N}}} & \textrm{if $X\neq Y$ and $(X,Y) \in E_h$}
 179 \end{array}
 180 \right.
 181 \label{eq:Markov:rairo}
 182 \end{equation}
 183
 184 We denote by $\ov{h} : \Bool^{\mathsf{N}} \rightarrow \Bool^{\mathsf{N}}$ the function
 185 such that for any $X \in \Bool^{\mathsf{N}} $,
 186 $(X,\ov{h}(X)) \in E$ and $X\oplus\ov{h}(X)=0^{{\mathsf{N}}-h(X)}10^{h(X)-1}$.
 187 The function $\ov{h}$ is said {\it square-free} if for every $X\in \Bool^{\mathsf{N}}$,
 188 $\ov{h}(\ov{h}(X))\neq X$.
 189
 190 \begin{lmm}\label{lm:h}
 191 If $\ov{h}$ is bijective and square-free, then $h(\ov{h}^{-1}(X))\neq h(X)$.
 192 \end{lmm}
 193
 194 \begin{proof}
 195 Let $\ov{h}$ be bijective.
 196 Let $k\in \llbracket 1, {\mathsf{N}} \rrbracket$ s.t. $h(\ov{h}^{-1}(X))=k$.
 197 Then $(\ov{h}^{-1}(X),X)$ belongs to $E$ and
 198 $\ov{h}^{-1}(X)\oplus X = 0^{{\mathsf{N}}-k}10^{k-1}$.
 199 Let us suppose $h(X) = h(\ov{h}^{-1}(X))$. In such a case, $h(X) =k$.
 200 By definition of $\ov{h}$, $(X, \ov{h}(X)) \in E $ and
 201 $X\oplus\ov{h}(X)=0^{{\mathsf{N}}-h(X)}10^{h(X)-1} = 0^{{\mathsf{N}}-k}10^{k-1}$.
 202 Thus $\ov{h}(X)= \ov{h}^{-1}(X)$, which leads to $\ov{h}(\ov{h}(X))= X$.
 203 This contradicts the square-freeness of $\ov{h}$.
 204 \end{proof}
 205
 206 Let $Z$ be a random variable that is uniformly distributed over
 207 $\llbracket 1, {\mathsf{N}} \rrbracket \times \Bool$.
 208 For $X\in \Bool^{\mathsf{N}}$, we
 209 define, with $Z=(i,b)$,
 210 $$
 211 \left\{
 212 \begin{array}{ll}
 213 f(X,Z)=X\oplus (0^{{\mathsf{N}}-i}10^{i-1}) & \text{if } b=1 \text{ and } i\neq h(X),\\
 214 f(X,Z)=X& \text{otherwise.}
 215 \end{array}\right.
 216 $$
 217
 218 The Markov chain is thus defined as
 219 $$
 220 X_t= f(X_{t-1},Z_t)
 221 $$
 222
 223
 224
 225 %%%%%%%%%%%%%%%%%%%%%%%%%%%ù
 226 %\section{Stopping time}
 227
 228 An integer $\ell\in \llbracket 1,{\mathsf{N}} \rrbracket$ is said {\it fair}
 229 at time $t$ if there
 230 exists $0\leq j <t$ such that $Z_{j+1}=(\ell,\cdot)$ and $h(X_j)\neq \ell$.
 231 In other words, there exist a date $j$ before $t$ where
 232 the first element of the random variable $Z$ is exactly $l$
 233 (\textit{i.e.}, $l$ is the strategy at date $j$)
 234 and where the configuration $X_j$ allows to traverse the edge $l$.
 235
 236 Let $\ts$ be the first time all the elements of $\llbracket 1, {\mathsf{N}} \rrbracket$
 237 are fair. The integer $\ts$ is a randomized stopping time for
 238 the Markov chain $(X_t)$.
 239
 240
 241 \begin{lmm}
 242 The integer $\ts$ is a strong stationary time.
 243 \end{lmm}
 244
 245 \begin{proof}
 246 Let $\tau_\ell$ be the first time that $\ell$ is fair. The random variable
 247 $Z_{\tau_\ell}$ is of the form $(\ell,b)$ %with $\delta\in\{0,1\}$ and
 248 such that
 249 $b=1$ with probability $\frac{1}{2}$ and $b=0$ with probability
 250 $\frac{1}{2}$. Since $h(X_{\tau_\ell-1})\neq\ell$ the value of the $\ell$-th
 251 bit of $X_{\tau_\ell}$
 252 is $0$ or $1$ with the same probability ($\frac{1}{2}$).
 253
 254  Moving next in the chain, at each step,
 255 the $l$-th bit  is switched from $0$ to $1$ or from $1$ to $0$ each time with
 256 the same probability. Therefore,  for $t\geq \tau_\ell$, the
 257 $\ell$-th bit of $X_t$ is $0$ or $1$ with the same probability, proving the
 258 lemma.\end{proof}
 259
 260 \begin{thrm} \label{prop:stop}
 261 If $\ov{h}$ is bijective and square-free, then
 262 $E[\ts]\leq 8{\mathsf{N}}^2+ 4{\mathsf{N}}\ln ({\mathsf{N}}+1)$.
 263 \end{thrm}
 264
 265 For each $X\in \Bool^{\mathsf{N}}$ and $\ell\in\llbracket 1,{\mathsf{N}}\rrbracket$,
 266 let $S_{X,\ell}$ be the
 267 random variable that counts the number of steps
 268 from $X$ until we reach a configuration where
 269 $\ell$ is fair. More formally
 270 $$S_{X,\ell}=\min \{t \geq 1\mid h(X_{t-1})\neq \ell\text{ and }Z_t=(\ell,.)\text{ and } X_0=X\}.$$
 271
 272 %  We denote by
 273 % $$\lambda_h=\max_{X,\ell} S_{X,\ell}.$$
 274
 275
 276 \begin{lmm}\label{prop:lambda}
 277 Let $\ov{h}$ is a square-free bijective function. Then
 278 for all $X$ and
 279 all $\ell$,
 280 the inequality
 281 $E[S_{X,\ell}]\leq 8{\mathsf{N}}^2$ is established.
 282 \end{lmm}
 283
 284 \begin{proof}
 285 For every $X$, every $\ell$, one has $\P(S_{X,\ell})\leq 2)\geq
 286 \frac{1}{4{\mathsf{N}}^2}$.
 287 Let $X_0= X$.
 288 Indeed,
 289 \begin{itemize}
 290 \item if $h(X)\neq \ell$, then
 291 $\P(S_{X,\ell}=1)=\frac{1}{2{\mathsf{N}}}\geq \frac{1}{4{\mathsf{N}}^2}$.
 292 \item otherwise, $h(X)=\ell$, then
 293 $\P(S_{X,\ell}=1)=0$.
 294 But in this case, intuitively, it is possible to move
 295 from $X$ to $\ov{h}^{-1}(X)$ (with probability $\frac{1}{2N}$). And in
 296 $\ov{h}^{-1}(X)$ the $l$-th bit can be switched.
 297 More formally,
 298 since $\ov{h}$ is square-free,
 299 $\ov{h}(X)=\ov{h}(\ov{h}(\ov{h}^{-1}(X)))\neq \ov{h}^{-1}(X)$. It follows
 300 that $(X,\ov{h}^{-1}(X))\in E_h$. We thus have
 301 $P(X_1=\ov{h}^{-1}(X))=\frac{1}{2{\mathsf{N}}}$. Now, by Lemma~\ref{lm:h},
 302 $h(\ov{h}^{-1}(X))\neq h(X)$. Therefore $\P(S_{x,\ell}=2\mid
 303 X_1=\ov{h}^{-1}(X))=\frac{1}{2{\mathsf{N}}}$, proving that $\P(S_{x,\ell}\leq 2)\geq
 304 \frac{1}{4{\mathsf{N}}^2}$.
 305 \end{itemize}
 306
 307
 308
 309
 310 Therefore, $\P(S_{X,\ell}\geq 3)\leq 1-\frac{1}{4{\mathsf{N}}^2}$. By induction, one
 311 has, for every $i$, $\P(S_{X,\ell}\geq 2i)\leq
 312 \left(1-\frac{1}{4{\mathsf{N}}^2}\right)^i$.
 313  Moreover,
 314 since $S_{X,\ell}$ is positive, it is known~\cite[lemma 2.9]{proba}, that
 315 $$E[S_{X,\ell}]=\sum_{i=1}^{+\infty}\P(S_{X,\ell}\geq i).$$
 316 Since $\P(S_{X,\ell}\geq i)\geq \P(S_{X,\ell}\geq i+1)$, one has
 317 $$E[S_{X,\ell}]=\sum_{i=1}^{+\infty}\P(S_{X,\ell}\geq i)\leq
 318 \P(S_{X,\ell}\geq 1)+\P(S_{X,\ell}\geq 2)+2 \sum_{i=1}^{+\infty}\P(S_{X,\ell}\geq 2i).$$
 319 Consequently,
 320 $$E[S_{X,\ell}]\leq 1+1+2
 321 \sum_{i=1}^{+\infty}\left(1-\frac{1}{4{\mathsf{N}}^2}\right)^i=2+2(4{\mathsf{N}}^2-1)=8{\mathsf{N}}^2,$$
 322 which concludes the proof.
 323 \end{proof}
 324
 325 Let $\ts^\prime$ be the time used to get all the bits but one fair.
 326
 327 \begin{lmm}\label{lm:stopprime}
 328 One has $E[\ts^\prime]\leq 4{\mathsf{N}} \ln ({\mathsf{N}}+1).$
 329 \end{lmm}
 330
 331 \begin{proof}
 332 This is a classical  Coupon Collector's like problem. Let $W_i$ be the
 333 random variable counting the number of moves done in the Markov chain while
 334 we had exactly $i-1$ fair bits. One has $\ts^\prime=\sum_{i=1}^{{\mathsf{N}}-1}W_i$.
 335  But when we are at position $X$ with $i-1$ fair bits, the probability of
 336  obtaining a new fair bit is either $1-\frac{i-1}{{\mathsf{N}}}$ if $h(X)$ is fair,
 337  or  $1-\frac{i-2}{{\mathsf{N}}}$ if $h(X)$ is not fair.
 338
 339 Therefore,
 340 $\P (W_i=k)\leq \left(\frac{i-1}{{\mathsf{N}}}\right)^{k-1} \frac{{\mathsf{N}}-i+2}{{\mathsf{N}}}.$
 341 Consequently, we have $\P(W_i\geq k)\leq \left(\frac{i-1}{{\mathsf{N}}}\right)^{k-1} \frac{{\mathsf{N}}-i+2}{{\mathsf{N}}-i+1}.$
 342 It follows that $E[W_i]=\sum_{k=1}^{+\infty} \P (W_i\geq k)\leq {\mathsf{N}} \frac{{\mathsf{N}}-i+2}{({\mathsf{N}}-i+1)^2}\leq \frac{4{\mathsf{N}}}{{\mathsf{N}}-i+2}$.
 343
 344
 345
 346 It follows that
 347 $E[W_i]\leq \frac{4{\mathsf{N}}}{{\mathsf{N}}-i+2}$. Therefore
 348 $$E[\ts^\prime]=\sum_{i=1}^{{\mathsf{N}}-1}E[W_i]\leq
 349 4{\mathsf{N}}\sum_{i=1}^{{\mathsf{N}}-1} \frac{1}{{\mathsf{N}}-i+2}=
 350 4{\mathsf{N}}\sum_{i=3}^{{\mathsf{N}}+1}\frac{1}{i}.$$
 351
 352 But $\sum_{i=1}^{{\mathsf{N}}+1}\frac{1}{i}\leq 1+\ln({\mathsf{N}}+1)$. It follows that
 353 $1+\frac{1}{2}+\sum_{i=3}^{{\mathsf{N}}+1}\frac{1}{i}\leq 1+\ln({\mathsf{N}}+1).$
 354 Consequently,
 355 $E[\ts^\prime]\leq
 356 4{\mathsf{N}} (-\frac{1}{2}+\ln({\mathsf{N}}+1))\leq
 357 4{\mathsf{N}}\ln({\mathsf{N}}+1)$.
 358 \end{proof}
 359
 360 One can now prove Theorem~\ref{prop:stop}.
 361
 362 \begin{proof}
 363 Since $\ts^\prime$ is the time used to obtain $\mathsf{N}-1$ fair bits.
 364 Assume that the last unfair bit is $\ell$. One has
 365 $\ts=\ts^\prime+S_{X_\tau,\ell}$, and therefore $E[\ts] =
 366 E[\ts^\prime]+E[S_{X_\tau,\ell}]$. Therefore, Theorem~\ref{prop:stop} is a
 367 direct application of lemma~\ref{prop:lambda} and~\ref{lm:stopprime}.
 368 \end{proof}
 369
 370 Now using Markov Inequality, one has $\P_X(\tau > t)\leq \frac{E[\tau]}{t}$.
 371 With $t=32N^2+16N\ln (N+1)$, one obtains:  $\P_X(\tau > t)\leq \frac{1}{4}$.
 372 Therefore, using the defintion of $t_{\rm mix)}$ and
 373 Theorem~\ref{thm-sst}, it follows that
 374 $t_{\rm mix}\leq 32N^2+16N\ln (N+1)=O(N^2)$.
 375
 376
 377 Notice that the calculus of the stationary time upper bound is obtained
 378 under the following constraint: for each vertex in the $\mathsf{N}$-cube
 379 there are one ongoing arc and one outgoing arc that are removed.
 380 The calculus does not consider (balanced) Hamiltonian cycles, which
 381 are more regular and more binding than this constraint. Moreover, the bound
 382 is obtained using Markov Inequality which is frequently coarse. For the
 383 classical random walkin the  $\mathsf{N}$-cube, without removing any
 384 Hamiltonian cylce, the mixing time is in $\Theta(N\ln N)$.
 385 We conjecture that in our context, the mixing time is also in $\Theta(N\ln
 386 N)$.
 387 %In this later context, we claim that the upper bound for the stopping time
 388 %should be reduced.