update

[book_chic.git] / chapter2.tex
diff --git a/chapter2.tex b/chapter2.tex

index 41a299b87e1aaf044da70f94613f5540e344323c..b50c9dd8205e9aa12b9ba88ad3a2c4eb300c1483 100644 (file)
--- a/chapter2.tex
+++ b/chapter2.tex
@@ -384,16 +384,16 @@ The following dual numerical situation clearly illustrates this:
  \center
  \begin{tabular}{|l|c|c|c|}\hline
  \diagbox[width=4em]{$a_1$}{$b_1$}&
-  1 & 0 & marge\\ \hline
+  1 & 0 & margin\\ \hline
    1 & 96 & 4& 100 \\ \hline
    0 & 50 & 50& 100 \\ \hline
-  marge & 146 & 54& 200 \\ \hline
+  margin & 146 & 54& 200 \\ \hline
  \end{tabular} ~ ~ ~ ~ ~ ~ ~ \begin{tabular}{|l|c|c|c|}\hline
  \diagbox[width=4em]{$a_2$}{$b_2$}&
-  1 & 0 & marge\\ \hline
+  1 & 0 & margin\\ \hline
    1 & 94 & 6& 100 \\ \hline
    0 & 52 & 48& 100 \\ \hline
-  marge & 146 & 54& 200 \\ \hline
+  margin & 146 & 54& 200 \\ \hline
  \end{tabular}
  
  \caption{Numeric example of difference between implication and
@@ -1037,3 +1037,147 @@ is the conditional entropy relative to the boxes $(\neg a \wedge \neg b)$ and $(
  These entropies, with values in $[0,1]$, should therefore be simultaneously low, and the asymmetries between situations $S_1$ and $S_1'$ (resp. $S_2$ and $S_2'$) simultaneously strong, if one wishes to have a good criterion for the inclusion of $A$ in $B$.
  Indeed, entropies represent the average uncertainty of experiments that consist in observing whether $b$ occurs (resp. $\neg a$ occurs) when $a$ (resp. $\neg b$) is observed. The complement to 1 of this uncertainty therefore represents the average information gained by performing these experiments. The greater this information, the stronger the guarantee of the quality of the implication and of its contrapositive. We must now adapt this entropic numerical criterion to the model expected in the different cardinal situations.
  For the model to have the expected meaning, it must satisfy, in our opinion, the following epistemological constraints:
+
+\begin{enumerate}
+\item It shall integrate the entropy values and, to accentuate their contrast, use for example the squares of these values.
+\item As this square varies from 0 to 1, and in order to express imbalance (and therefore inclusion) in opposition to entropy, the value retained will be the complement to 1 of this square, as long as the number of counter-examples is less than half of the observations of $a$ (resp. of $\neg b$).
+  Beyond these values, since the implications no longer have an inclusive meaning, the criterion will be assigned the value 0.
+\item In order to take into account the two pieces of information specific to $a\Rightarrow b$ and $\neg b \Rightarrow \neg a$, the product of the retained values will reflect their simultaneous quality.
+The product has the property of vanishing as soon as one of its factors vanishes, i.e. as soon as this quality is lost.
+\item Finally, since the product has dimension 4 with respect to entropy, its fourth root will have the same dimension as entropy.
+\end{enumerate}
+
+Let $\alpha=\frac{n_a}{n}$ be the frequency of $a$ and $\overline{\beta}=\frac{n_{\overline{b}}}{n}$ be the frequency of $\overline{b}$.
+Let $t=\frac{n_{a \wedge \overline{b}}}{n}$ be the frequency of counter-examples. The two terms measuring the respective qualities of the implication and of its contrapositive are:
+
+\begin{eqnarray*}
+  h_1(t) = H(b\mid a) = - \left(1-\frac{t}{\alpha}\right) \log_2 \left(1-\frac{t}{\alpha}\right) - \frac{t}{\alpha} \log_2 \frac{t}{\alpha} & \mbox{ if }t \in [0,\frac{\alpha}{2}[\\
+  h_1(t) = 1 & \mbox{ if }t \in [\frac{\alpha}{2},\alpha]\\
+  h_2(t)= H(\overline{a}\mid \overline{b}) = - \left(1-\frac{t}{\overline{\beta}}\right) \log_2 \left(1-\frac{t}{\overline{\beta}}\right) - \frac{t}{\overline{\beta}} \log_2 \frac{t}{\overline{\beta}} & \mbox{ if }t \in [0,\frac{\overline{\beta}}{2}[\\
+  h_2(t)= 1 & \mbox{ if }t \in [\frac{\overline{\beta}}{2},\overline{\beta}]
+\end{eqnarray*}
+Hence the definition of the entropic criterion:
+\definition: The inclusion index of $A$, support of $a$, in $B$, support of $b$, is the number:
+$$i(a,b) = \left[ \left(1-h_1^2(t)\right) \left(1-h_2^2(t)\right) \right]^{\frac{1}{4}}$$
+
+which integrates the information provided by the occurrence of a small number of counter-examples, on the one hand to the rule $a \Rightarrow b$ and, on the other hand, to the rule $\neg b \Rightarrow \neg a$.
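+
+As a quick check of this definition (a sketch only, assuming the usual convention $0 \log_2 0 = 0$, which the text does not state explicitly), the two extreme situations give:
+$$t=0 \;\Rightarrow\; h_1(0)=h_2(0)=0 \;\Rightarrow\; i(a,b)=\left[(1-0)(1-0)\right]^{\frac{1}{4}}=1,$$
+$$t\geq\frac{\alpha}{2} \;\Rightarrow\; h_1(t)=1 \;\Rightarrow\; i(a,b)=\left[(1-1)\left(1-h_2^2(t)\right)\right]^{\frac{1}{4}}=0.$$
+The index therefore equals 1 when there are no counter-examples and vanishes as soon as the counter-examples reach half of the occurrences of $a$ (and likewise, through $h_2$, half of the occurrences of non $b$).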
+
+\subsection{The implication-inclusion index}
+
+The intensity of implication-inclusion (or entropic intensity), a new measure of inductive quality, is the number:
+
+$$\psi(a,b)= \left[  i(a,b) \cdot \varphi(a,b) \right]^{\frac{1}{2}}$$
+which integrates both statistical surprise and inclusive quality.
+
+For fixed $n_a$ and $n_b$, the function $\psi$ of the variable $t$ has the shape shown in Figure~\ref{chap2fig4}.
+Note in this figure how the behaviour of the function differs from that of the conditional probability $P(B\mid A)$, a fundamental index of other rule measurement models, for example in the work of Agrawal.
+In addition to being linear, and therefore not very nuanced, this probability leads to a measure that decreases too quickly with the first counter-examples and then resists too long when they become numerous.
+
+
+\begin{figure}[htbp]
+  \centering
+\includegraphics[scale=0.5]{chap2fig4.png}
+\caption{Example of implication-inclusion.}
+
+\label{chap2fig4}      
+\end{figure}
+
+In Figure~\ref{chap2fig4}, it can be seen that this representation of the continuous function of $t$ reflects the expected properties of the inclusion criterion:
+\begin{itemize}
+\item ``slow reaction'' to the first counter-examples (noise resistance),
+\item ``acceleration'' of the rejection of inclusion near the balance point, i.e. $t=\frac{n_a}{2n}$,
+\item rejection beyond $\frac{n_a}{2n}$, which the intensity of implication $\varphi(a,b)$ alone did not ensure.
+\end{itemize}
+
+\noindent Example 1\\
+\begin{tabular}{|c|c|c|c|}\hline
+  & $b$ & $\overline{b}$ & margin\\ \hline
+  $a$ & 200 & 400& 600 \\ \hline
+  $\overline{a}$ & 600 & 2800& 3400 \\ \hline
+  margin & 800 & 3200& 4000 \\ \hline
+\end{tabular}
+\\
+\\
+In Example 1, the intensity of implication is $\varphi(a,b)=0.9999$ (with $q(a,\overline{b})=-3.65$).
+ Here the counter-examples ($400$) exceed half of the occurrences of $a$ ($600/2=300$), so the definition above gives $h_1=1$ (and $h_2\approx 0.544$).
+ The value of the moderator coefficient is therefore $i(a,b)=0$.
+ Hence, $\psi(a,b)=0$ whereas $P(B\mid A)=0.33$.
+Thus, the ``entropic'' functions ``moderate'' the intensity of implication in this case where inclusion is poor.
+\\
+\\
+\noindent Example 2\\
+ \begin{tabular}{|c|c|c|c|}\hline
+  & $b$ & $\overline{b}$ & margin\\ \hline
+  $a$ & 400 & 200& 600 \\ \hline
+  $\overline{a}$ & 1000 & 2400& 3400 \\ \hline
+  margin & 1400 & 2600& 4000 \\ \hline
+ \end{tabular}
+ \\
+ \\
+ In Example 2, the intensity of implication is $\varphi(a,b)=1$ (with $q(a,\overline{b}) = - 8.43$).
+ The entropic values of the experiment are $h_1 = 0.918$ and $h_2 = 0.391$.
+ The value of the moderator coefficient is therefore $i(a,b) = 0.6035$.
+ As a result, $\psi(a,b) = 0.777$ whereas $P(B \mid A) = 0.6666$.
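+ As a sketch of how these values follow from the definitions of $h_1$, $h_2$, $i$ and $\psi$ (the intermediate roundings are ours), note that the counter-example ratios are $200/600=1/3$ for $a$ and $200/2600=1/13$ for non $b$, both below $1/2$, so that:
+$$h_1 = -\frac{2}{3}\log_2\frac{2}{3}-\frac{1}{3}\log_2\frac{1}{3}\approx 0.918, \qquad h_2 = -\frac{12}{13}\log_2\frac{12}{13}-\frac{1}{13}\log_2\frac{1}{13}\approx 0.391,$$
+$$i(a,b)\approx\left[(1-0.918^2)(1-0.391^2)\right]^{\frac{1}{4}}\approx\left[0.157\times 0.847\right]^{\frac{1}{4}}\approx 0.604, \qquad \psi(a,b)\approx\sqrt{0.604\times 1}\approx 0.777.$$
+ These figures agree with the stated values up to rounding.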
+ \\
+ \\
+{\bf Remark}
+ \noindent The correspondence between $\varphi(a,b)$ and $\psi(a,b)$ is not monotonic, as shown by the following example:
+ 
+\begin{tabular}{|c|c|c|c|}\hline
+  & $b$ & $\overline{b}$ & margin\\ \hline
+  $a$ & 40 & 20& 60 \\ \hline
+  $\overline{a}$ & 60 & 280& 340 \\ \hline
+  margin & 100 & 300& 400 \\ \hline
+\end{tabular}
+\\
+Thus, while $\varphi(a,b)$ decreases from Example 2 to this example, $i(a,b)$ increases, as does $\psi(a,b)$. The opposite situation is, however, the most frequent.
+Note that in both cases the conditional probability is the same.
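+To make the comparison concrete (a sketch based on the definitions of $h_1$, $h_2$ and $i$ given earlier; the rounded values are ours), this last table gives counter-example ratios of $20/60=1/3$ for $a$ and $20/300=1/15$ for non $b$, so that:
+$$h_1\approx 0.918, \qquad h_2=-\frac{14}{15}\log_2\frac{14}{15}-\frac{1}{15}\log_2\frac{1}{15}\approx 0.353,$$
+$$i(a,b)\approx\left[(1-0.918^2)(1-0.353^2)\right]^{\frac{1}{4}}\approx 0.609.$$
+This is slightly above the value $0.6035$ obtained in Example 2, while $P(B\mid A)=40/60$ coincides with the $400/600$ of Example 2.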
+\\
+\\
+{\bf Remark}
+\noindent We refer to~\cite{Lencaa} for a very detailed comparative study of association indices for binary variables.
+In particular, the classical and entropic (inclusion) intensities of implication presented in this chapter are compared there with other indices from a ``user'' point of view.
+
+\section{Implication graph}
+\subsection{Problem statement}
+
+At the end of the calculations of the intensities of implication, in both the classical and the entropic models, we have a $p \times p$ table that crosses the $p$ variables with each other, whatever their nature, and whose entries are the values of these intensities of implication, numbers in the interval $[0,~1]$.
+It must be noted that the underlying structure of all these variables is far from explicit and remains largely hidden.
+Faced with such a square table of $p^2$ entries, the user remains blind to this structure.
+He or she cannot simultaneously take in the many possible chains of rules that underlie the overall structure of the $p$ variables.
+In order to facilitate a clearer extraction of the rules and to examine their structure, we associate with this table, for a given intensity threshold, a directed graph without cycles, weighted by the intensities of implication. The user can control the complexity of its representation by setting the threshold on the implicative quality of the rules to be taken into account.
+Each arc in this graph represents a rule: if $n_a < n_b$, the arc $a \rightarrow b$ represents the rule $a \Rightarrow b$; if $n_a = n_b$, the arc $a \leftrightarrow b$ represents the double rule $a \Leftrightarrow b$, in other words the equivalence between these two variables.
+When the threshold of intensity of implication varies, the number of arcs obviously varies in the opposite direction: for a threshold set at $0.95$, the number of arcs is less than or equal to the number of arcs in the graph at threshold $0.90$. We will discuss this further below.
+
+\subsection{Algorithm}
+
+The relation defined by statistical implication, although reflexive and non-symmetric, is obviously not transitive; in this it resembles induction and differs from deduction.
+However, we want it to model the partial relationship between two variables (the successes in our initial example).
+By convention, if $a \Rightarrow b$ and $b \Rightarrow c$, we accept the transitive closure $a \Rightarrow c$ only if $\psi(a,c) \geq 0.5$, i.e. if the implicative relationship of $a$ to $c$ is better than neutrality, which emphasizes the dependence between $a$ and $c$.
+
+
+{\bf CHECK PHI PSI}\\
+\\
+{\bf Proposal:} By convention, if $a \Rightarrow b$ and $b \Rightarrow c$, there is a transitive closure $a \Rightarrow c$ if and only if $\psi(a,c) \geq 0.5$, i.e. if the implicative relationship of $a$ to $c$, which reflects a certain dependence between $a$ and $c$, is better than its refutation.
+Note that for any pair of variables $(x;~ y)$, the arc $x \rightarrow y$ is weighted by the intensity of implication of the pair $(x,y)$.
+\\
+Let us take a formal example by assuming that, at a threshold above $0.5$, the following rules hold between the five variables $a$, $b$, $c$, $d$, and $e$: $c \Rightarrow a$, $c \Rightarrow e$, $c \Rightarrow b$, $d \Rightarrow a$, $d \Rightarrow e$, $a \Rightarrow b$ and $a \Rightarrow e$.
+
+This set of numerical and graphical relationships can then be translated into the following table and graph:
+
+\begin{tabular}{|C{0.5cm}|c|c|c|c|c|}\hline
+\hspace{-0.5cm}\turn{45}{$\Rightarrow$}  & $a$ & $b$ & $c$ & $d$ & $e$\\ \hline
+$a$ &  & 0.97& & & 0.73 \\ \hline
+$b$ &  &     & & & \\ \hline  
+  $c$ & 0.82 & 0.975& & & 0.82 \\ \hline
+  $d$ & 0.78 &    &   &  & 0.92 \\ \hline
+  $e$ &  &     & & & \\ \hline  
+\end{tabular}
+
+\begin{figure}[htbp]
+  \centering
+\includegraphics[scale=1]{chap2fig5.png}
+\caption{Implication graph corresponding to the previous example.}
+
+\label{chap2fig5}      
+\end{figure}
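+
+As an illustration of the effect of the threshold (a sketch that uses only the intensities listed in the table of this example; the chosen thresholds are ours), the arcs retained at a few thresholds are:
+
+\begin{tabular}{|c|l|c|}\hline
+threshold & arcs retained & number of arcs\\ \hline
+$0.95$ & $a \rightarrow b$, $c \rightarrow b$ & 2\\ \hline
+$0.90$ & the above and $d \rightarrow e$ & 3\\ \hline
+$0.80$ & the above and $c \rightarrow a$, $c \rightarrow e$ & 5\\ \hline
+$0.70$ & the above and $d \rightarrow a$, $a \rightarrow e$ & 7\\ \hline
+\end{tabular}
+
+Raising the threshold can only remove arcs, which is consistent with the behaviour of the number of arcs noted earlier in this section.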