X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/book_chic.git/blobdiff_plain/aa1b170823c8a4d945f63afb79f822f5d049fcd6..a5f58fd899d0066c8331580c1ee5024a4c411990:/chapter2.tex
diff --git a/chapter2.tex b/chapter2.tex
index 06da12d..b50c9dd 100644
--- a/chapter2.tex
+++ b/chapter2.tex
@@ -1069,24 +1069,115 @@ The intensity of implication-inclusion (or entropic intensity), a new measure of
 $$\psi(a,b)= \left[ i(a,b).\varphi(a,b) \right]^{\frac{1}{2}}$$
 which integrates both statistical surprise and inclusive quality.
-The function $\psi$ of the variable $t$ admits a representation that has the shape indicated in Figure 4{\bf TO CHANGE}, for $n_a$ and $n_b$ fixed.
+The function $\psi$ of the variable $t$ admits a representation that has the shape indicated in Figure~\ref{chap2fig4}, for $n_a$ and $n_b$ fixed.
 Note in this figure the difference in the behaviour of the function with respect to the conditional probability $P(B\mid A)$, a fundamental index of other rule measurement models, for example in Agrawal.
 In addition to its linear, and therefore not very nuanced nature, this probability leads to a measure that decreases too quickly from the first counter-examples and then resists too long when they become numerous.
-{\bf FIGURE 4}
+\begin{figure}[htbp]
+ \centering
+\includegraphics[scale=0.5]{chap2fig4.png}
+\caption{Example of implication-inclusion.}
+\label{chap2fig4}
+\end{figure}
+
+In Figure~\ref{chap2fig4}, it can be seen that this representation of the continuous function of $t$ reflects the expected properties of the inclusion criterion:
+\begin{itemize}
+\item ``slow reaction'' to the first counter-examples (noise resistance),
+\item ``acceleration'' of the rejection of inclusion close to the balance point, i.e. $\frac{n_a}{2n}$,
+\item rejection beyond $\frac{n_a}{2n}$, which the intensity of implication $\varphi(a,b)$ alone did not ensure.
+\end{itemize}
 \noindent Example 1\\
- \begin{tabular}{|c|c|c|c|}\hline
+\begin{tabular}{|c|c|c|c|}\hline
 & $b$ & $\overline{b}$ & margin\\ \hline
 $a$ & 200 & 400& 600 \\ \hline
 $\overline{a}$ & 600 & 2800& 3400 \\ \hline
 margin & 800 & 3200& 4000 \\ \hline
- \end{tabular}
- \\
- In Example 1, implication intensity is $\varphi(a,b)=0.9999$ (with $q(a,\overline{b})=-3.65$).
+\end{tabular}
+\\
+\\
+In Example 1, the intensity of implication is $\varphi(a,b)=0.9999$ (with $q(a,\overline{b})=-3.65$).
 The entropic values of the experiment are $h_1=h_2=0$.
 The value of the moderator coefficient is therefore $i(a,b)=0$.
 Hence, $\psi(a,b)=0$ whereas $P(B\mid A)=0.33$.
 Thus, the ``entropic'' functions ``moderate'' the intensity of implication in this case where inclusion is poor.
+\\
+\\
+\noindent Example 2\\
+ \begin{tabular}{|c|c|c|c|}\hline
+ & $b$ & $\overline{b}$ & margin\\ \hline
+ $a$ & 400 & 200& 600 \\ \hline
+ $\overline{a}$ & 1000 & 2400& 3400 \\ \hline
+ margin & 1400 & 2600& 4000 \\ \hline
+ \end{tabular}
+ \\
+ \\
+ In Example 2, the intensity of implication is 1 (for $q(a,\overline{b}) = - 8.43$).
+ The entropic values of the experiment are $h_1 = 0.918$ and $h_2 = 0.391$.
+ The value of the moderator coefficient is therefore $i(a,b) = 0.6035$.
+ As a result, $\psi(a,b) = 0.777$ whereas $P(B \mid A) = 0.6666$.
+ \\
+ \\
+{\bf Remark}
+ \noindent The correspondence between $\varphi(a,b)$ and $\psi(a,b)$ is not monotonic, as shown in the following example:
+
+\begin{tabular}{|c|c|c|c|}\hline
+ & $b$ & $\overline{b}$ & margin\\ \hline
+ $a$ & 40 & 20& 60 \\ \hline
+ $\overline{a}$ & 60 & 280& 340 \\ \hline
+ margin & 100 & 300& 400 \\ \hline
+\end{tabular}
+\\
+Thus, while $\varphi(a,b)$ decreased from Example 2 to this example, $i(a,b)$ increased, as did $\psi(a,b)$. The opposite situation is, however, the most frequent.
+Note that in both cases, the conditional probability does not change.
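+
+\noindent As a rough numerical check of this remark, here is a sketch under an assumption: the moderator coefficient is taken to have the form $i(a,b)=\left[(1-h_1^2)(1-h_2^2)\right]^{\frac{1}{4}}$, with $h_1$ the entropy (in bits) of $b$ given $a$ and $h_2$ the entropy of $\overline{a}$ given $\overline{b}$. This form is not restated in this section, but it reproduces the values $h_1=0.918$, $h_2=0.391$ and $i(a,b)=0.6035$ of Example 2. For the last table it gives:
+$$h_1 = -\frac{40}{60}\log_2\frac{40}{60} - \frac{20}{60}\log_2\frac{20}{60} \approx 0.918, \qquad h_2 = -\frac{20}{300}\log_2\frac{20}{300} - \frac{280}{300}\log_2\frac{280}{300} \approx 0.353,$$
+$$i(a,b) \approx \left[(1-0.918^2)(1-0.353^2)\right]^{\frac{1}{4}} \approx 0.61,$$
+which is slightly above the $0.6035$ of Example 2, while $P(B\mid A)=\frac{40}{60}=\frac{400}{600}\approx 0.667$ is identical in the two tables.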
+\\
+\\
+{\bf Remark}
+\noindent We refer to~\cite{Lencaa} for a very detailed comparative study of association indices for binary variables.
+In particular, the intensities of classical and entropic (inclusion) implication presented in this article are compared with other indices from a ``user'' standpoint.
+
+\section{Implication graph}
+\subsection{Problem statement}
+
+At the end of the calculations of the intensities of implication in both the classical and entropic models, we have a $p \times p$ table that crosses the $p$ variables with each other, whatever their nature, and whose elements are the values of these intensities of implication, numbers in the interval $[0,~1]$.
+It must be noted that the underlying structure of all these variables is far from explicit and remains largely hidden.
+The user remains blind to such a square table of size $p^2$.
+They cannot simultaneously take in the many possible sequences of rules that underlie the overall structure of all $p$ variables.
+In order to facilitate a clearer extraction of the rules and to examine their structure, we have associated with this table, for a given intensity threshold, a directed graph without cycles, weighted by the intensities of implication, whose complexity of representation the user can control by setting the threshold above which the implicative quality of the rules is taken into account.
+Each arc in this graph represents a rule: if $n_a < n_b$, the arc $a \rightarrow b$ represents the rule $a \Rightarrow b$; if $n_a = n_b$, then the arc $a \leftrightarrow b$ represents the double rule $a \Leftrightarrow b$, in other words, the equivalence between these two variables.
+When the threshold of intensity of implication varies, the number of arcs varies in the opposite direction: for a threshold set at $0.95$, the number of arcs is less than or equal to the number of arcs in the graph at threshold $0.90$. We will discuss this further below.
+
+\subsection{Algorithm}
+
+The relationship defined by statistical implication, although reflexive and not symmetric, is obviously not transitive, like induction and unlike deduction.
+However, we want it to model the partial relationship between two variables (the successes in our initial example).
+By convention, if $a \Rightarrow b$ and $b \Rightarrow c$, we will accept the transitive closure $a \Rightarrow c$ only if $\psi(a,c) \geq 0.5$, i.e. if the implicative relationship of $a$ to $c$ is better than neutrality, which emphasizes the dependence between $a$ and $c$.
+
+
+{\bf TO CHECK: PHI PSI}\\
+\\
+{\bf Proposal:} By convention, if $a \Rightarrow b$ and $b \Rightarrow c$, there is a transitive closure $a \Rightarrow c$ if and only if $\psi(a,c) \geq 0.5$, i.e. if the implicative relationship of $a$ to $c$, which reflects a certain dependence between $a$ and $c$, is better than its refutation.
+Note that for any pair of variables $(x;~ y)$, the arc $x \rightarrow y$ is weighted by the intensity of implication of the pair $(x,y)$.
+\\
+Let us take a formal example by assuming that, at a threshold above $0.5$, the following rules hold between the 5 variables $a$, $b$, $c$, $d$, and $e$: $c \Rightarrow a$, $c \Rightarrow e$, $c \Rightarrow b$, $d \Rightarrow a$, $d \Rightarrow e$, $a \Rightarrow b$ and $a \Rightarrow e$.
+
+This set of numerical and graphical relationships can then be translated into the following table and graph:
+
+\begin{tabular}{|C{0.5cm}|c|c|c|c|c|}\hline
+\hspace{-0.5cm}\turn{45}{$\Rightarrow$} & $a$ & $b$ & $c$ & $d$ & $e$\\ \hline
+$a$ & & 0.97& & & 0.73 \\ \hline
+$b$ & & & & & \\ \hline
+ $c$ & 0.82 & 0.975& & & 0.82 \\ \hline
+ $d$ & 0.78 & & & & 0.92 \\ \hline
+ $e$ & & & & & \\ \hline
+\end{tabular}
+
+\begin{figure}[htbp]
+ \centering
+\includegraphics[scale=1]{chap2fig5.png}
+\caption{Implication graph corresponding to the previous example.}
+
+\label{chap2fig5}
+\end{figure}