X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/book_chic.git/blobdiff_plain/4155cdb38e520aa0c325bf29b9e57da783d7f89e..aa1b170823c8a4d945f63afb79f822f5d049fcd6:/chapter2.tex diff --git a/chapter2.tex b/chapter2.tex index d5f45e3..06da12d 100644 --- a/chapter2.tex +++ b/chapter2.tex @@ -29,7 +29,7 @@ gradually being supplemented by the processing of modal, frequency and, recently, interval and fuzzy variables. } -\section{Preamble} +\section*{Preamble} Human operative knowledge is mainly composed of two components: that of facts and that of rules between facts or between rules themselves. @@ -298,7 +298,11 @@ It estimates a gap between the contingency $(card(A\cap \overline{B}))$ and the value it would have taken if there had been independence between $a$ and $b$. -\definition $$q(a,\overline{b}) = \frac{n_{a \wedge \overline{b}}- \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}}$$ +\definition +\begin{equation} q(a,\overline{b}) = \frac{n_{a \wedge \overline{b}}- + \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}} + \label{eq2.1} +\end{equation} is called the implication index, the number used as an indicator of the non-implication of $a$ to $b$. 
In cases where the approximation is properly legitimized (for example @@ -380,16 +384,16 @@ The following dual numerical situation clearly illustrates this: \center \begin{tabular}{|l|c|c|c|}\hline \diagbox[width=4em]{$a_1$}{$b_1$}& - 1 & 0 & marge\\ \hline + 1 & 0 & margin\\ \hline 1 & 96 & 4& 100 \\ \hline 0 & 50 & 50& 100 \\ \hline - marge & 146 & 54& 200 \\ \hline + margin & 146 & 54& 200 \\ \hline \end{tabular} ~ ~ ~ ~ ~ ~ ~ \begin{tabular}{|l|c|c|c|}\hline \diagbox[width=4em]{$a_2$}{$b_2$}& - 1 & 0 & marge\\ \hline + 1 & 0 & margin\\ \hline 1 & 94 & 6& 100 \\ \hline 0 & 52 & 48& 100 \\ \hline - marge & 146 & 54& 200 \\ \hline + margin & 146 & 54& 200 \\ \hline \end{tabular} \caption{Numeric example of difference between implication and @@ -504,7 +508,7 @@ close to the Lagrange index, but better adapted to the rank variable situation. -\section{Cases of on-interval and on-interval variables} +\section{Cases of variables-on-intervals and interval-variables} \subsection{Variables-on-intervals} \subsubsection{Founding situation} @@ -589,8 +593,8 @@ admits the partition corresponding to the first maximum and that the optimal reciprocal involvement is satisfied for the partition of $[b1,~ b2]$ corresponding to the second maximum. -\section{Interval-variables} -\subsection{Founding situation} +\subsection{Interval-variables} +\subsubsection{Founding situation} Data are available from a population of $n$ individuals (who may be each or some of the sets of individuals, e.g. a class of students) according to variables (e.g. grades over a year in French, math, @@ -621,7 +625,7 @@ Similarly, we will say that $[14.25, 17.80]$ in physics most often implies $[16.40, 18]$ in mathematics. -\subsection{Algorithm} +\subsubsection{Algorithm} By following the problem of E. 
Diday and his collaborators, if the values taken according to the subjects by the variables $a$ and $b$ @@ -685,9 +689,15 @@ topology\footnote{Fréchet's topology allows $\mathbb{N}$ sections, as they are with the usual topology.}, is expressed as follows by the scalar product: -$$dq = \frac{\partial q}{\partial n}dn + \frac{\partial q}{\partial +\begin{equation} +dq = \frac{\partial q}{\partial n}dn + \frac{\partial q}{\partial n_a}dn_a + \frac{\partial q}{\partial n_b}dn_b + \frac{\partial - q}{\partial n_{a \wedge \overline{b}}}dn_{a \wedge \overline{b}} = grad~q.dM\footnote{By a mechanistic metaphor, we will say that $dq$ is the elementary work of $q$ for a movement $dM$ (see chapter 14 of this book).}$$ + q}{\partial n_{a \wedge \overline{b}}}dn_{a \wedge \overline{b}} = +grad~q.dM\footnote{By a mechanistic metaphor, we will say that $dq$ is + the elementary work of $q$ for a movement $dM$ (see chapter 14 of + this book).} +\label{eq2.2} +\end{equation} where $M$ is the coordinate point $(n,~ n_a,~ n_b,~ n_{a \wedge \overline{b}})$ of the vector scalar field $C$, $dM$ is the @@ -724,16 +734,24 @@ where $o(\Delta q)$ is an infinitely small first order. Let us examine the partial derivatives of $n_b$ and $n_{a \wedge \overline{b}}$ the number of counter-examples. 
We get: -$$ \frac{\partial +\begin{equation} + \frac{\partial q}{\partial n_b} = \frac{1}{2} n_{a \wedge \overline{b}} (\frac{n_a}{n})^{-\frac{1}{2}} (n-n_b)^{-\frac{3}{2}} -+ \frac{1}{2} (\frac{n_a}{n})^{\frac{1}{2}} (n-n_b)^{-\frac{1}{2}} > 0 $$ + + \frac{1}{2} (\frac{n_a}{n})^{\frac{1}{2}} (n-n_b)^{-\frac{1}{2}} > + 0 + \label{eq2.3} +\end{equation} -$$ \frac{\partial +\begin{equation} + \frac{\partial q}{\partial n_{a \wedge \overline{b}}} = \frac{1}{\sqrt{\frac{n_a n_{\overline{b}}}{n}}} -= \frac{1}{\sqrt{\frac{n_a (n-n_b)}{n}}} > 0 $$ + = \frac{1}{\sqrt{\frac{n_a (n-n_b)}{n}}} > 0 + \label{eq2.4} +\end{equation} + Thus, if the increases $\Delta nb$ and $\Delta n_{a \wedge \overline{b}}$ are positive, the increase of $q(a,\overline{b})$ is @@ -745,3 +763,330 @@ observed values $n_b$ and $ n_{a \wedge \overline{b}}$ and minimum at values $n_b+\Delta n_b$ and $n_{a \wedge \overline{b}}+ n_{a \wedge \overline{b}}$. + +If we examine the case where $n_a$ varies, we obtain the partial +derivative of $q$ with respect to $n_a$, which is: + +\begin{equation} + \frac{\partial q}{\partial n_a} = -\frac{ n_{a \wedge \overline{b}}}{2 + \sqrt{\frac{n_{\overline{b}}}{n}}} + \left(\frac{1}{n_a}\right)^{\frac{3}{2}} + -\frac{1}{2}\sqrt{\frac{n_{\overline{b}}}{n\,n_a}}<0 + \label{eq2.5} + \end{equation} + +Thus, for variations of $n_a$ on $[0,~ n_b]$, the implication index function is always decreasing (and convex) with respect to $n_a$ and is therefore minimum for $n_a= n_b$. As a result, the intensity of implication is increasing and maximum for $n_a= n_b$. + +Note the partial derivative of $q$ with respect to $n$: + +$$\frac{\partial q}{\partial n} = \frac{1}{2\sqrt{n}} \left( n_{a + \wedge \overline{b}}+\frac{n_a n_{\overline{b}}}{n} \right)$$ + +Consequently, if the other 3 parameters are constant, the implication +index decreases by $\sqrt{n}$. +The quality of implication is therefore all the better, a specific +property of the SIA compared to other indicators used in the +literature~\cite{Grasab}.
+This property is in accordance with statistical and semantic +expectations regarding the credit given to the frequency of +observations. +Since the partial derivatives of $q$ (at least one of them) are +non-linear in the variable parameters involved, we are +dealing with a non-linear dynamic system\footnote{``Non-linear systems + are systems that are known to be deterministic but for which, in + general, nothing can be predicted because calculations cannot be + made''~\cite{Ekeland} p. 265.} with all the epistemological +consequences that we will consider elsewhere. + + + +\subsection{Numerical example} +In a first experiment, we observe the occurrences: $n = 100$, $n_a = +20$, $n_b = 40$ (hence $n_{\overline{b}}=60$) and $ n_{a \wedge \overline{b}} = 4$. +The application of formula (\ref{eq2.1}) gives $q(a,\overline{b}) = -2.309$. +In a 2nd experiment, $n$ and $n_a$ are unchanged but the occurrences +of $b$ and the counter-examples $n_{a \wedge \overline{b}}$ each increase by one unit. + +At the initial point of the space of the 4 variables, the only partial +derivatives of interest here (with respect to $n_b$ and $n_{a + \wedge \overline{b}}$) take the following values when +applying formulas (\ref{eq2.3}) and (\ref{eq2.4}): $\frac{\partial + q}{\partial n_b} = 0.0385$ and $\frac{\partial q}{\partial n_{a + \wedge \overline{b}}} = 0.2887$. + +As $\Delta n_b$, $\Delta n_{\overline{b}}$ and $\Delta n_{a + \wedge \overline{b}} $ are equal to 1, -1 and 1, $\Delta q$ is +equal to: $0.0385 + 0.2887 + o(\Delta q) = 0.3272 + o(\Delta q)$ and +the approximate value of $q$ in the second experiment is $-2.309 + +0.3272 + o(\Delta q)= -1.982 +o(\Delta q)$ using the first order +development of $q$ (formula (\ref{eq2.2})). +However, the calculation of the new implication index $q$ at the point +of the 2nd experiment gives, by the use of (\ref{eq2.1}): $-1.9795$, a +value well approximated by the development of $q$.
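+
+As a check, the two values of $q$ quoted in this example can be recomputed directly from formula (\ref{eq2.1}). The subscripts $1$ and $2$ below are only labels introduced here for the two experiments, with $n_{\overline{b}} = 60$ then $59$ and $n_{a \wedge \overline{b}} = 4$ then $5$:
+\begin{eqnarray*}
+  q_1 &=& \frac{4 - \frac{20 \times 60}{100}}{\sqrt{\frac{20 \times 60}{100}}} = \frac{4-12}{\sqrt{12}} \approx -2.309\\
+  q_2 &=& \frac{5 - \frac{20 \times 59}{100}}{\sqrt{\frac{20 \times 59}{100}}} = \frac{5-11.8}{\sqrt{11.8}} \approx -1.9795
+\end{eqnarray*}
+The exact difference $q_2 - q_1 \approx 0.330$ is close to the first-order estimate $0.0385 + 0.2887 = 0.3272$.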
+ + + +\subsection{A first differential relationship of $\varphi$ as a function of $q$} +Let us consider the intensity of implication $\varphi$ as a function +of $q(a,\overline{b})$: +$$\varphi(q)=\frac{1}{\sqrt{2\pi}}\int_q^{\infty}e^{-\frac{t^2}{2}}dt$$ +We can then examine how $\varphi(q)$ varies when $q$ varies in the neighborhood of a given value $(a,b)$, knowing how $q$ itself varies according to the 4 parameters that determine it. By differentiating with respect to the lower integration bound, we obtain: +\begin{equation} + \frac{d\varphi}{dq}=-\frac{1}{\sqrt{2\pi}}e^{-\frac{q^2}{2}} < 0 + \label{eq2.6} +\end{equation} +This confirms that the intensity increases when $q$ decreases, and the formula specifies the rate of variation, which allows us to study more precisely the variations of $\varphi$. Since the derivative of $\varphi$ with respect to $q$ is always negative, the function $\varphi$ is decreasing. + +{\bf Numerical example}\\ +Taking the values of the occurrences observed in the 2 experiments +mentioned above, we find that, for $q = -2.309$, the intensity +of implication $\varphi(q)$ is equal to 0.992. Applying formula +(\ref{eq2.6}), the derivative of $\varphi$ with respect to $q$ is +-0.02775 and the (negative) increase in intensity is then $-0.02775 \times \Delta q$ +with $\Delta q = 0.3272$, i.e. about $-0.0091$. The approximate first-order intensity is +therefore $0.992-0.0091$, or 0.983. However, the actual calculation +of this intensity gives, for $q= -1.9795$, $\varphi(q) = 0.976$. + + + +\subsection{Examination of other indices} +Unlike the core index $q$ and the intensity of implication, which +measures quality through probability (see definition 2.3), the other +most common indices are intended to be direct measures of quality. +We will examine their respective sensitivities to changes in the +parameters used to define these indices. +We keep the notations adopted in paragraph 2.2 and select indices that +are recalled in~\cite{Grasm},~\cite{Lencaa} and~\cite{Grast2}.
+ +\subsubsection{The Loevinger Index} + +It is an ``ancestor'' of the indices of +implication~\cite{Loevinger}. This index, denoted $H(a,b)$, varies from +1 to $-\infty$. It is defined by: $H(a,b) =1-\frac{n n_{a \wedge + \overline{b}}}{n_a n_{\overline{b}}}$. Its partial derivative with respect to the variable number of counter-examples is therefore: +$$\frac{\partial H}{\partial n_{a \wedge \overline{b}}}=-\frac{n}{n_a n_{\overline{b}}}$$ +Thus the Loevinger index is always decreasing with $n_{a \wedge + \overline{b}}$. If it is ``close'' to 1, implication is ``almost'' +satisfied. But this index, since it does not refer to a +probability scale, has the disadvantage of not providing a probability threshold and of being +invariant under any dilation of $E$, $A$, $B$ and $A \cap \overline{B}$. + + +\subsubsection{The Lift Index} + +It is expressed by: $l =\frac{n n_{a \wedge b}}{n_a n_b}$. +This expression, linear with respect to the examples, can also be +written to highlight the number of counter-examples: +$$l =\frac{n (n_a - n_{a \wedge \overline{b}})}{n_a n_b}$$ +To study the sensitivity of $l$ to parameter variations, we use: +$$\frac{\partial l}{\partial n_{a \wedge \overline{b}} } = +-\frac{n}{n_a n_b}$$ +Thus, the rate of variation of the Lift index does not depend on +the number of counter-examples. +It is a constant that depends only on $n$ and on the occurrences of $a$ and $b$. Therefore, $l$ decreases when the number of counter-examples increases, which is semantically acceptable, but the rate of decrease does not depend on the rate of growth of $n_{a \wedge \overline{b}}$. + +\subsubsection{Confidence} + +This index is the best known and most widely used, thanks to the wide +audience it gained through an English-language publication~\cite{Agrawal}. +It is at the origin of several other commonly used indices which are only variants satisfying this or that semantic requirement. Moreover, it is simple and can be interpreted easily and immediately.
+$$c=\frac{n_{a \wedge b}}{n_a} = 1-\frac{n_{a \wedge \overline{b}}}{n_a}$$ + +The first form, linear with respect to the examples and independent of +$n_b$, is interpreted as a conditional frequency of the examples of +$b$ when $a$ is known. +The sensitivity of this index to variations in the occurrence of +counter-examples is read through the partial derivative: +$$\frac{\partial c}{\partial n_{a \wedge \overline{b}} } = +-\frac{1}{n_a }$$ + + +Consequently, confidence increases when $n_{a \wedge \overline{b}}$ +decreases, which is semantically acceptable, but the rate of variation +is constant, independent of the rate of decrease of this number and of +the variations of $n$ and $n_b$. +This property does not seem to satisfy intuition. +The gradient of $c$ is expressed only in relation to $n_{a \wedge + \overline{b}}$ and $n_a$: $\displaystyle \binom{ -\frac{1}{n_a}}{\frac{n_{a \wedge \overline{b}}}{n_a^2}}$ + +This may also appear to be a restriction on the role of parameters in +expressing the sensitivity of the index. + +\section{Gradient field, implicative field} +We highlight here the existence of fields generated by the variables +of the corpus. + +\subsection{Existence of a gradient field} +As in our Newtonian physical space, where each material object emits +a gravitational field, we can consider that a field exists +around each variable. +For example, the variable $a$ generates a scalar field whose value in +$b$ is maximum and equal to the intensity of implication or the +implication index $q(a,\overline{b})$. +Its action spreads in V according to differential laws as J.M. Leblond +says, in~\cite{Leblond} p.242. + +Let us consider the space $E$ of dimension 4 where the coordinates of +the points $M$ are the parameters relative to the binary variables $a$ +and $b$, i.e. ($n$, $n_a$, $n_b$, $n_{a\wedge \overline{b}}$).
$q(a,\overline{b})$ is the realization of a scalar field, as a mapping from $\mathbb{R}^4$ to $\mathbb{R}$ (immersion of $\mathbb{N}^4$ in $\mathbb{R}^4$). +For the vector grad $q$, whose components are the partial derivatives of $q$ +with respect to the variables $n$, $n_a$, $n_b$, $n_{a\wedge + \overline{b}}$, to define a gradient field - a particular vector +field that we will also call implicative field - it must respect the +Schwarz criterion of an exact total differential, i.e.: + +$$\frac{\partial}{\partial n_{a\wedge \overline{b}}}\left( +\frac{\partial q}{\partial n_b} \right) =\frac{\partial}{\partial n_b}\left( +\frac{\partial q}{\partial n_{a\wedge \overline{b}}} \right) $$ +and the same for the other variables taken in pairs. Indeed, we have, +through the formulas (\ref{eq2.3}) and (\ref{eq2.4}) + +$$ \frac{\partial}{\partial n_{a \wedge \overline{b}}} \left( \frac{\partial q}{\partial n_b} \right) = \frac{1}{2} \left( \frac{n_a}{n}\right)^{-\frac{1}{2}} (n-n_b)^{-\frac{3}{2}} = \frac{\partial}{\partial n_b}\left( +\frac{\partial q}{\partial n_{a\wedge \overline{b}}} \right)$$ + +Thus, to the vector field C = ($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$) of $E$, the nature of which we will specify, corresponds a gradient field $G$ which is said to be derived from the {\bf potential} $q$. +The gradient grad $q$ is therefore the vector that represents the spatial variation of the field intensity. +It is directed from low field values to higher values. By following the gradient at each point, we follow the increase in the intensity of the implicative field in space and, in a way, the speed with which it changes as a result of the variation of one or more parameters.
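+
+The symmetry of the mixed second derivatives stated earlier in this subsection for the pair ($n_b$, $n_{a \wedge \overline{b}}$) can be checked term by term. The following is a sketch relying only on (\ref{eq2.3}) and (\ref{eq2.4}) with $n_{\overline{b}} = n - n_b$: only the first term of (\ref{eq2.3}) depends on $n_{a \wedge \overline{b}}$, and (\ref{eq2.4}) depends on $n_b$ only through $(n-n_b)^{-\frac{1}{2}}$.
+\begin{eqnarray*}
+  \frac{\partial}{\partial n_{a \wedge \overline{b}}}\left(\frac{\partial q}{\partial n_b}\right) &=& \frac{1}{2}\left(\frac{n_a}{n}\right)^{-\frac{1}{2}}(n-n_b)^{-\frac{3}{2}}\\
+  \frac{\partial}{\partial n_b}\left(\frac{\partial q}{\partial n_{a \wedge \overline{b}}}\right) &=& \left(\frac{n_a}{n}\right)^{-\frac{1}{2}}\frac{\partial}{\partial n_b}(n-n_b)^{-\frac{1}{2}} = \frac{1}{2}\left(\frac{n_a}{n}\right)^{-\frac{1}{2}}(n-n_b)^{-\frac{3}{2}}
+\end{eqnarray*}
+Both expressions coincide, which is the condition required for this pair of variables.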
+ +For example, if we fix 3 of the parameters $n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$ given by the realization of the couple ($a$, $b$), the gradient is a vector whose direction indicates the growth or decrease of $q$, and therefore the decrease or increase of $|q|$ and, as a consequence, of $\varphi$, under the variations of the 4th parameter. +We have indicated this above by interpreting formula (\ref{eq2.5}). + + +\subsection{Level or equipotential lines} +An equipotential (or level) line or surface in the $C$ field is a curve or surface of $E$ along which a variable point $M$ maintains the same value of the potential $q$ (e.g. isothermal lines on the globe or contour lines on an IGN map). + +The equation of this surface\footnote{In differential geometry, it seems that this surface is a (quasi) differentiable manifold with boundary, compact, homeomorphic to the closed box formed by the intervals of variation of the 4 parameters. Note that the point whose component $n_b$ is equal to $n$ (therefore $n_{\overline{b}} = 0$) is a singular point (``catastrophic'' in René Thom's sense) of the surface and $q$, the potential, is not differentiable at this point. Everywhere else, the surface is differentiable and the points are all regular. If time, for example, parametrizes the observations of the process of which ($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$) is a realization, to each instant corresponds a morphological fiber of the process represented by such a surface in space-time.} is, of course: +$$ q(a,\overline{b}) - \frac{n_{a \wedge \overline{b}}- + \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}} = 0$$ + + +Therefore, on such a curve, the scalar product $grad~ q. dM$ is zero. +This is interpreted as indicating the orthogonality of the gradient with the tangent line or tangent hyperplane to the curve, i.e. with the equipotential line or surface. +In a kinematic interpretation of our problem, the velocity of $M$ along its path on the equipotential surface is orthogonal to the gradient at $M$.
+ +As an illustration, for a potential $F$ depending on only 2 variables, Figure~\ref{chap2fig2} shows the direction of the gradient, orthogonal to the different equipotential surfaces; $F$ does not vary along each of them, while it passes from $F=7$ to $F= 10$ from one surface to the next. + +\begin{figure}[htbp] + \centering +\includegraphics[scale=1]{chap2fig2} + \caption{Illustration of potential of 2 variables} +\label{chap2fig2} % Give a unique label +\end{figure} + +It is possible, in the case of the potential $q$, to build equipotential surfaces as above (two-dimensional for ease of representation). +It is understandable that the more intense the field is, the tighter the surfaces are. For a given value of $q$, in this case, 3 variables are fixed, for example $n$, $n_a$, $n_b$, together with a value of $q$ compatible with the field constraints. For instance: $n = 10^4$; $n_a = 1600 \leq n_b = 3600$ and $q = -2$ or $|q| = 2$. We then find $n_{a \wedge \overline{b}}= 960$ using formula~(\ref{eq2.1}). +But the points ($10^4$, $1600$, $5100$, $728$) and ($100$, $25$, $64$, $3$) also belong to this surface and to the same equipotential curve. +The point ($10^4$, $1600$, $3600$, $928$) belongs to the equipotential curve $q=-3$. In fact, on this entire surface, we obtain a kind of homeostasis of the intensity of implication. + +The expression of the function $q$ of the variable shows that it is convex. +This property proves that the segment of points $t.M_1 + (1-t).M_2$, for $t \in [0,1]$, which connects two points $M_1$ and $M_2$ of the same equipotential line is entirely contained in its convexity. +Figure~\ref{chap2fig3} shows two adjacent equipotential surfaces $\Sigma_1$ and $\Sigma_2$ in the implicative field corresponding to two values of the potential $q_1$ and $q_2$. +At point $M_1$ the scalar field therefore takes the value $q_1$. $M_2$ is the intersection of the normal from $M_1$ with $\Sigma_2$.
Given the direction of the normal vector $\vec{n}$, the difference $\delta = q_2 - q_1$, i.e. the variation of the field when we go from $\Sigma_1$ to $\Sigma_2$, is then equal to the opposite of the norm of the gradient of $q$ at $M_1$, that is $\frac{\partial q}{\partial n}$ if $n_a$, $n_b$ and $n_{a \wedge \overline{b}}$ are fixed. + +\begin{figure}[htbp] + \centering +\includegraphics[scale=1]{chap2fig3} + \caption{Illustration of equipotential surfaces} +\label{chap2fig3} % Give a unique label +\end{figure} + +Thus, the space $E$ can be foliated by equipotential surfaces corresponding to successive values of $q$ relative to the cardinals ($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$) which would be varied. +This situation corresponds to the one envisaged in the SIA modeling. +Fixing $n$, $n_a$ and $n_b$, we consider the random sets $X$ and $Y$ of the same cardinals as $A(n_a)$ and $B(n_b)$ and whose cardinal follows a Poisson law or a binomial law, according to the choice of the model. +The different gradient fields, real ``lines of force'', associated with them are orthogonal to the surfaces defined by the corresponding values of $Q$. +This reminds us, in the theoretical framework of potential, of the premonitory metaphor of ``implicative flow'' that we expressed in~\cite{Grase} and that we will discuss again in Chapter 14 of the book. +Behind this notion we can imagine a transport of information of variable intensity in a causal universe. +We illustrate this metaphor with the study of the properties of the two-layer implicative cone (see §2.8). +Moreover, intuitively, the quality of the implication $a\Rightarrow b$ is all the better as the equipotential surface $C$ of the contingency covers the random equipotential surfaces depending on the random variable.
+Let us recall the relationship that links the potential $q$ with the intensity: +$$\varphi(a,b) =\frac{1}{\sqrt{2\pi}}\int_{q(a,\overline{b})}^{\infty}e^{-\frac{t^2}{2}} dt$$ + +\noindent {\bf remark 1}\\ +It can be seen that the intensity is also invariant on any equipotential surface of its own variations. +The surface portions generated by $q$ and by $\varphi$ are even in one-to-one correspondence. +In intuitive terms, we can say that when one ``swells'' the other ``deflates''.\\ + +\noindent {\bf remark 2}\\ +Let us note once again a particularity of the intensity of implication. +While the surfaces generated by the variations of the 4 parameters of the data are not invariant under a common dilation of the parameters, those associated with the indices cited in §2.4 are invariant and have the same undifferentiated geometric shape. + +\section{Implication-inclusion} +\subsection{Foundational and problematic situation} +Three reasons led us to improve the model formalized by the intensity of implication: +\begin{itemize} +\item when the size of the samples processed, and in particular that of $E$, increases (to around a thousand and more), the intensity $\varphi(a,b)$ tends to be no longer sufficiently discriminating because its values can be very close to 1, while the inclusion whose quality it seeks to model is far from being satisfied (phenomenon reported in~\cite{Bodina} which deals with large student populations through international surveys); +\item the previous quasi-implication model essentially uses the measure of the strength of rule $a \Rightarrow b$. + However, taking into account the concomitance of $\neg b \Rightarrow \neg a$ (the contrapositive of the implication) is useful or even essential to reinforce the affirmation of a good quality of the quasi-implicative, possibly quasi-causal, relationship of $a$ over $b$\footnote{This phenomenon is reported by Y. Kodratoff in~\cite{Kodratoff}.}.
+ At the same time, it could make it possible to correct the difficulty mentioned above (if $A$ and $B$ are small compared to $E$, their complements will be large and vice versa); +\item the overcoming of Hempel's paradox (see Appendix 3 of this chapter). + \end{itemize} + +\subsection{An inclusion index} + +The solution\footnote{J. Blanchard provides in~\cite{Blanchardb} an answer to this problem by measuring the ``equilibrium gap''.} we provide uses both the intensity of implication and another index that reflects the asymmetry between situations $S_1 = (a \wedge b)$ and $S_1' = (a \wedge \neg b)$, (resp. $S_2 = (\neg a \wedge \neg b)$ and $S_2' = (a \wedge \neg b)$) in favour of the first named. +The relative weakness of instances that contradict the rule and its contrapositive is therefore fundamental. +Moreover, the number of counter-examples $n_{a \wedge \overline{b}}$ to $a \Rightarrow b$ is also the number of counter-examples to its contrapositive. +To account for the uncertainty associated with a possible bet of belonging to one of the two situations ($S_1$ or $S_1'$, (resp. $S_2$ or $S_2'$)), we therefore refer to Shannon's concept of entropy~\cite{Shannon}: +$$H(b\mid a) = - \frac{n_{a\wedge b}}{n_a}\log_2 \frac{n_{a\wedge b}}{n_a} - \frac{n_{a\wedge \overline{b}}}{n_a}\log_2 \frac{n_{a\wedge \overline{b}}}{n_a}$$ +is the conditional entropy relating to boxes $(a \wedge b)$ and $(a \wedge \neg b)$ when $a$ is realized; + +$$H(\overline{a}\mid \overline{b}) = - \frac{n_{a\wedge \overline{b}}}{n_{\overline{b}}}\log_2 \frac{n_{a\wedge \overline{b}}}{n_{\overline{b}}} - \frac{n_{\overline{a} \wedge \overline{b}}}{n_{\overline{b}}}\log_2 \frac{n_{\overline{a} \wedge \overline{b}}}{n_{\overline{b}}}$$ + +is the conditional entropy relative to the boxes $(\neg a \wedge \neg b)$ and $(a \wedge \neg b)$ when $\neg b$ is realized. + +These entropies, with values in $[0,1]$, should therefore be simultaneously weak and therefore the asymmetries between situations $S_1$ and $S_1'$ (resp.
$S_2$ and $S_2'$) should be simultaneously strong if one wishes to have a good criterion for including $A$ in $B$. +Indeed, entropies represent the average uncertainty of experiments that consist in observing whether $b$ is realized (or $\neg a$ is realized) when $a$ (or $\neg b$) is observed. The complement to 1 of this uncertainty therefore represents the average information collected by performing these experiments. The more important this information is, the stronger is the guarantee of the quality of the implication and its contrapositive. We must now adapt this entropic numerical criterion to the model expected in the different cardinal situations. +For the model to have the expected meaning, it must satisfy, in our opinion, the following epistemological constraints: + +\begin{enumerate} +\item It shall integrate the entropy values and, in order to contrast them, use for example their squares. +\item As this square varies from 0 to 1, and in order to denote the imbalance and therefore the inclusion, in opposition to entropy, the value retained will be the complement to 1 of this square as long as the number of counter-examples is less than half of the observations of $a$ (resp. of $\neg b$). + Beyond these values, as the implications no longer have an inclusive meaning, the criterion will be assigned the value 0. +\item In order to take into account the two pieces of information specific to $a\Rightarrow b$ and $\neg b \Rightarrow \neg a$, the product will account for the simultaneous quality of the values retained. +The product has the property of vanishing as soon as one of its terms vanishes, i.e. as soon as this quality is erased. +\item Finally, since the product has a dimension 4 with respect to entropy, its fourth root will be of the same dimension as entropy. +\end{enumerate} + +Let $\alpha=\frac{n_a}{n}$ be the frequency of $a$ and $\overline{\beta}=\frac{n_{\overline{b}}}{n}$ be the frequency of $\neg b$.
+Let $t=\frac{n_{a \wedge \overline{b}}}{n}$ be the frequency of counter-examples. The two terms expressing the respective qualities of the implication and of its contrapositive are: + +\begin{eqnarray*} + h_1(t) = H(b\mid a) = - (1-\frac{t}{\alpha}) \log_2 (1-\frac{t}{\alpha}) - \frac{t}{\alpha} \log_2 \frac{t}{\alpha} & \mbox{ if }t \in [0,\frac{\alpha}{2}[\\ + h_1(t) = 1 & \mbox{ if }t \in [\frac{\alpha}{2},\alpha]\\ + h_2(t)= H(\overline{a}\mid \overline{b}) = - (1-\frac{t}{\overline{\beta}}) \log_2 (1-\frac{t}{\overline{\beta}}) - \frac{t}{\overline{\beta}} \log_2 \frac{t}{\overline{\beta}} & \mbox{ if }t \in [0,\frac{\overline{\beta}}{2}[\\ + h_2(t)= 1 & \mbox{ if }t \in [\frac{\overline{\beta}}{2},\overline{\beta}] +\end{eqnarray*} +Hence the definition for determining the entropic criterion: +\definition: The inclusion index of $A$, support of $a$, in $B$, support of $b$, is the number: +$$i(a,b) = \left[ (1-h_1^2(t)) (1-h_2^2(t)) \right]^{\frac{1}{4}}$$ + +which integrates the information provided by the realization of a small number of counter-examples, on the one hand to the rule $a \Rightarrow b$ and, on the other hand, to the rule $\neg b \Rightarrow \neg a$. + +\subsection{The implication-inclusion index} + +The intensity of implication-inclusion (or entropic intensity), a new measure of inductive quality, is the number: + +$$\psi(a,b)= \left[ i(a,b).\varphi(a,b) \right]^{\frac{1}{2}}$$ +which integrates both statistical surprise and inclusive quality. + +The function $\psi$ of the variable $t$ admits a representation that has the shape indicated in Figure 4{\bf TO CHANGE}, for $n_a$ and $n_b$ fixed. +Note in this figure the difference in the behaviour of the function with respect to the conditional probability $P(B\mid A)$, a fundamental index of other rule measurement models, for example in Agrawal.
+In addition to its linear, and therefore not very nuanced nature, this probability leads to a measure that decreases too quickly from the first counter-examples and then resists too long when they become numerous. + + +{\bf FIGURE 4} + + +\noindent Example 1\\ + \begin{tabular}{|c|c|c|c|}\hline + & $b$ & $\overline{b}$ & margin\\ \hline + $a$ & 200 & 400& 600 \\ \hline + $\overline{a}$ & 600 & 2800& 3400 \\ \hline + margin & 800 & 3200& 4000 \\ \hline + \end{tabular} + \\ + In Example 1, implication intensity is $\varphi(a,b)=0.9999$ (with $q(a,\overline{b})=-3.65$). + By the definitions above, the entropic values of the experiment are $h_1=1$ (the counter-examples exceed half of the occurrences of $a$) and $h_2\approx 0.544$. + The value of the moderator coefficient is therefore $i(a,b)=0$. + Hence, $\psi(a,b)=0$ whereas $P(B\mid A)=0.33$. +Thus, the ``entropic'' functions ``moderate'' the intensity of implication in this case where inclusion is poor.
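+
+The figures of Example 1 can be retraced from the definitions of $q$, $h_1$, $h_2$, $i$ and $\psi$ given above. This is only a check under those definitions, with $n=4000$, $n_a=600$, $n_{\overline{b}}=3200$ and $n_{a \wedge \overline{b}}=400$ read from the table:
+\begin{eqnarray*}
+  q(a,\overline{b}) &=& \frac{400 - \frac{600 \times 3200}{4000}}{\sqrt{\frac{600 \times 3200}{4000}}} = \frac{400-480}{\sqrt{480}} \approx -3.65\\
+  \frac{t}{\alpha} &=& \frac{400}{600} \approx 0.67 \geq \frac{1}{2}, \quad \mbox{hence } h_1 = 1\\
+  \frac{t}{\overline{\beta}} &=& \frac{400}{3200} = 0.125 < \frac{1}{2}, \quad \mbox{hence } h_2 = -0.875 \log_2 0.875 - 0.125 \log_2 0.125 \approx 0.544\\
+  i(a,b) &=& \left[ (1-1^2)(1-0.544^2) \right]^{\frac{1}{4}} = 0, \qquad \psi(a,b) = \left[ 0 \times 0.9999 \right]^{\frac{1}{2}} = 0
+\end{eqnarray*}
+The factor $1-h_1^2$ vanishes on its own, so the value of $h_2$ does not change the result here.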