+
+It is then shown (for example in~\cite{Saporta}) that the
+number of events occurring during a period of fixed duration $n$
+follows a Poisson law of parameter $c.n$, where $c$ is the
+rate of the occurrence process per unit of time.
+
+
+However, for each transaction assumed to be random, the event $[a=1]$
+has a probability estimated by the frequency $\frac{n_a}{n}$, and the event $[b=0]$
+has a probability estimated by the frequency $\frac{n_{\overline{b}}}{n}$. Therefore the joint event $[a=1~
+ and~ b=0]$ has a probability estimated by the frequency
+$\frac{n_a}{n}. \frac{n_{\overline{b}}}{n}$ under the hypothesis of absence of an a priori link between $a$ and $b$ (independence).
+
+We can then estimate the rate $c$ of this event by $\frac{n_a}{n}. \frac{n_{\overline{b}}}{n}$.
+
+Thus, for a duration $n$, the occurrences of the event $[a~ and~ not~b]$ follow a Poisson law of parameter:
+$$\lambda = \frac{n_a.n_{\overline{b}}}{n}$$
+
+As a result, $Pr[Card(X\cap \overline{Y})= s]= e^{-\lambda}\frac{\lambda^s}{s!}$
+
+Consequently, the probability that chance will lead, under the
+assumption of the absence of an a priori link between $a$ and $b$, to
+no more counter-examples than those observed is:
+
+$$Pr[Card(X\cap \overline{Y})\leq card(A\cap \overline{B})] =
+\sum^{card(A\cap \overline{B})}_{s=0} e^{-\lambda}\frac{\lambda^s}{s!} $$
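+
+As a purely illustrative order of magnitude (the values are assumed here for the example and are not observed data), take $n=100$, $n_a=20$, $n_{\overline{b}}=60$ and $card(A\cap \overline{B})=4$. Then $\lambda = 12$ and
+$$\sum^{4}_{s=0} e^{-12}\frac{12^s}{s!} = e^{-12}\left(1+12+72+288+864\right) = 1237\, e^{-12} \approx 0.0076$$
+so that, under independence, chance alone would produce at most 4 counter-examples in fewer than $1\%$ of cases.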
+
+ But other legitimate drawing processes lead to a binomial law, or
+ even a hypergeometric law (itself not semantically adapted to the
+ situation because of its symmetry). Under suitable convergence
+ conditions, these two laws ultimately reduce to the Poisson law
+ above (see the Annex to this chapter).
+
+If $n_{\overline{b}}\neq 0$, we center and reduce this Poisson variable
+into the variable:
+
+$$Q(a,\overline{b})= \frac{card(X \cap \overline{Y}) - \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}} $$
+
+In the experimental realization, the observed value of
+$Q(a,\overline{b})$ is $q(a,\overline{b})$.
+It estimates a gap between the contingency $(card(A\cap
+\overline{B}))$ and the value it would have taken if there had been
+independence between $a$ and $b$.
+
+\definition
+\begin{equation} q(a,\overline{b}) = \frac{n_{a \wedge \overline{b}}-
+ \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}}
+ \label{eq2.1}
+\end{equation}
+is called the implication index, the number used as an indicator of
+the non-implication of $a$ to $b$.
+In cases where the approximation is properly justified (for example
+$\frac{n_a.n_{\overline{b}}}{n}\geq 4$), the variable
+$Q(a,\overline{b})$ approximately follows the standard (centered, reduced) normal
+distribution. The intensity of implication, measuring the quality of
+$a\Rightarrow b$, for $n_a\leq n_b$ and $n_b \neq n$, is then defined
+from the index $q(a,\overline{b})$ by:
+
+\definition
+The implication intensity that measures the inductive quality of $a$
+over $b$ is:
+$$\varphi(a,b)=1-Pr[Q(a,\overline{b})\leq q(a,\overline{b})] =
+\frac{1}{\sqrt{2 \pi}} \int^{\infty}_{ q(a,\overline{b})}
+e^{-\frac{t^2}{2}} dt,~ if~ n_b \neq n$$
+$$\varphi(a,b)=0,~ otherwise$$
+As a result, the definition of statistical implication becomes:
+\definition
+Implication $a\Rightarrow b$ is admissible at confidence level
+$1-\alpha $ if and only if:
+$$\varphi(a,b)\geq 1-\alpha$$
+
+
+It should be recalled that this modeling of quasi-implication measures
+the astonishment at observing so few counter-examples compared to
+the surprising number of instances of implication.
+It is a measure of the inductive and informative quality of
+implication. Therefore, if the rule is trivial, as in the case where
+$B$ is very large or coincides with $E$, this astonishment becomes
+small.
+We also demonstrate~\cite{Grasf} that this triviality results in a
+very low or even zero intensity of implication: If, $n_a$ being fixed
+and $A$ being included in $B$, $n_b$ tends towards $n$ ($B$ "grows"
+towards $E$), then $\varphi(a,b)$ tends towards $0$. We therefore
+define, by "continuity":$\varphi(a,b) = 0$ if $n_b = n$. Similarly, if
+$A\subset B$, $\varphi(a,b)$ may be less than $1$ in the case where
+the inductive confidence, measured by statistical surprise, is
+insufficient.
+
+{\bf \remark Total correlation, partial correlation}
+
+
+We take here the notion of correlation in a more general sense than
+that used in the domain that develops the linear correlation
+coefficient (linear link measure) or the correlation ratio (functional
+link measure).
+In our perspective, there is a total (or partial) correlation between
+two variables $a$ and $b$ when the respective events they determine
+occur (or almost occur) at the same time, as well as their opposites.
+However, we know from numerical counter-examples that correlation and
+implication do not come down to each other, that there can be
+correlation without implication and vice versa~\cite{Grasf} and below.
+If we compare the implication coefficient and the linear correlation
+coefficient algebraically, it is clear that the two concepts do not
+coincide and therefore do not provide the same
+information\footnote{"More serious is the logical error inferred from
+ a correlation found to the existence of a causality" writes Albert
+ Jacquard in~\cite{Jacquard}, p.159. }.
+
+The quasi-implication of non-symmetric index $q(a,\overline{b})$ does
+not coincide with the correlation coefficient $\rho(a, b)$ which is
+symmetric and which reflects the relationship between variables a and
+b. Indeed, it is shown in~\cite{Grasf} that if $q(a,\overline{b}) \neq 0$
+then
+$$\frac{\rho(a,b)}{q(a,\overline{b})} = -\sqrt{\frac{n}{n_b
+ n_{\overline{a}}}}$$
+With the correlation considered from the point of view of linear
+correlation, even if correlation and implication are rather in the
+same direction, the orientation of the relationship between two
+variables is not transparent because it is symmetrical, which is not
+the bias taken in the SIA.
+From a statistical relationship given by the correlation, two opposing
+empirical propositions can be deduced.
+
+The following dual numerical situation clearly illustrates this:
+
+
+\begin{table}[htp]
+\center
+\begin{tabular}{|l|c|c|c|}\hline
+\diagbox[width=4em]{$a_1$}{$b_1$}&
+ 1 & 0 & margin\\ \hline
+ 1 & 96 & 4& 100 \\ \hline
+ 0 & 56 & 44& 100 \\ \hline
+ margin & 152 & 48& 200 \\ \hline
+\end{tabular} ~ ~ ~ ~ ~ ~ ~ \begin{tabular}{|l|c|c|c|}\hline
+\diagbox[width=4em]{$a_2$}{$b_2$}&
+ 1 & 0 & margin\\ \hline
+ 1 & 94 & 6& 100 \\ \hline
+ 0 & 52 & 48& 100 \\ \hline
+ margin & 146 & 54& 200 \\ \hline
+\end{tabular}
+
+\caption{Numeric example of difference between implication and
+ correlation}
+\label{chap2tab1}
+\end{table}
+
+In Table~\ref{chap2tab1}, the following correlation and implications
+can be computed:\\
+Correlation $\rho(a_1,b_1)=0.468$, Implication
+$q(a_1,\overline{b_1})=-4.082$\\
+Correlation $\rho(a_2,b_2)=0.473$, Implication $q(a_2,\overline{b_2})=-4.041$
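+
+As a check on the second table (a sketch that assumes $\rho$ is the usual linear correlation coefficient of two binary variables, whose closed form is not stated in the text), formula (\ref{eq2.1}) and this coefficient give:
+$$q(a_2,\overline{b_2}) = \frac{6-\frac{100 \times 54}{200}}{\sqrt{\frac{100 \times 54}{200}}} = \frac{-21}{\sqrt{27}} \approx -4.041$$
+$$\rho(a_2,b_2) = \frac{n\, n_{a \wedge b} - n_a n_b}{\sqrt{n_a n_{\overline{a}} n_b n_{\overline{b}}}} = \frac{200 \times 94 - 100 \times 146}{\sqrt{100 \times 100 \times 146 \times 54}} \approx 0.473$$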
+
+
+Thus, we observe that, on the one hand, $a_1$ and $b_1$ are less
+correlated than $a_2$ and $b_2$ while, on the other hand, the
+implication intensity of $a_1$ over $b_1$ is higher than that of $a_2$
+over $b_2$ since $q(a_1,\overline{b_1}) < q(a_2,\overline{b_2})$.
+
+On this subject, Alain Ehrenberg in~\cite{Ehrenberg} writes: "The
+finding of a correlation does not remove the ambiguity between `when I do $X$, my brain is in state $Y$' and `if I do $X$, it is because my brain is in state $Y$', that is, between something that happens in my brain when I do an action".
+
+\remark Remember that we consider not only conjunctions of variables
+of the type "$a$ and $b$" but also disjunctions such as "($a$ and $b$)
+or $c$..." in order to model phenomena that are concepts as it is done
+in learning or in artificial intelligence.
+The associated calculations remain compatible with the logic of
+propositions linked by connectives.
+
+\remark Unlike the Loevinger index~\cite{Loevinger}, the conditional
+probability $Pr[B/A]$ and all its derivatives, the implication
+intensity varies, non-linearly, with the expansion of sets $E$, $A$
+and $B$ and weakens with triviality (see Definition 2.3).
+Moreover, it
+is resistant to noise, especially around $0$, which can only make
+the relationship we want to model and establish statistically
+credible.
+Finally, as we have seen, the inclusion of $A$ in $B$ does not ensure
+maximum intensity, the inductive quality may not be strong, whereas
+$Pr[B/A]$ is equal to $1$~\cite{Grasm,Guillet}.
+In paragraph 5, we study more closely the problem of the sensitivity
+and stability of the implication index as a function of small
+variations in the parameters involved in the study of its
+differential.
+
+\section{Case of modal and frequency variables}
+\subsection{Founding situation}
+
+Marc Bailleul's research (1991-1994) focuses in particular on the
+representation that mathematics teachers have of their own teaching.
+In order to highlight it, meaningful words are proposed to them, which
+they must rank.
+Their choices are no longer binary: the words chosen by any teacher
+are ordered from the least to the most representative.
+Bailleul's question then bears on statements of the type: "if I
+choose this word with this importance, then I choose this other word
+with at least equal importance".
+It was therefore necessary to extend the notion of statistical
+implication to variables other than binary.
+This is the case for modal variables, which are associated with
+phenomena where the values $a(x)$ are numbers in the interval $[0, 1]$
+and describe degrees of belonging or satisfaction, as do, in fuzzy logic,
+linguistic modifiers such as "maybe", "a little", "sometimes",
+etc.
+This problem is also found in situations where the frequency of a
+variable reflects a preorder on the values assigned by the subjects to
+the variables presented to them.
+These are frequency variables that are associated with phenomena where
+the values of $a(x)$ are any positive real values.
+This is the case when one considers a student's percentage of success
+in a battery of tests in different areas.
+
+\subsection{Formalization}
+
+J.B. Lagrange~\cite{Lagrange} has demonstrated that, in the modal
+case,
+\begin{itemize}
+ \item if $a(x)$ and $\overline{b}(x)$ are the values taken at $x$ by
+ the modal variables $a$ and $\overline{b}$, with $\overline{b}(x)=1-b(x)$,
+ \item if $s^2_a$ and $s_{\overline{b}}^2$ are the empirical variances of variables $a$ and $\overline{b}$,
+\end{itemize}
+then the implication index, which he calls propensity index, becomes:
+
+\definition
+$$q(a,\overline{b}) = \frac{\sum_{x\in E} a(x)\overline{b}(x) -
+ \frac{n_a n_{\overline{b}}}{n}}
+{\sqrt{\frac{(n^2s_a^2+n_a^2)(n^2 s_{\overline{b}}^2 + n_{\overline{b}}^2)}{n^3}}}$$
+is the index of propensity of modal variables.
+
+J.B. Lagrange also proves that this index coincides with the index
+defined previously in the binary case if the number of modalities of $a$
+and $b$ is precisely 2, because in this case:\\
+$n^2s_a^2+n_a^2=n n_a$,~ ~ $n^2 s_{\overline{b}}^2 + n_{\overline{b}}^2=n
+ n_{\overline{b}}$~ ~ and ~ ~ $\sum_{x\in E} a(x)\overline{b}(x)=n_{a \wedge
+ \overline{b}}$.
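+
+The first identity can be checked directly, assuming that the empirical variance is computed with divisor $n$ (an assumption, since the text does not specify it). For a binary variable, $a(x)^2=a(x)$, so that
+$$n^2 s_a^2 = n^2\left(\frac{n_a}{n}-\frac{n_a^2}{n^2}\right) = n n_a - n_a^2, \quad \mbox{hence} \quad n^2 s_a^2 + n_a^2 = n n_a$$
+The same computation applies to $\overline{b}$, and the denominator of the propensity index then reduces to $\sqrt{\frac{n_a n_{\overline{b}}}{n}}$, the denominator of formula (\ref{eq2.1}).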
+
+ This solution provided in the modal case is also applicable to the
+ case of frequency variables, or even positive numerical variables,
+ provided that the values observed on the variables, such as a and b,
+ have been normalized, the normalization in $[0, 1]$ being made from the maximum of the value taken respectively by $a$ and $b$ on set $E$.
+
+\remark
+In~\cite{Regniera}, we consider rank variables that reflect a
+total order between choices presented to a population of judges.
+Each of them must order their preferential choice among a set of
+objects or proposals made to them.
+An index measures the quality of the statement of the type: "if object
+$a$ is ranked by judges then, generally, object $b$ is ranked higher
+by the same judges".
+Proximity to the previous issue leads to an index that is relatively
+close to the Lagrange index, but better adapted to the rank variable
+situation.
+
+
+\section{Cases of variables-on-intervals and interval-variables}
+\subsection{Variables-on-intervals}
+\subsubsection{Founding situation}
+
+For example, we seek to extract the following rule from a
+biometric data set and to estimate its quality: "if an individual weighs
+between $65$ and $70$~kg then in general he is between $1.70$ and
+$1.76$~m tall".
+A similar situation arises in the search for relationships between
+intervals of student performance in two different subjects.
+The more general situation is then expressed as follows: two real
+variables $a$ and $b$ take a certain number of values over 2 finite
+intervals $[a1,~ a2]$ and $[b1,~ b2]$. Let $A$ (resp. $B$) be all the
+values of $a$ (resp. $b$) observed over $[a1,~ a2]$ (resp. $[b1,~
+ b2]$).
+For example, here, $a$ represents the weights of a set of $n$ subjects and $b$ the heights of these same subjects.
+
+Two problems arise:
+\begin{enumerate}
+\item Can adjacent sub-intervals of $[a1,~ a2]$ (resp. $[b1,~ b2]$)
+ be defined so that the finest partition obtained best respects the
+ distribution of the values observed in $[a1,~ a2]$ (resp. $[b1,~ b2]$)?
+\item Can we find the respective partitions of $[a1,~ a2]$ and $[b1,~
+ b2]$ made up of unions of the previous adjacent sub-intervals,
+ partitions that maximize the average implication intensity of the
+ sub-intervals of one on the sub-intervals of the other belonging to
+ these partitions?
+\end{enumerate}
+
+We answer these two questions as part of our problem by choosing the
+criteria to optimize in order to satisfy the optimality expected in
+each case.
+To the first question, many solutions have been provided in other
+settings (for example, by~\cite{Lahaniera}).
+
+\subsubsection{First problem}
+
+We will look at the interval $[a1,~ a2]$ assuming it has a trivial
+initial partition of sub-intervals of the same length, but not
+necessarily of the same frequency distribution observed on these
+sub-intervals.
+Note $P_0 = \{A_{01},~ A_{02},~ ...,~ A_{0p}\}$, this partition in $p$
+sub-intervals.
+We try to obtain a partition of $[a1,~ a2]$ into $p$ sub-intervals
+$\{A_{q1},~ A_{q2},~ ...,~ A_{qp}\}$ in such a way that within each
+sub-interval there is good statistical homogeneity (low intra-class
+inertia) and that these sub-intervals have good mutual heterogeneity
+(high inter-class inertia).
+We know that if one of the criteria is satisfied, the other is
+necessarily satisfied (König-Huygens theorem).
+This will be done by adopting a method directly inspired by the
+dynamic clouds method developed by Edwin Diday~\cite{Diday} (see also
+\cite{Lebart}) and adapted to the current situation. This results in
+the optimal partition targeted.
+
+\subsubsection{Second problem}
+
+It is now assumed that the intervals $[a1,~ a2]$ and $[b1,~ b2]$ are
+provided with optimal partitions $P$ and $Q$, respectively, in the
+sense of the dynamic clouds.
+Let $p$ and $q$ be the respective numbers of sub-intervals composing
+$P$ and $Q$.
+From these two partitions, it is possible to generate $2^{p-1}$ and
+$2^{q-1}$ partitions obtained by iterated unions of adjacent
+sub-intervals of $P$ and $Q$\footnote{It is enough to consider the tree structure of which $A_1$ is the root, then to join it or not to $A_2$ which itself will or will not be joined to $A_3$, etc. There are therefore $2^{p-1}$ branches in this tree structure.} respectively.
+We calculate the respective intensities of implication of each
+sub-interval, whether or not combined with another of the first
+partition, on each sub-interval, whether or not combined with another
+of the second, and then the values of the intensities of the
+reciprocal implications.
+There are therefore a total of $2.2^{p-1}.2^{q-1}$ families of
+implication intensities, each of which requires the calculation of all
+the elements of a partition of $[a1,~ a2]$ on all the elements of one
+of the partitions of $[b1,~ b2]$ and vice versa.
+The optimality criterion is chosen as the geometric mean of the
+intensities of implication, the mean associated with each pair of
+partitions of elements, combined or not, defined inductively.
+We note the two maxima obtained (direct implication and its
+reciprocal) and we retain the two associated partitions by declaring
+that the implication of the variable-on-interval $a$ on the
+variable-on-interval $b$ is optimal when the interval $[a1,~ a2]$
+admits the partition corresponding to the first maximum and that the
+optimal reciprocal implication is satisfied for the partition of
+$[b1,~ b2]$ corresponding to the second maximum.
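+
+To fix ideas on the count of partitions, consider the hypothetical case $p=3$, with sub-intervals $A_1$, $A_2$, $A_3$. Each of the $p-1=2$ boundaries between adjacent sub-intervals is either kept or removed, which gives the $2^{2}=4$ partitions:
+$$\{A_1,~ A_2,~ A_3\}, \quad \{A_1\cup A_2,~ A_3\}, \quad \{A_1,~ A_2\cup A_3\}, \quad \{A_1\cup A_2\cup A_3\}$$
+With $q=3$ as well, there would be $2 \times 4 \times 4 = 32$ families of implication intensities to compare.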
+
+\subsection{Interval-variables}
+\subsubsection{Founding situation}
+Data are available from a population of $n$ individuals (each of which may
+itself be a set of individuals, e.g. a class of students)
+according to variables (e.g. grades over a year in French, math,
+physics,..., but also: weight, height, chest size,...).
+The values taken by these variables for each individual are intervals
+of positive real values.
+For example, individual $x$ gives the value $[12,~ 15.50]$ to the math
+score variable.
+E. Diday would speak on this subject of $p$ symbolic interval-valued
+variables defined on the population.
+
+
+We try to define an implication of intervals, relative to a variable
+$a$, which are themselves observed intervals, towards other similarly
+defined intervals and relative to another variable $b$.
+This will make it possible to measure the implicative, and therefore
+non-symmetric, association of certain interval(s) of the variable $a$
+with certain interval(s) of the variable $b$, as well as the
+reciprocal association from which the best one will be chosen for each
+pair of sub-intervals involved, as just described in §4.1.
+
+For example, it will be said that the sub-interval $[2, 5.5]$ of
+mathematics scores generally implies the sub-interval $[4.25, 7.5]$
+of physics scores, both of which belong to an optimal partition in
+terms of the explained variance of the respective value ranges $[1,
+ 18]$ and $[3, 20]$ taken in the population.
+Similarly, we will say that $[14.25, 17.80]$ in physics most often
+implies $[16.40, 18]$ in mathematics.
+
+
+\subsubsection{Algorithm}
+
+Following the approach of E. Diday and his collaborators, if the
+values taken according to the subjects by the variables $a$ and $b$
+are of a symbolic nature, in this case intervals of $\mathbb{R}^+$, it
+is possible to extend the above algorithms~\cite{Grasi}.
+For example, variable $a$ has weight intervals associated with it and
+variable $b$ has height intervals associated with it, due to
+inaccurate measurements.
+By combining the intervals $I_x$ and $J_x$ described by the subjects
+$x$ of $E$ according to each of the variables $a$ and $b$
+respectively, we obtain two intervals $I$ and $J$ covering all
+possible values of $a$ and $b$.
+On each of them a partition can be defined in a certain number of
+intervals respecting as above a certain optimality criterion.
+For this purpose, the intersections of intervals such as $I_x$ and
+$J_x$ with these partitions will be provided with a distribution
+taking into account the areas of the common parts.
+This distribution may be uniform or of another discrete or continuous
+type.
+We are thus brought back to the search for rules between two sets of
+variables-on-intervals that take, as previously in §4.1, their values
+on $[0,~ 1]$ from which we can search for optimal implications.
+
+
+\remark Whatever the type of variable considered, there is often a
+problem of overabundance of variables and therefore difficulty of
+representation.
+For this reason, we have defined an equivalence relationship on all
+variables that allows us to substitute a so-called leader variable for
+an equivalence class~\cite{Grask}.
+
+\section{Variations in the implication index q according to the 4 occurrences}
+
+In this paragraph, we examine the sensitivity of the implication index
+to disturbances in its parameters.
+
+\subsection{Stability of the implication index}
+To study the stability of the implication index $q$ is to examine its
+small variations in the vicinity of the $4$ observed integer values
+($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$).
+To do this, it is possible to perform different simulations by
+crossing these 4 integer variables on which $q$ depends~\cite{Grasx}.
+But let us consider these variables as variables with real values and
+$q$ as a function that is continuously differentiable with respect to these
+variables, which are themselves constrained to respect the inequalities: $0\leq
+n_a \leq n_b$ and $n_{a \wedge \overline{b}} \leq \inf\{n_a,~ n_b\}$ and
+$\sup\{n_a,~ n_b\} \leq n$.
+The function $q$ then defines a scalar and vector field on
+$\mathbb{R}^4$ as an affine and vector space on itself.
+In the likely hypothesis of an evolution of a nonchaotic process of
+data collection, it is then sufficient to examine the differential of
+$q$ with respect to these variables and to keep its restriction to the
+integer values of the parameters of the relationship $a \Rightarrow b$.
+The differential of $q$, in the sense of Fréchet's
+topology\footnote{Fréchet's topology allows $\mathbb{N}$ sections,
+ i.e. subsets of naturals of the form $\{n,~ n+1,~ n+2,~ ....\}$, to be
+ used as a filter base, while the usual topology on $\mathbb{R}$
+ allows real intervals for filters.
+ Thus continuity and derivability are perfectly defined and
+ operational concepts according to Fréchet's topology in the same way
+ as they are with the usual topology.}, is expressed as follows by
+the scalar product:
+
+\begin{equation}
+dq = \frac{\partial q}{\partial n}dn + \frac{\partial q}{\partial
+ n_a}dn_a + \frac{\partial q}{\partial n_b}dn_b + \frac{\partial
+ q}{\partial n_{a \wedge \overline{b}}}dn_{a \wedge \overline{b}} =
+grad~q.dM\footnote{By a mechanistic metaphor, we will say that $dq$ is
+ the elementary work of $q$ for a movement $dM$ (see chapter 14 of
+ this book).}
+\label{eq2.2}
+\end{equation}
+
+where $M$ is the point of coordinates $(n,~ n_a,~ n_b,~ n_{a \wedge
+ \overline{b}})$ of the scalar field $C$, $dM$ is the
+vector whose components are the differential increments of these occurrence
+variables, and $grad~ q$ is the vector whose components are the partial derivatives
+of $q$ with respect to these occurrence variables.
+
+The differential of the function $q$ therefore appears as the scalar product of its gradient and the increment $dM$ on the surface representing the variations of the function $q(n,~ n_a,~ n_b,~ n_{a \wedge
+ \overline{b}})$. Thus, the gradient of $q$ represents its own
+variations according to those of its components, the 4 cardinalities
+of the sets $E$, $A$, $B$ and $A\cap \overline{B}$. It
+indicates the direction and sense of growth or decrease of $q$ in
+the space of dimension 4. Remember that it is carried by the normal to
+the level surface $q~ =~ constant$.
+
+If we want to study how $q$ varies according to $ n_{\overline{b}}$,
+we just have to replace $n_b$ by $n-n_b$ and therefore change the sign
+of the derivative of $n_b$ in the partial derivative. In fact, the
+interest of this differential lies in estimating the increase
+(positive or negative) of $q$ that we note $\Delta q$ in relation to
+the respective variations $\Delta n$, $\Delta n_a$, $\Delta n_b$ and
+$\Delta n_{a \wedge
+ \overline{b}}$. So we have:
+
+
+$$\Delta q= \frac{\partial q}{\partial n} \Delta n + \frac{\partial
+ q}{\partial n_a} \Delta n_a + \frac{\partial
+ q}{\partial n_b} \Delta n_b + \frac{\partial
+ q}{\partial n_{a \wedge
+ \overline{b}}} \Delta n_{a \wedge
+ \overline{b}} +o(\Delta q)$$
+
+where $o(\Delta q)$ is a first-order infinitesimal.
+Let us examine the partial derivatives of $n_b$ and $n_{a \wedge
+ \overline{b}}$ the number of counter-examples. We get:
+
+\begin{equation}
+ \frac{\partial
+ q}{\partial n_b} = \frac{1}{2} n_{a \wedge
+ \overline{b}} (\frac{n_a}{n})^{-\frac{1}{2}} (n-n_b)^{-\frac{3}{2}}
+ + \frac{1}{2} (\frac{n_a}{n})^{\frac{1}{2}} (n-n_b)^{-\frac{1}{2}} >
+ 0
+ \label{eq2.3}
+\end{equation}
+
+
+\begin{equation}
+ \frac{\partial
+ q}{\partial n_{a \wedge
+ \overline{b}}} = \frac{1}{\sqrt{\frac{n_a n_{\overline{b}}}{n}}}
+ = \frac{1}{\sqrt{\frac{n_a (n-n_b)}{n}}} > 0
+ \label{eq2.4}
+\end{equation}
+
+
+Thus, if the increases $\Delta n_b$ and $\Delta n_{a \wedge
+ \overline{b}}$ are positive, the increase of $q(a,\overline{b})$ is
+also positive. This is interpreted as follows: if the number of
+examples of $b$ and the number of counter-examples of implication
+increase then the intensity of implication decreases for $n$ and $n_a$
+constant. In other words, this intensity of implication is maximum at
+observed values $n_b$ and $ n_{a \wedge
+ \overline{b}}$ and minimum at values $n_b+\Delta n_b$ and $n_{a \wedge
+ \overline{b}}+ \Delta n_{a \wedge
+ \overline{b}}$.
+
+If we examine the case where $n_a$ varies, we obtain the partial
+derivative of $q$ with respect to $n_a$ which is:
+
+\begin{equation}
+ \frac{\partial q}{\partial n_a} = -\frac{ n_{a \wedge \overline{b}}}{2
+ \sqrt{\frac{n_{\overline{b}}}{n}}}
+ \left(\frac{1}{n_a}\right)^{\frac{3}{2}}
+ -\frac{1}{2}\sqrt{\frac{n_{\overline{b}}}{n\, n_a}}<0
+ \label{eq2.5}
+ \end{equation}
+
+Thus, for variations of $n_a$ on $[0,~ n_b]$, the implication index function is always decreasing (and concave) with respect to $n_a$ and is therefore minimum for $n_a= n_b$. As a result, the intensity of implication is increasing and maximum for $n_a= n_b$.
+
+Note the partial derivative of $q$ with respect to $n$:
+
+$$\frac{\partial q}{\partial n} = \frac{1}{2\sqrt{n}} \left( n_{a
+ \wedge \overline{b}}+\frac{n_a n_{\overline{b}}}{n} \right)$$
+
+Consequently, if the other 3 parameters are constant, the implication
+index decreases as $\sqrt{n}$.
+The quality of implication therefore improves as $n$ grows, a
+property specific to the SIA compared to other indicators used in the
+literature~\cite{Grasab}.
+This property is in accordance with statistical and semantic
+expectations regarding the credit given to the frequency of
+observations.
+Since the partial derivatives of $q$ (at least one of them) are
+non-linear according to the variable parameters involved, we are
+dealing with a non-linear dynamic system\footnote{"Non-linear systems
+ are systems that are known to be deterministic but for which, in
+ general, nothing can be predicted because calculations cannot be
+ made"~\cite{Ekeland} p. 265.} with all the epistemological
+consequences that we will consider elsewhere.
+
+
+
+\subsection{Numerical example}
+In a first experiment, we observe the occurrences: $n = 100$, $n_a =
+20$, $n_b = 40$ (hence $n_{\overline{b}}=60$) and $ n_{a \wedge \overline{b}} = 4$.
+The application of formula (\ref{eq2.1}) gives $q(a,\overline{b}) = -2.309$.
+In a 2nd experiment, $n$ and $n_a$ are unchanged but the occurrences
+of $b$ and counter-examples $n_{a \wedge \overline{b}}$ increase by one unit.
+
+At the initial point of the space of the 4 variables, the only partial
+derivatives of interest here (with respect to $n_b$ and $n_{a
+ \wedge \overline{b}}$) have respectively the following values when
+applying formulas (\ref{eq2.3}) and (\ref{eq2.4}): $\frac{\partial
+ q}{\partial n_b} = 0.0385$ and $\frac{\partial q}{\partial n_{a
+ \wedge \overline{b}}} = 0.2887$.
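+
+These two values can be recovered by substituting $n=100$, $n_a=20$, $n-n_b=60$ and $n_{a \wedge \overline{b}}=4$ into formulas (\ref{eq2.3}) and (\ref{eq2.4}):
+$$\frac{\partial q}{\partial n_b} = \frac{1}{2}\times 4 \times (0.2)^{-\frac{1}{2}} \times 60^{-\frac{3}{2}} + \frac{1}{2} \times (0.2)^{\frac{1}{2}} \times 60^{-\frac{1}{2}} \approx 0.0096 + 0.0289 = 0.0385$$
+$$\frac{\partial q}{\partial n_{a \wedge \overline{b}}} = \frac{1}{\sqrt{\frac{20 \times 60}{100}}} = \frac{1}{\sqrt{12}} \approx 0.2887$$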
+
+As $\Delta n_b$, $\Delta n_{\overline{b}}$ and $\Delta n_{a
+ \wedge \overline{b}} $ are equal to 1, -1 and 1, then $\Delta q$ is
+equal to: $0.0385 + 0.2887 + o(\Delta q) = 0.3272 + o(\Delta q)$ and
+the approximate value of $q$ in the second experiment is $-2.309 +
+0.3272 + o(\Delta q)= -1.982 +o(\Delta q)$ using the first order
+expansion of $q$ (formula (\ref{eq2.2})).
+Moreover, the calculation of the new implication index $q$ at the point
+of the 2nd experiment gives, by the use of (\ref{eq2.1}): $-1.9795$, a
+value well approximated by the expansion of $q$.
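+
+For the record, the exact value comes from the occurrences of the 2nd experiment, $n_b=41$ (hence $n_{\overline{b}}=59$) and $n_{a \wedge \overline{b}}=5$:
+$$q(a,\overline{b}) = \frac{5-\frac{20 \times 59}{100}}{\sqrt{\frac{20 \times 59}{100}}} = \frac{-6.8}{\sqrt{11.8}} \approx -1.98$$
+The gap with the first-order estimate is of the order of $0.002$, which is small compared with $\Delta q \approx 0.33$.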
+
+
+
+\subsection{A first differential relationship of $\varphi$ as a function of function $q$}
+Let us consider the intensity of implication $\varphi$ as a function
+of $q(a,\overline{b})$:
+$$\varphi(q)=\frac{1}{\sqrt{2\pi}}\int_q^{\infty}e^{-\frac{t^2}{2}}dt$$
+We can then examine how $\varphi(q)$ varies when $q$ varies in the neighborhood of a given value $q(a,\overline{b})$, knowing how $q$ itself varies according to the 4 parameters that determine it. Differentiating with respect to the lower integration bound, we obtain:
+\begin{equation}
+ \frac{d\varphi}{dq}=-\frac{1}{\sqrt{2\pi}}e^{-\frac{q^2}{2}} < 0
+ \label{eq2.6}
+\end{equation}
+This confirms that the intensity increases when $q$ decreases, and the formula specifies the rate of variation, which allows us to study more precisely the variations of $\varphi$. Since the derivative of $\varphi$ with respect to $q$ is always negative, the function $\varphi$ is decreasing.
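+
+Combining (\ref{eq2.6}) with (\ref{eq2.4}) by the chain rule gives, as an illustration, the sensitivity of the intensity to the number of counter-examples:
+$$\frac{\partial \varphi}{\partial n_{a \wedge \overline{b}}} = \frac{d\varphi}{dq}\, \frac{\partial q}{\partial n_{a \wedge \overline{b}}} = -\frac{1}{\sqrt{2\pi}}\, e^{-\frac{q^2}{2}}\, \frac{1}{\sqrt{\frac{n_a n_{\overline{b}}}{n}}} < 0$$
+Each additional counter-example therefore lowers the intensity and, for fixed $n$, $n_a$ and $n_b$, all the more so as $q$ is close to $0$, where the Gaussian density is largest.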
+
+{\bf Numerical example}\\
+Taking the values of the occurrences observed in the 2 experiments
+mentioned above, we find, for $q = -2.309$, that the intensity
+of implication $\varphi(q)$ is equal to 0.992. Applying formula
+(\ref{eq2.6}), the derivative of $\varphi$ with respect to $q$ is
+$-0.02775$ and the negative increase in intensity is then $-0.02775
+\, \Delta q$ with $\Delta q = 0.3272$, i.e. about $-0.009$. The approximate first-order intensity is
+therefore $0.992-0.009$, or 0.983. However, the actual calculation
+of this intensity gives, for $q= -1.9795$, $\varphi(q) = 0.976$.
+
+
+
+\subsection{Examination of other indices}
+Unlike the core index $q$ and the intensity of implication, which
+measure quality through probability (see definition 2.3), the other
+most common indices are intended to be direct measures of quality.
+We will examine their respective sensitivities to changes in the
+parameters used to define these indices.
+We keep the notations adopted in paragraph 2.2 and select indices that
+are recalled in~\cite{Grasm},~\cite{Lencaa} and~\cite{Grast2}.
+
+\subsubsection{The Loevinger Index}
+
+It is an "ancestor" of the indices of
+implication~\cite{Loevinger}. This index, denoted $H(a,b)$, varies from
+1 to $-\infty$. It is defined by: $H(a,b) =1-\frac{n n_{a \wedge
+ \overline{b}}}{n_a n_{\overline{b}}}$. Its partial derivative with respect to the number of counter-examples is therefore:
+$$\frac{\partial H}{\partial n_{a \wedge \overline{b}}}=-\frac{n}{n_a n_{\overline{b}}}$$
+Thus this index is always decreasing with $n_{a \wedge
+ \overline{b}}$. If it is "close" to 1, implication is "almost"
+satisfied. But this index has the disadvantage, since it does not refer to a
+probability scale, of not providing a probability threshold and of being
+invariant under any dilation of $E$, $A$, $B$ and $A \cap \overline{B}$.
+
+
+\subsubsection{The Lift Index}
+
+It is expressed by: $l =\frac{n n_{a \wedge b}}{n_a n_b}$.
+This expression, linear with respect to the examples, can still be
+written to highlight the number of counter-examples:
+$$l =\frac{n (n_a - n_{a \wedge \overline{b}})}{n_a n_b}$$
+To study the sensitivity of $l$ to parameter variations, we use:
+$$\frac{\partial l}{\partial n_{a \wedge \overline{b}} } =
+-\frac{n}{n_a n_b}$$
+Thus, the rate of variation of the Lift index is independent of
+the number of counter-examples.
+It is a constant that depends only on $n$ and on the occurrences of $a$ and $b$. Therefore, $l$ decreases when the number of counter-examples increases, which semantically is acceptable, but the rate of decrease does not depend on the rate of growth of $n_{a \wedge \overline{b}}$.
+
+\subsubsection{Confidence}
+
+This index is the best known and most widely used, thanks to the
+audience it gained through an English-language publication~\cite{Agrawal}.
+It is at the origin of several other commonly used indices, which are only variants satisfying this or that semantic requirement. Moreover, it is simple and can be interpreted easily and immediately.
+$$c=\frac{n_{a \wedge b}}{n_a} = 1-\frac{n_{a \wedge \overline{b}}}{n_a}$$
+
+The first form, linear with respect to the examples, independent of
+$n_b$, is interpreted as a conditional frequency of the examples of
+$b$ when $a$ is known.
+The sensitivity of this index to variations in the occurrence of
+counter-examples is read through the partial derivative:
+$$\frac{\partial c}{\partial n_{a \wedge \overline{b}} } =
+-\frac{1}{n_a }$$
+
+
+Consequently, confidence increases when $n_{a \wedge \overline{b}}$
+decreases, which is semantically acceptable, but the rate of variation
+is constant: it is independent of the rate of decrease of this number and of
+the variations of $n$ and $n_b$.
+This property seems counter-intuitive.
+The gradient of $c$ is expressed only in terms of $n_{a \wedge
+ \overline{b}}$ and $n_a$: $\displaystyle \binom{ -\frac{1}{n_a}}{\frac{n_{a \wedge \overline{b}}}{n_a^2}}$
+
+This may also appear to be a restriction on the role of parameters in
+expressing the sensitivity of the index.
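+
+As a numerical illustration, with counts chosen only for the example (they are not taken from a data set), take $n_a = 25$ and $n_{a \wedge \overline{b}} = 3$, then add one counter-example:
+$$c = 1-\frac{3}{25} = 0.88, \qquad c' = 1-\frac{4}{25} = 0.84, \qquad c'-c = -0.04 = -\frac{1}{n_a}.$$
+The same decrease of $0.04$ is obtained whether $n = 100$ or $n = 10^4$, and whatever the value of $n_b$, which illustrates the insensitivity of $c$ to these parameters.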
+
+\section{Gradient field, implicative field}
+We highlight here the existence of fields generated by the variables
+of the corpus.
+
+\subsection{Existence of a gradient field}
+Just as, in our Newtonian physical space, each material object emits
+a gravitational field, we can consider that a field exists
+around each variable.
+For example, the variable $a$ generates a scalar field whose value at
+$b$ is maximum and equal to the intensity of implication or the
+implication index $q(a,\overline{b})$.
+Its action spreads in $V$ according to differential laws, as J.M. Leblond
+says in~\cite{Leblond}, p.~242.
+
+Let us consider the space $E$ of dimension 4 where the coordinates of
+the points $M$ are the parameters relative to the binary variables $a$
+and $b$, i.e. ($n$, $n_a$, $n_b$, $n_{a\wedge \overline{b}}$). $q(a,\overline{b})$ is the realization of a scalar field, as a mapping from $\mathbb{R}^4$ to $\mathbb{R}$ (embedding $\mathbb{N}^4$ in $\mathbb{R}^4$).
+For the vector grad $q$, whose components are the partial derivatives of $q$
+with respect to the variables $n$, $n_a$, $n_b$, $n_{a\wedge
+ \overline{b}}$, to define a gradient field (a particular vector
+field that we will also call the implicative field), it must satisfy the
+Schwarz criterion for an exact total differential, i.e.:
+
+$$\frac{\partial}{\partial n_{a\wedge \overline{b}}}\left(
+\frac{\partial q}{\partial n_b} \right) =\frac{\partial}{\partial n_b}\left(
+\frac{\partial q}{\partial n_{a\wedge \overline{b}}} \right) $$
+and the same for the other variables taken in pairs. Now, we have,
+through formulas (\ref{eq2.3}) and (\ref{eq2.4}):
+
+$$ \frac{\partial}{\partial n_{a \wedge \overline{b}}} \left( \frac{\partial q}{\partial n_b} \right) = \frac{1}{2} \left( \frac{n_a}{n}\right)^{-\frac{1}{2}} \left( \frac{n_{\overline{b}}}{n}\right)^{-\frac{3}{2}} = \frac{\partial}{\partial n_b}\left(
+\frac{\partial q}{\partial n_{a\wedge \overline{b}}} \right)$$
+
+Thus, to the vector field $C$ = ($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$) of $E$, the nature of which we will specify, corresponds a gradient field $G$ which is said to be derived from the {\bf potential} $q$.
+The gradient grad $q$ is therefore the vector that represents the spatial variation of the field intensity.
+It is directed from low field values to higher values. By following the gradient at each point, we follow the increase in the intensity of the implicative field in space and, in a way, the speed with which it changes as a result of the variation of one or more parameters.
+
+For example, if we fix 3 of the parameters $n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$ given by the realization of the couple ($a$, $b$), the gradient is a vector whose direction indicates the growth or decrease of $q$, and therefore the decrease or increase of $|q|$ and, consequently, of $\varphi$, under variations of the 4th parameter.
+We have indicated this above by interpreting formula (\ref{eq2.5}).
+
+
+\subsection{Level or equipotential lines}
+An equipotential (or level) line or surface in the $C$ field is a curve of $E$ along which or on which a variable point $M$ maintains the same value of the potential $q$ (e.g. isothermal lines on the globe or level lines on an IGN map).
+
+The equation of this surface\footnote{In differential geometry, it seems that this surface is a (quasi-)differentiable manifold with boundary, compact, homeomorphic to the closed box formed by the intervals of variation of the 4 parameters. Note that the point whose component $n_b$ is equal to $n$ (so that $n_{\overline{b}} = 0$) is a singular point (``catastrophic'' in René Thom's sense) of the surface, and the potential $q$ is not differentiable at this point. Everywhere else the surface is differentiable and all points are regular. If time, for example, parametrizes the observations of the process of which ($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$) is a realization, then to each instant corresponds a morphological fiber of the process, represented by such a surface in space-time.} is, of course:
+$$ q(a,\overline{b}) - \frac{n_{a \wedge \overline{b}}-
+ \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}} = 0$$
+
+
+Therefore, on such a curve, the scalar product $grad~ q. dM$ is zero.
+This is interpreted as the orthogonality of the gradient to the tangent line or tangent hyperplane, i.e. to the equipotential line or surface.
+In a kinematic interpretation of our problem, the velocity of $M$ along its path on the equipotential surface is orthogonal to the gradient at $M$.
+
+As an illustration, for a potential $F$ depending on only 2 variables, Figure~\ref{chap2fig2} shows the direction of the gradient, orthogonal to the different equipotential surfaces: the potential $F$ is constant on each of them and takes values from $F=7$ to $F= 10$ across them.
+
+\begin{figure}[htbp]
+ \centering
+\includegraphics[scale=1]{chap2fig2}
+ \caption{Illustration of potential of 2 variables}
+\label{chap2fig2} % Give a unique label
+\end{figure}
+
+It is possible, in the case of the potential $q$, to build equipotential surfaces as above (two-dimensional for ease of representation).
+It is understandable that the more intense the field is, the tighter the surfaces are. For a given value of $q$, 3 variables are fixed, for example $n$, $n_a$, $n_b$, together with a value of $q$ compatible with the field constraints. For instance, let $n = 10^4$, $n_a = 1600 \leq n_b = 3600$ and $q = -2$, i.e. $|q| = 2$. We then find $n_{a \wedge \overline{b}}= 960$ using formula~(\ref{eq2.1}).
+The points ($10^4$, $1600$, $5100$, $728$) and ($100$, $25$, $64$, $3$) also belong to this surface and to the same equipotential curve.
+The point ($10^4$, $1600$, $3600$, $928$) belongs to the equipotential curve $q=-3$. In fact, on this entire surface, we obtain a kind of homeostasis of the intensity of implication.
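+
+As a check, assuming the coordinates are read in the order ($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$) with $n_{\overline{b}} = n - n_b$, the definition of $q$ gives for three of these points:
+\begin{eqnarray*}
+(100,\ 25,\ 64,\ 3) &:& q = \frac{3 - \frac{25 \times 36}{100}}{\sqrt{\frac{25 \times 36}{100}}} = \frac{3-9}{3} = -2\\
+(10^4,\ 1600,\ 5100,\ 728) &:& q = \frac{728 - 784}{\sqrt{784}} = \frac{-56}{28} = -2\\
+(10^4,\ 1600,\ 3600,\ 928) &:& q = \frac{928 - 1024}{\sqrt{1024}} = \frac{-96}{32} = -3
+\end{eqnarray*}
+The first two points therefore lie on the same equipotential $q=-2$ although their cardinalities differ by two orders of magnitude, while the third lies on $q=-3$.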
+
+The expression of the function $q$ of the variable shows that it is convex.
+This property proves that the segment of points $t.M_1 + (1-t).M_2$, for $t \in [0,1]$ which connects two points $M_1$ and $M_2$ of the same equipotential line is entirely contained in its convexity.
+Figure~\ref{chap2fig3} shows two adjacent equipotential surfaces $\Sigma_1$ and $\Sigma_2$ in the implicative field, corresponding to two values $q_1$ and $q_2$ of the potential.
+At point $M_1$ the scalar field therefore takes the value $q_1$. $M_2$ is the intersection of the normal from $M_1$ with $\Sigma_2$. Given the direction of the normal vector $\vec{n}$, the difference $\delta = q_2 - q_1$, i.e. the variation of the field when we go from $\Sigma_1$ to $\Sigma_2$, is then equal to the opposite of the norm of the gradient of $q$ at $M_1$, which is $\frac{\partial q}{\partial n}$ if $n_a$, $n_b$ and $n_{a \wedge \overline{b}}$ are fixed.
+
+\begin{figure}[htbp]
+ \centering
+\includegraphics[scale=1]{chap2fig3}
+ \caption{Illustration of equipotential surfaces}
+\label{chap2fig3} % Give a unique label
+\end{figure}
+
+Thus, the space $E$ can be foliated by equipotential surfaces corresponding to successive values of $q$, obtained by varying the cardinalities ($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$).
+This situation corresponds to the one envisaged in the SIA modeling.
+Fixing $n$, $n_a$ and $n_b$, we consider the random sets $X$ and $Y$ of the same cardinalities as $A$ ($n_a$) and $B$ ($n_b$), for which the cardinality of $X \cap \overline{Y}$ follows a Poisson law or a binomial law, according to the choice of the model.
+The different gradient fields, real ``lines of force'', associated with them are orthogonal to the surfaces defined by the corresponding values of $Q$.
+This reminds us, in the theoretical framework of potential, of the premonitory metaphor of ``implicative flow'' that we expressed in~\cite{Grase} and that we will discuss again in Chapter 14 of the book.
+Behind this notion we can imagine a transport of information of variable intensity in a causal universe.
+We illustrate this metaphor with the study of the properties of the two-sheeted implicative cone (see §2.8).
+Moreover, and intuitively, the quality of the implication $a\Rightarrow b$ is all the better as the equipotential surface $C$ of the contingency covers more of the random equipotential surfaces depending on the random variable.
+Let us recall the relationship that links the potential $q$ to the intensity:
+$$\varphi(a,b) =\frac{1}{\sqrt{2\pi}}\int_{q(a,\overline{b})}^{\infty}e^{-\frac{t^2}{2}} dt$$
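+
+For orientation, standard values of the normal distribution function give the following orders of magnitude (rounded): $q=0$ gives $\varphi = 0.5$, $q=-2$ gives $\varphi \approx 0.977$ and $q=-3$ gives $\varphi \approx 0.9987$.
+Since the integral is a strictly decreasing function of its lower bound, $\varphi$ increases when $q$ decreases, and each value of $q$ corresponds to exactly one value of $\varphi$.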
+
+\noindent {\bf Remark 1}\\
+It can be seen that the intensity is also constant on any equipotential surface of $q$.
+The surface portions generated by $q$ and by $\varphi$ are even in one-to-one correspondence.
+In intuitive terms, we can say that when one ``swells'' the other ``deflates''.\\
+
+\noindent {\bf Remark 2}\\
+Let us note once again a particularity of the intensity of implication.
+While the surfaces generated by the variations of the 4 parameters of the data are not invariant under a common dilation of the parameters, those associated with the indices cited in §2.4 are invariant and have the same undifferentiated geometric shape.
+
+\section{Implication-inclusion}
+\subsection{Foundational and problematic situation}
+Three reasons led us to improve the model formalized by the intensity of implication:
+\begin{itemize}
+\item when the size of the samples processed, and in particular that of $E$, increases (to around a thousand or more), the intensity $\varphi(a,b)$ tends to be no longer sufficiently discriminating because its values can be very close to 1, while the inclusion whose quality it seeks to model is far from being satisfied (phenomenon reported in~\cite{Bodina}, which deals with large student populations through international surveys);
+\item the previous quasi-implication model essentially uses the measure of the strength of the rule $a \Rightarrow b$.
+ However, taking into account the concomitant rule $\neg b \Rightarrow \neg a$ (the contrapositive of the implication) is useful or even essential to reinforce the affirmation of a good quality of the quasi-implicative, possibly quasi-causal, relationship of $a$ over $b$\footnote{This phenomenon is reported by Y. Kodratoff in~\cite{Kodratoff}.}.
+ At the same time, it could make it possible to correct the difficulty mentioned above (if $A$ and $B$ are small compared to $E$, their complements will be large, and vice versa);
+\item the overcoming of Hempel's paradox (see Appendix 3 of this chapter).
+ \end{itemize}
+
+\subsection{An inclusion index}
+
+The solution\footnote{J. Blanchard provides in~\cite{Blanchardb} an answer to this problem by measuring the ``equilibrium gap''.} we provide uses both the intensity of implication and another index that reflects the asymmetry between situations $S_1 = (a \wedge b)$ and $S_1' = (a \wedge \neg b)$ (resp. $S_2 = (\neg a \wedge \neg b)$ and $S_2' = (a \wedge \neg b)$), in favour of the first named.
+The relative scarcity of instances that contradict the rule and its contrapositive is therefore fundamental.
+Moreover, the number of counter-examples $n_{a \wedge \overline{b}}$ to $a \Rightarrow b$ is also the number of counter-examples to its contrapositive.
+To account for the uncertainty associated with a possible bet on belonging to one of the two situations ($S_1$ or $S_1'$, resp. $S_2$ or $S_2'$), we therefore refer to Shannon's concept of entropy~\cite{Shannon}:
+$$H(b\mid a) = - \frac{n_{a\wedge b}}{n_a}\log_2 \frac{n_{a\wedge b}}{n_a} - \frac{n_{a\wedge \overline{b}}}{n_a}\log_2 \frac{n_{a\wedge \overline{b}}}{n_a}$$
+is the conditional entropy relating to the cells $(a \wedge b)$ and $(a \wedge \neg b)$ when $a$ is realized, and
+
+$$H(\overline{a}\mid \overline{b}) = - \frac{n_{a\wedge \overline{b}}}{n_{\overline{b}}}\log_2 \frac{n_{a\wedge \overline{b}}}{n_{\overline{b}}} - \frac{n_{\overline{a} \wedge \overline{b}}}{n_{\overline{b}}}\log_2 \frac{n_{\overline{a} \wedge \overline{b}}}{n_{\overline{b}}}$$
+
+is the conditional entropy relative to the cells $(\neg a \wedge \neg b)$ and $(a \wedge \neg b)$ when $\neg b$ is realized.
+
+These entropies, with values in $[0,1]$, should therefore be simultaneously low, and the asymmetries between situations $S_1$ and $S_1'$ (resp. $S_2$ and $S_2'$) simultaneously strong, if one wishes to have a good criterion for the inclusion of $A$ in $B$.
+Indeed, entropies represent the average uncertainty of experiments that consist in observing whether $b$ is realized (resp. $\neg a$ is realized) when $a$ (resp. $\neg b$) is observed. The complement to 1 of this uncertainty therefore represents the average information collected by performing these experiments. The greater this information, the stronger the guarantee of the quality of the implication and of its contrapositive. We must now adapt this entropic numerical criterion to the model expected in the different cardinal situations.
+For the model to have the expected meaning, it must satisfy, in our opinion, the following epistemological constraints:
+
+\begin{enumerate}
+\item It shall integrate the entropy values and, to contrast them, use for example the squares of these values.
+\item As this square varies from 0 to 1, and in order to denote the imbalance and therefore the inclusion, in opposition to entropy, the value retained will be the complement to 1 of the square, as long as the number of counter-examples is less than half of the observations of $a$ (resp. of $\neg b$).
+ Beyond these values, as the implications no longer have an inclusive meaning, the criterion will be assigned the value 0.
+\item In order to take into account the two pieces of information specific to $a\Rightarrow b$ and $\neg b \Rightarrow \neg a$, the product will account for the simultaneous quality of the values retained.
+The product has the property of vanishing as soon as one of its terms vanishes, i.e. as soon as this quality is lost.
+\item Finally, since the product is of degree 4 with respect to entropy, its fourth root will be of the same degree as entropy.
+\end{enumerate}
+
+Let $\alpha=\frac{n_a}{n}$ be the frequency of $a$ and $\overline{\beta}=\frac{n_{\overline{b}}}{n}$ be the frequency of $\neg b$.
+Let $t=\frac{n_{a \wedge \overline{b}}}{n}$ be the frequency of counter-examples. The two terms that express the respective qualities of the implication and of its contrapositive are:
+
+\begin{eqnarray*}
+ h_1(t) = H(b\mid a) = - (1-\frac{t}{\alpha}) \log_2 (1-\frac{t}{\alpha}) - \frac{t}{\alpha} \log_2 \frac{t}{\alpha} & \mbox{ if }t \in [0,\frac{\alpha}{2}[\\
+ h_1(t) = 1 & \mbox{ if }t \in [\frac{\alpha}{2},\alpha]\\
+ h_2(t)= H(\overline{a}\mid \overline{b}) = - (1-\frac{t}{\overline{\beta}}) \log_2 (1-\frac{t}{\overline{\beta}}) - \frac{t}{\overline{\beta}} \log_2 \frac{t}{\overline{\beta}} & \mbox{ if }t \in [0,\frac{\overline{\beta}}{2}[\\
+ h_2(t)= 1 & \mbox{ if }t \in [\frac{\overline{\beta}}{2},\overline{\beta}]
+\end{eqnarray*}
+Hence the definition for determining the entropic criterion:
+\definition: The inclusion index of $A$, support of $a$, in $B$, support of $b$, is the number:
+$$i(a,b) = \left[ (1-h_1^2(t)) (1-h_2^2(t)) \right]^{\frac{1}{4}}$$
+
+which integrates the information provided by the realization of a small number of counter-examples, on the one hand to the rule $a \Rightarrow b$ and, on the other hand, to the rule $\neg b \Rightarrow \neg a$.
+
+\subsection{The implication-inclusion index}
+
+The intensity of implication-inclusion (or entropic intensity), a new measure of inductive quality, is the number:
+
+$$\psi(a,b)= \left[ i(a,b).\varphi(a,b) \right]^{\frac{1}{2}}$$
+which integrates both statistical surprise and inclusive quality.
+
+The function $\psi$ of the variable $t$ admits a representation that has the shape indicated in Figure~\ref{chap2fig4}, for $n_a$ and $n_b$ fixed.
+Note in this figure the difference in the behaviour of the function with respect to the conditional probability $P(B\mid A)$, a fundamental index of other rule measurement models, for example in~\cite{Agrawal}.
+In addition to its linear, and therefore not very nuanced, nature, this probability leads to a measure that decreases too quickly from the first counter-examples and then resists too long when they become numerous.
+
+
+\begin{figure}[htbp]
+ \centering
+\includegraphics[scale=0.5]{chap2fig4.png}
+\caption{Example of implication-inclusion.}
+
+\label{chap2fig4}
+\end{figure}
+
+In Figure~\ref{chap2fig4}, it can be seen that this representation of the continuous function of $t$ reflects the expected properties of the inclusion criterion:
+\begin{itemize}
+\item ``slow reaction'' to the first counter-examples (noise resistance),
+\item ``acceleration'' of the rejection of inclusion close to the balance, i.e. $t = \frac{n_a}{2n}$,
+\item rejection beyond $\frac{n_a}{2n}$, which the intensity of implication $\varphi(a,b)$ did not ensure.
+\end{itemize}
+
+\noindent Example 1\\
+\begin{tabular}{|c|c|c|c|}\hline
+ & $b$ & $\overline{b}$ & margin\\ \hline
+ $a$ & 200 & 400& 600 \\ \hline
+ $\overline{a}$ & 600 & 2800& 3400 \\ \hline
+ margin & 800 & 3200& 4000 \\ \hline
+\end{tabular}
+\\
+\\
+In Example 1, the intensity of implication is $\varphi(a,b)=0.9999$ (with $q(a,\overline{b})=-3.65$).
+ The entropic values of the experiment are $h_1=1$ (the counter-examples exceed half of the occurrences of $a$) and $h_2=0.544$.
+ The value of the moderator coefficient is therefore $i(a,b)=0$.
+ Hence, $\psi(a,b)=0$ whereas $P(B\mid A)=0.33$.
+Thus, the ``entropic'' functions ``moderate'' the intensity of implication in this case where inclusion is poor.
+\\
+\\
+\noindent Example 2\\
+ \begin{tabular}{|c|c|c|c|}\hline
+ & $b$ & $\overline{b}$ & margin\\ \hline
+ $a$ & 400 & 200& 600 \\ \hline
+ $\overline{a}$ & 1000 & 2400& 3400 \\ \hline
+ margin & 1400 & 2600& 4000 \\ \hline
+ \end{tabular}
+ \\
+ \\
+ In Example 2, the intensity of implication is 1 (for $q(a,\overline{b}) = - 9.62$).
+ The entropic values of the experiment are $h_1 = 0.918$ and $h_2 = 0.391$.
+ The value of the moderator coefficient is therefore $i(a,b) = 0.6035$.
+ As a result $\psi(a,b) = 0.777$ whereas $P(B \mid A) = 0.6666$.
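+ These values can be recovered step by step from the table and the definitions above (rounded; a sketch of the arithmetic only). With $n_a = 600$, $n_{\overline{b}} = 2600$ and $n_{a \wedge \overline{b}} = 200$:
+$$\frac{t}{\alpha} = \frac{200}{600} = \frac{1}{3}, \qquad h_1 = -\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3} \approx 0.918,$$
+$$\frac{t}{\overline{\beta}} = \frac{200}{2600} \approx 0.077, \qquad h_2 \approx 0.391,$$
+$$i(a,b) = \left[ (1-0.918^2)(1-0.391^2) \right]^{\frac{1}{4}} \approx \left[ 0.157 \times 0.847 \right]^{\frac{1}{4}} \approx 0.604, \qquad \psi(a,b) \approx \sqrt{0.604 \times 1} \approx 0.777 .$$
+ Both ratios are below $\frac{1}{2}$, so neither entropy is set to 1.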
+ \\
+ \\
+{\bf Remark}
+ \noindent The correspondence between $\varphi(a,b)$ and $\psi(a,b)$ is not monotonic, as shown in the following example:
+
+\begin{tabular}{|c|c|c|c|}\hline
+ & $b$ & $\overline{b}$ & margin\\ \hline
+ $a$ & 40 & 20& 60 \\ \hline
+ $\overline{a}$ & 60 & 280& 340 \\ \hline
+ margin & 100 & 300& 400 \\ \hline
+\end{tabular}
+\\
+Thus, while $\varphi(a,b)$ decreased from Example 2 to this example, $i(a,b)$ increased, as did $\psi(a,b)$. The opposite situation is, however, the most frequent.
+Note that in both cases, the conditional probability does not change.
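+ Indeed, applying the same definitions to this last table ($n = 400$, $n_a = 60$, $n_{\overline{b}} = 300$, $n_{a \wedge \overline{b}} = 20$) gives, up to rounding:
+$$q(a,\overline{b}) = \frac{20-45}{\sqrt{45}} \approx -3.73, \qquad h_1 \approx 0.918, \qquad h_2 \approx 0.353, \qquad i(a,b) \approx 0.609, \qquad \psi(a,b) \approx 0.780,$$
+ to be compared with $i(a,b) \approx 0.604$ and $\psi(a,b) \approx 0.777$ in Example 2, while $P(B \mid A) = \frac{40}{60} = \frac{400}{600}$ in both tables.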
+\\
+\\
+{\bf Remark}
+\noindent We refer to~\cite{Lencaa} for a very detailed comparative study of association indices for binary variables.
+In particular, the classical and entropic (inclusion) intensities of implication presented in this article are compared with other indices from a ``user'' point of view.
+
+\section{Implication graph}
+\subsection{Problem statement}
+
+At the end of the calculations of the intensities of implication in both the classical and entropic models, we have a $p \times p$ table that crosses the $p$ variables with each other, whatever their nature, and whose elements are the values of these intensities of implication, numbers in the interval $[0,~1]$.
+It must be noted that the underlying structure of all these variables is far from explicit and remains largely imperceptible.
+The user remains blind in front of such a square table of size $p^2$:
+they cannot simultaneously take in the multiple possible chains of rules that underlie the overall structure of the $p$ variables.
+In order to facilitate a clearer extraction of the rules and to examine their structure, we have associated with this table, for a given intensity threshold, a directed graph without cycles, weighted by the intensities of implication, whose complexity of representation the user can control by setting the threshold for the implicative quality of the rules taken into account.
+Each arc in this graph represents a rule: if $n_a < n_b$, the arc $a \rightarrow b$ represents the rule $a \Rightarrow b$; if $n_a = n_b$, then the arc $a \leftrightarrow b$ will represent the double rule $a \Leftrightarrow b$, in other words, the equivalence between these two variables.
+When the threshold of intensity of implication varies, the number of arcs obviously varies in the opposite direction: for a threshold set at $0.95$, the number of arcs is less than or equal to the number of arcs in the graph at threshold $0.90$. We will discuss this further below.
+
+\subsection{Algorithm}
+
+The relationship defined by statistical implication, while reflexive and not symmetrical, is obviously not transitive, like induction and unlike deduction.
+However, we want it to model the partial relationship between two variables (the successes in our initial example).
+By convention, if $a \Rightarrow b$ and $b \Rightarrow c$, we will accept the transitive closure $a \Rightarrow c$ only if $\psi(a,c) \geq 0.5$, i.e. if the implicative relationship of $a$ to $c$ is better than neutrality, which emphasizes the dependence between $a$ and $c$.
+
+
+{\bf Proposal:} By convention, if $a \Rightarrow b$ and $b \Rightarrow c$, there is a transitive closure $a \Rightarrow c$ if and only if $\psi(a,c) \geq 0.5$, i.e. if the implicative relationship of $a$ over $c$, which reflects a certain dependence between $a$ and $c$, is better than its refutation.
+Note that for any pair of variables $(x;~ y)$, the arc $x \rightarrow y$ is weighted by the intensity of implication of $(x,y)$.
+\\
+Let us take a formal example by assuming that, among the 5 variables $a$, $b$, $c$, $d$, and $e$, the following rules hold at a threshold above $0.5$: $c \Rightarrow a$, $c \Rightarrow e$, $c \Rightarrow b$, $d \Rightarrow a$, $d \Rightarrow e$, $a \Rightarrow b$ and $a \Rightarrow e$.
+
+This set of relationships can then be translated, numerically and graphically, into the following table and graph:
+
+\begin{tabular}{|C{0.5cm}|c|c|c|c|c|}\hline
+\hspace{-0.5cm}\turn{45}{$\Rightarrow$} & $a$ & $b$ & $c$ & $d$ & $e$\\ \hline
+$a$ & & 0.97& & & 0.73 \\ \hline
+$b$ & & & & & \\ \hline
+ $c$ & 0.82 & 0.975& & & 0.82 \\ \hline
+ $d$ & 0.78 & & & & 0.92 \\ \hline
+ $e$ & & & & & \\ \hline
+\end{tabular}
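+
+Reading this table at different thresholds illustrates how the graph thins out: all 7 arcs are kept at threshold $0.70$; at threshold $0.80$ the arcs $a \rightarrow e$ ($0.73$) and $d \rightarrow a$ ($0.78$) disappear, leaving 5 arcs; at threshold $0.95$ only $a \rightarrow b$ ($0.97$) and $c \rightarrow b$ ($0.975$) remain.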
+
+\begin{figure}[htbp]
+ \centering
+\includegraphics[scale=1]{chap2fig5.png}
+\caption{Implication graph corresponding to the previous example.}
+
+\label{chap2fig5}
+\end{figure}
+
+One of the difficulties related to the graphical representation is that the graph is not planar.
+The algorithm that constructs it must take this into account and, in particular, must ``straighten'' the paths of the graph in order to make it acceptably readable for the expert who will analyze it.
+
+The number of arcs in the graph can be reduced (or increased) if we raise (or lower) the acceptance threshold of the rules, i.e. the level of confidence in the selected rules.
+Correlatively, arcs can appear or disappear depending on the variations of the threshold.
+Let us recall that this graph is necessarily without cycles, and that it is not a lattice since, for example, the variable $a$ does not imply the variable ($a$ or $\neg a$) whose support is $E$.
+A fortiori, it cannot be a Galois lattice.
+Options of the CHIC software for automatic data processing with SIA make it possible to delete variables at will, to move their image in the graph in order to uncross the arcs, or to focus on certain variables called vertices of a kind of ``cone'' whose two sheets are made up respectively of the ``parent'' variables and the ``child'' variables of this vertex variable.
+We refer to the ends of the arcs as ``nodes''. A node in a given graph carries a single variable or a conjunction of variables.
+The passage from a node $S_1$ to a node $S_2$ is also called a ``transition'', which is represented by an arc in the graph.
+The upper sheet of the cone whose vertex is the variable $a$, called the nodal variable, is made up of the ``parents'' of $a$, i.e., in the ``causal'' sense, the causes of $a$; the lower sheet, on the other hand, is made up of the ``children'' of $a$ and therefore, still in the causal sense, the consequences or effects of $a$.
+The expert in the field analysed here must be particularly interested in these configurations, which are rich in information.
+See, for example,~\cite{Lahanierc} and the two implicative cones below (i.e. Figures~\ref{chap2fig6} and \ref{chap2fig7}).
+
+\begin{figure}[htbp]
+ \centering
+\includegraphics[scale=0.75]{chap2fig6.png}
+\caption{Implicative cone.}
+
+\label{chap2fig6}
+\end{figure}
+
+\begin{figure}[htbp]
+ \centering
+\includegraphics[scale=0.75]{chap2fig7.png}
+\caption{Implicative cone centered on a variable.}
+
+\label{chap2fig7}
+\end{figure}
+
+
+\section{Reduction in the number of variables}
+\subsection{Motivation}
+
+
+As soon as the number of variables becomes excessive, most of the available techniques become impractical\footnote{This paragraph is strongly inspired by paper~\cite{Grask}.}.
+In particular, when an implicative analysis is carried out by calculating association rules~\cite{Agrawal}, the number of rules discovered undergoes a combinatorial explosion with the number of variables, and quickly becomes inextricable for a decision-maker as soon as conjunctions of variables are requested.
+In this context, it is necessary to make a preliminary reduction in the number of variables.
+
+Thus,~\cite{Ritschard} proposed an efficient heuristic to reduce both the number of rows and columns in a table, using an association measure as a quasi-optimal criterion for controlling the heuristic.
+However, to our knowledge, in the various other research studies, the type of situation at the origin of the need to group rows or columns is not taken into account in the reduction criteria, whether the analyst's problem and aim are the search for similarity, dissimilarity, implication, etc., between variables.
+
+Also, to the extent that there are very similar variables in the sense of statistical implication, it might be appropriate to replace these variables with a single one, their leader, representing an equivalence class of variables that are similar for implicative purposes.
+We therefore propose, following the example of what is done to define the notion of quasi-implication, to define a notion of quasi-equivalence between variables, in order to build classes from which we will extract a leader.
+We will illustrate this with an example.
+Then, we will consider the possibility of using a genetic algorithm to optimize the choice of the representative for each quasi-equivalence class.
+
+\subsection{Definition of quasi-equivalence}
+
+Two binary variables $a$ and $b$ are logically equivalent for the SIA when the two quasi-implications $a \Rightarrow b$ and $b \Rightarrow a$ are simultaneously satisfied at a given threshold.
+We have developed criteria to assess the quality of a quasi-implication: one is the statistical surprise based on the likelihood of the relationship in the sense of~\cite{Lerman}, the other is the entropic form of quasi-inclusion~\cite{Grash2} which is presented in this chapter (§7).
+
+According to the first criterion, we could say that two variables $a$ and $b$ are quasi-equivalent when the intensity of implication $\varphi(a,b)$ of $a\Rightarrow b$ is little different from that of $b \Rightarrow a$. However, for large populations (several thousands), this criterion is no longer sufficiently discriminating to validate inclusion.
+
+According to the second criterion, an entropic measure of the imbalance between the numbers $n_{a \wedge b}$ (individuals who satisfy $a$ and $b$) and $n_{a \wedge \overline{b}} $ (individuals who satisfy $a$ and $\neg b$, counter-examples to the implication $a\Rightarrow b$) is used to indicate the quality of the implication $a\Rightarrow b$, on the one hand, and between the numbers $n_{a \wedge b}$ and $n_{\overline{a} \wedge b}$ to assess the quality of the reciprocal implication $b\Rightarrow a$, on the other.
+
+
+Here we will use a method comparable to that used in Chapter 3 to define the entropic implication index.
+
+Denoting by $n_a$ and $n_b$ the respective numbers of occurrences of $a$ and $b$, the imbalance of the rule $a\Rightarrow b$ is measured by a conditional entropy $K(b \mid a=1)$, and that of $b\Rightarrow a$ by $K(a \mid b=1)$ with:
+
+
+\begin{eqnarray*}
+ K(b\mid a=1) = - \left( 1- \frac{n_{a\wedge b}}{n_a}\right) \log_2 \left( 1- \frac{n_{a\wedge b}}{n_a}\right) - \frac{n_{a\wedge b}}{n_a}\log_2 \frac{n_{a\wedge b}}{n_a} & \quad \mbox{if} \quad \frac{n_{a \wedge b}}{n_a} > 0.5\\
+ K(b\mid a=1) = 1 & \quad \mbox{if} \quad \frac{n_{a \wedge b}}{n_a} \leq 0.5\\
+ K(a\mid b=1) = - \left( 1- \frac{n_{a\wedge b}}{n_b}\right) \log_2 \left( 1- \frac{n_{a\wedge b}}{n_b}\right) - \frac{n_{a\wedge b}}{n_b}\log_2 \frac{n_{a\wedge b}}{n_b} & \quad \mbox{if} \quad \frac{n_{a \wedge b}}{n_b} > 0.5\\
+ K(a\mid b=1) = 1 & \quad \mbox{if} \quad \frac{n_{a \wedge b}}{n_b} \leq 0.5
+\end{eqnarray*}
+
+These two entropies must be low enough for it to be possible to bet on $b$ (resp. $a$) with good certainty when $a$ (resp. $b$) is realized. Therefore their respective complements to 1 must be simultaneously high.
+
+\begin{figure}[htbp]
+ \centering
+\includegraphics[scale=0.5]{chap2fig8.png}
+\caption{Illustration of the functions $K$ and $1-K^2$ on $[0; 1]$.}
+
+\label{chap2fig8}
+\end{figure}
+
+
+\definition A first entropic index of equivalence is given by:
+$$e(a,b) = \left (\left[ 1 - K^2(b \mid a = 1)\right ]\left[ 1 - K^2(a \mid b = 1) \right]\right)^{\frac{1}{4}}$$
+
+When this index takes values in the neighbourhood of $1$, it reflects a good quality of a double implication.
+In addition, in order to better take into account $a \wedge b$ (the examples), we integrate this parameter through a similarity index $s(a,b)$ of the variables, for example in the sense of I.C. Lerman~\cite{Lermana}.
+The quasi-equivalence index is then constructed by combining these two concepts.
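+
+As an illustration of the first index $e(a,b)$ with hypothetical counts (chosen for the example, not taken from a data set), let $n_a = 100$, $n_b = 120$ and $n_{a \wedge b} = 90$. Then $\frac{n_{a \wedge b}}{n_a} = 0.9$ and $\frac{n_{a \wedge b}}{n_b} = 0.75$, both greater than $0.5$, so that:
+$$K(b \mid a=1) \approx 0.469, \qquad K(a \mid b=1) \approx 0.811, \qquad e(a,b) \approx \left[ (1-0.220)(1-0.658) \right]^{\frac{1}{4}} \approx 0.72 .$$
+The weaker of the two implications, here $b \Rightarrow a$, pulls the index well below 1.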
+
+\definition A second entropic equivalence index is given by the formula
+
+$$\sigma(a,b)= \left [ e(a,b).s(a,b)\right ]^{\frac{1}{2}}$$
+
+From this point of view, we then set out the quasi-equivalence criterion that we use.
+
+\definition The pair of variables $\{a,b\}$ is said to be quasi-equivalent for the selected quality $\beta$ if $\sigma(a,b) \geq \beta$.
+For example, a value $\beta=0.95$ could be considered to indicate a good quasi-equivalence between $a$ and $b$.
+
+\subsection{Algorithm of construction of quasi-equivalence classes}
+
+Let us assume a set $V = \{a,b,c,...\}$ of $v$ variables with a valued relationship $R$ induced by the measure of quasi-equivalence on all pairs of $V$.
+We will assume that the pairs of variables are sorted in decreasing order of quasi-equivalence.
+If we have set the quality threshold for quasi-equivalence at $\beta$, only the first pairs $\{a,b\}$, those satisfying the inequality $\sigma(a,b)\ge \beta$, will be retained.
+In general, only a part $V'$, of cardinality $v'$, of the variables of $V$ will satisfy this inequality.
+If this set $V'$ is empty or too small, the user can lower the requirement to a smaller threshold value.
+The relationship being symmetrical, we will have at most $\frac{v'(v'-1)}{2}$ pairs to study.
+As for $V-V'$, it contains only non-reducible variables.
+
+We propose to use the following greedy algorithm:
+\begin{enumerate}
+\item A first potential class $C_1^0= \{e,f\}$ is formed such that $\sigma(e,f)$ is the largest of the $\beta$-equivalence values.
+ If possible, this class is extended to a new class $C_1$ by taking from $V'$ all the elements $x$ such that every pair of variables within this class has a quasi-equivalence greater than or equal to $\beta$;
+
+\item We continue with:
+ \begin{enumerate}
+ \item If $o$ and $k$, forming the pair $(o,k)$ immediately below $(e,f)$ according to the index $\sigma$, both belong to $C_1$, then we move to the pair immediately below $(o,k)$ and proceed as in 1.;
+ \item If neither $o$ nor $k$ belongs to $C_1$, we proceed as in 1. from the pair they constitute, which forms the basis of a new class;
+ \item If only one of $o$ and $k$ does not belong to $C_1$, that variable can either form a singleton class or belong to a future class, which will of course be handled as above.
+ \end{enumerate}
+ \end{enumerate}
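+
+The steps above can be traced on a small hypothetical example (the variables and the values of $\sigma$ are invented for the illustration). Let $V' = \{a,b,c,d\}$, $\beta = 0.90$, and
+$$\sigma(a,b) = 0.98, \quad \sigma(a,c) = 0.96, \quad \sigma(b,c) = 0.95, \quad \sigma(c,d) = 0.93, \quad \sigma(a,d) < \beta, \quad \sigma(b,d) < \beta .$$
+\begin{itemize}
+\item Step 1: $C_1^0 = \{a,b\}$. The variable $c$ can be added since $\sigma(a,c)$ and $\sigma(b,c)$ are both at least $\beta$; $d$ cannot, since $\sigma(a,d) < \beta$. Hence $C_1 = \{a,b,c\}$.
+\item Step 2: the next pairs $(a,c)$ and $(b,c)$ lie inside $C_1$ and are skipped. For the pair $(c,d)$, only $d$ lies outside $C_1$; no remaining pair allows it to join another class, so it forms the singleton $\{d\}$.
+\end{itemize}
+Under this reading of the algorithm, the resulting partition is $\{\{a,b,c\},\{d\}\}$.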
+
+After a finite number of iterations, a partition of $V$ into $r$ classes of $\sigma$-equivalence is available: $\{C_1, C_2,..., C_r\}$.
+The quality of the reduction may be assessed by a gross or proportional index of $\beta^{\frac{p}{k}}$.
+However, we prefer the criterion defined below, which has the advantage of integrating the choice of representative.
+
+In addition, $k$ variables representing the $k$ classes of $\sigma$-equivalence could be selected on the basis of the following elementary criterion: the quality of connection of this variable with those of its class.
+However, this criterion does not optimize the reduction since the choice of representative is relatively arbitrary and may be a sign of triviality of the variable.
+