+Moreover, to our knowledge, most of the developments in the
+literature focus on proposals for a partial implication index for
+binary data~\cite{Lermana,Lallich}. This notion is not extended to
+other types of variables, nor to extraction and representation
+according to a rule graph or a hierarchy of meta-rules, structures
+that aim to give access to the meaning of a whole not reduced to the
+sum of its
+parts~\cite{Seve}\footnote{This is what the philosopher L. Sève
+ emphasizes :"... in the non-additive, non-linear passage of the
+ parts to the whole, there are properties that are in no way
+ precontained in the parts and which cannot therefore be explained by
+ them" }, i.e. operating as a complex non-linear system.
+For example, it is well known, through usage, that the meaning of a
+sentence does not completely depend on the meaning of each of the
+words in it (see the previous chapter, point 4).
+Let us return to what we believe is fertile in the approach we are
+It would seem that, in the literature, the notion of implication index
+is also not extended to the search for subjects and categories of
+subjects responsible for associations.
+Nor is this responsibility quantified so as to lead to a
+reciprocal structuring of all subjects, conditioned by their
+relationships to variables.
+We propose these extensions here after recalling the founding
+\section{Implication intensity in the binary case}
+\subsection{Fundamental and founding situation}
+A set $E$ of objects or subjects is crossed with variables
+(characters, criteria, successes, ...) which are interrogated as
+follows: "to what extent can we consider that instantiating variable\footnote{Throughout the book, the word "variable" refers either to an isolated variable in the premise (example: "to be blonde") or to a conjunction of isolated variables (example: "to be blonde and to be under 30 years old and to live in Paris").} $a$
+implies instantiating variable $b$?"
+In other words, do the subjects tend to be $b$ if we know that they are
+In natural, human or life sciences situations, where theorems (if $a$
+then $b$) in the deductive sense of the term cannot be established
+because of the exceptions that taint them, it is important for the
+researcher and the practitioner to "mine into his data" in order to
+identify sufficiently reliable rules (kinds of "partial theorems",
+inductions) to be able to conjecture\footnote{"The exception confirms the rule", as the popular saying goes, in the sense that there would be no exceptions if there were no rule} a possible causal relationship,
+a genesis, to describe, structure a population and make the assumption
+of a certain stability for descriptive and, if possible, predictive
+But this excavation requires the development of methods to guide it
+and to free it from trial and error and empiricism.
+To do this, following the example of I.C. Lerman's similarity
+measure~\cite{Lerman,Lermanb} and the classic approach of
+non-parametric tests (e.g. Fisher, Wilcoxon, etc.), we
+define~\cite{Grasb,Grasf} the confirmatory quality measure of the
+implicative relationship $a \Rightarrow b$ from the implausibility of
+the occurrence in the data of the number of cases that invalidate it,
+i.e. for which $a$ is verified without $b$ being verified. This
+amounts to comparing the observed number of counter-examples with the
+theoretical number expected if only chance were at work\footnote{"...[in agreement with
+ Jung] if the frequency of coincidences does not significantly
+ exceed the probability that they can be calculated by attributing
+ them solely by chance to the exclusion of hidden causal
+ relationships, we certainly have no reason to suppose the existence
+ of such relationships.", H. Atlan~\cite{Atlana}}.
+But when analyzing data, it is this gap that we take into account and
+not the decision to reject or retain a null hypothesis.
+This measure is relative to the number of observations verifying $a$
+and not $b$, the circumstance in which the implication is
+precisely at fault.
+It quantifies the expert's "astonishment" at the unlikely small number
+of counter-examples in view of the supposed independence between the
+variables and the numbers involved.
+Let us be clear. A finite set $V$ of $v$ variables is given: $a$, $b$,
+In the classical, paradigmatic situation that was initially retained,
+the variables describe performance (success or failure) on the items
+of a questionnaire.
+To a finite set $E$ of $n$ subjects $x$ we associate, by abuse of
+notation, functions of the type $x \rightarrow a(x)$, where
+$a(x) = 1$ (or $a(x) = true$) if $x$ satisfies or has the character
+$a$ and $a(x) = 0$ (or $a(x) = false$) otherwise.
+In artificial intelligence, we will say that $x$ is an example or an
+instance for $a$ if $a(x) = 1$ and a counter-example if not.
+The $a \Rightarrow b$ rule is logically true if for any $x$ in the
+sample, $b(x)$ is null only if $a(x)$ is also null; in other words if
+set $A$ of the $x$ for which $a(x)=1$ is contained in set $B$ of the
+$x$ for which $b(x)=1$.
+However, this strict inclusion is only exceptionally observed in the
+pragmatically encountered experiments.
+In the case of a knowledge questionnaire, we could indeed observe a
+few rare students passing an item $a$ and not passing item $b$,
+without contesting the tendency to pass item $b$ when we have passed
+item $a$.
+With regard to the cardinals of $E$ (of size $n$), but also of $A$ (or
+$n_a$) and $B$ (or $n_b$), it is therefore the "weight" of the
+counter-examples (or $n_{a \wedge \overline{b}}$) that must be taken
+into account in order to decide statistically whether or not to keep
+the quasi-implication or quasi-rule $a \Rightarrow b$. Thus, it is
+from the dialectic of
+example-counter-examples that the rule appears as the overcoming of
+To formalize this quasi-rule, we consider any two parts $X$ and $Y$ of
+$E$, chosen randomly and independently (absence of a priori link
+between these two parts) and of the same respective cardinals as $A$
+and $B$. Let $\overline{Y}$ and $\overline{B}$ be the respective complements of $Y$ and $B$ in $E$, both of cardinal $n_{\overline{b}}= n-n_b$.
+We will then say:
+\definition $a \Rightarrow b$ is acceptable at confidence level
+$1-\alpha$ if and only if
+$$Pr[Card(X\cap \overline{Y})\leq card(A\cap \overline{B})]\leq \alpha$$
+ \centering
+ \caption{The dark grey parts correspond to the counter-examples of the
+ implication $a \Rightarrow b$}
+It is established \cite{Lermanb} that, for a certain drawing process,
+the random variable $Card(X\cap \overline{Y})$ follows the Poisson law
+of parameter $\frac{n_a n_{\overline{b}}}{n}$.
+We achieve this same result by proceeding differently in the following
+Note $X$ (resp. $Y$) the random subset of binary transactions where
+$a$ (resp. $b$) would appear, independently, with the frequency
+$\frac{n_a}{n}$ (resp. $\frac{n_b}{n}$).
+To specify how the transactions in which $a$ and $b$ are realized,
+forming $A$ and $B$ respectively, are drawn, the following
+semantically permissible assumptions are made, for example, regarding
+the observation of the event $[a=1~ and~ b=0]$. $(A\cap
+\overline{B})$\footnote{We then note $\overline{v}$ the variable
+ negation of $v$ (or $not~ v$) and $\overline{P}$ the complementary
+ part of the part $P$ of $E$.} is the subset of transactions that are
+counter-examples of implication $a \Rightarrow b$:
+\item h1: the waiting times of an event $[a~ and~ not~ b]$ are independent
+ random variables;
+\item h2: the law of the number of events occurring in the time
+ interval $[t,~ t+T[$ depends only on T;
+\item h3: two such events cannot occur simultaneously
+It is then demonstrated (for example in~\cite{Saporta}) that the
+number of events occurring during a period of fixed duration $n$
+follows a Poisson law of parameter $c.n$, where $c$ is called the
+rate of the arrival process per unit of time.
+However, for each transaction assumed to be random, the event $[a=1]$
+has as probability the frequency $\frac{n_a}{n}$ and the event $[b=0]$
+has as probability the frequency $\frac{n_{\overline{b}}}{n}$;
+therefore, the joint event $[a=1~ and~ b=0]$ has a probability
+estimated by the frequency
+$\frac{n_a}{n}. \frac{n_{\overline{b}}}{n}$ under the hypothesis of absence of an a priori link between $a$ and $b$ (independence).
+We can then estimate the rate $c$ of this event by $\frac{n_a}{n}. \frac{n_{\overline{b}}}{n}$.
+Thus for a duration of time $n$, the occurrences of the event $[a~ and~ not~b]$ follow a Poisson law of parameter:
+$$\lambda = \frac{n_a.n_{\overline{b}}}{n}$$
+As a result, $Pr[Card(X\cap \overline{Y})= s]= e^{-\lambda}\frac{\lambda^s}{s!}$
+Consequently, the probability that chance will lead, under the
+assumption of the absence of an a priori link between $a$ and $b$, to
+no more counter-examples than those observed is:
+$$Pr[Card(X\cap \overline{Y})\leq card(A\cap \overline{B})] =
+\sum^{card(A\cap \overline{B})}_{s=0} e^{-\lambda}\frac{\lambda^s}{s!} $$
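+As a hedged numerical illustration of this probability, take the
+illustrative counts $n=100$, $n_a=20$, $n_{\overline{b}}=60$ and
+$card(A\cap \overline{B})=4$ (values assumed here only for the sake of
+the example). Then $\lambda = \frac{20 \times 60}{100} = 12$ and
+$$Pr[Card(X\cap \overline{Y})\leq 4] = e^{-12}\left(1 + 12 +
+\frac{12^2}{2!} + \frac{12^3}{3!} + \frac{12^4}{4!}\right) = 1237\,
+e^{-12} \approx 0.0076$$
+so that, under these assumptions, $a \Rightarrow b$ would be acceptable
+at any confidence level $1-\alpha$ with $\alpha \geq 0.0076$.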
+ But other legitimate drawing processes lead to a binomial law, or
+ even a hypergeometric law (itself not semantically adapted to the
+ situation because of its symmetry). Under suitable convergence
+ conditions, these two laws are finally reduced to the Poisson Law
+ above (see Annex to this chapter).
+If $n_{\overline{b}}\neq 0$, we center and standardize this Poisson variable
+into the variable:
+$$Q(a,\overline{b})= \frac{card(X \cap \overline{Y}) - \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}} $$
+In the experimental realization, the observed value of
+$Q(a,\overline{b})$ is $q(a,\overline{b})$.
+It measures the gap between the observed count $card(A\cap
+\overline{B})$ and the value it would have taken if there had been
+independence between $a$ and $b$.
+\begin{equation} q(a,\overline{b}) = \frac{n_{a \wedge \overline{b}}-
+ \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}}
+ \label{eq2.1}
+is called the implication index, the number used as an indicator of
+the non-implication of $a$ to $b$.
+In cases where the approximation is properly legitimized (for example
+$\frac{n_a.n_{\overline{b}}}{n}\geq 4$), the variable
+$Q(a,\overline{b})$ approximately follows the standard normal
+distribution. The intensity of implication, measuring the quality of
+$a\Rightarrow b$, for $n_a\leq n_b$ and $n_b \neq n$, is then defined
+from the index $q(a,\overline{b})$ by:
+The implication intensity that measures the inductive quality of $a$
+over $b$ is:
+$$\varphi(a,b)=1-Pr[Q(a,\overline{b})\leq q(a,\overline{b})] =
+\frac{1}{\sqrt{2 \pi}} \int^{\infty}_{ q(a,\overline{b})}
+e^{-\frac{t^2}{2}} dt,~ if~ n_b \neq n$$
+$$\varphi(a,b)=0,~ otherwise$$
+As a result, the definition of statistical implication becomes:
+Implication $a\Rightarrow b$ is admissible at confidence level
+$1-\alpha $ if and only if:
+$$\varphi(a,b)\geq 1-\alpha$$
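+As a rough worked illustration, take the counts $n=100$, $n_a=20$,
+$n_b=40$ and $n_{a \wedge \overline{b}}=4$ (values chosen only for the
+example). The expected number of counter-examples under independence is
+$\frac{n_a.n_{\overline{b}}}{n}=12 \geq 4$, so the normal approximation may
+reasonably be used:
+$$q(a,\overline{b}) = \frac{4-12}{\sqrt{12}} \approx -2.309, \qquad
+\varphi(a,b) = \frac{1}{\sqrt{2 \pi}} \int^{\infty}_{-2.309}
+e^{-\frac{t^2}{2}} dt \approx 0.99$$
+Under these assumptions the implication would be admissible at the
+confidence level $0.95$, for instance, since $\varphi(a,b) \geq 0.95$.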
+It should be recalled that this modeling of quasi-implication measures
+the astonishment at finding so few counter-examples compared to
+the surprising number of instances of implication.
+It is a measure of the inductive and informative quality of
+implication. Therefore, if the rule is trivial, as in the case where
+$B$ is very large or coincides with $E$, this astonishment becomes
+We also demonstrate~\cite{Grasf} that this triviality results in a
+very low or even zero intensity of implication: If, $n_a$ being fixed
+and $A$ being included in $B$, $n_b$ tends towards $n$ ($B$ "grows"
+towards $E$), then $\varphi(a,b)$ tends towards $0$. We therefore
+define, by "continuity":$\varphi(a,b) = 0$ if $n_b = n$. Similarly, if
+$A\subset B$, $\varphi(a,b)$ may be less than $1$ in the case where
+the inductive confidence, measured by statistical surprise, is
+{\bf \remark Total correlation, partial correlation}
+We take here the notion of correlation in a more general sense than
+that used in the domain that develops the linear correlation
+coefficient (linear link measure) or the correlation ratio (functional
+link measure).
+In our perspective, there is a total (or partial) correlation between
+two variables $a$ and $b$ when the respective events they determine
+occur (or almost occur) at the same time, as well as their opposites.
+However, we know from numerical counter-examples that correlation and
+implication do not come down to each other, that there can be
+correlation without implication and vice versa~\cite{Grasf} and below.
+If we compare the implication coefficient and the linear correlation
+coefficient algebraically, it is clear that the two concepts do not
+coincide and therefore do not provide the same
+information\footnote{"More serious is the logical error inferred from
+ a correlation found to the existence of a causality" writes Albert
+ Jacquard in~\cite{Jacquard}, p.159. }.
+The quasi-implication, with its non-symmetric index $q(a,\overline{b})$, does
+not coincide with the correlation coefficient $\rho(a, b)$, which is
+symmetric and which reflects the relationship between variables $a$ and
+$b$. Indeed, we show~\cite{Grasf} that if $q(a,\overline{b}) \neq 0$:
+$$\frac{\rho(a,b)}{q(a,\overline{b})} = -\sqrt{\frac{n}{n_b
+ n_{\overline{a}}}}$$
+When correlation is understood as linear correlation, even if
+correlation and implication tend to point in the same direction, the
+orientation of the relationship between two variables cannot be read
+from it because the coefficient is symmetrical, which is not the
+position taken in SIA.
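+The relation between $\rho(a,b)$ and $q(a,\overline{b})$ can be
+recovered by a short derivation sketch, using only the definition of
+the implication index and the usual expression of the linear
+correlation coefficient of two binary variables (the notation $n_{a
+\wedge b}$ for the number of subjects verifying both $a$ and $b$ is
+introduced here only for the purpose of the sketch). Since $n_{a \wedge
+\overline{b}} = n_a - n_{a \wedge b}$ and $n_{\overline{b}} = n - n_b$:
+$$q(a,\overline{b}) = \frac{n_{a \wedge \overline{b}} - \frac{n_a
+n_{\overline{b}}}{n}}{\sqrt{\frac{n_a n_{\overline{b}}}{n}}} =
+-\frac{n\, n_{a \wedge b} - n_a n_b}{\sqrt{n\, n_a
+n_{\overline{b}}}}, \qquad \rho(a,b) = \frac{n\, n_{a \wedge b} - n_a
+n_b}{\sqrt{n_a n_{\overline{a}} n_b n_{\overline{b}}}}$$
+Dividing the second expression by the first gives
+$\frac{\rho(a,b)}{q(a,\overline{b})} = -\sqrt{\frac{n}{n_b
+n_{\overline{a}}}}$: the two indices have opposite signs, and their
+ratio depends on $n_b$ and $n_{\overline{a}}$, hence on the roles
+played by $a$ and $b$.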
+From a statistical relationship given by the correlation, two opposing
+empirical propositions can be deduced.
+The following dual numerical situation clearly illustrates this:
+ $a_1 \backslash b_1$ & 1 & 0 & margin\\ \hline
+ 1 & 96 & 4& 100 \\ \hline
+ 0 & 50 & 50& 100 \\ \hline
+ margin & 146 & 54& 200 \\ \hline
+\end{tabular} ~ ~ ~ ~ ~ ~ ~ \begin{tabular}{|l|c|c|c|}\hline
+ $a_2 \backslash b_2$ & 1 & 0 & margin\\ \hline
+ 1 & 94 & 6& 100 \\ \hline
+ 0 & 52 & 48& 100 \\ \hline
+ margin & 146 & 54& 200 \\ \hline
+\caption{Numeric example of difference between implication and
+ correlation}
+In Table~\ref{chap2tab1}, the following correlation and implications
+can be computed:\\
+Correlation $\rho(a_1,b_1)=0.468$, Implication
+Correlation $\rho(a_2,b_2)=0.473$, Implication $q(a_2,\overline{b_2})=-4.041$
+Thus, we observe that, on the one hand, $a_1$ and $b_1$ are less
+correlated than $a_2$ and $b_2$ while, on the other hand, the
+implication intensity of $a_1$ over $b_1$ is higher than that of $a_2$
+over $b_2$ since $q(a_1,\overline{b_1}) < q(a_2,\overline{b_2})$.
+On this subject, Alain Ehrenberg in~\cite{Ehrenberg} writes: "The
+finding of a correlation does not remove the ambiguity between "when I do $X$, my brain is in state $Y$" and "if I do $X$, it is because my brain is in state $Y$", that is, between something that happens in my brain when I do an action.
+\remark Remember that we consider not only conjunctions of variables
+of the type "$a$ and $b$" but also disjunctions such as "($a$ and $b$)
+or $c$..." in order to model phenomena that are concepts as it is done
+in learning or in artificial intelligence.
+The associated calculations remain compatible with the logic of the
+propositions linked by connectors.
+\remark Unlike the Loevinger Index~\cite{Loevinger} and conditional
+probability $(Pr[B/A])=1$ and all its derivatives, the implication
+intensity varies, non-linearly, with the expansion of sets $E$, $A$
+and $B$ and weakens with triviality (see Definition 2.3).
+Moreover, it
+is resistant to noise, especially around $0$ for, which can only make
+the relationship we want to model and establish statistically
+Finally, as we have seen, the inclusion of $A$ in $B$ does not ensure
+maximum intensity, the inductive quality may not be strong, whereas
+$Pr[B/A]$ is equal to $1$~\cite{Grasm,Guillet}.
+In paragraph 5, we study more closely the problem of the sensitivity
+and stability of the implication index as a function of small
+variations in the parameters involved in the study of its
+\section{Case of modal and frequency variables}
+\subsection{Founding situation}
+Marc Bailleul's research (1991-1994) focuses in particular on the
+representation that mathematics teachers have of their own teaching.
+In order to highlight it, meaningful words are proposed to them that
+they must rank.
+Their choices are no longer binary: the words chosen by any teacher
+are ordered from the least to the most representative.
+M. Bailleul's inquiry then focuses on statements of the type: "if I
+choose this word with this importance, then I choose this other word
+with at least equal importance".
+It was therefore necessary to extend the notion of statistical
+implication to variables other than binary.
+This is the case for modal variables that are associated with
+phenomena where the values $a(x)$ are numbers in the interval $[0, 1]$
+and describe degrees of belonging or satisfaction, as in fuzzy logic,
+for example with linguistic modifiers such as "maybe", "a little", "sometimes",
+This problem is also found in situations where the frequency of a
+variable reflects a preorder on the values assigned by the subjects to
+the variables presented to them.
+These are frequency variables that are associated with phenomena where
+the values of $a(x)$ are any positive real values.
+This is the case when one considers a student's percentage of success
+in a battery of tests in different areas.
+J.B. Lagrange~\cite{Lagrange} has demonstrated that, in the modal
+ \item if $a(x)$ and $\overline{b}(x)$ are the values taken at $x$ by
+ the modal variables $a$ and $\overline{b}$, with $\overline{b}(x)=1-b(x)$
+ \item if $s^2_a$ and $s_{\overline{b}}^2$ are the empirical variances of variables $a$ and $\overline{b}$
+then the implication index, which he calls propensity index, becomes:
+$$q(a,\overline{b}) = \frac{\sum_{x\in E} a(x)\overline{b}(x) -
+ \frac{n_a n_{\overline{b}}}{n}}
+{\sqrt{\frac{(n^2s_a^2+n_a^2)(n^2s_{\overline{b}}^2 + n_{\overline{b}}^2)}{n^3}}}$$
+is the index of propensity of modal variables.
+J.B. Lagrange also proves that this index coincides with the index
+defined previously in the binary case if the number of modalities of $a$
+and $b$ is precisely 2, because in this case:\\
+$n^2s_a^2+n_a^2=n n_a$,~ ~ $ n^2s_{\overline{b}}^2 + n_{\overline{b}}^2=n
+ n_{\overline{b}}$~ ~ and ~ ~ $\sum_{x\in E} a(x)\overline{b}(x)=n_{a \wedge
+ \overline{b}}$.
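+The reduction to the binary case can be checked directly, as a sketch
+under the assumptions that $s_a^2$ denotes the empirical variance with
+divisor $n$ and that $n_a = \sum_{x\in E} a(x)$. In the binary case
+$a(x)^2 = a(x)$, so that
+$$s_a^2 = \frac{1}{n}\sum_{x\in E} a(x)^2 - \left(\frac{n_a}{n}\right)^2
+= \frac{n_a}{n} - \frac{n_a^2}{n^2}, \qquad \mbox{hence} \qquad
+n^2 s_a^2 + n_a^2 = n\, n_a$$
+and likewise $n^2 s_{\overline{b}}^2 + n_{\overline{b}}^2 = n\,
+n_{\overline{b}}$. The denominator of the propensity index then becomes
+$$\sqrt{\frac{(n\, n_a)(n\, n_{\overline{b}})}{n^3}} =
+\sqrt{\frac{n_a n_{\overline{b}}}{n}}$$
+which is the denominator of the binary implication index.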
+ This solution provided in the modal case is also applicable to the
+ case of frequency variables, or even positive numerical variables,
+ provided that the values observed on the variables, such as a and b,
+ have been normalized, the normalization in $[0, 1]$ being made from the maximum of the value taken respectively by $a$ and $b$ on set $E$.
+In~\cite{Regniera}, we consider rank variables that reflect a
+total order between choices presented to a population of judges.
+Each of them must order their preferential choice among a set of
+objects or proposals made to them.
+An index measures the quality of the statement of the type: "if object
+$a$ is ranked by judges then, generally, object $b$ is ranked higher
+by the same judges".
+Proximity to the previous issue leads to an index that is relatively
+close to the Lagrange index, but better adapted to the rank variable
+\section{Cases of variables-on-intervals and interval-variables}
+\subsubsection{Founding situation}
+For example, we seek to extract the following rule from a
+biometric data set and to estimate its quality: "if an individual weighs
+between $65$ and $70$~kg then in general he is between $1.70$ and
+$1.76$~m tall".
+A similar situation arises in the search for relationships between
+intervals of student performance in two different subjects.
+The more general situation is then expressed as follows: two real
+variables $a$ and $b$ take a certain number of values over 2 finite
+intervals $[a1,~ a2]$ and $[b1,~ b2]$. Let $A$ (resp. $B$) be all the
+values of $a$ (resp. $b$) observed over $[a1,~ a2]$ (resp. $[b1,~
+ b2]$).
+For example, here, $a$ represents the weights of a set of $n$ subjects and $b$ the heights of these same subjects.
+Two problems arise:
+\item Can adjacent sub-intervals of $[a1,~ a2]$ (resp. $[b1,~ b2]$)
+ be defined so that the finest partition obtained best respects the
+ distribution of the values observed in $[a1,~ a2]$ (resp. $[b1,~ b2]$)?
+\item Can we find the respective partitions of $[a1,~ a2]$ and $[b1,~
+ b2]$ made up of unions of the previous adjacent sub-intervals,
+ partitions that maximize the average implication intensity of the
+ sub-intervals of one on the sub-intervals of the other belonging to
+ these partitions?
+We answer these two questions as part of our problem by choosing the
+criteria to optimize in order to satisfy the optimality expected in
+each case.
+To the first question, many solutions have been provided in other
+settings (for example, by~\cite{Lahaniera}).
+\subsubsection{First problem}
+We will look at the interval $[a1,~ a2]$ assuming it has a trivial
+initial partition of sub-intervals of the same length, but not
+necessarily of the same frequency distribution observed on these
+Note $P_0 = \{A_{01},~ A_{02},~ ...,~ A_{0p}\}$, this partition in $p$
+We try to obtain a partition of $[a1,~ a2]$ into $p$ sub-intervals
+$\{A_{q1},~ A_{q2},~ ...,~ A_{qp}\}$ in such a way that within each
+sub-interval there is good statistical homogeneity (low intra-class
+inertia) and that these sub-intervals have good mutual heterogeneity
+(high inter-class inertia).
+We know that if one of the criteria is verified, the other is
+necessarily verified (Koenig-Huygens theorem).
+This will be done by adopting a method directly inspired by the
+dynamic cloud method developed by Edwin Diday~\cite{Diday} (see also
+\cite{Lebart}) and adapted to the current situation. This results in
+the optimal partition targeted.
+\subsubsection{Second problem}
+It is now assumed that the intervals $[a1,~ a2]$ and $[b1,~ b2]$ are
+provided with optimal partitions $P$ and $Q$, respectively, in the
+sense of the dynamic clouds.
+Let $p$ and $q$ be the respective numbers of sub-intervals composing
+$P$ and $Q$.
+From these two partitions, it is possible to generate $2^{p-1}$ and
+$2^{q-1}$ partitions obtained by iterated unions of adjacent
+sub-intervals of $P$ and $Q$\footnote{It is enough to consider the tree structure of which $A_1$ is the root, then to join it or not to $A_2$, which itself will or will not be joined to $A_3$, etc. There are therefore $2^{p-1}$ branches in this tree structure.} respectively.
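+For instance, as a small illustration with $p = 3$ sub-intervals
+$A_1$, $A_2$, $A_3$ (a value chosen only for the example), the
+$2^{3-1} = 4$ partitions obtained by unions of adjacent sub-intervals
+are:
+$$\{A_1,~ A_2,~ A_3\}, \quad \{A_1 \cup A_2,~ A_3\}, \quad \{A_1,~ A_2
+\cup A_3\}, \quad \{A_1 \cup A_2 \cup A_3\}$$
+Each partition corresponds to a choice, for each of the $p-1$
+boundaries between consecutive sub-intervals, of keeping or removing
+that boundary.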
+We calculate the respective intensities of implication of each
+sub-interval, whether or not combined with another of the first
+partition, on each sub-interval, whether or not combined with another
+of the second, and then the values of the intensities of the
+reciprocal implications.
+There are therefore a total of $2.2^{p-1}.2^{q-1}$ families of
+implication intensities, each of which requires the calculation of all
+the elements of a partition of $[a1,~ a2]$ on all the elements of one
+of the partitions of $[b1,~ b2]$ and vice versa.
+The optimality criterion is chosen as the geometric mean of the
+intensities of implication, the mean associated with each pair of
+partitions of elements, combined or not, defined inductively.
+We note the two maxima obtained (direct implication and its
+reciprocal) and we retain the two associated partitions by declaring
+that the implication of the variable-on-interval $a$ on the
+variable-on-interval $b$ is optimal when the interval $[a1,~ a2]$
+admits the partition corresponding to the first maximum and that the
+optimal reciprocal implication is obtained for the partition of
+$[b1,~ b2]$ corresponding to the second maximum.
+\subsubsection{Founding situation}
+Data are available from a population of $n$ individuals (each of
+whom may be a single individual or a set of individuals, e.g. a class
+of students) according to variables (e.g. grades over a year in
+French, math, physics, ..., but also weight, height, chest size, ...).
+The values taken by these variables for each individual are intervals
+of positive real values.
+For example, individual $x$ gives the value $[12,~ 15.50]$ to the math
+score variable.
+E. Diday would speak in this context of $p$ symbolic interval-valued
+variables defined on the population.
+We try to define an implication of intervals, relative to a variable
+$a$, which are themselves observed intervals, towards other similarly
+defined intervals and relative to another variable $b$.
+This will make it possible to measure the implicative, and therefore
+non-symmetric, association of certain interval(s) of the variable $a$
+with certain interval(s) of the variable $b$, as well as the
+reciprocal association from which the best one will be chosen for each
+pair of sub-intervals involved, as just described in §4.1.
+For example, it will be said that the sub-interval $[2, 5.5]$ of
+mathematics scores generally implies the sub-interval $[4.25, 7.5]$
+of physics scores, both of which belong to an optimal partition in
+terms of the explained variance of the respective value ranges $[1,
+ 18]$ and $[3, 20]$ taken in the population.
+Similarly, we will say that $[14.25, 17.80]$ in physics most often
+implies $[16.40, 18]$ in mathematics.
+Following the approach of E. Diday and his collaborators, if the
+values taken according to the subjects by the variables $a$ and $b$
+are of a symbolic nature, in this case intervals of $\mathbb{R}^+$, it
+is possible to extend the above algorithms~\cite{Grasi}.
+For example, variable $a$ has weight intervals associated with it and
+variable $b$ has height intervals associated with it, due to
+inaccurate measurements.
+By combining the intervals $I_x$ and $J_x$ described by the subjects
+$x$ of $E$ according to each of the variables $a$ and $b$
+respectively, we obtain two intervals $I$ and $J$ covering all
+possible values of $a$ and $b$.
+On each of them a partition can be defined in a certain number of
+intervals respecting as above a certain optimality criterion.
+For this purpose, the intersections of intervals such as $I_x$ and
+$J_x$ with these partitions will be provided with a distribution
+taking into account the areas of the common parts.
+This distribution may be uniform or of another discrete or continuous
+In this way, we are brought back to the search for rules between two
+sets of variables-on-intervals that take, as previously in §4.1,
+their values on $[0,~ 1]$, from which we can search for optimal
+implications.
+\remark Whatever the type of variable considered, there is often a
+problem of overabundance of variables and therefore difficulty of
+For this reason, we have defined an equivalence relationship on all
+variables that allows us to substitute a so-called leader variable for
+an equivalence class~\cite{Grask}.
+\section{Variations in the implication index $q$ according to the 4 occurrences}
+In this paragraph, we examine the sensitivity of the implication index
+to disturbances in its parameters.
+\subsection{Stability of the implication index}
+To study the stability of the implication index $q$ is to examine its
+small variations in the vicinity of the $4$ observed integer values
+($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$).
+To do this, it is possible to perform different simulations by
+crossing these 4 integer variables on which $q$ depends~\cite{Grasx}.
+But let us consider these variables as variables with real values and
+$q$ as a function that is continuously differentiable with respect to
+these variables, which are themselves constrained by the inequalities $0\leq
+n_a \leq n_b$ and $n_{a \wedge \overline{b}} \leq inf\{n_a,~ n_b\}$ and
+$sup\{n_a,~ n_b\} \leq n$.
+The function $q$ then defines a scalar and vector field on
+$\mathbb{R}^4$ as an affine and vector space on itself.
+In the likely hypothesis of an evolution of a nonchaotic process of
+data collection, it is then sufficient to examine the differential of
+$q$ with respect to these variables and to keep its restriction to the
+integer values of the parameters of the relationship $a \Rightarrow b$.
+The differential of $q$, in the sense of Fréchet's
+topology\footnote{Fréchet's topology allows $\mathbb{N}$ sections,
+ i.e. subsets of naturals of the form $\{n,~ n+1,~ n+2,~ ....\}$, to be
+ used as a filter base, while the usual topology on $\mathbb{R}$
+ allows real intervals for filters.
+ Thus continuity and differentiability are perfectly defined and
+ operational concepts according to Fréchet's topology in the same way
+ as they are with the usual topology.}, is expressed as follows by
+the scalar product:
+dq = \frac{\partial q}{\partial n}dn + \frac{\partial q}{\partial
+ n_a}dn_a + \frac{\partial q}{\partial n_b}dn_b + \frac{\partial
+ q}{\partial n_{a \wedge \overline{b}}}dn_{a \wedge \overline{b}} =
+grad~q.dM\footnote{By a mechanistic metaphor, we will say that $dq$ is
+ the elementary work of $q$ for a movement $dM$ (see chapter 14 of
+ this book).}
+where $M$ is the point of coordinates $(n,~ n_a,~ n_b,~ n_{a \wedge
+ \overline{b}})$ of the field $C$, $dM$ is the vector whose
+components are the differential increments of these occurrence
+variables, and $grad~ q$ the vector whose components are the partial
+derivatives of $q$ with respect to these occurrence variables.
+The differential of the function $q$ therefore appears as the scalar product of its gradient and the increment $dM$ on the surface representing the variations of the function $q(n,~ n_a,~ n_b,~ n_{a \wedge
+ \overline{b}})$. Thus, the gradient of $q$ represents its own
+variations according to those of its components, the 4 cardinals of
+the sets $E$, $A$, $B$ and $A\cap \overline{B}$. It
+indicates the direction and sense of growth or decrease of $q$ in
+the space of dimension 4. Remember that it is carried by the normal to
+the level surface $q~ =~ const$.
+If we want to study how $q$ varies according to $ n_{\overline{b}}$,
+we just have to replace $n_b$ by $n-n_{\overline{b}}$ and therefore change the sign
+of the partial derivative with respect to $n_b$. In fact, the
+interest of this differential lies in estimating the increase
+(positive or negative) of $q$ that we note $\Delta q$ in relation to
+the respective variations $\Delta n$, $\Delta n_a$, $\Delta n_b$ and
+$\Delta n_{a \wedge
+ \overline{b}}$. So we have:
+$$\Delta q= \frac{\partial q}{\partial n} \Delta n + \frac{\partial
+ q}{\partial n_a} \Delta n_a + \frac{\partial
+ q}{\partial n_b} \Delta n_b + \frac{\partial
+ q}{\partial n_{a \wedge
+ \overline{b}}} \Delta n_{a \wedge
+ \overline{b}} +o(\Delta q)$$
+where $o(\Delta q)$ is a remainder that is negligible compared with the first-order terms.
+Let us examine the partial derivatives of $n_b$ and $n_{a \wedge
+ \overline{b}}$ the number of counter-examples. We get:
+ \frac{\partial
+ q}{\partial n_b} = \frac{1}{2} n_{a \wedge
+ \overline{b}} (\frac{n_a}{n})^{-\frac{1}{2}} (n-n_b)^{-\frac{3}{2}}
+ + \frac{1}{2} (\frac{n_a}{n})^{\frac{1}{2}} (n-n_b)^{-\frac{1}{2}} >
+ 0
+ \label{eq2.3}
+ \frac{\partial
+ q}{\partial n_{a \wedge
+ \overline{b}}} = \frac{1}{\sqrt{\frac{n_a n_{\overline{b}}}{n}}}
+ = \frac{1}{\sqrt{\frac{n_a (n-n_b)}{n}}} > 0
+ \label{eq2.4}
+Thus, if the increases $\Delta n_b$ and $\Delta n_{a \wedge
+ \overline{b}}$ are positive, the increase of $q(a,\overline{b})$ is
+also positive. This is interpreted as follows: if the number of
+examples of $b$ and the number of counter-examples of implication
+increase then the intensity of implication decreases for $n$ and $n_a$
+constant. In other words, this intensity of implication is maximum at
+observed values $n_b$ and $ n_{a \wedge
+ \overline{b}}$ and minimum at values $n_b+\Delta n_b$ and $n_{a \wedge
+ \overline{b}}+ \Delta n_{a \wedge
+ \overline{b}}$.
+If we examine the case where $n_a$ varies, we obtain the partial
+derivative of $q$ with respect to $n_a$ which is:
+ \frac{\partial q}{\partial n_a} = -\frac{ n_{a \wedge \overline{b}}}{2
+ \sqrt{\frac{n_{\overline{b}}}{n}}}
+ \left(\frac{1}{n_a}\right)^{\frac{3}{2}}
+ -\frac{1}{2}\sqrt{\frac{n_{\overline{b}}}{n\, n_a}}<0
+ \label{eq2.5}
+ \end{equation}
+Thus, for variations of $n_a$ on $[0,~ n_b]$, the implication index function is always decreasing (and convex) with respect to $n_a$ and is therefore minimum for $n_a= n_b$. As a result, the intensity of implication is increasing and maximum for $n_a= n_b$.
+Note the partial derivative of $q$ with respect to $n$:
+$$\frac{\partial q}{\partial n} = \frac{1}{2\sqrt{n}} \left( n_{a
+ \wedge \overline{b}}+\frac{n_a n_{\overline{b}}}{n} \right)$$
+Consequently, if the other 3 parameters are constant, the implication
+index decreases by $\sqrt{n}$.
+The quality of implication is therefore all the better, a specific
+property of the SIA compared to other indicators used in the
+This property is in accordance with statistical and semantic
+expectations regarding the credit given to the frequency of
+Since the partial derivatives of $q$ (at least one of them) are
+non-linear according to the variable parameters involved, we are
+dealing with a non-linear dynamic system\footnote{"Non-linear systems
+ are systems that are known to be deterministic but for which, in
+ general, nothing can be predicted because calculations cannot be
+ made"~\cite{Ekeland} p. 265.} with all the epistemological
+consequences that we will consider elsewhere.
+\subsection{Numerical example}
+In a first experiment, we observe the occurrences: $n = 100$, $n_a =
+20$, $n_b = 40$ (hence $n_{\overline{b}}=60$) and $ n_{a \wedge \overline{b}} = 4$.
+The application of formula (\ref{eq2.1}) gives $q(a,\overline{b}) = -2.309$.
+In a 2nd experiment, $n$ and $n_a$ are unchanged but the occurrences
+of $b$ and counter-examples $n_{a \wedge \overline{b}}$ increase by one unit.
+At the initial point of the space of the 4 variables, the only partial
+derivatives of interest to us (with respect to $n_b$ and $n_{a
+ \wedge \overline{b}}$) take the following values when
+applying formulas (\ref{eq2.3}) and (\ref{eq2.4}): $\frac{\partial
+ q}{\partial n_b} = 0.0385$ and $\frac{\partial q}{\partial n_{a
+ \wedge \overline{b}}} = 0.2887$.
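+These two values can be checked by direct substitution in
+(\ref{eq2.3}) and (\ref{eq2.4}), with $n = 100$, $n_a = 20$,
+$n - n_b = 60$ and $n_{a \wedge \overline{b}} = 4$ (a worked check
+added for illustration, rounded to four decimals):
+$$\frac{\partial q}{\partial n_b} = \frac{1}{2} \times 4 \times (0.2)^{-\frac{1}{2}} \times 60^{-\frac{3}{2}}
+ + \frac{1}{2} \times (0.2)^{\frac{1}{2}} \times 60^{-\frac{1}{2}} \approx 0.0096 + 0.0289 = 0.0385$$
+$$\frac{\partial q}{\partial n_{a \wedge \overline{b}}} = \frac{1}{\sqrt{\frac{20 \times 60}{100}}} = \frac{1}{\sqrt{12}} \approx 0.2887$$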
+As $\Delta n_b$, $\Delta n_{\overline{b}}$ and $\Delta n_{a
+ \wedge \overline{b}} $ are equal to 1, -1 and 1, then $\Delta q$ is
+equal to: $0.0385 + 0.2887 + o(\Delta q) = 0.3272 + o(\Delta q)$ and
+the approximate value of $q$ in the second experiment is $-2.309 +
+0.3272 + o(\Delta q)= -1.982 +o(\Delta q)$ using the first-order
+expansion of $q$ (formula (\ref{eq2.2})).
+Now, the calculation of the new implication index $q$ at the point
+of the 2nd experiment gives, by the use of (\ref{eq2.1}): $-1.9795$, a
+value well approximated by the expansion of $q$.
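+The quality of this approximation can be made explicit by a direct
+computation (a worked check added for illustration; it assumes that
+formula (\ref{eq2.1}) is the centred and reduced number of
+counter-examples, and the names $q_1$ and $q_2$ for the values of $q$
+in the two experiments are introduced here only for this check):
+$$q_1 = \frac{4 - \frac{20 \times 60}{100}}{\sqrt{\frac{20 \times 60}{100}}} = \frac{-8}{\sqrt{12}} \approx -2.309, \qquad
+ q_2 = \frac{5 - \frac{20 \times 59}{100}}{\sqrt{\frac{20 \times 59}{100}}} = \frac{-6.8}{\sqrt{11.8}} \approx -1.980$$
+Under this assumption, the exact increase $q_2 - q_1 \approx 0.330$
+differs from the first-order estimate $0.3272$ by about $0.003$.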
+\subsection{A first differential relationship of $\varphi$ as a function of $q$}
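+Let us consider the intensity of implication $\varphi$ as a function
+of $q(a,\overline{b})$, in the Gaussian model:
+$$\varphi(q)=\frac{1}{\sqrt{2\pi}}\int_{q}^{+\infty}e^{-\frac{t^2}{2}}\,dt$$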
+We can then examine how $\varphi(q)$ varies when $q$ varies in the neighborhood of its value for a given pair $(a,b)$, knowing how $q$ itself varies according to the 4 parameters that determine it. By differentiating with respect to the integration bound, we obtain:
+\begin{equation}
+ \frac{d\varphi}{dq}=-\frac{1}{\sqrt{2\pi}}e^{-\frac{q^2}{2}} < 0
+ \label{eq2.6}
+\end{equation}
+This confirms that the intensity increases when $q$ decreases, and the formula now specifies the rate of variation, which allows us to study more precisely the variations of $\varphi$. Since the derivative of $\varphi$ with respect to $q$ is always negative, the function $\varphi$ is decreasing.
+{\bf Numerical example}\\
+Taking the values of the occurrences observed in the 2 experiments
+mentioned above, we find, for $q = -2.309$, that the intensity
+of implication $\varphi(q)$ is equal to 0.992. Applying formula
+(\ref{eq2.6}), the derivative of $\varphi$ with respect to $q$ is
+$-0.02775$, and the (negative) increase in intensity is then
+$-0.02775 \, \Delta q$ with $\Delta q = 0.3272$, i.e. about $-0.009$.
+The first-order approximation of the intensity is
+therefore $0.992-0.02775 \, \Delta q$, or 0.983. However, the actual calculation
+of this intensity gives, for $q= -1.9795$, $\varphi(q) = 0.976$.
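+The value of the derivative used in this example can be checked by
+direct substitution (a worked check added for illustration, rounded to
+four decimals). Formula (\ref{eq2.6}) evaluated at $q = -2.309$ gives:
+$$\frac{d\varphi}{dq} = -\frac{1}{\sqrt{2\pi}}e^{-\frac{(2.309)^2}{2}} \approx -0.3989 \times 0.0695 \approx -0.0277$$
+so that, to first order, the variation of the intensity for $\Delta q = 0.3272$ is:
+$$\Delta \varphi \approx \frac{d\varphi}{dq} \, \Delta q \approx -0.0277 \times 0.3272 \approx -0.0091$$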
+\subsection{Examination of other indices}
+Unlike the core index $q$ and the intensity of implication, which
+measure quality through probability (see definition 2.3), the other
+most common indices are intended to be direct measures of quality.
+We will examine their respective sensitivities to changes in the
+parameters used to define these indices.
+We keep the notation adopted in paragraph 2.2 and select indices that
+are recalled in~\cite{Grasm},~\cite{Lencaa} and~\cite{Grast2}.
+\subsubsection{The Loevinger Index}
+It is an "ancestor" of the indices of
+implication~\cite{Loevinger}. This index, denoted $H(a,b)$, varies from
+1 to $-\infty$. It is defined by: $H(a,b) =1-\frac{n n_{a \wedge
+ \overline{b}}}{n_a n_{\overline{b}}}$. Its partial derivative with respect to the number of counter-examples is therefore:
+$$\frac{\partial H}{\partial n_{a \wedge \overline{b}}}=-\frac{n}{n_a n_{\overline{b}}}$$
+Thus this index is always decreasing with $n_{a \wedge
+ \overline{b}}$. If it is "close" to 1, implication is "almost"
+satisfied. But this index, since it does not refer to a
+probability scale, has the disadvantage of not providing a probability threshold and of being
+invariant under any dilation of $E$, $A$, $B$ and $A \cap \overline{B}$.
+\subsubsection{The Lift Index}
+It is expressed by: $l =\frac{n n_{a \wedge b}}{n_a n_b}$.
+This expression, linear with respect to the examples, can still be
+written to highlight the number of counter-examples:
+$$l =\frac{n (n_a - n_{a \wedge \overline{b}})}{n_a n_b}$$
+To study the sensitivity of $l$ to parameter variations, we use:
+$$\frac{\partial l}{\partial n_{a \wedge \overline{b}} } =
+-\frac{n}{n_a n_b}$$
+Thus, the rate of variation of the Lift index is independent of
+the number of counter-examples.
+It is a constant that depends only on $n$ and on the occurrences of $a$ and $b$. Therefore, $l$ decreases when the number of counter-examples increases, which semantically is acceptable, but the rate of decrease does not depend on the rate of growth of $n_{a \wedge \overline{b}}$.
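+A small worked illustration of this constant rate (values chosen here
+for illustration: $n = 100$, $n_a = 20$ and $n_b = 40$ are held fixed
+and only the number of counter-examples varies):
+$$l = \frac{100 \times (20 - 4)}{20 \times 40} = 2, \qquad
+ l = \frac{100 \times (20 - 5)}{20 \times 40} = 1.875, \qquad
+ l = \frac{100 \times (20 - 6)}{20 \times 40} = 1.75$$
+Each additional counter-example lowers $l$ by the same amount, $0.125$,
+whatever the number of counter-examples already observed.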
+\subsubsection{Confidence}
+This index is the best known and most widely used, thanks to the
+visibility given to it by an English-language publication~\cite{Agrawal}.
+It is at the origin of several other commonly used indices which are only variants satisfying this or that semantic requirement. Moreover, it is simple and can be interpreted easily and immediately. It is expressed by:
+$$c=\frac{n_{a \wedge b}}{n_a} = 1-\frac{n_{a \wedge \overline{b}}}{n_a}$$
+The first form, linear with respect to the examples, independent of
+$n_b$, is interpreted as a conditional frequency of the examples of
+$b$ when $a$ is known.
+The sensitivity of this index to variations in the occurrence of
+counter-examples is read through the partial derivative:
+$$\frac{\partial c}{\partial n_{a \wedge \overline{b}} } =
+-\frac{1}{n_a }$$
+Consequently, confidence increases when $n_{a \wedge \overline{b}}$
+decreases, which is semantically acceptable, but the rate of variation
+is constant, independent of the rate of decrease of this number, of
+the variations of $n$ and $n_b$.
+This property seems not to satisfy intuition.
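+The gradient of $c$ is expressed only in relation to $n_{a \wedge
+ \overline{b}}$ and $n_a$: $\left(\frac{\partial c}{\partial n_a}, \frac{\partial c}{\partial n_{a \wedge \overline{b}}}\right) = \left(\frac{n_{a \wedge \overline{b}}}{n_a^2}, -\frac{1}{n_a}\right)$.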
+This may also appear to be a restriction on the role of parameters in
+expressing the sensitivity of the index.
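+A small worked illustration of this behaviour (values chosen here for
+illustration: $n_a = 20$ is held fixed and the number of
+counter-examples goes from 4 to 5):
+$$c = 1 - \frac{4}{20} = 0.8, \qquad c = 1 - \frac{5}{20} = 0.75$$
+The decrease, $0.05 = \frac{1}{n_a}$, would be the same for any values
+of $n$ and $n_b$, since neither appears in the expression of $c$.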
+\section{Gradient field, implicative field}
+We highlight here the existence of fields generated by the variables
+of the corpus.
+\subsection{Existence of a gradient field}
+As in our Newtonian physical space, where a gravitational field emitted
+by each material object acts, we can consider that a field exists
+around each variable.
+For example, the variable $a$ generates a scalar field whose value in
+$b$ is maximum and equal to the intensity of implication or the
+implication index $q(a,\overline{b})$.
+Its action spreads in V according to differential laws as J.M. Leblond
+says, in~\cite{Leblond} p.242.
+Let us consider the space $E$ of dimension 4 where the coordinates of
+the points $M$ are the parameters relative to the binary variables $a$
+and $b$, i.e. ($n$, $n_a$, $n_b$, $n_{a\wedge \overline{b}}$). $q(a,\overline{b})$ is the realization of a scalar field, as a mapping from $\mathbb{R}^4$ to $\mathbb{R}$ (with $\mathbb{N}^4$ embedded in $\mathbb{R}^4$).
+For the vector $\mathrm{grad}\, q$, whose components are the partial derivatives of $q$
+with respect to the variables $n$, $n_a$, $n_b$, $n_{a\wedge
+ \overline{b}}$, to define a gradient field (a particular vector
+field that we will also call implicative field), it must respect the
+Schwarz criterion of an exact total differential, i.e.:
+$$\frac{\partial}{\partial n_{a\wedge \overline{b}}}\left(
+\frac{\partial q}{\partial n_b} \right) =\frac{\partial}{\partial n_b}\left(
+\frac{\partial q}{\partial n_{a\wedge \overline{b}}} \right) $$
+and the same for the other variables taken in pairs. However, we have,
+through the formulas (\ref{eq2.3}) and (\ref{eq2.4})