chapter2.tex

   1 %%%%%%%%%%%%%%%%%%%%% chapter.tex %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   2 %
   3 % sample chapter
   4 %
   5 % Use this file as a template for your own input.
   6 %
   7 %%%%%%%%%%%%%%%%%%%%%%%% Springer-Verlag %%%%%%%%%%%%%%%%%%%%%%%%%%
   8 %\motto{Use the template \emph{chapter.tex} to style the various elements of your chapter content.}
   9 \chapter{From the founding situations of the SIA to its formalization}
  10 \label{intro} % Always give a unique label
  11 % use \chaptermark{}
  12 % to alter or adjust the chapter heading in the running head
  13
  14
  15
  16 \abstract{
  17 Starting from mathematical didactic situations, the implicitative
  18 statistical analysis method develops as problems are encountered and
  19 questions are asked.
  20 Its main objective is to structure data crossing subjects and
  21 variables, to extract inductive rules between variables and, based on
  22 the contingency of these rules, to explain and therefore forecast in
  23 various fields: psychology, sociology, biology, etc.
It is to this end that the concepts of intensity of implication,
class cohesion, implication-inclusion, significance of hierarchical
levels, contribution of additional variables, etc., have been developed.
  27 Similarly, the processing of binary variables (e.g., descriptors) is
  28 gradually being supplemented by the processing of modal, frequency
  29 and, recently, interval and fuzzy variables.
  30 }
  31
  32 \section*{Preamble}
  33
Human operative knowledge consists mainly of two components: knowledge
of facts and knowledge of rules between facts or between rules themselves.
It is through learning, shaped by culture and personal
experience, that an individual gradually develops these forms of knowledge,
despite the regressions, the questioning and the ruptures that arise
when decisive information is encountered.
We know, however, that these contribute dialectically to ensuring a
balanced operation.
Rules, for their part, are formed inductively in a relatively stable way
as soon as the number of successes, in terms of their explanatory or
anticipatory quality, reaches a certain level (of confidence) from
which they are likely to be implemented.
On the other hand, if this (subjective) level is not reached, the
individual's economy will make him resist, in the first instance, their
abandonment or criticism.
Indeed, it is costly to replace the initial rule with another rule
when a small number of refutations appear, since it has been
reinforced by a large number of confirmations.
An increase in this number of negative instances, depending on the
robustness of the level of confidence in the rule, may lead to its
readjustment or even abandonment.
Laurent Fleury~\cite{Fleury}, in his thesis, aptly cites the
example, which Régis repeats, of the highly admissible rule ``all
Ferraris are red''.
This very robust rule will not be abandoned upon observing one or
two counter-examples,
especially since it would soon be
confirmed again.
  62
Thus, contrary to what is legitimate in mathematics, where no
rule (theorem) admits an exception and where determinism is total,
rules in the human sciences, and more generally in the so-called ``soft''
sciences, are acceptable and therefore operative as long as the number
of counter-examples remains ``bearable'' in view of the frequency of
situations where they prove positive and effective.
  69 The problem in data analysis is then to establish a relatively
  70 consensual numerical criterion to define the notion of a level of
  71 confidence that can be adjusted to the level of requirement of the
  72 rule user.
That it is based on statistics is not surprising.
That it has a property of non-linear resistance to noise (the first
counter-examples carry little weight) may also seem natural, in line with the
``economic'' meaning mentioned above.
That it should collapse when counter-examples accumulate is a further
requirement guiding our choice in modeling the desired criterion.
This text presents the epistemological choice we have made.
As such it is refutable, but the number of situations and
applications where it has proved relevant and fruitful leads us to
retrace its genesis here.
  83
  84 \section{Introduction}
  85
  86 Different theoretical approaches have been adopted to model the
  87 extraction and representation of imprecise (or partial) inference
  88 rules between binary variables (or attributes or characters)
  89 describing a population of individuals (or subjects or objects).
  90 But the initial situations and the nature of the data do not change
  91 the initial problem.
The aim is to discover non-symmetrical inductive rules that
model relationships of the type ``if $a$ then almost $b$''.
This is, for example, the option taken by Bayesian networks~\cite{Amarger}
or Galois lattices~\cite{Simon}.
Most often, however, since the correlation and the
$\chi^2$ test are unsuitable because of their symmetric nature,
conditional probability~\cite{Loevinger,Agrawal,Grasn} remains the
driving force behind the definition of the association, even when the
selected association index is multivariate~\cite{Bernard}.
 101
 102
 103
Moreover, to our knowledge, most of the different
and interesting developments focus on proposals for a partial
implication index for binary data~\cite{Lermana,Lallich}.
This notion is not extended to other types of
variables, nor to extraction and representation according to a rule graph
or a hierarchy of meta-rules, structures that aim to give access to the
meaning of a whole not reduced to the sum of its
parts~\cite{Seve}\footnote{This is what the philosopher L. Sève
  emphasizes: ``... in the non-additive, non-linear passage of the
  parts to the whole, there are properties that are in no way
  precontained in the parts and which cannot therefore be explained by
  them''.}, i.e. operating as a complex non-linear system.
 116 For example, it is well known, through usage, that the meaning of a
 117 sentence does not completely depend on the meaning of each of the
 118 words in it (see the previous chapter, point 4).
 119
Let us return to what we believe is fertile in the approach we are
developing.
It would seem that, in the literature, the notion of implication index
has not been extended to the search for the subjects and categories of
subjects responsible for associations,
nor has this responsibility been quantified so as to lead to a
reciprocal structuring of all subjects, conditioned by their
relationships to variables.
We propose these extensions here after recalling the founding
paradigm.
 130
 131
 132 \section{Implication intensity in the binary case}
 133
 134 \subsection{Fundamental and founding situation}
 135
A set $E$ of objects or subjects is crossed with variables
(characters, criteria, successes, ...) which are interrogated as
follows: ``to what extent can we consider that instantiating variable\footnote{Throughout the book, the word ``variable'' refers either to an isolated variable in premise (example: ``to be blonde'') or to a conjunction of isolated variables (example: ``to be blonde and to be under 30 years old and to live in Paris'').} $a$
implies instantiating variable $b$?
In other words, do the subjects tend to be $b$ if we know that they are
$a$?''.
In the natural, human or life sciences, where theorems (if $a$
then $b$) in the deductive sense of the term cannot be established
because of the exceptions that taint them, it is important for the
researcher and the practitioner to ``mine their data'' in order to
identify rules that are sufficiently reliable (kinds of ``partial theorems'',
inductions) to be able to conjecture\footnote{``The exception confirms the rule'', as the popular saying goes, in the sense that there would be no exceptions if there were no rule.} a possible causal relationship or
a genesis, to describe and structure a population, and to assume
a certain stability for descriptive and, if possible, predictive
purposes.
But this mining requires the development of methods to guide it
and to free it from trial and error and empiricism.
 153
 154
 155 \subsection{Mathematization}
 156
To do this, following the example of I.C. Lerman's similarity
measurement method~\cite{Lerman,Lermanb} and the classic
approach of non-parametric tests (e.g. Fisher, Wilcoxon, etc.), we
define~\cite{Grasb,Grasf} the confirmatory quality measure of the
implicative relationship $a \Rightarrow b$ from the implausibility of
the occurrence in the data of the number of cases that invalidate it,
i.e. for which $a$ is verified without $b$ being verified. This
amounts to comparing the observed number of counter-examples with the
theoretical number expected if only chance were at work\footnote{``...[in agreement with
    Jung] if the frequency of coincidences does not significantly
  exceed the probability that they can be calculated by attributing
  them solely by chance to the exclusion of hidden causal
  relationships, we certainly have no reason to suppose the existence
  of such relationships.'', H. Atlan~\cite{Atlana}}.
But when analyzing data, it is this gap that we take into account and
not the decision to reject or retain a null hypothesis.
This measure is relative to the number of data verifying $a$ and not
$b$, the circumstance in which the implication is
precisely put in default.
 176 It quantifies the expert's "astonishment" at the unlikely small number
 177 of counter-examples in view of the supposed independence between the
 178 variables and the numbers involved.
 179
 180 Let us be clear. A finite set $V$ of $v$ variables is given: $a$, $b$,
 181 $c$,...
In the classical paradigmatic situation initially retained, the variables
are performances (success or failure) on the items of a questionnaire.
With a finite set $E$ of $n$ subjects $x$ are associated, by abuse of notation,
functions of the type $x \mapsto a(x)$, where $a(x) = 1$ (or $a(x) = \mathrm{true}$) if $x$ satisfies
or has the character $a$, and $a(x) = 0$ (or $a(x) = \mathrm{false}$) otherwise.
 188 In artificial intelligence, we will say that $x$ is an example or an
 189 instance for $a$ if $a(x) = 1$ and a counter-example if not.
 190
 191
 192 The $a \Rightarrow b$ rule is logically true if for any $x$ in the
 193 sample, $b(x)$ is null only if $a(x)$ is also null; in other words if
 194 set $A$ of the $x$ for which $a(x)=1$ is contained in set $B$ of the
 195 $x$ for which $b(x)=1$.
 196 However, this strict inclusion is only exceptionally observed in the
 197 pragmatically encountered experiments.
 198 In the case of a knowledge questionnaire, we could indeed observe a
 199 few rare students passing an item $a$ and not passing item $b$,
 200 without contesting the tendency to pass item $b$ when we have passed
 201 item $a$.
With regard to the cardinals of $E$ (of size $n$), but also of $A$ (of size
$n_a$) and $B$ (of size $n_b$), it is therefore the ``weight'' of the
counter-examples (of number $n_{a \wedge \overline{b}}$) that must be taken into account in order to
decide statistically whether or not to keep the quasi-implication or
quasi-rule $a \Rightarrow b$. Thus, it is from the dialectic of
examples and counter-examples that the rule appears as the overcoming of
contradiction.
 209
 210 \subsection{Formalization}
 211
To formalize this quasi-rule, we consider any two parts $X$ and $Y$ of
$E$, chosen randomly and independently (absence of a priori link
between these two parts) and of the same respective cardinals as $A$
and $B$. Let $\overline{Y}$ and $\overline{B}$ be the respective complements of $Y$ and $B$ in $E$, both of cardinal $n_{\overline{b}}= n-n_b$.
 216
 217 We will then say:
 218
 219 \definition $a \Rightarrow b$ is acceptable at confidence level
 220 $1-\alpha$ if and only if
 221 $$Pr[Card(X\cap \overline{Y})\leq card(A\cap \overline{B})]\leq \alpha$$
 222
 223 \begin{figure}[htbp]
 224   \centering
 225 \includegraphics[scale=0.34]{chap2fig1.png}
 226  \caption{The dark grey parts correspond to the counter-examples of the
 227    implication $a \Rightarrow b$}
 228 \label{chap2fig1}
 229 \end{figure}
 230
 231 It is established \cite{Lermanb} that, for a certain drawing process,
 232 the random variable $Card(X\cap \overline{Y})$ follows the Poisson law
 233 of parameter $\frac{n_a n_{\overline{b}}}{n}$.
 234 We achieve this same result by proceeding differently in the following
 235 way:
 236
Let $X$ (resp. $Y$) denote the random subset of binary transactions where
$a$ (resp. $b$) would appear, independently, with the frequency
$\frac{n_a}{n}$ (resp. $\frac{n_b}{n}$).
To specify how the transactions in which $a$ and $b$ are observed,
i.e. $A$ and $B$ respectively, are drawn, the following
semantically permissible assumptions are made regarding the
observation of the event $[a=1~ and~ b=0]$, where $A\cap
\overline{B}$\footnote{We denote by $\overline{v}$ the
  negation of the variable $v$ (or $not~ v$) and by $\overline{P}$ the complement
  in $E$ of the part $P$.} is the subset of transactions that are
counter-examples of the implication $a \Rightarrow b$.
 248
 249 Assumptions:
 250 \begin{itemize}
\item h1: the waiting times of an event $[a~ and~ not~ b]$ are independent
  random variables;
\item h2: the law of the number of events occurring in the time
  interval $[t,~ t+T[$ depends only on $T$;
\item h3: two such events cannot occur simultaneously.
 256 \end{itemize}
 257
It is then demonstrated (for example in~\cite{Saporta}) that the
number of events occurring during a period of fixed duration $n$
follows a Poisson law of parameter $c\,n$, where $c$ is the
rate of occurrence of the process per unit of time.
 262
 263
Now, for each transaction assumed to be random, the event $[a=1]$
has as probability the frequency $\frac{n_a}{n}$ and the event $[b=0]$
has as probability the frequency $\frac{n_{\overline{b}}}{n}$; therefore the joint event $[a=1~
  and~ b=0]$ has a probability estimated by the product
$\frac{n_a}{n} \cdot \frac{n_{\overline{b}}}{n}$ under the hypothesis of absence of an a priori link between $a$ and $b$ (independence).

We can then estimate the rate $c$ of this event by $\frac{n_a}{n} \cdot \frac{n_{\overline{b}}}{n}$.
 271
Thus, for a duration $n$, the occurrences of the event $[a~ and~ not~b]$ follow a Poisson law of parameter
$$\lambda = n \cdot \frac{n_a}{n} \cdot \frac{n_{\overline{b}}}{n} = \frac{n_a n_{\overline{b}}}{n}$$

As a result, $Pr[Card(X\cap \overline{Y})= s]= e^{-\lambda}\frac{\lambda^s}{s!}$.
 276
Consequently, the probability that chance will lead, under the
assumption of the absence of an a priori link between $a$ and $b$, to
no more counter-examples than those observed is:

$$Pr[Card(X\cap \overline{Y})\leq card(A\cap \overline{B})] =
\sum^{card(A\cap \overline{B})}_{s=0}  e^{-\lambda}\frac{\lambda^s}{s!} $$
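
As a purely illustrative sketch, with hypothetical counts that do not come from any data set discussed here, take $n=100$, $n_a=20$, $n_{\overline{b}}=40$ and suppose that $card(A\cap \overline{B})=3$ counter-examples are observed. Then $\lambda = \frac{20 \times 40}{100} = 8$ and
$$Pr[Card(X\cap \overline{Y})\leq 3] = e^{-8}\left(1+8+\frac{8^2}{2!}+\frac{8^3}{3!}\right) \approx 0.042$$
Under these assumed values, chance alone would produce so few counter-examples in only about $4\%$ of cases, and the quasi-rule $a \Rightarrow b$ would be acceptable at the confidence level $0.95$ in the sense of the definition above.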
 283
 284  But other legitimate drawing processes lead to a binomial law, or
 285  even a hypergeometric law (itself not semantically adapted to the
 286  situation because of its symmetry). Under suitable convergence
 287  conditions, these two laws are finally reduced to the Poisson Law
 288  above (see Annex to this chapter).
 289
If $n_{\overline{b}}\neq 0$, we center and reduce this Poisson variable
into the variable:

$$Q(a,\overline{b})= \frac{card(X \cap \overline{Y}) -  \frac{n_a n_{\overline{b}}}{n}}{\sqrt{\frac{n_a n_{\overline{b}}}{n}}}  $$
 294
 295 In the experimental realization, the observed value of
 296 $Q(a,\overline{b})$ is $q(a,\overline{b})$.
 297 It estimates a gap between the contingency $(card(A\cap
 298 \overline{B}))$ and the value it would have taken if there had been
 299 independence between $a$ and $b$.
 300
 301 \definition
 302 \begin{equation} q(a,\overline{b}) = \frac{n_{a \wedge \overline{b}}-
 303     \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}}
 304   \label{eq2.1}
 305 \end{equation}
is called the implication index, the number used as an indicator of
the non-implication of $a$ to $b$.
In cases where the approximation is properly legitimized (for example
$\frac{n_a n_{\overline{b}}}{n}\geq 4$), the variable
$Q(a,\overline{b})$ approximately follows the standard (centered and reduced) normal
distribution. The intensity of implication, measuring the quality of
$a\Rightarrow b$, for $n_a\leq n_b$ and $n_b \neq n$, is then defined
from the index $q(a,\overline{b})$ by:
 314
 315 \definition
The implication intensity that measures the inductive quality of $a$
over $b$ is:
 318 $$\varphi(a,b)=1-Pr[Q(a,\overline{b})\leq q(a,\overline{b})] =
 319 \frac{1}{\sqrt{2 \pi}} \int^{\infty}_{ q(a,\overline{b})}
 320 e^{-\frac{t^2}{2}} dt,~ if~ n_b \neq n$$
 321 $$\varphi(a,b)=0,~ otherwise$$
 322 As a result, the definition of statistical implication becomes:
 323 \definition
 324 Implication  $a\Rightarrow b$ is admissible at confidence level
 325 $1-\alpha $ if and only if:
 326 $$\varphi(a,b)\geq 1-\alpha$$
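
To fix ideas, here is a worked sketch with hypothetical counts (chosen for illustration only): $n=100$, $n_a=20$, $n_b=60$, hence $n_{\overline{b}}=40$, and $n_{a \wedge \overline{b}}=3$. The expected number of counter-examples under independence is $\frac{n_a n_{\overline{b}}}{n}=8 \geq 4$, so the normal approximation may be used, and
$$q(a,\overline{b}) = \frac{3-8}{\sqrt{8}} \approx -1.77, \qquad \varphi(a,b) = \frac{1}{\sqrt{2 \pi}} \int^{\infty}_{-1.77} e^{-\frac{t^2}{2}} dt \approx 0.96$$
With these assumed values, the implication $a\Rightarrow b$ would be admissible at the confidence level $0.95$ but not at the level $0.99$.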
 327
 328
It should be recalled that this modeling of quasi-implication measures
the astonishment at the small number of counter-examples compared with
the surprisingly large number of instances of the implication.
It is a measure of the inductive and informative quality of
implication. Therefore, if the rule is trivial, as in the case where
$B$ is very large or coincides with $E$, this astonishment becomes
small.
We also demonstrate~\cite{Grasf} that this triviality results in a
very low or even zero intensity of implication: if, $n_a$ being fixed
and $A$ being included in $B$, $n_b$ tends towards $n$ ($B$ ``grows''
towards $E$), then $\varphi(a,b)$ tends towards $0$. We therefore
define, by ``continuity'', $\varphi(a,b) = 0$ if $n_b = n$. Similarly, even if
$A\subset B$, $\varphi(a,b)$ may be less than $1$ in the case where
the inductive confidence, measured by statistical surprise, is
insufficient.
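
This limit behaviour can be checked directly on the Poisson model, taking as intensity the complement $1-Pr[Card(X\cap \overline{Y})\leq card(A\cap \overline{B})]$ (an assumption made here for the sake of the computation, since the normal approximation is not legitimate for such small parameters). If $A \subset B$, then $card(A\cap \overline{B})=0$ and
$$1-Pr[Card(X\cap \overline{Y})\leq 0] = 1-e^{-\lambda}, \qquad \lambda=\frac{n_a (n-n_b)}{n}$$
which tends towards $0$ when $n_b$ tends towards $n$. For instance, with the hypothetical values $n=100$ and $n_a=10$, one gets $\lambda=5$ and $1-e^{-5}\approx 0.993$ for $n_b=50$, but $\lambda=0.1$ and $1-e^{-0.1}\approx 0.095$ for $n_b=99$.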
 344
 345 {\bf \remark Total correlation, partial correlation}
 346
 347
 348 We take here the notion of correlation in a more general sense than
 349 that used in the domain that develops the linear correlation
 350 coefficient (linear link measure) or the correlation ratio (functional
 351 link measure).
 352 In our perspective, there is a total (or partial) correlation between
 353 two variables $a$ and $b$ when the respective events they determine
 354 occur (or almost occur) at the same time, as well as their opposites.
 355 However, we know from numerical counter-examples that correlation and
 356 implication do not come down to each other, that there can be
 357 correlation without implication and vice versa~\cite{Grasf} and below.
 358 If we compare the implication coefficient and the linear correlation
 359 coefficient algebraically, it is clear that the two concepts do not
 360 coincide and therefore do not provide the same
 361 information\footnote{"More serious is the logical error inferred from
 362   a correlation found to the existence of a causality" writes Albert
 363   Jacquard in~\cite{Jacquard}, p.159. }.
 364
The non-symmetric quasi-implication index $q(a,\overline{b})$ does
not coincide with the correlation coefficient $\rho(a, b)$, which is
symmetric and which reflects the relationship between variables $a$ and
$b$. Indeed, we show~\cite{Grasf} that if $q(a,\overline{b}) \neq 0$
then
$$\frac{\rho(a,b)}{q(a,\overline{b})} = -\sqrt{\frac{n}{n_b
    n_{\overline{a}}}}$$
With correlation considered from the point of view of linear
correlation, even if correlation and implication rather go in the
same direction, the orientation of the relationship between two
variables is not apparent because the measure is symmetrical, which is not
the stance taken in SIA.
 377 From a statistical relationship given by the correlation, two opposing
 378 empirical propositions can be deduced.
 379
 380 The following dual numerical situation clearly illustrates this:
 381
 382
 383 \begin{table}[htp]
 384 \center
 385 \begin{tabular}{|l|c|c|c|}\hline
 386 \diagbox[width=4em]{$a_1$}{$b_1$}&
 387   1 & 0 & margin\\ \hline
 388   1 & 96 & 4& 100 \\ \hline
  0 & 56 & 44& 100 \\ \hline
  margin & 152 & 48& 200 \\ \hline
 391 \end{tabular} ~ ~ ~ ~ ~ ~ ~ \begin{tabular}{|l|c|c|c|}\hline
 392 \diagbox[width=4em]{$a_2$}{$b_2$}&
 393   1 & 0 & margin\\ \hline
 394   1 & 94 & 6& 100 \\ \hline
 395   0 & 52 & 48& 100 \\ \hline
 396   margin & 146 & 54& 200 \\ \hline
 397 \end{tabular}
 398
 399 \caption{Numeric example of difference between implication and
 400   correlation}
 401 \label{chap2tab1}
 402 \end{table}
 403
 404 In Table~\ref{chap2tab1}, the following correlation and implications
 405 can be computed:\\
 406 Correlation $\rho(a_1,b_1)=0.468$, Implication
 407 $q(a_1,\overline{b_1})=-4.082$\\
 408 Correlation $\rho(a_2,b_2)=0.473$, Implication  $q(a_2,\overline{b_2})=-4.041$
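
As a check, the values for the second table follow from its cells, using for $\rho$ the usual expression of the linear correlation coefficient between two binary variables (a standard formula, recalled here as an aid and not taken from the text above). With $n=200$, $n_{a_2}=n_{\overline{a_2}}=100$, $n_{b_2}=146$, $n_{\overline{b_2}}=54$, $n_{a_2 \wedge b_2}=94$ and $n_{a_2 \wedge \overline{b_2}}=6$:
$$\rho(a_2,b_2)=\frac{n\, n_{a_2 \wedge b_2}-n_{a_2} n_{b_2}}{\sqrt{n_{a_2} n_{\overline{a_2}} n_{b_2} n_{\overline{b_2}}}} = \frac{200 \times 94-100 \times 146}{\sqrt{100 \times 100 \times 146 \times 54}} \approx 0.473$$
$$q(a_2,\overline{b_2})=\frac{6-\frac{100 \times 54}{200}}{\sqrt{\frac{100 \times 54}{200}}}=\frac{6-27}{\sqrt{27}} \approx -4.041$$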
 409
 410
Thus, we observe that, on the one hand, $a_1$ and $b_1$ are less
correlated than $a_2$ and $b_2$ while, on the other hand, the
implication intensity of $a_1$ over $b_1$ is higher than that of $a_2$
over $b_2$ since $q(a_1,\overline{b_1}) < q(a_2,\overline{b_2})$.
 415
On this subject, Alain Ehrenberg in~\cite{Ehrenberg} writes: ``The
finding of a correlation does not remove the ambiguity between `when I do $X$, my brain is in state $Y$' and `if I do $X$, it is because my brain is in state $Y$', that is, between something that happens in my brain when I do an action.''
 418
\remark  Recall that we consider not only conjunctions of variables
of the type ``$a$ and $b$'' but also disjunctions such as ``($a$ and $b$)
or $c$...'' in order to model phenomena that are concepts, as is done
in learning or in artificial intelligence.
The associated calculations remain compatible with the logic of
propositions linked by connectives.
 425
\remark Unlike the Loevinger index~\cite{Loevinger}, conditional
probability $Pr[B/A]$ and all its derivatives, the implication
intensity varies, non-linearly, with the expansion of sets $E$, $A$
and $B$ and weakens with triviality (see Definition 2.3).
Moreover, it
is resistant to noise, especially around $0$ for $n_{a \wedge \overline{b}}$, which can only
lend credibility to the relationship we want to model and establish
statistically.
Finally, as we have seen, the inclusion of $A$ in $B$ does not ensure
maximum intensity: the inductive quality may not be strong, whereas
$Pr[B/A]$ is equal to $1$~\cite{Grasm,Guillet}.
In paragraph 5, we study more closely the problem of the sensitivity
and stability of the implication index as a function of small
variations in the parameters involved, through the study of its
differential.
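
To illustrate this non-linear behaviour, consider purely hypothetical counts $n=100$, $n_a=20$, $n_{\overline{b}}=40$ (so that $\frac{n_a n_{\overline{b}}}{n}=8$) and let the number of counter-examples grow. Under the normal approximation used in the definition of $\varphi$, one obtains approximately:
\begin{center}
\begin{tabular}{|l|c|c|c|c|c|c|}\hline
$n_{a \wedge \overline{b}}$ & 0 & 1 & 2 & 3 & 6 & 8 \\ \hline
$q(a,\overline{b})$ & $-2.83$ & $-2.47$ & $-2.12$ & $-1.77$ & $-0.71$ & $0$ \\ \hline
$\varphi(a,b)$ & $0.998$ & $0.993$ & $0.983$ & $0.961$ & $0.760$ & $0.5$ \\ \hline
\end{tabular}
\end{center}
With these assumed values, the first counter-examples barely lower the intensity, whereas their accumulation makes it collapse; over the same range, the conditional frequency of $b$ given $a$ decreases linearly from $1$ to $0.6$.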
 441
 442 \section{Case of modal and frequency variables}
 443 \subsection{Founding situation}
 444
Marc Bailleul's research (1991-1994) focuses in particular on the
representation that mathematics teachers have of their own teaching.
In order to highlight it, they are offered meaningful words that
they must rank.
Their choices are no longer binary: the words chosen by any teacher
are ordered from the least to the most representative.
M. Bailleul's work then focuses on questions of the type: ``if I
choose this word with this importance, then I choose this other word
with at least equal importance''.
It was therefore necessary to extend the notion of statistical
implication to variables other than binary.
This is the case for modal variables, which are associated with
phenomena where the values $a(x)$ are numbers in the interval $[0, 1]$
and describe degrees of membership or satisfaction, as do, in fuzzy logic,
linguistic modifiers such as ``maybe'', ``a little'', ``sometimes'',
etc.
 461 This problem is also found in situations where the frequency of a
 462 variable reflects a preorder on the values assigned by the subjects to
 463 the variables presented to them.
 464 These are frequency variables that are associated with phenomena where
 465 the values of $a(x)$ are any positive real values.
 466 This is the case when one considers a student's percentage of success
 467 in a battery of tests in different areas.
 468
 469 \subsection{Formalization}
 470
 471 J.B. Lagrange~\cite{Lagrange} has demonstrated that, in the modal
 472 case,
 473 \begin{itemize}
 474   \item if $a(x)$ and $\overline{b}(x)$ are the values taken at $x$ by
 475     the modal variables $a$ and $\overline{b}$, with $(x)=1-b(x)$
 476   \item if $s^2_a$ and $s_{\overline{b}}^2$ are the empirical variances of variables $a$ and $\overline{b}$
 477 then  the implication index, which he calls propensity index, becomes:
 478
 479 \definition
 480 $$q(a,\overline{b}) = \frac{\sum_{x\in E} a(x)\overline{b}(x)  -
 481   \frac{n_a n_{\overline{b}}}{n}}
 482 {\sqrt{\frac{(n^2s_a^2+n_a^2)(n^2+s_{\overline{b}}^2 + n_{\overline{b}}^2)}{n^3}}}$$
 483 is the index of propensity of modal variables.
 484 \end{itemize}
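
As a toy illustration on hypothetical data, the propensity index can be computed by hand. Three readings are assumed here: $n_a$ and $n_{\overline{b}}$ are the sums $\sum_{x\in E} a(x)$ and $\sum_{x\in E} \overline{b}(x)$, the empirical variances are taken with divisor $n$, and the second factor under the root is $n^2 s_{\overline{b}}^2+n_{\overline{b}}^2$, by symmetry with the first. Take $n=4$ subjects with $a=(1,~0.5,~0.5,~0)$ and $b=(1,~1,~0.5,~0.5)$, so that $\overline{b}=(0,~0,~0.5,~0.5)$. Then $\sum_{x\in E} a(x)\overline{b}(x)=0.25$, $n_a=2$, $n_{\overline{b}}=1$, $s_a^2=0.125$ and $s_{\overline{b}}^2=0.0625$, hence
$$q(a,\overline{b}) = \frac{0.25-\frac{2 \times 1}{4}}{\sqrt{\frac{(16 \times 0.125+4)(16 \times 0.0625+1)}{64}}} = \frac{-0.25}{\sqrt{0.1875}} \approx -0.577$$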
 485
J.B. Lagrange also proves that this index coincides with the index
defined previously in the binary case if the number of modalities of $a$
and $b$ is precisely 2, because in this case:\\
$n^2s_a^2+n_a^2=n n_a$,~ ~ $n^2s_{\overline{b}}^2 + n_{\overline{b}}^2=n
  n_{\overline{b}}$~ ~ and ~ ~ $\sum_{x\in E} a(x)\overline{b}(x)=n_{a \wedge
  \overline{b}}$.
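
The first identity can be checked in one line, assuming that $s_a^2$ is the empirical variance with divisor $n$ and that $n_a=\sum_{x\in E} a(x)$. In the binary case $a(x)^2=a(x)$, so that
$$n^2 s_a^2 = n\sum_{x\in E} a(x)^2-\Big(\sum_{x\in E} a(x)\Big)^2 = n\, n_a-n_a^2$$
and therefore $n^2 s_a^2+n_a^2=n\, n_a$; the same computation applies to $\overline{b}$. The denominator of the propensity index then reduces to $\sqrt{\frac{n\, n_a \cdot n\, n_{\overline{b}}}{n^3}}=\sqrt{\frac{n_a n_{\overline{b}}}{n}}$, which is the denominator of the index~(\ref{eq2.1}).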
 492
This solution provided in the modal case is also applicable to the
case of frequency variables, or even positive numerical variables,
provided that the values observed on the variables, such as $a$ and $b$,
have been normalized, the normalization to $[0, 1]$ being based on the maximum value taken respectively by $a$ and $b$ on the set $E$.
 497
 498 \remark
In~\cite{Regniera}, we consider rank variables that reflect a
total order between choices presented to a population of judges.
Each judge must rank, in order of preference, a set of
objects or proposals presented to them.
An index measures the quality of statements of the type: ``if object
$a$ is ranked by judges then, generally, object $b$ is ranked higher
by the same judges''.
The proximity to the previous issue leads to an index that is relatively
close to the Lagrange index, but better adapted to the rank variable
situation.
 509
 510
 511 \section{Cases of variables-on-intervals  and interval-variables}
 512 \subsection{Variables-on-intervals}
 513 \subsubsection{Founding situation}
 514
For example, we seek to extract the following rule from a
biometric data set and to estimate its quality: ``if an individual weighs
between $65$ and $70$~kg then in general he is between $1.70$ and
$1.76$~m tall''.
A similar situation arises in the search for relationships between
intervals of student performance in two different subjects.
The more general situation is then expressed as follows: two real
variables $a$ and $b$ take a certain number of values over two finite
intervals $[a1,~ a2]$ and $[b1,~ b2]$. Let $A$ (resp. $B$) be the set of
values of $a$ (resp. $b$) observed over $[a1,~ a2]$ (resp. $[b1,~
  b2]$).
For example, here, $a$ represents the weights of a set of $n$ subjects and $b$ the heights of these same subjects.
 527
 528 Two problems arise:
 529 \begin{enumerate}
 530 \item  Can adjacent sub-intervals of $[a1,~ a2]$ (resp. $[b1,~ b2]$)
 531   be defined so that the finest partition obtained best respects the
 532   distribution of the values observed in $[a1,~ a2]$ (resp. $[b1,~ b2]$)?
\item  Can we find respective partitions of $[a1,~ a2]$ and $[b1,~
  b2]$, made up of unions of the previous adjacent sub-intervals,
  that maximize the average intensity of implication of the
  sub-intervals of one on the sub-intervals of the other belonging to
  these partitions?
 538 \end{enumerate}
 539
 540 We answer these two questions as part of our problem by choosing the
 541 criteria to optimize in order to satisfy the optimality expected in
 542 each case.
 543 To the first question, many solutions have been provided in other
 544 settings (for example, by~\cite{Lahaniera}).
 545
 546 \subsubsection{First problem}
 547
We consider the interval $[a1,~ a2]$, assuming it has a trivial
initial partition into sub-intervals of the same length, but not
necessarily with the same observed frequency on each of these
sub-intervals.
Let $P_0 = \{A_{01},~ A_{02},~ ...,~ A_{0p}\}$ denote this partition into $p$
sub-intervals.
We try to obtain a partition of $[a1,~ a2]$ into $p$ sub-intervals
$\{A_{q1},~ A_{q2},~ ...,~ A_{qp}\}$ in such a way that within each
sub-interval there is good statistical homogeneity (low intra-class
inertia) and that these sub-intervals have good mutual heterogeneity
(high inter-class inertia).
We know that if one of the criteria is verified, the other is
necessarily verified (König-Huygens theorem).
This will be done by adopting a method directly inspired by the
dynamic clouds method developed by Edwin Diday~\cite{Diday} (see also
\cite{Lebart}) and adapted to the current situation. This results in
the optimal partition targeted.
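
The equivalence of the two criteria comes from the decomposition of the total inertia into intra-class and inter-class inertia. As a minimal numerical sketch (hypothetical values, unit weights, inertia taken as a sum of squared deviations), consider the six observations $1, 2, 3, 7, 8, 9$ split into the classes $\{1,2,3\}$ and $\{7,8,9\}$, with means $2$ and $8$ and overall mean $5$:
$$\underbrace{58}_{\mbox{total}} = \underbrace{(1+0+1)+(1+0+1)}_{\mbox{intra-class}~=~4} + \underbrace{3(2-5)^2+3(8-5)^2}_{\mbox{inter-class}~=~54}$$
Since the total inertia does not depend on the partition, lowering the intra-class inertia amounts to raising the inter-class inertia.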
 565
 566 \subsubsection{Second problem}
 567
 568 It is now assumed that the intervals $[a1,~ a2]$ and $[b1,~ b2]$ are
 569 provided with optimal partitions $P$ and $Q$, respectively, in the
 570 sense of the dynamic clouds.
 571 Let $p$ and $q$ be the respective numbers of sub-intervals composing
 572 $P$ and $Q$.
 573 From these two partitions, it is possible to generate $2^{p-1}$ and
 574 $2^{q-1}$ partitions obtained by iterated meetings of adjacent
 575 sub-intervals of $P$ and $Q$ \footnote{It is enough to consider the tree structure of which $A_1$ is the root, then to join it or not to $A_2$ which itself will or will not be joined to $A_3$, etc. There are therefore $2^{p-1}$ branches in this tree structure.} respectively.
 576 We calculate the respective intensities of implication of each
 577 sub-interval, whether or not combined with another of the first
 578 partition, on each sub-interval, whether or not combined with another
 579 of the second, and then the values of the intensities of the
 580 reciprocal implications.
 581 There are therefore a total of $2.2^{p-1}.2^{q-1}$ families of
 582 implication intensities, each of which requires the calculation of all
 583 the elements of a partition of $[a1,~ a2]$ on all the elements of one
 584 of the partitions of $[b1,~ b2]$ and vice versa.
 585 The optimality criterion is chosen as the geometric mean of the
 586 intensities of implication, the mean associated with each pair of
 587 partitions of elements, combined or not, defined inductively.
 588 We note the two maxima obtained (direct implication and its
 589 reciprocal) and we retain the two associated partitions by declaring
 590 that the implication of the variable-on-interval $a$ on the
 591 variable-on-interval $b$ is optimal when the interval $[a1,~ a2]$
 592 admits the partition corresponding to the first maximum and that the
optimal reciprocal implication is obtained for the partition of
 594 $[b1,~ b2]$ corresponding to the second maximum.
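
As a hedged illustration of this count (the values $p=3$ and $q=4$ are
chosen here only for the example), take $P=\{A_1, A_2, A_3\}$.
Each of the $p-1=2$ boundaries between adjacent sub-intervals is either
kept or removed, which gives the $2^{p-1}=4$ partitions
$$\{A_1, A_2, A_3\},\quad \{A_1\cup A_2, A_3\},\quad \{A_1, A_2\cup
A_3\},\quad \{A_1\cup A_2\cup A_3\}.$$
With $q=4$, the partition $Q$ yields $2^{q-1}=8$ partitions in the same
way, so that, counting both directions of implication, the number of
families of intensities to be compared is
$$2 \cdot 2^{p-1} \cdot 2^{q-1} = 2 \cdot 4 \cdot 8 = 64.$$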
 595
 596 \subsection{Interval-variables}
 597 \subsubsection{Founding situation}
Data are available from a population of $n$ individuals (some or all of
which may themselves be sets of individuals, e.g. a class of students)
according to variables (e.g. grades over a year in French, mathematics,
physics, \ldots, but also weight, height, chest size, \ldots).
The values taken by these variables for each individual are intervals
of positive real values.
For example, individual $x$ gives the value $[12,~ 15.50]$ to the
mathematics score variable.
In E. Diday's terminology, these are $p$ symbolic interval-valued
variables defined on the population.
 608
 609
We seek to define an implication from intervals relative to a variable
$a$, which are themselves observed intervals, towards other similarly
defined intervals relative to another variable $b$.
This will make it possible to measure the implicative, and therefore
non-symmetric, association of certain interval(s) of the variable $a$
with certain interval(s) of the variable $b$, as well as the
reciprocal association, from which the best one will be chosen for each
pair of sub-intervals involved, as just described in §4.1.
 618
For example, it will be said that the sub-interval $[2, 5.5]$ of
mathematics scores generally implies the sub-interval $[4.25, 7.5]$
of physics scores, both of which belong to an optimal partition, in
terms of explained variance, of the respective value ranges $[1, 18]$
and $[3, 20]$ observed in the population.
Similarly, we will say that $[14.25, 17.80]$ in physics most often
implies $[16.40, 18]$ in mathematics.
 626
 627
 628 \subsubsection{Algorithm}
 629
Following the approach of E. Diday and his collaborators, if the
values taken according to the subjects by the variables $a$ and $b$
are of a symbolic nature, in this case intervals of $\mathbb{R}^+$, it
is possible to extend the above algorithms~\cite{Grasi}.
For example, variable $a$ has weight intervals associated with it and
variable $b$ has height intervals, due to
inaccurate measurements.
By taking the union of the intervals $I_x$ and $J_x$ described by the subjects
$x$ of $E$ according to each of the variables $a$ and $b$
respectively, we obtain two intervals $I$ and $J$ covering all
possible values of $a$ and $b$.
On each of them a partition into a certain number of intervals can be
defined, respecting as above a certain optimality criterion.
For this purpose, the intersections of intervals such as $I_x$ and
$J_x$ with these partitions will be provided with a distribution
taking into account the lengths of the common parts.
This distribution may be uniform or of another discrete or continuous
type.
We are thus brought back to the search for rules between two sets of
variables-on-intervals that take, as previously in §4.1, their values
on $[0,~ 1]$, from which we can search for optimal implications.
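
A minimal sketch of this distribution step, assuming a uniform
distribution and a purely hypothetical partition: suppose that the
partition of $I$ contains the adjacent sub-intervals $[10,~13[$ and
$[13,~16]$ and that subject $x$ is described by $I_x=[12,~15.5]$, of
length $3.5$. The weights given by $x$ to the two sub-intervals are then
proportional to the lengths of the common parts:
$$\frac{|I_x\cap[10,~13[|}{|I_x|}=\frac{1}{3.5}\approx 0.286,
\qquad
\frac{|I_x\cap[13,~16]|}{|I_x|}=\frac{2.5}{3.5}\approx 0.714 .$$
These values lie in $[0,~1]$ and sum to $1$; they play the role of the
values taken by the variables-on-intervals of §4.1.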
 651
 652
\remark Whatever the type of variable considered, there is often a
problem of overabundance of variables and therefore a difficulty of
representation.
For this reason, we have defined an equivalence relation on the set of
variables that allows us to substitute a so-called leader variable for
an equivalence class~\cite{Grask}.
 659
\section{Variations in the implication index $q$ according to the 4 occurrences}

In this section, we examine the sensitivity of the implication index
to perturbations of its parameters.
 664
 665 \subsection{Stability of the implication index}
To study the stability of the implication index $q$ is to examine its
small variations in the vicinity of the $4$ observed integer values
($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$).
To do this, it is possible to perform different simulations by
crossing these 4 integer variables on which $q$ depends~\cite{Grasx}.
But let us consider these variables as real-valued variables and
$q$ as a continuously differentiable function of these
variables, which are themselves constrained by the inequalities $0\leq
n_a \leq n_b$, $n_{a \wedge \overline{b}} \leq \inf\{n_a,~ n_b\}$ and
$\sup\{n_a,~ n_b\} \leq n$.
The function $q$ then defines a scalar and vector field on
$\mathbb{R}^4$, regarded as an affine and vector space over itself.
Under the plausible hypothesis that the data collection process evolves
in a non-chaotic way, it is then sufficient to examine the differential of
$q$ with respect to these variables and to keep its restriction to the
integer values of the parameters of the relationship $a \Rightarrow b$.
The differential of $q$, in the sense of Fréchet's
topology\footnote{Fréchet's topology allows sections of $\mathbb{N}$,
  i.e. subsets of naturals of the form $\{n,~ n+1,~ n+2,~ \ldots\}$, to be
  used as a filter base, while the usual topology on $\mathbb{R}$
  uses real intervals for filters.
  Thus continuity and differentiability are perfectly defined and
  operational concepts according to Fréchet's topology in the same way
  as they are with the usual topology.}, is expressed as follows by
the scalar product:
 691
\begin{equation}
dq = \frac{\partial q}{\partial n}dn + \frac{\partial q}{\partial
  n_a}dn_a +  \frac{\partial q}{\partial n_b}dn_b +  \frac{\partial
  q}{\partial n_{a \wedge \overline{b}}}dn_{a \wedge \overline{b}} =
\mathrm{grad}~q \cdot dM\footnote{By a mechanical metaphor, we will say that $dq$ is
  the elementary work of $q$ for a displacement $dM$ (see chapter 14 of
  this book).}
\label{eq2.2}
\end{equation}
 701
where $M$ is the point with coordinates $(n,~ n_a,~ n_b,~ n_{a \wedge
  \overline{b}})$ in the scalar field $C$, $dM$ is the vector whose
components are the differential increments of these occurrence
variables, and $\mathrm{grad}~q$ is the vector whose components are the
partial derivatives of $q$ with respect to these occurrence variables.
 707
The differential of the function $q$ therefore appears as the scalar product of its gradient and the displacement $dM$ on the surface representing the variations of the function $q(n,~ n_a,~ n_b,~ n_{a \wedge
  \overline{b}})$. Thus, the gradient of $q$ represents its own
variations according to those of its components, the 4 cardinalities
of the sets $E$, $A$, $B$ and $A\cap \overline{B}$. It
indicates the direction and sense of growth or decrease of $q$ in
the space of dimension 4. Recall that it is carried by the normal to
the level surface $q = \mathrm{const}$.
 715
If we want to study how $q$ varies according to $ n_{\overline{b}}$,
we just have to replace $n_b$ by $n-n_{\overline{b}}$ and therefore change the sign
of the partial derivative with respect to $n_b$. In fact, the
interest of this differential lies in estimating the increase
(positive or negative) of $q$, denoted $\Delta q$, in relation to
the respective variations $\Delta n$, $\Delta n_a$, $\Delta n_b$ and
$\Delta n_{a \wedge
  \overline{b}}$. So we have:
 724
 725
 726 $$\Delta q= \frac{\partial q}{\partial n} \Delta n + \frac{\partial
 727   q}{\partial n_a} \Delta n_a  + \frac{\partial
 728   q}{\partial n_b} \Delta n_b + \frac{\partial
 729   q}{\partial n_{a \wedge
 730   \overline{b}}} \Delta n_{a \wedge
 731   \overline{b}} +o(\Delta q)$$
 732
where $o(\Delta q)$ is a remainder that is negligible compared with the
first-order terms.
Let us examine the partial derivatives of $q$ with respect to $n_b$ and
$n_{a \wedge \overline{b}}$, the number of counter-examples. We get:
 736
 737 \begin{equation}
 738   \frac{\partial
 739   q}{\partial n_b} = \frac{1}{2} n_{a \wedge
 740   \overline{b}} (\frac{n_a}{n})^{-\frac{1}{2}} (n-n_b)^{-\frac{3}{2}}
 741   + \frac{1}{2} (\frac{n_a}{n})^{\frac{1}{2}} (n-n_b)^{-\frac{1}{2}} >
 742   0
 743   \label{eq2.3}
 744 \end{equation}
 745
 746
 747 \begin{equation}
 748   \frac{\partial
 749   q}{\partial n_{a \wedge
 750   \overline{b}}}    = \frac{1}{\sqrt{\frac{n_a n_{\overline{b}}}{n}}}
 751   = \frac{1}{\sqrt{\frac{n_a (n-n_b)}{n}}} > 0
 752   \label{eq2.4}
 753 \end{equation}
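
Both formulas can be checked, as a sketch, by writing $q$ in product
form, with $n$, $n_a$ and $n_{a \wedge \overline{b}}$ treated as
independent of $n_b$ and with $n_{\overline{b}}$ replaced by $n-n_b$:
$$q = n_{a \wedge \overline{b}}
\left(\frac{n_a}{n}\right)^{-\frac{1}{2}} (n-n_b)^{-\frac{1}{2}}
- \left(\frac{n_a}{n}\right)^{\frac{1}{2}} (n-n_b)^{\frac{1}{2}} .$$
Since $\frac{\partial}{\partial n_b}(n-n_b)^{-\frac{1}{2}} =
\frac{1}{2}(n-n_b)^{-\frac{3}{2}}$ and $\frac{\partial}{\partial
n_b}(n-n_b)^{\frac{1}{2}} = -\frac{1}{2}(n-n_b)^{-\frac{1}{2}}$,
differentiating term by term gives (\ref{eq2.3}), while the coefficient of
$n_{a \wedge \overline{b}}$ in the first term is exactly the right-hand side
of (\ref{eq2.4}).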
 754
 755
Thus, if the increases $\Delta n_b$ and $\Delta n_{a \wedge
  \overline{b}}$ are positive, the increase of $q(a,\overline{b})$ is
also positive. This is interpreted as follows: if the number of
examples of $b$ and the number of counter-examples of the implication
increase, then the intensity of implication decreases for $n$ and $n_a$
constant. In other words, this intensity of implication is maximum at
the observed values $n_b$ and $ n_{a \wedge
  \overline{b}}$ and minimum at the values $n_b+\Delta n_b$ and  $n_{a \wedge
  \overline{b}}+ \Delta n_{a \wedge
  \overline{b}}$.
 766
 767 If we examine the case where $n_a$ varies, we obtain the partial
 768 derivative of $q$ with respect to $n_a$ which is:
 769
\begin{equation}
  \frac{\partial q}{\partial n_a} = -\frac{ n_{a \wedge \overline{b}}}{2
  \sqrt{\frac{n_{\overline{b}}}{n}}}
  \left(\frac{1}{n_a}\right)^{\frac{3}{2}}
  -\frac{1}{2}\sqrt{\frac{n_{\overline{b}}}{n\, n_a}}<0
  \label{eq2.5}
\end{equation}
 777
Thus, for variations of $n_a$ on $[0,~ n_b]$, the implication index function is always decreasing (and convex) with respect to $n_a$ and is therefore minimum for $n_a= n_b$. As a result, the intensity of implication is increasing and maximum for $n_a= n_b$.
 779
 780 Note the partial derivative of $q$ with respect to $n$:
 781
 782 $$\frac{\partial q}{\partial n} = \frac{1}{2\sqrt{n}} \left(  n_{a
 783   \wedge \overline{b}}+\frac{n_a n_{\overline{b}}}{n}   \right)$$
 784
Consequently, if the other 3 parameters are constant, the implication
index decreases as $\sqrt{n}$.
The quality of implication is therefore all the better as $n$ increases, a specific
property of the SIA compared to other indicators used in the
literature~\cite{Grasab}.
This property is in accordance with statistical and semantic
expectations regarding the credit given to the frequency of
observations.
Since the partial derivatives of $q$ (at least one of them) are
non-linear with respect to the parameters involved, we are
dealing with a non-linear dynamic system\footnote{``Non-linear systems
  are systems that are known to be deterministic but for which, in
  general, nothing can be predicted because calculations cannot be
  made''~\cite{Ekeland} p. 265.} with all the epistemological
consequences that we will consider elsewhere.
 800
 801
 802
 803 \subsection{Numerical example}
In a first experiment, we observe the occurrences: $n = 100$, $n_a =
20$, $n_b = 40$ (hence $n_{\overline{b}}=60$) and $ n_{a   \wedge \overline{b}} = 4$.
The application of formula (\ref{eq2.1}) gives $q = -2.309$.
In a 2nd experiment, $n$ and $n_a$ are unchanged but the occurrences
of $b$ and the counter-examples $n_{a   \wedge \overline{b}}$ each increase by one unit.
 809
At the initial point of the space of the 4 variables, the only partial
derivatives of interest here (with respect to $n_b$ and $n_{a
  \wedge \overline{b}}$) have respectively the following values when
applying formulas (\ref{eq2.3}) and (\ref{eq2.4}): $\frac{\partial
  q}{\partial n_b} = 0.0385$ and $\frac{\partial q}{\partial n_{a
  \wedge \overline{b}}}  = 0.2887$.

As $\Delta n_b$, $\Delta n_{\overline{b}}$ and $\Delta  n_{a
  \wedge \overline{b}} $ are equal to $1$, $-1$ and $1$, $\Delta q$ is
equal to $0.0385 + 0.2887 + o(\Delta q) = 0.3272 + o(\Delta q)$ and
the approximate value of $q$ in the second experiment is $-2.309 +
0.3272 + o(\Delta q)= -1.982 +o(\Delta q)$ using the first order
expansion of $q$ (formula (\ref{eq2.2})).
Indeed, the direct calculation of the new implication index $q$ at the point
of the 2nd experiment gives, by the use of (\ref{eq2.1}), $-1.9795$, a
value well approximated by the expansion of $q$.
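
The arithmetic behind this comparison can be laid out as follows, as a
check using only the values given above. In the first experiment,
$\frac{n_a n_{\overline{b}}}{n} = \frac{20 \times 60}{100} = 12$, so
formula (\ref{eq2.1}) gives
$$\frac{4-12}{\sqrt{12}} \approx -2.309 .$$
In the second, $n_{\overline{b}}=59$ and $n_{a\wedge\overline{b}}=5$, so
$\frac{n_a n_{\overline{b}}}{n} = 11.8$ and the exact and first-order
values are respectively
$$\frac{5-11.8}{\sqrt{11.8}} \approx -1.980
\qquad \mbox{and} \qquad
-2.309 + 0.3272 \approx -1.982 .$$
The first-order estimate thus differs from the exact value by about
$0.002$.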
 826
 827
 828
\subsection{A first differential relationship of $\varphi$ as a function of $q$}
Let us consider the intensity of implication $\varphi$ as a function
of $q(a,\overline{b})$:
$$\varphi(q)=\frac{1}{\sqrt{2\pi}}\int_q^{\infty}e^{-\frac{t^2}{2}}\,dt$$
We can then examine how $\varphi(q)$ varies when $q$ varies in the neighbourhood of its value for a given pair $(a,b)$, knowing how $q$ itself varies according to the 4 parameters that determine it. By differentiating with respect to the lower integration bound, we obtain:
 834 \begin{equation}
 835   \frac{d\varphi}{dq}=-\frac{1}{\sqrt{2\pi}}e^{-\frac{q^2}{2}} < 0
 836   \label{eq2.6}
 837 \end{equation}
This confirms that the intensity increases when $q$ decreases; moreover, the rate of variation is specified by the formula, which allows us to study the variations of $\varphi$ more precisely. Since the derivative of $\varphi$ with respect to $q$ is always negative, the function $\varphi$ is decreasing.
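
Combined with the first-order expansion of $q$ given in the stability
study, this derivative yields, as a sketch, the corresponding first-order
variation of the intensity in terms of the variations of the 4
occurrences:
$$\Delta \varphi \approx \frac{d\varphi}{dq}\,\Delta q
= -\frac{1}{\sqrt{2\pi}}e^{-\frac{q^2}{2}}
\left( \frac{\partial q}{\partial n} \Delta n + \frac{\partial
  q}{\partial n_a} \Delta n_a  + \frac{\partial
  q}{\partial n_b} \Delta n_b + \frac{\partial
  q}{\partial n_{a \wedge \overline{b}}} \Delta n_{a \wedge
  \overline{b}} \right).$$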
 839
{\bf Numerical example}\\
Taking the values of the occurrences observed in the 2 experiments
mentioned above, we find that, for $q = -2.309$, the intensity
of implication $\varphi(q)$ is equal to 0.9895. Applying formula
(\ref{eq2.6}), the derivative of $\varphi$ with respect to $q$ is
$-0.02775$ and the (negative) increase in intensity is then
$-0.02775\,\Delta q$ with $\Delta q = 0.3272$, i.e. about $-0.009$. The
first-order approximation of the intensity is
therefore $0.9895-0.02775\,\Delta q$, or about 0.980. The actual calculation
of this intensity gives, for $q= -1.9795$, $\varphi(q) = 0.976$.
 849
 850
 851
 852 \subsection{Examination of other indices}
Unlike the core index $q$ and the intensity of implication, which
measure quality through probability (see definition 2.3), the other
most common indices are intended to be direct measures of quality.
We will examine their respective sensitivities to changes in the
parameters used to define these indices.
We keep the notation adopted in paragraph 2.2 and select indices that
are recalled in~\cite{Grasm},~\cite{Lencaa}  and~\cite{Grast2}.
 860
 861 \subsubsection{The Loevinger Index}
 862
It is an ``ancestor'' of the indices of
implication~\cite{Loevinger}. This index, denoted $H(a,b)$, varies from
1 to $-\infty$. It is defined by: $H(a,b) =1-\frac{n n_{a \wedge
    \overline{b}}}{n_a n_{\overline{b}}}$. Its partial derivative with respect to the number of counter-examples is therefore:
$$\frac{\partial H}{\partial n_{a \wedge \overline{b}}}=-\frac{n}{n_a n_{\overline{b}}}$$
Thus the Loevinger index is always decreasing with $n_{a \wedge
  \overline{b}}$. If it is ``close'' to 1, the implication is ``almost''
satisfied. But this index, since it does not refer to a
probability scale, has the disadvantage of not providing a probability threshold and of being
invariant under any dilation of $E$, $A$, $B$ and $A \cap \overline{B}$.
 873
 874
 875 \subsubsection{The Lift Index}
 876
It is expressed by: $l =\frac{n n_{a \wedge b}}{n_a n_b}$.
This expression, linear with respect to the examples, can also be
written so as to highlight the number of counter-examples:
$$l =\frac{n (n_a - n_{a \wedge \overline{b}})}{n_a n_b}$$
To study the sensitivity of $l$ to parameter variations, we use:
$$\frac{\partial l}{\partial n_{a \wedge \overline{b}} } =
-\frac{n}{n_a n_b}$$
Thus, the rate of variation of the Lift index does not depend on
the number of counter-examples.
It is a constant that depends only on $n$ and on the occurrences of $a$ and $b$. Therefore, $l$ decreases when the number of counter-examples increases, which is semantically acceptable, but the rate of decrease does not depend on the rate of growth of $n_{a \wedge \overline{b}}$.
 887
 888 \subsubsection{Confidence}
 889
This index is the best known and most widely used, thanks to the
resonance it found in an English-language publication~\cite{Agrawal}.
It is at the origin of several other commonly used indices, which are only variants satisfying this or that semantic requirement. Moreover, it is simple and can be interpreted easily and immediately.
 893 $$c=\frac{n_{a \wedge b}}{n_a} = 1-\frac{n_{a \wedge \overline{b}}}{n_a}$$
 894
 895 The first form, linear with respect to the examples, independent of
 896 $n_b$, is interpreted as a conditional frequency of the examples of
 897 $b$ when $a$ is known.
 898 The sensitivity of this index to variations in the occurrence of
 899 counter-examples is read through the partial derivative:
 900 $$\frac{\partial c}{\partial n_{a \wedge \overline{b}} } =
 901 -\frac{1}{n_a }$$
 902
 903
Consequently, confidence increases when $n_{a \wedge \overline{b}}$
decreases, which is semantically acceptable, but the rate of variation
is constant, independent of the rate of decrease of this number and of
the variations of $n$ and $n_b$.
This property does not seem to satisfy intuition.
The gradient of $c$ is expressed only in terms of $n_{a \wedge
  \overline{b}}$ and $n_a$: $\displaystyle \begin{pmatrix} -\frac{1}{n_a}\\ \frac{n_{a \wedge \overline{b}}}{n_a^2}\end{pmatrix}$.

This may also appear to be a restriction on the role of parameters in
expressing the sensitivity of the index.
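
The invariance under dilation noted above for the Loevinger index can be
observed on lift and confidence as well. Here is a small worked example;
the counts are those of the first experiment of the numerical example
($n=100$, $n_a=20$, $n_b=40$, $n_{a\wedge \overline{b}}=4$, hence
$n_{a\wedge b}=16$) and the dilation factor $10$ is an arbitrary choice.
Before dilation,
$$l=\frac{100 \times 16}{20 \times 40}=2, \qquad c=\frac{16}{20}=0.8,
\qquad q=\frac{4-12}{\sqrt{12}}\approx -2.309 .$$
After multiplying the four occurrences by $10$,
$$l=\frac{1000 \times 160}{200 \times 400}=2, \qquad
c=\frac{160}{200}=0.8,
\qquad q=\frac{40-120}{\sqrt{120}}\approx -7.303 .$$
Lift and confidence are unchanged, whereas $q$ is multiplied by
$\sqrt{10}$, so that the intensity of implication moves closer to $1$.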
 914
 915 \section{Gradient field, implicative field}
 916 We highlight here the existence of fields generated by the variables
 917 of the corpus.
 918
 919 \subsection{Existence of a gradient field}
As in our Newtonian physical space, where a gravitational field emitted
by each material object acts, we can consider that the same holds
around each variable.
For example, the variable $a$ generates a scalar field whose value in
$b$ is maximum and equal to the intensity of implication or the
implication index $q(a,\overline{b})$.
Its action spreads in V according to differential laws, as J.M. Leblond
says in~\cite{Leblond} p.242.
 928
Let us consider the space $E$ of dimension 4 where the coordinates of
the points $M$ are the parameters relative to the binary variables $a$
and $b$, i.e. ($n$, $n_a$, $n_b$, $n_{a\wedge \overline{b}}$). $q(a,\overline{b})$ is the realization of a scalar field, as a mapping from $\mathbb{R}^4$ to $\mathbb{R}$ (embedding of $\mathbb{N}^4$ in $\mathbb{R}^4$).
For the vector $\mathrm{grad}~q$, whose components are the partial derivatives of $q$
with respect to the variables $n$, $n_a$, $n_b$, $n_{a\wedge
  \overline{b}}$, to define a gradient field (a particular vector
field that we will also call the implicative field), it must satisfy
Schwarz's criterion for an exact total differential, i.e.:
 937
 938 $$\frac{\partial}{\partial n_{a\wedge   \overline{b}}}\left(
 939 \frac{\partial q}{\partial n_b} \right) =\frac{\partial}{\partial n_b}\left(
 940 \frac{\partial q}{\partial n_{a\wedge   \overline{b}}} \right) $$
and the same for the other variables taken in pairs. Indeed, we have,
through the formulas (\ref{eq2.3}) and (\ref{eq2.4}):

$$ \frac{\partial}{\partial n_{a \wedge \overline{b}}} \left( \frac{\partial q}{\partial n_b} \right) = \frac{1}{2}  \left( \frac{n_a}{n}\right)^{-\frac{1}{2}}   n_{\overline{b}}^{-\frac{3}{2}}  = \frac{\partial}{\partial n_b}\left(
\frac{\partial q}{\partial n_{a\wedge   \overline{b}}} \right)$$
 946
Thus, to the vector field $C$ = ($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$) of $E$, the nature of which we will specify, corresponds a gradient field $G$ which is said to be derived from the {\bf potential} $q$.
The gradient $\mathrm{grad}~q$ is therefore the vector that represents the spatial variation of the field intensity.
It is directed from low field values to higher values. By following the gradient at each point, we follow the increase in the intensity of the implicative field in space and, in a way, the speed with which it changes as a result of the variation of one or more parameters.

For example, if we fix 3 of the parameters $n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$ given by the realization of the couple ($a$, $b$), the gradient is a vector whose direction indicates the growth or decrease of $q$, and therefore the decrease or increase of $|q|$ and, consequently, of $\varphi$, under the variations of the 4th parameter.
 952 We have indicated this above by interpreting formula (\ref{eq2.5}).
 953
 954
 955 \subsection{Level or equipotential lines}
An equipotential (or level) line or surface in the $C$ field is a curve or surface of $E$ along which or on which a variable point $M$ keeps the same value of the potential $q$ (e.g. isothermal lines on the globe or contour lines on an IGN topographic map).

The equation of this surface\footnote{In differential geometry, it seems that this surface is a (quasi-)differentiable manifold with boundary, compact and homeomorphic to the closed box formed by the intervals of variation of the 4 parameters. Note that the point whose component $n_b$ is equal to $n$ (therefore $n_{\overline{b}} = 0$) is a singular point (``catastrophic'' in René Thom's sense) of the surface, and the potential $q$ is not differentiable at this point. Everywhere else, the surface is differentiable and the points are all regular. If time, for example, parametrizes the observations of the process of which ($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$) is a realization, then to each instant corresponds a morphological fibre of the process, represented by such a surface in space-time.} is, of course:
$$ q(a,\overline{b}) - \frac{n_{a \wedge \overline{b}}-
  \frac{n_a n_{\overline{b}}}{n}}{\sqrt{\frac{n_a n_{\overline{b}}}{n}}} = 0$$
 961
 962
Therefore, on such a curve, the scalar product $\mathrm{grad}~q \cdot dM$ is zero.
This is interpreted as indicating the orthogonality of the gradient to the tangent line or tangent hyperplane, i.e. to the equipotential line or surface.
In a kinematic interpretation of our problem, the velocity of $M$ along its path on the equipotential surface is orthogonal to the gradient at $M$.

As an illustration, for a potential $F$ depending on only 2 variables, Figure~\ref{chap2fig2} shows that the gradient is orthogonal to the different equipotential surfaces, along each of which the potential $F$ does not vary, while it passes from $F=7$ to $F= 10$ across them.
 968
 969 \begin{figure}[htbp]
 970   \centering
 971 \includegraphics[scale=1]{chap2fig2}
 972  \caption{Illustration of potential of 2 variables}
 973 \label{chap2fig2}       % Give a unique label
 974 \end{figure}
 975
It is possible, in the case of the potential $q$, to build equipotential surfaces as above (two-dimensional for ease of representation).
Understandably, the more intense the field is, the tighter the surfaces are. For a given value of $q$, 3 variables are fixed, for example $n$, $n_a$ and $n_b$, together with a value of $q$ compatible with the field constraints. Take for instance $n = 10^4$, $n_a = 1600 \leq n_b = 3600$ and $q = -2$, i.e. $|q| = 2$. We then find $n_{a \wedge \overline{b}}= 960$ using formula~(\ref{eq2.1}).
But the points ($10^4$, $1600$, $5100$, $728$) and ($100$, $25$, $64$, $3$) also belong to this surface and to the same equipotential curve.
The point ($10^4$, $1600$, $3600$, $928$) belongs to the equipotential curve $q=-3$. In fact, on this entire surface, we obtain a kind of homeostasis of the intensity of implication.
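
As a check of some of these values (a sketch relying only on
formula~(\ref{eq2.1}) and on $n_{\overline{b}}=n-n_b$), the point
($100$, $25$, $64$, $3$) gives $n_{\overline{b}}=36$ and
$\frac{n_a n_{\overline{b}}}{n}=9$, whence
$$q=\frac{3-9}{\sqrt{9}}=-2 .$$
Likewise, $n=10^4$, $n_a=1600$, $n_b=5100$ and $n_{a\wedge
\overline{b}}=728$ give $n_{\overline{b}}=4900$ and $\frac{n_a
n_{\overline{b}}}{n}=784$, whence
$$q=\frac{728-784}{\sqrt{784}}=\frac{-56}{28}=-2 ,$$
while $n=10^4$, $n_a=1600$, $n_b=3600$ and $n_{a\wedge
\overline{b}}=928$ give $n_{\overline{b}}=6400$, $\frac{n_a
n_{\overline{b}}}{n}=1024$ and
$$q=\frac{928-1024}{\sqrt{1024}}=\frac{-96}{32}=-3 .$$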
 980
The expression of the function $q$ of these variables shows that it is convex.
This property implies that the segment of points $t M_1 + (1-t) M_2$, for $t \in [0,1]$, which connects two points $M_1$ and $M_2$ of the same equipotential line, is entirely contained in the convex region that this line delimits.
Figure~\ref{chap2fig3} shows two adjacent equipotential surfaces $\Sigma_1$ and $\Sigma_2$ in the implicative field, corresponding to two values $q_1$ and $q_2$ of the potential.
At point $M_1$ the scalar field therefore takes the value $q_1$. $M_2$ is the intersection of the normal from $M_1$ with $\Sigma_2$. Given the direction of the normal vector $\vec{n}$, the difference $\delta = q_2 - q_1$, i.e. the variation of the field when we go from $\Sigma_1$ to $\Sigma_2$, is then equal to the opposite of the norm of the gradient of $q$ at $M_1$, which is $\frac{\partial q}{\partial n}$ if $n_a$, $n_b$ and $n_{a \wedge \overline{b}}$ are fixed.
 985
 986 \begin{figure}[htbp]
 987   \centering
 988 \includegraphics[scale=1]{chap2fig3}
 989  \caption{Illustration of equipotential surfaces}
 990 \label{chap2fig3}       % Give a unique label
 991 \end{figure}
 992
Thus, the space $E$ can be foliated by equipotential surfaces corresponding to successive values of $q$ relative to the cardinals ($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$), which would be varied.
This situation corresponds to the one envisaged in the SIA modeling.
Fixing $n$, $n_a$ and $n_b$, we consider the random sets $X$ and $Y$ of the same cardinals as $A$ ($n_a$) and $B$ ($n_b$), the cardinal of $X \cap \overline{Y}$ following a Poisson distribution or a binomial distribution, according to the choice of the model.
The different gradient fields, real ``lines of force'', associated with them are orthogonal to the surfaces defined by the corresponding values of $Q$.
This reminds us, in the theoretical framework of potential, of the premonitory metaphor of ``implicative flow'' that we expressed in~\cite{Grase} and that we will discuss again in Chapter 14 of the book.
Behind this notion we can imagine a transport of information of variable intensity in a causal universe.
We illustrate this metaphor with the study of the properties of the two-sheeted implicative cone (see §2.8).
Moreover, and intuitively, the implication $a\Rightarrow b$ is of all the better quality as the equipotential surface $C$ of the contingency covers the random equipotential surfaces depending on the random variable.
Let us recall the relationship that links the potential $q$ to the intensity:
1002 $$\varphi(a,b) =\frac{1}{\sqrt{2\pi}}\int_{q(a,\overline{b})}^{\infty}e^{-\frac{t^2}{2}} dt$$
1003
\noindent {\bf Remark 1}\\
It can be seen that the intensity is also constant on any equipotential surface of $q$.
The surface portions generated by $q$ and by $\varphi$ are even in one-to-one correspondence.
In intuitive terms, we can say that when one ``swells'' the other ``deflates''.\\

\noindent {\bf Remark 2}\\
Let us note once again a particularity of the intensity of implication.
While the surfaces generated by the variations of the 4 parameters of the data are not invariant under a common dilation of the parameters, those associated with the indices cited in §2.4 are invariant and have the same undifferentiated geometric shape.
1012
1013 \section{Implication-inclusion}
1014 \subsection{Foundational and problematic situation}
Three reasons led us to improve the model formalized by the intensity of implication:
\begin{itemize}
\item when the size of the samples processed, and in particular that of $E$, increases (to around a thousand or more), the intensity $\varphi(a,b)$ tends to be no longer sufficiently discriminating because its values can be very close to 1, while the inclusion whose quality it seeks to model is far from being satisfied (phenomenon reported in~\cite{Bodina}, which deals with large student populations through international surveys);
\item  the previous quasi-implication model essentially uses the measure of the strength of the rule $a \Rightarrow b$.
  However, taking into account the concomitance of $\neg b \Rightarrow \neg a$ (the contrapositive of the implication) is useful or even essential to reinforce the affirmation of a good quality of the quasi-implicative, possibly quasi-causal, relationship of $a$ over $b$\footnote{This phenomenon is reported by Y. Kodratoff in~\cite{Kodratoff}.}.
  At the same time, it could make it possible to correct the difficulty mentioned above (if $A$ and $B$ are small compared to $E$, their complements will be large, and vice versa);
\item  the overcoming of Hempel's paradox (see Appendix 3 of this chapter).
  \end{itemize}
1023
1024 \subsection{An inclusion index}
1025
The solution\footnote{J. Blanchard provides in~\cite{Blanchardb} an answer to this problem by measuring the ``equilibrium gap''.} we provide uses both the intensity of implication and another index that reflects the asymmetry between situations $S_1 = (a \wedge b)$ and $S_1' = (a \wedge \neg b)$ (resp. $S_2 = (\neg a \wedge \neg b)$ and $S_2' = (a \wedge \neg  b)$) in favour of the first named.
The relative weakness of instances that contradict the rule and its contrapositive is therefore fundamental.
Moreover, the number of counter-examples $n_{a \wedge \overline{b}}$ to $a \Rightarrow b$ is also the number of counter-examples to the contrapositive.
To account for the uncertainty associated with a possible bet on belonging to one of the two situations ($S_1$ or $S_1'$, resp. $S_2$ or $S_2'$), we therefore refer to Shannon's concept of entropy~\cite{Shannon}:
$$H(b\mid a) = - \frac{n_{a\wedge b}}{n_a}\log_2   \frac{n_{a\wedge b}}{n_a}  - \frac{n_{a\wedge \overline{b}}}{n_a}\log_2   \frac{n_{a\wedge \overline{b}}}{n_a}$$
is the conditional entropy relating to the cells $(a \wedge b)$ and $(a \wedge \neg b)$ when $a$ is realized, and

$$H(\overline{a}\mid \overline{b}) = - \frac{n_{a\wedge \overline{b}}}{n_{\overline{b}}}\log_2 \frac{n_{a\wedge \overline{b}}}{n_{\overline{b}}}    - \frac{n_{\overline{a} \wedge \overline{b}}}{n_{\overline{b}}}\log_2   \frac{n_{\overline{a} \wedge \overline{b}}}{n_{\overline{b}}}$$

is the conditional entropy relating to the cells $(\neg a \wedge \neg b)$ and $(a \wedge \neg b)$ when not $b$ is realized.
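
For orientation, a few values of these entropies can be computed directly
(a sketch, with the usual convention $0 \log_2 0 = 0$). Writing $r$ for
the proportion of counter-examples, $r = \frac{n_{a\wedge
\overline{b}}}{n_a}$ or $r=\frac{n_{a\wedge
\overline{b}}}{n_{\overline{b}}}$ (the symbol $r$ is introduced here only
for this illustration), each entropy has the form
$-(1-r)\log_2(1-r) - r\log_2 r$, which gives
$$r=0:\ 0, \qquad r=\frac{1}{4}:\ \frac{3}{4}\log_2\frac{4}{3} +
\frac{1}{4}\log_2 4 \approx 0.811, \qquad r=\frac{1}{2}:\ 1 .$$
The uncertainty is thus zero when there is no counter-example and maximal
when counter-examples make up half of the observations.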
1036
These entropies, with values in $[0,1]$, should therefore be simultaneously small, and therefore the asymmetries between situations $S_1$ and $S_1'$ (resp. $S_2$ and $S_2'$) should be simultaneously strong, if one wishes to have a good criterion for the inclusion of $A$ in $B$.
Indeed, entropies represent the average uncertainty of experiments that consist in observing whether $b$ is realized (or not $a$ is realized) when $a$ (or not $b$) is observed. The complement to 1 of this uncertainty therefore represents the average information collected by performing these experiments. The greater this information, the stronger the guarantee of the quality of the implication and of its contrapositive. We must now adapt this entropic numerical criterion to the model expected in the different cardinal situations.
For the model to have the expected meaning, it must satisfy, in our opinion, the following epistemological constraints:
1040
\begin{enumerate}
\item It shall integrate the entropy values and, to contrast them, integrate for example the squares of these values.
\item As this square varies from 0 to 1, and in order to denote the imbalance and therefore the inclusion, in opposition to entropy, the value retained will be the complement to 1 of the square as long as the number of counter-examples is less than half of the observations of $a$ (resp. non $b$).
  Beyond these values, as the implications no longer have an inclusive meaning, the criterion will be assigned the value 0.
\item In order to take into account the two pieces of information specific to $a\Rightarrow b$ and $\neg b \Rightarrow \neg a$, the product will account for the simultaneous quality of the values retained.
The product has the property of vanishing as soon as one of its terms vanishes, i.e. as soon as this quality is erased.
\item Finally, since the product has dimension 4 with respect to entropy, its fourth root will have the same dimension as entropy.
\end{enumerate}
1049
Let $\alpha=\frac{n_a}{n}$ be the frequency of $a$ and $\overline{\beta}=\frac{n_{\overline{b}}}{n}$ be the frequency of $\neg b$.
Let $t=\frac{n_{a \wedge \overline{b}}}{n}$ be the frequency of counter-examples. The two terms that measure the respective qualities of the implication and of its contrapositive are:
1052
\begin{eqnarray*}
  h_1(t) = H(b\mid a) = - \left(1-\frac{t}{\alpha}\right) \log_2 \left(1-\frac{t}{\alpha}\right) - \frac{t}{\alpha} \log_2 \frac{t}{\alpha} & \mbox{ if }t \in [0,\frac{\alpha}{2}[\\
  h_1(t) = 1 & \mbox{ if }t \in [\frac{\alpha}{2},\alpha]\\
  h_2(t) = H(\overline{a}\mid \overline{b}) = - \left(1-\frac{t}{\overline{\beta}}\right) \log_2 \left(1-\frac{t}{\overline{\beta}}\right) - \frac{t}{\overline{\beta}} \log_2 \frac{t}{\overline{\beta}} & \mbox{ if }t \in [0,\frac{\overline{\beta}}{2}[\\
  h_2(t) = 1 & \mbox{ if }t \in [\frac{\overline{\beta}}{2},\overline{\beta}]
\end{eqnarray*}
Hence the following definition of the entropic criterion:
\definition: The inclusion index of $A$, support of $a$, in $B$, support of $b$, is the number:
$$i(a,b) = \left[ (1-h_1^2(t)) (1-h_2^2(t)) \right]^{\frac{1}{4}}$$
1062
1063 which integrates the information provided by the realization of a small number of counter-examples, on the one hand to the rule $a \Rightarrow b$ and, on the other hand, to the rule $\neg b \Rightarrow \neg a$.
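
To make the behaviour of this index concrete, two limiting cases can be read directly off the definition. This is only a sketch, assuming the usual convention $0 \log_2 0 = 0$, which the text does not state explicitly:
$$t=0 \;\Rightarrow\; h_1(0)=h_2(0)=0 \;\Rightarrow\; i(a,b)=\left[ (1-0)(1-0) \right]^{\frac{1}{4}}=1,$$
$$t \geq \frac{\alpha}{2} \mbox{ or } t \geq \frac{\overline{\beta}}{2} \;\Rightarrow\; h_1(t)=1 \mbox{ or } h_2(t)=1 \;\Rightarrow\; i(a,b)=0.$$
Between these two cases, $h_1$ and $h_2$ increase with $t$, so that $i(a,b)$ decreases from 1 to 0 as the counter-examples accumulate.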
1064
1065 \subsection{The implication-inclusion index}
1066
1067 The intensity of implication-inclusion (or entropic intensity), a new measure of inductive quality, is the number:
1068
$$\psi(a,b)= \left[ i(a,b) \cdot \varphi(a,b) \right]^{\frac{1}{2}}$$
1070 which integrates both statistical surprise and inclusive quality.
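
One way to read this definition, offered here as a remark rather than as part of the original construction: $\psi$ is the geometric mean of $i$ and $\varphi$, both of which take values in $[0,1]$. It therefore satisfies
$$\min\left(i(a,b),\varphi(a,b)\right) \;\leq\; \psi(a,b) \;\leq\; \max\left(i(a,b),\varphi(a,b)\right),$$
and
$$\psi(a,b)=0 \quad \mbox{as soon as} \quad i(a,b)=0 \mbox{ or } \varphi(a,b)=0.$$
A high intensity of implication therefore cannot compensate for a zero inclusion index, and conversely.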
1071
As a function of the variable $t$, for fixed $n_a$ and $n_b$, $\psi$ has the shape indicated in Figure 4{\bf TO CHANGE}.
Note in this figure how the behaviour of $\psi$ differs from that of the conditional probability $P(B\mid A)$, a fundamental index of other rule measurement models, for example in Agrawal.
Besides being linear in $t$, and therefore not very nuanced, this probability yields a measure that decreases too quickly with the first counter-examples and then resists too long when they become numerous.
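
The linearity mentioned above can be made explicit with the notation already introduced ($\alpha$ for the frequency of $a$, $t$ for the frequency of counter-examples). The following is a short sketch of the computation:
$$P(B\mid A) = \frac{n_{a \wedge b}}{n_a} = \frac{n_a - n_{a \wedge \overline{b}}}{n_a} = 1-\frac{t}{\alpha}.$$
For fixed $n_a$, this quantity loses the same amount for each additional counter-example, whether it is the first or one of many, whereas $\psi$ depends on $t$ through the entropies and through $\varphi$.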
1075
1076
1077 {\bf FIGURE 4}
1078
1079
1080 \noindent Example 1\\
1081  \begin{tabular}{|c|c|c|c|}\hline
1082   & $b$ & $\overline{b}$ & margin\\ \hline
1083   $a$ & 200 & 400& 600 \\ \hline
1084   $\overline{a}$ & 600 & 2800& 3400 \\ \hline
1085   margin & 800 & 3200& 4000 \\ \hline
1086  \end{tabular}
1087  \\
1088  In Example 1, implication intensity is $\varphi(a,b)=0.9999$ (with $q(a,\overline{b})=-3.65$).
 Since the 400 counter-examples exceed half of the 600 observations of $a$, the entropic values of the experiment are $h_1=1$ and $h_2\approx 0.54$.
1090  The value of the moderator coefficient is therefore $i(a,b)=0$.
1091  Hence, $\psi(a,b)=0$ whereas $P(B\mid A)=0.33$.
Thus, the ``entropic'' functions ``moderate'' the intensity of implication in this case where inclusion is poor.
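
As a check on the arithmetic of Example 1, the main quantities can be recomputed from the table. The expression used for $q(a,\overline{b})$ is the usual Poisson-model form of the implication index and is assumed here to be the one intended; $\Phi$ denotes the standard normal distribution function.
$$\alpha=\frac{600}{4000}=0.15, \qquad \overline{\beta}=\frac{3200}{4000}=0.8, \qquad t=\frac{400}{4000}=0.1$$
$$q(a,\overline{b}) = \frac{n_{a \wedge \overline{b}} - \frac{n_a n_{\overline{b}}}{n}}{\sqrt{\frac{n_a n_{\overline{b}}}{n}}} = \frac{400-480}{\sqrt{480}} \approx -3.65, \qquad \varphi(a,b) = 1-\Phi(-3.65) \approx 0.9999$$
$$\frac{t}{\alpha}=\frac{400}{600}=\frac{2}{3} \geq \frac{1}{2} \;\Rightarrow\; i(a,b)=0 \;\Rightarrow\; \psi(a,b)=\left[ 0 \cdot 0.9999 \right]^{\frac{1}{2}}=0$$
$$P(B\mid A)=\frac{200}{600} \approx 0.33$$
The counter-examples exceed half of the observations of $a$, so the value retained for $a \Rightarrow b$ is 0 and the product cancels, whatever the quality of $\neg b \Rightarrow \neg a$.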