
   1 %%%%%%%%%%%%%%%%%%%%% chapter.tex %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   2 %
   3 % sample chapter
   4 %
   5 % Use this file as a template for your own input.
   6 %
   7 %%%%%%%%%%%%%%%%%%%%%%%% Springer-Verlag %%%%%%%%%%%%%%%%%%%%%%%%%%
   8 %\motto{Use the template \emph{chapter.tex} to style the various elements of your chapter content.}
   9 \chapter{From the founding situations of the SIA to its formalization}
  10 \label{intro} % Always give a unique label
  11 % use \chaptermark{}
  12 % to alter or adjust the chapter heading in the running head
  13
  14
  15
  16 \abstract{
  17 Starting from mathematical didactic situations, the implicitative
  18 statistical analysis method develops as problems are encountered and
  19 questions are asked.
  20 Its main objective is to structure data crossing subjects and
  21 variables, to extract inductive rules between variables and, based on
  22 the contingency of these rules, to explain and therefore forecast in
  23 various fields: psychology, sociology, biology, etc.
It is for this purpose that the concepts of intensity of implication,
class cohesion, implication-inclusion, significance of hierarchical
levels, contribution of additional variables, etc., have been developed.
  27 Similarly, the processing of binary variables (e.g., descriptors) is
  28 gradually being supplemented by the processing of modal, frequency
  29 and, recently, interval and fuzzy variables.
  30 }
  31
  32 \section{Preamble}
  33
Human operative knowledge consists mainly of two components: facts,
and rules between facts or between rules themselves.
Through culture and personal experience, learning allows the
individual to develop these forms of knowledge gradually, despite the
regressions, the questioning and the ruptures that arise when decisive
information comes to light.
We know, however, that these setbacks contribute dialectically to
ensuring a balanced operation.
Rules are formed inductively, in a relatively stable way, as soon as
the number of their successes, in terms of explanatory or anticipatory
quality, reaches a certain level (of confidence) from which they are
likely to be implemented.
On the other hand, if this (subjective) level is not reached, the
individual's economy will make him resist, at first, abandoning or
criticizing the rule.
Indeed, it is costly to replace the initial rule with another one when
only a small number of disconfirming cases appear, since the rule will
have been reinforced by a large number of confirmations.
An increase in the number of negative instances, depending on the
robustness of the level of confidence in the rule, may lead to its
readjustment or even its abandonment.
Laurent Fleury~\cite{Fleury}, in his thesis, aptly cites the example,
which Régis repeats, of the highly admissible rule: "all Ferraris are
red".
This very robust rule will not be abandoned after observing one or two
counter-examples, especially since it would soon be confirmed again.
  62
Thus, contrary to what holds in mathematics, where no rule (theorem)
admits an exception and determinism is total, rules in the human
sciences, and more generally in the so-called "soft" sciences, are
acceptable and therefore operative as long as the number of
counter-examples remains "bearable" in view of the frequency of
situations where they prove positive and effective.
The problem in data analysis is then to establish a relatively
consensual numerical criterion that defines a level of confidence
adjustable to the level of requirement of the rule user.
That this criterion is based on statistics is not surprising.
That it resists noise non-linearly (the first counter-examples carry
little weight) may also seem natural, in line with the "economic"
meaning mentioned above.
That it collapses when counter-examples accumulate is a further
property that should guide our choice in modeling the desired
criterion.
  79 This text presents the epistemological choice we have made.
  80 As such it is therefore refutable, but the number of situations and
  81 applications where it has proved relevant and fruitful leads us to
  82 reproduce its genesis here.
  83
  84 \section{Introduction}
  85
  86 Different theoretical approaches have been adopted to model the
  87 extraction and representation of imprecise (or partial) inference
  88 rules between binary variables (or attributes or characters)
  89 describing a population of individuals (or subjects or objects).
But the initial situations and the nature of the data do not change
the initial problem: discovering non-symmetrical inductive rules that
model relationships of the type "if $a$ then almost $b$".
This is, for example, the option of Bayesian networks~\cite{Amarger}
or Galois lattices~\cite{Simon}.
More often than not, however, since the correlation and the $\chi^2$
test are unsuitable because of their symmetric nature, conditional
probability~\cite{Loevinger,Agrawal,Grasn} remains the driving force
behind the definition of the association, even when the selected
association index is multivariate~\cite{Bernard}.
 101
 102
 103
Moreover, to our knowledge, most of the different and interesting
developments focus on proposals for a partial implication index for
binary data~\cite{Lermana} or~\cite{Lallich}. This notion is not
extended to other types of variables, nor to extraction and
representation according to a rule graph or a hierarchy of meta-rules,
structures that aim at access to the meaning of a whole not reduced to
the sum of its
parts~\cite{Seve}\footnote{This is what the philosopher L. Sève
  emphasizes: "... in the non-additive, non-linear passage of the
  parts to the whole, there are properties that are in no way
  precontained in the parts and which cannot therefore be explained by
  them".}, i.e. operating as a complex non-linear system.
For example, it is well known, through usage, that the meaning of a
sentence does not depend entirely on the meaning of each of its words
(see the previous chapter, point 4).
 119
 120 Let us return to what we believe is fertile in the approach we are
 121 developing.
It would seem that, in the literature, the notion of implication index
is also not extended to the search for the subjects and categories of
subjects responsible for associations.
Nor does this responsibility seem to be quantified in a way that leads
to a reciprocal structuring of all subjects, conditioned by their
relationships to variables.
 128 We propose these extensions here after recalling the founding
 129 paradigm.
 130
 131
 132 \section{Implication intensity in the binary case}
 133
 134 \subsection{Fundamental and founding situation}
 135
A set $E$ of objects or subjects is crossed with variables
(characters, criteria, successes, ...) which are interrogated as
follows: "to what extent can we consider that instantiating variable\footnote{Throughout the book, the word "variable" refers either to an isolated variable in the premise (example: "to be blonde") or to a conjunction of isolated variables (example: "to be blonde and to be under 30 years old and to live in Paris").} $a$
implies instantiating variable $b$?
In other words, do the subjects tend to be $b$ if we know that they are
$a$?".
In natural, human or life sciences situations, where theorems (if $a$
then $b$) in the deductive sense of the term cannot be established
because of the exceptions that taint them, it is important for the
researcher and the practitioner to "mine the data" in order to
identify rules that are sufficiently reliable (kinds of "partial
theorems", inductions) to conjecture\footnote{"The exception confirms the rule", as the popular saying goes, in the sense that there would be no exceptions if there were no rule.} a possible causal relationship
or a genesis, to describe and structure a population, and to assume a
certain stability for descriptive and, if possible, predictive
purposes.
But this mining requires the development of methods to guide it and
to free it from trial and error and empiricism.
 153
 154
 155 \subsection{Mathematization}
 156
To do this, following the example of I.C. Lerman's similarity
measurement method~\cite{Lerman,Lermanb} and the classic approach in
non-parametric tests (e.g. Fisher, Wilcoxon, etc.), we
define~\cite{Grasb,Grasf} the confirmatory quality measure of the
implicative relationship $a \Rightarrow b$ from the implausibility of
the occurrence in the data of the number of cases that invalidate it,
i.e. for which $a$ is verified without $b$ being verified. This
amounts to comparing the observed count with the theoretical count
expected if only chance were at work\footnote{"...[in agreement with
    Jung] if the frequency of coincidences does not significantly
  exceed the probability that can be calculated for them by
  attributing them solely to chance, to the exclusion of hidden causal
  relationships, we certainly have no reason to suppose the existence
  of such relationships.", H. Atlan~\cite{Atlana}}.
But when analyzing data, it is this gap that we take into account, and
not the statement of the rejection or acceptance of a null hypothesis.
This measure is relative to the number of data verifying $a$ and not
$b$, the circumstance in which the implication is precisely at fault.
It quantifies the expert's "astonishment" at the improbably small
number of counter-examples in view of the supposed independence
between the variables and the numbers involved.
 179
Let us be more precise. A finite set $V$ of $v$ variables is given:
$a$, $b$, $c$, ...
In the classical paradigmatic situation initially considered, these
variables are performances (success or failure) on the items of a
questionnaire.
To a finite set $E$ of $n$ subjects $x$ are associated, by abuse of
notation, functions of the type $x \rightarrow a(x)$, where $a(x) = 1$
(or $a(x) = true$) if $x$ satisfies or has the character $a$, and
$a(x) = 0$ (or $a(x) = false$) otherwise.
In artificial intelligence, we will say that $x$ is an example or an
instance for $a$ if $a(x) = 1$ and a counter-example otherwise.
 190
 191
 192 The $a \Rightarrow b$ rule is logically true if for any $x$ in the
 193 sample, $b(x)$ is null only if $a(x)$ is also null; in other words if
 194 set $A$ of the $x$ for which $a(x)=1$ is contained in set $B$ of the
 195 $x$ for which $b(x)=1$.
However, this strict inclusion is observed only exceptionally in
actual experiments.
In the case of a knowledge questionnaire, we could indeed observe a
few rare students passing item $a$ and not passing item $b$, without
contesting the tendency to pass item $b$ when one has passed item $a$.
With regard to the cardinals of $E$ (of size $n$), but also of $A$ (or
$n_a$) and $B$ (or $n_b$), it is therefore the "weight" of the
counter-examples (or $n_{a \wedge \overline{b}}$) that must be taken
into account in order to decide statistically whether or not to keep
the quasi-implication or quasi-rule $a \Rightarrow b$. Thus, it is
from the dialectic of examples and counter-examples that the rule
appears as the overcoming of contradiction.
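
To fix ideas, here is a small illustration; the data are hypothetical
and invented for the example. Take $n=5$ subjects $x_1, \dots, x_5$
with
$$(a(x_1),\dots,a(x_5)) = (1,1,1,0,0), \qquad (b(x_1),\dots,b(x_5)) = (1,1,0,1,0)$$
Then $A=\{x_1,x_2,x_3\}$ and $B=\{x_1,x_2,x_4\}$, so that $n_a=3$ and
$n_b=3$. The subjects $x_1$ and $x_2$ are examples of the rule
$a \Rightarrow b$, while $x_3$, which satisfies $a$ but not $b$, is its
only counter-example. The inclusion of $A$ in $B$ fails, and the
question addressed in what follows is whether one counter-example is
few enough, given $n$, $n_a$ and $n_b$, to keep the quasi-rule.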
 209
 210 \subsection{Formalization}
 211
To formalize this quasi-rule, we consider any two parts $X$ and $Y$ of
$E$, chosen randomly and independently (absence of a priori link
between these two parts) and with the same respective cardinals as $A$
and $B$. Let $\overline{Y}$ and $\overline{B}$ be the respective complements of $Y$ and $B$ in $E$, both of cardinal $n_{\overline{b}}= n-n_b$.
 216
 217 We will then say:
 218
\definition $a \Rightarrow b$ is acceptable at confidence level
$1-\alpha$ if and only if
$$Pr[card(X\cap \overline{Y})\leq card(A\cap \overline{B})]\leq \alpha$$
 222
 223 \begin{figure}[htbp]
 224   \centering
 225 \includegraphics[scale=0.34]{chap2fig1.png}
 226  \caption{The dark grey parts correspond to the counter-examples of the
 227    implication $a \Rightarrow b$}
 228 \label{chap2fig1}
 229 \end{figure}
 230
It is established~\cite{Lermanb} that, for a certain drawing process,
the random variable $card(X\cap \overline{Y})$ follows the Poisson law
of parameter $\frac{n_a n_{\overline{b}}}{n}$.
We obtain the same result by proceeding differently, as follows.

Let $X$ (resp. $Y$) denote the random subset of binary transactions
where $a$ (resp. $b$) would appear, independently, with the frequency
$\frac{n_a}{n}$ (resp. $\frac{n_b}{n}$).
To specify how the transactions that realize variables $a$ and $b$,
forming $A$ and $B$ respectively, are drawn, we make the following
semantically permissible assumptions about the observation of the
event $[a=1~ and~ b=0]$, where $(A\cap
\overline{B})$\footnote{We denote by $\overline{v}$ the negation of
  the variable $v$ (or $not~ v$) and by $\overline{P}$ the complement
  of the part $P$ of $E$.} is the subset of transactions that are
counter-examples of the implication $a \Rightarrow b$.
 248
 249 Assumptions:
 250 \begin{itemize}
 251 \item h1: the waiting times of an event $[a~ and~ not~ b]$ are independent
 252   random variables;
\item h2: the law of the number of events occurring in the time
  interval $[t,~ t+T)$ depends only on $T$;
\item h3: two such events cannot occur simultaneously.
 256 \end{itemize}
 257
It is then demonstrated (for example in~\cite{Saporta}) that the
number of events occurring during a period of fixed duration $n$
follows a Poisson law of parameter $c.n$, where $c$ is the rate of the
occurrence process per unit of time.
 262
 263
Now, for each transaction assumed to be random, the event $[a=1]$ has
a probability estimated by the frequency $\frac{n_a}{n}$ and the event
$[b=0]$ a probability estimated by the frequency
$\frac{n_{\overline{b}}}{n}$. Under the hypothesis of absence of an a
priori link between $a$ and $b$ (independence), the joint event $[a=1~
  and~ b=0]$ therefore has a probability estimated by
$\frac{n_a}{n}. \frac{n_{\overline{b}}}{n}$.

We can then estimate the rate $c$ of this event by $\frac{n_a}{n}. \frac{n_{\overline{b}}}{n}$.
 271
Thus, over a duration $n$, the number of occurrences of the event $[a~ and~ not~b]$ follows a Poisson law of parameter:
 273 $$\lambda = \frac{n_a.n_{\overline{b}}}{n}$$
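
As a numerical sketch (the margins are chosen here only for
illustration), take $n=200$, $n_a=100$ and $n_{\overline{b}}=54$. The
rate per transaction estimated under independence is
$$c = \frac{100}{200} \cdot \frac{54}{200} = 0.5 \times 0.27 = 0.135$$
and, over the $n=200$ transactions,
$$\lambda = c.n = 0.135 \times 200 = 27 = \frac{100 \times 54}{200}$$
In other words, about $27$ counter-examples would be expected if only
chance were at work; the observed number of counter-examples is to be
compared with this value.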
 274
As a result, $Pr[card(X\cap \overline{Y})= s]= e^{-\lambda}\frac{\lambda^s}{s!}$.

Consequently, the probability that chance alone will lead, under the
assumption of the absence of an a priori link between $a$ and $b$, to
no more counter-examples than those observed is:

$$Pr[card(X\cap \overline{Y})\leq card(A\cap \overline{B})] =
\sum^{card(A\cap \overline{B})}_{s=0}  e^{-\lambda}\frac{\lambda^s}{s!} $$
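
A short worked example may help to read this sum; the counts are
hypothetical and serve only as an illustration. Take $n=100$,
$n_a=20$, $n_{\overline{b}}=30$ and $card(A\cap \overline{B})=2$
observed counter-examples. Then $\lambda = \frac{20 \times 30}{100} = 6$
and
$$Pr[card(X\cap \overline{Y})\leq 2] = e^{-6}\left(1 + 6 + \frac{6^2}{2}\right) = 25\, e^{-6} \approx 0.062$$
With the acceptability criterion stated earlier (this probability must
not exceed $\alpha$), the rule $a \Rightarrow b$ would be acceptable at
confidence level $0.90$, since $0.062 \leq 0.10$, but not at level
$0.95$, since $0.062 > 0.05$.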
 283
Other legitimate drawing processes lead to a binomial law, or even a
hypergeometric law (itself not semantically adapted to the situation
because of its symmetry). Under suitable convergence conditions, these
two laws reduce to the Poisson law above (see the Annex to this
chapter).

If $n_{\overline{b}}\neq 0$, we center and reduce this Poisson
variable into the variable:

$$Q(a,\overline{b})= \frac{card(X \cap \overline{Y}) -  \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}}  $$
 294
In the experimental realization, the observed value of
$Q(a,\overline{b})$ is $q(a,\overline{b})$.
It measures the gap between the observed contingency $card(A\cap
\overline{B})$ and the value it would have taken if there had been
independence between $a$ and $b$.

\definition $$q(a,\overline{b}) = \frac{n_{a \wedge \overline{b}}-  \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}}$$
is called the implication index; this number is used as an indicator
of the non-implication of $a$ to $b$.
In cases where the approximation is properly legitimized (for example
$\frac{n_a.n_{\overline{b}}}{n}\geq 4$), the variable
$Q(a,\overline{b})$ approximately follows the standard (centered,
reduced) normal distribution. The intensity of implication, measuring
the quality of $a\Rightarrow b$, for $n_a\leq n_b$ and $n_b \neq n$,
is then defined from the index $q(a,\overline{b})$ by:

\definition
The implication intensity that measures the inductive quality of $a$
over $b$ is:
$$\varphi(a,b)=1-Pr[Q(a,\overline{b})\leq q(a,\overline{b})] =
\frac{1}{\sqrt{2 \pi}} \int^{\infty}_{ q(a,\overline{b})}
e^{-\frac{t^2}{2}} dt,~ if~ n_b \neq n$$
$$\varphi(a,b)=0,~ otherwise$$
 318 As a result, the definition of statistical implication becomes:
 319 \definition
 320 Implication  $a\Rightarrow b$ is admissible at confidence level
 321 $1-\alpha $ if and only if:
 322 $$\varphi(a,b)\geq 1-\alpha$$
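
As a numerical sketch of these definitions, consider the following
hypothetical counts, chosen only for illustration; $\Phi$ denotes the
standard normal distribution function, a notation introduced here.
Take $n=100$, $n_a=20$, $n_b=70$, hence $n_{\overline{b}}=30$, and
$n_{a \wedge \overline{b}}=2$ observed counter-examples. The expected
number of counter-examples under independence is
$\frac{n_a.n_{\overline{b}}}{n}=6 \geq 4$, so the normal approximation
is acceptable by the rule of thumb above, and
$$q(a,\overline{b})=\frac{2-6}{\sqrt{6}}\approx -1.633, \qquad
\varphi(a,b)=1-\Phi(-1.633)=\Phi(1.633)\approx 0.949$$
With these values, the implication $a \Rightarrow b$ would be
admissible at confidence level $0.90$ and would fall just short of the
level $0.95$.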
 323
 324
It should be recalled that this modeling of quasi-implication measures
the astonishment at the small number of counter-examples compared with
the surprisingly large number of instances of the implication.
It is a measure of the inductive and informative quality of the
implication. Therefore, if the rule is trivial, as in the case where
$B$ is very large or coincides with $E$, this astonishment becomes
small.
We also demonstrate~\cite{Grasf} that this triviality results in a
very low or even zero intensity of implication: if, $n_a$ being fixed
and $A$ being included in $B$, $n_b$ tends towards $n$ ($B$ "grows"
towards $E$), then $\varphi(a,b)$ tends towards $0$. We therefore
define, by "continuity": $\varphi(a,b) = 0$ if $n_b = n$. Similarly,
even if $A\subset B$, $\varphi(a,b)$ may be less than $1$ when the
inductive confidence, measured by statistical surprise, is
insufficient.
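
This weakening with triviality can be illustrated numerically. The
sketch below is only an illustration under stated assumptions, not the
proof given in~\cite{Grasf}: it uses the Poisson form of the criterion,
$1-Pr[card(X\cap \overline{Y})\leq card(A\cap \overline{B})]$, rather
than the normal approximation, which is not legitimate when the Poisson
parameter is small. Take the hypothetical counts $n=100$ and $n_a=20$
with $A\subset B$, so that no counter-example is observed and the
quantity reduces to $1-e^{-\lambda}$ with
$\lambda=\frac{n_a.n_{\overline{b}}}{n}=0.2\, n_{\overline{b}}$:
$$n_{\overline{b}}=50:~ \lambda=10,~ 1-e^{-10}\approx 0.99995; \qquad
n_{\overline{b}}=5:~ \lambda=1,~ 1-e^{-1}\approx 0.632; \qquad
n_{\overline{b}}=1:~ \lambda=0.2,~ 1-e^{-0.2}\approx 0.181$$
As $B$ grows towards $E$, the value decreases towards $0$ although the
inclusion of $A$ in $B$ holds throughout.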
 340
 341 {\bf \remark Total correlation, partial correlation}
 342
 343
 344 We take here the notion of correlation in a more general sense than
 345 that used in the domain that develops the linear correlation
 346 coefficient (linear link measure) or the correlation ratio (functional
 347 link measure).
In our perspective, there is a total (or partial) correlation between
two variables $a$ and $b$ when the respective events they determine
occur (or almost occur) at the same time, as do their opposites.
However, we know from numerical counter-examples that correlation and
implication do not reduce to each other: there can be correlation
without implication and vice versa (see~\cite{Grasf} and below).
If we compare the implication index and the linear correlation
coefficient algebraically, it is clear that the two concepts do not
coincide and therefore do not provide the same
information\footnote{"More serious is the logical error of inferring
  the existence of a causality from an observed correlation", writes
  Albert Jacquard in~\cite{Jacquard}, p.159.}.
 360
The non-symmetric quasi-implication index $q(a,\overline{b})$ does not
coincide with the correlation coefficient $\rho(a, b)$, which is
symmetric and reflects the relationship between variables $a$ and
$b$. Indeed, we show~\cite{Grasf} that if $q(a,\overline{b}) \neq 0$
then
$$\frac{\rho(a,b)}{q(a,\overline{b})} = -\sqrt{\frac{n}{n_b
    n_{\overline{a}}}}$$
The negative sign reflects the fact that $q$ is built on
counter-examples: a strongly negative index goes with a positive
correlation.
With correlation understood as linear correlation, even if
correlation and implication point rather in the same direction, the
orientation of the relationship between two variables is not apparent
because the coefficient is symmetrical, which is not the standpoint
adopted in SIA.
 373 From a statistical relationship given by the correlation, two opposing
 374 empirical propositions can be deduced.
 375
 376 The following dual numerical situation clearly illustrates this:
 377
 378
\begin{table}[htp]
\centering
\begin{tabular}{|l|c|c|c|}\hline
\diagbox[width=4em]{$a_1$}{$b_1$}&
  1 & 0 & margin\\ \hline
  1 & 96 & 4 & 100 \\ \hline
  0 & 56 & 44 & 100 \\ \hline
  margin & 152 & 48 & 200 \\ \hline
\end{tabular} \qquad \begin{tabular}{|l|c|c|c|}\hline
\diagbox[width=4em]{$a_2$}{$b_2$}&
  1 & 0 & margin\\ \hline
  1 & 94 & 6 & 100 \\ \hline
  0 & 52 & 48 & 100 \\ \hline
  margin & 146 & 54 & 200 \\ \hline
\end{tabular}

\caption{Numerical example of the difference between implication and
  correlation}
\label{chap2tab1}
\end{table}
 399
 400 In Table~\ref{chap2tab1}, the following correlation and implications
 401 can be computed:\\
 402 Correlation $\rho(a_1,b_1)=0.468$, Implication
 403 $q(a_1,\overline{b_1})=-4.082$\\
 404 Correlation $\rho(a_2,b_2)=0.473$, Implication  $q(a_2,\overline{b_2})=-4.041$
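
The values given for the pair $(a_2, b_2)$ can be checked by hand. The
sketch below uses the usual contingency-table expression of the linear
correlation coefficient of two binary variables, which is not written
out in the text and is recalled here as an aid. With $n=200$,
$n_{a_2}=100$, $n_{b_2}=146$, $n_{\overline{b_2}}=54$,
$n_{a_2 \wedge b_2}=94$ and $n_{a_2 \wedge \overline{b_2}}=6$:
$$q(a_2,\overline{b_2}) = \frac{6-\frac{100 \times 54}{200}}{\sqrt{\frac{100 \times 54}{200}}}
= \frac{6-27}{\sqrt{27}} \approx -4.041$$
$$\rho(a_2,b_2) = \frac{n\, n_{a_2 \wedge b_2} - n_{a_2} n_{b_2}}{\sqrt{n_{a_2} n_{\overline{a_2}} n_{b_2} n_{\overline{b_2}}}}
= \frac{18800-14600}{\sqrt{100 \times 100 \times 146 \times 54}} \approx \frac{4200}{8879.2} \approx 0.473$$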
 405
 406
Thus, we observe that, on the one hand, $a_1$ and $b_1$ are less
correlated than $a_2$ and $b_2$ while, on the other hand, the
implication intensity of $a_1$ over $b_1$ is higher than that of $a_2$
over $b_2$, since $q(a_1,\overline{b_1}) < q(a_2,\overline{b_2})$ and
the intensity $\varphi$ decreases as $q$ increases.
 411
On this subject, Alain Ehrenberg writes in~\cite{Ehrenberg}: "The
finding of a correlation does not remove the ambiguity between `when I
do $X$, my brain is in state $Y$' and `if I do $X$, it is because my
brain is in state $Y$', that is, between something that happens in my
brain when I do an action.
 414
\remark Remember that we consider not only conjunctions of variables
of the type "$a$ and $b$" but also disjunctions such as "($a$ and $b$)
or $c$..." in order to model phenomena that are concepts, as is done
in learning or in artificial intelligence.
The associated calculations remain compatible with the logic of
propositions linked by connectives.
 421
\remark Unlike the Loevinger index~\cite{Loevinger} and conditional
probability $Pr[B/A]$ and all its derivatives, the implication
intensity varies, non-linearly, with the expansion of sets $E$, $A$
and $B$ and weakens with triviality (see Definition 2.3).
Moreover, it is resistant to noise, especially when the number of
counter-examples $n_{a \wedge \overline{b}}$ is around $0$, which can
only make the relationship we want to model and establish
statistically credible.
Finally, as we have seen, the inclusion of $A$ in $B$ does not ensure
maximum intensity: the inductive quality may not be strong, whereas
$Pr[B/A]$ is equal to $1$~\cite{Grasm,Guillet}.
In paragraph 5, we study more closely the sensitivity and stability of
the implication index under small variations in the parameters
involved, through the study of its differential.
 437
 438 \section{Case of modal and frequency variables}
 439 \subsection{Founding situation}
 440
Marc Bailleul's research (1991-1994) focuses in particular on the
representation that mathematics teachers have of their own teaching.
In order to bring it out, meaningful words are proposed to them, which
they must rank.
Their choices are no longer binary: the words chosen by a teacher are
ordered from the least to the most representative.
M. Bailleul's questions are then of the type: "if I choose this word
with this importance, then I choose this other word with at least
equal importance".
It was therefore necessary to extend the notion of statistical
implication to variables other than binary ones.
This is the case for modal variables, which are associated with
phenomena where the values $a(x)$ are numbers in the interval $[0, 1]$
and describe degrees of belonging or satisfaction, as in fuzzy logic
with, for example, the linguistic modifiers "maybe", "a little",
"sometimes", etc.
This problem is also found in situations where the frequency of a
variable reflects a preorder on the values assigned by the subjects to
the variables presented to them.
These are frequency variables, which are associated with phenomena
where the values of $a(x)$ are arbitrary positive real values.
This is the case when one considers a student's percentage of success
in a battery of tests in different areas.
 462 This is the case when one considers a student's percentage of success
 463 in a battery of tests in different areas.
 464
 465 \subsection{Formalization}
 466
J.B. Lagrange~\cite{Lagrange} has demonstrated that, in the modal
case,
\begin{itemize}
  \item if $a(x)$ and $\overline{b}(x)$ are the values taken at $x$ by
    the modal variables $a$ and $\overline{b}$, with
    $\overline{b}(x)=1-b(x)$,
  \item if $s^2_a$ and $s_{\overline{b}}^2$ are the empirical variances of variables $a$ and $\overline{b}$,
\end{itemize}
then the implication index, which he calls propensity index, becomes:

\definition
$$q(a,\overline{b}) = \frac{\sum_{x\in E} a(x)\overline{b}(x)  -
  \frac{n_a n_{\overline{b}}}{n}}
{\sqrt{\frac{(n^2s_a^2+n_a^2)(n^2s_{\overline{b}}^2 + n_{\overline{b}}^2)}{n^3}}}$$
is the index of propensity of modal variables.
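
A small worked example may make the propensity index concrete. It is a
sketch on hypothetical data and rests on two assumptions not stated
explicitly in the text: $n_a=\sum_{x\in E} a(x)$ (and likewise for
$n_{\overline{b}}$), and empirical variances computed with divisor $n$.
The denominator used is
$\sqrt{(n^2s_a^2+n_a^2)(n^2s_{\overline{b}}^2+n_{\overline{b}}^2)/n^3}$.
Take $n=4$ subjects with
$$a=(1,~0.5,~0.5,~0), \qquad b=(1,~1,~0.5,~0.5), \qquad \overline{b}=1-b=(0,~0,~0.5,~0.5)$$
Then $n_a=2$, $n_{\overline{b}}=1$,
$\sum_{x\in E} a(x)\overline{b}(x)=0.25$ and
$\frac{n_a n_{\overline{b}}}{n}=0.5$. The variances are
$s_a^2=0.375-0.25=0.125$ and
$s_{\overline{b}}^2=0.125-0.0625=0.0625$, so that
$n^2s_a^2+n_a^2=2+4=6$ and
$n^2s_{\overline{b}}^2+n_{\overline{b}}^2=1+1=2$. Hence
$$q(a,\overline{b})=\frac{0.25-0.5}{\sqrt{\frac{6 \times 2}{64}}}
=\frac{-0.25}{\sqrt{0.1875}}\approx -0.577$$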
 481
J.B. Lagrange also proves that this index coincides with the index
defined previously in the binary case if the number of modalities of
$a$ and $b$ is precisely 2, because in this case:\\
$n^2s_a^2+n_a^2=n n_a$,~ ~ $ n^2s_{\overline{b}}^2 + n_{\overline{b}}^2=n
  n_{\overline{b}}$~ ~ and ~ ~ $\sum_{x\in E} a(x)\overline{b}(x)=n_{a \wedge
  \overline{b}}$.
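
The binary reduction can be checked on numbers. As a sketch, assuming
empirical variances with divisor $n$, a binary variable $a$ has
$s_a^2=\frac{n_a}{n}\left(1-\frac{n_a}{n}\right)$, hence
$n^2s_a^2+n_a^2=n_a(n-n_a)+n_a^2=n\,n_a$, and likewise for
$\overline{b}$. With the illustrative margins $n=200$, $n_a=100$ and
$n_{\overline{b}}=54$:
$$n^2s_a^2+n_a^2 = 40000 \times 0.25 + 10000 = 20000 = 200 \times 100$$
$$n^2s_{\overline{b}}^2+n_{\overline{b}}^2 = 40000 \times 0.27 \times 0.73 + 2916 = 7884 + 2916 = 10800 = 200 \times 54$$
so that the denominator of the propensity index becomes
$$\sqrt{\frac{20000 \times 10800}{200^3}}=\sqrt{27}=\sqrt{\frac{n_a n_{\overline{b}}}{n}}$$
which is the denominator of the binary implication index.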
 488
This solution provided in the modal case also applies to frequency
variables, or even positive numerical variables, provided that the
values observed on the variables, such as $a$ and $b$, have been
normalized, the normalization to $[0, 1]$ being based on the maximum value taken by $a$ and by $b$, respectively, on the set $E$.
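
For instance, assuming that the normalization consists in dividing
each value by the maximum observed on $E$ (one natural reading of the
sentence above), a hypothetical frequency variable $a$ taking the raw
values $(12,~30,~45,~60)$ on four subjects has maximum $60$ and becomes
$$\left(\frac{12}{60},~\frac{30}{60},~\frac{45}{60},~\frac{60}{60}\right)=(0.2,~0.5,~0.75,~1)$$
after which it can be treated as a modal variable with values in
$[0, 1]$.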
 493
 494 \remark
 495 In~\cite{Regniera}, we consider rank variables that reflect a
 496 total order between choices presented to a population of judges.
 497 Each of them must order their preferential choice among a set of
 498 objects or proposals made to them.
 499 An index measures the quality of the statement of the type: "if object
 500 $a$ is ranked by judges then, generally, object $b$ is ranked higher
 501 by the same judges".
 502 Proximity to the previous issue leads to an index that is relatively
 503 close to the Lagrange index, but better adapted to the rank variable
 504 situation.
 505
 506
\section{Cases of variables-on-intervals and interval-variables}
 508 \subsection{Variables-on-intervals}
 509 \subsubsection{Founding situation}
 510
For example, the following rule is sought to be extracted from a
biometric data set, together with an estimate of its quality: "if an
individual weighs between $65$ and $70$~kg, then in general he is
between $1.70$ and $1.76$~m tall".
A similar situation arises in the search for relationships between
intervals of student performance in two different subjects.
The more general situation is then expressed as follows: two real
variables $a$ and $b$ take a certain number of values over two finite
intervals $[a1,~ a2]$ and $[b1,~ b2]$. Let $A$ (resp. $B$) be the set
of values of $a$ (resp. $b$) observed over $[a1,~ a2]$ (resp. $[b1,~
  b2]$).
For example, here, $a$ represents the weights of a set of $n$ subjects and $b$ the heights of these same subjects.
 523
 524 Two problems arise:
 525 \begin{enumerate}
\item  Can adjacent sub-intervals of $[a1,~ a2]$ (resp. $[b1,~ b2]$)
  be defined so that the finest partition obtained best respects the
  distribution of the values observed in $[a1,~ a2]$ (resp. $[b1,~ b2]$)?
\item  Can we find respective partitions of $[a1,~ a2]$ and $[b1,~
  b2]$, made up of unions of the previous adjacent sub-intervals,
  that maximize the average implication intensity of the
  sub-intervals of one on the sub-intervals of the other belonging to
  these partitions?
 534 \end{enumerate}
 535
 536 We answer these two questions as part of our problem by choosing the
 537 criteria to optimize in order to satisfy the optimality expected in
 538 each case.
 539 To the first question, many solutions have been provided in other
 540 settings (for example, by~\cite{Lahaniera}).
 541
 542 \subsubsection{First problem}
 543
We consider the interval $[a1,~ a2]$, assuming it is given a trivial
initial partition into sub-intervals of the same length, but not
necessarily with the same observed frequency on each of these
sub-intervals.
Let $P_0 = \{A_{01},~ A_{02},~ ...,~ A_{0p}\}$ denote this partition
into $p$ sub-intervals.
We try to obtain a partition of $[a1,~ a2]$ into $p$ sub-intervals
$\{A_{q1},~ A_{q2},~ ...,~ A_{qp}\}$ in such a way that within each
sub-interval there is good statistical homogeneity (low intra-class
inertia) and that these sub-intervals have good mutual heterogeneity
(high inter-class inertia).
We know that if one of the criteria is satisfied, the other
necessarily is as well (König-Huygens theorem).
This is done by adopting a method directly inspired by the dynamic
cloud method developed by Edwin Diday~\cite{Diday} (see also
\cite{Lebart}) and adapted to the current situation. This results in
the optimal partition targeted.
 559 \cite{Lebart} and adapted to the current situation. This results in
 560 the optimal partition targeted.
 561
 562 \subsubsection{Second problem}
 563
It is now assumed that the intervals $[a1,~ a2]$ and $[b1,~ b2]$ are
provided with optimal partitions $P$ and $Q$, respectively, in the
sense of the dynamic clouds.
Let $p$ and $q$ be the respective numbers of sub-intervals composing
$P$ and $Q$.
From these two partitions, it is possible to generate $2^{p-1}$ and
$2^{q-1}$ partitions obtained by iterated unions of adjacent
sub-intervals of $P$ and $Q$\footnote{It is enough to consider the tree structure of which $A_1$ is the root, then to join it or not to $A_2$ which itself will or will not be joined to $A_3$, etc. There are therefore $2^{p-1}$ branches in this tree structure.} respectively.
We calculate the implication intensities of each sub-interval of the
first partition, whether or not merged with another, on each
sub-interval of the second, whether or not merged with another, and
then the intensities of the reciprocal implications.
There are therefore a total of $2 \cdot 2^{p-1} \cdot 2^{q-1}$
families of implication intensities, each of which requires computing
the implications of all the elements of a partition of $[a1,~ a2]$ on
all the elements of one of the partitions of $[b1,~ b2]$, and vice
versa.
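
The count can be checked on a small case; the sub-interval names
$A_1, A_2, A_3$ and $B_1, B_2$ are used here only for the illustration.
For $p=3$, each of the $p-1=2$ boundaries between adjacent
sub-intervals is either kept or removed, which gives the $2^{2}=4$
partitions
$$\{A_1, A_2, A_3\}, \qquad \{A_1\cup A_2,~ A_3\}, \qquad
\{A_1,~ A_2\cup A_3\}, \qquad \{A_1\cup A_2\cup A_3\}$$
For $q=2$, the $2^{1}=2$ partitions are $\{B_1, B_2\}$ and
$\{B_1\cup B_2\}$. Counting the direct and the reciprocal implications,
there are therefore $2 \times 4 \times 2 = 16$ families of implication
intensities to compute in this case.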
 581 The optimality criterion is chosen as the geometric mean of the
 582 intensities of implication, the mean associated with each pair of
 583 partitions of elements, combined or not, defined inductively.
 584 We note the two maxima obtained (direct implication and its
 585 reciprocal) and we retain the two associated partitions by declaring
 586 that the implication of the variable-on-interval $a$ on the
 587 variable-on-interval $b$ is optimal when the interval $[a1,~ a2]$
 588 admits the partition corresponding to the first maximum and that the
 589 optimal reciprocal involvement is satisfied for the partition of
 590 $[b1,~ b2]$ corresponding to the second maximum.
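
As a small illustration of this count (the values $p = 3$ and $q = 2$, and the names $B_1$, $B_2$, are chosen here only for the example), suppose that $P = \{A_1,~ A_2,~ A_3\}$. The $2^{p-1} = 4$ partitions obtained by unions of adjacent sub-intervals are then:

$$\{A_1,~ A_2,~ A_3\}, \quad \{A_1 \cup A_2,~ A_3\}, \quad \{A_1,~ A_2 \cup A_3\}, \quad \{A_1 \cup A_2 \cup A_3\}$$

Similarly, $Q = \{B_1,~ B_2\}$ yields $2^{q-1} = 2$ partitions, so that the number of families of implication intensities to be examined would be:

$$2 \cdot 2^{p-1} \cdot 2^{q-1} = 2 \cdot 4 \cdot 2 = 16$$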
 591
 592 \section{Interval-variables}
 593 \subsection{Founding situation}
 594 Data are available for a population of $n$ individuals (some or all
 595 of which may themselves be sets of individuals, e.g. a class of students)
 596 described by variables (e.g. grades over a year in French, math,
 597 physics, \ldots, but also weight, height, chest size, \ldots).
 598 The values taken by these variables for each individual are intervals
 599 of positive real values.
 600 For example, individual $x$ gives the value $[12,~ 15.50]$ to the math
 601 score variable.
 602 In this context, E. Diday would speak of $p$ interval-valued symbolic
 603 variables defined on the population.
 604
 605
 606 We seek to define an implication of intervals relative to a variable
 607 $a$, which are themselves observed intervals, towards other similarly
 608 defined intervals relative to another variable $b$.
 609 This will make it possible to measure the implicative, and therefore
 610 non-symmetric, association of certain interval(s) of the variable $a$
 611 with certain interval(s) of the variable $b$, as well as the
 612 reciprocal association; the better of the two will be retained for each
 613 pair of sub-intervals involved, as just described in §4.1.
 614
 615 For example, it will be said that the sub-interval $[2, 5.5]$ of
 616 mathematics scores generally implies the sub-interval $[4.25, 7.5]$
 617 of physics scores, both of which belong to an optimal partition in
 618 terms of the explained variance of the respective value ranges $[1,
 619   18]$ and $[3, 20]$ taken in the population.
 620 Similarly, we will say that $[14.25, 17.80]$ in physics most often
 621 implies $[16.40, 18]$ in mathematics.
 622
 623
 624 \subsection{Algorithm}
 625
 626 Following the approach of E. Diday and his collaborators, if the
 627 values taken by the variables $a$ and $b$ for each subject
 628 are of a symbolic nature, in this case intervals of $\mathbb{R}^+$, it
 629 is possible to extend the above algorithms~\cite{Grasi}.
 630 For example, variable $a$ has weight intervals associated with it and
 631 variable $b$ has height intervals associated with it, due to
 632 inaccurate measurements.
 633 By uniting the intervals $I_x$ and $J_x$ described by the subjects
 634 $x$ of $E$ according to each of the variables $a$ and $b$
 635 respectively, we obtain two intervals $I$ and $J$ covering all
 636 possible values of $a$ and $b$.
 637 On each of them, a partition into a certain number of intervals can
 638 be defined, respecting, as above, a given optimality criterion.
 639 For this purpose, the intersections of intervals such as $I_x$ and
 640 $J_x$ with these partitions will be provided with a distribution
 641 taking into account the areas of the common parts.
 642 This distribution may be uniform or of another discrete or continuous
 643 type.
 644 We are thus brought back to the search for rules between two sets of
 645 variables-on-intervals that take, as previously in §4.1, their values
 646 in $[0,~ 1]$ and from which we can search for optimal implications.
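
As a purely illustrative sketch, assume a uniform distribution and a hypothetical partition of $I$ containing the adjacent sub-intervals $[10,~ 13]$ and $[13,~ 16]$ (these bounds are not taken from the data above). The observed interval $I_x = [12,~ 15.50]$, of length $3.5$, would then be distributed over these two sub-intervals in proportion to the lengths of the common parts:

$$\frac{|[12,~ 13]|}{|I_x|} = \frac{1}{3.5} \approx 0.29, \qquad \frac{|[13,~ 15.50]|}{|I_x|} = \frac{2.5}{3.5} \approx 0.71$$

Under these assumptions, the subject $x$ would take the value $0.29$ on the first sub-interval and $0.71$ on the second, both values lying in $[0,~ 1]$.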
 647
 648
 649 \remark Whatever the type of variable considered, there is often a
 650 problem of overabundance of variables and therefore difficulty of
 651 representation.
 652 For this reason, we have defined an equivalence relation on the set of
 653 variables that allows us to substitute a so-called leader variable for
 654 an equivalence class~\cite{Grask}.
 655
 656 \section{Variations in the implication index q according to the 4 occurrences}
 657
 658 In this section, we examine the sensitivity of the implication index
 659 to perturbations of its parameters.
 660
 661 \subsection{Stability of the implication index}
 662 To study the stability of the implication index $q$ is to examine its
 663 small variations in the vicinity of the $4$ observed integer values
 664 ($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$).
 665 To do this, it is possible to perform different simulations by
 666 crossing these 4 integer variables on which $q$ depends~\cite{Grasx}.
 667 But let us instead consider these variables as real-valued and
 668 $q$ as a continuously differentiable function of these
 669 variables, which are themselves constrained by the inequalities $0\leq
 670 n_a \leq n_b$, $n_{a \wedge \overline{b}} \leq \inf\{n_a,~ n_b\}$ and
 671 $\sup\{n_a,~ n_b\} \leq n$.
 672 The function $q$ then defines a scalar field (and, through its gradient, a vector field) on
 673 $\mathbb{R}^4$, regarded as an affine space over itself as a vector space.
 674 Under the plausible hypothesis that the data collection process
 675 evolves non-chaotically, it is then sufficient to examine the differential of
 676 $q$ with respect to these variables and to keep its restriction to the
 677 integer values of the parameters of the relationship $a \Rightarrow b$.
 678 The differential of $q$, in the sense of Fréchet's
 679 topology\footnote{Fréchet's topology allows sections of $\mathbb{N}$,
 680   i.e. subsets of naturals of the form $\{n,~ n+1,~ n+2,~ \ldots\}$, to be
 681   used as a filter base, while the usual topology on $\mathbb{R}$
 682   uses real intervals for filters.
 683   Thus continuity and differentiability are perfectly defined and
 684   operational concepts according to Fréchet's topology in the same way
 685   as they are with the usual topology.}, is expressed as follows by
 686 the scalar product:
 687
 688 $$dq = \frac{\partial q}{\partial n}dn + \frac{\partial q}{\partial
 689   n_a}dn_a +  \frac{\partial q}{\partial n_b}dn_b +  \frac{\partial
 690   q}{\partial n_{a \wedge \overline{b}}}dn_{a \wedge \overline{b}} = \mathrm{grad}~q \cdot dM\footnote{By a mechanistic metaphor, we will say that $dq$ is the elementary work of $q$ for a movement $dM$ (see chapter 14 of this book).}$$
 691
 692 where $M$ is the point with coordinates $(n,~ n_a,~ n_b,~ n_{a \wedge
 693   \overline{b}})$ of the scalar field $C$, $dM$ is the
 694 vector whose components are the differential increments of these occurrence
 695 variables, and $\mathrm{grad}~ q$ is the vector whose components are the partial derivatives
 696 of $q$ with respect to these occurrence variables.
 697
 698 The differential of the function $q$ therefore appears as the scalar product of its gradient and the increment $dM$ on the surface representing the variations of the function $q(n,~ n_a,~ n_b,~ n_{a \wedge
 699   \overline{b}})$. Thus, the gradient of $q$ describes the variations of $q$
 700 according to those of its arguments, the 4 cardinalities of
 701 the sets $E$, $A$, $B$ and $A\cap \overline{B}$. It
 702 indicates the direction and sense of growth or decrease of $q$ in
 703 the space of dimension 4. Recall that it is carried by the normal to
 704 the level surface $q~ =~ \mathrm{const}$.
 705
 706 If we want to study how $q$ varies according to $n_{\overline{b}}$,
 707 we just have to replace $n_b$ by $n-n_{\overline{b}}$ and therefore change the sign
 708 of the partial derivative with respect to $n_b$. In fact, the
 709 interest of this differential lies in estimating the increase
 710 (positive or negative) of $q$, which we denote $\Delta q$, in relation to
 711 the respective variations $\Delta n$, $\Delta n_a$, $\Delta n_b$ and
 712 $\Delta n_{a \wedge
 713   \overline{b}}$. So we have:
 714
 715
 716 $$\Delta q= \frac{\partial q}{\partial n} \Delta n + \frac{\partial
 717   q}{\partial n_a} \Delta n_a  + \frac{\partial
 718   q}{\partial n_b} \Delta n_b + \frac{\partial
 719   q}{\partial n_{a \wedge
 720   \overline{b}}} \Delta n_{a \wedge
 721   \overline{b}} +o(\Delta q)$$
 722
 723 where $o(\Delta q)$ is a remainder that is negligible compared with the first-order terms.
 724 Let us examine the partial derivatives of $q$ with respect to $n_b$ and to $n_{a \wedge
 725   \overline{b}}$, the number of counter-examples. We get:
 726
 727 $$ \frac{\partial
 728   q}{\partial n_b} = \frac{1}{2} n_{a \wedge
 729   \overline{b}} \left(\frac{n_a}{n}\right)^{-\frac{1}{2}} (n-n_b)^{-\frac{3}{2}}
 730 + \frac{1}{2} \left(\frac{n_a}{n}\right)^{\frac{1}{2}} (n-n_b)^{-\frac{1}{2}} > 0  $$
 731
 732
 733 $$ \frac{\partial
 734   q}{\partial n_{a \wedge
 735   \overline{b}}}    = \frac{1}{\sqrt{\frac{n_a n_{\overline{b}}}{n}}}
 736 = \frac{1}{\sqrt{\frac{n_a (n-n_b)}{n}}} > 0 $$
 737
 738 Thus, if the increases $\Delta n_b$ and $\Delta n_{a \wedge
 739   \overline{b}}$ are positive, the increase of $q(a,\overline{b})$ is
 740 also positive. This is interpreted as follows: if the number of
 741 examples of $b$ and the number of counter-examples of the implication
 742 increase, then the intensity of implication decreases for $n$ and $n_a$
 743 constant. In other words, this intensity of implication is maximum at
 744 the observed values $n_b$ and $ n_{a \wedge
 745   \overline{b}}$ and minimum at the values $n_b+\Delta n_b$ and  $n_{a \wedge
 746   \overline{b}}+ \Delta n_{a \wedge
 747   \overline{b}}$.
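
As a numerical illustration, assume that the index has the form $q = \left(n_{a \wedge \overline{b}} - \frac{n_a n_{\overline{b}}}{n}\right) / \sqrt{\frac{n_a n_{\overline{b}}}{n}}$, with which the two partial derivatives above are consistent, and take the hypothetical values $n = 100$, $n_a = 20$, $n_b = 50$ and $n_{a \wedge \overline{b}} = 5$ (these figures are chosen only for the example). Then $\frac{n_a n_{\overline{b}}}{n} = 10$ and:

$$q = \frac{5 - 10}{\sqrt{10}} \approx -1.58, \qquad \frac{\partial q}{\partial n_b} \approx 0.047, \qquad \frac{\partial q}{\partial n_{a \wedge \overline{b}}} = \frac{1}{\sqrt{10}} \approx 0.316$$

For $\Delta n_b = \Delta n_{a \wedge \overline{b}} = 1$, with $n$ and $n_a$ constant, the first-order estimate gives:

$$\Delta q \approx 0.047 \times 1 + 0.316 \times 1 \approx 0.36$$

A direct computation at $n_b = 51$ and $n_{a \wedge \overline{b}} = 6$ gives $q = (6 - 9.8)/\sqrt{9.8} \approx -1.21$, i.e. $\Delta q \approx 0.37$, which is close to the first-order estimate. In this example $q$ increases, so the intensity of implication decreases, as stated above.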