%%%%%%%%%%%%%%%%%%%%% chapter.tex %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Use this file as a template for your own input.
%%%%%%%%%%%%%%%%%%%%%%%% Springer-Verlag %%%%%%%%%%%%%%%%%%%%%%%%%%
%\motto{Use the template \emph{chapter.tex} to style the various elements of your chapter content.}
\chapter{From the founding situations of the SIA to its formalization}
\label{intro} % Always give a unique label
% to alter or adjust the chapter heading in the running head
Starting from situations in the didactics of mathematics, the
statistical implicative analysis method has developed as new problems
and needs were encountered.
Its main objective is to structure data crossing subjects and
variables, to extract inductive rules between variables and, based on
the contingency of these rules, to explain and therefore forecast in
various fields: psychology, sociology, biology, etc.
It is for this purpose that the concepts of intensity of implication,
class cohesion, implication-inclusion, significance of hierarchical
levels, contribution of supplementary variables, etc., were developed.
Similarly, the processing of binary variables (e.g., descriptors) has
gradually been supplemented by the processing of modal, frequency
and, more recently, interval and fuzzy variables.
Human operative knowledge is mainly composed of two components: that
of facts and that of rules between facts or between rules themselves.
It is through learning, nourished by culture and personal experience,
that an individual gradually develops these forms of knowledge,
despite the regressions, the questioning and the ruptures that arise
at the turn of decisive information.
However, we know that these dialectically contribute to ensuring a
certain stability of knowledge.
Rules are formed inductively and become relatively stable as soon as
the number of their successes, in terms of explanatory or
anticipatory quality, reaches a certain level (of confidence) from
which they are likely to be implemented.
On the other hand, as long as this (subjective) level is not reached,
the individual's cognitive economy makes him resist, at first, the
abandonment or criticism of the rule.
Indeed, it is costly to replace the initial rule with another rule
when only a small number of refutations appear, since the rule will
have been reinforced by a large number of confirmations.
An increase in this number of negative instances, depending on the
robustness of the level of confidence in the rule, may lead to its
readjustment or even its abandonment.
Laurent Fleury~\cite{Fleury}, in his thesis, rightly cites the
example - which Régis repeats - of the highly admissible rule: "all
...".
This very robust rule will not be abandoned upon observing a single
counter-example, or even a few.
Especially since it would not fail to be quickly confirmed again.
Thus, contrary to what is legitimate in mathematics, where no rule
(theorem) suffers an exception and where determinism is total, rules
in the human sciences, and more generally in the so-called "soft"
sciences, are acceptable and therefore operative as long as the number
of counter-examples remains "bearable" in view of the frequency of
situations where they prove positive and effective.
The problem in data analysis is then to establish a relatively
consensual numerical criterion to define the notion of a level of
confidence that can be adjusted to the user's level of requirement.
The fact that it is based on statistics is not surprising.
That it has a property of non-linear resistance to noise (weak effect
of the first counter-example(s)) may also seem natural, in line with
the "economic" meaning mentioned above.
That it collapses if counter-examples are repeated must also guide
our choice in the modeling of the desired criterion.
This text presents the epistemological choice we have made.
As such, it is refutable, but the number of situations and
applications where it has proved relevant and fruitful leads us to
retrace its genesis here.
\section{Introduction}
Different theoretical approaches have been adopted to model the
extraction and representation of imprecise (or partial) inference
rules between binary variables (or attributes, or characters)
describing a population of individuals (or subjects, or objects).
But the initial situations and the nature of the data do not change:
it is a question of discovering non-symmetrical inductive rules to
model relationships of the type "if $a$ then almost $b$".
This is, for example, the option of Bayesian networks~\cite{Amarger}
or Galois lattices~\cite{Simon}.
More often than not, however, since correlation and the
${\chi}^2$ test are unsuitable because of their symmetric nature,
conditional probability~\cite{Loevinger,Agrawal,Grasn} remains the
driving force behind the definition of the association, even when the
selected association index is multivariate~\cite{Bernard}.
Moreover, to our knowledge, the various and interesting developments
most often focus on proposals for a partial implication index for
binary data~\cite{Lermana,Lallich}; this notion is generally not
extended to other types of variables, nor to extraction and
representation according to a rule graph or a hierarchy of meta-rules:
structures aiming at access to the meaning of a whole not reduced to
the sum of its parts~\cite{Seve}\footnote{This is what the philosopher
  L. Sève emphasizes: "... in the non-additive, non-linear passage of
  the parts to the whole, there are properties that are in no way
  precontained in the parts and which cannot therefore be explained by
  them".}, i.e. operating as a complex non-linear system.
For example, it is well known, through usage, that the meaning of a
sentence does not completely depend on the meaning of each of the
words in it (see the previous chapter, point 4).
Let us return to what we believe is fertile in the approach we are
developing.
It would seem that, in the literature, the notion of implication index
has not been extended to the search for the subjects and categories of
subjects responsible for the associations, nor has this responsibility
been quantified so as to lead to a reciprocal structuring of the set
of subjects, conditioned by their relationships to the variables.
We propose these extensions here after recalling the founding
situations.
\section{Implication intensity in the binary case}

\subsection{Fundamental and founding situation}
A set of objects or subjects $E$ is crossed with variables
(characters, criteria, successes, ...) which are interrogated as
follows: "to what extent can we consider that instantiating variable\footnote{Throughout the book, the word "variable" refers both to an isolated variable in a premise (example: "to be blonde") and to a conjunction of isolated variables (example: "to be blonde and to be under 30 years old and to live in Paris").} $a$
implies instantiating variable $b$?"
In other words, do the subjects tend to be $b$ if we know that they
are $a$?
In situations in the natural, human or life sciences, where theorems
(if $a$ then $b$) in the deductive sense of the term cannot be
established because of the exceptions that taint them, it is important
for the researcher and the practitioner to "mine into the data" in
order to identify sufficiently reliable rules (kinds of "partial
theorems", inductions) to be able to conjecture\footnote{"The
  exception proves the rule", as the popular saying goes, in the sense
  that there would be no exceptions if there were no rule.} a possible
causal relationship, a genesis, to describe and structure a population
and to make the assumption of a certain stability for descriptive
and, if possible, predictive purposes.
But this excavation requires the development of methods to guide it
and to free it from trial and error and empiricism.
\subsection{Mathematization}
To do this, following the example of I.C. Lerman's similarity
measurement method \cite{Lerman,Lermanb} and the classic approach of
non-parametric tests (e.g. Fisher, Wilcoxon, etc.), we
define~\cite{Grasb,Grasf} the confirmatory quality measure of the
implicative relationship $a \Rightarrow b$ from the implausibility of
the occurrence in the data of the number of cases that invalidate it,
i.e. those for which $a$ is verified without $b$ being verified. This
amounts to comparing the observed number of counter-examples with the
theoretical number that would be expected if chance alone
operated\footnote{"...[in agreement with Jung] if the frequency of
  coincidences does not significantly exceed the probability that
  they can be calculated by attributing them solely by chance to the
  exclusion of hidden causal relationships, we certainly have no
  reason to suppose the existence of such relationships.",
  H. Atlan~\cite{Atlana}}.
But when analyzing data, it is this gap that we take into account, and
not the statement of the rejection or eligibility of a null
hypothesis.
This measure is relative to the number of data verifying $a$ and not
$b$, the circumstance in which the implication is precisely put in
default.
It quantifies the expert's "astonishment" at the unlikely small number
of counter-examples in view of the supposed independence between the
variables and the numbers involved.
Let us be clear. A finite set $V$ of $v$ variables is given: $a$, $b$,
$c$, ...
In the classical paradigmatic situation initially retained, these
describe performances (success-failure) on the items of a
questionnaire.
To a finite set $E$ of $n$ subjects $x$ are associated, by abuse of
notation, functions of the type $x \rightarrow a(x)$ where $a(x) = 1$
(or $a(x) = true$) if $x$ satisfies or possesses the character $a$,
and $0$ (or $a(x) = false$) otherwise.
In artificial intelligence, we say that $x$ is an example or an
instance of $a$ if $a(x) = 1$, and a counter-example if not.
The rule $a \Rightarrow b$ is logically true if, for any $x$ in the
sample, $b(x)$ is null only if $a(x)$ is also null; in other words, if
the set $A$ of the $x$ for which $a(x)=1$ is contained in the set $B$
of the $x$ for which $b(x)=1$.
However, this strict inclusion is only exceptionally observed in
pragmatically encountered experiments.
In the case of a knowledge questionnaire, we may indeed observe a
few rare students passing item $a$ and not passing item $b$,
without contesting the tendency to pass item $b$ when item $a$ has
been passed.
With regard to the cardinals of $E$ (of size $n$), but also of $A$ (or
$n_a$) and $B$ (or $n_b$), it is therefore the "weight" of the
counter-examples (i.e. $card(A\cap \overline{B})$) that must be taken
into account in order to decide statistically whether or not to keep
the quasi-implication or quasi-rule $a \Rightarrow b$. Thus, it is
from the dialectic of examples and counter-examples that the rule
emerges as the overcoming of contradiction.
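In this binary coding, counting the counter-examples of $a \Rightarrow b$ simply means counting the subjects $x$ with $a(x)=1$ and $b(x)=0$. A minimal sketch on invented illustrative data:

```python
# Binary coding of two variables over a set E of n subjects:
# a[i] = 1 if subject i satisfies character a, else 0 (same for b).
a = [1, 1, 1, 0, 0, 1, 0, 1]
b = [1, 1, 0, 0, 1, 1, 1, 1]

n = len(a)
n_a = sum(a)                     # card(A)
n_b = sum(b)                     # card(B)
# Counter-examples of a => b: subjects in A ∩ B-bar (a true, b false).
n_a_not_b = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)

print(n, n_a, n_b, n_a_not_b)    # → 8 5 6 1
```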
\subsection{Formalization}

To formalize this quasi-rule, we consider any two parts $X$ and $Y$ of
$E$, chosen randomly and independently of each other (absence of an a
priori link between these two parts) and with the same respective
cardinals as $A$ and $B$. Let $\overline{Y}$ and $\overline{B}$ be the
respective complements of $Y$ and $B$ in $E$, of the same cardinal
$n_{\overline{b}}= n-n_b$.

\definition $a \Rightarrow b$ is acceptable at confidence level
$1-\alpha$ if and only if
$$Pr[Card(X\cap \overline{Y})\leq card(A\cap \overline{B})]\leq \alpha$$

\begin{figure}[htbp]
\centering
\includegraphics[scale=0.34]{chap2fig1.png}
\caption{The dark grey parts correspond to the counter-examples of the
  implication $a \Rightarrow b$}
\end{figure}
It is established in \cite{Lermanb} that, for a certain drawing
process, the random variable $Card(X\cap \overline{Y})$ follows the
Poisson law of parameter $\frac{n_a n_{\overline{b}}}{n}$.
We obtain this same result by proceeding differently, as follows.

Denote by $X$ (resp. $Y$) the random subset of binary transactions in
which $a$ (resp. $b$) appears, independently, with frequency
$\frac{n_a}{n}$ (resp. $\frac{n_b}{n}$).
To specify how the transactions described by the variables $a$ and
$b$, i.e. the sets $A$ and $B$, are drawn, the following semantically
permissible assumptions are made regarding the observation of the
event $[a=1~ and~ b=0]$, where $(A\cap
\overline{B})$\footnote{We then denote by $\overline{v}$ the variable
  negation of $v$ (or $not~ v$) and by $\overline{P}$ the
  complementary part of the part $P$ of $E$.} is the subset of
transactions that are counter-examples of the implication
$a \Rightarrow b$:
\begin{itemize}
\item h1: the waiting times of an event $[a~ and~ not~ b]$ are
  independent;
\item h2: the law of the number of events occurring in the time
  interval $[t,~ t+T[$ depends only on $T$;
\item h3: two such events cannot occur simultaneously.
\end{itemize}
It is then demonstrated (for example in~\cite{Saporta}) that the
number of events occurring during a period of fixed duration $n$
follows a Poisson law of parameter $c.n$, where $c$ is called the
rate of the occurrence process per unit of time.
Now, for each transaction assumed to be random, the event $[a=1]$ has
as probability the frequency $\frac{n_a}{n}$, and the event $[b=0]$
has as probability the frequency $\frac{n_{\overline{b}}}{n}$;
therefore the joint event $[a=1~ and~ b=0]$ has a probability
estimated by the frequency $\frac{n_a}{n}.\frac{n_{\overline{b}}}{n}$
under the hypothesis of the absence of an a priori link between $a$
and $b$ (independence).

We can then estimate the rate $c$ of this event by
$\frac{n_a}{n}.\frac{n_{\overline{b}}}{n}$.

Thus, for a duration of time $n$, the occurrences of the event
$[a~ and~ not~ b]$ follow a Poisson law of parameter:
$$\lambda = \frac{n_a.n_{\overline{b}}}{n}$$
As a result, $Pr[Card(X\cap \overline{Y})= s]= e^{-\lambda}\frac{\lambda^s}{s!}$.

Consequently, the probability that chance alone would lead, under the
assumption of the absence of an a priori link between $a$ and $b$, to
no more counter-examples than those observed is:

$$Pr[Card(X\cap \overline{Y})\leq card(A\cap \overline{B})] =
\sum^{card(A\cap \overline{B})}_{s=0} e^{-\lambda}\frac{\lambda^s}{s!} $$
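The sum above is the cumulative distribution function of a Poisson law; a small sketch, with counts chosen for illustration only (not taken from the book's data):

```python
import math

def poisson_cdf(k, lam):
    # Pr[Card(X ∩ Y-bar) <= k] for a Poisson law of parameter lam
    return sum(math.exp(-lam) * lam**s / math.factorial(s) for s in range(k + 1))

# Illustrative counts:
n, n_a, n_not_b = 200, 100, 54
lam = n_a * n_not_b / n              # lambda = n_a * n_b-bar / n = 27.0
observed = 15                        # observed number of counter-examples
p = poisson_cdf(observed, lam)
print(lam, round(p, 4))              # a small p makes a => b acceptable
```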
Other legitimate drawing processes lead to a binomial law, or even to
a hypergeometric law (the latter not semantically adapted to the
situation because of its symmetry). Under suitable convergence
conditions, these two laws reduce in the end to the Poisson law
above (see the Annex to this chapter).
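Under such a binomial drawing process, each of the $n$ transactions is a counter-example with probability $p = \frac{n_a}{n}\cdot\frac{n_{\overline{b}}}{n}$, and Binomial$(n,p)$ indeed approaches the Poisson law of parameter $np = \frac{n_a n_{\overline{b}}}{n}$ as $n$ grows; a quick numerical check with invented counts:

```python
import math

def binom_pmf(s, n, p):
    return math.comb(n, s) * p**s * (1 - p)**(n - s)

def poisson_pmf(s, lam):
    return math.exp(-lam) * lam**s / math.factorial(s)

n, n_a, n_not_b = 1000, 80, 50
p = (n_a / n) * (n_not_b / n)      # per-transaction probability of [a and not b]
lam = n * p                        # = n_a * n_not_b / n = 4.0
# The two probability mass functions stay close for all small s:
gap = max(abs(binom_pmf(s, n, p) - poisson_pmf(s, lam)) for s in range(30))
print(lam, round(gap, 5))
```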
If $n_{\overline{b}}\neq 0$, we center and reduce this Poisson
variable into:

$$Q(a,\overline{b})= \frac{Card(X \cap \overline{Y}) - \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}} $$

In the experimental realization, the observed value of
$Q(a,\overline{b})$ is $q(a,\overline{b})$.
It estimates the gap between the contingency $card(A\cap
\overline{B})$ and the value it would have taken had there been
independence between $a$ and $b$.
\begin{equation} q(a,\overline{b}) = \frac{n_{a \wedge \overline{b}}-
\frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}}
\end{equation}
is called the implication index, the number used as an indicator of
the non-implication of $a$ to $b$.
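A direct transcription of this index (the counts below are illustrative):

```python
import math

def implication_index(n, n_a, n_b, n_a_not_b):
    """q(a, b-bar): standardized deviation of the observed number of
    counter-examples from its expectation n_a * n_b-bar / n under independence."""
    n_not_b = n - n_b
    expected = n_a * n_not_b / n
    return (n_a_not_b - expected) / math.sqrt(expected)

# Illustrative counts: 100 subjects, n_a = 40, n_b = 60, 6 counter-examples.
q = implication_index(100, 40, 60, 6)
print(q)   # → -2.5 (negative: fewer counter-examples than expected)
```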
In cases where the approximation is properly legitimized (for example
when $\frac{n_a.n_{\overline{b}}}{n}\geq 4$), the variable
$Q(a,\overline{b})$ approximately follows the standard normal
distribution. The intensity of implication, measuring the quality of
$a\Rightarrow b$, for $n_a\leq n_b$ and $n_b \neq n$, is then defined
from the index $q(a,\overline{b})$ as follows.

The implication intensity that measures the inductive quality of $a$
over $b$ is:
$$\varphi(a,b)=1-Pr[Q(a,\overline{b})\leq q(a,\overline{b})] =
\frac{1}{\sqrt{2 \pi}} \int^{\infty}_{ q(a,\overline{b})}
e^{-\frac{t^2}{2}} dt,~ if~ n_b \neq n$$
$$\varphi(a,b)=0,~ otherwise$$
As a result, the definition of statistical implication becomes:

\definition Implication $a\Rightarrow b$ is admissible at confidence
level $1-\alpha$ if and only if:
$$\varphi(a,b)\geq 1-\alpha$$
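Using the normal approximation, the intensity and the admissibility test can be sketched as follows ($q=-2.5$ is an illustrative value of the implication index):

```python
import math

def intensity(q):
    # phi(a, b) = 1 - Pr[Q <= q] for a standard normal Q (case n_b != n)
    return 1 - 0.5 * (1 + math.erf(q / math.sqrt(2)))

q = -2.5                                  # illustrative implication index
phi = intensity(q)
alpha = 0.05
admissible = phi >= 1 - alpha             # admissible at confidence 1 - alpha?
print(round(phi, 4), admissible)          # → 0.9938 True
```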
It should be recalled that this modeling of quasi-implication measures
the astonishment of noting the smallness of the number of
counter-examples compared to the number of instances of the
implication.
It is a measure of the inductive and informative quality of the
implication. Therefore, if the rule is trivial, as in the case where
$B$ is very large or coincides with $E$, this astonishment becomes
small.
We also demonstrate~\cite{Grasf} that this triviality results in a
very low or even zero intensity of implication: if, $n_a$ being fixed
and $A$ being included in $B$, $n_b$ tends towards $n$ ($B$ "grows"
towards $E$), then $\varphi(a,b)$ tends towards $0$. We therefore
define, by "continuity": $\varphi(a,b) = 0$ if $n_b = n$. Similarly,
if $A\subset B$, $\varphi(a,b)$ may be less than $1$ in the case
where the inductive confidence, measured by statistical surprise, is
insufficient.
{\bf \remark Total correlation, partial correlation}
We take here the notion of correlation in a more general sense than
that used in the domain that develops the linear correlation
coefficient (linear link measure) or the correlation ratio (functional
link measure).
In our perspective, there is a total (or partial) correlation between
two variables $a$ and $b$ when the respective events they determine
occur (or almost occur) at the same time, as do their opposites.
However, we know from numerical counter-examples that correlation and
implication do not reduce to each other, that there can be
correlation without implication and vice versa~\cite{Grasf} (see
below).
If we compare the implication coefficient and the linear correlation
coefficient algebraically, it is clear that the two concepts do not
coincide and therefore do not provide the same
information\footnote{"More serious is the logical error that infers,
  from an observed correlation, the existence of a causality", writes
  Albert Jacquard in~\cite{Jacquard}, p.~159.}.
The non-symmetric quasi-implication index $q(a,\overline{b})$ does
not coincide with the correlation coefficient $\rho(a, b)$, which is
symmetric and which reflects the relationship between the variables
$a$ and $b$. Indeed, we show~\cite{Grasf} that if
$q(a,\overline{b}) \neq 0$ then:
$$\frac{\rho(a,b)}{q(a,\overline{b})} = -\sqrt{\frac{n}{n_b
n_{\overline{a}}}}$$
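For binary variables, the link $\rho(a,b) = -\sqrt{n/(n_b\, n_{\overline{a}})}\; q(a,\overline{b})$ can be checked numerically on an arbitrary $2\times 2$ table (the counts below are illustrative, not the book's example):

```python
import math

# Arbitrary 2x2 contingency table:
n11, n10 = 96, 4     # a=1 row: b=1, b=0
n01, n00 = 50, 50    # a=0 row: b=1, b=0
n = n11 + n10 + n01 + n00
n_a, n_b = n11 + n10, n11 + n01
n_not_a, n_not_b = n - n_a, n - n_b

# Linear (phi) correlation coefficient of the two binary variables:
rho = (n * n11 - n_a * n_b) / math.sqrt(n_a * n_b * n_not_a * n_not_b)
# Implication index:
q = (n10 - n_a * n_not_b / n) / math.sqrt(n_a * n_not_b / n)

# Check rho = -sqrt(n / (n_b * n_not_a)) * q:
print(round(rho, 4), round(-math.sqrt(n / (n_b * n_not_a)) * q, 4))
```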
Even if correlation and implication generally point in the same
direction, linear correlation does not make the orientation of the
relationship between two variables transparent, because it is
symmetric, which is not the choice made in the SIA.
From a statistical relationship given by the correlation, two opposing
empirical propositions can be deduced.

The following dual numerical situation clearly illustrates this:
\begin{table}[htbp]
\centering
\begin{tabular}{|l|c|c|c|}\hline
\diagbox[width=4em]{$a_1$}{$b_1$}&
1 & 0 & margin\\ \hline
1 & 96 & 4& 100 \\ \hline
0 & 50 & 50& 100 \\ \hline
margin & 146 & 54& 200 \\ \hline
\end{tabular} ~ ~ ~ ~ ~ ~ ~ \begin{tabular}{|l|c|c|c|}\hline
\diagbox[width=4em]{$a_2$}{$b_2$}&
1 & 0 & margin\\ \hline
1 & 94 & 6& 100 \\ \hline
0 & 52 & 48& 100 \\ \hline
margin & 146 & 54& 200 \\ \hline
\end{tabular}
\caption{Numeric example of difference between implication and
  correlation}
\label{chap2tab1}
\end{table}
In Table~\ref{chap2tab1}, the following correlation and implication
values are obtained:\\
Correlation $\rho(a_1,b_1)=0.468$, Implication
$q(a_1,\overline{b_1})=-4.082$\\
Correlation $\rho(a_2,b_2)=0.473$, Implication $q(a_2,\overline{b_2})=-4.041$

Thus, we observe that, on the one hand, $a_1$ and $b_1$ are less
correlated than $a_2$ and $b_2$ while, on the other hand, the
implication intensity of $a_1$ over $b_1$ is higher than that of $a_2$
over $b_2$, since $q(a_1,\overline{b_1}) < q(a_2,\overline{b_2})$.
On this subject, Alain Ehrenberg writes in~\cite{Ehrenberg}: "The
finding of a correlation does not remove the ambiguity between 'when
I do $X$, my brain is in state $Y$' and 'if I do $X$, it is because
my brain is in state $Y$', that is, between something that happens in
my brain when I do an action."
\remark Remember that we consider not only conjunctions of variables
of the type "$a$ and $b$" but also disjunctions such as "($a$ and $b$)
or $c$ ...", in order to model phenomena, such as concepts, as is done
in learning or in artificial intelligence.
The associated calculations remain compatible with the logic of
propositions linked by connectors.
\remark Unlike the Loevinger index~\cite{Loevinger}, the conditional
probability $Pr[B/A]$ and all its derivatives, the implication
intensity varies non-linearly with the expansion of the sets $E$, $A$
and $B$, and weakens with triviality (see Definition 2.3).
It is resistant to noise, especially around $q=0$, which can only make
the relationship we want to model and establish statistically more
credible.
Finally, as we have seen, the inclusion of $A$ in $B$ does not ensure
maximum intensity: the inductive quality may not be strong, whereas
$Pr[B/A]$ is equal to $1$~\cite{Grasm,Guillet}.
In paragraph 5, we study more closely the problem of the sensitivity
and stability of the implication index as a function of small
variations in the parameters involved in its calculation.
\section{Case of modal and frequency variables}

\subsection{Founding situation}
Marc Bailleul's research (1991-1994) focuses in particular on the
representation that mathematics teachers have of their own teaching.
In order to highlight it, meaningful words are proposed to them, which
they must rank.
Their choices are no longer binary: the words chosen by a given
teacher are ordered from the least to the most representative.
Bailleul's question then focuses on statements of the type: "if I
choose this word with this importance, then I choose this other word
with at least equal importance".
It was therefore necessary to extend the notion of statistical
implication to variables other than binary ones.
This is the case for modal variables, associated with phenomena where
the values $a(x)$ are numbers in the interval $[0, 1]$ that describe
degrees of belonging or satisfaction, as do, in fuzzy logic,
linguistic modifiers such as "maybe", "a little", "sometimes", etc.
This problem is also found in situations where the frequency of a
variable reflects a preorder on the values assigned by the subjects to
the variables presented to them.
These are frequency variables, associated with phenomena where the
values of $a(x)$ are arbitrary positive real values.
This is the case when one considers a student's percentage of success
in a battery of tests in different areas.
\subsection{Formalization}

J.B. Lagrange~\cite{Lagrange} has demonstrated that, in the modal
case,
\begin{itemize}
\item if $a(x)$ and $\overline{b}(x)$ are the values taken at $x$ by
  the modal variables $a$ and $\overline{b}$, with
  $\overline{b}(x)=1-b(x)$,
\item if $s^2_a$ and $s_{\overline{b}}^2$ are the empirical variances
  of the variables $a$ and $\overline{b}$,
\end{itemize}
then the implication index, which he calls the propensity index,
becomes:
$$q(a,\overline{b}) = \frac{\sum_{x\in E} a(x)\overline{b}(x) -
\frac{n_a n_{\overline{b}}}{n}}
{\sqrt{\frac{(n^2s_a^2+n_a^2)(n^2s_{\overline{b}}^2 + n_{\overline{b}}^2)}{n^3}}},$$
the index of propensity of modal variables.
J.B. Lagrange also proves that this index coincides with the index
defined previously in the binary case if the number of modalities of
$a$ and $b$ is exactly 2, because in this case:\\
$n^2s_a^2+n_a^2=n\, n_a$,~ ~ $n^2s_{\overline{b}}^2 + n_{\overline{b}}^2=n\,
n_{\overline{b}}$~ ~ and ~ ~ $\sum_{x\in E} a(x)\overline{b}(x)=n_{a \wedge \overline{b}}$.
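This reduction can be verified numerically. Below is a sketch of the propensity index (using the biased empirical variance, as the formula requires) checked against the binary index on 0/1 data; the helper names are ours, not Lagrange's:

```python
import math
from statistics import pvariance   # biased (population) variance

def propensity_index(a_vals, b_vals):
    """Propensity index q(a, b-bar) for modal variables with values in [0, 1]."""
    n = len(a_vals)
    nb_vals = [1 - b for b in b_vals]            # b-bar(x) = 1 - b(x)
    n_a, n_nb = sum(a_vals), sum(nb_vals)
    s2_a, s2_nb = pvariance(a_vals), pvariance(nb_vals)
    num = sum(a * nb for a, nb in zip(a_vals, nb_vals)) - n_a * n_nb / n
    den = math.sqrt((n**2 * s2_a + n_a**2) * (n**2 * s2_nb + n_nb**2) / n**3)
    return num / den

def binary_index(n, n_a, n_b, n_a_not_b):
    expected = n_a * (n - n_b) / n
    return (n_a_not_b - expected) / math.sqrt(expected)

# Binary data seen as a special case of modal data:
a = [1, 1, 1, 0, 0, 1, 0, 1]
b = [1, 1, 0, 0, 1, 1, 1, 1]
q_modal = propensity_index(a, b)
q_bin = binary_index(len(a), sum(a), sum(b),
                     sum(1 for x, y in zip(a, b) if x and not y))
print(round(q_modal, 6), round(q_bin, 6))   # the two indices coincide
```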
The solution provided in the modal case is also applicable to the
case of frequency variables, or even positive numerical variables,
provided that the values observed on the variables, such as $a$ and
$b$, have been normalized, the normalization onto $[0, 1]$ being made
from the maximum of the values taken respectively by $a$ and $b$ on
the set $E$.
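For instance, a positive frequency variable can be brought back to $[0,1]$ by dividing by its maximum over $E$ (the scores below are invented for the example):

```python
# Success scores of five subjects on a test battery (illustrative values):
a_raw = [12.0, 15.5, 9.0, 18.0, 6.5]
a_norm = [v / max(a_raw) for v in a_raw]   # now in [0, 1], maximum mapped to 1
print(a_norm)
```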
In~\cite{Regniera}, we consider rank variables that reflect a
total order between choices presented to a population of judges.
Each judge must order his or her preferential choice among a set of
objects or proposals.
An index measures the quality of statements of the type: "if object
$a$ is ranked by the judges, then, in general, object $b$ is ranked
higher".
The proximity to the previous problem leads to an index that is
relatively close to Lagrange's index, but better adapted to the rank
variable situation.
\section{Cases of variables-on-intervals and interval-variables}

\subsection{Variables-on-intervals}

\subsubsection{Founding situation}
For example, the following rule is sought to be extracted from a
biometric data set, estimating its quality: "if an individual weighs
between $65$ and $70~kg$, then in general his height is between
$1.70~m$ and ...".
A similar situation arises in the search for relationships between
intervals of student performance in two different subjects.
The more general situation is then expressed as follows: two real
variables $a$ and $b$ take a certain number of values over two finite
intervals $[a_1,~ a_2]$ and $[b_1,~ b_2]$. Let $A$ (resp. $B$) be the
set of values of $a$ (resp. $b$) observed over $[a_1,~ a_2]$
(resp. $[b_1,~ b_2]$).
For example, here, $a$ represents the weights of a set of $n$ subjects
and $b$ the heights of these same subjects.
Two questions then arise:
\begin{itemize}
\item Can adjacent sub-intervals of $[a_1,~ a_2]$ (resp. $[b_1,~ b_2]$)
  be defined so that the finest partition obtained best respects the
  distribution of the values observed in $[a_1,~ a_2]$ (resp. $[b_1,~
  b_2]$)?
\item Can we find the respective partitions of $[a_1,~ a_2]$ and
  $[b_1,~ b_2]$ made up of unions of the previous adjacent
  sub-intervals, partitions that maximize the average intensity of
  implication of the sub-intervals of one on the sub-intervals of the
  other?
\end{itemize}
We answer these two questions within our framework by choosing
criteria to optimize in order to reach the optimality expected in
each case.
For the first question, many solutions have been provided in other
settings (for example, by~\cite{Lahaniera}).
\subsubsection{First problem}

We consider the interval $[a_1,~ a_2]$, assuming it has a trivial
initial partition into sub-intervals of the same length, but not
necessarily with the same frequency distribution observed on these
sub-intervals.
Denote by $P_0 = \{A_{01},~ A_{02},~ ...,~ A_{0p}\}$ this partition
into $p$ sub-intervals.
We try to obtain a partition of $[a_1,~ a_2]$ into $p$ sub-intervals
$\{A_{q1},~ A_{q2},~ ...,~ A_{qp}\}$ in such a way that within each
sub-interval there is good statistical homogeneity (low intra-class
inertia) and that these sub-intervals have good mutual heterogeneity
(high inter-class inertia).
We know that if one of the criteria is satisfied, the other is
necessarily satisfied as well (Koenig-Huyghens theorem).
This is done by adopting a method directly inspired by the
dynamic cloud method developed by Edwin Diday~\cite{Diday} (see also
\cite{Lebart}) and adapted to the current situation. This results in
the targeted optimal partition.
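A minimal one-dimensional sketch of this alternating scheme (an illustrative reimplementation in the spirit of the dynamic cloud method, not Diday's original algorithm): centres and classes are updated in turn until the classes, which are adjacent groups of sorted values, stabilize.

```python
def dynamic_clouds_1d(values, p, iters=100):
    """Partition sorted 1-D values into p adjacent classes with low intra-class inertia."""
    values = sorted(values)
    # Initial centres: values spread evenly over the observed range (p >= 2).
    centres = [values[i * (len(values) - 1) // (p - 1)] for i in range(p)]
    for _ in range(iters):
        classes = [[] for _ in range(p)]
        for v in values:                      # assign each value to its nearest centre
            k = min(range(p), key=lambda i: abs(v - centres[i]))
            classes[k].append(v)
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(classes)]
        if new_centres == centres:            # converged: inertia no longer decreases
            break
        centres = new_centres
    return classes

classes = dynamic_clouds_1d([1, 2, 3, 10, 11, 12, 20, 21, 22], 3)
print(classes)   # → [[1, 2, 3], [10, 11, 12], [20, 21, 22]]
```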
\subsubsection{Second problem}

It is now assumed that the intervals $[a_1,~ a_2]$ and $[b_1,~ b_2]$
are provided with optimal partitions $P$ and $Q$, respectively, in the
sense of the dynamic clouds.
Let $p$ and $q$ be the respective numbers of sub-intervals composing
$P$ and $Q$.
From these two partitions, it is possible to generate $2^{p-1}$ and
$2^{q-1}$ partitions obtained by iterated unions of adjacent
sub-intervals of $P$ and $Q$\footnote{It is enough to consider the
  tree structure of which $A_1$ is the root, then to join it or not to
  $A_2$, which itself will or will not be joined to $A_3$, etc. There
  are therefore $2^{p-1}$ branches in this tree structure.}
respectively.
We calculate the respective intensities of implication of each
sub-interval, whether or not combined with another, of the first
partition on each sub-interval, whether or not combined with another,
of the second, and then the values of the intensities of the
reciprocal implications.
There are therefore in total $2.2^{p-1}.2^{q-1}$ families of
implication intensities, each of which requires the calculation of the
implications of all the elements of a partition of $[a_1,~ a_2]$ on
all the elements of one of the partitions of $[b_1,~ b_2]$, and vice
versa.
The optimality criterion is chosen as the geometric mean of the
intensities of implication, the mean associated with each pair of
partitions of elements, combined or not, defined inductively.
We note the two maxima obtained (direct implication and its
reciprocal) and we retain the two associated partitions, declaring
that the implication of the variable-on-interval $a$ on the
variable-on-interval $b$ is optimal when the interval $[a_1,~ a_2]$
admits the partition corresponding to the first maximum, and that the
optimal reciprocal implication is satisfied for the partition of
$[b_1,~ b_2]$ corresponding to the second maximum.
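The $2^{p-1}$ groupings described in the footnote correspond to the choices of which of the $p-1$ boundaries between adjacent sub-intervals to keep; a sketch of their enumeration (the function name is ours):

```python
from itertools import product

def adjacent_unions(cells):
    """All coarsenings of a list of adjacent cells obtained by merging neighbours."""
    p = len(cells)
    out = []
    for keep in product([False, True], repeat=p - 1):  # keep[i]: boundary after cell i
        groups, current = [], [cells[0]]
        for i in range(1, p):
            if keep[i - 1]:
                groups.append(current)
                current = [cells[i]]
            else:
                current.append(cells[i])
        groups.append(current)
        out.append(groups)
    return out

parts = adjacent_unions(["A1", "A2", "A3"])
print(len(parts))        # → 4 coarsenings for p = 3
```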
\subsection{Interval-variables}

\subsubsection{Founding situation}
Data are available on a population of $n$ individuals (each of which
may itself be a set of individuals, e.g. a class of students)
according to variables (e.g. grades over a year in French, math,
physics, ..., but also: weight, height, chest size, ...).
The values taken by these variables for each individual are intervals
of positive real numbers.
For example, individual $x$ gives the value $[12,~ 15.50]$ to the math
variable.
E. Diday would speak on this subject of $p$ symbolic interval
variables defined on the population.

We try to define an implication from intervals, relative to a variable
$a$, which are themselves observed intervals, towards other similarly
defined intervals relative to another variable $b$.
This will make it possible to measure the implicative, and therefore
non-symmetric, association of certain interval(s) of the variable $a$
with certain interval(s) of the variable $b$, as well as the
reciprocal association, from which the best one will be chosen for
each pair of sub-intervals involved, as just described in §4.1.

For example, it will be said that the sub-interval $[2, 5.5]$ of
mathematics scores generally implies the sub-interval $[4.25, 7.5]$
of physics scores, both of which belong to an optimal partition, in
terms of explained variance, of the respective value ranges $[1,
18]$ and $[3, 20]$ observed in the population.
Similarly, we will say that $[14.25, 17.80]$ in physics most often
implies $[16.40, 18]$ in mathematics.
628 \subsubsection{Algorithm}
630 By following the problem of E. Diday and his collaborators, if the
631 values taken according to the subjects by the variables $a$ and $b$
632 are of a symbolic nature, in this case intervals of $\mathbb{R}^+$, it
633 is possible to extend the above algorithms\cite{Grasi}.
634 For example, variable $a$ has weight intervals associated with it and
635 variable $b$ has size intervals associated with variable $b$, due to
636 inaccurate measurements.
By taking the union of the intervals $I_x$ and $J_x$ described by the
subjects $x$ of $E$ according to each of the variables $a$ and $b$
respectively, we obtain two intervals $I$ and $J$ covering all
possible values of $a$ and $b$.
On each of them, a partition can be defined into a certain number of
intervals respecting, as above, a certain optimality criterion.
For this purpose, the intersections of intervals such as $I_x$ and
$J_x$ with these partitions will be provided with a distribution
taking into account the areas of the common parts.
This distribution may be uniform or follow another discrete or
continuous law.
We are thus brought back to searching for rules between two sets of
interval variables that take, as previously in §4.1, their values
in $[0,~1]$, from which we can search for optimal implications.
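The distribution step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `overlap_weights` and the uniform-distribution hypothesis are ours, and the partition is assumed to be a list of contiguous cells.

```python
def overlap_weights(interval, partition):
    """Distribute the mass of an observed interval over the cells of a
    fixed partition, proportionally to the length of each common part
    (uniform-distribution hypothesis)."""
    lo, hi = interval
    total = hi - lo
    weights = []
    for a, b in partition:
        common = max(0.0, min(hi, b) - max(lo, a))  # length of the overlap
        weights.append(common / total if total > 0 else 0.0)
    return weights

# A subject's interval of grades distributed over a two-cell partition:
# one third of (2, 5) falls in (0, 3), two thirds in (3, 6).
w = overlap_weights((2.0, 5.0), [(0.0, 3.0), (3.0, 6.0)])
```

Each observed interval thus yields a discrete distribution over the cells of the optimal partition, to which the rule-search of §4.1 can be applied.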
\remark Whatever the type of variable considered, there is often a
problem of overabundance of variables and therefore a difficulty of
interpretation.
For this reason, we have defined an equivalence relation on the set of
variables that allows us to substitute a so-called leader variable for
an equivalence class~\cite{Grask}.
\section{Variations in the implication index $q$ according to the 4 occurrences}
In this paragraph, we examine the sensitivity of the implication index
to disturbances in its parameters.
\subsection{Stability of the implication index}
To study the stability of the implication index $q$ is to examine its
small variations in the vicinity of the $4$ observed integer values
($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$).
To do this, it is possible to perform different simulations by
crossing these 4 integer variables on which $q$ depends~\cite{Grasx}.
But let us instead consider these variables as real-valued and $q$ as
a continuously differentiable function of these variables, which are
themselves constrained to respect the inequalities: $0\leq
n_a \leq n_b$, $n_{a \wedge \overline{b}} \leq \inf\{n_a,~ n_b\}$ and
$\sup\{n_a,~ n_b\} \leq n$.
The function $q$ then defines a scalar field on
$\mathbb{R}^4$, considered as an affine and vector space on itself.
Under the likely hypothesis of a non-chaotic process of
data collection, it is then sufficient to examine the differential of
$q$ with respect to these variables and to keep its restriction to the
integer values of the parameters of the relationship $a \Rightarrow b$.
The differential of $q$, in the sense of Fréchet's
topology\footnote{Fréchet's topology allows sections of $\mathbb{N}$,
i.e. subsets of natural numbers of the form $\{n,~ n+1,~ n+2,~ ...\}$, to be
used as a filter base, while the usual topology on $\mathbb{R}$
uses real intervals as filters.
Thus continuity and derivability are perfectly well-defined and
operational concepts in Fréchet's topology, in the same way
as they are with the usual topology.}, is then expressed as follows:
$$dq = \frac{\partial q}{\partial n}dn + \frac{\partial q}{\partial
n_a}dn_a + \frac{\partial q}{\partial n_b}dn_b + \frac{\partial
q}{\partial n_{a \wedge \overline{b}}}dn_{a \wedge \overline{b}} =
grad~q \cdot dM$$
where $M$ is the point of coordinates $(n,~ n_a,~ n_b,~ n_{a \wedge
\overline{b}})$ of the scalar field $C$, $dM$ is the vector whose
components are the differential increases of these occurrence
variables, and $grad~ q$ is the vector whose components are the partial
derivatives of $q$ with respect to these occurrence
variables\footnote{By a mechanistic metaphor, we will say that $dq$ is
the elementary work of $q$ for a displacement $dM$ (see Chapter 14 of
this book).}.
The differential of the function $q$ therefore appears as the scalar product of its gradient with the elementary displacement $dM$ on the surface representing the variations of the function $q(n,~ n_a,~ n_b,~ n_{a \wedge
\overline{b}})$. Thus, the gradient of $q$ represents its own
variations according to those of its components, the 4 cardinals of
the sets $E$, $A$, $B$ and $A\cap \overline{B}$. It
indicates the direction and sense of growth or decrease of $q$ in
the space of dimension 4. Remember that it is carried by the normal to
the level surface $q~ =~ cte$.
If we want to study how $q$ varies according to $n_{\overline{b}}$,
we just have to replace $n_b$ by $n-n_b$ and therefore change the sign
of the corresponding partial derivative. In fact, the
interest of this differential lies in estimating the increase
(positive or negative) of $q$, which we denote $\Delta q$, in relation to
the respective variations $\Delta n$, $\Delta n_a$, $\Delta n_b$ and
$\Delta n_{a \wedge \overline{b}}$. So we have:
\begin{equation}
\Delta q= \frac{\partial q}{\partial n} \Delta n + \frac{\partial
q}{\partial n_a} \Delta n_a + \frac{\partial
q}{\partial n_b} \Delta n_b + \frac{\partial
q}{\partial n_{a \wedge \overline{b}}} \Delta n_{a \wedge \overline{b}}
+ o(\Delta q)
\label{eq2.2}
\end{equation}
where $o(\Delta q)$ is a first-order infinitesimal remainder.
Let us examine the partial derivatives with respect to $n_b$ and to
$n_{a \wedge \overline{b}}$, the number of counter-examples. We get:
\begin{equation}
\frac{\partial q}{\partial n_b} = \frac{1}{2} n_{a \wedge
\overline{b}} \left(\frac{n_a}{n}\right)^{-\frac{1}{2}} (n-n_b)^{-\frac{3}{2}}
+ \frac{1}{2} \left(\frac{n_a}{n}\right)^{\frac{1}{2}} (n-n_b)^{-\frac{1}{2}} > 0
\label{eq2.3}
\end{equation}
\begin{equation}
\frac{\partial q}{\partial n_{a \wedge
\overline{b}}} = \frac{1}{\sqrt{\frac{n_a n_{\overline{b}}}{n}}}
= \frac{1}{\sqrt{\frac{n_a (n-n_b)}{n}}} > 0
\label{eq2.4}
\end{equation}
Thus, if the increases $\Delta n_b$ and $\Delta n_{a \wedge
\overline{b}}$ are positive, the increase of $q(a,\overline{b})$ is
also positive. This is interpreted as follows: if the number of
instances of $b$ and the number of counter-examples of the implication
increase, then the intensity of implication decreases, for $n$ and $n_a$
constant. In other words, this intensity of implication is maximum at
the observed values $n_b$ and $n_{a \wedge \overline{b}}$ and minimum
at the values $n_b+\Delta n_b$ and $n_{a \wedge \overline{b}}+ \Delta n_{a \wedge \overline{b}}$.
If we examine the case where $n_a$ varies, we obtain the partial
derivative of $q$ with respect to $n_a$, which is:
$$\frac{\partial q}{\partial n_a} = -\frac{ n_{a \wedge \overline{b}}}{2
\sqrt{\frac{n_{\overline{b}}}{n}}}
\left(\frac{1}{n_a}\right)^{\frac{3}{2}}
-\frac{1}{2}\sqrt{\frac{n_{\overline{b}}}{n\, n_a}}<0$$
Thus, for variations of $n_a$ on $[0,~ n_b]$, the implication index function is always decreasing (and convex) with respect to $n_a$ and is therefore minimum for $n_a= n_b$. As a result, the intensity of implication is increasing and maximum for $n_a= n_b$.
Note the partial derivative of $q$ with respect to $n$, at $n_a$,
$n_{\overline{b}}$ and $n_{a \wedge \overline{b}}$ constant:
\begin{equation}
\frac{\partial q}{\partial n} = \frac{1}{2\sqrt{n}} \left(
\frac{n_{a \wedge \overline{b}}}{\sqrt{n_a n_{\overline{b}}}}
+\frac{\sqrt{n_a n_{\overline{b}}}}{n} \right)
\label{eq2.5}
\end{equation}
Consequently, if the other 3 parameters grow proportionally with $n$
(constant frequencies), the implication index decreases as $-\sqrt{n}$.
The quality of implication is therefore all the better as the sample
size increases, a specific property of the SIA compared to other
indicators used in the literature~\cite{Grasab}.
This property is in accordance with statistical and semantic
expectations regarding the credit given to the frequency of
observations.
Since the partial derivatives of $q$ (at least one of them) are
non-linear in the parameters involved, we are
dealing with a non-linear dynamic system\footnote{``Non-linear systems
are systems that are known to be deterministic but for which, in
general, nothing can be predicted because calculations cannot be
made''~\cite{Ekeland} p. 265.} with all the epistemological
consequences that we will consider elsewhere.
\subsection{Numerical example}
In a first experiment, we observe the occurrences: $n = 100$, $n_a =
20$, $n_b = 40$ (hence $n_{\overline{b}}=60$), $n_{a \wedge \overline{b}} = 4$.
The application of formula (\ref{eq2.1}) gives $q = -2.309$.
In a second experiment, $n$ and $n_a$ are unchanged, but the occurrences
of $b$ and of the counter-examples $n_{a \wedge \overline{b}}$ each increase by one unit.
At the initial point of the space of the 4 variables, the two partial
derivatives of interest (with respect to $n_b$ and $n_{a
\wedge \overline{b}}$) have respectively the following values when
applying formulas (\ref{eq2.3}) and (\ref{eq2.4}): $\frac{\partial
q}{\partial n_b} = 0.0385$ and $\frac{\partial q}{\partial n_{a
\wedge \overline{b}}} = 0.2887$.
As $\Delta n_b$, $\Delta n_{\overline{b}}$ and $\Delta n_{a
\wedge \overline{b}}$ are equal to 1, -1 and 1 respectively, $\Delta q$ is
equal to $0.0385 + 0.2887 + o(\Delta q) = 0.3272 + o(\Delta q)$, and
the approximate value of $q$ in the second experiment is $-2.309 +
0.3272 + o(\Delta q)= -1.982 +o(\Delta q)$ using the first-order
development of $q$ (formula (\ref{eq2.2})).
However, the direct calculation of the new implication index $q$ at the
point of the second experiment gives, by the use of (\ref{eq2.1}),
$-1.9795$: a value well approximated by the development of $q$.
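The two experiments above can be replayed numerically. The following sketch (function names are ours) computes $q$ from formula (\ref{eq2.1}) and the two partial derivatives from formulas (\ref{eq2.3}) and (\ref{eq2.4}), then compares the first-order estimate with the exact value:

```python
import math

def q(n, na, nb, k):
    """Implication index q(a, not-b); k is the number of counter-examples."""
    expected = na * (n - nb) / n            # expected counter-examples
    return (k - expected) / math.sqrt(expected)

def dq_dnb(n, na, nb, k):
    """Partial derivative of q with respect to n_b."""
    return (0.5 * k * (na / n) ** -0.5 * (n - nb) ** -1.5
            + 0.5 * (na / n) ** 0.5 * (n - nb) ** -0.5)

def dq_dk(n, na, nb):
    """Partial derivative of q with respect to the counter-examples."""
    return 1.0 / math.sqrt(na * (n - nb) / n)

q1 = q(100, 20, 40, 4)                      # first experiment: about -2.309
dq = dq_dnb(100, 20, 40, 4) * 1 + dq_dk(100, 20, 40) * 1
q2_approx = q1 + dq                         # first-order estimate: about -1.982
q2_exact = q(100, 20, 41, 5)                # exact value: about -1.9795
```

The gap between `q2_approx` and `q2_exact` is of the order of $10^{-3}$, which is the quality of the first-order development claimed in the text.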
\subsection{A first differential relationship of the intensity $\varphi$ as a function of $q$}
Let us consider the intensity of implication $\varphi$ as a function
of $q(a,\overline{b})$:
$$\varphi(q)=\frac{1}{\sqrt{2\pi}}\int_q^{+\infty}e^{-\frac{t^2}{2}}dt$$
We can then examine how $\varphi(q)$ varies when $q$ varies in the neighborhood of a value observed for a given pair $(a,b)$, knowing how $q$ itself varies according to the 4 parameters that determine it. By differentiating with respect to the lower bound of the integral, we obtain:
\begin{equation}
\frac{d\varphi}{dq}=-\frac{1}{\sqrt{2\pi}}e^{-\frac{q^2}{2}} < 0
\label{eq2.6}
\end{equation}
This confirms that the intensity increases when $q$ decreases, but the rate of growth is now specified by the formula, which allows us to study the variations of $\varphi$ more precisely. Since the derivative of $\varphi$ with respect to $q$ is always negative, the function $\varphi$ is decreasing.
{\bf Numerical example}\\
Taking the values of the occurrences observed in the 2 experiments
mentioned above, we find for $q = -2.309$ that the intensity
of implication $\varphi(q)$ is equal to 0.992. Applying formula
(\ref{eq2.6}), the derivative of $\varphi$ with respect to $q$ is
$-0.02775$, and the increase in intensity is then $-0.02775 \times
\Delta q$ with $\Delta q = 0.3272$, that is $-0.0091$. The approximate
first-order intensity is therefore $0.992-0.0091$, or about 0.983.
However, the actual calculation of this intensity, for $q= -1.9795$,
gives $\varphi(q) = 0.976$.
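The Gaussian tail and its derivative are easy to check with Python's standard library (`math.erfc`); the function names below are ours, and this is only a verification sketch of formula (\ref{eq2.6}):

```python
import math

def phi(q):
    """Intensity of implication: P(N(0,1) > q) = 0.5 * erfc(q / sqrt(2))."""
    return 0.5 * math.erfc(q / math.sqrt(2))

def dphi_dq(q):
    """Derivative of phi with respect to q: always negative."""
    return -math.exp(-q * q / 2) / math.sqrt(2 * math.pi)

assert phi(0.0) == 0.5          # a rule at independence has intensity 1/2
assert dphi_dq(-2.309) < 0      # phi is decreasing everywhere

# first-order move from q = -2.309 by Delta q = 0.3272:
approx = phi(-2.309) + dphi_dq(-2.309) * 0.3272
exact = phi(-2.309 + 0.3272)
```

The first-order estimate `approx` stays within about $5\cdot10^{-3}$ of `exact`, consistent with the example above.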
\subsection{Examination of other indices}
Unlike the core index $q$ and the intensity of implication, which
measure quality through a probability (see Definition 2.3), the other
most common indices are intended to be direct measures of quality.
We will examine their respective sensitivities to changes in the
parameters used to define them.
We keep the notations adopted in paragraph 2.2 and select indices that
are recalled in~\cite{Grasm},~\cite{Lencaa} and~\cite{Grast2}.
861 \subsubsection{The Loevinger Index}
863 It is an "ancestor" of the indices of
864 implication~\cite{Loevinger}. This index, rated $H(a,b)$, varies from
865 1 to $-\infty$. It is defined by: $H(a,b) =1-\frac{n n_{a \wedge
866 b}}{n_a n_b}$. Its partial derivative with respect to the variable number of counter-examples is therefore:
867 $$\frac{\partial H}{\partial n_{a \wedge \overline{b}}}=-\frac{n}{n_a n_b}$$
868 Thus the implication index is always decreasing with $n_{a \wedge
869 \overline{b}}$. If it is "close" to 1, implication is "almost"
870 satisfied. But this index has the disadvantage, not referring to a
871 probability scale, of not providing a probability threshold and being
872 invariant in any dilation of $E$, $A$, $B$ and $A \cap \overline{B}$.
\subsubsection{The Lift Index}
It is expressed by: $l =\frac{n n_{a \wedge b}}{n_a n_b}$.
This expression, linear with respect to the examples, can also be
written to highlight the number of counter-examples:
$$l =\frac{n (n_a - n_{a \wedge \overline{b}})}{n_a n_b}$$
To study the sensitivity of $l$ to parameter variations, we use:
$$\frac{\partial l}{\partial n_{a \wedge \overline{b}} } = -\frac{n}{n_a n_b}$$
Thus, the rate of variation of the Lift index is independent of the
number of counter-examples.
It is a constant that depends only on the occurrences of $a$ and $b$. Therefore, $l$ decreases when the number of counter-examples increases, which semantically is acceptable, but the rate of decrease does not depend on the rate of growth of $n_{a \wedge \overline{b}}$.
\subsubsection{Confidence}
This index is the best known and most widely used, thanks to the
resonance provided by an Anglo-Saxon publication~\cite{Agrawal}.
It is at the origin of several other commonly used indices, which are only variants satisfying this or that semantic requirement. Moreover, it is simple, and can be interpreted easily and immediately:
$$c=\frac{n_{a \wedge b}}{n_a} = 1-\frac{n_{a \wedge \overline{b}}}{n_a}$$
The first form, linear with respect to the examples and independent of
$n_b$, is interpreted as the conditional frequency of the examples of
$b$ when $a$ is known.
The sensitivity of this index to variations in the occurrence of
counter-examples is read through the partial derivative:
$$\frac{\partial c}{\partial n_{a \wedge \overline{b}} } = -\frac{1}{n_a}$$
Consequently, confidence increases when $n_{a \wedge \overline{b}}$
decreases, which is semantically acceptable, but the rate of variation
is constant, independent of the rate of decrease of this number and of
the variations of $n$ and $n_b$.
This property does not seem to satisfy intuition.
The gradient of $c$ is expressed only in terms of $n_{a \wedge
\overline{b}}$ and $n_a$: $\displaystyle \binom{ -\frac{1}{n_a}}{\frac{n_{a \wedge b}}{n_a^2}}$.
This may also appear to be a restriction on the role of the parameters in
expressing the sensitivity of the index.
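The contrast drawn in this subsection, namely that the slopes of the Loevinger, Lift and Confidence indices with respect to the counter-examples are constants, can be observed numerically. The counts below are hypothetical and the function names are ours:

```python
def loevinger(n, na, nb, k):
    # H(a,b) = 1 - n*k / (na * n_not_b), linear in the counter-examples k
    return 1 - n * k / (na * (n - nb))

def lift(n, na, nb, k):
    # l = n * (na - k) / (na * nb), also linear in k
    return n * (na - k) / (na * nb)

def confidence(na, k):
    # c = 1 - k / na, independent of n and nb
    return 1 - k / na

# For all three indices the slope with respect to k is a constant:
n, na, nb = 400, 60, 100
slope_conf = confidence(na, 11) - confidence(na, 10)            # -1/na
slope_lift = lift(n, na, nb, 11) - lift(n, na, nb, 10)          # -n/(na*nb)
slope_loev = loevinger(n, na, nb, 11) - loevinger(n, na, nb, 10)  # -n/(na*(n-nb))
```

None of these slopes involves $k$ itself, unlike $\partial q/\partial n_{a \wedge \overline{b}}$, whose value depends on all four occurrences.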
\section{Gradient field, implicative field}
We highlight here the existence of fields generated by the variables.

\subsection{Existence of a gradient field}
Like our Newtonian physical space, where a gravitational field emitted
by each material object acts, we can consider that it is the same
around each variable.
For example, the variable $a$ generates a scalar field whose value in
$b$ is maximal, equal to the intensity of implication or to the
implication index $q(a,\overline{b})$.
Its action spreads in $V$ according to differential laws, as J.M. Leblond
says in~\cite{Leblond} p.242.
Let us consider the space $E$ of dimension 4 where the coordinates of
the points $M$ are the parameters relative to the binary variables $a$
and $b$, i.e. ($n$, $n_a$, $n_b$, $n_{a\wedge \overline{b}}$). $q(a,\overline{b})$ is then the realization of a scalar field, as a map from $\mathbb{R}^4$ to $\mathbb{R}$ (after immersion of $\mathbb{N}^4$ in $\mathbb{R}^4$).
For the vector $grad~q$, whose components are the partial derivatives of $q$
with respect to the variables $n$, $n_a$, $n_b$, $n_{a\wedge
\overline{b}}$, to define a gradient field (a particular vector
field that we will also call the implicative field), it must respect the
Schwarz criterion of an exact total differential, i.e.:
$$\frac{\partial}{\partial n_{a\wedge \overline{b}}}\left(
\frac{\partial q}{\partial n_b} \right) =\frac{\partial}{\partial n_b}\left(
\frac{\partial q}{\partial n_{a\wedge \overline{b}}} \right) $$
and the same for the other variables taken in pairs. However, we have,
through the formulas (\ref{eq2.3}) and (\ref{eq2.4}):
$$ \frac{\partial}{\partial n_{a \wedge \overline{b}}} \left( \frac{\partial q}{\partial n_b} \right) = \frac{1}{2} \left( \frac{n_a}{n}\right)^{-\frac{1}{2}} (n-n_b)^{-\frac{3}{2}} = \frac{\partial}{\partial n_b}\left(
\frac{\partial q}{\partial n_{a\wedge \overline{b}}} \right)$$
Thus, to the scalar field $C$ defined at the points $(n,~ n_a,~ n_b,~ n_{a \wedge \overline{b}})$ of $E$, the nature of which we will specify, corresponds a gradient field $G$, which is said to be derived from the {\bf potential} $q$.
The gradient $grad~q$ is therefore the vector that represents the spatial variation of the field intensity.
It is directed from low field values to higher values. By following the gradient at each point, we follow the increase of the implication index in space and, in a way, the speed with which it changes as a result of the variation of one or more parameters.
For example, if we fix 3 of the parameters $n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$ given by the realization of the couple ($a$, $b$), the gradient is a vector whose direction indicates the growth or decrease of $q$, therefore the decrease or increase of $|q|$ and, as a consequence, of $\varphi$, under the variations of the 4th parameter.
We have indicated this above by interpreting formula (\ref{eq2.5}).
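The Schwarz criterion stated above can be verified numerically with a symmetric finite-difference estimate of the mixed partial derivative $\partial^2 q / \partial n_b\, \partial n_{a \wedge \overline{b}}$; this is only a check sketch with occurrence values taken from the numerical example of the previous section:

```python
import math

def q(n, na, nb, k):
    """Implication index; k is the number of counter-examples."""
    e = na * (n - nb) / n
    return (k - e) / math.sqrt(e)

def cross_derivative(n, na, nb, k, h=1e-3):
    """Symmetric finite-difference estimate of d2q / (dn_b dk)."""
    return (q(n, na, nb + h, k + h) - q(n, na, nb + h, k - h)
            - q(n, na, nb - h, k + h) + q(n, na, nb - h, k - h)) / (4 * h * h)

n, na, nb, k = 100.0, 20.0, 40.0, 4.0
numeric = cross_derivative(n, na, nb, k)
# closed form of the mixed partial derivative, equal in both orders:
closed_form = 0.5 * (na / n) ** -0.5 * (n - nb) ** -1.5
```

The agreement of `numeric` with `closed_form` (to about $10^{-6}$ here) is the condition for $grad~q$ to derive from the potential $q$.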
\subsection{Level or equipotential lines}
An equipotential (or level) line or surface of the field $C$ is a curve of $E$ along which, or a surface on which, a variable point $M$ maintains the same value of the potential $q$ (e.g. isothermal lines on the globe or level lines on an IGN map).
The equation of this surface\footnote{In differential geometry, it seems that this surface is a (quasi-)differentiable manifold with boundary, compact, homeomorphic to the closed product of the intervals of variation of the 4 parameters. Note that the point whose component $n_b$ is equal to $n$ (hence $n_{\overline{b}} = 0$) is a singular point (``catastrophic'' in René Thom's sense) of the surface, and $q$, the potential, is not differentiable at this point. Everywhere else, the surface is differentiable, the points are all regular. If time, for example, parametrizes the observations of the process of which ($n$, $n_a$, $n_b$, $n_{\overline{b}}$) is a realization, then at each instant corresponds a morphological fiber of the process, represented by such a surface in space-time.} is, of course:
$$ q(a,\overline{b}) - \frac{n_{a \wedge \overline{b}}-
\frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}} = 0$$
Therefore, on such a curve, the scalar product $grad~ q \cdot dM$ is zero.
This is interpreted as the orthogonality of the gradient with the tangent line or tangent hyperplane to the curve, i.e. with the equipotential line or surface.
In a kinematic interpretation of our problem, the velocity of $M$'s path on the equipotential surface is orthogonal to the gradient in $M$.
As an illustration, for a potential $F$ depending on only 2 variables, Figure~\ref{chap2fig2} shows the orthogonal direction of the gradient with respect to the different equipotential lines, along which the potential $F$ does not vary while passing from $F=7$ to $F=10$ from one line to the next.
\includegraphics[scale=1]{chap2fig2}
\caption{Illustration of potential of 2 variables}
\label{chap2fig2} % Give a unique label
It is possible, in the case of the potential $q$, to build equipotential surfaces as above (two-dimensional for ease of representation).
It is understandable that the more intense the field is, the tighter the surfaces are. For a given value of $q$, 3 variables are set, for example $n$, $n_a$, $n_b$, together with a value of $q$ compatible with the field constraints. Let: $n = 10^4$; $n_a = 1600 \leq n_b = 6400$ and $q = -2$, i.e. $|q| = 2$. We then find $n_{a \wedge \overline{b}}= 528$ using formula~(\ref{eq2.1}).
But the points ($10^4$, $1600$, $5100$, $728$) and ($100$, $25$, $64$, $3$) also belong to this surface and to the same equipotential curve.
The point ($10^4$, $1600$, $3600$, $928$) belongs to the equipotential curve $q=-3$. In fact, on each such surface, we obtain a kind of homeostasis of the intensity of implication.
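These equipotential memberships can be checked directly; in this sketch we assume the points are read as quadruples $(n,~n_a,~n_b,~n_{a \wedge \overline{b}})$, which is the coordinate convention used for $E$ in this section:

```python
import math

def q(n, na, nb, k):
    """Implication index at the point (n, n_a, n_b, n_{a and not-b})."""
    e = na * (n - nb) / n       # expected number of counter-examples
    return (k - e) / math.sqrt(e)

# two very different contingencies lying on the same surface q = -2 ...
q_point1 = q(10_000, 1600, 5100, 728)
q_point2 = q(100, 25, 64, 3)
# ... and a third point on the neighbouring surface q = -3
q_point3 = q(10_000, 1600, 3600, 928)
```

Points with cardinals two orders of magnitude apart share the same potential, which is the "homeostasis" of the intensity of implication mentioned above.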
The expression of the function $q$ shows that it is convex.
This property ensures that the segment of points $t M_1 + (1-t) M_2$, for $t \in [0,1]$, which connects two points $M_1$ and $M_2$ of the same equipotential line, is entirely contained in its convex region.
Figure~\ref{chap2fig3} shows two adjacent equipotential surfaces $\Sigma_1$ and $\Sigma_2$ in the implicative field, corresponding to two values $q_1$ and $q_2$ of the potential.
At point $M_1$ the scalar field therefore takes the value $q_1$. $M_2$ is the intersection of the normal from $M_1$ with $\Sigma_2$. Given the direction of the normal vector $\vec{n}$, the difference $\delta = q_2 - q_1$, the variation of the field when we go from $\Sigma_1$ to $\Sigma_2$, is then equal to the opposite of the norm of the gradient of $q$ at $M_1$, i.e. $\frac{\partial q}{\partial n}$ if $n_a$, $n_b$ and $n_{a \wedge \overline{b}}$ are fixed.
\includegraphics[scale=1]{chap2fig3}
\caption{Illustration of equipotential surfaces}
\label{chap2fig3} % Give a unique label
Thus, the space $E$ can be laminated by equipotential surfaces corresponding to successive values of $q$ as the cardinals ($n$, $n_a$, $n_b$, $n_{a \wedge \overline{b}}$) vary.
This situation corresponds to the one envisaged in the SIA modeling.
Fixing $n$, $n_a$ and $n_b$, we consider the random sets $X$ and $Y$ of the same cardinals as $A$ ($n_a$) and $B$ ($n_b$), and whose intersection cardinal $card(X \cap \overline{Y})$ follows a Poisson law or a binomial law, according to the choice of the model.
The different gradient fields, real ``lines of force'', associated with them are orthogonal to the surfaces defined by the corresponding values of $q$.
This reminds us, in the theoretical framework of potential, of the premonitory metaphor of ``implicative flow'' that we expressed in~\cite{Grase} and that we will discuss again in Chapter 14 of the book.
Behind this notion we can imagine a transport of information of variable intensity in a causal universe.
We illustrate this metaphor with the study of the properties of the two-layer implicative cone (see §2.8).
Moreover and intuitively, the implication $a\Rightarrow b$ is of all the better quality as the equipotential surface of the contingency covers the random equipotential surfaces associated with the random variable.
Let us recall the relationship that unites the potential $q$ with the intensity:
$$\varphi(a,b) =\frac{1}{\sqrt{2\pi}}\int_{q(a,\overline{b})}^{+\infty}e^{-\frac{t^2}{2}} dt$$
\noindent {\bf Remark 1}\\
It can be seen that the intensity is in turn invariant on any equipotential surface of its own variations.
The surface portions generated by $q$ and by $\varphi$ are even in one-to-one correspondence.
In intuitive terms, we can say that when one ``swells'', the other ``deflates''.\\

\noindent {\bf Remark 2}\\
Let us note once again a particularity of the intensity of implication.
While the surfaces generated by the variations of the 4 parameters of the data are not invariant under a common dilation of these parameters, those associated with the indices cited in §2.4 are invariant and keep the same undifferentiated geometric shape.
\section{Implication-inclusion}
\subsection{Foundational and problematic situation}
Three reasons led us to improve the model formalized by the intensity of implication:
\begin{itemize}
\item when the size of the samples processed, and in particular that of $E$, increases (to around a thousand and more), the intensity $\varphi(a,b)$ tends to no longer be sufficiently discriminating, because its values can be very close to 1 while the inclusion whose quality it seeks to model is far from being satisfied (a phenomenon reported in~\cite{Bodina}, which deals with large student populations through international surveys);
\item the previous quasi-implication model essentially uses the measure of the strength of the rule $a \Rightarrow b$.
However, taking into account a concomitance of $\neg b \Rightarrow \neg a$ (the contrapositive of the implication) is useful or even essential to reinforce the affirmation of a good quality of the quasi-implicative, possibly quasi-causal, relationship of $a$ over $b$\footnote{This phenomenon is reported by Y. Kodratoff in~\cite{Kodratoff}.}.
At the same time, it could make it possible to correct the difficulty mentioned above (if $A$ and $B$ are small compared to $E$, their complements will be large, and vice versa);
\item the overcoming of Hempel's paradox (see Appendix 3 of this chapter).
\end{itemize}
\subsection{An inclusion index}
The solution\footnote{J. Blanchard provides in~\cite{Blanchardb} an answer to this problem by measuring the ``equilibrium gap''.} we provide uses both the intensity of implication and another index that reflects the asymmetry between the situations $S_1 = (a \wedge b)$ and $S_1' = (a \wedge \neg b)$ (resp. $S_2 = (\neg a \wedge \neg b)$ and $S_2' = (a \wedge \neg b)$), in favour of the first named.
The relative weakness of the instances that contradict the rule and its contrapositive is therefore fundamental.
Moreover, the number of counter-examples $n_{a \wedge \overline{b}}$ to $a \Rightarrow b$ is also the number of counter-examples to its contrapositive.
To account for the uncertainty associated with a possible bet of belonging to one of the two situations ($S_1$ or $S_1'$, resp. $S_2$ or $S_2'$), we refer to Shannon's concept of entropy~\cite{Shannon}:
$$H(b\mid a) = - \frac{n_{a\wedge b}}{n_a}\log_2 \frac{n_{a\wedge b}}{n_a} - \frac{n_{a\wedge \overline{b}}}{n_a}\log_2 \frac{n_{a\wedge \overline{b}}}{n_a}$$
is the conditional entropy relating to the cells $(a \wedge b)$ and $(a \wedge \neg b)$ when $a$ is realized;
$$H(\overline{a}\mid \overline{b}) = - \frac{n_{a\wedge \overline{b}}}{n_{\overline{b}}}\log_2 \frac{n_{a\wedge \overline{b}}}{n_{\overline{b}}} - \frac{n_{\overline{a} \wedge \overline{b}}}{n_{\overline{b}}}\log_2 \frac{n_{\overline{a} \wedge \overline{b}}}{n_{\overline{b}}}$$
is the conditional entropy relating to the cells $(\neg a \wedge \neg b)$ and $(a \wedge \neg b)$ when $\neg b$ is realized.
These entropies, with values in $[0,1]$, must therefore be simultaneously small, and hence the asymmetries between the situations $S_1$ and $S_1'$ (resp. $S_2$ and $S_2'$) simultaneously strong, if one wishes to have a good criterion of the inclusion of $A$ in $B$.
Indeed, the entropies represent the average uncertainty of the experiments that consist in observing whether $b$ is realized (resp. whether $\neg a$ is realized) when $a$ (resp. $\neg b$) is observed. The complement to 1 of this uncertainty therefore represents the average information collected by performing these experiments. The more important this information is, the stronger the guarantee of the quality of the implication and of its contrapositive. We must now adapt this entropic numerical criterion to the model expected in the different cardinal situations.
For the model to have the expected meaning, it must satisfy, in our opinion, the following epistemological constraints:
\begin{itemize}
\item it must integrate the entropy values and, in order to contrast them, raise these values, for example, to the square;
\item as this square varies from 0 to 1, the value retained to denote the imbalance, and therefore the inclusion, in opposition to the entropy, will be the complement to 1 of this square, as long as the number of counter-examples is less than half of the observations of $a$ (resp. $\neg b$).
Beyond these values, as the implications no longer have an inclusive meaning, the criterion will be assigned the value 0;
\item in order to take into account the two pieces of information specific to $a\Rightarrow b$ and $\neg b \Rightarrow \neg a$, their product will report on the simultaneous quality of the values retained.
The product has the property of vanishing as soon as one of its terms vanishes, i.e. as soon as this quality is erased;
\item finally, since the product is of dimension 4 with respect to the entropies, its fourth root will be of the same dimension as them.
\end{itemize}
Let $\alpha=\frac{n_a}{n}$ be the frequency of $a$ and $\overline{\beta}=\frac{n_{\overline{b}}}{n}$ be the frequency of $\neg b$.
Let $t=\frac{n_{a \wedge \overline{b}}}{n}$ be the frequency of counter-examples; the two significant terms of the respective qualities of the implication and of its contrapositive are:
$$\left\{
\begin{array}{ll}
h_1(t) = H(b\mid a) = - \left(1-\frac{t}{\alpha}\right) \log_2 \left(1-\frac{t}{\alpha}\right) - \frac{t}{\alpha} \log_2 \frac{t}{\alpha} & \mbox{ if } t \in [0,\frac{\alpha}{2}[\\
h_1(t) = 1 & \mbox{ if } t \in [\frac{\alpha}{2},\alpha]\\
h_2(t)= H(\overline{a}\mid \overline{b}) = - \left(1-\frac{t}{\overline{\beta}}\right) \log_2 \left(1-\frac{t}{\overline{\beta}}\right) - \frac{t}{\overline{\beta}} \log_2 \frac{t}{\overline{\beta}} & \mbox{ if } t \in [0,\frac{\overline{\beta}}{2}[\\
h_2(t)= 1 & \mbox{ if } t \in [\frac{\overline{\beta}}{2},\overline{\beta}]
\end{array}
\right.$$
Hence the definition for determining the entropic criterion:
\definition: The inclusion index of $A$, support of $a$, in $B$, support of $b$, is the number:
$$i(a,b) = \left[ (1-h_1^2(t))\, (1-h_2^2(t)) \right]^{\frac{1}{4}}$$
which integrates the information provided by the realization of a small number of counter-examples, on the one hand to the rule $a \Rightarrow b$ and, on the other hand, to the rule $\neg b \Rightarrow \neg a$.
\subsection{The implication-inclusion index}
The intensity of implication-inclusion (or entropic intensity), a new measure of inductive quality, is the number:
$$\psi(a,b)= \left[ i(a,b)\cdot\varphi(a,b) \right]^{\frac{1}{2}}$$
which integrates both statistical surprise and inclusive quality.
The function $\psi$ of the variable $t$ admits a representation of the shape indicated in Figure~\ref{chap2fig4}, for $n_a$ and $n_b$ fixed.
Note in this figure the difference in behaviour compared with the conditional probability $P(B\mid A)$, the fundamental index of other rule measurement models, for example in Agrawal.
In addition to its linear, and therefore not very nuanced, nature, this probability leads to a measure that decreases too quickly from the first counter-examples and then resists too long when they become numerous.
\begin{figure}[htbp]
\includegraphics[scale=0.5]{chap2fig4.png}
\caption{Example of implication-inclusion.}
\label{chap2fig4}
\end{figure}
In Figure~\ref{chap2fig4}, it can be seen that this representation of the continuous function of $t$ reflects the expected properties of the inclusion criterion:
\begin{itemize}
\item ``slow reaction'' to the first counter-examples (noise resistance),
\item ``acceleration'' of the rejection of inclusion close to the balance point, i.e. $\frac{n_a}{2n}$,
\item rejection beyond $\frac{n_a}{2n}$, which the intensity of implication $\varphi(a,b)$ did not ensure.
\end{itemize}
\noindent Example 1\\
\begin{tabular}{|c|c|c|c|}\hline
 & $b$ & $\overline{b}$ & margin\\ \hline
$a$ & 200 & 400& 600 \\ \hline
$\overline{a}$ & 600 & 2800& 3400 \\ \hline
margin & 800 & 3200& 4000 \\ \hline
\end{tabular}
In Example 1, the implication intensity is $\varphi(a,b)=0.9999$ (with $q(a,\overline{b})=-3.65$).
The entropic computation gives $h_1=1$, since the counter-examples number more than half of the occurrences of $a$.
The value of the moderator coefficient is therefore $i(a,b)=0$.
Hence, $\psi(a,b)=0$, whereas $P(B\mid A)=0.33$.
Thus, the ``entropic'' functions ``moderate'' the intensity of implication in this case where inclusion is poor.
\noindent Example 2\\
\begin{tabular}{|c|c|c|c|}\hline
 & $b$ & $\overline{b}$ & margin\\ \hline
$a$ & 400 & 200& 600 \\ \hline
$\overline{a}$ & 1000 & 2400& 3400 \\ \hline
margin & 1400 & 2600& 4000 \\ \hline
\end{tabular}
In Example 2, the intensity of implication is 1 (for $q(a,\overline{b}) = - 8.43$).
The entropic values of the experiment are $h_1 = 0.918$ and $h_2 = 0.391$.
The value of the moderator coefficient is therefore $i(a,b) = 0.6035$.
As a result, $\psi(a,b) = 0.777$, whereas $P(B \mid A) = 0.6666$.
\noindent The correspondence between $\varphi(a,b)$ and $\psi(a,b)$ is not monotonic, as shown in the following example:

\begin{tabular}{|c|c|c|c|}\hline
 & $b$ & $\overline{b}$ & margin\\ \hline
$a$ & 40 & 20& 60 \\ \hline
$\overline{a}$ & 60 & 280& 340 \\ \hline
margin & 100 & 300& 400 \\ \hline
\end{tabular}
Thus, while $\varphi(a,b)$ decreases from the first example to the second, $i(a,b)$ increases, as does $\psi(a,b)$. The opposite situation is, however, the more frequent one.
Note that in both cases the conditional probability does not change.
\noindent We refer to~\cite{Lencaa} for a very detailed comparative study of association indices for binary variables.
In particular, the classical and entropic (inclusion) intensities of implication presented in this article are compared with other indices from a ``user'' point of view.
\section{Implication graph}
\subsection{Problem statement}
At the end of the calculations of the intensities of implication, in both the classical and entropic models, we have a $p \times p$ table crossing the $p$ variables with each other, whatever their nature, whose elements are the values of these intensities of implication, numbers in the interval $[0,~1]$.
It must be noted that the underlying structure of all these variables is far from explicit and remains largely hidden.
The user remains blind in front of such a square table of size $p^2$: he cannot simultaneously embrace the possible multiple chains of rules that underlie the overall structure of the $p$ variables.
In order to facilitate the extraction of the rules and the examination of their structure, we associate with this table, for a given intensity threshold, a directed graph without cycles, weighted by the intensities of implication, whose complexity of representation the user can control by setting himself the threshold above which the implicative quality of the rules is taken into account.
Each arc in this graph represents a rule: if $n_a < n_b$, the arc $a \rightarrow b$ represents the rule $a \Rightarrow b$; if $n_a = n_b$, the arc $a \leftrightarrow b$ represents the double rule $a \Leftrightarrow b$, in other words the equivalence between these two variables.
When the threshold of intensity of implication is raised, the number of arcs obviously varies in the opposite direction: for a threshold set at $0.95$, the number of arcs is less than or equal to the number in the graph at threshold $0.90$. We will return to this point below.
\subsection{Algorithm}

The relation defined by statistical implication, although reflexive and non-symmetric, is obviously not transitive, like induction and unlike deduction.
However, we want it to model the partial relationship between two variables (the successes in our initial example).

{\bf Proposal:} By convention, if $a \Rightarrow b$ and $b \Rightarrow c$, we accept the transitive closure $a \Rightarrow c$ if and only if $\varphi(a,c) \geq 0.5$, i.e. if the implicative relationship of $a$ to $c$, which reflects a certain dependence between $a$ and $c$, is better than its refutation.
Note that for any pair of variables $(x,~y)$, the arc $x \rightarrow y$ is weighted by the intensity of implication $\varphi(x,y)$.

Let us take a formal example by assuming that, at a threshold above $0.5$, the following rules hold between the 5 variables $a$, $b$, $c$, $d$ and $e$: $c \Rightarrow a$, $c \Rightarrow e$, $c \Rightarrow b$, $d \Rightarrow a$, $d \Rightarrow e$, $a \Rightarrow b$ and $a \Rightarrow e$.
This set of numerical and graphical relationships can then be translated into the following table and graph:

\begin{tabular}{|C{0.5cm}|c|c|c|c|c|}\hline
\hspace{-0.5cm}\turn{45}{$\Rightarrow$} & $a$ & $b$ & $c$ & $d$ & $e$\\ \hline
$a$ & & 0.97 & & & 0.73 \\ \hline
$b$ & & & & & \\ \hline
$c$ & 0.82 & 0.975 & & & 0.82 \\ \hline
$d$ & 0.78 & & & & 0.92 \\ \hline
$e$ & & & & & \\ \hline
\end{tabular}
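Reading the arcs off such a table is mechanical; the following sketch (the function name is ours) keeps an arc whenever its intensity of implication reaches the chosen threshold, and illustrates how raising the threshold can only remove arcs:

```python
def implication_graph(intensities, threshold):
    """Keep the arc x -> y whenever the intensity phi(x, y) reaches
    the chosen threshold; returns the arcs in lexicographic order."""
    return sorted(arc for arc, phi in intensities.items() if phi >= threshold)

# Intensities of implication taken from the table above
phi = {('a', 'b'): 0.97, ('a', 'e'): 0.73,
       ('c', 'a'): 0.82, ('c', 'b'): 0.975, ('c', 'e'): 0.82,
       ('d', 'a'): 0.78, ('d', 'e'): 0.92}

print(implication_graph(phi, 0.5))   # all 7 arcs of the example
print(implication_graph(phi, 0.95))  # [('a', 'b'), ('c', 'b')]
```

At threshold $0.5$ all seven rules of the example appear; at threshold $0.95$ only $a \Rightarrow b$ and $c \Rightarrow b$ survive, which is the monotonicity discussed above.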
\begin{figure}[htbp]
\centering
\includegraphics[scale=1]{chap2fig5.png}
\caption{Implication graph corresponding to the previous example.}
\label{chap2fig5}
\end{figure}
One of the difficulties related to the graphical representation is that the graph is generally not planar.
The algorithm that constructs it must take this into account and, in particular, must ``straighten'' the paths of the graph in order to provide acceptable readability for the expert who will analyse it.

The number of arcs in the graph can be reduced (or increased) by raising (or lowering) the acceptance threshold of the rules, i.e. the level of confidence in the selected rules.
Correlatively, arcs may appear or disappear with the variations of this threshold.
Let us recall that this graph is necessarily without cycles and that it is not a lattice since, for example, the variable $a$ does not imply the variable ($a$ or $\neg a$), whose support is $E$.
A fortiori, it cannot be a Galois lattice.
Options of the CHIC software for automatic data processing with SIA allow the user to delete variables at will, to move their images in the graph in order to reduce arc crossings, or to focus on certain variables, vertices of a kind of ``cone'' whose two ``sheets'' are made up respectively of the ``parent'' variables and the ``child'' variables of this vertex variable.
We refer to the ends of the arcs as ``nodes''; a node in a given graph carries a single variable or a conjunction of variables.
The passage from a node $S_1$ to a node $S_2$ is called a ``transition'' and is represented by an arc in the graph.
The upper sheet of the cone whose vertex is the variable $a$, called the nodal variable, is made up of the ``fathers'' of $a$, i.e., in the ``causal'' sense, the causes of $a$; the lower sheet, on the other hand, is made up of the ``children'' of $a$ and therefore, still in the causal sense, the consequences or effects of $a$.
The expert in the field analysed should be particularly interested in these configurations, which are rich in information.
See, for example,~\cite{Lahanierc} and the two implicative cones below (Figures~\ref{chap2fig6} and \ref{chap2fig7}).
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.75]{chap2fig6.png}
\caption{Implicative cone.}
\label{chap2fig6}
\end{figure}
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.75]{chap2fig7.png}
\caption{Implicative cone centered on a variable.}
\label{chap2fig7}
\end{figure}
\section{Reduction in the number of variables}
\subsection{Motivation}

As soon as the number of variables becomes excessive, most of the available techniques become impractical\footnote{This paragraph is strongly inspired by the paper~\cite{Grask}.}.
In particular, when an implicative analysis is carried out by calculating association rules~\cite{Agrawal}, the number of rules discovered undergoes a combinatorial explosion with the number of variables and quickly becomes intractable for a decision-maker, especially when conjunctions of variables are requested.
In this context, a preliminary reduction in the number of variables is necessary.
Thus, \cite{Ritschard} proposed an efficient heuristic to reduce both the number of rows and columns of a table, using an association measure as a quasi-optimal criterion for controlling the heuristic.
However, to our knowledge, in the various other research studies the type of situation at the origin of the need to group rows or columns is not taken into account in the reduction criteria, whether the analyst's problem and aim are the search for similarity, dissimilarity, implication, etc., between variables.

Also, to the extent that there are variables that are very close in the sense of statistical implication, it may be appropriate to substitute for them a single variable, a leader that would represent an equivalence class of similar variables for implicative purposes.
We therefore propose, following the example of what was done to define the notion of quasi-implication, to define a notion of quasi-equivalence between variables, in order to build classes from which a leader will be extracted.
We will illustrate this with an example.
Then we will consider the possibility of using a genetic algorithm to optimize the choice of the representative of each quasi-equivalence class.
\subsection{Definition of quasi-equivalence}

Two binary variables $a$ and $b$ are logically equivalent for the SIA when the two quasi-implications $a \Rightarrow b$ and $b \Rightarrow a$ are simultaneously satisfied at a given threshold.
We have developed two criteria to assess the quality of a quasi-implication: one is the statistical surprise based on the likelihood of the link of~\cite{Lerman}, the other is the entropic form of quasi-inclusion~\cite{Grash2} presented in this chapter (§7).

According to the first criterion, we could say that two variables $a$ and $b$ are quasi-equivalent when the intensity of implication $\varphi(a,b)$ of $a\Rightarrow b$ differs little from that of $b \Rightarrow a$. However, for large samples (several thousand individuals), this criterion is no longer sufficiently discriminating to validate the inclusion.

According to the second criterion, an entropic measure of the imbalance between the numbers $n_{a \wedge b}$ (individuals who satisfy $a$ and $b$) and $n_{a \wedge \overline{b}}$ (individuals who satisfy $a$ and $\neg b$, counter-examples to the implication $a\Rightarrow b$) indicates the quality of the implication $a\Rightarrow b$, on the one hand, and the numbers $n_{a \wedge b}$ and $n_{\overline{a} \wedge b}$ are used to assess the quality of the mutual implication $b\Rightarrow a$, on the other.
Here we use a method comparable to that used in Chapter 3 to define the entropic implication index.
Denoting by $n_a$ and $n_b$ the respective numbers of occurrences of $a$ and $b$, the imbalance of the rule $a\Rightarrow b$ is measured by a conditional entropy $K(b \mid a=1)$, and that of $b\Rightarrow a$ by $K(a \mid b=1)$, with:
$$\begin{array}{ll}
K(b\mid a=1) = - \left( 1- \frac{n_{a\wedge b}}{n_a}\right) \log_2 \left( 1- \frac{n_{a\wedge b}}{n_a}\right) - \frac{n_{a\wedge b}}{n_a}\log_2 \frac{n_{a\wedge b}}{n_a} & \mbox{if}\quad \frac{n_{a \wedge b}}{n_a} > 0.5\\[2mm]
K(b\mid a=1) = 1 & \mbox{if}\quad \frac{n_{a \wedge b}}{n_a} \leq 0.5\\[2mm]
K(a\mid b=1) = - \left( 1- \frac{n_{a\wedge b}}{n_b}\right) \log_2 \left( 1- \frac{n_{a\wedge b}}{n_b}\right) - \frac{n_{a\wedge b}}{n_b}\log_2 \frac{n_{a\wedge b}}{n_b} & \mbox{if}\quad \frac{n_{a \wedge b}}{n_b} > 0.5\\[2mm]
K(a\mid b=1) = 1 & \mbox{if}\quad \frac{n_{a \wedge b}}{n_b} \leq 0.5
\end{array}$$
These two entropies must be low enough for it to be possible to bet on $b$ (resp. $a$) with good certainty when $a$ (resp. $b$) is realized. Their respective complements to 1 must therefore be simultaneously large.
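These piecewise entropies are straightforward to compute; here is a minimal sketch (the function name is ours), illustrated with the counts of Example 2 above ($n_{a\wedge b}=400$, $n_a=600$, $n_b=1400$):

```python
import math

def K(n_examples, n_total):
    """Piecewise conditional entropy of the text: with f = n_{a∧b}/n_a,
    K equals 1 (maximal uncertainty) when f <= 0.5, and the binary
    entropy of f otherwise."""
    f = n_examples / n_total
    if f <= 0.5:
        return 1.0
    if f == 1.0:
        return 0.0   # no counter-example: no uncertainty left
    return -(1 - f) * math.log2(1 - f) - f * math.log2(f)

# Counts of Example 2 above: n_{a∧b} = 400, n_a = 600, n_b = 1400
print(round(K(400, 600), 3))    # K(b | a=1) = 0.918
print(round(K(400, 1400), 3))   # K(a | b=1) = 1.0 since 400/1400 <= 0.5
```

The first value, $0.918$, is the binary entropy of the conditional frequency $400/600$; the second branch fires because $400/1400 \leq 0.5$.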
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.5]{chap2fig8.png}
\caption{Illustration of the functions $K$ and $1-K^2$ on $[0,~1]$.}
\label{chap2fig8}
\end{figure}
\definition A first entropic index of equivalence is given by:
$$e(a,b) = \left (\left[ 1 - K^2(b \mid a = 1)\right ]\left[ 1 - K^2(a \mid b = 1) \right]\right)^{\frac{1}{4}}$$
When this index takes values in the neighbourhood of $1$, it reflects a good quality of the double implication.
In addition, in order to better take the examples $a \wedge b$ into account, we integrate this parameter through a similarity index $s(a,b)$ of the variables, for example in the sense of I.C. Lerman~\cite{Lermana}.
The quasi-equivalence index is then constructed by combining these two concepts.
\definition A second entropic equivalence index is given by the formula
$$\sigma(a,b)= \left [ e(a,b)\, s(a,b)\right ]^{\frac{1}{2}}$$
From this point of view, we can now state the quasi-equivalence criterion that we use.

\definition The pair of variables $\{a,b\}$ is said to be quasi-equivalent for the selected quality $\beta$ if $\sigma(a,b) \geq \beta$.
For example, a value $\beta=0.95$ could be considered to reflect a good quasi-equivalence between $a$ and $b$.
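The two indices combine directly; the sketch below uses hypothetical counts and an assumed similarity value $s(a,b)=0.99$ supplied as a plain number, since Lerman's index itself is not defined in this chapter:

```python
import math

def K(n_examples, n_total):
    """Piecewise conditional entropy used by the quasi-equivalence index."""
    f = n_examples / n_total
    if f <= 0.5:
        return 1.0
    if f == 1.0:
        return 0.0
    return -(1 - f) * math.log2(1 - f) - f * math.log2(f)

def equivalence_index(n_ab, n_a, n_b):
    """e(a,b) = ([1 - K^2(b|a=1)] [1 - K^2(a|b=1)])^(1/4)."""
    k1, k2 = K(n_ab, n_a), K(n_ab, n_b)
    return ((1 - k1 ** 2) * (1 - k2 ** 2)) ** 0.25

def quasi_equivalence(n_ab, n_a, n_b, s):
    """sigma(a,b) = [e(a,b) * s(a,b)]^(1/2); s(a,b) is a similarity
    index (e.g. Lerman's), supplied here as a plain number."""
    return math.sqrt(equivalence_index(n_ab, n_a, n_b) * s)

# Hypothetical counts: 95 of the 100 realizations of a coincide with
# those of b, together with an assumed similarity s = 0.99.
sigma = quasi_equivalence(95, 100, 100, 0.99)
print(sigma >= 0.95)   # True: the pair would be kept at beta = 0.95
```

Note that as soon as one of the conditional frequencies drops to $0.5$ or below, the corresponding $K$ equals $1$, $e(a,b)$ vanishes, and the pair cannot be quasi-equivalent at any positive $\beta$.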
\subsection{Algorithm of construction of quasi-equivalence classes}

Let us assume a set $V = \{a,b,c,...\}$ of $v$ variables with a valued relation $R$ induced by the measurement of quasi-equivalence on all pairs of $V$.
We will assume the pairs of variables sorted in decreasing order of quasi-equivalence.
If we have set the quality threshold for quasi-equivalence at $\beta$, only the pairs $\{a,b\}$ satisfying the inequality $\sigma(a,b)\ge \beta$ will be retained.
In general, only a part $V'$, of cardinal $v'$, of the variables of $V$ will appear in such pairs.
If this set $V'$ is empty or too small, the user can lower his requirement to a smaller threshold value.
The relation being symmetrical, we will have at most $\frac{v'(v'-1)}{2}$ pairs to study.
As for $V-V'$, it contains only non-reducible variables.
We propose to use the following greedy algorithm:
\begin{enumerate}
\item A first potential class $C_1^0= \{e,f\}$ is constituted such that $\sigma(e,f)$ is the largest of the $\beta$-equivalence values.
If possible, this class is extended to a new class $C_1$ by taking from $V'$ all the elements $x$ such that any pair of variables within this class has a quasi-equivalence greater than or equal to $\beta$;
\item We continue with:
\begin{enumerate}
\item If $o$ and $k$, forming the pair $(o,k)$ immediately below $(e,f)$ according to the index $\sigma$, both belong to $C_1$, then we move to the pair immediately below $(o,k)$ and proceed as in 1.;
\item If neither $o$ nor $k$ belongs to $C_1$, we proceed as in 1. from the pair they constitute, which forms the basis of a new class;
\item If only one of $o$ and $k$ belongs to $C_1$, the other variable can either form a singleton class or belong to a future class, on which we will of course proceed as above.
\end{enumerate}
\end{enumerate}
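The steps above admit a compact sketch; this is one possible reading of the greedy algorithm, with hypothetical $\sigma$ values, and the handling of ties and of leftover variables is ours:

```python
def quasi_equivalence_classes(pairs, beta):
    """Greedy construction of the sigma-equivalence classes: `pairs`
    maps frozenset({x, y}) -> sigma(x, y); a class absorbs a variable
    only if it stays beta-equivalent with every current member."""
    kept = sorted((p for p in pairs if pairs[p] >= beta),
                  key=pairs.get, reverse=True)      # best pairs first
    variables = {v for p in kept for v in p}        # this is V'
    classes, assigned = [], set()
    for pair in kept:
        x, y = sorted(pair)
        if x in assigned or y in assigned:
            continue                                # pair already covered
        cls = {x, y}
        for z in sorted(variables - assigned - cls):
            # absorb z only if beta-equivalent to every current member
            if all(pairs.get(frozenset({z, m}), 0) >= beta for m in cls):
                cls.add(z)
        classes.append(cls)
        assigned |= cls
    return classes

sigma = {frozenset({'e', 'f'}): 0.98, frozenset({'e', 'g'}): 0.97,
         frozenset({'f', 'g'}): 0.96, frozenset({'o', 'k'}): 0.95,
         frozenset({'f', 'o'}): 0.40}
print(quasi_equivalence_classes(sigma, 0.95))  # two classes: {e,f,g} and {o,k}
```

With these values the first class absorbs $g$, since both $\sigma(e,g)$ and $\sigma(f,g)$ exceed $\beta$, while $(o,k)$ starts a second class; the pair $(f,o)$ is discarded at the threshold.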
After a finite number of iterations, a partition of $V$ into $r$ classes of $\sigma$-equivalence is available: $\{C_1, C_2,..., C_r\}$.
The quality of the reduction may be assessed by a gross or proportional index such as $\beta^{\frac{r}{k}}$.
However, we prefer the criterion defined below, which has the advantage of integrating the choice of the representative.

In addition, the $r$ variables representing the $r$ classes of $\sigma$-equivalence could be selected on the basis of the following elementary criterion: the quality of the connection of each such variable with those of its class.
However, this criterion does not optimize the reduction, since the choice of the representative is relatively arbitrary and may be a sign of triviality of the variable.
\section{Conclusion}

This overview of the development of statistical implicative analysis shows, if need be, how a data processing theory is built step by step in response to problems presented by experts from various fields and to epistemological requirements that respect common sense and intuition.
It therefore appears as something other than a mere view of the mind, since it is directly applicable to the situations that led to its genesis.
The extensions made to the types of data processed, to the modes of representation of their structures, and to the relationships between subjects, their descriptors and variables are indeed the result of the experts' demanding questions.
Its respective functions of revealer and analyser seem to operate successfully in multiple application areas.

We will have noticed that the theoretical basis is simple, which may be the reason for its fertility.
Even if the questioning of the primitive theoretical choices is not apparent here, this genesis has not been free of conflicts between the expected answers and the ease of access to them; these conflicts have been sources of restoration or even redesign, often discussed within the research team.
In any case, this method of data analysis will have made it possible, and will, Régis hopes, still make it possible, to highlight living structures thanks to the non-symmetrical approach on which it is based.
Among the current or future work proposed to our team, one topic concerns an extension of the SIA to vector variables in response to problems in proteomics.
Another is more broadly concerned with the relationship between the SIA and the treatment of fuzzy sets (see Chapter 7).
The function of the fuzzy-logic ``implication'' operator will be illustrated by new applications.
In another direction, we will revise our method to allow the SIA to handle missing values in data tables; work is also ongoing on reducing redundant rules in the SIA.
Finally, it is clear that this work will be conducted interactively with applications, including, in particular, the contribution of the SIA to the classification rules in the leaves of classification trees.
\section{Annex 1: Two models of the classical implication intensity}

\subsection{Binomial model}

To examine the quality of the quasi-rule $a \Rightarrow b$, in the case where the variables are binary, is equivalent to measuring that of the inclusion of the subset of transactions satisfying $a$ in the subset of transactions satisfying $b$.
The counter-examples to the inclusion are indeed the same as those to the implication expressed by: ``any transaction satisfying $a$ also satisfies $b$''.
From this perspective, as soon as $n_a \leq n_b$, the quality of the quasi-rule $a \Rightarrow b$ can only be semantically better than that of $b \Rightarrow a$.
We will therefore assume, in what follows, that $n_a \leq n_b$ when studying $a \Rightarrow b$. In this case, the main population is finite and $Card~ E = n$.
Binomial modelling was chronologically the first to be adopted (see~\cite{Grasb}, chap. 2).
It was compared to other models in~\cite{Lermana}.
Let us briefly recall what the binomial model consists of.
With the adopted notations, $X$ and $Y$ are two random subsets, independently chosen from all the parts of $E$, with the same cardinals $n_a$ and $n_b$ as the subsets of realizations of $a$ and $b$ respectively.
The observed value $n_{a \wedge \overline{b}}$ can be considered as the realization of the random variable $Card(X\cap \overline{Y})$, which represents the random number of counter-examples to the inclusion of $X$ in $Y$, observed during $n$ successive independent draws. $Card(X\cap \overline{Y})$ can then be considered as a binomial variable of parameters $n$ and $\pi$, where $\pi$ is itself estimated by $p = \frac{n_a}{n}\frac{n_{\overline{b}}}{n}$. Thus:

$$Pr[Card(X\cap \overline{Y})= k]= C_n^k\left( \frac{n_an_{\overline{b}}}{n^2} \right)^k \left(1-\frac{n_a n_{\overline{b}}}{n^2} \right)^{n-k} $$
The estimated centered and reduced variable $Q(a,~\overline{b})$ then admits as a realization:

$$q(a,\overline{b}) = \frac{n_{a \wedge \overline{b}}-
\frac{n_a n_{\overline{b}}}{n}}{\sqrt{\frac{n_a n_{\overline{b}}}{n}\left(1-\frac{n_a n_{\overline{b}}}{n^2}\right)} }$$

As before, we obtain the estimated intensity of empirical implication:
$$\varphi(a,b)=1-Pr[Q(a,\overline{b})\leq q(a,\overline{b})] = 1 - \sum _{k=0}^{n_{a \wedge \overline{b}}} C_n^k\left (\frac{n_an_{\overline{b}}}{n^2}\right )^k\left (1-\frac{n_an_{\overline{b}}}{n^2}\right )^{n-k}$$
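This finite sum can be evaluated exactly with integer binomial coefficients; here is a minimal sketch (the function name is ours), applied to the small $n=400$ table of the non-monotonicity example above:

```python
import math

def binomial_intensity(n, n_a, n_b, n_anb):
    """phi(a,b) = 1 - P[X <= n_{a ∧ ¬b}], where X is binomial with
    parameters n and p = n_a * n_{¬b} / n^2 (the estimated pi)."""
    p = n_a * (n - n_b) / n ** 2
    cdf = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
              for k in range(n_anb + 1))
    return 1 - cdf

# Table of the non-monotonicity example: n = 400, n_a = 60, n_b = 100,
# hence n_{a ∧ ¬b} = 20 counter-examples against 45 expected.
print(binomial_intensity(400, 60, 100, 20))
```

With 20 observed counter-examples against 45 expected under independence, the exact intensity is very close to 1, in line with the Gaussian approximation.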
The probability law of $Q(a,\overline{b})$ can be approximated by the standard normal law $N(0,1)$. Generally, the intensity calculated in the Poisson model is more ``severe'' than the intensity derived from the binomial model, in the sense that $\varphi(a,b)_{Poisson} \leq \varphi(a,b)_{Binomial}$.
\remark We can note that the implication index is null if and only if the two variables $a$ and $b$ are independent. Indeed, we have
$$ q(a,\overline{b}) = \frac{n_{a \wedge \overline{b}}-
\frac{n_a n_{\overline{b}}}{n}}{\sqrt{\frac{n_a n_{\overline{b}}}{n}\left(1-\frac{n_a n_{\overline{b}}}{n^2}\right)} } =0 \iff n_{a \wedge \overline{b}}- \frac{n_a n_{\overline{b}}}{n}=0$$
that is,
$$q(a,\overline{b}) =0 \iff n_{a \wedge \overline{b}}=\frac{n_a n_{\overline{b}}}{n} \iff \frac{n_{a \wedge \overline{b}}}{n}=\frac{n_a}{n}\frac{n_{\overline{b}}}{n}$$
This last relationship reflects the property of statistical independence.
\subsection{Hypergeometric model}

Let us briefly recall the third model, proposed in \cite{Lermana} and \cite{Grasd}. We repeat the same approach: $A$ and $B$ are the parts of $E$ representing the individuals satisfying $a$ and $b$ respectively, with cardinals $card (A)=n_a$ and $card (B)=n_b$. Let us then consider two independent random parts $X$ and $Y$ such that $card (X)=n_a$ and $card (Y)=n_b$. The random variable $Card(A \cap \overline{Y})$ represents the random number of elements of $E$ which, being in $A$, are not in $Y$. This variable follows a hypergeometric law and we have, for all $k \leq n_a$:

$$Pr[Card(A \cap \overline{Y})=k]=\frac{C_{n_a}^k C_{n-n_a}^{n-n_b-k}}{C_n^{n-n_b}} =\frac{n_a!n_{\overline{a}}! n_b!n_{\overline{b}}! }{k!\,n!\,(n_a-k)!\,(n_{\overline{b}}-k)!\, (n_b-n_a+k)! }$$

This probability can also be written:
$$\frac{C_{n-n_b}^k C_{n_b}^{n_a-k}}{C_n^{n_a}} = Pr[Card(X \cap \overline{B})=k]$$

This shows, by exchanging the roles of $a$ and $b$, that the empirical implication index $Q(a,\overline{b})$ corresponding to the quasi-rule $a \Rightarrow b$ is the same as the one corresponding to its reciprocal, i.e. $Q(b,\overline{a})$. We thus obtain the same intensity for the quasi-rule $a \Rightarrow b$ and for the reciprocal quasi-rule $b \Rightarrow a$.
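The equality of the two expressions can be checked numerically; this is a minimal sketch on a small hypothetical population (the function names are ours):

```python
from math import comb

def p_direct(n, n_a, n_b, k):
    """P[Card(A ∩ not-Y) = k]: hypergeometric law for the rule a => b."""
    return comb(n_a, k) * comb(n - n_a, n - n_b - k) / comb(n, n - n_b)

def p_reciprocal(n, n_a, n_b, k):
    """P[Card(X ∩ not-B) = k]: the same law written for b => a."""
    return comb(n - n_b, k) * comb(n_b, n_a - k) / comb(n, n_a)

# The two expressions coincide for every k <= n_a (here n_a <= n_b),
# hence the same intensity for a => b and its reciprocal in this model.
n, n_a, n_b = 10, 3, 6
for k in range(n_a + 1):
    assert abs(p_direct(n, n_a, n_b, k) - p_reciprocal(n, n_a, n_b, k)) < 1e-12
print("same law for both directions")
```

This symmetry is precisely what disqualifies the hypergeometric model in the next subsection: it cannot distinguish a quasi-rule from its reciprocal.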
\subsection{Choice of models to evaluate the intensity of implication}

If binomial modelling remains compatible with the semantics of implication, a non-symmetric binary relation, the same cannot be said of hypergeometric modelling, since it does not distinguish the quality of a quasi-rule from that of its reciprocal and has little pragmatic character.
Consequently, we retain only the Poisson model and the binomial model as models adapted to the semantics of implication between binary variables.

The legitimate coexistence of three different models for our problem of measuring the quality of a quasi-rule is not inconsistent: it stems from the way in which the drawing of transactions one by one (Poisson's law) or of sets of grouped transactions (binomial law or hypergeometric law) is taken into account. In addition, we know that when the total number of transactions becomes very large, all three models converge to the same Gaussian model. In~\cite{Lallich}, we find, as a generalization, a parameterization of the three indices obtained from these models, which makes it possible to evaluate the interest of the rules obtained by comparing them to a given threshold.
\section{Annex 2: Modelling of implication integrating confidence and surprise}

Recently, in~\cite{Grasab}, we have brought together two statistical concepts that we believe are internal to the implicative relationship between two variables $a$ and $b$:
\begin{itemize}
\item on the one hand, the intensity of implication $\varphi(a,b)$, measuring the surprise or astonishment at the low number of counter-examples to the implication between these variables;
\item on the other hand, the confidence $C(b \mid a)$, measuring the conditional frequency of $b$ knowing $a$, which is involved in the majority of the other implication indices, as we saw in §2.5.4.
\end{itemize}

So we claim, paraphrasing G. Vergnaud~\cite{Vergnaudd} speaking about aesthetics, that there is no data analysis without {\bf confidence} (psychological level). But there is also no data analysis without {\bf surprise}\footnote{This is also what René Thom says in~\cite{Thoma}, p. 130 (translated into English): ``...the problem is not to describe reality, the problem is much more to identify in it what makes sense to us, what is surprising in all the facts. If the facts do not surprise us, they do not bring any new element to the understanding of the universe: we might as well ignore them'', and further on: ``... which is not possible if we do not already have a theory''.} (statistical level), nor without {\bf scale correction} (pragmatic level). The two concepts (confidence and intensity of implication) therefore respond to relatively distinct but not contradictory principles: confidence is based on the subordination of variable $b$ to variable $a$, while the intensity of implication is based on the counter-examples to the relationship subordinating $b$ to $a$.
It is demonstrated in~\cite{Grasab} that, for any $\alpha$, the ratio
$$ \frac{Pr[C(b\mid a)\geq \alpha]}{Pr[\varphi(a,b)\geq \alpha]}~\mbox{is close to}~ \frac{Pr[C(b \mid a) \geq \alpha]}{1-\alpha}$$

Under these conditions, this ratio is a good indicator of the balance between confidence and intensity of implication: greater than 1, confidence is better than intensity; less than 1, intensity is stronger. Further research could be based on this indicator.
Finally, as we did for the entropic intensity, we take the contrapositive into account by associating the two conditional frequencies: that of $b$ knowing $a$, i.e. $C_1(a,b)$ (for the direct implication $a \Rightarrow b$), and that of $\neg a$ knowing $\neg b$, i.e. $C_2(a,b)$ (for the contraposed implication $\neg b \Rightarrow \neg a$). We then choose the following formula to define a new measure of implication that we call {\bf implifiance} in French (implication + confidence):

$$ \phi(a,b)=\varphi(a,b)\left [ C_1(a,b)\, C_2(a,b) \right ]^{\frac{1}{4}}$$

For example, if we extract a rule whose implifiance is equal to $0.95$, its intensity of implication is at least equal to $0.95$ and each of the confidences $C_1$ and $C_2$ is at least equal to $0.81$. If the implifiance is equal to $0.90$, the respective minima are $0.90$ and $0.66$, which preserves the plausibility of the rule.
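These minima follow directly from the formula: since $\varphi \leq 1$ and each confidence is at most 1, an implifiance of $0.95$ forces $\varphi \geq 0.95$ and each confidence to be at least $0.95^4$. A short check:

```python
def implifiance(varphi, c1, c2):
    """phi(a,b) = varphi(a,b) * (C1(a,b) * C2(a,b))^(1/4)."""
    return varphi * (c1 * c2) ** 0.25

# An implifiance of 0.95 forces each confidence to be at least 0.95^4
# (taking varphi = 1 and the other confidence = 1); similarly for 0.90.
print(round(0.95 ** 4, 2))   # 0.81
print(round(0.90 ** 4, 2))   # 0.66
```

The two printed values are the per-confidence minima quoted above.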
The following two figures show the respective variations of the intensity of implication, the entropic intensity and the implifiance (in ordinates) as a function of the number of counter-examples, in the cases $n=100$ and $n=1000$ (Figures~\ref{chap2fig9} and~\ref{chap2fig10} respectively).
\begin{figure}[htbp]
\centering
\includegraphics[scale=1.3]{chap2fig9.png}
\caption{Example of Implifiance with $n=100$.}
\label{chap2fig9}
\end{figure}
\begin{figure}[htbp]
\centering
\includegraphics[scale=1.3]{chap2fig10.png}
\caption{Example of Implifiance with $n=1000$.}
\label{chap2fig10}
\end{figure}
\section{Annex 3: SIA and Hempel's paradox}

If we look at the SIA from the point of view of knowledge extraction, we find the main objective of the inductive establishment of rules and quasi-rules between variables $a$ and $b$ observed through instances $x$ of a set $E$ of objects or subjects. A strict rule (a theorem, in this case) is expressed in symbolic form: $\forall x, (a(x)\Rightarrow b(x))$. A quasi-rule presents counter-examples, i.e. the following statement is observed: $\exists x, (a(x)\wedge \overline{b(x)})$.

The purpose of the SIA is to provide a measure of such rules in order to estimate their quality when the frequency of the last statement above is low.
First, within the framework of the SIA, a quality index is constructed in order, like other indices, to provide a probabilistic response to this problem.
But in seeking among the rules\footnote{$n_{a \wedge \overline{b}}$} those that would express a causality, or at least a causal relationship, it seemed absolutely necessary to us, as we said in point 4, to support the satisfaction of the direct rule by a measure of its contrapositive: $\forall x, (\overline{b(x)} \Rightarrow \overline{a(x)})$.
Indeed, if, statistically, whether with the confidence measured by the conditional frequency or with the intensity of implication, the truth of a strict rule is also obtained for its contrapositive, this is no longer necessarily the case for a quasi-rule.
We have therefore sought to construct, in a new and original way, a measure that makes it possible to overcome Hempel's paradox~\cite{Hempel}, in order to obtain a measure that confirms the satisfaction of the induction in terms of causality.
It should be recalled that, according to Carl G. Hempel, in strict logic this paradox is linked to the irrelevance of the contrapositive with respect to induction when empirical non-satisfaction (de facto) of the premise $a$ is observed.
It is the consequence of the application of Hempel's third principle: ``If an observed object $x$ does not satisfy the antecedent (i.e. $a(x) = false$), it does not count, or it is irrelevant with respect to the conditional (= the direct proposition)''.
In other words, the confirmation of the contrapositive provides nothing as to the direct version of the proposition, although it is logically equivalent to it.
For example, it is not the confirmatory observation of the contrapositive of ``All crows are black'' by a red cat (i.e. not black) that confirms the validity of ``All crows are black''. Nor, for that matter, does continuing to observe other non-black objects. For, to confirm this statement and thus validate the induction, we would have to review all the non-black objects, which may be infinite in number.

In other words, according to Hempel, in the implication truth table, the cases where $a(x)$ is false are uninteresting for induction; only the lines [$a(x)=true$ and $b(x)=true$], which confirm the rule, and [$a(x)=true$ and $b(x)=false$], which invalidate it, are retained.
\underline{However, in SIA, this paradox does not hold, for two reasons:}
\begin{itemize}
\item the objects $x$ belong to the same reference set $E$, finite or infinite (countable or even continuous), in which all $x$ are likely, with relevance, to satisfy or not satisfy the variables at stake. That is, by assigning them a value (truth or numerical), the direct proposition and/or its contrapositive are also evaluable (for example, the proposition $a \Rightarrow b$ is true even if $a(x)$ is false while $b(x)$ is true);
\item since we are most often dealing with quasi-rules, the equivalence between a proposition and its contrapositive no longer holds, and it is on the basis of the combination of the respective, evaluated qualities of these statements that we induce or not a causal character. Moreover, if the rule is strict, the logical equivalence with its contrapositive is strict and the contrapositive rule is satisfied at the same time.
\end{itemize}