X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/book_chic.git/blobdiff_plain/c1aea4dc782230b123634e9c6f06bfc9d39b973a..bc528e875a70eb7d9dd810f644ba646bf6a4e6f6:/chapter2.tex?ds=sidebyside diff --git a/chapter2.tex b/chapter2.tex index c99fe46..f492308 100644 --- a/chapter2.tex +++ b/chapter2.tex @@ -107,4 +107,193 @@ implication index for binary data~\cite{Lermana} or \cite{Lallich}, on the other hand, this notion is not extended to other types of variables, to extraction and representation according to a rule graph or a hierarchy of meta-rules; structures aiming at access to the -meaning of a whole not reduced to the sum of its parts \footnote{ICI }, i.e. operating as a complex non-linear system. For example, it is well known, through usage, that the meaning of a sentence does not completely depend on the meaning of each of the words in it (see the previous chapter, point 4). +meaning of a whole not reduced to the sum of its +parts~\cite{Seve}\footnote{This is what the philosopher L. Sève + emphasizes: "... in the non-additive, non-linear passage of the + parts to the whole, there are properties that are in no way + precontained in the parts and which cannot therefore be explained by + them"}, i.e. operating as a complex non-linear system. +For example, it is well known, through usage, that the meaning of a +sentence does not completely depend on the meaning of each of the +words in it (see the previous chapter, point 4). + +Let us return to what we believe is fruitful in the approach we are +developing. +It would seem that, in the literature, the notion of implication index +is also not extended to the search for subjects and categories of +subjects responsible for associations. +Nor does it seem that this responsibility is quantified so as to lead to a +reciprocal structuring of all subjects, conditioned by their +relationships to variables. +We propose these extensions here after recalling the founding +paradigm. 
+ + +\section{Implication intensity in the binary case} + +\subsection{Fundamental and founding situation} + +A set $E$ of objects or subjects is crossed with variables +(characters, criteria, successes, ...) which are interrogated as +follows: "to what extent can we consider that instantiating variable\footnote{Throughout the book, the word "variable" refers either to an isolated variable in premise (example: "to be blonde") or to a conjunction of isolated variables (example: "to be blonde and to be under 30 years old and to live in Paris")} $a$ +implies instantiating variable $b$? +In other words, do the subjects tend to be $b$ if we know that they are +$a$?". +In natural, human or life sciences situations, where theorems (if $a$ +then $b$) in the deductive sense of the term cannot be established +because of the exceptions that taint them, it is important for the +researcher and the practitioner to "mine their data" in order to +identify sufficiently reliable rules (kinds of "partial theorems", +inductions) to be able to conjecture\footnote{"The exception confirms the rule", as the popular saying goes, in the sense that there would be no exceptions if there were no rule} a possible causal relationship, +a genesis, to describe and structure a population, and to assume +a certain stability for descriptive and, if possible, predictive +purposes. +But this excavation requires the development of methods to guide it +and to free it from trial and error and empiricism. + + +\subsection{Mathematization} + +To do this, following the example of the I.C. Lerman similarity +measurement method \cite{Lerman,Lermanb} and the classic +approach in non-parametric tests (e.g. Fisher, Wilcoxon, etc.), we +define~\cite{Grasb,Grasf} the confirmatory quality measure of the +implicative relationship $a \Rightarrow b$ from the implausibility of +the occurrence in the data of the number of cases that invalidate it, +i.e. for which $a$ is verified without $b$ being verified. 
This +amounts to measuring the gap between the contingency and the +theoretical value expected if only chance were at work\footnote{"...[in agreement with + Jung] if the frequency of coincidences does not significantly + exceed the probability that they can be calculated by attributing + them solely by chance to the exclusion of hidden causal + relationships, we certainly have no reason to suppose the existence + of such relationships.", H. Atlan~\cite{Atlana}}. +But when analyzing data, it is this gap that we take into account and +not the statement of rejection or acceptance of a null hypothesis. +This measure is relative to the number of data verifying $a$ and not +verifying $b$, the circumstance in which the implication is +precisely at fault. +It quantifies the expert's "astonishment" at the improbably small number +of counter-examples in view of the supposed independence between the +variables and the numbers involved. + +Let us be clear. A finite set $V$ of $v$ variables is given: $a$, $b$, +$c$, ... +In the classical paradigmatic situation, the one initially retained, these +are performances (success-failure) on the items of a questionnaire. +To a finite set $E$ of $n$ subjects $x$, functions of the type: $x +\rightarrow a(x)$ where $a(x) = 1$ (or $a(x) = true$) if $x$ satisfies +or has the character $a$ and $0$ (or $a(x) = false$) otherwise are +associated by abuse of notation. +In artificial intelligence, we will say that $x$ is an example or an +instance for $a$ if $a(x) = 1$ and a counter-example if not. + + +The $a \Rightarrow b$ rule is logically true if for any $x$ in the +sample, $b(x)$ is null only if $a(x)$ is also null; in other words if +set $A$ of the $x$ for which $a(x)=1$ is contained in set $B$ of the +$x$ for which $b(x)=1$. +However, this exact inclusion is only exceptionally observed in the +experiments encountered in practice. 
+In the case of a knowledge questionnaire, we could indeed observe a +few rare students passing an item $a$ and not passing item $b$, +without contesting the tendency to pass item $b$ when item $a$ has +been passed. +With regard to the cardinals of $E$ (of size $n$), but also of $A$ (or +$n_a$) and $B$ (or $n_b$), it is therefore the "weight" of the +counter-examples (or) that must be taken into account in order to +decide statistically whether or not to keep the quasi-implication or +quasi-rule $a \Rightarrow b$. Thus, it is from the dialectic of +examples and counter-examples that the rule appears as the overcoming of +contradiction. + +\subsection{Formalization} + +To formalize this quasi-rule, we consider any two parts $X$ and $Y$ of +$E$, chosen randomly and independently (absence of a priori link +between these two parts) and of the same respective cardinals as $A$ +and $B$. Let $\overline{Y}$ and $\overline{B}$ be the respective complements of $Y$ and $B$ in $E$, of the same cardinal $n_{\overline{b}}= n-n_b$. + +We will then say: +Definition 1: $a \Rightarrow b$ is acceptable at confidence level +$1-\alpha$ if and only if +$$Pr[Card(X\cap \overline{Y})\leq card(A\cap \overline{B})]\leq \alpha$$ + +\begin{figure}[htbp] + \centering +\includegraphics[scale=0.34]{chap2fig1.png} + \caption{The dark grey parts correspond to the counter-examples of the + implication $a \Rightarrow b$} +\label{chap2fig1} +\end{figure} + +It is established \cite{Lermanb} that, for a certain drawing process, +the random variable $Card(X\cap \overline{Y})$ follows the Poisson law +of parameter $\frac{n_a n_{\overline{b}}}{n}$. +We reach this same result by proceeding differently, in the following +way: + +Denote by $X$ (resp. $Y$) the random subset of binary transactions where +$a$ (resp. $b$) would appear, independently, with the frequency +$\frac{n_a}{n}$ (resp. $\frac{n_b}{n}$). 
+To specify how the transactions instantiating variables $a$ and $b$, +respectively $A$ and $B$, are drawn, we make, for example, the following +semantically admissible assumptions regarding the +observation of the event $[a=1~ and~ b=0]$. $(A\cap +\overline{B})$\footnote{We then note $\overline{v}$ the variable + negation of $v$ (or $not~ v$) and $\overline{P}$ the complementary + part of the part $P$ of $E$.} is the subset of transactions that are +counter-examples of the implication $a \Rightarrow b$. + +Assumptions: +\begin{itemize} +\item h1: the waiting times of an event $[a~ and~ not~ b]$ are independent + random variables; +\item h2: the law of the number of events occurring in the time + interval $[t,~ t+T[$ depends only on $T$; +\item h3: two such events cannot occur simultaneously. +\end{itemize} + +It is then demonstrated (for example in~\cite{Saporta}) that the +number of events occurring during a period of fixed duration $n$ +follows a Poisson law of parameter $c.n$ where $c$ is called the +rate of occurrence of the process per unit of time. + + +However, for each transaction assumed to be random, the event $[a=1]$ +has a probability estimated by the frequency $\frac{n_a}{n}$ and the event $[b=0]$ +has a probability estimated by the frequency $\frac{n_{\overline{b}}}{n}$; therefore the joint event $[a=1~ + and~ b=0]$ has a probability estimated by the product +$\frac{n_a}{n}. \frac{n_{\overline{b}}}{n}$ under the hypothesis of absence of an a priori link between $a$ and $b$ (independence). + +We can then estimate the rate $c$ of this event by $\frac{n_a}{n}. \frac{n_{\overline{b}}}{n}$. 
+ +Thus for a duration of time $n$, the occurrences of the event $[a~ and~ not~b]$ follow a Poisson law of parameter: +$$\lambda = \frac{n_a.n_{\overline{b}}}{n}$$ + +As a result, $Pr[Card(X\cap \overline{Y})= s]= e^{-\lambda}\frac{\lambda^s}{s!}$ + +Consequently, the probability that chance will lead, under the +assumption of the absence of an a priori link between $a$ and $b$, to +no more counter-examples than those observed is: + +$$Pr[Card(X\cap \overline{Y})\leq card(A\cap \overline{B})] = +\sum^{card(A\cap \overline{B})}_{s=0} e^{-\lambda}\frac{\lambda^s}{s!} $$ + + But other legitimate drawing processes lead to a binomial law, or + even a hypergeometric law (itself not semantically adapted to the + situation because of its symmetry). Under suitable convergence + conditions, these two laws finally reduce to the Poisson law + above (see Annex to this chapter). + +If $n_{\overline{b}}\neq 0$, we center and reduce this Poisson variable +into the variable: + +$$Q(a,\overline{b})= \frac{card(X \cap \overline{Y}) - \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}} $$ + +In the experimental realization, the observed value of +$Q(a,\overline{b})$ is $q(a,\overline{b})$. +It estimates a gap between the contingency $(card(A\cap +\overline{B}))$ and the value it would have taken if there had been +independence between $a$ and $b$. +
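+As a purely illustrative check of the formulas above, take hypothetical
+values (assumed for this example only, not taken from any data set of
+the book): $n=100$, $n_a=20$, $n_b=60$, hence $n_{\overline{b}}=40$, and
+$card(A\cap \overline{B})=3$ observed counter-examples. Then
+$$\lambda = \frac{20 \times 40}{100} = 8, \qquad
+q(a,\overline{b}) = \frac{3-8}{\sqrt{8}} \approx -1.77,$$
+$$Pr[Card(X\cap \overline{Y})\leq 3] = e^{-8}\left(1 + 8 + \frac{8^2}{2!} + \frac{8^3}{3!}\right) \approx 0.042.$$
+With these assumed numbers, the rule $a \Rightarrow b$ would be
+acceptable in the sense of Definition 1 at confidence level $0.95$,
+since $0.042 \leq 0.05$.
+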