diff --git a/chapter2.tex b/chapter2.tex
index c99fe46..f492308 100644
--- a/chapter2.tex
+++ b/chapter2.tex
@@ -107,4 +107,193 @@ implication index for binary data~\cite{Lermana} or \cite{Lallich}, on
 the other hand, this notion is not extended to other types of
 variables, to extraction and representation according to a rule graph
 or a hierarchy of meta-rules; structures aiming at access to the
-meaning of a whole not reduced to the sum of its parts \footnote{ICI }, i.e. operating as a complex non-linear system. For example, it is well known, through usage, that the meaning of a sentence does not completely depend on the meaning of each of the words in it (see the previous chapter, point 4). 
+meaning of a whole not reduced to the sum of its
+parts~\cite{Seve}\footnote{This is what the philosopher L. Sève
+  emphasizes: "... in the non-additive, non-linear passage of the
+  parts to the whole, there are properties that are in no way
+  precontained in the parts and which cannot therefore be explained by
+  them" }, i.e. operating as a complex non-linear system.
+For example, it is well known, through usage, that the meaning of a
+sentence does not completely depend on the meaning of each of the
+words in it (see the previous chapter, point 4).
+
+Let us return to what we believe is fertile in the approach we are
+developing.
+It would seem that, in the literature, the notion of implication index
+has also not been extended to the search for the subjects and categories
+of subjects responsible for associations,
+nor has this responsibility been quantified so as to lead to a
+reciprocal structuring of all subjects, conditioned by their
+relationships to variables.
+We propose these extensions here after recalling the founding
+paradigm.
+
+
+\section{Implication intensity in the binary case}
+
+\subsection{Fundamental and founding situation}
+
+A set $E$ of objects or subjects is crossed with variables
+(characters, criteria, successes,...) which are interrogated as
+follows: "to what extent can we consider that instantiating variable\footnote{Throughout the book, the word "variable" refers either to an isolated variable in premise (example: "to be blonde") or to a conjunction of isolated variables (example: "to be blonde and to be under 30 years old and to live in Paris").} $a$
+implies instantiating variable $b$?
+In other words, do the subjects tend to be $b$ if we know that they are
+$a$?".
+In natural, human or life sciences situations, where theorems (if $a$
+then $b$) in the deductive sense of the term cannot be established
+because of the exceptions that taint them, it is important for the
+researcher and the practitioner to "mine into their data" in order to
+identify sufficiently reliable rules (kinds of "partial theorems",
+inductions) to be able to conjecture\footnote{"The exception confirms the rule", as the popular saying goes, in the sense that there would be no exceptions if there were no rule} a possible causal relationship,
+a genesis, to describe, structure a population and make the assumption
+of a certain stability for descriptive and, if possible, predictive
+purposes.
+But this excavation requires the development of methods to guide it
+and to free it from trial and error and empiricism.
+
+
+\subsection{Mathematization}
+
+To do this, following the example of I.C. Lerman's similarity
+measure~\cite{Lerman,Lermanb} and the classic
+approach of non-parametric tests (e.g. Fisher, Wilcoxon, etc.), we
+define~\cite{Grasb,Grasf} the confirmatory quality measure of the
+implicative relationship $a \Rightarrow b$ from the implausibility of
+the occurrence in the data of the number of cases that invalidate it,
+i.e. for which $a$ is verified without $b$ being verified. This
+amounts to comparing the observed count with the theoretical count
+that would be expected if only chance occurred\footnote{"...[in agreement with
+    Jung] if the frequency of coincidences does not significantly
+  exceed the probability that they can be calculated by attributing
+  them solely by chance to the exclusion of hidden causal
+  relationships, we certainly have no reason to suppose the existence
+  of such relationships.", H. Atlan~\cite{Atlana}}.
+But when analyzing data, it is this gap that we take into account and
+not the mere statement that a null hypothesis is rejected or retained.
+This measure is relative to the number of data verifying $a$ and not
+$b$, that is, the circumstance in which the implication is
+precisely found at fault.
+It quantifies the expert's "astonishment" at the unlikely small number
+of counter-examples in view of the supposed independence between the
+variables and the numbers involved.
+
+Let us be clear. A finite set $V$ of $v$ variables is given: $a$, $b$,
+$c$,...
+In the classical paradigmatic situation, the one initially retained,
+the variables are performances (success or failure) on the items of a questionnaire.
+With a finite set $E$ of $n$ subjects $x$ we associate, by abuse of
+notation, functions of the type $x \rightarrow a(x)$, where $a(x) = 1$
+(or $a(x) = true$) if $x$ satisfies or has the character $a$, and
+$a(x) = 0$ (or $a(x) = false$) otherwise.
+In artificial intelligence, we will say that $x$ is an example or an
+instance for $a$ if $a(x) = 1$ and a counter-example if not.
+
+
+The $a \Rightarrow b$ rule is logically true if for any $x$ in the
+sample, $b(x)$ is null only if $a(x)$ is also null; in other words if
+set $A$ of the $x$ for which $a(x)=1$ is contained in set $B$ of the
+$x$ for which $b(x)=1$.
+However, this strict inclusion is only exceptionally observed in
+experiments encountered in practice.
+In the case of a knowledge questionnaire, we could indeed observe a
+few rare students passing an item $a$ and not passing item $b$,
+without contesting the tendency to pass item $b$ when we have passed
+item $a$.
+With regard to the cardinals of $E$ (of size $n$), but also of $A$ (or
+$n_a$) and $B$ (or $n_b$), it is therefore the "weight" of the
+counter-examples that must be taken into account in order to
+decide statistically whether or not to keep the quasi-implication or
+quasi-rule  $a \Rightarrow b$.  Thus, it is from the dialectic of
+example-counter-examples that the rule appears as the overcoming of
+contradiction.
+
+\subsection{Formalization}
+
+To formalize this quasi-rule, we consider any two parts $X$ and $Y$ of
+$E$, chosen randomly and independently (absence of a priori link
+between these two parts) and of the same respective cardinals as $A$
+and $B$. Let $\overline{Y}$ and $\overline{B}$ be the respective complements of $Y$ and $B$ in $E$, both of cardinal $n_{\overline{b}}= n-n_b$.
+
+We will then say:
+Definition 1: $a \Rightarrow b$ is acceptable at confidence level
+$1-\alpha$ if and only if
+$$Pr[Card(X\cap \overline{Y})\leq card(A\cap \overline{B})]\leq \alpha$$
+
+\begin{figure}[htbp]
+  \centering
+\includegraphics[scale=0.34]{chap2fig1.png}
+ \caption{The dark grey parts correspond to the counter-examples of the
+   implication $a \Rightarrow b$}
+\label{chap2fig1}      
+\end{figure}
+
+It is established \cite{Lermanb} that, for a certain drawing process,
+the random variable $Card(X\cap \overline{Y})$ follows the Poisson law
+of parameter $\frac{n_a n_{\overline{b}}}{n}$.
+We achieve this same result by proceeding differently in the following
+way:
+
+Denote by $X$ (resp. $Y$) the random subset of binary transactions where
+$a$ (resp. $b$) would appear, independently, with the frequency
+$\frac{n_a}{n}$ (resp. $\frac{n_b}{n}$).
+To specify how the transactions in which variables $a$ and $b$ are
+present, respectively $A$ and $B$, are extracted, the following
+semantically permissible assumptions are made, for example, regarding the
+observation of the event $[a=1~ and~ b=0]$, where $(A\cap
+\overline{B})$\footnote{We then note $\overline{v}$ the negation of
+  the variable $v$  (or $not~ v$) and $\overline{P}$ the complement
+  of the part $P$ of $E$.} is the subset of transactions that are
+counter-examples of the implication $a \Rightarrow b$:
+
+Assumptions:
+\begin{itemize}
+\item h1: the waiting times of an event $[a~ and~ not~ b]$ are independent
+  random variables;
+\item h2: the law of the number of events occurring in the time
+  interval $[t,~ t+T[$ depends only on $T$;
+\item h3: two such events cannot occur simultaneously.
+\end{itemize}
+
+It is then demonstrated (for example in~\cite{Saporta}) that the
+number of events occurring during a period of fixed duration $n$
+follows a Poisson law of parameter $c.n$ where $c$ is called the
+rate of occurrence of the process per unit of time.
+
+
+However, for each transaction assumed to be random, the event $[a=1]$
+has as probability the frequency $\frac{n_a}{n}$ and the event $[b=0]$
+has as probability the frequency $\frac{n_{\overline{b}}}{n}$; therefore the joint event $[a=1~
+  and~ b=0]$ has a probability estimated by the frequency
+$\frac{n_a}{n}. \frac{n_{\overline{b}}}{n}$ under the hypothesis of absence of an a priori link between $a$ and $b$ (independence).
+
+We can then estimate the rate $c$ of this event by $\frac{n_a}{n}. \frac{n_{\overline{b}}}{n}$.
+
+Thus for a duration of time $n$, the occurrences of the event $[a~ and~ not~b]$ follow a Poisson law of parameter:
+$$\lambda = \frac{n_a.n_{\overline{b}}}{n}$$
+
+As a result, $Pr[Card(X\cap \overline{Y})= s]= e^{-\lambda}\frac{\lambda^s}{s!}$
+
+Consequently, the probability that chance alone will lead, under the
+assumption of the absence of an a priori link between $a$ and $b$, to
+no more counter-examples than those observed is:
+
+$$Pr[Card(X\cap \overline{Y})\leq card(A\cap \overline{B})] =
+\sum^{card(A\cap \overline{B})}_{s=0}  e^{-\lambda}\frac{\lambda^s}{s!} $$
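+
+As a purely illustrative check, with hypothetical counts that are not
+taken from any data set, suppose $n=100$, $n_a=20$, $n_b=60$ (hence
+$n_{\overline{b}}=40$) and $card(A\cap \overline{B})=3$ observed
+counter-examples. Then $\lambda = \frac{20 \times 40}{100} = 8$ and
+$$\sum^{3}_{s=0} e^{-8}\frac{8^s}{s!} = e^{-8}\left(1 + 8 + 32 +
+\frac{256}{3}\right) \approx 0.042$$
+so that, under these assumed values, $a \Rightarrow b$ would be
+acceptable at confidence level $0.95$ in the sense of Definition 1.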
+
+ But other legitimate drawing processes lead to a binomial law, or
+ even a hypergeometric law (itself not semantically adapted to the
+ situation because of its symmetry). Under suitable convergence
+ conditions, these two laws are finally reduced to the Poisson Law
+ above (see Annex to this chapter).
+ 
+If $n_{\overline{b}}\neq 0$, we center and reduce this Poisson variable
+into the variable:
+
+$$Q(a,\overline{b})= \frac{Card(X \cap \overline{Y}) -  \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}}  $$
+
+In the experimental realization, the observed value of
+$Q(a,\overline{b})$ is $q(a,\overline{b})$.
+It measures the gap between the observed count $card(A\cap
+\overline{B})$ and the value it would have taken if there had been
+independence between $a$ and $b$.
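+
+For illustration only, with hypothetical counts $n=100$, $n_a=20$,
+$n_{\overline{b}}=40$ and $card(A\cap \overline{B})=3$ (values assumed
+here, not drawn from any data set), the expected number of
+counter-examples under independence is $\frac{20 \times 40}{100}=8$ and
+$$q(a,\overline{b}) = \frac{3 - 8}{\sqrt{8}} \approx -1.77$$
+A negative value indicates fewer counter-examples than independence
+would predict.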
+