new

author couturie <you@example.com>

Sat, 9 Mar 2019 21:25:07 +0000 (22:25 +0100)

committer couturie <you@example.com>

Sat, 9 Mar 2019 21:25:07 +0000 (22:25 +0100)
diff --git a/book.tex b/book.tex

index e7afb00b746057c9f57ac478f569cc04315be6d2..0cdc1fd2cac72b2beb6add5fce4fe138f31fdd60 100644 (file)
--- a/book.tex
+++ b/book.tex
@@ -30,6 +30,7 @@
  
  \usepackage{newtxtext}       % 
  \usepackage{newtxmath}       % selects Times Roman as basic font
+\usepackage{diagbox}
  
  % see the list of further useful packages
  % in the Reference Guide
diff --git a/chapter2.tex b/chapter2.tex

index f4923083dff091fc8903a7190b501eb84186d035..096f51f7d1d2ab5ac234e26997f9117c6303de78 100644 (file)
--- a/chapter2.tex
+++ b/chapter2.tex
@@ -215,7 +215,8 @@ between these two parts) and of the same respective cardinals as $A$
  and $B$. Let $\overline{Y}$ and $\overline{B}$ be the respective complementary of $Y$ and $B$ in $E$ of the same cardinal $n_{\overline{b}}= n-n_b$.
  
  We will then say:
-Definition 1: $a \Rightarrow b$ is acceptable at confidence level
+
+\definition $a \Rightarrow b$ is acceptable at confidence level
  $1-\alpha$ if and only if
  $$Pr[Card(X\cap \overline{Y})\leq card(A\cap \overline{B})]\leq \alpha$$
  
@@ -297,3 +298,140 @@ It estimates a gap between the contingency $(card(A\cap
  \overline{B}))$ and the value it would have taken if there had been
  independence between $a$ and $b$.
  
+\definition $$q(a,\overline{b}) = \frac{n_{a \wedge \overline{b}}-  \frac{n_a.n_{\overline{b}}}{n}}{\sqrt{\frac{n_a.n_{\overline{b}}}{n}}}$$
+is called the implication index; it is used as an indicator of
+the non-implication of $a$ to $b$.
+In cases where the approximation is justified (for example
+$\frac{n_a.n_{\overline{b}}}{n}\geq 4$), the variable
+$Q(a,\overline{b})$ approximately follows the standard normal
+distribution. The intensity of implication, measuring the quality of
+$a\Rightarrow b$, for $n_a\leq n_b$ and $n_b \neq n$, is then defined
+from the index $q(a,\overline{b})$ by:
+
+\definition 
+The implication intensity that measures the inductive quality of $a$
+over $b$ is:
+$$\varphi(a,b)=1-Pr[Q(a,\overline{b})\leq q(a,\overline{b})] =
+\frac{1}{\sqrt{2 \pi}} \int^{\infty}_{ q(a,\overline{b})}
+e^{-\frac{t^2}{2}} dt,~ \mbox{if}~ n_b \neq n$$
+$$\varphi(a,b)=0,~ \mbox{otherwise}$$
+As a result, the definition of statistical implication becomes:
+\definition 
+Implication  $a\Rightarrow b$ is admissible at confidence level
+$1-\alpha $ if and only if: 
+$$\varphi(a,b)\geq 1-\alpha$$
+
+
+It should be recalled that this modeling of quasi-implication measures
+the astonishment at observing so few counter-examples relative to
+the surprising number of instances of the implication.
+It is a measure of the inductive and informative quality of
+implication. Therefore, if the rule is trivial, as in the case where
+$B$ is very large or coincides with $E$, this astonishment becomes
+small.
+We also demonstrate~\cite{Grasf} that this triviality results in a
+very low or even zero intensity of implication: if, $n_a$ being fixed
+and $A$ being included in $B$, $n_b$ tends towards $n$ ($B$ ``grows''
+towards $E$), then $\varphi(a,b)$ tends towards $0$. We therefore
+define, by ``continuity'': $\varphi(a,b) = 0$ if $n_b = n$. Similarly, if
+$A\subset B$, $\varphi(a,b)$ may be less than $1$ in the case where
+the inductive confidence, measured by statistical surprise, is
+insufficient.
+
+{\bf \remark Total correlation, partial correlation}
+
+
+We take here the notion of correlation in a more general sense than
+that used in the domain that develops the linear correlation
+coefficient (linear link measure) or the correlation ratio (functional
+link measure).
+In our perspective, there is a total (or partial) correlation between
+two variables $a$ and $b$ when the respective events they determine
+occur (or almost occur) at the same time, as well as their opposites.
+However, we know from numerical counter-examples that correlation and
+implication do not reduce to each other: there can be
+correlation without implication and vice versa (see~\cite{Grasf} and below).
+If we compare the implication coefficient and the linear correlation
+coefficient algebraically, it is clear that the two concepts do not
+coincide and therefore do not provide the same
+information\footnote{``More serious is the logical error of inferring
+  the existence of a causality from an observed correlation'' writes Albert
+  Jacquard in~\cite{Jacquard}, p.159.}.
+
+The quasi-implication, whose index $q(a,\overline{b})$ is non-symmetric, does
+not coincide with the correlation coefficient $\rho(a, b)$, which is
+symmetric and which reflects the relationship between variables $a$ and
+$b$. Indeed, we show~\cite{Grasf} that if $q(a,\overline{b}) \neq 0$
+then
+$$\frac{\rho(a,b)}{q(a,\overline{b})} = -\sqrt{\frac{n}{n_b
+    n_{\overline{a}}}}$$
+When correlation is understood as linear correlation, even if
+correlation and implication tend to go in the same direction, the
+orientation of the relationship between two variables is not apparent
+because the coefficient is symmetric, which is not the stance taken
+in SIA.
+From a statistical relationship given by the correlation, two opposing
+empirical propositions can be deduced.
+
+The following dual numerical situation clearly illustrates this:
+
+
+\begin{table}[htp]
+\centering
+\begin{tabular}{|l|c|c|c|}\hline
+\diagbox[width=4em]{$a_1$}{$b_1$}&
+  1 & 0 & total\\ \hline
+  1 & 96 & 4& 100 \\ \hline
+  0 & 56 & 44& 100 \\ \hline
+  total & 152 & 48& 200 \\ \hline
+\end{tabular} ~ ~ ~ ~ ~ ~ ~ \begin{tabular}{|l|c|c|c|}\hline
+\diagbox[width=4em]{$a_2$}{$b_2$}&
+  1 & 0 & total\\ \hline
+  1 & 94 & 6& 100 \\ \hline
+  0 & 52 & 48& 100 \\ \hline
+  total & 146 & 54& 200 \\ \hline
+\end{tabular}
+
+\caption{Numerical example of the difference between implication and
+  correlation}
+\label{chap2tab1}
+\end{table}
+
+In Table~\ref{chap2tab1}, the following correlations and implication
+indices can be computed:\\
+Correlation $\rho(a_1,b_1)=0.468$, implication index
+$q(a_1,\overline{b_1})=-4.082$\\
+Correlation $\rho(a_2,b_2)=0.473$, implication index $q(a_2,\overline{b_2})=-4.041$
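+
+As a quick check, and assuming that $\rho$ denotes the linear
+correlation coefficient of the two binary variables, the values
+reported for the second pair can be recovered from the counts $n=200$,
+$n_{a_2}=100$, $n_{b_2}=146$, $n_{\overline{b_2}}=54$ and
+$n_{a_2 \wedge \overline{b_2}}=6$:
+$$q(a_2,\overline{b_2}) = \frac{6-\frac{100 \times 54}{200}}{\sqrt{\frac{100 \times 54}{200}}} = \frac{6-27}{\sqrt{27}} \approx -4.041$$
+$$\rho(a_2,b_2) = \frac{200 \times 94 - 100 \times 146}{\sqrt{100 \times 100 \times 146 \times 54}} = \frac{4200}{\sqrt{78840000}} \approx 0.473$$
+Since $\frac{n_{a_2}.n_{\overline{b_2}}}{n}=27\geq 4$, the normal
+approximation applies, and the corresponding intensity
+$\varphi(a_2,b_2)$ exceeds $0.9999$.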
+
+
+Thus, we observe that, on the one hand, $a_1$ and $b_1$ are less
+correlated than $a_2$ and $b_2$ while, on the other hand, the
+implication intensity of $a_1$ over $b_1$ is higher than that of $a_2$
+over $b_2$, since $q(a_1,\overline{b_1}) < q(a_2,\overline{b_2})$.
+
+On this subject, Alain Ehrenberg in~\cite{Ehrenberg} writes: ``The
+finding of a correlation does not remove the ambiguity between `when I do $X$, my brain is in state $Y$' and `if I do $X$, it is because my brain is in state $Y$', that is, between something that happens in my brain when I do an action.''
+
+\remark  Remember that we consider not only conjunctions of variables
+of the type ``$a$ and $b$'' but also disjunctions such as ``($a$ and $b$)
+or $c$...'' in order to model phenomena that are concepts, as is done
+in learning or in artificial intelligence.
+The associated calculations remain compatible with the logic of
+propositions linked by connectives.
+
+\remark Unlike the Loevinger Index~\cite{Loevinger}  and conditional
+probability $(Pr[B/A])=1$ and all its derivatives, the implication
+intensity varies, non-linearly, with the expansion of sets $E$, $A$
+and $B$ and weakens with triviality (see Definition 2.3).
+Moreover, it
+is resistant to noise, especially around $0$ for, which can only make
+the relationship we want to model and establish statistically
+credible.
+Finally, as we have seen, the inclusion of $A$ in $B$ does not ensure
+maximum intensity, since the inductive quality may not be strong, whereas
+$Pr[B/A]$ is equal to $1$~\cite{Grasm,Guillet}.
+In paragraph 5, we study more closely the sensitivity
+and stability of the implication index under small
+variations in the parameters involved, through the study of its
+differential.
+
diff --git a/references.tex b/references.tex

index e11ef546735cc79318e8655cf9daf00b60399b50..faba5e301de3243b676cbccc5305ec24a4f3dca3 100644 (file)
--- a/references.tex
+++ b/references.tex
@@ -87,6 +87,8 @@
    des Sciences dures aux Sciences Humaines et Sociales, R. Gras (dir.), Cépaduès Ed. Toulouse, p. 339-448, ISBN: 978.2.36493.577.8.
  
  
+\bibitem{Ehrenberg} Ehrenberg A. (2008) Sciences Humaines, n° 198, nov. 2008. 
+  
    
  \bibitem{Espagnat} d’Espagnat B. (1981) A la recherche du réel, Le
    regard d’un physicien, Paris.
@@ -99,6 +101,8 @@
  
  \bibitem{Gaudin} Gaudin F.  (2005) Emergence, complexité et dialectique, Paris, O.Jacob.
  
+
+  
  \bibitem{Grasa}  Gras R. (1976) Recherche d’une taxonomie d’objectifs cognitifs en Mathématiques, I.R.E.M. de Rennes. 
  
  \bibitem{Grasb} Gras R. (1979) Contribution à l'étude expérimentale et à l'analyse de certaines acquisitions cognitives et de certains objectifs didactiques en mathématiques, Thèse d'Etat, Université de Rennes I.
@@ -171,6 +175,11 @@ l’implication et la confiance?, L’Analyse Statistique Implicative,
  des Sciences dures aux Sciences Humaines et Sociales, R.Gras (dir.),
  Cépaduès Ed. Toulouse, p. 195-208, ISBN: 978.2.36493.577.8.
  
+\bibitem{Guillet} Guillet, F., Hamilton, H. J.  (2007) Quality measures in data mining (Vol. 43). Springer.
+
+
+\bibitem{Jacquard} Jacquard A. (2001) La science à l’usage des non-scientifiques, p.159.
+
  
  
  \bibitem{Kodratoff} Kodratoff Y. (2000) Extraction de connaissances à partir des données