Python_data_science_第五课

conditional probability and Bayes’ Theorem 这两个理论看的我头晕。

conditional probability
if I have two events that depend on each other, what’s the probability that both will occur.
Notation:
P(A,B) is the probability of A and B both occuring independent of each other. P(B|A): probability of B given A has already occurred.

we know: P(B|A) = P(A,B) / P(A)

Example: I give my students two tests, 60% of my students passed both tests, but the first test was easier, -80% passwd that one. what percentage of students who passed the first test also passed the second ? A= passing the first B= passing the second so we are asking for P(B|A) - probability of B given A

P(B|A) = P(A,B) / P(A) = 0.6 / 0.8 = 0.75
75% of (students who passed the first test) also passed the second.

Question ,What about P(A|B), probability of A given B has already occurred. we calculate the students passed the second test also passed the first test. P(A|B) is NOT equal to P(B|A)
Bayes’ Theorem
Now that you understand conditional probability, you can understand Bayes’ Theorem:

P(A|B) = P(A)P(B|A) / P(B)
In english: The probability of A given B, is the Probability of A times the probability of B given A over the probability of B.

The key insight is that the probability of something that depends on B depends very much on the base probability of B and A.

Drug testing is a common example. Even a “highly accurate” drug test can produce more false positives than true positives.
Let’s say we have a drug test that can accurately identify users of a drug 99% of the time
and accurately has a negative result for 99% of non-users. But only 0.3% of the overall population actually uses this drug.

A= is a user of the drug
B = test positively for the drug

P(A)=0.3% = 0.003
We can work out from that information that P(B) is 1.3% ( 0.99 * 0.03 + 0.01 * 0.997 )
The probability of testing positive if you do use, plus the probability of testing positive if you don’t .

P(A B)= P(A) * P(B A) / P(B) = 0.003 * 0.99 / 0.013 = 22.8%

So the odds of someone being an actual user of the drug given that they test positive is only 22.8%.
Even though P(B|A) is 99%, P(A|B) is only 22.8%.
看完Bayes’ Theorem, 对很多所谓的统计数字能有些新的认识。
参考web page .
https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/

里面的例子是cancer test.
%1的人有癌症， 99%没有
80%有癌症的人能得到正确的结果，20%人即使有cancer 可能也会miss.
没有癌症的人9.6%的几率会被误诊为癌症， 90.4%的概率会得到自己没有癌症的诊断。

如果一个人诊断结果是癌症，那么他真的有癌症的几率是多少呢？
our chance of cancer is .008/.10304 = 0.0776, or about 7.8%.

PREVIOUSPython_data_science_第四课

NEXTPython_data_science_第六课