notes-math-statistics

---

"John Ioannidis gives a hypothetical but realistic example in the paper mentioned earlier (*). In his example, he supposes that 100,000 gene polymorphisms are being tested for association with schizophrenia. If 10 polymorphisms truly are associated with schizophrenia, the pre-study probability that a given gene is associated is 0.0001. If a study has 60% power (β = 0.4) and significance level α = 0.05, the post-study probability that a polymorphism determined to be associated really is associated is 0.0012. That is, a gene reported to be associated with schizophrenia is 12 times more likely to actually be associated with the disease than a gene chosen at random. However, the bad news is that 12 times 0.0001 is only 0.0012. There’s a 99.8% chance that the result is false." -- http://www.johndcook.com/blog/2008/12/06/why-microarray-studies-are-often-wrong/
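the arithmetic in that example is easy to check directly; a minimal sketch (the 0.0001, 0.6, and 0.05 figures are from the quote):

```python
# post-study probability (positive predictive value) for the Ioannidis example
prior = 10 / 100_000   # pre-study probability a tested polymorphism is truly associated
power = 0.6            # 1 - beta, with beta = 0.4
alpha = 0.05           # significance level

# P(truly associated | declared significant), by Bayes' theorem
ppv = power * prior / (power * prior + alpha * (1 - prior))
print(round(ppv, 4))        # ~0.0012, i.e. a ~99.8% chance the reported association is false
print(round(ppv / prior))   # ~12x more likely than a gene chosen at random
```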

perhaps see also the comments on that blog post?

---

see also [self-notes-someStuffYouMightLikeToKnow], statistics section and probability section.

conjugate families

a table:

http://fisher.osu.edu/~schroeder.9/AMIS900/ech6.pdf

bayesian simulation and regression

http://fisher.osu.edu/~schroeder.9/AMIS900/ech7.pdf

http://fisher.osu.edu/~schroeder.9/AMIS900/ech8.pdf

missing information

http://fisher.osu.edu/~schroeder.9/AMIS900/ech11.pdf


heavy-tailed distributions

there seems to be a relation between heavy-tailedness and non-finiteness of moments

" Mandelbrot and Taleb pointed out that although one can assume that the odds of finding a person who is several miles tall are extremely low, similar excessive observations can not be excluded in other areas of application. They argued that while traditional bell curves may provide a satisfactory representation of height and weight in the population, they do not provide a suitable modeling mechanism for market risks or returns, where just ten trading days represent 63 per cent of the returns of the past 50 years. " -- https://en.wikipedia.org/wiki/Seven_states_of_randomness

mild/slow/wild

Mandelbrot divides probability distributions into 3 categories, "mild", "slow", and "wild" (and 7 subcategories).

mild: non-concentration in both the short-run and the long-run: e.g. Gaussian (aside: Mandelbrot doesn't like the term 'normal' because he finds heavy-tailed distributions to be quite common). Exponential distributions are 'borderline mild' because if you take two samples they tend to be of different scales, but if you take many samples they tend to be of similar scale.

slow: short-run: concentration. Long-run: non-concentration. Example: lognormal.

wild: concentration in both the short-run and the long-run. Equivalently (I think): some moments are infinite. Example: Pareto.

(note: I think Mandelbrot holds that it is impossible to have long-run concentration with short-run non-concentration, but I'm not sure about this)

Mandelbrot makes a metaphor from mild/slow/wild to gas/liquid/solid respectively. He says gas/liquid/solid is often defined by two properties: constant volume (gases have non-constant volume, liquids and solids have constant volume) and flowing (gases and liquids flow, solids do not). He maps constant volume to short-run concentration, and maps flowing to long-run non-concentration.

The seven categories: see https://en.wikipedia.org/wiki/Seven_states_of_randomness

Note that the log of a Pareto-distributed random variable (with minimum param = 1) is exponentially distributed, and the log of a lognormal is normally distributed. This provides an interesting relation/'construction blueprint' for building slow and wild distributions out of mild ones.
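both facts can be checked by simulation; a rough sketch using scipy (the shape parameters 3 and 1 are arbitrary choices):

```python
import numpy as np
from scipy import stats

b = 3.0  # arbitrary Pareto shape; scipy's pareto has its minimum param fixed at 1
x = stats.pareto.rvs(b, size=100_000, random_state=0)

# log of Pareto(minimum 1, shape b) should be Exponential with rate b (scale 1/b) ...
_, p_exp = stats.kstest(np.log(x), 'expon', args=(0, 1 / b))

# ... and log of a lognormal should be normal
y = stats.lognorm.rvs(s=1.0, size=100_000, random_state=1)
_, p_norm = stats.kstest(np.log(y), 'norm')

print(p_exp, p_norm)  # both p-values should be unremarkable (not tiny), since both nulls are true
```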

How is the central limit theorem escaped to make heavy-tailed distributions? Two ways (probably among many): drop the finite-variance assumption (sums then converge to stable laws rather than to the Gaussian, per the generalized central limit theorem), or drop the independence of the summands.
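one concrete escape: the Cauchy distribution has no finite mean or variance, and the average of n i.i.d. standard Cauchy draws is itself standard Cauchy for every n, so averaging never Gaussianizes. A quick simulation sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 1000, 5000

# 5000 sample means, each averaging 1000 standard Cauchy draws
means = rng.standard_cauchy((reps, n)).mean(axis=1)

# the means are still standard Cauchy (the KS test should not reject) ...
_, p_cauchy = stats.kstest(means, 'cauchy')
# ... and are wildly non-Gaussian: the empirical spread of the means stays huge
print(p_cauchy, means.std())
```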

Links:

some heavy-tailed distributions

(are these distinct, or synonyms/subsets of each other?)

pareto, generalized pareto (includes pareto, lomax), levy, cauchy

---

todo

factor analysis and minimum sample sizes:

https://www.encorewiki.org/display/~nzhao/The+Minimum+Sample+Size+in+Factor+Analysis

one should perform a power analysis and worry that, even if the significance threshold p-value is pr

" It wasn't until some years later that I discovered (mind you, not invented) power analysis, one of whose fruits was the revelation that for a two-independent-group-mean comparison with n = 30 per group at the sanctified two-tailed .05 level, the probability that a medium-sized effect would be labeled as significant by the most modern methods (a t test) was only .47. Thus, it was approximately a coin flip whether one would get a significant result, even though, in reality, the effect size was meaningful "
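the .47 figure can be roughly reproduced from the noncentral t distribution; a sketch assuming 'medium effect' means Cohen's d = 0.5 (the exact value comes out near 0.47-0.48, depending on the approximation used):

```python
import numpy as np
from scipy import stats

d, n, alpha = 0.5, 30, 0.05       # medium effect, n per group, two-tailed alpha
df = 2 * n - 2                    # degrees of freedom for a two-sample t test
nc = d * np.sqrt(n / 2)           # noncentrality parameter for equal group sizes
t_crit = stats.t.ppf(1 - alpha / 2, df)

# power = P(|T| > t_crit) when T follows the noncentral t alternative
power = stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)
print(round(power, 3))  # close to Cohen's .47: roughly a coin flip
```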

" The problem is that, as practiced, current research hardly reflects much attention to power. How often have you seen any mention of power in the journals you read, let alone an actual power analysis in the methods sections of the articles? Last year in Psychological Bulletin, Sedlmeier and Gigerenzer (1989) published an article entitled "Do Studies of Statistical Power Have an Effect on the Power of Studies?". The answer was no. Using the same methods I had used on the articles in the 1960 Journal of Abnormal and Social Psychology (Cohen, 1962), they performed a power analysis on the 1984 Journal of Abnormal Psychology and found that the median power under the same conditions was .44, a little worse than the .46 I had found 24 years earlier. It was worse still (.37) when they took into account the occasional use of an experimentwise alpha criterion. Even worse than that, in some 11% of the studies, research hypotheses were framed as null hypotheses and their nonsignificance interpreted as confirmation. The median power of these studies to detect a medium effect at the two-tailed .05 level was .25! These are not isolated results: Rossi, Rossi, and Cottrill (in press), using the same methods, did a power survey of the 142 articles in the 1982 volumes of the Journal of Personality and Social Psychology and the Journal of Abnormal Psychology and found essentially the same results. "

https://speakerdeck.com/jakevdp/statistics-for-hackers?slide=109 describes a method for model order selection based on "difference in mean-squared error follows the chi-squared distribution" and "can estimate degrees of freedom easily because the models are nested". Find out more about that.
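my reading of that idea, sketched under the assumption that the noise level sigma is known: for nested least-squares models, the drop in residual sum of squares (divided by sigma^2) is chi-squared under the simpler model, with degrees of freedom equal to the difference in parameter counts.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sigma = 1.0
x = np.linspace(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, sigma, x.size)   # truth is linear (degree 1)

def rss(degree):
    """Residual sum of squares of a least-squares polynomial fit."""
    resid = y - np.polyval(np.polyfit(x, y, degree), x)
    return np.sum(resid ** 2)

# the degree-2 model nests the degree-1 model, so its RSS is never larger;
# under the simpler model, the scaled RSS drop is chi-squared with dof = 1 (one extra parameter)
delta = (rss(1) - rss(2)) / sigma ** 2
p = stats.chi2.sf(delta, 1)
print(delta, p)  # a large p-value means no evidence the quadratic term is needed
```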