statistics vol 1 - descriptive intro

Hola hola mis amigos,

Hope you are having a great Saturday. Mine is partly spent with statistics, so I would like to share with you some things that I have learned so far and their applications. I am rather pragmatic, which is why I pay attention to the application every time :D. As much as I like a theoretical approach and mind exercises, being able to use the things you know is a great experience, and I am always very excited when it happens to me hah. Okay, after this brief introduction let’s focus on today’s topic: statistics!

Statistics is a mathematical discipline that provides tools to interpret observations about some quality or qualities measured in a random sample. We might look at the distribution of this sample and at some characteristics that highlight interesting values for us. For example, we can take a sample of n random people who are all football players, analyze their height, weight or BMI together with their goal stats, and see whether we can say something about the dependence between those qualities.

The most basic field of statistics is descriptive statistics. In general, we have some data that we want to visualize, or from which we want to extract some characteristics. Visualizing data can be done with bar charts, pie charts or any other type of chart – but this is trivial, all of you know that already :D. What I want to cover in more detail is the characteristics of the data and of random samples. We have a couple of types of characteristics:

Measures of central tendency (measures of location)

They answer the question: what is the most typical value for that sample? As measures of central tendency we consider:

Arithmetic Mean – pros: every value has equal worth, no value is favored; cons: no resistance to outliers.
Median – pros: resistant to outliers; cons: it ignores how far the other values lie from the middle of the sample.
Mode – the most frequently appearing observation.
Truncated Mean – the arithmetic mean, but without the k smallest and the k largest observations of the ordered sample.
Winsorized Mean – the arithmetic mean, but instead of dropping the k extreme observations on each side we replace them with the nearest remaining value: the (k+1)-th observation on the left side and the (n-k)-th observation on the right side of the ordered sample.
Weighted Arithmetic Mean – the arithmetic mean in which each value is multiplied by its weight and the sum is divided by the sum of the weights.
Geometric Mean – the n-th root of the product of the n values from the sample.
Harmonic Mean – the multiplicative inverse of the arithmetic mean of the multiplicative inverses of the values.
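To see how these measures react to an outlier, here is a minimal sketch using only Python's standard `statistics` module. The `heights` sample is made up for illustration, and `truncated_mean`, `winsorized_mean` and `weighted_mean` are my own hypothetical helper names, not library functions.

```python
import statistics

def truncated_mean(values, k):
    """Arithmetic mean after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this sample")
    return statistics.mean(ordered[k:len(ordered) - k])

def winsorized_mean(values, k):
    """Arithmetic mean after replacing the k extremes on each side
    with the nearest remaining value of the ordered sample."""
    ordered = sorted(values)
    n = len(ordered)
    if 2 * k >= n:
        raise ValueError("k is too large for this sample")
    low, high = ordered[k], ordered[n - k - 1]  # (k+1)-th and (n-k)-th values
    return statistics.mean(min(max(v, low), high) for v in ordered)

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical heights in cm; 230 plays the role of an outlier.
heights = [168, 172, 175, 175, 178, 180, 183, 185, 186, 230]

print("mean:           ", statistics.mean(heights))
print("median:         ", statistics.median(heights))
print("mode:           ", statistics.mode(heights))
print("truncated (k=1):", truncated_mean(heights, 1))
print("winsorized (k=1):", winsorized_mean(heights, 1))
print("weighted:       ", weighted_mean([1, 3], [3, 1]))
print("geometric:      ", statistics.geometric_mean([2, 8]))
print("harmonic:       ", statistics.harmonic_mean([40, 60]))
```

In this made-up sample the single outlier pulls the arithmetic mean up to 183.2, while the median stays at 179 and the truncated and winsorized means land close to it.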

And others such as quantiles (quartiles – division into 4, deciles – division into 10 and percentiles – division into 100 equal parts).
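Quantiles can be sketched with `statistics.quantiles` from the Python standard library. The sample is again made up, and keep in mind that there are several conventions for interpolating quantiles, so other tools may return slightly different cut points for the same data.

```python
import statistics

# Hypothetical heights in cm (same made-up sample idea as before).
heights = [168, 172, 175, 175, 178, 180, 183, 185, 186, 230]

# n is the number of equal-probability groups; the result has n - 1 cut points.
quartiles = statistics.quantiles(heights, n=4)
deciles = statistics.quantiles(heights, n=10)
percentiles = statistics.quantiles(heights, n=100)

print("quartiles (Q1, Q2, Q3):", quartiles)
print("number of deciles:     ", len(deciles))
print("number of percentiles: ", len(percentiles))

# The second quartile is the median of the sample.
print("Q2 equals median:      ", quartiles[1] == statistics.median(heights))
```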

Measures of dispersion

They answer the question: how typical is the typical value for that sample?

Range – just the distance between the last and the first value in the ordered sample.
Interquartile Range – IQR = Q3 - Q1, the difference between the third and the first quartile; pros: resistant to outliers.
Quartile Deviation – IQR/2.
Variance of a random sample – the average squared deviation of the data from the mean (for a sample we usually divide by n-1 instead of n).
Standard Deviation – the square root of the variance.
Coefficient of Variation – the ratio of the standard deviation to the mean; it shows the extent of the variability in relation to the mean.
MAD (Median Absolute Deviation) – the median of the absolute deviations from the data’s median; pros: more resilient to outliers than the standard deviation.
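The dispersion measures above can be sketched in a few lines of standard-library Python. The sample is made up, `median_absolute_deviation` is my own hypothetical helper, and the IQR depends on which quantile convention is used (here the default of `statistics.quantiles`).

```python
import statistics

def median_absolute_deviation(values):
    """Median of the absolute deviations from the sample median."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)

# Hypothetical heights in cm; 230 plays the role of an outlier.
heights = [168, 172, 175, 175, 178, 180, 183, 185, 186, 230]

value_range = max(heights) - min(heights)
q1, _, q3 = statistics.quantiles(heights, n=4)
iqr = q3 - q1
quartile_deviation = iqr / 2
variance = statistics.variance(heights)   # sample variance, divides by n - 1
std_dev = statistics.stdev(heights)       # square root of the sample variance
coeff_of_variation = std_dev / statistics.mean(heights)
mad = median_absolute_deviation(heights)

print("range:              ", value_range)
print("IQR:                ", iqr)
print("quartile deviation: ", quartile_deviation)
print("variance:           ", variance)
print("standard deviation: ", std_dev)
print("coeff. of variation:", coeff_of_variation)
print("MAD:                ", mad)
```

Comparing the standard deviation with the MAD on a sample like this one is a quick way to see how strongly a single outlier inflates the former.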

Measures of the shape

Skewness – there are different formulas, but in general it measures the asymmetry of the distribution of a random variable about its mean. Kurtosis – similar to skewness, it is a descriptor of the shape of a probability distribution; it describes how heavy the tails are, which helps to see how far a distribution deviates from the normal one.
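Since there are different formulas, the sketch below picks just one common convention, and that choice is my assumption: the moment-based skewness m3 / m2^1.5 and the excess kurtosis m4 / m2^2 - 3, where m_r is the r-th central moment computed with a division by n. The function names and the sample are made up for illustration.

```python
import statistics

def central_moment(values, order):
    """r-th central moment of the sample, dividing by n."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (zero for a symmetric sample)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
# Hypothetical heights in cm; the outlier 230 stretches the right tail.
heights = [168, 172, 175, 175, 178, 180, 183, 185, 186, 230]

print("symmetric skewness:", skewness(symmetric))
print("symmetric kurtosis:", excess_kurtosis(symmetric))
print("heights skewness:  ", skewness(heights))
print("heights kurtosis:  ", excess_kurtosis(heights))
```

A positive skewness points to a longer right tail, and a positive excess kurtosis to heavier tails than the normal distribution under this convention.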

Btw I am still working on the approach to these super knowledge-sharing posts, so if you have any thoughts on that I will appreciate them! Thanks, guyz,


xoxo,

szarki9