probability distributions - Intro
Written on April 8th, 2020 by szarki9

Hello All!
I haven’t been here for quite some time, sorry about that. But cutting to the important stuff… In the next few posts, I would like to describe the applications of more and less common probability distributions. I have spent quite some time on this lately, because in order to understand which distribution you should use, you first need to understand what each distribution means. So… starting from basic definitions and basic distributions, I will try to work my way up to the more advanced ones!
First of all, what is a probability distribution? A probability distribution is a function that assigns to each value of a random variable the probability of taking that value. It follows that the values of this function lie between 0 and 1. When I think about a probability distribution, I try to think about something easy and basic, like the distribution of people’s height in Poland. Let’s take into consideration only men. The average height of men in Poland is 180 cm. What does that mean? That if you take a random man on the street, there is a high probability that his height will be around that value. And if the distribution is symmetric, the average is also the median, so exactly 50% of men are shorter than 180 cm and 50% are taller (you can check my previous post about descriptive statistics). In other words, most of the men in Poland are “distributed” around that number. Later on, I will depict a couple of examples, so stay with me!
Now, a random variable, and hence its distribution, can be discrete or continuous. In the first instance, I will focus on discrete distributions 😊. A discrete distribution is one whose random variable takes a finite or countably infinite number of values, each with non-zero probability.
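To see this definition in action, here is a minimal sketch in Python (the fair-die example and the `die_pmf` name are my own illustration):

```python
# A discrete distribution is just a mapping from values to probabilities.
# Fair six-sided die: each face has probability 1/6.
die_pmf = {face: 1 / 6 for face in range(1, 7)}

# The probabilities of a valid distribution must sum to 1.
assert abs(sum(die_pmf.values()) - 1.0) < 1e-9

print(die_pmf[3])  # P(X = 3) ≈ 0.1667
```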
Binomial distribution (Bernoulli trials)
You have probably all heard about the Bernoulli scheme, which models an experiment with a binary outcome. The two outcomes are called success and failure, where the probability of success is p (which makes the probability of failure equal to 1-p).
The binomial distribution tells you the probability of exactly k successes in n independent Bernoulli trials, each with probability of success p. Look at the distribution formula below:
$$x_k = k,\; p_k = P(X=k) = {n\choose k} p^k (1-p)^{n-k} \sim B(n,p)$$ $$k \in \{0,1,...,n\},\; n\in \mathbb{N},\; p \in(0,1)$$
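To make this concrete, here is a minimal sketch in Python (the helper name `binom_pmf` and the coin-flip example are my own illustration, using only the standard library):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: probability of exactly 3 heads in 10 fair coin flips
print(binom_pmf(3, 10, 0.5))  # ≈ 0.1172
```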
Geometric distribution
But given a Bernoulli scheme, we might want to ask a slightly different question. For example, what is the probability that the first success comes on the k-th trial, i.e., after k-1 failures, when the probability of success equals p? We can think of it as the waiting time for the first success. In that case, k can be any natural number starting from 1. Look at the distribution formula below:
$$x_k=k,\; p_k = P(X=k)=p(1-p)^{k-1} \sim Geo(p)$$ $$ k \in \{1,2,...\},\; p \in(0,1)$$
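Again, a minimal sketch in Python (the `geom_pmf` name and the die-roll example are my own illustration):

```python
def geom_pmf(k: int, p: float) -> float:
    """P(X = k) for X ~ Geo(p): first success on the k-th trial."""
    return p * (1 - p)**(k - 1)

# Example: probability that the first six appears on the 4th die roll
print(geom_pmf(4, 1 / 6))  # ≈ 0.0965
```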
Negative binomial distribution
An extension of the question from the last paragraph asks about the waiting time for the m-th success: what is the probability that the m-th success occurs on the k-th trial, where m is a previously fixed number? When m is an integer, we call this the Pascal distribution, and when m is extended to real values, we call it the Polya distribution. The probability mass function is:
$$ x_k=k, \; p_k=P(X=k)={k-1\choose m-1}p^m(1-p)^{k-m} \sim NB(m,p)$$ $$k \in \{m,m+1,...\},\; p \in(0,1) $$
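A minimal sketch in Python (the `nbinom_pmf` name and the coin example are my own illustration). Note that some libraries parametrize the negative binomial by the number of failures rather than the trial index, so it is worth checking the convention before using a built-in:

```python
from math import comb

def nbinom_pmf(k: int, m: int, p: float) -> float:
    """P(X = k) for X ~ NB(m, p): the m-th success occurs on the k-th trial."""
    return comb(k - 1, m - 1) * p**m * (1 - p)**(k - m)

# Example: probability that the 2nd head appears on the 5th fair coin flip
print(nbinom_pmf(5, 2, 0.5))  # 0.125
```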
Multinomial distribution
Another useful distribution is a generalization of the binomial distribution. We apply it to an experiment repeated n times, where the trials are independent and each trial can end in one of r distinct results. The probability of result i is the same in every trial, and we denote it by p_i for i from 1 to r. If we let the random variables X_1, ..., X_r count the number of occurrences of each distinct result, we obtain a random vector whose probability is given by the mass function below:
$$ P(X_1=k_1,X_2=k_2,...,X_r=k_r)=\frac{n!}{k_1!k_2!...k_r!}p_1^{k_1}...p_r^{k_r} \sim Mult(n,p)$$ $$ \sum_{i=1}^{r}k_i = n,\;\sum_{i=1}^{r}p_i = 1,\; p_i \in (0,1),\; n\in \mathbb{N}$$
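A minimal sketch in Python (the `multinomial_pmf` helper and the die example are my own illustration):

```python
from math import factorial

def multinomial_pmf(ks, ps):
    """P(X_1 = k_1, ..., X_r = k_r) for counts ks with result probabilities ps."""
    n = sum(ks)
    coef = factorial(n)
    for k in ks:
        coef //= factorial(k)  # multinomial coefficient n! / (k_1! ... k_r!)
    prob = 1.0
    for k, p in zip(ks, ps):
        prob *= p**k
    return coef * prob

# Example: 6 die rolls showing each face exactly once
print(multinomial_pmf([1] * 6, [1 / 6] * 6))  # 6!/6**6 ≈ 0.0154
```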
Hypergeometric distribution
In our first probability classes, exercises often involve random draws of balls from an urn, with or without replacement, and so on. Let’s assume we carry out an experiment where we draw n balls without replacement from an urn containing m balls of one type and N-m balls of another type (so in total, we have N balls in our urn). We want to find the probability of drawing exactly k balls of the first type. The probability mass function is below:
$$ P(X=k) = \frac{ {m\choose k} {N-m\choose n-k} }{ {N\choose n} } \sim HGeo(N,m,n)$$ $$ N \in \mathbb{N},\; m\in\{0,1,...,N\} ,\; n\in\{0,1,...,N\},$$ $$ k\in \{ \max(0,n+m-N),..., \min(n,m) \}$$
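A minimal sketch in Python (the `hypergeom_pmf` helper and the card example are my own illustration):

```python
from math import comb

def hypergeom_pmf(k: int, N: int, m: int, n: int) -> float:
    """P(X = k): exactly k type-one balls when drawing n of N balls (m are type one)."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

# Example: a 5-card hand from a 52-card deck containing exactly 2 of the 4 aces
print(hypergeom_pmf(2, 52, 4, 5))  # ≈ 0.0399
```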
To sum up, each of these distributions covers a different situation that we may want to model. As you can see, each probability mass function looks a bit different, and when you dive deeper into the expected values or variances you will notice further relationships between them.
I hope this clarifies things a little. Whenever you have some feedback or questions, you are more than welcome to contact me.
xoxo,
szarki9