statistics vol 2 - inferential statistics intro
Written on September 25th, 2019 by szarki9
Hi there,
For the last two weeks, I have been researching the best and most achievable way (from my perspective ofc) to prepare yourself to work as a data specialist. I decided to start with interview questions, as I think they give quite a good overview of the skills you need at the beginning. After that, you can start developing more advanced skills or build a DS portfolio.
There are a lot of questions that you might be asked, from fields such as statistics, machine learning, data quality or data cleansing (including the use of the most common Python libraries), SQL query optimization and so on. I will start with statistics, since it is the most basic knowledge you should have and it is used everywhere later on.
So let us start with the basics: what types of statistics are there?
There are two of them: descriptive statistics and inferential statistics. I described the first one at length a couple of posts ago. But just to recall, descriptive statistics is used to summarize a given data set (with measures such as the mean, median, standard deviation, etc.) and to visualize it (using charts).
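As a quick reminder of what that looks like in practice, here is a minimal sketch using only Python's built-in `statistics` module. The salary numbers are made up purely for illustration.

```python
import statistics

# Hypothetical monthly salaries (made-up numbers, for illustration only)
salaries = [4200, 3900, 5100, 4700, 4400, 6100, 3800, 4500]

mean_salary = statistics.mean(salaries)      # arithmetic average
median_salary = statistics.median(salaries)  # middle value of the sorted data
std_salary = statistics.stdev(salaries)      # sample standard deviation

print(f"mean:   {mean_salary:.2f}")
print(f"median: {median_salary:.2f}")
print(f"std:    {std_salary:.2f}")
```

Each result is a single precise number that describes exactly this list of salaries and nothing beyond it.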
Inferential statistics, on the other hand, is much more complex: it relies on more advanced mathematics and works on random samples in order to make inferences about a whole population. For example, given a random sample of people and their salaries, you can test whether the hypothesis that men earn more than women can be rejected or not. We can also estimate an interval for the value of a specific parameter (such as the mean or the standard deviation). Given that, inferential statistics can be divided into two parts: estimation (point estimation or interval estimation) and hypothesis testing.
I will go into the specifics of each type of statistics in later posts, but to finish I would like to highlight the fundamental difference between them. You cannot make predictions from descriptive statistics, because they only characterize the specific data set you were given, and these characterizations are always precise numbers. With inferential statistics, the output is rather a range of plausible values together with a degree of confidence.
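To make both branches a bit more tangible, here is a small sketch in plain Python. Everything in it is an assumption made for illustration: the salary samples are simulated, the helper names `mean_confidence_interval` and `permutation_p_value` are my own, the interval uses a normal approximation, and the hypothesis test is a simple permutation test, which is only one of several tests you could pick for this question.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical salary samples (simulated, not real data)
men = [random.gauss(5000, 800) for _ in range(200)]
women = [random.gauss(4800, 800) for _ in range(200)]


def mean_confidence_interval(sample, confidence=0.95):
    """Interval estimate for the population mean (normal approximation)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # z-value for a two-sided interval, e.g. about 1.96 for 95%
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


def permutation_p_value(a, b, n_permutations=2000):
    """One-sided permutation test.

    H0: both groups come from the same distribution.
    H1: group a has the higher mean.
    """
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        random.shuffle(pooled)  # randomly reassign group labels
        diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        if diff >= observed:
            at_least_as_extreme += 1
    # add-one correction keeps the estimated p-value above zero
    return (at_least_as_extreme + 1) / (n_permutations + 1)


low, high = mean_confidence_interval(men)
p_value = permutation_p_value(men, women)

print(f"95% interval for the mean salary of men: ({low:.0f}, {high:.0f})")
print(f"p-value for 'men earn more than women': {p_value:.4f}")
```

Notice that the first result is a range with a confidence level attached, and the second is a probability used to decide whether to reject the null hypothesis. Neither is a single exact number describing only the sample.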
Hope this clarifies things a little, see you soon
xoxo
szarki9