supervised learning vs unsupervised learning
Written on October 12th, 2019 by szarki9
Good hmm night? guyz,
today's topic - statistical learning - grew out of inferential statistics and was the concept from which the term machine learning later emerged. Basically, we can say the two are the same thing if we think of ML as the mathematical tools, algorithms and statistical models used to perform complicated tasks without formulas specified in advance and without direct intervention from a user. But as there are many ML definitions and, from what I have learned, many people understand it in slightly different ways, it may be safer to say that statistical learning is something that has evolved into ML.
But what exactly is statistical learning?
Statistical learning is the study of understanding data. Given a data set and statistical learning tools, we can either predict the values of the variables we care about, or look for connections between variables, or do both - that depends on what we want to know and what we can get out of the gathered data. As the title says, we distinguish between supervised and unsupervised statistical learning. What does that distinction depend on? Let us consider the following examples (the first two are from the book "An Introduction to Statistical Learning" - a recommendation from my side guyz):
1. Assume that we collect a set of data on the top 500 firms in Europe. For each firm, we record profit, number of employees, industry and the CEO salary. We are interested in understanding which factors affect CEO salary.
In that case, we are looking for connections between variables (inference) and we want to establish which of them are relevant in determining salary. But we also know the CEO's salary itself for each of the 500 companies. This type of problem, where the value of the output variable is known, is what we will call supervised learning.
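A minimal sketch of how a first look at example 1 could go, assuming we only check how strongly each numeric factor moves together with salary. The five firms and all their numbers are made up for illustration, and the helper name pearson is my own. Keep in mind that correlation is only a first hint and does not prove that a factor actually affects salary.

```python
from math import sqrt

# Made-up toy records (NOT real firms): profit in M EUR,
# employees in thousands, CEO salary in M EUR.
firms = [
    {"profit": 100.0, "employees": 30.0, "ceo_salary": 1.0},
    {"profit": 200.0, "employees": 10.0, "ceo_salary": 2.1},
    {"profit": 300.0, "employees": 50.0, "ceo_salary": 2.9},
    {"profit": 400.0, "employees": 20.0, "ceo_salary": 4.2},
    {"profit": 500.0, "employees": 40.0, "ceo_salary": 5.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# The output variable is known for every firm - that is what makes this supervised.
salary = [firm["ceo_salary"] for firm in firms]

strength = {
    factor: pearson([firm[factor] for firm in firms], salary)
    for factor in ("profit", "employees")
}

# Print the factors from the strongest to the weakest relationship with salary.
for factor, r in sorted(strength.items(), key=lambda item: -abs(item[1])):
    print(f"{factor}: r = {r:.2f}")
```

In this toy data profit moves almost in lockstep with salary while the number of employees barely does, so profit would be the first factor to look at more closely.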
2. We are interested in predicting the % change in the USD/Euro exchange rate in relation to the weekly changes in the world stock markets. Hence we collect weekly data for all of 2018. For each week we record the % change in the USD/Euro, the % change in the US market, the % change in the British market and the % change in the German market.
This one, as the problem statement says, is a prediction case. Having 4 variables and data from 52 weeks, we would like to predict the % change in the USD/Euro exchange rate. Again, we have historical data from the previous year, so the output variable (the % change in USD/Euro) is known, and hence we will again use supervised learning tools to solve it.
3. Suppose we have a bunch of photos of 6 people, but without information about who is in which one, and we want to divide this dataset into 6 piles, each with the photos of one individual.
Here we want our algorithm to work out by itself who is who, and we do not provide labelled photos of everyone with their names at the beginning. When we do not know the output (here: who is who and what characteristics define each one of them) we apply unsupervised learning tools, as we want the algorithm to find the characteristics that distinguish people (in this case).
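Here is a rough sketch of the prediction idea from example 2, assuming a big simplification: one predictor (the % change in the US market) instead of three, and six made-up weeks instead of 52. The function name fit_line and all the numbers are my own, not real market data. We fit a straight line by least squares on the first five weeks and then predict the week we held back.

```python
# Made-up weekly % changes (NOT real market data): (US market, USD/Euro).
weeks = [(1.0, 0.4), (-0.5, -0.1), (2.0, 0.9), (0.0, 0.1), (-1.5, -0.6), (0.5, 0.3)]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Learn from the weeks where the output is known, keep the last week aside.
train, held_out = weeks[:-1], weeks[-1]
slope, intercept = fit_line([us for us, _ in train], [fx for _, fx in train])

# Predict the held-out week from its US market change and compare with the truth.
predicted = intercept + slope * held_out[0]
print(f"predicted {predicted:.2f}%, actual {held_out[1]:.2f}%")
```

Holding one week back is the simplest way to check whether the fitted line says anything useful about data it has not seen.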
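To make the photo example a bit more concrete, here is a small sketch of k-means clustering, which is one possible unsupervised technique (I am not claiming it is the one you would use on real photos). The assumptions are loud ones: each photo is already reduced to two made-up numbers, and there are 3 people with 3 photos each instead of 6 people, just to keep it short. The function name k_means and the way the starting centres are picked are my own choices.

```python
from math import dist

# Made-up 2-D "photo features" (NOT real image data). No names, no labels.
photos = [
    (1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
    (9.0, 1.0), (9.2, 1.1), (8.9, 0.8),
]


def k_means(points, k, iterations=10):
    """Group points into k piles by repeatedly assigning and re-centring."""
    # Deterministic start: the first point, then each time the point
    # that is farthest from all centres chosen so far.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist(p, c) for c in centres)))

    for _ in range(iterations):
        # Assignment step: every point goes to the pile of its nearest centre.
        piles = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: dist(point, centres[i]))
            piles[nearest].append(point)
        # Update step: move each centre to the mean of its pile
        # (an empty pile keeps its old centre).
        centres = [
            tuple(sum(axis) / len(pile) for axis in zip(*pile)) if pile else centres[i]
            for i, pile in enumerate(piles)
        ]
    return piles


piles = k_means(photos, k=3)
for number, pile in enumerate(piles, start=1):
    print(f"pile {number}: {pile}")
```

Notice that we never tell the algorithm who is who; it only gets the number of piles and the raw points, and the piles have no names attached when it is done.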
As I mentioned under the examples, supervised and unsupervised learning use different tools/models/techniques to solve the given problems. The main distinction between the two types is whether we know the output variable, i.e. whether we have labels up front (as we did not in the photo case). Supervised learning might be used for inference or to predict values. The most common tools here are linear regression (for numeric outputs), logistic regression and other classification methods, etc. On the other hand, we might use unsupervised learning to find relationships in the data, using techniques such as clustering or association. But before we decide what we are going to use, we must understand what the data is all about and what use it has from a business perspective. But - another but hehe - I guess that understanding data is the obvious first step before any data-related work.
Hope you enjoyed it, next time I will try to briefly explain supervised learning techniques and what's up with regression.
xoxo,
szarki9