data analytics problem approach
Written on September 10th, 2020 by szarki9
Hello there,
here is another sum-up note from my lecture. It is probably more useful for me than for you haha, but enjoy it!
What do we want to get out of the data?
"The purpose of computing is insight, not numbers." Richard Hamming, 1962
What is insight?
There is no single definition, but there is a set of characteristics that describe it:
- Complex
- Deep
- Qualitative
- Unexpected
- Relevant
Insight has multiple levels. The basic level is a simple fact from the data. The top level is an interpretation that requires reasoning over multiple data sources and domain knowledge. There are some intermediate levels between them.
What makes a good dataset if you aim for insight?
- high volume - the more data you have, the better the chance that it contains something useful
- historical - the better we know the past, the better we understand the future
- consistent - keep track of the context to understand the meaning of data
- multivariate - the more variables we have, the more opportunities we have to discover something
- atomic - specify to the lowest level of detail at which something might ever be examined
- clean - information should be accurate, free of error, and complete
- clear - make it easy for yourself and others to understand what the data means
- richly segmented - provide and consider several groupings of the data
- of known pedigree - establish the reliability of your sources
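To make "atomic" and "richly segmented" a bit more concrete, here is a minimal sketch in plain Python. The `sales` records, their field names, and the `total_by` helper are all made up for illustration. The idea is that if you keep the data at the lowest level of detail (one row per sale), you can always build several groupings from it later, while the reverse is not possible.

```python
from collections import defaultdict

# Hypothetical atomic records: one row per individual sale (made-up data).
sales = [
    {"date": "2020-08-03", "region": "north", "product": "A", "amount": 120.0},
    {"date": "2020-08-17", "region": "south", "product": "A", "amount": 80.0},
    {"date": "2020-09-01", "region": "north", "product": "B", "amount": 200.0},
    {"date": "2020-09-09", "region": "south", "product": "B", "amount": 50.0},
]


def total_by(records, key_func):
    """Sum the 'amount' field over the groups defined by key_func."""
    totals = defaultdict(float)
    for row in records:
        totals[key_func(row)] += row["amount"]
    return dict(totals)


# Several segmentations derived from the same atomic rows.
by_region = total_by(sales, lambda row: row["region"])
by_month = total_by(sales, lambda row: row["date"][:7])  # "YYYY-MM" prefix
by_product = total_by(sales, lambda row: row["product"])

print(by_region)
print(by_month)
print(by_product)
```

If the data had been stored only as monthly totals, the region and product views could not be recovered from it.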
Data guidelines to keep it feasible
- look inside first
- remove the format constraints
- figure out what’s missing
- embrace diversity
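For "look inside first" and "figure out what's missing", a quick profile of the raw records is often enough to start with. The sketch below is only one possible way to do it. The `records` data and the `missing_report` function are hypothetical, and it assumes that `None`, an empty string, or an absent key all count as missing.

```python
# Hypothetical raw records (made-up data); note that the last row has no
# "income" key at all.
records = [
    {"age": 34, "city": "Sydney", "income": 72000},
    {"age": None, "city": "Perth", "income": 58000},
    {"age": 29, "city": "", "income": None},
    {"age": 41, "city": "Sydney"},
]


def missing_report(rows):
    """Return the fraction of rows in which each column is missing.

    Assumption: None, an empty string, or an absent key count as missing.
    """
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        report[col] = missing / len(rows)
    return report


for column, fraction in missing_report(records).items():
    print(f"{column}: {fraction:.0%} missing")
```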
ANALYTIC REASONING
Types of reasoning: inductive reasoning and deductive reasoning.
Deductive reasoning -> associated with "formal logic". It involves reasoning from known premises, or premises presumed to be true, to a certain conclusion. The conclusions reached are certain, inevitable, inescapable. => formulate hypotheses about relationships and underlying models, then carry out experiments with the data to test those hypotheses and models.
Inductive reasoning -> known as "informal logic". It involves drawing uncertain inferences based on probabilistic reasoning. The conclusions are probable, reasonable, plausible, believable. => exploratory data analysis to discover or refine hypotheses and to discover new relationships, insights, and analytic paths from the data.
Data Science supports and encourages shifting between deductive (hypothesis-based) and inductive (pattern-based) reasoning.
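Here is a rough sketch of what this shifting could look like in practice, in plain Python. Everything in it is an assumption for illustration: the `hours` and `score` numbers are made up, and the permutation test is just one simple way to test a hypothesis, not the method from the lecture. The first step is pattern-based (look at the data and notice a relationship). The second step is hypothesis-based (state "hours and score are related" and check how often shuffled data would show a correlation at least as strong).

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Estimate how often shuffled ys correlate with xs as strongly as observed."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Made-up data: hours studied and test score for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 71, 75, 80]

# Pattern-based step: explore the data and notice a relationship.
r = pearson(hours, score)
print(f"observed correlation: {r:.2f}")

# Hypothesis-based step: test the hypothesis that the relationship is real.
p = permutation_p_value(hours, score)
print(f"permutation p-value: {p:.4f}")
```

A small p-value here only says the pattern is unlikely to come from random pairing. Interpreting it still needs domain knowledge, which is where the higher levels of insight come in.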
Analytic reasoning artifacts:
- elemental artifacts: artifacts derived from isolated pieces of information
- pattern artifacts: artifacts derived from collections of data
- higher-order knowledge artifacts: logical inferences linking evidence and other reasoning artifacts to greater knowledge value
- complex reasoning constructs: a conjecture, explanation, assessment, or forecast that should be supported by the evidence
Here you will find a couple of charts that are worth keeping in mind!
Byeeee, szarki9