Data Science
Two types of ML algorithm training:
- Prediction: Supervised learning (known, labelled data)
- Exploration: Unsupervised learning (unknown, unlabelled data) (nearest neighbours)
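A minimal sketch of the two training styles in plain Python. The 1-D points, the labels "a" and "b", and every function name below are made up for illustration and are not from the notes. The supervised part uses known labels to learn a cut-off for prediction; the unsupervised part sees only the raw points and explores them by grouping (a tiny k-means with k = 2).

```python
# Illustrative, made-up 1-D measurements (not from the notes).
labelled = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (8.0, "b"), (8.5, "b"), (9.0, "b")]
unlabelled = [x for x, _ in labelled]  # same points with the labels thrown away


def mean(values):
    return sum(values) / len(values)


def fit_threshold(examples):
    """Supervised: use the known labels to place a cut halfway between class means.

    Assumes class "a" sits below class "b" on the number line.
    """
    mean_a = mean([x for x, y in examples if y == "a"])
    mean_b = mean([x for x, y in examples if y == "b"])
    return (mean_a + mean_b) / 2


def predict(threshold, x):
    """Predict a label for a new, unseen point."""
    return "a" if x < threshold else "b"


def two_means(points, iterations=10):
    """Unsupervised: no labels, just split the points into two groups."""
    low, high = min(points), max(points)  # initial group centres
    for _ in range(iterations):
        group_low = [p for p in points if abs(p - low) <= abs(p - high)]
        group_high = [p for p in points if abs(p - low) > abs(p - high)]
        low, high = mean(group_low), mean(group_high)
    return group_low, group_high


threshold = fit_threshold(labelled)
print(predict(threshold, 3.0))   # prediction for a new point
print(two_means(unlabelled))     # groups discovered without labels
```

The supervised function cannot run without labels, while the clustering function never looks at them; that is the practical difference between the two styles.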
Data Mining
- Models
- Clustering
- Business understanding -> data understanding -> data preparation -> modeling (used to be done manually with large-scale mathematical analysis; now machine learning) -> evaluation -> deployment
Machine Learning:
- Getting computers to act without being explicitly programmed.
- Learning from data.
- Statistical learning.
Disadvantage:
- untrustworthy resource
- Spark ML/Spark MLlib
- brings a list of potentially suitable models
Case Analysis:
San Francisco or New York
Some intuition (elevation)
Adding nuance
(Adding another dimension allows more nuance)
[scatterplot]
(Features, predictors, variables)
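The intuition above can be sketched in plain Python. Everything here is an assumption for illustration: the six homes are invented, the second feature (price per square foot) is a guess at what the scatterplot showed, and the cut-offs of 50 m and 1200 USD are picked by eye rather than learned. The sketch compares a rule that uses elevation alone with one that adds a second feature.

```python
# Hypothetical homes: (elevation in metres, price per square foot in USD, city).
homes = [
    (5, 2100, "NY"),
    (10, 1500, "NY"),
    (12, 600, "SF"),
    (20, 700, "SF"),
    (80, 900, "SF"),
    (120, 1000, "SF"),
]


def elevation_only(elevation, price):
    """One feature: anything high up is guessed to be San Francisco."""
    return "SF" if elevation > 50 else "NY"  # 50 m is an assumed cut-off


def elevation_and_price(elevation, price):
    """Two features: low-lying homes are separated further by price."""
    if elevation > 50:
        return "SF"
    return "NY" if price > 1200 else "SF"  # 1200 USD is an assumed cut-off


def accuracy(rule, rows):
    """Fraction of homes whose city the rule guesses correctly."""
    correct = sum(1 for elevation, price, city in rows if rule(elevation, price) == city)
    return correct / len(rows)


print(accuracy(elevation_only, homes))
print(accuracy(elevation_and_price, homes))
```

On this toy data the elevation-only rule misclassifies the two low-lying San Francisco homes, and the second feature is what tells them apart.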
Drawing Boundaries
Scatterplot matrix
Decision tree (one variable at a time)
Finding better boundaries [histogram]
Forks
Split point
Overfitting
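A rough sketch of how a single fork might choose its split point, in plain Python. It assumes Gini impurity as the quality measure (the notes do not say which measure was used), and the elevations and cities are made up for illustration. Candidate split points are the midpoints between neighbouring values of one variable.

```python
def gini(labels):
    """Gini impurity: 0.0 when all labels agree, 0.5 for a 50/50 two-class mix."""
    if not labels:
        return 0.0
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (split_point, weighted_gini) for the purest two-way split.

    Returns None when all values are identical, since no split is possible.
    """
    distinct = sorted(set(values))
    best = None
    for lower, upper in zip(distinct, distinct[1:]):
        split = (lower + upper) / 2
        left = [y for x, y in zip(values, labels) if x <= split]
        right = [y for x, y in zip(values, labels) if x > split]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (split, score)
    return best


# Hypothetical elevations (metres) and cities, made up for illustration.
elevations = [5, 10, 12, 20, 80, 120]
cities = ["NY", "NY", "SF", "SF", "SF", "SF"]
split_point, impurity = best_split(elevations, cities)
print(split_point, impurity)
```

Repeating this search inside each branch grows the tree fork by fork. If forks keep being added until every branch is pure, the tree can fit the training data perfectly yet do worse on homes it has not seen, which is the overfitting noted above.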