The use of statistical models in computer algorithms allows computers to make decisions and predictions, and to perform tasks that traditionally require human cognitive abilities. Machine learning is the interdisciplinary field at the intersection of statistics and computer science that develops such statistical models and interweaves them with computer algorithms. It underpins many modern technologies, such as speech recognition, Internet search, bioinformatics and computer vision. Amazon's recommender system, Google's driverless car and the most recent imaging systems for cancer diagnosis are all built on machine learning.
This course on Machine Learning will explain, through real-world applications, how to build systems that learn and adapt. Topics include linear regression, logistic regression, deep neural networks and clustering, among others. The course is project-oriented, with emphasis on writing software implementations of learning algorithms applied to real-world problems, in particular credit risk, collections management and fraud detection.
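As an illustration of the kind of from-scratch implementation of a learning algorithm the course emphasizes, here is a minimal sketch of logistic regression trained by batch gradient descent on the mean log-loss, using NumPy. It is not taken from the course notebooks; the function names `fit_logistic_regression` and `predict`, the learning rate, the iteration count and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real-valued scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, n_iter=1000):
    """Hypothetical helper: fit weights w and bias b by batch gradient descent.

    Minimizes the mean log-loss; its gradient with respect to the
    linear score is simply (p - y).
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)                    # predicted probabilities
        error = p - y                             # d(log-loss)/d(score)
        w -= lr * (X.T @ error) / n_samples       # average gradient for w
        b -= lr * error.mean()                    # average gradient for b
    return w, b


def predict(X, w, b, threshold=0.5):
    """Hypothetical helper: convert probabilities into 0/1 class labels."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Synthetic, linearly separable toy data (hypothetical, not course data).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

w, b = fit_logistic_regression(X, y)
accuracy = (predict(X, w, b) == y).mean()
print("training accuracy: {:.2f}".format(accuracy))
```

Plain gradient descent is used here only because it keeps every step visible; in practice a library estimator such as scikit-learn's `LogisticRegression` would normally be preferred.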
Instructors:
Dr. Alejandro Correa Bahnsen
email: al.bahnsen@gmail.com
twitter: @albahnsen
github: albahnsen
Iván Torroledo
email: ivan.torroledo@gmail.com
github: torroledo
Requirements
Python version 3.5;
Numpy, the core numerical extensions for linear algebra and multidimensional arrays;
Scipy, additional libraries for scientific programming;
Matplotlib, excellent plotting and graphing libraries;
IPython, with the additional libraries required for the notebook interface;
Pandas, a Python counterpart to R's data frames;
scikit-learn, a machine learning library.
A good, easy-to-install option that supports Mac, Windows and Linux and includes all of these packages (and many more) is the Anaconda distribution.
Git! Unfortunately it is out of the scope of this class, but please take a look at these tutorials
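Once the Python packages listed above are installed (via Anaconda or otherwise), a quick way to confirm they are importable is a small check script. This is a sketch, not part of the course material: the helper name `check_requirements` is hypothetical, and it assumes the usual import names (scikit-learn is imported as `sklearn`, IPython as `IPython`). A version of `None` means the package could not be imported.

```python
import importlib
import sys

# Import names for the packages in the requirements list (assumed mapping).
REQUIRED = ["numpy", "scipy", "matplotlib", "IPython", "pandas", "sklearn"]


def check_requirements(modules):
    """Hypothetical helper: map each module name to its version, or None if missing."""
    found = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Some modules (e.g. the standard library) have no __version__.
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in sorted(check_requirements(REQUIRED).items()):
        print("{:<12} {}".format(name, version if version else "NOT INSTALLED"))
```

The script only imports and reports; it does not install anything, so missing packages still have to be added with the package manager of your choice.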
Sessions
Session | Notebook | Exercises
1 | Introduction to Machine Learning | -
2 | Introduction to Python for Data Analysis | Python & Numpy & Pandas
3 | Linear Regression | Income Prediction, Rent
4 | Logistic Regression | Credit Scoring
5 | Data Preparation and Model Evaluation | Credit Scoring V2
6 | Unbalanced Datasets | Fraud Detection
7 | Decision Trees | Fraud Detection V2
8 | Ensemble Methods - Bagging | Fraud Detection V3
9 | Statistical Inference | -
10 | Cost-Sensitive Classification | Credit Scoring V4
11 | Model Deployment | -
Download link:
https://url92.ctfile.com/f/1850492-581502561-1d4830?p=3660 (access password: 3660)