Week 2 Building AI Projects
Lesson 1 Introduction
Lesson 2 Workflow of a machine learning project
To summarize, the key steps of a machine learning project are to collect data, to train the model (the A-to-B mapping), and then to deploy the model. Throughout these steps, there is often a lot of iteration, meaning fine-tuning or adapting the model to work better, or getting data back even after you've shipped it to, hopefully, make the product better, which may or may not be possible depending on whether you're able to get data back.
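The collect-train-deploy loop can be sketched in a few lines of code. This is a toy illustration, not a real system: the sensor readings are made up, and the "model" is a deliberately simplistic threshold rather than a real learning algorithm.

```python
# Illustrative sketch of the collect -> train -> deploy loop using a toy
# A-to-B mapping: input A is a sensor reading, output B is okay/defective.

# 1. Collect data: (sensor_reading, label) pairs; 1 = defective, 0 = okay.
data = [(2.1, 0), (2.3, 0), (2.2, 0), (4.8, 1), (5.1, 1), (4.9, 1)]

# 2. Train a (deliberately simple) model: pick the threshold midway between
#    the average reading of okay items and of defective ones.
okay = [x for x, y in data if y == 0]
defective = [x for x, y in data if y == 1]
threshold = (sum(okay) / len(okay) + sum(defective) / len(defective)) / 2

# 3. Deploy: the learned A-to-B mapping, callable on new inputs.
def predict(reading):
    return 1 if reading >= threshold else 0

# Iteration: readings collected after deployment can be labeled, appended
# to the data list, and the threshold retrained.
print(predict(2.0))  # 0 (okay)
print(predict(5.0))  # 1 (defective)
```

Real projects would replace the threshold with a trained neural network, but the shape of the loop is the same.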
Lesson 3 Workflow of a data science project
Unlike a machine learning project, the output of a data science project is often a set of actionable insights, a set of insights that may cause you to do things differently.
The key steps of a data science project are to collect the data, to analyze the data, to suggest hypotheses and actions, and then to continue to get data back and reanalyze it periodically.
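As a concrete illustration of collect, analyze, and suggest, here is a toy analysis. The web-store scenario, prices, and purchase records are all made up for illustration.

```python
# Toy sketch of the collect -> analyze -> suggest loop for a data science
# project. The dataset and the numbers are invented for illustration.

# 1. Collect data: (price, purchased?) records from a hypothetical web store.
records = [(10, True), (10, True), (10, False),
           (20, True), (20, False), (20, False),
           (30, False), (30, False), (30, False)]

# 2. Analyze: compute the conversion rate at each price point.
prices = sorted({p for p, _ in records})
conversion = {}
for price in prices:
    sold = [bought for p, bought in records if p == price]
    conversion[price] = sum(sold) / len(sold)

for price, rate in conversion.items():
    print(f"price ${price}: {rate:.0%} conversion")

# 3. Suggest: the analysis might support the hypothesis that demand drops
#    sharply above $20, prompting an action such as testing a lower price.
```

The output of this kind of project is the insight and the recommended action, not a deployed piece of software.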
Lesson 4 Every job function needs to learn how to use data
Lesson 5 How to choose an AI project (Part 1)
When thinking about concrete AI projects, I find it much more useful to think about automating tasks rather than automating jobs.
Lesson 6 How to choose an AI project (Part 2)
Due diligence has a specific meaning in the legal world. But informally, it just means that you want to spend some time to make sure what you hope is true really is true.
- First, is this actually doable with today's technology?
- A second important question for technical diligence is how much data is needed to get to the desired level of performance, and whether you have a way to get that much data.
- Third is the engineering timeline: trying to figure out how long it will take, and how many people it will take, to build the system you would like to have built.
- In addition to technical diligence, I will often also conduct business diligence to make sure that the project you envision really is valuable for the business.
In addition to technical diligence and business diligence, I hope you also conduct ethical diligence and make sure that whatever you're doing is actually making humanity and society better off.
Machine learning projects can be in-house or outsourced. I've seen both of these models used successfully. Sometimes if you outsource a machine learning project, you can have access much more quickly to talent and get going faster on a project. It is nice if eventually you build your own in-house AI team and can also do these projects in-house.
Unlike machine learning projects though, data science projects are more commonly done in-house. They're not impossible to outsource; you can sometimes outsource them. But what I've seen is that data science projects are often so closely tied to your business that it takes very deep day-to-day knowledge about your business to do the best data science projects.
When there's massive momentum behind an industry-standard solution that has already been built, you might be better off embracing that industry standard, or embracing someone else's platform, rather than trying to do everything in-house.
Lesson 7 Working with an AI team
First, it really helps your AI team if you can specify an acceptance criterion for the project.
Rather than asking for an AI system that just does something perfectly, you very often see that we want an AI system that performs at a certain percentage accuracy.
AI teams group data into two main datasets.
The first is called the training set and the second the test set, which we've already talked a bit about. The training set is just a set of pictures together with labels showing whether each of these pictures is of a coffee mug that is okay or defective.
Here are some of the reasons it may not be possible for a piece of AI software to be 100% accurate.
First, machine learning technology today, despite being very powerful, still has limitations; it just can't do everything.
Second, insufficient data. If you don't have enough data, specifically, if you don't have enough training data for the AI software to learn from, it may be very difficult to get a very high level of accuracy.
Third, data is messy and sometimes data can be mislabeled.
Lesson 8 Technical tools for AI teams (optional)
Although GitHub is a technical website built for engineers, if you want, you should feel free to play around on GitHub and see what other types of AI software people have released online as well.
A CPU is the computer processor in your computer, whether it's your desktop, your laptop, or a compute server off in the cloud. CPU stands for central processing unit. CPUs are made by Intel, AMD, and a few other companies, and they do a lot of the computation in your computer.
GPU stands for graphics processing unit. Historically, the GPU was made to process pictures, so if you play a video game, it's probably a GPU that is drawing the fancy graphics. But what we found several years ago was that the hardware originally built for processing graphics turns out to be very, very powerful for building very large neural networks or very large deep learning algorithms. Given the need to build such very large systems, GPUs have proved to be a fantastic fit for the type of computation we need to train very large neural networks.
Cloud deployments refer to renting compute servers, such as from Amazon's AWS, Microsoft's Azure, or Google's GCP, in order to use someone else's servers to do your computation. Whereas an on-prem deployment means buying your own compute servers and running them locally in your own company.
If you are building a self-driving car, there's not enough time to send data from the car to a cloud server to decide whether you should stop the car, and then send that message back to the car. So the computation usually has to happen in a computer right there inside the car; that's called an edge deployment.