AI is the new electricity
Electricity transformed countless industries:
Transportation, manufacturing, healthcare, communication, and more.
AI will bring about an equally big transformation.
AI, Machine Learning, and Deep Learning
What are AI, Machine Learning, and Deep Learning, and what is the difference between them?
AI is the broader concept of machines being able to perform tasks that would typically require human intelligence, such as visual perception, speech recognition, decision-making, and language understanding.
Machine Learning is a subset of AI that focuses on teaching machines to learn from data without being explicitly programmed.
Deep Learning is a subset of Machine Learning that uses neural networks with many layers to learn representations of data.
AI is the broadest of the three: Machine Learning sits inside AI, and Deep Learning sits inside Machine Learning.
Machine Learning Opportunities
Machine Learning has advanced so rapidly in the last few years that there are now many opportunities to apply learning algorithms, in industry as well as in academia. Today we have:
- English department professors trying to apply learning algorithms to understand history better.
- Lawyers trying to apply machine learning to the processing of legal documents.
And off campus, every company, both the tech companies as well as lots of companies you would not consider tech companies, everything from manufacturing to healthcare to logistics, is also trying to apply machine learning.
So if you look at it on a factual basis, the number of people doing very valuable machine learning projects today is much greater than it was six months ago, and six months ago it was much greater than a year ago. The amount of valuable, exciting, meaningful work being done in machine learning is going strongly up.
And given the growing amount of data we have, as well as the new machine learning tools available, it will be a long time before we run out of opportunities, and before society as a whole has enough people with the machine learning skill set.
Just as maybe 20 years ago was a good time to start working on this Internet thing, and the people who started working on the Internet back then have had fantastic careers, I think today is a wonderful time to jump into machine learning, with opportunities for you to do unique things that no one else is doing.
One of the things I find very exciting about machine learning is that it is no longer a pure-tech-company kind of thing. Many years ago, machine learning was something that the computer science department and elite AI companies like Google, Facebook, Baidu, and Microsoft would do. But now it is so pervasive that even companies that are not traditionally tech companies see a huge need to apply machine learning.
What is Machine Learning?
Machine Learning seems to be everywhere these days, and it is useful in so many spaces. Every major technological disruption, and there is one now through Machine Learning, gives us an opportunity to remake large parts of the world. If we behave ethically and in a principled way, we can use the superpowers of machine learning to do things that improve people's lives: improve the healthcare system, give every child a personalized tutor, and maybe make our democracy run better rather than worse.
The meaning I find in machine learning is that there are so many people who are so eager for us to come in and help them with these tools.
Machine Learning definition
Arthur Samuel, whose claim to fame was building a checkers-playing program, defined machine learning as:
Field of study that gives computers the ability to learn without being explicitly programmed.
Arthur Samuel built his checkers-playing program many decades ago. The debate of the day was whether a computer could ever do something it wasn't explicitly told to do. He wrote a checkers program that, through self-play, learned which patterns on the checkerboard were more likely to lead to a win versus a loss, and it learned to be even better than the author himself. Back then this was viewed as a remarkable result: a programmer had written a piece of software that could do something the programmer himself could not, because the program became better than Arthur Samuel at the task of playing checkers.
And today, we are used to computers and machine learning outperforming humans on many tasks. But it turns out that this happens when you choose a narrow task: with speech recognition on a certain type of task, you can maybe surpass human-level performance. If you choose a narrow task like playing the game of Go, then by throwing tons of computational power and self-play at it, you can have a computer become very good at that narrow task.
Tom Mitchell gave a more formal definition of machine learning as:
Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
In this definition, for the case of playing checkers, we have:
- The experience E would be the experience of having the checkers program play tons of games against itself. Computers have lots of patience and can sit there for days playing checkers against themselves.
- The task T is the task of playing checkers.
- The performance measure P is the probability that the program wins the next game of checkers it plays against a new opponent.
So we say that learning to play checkers is a well-posed learning problem.
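Mitchell's E/T/P framing can be made concrete with a toy sketch. Everything below is invented for illustration (the game is stubbed out and "skill" is just a made-up number that we pretend grows with self-play experience); it is not Samuel's actual program:

```python
import random

def play_one_game(skill: float) -> bool:
    """Task T: play one game of checkers (stubbed out).
    Returns True if our program wins; the win probability is a
    made-up increasing function of accumulated skill."""
    win_probability = skill / (skill + 1.0)
    return random.random() < win_probability

def estimate_p(skill: float, n_games: int = 10_000) -> float:
    """Performance measure P: fraction of games won."""
    return sum(play_one_game(skill) for _ in range(n_games)) / n_games

# Experience E: self-play games played so far. We simply pretend that
# skill grows with experience; a real learner would update a model here.
skill_after_few_games = 0.5   # little experience
skill_after_many_games = 4.0  # lots of self-play

random.seed(0)
p_early = estimate_p(skill_after_few_games)
p_late = estimate_p(skill_after_many_games)
print(f"P early: {p_early:.2f}, P late: {p_late:.2f}")
```

The point of the sketch is only the shape of the definition: performance on T, measured by P, improves with experience E.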
Machine Learning Strategy (Learning Theory)
There is a huge difference in effectiveness between how two different teams apply the exact same learning algorithm. The most skilled Machine Learning practitioners are very strategic, because when you are working on a Machine Learning project you have a lot of decisions to make. Do you collect more data? Do you try a different learning algorithm? Do you rent faster GPUs to train your learning algorithm for longer? If you collect more data, what type of data do you collect? And so on for all the architectural choices.
There are a lot of decisions to make when building these learning algorithms, so you should become more systematic, driving machine learning as a systematic engineering discipline. That way, when you are working on a machine learning project, you can efficiently figure out what to do next, somewhat analogous to how software engineers work.
When you run a learning algorithm, it almost never works the first time; that's just life. The way you go about debugging the learning algorithm will have a huge impact on how quickly you can build an effective learning system.
Discipline of Machine Learning
Until now, too much of the process of making learning algorithms work well has been a kind of black magic, passed down by people who have worked on this for decades. When you run something and it doesn't work, you don't know why, so you get help from experienced practitioners and tinker until it works.
What we are trying to do with the discipline of machine learning is to evolve it from a black-magic, tribal-knowledge, experience-based thing into a systematic engineering process.
We should use systematic tools to strategize about how to proceed, so we can be very efficient in leading a team to build an effective learning system, and avoid wasting lots of time on a direction we could have relatively quickly figured out was not promising.
One last analogy: if you're used to optimizing code to make it run faster, less experienced software engineers will just dive in and start optimizing, say rewriting the C++ in assembly. More experienced software engineers will first run a profiler to figure out which part of the code is actually the bottleneck, and then focus on optimizing just that part.
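The profiling workflow from that analogy looks like this in Python, using the standard library's cProfile; `fast_part` and `slow_part` are placeholder functions invented for the sketch:

```python
import cProfile
import io
import pstats

def fast_part():
    # Cheap work: not worth optimizing.
    return sum(range(100))

def slow_part():
    # Deliberately quadratic: the bottleneck the profiler should expose.
    total = 0
    for i in range(300):
        for j in range(300):
            total += i * j
    return total

def program():
    fast_part()
    slow_part()

profiler = cProfile.Profile()
profiler.enable()
program()
profiler.disable()

# Print functions sorted by cumulative time; optimize the top entry first.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The profile output points straight at `slow_part`, which is where the optimization effort should go.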
Supervised Learning
Supervised learning is the machine learning task of learning a function that maps an input X to an output or label Y based on example input-output pairs.
The goal is to learn a function h that maps X to Y, so that for a new input x, h(x) predicts the corresponding output y.
At the heart of Supervised Learning is the idea that during training we are given inputs X together with labels Y, both at the same time, and the job of the learning algorithm is to find a mapping so that, given a new X, we can map it to the most appropriate output Y.
Machine Learning is very useful today; it turns out that most of the recent wave of economic value created by Machine Learning has come through Supervised Learning.
Autonomous Driving with Supervised Learning
There is a very old video of ALVINN, made by Dean A. Pomerleau, which used supervised learning for autonomous driving. It is no longer state of the art and not how self-driving cars are built today, but it is a good example of supervised learning, and it actually does remarkably well.
It uses an artificial neural network to drive a vehicle, and was built at Carnegie Mellon University in 1989. During training, it watches a human drive the vehicle, and about 10 times per second it digitizes the image in front of the vehicle from a front-facing camera. To collect labeled data, while the human is driving, the car records both the image and the steering direction chosen by the human. The image is converted to grayscale at low resolution, and the human's steering direction is saved as the label Y. Initially the neural network doesn't know how to drive, but as the algorithm learns, using the back-propagation learning algorithm and gradient descent, the network's outputs become more and more accurate.
After the learning algorithm has learned, it can be used to drive the vehicle: digitize the image of the road ahead, pass it through the trained neural network, let the network select a steering direction, and use a small motor to turn the wheel.
A slightly more advanced version trains two separate models, one for one-lane roads and one for two-lane roads, and an arbitrator, another algorithm, decides whether the one-lane or the two-lane model is the more appropriate one for a given situation.
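The train-then-drive loop above can be sketched as supervised learning in miniature. The 4-pixel "camera frames" and the 1-nearest-neighbor learner below are stand-ins, invented for illustration, for ALVINN's real images and neural network:

```python
# Toy 'camera frames': 4-pixel grayscale strips, each labeled with the
# steering direction the human driver chose at that instant.
training_data = [
    ((0.9, 0.8, 0.1, 0.1), "left"),      # road edge on the right
    ((0.8, 0.9, 0.2, 0.1), "left"),
    ((0.1, 0.1, 0.8, 0.9), "right"),     # road edge on the left
    ((0.1, 0.2, 0.9, 0.8), "right"),
    ((0.5, 0.5, 0.5, 0.5), "straight"),
]

def distance(a, b):
    """Squared Euclidean distance between two frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_steering(image):
    """Stand-in for the trained network: return the steering label of
    the closest training frame (1-nearest-neighbor)."""
    _, label = min(training_data, key=lambda pair: distance(pair[0], image))
    return label

# After 'training', the car digitizes a new frame and steers accordingly.
print(predict_steering((0.85, 0.9, 0.15, 0.05)))  # → left
```

Same supervised recipe as ALVINN: labeled (image, steering) pairs in, a steering prediction for a new image out.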
Regression Problem
The term regression refers to the case where the value of Y we are trying to predict is continuous, a real number.
For example:
- predicting the price of a house based on its features, such as the number of bedrooms, bathrooms, square footage, etc.
- predicting the price of a stock based on its features, such as the company's revenue, earnings, etc.
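The house price example can be sketched as ordinary least squares on one feature. The house sizes and prices below are made up:

```python
# Hypothetical training data: (square footage, price in $1000s).
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [200, 290, 410, 500, 590]

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Closed-form least squares for a line y = slope * x + intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) \
        / sum((x - mean_x) ** 2 for x in sizes)
intercept = mean_y - slope * mean_x

def predict_price(square_feet):
    """Continuous, real-valued output: this is what makes it regression."""
    return slope * square_feet + intercept

print(round(predict_price(2200)))  # → 438
```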
Classification Problem
The term classification refers to the case where the value of Y we are trying to predict is discrete.
For example:
- Email Spam: predicting whether an email is spam or not.
- Cancer Grade: predicting the grade of a cancer (level 1, 2, or 3) based on its size and other features.
- Fraudulent Credit Card Transaction: predicting whether a credit card transaction is fraudulent or not based on its features.
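A minimal classification sketch on invented data: label an email "spam" or "not spam" from two features (suspicious-word count, link count), using a simple nearest-centroid rule. The examples and features are made up for illustration:

```python
# Hypothetical labeled examples: (suspicious word count, link count) -> label.
examples = [
    ((8, 5), "spam"), ((6, 7), "spam"), ((9, 4), "spam"),
    ((1, 0), "not spam"), ((0, 1), "not spam"), ((2, 1), "not spam"),
]

def centroid(label):
    """Mean feature vector of all training examples with this label."""
    points = [x for x, y in examples if y == label]
    return tuple(sum(c) / len(points) for c in zip(*points))

centroids = {label: centroid(label) for label in ("spam", "not spam")}

def classify(features):
    """Discrete output: pick the label whose centroid is closest."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

print(classify((7, 6)))  # → spam
print(classify((1, 1)))  # → not spam
```

Unlike the regression example, the output here is one of a discrete set of labels, which is exactly what makes this classification.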
Unsupervised Learning
Unsupervised Learning is the case where we are given a set of inputs X without any labels Y, and the job of the learning algorithm is to find some structure in the data.
For example:
- Google News: every day Google News crawls tens of thousands of news articles on the Internet and groups them together, taking articles written by different reporters from different news sources and figuring out which stories are about the same thing.
- Social Networks: social networks like Facebook, Instagram, and Twitter have lots of users and want to group them into different communities and friend groups.
- Market Segmentation: Market segmentation is the process of dividing a market into distinct groups of consumers with similar needs or characteristics.
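Market segmentation with no labels can be sketched with k-means clustering. The customer data (annual visits, average spend) is invented, and the deterministic farthest-point initialization is a simplification chosen for the sketch:

```python
# Unlabeled customers: (annual visits, average spend in $). No labels Y.
customers = [
    (2, 15), (3, 20), (1, 10), (4, 25),          # occasional, low spend
    (40, 120), (35, 150), (50, 130), (45, 110),  # frequent, high spend
]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, n_iters=10):
    # Deterministic init: the first point, plus the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: squared_distance(p, c0))
    centers = [c0, c1]
    for _ in range(n_iters):
        clusters = [[], []]
        for p in points:  # assignment step: each point joins its nearest center
            nearest = min((0, 1), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        for j in (0, 1):  # update step: move each center to its cluster mean
            if clusters[j]:
                centers[j] = tuple(sum(c) / len(clusters[j])
                                   for c in zip(*clusters[j]))
    return centers, clusters

centers, clusters = kmeans(customers)
print(centers)  # one centroid per discovered customer segment
```

The algorithm is never told which customer belongs to which segment; the two groups emerge from the structure of the data alone.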
Cocktail Party Problem
The cocktail party problem: in a noisy room full of people talking, you place an array of multiple microphones and record overlapping, unlabeled voices, with each voice reaching multiple microphones. How can an algorithm separate out the individual voices, so you get a clean recording of just one voice at a time? An algorithm you can use to do this is Independent Component Analysis (ICA).
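The setup can be sketched as follows. Two toy "voices" are mixed into two microphone recordings by a mixing matrix A. Real ICA (e.g. scikit-learn's `FastICA`) would have to estimate the unmixing matrix from the recordings alone; here we cheat and invert A directly, just to show that the right unmixing matrix recovers clean per-speaker signals from the overlapping recordings. All signals and numbers are invented:

```python
import math

# Two 'voices' (source signals) sampled over time: a sine and a square wave.
t = [i / 100 for i in range(200)]
voice1 = [math.sin(2 * math.pi * 3 * x) for x in t]
voice2 = [1.0 if (x * 5) % 1 < 0.5 else -1.0 for x in t]

# Each microphone hears a different weighted mix of both voices:
# mic_i[k] = A[i][0] * voice1[k] + A[i][1] * voice2[k]
A = [[0.8, 0.3],
     [0.4, 0.7]]
mic1 = [A[0][0] * s1 + A[0][1] * s2 for s1, s2 in zip(voice1, voice2)]
mic2 = [A[1][0] * s1 + A[1][1] * s2 for s1, s2 in zip(voice1, voice2)]

# ICA's job is to estimate an unmixing matrix W ≈ A⁻¹ from mic1/mic2 alone.
# Here we invert A by hand to show what the correct W accomplishes.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
W = [[A[1][1] / det, -A[0][1] / det],
     [-A[1][0] / det, A[0][0] / det]]
recovered1 = [W[0][0] * m1 + W[0][1] * m2 for m1, m2 in zip(mic1, mic2)]
recovered2 = [W[1][0] * m1 + W[1][1] * m2 for m1, m2 in zip(mic1, mic2)]

print(max(abs(r - s) for r, s in zip(recovered1, voice1)))  # ≈ 0
```

The hard part, which this sketch skips, is that ICA finds W without ever seeing A or the original voices, by exploiting the statistical independence of the sources.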
Internet Unlabeled Data
The Internet has tons of unlabeled text data; you can just pull data down from the Internet with no labels at all. Can you still learn interesting things about language? One of the best-cited recent results was learning analogies: man is to woman as king is to queen, or Tokyo is to Japan as Washington, D.C. is to the United States.
We can learn analogies like that from unlabeled data, just from text on the Internet. That’s also Unsupervised Learning.
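The analogy trick works by vector arithmetic on word embeddings. The tiny 3-dimensional vectors below are hand-made for illustration; real systems learn embeddings with hundreds of dimensions from unlabeled text:

```python
import math

# Hand-made toy 'word vectors', invented purely for this sketch.
vectors = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 0.9],
    "queen": [1.0, 1.0, 0.9],
    "japan": [0.0, 0.3, 0.5],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via vector arithmetic: b - a + c,
    then return the nearest remaining word by cosine similarity."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "woman", "king"))  # → queen
```

The vectors here were chosen so that the man→woman offset matches the king→queen offset; the remarkable thing is that learned embeddings exhibit this same geometry without anyone designing it in.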
Deep Learning
There's one subset of Machine Learning that's really hot right now, because it's advancing very rapidly, called Deep Learning.
Reinforcement Learning
Reinforcement Learning is the case where we have an agent that is interacting with an environment. The agent is trying to learn a policy, which is a mapping from states to actions, so that it can maximize its reward over time.
Let's say I give you the keys to my autonomous helicopter (it is actually sitting in my office, and I'm trying to figure out how to get rid of it), and I ask you to write a program to make it fly. How do you do that? You can use learning algorithms to get robots to do pretty interesting things like this, and it turns out that a good way to do it is through Reinforcement Learning.
It turns out that no one knows the optimal way to fly a helicopter. In our example, you have two control sticks to move, but no one knows the optimal way to move them. So the way you can get a helicopter to fly itself is to let it do whatever it wants; think of this as training a dog. You can't teach a dog the optimal way to behave, so how do you train one? You let the dog do whatever it wants, and whenever it behaves well you say, "Oh, good dog!", and when it misbehaves you say, "Bad dog!". Over time the dog learns to do more of the good-dog things and fewer of the bad-dog things. Reinforcement Learning is a bit like that.
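The good dog / bad dog loop can be sketched with tabular Q-learning on a tiny invented environment: a 5-position corridor where moving right toward a treat at position 4 eventually earns reward +1 and everything else earns 0. All the numbers (learning rate, discount, exploration rate) are illustrative choices:

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # left, right

def step(state, action):
    """The environment: move, stay within bounds, reward only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

# Tabular Q-learning: learn Q[state][action] from reward signals alone.
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
alpha, gamma, epsilon = 0.5, 0.9, 0.2
rng = random.Random(0)

for _ in range(500):  # episodes of trial and error
    state, done = 0, False
    while not done:
        if rng.random() < epsilon:               # explore: try something random
            action = rng.choice(ACTIONS)
        else:                                    # exploit: current best guess
            action = max(ACTIONS, key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # 'Good dog' update: nudge Q toward reward + discounted future value.
        best_next = max(Q[next_state].values())
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
        state = next_state

policy = {s: max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)}
print(policy)
```

Nobody tells the agent the optimal action; it only ever sees rewards, yet the learned policy heads right from every non-goal position.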
Recently, the most famous applications of Reinforcement Learning have been in game playing, such as Atari games or the game of Go with AlphaGo. Game playing has produced some remarkable stunts and remarkable PR, but I'm even more excited about the interesting inroads reinforcement learning is making into robotics applications.
Reinforcement Learning has proven fantastic at playing games, and it is also making real traction in optimizing robots, logistics systems, and things like that.