How Machine Learning Programs “Learn” – Naive Bayes Classifier and Neural Networks
With all the hype surrounding self-driving cars and video-game-playing AI robots, it’s worth taking a step back and reminding ourselves how machine learning programs actually “learn”. In this article, we look at two machine learning (ML) techniques, the Naive Bayes classifier (the workhorse behind spam filters) and neural networks, and demystify how they work.
And if you’re not sure what machine learning even is, read about the difference between artificial intelligence, machine learning, and deep learning.
A Simple Example: Naive Bayes Classifier
One common machine learning algorithm is the Naive Bayes classifier, which is used for filtering spam emails. It keeps messages like “Nigerian Prince Needs Monetary Assistance!” out of your inbox. So how does it work?
Every time you click the “Mark as Spam” button, you’re updating a gigantic database of sample spam emails. With this information, a computer program can collect statistics about various words or phrases. For instance, it might note that 25% of spam emails contain the phrase “male enhancement”, or that 30% contain the phrase “fast money now”.
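The “learning” step really is just bookkeeping. Here is a minimal sketch of it; the tracked phrases and sample emails below are invented for illustration, and a real mail service would store this far more efficiently:

```python
from collections import Counter

spam_phrase_counts = Counter()  # how many marked spam emails contain each phrase
total_spam = 0

def mark_as_spam(email_text, phrases):
    """What 'Mark as Spam' does conceptually: bump counts for one more email."""
    global total_spam
    total_spam += 1
    lowered = email_text.lower()
    for phrase in phrases:
        if phrase in lowered:
            spam_phrase_counts[phrase] += 1

def spam_rate(phrase):
    """Fraction of marked spam emails containing the phrase."""
    return spam_phrase_counts[phrase] / total_spam

tracked = ["male enhancement", "fast money now"]
mark_as_spam("Get FAST MONEY NOW with this one trick", tracked)
mark_as_spam("Nigerian Prince Needs Monetary Assistance!", tracked)

print(spam_rate("fast money now"))  # → 0.5
```

Statistics like “25% of spam contains this phrase” fall straight out of these counts.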
With these numbers, the Naive Bayes classifier has all the information it needs to mark incoming emails as either legitimate or spam. When you receive a new message, the algorithm looks up statistics about all the words in it. It then combines those statistics with some math (specifically, a probability fact called Bayes’ rule) to classify the email as spam or not. That’s it!
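To make the classification step concrete, here is a minimal sketch with made-up word statistics (a real filter would estimate these from millions of emails and smooth the probabilities to handle unseen words):

```python
import math

# Hypothetical statistics learned from marked emails: P(word | spam) and
# P(word | legitimate). All of these numbers are invented for illustration.
p_word_given_spam  = {"prince": 0.20,  "money": 0.30, "meeting": 0.01}
p_word_given_legit = {"prince": 0.001, "money": 0.05, "meeting": 0.10}
p_spam = 0.4  # prior: fraction of all mail that is spam

def classify(words):
    # Naive Bayes: multiply the per-word probabilities together (done in
    # log space so the product of many small numbers doesn't underflow),
    # then pick whichever hypothesis — spam or legit — scores higher.
    log_spam  = math.log(p_spam)
    log_legit = math.log(1 - p_spam)
    for w in words:
        if w in p_word_given_spam:
            log_spam  += math.log(p_word_given_spam[w])
            log_legit += math.log(p_word_given_legit[w])
    return "spam" if log_spam > log_legit else "legit"

print(classify(["prince", "money"]))   # → spam
print(classify(["meeting", "money"]))  # → legit
```

The “naive” part is the assumption that words occur independently of each other, which is false in practice but works surprisingly well.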
Of course, we’ve glossed over the number crunching that goes into this algorithm, but the actual “learning” process of this ML program is quite simple. It’s just updating statistics about words/phrases (e.g. 40% of emails with the phrase “Nigerian Prince” are spam). There’s nothing mystical about how this program works.
An unfortunate side effect of spam filtering
A Complex Example: Neural Networks
An ML technique that is growing in popularity is neural networks. In July 2016, Google announced it had used neural networks to cut the energy used for data center cooling by a whopping 40%. To cut costs, they needed a way to predict how their PUE (power usage effectiveness) changes with respect to variables like server load, the number of water pumps, the number of cooling towers, and other data center attributes.
Such a calculation is too complicated for an engineer to formulate by hand, so they used a neural network instead. (Google also uses machine learning to predict parking difficulty; these are just two of many applications.)
Google’s neural networks are complicated, but all you need to know is that they contain parameters which get updated as data is fed in. “Data” in the data center example means the PUE levels, server load, number of water pumps, and so on, at different points in time. As these parameters get tweaked, the neural network’s ability to accurately calculate PUE improves. The end result is a program that can tell you what a human engineer couldn’t: how energy efficiency changes based on the data center’s cooling configuration.
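To illustrate the parameter-tweaking loop, here is a drastically simplified stand-in for Google’s model: a single linear “neuron” with just three parameters, trained by gradient descent on invented readings (the real system is a multi-layer network with far more parameters, trained on real sensor logs, but the update loop is the same idea):

```python
import random
random.seed(0)

# Invented "data center" readings: in this toy world the true relationship is
# pue = 1.1 + 0.8*server_load - 0.05*num_pumps, which the model must discover
# purely by tweaking its parameters to reduce prediction error.
def true_pue(load, pumps):
    return 1.1 + 0.8 * load - 0.05 * pumps

samples = []
for _ in range(200):
    load, pumps = random.random(), random.randint(1, 5)
    samples.append(((load, pumps), true_pue(load, pumps)))

# The "parameters" that get updated as data is fed in — just three numbers here.
w_load, w_pumps, bias = 0.0, 0.0, 0.0
lr = 0.01  # learning rate: how big each tweak is

for epoch in range(2000):
    for (load, pumps), target in samples:
        pred = w_load * load + w_pumps * pumps + bias
        err = pred - target
        # Nudge each parameter in the direction that shrinks the error
        # (stochastic gradient descent on squared error).
        w_load  -= lr * err * load
        w_pumps -= lr * err * pumps
        bias    -= lr * err

print(round(w_load, 2), round(w_pumps, 2), round(bias, 2))
```

After training, the parameters converge to roughly 0.8, -0.05, and 1.1 — the model has “learned” the relationship between configuration and PUE, with no mystery involved.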
Google used this algorithm to achieve the amazing 40% savings. In summary, Google’s neural network “learned” how to predict PUE based on the configuration of the data center. But remember, this “learning” process wasn’t anything mysterious. It was just tweaking parameters, i.e. actual numbers, in the neural network to make its PUE estimates more accurate.
Google data center