Artificial Intuition and Reinforcement Learning, the Next Steps in Machine Learning

Humans and machines differ in many ways in how they act.

A Form of Intuition

Yet developments in AI have produced machines that are not only more capable but also seem to have developed a form of intuition. The insight comes from research at Google’s DeepMind on AlphaGo, an AI program that mastered the ancient game of Go and went on to defeat some of the best human players in the world. A successor, AlphaGo Zero, then defeated AlphaGo. It appears to have developed its own strategy based on what looks like intuitive thinking, something long believed to be possible only for humans, not computers.

Intuition has more to do with gut feeling than with calculated decision-making. Being intuitive is not the same as being intellectual; they are two different cognitive processes. Intelligence draws on what is known, while intuition deals with the unknown. Intuition rests on feeling, intelligence on logic. Humans can make a decision based on what they feel, not necessarily on what would be logical. Computers don’t have emotions as humans do, so for a machine built on binary logic to act on something like a “hunch” is remarkable. Yet that is part of how a complex game like Go is won: there are far too many possible positions to calculate exhaustively, so decisions have to rest on a judgment of how likely a move is to lead to a good outcome. For example, an opponent can make a mistake that a machine following only explicit rules would not notice unless it looked at the position from a human perspective. Understanding what the opponent is trying to do, based on things beyond the rules of the game, is intuitive thinking. This suggests that AlphaGo Zero has some function that might be analogous to human intuition, even though it is a machine.

AlphaGo Zero, the successor to AlphaGo, has beaten its predecessor at its own game. AlphaGo is renowned for beating some of the world’s top players at Go, an ancient Chinese board game whose strategy calls for intuitive thinking. Until recently, computers were not thought capable of making decisions that resemble intuition. Then Google’s DeepMind developed AlphaGo to play Go and eventually develop its own strategies. It worked so well that even top Go players learned new things from it. The only thing that could beat AlphaGo was a newer version of itself, AlphaGo Zero. That result shows that deep learning has made a major advance. Deep learning is a subset of machine learning. Reinforcement learning (RL), sometimes loosely called “self-learning”, is another branch of machine learning, and AlphaGo Zero combines the two: it uses artificial neural networks (ANNs) to turn data into decisions.

After only three days of self-play, AlphaGo Zero was strong enough to defeat the version of AlphaGo that beat 18-time world champion Lee Se-dol, 100 games to nil. After 40 days, it had a 90 percent win rate against the most advanced version of the original AlphaGo software. DeepMind, the creator of AlphaGo and AlphaGo Zero, says this makes it arguably the best Go player in history, and it is non-human.

This was an example of “self-play reinforcement learning”, the technique AlphaGo Zero used. It allowed the program to train itself from scratch and become better than its predecessor in a far shorter time. AlphaGo Zero played Go against itself millions of times without human intervention, meaning it learned without any human-labeled examples, starting only from the rules of the game. In effect, the neural network in AlphaGo Zero creates its own “artificial knowledge”. AlphaGo Zero learned by reinforcement: from sequences of actions, the consequences of those actions, and the incentive (a win or a loss) at the end of each game.
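To make the idea of self-play concrete, here is a minimal sketch in Python. It is not AlphaGo Zero's method, which pairs a deep neural network with tree search. It swaps in a lookup table and a toy game, both chosen purely for illustration: players alternately take 1 to 3 stones, and whoever takes the last stone wins. The function names, the game, and the parameters (epsilon, lr) are all assumptions made for this example.

```python
import random
from collections import defaultdict


def legal_moves(stones):
    """A player may take 1, 2 or 3 stones, but never more than remain."""
    return [n for n in (1, 2, 3) if n <= stones]


def self_play_train(start=10, episodes=20000, epsilon=0.2, lr=0.1, seed=0):
    """Learn move values by playing the toy game against itself.

    q[(stones, move)] estimates the final result (+1 win, -1 loss)
    for the player who makes `move` with `stones` left on the table.
    """
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        stones, history = start, []
        while stones > 0:
            moves = legal_moves(stones)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # explore: try a random move
            else:
                move = max(moves, key=lambda m: q[(stones, m)])  # exploit
            history.append((stones, move))
            stones -= move
        # The player who took the last stone won. Walking backwards through
        # the game, the movers alternate, so the reward flips sign each step.
        reward = 1.0
        for state_move in reversed(history):
            q[state_move] += lr * (reward - q[state_move])
            reward = -reward
    return q


def best_move(q, stones):
    """Greedy move according to the learned table."""
    return max(legal_moves(stones), key=lambda m: q[(stones, m)])


if __name__ == "__main__":
    table = self_play_train()
    for stones in (3, 5, 6, 7):
        print(stones, "stones left -> take", best_move(table, stones))
```

Both sides share one table, so every game improves the "opponent" as well as the "player". With enough games the greedy moves should tend toward leaving the opponent a multiple of four stones, which is the known winning strategy for this toy game. No human examples are involved at any point, only the rules and the final result.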

Basic RL is built on the Markov decision process (MDP), a model in which each action moves the system from one state to another with some probability and some reward. For every move AlphaGo or AlphaGo Zero makes, it estimates the probability of outcomes with the aid of powerful processors called TPUs (Tensor Processing Units), with the work spread across them asynchronously. To explain, asynchronous means that a task does not have to wait for earlier tasks to complete before it executes, so tasks can be processed in parallel. This matters for machine learning because it lets many positions be evaluated at once, which helps AlphaGo Zero respond to new input without being explicitly programmed for it. AlphaGo Zero did not require any human intervention in the sense that it learned to play Go by playing against itself until it could anticipate its own moves and how those moves would affect the game’s outcome.
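The Markov decision process mentioned above can be sketched with a tiny, made-up example. The states, actions, probabilities and rewards below are invented for illustration and have nothing to do with real Go positions. The solver is classic value iteration, a textbook way to solve small MDPs, not the neural network and tree search that AlphaGo Zero relies on.

```python
# TRANSITIONS[state][action] = list of (probability, next_state, reward).
# All names and numbers here are hypothetical.
TRANSITIONS = {
    "opening": {
        "safe": [(1.0, "middle", 0.0)],
        "risky": [(0.5, "won", 1.0), (0.5, "lost", -1.0)],
    },
    "middle": {
        "safe": [(0.8, "won", 1.0), (0.2, "lost", -1.0)],
        "risky": [(0.5, "won", 1.0), (0.5, "lost", -1.0)],
    },
    "won": {},   # terminal states have no actions
    "lost": {},
}


def expected_return(outcomes, values, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)


def value_iteration(transitions, gamma=1.0, sweeps=50):
    """Repeatedly back up the best expected return for each state."""
    values = {state: 0.0 for state in transitions}
    for _ in range(sweeps):
        for state, actions in transitions.items():
            if actions:  # skip terminal states
                values[state] = max(
                    expected_return(outcomes, values, gamma)
                    for outcomes in actions.values()
                )
    return values


def greedy_policy(transitions, values, gamma=1.0):
    """Pick, in each state, the action with the best expected return."""
    return {
        state: max(
            actions,
            key=lambda a: expected_return(actions[a], values, gamma),
        )
        for state, actions in transitions.items()
        if actions
    }


if __name__ == "__main__":
    v = value_iteration(TRANSITIONS)
    print(v)
    print(greedy_policy(TRANSITIONS, v))
```

In this toy model the "safe" action comes out ahead in both states, because an 80 percent chance of winning later is worth more than a coin flip now. Go has far too many states for a table like this, which is why a program such as AlphaGo Zero has to estimate these values with a neural network instead.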
