WHY THIS MATTERS IN BRIEF
- Decades-old AI algorithms are increasingly demonstrating that, with some fine-tuning, they can thrash today’s best systems, and people are taking notice
Artificial intelligence (AI) researchers have a long history of going back in time to explore old ideas, and now researchers at OpenAI, which is backed by Elon Musk, have revisited “Neuroevolution,” a field that has been around since the 1980s, and they’ve achieved state-of-the-art results.
The group, which was led by OpenAI’s research director Ilya Sutskever, explored the use of a set of algorithms called “Evolution strategies,” which are aimed at solving “optimisation” problems. Optimisation problems are just what they sound like: take something that needs optimising, such as your route to work, a flight plan, or even a healthcare treatment, and find the best version of it.
On an abstract level, the technique the team used works by letting successful algorithms pass their characteristics on to future generations, so each successive generation gets better and better at whatever task it has been assigned. Coming back to the present day, the researchers took these algorithms and reworked them so they’d work better with today’s deep neural networks and run better on large-scale distributed computing systems.
To validate the new system’s effectiveness they set the algorithms to work on a series of challenges that are seen as benchmarks for reinforcement learning. This is the technique behind many of Google DeepMind’s most impressive feats, which range from teaching their AIs to learn as fast as humans and giving them human-like memory, through to creating new Artificial General Intelligence (AGI) architectures, and teaching them to dream and annihilate online Go players, to name but a few.
One of the challenges was to train the algorithm to play a variety of Atari computer games, and the other was to get it to learn how to control a virtual humanoid walker in a physics engine.
First the algorithm started with a random policy, the set of rules that govern how the system should behave to achieve a high score. It then created several hundred copies of the policy, each with some random variation, and tested them on the game. These policies were then mixed back together again, but with greater weight given to the ones that got the highest scores in the game. The team repeated the process until it came up with a policy that played the game well.
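The loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than OpenAI’s code: the function names, the toy scoring function that stands in for a game, and the parameter values (population size, noise scale, learning rate) are all assumptions made for the example, as is the choice to standardise the scores before using them as weights.

```python
import random

def evolve(score, dim, population=200, sigma=0.1, learning_rate=0.05,
           generations=200, seed=0):
    """Perturb a policy, score the copies, and recombine them by score."""
    rng = random.Random(seed)
    # Start from a random policy (here just a list of numbers).
    policy = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(generations):
        # Create several hundred copies, each with its own random variation.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(population)]
        scores = [score([p + sigma * n for p, n in zip(policy, noise)])
                  for noise in noises]
        # Standardise the scores so above-average copies get positive weight
        # and below-average copies get negative weight (an assumption here).
        mean = sum(scores) / population
        std = (sum((s - mean) ** 2 for s in scores) / population) ** 0.5 or 1.0
        weights = [(s - mean) / std for s in scores]
        # Mix the variations back together, weighted by how well each scored.
        for i in range(dim):
            update = sum(w * noise[i] for w, noise in zip(weights, noises))
            policy[i] += learning_rate * update / population
    return policy

# Hypothetical stand-in for a game: the score is highest at TARGET.
TARGET = [0.5, -0.3, 0.8]

def toy_score(candidate):
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))

best = evolve(toy_score, dim=len(TARGET))
print(best)
```

In a real setting the policy would be the weights of a neural network and the score would come from playing the game, but the shape of the loop stays the same.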
In just an hour of training on the Atari challenge the algorithm achieved a level of mastery that took DeepMind’s reinforcement learning system a whole day to learn, and on the walking problem it took just 10 minutes, compared to DeepMind’s 10 hours.
One of the keys to this dramatic performance improvement was the fact that the new system was superb at processing workloads in parallel. To solve the walking simulation, for example, the system spread its computations over 1,440 CPU cores, while in the Atari challenge it used 720.
This was possible because the system only required limited communication between the various “worker” algorithms testing the candidate policies, whereas reinforcement learning algorithms like DeepMind’s have to communicate a lot more as they scale. Additionally, the new system didn’t need to use “backpropagation,” a common neural network learning technique that effectively compares the network’s output with the desired output and then feeds the resulting error back through the network to help optimise it.
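One way workers can keep their communication this light is to send back only a random seed and a score, and let whoever combines the results regenerate each variation from its seed. The sketch below illustrates that idea under stated assumptions: the function names and the toy scoring function are hypothetical, the workers are simulated in a plain loop rather than on separate CPU cores, and nothing here is taken from OpenAI’s actual code.

```python
import random

def noise_from_seed(seed, dim):
    # Anyone who knows the seed can rebuild exactly the same variation.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def worker(policy, seed, sigma, score):
    # Test one varied copy of the policy and report just two numbers.
    noise = noise_from_seed(seed, len(policy))
    candidate = [p + sigma * n for p, n in zip(policy, noise)]
    return seed, score(candidate)

def combine(policy, results, learning_rate):
    # Rebuild each variation from its seed and weight it by its score.
    scores = [s for _, s in results]
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5 or 1.0
    new_policy = list(policy)
    for seed, s in results:
        weight = (s - mean) / std
        for i, n in enumerate(noise_from_seed(seed, len(policy))):
            new_policy[i] += learning_rate * weight * n / len(results)
    return new_policy

# Hypothetical stand-in for a game: the best policy is all ones.
def toy_score(candidate):
    return -sum((c - 1.0) ** 2 for c in candidate)

policy = [0.0, 0.0, 0.0, 0.0]
results = [worker(policy, seed, 0.1, toy_score) for seed in range(50)]
updated = combine(policy, results, learning_rate=0.1)
print(toy_score(policy), toy_score(updated))
```

Each message is a seed and a score, no matter how large the policy is, which is what makes spreading the work across many cores cheap in this sketch.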
Combined, these factors helped make the new system’s code shorter and the algorithm three to four times faster. But the approach has its limitations. These kinds of algorithms are usually compared based on their data efficiency, the number of iterations required to achieve a specific score in a game. Using this metric, the OpenAI approach did worse than the traditional reinforcement learning approaches, even though it carried out those iterations much quicker.
For supervised learning problems, such as image classification and speech recognition, it was up to 1,000 times slower than approaches that use backpropagation. And that’s bad.
Nevertheless, the work demonstrated promising new applications for evolutionary approaches that were once thought obsolete, and OpenAI isn’t the only group investigating them. Google has also been experimenting with older algorithms, so while I don’t know about dogs, it certainly looks like you can teach old algorithms new tricks.
Matthew Griffin Global Futurist, Tech Evangelist, X Prize Mentor ● Int'l Keynote Speaker ● Disruption, Futures and Innovation expert