A lot of AI is trapped in the digital world. It targets us with ads on YouTube. It generates stories and marketing copy with language models like GPT-3. It creates new and better deepfake videos to entertain us.
But a more difficult set of problems confronts AI as it moves into the physical world. Deep reinforcement learning (RL) is one of the crucial technologies that will solve problems there.
Deep RL is a kind of AI that can learn how to reach a goal over many steps, often by inventing surprising moves to navigate complex environments. It embeds deep artificial neural networks within a reinforcement learning framework that trains them through rewards and penalties as they try to achieve their goal.
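To make this concrete, here is a minimal sketch of that loop in Python, assuming the open-source gymnasium and PyTorch libraries. The network size, learning rate, discount factor, and exploration rate are illustrative choices, not a recipe:

```python
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
# The deep network: maps an observed state to a value estimate for each action.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

obs, _ = env.reset(seed=0)
for _ in range(1_000):
    state = torch.as_tensor(obs, dtype=torch.float32)
    if torch.rand(()).item() < 0.1:          # occasionally try something new
        action = int(env.action_space.sample())
    else:
        action = int(q_net(state).argmax())  # otherwise take the best-known move
    next_obs, reward, terminated, truncated, _ = env.step(action)

    # The reward (or penalty) drives training: nudge the network's value for
    # this state-action pair toward the reward plus discounted future value.
    with torch.no_grad():
        next_state = torch.as_tensor(next_obs, dtype=torch.float32)
        target = reward + (0.0 if terminated else 0.99 * q_net(next_state).max())
    loss = (q_net(state)[action] - target) ** 2
    opt.zero_grad(); loss.backward(); opt.step()

    if terminated or truncated:
        obs, _ = env.reset()
    else:
        obs = next_obs
```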
RL works in situations where sequential decisions are made on the path to a goal, such as robotic control and optimization problems (e.g., "learn how to walk across the room"), and where decisions early in the sequence determine what can be done later. That is, they make your actions path-dependent, just as a move early in a chess game excludes certain moves later in the game.
The combination of these approaches means deep RL can win games and master real-world environments: the AI can both see its surroundings and act strategically on what it sees.
Deep learning alone needs a lot of data and is usually applied to one-off decisions, such as "what objects are in this image?" Reinforcement learning supplies the sequential, goal-directed part. But as with any high-potential technology, there are many problems to be solved before RL can be widely deployed.
To train a neural network to perform image recognition, all you need are pictures with labels. Someone has to name the thing portrayed in the pixels, and the neural network can learn to predict the name from the pixels. Finding people to label your pictures is usually easy, and you can hire them on Scale or Mechanical Turk.
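For contrast, here is roughly what that supervised setup looks like in code: all the algorithm consumes are (pixels, label) pairs. A sketch assuming PyTorch and torchvision, with MNIST standing in for any pile of labeled pictures:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

data = datasets.MNIST("data", download=True, transform=transforms.ToTensor())
loader = DataLoader(data, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # pixels -> class scores
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for images, labels in loader:              # each batch: pictures plus human-given names
    loss = loss_fn(model(images), labels)  # learn to predict the name from the pixels
    opt.zero_grad(); loss.backward(); opt.step()
```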
When you train a reinforcement learning algorithm, you need to understand the entire system that the algorithm has to navigate as it seeks its goals. Video games are relatively easy for a few reasons: the game state is already fully digitized, the rules and rewards are explicit, and an algorithm can play millions of rounds quickly, cheaply, and safely.
Thinking about what makes video games good for RL also tells us some of the reasons why RL faces challenges when it is applied to real-world problems.
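The appeal is easy to see in code. In a game wrapped in a standard interface, the whole world sits behind two calls; a sketch assuming the open-source gymnasium package:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)               # the full game state, already digitized
for _ in range(10_000):                     # millions of cheap, safe trials are possible
    action = env.action_space.sample()      # the action space is small and fully known
    obs, reward, terminated, truncated, info = env.step(action)
    # the reward signal is explicit and arrives immediately
    if terminated or truncated:
        obs, info = env.reset()             # resets are free; nothing breaks
```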
The real world is mostly not digitized. That is, no digital copy exists for most real-world situations, interactions, and behavior, and there are often no sensors in place for an algorithm to figure out what's going on. For privacy reasons, that's probably a good thing, but when you are training an AI to solve a problem in physical space, it's a constraint. AI without data is as useful as a toaster without electricity.
The APIs of the real world, the ways we observe it and act upon it, are not clean. Our senses tell us what's going on in our immediate surroundings, more or less, but even those surroundings are only partially observable, and we have to infer a lot from what little we can sense. For example, I may hear steps in the next room and have to guess who is walking across the floor.
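A toy version of that inference problem: the listener never observes who is walking, only a noisy clue, and must update a belief over hidden possibilities. The names and probabilities below are invented for illustration:

```python
priors = {"Alice": 0.5, "Bob": 0.5}            # who might be in the next room
likelihood_heavy = {"Alice": 0.2, "Bob": 0.7}  # chance each would make a heavy step

def update(belief, likelihood):
    """Bayes' rule: reweight each hypothesis by how well it explains the sound."""
    posterior = {who: belief[who] * likelihood[who] for who in belief}
    total = sum(posterior.values())
    return {who: p / total for who, p in posterior.items()}

belief = update(priors, likelihood_heavy)  # we heard one heavy footstep
print(belief)  # belief shifts toward Bob, though no one was ever seen
```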
IoT data is famously messy. Just like our senses, the sensors that machines rely upon to navigate the world can generate a lot of error. Interpreting sensory data means constantly filtering out the noise to see if there is any signal left.
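One minimal way to do that filtering is exponential smoothing, sketched below. Real systems often reach for Kalman or particle filters, but the idea is the same: trust each new reading only a little.

```python
import random

def smooth(readings, alpha=0.1):
    """Keep a running estimate; blend in each new reading with weight alpha."""
    estimate = readings[0]
    for reading in readings[1:]:
        estimate = alpha * reading + (1 - alpha) * estimate
        yield estimate

true_temp = 20.0
noisy = [true_temp + random.gauss(0, 2.0) for _ in range(100)]  # a messy sensor feed
filtered = list(smooth(noisy))
print(round(noisy[-1], 2), round(filtered[-1], 2))  # the estimate hugs 20.0 much more closely
```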
Think for a second about the actions available on a video game console, and then compare those to the actions you need to take to walk across the room and pick up a cup of tea (or, more complex still, to find a new job). The decisions we take in real life to achieve our goals are complex, and the so-called action space, the array of choices we have, can seem unlimited. It is hard to learn how to make the right decision in the face of unlimited choice.
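The contrast shows up directly in how practitioners declare action spaces. A sketch using gymnasium's space types, with the button count and joint count chosen for illustration:

```python
from gymnasium import spaces

# A game console: a handful of discrete buttons.
game_pad = spaces.Discrete(8)

# A 7-joint robot arm: each action is a vector of continuous torques,
# so the set of distinct actions is effectively unlimited.
robot_arm = spaces.Box(low=-1.0, high=1.0, shape=(7,))

print(game_pad.sample())   # e.g. 3
print(robot_arm.sample())  # e.g. [ 0.41 -0.87  0.02 ...]
```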
A lot of what makes the world tick is invisible to us. We don't see what happened, let alone understand the causal chain of events. The same is true for algorithms: they can't see everything or know what leads to what, which is something they have to overcome when we ask them to make decisions about what to do next.
These are some of the challenges of applying RL to the real world. You often need to build a digital replica of the system in which you want to achieve your goals; sometimes these are called digital twins, other times simulations. You may not have data that allows you to see what is going on in that world, and even if you have the data, it may not be clean. And finally, it may be hard to learn the actions that will lead to success, if indeed any exist.
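A digital twin in miniature might look like the sketch below: a process model wrapped in the standard environment interface so an agent can practice on it safely. The dynamics here are placeholders, not a real plant model:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class TinyTwin(gym.Env):
    """A one-tank process: the agent adjusts a valve, the twin tracks the level."""

    def __init__(self):
        self.observation_space = spaces.Box(0.0, 1.0, shape=(1,))
        self.action_space = spaces.Box(-0.1, 0.1, shape=(1,))
        self.level = 0.5

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.level = 0.5
        return np.array([self.level], dtype=np.float32), {}

    def step(self, action):
        self.level = float(np.clip(self.level + action[0], 0.0, 1.0))
        reward = -abs(self.level - 0.8)  # goal: hold the tank level at 0.8
        obs = np.array([self.level], dtype=np.float32)
        return obs, reward, False, False, {}  # obs, reward, terminated, truncated, info
```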
With all that said, deep RL is incredibly effective at achieving its goals in the real world in situations where it has access to data and can test its decisions on the environment. The improvements RL shows in those cases are similar to the improvements that DeepMind's algorithms showed in the game of Go: it masters those environments in ways that humans have not. Because of that, it has the potential to revolutionize entire industries.
This leads to the question of ethics. I do not believe that separate ethics exist for AI, as opposed to other technologies and the choices made by their human operators. Much of AI ethics, as it is practiced, is a fig leaf to deflect criticism from the actions and choices that a company or government is making independent of the technology.
And, with few exceptions, it is a jobs program for people who cannot themselves build AI or deeply understand it, but nevertheless want to opine about new and powerful technology. Unfortunately, they will influence governmental policies just as they excite public fears. The AI ethics industrial complex is as good an excuse as any to pour pre-existing opinions about capitalism and technology into a new bottle.
We don't have separate ethics for cars or corkscrews. We have ways of thinking about how harm is caused, or likely to be caused. But, we don't have to reinvent the discipline of ethics now that it applies to large statistical models.
One additional reason why AI ethics is mostly futile is that every technology plays its hand, no matter what we do to stop it. Under the conditions of geopolitical competition and capitalist enterprise, someone somewhere will keep working on almost any technology that gives them an advantage, no matter what the professional scolds say.
Indeed, each great conflict gives rise to a major new technology. WWI gave us fighter planes, WWII gave us nuclear bombs, and the ongoing cyber war will probably give us a superintelligence amid a more general escalation of weapons development. Do I think we should outlaw superintelligence? If superintelligence can arise from a combination of math, code, and compute, then I don't think writing laws about it will be effective.
Instead of debating ethics, let's talk about limiting the near-term risks of deep RL, at least in this grace period we have before a superintelligence is born.
Deep RL can make sequential decisions in a complex space to achieve a goal. One way that it can do this is by controlling robots. Now, robots can share space with us, and they can hurt us if they're going too fast. So, an easy way to make free-range robots safer is with a hard-coded speed limit. Another key issue in deploying deep RL has to do with the types of decisions we allow it to make. When it's deployed to the battlefield, we need to have humans pull the trigger, just like we do with drones now. That leads us back to the same old ethics we've always had.
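A hard-coded speed limit can be as simple as a safety layer that clamps whatever the learned policy proposes before the robot executes it. In this sketch, `policy` stands in for any trained deep RL model:

```python
import numpy as np

SPEED_LIMIT = 0.5  # meters per second; set by humans, never learned

def safe_action(policy, observation):
    velocity_command = policy(observation)  # the RL policy's proposal
    return np.clip(velocity_command, -SPEED_LIMIT, SPEED_LIMIT)  # the limit applies last
```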
Most of the innovations of Silicon Valley are applied to the consumer internet and web-based software. That makes the internet a great place to be, but we don't just live online, no matter how many screens we stare at.
To make progress on hard, physical problems, the same focus and effort have to be applied to making and shipping physical objects. In manufacturing and logistics, things do not change at the speed of code.
A lot of the potential of deep RL is in the real world, controlling robots, vehicles, and other hardware. As deep RL gets deployed, we will see enormous gains in efficiency (doing more with less), which is one of the underrated ways of fighting climate change. And RL will operate not just on the scale of an individual robot that is picking up objects, but of entire systems, where fleets of vehicles, cranes, and robot arms have to be coordinated and can learn to work as teams. Rather than acting as an individual brain, RL has the potential to serve as a control tower for a swarm of actors, steering teams that coalesce out of complexity.
As boring as it may sound, the first place this will impact the physical world of business is in scheduling, the choreography of capitalism. Which order should the factory process next? What gets made, when, on which machines, and in what amounts is a hard problem for hundreds of thousands of companies worldwide. Brief delays in the flow of crucial goods can lead to enormous breakdowns in the supply chain, as we saw with the Suez Canal blockage.
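Framed for RL, the scheduling problem can be skeletonized like this: the state is the queue of orders and machine status, the action is which order to start next, and the reward penalizes lateness. The field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Order:
    duration: float  # hours of machine time needed
    due_in: float    # hours until the deadline

def reward(order: Order, finish_time: float) -> float:
    """The agent is penalized for every hour an order finishes late."""
    lateness = max(0.0, finish_time - order.due_in)
    return -lateness

# The action at each step: pick one order from the queue. A classic greedy
# baseline is "earliest due date"; a learned policy can discover when to deviate.
def earliest_due_date(queue: list[Order]) -> Order:
    return min(queue, key=lambda o: o.due_in)
```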
In sum, deep RL is a breakthrough technology that solves a lot of hard problems we couldn't solve before. Like all powerful technologies, it is a double-edged sword, so we have to think about both the harms and the benefits it produces. Deep RL's requirements make it hard to set up initially because it learns best within simulations where it can test many decisions without fear of catastrophic consequences in the real world. One of the prime areas of deep RL applications will be global manufacturing and the supply chain because those sectors already have to coordinate the work of many machines together to optimize their performance.
Chris Nicholson is the founder of Pathmind, an AI startup that applies deep reinforcement learning to supply chain and industrial operations. Pathmind was founded to help businesses handle deep economic change and increase the resilience of their operations with AI. Chris oversees the company's strategic vision and day-to-day execution, driving innovation and growth for Pathmind's technology platform and optimizing performance in warehouses and on factory floors as part of the digital transformation of business.
By Chris Nicholson, CEO of Pathmind