|Context||NIPS 2015 Reasoning, Attention, and Memory Workshop|
The ultimate goal of communication-based AI is an intelligent machine that can do almost anything: help students understand homework, help researchers find information, do tasks that are too demanding, write programs, and so on. A roadmap toward that goal needs three things: a minimal set of components required for an intelligent machine, an approach to constructing the machine, and the requirements the machine must meet to be scalable.
Machines need the ability to communicate: what matters is less that they can behave well in some specific environment than that they can communicate with us in a way we can understand. Machines also need to be motivated to solve the problems we are interested in. Finally, the machine needs to be able to adapt (learn) over time, in particular by using very long-term memory. To develop such machines, we need an environment that can teach basic communication skills and learning strategies. The environment should reward correct behavior to ensure motivation, and the machine's skills should be built incrementally, starting from simpler problems.
It makes sense to use an environment whose behavior we know completely but which still teaches the requisite communication skills. The tasks must be designed carefully to have the appropriate complexity, so that the machine does not fail early and keeps making progress. The environment must include a learner, a teacher, rewards, and communication channels, and tasks should become more complex as earlier ones are completed. Overall this matches some of the goals of reinforcement learning. The environment represents the learner's world: the learner is an intelligent machine that receives an input signal, produces an output signal, and receives a reward signal based on that output. The teacher specifies tasks for the learner. At first the teacher can itself be a machine, but eventually it should be replaced by human users.
For example, a machine could receive a list of instructions (“move and learn”). It could then take the actions “move” and “learn” by communicating those actions, and it would receive a reward once it has completed them.
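The teacher, learner, and reward loop described above can be sketched as a toy program. This is only an illustration under stated assumptions: the class names `Teacher` and `EchoLearner`, the `" and "` message format, and the rule that a wrong action restarts the task are all hypothetical choices, not details from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Teacher:
    """Hypothetical scripted teacher: issues an instruction and rewards completion."""
    instruction: List[str]   # e.g. ["move", "learn"]
    progress: int = 0        # number of requested actions completed so far, in order

    def prompt(self) -> str:
        # The instruction travels over the communication channel as plain text.
        return " and ".join(self.instruction)

    def observe(self, action: str) -> int:
        """Return reward 1 when the last requested action is completed, else 0."""
        if action == self.instruction[self.progress]:
            self.progress += 1
        else:
            self.progress = 0  # assumption: a wrong action restarts the task
        if self.progress == len(self.instruction):
            self.progress = 0
            return 1
        return 0


class EchoLearner:
    """Hypothetical hand-written learner that performs each action it is told."""

    def act(self, message: str) -> List[str]:
        return message.split(" and ")


def run_episode(teacher: Teacher, learner: EchoLearner) -> int:
    """Send one instruction to the learner and sum the rewards it earns."""
    total_reward = 0
    for action in learner.act(teacher.prompt()):
        total_reward += teacher.observe(action)
    return total_reward


if __name__ == "__main__":
    print(run_episode(Teacher(["move", "learn"]), EchoLearner()))
```

A real learner would have to discover the mapping from messages to actions from the reward signal alone; the hand-written `EchoLearner` only shows the shape of the interaction.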
It is important to build a machine that can learn quickly, so that it can scale from simple tasks to complex problems. Learning quickly means that the machine can be shown a new type of behavior, be guided through a few tasks, and then generalize to similar tasks later. Over time, the learner should need less and less supervision to complete tasks and receive rewards. Eventually, the learner must be capable of communicating with humans who can teach it new behavior, and it must be able to operate with the real world as its environment. Overall, the machine needs long-term memory, needs to be Turing-complete, needs to handle incremental and compositional learning, and needs to get by with a decreasing amount of supervision through rewards. Current models do not meet these requirements. For example, certain trivial patterns are out of scope for RNNs; stack-augmented recurrent networks are an example of a new model that can solve some of these tasks.
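To make "trivial patterns" concrete, here is a small sketch using the grammar a^n b^n, a common example of a pattern that requires counting. The choice of this grammar is an assumption; the notes do not name a specific pattern. The functions `anbn` and `predict_with_counter` are hypothetical helpers showing that a single counter is enough to predict the deterministic part of each sequence.

```python
from typing import List


def anbn(n: int) -> str:
    """Generate one string of the grammar a^n b^n, e.g. n=3 gives "aaabbb"."""
    return "a" * n + "b" * n


def predict_with_counter(sequence: str) -> List[str]:
    """Predict the symbol that follows each position, using one counter.

    "?" marks positions where the next symbol cannot be known (after an "a",
    either another "a" or the first "b" may follow). "$" marks end of sequence.
    """
    predictions = []
    count = 0
    for symbol in sequence:
        if symbol == "a":
            count += 1                # remember how many "a"s were seen
            predictions.append("?")
        else:
            count -= 1                # each "b" cancels one "a"
            predictions.append("b" if count > 0 else "$")
    return predictions


if __name__ == "__main__":
    print(predict_with_counter(anbn(3)))
```

The counter has to hold an arbitrarily large value as n grows, which is the kind of unbounded memory the requirements above ask for.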
The stack RNN extends recurrent networks with a structured long-term memory that the network learns to control, an idea that dates back to the 1980s and 90s. The model is fully continuous and is trained to read from and write to an unbounded memory using push, pop, and no-operation actions. Other memory structures such as lists, queues, tapes, or grids could also be used. Because the model is continuous, all of its behavior can be learned from examples. It is able to learn simple patterns (grammars) in an unsupervised manner, just from observing example sequences. Plain RNNs cannot count, so they cannot solve these problems. LSTMs can solve them when they are simple enough. Stack RNNs can solve them for complex grammars because they can learn to push and pop a variable number of elements on the stack. The stack RNN can also learn binary addition in an “unsupervised” manner, that is, by learning to predict the next symbol correctly. Interestingly, the model ends up using different stacks for clearly distinct purposes (counters, end-of-number symbols, length of the numbers, carry, etc.).
In summary, stack RNNs are Turing-complete when they have at least two stacks, can learn some algorithmic patterns, have long-term memory, and work on problems that break RNNs and LSTMs. However, the stack only stores partial calculations rather than the logic. The logic is stored in the weight matrices, so a model trained on one task and then trained on another will forget how to do the first. The model is also fairly inefficient.
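A minimal sketch of how a continuous stack can support push, pop, and no-operation at once: each new cell is a weighted mix of what it would hold after each action, weighted by the probability the controller assigns to that action. Several things here are assumptions for illustration: the function names, the fixed stack depth (the notes describe the memory as unbounded), and the use of a softmax over action logits, which in a full model would come from the hidden state.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn controller logits into action probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_stack_update(stack: Sequence[float],
                      action_probs: Sequence[float],
                      push_value: float) -> List[float]:
    """Apply one soft step to a stack whose top is stack[0].

    action_probs is (p_push, p_pop, p_noop). With a fixed depth (an
    assumption of this sketch), a push drops the bottom cell and a pop
    fills the bottom cell with 0.0.
    """
    p_push, p_pop, p_noop = action_probs
    depth = len(stack)
    new_stack = []
    for i in range(depth):
        pushed = push_value if i == 0 else stack[i - 1]   # cell i after a push
        popped = stack[i + 1] if i + 1 < depth else 0.0   # cell i after a pop
        kept = stack[i]                                   # cell i after a no-op
        new_stack.append(p_push * pushed + p_pop * popped + p_noop * kept)
    return new_stack


if __name__ == "__main__":
    probs = softmax([2.0, 0.0, 0.0])  # hypothetical logits favouring push
    print(soft_stack_update([0.4, 0.8, 0.0], probs, push_value=1.0))
```

Because every operation is a sum of products, the update is differentiable in both the action probabilities and the pushed value, which is what allows the stack behavior to be learned from examples by gradient descent.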