|Context|NIPS 2015 Reasoning, Attention, and Memory Workshop|
Neural Turing Machines attempt to simulate human working memory using a linearly structured memory, an approach that has proven successful for simple algorithmic tasks. This linear structure, however, may be insufficient for more complicated tasks such as question answering. From a neurological perspective the Neural Turing Machine is plausible, since its “controller” and “memory” are jointly learned from the environment, much as ours are. This advantage is also a disadvantage: the interaction between the controller’s parameters and the memory content can be hard to control, which can prevent or slow convergence on tasks such as copy, recall, and question answering. The human brain is not necessarily linear in structure; it may be closer to a tree, with branches and conditionals. This suggests that giving the Neural Turing Machine a more structured memory may improve its results.
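For context, the NTM’s “linear” memory is a flat matrix of rows addressed softly by content. A minimal sketch of a content-based read (softmax over scaled cosine similarities, as in the original NTM; the memory contents, key, and sharpness `beta` below are illustrative values, not from the paper):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)

def content_read(memory, key, beta=5.0):
    """Soft read over a flat (linear) memory: attention weights are a
    softmax of cosine similarities between the key and each row,
    sharpened by beta; the read vector is the weighted sum of rows."""
    scores = [beta * cosine(row, key) for row in memory]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    w = [e / z for e in exps]  # attention weights over memory rows
    return [sum(w[i] * memory[i][j] for i in range(len(memory)))
            for j in range(len(memory[0]))]

# toy memory: three rows, two columns
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
r = content_read(memory, [1.0, 0.0])  # read is dominated by matching rows
```

Every row participates in every read and write through these weights, which is one way the coupling between parameters and memory content becomes hard to control.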
A simple two-layer hierarchy was introduced, with an upper and a lower memory, where signals from the upper layer are accumulated into the lower memory; signals are simply sent from one layer to the other, rather than through explicit read/write operations. Three ways of communicating between the upper and lower memory layers were proposed: one where the lower memory is “hidden” and cannot be read from directly, with the upper layer receiving content from the lower memory (“hidden memory”); one where a second head writes to the upper layer (“double-controlled”); and one where two separate LSTMs write into each memory (“tightly coupled”). The idea is that the upper layer acts as a short-term memory that remembers immediate input, while the lower layer acts as a longer-term memory.
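The “hidden memory” variant might be sketched as below. This is a toy illustration only: the controller interacts with the upper (short-term) layer, the upper layer’s content is accumulated into a hidden lower (long-term) layer, and the lower layer feeds a signal back up. The decayed-sum accumulation rule and the 0.5 mixing factor are assumptions; the paper’s exact rules are not specified in this summary.

```python
class TwoLayerMemory:
    """Toy sketch of a hierarchical 'hidden memory': reads and writes
    touch only the upper layer; the lower layer is updated by
    accumulating upper-layer signals (assumed rule: decayed sum) and
    returns a feedback signal to the upper layer each step."""

    def __init__(self, rows, cols, decay=0.9):
        self.upper = [[0.0] * cols for _ in range(rows)]  # short-term
        self.lower = [[0.0] * cols for _ in range(rows)]  # long-term, hidden

    # decay controls how quickly old long-term content fades (assumption)
        self.decay = decay

    def write_upper(self, row, vec):
        # the controller only ever writes to the upper layer
        self.upper[row] = list(vec)

    def read_upper(self, row):
        # the lower layer cannot be read directly
        return list(self.upper[row])

    def step(self):
        # accumulate upper-layer signals into the hidden lower memory
        for i, row in enumerate(self.upper):
            self.lower[i] = [self.decay * l + u
                             for l, u in zip(self.lower[i], row)]
        # the upper layer receives a feedback signal from the lower memory
        for i in range(len(self.upper)):
            self.upper[i] = [0.5 * (u + l)
                             for u, l in zip(self.upper[i], self.lower[i])]

# usage: one write followed by one accumulation step
mem = TwoLayerMemory(rows=2, cols=2)
mem.write_upper(0, [1.0, 0.0])
mem.step()
```

The point of the sketch is the asymmetry: signals flow between layers on every step, but only the upper layer is exposed to the controller.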
These models were tested on three tasks: copy, recall, and bAbI question answering. During training, the “vanilla” NTM had more trouble determining the correct procedure for each problem quickly. Overfitting was observed for the vanilla NTM with two memories and for the tightly coupled NTM. On the supporting-fact(s)-only bAbI tasks, the hidden-memory, double-controlled, and tightly coupled NTMs converged quickly, while the vanilla NTM failed to converge. This success may be because the additional memory structures have a stabilizing/smoothing effect. Adding a more complicated structure (e.g. graphs rather than a depth-two tree) may improve results further, and it would also be valuable to evaluate on a real question answering dataset.