|Context||NIPS 2015 Reasoning, Attention, and Memory Workshop|
A fundamental NLP task is question answering, because it can encompass many other tasks: given some text, we must answer a question about it (Where is Mary? What's the sentiment? Who are the characters in the story? What are the parts of speech in this sentence? What's the translation to French?). It would therefore make sense to work on a joint model for question answering. However, there is currently no model with consistent results across a wide variety of tasks; most achieve state-of-the-art results on a single task. Furthermore, fully joint multitask learning is hard, so it is usually restricted to the lower layers of the model and only helps when the tasks are related.
Dynamic Memory Networks are an attempt at a general model architecture for arbitrary Q&A tasks. At a high level, the model consists of a few modules that communicate through vectors and can all be trained with backpropagation. The first is the input module, a neural sequence model that computes a hidden representation of every word in every sentence. Next is a question module, which encodes the query and is used to determine which portions of the input to attend to. Because some questions require multiple passes over the input, an episodic memory module applies attention to the input multiple times and produces a final vector representation of all relevant inputs. This vector is passed to an answer module, a neural sequence model that generates an answer.
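The multi-pass idea can be illustrated with a small NumPy sketch. This is not the model from the talk: the dot-product scoring, the softmax over facts, and the additive memory update are simplifying assumptions chosen to keep the example short, and the names `episodic_memory`, `facts`, and `question` are hypothetical. It only shows why a second pass can surface a fact that the question alone does not point to.

```python
import numpy as np

def softmax(scores):
    # Numerically stable softmax over a 1-D array.
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

def episodic_memory(facts, question, num_passes=2):
    """Toy multi-pass attention.

    facts: (num_facts, d) array, one vector per input sentence.
    question: (d,) array.
    Returns the final memory vector and the attention weights of each pass.
    """
    # Assumption: the memory starts out as the question vector.
    memory = question.astype(float).copy()
    attention_history = []
    for _ in range(num_passes):
        # Assumption: score each fact by similarity to the question and to the memory.
        scores = facts @ question + facts @ memory
        weights = softmax(scores)
        # The episode is the attention-weighted summary of the facts.
        episode = weights @ facts
        # Assumption: additive update, a stand-in for a learned update.
        memory = memory + episode
        attention_history.append(weights)
    return memory, attention_history

# Hand-built toy vectors; dimensions stand for (football, John, hallway, other).
facts = np.array([
    [1.0, 1.0, 0.0, 0.0],  # mentions the football and John
    [0.0, 1.0, 1.0, 0.0],  # mentions John and the hallway
    [0.0, 0.0, 0.0, 1.0],  # unrelated sentence
])
question = np.array([1.0, 0.0, 0.0, 0.0])  # asks about the football

memory, attention_history = episodic_memory(facts, question)
for i, weights in enumerate(attention_history, start=1):
    print(f"pass {i}:", np.round(weights, 3))
```

In this toy setup the first pass cannot tell the second and third facts apart, because neither mentions the football. After the first episode is folded into the memory, the second pass ranks the fact about John above the unrelated one, which is the kind of chained lookup that multiple passes are meant to allow.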
More specifically, the input module is a GRU, which is the simplest model with two advantages over a simple RNN: it can make the hidden state depend only on the current input, or it can copy the previous hidden state forward unchanged. This is achieved by computing two gates, each a nonlinear function of the input and the previous hidden state: a reset gate, which controls how much of the previous state is forgotten when forming the new candidate state, and an update gate, which controls how much of that candidate replaces the previous state. The episodic memory module is a GRU augmented with an additional attention gate computed at each timestep. The gate captures several kinds of similarity between the sentence vector and the question, and the module iteratively summarizes all relevant facts seen so far (allowing for transitive reasoning). Some of these ideas are inspired by neuroscience, in that there are brain regions where events are stored and recalled. The answer module is another GRU, with a softmax over the output vocabulary. The input word embeddings can also be initialized from unsupervised pretraining, for example with GloVe word vectors.
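The gating described above can be sketched in NumPy as follows. This is a minimal illustration using the standard GRU equations, not the speaker's implementation. The parameter names, the helper `init_gru`, and the random initialization are hypothetical. The attention gate `g` is passed in as a plain number here; how it is computed from the sentence, question, and memory vectors is left out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(input_dim, hidden_dim, rng):
    """Hypothetical helper: small random weights for a single GRU cell."""
    def mat(rows, cols):
        return rng.normal(scale=0.1, size=(rows, cols))
    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = mat(hidden_dim, input_dim)   # input weights
        params["U" + gate] = mat(hidden_dim, hidden_dim)  # recurrent weights
        params["b" + gate] = np.zeros(hidden_dim)         # bias
    return params

def gru_step(p, x, h_prev):
    """One GRU update for input x and previous hidden state h_prev."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])  # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])  # reset gate
    # Candidate state: with r near 0 it depends only on the input x.
    h_candidate = np.tanh(p["Wh"] @ x + r * (p["Uh"] @ h_prev) + p["bh"])
    # With z near 0 the previous state is copied forward unchanged.
    return z * h_candidate + (1.0 - z) * h_prev

def gated_gru_step(p, x, h_prev, g):
    """GRU step with an extra attention gate g in [0, 1].

    g = 0 skips the current input entirely; g = 1 is a plain GRU step.
    """
    return g * gru_step(p, x, h_prev) + (1.0 - g) * h_prev

rng = np.random.default_rng(0)
params = init_gru(input_dim=4, hidden_dim=3, rng=rng)
x = rng.normal(size=4)
h_prev = np.zeros(3)
print("plain step :", gru_step(params, x, h_prev))
print("gate closed:", gated_gru_step(params, x, h_prev, g=0.0))
```

Writing the extra gate as a convex combination outside the cell keeps the ordinary GRU untouched, so a sentence with a gate value near zero leaves the running summary as it was, while a sentence with a gate value near one is absorbed as usual.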
On the bAbI tasks, the resulting model performs about as well as the Memory Network. It has also been trained to produce a wide variety of NLP-like answers to text inputs. The behavior of the episodic memory (how the attention changes over time) can also be visualized to inspect the results. The model makes no assumptions about the language, so it can work reasonably well with different languages, provided that it is trained on question/answer pairs in those languages. The sequence-to-sequence model of Sutskever et al. can be seen as a special case without the attention and episodic memory module. The most closely related work is Memory Networks; both models have input, scoring, attention, and response mechanisms. The main difference is in implementation: Dynamic Memory Networks use the same kind of sequence model for all of the modules, which may help them apply to a broader range of tasks.