{{tag>"Talk Summaries" "Neural Networks" "NIPS 2015"}}

====== Dynamic Memory Networks ======

| Presenter | Richard Socher |
| Context | NIPS 2015 Reasoning, Attention, and Memory Workshop |
| Date | 12/12/15 |

Question answering is a fundamental NLP task because it can encompass many other tasks - e.g., given some text, we must answer a question about it (Where is Mary?  What's the sentiment?  Who are the characters in the story?  What are the parts of speech in this sentence?  What's the translation to French?).  It would therefore make sense to work on a joint model for question answering.  However, there is currently no model with consistent results across a wide variety of tasks - most achieve state-of-the-art results on a single task.  Furthermore, fully joint multitask learning is hard, so it is usually restricted to the lower layers/levels of the model and only helps when the tasks are related.

===== Dynamic Memory Networks =====

Dynamic Memory Networks are an attempt at a general model architecture for arbitrary Q&A tasks.  At a high level, there are a few modules, all of which communicate via vectors and can be trained with backpropagation.  The first is the input module, a neural sequence model that computes a hidden representation of every word in every sentence.  Then there is a question module, which determines which portions of the input to attend to given a query.  Because some questions require multiple passes over the memory, there is an episodic memory module which allows attention to be applied to the input multiple times and produces a final vector representation of all relevant inputs.  This vector is passed to an answer module, a neural sequence model which generates the answer.  A minimal sketch of this pipeline follows.

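To make the flow between the four modules concrete, here is a minimal runnable sketch (not the presenter's implementation): the encoders are random placeholders, ''dmn_forward'' and the other names are invented for illustration, and the update rules are deliberately simplified stand-ins for the trained GRUs.

<code python>
# Minimal sketch of the DMN module pipeline described above (not the authors'
# code). Every component here is a hypothetical stand-in; in the real model
# each module is a trained neural sequence model (a GRU) and everything is
# learned end-to-end with backpropagation.
import numpy as np

HIDDEN = 4  # toy hidden size


def encode(tokens, rng):
    """Stand-in for a GRU encoder: returns one random vector per token."""
    return [rng.standard_normal(HIDDEN) for _ in tokens]


def dmn_forward(story_tokens, question_tokens, passes=2, seed=0):
    rng = np.random.default_rng(seed)

    # 1. Input module: a hidden representation of every word in the input.
    facts = encode(story_tokens, rng)

    # 2. Question module: a single vector for the query.
    q = np.mean(encode(question_tokens, rng), axis=0)

    # 3. Episodic memory module: attend over the facts several times,
    #    updating a memory vector m after each pass (transitive reasoning).
    m = q.copy()
    for _ in range(passes):
        # Crude dot-product scores standing in for the learned attention gates.
        scores = np.array([f @ q + f @ m for f in facts])
        weights = np.exp(scores) / np.exp(scores).sum()        # softmax
        episode = sum(w * f for w, f in zip(weights, facts))   # weighted facts
        m = np.tanh(m + episode)   # simplified update (the real model uses a GRU)

    # 4. Answer module: in the real model a GRU with a softmax over the output
    #    vocabulary; here we just return the final memory vector.
    return m


answer_vec = dmn_forward("Mary went to the kitchen .".split(),
                         "Where is Mary ?".split())
print(answer_vec.shape)  # (4,)
</code>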
More specifically, the input module is a GRU, which is the simplest model with two advantages over a simple RNN: it can make the hidden state depend only on the input, or it can copy the previous hidden state completely.  This is achieved by computing two nonlinear gates from the combination of the input and the hidden state, and using them to determine how much to take in the input/forget the state and how much to produce output, respectively.  The episodic memory module is a GRU augmented with an additional gate which depends on the current timestep; it captures various similarities between the sentence vector and the question, and iteratively summarizes all relevant facts known so far (allowing for transitive reasoning).  Some of these ideas are inspired by neuroscience, in that there are brain regions where events are stored and recalled.  The answer module is also a GRU, with a softmax over the output vocabulary.  The inputs can also be pretrained in an unsupervised way using GloVe word vectors.

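As a concrete illustration of the gating described above, here is a small sketch of a standard GRU step plus the extra per-timestep gate used by the episodic memory module.  The weight matrices are random placeholders rather than trained parameters, and the gate value is supplied by hand instead of being computed from question/memory similarities.

<code python>
# Sketch of the gated updates described above: the standard GRU equations plus
# the extra gate the episodic memory module applies at each timestep. Weights
# are random placeholders; in the real model they are learned by backprop.
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h, W, U):
    """One GRU step: the gates decide how much to take from the new input
    versus how much to simply copy the previous hidden state."""
    z = sigmoid(W["z"] @ x + U["z"] @ h)              # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h)              # reset gate
    h_tilde = np.tanh(W["h"] @ x + U["h"] @ (r * h))  # candidate state
    return z * h_tilde + (1 - z) * h


def episodic_gru_step(x, h, g, W, U):
    """Episodic memory variant: an extra scalar gate g in [0, 1] decides
    whether this timestep updates the state at all; in the DMN it is computed
    from the similarity of the current fact to the question and the memory."""
    return g * gru_step(x, h, W, U) + (1 - g) * h


# Toy usage with random placeholder weights.
d = 4
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((d, d)) for k in "zrh"}
U = {k: rng.standard_normal((d, d)) for k in "zrh"}
x, h = rng.standard_normal(d), np.zeros(d)
print(episodic_gru_step(x, h, g=0.9, W=W, U=U))
</code>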
On the bAbI tasks, the resulting model performs about as well as the Memory Network.  It has also been trained to produce a wide variety of NLP-style answers to text inputs.  The behavior of the episodic memory (how the attention changes over time) can also be visualized to evaluate results.  The model makes no assumptions about the language, so it can work reasonably well on different languages, provided it is trained on corresponding question/answer pairs.  The sequence-to-sequence model of Sutskever can be seen as a special case without the attention and episodic memory modules.  The most relevant related work is Memory Networks; both models have input, scoring, attention, and response mechanisms.  The main difference is the implementation: dynamic memory networks use the same sequence models for all of the different modules, which may help provide a broader range of applications.