Exploiting Cognitive Constraints to Improve Machine-Learning Memory Models

Presenter Michael Mozer
Context NIPS 2015 Reasoning, Attention, and Memory Workshop
Date 12/12/15

The human visual system has been a good source of inspiration for computer vision systems, so the human memory system may likewise provide inspiration for reasoning, attention, and memory systems. Understanding human memory is important for machine learning systems that must predict what information is interesting or available at a given point in time. For example, consider a simple task in which one stimulus requires one response, a different stimulus requires another response, and stimuli are presented sequentially: the previous trials can have a strong effect on the response time. This reflects a simple memory that looks at the recent past when building up an expectation about the next stimulus. Some experiments suggest that the dependence on the past follows a power law, which is also a property of explicit forms of human memory (e.g. studying facts that will be tested at a later point).

It has been shown that spaced exposure to information results in better retention over a longer period of time. The relationship between the spacing of exposures and the required retention period is non-uniform and non-monotonic across retention periods. The relationship between the spacing of exposure sessions and the retention interval roughly follows a power law, and can be modeled e.g. as a neural network, a cascade of leaky integrators, or a Kalman filter. Key features of these models are that when an event happens, a memory of it is stored in multiple traces (time scales), with the traces decaying at different rates, and that the memory strength is a weighted sum of the traces, with the slower time scales weighted less heavily. This allows the models to predict how well we might remember a stimulus based on how long ago it occurred, how often we were exposed to it, and how many times.

Compared with these models, gated recurrent networks (LSTMs) have little or no decay, and networks with learned decay constants do not enforce a dominance of the faster time scales. Hierarchical recurrent networks have fixed decay constants, and history compression networks base their compression on events rather than on time. A model combining all of the components needed to mimic human memory would consist of a number of groups of recurrent units, each group with a fixed decay rate, together with a learned input mapping. Such a model may be able to predict human memory and behavior better.
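
The multi-trace idea described in these notes can be illustrated with a minimal sketch: each event is written into several leaky integrators with fixed, different decay rates, and memory strength is a weighted sum in which slower traces receive smaller weights. This is not the model from the talk. The names `make_timescales`, `MultiScaleTrace`, and `strength_after`, the geometric spacing of time constants, and the geometric weighting are all assumptions chosen for illustration, and the learned input mapping is omitted.

```python
import numpy as np


def make_timescales(n_traces=6, fastest=1.0, ratio=4.0):
    """Geometrically spaced time constants (an assumed, illustrative choice)."""
    return fastest * ratio ** np.arange(n_traces)


class MultiScaleTrace:
    """Bank of leaky integrators with fixed decay rates (hypothetical sketch)."""

    def __init__(self, taus, weight_ratio=0.5):
        self.taus = np.asarray(taus, dtype=float)
        # Slower traces (larger tau) get smaller weights; weights sum to 1.
        w = weight_ratio ** np.arange(len(self.taus))
        self.weights = w / w.sum()
        self.traces = np.zeros_like(self.taus)

    def step(self, x, dt=1.0):
        # Each trace decays exponentially at its own rate, then adds the input.
        self.traces = self.traces * np.exp(-dt / self.taus) + x
        return self.strength()

    def strength(self):
        # Memory strength is the weighted sum of all traces.
        return float(self.weights @ self.traces)


def strength_after(exposure_times, test_time, taus):
    """Strength at test_time after unit exposures at the given integer times."""
    mem = MultiScaleTrace(taus)
    exposures = set(exposure_times)
    for t in range(test_time + 1):
        mem.step(1.0 if t in exposures else 0.0)
    return mem.strength()


if __name__ == "__main__":
    taus = make_timescales()
    for delay in (1, 10, 100, 1000):
        print(delay, strength_after([0], delay, taus))
```

Because the strength is a mixture of exponentials with widely spread time constants, it falls off far more slowly than any single fast trace would, which is one common way to approximate power-law-like forgetting. Calling `strength_after` with different exposure schedules gives a rough way to compare how recency and repetition affect the predicted strength under these assumptions.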

exploiting_cognitive_constraints_to_improve_machine-learning_memory_models.txt · Last modified: 2015/12/17 21:59 (external edit)