|Authors|Arshia Cont, Shlomo Dubnov, and Gérard Assayag|
|---|---|
|Publication Info|Retrieved here|
Proposes an active reinforcement-learning based model for automatic musical improvisation and style imitation.
Expectation can be seen as the principal emotional content of music, yet most computational systems do not attempt to model its cognitive importance. Listeners have been shown to be highly sensitive to the probabilities of different sound events and patterns, based on an internal representation of sound organized around musical attributes. Expectation may rest on a system of rewards and punishments that evaluates prediction accuracy, suggesting “competing and concurrent representations”.
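The sensitivity of listeners to event probabilities can be illustrated with a simple adaptive frequency model that assigns each incoming event a surprisal value (low probability means high surprise). This is only a minimal sketch to make the idea concrete; the function name, the Laplace smoothing, and the fixed alphabet size are assumptions, not the paper's model.

```python
import math
from collections import Counter

def surprisal_stream(events, alphabet_size):
    """Return the surprisal (-log2 p) of each event under an adaptive
    model that tracks running event frequencies with Laplace smoothing.
    Illustrative sketch only: repeated events become less surprising."""
    counts = Counter()
    seen = 0
    result = []
    for e in events:
        # Probability estimated from everything heard so far.
        p = (counts[e] + 1) / (seen + alphabet_size)
        result.append(-math.log2(p))
        counts[e] += 1
        seen += 1
    return result
```

Running it on a short pitch stream such as `['C', 'C', 'C', 'G']` shows surprisal falling as `C` repeats and jumping when the novel `G` arrives, mirroring the reward/punishment view of prediction accuracy.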
Memory is used for preparation and can be classified into four sources of musical expectation: veridical/episodic, holding specific events from the past; schematic, which does not draw on the past (“first-exposure listening”); dynamic adaptive, formed in real time as the piece is heard; and conscious, which uses explicit knowledge to process the music. These schemes operate concurrently and in parallel, and anticipatory events arise from interactions among the components. A computational model should address all expectation types; this work focuses on the dynamic adaptive and conscious types.
In one interaction scheme, incoming features activate parts of long-term memory that evoke similar past events, creating a context of expectations that influences the current cognitive state. Short-term memory causes modifications in long-term memory, depending on how much material is repeated and rehearsed. Thinking also requires feedback to remain adaptive (rewarding useful thinking and discouraging useless thinking).
The current work focuses on automatic systems that address the complexity of musical signals directly. A challenge for these systems is the need to simultaneously represent and process many attributes of music information, which is handled through the choice of music representation and memory model. Cross alphabets represent each symbol as a vector of attributes, so they do not model interactions between components directly; instead, membership functions must be constructed to allow for context dependencies. In a multiple-viewpoint model, individual models are derived for each musical attribute and then combined, allowing interactions among components at the cost of requiring a huge repertoire of music; this permits explicit consideration of long-term dependencies during learning. Factorial Markov models capture the probability of each component at a given point in time based on correlations between the components, but they do not model any long-term behavior. Most previous approaches use predictive models based on context and do not consider musical form or changes in interaction between players.
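The multiple-viewpoint idea of combining per-attribute predictions can be sketched as follows. Here each viewpoint (e.g. a pitch model and a rhythm model) supplies a distribution over the same next-symbol set, and the distributions are merged by a weighted geometric mean and renormalized. The combination rule and names are illustrative assumptions, not the specific scheme used in the paper.

```python
import math

def combine_viewpoints(distributions, weights=None):
    """Combine per-attribute predictive distributions over the same
    symbol set via a weighted geometric mean, then renormalize.
    Illustrative sketch of viewpoint combination."""
    symbols = list(distributions[0].keys())
    if weights is None:
        weights = [1.0] * len(distributions)
    combined = {}
    for s in symbols:
        # Product of each viewpoint's probability, raised to its weight.
        combined[s] = math.prod(d[s] ** w
                                for d, w in zip(distributions, weights))
    z = sum(combined.values())
    return {s: p / z for s, p in combined.items()}
```

For example, a pitch viewpoint favoring `C` and a rhythm viewpoint favoring `G` yield a combined distribution that trades off both sources of evidence, which is the essence of letting separate attribute models interact.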
Anticipation is the mental realization of possible predicted actions and their effect on the perception and learning of the world at an instant in time; a marriage of actions and expectations. The proposed system is both payoff anticipatory and state anticipatory, explicitly using prediction and anticipation during learning and decision making. The anticipatory profile and emotional-force data of music can be measured through human experiments.
Reinforcement learning is used, which encapsulates a continuous interaction between an agent and its environment. The environment responds to the agent's actions and produces numerical values that the agent tries to maximize over time, through three signals: the agent's actions, the state of the system, and the agent's goal (rewards). The proposed system uses a “Dyna” RL architecture in which the environment is the human performer or score; the agents play the role of memory and mental representations of the input sequences.
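The Dyna architecture interleaves learning from real experience with "planning" updates replayed from a learned model of the environment. Below is a minimal, generic Dyna-Q-style step, not the paper's specific agent design: the state names, tabular Q-function, and planning count are assumptions for illustration.

```python
import random
from collections import defaultdict

def dyna_q_step(Q, model, s, a, r, s_next, actions,
                alpha=0.1, gamma=0.9, n_planning=5):
    """One Dyna-style update: (1) direct RL update from a real
    transition, (2) store the transition in the model,
    (3) replay simulated transitions from the model (planning).
    Minimal sketch of the Dyna architecture."""
    # Direct RL: Q-learning update from the real transition.
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    # Model learning: remember what the environment did.
    model[(s, a)] = (r, s_next)
    # Planning: replay randomly chosen remembered transitions.
    for _ in range(n_planning):
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        p_best = max(Q[(ps_next, b)] for b in actions)
        Q[(ps, pa)] += alpha * (pr + gamma * p_best - Q[(ps, pa)])
    return Q
```

In the paper's setting, the "real" transitions would come from the human performer or score, while the planning loop lets the agents rehearse from their internal memory of the input sequences between real interactions.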