Reinforcement learning: What action, what value?
Gerhard Jocham, Otto-von-Guericke University, Magdeburg

Once we observe the outcome of a decision, we can use it for learning: we adjust our future behaviour. But which of the many actions we perform is responsible for a particular outcome? This is known as the credit assignment problem. I will present behavioural and fMRI data suggesting that separable learning mechanisms operate in parallel and compete for control over behaviour. Of these, only one relies on establishing contingent associations between rewards and the choices that caused them, whereas three other mechanisms could be identified that exploit simpler statistical dependencies. Our fMRI data suggest that neural activity in lateral orbitofrontal cortex (LOFC) bears the hallmarks of a signal that may serve to link outcomes to their causal choices. Further nodes in an LOFC-centred network, and their relationship to the different learning mechanisms, will be discussed.

The second part of the talk asks what values are actually learnt during reinforcement learning. In formal learning theory, prediction errors (the difference between the outcome received and the outcome expected) are the key drivers of learning; they are used to update the values of stimuli or actions. However, recent findings suggest that activity in the striatum may reflect the error term of an algorithm (policy gradient) that does not learn any values at all but instead directly updates a choice policy. I will present very preliminary data suggesting an alternative possibility: that values may be learnt in a relative frame, that is, as how good one option is relative to an alternative option.
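To make the contrast concrete, the sketch below sets the three ideas side by side in a simple two-armed bandit: a delta-rule learner that updates values from prediction errors, a policy-gradient (REINFORCE-style) learner that uses an error term to update a choice policy directly without storing any values, and a crude relative-value learner that tracks a single quantity for how good one option is relative to the other. This is an illustration under assumed settings (the learning rate alpha, inverse temperature beta, reward probabilities, and the relative-value update are my own choices), not the models or analyses from the talk.

```python
# Illustrative sketch only: a two-armed bandit contrasting three update rules
# mentioned in the abstract. All parameters (alpha, beta, reward probabilities)
# and the specific relative-value rule are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
p_reward = np.array([0.8, 0.2])   # assumed reward probability of each option
alpha, beta, n_trials = 0.1, 3.0, 1000

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# 1) Value learning (delta rule): the prediction error updates option values.
V = np.zeros(2)
# 2) Policy gradient: an error term updates action preferences (logits)
#    directly; no value estimates are maintained.
theta = np.zeros(2)
baseline = 0.0                    # running average reward serves as baseline
# 3) Relative value: a single quantity for option 0 relative to option 1.
V_rel = 0.0

for _ in range(n_trials):
    # --- value learner ---
    a = rng.choice(2, p=softmax(beta * V))
    r = float(rng.random() < p_reward[a])
    V[a] += alpha * (r - V[a])            # prediction error: r - V(a)

    # --- policy-gradient learner ---
    pi = softmax(theta)
    a = rng.choice(2, p=pi)
    r = float(rng.random() < p_reward[a])
    grad = -pi
    grad[a] += 1.0                        # d log pi(a) / d theta for softmax
    theta += alpha * (r - baseline) * grad
    baseline += alpha * (r - baseline)

    # --- relative-value learner ---
    a = rng.choice(2, p=softmax(beta * np.array([V_rel, -V_rel])))
    r = float(rng.random() < p_reward[a])
    signed = r if a == 0 else -r          # outcome credited in a relative frame
    V_rel += alpha * (signed - V_rel)

print("values:", V, "| policy:", softmax(theta), "| relative value:", V_rel)
```

The point of the contrast: the policy-gradient learner never stores V at all, yet its error term (here, reward minus a running baseline) plays the same computational role that the prediction error plays in value learning, which is why striatal activity alone may not distinguish the two accounts.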