TITLE: Unexpected uncertainty as a driver of human exploration in n-arm bandit problems: Limitations of fixed-parameter 'softmax' (Gibbs law) reinforcement learning algorithms as approximate computational explanations of human behaviour.

SPEAKER: Dr. Kiran Kalidindi (Manchester Business School, University of Manchester, UK)

ABSTRACT: A number of researchers have used reinforcement learning algorithms to understand human behaviour in n-arm bandit problems (Kalidindi & Bowman, 2007; Busemeyer & Stout, 2002; Daw, O'Doherty, Dayan, Seymour, & Dolan, 2006; Yechiam & Busemeyer, 2008). In this talk I will focus on the popular 'softmax' reinforcement learning (RL) action-selection rule. Specifically, I will consider human data from 4-arm bandit problems, using the well-known Iowa Gambling Task as an example to set the scene. The payoffs for this task are stationary, meaning that the reward distribution for each choice is fixed. Models fitted to human choice data using model tracing and maximum likelihood (Busemeyer & Stout, 2002; Wetzels, Vandekerckhove, Tuerlinckx, & Wagenmakers, in press) frequently suffer from two deficits: when they are 'run' they produce behaviour unlike that of human subjects, and they fit new payoff regimes poorly (Yechiam & Busemeyer, 2008); in short, they do not generalize well. One reason for this is that such models are generally implemented with fixed parameters throughout the task. The limitations of fixed-parameter RL models are further highlighted by a non-stationary 4-arm bandit design we have recently run in our own lab; a non-stationary task is one in which the payoff distributions may change over time. I will present variations of the earlier RL models that come closer to replicating human behaviour by adjusting their parameters relative to the expected prediction error and the unexpected prediction error, an approach somewhat similar to the expected and unexpected uncertainty idea of Yu & Dayan (2005). A minimal illustrative sketch of such an adaptive softmax rule is given after the references.

References

Busemeyer, J. R., & Stout, J. C. (2002). A contribution of cognitive decision models to clinical assessment: Decomposing performance on the Bechara gambling task. Psychological Assessment, 14(3), 253--262.

Daw, N., O'Doherty, J., Dayan, P., Seymour, B., & Dolan, R. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441, 876--879.

Kalidindi, K., & Bowman, H. (2007). Using e-greedy reinforcement methods to further understand ventromedial prefrontal patients' deficits on the Iowa Gambling Task. Neural Networks, 20, 676--689.

Wetzels, R., Vandekerckhove, J., Tuerlinckx, F., & Wagenmakers, E.-J. (in press). Bayesian parameter estimation in the expectancy valence model of the Iowa Gambling Task. Journal of Mathematical Psychology.

Yechiam, E., & Busemeyer, J. R. (2008). Evaluating generalizability and parameter consistency in learning models. Games and Economic Behavior, 63(1), 370--394.

Yu, A. J., & Dayan, P. (2005). Uncertainty, neuromodulation, and attention. Neuron, 46, 681--692.
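
Illustrative sketch: The Python sketch below shows a delta-rule value learner with softmax (Gibbs/Boltzmann) action selection on a hypothetical non-stationary 4-arm bandit, in which the softmax temperature is scaled up when the latest prediction error is large relative to its running average. It is only a minimal illustration of the general idea of letting "unexpected" prediction error drive exploration; the particular update rules, parameter values, and payoff schedule are assumptions for the example, not the speaker's fitted model.

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(values, temperature):
        """Gibbs/Boltzmann ('softmax') choice probabilities over action values."""
        z = np.asarray(values) / temperature
        z -= z.max()                      # subtract max for numerical stability
        p = np.exp(z)
        return p / p.sum()

    def run_bandit(payoff_fn, n_arms=4, n_trials=200,
                   alpha=0.2, base_temp=0.2, surprise_gain=1.0):
        """Delta-rule learning with softmax selection and an adaptive temperature.

        The temperature is not fixed: it grows when the most recent
        prediction-error magnitude exceeds its running average, loosely in
        the spirit of tying exploration to unexpected uncertainty.
        (Illustrative assumption, not the model presented in the talk.)
        """
        q = np.zeros(n_arms)              # action-value estimates
        expected_pe = 0.0                 # running average of |prediction error|
        last_pe = 0.0                     # magnitude of the most recent prediction error
        choices, rewards = [], []
        for t in range(n_trials):
            # "Unexpected" prediction error: how much the latest surprise
            # exceeds what the learner has come to expect.
            unexpected = max(0.0, last_pe - expected_pe)
            temperature = base_temp * (1.0 + surprise_gain * unexpected)

            probs = softmax(q, temperature)
            a = rng.choice(n_arms, p=probs)
            r = payoff_fn(a, t)

            pe = r - q[a]                 # reward prediction error
            q[a] += alpha * pe            # delta-rule value update
            last_pe = abs(pe)
            expected_pe += 0.1 * (abs(pe) - expected_pe)

            choices.append(a)
            rewards.append(r)
        return choices, rewards

    # Hypothetical non-stationary payoffs: the best arm switches halfway through.
    def payoff_fn(arm, trial):
        means = [1.0, 0.2, 0.2, 0.2] if trial < 100 else [0.2, 0.2, 1.0, 0.2]
        return rng.normal(means[arm], 0.5)

    choices, rewards = run_bandit(payoff_fn)

With fixed parameters (surprise_gain = 0) the same learner keeps a constant exploration rate and adapts slowly after the payoff switch; letting the temperature rise with unexpected prediction error is one simple way to re-engage exploration when the environment changes.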