The idea that brains use probabilistic internal models of the world is a powerful explanation for a range of behavioural phenomena. It is unclear whether or how neurons in higher cortical regions represent such models, learn them, and use them in behaviour. To address these issues, we sought evidence for the learning of internal models by cortical neurons during a behavioural task. Using a sampling framework, we predicted that trial-evoked and sleep population activity would represent, respectively, the inferred and expected probabilities generated from an internal model of the task, and would become more similar as the task was learnt. To test these predictions, we analysed population activity from rodent prefrontal cortex before, during, and after sessions of learning rules on a maze. Distributions of activity patterns converged between trials and post-learning sleep during successful rule learning. Learning-induced changes were greatest for patterns predicting correct choice and were expressed at the choice point of the maze, consistent with an updated internal model of the task. Our results suggest sample-based internal models are a general computational principle of cortex.