Existing models of birdsong learning assume that brain area LMAN introduces variability into song for trial-and-error learning. Recent data suggest that LMAN also encodes a corrective bias driving short-term improvements in song. These improvements are later consolidated in area RA, a motor cortex analogue downstream of LMAN. We develop a new model of such two-stage learning. Using a stochastic gradient descent approach, we derive how 'tutor' circuits should match plasticity mechanisms in 'student' circuits for efficient learning. We further describe a reinforcement learning framework with which the tutor can build its teaching signal. We show that mismatching the tutor signal and the plasticity mechanism can impair or abolish learning. Applied to birdsong, our results predict the temporal structure of the corrective bias from LMAN given a plasticity rule in RA. Our framework can be applied predictively to other paired brain areas showing two-stage learning.
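To make the tutor/student idea concrete, the following is a deliberately minimal toy sketch, not the model derived in this work. It assumes a student ('RA') whose output in each time bin is a single weight driven by a one-hot, time-locked input; a tutor ('LMAN') whose bias in each bin is a gradient step on the squared error, scaled by a hypothetical gain `alpha`; and a student plasticity rule that consolidates the tutor bias into the weights at a hypothetical rate `eta`. The hypothetical `lag` parameter stands in for a timing mismatch between the tutor signal and the plasticity rule. All function and parameter names are illustrative assumptions.

```python
def two_stage_learning(target, n_trials=200, alpha=0.5, eta=0.1, lag=0):
    """Toy two-stage learner (hypothetical illustration, not the paper's model).

    w[t]  : student weight setting the output in time bin t (one-hot input).
    b[t]  : tutor bias, a gradient step on 0.5 * (w[t] - target[t])**2.
    Update: w[t] += eta * b[t - lag]; lag != 0 models a tutor/plasticity
            timing mismatch. Returns the mean squared error after each trial.
    """
    n_bins = len(target)
    w = [0.0] * n_bins
    errors = []
    for _ in range(n_trials):
        # Tutor computes its corrective bias from the current student output.
        bias = [-alpha * (w[t] - target[t]) for t in range(n_bins)]
        # Student consolidates the tutor bias, possibly from the wrong time bin.
        for t in range(n_bins):
            src = t - lag
            if 0 <= src < n_bins:
                w[t] += eta * bias[src]
        # Track how well the student alone (without tutor bias) matches the target.
        errors.append(sum((w[t] - target[t]) ** 2 for t in range(n_bins)) / n_bins)
    return errors


song = [1.0, -0.5, 0.25, 0.8]
matched = two_stage_learning(song, lag=0)
mismatched = two_stage_learning(song, lag=1)
print(f"matched:    first={matched[0]:.3f}  last={matched[-1]:.2e}")
print(f"mismatched: first={mismatched[0]:.3f}  last={mismatched[-1]:.2e}")
```

With `lag=0` each weight is corrected by its own error and the error shrinks geometrically. With `lag=1` each weight is pushed by its neighbour's error instead of its own, so the first bin never learns and later bins drift without bound. In this toy setting, a tutor signal whose timing does not match the student's plasticity rule abolishes learning.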