Neural activity during cognitive tasks exhibits complex dynamics that flexibly encode task-relevant variables. Recurrent neural networks operating in the near-chaotic regime, which spontaneously generate rich dynamics, have been proposed as a model of cortical computation during cognitive tasks. However, existing methods for training these networks are either biologically implausible or require a continuous, real-time error signal to guide the learning process. Here we show that a biologically plausible learning rule can train such recurrent networks, guided solely by delayed, phasic rewards at the end of each trial. Networks operating under this learning rule successfully learn nontrivial tasks requiring flexible (context-dependent) associations, memory maintenance, nonlinear mixed selectivities, and coordination among multiple outputs. Furthermore, applying this method to learn various tasks from the experimental literature, we show that the resulting networks replicate complex dynamics previously observed in animal cortex, such as dynamic encoding of task features, switching from stimulus-specific to response-specific representations, and selective integration of sensory input streams. The rule also successfully trains networks with nonnegative responses and separate excitatory and inhibitory neurons obeying Dale's law. We conclude that recurrent neural networks offer a plausible model of cortical dynamics during both learning and performance of flexible behavior.
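The core ingredients named above, a near-chaotic recurrent network, ongoing exploratory perturbations, and a weight update gated by a single delayed reward delivered only at the end of each trial, can be illustrated with a minimal sketch. This is a generic node-perturbation-style rule under assumed parameters (network size, gain, noise scale, learning rate, single-unit readout), not the exact rule or tasks studied in the paper:

```python
import numpy as np

# Minimal sketch of reward-modulated exploratory learning in a recurrent
# rate network. All parameters below (size, gain, noise scale, learning
# rate, trial length) are illustrative assumptions.
rng = np.random.default_rng(0)
N = 64                       # number of units (assumed)
g = 1.5                      # recurrent gain placing the network near chaos
W = rng.normal(0.0, g / np.sqrt(N), size=(N, N))  # recurrent weights
eta = 0.1                    # learning rate (assumed)
R_bar = 0.0                  # running average of reward (baseline)

def run_trial(W):
    """Run one trial; return a scalar output and an eligibility trace."""
    x = rng.normal(0.0, 0.1, N)          # initial state
    elig = np.zeros((N, N))
    for t in range(100):                 # 100 time steps per trial (assumed)
        r = np.tanh(x)                   # firing rates
        noise = rng.normal(0.0, 0.5, N)  # exploratory perturbation
        x = x + 0.1 * (-x + W @ r + noise)
        # Eligibility: correlate each unit's perturbation with its
        # presynaptic rates, accumulated over the whole trial.
        elig += np.outer(noise, r)
    return np.tanh(x)[0], elig           # read output from unit 0 (assumed)

target = 0.5                             # desired end-of-trial output
for trial in range(200):
    out, elig = run_trial(W)
    R = -(out - target) ** 2             # delayed, phasic reward at trial end
    W += eta * (R - R_bar) * elig / 100  # update gated by reward prediction error
    R_bar += 0.05 * (R - R_bar)          # slowly track the reward baseline
```

The key property this sketch shares with the rule described in the abstract is that no real-time error signal reaches the synapses during the trial: plasticity is driven only by a locally accumulated eligibility trace multiplied by a single scalar reward signal after the trial ends.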