A central challenge in sensory neuroscience is understanding how neural circuits shape computations across multiple nonlinear cell layers. Here we develop a computational framework to reconstruct the response properties of experimentally unobserved neurons in the interior of a multilayered neural circuit. We combine non-smooth regularization techniques with proximal consensus algorithms to overcome the traditional difficulty of fitting such models, which stems from the high dimensionality of their parameter space. Our methods are both statistically and computationally efficient. They enable us not only to rapidly learn hierarchical nonlinear models but also to efficiently compute widely used descriptive statistics, such as the spike-triggered average (STA) and spike-triggered covariance (STC), for high-dimensional stimuli. We apply our framework to retinal ganglion cell processing and learn STAs and STCs to comparable accuracy from just 12% of the recorded data, which reduces experiment time by an order of magnitude. Furthermore, we learn three-layer nonlinear models of retinal circuitry, consisting of thousands of parameters, using only 40 minutes of responses to white noise. Our models improve the prediction of ganglion cell spikes by 53% over classical linear-nonlinear (LN) models. The internal structure of these models reveals hidden nonlinear subunits that match retinal bipolar cells in both receptive field structure and number. The subunits have consistently high thresholds, which leads to sparse activity patterns in which only one subunit drives ganglion cell spiking at any time. From the model's parameters, we predict that the removal of visual redundancies through stimulus decorrelation across space, a central tenet of efficient coding theory, originates primarily from bipolar cell synapses. Moreover, the composite nonlinear computation performed by retinal circuitry corresponds to a boolean OR function applied to bipolar cell feature detectors.
Our general computational framework may aid in extracting principles of nonlinear hierarchical sensory processing across diverse modalities from limited data.
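As a rough illustration of how a non-smooth penalty can be paired with a consensus algorithm, the following Python sketch fits a sparse linear filter by consensus ADMM. The data are split into blocks (for example, chunks of a recording). Each block solves a small smooth subproblem on its own copy of the filter. A shared estimate is then obtained by applying the proximal operator of the L1 penalty (soft thresholding) to the average of the local copies. The squared-error loss, the L1 penalty, the synthetic data, and the names soft_threshold and consensus_lasso are illustrative assumptions. They are not the objectives or code used in this work.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||v||_1: shrink each entry toward zero by kappa."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def consensus_lasso(blocks, lam, rho, n_iter=300):
    """Consensus ADMM for  min_w 0.5 * sum_i ||A_i w - b_i||^2 + lam * ||w||_1.

    Illustrative sketch (assumed loss and penalty, not the paper's model).
    blocks: list of (A_i, b_i) pairs, e.g. stimulus/response chunks of a recording.
    Each block keeps a local copy w_i; z is the shared (consensus) estimate and
    u_i are the scaled dual variables enforcing w_i = z.
    """
    n_blocks = len(blocks)
    dim = blocks[0][0].shape[1]
    # The local system matrices and A_i^T b_i never change, so build them once.
    systems = [A.T @ A + rho * np.eye(dim) for A, _ in blocks]
    Atb = [A.T @ b for A, b in blocks]
    w = np.zeros((n_blocks, dim))
    u = np.zeros((n_blocks, dim))
    z = np.zeros(dim)
    for _ in range(n_iter):
        # Local step: each block solves a smooth quadratic subproblem independently.
        for i in range(n_blocks):
            w[i] = np.linalg.solve(systems[i], Atb[i] + rho * (z - u[i]))
        # Consensus step: prox of the non-smooth L1 term at the averaged estimate.
        z = soft_threshold((w + u).mean(axis=0), lam / (rho * n_blocks))
        # Dual step: accumulate each block's disagreement with the consensus.
        u += w - z
    return z


# Usage on synthetic white-noise data with a hypothetical sparse "true" filter.
rng = np.random.default_rng(0)
true_w = np.zeros(50)
true_w[[5, 20, 33]] = [1.5, -2.0, 1.0]
A = rng.standard_normal((2000, 50))
b = A @ true_w + 0.1 * rng.standard_normal(2000)
blocks = [(A[idx], b[idx]) for idx in np.array_split(np.arange(len(b)), 4)]
w_hat = consensus_lasso(blocks, lam=50.0, rho=500.0)
print(np.flatnonzero(w_hat))
```

Setting rho near the per-block curvature (roughly the number of samples per block for unit-variance white noise) keeps both the local and the consensus updates converging quickly. Much smaller or larger values still converge but can need far more iterations. Because the local steps are independent, they can run in parallel across blocks.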