Abstract
Species delimitation, the process of deciding how to group a set of organisms into units called species, is one of the most challenging problems in computational evolutionary biology. While many methods exist for species delimitation, most of them based on coalescent theory, few scale to very large datasets, and those that do scale tend to be less accurate. Species delimitation is closely related to species tree inference from discordant gene trees, a problem that has seen rapid advances in recent years. A major advance has been the surprising accuracy and scalability of methods that rely on breaking gene trees into quartets of leaves. In this paper, we build on this success and propose a new method for species delimitation called SODA. We test SODA in extensive simulations and show that it scales easily to very large datasets while maintaining high accuracy.