Real-world data is the most valuable input you can test your software with. Nothing comes closer to reality than feeding an application actual data from the wild. But what if there is more data than you can handle? Test runs can only take so long. Whether you run tests on a developer machine or as part of a Continuous Integration system, you probably won't be able to process a large corpus every time you make a code change. Sooner or later, you will be forced to shrink your test corpus to a more manageable size. This article presents an approach that uses code coverage metrics to select a representative subset of the test data.
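To make the idea of a coverage-based subset concrete, here is a minimal sketch. It is not necessarily the exact procedure this article develops. It assumes you have already recorded, for each input file, the set of code locations that the file exercises. The function name `select_representative_subset`, the file names, and the `"function:line"` labels are all hypothetical and used only for illustration. The selection strategy is a plain greedy set cover: repeatedly keep the input that adds the most not-yet-covered locations, and stop once no remaining input adds anything new.

```python
from typing import Dict, List, Set


def select_representative_subset(coverage: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick inputs until no remaining input adds new coverage.

    `coverage` maps an input name to the set of code locations it exercises.
    """
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(coverage)  # shallow copy so the caller's dict is untouched

    while remaining:
        # Choose the input with the largest gain in uncovered locations.
        # Iterating in sorted order makes ties resolve to the smallest name.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # every remaining input is redundant with respect to coverage
        selected.append(best)
        covered |= gain

    return selected


# Hypothetical coverage data: three inputs and the locations each one reaches.
example_coverage = {
    "a.dat": {"parse:1", "parse:2", "emit:1"},
    "b.dat": {"parse:1", "parse:2"},
    "c.dat": {"parse:1", "error:1"},
}

subset = select_representative_subset(example_coverage)
print(subset)  # ['a.dat', 'c.dat']
```

In this example `b.dat` is dropped because everything it exercises is already reached by `a.dat`, while `c.dat` is kept because it is the only input that hits the error path. A greedy selection like this does not guarantee the smallest possible subset, but it is simple and usually gets close.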