Automated testing allows you to do a few things:

  1. Tests codify your expectations at the point when you actually remember what you’re expecting
  2. They allow you to offload verification to a machine
  3. You can make changes to your code with confidence

These three things are what make tests such an important part of software development. Without tests it’s difficult or impossible to identify where errors are occurring, or to change your code without causing further problems.

Data science projects tend to be pretty undertested, which is unfortunate because they have all of the same maintainability issues as software projects. For instance, if the packages you use change, the data changes, or you just go back and try to make some changes yourself, you run the risk of breaking your analysis in ways which can be very difficult to detect.

There’s been a fair amount of excellent writing on testing functionality (http://r-pkgs.had.co.nz/tests.html) and testing data (https://github.com/ropensci/testdat), but it’s also useful to test the statistical model itself.

Statistical models are often developed once and then deployed into production systems. During the development process a data scientist will hopefully spend a lot of time interrogating the model to make sure that it’s not overfitting, making use of bad data, or misbehaving in some other manner. Once the model is deployed, however, people move on to other things and might not worry too much about whether the model is continuing to function as they expected. This leaves the model open to two big classes of bugs:

  1. The model’s inputs change. This might happen if the data collection process changes, the modelling function itself changes, or you change something about how the data is processed before it’s fed into the model. Sometimes this will result in dramatic failures which are easy to identify: for instance, your model might start producing noticeably bad predictions which users complain about. But it might also produce subtle problems which you can’t easily detect. The model might just produce somewhat worse predictions, or do so only for a subset of users.

  2. The model is refit on invalid data. One of the great features of modern machine learning techniques is that they can continue to learn from new data. So you might fit a neural network on a historical dataset, but continue to update the model as new data arrives. In this case the data used to update the model can systematically bias it, and if you aren’t checking in on the model regularly you might lose accuracy without noticing. An extreme example might be if a bug in the data collection process resulted in a single user’s record being repeated many times. If the model is refit on that data, it would begin to fit that user’s record at the expense of out-of-sample accuracy.
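Both classes of bug can often be caught by guarding the data before it reaches the model. The sketch below is a minimal, hypothetical example in plain Python: `EXPECTED_RANGES`, `validate_inputs` and `check_no_duplicate_records` are illustrative names, and the column names, bounds and the `user_id` key are assumptions you would replace with your own. The first check targets changed inputs; the second targets the repeated-record bug before a refit.

```python
from collections import Counter

# Hypothetical expected range for each input column (assumed values).
EXPECTED_RANGES = {
    "age": (18, 100),
    "income": (0, 1_000_000),
}


def validate_inputs(rows):
    """Raise ValueError if a row is missing a column or has a value outside its expected range."""
    for i, row in enumerate(rows):
        for column, (low, high) in EXPECTED_RANGES.items():
            value = row.get(column)
            if value is None:
                raise ValueError(f"row {i}: missing {column!r}")
            if not low <= value <= high:
                raise ValueError(f"row {i}: {column}={value} outside [{low}, {high}]")


def check_no_duplicate_records(rows, key="user_id", max_repeats=1):
    """Raise ValueError if any key appears more than max_repeats times.

    Intended to run on a new batch before the model is refit on it.
    """
    counts = Counter(row[key] for row in rows)
    repeated = {k: n for k, n in counts.items() if n > max_repeats}
    if repeated:
        raise ValueError(f"records repeated more than {max_repeats} time(s): {repeated}")
```

Run on each new batch before it is scored or used for refitting, a failing check stops the pipeline instead of letting the model degrade silently.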

How to test models

You can test models in the same way you test other parts of your package.

  - How do the model parameters look?
  - Is it performing well enough?
  - Are the predictions within a sensible range?
  - Automated sanity checks?
  - Is it making use of illegal or unethical data?