No-Garbage-In Policy

Most of the data sets our (potential) clients come with are problematic and cannot be completely converted into technically valid registrations. We have chosen not to allow partial, inconsistent or invalid data into the system. We can’t always live up to this, but we will not lower the criteria, no matter what.

Let us first describe the problem. Let’s say we are importing several works containing sub-publishers and their respective territories. Some of the works have characters in their titles that are not allowed in the CWR format. We basically have four options.

Garbage IN - Garbage OUT

The first would be to import all works, then create invalid CWR and hope it will not be noticed. In some societies it actually will not be, but others may reject it. We may save some time initially by taking the risk, but those rejections will lead to names being changed. If we then register the corrected data only in the societies that rejected the original, we will have different registrations in different societies, which may lead to new issues. We can create revisions for all societies, but then we have not saved any time. On the contrary, it is worse than fixing things straight away.

Garbage IN - And stays IN

The second would be to import all works, but prevent the bad ones from being exported to a CWR file. This is a much better option than the previous one, as long as such a work cannot be exported to a CWR file for any society. In an ideal world, this may work fine, but in real life, for various reasons, works never get fixed. What is even worse, issues never get reported back up the chain. Furthermore, this is not how societies work.

No Garbage IN

The third would be to allow only valid works to enter the system. This is how societies work. The fourth would be to reject the complete set just because one or several works are problematic. The difference between the two may be huge, depending on the source of the data and how it can be fixed there. But since it is possible that our validation does not catch a problem, insisting on the last option would be needlessly pushing it too far. So we inform the user that there is an issue and let them decide whether they want the partial import or to cancel it completely.
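
To make the chosen behaviour concrete, here is a minimal sketch in Python. It is not our actual implementation. The names Work, ImportReport, validate and import_works are hypothetical, and ALLOWED_TITLE_CHARS is only an illustrative character set, not the one defined by the CWR specification. The sketch checks titles only, splits the batch into valid and rejected works, and leaves the decision between a partial import and a full cancel to the user.

```python
from dataclasses import dataclass, field

# ASSUMPTION: an illustrative allowed set only. The real rules come from
# the CWR specification and are not reproduced here.
ALLOWED_TITLE_CHARS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 !#$%&'()+-./?@")


@dataclass
class Work:
    work_id: str
    title: str


@dataclass
class ImportReport:
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (work, reason) pairs


def validate(works):
    """Split works into valid ones and rejected ones with a reason."""
    report = ImportReport()
    for work in works:
        bad = sorted(set(work.title) - ALLOWED_TITLE_CHARS)
        if bad:
            reason = "disallowed characters in title: " + "".join(bad)
            report.rejected.append((work, reason))
        else:
            report.valid.append(work)
    return report


def import_works(works, accept_partial):
    """Return (works to import, report).

    accept_partial(report) stands in for the user's decision and is only
    called when at least one work was rejected. Invalid works never enter
    the system, whichever way the user decides.
    """
    report = validate(works)
    if report.rejected and not accept_partial(report):
        return [], report  # user cancelled: nothing is imported
    return report.valid, report
```

A caller would pass a function that shows the report to the user and returns True for a partial import or False to cancel, for example import_works(batch, ask_user), where ask_user is whatever prompt the application provides. In both cases the rejected works and the reasons stay in the report, so the issues can be sent back up the chain.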


The largest consequence for us is that we don’t sign up some potential clients, mostly because they are not willing to press their own clients for correct data. While we feel compassion for people in such situations, we are determined not to enter that mindset.

The largest consequence for our clients is that they fix their data. By fixing it, they learn new skills and disallow bad data further up the stream, which, in turn, leads to less bad data overall. While there are still rejections, conflicts and other issues to deal with, there are far fewer of them and, more importantly, they are easier to detect and, ultimately, fix.