Navigating the data quality minefield

By Jenny on July 14, 2017. Categorised in Data Management, Data Quality, FME


Your organisation never stands still, with data continually flowing in and out of your systems at an impressive rate. But there is one issue that most organisations struggle to cope with: controlling the accuracy of data received from third parties.

Consuming poor-quality data can cause significant knock-on effects – bad data leads to bad decision-making, often resulting in time-consuming and costly corrective action later on. It therefore makes sense to check the quality of incoming data at the point of arrival.

With seemingly few options available, many organisations turn to performing these quality processes manually, but this is far from ideal. The processes are often the remit of a small team of specialists, leaving the organisation vulnerable to knowledge shortages if members of the team move on.

Additionally, these specialist teams frequently struggle to keep up with the volume of requests for their time, meaning data users have to wait longer to access the intelligence contained within the data. All of these small delays add up to cause significant levels of disruption for the wider organisation.

What’s needed is an automated way to process this incoming data before it reaches business-critical systems, so that users aren’t caught in a slow and costly queue. What would be even better is a system that could take this one step further by defining how any non-compliant data is handled – perhaps returning incorrect data to the issuer for review.
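To make the idea concrete, here is a minimal sketch in Python of what such a check might look like. It is an illustration only, not how FME or any particular product works: the field names, the rules and the `check_batch` function are all hypothetical, and a real system would use rules agreed with the data supplier.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    accepted: list = field(default_factory=list)   # records that passed every rule
    rejected: list = field(default_factory=list)   # (record, failed_fields) pairs


# Hypothetical rules: each field name maps to a test that returns True when valid.
RULES = {
    "id": lambda v: isinstance(v, int) and v > 0,
    "postcode": lambda v: isinstance(v, str) and v.strip() != "",
    "easting": lambda v: isinstance(v, (int, float)) and 0 <= v <= 700000,
}


def check_batch(records):
    """Split incoming records into accepted and rejected, noting why each failed."""
    result = CheckResult()
    for record in records:
        # A field fails if it is missing or if its rule returns False.
        failed = [
            name for name, test in RULES.items()
            if name not in record or not test(record[name])
        ]
        if failed:
            result.rejected.append((record, failed))
        else:
            result.accepted.append(record)
    return result


if __name__ == "__main__":
    # Made-up sample data for illustration.
    batch = [
        {"id": 1, "postcode": "AB1 2CD", "easting": 475000},
        {"id": -5, "postcode": "", "easting": 475000},
        {"id": 2, "postcode": "AB1 2CD"},
    ]
    outcome = check_batch(batch)
    print("Accepted:", len(outcome.accepted))
    for record, failed in outcome.rejected:
        print("Return to issuer:", record, "failed checks:", failed)
```

The accepted records could then be passed on for loading, while the rejected list, along with the reasons for each failure, could be sent back to the issuer for review.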

Not only would such a system drastically increase the volume of processes that could be performed, it would also remove human error, resulting in far greater confidence in the data and any decisions it influences. In addition, the removal of manual processing would result in significant time and cost savings, with delays downstream being minimised or even eliminated.

East Hampshire District Council applied this logic when they used FME to create a process to QA a large collection of historic datasets ready for loading into their SQL database. Not only did automating the processes increase the overall accuracy and reliability of the data, it also enabled significant savings in staff costs.

It’s a simple approach, but it has the potential to solve an issue that causes many organisations huge levels of disruption. It may not work for all scenarios, but it would definitely be a good starting point for most.