Using FME to analyse 140 million MOT records – not a problem!
Our sister company, Field Dynamics, recently put together a report that analysed 140 million class 4 MOT records to get one of the most accurate views of average vehicle mileage across England, Scotland, and Wales.
To compile the report, the team at Field Dynamics needed to process four years’ worth of data for almost every vehicle in Great Britain. They used MOT tests, a legally required test for vehicles over three years old (sometimes sooner depending on vehicle type), to calculate the difference in mileage per vehicle between each test.
Field Dynamics needed a data tool that could handle a huge amount of data, validate it, and load it into a PostgreSQL database quickly and efficiently. That tool was FME.
Here’s how FME made Field Dynamics’ GB MOT Report possible.
Using FME to validate data
MOT data is published by the Driver and Vehicle Standards Agency (DSVA) but is not classed as an ‘official statistic’. This means it’s not subject to scrutiny and assessment by the UK Statistics Authority so there may be some anomalous data. Field Dynamics found that some vehicles were recorded to have negative mileage, and some were recorded to have driven over a billion miles in a year. FME not only cleaned out the obvious data anomalies but also implemented the logic rules needed to load a high-quality data set into the PostgreSQL database.
Using FME to sort schema
When reviewing the data, Field Dynamics found that the records were inconsistent, with different information levels and column layouts. This effected the table structure and shifted some information under incorrect table headers. Field Dynamics used FME to correct this schema drift to enforce a standard schema structure that was database ready.
Using FME to replace values
Throughout the MOT data, some data points were given codes by the DSVA i.e. DI for Diesel, PE for Petrol, HY for Hybrid. Field Dynamics used FME to replace these attributes with clearer explanations of data points, which made it easier for analysts to understand.
Using FME to separate data
Field Dynamics only wanted to analyse Class 4 MOT data but the CSV files from DSVA contained every type of MOT test. Field Dynamics used FME to filter out the tests they weren’t planning to analyse before loading into their database. With so many data points, this helped speed up loading and processing.
Using FME to aggregate multiple datasets
Each years’ MOT data was uploaded into its own huge CSV file. The team wanted to combine these all together for loading into PostgreSQL. FME was the perfect tool to reliably concatenate the data.
Using FME to save time
Without FME to extract, transform and load the data, Field Dynamics’ analysts would have had to manually validate, segment, and clean and load the data. This would have taken so much time the project would not have been viable. Instead, the Field Dynamics team used that saved time to put together an insightful report that challenges perceptions around how much the average vehicle travels in Great Britain.
You can read the Field Dynamics report on GB MOT Mileage that was made possible by FME here.
If you want to know more about how FME can sort your data challenges, get in touch and we’ll be happy to help.