TQ
dev.com

Blog about software development

Subscribe

Free Dutch postcodes CSV dataset

18 Apr 2024 - by 'Maurits van der Schee'

Web shops in the Netherlands need accurate address validation and completion. Kadaster is a Dutch government body that can provide the data that we need. Unfortunately they only provide the data in an almost unusable format (about 96GB of XML data), while we need a CSV file with only a small subset of that data for our needs. In this blog post I'll explain how to reduce the vast dataset into a small file of either 40MB (megabyte) including locations in RD coordinates or 17MB without.

Bert Hubert's bagconv

I modified the "bagconv" tool created by Bert Hubert to do all the required steps in one go (in about half an hour):

  1. download (3GB zip)
  2. unpack (3GB zip into 96GB xml)
  3. convert (96GB XML into 9GB sqlite)
  4. extract (9GB sqlite into 350MB csv)
  5. compress (350MB csv into 17MB 7z)

The command to install the dependencies and run locally is:

sudo apt install build-essential cmake libsqlite3-dev nlohmann-json3-dev zlib1g-dev 
sudo apt install sqlite3 wget unzip 7zip
bash run.sh

Or you can run without installing dependencies via docker with:

bash docker-run.sh

Note that a run may take half an hour and use 110G (gigabyte) of disk space. You can run:

bash clean.sh

to clean up all used diskspace after you copied the files from the "dist" folder that you need.

Loading using SQL

The resulting 7zip compressed CSV files can be loaded into MariaDB database as shown in the postcodes-nl and postcodes-nl-geo projects. Note that the releases of those projects contain these required datasets (7z files), so there is no need to run the conversion by yourself. I will try to release the loader script plus dataset regularly. I will try to maintain the fork in order to build my own 7zip compressed CSV files.

If you have any improvements or additions, please create an issue on the corresponding Github project.

PostNL address check service

You can now implement a service comparable to the PostNL addres check service. You can execute the following GET request:

curl -G -d "postalCode=1000AA" -d "houseNumber=1" -d "houseNumberSuffix=" \
https://productprijslokatie.postnl.nl/address-widget/api/lookup/address | jq

that returns:

{
    "address": {
        "street": "Postbus",
        "postalCode": "1000AA",
        "houseNumber": "1",
        "houseNumberSuffix": null,
        "city": "AMSTERDAM"
    },
    "matched": true,
    "availableHouseNumberSuffixes": []
}

NB: Please don't abuse this PostNL API, it seems to be designed for internal use only.

Download dataset

Links


PS: Liked this article? Please share it on Facebook, Twitter or LinkedIn.