First macro analyses of spatial content in Wikidata

Through our self-hosted Wikidata database, we can run large-scale queries on geographic content in Wikidata and check some spatial distributions.

Places of worship, for example, can be retrieved with the following SPARQL query (you can test the query here):

# all instances of places of worship (or of subclasses), with coordinates
SELECT ?item ?itemLabel ?coords
WHERE {
  ?item wdt:P31/wdt:P279* wd:Q1370598;
        wdt:P625 ?coords.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

With our test database from November 2017, this query returned a result set of 133,389 items representing churches, mosques, shrines, synagogues and all other kinds of places of worship.
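For further processing it helps to know that the Wikidata query service returns coordinate values such as ?coords as WKT point literals of the form "Point(longitude latitude)", with longitude first. The following is a minimal Python sketch of how such literals could be split into numeric values before loading them into a spatial database. The function name parse_wkt_point is hypothetical, and this is not the import script used for this post, only an illustration of the parsing step.

```python
import re

# Number pattern: optional sign, digits, optional fraction and exponent.
_NUM = r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"

# Matches plain 2D WKT point literals such as "Point(16.37 48.21)".
_WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(" + _NUM + r")\s+(" + _NUM + r")\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(literal):
    """Return (lon, lat) as floats, or None if the literal is not a plain 2D point.

    Hypothetical helper for illustration; anything that is not exactly
    "Point(<lon> <lat>)" with values in the valid range is rejected.
    """
    match = _WKT_POINT.match(literal)
    if match is None:
        return None
    lon, lat = float(match.group(1)), float(match.group(2))
    # Reject values outside the valid longitude/latitude range.
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        return None
    return lon, lat
```

Returning None instead of raising an error makes it easy to skip the occasional malformed value when looping over a result set of this size.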

In order to prepare this dataset for GIS visualizations, I imported it via an R script into a PostGIS database. From here I used Quantum GIS to generate map views of the dataset:

geocoded places of worship from Wikidata

We immediately recognize a familiar pattern from other crowdsourced geographic information: the Global North, specifically the US east coast, Europe and Japan, contains most of the data. This impression is strongly confirmed by the heatmap view of this dataset:

heatmap of geocoded places of worship from Wikidata

Interestingly, the highest density of places of worship in Wikidata seems to be in Austria and the Czech Republic. My best guess is that this might be caused by an automated import of a detailed Austrian dataset that contained this information.
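A density impression like this can also be cross-checked numerically, without a GIS, by counting points per grid cell. Here is a minimal Python sketch under a few assumptions: the points are already available as (lon, lat) tuples, the function names grid_counts and densest_cell are hypothetical, and the cells are simple degree-based squares rather than the kernel density estimate a GIS heatmap typically uses. Cells of equal size in degrees cover less ground towards the poles, so the counts are only a rough indicator.

```python
import math
from collections import Counter


def grid_counts(points, cell_size_deg=1.0):
    """Count (lon, lat) points per grid cell.

    Cells are keyed by the integer index of their south-west corner,
    i.e. (floor(lon / size), floor(lat / size)).
    """
    counts = Counter()
    for lon, lat in points:
        cell = (math.floor(lon / cell_size_deg), math.floor(lat / cell_size_deg))
        counts[cell] += 1
    return counts


def densest_cell(points, cell_size_deg=1.0):
    """Return (sw_lon, sw_lat, count) for the cell with the most points, or None."""
    counts = grid_counts(points, cell_size_deg)
    if not counts:
        return None
    (ix, iy), n = counts.most_common(1)[0]
    # Convert the cell index back to the coordinates of its south-west corner.
    return (ix * cell_size_deg, iy * cell_size_deg, n)


# Usage with a handful of made-up sample points (not taken from the dataset):
sample = [(16.3, 48.2), (16.9, 48.7), (16.5, 48.1), (-0.1, 51.5)]
print(densest_cell(sample))
```

Running the same count with a few different cell sizes gives a quick sense of whether a hotspot is a regional pattern or just one very dense city.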

Let’s have a look at another feature: places of birth, obtained through this SPARQL query:

# all places of birth, with coordinates
SELECT DISTINCT ?place ?placeLabel ?coord
WHERE {
  ?item wdt:P19 ?place.
  ?place wdt:P625 ?coord.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

With my database snapshot, this generated a result set of 151,214 geolocated places of birth from Wikidata, which are globally distributed as follows:

geocoded places of birth from Wikidata


While this pattern appears to be far more balanced than that of the places of worship, the heatmap visualization puts this impression into perspective and shows, again, the data’s major focus on Europe, especially southern Germany:

heatmap of geocoded places of birth from Wikidata

And what happens if we map all geographic locations in Wikidata?

# all geographic locations (or instances of subclasses), with coordinates
SELECT ?geoloc ?geolocLabel ?coord
WHERE {
  ?geoloc wdt:P31/wdt:P279* wd:Q2221906.
  ?geoloc wdt:P625 ?coord.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

geographic locations from Wikidata

Here I received a total of 2,356,323 places. While those spread over all populated areas of the world, the relative distribution in the heatmap view tells a different story (interestingly, this category has its highest density in southern England):

heatmap of geographic locations from Wikidata

Summing up these tiny first explorations, Wikidata seems to be strongly biased towards the Global North and particularly Western Europe, which is not a big surprise in light of other crowdsourced geographic information from OpenStreetMap or Wikipedia.

Interestingly, underneath this general bias towards Europe, each thematic dataset seems to have its own slightly different skews and focus areas. This is not to say that Wikidata has nothing to offer about other regions, as the dot map of “geographic locations” indicates.

I plan to conduct more detailed analyses of these issues in the future.