
Mapping geosemantics

In Wikidata, different knowledge items are interrelated through properties in countless semantic statements.

For example:

“Germany’s capital is currently Berlin”;
“Berlin’s head of government is currently Michael Müller”;
“Michael Müller is a member of the Social Democratic Party of Germany” (SPD);
“the SPD’s headquarters location is the Willy-Brandt-Haus”;
“the Willy-Brandt-Haus has a coordinate location of 52°30’N, 13°24’E”; …

While this structure is the common DNA of semantic web databases (RDF triples), it has some very exciting aspects from a geographer’s or GIScientist’s perspective, because relationships between geographic entities are modeled through semantic properties and not (only), as is common in geographic information systems, through geometric or topological properties.
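
To make the triple structure concrete, here is a minimal Python sketch that writes the example statements above as (subject, property, object) tuples and follows the chain from Germany to a coordinate. The property labels are plain-text stand-ins taken from the example sentences, not Wikidata's actual property identifiers, and the helper `follow` is a hypothetical name introduced only for this illustration.

```python
# Example statements from the text, written as (subject, property, object) triples.
# The property labels are illustrative stand-ins, not real Wikidata property IDs.
TRIPLES = [
    ("Germany", "capital", "Berlin"),
    ("Berlin", "head of government", "Michael Müller"),
    ("Michael Müller", "member of", "SPD"),
    ("SPD", "headquarters location", "Willy-Brandt-Haus"),
    ("Willy-Brandt-Haus", "coordinate location", "52°30'N, 13°24'E"),
]


def follow(start, properties, triples=TRIPLES):
    """Walk from `start` along a list of properties and return the final object.

    Returns None as soon as a step has no matching triple.
    """
    current = start
    for prop in properties:
        matches = [o for s, p, o in triples if s == current and p == prop]
        if not matches:
            return None
        current = matches[0]
    return current


PATH = ["capital", "head of government", "member of",
        "headquarters location", "coordinate location"]
print(follow("Germany", PATH))
```

The point of the sketch is that Germany is connected to a coordinate purely through a chain of semantic properties, without any geometric operation being involved.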

This resonates with a long-standing debate in geography about conceptions of space in ontologies of spatial data. Many authors have criticized the conventional data models of GIS and cartography for their Newtonian, i.e. absolute, view of space as a geometric container, and have argued for more relational spatial data ontologies that represent space through the relations between spatial objects.

From this view, the geospatial semantic web in general and Wikidata in particular might present a promising avenue towards digital representations of relational spaces, networks of semantically connected places, or, in short: geosemantics.

As a first step towards an exploration of these geosemantics in Wikidata, I have created a tool in R that delivers an interactive web map (in Leaflet) for a given place (input) with selectable layers of all semantic relations and their respective places in Wikidata.

This tool is a first step towards an examination of a place’s multidimensional and multiscalar geosemantics in the manner of qualitative geovisual analytics.
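
The R tool itself is not shown here. As a rough illustration of the idea only, the following Python sketch builds a SPARQL query for all geocoded items that share an instance-of class (P31) with a given item, and groups result rows into one layer per class. The function names `build_layer_query` and `group_into_layers` are hypothetical, the sketch covers only the instance-of relation rather than all semantic relations, and the sample rows are hand-written placeholders, not real query output.

```python
import re
from collections import defaultdict


def build_layer_query(qid):
    """Return a SPARQL query for geocoded items sharing an instance-of class with `qid`."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return f"""
SELECT ?classLabel ?place ?placeLabel ?coord
WHERE
{{
  wd:{qid} wdt:P31 ?class .
  ?place wdt:P31 ?class ;
         wdt:P625 ?coord .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


def group_into_layers(rows):
    """Group rows (dicts with 'classLabel', 'placeLabel', 'coord') into one list per class."""
    layers = defaultdict(list)
    for row in rows:
        layers[row["classLabel"]].append((row["placeLabel"], row["coord"]))
    return dict(layers)


# Hand-written placeholder rows (NOT real Wikidata output), only to show the grouping.
SAMPLE_ROWS = [
    {"classLabel": "city", "placeLabel": "Place A", "coord": "Point(7.0 51.0)"},
    {"classLabel": "city", "placeLabel": "Place B", "coord": "Point(8.0 50.0)"},
    {"classLabel": "big city", "placeLabel": "Place A", "coord": "Point(7.0 51.0)"},
]

print(build_layer_query("Q64"))  # Q64 is the Wikidata item for Berlin
print(group_into_layers(SAMPLE_ROWS))
```

Each key of the returned dictionary would correspond to one selectable layer on the map.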

Here are some exemplary result maps:

City semantics (based on the place “Cologne”)


If we check the map’s layers of Wikidata’s instance-of semantics for Cologne, we get a feeling for the changing meanings and functions of the city across historical periods, as well as for the changing scales of interrelationships between Cologne and other cities.

Unsurprisingly, the largest layer of places consists of other “cities”. The city is a weak but globally adopted concept, defined in Wikidata as a “large and permanent human settlement”. Whether or not we agree with this definition, there is no denying that “cities”, as places of urban life, play a major role in our world. Needless to say, this layer is far from complete. Zooming into specific regions, we see that many cities are missing here.

Then we have two layers based on rather quantitative criteria: big cities (“with a population of more than 100,000 inhabitants”) and metropolises (“very large and significant city or urban area usually with millions of inhabitants”). Apparently, these layers are even less complete than the city layer. “Big cities” in Wikidata are mostly located in Western Europe, and “metropolises” are too sparse to discern any pattern at all.

What I found most interesting, however, are the more specific layers of “independent cities” (a category from the German administrative system), “hanseatic cities” (cities that were members of the Hanseatic League) and “Roman cities” (Roman-period settlements), because they point to the changing functions, meanings and alliances of Cologne across different historical periods. Again, these layers are incomplete, but since Wikidata keeps growing at a high pace, this will probably improve in the future.

Israeli settlements (based on the place “Efrat”)


If we look for examples of contrasting perspectives, they can usually be found easily in the context of Israel and Palestine. For this map, I searched for the semantics of the Israeli town/settlement Efrat in the (Israeli-occupied) West Bank, which might also be called State of Palestine or Judea and Samaria (depending on your political stance towards the question of Israel-Palestine). And indeed, in Wikidata, Efrat is part of two quite dissimilar semantic categories.

In accordance with the widely internationally accepted “UN view”, Efrat is an Israeli settlement (“Jewish civilian communities built by Israel on lands it occupied following the 1967 Six-Day War”). This category includes only places in the disputed territories of the West Bank, the Golan Heights and the Gaza Strip (in the last case the settlements are historical, because Israel evacuated and destroyed all settlements in the Gaza Strip in 2005).

By contrast, Efrat is also described by Wikidata as a “local council in Israel”, which is an administrative category for municipal governments in Israel. On the map, these local councils are located both inside the internationally accepted borders of pre-1967 Israel and inside the disputed territories. This category, then, represents a rather “Israeli view” of the situation, in which settlements are often accepted as “normal” Israeli towns.

Refugee camps (based on the place “Calais Jungle”)


While this map is not composed of different semantic layers, I still found it interesting for putting the discourse on migration and flight into a wider picture. For example, I probably wouldn’t have made a connection between the “Calais Jungle” and Beaubears Island in Canada, which was an Acadian refugee camp during the French and Indian War in the 18th century. There are also Palestinian refugee camps in the Gaza Strip on the map, like the Nuseirat Camp, which belong to an entirely different historical and geographical context. Some places on the map are located in Germany and represent refugee camps for displaced persons after the end of the Second World War.

Thus, on the one hand, this dataset is highly heterogeneous (needless to say, the data is far from complete in any case and the selection seems highly eclectic). On the other hand, I appreciate precisely this inconsistency for its explorative value, for opening up unexpected connections between varying historical and geographical contexts of flight and expulsion.

First macro analyses of spatial content in Wikidata

Through our self-hosted Wikidata database, we can perform large-scale queries of geographic content in Wikidata and check some spatial distributions.

Places of worship, for example, can be retrieved with this SPARQL query:
(you can test the query here)

# all instances of places of worship (or of its subclasses), with coordinates
SELECT ?item ?itemLabel ?coords
WHERE
{
  ?item wdt:P31/wdt:P279* wd:Q1370598;
        wdt:P625 ?coords.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

With our test database from November 2017, this query generated a result set of 133,389 items representing churches, mosques, shrines, synagogues and all other kinds of places of worship.

In order to prepare this dataset for GIS visualizations, I imported it via an R script into a PostGIS database. From there, I used Quantum GIS (QGIS) to generate map views of the dataset:
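
The R import script is not reproduced here. One step any such import needs is turning the coordinate values into numbers: the Wikidata query service returns P625 values as WKT literals of the form `Point(longitude latitude)`, with longitude first. The following Python sketch shows that single step under stated assumptions; `parse_wkt_point` is a hypothetical helper name, and the sketch only accepts plain `Point(lon lat)` strings.

```python
import re

# A signed decimal number, optionally with an exponent.
_NUM = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
_POINT_RE = re.compile(
    rf"^\s*POINT\s*\(\s*({_NUM})\s+({_NUM})\s*\)\s*$", re.IGNORECASE
)


def parse_wkt_point(literal):
    """Parse a WKT 'Point(lon lat)' string and return (lon, lat) as floats.

    Raises ValueError for anything that is not a plain point or that lies
    outside the valid longitude/latitude range.
    """
    match = _POINT_RE.match(literal)
    if match is None:
        raise ValueError(f"not a WKT point: {literal!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"coordinate out of range: {literal!r}")
    return lon, lat


lon, lat = parse_wkt_point("Point(13.4 52.5)")
print(lon, lat)
```

Keeping the longitude-first order explicit matters, because swapping the two values silently moves every point to the wrong place on the map.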

geocoded places of worship from Wikidata

We immediately recognize a familiar pattern from other crowdsourced geographic information: the Global North, specifically the US east coast, Europe and Japan, contains most of the data. This impression is strongly confirmed by the heatmap view of this dataset:

heatmap of geocoded places of worship from Wikidata

Interestingly, the highest density of places of worship in Wikidata seems to be in Austria and the Czech Republic. My best guess is that this might be caused by an automated import of a detailed Austrian dataset that contained this information.
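
A visual impression like this can be cross-checked numerically. The sketch below is not the heatmap method used for the maps here; it is a much cruder substitute that simply counts points per grid cell of a fixed size in degrees and reports the fullest cell. The function names and the cell size are assumptions made for illustration, the sample points are made up, and cells measured in degrees cover less ground area at higher latitudes, so the result is only a rough indication.

```python
import math
from collections import Counter


def grid_counts(points, cell_size=1.0):
    """Count (lon, lat) points per square grid cell of `cell_size` degrees."""
    counts = Counter()
    for lon, lat in points:
        cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        counts[cell] += 1
    return counts


def densest_cell(points, cell_size=1.0):
    """Return ((min_lon, min_lat), count) for the cell that holds the most points."""
    counts = grid_counts(points, cell_size)
    if not counts:
        raise ValueError("no points given")
    (ix, iy), n = counts.most_common(1)[0]
    return (ix * cell_size, iy * cell_size), n


# Made-up sample points (lon, lat), only to demonstrate the counting.
SAMPLE_POINTS = [(16.3, 48.2), (16.4, 48.2), (16.9, 48.9), (2.3, 48.8), (-0.1, 51.5)]
print(densest_cell(SAMPLE_POINTS))
```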

Let’s have a look at another feature: places of birth, obtained through this SPARQL query:

# all places of birth, with coordinates
SELECT DISTINCT ?place ?placeLabel ?coord
WHERE
{
  ?item wdt:P19 ?place.
  ?place wdt:P625 ?coord.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

With my database snapshot, this generated a result set of 151,214 geolocated places of birth from Wikidata, which are globally distributed as follows:

geocoded places of birth from Wikidata

While this pattern appears to be far more balanced than that of the places of worship, the heatmap visualization qualifies this impression and shows, again, the data’s major focus on Europe, especially southern Germany:

heatmap of geocoded places of birth from Wikidata

And what happens if we map all geographic locations in Wikidata?

# geographic locations
SELECT ?geoloc ?geolocLabel ?coord
WHERE
{
  ?geoloc wdt:P31/wdt:P279* wd:Q2221906.
  ?geoloc wdt:P625 ?coord.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

geographic locations from Wikidata

Here I received a total of 2,356,323 places. While these are spread over all populated areas of the world, the relative distribution in the heatmap view tells a different story (interestingly, this category has its highest density in southern England):

heatmap of geographic locations from Wikidata

Summing up these tiny first explorations, Wikidata seems to be strongly biased towards the Global North and particularly Western Europe, which is not a big surprise in light of other crowdsourced geographic information from OpenStreetMap or Wikipedia.

Interestingly, underneath this general bias towards Europe, each thematic dataset seems to have slightly different skews and focus areas. This is not to say that Wikidata has nothing to offer about other regions, as the dot map of “geographic locations” indicates.

I plan to conduct more detailed analyses of these issues in the future.

Wikidata database prototype successfully set up

With the kind assistance of the GeoDatenZentrum, I have managed to set up a prototype of a self-hosted Blazegraph database on a Linux server and to populate it with a Wikidata dump file from 6 November 2017. I basically followed the instructions at https://github.com/wikimedia/wikidata-query-rdf/blob/master/docs/getting-started.md . The import took the virtual machine (64GB RAM, 4 cores, TB) about three weeks. It seems reasonable to use a faster machine next time, not least because Wikidata is growing at a high pace.

Harnessing GPUs Delivers a Big Speedup for Graph Analytics

To get the database running properly, I changed some parameters in the config file RWStore.properties and in the shell script runBlazegraph.sh.

In addition, swap must be disabled to avoid runtime errors:

sudo swapoff -a

sudo sysctl vm.swappiness=0

If the database is properly installed and populated, its SPARQL endpoint can be accessed via an HTTP request:

http://[ip address of the server]:9999/bigdata/sparql?
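
As a sketch of what such a request can look like from a script, the following Python code builds the GET URL for a query and, if a server is reachable, sends it and parses the JSON results. It relies on the standard SPARQL protocol convention of passing the query in a `query` parameter. The port and path are taken from the URL above; the function names, the timeout and the example address (a reserved documentation address, not a real server) are assumptions.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_url(host, query, port=9999, path="/bigdata/sparql"):
    """Return the GET URL that sends `query` to the endpoint on `host`."""
    return f"http://{host}:{port}{path}?" + urllib.parse.urlencode({"query": query})


def run_query(host, query, timeout=60):
    """Send the query and return the parsed JSON results (needs a reachable server)."""
    request = urllib.request.Request(
        build_sparql_url(host, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


EXAMPLE_QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
# 192.0.2.10 is a placeholder address; replace it with the server's IP address.
print(build_sparql_url("192.0.2.10", EXAMPLE_QUERY))
```

Only `build_sparql_url` runs without a server; `run_query` is meant to be called once the endpoint is actually up.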