This is an R Markdown presentation. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. It’s a great way to combine R programming with text and visualization.
What is more, when you combine this with an open-source publication of your dataset and scripts (for instance on a Dataverse, SURFdrive, or GitHub), your complete workflow is public and reproducible! The code blocks in this example are all displayed for instructional purposes, but they can also be hidden so that only text and code output remain.
This demonstration will show how to integrate historical maps from the Special Collections of the Utrecht University Library with a research dataset, and visualize the results.
One of my own datasets, collected together with a research assistant, documents the background of migrants entering the city of The Hague in the eighteenth century. The data was collected from digitized sources at the Gemeentearchief The Hague (see here).
Place names were geocoded against GeoNames with OpenRefine using this GitHub repository.
Occupations were coded into HISCO, HISCLASS, and HISCAM classifications using this reference dataset for Dutch occupations.
After many hours of entering, cleaning, and standardizing data, this is the dataset: adm_journeymen.csv (sample).
You may know that R is an open-source, community-based platform that relies heavily on its users and enthusiastic programmers. R comes with basic functionalities (called “base R”). You can add functionalities to R by installing packages that are developed by the community. There are many, many, many different packages. This also means that there are multiple ways to solve the same issue in R. This is a strong feature of R but can also be overwhelming. Also, not every package is as good or useful as the others. Most of the time, just a couple of packages will suit most of your needs.
Getting our Excel or csv file into R requires reading it as a `data.frame`. First we import the data and then take a look at the first three rows to see if it looks as it should. For this we use a package that is awesome for manipulating tabular data: `data.table`.
library(data.table)
adm <- fread("C:\\Users\\Schal107\\Documents\\UBU\\Team DH\\adm_jm_dec2020.csv") # import the csv file
head(adm, n = 3) # display the first three rows of the dataset
## occupation V1 id surname last_name sex pob
## 1: Hoedemakersknecht 118 5408 Andries Schroder m Lubeck
## 2: Kleermakersknecht 401 6203 Jacobus van Lier m Laaren
## 3: Kleermakersknecht 403 7058 Johan Kort Herman Brinkman m Haustenbeek
## pob_modern foreign_dummy pop_pob_1750 pob_lat pob_long dist_pob_denhaag
## 1: Lubeck 1 NA 53.86547 10.68656 471.70698
## 2: Laren 0 NA 52.25667 5.22778 66.53604
## 3: Haustenbeck 1 NA 51.83341 8.77527 307.73242
## origin pob_eq_origin pop_origin_1750 origin_lat origin_long
## 1: NA NA NA NA
## 2: Venray 0 NA 51.525 5.975
## 3: NA NA NA NA
## dist_origin_denhaag dist_pob_origin religion marital_status married_dummy
## 1: NA NA Luth single 0
## 2: 148.9474 96.16829 Cat single 0
## 3: NA NA Ref married 1
## children journeyman_dummy hisco hisclass hiscam hisco_minor skill_level
## 1: 1 79310 8 57.54 79 Average
## 2: 1 79100 8 51.40 79 Average
## 3: 1 79100 8 51.40 79 Average
## skill_num date_arrival month_arrival year_arrival allowed_stay_months
## 1: 3 17570615 6 1757 NA
## 2: 2 17660505 5 1766 12
## 3: 3 17740411 4 1774 6
## date_leaving month_leaving length_stay_if_left settlement_type left_dummy
## 1: NA NA NA gepermiteerd 0
## 2: NA NA NA 0
## 3: NA NA NA admissie 0
## citizenship_dummy settlement_date min_length_stay_if_settlement destination
## 1: 0 17570713 0.9205479
## 2: NA NA
## 3: 0 17741031 6.0000000
## dest_eq_pob pop_dest_1750 dest_lat dest_long dist_dest_denhaag dist_pob_dest
## 1: NA NA NA NA NA NA
## 2: NA NA NA NA NA NA
## 3: NA NA NA NA NA NA
## pdf_page_no source remark entered_by sampled N
## 1: 78 1121_7 David 0 1
## 2: 99 1122_4 17-06-1967 is soldaat geworden David 1 2
## 3: 52 1123_1 David 1 2
## mechanical nonroutine ingenuity occ_class_new occ_class_new_ex_mech
## 1: 0 0 1 1 1
## 2: 0 1 1 2 2
## 3: 0 1 1 2 2
Next, we check whether the relevant variables are in the right format. Coordinates should be numeric; otherwise we cannot plot them on a map:
str(adm[!is.na(origin_lat), list(pob_modern, pob_lat, pob_long, origin, origin_lat, origin_long, sex)])
## Classes 'data.table' and 'data.frame': 180 obs. of 7 variables:
## $ pob_modern : chr "Laren" "Bordeaux" "Maastricht" "Ghent" ...
## $ pob_lat : num 52.3 44.8 50.8 51 48.2 ...
## $ pob_long : num 5.228 -0.581 5.689 3.717 16.372 ...
## $ origin : chr "Venray" "Amsterdam" "Maastricht" "Gent" ...
## $ origin_lat : num 51.5 52.4 50.8 51.1 48.2 ...
## $ origin_long: num 5.97 4.89 5.69 3.74 16.37 ...
## $ sex : chr "m" "m" "m" "m" ...
## - attr(*, ".internal.selfref")=<externalptr>
This all looks fine for now, but notice that `sex` is not defined as a factor but as a string. This is something we might want to fix when doing other visualizations or analyses with this variable.
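A minimal sketch of that fix, using a small made-up table rather than the real dataset (the three rows are illustrative only). It relies on `data.table`'s `:=` operator, which changes the column in place:

```r
library(data.table)

# Toy stand-in for `adm`; the values are made up for illustration
toy <- data.table(surname = c("A", "B", "C"), sex = c("m", "f", "m"))

# Convert the character column to a factor by reference (no copy is made)
toy[, sex := as.factor(sex)]

str(toy$sex)
```

The same line applied to `adm` would convert the real column, assuming it contains no unexpected spellings that should be merged first.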
Spatial data comes in many different formats, and there are many ways of getting it into R. Because we’ll use an XYZ tile link from the Historical Maps collection, one option is to use a package called `leaflet`. This package is an R interface to the JavaScript library Leaflet and allows for interactive features, such as pins, clusters, and labels.
library(leaflet)
Now we can select a map from the Utrecht University Historical Maps Collection. The selected map is titled Belgii faederati nova descriptio and printed in Amsterdam in 1660.
We’ll store this map in R by simply copying the URL of the XYZ link from the Historical Maps website. To visualize it, we set the correct center and zoom level of the map box, and then pass the URL to the `addTiles()` function of the `leaflet` package. Notice that the quality of the scanned map allows for very detailed zooming.
url <- ("https://maps.georeferencer.com/georeferences/9ee5f3fd-740b-5206-8e85-2ecb94618bc9/2018-03-07T13:48:38.866245Z/map/{z}/{x}/{y}.png?key=ebGMmpORFAU1M65ypiIz") # store the URL from the website in R
leaflet() %>% setView(lng = 5.092092, lat = 52.093992, zoom = 7) %>%
addTiles(url,
options = WMSTileOptions(format = "image/png", transparent = F)
) # plot the map
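The `{z}/{x}/{y}` placeholders in the URL are filled in by Leaflet for every tile it requests. As a rough sketch of what happens behind the scenes, and assuming the tiles follow the common Web Mercator XYZ ("slippy map") scheme, the function below computes which tile contains a given coordinate. `lonlat_to_tile` is a hypothetical helper written for this illustration; it is not part of `leaflet`.

```r
# Hypothetical helper: tile column (x) and row (y) containing a coordinate,
# assuming the standard Web Mercator XYZ tiling scheme
lonlat_to_tile <- function(lon, lat, zoom) {
  n <- 2^zoom                       # number of tiles per axis at this zoom level
  lat_rad <- lat * pi / 180         # latitude in radians
  x <- floor((lon + 180) / 360 * n)
  y <- floor((1 - log(tan(lat_rad) + 1 / cos(lat_rad)) / pi) / 2 * n)
  c(x = x, y = y)
}

# Tile containing the map centre used above, at zoom level 7
centre_tile <- lonlat_to_tile(lon = 5.092092, lat = 52.093992, zoom = 7)
centre_tile
```

Each step up in zoom level doubles the number of tiles per axis, which is why a high-resolution scan can be zoomed in so far without loading the whole image at once.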
Because our map is georeferenced, it comes with underlying coordinates, meaning that we can add georeferenced data to it. First, we will examine how historically accurate the provincial borders on this map actually are. This can be done by importing a recent dataset from the IISH in Amsterdam. Researchers from the IISH are currently meticulously redrawing provincial borders and publishing these as machine-readable files (GeoJSON). More information on this project here.
The borders are published as polygons. These are basically two-dimensional geometric figures that, in this case, follow the borders of the historical regions of mint authorities. The files are not maps in themselves but shapes, with accompanying metadata, that you can add to a map. We will do that below.
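To make the idea of "shapes with accompanying metadata" concrete, here is a hand-written GeoJSON feature parsed with `jsonlite` (an assumption made for this sketch; the demonstration itself reads the file with `geojsonio`). The property names mirror those used in the subsetting code of this demonstration (`AUTHORITY`, `DATEfrom`, `DATEto`); the authority name, dates, and coordinates are invented.

```r
library(jsonlite)

# A made-up FeatureCollection with one square polygon; this is not IISH data
geojson_txt <- '{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "properties": {"AUTHORITY": "Example", "DATEfrom": "1600-01-01", "DATEto": "1700-12-31"},
    "geometry": {
      "type": "Polygon",
      "coordinates": [[[5.0, 52.0], [5.5, 52.0], [5.5, 52.3], [5.0, 52.3], [5.0, 52.0]]]
    }
  }]
}'

# Parse into nested R lists without simplifying the structure
fc <- fromJSON(geojson_txt, simplifyVector = FALSE)
feature <- fc$features[[1]]

feature$properties$AUTHORITY               # metadata attached to the shape
length(feature$geometry$coordinates[[1]])  # number of points in the outer ring
```

The first and last point of the ring are identical, which is how GeoJSON closes a polygon.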
library(geojsonio)
## Registered S3 method overwritten by 'geojsonsf':
## method from
## print.geojson geojson
##
## Attaching package: 'geojsonio'
## The following object is masked from 'package:base':
##
## pretty
mint <- geojson_read("C:\\Users\\Schal107\\Documents\\UBU\\Team DH\\Mint authorities.geojson", what = "s") # import IISH polygons
factpal <- colorFactor(topo.colors(10), mint$AUTHORITY) # define a color scheme for our polygons
leaflet() %>% setView(lng = 5.092092, lat = 52.093992, zoom = 7) %>%
addTiles(url,
options = WMSTileOptions(format = "image/png", transparent = F)
)%>%
addPolygons(data = mint[mint$DATEfrom >= "1520-01-01" & mint$DATEto <= "1794-12-31" &
!grepl("Flanders", mint$AUTHORITY) &
mint$AUTHORITY != "Mechelen" &
mint$AUTHORITY != "Tournai" &
mint$AUTHORITY != "Brabant" &
mint$AUTHORITY != "Namur" &
mint$AUTHORITY != "Hainaut" &
mint$AUTHORITY != "United Belgian States" &
mint$AUTHORITY != "Liège" |
mint$AUTHORITY == "Culemborg" |
mint$AUTHORITY == "Vianen" |
mint$AUTHORITY == "Ravenstein" |
mint$AUTHORITY == "Buren" |
mint$AUTHORITY == "Batenburg" |
mint$AUTHORITY == "Cuijk" |
mint$AUTHORITY == "Gemert" |
mint$AUTHORITY == "Borculo" |
mint$AUTHORITY == "Cleves"
, ], color = ~factpal(AUTHORITY), fillOpacity = .01) # this long subsetting condition keeps only the borders of regions within the Dutch Republic; the formula is evaluated against the subset, so each polygon gets the color of its own authority
Although the angles of the polygons are slightly off compared to the map, the overlay shows that the borders drawn on the map and the borders reconstructed by the IISH follow each other very closely.
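The long chain of `!=` and `==` comparisons used to subset the polygons can be condensed with `%in%`. The sketch below shows the pattern on a toy data frame with the same column names; the rows are made up, and only some of the authority names are listed for brevity.

```r
# Toy stand-in for the attribute table of `mint`; the rows are invented
toy <- data.frame(
  AUTHORITY = c("Holland", "Brabant", "County of Flanders", "Vianen", "Utrecht"),
  DATEfrom  = c("1581-01-01", "1520-01-01", "1520-01-01", "1400-01-01", "1500-01-01"),
  DATEto    = c("1794-12-31", "1794-12-31", "1794-12-31", "1725-12-31", "1794-12-31"),
  stringsAsFactors = FALSE
)

excluded    <- c("Mechelen", "Tournai", "Brabant", "Namur", "Hainaut")  # shortened list
always_keep <- c("Culemborg", "Vianen", "Ravenstein", "Buren")          # shortened list

# Same logic as the original condition: (period AND not excluded) OR always kept
in_period <- toy$DATEfrom >= "1520-01-01" & toy$DATEto <= "1794-12-31"
keep <- (in_period &
           !grepl("Flanders", toy$AUTHORITY) &
           !(toy$AUTHORITY %in% excluded)) |
  toy$AUTHORITY %in% always_keep

toy$AUTHORITY[keep]
```

Comparing the dates as strings works here only because they are written in ISO `YYYY-MM-DD` order. Storing the result in `keep` also means the same subset can be reused for every map instead of repeating the condition.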
Because our map is of the Dutch Republic, we’ll plot the origin of Dutch migrants to The Hague. This can be easily done by subsetting the data using `data.table` (`adm[foreign_dummy == 0]`). Next, because many migrants came from the same cities or towns (Amsterdam, Deventer, etc.) we’ll define clustering to be able to visualize all migrants. The interactive map feature then allows for zooming in on the individuals from these clusters.
Many social and economic historians are interested in the social status of migrants. Did only the poor move to cities? To examine that for our migrants, we’ll use the HISCAM scale (see here). This is a social stratification scheme based on occupational titles. It runs from 40 (low status, such as day-labourers), to 100 (very high status, such as lawyers and physicians). Since the occupations of migrants were already coded into HISCAM (see above), we can assign a color corresponding to their status, ranging from white (low) to red (high). This is all done in the few lines of code below. Note that you can even hover over the individual migrants to display their HISCAM score!
qpal <- colorQuantile("Reds", adm$hiscam, n = 6) # define Hiscam color scale
leaflet(data = adm[foreign_dummy == 0]) %>% setView(lng = 5.092092, lat = 52.093992, zoom = 7) %>%
addTiles(url,
options = WMSTileOptions(format = "image/png", transparent = F)) %>%
addCircleMarkers(lat = ~pob_lat, lng = ~pob_long,
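color = ~qpal(hiscam), label = ~as.character(hiscam), # bare column names are looked up in the subset passed to leaflet(), so colors and labels match the plotted rows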
clusterOptions = markerClusterOptions()
) %>%
addPolygons(data = mint[mint$DATEfrom >= "1520-01-01" & mint$DATEto <= "1794-12-31" &
!grepl("Flanders", mint$AUTHORITY) &
mint$AUTHORITY != "Mechelen" &
mint$AUTHORITY != "Tournai" &
mint$AUTHORITY != "Brabant" &
mint$AUTHORITY != "Namur" &
mint$AUTHORITY != "Hainaut" &
mint$AUTHORITY != "United Belgian States" &
mint$AUTHORITY != "Liège" |
mint$AUTHORITY == "Culemborg" |
mint$AUTHORITY == "Vianen" |
mint$AUTHORITY == "Ravenstein" |
mint$AUTHORITY == "Buren" |
mint$AUTHORITY == "Batenburg" |
mint$AUTHORITY == "Cuijk" |
mint$AUTHORITY == "Gemert" |
mint$AUTHORITY == "Borculo" |
mint$AUTHORITY == "Cleves"
, ], color = ~factpal(AUTHORITY), fillOpacity = .01)
## Warning in validateCoords(lng, lat, funcName): Data contains 1 rows with either
## missing or invalid lat/lon values and will be ignored
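`colorQuantile()` splits the values into groups of roughly equal size and gives each group its own shade. A rough base R sketch of that binning step, using made-up HISCAM scores (this illustrates the idea only, not leaflet's internal implementation):

```r
# Made-up HISCAM scores for illustration only
hiscam <- c(45.2, 51.4, 51.4, 57.5, 62.0, 48.9, 70.3, 88.1, 55.0, 66.6, 99.0, 40.1)

# Cut points at the 0, 1/6, 2/6, ..., 1 quantiles give six groups
breaks <- quantile(hiscam, probs = seq(0, 1, length.out = 7), na.rm = TRUE)
bin <- cut(hiscam, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# One shade per group, from light to dark red
shades <- colorRampPalette(c("white", "red"))(6)
data.frame(hiscam, bin, colour = shades[bin])
```

Because the groups are based on rank rather than on the raw score, a few very high scores do not wash out the differences among the many migrants with average scores.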
You might perhaps be more interested in their actual occupation, or some other variable (like their age). We can plot that as well:
leaflet(data = adm[foreign_dummy == 0]) %>% setView(lng = 5.092092, lat = 52.093992, zoom = 7) %>%
addTiles(url,
options = WMSTileOptions(format = "image/png", transparent = F)) %>%
addMarkers(~pob_long, ~pob_lat, label = ~as.character(occupation), clusterOptions = markerClusterOptions()) # label taken from the subset, not from the full dataset
## Warning in validateCoords(lng, lat, funcName): Data contains 1 rows with either
## missing or invalid lat/lon values and will be ignored
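The warning means that at least one of the plotted rows has no usable coordinates. A small sketch of how such rows can be located before plotting, using a made-up table with the same column names as `adm`:

```r
library(data.table)

# Toy stand-in for `adm`; values are invented, one row lacks coordinates
toy <- data.table(
  pob_modern = c("Laren", "Unknown place", "Maastricht"),
  pob_lat    = c(52.25667, NA, 50.84833),
  pob_long   = c(5.22778, NA, 5.68889)
)

# Rows that leaflet would skip because a coordinate is missing
missing_coords <- toy[is.na(pob_lat) | is.na(pob_long)]
missing_coords

# Keep only the rows that can be plotted
plottable <- toy[!is.na(pob_lat) & !is.na(pob_long)]
nrow(plottable)
```

Inspecting the skipped rows first makes it clear whether a place name simply could not be geocoded or whether the source itself gives no place of birth.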
We can also subset the data on a specific occupation. Let’s see where construction workers came from, and use a map that covers more ground:
leaflet(data = adm[hisco_minor == 95]) %>% setView(lng = 8.75439, lat = 51.71905, zoom = 6) %>%
addTiles("https://maps.georeferencer.com/georeferences/95c69e4c-410e-56e5-921e-bc5ef59f5be9/2019-11-15T12:39:45.001798Z/map/{z}/{x}/{y}.png?key=ebGMmpORFAU1M65ypiIz",
options = WMSTileOptions(format = "image/png", transparent = F)) %>%
addMarkers(lat = ~pob_lat, lng = ~pob_long,
label = ~as.character(occupation), # evaluated against the subset passed to leaflet()
clusterOptions = markerClusterOptions() )
## Warning in validateCoords(lng, lat, funcName): Data contains 7 rows with either
## missing or invalid lat/lon values and will be ignored
Answer: most came from the surroundings of The Hague.
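A visual impression like this can also be checked numerically. The dataset contains a column `dist_pob_denhaag`, which appears to hold the distance between the place of birth and The Hague (the unit is assumed to be kilometres here). A sketch with made-up rows:

```r
library(data.table)

# Toy stand-in for `adm`; the distances are invented
toy <- data.table(
  hisco_minor      = c(95, 95, 95, 79, 79),
  dist_pob_denhaag = c(8.2, 15.0, 310.5, 66.5, 471.7)
)

# Number of construction workers (HISCO minor group 95) and their median distance
summary_95 <- toy[hisco_minor == 95,
                  .(n = .N, median_dist = median(dist_pob_denhaag, na.rm = TRUE))]
summary_95
```

The median is used rather than the mean so that a single long-distance migrant does not dominate the summary.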
A very strong feature of R is that you can combine functions from different packages. Let’s demonstrate this by combining three variables in our map: occupational category (HISCO minor), occupational title, and the HISCAM score. First, we set a color scheme for the occupational categories using another package (`viridis`). Next, we combine the occupational title and the HISCAM score of the migrant into one text label, using the function `paste0`. Interesting to note here is that `paste0` is a base R function that you can use across packages. This goes for many common functions, which means that you don’t always have to learn the specific code or functions of every package. The map below combines four different tools for visualization: base R, `data.table`, `leaflet`, and `viridis`:
| package | used for | code |
|---|---|---|
| base R | pasting labels | `paste0(occupation, ", HISCAM = ", hiscam)` |
| data.table | subsetting dataset | `adm[foreign_dummy == 0]` |
| viridis | color scale | `qpal2 <- colorFactor("viridis", as.factor(adm$hisco_minor))` |
| leaflet | interactive map | all the other lines ;) |
library(viridis)
## Loading required package: viridisLite
qpal2 <- colorFactor("viridis", as.factor(adm$hisco_minor)) # define color scheme for occupational categories
leaflet(data = adm[foreign_dummy == 0]) %>% setView(lng = 5.092092, lat = 52.093992, zoom = 7) %>%
addTiles(url,
options = WMSTileOptions(format = "image/png", transparent = F)) %>%
addCircleMarkers(lat = ~pob_lat, lng = ~pob_long,
color = ~qpal2(hisco_minor), label = ~paste0(occupation, ", HISCAM = ", hiscam), # columns are taken from the subset so that colors and labels match the plotted rows
clusterOptions = markerClusterOptions()
)
## Warning in validateCoords(lng, lat, funcName): Data contains 1 rows with either
## missing or invalid lat/lon values and will be ignored
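Because `paste0` is vectorised, one call builds the label for every row at once. A tiny sketch with two occupations and HISCAM scores taken from the sample rows shown at the start:

```r
occupation <- c("Hoedemakersknecht", "Kleermakersknecht")
hiscam <- c(57.54, 51.40)

# One label per element; the numbers are converted to text automatically
labels <- paste0(occupation, ", HISCAM = ", hiscam)
labels
```

Note that trailing zeros are dropped when the number is converted to text, so `51.40` appears as `51.4` in the label.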
That’s it for now! Note that the dataset used here required quite some work, but as long as you have coordinates (latitude and longitude), or even just place names, you can already perform GIS visualizations.
Join the R courses of RDM and/or check the upcoming events at the Living Lab Digital Humanities!
Questions: r.schalk@uu.nl