Introduction to mapping in R

Charles Coverdale
Analytics Vidhya
Published in
7 min readMay 31, 2020

--

0.0 Introduction

I love maps. I’m also quite a fan of R.

Over the past few months I’ve been exploring how best to create maps of Australia using R and data from the ABS. This is what I’ve learnt.

Firstly, the ABS update their geographies regularly. They can all be readily accessed here.

You’ll want to download these in ‘ESRI Shapefile format’ (e.g. Statistical Area Level 2 (SA2) ASGS Ed 2016 Digital Boundaries in ESRI Shapefile Format). SA’s or statistical areas are the ABS’ way of carving up the country into manageable regions for analysis.

Secondly, there’s a myriad of packages out there to assist in spatial visualisations. Here’s an incomplete list of one’s that will be handy for the below examples.

#install spatial packages
install.packages("plyr")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("rgdal")
install.packages("tmap")
install.packages("ggmap")
install.packages("sf")
install.packages("ggspatial")
install.packages("rlang")
install.packages("broom")
install.packages("tidyverse")
install.packages("readxl")
install.packages("raustats")
install.packages("purrr")
install.packages("Census2016")
#load spatial packages
library(plyr)
library(dplyr)
library(ggplot2)
library(rgdal)
library(tmap)
library(ggmap)
library(dplyr)
library(sf)
library(ggspatial)
library(rlang)
library(broom)
library(tidyverse)
library(readxl)
library(raustats)
library(purrr)
library("Census2016")

1.0 Importing shapefiles into R

The simple features package for R (“sf”) provides all the fundamental requirements for working with spatial data.

#Use the sf package to import shapefiles
install.packages("sf")
library(sf)
#Select the location where the ASGS files are saved
AUS_SA2_shp <- read_sf("K:/Alteryx/ASGS","SA2_2016_AUST")
AUS_CED_shp <- read_sf("K:/Alteryx/ASGS","CED_2016_AUST")
#Check the data imported correctly
head(AUS_SA2_shp)
head(AUS_CED_shp)

2.0 Making simple spatial maps using ggplot

Making a map of a shapefile is straightforward using ggplot2 and the geom_sf function. Additionally, multiple shapefiles can be plotted in a single plot, with separate attributes. In the below example, both a map of SA2’s across Australia, and a map of Victorian SA2’s (coloured green) are plotted on the same chart.

library(ggplot2)
library(dplyr)
#Read in the SA2 shapefile downloaded from the ABS
AUS_SA2_shp <- read_sf("K:/Alteryx/ASGS","SA2_2016_AUST")
#filter the Australian SA2 shapefile for only Victoria
AUS_SA2_VIC_shp <- AUS_SA2_shp %>%
filter(STE_CODE16 == 2)
#plot a map of Australia (grey) and a map of Victoria (green)
ggplot() +
geom_sf(data=AUS_SA2_shp)+
geom_sf(data=AUS_SA2_VIC_shp,fill = "#A5D46A")+
ggtitle("Australian SA2 map") +
xlab("Longitude") +
xlim(110,155)+
ylab("Latitude") +
theme_bw()

3.0 Make a map of the intersection of two shapefiles

Often shapefiles have different geographies, and you want to figure out where they overlap (formally: intersect). This is simple using the st_intersect function. The following script plots the intersection of a shapefile of Commonwealth Electoral Divisions (i.e. voting electorates) and a shapefile of Victoria.

#import a shapefile of state boundaries
AUS_STATE_shp <- read_sf("K:/Alteryx/ASGS","STE_2016_AUST")
#import a shapefile of Australian electorates
AUS_CED_shp <- read_sf("K:/Alteryx/ASGS","CED_2016_AUST")
#check the imports worked
head(AUS_STATE_shp)
head(AUS_CED_shp)
#filter the shapefile for Victoria only
AUS_STATE_VIC_shp <- AUS_STATE_shp %>%
filter(STE_CODE16==2)
#run a spatial intersection for the state of Victoria and all electorates
VIC_CED <- st_intersection(AUS_STATE_VIC_shp, AUS_CED_shp)
#plot this intersecting shapefile
ggplot() +
geom_sf(data=VIC_CED)+
ggtitle("Commonwealth Electoral Divisions in Victoria") +
xlab("Longitude") +
ylab("Latitude") +
theme_bw()
ggsave("C:/Users/Charles F Coverdale/Desktop/VIC_CED.pdf",units="cm",dpi=200, device="pdf")

4.0 Plot individual features (e.g. cities) on a shapefile

In addition to plotting individual shapefiles, or intersections of shapefiles, it is also possible to plot features (e.g. a certain point) on a map.

In the below example, select capital cities (and their labels) are plotted onto a map of the states of Australia.

#import a shapefile of state boundaries
AUS_STATE_shp <- read_sf("K:/Alteryx/ASGS","STE_2016_AUST")
#make a new dataset of cities in Australia (google the locations)
AUS_cities <- tribble(
~city, ~lat, ~long,
"Brisbane",-27.4698, 153.0251,
"Sydney", -33.8688, 151.2093,
"Melbourne", -37.8136, 144.9631)
#check it looks okay
AUS_cities
#convert those two columns to geometry column with the st_as_sf() function. Google Maps uses the coordinate reference system 4326 (the GPS system).AUS_cities_geometry <- AUS_cities %>%
st_as_sf(coords = c("long", "lat"), crs = 4326)
#check it looks right
AUS_cities_geometry
ggplot() +
geom_sf(data=AUS_STATE_shp)+
geom_sf(data=AUS_cities_geometry, size=3)+
geom_sf_label(data =AUS_cities_geometry, aes(label = city))+
ggtitle("Brisbane, Sydney and Melbourne") +
xlab("Longitude") +
ylab("Latitude") +
theme_bw()
ggsave("C:/Users/Charles F Coverdale/Desktop/3_capital_cities.pdf",units="cm",dpi=200, device="pdf")

5.0 Plotting external data sources (e.g. census) on maps

The vast majority of benefit derived from mapping is in visualising variables across regions. In R this involves joining two data sets together based on a common field. The below example involves joining census data (at the SA2 level) to a shapefile, then producing a map.

#Get the census data (already by SA2) by using the Census2016_wide_by_SA2_year function in the Census2016 packagecensus2016_wide <- Census2016_wide_by_SA2_year
head(census2016_wide,n=20)
#Select the key demographic columns (first 8 columns)
census_short <- census2016_wide[,1:8]
head(census_short,n=20)
#Filter for data from the 2016 year
census_short_2016 <- census_short %>%
filter(year==2016)head(census_short_2016)

The census data from this package is in a relatively clean format.

Next, we join the census data to a shapefile.

#Join the two sf (SA2 and census) by their common field.
SA2_shp_census_2016 <- inner_join(AUS_SA2_shp,census_short_2016,
by = c("SA2_NAME16" = "sa2_name"))
#check the format
head(SA2_shp_census_2016)
#Plot a map that uses census data
ggplot() +
geom_sf(data = SA2_shp_census_2016,
aes(fill = median_age, border=NA)) +
ggtitle("Australian median age (SA2)") +
xlab("Longitude") +
ylab("Latitude") +
theme_bw() +
theme(legend.position = "right",
legend.title = element_text("Median age"))
ggsave("C:/Users/Charles F Coverdale/Desktop/median_age_map.png",
units="cm",dpi=200, device="png")

6.0 Australia on the world stage

It is useful to be able to highlight individual countries on a world map. Both coordinate reference systems and map projections (broadly how the world looks depending what shape you assume the globe is) start to matter here.

Firstly, we import a shapefile of the whole world.

Next, we filter out Antartica (for plotting demographic variables or similar, it is almost never useful on a world map).

Third, we create a separate shapefile layer for our country of choice (in this example it’s Australia).

#download a world map (e.g. "Admin 0 – Countries"):
https://www.naturalearthdata.com/downloads/110m-cultural-vectors/
#read in the worldmap shapefile using read_sf
world_map <- read_sf("C:/Users/Charles F Coverdale/Desktop/go_bag/R code/ne_110m_admin_0_countries.shp")
world_map
#remove Antarctica
world_sans_antarctica <- world_map %>%
filter(ISO_A3 != "ATA")
#check the data structure
world_sans_antarctica
#create a world map plot to test is looks right
ggplot() +
geom_sf(data = world_sans_antarctica)+
theme_minimal()
#Create a new dataframe by filtering for Australia by ISO alpha 3 code
world_AUS <- world_sans_antarctica %>%
filter(ISO_A3 == "AUS")
#plot a map with both a world shapefile and an Australia shapefile
#here we use the Robinson coordinate reference system for accuracy
ggplot() +
geom_sf(data = world_sans_antarctica) +
geom_sf(data=world_AUS, fill = "#EC8E55", color = "white")+
coord_sf(crs = 54030)+
theme_void()

7.0 Plotting external data sources (e.g. World Bank) on a world map

Just like in 5.0, it’s also possible to link external data sets to a world map.

Andrew Wheiss has put together a great example for his college class of downloading the world development indicators from the World Bank and plotting them on a world map.

A simplified version of the code to map life expectancy to a global shapefile is below.

#download "Admin 0 – Countries" from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/#read in the world map shapefile
world_map <- read_sf("C:/Users/Charles F Coverdale/Desktop/ne_110m_admin_0_countries.shp")
#remove Antarctica
world_sans_antarctica <- world_map %>%
filter(ISO_A3 != "ATA")
#check the plot works
ggplot() +
geom_sf(data = world_sans_antarctica)+
theme_minimal()
#get latest data from the World Bank
install.packages("WDI")
library(WDI)
#choose the development indicators for 2017
indicators <- c("SP.DYN.LE00.IN")
wdi_raw <- WDI(country = "all", indicators, extra = TRUE,
start = 2017, end = 2017)
#check the data structure
head(wdi_raw)
#extract the life expectancy column
wdi_clean_small <- wdi_raw %>%
select(life_expectancy = SP.DYN.LE00.IN, iso3c)
#check the data structure
head(wdi_clean_small)
#join the shapefile and life expectancy data set together
world_map_with_life_expectancy <- world_sans_antarctica %>%
left_join(wdi_clean_small, by = c("ISO_A3" = "iso3c"))
#produce a map of life expectancy across the world
ggplot() +
geom_sf(data = world_map_with_life_expectancy,
aes(fill = life_expectancy),
size = 0.25) +
coord_sf(crs = 54030) + # Robinson
scale_fill_viridis_c(option = "viridis") +
labs(fill = "Life expectancy") +
theme_void() +
theme(legend.position = "bottom")

This map can be further broken down for an individual region or continent. There’s not much to see for the Australian continent (i.e. no state by state breakdown), so let’s try Africa instead.

#start with the data "world_sans_antarctica_fixed from the previous set
#filter for the UN region called "Africa"
world_Africa <- world_sans_antarctica_fixed %>%
filter(REGION_UN == "Africa")
#check the data looks good
world_Africa
#produce the map (same function as previously)
ggplot() +
geom_sf(data = world_Africa,
aes(fill = life_expectancy),
size = 0.25) +
coord_sf(crs = 54030) + # Robinson
scale_fill_viridis_c(option = "viridis") +
labs(fill = "Life expectancy") +
theme_void() +
theme(legend.position = "bottom")

8.0 Next steps

There’s a million more things you can do with mapping in R. These include gganimate, R shiny, and various facet wrap approaches.

Check out my github or website for future posts.

--

--