library(sf)
library(terra)
library(spData)
library(spDataLarge)
Geographic data can be broken down into two fundamental data models: vector and raster data. Vector and raster data both represent geographic data but their uses and functionality differ dramatically.
Vector data consist of points, lines, and polygons. The point is the most fundamental of the three since lines and polygons are simply points that have been connected. Vector data are often used to represent tangible structures such as a bus stop (point), a road (line), or a county (polygon).
Raster data are a grid of regularly sized cells and can be treated as data matrices. The cells of a raster are analogous to the pixels that make up an image. Raster data are commonly used to represent continuous data such as elevation or temperature.
There are a variety of packages for working with spatial data and these have improved over the years. The most up-to-date packages and the ones we will be focusing on for this workshop are ‘sf’ for vector data and ‘terra’ for raster data. In addition, we will be using some packages that will load in datasets for our exercises. These are ‘spData’ and ‘spDataLarge’.
Before we move any further, it is also vital that you understand Coordinate Reference Systems (CRS). Broadly speaking, coordinate reference systems define how the spatial elements of the data relate to the surface of Earth (or any other celestial body you may be studying).
CRS’s can be either geographic or projected. Most of us are already familiar with the conventions of a geographic CRS. A geographic CRS uses longitude and latitude to describe any location on the Earth’s surface. However, not all geographic CRS’s are made alike and each must make different assumptions about the Earth’s surface. Projected CRS’s are based off of a geographic CRS and are made by converting a three-dimensional surface into a flat two-dimensional surface. They have an origin, x and y axes, and a linear unit of measurement such as meters.
The most important thing to understand about CRS’s is that these should be consistent throughout your analysis. Unfortunately, there is no universal standard for CRS because each CRS has its own advantages and disadvantages. Identification of CRS’s can also differ but we recommend using either Spatial Reference System Identifier (SRID) or well-known-text (WKT2).
In order to get the CRS of a spatial vector object you can use st_CRS()
st_crs(world)
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Horizontal component of 3D system."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
User input is the SRID and wkt is the complete well-known-text (WKT2) representation. st_set_crs() is used to set the CRS of a vector object. The first argument will be the object and the second argument will be the SRID or the complete WKT2 representation.
For raster data, crs() can be used to access and assign CRS information.
raster_filepath = system.file("raster/srtm.tif", package = "spDataLarge")
my_rast = rast(raster_filepath)
crs(my_rast) #access CRS information
## [1] "GEOGCRS[\"WGS 84\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4326]]"
#crs(my_rast) = "EPSG:26912" # set CRS
We will make sure that CRS’s are appropriately used during this workshop, but this is something you will need to manage in your own projects, especially when gathering data from different sources. For more in depth knowledge on CRS’s please see section 2.4 and chapter 6 of the online textbook.
Another aspect of working with spatial data that needs to be consistent are units. Attribute data and CRS’s can all be using different units which will be problematic for analysis.
For vector data, the ‘sf’ package supports units so they are built in.
luxembourg <- world[world$name_long == "Luxembourg", ]
st_area(luxembourg)
## 2408817306 [m^2]
For raster data, units are not currently supported. You can infer the units from the CRS that you are using. For example, EPSG:4326 uses decimal degrees as units.
Again, we will ensure that units are appropriate during this workshop, but you will have to monitor this closely in your own work. Additional information on the subject can be found in section 2.5 of the online textbook.
‘sf’ stands for simple features which is an open standard for working with vector geometry data. Simple features can support all common vector geometry types such as points, lines, polygons and the ‘multi’ versions of each of these. Geometry collections are also supported which can include multiple data types in one object.
Let’s start by looking at the ‘world’ dataset from the ‘spData’ package.
class(world)
## [1] "sf" "tbl_df" "tbl" "data.frame"
names(world)
## [1] "iso_a2" "name_long" "continent" "region_un" "subregion" "type"
## [7] "area_km2" "pop" "lifeExp" "gdpPercap" "geom"
Note that world is an object of class “sf” and that the data include a column for “geom”. “sf” can be simply understood as a regular data frame with an added geom column containing all the geometry data you will need. This column might not always be called “geom” but it will usually have a similar name such as “geometry”.
Let’s use plot() to quickly visualize the data from ‘world’
plot(world)
## Warning: plotting the first 9 out of 10 attributes; use max.plot = 10 to plot
## all
cool map!
‘sf’ is not the first package that has been primarily used for working with spatial data in R. ‘sp’ is a package with similar functionality but is being superseded by ‘sf’. Because ‘sf’ is newer you might find that some packages don’t currently support ‘sf’ objects but will support ‘sp’ objects. However, many packages are beginning to adapt to ‘sf’ and there are already a number of mapping packages that are set up for an ‘sf’ workflow. We’ll focus on those packages later on in the workshop.
Take some time now to play around with plot() and the ‘world’ data. You should find that it is already pretty intuitive to use to make some simple maps. Run this example code and then describe how the second map is different than the first and how this is designated in the code. After that, spend some time to make a different map of your own and show it to somebody.
plot(world["pop"])
world_asia <- world[world$continent == "Asia", ]
asia <- st_union(world_asia)
plot(world["pop"], reset = FALSE)
plot(asia, add = TRUE, col = "red")
Geometries are the basic building blocks of simple features. Simple features in R can take on one of the 17 geometry types supported by the ‘sf’ package. We will be focusing on the most common types: points, linestring, polygon, multipoint, multilinestring, multipolygon, and geometrycollection.
old_par = par(mfrow = c(1, 3), pty = "s", mar = c(0, 3, 1, 0))
plot(st_as_sfc(c("POINT(5 2)")), axes = TRUE, main = "POINT")
plot(st_as_sfc("LINESTRING(1 5, 4 4, 4 1, 2 2, 3 2)"), axes = TRUE, main = "LINESTRING")
plot(st_as_sfc("POLYGON((1 5, 2 2, 4 1, 4 4, 1 5))"), col="gray", axes = TRUE, main = "POLYGON")
par(old_par)
Simple feature geometries will be represented by class ‘sfg’ and these are the geometries that will be in the ‘geom’ column of an ‘sf’ object. You usually won’t have to worry about creating ‘sfg’ objects on your own since you’ll often be given the spatial files. However, it is still important to understand the basics of ‘sfg’. A more detailed description can be found in the online textbook.
‘sfg’ objects will be made from one of three base R data types: 1. A numeric vector: a single point 2. A matrix: a set of points, where each row represents a point, a multipoint or linestring 3. A list: a collection of objects such as matrices, multilinestrings or geometry collections
point1 = st_point(c(5, 2))
point2 = st_point(c(1, 3))
class(point1)
## [1] "XY" "POINT" "sfg"
One sfg object contains only a single simple feature geometry. A simple feature geometry column ‘sfc’ is a list of sfg objects, which is additionally able to contain information about the coordinate reference system in use. ‘sfc’ represents the geometry column in ‘sf’ data frames. ‘sfc’ can contain multiple simple feature geometries and these will usually be of the same type but they do not have to be. However, it is important that all the geometries are of the same CRS since ‘sfc’ can contain information about the CRS.
points_sfc <- st_sfc(point1, point2)
class(points_sfc)
## [1] "sfc_POINT" "sfc"
check the CRS for the points_sfc object. After that, make a new object that uses points_sfc and add the CRS 4326 (hint: st_crs() and st_set_crs())
simple feature geometries and simple feature geometry columns make up all of the important spatial information that we want. Usually, though, we also want to associate these data with non-spatial data known as attributes. Objects of class ‘sf’ accomplish these by combining attribute data with ‘sfc’ objects. This results in ‘sf’ objects which can be treated like regular data frames with an added column for spatial data. When working with vector data, most of your analysis will likely be based off of the final ‘sf’ objects.
Here is a simple example that makes an object of class ‘sf’. Take a moment to walk through this code so that you can understand what is happening in each step.
lnd_point = st_point(c(0.1, 51.5)) # sfg object
lnd_geom = st_sfc(lnd_point, crs = 4326) # sfc object
lnd_attrib = data.frame( # data.frame object
name = "London",
temperature = 25,
date = as.Date("2017-06-21")
)
lnd_sf = st_sf(lnd_attrib, geometry = lnd_geom) # sf object
lnd_sf
## Simple feature collection with 1 feature and 3 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 0.1 ymin: 51.5 xmax: 0.1 ymax: 51.5
## Geodetic CRS: WGS 84
## name temperature date geometry
## 1 London 25 2017-06-21 POINT (0.1 51.5)
class(lnd_sf)
## [1] "sf" "data.frame"
sfg’s are the building blocks of sfc’s which provide the geographic information for sf’s.
The ‘terra’ package provides an extensive set of functions to create, read, export, manipulate and process raster datasets. ‘terra’ is useful because it includes many basic functions as well as advanced functions for working with raster data. ‘spDataLarge’ includes raster datasets that we can use to learn some of these functions.
Read in this raster with this code chunk.
raster_filepath = system.file("raster/srtm.tif", package = "spDataLarge")
my_rast = rast(raster_filepath)
Type the name of the raster you just made into the console and answer the following questions.
What is the class of the object? What are the dimensions? What CRS is being used?
All raster objects will include a header that includes all of this information in addition to the actual data being stored.
Similar to ‘sf’, terra also has a basic function for plotting which is also called ‘plot’. Plot the raster object below.
The ‘SpatRaster’ class is what we will be using to represent raster objects. You will usually be reading raster files in from online servers, but you can also make these manually on your own. Below is an example of some code that will make a raster object of class ‘SpatRaster’
new_raster = rast(nrows = 6, ncols = 6, resolution = 0.5,
xmin = -1.5, xmax = 1.5, ymin = -1.5, ymax = 1.5,
vals = 1:36)
’SpatRaster’s can also handle rasters with multiple layers and those layers can be subsetted using terra:subset() and combined using c().
E1. Use summary() on the geometry column of the world data object. What does the output tell us about:
summary(world$geom)
## MULTIPOLYGON epsg:4326 +proj=long...
## 177 0 0
Its geometry type? The number of countries? Its coordinate reference system (CRS)?
E2. Run the following code and play with the arguments to try and see what they do.
plot(world["continent"], reset = FALSE)
cex = sqrt(world$pop) / 10000
world_cents = st_centroid(world, of_largest = TRUE)
## Warning in st_centroid.sf(world, of_largest = TRUE): st_centroid assumes
## attributes are constant over geometries of x
plot(st_geometry(world_cents), add = TRUE, cex = cex)
What does the cex argument do (see ?plot)? Why was cex set to the sqrt(world$pop) / 10000?
Experiment with different ways to visualize the global population.
E3. Look at this example code for a map of India
india = world[world$name_long == "India", ]
plot(st_geometry(india), expandBB = c(0, 0.2, 0.1, 1), col = "gray", lwd = 3)
plot(world_asia[0], add = TRUE)
Adapt the previous code chunk to make a map of Nigeria. Adjust the lwd, col and expandBB arguments of plot(). Challenge: read the documentation of text() and annotate the map.
E4. Create an empty SpatRaster object called my_raster with 10 columns and 10 rows. Assign random values between 0 and 10 to the new raster and plot it.
E5. Read-in the raster/nlcd.tif file from the spDataLarge package. What kind of information can you get about the properties of this file?