library(sf)
library(terra)
library(tidyverse)
library(spData)
library(magrittr)
Spatial objects are primarily described by their location and shape, and these characteristics drive most spatial data operations. For vector data, many of these operations are analogous to their attribute data counterparts: subsetting, joining, and aggregation use similar functions, but there are additional decisions to make. For example, if you want to know whether a point lies inside a polygon, you need to decide whether a point that intersects the border of that polygon still counts, and whether to include points that are merely close to being inside the polygon. For raster spatial data, we will build off of what we learned for raster attribute data and add map algebra and the four classes of raster operations: local, focal, zonal, and global.
Spatial subsetting is the process of selecting features of a spatial object based on whether they relate in space to another object. It is analogous to attribute subsetting and can be done with the base R operator [] or with the ‘dplyr’ filter() function. The ‘nz’ and ‘nz_height’ datasets can be used to demonstrate this.
canterbury <- nz %>% filter(Name == "Canterbury")
canterbury_height <- nz_height[canterbury, ]
Describe what these two lines of code do and what canterbury_height represents.
Using the [] operator, we can understand subsetting as x[y, ], where the target ‘x’ is subsetted using the contents of a source object ‘y’. In attribute subsetting, y is often a condition that is evaluated logically (e.g. data[1:6, ] keeps a row if its index is in 1:6 and drops it otherwise). When subsetting ‘sf’ objects, y can be another ‘sf’ object. In this scenario the condition is whether each feature of the ‘x’ ‘sf’ object has a topological relationship with the ‘y’ ‘sf’ object. The types of topological relationships that we usually use for subsetting are ‘intersects’, ‘touches’, ‘crosses’, and ‘within’ (‘intersects’ is the most comprehensive of these and includes ‘touches’, ‘crosses’, and ‘within’). These are functions from the ‘sf’ package that can be passed to the [] operator using the ‘op =’ argument.
For example, st_disjoint() acts as the opposite of st_intersects() and selects all features of the target (nz_height) that lie outside the source (canterbury).
canterbury_height1 <- nz_height[canterbury, , op = st_intersects] # compare this object to canterbury_height and see if there is any difference
canterbury_height_disjoint <- nz_height[canterbury, , op = st_disjoint]
There are other methods for subsetting, one of which is sketched below; if you are interested, you can read the rest of section 4.2.1 in the online textbook.
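For example, one alternative (a minimal sketch of the approach described in that section) uses the sparse output of st_intersects() to build a logical vector for subsetting:
# build a sparse index list, convert it to a logical vector, then subset
sel_sgbp = st_intersects(x = nz_height, y = canterbury)
sel_logical = lengths(sel_sgbp) > 0   # TRUE where a point intersects Canterbury
canterbury_height2 = nz_height[sel_logical, ]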
Topological relations describe the spatial relationships between objects. To demonstrate them, we’ll use some made-up data.
# create a polygon
a_poly <- st_polygon(list(rbind(c(-1, -1), c(1, -1), c(1, 1), c(-1, -1))))
a <- st_sfc(a_poly)
# create a line
l_line <- st_linestring(x = matrix(c(-1, -1, -0.5, 1), ncol = 2))
l <- st_sfc(l_line)
# create points
p_matrix <- matrix(c(0.5, 1, -1, 0, 0, 1, 0.5, 1), ncol = 2)
p_multi <- st_multipoint(x = p_matrix)
p <- st_cast(st_sfc(p_multi), "POINT")
plot(c(a,l,p))
There are a number of questions we can ask about these geometries. Which points intersect with the line? Which points lie within the polygon? Which points are neither intersecting nor within any other geometry? The ‘sf’ functions provide simple verbs to help us bridge our questions to code (see the st_intersects() documentation for help).
We’ve answered the first question below. Add a couple of lines of code to answer the other questions.
st_intersects(p, l)
## Sparse geometry binary predicate list of length 4, where the predicate
## was `intersects'
## 1: (empty)
## 2: (empty)
## 3: 1
## 4: (empty)
By default these functions return a sparse index list (as shown above) that records, for each feature of the target object, which features of the selecting object it relates to. Setting sparse = FALSE instead returns a dense logical matrix in which each row represents a feature in the target object and each column represents a feature in the selecting object. You can check section 4.2.2 in the online textbook for more details, but if you want the output to be stored as a single logical vector then your syntax should match what we show below.
st_intersects(p, l, sparse = FALSE)[, 1]
## [1] FALSE FALSE TRUE FALSE
One additional topological relation that we will discuss covers features that do not intersect but are close to each other. st_is_within_distance() can be used to detect this.
sel <- st_is_within_distance(p, a, dist = 0.9) # returns a sparse index list by default
lengths(sel) > 0
## [1] TRUE TRUE FALSE TRUE
Joining two non-spatial datasets relies on a shared variable. Spatial data joining applies the same concept, but instead relies on shared areas of geographic space (it is also known as spatial overlay). As with attribute data, joining adds a new column to the target object (the argument x in joining functions), from a source object (y).
By default, st_join() performs a left join, but it can also do inner joins by setting the argument left = FALSE. Like spatial subsetting, the default topological operator used by st_join() is st_intersects(). This can be changed with the join argument (see ?st_join for details).
Here is a code chunk that will generate some random points.
set.seed(2018) # set seed for reproducibility
(bb_world = st_bbox(world)) # the world's bounds
## xmin ymin xmax ymax
## -180.00000 -89.90000 179.99999 83.64513
random_df = tibble(
x = runif(n = 10, min = bb_world[1], max = bb_world[3]),
y = runif(n = 10, min = bb_world[2], max = bb_world[4])
)
random_points = random_df %>%
st_as_sf(coords = c("x", "y")) %>% # set coordinates
st_set_crs(4326) # set geographic CRS
Now we can join these random points to our world dataset. How many of our random points intersect with a country? (don’t answer by looking at the map)
plot(st_geometry(world), reset = FALSE)
plot(random_points, add = TRUE, col = "red")
random_joined = st_join(random_points, world["name_long"])
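If instead you only want the points that actually fall inside a country, a minimal sketch of an inner join looks like this (the name random_inner is illustrative, not from the original worksheet):
# inner join: left = FALSE drops the points that did not match any country
random_inner = st_join(random_points, world["name_long"], left = FALSE)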
Sometimes geographic datasets do not actually touch but are still related to each other. In this case, you may want to join them anyway, which can be accomplished by passing st_is_within_distance() to st_join() via the join argument.
plot(st_geometry(world), reset = FALSE)
plot(random_points, add = TRUE, col = "red")
random_joined_1 = st_join(random_points, world["name_long"], join = st_is_within_distance, dist = 150000)
How many more countries are included when dist is set to 100,000? 150,000? 200,000? What are the units?
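One sketch of how you might answer this programmatically, counting the distinct countries matched at each distance (with a geographic CRS, sf interprets dist in meters; the name n_countries is illustrative):
# count distinct matched countries at several distances
n_countries = sapply(c(100000, 150000, 200000), function(d) {
  joined = st_join(random_points, world["name_long"],
                   join = st_is_within_distance, dist = d)
  length(unique(na.omit(joined$name_long)))
})
n_countries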
Aggregation with vector data works much like aggregation of non-spatial data. Either aggregate() or the ‘tidy’ functions group_by() and summarize() can be used.
These both do the same thing.
nz_avheight = aggregate(x = nz_height, by = nz, FUN = mean)
nz_avheight2 = nz %>%
st_join(nz_height) %>%
group_by(Name) %>%
summarize(elevation = mean(elevation, na.rm = TRUE))
In spatial data aggregation it is important to consider spatial congruence. Spatial objects are congruent when their borders line up and incongruent otherwise. Congruent boundaries are usually those defined by geopolitical borders such as state and county lines. Other borders, such as those following natural boundaries, may be incongruent with objects defined by state or county borders. In the case of incongruent borders, it is important to decide how to interpolate the data. One method is area weighted spatial interpolation, in which values are transferred to the incongruent object in proportion to overlapping area. For example, if you had a polygon for a county and another polygon for a feature such as farmland, the farmland polygon will probably be incongruent with the boundary of the county. If there are attribute data available for the county, such as rainfall per year, then you may want to know how much rainfall per year the farmland is getting. If we pretend that the county gets 1000 mm per year of rainfall and the farmland makes up 3/4 of the county by area, then area weighted spatial interpolation (treating the value as extensive, i.e. proportional to area) would allocate 750 mm per year of that rainfall to the farmland, since the area it covers is 3/4 of the entire county.
st_interpolate_aw() is one function allowing for area weighted interpolation. However, depending on the data you are using, you may want to use other functions.
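As a minimal sketch, the spData package ships two example datasets, ‘incongruent’ (polygons with a ‘value’ column) and ‘aggregating_zones’ (the target zones), that can demonstrate this; extensive = TRUE treats the variable as area-dependent, as in the rainfall example above:
# transfer 'value' from the incongruent polygons to the aggregating zones,
# weighting each contribution by the area of overlap
iv = incongruent["value"]
agg_aw = st_interpolate_aw(iv, aggregating_zones, extensive = TRUE)
agg_aw$value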
st_distance() allows for measurement of distance between spatial objects.
nz_highest = nz_height %>% top_n(n = 1, wt = elevation)
canterbury_centroid = st_centroid(canterbury)
## Warning in st_centroid.sf(canterbury): st_centroid assumes attributes are
## constant over geometries of x
st_distance(nz_highest, canterbury_centroid)
## Units: [m]
## [,1]
## [1,] 115540
Notice the output from st_distance(). It includes not only the value but also the units. It is also returned as a matrix, because st_distance() can be used to create a distance matrix.
This code chunk creates a distance matrix for the first three rows of nz_height to Canterbury and Otago.
co = filter(nz, grepl("Canter|Otag", Name))
st_distance(nz_height[1:3, ], co)
## Units: [m]
## [,1] [,2]
## [1,] 123537.16 15497.72
## [2,] 94282.77 0.00
## [3,] 93018.56 0.00
Can you guess why some of the values returned in the distance matrix are 0?
You can check the map for help.
plot(st_geometry(co))
plot(st_geometry(nz_height)[1:3], add = TRUE)
Create a distance matrix for the first 5 points in the nz_height data.
plot(st_geometry(nz))
plot(st_geometry(nz_height)[1:5], add = TRUE)
We’ll use these files for the examples in this section.
elev = rast(system.file("raster/elev.tif", package = "spData"))
grain = rast(system.file("raster/grain.tif", package = "spData"))
One method of subsetting is to use coordinates. This can be done using the function cellFromXY() or extract(). Both functions are from the ‘terra’ package. Make sure to qualify extract() with terra:: because other attached packages (tidyr and magrittr) also export an extract() function.
id = cellFromXY(elev, xy = matrix(c(0.1, 0.1), ncol = 2))
elev[id]
## elev
## 1 16
# the same as
terra::extract(elev, matrix(c(0.1, 0.1), ncol = 2))
## elev
## 1 16
You can also use other rasters to subset a raster, as shown below:
clip = rast(xmin = 0.9, xmax = 1.8, ymin = -0.45, ymax = 0.45,
resolution = 0.3, vals = rep(1, 9))
elev[clip]
## elev
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
## 7 7
## 8 8
## 9 9
## 10 10
## 11 11
## 12 12
## 13 13
## 14 14
## 15 15
## 16 16
## 17 17
## 18 18
## 19 19
## 20 20
## 21 21
## 22 22
## 23 23
## 24 24
## 25 25
## 26 26
## 27 27
## 28 28
## 29 29
## 30 30
## 31 31
## 32 32
## 33 33
## 34 34
## 35 35
## 36 36
terra::extract(elev, ext(clip))
## elev
## 1 18
## 2 24
These methods return the values from the cells in the raster. To make a new raster by subsetting another raster, you can use the [] operator with the ‘drop’ argument set to FALSE.
Here are two ways to do the same thing using the [] operator:
elev[1:2, drop = FALSE] # spatial subsetting with cell IDs
## elev
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
## 7 7
## 8 8
## 9 9
## 10 10
## 11 11
## 12 12
elev[1, 1:2, drop = FALSE] # spatial subsetting by row,column indices
## elev
## 1 1
## 2 2
Masking is also a common method of subsetting raster data. In masking, you use a raster whose cell values are logical (TRUE) or NA. Masking a raster with it keeps only the cells where the mask is TRUE.
# create raster mask
rmask = elev
values(rmask) = sample(c(NA, TRUE), 36, replace = TRUE)
Here are two ways of doing the same thing: the [] operator or mask().
# spatial subsetting
elev[rmask, drop = FALSE] # with [ operator
## elev
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
## 7 19
## 8 20
## 9 21
## 10 22
## 11 23
## 12 24
mask(elev, rmask) # with mask()
## class : SpatRaster
## dimensions : 6, 6, 1 (nrow, ncol, nlyr)
## resolution : 0.5, 0.5 (x, y)
## extent : -1.5, 1.5, -1.5, 1.5 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## name : elev
## min value : 1
## max value : 36
You can also subset using logical statements, as below. Note that this assignment modifies elev in place, so the NA values will carry through to later examples.
elev[elev < 20] = NA
One of the benefits of using raster data is the ability to perform map algebra. Map algebra works because raster data are really just matrices of values that implicitly correspond to coordinates. The coordinates are stored in a header in the file, which can be used to look up the coordinates that a cell corresponds to. So if we are using multiple rasters that have the same extent, resolution, and projection, then we can perform quick operations on them without having to worry about the actual coordinates. If this doesn’t make sense, imagine using transparencies with an overhead projector: as long as the transparent sheets are designed to be compatible, you can easily align them and project all of the data together. If you were born after 2000 then you probably don’t know what I am talking about.
Anyway, terra is useful here because it will only allow map algebra when the raster layers correspond with each other. Map algebra can be broken down into four subclasses: local, focal, zonal, and global operations. These vary in how many cells are processed and what type of output you will receive.
Local operations are simple because they are just cell-by-cell operations over one or more layers. Using local operations, it is easy to add, subtract, multiply, or divide.
elev + elev
## class : SpatRaster
## dimensions : 6, 6, 1 (nrow, ncol, nlyr)
## resolution : 0.5, 0.5 (x, y)
## extent : -1.5, 1.5, -1.5, 1.5 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## name : elev
## min value : 40
## max value : 72
elev^2
## class : SpatRaster
## dimensions : 6, 6, 1 (nrow, ncol, nlyr)
## resolution : 0.5, 0.5 (x, y)
## extent : -1.5, 1.5, -1.5, 1.5 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## name : elev
## min value : 400
## max value : 1296
log(elev)
## class : SpatRaster
## dimensions : 6, 6, 1 (nrow, ncol, nlyr)
## resolution : 0.5, 0.5 (x, y)
## extent : -1.5, 1.5, -1.5, 1.5 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## name : elev
## min value : 2.995732
## max value : 3.583519
elev > 5
## class : SpatRaster
## dimensions : 6, 6, 1 (nrow, ncol, nlyr)
## resolution : 0.5, 0.5 (x, y)
## extent : -1.5, 1.5, -1.5, 1.5 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## name : elev
## min value : 1
## max value : 1
Use the elev raster and write two local operations that will return a raster where all the values are 1. Confirm this using a logical operator (<, >, ==).
Local operations can also be used to classify cells and create factors. This is done by creating a classification matrix that contains the rules for the different classes and then passing it to classify().
Make the classification matrix
rcl <- matrix(c(0, 12, 1, 12, 24, 2, 24, 36, 3), ncol = 3, byrow = TRUE)
rcl
## [,1] [,2] [,3]
## [1,] 0 12 1
## [2,] 12 24 2
## [3,] 24 36 3
Classify
recl <- classify(elev, rcl = rcl)
Other valuable functions for local operations include app(), tapp(), and lapp(), which allow for efficient map algebra that takes multiple layers and returns a single output layer. If you are using large datasets and want to perform more complex functions, these are preferable; you can find more information about them in section 4.3.3 in the online textbook. Another fun thing you can do with local operations is apply a statistical model to the data, where multiple layers can be used as predictors with a single layer as a response variable.
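As a small sketch of app() (the names two_layers and layer_sum are illustrative; na.rm = TRUE avoids NA results if you ran the elev[elev < 20] = NA example earlier):
# stack two layers, then collapse them into one layer by summing cell by cell
two_layers = c(elev, elev^2)
layer_sum = app(two_layers, fun = sum, na.rm = TRUE)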
Focal operations differ from local operations because they operate on more than one cell at a time. Focal operations use a focal cell and a defined set of neighbors. Usually, these neighbors are the adjacent cells, so that altogether the focal cell and its neighbors make up a 3x3 grid. However, neighbors can be defined in other ways.
The focal() function can be used for focal operations. The way this works is that each focal cell is given an output value that depends on the cell itself and its neighboring cells.
r_focal <- focal(elev, w = matrix(1, nrow = 3, ncol = 3), fun = min)
Try to guess what this line of code will do.
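For comparison, here is a sketch of another focal operation, a simple 3x3 mean filter that smooths the raster (r_mean is an illustrative name, not from the original worksheet):
# average each cell with its 8 neighbors; na.rm = TRUE handles edge cells
# and any NAs introduced earlier
r_mean = focal(elev, w = matrix(1, nrow = 3, ncol = 3), fun = mean, na.rm = TRUE)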
Zonal operations are similar to focal operations because they perform a function using multiple cells. However, zonal operations do not use a fixed neighborhood rule; instead, they use another raster layer, usually a categorical raster, to define a ‘zonal filter’. Zonal operations are typically used to aggregate data according to the categories of the zonal filter, so they return a summary table rather than a raster layer.
z = zonal(elev, grain, fun = "mean")
z
## grain elev
## 1 clay NA
## 2 silt NA
## 3 sand NA
(The NA values above appear because we set all elev cells below 20 to NA in the earlier subsetting example; reload elev, or pass na.rm = TRUE to zonal(), to get actual means.)
Global operations are a special case of zonal operations where the ‘zonal filter’ is the entire raster layer. A simple operation would be to locate the min or max. A more complex operation could include calculating distances from all cells to a target cell.
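A minimal sketch of simple global operations using terra's global() (na.rm = TRUE again accounts for the NAs introduced earlier):
# summary statistics computed over the entire raster
global(elev, fun = "mean", na.rm = TRUE)
global(elev, fun = "max", na.rm = TRUE)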
E1. Canterbury is the region of New Zealand containing most of the 100 highest points in the country. How many of these high points does the Canterbury region contain?
E2. Which region has the second highest number of nz_height points, and how many does it have?
E3. Generalizing the question to all regions: how many of New Zealand’s 16 regions contain points which belong to the top 100 highest points in the country? Which regions?
Bonus: create a table listing these regions in order of the number of points and their name.
E4. Use data(dem, package = "spDataLarge"), and reclassify the elevation into three classes: low, medium and high. Second, attach the NDVI raster (data(ndvi, package = "spDataLarge")) and compute the mean NDVI and the mean elevation for each altitudinal class.
data(dem, package = "spDataLarge")
data(ndvi, package = "spDataLarge")
E5. Apply a line detection filter to raster(system.file("external/rlogo.grd", package = "raster")). Plot the result. Hint: read ?raster::focal().
E6. Calculate the NDVI of a Landsat image. Use the Landsat image provided by the spDataLarge package (system.file("raster/landsat.tif", package = "spDataLarge")).
E7. A StackOverflow post shows how to compute distances to the nearest coastline using raster::distance(). Retrieve a digital elevation model of Spain and compute a raster which represents distances to the coast across the country (hint: use getData()). Second, use a simple approach to weight the distance raster with elevation (other weighting approaches are possible, including flow direction and steepness): every 100 altitudinal meters should increase the distance to the coast by 10 km. Finally, compute the difference between the raster using the Euclidean distance and the raster weighted by elevation. Note: it may be wise to increase the cell size of the input raster to reduce compute time during this operation.