library(sf)  
library(terra)  
library(tidyverse)  
library(spData)
library(spDataLarge)
library(magrittr)

Geometry operations

So far, we have learned how to operate on data using its attributes (the non-spatial data) and using the spatial relations between objects. This section covers operations on the geometry itself, such as simplifying geometries, adding buffers, or creating centroids.

Vector geometries can be manipulated using either unary or binary operations. Unary operations work on a single geometry in isolation. Binary operations use a second geometry to modify the target geometry.

Raster geometries can also be manipulated to change the resolution (number of pixels), the extent, and the origin of the raster. This is useful when aligning rasters from different sources so that you can perform map algebra on them.

Finally, we will discuss raster-vector interactions.

Geometric operations on vector data

This section focuses on operations performed on the geometry of ‘sf’ objects, which means we will be working with objects of class ‘sfc’ in addition to objects of class ‘sf’.

Simplification

Simplification generalizes lines and polygons by removing detail (vertices) from the geometries. This reduces the memory and time needed to store and plot them. st_simplify() uses the Douglas-Peucker algorithm, and its dTolerance argument controls the intensity of the simplification: the larger the value, the stronger the generalization.

plot(st_geometry(seine))

seine_simp = st_simplify(seine, dTolerance = 2000)  # 2000 m
plot(st_geometry(seine_simp))
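To see the effect of dTolerance in isolation, here is a minimal sketch on a made-up zigzag line. The objects xy, line and line_simp are hypothetical and not part of the datasets used in this section; the line has no CRS, so dTolerance is simply in the same units as the coordinates.

```r
library(sf)

# Hypothetical zigzag line: 11 vertices wiggling by 0.1 around y = 0
xy = cbind(0:10, rep(c(0, 0.1), length.out = 11))
line = st_sfc(st_linestring(xy))

# A tolerance larger than the wiggles lets the algorithm drop them
line_simp = st_simplify(line, dTolerance = 0.5)

n_before = nrow(st_coordinates(line))      # number of vertices before
n_after = nrow(st_coordinates(line_simp))  # number of vertices after
c(before = n_before, after = n_after)
```

Trying smaller values of dTolerance (for example 0.05) should show how vertices are kept once the tolerance drops below the size of the wiggles.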

We can also do this with polygons.

us_states2163 = st_transform(us_states, 2163)
plot(st_geometry(us_states2163))

us_states_simp1 = st_simplify(us_states2163, dTolerance = 100000)  # 100 km
plot(st_geometry(us_states_simp1))

One issue you can see is that st_simplify() simplifies each geometry independently, so shared borders no longer line up and gaps or overlaps appear between neighboring polygons. To keep the borders aligned we can use ms_simplify() from ‘rmapshaper’.

#install.packages("rmapshaper")
library(rmapshaper)
## Registered S3 method overwritten by 'geojsonlint':
##   method         from 
##   print.location dplyr
# proportion of points to retain (0-1; default 0.05)
us_states2163$AREA = as.numeric(us_states2163$AREA)
us_states_simp2 = rmapshaper::ms_simplify(us_states2163, keep = 0.01,
                                          keep_shapes = TRUE)
plot(st_geometry(us_states_simp2))

Centroids

Centroid operations identify the center of geographic objects. Like the center of anything, there are different ways to define it (e.g. how would you define the center of your body?). One of the most common ways to define the center of geographic objects is to use the center of mass (imagine cutting out the map of Georgia from a piece of paper and balancing it on your finger). st_centroid() allows for calculation of geographic centroids.

nz_centroid <- st_centroid(nz)
## Warning in st_centroid.sf(nz): st_centroid assumes attributes are constant over
## geometries of x
seine_centroid <- st_centroid(seine)
## Warning in st_centroid.sf(seine): st_centroid assumes attributes are constant
## over geometries of x

It’s possible for the centroid to fall outside of the geometry (consider a donut). In this case, you can use st_point_on_surface(), which guarantees that the returned point lies on the surface of the geometry.

nz_pos = st_point_on_surface(nz)
## Warning in st_point_on_surface.sf(nz): st_point_on_surface assumes attributes
## are constant over geometries of x
seine_pos = st_point_on_surface(seine)
## Warning in st_point_on_surface.sf(seine): st_point_on_surface assumes attributes
## are constant over geometries of x
library(tmap)
p_centr1 = tm_shape(nz) + tm_borders() +
  tm_shape(nz_centroid) + tm_symbols(shape = 1, col = "black", size = 0.5) +
  tm_shape(nz_pos) + tm_symbols(shape = 1, col = "red", size = 0.5)  
p_centr2 = tm_shape(seine) + tm_lines() +
  tm_shape(seine_centroid) + tm_symbols(shape = 1, col = "black", size = 0.5) +
  tm_shape(seine_pos) + tm_symbols(shape = 1, col = "red", size = 0.5)  
tmap_arrange(p_centr1, p_centr2, ncol = 2)
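The difference between the two functions is easiest to see on a shape whose center of mass lies in a gap. The sketch below uses a hypothetical U-shaped polygon (u_sfc is made up for illustration) and checks whether each point intersects the polygon.

```r
library(sf)

# Hypothetical U-shaped polygon: a 3 x 3 square with a notch cut from the top
u = st_polygon(list(rbind(
  c(0, 0), c(3, 0), c(3, 3), c(2, 3), c(2, 1),
  c(1, 1), c(1, 3), c(0, 3), c(0, 0)
)))
u_sfc = st_sfc(u)

u_centroid = st_centroid(u_sfc)      # center of mass, falls inside the notch
u_pos = st_point_on_surface(u_sfc)   # a point guaranteed to lie on the polygon

centroid_on_u = st_intersects(u_centroid, u_sfc, sparse = FALSE)[1, 1]
pos_on_u = st_intersects(u_pos, u_sfc, sparse = FALSE)[1, 1]
c(centroid = centroid_on_u, point_on_surface = pos_on_u)
```

Here the centroid sits at roughly (1.5, 1.36), which is inside the notch and therefore outside the polygon, while the point on surface is on the polygon.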

Buffers

Buffers are polygons representing the area within a given distance of a geometric feature: regardless of whether the input is a point, line or polygon, the output is a polygon. Buffers are often used for data analysis when you want to know answers to questions such as: how many points are within this distance of this line?

seine_buff_5km = st_buffer(seine, dist = 5000)
seine_buff_50km = st_buffer(seine, dist = 50000)
plot(st_geometry(seine))

plot(st_geometry(seine_buff_5km))

plot(st_geometry(seine_buff_50km))
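The question "how many points are within this distance of this line?" can be sketched with a buffer and st_intersects(). The line and the three points below are made up for illustration and have no CRS, so the distance is in plain coordinate units.

```r
library(sf)

# Hypothetical line and points
line = st_sfc(st_linestring(rbind(c(0, 0), c(10, 0))))
pts = st_sfc(st_point(c(5, 1)), st_point(c(5, 3)), st_point(c(11, 0)))

# Buffer the line by 2 units and test which points fall inside the buffer
line_buff = st_buffer(line, dist = 2)
inside = st_intersects(pts, line_buff, sparse = FALSE)[, 1]

n_inside = sum(inside)  # points within 2 units of the line
n_inside
```

The first and third points are 1 unit from the line and are counted; the second is 3 units away and is not.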

Affine transformations

An affine transformation is any transformation that preserves lines and parallelism. Angles and lengths, however, are not necessarily preserved. Affine transformations include, among others, shifting (translation), scaling and rotation, and any combination of these. They are an essential part of geocomputation.

The ‘sf’ package implements affine transformation for objects of classes ‘sfg’ and ‘sfc’.

First, make an object of class ‘sfc’.

nz_sfc = st_geometry(nz)

Shifting moves every point by the same distance in map units.

nz_shift = nz_sfc + c(0, 100000)

Scaling enlarges or shrinks objects by a factor. This can be done globally or locally. If global, the relative position of the geometry features will stay the same and be scaled together. If scaling locally, then each of the geometry features will scale relative to a point that you must define (usually the centroid).

nz_centroid_sfc = st_centroid(nz_sfc)
nz_scale = (nz_sfc - nz_centroid_sfc) * 0.5 + nz_centroid_sfc

Rotating requires a rotation matrix to be defined. The function below builds one for an angle given in degrees.

rotation = function(a){
  r = a * pi / 180 #degrees to radians
  matrix(c(cos(r), sin(r), -sin(r), cos(r)), nrow = 2, ncol = 2)
} 
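To check which way this matrix turns things, the sketch below applies it to two unit vectors in base R. It assumes that each coordinate pair is treated as a row that is multiplied by the matrix from the left, which is my understanding of how ‘sf’ applies a matrix to geometries; xy and xy_rot are hypothetical names.

```r
# Same rotation() helper as in the text
rotation = function(a){
  r = a * pi / 180 # degrees to radians
  matrix(c(cos(r), sin(r), -sin(r), cos(r)), nrow = 2, ncol = 2)
}

# One coordinate pair per row: the points (1, 0) and (0, 1)
xy = rbind(c(1, 0), c(0, 1))

# Rotate by 90 degrees: each row is multiplied by the rotation matrix
xy_rot = xy %*% rotation(90)
round(xy_rot, 10)
# (1, 0) moves to (0, -1) and (0, 1) moves to (1, 0): a clockwise quarter turn
```

Under that assumption, positive angles rotate clockwise, so a counter-clockwise rotation would use a negative angle.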

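The rotation() function returns the rotation matrix for a given angle. Multiplying the centered geometries by this matrix and adding the centroids back produces nz_rotate, with each feature rotated by 30 degrees around its own centroid.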

nz_rotate = (nz_sfc - nz_centroid_sfc) * rotation(30) + nz_centroid_sfc
st_crs(nz_shift) = st_crs(nz_sfc)
st_crs(nz_scale) = st_crs(nz_sfc)
st_crs(nz_rotate) = st_crs(nz_sfc)
p_at1 = tm_shape(nz_sfc) + tm_polygons() +
  tm_shape(nz_shift) + tm_polygons(col = "red") +
  tm_layout(main.title = "Shift")
p_at2 = tm_shape(nz_sfc) + tm_polygons() +
  tm_shape(nz_scale) + tm_polygons(col = "red") +
  tm_layout(main.title = "Scale")
p_at3 = tm_shape(nz_sfc) + tm_polygons() +
  tm_shape(nz_rotate) + tm_polygons(col = "red") +
  tm_layout(main.title = "Rotate")
tmap_arrange(p_at1, p_at2, p_at3, ncol = 3)

Finally, the newly created geometries can replace the old ones with the st_set_geometry() function:

nz_scale_sf = st_set_geometry(nz, nz_scale)

Clipping

Spatial clipping is a form of spatial subsetting that changes the geometry of at least some of the affected features. Clipping only applies to features more complex than points.

This creates a Venn diagram of two overlapping circles, x and y, for us to use as an example.

b = st_sfc(st_point(c(0, 1)), st_point(c(1, 1))) # create 2 points
b = st_buffer(b, dist = 1) # convert points to circles
plot(b)
text(x = c(-0.5, 1.5), y = 1, labels = c("x", "y")) # add text

This selects the area that is shared between x and y.

x = b[1]
y = b[2]
x_and_y = st_intersection(x, y)
plot(b)
plot(x_and_y, col = "lightgrey", add = TRUE) # color intersecting area

Use these other functions in place of st_intersection() to select other areas of the Venn diagram: st_union(), st_difference(), st_sym_difference().
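One way to check what each function returns, without plotting, is to compare areas. The sketch below rebuilds the two circles and relies on the fact that the union is made up of the shared part plus the parts that belong to only one circle; the a_* names are hypothetical.

```r
library(sf)

# Rebuild the two overlapping circles from the text
b = st_buffer(st_sfc(st_point(c(0, 1)), st_point(c(1, 1))), dist = 1)
x = b[1]
y = b[2]

a_and = st_area(st_intersection(x, y))    # shared by x and y
a_or = st_area(st_union(x, y))            # covered by x or y
a_x_only = st_area(st_difference(x, y))   # in x but not in y
a_xor = st_area(st_sym_difference(x, y))  # in exactly one of x and y

# The union should equal the shared area plus the non-shared area
areas_add_up = isTRUE(all.equal(as.numeric(a_or), as.numeric(a_and + a_xor)))
areas_add_up
```

Swapping the arguments of st_difference() gives the part of y that is not in x, which is why argument order matters for that function but not for the others.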

Geometry Unions

When we aggregated features by attribute in earlier sections using summarize() and aggregate(), the polygons in each group were also dissolved into a single geometry. The function working behind the scenes to dissolve them was st_union(). It can also be called directly, either on two geometries or, as below, on a single object whose features are all united.

us_west = us_states[us_states$REGION == "West", ]
us_west_union = st_union(us_west)
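A small, made-up example makes the dissolving behavior explicit. Two hypothetical unit squares share an edge; after st_union() the shared border is gone and a single geometry remains.

```r
library(sf)

# Two hypothetical unit squares that share the edge x = 1
sq1 = st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
sq2 = st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
squares = st_sfc(sq1, sq2)

dissolved = st_union(squares)

length(squares)     # 2 geometries before the union
length(dissolved)   # 1 geometry after the union
st_area(dissolved)  # total area is unchanged: 2
```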

Type transformations

Geometry casting is a powerful operation that enables transformation of the geometry type. st_cast() is used for this operation, and it is important to note that it behaves differently on ‘sfg’, ‘sfc’, and ‘sf’ objects. st_cast() allows you to change between points, lines, and polygons. When casting from a ‘multi’ type to its single-part equivalent, it splits each multi-object into several separate objects. Refer to the online textbook for more information on using st_cast() to perform geometric type transformations.
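As a minimal sketch of casting on an ‘sfc’ object, the example below starts from a hypothetical multipoint and casts it to single points and to a line. The object names are made up for illustration.

```r
library(sf)

# Hypothetical multipoint with three coordinates, stored as an sfc object
mp = st_sfc(st_multipoint(rbind(c(1, 1), c(3, 3), c(5, 1))))

# Multi to single: one POINT geometry per coordinate
pts = st_cast(mp, "POINT")
length(pts)

# Same coordinates, different type: a line through the three points
ln = st_cast(mp, "LINESTRING")
st_geometry_type(ln)
```

Casting the bare ‘sfg’ multipoint instead of the ‘sfc’ object behaves differently, which is the distinction the paragraph above warns about.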

Geometric operations on raster data

Geometric raster operations include the shift, flipping, mirroring, scaling, rotation or warping of images.

R does not handle some geometric operations on raster data well. If you need to perform these, you can work with dedicated GIS software; chapter 9 of the online textbook covers bridging between R and GIS software.

However, R can handle operations such as changing the extent, resolution, and origin of a raster. These are important when matching raster layers, and aggregating a raster (lowering its resolution) can also make later computations less intensive.

Geometric intersections

We previously saw how to extract values from a raster overlaid by other spatial objects. To retrieve a spatial output instead, we can crop the raster to the extent of the other object. The code below uses crop(), which returns a raster object containing the cells of elev that fall within the extent of clip.

elev = rast(system.file("raster/elev.tif", package = "spData"))
clip = rast(xmin = 0.9, xmax = 1.8, ymin = -0.45, ymax = 0.45,
            resolution = 0.3, vals = rep(1, 9))
crop(elev, clip)
## class       : SpatRaster 
## dimensions  : 2, 1, 1  (nrow, ncol, nlyr)
## resolution  : 0.5, 0.5  (x, y)
## extent      : 1, 1.5, -0.5, 0.5  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
## source      : memory 
## name        : elev 
## min value   :   18 
## max value   :   24

For the same operation we can also use almost the same subsetting syntax as for extracting values, elev[clip, drop = FALSE], or the intersect() command. With the subsetting syntax, setting the drop argument to FALSE makes clear that we would like to keep the raster structure rather than a vector of values.

Extent and Origin

In order for rasters to match, the extent and origin of the raster need to match. When using publicly available data such as satellite imagery, the files may often come in different resolution or projections.

If the rasters only differ in their extent, you can use extend() with the smaller raster as the first argument and the larger one as the second, so that the smaller one is extended to match the larger one. Also, terra will conveniently throw an error if you try to combine two rasters that do not match.

elev = rast(system.file("raster/elev.tif", package = "spData"))
elev_2 = extend(elev, c(1, 2)) # add 1 row to the top and bottom and 2 columns to each side

Try to run this code chunk. What happens?

#elev_3 = elev + elev_2
elev_4 = extend(elev, elev_2)
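The same steps can be sketched on a small synthetic raster, using compareGeom() from terra to test whether two rasters have matching geometry. The rasters r, r_big and r_matched are hypothetical, and stopOnError = FALSE is used so that a mismatch returns FALSE instead of an error.

```r
library(terra)

# Hypothetical 4 x 4 raster with cell size 1
r = rast(nrows = 4, ncols = 4, xmin = 0, xmax = 4, ymin = 0, ymax = 4,
         vals = 1:16)

# Add 1 row at the top and bottom and 2 columns at the left and right
r_big = extend(r, c(1, 2))
dim(r)      # rows, columns, layers of the original
dim(r_big)  # rows, columns, layers after extending

# The geometries differ, so raster algebra between r and r_big would fail
same_before = compareGeom(r, r_big, stopOnError = FALSE)

# Extend r to the extent of r_big; the new cells are filled with NA
r_matched = extend(r, r_big)
same_after = compareGeom(r_matched, r_big, stopOnError = FALSE)
c(before = same_before, after = same_after)
```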

By default, the origin is the cell corner closest to the coordinates (0,0). origin() will retrieve the origin of a raster.

origin(elev_4)
## [1] 0 0

elev_4 conveniently has a cell corner at the coordinates (0,0).

If the origins of two rasters do not match, then their cells will not line up. The origin can be changed by assigning a new value to origin(), as in the commented-out line below.

# change the origin
#origin(elev_4) <- c(0.25, 0.25)
plot(elev_4)
plot(elev, add = TRUE) # and add the original raster

Aggregation and disaggregation

Raster datasets can also differ with regard to their resolution. To match resolutions, one can either decrease, using aggregate(), or increase, using disagg(), the resolution of one raster.

In this example, we decrease the resolution by aggregating by a factor of 5. Each output cell takes the mean of its input cells, but the fun argument could be set to other functions such as median() or sum().

data(dem, package="spDataLarge")
dem <- rast(dem)
dem_agg = aggregate(dem, fact = 5, fun = mean)
plot(dem)

plot(dem_agg)

By contrast, the disagg() function increases the resolution. However, we have to specify a method for filling the new cells. The disagg() function provides two methods. The default one (method = “near”) simply gives all output cells the value of the input cell, and hence duplicates values, which leads to a blocky output image.

dem_disagg = terra::disagg(dem_agg, fact = 5, method = "near")

The other method, bilinear, may produce better looking results.

dem_disagg = terra::disagg(dem_agg, fact = 5, method = "bilinear")

However, with disaggregation we are always interpolating the values for the new cells, so the outputs will always be limited by the lower resolution input. Unfortunately, there is no enhance() function.
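A round trip on a tiny synthetic raster illustrates both steps and the information loss. The raster r is hypothetical, and disagg() is called with its default method.

```r
library(terra)

# Hypothetical 4 x 4 raster; values 1 to 16 are filled row by row
r = rast(nrows = 4, ncols = 4, xmin = 0, xmax = 4, ymin = 0, ymax = 4,
         vals = 1:16)

# Aggregate 2 x 2 blocks of cells using their mean
r_agg = aggregate(r, fact = 2, fun = mean)
agg_vals = values(r_agg)[, 1]
agg_vals  # 3.5 5.5 11.5 13.5

# Disaggregate back to the original resolution
r_back = disagg(r_agg, fact = 2)

# Same grid as the original, but the original values are not recovered
same_grid = compareGeom(r, r_back, stopOnError = FALSE)
same_values = all(values(r_back) == values(r))
c(same_grid = same_grid, same_values = same_values)
```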

Resampling

Aggregation and disaggregation are useful when changing the resolution of a single raster. However, in other situations you may have multiple rasters with varying extents and resolutions. In that case you will want to resample, which is the process of computing values for new pixel locations.

There are a variety of resampling methods, and these are detailed in the online textbook. The simplest are nearest neighbor and bilinear interpolation. Nearest neighbor assigns the value of the nearest cell of the original raster to the cell of the target one. It is fast and usually suitable for categorical rasters. Bilinear interpolation assigns a weighted average of the four nearest cells from the original raster to the cell of the target one. This is the fastest method that is appropriate for continuous rasters.

To resample you can use the resample() function, which takes the raster to be resampled, a target raster that defines the new grid, and the resampling method. For the exercises in this course you will not need to resample, but you may need to in your own work with raster data. We suggest turning to the online textbook and other sources for more information on how best to do this for your situation.
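As a minimal sketch of resample(), the example below moves a hypothetical source raster onto a target grid with a different extent and resolution, once with each of the two methods described above. The rasters src and target are made up for illustration.

```r
library(terra)

# Hypothetical source raster: 4 x 4 cells of size 1
src = rast(nrows = 4, ncols = 4, xmin = 0, xmax = 4, ymin = 0, ymax = 4,
           vals = 1:16)

# Hypothetical target grid: 2 x 2 cells of size 1.5 with a different extent
target = rast(nrows = 2, ncols = 2, xmin = 0, xmax = 3, ymin = 1, ymax = 4)

# Nearest neighbor keeps original values; bilinear interpolates between them
r_near = resample(src, target, method = "near")
r_bil = resample(src, target, method = "bilinear")

# Both results now share the geometry of the target raster
on_target_grid = compareGeom(r_near, target, stopOnError = FALSE) &&
  compareGeom(r_bil, target, stopOnError = FALSE)
on_target_grid
values(r_near)[, 1]
values(r_bil)[, 1]
```

Nearest neighbor only ever returns values that exist in the source raster, which is why it is the usual choice for categorical data.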

Raster-Vector Interactions

This section focuses on interactions between raster and vector geographic data models. It covers four main techniques: raster cropping and masking using vector objects; extracting raster values using different types of vector data; and raster-vector conversion in both directions (rasterization and vectorization). These concepts are demonstrated using data from previous chapters to show their potential real-world applications.

Raster cropping

The ability to crop and mask rasters using vector data is valuable because raster data often covers a larger geographic area than your vector data. Limiting the raster to the area described by your vector geometries makes later steps less computationally expensive and helps produce clearer maps.

We will use these files to demonstrate raster cropping and masking.

srtm = rast(system.file("raster/srtm.tif", package = "spDataLarge"))
zion = st_read(system.file("vector/zion.gpkg", package = "spDataLarge"))
## Reading layer `zion' from data source 
##   `/Users/dcsuh/Library/R/4.0/library/spDataLarge/vector/zion.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 1 feature and 11 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 302903.1 ymin: 4112244 xmax: 334735.5 ymax: 4153087
## Projected CRS: UTM Zone 12, Northern Hemisphere
zion = st_transform(zion, crs(srtm))

We can start by using crop() to crop our raster using our vector data. Cropping limits the raster to the smallest rectangular extent that still contains the vector geometry.

srtm_cropped = crop(srtm, vect(zion))

We can also mask() the raster so that all cells outside the vector area become NA. Here the mask is applied to the full srtm raster, so the extent stays the same as the original.

srtm_masked = mask(srtm, vect(zion))

Setting the ‘inverse’ argument of mask() to TRUE does the opposite and turns all of the raster cells inside the vector area into NA.

srtm_inv_masked = mask(srtm, vect(zion), inverse = TRUE)

Cropping and masking are typically used so that only the area of interest is left.
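The two steps are often chained: crop first to shrink the extent, then mask to blank out what is left outside the geometry. Here is a minimal sketch on synthetic data, where r and the triangle tri are hypothetical and the polygon is built from a WKT string.

```r
library(terra)

# Hypothetical 10 x 10 raster and a triangular area of interest
r = rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
         vals = 1:100)
tri = vect("POLYGON ((2 2, 8 2, 5 8, 2 2))")
crs(tri) = crs(r)  # give the polygon the same CRS as the raster

# Crop to the bounding rectangle of the triangle, then mask outside it
r_crop = crop(r, tri)
r_mask = mask(r_crop, tri)

dim(r_crop)                       # smaller grid than the original
n_na_crop = sum(is.na(values(r_crop)))  # cropping alone creates no NA cells
n_na_mask = sum(is.na(values(r_mask)))  # masking sets cells outside tri to NA
c(crop = n_na_crop, mask = n_na_mask)
```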

Extraction

Raster extraction is the process of identifying and returning the values associated with a ‘target’ raster at specific locations, based on a (typically vector) geographic ‘selector’ object.

The basic example is extracting the value of a raster cell at specific points. For this purpose, we will use zion_points, which contains a sample of 30 locations within Zion National Park. The following command extracts elevation values from srtm and creates a data frame with the points’ IDs (one value per row of the vector object) and the related srtm value for each point. We can then add the resulting object to our zion_points dataset with the cbind() function:

data("zion_points", package = "spDataLarge")
elevation = terra::extract(srtm, vect(zion_points))
zion_points = cbind(zion_points, elevation)
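To make the structure of the returned data frame concrete, here is a minimal sketch with a hypothetical 2 x 2 raster and two made-up points, where the expected values can be read off by hand.

```r
library(terra)

# Hypothetical 2 x 2 raster; values are filled row by row from the top left
r = rast(nrows = 2, ncols = 2, xmin = 0, xmax = 2, ymin = 0, ymax = 2,
         vals = 1:4)

# Two made-up points: one in the top-left cell, one in the bottom-right cell
pts = vect(cbind(x = c(0.5, 1.5), y = c(1.5, 0.5)), crs = crs(r))

# One row per point: an ID column followed by the raster value
vals = extract(r, pts)
vals
```

The second column holds 1 for the first point and 4 for the second, matching the cells the points fall in.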

Line selectors can also be used for extraction but it is often better to use a series of points rather than a line. This is achieved by making a line and then splitting it up into many points as demonstrated below:

Make line

zion_transect = cbind(c(-113.2, -112.9), c(37.45, 37.2)) %>%
  st_linestring() %>% 
  st_sfc(crs = crs(srtm)) %>% 
  st_sf()

Split into points

zion_transect$id = 1:nrow(zion_transect)
zion_transect = st_segmentize(zion_transect, dfMaxLength = 250)
zion_transect = st_cast(zion_transect, "POINT")
## Warning in st_cast.sf(zion_transect, "POINT"): repeating attributes for all sub-
## geometries for which they may not be constant

Find the distance of each point from the start of the transect

zion_transect = zion_transect %>% 
  group_by(id) %>% 
  mutate(dist = st_distance(geometry)[, 1]) 

Extract elevation value for each point and combine with main object

zion_elev = terra::extract(srtm, vect(zion_transect))
zion_transect = cbind(zion_transect, zion_elev)

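What this creates is an elevation profile along a straight line through Zion National Park: plotting the srtm values against dist shows how the elevation changes along the transect.

Finally, polygons can be used for raster extraction. A polygon typically covers many raster cells, so extraction returns many values per polygon.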

zion_srtm_values = terra::extract(x = srtm, y = vect(zion))

The resulting data frame is useful for calculating summary statistics per polygon.

group_by(zion_srtm_values, ID) %>% 
  summarize(across(srtm, list(min = min, mean = mean, max = max)))
## # A tibble: 1 × 4
##      ID srtm_min srtm_mean srtm_max
##   <dbl>    <dbl>     <dbl>    <dbl>
## 1     1     1122     1818.     2661

Rasterization

Rasterization is the conversion of vector objects into a raster representation. The terra package provides the function rasterize() for this. Its first two arguments are x, the vector object to be rasterized, and y, a ‘template raster’ object defining the extent, resolution and CRS of the output. Choosing a resolution can be difficult: if it is too low (cells too large), the result may miss the geographic variability of the vector data; if it is too high, computation times may be excessive.

To demonstrate rasterization in action, we will use a template raster that has the same extent and CRS as the input vector data ‘cycle_hire_osm_projected’ and spatial resolution of 1000 meters:

cycle_hire_osm_projected = st_transform(cycle_hire_osm, "EPSG:27700")
raster_template = rast(ext(cycle_hire_osm_projected), resolution = 1000,
                       crs = st_crs(cycle_hire_osm_projected)$wkt)

rasterize() is used for rasterization and is very flexible. Try to work out what each of these different methods is doing when rasterizing.

ch_raster1 = rasterize(vect(cycle_hire_osm_projected), raster_template,
                       field = 1)
ch_raster2 = rasterize(vect(cycle_hire_osm_projected), raster_template, 
                       fun = "length")
ch_raster3 = rasterize(vect(cycle_hire_osm_projected), raster_template, 
                       field = "capacity", fun = sum)

Spatial Vectorization

Vectorization is the conversion of raster objects into their representation in vector objects. It involves converting spatially continuous raster data into spatially discrete vector data such as points, lines or polygons.

The simplest form of vectorization is to convert the centroids of raster cells into points. as.points() does exactly this for all non-NA raster grid cells. Note, here we also used st_as_sf() to convert the resulting object to the sf class.

elev_point = as.points(elev) %>% 
  st_as_sf()
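The handling of NA cells can be checked on a small synthetic raster. In this sketch the raster r is hypothetical and contains one NA cell, which should not produce a point.

```r
library(terra)

# Hypothetical 2 x 2 raster with one NA cell (top right)
r = rast(nrows = 2, ncols = 2, xmin = 0, xmax = 2, ymin = 0, ymax = 2,
         vals = c(1, NA, 3, 4))

# One point per non-NA cell, located at the cell center
p = as.points(r)

n_points = nrow(p)
n_points  # 3: the NA cell is skipped
crds(p)   # coordinates of the cell centers
```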

Another common type of spatial vectorization is the creation of contour lines representing lines of a continuous variable such as elevation or temperature.

data(dem, package="spDataLarge")
dem <- rast(dem)
cl = as.contour(dem)
plot(dem, axes = FALSE)
plot(cl, add = TRUE)

The final type of vectorization involves conversion of rasters to polygons. This can be done with terra::as.polygons(), which converts each raster cell into a polygon consisting of five coordinates, all of which are stored in memory (explaining why rasters are often fast compared with vectors!).

This is illustrated below by converting the grain object into polygons and subsequently dissolving borders between polygons with the same attribute values (also see the dissolve argument in as.polygons()).

grain = rast(system.file("raster/grain.tif", package = "spData"))
grain_poly = as.polygons(grain) %>% 
  st_as_sf()

Exercises

Some of the exercises use a vector (zion_points) and raster dataset (srtm) from the spDataLarge package. They also use a polygonal ‘convex hull’ derived from the vector dataset (ch) to represent the area of interest:

library(sf)
library(terra)
zion_points = read_sf(system.file("vector/zion_points.gpkg", package = "spDataLarge"))
srtm = rast(system.file("raster/srtm.tif", package = "spDataLarge"))
ch = st_combine(zion_points) %>%
  st_convex_hull() %>% 
  st_as_sf()

E1. Generate and plot simplified versions of the nz dataset. Experiment with different values of keep (ranging from 0.5 to 0.00005) for ms_simplify() and dTolerance (from 100 to 100,000) for st_simplify().

At what value does the form of the result start to break down for each method, making New Zealand unrecognizable? Advanced: What is different about the geometry type of the results from st_simplify() compared with the geometry type of ms_simplify()? What problems does this create and how can this be resolved?

E2. It was previously established that Canterbury region had 70 of the 101 highest points in New Zealand. Using st_buffer(), how many points in nz_height are within 100 km of Canterbury?

E3. Find the geographic centroid of New Zealand. How far is it from the geographic centroid of Canterbury?

E4. Most world maps have a north-up orientation. A world map with a south-up orientation could be created by a reflection (one of the affine transformations not mentioned in this chapter) of the world object’s geometry. Write code to do so. Hint: you need to use a two-element vector for this transformation. Bonus: create an upside-down map of your country.

E5. Subset the point in p that is contained within x and y, (1) using base subsetting operators and (2) using an intermediary object created with st_intersection().

E6. Calculate the length of the boundary lines of US states in meters. Which state has the longest border and which has the shortest? Hint: The st_length function computes the length of a LINESTRING or MULTILINESTRING geometry.

E7. Crop the srtm raster using (1) the zion_points dataset and (2) the ch dataset. Are there any differences in the output maps? Next, mask srtm using these two datasets. Can you see any difference now? How can you explain that?

E8. Firstly, extract values from srtm at the points represented in zion_points. Next, extract average values of srtm using a 90 buffer around each point from zion_points and compare these two sets of values. When would extracting values by buffers be more suitable than by points alone?

E9. Subset points higher than 3100 meters in New Zealand (the nz_height object) and create a template raster with a resolution of 3 km. Using these objects, (1) count the number of the highest points in each grid cell and (2) find the maximum elevation in each grid cell.

E10. Aggregate the raster counting high points in New Zealand (created in the previous exercise), reduce its geographic resolution by half (so cells are 6 by 6 km) and plot the result.

Resample the lower resolution raster back to a resolution of 3 km. How have the results changed? Name two advantages and disadvantages of reducing raster resolution.

E11. Polygonize the grain dataset and filter all squares representing clay.

Name two advantages and disadvantages of vector data over raster data. At which points would it be useful to convert rasters to vectors in your work?