An end-to-end deep learning geospatial segmentation project using Pytorch and TorchGeo packages
Introduction
According to its documentation, TorchGeo is a “PyTorch domain library providing datasets, samplers, transforms, and pre-trained models specific to geospatial data”. The cause is noble. Make it easier for practitioners to use Deep Learning models on geospatial data. And why is that a good deal?
In a last years’ presentation from Dan Morris (former principal scientist at Microsoft’s AI for Earth program) to the IEEE-GRSS (Geoscience and Remote Sensing Society), he highlighted some challenges related to geospatial analysis (link to the presentation is here):
Working with geospatial data is a pain unless you have a PhD in remote sensing;
Working with very large data is a pain unless you have a PhD in distributed computing; and
The people at the front lines of conservation usually have neither of the above.
On the top of that, people working with Artificial Intelligence for geospatial analysis have an extra layer of complexity, because most frameworks are developed for RGB pictures and don’t take into account the specificities of geospatial data:
Huge images that don’t pass through a neural network due to volume;
Different scales;
Different projections and CRS;
Temporal component; and others.
So, at the present, it is really challenging for someone to apply deep learning models to geospatial tasks without having knowledge on these diverse subjects.
In this context, the TorchGeo library has been launched on November 2021 to address some of these challenges. More than one year has passed and there is still very limited documentation (videos, tutorials, etc.) available for it. That’s why I decided to jump in and make an entire project in my field of expertise (Surface water analysis) using the TorchGeo library.
Considering the project’s size, it will be part of a series published here on GeoCorner.
Contents
The idea is to cover the following topics throughout the series:
The dataset;
Creating RasterDatasets, DataLoaders and Samplers for images and masks;
Intersection Dataset;
Normalizing the data;
Creating spectral indices;
Creating the segmentation model (U-Net);
Loss function and metrics; and
Training loop.
Environment
To start, we will prepare the environment. For simplicity, we will be using google Colab (here), as it has the basic modules pre-installed and it provides free GPUs for training the final model. It can also be done in the personal computer, but for that, you will need to install all the dependencies such as Pytorch, GDAL, etc., and also have GPU with CUDA driver working.
Once we have created an user account and a new Jupyter Notebook within Colab, we can install the missing modules that will be needed in the project.
We will install the libraries rasterio and torchgeo using the Pip command. Then, we will silence the warnings from rasterio, because there is a problem with the color channels in the original dataset (but that doesn’t impact our goal).
Dataset
The dataset we will use is the Earth Surface Water dataset [1] (licensed under Creative Commons Attribution 4.0 International Public License), which has patches from different parts of the world (Figure 1) and its corresponding water masks. The dataset uses optical imagery from Sentinel-2 satellite with 10m of spatial resolution.
Another advantage of this dataset is that we have the performance benchmark to compare our results, as it has been used in a peer reviewed publication:
Xin Luo, Xiaohua Tong, Zhongwen Hu. An applicable and automatic method for earth surface water mapping based on multispectral images. International Journal of Applied Earth Observation and Geoinformation, 2021, 103, 102472. [Link]
We will download the dataset directly to Colab using the wget command, uncompress it. Once uncompressed, the data will have the structure shown in Figure 2, where tra stands for training set and val stands for the validation set. The suffixes scene and truth are meant to separate the original patches and the corresponding masks (ground truth).
To open an sample, we will load the training images (tra_scene) and the the training masks (tra_truth) in lists and then open one index using XArray. Then, with the help of matplotlib we will display the results, as explained in the following snipet.
Creating RasterDataset for the images
Now that we have the original dataset already uncompressed in Colab’s environment, we can prepare it to be loaded into a neural network. For that, we will create an instance of the RasterDataset class, provided by TorchGeo, and point to the specific directory, using the following command (unbind_samples and stack_samples will be used in the sequence):
from torchgeo.datasets import RasterDataset, unbind_samples, stack_samples
train_ds = RasterDataset(
root=(root/’tra_scene’).as_posix(),
crs='epsg:3395', res=10
)
Note that we are specifying the CRS (Coordinate Reference System) to EPSG:3395. TorchGeo requires that all the images are loaded in the same CRS. However, the patches in the dataset are in different UTM projections and the default behavior of TorchGeo is to use the first CRS found as its default. In this case, we have to inform a CRS that is able to cope with these different regions around the globe. To minimize the deformations due to the huge differences in latitude (I can create a history specific for this purpose) within the patches, I have selected World Mercator as the main CRS for the project. Figure 3 shows the world projected in World Mercator CRS.
Sampler
To create training patches that can be fed into a neural network from our dataset, we need to select samples of fixed sizes. TorchGeo has many samplers, but here we will use the RandomGeoSampler class. Basically, the sampler selects random bounding boxes of fixed size that belongs to the original image. Then, these bounding boxes are used in the RasterDataset to query the portion of the image we want. To create the sampler, we just need to enter the following line
from torchgeo.samplers import RandomGeoSampler
sampler = RandomGeoSampler(train, size=(512, 512), length=100)
Size is the shape of the training patches we want and length is the number of patches this sampler will provide as one epoch.
To draw one random item from the dataset, we can call the sampler for one bounding box and them pass this bounding box to the dataset. The result will be a dictionary with the following entries: image, crs and bbox.
import torch # this is to get the same result in every pass
torch.manual_seed(0)
bbox = next(iter(sampler))
sample = train_ds[bbox]
print(sample.keys())
print(sample['image'].shape)
Code output:
dict_keys(['image', 'crs', 'bbox'])torch.Size([6, 512, 512])
Note that our image has shape (6, 512, 512). It means we have 6 spectral channels and the size of 512 rows and 512 columns as we specified for the sampler. To finish this first part, we will display this sample using again the matplotlib, but taking care of the channels to display the RGB correctly. According to the dataset description, the 6 channels are (in this order): Blue, Green, Red, Nir, Swir1, Swir2. Matplotlib expects an array of shape (height, width, channels), where channels are the R, G and B (in this order). We will change the order of the dimensions (axis) using the transpose command.
Besides this, we need to convert our values from integers to float and scale by 1/10000 (Sentinel-2 specs).
import torchimport matplotlib.pyplot as plt
arr = torch.clamp(sample['image']/10000, min=0, max=1).numpy()
rgb = arr.transpose(2, 1, 0)[:, :, [2, 1 , 0]]
plt.imshow(rgb*3)
Conclusion
In this first part, we’ve seen how to create a RasterDataset in TorchGeo and how to draw fixed-sized samples from it using a Sampler. That’s the first step for our workflow. In the next part, we will see how to create the dataset for the masks and combine both into an IntersectionDataset. Then, draw batch samples using a DataLoader. But that’s subject for the next story.
Comments