Learn how to install the basic geospatial dependencies, such as GDAL and XArray, and deploy them as a container
Introduction
Newcomers to the Python programming language quickly understand the significance of utilizing virtual environments and package management tools. The vast number of packages available presents a challenge in maintaining compatibility among dependencies, making virtual environments and package management critical components of a well-organized Python environment.
This complexity of managing dependencies is exacerbated when working with geospatial analysis. In addition to the numerous packages used in data science, we also need specialized libraries such as GDAL and Rasterio, plus tools for working with STAC catalogs. On top of that, GDAL is notoriously difficult to install, regardless of the operating system, be it Windows, Linux, or Mac.
If you want to deploy your geospatial environment to a cloud server, relying on basic conda skills just won’t cut it. Containerizing your environment is the way to go, to ensure compatibility and stability in the target deployment environment.
So, this article is here to save the day! It’s a quick and straightforward guide to setting up a minimalist Docker image, loaded with all the essential tools for geospatial analysis using Python. No more headaches, no more fuss!
Installing Docker
First, we will need Docker installed. On Mac or Windows this can be done through the installation of Docker Desktop directly from docker.com (Figure 1).
If you are on Linux, the installation can be done using the apt package manager (available on Debian-based distributions such as Ubuntu):
> sudo apt-get update
> sudo apt-get install docker.io
> sudo systemctl start docker
> sudo docker run hello-world
Base Image
The next step is to find a base image on Docker Hub to build on top of. There are images available that have all the geospatial dependencies pre-installed, such as those provided by the Pangeo community. However, the downside is that these images have a large compressed size of 1.42 GB.
Normally, the official Python images would be a good starting point for running Python, but installing GDAL on these images can be cumbersome. After exploring various options, I discovered that the easiest way is to start with an image that has GDAL pre-installed. This image is provided by the OSGeo community at hub.docker.com/u/osgeo (Figure 2).
Once in the osgeo/gdal repository, we can go to the Tags tab. Besides the latest image version, several other versions are available for different purposes and with different sizes. The latest one is more than 1 GB compressed. After trying different versions, I found that the ‘ubuntu-small’ version strikes a good balance between size (142 MB compressed) and compatibility with the required packages. So let's grab this one.
Note: The following steps are for educational purposes only, to check whether the necessary packages can be installed successfully. We could jump directly to creating a Dockerfile from this image.
On the terminal or command line, we can run the following commands to pull the image and to create a container and enter it:
> docker pull osgeo/gdal:ubuntu-small-latest
> docker run -it osgeo/gdal:ubuntu-small-latest
Note that the prompt will change to root@<container_id>:/#.
Once "inside" the container, we can check for the installed versions of basic packages. So, let's type the command python to enter the Python interpreter.
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from osgeo import gdal
>>> gdal.__version__
'3.7.0dev-f26e795279c48852b44bc9659d728421544528b9'
>>>
Installing Additional Packages
Ok, now that we have Python 3.10.6 and the most difficult package to install on Earth (yep, GDAL), we can just install the additional packages using pip or conda, right? Easy peasy!
Well, not really. If you go back to the container and try to run these commands, you will find that neither of them is installed by default. So let's install pip:
apt-get update
apt-get -y install python3-pip --fix-missing
Now, with pip installed, we can install all the additional packages directly with pip install package1 package2 … . As the container is already isolated, we will skip using a virtual environment inside the container and install the packages as root.
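Once the packages are installed, it can be handy to confirm from inside the container that they actually import. The snippet below is only a minimal sketch: the helper name `check_modules` and the list of module names are assumptions for illustration, so adjust them to whatever you installed.

```python
import importlib


def check_modules(module_names):
    """Try to import each module and record whether the import succeeded."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            # Covers ModuleNotFoundError as well (it is a subclass of ImportError).
            results[name] = False
    return results


if __name__ == "__main__":
    # Hypothetical list of import names; note that the import name can differ
    # from the pip package name (pystac-client is imported as pystac_client).
    wanted = ["osgeo.gdal", "geopandas", "rasterio", "xarray", "rioxarray", "pystac_client"]
    for name, ok in check_modules(wanted).items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Saving this as a file and running it with python inside the container prints one line per module, which makes a missing dependency easy to spot.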
Creating a Dockerfile
Now, to make it reproducible on other architectures (aarch64, x86_64, etc.), let's create a Dockerfile to wrap everything up. Let's start by creating an empty text file called Dockerfile, without any extension (that will make it easier to build the image without specifying the filename).
The first thing in a Dockerfile is to specify the base image with its corresponding tag. So we write:
# Use an official GDAL image as the base image
FROM osgeo/gdal:ubuntu-small-latest
Then we need to install pip. For that, we will write a RUN command.
# install pip
RUN apt-get update && apt-get -y install python3-pip --fix-missing
Now, we have two options.
1- Chain all the packages in a single pip install command, using a backslash to continue on the next line (add any other packages you need to the list):
# install necessary packages
RUN pip install geopandas rioxarray \
    pystac-client
2- Or, to make things a little more "elegant" and organized, we can create a requirements.txt file with all the packages we want installed, like so:
geopandas
rasterio
xarray
rioxarray
pystac-client
...
And then, in the Dockerfile, we will copy our requirements.txt to the image and install the packages with pip install --no-cache-dir, so that pip's download cache is not kept in the image, and we are set. The complete Dockerfile will look like this:
# Use an official GDAL image as the base image
FROM osgeo/gdal:ubuntu-small-latest
# install pip
RUN apt-get update && apt-get -y install python3-pip --fix-missing
# Set the working directory in the container
WORKDIR /app
# Copy the requirements.txt file to the container
COPY requirements.txt /app/
# Install the necessary dependencies
RUN pip install --no-cache-dir -r requirements.txt
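To double-check that everything listed in requirements.txt really ended up in the image, a small script can read the file and look up each installed version. This is just a sketch under a few assumptions: the function names `parse_requirements` and `installed_versions` are made up for this example, and the parsing is deliberately simple (it keeps only the package name at the start of each line and ignores comments, blank lines, and pip options).

```python
import re
from importlib import metadata

# A package name starts with a letter or digit, followed by letters, digits, ".", "_" or "-".
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def parse_requirements(text):
    """Return the package names found in the text of a requirements file."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        match = NAME_RE.match(line)
        if match:
            names.append(match.group(0))
    return names


def installed_versions(names):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    sample = "geopandas\nrasterio\nxarray\nrioxarray\npystac-client\n"
    for name, version in installed_versions(parse_requirements(sample)).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Inside the container, the sample string could be replaced by the contents of /app/requirements.txt, since that is where the Dockerfile copies the file.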
Building the Image
Now that we have both the requirements.txt and the Dockerfile saved in the same folder, we can build the final image by running the following command from that folder:
docker build -t geospatial_minimal .
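After the build finishes, one way to smoke-test the image is to start a throwaway container that only tries to import the packages. The sketch below just assembles that docker run command from Python. The helper name `smoke_test_command` and the module list are assumptions for illustration, and it relies on the image exposing a python executable, as we saw when exploring the base image.

```python
import shlex


def smoke_test_command(image, modules):
    """Build a 'docker run' command that imports the given modules inside the image."""
    code = "; ".join(f"import {name}" for name in modules)
    # --rm removes the temporary container as soon as the command finishes.
    return ["docker", "run", "--rm", image, "python", "-c", code]


if __name__ == "__main__":
    # Hypothetical module list; adjust it to the contents of your requirements.txt.
    cmd = smoke_test_command("geospatial_minimal", ["geopandas", "rioxarray"])
    print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be passed to subprocess.run on a machine where Docker is available. A non-zero exit code means at least one import failed.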
To push it to Docker Hub, we first need to tag the image with the name of the target repository, like so:
docker tag geospatial_minimal:latest <hub_user>/<hub_repository>:<tag>
docker push <hub_user>/<hub_repository>:<tag>
And voilà! This image is publicly available at https://hub.docker.com/repository/docker/cordmaur/geospatial_minimal/ and, most importantly, it takes up less than 300 MB (Figure 3). Enjoy it!
Conclusion
By following the steps outlined in this article, we have successfully created a minimal (< 300 MB) and efficient Docker image equipped with all the essential dependencies for geospatial analysis in Python 3.10. This image can now be used on cloud servers to serve geospatial applications, ensuring compatibility and stability in deployment.