Welcome to GeoAI Unpacked! I am Ali Ahmadalipour and in this newsletter, I’ll be sharing insights and deep dives in geospatial AI, focusing on business opportunities and industry challenges. Each issue will highlight key advances, real-world applications, and my personal perspective on emerging opportunities. For a sneak peek at the topics I plan to cover in upcoming issues, visit this link.
In this very first issue, we’re exploring Earth Observation (EO) foundation models. Let’s get to it!
1. What’s a foundation model? How is it different from previous AI methods?
A foundation model generally refers to a large-scale machine learning model that is trained on extensive datasets and can be fine-tuned for various specific tasks. These models are designed to serve as a robust base from which more specialized models or applications can be developed. One of the most widely known examples is ChatGPT, which is built on a large language model (LLM) as its foundation.
A few key differences between foundation models and a typical deep learning model are shown in the following figure:
In summary, foundation models are versatile and general-purpose, whereas conventional deep learning models are specialized neural networks tailored for specific tasks. For instance, a conventional deep learning model like a CNN might be trained specifically for object detection on a specific set of objects (e.g. ships, solar farms, oil tanks, flooded areas, etc.), and introducing a new object class (e.g. wind turbines) may require retraining from scratch. Notably, foundation models can reduce the need for extensive labeled training data by leveraging pre-training on large, unlabeled datasets and fine-tuning with smaller, task-specific datasets.
Segment Anything (SAM) by Meta AI is one example of a state-of-the-art foundation model for computer vision tasks that can identify and segment any object in an image. It was trained on over 1 billion masks from 11 million images and is capable of generalizing to new object categories without additional training.
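For a sense of how such a model is used in practice, here is a minimal sketch of prompting SAM with a single point via Meta’s open-source `segment-anything` package; the checkpoint filename matches the public ViT-H release, while the image path and point coordinates are placeholders.

```python
# pip install segment-anything opencv-python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load the pre-trained SAM weights (ViT-H checkpoint from the public release).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SAM expects an RGB array; OpenCV loads BGR, so convert.
image = cv2.cvtColor(cv2.imread("scene.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with one foreground point (x, y); label 1 means "foreground".
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks with scores
)
print(masks.shape, scores)  # (3, H, W) boolean masks and their confidences
```

No ship- or tank-specific training is involved here; the same prompt-based interface works for arbitrary objects, which is precisely the appeal of a foundation model.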
2. What’s specific about EO foundation models?
Foundation models for EO rest on the same core ideas as those in other domains, but they also face unique challenges due to the complexity and diversity of EO data. Traditional AI models in EO are often trained for a particular type of data or a specific application. For example, a model might be trained exclusively on RGB images for land cover classification or on SAR data for change detection, and it may not generalize well to other types of data without additional training. EO foundation models, by contrast, are pre-trained on extensive EO datasets (e.g. different sensors and bands) to handle a range of tasks across various regions and conditions. After pre-training on a broad range of EO data, they can be fine-tuned with smaller, task-specific datasets to adapt to particular applications or regions (see the sketch after the list below). This should make them highly versatile and effective for diverse tasks with relatively little additional effort.
A motivation for building EO foundation models is to improve generalizability with respect to two dimensions:
a. Task generalizability → being able to perform different types of tasks
b. Spatial generalizability → being able to work well across different geographies
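To make the fine-tuning step concrete, here is a minimal sketch of the common pattern: freeze a pre-trained backbone and train only a small task head on limited labels. A torchvision ResNet stands in here for an EO-pretrained encoder (an assumption made purely for illustration); a real EO foundation model would follow the same pattern with its own weights and multi-spectral inputs.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # e.g. land-cover classes; placeholder

# Load a pre-trained backbone (stand-in for an EO foundation model encoder).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False  # keep the pre-trained features frozen

# Replace the classifier with a fresh task head; only this layer is trained.
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.AdamW(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    """One gradient step on a small labeled batch."""
    loss = loss_fn(backbone(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random tensors standing in for a small labeled dataset.
print(train_step(torch.randn(4, 3, 224, 224), torch.randint(0, NUM_CLASSES, (4,))))
```

Because only the small head is updated, a few thousand (or even a few hundred) labeled samples can be enough, which is exactly the data-efficiency argument above.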
3. A practical example: object detection through time
Object detection (and the closely related task of image segmentation) has evolved significantly from its early days to the present, driven by advancements in algorithms and computing power. The following figure shows a broad overview of how the field has evolved over the years:
Now let’s look at a practical example: detecting oil tankers from satellite imagery. A couple of decades ago, scene segmentation techniques were used to calculate a similarity score, requiring hand-tuned parameters and working only under specific conditions. Later, datasets of thousands of images enabled machine learning models to outperform traditional methods, but these still required extensive feature engineering and struggled with complex scenes. Then came the deep learning era, with CNNs leveraging supervised learning. AlexNet (2012) sparked the deep learning revolution by demonstrating that model depth was key to its outstanding performance at the time.
Although deep learning (DL) excels at specific object detection tasks, DL models are still trained separately for different tasks or regions, making it difficult to apply one model across tasks or locations. For instance, when detecting oil tankers from satellite data, the background (sea water) can vary in color across regions, and sunshine, fog, or waves can substantially impair model performance. Foundation models, however, are trained on millions of scenes (covering not just the object we care about but many other objects too) and can generalize well across various objects, reducing false detections. It therefore seems possible to train one foundation model that serves a multitude of tasks such as crop type detection, flood monitoring, and land use change tracking.
4. Ecosystem and marketplace
Currently, there aren’t many startups working on EO foundation models, as doing so requires substantial computational resources, which can be quite costly (before even reaching decent performance). IBM and NASA have been collaborating on this for over a year, and they recently open sourced the largest geospatial AI foundation model (a hedged sketch of fetching it is shown below). Clay is a non-profit startup founded by some of the former Microsoft Planetary Computer builders, focusing on developing open-source models using open data. Many other organizations are sponsoring these efforts, and a collaborative approach seems to resonate with the community. In addition, there’s a lot of ongoing research on EO foundation models, and this GitHub repo is a fantastic resource that compiles various types of EO foundation models and benchmarks.
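As a small example of how accessible these open releases are, the snippet below downloads the IBM/NASA model files from the Hugging Face Hub; the repo id reflects the release at the time of writing and may change.

```python
from huggingface_hub import snapshot_download

# Fetch the open-sourced IBM/NASA geospatial model files (checkpoint + configs).
local_dir = snapshot_download(repo_id="ibm-nasa-geospatial/Prithvi-100M")
print(local_dir)  # local folder containing the downloaded files
```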
5. Opportunities
Here are a few potential use cases where current deep learning approaches still face challenges, and where EO foundation models may (may) outperform them and become quite useful:
Change detection → for defense and intelligence use cases, as well as environmental monitoring
Flood mapping → quite challenging for current computer vision models since the water color varies (and can be muddy), the background differs (e.g. desert vs. forest vs. concrete/asphalt), and it’s difficult to distinguish permanent water from flood water (a classical index-based baseline is sketched after this list)
Precision agriculture → crop health monitoring, crop type detection, and crop yield prediction
Cloud masking → still challenging to distinguish clouds from ice, water, sand, or soil (depending on the brightness and color of the scene), and difficult to generalize
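To illustrate why flood mapping is hard for hand-tuned approaches, here is the classical index-based baseline, the Normalized Difference Water Index (NDWI = (Green - NIR) / (Green + NIR), McFeeters 1996). The band arrays and the 0.0 threshold are placeholders, and it is exactly this fixed threshold that breaks down with muddy water or unusual backgrounds.

```python
import numpy as np

def ndwi_water_mask(green: np.ndarray, nir: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Boolean water mask from green and near-infrared reflectance bands."""
    ndwi = (green - nir) / (green + nir + 1e-9)  # epsilon avoids division by zero
    return ndwi > threshold  # water reflects green strongly and absorbs NIR

# Synthetic 2x2 reflectance values standing in for real satellite bands.
green = np.array([[0.30, 0.10], [0.25, 0.05]])
nir = np.array([[0.05, 0.40], [0.04, 0.35]])
print(ndwi_water_mask(green, nir))  # True where the pixel looks like open water
```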
The list can go on and it’s fair to assume that EO foundation models can basically target all applications that the EO industry is currently working on (and will hopefully outperform the current approaches). Undoubtedly, new markets will emerge as advanced solutions drive greater adoption of Earth Observation across various applications, significantly expanding the addressable market size.
This article has mainly focused on EO foundation models, but there’s potential for a more holistic geospatial foundation model that not only uses satellite data but also integrates climate and socio-economic data, as well as language/text, as inputs. Such a model could leverage cross-domain learning, become useful for semantic contextualization or risk assessment, and may even support reasoning and Q&A. In addition, foundation models are now being developed for specific use cases such as weather and climate forecasting; a recent example came from IBM and NASA, and I'll dive deeper into AI-driven weather forecasting in a future issue.
6. Challenges
Despite their transformative potential, EO foundation models come with several challenges, spanning technical, operational, and business-related factors that must be addressed to leverage these models effectively.
6.1. Technical challenges
Data availability and quality
Foundation models require vast and diverse datasets to generalize well across tasks and regions. Acquiring various satellite images (e.g. high-resolution, multi-spectral, or SAR data) can be costly and logistically difficult. In addition, some types of data or geographic regions are underrepresented in available datasets, leading to imbalanced training data that can limit model performance in certain cases. Lastly, while foundation models can leverage self-supervised learning to some extent, labeled data (and human feedback) is still essential for fine-tuning, and annotating satellite imagery at scale is expensive and time-consuming.
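Self-supervised pre-training is worth a quick illustration, since it is what lets foundation models learn from unlabeled archives in the first place. Below is a toy masked-reconstruction objective in the spirit of MAE-style pre-training: hide random patches and train the network to reconstruct them, with no labels involved. The tiny MLP and the shapes are illustrative stand-ins for a real transformer.

```python
import torch
import torch.nn as nn

PATCHES, DIM = 64, 128   # 64 flattened patches per image, 128 values each
MASK_RATIO = 0.75        # hide 75% of patches, as in MAE

model = nn.Sequential(   # stand-in for a transformer encoder/decoder
    nn.Linear(DIM, 256), nn.GELU(), nn.Linear(256, DIM)
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def pretrain_step(patches: torch.Tensor) -> float:
    """patches: (batch, PATCHES, DIM) from unlabeled imagery; no annotations."""
    mask = torch.rand(patches.shape[:2]) < MASK_RATIO  # which patches to hide
    corrupted = patches.clone()
    corrupted[mask] = 0.0                              # zero out masked patches
    recon = model(corrupted)
    loss = ((recon - patches)[mask] ** 2).mean()       # penalize only masked patches
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(pretrain_step(torch.randn(8, PATCHES, DIM)))  # one step on a random batch
```

Labeled data then only enters at the much smaller fine-tuning stage, which is where the annotation costs mentioned above actually bite.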
It’s worth noting that the rapid progress in AI (especially in computer vision) has been partially driven by high-quality benchmark datasets that were widely used by various research teams, and by open-source models (now commonly available on platforms like HuggingFace). To further accelerate the advancement of EO foundation models, it's crucial to prioritize creating new benchmarks and hosting competitions that foster innovation.
Compute and resource constraints
Training foundation models requires immense computational power (often demanding specialized hardware like GPUs or TPUs). This is a significant barrier (especially for startups), as the initial investment in infrastructure may be prohibitive. Moreover, training large models can be energy-intensive, raising concerns about environmental sustainability.
6.2. Operational challenges
Integration with existing workflows
For many industries, integrating a foundation model into their existing workflows can be complex. Legacy systems in industries like agriculture, insurance, or energy may not be built to handle the output of such complicated models, requiring significant changes in infrastructure and processes.
Interpretability, market adoption, and trust
In EO sectors like agriculture, insurance, or mining, convincing decision-makers to trust AI models (let alone foundation models) over existing simpler approaches can be an uphill battle. Uncertain return on investment (ROI) can aggravate the issue for some businesses, as the potential return may not justify the high initial costs of developing and deploying foundation models.
6.3. Business-related challenges
Foundation models, by nature, aim to generalize across many tasks. Defining a clear business model around a general-purpose capability can be challenging. Customers in different sectors have varying requirements and may not immediately see the value of a generalized solution unless it is tailored to their needs. EO foundation models require a huge upfront investment with no guarantee that they will provide tangible value and scalable solutions. To address this challenge, it's beneficial to start small and break the problem down, focusing initially on developing task-oriented foundation models.
7. Final remarks
Earth Observation (EO) foundation models represent a transformative step in geospatial AI and how we analyze satellite imagery at scale. By leveraging large-scale datasets and advanced deep learning architectures, these models have the potential to revolutionize industries like agriculture, climate monitoring, or insurance through improved generalization across domains and regions. However, challenges such as data availability, computational costs, and integration with existing workflows must be addressed for broader adoption.
Looking ahead, future developments in EO foundation models may focus on improving real-time capabilities, generalizing across even more diverse tasks, and enhancing model interpretability. As the field evolves, collaboration between industry, academia, and government agencies is crucial to ensure that these models not only achieve technical advancements but also provide tangible, actionable insights to address global challenges like climate change, resource management, and disaster response.