The ExtremeEarth project

The data and information that Copernicus processes and disseminates put it at the forefront of the big data paradigm and give rise to all the associated challenges, the so-called 5 Vs: volume, velocity, variety, veracity and value. Two important activities related to Copernicus are the Thematic Exploitation Platforms (TEPs) and the Data and Information Access Services (DIAS).

Although the TEPs and DIAS have been welcomed by the EO data user community, they share a disadvantage: they target users who are experts in EO data and technologies, and ignore the many software developers who might not be experts in EO but still have a lot to gain by integrating EO data in their applications. Opening up the TEPs and DIAS can therefore be an important way of making the development of downstream applications easy for both EO and non-EO experts. This means extracting the information and knowledge hidden in the data, publishing it using linked data technologies, and interlinking it with data in other TEPs and DIAS and with other non-EO data, information and knowledge.

For multimedia images, highly scalable Artificial Intelligence techniques based on deep neural network architectures have recently been developed by big North American companies such as Google and Facebook. Similar architectures for satellite images that can manage the extreme scale and characteristics of Copernicus data do not exist today. Deep neural network architectures can classify multimedia images effectively and efficiently because they have been trained on extremely large benchmark datasets consisting of millions of images (e.g., ImageNet) and have utilized the power of big data, cloud and GPU technologies. Training datasets consisting of millions of data samples do not exist today in the Copernicus context, and published deep learning architectures for Copernicus satellite images typically run on a single GPU and do not take advantage of recent advances such as distributed scale-out deep learning.

The main objective of ExtremeEarth is to go beyond the four projects mentioned above by developing extreme Earth analytics techniques and technologies that scale to the petabytes (PBs) of big Copernicus data, information and knowledge, and by applying these technologies in two of the ESA TEPs: Food Security and Polar.

The technologies to be developed will extend the HOPS data platform to offer unprecedented scalability to extreme data volumes and scale-out distributed deep learning for Copernicus data. The extended HOPS data platform will run on CREODIAS and will be available as open source to enable its adoption by the strong European Earth Observation downstream services industry.
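
To make the idea of scale-out distributed deep learning concrete, the following is a minimal sketch of synchronous data-parallel training, assuming a toy one-parameter linear model in place of a deep network. All names (shard, local_gradient, all_reduce_mean, train) are hypothetical and are not part of the HOPS platform API; the workers are simulated in a single process rather than on separate GPUs.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pairs for a toy model y = w * x


def shard(data: List[Sample], n_workers: int) -> List[List[Sample]]:
    """Split the dataset round-robin so each simulated worker holds one shard."""
    return [data[i::n_workers] for i in range(n_workers)]


def local_gradient(w: float, batch: List[Sample]) -> float:
    """Gradient of the mean squared error of y = w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def all_reduce_mean(values: List[float]) -> float:
    """Stand-in for an all-reduce: every worker would receive this mean."""
    return sum(values) / len(values)


def train(data: List[Sample], n_workers: int, lr: float = 0.01, steps: int = 200) -> float:
    """Synchronous data-parallel gradient descent on the toy model."""
    shards = shard(data, n_workers)
    w = 0.0
    for _ in range(steps):
        # Each worker computes a gradient on its own shard only.
        grads = [local_gradient(w, s) for s in shards]
        # The averaged gradient is applied identically by every worker.
        w -= lr * all_reduce_mean(grads)
    return w


# Toy data with true weight 3 (an assumption for illustration only).
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w_single = train(data, n_workers=1)
w_parallel = train(data, n_workers=4)
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so the four-worker run follows the same trajectory as the single-worker run while each worker touches only a quarter of the data per step.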

Food Security Use Case

Due to a changing Earth and a growing population, Food Security is one of the most challenging issues of this century. Biomass production and yield will need to be increased, and, above all, the risk of yield loss under extreme environmental conditions needs to be minimized.

Within the first month of the project, the user requirements for water availability information and irrigation recommendations were analysed. The planned services, covering a water availability information layer and crop status information, are now being designed and implemented using existing and new methods of EO analytics and model applications.

The technical infrastructure for EO data pre-processing, product exchange, and the intended provision and dissemination of results is supported by the Food Security TEP. By providing scalable processing infrastructure, ready-to-use EO products, and a wide range of options for new service developments and business models, the TEP will help to bring the developments of the Food Security use case to a wider community.

Polar Use Case

The Polar Regions play a critical role in regulating and driving the Earth's climate and ecological systems, and are currently experiencing significant change. New economic opportunities are resulting in increased attention and vessel traffic, leading to growing global interest both politically and economically.

During the first few months of the ExtremeEarth project we have analysed the user requirements within the Polar use case. Stakeholders and users were invited to a workshop in March where we discussed in detail their requirements for ice charting products. This information was combined with a number of previous reports which have considered the key requirements for ice charts.

The Polar TEP has recently been transferred to the CREODIAS infrastructure. This is the same infrastructure used to run Hopsworks, the platform on which we will develop and run our machine learning applications and pipelines. Co-locating these two systems will streamline future development of communications between them.

Deep Learning

Polar

Several deep learning architectures have been designed, implemented and tested. Two specific application scenarios have been addressed: the detection and discrimination of icebergs and ships, and the identification of the sea ice edge. The experimental results show that our approaches achieve good accuracy in both test scenarios.

The Polar use case of ExtremeEarth will design and implement deep learning architectures that can extract reliable and accurate information about sea ice and polar phenomena from large scale datasets, eventually including multiple sensors. Therefore, deep learning architectures for sea ice characterization must make the best use of the training data that is available, while guaranteeing the efficiency of the analysis system.

Food Security

To address the lack of labeled training datasets, a large database of weakly labeled samples will be extracted, in an unsupervised way, from uncertain and obsolete crop type maps available at the country level. The system generates annual crop type maps and crop boundary maps using this large database of labeled samples, automatically extracted from the existing thematic products. The training database will be made publicly available.
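
One way such an unsupervised extraction could work is sketched below. It assumes, purely for illustration, that each pixel carries the label from the outdated crop map plus a single numeric feature (for example a seasonal vegetation index), and that samples far from their class median are likely to be mislabeled. The function name select_weak_labels, the feature, and the threshold k are hypothetical and do not describe the project's actual method.

```python
from statistics import median
from typing import Dict, List, Tuple

Sample = Tuple[str, float]  # (label from the old crop map, feature value)


def select_weak_labels(samples: List[Sample], k: float = 2.0) -> List[Sample]:
    """Keep samples whose feature lies within k median absolute deviations
    (MAD) of the median of their class; discard the rest as unreliable."""
    by_class: Dict[str, List[float]] = {}
    for label, feature in samples:
        by_class.setdefault(label, []).append(feature)

    # Robust per-class statistics: median and MAD tolerate outdated labels.
    stats: Dict[str, Tuple[float, float]] = {}
    for label, values in by_class.items():
        med = median(values)
        mad = median([abs(v - med) for v in values])
        stats[label] = (med, mad)

    kept = []
    for label, feature in samples:
        med, mad = stats[label]
        if abs(feature - med) <= k * mad:
            kept.append((label, feature))
    return kept


# Toy samples (invented values): one doubtful pixel per class.
samples = [
    ("wheat", 0.80), ("wheat", 0.82), ("wheat", 0.78), ("wheat", 0.81),
    ("wheat", 0.30),
    ("maize", 0.60), ("maize", 0.62), ("maize", 0.61), ("maize", 0.95),
]
weak_labels = select_weak_labels(samples)
```

The median and MAD are used instead of the mean and standard deviation because they are less affected by the very mislabeled samples the filter is meant to remove.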

The ExtremeEarth project aims to develop a deep learning network architecture tailored to the specific spatial, spectral and temporal properties of Sentinel-2 images. Although deep learning architectures typically outperform standard machine learning classification systems, most deep systems for remote sensing focus on very high-resolution optical images. The few ad hoc architectures proposed for Sentinel-2 data are typically trained on a relatively small number of samples and tested on a few benchmark images, without assessing their real generalization ability.

ExtremeEarth Infrastructure

The deep learning architectures and techniques discussed so far will be implemented on Hopsworks, a horizontally scalable platform for Data Intensive Artificial Intelligence.

Hopsworks provides an integrated platform for managing the entire lifecycle of data as well as developing machine learning applications and pipelines. Hopsworks offers the world's most scalable Hadoop distribution and its unique metadata architecture HopsFS.

In the ExtremeEarth project, deep analysis is done with the deep learning architectures that will be engineered using Hopsworks so that they scale to big Copernicus data. Hopsworks provides services to move the processing to the data and is based on a Cloud Computing Platform-as-a-Service approach.

Once the ExtremeEarth technologies are integrated in Hopsworks, they will be deployed in the two TEPs and the selected DIAS (CREODIAS). Copernicus data will also be made available in the same environment and will be used to develop the ExtremeEarth technologies.

Linked Data Tools

The result of the deep learning techniques for satellite image analysis will be geospatial information, encoding knowledge about the domain of each of the two use cases of ExtremeEarth. ExtremeEarth will follow the linked data paradigm in order to allow users to extract the expected value from this knowledge by accessing semantic information, interlinking this information with other available open linked data sources and publishing this knowledge in order to be reused.

GeoTriples supports the automatic transformation of geospatial data from various formats into RDF using Semantic Web standards. GeoTriples has been extended and deployed on the HOPS platform, enabling users to transform big geospatial data into RDF at the extreme scales of the Copernicus paradigm.
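
To illustrate the kind of output such a transformation produces, here is a hand-written sketch that turns one polygon feature into N-Triples using the OGC GeoSPARQL vocabulary. It is not GeoTriples itself and does not use its interface; the example.org namespaces, the cropType property and the function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

GEO = "http://www.opengis.net/ont/geosparql#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DATA = "http://example.org/field/"   # hypothetical namespace for instances
VOCAB = "http://example.org/vocab#"  # hypothetical namespace for properties


def polygon_wkt(ring: Sequence[Tuple[float, float]]) -> str:
    """Serialise a closed ring of (x, y) coordinates as a WKT polygon."""
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


def feature_to_ntriples(feature_id: str, crop: str,
                        ring: Sequence[Tuple[float, float]]) -> List[str]:
    """Describe one field as a GeoSPARQL feature with a WKT geometry."""
    subject = f"<{DATA}{feature_id}>"
    geometry = f"<{DATA}{feature_id}/geometry>"
    return [
        f"{subject} <{RDF_TYPE}> <{GEO}Feature> .",
        f'{subject} <{VOCAB}cropType> "{crop}" .',
        f"{subject} <{GEO}hasGeometry> {geometry} .",
        f"{geometry} <{RDF_TYPE}> <{GEO}Geometry> .",
        f'{geometry} <{GEO}asWKT> "{polygon_wkt(ring)}"^^<{GEO}wktLiteral> .',
    ]


# A unit square standing in for a field boundary (invented coordinates).
triples = feature_to_ntriples("42", "wheat", [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])
```

Each feature is linked to a separate geometry resource, which is the usual GeoSPARQL pattern and lets a spatial RDF store index the WKT literal independently of the thematic attributes.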

JedAI is based on the meta-blocking technique for entity resolution, as pioneered by UoA. It is being extended with the ability to discover geospatial relationships among resources in a geospatial RDF store, as implemented in the Silk and Radon tools.
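
The general idea of blocking for geospatial link discovery can be sketched as follows: instead of comparing every pair of geometries, only pairs that fall into a common grid cell are verified. This is a deliberately simplified grid scheme on bounding boxes, not the algorithm used in JedAI, Silk or Radon; all names and the cell size are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import floor
from typing import Dict, Iterator, List, Set, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def cells(box: BBox, size: float) -> Iterator[Tuple[int, int]]:
    """Yield every grid cell that the bounding box overlaps."""
    min_x, min_y, max_x, max_y = box
    for cx in range(floor(min_x / size), floor(max_x / size) + 1):
        for cy in range(floor(min_y / size), floor(max_y / size) + 1):
            yield (cx, cy)


def intersects(a: BBox, b: BBox) -> bool:
    """Exact intersection test for two axis-aligned bounding boxes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs(boxes: Dict[str, BBox], size: float) -> Set[Tuple[str, str]]:
    """Blocking step: pair up only resources that share a grid cell."""
    grid: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for name, box in boxes.items():
        for cell in cells(box, size):
            grid[cell].add(name)
    pairs: Set[Tuple[str, str]] = set()
    for names in grid.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


def discover_intersections(boxes: Dict[str, BBox], size: float = 1.0) -> List[Tuple[str, str]]:
    """Verification step: run the exact test on the candidates only."""
    return sorted(p for p in candidate_pairs(boxes, size)
                  if intersects(boxes[p[0]], boxes[p[1]]))


# Invented bounding boxes for four resources.
boxes = {
    "a": (0.0, 0.0, 1.5, 1.5),
    "b": (1.0, 1.0, 2.0, 2.0),
    "c": (5.0, 5.0, 6.0, 6.0),
    "d": (0.2, 1.7, 0.4, 1.9),
}
candidates = candidate_pairs(boxes, 1.0)
links = discover_intersections(boxes)
```

In this toy input only two of the six possible pairs become candidates, and the exact test then rejects the pair that shares a cell without actually intersecting.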

Strabon is an open-source spatiotemporal RDF store. In ExtremeEarth, a new cloud-based version of Strabon is under development, aiming to scale to petabytes of data on the HOPS platform.

SemaGrow is a data federation engine that facilitates unified access to geospatial and relational data. SemaGrow will be adapted and extended, and the resulting component will be used to federate geospatial data sources residing in the new cloud-based Strabon implementation and in other geospatial data servers.

1st User Community Workshop

On March 18 and 19, 2019, a dedicated workshop for the ExtremeEarth project was held in Munich, Germany. This 1st User Workshop brought together both the Polar and the Food Security communities.

For the Food Security use case, ExtremeEarth partners discussed the expectations and needs of potential users regarding the development of wide-scale water availability maps for sustainable agricultural production.

For the Polar use case, ExtremeEarth partners came together with the ice-charting community to map out the future of deep learning applied to operational sea ice and iceberg analysis. Several key stakeholders presented their own areas of interest in relation to the Polar use case.

Copyright AI TEAM 2019