From Copernicus Big Data to Extreme Earth Analytics

Rapidly increasing volumes of diverse data from distributed sources make it harder to extract valuable knowledge and commercial value from data. This calls for novel methods, approaches and engineering paradigms in analytics and data management. Because success will require not only efficient data processing and management but also sufficient computing capacity and connectivity, coordinated action across all related areas is necessary and will contribute to European leadership in these areas.

Specific Challenge in Big Data technologies and extreme-scale analytics
ExtremeEarth is an H2020 proposal in the area of ICT addressing the call ICT-12-2018-2020: Big Data technologies and extreme-scale analytics.

ExtremeEarth addresses this challenge directly by concentrating on Copernicus data, arguably the most paradigmatic example of big open data in the European Union today. Extracting knowledge and commercial value from Copernicus data is of utmost importance for European industry and society. ExtremeEarth addresses this explicitly through two use cases that bring together the Food Security and Polar Thematic Exploitation Platforms (TEPs). Knowledge is extracted using deep learning techniques that operate at the extreme data scales expected in Copernicus. ExtremeEarth also manages the information and knowledge extracted from Copernicus data using Semantic Web and Linked Data technologies. Implementing these technologies in the European Hopsworks data platform will enable extreme geospatial analytics to be computed on top of Copernicus information and knowledge.

ExtremeEarth builds on state-of-the-art technologies from the research areas of Remote Sensing, Deep Learning, Big Data, Distributed Systems, Semantic Web and Linked Geospatial Data. The project partners' existing implementations of these technologies will be re-engineered so that they scale to the volumes of data, information and knowledge, and to the extreme Earth analytics, of the Copernicus setting. The Hopsworks Data Platform of partners KTH and Logical Clocks will play a major role in advancing the state of the art in all these areas, especially with respect to data volume and scale-out deep learning. Hopsworks is a platform for managing data, compute and GPUs in a data centre setting. It can scale to store an order of magnitude more data than existing Hadoop clusters, and it integrates with other open-source big data and deep learning systems such as Spark, TensorFlow, Keras, TensorFlowOnSpark and Horovod.