The standard way for Earth observation experts or other users to retrieve images from image archives (e.g., ESA’s Copernicus Open Access Hub) is through a graphical user interface, where they select the geographical area they are interested in. To enhance this capability, AI4Copernicus researchers have developed a question-answering engine, EarthQA, which takes as input a question expressed in natural language (in English) asking for satellite images that satisfy certain criteria, and returns links to matching datasets, which can then be downloaded from the CREODIAS cloud platform.

The question-answering engine is aimed at users of Earth Observation data and will enable them to discover datasets by posing natural language questions (like those posed to search engines such as Google).

We are therefore collecting interesting questions from users like you, and we kindly ask you to provide us with 10 questions that could be posed to a system such as the Copernicus Open Access Hub, one of the five DIASes, or any other Earth Observation data portal, with the intention of discovering an EO dataset of interest to you.

Besides the geographical area, users can specify other metadata, such as the sensing period, satellite platform and cloud cover. To answer user questions, EarthQA queries two interlinked knowledge graphs: a knowledge graph encoding metadata of satellite images from the CREODIAS cloud platform (exposed through the SPARQL endpoint of CREODIAS) and the well-known knowledge graph DBpedia. Hence, questions can refer to image metadata (e.g., satellite platform, sensing period, cloud cover), but also to geospatial entities appearing in the DBpedia knowledge graph (e.g., lakes, Greece). In this way, users can ask questions like “Find all Sentinel-1 GRD images taken during October 2021 that show large lakes in Greece having an area greater than 100 square kilometers”.
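To make the example concrete, such a question might translate into a GeoSPARQL query along the following lines. This is a hand-written illustration, not EarthQA's actual output: the `:` prefix and the property names used for the image metadata (e.g., `:productType`, `:sensingStart`), as well as the unit handling for the area value, are assumptions.

```sparql
# Illustrative sketch only; the ":" image-metadata vocabulary is assumed.
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX dbp:  <http://dbpedia.org/property/>
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

SELECT ?image WHERE {
  # Geospatial part, answered from DBpedia: large lakes in Greece.
  ?lake a dbo:Lake ;
        dbo:country dbr:Greece ;
        dbp:area ?area ;
        geo:hasGeometry/geo:asWKT ?lakeGeom .
  FILTER(?area > 100000000)   # 100 square kilometers in m^2 (units assumed)

  # Image-metadata part, answered from the CREODIAS knowledge graph.
  ?image :productType "GRD" ;
         :platform "Sentinel-1" ;
         :sensingStart ?start ;
         geo:hasGeometry/geo:asWKT ?imgGeom .
  FILTER(?start >= "2021-10-01T00:00:00"^^xsd:dateTime &&
         ?start <  "2021-11-01T00:00:00"^^xsd:dateTime)

  # Spatial join between the image footprint and the lake geometry.
  FILTER(geof:sfIntersects(?imgGeom, ?lakeGeom))
}
```

The key point the sketch shows is the interlinking: the lake comes from DBpedia, the image metadata from the CREODIAS graph, and a GeoSPARQL spatial filter connects the two.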

EarthQA follows a template-based approach to translate natural language questions into formal queries (GeoSPARQL). It first decomposes the user question by generating its dependency parse tree and then automatically disambiguates the components of the question against elements of the two knowledge graphs. In particular, it identifies the spatial or temporal entities (e.g., “Greece”, “October 2021”), concepts (e.g., “lake”), spatial or temporal relations (e.g., “in”, “during”), properties (e.g., “area”), product types (e.g., “Sentinel-1 GRD”) and other metadata constraints (e.g., “cloud cover below 10%”) mentioned in the question, and maps them to the corresponding elements of the two knowledge graphs (dbr:Greece, dbo:Lake, dbp:area, etc.). The GeoSPARQL query is then generated automatically from these mappings.
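The disambiguation step can be pictured as lookups against the two knowledge graphs. For instance, resolving the phrase “Greece” to a DBpedia resource might amount to a label-matching query like the one below; this is an illustrative sketch of the idea, not EarthQA's actual disambiguation mechanism.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Resolve the question phrase "Greece" to a DBpedia entity via its English label.
SELECT ?entity WHERE {
  ?entity rdfs:label "Greece"@en .
}
LIMIT 1
```

On the public DBpedia endpoint such a lookup matches dbr:Greece; in practice a real system would also have to rank candidates when several resources share a label.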