IDEAA: Issue-Driven European Arena Analytics
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant agreement No. 800192.
The modern world is witnessing an explosion of digital content: virtually all interactions taking place in society are either directly digital or are captured and recorded digitally. A good example is provided by the EU: the European Data Portal, managed by the Publications Office of the EU, provides more than 10,000 datasets produced by EU institutions and bodies (data about elected officials, regulations and EU law, etc.) and harvests more than 750,000 datasets of Public Sector Information available on public data portals across European countries (data about environment, economy, finance, etc.). The data are publicly available and “are free to use and reuse for commercial or non-commercial purposes”. The portal also supports linked data by providing a metadata SPARQL endpoint and, among the available sources, more than 20,000 datasets provide RDF data.
This creates opportunities for EU citizens to witness, analyse and monitor the functioning of their representative institutions. It also creates tremendous challenges in making sense of the data, in particular in interpreting them in the context of the needs of individuals or specific communities. For instance, a citizen interested in a certain economic sector or societal topic such as “undergraduate education” has no easy means to follow EU institution activity on that issue and compare it with activity and regulations in their own country. A lot of public information is available (e.g. the EU Data Portal alone provides more than 13,000 datasets about different aspects of “education” in different countries, 22 of them specifically about “undergraduate education”). However, even for citizens who diligently follow all the public information, it is very difficult to acquire a comprehensive picture of the situation and to understand the consequences, implications and correlations of the issue within a bigger (European) arena. Further, citizens are easily confused or “lost” in the huge amount of possibly conflicting data. This lack of knowledge amid an abundance of content hinders individuals’ full participation in modern society.
IDEAA (Issue-Driven European Arena Analytics) aims to let citizens easily explore the trove of publicly available data in order to build a viewpoint on specific issues. Its main strengths are: (feature F1) supplying users with succinct and meaningful knowledge about the issue they are interested in; (F2) allowing users to interact with the provided knowledge to refine their information need and advance their understanding; (F3) suggesting interesting or unexpected aspects of the data; and (F4) supporting the comparison of knowledge discovered from different data sources. IDEAA is inspired by human-to-human dialogues, where questions are explorative and possibly imprecise, and answers may be somewhat inaccurate but suggestive, conveying an idea that stimulates the interlocutor to ask further questions.
Imagine Alice is moving to Paris for work and is looking for a good neighbourhood to live in. There are a great many advertisements for available places; with time, patience and determination she goes through most of them and notices some appealing ones. Nevertheless, before committing, she wants to make sure they are in a safe neighbourhood with good public transportation. She finds several articles about crimes, the map and hours of the underground system, etc.; however, there are simply too many data for her to use easily. Now, the EU Data Portal provides several datasets about “crimes” and, by exploiting them, IDEAA can help Alice acquire an initial outline of the issue by providing a concise answer such as “crime statistics by crime type, neighbourhood and year” (feat. F1). By interacting with this answer, Alice can narrow the information down to a specific part of the city (feat. F2), e.g. focusing on the neighbourhoods close to work. At this point Alice might notice that “there are few crimes in her work neighbourhood but the numbers are higher in the surrounding neighbourhoods”. It would probably be too hard, however, to see at a glance that “the number of crimes in one particular neighbourhood X is dropping much more and much faster than in any other part of the city” (feat. F3). This insight might lead Alice to discover, e.g., that just one year before, a new police station was built in that neighbourhood. Moreover, it would allow her to make a more informed and considered decision: even though the number of crimes in her work neighbourhood is lower at the moment, the trend suggests that, given some time, neighbourhood X will probably be safer.
The EU Data Portal also provides several datasets about “public transportation” and, by using IDEAA, Alice will be able to compare the knowledge about the two topics. She might discover that “before 2016, the number of crimes and the number of people taking public transportation were both growing slowly, whereas after 2016 they show opposite trends, with crimes dropping and the number of people taking public transportation growing fast” (feat. F4). This insight could strengthen Alice’s choice or inspire her to ask for more information. Note that IDEAA guides users to grasp the gist of the data, make decisions or just stay up to date by exposing properties of the data rather than the data themselves.
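The kind of summary and insight described in Alice's scenario (feat. F1 and F3) can be illustrated with a minimal sketch. It is not IDEAA's implementation: the records, the neighbourhood labels and the function names `aggregate` and `steepest_decline` are hypothetical, and a simple difference between the first and last year stands in for whatever trend measure a real system would use.

```python
from collections import defaultdict

# Hypothetical toy records (neighbourhood, year, crime_type, count); not real data.
RECORDS = [
    ("A", 2015, "theft", 40), ("A", 2015, "assault", 10),
    ("A", 2016, "theft", 38), ("A", 2016, "assault", 11),
    ("A", 2017, "theft", 37), ("A", 2017, "assault", 10),
    ("X", 2015, "theft", 90), ("X", 2015, "assault", 30),
    ("X", 2016, "theft", 70), ("X", 2016, "assault", 20),
    ("X", 2017, "theft", 45), ("X", 2017, "assault", 12),
]


def aggregate(records):
    """Total crimes per neighbourhood and year (a concise summary, as in feat. F1)."""
    totals = defaultdict(lambda: defaultdict(int))
    for neighbourhood, year, _crime_type, count in records:
        totals[neighbourhood][year] += count
    return {n: dict(years) for n, years in totals.items()}


def steepest_decline(totals):
    """Neighbourhood whose total fell most from its first to its last year (feat. F3)."""
    def change(years):
        first, last = min(years), max(years)
        return years[last] - years[first]
    return min(totals, key=lambda n: change(totals[n]))


totals = aggregate(RECORDS)
print(totals)
print(steepest_decline(totals))
```

On this toy data the sketch singles out neighbourhood X, whose yearly total falls from 120 to 57 while A's falls only from 50 to 47. This is the sort of property of the data, rather than the data themselves, that the scenario has in mind.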
IDEAA focuses on semantic data (RDF), whose volume is growing steadily. Standards and technologies for semantic data are sufficiently mature to serve as the foundation of novel data science projects. However, much remains to be done to provide easy-to-use tools for users seeking to understand and exploit the information contained in RDF graphs, as consolidated approaches for RDF analytics are lacking.