Potentials and Benefits of Linked Open Government Data
Dr. Sören Auer

Fraunhofer-Gesellschaft, the largest organization for applied research in Europe
• 67 institutes and research units
• More than 23,000 staff
• Annual research budget totaling €2 billion
• Roughly two thirds of this sum is generated through contract research on behalf of industry and publicly funded research projects
• Roughly one third is contributed by the German federal and Länder governments in the form of base funding
• International cooperation

Joseph von Fraunhofer (1787 – 1826)
• Researcher: discovery of the “Fraunhofer lines” in the solar spectrum
• Inventor: new methods for processing lenses
• Entrepreneur: director and partner in a glassworks

© Deutsches Museum

© Fraunhofer-Gesellschaft

The Fraunhofer-Gesellschaft
• Researcher: research and development on behalf of industry and state
• Inventor: mp3 music format, white LED, high-resolution thermal camera
• Entrepreneur: research volume of approx. €2 billion annually

© Fraunhofer

The Fraunhofer-Gesellschaft: Locations in Germany
• 67 institutes and research units
• More than 23,000 staff

Map of locations (legend: institute/research unit, other research unit, headquarters): Rostock, Itzehoe, Lübeck, Bremerhaven, Oldenburg, Hamburg, Bremen, Wolfsburg, Berlin, Hannover, Potsdam-Golm, Wildau, Teltow, Braunschweig, Magdeburg, Münster, Lemgo, Goslar, Gelsenkirchen, Oberhausen, Dortmund, Paderborn, Schkopau, Duisburg, Kassel, Köln, Euskirchen, Aachen, Cottbus, Halle, Göttingen, Bonn, Sankt Augustin, Leipzig, Dresden, Moritzburg, Leuna, Schmallenberg, Erfurt, Freiberg, Jena, Hermsdorf, Gießen, Zittau, Chemnitz, Wachtberg, Ilmenau, Remagen, Coburg, Frankfurt, Alzenau, Aschaffenburg, Würzburg, Darmstadt, Sulzbach, Mannheim, St. Ingbert, Kaiserslautern, Saarbrücken, Wertheim/Bronnbach, Bayreuth, Sulzbach-Rosenberg, Erlangen, Nürnberg, Fürth, Regensburg, Karlsruhe, Pfinztal, Ettlingen, Deggendorf, Straubing, Stuttgart, Esslingen, Augsburg, Weßling, Freiburg, Kandern, Efringen-Kirchen, Freising, München, Garching, Rosenheim, Prien, Holzkirchen.

Fraunhofer worldwide

Map of locations: Glasgow, Vancouver, London, Boston, East Lansing, San José, Cambridge, Plymouth, Storrs, Maryland, Newark, Gothenburg, Dublin, Southampton, Wrocław, Brussels, Vienna, Paris, Budapest, Bolzano, Graz, Porto, Beijing, Thessaloniki, Seoul, Sendai, Tokyo, Jerusalem, Cairo, Dubai, Bangalore, Ampang, Singapore, Jakarta, Salvador, Campinas, São Paulo, Santiago de Chile, Stellenbosch, Sydney.

Legend: Subsidiary, Center, Project Center, ICON / Strategic Cooperation, Representative / Marketing Office, Senior Advisor

The Web evolves into a Web of Data
• Linked Open Data
• Facebook Open Graph

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Creating Knowledge out of Interlinked Data

What is Linked Open Gov Data?
• Linked Open Government Data
• Linked Data

We need an Open Government Data Ecosystem

Business • Startups • SMEs • Large companies

Civil society • Grass-roots initiatives • NGOs • Pluralism

Research • Science • Collaborative projects

Politics • On different levels: (supra-) national, regional, municipal

Potential of Open Data

Open Data Strategy for Europe (Reference: IP/11/1524):
• expected to deliver a €40 billion boost to the EU's economy each year
• make it easier for businesses to obtain access and permission to reuse government-held information

Data becomes a tradable commodity, open data is on the rise
• “the value of data increases the more you share it”
• “Data is the new oil”

Example US Weather Data

Opening US weather data [1] led to
• gross receipts by the commercial weather industry of USD 400–700 million a year, with
• 400 firms employing 4,000 people

Europe had a similar-sized economy but closed weather data:
• only 30 firms with 300 employees and receipts of USD 30–50 million a year

[1] http://www.nap.edu/openbook.php?record_id=10610&page=23

Examples in Spain and UK

Spain: the in-country business volume directly associated with open data released by the national government was [1]
• €550–650 million (USD 669–791 million)
• between 5,000 and 5,500 employees were directly assigned to activities related to re-using information

UK: the Centre for Economics and Business Research says [2]
• 58,000 new jobs connected with the growing data economy will be created in the UK between now and 2017
• Deloitte gives advice for the growing data economy in the UK: Open data - driving growth, ingenuity and innovation, White Paper, Deloitte Analytics, London, 2012 (PDF)

[1] http://www.ontsi.red.es/ontsi/en/estudios-informes/characterization-study-infomediary-sectorjune-2011
[2] http://www.sas.com/reg/gen/uk/data-equity

Potential of Open Data

©Nigel Holmes 2012 / from The Human Face of Big Data

• Open Data Strategy (Ref: IP/11/1524) expected to deliver a €40 billion boost to the EU's economy each year
• Data becomes a tradable commodity, “the value of data increases the more you share it”

Open Budget Data

• Let citizens build and experiment with apps visualizing governmental spending – leverage the creative potential
• Reach better societal agreement
• Identify overspending, potential for savings, misuse

Open Educational Data

• Identify excellence / underperformance in education
• Foster competition
• Create transparency
• Make education more efficient and effective

A similar strategy can be pursued with
• health care (physicians, hospitals)
• public administrations
• environment
• …

Open Transportation Data – Increase Efficiency of Public Transport

• It is hard for public transportation agencies to develop the best user interfaces
• Why not let citizens and companies solve this?

What has to be done?

Where we should be
• Publish Open Data in RDF, reusing vocabularies which can be understood and combined by apps in unforeseen ways
• link your data

Where we are now
• use URIs to denote things
• use non-proprietary formats (e.g., CSV instead of Excel)
• make it available as structured data (e.g., Excel instead of image scan of a table)
• make your stuff available on the Web (whatever format) under an open license
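The steps listed here build on each other, so they can be read as a cumulative checklist. A minimal Python sketch of that reading, assuming a hypothetical `Dataset` record whose boolean fields are invented for illustration and are not part of any real catalogue API:

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (fields are illustrative)."""
    on_web_open_license: bool  # available on the Web under an open license
    structured: bool           # e.g. Excel instead of an image scan of a table
    non_proprietary: bool      # e.g. CSV instead of Excel
    uses_uris: bool            # URIs denote things
    linked: bool               # data is linked to other data


def ladder_level(ds: Dataset) -> int:
    """Count how many consecutive steps of the ladder a dataset has reached."""
    steps = [ds.on_web_open_license, ds.structured, ds.non_proprietary,
             ds.uses_uris, ds.linked]
    level = 0
    for reached in steps:
        if not reached:
            break  # steps are cumulative: a missing step stops the climb
        level += 1
    return level


csv_file = Dataset(True, True, True, False, False)
scanned_table = Dataset(True, False, False, False, False)
print(ladder_level(csv_file), ladder_level(scanned_table))
```

A plain CSV file published under an open license stops at the third step; reaching the top requires URIs and links.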

Publishing Data about Kindergartens in XML (1)



Seven Dwarfs ...

...


Publishing Data about Kindergartens in XML (2)



... ...
...


Publishing Data about Kindergartens in XML (3)


address=„...“> . . .



Problem of XML





[XML fragments describing “Seven Dwarfs” with differing structures]

• Syntactic heterogeneity – different trees
• Semantic heterogeneity – different tags and attributes (e.g. kindergarten, child_care, daycare)
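Both kinds of heterogeneity can be made concrete with a small Python sketch. The three XML fragments below are hypothetical: only the tag names kindergarten, child_care and daycare come from the slide, while the surrounding structure and attribute names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Three hypothetical encodings of the same facility; structure and attribute
# names are invented and are not the original markup from the slides.
DOC_A = "<kindergarten><name>Seven Dwarfs</name></kindergarten>"
DOC_B = '<child_care name="Seven Dwarfs"/>'
DOC_C = "<facilities><daycare><title>Seven Dwarfs</title></daycare></facilities>"


def name_from_a(xml_text: str):
    """Layout A: the name is a child element of the root."""
    return ET.fromstring(xml_text).findtext("name")


def name_from_b(xml_text: str):
    """Layout B: the name is an attribute of the root."""
    return ET.fromstring(xml_text).get("name")


def name_from_c(xml_text: str):
    """Layout C: the name sits one level deeper under a differently named tag."""
    return ET.fromstring(xml_text).findtext("daycare/title")


names = [name_from_a(DOC_A), name_from_b(DOC_B), name_from_c(DOC_C)]
print(names)
print(name_from_a(DOC_B))  # a parser written for layout A finds nothing in layout B
```

Each publisher needs its own extraction function, and a function written for one layout silently returns nothing on another.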

Maybe CSV helps?

Kindergarten    Location                  Description
Seven Dwarfs    Rosentalgasse 9, 04105

Child_care      street           Zip      text
Seven Dwarfs    Rosentalgasse    04105

Type       Name            Location               Features
Daycare    Seven Dwarfs    42.052384|13.273679

A nightmare …

Imagine you have 10,000 open data files describing child care from communities all over Europe, all in different XML, CSV, Excel, JSON, … formats. And then you want to look into pollution, road congestion, health care, …

Linked Data can help

Linked Data in a Nutshell

1. Uses the RDF data model (Subject, Predicate, Object)
   NIA --organizes--> FutureTech-Conf
   FutureTech-Conf --starts--> 30.4.2013
   FutureTech-Conf --takesPlaceIn--> Seoul

2. Is serialised in triples:
   NIA organizes FutureTech-Conf .
   FutureTech-Conf starts “20130324”^^xsd:date .
   FutureTech-Conf takesPlaceAt Seoul .

3. Uses content negotiation
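The three points can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real Linked Data server: the base URI `http://example.org/` is hypothetical, the triple values are copied verbatim from the slide, and the `negotiate` function only checks for a substring in the `Accept` header instead of implementing full HTTP content negotiation with quality values.

```python
import html
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical base URI; the slides do not give real identifiers.
BASE = "http://example.org/"

# Triples as listed on the slide (names and literal copied verbatim).
TRIPLES: List[Triple] = [
    ("NIA", "organizes", "FutureTech-Conf"),
    ("FutureTech-Conf", "starts", '"20130324"^^xsd:date'),
    ("FutureTech-Conf", "takesPlaceAt", "Seoul"),
]


def term(value: str) -> str:
    """Keep a literal unchanged and wrap any other name as a URI reference."""
    return value if value.startswith('"') else f"<{BASE}{value}>"


def to_triple_lines(triples: List[Triple]) -> str:
    """Machine-readable view: one 'subject predicate object .' statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


def to_html(triples: List[Triple]) -> str:
    """Human-readable view of the same statements as an HTML table."""
    rows = "".join(
        f"<tr><td>{html.escape(s)}</td><td>{html.escape(p)}</td>"
        f"<td>{html.escape(o)}</td></tr>"
        for s, p, o in triples
    )
    return f"<table>{rows}</table>"


def negotiate(accept_header: str) -> str:
    """Simplified content negotiation: same resource, different representation."""
    if "text/html" in accept_header:
        return to_html(TRIPLES)
    return to_triple_lines(TRIPLES)


print(negotiate("text/turtle"))
```

A browser sending `Accept: text/html` would receive the table, while a data client asking for an RDF format would receive the triple lines for the same resource.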

The emerging Web of Data

Linking Open Data cloud diagram, 2007 to 2010, by Richard Cyganiak and Anja Jentzsch.

How can we support Linked Government Data Publishing and Integration?

Pan-European Data Portal PublicData.eu

publicdata.eu - find and reuse >20,000 datasets from 30 regional and national data catalogs across Europe from a single place (incl. Data.kk.dk, opengov.se, http://ckan.hri.fi, http://lt.ckan.net, digitaliser.dk)

For:
• Data literate citizens
• Data journalists
• Policy experts
• Decision makers
• Mobile & web developers
• Academics / researchers
• Public bodies
• Companies
• Civic society / NGOs
• ...

• Exchange of metadata between different data catalogues
• Aggregate datasets from existing data catalogues
• Creating a European community of reusers to improve metadata
• Creating mechanisms for capturing derived / related datasets
• Bridge language and topical gaps to associate related information

EU-FP7 LOD2 Project Overview .

http://lod2.eu

PublicData.eu CSV Statistics

Share of resources by format:
• 31.0% - Not Specified
• 24.0% - CSV
• 11.6% - Spreadsheets
• 10.9% - Text
• 9.7% - Geo Data
• 1.7% - RDF

Format                        Number of resources
Total                         137470
N/A                           42758
CSV                           33235
Spreadsheet                   16019
Text                          15014
Geographical                  13360
Other (machine-readable)      9372
Archives                      2455
RDF                           2373
Other                         1777
Rich text documents (Word)    230
Databases                     59
APIs                          40
PDF                           4

EU-FP7 LOD2 – 11.-12.9.2014.
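The percentage shares can be recomputed from the resource counts listed above. A short Python sketch using only numbers from the slide; the recomputed shares come out close to, but not always identical to, the rounded percentages shown (for example about 24.2% versus 24.0% for CSV).

```python
# Resource counts per format, copied from the slide.
COUNTS = {
    "N/A": 42758,
    "CSV": 33235,
    "Spreadsheet": 16019,
    "Text": 15014,
    "Geographical": 13360,
    "RDF": 2373,
}
TOTAL = 137470


def share(fmt: str) -> float:
    """Percentage of all resources published in the given format."""
    return 100.0 * COUNTS[fmt] / TOTAL


for fmt in COUNTS:
    print(f"{fmt:<14}{share(fmt):5.1f}%")
```

Whichever rounding is used, the contrast is the same: CSV resources outnumber RDF resources by roughly fourteen to one.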

How can we fix Open Data?

We need:
• Standard formats which preserve semantics: RDF
• Reuse of vocabularies
• Visualization widgets, mashups, apps which can make sense of those vocabularies

7 Dwarfs in RDF

Seven_Dwarfs    rdf:type            Kindergarten
Seven_Dwarfs    rdfs:label          “Seven Dwarfs”
Seven_Dwarfs    foaf:location       “Rosentalgasse 9”
Seven_Dwarfs    rdfs:description    “...”
...

• Different Kindergarten descriptions might still look different, but there will definitely be less variety than with XML or CSV
• You can mix and match different vocabularies (RDF, RDFS, FOAF)
• More information can be added without destroying the data structure
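The last point can be illustrated with plain Python sets standing in for RDF graphs. This is a sketch under assumptions: the property names are copied verbatim from the slide for illustration and may not correspond to terms actually defined by those vocabularies, and `ex:capacity` with its value is a hypothetical addition by a second publisher.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Triples from the slide; property names are copied verbatim for illustration.
source_a: Set[Triple] = {
    ("Seven_Dwarfs", "rdf:type", "Kindergarten"),
    ("Seven_Dwarfs", "rdfs:label", "Seven Dwarfs"),
    ("Seven_Dwarfs", "foaf:location", "Rosentalgasse 9"),
}

# A second, hypothetical publisher adds a fact using an invented property.
source_b: Set[Triple] = {
    ("Seven_Dwarfs", "rdfs:label", "Seven Dwarfs"),
    ("Seven_Dwarfs", "ex:capacity", "45"),
}

# Merging two sets of triples is a set union: no schema change, duplicates collapse.
merged = source_a | source_b


def label_of(graph: Set[Triple], subject: str) -> str:
    """A consumer written before ex:capacity existed keeps working unchanged."""
    return next(o for s, p, o in graph if s == subject and p == "rdfs:label")


print(len(merged), label_of(merged, "Seven_Dwarfs"))
```

With XML or CSV, the extra column or element would typically require every consumer to adapt to a new layout; here the old query is unaffected.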

What has to be done?

Where we should be
• Publish Open Data in RDF, reusing vocabularies which can be understood and combined by apps in unforeseen ways (e.g. visualization widgets)
• link your data

Where we are now
• use URIs to denote things
• use non-proprietary formats (e.g., CSV instead of Excel)
• make it available as structured data (e.g., Excel instead of image scan of a table)
• make your stuff available on the Web (whatever format) under an open license

How can we lift Open Data to Linked Open Data?

All CSV on PublicData.eu is transformed into RDF

CSV2RDF Conversion at PublicData.eu

Mapping Wiki

• Automatic CSV to RDF transformation won't render good results
• Mappings Wiki enables the crowdsourcing of mappings

CSV2RDF Mapping Syntax

{{CSV2RDFHeader}}

...

{{RelCSV2RDF
| name = default-mapping
| header = 1
| omitRows = -1
| omitCols = -1
| delimiter =
| col1 = Department Family
| col2 = Entity
| col3 = Payment Date^^xsd:date
| col4 = rdf:type
| col5 = Cost Centre Name
| col6 = Supplier
| col7 = Transaction No.
| col8 = Line Amount
| col9 = Invoice Total^^xsd:decimal
}}
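How such a column mapping might be applied to a CSV file can be sketched in Python. This is not the CSV2RDF tool itself: the interpretation of `header = 1` as "skip one header row", the `row<N>` subject naming, and the sample data are all assumptions made for illustration; only the column names and datatype suffixes are taken from the mapping.

```python
import csv
import io
from typing import Dict, List, Tuple

# Column mapping mirroring a few col<N> entries of the mapping template.
MAPPING: Dict[int, str] = {
    1: "Department Family",
    3: "Payment Date^^xsd:date",
    9: "Invoice Total^^xsd:decimal",
}

# Hypothetical sample data; the values are invented for illustration.
SAMPLE_CSV = (
    "Department Family,Entity,Payment Date,Type,Cost Centre Name,Supplier,"
    "Transaction No.,Line Amount,Invoice Total\n"
    "Dept A,Entity A,2012-01-31,Grant,Centre 1,Supplier X,1001,10.00,25.50\n"
)


def convert(csv_text: str, header_rows: int = 1) -> List[Tuple[str, str, str]]:
    """Turn each data row into (subject, property, literal) triples."""
    rows = list(csv.reader(io.StringIO(csv_text)))[header_rows:]
    triples = []
    for index, row in enumerate(rows, start=1):
        subject = f"row{index}"  # hypothetical subject naming scheme
        for col, spec in MAPPING.items():
            # Split "Name^^datatype" into the property name and optional datatype.
            prop, _, datatype = spec.partition("^^")
            value = row[col - 1]  # col<N> is 1-based in the mapping
            literal = f'"{value}"^^{datatype}' if datatype else f'"{value}"'
            triples.append((subject, prop, literal))
    return triples


for triple in convert(SAMPLE_CSV):
    print(triple)
```

The point of the mapping is visible in the output: the date and the amount become typed literals instead of undifferentiated strings.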

How can we make this happen?

Open Datasets

Data Portal

Exploration Widgets

Widgets: SemMap (spatial faceted browsing), faceted browsing, statistical visualization, OntoWiki (entity-/facet-based browsing), domain-specific visualizations, …

• Dataset analysis (size, vocabularies, properties)
• Selection of suitable visualization widgets
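The two steps, analysing a dataset and then selecting widgets, could look roughly like the following Python sketch. The rule table is hypothetical: which vocabulary prefix should trigger which widget is an assumption made for illustration, not a description of how the portal actually decides.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical rules: which vocabulary prefix suggests which kind of widget.
WIDGET_RULES = {
    "qb": "statistical visualization",
    "geo": "spatial faceted browsing",
}
FALLBACK = "entity-based browsing"


def analyse(triples: Iterable[Triple]) -> dict:
    """Collect size, vocabulary prefixes, and property usage of a dataset."""
    triples = list(triples)
    properties = Counter(p for _, p, _ in triples)
    vocabularies = {p.split(":", 1)[0] for p in properties}
    return {"size": len(triples), "vocabularies": vocabularies,
            "properties": properties}


def suggest_widgets(profile: dict) -> List[str]:
    """Pick widgets whose vocabulary appears in the dataset, else a generic one."""
    hits = [widget for prefix, widget in WIDGET_RULES.items()
            if prefix in profile["vocabularies"]]
    return hits or [FALLBACK]


# Invented sample triples using prefixed property names.
sample = [
    ("obs1", "qb:dataSet", "spending2012"),
    ("obs1", "geo:lat", "51.34"),
    ("obs1", "rdfs:label", "Observation 1"),
]
print(suggest_widgets(analyse(sample)))
```

Because the decision is driven by the vocabularies found in the data, reusing common vocabularies is what makes automatic widget selection feasible.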

Browsing Statistical Data with CubeViz

Browsing Spatial Data with Facete

Towards Linked Data Value Chains

Data Value Chains today
• Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis

Intermediate grouping
• Extraction, Curation
• Quality, Linking, Integration
• Publication, Visualization, Analysis

Data Value Chains tomorrow
• Extraction
• Curation
• Quality
• Linking
• Integration
• Publication
• Visualization
• Analysis

Trends
• Increased specialization
• Separation of concerns
• More interdisciplinarity

Cortex – a semantic digital library search backend

Semantic Search Technology IAIS-Cortex – Heart of the Deutsche Digitale Bibliothek

• Flexible Data Integration (Import)
  • Powerful toolbox for aggregation of heterogeneous data sources
  • >2,000 Data Providers
• Reliable Data Management
  • Based on filesystem or cloud technology
  • >10,000 concurrent users
• Individual Access
  • Powerful semantic search and high-performance access to objects

Try it out – www.ddb.de

Jochen Schon

Further Cortex Deployments
• Memobase+
  • about 30 data providers integrated (up to 100)
  • about 100,000 objects online
  • Launched 10.2012
• E-Varamu (Estonia), currently under development

How can we create an ecosystem of (Linked) Open Gov Data publishing and value-added services?

UK Open Data Institute

Disclaimer: The following slides include material from Nigel Shadbolt, Tim Berners-Lee and http://theodi.org

EU-FP7 LOD2 Project Overview . 02.09.2010

“Government will provide up to £10 million over five years, with match funding from industry and academia, to establish the world’s first Open Data Institute to help business exploit the opportunities created by release of public data”

“to realise significant economic benefits by enabling businesses and non-profit organisations to build innovative applications and websites using public data.”
David Cameron (UK Prime Minister), May 2010

“Our ambition is to become the world leader in open data”
George Osborne (UK Chancellor), May 2011

Open Government Data Delivers

Advantages and benefits of open data include:
• Transparency
• Accountability
• Efficiency
• Public Service Delivery
• Engagement
• Data Improvement
• Societal value
• Economic value

Open Government Data Successes
• National, Regional and City portals launched
• Significant data sets released
• Public Data Principles
• Open Licenses
• Open consultations
• International collaboration

Open Data …applications follow

By opening data, companies, organizations and interested citizens are able to build applications and mashups on top of the data.

“The vision is to establish the Open Data Institute as a world leading centre to innovate, exploit and research the opportunities for the UK created by the Government’s Open Data policy.”
• Business Innovation
• Training the Open Data Generation
• Public Sector Innovation
• Researching Open Data
• Open Data Standards and Policies
• Advisor to Government
• International Collaboration

How will the ODI work?

The Open Data Institute consists of four clusters: the research and academic cluster, the public sector cluster, the international agencies cluster and the business cluster. The aim of the clusters is to provide an environment for concrete projects with the respective stakeholders.

LOD Lifecycle supported by Debian based LOD2 Stack (http://stack.lod2.eu)
• Extraction
• Storage/Querying
• Manual revision/authoring
• Interlinking/Fusing
• Classification/Enrichment
• Quality Analysis
• Evolution/Repair
• Search/Browsing/Exploration

Facts: 14 partners, budget €10.2 million

Consortium - 14 partners from 11 EU countries + 1 Korea

Universität Leipzig (Coordinator), Germany

Centrum Wiskunde & Informatica, The Netherlands

National University of Ireland in Galway, Ireland

Freie Universität Berlin , Germany

OpenLink Software, UK

Semantic Web Company, Austria

TenForce, Belgium

Exalead, France

Wolters Kluwer Deutschland, Germany

Open Knowledge Foundation, UK

Vysoka Škola Ekonomická v Praze, Czech Republic

Zemanta d.o.o., Slovenia

Instytut Informatyki Gospodarczej, Poland

Institut Mihajlo Pupin, Serbia

Korea Advanced Institute of Science and Technology, South Korea

Summary: Benefits of Linked Data

Service-orientation to increase “customer” satisfaction (businesses, citizens)
• Linked Data and the apps, visualizations and mashups built on top of it help businesses and citizens to build value-added services

Intra-government collaboration for efficient/effective government
• Linked Data helps establish a pay-as-you-go data integration ecosystem

Increased transparency
• Standardized vocabularies make it possible to quickly compare data from different sources and do fact checking

Not using the chances arising from the interlinking of government data puts the development of our data economy and the realization of associated positive societal and economic effects at risk.

Contact Address
University of Leipzig
Faculty of Mathematics and Computer Science
Institute of Computer Science
Department of Business Information Systems
Postfach 100920
04009 Leipzig
Germany

Coordinator
Dr. Sören Auer, Scientific Project Leader
Phone: +49 (341) 97-32367
Fax: +49 (341) 97-32329
Email: [email protected]
http://www.informatik.uni-leipzig.de/~auer

Thanks for your attention!