Triplify -- Light-weight Linked Data Publication from Relational Databases

Auer, Sören; Dietzold, Sebastian; Lehmann, Jens; Hellmann, Sebastian; Aumueller, David
WWW 2009

In this paper we present Triplify, a simplistic but effective approach to publish Linked Data from relational databases. Triplify is based on mapping HTTP-URI requests onto relational database queries. Triplify transforms the resulting relations into RDF statements and publishes the data on the Web in various RDF serializations, in particular as Linked Data. The rationale for developing Triplify is that the largest part of information on the Web is already stored in structured form, often as data contained in relational databases, but usually published by Web applications only as HTML mixing structure, layout and content. In order to reveal the pure structured information behind the current Web, we have implemented Triplify as a light-weight software component, which can be easily integrated into and deployed by the numerous, widely installed Web applications. Our approach includes a method for publishing update logs to enable incremental crawling of linked data sources. Triplify is complemented by a library of configurations for common relational schemata and a REST-enabled data source registry. Triplify configurations containing mappings are provided for many popular Web applications, including osCommerce, WordPress, Drupal, Gallery, and phpBB. We will show that despite its light-weight architecture Triplify is usable to publish very large datasets, such as 160 GB of geo data from the OpenStreetMap project.
[...] representations is (as we will show in the next section) still outpaced by the growth of traditional Web pages and one might remain skeptical about the potential success of the Semantic Web in general. The missing spark for expanding the Semantic Web is to overcome the chicken-and-egg dilemma between missing semantic representations and search facilities on the Web. In this paper we, therefore, present Triplify, an approach to leverage relational representations behind existing Web applications.
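The core idea stated in the abstract, mapping an HTTP URI request onto a relational query and turning the result rows into RDF statements, can be illustrated with a minimal sketch. This is not Triplify's actual implementation or configuration format. The `MAPPINGS` dictionary, the `posts` table, the `id` key column, the `example.org` namespaces, and the convention that the first column identifies the resource are all assumptions made for illustration, and SQLite stands in for the database.

```python
import sqlite3

# Hypothetical configuration: first path segment -> SQL query.
# Assumption: the first selected column is named "id" and identifies the
# resource; the remaining column names are reused as property names.
MAPPINGS = {
    "post": "SELECT id, title, author FROM posts ORDER BY id",
}

BASE_URI = "http://example.org/triplify/"  # assumed base URI
VOCAB = "http://example.org/vocab/"        # assumed vocabulary namespace


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    text = str(value)
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def triplify(conn, path):
    """Answer a request path such as 'post' or 'post/2' with N-Triples lines."""
    parts = path.strip("/").split("/")
    name = parts[0]
    sql = MAPPINGS[name]  # raises KeyError for unmapped paths
    params = ()
    if len(parts) > 1:
        # A second path segment restricts the result to a single resource.
        sql = f"SELECT * FROM ({sql}) WHERE id = ?"
        params = (int(parts[1]),)
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    lines = []
    for row in cursor:
        subject = f"<{BASE_URI}{name}/{row[0]}>"
        for column, value in zip(columns[1:], row[1:]):
            if value is None:
                continue  # NULL columns produce no triple
            lines.append(
                f'{subject} <{VOCAB}{column}> "{escape_literal(value)}" .'
            )
    return lines
```

Calling `triplify(conn, "post")` on an open `sqlite3` connection would list all mapped rows, while `triplify(conn, "post/2")` would return only the statements about one resource. Emitting every value as a plain string literal keeps the sketch short; a fuller mapping would also distinguish datatypes and links between resources.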
The vast majority of Web content is already generated by database-driven Web applications. These applications are often implemented in scripting languages such as PHP, Perl, Python, or Ruby. Almost always relational databases are used to store data persistently. The most popular DBMS for Web applications is MySQL. However, the structure and semantics encoded in relational database schemes are unfortunately often neither accessible to Web search engines and mashups nor available for Web data integration. Aiming at a larger deployment of semantic technologies on the Web with Triplify, we specifically intend to:
· enable Web developers to easily publish RDF triples, Linked Data, JSON, or CSV from existing Web applications,
· offer pre-configured mappings to a variety of popular Web applications such as WordPress, Gallery, and Drupal,
· allow data harvesters to selectively retrieve updates to published content without the need to re-crawl unchanged content,
· showcase the flexibility and scalability of the approach with the example of 160 GB of semantically annotated geo data that is published from the OpenStreetMap project.
The importance of revealing relational data and making it available as RDF and, more recently, as Linked Data [2, 3] has already been recognized. Most notably, Virtuoso RDF views [5, 9] and D2RQ [4] are production-ready tools for generating RDF representations from relational database content. A variety of other approaches has been presented recently (cf. Section 6). Some of them even aim at partially automating the generation of suitable mappings from relations to RDF vocabularies. However, the growth of semantic representations on the Web still lacks sufficient momentum. From our point of view, a major reason for the lack of deployment of these tools and approaches lies in the complexity
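The goal of letting harvesters fetch only updated content, rather than re-crawling everything, can be sketched as a query over modification timestamps. The excerpt does not describe how Triplify structures its update logs, so everything here is an assumption for illustration: the `posts` table, its `modified` column holding ISO 8601 timestamps, the `example.org` base URI, and the function name `updates_since`.

```python
import sqlite3

BASE_URI = "http://example.org/triplify/"  # assumed base URI


def updates_since(conn, since):
    """Return (resource URI, modified) pairs for posts changed after `since`.

    Assumption: `modified` stores ISO 8601 timestamps in one fixed format,
    so comparing them as text orders them chronologically.
    """
    cursor = conn.execute(
        "SELECT id, modified FROM posts WHERE modified > ? ORDER BY modified",
        (since,),
    )
    return [(f"{BASE_URI}post/{row[0]}", row[1]) for row in cursor]
```

A harvester following this scheme would remember the timestamp of its last visit, pass it as `since`, and re-fetch only the returned resource URIs.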
