Publications
As a researcher, I write publications about my work; they are listed on this page. If any of them interest you, feel free to contact me to discuss further.
2024
- Poster Optimizing Traversal Queries of Sensor Data Using a Rule-Based Reachability Approach
In Proceedings of the 23rd International Semantic Web Conference: Posters and Demos
Link Traversal queries face challenges in completeness and long execution times due to the size of the Web. Reachability criteria define completeness by restricting the links an engine follows. However, the number of links to dereference remains the bottleneck of the approach. Web environments often have structures that query engines can exploit to prune irrelevant sources. Current criteria rely on information from the query definition and predefined predicates, which makes them hard to apply in environments where logical expressions indicate the location of resources. We propose a rule-based reachability criterion that captures logical statements expressed in hypermedia descriptions within Linked Data documents to prune irrelevant sources. In this poster paper, we show how the Comunica link traversal engine is modified to take hints from a hypermedia control vocabulary in order to prune irrelevant sources. Our preliminary findings show that with this strategy, the query engine can significantly reduce the number of HTTP requests and the query execution time without sacrificing result completeness. Our work shows that investigating hypermedia controls for link pruning in traversal queries is a worthy effort for optimizing Web queries over unindexed decentralized databases.
- Journal Towards Applications on the Decentralized Web using Hypermedia-driven Query Engines
In ACM SIGWEB Newsletter
The Web is facing unprecedented challenges related to the control and ownership of data. Due to recent privacy and manipulation scandals caused by the increasing centralization of data on the Web into ever fewer large data silos, there is a growing demand for the re-decentralization of the Web. Backed by user-empowering legislation such as the GDPR and CCPA, decentralization initiatives such as Solid aim to break personal data out of these silos and give control back to users. While the ability to choose where and how data is stored provides significant value to users, it leads to major technical challenges due to the massive distribution of data across the Web. Since we cannot expect application developers to take up the burden of coping with all the complexities of handling this data distribution, there is a need for a software layer that abstracts away these complexities and makes decentralized data feel centralized. Concretely, we propose personal query engines to play the role of such an abstraction layer. In this article, we discuss the requirements for such a query-based abstraction layer, the state of the art in this area, and the future research that is required. Even though many challenges remain before we achieve a developer-friendly abstraction for interacting with decentralized data on the Web, pursuing it is highly valuable, both for end users who want control over their data and its processing, and for businesses wanting to enable this at a low development cost.
- Workshop The R3 Metric: Measuring Performance of Link Prioritization during Traversal-based Query Processing
In Proceedings of the 16th Alberto Mendelzon International Workshop on Foundations of Data Management
The decentralization envisioned for the currently centralized Web requires querying approaches capable of accessing multiple small data sources while complying with legal constraints on personal data, such as licenses and the GDPR. Link Traversal-based Query Processing (LTQP) is a querying approach designed for highly decentralized environments that satisfies these legal requirements. An important optimization avenue in LTQP is the order in which links are dereferenced, which involves prioritizing links to query-relevant documents. However, assessing and comparing the algorithmic performance of these systems is challenging due to various compounding factors during query execution. Researchers therefore need an implementation-agnostic and deterministic metric that accurately measures the marginal effectiveness of link prioritization algorithms in LTQP engines. In this paper, we motivate the need for accurately measuring link prioritization performance, define and test such a metric, and outline the challenges and potential extensions of the proposed metric. Our findings show that the metric highlights differences in link prioritization performance depending on the fragmentation strategy of the queried data. The proposed metric allows link prioritization performance to be evaluated, and makes it easy to assess the effectiveness of future link prioritization algorithms.
- Workshop Opportunities for Shape-based Optimization of Link Traversal Queries
In Proceedings of the 16th Alberto Mendelzon International Workshop on Foundations of Data Management
Data on the Web is naturally unindexed and decentralized. Centralizing Web data, especially personal data, raises ethical and legal concerns. Yet, compared to centralized query approaches, decentralization-friendly alternatives such as Link Traversal Query Processing (LTQP) are significantly less performant and less well understood. The two main difficulties of LTQP are the lack of a-priori information about data sources and the high number of HTTP requests. Exploring decentralization-friendly ways to document unindexed networks of data sources could lead to solutions that alleviate those difficulties. RDF data shapes are widely used to validate Linked Data documents; it is therefore worthwhile to investigate their potential for LTQP optimization. In our work, we built an early version of a source selection algorithm for LTQP that uses mappings between RDF data shapes and Linked Data documents, and measured its performance in a realistic setup. In this article, we present our algorithm and early results, opening opportunities for further research on shape-based optimization of link traversal queries. Our initial experiments show that, with little maintenance and work from the server, our method can reduce execution time by up to 80% and the number of links traversed by up to 97% during realistic queries. Given our early results and the descriptive power of RDF data shapes, it would be worthwhile to investigate non-heuristic query planning using RDF shapes.
- Conference Decentralized Search over Personal Online Datastores: Architecture and Performance Evaluation
- Mohamed Ragab
- Yury Savateev
- Helen Oliver
- Thanassis Tiropanis
- Alex Poulovassilis
- Adriane Chapman
- Ruben Taelman
- George Roussos
- Poster Observations on Bloom Filters for Traversal-Based Query Execution over Solid Pods
In Proceedings of the 21st Extended Semantic Web Conference: Posters and Demos
Traversal-based query execution enables resolving queries over Linked Data documents, using a follow-your-nose approach that locates query-relevant data by following series of links through documents. This traversal, however, incurs an unavoidable overhead in the form of data access costs. By only following links known to be relevant for answering a given query, this overhead could be minimized. Prior work exists in the form of reachability conditions that determine the links to dereference; however, these do not take into consideration the contents behind a given link. In this work, we explored the possibility of using Bloom filters to prune query-irrelevant links based on the triple patterns contained within a given query, when performing traversal-based query execution over Solid pods containing simulated social network data as an example use case. Our findings show that, with relatively uniform data across an entire benchmark dataset, this approach fails to effectively filter links, especially when the queries contain triple patterns with low selectivity. Thus, future work should consider the query plan beyond individual patterns, or the structure of the data beyond individual triples, to allow for more effective pruning of links.
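As a rough illustration of the idea explored in this poster, the sketch below prunes a link when the Bloom filter summarizing the target document cannot contain the constant terms of any of the query's triple patterns. All names are hypothetical; the actual implementation lives in the Comunica framework and differs in detail.

```typescript
// Minimal Bloom filter over RDF term strings (illustrative sketch only).
class BloomFilter {
  private bits: Uint8Array;
  constructor(private size: number, private hashes: number) {
    this.bits = new Uint8Array(Math.ceil(size / 8));
  }
  private hash(term: string, seed: number): number {
    let h = seed;
    for (let i = 0; i < term.length; i++) {
      h = Math.imul(h ^ term.charCodeAt(i), 2654435761);
    }
    return (h >>> 0) % this.size;
  }
  add(term: string): void {
    for (let s = 1; s <= this.hashes; s++) {
      const pos = this.hash(term, s);
      this.bits[pos >> 3] |= 1 << (pos & 7);
    }
  }
  mayContain(term: string): boolean {
    for (let s = 1; s <= this.hashes; s++) {
      const pos = this.hash(term, s);
      if (!(this.bits[pos >> 3] & (1 << (pos & 7)))) return false;
    }
    return true; // possibly present: false positives allowed, no false negatives
  }
}

// Follow a link only if some triple pattern's constant terms may all occur in
// the linked document. Patterns with few constants (low selectivity) pass this
// check almost always, which matches the paper's negative finding.
function shouldFollowLink(docFilter: BloomFilter, patternConstants: string[][]): boolean {
  return patternConstants.some((constants) =>
    constants.every((term) => docFilter.mayContain(term)));
}
```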
- Workshop DFDP: A Declarative Form Description Pipeline for Decentralizing Web Forms
- Ieben Smessaert
- Patrick Hochstenbach
- Ben De Meester
- Ruben Taelman
- Ruben Verborgh
- Workshop Requirements and Challenges for Query Execution across Decentralized Environments
In Companion Proceedings of the ACM Web Conference 2024
Due to the economic and societal problems caused by the Web's growing centralization, there is an increasing interest in decentralizing data on the Web. This decentralization does, however, cause a number of technical challenges. If we want to give users in decentralized environments the same level of user experience they are used to with centralized applications, we need solutions to these challenges. We discuss how query engines can act as a layer between applications on the one hand and decentralized environments on the other hand. Query engines thereby act as an abstraction layer that hides the complexities of decentralized data management from application developers. In this article, we outline the requirements for query engines over decentralized environments. Furthermore, we show how existing approaches meet these requirements, and which challenges remain. As such, this article offers a high-level overview of a roadmap for the query and decentralization research domains.
- Demo Demonstration of Link Traversal SPARQL Query Processing over the Decentralized Solid Environment
In Proceedings of the 27th International Conference on Extending Database Technology (EDBT)
To tackle economic and societal problems originating from the centralization of Knowledge Graphs on the Web, there has been an increasing interest in decentralizing Knowledge Graphs across a large number of small authoritative sources. In order to effectively build user-facing applications, there is a need for efficient query engines that abstract away the complexities of accessing such massively Decentralized Knowledge Graphs (DKGs). To this end, we have designed and implemented novel Link Traversal Query Processing algorithms in the Comunica query engine framework that are capable of efficiently evaluating SPARQL queries across DKGs provided by the Solid decentralization initiative. In this article, we demonstrate this query engine through a Web-based interface over which SPARQL queries can be executed over simulated and real-world Solid environments. Our demonstration shows the benefits of a traversal-based approach to querying DKGs, and uncovers opportunities for further optimization of both query execution and discovery algorithms in future work.
- Poster Personalized Medicine Through Personal Data Pods
In Proceedings of the 15th International SWAT4HCLS Conference
Medical care is in the process of becoming increasingly personalized through the use of patient genetic information. Concurrently, privacy concerns regarding the collection and storage of sensitive personal genome sequence data have necessitated public debate and legal regulation. Here we identify two fundamental challenges associated with the privacy and shareability of genomic data storage, and propose the use of Solid pods to address these challenges. We establish that personal data pods using Solid specifications can enable decentralized storage, increase patient control over their data, and support Linked Data formats, which combined could offer solutions to challenges currently restricting personalized medicine in practice.
2023
- Poster Towards Algebraic Mapping Operators for Knowledge Graph Construction
In Proceedings of the 22nd International Semantic Web Conference: Posters and Demos
Declarative knowledge graph construction has matured to the point where state-of-the-art techniques focus on optimizing the mapping processes. However, these optimization techniques use the syntax of the mapping language without considering the impact of its semantics. As a result, it is difficult to compare different engines fairly due to the obscurity of their semantic differences. In this poster paper, we propose an initial set of algebraic mapping operators to define the operational semantics of mapping processes, providing a first step towards a theoretical foundation for mapping languages. We translated a series of RML documents to algebraic mapping operators to show the feasibility of our approach. We believe that further pursuing these initial results will lead to greater interoperability of mapping engines and languages, a sharper requirements analysis for the upcoming RML standardization work, and an improved developer experience for all current and future mapping engines.
- Workshop The Need for Better RDF Archiving Benchmarks
In Proceedings of the 9th Workshop on Managing the Evolution and Preservation of the Data Web
The advancements and popularity of Semantic Web technologies in the last decades have led to an exponential adoption and availability of Web-accessible datasets. While most solutions consider such datasets to be static, they often evolve over time. Hence, efficient archiving solutions are needed to meet users' and maintainers' needs. While some solutions to these challenges already exist, standardized benchmarks are needed to systematically test the different capabilities of existing solutions and identify their limitations. Unfortunately, the development of new benchmarks has not kept pace with the evolution of RDF archiving systems. In this paper, we therefore identify the current state of the art in RDF archiving benchmarks and discuss to what degree such benchmarks reflect the current needs of real-world use cases and their requirements. Through this empirical assessment, we highlight the need for more advanced and comprehensive benchmarks that align with the evolving landscape of RDF archiving.
- Workshop How Does the Link Queue Evolve during Traversal-Based Query Processing?
In Proceedings of the 7th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs
Link Traversal-based Query Processing (LTQP) is an integrated querying approach that allows the query engine to start with zero knowledge of the data to query and discover data sources on the fly. The query engine starts with some seed documents and dynamically discovers new data sources by dereferencing hyperlinks in previously ingested data. Given the dynamic nature of source discovery, query processing tends to be relatively slow. Optimization techniques exist, such as exploiting existing structural information, but they depend on a deep understanding of the link queue during LTQP. To this end, we investigate the evolution of the types of link sources in the link queue and introduce metrics that describe key link queue characteristics. This paper analyzes the link queue to guide future work on LTQP query optimization approaches that exploit structural information within a Solid environment. We find that queries exhibit two different execution patterns: one where the link queue is primarily empty, and one where the link queue fills faster than the engine can process it. Our results show that the link queue is not functioning optimally and that our current approach to link discovery is not sufficiently selective.
- Workshop In-Memory Dictionary-Based Indexing of Quoted RDF Triples
In Proceedings of the 7th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs
The upcoming RDF 1.2 recommendation is scheduled to introduce the concept of quoted triples, which allows statements to be made about other statements. Since quoted triples enable new forms of data access in SPARQL 1.2, in the form of quoted triple patterns, there is a need for new indexing strategies that can efficiently handle these data access patterns. As such, we explore and evaluate different in-memory indexing approaches for quoted triples. In this paper, we investigate four indexing approaches and evaluate their performance over an artificial dataset with custom triple pattern queries. Our findings show that the so-called indexed quoted triples dictionary vastly outperforms the other approaches in terms of query execution time, at the cost of increased storage size and ingestion time. Our work shows that indexing quoted triples in a dictionary separate from non-quoted RDF terms achieves good performance, and can be implemented using well-known indexing techniques in existing systems. We therefore illustrate that quoted triples can be added to the RDF stack in a performant manner.
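To give a flavor of the dictionary-based layout this paper evaluates, here is a minimal sketch in which quoted triples get IDs from their own dictionary, separate from regular terms, so every (possibly nested) triple reduces to three integers. The names and ID scheme are illustrative, not the paper's actual data structures.

```typescript
// Sketch of a dictionary-based index for quoted triples (illustrative only).
type Term = string | QuotedTriple;
interface QuotedTriple { s: Term; p: string; o: Term; }

class QuotedTripleStore {
  private terms = new Map<string, number>();  // dictionary for IRIs/literals (even IDs)
  private quoted = new Map<string, number>(); // separate dictionary for quoted triples (odd IDs)
  private spo = new Set<string>();            // encoded asserted triples

  // Recursively encode a term; for brevity, lookups also intern unseen terms.
  private encodeTerm(term: Term): number {
    if (typeof term === 'string') {
      if (!this.terms.has(term)) this.terms.set(term, this.terms.size * 2);
      return this.terms.get(term)!;
    }
    const key = `${this.encodeTerm(term.s)} ${this.encodeTerm(term.p)} ${this.encodeTerm(term.o)}`;
    if (!this.quoted.has(key)) this.quoted.set(key, this.quoted.size * 2 + 1);
    return this.quoted.get(key)!;
  }

  add(s: Term, p: string, o: Term): void {
    this.spo.add(`${this.encodeTerm(s)} ${this.encodeTerm(p)} ${this.encodeTerm(o)}`);
  }

  // A fully constant quoted triple pattern becomes a dictionary lookup plus an
  // integer-triple lookup, which is why this layout queries fast.
  has(s: Term, p: string, o: Term): boolean {
    return this.spo.has(`${this.encodeTerm(s)} ${this.encodeTerm(p)} ${this.encodeTerm(o)}`);
  }
}
```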
- Conference LDkit: Linked Data Object Graph Mapping toolkit for Web Applications
In Proceedings of the 22nd International Semantic Web Conference
The adoption of Semantic Web and Linked Data technologies in web application development has been hindered by the complexity of numerous standards, such as RDF and SPARQL, as well as the challenges associated with querying data from distributed sources and a variety of interfaces. Understandably, web developers often prefer traditional solutions based on relational or document databases due to the higher level of data predictability and superior developer experience. To address these issues, we present LDkit, a novel Object Graph Mapping (OGM) framework for TypeScript designed to provide a model-based abstraction for RDF. LDkit facilitates the direct use of Linked Data in web applications, effectively working as the data access layer. It accomplishes this by querying and retrieving data and transforming it into TypeScript primitives according to user-defined data schemas, while ensuring end-to-end data type safety. This paper describes the design and implementation of LDkit, highlighting its ability to simplify the integration of Semantic Web technologies into web applications while adhering to both general web standards and Linked Data-specific standards. Furthermore, we discuss how the LDkit framework has been designed to integrate seamlessly with popular web application technologies that developers are already familiar with. This approach promotes ease of adoption, allowing developers to harness the power of Linked Data without disrupting their current workflows. Through the provision of an efficient and intuitive toolkit, LDkit aims to enhance the web ecosystem by promoting the widespread adoption of Linked Data and Semantic Web technologies.
- Conference Link Traversal Query Processing over Decentralized Environments with Structural Assumptions
In Proceedings of the 22nd International Semantic Web Conference
To counter societal and economic problems caused by data silos on the Web, efforts such as Solid strive to reclaim private data by storing it in permissioned documents over a large number of personal vaults across the Web. Building applications on top of such a decentralized Knowledge Graph involves significant technical challenges: centralized aggregation prior to query processing is excluded for legal reasons, and current federated querying techniques cannot handle this large scale of distribution at the expected performance. We propose an extension to Link Traversal Query Processing (LTQP) that incorporates structural properties within decentralized environments to tackle their unprecedented scale. In this article, we analyze the structural properties of the Solid decentralization ecosystem that are relevant for query execution, introduce novel LTQP algorithms leveraging these structural properties, and evaluate their effectiveness. Our experiments indicate that these new algorithms obtain accurate results in the order of seconds, which existing algorithms cannot achieve. This work reveals that a traversal-based querying method using structural assumptions can be effective for large-scale decentralization, but that advances are needed in the area of query planning for LTQP to handle more complex queries. These insights open the door to query-driven decentralized applications, in which declarative queries shield developers from the inherent complexity of a decentralized landscape.
- Poster Reinforcement Learning-based SPARQL Join Ordering Optimizer
- Ruben Eschauzier
- Ruben Taelman
- Meike Morren
- Ruben Verborgh
- Demo GLENDA: Querying RDF Archives with full SPARQL
In Proceedings of the 20th Extended Semantic Web Conference: Posters and Demos
The dynamicity of semantic data has propelled research on RDF archiving, i.e., the task of storing and making the full history of large RDF datasets accessible. However, existing archiving techniques fail to scale when confronted with very large RDF datasets and support only simple SPARQL queries. In this demonstration, we therefore showcase GLENDA, a system that can run full SPARQL 1.1 compliant queries over large RDF archives. We achieve this through a multi-snapshot change-based storage architecture that we interface using the Comunica query engine. Thanks to this integration, we demonstrate that fast SPARQL query processing over multiple versions of a knowledge graph is possible. Moreover, our demonstration provides different statistics about the history of RDF datasets, which can be useful for tasks beyond querying by providing insights into the evolution dynamics of the data.
- Workshop Distributed Social Benefit Allocation using Reasoning over Personal Data in Solid
In Proceedings of the 1st International Workshop on Data Management for Knowledge Graphs
When interacting with government institutions, citizens may often be asked to provide a number of documents to various officials, due to the way the data is processed by the government, and due to regulations or guidelines that restrict sharing of that data between institutions. Occasionally, documents from third parties, such as the private sector, are involved, as the data, rules, regulations and individual private data may be controlled by different parties. Facilitating an efficient flow of information in such cases is therefore important, while still respecting the ownership and privacy of that data. Addressing these types of use cases in data storage and sharing, the Solid initiative allows individuals, organisations and the public sector to store their data in personal online datastores. Solid has previously been applied to data storage within government contexts, so we decided to extend that work by adding data processing services on top of such data and by including multiple parties, such as citizens and the private sector. However, introducing multiple parties within the data processing flow may impose new challenges, and implementing such data processing services in practice on top of Solid might present opportunities for improvement from the perspective of the implementer of the services. In this work, together with the City of Antwerp in Belgium, we have produced a proof-of-concept service implementation operating at the described intersection of the public sector, citizens and the private sector, to manage social benefit allocation in a distributed environment. The service operates on distributed Linked Data stored in RDF across multiple Solid pods, using Notation3 rules to process that data and SPARQL queries to access and modify it. In this way, our implementation seeks to respect the design principles of Solid, while taking advantage of the related technologies for representing, processing and modifying Linked Data. This document describes our chosen use case, service design and implementation, and our observations resulting from this experiment. Through the proof-of-concept implementation, we have established a preliminary understanding of the current challenges in implementing such a service using the chosen technologies. We have identified topics that should be addressed when using such an approach in practice, such as verification of data, assumptions related to data locations, and the tight coupling of our logic between the rules and the program code. Addressing these topics in future work should help further the adoption of Linked Data as a means to solve challenges around data sharing, processing and ownership, such as with government processes involving multiple parties.
- Journal Distributed Subweb Specifications for Traversing the Web
In Theory and Practice of Logic Programming
Link Traversal-based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, in a time where the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query is entirely in the hands of the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretical example that this can improve query results and reduce the number of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications, and indeed observe that not just the data quality but also the efficiency of querying improves.
- Conference Scaling Large RDF Archives To Very Long Histories
In Proceedings of the 17th IEEE International Conference on Semantic Computing
In recent years, research in RDF archiving has gained traction due to the ever-growing nature of semantic data and the emergence of community-maintained knowledge bases. Several solutions have been proposed to manage the history of large RDF graphs, including approaches based on independent copies, time-based indexes, and change-based schemes. In particular, aggregated changesets have been shown to be relatively efficient at handling very large datasets. However, ingestion time can still become prohibitive as the revision history grows. To tackle this challenge, we propose a hybrid storage approach based on aggregated changesets, snapshots, and multiple delta chains. We evaluate different snapshot creation strategies on the BEAR benchmark for RDF archives, and show that our techniques can speed up ingestion time by up to two orders of magnitude while keeping competitive performance for version materialization and delta queries. This allows us to support revision histories of lengths that are beyond reach with existing approaches.
2022
- Workshop A Prospective Analysis of Security Vulnerabilities within Link Traversal-Based Query Processing
In Proceedings of the 6th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs
The societal and economic consequences surrounding Big Data-driven platforms have increased the call for decentralized solutions. However, retrieving and querying data in more decentralized environments requires fundamentally different approaches, whose properties are not yet well understood. Link-Traversal-based Query Processing (LTQP) is a technique for querying over decentralized data networks, in which a client-side query engine discovers data by traversing links between documents. Since decentralized environments are potentially unsafe due to their non-centrally controlled nature, there is a need for client-side LTQP query engines to be resistant against security threats aimed at the query engine's host machine or the query initiator's personal data. As such, we have performed an analysis of potential security vulnerabilities of LTQP. This article provides an overview of security threats in related domains, which are used as inspiration for the identification of 10 LTQP security threats. This list of security threats forms a basis for future work, in which mitigations for each of these threats need to be developed and tested for their effectiveness. With this work, we start filling in the unknowns for enabling query execution over decentralized environments. Aside from future work on security, wider research will be needed to uncover the missing building blocks for enabling true data decentralization.
- Demo Solid Web Monetization
- Merlijn Sebrechts
- Tom Goethals
- Thomas Dupont
- Wannes Kerckhove
- Ruben Taelman
- Filip De Turck
- Bruno Volckaert
- Workshop A Policy-Oriented Architecture for Enforcing Consent in Solid
In Proceedings of the 2nd International Workshop on Consent Management in Online Services, Networks and Things
The Solid project aims to restore end-users' control over their data by decoupling services and applications from data storage. To realize data governance by the user, the Solid Protocol 0.9 relies on Web Access Control, which has limited expressivity and interpretability. In contrast, recent privacy and data protection regulations impose strict requirements on personal data processing applications and the scope of their operation. The Web Access Control mechanism lacks the granularity and contextual awareness needed to enforce these regulatory requirements. Therefore, we suggest a possible architecture for relating Solid's low-level technical access control rules to higher-level concepts such as the legal basis and purpose of data processing, the abstract types of information being processed, and the data sharing preferences of the data subject. Our architecture combines recent technical efforts by the Solid community panels with prior proposals made by researchers on the use of ODRL and SPECIAL policies as an extension of Solid's authorization mechanism. While our approach appears to avoid a number of pitfalls identified in previous research, further work is needed before it can be implemented and used in a practical setting.
- Journal Components.js: Semantic Dependency Injection
In Semantic Web Journal
A common practice within object-oriented software is using composition to realize complex object behavior in a reusable way. Such compositions can be managed by Dependency Injection (DI), a popular technique in which components only depend on minimal interfaces and have their concrete dependencies passed into them. Instead of requiring program code, this separation enables describing the desired instantiations in declarative configuration files, such that objects can be wired together automatically at runtime. Configurations for existing DI frameworks typically only have local semantics, which limits their usage in other contexts. Yet some cases require configurations outside of their local scope, such as for the reproducibility of experiments, static program analysis, and semantic workflows. As such, there is a need for globally interoperable, addressable, and discoverable configurations, which can be achieved by leveraging Linked Data. We created Components.js as an open-source semantic DI framework for TypeScript and JavaScript applications, providing global semantics via Linked Data-based configuration files. In this article, we report on the Components.js framework by explaining its architecture and configuration, and discuss its impact by mentioning where and how applications use it. We show that Components.js is a stable framework that has seen significant uptake during the last couple of years. We recommend it for software projects that require high flexibility, configuration without code changes, sharing configurations with others, or applying these configurations in other contexts such as experimentation or static program analysis. We anticipate that Components.js will continue driving concrete research and development projects that require high degrees of customization to facilitate experimentation and testing, including the Comunica query engine and the Community Solid Server for decentralized data publication.
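To give a flavor of the declarative wiring described above, here is a minimal sketch of instantiating a component from a Components.js-style configuration. The instance IRI, component name, and parameter are hypothetical, and the exact API surface may differ between Components.js versions.

```typescript
import { ComponentsManager } from 'componentsjs';

// A JSON-LD configuration such as the following (stored in config.jsonld)
// describes an instantiation declaratively; the component and its parameters
// are identified by global IRIs instead of local code references:
// {
//   "@context": "https://linkedsoftwaredependencies.org/bundles/npm/my-module/^1.0.0/components/context.jsonld",
//   "@id": "urn:example:myParser",   // hypothetical instance IRI
//   "@type": "MyParser",             // hypothetical component
//   "caseSensitive": true            // hypothetical parameter
// }

async function main(): Promise<void> {
  const manager = await ComponentsManager.build({
    mainModulePath: __dirname, // where installed components are discovered
  });
  await manager.configRegistry.register('config.jsonld');
  // The instance is wired together at runtime from the declarative config.
  const parser = await manager.instantiate('urn:example:myParser');
  console.log(parser);
}

main().catch(console.error);
```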
2021
- Conference Towards a personal data vault society: an interplay between technological and business perspectives
- Sofie Verbrugge
- Frederic Vannieuwenborg
- Marlies Van der Wee
- Didier Colle
- Ruben Taelman
- Ruben Verborgh
- Journal Optimizing Storage of RDF Archives using Bidirectional Delta Chains
In Semantic Web Journal
Linked Open Datasets on the Web that are published as RDF can evolve over time. There is a need to store such evolving RDF datasets and to query across their versions. Different storage strategies are available for managing such versioned datasets, each being efficient for specific types of versioned queries. In recent work, a hybrid storage strategy has been introduced that combines these different strategies to achieve more efficient query execution for all versioned query types, at the cost of increased ingestion time. While this trade-off is beneficial in the context of Web querying, the strategy suffers from ingestion times that grow exponentially with the number of versions, which becomes problematic for RDF datasets with many versions. As such, there is a need for an improved storage strategy that scales better in terms of ingestion time for many versions. We have designed, implemented, and evaluated a change to the hybrid storage strategy in which we use a bidirectional delta chain instead of the default unidirectional delta chain. In this article, we introduce a concrete architecture for this change, together with accompanying ingestion and querying algorithms. Experimental results from our implementation show that the ingestion time is significantly reduced. As an additional benefit, this change also leads to lower total storage size and, in some cases, even improved query execution performance. This work shows that modifying the structure of delta chains within the hybrid storage strategy can be highly beneficial for RDF archives. In future work, other modifications to this delta chain structure deserve to be investigated, to further improve the scalability of ingestion and querying of datasets with many versions.
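As a rough sketch of the storage layout discussed above: in a unidirectional delta chain every version is a delta relative to the snapshot at the start, while a bidirectional chain places the snapshot in the middle, with reverse deltas before it and forward deltas after it, so deltas on both sides stay small. The simplified model below illustrates version materialization under that layout; it is not the actual OSTRICH implementation.

```typescript
// Simplified model of a bidirectional delta chain (illustrative only).
type Triple = string; // e.g. "<s> <p> <o>" in a real system
interface Delta { additions: Set<Triple>; deletions: Set<Triple>; }

interface BidirectionalChain {
  snapshotVersion: number;
  snapshot: Set<Triple>;
  deltas: Map<number, Delta>; // aggregated deltas, keyed by version
}

// Materialize any version with at most one delta application, because each
// delta is aggregated relative to the snapshot (no walking along the chain).
function materialize(chain: BidirectionalChain, version: number): Set<Triple> {
  const result = new Set(chain.snapshot);
  if (version === chain.snapshotVersion) return result;
  const delta = chain.deltas.get(version);
  if (!delta) throw new Error(`Unknown version ${version}`);
  for (const t of delta.deletions) result.delete(t);
  for (const t of delta.additions) result.add(t);
  return result;
}
```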
- Conference Link Traversal with Distributed Subweb Specifications
In Rules and Reasoning: 5th International Joint Conference, RuleML+RR 2021, Leuven, Belgium, September 8 – September 15, 2021, Proceedings
Link Traversal-based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, in a time where the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query is entirely in the hands of the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretical example that this can improve query results and reduce the number of network requests.
- Demo PROV4ITDaTa: Transparent and direct transfer of personal data to personal stores
In WWW2021, the International World Wide Web Conference
Data is scattered across service providers and heterogeneously structured in various formats. This lack of interoperability hinders data portability and thus inhibits user control. An interoperable data portability solution for transferring personal data is needed. We demo PROV4ITDaTa: a Web application that allows users to transfer personal data in an interoperable format to their personal datastore. PROV4ITDaTa leverages the open-source solutions RML.io, Comunica, and Solid: (i) the RML.io toolset to describe how to access data from service providers and generate interoperable datasets; (ii) Comunica to query these and more flexibly generate enriched datasets; and (iii) Solid pods to store the generated data as Linked Data in personal data stores. As opposed to other (hard-coded) solutions, PROV4ITDaTa is fully transparent: each component of the pipeline is fully configurable and automatically generates detailed provenance trails. Furthermore, transforming the personal data into RDF makes the solution interoperable. By maximizing the use of open-source tools and open standards, PROV4ITDaTa facilitates the shift towards a data ecosystem wherein users have control over their data, and providers can focus on their service instead of trying to adhere to interoperability requirements.
- Journal Semantic micro-contributions with decentralized nanopublication services
In PeerJ Computer Science
While the publication of Linked Data has become increasingly common, the process tends to be a relatively complicated and heavyweight one. Linked Data is typically published by centralized entities in the form of larger dataset releases, which has the downside that there is a central bottleneck in the form of the organization or individual responsible for the releases. Moreover, certain kinds of data entries, in particular those with subjective or original content, currently do not fit into any existing dataset and are therefore more difficult to publish. To address these problems, we present an approach that uses nanopublications and a decentralized network of services to allow users to directly publish small Linked Data statements through a simple and user-friendly interface called Nanobench, powered by semantic templates that are themselves published as nanopublications. The published nanopublications are cryptographically verifiable and can be queried through a redundant and decentralized network of services, based on the grlc API generator and a new quad extension of Triple Pattern Fragments. We show that these two kinds of services are complementary and together allow us to query nanopublications in a reliable and efficient manner. We also show that Nanobench indeed makes it very easy for users to publish Linked Data statements, even for those who have no prior experience in Linked Data publishing.
2020
- Workshop Optimizing Approximate Membership Metadata in Triple Pattern Fragments for Clients and Servers
In Proceedings of the 13th International Workshop on Scalable Semantic Web Knowledge Base Systems
Depending on the HTTP interface used for publishing Linked Data, the effort of evaluating a SPARQL query can be redistributed differently between clients and servers. For instance, lower server-side CPU usage can be realized at the expense of higher bandwidth consumption. Previous work has shown that complementing lightweight interfaces such as Triple Pattern Fragments (TPF) with additional metadata can positively impact the performance of clients and servers. Specifically, Approximate Membership Filters (AMFs), which are small and probabilistic data structures, were shown in the context of TPF to reduce the number of HTTP requests, at the expense of increased query execution times. In order to mitigate this significant drawback, we have investigated unexplored aspects of AMFs as metadata on TPF interfaces. In this article, we introduce and evaluate alternative approaches for server-side publication and client-side consumption of AMFs within TPF to achieve faster query execution while maintaining low server-side effort. Our alternative client-side algorithm and the proposed server configurations significantly reduce both the number of HTTP requests and the query execution time, with only a small increase in server load, thereby mitigating the major bottleneck of AMFs within TPF. Compared to regular TPF, average query execution is more than 2 times faster and requires only 10% of the number of HTTP requests, at the cost of at most a 10% increase in server load. These findings translate into a set of concrete guidelines for data publishers on how to configure AMF metadata on their servers.
- Workshop RDF Test Suite: Improving Developer Experience for Building Specification-Compliant Libraries
In Proceedings of the 1st Workshop on The Semantic Web in Practice: Tools and Pedagogy
Guaranteeing the compliance of software libraries with certain specifications is crucial for ensuring interoperability within the Semantic Web. While many specifications publish their own declarative test suite for testing compliance, the actual execution of these test suites can add a huge overhead for library developers. In order to remove this burden from developers, we introduce a JavaScript tool called RDF Test Suite that takes care of all the required bookkeeping behind test suite execution. In this report, we discuss the design goals of RDF Test Suite, how it has been implemented, and how it can be used. In practice, this tool is being used in a variety of libraries, and it ensures their specification compliance with minimal overhead.
- Conference LDflex: a Read/Write Linked Data Abstraction for Front-End Web Developers
In Proceedings of the 19th International Semantic Web Conference
Many Web developers nowadays are trained to build applications with a user-facing browser front-end that obtains predictable data structures from a single, well-known back-end. Linked Data invalidates such assumptions, since data can combine several ontologies and span multiple servers with different APIs. Front-end developers, who specialize in creating end-user experiences rather than back-ends, need an abstraction layer to the Web of Data that matches their existing mindset and workflow. We have developed LDflex, a domain-specific language that exposes common Linked Data access patterns as concise JavaScript expressions. In this article, we describe the design and embedding of the language, and discuss its daily usage within two companies. LDflex succeeds in eliminating a dedicated data layer for common and straightforward data access patterns, without striving to be a replacement for more complex cases. Our experiences reveal that designing a Linked Data developer experience, analogous to a user experience, is crucial for adoption by the target group, who can create Linked Data applications for end users. Crucially, simple abstractions require research to hide the underlying complexity.
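As a flavor of the kind of expression the paper describes, the sketch below resolves property paths against the Web of Data. The package names and API follow my understanding of the public LDflex libraries and may differ between versions; the profile URL is just an example.

```typescript
import { PathFactory } from 'ldflex';
import ComunicaEngine from '@ldflex/comunica';
import { namedNode } from '@rdfjs/data-model';

// A JSON-LD context maps short names to IRIs, so path expressions read
// like plain JavaScript property access.
const context = {
  '@context': {
    name: 'http://xmlns.com/foaf/0.1/name',
    knows: 'http://xmlns.com/foaf/0.1/knows',
  },
};

// The query engine fetches and traverses documents behind the scenes.
const queryEngine = new ComunicaEngine('https://ruben.verborgh.org/profile/');
const paths = new PathFactory({ context, queryEngine });

const ruben = paths.create({
  subject: namedNode('https://ruben.verborgh.org/profile/#me'),
});

// Each awaited expression is resolved to one or more queries at runtime.
(async () => {
  console.log(`Name: ${await ruben.name}`);
  for await (const friendName of ruben.knows.name) {
    console.log(`Friend: ${friendName}`);
  }
})();
```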
- Conference Facilitating the Analysis of COVID-19 Literature Through a Knowledge Graph
- Bram Steenwinckel
- Gilles Vandewiele
- Ilja Rausch
- Pieter Heyvaert
- Ruben Taelman
- Pieter Colpaert
- Pieter Simoens
- Anastasia Dimou
- Filip De Turck
- Femke Ongenae
- Workshop Towards Querying in Decentralized Environments with Privacy-Preserving Aggregation
In Proceedings of the 4th Workshop on Storing, Querying, and Benchmarking the Web of Data
The Web is a ubiquitous economic, educational, and collaborative space. However, it also serves as a haven for personal information harvesting. Existing decentralised Web-based ecosystems, such as Solid, aim to combat personal data exploitation on the Web by enabling individuals to manage their data in the personal data store of their choice. Since personal data in these decentralised ecosystems is distributed across many sources, there is a need for techniques that support efficient privacy-preserving query execution over personal data stores. Towards this end, in this position paper we present a framework for efficient privacy-preserving federated querying, and highlight open research challenges and opportunities. The overarching goal is to provide a means of positioning future research into privacy-preserving querying within decentralised environments.
- Conference Pattern-based Access Control in a Decentralised Collaboration Environment
- Jeroen Werbrouck
- Ruben Taelman
- Ruben Verborgh
- Pieter Pauwels
- Jakob Beetz
- Erik Mannens
- Preprint Guided Link-Traversal-Based Query Processing
Link-Traversal-Based Query Processing (LTBQP) is a technique for evaluating queries over a web of data by starting with a set of seed documents that is dynamically expanded through following hyperlinks. Compared to query evaluation over a static set of sources, LTBQP is significantly slower because of the number of network requests it needs. Furthermore, there are concerns regarding the relevance and trustworthiness of results, given that sources are selected dynamically. To address both issues, we propose guided LTBQP, a technique in which information about document linking structure and content policies is passed to a query processor. Thereby, the processor can prune the search tree of documents by only following relevant links, and restrict the result set to desired results by limiting which documents are considered for which kinds of content. In this exploratory paper, we describe the technique at a high level and sketch some of its applications. We argue that such guidance can make LTBQP a valuable query strategy in decentralized environments, where data is spread across documents with varying levels of user trust.
- PhD Thesis Storing and Querying Evolving Knowledge Graphs on the Web
The Web has become our most valuable tool for sharing information. Currently, this Web is mainly targeted at humans, whereas machines typically have a hard time understanding information on the Web. Using knowledge graphs, this information can be linked in a structured way, so that intelligent agents can act upon the data autonomously. Current knowledge graphs, however, remain rather static. As there is a lot of value in acting upon evolving knowledge, there is a need for evolving knowledge graphs, and for ways to manage them. As such, the goal of this PhD is to allow such evolving knowledge graphs to be stored and queried, taking into account the decentralized nature of the Web, where anyone should be able to say anything about anything. Concretely, four challenges related to this goal are investigated: (1) generation of evolving data, (2) storage of evolving data, (3) querying over heterogeneous datasets, and (4) querying evolving data. For each of these challenges, techniques and algorithms have been developed, which prove to be valuable for storing and querying evolving knowledge graphs on the Web. This work therefore brings us closer to a Web in which both humans and machines can act upon evolving knowledge.
2019
- Conference Streamlining governmental processes by putting citizens in control of their personal data
- Raf Buyle
- Ruben Taelman
- Katrien Mostaert
- Geroen Joris
- Erik Mannens
- Ruben Verborgh
- Tim Berners-Lee
- Demo Discovering Data Sources in a Distributed Network of Heritage Information
- Miel Vander Sande
- Sjors de Valk
- Enno Meijers
- Ruben Taelman
- Herbert Van de Sompel
- Ruben Verborgh
- Conference Reflections on: Triple Storage for Random-Access Versioned Querying of RDF Archives
In Proceedings of the 18th International Semantic Web Conference
In addition to their latest version, Linked Open Datasets on the Web can also contain useful information in or between previous versions. In order to exploit this information, we can maintain history in RDF archives. Existing approaches either require much storage space or do not meet sufficiently expressive querying demands. In this extended abstract, we discuss an RDF archive indexing technique that has a low storage overhead and adds metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating versioned queries. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion, it significantly lowers the average lookup time for versioning queries. Our storage technique reduces query evaluation time through a preprocessing step during ingestion, which only in some cases increases storage space compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.
- Demo Using an existing website as a queryable low-cost LOD publishing interface
In Proceedings of the 16th Extended Semantic Web Conference: Posters and Demos
Maintaining an Open Dataset comes at an extra recurring cost when it is published in a dedicated Web interface. As there is often no direct financial return from publishing a dataset publicly, these extra costs need to be minimized. Therefore, we want to explore reusing existing infrastructure by enriching existing websites with Linked Data. In this demonstrator, we advised the data owner to annotate a digital heritage website with JSON-LD snippets, resulting in a dataset of more than three million triples that is now available and officially maintained. The website itself is paged, and thus Hydra partial collection view controls were added in the snippets. We then extended the modular query engine Comunica to support following page controls and extracting data from HTML documents while querying. This way, a SPARQL or GraphQL query over multiple heterogeneous data sources can power automated data reuse. While query performance on such an interface is visibly poor, it becomes easy to create composite data dumps. As a result of implementing these building blocks in Comunica, any paged collection and enriched HTML page becomes queryable by the query engine. This enables heterogeneous data interfaces to share functionality and become technically interoperable.
- Position Statement Bridges between GraphQL and RDF
In W3C Workshop on Web Standardization for Graph Data
GraphQL offers a highly popular query language for graphs, which is well known among Web developers. Each GraphQL data graph is scoped within an interface-specific schema, which makes it difficult to integrate data from multiple interfaces. The RDF graph model offers a solution to this integration problem. Different techniques can enable querying over RDF graphs using GraphQL queries. In this position statement, we provide a high-level overview of the existing techniques and how they differ. We argue that each of these techniques has its merits, but standardization is essential to simplify the link between GraphQL and RDF.
2018
- Conference Comunica: a Modular SPARQL Query Engine for the Web
In Proceedings of the 17th International Semantic Web Conference
Query evaluation over Linked Data sources has become a complex story, given the multitude of algorithms and techniques for single- and multi-source querying, as well as the heterogeneity of Web interfaces through which data is published online. Today's query processors are insufficiently adaptable to test multiple query engine aspects in combination, such as evaluating the performance of a certain join algorithm over a federation of heterogeneous interfaces. The Semantic Web research community needs a flexible query engine that allows plugging in new components such as different algorithms, new or experimental SPARQL features, and support for new Web interfaces. We designed and developed a Web-friendly and modular meta query engine called Comunica that meets these specifications. In this article, we introduce this query engine and explain the architectural choices behind its design. We show how its modular nature makes it an ideal research platform for investigating new kinds of Linked Data interfaces and querying algorithms. Comunica facilitates the development, testing, and evaluation of new query processing capabilities, both in isolation and in combination with others.
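For readers unfamiliar with Comunica, the snippet below shows the general shape of querying over heterogeneous sources with it. It uses the current @comunica/query-sparql package, which post-dates this paper, so treat the exact API and source URLs as indicative rather than authoritative.

```typescript
import { QueryEngine } from '@comunica/query-sparql';

// One engine, several heterogeneous source types (plain RDF documents,
// Triple Pattern Fragments, SPARQL endpoints); Comunica's modules detect
// and handle each interface behind the scenes.
const engine = new QueryEngine();

(async () => {
  const bindingsStream = await engine.queryBindings(`
    SELECT ?interest WHERE {
      <https://ruben.verborgh.org/profile/#me>
        <http://xmlns.com/foaf/0.1/topic_interest> ?interest.
    }`, {
    sources: [
      'https://ruben.verborgh.org/profile/',      // plain Linked Data document
      'https://fragments.dbpedia.org/2016-04/en', // Triple Pattern Fragments
    ],
  });
  const bindings = await bindingsStream.toArray();
  for (const b of bindings) {
    console.log(b.get('interest')?.value);
  }
})();
```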
- Demo GraphQL-LD: Linked Data Querying with GraphQL
In Proceedings of the 17th International Semantic Web Conference: Posters and Demos
The Linked Open Data cloud has the potential of significantly enhancing and transforming end-user applications. For example, the use of URIs to identify things allows data joining between separate data sources. Most popular (Web) application frameworks, such as React and Angular, have limited support for querying the Web of Linked Data, which leads to a high entry barrier for Web application developers. Instead, these developers increasingly use the highly popular GraphQL query language for retrieving data from GraphQL APIs, because GraphQL is tightly integrated into these frameworks. In order to lower the barrier for developers towards Linked Data consumption, the Linked Open Data cloud needs to be queryable with GraphQL as well. In this article, we introduce a method for transforming GraphQL queries coupled with a JSON-LD context to SPARQL, and a method for converting SPARQL results into GraphQL query-compatible responses. We demonstrate this method by implementing it in the Comunica framework. This approach brings us one step closer towards widespread Linked Data consumption for application development.
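To illustrate the transformation idea, here is a hand-written example of a GraphQL query with a JSON-LD context alongside the SPARQL query it conceptually corresponds to. The context mappings and the resulting SPARQL are illustrative; the actual GraphQL-LD implementation may produce a different but equivalent query.

```typescript
// A JSON-LD context gives global IRI meaning to the GraphQL field names.
const context = {
  '@context': {
    label: 'http://www.w3.org/2000/01/rdf-schema#label',
    writer: 'http://dbpedia.org/ontology/writer',
  },
};

const graphqlQuery = `
  {
    label
    writer { label }
  }`;

// Conceptually equivalent SPARQL: every GraphQL field becomes a triple
// pattern, and nesting chains variables together.
const sparqlQuery = `
  SELECT ?label ?writer_label WHERE {
    ?book <http://www.w3.org/2000/01/rdf-schema#label> ?label.
    ?book <http://dbpedia.org/ontology/writer> ?writer.
    ?writer <http://www.w3.org/2000/01/rdf-schema#label> ?writer_label.
  }`;
```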
- Demo Demonstration of Comunica, a Web framework for querying heterogeneous Linked Data interfaces
In Proceedings of the 17th International Semantic Web Conference: Posters and Demos
Linked Data sources can appear in a variety of forms, ranging from SPARQL endpoints to Triple Pattern Fragments and data dumps. This heterogeneity among Linked Data sources creates an added layer of complexity when querying or combining results from those sources. To ease this problem, we created a modular engine, Comunica, that has modules for evaluating SPARQL queries and supports heterogeneous interfaces. Other modules for other query or source types can easily be added. In this paper, we showcase a Web client that uses Comunica to evaluate federated SPARQL queries through automatic source type identification and interaction.
- Workshop The Fundamentals of Semantic Versioned Querying
In Proceedings of the 12th International Workshop on Scalable Semantic Web Knowledge Base Systems, co-located with the 17th International Semantic Web Conference
The domain of RDF versioning concerns itself with the storage of different versions of Linked Datasets. The ability to query over these versions is an active area of research, and allows basic insights to be discovered, such as tracking the evolution of certain things in datasets. Querying can, however, only get you so far. In order to derive logical consequences from existing knowledge, we need to be able to reason over this data, such as through ontology-based inferencing. To achieve this, we explore fundamental concepts for semantic querying of versioned datasets using ontological knowledge. In this work, we present these concepts as a semantic extension of the existing RDF versioning concepts, which focus on syntactical versioning. We remain general and assume that versions do not necessarily follow a purely linear temporal relation. This work lays a foundation for reasoning over RDF versions from a querying perspective, with which RDF versioning storage, query and reasoning systems can be designed.
- Conference On the Semantics of TPF-QS towards Publishing and Querying RDF Streams at Web-scale
- Ruben Taelman
- Riccardo Tommasini
- Joachim Van Herwegen
- Miel Vander Sande
- Emanuele Della Valle
- Ruben Verborgh
- Journal Triple Storage for Random-Access Versioned Querying of RDF Archives
In Journal of Web Semantics
When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version. Nevertheless, useful information is present in or between previous versions. In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives. Existing approaches either require much storage space, or they expose an insufficiently expressive or efficient interface with respect to querying demands. In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead, by compressing consecutive versions and adding metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and for versions. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion, it significantly lowers the average lookup time for versioning queries. OSTRICH performs better for many smaller dataset versions than for few larger dataset versions. Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results. Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion, which only in some cases increases storage space compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.
- Journal Generating Public Transport Data based on Population Distributions for RDF Benchmarking
In Semantic Web Journal When benchmarking RDF data management systems such as public transport route planners, system evaluation needs to happen under various realistic circumstances, which requires a wide range of datasets with different properties. Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and inflexible for testing. For these reasons, synthetic dataset generators are typically preferred over real-world datasets due to their intrinsic flexibility. Unfortunately, many synthetic datasets that are generated within benchmarks are insufficiently realistic, raising questions about the generalizability of benchmark results to real-world scenarios. In order to benchmark geospatial and temporal RDF data management systems such as route planners with sufficient external validity and depth, we designed PoDiGG, a highly configurable generation algorithm for synthetic public transport datasets with realistic geospatial and temporal characteristics comparable to those of their real-world variants. The algorithm is inspired by real-world public transit network design and scheduling methodologies. This article discusses the design and implementation of PoDiGG and validates the properties of its generated datasets. Our findings show that the generator achieves a sufficient level of realism, based on the existing coherence metric and new metrics we introduce specifically for the public transport domain. Thereby, PoDiGG provides a flexible foundation for benchmarking RDF data management systems with geospatial and temporal data. 2018 - Challenge Versioned Querying with OSTRICH and Comunica in MOCHA 2018More
In Proceedings of the 5th SemWebEval Challenge at ESWC 2018 In order to exploit the value of historical information in Linked Datasets, we need to be able to store and query different versions of such datasets efficiently. The 2018 edition of the Mighty Storage Challenge (MOCHA) is organized to discover the efficiency of such Linked Data stores and to detect their bottlenecks. One task in this challenge focuses on the storage and querying of versioned datasets, in which we participated by combining the OSTRICH triple store and the Comunica SPARQL engine. In this article, we briefly introduce our system for the versioning task of this challenge. The evaluation results show that our system achieves fast query times for the supported queries, although not all query types were supported by Comunica at the time of writing. The results of this challenge will serve as a guideline for further improvements to our system. 2018 - Conference Components.js: A Semantic Dependency Injection FrameworkMore
In Proceedings of The Web Conference: Developers Track Components.js is a dependency injection framework for JavaScript applications that allows components to be instantiated and wired together declaratively using semantic configuration files. The advantage of these semantic configuration files is that software components can be uniquely and globally identified using URIs. As an example, this documentation has been made self-instantiatable using Components.js, which makes it possible to print the HTML version of any page to the console, or to serve it via HTTP on a local webserver. 2018
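As a rough illustration of such declarative wiring, the sketch below instantiates a component from a semantic configuration file. The file name and component URI are made up, and the ComponentsManager API reflects the current componentsjs package rather than the 2018 release.

```typescript
// A sketch of declarative, URI-based instantiation with Components.js.
// The config file name and component URI are hypothetical; the API reflects
// the current componentsjs package, not necessarily the 2018 release.
import { ComponentsManager } from 'componentsjs';

const manager = await ComponentsManager.build({
  mainModulePath: __dirname, // where modules and their semantic descriptions are discovered
});

// A JSON-LD config file declares which component to construct and with which parameters.
await manager.configRegistry.register('./config/my-app.jsonld');

// Because components and instances are identified by URI, this reference is unambiguous.
const app = await manager.instantiate('urn:example:myApp');
```

- Demo OSTRICH: Versioned Random-Access Triple StoreMore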
In Proceedings of the 27th International Conference Companion on World Wide Web The Linked Open Data cloud is ever-growing, and many datasets are frequently being updated. In order to fully exploit the potential of the information that is available in and over historical dataset versions, such as discovering the evolution of taxonomies or diseases in biomedical datasets, we need to be able to store and query the different versions of Linked Datasets efficiently. In this demonstration, we introduce OSTRICH, an efficient triple store with support for versioned query evaluation. We demonstrate the capabilities of OSTRICH using a Web-based graphical user interface in which a store can be opened or created. Using this interface, the user is able to query in, between, and over different versions, ingest new versions, and retrieve summarizing statistics. 2018 - Workshop A Preliminary Open Data Publishing Strategy for Live Data in FlandersMore
- Julián Andrés Rojas Meléndez
- Brecht Van de Vyvere
- Arne Gevaert
- Ruben Taelman
- Pieter Colpaert
- Ruben Verborgh
2017
- Conference Declaratively Describing Responses of Hypermedia-Driven Web APIsMore
In Proceedings of the 9th International Conference on Knowledge Capture While humans browse the Web by following links, these hypermedia links can also be used by machines for browsing. While efforts such as Hydra semantically describe the hypermedia controls on Web interfaces to enable smarter interface-agnostic clients, they are largely limited to the input parameters of interfaces, so clients do not know what response to expect from these interfaces. In order to convey such expectations, interfaces need to declaratively describe the response structure of their parameterized hypermedia controls. We therefore explored techniques to represent this parameterized response structure in a generic but expressive way. In this work, we discuss four different approaches for declaring a response structure, and we compare them based on a model that we introduce. Based on this model, we conclude that a SHACL shape-based approach can be used for declaring such a parameterized response structure, as it conforms to the REST architectural style that has helped shape the Web into its current form. 2017 - Workshop Describing configurations of software experiments as Linked DataMore
In Proceedings of the First Workshop on Enabling Open Semantic Science (SemSci) Within computer science engineering, research articles often rely on software experiments in order to evaluate contributions. Reproducing such experiments involves setting up software, benchmarks, and test data. Unfortunately, many articles ambiguously refer to software by name only, leaving out crucial details for reproducibility, such as module and dependency version numbers or the configuration of individual components in different setups. To address this, we created the Object-Oriented Components ontology for the semantic description of software components and their configuration. This article discusses the ontology and its application, and demonstrates with a use case how to publish experiments and their software configurations on the Web. In order to enable semantic interlinking between configurations and modules, we published the metadata of all 500,000+ JavaScript libraries on npm as 200,000,000+ RDF triples. Through our work, research articles can refer by URL to fine-grained descriptions of experimental setups. This lets us reach accurate reproductions of experiments faster, and facilitates the evaluation of new research contributions with different software configurations. In the future, software could be instantiated automatically based on these descriptions and configurations, and reasoning and querying could be applied to software configurations for meta-research purposes. 2017 - Demo Live Storage and Querying of Versioned Datasets on the WebMore
In Proceedings of the 14th Extended Semantic Web Conference: Posters and Demos Linked Datasets often evolve over time for a variety of reasons. While typical scenarios rely on the latest version only, useful knowledge may still be contained within or between older versions, such as the historical information of biomedical patient data. In order to make this historical information cost-efficiently available on the Web, a low-cost interface is required for providing access to versioned datasets. For our demonstration, we set up a live Triple Pattern Fragments interface for a versioned dataset with queryable access. We explain the different version query types of this interface, and how it communicates with a storage solution that can handle these queries efficiently. 2017 - Workshop Versioned Triple Pattern Fragments: A Low-cost Linked Data Interface Feature for Web ArchivesMore
In Proceedings of the 3rd Workshop on Managing the Evolution and Preservation of the Data Web Linked Datasets typically evolve over time because triples can be removed from or added to datasets, which results in different dataset versions. While most attention is typically given to the latest dataset version, a lot of useful information is still present in previous versions and their historical evolution. In order to make this historical information queryable at Web scale, a low-cost interface is required that provides access to different dataset versions. In this paper, we add a versioning feature to the existing Triple Pattern Fragments interface for queries at, between, and for versions, with an accompanying vocabulary for describing the results, metadata, and hypermedia controls. This interface feature is an important step in the direction of making versioned datasets queryable on the Web, with a low publication cost and effort. 2017
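To give an impression of this interface feature, the sketch below issues the three version query types over HTTP. The query-parameter names are illustrative assumptions, not the actual vocabulary terms the paper defines.

```typescript
// A sketch of the three version query types against a versioned TPF interface.
// Parameter names (version, versionStart, versionEnd, queryVersions) are
// assumptions for illustration only.
const fragment = 'https://example.org/dataset/fragments';
const pattern = 'predicate=' +
  encodeURIComponent('http://www.w3.org/2000/01/rdf-schema#label');

// Query *at* a version: the pattern evaluated against version 2 only.
await fetch(`${fragment}?${pattern}&version=2`);

// Query *between* versions: additions and deletions from version 2 to version 5.
await fetch(`${fragment}?${pattern}&versionStart=2&versionEnd=5`);

// Query *for* versions: matching triples, each annotated in the response
// metadata with the versions it occurs in.
await fetch(`${fragment}?${pattern}&queryVersions=true`);
```

- Poster PoDiGG: A Public Transport RDF Dataset GeneratorMore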
- Ruben Taelman
- Ruben Verborgh
- Tom De Nies
- Erik Mannens
- Tutorial Modeling, Generating, and Publishing Knowledge as Linked DataMore
In Knowledge Engineering and Knowledge Management: EKAW 2016 Satellite Events, EKM and Drift-an-LOD, Bologna, Italy, November 19–23, 2016, Revised Selected Papers The process of extracting, structuring, and organizing knowledge from one or multiple data sources and preparing it for the Semantic Web requires a dedicated class of systems. They enable processing large and originally heterogeneous data sources and capturing new knowledge. Offering existing data as Linked Data increases its shareability, extensibility, and reusability. However, using Linked Data as a means to represent knowledge can be easier said than done. In this tutorial, we elaborate on the importance of semantically annotating data and how existing technologies facilitate their mapping to Linked Data. We introduce the [R2]RML languages to generate Linked Data derived from different heterogeneous data formats (e.g., databases, XML, or JSON) and from different interfaces (e.g., files or Web APIs). Those who are not Semantic Web experts can annotate their data with the RMLEditor, whose user interface hides all underlying Semantic Web technologies from data owners. Finally, we show how to easily publish Linked Data on the Web as Triple Pattern Fragments. As a result, participants, independently of their knowledge background, can model, annotate, and publish data on their own. 2017
2016
- Poster Exposing RDF Archives using Triple Pattern FragmentsMore
In Proceedings of the 20th International Conference on Knowledge Engineering and Knowledge Management: Posters and Demos Linked Datasets typically change over time, and knowledge of this historical information can be useful. This makes the storage and querying of Dynamic Linked Open Data an important area of research. With the current versioning solutions, publishing Dynamic Linked Open Data at Web scale is possible, but too expensive. We investigate the possibility of using the low-cost Triple Pattern Fragments (TPF) interface to publish versioned Linked Open Data. In this paper, we discuss requirements for supporting versioning in the TPF framework, on the level of the interface, storage, and client, and investigate which trade-offs exist. These requirements lay the foundations for further research in the area of low-cost, Web-scale dynamic Linked Open Data publication and querying. 2016 - Demo Querying Dynamic Datasources with Continuously Mapped Sensor DataMore
In Proceedings of the 15th International Semantic Web Conference: Posters and Demos The world contains a large number of sensors that produce new data at a high frequency. It is currently very hard to find public services that expose these measurements as dynamic Linked Data. We investigate how sensor data can be published continuously on the Web at a low cost. This paper describes how the publication of various sensor data sources can be done by continuously mapping raw sensor data to RDF and inserting it into a live, low-cost server. This makes it possible for clients to continuously evaluate dynamic queries using public sensor data. For our demonstration, we will illustrate how this pipeline works for the publication of temperature and humidity data originating from a microcontroller, and how it can be queried. 2016 - Demo Linked Sensor Data Generation using Queryable RML MappingsMore
In Proceedings of the 15th International Semantic Web Conference: Posters and Demos As the amount of generated sensor data is increasing, semantic interoperability becomes an important aspect in order to support efficient data distribution and communication. Therefore, the integration of (sensor) data is important, as this data is coming from different data sources and might be in different formats. Furthermore, reusable and extensible methods for this integration are required in order to be able to scale with the growing number of applications that generate semantic sensor data. Current research efforts allow mapping sensor data to Linked Data in order to provide semantic interoperability. However, they lack support for multiple data sources, hampering the integration. Furthermore, the used methods are not available for reuse or are not extensible, which hampers the development of applications. In this paper, we describe how the RDF Mapping Language (RML) and a Triple Pattern Fragments (TPF) server are used to address these shortcomings. The demonstration consists of a microcontroller that generates sensor data. The data is captured and mapped to RDF triples using module-specific RML mappings, which are queried from a TPF server. 2016
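The sketch below is a hand-written stand-in for the mapping step of this pipeline: one raw sensor reading becomes RDF triples. It uses the rdf-data-factory package, and the vocabulary and resource IRIs are illustrative assumptions rather than the actual RML mappings used in the demo.

```typescript
// A stand-in for what a module-specific RML mapping produces for one sensor
// reading; IRIs below are illustrative assumptions, not the demo's vocabulary.
import { DataFactory } from 'rdf-data-factory';

const df = new DataFactory();
const reading = { sensor: 'dht22-1', temperature: 21.4, time: '2016-10-01T12:00:00Z' };

const obs = df.namedNode(`http://example.org/obs/${reading.sensor}/${reading.time}`);
const quads = [
  df.quad(obs,
    df.namedNode('http://example.org/vocab/madeBySensor'),
    df.namedNode(`http://example.org/sensor/${reading.sensor}`)),
  df.quad(obs,
    df.namedNode('http://example.org/vocab/temperature'),
    df.literal(String(reading.temperature),
      df.namedNode('http://www.w3.org/2001/XMLSchema#decimal'))),
];
// These quads would then be inserted into the live server and exposed via TPF.
```

- Workshop Multidimensional Interfaces for Selecting Data within Ordinal RangesMore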
In Proceedings of the 7th International Workshop on Consuming Linked Data Linked Data interfaces exist in many flavours, as evidenced by subject pages, SPARQL endpoints, triple pattern interfaces, and data dumps. These interfaces are mostly used to retrieve parts of a complete dataset; such parts can, for example, be defined by ranges in one or more dimensions. Filtering Linked Data by dimensions such as time range, geospatial area, or genomic location requires the lookup of data within ordinal ranges. To make retrieval by such ranges generic and cost-efficient, we propose a REST solution in between looking up data within ordinal ranges entirely on the server and entirely on the client. To this end, we introduce a method for extending any Linked Data interface with an n-dimensional interface-level index such that n-dimensional ordinal data can be selected using n-dimensional ranges. We formally define Range Gates and Range Fragments and theoretically evaluate the cost-efficiency of hosting such an interface. By adding a multidimensional index to a Linked Data interface for multidimensional ordinal data, we can get benefits from both worlds: the expressivity of the server rises, yet it remains more cost-efficient than an interface providing the full functionality server-side. Furthermore, the client now shares in the effort to filter the data. This makes query processing more flexible for the end user, because the query plan can be altered by the engine. In future work, we hope to apply Range Gates and Range Fragments to real-world interfaces to give quicker access to data within ordinal ranges. 2016
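As a rough intuition for the approach, a Range Gate can be seen as an interface-level index node that routes an n-dimensional range request to the Range Fragments overlapping it. Everything in the sketch below (types, names, the in-memory tree) is a hypothetical illustration, not the paper's formal definitions.

```typescript
// A hypothetical, in-memory illustration of Range Gates routing range selections
// to Range Fragments; the paper defines these formally at the interface level.
interface Range { from: number; to: number }   // one ordinal dimension
type NDRange = Range[];                        // n dimensions

interface RangeFragment { covers: NDRange; dataUrl: string }
interface RangeGate { covers: NDRange; children: (RangeGate | RangeFragment)[] }

const overlaps = (a: NDRange, b: NDRange): boolean =>
  a.every((r, i) => r.from <= b[i].to && b[i].from <= r.to);

// Collect the URLs of all fragments whose region overlaps the requested range;
// the server only serves the index, and the client does the residual filtering.
function select(node: RangeGate | RangeFragment, query: NDRange, out: string[] = []): string[] {
  if (!overlaps(node.covers, query)) return out;
  if ('dataUrl' in node) out.push(node.dataUrl);
  else for (const child of node.children) select(child, query, out);
  return out;
}
```

- Conference Continuous Client-Side Query Evaluation over Dynamic Linked DataMore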
In The Semantic Web: ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 – June 2, 2016, Revised Selected Papers Existing solutions to query dynamic Linked Data sources extend the SPARQL language, and require continuous server processing for each query. Traditional SPARQL endpoints already accept highly expressive queries, so extending these endpoints for time-sensitive queries increases the server cost even further. To make continuous querying over dynamic Linked Data more affordable, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this paper, we introduce the TPF Query Streamer that allows clients to evaluate SPARQL queries with continuously updating results. Our experiments indicate that this extension significantly lowers the server complexity, at the expense of an increase in the execution time per query. We prove that by moving the complexity of continuously evaluating queries over dynamic Linked Data to the clients, and thus increasing bandwidth usage, the cost at the server side is significantly reduced. Our results show that this solution makes real-time querying more scalable for a large number of concurrent clients when compared to the alternatives. 2016 - Poster Moving Real-Time Linked Data Query Evaluation to the ClientMore
In Proceedings of the 13th Extended Semantic Web Conference: Posters and Demos Traditional RDF stream processing engines work completely server-side, which contributes to a high server cost. To allow a large number of concurrent clients to do continuous querying, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this poster, we give an overview of a client-side RDF stream processing engine on top of TPF. Our experiments show that our solution significantly lowers the server load while increasing the load on the clients. Preliminary results indicate that our solution moves the complexity of continuously evaluating real-time queries from the server to the client, which makes real-time querying much more scalable for a large number of concurrent clients when compared to the alternatives. 2016 - PhD Symposium Continuously Self-Updating Query Results over Dynamic Heterogeneous Linked DataMore
In The Semantic Web. Latest Advances and New Domains: 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 – June 2, 2016, Proceedings Our society is evolving towards massive data consumption from heterogeneous sources, which includes rapidly changing data like public transit delay information. Many applications that depend on dynamic data consumption require highly available server interfaces. Existing interfaces involve substantial costs to publish rapidly changing data with high availability, and are therefore only feasible for organisations that can afford such an expensive infrastructure. In my doctoral research, I investigate how to publish and consume real-time and historical Linked Data on a large scale. To reduce server-side costs and make dynamic data publication affordable, I will examine different possibilities to divide query evaluation between servers and clients. This paper discusses the methods I aim to follow, together with preliminary results and the steps required to realize this solution. An initial prototype achieves a significantly lower server processing cost per query, while maintaining reasonable query execution times and client costs. Given these promising results, I feel confident this research direction is a viable solution for offering low-cost dynamic Linked Data interfaces, as opposed to the existing high-cost solutions. 2016 - Workshop Continuously Updating Query Results over Real-Time Linked DataMore
In Proceedings of the 2nd Workshop on Managing the Evolution and Preservation of the Data Web Existing solutions to query dynamic Linked Data sources extend the SPARQL language, and require continuous server processing for each query. Traditional SPARQL endpoints accept highly expressive queries, contributing to high server cost. Extending these endpoints for time-sensitive queries increases the server cost even further. To make continuous querying over real-time Linked Data more affordable, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries. In this paper, we discuss a framework on top of TPF that allows clients to execute SPARQL queries with continuously updating results. Our experiments indicate that this extension significantly lowers the server complexity. The trade-off is an increase in the execution time per query. We prove that by moving the complexity of continuously evaluating real-time queries over Linked Data to the clients, and thus increasing the bandwidth usage, the cost of server-side interfaces is significantly reduced. Our results show that this solution makes real-time querying more scalable in terms of CPU usage for a large number of concurrent clients when compared to the alternatives. 2016
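The pattern shared by these continuous-querying results, where the server annotates data with volatility metadata and the client re-evaluates once results may have expired, can be sketched as follows. All names are illustrative assumptions rather than the actual TPF Query Streamer API.

```typescript
// A sketch of client-side continuous query evaluation driven by server-provided
// freshness metadata; names are illustrative, not the TPF Query Streamer API.
interface TimedResults<T> { results: T[]; maxAgeMs: number }

async function continuouslyEvaluate<T>(
  evaluate: () => Promise<TimedResults<T>>, // one full client-side query evaluation
  onResults: (results: T[]) => void,
): Promise<never> {
  while (true) {
    const { results, maxAgeMs } = await evaluate();
    onResults(results);
    // The volatility annotation tells the client how long results remain valid,
    // so the polling rate adapts to the data instead of using a fixed interval.
    await new Promise((resolve) => setTimeout(resolve, maxAgeMs));
  }
}
```

This is why the server stays cheap: it only serves triple pattern fragments plus metadata, while each client carries the cost of its own re-evaluation loop.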
2015
- Master’s Thesis Continuously Updating Queries over Real-Time Linked DataMore
This dissertation investigates the possibilities of having continuously updating queries over Linked Data, with a focus on server availability. This work builds upon the ideas of Linked Data Fragments to let clients do most of the work when executing a query. The server adds metadata to make clients aware of the data's volatility, ensuring that query results are always up-to-date. The implementation of the proposed framework is eventually tested and compared to alternative approaches. 2015