Short Biography
Ruben Taelman is a Postdoctoral Researcher at IDLab, Ghent University – imec, Belgium.
His research concerns the decentralized publication and querying of Linked Data on the Web,
and investigates the trade-offs that exist between server and client.
He has contributed to this domain through publications in journals such as
the Semantic Web Journal
and the Journal of Web Semantics.
In addition, he actively applies his research by developing reusable software
for developers and other researchers.
Furthermore, he has presented his work at various conferences and workshops such as
the Web Conference,
the International Semantic Web Conference (ISWC),
and the Extended Semantic Web Conference (ESWC).
Additionally, he has served as a reviewer and program committee member for these journals and conferences.
Academic Work Experience
Education
Teaching
Courses
-
Web Development
2017 - now
Ghent University, Belgium
Lecturing as co-lecturer and coaching Computer Science students on designing and implementing Web applications.
-
Linked Data & Solid
2023 - 2024
VAIA
Teaching about Web Querying to professionals wanting to learn about Linked Data and Solid.
-
Research Project
2021 - now
Ghent University, Belgium
Coaching a group of students to write an academic survey paper.
-
Design of Cloud and Mobile Applications
2015 - 2017
Ghent University, Belgium
Coaching Informatics students on designing and implementing cloud applications.
Supervision of Master Students
-
Jitse De Smet
2023 - 2024
Ghent University, Belgium
Abstracting Data Updates over a Document-oriented interface of a Permissioned Decentralized Environment
-
Victor Ronsyn
2023 - 2024
Ghent University, Belgium
Performance evaluation of the variety of Linked Data Interfaces
-
Elias Vervacke
2022 - 2023
Ghent University, Belgium
Writing a highly efficient JSON-LD parser for specific frames
-
Simon Van Braeckel
2022 - 2023
Ghent University, Belgium
Adaptive query optimization during Link Traversal Query Processing
-
Laurens Debackere
2021 - 2022
Ghent University, Belgium
Efficient and secure querying of Linked Data federations
-
Robin De Baets
2021 - 2022
Ghent University, Belgium
Caching and replication in decentralized Linked Data environments
-
Jasper Vrints
2021 - 2022
Ghent University, Belgium
Query Optimization using WebAssembly
-
Jonas Bovyn
2021 - 2022
Ghent University, Belgium
Query Optimization using WebAssembly
-
Sammy Delanghe
2021 - 2022
Ghent University, Belgium
Analysis of Property Graphs in SPARQL engines
-
Thomas Devriese
2020 - 2021
Ghent University, Belgium
Scaling networks of personal data stores through Approximate Membership Functions
-
Marie Denoo
2020 - 2021
Ghent University, Belgium
Enabling fine-grained access control in decentralized environments
-
Karel Haerens
2019 - 2020
Ghent University, Belgium
Query execution over decentralized social media networks
-
Serge Morel
2018 - 2019
Ghent University, Belgium
Computational integrity for outsourced execution of SPARQL queries
-
Isa Sebrechts
2018 - 2019
Ghent University, Belgium
Usability of distributed data sources for modern Web applications
-
Wouter Vandenberghe
2017 - 2019
Ghent University, Belgium
An analysis of the effects of HTTP2 on a low-cost query interface
-
Thibault Mahieu
2017 - 2018
Ghent University, Belgium
Reducing storage requirements of multi-version graph databases using forward and reverse deltas
-
Brecht Hendrickx
2017 - 2018
Ghent University, Belgium
Client-side querying of media fragments using a low-cost server interface
Fellowships
Awards
Invited talks
Publications
Journal articles
Journal
Towards Applications on the Decentralized Web using Hypermedia-driven Query Engines
-
Ruben Taelman
In ACM SIGWEB Newsletter
The Web is facing unprecedented challenges related to the control and ownership of data. Due to
recent privacy and manipulation scandals caused by the increasing centralization of data on the
Web into increasingly fewer large data silos, there is a growing demand for the re-decentralization
of the Web. Enforced by user-empowering legislation such as the GDPR and CCPA, decentralization
initiatives such as Solid are being designed that aim to break personal data out of these silos and
give back control to the users. While the ability to choose where and how data is stored provides
significant value to users, it leads to major technical challenges due to the massive distribution
of data across the Web. Since we cannot expect application developers to take up this burden of
coping with all the complexities of handling this data distribution, there is a need for a software
layer that abstracts away these complexities and makes this decentralized data feel as if it were
centralized. Concretely, we propose personal query engines to play the role of such an abstraction
layer. In this article, we discuss what the requirements are for such a query-based abstraction layer,
what the state of the art is in this area, and what future research is required. Even though many
challenges remain to achieve a developer-friendly abstraction for interacting with decentralized
data on the Web, the pursuit of it is highly valuable both for end-users who want to be in control
of their data and its processing, and for businesses wanting to enable this at a low development cost.
2024
Journal
Distributed Subweb Specifications for Traversing the Web
-
Bart Bogaerts
-
Bas Ketsman
-
Younes Zeboudj
-
Heba Aamer
-
Ruben Taelman
-
Ruben Verborgh
In Theory and Practice of Logic Programming
Link Traversal–based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents
rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However,
in a time where the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of
Data with a simple document-based interface is appealing, as it enables data publishers to control their data
and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance
issues (due to the high number of documents containing data) as well as information quality concerns (due to
the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to
query is entirely in the hands of the data consumer. In this paper, we argue that to solve these issues, data
publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and
trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its
properties. We illustrate with a theoretical example that this can improve query results and reduce the number
of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications and
indeed observe that not just the data quality but also the efficiency of querying improves.
2023
Journal
Components.js: Semantic Dependency Injection
-
Ruben Taelman
-
Joachim Van Herwegen
-
Miel Vander Sande
-
Ruben Verborgh
In Semantic Web Journal
A common practice within object-oriented software is using composition to realize complex object behavior in a reusable way.
Such compositions can be managed by Dependency Injection (DI),
a popular technique in which components only depend on minimal interfaces and have their concrete dependencies passed into them.
Instead of requiring program code, this separation enables describing the desired instantiations in declarative configuration files,
such that objects can be wired together automatically at runtime. Configurations for existing DI frameworks typically only have local semantics,
which limits their usage in other contexts.
Yet some cases require configurations outside of their local scope, such as for the reproducibility of experiments, static program analysis, and semantic workflows.
As such, there is a need for globally interoperable, addressable, and discoverable configurations, which can be achieved by leveraging Linked Data.
We created Components.js as an open-source semantic DI framework for TypeScript and JavaScript applications,
providing global semantics via Linked Data-based configuration files.
In this article, we report on the Components.js framework by explaining its architecture and configuration,
and discuss its impact by mentioning where and how applications use it.
We show that Components.js is a stable framework that has seen significant uptake during the last couple of years.
We recommend it for software projects that require high flexibility, configuration without code changes, sharing configurations with others,
or applying these configurations in other contexts such as experimentation or static program analysis.
We anticipate that Components.js will continue driving concrete research and development projects that require high degrees of customization
to facilitate experimentation and testing, including the Comunica query engine and the Community Solid Server for decentralized data publication.
2022
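To make the declarative wiring described above concrete, the following is a minimal sketch of what a Components.js configuration file can look like. The component type, parameter, and IRIs are hypothetical illustrations, and the context URL merely follows the pattern used by Components.js releases:

```json
{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/componentsjs/^5.0.0/components/context.jsonld"
  ],
  "@id": "urn:my-app:myGreeter",
  "@type": "GreeterComponent",
  "greeting": "Hello, Linked Data!"
}
```

Because such a configuration is plain JSON-LD, the instantiation it describes is itself Linked Data: it can be addressed by IRI, shared, and reused in contexts beyond the application that defines it.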
Journal
Optimizing Storage of RDF Archives using Bidirectional Delta Chains
-
Ruben Taelman
-
Thibault Mahieu
-
Martin Vanbrabant
-
Ruben Verborgh
In Semantic Web Journal
Linked Open Datasets on the Web that are published as RDF can evolve over time.
There is a need to be able to store such evolving RDF datasets,
and query across their versions.
Different storage strategies are available for managing such versioned datasets,
each being efficient for specific types of versioned queries.
In recent work, a hybrid storage strategy has been introduced that combines these different strategies
to lead to more efficient query execution for all versioned query types at the cost of increased ingestion time.
While this trade-off is beneficial in the context of Web querying,
it suffers from exponential ingestion times in terms of the number of versions,
which becomes problematic for RDF datasets with many versions.
As such, there is a need for an improved storage strategy that scales better in terms of ingestion time for many versions.
We have designed, implemented, and evaluated a change to the hybrid storage strategy
where we make use of a _bidirectional delta chain_
instead of the default _unidirectional delta chain_.
In this article,
we introduce a concrete architecture for this change,
together with accompanying ingestion and querying algorithms.
Experimental results from our implementation
show that the ingestion time is significantly reduced.
As an additional benefit,
this change also leads to lower total storage size and even improved query execution performance in some cases.
This work shows that modifying the structure of delta chains within the hybrid storage strategy
can be highly beneficial for RDF archives.
In future work,
other modifications to this delta chain structure deserve to be investigated,
to further improve the scalability of ingestion and querying of datasets with many versions.
2021
Journal
Semantic micro-contributions with decentralized nanopublication services
-
Tobias Kuhn
-
Ruben Taelman
-
Vincent Emonet
-
Haris Antonatos
-
Stian Soiland-Reyes
-
Michel Dumontier
In PeerJ Computer Science
While the publication of Linked Data has become increasingly common, the process tends to be a relatively complicated and heavy-weight one. Linked Data is typically published by centralized entities in the form of larger dataset releases, which has the downside that there is a central bottleneck in the form of the organization or individual responsible for the releases. Moreover, certain kinds of data entries, in particular those with subjective or original content, currently do not fit into any existing dataset and are therefore more difficult to publish. To address these problems, we present here an approach to use nanopublications and a decentralized network of services to allow users to directly publish small Linked Data statements through a simple and user-friendly interface, called Nanobench, powered by semantic templates that are themselves published as nanopublications. The published nanopublications are cryptographically verifiable and can be queried through a redundant and decentralized network of services, based on the grlc API generator and a new quad extension of Triple Pattern Fragments. We show here that these two kinds of services are complementary and together allow us to query nanopublications in a reliable and efficient manner. We also show that Nanobench makes it indeed very easy for users to publish Linked Data statements, even for those who have no prior experience in Linked Data publishing.
2021
Journal
Triple Storage for Random-Access Versioned Querying of RDF Archives
-
Ruben Taelman
-
Miel Vander Sande
-
Joachim Van Herwegen
-
Erik Mannens
-
Ruben Verborgh
In Journal of Web Semantics
When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version.
Nevertheless, useful information is present in or between previous versions.
In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives.
Existing approaches either require much storage space, or they expose an insufficiently expressive or efficient interface with respect to querying demands.
In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead,
by compressing consecutive versions and adding metadata for reducing lookup times.
We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and for versions.
Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH.
Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency.
By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries.
OSTRICH performs better for many smaller dataset versions than for few larger dataset versions.
Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results.
Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion,
which only in some cases increases storage space when compared to other approaches.
This allows data owners to store and query multiple versions of their dataset efficiently,
lowering the barrier to historical dataset publication and analysis.
2018
Journal
Generating Public Transport Data based on Population Distributions for RDF Benchmarking
-
Ruben Taelman
-
Pieter Colpaert
-
Erik Mannens
-
Ruben Verborgh
In Semantic Web Journal
When benchmarking RDF data management systems such as public transport route planners,
system evaluation needs to happen under various realistic circumstances,
which requires a wide range of datasets with different properties.
Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and inflexible for testing.
For these reasons, synthetic dataset generators are typically preferred over real-world datasets due to their intrinsic flexibility.
Unfortunately, many synthetic datasets that are generated within benchmarks are insufficiently realistic,
raising questions about the generalizability of benchmark results to real-world scenarios.
In order to benchmark geospatial and temporal RDF data management systems such as route planners with sufficient external validity and depth,
we designed PoDiGG, a highly configurable generation algorithm for synthetic public transport datasets
with realistic geospatial and temporal characteristics comparable to those of their real-world variants.
The algorithm is inspired by real-world public transit network design and scheduling methodologies.
This article discusses the design and implementation of PoDiGG and validates the properties of its generated datasets.
Our findings show that the generator achieves a sufficient level of realism,
based on the existing coherence metric and new metrics we introduce specifically for the public transport domain.
Thereby, PoDiGG provides a flexible foundation for benchmarking RDF data management systems with geospatial and temporal data.
2018
Conference and workshop contributions
Conference
Decentralized Search over Personal Online Datastores: Architecture and Performance Evaluation
-
Mohamed Ragab
-
Yury Savateev
-
Helen Oliver
-
Thanassis Tiropanis
-
Alex Poulovassilis
-
Adriane Chapman
-
Ruben Taelman
-
George Roussos
In Proceedings of the 24th International Conference on Web Engineering
Data privacy and sovereignty are open challenges in today’s Web, which the Solid ecosystem aims to meet by providing personal online datastores (pods) where individuals can control access to their data. Solid allows developers to deploy applications with access to data stored in pods, subject to users’ permission. For the decentralised Web to succeed, the problem of search over pods with varying access permissions must be solved. The ESPRESSO framework takes the first step in exploring such a search architecture, enabling large-scale keyword search across Solid pods with varying access rights. This paper provides a comprehensive experimental evaluation of the performance and scalability of decentralised keyword search across pods on the current ESPRESSO prototype. The experiments specifically investigate how controllable experimental parameters influence search performance across a range of decentralised settings. This includes examining the impact of different text dataset sizes (0.5MB to 50MB per pod, divided into 1 to 10,000 files), different access control levels (10%, 25%, 50%, or 100% file access), and a range of configurations for Solid servers and pods (from 1 to 100 pods across 1 to 50 servers). The experimental results confirm the feasibility of deploying a decentralised search system to conduct keyword search at scale in a decentralised environment.
2024
Conference
LDkit: Linked Data Object Graph Mapping toolkit for Web Applications
-
Karel Klíma
-
Ruben Taelman
-
Martin Nečaský
In Proceedings of the 22nd International Semantic Web Conference
The adoption of Semantic Web and Linked Data technologies
in web application development has been hindered by the complexity of
numerous standards, such as RDF and SPARQL, as well as the challenges
associated with querying data from distributed sources and a variety of
interfaces. Understandably, web developers often prefer traditional solutions
based on relational or document databases due to the higher level of
data predictability and superior developer experience. To address these
issues, we present LDkit, a novel Object Graph Mapping (OGM) framework
for TypeScript designed to provide a model-based abstraction for
RDF. LDkit facilitates the direct utilization of Linked Data in web applications,
effectively working as the data access layer. It accomplishes
this by querying and retrieving data, and transforming it into TypeScript
primitives according to user defined data schemas, while ensuring
end-to-end data type safety. This paper describes the design and implementation
of LDkit, highlighting its ability to simplify the integration
of Semantic Web technologies into web applications, while adhering to
both general web standards and Linked Data specific standards. Furthermore,
we discuss how LDkit framework has been designed to integrate
seamlessly with popular web application technologies that developers are
already familiar with. This approach promotes ease of adoption, allowing
developers to harness the power of Linked Data without disrupting
their current workflows. Through the provision of an efficient and intuitive
toolkit, LDkit aims to enhance the web ecosystem by promoting the
widespread adoption of Linked Data and Semantic Web technologies.
2023
Conference
Link Traversal Query Processing over Decentralized Environments with Structural Assumptions
-
Ruben Taelman
-
Ruben Verborgh
In Proceedings of the 22nd International Semantic Web Conference
To counter societal and economic problems caused by data silos on the Web, efforts such as Solid strive to reclaim private data by storing it in permissioned documents over a large number of personal vaults across the Web. Building applications on top of such a decentralized Knowledge Graph involves significant technical challenges: centralized aggregation prior to query processing is excluded for legal reasons, and current federated querying techniques cannot handle this large scale of distribution at the expected performance. We propose an extension to Link Traversal Query Processing (LTQP) that incorporates structural properties within decentralized environments to tackle their unprecedented scale. In this article, we analyze the structural properties of the Solid decentralization ecosystem that are relevant for query execution, we introduce novel LTQP algorithms leveraging these structural properties, and evaluate their effectiveness. Our experiments indicate that these new algorithms obtain accurate results in the order of seconds, which existing algorithms cannot achieve. This work reveals that a traversal-based querying method using structural assumptions can be effective for large-scale decentralization, but that advances are needed in the area of query planning for LTQP to handle more complex queries. These insights open the door to query-driven decentralized applications, in which declarative queries shield developers from the inherent complexity of a decentralized landscape.
2023
Conference
Scaling Large RDF Archives To Very Long Histories
-
Olivier Pelgrin
-
Ruben Taelman
-
Luis Galárraga
-
Katja Hose
In Proceedings of the 17th IEEE International Conference on Semantic Computing
In recent years, research in RDF archiving has
gained traction due to the ever-growing nature of semantic data
and the emergence of community-maintained knowledge bases.
Several solutions have been proposed to manage the history of
large RDF graphs, including approaches based on independent
copies, time-based indexes, and change-based schemes. In particular, aggregated changesets have been shown to be relatively
efficient at handling very large datasets. However, ingestion time
can still become prohibitive as the revision history increases. To
tackle this challenge, we propose a hybrid storage approach based
on aggregated changesets, snapshots, and multiple delta chains.
We evaluate different snapshot creation strategies on the BEAR
benchmark for RDF archives, and show that our techniques can
speed up ingestion time up to two orders of magnitude while
keeping competitive performance for version materialization and
delta queries. This allows us to support revision histories of
lengths that are beyond reach with existing approaches.
2023
Conference
Towards a personal data vault society: an interplay between technological and business perspectives
-
Sofie Verbrugge
-
Frederic Vannieuwenborg
-
Marlies Van der Wee
-
Didier Colle
-
Ruben Taelman
-
Ruben Verborgh
In 60th FITCE Communication Days Congress for ICT Professionals: Industrial Data–Cloud, Low Latency and Privacy (FITCE)
The Web is evolving more and more into a small set of walled gardens, where a very small number of
platforms determine the way that people get access to the Web (e.g. "log in via Facebook"). As a pushback against this
ongoing centralization of data and services on the Web, decentralization efforts are taking place that move data into
personal data vaults. This positioning paper discusses a potential personal data vault society and the required research
steps in order to get there. Emphasis is placed on the needed interplay between technological and business research
perspectives. The focus is on the situation of Flanders, based on a recent announcement of the Flemish government to
set up a Data Utility Company. The concepts as well as the suggested path for adoption can easily be extended, however,
to other situations/regions as well.
2021
Conference
Link Traversal with Distributed Subweb Specifications
-
Bart Bogaerts
-
Bas Ketsman
-
Younes Zeboudj
-
Heba Aamer
-
Ruben Taelman
-
Ruben Verborgh
In Rules and Reasoning: 5th International Joint Conference, RuleML+RR 2021, Leuven, Belgium, September 8 – September 15, 2021, Proceedings
Link Traversal–based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset,
is often seen as a theoretically interesting yet impractical technique.
However, in a time where the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing,
as it enables data publishers to control their data and access rights.
While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data)
as well as information quality concerns (due to the many sources providing such documents).
In existing LTQP approaches, the burden of finding sources to query is entirely in the hands of the data consumer.
In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest
and guide the data consumer towards relevant and trustworthy data.
We introduce a theoretical framework that enables such guided link traversal and study its properties.
We illustrate with a theoretical example that this can improve query results and reduce the number of network requests.
2021
Conference
LDflex: a Read/Write Linked Data Abstraction for Front-End Web Developers
-
Ruben Verborgh
-
Ruben Taelman
In Proceedings of the 19th International Semantic Web Conference
Many Web developers nowadays are trained to build applications with a user-facing browser front-end that obtains predictable data structures from a single, well-known back-end. Linked Data invalidates such assumptions, since data can combine several ontologies and span multiple servers with different APIs. Front-end developers, who specialize in creating end-user experiences rather than back-ends, need an abstraction layer to the Web of Data that matches their existing mindset and workflow. We have developed LDflex, a domain-specific language that exposes common Linked Data access patterns as concise JavaScript expressions. In this article, we describe the design and embedding of the language, and discuss its daily usage within two companies. LDflex succeeds in eliminating a dedicated data layer for common and straightforward data access patterns, without striving to be a replacement for more complex cases. Our experiences reveal that designing a Linked Data developer experience—analogous to a user experience—is crucial for adoption by the target group, who can create Linked Data applications for end users. Crucially, simple abstractions require research to hide the underlying complexity.
2020
Conference
Facilitating the Analysis of COVID-19 Literature Through a Knowledge Graph
-
Bram Steenwinckel
-
Gilles Vandewiele
-
Ilja Rausch
-
Pieter Heyvaert
-
Ruben Taelman
-
Pieter Colpaert
-
Pieter Simoens
-
Anastasia Dimou
-
Filip De Turck
-
Femke Ongenae
In Proceedings of the 19th International Semantic Web Conference
At the end of 2019, Chinese authorities alerted the World Health Organization (WHO) of the outbreak of a new strain of the coronavirus, called SARS-CoV-2, which struck humanity with an unprecedented disaster a few months later. In response to this pandemic, a publicly available dataset was released on Kaggle which contained information on over 63,000 papers. In order to facilitate the analysis of this large mass of literature, we have created a knowledge graph based on this dataset. Within this knowledge graph, all information of the original dataset is linked together, which makes it easier to search for relevant information. The knowledge graph is also enriched with additional links to appropriate, already existing external resources. In this paper, we elaborate on the different steps performed to construct such a knowledge graph from structured documents. Moreover, we discuss, on a conceptual level, several possible applications and analyses that can be built on top of this knowledge graph. As such, we aim to provide a resource that allows people to more easily build applications that give more insights into the COVID-19 pandemic.
2020
Conference
Pattern-based Access Control in a Decentralised Collaboration Environment
-
Jeroen Werbrouck
-
Ruben Taelman
-
Ruben Verborgh
-
Pieter Pauwels
-
Jakob Beetz
-
Erik Mannens
In Proceedings of LDAC2020 - 8th Linked Data in Architecture and Construction Workshop
As the building industry is rapidly catching up with digital advancements, and Web technologies grow in both maturity and security, a data- and Web-based construction practice comes within reach. In such an environment, private project information and open online data can be combined to allow cross-domain interoperability at data level, using Semantic Web technologies. As construction projects often feature complex and temporary networks of stakeholder firms and their employees, a property-based access control mechanism is necessary to enable a flexible and automated management of distributed building projects. In this article, we propose a method to facilitate such a mechanism using existing Web technologies: RDF, SHACL, WebIDs, nanopublications and the Linked Data Platform. The proposed method will be illustrated with an extension of a custom Node.js Solid server. The potential of the Solid ecosystem has been put forward earlier as a basis for a Linked Data-based Common Data Environment: its decentralised setup, connection of both RDF and non-RDF resources and fine-grained access control mechanisms are considered an apt foundation to manage distributed building data.
2020
Conference
Streamlining governmental processes by putting citizens in control of their personal data
-
Raf Buyle
-
Ruben Taelman
-
Katrien Mostaert
-
Geroen Joris
-
Erik Mannens
-
Ruben Verborgh
-
Tim Berners-Lee
In Proceedings of the 6th International Conference on Electronic Governance and Open Society: Challenges in Eurasia
Governments typically store large amounts of personal information on their citizens, such as a home address, marital status, and occupation, to offer public services. Because governments consist of various governmental agencies, multiple copies of this data often exist. This raises concerns regarding data consistency, privacy, and access control, especially under recent legal frameworks such as GDPR. To solve these problems, and to give citizens true control over their data, we explore an approach using the decentralised Solid ecosystem, which enables citizens to maintain their data in personal data pods. We have applied this approach to two high-impact use cases, where citizen information is stored in personal data pods, and both public and private organisations are selectively granted access. Our findings indicate that Solid allows reshaping the relationship between citizens, their personal data, and the applications they use in the public and private sector. We strongly believe that the insights from this Flemish Solid Pilot can speed up the process for public administrations and private organisations that want to put the users in control of their data.
2019
Conference
Reflections on: Triple Storage for Random-Access Versioned Querying of RDF Archives
-
Ruben Taelman
-
Miel Vander Sande
-
Joachim Van Herwegen
-
Erik Mannens
-
Ruben Verborgh
In Proceedings of the 18th International Semantic Web Conference
In addition to their latest version, Linked Open Datasets on the Web can also contain useful information in or between previous versions. In order to exploit this information, we can maintain history in RDF archives. Existing approaches either require much storage space, or they do not meet sufficiently expressive querying demands. In this extended abstract, we discuss an RDF archive indexing technique that has a low storage overhead, and adds metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating versioned queries. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new trade-off regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. Our storage technique reduces query evaluation time through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis.
2019
Conference
Comunica: a Modular SPARQL Query Engine for the Web
-
Ruben Taelman
-
Joachim Van Herwegen
-
Miel Vander Sande
-
Ruben Verborgh
In Proceedings of the 17th International Semantic Web Conference
Query evaluation over Linked Data sources has become a complex story,
given the multitude of algorithms and techniques for single- and multi-source querying,
as well as the heterogeneity of Web interfaces through which data is published online.
Today’s query processors are insufficiently adaptable to test multiple query engine aspects in combination,
such as evaluating the performance of a certain join algorithm over a federation of heterogeneous interfaces.
The Semantic Web research community is in need of a flexible query engine that allows plugging in new components such as different algorithms,
new or experimental SPARQL features, and support for new Web interfaces.
We designed and developed a Web-friendly and modular meta query engine called Comunica that meets these specifications.
In this article, we introduce this query engine and explain the architectural choices behind its design.
We show how its modular nature makes it an ideal research platform for investigating new kinds of Linked Data interfaces and querying algorithms.
Comunica facilitates the development, testing, and evaluation of new query processing capabilities, both in isolation and in combination with others.
2018
Conference
On the Semantics of TPF-QS towards Publishing and Querying RDF Streams at Web-scale
-
Ruben Taelman
-
Riccardo Tommasini
-
Joachim Van Herwegen
-
Miel Vander Sande
-
Emanuele Della Valle
-
Ruben Verborgh
In Proceedings of the 14th International Conference on Semantic Systems
RDF Stream Processing (RSP) is a rapidly evolving area of research that focuses on extensions of the Semantic Web in order to model and process Web data streams.
While state-of-the-art approaches concentrate on server-side processing of RDF streams,
we investigate the Triple Pattern Fragments Query Streamer (TPF-QS) method for server-side publishing of RDF streams,
which moves the workload of continuous querying to clients.
We formalize TPF-QS in terms of the RSP-QL reference model in order to formally compare it with existing RSP query languages.
We experimentally validate that, compared to the state of the art,
the server load of TPF-QS scales better with increasing numbers of concurrent clients in case of simple queries,
at the cost of increased bandwidth consumption.
This shows that TPF-QS is an important first step towards a viable solution for Web-scale publication and continuous processing of RDF streams.
2018
Conference
Components.js: A Semantic Dependency Injection Framework
-
Ruben Taelman
-
Miel Vander Sande
-
Ruben Verborgh
In Proceedings of The Web Conference: Developers Track
Components.js is a dependency injection framework for JavaScript applications that allows components to be instantiated and wired together declaratively using semantic configuration files.
The advantage of these semantic configuration files is that software components can be uniquely and globally identified using URIs.
As an example, this documentation has been made self-instantiatable using Components.js. This makes it possible to print the HTML version of any page to the console, or to serve it via HTTP on a local webserver.
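The shape of such a semantic configuration file can be sketched as follows. This is an illustrative mock-up: the context URL follows the pattern Components.js documents for its generated contexts, while the component type `ex:MyParser` and its parameter are hypothetical names, not a verified working configuration.

```json
{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/componentsjs/^5.0.0/components/context.jsonld"
  ],
  "@id": "urn:example:myConfig",
  "@graph": [
    {
      "@id": "urn:example:myParser",
      "@type": "ex:MyParser",
      "ex:MyParser#encoding": "utf-8"
    }
  ]
}
```

Because the instance `@id` and component `@type` are URIs, a configuration like this can be dereferenced and reused globally, which is the core advantage the abstract describes.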
2018
Conference
Declaratively Describing Responses of Hypermedia-Driven Web APIs
-
Ruben Taelman
-
Ruben Verborgh
In Proceedings of the 9th International Conference on Knowledge Capture
While humans browse the Web by following links, these hypermedia links can also be used by machines for browsing.
While efforts such as Hydra semantically describe the hypermedia controls on Web interfaces to enable smarter interface-agnostic clients,
they are largely limited to the input parameters to interfaces, and clients therefore do not know what response to expect from these interfaces.
In order to convey such expectations, interfaces need to declaratively describe the response structure of their parameterized hypermedia controls.
We therefore explored techniques to represent this parameterized response structure in a generic but expressive way.
In this work, we discuss four different approaches for declaring a response structure, and we compare them based on a model that we introduce.
Based on this model, we conclude that a SHACL shape-based approach can be used for declaring such a parameterized response structure,
as it conforms to the REST architectural style that has helped shape the Web into its current form.
2017
Conference
Continuous Client-Side Query Evaluation over Dynamic Linked Data
-
Ruben Taelman
-
Ruben Verborgh
-
Pieter Colpaert
-
Erik Mannens
In The Semantic Web: ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 – June 2, 2016, Revised Selected Papers
Existing solutions to query dynamic Linked Data sources extend the SPARQL language, and require continuous server processing for each query.
Traditional SPARQL endpoints already accept highly expressive queries, so extending these endpoints for time-sensitive queries increases the server cost even further.
To make continuous querying over dynamic Linked Data more affordable, we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries.
In this paper, we introduce the TPF Query Streamer that allows clients to evaluate SPARQL queries with continuously updating results.
Our experiments indicate that this extension significantly lowers the server complexity, at the expense of an increase in the execution time per query.
We prove that by moving the complexity of continuously evaluating queries over dynamic Linked Data to the clients and thus increasing bandwidth usage,
the cost at the server side is significantly reduced.
Our results show that this solution makes real-time querying more scalable for a large number of concurrent clients when compared to the alternatives.
2016
Tutorial
Modeling, Generating, and Publishing Knowledge as Linked Data
-
Anastasia Dimou
-
Pieter Heyvaert
-
Ruben Taelman
-
Ruben Verborgh
In Knowledge Engineering and Knowledge Management: EKAW 2016 Satellite Events, EKM and Drift-an-LOD, Bologna, Italy, November 19–23, 2016, Revised Selected Papers
The process of extracting, structuring, and organizing knowledge from one or multiple data sources
and preparing it for the Semantic Web requires a dedicated class of systems.
They enable processing large and originally heterogeneous data sources and capturing new knowledge.
Offering existing data as Linked Data increases its shareability, extensibility, and reusability.
However, using Linked Data as a means to represent knowledge can be easier said than done.
In this tutorial, we elaborate on the importance of semantically annotating data and how existing technologies facilitate their mapping to Linked Data.
We introduce [R2]RML languages to generate Linked Data derived from different heterogeneous data formats (e.g., DBs, XML, or JSON)
and from different interfaces (e.g., files or Web APIs).
Those who are not Semantic Web experts can annotate their data with the RMLEditor,
whose user interface hides all underlying Semantic Web technologies from data owners.
Last, we show how to easily publish Linked Data on the Web as Triple Pattern Fragments.
As a result, participants, independently of their knowledge background, can model, annotate and publish data on their own.
2017
Challenge
Versioned Querying with OSTRICH and Comunica in MOCHA 2018
-
Ruben Taelman
-
Miel Vander Sande
-
Ruben Verborgh
In Proceedings of the 5th SemWebEval Challenge at ESWC 2018
In order to exploit the value of historical information in Linked Datasets,
we need to be able to store and query different versions of such datasets efficiently.
The 2018 edition of the Mighty Storage Challenge (MOCHA) is organized to discover the efficiency of such Linked Data stores and to detect their bottlenecks.
One task in this challenge focuses on the storage and querying of versioned datasets,
in which we participated by combining the OSTRICH triple store and the Comunica SPARQL engine.
In this article, we briefly introduce our system for the versioning task of this challenge.
We present the evaluation results, which show that our system achieves fast query times for the supported queries,
although not all queries were supported by Comunica at the time of writing.
The results of this challenge will serve as a guideline for further improvements to our system.
2018
Workshop
The R3 Metric: Measuring Performance of Link Prioritization during Traversal-based Query Processing
-
Ruben Eschauzier
-
Ruben Taelman
-
Ruben Verborgh
In Proceedings of the 16th Alberto Mendelzon International Workshop on Foundations of Data Management
The decentralization envisioned for the current centralized web requires querying approaches capable of accessing multiple small data sources while complying with legal constraints related to personal data, such as licenses and the GDPR. Link Traversal-based Query Processing (LTQP) is a querying approach designed for highly decentralized environments that satisfies these legal requirements. An important optimization avenue in LTQP is the order in which links are dereferenced, which involves prioritizing links to query-relevant documents. However, assessing and comparing the algorithmic performance of these systems is challenging due to various compounding factors during query execution. Therefore, researchers need an implementation-agnostic and deterministic metric that accurately measures the marginal effectiveness of link prioritization algorithms in LTQP engines. In this paper, we motivate the need for accurately measuring link prioritization performance, define and test such a metric, and outline the challenges and potential extensions of the proposed metric. Our findings show that the proposed metric highlights differences in link prioritization performance depending on the queried data fragmentation strategy. The proposed metric allows for evaluating link prioritization performance and enables easily assessing the effectiveness of future link prioritization algorithms.
2024
Workshop
Opportunities for Shape-based Optimization of Link Traversal Queries
-
Bryan-Elliott Tam
-
Ruben Taelman
-
Pieter Colpaert
-
Ruben Verborgh
In Proceedings of the 16th Alberto Mendelzon International Workshop on Foundations of Data Management
Data on the web is naturally unindexed and decentralized. Centralizing web data, especially personal data, raises ethical and legal concerns. Yet, compared to centralized query approaches, decentralization-friendly alternatives such as Link Traversal Query Processing (LTQP) are significantly less performant and understood. The two main difficulties of LTQP are the lack of a priori information about data sources and the high number of HTTP requests. Exploring decentralization-friendly ways to document unindexed networks of data sources could lead to solutions that alleviate those difficulties. RDF data shapes are widely used to validate Linked Data documents; therefore, it is worthwhile to investigate their potential for LTQP optimization. In our work, we built an early version of a source selection algorithm for LTQP using mappings between RDF data shapes and Linked Data documents, and measured its performance in a realistic setup. In this article, we present our algorithm and early results, thus opening opportunities for further research on shape-based optimization of link traversal queries. Our initial experiments show that, with little maintenance and work from the server, our method can reduce execution time by up to 80% and the number of links traversed by up to 97% for realistic queries. Given our early results and the descriptive power of RDF data shapes, it would be worthwhile to investigate non-heuristic-based query planning using RDF shapes.
2024
Workshop
DFDP: A Declarative Form Description Pipeline for Decentralizing Web Forms
-
Ieben Smessaert
-
Patrick Hochstenbach
-
Ben De Meester
-
Ruben Taelman
-
Ruben Verborgh
In Proceedings of the 2nd International Workshop on Semantics in Dataspaces
Forms are key to bidirectional communication on the Web: without them, end-users would be unable to place online orders or file support tickets. Organizations often need multiple, highly similar forms, which currently require multiple implementations. Moreover, the data is tightly coupled to the application, restricting the end-user from reusing it with other applications, or storing the data somewhere else. Organizations and end-users need a technique to create forms that are more controllable, reusable, and decentralized. To address this problem, we introduce the Declarative Form Description Pipeline (DFDP) that meets these requirements. DFDP achieves controllability through end-users’ editable declarative form descriptions. Reusability for organizations is ensured through descriptions of the form fields and associated actions. Finally, by leveraging a decentralized environment like Solid, the application is decoupled from the storage, preserving end-user control over their data. In this paper, we introduce and explain how such a declarative form description can be created and used without assumptions about the viewing environment or data storage. We show how separate applications can interoperate and be interchanged by using a description that contains details for form rendering and data submission decisions, using a form, policy, and rule ontology. Furthermore, we show how this approach solves the shortcomings of traditional Web forms. Our proposed pipeline enables organizations to save time by building similar forms without starting from scratch. Similarly, end-users can save time by letting machines prefill the form with existing data. Additionally, DFDP empowers end-users to be in control of the application they use to manage their data in a data store. User study results provide insights for further usability improvements, such as automatic suggestions based on entered field labels.
2024
Workshop
Requirements and Challenges for Query Execution across Decentralized Environments
-
Ruben Taelman
In Companion Proceedings of the ACM Web Conference 2024
Due to the economic and societal problems being caused by the Web’s growing centralization, there is an increasing interest in decentralizing data on the Web. This decentralization does however cause a number of technical challenges. If we want to give users in decentralized environments the same level of user experience as they are used to with centralized applications, we need solutions to these challenges. We discuss how query engines can act as a layer between applications on the one hand, and decentralized environments on the other. Query engines thereby act as an abstraction layer that hides the complexities of decentralized data management from application developers. In this article, we outline the requirements for query engines over decentralized environments. Furthermore, we show how existing approaches meet these requirements, and which challenges remain. As such, this article offers a high-level overview of a roadmap in the query and decentralization research domains.
2024
Workshop
The Need for Better RDF Archiving Benchmarks
-
Olivier Pelgrin
-
Ruben Taelman
-
Luis Galárraga
-
Katja Hose
In Proceedings of the 9th Workshop on Managing the Evolution and Preservation of the Data Web
The advancements and popularity of Semantic Web technologies in the last decades have led to an
exponential adoption and availability of Web-accessible datasets. While most solutions consider such
datasets to be static, they often evolve over time. Hence, efficient archiving solutions are needed to meet
the users’ and maintainers’ needs. While some solutions to these challenges already exist, standardized
benchmarks are needed to systematically test the different capabilities of existing solutions and identify
their limitations. Unfortunately, the development of new benchmarks has not kept pace with the evolution
of RDF archiving systems. In this paper, we therefore identify the current state of the art in RDF archiving
benchmarks and discuss to what degree such benchmarks reflect the current needs of real-world use cases
and their requirements. Through this empirical assessment, we highlight the need for the development of
more advanced and comprehensive benchmarks that align with the evolving landscape of RDF archiving.
2023
Workshop
How Does the Link Queue Evolve during Traversal-Based Query Processing?
-
Ruben Eschauzier
-
Ruben Taelman
-
Ruben Verborgh
In Proceedings of the 7th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs
Link Traversal-based Query Processing (LTQP) is an integrated querying approach that allows the
query engine to start with zero knowledge of the data to query and discover data sources on the fly.
The query engine starts with some seed documents and dynamically discovers new data sources by
dereferencing hyperlinks in previously ingested data. Given the dynamic nature of source discovery,
query processing tends to be relatively slow. Optimization techniques exist, such as exploiting existing
structural information, but they depend on a deep understanding of the link queue during LTQP. To this
end, we investigate the evolution of the types of link sources in the link queue and introduce metrics
that describe key link queue characteristics. This paper analyses the link queue to guide future work on
LTQP query optimization approaches that exploit structural information within a Solid environment. We
find that queries exhibit two different execution patterns, one where the link queue is primarily empty
and the other where the link queue fills faster than the engine can process. Our results show that the
link queue is not functioning optimally and that our current approach to link discovery is not sufficiently
selective.
2023
Workshop
In-Memory Dictionary-Based Indexing of Quoted RDF Triples
-
Ruben Taelman
-
Ruben Verborgh
In Proceedings of the 7th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs
The upcoming RDF 1.2 recommendation is scheduled to introduce the concept of quoted triples, which allows statements to be made about other statements. Since quoted triples enable new forms of data access in SPARQL 1.2, in the form of quoted triple patterns, there is a need for new indexing strategies that can efficiently handle these data access patterns. As such, we explore and evaluate different in-memory indexing approaches for quoted triples. In this paper, we investigate four indexing approaches, and evaluate their performance over an artificial dataset with custom triple pattern queries. Our findings show that the so-called indexed quoted triples dictionary vastly outperforms other approaches in terms of query execution time at the cost of increased storage size and ingestion time. Our work shows that indexing quoted triples in a dictionary separate from non-quoted RDF terms achieves good performance, and can be implemented using well-known indexing techniques into existing systems. Therefore, we illustrate that the addition of quoted triples into the RDF stack can be achieved in a performant manner.
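The core idea of a separate dictionary for quoted triples can be made concrete with a toy sketch. This is illustrative JavaScript, not the paper's actual "indexed quoted triples dictionary": non-quoted terms and quoted triples are numbered in separate dictionaries, so each distinct quoted triple is stored once and referenced by a single identifier wherever it occurs.

```javascript
// Toy sketch of dictionary-based encoding of quoted RDF triples.
// Quoted triples (represented here as [s, p, o] arrays) get identifiers
// from a dictionary separate from non-quoted terms.
class QuotedTripleDictionary {
  constructor() {
    this.terms = new Map();  // term string -> id (non-quoted terms)
    this.quoted = new Map(); // encoded quoted triple -> id (separate dictionary)
  }
  encodeTerm(term) {
    if (Array.isArray(term)) return this.encodeQuoted(term); // nested quoted triple
    if (!this.terms.has(term)) this.terms.set(term, this.terms.size);
    return { kind: 'term', id: this.terms.get(term) };
  }
  encodeQuoted([s, p, o]) {
    const key = JSON.stringify([this.encodeTerm(s), this.encodeTerm(p), this.encodeTerm(o)]);
    if (!this.quoted.has(key)) this.quoted.set(key, this.quoted.size);
    return { kind: 'quoted', id: this.quoted.get(key) };
  }
}

const dict = new QuotedTripleDictionary();
// << :alice :says << :earth :is :round >> >>
const inner = dict.encodeTerm([':earth', ':is', ':round']);
const outer = dict.encodeTerm([':alice', ':says', [':earth', ':is', ':round']]);
// Encoding the same quoted triple twice yields the same identifier.
console.log(inner.id === dict.encodeTerm([':earth', ':is', ':round']).id); // true
```

Once quoted triples are reduced to integers like this, quoted triple patterns can be answered with the same well-known index structures used for regular triples, which is the property the abstract highlights.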
2023
Workshop
Distributed Social Benefit Allocation using Reasoning over Personal Data in Solid
-
Jonni Hanski
-
Pieter Heyvaert
-
Ben De Meester
-
Ruben Taelman
-
Ruben Verborgh
In Proceedings of the 1st International Workshop on Data Management for Knowledge Graphs
When interacting with government institutions, citizens may often be asked to provide a number of
documents to various officials, due to the way the data is being processed by the government, and
regulations or guidelines that restrict sharing of that data between institutions. Occasionally, documents
from third parties, such as the private sector, are involved, as the data, rules, regulations and individual
private data may be controlled by different parties. Facilitating efficient flow of information in such cases
is therefore important, while still respecting the ownership and privacy of that data. Addressing these
types of use cases in data storage and sharing, the Solid initiative allows individuals, organisations and
the public sector to store their data in personal online datastores. Solid has been previously applied in
data storage within government contexts, so we decided to extend that work by adding data processing
services on top of such data and including multiple parties such as citizens and the private sector.
However, introducing multiple parties within the data processing flow may impose new challenges,
and implementing such data processing services in practice on top of Solid might present opportunities
for improvement from the perspective of the implementer of the services. Within this work, together
with the City of Antwerp in Belgium, we have produced a proof-of-concept service implementation
operating at the described intersection of public sector, citizens and private sector, to manage social
benefit allocation in a distributed environment. The service operates on distributed Linked Data stored
in multiple Solid pods in RDF, using Notation3 rules to process that data and SPARQL queries to access
and modify it. This way, our implementation seeks to respect the design principles of Solid, while taking
advantage of the related technologies for representing, processing and modifying Linked Data. This
document will describe our chosen use case, service design and implementation, and our observations
resulting from this experiment. Through the proof-of-concept implementation, we have established a
preliminary understanding of the current challenges in implementing such a service using the chosen
technologies. We have identified topics such as verification of data that should be addressed when using
such an approach in practice, assumptions related to data locations and tight coupling between our logic
between the rules and program code. Addressing these topics in future work should help further the
adoption of Linked Data as a means to solve challenges around data sharing, processing and ownership
such as with government processes involving multiple parties.
2023
Workshop
A Prospective Analysis of Security Vulnerabilities within Link Traversal-Based Query Processing
-
Ruben Taelman
-
Ruben Verborgh
In Proceedings of the 6th International Workshop on Storing, Querying and Benchmarking Knowledge Graphs
The societal and economic consequences surrounding Big Data-driven platforms have increased the call for decentralized solutions. However, retrieving and querying data in more decentralized environments requires fundamentally different approaches, whose properties are not yet well understood. Link-Traversal-based Query Processing (LTQP) is a technique for querying over decentralized data networks, in which a client-side query engine discovers data by traversing links between documents. Since decentralized environments are potentially unsafe due to their non-centrally controlled nature, there is a need for client-side LTQP query engines to be resistant against security threats aimed at the query engine’s host machine or the query initiator’s personal data. As such, we have performed an analysis of potential security vulnerabilities of LTQP. This article provides an overview of security threats in related domains, which are used as inspiration for the identification of 10 LTQP security threats. This list of security threats forms a basis for future work in which mitigations for each of these threats need to be developed and tested for their effectiveness. With this work, we start filling the unknowns for enabling query execution over decentralized environments. Aside from future work on security, wider research will be needed to uncover missing building blocks for enabling true data decentralization.
2022
Workshop
A Policy-Oriented Architecture for Enforcing Consent in Solid
-
Laurens Debackere
-
Pieter Colpaert
-
Ruben Taelman
-
Ruben Verborgh
In Proceedings of the 2nd International Workshop on Consent Management in Online Services, Networks and Things
The Solid project aims to restore end-users’ control over their data by decoupling services and applications from data storage. To realize data governance by the user, the Solid Protocol 0.9 relies on Web Access Control, which has limited expressivity and interpretability. In contrast, recent privacy and data protection regulations impose strict requirements on personal data processing applications and the scope of their operation. The Web Access Control mechanism lacks the granularity and contextual awareness needed to enforce these regulatory requirements. Therefore, we suggest a possible architecture for relating Solid’s low-level technical access control rules with higher-level concepts such as the legal basis and purpose for data processing, the abstract types of information being processed, and the data sharing preferences of the data subject. Our architecture combines recent technical efforts by the Solid community panels with prior proposals made by researchers on the use of ODRL and SPECIAL policies as an extension to Solid’s authorization mechanism. While our approach appears to avoid a number of pitfalls identified in previous research, further work is needed before it can be implemented and used in a practical setting.
2022
Workshop
Optimizing Approximate Membership Metadata in Triple Pattern Fragments for Clients and Servers
-
Ruben Taelman
-
Joachim Van Herwegen
-
Miel Vander Sande
-
Ruben Verborgh
In Proceedings of the 13th International Workshop on Scalable Semantic Web Knowledge Base Systems
Depending on the HTTP interface used for publishing Linked Data, the effort of evaluating a SPARQL query can be redistributed differently between clients and servers. For instance, lower server-side CPU usage can be realized at the expense of higher bandwidth consumption. Previous work has shown that complementing lightweight interfaces such as Triple Pattern Fragments (TPF) with additional metadata can positively impact the performance of clients and servers. Specifically, Approximate Membership Filters (AMFs)—data structures that are small and probabilistic—in the context of TPF were shown to reduce the number of HTTP requests, at the expense of increasing query execution times. In order to mitigate this significant drawback, we have investigated unexplored aspects of AMFs as metadata on TPF interfaces. In this article, we introduce and evaluate alternative approaches for server-side publication and client-side consumption of AMFs within TPF to achieve faster query execution, while maintaining low server-side effort. Our alternative client-side algorithm and the proposed server configurations significantly reduce both the number of HTTP requests and query execution time, with only a small increase in server load, thereby mitigating the major bottleneck of AMFs within TPF. Compared to regular TPF, average query execution is more than 2 times faster and requires only 10% of the number of HTTP requests, at the cost of at most a 10% increase in server load. These findings translate into a set of concrete guidelines for data publishers on how to configure AMF metadata on their servers.
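For intuition, the following is a minimal Bloom-filter-style sketch of the kind of small, probabilistic approximate membership filter a TPF server could publish as metadata, letting a client rule out most non-member triples without extra HTTP requests. The hash function is a simplistic placeholder, and the filter types and parameters actually evaluated in the paper differ.

```javascript
// Minimal Bloom filter sketch of an Approximate Membership Filter (AMF).
// Illustrative only; real AMF metadata uses tuned sizes and stronger hashes.
class BloomFilter {
  constructor(bits = 256, hashes = 3) {
    this.bits = new Uint8Array(bits);
    this.hashes = hashes;
  }
  // Simple seeded FNV-style string hash; a placeholder, not production-grade.
  hash(value, seed) {
    let h = 2166136261 ^ seed;
    for (let i = 0; i < value.length; i++) {
      h ^= value.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.bits.length;
  }
  add(value) {
    for (let s = 0; s < this.hashes; s++) this.bits[this.hash(value, s)] = 1;
  }
  // false => definitely absent; true => possibly present (false positives allowed)
  mightContain(value) {
    for (let s = 0; s < this.hashes; s++) {
      if (!this.bits[this.hash(value, s)]) return false;
    }
    return true;
  }
}

const amf = new BloomFilter();
amf.add('<ex:s> <ex:p> <ex:o1>');
console.log(amf.mightContain('<ex:s> <ex:p> <ex:o1>')); // true
```

A client holding such a filter only issues an HTTP request for a candidate triple when `mightContain` returns true, which is how AMF metadata trades a small false-positive rate for far fewer requests.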
2020
Workshop
RDF Test Suite: Improving Developer Experience for Building Specification-Compliant Libraries
-
Ruben Taelman
In Proceedings of the 1st Workshop on The Semantic Web in Practice: Tools and Pedagogy
Guaranteeing compliance of software libraries to certain specifications is crucial for ensuring interoperability within the Semantic Web. While many specifications publish their own declarative test suite for testing compliance, the actual execution of these test suites can add a huge overhead for library developers. In order to remove this burden from these developers, we introduce a JavaScript tool called RDF Test Suite that takes care of all the required bookkeeping behind test suite execution. In this report, we discuss the design goals of RDF Test Suite, how it has been implemented, and how it can be used. In practice, this tool is being used in a variety of libraries, and it ensures their specification compliance with minimal overhead.
2020
Workshop
Towards Querying in Decentralized Environments with Privacy-Preserving Aggregation
-
Ruben Taelman
-
Simon Steyskal
-
Sabrina Kirrane
In Proceedings of the 4th Workshop on Storing, Querying, and Benchmarking the Web of Data
The Web is a ubiquitous economic, educational, and collaborative space. However, it also serves as a haven for personal information harvesting. Existing decentralised Web-based ecosystems, such as Solid, aim to combat personal data exploitation on the Web by enabling individuals to manage their data in the personal data store of their choice. Since personal data in these decentralised ecosystems are distributed across many sources, there is a need for techniques to support efficient privacy-preserving query execution over personal data stores. Towards this end, in this position paper we present a framework for efficient privacy-preserving federated querying, and highlight open research challenges and opportunities. The overarching goal is to provide a means to position future research into privacy-preserving querying within decentralised environments.
2020
Workshop
The Fundamentals of Semantic Versioned Querying
-
Ruben Taelman
-
Hideaki Takeda
-
Miel Vander Sande
-
Ruben Verborgh
In Proceedings of the 12th International Workshop on Scalable Semantic Web Knowledge Base Systems
co-located with the 17th International Semantic Web Conference
The domain of RDF versioning concerns itself with the storage of different versions of Linked Datasets.
The ability to query over these versions is an active area of research, and allows basic insights to be discovered,
such as tracking the evolution of certain things in datasets. Querying can, however, only get you so far.
In order to derive logical consequences from existing knowledge, we need to be able to reason over this data, such as ontology-based inferencing.
In order to achieve this, we explore fundamental concepts on semantic querying of versioned datasets using ontological knowledge.
In this work, we present these concepts as a semantic extension of the existing RDF versioning concepts that focus on syntactical versioning.
We remain general and assume that versions do not necessarily follow a purely linear temporal relation.
This work lays a foundation for reasoning over RDF versions from a querying perspective,
on which RDF versioning storage, query, and reasoning systems can be designed.
2018
Workshop
A Preliminary Open Data Publishing Strategy for Live Data in Flanders
-
Julián Andrés Rojas Meléndez
-
Brecht Van de Vyvere
-
Arne Gevaert
-
Ruben Taelman
-
Pieter Colpaert
-
Ruben Verborgh
In Proceedings of the 27th International Conference Companion on World Wide Web
For smart decision making, user agents need live and historic access to open data from sensors installed in the public domain.
In contrast to a closed environment, for Open Data and federated query processing algorithms,
the data publisher cannot anticipate specific questions in advance,
nor can it afford a poorly cost-efficient server interface as the number of data consumers increases. When publishing observations from sensors,
different fragmentation strategies can be conceived depending on how the historic data needs to be queried.
Furthermore, both publish/subscribe and polling strategies exist to publish live updates.
Each of these strategies comes with its own trade-offs regarding cost-efficiency of the server interface, user-perceived performance, and CPU use.
A polling strategy where multiple observations are published in a paged collection was tested in a proof of concept for parking spaces availability.
In order to understand the different resource trade-offs presented by publish/subscribe and polling publication strategies,
we devised an experiment on two machines, for a scalability test.
The preliminary results were inconclusive and suggest more large scale tests are needed in order to see a trend.
While the large-scale tests will be performed in future work,
the proof of concept helped to identify the technical Open Data principles for the 13 biggest cities in Flanders.
2018
Workshop
Describing configurations of software experiments as Linked Data
-
Joachim Van Herwegen
-
Ruben Taelman
-
Sarven Capadisli
-
Ruben Verborgh
In Proceedings of the First Workshop on Enabling Open Semantic Science (SemSci)
Within computer science engineering, research articles often rely on software experiments in order to evaluate contributions.
Reproducing such experiments involves setting up software, benchmarks, and test data.
Unfortunately, many articles ambiguously refer to software by name only, leaving out crucial details for reproducibility,
such as module and dependency version numbers or the configuration of individual components in different setups.
To address this, we created the Object-Oriented Components ontology for the semantic description of software components and their configuration.
This article discusses the ontology and its application, and demonstrates with a use case how to publish experiments and their software configurations on the Web.
In order to enable semantic interlinking between configurations and modules,
we published the metadata of all 500,000+ JavaScript libraries on npm as 200,000,000+ RDF triples.
Through our work, research articles can refer by URL to fine-grained descriptions of experimental setups.
This brings us closer to accurate reproductions of experiments, and facilitates the evaluation of new research contributions with different software configurations.
In the future, software could be instantiated automatically based on these descriptions and configurations,
and reasoning and querying could be applied to software configurations for meta-research purposes.
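The component descriptions proposed in this paper can be approximated with a small sketch. The following is not the actual Object-Oriented Components ontology; the IRIs and predicate names are hypothetical, and it only illustrates how a configured software component can be flattened into triples that are addressable by URL.

```python
# Illustrative sketch (hypothetical vocabulary, not the actual ontology):
# describe a software component and its configuration as RDF-style triples,
# so that an experiment setup can be referenced at triple granularity.

def describe_component(iri, module, version, params):
    """Emit (subject, predicate, object) triples for one configured component."""
    triples = [
        (iri, "ex:module", module),
        (iri, "ex:version", version),
    ]
    for name, value in params.items():
        triples.append((iri, f"ex:param/{name}", str(value)))
    return triples

config = describe_component(
    "ex:experiment1/engine",   # hypothetical experiment IRI
    "my-query-engine",         # hypothetical npm module name
    "1.2.3",
    {"pageSize": 100, "cache": True},
)
for s, p, o in config:
    print(s, p, o)
```

Because every component and parameter gets its own triple, a research article could point at exactly the configuration values used in an evaluation.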
2017
Workshop
Versioned Triple Pattern Fragments: A Low-cost Linked Data Interface Feature for Web Archives
-
Ruben Taelman
-
Miel Vander Sande
-
Ruben Verborgh
-
Erik Mannens
In Proceedings of the 3rd Workshop on Managing the Evolution and Preservation of the Data Web
Linked Datasets typically evolve over time because triples can be
removed from or added to datasets, which results in different dataset versions.
While most attention is typically given to the latest dataset version, a lot of useful
information is still present in previous versions and its historical evolution. In order
to make this historical information queryable at Web scale, a low-cost interface is
required that provides access to different dataset versions. In this paper, we add a
versioning feature to the existing Triple Pattern Fragments interface for queries at,
between and for versions, with an accompanying vocabulary for describing the
results, metadata and hypermedia controls. This interface feature is an important
step in the direction of making versioned datasets queryable on the Web, with a
low publication cost and effort.
2017
Workshop
Multidimensional Interfaces for Selecting Data within Ordinal Ranges
-
Ruben Taelman
-
Pieter Colpaert
-
Ruben Verborgh
-
Erik Mannens
In Proceedings of the 7th International Workshop on Consuming Linked Data
Linked Data interfaces exist in many flavours, as evidenced by subject
pages, SPARQL endpoints, triple pattern interfaces, and data dumps. These interfaces
are mostly used to retrieve parts of a complete dataset; such parts can, for example, be
defined by ranges in one or more dimensions. Filtering Linked Data by dimensions
such as time range, geospatial area, or genomic location, requires the lookup of data
within ordinal ranges. To make retrieval by such ranges generic and cost-efficient,
we propose a REST solution in-between looking up data within ordinal ranges
entirely on the server, or entirely on the client. To this end, we introduce a method
for extending any Linked Data interface with an n-dimensional interface-level index
such that n-dimensional ordinal data can be selected using n-dimensional ranges.
We formally define Range Gates and Range Fragments and theoretically evaluate
the cost-efficiency of hosting such an interface. By adding a multidimensional
index to a Linked Data interface for multidimensional ordinal data, we found that
we can get the benefits of both worlds: the expressivity of the server increases, yet
it remains more cost-efficient than an interface providing the full functionality on
the server side. Furthermore, the client now shares in the effort to filter the data.
This makes query processing more flexible for the end user, because the
query plan can be altered by the engine. In future work we hope to apply Range
Gates and Range Fragments to real-world interfaces to give quicker access to data
within ordinal ranges.
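The idea of an interface-level ordinal index can be sketched as follows. This is a 1-D simplification with an assumed fragment granularity, not the paper's formal definition of Range Gates and Range Fragments: the server only routes a requested range to coarse fragments, and the client applies the exact bounds.

```python
# Sketch of range-based fragment selection (assumed 1-D granularity):
# the server-side "gate" maps a range to overlapping fragments; the
# client filters the fragment contents down to the exact range.

FRAGMENT_SIZE = 10  # each fragment covers [k*10, (k+1)*10)

def range_gate(lo, hi):
    """Server side: fragment ids whose interval overlaps [lo, hi)."""
    first = lo // FRAGMENT_SIZE
    last = (hi - 1) // FRAGMENT_SIZE
    return list(range(first, last + 1))

def client_query(data, lo, hi):
    """Client side: fetch overlapping fragments, then filter exactly."""
    fragments = range_gate(lo, hi)
    candidates = [x for x in data if x // FRAGMENT_SIZE in fragments]
    return [x for x in candidates if lo <= x < hi]

data = [3, 12, 19, 25, 31, 47]
print(client_query(data, 10, 30))  # values in [10, 30)
```

The server's work stays cheap (interval arithmetic over fragments), while the exact filtering cost moves to the client, which mirrors the trade-off the abstract describes.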
2016
Workshop
Continuously Updating Query Results over Real-Time Linked Data
-
Ruben Taelman
-
Ruben Verborgh
-
Pieter Colpaert
-
Erik Mannens
-
Rik Van de Walle
In Proceedings of the 2nd Workshop on Managing the Evolution and Preservation of the Data Web
Existing solutions to query dynamic Linked Data sources extend the SPARQL language,
and require continuous server processing for each query.
Traditional SPARQL endpoints accept highly expressive queries, contributing to high server cost.
Extending these endpoints for time-sensitive queries increases the server cost even further.
To make continuous querying over real-time Linked Data more affordable,
we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries.
In this paper, we discuss a framework on top of TPF that allows clients to execute
SPARQL queries with continuously updating results.
Our experiments indicate that this extension significantly lowers the server complexity.
The trade-off is an increase in the execution time per query.
We prove that by moving the complexity of continuously evaluating real-time queries over Linked Data to the clients
and thus increasing the bandwidth usage, the cost of server-side interfaces is significantly reduced.
Our results show that this solution makes real-time querying more scalable in terms of CPU usage for a large number
of concurrent clients when compared to the alternatives.
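The division of work this framework assumes can be sketched as a client-side re-evaluation loop: the server annotates results with an expiration time, and the client re-requests only when its cached results go stale. The clock is simulated and the fetch function is a hypothetical stand-in for a TPF request.

```python
# Sketch of continuous client-side querying against an interface that
# annotates results with a validity period (simulated clock, hypothetical
# fetch; not the actual TPF extension).

def make_server(validity_seconds):
    def fetch(now):
        # Returns (results, expiry timestamp); results are placeholders.
        return ["result@%d" % now], now + validity_seconds
    return fetch

def continuous_query(fetch, timestamps):
    """Re-evaluate the query only when the cached result has expired."""
    cache, expires, requests = None, -1, 0
    log = []
    for now in timestamps:
        if now >= expires:
            cache, expires = fetch(now)
            requests += 1
        log.append(cache[0])
    return log, requests

fetch = make_server(validity_seconds=10)
log, requests = continuous_query(fetch, timestamps=[0, 3, 7, 12, 15, 21])
```

Six observation points trigger only two server requests here, which is the mechanism by which server load drops while clients take on the re-evaluation effort.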
2016
Demo
Demonstration of Link Traversal SPARQL Query Processing over the Decentralized Solid Environment
-
Ruben Taelman
-
Ruben Verborgh
In Proceedings of the 27th International Conference on Extending Database Technology (EDBT)
To tackle economic and societal problems originating from the centralization of Knowledge Graphs on the Web, there has been increasing interest in decentralizing Knowledge Graphs across a large number of small authoritative sources. In order to effectively build user-facing applications, there is a need for efficient query engines that abstract the complexities around accessing such massively Decentralized Knowledge Graphs (DKGs). As such, we have designed and implemented novel Link Traversal Query Processing algorithms into the Comunica query engine framework that are capable of efficiently evaluating SPARQL queries across DKGs provided by the Solid decentralization initiative. In this article, we demonstrate this query engine through a Web-based interface over which SPARQL queries can be executed over simulated and real-world Solid environments. Our demonstration shows the benefits of a traversal-based approach towards querying DKGs, and uncovers opportunities for further optimizations in future work in terms of both query execution and discovery algorithms.
2024
Demo
GLENDA: Querying RDF Archives with full SPARQL
-
Olivier Pelgrin
-
Ruben Taelman
-
Luis Galárraga
-
Katja Hose
In Proceedings of the 20th Extended Semantic Web Conference: Posters and Demos
The dynamicity of semantic data has propelled the research
on RDF Archiving, i.e., the task of storing and making the full history of
large RDF datasets accessible. However, existing archiving techniques fail
to scale when confronted with very large RDF datasets and support only
simple SPARQL queries. In this demonstration, we therefore showcase
GLENDA, a system that can run full SPARQL 1.1 compliant queries over
large RDF archives. We achieve this through a multi-snapshot change-
based storage architecture that we interface using the Comunica query
engine. Thanks to this integration we demonstrate that fast SPARQL
query processing over multiple versions of a knowledge graph is possible.
Moreover, our demonstration provides different statistics about the history
of RDF datasets that can be useful for tasks beyond querying,
by providing insights into the evolution dynamics of the data.
2023
Demo
Solid Web Monetization
-
Merlijn Sebrechts
-
Tom Goethals
-
Thomas Dupont
-
Wannes Kerckhove
-
Ruben Taelman
-
Filip De Turck
-
Bruno Volckaert
In International Conference on Web Engineering
The Solid decentralization effort decouples data from services, so that users are in full control over their personal data. In this light, Web Monetization has been proposed as an alternative business model for web services that does not depend on data collection anymore. Integrating Web Monetization with Solid, however, remains difficult because of the heterogeneity of Interledger wallet implementations, lack of mechanisms for securely paying on behalf of a user, and an inherent issue of trusting content providers to handle payments. We propose the Web Monetization Provider as a solution to these challenges. The WMP acts as a third party, hiding the underlying complexity of transactions and acting as a source of trust in Web Monetization interactions. This demo shows a working end-to-end example including a website providing monetized content, a WMP, and a dashboard for configuring WMP into a Solid identity.
2022
Demo
PROV4ITDaTa: Transparent and direct transfer of personal data to personal stores
-
Gertjan De Mulder
-
Ben De Meester
-
Pieter Heyvaert
-
Ruben Taelman
-
Ruben Verborgh
-
Anastasia Dimou
In WWW2021, the International World Wide Web Conference
Data is scattered across service providers, heterogeneously structured in various formats.
Due to this lack of interoperability, data portability is hindered, and thus user control is inhibited.
An interoperable data portability solution for transferring personal data is needed.
We demo PROV4ITDaTa: a Web application that allows users to transfer personal data into an interoperable format to their personal data store.
PROV4ITDaTa leverages the open-source solutions RML.io, Comunica, and Solid:
(i) the RML.io toolset to describe how to access data from service providers and generate interoperable datasets;
(ii) Comunica to query these and more flexibly generate enriched datasets;
and (iii) Solid pods to store the generated data as Linked Data in personal data stores.
As opposed to other (hard-coded) solutions, PROV4ITDaTa is fully transparent,
where each component of the pipeline is fully configurable and automatically generates detailed provenance trails.
Furthermore, transforming the personal data into RDF allows for an interoperable solution.
By maximizing the use of open-source tools and open standards,
PROV4ITDaTa facilitates the shift towards a data ecosystem wherein users have control of their data,
and providers can focus on their service instead of trying to adhere to interoperability requirements.
2021
Demo
Discovering Data Sources in a Distributed Network of Heritage Information
-
Miel Vander Sande
-
Sjors de Valk
-
Enno Meijers
-
Ruben Taelman
-
Herbert Van de Sompel
-
Ruben Verborgh
In Proceedings of the Posters and Demo Track of the 15th International Conference on Semantic Systems
The Netwerk Digitaal Erfgoed is a Dutch partnership that focuses on improving the visibility, usability and sustainability of digital collections in the cultural heritage sector. The vision is to improve the usability of the data by surmounting the borders between the separate collections of the cultural heritage institutions. Key concepts in this vision are the alignment of the data by using shared descriptions (e.g. thesauri), and the publication of the data as Linked Open Data. This demo paper describes a Proof of Concept to test this vision. It uses a register where only summaries of datasets are stored, instead of all the data. Based on these summaries, a portal can query the register to find which data sources might be of interest, and then query the data directly from the relevant data sources.
2019
Demo
Using an existing website as a queryable low-cost LOD publishing interface
-
Brecht Van de Vyvere
-
Ruben Taelman
-
Pieter Colpaert
-
Ruben Verborgh
In Proceedings of the 16th Extended Semantic Web Conference: Posters and Demos
Maintaining an Open Dataset comes at an extra recurring cost when it is published in a dedicated Web interface. As there is often no direct financial return from publishing a dataset publicly, these extra costs need to be minimized. Therefore, we want to explore reusing existing infrastructure by enriching existing websites with Linked Data. In this demonstrator, we advised the data owner to annotate a digital heritage website with JSON-LD snippets, resulting in a dataset of more than three million triples that is now available and officially maintained. The website itself is paged, and thus Hydra partial collection view controls were added in the snippets. We then extended the modular query engine Comunica to support following page controls and extracting data from HTML documents while querying. This way, a SPARQL or GraphQL query over multiple heterogeneous data sources can power automated data reuse. While the query performance on such an interface is visibly poor, it becomes easy to create composite data dumps. As a result of implementing these building blocks in Comunica, any paged collection and enriched HTML page now becomes queryable by the query engine. This enables heterogeneous data interfaces to share functionality and become technically interoperable.
2019
Demo
GraphQL-LD: Linked Data Querying with GraphQL
-
Ruben Taelman
-
Miel Vander Sande
-
Ruben Verborgh
In Proceedings of the 17th International Semantic Web Conference: Posters and Demos
The Linked Open Data cloud has the potential of significantly enhancing and transforming end-user applications.
For example, the use of URIs to identify things allows data joining between separate data sources.
Most popular (Web) application frameworks, such as React and Angular, have limited support for querying the Web of Linked Data,
which leads to a high entry barrier for Web application developers.
Instead, these developers increasingly use the highly popular GraphQL query language for retrieving data from GraphQL APIs,
because GraphQL is tightly integrated into these frameworks.
In order to lower the barrier for developers towards Linked Data consumption, the Linked Open Data cloud needs to be queryable with GraphQL as well.
In this article, we introduce a method for transforming GraphQL queries coupled with a JSON-LD context into SPARQL,
and a method for converting SPARQL results into a GraphQL-compatible response.
We demonstrate this method by implementing it into the Comunica framework.
This approach brings us one step closer towards widespread Linked Data consumption for application development.
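The core translation idea can be sketched as follows. This toy version handles only a flat selection set with a hypothetical context, whereas the actual GraphQL-LD method also covers nesting, arguments, and reshaping the SPARQL results back into a GraphQL-style response.

```python
# Toy sketch of the GraphQL-LD idea: a JSON-LD-style context maps GraphQL
# field names to IRIs, and a flat selection set becomes SPARQL triple
# patterns. The context below is an assumption for illustration.

context = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "knows": "http://xmlns.com/foaf/0.1/knows",
}

def to_sparql(fields, context):
    """Translate a flat list of GraphQL fields into a SPARQL SELECT query."""
    patterns = [f"  ?s <{context[field]}> ?{field}." for field in fields]
    variables = " ".join(f"?{field}" for field in fields)
    return "SELECT %s WHERE {\n%s\n}" % (variables, "\n".join(patterns))

query = to_sparql(["name", "knows"], context)
print(query)
```

The developer keeps writing GraphQL-shaped selections, while the context supplies the semantics needed to query any SPARQL-capable Linked Data source.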
2018
Demo
Demonstration of Comunica, a Web framework for querying heterogeneous Linked Data interfaces
-
Joachim Van Herwegen
-
Ruben Taelman
-
Miel Vander Sande
-
Ruben Verborgh
In Proceedings of the 17th International Semantic Web Conference: Posters and Demos
Linked Data sources can appear in a variety of forms, going from SPARQL endpoints to Triple Pattern Fragments and data dumps.
This heterogeneity among Linked Data sources creates an added layer of complexity when querying or combining results from those sources.
To ease this problem, we created a modular engine, Comunica, that has modules for evaluating SPARQL queries and supports heterogeneous interfaces.
Other modules for other query or source types can easily be added.
In this paper we showcase a Web client that uses Comunica to evaluate federated SPARQL queries through automatic source type identification and interaction.
2018
Demo
OSTRICH: Versioned Random-Access Triple Store
-
Ruben Taelman
-
Miel Vander Sande
-
Ruben Verborgh
In Proceedings of the 27th International Conference Companion on World Wide Web
The Linked Open Data cloud is ever-growing, and many datasets are frequently being updated.
In order to fully exploit the potential of the information that is available in and over historical dataset versions,
such as discovering evolution of taxonomies or diseases in biomedical datasets,
we need to be able to store and query the different versions of Linked Datasets efficiently.
In this demonstration, we introduce OSTRICH, an efficient triple store with support for versioned query evaluation.
We demonstrate the capabilities of OSTRICH using a Web-based graphical user interface in which a store can be opened or created.
Using this interface, the user is able to query in, between, and over different versions, ingest new versions, and retrieve summarizing statistics.
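The three versioned query types mentioned here can be sketched with a minimal in-memory change-based store. This is not OSTRICH's actual storage layout (which uses compressed delta chains against a snapshot); it only illustrates version materialization (VM), delta materialization (DM), and version queries (VQ).

```python
# Minimal sketch of change-based versioned triple storage with the three
# query types: VM (state at a version), DM (changes between versions),
# and VQ (versions in which a triple exists). Illustrative only.

class VersionedStore:
    def __init__(self, snapshot):
        self.snapshot = set(snapshot)
        self.deltas = []  # per ingested version: (additions, deletions)

    def ingest(self, additions=(), deletions=()):
        self.deltas.append((set(additions), set(deletions)))

    def vm(self, version):
        """Materialize the full triple set at a version (0 = snapshot)."""
        triples = set(self.snapshot)
        for add, rem in self.deltas[:version]:
            triples |= add
            triples -= rem
        return triples

    def dm(self, v1, v2):
        """Delta materialization: (added, deleted) between two versions."""
        a, b = self.vm(v1), self.vm(v2)
        return b - a, a - b

    def vq(self, triple):
        """Version query: all versions in which a triple exists."""
        return [v for v in range(len(self.deltas) + 1) if triple in self.vm(v)]

store = VersionedStore({("alice", "knows", "bob")})
store.ingest(additions={("bob", "knows", "carol")})
store.ingest(deletions={("alice", "knows", "bob")})
```

Storing only deltas keeps ingestion cheap, at the cost of reconstruction work at query time; efficient stores bound that work with snapshots and indexes.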
2018
Demo
Live Storage and Querying of Versioned Datasets on the Web
-
Ruben Taelman
-
Miel Vander Sande
-
Ruben Verborgh
-
Erik Mannens
In Proceedings of the 14th Extended Semantic Web Conference: Posters and Demos
Linked Datasets often evolve over time for a variety of reasons.
While typical scenarios rely on the latest version only, useful knowledge may still be contained within or between older versions,
such as the historical information of biomedical patient data. In order to make this historical information cost-efficiently available on the Web,
a low-cost interface is required for providing access to versioned datasets.
For our demonstration, we set up a live Triple Pattern Fragments interface for a versioned dataset with queryable access.
We explain different version query types of this interface, and how it communicates with a storage solution that can handle these queries efficiently.
2017
Demo
Querying Dynamic Datasources with Continuously Mapped Sensor Data
-
Ruben Taelman
-
Pieter Heyvaert
-
Ruben Verborgh
-
Erik Mannens
In Proceedings of the 15th International Semantic Web Conference: Posters and Demos
The world contains a large number of sensors that produce new data at
a high frequency. It is currently very hard to find public services that expose these
measurements as dynamic Linked Data. We investigate how sensor data can be
published continuously on the Web at a low cost. This paper describes how the
publication of various sensor data sources can be done by continuously mapping
raw sensor data to RDF and inserting it into a live, low-cost server. This makes it
possible for clients to continuously evaluate dynamic queries using public sensor
data. For our demonstration, we will illustrate how this pipeline works for the
publication of temperature and humidity data originating from a microcontroller,
and how it can be queried.
2016
Demo
Linked Sensor Data Generation using Queryable RML Mappings
-
Pieter Heyvaert
-
Ruben Taelman
-
Ruben Verborgh
-
Erik Mannens
In Proceedings of the 15th International Semantic Web Conference: Posters and Demos
As the amount of generated sensor data is increasing, semantic
interoperability becomes an important aspect in order to support
efficient data distribution and communication. Therefore, the integration
of (sensor) data is important, as this data is coming from different
data sources and might be in different formats. Furthermore, reusable
and extensible methods for this integration are required in order to be
able to scale with the growing number of applications that generate semantic
sensor data. Current research efforts allow mapping sensor data
to Linked Data in order to provide semantic interoperability. However,
they lack support for multiple data sources, hampering the integration.
Furthermore, the used methods are not available for reuse or are not extensible,
which hampers the development of applications. In this paper,
we describe how the RDF Mapping Language (RML) and a Triple Pattern
Fragments (TPF) server are used to address these shortcomings. The
demonstration consists of a microcontroller that generates sensor data.
The data is captured and mapped to RDF triples using module-specific
RML mappings, which are queried from a TPF server.
2016
Poster
Optimizing Traversal Queries of Sensor Data Using a Rule-Based Reachability Approach
-
Bryan-Elliott Tam
-
Ruben Taelman
-
Julián Andrés Rojas Meléndez
-
Pieter Colpaert
In Proceedings of the 23rd International Semantic Web Conference: Posters and Demos
Link Traversal queries face challenges in completeness and long execution times due to the size of the Web.
Reachability criteria define completeness by restricting the links followed by engines. However, the number of
links to dereference remains the bottleneck of the approach. Web environments often have structures exploitable
by query engines to prune irrelevant sources. Current criteria rely on information from the query definition
and predefined predicates. However, it is difficult to use them to traverse environments where logical expressions
indicate the location of resources. We propose to use a rule-based reachability criterion that captures logical
statements expressed in hypermedia descriptions within linked data documents to prune irrelevant sources. In
this poster paper, we show how the Comunica link traversal engine is modified to take hints from a hypermedia
control vocabulary, to prune irrelevant sources. Our preliminary findings show that by using this strategy, the
query engine can significantly reduce the number of HTTP requests and the query execution time without
sacrificing the completeness of results. Our work shows that the investigation of hypermedia controls in link
pruning of traversal queries is a worthy effort for optimizing web queries of unindexed decentralized databases.
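The pruning idea can be illustrated with a toy rule format. The rule representation and link metadata below are simplifying assumptions; the paper's criterion derives such conditions from hypermedia control vocabulary inside Linked Data documents.

```python
# Illustrative sketch of rule-based reachability: each link carries a
# logical condition (here, required attribute values), and the engine
# dereferences a link only when the condition is consistent with the
# query's constraints. All URLs and the rule format are hypothetical.

links = [
    {"url": "http://pod.example/2024/", "rule": {"year": 2024}},
    {"url": "http://pod.example/2023/", "rule": {"year": 2023}},
    {"url": "http://pod.example/profile", "rule": {}},  # no restriction
]

def relevant_links(links, query):
    """Keep links whose rule is consistent with the query's constraints."""
    kept = []
    for link in links:
        rule = link["rule"]
        # A missing constraint in the query leaves the rule satisfiable.
        if all(query.get(k, v) == v for k, v in rule.items()):
            kept.append(link["url"])
    return kept

urls = relevant_links(links, query={"year": 2024})
```

Every pruned link is an HTTP request the traversal engine never has to make, which is where the reported reductions in requests and execution time come from.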
2024
Poster
Observations on Bloom Filters for Traversal-Based Query Execution over Solid Pods
-
Jonni Hanski
-
Ruben Taelman
-
Ruben Verborgh
In Proceedings of the 21st Extended Semantic Web Conference: Posters and Demos
Traversal-based query execution enables resolving
queries over Linked Data documents, using a follow-your-nose approach
to locating query-relevant data by following series of links through documents.
This traversal, however, incurs an unavoidable overhead in the form
of data access costs. By only following links known to be relevant
for answering a given query, this overhead could be minimized. Prior
work exists in the form of reachability conditions to determine the links
to dereference; however, these do not take into consideration the contents
behind a given link. Within this work, we have explored the possibility
of using Bloom filters to prune query-irrelevant links based on the triple
patterns contained within a given query, when performing traversal-based
query execution over Solid pods containing simulated social network data
as an example use case. Our findings show that, with relatively uniform
data across an entire benchmark dataset, this approach fails to effectively
filter links, especially when the queries contain triple patterns with low
selectivity. Thus, future work should consider the query plan beyond
individual patterns, or the structure of the data beyond individual triples,
to allow for more effective pruning of links.
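The filtering mechanism under study can be sketched as follows. The filter sizes and hash scheme are illustrative choices, and, as the abstract observes, low-selectivity terms leave most of a filter's bits set so that nearly every link passes.

```python
# Small Bloom filter sketch for link pruning: a document publishes a
# filter over the terms it contains; a client skips dereferencing when
# no query term can possibly be present. Parameters are illustrative.

import hashlib

class BloomFilter:
    def __init__(self, bits=256, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = 0  # bitset stored as one big integer

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits

    def add(self, item):
        for pos in self._positions(item):
            self.array |= 1 << pos

    def might_contain(self, item):
        # False positives are possible; false negatives are not.
        return all(self.array & (1 << pos) for pos in self._positions(item))

def should_dereference(doc_filter, query_terms):
    """Dereference only if the document might contain some query term."""
    return any(doc_filter.might_contain(t) for t in query_terms)

doc = BloomFilter()
for term in ["foaf:name", "foaf:knows"]:
    doc.add(term)
```

Because membership tests are one-sided, pruning never discards a relevant link; its effectiveness depends entirely on how selective the query terms are, which is exactly the limitation the poster reports.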
2024
Poster
Personalized Medicine Through Personal Data Pods
-
Elias Crum
-
Ruben Taelman
-
Bart Buelens
-
Gokhan Ertaylan
-
Ruben Verborgh
In Proceedings of the 15th International SWAT4HCLS Conference
Medical care is in the process of becoming increasingly personalized through the use of patient genetic
information. Concurrently, privacy concerns regarding collection and storage of sensitive personal
genome sequence data have necessitated public debate and legal regulation. Here we identify two
fundamental challenges associated with privacy and shareability of genomic data storage and propose
the use of Solid pods to address these challenges. We establish that personal data pods using Solid
specifications can enable decentralized storage, increased patient control over their data, and support of
Linked Data formats, which when combined, could offer solutions to challenges currently restricting
personalized medicine in practice.
2024
Poster
Towards Algebraic Mapping Operators for Knowledge Graph Construction
-
Sitt Min Oo
-
Ben De Meester
-
Ruben Taelman
-
Pieter Colpaert
In Proceedings of the 22nd International Semantic Web Conference: Posters and Demos
Declarative knowledge graph construction has matured to the point where state-of-the-art techniques are
focusing on optimizing the mapping processes. However, these optimization techniques use the syntax
of the mapping language without considering the impact of its semantics. As a result, it is difficult
to compare different engines fairly due to the obscurity in their semantic differences. In this poster
paper, we propose an initial set of algebraic mapping operators to define the operational semantics of
mapping processes, and provide a first step towards a theoretical foundation for mapping languages.
We translated a series of RML documents to algebraic mapping operators to show the feasibility of our
approach. We believe that further pursuing these initial results will lead to greater interoperability of
mapping engines and languages, sharpen the requirements analysis for the upcoming RML standardization
work, and improve the developer experience for all current and future mapping engines.
2023
Poster
Reinforcement Learning-based SPARQL Join Ordering Optimizer
-
Ruben Eschauzier
-
Ruben Taelman
-
Meike Morren
-
Ruben Verborgh
In Proceedings of the 20th Extended Semantic Web Conference: Posters and Demos
In recent years, relational databases successfully leverage reinforcement learning to optimize query plans. For graph databases and RDF quad stores, such research has been limited, so there is a need to understand the impact of reinforcement learning techniques. We explore a reinforcement learning-based join plan optimizer that we design specifically for optimizing join plans during SPARQL query planning. This paper presents key aspects of this method and highlights open research problems. We argue that while we can reuse aspects of relational database optimization, SPARQL query optimization presents unique challenges not encountered in relational databases. Nevertheless, initial benchmarks show promising results that warrant further exploration.
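The feedback loop behind such an optimizer can be sketched in a highly simplified form: treat candidate join orders as arms of a bandit, use execution cost as negative reward, and prefer orders that were cheap before. The cost model, learning rate, and bandit framing are assumptions for illustration; the paper's method uses richer state and learned estimators.

```python
# Toy sketch of learning join orders from execution feedback (not the
# paper's actual method): an epsilon-greedy bandit over candidate orders,
# with a pretend cost model based on intermediate-result sizes.

import itertools, random

def execution_cost(order, cardinalities):
    """Pretend cost model: sum of running intermediate-result sizes."""
    cost, inter = 0, 1
    for pattern in order:
        inter *= cardinalities[pattern]
        cost += inter
    return cost

def learn_join_order(cardinalities, episodes=200, epsilon=0.1, seed=42):
    rng = random.Random(seed)
    orders = list(itertools.permutations(cardinalities))
    q = {o: 0.0 for o in orders}  # zero init makes unexplored orders look cheap
    for _ in range(episodes):
        if rng.random() < epsilon:
            order = rng.choice(orders)       # explore
        else:
            order = min(q, key=q.get)        # exploit cheapest estimate
        cost = execution_cost(order, cardinalities)
        q[order] += 0.5 * (cost - q[order])  # running cost estimate
    return min(q, key=q.get)

# Hypothetical triple-pattern cardinalities for one SPARQL query.
best = learn_join_order({"?tp1": 1000, "?tp2": 10, "?tp3": 100})
```

In this toy setting the cheapest plan starts from the most selective pattern, which is the behavior a learned optimizer should converge towards.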
2023
Poster
PoDiGG: A Public Transport RDF Dataset Generator
-
Ruben Taelman
-
Ruben Verborgh
-
Tom De Nies
-
Erik Mannens
In Proceedings of the 26th International Conference Companion on World Wide Web
A large amount of public transport data is made available by many different providers,
which makes RDF a great method for integrating these datasets.
Furthermore, this type of data provides a great source of information that combines both geospatial and temporal data.
These aspects are currently undertested in RDF data management systems, because of the limited availability of realistic input datasets.
In order to bring public transport data to the world of benchmarking, we need to be able to create synthetic variants of this data.
In this paper, we introduce a dataset generator with the capability to create realistic public transport data.
This dataset generator, and the ability to configure it on different levels,
makes it easier to use public transport data for benchmarking with great flexibility.
2017
Poster
Exposing RDF Archives using Triple Pattern Fragments
-
Ruben Taelman
-
Ruben Verborgh
-
Erik Mannens
In Proceedings of the 20th International Conference on Knowledge Engineering and Knowledge Management: Posters and Demos
Linked Datasets typically change over time, and knowledge of this historical information can be useful.
This makes the storage and querying of Dynamic Linked Open Data an important area of research.
With the current versioning solutions, publishing Dynamic Linked Open Data at Web-Scale is possible, but too expensive.
We investigate the possibility of using the low-cost Triple Pattern Fragments (TPF) interface to publish versioned Linked Open Data.
In this paper, we discuss requirements for supporting versioning in the TPF framework, on the level of the interface, storage and client,
and investigate which trade-offs exist. These requirements lay the foundations for further research in the area of low-cost,
Web-Scale dynamic Linked Open Data publication and querying.
2016
Poster
Moving Real-Time Linked Data Query Evaluation to the Client
-
Ruben Taelman
-
Ruben Verborgh
-
Pieter Colpaert
-
Erik Mannens
-
Rik Van de Walle
In Proceedings of the 13th Extended Semantic Web Conference: Posters and Demos
Traditional RDF stream processing engines work completely server-side, which contributes to a high server cost.
For allowing a large number of concurrent clients to do continuous querying,
we extend the low-cost Triple Pattern Fragments (TPF) interface with support for time-sensitive queries.
In this poster, we give the overview of a client-side RDF stream processing engine on top of TPF.
Our experiments show that our solution significantly lowers the server load while increasing the load on the clients.
Preliminary results indicate that our solution moves the complexity of continuously evaluating real-time queries
from the server to the client, which makes real-time querying much more scalable for a large number of concurrent
clients when compared to the alternatives.
2016
PhD Symposium
Continuously Self-Updating Query Results over Dynamic Heterogeneous Linked Data
-
Ruben Taelman
In The Semantic Web. Latest Advances and New Domains: 13th International Conference, ESWC 2016, Heraklion, Crete, Greece, May 29 – June 2, 2016, Proceedings
Our society is evolving towards massive data consumption from heterogeneous sources, which includes rapidly changing data
like public transit delay information.
Many applications that depend on dynamic data consumption require highly available server interfaces.
Existing interfaces involve substantial costs to publish rapidly changing data with high availability,
and are therefore only possible for
organisations that can afford such an expensive infrastructure.
In my doctoral research, I investigate how to publish and consume real-time and historical Linked Data on a large scale.
To reduce server-side costs for making dynamic data publication affordable,
I will examine different possibilities to divide query evaluation between servers and clients.
This paper discusses the methods I aim to follow together with preliminary results and the steps required to use this solution.
An initial prototype achieves significantly lower server processing cost per query, while maintaining reasonable
query execution times and client costs.
Given these promising results, I feel confident this research direction is a viable solution for offering low-cost
dynamic Linked Data interfaces as opposed to the existing high-cost solutions.
2016
Other publications
Position Statement
Bridges between GraphQL and RDF
-
Ruben Taelman
-
Miel Vander Sande
-
Ruben Verborgh
In W3C Workshop on Web Standardization for Graph Data
GraphQL offers a highly popular query language for graphs, which is well known among Web developers. Each GraphQL data graph is scoped within an interface-specific schema, which makes it difficult to integrate data from multiple interfaces. The RDF graph model offers a solution to this integration problem. Different techniques can enable querying over RDF graphs using GraphQL queries. In this position statement, we provide a high-level overview of the existing techniques and how they differ. We argue that each of these techniques has its merits, but standardization is essential to simplify the link between GraphQL and RDF.
2019
More
Master’s Thesis
Continuously Updating Queries over Real-Time Linked Data
-
Ruben Taelman
This dissertation investigates the possibilities of continuously updating
queries over Linked Data, with a focus on server availability. This work builds upon the ideas
of Linked Data Fragments to let clients do most of the work when executing a query. The
server adds metadata to make clients aware of the data's volatility, ensuring that
query results are always up to date. The implementation of the proposed framework
is then evaluated and compared against alternative approaches.
2015
More
Academic Involvement
Tutorials
-
An Introduction to GraphQL
2019
Olaf Hartig, Ruben Taelman
Half-day tutorial at the 18th International Semantic Web Conference (ISWC 2019), Auckland, New Zealand, 2019.
-
Building Decentralized Applications with Solid and Comunica
2019
Ruben Taelman, Joachim Van Herwegen, Ruben Verborgh
Full-day tutorial at the 18th International Semantic Web Conference (ISWC 2019), Auckland, New Zealand, 2019.
-
Querying Linked Data with Comunica
2019
Ruben Taelman, Joachim Van Herwegen
Half-day tutorial at the 16th Extended Semantic Web Conference (ESWC 2019), Portorož, Slovenia, 2019.
-
Modeling, Generating and Publishing knowledge as Linked Data
2016
Anastasia Dimou, Ruben Taelman, Pieter Heyvaert, Ruben Verborgh
Full-day tutorial at the 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW2016), Bologna, Italy, 2016.
Organizing Committee Member
Journal Reviewer
Program Committee Member
- International Semantic Web Conference (ISWC)
- 2019 (Resources Track)
- 2020 (Resources Track)
- 2021 (Resources Track)
- 2022 (Research Track, Resources Track)
- 2023 (Research Track, Resources Track - Senior)
- 2024 (Research Track, Resources Track - Senior)
- Extended Semantic Web Conference (ESWC)
- 2018 (Research Track)
- 2019 (Research Track)
- 2020 (Research Track, Poster & Demo Track)
- 2021 (Research Track, Resources Track, In-Use Track)
- 2022 (Research Track, Resources Track, In-Use Track)
- 2023 (Research Track, Resources Track)
- 2024 (Research Track)
- The Web Conference
- 2018 (Developers’ Track)
- 2022 (Semantics and Knowledge Track)
- 2023 (Semantics and Knowledge Track)
- 2024 (Semantics and Knowledge Track - Area Chair)
- 2025 (Semantics and Knowledge Track - Area Chair)
- International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE)
- SEMANTiCS Conference
- 2017 (Poster & Demo Track)
- 2018 (Research Track, Poster & Demo Track)
- 2019 (Research Track)
- 2020 (Research Track)
- 2021 (Research Track)
- 2022 (Research Track)
- ACM International Conference on Information and Knowledge Management
- Workshop on Decentralizing the Semantic Web
- Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW)
- 2017
- 2018
- 2019
- 2020
- 2021
- 2022
- 2023
- Workshop on Data Management for Knowledge Graphs (DMKG)
- Linked Data in Architecture and Construction Workshop
- Open Mighty Storage Challenge (MOCHA)
- International Workshop on Semantics for Transport (Sem4Tra)