Hi!
I'm Ruben Taelman, a postdoctoral researcher at IDLab,
focusing on Web decentralization, Linked Data publishing, and querying.
My goal is to make data accessible for everyone by providing
intelligent infrastructure and algorithms for data publication and retrieval.
To support my research, I co-edit specifications such as SPARQL,
and develop various open-source JavaScript libraries, including streaming RDF parsers and the Comunica engine for querying Linked Data on the Web.
As this website itself contains Linked Data, you can query it live with Comunica.
Have a look at my publications or projects,
and contact me if any of these topics interest you.
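The claim above that this site's Linked Data can be queried live can be tried with a short SPARQL query; a minimal sketch, assuming the homepage exposes FOAF-style metadata (the exact predicates used here are illustrative and may differ from the site's actual vocabulary):

```sparql
# Ask the homepage for the name and interests of its owner.
# Predicate choices are illustrative; the site's vocabulary may differ.
SELECT ?name ?interest WHERE {
  ?person <http://xmlns.com/foaf/0.1/name> ?name .
  OPTIONAL { ?person <http://xmlns.com/foaf/0.1/topic_interest> ?interest . }
}
```

Such a query can be evaluated by pointing a SPARQL query engine like Comunica at the page's URL as a source, letting the engine dereference the page and extract its embedded RDF.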
Latest blog posts
- Did AI Crawlers Kill SPARQL Federation?
Public Knowledge Graph infrastructure is degrading due to AI crawlers.
RDF provides the basis for distributing Knowledge Graphs (KGs) across different locations, which is useful when KGs cover different data domains with varying purposes, are managed by different teams and organizations, or are exposed under different access policies. One of the most popular ways of publishing a KG is through a SPARQL endpoint, which offers queryable access. When multiple such KGs need to be integrated, techniques such as SPARQL federation can be used. While many KGs have been available as public SPARQL endpoints, their openness is currently being challenged by the huge load placed on them by modern AI crawlers that power LLMs. Recently, public SPARQL endpoints have started putting stricter usage restrictions in place to avoid going down under this increased server load. While these restrictions limit the range of SPARQL queries that can be executed, they are especially problematic for federated SPARQL queries, which often involve sending multiple smaller queries to endpoints within a short timeframe.
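The request pattern described above can be seen in a federated query itself; a minimal sketch using a SPARQL 1.1 SERVICE clause (the endpoint URL and predicates are illustrative):

```sparql
# Join local data with a remote endpoint.
# The engine typically evaluates the SERVICE block once per local
# binding, so one federated query can fan out into many small
# requests against the remote endpoint.
SELECT ?person ?desc WHERE {
  ?person <http://www.w3.org/2002/07/owl#sameAs> ?wikidataEntity .
  SERVICE <https://query.wikidata.org/sparql> {
    ?wikidataEntity <http://schema.org/description> ?desc .
  }
}
```

This per-binding fan-out of small subqueries is exactly the traffic shape that rate-limited endpoints now tend to reject, which is why federation suffers disproportionately under the new restrictions.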
- The cost of modularity in SPARQL
How much do modularity and decentralization conflict with centralized speed?
The JavaScript-based Comunica SPARQL query engine is designed for querying decentralized environments, e.g., through federation and link traversal. In addition, it uses a modular architecture to achieve high flexibility for developers. To determine the impact of these design decisions, and to put Comunica's baseline performance in perspective, I compared it to state-of-the-art centralized SPARQL engines when querying centralized Knowledge Graphs. Results show that Comunica can closely match the performance of state-of-the-art SPARQL query engines, despite having vastly different optimization criteria.
Highlighted publications
- Conference: Link Traversal Query Processing over Decentralized Environments with Structural Assumptions. In Proceedings of the 22nd International Semantic Web Conference (2023)
- Conference: Comunica: a Modular SPARQL Query Engine for the Web. In Proceedings of the 17th International Semantic Web Conference (2018)
- Journal: Triple Storage for Random-Access Versioned Querying of RDF Archives. In Journal of Web Semantics (2018)
Latest publications
- Demo: Client-Driven Offline-First RDF 1.2 using OR-Sets. In International Conference on Web Engineering (2026)
- Conference: Traqula: Providing a Foundation for The Evolving SPARQL Ecosystem Through Modular Query Parsing, Transformation, and Generation. In Proceedings of the 23rd Extended Semantic Web Conference (2026)
- Poster: From VCF to RDF: RML-Based Conversion Approaches for the Semantic Representation of Variant Data. In Proceedings of the 17th International SWAT4HCLS Conference (2026)
- Journal: Genomic sequence data sharing for clinical practice: A scoping review. In Computers in Biology and Medicine (2026)
- Journal: Expressive Querying and Scalable Management of Large RDF Archives. In Semantic Web Journal (2025)