Preprint

Guided Link-Traversal-Based Query Processing

Link-Traversal-Based Query Processing (LTBQP) is a technique for evaluating queries over a web of data by starting with a set of seed documents that is dynamically expanded through following hyperlinks. Compared to query evaluation over a static set of sources, LTBQP is significantly slower because of the number of needed network requests. Furthermore, there are concerns regarding relevance and trustworthiness of results, given that sources are selected dynamically. To address both issues, we propose guided LTBQP, a technique in which information about document linking structure and content policies is passed to a query processor. Thereby, the processor can prune the search tree of documents by only following relevant links, and restrict the result set to desired results by limiting which documents are considered for what kinds of content. In this exploratory paper, we describe the technique at a high level and sketch some of its applications. We argue that such guidance can make LTBQP a valuable query strategy in decentralized environments, where data is spread across documents with varying levels of user trust.