XR Strategies

This page gives an overview of the strategies explored in the XR project to join data coming from XML and RDF instances.

In the sequel, we assume an input query Q = (h, Q_X, Q_R), where Q_X is the set of tree patterns appearing in Q, Q_R is the set of triple patterns appearing in Q, and h, called the query head, is a tuple of variables appearing in Q_X ∪ Q_R. We call I = (I_X, I_R) an XR data instance, where I_X is an XML instance and I_R is an RDF instance.

XML||RDF : Independent executions

In this strategy, Q_X and Q_R are evaluated independently on I_X and I_R respectively, and their results are combined by a natural join. The algorithm is given below.
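As an illustration only (this is not the project's algorithm listing), the join step can be sketched in Python. The sketch assumes that each side's result is a list of dictionaries mapping variable names to values; the helper names `natural_join` and `project` and the sample data are hypothetical.

```python
def natural_join(left, right):
    """Join two lists of bindings on the variables they share."""
    joined = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            # Two bindings are compatible when they agree on every shared variable.
            if all(l[v] == r[v] for v in shared):
                joined.append({**l, **r})
    return joined


def project(bindings, head):
    """Keep only the head variables h of each joined binding."""
    return [tuple(b[v] for v in head) for b in bindings]


# Hypothetical results of Q_X over I_X and of Q_R over I_R.
xml_results = [
    {"u": "http://ex.org/a", "title": "A"},
    {"u": "http://ex.org/b", "title": "B"},
]
rdf_results = [
    {"u": "http://ex.org/a", "author": "Ann"},
    {"u": "http://ex.org/c", "author": "Cy"},
]

answer = project(natural_join(xml_results, rdf_results), ("title", "author"))
```

The nested loop keeps the sketch short; a hash join on the shared variables would be the usual choice for larger results.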

XML->RDF : Bind XML, then push to RDF

Q_X is evaluated first over I_X. Then, for each tuple found, the values bound to URI variables are pushed (as selections) into a copy of Q_R, which is added to a union of triple pattern queries. The resulting union of conjunctive queries is finally evaluated on I_R, and provides the answer to Q over I.
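A minimal Python sketch of this strategy follows, under several assumptions that are not part of the original description: triple patterns are tuples of strings, variables start with `?`, I_R is a plain list of triples, and each branch of the union remembers the XML tuple it came from so that variables bound only on the XML side can be restored. The names `substitute` and `match` are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def substitute(patterns, binding):
    """Push bound values into a copy of Q_R as selections (constants)."""
    return [tuple(binding.get(t, t) for t in p) for p in patterns]


def match(patterns, triples, binding=None):
    """Naive evaluation of a conjunction of triple patterns."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.extend(match(rest, triples, candidate))
    return results


# Hypothetical RDF instance, triple patterns, and result of Q_X over I_X.
I_R = [
    ("http://ex.org/a", "author", "Ann"),
    ("http://ex.org/b", "author", "Bob"),
    ("http://ex.org/c", "author", "Cy"),
]
Q_R = [("?u", "author", "?name")]
uri_vars = {"?u"}
xml_results = [
    {"?u": "http://ex.org/a", "?title": "A"},
    {"?u": "http://ex.org/c", "?title": "C"},
]

# One branch of the union per XML tuple.
union = [(xb, substitute(Q_R, {v: xb[v] for v in uri_vars})) for xb in xml_results]

answers = []
for xb, branch in union:
    for rb in match(branch, I_R):
        answers.append({**xb, **rb})
```

In a real system the union would be shipped to the RDF engine as one query; the loop over branches only stands in for that evaluation.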

RDF->XML-URI: Bind RDF, then push URIs

This approach assumes that the XML instance has a means of mapping URIs to the nodes it contains.
Q_R is evaluated first over I_R. Then, for each tuple found, the values bound to URI variables are pushed into a copy of Q_X, which is added to a union of tree pattern queries. The resulting union of conjunctive queries is finally evaluated on I_X, and provides the answer to Q over I.
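The following Python sketch illustrates the idea with the standard `xml.etree.ElementTree` module. It assumes, purely for illustration, that each XML node carries its URI in a `uri` attribute and that Q_X is a single tree pattern expressible as the path `.//book[@uri=...]/title`; neither the attribute name nor the sample data comes from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML instance I_X; the `uri` attribute is the assumed URI-to-node mapping.
I_X = ET.fromstring(
    '<library>'
    '<book uri="http://ex.org/a"><title>A</title></book>'
    '<book uri="http://ex.org/b"><title>B</title></book>'
    '</library>'
)

# Q_X with a placeholder for the URI variable u.
# Note: URIs containing a single quote would need escaping, which is not handled here.
Q_X_TEMPLATE = ".//book[@uri='{u}']/title"

# Hypothetical result of Q_R over I_R.
rdf_results = [
    {"u": "http://ex.org/a", "name": "Ann"},
    {"u": "http://ex.org/z", "name": "Zed"},
]

# One tree pattern query per RDF tuple, with the URI pushed in as a selection.
union = [(rb, Q_X_TEMPLATE.format(u=rb["u"])) for rb in rdf_results]

answers = []
for rb, xpath in union:
    for node in I_X.findall(xpath):
        answers.append({**rb, "title": node.text})
```

The second RDF tuple produces an unsatisfiable branch, because no node carries the URI `http://ex.org/z`, so it contributes nothing to the answer.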

RDF->XML-XPath: Bind RDF, then push XPaths

In situations where the XML instance has no means of mapping a given URI to a node it contains, one needs to “externally” dereference the URI, i.e. obtain an exact path to the node it refers to. Path expressions, rather than URIs, are then pushed into the tree pattern queries of Q.

Q_R is evaluated first over I_R. Then, for each tuple found, a copy q of Q_X is created and, for each URI u bound to a URI variable v appearing in Q_X, the exact path of the node that u refers to is obtained. A predicate is added to q, stating that the node labeled with variable v must be the node at that exact path. The newly created subquery q is added to a union of tree pattern queries. The resulting union of conjunctive queries is finally evaluated on I_X, and provides the answer to Q over I.
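A small Python sketch of this variant follows. The external dereferencing step is modeled by a hypothetical dictionary `DEREF` from URIs to exact paths, and the tree pattern is assumed to bind v to a `book` node and return its `title` child; both are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML instance I_X with no URI information inside the document.
I_X = ET.fromstring(
    "<library>"
    "<book><title>A</title></book>"
    "<book><title>B</title></book>"
    "</library>"
)

# Assumed external dereferencing index: URI -> exact path of the node.
DEREF = {
    "http://ex.org/a": "./book[1]",
    "http://ex.org/b": "./book[2]",
}


def rewrite(uri):
    """Return a copy of Q_X in which the node for v is pinned to its exact path."""
    path = DEREF.get(uri)
    return None if path is None else path + "/title"


# Hypothetical result of Q_R over I_R.
rdf_results = [
    {"u": "http://ex.org/b", "name": "Bob"},
    {"u": "http://ex.org/z", "name": "Zed"},
]

union = []
for rb in rdf_results:
    q = rewrite(rb["u"])
    if q is not None:  # URIs that cannot be dereferenced yield no subquery
        union.append((rb, q))

answers = [{**rb, "title": n.text} for rb, q in union for n in I_X.findall(q)]
```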

Tuple-at-a-time approaches

It may not always be possible or efficient to evaluate a union of tree pattern queries on the XML instance. In such a case, one may evaluate multiple XML queries in sequence.

In other words, Q_R is evaluated first on I_R. Then, for each tuple found, the values bound to URI variables are pushed (as selections) into q, a copy of Q_X, which is directly evaluated on I_X. The results of the successive evaluations of q are combined to produce the overall result. This general approach may apply to different variants of RDF-to-XML join strategies. The following algorithm details one of them, namely RDF=>XML-URI.
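As an illustration only (this is not the project's algorithm listing), the tuple-at-a-time idea can be sketched as a Python generator that issues one XML query per RDF tuple instead of building a union. The `uri` attribute, the path used for Q_X, and the function names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML instance I_X; nodes are assumed to carry their URI in `uri`.
I_X = ET.fromstring(
    '<library>'
    '<book uri="http://ex.org/a"><title>A</title></book>'
    '<book uri="http://ex.org/b"><title>B</title></book>'
    '</library>'
)


def evaluate_q(uri):
    """Evaluate q, a copy of Q_X with the URI pushed in as a selection."""
    nodes = I_X.findall(f".//book[@uri='{uri}']/title")
    return [{"title": n.text} for n in nodes]


def tuple_at_a_time(rdf_results):
    """Yield answers as soon as each per-tuple XML query has been evaluated."""
    for rb in rdf_results:
        for xb in evaluate_q(rb["u"]):
            yield {**rb, **xb}


# Hypothetical result of Q_R over I_R.
rdf_results = [
    {"u": "http://ex.org/b", "name": "Bob"},
    {"u": "http://ex.org/z", "name": "Zed"},
    {"u": "http://ex.org/a", "name": "Ann"},
]

answers = list(tuple_at_a_time(rdf_results))
```

Because the function is a generator, a consumer can stop early without evaluating the remaining XML queries.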

Pruning algorithms

Whether or not the tree patterns of an XR query are evaluated as a single union or as a sequence of simpler queries, that execution may be expensive overall. Fortunately, it is possible to prune out some of the XML sub-queries, if we are given some insight about the underlying structure of the data.
We call dereferencing the ability to get an XML node from a URI. There may be many ways to implement a dereferencing function, such as using an index or an XML-URI encoding scheme. In the sequel, we assume that such a dereferencing mechanism is available.

Suppose Q_R has been evaluated and, for each binding, you consider pushing the values bound to URI variables into q, a copy of Q_X.
Let us consider a URI u bound to a variable v appearing in Q_X.
A first observation is that, if u is not an XML node URI, i.e. dereferencing fails, then pushing u into q will necessarily render it unsatisfiable. Therefore the whole binding tuple can be discarded.
Second, assuming dereferencing provides the exact path of the node corresponding to u, you can check in linear time whether this path is compatible with the tree pattern of q in which v appears. If the path is compatible with none of the considered tree patterns, then q is not satisfiable and can be ignored.
Finally, given a second URI t bound to a variable w, and assuming both t and u can be dereferenced, you may want to compare the documents in which they appear. If these documents are different, but v and w appear in a common tree pattern in q, then q will clearly not be satisfiable.
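The three checks above can be sketched in Python as a filter on RDF bindings. The sketch makes simplifying assumptions that are not in the original text: dereferencing is a dictionary returning a document name and the list of labels from the root to the node, and tree patterns use only child axes with an optional `*` wildcard, so path compatibility reduces to a step-by-step comparison. All names and data are hypothetical.

```python
# Assumed dereferencing index: URI -> (document, labels from root to node).
DEREF = {
    "http://ex.org/a": ("lib.xml", ["library", "book"]),
    "http://ex.org/b": ("lib.xml", ["library", "journal"]),
    "http://ex.org/c": ("other.xml", ["library", "book"]),
}

# For each URI variable of Q_X: the tree pattern it belongs to, and the labels
# on the pattern's path from its root to the variable's node ("*" matches any label).
PATTERN_PATHS = {
    "v": ("p1", ["library", "book"]),
    "w": ("p1", ["library", "*"]),
}


def compatible(path, pattern_path):
    """Linear-time check for child-axis-only patterns (a simplifying assumption)."""
    return len(path) == len(pattern_path) and all(
        step == "*" or step == label for label, step in zip(path, pattern_path)
    )


def keep(binding):
    """Return False when pushing `binding` into q would make q unsatisfiable."""
    doc_of_pattern = {}
    for var, uri in binding.items():
        if var not in PATTERN_PATHS:
            continue
        target = DEREF.get(uri)
        if target is None:  # check 1: dereferencing fails
            return False
        doc, path = target
        pattern_id, pattern_path = PATTERN_PATHS[var]
        if not compatible(path, pattern_path):  # check 2: incompatible path
            return False
        # check 3: variables of one tree pattern must fall in the same document
        if doc_of_pattern.setdefault(pattern_id, doc) != doc:
            return False
    return True


bindings = [
    {"v": "http://ex.org/a", "w": "http://ex.org/b"},  # passes all checks
    {"v": "http://ex.org/z", "w": "http://ex.org/b"},  # not dereferenceable
    {"v": "http://ex.org/b", "w": "http://ex.org/a"},  # path mismatch for v
    {"v": "http://ex.org/a", "w": "http://ex.org/c"},  # different documents
]
survivors = [b for b in bindings if keep(b)]
```

Supporting descendant axes would require a more careful matching procedure than the positional comparison used here.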

These pruning techniques can be applied to all variants of the RDF-to-XML algorithms, provided a dereferencing mechanism is available. The algorithm below details the pruning process for one of those algorithms: RDF=>XML-XPath.

Materialization-based approaches

The general idea behind this new set of approaches is to evaluate one side of the query, and push its results into the other side’s data instance rather than into the other side’s query.

Bind RDF, then push to XML data

Q_R is evaluated over I_R, then its result is materialized in a temporary XML instance I’_X. Finally, Q_X is rewritten into a new query Q’_X, such that Q’_X(I_X ∪ I’_X)=Q(I).
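One possible reading of this approach is sketched below in Python. The layout of the temporary instance (one `tuple` element per RDF binding, with variables stored as attributes), the `uri` attribute on XML nodes, and the hand-written function standing in for Q’_X are all assumptions made for illustration; the source does not specify how the rewriting is performed.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML instance I_X.
I_X = ET.fromstring(
    '<library>'
    '<book uri="http://ex.org/a"><title>A</title></book>'
    '<book uri="http://ex.org/b"><title>B</title></book>'
    '</library>'
)


def materialize(rdf_results):
    """Store the result of Q_R as a temporary XML instance (assumed layout)."""
    root = ET.Element("results")
    for rb in rdf_results:
        ET.SubElement(root, "tuple", dict(rb))  # one attribute per variable
    return root


def q_prime_x(i_x, i_x_temp):
    """Stand-in for Q'_X: join books with materialized tuples on the URI."""
    books_by_uri = {}
    for book in i_x.iter("book"):
        books_by_uri.setdefault(book.get("uri"), []).append(book)
    answers = []
    for t in i_x_temp.iter("tuple"):
        for book in books_by_uri.get(t.get("u"), []):
            answers.append((book.findtext("title"), t.get("name")))
    return answers


# Hypothetical result of Q_R over I_R.
rdf_results = [
    {"u": "http://ex.org/b", "name": "Bob"},
    {"u": "http://ex.org/z", "name": "Zed"},
]

temp_instance = materialize(rdf_results)
answers = q_prime_x(I_X, temp_instance)
```

Unlike the query-pushing strategies, a single query is evaluated here, over the original instance together with the temporary one.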

Bind XML, then push to RDF data

Q_X is evaluated over I_X, then its result is materialized in a temporary RDF instance I’_R. Finally, Q_R is rewritten into a new query Q’_R, such that Q’_R(I_R ∪ I’_R)=Q(I).
