XR Experiments

Note: You need an SVG-enabled browser to view the following pages.

Preliminaries

Systems

Experiments were run on systems with the characteristics listed below:

OS                               Hardware                         Java                               Sub-systems
Type: Linux                      CPU: 8 x Intel Xeon @ 2.93 GHz   Version: 1.6.0_24                  RDF: RDF-3X 0.3.7
Version: 2.6.33.7-desktop-2mnb   Mem: 16GB                        Memory (startup/max): 252MB/12GB   XML: BaseX 7.3

Datasets

Experiments are based on synthetic datasets. Datasets are named Dij, where i is the input factor used to generate the XMark data, and j is the number of triples, expressed as a ratio of the number of XML nodes. For instance, D101/3 (i = 10, j = 1/3) contains 16M XML nodes and roughly 5M triples. The sizes of the respective datasets are listed in the table below:

XML data size (XMark factor)    1        10       100
Number of nodes                 1.6M     16M      167M
Data size                       111MB    1GB      11GB

RDF size (number of triples)    D1       D10      D100
D1/3                            0.5M     5M       50M
D1                              1.5M     15M
D3                              5M       50M
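
The naming convention can be checked with a little arithmetic: the triple count is the ratio j applied to the XML node count for factor i. The sketch below is illustrative only. The names `XML_NODES` and `approx_triples` are hypothetical, and the node counts are the rounded figures from the table above, so the results are approximations rather than the exact sizes of the generated datasets.

```python
from fractions import Fraction

# Rounded XML node counts per XMark factor i, copied from the table above.
XML_NODES = {1: 1_600_000, 10: 16_000_000, 100: 167_000_000}


def approx_triples(i: int, j: Fraction) -> int:
    """Approximate triple count for the dataset with XMark factor i and ratio j."""
    return round(XML_NODES[i] * j)


if __name__ == "__main__":
    # i = 10, j = 1/3: about 5M triples, matching the example in the text.
    print(approx_triples(10, Fraction(1, 3)))
    # i = 1, j = 3: about 5M triples as well.
    print(approx_triples(1, Fraction(3)))
```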

Workloads

                   XML selectivity    RDF selectivity
Workload 1 (W1)    LOW                LOW
Workload 2 (W2)    LOW                HIGH
Workload 3 (W3)    HIGH               LOW
Workload 4 (W4)    HIGH               HIGH

Charts

  • XML||RDF: Independent executions (detail)
  • XML->RDF: Bind XML, then RDF (detail)
  • RDF->XML-URI: Bind RDF, then push URIs (detail)
  • RDF->XML-URI-Pr: Bind RDF, prune, then push URIs (detail)
  • RDF->XML-XPath: Bind RDF, then push XPaths (detail)
  • RDF->XML-XPath-Pr: Bind RDF, prune, then push XPaths (detail)

The remaining hatch-patterned bars are the tuple-at-a-time counterparts of the RDF-to-XML approaches above, with and without pruning.
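
One way to read the strategy names is as different evaluation orders for a query with an XML part and an RDF part that are joined on node URIs. The toy sketch below illustrates that reading only. The data, the function names, and the join-on-URI assumption are hypothetical; it does not reproduce the actual BaseX and RDF-3X based implementation, and it leaves out pruning and the XPath-pushing variants.

```python
# Toy data (hypothetical): XML nodes identified by URI, and RDF triples
# whose subjects are those URIs.
XML_NODES = [
    {"uri": "doc1#n1", "tag": "item"},
    {"uri": "doc1#n2", "tag": "person"},
    {"uri": "doc2#n1", "tag": "item"},
]
TRIPLES = [
    ("doc1#n1", "hasTopic", "auction"),
    ("doc1#n2", "hasTopic", "auction"),
    ("doc2#n1", "hasTopic", "sports"),
]


def xml_query(tag, uris=None):
    """URIs of XML nodes with the given tag, optionally restricted to bound URIs."""
    return {n["uri"] for n in XML_NODES
            if n["tag"] == tag and (uris is None or n["uri"] in uris)}


def rdf_query(prop, value, uris=None):
    """Subjects of matching triples, optionally restricted to bound URIs."""
    return {s for (s, p, o) in TRIPLES
            if p == prop and o == value and (uris is None or s in uris)}


def independent(tag, prop, value):
    """XML||RDF: run both sub-queries separately, then join the results on URI."""
    return xml_query(tag) & rdf_query(prop, value)


def xml_then_rdf(tag, prop, value):
    """XML->RDF: evaluate the XML part first, then push its bindings to the RDF part."""
    return rdf_query(prop, value, uris=xml_query(tag))


def rdf_then_xml(tag, prop, value):
    """RDF->XML-URI: evaluate the RDF part first, then push the bound URIs to the XML part."""
    return xml_query(tag, uris=rdf_query(prop, value))


if __name__ == "__main__":
    for strategy in (independent, xml_then_rdf, rdf_then_xml):
        print(strategy.__name__, sorted(strategy("item", "hasTopic", "auction")))
```

All three orders return the same answer; what differs is how much intermediate data each side has to produce, which is what the workload selectivities are meant to vary.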

  • Colored bars indicate execution time averaged over 3 runs.
  • Bars with negative values indicate that the data is missing (most likely because the timeout for the whole set of strategies expired; in a few cases this indicates a crash).
  • Columns labeled ‘—’ must be ignored.
  • Cardinalities of the queries are indicated in parentheses under each series. '-1' means none of the strategies completed within the allotted 5 minutes.
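
When post-processing the raw data, the conventions above could be applied along the following lines. This is a minimal sketch under stated assumptions: the input is taken to be a plain list of run times per strategy, any negative value is treated as a missing measurement, and the function names are hypothetical. The actual layout of the raw data files is not described here.

```python
from statistics import mean
from typing import Optional, Sequence


def average_time(runs: Sequence[float]) -> Optional[float]:
    """Average the run times, or return None if the measurement is missing.

    Assumption: a negative value marks missing data (timeout or crash).
    """
    if not runs or any(t < 0 for t in runs):
        return None
    return mean(runs)


def has_cardinality(cardinality: int) -> bool:
    """False when the cardinality is -1, i.e. no strategy completed in time."""
    return cardinality != -1


if __name__ == "__main__":
    print(average_time([1.0, 2.0, 3.0]))  # three completed runs
    print(average_time([-1, -1, -1]))     # missing data
    print(has_cardinality(-1))
```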

Strategies side-by-side

 

Datasets                           D11/3, D11 and D13                    D101/3, D101 and D103                 D13, D101 and D1001/3
1000 nodes per XML file            linear scale, log scale (raw data)    linear scale, log scale (raw data)    linear scale, log scale (raw data)
# XML files = XMark size factor    linear scale, log scale (raw data)    linear scale, log scale (raw data)    (D1001/3 generation failed)

Scalability

                                   Workload 1    Workload 2    Workload 3    Workload 4
1000 nodes per XML file            open          open          open          open
# XML files = XMark size factor    open          open          open          open
All XML data in a single file      open          open          open          open

Recent results

We recently experimented with a new set of strategies involving materialization.

  • XML||RDF: Independent executions (detail)
  • XML->RDF-Data: Bind XML, then push to RDF data (detail)
  • RDF->XML-URI-Data: Bind RDF, then push to XML data (detail)
  • RDF->XML-URI-Data-Pr: Bind RDF, prune, then push to XML data (detail)

Strategies side-by-side

 

Datasets                   D11/3, D11 and D13                    D101/3, D101 and D103                 D13, D101 and D1001/3
1000 nodes per XML file    linear scale, log scale (raw data)    linear scale, log scale (raw data)    linear scale, log scale (raw data)

Scalability

                           Workload 1    Workload 2    Workload 3    Workload 4
1000 nodes per XML file    open          open          open          open

Permanent link to this article: https://team.inria.fr/oak/projects/xr-an-xml-rdf-hybrid-model-for-annotated-documents/xr-experiments/