Note: You need an SVG-enabled browser to view the following pages.
Preliminaries
Systems
Experiments were run on systems with characteristics listed below:
OS | Hardware | Java | Sub-systems |
---|---|---|---|
Type: Linux | CPU: 8 x Intel Xeon @ 2.93 GHz | Version: 1.6.0_24 | RDF: RDF-3X 0.3.7 |
Version: 2.6.33.7-desktop-2mnb | Mem: 16GB | Memory (startup/max): 252MB/12GB | XML: BaseX 7.3 |
Datasets
Experiments are based on synthetic datasets. Datasets are named Dij, where i is the input factor used to generate the XMark data and j is the number of triples, expressed as a ratio of the number of XML nodes. For instance, D101/3 (i = 10, j = 1/3) contains 16M XML nodes and roughly 5M triples. The relative data sizes are listed in the table below:
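As a rough illustration of the naming scheme, the sketch below derives the triple count of a dataset from i and j. The constant `NODES_PER_FACTOR` and the function `dataset_size` are hypothetical names, and the linear scaling of 1.6M XML nodes per unit of XMark factor is an assumption extrapolated from the single D101/3 example above, not a measured value.

```python
from fractions import Fraction

# Assumption: XML node count grows linearly with the XMark factor,
# extrapolated from the one example given (factor 10 -> 16M nodes).
NODES_PER_FACTOR = 1_600_000


def dataset_size(i, j):
    """Return (xml_nodes, triples) for a dataset with XMark factor i
    and triple-to-node ratio j (hypothetical helper)."""
    xml_nodes = i * NODES_PER_FACTOR
    triples = int(xml_nodes * Fraction(j))  # truncate to a whole triple count
    return xml_nodes, triples


# i = 10, j = 1/3: 16M XML nodes, roughly 5M triples
nodes, triples = dataset_size(10, Fraction(1, 3))
print(nodes, triples)
```

Using `Fraction` keeps ratios such as 1/3 exact until the final truncation.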
Workloads
Workload | XML selectivity | RDF selectivity |
---|---|---|
Workload 1 (W1) | LOW | LOW |
Workload 2 (W2) | LOW | HIGH |
Workload 3 (W3) | HIGH | LOW |
Workload 4 (W4) | HIGH | HIGH |
Charts
- XML||RDF: Independent executions (detail)
- XML->RDF: Bind XML, then RDF (detail)
- RDF->XML-URI: Bind RDF, then push URIs (detail)
- RDF->XML-URI-Pr: Bind RDF, prune, then push URIs (detail)
- RDF->XML-XPath: Bind RDF, then push XPaths (detail)
- RDF->XML-XPath-Pr: Bind RDF, prune, then push XPaths (detail)
The remaining hatch-patterned bars are the tuple-at-a-time counterparts of the RDF-to-XML approaches above, with and without pruning.
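To make the difference between the first two strategies concrete, here is a toy in-memory sketch. It is one plausible reading of "independent executions" versus "bind XML, then RDF", not the implementation that was benchmarked: the data, the `eval_xml` and `eval_rdf` stand-ins, and the assumption that both sides are joined on a shared URI are all hypothetical.

```python
def join_on_uri(xml_rows, rdf_rows):
    """Hash-join (uri, xml_value) rows with (uri, rdf_value) rows on the URI."""
    by_uri = {}
    for uri, rdf_value in rdf_rows:
        by_uri.setdefault(uri, []).append(rdf_value)
    return [(uri, x, r) for uri, x in xml_rows for r in by_uri.get(uri, ())]


def independent(eval_xml, eval_rdf):
    """XML||RDF: evaluate both sub-queries in full, then join the results."""
    return join_on_uri(eval_xml(), eval_rdf(None))


def bind_xml_then_rdf(eval_xml, eval_rdf):
    """XML->RDF: evaluate the XML sub-query first, then restrict the
    RDF sub-query to the URIs it produced."""
    xml_rows = eval_xml()
    bound_uris = {uri for uri, _ in xml_rows}
    return join_on_uri(xml_rows, eval_rdf(bound_uris))


# Hypothetical toy data standing in for the XML and RDF stores.
XML_DATA = [("u1", "item A"), ("u2", "item B")]
RDF_DATA = [("u1", "red"), ("u3", "blue"), ("u1", "large")]


def eval_xml():
    return list(XML_DATA)


def eval_rdf(uris=None):
    """Return all RDF rows, or only those whose URI is in `uris`."""
    return [row for row in RDF_DATA if uris is None or row[0] in uris]


print(independent(eval_xml, eval_rdf))
print(bind_xml_then_rdf(eval_xml, eval_rdf))
```

Both functions return the same answer; the binding variant only changes how much RDF data is fetched before the join, which is where the selectivity of each side would matter.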
- Colored bars indicate execution time averaged over 3 runs.
- Bars with negative values indicate that the data is missing (most likely because the timeout for the whole set of strategies expired; in a few cases this indicates a crash).
- Columns labeled ‘—’ must be ignored.
- Query cardinalities are given in parentheses under each series. '-1' means that none of the strategies completed within the allotted 5 minutes.
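For readers post-processing the raw data, the conventions above could be applied along the following lines. This is a minimal sketch: the layout of the input (a mapping from series label to a list of run times) and the name `summarize` are assumptions, since the format of the raw data files is not described here.

```python
from statistics import mean

IGNORED_LABEL = "\u2014"  # the dash label used for columns that must be ignored


def summarize(series):
    """Map each series label to its mean run time, or None if missing.

    `series` is a hypothetical layout: {label: [time_run1, time_run2, ...]}.
    A negative value is treated as a marker for missing data (timeout or crash).
    """
    result = {}
    for label, runs in series.items():
        if label == IGNORED_LABEL:
            continue  # skip columns that carry no data
        if not runs or any(t < 0 for t in runs):
            result[label] = None  # missing: do not average sentinel values
        else:
            result[label] = mean(runs)
    return result


print(summarize({
    "XML||RDF": [1.0, 2.0, 3.0],
    "XML->RDF": [-1, -1, -1],
    IGNORED_LABEL: [0, 0, 0],
}))
```

Mapping missing series to `None` rather than averaging the negative markers avoids plotting a meaningless mean.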
Strategies side-by-side
Datasets | D11/3, D11 and D13 | D101/3, D101 and D103 | D13, D101 and D1001/3 |
---|---|---|---|
1000 nodes per XML file | linear scale – log scale (raw data) | linear scale – log scale (raw data) | linear scale – log scale (raw data) |
# XML files = XMark size factor | linear scale – log scale (raw data) | linear scale – log scale (raw data) | (D1001/3 generation failed) |
Scalability
XML layout | Workload 1 | Workload 2 | Workload 3 | Workload 4 |
---|---|---|---|---|
1000 nodes per XML file | open | open | open | open |
# XML files = XMark size factor | open | open | open | open |
All XML data in a single file | open | open | open | open |
Recent results
We recently experimented with a new set of strategies involving materialization.
- XML||RDF: Independent executions (detail)
- XML->RDF-Data: Bind XML, then push to RDF data (detail)
- RDF->XML-URI-Data: Bind RDF, then push to XML data (detail)
- RDF->XML-URI-Data-Pr: Bind RDF, prune, then push to XML data (detail)
Strategies side-by-side
Datasets | D11/3, D11 and D13 | D101/3, D101 and D103 | D13, D101 and D1001/3 |
---|---|---|---|
1000 nodes per XML file | linear scale – log scale (raw data) | linear scale – log scale (raw data) | linear scale – log scale (raw data) |
Scalability
XML layout | Workload 1 | Workload 2 | Workload 3 | Workload 4 |
---|---|---|---|---|
1000 nodes per XML file | open | open | open | open |