Note: You need an SVG-enabled browser to view the following pages.

Preliminaries

Systems

Experiments were run on systems with characteristics listed below:

OS	Hardware	Java	Sub-systems
Type: Linux	CPU: 8 x Intel Xeon @ 2.93 GHz	Version: 1.6.0_24	RDF: RDF-3X 0.3.7
Version: 2.6.33.7-desktop-2mnb	Mem: 16GB	Memory (startup/max): 252MB/12GB	XML: BaseX 7.3

Datasets

Experiments are based on synthetic datasets. Datasets are named Dⁱ_j, where i is the input factor used to generate the XMark data and j is the number of triples, expressed as a ratio of the number of XML nodes. For instance, D¹⁰_1/3 contains 16M XML nodes and roughly 5M triples. The corresponding data sizes are listed in the tables below:

XMark input factor (i)	1	10	100
Number of nodes	1.6M	16M	167M
Data size	111MB	1GB	11GB

RDF size (number of triples)	D¹_j	D¹⁰_j	D¹⁰⁰_j
Dⁱ_1/3	0.5M	5M	50M
Dⁱ₁	1.5M	15M	–
Dⁱ₃	5M	50M	–
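The relation between i, j and the sizes above is simple arithmetic: the triple count of Dⁱ_j is about j times the number of XML nodes for factor i. As a minimal Python sketch (the names `XML_NODES`, `GENERATED` and `estimated_triples` are illustrative, not part of the experiment code), the estimates can be recomputed from the node counts. The tables give rounded, approximate values; for instance 16M × 3 = 48M is listed as 50M.

```python
from fractions import Fraction
from typing import Optional

# XML node counts per XMark input factor i (from the table above).
XML_NODES = {1: 1.6e6, 10: 16e6, 100: 167e6}

# Triple ratios j that exist for each factor i (the others are "–" in the table).
GENERATED = {
    1: {Fraction(1, 3), Fraction(1), Fraction(3)},
    10: {Fraction(1, 3), Fraction(1), Fraction(3)},
    100: {Fraction(1, 3)},
}

def estimated_triples(i: int, j: Fraction) -> Optional[float]:
    """Approximate number of triples in D^i_j, or None if that dataset was not generated."""
    if j not in GENERATED[i]:
        return None
    return float(j) * XML_NODES[i]

if __name__ == "__main__":
    for j in (Fraction(1, 3), Fraction(1), Fraction(3)):
        cells = []
        for i in (1, 10, 100):
            n = estimated_triples(i, j)
            cells.append("–" if n is None else f"{n / 1e6:.1f}M")
        print(f"D^i_{j}", *cells, sep="\t")
```

Running it prints one row per ratio j (for example `D^i_1/3`, then 0.5M, 5.3M and 55.7M), which can be compared with the rounded figures in the table.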

Workloads

	XML selectivity	RDF selectivity
Workload 1 (W₁)	LOW	LOW
Workload 2 (W₂)	LOW	HIGH
Workload 3 (W₃)	HIGH	LOW
Workload 4 (W₄)	HIGH	HIGH

Charts

XML||RDF: Independent executions (detail)
XML->RDF: Bind XML, then RDF (detail)
RDF->XML-URI: Bind RDF, then push URIs (detail)
RDF->XML-URI-Pr: Bind RDF, prune, then push URIs (detail)
RDF->XML-XPath: Bind RDF, then push XPaths (detail)
RDF->XML-XPath-Pr: Bind RDF, prune, then push XPaths (detail)

The remaining hatch-patterned bars are the tuple-at-a-time counterparts of the RDF->XML strategies above, with and without pruning.

Colored bars indicate execution time averaged over 3 runs.
Bars with negative values indicate missing data (most likely because the timeout for the whole set of strategies expired; in a few cases, because of a crash).
Columns labeled ‘—’ must be ignored.
Query cardinalities are given in parentheses under each series. ‘-1’ means that none of the strategies completed within the allotted 5 minutes.
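The charted strategies all compute the same join between XML and RDF results on a shared URI; they differ in which side is evaluated first and how the other side is restricted. The toy Python sketch below illustrates this under stated assumptions: both stores are in-memory stand-ins (the class `XmlSide`, the predicate `ex:soldBy` and all function names are hypothetical, not BaseX or RDF-3X APIs), the XPath-pushing and pruning variants are not modeled, and "tuple-at-a-time" is assumed to mean one XML request per RDF binding.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object) on the RDF side
Binding = Tuple[str, str]       # (URI, RDF object) produced by the RDF query
Row = Tuple[str, str, str]      # (URI, XML value, RDF object) in the join result

class XmlSide:
    """Hypothetical XML source: maps a URI to the value an XPath query would extract."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs
        self.calls = 0  # number of requests received

    def query(self, uris: Optional[Set[str]] = None) -> Dict[str, str]:
        self.calls += 1
        return {u: v for u, v in self.docs.items() if uris is None or u in uris}

def rdf_query(triples: List[Triple], subjects: Optional[Set[str]] = None) -> List[Binding]:
    """(subject, object) pairs for ex:soldBy, optionally restricted to `subjects`."""
    return [(s, o) for s, p, o in triples
            if p == "ex:soldBy" and (subjects is None or s in subjects)]

def independent(xml: XmlSide, triples: List[Triple]) -> Set[Row]:
    """XML||RDF: evaluate both sides in full, then join on the URI."""
    x, r = xml.query(), rdf_query(triples)
    return {(s, x[s], o) for s, o in r if s in x}

def xml_then_rdf(xml: XmlSide, triples: List[Triple]) -> Set[Row]:
    """XML->RDF: bind XML first, then restrict the RDF query to those URIs."""
    x = xml.query()
    return {(s, x[s], o) for s, o in rdf_query(triples, set(x))}

def rdf_then_xml_uri(xml: XmlSide, triples: List[Triple]) -> Set[Row]:
    """RDF->XML-URI: bind RDF first, then push all bound URIs in one request."""
    r = rdf_query(triples)
    x = xml.query({s for s, _ in r})
    return {(s, x[s], o) for s, o in r if s in x}

def rdf_then_xml_uri_tuple(xml: XmlSide, triples: List[Triple]) -> Set[Row]:
    """Tuple-at-a-time variant (assumed meaning): one XML request per RDF binding."""
    out: Set[Row] = set()
    for s, o in rdf_query(triples):
        x = xml.query({s})
        if s in x:
            out.add((s, x[s], o))
    return out

if __name__ == "__main__":
    docs = {"ex:item1": "Bicycle", "ex:item2": "Kettle", "ex:item3": "Lamp"}
    triples = [("ex:item1", "ex:soldBy", "ex:alice"),
               ("ex:item3", "ex:soldBy", "ex:bob"),
               ("ex:item4", "ex:soldBy", "ex:carol")]
    for strategy in (independent, xml_then_rdf, rdf_then_xml_uri, rdf_then_xml_uri_tuple):
        xml = XmlSide(docs)
        rows = sorted(strategy(xml, triples))
        print(strategy.__name__, rows, "XML requests:", xml.calls)
```

Every strategy returns the same two rows (item1 and item3); only the tuple-at-a-time variant sends more than one XML request (three here, one per RDF binding), which is the kind of overhead the hatch-patterned bars compare against.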

Strategies side-by-side

Datasets	D¹_1/3, D¹₁ and D¹₃	D¹⁰_1/3, D¹⁰₁ and D¹⁰₃	D¹₃, D¹⁰₁ and D¹⁰⁰_1/3
1000 nodes per XML file	linear scale – log scale (raw data)	linear scale – log scale (raw data)	linear scale – log scale (raw data)
# XML files = XMark size factor	linear scale – log scale (raw data)	linear scale – log scale (raw data)	(D¹⁰⁰_1/3 generation failed)

Scalability

	Workload 1	Workload 2	Workload 3	Workload 4
1000 nodes per XML file	open	open	open	open
# XML files = XMark size factor	open	open	open	open
All XML data in a single file	open	open	open	open

Recent results

We recently experimented with a new set of strategies involving materialization.

XML||RDF: Independent executions (detail)
XML->RDF-Data: Bind XML, then push to RDF data (detail)
RDF->XML-URI-Data: Bind RDF, then push to XML data (detail)
RDF->XML-URI-Data-Pr: Bind RDF, prune, then push to XML data (detail)

Strategies side-by-side

Datasets	D¹_1/3, D¹₁ and D¹₃	D¹⁰_1/3, D¹⁰₁ and D¹⁰₃	D¹₃, D¹⁰₁ and D¹⁰⁰_1/3
1000 nodes per XML file	linear scale – log scale (raw data)	linear scale – log scale (raw data)	linear scale – log scale (raw data)

Scalability

	Workload 1	Workload 2	Workload 3	Workload 4
1000 nodes per XML file	open	open	open	open
