Use Case: XDS Reproducibility
Introduction
XDS is one of the most widely used packages for processing single-crystal X-ray diffraction data, and it underpins a large fraction of the macromolecular structures deposited in the Protein Data Bank. Because the integrated intensities it produces feed directly into downstream scaling, phasing and refinement, the behaviour of XDS has a direct bearing on the scientific conclusions drawn from a diffraction experiment.
To track how the behaviour of XDS changes from one version to the next, and to demonstrate a validation framework that can be applied to any scientific software, we established the XDS reproducibility project. The project builds on the inherently reproducible job framework of DECTRIS CLOUD, in which software is distributed as versioned containers, so that rerunning a job returns the same result every time.
The Framework
The XDS reproducibility framework on DECTRIS CLOUD is built from the following elements:
- A reference database of diffraction datasets. Each entry is a complete rotation dataset accompanied by metadata describing the sample, beamline, detector and experimental conditions. The database is curated so that the datasets span a representative range of crystal systems, resolution limits and data-quality regimes.
- A processing pipeline composed of single-responsibility scripts:
  - a processing script that runs one version of XDS against one dataset and collects its quality metrics;
  - an orchestration script that invokes the processing script for every version of XDS against every dataset in the database, producing a complete results matrix;
  - a visualization script that assembles the collected metrics into an interactive dashboard, allowing the output of the different XDS runs to be compared directly.
Because every version of XDS is held in its own versioned container, each cell of the resulting version-by-dataset matrix is fully reproducible.
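The orchestration step described above can be illustrated with a minimal Python sketch. It is not the project's actual code: the `Dataset` fields, the `build_results_matrix` function, the `fake_process` stand-in and the metric name are all hypothetical, and the real processing script would launch XDS inside its versioned container rather than return a made-up number.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Dataset:
    """One reference-database entry (field names are illustrative assumptions)."""
    name: str
    beamline: str
    detector: str


# A processing function maps (xds_version, dataset) to a dict of quality metrics.
ProcessFn = Callable[[str, Dataset], Dict[str, float]]


def build_results_matrix(
    versions: List[str],
    datasets: List[Dataset],
    process: ProcessFn,
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Call `process` once per (version, dataset) pair and collect the metrics."""
    matrix: Dict[Tuple[str, str], Dict[str, float]] = {}
    for version, dataset in product(versions, datasets):
        matrix[(version, dataset.name)] = process(version, dataset)
    return matrix


def fake_process(version: str, dataset: Dataset) -> Dict[str, float]:
    """Stand-in for the real processing script; the value is a placeholder."""
    return {"placeholder_metric": float(len(version) + len(dataset.name))}


if __name__ == "__main__":
    demo_datasets = [
        Dataset("ds_a", "beamline_x", "detector_y"),
        Dataset("ds_b", "beamline_x", "detector_z"),
    ]
    demo_matrix = build_results_matrix(["build_1", "build_2"], demo_datasets, fake_process)
    for cell, metrics in sorted(demo_matrix.items()):
        print(cell, metrics)
```

Passing the processing step in as a callable keeps the orchestration logic independent of how a single run is executed, which mirrors the single-responsibility split between the scripts.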
Dashboard Results
A recent output from the visualization script can be viewed in the following job output:
DECTRIS CLOUD Public Share Link
The job page hosts an HTML dashboard presenting the comparative results across the different XDS versions. The full dashboard can be opened using the expand button:
Fig 2: The dashboard as it is initially seen in the job details page.
With the dashboard open, you can browse the results of running each version of XDS, identified by its build number, across every dataset in the reference database. The tabs at the top switch between two views. The single-dataset view focuses on one dataset at a time and overlays the quality-metric curves obtained from the different XDS versions, so that version-to-version differences are easy to read off:
Fig 3: The "SINGLE DATASET" view of the dashboard
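The same version-to-version comparison can also be made numerically. The sketch below is an assumption about how such a comparison might be done, not a description of the dashboard's internals: it takes one metric curve per XDS version (for example, one value per resolution shell) and subtracts a chosen baseline version point by point. The function name `curve_deltas` and the input layout are hypothetical.

```python
from typing import Dict, List


def curve_deltas(curves: Dict[str, List[float]], baseline: str) -> Dict[str, List[float]]:
    """Return each version's curve minus the baseline curve, point by point.

    `curves` maps a version label to a list of metric values sampled at the
    same points for every version (an assumption of this sketch).
    """
    reference = curves[baseline]
    deltas: Dict[str, List[float]] = {}
    for version, values in curves.items():
        if version == baseline:
            continue
        if len(values) != len(reference):
            raise ValueError(f"curve for {version!r} has a different length than the baseline")
        deltas[version] = [v - r for v, r in zip(values, reference)]
    return deltas
```

A delta list that is zero everywhere indicates that the two versions produced identical values for that metric on the dataset in question.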
The all-datasets view aggregates the statistics from every dataset and every version of XDS at once, allowing the distribution of results to be compared across versions:
Fig 4: The "ALL DATASETS" view of the dashboard, which compares the distributions of results obtained from all datasets across the different XDS versions.
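As a rough illustration of this kind of aggregation, the following sketch groups one metric by XDS version and reports a few summary statistics per version. The function name `summarize_by_version`, the input layout and the choice of statistics are assumptions made for the example; the dashboard may aggregate its data differently.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple


def summarize_by_version(
    values: Dict[Tuple[str, str], float],
) -> Dict[str, Dict[str, float]]:
    """Summarize one metric per version across all datasets.

    `values` maps (version, dataset_name) to a single metric value.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for (version, _dataset_name), value in values.items():
        grouped[version].append(value)
    return {
        version: {
            "n": len(vals),
            "min": min(vals),
            "median": median(vals),
            "max": max(vals),
        }
        for version, vals in grouped.items()
    }
```

Comparing the per-version summaries side by side gives a compact numerical counterpart to the distribution plots.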
The dashboard is updated regularly with new datasets, additional XDS versions and further metadata tagging of the datasets. You can return to this page to find a job link with the latest results.
Questions and Contributions
Do you have questions or comments about the XDS reproducibility project, or would you like to contribute a dataset of your own to the reference database? Write to us at support@dectris.cloud and we will be glad to help with your questions or with adding your data.