DIALS: Creating and running a job for peak integration

In this tutorial, we will show how to create and run a job for peak integration using the public DIALS environment.

Creating a job template

Step 1: Initial details

Navigate to Analysis → Jobs → Create Job Template, and fill out the job template name and description:

To work with the job script below, make sure to choose the DIALS environment.

Step 2: Inserting the job script

On the next page, you can specify the job script:

For this tutorial, an example job script for running DIALS is given below, which can be copy-pasted into the box:

#!/bin/bash

# Resolve and source DIALS environment
DIALS_ENV_SCRIPT=$(find /usr/local/dials -type f -name "dials_env.sh" | head -n 1)

if [[ -f "$DIALS_ENV_SCRIPT" ]]; then
    source "$DIALS_ENV_SCRIPT"
else
    echo "ERROR: Could not locate dials_env.sh in /usr/local/dials"
    exit 1
fi

set -euo pipefail
echo "JOB_OUTPUT_DIR: ${JOB_OUTPUT_DIR}"
echo "JOB_WORK_DIR:   ${JOB_WORK_DIR}"
echo "JOB_TEMPLATE_DIR: ${JOB_TEMPLATE_DIR}"
echo "DC_PATH:        ${DC_PATH}"

cd "$JOB_WORK_DIR"

IMPORT_ARGS=""
if [[ "${DC_SPACE_GROUP_NUMBER:-"-1"}" != "-1" ]]; then
    IMPORT_ARGS+=" space_group=${DC_SPACE_GROUP_NUMBER}"
fi
dials.import "${DC_PATH}" ${IMPORT_ARGS} output.experiments=imported.expt
find . -maxdepth 1 -type f -exec cp {} "$JOB_OUTPUT_DIR" \;


SPOT_FINDING_ARGS=""
if [[ "${DC_SPOT_THRESHOLD:-"-1"}" != "-1" ]]; then
    SPOT_FINDING_ARGS+=" threshold.algorithm=overload threshold.overload=${DC_SPOT_THRESHOLD}"
fi
dials.find_spots imported.expt ${SPOT_FINDING_ARGS} output.reflections=spots.refl
find . -maxdepth 1 -type f -exec cp {} "$JOB_OUTPUT_DIR" \;

INDEX_ARGS=""
if [[ "${DC_INDEX_METHOD:-"-1"}" != "-1" ]]; then
    INDEX_ARGS+=" method=${DC_INDEX_METHOD}"
fi
if [[ "${DC_CELLSIZE_MAXIMUM:-"-1"}" != "-1" ]]; then
    INDEX_ARGS+=" max_cell=${DC_CELLSIZE_MAXIMUM}"
fi
dials.index imported.expt spots.refl ${INDEX_ARGS} output.experiments=indexed.expt output.reflections=indexed.refl
find . -maxdepth 1 -type f -exec cp {} "$JOB_OUTPUT_DIR" \;

REFINE_ARGS=""
if [[ "${DC_REFINE_MODE:-"-1"}" != "-1" ]]; then
    REFINE_ARGS+=" refinement.reflections.outlier.algorithm=${DC_REFINE_MODE}"
fi
dials.refine indexed.expt indexed.refl ${REFINE_ARGS} output.experiments=refined.expt output.reflections=refined.refl
find . -maxdepth 1 -type f -exec cp {} "$JOB_OUTPUT_DIR" \;

INTEGRATE_ARGS=""
if [[ "${DC_INTEGRATION_PADDING:-"-1"}" != "-1" ]]; then
    INTEGRATE_ARGS+=" integration.padding=${DC_INTEGRATION_PADDING}"
fi
dials.integrate refined.expt refined.refl ${INTEGRATE_ARGS} output.experiments=integrated.expt output.reflections=integrated.refl
find . -maxdepth 1 -type f -exec cp {} "$JOB_OUTPUT_DIR" \;

dials.symmetry integrated.expt integrated.refl output.experiments=symmetrized.expt output.reflections=symmetrized.refl
find . -maxdepth 1 -type f -exec cp {} "$JOB_OUTPUT_DIR" \;

dials.scale symmetrized.expt symmetrized.refl output.experiments=scaled.expt output.reflections=scaled.refl
find . -maxdepth 1 -type f -exec cp {} "$JOB_OUTPUT_DIR" \;

dials.export scaled.expt scaled.refl mtz.hklout=final.mtz
find . -maxdepth 1 -type f -exec cp {} "$JOB_OUTPUT_DIR" \;

The job script carries out the following tasks:

Sources the DIALS environment, so that DIALS commands can be readily used.
Calls a series of DIALS command. Before each command, it checks for input variables with the "DC_" prefix and uses them as input arguments, if they do not have the value -1.
After each command, the contents of the JOB_WORK_DIR is copied to the output folder. In this way, you will still have some output if the analysis fails, and you can also see the output appearing gradually as the job is running.

Step 3: Defining the input variables

To the right on the same page, you can define configurable input variables. Most importantly is the DC_PATH variable of type PATH, which defines the path to the data folder.

In addition, other DC_ input variables, referred to in the script will need to be defined here:

Step 4: Reviewing and saving the job template

On the next page, you can review your template and save it by clicking START BUILD:

Step 5: Viewing the job template in the table

Once saved, the Template can be viewed in a table at Analysis → Jobs → Job Templates. From here, a job can be started via the RUN JOB button:

Running the job

Step 1: Defining the experimental data path

To start the DIALS job, after clicking the RUN JOB button in the job template table, use the file browser wizard to choose the folder with the data to analyze:

The PATH variable at the top (orange box), should contain the full path to the folder with the files you want to analyze.

Step 2: Configuring the input parameters and launching the job

Choose the machine type and local software disk appropriate for your job. For the DIALS job, it is essential that you make sure that you have enough RAM for analyzing your data. Additionally, the local software disk should be large enough to store any temporary or results files generated during the process:

Finally, the input variables can be given values different from -1, if you want to include them as input arguments for the DIALS commands.

Once you have configured the job for your liking, launch it by pressing RUN JOB.

Viewing the output

Step 1: Viewing the output in the table

Once launched, the job and its output can be viewed in the jobs table: Analysis → Jobs → Jobs. Clicking the eye symvol will bring up details such as the job ID, current duration of the job, as well as the job log:

Pressing the + symbol gives an overview of the generated output files. As the job is running and goes through the different DIALS commands, more and more output files will gradually by added to the overview.

Step 2: Viewing the output in the processed folder

In addition, in the "processed" folder of the experiment, a subfolder can be found with all generated output files as well: