August 21st 2025: Jobs and Job Templates
Recording of the first part of the meeting
Environments and help pages referenced
Job template for running a simple python script
Python Basics Ubuntu Environment (needs app login)
XDS environment (needs app login, only visible to academic users)
Simple python example (needs app login)
PRETTY XDS job template (needs app login)
Machine-generated transcript
Hi everyone, and welcome to this version of the power user meeting, where we're going to focus on jobs and job templates. This first part of the meeting is recorded, so anyone can ask questions, but if you don't want to be part of the recording, just wait until the end. For the Q&A part, we will turn the recording off.
I will first briefly go through how one can create a job template. I'll give a demo, also on the app, to show how it's done in practice. Then we'll go through running a job, and finally talk about some recent changes that have been made to jobs, as well as upcoming changes that we are planning.
The basic concept of a job is probably familiar to most people. If you're used to working on a cluster and submitting Slurm scripts, then it's a very similar concept.
On our platform, this kind of automatic analysis workflow is built around our software environments. A software environment is a container with the software you want to use for a specific analysis task. You then write a bash script that describes the specific steps for that task.
Finally, you can configure a set of input parameters that can be changed every time you run the job. This means you can adapt the job to different datasets. Once you have configured this as a package, we call it a template. You can then run the job.
When you do, you need to choose an experiment path. This path is mounted in the cloud on the machine that runs the job. You configure the input parameters and then run it. You’ll be able to view the job in a table on the platform.
An advantage of jobs compared to interactive sessions is that you don't have any idle time—you’re only consuming resources for your specific analysis task. As soon as the job is done, the machine spins down again.
Before going into more detail, I’ll show what this looks like in practice. Starting from the homepage, you can go to the Analysis tab, then down to the Job section and My Templates. From there, you can create a new template.
You’ll first have to fill out some fairly simple details, like giving the job a name and adding a short description. You also select the sponsor for the job and the license you assign to it. Different licenses come with different numbers of slots, that is, how many job templates you can create. Finally, you choose the environment: the software container with the specific software you need.
For this simple demo, I’ll create a fairly basic job. I choose an environment called Python Basics, but it needs to be the Ubuntu version, because we don’t currently have the job functionality working for Python environments. This environment is fairly simple: on top of the Ubuntu desktop system it also has Miniconda installed, along with a number of Python packages. We use this as the foundation for our job.
Next, we can specify the script. There is an example bash script already filled out to give you an idea of how things work. We’ll overwrite most of it but leave the beginning part, which already writes out a couple of parameters: the work directory is where the job executes, the template directory is where uploaded files land, and the output directory is where output files are stored.
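The "beginning part" of the pre-filled script might look roughly like the sketch below. The variable names (`WORK_DIR`, `TEMPLATE_DIR`, `OUTPUT_DIR`) are placeholders, not the platform's actual names; check the example script in your own template for the real ones.

```shell
#!/bin/bash
# Hypothetical variable names -- the platform's pre-filled example script
# defines the real ones; these defaults just make the sketch self-contained.
WORK_DIR="${WORK_DIR:-$PWD}"                  # where the job executes
TEMPLATE_DIR="${TEMPLATE_DIR:-$PWD/template}" # where uploaded files land
OUTPUT_DIR="${OUTPUT_DIR:-$PWD/output}"       # files to keep after the job

echo "work directory:     $WORK_DIR"
echo "template directory: $TEMPLATE_DIR"
echo "output directory:   $OUTPUT_DIR"
```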
Since I want to run a Python script, I’ll first upload it. I’ve uploaded two files that I now want to use for my job example. These go to the job template directory. As a first step, I’ll copy them into my work directory. One file is an image, and the other is a script.
When writing bash scripts, you need to know something about the container you’re using. In this case, with the Python Basics environment, I know there is a Conda environment I can call to execute the script. I use a conda run command with the name of the environment, and then call the script. This script takes an input parameter, which is the image I copied into my job template. I provide that here.
The Python script generates an image and writes it to the work directory. When the job finishes, the machine shuts down and we lose access to the work directory. That means we must copy all the output files we want to keep into the output directory. As the last step, I copy the generated plot—called cloud plot—into the output directory.
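Putting the steps above together, a minimal version of the demo job script could look like the following sketch. The variable names, the file names (`make_plot.py`, `input.png`, `cloud_plot.png`), and the conda environment name are illustrative assumptions, not the platform's actual values; outside the platform, the sketch falls back to temporary directories, a stand-in Python script, and plain `python3` so it stays runnable.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical directory variables; on the platform these are provided
# by the pre-filled example script.
WORK_DIR="${WORK_DIR:-$(mktemp -d)}"
TEMPLATE_DIR="${TEMPLATE_DIR:-$WORK_DIR/template}"
OUTPUT_DIR="${OUTPUT_DIR:-$WORK_DIR/output}"
mkdir -p "$TEMPLATE_DIR" "$OUTPUT_DIR"

# Stand-ins for the two uploaded files (on the platform they are already
# sitting in the template directory).
[ -f "$TEMPLATE_DIR/make_plot.py" ] || cat > "$TEMPLATE_DIR/make_plot.py" <<'PY'
import sys
# Placeholder for the real plotting script: writes a marker file
# named like the plot the job is expected to produce.
with open("cloud_plot.png", "wb") as f:
    f.write(b"placeholder plot for " + sys.argv[1].encode())
PY
[ -f "$TEMPLATE_DIR/input.png" ] || : > "$TEMPLATE_DIR/input.png"

# Step 1: copy the uploaded files into the work directory.
cd "$WORK_DIR"
cp "$TEMPLATE_DIR/make_plot.py" "$TEMPLATE_DIR/input.png" .

# Step 2: run the script. On the platform you would run it inside the
# container's conda environment, e.g.:
#   conda run -n <env-name> python make_plot.py input.png
python3 make_plot.py input.png

# Step 3: only files copied to the output directory survive the job,
# so keep the generated plot.
cp cloud_plot.png "$OUTPUT_DIR/"
```

The key point is the last step: anything left only in the work directory is lost when the machine shuts down.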
This is probably the simplest example of executing a job. Here, I also have input variables defined, and I could add more if needed. We’ll see that later in more complex examples. For now, we go to the next step, review everything, and then build the template. It now appears in My Templates. You can search for it by name—it is case-sensitive. The template is labeled unvalidated because it hasn’t been run yet.
To run it, I choose Run Job. The first thing is to choose which experiment to mount. This is always required, even if no path variable is defined in the script. For example, even if you’re just running a simulation, you must still choose an experiment so the job knows where to put its output. I then choose a project, select the machine type, set the software disk size (depending on expected output), select the sponsor, and choose the template version. If I’ve built several versions, I can still run an older one.
I can also configure input variables here, if I want. Once everything is set, I click Run Job. The job now appears in my job list with a pending status. This means we are waiting for the requested machine type and the environment to spin up. The larger the environment, the longer it takes.
I also have a completed version of the same job. When it finishes, we see the generated plot, which you can open in a larger view. Using the Go to File Browser button, you can navigate to the project or experiment folder where the files are stored, and collaborators also have access.
This was the simplest job example. To recap:
- The work directory is the default execution directory.
- The template directory stores uploaded files, which you can copy into the work directory or reference directly.
- The output directory is where you save files you want to access later, and these also appear in the job table.
When defining a path to data, you can do it in two ways:
- As an input variable of type path, which lets you navigate with the file browser.
- As a string variable, where you type the path directly (though this is more hardcoded).
Even if you use a string variable, you always need to select at least the experiment in the file browser.
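To make the two variable types concrete: inside the job script, both arrive the same way as values the script can read; the difference is only how they are filled in when the job is launched. The variable names and default values below are hypothetical.

```shell
#!/bin/bash
# Hypothetical input-variable names; on the platform, input variables are
# set in the Run Job dialog before the script starts.
DATA_PATH="${DATA_PATH:-/experiment/raw/scan_001}"  # type "path": picked via the file browser
SAMPLE_NAME="${SAMPLE_NAME:-sample_01}"             # type "string": typed in directly

echo "Processing $SAMPLE_NAME from $DATA_PATH"
```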
Now, for a second demo: this shows a more relevant scientific example. We go back to the app, the Analysis tab, and look at public job templates. I’ll use prettyXDS, which is available to everyone. I click Run Job. Even though I didn’t configure it myself, I can still run it.
This job has two input variables, so I configure two paths: I start by choosing my experiment, then navigate to the auxiliary folder and select the input file. I do the same for the calculation input path. I double-check the paths, then choose the machine type. This time I select base, leave the software disk as is, and leave the default input variables. There’s also a support file to handle output. I click Run Job. Again, it appears as pending, but I also have a completed version to show the results.
Here we see several analysis plots based on the XDS analysis done on the input file, as well as summaries extracted from XDS. In the file browser, we see all files from the job table and a subdirectory with the full XDS output.
Going back to the presentation, we also have other prepared job templates for different workflows. For example, one for cryo, another for cryo-EM (with an improved file viewer for PDB structures), and one for 4D-STEM showing reconstruction outputs.
To summarize recent changes:
- We added the PDB viewer, which is helpful for structural analysis.
- We enabled internet access, so you can download a PDB as a starting point.
- Coming soon: an HTML file viewer, useful for pipelines that generate HTML summaries.
This concludes the presentation. We’re very open to feedback—ideas, questions, improvements. For example, other file types you’d like supported in the viewers, or things that could be made easier. Finally, a reminder: we have a special birthday meeting next week. You can submit a slide about what you’ve been working on, and I think everyone will be interested to hear about it.
You can also talk to us on Discord; there’s a QR code here you can scan to join. Then I think we’ll move on to the non-recorded part of the discussion.