October 30, 2025: Guest Speaker Philip Ploner on Particle Physics Simulation Workflows

Recording of the first part of the meeting

Machine-generated transcript

Introduction (Camilla Larsen)

Hello everyone, and welcome to this edition of the DECTRIS CLOUD Power User Meeting.

We hold this meeting every two weeks to provide short updates on the platform and to answer any questions our users may have. Today’s session is special because we have a guest speaker, Philip Ploner, who you will hear from shortly.

The plan for today is as follows:
Since some participants may not have used the platform before, I will begin with a brief introduction to the DECTRIS CLOUD concept and its purpose. Philip will then take over to demonstrate how the platform works in practice, using particle physics simulation workflows as an example. We will finish with time for questions.

To understand why we developed DECTRIS CLOUD, it helps to look at how science itself has evolved. The average number of co-authors per publication has increased dramatically over the past few decades. We have moved from an era of individual researchers to team-based and now fully collaborative science across distributed, interdisciplinary teams. This shift has fundamentally changed how researchers work and what they need from their tools and infrastructure.

At the same time, the way science is conducted is also changing. We see rapid growth in multimodal and multi-omics research, where data from multiple sources and techniques is combined at an accelerating rate. Compute requirements are increasing just as rapidly, with demand roughly doubling every six months. What once ran on a workstation now often requires large-scale distributed computing. In parallel, open-source software is expanding rapidly, with millions of new repositories created each year as scientists share tools and build on each other’s work.

Together, this represents a convergence of data, compute, and collaboration. That convergence is exactly why we built DECTRIS CLOUD: to provide a platform that supports how modern science actually works.

In practice, DECTRIS CLOUD is a web application accessible at app.dectris.cloud. Through the platform, users can upload data and share it with collaborators, define and share automated workflows for reproducible science, and access scalable storage and compute resources. Signing up is free, and a free account includes 20 CPU hours and 1 GPU hour per month.

To upload data, users sign in to the platform, navigate to the Data tab, and create a new Project. A project serves as a dedicated space for storing data. Once the project is created and the required metadata is filled in, users can manage access by adding collaborators, who immediately gain access to the project’s data.

Inside a project, users can organize their data into subfolders and upload files via drag and drop.

Once data is available in the cloud, the next step is analysis. The foundation of this process in DECTRIS CLOUD is what we call Environments. An environment is an isolated software container with a defined set of tools and dependencies. Environments can be created via scripts or by saving snapshots of virtual machines.

With an environment, users can start interactive Sessions, such as virtual machines or Jupyter notebooks, or run Jobs, which are scripted executions inside the container. This approach enables fully reproducible workflows. Users can independently choose their compute requirements, such as the number of CPUs, and specify which data to process. Both environments and data can be shared with collaborators.

This concludes the conceptual overview. I will now hand over to Philip, who will demonstrate how this works in practice within particle physics.

Particle Physics Simulation Workflows (Philip Ploner)

Thanks a lot. Hi everyone, my name is Philip. I am a student at ETH Zurich in the particle physics group. I will quickly give an overview of what we are currently working on, and then I will show you how we can use DECTRIS CLOUD to streamline this workflow and make it easily usable and shareable.

Here on this slide, you can see the Large Hadron Collider at CERN. At the LHC, protons collide at a very high rate. In the CMS detector, we have around one billion proton–proton collisions happening per second. As you can imagine, the detectors collect data at an extremely high rate.

We read out around one petabyte of data per second from these detectors. To cope with this, we use a triggering algorithm that decides which data to keep, because we cannot store all of it. For long-term storage, we keep only around 0.0025% of the data.
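
To put those figures in perspective, a quick back-of-the-envelope calculation with the numbers quoted above (roughly one petabyte per second read out, about 0.0025% kept) gives the surviving data rate:

    # Rough arithmetic using the approximate figures quoted above.
    readout_rate_bytes_per_s = 1e15   # ~1 PB/s read out from the detector
    kept_fraction = 0.0025 / 100      # ~0.0025% kept for long-term storage

    stored_rate = readout_rate_bytes_per_s * kept_fraction
    print(f"Kept for long-term storage: ~{stored_rate / 1e9:.0f} GB/s")  # ~25 GB/s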

This triggering system has to decide very quickly which data to keep and which to reject. At some stages, there are already machine learning frameworks involved in this decision, but overall the selection is still relatively coarse. For example, we sometimes select based on physics features such as momentum.
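
As a purely illustrative sketch of what such a coarse, cut-based selection looks like (a toy example with invented numbers, not the actual trigger code):

    import numpy as np

    # Toy trigger decision: keep an event if its leading transverse momentum
    # (pT, in GeV) exceeds a fixed threshold. Spectrum and threshold are made up.
    rng = np.random.default_rng(seed=0)
    leading_pt = rng.exponential(scale=20.0, size=1_000_000)  # fake pT spectrum

    PT_THRESHOLD = 100.0  # GeV, hypothetical value
    keep = leading_pt > PT_THRESHOLD

    print(f"Fraction of events kept: {keep.mean():.4%}")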

What we want to do is make these decisions more refined so that we can detect finer signatures that might be interesting for new physics. These are very complex pattern recognition tasks with high-dimensional input from the detectors. Machine learning has proven to be very successful for this type of task.

We have created a dataset of around one billion simulated proton–proton collision events that we can use to train machine learning models. On the right, you can see a sketch of the simulation workflow. It uses three software packages: MadGraph, Pythia, and Delphes. These tools simulate the entire process from the collision itself to the detector response at the CMS experiment.

These simulations are based on Monte Carlo methods and are stored as large datasets. This simulation workflow is quite complex, because the different software components have to work together and require specific, compatible environments. This makes it difficult to share the workflow with others.
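
To give a flavour of what scripting this chain can look like, here is a minimal sketch that drives MadGraph from a command file and lets it hand events on to Pythia and Delphes. The executable path, the example process, and the settings are assumptions for illustration; the exact commands depend on the versions installed in the environment and are not the group's production setup.

    import subprocess
    from pathlib import Path

    MG5_BIN = "/opt/MG5_aMC/bin/mg5_aMC"  # hypothetical install location

    # MadGraph command file: generate a hard-scatter process, then launch the
    # run with Pythia showering and Delphes detector simulation enabled.
    script_lines = [
        "generate p p > t t~",   # example process
        "output ttbar_demo",     # write the process directory
        "launch ttbar_demo",     # run event generation
        "shower=Pythia8",        # parton shower and hadronization
        "detector=Delphes",      # fast detector simulation
        "set nevents 10000",     # number of events to simulate
        "set iseed 42",          # random seed for reproducibility
    ]
    Path("ttbar_demo.mg5").write_text("\n".join(script_lines) + "\n")

    subprocess.run([MG5_BIN, "ttbar_demo.mg5"], check=True)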

I will now switch to the DECTRIS CLOUD interface and show how we can use it to run this simulation workflow. In the Analysis section, there is an Environments function where you can save an environment, for example using a Docker container. Once the environment is installed, it is easy to share it with other group members so everyone can access it and run workflows.

Once the environment is set up, you can create a job template. This is an interface between the DECTRIS CLOUD user interface and your code. In the job template, you can configure settings such as the number of events to simulate, the physics process, a random seed for reproducibility, and whether to generate plots. You can also choose the machine and the project path.
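
Behind a template like this usually sits an ordinary parameterized script. A minimal, hypothetical sketch is shown below; the argument names simply mirror the settings listed above and are illustrative, not a DECTRIS CLOUD interface.

    import argparse

    # Hypothetical job entry point that a template could call with its settings.
    parser = argparse.ArgumentParser(description="Run one simulation job")
    parser.add_argument("--n-events", type=int, default=10000)
    parser.add_argument("--process", default="p p > t t~")
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--make-plots", action="store_true")
    parser.add_argument("--output-dir", default="./output")
    args = parser.parse_args()

    print(f"Simulating {args.n_events} events of '{args.process}' "
          f"with seed {args.seed}; plots: {args.make_plots}")
    # ...call the MadGraph/Pythia/Delphes chain here, as sketched above...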

Once you run the job, it appears in the job session overview. You can see completed jobs and inspect output validation plots. You can also browse the output files, including plots and datasets in ROOT and Parquet formats, which can then be used to train machine learning models.
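
Both formats can be read back in Python with standard open-source tools, for example as follows (file names are placeholders):

    import pandas as pd   # read_parquet requires a Parquet engine such as pyarrow
    import uproot         # reads ROOT files without a full ROOT installation

    # Placeholder file names; the actual outputs depend on the job configuration.
    df = pd.read_parquet("output/events.parquet")  # tabular features for training
    print(df.head())

    with uproot.open("output/delphes_events.root") as f:
        print(f.keys())  # list the objects (e.g. trees) stored in the ROOT file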

That is mostly it from my side. The main point is that if you have complicated workflows that you want to streamline and share, you can do this in DECTRIS CLOUD using shared environments, compute resources, and reproducible workflows. I will now hand back to Camilla so we can move on to questions.
