Overall quality optimization for DQM stage in High Energy Physics experiments

Data Acquisition (DAQ) and Data Quality Monitoring (DQM) are key parts of the HEP data chain, where data are processed and analyzed to obtain accurate quality-monitoring indicators. These stages are complex, involving an intensive processing workflow and requiring a high degree of interoperability between software and hardware facilities. Data recorded by DAQ sensors and devices are sampled to perform live (and offline) DQM of the detector status during data collection, giving the system and the scientists the ability to identify problems with extremely low latency and minimizing the amount of data that would otherwise be unsuitable for physics analysis. The DQM stage performs a large set of operations (Fast Fourier Transform (FFT), clustering, classification algorithms, Region of Interest identification, particle tracking, etc.) whose cost in computing resources and time depends on the number of events in the experiment, the sampled data, the complexity of the tasks and the required quality. The objective of our work is to present a proposal aimed at a general optimization of the DQM stage that considers all these elements. Techniques based on computational intelligence, such as Evolutionary Algorithms (EA), can help improve performance and thus optimize task scheduling in DQM.


Introduction
The experimental research of multi-messenger and accelerator-driven particle physics, studying everything from the largest-scale structures in the observable universe to the most fundamental particles, as well as accelerator-based and non-accelerator-based High Energy Physics (HEP) facilities, must be supported by sustainable software (s/w) and hardware (h/w) that address tremendous technical challenges. A fundamental part of the HEP data chain concerns Data Acquisition (DAQ) and Data Quality Monitoring (DQM). DAQ and DQM are key parts of this software chain, where data are acquired, processed and analyzed in the early stages of HEP experiments. DQM aims to fill the gap between fast, high-bandwidth on-line monitoring with a limited and fixed CPU budget, and off-line processing, which has access to substantially more resources but is much less agile. Hence, DQM is an indispensable tool for identifying problems with the experiments and reducing data loss. In this sense, DQM provides an information interface for quality-monitoring management through measurements, parameters, or histograms.
The tasks performed in DQM involve a very large computational load (in both time consumption and computing resources), so they must be carefully selected to deliver results within adequate performance constraints. These tasks can be abstracted as a flow of operations applied to the data coming from the DAQ, with the objective of delivering processed outputs and results on the state of the experiment. In this article we present a proposal for the optimization of the general data processing in the DQM stage that improves the performance of workflow execution in aspects such as the use of computing resources and the processing time of operations. The proposal yields a decision-making tool for DQM through which it is possible to select how the processing will be carried out in the most effective way, considering all the objectives of the study.

Problem description
DQM is a key stage of information processing in the early phases of virtually all experiments in the HEP context. As shown in Figure 2, multiple algorithms are executed to compute results for scientists and engineers. The general idea is to execute a complete workflow consisting of smaller interconnected algorithms, grouped into tasks, that share data inputs and outputs (see Figure 1). The processing elements in DQM are as follows:
Algorithm - An algorithm performed in the DQM workflow, such as an FFT or the identification of a Region of Interest. Algorithms can be executed on different architectures and computing resources (CPUs, GPUs, FPGAs/ASICs [6], Cloud providers, ...).
Task - A set of algorithms that performs a function or operation, for example the validation of a procedure, the calculation of some parameters or the production of graphical DQM results.
Workflow - DQM performs a set of computationally intensive operations that generally form a complete workflow, containing different tasks (T) and algorithms (A) at each step.
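The three processing elements above can be sketched as a small data model. This is an illustrative assumption, not the paper's actual implementation: the class names, task names and resource labels are ours.

```python
from dataclasses import dataclass, field

# Hypothetical data model for the DQM processing elements; all names
# below (classes, tasks, resources) are illustrative assumptions.
@dataclass
class Algorithm:
    name: str                       # e.g. "FFT", "RegionOfInterest"
    resources: tuple = ("cpu",)     # architectures with an implementation

@dataclass
class Task:
    name: str
    algorithms: list                                 # algorithms grouped in the task
    depends_on: list = field(default_factory=list)   # names of producer tasks

# A DQM workflow is then a set of interdependent tasks:
workflow = [
    Task("preprocess", [Algorithm("FFT", ("cpu", "gpu"))]),
    Task("roi", [Algorithm("RegionOfInterest", ("cpu", "fpga"))],
         depends_on=["preprocess"]),
    Task("tracking", [Algorithm("ParticleTracking", ("gpu",))],
         depends_on=["roi"]),
]
```

The `depends_on` lists encode the shared inputs/outputs between tasks, which is what makes the workflow a directed graph rather than an independent bag of jobs.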
In addition, all these elements are subject to parameterization, constraints and interdependencies among them. The computational cost of the overall process is high, which means that the workflow must be carefully scheduled in order to provide short processing times and balanced resource consumption. In this sense, adequate planning of the execution of the set of tasks and algorithms that shape a workflow can improve overall performance. We face a combinatorial problem of the NP-Complete type [1], with a wide search space in which obtaining a globally optimal solution is a very complex task. Different resource-planning proposals have been applied to this type of problem, with particularly great success for those based on Evolutionary Algorithms (EA) [3]. Thus, we expect that a conveniently designed EA would also work well for the problem of concern. In the following sections we define all the necessary elements to consider.
Figure 2: DQM processing stage in protoDUNE [8], comprising a group of operations that shape a data processing workflow.

Methodology
The goal of the proposal is to offer a decision-making tool for optimizing the processing workflow described in section 2.

Multi-objective optimization
To solve optimization problems, algorithms are used to find a solution that maximizes or minimizes the value of a function for a given problem. Many real problems, and resource-planning problems in particular, involve several objective functions. To tackle this, multi-objective algorithms have been designed [5,2] that allow several objectives to be optimized at the same time. The solutions to a multi-objective optimization problem are the set of non-dominated solutions called the Pareto set [11]. In this context, the goal of optimization is to determine how, when and where the processing for the DQM stage will be optimally executed. This depends on many criteria that can be contradictory or conflicting and that, in addition, have associated constraints. The general mathematical model of a multi-objective optimization problem consists in finding a vector x* = [x1*, x2*, ..., xn*]^T that satisfies the m constraints g_i(x) >= 0 (i = 1, 2, ..., m) and the p constraints h_i(x) = 0 (i = 1, 2, ..., p), and that optimizes the vector function f(x) = [f1(x), f2(x), ..., fk(x)]^T, where x = [x1, x2, ..., xn]^T is the decision variable vector.
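The notion of non-dominance can be made concrete with a short sketch. The objective vectors below are made-up (T_w, R_u) pairs for hypothetical candidate schedules, both objectives to be minimized; the function names are our own.

```python
def dominates(a, b):
    """a dominates b (minimization) when a is no worse in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))

def pareto_set(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (T_w, R_u) objective vectors for candidate DQM schedules:
candidates = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]
front = pareto_set(candidates)   # (8, 7) and (9, 9) are dominated
```

Every point on the resulting front is a legitimate trade-off between workflow time and resource use; the decision-making tool then lets the user pick among them.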

Evolutionary algorithms
Meta-heuristics based on EA are widely used in resource scheduling with constraints [10], offering a good balance between promising solutions and computational performance compared to other traditional proposals [5]. A proposal based on multi-objective optimization with EA has been developed for the problem described in section 2.
The basic principles of EA, as well as the general working scheme of operation and algorithms, are defined in different published works [4,5,2]. In this type of problem an individual (or chromosome) encodes a solution to the optimization problem, specifying all the details about what, when, how and where each operation will be performed. The algorithm manages a set of solutions, called the population. The overall quality of the solutions is expected to improve along successive generations of the population through the operations of selection, crossover and mutation.
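The generational scheme of selection, crossover and mutation can be sketched as a minimal single-objective EA loop. This is a generic illustration, not the paper's algorithm: the operator implementations, parameters and the toy fitness are our own assumptions, and a real DQM optimizer would use a multi-objective variant (e.g. Pareto-based survival) over schedule chromosomes.

```python
import random

def evolve(fitness, init, mutate, crossover,
           pop_size=20, generations=50, seed=1):
    """Minimal elitist generational EA (minimization), for illustration."""
    rng = random.Random(seed)
    pop = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        def select():                      # binary tournament selection
            a, b = rng.sample(pop, 2)
            return a if fitness(a) <= fitness(b) else b
        children = []
        while len(children) < pop_size:
            child = mutate(crossover(select(), select(), rng), rng)
            children.append(child)
        # Elitist survival: keep the best pop_size of parents + children.
        pop = sorted(pop + children, key=fitness)[:pop_size]
    return min(pop, key=fitness)

# Toy usage: minimize the gene sum of a 5-gene integer chromosome.
best = evolve(
    fitness=sum,
    init=lambda r: [r.randint(0, 9) for _ in range(5)],
    mutate=lambda c, r: [g if r.random() > 0.2 else r.randint(0, 9) for g in c],
    crossover=lambda a, b, r: [r.choice(pair) for pair in zip(a, b)],
)
```

Because survival is elitist, the best fitness in the population is non-increasing across generations, which is the "expected improvement" property described above.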

A multi-objective Evolutionary Algorithm for DQM optimization
The modeling of this type of algorithms for DQM optimization involves the aspects developed in the following subsections.

Representation of the solution
The chromosome must represent the DQM workflow, i.e. its tasks, algorithms, dependencies and resources (Figure 1).
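One possible encoding is sketched below; the task and resource names are assumed for illustration, and the paper does not fix this exact scheme. Each gene fixes, for one task, the resource it runs on and a scheduling priority that a decoder would use when turning the chromosome into a concrete schedule.

```python
import random

TASKS = ["preprocess", "roi", "tracking"]   # assumed task names
RESOURCES = ["cpu", "gpu", "fpga"]          # assumed target architectures

def random_chromosome(rng=random):
    # One gene per task: (task, chosen resource, scheduling priority).
    # Data dependencies are not encoded in the gene itself; a decoder
    # schedules genes in an order compatible with the workflow DAG.
    return [(t, rng.choice(RESOURCES), rng.random()) for t in TASKS]

chromosome = random_chromosome()
```

Keeping dependencies out of the genes and enforcing them in the decoder is a common design choice in scheduling EAs, since it guarantees that crossover and mutation always yield decodable (if not always feasible) individuals.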

Objective functions
The measurement of the performance of the solutions corresponds to two parts: a) the satisfaction of the constraints, and b) the performance of the scheduling in terms of time and use of resources of the DQM workflow.

Constraints.
Resource scheduling in DQM is subject to different constraints, such as the number of computing resources, data dependencies between tasks, or their availability at any given time, including the provision of algorithm implementations on different infrastructures. Figure 5 shows an example of modeling the constraints due to resource and time availability. The degree to which constraints are violated determines how feasible a schedule is.
Performance. The performance of the scheduling is measured with two indicators:
(i) Time of the workflow execution (T_w): the total estimated time of a complete workflow. Viewing the DQM task scheduling as a directed acyclic graph (DAG) G = (V, E), the total time is the largest path cost in G, computed for each operation v ∈ V in linearized order as T_ops(v) = max_{(u,v)∈E} {T_ops(u) + T_op(u, v)}, where T_ops is the start time with respect to the preceding task in the workflow and T_op is the cost of an operation in the workflow, T_op_i = T_e_i + T_l_i + T_t_i, with T_e the estimated execution time of the algorithm, T_l the estimated latency time and T_t the time to transfer data.
(ii) Resource utilization (R_u): the indicator of the total resources used by the execution of the algorithms, calculated as R_u = Σ_{i=0}^{n} R_op_i = Σ_{i=0}^{n} (R_cpu + R_gpu + R_fpga), where n is the total number of operations and R_cpu, R_gpu and R_fpga are the resource-use indicators of executing the algorithm on the different selected architectures.
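The two indicators follow directly from these definitions. In the sketch below the edge costs are made-up numbers, T_op is passed in already aggregated (T_e + T_l + T_t), and the edges are assumed to be given in linearized (topological) order.

```python
def workflow_time(edges, t_op):
    """T_w: largest path cost in the workflow DAG G = (V, E).
    edges: (u, v) pairs in linearized order;
    t_op[(u, v)]: aggregated cost T_e + T_l + T_t of that operation."""
    nodes = {n for e in edges for n in e}
    start = {n: 0.0 for n in nodes}          # T_ops: start time of each node
    for u, v in edges:                        # T_ops(v) = max{T_ops(u)+T_op(u,v)}
        start[v] = max(start[v], start[u] + t_op[(u, v)])
    return max(start.values())

def resource_use(ops):
    """R_u: sum of (R_cpu + R_gpu + R_fpga) over all operations."""
    return sum(cpu + gpu + fpga for cpu, gpu, fpga in ops)

# Hypothetical workflow with invented edge costs:
edges = [("daq", "fft"), ("fft", "roi"), ("fft", "cluster"),
         ("roi", "plot"), ("cluster", "plot")]
t_op = dict(zip(edges, [2.0, 3.0, 1.0, 2.0, 4.0]))
tw = workflow_time(edges, t_op)   # largest path cost through the DAG
ru = resource_use([(1, 0, 0), (0, 2, 0), (1, 0, 1)])
```

A multi-objective EA would evaluate each decoded chromosome with exactly these two functions and keep the non-dominated (T_w, R_u) trade-offs.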

Conclusions
Data processing in DQM is a critical stage for identifying problems and reducing data loss in the early processing stages of HEP experimentation. To address this problem, the proposal developed allows an optimization of the entire DQM workflow with two clear objectives: on the one hand, reducing the execution time of DQM processing tasks by offering better planning of them, and on the other, minimizing the computing resources used by the workflow. With this proposal it is possible to carry out a global optimization of workflow data processing in DQM, allowing it to be used as a decision-making tool with which scientists can configure the DQM processing stage more efficiently.