The H.E.S.S. data acquisition system

The High Energy Stereoscopic System (H.E.S.S.) is an array of five Imaging Atmospheric Cherenkov Telescopes located in the Khomas Highland in Namibia. It measures cosmic gamma rays of very high energy (> 100 GeV) using the Earth's atmosphere as a calorimeter. The H.E.S.S. experiment entered Phase II in September 2012 with the inauguration of a fifth telescope that is larger and more complex than the other four. Its very large mirror area of 600 m², compared to the 100 m² of the smaller telescopes, results in a lower energy threshold as well as an increased overall sensitivity of the system. Moreover, the huge effective area that results from the large mirror size is crucial for the detection of low-energy transient events on short time scales. This paper gives a brief overview of the design principles of the current H.E.S.S. data acquisition and array control system. Particular emphasis is given to the new Target of Opportunity alert system that has recently been introduced to the array and allows the instrument to react to such an alert within 60 s.


Introduction
With the current generation of ground-based Imaging Atmospheric Cherenkov Telescopes (IACTs), the field of very high energy (VHE; > 100 GeV) gamma-ray astronomy has become an important part of Astroparticle Physics. One such experiment, the High Energy Stereoscopic System (H.E.S.S.) [1], is an IACT array located in the Khomas Highland in Namibia. It consists of four smaller telescopes with a mirror surface area of 100 m² and a recently inaugurated fifth telescope (H.E.S.S. II) with a mirror surface area of 600 m². The huge collection area of the fifth telescope significantly increases the sensitivity to transient events. One special class of transient events is Gamma Ray Bursts (GRBs) [2]. Due to their short duration of around 30 s [3] and their random distribution throughout the sky, the reaction time of IACT arrays with their small field of view (up to 5° for H.E.S.S.) has to be minimal.
In the following, an overview of the H.E.S.S. Data Acquisition System (DAQ) is given, followed by the presentation of a fully automatic Target of Opportunity (ToO) reaction scheme as currently implemented in the H.E.S.S. DAQ system.
Figure 1. Network structure on-site. The direct optical connections between the Central Trigger and the Cherenkov camera trigger systems are represented by purple lines with diamond-shaped ends. A physically separate 1 Gb/s Ethernet network, indicated by green lines with circles, is used for mounting the NFS [4] and GlusterFS [5] file servers. Figure taken from [6].
The main responsibility of the H.E.S.S. DAQ system [6] is the operation, i.e. the read-out and control, of the five Cherenkov Telescopes (CTs), but it is also used for slow control, error handling and user interaction with the array. It is a multi-machine, multi-process and multi-core system and consists of approximately 230 processes. An overview of the network structure on-site is shown in Figure 1. The network as well as the computer farm on-site consist of off-the-shelf hardware, i.e. spare parts are relatively cheap and easy to obtain, and no custom-made hardware is used within the H.E.S.S. DAQ.
The data rates of the Cherenkov cameras peak at 46 MB/s for the primary scientific data during routine operation. To be able to cope with bursts in the data rate, for example due to short time-scale transient events, or to read out other equipment on-site, the required maximum data rate is of the order of 80 MB/s. For that purpose the server farm, shown in Figure 1, uses a custom-made round-robin load-balancing scheme. In this scheme, all data from all telescopes is sent to one of the nodes in the server room for four seconds, the Central Trigger pace, and buffered in memory. After four seconds, the data-receiving node is switched and the Central Trigger sends the IP address of the new data-receiving node to all cameras. The node that received the data beforehand then starts the event building process and converts the raw Cherenkov data byte-stream to the common H.E.S.S. data format. To be able to deal with different data rates, depending on different run configurations or hardware upgrades, the trigger pace as well as the number of nodes used can be adjusted dynamically. The H.E.S.S. DAQ allows the array to operate in different observation modes with different sets of telescopes at the same time using so-called SubArrays. The different modes include, but are not limited to, observation runs as well as calibration and maintenance runs. The detector configuration used during such a run is defined in a MySQL database and can easily be changed and adapted to specific needs. The flexibility of this approach allowed the commissioning of the newest telescope, CT 5, while the Phase I array (CT 1-4) was taking scientific data.
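The round-robin scheme described above can be illustrated with a minimal sketch. It is not the H.E.S.S. implementation: the node names and the function names `assign_slices`, `buffer_per_slice_mb` and `build_time_budget_s` are hypothetical, and the time budget assumes a strict round-robin order in which a node is only reused after all other nodes have had their turn.

```python
from itertools import cycle


def assign_slices(nodes, n_slices, pace_s=4.0):
    """Return (start_time_s, receiving_node) for consecutive trigger-pace slices."""
    if not nodes:
        raise ValueError("at least one receiving node is required")
    ring = cycle(nodes)  # strict round-robin order (assumption)
    return [(i * pace_s, next(ring)) for i in range(n_slices)]


def buffer_per_slice_mb(rate_mb_per_s, pace_s=4.0):
    """Amount of data one node has to buffer in memory during its slice."""
    return rate_mb_per_s * pace_s


def build_time_budget_s(n_nodes, pace_s=4.0):
    """Time a node has for event building before it receives data again."""
    return (n_nodes - 1) * pace_s


# Hypothetical farm of three nodes, five consecutive four-second slices.
schedule = assign_slices(["node1", "node2", "node3"], n_slices=5)
```

With the figures quoted in the text, a node would have to buffer 4 s × 80 MB/s = 320 MB per slice at the required maximum rate. Changing `pace_s` or the number of nodes changes both the buffer size and the time available for event building, which is the adjustment the text refers to.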
To improve data taking efficiency and to reduce the possibility for human error, the H.E.S.S. DAQ is designed to be as automatic as possible. However, the Shift Crew on-site (the non-expert H.E.S.S. member personnel that travel to the H.E.S.S. site on a monthly basis to operate the array) has to, for example, manually activate the telescope tracking systems and is responsible for error recovery. In addition, the H.E.S.S. DAQ has to be able to quickly adapt to a change in array configuration due to missing or faulty hardware. The automation and the flexible design of the H.E.S.S. DAQ resulted in a loss of dark time of less than 1 % since 2009 due to central DAQ problems [6].
For optimal use of the available dark time, roughly 1000 h a year, a dedicated tool called the "AutoScheduler" [7] schedules all observation runs for a given night. It takes into account various predetermined conditions, for example target priority, zenith angle, number of runs already taken on that target and available telescopes, and uses an optimisation algorithm to prepare the schedule. The schedule for a given night is then written into a MySQL database which is in turn used by the DAQ to observe the given targets. Nonetheless, the Shift Crew can adjust the schedule, for example by adding calibration runs manually. They are, however, not allowed to change the observation schedule, unless there are exceptional circumstances.

Controllers & State Machine
Apart from the Cherenkov cameras there are several other pieces of equipment mounted on the telescopes or located on-site that must be monitored and read out. These include, for example, CCD cameras, weather stations and LIDARs. Moreover, the Shift Crew on-site needs rapid feedback during data taking about the current status of the array, the weather and the data quality. Each of these tasks is taken over by dedicated programs inside of the H.E.S.S. DAQ, so-called Controllers. Each piece of hardware is mapped onto at least one Controller, which is responsible for activating and deactivating, monitoring and reading out its corresponding hardware. For complex hardware like the Cherenkov cameras several Controllers are used to reduce the complexity, for example Camera HV Controller, Camera Trigger Controller, Camera Lid Controller, etc. To represent the current status of a single device, a common state machine must be implemented by every Controller, see Figure 2. There are four different states available: Safe, Ready, Configured and Running, each of which represents a hardware status ranging from being turned off to taking data. The Safe state is the default during day time for most of the hardware Controllers, i.e. the corresponding device is turned off. During night time most of the hardware is in the Ready state, i.e. the device is turned on and slow control data is read out. Prior to data taking, a Controller reaching the Configured state indicates that the device has received all necessary configuration parameters to proceed with data taking in the Running state.
Note that there are also dedicated Controllers for displaying slow control information, real-time analysis results, etc. to the Shift Crew.
The state machine shown in Figure 2 has two types of transitions: ascending and descending ones. Each state can be changed to its adjacent state using the corresponding transition. It is therefore possible to send any Controller of the DAQ to any state without specifying any information other than the target state. During a state transition a Controller will wait for the processes it depends on. For example, the camera high voltage must not be turned on while the telescopes are still moving to their new target, so that pixels in the camera are not turned off because of the light from stars passing through the field of view. Also, every hardware Controller must wait for its Receiver process to open its data files before sending data. Therefore, a process will only start its state transition if all Controllers it depends on have reached the target state of this state transition, see Figure 3.
If several processes inside of the DAQ belong to a logical group, for example all processes belonging to one of the Cherenkov telescopes, they are grouped into a so-called Context. A given Context can also contain several SubContexts. Each Context is managed by a dedicated Controller called a Manager. It is responsible for distributing the run configuration to the processes in its Context as well as for the error handling and synchronization of these processes. The state of a Manager depends on the states of all of its subordinate processes, i.e. during an ascending transition the Manager is in the least ascended state of all managed processes.
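The state machine, the dependency rule and the Manager state described above can be sketched as follows. This is only an illustration under stated assumptions: the class and function names (`State`, `Controller`, `transition_path`, `manager_state`) are hypothetical, only ascending transitions are modelled, and the dependency of the camera high voltage on the tracking is taken from the example in the text.

```python
from enum import IntEnum


class State(IntEnum):
    """The four states named in the text, ordered from off to data taking."""
    SAFE = 0
    READY = 1
    CONFIGURED = 2
    RUNNING = 3


def transition_path(current, target):
    """States visited one adjacent step at a time on the way to the target."""
    step = 1 if target > current else -1
    return [State(s) for s in range(current + step, target + step, step)]


class Controller:
    def __init__(self, name, depends_on=()):
        self.name = name
        self.state = State.SAFE
        self.depends_on = list(depends_on)

    def try_ascend(self):
        """Ascend by one state if every dependency has already reached it."""
        if self.state == State.RUNNING:
            return False
        next_state = State(self.state + 1)
        if not all(dep.state >= next_state for dep in self.depends_on):
            return False  # keep waiting for the dependencies
        self.state = next_state
        return True


def manager_state(controllers):
    """Least ascended state of all managed processes (ascending transitions)."""
    return min(c.state for c in controllers)


# The camera high voltage waits for the tracking, as in the example in the text.
tracking = Controller("Tracking")
camera_hv = Controller("CameraHV", depends_on=[tracking])
blocked = camera_hv.try_ascend()   # tracking is still SAFE, so HV has to wait
tracking.try_ascend()              # tracking reaches READY
allowed = camera_hv.try_ascend()   # now the HV controller may follow
```

Because the states are ordered and only adjacent steps exist, `transition_path` needs nothing but the target state, which mirrors the statement that any Controller can be sent to any state without further information.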
The hierarchy of Managers, and, therefore, of Contexts, is as follows: at the top the so-called RunManager is responsible for monitoring available resources and starting scheduled runs accordingly; the level below consists of the different SubArray-Managers which take care of all processes that belong to a subset of the array that should take data on a given target; finally come the Managers for specific tasks in the DAQ such as the control of all processes concerned with a particular telescope.
Due to the unpredictable nature of ToO alerts, the DAQ must be able to react in any given state, regardless of whether one or several runs are currently ongoing or a SubArray is currently in transition. Moreover, only operational telescopes may be used during the automatic response to the prompt ToO alert. For this, the Shift Crew is asked to update the list of available telescopes during the night so that the AutoScheduler as well as the ToO alert scheme are aware of the current state of the array and can use the available resources accordingly.

Reaction to Target of Opportunity Alerts
Most of the time between the reception of a ToO alert and the telescopes being on target is spent moving the telescopes to the new observation position on the sky. Therefore, the idea of the ToO alert scheme is to start slewing the telescopes immediately once the ToO alert is received. The time required for slewing is then used to stop any ongoing runs and to start a new joint ToO run using all available telescopes. To make use of the higher slewing speed of CT 5, the other telescopes are declared as optional processes for ToO runs, i.e. a failure in any of the hardware components of the small telescopes will not prevent the other telescopes from data taking. Additionally, during a ToO run optional dependencies are not waited for. They are, however, allowed to join the run at a later stage or to rejoin the run if they dropped out because of an error.
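The role of optional processes can be illustrated with a short sketch. The names `Participant` and `start_too_run` are hypothetical and the logic is reduced to the rule stated above: the run starts as soon as all non-optional participants are ready, and optional participants that are not yet ready may join later.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    optional: bool
    ready: bool


def start_too_run(participants):
    """Return (started, active, pending) for a ToO run.

    Only non-optional participants are waited for; optional ones that are
    not ready yet are listed as pending and may join at a later stage.
    """
    required = [p for p in participants if not p.optional]
    if not all(p.ready for p in required):
        return False, [], [p.name for p in participants]
    active = [p.name for p in participants if p.ready]
    pending = [p.name for p in participants if not p.ready]
    return True, active, pending


# CT 5 is on target, CT 1 is ready, CT 2 is still slewing (illustrative values).
result = start_too_run([
    Participant("CT5", optional=False, ready=True),
    Participant("CT1", optional=True, ready=True),
    Participant("CT2", optional=True, ready=False),
])
```

In this example the run starts with CT 5 and CT 1 while CT 2 remains pending, so a slow or faulty small telescope does not delay data taking.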
The detailed implementation of this ToO scheme in the DAQ is realized in the following way. If the GCNAlerter decides in favour of a prompt ToO alert, it notifies the RunManager of the name and the position of the new target. The RunManager is aware of all ongoing activity within the array and can react accordingly. First, every process within the DAQ is notified of the ToO alert, which allows, for example, the tracking to slew to the new target immediately. Once all processes have been notified, all ongoing runs are aborted and the DAQ is sent to the Ready state. Note that if one of the currently ongoing runs is in a transition phase, the run is only aborted when the transition has finished. Once all processes are in the Ready state, the AutoScheduler is used to schedule a series of ToO runs on the new target. After that, normal DAQ functionality is used to start the runs.
A notable difference between normal and ToO functionality is that during the stopping of ongoing runs, as well as during the starting of the first ToO run, special flags are set within each Controller. These flags allow the Controller to behave differently during transitions after a ToO alert. In contrast to a normal observation run, the CT 5 Tracking Controller already starts to slew the telescope to the new target once the ToO alert is received [8]. Therefore, no additional actions are necessary during its ascending transitions, and processes depending on the CT 5 Tracking Controller can perform their transitions in parallel to the slewing of the telescope. Nonetheless, the CT 5 Tracking Controller waits during its Configuring transition until the source is within 10° of the current pointing position and during Starting until the source is within the field of view of the camera. This ensures that data taking does not start prematurely. At the same time it speeds up the transition by allowing the camera to start increasing the voltage in the pixels while the telescope is still up to 10° away from the source.
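The two waiting conditions of the tracking can be sketched as a function of the angular distance between the current pointing and the source. This is an assumption-laden illustration: the function names are hypothetical, positions are given in an azimuth/altitude system chosen only for the example, and the field-of-view radius is passed as a parameter because the text does not quote a value for the CT 5 camera.

```python
import math

CONFIGURE_RADIUS_DEG = 10.0  # distance quoted in the text for Configuring


def angular_separation_deg(az1, alt1, az2, alt2):
    """Great-circle separation of two directions, all angles in degrees."""
    a1, h1, a2, h2 = map(math.radians, (az1, alt1, az2, alt2))
    cos_sep = (math.sin(h1) * math.sin(h2)
               + math.cos(h1) * math.cos(h2) * math.cos(a1 - a2))
    # Clamp against rounding errors before taking the arc cosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))


def tracking_gate(separation_deg, fov_radius_deg):
    """Transition the tracking may complete at the given distance to the source."""
    if separation_deg <= fov_radius_deg:
        return "Starting"      # source inside the field of view
    if separation_deg <= CONFIGURE_RADIUS_DEG:
        return "Configuring"   # close enough to let dependent processes proceed
    return "waiting"           # still slewing, both transitions are held


# Illustrative field-of-view radius of 2.5 degrees (placeholder, not a CT 5 value).
far = tracking_gate(70.0, fov_radius_deg=2.5)
near = tracking_gate(8.0, fov_radius_deg=2.5)
on_target = tracking_gate(1.0, fov_radius_deg=2.5)
```

Releasing the Configuring transition at 10° is what lets processes that depend on the tracking, such as the camera high voltage, progress while the last part of the slew is still under way.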
Once CT 5 has reached the new target position, data taking starts immediately even if the small telescopes are still moving to the new target. Moreover, the fine positioning, which is normally done during the Starting transition of the Tracking Controller, is done during data taking to further reduce the reaction time. This behaviour might also be implemented for the small telescopes, but currently this is only the case for CT 5 [8].
An additional feature of the ToO alert scheme is to be able to quickly update the target position during an ongoing ToO run. This is needed in order to be able to react to refined coordinates received through the GCN network. In a case like this the DAQ uses similar functionality as for a new ToO alert. The GCNAlerter notifies the RunManager of the updated ToO coordinate and the RunManager distributes this information to all processes within the DAQ. This allows, for example, the CT 5 telescope to slew to the updated target position while the array continues to take data. Additionally, once the new source position is reached, the tracking notifies all processes within its context so that other processes belonging to this telescope can react if necessary.
Using this ToO alert scheme, the H.E.S.S. DAQ was tested (using an angular distance of roughly 70° to the new pointing position) and found to be able to react to a fake ToO alert within 60 s. This is a speed-up by a factor of 3 with respect to the normal transition time. Compared to the MAGIC telescope array, which can react to a ToO alert within 28 s [9], the H.E.S.S. response time is still a factor of 2 worse. However, one has to keep in mind that the MAGIC telescope system was specifically designed to be light (70 t for a 17 m diameter mirror) in order to react to ToO alerts as fast as possible. Considering the 580 t of the H.E.S.S. II telescope and its 24 m × 33 m mirror, a factor of 2 in reaction time in contrast to a factor of ∼ 9 in weight is a major accomplishment.
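The comparison above can be retraced with the numbers quoted in the text. The short calculation below is only a consistency check; the variable names are hypothetical and the 60 s is treated as the reaction time itself, although the text quotes it as an upper limit.

```python
# Values quoted in the text.
too_reaction_s = 60.0     # H.E.S.S. reaction time to a ToO alert
speedup_factor = 3.0      # speed-up with respect to the normal transition
magic_reaction_s = 28.0   # MAGIC reaction time [9]
hess2_mass_t = 580.0      # H.E.S.S. II telescope
magic_mass_t = 70.0       # MAGIC telescope

# Normal transition time implied by the quoted speed-up.
normal_transition_s = too_reaction_s * speedup_factor

# Ratios entering the comparison with MAGIC.
reaction_ratio = too_reaction_s / magic_reaction_s
mass_ratio = hess2_mass_t / magic_mass_t
```

The implied normal transition time is about 180 s, the ratio of reaction times is about 2.1 and the ratio of telescope masses is about 8.3, in line with the rounded factors of 2 and ∼ 9 given in the text.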

Summary
The H.E.S.S. DAQ is a complex but flexible system which is responsible for data taking, slow control, error handling and interaction of the Shift Crew with the five-telescope array. To accomplish these tasks, approximately 230 processes are used. The flexibility of the DAQ was demonstrated during the commissioning of the H.E.S.S. II telescope. In the time period from 2009 to 2012 the lost dark time due to central DAQ problems was less than 1 %.
A new ToO alert scheme has been developed and integrated into the H.E.S.S. DAQ system, making use of the flexible design of the Controllers and their common state machine. First tests with the newly introduced ToO alert scheme and the full five-telescope H.E.S.S. array have been done recently. Using a representative angular distance of roughly 70°, a reaction time to ToO alerts of less than 1 min was achieved. Further improvements to the ToO alert scheme are planned to reduce the response time even more, for example by changing the behaviour of the Cherenkov cameras during a ToO alert or by allowing reverse pointing for the different telescopes.