ATLAS Metadata Infrastructure Evolution for Run 2 and Beyond

ATLAS developed and employed for Run 1 of the Large Hadron Collider a sophisticated infrastructure for metadata handling in event processing jobs. This infrastructure profits from a rich feature set provided by the ATLAS execution control framework, including standardized interfaces and invocation mechanisms for tools and services, segregation of transient data stores with concomitant object lifetime management, and mechanisms for handling occurrences asynchronous to the control framework’s state machine transitions. This metadata infrastructure is evolving and being extended for Run 2 to allow its use and reuse in downstream physics analyses, analyses that may or may not utilize the ATLAS control framework. At the same time, multiprocessing versions of the control framework and the requirements of future multithreaded frameworks are leading to redesign of components that use an incident-handling approach to asynchrony. The increased use of scatter-gather architectures, both local and distributed, requires further enhancement of metadata infrastructure in order to ensure semantic coherence and robust bookkeeping. This paper describes the evolution of ATLAS metadata infrastructure for Run 2 and beyond, including the transition to dual-use tools—tools that can operate inside or outside the ATLAS control framework—and the implications thereof. It further examines how the design of this infrastructure is changing to accommodate the requirements of future frameworks and emerging event processing architectures.


Introduction
Metadata are essential to event data processing, in a variety of roles. Metadata handling and metadata flow, though, differ in significant ways from event loop management and execution control:
- Handling may be asynchronous to control framework state machine transitions.
- Object lifetime management is different: most metadata describe a collection of events. For in-file metadata (metadata that describe the events in the physical file in which they are located), the lifetime is controlled by the opening and closing of the file.
- Scheduling, processing, and propagation can be different, as most metadata must be summarized using semantic knowledge of the data, whereas event data records can simply be appended.

Evolution of event processing frameworks (to multiprocessing and multithreaded models, to scatter-gather architectures, and to operability on heterogeneous and high-performance platforms), together with the need for metadata access in analyses downstream of experiments' frameworks, requires concomitant evolution of metadata handling and its supporting infrastructure.
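The lifetime and propagation differences above can be illustrated with a minimal sketch (hypothetical class and field names, not the actual Athena API): event records are simply appended across files, while the in-file metadata store lives only as long as its file and must be summarized into a longer-lived store using knowledge of what the metadata mean.

```python
# Minimal sketch (hypothetical names, not the Athena API): event data are
# appended, while metadata are summarized and the input store is flushed
# on every input-file transition.

class InputMetadataStore:
    """Holds metadata whose lifetime mirrors the currently open input file."""
    def __init__(self):
        self.records = {}

    def flush(self):
        # Called on input-file transitions: the store's contents die with the file.
        self.records.clear()

class MetadataSummary:
    """Accumulates a cross-file summary; merging needs semantic rules."""
    def __init__(self):
        self.events_seen = 0
        self.streams = set()

    def merge(self, file_metadata):
        self.events_seen += file_metadata["nevents"]   # counts add up
        self.streams |= set(file_metadata["streams"])  # stream names union

events, summary, store = [], MetadataSummary(), InputMetadataStore()
for file_meta, file_events in [
    ({"nevents": 2, "streams": ["physics"]}, ["e1", "e2"]),
    ({"nevents": 1, "streams": ["calib"]}, ["e3"]),
]:
    store.records = dict(file_meta)   # metadata lifetime: one input file
    summary.merge(store.records)      # summarize with semantic knowledge
    events.extend(file_events)        # event records: simply appended
    store.flush()                     # lifetime ends with the file

print(len(events), summary.events_seen, sorted(summary.streams))
```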

Metadata use-cases
We have identified five use cases for metadata used or produced during event processing.

Data Integrity
Certain metadata identify the file and its contents, and describe how the file connects to larger data groupings such as datasets.

Provenance
Because ATLAS produces a long chain of data products, it is useful for a file to record from which upstream data products its event sample was derived.

Auto-configuration
Job configuration depends on data stored as in-file metadata, including cached conditions data organized in an interval-of-validity structure. ATLAS jobs can use this information to configure themselves automatically.
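As a rough sketch of the idea (the record layout and tag names here are invented for illustration): conditions cached in the file carry an interval of validity, and the job selects the record valid for the data it is about to process.

```python
# Hypothetical sketch of auto-configuration from in-file metadata: cached
# conditions carry an interval of validity (IOV) in run numbers, and the
# job picks the record that covers the run being processed.

cached_conditions = [                 # stands in for in-file metadata
    {"iov": (100, 199), "geometry": "GEO-TAG-A"},
    {"iov": (200, 299), "geometry": "GEO-TAG-B"},
]

def auto_configure(run_number):
    """Return the geometry tag whose IOV covers the given run."""
    for record in cached_conditions:
        lo, hi = record["iov"]
        if lo <= run_number <= hi:
            return record["geometry"]
    raise LookupError("no cached conditions for run %d" % run_number)

print(auto_configure(250))
```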

Bookkeeping
Data are transformed during processing, including the filtering of events. Metadata track the event counts needed for efficiency and luminosity calculations. These counts are also needed by grid data processing to ensure that files are merged into blocks of related data, e.g. luminosity blocks, physics streams, etc.
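The essence of the bookkeeping problem can be sketched as follows (the filter and field names are invented for illustration): even when an event is filtered out of the output, the metadata must still record that it was seen, per luminosity block, so efficiencies and luminosities remain computable.

```python
# Toy bookkeeping sketch: count events before and after filtering, keyed by
# luminosity block, so that a filtered sample can still be normalized.

from collections import defaultdict

counts = defaultdict(lambda: {"seen": 0, "passed": 0})

def process(event):
    """Stand-in for a filtering algorithm that also updates bookkeeping."""
    lb = event["lumiblock"]
    counts[lb]["seen"] += 1
    if event["pt"] > 25.0:            # invented selection cut
        counts[lb]["passed"] += 1
        return event                  # event goes to the output file
    return None                       # event dropped; only the counts survive

sample = [
    {"lumiblock": 1, "pt": 30.0},
    {"lumiblock": 1, "pt": 10.0},
    {"lumiblock": 2, "pt": 40.0},
]
output = [e for e in sample if process(e)]

print(len(output), counts[1]["seen"], counts[1]["passed"])
```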

Caching for analysis
At the analysis level, metadata can help physicists avoid database accesses or release dependence by caching the needed data within the file. At this point the data, and the tools that handle them, must also be usable outside Athena, the ATLAS control framework.

The Run 1, incident-driven metadata infrastructure
For Run 1, ATLAS [1] augmented the I/O infrastructure [2] to handle in-file metadata as described in [3]. This infrastructure (see Figure 1) profits from a rich feature set provided by the ATLAS execution control framework [4], including:

1. Standardized interfaces and invocation mechanisms for tools and services
- A MetaDataSvc handles separate data store instances for metadata.
- MetaDataTools summarize and propagate metadata from input files.

2. Segregation of transient data stores with concomitant object lifetime management
- An input metadata store is used for reading and mirrors the lifetime of the input file (i.e. it is flushed on input file transitions). Metadata objects are propagated (and summarized) into a separate metadata store, from which they can be written into output files.

3. Mechanisms for handling occurrences asynchronous to the control framework's state machine transitions
- MetaDataSvc and MetaDataTools are invoked by handling FileIncidents that are fired by the EventSelector on input file boundaries.
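The incident pattern can be sketched in miniature (class names are modeled on, but are not, the actual Athena components): the EventSelector fires FileIncidents at file boundaries, and a central service dispatches them to the registered metadata tools.

```python
# Illustrative sketch of incident-driven dispatch. Not the Athena API:
# names and signatures are simplified stand-ins.

class FileIncident:
    def __init__(self, kind, filename):
        self.kind, self.filename = kind, filename

class MetaDataSvc:
    """Dispatches incidents to every registered MetaDataTool."""
    def __init__(self):
        self.tools = []

    def add_tool(self, tool):
        self.tools.append(tool)

    def handle(self, incident):
        for tool in self.tools:
            tool.handle(incident)

class FileListTool:
    """A toy MetaDataTool: records which input files were opened."""
    def __init__(self, svc):
        self.files = []
        svc.add_tool(self)

    def handle(self, incident):
        if incident.kind == "BeginInputFile":
            self.files.append(incident.filename)

svc = MetaDataSvc()
tool = FileListTool(svc)
for name in ["input1.root", "input2.root"]:
    # In Athena these would be fired by the EventSelector at file boundaries.
    svc.handle(FileIncident("BeginInputFile", name))
    svc.handle(FileIncident("EndInputFile", name))

print(tool.files)
```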

Reuse of the metadata infrastructure in downstream physics analyses on xAOD
For Run 2, ATLAS has changed its event data model to unify analyses in the ATLAS control framework Athena and in ROOT. In addition to storing the event data in a new format (xAOD), many tools are shared between the two environments (Dual-Use Tools).
To allow the reuse of metadata components in downstream analyses that are not utilizing Athena, Dual-Use Tools are being developed to summarize and propagate metadata records (see Figure 2). This approach transfers some functionality from the framework MetaDataTools to Dual-Use Tools and provides a generic MetaDataTool-Wrapper to allow framework integration. The MetaDataTool-Wrapper will listen to the framework incidents and interact with the data stores.
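The split of responsibilities can be sketched as follows (a sketch under assumed names, not the actual tool interfaces): the summarizing logic lives in a framework-independent dual-use tool, and a thin wrapper adapts it to framework incidents, so the same code serves both environments.

```python
# Sketch of the dual-use pattern with invented names: the metadata logic is
# framework-independent; a wrapper translates framework incidents into
# plain method calls on the wrapped tool.

class DualUseCountingTool:
    """Framework-independent: callable directly from a ROOT/Python analysis."""
    def __init__(self):
        self.total = 0

    def add_input(self, metadata):      # plain method, no incidents needed
        self.total += metadata["nevents"]

class MetaDataToolWrapper:
    """Framework side: listens to incidents and drives the wrapped tool."""
    def __init__(self, tool, input_store):
        self.tool, self.input_store = tool, input_store

    def handle(self, incident_kind):
        if incident_kind == "BeginInputFile":
            self.tool.add_input(self.input_store)

# Outside the framework: the analysis calls the tool directly.
standalone = DualUseCountingTool()
standalone.add_input({"nevents": 5})

# Inside the framework: the wrapper reacts to the incident instead.
wrapped = DualUseCountingTool()
MetaDataToolWrapper(wrapped, {"nevents": 5}).handle("BeginInputFile")

print(standalone.total, wrapped.total)
```

Either path exercises the same summarizing code, which is the point of the dual-use design: the framework-specific surface shrinks to the wrapper.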

The metadata infrastructure inside AthenaMP
AthenaMP [5], the multiprocessing version of the ATLAS control framework, starts as a single process and, after initialization, forks several worker processes to process the events. Each worker has its own EventSelector that reads its sub-sample of events directly from the input files. Incident firing is the same as for serial execution, and each worker has access to all the metadata; each worker therefore uses metadata processing infrastructure identical to that of serial processing. If a worker does not process any events from an input file, that file is skipped for metadata processing as well. Each worker produces an output file containing events and metadata, and these output files must be merged after completion of the AthenaMP job. Because of this complexity, in Run 1 metadata merging required execution of the full Athena framework, whereas event data can be appended with more lightweight tools. Optionally, a shared TokenReader can be used to iterate over the input files and dispatch event processing by sending Tokens (event references) to the workers. Even in this mode, a worker still accesses the file directly for event data and metadata (see Figure 3).
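A toy illustration of why the post-fork merge is harder for metadata than for event data (this models the merge semantics only, not AthenaMP itself): each worker's events can simply be concatenated, but metadata such as per-stream counts must be combined with knowledge of what they mean.

```python
# Toy model of merging AthenaMP worker outputs. Event data append trivially;
# metadata (here, per-stream event counts) need a semantic merge rule.

from collections import Counter

def worker(sub_sample):
    """Stands in for one forked worker: returns its output file content
    as (event list, metadata summary)."""
    meta = Counter(event["stream"] for event in sub_sample)
    return list(sub_sample), meta

sample = [
    {"id": 1, "stream": "physics"},
    {"id": 2, "stream": "physics"},
    {"id": 3, "stream": "calib"},
    {"id": 4, "stream": "physics"},
]
outputs = [worker(sample[0:2]), worker(sample[2:4])]

# Event data: a lightweight append is enough.
merged_events = [e for events, _ in outputs for e in events]

# Metadata: the merge must know that counts add up per stream.
merged_meta = sum((meta for _, meta in outputs), Counter())

print(len(merged_events), dict(merged_meta))
```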

Event service framework and metadata
A new EventService framework is being developed by ATLAS to implement a fine-grained approach to event processing. This will allow ATLAS to exploit opportunistic, potentially short-lived resources such as high-performance computing allocations, the Amazon spot market, or volunteer computing. It decouples the processing of events from the chunkiness of files, from data locality considerations, and from WAN access latencies.
As opportunistic resources can become unavailable during the runtime of a job, the EventService streams output events away quickly to minimize losses should the worker vanish. This also lowers local storage demands. It means, however, that jobs cannot rely on finalization before producing output files. Instead, a new incident is used to trigger output file transitions, and by default the EventService produces single-event output files. Writing a sequence of output files, done by the OutputStreamSequencer (see Figure 4), is straightforward to implement for event data. Metadata clients, however, need to produce records describing just the sub-sample of events in each output file, while at the same time propagating all metadata needed for the complete event sample. For the Event Service, AthenaMP manages the distribution of events to parallel workers via the TokenReader. Workers retrieve event data using the token, either by directly accessing the file or by using a shared ObjectReader (in development).

Figure 4. Metadata infrastructure with output file sequencing. An OutputStreamSequencer is used to split output files. This service uses a new incident to ensure that the metadata service and tools can correctly propagate the metadata for the sub-sample of events in each output file.
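The sequencing requirement can be sketched in miniature (class and incident names are illustrative stand-ins, not the actual implementation): each range-ending incident closes the current output file with metadata describing only that sub-sample, while full-sample metadata keep accumulating.

```python
# Sketch of output file sequencing with per-sub-sample metadata.
# Hypothetical names: not the actual OutputStreamSequencer interface.

class ToySequencer:
    def __init__(self):
        self.closed_files = []    # list of (events, per-file metadata)
        self.current = []
        self.total_events = 0     # full-sample metadata, always propagated

    def add_event(self, event):
        self.current.append(event)
        self.total_events += 1

    def handle_next_range(self):
        # Fired by a "next event range" incident rather than at finalization:
        # close the current file with metadata for just its sub-sample.
        self.closed_files.append(
            (list(self.current), {"nevents": len(self.current)}))
        self.current = []

seq = ToySequencer()
for event in ["e1", "e2", "e3"]:
    seq.add_event(event)
    seq.handle_next_range()       # default mode: single-event output files

print([meta["nevents"] for _, meta in seq.closed_files], seq.total_events)
```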

Metadata infrastructure and ATLAS future frameworks
Maintaining semantic integrity in data organization, and ensuring that all events in semantically meaningful units are processed, are requirements that are independent of framework and processing architecture. As ATLAS execution frameworks evolve to support heterogeneous and emerging architectures, though, these requirements may need to be supported differently, and with greater generality.

The collaboration is in the process of gathering and documenting its requirements for a next-generation framework, with a framework prototype to be developed in 2015. Those requirements foresee decreased reliance upon incidents because of potential blocking and other issues in multithreaded deployment. A future framework will nonetheless retain the notion of "schedulable incidents", so that reaction to incidents may be handled under the control of a scheduler. Key to this strategy is the recognition that reacting to occurrences asynchronous to state machine transitions (such as hitting file boundaries) is not necessarily "unschedulable": it is not difficult to define a whiteboard architecture in which some components are listening or waiting not for event data but for metadata or their control objects. Such a strategy may require framework evolution to support heterogeneity in the "type" of the next datum to be processed, which may not always be the next event or the next event data object. It must further remain possible to propagate semantic context from input to output, and to ensure that such context is accessible to the components that need it.
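A minimal sketch of such a whiteboard (an assumed design for illustration, not the ATLAS prototype): components declare the kind of datum they wait for, and the scheduler runs them when that datum appears, whether it is event data or metadata.

```python
# Sketch of "schedulable incidents" on a whiteboard. Invented names: the
# point is only that the next scheduled datum need not be an event.

class Whiteboard:
    def __init__(self):
        self.data = {}
        self.waiters = []         # (datum kind, callback) pairs

    def subscribe(self, kind, callback):
        self.waiters.append((kind, callback))

    def publish(self, kind, value):
        self.data[kind] = value
        for wanted, callback in self.waiters:
            if wanted == kind:
                callback(value)   # reaction runs under scheduler control

wb = Whiteboard()
log = []
wb.subscribe("EventData", lambda e: log.append(("reco", e)))
wb.subscribe("FileBoundaryMetadata", lambda m: log.append(("metadata", m)))

# The "next datum" alternates between event data and metadata:
wb.publish("EventData", "e1")
wb.publish("FileBoundaryMetadata", {"file": "input1.root"})

print(log)
```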

Conclusion
During Run 1, a robust and versatile metadata infrastructure proved essential for ATLAS:
- Job configuration relies on in-file metadata.
- Event filtering requires sufficient bookkeeping metadata.

Run 2 conditions further emphasize the importance of metadata for distributed data processing and analysis:
- Increased data rates mean that Luminosity Blocks are no longer constrained to a single file, and their accounting becomes more complex.
- A common Event Data Model for Athena and ROOT analyses requires sharing metadata and the tools that handle metadata.
- The move to new computing architectures requires extensions to the metadata infrastructure.

ATLAS has extended its metadata framework in several ways to cope with Run 2 challenges. An initiative to determine metadata requirements for a next-generation control framework, which will be multithreaded, is near completion. A prototype, with reduced reliance upon incident handling and more amenable to robust bookkeeping for opportunistic distributed processing, is due by early 2016.