Table of contents

Volume 368

2012


14th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2011) 5–9 September 2011, Uxbridge, London, UK

Accepted papers received: 21 May 2012
Published online: 21 June 2012

Preface

011001

This volume of Journal of Physics: Conference Series is dedicated to scientific contributions presented at the 14th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2011), which took place on 5–9 September 2011 at Brunel University, UK.

The workshop series, which began in 1990 in Lyon, France, brings together computer science researchers and practitioners, and researchers from particle physics and related fields in order to explore and confront the boundaries of computing and of automatic data analysis and theoretical calculation techniques. It is a forum for the exchange of ideas among the fields, exploring and promoting cutting-edge computing, data analysis and theoretical calculation techniques in fundamental physics research.

This year's edition of the workshop brought together over 100 participants from all over the world. Fourteen invited speakers presented key topics on computing ecosystems, cloud computing, multivariate data analysis, symbolic and automatic theoretical calculations, as well as computing and data analysis challenges in astrophysics, bioinformatics and musicology. Over 80 other talks and posters presented state-of-the-art developments in the areas of the workshop's three tracks: Computing Technologies, Data Analysis Algorithms and Tools, and Computational Techniques in Theoretical Physics. Panel and round table discussions on data management and multivariate data analysis uncovered new ideas and collaboration opportunities in the respective areas.

This edition of ACAT was generously sponsored by the Science and Technology Facilities Council (STFC), the Institute for Particle Physics Phenomenology (IPPP) at Durham University, Brookhaven National Laboratory in the USA and Dell.

We would like to thank all the participants of the workshop for the high level of their scientific contributions and for their enthusiastic participation in all its activities, which were, ultimately, the key factors in the success of the workshop.

Further information on ACAT 2011 can be found at http://acat2011.cern.ch

Dr Liliana Teodorescu, Brunel University

The PDF also contains details of the workshop's committees and sponsors.

011002

All papers published in this volume of Journal of Physics: Conference Series have been peer reviewed through processes administered by the proceedings Editors. Reviews were conducted by expert referees to the professional and scientific standards expected of a proceedings journal published by IOP Publishing.

Papers

Computing technologies for physics research

012001

This paper describes a new approach to the visualization of information about the operation of the ATLAS Trigger and Data Acquisition system. ATLAS is one of the two general purpose detectors positioned along the Large Hadron Collider at CERN. Its data acquisition system consists of several thousand computers interconnected via multiple gigabit Ethernet networks that are constantly monitored via different tools. Operational parameters ranging from the temperature of the computers to the network utilization are stored in several databases for later analysis. Although the ability to view these data sets individually is already in place, there is currently no way to view them together, in a uniform format, from one location. The ADAM project has been launched to overcome this limitation. It defines a uniform web interface to collect data from multiple providers with different structures. It is capable of aggregating and correlating the data according to user-defined criteria. Finally, it visualizes the collected data using a flexible and interactive front-end web system. Structurally, the project comprises three main levels of the data collection cycle. Level 0 represents the information sources within ATLAS. These providers do not store information in a uniform fashion. The first step of the project was to define a common interface with which to expose stored data. The interface designed for the project originates from the Google Data Protocol API. The idea is to allow read-only access to data providers through HTTP requests similar in format to an SQL query. This provides a standardized way to access the different information sources within ATLAS. Level 1 can be considered the engine of the system. Its primary task is to gather data from multiple data sources via the common interface, to correlate the data together, or over a defined time series, and to expose the combined data as a whole to the Level 2 web interface. Level 2 is designed to present the data in a similar style and aesthetic, despite the different data sources. Pages can be constructed, edited and personalized by users to suit the specific data being shown, and can show a collection of graphs displaying data potentially coming from multiple sources. The project as a whole has considerable scope thanks to the uniform approach chosen for exposing data and the flexibility of Level 2 in presenting results. The paper describes in detail the design and implementation of this new tool; in particular we go through the project architecture, the implementation choices and examples of usage of the system in place within the ATLAS TDAQ infrastructure.
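As an aside, a minimal sketch of the read-only, SQL-like query style described in this abstract, assuming a hypothetical provider endpoint and hypothetical parameter names (this is not the actual ADAM API):

```python
# Minimal sketch of a read-only, SQL-like query over HTTP, in the spirit of the
# Google Data Protocol style interface described above. The endpoint and the
# parameter names (select, from, where, interval) are hypothetical illustrations.
from urllib.parse import urlencode
from urllib.request import urlopen
import json

def query_provider(base_url, table, columns, where, interval):
    """Build and issue a GET request that reads, never writes, provider data."""
    params = urlencode({
        "select": ",".join(columns),   # which attributes to return
        "from": table,                 # data source within the provider
        "where": where,                # SQL-like filter expression
        "interval": interval,          # time window of interest
    })
    with urlopen(f"{base_url}?{params}") as response:
        return json.load(response)     # providers answer in a uniform JSON layout

# Example call against a hypothetical provider URL:
# rows = query_provider("http://adam.example/api/query", "rack_temperature",
#                       ["node", "value", "timestamp"], "value > 40", "PT1H")
```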

012002

This paper describes P-BEAST, a highly scalable, highly available and durable system for archiving monitoring information of the trigger and data acquisition (TDAQ) system of the ATLAS experiment at CERN. Currently the TDAQ system consists of 20,000 applications running on 2,400 interconnected computers, and it is foreseen to grow further in the near future. P-BEAST stores considerable amounts of monitoring information which would otherwise be lost. Making this data accessible facilitates long-term analysis and faster debugging. The novelty of this research consists in using a modern key-value storage technology (Cassandra) to satisfy the massive time-series data rates, flexibility and scalability requirements entailed by the project. The loose schema allows the stored data to evolve seamlessly with the information flowing within the Information Service. An architectural overview of P-BEAST is presented alongside a discussion of the technologies considered as candidates for storing the data. The arguments which ultimately led to choosing Cassandra are explained. Measurements taken during operation in the production environment illustrate the data volume absorbed by the system and techniques for reducing the required Cassandra storage space overhead.
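A toy illustration of the bucketed time-series layout that key-value stores such as Cassandra encourage, using a plain Python dict in place of the real backend; the row-key scheme and names below are assumptions for illustration, not the actual P-BEAST schema:

```python
# Illustrative sketch only: time-series monitoring data laid out for a key-value
# store, with one row per (application, attribute, time bucket) and columns keyed
# by timestamp. A plain dict stands in for the storage backend.
from collections import defaultdict

BUCKET = 3600  # seconds per row; bounds row width and spreads load across nodes

store = defaultdict(dict)  # row key -> {timestamp: value}

def write_point(application, attribute, timestamp, value):
    row_key = (application, attribute, int(timestamp) // BUCKET)
    store[row_key][timestamp] = value          # schema-less: value types may evolve

def read_range(application, attribute, t_start, t_end):
    points = []
    for bucket in range(int(t_start) // BUCKET, int(t_end) // BUCKET + 1):
        row = store.get((application, attribute, bucket), {})
        points += [(t, v) for t, v in sorted(row.items()) if t_start <= t <= t_end]
    return points

write_point("HLT-segment-1", "eventRate", 1315224000, 95.2)
write_point("HLT-segment-1", "eventRate", 1315224010, 97.8)
print(read_range("HLT-segment-1", "eventRate", 1315224000, 1315224060))
```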

012003

We present an online measurement of the LHC beamspot parameters in ATLAS using the High Level Trigger (HLT). When a significant change is detected in the measured beamspot, it is distributed to the HLT. There, trigger algorithms like b-tagging which calculate impact parameters or decay lengths benefit from a precise, up-to-date set of beamspot parameters. Additionally, online feedback is sent to the LHC operators in real time. The measurement is performed by an algorithm running on the Level 2 trigger farm, leveraging the high rate of usable events. Dedicated algorithms perform a full scan of the silicon detector to reconstruct event vertices from registered tracks. The distribution of these vertices is aggregated across the farm and their shape is extracted through fits every 60 seconds to determine the beamspot position, size, and tilt. The reconstructed beamspot values are corrected for detector resolution effects, measured in situ using the separation of vertices whose tracks have been split into two collections. Furthermore, measurements for individual bunch crossings have allowed for studies of single-bunch distributions as well as the behavior of bunch trains. This talk will cover the constraints imposed by the online environment and describe how these measurements are accomplished with the given resources. The algorithm tasks must be completed within the time constraints of the Level 2 trigger, with limited CPU and bandwidth allocations. This places an emphasis on efficient algorithm design and the minimization of data requests.

012004

The Trigger and Data Acquisition (TDAQ) system of the ATLAS experiment at CERN is the infrastructure responsible for collecting and transferring ATLAS experimental data from the detectors to the mass storage system. It relies on a large, distributed computing environment, including thousands of computing nodes with thousands of applications running concurrently. In such a complex environment, information analysis is fundamental for controlling application behaviour, error reporting and operational monitoring. During data-taking runs, streams of messages sent by applications via the message reporting system, together with data published by applications via information services, are the main sources of knowledge about the correctness of running operations. The flow of data produced (with an average rate of O(1–10 kHz)) is constantly monitored by experts to detect problems or misbehaviour. This requires strong competence and experience in understanding and discovering problems and root causes, and often the meaningful information is not in a single message or update but in the aggregated behaviour over a certain time-line. The AAL project aims at reducing the manpower needed and at assuring a constantly high quality of problem detection by automating most of the monitoring tasks and providing real-time correlation of data-taking and system metrics. The project combines technologies from different disciplines; in particular it leverages an event-driven architecture to unify the flow of data from the ATLAS infrastructure, a Complex Event Processing (CEP) engine for the correlation of events, and a message-oriented architecture for component integration. The project is composed of two main components: a core processing engine, responsible for the correlation of events through expert-defined queries, and a web-based front-end to present real-time information and interact with the system. All components work in a loosely coupled, event-based architecture, with a message broker centralizing all communication between modules. The result is an intelligent system able to extract and compute relevant information from the flow of operational data and to provide real-time feedback to human experts, who can promptly react when needed. The paper presents the design and implementation of the AAL project, together with the results of its usage as an automated monitoring assistant for the ATLAS data-taking infrastructure.

012005

ATLAS has recorded almost 5PB of RAW data since the LHC started running at the end of 2009. Many more derived data products and complementary simulation data have also been produced by the collaboration and, in total, 70PB is currently stored in the Worldwide LHC Computing Grid by ATLAS. All of this data is managed by the ATLAS Distributed Data Management system, called Don Quixote 2 (DQ2). DQ2 has evolved rapidly to help ATLAS Computing operations manage these large quantities of data across the many grid sites at which ATLAS runs and to help ATLAS physicists get access to this data. In this paper we describe new and improved DQ2 services: popularity; space monitoring and accounting; exclusion service; cleaning agents; deletion agents. We describe the experience of data management operation in ATLAS computing, showing how these services enable management of petabyte scale computing operations. We illustrate the coupling of data management services to other parts of the ATLAS computing infrastructure, in particular showing how feedback from the distributed analysis system in ATLAS has enabled dynamic placement of the most popular data, helping users and groups to analyse the increasing data volumes on the grid.

012006

For several years the PanDA Workload Management System has been the basis for distributed production and analysis for the ATLAS experiment at the LHC. Since the start of data taking PanDA usage has ramped up steadily, typically exceeding 500k completed jobs/day by June 2011. The associated monitoring data volume has been rising as well, to levels that present a new set of challenges in the areas of database scalability and monitoring system performance and efficiency. These challenges are being met with an R&D effort aimed at implementing a scalable and efficient monitoring data storage based on a noSQL solution (Cassandra). We present our motivations for using this technology, as well as data design and the techniques used for efficient indexing of the data. We also discuss the hardware requirements as they were determined by testing with actual data and realistic loads.

012007

As cloud middleware and cloud providers have become more robust, various experiments with experience in Grid submission have begun to investigate the possibility of taking previously Grid-enabled applications and making them compatible with Cloud Computing. Successful implementation will allow for dynamic scaling of the available hardware resources, providing access to peak-load handling capabilities and possibly resulting in lower costs to the experiment. Here we discuss current work within the CMS collaboration at the LHC to perform computation on EC2, both for production and analysis use-cases. We also discuss break-even points between dedicated and cloud resources using real-world costs derived from a CMS site.

012008

A crucial component of the CMS Software is the reconstruction, which translates the signals coming from the detector's readout electronics into concrete physics objects such as leptons, photons and jets. Given its relevance for all physics analyses, the behaviour and quality of the reconstruction code must be carefully monitored. In particular, the compatibility of its outputs between subsequent releases and the impact of the usage of new algorithms must be carefully assessed. The automated procedure adopted by CMS to accomplish this ambitious task and the innovative tools developed for that purpose are presented. The whole chain of steps is illustrated, starting from the application testing over large ensembles of datasets to emulate Tier-0, Tier-1 and Tier-2 environments, to the collection of the physical quantities in the form of several hundred thousand histograms, to the estimation of their compatibility between releases, to the final production and publication of reports characterised by an efficient representation of the information.

012009

In the Grid world, there are many tools for monitoring both activities and infrastructure. The huge amount of information available needs to be well organized, especially considering the pressing need for prompt reaction in case of problems impacting the activities of a large Virtual Organization. Such activities include data taking, data reconstruction, data reprocessing and user analysis. The monitoring system for LHCb Grid Computing relies on many heterogeneous and independent sources of information. These offer different views for a better understanding of problems, while an operations team follows defined procedures that have been put in place to handle them. This work summarizes the state of the art of LHCb Grid operations, emphasizing the reasons behind the various choices and the tools currently in use to run our daily activities. We highlight the most common problems experienced across years of activity on the WLCG infrastructure, the services with their criticality, the procedures in place, the relevant metrics, the tools available and the ones still missing.

012010

The LHCb computing model was designed to support the LHCb physics program, taking into account LHCb specificities (event sizes, processing times, etc.). Within this model several key activities are defined, the most important of which are real data processing (reconstruction, stripping and streaming, group and user analysis), Monte Carlo simulation and data replication. In this contribution we detail how these activities are managed by the LHCbDIRAC Data Transformation System. The LHCbDIRAC Data Transformation System leverages the workload and data management capabilities provided by DIRAC, a generic community grid solution, to support data-driven workflows (or DAGs). The ability to combine workload and data tasks within a single DAG makes it possible to create highly sophisticated workflows, with the individual steps linked by the availability of data. This approach also provides the advantage of a single point at which all activities can be monitored and controlled. While several interfaces are currently supported (including a Python API and a CLI), we present the ability to create LHCb workflows through a secure web interface, to control their state, and to create and submit jobs. To highlight the versatility of the system we present in more detail experience with real data from the 2010 and 2011 LHC runs.

012011

The Virtual Machine framework was used to assemble the STAR computing environment, validated once, deployed on over 100 8-core VMs at NERSC and Argonne National Lab, and used as a homogeneous Virtual Farm processing events acquired in real time by the STAR detector located at Brookhaven National Lab. To provide time-dependent calibration, a database snapshot scheme was devised. Two high-capacity filesystems, located on opposite coasts of the US and interconnected via the Globus Online protocol, were used in this setup, which resulted in a highly scalable Cloud-based extension of STAR computing resources. The system was in continuous operation for over three months.

012012

With the Job Execution Monitor, a user-centric job monitoring software developed at the University of Wuppertal and integrated into the job brokerage systems of the WLCG, job progress and grid worker node health can be supervised in real time. Imminent error conditions can thus be detected early by the submitter and countermeasures can be taken. Grid site admins can access aggregated data of all monitored jobs to infer the site status and to detect job misbehaviour. To remove the last "blind spot" from this monitoring, a remote debugging technique based on the GNU C compiler suite was developed and integrated into the software; its design concept and architecture is described in this paper and its application discussed.

012013

The Worldwide LHC Computing Grid is a global infrastructure set up to process the experimental data from the experiments at the Large Hadron Collider located at CERN. The UK component is provided by the GridPP project across 19 sites at universities and Rutherford Lab. Ensuring that these large computational resources are available and reliable requires many different monitoring systems, ranging from local site monitoring of individual components, through UK-wide monitoring of Grid functionality, to the worldwide monitoring of resource provision and usage. In this paper we describe the monitoring systems used for the many different aspects of the system, and how some of them are being integrated together.

012014

We have developed an interface within the ALICE analysis framework that allows transparent usage of the experiment's distributed resources. This analysis plug-in makes it possible to configure back-end specific parameters from a single interface and to run with no change the same custom user analysis in many computing environments, from local workstations to PROOF clusters or GRID resources. The tool is used now extensively in the ALICE collaboration for both end-user analysis and large scale productions.

012015

When ultra-high energy cosmic rays enter the atmosphere they interact producing extensive air showers (EAS) which are the objects studied by the Pierre Auger Observatory. The number of particles involved in an EAS at these energies is of the order of billions and the generation of a single simulated EAS requires many hours of computing time with current processors. In addition, the storage space consumed by the output of one simulated EAS is very high. Therefore we have to make use of Grid resources to be able to generate sufficient quantities of showers for our physics studies in reasonable time periods. We have developed a set of highly automated scripts written in common software scripting languages in order to deal with the high number of jobs which we have to submit regularly to the Grid. In spite of the low number of sites supporting our Virtual Organization (VO) we have reached the top spot on CPU consumption among non LHC (Large Hadron Collider) VOs within EGI (European Grid Infrastructure).

012016

Following a previous publication [1], this study aims at investigating the impact of the regional affiliations of centres on the organisation of collaboration within the ALICE Distributed Computing infrastructure, based on social network methods. A self-administered questionnaire was sent to all centre managers about support, email interactions and wished-for collaborations in the infrastructure. Several additional measures stemming from technical observations, such as bandwidth, data transfers and Internet Round Trip Time (RTT), were also included. Information for 50 centres was considered (60% response rate). The empirical analysis shows that, despite the centralisation on CERN, the network is highly organised by regions. The results are discussed in the light of policy and efficiency issues.

012017

Current High Energy and Nuclear Physics (HENP) libraries and frameworks were written before multicore systems became widely deployed and used. From this environment, a 'single-thread' processing model naturally emerged, but the implicit assumptions it encouraged are greatly impairing our ability to scale in a multicore/manycore world. While parallel programming - still in an intensive phase of R&D despite the 30+ years of literature on the subject - is an obvious topic to consider, other issues (build scalability, code clarity, code deployment and ease of coding) are worth investigating when preparing for the manycore era. Moreover, if one wants to use a language other than C++, one better prepared and tailored for expressing concurrency, one also needs to ensure a good and easy reuse of already field-proven libraries. We present the work resulting from such investigations applied to the Go programming language. We first introduce the concurrent programming facilities Go provides and how its module system addresses the build scalability and dependency hell issues. We then describe the process of leveraging the many (wo)man-years put into scientific Fortran/C/C++ libraries and making them available to the Go ecosystem. The ROOT data analysis framework, the C-BLAS library and the Herwig-6 Monte Carlo generator are taken as examples. Finally, the performance of the tools involved in a small analysis written in Go and using the ROOT I/O library is presented.

012018

The shared memory architecture of multicore CPUs provides HEP developers with the opportunity to reduce the memory footprint of their applications by sharing memory pages between the cores in a processor. ATLAS pioneered the multi-process approach to parallelize HEP applications. Using Linux fork() and the Copy On Write mechanism we implemented a simple event task farm, which allowed us to achieve sharing of almost 80% of memory pages among event worker processes for certain types of reconstruction jobs with negligible CPU overhead. By leaving the task of managing shared memory pages to the operating system, we have been able to parallelize large reconstruction and simulation applications originally written to be run in a single thread of execution with little to no change to the application code. The process of validating AthenaMP for production took ten months of concentrated effort and is expected to continue for several more months. Besides validating the software itself, an important and time-consuming aspect of running multicore applications in production was to configure the ATLAS distributed production system to handle multicore jobs. This entailed defining multicore batch queues, where the unit resource is not a core, but a whole computing node; monitoring the output of many event workers; and adapting the job definition layer to handle computing resources with different event throughputs. We will present scalability and memory usage studies, based on data gathered both on dedicated hardware and at the CERN Computer Center.
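A sketch of the fork-after-initialisation pattern the abstract describes, under the assumption that a plain Python translation is acceptable for illustration (CPython's reference counting writes to pages and therefore shares less than a C++ framework such as AthenaMP does; all names here are hypothetical):

```python
# Fork-after-initialisation sketch: the parent performs the expensive setup once,
# then forks event workers that see those memory pages copy-on-write.
# Unix only (os.fork); this only illustrates the pattern, not the ATLAS code.
import os

def initialise():
    # stands in for geometry, conditions data, magnetic field map, ...
    return list(range(5_000_000))

def process_events(worker_id, events, shared_state):
    for _ in events:
        pass  # reconstruct/simulate each event using the shared, read-only state
    print(f"worker {worker_id} processed {len(events)} events")

def main(n_workers=4, n_events=100):
    shared_state = initialise()                  # done once, before forking
    events = list(range(n_events))
    for w in range(n_workers):
        if os.fork() == 0:                       # child: copy-on-write view of parent pages
            process_events(w, events[w::n_workers], shared_state)
            os._exit(0)
    for _ in range(n_workers):
        os.wait()                                # parent collects the workers

if __name__ == "__main__":
    main()
```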

012019

In order to optimize the use and management of computing centres, their conversion to cloud facilities is becoming increasingly popular. In a medium to large cloud facility, many different virtual clusters may compete for the same resources: unused resources can be freed either by turning off idle virtual machines, or by lowering the resources assigned to a virtual machine at runtime. PROOF, a ROOT-based parallel and interactive analysis framework, is officially endorsed in the computing model of the ALICE experiment as complementary to the Grid, and it has become very popular over the last three years. The locality of PROOF-based analysis facilities forces system administrators to scavenge resources, yet the chaotic nature of user analysis tasks makes their usage unstable and inconstant, making PROOF a typical use-case for HPC cloud computing. Currently, PoD dynamically and easily provides a PROOF-enabled cluster by submitting agents to a job scheduler. Unfortunately, a Tier-2 cannot comfortably share the same queue between interactive and batch jobs, due to the very large average time to completion of the latter: an elastic cloud approach would enable interactive virtual machines to temporarily take resources from the batch ones, without a noticeable impact on them. In this work we describe our setup of a dynamic PROOF-based cloud analysis facility based on PoD and OpenNebula, orchestrated by a simple and lightweight control daemon that makes virtualization transparent for the user.

012020

The PROOF benchmark suite is a new utility suite for measuring the performance and scalability of PROOF. Its primary goal is to determine optimal configuration parameters for a set of machines to be used as a PROOF cluster. The suite measures the performance of the cluster for a set of standard tasks as a function of the number of effective processes. Cluster administrators can use the suite to measure the performance of the cluster and find optimal configuration parameters. PROOF developers can also use the suite to measure performance, identify problems and improve their software. In this paper, the new tool is explained in detail and use cases are presented to illustrate it.

012021

Traditional relational databases have not always been well matched to the needs of data-intensive sciences, but efforts are underway within the database community to attempt to address many of the requirements of large-scale scientific data management. One such effort is the open-source project SciDB. Since its earliest incarnations, SciDB has been designed for scalability in parallel and distributed environments, with a particular emphasis upon native support for array constructs and operations. Such scalability is of course a requirement of any strategy for large-scale scientific data handling, and array constructs are certainly useful in many contexts, but these features alone do not suffice to qualify a database product as an appropriate technology for hosting particle physics or cosmology data. In what constitutes its 1.0 release in June 2011, SciDB has extended its feature set to address additional requirements of scientific data, with support for user-defined types and functions, for data versioning, and more. This paper describes an evaluation of the capabilities of SciDB for two very different kinds of physics data: event-level metadata records from proton collisions at the Large Hadron Collider (LHC), and the output of cosmological simulations run on very-large-scale supercomputers. This evaluation exercises the spectrum of SciDB capabilities in a suite of tests that aim to be representative and realistic, including, for example, definition of four-vector data types and natural operations thereon, and computational queries that match the natural use cases for these data.

012022

Massive data processing in a multi-collaboration environment with geographically spread, diverse facilities will hardly be "fair" to users, or use network bandwidth efficiently, unless planning and reasoning about data movement and placement are addressed. Coordinated data resource sharing and efficient plans that solve the data transfer problem in a dynamic way are increasingly required. We present work whose purpose is to design and develop an automated planning system acting as a centralized decision-making component, with emphasis on optimization, coordination and load-balancing.

We describe the most important optimization characteristics and a modeling approach based on "constraints". The constraint-based approach allows for a natural declarative formulation of what must be satisfied, without expressing how. The architecture of the system, the communication between components and the execution of the plan by the underlying data transfer tools are shown. We emphasize the separation of the planner from the "executors" and explain how to keep the proper balance between being deliberative and reactive. An extension of the model covering the full coupling with, and reasoning about, computing resources is also shown.

The system has been deployed within the STAR experiment over several Tier sites and has been used for data movement in support of user analyses and production processing. We present several real use-case scenarios and the performance of the system, with a comparison to the "traditional" methods solved by hand. The benefits in terms of shorter data delivery times, obtained by leveraging available network paths and intermediate caches, are shown. Finally, we outline several possible enhancements and avenues for future work.

012023

We describe parallel implementations of an algorithm used to evaluate the likelihood function used in data analysis. The implementations run, respectively, on CPU, on GPU, and on both devices cooperatively (hybrid). The CPU and GPU implementations are based on OpenMP and OpenCL, respectively. The hybrid implementation allows the application to run also on multi-GPU systems (not necessarily of the same type). The hybrid case uses a scheduler so that the workload needed for the evaluation of the function is split and balanced into corresponding sub-workloads executed in parallel on each device, i.e. CPU and GPU, or multiple GPUs. We present the results of the scalability when running on CPU. Then we show the comparison of the performance of the GPU implementation on different hardware systems from different vendors, and the performance when running in the hybrid case. The tests are based on likelihood functions from real data analyses carried out in the high energy physics community.
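A minimal sketch of the workload-splitting idea for a negative log-likelihood sum, here across CPU processes only and with a toy Gaussian model; the paper's implementation uses OpenMP and OpenCL rather than this Python stand-in:

```python
# Split a negative log-likelihood sum into sub-workloads evaluated in parallel,
# in the spirit of the hybrid scheduler described above (CPU processes only here).
import numpy as np
from multiprocessing import Pool

def partial_nll(args):
    data_chunk, mu, sigma = args
    # Gaussian model purely for illustration; a real analysis uses the full PDF
    return 0.5 * np.sum(((data_chunk - mu) / sigma) ** 2 + np.log(2 * np.pi * sigma**2))

def nll(data, mu, sigma, n_workers=4):
    chunks = np.array_split(data, n_workers)     # balance the workload across devices
    with Pool(n_workers) as pool:
        return sum(pool.map(partial_nll, [(c, mu, sigma) for c in chunks]))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(0.5, 1.2, size=1_000_000)
    print(nll(data, 0.5, 1.2))
```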

012024

A hybrid approach based on the combination of three Tausworthe generators and one linear congruential generator for pseudo-random number generation for GPU programming, as suggested in the NVIDIA CUDA library, has been used for Monte Carlo sampling. On each GPU thread, a random seed is generated on the fly in a simple way using the quick-and-dirty algorithm, where the mod operation is not performed explicitly because unsigned integer overflow provides it implicitly. Using this hybrid generator, multivariate correlated sampling based on the alias technique has been carried out using both the CUDA and OpenCL languages.
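For reference, a CPU-side Python transcription of a combined three-Tausworthe-plus-LCG generator of the kind referred to above; the parameter values follow the commonly published hybrid formulation and are assumptions here, and on a GPU each thread would hold its own four-word state:

```python
# Combined three-Tausworthe + LCG generator (CPU-side sketch with explicit
# 32-bit masking standing in for unsigned integer overflow on the GPU).
MASK = 0xFFFFFFFF

def taus_step(z, s1, s2, s3, m):
    b = (((z << s1) & MASK) ^ z) >> s2
    return (((z & m) << s3) & MASK) ^ b

def lcg_step(z, a=1664525, c=1013904223):
    return (a * z + c) & MASK          # "quick and dirty" LCG, mod 2^32 via overflow

def hybrid_taus(state):
    z1, z2, z3, z4 = state
    z1 = taus_step(z1, 13, 19, 12, 0xFFFFFFFE)
    z2 = taus_step(z2, 2, 25, 4, 0xFFFFFFF8)
    z3 = taus_step(z3, 3, 11, 17, 0xFFFFFFF0)
    z4 = lcg_step(z4)
    state[:] = [z1, z2, z3, z4]
    return ((z1 ^ z2 ^ z3 ^ z4) & MASK) * 2.3283064365386963e-10  # uniform in [0, 1)

state = [129281, 362436069, 123456789, 987654321]   # Tausworthe seeds must exceed 128
print([round(hybrid_taus(state), 6) for _ in range(5)])
```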

012025

In-line holographic imaging is used for small particulates, such as cloud or spray droplets, marine plankton, and alluvial sediments, and enables a true 3D object field to be recorded at high resolution over a considerable depth. To reconstruct a digital hologram a 2D FFT must be calculated for every depth slice desired in the replayed image volume. A typical in-line hologram of ∼100 micrometre-sized particles over a depth of a few hundred millimetres will require O(1000) 2D FFT operations to be performed on a hologram of typically a few million pixels. In previous work we have reported on our experiences with reconstruction on a computational grid. In this paper we discuss the technical challenges in making efficient use of the NVIDIA Tesla and Fermi GPU systems and show how our reconstruction code was optimised for near real-time video slice reconstruction with holograms as large as 4K by 4K pixels. We also consider the implications for grid and cloud computing approaches to hologram replay, and the extent to which a GPU can replace these approaches, when the important step of locating focussed objects within a reconstructed volume is included.
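An illustrative numpy version of the per-slice replay described above, with one forward 2D FFT of the hologram and one inverse 2D FFT per depth using an angular-spectrum kernel; the wavelength, pixel pitch and depths are example values, not those of the paper:

```python
# One forward 2D FFT of the hologram, then one inverse 2D FFT per requested depth,
# using an angular-spectrum propagation kernel (evanescent components dropped).
import numpy as np

def reconstruct_slices(hologram, wavelength, pixel_pitch, depths):
    ny, nx = hologram.shape
    fx = np.fft.fftfreq(nx, d=pixel_pitch)
    fy = np.fft.fftfreq(ny, d=pixel_pitch)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2 * np.pi / wavelength * np.sqrt(np.clip(arg, 0.0, None))
    H = np.fft.fft2(hologram)                              # computed once
    for z in depths:
        yield np.abs(np.fft.ifft2(H * np.exp(1j * kz * z)))  # refocused slice intensity

holo = np.random.rand(512, 512)                            # stand-in for a recorded hologram
slices = list(reconstruct_slices(holo, 532e-9, 7.4e-6, np.arange(0.05, 0.30, 0.05)))
print(len(slices), slices[0].shape)
```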

012026

Data from high-energy physics experiments are collected with significant financial and human effort and are mostly unique. However, until recently no coherent strategy existed for data preservation and re-use, and many important and complex data sets have simply been lost. While the current focus is on the LHC at CERN, in the current period several important and unique experimental programs at other facilities are coming to an end, including those at HERA, b-factories and the Tevatron. To address this issue, an inter-experimental study group on HEP data preservation and long-term analysis (DPHEP) was convened at the end of 2008. The group now aims to publish a full and detailed review of the present status of data preservation in high energy physics. This contribution summarises the results of the DPHEP study group, describing the challenges of data preservation in high energy physics and the group's first conclusions and recommendations. The physics motivation for data preservation, generic computing and preservation models, technological expectations and governance aspects at local and international levels are examined.

012027

Preserving data from past experiments and preserving the ability to perform analysis with old data is of growing importance in many domains of science, including High Energy Physics (HEP). A study group on this issue, DPHEP, has been established in this field to provide guidelines and a structure for international collaboration on data preservation projects in HEP. This contribution presents a framework that allows experimentalists to validate their software against a previously defined set of tests in an automated way. The framework has been designed with a special focus on longevity, as it makes use of open protocols, has a modular design and is based on simple communication mechanisms. On the fabrics side, tests are carried out in a virtual environment using a cloud infrastructure. Within the framework, it is easy to run validation tests on different hardware platforms, or on different major or minor versions of operating systems. Experts from IT or the experiments can automatically detect failures in the test procedure with the help of reporting tools. Hence, appropriate actions can be taken in a timely manner. The design and important implementation aspects of the framework are shown, and first experiences from early-bird users are presented.

Data analysis – algorithms and tools

012028

Multivariate analysis (MVA) methods, especially discrimination techniques such as neural networks, are key ingredients in modern data analysis and play an important role in high energy physics. They are usually trained on simulated Monte Carlo (MC) samples to discriminate so-called "signal" from "background" events and are then applied to data to select real events of signal type. We here address procedures that improve this workflow: first, the enhancement of data/MC agreement by reweighting MC samples on a per-event basis; then, the training of MVAs on real data using the sPlot technique; finally, the construction of MVAs whose discriminator is independent of a certain control variable, i.e. cuts on this variable will not change the discriminator shape.
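A minimal sketch of per-event reweighting from a binned data/MC ratio of a control variable, as a generic illustration of the first procedure mentioned; it is not the specific reweighting scheme of the paper:

```python
# Derive data/MC weights from the binned ratio of a control variable and attach
# them to the MC events that would be used for MVA training.
import numpy as np

def ratio_weights(mc_values, data_values, bins):
    mc_hist, edges = np.histogram(mc_values, bins=bins, density=True)
    data_hist, _ = np.histogram(data_values, bins=edges, density=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(mc_hist > 0, data_hist / mc_hist, 1.0)
    idx = np.clip(np.digitize(mc_values, edges) - 1, 0, len(ratio) - 1)
    return ratio[idx]                     # one weight per MC event

rng = np.random.default_rng(0)
mc = rng.normal(0.0, 1.0, 100_000)        # simulated control variable
data = rng.normal(0.1, 1.1, 100_000)      # observed control variable
w = ratio_weights(mc, data, bins=50)
print(w.mean(), w.min(), w.max())
```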

012029

Tau leptons play an important role in the physics program of the LHC. They are being used in electroweak measurements, in detector related studies and in searches for new phenomena like the Higgs boson or Supersymmetry. In the detector, tau leptons are reconstructed as collimated jets with low track multiplicity. Due to the background from QCD multijet processes, efficient tau identification techniques with large fake rejection are essential. Since single variable criteria are not enough to efficiently separate them from jets and electrons, modern multivariate techniques are used. In ATLAS, several advanced algorithms are applied to identify taus, including a projective likelihood estimator and boosted decision trees. All multivariate methods applied to the ATLAS simulated data perform better than the baseline cut analysis. Their performance is shown using high energy data collected at the ATLAS experiment. The improvement ranges from a factor of 2 to 5 in rejection for the same efficiency, depending on the selected efficiency operating point and the number of prongs in the tau decay. The strengths and weaknesses of each technique are also discussed.

012030

This paper presents the latest results from the Ringer algorithm, which is based on artificial neural networks for the electron identification at the online filtering system of the ATLAS particle detector, in the context of the LHC experiment at CERN. The algorithm performs topological feature extraction using the ATLAS calorimetry information (energy measurements). The extracted information is presented to a neural network classifier. Studies showed that the Ringer algorithm achieves high detection efficiency, while keeping the false alarm rate low. Optimizations, guided by detailed analysis, reduced the algorithm execution time by 59%. Also, the total memory necessary to store the Ringer algorithm information represents less than 6.2 percent of the total filtering system amount.

012031

Background properties in experimental particle physics are typically estimated from control samples corresponding to large numbers of events. This can provide precise knowledge of average background distributions, but typically does not take into account statistical fluctuations in a data set of interest. A novel approach based on mixture model decomposition is presented, as a way to extract additional information about statistical fluctuations from a given data set with a view to improving on knowledge of background distributions obtained from control samples. Events are treated as heterogeneous populations comprising particles originating from different processes, and individual particles are mapped to a process of interest on a probabilistic basis. The proposed approach makes it possible to estimate features of the background distributions from the data, and to extract information about statistical fluctuations that would otherwise be lost using traditional supervised classifiers trained on high-statistics control samples. A feasibility study on Monte Carlo is presented, together with a comparison with existing techniques. Finally, the prospects for the development of tools for intensive offline analysis of individual interesting events at the Large Hadron Collider are discussed.

012032

Most classification algorithms used in high energy physics fall under the category of supervised machine learning. Such methods require a training set containing both signal and background events and are prone to classification errors should the training data be systematically inaccurate, for example due to the assumed MC model. To complement such model-dependent searches, we propose an algorithm based on semi-supervised anomaly detection techniques, which does not require a MC training sample for the signal data. We first model the background using a multivariate Gaussian mixture model. We then search for deviations from this model by fitting to the observations a mixture of the background model and a number of additional Gaussians. This allows us to perform pattern recognition of any anomalous excess over the background. We show by a comparison to neural network classifiers that such an approach is considerably more robust against misspecification of the signal MC than supervised classification. In cases where there is an unexpected signal, a neural network might fail to correctly identify it, while anomaly detection does not suffer from such a limitation. On the other hand, when there are no systematic errors in the training data, both methods perform comparably.
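A simplified sketch of the background-model-plus-extra-component idea using scikit-learn Gaussian mixtures; unlike the method described above, the background parameters are refitted freely here rather than held fixed, and all numbers are toy values:

```python
# Model the background alone with a Gaussian mixture, then refit the observed
# sample with extra components and compare how well each model describes the data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
background = rng.normal(0, 1, size=(20000, 2))           # background-only control sample
signal = rng.normal(3, 0.3, size=(500, 2))               # unexpected localized excess
observed = np.vstack([rng.normal(0, 1, size=(5000, 2)), signal])

bkg_model = GaussianMixture(n_components=3, random_state=0).fit(background)

# Refit with one additional component free to absorb any anomalous excess
combined = GaussianMixture(n_components=4, random_state=0).fit(observed)

print("background-only mean log-likelihood:", bkg_model.score(observed))
print("background+extra mean log-likelihood:", combined.score(observed))
```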

012033

The ATLAS hadronic tau trigger plays an important role in many analyses, among them searches for H0, H±, W', and Z' in the tau decay channel. In order to achieve the needed sensitivity in these measurements it is important to reduce the QCD background while keeping the signal efficiency high. Furthermore it is important to understand the trigger efficiency in real data. This paper summarizes the performance of the tau trigger in data collected by the ATLAS detector in 2011.

012034

A sophisticated trigger system, capable of real-time track reconstruction, is used in the ATLAS experiment to select interesting events in the proton-proton collisions at the Large Hadron Collider at CERN. A set of b-jet triggers was activated in ATLAS for the entire 2011 data-taking campaign and successfully selected events enriched in jets arising from heavy-flavour quarks. Such triggers were demonstrated to be crucial for the selection of events with no lepton signature and a large jet multiplicity. An overview of the track reconstruction and online b-jet selection with performance estimates from data is presented in these proceedings.

012035

In order to reach the track parameter accuracy motivated by the physics goals of the experiment, the ATLAS tracking system needs to determine accurately its almost 700,000 degrees of freedom. The demanded precision for the alignment of the silicon sensors is below 10 μm. The implementation of the track-based alignment within the ATLAS software framework unifies different alignment approaches and allows the alignment of all tracking subsystems together. The alignment software relies on the tracking information (track-hit residuals) but also includes the capability to set constraints on the beam-spot and primary vertex as well as on the momentum measured by the Muon System or on E/p using the calorimetry information. The alignment chain starts at the trigger level, where a stream of high-pT isolated tracks is selected online. A cosmic-ray trigger is also enabled while ATLAS is recording collision data, so that a stream of cosmic-ray tracks is recorded with exactly the same detector operating conditions as the normal collision tracks. We present results of the alignment of the ATLAS tracker using the 2011 collision data. The validation of the alignment is performed using track-hit residuals as well as more advanced physics observables. The results of the alignment with real data reveal that the attained precision for the alignment parameters is approximately 5 μm.

012036

The CMS all-silicon tracker consists of 16 588 modules with 25 684 sensors in total. In 2010 it was successfully aligned using tracks from cosmic rays and proton-proton collisions, following the time-dependent movements of its innermost pixel layers. In 2011, ultimate local precision is achieved by determining sensor curvatures in addition to module shifts and rotations, challenging the alignment procedure to determine about 200 000 parameters. This is achieved in a global fit approach using Millepede II with the General Broken Lines track model. Remaining alignment uncertainties are dominated by systematic effects that bias track parameters by an amount relevant for physics analyses. These effects are controlled by including information about the Z boson mass in the fit.

012037

The Tile Barrel Calorimeter (TileCal) is the central section of the hadronic calorimeter of ATLAS. It is a key detector for the reconstruction of hadrons, jets, taus and missing transverse energy, and it assists the muon measurements due to a low signal-to-noise ratio. The energy deposited in each cell is read out by two electronic channels for redundancy and is estimated by reconstructing the amplitude of the digitized signal pulse sampled every 25 ns. This work presents an alternative approach for TileCal signal detection and amplitude estimation under low signal-to-noise ratio (SNR) conditions, exploring the applicability of a Matched Filter. The proposed method is compared to the Optimal Filter algorithm, which is currently used at TileCal for energy reconstruction. The results for a simulated data set showed that, for conditions where the signal pedestal can be considered stationary, the proposed method achieves better SNR performance than the Optimal Filter technique.
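A small numerical sketch of matched-filter amplitude estimation for a sampled pulse, assuming a known normalised pulse shape and noise covariance and a pedestal-subtracted readout; the shape and noise values below are illustrative, not TileCal constants:

```python
# Matched-filter amplitude estimate for a sampled pulse:
# A = s^T C^{-1} x / (s^T C^{-1} s), which maximises the SNR for stationary noise.
import numpy as np

def matched_filter_amplitude(samples, shape, noise_cov):
    w = np.linalg.solve(noise_cov, shape)         # C^{-1} s
    return float(w @ samples) / float(w @ shape)

shape = np.array([0.0, 0.12, 0.55, 1.00, 0.63, 0.23, 0.08])  # normalised pulse, 25 ns spacing
true_amplitude = 40.0
rng = np.random.default_rng(3)
noise_cov = np.eye(len(shape)) * 1.5**2                      # white noise for simplicity
samples = true_amplitude * shape + rng.multivariate_normal(np.zeros(len(shape)), noise_cov)
print(matched_filter_amplitude(samples, shape, noise_cov))
```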

012038

The concept of "particle flow" has been developed to optimise the jet energy resolution by distinguishing the different jet components. A highly granular calorimeter designed for the particle flow algorithm provides an unprecedented level of detail for the reconstruction of calorimeter showers and enables new approaches to shower analysis. In this paper the measurement and use of the fractal dimension of showers is described. The fractal dimension is a characteristic number that measures the global compactness of the shower. It is highly dependent on the primary particle type and energy. Its application in identifying particles and estimating their energy is described in the context of a calorimeter designed for the International Linear Collider.
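A box-counting sketch of a shower compactness ("fractal dimension") measure: count occupied cells at increasingly coarse regroupings of the hit pattern and fit the slope of log N against log(1/scale). This is a generic illustration, not the paper's exact definition:

```python
# Box-counting estimate of a shower fractal dimension from a boolean 3D hit map.
import numpy as np

def fractal_dimension(hits, scales=(1, 2, 4, 8)):
    """hits: boolean 3D array of fired calorimeter cells."""
    counts = []
    for s in scales:
        nz, ny, nx = (dim // s for dim in hits.shape)
        coarse = hits[:nz*s, :ny*s, :nx*s].reshape(nz, s, ny, s, nx, s).any(axis=(1, 3, 5))
        counts.append(coarse.sum())                 # occupied cells at this granularity
    slope, _ = np.polyfit(np.log(1.0 / np.array(scales)), np.log(counts), 1)
    return slope

rng = np.random.default_rng(4)
shower = rng.random((48, 32, 32)) < 0.02             # sparse stand-in for a shower hit map
print(fractal_dimension(shower))
```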

012039

Visual Physics Analysis (VISPA) is an analysis environment with applications in high energy and astroparticle physics. Based on a data-flow-driven paradigm, it allows users to combine graphical steering with self-written C++ and Python modules. This contribution presents new concepts integrated in VISPA: layers, convenient analysis execution, and web-based physics analysis. While the convenient execution offers full flexibility to vary settings for the execution phase of an analysis, layers allow users to create different views of the analysis already during its design phase. Thus, one application of layers is to define different stages of an analysis (e.g. event selection and statistical analysis). However, there are other use cases, such as independently optimizing settings for different types of input data in order to guide all data through the same analysis flow. The new execution feature makes job submission to local clusters as well as to the LHC Computing Grid possible directly from VISPA. Web-based physics analysis is realized in the VISPA@Web project, which represents a whole new way to design and execute analyses via a standard web browser.

012040

Based on the ROOT TEve/TGeo classes and the standard linear collider data structure, a dedicated linear collider event display has been developed. It supports the latest detector models for both International Linear Collider and Compact Linear Collider as well as the CALICE test beam prototypes. It can be used to visualise event information at the generation, simulation and reconstruction levels. Many options are provided in an intuitive interface. It has been heavily employed in a variety of analyses.

012041

This paper introduces a probability density estimator based on Green's function identities. A density model is constructed under the sole assumption that the probability density is differentiable. The method is implemented as a binary likelihood estimator for classification purposes, so issues such as mis-modeling and overtraining are also discussed. The identity behind the density estimator can be interpreted as a real-valued, non-scalar kernel method which is able to reconstruct differentiable density functions.

012042

We present a new approach to simulate Beyond-Standard-Model (BSM) processes which are defined by multiple parameters. In contrast to the traditional grid-scan method where a large number of events are simulated at each point of a sparse grid in the parameter space, this new approach simulates only a few events at each of a selected number of points distributed randomly over the whole parameter space. In subsequent analysis, we rely on the fitting by the Bayesian Neural Network (BNN) technique to obtain accurate estimation of the acceptance distribution. With this new approach, the signal yield can be estimated continuously, while the required number of simulation events is greatly reduced.

012043

A frequently faced task in experimental physics is to measure the probability distribution of some quantity. Often this quantity is smeared by a non-ideal detector response or by some physical process. The procedure of removing this smearing effect from the measured distribution is called unfolding, and is a delicate problem in signal processing due to the well-known numerical ill-behaviour of this task. Various methods have been invented which, given some assumptions on the initial probability distribution, try to regularize the unfolding problem. Most of these methods introduce a definite bias into the estimate of the initial probability distribution. We propose a linear iterative method (motivated by the Neumann series / Landweber iteration known in functional analysis), which has the advantage that no assumptions on the initial probability distribution are needed, and the only regularization parameter is the stopping order of the iteration, which can be used to choose the best compromise between the introduced bias and the propagated statistical and systematic errors. The method is consistent: "binwise" convergence to the initial probability distribution is proved in the absence of measurement errors under a quite general condition on the response function. This condition holds for practical applications such as convolutions, calorimeter response functions, momentum reconstruction response functions based on tracking in a magnetic field, etc. In the presence of measurement errors, explicit formulae for the propagation of the three important error terms are provided: the bias error (the distance from the unknown, to-be-reconstructed initial distribution at a finite iteration order), the statistical error, and the systematic error. A trade-off between these three error terms can be used to define an optimal iteration stopping criterion, and the errors can be estimated there. We provide a numerical C library implementing the method, which incorporates automatic statistical error propagation as well. The proposed method is also discussed in the context of other known approaches.
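A toy numerical sketch of a linear iterative (Landweber/Neumann-series type) unfolding, with the iteration order as the only regularisation parameter; the response matrix below is an invented example, and the paper's C library is not used:

```python
# Starting from the measured spectrum, repeatedly add the residual of the folded
# estimate; the stopping order controls the bias/variance trade-off.
import numpy as np

def iterative_unfold(measured, response, n_iterations):
    x = measured.copy()                      # first approximation: the measured spectrum
    for _ in range(n_iterations):
        x = x + (measured - response @ x)    # Neumann-series update x_{k+1} = x_k + (y - R x_k)
    return x

n = 20
true = np.exp(-np.arange(n) / 5.0)
# toy smearing response: mild migration to neighbouring bins
response = 0.7 * np.eye(n) + 0.15 * np.eye(n, k=1) + 0.15 * np.eye(n, k=-1)
measured = response @ true
for order in (1, 5, 20):
    print(order, np.abs(iterative_unfold(measured, response, order) - true).sum())
```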

012044

Adaptive Metropolis (AM) is a powerful recent algorithmic tool in numerical Bayesian data analysis. AM builds on a well-known Markov Chain Monte Carlo algorithm but optimizes the rate of convergence to the target distribution by automatically tuning the design parameters of the algorithm on the fly. Label switching is a major problem in inference on mixture models because of the invariance to symmetries. The simplest (non-adaptive) solution is to modify the prior in order to make it select a single permutation of the variables, introducing an identifiability constraint. This solution is known to cause artificial biases by not respecting the topology of the posterior. In this paper we describe an online relabeling procedure which can be incorporated into the AM algorithm. We give elements of convergence of the algorithm and identify the link between its modified target measure and the original posterior distribution of interest. We illustrate the algorithm on a synthetic mixture model inspired by the muonic water Cherenkov signal of the surface detectors in the Pierre Auger Experiment.
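A compact sketch of the Adaptive Metropolis idea (proposal covariance tuned on the fly from the chain history, following the usual Haario-type scaling); the online relabeling step that is the subject of the paper is not reproduced here, and the target below is a toy distribution:

```python
# Random-walk Metropolis whose proposal covariance adapts to the chain history.
import numpy as np

def adaptive_metropolis(log_post, x0, n_steps, eps=1e-6, sd=None):
    rng = np.random.default_rng(5)
    d = len(x0)
    sd = sd if sd is not None else 2.4**2 / d        # Haario et al. scaling
    chain = [np.asarray(x0, dtype=float)]
    cov = np.eye(d)
    for step in range(1, n_steps):
        if step > 100:                               # start adapting after a short burn-in
            cov = np.cov(np.array(chain).T) + eps * np.eye(d)
        prop = rng.multivariate_normal(chain[-1], sd * cov)
        if np.log(rng.random()) < log_post(prop) - log_post(chain[-1]):
            chain.append(prop)                       # accept
        else:
            chain.append(chain[-1].copy())           # reject, repeat current point
    return np.array(chain)

# Toy target: a correlated 2D Gaussian
prec = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))
log_post = lambda x: -0.5 * x @ prec @ x
samples = adaptive_metropolis(log_post, [3.0, -3.0], 5000)
print(samples.mean(axis=0), np.cov(samples.T))
```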

012045

Monte Carlo sampling of two-dimensional correlated variables (with non-zero covariance) has been carried out using an extended alias technique, originally proposed by A. J. Walker to sample from a one-dimensional distribution. Although the method has been applied here to a correlated two-dimensional Gaussian data sample, it is quite general and can easily be extended for sampling from a multidimensional correlated data sample of any arbitrary distribution.
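A sketch of Walker's alias method, the building block extended in this paper: the tables are built once and each draw then costs O(1); a binned, correlated 2D distribution can be handled by flattening its joint probabilities into one table and mapping drawn indices back to (i, j). The probabilities below are toy values:

```python
# Walker alias method: build probability/alias tables, then sample in O(1) per event.
import numpy as np

def build_alias(probs):
    n = len(probs)
    scaled = np.asarray(probs, dtype=float) * n / np.sum(probs)
    prob, alias = np.zeros(n), np.zeros(n, dtype=int)
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:
        prob[i] = 1.0
    return prob, alias

def sample_alias(prob, alias, rng, size):
    idx = rng.integers(len(prob), size=size)
    return np.where(rng.random(size) < prob[idx], idx, alias[idx])

rng = np.random.default_rng(6)
joint = np.array([[0.10, 0.05], [0.25, 0.60]])       # binned correlated 2D probabilities
prob, alias = build_alias(joint.ravel())
flat = sample_alias(prob, alias, rng, 100_000)
i, j = np.unravel_index(flat, joint.shape)           # map flat indices back to 2D bins
print(np.bincount(flat) / flat.size)                 # should approximate the inputs
```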

012046

When monitoring complex experiments, comparison is often made between regularly acquired histograms of data and reference histograms which represent the ideal state of the equipment. With the larger HEP experiments now ramping up, there is a need for automation of this task since the volume of comparisons could overwhelm human operators. However, the two-dimensional histogram comparison tools available in ROOT have been noted in the past to exhibit shortcomings. We discuss a newer comparison test for two-dimensional histograms, based on the Energy Test of Aslan and Zech, which provides more conclusive discrimination between histograms of data coming from different distributions than methods provided in a recent ROOT release.
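A sketch of the Aslan-Zech energy test statistic for two 2D samples (binned histograms can be compared by using weighted bin centres as points); the logarithmic distance weighting is the common choice, and the permutation machinery needed for a p-value is omitted:

```python
# Energy test statistic between two point samples in 2D; larger values indicate
# stronger incompatibility between the underlying distributions.
import numpy as np
from scipy.spatial.distance import cdist

def energy_test(a, b, eps=1e-12):
    def mean_log(x, y, same):
        d = cdist(x, y)
        if same:
            iu = np.triu_indices(len(x), k=1)
            return np.sum(-np.log(d[iu] + eps)) / len(x) ** 2
        return np.sum(-np.log(d + eps)) / (len(x) * len(y))
    return mean_log(a, a, True) + mean_log(b, b, True) - mean_log(a, b, False)

rng = np.random.default_rng(7)
sample_ref = rng.normal(0, 1, size=(2000, 2))
sample_same = rng.normal(0, 1, size=(2000, 2))
sample_shifted = rng.normal(0.2, 1, size=(2000, 2))
print("same distribution:   ", energy_test(sample_ref, sample_same))
print("shifted distribution:", energy_test(sample_ref, sample_shifted))
```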

012047

Gerda is an experiment designed to look for the neutrinoless double beta decay of 76Ge. The experiment uses an array of high-purity germanium detectors (enriched in 76Ge) directly immersed in liquid argon. Gerda is presently operating eight enriched coaxial detectors (approximately 15 kg of 76Ge), and about 25 new custom-made enriched BEGe detectors will be deployed in the next phase (an additional 20 kg of 76Ge). The paper describes the Gerda off-line analysis of the high-purity germanium detector data. Firstly we present the signal processing flow, focusing on the digital filters and on the algorithms used. Secondly we discuss the rejection of non-physical events and the data quality monitoring. The analysis is performed completely within the Gerda software framework (Gelatio), designed to support multi-channel processing and to perform a modular analysis of digital signals.

012048

Over a decade ago, the H1 Collaboration decided to embrace the object-oriented paradigm and completely redesign its data analysis model and data storage format. The event data model, based on the ROOT framework, consists of three layers - tracks and calorimeter clusters, identified particles and finally event summary data - with a singleton class providing unified access. This original solution was then augmented with a fourth layer containing user-defined objects. This contribution will summarise the history of the solutions used, from modifications to the original design, to the evolution of the high-level end-user analysis object framework which is used by H1 today. Several important issues are addressed - the portability of expert knowledge to increase the efficiency of data analysis, the flexibility of the framework to incorporate new analyses, the performance and ease of use, and lessons learned for future projects.

Computations in theoretical physics – techniques and methods

012049
The following article is Open access

We describe three algorithms for computer-aided symbolic multi-loop calculations that facilitated some recent novel results. First, we discuss an algorithm to derive the canonical form of an arbitrary Feynman integral, which facilitates the identification of equivalent integrals. Second, we present a practical solution to the problem of multi-loop analytical tensor reduction. Finally, we discuss the partial fractioning of polynomials whose variables are subject to external linear relations. All algorithms have been tested and used in real calculations.
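As a one-line illustration of the third point (our example, not one taken from the paper): when two denominators differ by a constant because of an external linear relation, the product splits as

$$\frac{1}{D_1 D_2} \;=\; \frac{1}{D_2 - D_1}\left(\frac{1}{D_1} - \frac{1}{D_2}\right), \qquad D_2 - D_1 = \text{const},$$

and repeated application of such identities reduces an expression to terms containing fewer distinct denominators.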

012051
The following article is Open access

, and

The new version of the program SecDec is described, which can be used for the extraction of poles within dimensional regularisation from multi-loop integrals as well as phase space integrals. The numerical evaluation of the resulting finite functions is also done by the program in an automated way, with no restriction on the kinematics in the case of loop integrals.
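The mechanism behind the pole extraction can be illustrated by the textbook subtraction (an illustration, not SecDec's actual output):

$$\int_0^1 \mathrm{d}x\; x^{-1+\epsilon} f(x) \;=\; \frac{f(0)}{\epsilon} \;+\; \int_0^1 \mathrm{d}x\; x^{-1+\epsilon}\bigl[f(x)-f(0)\bigr],$$

where, once sector decomposition has factorised overlapping singularities, the remaining integral is finite as ε → 0 and can be evaluated numerically order by order in ε.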

012052
The following article is Open access

Most calculations of quantum corrections in supersymmetric theories are made with dimensional reduction, a modification of dimensional regularization. However, it is well known that dimensional reduction is not self-consistent. A consistent regularization which does not break supersymmetry is the higher covariant derivative regularization. However, the integrals obtained with this regularization cannot usually be calculated analytically. We discuss the application of this regularization to calculations in supersymmetric theories. In particular, it is demonstrated that the integrals defining the β-function appear to be integrals of total derivatives. This feature makes it possible to explain the origin of the exact NSVZ β-function, which relates the β-function to the anomalous dimensions of the matter superfields. However, the integrals for the anomalous dimension must still be calculated numerically.
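For reference, the exact NSVZ β-function mentioned in the text is usually quoted in the form

$$\beta(\alpha) \;=\; -\,\frac{\alpha^{2}\Bigl[\,3\,C_{2}(G) - \sum_{i} T(R_{i})\bigl(1-\gamma_{i}(\alpha)\bigr)\Bigr]}{2\pi\bigl(1 - C_{2}(G)\,\alpha/2\pi\bigr)},$$

where the γ_i are the anomalous dimensions of the matter superfields; the precise normalisation of α and of the group factors varies between conventions.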

012053
The following article is Open access

A key feature of the minimal supersymmetric extension of the Standard Model (MSSM) is the existence of a light Higgs boson, whose mass is not a free parameter but an observable that can be predicted from the theory. Given that the LHC is able to measure the mass of a light Higgs boson with very good accuracy, a lot of effort has been put into a precise theoretical prediction. We present a calculation of the SUSY-QCD corrections to this observable to three-loop order. We perform multiple asymptotic expansions in order to deal with the multi-scale three-loop diagrams, making heavy use of computer algebra and keeping a keen eye on the numerical error introduced. We provide a computer code in the form of a Mathematica package that combines our three-loop SUSY-QCD calculation with one- and two-loop corrections to the Higgs mass from the literature, providing a state-of-the-art prediction for this important observable.

012054
The following article is Open access

, and

We present additions and improvements in Version 7 of FormCalc, most notably analytic tensor reduction, choice of OPP methods, and MSSM initialization via FeynHiggs, as well as a parallelized Cuba library for numerical integration.

012055
The following article is Open access

, and

We present the publicly available program NGluon allowing the numerical evaluation of primitive amplitudes at one-loop order in massless QCD. The program allows the computation of one-loop amplitudes for an arbitrary number of gluons. The focus of the present article is the extension to one-loop amplitudes including an arbitrary number of massless quark pairs. We discuss in detail the algorithmic differences to the pure gluonic case and present cross checks to validate our implementation. The numerical accuracy is investigated in detail.

012056
The following article is Open access

, , , , , , and

The program package GoSam is presented which aims at the automated calculation of one-loop amplitudes for multi-particle processes. The amplitudes are generated in terms of Feynman diagrams and can be reduced using either D-dimensional integrand-level decomposition or tensor reduction, or a combination of both. GoSam can be used to calculate one-loop corrections to both QCD and electroweak theory, and model files for theories Beyond the Standard Model can be linked as well. A standard interface to programs calculating real radiation is also included. The flexibility of the program is demonstrated by various examples.

012057
The following article is Open access

, and

We present an algebraic approach to one-loop tensor integral reduction. The integrals are expressed in terms of scalar one- to four-point functions. The reduction is worked out explicitly up to five-point functions of rank five. The numerical C++ package PJFry evaluates tensor coefficients in terms of a basis of scalar integrals, which is provided by an external library, e.g. QCDLoop. We briefly describe the installation and use of PJFry. Examples of numerical results are shown, including a special treatment for small or vanishing inverse four-point Gram determinants. An extremely efficient application of the formalism is the immediate evaluation of complete contractions of the tensor integrals with external momenta. This leads to the problem of evaluating sums over products of signed minors with scalar products of chords, where chords are differences of external momenta. These sums may be evaluated analytically in a systematic way. The final expressions for the numerical evaluation are then compact combinations of the contributing basic scalar functions.

012058
The following article is Open access

We review the recent progress towards automation in the computation of next-to-leading order corrections to scattering amplitudes. Such progress allows for the construction of quite general, flexible and fully automated packages that would be of major importance for Higgs boson and beyond-the-Standard-Model physics searches at high-energy particle colliders.

012059
The following article is Open access

I apply commonly used regularization schemes to a multiloop calculation to examine the properties of the schemes at higher orders. I find complete consistency between the conventional dimensional regularization scheme and dimensional reduction, but I find that the four-dimensional helicity scheme produces incorrect results at next-to-next-to-leading order and singular results at next-to-next-to-next-to-leading order. It is not, therefore, a unitary regularization scheme.

012060
The following article is Open access

, and

We report results of a new numerical regularization technique for infrared (IR) divergent loop integrals using dimensional regularization, where a positive regularization parameter ε, related to the space-time dimension by d = 4 + 2ε, is introduced in the integrand to keep the integral from diverging as long as ε > 0. A sequence of integrals is computed for decreasing values of ε in order to carry out a linear extrapolation as ε → 0. Each integral in the sequence is calculated according to the Direct Computation Method (DCM), which handles (threshold) integrand singularities in the interior of the domain. The technique of this paper is applied to one-loop N-point functions. In order to simplify the computation of the integrals for small ε, particularly in the case of a threshold singularity, the N-point function is reduced numerically to a set of 3-point and 4-point integrals, and DCM is applied to the resulting vertex and box integrals.
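A minimal sketch of the extrapolation step only (the integrand, the geometric ε sequence and the choice of basis with a single pole term are assumptions of this illustration, not details taken from the paper):

```python
import numpy as np

def extrapolate_laurent(integral, eps0=0.5, ratio=0.5, n_terms=8, n_coeff=4):
    """Fit I(eps) ~ c_{-1}/eps + c_0 + c_1*eps + ... on a geometric sequence
    of eps values and return the coefficients (a linear extrapolation).

    `integral` and all sequence parameters are placeholders.
    """
    eps = eps0 * ratio ** np.arange(n_terms)
    vals = np.array([integral(e) for e in eps])
    powers = np.arange(-1, n_coeff - 1)              # exponents -1, 0, 1, ...
    design = eps[:, None] ** powers[None, :]         # basis functions eps**k
    coeffs, *_ = np.linalg.lstsq(design, vals, rcond=None)
    return dict(zip(powers, coeffs))                 # {-1: pole, 0: finite, ...}

# toy usage: I(eps) = 2/eps + 3 + eps; the pole and finite part are recovered
print(extrapolate_laurent(lambda e: 2.0 / e + 3.0 + e))
```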

012061
The following article is Open access

New features of the symbolic algebra package FORM 4 are discussed. Most importantly, these features include polynomial factorization and polynomial GCD computation. Examples of their use are shown. One of them is an exact version of Mincer which gives answers in terms of rational polynomials and 5 master integrals.

012062
The following article is Open access

, and

Octave is one of the most widely used open source tools for numerical analysis and linear algebra. Our project aims to improve Octave by introducing support for GPU computing in order to speed up some linear algebra operations. The core of our work is a C library that executes some BLAS operations concerning vector-vector, vector-matrix and matrix-matrix functions on the GPU. OpenCL functions are used to program GPU kernels, which are bound within the GNU Octave framework. We report on the project's implementation design and some preliminary performance results.
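For illustration, the general pattern of driving a BLAS-style GPU kernel through OpenCL can be sketched in Python with pyopencl (this is not the project's C library or its Octave bindings):

```python
import numpy as np
import pyopencl as cl

# saxpy kernel: y <- a*x + y, the simplest BLAS-style vector operation
KERNEL = """
__kernel void saxpy(const float a, __global const float *x, __global float *y)
{
    int i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prg = cl.Program(ctx, KERNEL).build()    # compile the kernel for the device

n = 1 << 20
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)

mf = cl.mem_flags
x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
y_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=y)

prg.saxpy(queue, (n,), None, np.float32(2.0), x_buf, y_buf)
result = np.empty_like(y)
cl.enqueue_copy(queue, result, y_buf)    # copy the updated vector back to the host
```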

012063
The following article is Open access

and

We present a method developed by the NNPDF Collaboration that allows the inclusion of new experimental data into an existing set of parton distribution functions without the need for a complete refit. A Monte Carlo ensemble of PDFs may be updated by assigning each member of the ensemble a unique weight determined by Bayesian inference. The reweighted ensemble therefore represents the probability density of PDFs conditional on both the old and new data. This method is applied to the inclusion of W-lepton asymmetry data into the NNPDF2.1 fit, producing a new PDF set, NNPDF2.2.
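A minimal sketch of the weight assignment (the χ² inputs are placeholders; the weight formula is the (χ²)^{(n-1)/2} e^{-χ²/2} form used in Bayesian PDF reweighting, quoted here from memory):

```python
import numpy as np

def reweight(chi2, n_data):
    """Bayesian reweighting of a Monte Carlo PDF ensemble.

    chi2   : chi^2 of each replica with respect to the new data (placeholder).
    n_data : number of new data points.
    Returns weights normalised to sum to the number of replicas.
    """
    chi2 = np.asarray(chi2, dtype=float)
    logw = 0.5 * (n_data - 1) * np.log(chi2) - 0.5 * chi2
    logw -= logw.max()                      # avoid overflow before exponentiating
    w = np.exp(logw)
    return w * len(w) / w.sum()

def effective_replicas(w):
    """Entropy-based estimate of the effective number of replicas."""
    n = len(w)
    w = np.asarray(w, dtype=float)
    w = w[w > 0]
    return np.exp(np.sum(w * np.log(n / w)) / n)
```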

012064
The following article is Open access

, and

We describe a new method to extract parton distribution functions from hard scattering processes based on Self-Organizing Maps. The extension to a larger and more complex class of soft matrix elements, including generalized parton distributions, is also discussed.
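A minimal sketch of the Self-Organizing Map training step on which such a method builds (map size, learning-rate and neighbourhood schedules are placeholder choices, not those of the paper):

```python
import numpy as np

def train_som(data, grid=(6, 6), n_iter=2000, lr0=0.5, sigma0=3.0):
    """Train a small 2D Self-Organizing Map on feature vectors `data`.

    Standard Kohonen update: find the best-matching unit (BMU) for a random
    input and pull nearby map cells towards it, with shrinking neighbourhood
    and learning rate.
    """
    data = np.asarray(data, dtype=float)
    h, w = grid
    dim = data.shape[1]
    weights = np.random.rand(h, w, dim)
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5     # decaying neighbourhood width
        x = data[np.random.randint(len(data))]
        dist2 = np.sum((weights - x) ** 2, axis=-1)
        bmu = np.unravel_index(np.argmin(dist2), (h, w))
        grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
        neighbourhood = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * neighbourhood[..., None] * (x - weights)
    return weights
```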