The following article is Open access

Application of rule-based data mining techniques to real time ATLAS Grid job monitoring data


Published under licence by IOP Publishing Ltd
Citation: R Ahrens et al 2012 J. Phys.: Conf. Ser. 396 032060, DOI 10.1088/1742-6596/396/3/032060


Abstract

The Job Execution Monitor (JEM) is job-centric grid job monitoring software developed at the University of Wuppertal and integrated into the pilot-based PanDA job brokerage system, which supports physics analysis and Monte Carlo event production for the ATLAS experiment on the Worldwide LHC Computing Grid (WLCG). With JEM, job progress and grid worker node health can be supervised in real time by users, site admins and shift personnel. Imminent error conditions can be detected early, and the job's owner can initiate countermeasures immediately. Grid site admins can access aggregated data of all monitored jobs to infer the site status and to detect job and grid worker node misbehavior. Shifters can use the same aggregated data to react quickly to site error conditions and broken production tasks. In this work, the application of novel data-centric rule-based methods and data-mining techniques to the real-time monitoring data is discussed. The use of such automatic inference techniques on monitoring data to provide job and site health summary information to users and admins is presented. Finally, the provision of a secure real-time control and steering channel to the job as an extension of the presented monitoring software is considered, and a possible model of such a control method is presented.

