The following article is Open access

Launching large computing applications on a disk-less cluster


Published under licence by IOP Publishing Ltd
Citation: Rainer Schwemmer et al 2011 J. Phys.: Conf. Ser. 331 052032, DOI 10.1088/1742-6596/331/5/052032


Abstract

The LHCb Event Filter Farm is built on a cluster of roughly 1,500 disk-less Linux nodes. Each node runs one instance of the filtering application per core. In the current production environment, machines in the original cluster have 8 cores each, and machines in the cluster extension have 12.

Each instance has to load about 1,000 shared libraries, totalling 200 MB, from several directories in a central repository. The repository is currently hosted on a SAN and exported via NFS. Although all the libraries are already in the local file system cache on every node, loading them still causes a huge number of requests to the server, because the loader probes every directory on its search path. Measurements show between 100,000 and 200,000 calls per application instance start-up. Multiplied by the number of cores in the farm, this amounts to a veritable DDoS attack on the servers that lasts several minutes. Since the application is restarted frequently, a better solution had to be found.
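The scale of the problem follows from simple multiplication. The sketch below is a simplified model, not the loader's actual algorithm: it assumes the loader tries each directory of its search path in order, one lookup per directory, until the library is found, and that every failed lookup goes to the server. A real dynamic loader may issue more than one lookup per directory. The directory and library names are hypothetical. Taking the lower figures from the abstract (100,000 calls per start-up) and 8 cores for all 1,500 nodes, since the split between the old cluster and the extension is not given, a full farm restart already costs on the order of 1.2 billion lookups.

```python
# Rough model of the lookup traffic caused by application start-up.
# Assumptions (hypothetical, for illustration): the loader tries the
# directories of its search path in order, issuing one lookup per directory
# until the library is found, and no failed lookup is answered locally.

def probes_for_library(search_path, location):
    """Number of lookups needed to find a library that lives in `location`."""
    for count, directory in enumerate(search_path, start=1):
        if directory == location:
            return count  # found after `count` attempts
    raise FileNotFoundError(location)


def probes_per_instance(search_path, libraries):
    """Total lookups for one instance; `libraries` maps name -> directory."""
    return sum(probes_for_library(search_path, loc) for loc in libraries.values())


def farm_wide_requests(calls_per_instance, nodes, cores_per_node):
    """One instance runs per core, so every core repeats the start-up cost."""
    return calls_per_instance * nodes * cores_per_node


if __name__ == "__main__":
    path = ["/sw/a/lib", "/sw/b/lib", "/sw/c/lib"]  # hypothetical search path
    libs = {"libA.so": "/sw/a/lib", "libC.so": "/sw/c/lib"}
    print(probes_per_instance(path, libs))  # prints 4 (1 + 3 lookups)
    # Lower bound from the abstract: 100,000 calls, 1,500 nodes, 8 cores each.
    print(farm_wide_requests(100_000, 1_500, 8))  # prints 1200000000
```

Libraries that sit late in the search path dominate the count, which is why the probing cost grows with the number of repository directories and not only with the number of libraries.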

Rolling out the software to the nodes is out of the question, because they have no disks and the software in its entirety is too large to fit into a RAM disk. To solve this problem, we developed a FUSE-based file system that acts as a permanent, controllable cache, keeping the essential files in stock.
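The following is a hypothetical sketch of the lookup logic such a caching layer could use; it is not the authors' implementation and does not use FUSE itself. It models only the decision a getattr/open handler would make. Two assumptions are labelled here: `fetch_from_server` stands in for access to the NFS-exported repository, and the cache also remembers paths that do not exist, so that repeated probing of the search path is answered locally. Entries are never evicted (the "permanent" part), and `preload` lets an operator choose which files are kept in stock (the "controllable" part).

```python
from typing import Callable, Dict, Iterable, Optional, Set


class PathCache:
    """Permanent path cache (hypothetical sketch, not a FUSE file system)."""

    def __init__(self, fetch_from_server: Callable[[str], Optional[bytes]]):
        # Assumption: fetch_from_server returns file contents, or None if absent.
        self._fetch = fetch_from_server
        self._files: Dict[str, bytes] = {}  # contents of files kept in stock
        self._missing: Set[str] = set()     # paths known not to exist
        self.server_requests = 0            # lookups that reached the server

    def preload(self, paths: Iterable[str]) -> None:
        """Fill the cache ahead of time with files chosen by the operator."""
        for path in paths:
            self.lookup(path)

    def lookup(self, path: str) -> Optional[bytes]:
        """Return the file contents, or None if the path does not exist."""
        if path in self._files:
            return self._files[path]  # positive hit: no server traffic
        if path in self._missing:
            return None               # negative hit: no server traffic
        self.server_requests += 1
        data = self._fetch(path)
        if data is None:
            self._missing.add(path)
        else:
            self._files[path] = data
        return data


if __name__ == "__main__":
    repo = {"/sw/c/lib/libC.so": b"\x7fELF"}  # hypothetical repository
    cache = PathCache(repo.get)
    # Eight instances on one node each probe the same three directories.
    for _ in range(8):
        for directory in ["/sw/a/lib", "/sw/b/lib", "/sw/c/lib"]:
            cache.lookup(directory + "/libC.so")
    print(cache.server_requests)  # prints 3: only the first instance hits the server
```

Without the negative entries, every instance would send its two failed probes to the server again; with them, the server load per node no longer scales with the number of cores.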
