KCDC - The KASCADE Cosmic-ray Data Centre

KCDC, the KASCADE Cosmic-ray Data Centre, is a web portal where data of astroparticle physics experiments are made available to the interested public. The KASCADE experiment, financed by public money, was a large-area detector for the measurement of high-energy cosmic rays via the detection of air showers. KASCADE and its extension KASCADE-Grande finally stopped active data acquisition of all components, including the radio EAS experiment LOPES, at the end of 2012 after more than 20 years of data taking. In a first release, KCDC provides the public with the measured and reconstructed parameters of more than 160 million air showers. In addition, KCDC provides a conceptual design of how the data can be treated and processed so that they are also usable outside the community of experts in the research field. Detailed educational examples also make the data usable for high-school students and early-stage researchers.


Introduction
The aim of the project KCDC (KASCADE Cosmic-ray Data Centre) [1] is the installation and establishment of a public data centre for high-energy astroparticle physics based on the data of the KASCADE experiment (for the KCDC logo, see Figure 1). KASCADE (Figure 2) was a very successful large detector array for measuring high-energy cosmic rays via the detection of extensive air showers (EAS). KASCADE recorded data for more than 20 years at the site of KIT, Campus North, Karlsruhe, Germany (formerly Forschungszentrum Karlsruhe) at 49.1° N, 8.4° E, and 110 m a.s.l. Over its lifetime KASCADE collected more than 1.7 billion events, of which some 425,000,000 survived all quality cuts. Initially, about 160 million events were made available at KCDC for public usage.
KASCADE [2], an extensive air shower (EAS) experiment, studied the primary cosmic-ray composition and hadronic interactions in the energy range $E_0 = 10^{14}$ to $10^{17}$ eV. With its extension to KASCADE-Grande in 2003 [3], the range was extended to $10^{18}$ eV. EAS are generated when high-energy cosmic particles enter the Earth's atmosphere. Forward-boosted secondary particles, as well as light emitted in various frequency ranges during the development of the EAS, form the detectable products. The main parts of the experiment were the Grande array spread over an area of $700 \times 700\ \mathrm{m}^2$, the original KASCADE array covering $200 \times 200\ \mathrm{m}^2$ with unshielded and shielded detectors, a large hadron calorimeter, and additional muon tracking devices. The radio antenna field LOPES [4] and the microwave experiment CROME [5], as well as some smaller test experiments and monitoring equipment, completed the experimental set-up of KASCADE-Grande.
One of the main results obtained by KASCADE is that the composition becomes increasingly heavier above the 'knee' in the cosmic-ray energy spectrum, caused by a break in the spectra of the light components. Conventional acceleration and propagation models predict a change of the composition towards heavier components due to the charge-dependent cut-offs in the flux of the individual elements [6,7]. The discovery of the knee in the heavy components by KASCADE-Grande convincingly supports these theories [8].
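The charge dependence of the cut-offs can be made explicit with a textbook argument, sketched here under the common model assumption that acceleration and propagation are limited by the magnetic rigidity $R = pc/(Ze)$ (a standard assumption, not a result derived from the KASCADE data). For an ultra-relativistic nucleus $E \approx pc$, so a common maximum rigidity $R_{\max}$ for all elements implies

$$E_{\mathrm{knee}}(Z) \approx Z\,e\,R_{\max} = Z\,E_{\mathrm{knee}}(p), \qquad E_{\mathrm{knee}}(p) = e\,R_{\max},$$

where $E_{\mathrm{knee}}(p)$ is the proton knee. Iron ($Z = 26$) would then break at roughly 26 times the proton knee energy, so in the energy range between the two knees the light components fall off first and the mean composition becomes heavier.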
KASCADE-Grande finally stopped active data acquisition of all its components at the end of 2012 and has since been decommissioned. The collaboration, however, continues the detailed analysis of nearly 20 years of high-quality air-shower data. Moreover, with KCDC, we provide the edited data to the public via a custom-made web page.

KCDC in a Nutshell
The KASCADE/KASCADE-Grande experiment was a large-area detector for the measurement of cosmic-ray air showers, financed by public funds. The aim of KCDC is the installation and establishment of a public data centre for high-energy astroparticle physics. In astroparticle physics such a data release is a novelty, whereas data publication has long been established in astronomy. Consequently, there are no established concepts for how the data should be treated and processed so that they are reasonably usable outside the collaboration. The first goal of KCDC is to make the data from the KASCADE experiment available to the community. A concept for this kind of data centre (software and hardware) has meanwhile been developed, implemented, and released as a public beta version to external users. However, the project still faces open questions, e.g. how to ensure a consistent calibration, how to deal with data filtering, how to provide the data in a portable format, and how to implement a sustainable storage solution. In addition, access rights and license policy play a non-negligible role and are considered in detail. Readers are invited to visit KCDC at https://kcdc.ikp.kit.edu.
Already with its first release, KCDC aims to fulfil the following three basic requirements:
• KCDC as data provider: There is free and unlimited open access to KASCADE cosmic-ray data, where a selection of fully calibrated and reconstructed quantities per individual air shower is provided. The access has to rely on a reliable data source with guaranteed data quality.
• KCDC as information platform: For meaningful use of KCDC, a detailed description of the experiment as well as sufficient meta-information on the provided data is needed for any kind of data analysis. This is accompanied by a reasonable description of the physics background as well as tutorials pitched at a level suitable for teachers and pupils (in the present version of KCDC the tutorials are provided only in German).
• KCDC as long-term digital data archive: To constitute a sustainable piece of work, KCDC also serves as an archive of software and data for the collaboration as well as for the public.

The Web Portal
The web portal (entrance page, see Figure 3), which forms the interface between the data archive, the data centre's software and the user, is one of the most important parts of KCDC. It opens the door to the open data publication, whose baseline concept follows the 'Berlin Declaration on Open Data and Open Access' [9], which explicitly requests the use of web technologies and free, unlimited access for everyone. We declared both the scientific and the non-scientific audience to be our target users. This requires extensive documentation of the experiment, data, and software at a level that is understandable and accessible to all. The portal uses modern technologies, including standard internet access and interactive data selection. The selected data are provided for download via a dedicated FTP server. Figure 4 shows schematically the basic concept of the KCDC web portal.
It is foreseen that the software behind the data centre, including the web portal, will also be made available at a later stage. Therefore, we took care that KCDC provides a modern software solution suited both to publishing the KASCADE data and to being understandable for a general audience. If KCDC runs successfully and is accepted by the community, the software will be released as Open Source for free use by other experiments as well. To serve as a general software solution for open access to (astroparticle) data, KCDC is built as a modular, flexible framework with good scalability (e.g. to large computing centres). The configuration is kept simple and can also be done via a web interface; the entire software is based solely on Open Source Software (Python, Django, HTML/Javascript, CSS, etc.).

Data Availability
Since the first release of KCDC in November 2013, more than 160 million events of the KASCADE experiment with 14 parameters per event have been available. In the first year of operation nearly 100 users registered, and we recorded access to KCDC from IP addresses in more than 30 countries on 5 continents. After registering via the 'user page', users can enter the data shop. Registration is necessary to ensure that the 'End User License Agreement' has been read, i.e. that the legal aspects of public data are understood (see also the next section). In the data shop the user can select specific event samples. For each parameter a description is available in an info box that appears on mouse-over. After defining cuts, the user submits the selection and receives an e-mail notification once it has been processed and is ready for download via an FTP server. The data are delivered in ASCII format, including a detailed header describing the selection and the data format. Several pre-selections of KASCADE data are also available directly in the data shop.
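Once a selection has been downloaded, users may want to refine it locally with further cuts. The following is a minimal Python sketch, not part of KCDC itself, assuming (hypothetically) that header lines begin with '#' and that each event is a row of whitespace-separated numbers. The column names log10_energy and zenith_deg and all sample values are invented for illustration and are not the actual KCDC parameter names or data.

```python
import io
from typing import Callable, Dict, Iterable, List

Event = Dict[str, float]


def read_events(lines: Iterable[str], columns: List[str]) -> List[Event]:
    """Parse whitespace-separated event rows, skipping '#' header lines.

    The column order is supplied by the caller (an assumption; in practice
    it would be taken from the descriptive header of the downloaded file).
    """
    events: List[Event] = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and descriptive header lines
        values = [float(tok) for tok in stripped.split()]
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(values)}")
        events.append(dict(zip(columns, values)))
    return events


def apply_cuts(events: List[Event],
               cuts: Dict[str, Callable[[float], bool]]) -> List[Event]:
    """Keep only events for which every named cut predicate returns True."""
    return [ev for ev in events
            if all(pred(ev[name]) for name, pred in cuts.items())]


if __name__ == "__main__":
    # Toy input with a commented header followed by data rows (invented values).
    sample = io.StringIO(
        "# hypothetical KCDC-like export\n"
        "# columns: log10_energy zenith_deg\n"
        "15.2 12.0\n"
        "16.1 35.5\n"
        "14.8 20.3\n"
    )
    events = read_events(sample, ["log10_energy", "zenith_deg"])
    selected = apply_cuts(events, {
        "log10_energy": lambda e: e >= 15.0,  # energy of at least 10^15 eV
        "zenith_deg": lambda z: z < 30.0,     # fairly vertical showers only
    })
    print(len(events), len(selected))  # prints: 3 1
```

Keeping each cut as a named predicate mirrors the per-parameter cuts of the data shop; for the full sample of more than 160 million events, a columnar tool such as NumPy would be the more practical choice.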

Web pages
The data can be used for any analysis, provided that the user accepts the 'limited use license' (the following text is taken from the KCDC EULA, see also the next section): Subject to your agreement and continuing compliance with the KCDC Terms, KIT hereby grants to you a limited, personal, nonexclusive, non-transferable, non-assignable and fully revocable license to (a) use the webportal and (b) download and use the scientific data of the KCDC in compliance with good scientific practice, provided through the webportal or related online services, for your non-commercial scientific purposes only. Commercial purposes are defined as projects for your own or third parties for which you are paid or granted values in lieu of cash for the use of the data.
There is no restriction on the kind of analysis performed with the provided data or on the publication of the results. However, the KCDC team would appreciate being notified of any use beyond private education, as well as receiving bug reports or suggestions for improvements. This can be done directly via the web portal and/or by e-mail to ikp-kcdc@lists.kit.edu.

Legal Aspects of KCDC
In contrast to open-source software publication, there is as yet no standard procedure for open data publication. In cooperation with KIT and its law department we developed our own license based on the EULA (end user license agreement) model [10], adapted from the one commonly used for software. The issue was twofold, as the license has to cover both the web portal and the data. The KCDC approach is based on the EULA model because it is flexible and adaptable to our needs, it includes the requirement of good scientific practice, and it can be accepted during registration and shipped with each data package.
In our custom adaptation of the KCDC EULA we followed some key points from industry practice: (i) no liability of the owner of the web portal or the data for damage; (ii) no guarantee of availability or uptime of the server; (iii) in case of conflicts with local laws, the intention of the EULA is preserved; (iv) changes are possible at any time; (v) termination of the EULA is at our discretion only. We also included requirements that follow naturally from the open data idea: (i) free access to the data and the web portal; (ii) good scientific practice when working with the data; (iii) commercial usage of the data is not prohibited¹; (iv) citation of the collaboration, KIT, and the web portal is mandatory; (v) free redistribution of the data 'as is'.