Implementation and performance estimation of new archive system for the TLS control system

The TLS (Taiwan Light Source) is a third-generation synchrotron light source that has been in operation for more than 25 years, and its control system is a proprietary design. Maintaining the TLS control system is difficult because many of its components have been discontinued. Some parts of the control system are being rejuvenated with the EPICS framework used in the TPS (Taiwan Photon Source) control system, so that TLS continues to operate normally while saving both manpower and money. A new EPICS archive system was needed to efficiently record various machine parameters and status information during routine operations. The EPICS Archiver Appliance, which acquires PVs (Process Variables) through Channel Access, was evaluated and found suitable for archiving TLS machine data. Graphical user interfaces and API packages are provided for quickly retrieving archived data, along with a plotting function for easy diagnosis. Furthermore, the performance of the new TLS archive system has been estimated, and related system resources will be manually adjusted for better service. These efforts are summarized in this paper.


Introduction
In 1992, the installation and integration of several TLS subsystems were completed. In April 1993, the machine was successfully commissioned, and in October of the same year it was made available to users.
The control system of TLS [1] is a proprietary design. Long-term operation has worn out some components, resulting in degraded performance. Maintaining the TLS control system is difficult, however, because several parts have been discontinued and suitable replacements are hard to find. Some control system components have been upgraded several times over the previous two decades to ensure that TLS continues to function normally [2].
Many facilities have successfully used EPICS (Experimental Physics and Industrial Control System) [3] as their accelerator control system. The TPS (Taiwan Photon Source) control system is also built on the EPICS framework [4], and various control applications and diagnostic tools that support EPICS are in use. To save both manpower and money, the EPICS framework was adopted to help revitalize parts of the TLS control system.
The EPICS Archiver Appliance [5,6] has been installed at a number of facilities [7,8]. It was evaluated as suitable for archiving TLS machine data because of its good data retrieval performance and multi-stage storage mechanism. It offers web-based interfaces for managing EPICS PVs (Process Variables) and setup parameters, with per-PV flexibility. Its API package supports quick retrieval of archived data. The existing TLS archive system

System architecture of the TLS data archive system
EPICS is used for newly designed and renewed subsystems such as the BPM system, power supply controllers, insertion devices, electronic instrument interfaces, and so on. It is therefore necessary to establish a new EPICS archive system that can process EPICS data directly, with high-performance processing, quick data reading, and reliable storage. The EPICS Archiver Appliance, which supports the CA (Channel Access) network protocol, was chosen as the new archive system of TLS. It receives the PVs that correspond to machine attributes and saves them as binary files.
The system architecture of the TLS archive system is presented in Fig. 1. High-performance servers are used to process and store data from EPICS. There are two identically configured servers. One serves as the primary server, connected to the TLS control network through a 10 Gbps optical fiber link, and the other serves as a backup with a 1 Gbps network connection. To build a high-availability system, the data are kept synchronized between the two machines. When one server is offline, the other can take over and continue offering service.

Software environment of the TLS data archive system
The EPICS Archiver Appliance is composed of four services, each of which runs in a separate Tomcat JVM. There are two sampling methods: CA-Monitor samples data whenever the PV updates, and Scan samples data periodically. Because some PVs update faster than the estimated sampling period, data may be dropped when using CA-Monitor. In such cases, we use the Scan method to avoid this issue.
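As a rough illustration of the Scan method described above, the following Python sketch records the most recent value seen at each fixed sampling tick, so the stored rate stays bounded however fast the PV updates. It is a simplified model and not the Archiver Appliance's own implementation; the function name `scan_sample` and the use of integer millisecond timestamps are assumptions made for this example.

```python
def scan_sample(updates, period, start, stop):
    """Return (tick_time, value) pairs for periodic (Scan-style) sampling.

    updates: list of (timestamp, value) pairs sorted by timestamp.
    period, start, stop: sampling period and time window, in the same
    units as the timestamps (integer milliseconds in this sketch).
    """
    if period <= 0:
        raise ValueError("period must be positive")
    samples = []
    latest = None   # most recent value seen so far
    idx = 0         # next update to consume
    tick = 0
    while True:
        t = start + tick * period
        if t > stop:
            break
        # Consume every update that arrived up to this tick.
        while idx < len(updates) and updates[idx][0] <= t:
            latest = updates[idx][1]
            idx += 1
        # Store one sample per tick once a value is known.
        if latest is not None:
            samples.append((t, latest))
        tick += 1
    return samples
```

For a PV that updates every 100 ms and is scanned every 1000 ms, only one value per second is stored; the intermediate updates are intentionally skipped rather than queued.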
The software architecture of the TLS archive system is shown in Fig. 2. The archive system is made up of two high-performance servers that use the same environment. The primary server scans the monitored PVs via Channel Access and stores them in the STS (short-term store), which retains data from the most recent 24 hours. Data older than 24 hours and less than 14 days old are automatically transferred from the STS to the MTS (medium-term store). Older data are consolidated, moved to the LTS (long-term store), and archived month by month. The RDB and the archived data are periodically synced to the secondary server. The multi-stage storage scheme provides flexible data management and makes it simple to back up and retrieve data.
Clients can retrieve historical data through a specified port. CS-Studio [9] and Phoebus [10] are developed and maintained through a collaboration between many universities and laboratories. Their Data Browser tool can connect to the archive system using the PB/HTTP protocol to retrieve historical data and display trend charts on the screen. These GUI programs can be easily run on Windows, Linux, and Macintosh; the GUI is shown in Fig. 3.
With the help of the retrieval API, existing viewers can retrieve data from the new archive system after simple modifications. New viewers can also be created with powerful tools such as Python or MATLAB. Figure 4 shows a Python retrieval tool integrated with the Tkinter toolkit; it provides a GUI to retrieve historical data, draw trends, and export data.
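The age-based placement of data across the STS, MTS, and LTS described above can be summarized in a few lines of Python. This is only a sketch of the policy as stated in the text (24 hours and 14 days); the function name `store_for_age` and the handling of the exact boundary values are assumptions, and the actual transfer is performed by the Archiver Appliance itself.

```python
from datetime import timedelta

STS_MAX_AGE = timedelta(hours=24)  # short-term store: most recent 24 hours
MTS_MAX_AGE = timedelta(days=14)   # medium-term store: up to 14 days old


def store_for_age(age):
    """Return the storage stage expected to hold data of the given age."""
    if age < timedelta(0):
        raise ValueError("age must not be negative")
    if age <= STS_MAX_AGE:
        return "STS"
    if age <= MTS_MAX_AGE:
        return "MTS"
    return "LTS"  # older data, archived month by month
```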

Performance test
The EPICS Archiver Appliance is a high-efficiency archive system that can handle a large number of PV monitoring tasks. The system has been tested for several months, with about 300 PVs monitored. A single PV generates about 5.8 GB of data per year when it updates at 10 Hz. With this small number of PVs, the overall CPU load is less than 10%. The average time taken to move data from the STS to the MTS is 0.47 seconds, and the average time taken to move data from the MTS to the LTS is 0.39 seconds.
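The yearly storage figure quoted above implies a per-sample storage cost that can be checked with simple arithmetic. The sketch below assumes that the quoted 5.8 G means 5.8 × 10^9 bytes and that a year has 365 days; `bytes_per_sample` is a hypothetical helper introduced only for this estimate.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year


def bytes_per_sample(bytes_per_year, rate_hz):
    """Average stored bytes per sample for a PV archived at rate_hz."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    samples_per_year = rate_hz * SECONDS_PER_YEAR
    return bytes_per_year / samples_per_year


# 10 Hz for one year is about 315 million samples, so 5.8e9 bytes per year
# corresponds to roughly 18 bytes per stored sample.
estimate = bytes_per_sample(5.8e9, 10)
```

Under the same assumptions, scaling this per-sample cost by the update rate and the number of PVs gives a first-order estimate of disk usage when more PVs are added.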
A Python program was written to simulate client requests; it starts multiple processes to simulate simultaneous use by multiple people. The program retrieves data through the EPICS Archiver Appliance's RESTful API, which provides content in a variety of formats, including JSON, CSV, MAT, TXT, and RAW. This is very convenient because the data can be loaded directly without a conversion program.
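A retrieval request of the kind described above is an HTTP GET whose path ends in `getData.<format>`. The following Python sketch only builds such a URL and does not contact any server. The base URL, port, and PV name in the usage note are placeholders, and the helper `build_retrieval_url` is hypothetical; it is not the client tool used in the tests.

```python
from urllib.parse import urlencode

# Formats named in the text, as lowercase file extensions.
SUPPORTED_FORMATS = {"json", "csv", "mat", "txt", "raw"}


def build_retrieval_url(base_url, pv, start, end, fmt="json"):
    """Build a getData URL for one PV and an ISO 8601 time range."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    query = urlencode({"pv": pv, "from": start, "to": end})
    return f"{base_url.rstrip('/')}/data/getData.{fmt}?{query}"
```

For example, `build_retrieval_url("http://archiver.example:17668/retrieval", "TEST:PV", "2023-01-01T00:00:00.000Z", "2023-01-02T00:00:00.000Z", "raw")` yields a URL that a client could fetch with any HTTP library; the host, port, and PV name here are assumptions for illustration.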

Comparison of different file formats
The results of our comparison of the file properties of the various formats are shown in Table 1. We retrieved a PV with a fixed sampling rate of 10 Hz, so the file size is proportional to the length of the retrieved time period. Compared with the other file formats, JSON produces a substantially larger file and requires more network resources to transmit. MAT is a binary file format supported by the commercial mathematics software MATLAB. Compared with the other formats, it offers the smallest file size. However, producing the MAT format requires more server computing resources.
Converting to the TXT and JSON formats requires additional attribute label information, which costs more computing resources and transmission time. Overall, retrieving data directly in the RAW format, which is serialized with Google's Protocol Buffers [11], is more advantageous in terms of file size and processing speed.

Retrieval performance test
To simulate regular data retrieval, another computer connects to the archive server over a 1 Gbps network and runs the client tool. The historical archive used in the test holds a month's worth of data for a specific PV: over 26 million records with a total file size of 529 MB. Concurrent retrieval requests from 1 to 10 processes are executed, and the measured items include total execution time, bandwidth consumed, server CPU utilization, and the average download speed of each user. We found that a single retrieval request uses only one logical CPU processor. In addition, network bandwidth is the biggest bottleneck in this test. The test results are shown in Table 2.
Another experiment performs some data processing during data retrieval. With the "ncount" and "firstSampling" options, the client tool simulates the concurrent execution of 1 to 60 processes retrieving RAW data. Figure 5 presents the test results. The execution time for a small number of concurrent requests is about the same. The CPU approaches its maximum capacity at about 40 concurrent queries. The system can still handle many more requests than this, but it takes more time to process them.
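Since network bandwidth is reported above as the main bottleneck of the plain retrieval test, a lower bound on transfer time follows directly from the file size and link speed. The sketch below assumes that the quoted 529 M means 529 × 10^6 bytes and that 1 G means 10^9 bits per second, and it ignores protocol overhead and server processing time; both helper functions are hypothetical.

```python
def min_transfer_seconds(size_bytes, link_bps, clients=1):
    """Lower bound on the time for `clients` full downloads over one shared link."""
    if size_bytes < 0 or link_bps <= 0 or clients < 1:
        raise ValueError("invalid arguments")
    return clients * size_bytes * 8 / link_bps


def per_client_rate(link_bps, clients):
    """Fair-share download rate per client, in bytes per second."""
    if link_bps <= 0 or clients < 1:
        raise ValueError("invalid arguments")
    return link_bps / 8 / clients


# One client: 529e6 bytes * 8 bits / 1e9 bit/s, about 4.2 s at best.
single = min_transfer_seconds(529e6, 1e9)
# Ten clients sharing the link: about 42 s in total, 12.5 MB/s each.
ten = min_transfer_seconds(529e6, 1e9, clients=10)
```

These are idealized bounds rather than measurements; measured times can only be equal to or longer than them.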

The impact of Java heap
The EPICS Archiver Appliance is written in Java and runs on the JVM. One of Java's memory management features is the Garbage Collection (GC) mechanism, which automatically cleans up unused data in the heap. The memory usage of a Java program is limited by the heap size to prevent system crashes. However, a heap that is too small may force the system to run GC frequently, which degrades performance.
We encountered an out-of-memory (OOM) error when performing certain calculations (such as median). Furthermore, retrieving long-term historical data in MAT format returned a file with a size of 0. These issues were resolved after the heap size was increased to 16 GB.
The effect of various heap sizes on retrieval time has been investigated. The test results are shown in Table 3. In each test, 30 processes retrieve one month of data at the same time, and the test is repeated 30 times consecutively. Because we cannot predict when GC will run, the time it takes to complete all retrieval jobs varies. Overall, execution times with bigger heap sizes are usually shorter, but the improvement is not proportional to the heap size. A smaller heap may cause the system to perform GC more frequently; the standard deviation shows that execution times with a smaller heap are more often affected by GC, whereas execution times with a bigger heap are more stable.
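The comparison above rests on the mean and standard deviation of repeated execution times. A minimal Python sketch of that summary step is given below; the function name `summarize_runs` is hypothetical, the use of the sample (rather than population) standard deviation is an assumption, and no measured values from Table 3 are reproduced here.

```python
import statistics


def summarize_runs(times):
    """Summarize execution times (in seconds) from repeated retrieval tests."""
    if len(times) < 2:
        raise ValueError("need at least two runs")
    return {
        "runs": len(times),
        "mean": statistics.mean(times),
        "stdev": statistics.stdev(times),  # sample standard deviation
        "min": min(times),
        "max": max(times),
    }
```

Applying the function to the 30 execution times recorded for each heap size gives one row per configuration; a larger standard deviation would indicate that GC pauses affect individual runs more unevenly.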

Summary
A new TLS archive system has been built with the EPICS Archiver Appliance. Two high-performance servers make up a high-reliability archive system for long-term monitoring of various accelerator parameters and machine status information. The new archive system stores data in binary format, which reduces file size compared with the current archive system. Using the retrieval API, diagnostic tools and graphical data browsers can retrieve historical data. The performance of the new TLS archive system has been estimated, and the relevant system configurations have been adjusted. In the future, more TLS PVs will be added to the archive system, and the system will be continually tuned to provide better service.

Figure 1. The system architecture of the TLS archiver.

Figure 2. The software architecture of the TLS archiver.

Figure 4. Retrieving archive data with Python.

Figure 5. The impact of the number of concurrent retrievals on CPU and time usage.

Table 1. Results of comparison of the properties of various formats

Table 2. Results of the retrieval test through a 1 Gbps network

Table 3. Effect of different heap sizes on retrieval time