Cryogenic system operational experience at SNS

The helium cryogenic system at Spallation Neutron Source (SNS) provides cooling to 81 superconducting radio frequency cavities. During the first ten years of operation, considerable operational experience has been gained and many lessons have been learned. These lessons cover integrated system issues as well as component failures in the mechanical, electrical, and controls areas. Past issues that have been corrected as well as current issues in the system will be detailed in this paper. In 2009, a Process Failure Modes and Effects Analysis (PFMEA) was completed as a way to identify high risk items and prioritize efforts. Since 2009, the progress on mitigating the identified high risk items has been tracked. The results of the PFMEA and the progress made in reducing risk to the cryogenic system operation will also be detailed.


Introduction
The design of the SNS cryogenic system is similar to the system deployed at Thomas Jefferson National Accelerator Facility (TJNAF) with some modifications. The SNS system is designed with about sixty percent of the refrigeration capacity of the original TJNAF system [1]. Table 1 details the system specifications. Figure 1 is a simplified diagram of the system. The major components of the system include a purifier, helium gas storage, warm compressors, 4.5-K cold box, liquid helium storage, 2-K cold box, linear accelerator (LINAC) distribution system, controls and additional ancillary systems.
The Central Helium Liquefier (CHL) at SNS is a highly automated and highly reliable machine. The system availability is approximately 99.7% during neutron production (~4500 hours per year). For the last seven years, the average cryogenic system down time has been approximately 15 hours per year during neutron production. The down time for each year is depicted on the chart in figure 2. Figure 3 displays the percentage of down time by category. Classifying the cause of each outage shows that the sinus filter was the largest contributor to down time. A sinus filter is a power filter used in conjunction with variable frequency drives (VFD) to protect against voltage spikes and prolong motor lifetime. This contribution was due to a single event in 2014. The VFDs also represent a significant fraction of CHL down time. The 2-K operating experience, including lessons learned about the VFDs and sinus filters, is included in [2]. The other major categories contributing to down time are related to electrical and controls, including instruments, power supplies, control cards and loose wires.
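The availability figure quoted above follows directly from the down time and the scheduled neutron production hours. A minimal sketch of that arithmetic is given here; the function name is illustrative, and the inputs are the rounded values from the text rather than logged data.

```python
def availability(down_hours: float, scheduled_hours: float) -> float:
    """Fraction of scheduled hours during which the system was available."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    return 1.0 - down_hours / scheduled_hours


# Rounded figures from the text: about 15 h of down time
# per roughly 4500 h of neutron production each year.
a = availability(15.0, 4500.0)
print(f"{a:.2%}")  # prints 99.67%
```

The result rounds to the approximately 99.7% stated in the text.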
Since the SNS cryogenic system was commissioned, maintaining its reliability has been a primary focus. To do this, a multi-faceted approach has been deployed. First, preventative maintenance is performed to correct problems before they affect neutron production. This preventative maintenance plan has been continually modified and improved as new issues have emerged. Second, to prioritize efforts, a Failure Modes and Effects Analysis (FMEA) was performed. This helped identify weaknesses in the system and direct efforts to the issues that provide the biggest improvement to the system. The final element in maintaining the high reliability of the system is to continue to improve and refine operation methods, procedures, and control sequences.

Warm Compressor System
The SNS cryogenic system contains six oil-flooded screw compressors, three first stage and three second stage. These are Howden compressors with Teco Westinghouse motors. Each compressor skid is equipped with its own oil removal stage. Additional oil removal is installed between the compressor skids and the main cold box. The warm compressors are equipped with a built-in volume ratio [3]. Several lessons have been learned about these compressor skids. The original shaft seals installed in the compressors were prone to blistering, which resulted in oil leaks. A new dual seal was procured and installed. This changed the compression of the seal from a spring design to a stainless steel bellows. This has resulted in longer shaft seal life and less oil leakage.
The oil removal of the original second stage skids was undersized. Since the bypass around the second stage compressor was placed upstream of the final oil removal skid, oil migrated from the discharge of the second stage compressor back to the medium pressure header. The headers are piped in such a way that oil accumulates in the low spot of the header and a drain valve routes the oil back to the first stage compressor skids. Over time, the second stage compressor oil level drops and the first stage oil level rises. Oil transfers are required periodically to manage oil inventory.

4-K cold box
The 4-K cold box provides the primary and shield cooling to the transfer line and cryomodules. This box contains carbon beds at the 80 K and 20 K positions. It also contains a number of heat exchangers, five turbines, and a helium sub-cooler. The 4-K cold box is integrated with a 7000 L helium dewar which provides a buffer during transitional periods. Despite its high reliability, there have been several lessons learned with this cold box.
The nitrogen loop within the cold box uses excess nitrogen. The main reason for this is inefficient heat exchange between the helium and nitrogen circuits. An extended study of this portion of the cold box has been conducted and is presented in [4]. In addition to the nitrogen issues, the regeneration of the carbon beds within the cold box has been problematic. The valves isolating the carbon beds have leaked, which has made it difficult to heat the bed sufficiently to regenerate the carbon. One possible mitigation is to install thermal loops on each side of the isolation valves to reach equilibrium temperature on both sides.
Another issue that arose in the 4-K cold box was an intermittent glitch in the turbine speed readings. The speed sensors were outputting a very low voltage signal to a tachometer, and turbines tripped several times due to a loss of speed signal. An oscilloscope was installed to read both the output of the speed sensor and the output of the tachometer. It was discovered that the tachometer output signal would intermittently drop to zero. Figure 4 shows the oscilloscope reading at the output of the tachometer. At the start of this reading, the output is zero before the signal returns. To rectify this, the speed sensor was positioned closer to the target on the turbine, which increased the voltage signal. Additionally, filters were added in the PLC logic to minimize the impact of a temporary signal glitch. For future installations, dual speed sensors should be considered.
The 4-K cold box was designed with injection points at the 10-K, 20-K, 30-K, 40-K, 50-K and 80-K points where the primary helium can be returned. For 2-K operation, the helium returns from the 2-K cold box to the 30-K injection point. During 4-K operation, the helium returns to the coldest injection point, which is the 10-K point. The system was designed so that helium could be returned to the helium dewar, which would in effect provide a 4-K injection point. However, the piping in that portion of the system did not accommodate the flows experienced in the system. In future systems, a functional 4-K injection point should be considered. Also for future installations, it may be beneficial to apply Coriolis flow meters in certain locations of the system, such as on supercritical lines.
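The PLC-level filtering of the turbine speed signal mentioned earlier in this section is not described in detail. One plausible form, sketched below in Python rather than PLC logic, holds the last valid speed through a short dropout and only requests a trip if the signal stays lost. The class name and the `max_dropout_samples` parameter are hypothetical and are not taken from the SNS implementation.

```python
from typing import Tuple


class SpeedGlitchFilter:
    """Ride through brief speed-signal dropouts before declaring a trip.

    max_dropout_samples is a hypothetical tuning parameter: the number of
    consecutive zero readings tolerated before the signal is declared lost.
    """

    def __init__(self, max_dropout_samples: int = 3):
        self.max_dropout_samples = max_dropout_samples
        self.last_valid = 0.0
        self.dropout_count = 0

    def update(self, raw_speed: float) -> Tuple[float, bool]:
        """Return (filtered_speed, trip) for one scan of the raw signal."""
        if raw_speed > 0.0:
            # Healthy reading: pass it through and reset the dropout counter.
            self.last_valid = raw_speed
            self.dropout_count = 0
            return raw_speed, False
        # Zero reading: count it and hold the last valid value for a while.
        self.dropout_count += 1
        if self.dropout_count > self.max_dropout_samples:
            return 0.0, True
        return self.last_valid, False
```

A filter of this kind trades a short delay in detecting a genuine loss of signal for immunity to momentary glitches, so the tolerated dropout would have to stay well within the time the turbine can safely run without a valid speed reading.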

Electrical and controls
The control system uses Experimental Physics and Industrial Control System (EPICS) to perform operator interface and high level controls [5]. The EPICS monitoring and control software runs in a combination of hard and soft Input/Output Controllers (IOCs). The IOCs perform all automatic control sequences and most of the Proportional Integral Derivative (PID) control loops. Allen Bradley PLCs are used for low level control and equipment protection interlocks and communicate between the equipment and the IOCs. Figure 5 shows the control system architecture.
In 2007, a communication failure between two IOCs resulted in an unexpected pressurizing of the LINAC. Since that event, the process variable and the control device are maintained in the same IOC whenever possible. Additional changes have been made to inputs that have to be communicated from one IOC to another. With these changes, the data points hold their last values in the event of a communication failure rather than going to zero. In future installations, it is recommended that consideration be given to pushing more control down to the PLC level while running a "hot spare" PLC. This decreases the dependency on the IOC and provides redundancy in control.
The SNS cryogenic system is required to run continuously and has operated since 2005. To maintain the 4160 volt (V) switchgear to the cryogenic plant, it is necessary to shut the system down, including all of the warm compressors. There is an inherent conflict between the preventative maintenance of the switchgear and the desire to continuously operate the cryogenic system. In future installations where the cryogenic system is required to continuously operate for a number of years, consideration should be given to providing a way to maintain switchgear while operating the cryogenic system. At SNS, the maintenance of the power supply switchgear was delayed many times to avoid shutting the cryogenic system down. In 2014, the plant was shut down for eight hours for switchgear maintenance. Evidence of arcing and burning of insulation was discovered in the switchgear.
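The hold-last-value behavior adopted for inputs passed between IOCs can be illustrated with a short sketch. This is plain Python, not EPICS code; the class name and the `stale` flag are assumptions added for illustration, with `None` standing in for a failed read from the remote IOC.

```python
from typing import Optional


class RemoteInput:
    """Input received from another controller that holds its last good value."""

    def __init__(self, initial: float):
        self.value = initial
        self.stale = False  # hypothetical flag so operators can see the hold

    def update(self, received: Optional[float]) -> float:
        """Accept a new reading, or keep the previous one if the read failed."""
        if received is None:
            # Communication failure: keep the last value instead of zero.
            self.stale = True
        else:
            self.value = received
            self.stale = False
        return self.value


pressure = RemoteInput(initial=1.2)
pressure.update(1.3)   # normal update
pressure.update(None)  # link lost: value stays at 1.3 and stale becomes True
```

Holding the last value keeps a control loop from reacting to a false zero, but the held value ages, which is why a staleness indication of some kind is worth pairing with it.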

Power outage
The reliability of power at SNS has been very high. However, there have been a few interruptions of power to the site. This is most dangerous for the superconducting LINAC while the cryomodules are at 4-K operation. The SNS cryogenic system is equipped with two recovery compressors that can be powered by a diesel generator in the event of a loss of site power. These compressors can be utilized to recover helium from the system. Data for a power outage is presented in figure 6. As seen in the plot, pressure increases in the low and medium headers after a loss of power. One reason for this is that the inlet valves for the turbines are ramped closed at a rate that spins the turbines down safely. Because the warm compressors are off, a pressure increase is experienced in the headers to which the turbines are discharging. Despite operating two recovery compressors, the cryomodules experienced a pressure increase of approximately one atmosphere. This kind of pressure transient has caused component failures and cavity detuning. In future installations, careful consideration should be given to cryomodule design for pressure fluctuations.

Preventative maintenance program
Preventative maintenance planning is conducted utilizing DataStream software. The work orders for the tasks to be done are generated automatically by the software either based on time or on the number of operating hours. The component of the cryogenic system that requires the most preventative maintenance is the warm compressor system. During these maintenance iterations, a number of tasks are completed including alignments, servicing bearings, and changing filters. The detailed and careful approach to operating and maintaining the warm compressor system has led to zero down time of the cryogenic system caused by issues with the warm compressors.
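The two triggers for generating a work order, elapsed time and accumulated operating hours, can be expressed as a simple rule. The sketch below is illustrative only and does not reflect the DataStream software's interface; the function name, parameters, and intervals are hypothetical.

```python
from datetime import date
from typing import Optional


def work_order_due(
    today: date,
    last_done: date,
    hours_since_service: float,
    interval_days: Optional[int] = None,
    interval_hours: Optional[float] = None,
) -> bool:
    """Return True if either the calendar or the run-hour interval has elapsed."""
    time_due = (
        interval_days is not None
        and (today - last_done).days >= interval_days
    )
    hours_due = (
        interval_hours is not None
        and hours_since_service >= interval_hours
    )
    return time_due or hours_due


# Hypothetical task: service every 180 days or every 4000 operating hours.
due = work_order_due(
    today=date(2015, 7, 1),
    last_done=date(2015, 3, 1),
    hours_since_service=4100.0,
    interval_days=180,
    interval_hours=4000.0,
)
print(due)  # True: the operating-hour limit has been exceeded
```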
Continuous improvement of the preventative maintenance plan is conducted to mitigate issues that have emerged. An example of this is when the preventative maintenance program was expanded to check critical electrical terminal strips on a periodic basis. Loose wires have accounted for 17% of the cryogenic system down time hours. At SNS, there are two extended maintenance periods per year, lasting about six to eight weeks each. Electricians retighten all of the screw terminals in selected cabinets during those periods. Those include the magnetic bearing, VFD, and cryomodule control cabinets.

Process failure modes and effects analysis (PFMEA)
The first part of the PFMEA process was to break the work down to task level steps for analysis. Conducting the PFMEA provided a systematic approach to asking two basic questions: how could this fail during this process task, and if it does fail, what is the effect based on severity, probability, and detection? This effort provided a structured way for a cross functional team to study the cryogenic system. Doing this helped identify weaknesses in the process and rank them by need for attention. The results of this study were a driving force that produced action. Vulnerabilities were addressed and resources were provided.
Each failure mode of each process task was analyzed. These were assigned severity, probability and detection numbers, with higher numbers representing more critical issues. These three numbers were multiplied, and the product yielded a number referred to as the Risk Priority Number (RPN). The RPNs above a selected threshold were considered high priority items. After the original analysis, there were seventy-six such items. Currently, there are fewer than twenty. As resources are made available, these issues will be addressed. The overall RPN for the cryogenic system has decreased by approximately 60%. In a system that is very reliable, continually evaluating the system in a structured way is important to maintaining such high reliability.
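The RPN calculation described above is the product of the three scores, RPN = severity x probability x detection, followed by a comparison against a threshold. A minimal sketch is given below. The task names, the scores, the 1 to 10 scale, and the threshold of 100 are all hypothetical, since the paper does not state the scale or threshold that were used.

```python
from typing import List, NamedTuple


class FailureMode(NamedTuple):
    task: str
    severity: int     # higher = worse consequence
    probability: int  # higher = more likely
    detection: int    # higher = harder to detect

    @property
    def rpn(self) -> int:
        """Risk Priority Number: product of the three scores."""
        return self.severity * self.probability * self.detection


def high_priority(modes: List[FailureMode], threshold: int) -> List[FailureMode]:
    """Return failure modes with RPN above the threshold, highest first."""
    flagged = [m for m in modes if m.rpn > threshold]
    return sorted(flagged, key=lambda m: m.rpn, reverse=True)


# Hypothetical entries on an assumed 1-10 scale.
modes = [
    FailureMode("low dewar pressure undetected", 7, 4, 9),
    FailureMode("interlock engaged without indication", 6, 3, 10),
    FailureMode("filter change overdue", 3, 2, 2),
]
for m in high_priority(modes, threshold=100):
    print(m.task, m.rpn)
```

Because the RPN is a product, lowering any one score lowers it proportionally. On the assumed scale, adding an alarm that improves a detection score from 10 to 1 would cut that item's RPN by a factor of ten.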
There were tangible benefits to conducting the PFMEA. One issue that was documented is that there was no way to detect a low pressure in the helium dewar. Since the dewar provides the liquid helium supply to the sub-cooler, it is important for it to maintain enough pressure to have the head to push the liquid helium to the cold box. If this pressure gets low, pressure stability of the discharge of the final turbine can be impacted. An alarm was added to detect such an issue, and this reduced the RPN. Another tangible example had to do with the differential pressure (dP) interlock on the warm compressor mass-in valve. If the pressure in the gas tank supplying the warm compressor system gets lower than one atmosphere above the medium header pressure, an interlock closes the mass-in valve. It was also documented that there was no way to know if the dP interlock had engaged or not. Without knowledge of this, the pressure in the high pressure header would have to drift to alarm levels before initiating the auto-dialer. An alarm was added for this instance, and the RPN was reduced by a factor of ten.
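The dP interlock and the alarm added to it can be summarized in a few lines of logic. The sketch below is a Python illustration, not the PLC code; the function name and return structure are hypothetical, and pressures are assumed to be in atmospheres.

```python
from typing import Dict


def mass_in_interlock(
    tank_pressure_atm: float,
    medium_header_pressure_atm: float,
    margin_atm: float = 1.0,
) -> Dict[str, bool]:
    """Evaluate the dP interlock on the mass-in valve.

    The interlock engages when the gas tank pressure falls below the
    medium header pressure plus the margin (one atmosphere in the text).
    The alarm mirrors the interlock state so operators know it engaged.
    """
    engaged = tank_pressure_atm < medium_header_pressure_atm + margin_atm
    return {"valve_closed": engaged, "alarm": engaged}


print(mass_in_interlock(5.0, 3.0))  # tank well above header: not engaged
print(mass_in_interlock(3.5, 3.0))  # only 0.5 atm above header: engaged
```

Tying the alarm directly to the interlock state means operators are notified when the valve closes, rather than only after the high pressure header drifts to its own alarm level.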

Summary
Maintaining the reliability of a cryogenic system is a continuous long term effort. The primary components of the approach being utilized at SNS are a comprehensive preventative maintenance program, a PFMEA, and incorporating lessons learned to continually improve the system. Dealing with issues that emerge and then adapting the procedures and maintenance planning to prevent them in the future has been a key strategy leading to the reliability of the SNS machine. Multiple lessons have been learned in the subcomponents of the SNS cryogenic system including warm compressors, 4-K cold box, 2-K cold box, control system, and integration of all of these subsystems. A PFMEA was utilized to help identify and prioritize issues to most efficiently make use of resources.
For future installations, consideration should be given to the lessons that have been learned at the SNS cryogenic system. For the warm compressor system, a new, more durable shaft seal has been utilized. The bypass valve for second stage compressors can be placed downstream of final oil removal to prevent oil migration from second stage to first stage compressors. For the 4-K cold box, consideration should be given to more efficient nitrogen to helium heat exchange [6]. Other possible improvements for the 4-K cold box include utilizing dual speed sensors for turbines and including a 4-K injection point. For the control system, consideration should be given to moving more of the system control to the PLC level and running a "hot spare" to reduce the chance of communication issues and provide redundancy. Another suggested improvement is to design the system in such a way that the power supply switchgear can be maintained without shutting the system down. Finally, when designing cryomodules for future installations, it is important to consider the pressure transients that are likely to occur from power outages.