The OSG Open Facility: an on-ramp for opportunistic scientific computing

The Open Science Grid (OSG) is a large, robust computing grid that started primarily as a collection of sites associated with large HEP experiments such as ATLAS, CDF, CMS, and DZero, but has evolved in recent years into a much larger user and resource platform. In addition to meeting the US LHC community’s computational needs, the OSG continues to be one of the largest providers of distributed high-throughput computing (DHTC) to researchers from a wide variety of disciplines via the OSG Open Facility. The Open Facility consists of OSG resources that are available opportunistically to users other than the resource owners and their collaborators. In the past two years, the Open Facility has doubled its annual throughput to over 200 million wall hours. More than half of these hours are used by over 100 individual researchers from over 60 institutions in fields such as biology, medicine, mathematics, economics, and many others; over 10% of these individual users each consumed in excess of 1 million computational hours in the past year. The largest source of these cycles is temporarily unused capacity at institutions affiliated with US LHC computational sites; an increasing fraction, however, comes from university HPC clusters and large national supercomputing facilities offering unused capacity. These expansions have allowed the OSG to provide ample computational resources to individual researchers and small groups as well as to sizable international science collaborations such as LIGO, AMS, IceCube, and sPHENIX. Opening up access to the Fermilab FabrIc for Frontier Experiments (FIFE) project has also allowed experiments such as Mu2e and NOvA to make substantial use of Open Facility resources, the former with over 40 million wall hours in a year. We present how this expansion was accomplished, as well as future plans for keeping the OSG Open Facility at the forefront of enabling scientific research by way of DHTC.


Introduction
Most scientific computing needs of experimental high-energy physics (HEP) are met using distributed high-throughput computing (DHTC). In 2006, the Open Science Grid (OSG) [1] was formed as a North American DHTC infrastructure, primarily to meet the computational demands of HEP experiments, including the two general-purpose Large Hadron Collider (LHC) experiments, ATLAS and CMS. In the decade since, usage of OSG computational resources has grown to its current rate of over 100 million hours per month, as shown in Fig. 1. While a majority of computation on OSG resources is still driven by ATLAS and CMS, a growing proportion is performed by opportunistic users from a range of research disciplines. A focused OSG effort, working with both the providers and the users of opportunistic cycles, has in two years doubled the annual total of opportunistic hours to 200 million.

Present state of the OSG and Open Facility
In the past year, 150 million jobs consumed over 1.2 billion hours of computing across the 129 clusters that comprise the fabric of the OSG. The OSG, like most DHTC grids, uses the Virtual Organization (VO) trust model, in which individual users are associated with one or more VOs. Most VOs recognized by the OSG represent large scientific collaborations, such as ATLAS and CMS, or university communities, such as GLOW (University of Wisconsin) or HCC (Holland Computing Center at the University of Nebraska). The latter category of VO allows users in the respective community to opportunistically access OSG resources that would otherwise sit idle when not used by their owner VOs. In 2011, the OSG created a VO known simply as the "OSG VO" to make those opportunistic resources available as an open facility, not restricted to particular research or institutional communities. Membership in the OSG VO is open to any US researcher who could benefit from DHTC resources, with the understanding that any resources they access are opportunistic. The growth of usage in the OSG VO since its creation in 2011 is shown in Fig. 2; this growth has continued, with OSG VO users consuming over 140 million hours in 2016. A majority of the computing sites on the OSG support access by the OSG VO. While these sites were initially (in 2011) almost entirely ATLAS and CMS Tier-1 and Tier-2 sites, an increasing number of university computing clusters not affiliated with any other VO have joined the OSG to share opportunistic cycles; OSG usage on such clusters is almost exclusively via the Open Facility.

Access to the Open Facility
Initially, access to the Open Facility was obtained by users logging into an OSG-maintained interactive node running the HTCondor batch system. Jobs submitted from this node are forwarded to a central flocking node, which in turn handles negotiation and submission to individual resources across the OSG fabric. This method of access, known as "OSG-Direct," is deprecated for new users but continues to provide millions of hours of computing for users who joined the OSG VO in its early years; over 25 million CPU hours on the Open Facility were utilized by OSG-Direct users in 2016. Currently, the Open Facility supports several methods of access for new users, as illustrated in Fig. 3.
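A job destined for the Open Facility is described in an ordinary HTCondor submit file. The following is a minimal sketch; the executable name and resource-request values are illustrative, not prescribed by the OSG:

```
# Minimal HTCondor submit description (file names and values illustrative)
universe   = vanilla
executable = analyze.sh
arguments  = $(Process)
output     = job.$(Process).out
error      = job.$(Process).err
log        = job.log
# Resource requests help the negotiator match jobs to opportunistic slots
# (memory in MB, disk in KB by HTCondor's defaults)
request_cpus   = 1
request_memory = 2048
request_disk   = 1048576
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
# Start ten independent jobs, one per $(Process) value
queue 10
```

Submitted with `condor_submit`, such jobs flock from the local scheduler to the central pool, which negotiates their placement on idle slots across OSG sites.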

OSG-XD
The eXtreme Science and Engineering Discovery Environment (XSEDE) [2] is a US National Science Foundation initiative that enables access to, and training for, large scientific computing resources (primarily large HPC installations and supercomputers). Since 2013, the OSG has been one of the resources for which researchers can be awarded allocations. Users who obtain one of these allocations, which are awarded quarterly for a one-year period, gain access to a login node managed by the OSG (currently the same host used by OSG-Direct users). From here, these "OSG-XD" users can submit their jobs to run on sites in the OSG Open Facility. Unlike at other XSEDE sites, which are HPC installations, users accessing the OSG via XSEDE allocations are not limited to the computing hours awarded in their allocation: OSG-XD users who have exceeded their allocation may continue using OSG resources, albeit at a lower priority than users who have not. In the past year, approximately 32 million compute hours were utilized by OSG-XD users, despite the sum of their XSEDE allocations being only 5 million hours.

OSG-Connect
OSG-Connect is an inclusive service that provides an easy-to-use virtual environment for submitting jobs to the OSG Open Facility. New users can sign up using their own campus identities via CILogon [3] or InCommon [4], or create a new identity if neither is available to them. With those credentials, a user can access a login and submit host via SSH, manage data through Globus Online and the local Stash service, and use other services with the same credentials. While OSG-Connect does not require allocations for usage as OSG-XD does, users must belong to a registered project for bookkeeping, where each project corresponds to a single PI's research program. OSG-Connect was first offered as a service in 2014 and delivered just under 10 million CPU hours to users in the subsequent year. In 2016, users from 66 individual projects utilized over 50 million CPU hours on the OSG Open Facility by way of OSG-Connect. The OSG organization maintains a central login node for OSG-Connect users at the University of Chicago.
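In practice, the registered project travels with the job itself: it is attached to the job's ClassAd in the submit description. A sketch of the relevant fragment, with a hypothetical project name:

```
universe     = vanilla
executable   = analyze.sh
# Accounting: tag the job with the registered project
# ("ExampleProject" is a placeholder, not a real project)
+ProjectName = "ExampleProject"
queue
```

Tying accounting to a job attribute, rather than to the submitting user alone, lets one researcher charge work to different projects and lets the OSG aggregate usage per PI across all of a project's members.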

Campus submission environments
Access to the Open Facility can also be integrated with access to local compute resources at a campus. One way this can be achieved is by a local deployment of the OSG-Connect machinery. In effect, this allows a single access point on campus to submit jobs both to the OSG Open Facility, as with the OSG-Connect login node, and to local resources. One of the largest such deployments is at Duke University, where the Duke-Connect platform provided users with over 7 million CPU hours on the OSG Open Facility in 2016. OSG tools have also been used to provide access to the Open Facility from institutions with existing OSG sites. An example is MIT, which in 2016 formed a virtual computing center allowing users to submit jobs to local resources (including CMS Tier-2 and Tier-3 sites already on the OSG), to the OSG Open Facility, and to global CMS computing resources. Users of the MIT virtual computing facility utilized over 20 million CPU hours on the OSG in 2016.

Usage of the Open Facility
Users of the OSG Open Facility are organized into projects for accounting purposes. Typically, each project corresponds to a single PI's research program, and the PI can authorize multiple users under that project; in the case of OSG-XD users, each project corresponds to an XSEDE allocation. In 2016, 99 projects with users from over 80 institutions around the globe utilized the OSG Open Facility. These projects span a range of research disciplines, as shown in Fig. 4; in a typical month, 30-40 of these projects are active on the Open Facility. While most projects reflect a single PI's research efforts, large physics collaborations are also able to take advantage of the OSG Open Facility: LIGO, AMS, Mu2e, and IceCube each utilized over 1 million CPU hours on the Open Facility in the past year. Usage of the Open Facility by large collaborations is a pattern that has emerged only in the past two years. While these experiments have dedicated computing resources of their own, the additional resources afforded by opportunistic usage of the OSG allowed experiments such as LIGO and AMS to complete specific computational campaigns much faster than they otherwise would have.
In 2015, usage of the OSG Open Facility resulted in 34 peer-reviewed publications [5].

Recent Developments and Future Potential
The OSG organization continues to work on making access to DHTC easier for researchers, as well as on expanding the types and quantity of resources available on the Open Facility. A growing share of these resources comes from university clusters that are not directly affiliated with an existing OSG VO. Clusters at Syracuse University, the University of Washington, and Clemson University are three such sites and together provide over 10 million CPU hours to OSG Open Facility users. Recent developments, such as HTCondor-BoscoCE [7], aim to ease the integration of such resources into the OSG computational fabric.

Conclusions
The Open Science Grid continues to increase access to DHTC resources to enable science in a wide range of disciplines. While the computational needs of the LHC experiments have increased during the most recent run of the LHC, the OSG continues to make opportunistic resources available to users of the Open Facility. Roughly 200 million CPU hours per year are now used opportunistically by OSG users, with over 60% going to individual researchers and communities who access these resources via the OSG VO. We anticipate that these trends will continue in the near future.