Last Updated: October 19, 2018
Globus delivers secure and scalable data management capabilities to the research community, including data transfer and sharing. Globus allows researchers to use a secure, unified interface to orchestrate a variety of data management tasks across multiple Globus-enabled sites, all within the visibility and access control limits set by each site. By leveraging federated identities, existing institutional logins may be used for authentication to Globus. Computer scientists at the University of Chicago and Argonne National Laboratory have developed Globus to meet the specific needs and requirements of the research community. For over a decade, tens of thousands of researchers, including those at most leading US universities and national laboratories, have relied on Globus to provide data management capabilities.
Globus uses a hybrid Software as a Service architecture. Two primary components make up the Globus ecosystem: Globus services and Globus endpoints. Globus services are hosted on Amazon Web Services (AWS) and operated by Globus. Services provide secure platform interfaces (APIs) and are accessed via clients including a Globus provided web application and command line client. Globus endpoints are storage systems hosted and operated by the subscriber running Globus Connect software to enable use of Globus data management capabilities, including access to data via GridFTP and HTTPS protocols.
Globus security is based on well-established standards such as OAuth 2 and OpenID Connect. Globus services leverage federated login and allow user authentication using one of the many supported identity providers (e.g., institutional identities, ORCID, Google). Since Globus acts as an identity broker and uses federated login, institutional credentials are never seen by Globus. Globus services also allow users to link their identities from multiple identity providers into a single account. Globus Auth, the identity and access management service in Globus, is based on OAuth 2 and is used both to secure all Globus services and to integrate with third-party applications. This provides an advanced, user-consent-based delegated authorization model that allows applications and services to act on behalf of users and other services.
The Globus High Assurance tier provides additional security controls to meet the higher authentication and authorization standards required for access to restricted data, such as Protected Health Information, Personally Identifiable Information, and Controlled Unclassified Information. To obtain access, users must authenticate with the specific identity required by the policy that administrators at the institution have set; authenticating with another of their linked identities is not sufficient. Users must re-authenticate with the required identity in each new application session. For example, users must re-authenticate within each new instance of an application, in each new web browser session, and on each new device to obtain access. Each authentication lasts for a specific period of time, determined by policy set by the administrator, after which the user must re-authenticate to continue access. The time configured by the administrator is called the authentication assurance timeout. Ensuring the security and confidentiality of the data is a shared responsibility of the subscriber, who sets the access control policies, including the authentication assurance timeout, and Globus, which enforces these rules.
Globus endpoints are storage systems that are accessible via Globus services. Endpoints are created by installing Globus Connect on the subscriber’s own storage systems, whether local, campus, or cloud. Endpoints are configured with one or more storage gateways, which are specific storage instances and policies that govern access to data on that instance. For example, an endpoint may have three storage gateways: one for data on an on-premises object store, another for data on a scratch file system, and another for data on a tape archive. Each of these gateways has its own access policies and may be designated as high assurance. The architecture of Globus Connect Server version 5, which supports High Assurance Gateways, is shown in Figure 1.
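The relationship between an endpoint and its storage gateways can be pictured with a small data model. The following is only an illustrative sketch: the class names, field names, and gateway names are hypothetical and do not reflect the actual Globus Connect configuration format. It encodes the three-gateway example above and the requirement that a high assurance gateway be configured with an authentication assurance timeout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StorageGateway:
    """Hypothetical model of one storage gateway and its policy."""
    name: str
    storage_type: str
    high_assurance: bool = False
    # Authentication assurance timeout in seconds (assumed unit).
    authentication_timeout_s: Optional[int] = None

    def __post_init__(self) -> None:
        # A high assurance gateway must have a timeout configured.
        if self.high_assurance and self.authentication_timeout_s is None:
            raise ValueError(f"gateway {self.name!r} needs a timeout")


@dataclass
class Endpoint:
    """Hypothetical model of an endpoint holding several gateways."""
    name: str
    gateways: List[StorageGateway] = field(default_factory=list)


# The three-gateway example from the text; names are made up.
endpoint = Endpoint("campus-endpoint", [
    StorageGateway("objects", "object-store"),
    StorageGateway("scratch", "file-system"),
    StorageGateway("archive", "tape", high_assurance=True,
                   authentication_timeout_s=3600),
])

ha_names = [g.name for g in endpoint.gateways if g.high_assurance]
print(ha_names)
```

Each gateway carries its own policy, so only the tape archive is treated as high assurance in this sketch.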
Data stored on endpoints are strictly under the purview of the subscriber. Using standard system tools and Globus Connect software, endpoint administrators (e.g., storage owners or storage administrators) set all authorization and access policies and can change policies or revoke user access at any time.
For storage gateways designated as high assurance, the administrator must configure the authentication assurance timeout, which is the length of time an authentication is valid. This determines how often a user has to authenticate with the identity within a session for continued access. In addition, for multi-user storage systems, such as campus storage, the administrator must specify the institutional identity provider required for authentication and the set of authorized users. For single-user storage systems, such as laptops, the administrator must choose the specific identity from their set of linked identities that is required for access to the endpoint.
Globus uses a "data channel" for moving data between two endpoints. This data channel is established directly between the source and destination endpoints and cannot be accessed by the Globus service, only by the servers running on the endpoints. Encryption of the data channel is enforced for all transfers to or from a high assurance endpoint. Transfers are encrypted using OpenSSL libraries installed at the endpoint and TLS 1.2. The cipher used for a transfer is negotiated between the source and destination endpoints and depends on the preference-ordered list of OpenSSL ciphers (default HIGH) on each endpoint. In addition to the data channel, Globus uses a "control channel" to communicate with the source and destination endpoints for a transfer. The control channel is encrypted with TLS 1.2.
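To make the TLS settings above concrete, the sketch below uses Python's standard ssl module to build a context pinned to TLS 1.2 with the OpenSSL cipher string HIGH. This is an analogy only: Globus Connect configures OpenSSL through its own software, not through Python, and the function name is hypothetical.

```python
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Build a client-side context restricted to TLS 1.2 and HIGH ciphers."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Pin the protocol version to TLS 1.2, as described in the text.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    # "HIGH" is an OpenSSL cipher string naming a preference-ordered
    # list of high-strength cipher suites.
    ctx.set_ciphers("HIGH")
    return ctx


ctx = make_tls12_context()
cipher_names = [c["name"] for c in ctx.get_ciphers()]
print(len(cipher_names), "cipher suites enabled")
```

The cipher actually used for a connection is chosen during the handshake from the lists offered by both sides, which is why the result depends on the configuration of each endpoint.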
Collections are hosted on storage gateways and provide the interfaces needed for users to access data. Both HTTPS and GridFTP protocols are supported for accessing data from collections. The high assurance policies specified at the storage gateway by the administrator are inherited by the collection and govern both collection access and management. In the following sections, we describe data access via the two types of collections: mapped and guest.
Mapped collections are created by administrators for users who have local accounts to access their data. For mapped collections, the user’s identity must be mapped to a local account. When a user selects a mapped collection on a high assurance storage gateway as the source of a data transfer, the Globus transfer service enforces that the user has authenticated with an identity from the required domain, within a session, and within the authentication assurance timeout period as configured on the collection. The Globus transfer service then establishes a control channel for communication with the high assurance collection via Globus Connect. Globus Connect ensures that control channel connections can only be established from the Globus transfer service IP address. Authentication and control channel establishment to a destination collection is identical to a source collection.
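The three conditions enforced for a high assurance collection (an identity from the required domain, authentication within the current session, and authentication within the timeout period) can be sketched as a single predicate. This is a minimal illustration, not the Globus transfer service's implementation; the class, function, and parameter names are hypothetical, and identities are assumed to have the form user@domain.

```python
from dataclasses import dataclass


@dataclass
class SessionAuth:
    """Hypothetical record of one authentication event."""
    identity: str            # assumed form: "user@domain"
    authenticated_at: float  # seconds since the epoch
    session_id: str


def may_access(auth: SessionAuth, required_domain: str, timeout_s: float,
               current_session: str, now: float) -> bool:
    """Return True only if all three high assurance conditions hold."""
    domain = auth.identity.rpartition("@")[2]
    in_required_domain = domain == required_domain
    in_this_session = auth.session_id == current_session
    within_timeout = 0 <= now - auth.authenticated_at <= timeout_s
    return in_required_domain and in_this_session and within_timeout


auth = SessionAuth("alice@example.edu", authenticated_at=1000.0,
                   session_id="s1")
fresh = may_access(auth, "example.edu", 600, "s1", now=1500.0)
expired = may_access(auth, "example.edu", 600, "s1", now=1700.0)
print(fresh, expired)
```

In the usage above, the same authentication is accepted 500 seconds after login but rejected at 700 seconds, because the assumed timeout is 600 seconds.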
Once a user has authenticated to a source and destination collection, they may request a data transfer. A data channel for moving files between two Globus endpoints is established between the source and destination collections, secured via session keys established when the channel is created. Files are transferred directly between the source and destination systems; files never flow through the Globus service.
If enabled by the endpoint administrator, users may share data with collaborators by creating guest collections. Sharing occurs directly from the system where the data reside and does not require data to be copied to an external storage system, such as a cloud hosted system. Users can set permissions per folder, and can share with individual users or with groups. Like mapped collections, guest collections inherit access policies from the storage gateways on which they are created, including high-assurance policies.
Data transfer occurs as described above for mapped collections with the following differences. Collaborators can access a guest collection without a local account on the storage system, but, as with mapped collections, they have to authenticate with the identity that grants them access within a session and within the authentication assurance timeout period. Access control rules for guest collections are maintained by the Globus transfer service. The transfer service and endpoint, in tandem, enforce those rules. For bulk data management via GridFTP, the transfer service pushes the relevant access control rules to the endpoint, and for HTTPS access the HTTPS server pulls the relevant rules from the transfer service. The user defined rules are then evaluated along with the administrator defined policies on the storage gateway and the endpoint, to determine access.
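The way user-defined rules combine with administrator-defined policy can be illustrated with a default-deny evaluation. The sketch below is an assumption about the general shape of such a check, not the actual rule format or evaluation logic used by Globus; the rule fields, the trailing-slash folder convention, and the gateway-wide write flag are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical user-defined sharing rule on a guest collection."""
    principal: str    # identity the rule grants access to
    path: str         # folder prefix, written with a trailing "/"
    permissions: str  # "r" (read-only) or "rw" (read-write)


def is_allowed(rules: List[AccessRule], identity: str, path: str,
               want_write: bool, gateway_allows_write: bool) -> bool:
    """Default deny: the administrator policy is checked before any rule."""
    if want_write and not gateway_allows_write:
        return False  # gateway-wide policy overrides user-defined rules
    for rule in rules:
        if rule.principal != identity:
            continue
        if not path.startswith(rule.path):
            continue
        if want_write and "w" not in rule.permissions:
            continue
        return True
    return False


rules = [
    AccessRule("bob@example.edu", "/projects/a/", "r"),
    AccessRule("carol@example.edu", "/projects/a/", "rw"),
]
bob_read = is_allowed(rules, "bob@example.edu", "/projects/a/x.dat",
                      want_write=False, gateway_allows_write=True)
bob_write = is_allowed(rules, "bob@example.edu", "/projects/a/x.dat",
                       want_write=True, gateway_allows_write=True)
print(bob_read, bob_write)
```

A request succeeds only when the administrator's policy permits the operation and at least one user-defined rule grants it to the requesting identity.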
Users can share guest collections with read-write or read-only permissions, subject to the storage gateway wide policy set by the administrator. Shared data access can be updated or revoked by the user at any time. For long running tasks, checks occur every minute for permission and identity set changes as well as consent revocation on the application that initiated the task. If any of the checks fail, the task is terminated. Thus any change in permissions that revokes access is enforced within a minute after the change.
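The periodic re-check for long running tasks can be sketched as a loop that evaluates every check once per simulated minute and stops at the first failure. This is a simplified model under stated assumptions: the function name is hypothetical, each check is a callable that receives the elapsed minute, and no real waiting or data movement takes place.

```python
from typing import Callable, Iterable


def run_with_periodic_checks(
    work_minutes: int,
    checks: Iterable[Callable[[int], bool]],
) -> str:
    """Simulate a task whose checks are re-evaluated every minute."""
    check_list = list(checks)
    for minute in range(work_minutes):
        # Permissions, identity set, and consent would be checked here.
        if not all(check(minute) for check in check_list):
            return f"terminated at minute {minute}"
        # One minute of transfer work would happen here.
    return "completed"


def permission_ok(minute: int) -> bool:
    # Hypothetical: the sharing user revokes access after minute 2.
    return minute < 3


def consent_ok(minute: int) -> bool:
    # Hypothetical: consent for the application is never revoked.
    return True


revoked = run_with_periodic_checks(5, [permission_ok, consent_ok])
finished = run_with_periodic_checks(5, [consent_ok])
print(revoked)
print(finished)
```

Because every check runs on each one-minute tick, a revocation takes effect at the next tick, which matches the bound of one minute stated above.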
With guest collections, access can be defined for groups as well as for individual users. A user can create and configure a group, add individual identities as members, and then assign access privileges to data on the guest collection for the group. Each group member’s access is governed by the group’s access privileges. Groups that are used to manage access for restricted data must be flagged as high-assurance groups and configured with an authentication assurance timeout.
Access to data on guest collections granted via group membership requires the group member to authenticate with the specific identity that grants them the membership, within each application session and within the authentication assurance timeout period. Analogous to high assurance data access, high assurance group management requires the group administrator or manager to authenticate with the specific identity that grants them the role, within each application session and within the authentication assurance timeout period.
As part of service operation and logging, Globus services store information associated with tasks, such as temporary user credentials and information entered by the researcher, including username, email address, and endpoint name. Although files are never shared with the Globus service, the service does access and store filenames and directory paths in order to control the flow of research data. All filenames and directory paths are deleted within 90 days. When managing files containing restricted data, it is expected that filenames and paths may contain restricted data, and the Globus service protects filenames and paths accordingly.
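The 90-day limit on stored filenames and directory paths amounts to a retention filter over task records. The sketch below is purely illustrative and does not describe how the Globus service stores or deletes its data; the record layout and function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Retention window for filenames and paths, taken from the text.
RETENTION = timedelta(days=90)


def purge_expired(records: List[Dict], now: datetime) -> List[Dict]:
    """Keep only records younger than the retention window."""
    return [r for r in records if now - r["recorded_at"] < RETENTION]


now = datetime(2018, 10, 19, tzinfo=timezone.utc)
records = [
    {"path": "/data/run1/a.dat",
     "recorded_at": now - timedelta(days=10)},
    {"path": "/data/run0/b.dat",
     "recorded_at": now - timedelta(days=120)},
]
kept = purge_expired(records, now)
print([r["path"] for r in kept])
```

Running such a filter on a schedule would ensure that no filename or path outlives the retention window.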
All Globus operational and log data stored in AWS are encrypted at rest, using either AWS Key Management Service encryption or AWS service-specific encryption options. All AWS resources are monitored to ensure that their encryption options are set correctly. It is the responsibility of the subscriber to encrypt data at rest on the Globus endpoint.
Globus Connect generates a detailed audit trail that allows reconstruction of data access and user activities. Audit logs record details of all data access events as well as activities such as login and resource management. Logs are written by Globus Connect directly to the subscriber’s storage system. Management of the logs, including policies and procedures for access, encryption, and retention, is the responsibility of the subscriber.