Last Updated: May 30, 2018

1. Introduction

This guide provides an overview of Globus Connect Server version 5, and is targeted at system administrators for the installation and operation of deployments that use this version.

Globus Connect Server version 5 is the next evolution of the server. It provides new capabilities and enhancements for both administrators and users, as well as platform features for building data management solutions. However, it does not yet offer all of the functionality of version 4; new capabilities will be added incrementally as they are built out. At this time, there is no path to upgrade or migrate from version 4 to version 5. Such a path will be made available once version 5 reaches feature parity with version 4.

Institutions that are interested in the new features that version 5 makes available, or that would like to operate an early pilot of this release, will find this guide useful for understanding the new architecture and installation steps. Others should continue to use Globus Connect Server version 4 until version 5 is fully featured and ready for broad deployment.

The latest version, 5.1, supports the following features:

  • Google Drive storage

  • Shared endpoints (guest access), on POSIX storage only

  • HTTPS access to data, for direct access from browsers and other HTTPS clients

  • GridFTP access to data, for reliable, bulk data transfer via the Globus transfer service

1.1. Terminology

The Globus Connect Server architecture has evolved to support several new capabilities, and this section provides an overview of the various components in version 5:

  • Endpoint (Changed from version 4): A deployment of Globus Connect Server, optionally across multiple data transfer nodes. This provides the interface for management and configuration. An endpoint can be configured with more than one “Connector” to allow the endpoint to talk to multiple different types of storage systems simultaneously (e.g., POSIX file system, Google Drive, etc.).

  • Storage Gateway: A named, discoverable set of policies against an endpoint that defines who can access data on a particular subset of a storage system connected to the endpoint, and how such data can be accessed. Multiple storage gateways can be created against a storage system connected to the endpoint.

  • Collections: Collections are what Globus users see and use to access their data. A collection is a set of named files (blobs), organized hierarchically in folders, that is associated with a storage gateway. Each collection has a unique DNS name; is accessible and manageable via HTTPS (client/server access), GridFTP (asynchronous bulk transfer), and a REST API (advanced operations); and is authenticated and authorized via Globus Auth-issued OAuth2 access tokens. The current Globus system calls these “endpoints”. Two types of collections, distinguished by their access requirements, are supported:

    • Mapped collection: Each user accessing data on the collection must have a local account on the storage system. In the current Globus system, these are called “host endpoints”.

    • Guest collection: Users can access data on the collection without a local account on the storage system, based on permissions granted by a local user via Globus. In the current Globus system, these are called “shared endpoints”.

Globus Connect Server version 5.1

With the above architecture, this version supports many new features including:

  • Multiple storage types connected to the same endpoint, and thus same set of servers

  • Multiple storage gateways against the same storage type

  • Clear separation between management and configuration, and data access interfaces

Note:

Globus Connect Server v5.1 only supports guest collections and use of a single server / single data transfer node. Subsequent releases will also support mapped collections, as well as deployments consisting of multiple servers / multiple data transfer nodes.

1.2. Installation Summary

Summarized below are the steps that an administrator must follow to create an endpoint using Globus Connect Server version 5:

  • Install Globus Connect Server on the data transfer nodes or servers, and create the endpoint definition. The endpoint definition includes server and network use configuration.

  • Register the endpoint with Globus Auth as a resource server, using the Globus developer console. Globus Auth is used to secure interactions with the endpoint.

  • For each storage type (e.g., POSIX, Google Drive) that will be used with the endpoint, install and configure the relevant storage connector. This is the software that interfaces with a specific type of storage.

  • Create one or more storage gateways, which define policies such as the specific subset of a storage system that is exposed, the users who are allowed to create collections on the gateway, the type of collection supported, and the authentication policies required.

With the above in place, users interact with the endpoint as follows:

  • Users discover storage gateways of interest via search

  • They can then create collections on the storage gateway to access and share the data, as long as the policy of the storage gateway permits it

  • Users can then use collections to access data via the GridFTP and HTTPS protocols

2. Globus Connect Server Prerequisites

Important: The prerequisites listed in this section must be met before you begin to install Globus Connect Server on your system. Contact us if you have any questions regarding the prerequisites.

2.1. Supported Linux distributions

Globus Connect Server is currently supported on the following Linux distributions:

  • CentOS 7

  • Red Hat Enterprise Linux 7

  • Ubuntu 16.04 LTS

  • Debian 9

2.2. Administrator privileges

You must have administrator (root) privileges on your system to install Globus Connect Server; sudo can be used to perform the installation.

2.3. System time synchronization

Your system must be running ntpd or another daemon for synchronizing with standard time servers.

2.4. Internet-accessible system

Other hosts on the Internet must be able to initiate connections to the system where you will be installing Globus Connect Server. If your system is behind a network address translation (NAT) firewall/router, you may not be able to use the default configuration to install Globus—please see the configuration instructions in the NAT/firewall section. Otherwise, perform the checks shown below to confirm that your system meets the default accessibility requirements. If you are installing on an Amazon EC2 instance, you can skip ahead to the Open TCP ports section.

If you run into problems, your network administrator may be able to offer assistance, or you can contact us.

2.4.1. Check hostname local DNS resolution

Execute this command on the system where you plan to install Globus Connect Server:

$ hostname -f

Confirm that a fully qualified domain name (FQDN) is returned (e.g., 'ep1.transfer.globus.org').

2.4.2. Check hostname external DNS resolution

Use a public DNS server operated by a different organization to verify that the returned FQDN is publicly resolvable. More concretely, you can use nslookup to check that your server’s FQDN resolves against one of Google’s public DNS servers:

$ nslookup 'ep1.transfer.globus.org' 8.8.4.4

If you get a message of the form "** server can’t find ep1.transfer.globus.org: NXDOMAIN", your system’s hostname is not resolvable via public DNS and you need to address the issue before continuing with the installation.
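Both checks can be scripted if you are preparing several servers. The POSIX sh functions below are a hypothetical sketch and are not part of Globus Connect Server. The FQDN test is only a heuristic: it requires at least two labels and no empty labels. The public DNS check treats an NXDOMAIN answer from the resolver as a failure. That resolver is 8.8.4.4 by default, as in the example above.

```shell
# Heuristic FQDN test: at least two labels, no empty labels, no trailing dot.
looks_like_fqdn() {
    case "$1" in
        ""|*.|.*|*..*) return 1 ;;  # empty, leading/trailing dot, empty label
        *.*) return 0 ;;            # e.g. ep1.transfer.globus.org
        *) return 1 ;;              # single label such as "localhost"
    esac
}

# Runs the checks from Sections 2.4.1 and 2.4.2 for the local host.
# $1: public DNS server to query (defaults to 8.8.4.4).
check_hostname_dns() {
    fqdn=$(hostname -f) || return 1
    if ! looks_like_fqdn "$fqdn"; then
        echo "hostname -f returned '$fqdn', which does not look like an FQDN" >&2
        return 1
    fi
    # Check the exit status and also look for the NXDOMAIN message shown above.
    if ! out=$(nslookup "$fqdn" "${1:-8.8.4.4}" 2>&1); then
        echo "nslookup failed for $fqdn" >&2
        return 1
    fi
    case "$out" in
        *NXDOMAIN*) echo "$fqdn is not resolvable via public DNS" >&2; return 1 ;;
    esac
    echo "$fqdn resolves via public DNS"
}
```

Run check_hostname_dns on the server itself. A non-zero exit status means that one of the two checks failed.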

2.5. Open TCP ports

If your system is behind a firewall, select TCP ports must be open for Globus to work. You may need to coordinate with your network or security administrator to open the ports.

The TCP ports that must be open for the default Globus Connect Server installation, together with brief descriptions of each, are listed here:

  • Ports 50000-51000 inbound and outbound to/from ANY

    • Used for GridFTP data channel traffic.

    • The use of the default port range is strongly recommended (you can read why here).

    • Data channel traffic is sent directly between endpoints—it is not relayed by the Globus service.

  • Port 443 inbound from ANY

    • Used by Globus Connect Server Manager Service

    • Used for GridFTP control channel traffic.

    • Used for HTTPS access to collections.

  • Port 443 outbound to ANY

    • Used to communicate with the Globus service via its REST API.

    • Used to communicate with Google Drive servers.

    • Used to pull Globus Connect Server packages from the Globus repository.
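If the server also runs a host-based firewall, the inbound ports above must be opened there too. The following is only a sketch. It assumes either firewalld, the default on CentOS 7 and RHEL 7, or ufw, common on Ubuntu, in their default zone or profile. Both of these allow outbound traffic by default, so only inbound rules are added. The helper names and the GCS_DATA_PORTS variable are hypothetical. Set DRY_RUN=1 to print the commands instead of running them.

```shell
# Inbound GridFTP data channel range (default from the list above).
GCS_DATA_PORTS="${GCS_DATA_PORTS:-50000-51000}"

# Runs a command, or only prints it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# firewalld: data channel range plus 443 (GCS Manager, GridFTP control, HTTPS).
open_gcs_ports_firewalld() {
    run sudo firewall-cmd --permanent --add-port="${GCS_DATA_PORTS}/tcp" &&
        run sudo firewall-cmd --permanent --add-port=443/tcp &&
        run sudo firewall-cmd --reload   # apply the permanent rules now
}

# ufw: port ranges use "start:end" and require an explicit protocol.
open_gcs_ports_ufw() {
    run sudo ufw allow "${GCS_DATA_PORTS%-*}:${GCS_DATA_PORTS#*-}/tcp" &&
        run sudo ufw allow 443/tcp
}
```

For example, DRY_RUN=1 open_gcs_ports_firewalld lists the three firewall-cmd commands without changing anything.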

3. Globus Connect Server Installation and Endpoint Creation

This section covers the installation of Globus Connect Server and the setup of a Globus server endpoint with the default configuration, which is the recommended starting point for new resource providers. You can fine-tune this configuration later without reinstalling.

Before continuing, it is important to confirm that the prerequisites detailed in the previous section have been met.

3.1. Install Globus Connect Server

Skip to the appropriate section for your Linux distribution and follow the instructions to install Globus Connect Server on your system.
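To confirm which subsection applies, you can inspect /etc/os-release, which all four supported distributions provide. The POSIX sh sketch below maps that file's ID and VERSION_ID fields to a subsection. The function name is hypothetical, and only the releases listed in Section 2.1 are recognized.

```shell
# Prints the installation subsection for the distribution described by an
# os-release file (default /etc/os-release); fails for anything unsupported.
gcs_install_section() {
    release_file="${1:-/etc/os-release}"
    [ -r "$release_file" ] || { echo "cannot read $release_file" >&2; return 1; }
    # os-release is shell-compatible KEY=value text; source it in a subshell
    # so its variables do not leak into the caller.
    info=$(. "$release_file"; echo "${ID:-} ${VERSION_ID:-}")
    id=${info%% *}
    version=${info#* }
    case "$id:$version" in
        centos:7*|rhel:7*) echo "3.1.1 CentOS and Red Hat Enterprise Linux" ;;
        ubuntu:16.04)      echo "3.1.2 Ubuntu" ;;
        debian:9*)         echo "3.1.3 Debian" ;;
        *) echo "unsupported distribution: $id $version" >&2; return 1 ;;
    esac
}
```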

3.1.1. CentOS and Red Hat Enterprise Linux

$ wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
$ sudo yum install epel-release-latest-7.noarch.rpm
$ sudo yum install http://downloads.globus.org/toolkit/gt6/stable/installers/repo/rpm/globus-toolkit-repo-latest.noarch.rpm
$ sudo yum-config-manager --enable Globus-Connect-Server-5-Stable
$ sudo yum-config-manager --enable Globus-Toolkit-6-Stable
$ sudo yum install globus-connect-server51

Note: If SELinux is enabled, also install the GCS SELinux policy module:

$ sudo yum install globus-connect-server-manager51-selinux
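If you script the CentOS/RHEL installation, you can make the SELinux step conditional. This sketch assumes that "SELinux is enabled" means that getenforce reports Enforcing or Permissive. The function names are hypothetical.

```shell
# Succeeds for the getenforce modes in which SELinux is enabled.
selinux_mode_is_enabled() {
    case "$1" in
        Enforcing|Permissive) return 0 ;;
        *) return 1 ;;   # Disabled, or getenforce unavailable
    esac
}

# Installs the GCS policy module only when SELinux is enabled.
install_gcs_selinux_module() {
    mode=$(getenforce 2>/dev/null || echo Disabled)
    if selinux_mode_is_enabled "$mode"; then
        sudo yum install globus-connect-server-manager51-selinux
    else
        echo "SELinux is $mode; policy module not needed"
    fi
}
```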

3.1.2. Ubuntu

$ sudo curl -LOs http://downloads.globus.org/toolkit/gt6/stable/installers/repo/deb/globus-toolkit-repo_latest_all.deb
$ sudo dpkg -i globus-toolkit-repo_latest_all.deb
$ sudo sed -i /etc/apt/sources.list.d/globus-toolkit-6-stable*.list \
        -e 's/^# deb /deb /'
$ sudo sed -i /etc/apt/sources.list.d/globus-connect-server-stable*.list \
        -e 's/^# deb /deb /'
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:certbot/certbot
$ sudo apt-get update
$ sudo apt-get install globus-connect-server51

3.1.3. Debian

$ echo 'deb http://ftp.debian.org/debian stretch-backports main' \
        | sudo tee /etc/apt/sources.list.d/stretch-backports.list
$ sudo curl -LOs http://downloads.globus.org/toolkit/gt6/stable/installers/repo/deb/globus-toolkit-repo_latest_all.deb
$ sudo dpkg -i globus-toolkit-repo_latest_all.deb
$ sudo sed -i /etc/apt/sources.list.d/globus-toolkit-6-stable*.list \
        -e 's/^# deb /deb /'
$ sudo sed -i /etc/apt/sources.list.d/globus-connect-server-stable*.list \
        -e 's/^# deb /deb /'
$ sudo apt-get update
$ sudo apt-get install globus-connect-server51

3.2. Basic Globus Endpoint Setup Process

To create your endpoint, you will need to register the Globus Connect Server installation as a Globus app on developers.globus.org. Once it is registered, update the globus-connect-server.conf configuration file with the appropriate values, and then run the globus-connect-server-setup command. After the endpoint is created, you will need to have it made managed and grant the Administrator Role to the appropriate Globus identities.

3.2.1. Register the endpoint on Globus developers console

You will need to register the endpoint as an app on developers.globus.org. Once the registration is complete, a Client ID and a Client Secret will be provided that your endpoint will use to securely identify itself to, and interact with, Globus services. It’s recommended you create a separate Project, as described below, to manage your registration.

  1. Log into developers.globus.org.

  2. Click "Add another project" and fill out the form. The project provides a way to manage this registration, by adding other administrators.

  3. From the "Add..." menu for the project, click "Add a new Globus Connect Server" and fill out the form. The display name will be used to identify this application to users when they use the endpoint for the first time. It is recommended that the display name match the endpoint display name.

  4. Click "Generate a New Client Secret" and fill out the form.

  5. Save the Client ID and Client Secret values for use in the globus-connect-server.conf file.

  6. It is also recommended that you add other appropriate users in your organization as administrators of the project for the sake of redundancy, and also to prevent the loss of administrative control of your Globus Project should any one project administrator leave your organization.

Note: Each new endpoint requires its own Globus Connect Server app registration, with its own Client ID and Client Secret. These registrations can be in the same project or in different projects.

3.2.2. Configure Globus Endpoint

To configure your new endpoint, you will need to edit the /etc/globus-connect-server.conf file as follows:

  1. Set the [Globus].ClientId option to the value of the Client ID of your app from step 3.2.1.

  2. Set the [Globus].ClientSecret option to the value of the Client Secret of your app from step 3.2.1.

  3. Change the [Endpoint].Name value to the name of the GCS endpoint. It is best if the endpoint name matches the Display Name you picked in 3.2.1.

  4. Change the [Endpoint].ServerName value to the public DNS name of the GCS machine.

  5. Set the [LetsEncrypt].Email option to be the email address of the endpoint admin. You will get expiration warnings/notices regarding your endpoint’s certs at this email address, so it should be an email address that is regularly checked.

  6. Set the [LetsEncrypt].AgreeToS option to True.
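A quick check before running setup can catch options that were left empty. The helpers below are a hypothetical POSIX sh sketch and are not part of Globus Connect Server. They assume the file uses `KEY = value` lines under `[Section]` headers. They only verify that each value is non-empty and that AgreeToS is True; they do not check that the values are correct. Run them as root, or against a readable copy of the file.

```shell
# Prints the value of KEY in [SECTION] of an INI-style FILE (first match).
# Usage: ini_get FILE SECTION KEY
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*[#;]/ { next }                  # skip comment lines
        /^[ \t]*\[.*][ \t]*$/ {                 # [Section] header
            cur = $0
            gsub(/^[ \t]*\[|][ \t]*$/, "", cur)
            next
        }
        cur == section && index($0, "=") > 0 {
            eq = index($0, "=")
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }' "$1"
}

# Reports options from the list above that are empty or missing.
check_gcs_conf() {
    conf="${1:-/etc/globus-connect-server.conf}"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    status=0
    for opt in Globus.ClientId Globus.ClientSecret Endpoint.Name \
               Endpoint.ServerName LetsEncrypt.Email; do
        section=${opt%%.*}
        key=${opt#*.}
        if [ -z "$(ini_get "$conf" "$section" "$key")" ]; then
            echo "[$section].$key is not set" >&2
            status=1
        fi
    done
    case "$(ini_get "$conf" LetsEncrypt AgreeToS)" in
        True|true) ;;
        *) echo "[LetsEncrypt].AgreeToS is not set to True" >&2; status=1 ;;
    esac
    return "$status"
}
```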

3.2.3. Create Globus Endpoint

Once the configuration file updates are complete, run the globus-connect-server-setup command to create the endpoint.

$ sudo globus-connect-server-setup

The setup script produces several output values that you will want to keep. Be sure to make note of the values it outputs for “Deployed GCS Manager”, “Google Drive Redirect URL”, and “Created GCS Endpoint”. The “Deployed GCS Manager” value is the service address of the GCS Manager service for your endpoint; support may ask for it if you submit a support ticket. The “Google Drive Redirect URL” value will be needed if you configure the Google Drive connector on this endpoint at some point. The “Created GCS Endpoint” value includes your endpoint’s display name and its UUID, both of which are important for identifying your endpoint.
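One way to keep these values is to save everything the setup command prints and extract the relevant lines afterwards. The sketch below makes two assumptions. First, each value is printed on the same line as its label. Second, the endpoint UUID uses the standard 8-4-4-4-12 hexadecimal form. The log file name and function names are hypothetical.

```shell
# Prints the lines carrying the three values mentioned above from a saved log.
gcs_setup_summary() {
    grep -E 'Deployed GCS Manager|Google Drive Redirect URL|Created GCS Endpoint' "$1"
}

# Prints the first UUID found on the "Created GCS Endpoint" line.
gcs_endpoint_uuid() {
    grep 'Created GCS Endpoint' "$1" |
        grep -oE '[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}' |
        head -n 1
}

# Usage:
#   sudo globus-connect-server-setup 2>&1 | tee gcs-setup.log
#   gcs_setup_summary gcs-setup.log
#   gcs_endpoint_uuid gcs-setup.log
```

The extracted UUID is the value needed for the managed-endpoint request described in Section 3.3.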

3.3. Setting the endpoint as managed

Endpoints that require premium functionality, such as guest collections and premium connectors, need to be covered under a subscription. Globus Connect Server version 5.1 only supports premium features, and thus all such endpoints must be made managed so as to be covered under an organization’s Globus subscription.

In order to make your endpoint managed, you will need to submit a support request to Globus requesting that this be done. This request must be made by one of your organization’s Globus Subscription Support Contacts. The request must:

  1. State that you are requesting that a new GCSv5.1 endpoint be made managed.

  2. Give the UUID of your endpoint as shown in the globus-connect-server-setup command outputs from Section 3.2.3.

  3. Give the name of the organization with the Globus Subscription under which this new endpoint is covered.

After submitting the request, you’ll need to wait until you hear back that your endpoint has been made managed before proceeding to Section 3.4.

3.4. Grant the Administrator Role to the Appropriate Globus Identities

You will now need to grant the Administrator Role for your endpoint to the Globus identities you wish to be able to manage your endpoint definition. To do this, you will run the following command:

$ sudo /opt/globus/bin/gcs-config endpoint admin add-role USER@DOMAIN

It is also recommended that you add multiple appropriate users in your organization as administrators of the endpoint for the sake of redundancy, and also to prevent the loss of administrative control of your endpoint should any one administrator leave your organization.
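When granting the role to several administrators, you can loop over the identities. In this sketch, the USER@DOMAIN format check is only a loose sanity check, the function names are hypothetical, and the identities in the usage comment are placeholders.

```shell
# Succeeds if $1 has the USER@DOMAIN shape: exactly one "@", non-empty parts.
is_user_at_domain() {
    case "$1" in
        ""|@*|*@|*@*@*) return 1 ;;
        *@*) return 0 ;;
        *) return 1 ;;
    esac
}

# Grants the Administrator Role to each identity given as an argument.
add_endpoint_admins() {
    status=0
    for identity in "$@"; do
        if ! is_user_at_domain "$identity"; then
            echo "skipping '$identity': not of the form USER@DOMAIN" >&2
            status=1
            continue
        fi
        sudo /opt/globus/bin/gcs-config endpoint admin add-role "$identity" || status=1
    done
    return "$status"
}

# Usage (placeholder identities):
#   add_endpoint_admins alice@example.edu bob@example.edu
```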

3.5. Configure One or More Storage Gateways on the Endpoint

Before your endpoint can be used to access data, you’ll need to create one or more storage gateways on the endpoint. The process for doing this depends on the type of storage gateway you wish to create. Section 6 provides information on creating and managing storage gateways.

4. Globus Connect Server Configuration

The configuration options for Globus Connect Server are stored in the /etc/globus-connect-server.conf file. After updating settings in the /etc/globus-connect-server.conf file you must run the globus-connect-server-setup command (as root) before the settings will take effect on your endpoint. Given below is an overview of the supported options that can be configured in the /etc/globus-connect-server.conf file.

Note: A detailed description of every option can be found in the comments of the globus-connect-server.conf file that is created on your system during the Globus Connect Server install process.

4.1. [Globus] section

4.1.1. ClientId option

The Client ID of the Globus Connect Server. This is created using the application located at developers.globus.org and selecting the New GCS option. This value should be initially set to match the Client ID for the Globus app you create for your endpoint, and then should never be changed again.

4.1.2. ClientSecret option

The Client Secret of the Globus Connect Server. This is created using the application located at developers.globus.org and selecting the New GCS option. This value should be initially set to match the Client Secret for the Globus app you create for your endpoint, and then should never be changed again unless you are changing the secret used by that Globus app using the tools provided at the above URL.

4.2. [Endpoint] section

4.2.1. Name option

Display name of the endpoint. The special value %(SHORT_HOSTNAME)s will substitute the first segment of the current machine’s public hostname. It is best if this name matches the Display Name you chose when you created the Globus app for this endpoint.

4.2.2. ServerName option

The public hostname of the GCS server. The special value of %(HOSTNAME)s will use the hostname of the current machine. This value must be resolvable in public DNS to a public IP address for your system.

4.3. [LetsEncrypt] section

4.3.1. Email option

The user must supply an email address for Let’s Encrypt. You will get expiration warnings/notices regarding your endpoint’s certs at this email address, so it should be an email address that is regularly checked.

4.3.2. AgreeToS option

The user must explicitly agree to the Let’s Encrypt Terms of Service. If this option is not set to True, the globus-connect-server-setup script will provide a link to letsencrypt.org so that the user can view and acknowledge the ToS.

4.4. [GridFTP] section

4.4.1. IncomingPortRange option

Port range to use for incoming connections. The format is startport,endport. If not set, this will default to 50000,51000. The use of the default port range is strongly recommended (you can read why here).

4.4.2. OutgoingPortRange option

Port range to use for outgoing connections. The format is startport,endport. Only use this if your firewall restricts outgoing ports and GridFTP won’t work otherwise. The default is not to restrict outgoing TCP ports.
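A malformed range is easy to miss, so here is a hypothetical POSIX sh check for the startport,endport syntax used by both options. The rules it applies are assumptions based on TCP port numbering, not documented Globus Connect Server behavior: both values must be integers from 1 to 65535, and the start must not be greater than the end.

```shell
# Succeeds if $1 is a valid "startport,endport" value, e.g. 50000,51000.
valid_port_range() {
    case "$1" in
        ""|*[!0-9,]*|,*|*,|*,*,*) return 1 ;;  # only digits and exactly one comma
        *,*) ;;
        *) return 1 ;;                         # no comma at all
    esac
    start=${1%,*}
    end=${1#*,}
    [ "$start" -ge 1 ] && [ "$end" -le 65535 ] && [ "$start" -le "$end" ]
}
```

For example, valid_port_range 50000,51000 succeeds, while 50000-51000 (wrong separator) and 51000,50000 (reversed) are rejected.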

4.4.3. DataInterface option

Hostname or IP address of the interface to use for data connections. Normally, you will not need to set a value for this option. This option is usually only set when it is desired to force the data channel traffic to use a specific interface - such as forcing data channel traffic to use a high-speed link or in NATed environments with split DNS. If not set in this file, then the default behavior is:

  • When run on an EC2 instance, the data interface will be automatically configured to use the public IPv4 address of the instance.

  • When run on a non-EC2 instance, if [Endpoint].ServerName is set, then that value is used. If this resolves to a private IP address, a warning will be issued.

  • Otherwise, this will not be set, and the GridFTP server will tell clients to connect to the IP address that the control connection was established on.

4.4.4. RequireEncryption option

Require an encrypted data connection for all transfers. If this is set to True, transfers attempted without encryption will fail with an error.

5. NAT/Firewall support and configuration

The Globus Connect Server package provides configuration tools for several related services to enable administrators to easily configure a Globus endpoint. The globus-connect-server.conf file controls how the services used by Globus are configured, and includes configuration options to manage firewall-related configuration of services. Each service provided by the Globus Connect Server packages may be configured separately as described below.

Note that the descriptions below include examples of Globus Connect Server service configurations only. Configuring the firewalls themselves to allow the ports and host connections is not discussed. See the Open TCP ports section for a discussion of the ports used by Globus Connect Server.

5.1. Configuring GridFTP

The options related to configuring GridFTP to work with firewalls and/or NAT are: [Endpoint].ServerName, [GridFTP].IncomingPortRange, [GridFTP].OutgoingPortRange, and [GridFTP].DataInterface.

By default, Globus Connect Server configures the GridFTP server assuming that incoming TCP connections are allowed to the port range 50000-51000 on the GridFTP server node for the data channel traffic and that the source port range for outbound data channel connections is not restricted.

5.2. Using GridFTP behind a NAT Firewall

To use GridFTP behind a NAT firewall, the [Endpoint].ServerName value needs to be set to a value that resolves in public DNS to a public IP address associated with the server hosting the endpoint. The firewall/NAT device must properly forward all traffic destined for the server’s public IP address to the correct internal address of the server. If operating in a split DNS environment, where the DNS name given in the [Endpoint].ServerName option resolves to a different IP address in internal DNS than the public IP address that the DNS name resolves to in public DNS, then the [GridFTP].DataInterface option should be set to the public IP address.

5.3. NAT Firewall Example

As an example, let us consider a site that is using NAT and also has a split DNS configuration. The DNS name for their server is public-gridftp.example.org, which is thus what the [Endpoint].ServerName option value is set to. The public-gridftp.example.org DNS name resolves to 1.2.3.4 in public DNS. However, in the site’s internal DNS, public-gridftp.example.org resolves to 192.168.0.1. Because of this, we’ll need to set [GridFTP].DataInterface to 1.2.3.4 so that GridFTP advertises the proper public IP address for the data channel. If the [GridFTP].DataInterface value isn’t set to the proper public IP address in this case, GridFTP will advertise as its data interface the IP address that the local host resolves the [Endpoint].ServerName value to, which would be 192.168.0.1 due to the split DNS configuration at the site. Since 192.168.0.1 is a private IP address, transfers with this GridFTP server will fail while it advertises 192.168.0.1 as its data interface to other Globus endpoints on the public Internet. Thus, we specifically configure GridFTP to advertise the correct public IP address of 1.2.3.4 for its data interface.

Example of partial globus-connect-server.conf file for the above case:

[Endpoint]
...
ServerName = public-gridftp.example.org
...
[GridFTP]
...
DataInterface = 1.2.3.4
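To detect the split DNS situation described above, you can check on the server itself whether ServerName resolves to a private IPv4 address. If it does, [GridFTP].DataInterface most likely needs to be set to the public address. This sketch considers only IPv4 and the three RFC 1918 private ranges. It relies on getent ahostsv4, which is available on glibc-based Linux systems, and the function names are hypothetical.

```shell
# Succeeds if $1 is in an RFC 1918 private IPv4 range.
is_private_ipv4() {
    case "$1" in
        10.*|192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;   # 172.16.0.0/12
        *) return 1 ;;
    esac
}

# Resolves $1 with the local resolver (i.e. internal DNS) and reports
# whether the first IPv4 address returned is private.
check_server_name() {
    addr=$(getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }')
    if [ -z "$addr" ]; then
        echo "$1 does not resolve locally" >&2
        return 1
    fi
    if is_private_ipv4 "$addr"; then
        echo "$1 resolves locally to private address $addr; consider setting [GridFTP].DataInterface"
    else
        echo "$1 resolves locally to $addr"
    fi
}
```

In the example site above, check_server_name public-gridftp.example.org run on the server would report the private address 192.168.0.1.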

5.4. Using GridFTP with Firewall Port Restrictions

To use a GridFTP server with a firewall with incoming and/or outgoing port restrictions, use the [GridFTP].IncomingPortRange and [GridFTP].OutgoingPortRange configuration options. The former restricts the TCP port range that the GridFTP server listens on for data channel connections. The [GridFTP].OutgoingPortRange option restricts the TCP source port range that the GridFTP server uses when creating outgoing data channel connection sockets. For both of these items, the syntax of the port range is startport,endport (e.g. 50000,51000).

The use of the default values for both [GridFTP].IncomingPortRange and [GridFTP].OutgoingPortRange is strongly recommended (you can read why here).

5.5. Port Restrictions Example

As an example, this partial globus-connect-server.conf file configures the GridFTP server to listen for data channel connections on ports 4000 to 5000 instead of the default 50000 to 51000. It also configures GridFTP to use local source ports from 6000 to 7000 when establishing outbound data channel connections.

[GridFTP]
IncomingPortRange = 4000,5000
OutgoingPortRange = 6000,7000

6. Managing storage gateways

Storage gateways define policies regarding the access and use of the storage systems they pertain to. Once a storage gateway has been created on an endpoint, it is possible for permitted users to create collections in the storage system specified in the storage gateway. These collections can then be accessed via the Globus service, allowing permitted users to browse their contents, transfer files to/from the collection, and perform other operations on the files the collection makes available.

A storage gateway must be configured to use a specific Storage Connector, which determines what type of storage system(s) it can be used to access. Additional details for each type of storage connector, and how these affect the configuration and use of storage gateways, will be given in the documentation for each storage connector.

The /opt/globus/bin/gcs-config storage-gateway command provides several subcommands for creating and managing storage gateways. This section of the documentation discusses the options of the /opt/globus/bin/gcs-config storage-gateway command that are common to all storage connector types, and the following subsections describe each subcommand. For instructions specific to a particular type of storage gateway, see the documentation for the corresponding storage connector.

6.1. Creating a new storage gateway on an endpoint

The /opt/globus/bin/gcs-config storage-gateway create command is used to create new storage gateways. The details of this command differ depending on the Storage Connector being used for the storage gateway. As such, this command is discussed in more detail in the documentation for each Storage Connector type. This command supports the following common options for all Storage Connector types.

6.1.1. --connector option

The type of storage connector to use for the storage gateway.

6.1.2. --display-name option

The name that will be displayed in the Globus service for the storage gateway.

6.1.3. --domain option

The identity domain for identities allowed to create collections on the storage system that the storage gateway is configured for. A Globus user must have an identity from this domain in their Globus Account in order to be able to create collections on the storage gateway. The details of this option differ depending on the Storage Connector being used for the storage gateway. As such, this option is discussed in more detail in the documentation for each Storage Connector type.

6.1.4. --root option

Defines the directory on the server hosting the endpoint that will be the root of the storage gateway. Collections created against this storage gateway must be located inside of this directory and cannot access parts of the storage system outside of this directory.

6.1.5. --restrict-paths option

This option can be used to place further restrictions on how directories in the storage system covered by the storage gateway can be accessed. Paths are given as a comma-separated list, with each path prefixed by the access permission allowed for it: R (read), RW (read/write), or N (no access).

For example, consider a storage gateway rooted at /data that the storage gateway creator wants to make generally available to users with read/write access. Let us also assume that there is a directory /data/static that the creator wants to make accessible read-only, and a /data/secret directory that the creator does not want to be accessible via this storage gateway at all. This could be accomplished by setting the --restrict-paths option like so:

--restrict-paths RW/data,R/data/static,N/data/secret
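When the list of restricted paths grows, assembling the value programmatically can avoid typos. This hypothetical helper takes PERMISSION:PATH pairs and joins them in the format shown above. It rejects any permission other than R, RW, or N, and it rejects relative paths. Paths containing commas or colons are not supported.

```shell
# Builds a --restrict-paths value from PERMISSION:PATH arguments.
build_restrict_paths() {
    result=""
    for entry in "$@"; do
        perm=${entry%%:*}   # text before the first colon
        path=${entry#*:}    # text after the first colon
        case "$perm" in
            R|RW|N) ;;
            *) echo "invalid permission '$perm' in '$entry'" >&2; return 1 ;;
        esac
        case "$path" in
            /*) ;;
            *) echo "path must be absolute in '$entry'" >&2; return 1 ;;
        esac
        result="${result:+$result,}$perm$path"
    done
    printf '%s\n' "$result"
}

# Usage, reproducing the example above:
#   build_restrict_paths RW:/data R:/data/static N:/data/secret
#   # prints RW/data,R/data/static,N/data/secret
```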

6.1.6. --help option

Shows help text explaining the use and options of the /opt/globus/bin/gcs-config storage-gateway create command.

6.2. Listing existing storage gateways on an endpoint

To see the currently configured storage gateways on an endpoint, the /opt/globus/bin/gcs-config storage-gateway list command can be used. For example:

$ sudo /opt/globus/bin/gcs-config storage-gateway list

ID | Storage Type | Display Name | Root
b4be2c2e-1d13-4591-b984-746e32fb655b | POSIX | posix-storage-gateway-demo | /shared
c81cf69c-e494-465c-83f3-7baf272de1c0 | POSIX | Data Storage Gateway | /data

This command provides one method of finding the ID of a storage gateway.
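In scripts, the ID can be looked up by display name from this output. The sketch assumes the "|"-separated layout shown above, with the ID in column 1 and the Display Name in column 3. That layout is taken from the example and may differ between releases. The function name is hypothetical.

```shell
# Reads "storage-gateway list" output on stdin and prints the ID of every
# storage gateway whose Display Name equals $1.
storage_gateway_id_by_name() {
    awk -F '[|]' -v name="$1" '
        {
            id = $1; dn = $3
            gsub(/^[ \t]+|[ \t]+$/, "", id)   # trim column padding
            gsub(/^[ \t]+|[ \t]+$/, "", dn)
            if (id != "ID" && dn == name) print id
        }'
}

# Usage:
#   sudo /opt/globus/bin/gcs-config storage-gateway list |
#       storage_gateway_id_by_name "Data Storage Gateway"
```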

6.3. Showing the details of an existing storage gateway

To see the configuration details of a currently configured storage gateway on an endpoint, the /opt/globus/bin/gcs-config storage-gateway show command can be used. For example:

$ sudo /opt/globus/bin/gcs-config storage-gateway show c81cf69c-e494-465c-83f3-7baf272de1c0

c81cf69c-e494-465c-83f3-7baf272de1c0
--identity-provider 927d2228-f927-42b2-9ace-c523fa2ba34e
--domain example.edu
--connector "POSIX"
--display-name "Data Storage Gateway"
--root "/data"

The ID of a particular storage gateway can be supplied as an argument to display the configuration of that specific storage gateway. If no ID is given as an argument, then the configuration information for all existing storage gateways will be shown.
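A single option value can be read from this output. For example, you might record the current --root before an update. The sketch assumes the "--option value" layout shown above, where a value may be wrapped in double quotes. If several gateways are shown, it returns only the first match. The function name is hypothetical.

```shell
# Reads "storage-gateway show" output on stdin and prints the value of the
# option named in $1 (e.g. --root), without surrounding double quotes.
storage_gateway_option() {
    awk -v opt="$1" '
        $1 == opt {
            sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # drop the option name
            gsub(/^"|"$/, "")                  # drop surrounding quotes
            print
            exit
        }'
}

# Usage:
#   sudo /opt/globus/bin/gcs-config storage-gateway show c81cf69c-e494-465c-83f3-7baf272de1c0 |
#       storage_gateway_option --root
```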

6.4. Modifying an existing storage gateway

The configuration options for an existing storage gateway can be changed with the /opt/globus/bin/gcs-config storage-gateway update command. It is necessary to supply the ID of the storage gateway that you wish to modify as an argument for the command. This command supports the same options as the /opt/globus/bin/gcs-config storage-gateway create command. For example:

$ sudo /opt/globus/bin/gcs-config storage-gateway update --root /new-data c81cf69c-e494-465c-83f3-7baf272de1c0

$ sudo /opt/globus/bin/gcs-config storage-gateway show c81cf69c-e494-465c-83f3-7baf272de1c0

c81cf69c-e494-465c-83f3-7baf272de1c0
--identity-provider 927d2228-f927-42b2-9ace-c523fa2ba34e
--domain example.edu
--connector "POSIX"
--display-name "Data Storage Gateway"
--root "/new-data"

Note that changing the configuration of a storage gateway that has existing collections created against it can easily break those collections. As a result, it is not recommended to make changes to the configuration of a storage gateway that has existing collections.

6.5. Deleting an existing storage gateway

An existing storage gateway can be deleted with the /opt/globus/bin/gcs-config storage-gateway delete command. It is necessary to supply the ID of the storage gateway that you wish to delete as an argument for the command. If this storage gateway has existing collections, then you will be prompted to delete those collections in order to continue with deleting the storage gateway. For example:

$ sudo /opt/globus/bin/gcs-config storage-gateway delete c81cf69c-e494-465c-83f3-7baf272de1c0

Storage System c81cf69c-e494-465c-83f3-7baf272de1c0 in use
Delete collection "test-share-001" (48840f6d-ef3f-4039-b8b7-31a3064b8fa3) [y/N]: y
Deleted storage gateway c81cf69c-e494-465c-83f3-7baf272de1c0

Please note that deleted collections cannot be recovered.

7. POSIX connector

The POSIX storage connector allows Globus Connect Server to access POSIX storage systems mounted on the server that hosts the endpoint. Access to POSIX storage systems through the POSIX connector is facilitated by the creation of POSIX storage gateways on an endpoint. Please refer to Section 6 for the various options available in the tool to manage storage gateways.

7.1. Creating a storage gateway using the POSIX connector

To create a new POSIX storage gateway on an endpoint, use the /opt/globus/bin/gcs-config storage-gateway create command. For example:

$ sudo /opt/globus/bin/gcs-config storage-gateway create --root /data --domain example.edu --connector POSIX --display-name "Data Storage Gateway"

Storage Gateway Created: c81cf69c-e494-465c-83f3-7baf272de1c0

Note that the ID of the new storage gateway is given in the output.

This would create a storage gateway on the endpoint that:

  1. Uses the /data directory as its root, so that only folders and files under that directory can be accessed via the storage gateway.

  2. Allows Globus users whose Globus Account includes an identity from the Identity Provider that controls the example.edu domain to create collections in the /data directory, provided that the user's example.edu identity maps to a local account on the server hosting the endpoint, and that this local account is not forbidden from creating collections by any other policy set on the storage gateway (see below).

  3. Uses the POSIX storage connector.

  4. Has a display name of “Data Storage Gateway”.

The /opt/globus/bin/gcs-config storage-gateway create command supports the following options for storage gateways configured to use the POSIX connector, in addition to the common options supported for all storage connectors:

7.1.1. --domain option

Identities from this domain are allowed to use the POSIX storage gateway to create collections on the file system for which the storage gateway is configured. Identities from this domain must map to an appropriate local user on the server hosting the endpoint. For example, if this value is set to abc.edu, then a Globus user must log into Globus with a Globus Account that includes an abc.edu identity to be able to create collections using this POSIX storage gateway. In addition, that abc.edu identity must map to a local account on the server hosting the endpoint. To continue the example, if a user logs into Globus using their "bob@abc.edu" identity, a "bob" local user account must exist on the server hosting the endpoint for that user to be able to create collections using this POSIX storage gateway.
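The mapping rule in the example above can be sketched in Python. This is a minimal illustration that assumes the simple "username@domain" to "username" mapping the example describes; the function name and the local_users set are hypothetical, and a real endpoint may map identities to local accounts differently.

```python
def map_identity_to_local_user(identity, gateway_domain, local_users):
    """Return the local username for `identity`, or None if it is not eligible.

    Hypothetical helper assuming "username@domain" maps to "username".
    """
    username, sep, domain = identity.rpartition("@")
    if not sep or domain != gateway_domain:
        return None  # identity is not from the storage gateway's --domain
    if username not in local_users:
        return None  # no matching local account on the endpoint's server
    return username


local_users = {"alice", "bob"}  # hypothetical local accounts
print(map_identity_to_local_user("bob@abc.edu", "abc.edu", local_users))    # bob
print(map_identity_to_local_user("bob@xyz.org", "abc.edu", local_users))    # None
print(map_identity_to_local_user("carol@abc.edu", "abc.edu", local_users))  # None
```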

7.1.2. --users-deny option

This option can be used to further control which local users are allowed to create collections using this storage gateway. It specifies, as a comma-separated list, local users that are explicitly forbidden from creating collections using this storage gateway. This option takes precedence over the --users-allow, --groups-deny, and --groups-allow options.

As explained previously, a Globus user can only create a collection using a storage gateway if their Globus Account includes an identity from the identity domain that the storage gateway is configured to allow, and if that identity maps to a local user on the server hosting the endpoint. This option can be used to deny such users the ability to create collections even if they would otherwise be eligible.

For example, consider a storage gateway configured with --domain abc.edu. Suppose that the creator of the storage gateway wants to prevent the users "bob@abc.edu" and "alice@abc.edu" from creating collections using this storage gateway. Ordinarily, as long as the "alice" and "bob" local accounts exist on the server hosting the endpoint, and Alice and Bob log into Globus with Globus Accounts that include their "alice@abc.edu" and "bob@abc.edu" identities, respectively, they would be able to create collections on this storage gateway. However, this can be prevented with the --users-deny option:

--users-deny alice,bob

7.1.3. --users-allow option

This option can be used to further control which local users are allowed to create collections using this storage gateway. It specifies, as a comma-separated list, local users that are explicitly allowed to create collections using this storage gateway. This option takes precedence over the --groups-deny and --groups-allow options. See the --users-deny section for more details.

7.1.4. --groups-deny option

This option can be used to further control which local users are allowed to create collections using this storage gateway. It specifies, as a comma-separated list, local groups whose members are explicitly forbidden from creating collections using this storage gateway. This option takes precedence over the --groups-allow option. See the --users-deny section for more details.

7.1.5. --groups-allow option

This option can be used to further control which local users are allowed to create collections using this storage gateway. It specifies, as a comma-separated list, local groups whose members are explicitly allowed to create collections using this storage gateway. See the --users-deny section for more details.
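The precedence among the four options described above (--users-deny, then --users-allow, then --groups-deny, then --groups-allow) can be sketched as a simple evaluation function. This is a hypothetical illustration, not Globus Connect Server's own code. In particular, this section does not say what happens when a user matches none of the options, so the sketch makes that outcome an explicit, assumed `default` parameter.

```python
def parse_list(value):
    """Split a comma-separated option value such as "alice,bob"."""
    return [item.strip() for item in value.split(",") if item.strip()]


def may_create_collection(user, user_groups, users_deny=(), users_allow=(),
                          groups_deny=(), groups_allow=(), default=True):
    """Evaluate the user/group options in their documented precedence order.

    `default` (an assumption) is returned when no option matches the user.
    """
    if user in users_deny:            # highest precedence
        return False
    if user in users_allow:
        return True
    groups = set(user_groups)
    if groups & set(groups_deny):
        return False
    if groups & set(groups_allow):    # lowest precedence
        return True
    return default


print(may_create_collection("alice", ["staff"], users_deny=parse_list("alice,bob")))  # False
print(may_create_collection("carol", ["staff"], users_allow=["carol"],
                            groups_deny=["staff"]))                                  # True
```

Because --users-deny is checked first, listing a user in both --users-deny and --users-allow still denies them.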

7.2. Creating a collection via a POSIX storage gateway

Your users can now create collections using the POSIX storage gateway that will enable access to the data on the associated storage system. Documentation on creating and managing collections using a POSIX storage gateway can be found here.

7.3. Details regarding effective permissions on collections created via a POSIX storage gateway

The ACL permissions you grant to a Globus user or Globus group for a collection are only one factor in determining that user's actual level of access to the collection. To accurately determine the effective permissions a user will have, you must also consider local file system permissions. All access to a collection is performed with the local file system permissions of the local user that was associated with the collection when it was created, and that access is further restricted by the ACLs set on the collection. In other words, a Globus user's effective permissions on a collection are determined by whichever is more restrictive: the collection ACL permissions or the local file system permissions.

To illustrate, consider a collection that is configured to provide access to the /data/public/ local path on the server hosting the endpoint. Assume that this collection is associated with the "alice" local user, and that the "alice" local user has read-only permissions on this path. Now assume that Alice, the owner of the collection, creates an ACL entry that grants the "bob@abc.edu" identity read and write permissions for the collection. Even though this ACL entry grants read and write permissions, the "bob@abc.edu" identity will still have only read-only effective permissions for the collection, because the "alice" local user associated with the collection has read-only access to the /data/public/ path on the server hosting the endpoint.
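The "most restrictive wins" rule amounts to a set intersection. A minimal sketch, assuming permissions are modeled as hypothetical "read" and "write" labels rather than real POSIX mode bits:

```python
def effective_permissions(acl_permissions, local_permissions):
    """Effective access: only permissions granted by BOTH the collection ACL
    and the associated local user's file system permissions survive."""
    return set(acl_permissions) & set(local_permissions)


# The example above: bob@abc.edu is granted read/write by an ACL entry,
# but the "alice" local user only has read access to /data/public/.
print(sorted(effective_permissions({"read", "write"}, {"read"})))  # ['read']
```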

8. Management Console

The management console, available on managed endpoints, provides a graphical web interface that can be used to monitor endpoint activity and to identify and troubleshoot faults that may indicate underlying infrastructure issues. An Administrator for an endpoint decides who has access to the Management Console for an endpoint via the assignment of the Activity Manager or Activity Monitor role to users, as appropriate. Instructions on how to manage and assign roles for an endpoint can be found here.

You can read about the details and benefits of the management console here.

9. Roles and privileges

Users (or groups) can be granted various roles on any managed endpoint, with each role granting the user (or group) different privileges with respect to that endpoint. All roles can be managed via the Transfer API or the Roles tab on the Manage Endpoints page on the Globus webapp.

The following roles are supported on managed endpoints. These roles need to be explicitly set and none of the privileges are inherited.

  • Administrator

    • Has full control over the endpoint definition.

    • Can delete endpoint definition

    • Can see endpoint definition even if set to private

    • Can manage roles for endpoint

    • An administrator for an S3 endpoint or a share also has all of the abilities of an Access Manager

    • Can be granted by other administrators. The creator of the endpoint is granted Administrator role by default

    • Does not have Activity Manager and/or Activity Monitor capabilities without being granted such explicitly

  • Activity Manager

    • Has full access to the Management Console for the endpoint

    • Can see endpoint definition even if set to private

    • Can be granted by any user who has Administrator role on the endpoint

  • Activity Monitor

    • Has read-only access to the Management Console for the endpoint

    • Can see endpoint definition even if set to private

    • Can be granted by any user who has Administrator role on the endpoint

  • Access Manager

    • Can manage permissions on any endpoint that supports sharing (S3 or shared endpoints)

    • Has read/write access to folders and files on the endpoint

    • Cannot see the endpoint if it is set to private (as can be the case for an S3 endpoint)

    • Can be granted by any user who has Administrator role on the endpoint
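Because no role's privileges are inherited, a user's privileges are simply the union of the roles explicitly granted to them. The sketch below uses hypothetical privilege labels that paraphrase the list above and, for brevity, omits the case where an Administrator of an S3 endpoint or share also has Access Manager abilities.

```python
# Hypothetical labels paraphrasing the role descriptions above.
ROLE_PRIVILEGES = {
    "Administrator": {"manage endpoint definition", "delete endpoint definition",
                      "see private endpoint", "manage roles"},
    "Activity Manager": {"full Management Console access", "see private endpoint"},
    "Activity Monitor": {"read-only Management Console access", "see private endpoint"},
    "Access Manager": {"manage sharing permissions", "read/write data"},
}


def effective_privileges(granted_roles):
    """Union of the privileges of explicitly granted roles; nothing is inherited."""
    privileges = set()
    for role in granted_roles:
        privileges |= ROLE_PRIVILEGES[role]
    return privileges


admin_only = effective_privileges(["Administrator"])
print("full Management Console access" in admin_only)  # False
```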

10. Generating and Monitoring Log Files

10.1. GridFTP Server Log Files

By default, the GridFTP log is located at:

/var/log/gridftp.log

The configuration settings for the GridFTP log file are found in this file:

/etc/gridftp.d/globus-connect-server

Logging for the GridFTP service is enabled by default. Additional details concerning logging for the GridFTP server are available in the globus-gridftp-server man page here.

11. Getting Help

11.1. Troubleshooting Common Problems

This section describes some basic tests you can run when you experience problems with a transfer or an endpoint. These tests can help you narrow down the potential causes of the issue and simplify troubleshooting.

11.1.1. Test Basic Endpoint Functionality

An important verification of endpoint health is to confirm that the endpoint is able to successfully participate in transfers from and to other endpoints. Globus maintains two test endpoints, Globus Tutorial Endpoint 1 and Globus Tutorial Endpoint 2, that are always available for users to access when checking the functionality of their own endpoints. First, attempt to transfer the contents of the /share/godata/ directory on the Globus Tutorial Endpoint 1 endpoint to your own endpoint. Then, attempt to transfer those same files from your endpoint to the /~/ directory on the Globus Tutorial Endpoint 2 endpoint. If both tests succeed, your endpoint is functional and able to serve as both the destination and the source of transfers. For more detailed instructions on how to use the Globus service to transfer files, see here.

11.1.2. Verify globus-gridftp-server Service

Another important check on servers hosting a Globus endpoint is to verify that the globus-gridftp-server service has properly started and is running. To do this, first use the ps command to see if there is an instance of globus-gridftp-server running:

# ps aux | grep globus-gridftp-server
root       604  0.0  0.7  97924  7312 ?        Ss   14:18   0:00 /usr/sbin/globus-gridftp-server -c /etc/gridftp.conf -C /etc/gridftp.d -pidfile /var/run/globus-gridftp-server.pid -no-detach -config-base-path /

If you do not see an instance of globus-gridftp-server running, then the service has not started. You can try to start it by executing the globus-connect-server-setup command and then checking whether an instance of globus-gridftp-server appears in the ps output. If you still don't see an instance of globus-gridftp-server running after issuing the globus-connect-server-setup command, check the GridFTP log (/var/log/gridftp.log by default) for clues as to what might be wrong.

11.2. Globus Help Resources

11.2.1. Documentation Website

This website (docs.globus.org) contains a wealth of information about configuring and using the Globus service. Many common issues can be resolved quickly by browsing our frequently asked questions and reading the relevant guides and how-to’s. We recommend consulting these resources first when looking for fast resolution to any issue you are having with the Globus service.

11.2.2. Mailing Lists

If you use Globus, then participating in one or more of the public email lists is an excellent way to keep in touch with your peers in the Globus Community. For questions about managing your Globus deployment, e.g. installing software for a Globus endpoint, configuring your firewall, and integrating your institution’s identity system, subscribe to the admin list. For other inquiries and discussions, try the user or developer lists. For more information on mailing lists and how to subscribe, click here.

11.2.3. Globus Support

Questions or issues that pertain to Globus Connect Server installation or to any client or service that is used in the Globus software-as-a-service (SaaS) or platform-as-a-service (PaaS) offering can be directed to the Globus support team by submitting a ticket. Subscriptions include a guaranteed support service level.

When submitting a ticket for an issue with Globus Connect Server, please include the endpoint name, a description of your issue, and screenshot/text dumps of any errors you are seeing. Please also include the output of the following commands, run as root, from the server hosting the GCS endpoint:

uname -a
ifconfig
ping $(hostname -f)
cat /etc/issue
cat /etc/gridftp.d/*
cat /etc/gridftp.conf
cat /var/lib/globus-connect-server/endpoint-uuid.txt
globus-gridftp-server --version
grep -v "^$\|^;" /etc/globus-connect-server.conf

Appendix A: Understanding Data Channel Traffic

The data channel is where Globus Connect Server actually transmits the data being moved between endpoints. The default port range used for data channel connections is TCP 50000 to 51000. We strongly recommend configuring all endpoints to use the default data port range, as this provides maximum compatibility with other endpoints that also use the default range and whose firewall rules allow traffic in it. If your endpoint uses a non-default data port range, you are in effect requiring other sites to create additional firewall rules in order to communicate properly with your endpoint. Many sites will not want to do this, which limits the ability of your endpoint to interoperate with the majority of endpoints, which are configured to use the default port range.

If two endpoints (ep1 and ep2) are to be able to successfully conduct transfers, then those endpoints must each be able to connect to each other in their configured data port ranges. For example, consider the following:

Globus Connect Server ep1 uses data port range 40000 to 41000

Globus Connect Server ep2 uses data port range 50000 to 51000

When two Globus Connect Server endpoints conduct a transfer, the recipient endpoint picks a port (or ports) in its configured data port range on which it will listen to receive the transfer from the sender endpoint. This port value is communicated from the recipient endpoint to the sender endpoint via GridFTP control channel data mediated by the Globus service, which both the sender and recipient listen to on port 443. Once the sender endpoint receives this port information, it initiates an outbound connection to that port (or ports) on the recipient to conduct the actual data transfer.

To illustrate, consider the case of ep1 and ep2 mentioned above. If ep1 wanted to send ep2 a file, then ep2 would pick out a port (or ports) in its configured data port range of 50000 to 51000. For the sake of example let’s say that port 50021 has been chosen. This value would then get communicated from ep2 to ep1, via the Globus service through the GridFTP control channel that both ep1 and ep2 are listening to. At that point, ep1 would then initiate a connection out to port 50021 on ep2.

To further illustrate, consider again the case of ep1 and ep2 mentioned above. If ep2 wanted to send ep1 a file, then ep1 would pick out a port (or ports) in its configured data port range of 40000 to 41000. For the sake of example let’s say that port 40331 has been chosen. This value would then get communicated from ep1 to ep2, via the Globus service through the GridFTP control channel that both ep1 and ep2 are listening to. At that point, ep2 would then initiate a connection out to port 40331 on ep1.

It is also important to consider what happens in cases where one endpoint is a Globus Connect Server endpoint and the other endpoint is a Globus Connect Personal endpoint. In such cases, the Globus Connect Personal endpoint will always initiate the connection to the Globus Connect Server endpoint for the transfer. Thus, it will always be the Globus Connect Server endpoint that picks the port (or ports) on which it will listen for that connection. This is the case irrespective of which endpoint is the sender or the recipient. As discussed previously, this information gets communicated from the Globus Connect Server endpoint to the Globus Connect Personal endpoint via the Globus service.
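The direction rules described above can be summarized in a small Python sketch. The Endpoint type, the "GCS"/"GCP" labels, and the random port choice are illustrative assumptions; the actual port selection is performed by the endpoints themselves and relayed through the Globus service. Transfers between two Globus Connect Personal endpoints are outside the scope of this section, so the sketch rejects them.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Endpoint:
    name: str
    kind: str  # "GCS" (Globus Connect Server) or "GCP" (Globus Connect Personal)
    port_range: Optional[Tuple[int, int]] = None  # configured data port range (GCS only)


def plan_data_connection(sender, receiver, rng=random):
    """Return (connecting endpoint, listening endpoint, listening port)."""
    if sender.kind == "GCP" and receiver.kind == "GCP":
        raise ValueError("GCP-to-GCP transfers are not covered by this sketch")
    if sender.kind == "GCP":
        connector, listener = sender, receiver   # GCP always connects to GCS
    elif receiver.kind == "GCP":
        connector, listener = receiver, sender   # ...even when GCS is the sender
    else:
        connector, listener = sender, receiver   # GCS to GCS: recipient listens
    low, high = listener.port_range
    # The listener picks a port in its own configured range (inclusive bounds).
    port = rng.randint(low, high)
    return connector, listener, port


ep1 = Endpoint("ep1", "GCS", (40000, 41000))
ep2 = Endpoint("ep2", "GCS", (50000, 51000))
connector, listener, port = plan_data_connection(ep1, ep2)
print(f"{connector.name} connects out to {listener.name} on port {port}")
```

Swapping the arguments (ep2 sending to ep1) makes ep1 the listener, with a port chosen from 40000 to 41000.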

From these examples, we can see that, in terms of firewall rules, the outbound rules for ep1 must allow it to connect to ep2 on ep2's configured data port range if ep1 is to be able to send files to ep2. In terms of inbound rules, the firewall rules for ep1 must allow it to accept inbound connections on its own configured data port range for it to be able to receive files from other endpoints. The firewall rules for the data port range of any endpoint are similar: they must allow outbound connections to the configured data port range of a remote endpoint for the local endpoint to be able to send files to it, and they must allow inbound connections to the configured data port range of the local endpoint for that endpoint to be able to receive files from other endpoints.
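The firewall requirements above can be expressed as a short sketch that lists the data channel rules a Globus Connect Server endpoint needs: one inbound rule for its own data port range, and one outbound rule per distinct remote data port range. The rule tuples are a hypothetical representation, not the syntax of any particular firewall.

```python
DEFAULT_RANGE = (50000, 51000)  # default GCS data port range (TCP)


def firewall_rules(local_range, remote_ranges):
    """List (direction, protocol, low port, high port) data channel rules."""
    rules = [("inbound", "tcp", local_range[0], local_range[1])]
    for low, high in sorted(set(remote_ranges)):  # one rule per distinct range
        rules.append(("outbound", "tcp", low, high))
    return rules


# ep1 from the example, talking to ep2 (which uses the default range).
for rule in firewall_rules((40000, 41000), [DEFAULT_RANGE]):
    print(rule)
# ('inbound', 'tcp', 40000, 41000)
# ('outbound', 'tcp', 50000, 51000)
```

When every peer uses the default range, the outbound rule set stays at a single entry no matter how many endpoints are involved.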

As illustrated, an endpoint must be able to receive inbound connections on its own configured data port range, as well as be able to make outbound connections to the data port range of any endpoint it wishes to communicate with. If all Globus Connect Server admins pick their own custom port ranges, then this quickly leads to a situation in which site firewall policies become littered with custom rules for these various port ranges and endpoints. However, if everyone uses the default data port range, then firewall rules are much more predictable and manageable. It is for this reason that we recommend that everyone use the default data port range for their endpoint. Those who use a custom data port range may find that they have problems with their endpoint being able to communicate with other endpoints, for the reasons detailed above. Those using custom data port ranges may also find that the admins of other sites and endpoints may not be willing to set up custom firewall rules to accommodate custom data port range choices.

Appendix B: How to update a Globus Connect Server 5.1 install

If you are using a version of Globus Connect Server released prior to Globus Connect Server 5.1, including all versions of Globus Connect Server 4.x and Globus Connect Server 5.0, then you cannot update to GCSv5.1 with the instructions given here. In such a case, please contact us to discuss your options for migrating to Globus Connect Server 5.1.

If you are using Globus Connect Server 5.1, then follow these instructions to update your install:

Red Hat Enterprise Linux, CentOS

$ sudo yum update \*globus\*

Debian, Ubuntu

$ sudo apt-get update
$ sudo apt-get install --only-upgrade ".*globus.*"

After updating your packages, be sure to restart the services and ensure that the update takes full effect by running:

$ sudo globus-connect-server-setup

Appendix C: Setting Endpoint Network Use Options

Globus transfer uses the configured network use level and the location of an endpoint to determine the performance parameters to set on transfers against that endpoint. Administrators of an endpoint may override the default values to best suit their deployment and needs. The configuration settings of the source and destination endpoints are used to determine the concurrency and parallelism options for a given transfer, thus leveraging the available transfer capacity without overwhelming smaller-capacity endpoints during transfers with larger-capacity endpoints.

The location parameter is used to determine the distance, and hence the expected latency, between the two endpoints, and is used in the automatic tuning of transfers. By default, the location is determined automatically by Globus, but the endpoint administrator can set it to explicit coordinates (in decimal degrees). This parameter cannot be set for S3 endpoints or shared endpoints.

Network use is set to the "Normal" level by default. An administrator of a managed endpoint can set the network use level for transfers against their endpoint. Endpoints that have multiple physical servers and good end-to-end connectivity (network and storage) can set a higher network use level to ensure that Globus uses the available bandwidth, while smaller deployments can set it to a lower level.

Three preset options are provided for the endpoint administrator, which have the following values:

Option             MaxConcurrency          PreferredConcurrency    MaxParallelism    PreferredParallelism
Minimal            1                       1                       1                 1
Normal (Default)   number of servers * 4   number of servers * 2   8                 4
Aggressive         number of servers * 8   number of servers * 4   16                4

Note: S3 endpoints do not support parallelism options, only concurrency.

In addition to the above, an administrator can choose the "Custom" option, which lets them set absolute values for both concurrency and parallelism. All of these options have a limit of 64 for MaxConcurrency and MaxParallelism. These values can be modified using the --network-use option of the endpoint-modify command in the Globus CLI.

For a given transfer, the concurrency is calculated as the smallest of three values: the MaxConcurrency of each of the two endpoints, and the larger of the two endpoints' PreferredConcurrency values. Parallelism is calculated similarly, with an additional consideration for high-latency transfers (for example, trans-oceanic transfers), where the parallelism is set to the smaller of the two endpoints' MaxParallelism values.
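The preset table and the calculation rule above can be combined into a short sketch. The function names and dictionary keys are hypothetical, and this is not Globus's own implementation. It assumes that the limit of 64 also caps the server-scaled preset maximums (the text says all options share this limit), and it ignores the S3 case, where parallelism does not apply.

```python
MAX_LIMIT = 64  # documented limit for MaxConcurrency and MaxParallelism


def preset(level, servers):
    """Preset values from the table above for an endpoint with `servers` servers."""
    table = {
        "minimal": (1, 1, 1, 1),
        "normal": (4 * servers, 2 * servers, 8, 4),
        "aggressive": (8 * servers, 4 * servers, 16, 4),
    }
    max_c, pref_c, max_p, pref_p = table[level]
    # Assumption: the 64 limit also applies to server-scaled preset maximums.
    return {"max_c": min(max_c, MAX_LIMIT), "pref_c": pref_c,
            "max_p": min(max_p, MAX_LIMIT), "pref_p": pref_p}


def transfer_settings(a, b, high_latency=False):
    """Return (concurrency, parallelism) for a transfer between endpoints a and b."""
    concurrency = min(a["max_c"], b["max_c"], max(a["pref_c"], b["pref_c"]))
    if high_latency:
        parallelism = min(a["max_p"], b["max_p"])
    else:
        parallelism = min(a["max_p"], b["max_p"], max(a["pref_p"], b["pref_p"]))
    return concurrency, parallelism


big = preset("aggressive", 4)   # four-server endpoint
small = preset("normal", 1)     # single-server endpoint
print(transfer_settings(big, small))                     # (4, 4)
print(transfer_settings(big, small, high_latency=True))  # (4, 8)
```

Here the single-server Normal endpoint caps concurrency at 4 even though the four-server Aggressive endpoint prefers 16, which is how smaller endpoints are protected from being overwhelmed.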

Glossary

Access Manager

The access manager role grants the ability to control read and/or write access permissions for other Globus users on a shared endpoint. You can read a more in-depth discussion here.

Collection

Collections are what Globus users see and use to access their data. A collection is a set of named files (blobs), organized hierarchically in folders and associated with a storage gateway. Each collection has a unique DNS name; is accessible and manageable via HTTPS (client/server access), GridFTP (asynchronous bulk transfer), and a REST API (advanced operations); and is authenticated and authorized via Globus Auth-issued OAuth2 access tokens. The current Globus system calls these "endpoints". Two types of collections, determined by access requirements, are supported:

  • Mapped collection: Each user accessing data on the collection must have a local account on the storage system. In the current Globus system, these are called “host endpoints”.

  • Guest collection: Users can access data on the collection without a local account on the storage system, based on permissions granted by a local user via Globus. In the current Globus system, these are called "shared endpoints".

Endpoint

Endpoint (Changed from version 4): A deployment of Globus Connect Server, optionally across multiple data transfer nodes. This provides the interface for management and configuration. An endpoint can be configured with more than one "Connector" to allow the endpoint to talk to multiple different types of storage systems simultaneously (e.g., POSIX file system, Google Drive, etc.).

Endpoint Definition

This term refers to the metadata about the endpoint, stored as an object in the Globus.org database, used to simplify using and referring to the endpoint for users. Much of the information in the endpoint definition is sent to Globus when the globus-connect-server-setup command is run.

Globus Account

A Globus Account is the set of a user’s linked identities in Globus. The first identity in the user’s identity set in the Globus account is the primary identity, and all subsequent identities added to the Globus account will be linked identities. For example, if Bob has a "bob@globusid.org" identity as their primary identity, and also has a "bob@abc.edu" identity as a linked identity, then Bob’s Globus account contains both the "bob@globusid.org" identity and "bob@abc.edu" identity in its identity set. A user can view and manage the identities in their Globus account here.

GridFTP

GridFTP is an extension of the standard File Transfer Protocol (FTP) for high-speed, reliable, and secure data transfer. See the GridFTP documents for more information.

Managed Endpoint

A managed endpoint is an endpoint that is covered under a subscription and allows advanced features to be enabled. To convert an existing endpoint into a managed endpoint see this writeup.

Storage Connector

A plug-in installed on a Globus Connect Server node that allows it to support a particular storage type, for example the Google Drive connector or the HPSS connector.

Storage Gateway

A named, discoverable set of policies against an endpoint that defines who can access data on a particular subset of a storage system connected to the endpoint, and how such data can be accessed. Multiple storage gateways can be created against a storage system connected to the endpoint.