The Google Cloud Storage connector allows Globus Connect Server to access Google Cloud Storage buckets associated with Google accounts. Access is provided by creating Google Cloud Storage storage gateways on an endpoint. The Google Cloud Storage connector is available as an add-on subscription to organizations with a Globus Standard subscription; please contact us for pricing.

This document describes how to create Google Cloud Storage storage gateways and collections. After the installation is complete, any authorized user can register credentials to access data on Google Cloud Storage buckets, or create a guest collection by following the steps in this How To.

The installation must be done by a system administrator and consists of the following steps:

  • Register the endpoint with Google to obtain credentials that let the endpoint securely use the Google Cloud Storage APIs to access data.

  • Create a storage gateway on the endpoint, configured to use the Google Cloud Storage connector and the credentials from Google.

  • Create a mapped collection using the storage gateway to allow access to Google Cloud Storage data.

Please contact us at support@globus.org if you have questions or need help with installation and use of the Google Cloud Storage connector.


Google Cloud Storage Virtual Filesystem

Google Cloud Storage is a distributed object store in which each data object is accessed by a bucket name and an object name.

The Google Cloud Storage Connector attempts to make this look like a regular filesystem, by treating the bucket name as the name of a directory in the root of the storage gateway’s file system. For example, if a user has access to buckets bucket1 and bucket2, then those buckets would show up as directories when listing /.

The Google Cloud Storage Connector also treats the / character as a delimiter in the API so that it can present something that looks like subdirectories. For example, the object object1 in bucket1 would appear as /bucket1/object1 to the Google Cloud Storage connector, and the object object2/object3 in bucket2 would appear as a file called object3 in the directory /bucket2/object2.
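
The naming rule above can be sketched in a few lines of Python. This is only an illustration of how a virtual path relates to a bucket name and an object name, not the connector's implementation; the function name split_virtual_path is hypothetical.

```python
from typing import Optional, Tuple


def split_virtual_path(path: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a virtual path into (bucket name, object name).

    "/" maps to (None, None): the root lists buckets.
    "/bucket1" maps to ("bucket1", None): the top of a bucket.
    Everything after the bucket is the object name, slashes included.
    """
    if not path.startswith("/"):
        raise ValueError("virtual paths are absolute and start with '/'")
    parts = path.lstrip("/").split("/", 1)
    bucket = parts[0] or None
    object_name = parts[1] if len(parts) > 1 and parts[1] else None
    return bucket, object_name
```

With this sketch, /bucket2/object2/object3 splits into the bucket bucket2 and the object object2/object3, which is the same object the text describes as a file called object3 in the directory /bucket2/object2.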

Registration of endpoint with Google

The Globus Connect Server v5 endpoint needs to be registered as an application with Google so that users can authorize the endpoint to access Google Cloud Storage or Google Drive on their behalf. The following steps describe how the endpoint can be registered as a Google OAuth client to obtain a client id and secret from Google.

Note

The same client id and secret may be used for both Google Cloud Storage and Google Drive connectors — it is not necessary to register twice.

Prerequisites

It is necessary that these steps be performed on a fully functional Globus Connect Server 5 endpoint, as discussed above.

You will need a Google account to complete these steps, and the registration will be stored under that Google account. This account is only for registration of the application and has no bearing on Google accounts that will be allowed to use this endpoint to access data. An administrator may use an existing Google account.

Steps

  1. To register the endpoint with Google, go to the Google Developer Console.

  2. If you have never created a project with Google, you will be prompted to create one. If you create a project, you do not have to change the default permissions for the project when given the option to do so.

  3. After you have created or selected a project, you will use the Google API Dashboard to enable APIs, configure the OAuth consent screen, and create credentials for use with your endpoint.

  4. Enable this project to use the APIs required to interact with Google Cloud Storage and Google Drive. Select the "Library" menu, and repeat the following steps for these API names: Cloud Storage, Google Cloud Storage JSON API, Cloud Resource Manager API, and Google Drive API.

    1. Search for the API name and select the matching result.

    2. Once on the API page, select "Enable".

  5. Select the "OAuth consent screen" menu to configure the OAuth consent screen that will be shown to users.

    1. For the "Application name", enter "Globus Connect Server".

    2. For the "Scopes for Google APIs" section, select "Add Scopes", then select "manually paste", and paste the following scopes before selecting "ADD":

      1. Google Drive

         https://www.googleapis.com/auth/drive.appdata
         https://www.googleapis.com/auth/drive
      2. Google Cloud Storage

         https://www.googleapis.com/auth/cloudplatformprojects.readonly
         https://www.googleapis.com/auth/devstorage.read_write
    3. For "Authorized domains", add globus.org and your own domain.

    4. For "Application Homepage link" and "Application Privacy Policy link", enter a URL from your own domain, or "https://globus.org".

    5. Other fields are optional.

    6. Select "Save".

  6. Select the "Create credentials" button, and then the "OAuth client ID" option.

    1. You will be prompted to select an application type. Choose "Web application" and configure it as follows:

      1. Name: set a descriptive name to be able to identify the registration of this endpoint in your projects on the Google API Manager. For example, the endpoint Display Name can be used for this.

      2. Authorized redirect URIs: set to the value that was given for the "Google OAuth Redirect URL" when the Globus Connect Server 5 endpoint was created, as discussed in the Create Globus Endpoint section of the Globus Connect Server 5 Install Guide. If necessary, running globus-connect-server-setup again will output the URL.

      3. Select "Create".

  7. Make note of the client ID and secret you get from Google for this application, as you will need them to configure the storage gateway.

Storage Gateway

A Google Cloud Storage Connector Storage Gateway is created with the command globus-connect-server storage-gateway create google-cloud-storage, and can be updated with the command globus-connect-server storage-gateway update google-cloud-storage.

Before looking into the policy options specific to the Google Cloud Storage Connector, please familiarize yourself with the Globus Connect Server v5 Data Access Guide which describes the steps to create and update a storage gateway, using the POSIX connector as an example. The commands to create and update a storage gateway for the Google Cloud Storage Connector are similar.

Google Cloud Storage Connector Storage Gateway Policies

The Google Cloud Storage Connector has policies to manage application credentials and to control access to an enumerated set of buckets and Google Cloud Storage projects.

Application Credentials

The --google-client-id and --google-client-secret command-line options provide the credentials that Globus Connect Server uses to authenticate with Google. These values must be configured before data can be accessed on collections created with the Google Cloud Storage Connector.

These are configured by setting up the application project as described in the Google Cloud Storage Connector configuration guide.

Example 1. Setting Google Cloud Storage Connector Application Credentials.

For our example, we’ll assume we’ve obtained credentials as described above. We’ll use the command-line options --google-client-id and --google-client-secret to configure these on our storage gateway.

    --google-client-id GOOGLE_CLIENT_ID \
    --google-client-secret GOOGLE_CLIENT_SECRET

Bucket Restrictions

The argument to the --bucket command-line option is the name of a bucket that this storage gateway is allowed to access. The option may be given more than once to allow several buckets.

Example 2. Restricting Access to Buckets

For our example, we’ll create a Storage Gateway that restricts access to two buckets owned by our organization: research-data-bucket-1, and research-data-bucket-2. Users will be restricted to only those buckets when using collections created on this storage gateway, and only if their credential has permissions to do so.

--bucket research-data-bucket-1 --bucket research-data-bucket-2
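
The effect of the bucket list can be pictured as an intersection between the buckets a user's credential can reach and the buckets configured on the storage gateway. The sketch below is an illustration of that rule only; the function name visible_buckets is hypothetical and the connector's real logic may differ in detail.

```python
from typing import Iterable, List, Optional


def visible_buckets(accessible: Iterable[str],
                    configured: Optional[Iterable[str]] = None) -> List[str]:
    """Return the buckets that would appear when listing /.

    accessible: buckets the user's Google credential can reach.
    configured: buckets named with --bucket, or None if none were given.
    """
    if configured is None:
        # No --bucket options: the gateway places no restriction.
        return sorted(accessible)
    allowed = set(configured)
    # A bucket is shown only if it is both configured and accessible.
    return sorted(b for b in accessible if b in allowed)
```

For the example above, a user whose credential can reach research-data-bucket-1 and an unrelated bucket would see only research-data-bucket-1.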

Google Cloud Storage Gateway Projects

The argument to the --google-cloud-storage-project command-line option is the name of a Google project that may be used to create collections on this storage gateway. The option may be given more than once.

If no projects are configured for a Google Cloud Storage Connector Storage Gateway, then any project name can be used when creating a mapped collection. Otherwise, the project must be a member of the configured project list.

Example 3. Restricting Access to Projects

For our example, we’ll create a Storage Gateway that restricts access to two projects: green-data-13843 and orange-storage-2749994. Each collection created on this storage gateway must be associated with one of those projects.

--google-cloud-storage-project green-data-13843 \
--google-cloud-storage-project orange-storage-2749994
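
The project rule described in this section amounts to a simple membership test. A minimal sketch, assuming the configured projects are available as a list of names (the function name project_allowed is hypothetical):

```python
from typing import Iterable, Optional


def project_allowed(project: str,
                    configured: Optional[Iterable[str]] = None) -> bool:
    """Decide whether a project may be used for a mapped collection.

    If no projects are configured on the storage gateway, any project
    name is accepted; otherwise the project must be in the list.
    """
    allowed = list(configured or [])
    if not allowed:
        return True
    return project in allowed
```

With the two projects from this example configured, green-data-13843 is accepted and any other project name is rejected.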

Creating the Storage Gateway

Now that we have decided on all our policies, we’ll use the command to create the storage gateway.

% globus-connect-server storage-gateway create google-cloud-storage \
    "Google Cloud Storage Gateway" \
    --domain example.org \
    --google-client-id GOOGLE_CLIENT_ID \
    --google-client-secret GOOGLE_CLIENT_SECRET \
    --bucket research-data-bucket-1 --bucket research-data-bucket-2 \
    --google-cloud-storage-project green-data-13843 \
    --google-cloud-storage-project orange-storage-2749994

Storage Gateway Created: 7187a9a0-68e4-48ea-b3b9-7fd06630f8ab

The command succeeded and output the ID of the new storage gateway (7187a9a0-68e4-48ea-b3b9-7fd06630f8ab in this case) for our reference. Note that this value is unique each time you run the command. If you forget the ID of a storage gateway, you can always use the command globus-connect-server storage-gateway list to get a list of the storage gateways on the endpoint.
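
When the bucket and project lists are long, it can help to assemble the command line programmatically so that option names are spelled once. The following is a sketch only: it uses the subcommand and options named in this section, and the helper name storage_gateway_create_argv is hypothetical. It builds and prints the command; it does not run it.

```python
import shlex
from typing import List, Sequence


def storage_gateway_create_argv(display_name: str, domain: str,
                                client_id: str, client_secret: str,
                                buckets: Sequence[str] = (),
                                projects: Sequence[str] = ()) -> List[str]:
    """Build the argument list for creating the storage gateway."""
    argv = ["globus-connect-server", "storage-gateway", "create",
            "google-cloud-storage", display_name,
            "--domain", domain,
            "--google-client-id", client_id,
            "--google-client-secret", client_secret]
    for bucket in buckets:
        # --bucket is repeated once per allowed bucket.
        argv += ["--bucket", bucket]
    for project in projects:
        # --google-cloud-storage-project is repeated once per project.
        argv += ["--google-cloud-storage-project", project]
    return argv


argv = storage_gateway_create_argv(
    "Google Cloud Storage Gateway", "example.org",
    "GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET",
    buckets=["research-data-bucket-1", "research-data-bucket-2"],
    projects=["green-data-13843", "orange-storage-2749994"],
)
# Print a shell-quoted form for review before running it by hand.
print(shlex.join(argv))
```

Keeping the arguments as a list avoids quoting problems with the display name, which contains spaces.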

You can also add other policies to configure additional identity mapping and path restriction policies as described in the Globus Connect Server v5 Data Access Guide.

Note that this creates the storage gateway, but does not yet make it accessible via Globus and HTTPS. You’ll need to follow the steps in the next section.

Collection

A Google Cloud Storage Collection is created with the command globus-connect-server collection create, and can be updated with the command globus-connect-server collection update.

Collection Policies

Every Google Cloud Storage Collection must be associated with exactly one Google Cloud Storage project. For a mapped collection, this must be set at creation time (though it may be updated later). For a guest collection, it does not need to be set. If it is included in the collection creation API call, it must match that of the mapped collection on which the guest collection is being created.

Google Cloud Storage Collection Project

The argument to the --google-project-id command-line option is the name of a Google project that will be used for all data accesses via the Google Cloud Storage API.

If the storage gateway has values set as described in Google Cloud Storage Gateway Projects, then the value must be a member of the list. If not, any existing project name may be used.

Users accessing a collection must be members of the project set in the collection policies.

Example 4. Selecting a Project

For our example, we’ll use the green-data-13843 project we associated with our storage gateway as the project to use for our collection.

--google-project-id green-data-13843
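
The project rules for mapped and guest collections can be summarized as two small checks. This is a sketch of the rules as stated in this section, not of the server's code; both function names are hypothetical, and returning the mapped collection's project for a guest collection is an assumption drawn from the requirement that the two must match.

```python
from typing import Optional, Sequence


def check_mapped_collection_project(project: str,
                                    gateway_projects: Sequence[str] = ()) -> None:
    """Raise ValueError if the project is not permitted by the gateway."""
    if gateway_projects and project not in gateway_projects:
        raise ValueError(
            f"project {project!r} is not configured on the storage gateway")


def effective_guest_project(project: Optional[str],
                            mapped_project: str) -> str:
    """Return the project a guest collection ends up associated with.

    The project may be omitted; if given, it must equal the project of
    the mapped collection the guest collection is created on.
    """
    if project is not None and project != mapped_project:
        raise ValueError("guest collection project must match the "
                         "mapped collection's project")
    return mapped_project
```

For example, creating a mapped collection with green-data-13843 passes the first check when the gateway lists that project, and a guest collection created on it is associated with green-data-13843 whether or not the project is given explicitly.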

Create a collection

The Google Cloud Storage Connector can use all of the policy options for collections described in the Globus Connect Server v5 Data Access Guide. Recall, however, that paths are interpreted as described above in Google Cloud Storage Virtual Filesystem. For our example, we’ll adapt those policies to suit a Google Cloud Storage collection. In particular, we’ll set the base path to / and change the sharing path restrictions to allow read-only sharing of research-data-bucket.

Example 5. Create a Collection

% globus-connect-server collection create \
    7187a9a0-68e4-48ea-b3b9-7fd06630f8ab \
    / collection_name \
    --organization 'Example organization' \
    --contact-email support@example.org \
    --info-link https://example.org/storage/info \
    --description "Google Cloud Storage for Project green-data-13843" \
    --keywords example.org,home \
    --allow-guest-collections \
    --sharing-restrict-paths '{
        "DATA_TYPE": "path_restrictions#1.0.0",
        "read": ["/research-data-bucket"]
    }' \
    --google-project-id green-data-13843

Collection ID: 56c3dff0-d827-4f11-91f3-b0704c53aa4c

The command succeeded and output the ID of the new collection (56c3dff0-d827-4f11-91f3-b0704c53aa4c in this case) for our reference. Note that this value is unique each time you run the command. If you forget the ID of a collection, you can always use the command globus-connect-server collection list to get a list of the collections on the endpoint.
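
The JSON argument to --sharing-restrict-paths is easy to get wrong when typed by hand inside shell quotes. One option is to generate it, as in the sketch below. It covers only the "read" key used in this example; the helper name read_only_restrictions is hypothetical.

```python
import json
from typing import Sequence


def read_only_restrictions(paths: Sequence[str]) -> str:
    """Return a path_restrictions#1.0.0 document allowing read-only sharing."""
    for path in paths:
        # Paths are virtual filesystem paths, so they start at the root.
        if not path.startswith("/"):
            raise ValueError(f"path {path!r} must start with '/'")
    document = {
        "DATA_TYPE": "path_restrictions#1.0.0",
        "read": list(paths),
    }
    return json.dumps(document)


restrictions = read_only_restrictions(["/research-data-bucket"])
print(restrictions)
```

The printed string can be passed as the single argument that follows --sharing-restrict-paths.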

You can use this value as an endpoint for the Globus transfer service and web application, or when editing or deleting this collection.

This command has many policy-related options; they are documented in full in the reference manual, and many are discussed in later sections of this document.

User Credential

As mentioned above, access to mapped collections on a Google Cloud Storage storage gateway requires users to register credentials. These credentials are created by performing an authentication flow with Google. This is initiated by visiting the Credentials tab of the collection. The user is directed to that page when they first attempt to access that collection.

Appendix A: Document Types for the Google Cloud Storage Connector

GoogleCloudStoragePolicies Document

This type describes the public and private policies for a Google Cloud Storage Gateway.

Name | Type | Description
DATA_TYPE | string google_cloud_storage_policies#1.0.0 | Type of this document
client_id | string | Client ID registered with the Google Application console to access Google Cloud Storage. [Private]
secret | string | Secret created to access Google Cloud Storage with the client_id in this policy. [Private]
buckets | array (string) | The list of Google Cloud Storage buckets which the Storage Gateway is allowed to access, as well as the list of buckets that will be shown in root-level directory listings. If this list is unset, bucket access is unrestricted and all non-public buckets accessible to the credential will be shown in root-level directory listings. The value is a list of bucket names.
auth_callback | string <uri> | URL of the auth callback that must be registered on the Google API console for the application client_id in order to process Google credentials.
projects | array (string) | The list of Google Cloud Storage project ids which the Storage Gateway is allowed to access. If this list is unset, project access is unrestricted. The value is a list of project id strings.

{
  "DATA_TYPE": "google_cloud_storage_policies#1.0.0",
  "client_id": "string",
  "secret": "string",
  "buckets": [
    "string"
  ],
  "auth_callback": "https://example.globus.org/api/v1/authcallback_google",
  "projects": [
    "strawberry-delta-129193"
  ]
}
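
As an illustration of the document shape, the sketch below builds a GoogleCloudStoragePolicies document and produces a copy with the fields marked [Private] removed, for example before writing it to a log. It assumes that an unset list is represented by omitting the key; that representation and both function names are assumptions, not part of the documented API.

```python
from typing import Any, Dict, Optional, Sequence

# Fields marked [Private] in the table above.
PRIVATE_FIELDS = ("client_id", "secret")


def make_gateway_policies(client_id: str, secret: str,
                          buckets: Optional[Sequence[str]] = None,
                          projects: Optional[Sequence[str]] = None
                          ) -> Dict[str, Any]:
    """Build a google_cloud_storage_policies#1.0.0 document."""
    document: Dict[str, Any] = {
        "DATA_TYPE": "google_cloud_storage_policies#1.0.0",
        "client_id": client_id,
        "secret": secret,
    }
    if buckets:
        document["buckets"] = list(buckets)
    if projects:
        document["projects"] = list(projects)
    return document


def without_private_fields(document: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the document that is safe to display."""
    return {k: v for k, v in document.items() if k not in PRIVATE_FIELDS}
```

The original document is left unchanged by without_private_fields, so the full version can still be sent where the credentials are needed.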

GoogleCloudStorageCollectionPolicies Document

The GoogleCloudStorageCollectionPolicies document describes Google-specific policies for a collection.

Name | Type | Description
DATA_TYPE | string google_cloud_storage_collection_policies#1.0.0 | Type of this document
project | string | Google Cloud Platform project ID value that is associated with this collection.

{
  "DATA_TYPE": "google_cloud_storage_collection_policies#1.0.0",
  "project": "strawberry-delta-129193"
}
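
A client that reads these documents may want to check the DATA_TYPE before trusting the other fields. The sketch below assumes, based only on the examples in this appendix, that DATA_TYPE has the form name#major.minor.patch; the function names are hypothetical.

```python
from typing import Any, Dict, Tuple


def parse_data_type(value: str) -> Tuple[str, Tuple[int, int, int]]:
    """Split a DATA_TYPE string into its name and numeric version."""
    name, separator, version = value.partition("#")
    if not separator or not name:
        raise ValueError(f"unexpected DATA_TYPE: {value!r}")
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected version in DATA_TYPE: {value!r}")
    major, minor, patch = (int(p) for p in parts)
    return name, (major, minor, patch)


def collection_project(document: Dict[str, Any]) -> str:
    """Return the project from a collection policies document."""
    name, _version = parse_data_type(document["DATA_TYPE"])
    if name != "google_cloud_storage_collection_policies":
        raise ValueError(f"not a collection policies document: {name}")
    return document["project"]
```

Applied to the example document above, collection_project returns strawberry-delta-129193.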

GoogleCloudStorageUserCredentialPolicies Document

Name | Type | Description
DATA_TYPE | string google_cloud_storage_user_credential_policies#1.0.0 | Type of this document
sub | string | OpenID Connect subject property of this credential. [read-only]
email | string <email> | OpenID Connect email property of this credential. [read-only]
access_token | string | Access token to interact with the Google Cloud Storage API. [read-only] [Private]
refresh_token | string | Refresh token to generate new access tokens to use with the Google Cloud Storage API. [read-only] [Private]
scopes | array (string) | List of OAuth 2 scopes associated with the tokens in this credential. [read-only]
projects | array ( GoogleCloudStorageProject ) | List of Google Cloud Platform projects available for use with this credential. [read-only]
token_expiry | string <date-time> | Time when the access token expires. [read-only]

{
  "DATA_TYPE": "google_cloud_storage_user_credential_policies#1.0.0",
  "sub": "string",
  "email": "user@example.com",
  "access_token": "string",
  "refresh_token": "string",
  "scopes": [
    "string"
  ],
  "projects": [
    {
      "projectId": "strawberry-delta-129193",
      "name": "Globus Data Project"
    }
  ],
  "token_expiry": "2020-02-04T21:44:12Z"
}
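
Two checks that a client might run against a credential document are whether the access token has expired and whether the credential carries the scopes it needs. The sketch below assumes token_expiry always uses the UTC format shown in the example (a trailing Z and no fractional seconds); the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Optional


def token_expired(credential: Dict[str, Any],
                  now: Optional[datetime] = None) -> bool:
    """Return True if the access token's expiry time has passed."""
    expiry = datetime.strptime(
        credential["token_expiry"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return current >= expiry


def missing_scopes(credential: Dict[str, Any],
                   required: Iterable[str]) -> List[str]:
    """Return the required scopes that the credential does not have."""
    granted = set(credential.get("scopes", []))
    return [scope for scope in required if scope not in granted]
```

For a Google Cloud Storage collection, the required scopes would be the two listed for Google Cloud Storage in the registration steps; an empty result from missing_scopes means the credential covers them.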

© 2010- The University of Chicago