Last Updated: September 15, 2021

The Globus Azure Blob storage connector can be used to access and share data on Microsoft Azure Blob Storage. The connector is available as an add-on subscription to organizations with a Globus Standard subscription; please contact us for pricing.

This document describes how to install the Azure Blob Connector and configure Azure Blob Storage Gateways and Collections. After these steps are complete, any Globus user you have authorized can register a credential to access the Azure Blob files to which they have permissions and, if enabled, can create guest collections to share access using those credentials by following the instructions in How To Share Data Using Globus.

This document assumes that you or another administrator has already installed Globus Connect Server v5.4.28 or higher on all data transfer nodes, and that you have an administrator role on that endpoint.

The installation must be done by a system administrator and consists of the following steps:

  • Register a Microsoft Azure Application which the connector will use to access the Azure Blob APIs.

  • Create a storage gateway on the endpoint configured to use the Azure Blob Connector.

  • Create a mapped collection using the Azure Blob Storage Gateway to provide access to Azure Blob Storage data.

Please contact us at support@globus.org if you have questions or need help with configuration and use of the Azure Blob Connector.


Azure Blob Connector Virtual Filesystem

Azure Blob Storage is unstructured object storage, where each data object is accessed based on a container name and a blob name.

The Azure Blob Connector attempts to make this look like a regular filesystem, by exposing containers as directories in the root of the storage gateway’s file system.

The Azure Blob Connector then treats the / character in blob names as a delimiter, presenting blobs in what looks like subdirectories.

For example, the blob projects/abc/output.txt in container project-data would appear as the file 'output.txt' in the /project-data/projects/abc directory.
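
The mapping described above can be sketched in a few lines of Python. This is only an illustration of the naming rule, not the connector's implementation; the function names to_virtual_path and split_virtual_path are hypothetical.

```python
import posixpath
from typing import Tuple


def to_virtual_path(container: str, blob_name: str) -> str:
    """Return the path at which a blob would appear on the storage gateway.

    The container becomes a directory in the root, and each "/" in the
    blob name is treated as a directory delimiter.
    """
    return "/" + container + "/" + blob_name


def split_virtual_path(path: str) -> Tuple[str, str]:
    """Split a virtual path back into (container, blob name)."""
    container, _, blob_name = path.lstrip("/").partition("/")
    return container, blob_name


path = to_virtual_path("project-data", "projects/abc/output.txt")
print(path)                     # /project-data/projects/abc/output.txt
print(posixpath.dirname(path))  # /project-data/projects/abc
```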

Registration of endpoint with Microsoft

The Globus Connect Server v5 endpoint needs to be registered as an application with Microsoft so that users can authorize the endpoint to access Azure storage on their behalf, or so the app credentials can be used as a service principal. The following steps describe how the endpoint can be registered as an Azure application to obtain a client id and secret. See Microsoft documentation for additional information.

Prerequisites

These steps must be performed on a fully functional Globus Connect Server v5 endpoint.

You will need a Microsoft account to complete these steps, and the registration will be stored under that account. This account should be in the same organization as the Azure storage account.

Registration Steps

  1. To register the endpoint with Microsoft, go to Microsoft Azure App registrations.

  2. Select + New registration to add a new registration.

    1. For Name, enter a name such as Globus Connect Server. This will be displayed to users of your collection when they are prompted to log in to Microsoft during credential registration.

    2. For Supported account types you should choose Single Tenant.

    3. For Redirect URI: set the value that was displayed when the endpoint was created.

      If you don’t have that value handy, you can run the command

      globus-connect-server endpoint show

      You’ll see output that looks something like this:

      Display Name:    Test Endpoint
      ID:              669ec822-ca79-455c-89a7-cccb7aefbf8e
      Subscription ID: 6e62e6d7-e368-45f4-a23d-fb41243e8005
      Public:          True
      GCS Manager URL: https://21542.data.globus.org
      Network Use:     normal

      You can construct the auth callback URL by appending /api/v1/authcallback to the value of the GCS Manager URL. In this example case, the result is https://21542.data.globus.org/api/v1/authcallback.

    4. Select Register.

  3. Select API permissions to configure the permissions required for Azure storage access. This step is not required for service principal authentication.

    1. Select +Add a permission

      1. Select Microsoft Graph, and then Delegated permissions.

      2. Under OpenId permissions, check email, offline_access, openid, profile.

      3. Select < All APIs to go back one screen.

      4. Select Azure Storage, and then Delegated permissions.

      5. Under Permissions, check user_impersonation.

      6. Select Add permissions to save these selections.

  4. Select Certificates & secrets to create a secret.

    1. Select + New client secret.

      1. Enter a description and choose an expiration time, if desired. The storage gateway will need to be updated if the secret changes.

    2. Make note of the Value. This will be used to configure the storage gateway --ms-client-secret option.

  5. If desired, select Branding to configure additional login screen details.

  6. Select Overview.

    1. Note the Application (client) ID. This will be used to configure the storage gateway --ms-client-id option.

    2. If you chose Single Tenant for supported account types, note the Directory (tenant) ID. This will be used to configure the storage gateway --ms-tenant option.

  7. App registration is complete.
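
The Redirect URI from step 2 can also be derived from the output of globus-connect-server endpoint show. The following Python sketch assumes the output has the "Key: value" layout shown in the example above; the function names are hypothetical and not part of Globus Connect Server.

```python
def gcs_manager_url_from_show_output(output: str) -> str:
    """Extract the GCS Manager URL from `endpoint show` style output.

    Assumes one "Key: value" pair per line, as in the example output.
    """
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "GCS Manager URL":
            return value.strip()
    raise ValueError("GCS Manager URL not found in output")


def auth_callback_url(gcs_manager_url: str) -> str:
    """Append /api/v1/authcallback to the GCS Manager URL."""
    return gcs_manager_url.rstrip("/") + "/api/v1/authcallback"


example_output = """\
Display Name:    Test Endpoint
ID:              669ec822-ca79-455c-89a7-cccb7aefbf8e
GCS Manager URL: https://21542.data.globus.org
Network Use:     normal
"""

print(auth_callback_url(gcs_manager_url_from_show_output(example_output)))
# https://21542.data.globus.org/api/v1/authcallback
```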

Storage Gateway

An Azure Blob Storage Gateway is created with the command globus-connect-server storage-gateway create azure-blob, and can be updated with the command globus-connect-server storage-gateway update azure-blob.

Before looking into the policy options specific to the Azure Blob Connector, please familiarize yourself with the Globus Connect Server v5 Data Access Guide, which describes the steps to create and update a storage gateway, using the POSIX connector as an example. The commands to create and update a storage gateway for the Azure Blob Connector are similar.

Azure Blob Connector Storage Gateway Policies

The Azure Blob Connector has policies to manage application credentials and storage account settings.

Application Credentials

The --ms-client-id and --ms-client-secret command-line options provide information for Globus Connect Server to authenticate with Azure Blob Storage. These values must be configured in order for users to access data on collections created with the Azure Blob Connector.

By default, each user of an Azure Blob collection will authenticate to their own Azure AD account, which must have been granted permission to access blob storage via Azure’s Role-Based Access Control. Alternatively, you can use --azure-credential-type to use the --ms-client-id and --ms-client-secret values as service principal credentials. When using service principal auth, all users of the collection will access the storage using those service credentials.

You will also need to know your Microsoft tenant ID, Azure Storage account name, and whether or not Azure Data Lake Gen2 hierarchical namespace is enabled on the storage account.

These are configured after registering the application with Microsoft as described in the Azure Blob Connector configuration guide.

Example 1. Setting Azure Blob Connector Application Credentials.

For our example, we’ll assume we’ve obtained credentials as described above. We’ll use the command-line options --ms-client-id and --ms-client-secret to configure these on our storage gateway, along with --ms-tenant and --azure-storage-account to configure the storage account. Our storage account has Azure Data Lake Gen2 hierarchical namespace enabled.

    --ms-client-id CLIENT_ID \
    --ms-client-secret CLIENT_SECRET \
    --ms-tenant TENANT_ID \
    --azure-storage-account STORAGE_ACCOUNT \
    --azure-adls
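
If you script your deployment, the same options can be assembled programmatically. The Python sketch below only builds the option list from Example 1; the helper name azure_policy_options is hypothetical, and the values are the same placeholders used above.

```python
import shlex
from typing import List


def azure_policy_options(
    client_id: str,
    client_secret: str,
    tenant_id: str,
    storage_account: str,
    adls: bool,
) -> List[str]:
    """Build the Azure Blob policy options as a list of arguments."""
    return [
        "--ms-client-id", client_id,
        "--ms-client-secret", client_secret,
        "--ms-tenant", tenant_id,
        "--azure-storage-account", storage_account,
        # Must match whether hierarchical namespace is enabled on the account.
        "--azure-adls" if adls else "--azure-no-adls",
    ]


options = azure_policy_options(
    "CLIENT_ID", "CLIENT_SECRET", "TENANT_ID", "STORAGE_ACCOUNT", adls=True
)
print(shlex.join(options))
```

Keeping the options as a list rather than a single string avoids quoting problems if a value contains spaces or shell metacharacters.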

Creating the Storage Gateway

Now that we have decided on all our policies, we’ll use the following command to create the storage gateway.

% globus-connect-server storage-gateway create azure-blob \
    "Azure Blob Storage Gateway" \
    --domain example.org \
    --ms-client-id CLIENT_ID \
    --ms-client-secret CLIENT_SECRET \
    --ms-tenant TENANT_ID \
    --azure-storage-account STORAGE_ACCOUNT \
    --azure-adls

Storage Gateway Created: 7187a9a0-68e4-48ea-b3b9-7fd06630f8ab

The command succeeded and printed the ID of the new storage gateway (7187a9a0-68e4-48ea-b3b9-7fd06630f8ab in this case) for our reference. The ID will be a different, unique value each time you run the command. If you forget the ID of a storage gateway, you can always use the command globus-connect-server storage-gateway list to get a list of the storage gateways on the endpoint.
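
When the create command is run from a script, the storage gateway ID can be captured from its output. The Python sketch below assumes the output line has exactly the "Storage Gateway Created: ID" form shown above; the function name created_gateway_id is hypothetical.

```python
import re
from typing import Optional

# Matches the confirmation line followed by a 36-character UUID-style ID.
_CREATED = re.compile(
    r"^Storage Gateway Created:\s*([0-9a-fA-F-]{36})\s*$", re.MULTILINE
)


def created_gateway_id(output: str) -> Optional[str]:
    """Return the storage gateway ID from the command output, if present."""
    match = _CREATED.search(output)
    return match.group(1) if match else None


print(created_gateway_id(
    "Storage Gateway Created: 7187a9a0-68e4-48ea-b3b9-7fd06630f8ab\n"
))
# 7187a9a0-68e4-48ea-b3b9-7fd06630f8ab
```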

You can also add other policies to configure additional identity mapping and path restriction policies as described in the Globus Connect Server v5 Data Access Guide.

Note that this creates the storage gateway, but does not yet make it accessible via Globus and HTTPS. You’ll need to follow the steps in the next section.

Collection

An Azure Blob Collection is created with the command globus-connect-server collection create, and can be updated with the command globus-connect-server collection update.

As the Azure Blob Connector does not introduce any policies beyond those used by the base collection type, you can follow the sequence in the Collections Section of the Globus Connect Server v5 Data Access Guide. Recall, however, that the paths are interpreted as described above in Azure Blob Connector Virtual Filesystem.

User Credential

As mentioned above, access to Azure Blob mapped collections requires users to register credentials. These credentials are created by performing an authentication flow with Microsoft. This is initiated by visiting the Credentials tab of the collection details page on the Globus web app. The user is directed to that page when they first attempt to access that collection.

When the storage gateway is configured for service principal credentials, the user will not visit the Credentials tab of the collection to register their own credential. In that case, all users will access the blob storage as the service principal.

Appendix A: Limitations

Directories

Azure Blob Storage without Azure Data Lake Gen2 hierarchical namespace support does not have true directories, so directories created by this connector are normal blobs with the metadata hdi_isfolder = true. Other tools may be confused by the presence of these blobs.

If the --azure-adls/--azure-no-adls storage gateway option is improperly set, directory creation and deletion will fail, and directory listings may produce incorrect results.
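
A tool that reads the same container can recognize these directory marker blobs by their metadata. The Python sketch below works on a plain dictionary of blob metadata rather than on any particular Azure SDK object. The function name is hypothetical, and the case-insensitive comparison of the value is an assumption.

```python
from typing import Mapping


def is_directory_marker(metadata: Mapping[str, str]) -> bool:
    """Return True if blob metadata marks the blob as a directory.

    Checks for hdi_isfolder = true; comparing the value without regard
    to case is an assumption made for this sketch.
    """
    return metadata.get("hdi_isfolder", "").strip().lower() == "true"


print(is_directory_marker({"hdi_isfolder": "true"}))  # True
print(is_directory_marker({}))                        # False
```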

Appendix B: Document Types for the Azure Blob Connector

AzureBlobStoragePolicies Document

This type describes the public and private policies for an Azure Blob Storage Gateway.

| Name | Type | Description |
|---|---|---|
| DATA_TYPE | string azure_blob_storage_policies#1.0.0 | Type of this document |
| client_id | string | Client ID registered with the Azure console to access Azure Blob. [Private] |
| secret | string | Secret created in the Azure console to access Azure Blob with the client_id in this policy. [Private] |
| tenant | string | Tenant ID of the Microsoft organization. [Private] |
| auth_type | string | The method of authentication to Azure. 'user' prompts the user to log in to their Microsoft account via an oauth2 flow. 'service_principal' uses the configured client_id and client_secret values to authenticate as an Azure service principal. |
| account | string | Azure Storage account to access with this storage gateway. [Private] |
| adls | boolean | Flag indicating the Azure storage account has enabled Azure Data Lake Gen2 hierarchical namespace support. [Private] |
| auth_callback | string <uri> | URL of the auth callback that must be registered on the Azure console for the application client_id in order to process Microsoft credentials. |

{
  "DATA_TYPE": "azure_blob_storage_policies#1.0.0",
  "client_id": "string",
  "secret": "string",
  "tenant": "string",
  "auth_type": "string",
  "account": "string",
  "adls": true,
  "auth_callback": "https://example.globus.org/api/v1/authcallback"
}
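
As an illustration, a document of this shape can be assembled and checked before it is submitted. The Python sketch below uses only the field names from the table above. The helper name storage_policies is hypothetical, and leaving out auth_callback on the assumption that the service supplies it is a guess, not documented behavior.

```python
import json
from typing import Any, Dict

# The two authentication methods listed for auth_type.
AUTH_TYPES = ("user", "service_principal")


def storage_policies(
    client_id: str,
    secret: str,
    tenant: str,
    account: str,
    adls: bool,
    auth_type: str = "user",
) -> Dict[str, Any]:
    """Build an AzureBlobStoragePolicies document as a dictionary."""
    if auth_type not in AUTH_TYPES:
        raise ValueError("auth_type must be one of %s" % (AUTH_TYPES,))
    return {
        "DATA_TYPE": "azure_blob_storage_policies#1.0.0",
        "client_id": client_id,
        "secret": secret,
        "tenant": tenant,
        "auth_type": auth_type,
        "account": account,
        "adls": adls,
    }


doc = storage_policies("CLIENT_ID", "CLIENT_SECRET", "TENANT_ID",
                       "STORAGE_ACCOUNT", adls=True)
print(json.dumps(doc, indent=2))
```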

AzureBlobUserCredentialPolicies Document

The AzureBlobUserCredentialPolicies document describes the Azure Blob-specific policy information included in a UserCredential. This document contains read-only data about the user’s credentials.

| Name | Type | Description |
|---|---|---|
| DATA_TYPE | string azure_blob_user_credential_policies#1.0.0 | Type of this document |
| sub | string | OAuth subject identifier claim. |
| email | string <email> | OAuth email claim. |
| access_token | string | OAuth access token. |
| refresh_token | string | OAuth refresh token. |
| scopes | array (string) | OAuth scopes associated with this access token. |
| token_expiry | string <date-time> | OAuth access token expiration time. |

{
  "DATA_TYPE": "azure_blob_user_credential_policies#1.0.0",
  "sub": "string",
  "email": "user@example.com",
  "access_token": "string",
  "refresh_token": "string",
  "scopes": [
    "openid",
    "email",
    "profile",
    "offline_access",
    "user_impersonation"
  ],
  "token_expiry": "2020-02-04T21:44:12Z"
}
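
Because this document is read-only, a client will typically only inspect it. The Python sketch below checks whether the access token has expired. It assumes token_expiry always uses the UTC format shown in the example (ending in Z, with no fractional seconds), and the function name token_expired is hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Optional


def token_expired(
    credential: Mapping[str, Any], now: Optional[datetime] = None
) -> bool:
    """Return True if the access token's expiration time has passed.

    Assumes token_expiry looks like "2020-02-04T21:44:12Z" (UTC).
    """
    expiry = datetime.strptime(
        credential["token_expiry"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiry


credential = {
    "DATA_TYPE": "azure_blob_user_credential_policies#1.0.0",
    "token_expiry": "2020-02-04T21:44:12Z",
}
before = datetime(2020, 2, 4, 21, 0, 0, tzinfo=timezone.utc)
after = datetime(2020, 2, 4, 22, 0, 0, tzinfo=timezone.utc)
print(token_expired(credential, now=before))  # False
print(token_expired(credential, now=after))   # True
```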