Azure Blob Connector
The Globus Azure Blob storage connector can be used for access and sharing of data on Microsoft Azure Blob Storage. The connector is available as an add-on subscription to organizations with a Globus Standard subscription - please contact us for pricing.
This document describes how to install the Azure Blob Connector and configure Azure Blob Storage Gateways and Collections. After these steps are complete, any Globus user you have authorized can register a credential to access the Azure Blob files they have permission to use and, if enabled, can create guest collections to share access using those credentials by following the instructions in How To Share Data Using Globus.
This document assumes that you or another administrator has already installed Globus Connect Server v5.4.28 or higher on all data transfer nodes, and that you have an administrator role on that endpoint.
The installation must be done by a system administrator and consists of the following steps:
-
Register a Microsoft Azure Application which the connector will use to access the Azure Blob APIs.
-
Create a storage gateway on the endpoint configured to use the Azure Blob Connector.
-
Create a mapped collection using the Azure Blob Storage Gateway to provide access to Azure Blob Storage Gateway data.
Please contact us at support@globus.org if you have questions or need help with configuration and use of the Azure Blob Connector.
Azure Blob Connector Virtual Filesystem
Azure Blob Storage is unstructured object storage, where each data object is accessed based on a container name and a blob name.
The Azure Blob Connector attempts to make this look like a regular filesystem, by exposing containers as directories in the root of the storage gateway’s file system.
The Azure Blob Connector then treats the / character in blob names as a delimiter, presenting blobs in what looks like subdirectories.
For example, the blob projects/abc/output.txt in container project-data would appear as the file output.txt in the /project-data/projects/abc directory.
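The mapping described above can be sketched in a few lines of Python. This is only an illustration of the path convention, not the connector's implementation; the function names virtual_path and split_virtual_path are hypothetical.

```python
def virtual_path(container: str, blob_name: str) -> str:
    """Return the filesystem-style path under which a blob would appear."""
    # Containers appear as directories in the root of the storage gateway,
    # and "/" inside the blob name is treated as a directory delimiter.
    return "/" + container + "/" + blob_name


def split_virtual_path(path: str) -> tuple:
    """Split a virtual path back into (container, blob_name)."""
    # The first path component is the container; the rest is the blob name.
    container, _, blob_name = path.lstrip("/").partition("/")
    return container, blob_name


print(virtual_path("project-data", "projects/abc/output.txt"))
```

With the example from the text, the blob projects/abc/output.txt in container project-data maps to /project-data/projects/abc/output.txt.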
Registration of endpoint with Microsoft
The Globus Connect Server v5 endpoint needs to be registered as an application with Microsoft so that users can authorize the endpoint to access Azure storage on their behalf, or so the app credentials can be used as a service principal. The following steps describe how the endpoint can be registered as an Azure application to obtain a client id and secret. See Microsoft documentation for additional information.
Prerequisites
It is necessary that these steps be performed on a fully functional Globus Connect Server 5 endpoint.
You will need a Microsoft account to complete these steps, and the registration will be stored under that account. This account should be in the same organization as the Azure storage account.
Registration Steps
-
To register the endpoint with Microsoft, go to Microsoft Azure App registrations
-
Select + New registration to add a new registration.
-
For Name, enter a name such as Globus Connect Server. This will be displayed to users of your collection when they are prompted to log in to Microsoft during credential registration.
-
For Supported account types you should choose Single Tenant.
-
For Redirect URI: set the value that was displayed when the endpoint was created.
If you don’t have that value handy, you can run the command
globus-connect-server endpoint show
You’ll see output that looks something like this:
Display Name:    Test Endpoint
ID:              669ec822-ca79-455c-89a7-cccb7aefbf8e
Subscription ID: 6e62e6d7-e368-45f4-a23d-fb41243e8005
Public:          True
GCS Manager URL: https://21542.data.globus.org
Network Use:     normal
You can construct the auth callback URL by appending /api/v1/authcallback to the value of the GCS Manager URL. In this example case, the result is https://21542.data.globus.org/api/v1/authcallback.
-
Select Register
-
Select API permissions to configure the permissions required for Azure storage access. This step is not required for service principal authentication.
-
Select +Add a permission
-
Select Microsoft Graph, and then Delegated permissions.
-
Under OpenId permissions, check email, offline_access, openid, and profile.
-
Select < All APIs to go back one screen.
-
Select Azure Storage, and then Delegated permissions.
-
Under Permissions, check user_impersonation.
-
Select Add permissions to save these selections
-
Select Certificates & secrets to create a secret
-
Select + New client secret
-
Enter a description and choose an expiration time, if desired. The storage-gateway will need to be updated if the secret changes.
-
Make note of the Value. This will be used to configure the storage gateway --ms-client-secret option.
-
If desired, select Branding to configure additional login screen details.
-
If desired, select Token configuration to add the optional upn claim to the ID token. This is usually not necessary; see credential mapping.
-
Select Overview
-
Note the Application (client) ID. This will be used to configure the storage gateway --ms-client-id option.
-
If you chose Single Tenant for supported account types, note the Directory (tenant) ID. This will be used to configure the storage gateway --ms-tenant option.
-
App registration is complete.
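The Redirect URI used in the registration steps is the GCS Manager URL with /api/v1/authcallback appended. A minimal Python sketch of that construction follows; the helper name auth_callback_url is hypothetical and is not part of Globus Connect Server.

```python
def auth_callback_url(gcs_manager_url: str) -> str:
    """Build the redirect URI from the GCS Manager URL shown by `endpoint show`."""
    # Strip any trailing slash so the result never contains a doubled "/".
    return gcs_manager_url.rstrip("/") + "/api/v1/authcallback"


print(auth_callback_url("https://21542.data.globus.org"))
```

For the example endpoint above, this yields https://21542.data.globus.org/api/v1/authcallback.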
Azure Blob Configuration Encryption
All configuration information, including Azure Blob secrets and user credential information, is encrypted with a secret key on the node servicing the request before storing it locally and uploading it to GCS cloud services for distribution to other nodes in the endpoint. The encryption key is only available locally to the node and is secured such that only the node admin has access.
Storage Gateway
An Azure Blob Storage Gateway is created with the command globus-connect-server storage-gateway create azure-blob, and can be updated with the command globus-connect-server storage-gateway update azure-blob.
Before looking into the policy options specific to the Azure Blob Connector, please familiarize yourself with the Globus Connect Server v5 Data Access Guide, which describes the steps to create and update a storage gateway, using the POSIX connector as an example. The commands to create and update a storage gateway for the Azure Blob Connector are similar.
Azure Blob Connector Storage Gateway Policies
The Azure Blob Connector has policies to manage application credentials and storage account settings.
Application Credentials
The --ms-client-id and --ms-client-secret command-line options provide information for Globus Connect Server to authenticate with Azure Blob Storage. These values must be configured in order for users to access data on collections created with the Azure Blob Connector.
By default, each user of an Azure Blob collection will authenticate to their own Azure AD account, which must have been granted permission to access blob storage via Azure’s Role-Based Access Control. The user’s Azure AD account must match the username mapped from their Globus identity, unless the --ms-allow-any-account command-line option is set.
Alternately, you can use --azure-credential-type to use the --ms-client-id and --ms-client-secret values as service principal credentials. When using service principal auth, all users of the collection will access the storage using those service credentials.
You will also need to know your Microsoft tenant ID, Azure Storage account name, and whether or not Azure Data Lake Gen2 hierarchical namespace is enabled on the storage account.
These are configured after registering the application with Microsoft as described in the Azure Blob Connector configuration guide.
For our example, we’ll assume we’ve obtained credentials as described above. We’ll use the command-line options --ms-client-id and --ms-client-secret to configure these on our storage gateway, along with --ms-tenant and --azure-storage-account to configure the storage account. Our storage account has Azure Data Lake Gen2 hierarchical namespace enabled.
--ms-client-id CLIENT_ID \
--ms-client-secret CLIENT_SECRET \
--ms-tenant TENANT_ID \
--azure-storage-account STORAGE_ACCOUNT \
--azure-adls
Creating the Storage Gateway
Now that we have decided on all our policies, we’ll use the following command to create the storage gateway.
% globus-connect-server storage-gateway create azure-blob \
    "Azure Blob Storage Gateway" \
    --domain example.org \
    --ms-client-id CLIENT_ID \
    --ms-client-secret CLIENT_SECRET \
    --ms-tenant TENANT_ID \
    --azure-storage-account STORAGE_ACCOUNT \
    --azure-adls
Storage Gateway Created: 7187a9a0-68e4-48ea-b3b9-7fd06630f8ab
This was successful and outputs the ID of the new storage gateway (7187a9a0-68e4-48ea-b3b9-7fd06630f8ab in this case) for our reference. Note that this will always be a unique value if you run the command. If you forget the ID of a storage gateway, you can always use the command globus-connect-server storage-gateway list to get a list of the storage gateways on the endpoint.
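If you script gateway creation, the argument list for the create command can be assembled programmatically. The sketch below is a hypothetical Python helper (storage_gateway_create_args is not part of Globus Connect Server) that uses only the options shown in this document and prints the resulting command rather than running it.

```python
import shlex


def storage_gateway_create_args(display_name, domain, client_id, client_secret,
                                tenant, storage_account, adls):
    """Return the argument list for creating an Azure Blob storage gateway."""
    args = [
        "globus-connect-server", "storage-gateway", "create", "azure-blob",
        display_name,
        "--domain", domain,
        "--ms-client-id", client_id,
        "--ms-client-secret", client_secret,
        "--ms-tenant", tenant,
        "--azure-storage-account", storage_account,
    ]
    # Select the flag that matches whether hierarchical namespace is enabled.
    args.append("--azure-adls" if adls else "--azure-no-adls")
    return args


args = storage_gateway_create_args(
    "Azure Blob Storage Gateway", "example.org",
    "CLIENT_ID", "CLIENT_SECRET", "TENANT_ID", "STORAGE_ACCOUNT", adls=True,
)
# shlex.join quotes arguments containing spaces, such as the display name.
print(shlex.join(args))
```

Passing the list to subprocess.run(args, check=True) on a data transfer node would run the command; the placeholder values must be replaced with the values noted during app registration.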
You can also add other policies to configure additional identity mapping and path restriction policies as described in the Globus Connect Server v5 Data Access Guide.
Note that this creates the storage gateway, but does not yet make it accessible via Globus and HTTPS. You’ll need to follow the steps in the next section.
Collection
An Azure Blob Collection is created with the command globus-connect-server collection create, and can be updated with the command globus-connect-server collection update.
As the Azure Blob Connector does not introduce any policies beyond those used by the base collection type, you can follow the sequence in the Collections Section of the Globus Connect Server v5 Data Access Guide. Recall, however, that the paths are interpreted as described above in Azure Blob Connector Virtual Filesystem.
User Credential
As mentioned above, access to Azure Blob mapped collections requires users to register credentials. These credentials are created by performing an authentication flow with Microsoft. This is initiated by visiting the Credentials tab of the collection details page on the Globus Web App. The user is directed to that page when they first attempt to access that collection.
When registering credentials, the Microsoft account username must match the mapped username on the collection (by default the Globus account username, unless identity mapping is configured). The Microsoft account username is determined from the preferred_username claim of the MS ID token as long as it is an email address, otherwise the email claim is used. In most cases the preferred_username claim will be correct, but if the optional upn claim is enabled in the app registration, that will be used instead.
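The claim selection order described above can be sketched as follows. This is an illustration only, not the connector's code: the function names are hypothetical, the email check is a deliberately simple assumption, and the exact comparison rules the connector applies (for example, case sensitivity) are not stated in this document.

```python
from typing import Optional


def looks_like_email(value: str) -> bool:
    """Very rough check (an assumption): one "@" with text on both sides."""
    local, at, domain = value.partition("@")
    return bool(local and at and domain) and "@" not in domain


def microsoft_username(claims: dict) -> Optional[str]:
    """Pick the username from ID token claims in the order the text describes."""
    # The optional upn claim, when present, is used instead of the others.
    if claims.get("upn"):
        return claims["upn"]
    # preferred_username is used as long as it is an email address.
    preferred = claims.get("preferred_username")
    if preferred and looks_like_email(preferred):
        return preferred
    # Otherwise fall back to the email claim.
    return claims.get("email")


print(microsoft_username({"preferred_username": "user@example.org",
                          "email": "alias@example.org"}))
```

The value returned here is what would be compared with the mapped username on the collection.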
Alternately, the storage-gateway --ms-allow-any-account command-line option can be set to allow access to any Microsoft account.
When the storage gateway is configured for service principal credentials, the user will not visit the Credentials tab of the collection to register their own credential. In that case, all users will access the blob storage as the service principal.
Appendix A: Limitations
Directories
Azure Blob without Azure Data Lake Gen2 hierarchical namespace support does not have true directories, so directories created by this connector are normal blobs with metadata hdi_isfolder = true. It is possible that other tools may be confused by the presence of these blobs.
If the --azure-adls/--azure-no-adls storage gateway option is improperly set, directory creation and deletion will fail, and directory listings may produce incorrect results.
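Tools that read the container directly may want to recognize the directory marker blobs described above. The following Python sketch assumes the blob metadata is already available as a dictionary of string keys and values (for example, as returned by a storage SDK listing that includes metadata); the function name is hypothetical, and matching the key and value case-insensitively is an assumption.

```python
def is_directory_marker(metadata) -> bool:
    """Return True if blob metadata marks the blob as a directory placeholder."""
    if not metadata:
        return False
    # Look for hdi_isfolder = true, tolerating differences in letter case.
    return any(
        key.lower() == "hdi_isfolder" and str(value).lower() == "true"
        for key, value in metadata.items()
    )


print(is_directory_marker({"hdi_isfolder": "true"}))
```

A tool could use such a check to skip marker blobs when listing or copying data outside of Globus.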
Appendix B: Document Types for the Azure Blob Connector
AzureBlobStoragePolicies Document
Connector-specific storage gateway policies for the AzureBlob connector
One of the following schemas:
{
"DATA_TYPE": "azure_blob_storage_policies#1.0.0",
"account": "string",
"adls": true,
"auth_callback": "string",
"auth_type": "string",
"client_id": "string",
"secret": "string",
"tenant": "string",
"user_credential_required": true
}
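As a rough aid when preparing such a document, the field types in the schema above can be checked with a short Python sketch. This is an assumption-laden illustration: the helper name is hypothetical, it only checks the types of fields that are present, and it does not know which fields are required or which values auth_type accepts, since the schema above does not state them.

```python
EXPECTED_DATA_TYPE = "azure_blob_storage_policies#1.0.0"

# Field names and types as they appear in the schema above.
STORAGE_POLICY_FIELDS = {
    "DATA_TYPE": str, "account": str, "adls": bool, "auth_callback": str,
    "auth_type": str, "client_id": str, "secret": str, "tenant": str,
    "user_credential_required": bool,
}


def check_storage_policies(doc: dict) -> list:
    """Return a list of problems found in an AzureBlobStoragePolicies document."""
    problems = []
    if doc.get("DATA_TYPE") != EXPECTED_DATA_TYPE:
        problems.append("DATA_TYPE should be " + EXPECTED_DATA_TYPE)
    for key, value in doc.items():
        expected = STORAGE_POLICY_FIELDS.get(key)
        if expected is None:
            problems.append("unexpected field: " + key)
        elif not isinstance(value, expected):
            problems.append(key + " should be of type " + expected.__name__)
    return problems


example = {
    "DATA_TYPE": "azure_blob_storage_policies#1.0.0",
    "account": "string", "adls": True, "auth_callback": "string",
    "auth_type": "string", "client_id": "string", "secret": "string",
    "tenant": "string", "user_credential_required": True,
}
print(check_storage_policies(example))
```

An empty list means every field present matches the type shown in the schema.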
AzureBlobUserCredentialPolicies Document
Connector-specific user credential policies for the AzureBlob connector
One of the following schemas:
{
"DATA_TYPE": "azure_blob_user_credential_policies#1.0.0",
"access_token": "string",
"email": "string",
"refresh_token": "string",
"scopes": [
"string"
],
"sub": "string",
"tid": "string",
"token_expiry": "2019-08-24T14:15:22Z"
}
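The token_expiry field in the schema above is an ISO 8601 timestamp in UTC. The Python sketch below shows one way to interpret it; the helper name token_expired is hypothetical, and how the connector itself uses this field is not described in this document.

```python
from datetime import datetime, timezone


def token_expired(credential: dict, now=None) -> bool:
    """Return True if the credential's token_expiry is at or before `now`."""
    # Replace the trailing "Z" so that fromisoformat accepts the value on
    # Python versions that do not parse the "Z" suffix directly.
    expiry = datetime.fromisoformat(credential["token_expiry"].replace("Z", "+00:00"))
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiry


credential = {"token_expiry": "2019-08-24T14:15:22Z"}
print(token_expired(credential))
```

Passing an explicit timezone-aware now value makes the check reproducible, for example in tests.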