Globus Connect Server Storage Gateway Create Azure Blob
Description
The globus-connect-server storage-gateway create azure-blob command creates a new Azure Blob storage gateway. When creating a storage gateway, you provide the policies that control access to a storage system through Globus Connect Server collections.
The Azure Blob connector provides access to data stored in Microsoft Azure Blob Storage. Before creating an Azure Blob storage gateway, follow the instructions in the connector configuration manual to create the Microsoft app registration with permissions to access storage.
Azure Blob Connector Virtual Filesystem
Azure Blob Storage is unstructured object storage, where each data object is accessed based on a container name and a blob name.
The Azure Blob Connector attempts to make this look like a regular filesystem, by exposing containers as directories in the root of the storage gateway’s file system.
The Azure Blob Connector then treats the / character in blob names as a delimiter, presenting blobs in what look like subdirectories. For example, the blob projects/abc/output.txt in container project-data would appear as the file output.txt in the /project-data/projects/abc directory.
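The mapping described above can be sketched in a few lines of Python. This is only an illustration of the naming rule, not the connector's implementation; the function names virtual_path and split_virtual_path are hypothetical.

```python
from typing import Tuple


def virtual_path(container: str, blob_name: str) -> str:
    """Return the path at which a blob appears in the virtual filesystem."""
    # Containers appear as directories in the root of the storage gateway,
    # and "/" in the blob name separates what look like subdirectories.
    return "/" + container + "/" + blob_name


def split_virtual_path(path: str) -> Tuple[str, str]:
    """Split a virtual path back into (container, blob_name)."""
    container, _, blob_name = path.lstrip("/").partition("/")
    return container, blob_name


print(virtual_path("project-data", "projects/abc/output.txt"))
# /project-data/projects/abc/output.txt
```

Splitting on the first / after the root recovers the container name, and everything after it is the blob name.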
Authentication Policies
There are three command-line options that control which user identities are allowed access to the data on a storage gateway: --domain, --authentication-timeout-mins, and --high-assurance.
The value of the --domain command-line option restricts access to users who have an identity in the given domain. This may be configured to be multiple values to allow authentication by multiple identity providers. If more than one domain is allowed, the storage gateway needs to have an identity mapping method configured to decide how to process names from the different identity namespaces. See Identity Mapping Policies for more information.
The value of the --authentication-timeout-mins command-line option defines the timeout (in minutes) after which a user will need to re-authenticate in order to access mapped collections on non-high-assurance storage gateways, or for any data access on high assurance storage gateways. If this is not supplied, the default value of this timeout is 11 days.
The value of the --high-assurance command-line option defines whether the storage gateway manages high assurance data. If it is set, then the authentication timeout is enforced per application session. This option can only be set when the storage gateway is created and is immutable.
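Because --authentication-timeout-mins takes minutes while the default is stated in days, a small conversion can help when choosing a value. The helper name days_to_minutes is hypothetical and used only for illustration.

```python
def days_to_minutes(days: int) -> int:
    """Convert a timeout expressed in days to the minutes the option expects."""
    return days * 24 * 60


# The documented default of 11 days, expressed in minutes.
print(days_to_minutes(11))
# 15840
```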
Identity Mapping Policies
Globus Connect Server v5.4 supports a flexible system for mapping user identity information in Globus to the local account needed to access data on a variety of storage systems. This includes a default mapping for cases where there is only one allowed domain, as well as pattern-based mappings and callouts to external programs for custom mapping algorithms.
Default Identity to Username Mapping
When Globus Connect Server maps an identity to an account, it retains the entire username by default, so the username user@example.org is mapped to the account user@example.org.
Custom Identity to Username Mapping
The --identity-mapping command-line option configures a storage gateway to use either an expression-based identity mapping or an external identity mapping program. See the Identity Mapping Guide for more information.
The --identity-mapping command-line option accepts a few different types of data as its argument:
- --identity-mapping external:CMD
-
When mapping an identity to a username, Globus Connect Server invokes the command-line program CMD to map the identity. The value of the CMD string will be parsed as a shell command-line, so arguments may be included if quoted. A full description of the input, output, and arguments to the program is included in the Identity Mapping Guide.
- --identity-mapping file:JSON_FILE
- --identity-mapping JSON
-
The JSON_FILE argument is a path to a file which contains a JSON document containing the mapping configuration, as described in the Identity Mapping Guide. The JSON argument is the JSON document itself.
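To illustrate what "parsed as a shell command-line" means for the CMD string, the sketch below uses Python's shlex module, which follows POSIX shell quoting rules. The program name map-identity and its --config argument are hypothetical, and the parser used by Globus Connect Server may differ in details.

```python
import shlex

# Hypothetical external mapping command with a quoted argument containing a space.
cmd = 'map-identity --config "/etc/my mapping.conf"'

# Shell-style parsing keeps the quoted argument together as one word.
print(shlex.split(cmd))
# ['map-identity', '--config', '/etc/my mapping.conf']
```

Without the quotes, the path would be split into two separate arguments at the space.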
User Policies
The --user-allow and --user-deny command-line options control which users may access data on a storage gateway. These operate on the result of the identity mapping, a user name that is in the namespace of the storage gateway. This may be a user name, id, or email address based on the storage gateway requirements.
A username is allowed or denied access depending on whether the --user-allow and --user-deny command-line options are set on a storage gateway, and whether the username is present in one or both of those policies. In general, if a username is in the value of --user-deny it is always denied, and if a --user-allow policy is provided the username must be in the policy value in order to be allowed access.
The full set of effects of these policies is shown in the following table, where - means the option is not set:
--user-allow | --user-deny | result |
---|---|---|
member | - | Allowed |
member | not a member | Allowed |
- | - | Allowed |
- | not a member | Allowed |
- | member | DENIED |
not a member | - | DENIED |
not a member | not a member | DENIED |
not a member | member | DENIED |
member | member | DENIED |
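The table reduces to two checks, which can be sketched in Python as follows. This is an illustration of the documented outcomes rather than the server's code; the function name is_allowed is hypothetical, and None stands for a policy that is not set.

```python
from typing import Optional, Set


def is_allowed(
    username: str,
    user_allow: Optional[Set[str]],
    user_deny: Optional[Set[str]],
) -> bool:
    """Return True if the mapped username may access the storage gateway."""
    # A username listed in --user-deny is always denied.
    if user_deny is not None and username in user_deny:
        return False
    # If --user-allow is set, the username must be listed in it.
    if user_allow is not None and username not in user_allow:
        return False
    return True


print(is_allowed("alice", {"alice"}, {"bob"}))
# True
print(is_allowed("alice", {"alice"}, {"alice"}))
# False
```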
Azure Blob Connector Storage Gateway Policies
The Azure Blob Connector has policies to manage application credentials and storage account settings.
Application Credentials
The --ms-client-id and --ms-client-secret command-line options provide information for Globus Connect Server to authenticate with Azure Blob Storage. These values must be configured in order for users to access data on collections created with the Azure Blob Connector.
By default, each user of an Azure Blob collection will authenticate to their own Azure AD account, which must have been granted permission to access blob storage via Azure’s Role-Based Access Control. The user’s Azure AD account must match the username mapped from their Globus identity, unless the --ms-allow-any-account command-line option is set.
Alternatively, you can set --azure-credential-type to service_principal to use the --ms-client-id and --ms-client-secret values as service principal credentials. When using service principal authentication, all users of the collection will access the storage using those service credentials.
You will also need to know your Microsoft tenant ID, Azure Storage account name, and whether or not Azure Data Lake Gen2 hierarchical namespace is enabled on the storage account.
These are configured after registering the application with Microsoft as described in the Azure Blob Connector configuration guide.
Data Access Policies
The --restrict-paths command-line option controls access to subtrees of the data provided by the storage gateway. This is configured using the PathRestrictions document type.
Path restrictions provide a framework for administrators to constrain data access on the storage gateway. Restrictions can be set at the folder level. They may allow read, write, or deny access to data. These are absolute paths from the root of the storage gateway virtual file system.
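As a rough sketch, a path restrictions document could be assembled and checked in Python before passing it to --restrict-paths. The DATA_TYPE value and the key names read, read_write, and none are assumptions here and should be verified against the PathRestrictions document reference; the paths are hypothetical.

```python
import json

# Assumed document layout; verify key names against the PathRestrictions reference.
restrictions = {
    "DATA_TYPE": "path_restrictions#1.0.0",
    "read": ["/project-data/projects"],
    "read_write": ["/project-data/projects/abc"],
    "none": ["/project-data/private"],
}

# Paths are absolute from the root of the storage gateway virtual file system.
for key in ("read", "read_write", "none"):
    for path in restrictions[key]:
        if not path.startswith("/"):
            raise ValueError(f"{key} path is not absolute: {path}")

document = json.dumps(restrictions, indent=2)
print(document)
```

The resulting JSON text could then be saved to a file and referenced with the file:JSON_FILE form of the option.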
Network Use Policies
The --network-use command-line option alters the network performance and scalability parameters used by collections created using this storage gateway, overriding the default behavior set on the endpoint. This feature can only be used with a Globus subscription.
If the network use is set to custom, then all of the --preferred-parallelism, --max-parallelism, --preferred-concurrency, and --max-concurrency options must also be set. See the network use section of the installation guide for a description of what these values mean.
If the network use is set to null, then the default behavior is restored, and collections use the settings on the endpoint.
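The dependency between --network-use and the four tuning options can be checked before invoking the command. The following Python sketch is one way to do that; the helper name network_use_args and the numeric values are hypothetical, and only option names documented here are used.

```python
from typing import Dict, List, Optional

# The four options that must accompany --network-use=custom.
CUSTOM_OPTIONS = [
    "--preferred-parallelism",
    "--max-parallelism",
    "--preferred-concurrency",
    "--max-concurrency",
]


def network_use_args(
    network_use: str, tuning: Optional[Dict[str, int]] = None
) -> List[str]:
    """Build the network-use portion of an argument list, validating it first."""
    tuning = tuning or {}
    if network_use == "custom":
        missing = [opt for opt in CUSTOM_OPTIONS if opt not in tuning]
        if missing:
            raise ValueError("custom network use also requires: " + ", ".join(missing))
    elif tuning:
        raise ValueError("tuning options require --network-use=custom")
    args = ["--network-use", network_use]
    for opt in CUSTOM_OPTIONS:
        if opt in tuning:
            args += [opt, str(tuning[opt])]
    return args


print(network_use_args("minimal"))
# ['--network-use', 'minimal']
```

Calling network_use_args("custom") with any of the four values missing raises ValueError instead of producing an incomplete argument list.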
OPTIONS
- -h, --help
-
Show help message and exit.
- --version
-
Show the version and exit.
- -F, --format "text"|"json"
-
Output format for this command. If the format is json, then the resulting storage gateway document is displayed.
- --use-explicit-host IP_ADDRESS (new in 5.4.23)
-
IP address of the GCS node to use for this request. If not specified, any available GCS node in the endpoint will be used.
- --user-deny username
- --user-deny file:PATH (new in 5.4.79)
-
Connector-specific username for a user denied access to this Storage Gateway. Give this option multiple times to deny multiple users. Set a value of "" to clear this value. If the parameter value begins with file:, read the input file path and parse as one or more lines of a whitespace delimited list of usernames to deny access to this storage gateway.
- --user-allow username
- --user-allow file:PATH (new in 5.4.79)
-
Connector-specific username for a user allowed access to this Storage Gateway. Give this option multiple times to allow multiple users. Set a value of "" to clear this value. If the parameter value begins with file:, read the input file path and parse as one or more lines of a whitespace delimited list of usernames to allow access to this storage gateway.
- --identity-mapping external:CMD
- --identity-mapping file:JSON_FILE|JSON
-
Identity Mapping configuration for use in this Storage Gateway. You can use JSON input to specify a complete mapping document, or, if you want to use an external command for mapping, use external:command --arguments. Give this option multiple times to set multiple mappings in order of precedence. Set a value of null to clear this value.
- --restrict-paths JSON | file:JSON_FILE
-
Path restrictions for accessing data on collections created using this storage gateway.
- --domain DOMAIN
-
Allowed domain. Give this option multiple times to allow multiple domains. Users creating credentials or collections on this storage gateway must have an identity in one of these domains.
- --authentication-timeout-mins INT
-
Timeout (in minutes) during which a user is required to have authenticated in a session to access this storage gateway.
- --high-assurance
-
Flag indicating that High Assurance features are required on this storage gateway. This can only be set on create and is immutable.
- --mfa / --no-mfa (new in 5.4.21)
-
Flag indicating that access to collections on this storage gateway requires that the user has authenticated with multi-factor authentication using an identity in the allowed_domains. Only usable on high assurance storage gateways.
- --network-use "normal"|"minimal"|"aggressive"|"custom"|"null" (new in 5.4.76)
-
Set storage gateway network use. If custom, the storage gateway’s max and preferred concurrency and parallelism must be set. If this is set to a non-null value, then all collections which use this storage gateway will use this value instead of the endpoint’s network use setting.
- --preferred-parallelism INTEGER
-
Set the storage gateway’s preferred parallelism; requires --network-use=custom.
- --max-parallelism INTEGER
-
Set the storage gateway’s max parallelism; requires --network-use=custom.
- --preferred-concurrency INTEGER
-
Set the storage gateway’s preferred concurrency; requires --network-use=custom.
- --max-concurrency INTEGER
-
Set the storage gateway’s max concurrency; requires --network-use=custom.
- --ms-client-id CLIENT_ID
-
The client ID of the Microsoft application that Globus Connect Server uses to access the Azure Blob resource.
- --ms-client-secret SECRET
-
The application secret associated with the client ID.
- --ms-tenant TENANT_ID
-
Microsoft Tenant ID. Required when the Microsoft application is configured for single-tenant mode.
- --ms-allow-any-account (new in 5.4.65)
-
Allow users to access any Azure AD account. If disabled (the default), users may only access Azure AD accounts which match the account their identity maps to. When using OAuth user authentication, this option can be enabled to allow users to access personal or external accounts without custom identity mapping.
- --azure-credential-type "user"|"service_principal"
-
Set the type of credential used for authentication to Azure. 'user' requires the user to register a credential by logging in to Azure AD. 'service_principal' uses the configured client-id and client-secret values to authenticate as a service principal for all users.
- --azure-storage-account ACCOUNT_NAME
-
The Azure storage account that owns the containers to be accessed.
- --azure-adls, --azure-no-adls
-
Indicate whether Azure Data Lake Gen2 hierarchical namespace is enabled on the storage account.
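The file:PATH form of --user-allow and --user-deny reads one or more lines of whitespace-delimited usernames. A minimal Python sketch of that parsing is shown below, assuming plain text with no comment syntax or quoting; the function name parse_user_list and the usernames are hypothetical.

```python
from typing import List


def parse_user_list(text: str) -> List[str]:
    """Parse one or more lines of whitespace-delimited usernames."""
    usernames: List[str] = []
    for line in text.splitlines():
        # str.split() with no arguments splits on any run of whitespace
        # and ignores blank lines.
        usernames.extend(line.split())
    return usernames


sample = "alice@example.org bob@example.org\n\ncarol@example.org\n"
print(parse_user_list(sample))
# ['alice@example.org', 'bob@example.org', 'carol@example.org']
```

Running a user list through a check like this before creating the gateway can catch stray characters or an unexpectedly empty file.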