Globus Automation Overview
The Globus automation platform provides tools and services which can be used to create reliable, easily-repeatable processes for research data management. The platform builds on key Globus services like Authorization and Data Transfer.
The automation platform introduces a few key concepts which may then be extended and combined to create custom processes solving particular research data management problems. These concepts are action providers, actions, and flows. Read on to learn about how flows can orchestrate action providers together in order to create actions that perform the actual automation.
Use Cases
The key to the platform is enabling users to orchestrate multiple processing steps into a single workflow, or flow. Some of these steps are provided by Globus, while others may be custom implementations supporting a specific need. Examples of these workflows include:
- Automatically detect data output from scientific instruments which is then transferred, processed, and indexed.
- Provide a curated pipeline for description, annotation and publication of research datasets.
- Run data transfers on a recurring schedule.
Action Providers
An action provider is an HTTP-accessible service which acts as a single step in a process and implements the action provider interface. When an action provider is invoked, it creates (or "provides") an action which represents a single unit of work. Examples of units of work are running a file transfer using Globus Transfer or ingesting data into Globus Search.
Each action provider expects to be invoked with parameters particular to the service it provides.
To support usability and discovery, each can be introspected to determine what its input schema or input properties are.
Introspection also provides information such as who operates the action provider, descriptive text on the service it provides, and who can use the service.
Access to action providers and their invocation is controlled via Globus Auth.
Some of these services may be synchronous, meaning that an invocation will complete in the context of the HTTP request that triggered it.
Other services support asynchronous activities, meaning that the invocation will persist beyond the HTTP request that invoked it, and the caller must monitor the action to learn when it completes and what its result is.
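The asynchronous case can be sketched as a polling loop. The following is a minimal, in-memory illustration rather than a real client: `FakeAsyncProvider`, `wait_for_action`, the field names, and the status strings are all assumptions made for this sketch, and no HTTP requests are issued.

```python
import time
from dataclasses import dataclass, field

# Status names are assumptions for this sketch, not taken from the text above.
ACTIVE, SUCCEEDED, FAILED = "ACTIVE", "SUCCEEDED", "FAILED"


@dataclass
class FakeAsyncProvider:
    """In-memory stand-in for an asynchronous action provider (hypothetical)."""

    polls_until_done: int = 3
    _polls: dict = field(default_factory=dict)

    def run(self, body: dict) -> dict:
        # The invocation returns immediately; the work is not finished yet.
        action_id = f"action-{len(self._polls) + 1}"
        self._polls[action_id] = 0
        return {"action_id": action_id, "status": ACTIVE, "details": None}

    def status(self, action_id: str) -> dict:
        # Each status check moves the simulated work one step forward.
        self._polls[action_id] += 1
        if self._polls[action_id] >= self.polls_until_done:
            return {"action_id": action_id, "status": SUCCEEDED, "details": {"ok": True}}
        return {"action_id": action_id, "status": ACTIVE, "details": None}


def wait_for_action(provider, action: dict, interval: float = 0.0, max_polls: int = 10) -> dict:
    """Poll until the action reaches a terminal status or max_polls is used up."""
    for _ in range(max_polls):
        if action["status"] in (SUCCEEDED, FAILED):
            return action
        time.sleep(interval)
        action = provider.status(action["action_id"])
    return action


provider = FakeAsyncProvider()
action = provider.run({"source": "a", "destination": "b"})
final = wait_for_action(provider, action)
```

A synchronous provider would correspond to `run` returning a terminal status directly, in which case `wait_for_action` returns without polling at all.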
Globus operates a series of these action providers available for public use. For a full list of these action providers, see the hosted action providers documentation. Globus also supports users writing their own action providers via the Globus Action Provider Toolkit - a Python SDK that makes it easy to provide custom services that can be tied into the Globus automation ecosystem.
These action providers form the foundation of the Globus automation ecosystem and are primarily used by referencing their URLs in flows. Globus Flows allows users to flexibly piece together these individual services to create reliable high level workflows.
Actions
An action represents a single, discrete invocation of an action provider. It is a record of an operation and includes details for its result, its current execution status, and metadata dictating which Globus Auth identities are allowed to read or modify the action's state. Globus Flows allows orchestrating these individual actions into robust processes that can tolerate their distinct execution states, including success and failure. Users will not often need to operate on actions directly. Instead, a user will start a run of a flow, and the run will invoke action providers, creating actions as necessary to accomplish the automation.
Flows
A flow represents a single process that orchestrates a series of services into a self-contained operation. One can think of a flow as a declaratively defined ordering of action providers with condition handling to define expected success or failure scenarios.
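To make "a declaratively defined ordering with condition handling" concrete, here is a toy interpreter. It is a sketch only: the dictionary layout, the key names (`StartAt`, `States`, `ActionUrl`, `Next`, `OnFailure`, `End`), and the `example.org` URLs are invented for illustration and should not be read as the Globus Flows definition schema.

```python
# Illustrative only: key names and URLs are assumptions for this sketch.
flow_definition = {
    "StartAt": "Transfer",
    "States": {
        "Transfer": {
            "ActionUrl": "https://provider.example.org/transfer",
            "Next": "Index",
            "OnFailure": "Notify",
        },
        "Index": {"ActionUrl": "https://provider.example.org/index", "End": True},
        "Notify": {"ActionUrl": "https://provider.example.org/notify", "End": True},
    },
}


def run_flow(definition, invoke):
    """Walk the states in order, calling invoke(url) -> bool for each step.

    Returns the names of the states that were visited.
    """
    visited = []
    name = definition["StartAt"]
    while name is not None:
        state = definition["States"][name]
        visited.append(name)
        succeeded = invoke(state["ActionUrl"])
        if state.get("End"):
            break
        # Follow the success path, or the failure path if one is declared.
        name = state["Next"] if succeeded else state.get("OnFailure")
    return visited


# Every step succeeds.
happy_path = run_flow(flow_definition, lambda url: True)
# The transfer step fails, so the declared failure path is taken.
failure_path = run_flow(flow_definition, lambda url: not url.endswith("/transfer"))
```

The point of the sketch is that the ordering and the failure handling live in data, while the interpreter stays generic.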
A flow may be defined and deployed to the Globus Flows service by any user. When deploying, the user may control which other users can discover the flow and separately, which users can run the flow. All access control is provided by Globus Auth. Thus, flows can easily and safely be shared among users.
Once deployed, a flow implements the action provider interface. A flow is therefore technically a form of action provider, and other flows can reference it by its flow URL. This allows for modularity in defining flows and for a separation of concerns in which "sub-flows" can be trusted to provide some process or behavior.
When users start a flow, we call that a run. A run shares the action interface, supporting operations such as viewing its status, cancelling its execution, and removing its execution state. This allows for common tooling and terminology for working with runs and actions. In general, any operation available on an action will be possible on a run and vice versa.
Globus Flows imposes no restrictions on how long a run may execute or on the number of units of work defined in a flow. We support long-lived runs by providing monitoring and status updates.
Authentication and Authorization
All interactions with Globus Action Providers and Globus Flows are authenticated by Globus Auth.
Scopes
Access tokens for Globus Flows must be bounded by one or more of the following scopes:
Name | Scope String | Definition |
---|---|---|
manage_flows | | Grants ability to manage flows |
view_flows | | Grants ability to view flows |
run | | Grants ability to run flows |
run_status | | Grants ability to check the status of runs |
run_manage | | Grants ability to manage runs |
Flow Roles
Permissions on flows are managed via lists of identities and groups. These lists define which users have a given role on the flow.
The supported roles are:
flow_viewers
- Users who are allowed to see that the flow exists and read its definition. Users without this permission cannot see that the flow exists.
flow_starters
- Users who can run this flow. A user without flow_starters permissions will receive an error if they attempt to start this flow. flow_starters have all of the capabilities of flow_viewers.
flow_administrators
- Users who can manage the flow's roles, edit its definition, and alter metadata such as "title" and "description". flow_administrators have all of the capabilities of flow_starters.
flow_owner
- The user primarily responsible for maintaining the flow. Other users with flow_administrators permissions may assume ownership of the flow. A flow_owner has all of the capabilities of flow_administrators.
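Because each flow role includes the capabilities of the one before it, a permission check can be expressed as a comparison of positions in an ordered list. The sketch below assumes the roles are held in a plain dictionary keyed by role name. The helper names and the placeholder identity URNs are hypothetical, and this is not how the Flows service itself is implemented.

```python
# Roles ordered from least to most capable; each includes those before it.
FLOW_ROLE_ORDER = ["flow_viewers", "flow_starters", "flow_administrators", "flow_owner"]


def highest_flow_role(flow_roles: dict, principal: str):
    """Return the most capable role the principal holds, or None."""
    best = None
    for role in FLOW_ROLE_ORDER:
        holders = flow_roles.get(role, [])
        if isinstance(holders, str):  # flow_owner is a single principal
            holders = [holders]
        if principal in holders:
            best = role
    return best


def has_flow_role(flow_roles: dict, principal: str, needed: str) -> bool:
    """True if the principal holds `needed` or any more capable role."""
    held = highest_flow_role(flow_roles, principal)
    if held is None:
        return False
    return FLOW_ROLE_ORDER.index(held) >= FLOW_ROLE_ORDER.index(needed)


# Placeholder identities, made up for this example.
ALICE = "urn:globus:auth:identity:00000000-0000-0000-0000-000000000001"
BOB = "urn:globus:auth:identity:00000000-0000-0000-0000-000000000002"
CAROL = "urn:globus:auth:identity:00000000-0000-0000-0000-000000000003"

roles = {
    "flow_owner": ALICE,
    "flow_administrators": [],
    "flow_starters": [BOB],
    "flow_viewers": [CAROL],
}
```

With these roles, `has_flow_role(roles, BOB, "flow_viewers")` is true because starters can also view, while `has_flow_role(roles, CAROL, "flow_starters")` is false.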
Run Roles
A run is an instance of a flow, started by a particular user at a point in time, and viewable both during execution and after completion.
The runner of a flow may be different from the flow's author, so the run has its own roles which are as follows:
run_monitors
- Users who can view the current state of this run, including the steps which have been executed, the input and output of each step, and whether or not the run has terminated.
run_managers
- Users who can edit the run's metadata (e.g. label and tags) and cancel the execution of the run. run_managers have all of the capabilities of run_monitors.
run_owner
- The user who started this run. This role cannot be transferred to another user. A run_owner has all of the capabilities of run_managers.
Users with permissions on a flow are not given any implicit permissions on runs of that flow.
If a user running a flow wants to allow an owner or administrator of the flow to see their run, they must explicitly grant that permission.
Role Values
Roles within Globus Flows are primarily specified in the form of Principal URNs.
To formulate a Principal URN, prefix Identity IDs with urn:globus:auth:identity: and Group IDs with urn:globus:groups:id:.
For example, the following values specify an Identity (typically a specific person) and a Group, respectively:
- urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c
- urn:globus:groups:id:fdb38a24-03c1-11e3-86f7-12313809f035
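The prefixing rule is simple enough to capture in a pair of helpers. This is a minimal sketch: the function names are hypothetical, and validating IDs as UUIDs is an assumption based on the shape of the example values, not a documented requirement.

```python
import uuid

IDENTITY_PREFIX = "urn:globus:auth:identity:"
GROUP_PREFIX = "urn:globus:groups:id:"


def identity_urn(identity_id: str) -> str:
    """Build a Principal URN for an identity (assumes the ID is a UUID)."""
    return IDENTITY_PREFIX + str(uuid.UUID(identity_id))


def group_urn(group_id: str) -> str:
    """Build a Principal URN for a group (assumes the ID is a UUID)."""
    return GROUP_PREFIX + str(uuid.UUID(group_id))


def parse_principal(urn: str):
    """Split a Principal URN into (kind, id), where kind is 'identity' or 'group'."""
    for kind, prefix in (("identity", IDENTITY_PREFIX), ("group", GROUP_PREFIX)):
        if urn.startswith(prefix):
            return kind, urn[len(prefix):]
    raise ValueError(f"not a recognized Principal URN: {urn!r}")


person = identity_urn("46bd0f56-e24f-11e5-a510-131bef46955c")
group = group_urn("fdb38a24-03c1-11e3-86f7-12313809f035")
```

Passing a malformed ID raises `ValueError` from `uuid.UUID`, which catches copy-and-paste mistakes before a role list is submitted.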
In addition to Principal URNs, two special values are defined by the service for use in roles:
all_authenticated_users
- All users who have logged in via Globus Auth
public
- all_authenticated_users plus unauthenticated access
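Putting Principal URNs and the two special values together, a role list can be evaluated roughly as follows. This is a sketch under stated assumptions: `is_permitted` is a hypothetical helper, an unauthenticated caller is modeled as `caller_urn=None`, and the placeholder URNs are made up for the example.

```python
def is_permitted(role_values, caller_urn=None, caller_group_urns=()):
    """Decide whether a caller matches any value in a role list.

    caller_urn is None for an unauthenticated caller.
    """
    if "public" in role_values:
        return True  # everyone, including unauthenticated callers
    if caller_urn is None:
        return False  # all remaining values require a login
    if "all_authenticated_users" in role_values:
        return True
    # Otherwise match on the caller's identity or on any of their groups.
    return caller_urn in role_values or any(g in role_values for g in caller_group_urns)


# Placeholder principals, made up for this example.
USER = "urn:globus:auth:identity:00000000-0000-0000-0000-000000000001"
LAB_GROUP = "urn:globus:groups:id:00000000-0000-0000-0000-0000000000aa"
```

For example, `is_permitted(["public"])` is true even without a login, while `is_permitted(["all_authenticated_users"])` is false until a `caller_urn` is supplied.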