Last Updated: May 12, 2016
About This Document
This document provides information for an organization’s administrators on configuring Globus Data Publication. The operations described here must be carried out before any users can submit data to the service. Users only interested in submitting data for publication need not familiarize themselves with the content of this document. The companion User’s Guide provides the information they need.
When access to Globus Data Publication is set up for an organization, a new community is created and administration rights for that community are granted to members of a Globus group. From that point forward, members of this administrative group perform all management and configuration operations. These administrators are able to create and update collections within the community and assign different Globus groups to different roles within a collection or community.
1. Pre-requisites
Prior to working with Globus Data Publication, a number of setup operations must be performed using various Globus capabilities. A detailed description of how to perform these operations is outside the scope of this document, so links to related documents are provided.
A suitable server for storing the publication data must be set up. This server must have Globus Connect Server installed and must have Sharing enabled. Access to the necessary sharing capability requires a subscription.
A shared endpoint must be created, and the Globus user named email@example.com must be granted the Access Manager role on the shared endpoint. This shared endpoint will be configured as the storage location for the collection as described below.
Globus groups must be created to grant users capabilities when using the collection. Group invitations should be sent to the appropriate users, and for each group, the Globus user named firstname.lastname@example.org must be invited to the group and granted an administrator or manager role. The groups to be created are as follows:
Submit: The group of users who are allowed to create new entries in the collection.
Readers: The group of users who will be able to view and search for entries in the collection.
Curators: The group of users responsible for performing the curation workflow for the collection.
For any collection that does not wish to restrict access to any of these capabilities, it is not necessary to create the corresponding group. Groups may be re-used across collections in the case where the same users will be performing the same role for multiple collections. Assignment of the groups to the roles in the collection is described below.
2. Creating and Configuring a Collection
The process of creating a new collection starts on the home page of the community in which the collection will reside. The community home page can be found using the "Communities & Collections" link on the dashboard or by entering the community name in the search bar. When an administrator for the community accesses this page, additional options are presented on the right side of the page.
The "Configure…" button allows descriptive information about the community to be updated. Community configuration also allows Persistent Identifier configurations to be created which may be used by all collections within the community. Configuration of persistent identifiers is described below for collections, but is the same at the community level.
The "Create collection" button launches the collection creation interface which is a single page for configuring all aspects of a collection. The same configuration options for a collection can be accessed by a community or collection administrator by selecting the "Configure…" button on the "Admin Tools" panel of the collection’s home page.
2.1. Entering descriptive information
The "Collection’s Metadata" pane provides input boxes for descriptive text regarding the collection. Of these, only the collection name is required. This is the name which will be displayed in other interface elements such as collection selection during submission. The "Short Description" field is displayed on pages that list all collections, so it can be used to provide additional descriptive text that helps users identify the collection when browsing.
The next two fields, "Introductory text" and "Additional descriptive text," are displayed on the home page of the collection. The introductory text appears in a highlighted area along with the title of the collection. HTML formatting tags are permitted in this text box, so it can be used to provide a more detailed and highly customized description and presentation of the collection. The additional description appears above the list of datasets in the collection and is intended to provide more detailed information describing the collection.
The final two fields present license information at different stages of interaction with the datasets in the collection. The first, "Publication License," is presented to the user for approval during the submission workflow; anyone submitting to the collection is required to accept this license. The second, "Distribution License," is presented to users when they choose to access data stored in a dataset in the collection.
2.2. Configuring Storage
In the "Dataset Storage Settings" pane, the endpoint and root path where dataset data will be placed are entered. The endpoint must be specified by providing the unique identifier (UUID) for the endpoint. The UUID can be viewed by searching for the endpoint and expanding the desired endpoint’s entry on the resulting page. As mentioned above, the Globus user email@example.com must have been granted the Access Manager role on the endpoint. The "Collection Root Path" provides a path on that endpoint which is the base from which dataset folders will be created.
If this path does not exist, it will be automatically created. The "Test Setting" button tests both that the firstname.lastname@example.org user has the necessary permissions on the endpoint and that directory creation works properly. If either of those operations fails, an error will be presented in this pane.
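Before pressing "Test Setting," the two values can be sanity-checked locally. The following is a minimal sketch, not part of the service: the function name is hypothetical, and the requirement that the root path be absolute is an assumption made for illustration. It only checks that the endpoint identifier is a well-formed UUID and tidies the path; it does not contact Globus or verify permissions.

```python
import posixpath
import uuid


def normalize_storage_settings(endpoint_id: str, root_path: str) -> tuple[str, str]:
    """Return a canonical (endpoint UUID, root path) pair or raise ValueError.

    Hypothetical helper: the service performs its own checks when
    "Test Setting" is pressed.
    """
    try:
        # uuid.UUID raises ValueError for strings that are not well-formed UUIDs.
        canonical_id = str(uuid.UUID(endpoint_id.strip()))
    except ValueError as exc:
        raise ValueError(f"not a valid endpoint UUID: {endpoint_id!r}") from exc

    path = root_path.strip()
    # Assumption: the collection root path is given as an absolute path.
    if not path.startswith("/"):
        raise ValueError(f"root path must be absolute: {root_path!r}")
    # Remove "." segments, a trailing slash, and repeated slashes inside the path.
    return canonical_id, posixpath.normpath(path)


if __name__ == "__main__":
    # The UUID and path below are made-up example values.
    print(normalize_storage_settings(
        "01234567-89AB-CDEF-0123-456789ABCDEF", "/data//publish/./climate/"
    ))
```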
2.3. Configuring Persistent Identifiers
A particularly important task when configuring a community or collection is setting up the Persistent Identifier (PID) configuration. PID configurations can be created on both communities and collections. When a configuration has been set on a community, all collections within the community have access to that configuration. This provides a convenient method to perform these settings once and then re-use them across many collections within the community. PID configurations set on a collection are also saved so the collection can be easily re-configured to re-use a previous configuration should changes be needed during the use of the collection. PID configurations are managed through the "Identifier Settings" pane.
The "Configuration" dropdown list provides options for selecting any previously configured PID settings or for creating a new configuration with any of the providers implemented by the service. The list of provider configurations to select from includes those previously configured on the collection or a parent community. If a configuration is selected which was defined on the collection or community which is being configured and which is not in use by any child collection, the button "Delete Configuration" will be enabled.
The dropdown also provides options for creating new PID configurations of any of the types supported by the service. When one of these options is selected, the pane will expand with prompts for the PID-specific options needed for configuration. The first field will always be a name for the new configuration. This is the name which will be displayed in the dropdown list in any further display of the Identifier Settings pane for this collection or any child collections if configuration is done on a community.
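The rules for which configurations appear in the dropdown, and when "Delete Configuration" is enabled, can be sketched as a small model. This is an illustration of the rules as stated above, not the service's implementation; all class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PidConfig:
    name: str        # label shown in the "Configuration" dropdown
    defined_on: str  # community or collection where the configuration was created


@dataclass
class Collection:
    name: str
    community: str
    selected_config: Optional[str] = None  # name of the configuration in use, if any


def selectable_configs(configs: List[PidConfig], collection: Collection) -> List[PidConfig]:
    """Configurations defined on the collection itself or on its parent community."""
    owners = {collection.name, collection.community}
    return [c for c in configs if c.defined_on in owners]


def delete_enabled(config: PidConfig, configuring: str, children: List[Collection]) -> bool:
    """True when the configuration was defined on the object being configured
    and no child collection is currently using it."""
    defined_here = config.defined_on == configuring
    used_by_child = any(child.selected_config == config.name for child in children)
    return defined_here and not used_by_child
```

In this model, a configuration defined on a community stays protected from deletion for as long as any collection in that community has it selected.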
2.3.1. Configuring an EZID PID Provider
The EZID service provides a method for creating Digital Object Identifiers (DOIs) which are intended to be permanent references to a data resource. EZID requires creation of an account with the EZID service. It is a good practice to use the EZID functionality to create delegated credentials which can be entered here allowing the Globus Data Publication service to use EZID to create DOIs on behalf of the organization owning the collection.
After entering a name for the configuration, the username and password credentials for the EZID account to be used by Globus Data Publication are entered. The "Publisher name" will be used to identify the publishing organization in the DOI created via EZID (formally, this value will be stored in the datacite.publisher field). The "Shoulder" is a value assigned to the account by EZID and forms part of the URL generated for the identifier. The default value 10.5072/FK2 is for the testing "sandbox" operated by EZID. It should be changed unless the entire configuration including the credentials being used is intended for use with the sandbox. The "Resolver Base URL" also forms part of the final URL being generated for the PID and represents the root service which will be used to resolve the identifier in the future. This is the host which will perform the lookup of the identifier and re-direct the client to the dataset’s landing page on Globus Data Publication. It will be rare to make changes to this value, but other resolution services are possible so the option to change it is provided.
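How the "Resolver Base URL" and "Shoulder" combine into the final identifier URL can be sketched as follows. The exact composition used by the service is not documented here, so the concatenation order, the suffix value, and the resolver URL in the example are assumptions for illustration; only the sandbox shoulder 10.5072/FK2 comes from the text above.

```python
SANDBOX_SHOULDER = "10.5072/FK2"  # default shoulder for the EZID testing sandbox


def build_identifier_url(resolver_base_url: str, shoulder: str, suffix: str) -> str:
    """Join resolver base, shoulder, and a minted suffix into one URL.

    Assumption: the suffix is appended directly to the shoulder.
    """
    base = resolver_base_url.rstrip("/")
    return f"{base}/{shoulder}{suffix}"


def uses_sandbox_shoulder(shoulder: str) -> bool:
    """Flag configurations that still carry the sandbox default."""
    return shoulder.strip().upper() == SANDBOX_SHOULDER


if __name__ == "__main__":
    # "https://doi.org" and "ABC123" are example values, not service defaults.
    print(build_identifier_url("https://doi.org/", SANDBOX_SHOULDER, "ABC123"))
    print(uses_sandbox_shoulder(SANDBOX_SHOULDER))
```

A check like `uses_sandbox_shoulder` is one way to catch a production configuration that was saved without replacing the default shoulder.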
2.3.2. Configuring a Handle PID Provider
The Handle System is a general-purpose identifier resolution system which is commonly used for providing stable URLs which can be redirected to other resources throughout their life-cycle. Use in data publication and preservation systems is common. To begin using the Handle service, it is necessary to register an account with CNRI. When registration is complete, a handle prefix will be assigned. Additionally, as part of configuration, a key-pair will be generated for performing administrative operations, including creating new Handle entries. Configuring a server and creating these key-pairs is described in the Handle Documentation. To use the Handle System, a handle server which owns the assigned prefix must be running. An organization that wishes to use Handle but cannot operate a handle server can contact us to discuss options for using the Globus-operated handle server to host its prefix.
After providing a name for the newly created configuration, the first field to be filled in is the prefix assigned by CNRI during registration. The value in the "Namespace" field will be included in every handle generated, helping to identify handles generated by a particular configuration. The "Administrative Private Key" and the "Private Key Passcode" are generated by the administrator of the handle prefix. The private key is typically stored as a binary file, but it must be converted to a base64 text representation for upload on this form. This can be done using a command-line utility such as base64 to create the required string. The data placed in this field of the form should not contain extra characters and should not have any carriage returns embedded or entered at the end. When generated during configuration of the handle server, these administrative keys are assigned an "Authorization Index," which must also be entered on the form. Following the default configuration process, this index will be 300, so that value is provided as a default here. The "Resolver Base URL" will be part of the final URL generated for the PID. It can reference any handle server which operates the HTTP-based resolution service. Typically, this will use the root resolution service located at http://hdl.handle.net as provided in the default, but any other value entered here will be used in the URL generated by the service for the identifier.
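As an alternative to a command-line utility, the conversion of the binary private key to a single-line base64 string can be sketched in Python using only the standard library. The function names and the file name in the example are placeholders; substitute the path of the key file produced during handle server configuration.

```python
import base64
from pathlib import Path


def encode_private_key(key_bytes: bytes) -> str:
    """Return the key as one base64 string with no embedded line breaks."""
    # base64.b64encode never inserts newlines, unlike some command-line tools.
    return base64.b64encode(key_bytes).decode("ascii")


def encode_private_key_file(path: str) -> str:
    """Read a binary key file and return its base64 text form."""
    return encode_private_key(Path(path).read_bytes())


if __name__ == "__main__":
    import sys

    # Usage: python encode_key.py <path-to-private-key-file>
    # Written without a trailing newline so the result can be pasted as-is.
    sys.stdout.write(encode_private_key_file(sys.argv[1]))
```

If the command-line route is preferred, note that GNU `base64` wraps its output at 76 characters by default; passing `-w 0` disables wrapping so that no line breaks end up in the form field.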
2.4. Configuring the Forms and Workflows
The "Workflow Settings" pane is used to define which forms and workflow steps will be used by the collection. The "Input Form" dropdown lists the available forms which can be configured for use during the submission workflow. By default, the forms listed will conform to the three levels of information defined by DataCite for DOI registration. The three pre-defined forms and their content are:
Datacite Mandatory:
Publication Year: A date associated with the dataset’s publication containing at least the year, but also month and day if desired.
Language: The primary language of any text content.
Publisher: The organization credited with publishing the dataset.
Datacite Mandatory + Recommended:
All of the above
Subject Keywords: Summarizing words primarily intended to enable easier discovery and search for the dataset.
Description: Open text describing the dataset.
Resource Type: A classification for the type of data contained in the dataset.
Contributors: A group of individuals or organizations who contributed to the creation of the dataset. The contributor’s role as well as their identity are specified.
Related Identifier: Identifiers of other datasets or uniquely identifiable entities which are related to the dataset. The relation type as well as the identifier text may be specified.
Datacite Mandatory + Recommended + Optional:
All of the above
Size: An indication of the size of the dataset. Values and units which are appropriate for the dataset may be specified.
Format: The technical details of the file type or other details about the content of the dataset.
Version: An identifier differentiating this dataset from other iterations of the same dataset which may previously have been published.
Rights: The rights associated with submission or distribution of the dataset. This typically references standard licensing terms such as Creative Commons licenses.
Rights URI: Many standard licenses are identified by a specific URI. In combination with the Rights field, this field can uniquely identify the rights associated with a dataset.
Description: Additional descriptive fields along with the type of description being applied. The various description types are selected from a controlled list.
More complex, customized forms can be created with further consultation with the Globus team. If any customized forms have been configured for your use, they will appear on this list as well.
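The cumulative structure of the pre-defined forms, where each level includes "all of the above," can be modeled as a short sketch. The field lists contain only the fields named in this section, and the name of the first form is assumed from the naming pattern of the other two; none of this reflects the service's internal representation.

```python
# Each entry lists only the fields a level adds on top of the previous one.
FORM_LEVELS = [
    ("Datacite Mandatory", ["Publication Year", "Language", "Publisher"]),
    (
        "Datacite Mandatory + Recommended",
        ["Subject Keywords", "Description", "Resource Type",
         "Contributors", "Related Identifier"],
    ),
    (
        "Datacite Mandatory + Recommended + Optional",
        # The second "Description" entry is the typed description field.
        ["Size", "Format", "Version", "Rights", "Rights URI", "Description"],
    ),
]


def fields_for(form_name: str) -> list[str]:
    """Return every field on a form, including those inherited from lower levels."""
    fields: list[str] = []
    for name, added in FORM_LEVELS:
        fields.extend(added)
        if name == form_name:
            return fields
    raise KeyError(form_name)


if __name__ == "__main__":
    for name, _ in FORM_LEVELS:
        print(name, "->", len(fields_for(name)), "fields")
```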
The "Submission Workflow" can also be customized to reorder the submission steps or to include or exclude particular steps. Creation of new workflow steps or alteration of the default workflow settings requires additional agreement with the Globus team. As with the forms, any customized workflows that have been created will appear in this list.
The "Curation Type" sets the curation options for the collection. As discussed in the section on curation, curation may include simply reviewing the information entered by the submitting user ("Accept/Reject") or may allow the curating user to edit the information which has been entered ("Edit Metadata"). Curation can also be omitted entirely so that when users complete a submission it will directly enter the collection.
2.5. Assigning Groups to Roles
All user roles within a collection are mapped to user groups in Globus. With this approach, once groups have been configured, their membership can be changed using the Globus Groups interface. The groups to be configured were enumerated in the Pre-requisites section. The collection-specific groups (Submitters, Access to Data, and Curation) are set using the "Collection Permissions" pane. The Submitters and Access to Data groups can be set to allow "All Users." When Submitters is set to "All Users," any user logged in to the service will be allowed to submit to this collection, and the collection will appear on the list of available collections for any user who selects "Start a New Submission" on the Dashboard.
When Access to Data is set to "All Users," any web user, regardless of whether they are logged in to the service, will be allowed to view the landing page for datasets in the collection, and these datasets will be visible in search and browsing results. For collections containing publicly citable or accessible datasets, it will be common to set Access to Data to "All Users."
When either the Submitters or the Access to Data group is set to "Restricted to Group…", a button will appear to "Select" (the first time) or "Change" the group associated with this role. This button is always present for the "Curation Group." Pressing it navigates to a page where any group within Globus can be searched for and selected. Upon choosing the group and pressing the Select button on that page, the browser is redirected back to the collection configuration page. The selected group name will appear in the "Collection Permissions" pane next to the appropriate role.
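The effect of these settings can be summarized in a small decision sketch. It encodes only the rules stated in this section: submitting always requires a login, "All Users" on Access to Data also admits anonymous visitors, and curation is always restricted to a group. The class and function names are hypothetical and do not correspond to any Globus API.

```python
from dataclasses import dataclass
from typing import Optional, Set

ALL_USERS = "All Users"


@dataclass
class CollectionPermissions:
    submitters: str      # ALL_USERS or the name of a Globus group
    access_to_data: str  # ALL_USERS or the name of a Globus group
    curation_group: str  # always a specific Globus group


def can_submit(perms: CollectionPermissions, user_groups: Optional[Set[str]]) -> bool:
    """user_groups is None for a visitor who is not logged in."""
    if user_groups is None:
        return False  # even with "All Users," submitting requires a login
    return perms.submitters == ALL_USERS or perms.submitters in user_groups


def can_view(perms: CollectionPermissions, user_groups: Optional[Set[str]]) -> bool:
    if perms.access_to_data == ALL_USERS:
        return True  # anonymous web users are included
    return user_groups is not None and perms.access_to_data in user_groups


def can_curate(perms: CollectionPermissions, user_groups: Optional[Set[str]]) -> bool:
    return user_groups is not None and perms.curation_group in user_groups
```

For example, a collection with Submitters restricted to a group and Access to Data set to "All Users" lets anyone on the web view its datasets while only group members can add new ones.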