
HPSS Connector

The HPSS connector can be used to access and share data on an HPSS storage system. The connector is available as an add-on subscription to organizations with a Globus Standard subscription - please contact us for pricing.

This document describes the steps needed to install an endpoint and the HPSS connector needed to access the storage system. This installation should be done by a system administrator, and once completed, users can use the endpoint to access HPSS storage via Globus to transfer, share and publish data on the system.

Table of Contents
  • Preinstallation Checklist
    • Review Release Notes
    • Supported HPSS Versions
    • Recommended HPSS Patches
    • HPSS Configuration Options
    • Note on Kerberos Configurations
    • Required HPSS Files
    • Configure Local User Accounts
    • Configure HPSS Credentials
    • Verify Basic Operations Via Scrub
  • Installation
    • Upgrading from Version 2.8 and Earlier
    • Installing the HPSS Connector
    • Creating HPSS Storage Gateways
    • Creating HPSS Collections
  • Log Collection
    • HPSS Connector Standard Logging
    • HPSS Connector Debug Logging
    • HPSS API Debug Logging
  • Mailing List
  • Appendix A: Troubleshooting
  • Appendix B: Performance

Preinstallation Checklist

In order for the HPSS DSI to function properly on the HPSS client node, please verify the following items.

Review Release Notes

Recently discovered issues and workarounds will be documented in the GitHub repository prior to inclusion in this document. See the repository README for details.

Supported HPSS Versions

This connector supports HPSS versions 7.3 and newer. However, support is limited to the HPSS configurations that we have access to for building RPMs and verifying releases.

Recommended HPSS Patches

These HPSS issues severely impact performance, so the corresponding patches are highly recommended.

BZ2819 - PIO 60 second delay impacts small file performance. There is a small percentage chance that, after a transfer completes, HPSS PIO will wait 60 seconds before informing the client that the transfer has completed. This fix has been implemented in 7.3.3p9, 7.3.4, 7.4.1p1 and 7.4.2.

BZ2856 - Enabling HPSS_API_REUSE_CONNECTIONS returns address already in use. This issue limits how many active connections are possible. GridFTP and HPSS make considerable use of ephemeral TCP ports. Quick, successive file transfers can lead the system to run out of available ports. There is no fix for this HPSS issue at this time. The number of ephemeral ports can be increased and the amount of time a socket spends in TIME_WAIT can be decreased to help avoid this issue.
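The two mitigations just described can be applied with kernel parameters on the GridFTP node. This is a sketch; the values below are illustrative assumptions, not tuned recommendations:

```ini
# /etc/sysctl.d/90-ephemeral-ports.conf (illustrative values)
# Widen the ephemeral port range available for outgoing connections.
net.ipv4.ip_local_port_range = 15000 64000
# Allow TIME_WAIT sockets to be reused for new outgoing connections,
# reducing how long ports are held after quick, successive transfers.
net.ipv4.tcp_tw_reuse = 1
```

Apply with sysctl --system (or a reboot) and confirm with sysctl net.ipv4.ip_local_port_range.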

BZ7772 - PIO 5 second delay impacts small file performance. There is a high percentage chance that, after a transfer completes, HPSS PIO will wait 5 seconds before informing the client that the transfer has completed. This greatly impacts the performance of file retrieves and checksum operations. This fix has been implemented in 7.5.3+.

BZ7883 - Prevents successful transfers of files over 4GiB on HPSS versions 7.5.2+. Due to what appears to be a transfer length calculation error, transfer of files larger than 4GiB generate an EIO error at the 4GiB mark and the transfer terminates. This bug impacts all HPSS clients using the HPSS PIO interface. Upgrade to HPSS 7.5.2u5 / HPSS 7.5.3u1 to resolve this issue.

BZ10852 - Interrupted or canceled Transfer tasks result in lingering GridFTP processes. This fix will be released in HPSS 8.3+.

BZ11137 - Prevents the HPSS Connector from reauthenticating to HPSS when the storage gateway configuration changes. Without this patch, changes to the HPSS storage gateway options require a restart of the GCS Manager process.

HPSS Configuration Options

These options are generally set in /var/hpss/etc/env.conf and affect operation of data transfers. Some of these options may be required depending upon your configuration.

HPSS_API_HOSTNAME

This option selects the network interface used for data transfers between the Globus services you are configuring and the HPSS mover machine(s). If this is unset, data transfers use the default network interface. This option is generally necessary on multihomed nodes. It should be set on the node running GridFTP.

MVR_CLIENT_TIMEOUT

This controls the amount of time before a mover process will stop waiting for data from the Globus service in order to reclaim network resources. Default is 15 minutes. In very large file transfers, it is possible that movers may timeout before the transfer reaches data offsets which those movers are responsible for. This option is set on the mover nodes.
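As a sketch, the two options above might appear in env.conf as follows. The hostname is a placeholder, and the value for MVR_CLIENT_TIMEOUT assumes the option is expressed in seconds; remember that HPSS_API_HOSTNAME belongs on the GridFTP node while MVR_CLIENT_TIMEOUT belongs on the mover nodes:

```ini
# /var/hpss/etc/env.conf (illustrative fragment)
# On the GridFTP node: pin data transfers to a specific interface.
# "gridftp-data.example.org" is a placeholder hostname.
HPSS_API_HOSTNAME=gridftp-data.example.org

# On the mover nodes: extend the mover wait beyond the 15-minute
# default for very large transfers (value assumed to be in seconds).
MVR_CLIENT_TIMEOUT=3600
```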

Note on Kerberos Configurations

Kerberos must be configured for access to the proper Kerberos realm that contains HPSS. The Kerberos configuration file is usually kept in /etc/krb5.conf. You may need to enable the allow_weak_crypto option in the [libdefaults] section if the DSI module cannot talk to the HPSS servers.
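A minimal sketch of the relevant krb5.conf fragment follows; the realm name is a placeholder, and allow_weak_crypto should only be enabled if the DSI actually fails to negotiate with the HPSS servers:

```ini
# /etc/krb5.conf (fragment; realm name is a placeholder)
[libdefaults]
    default_realm = HPSS.EXAMPLE.ORG
    # Only if the DSI cannot talk to older HPSS servers:
    allow_weak_crypto = true
```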

Required HPSS Files

The following HPSS files located in /var/hpss/etc are known to be required for operation of the HPSS DSI:

  • auth.conf

  • authz.conf

  • env.conf

  • ep.conf

  • group

  • HPSS.conf

  • hpss.keytab (or hpss.unix.keytab)

  • ieee_802_addr

  • passwd

  • site.conf

Configure Local User Accounts

When a user accesses HPSS via Globus, home directory lookups and translations between usernames and user IDs are performed using the OS name service (e.g., /etc/passwd, LDAP, NIS). The HPSS password file (i.e., HPSS_UNIX_AUTH_PASSWD) is not used by Globus. This has a direct impact on authentication and file access. Verify that HPSS users have the same UID on the local system and within HPSS. For example, given an HPSS user hpssuser1, the UID and GID returned from the following command should match the UID and GID of the same account within HPSS:

$ getent passwd hpssuser1
hpssuser1:x:12345:1000:HPSS User:/home/hpssuser1:/bin/bash
Warning

It is unnecessary, and discouraged, to grant HPSS users the ability to log in directly to the HPSS client node (e.g., via SSH).
Note

Although Globus does not make use of HPSS_UNIX_AUTH_PASSWD, the HPSS client API implementation does use the HPSS passwd file during the authentication process. See the required HPSS files.
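The UID check above can be scripted for the local side. This is a sketch: "root" is a stand-in so the script runs anywhere; substitute a real HPSS user such as hpssuser1. The HPSS-side values must still be compared by hand, since only HPSS can report them.

```shell
#!/bin/sh
# Report the local name-service view of an account, as Globus will see it.
user="root"   # placeholder; substitute an HPSS user such as hpssuser1

entry=$(getent passwd "$user") || { echo "no local account for $user" >&2; exit 1; }
uid=$(echo "$entry" | cut -d: -f3)
gid=$(echo "$entry" | cut -d: -f4)
home=$(echo "$entry" | cut -d: -f6)
echo "local view of $user: uid=$uid gid=$gid home=$home"
# Compare uid and gid against the values HPSS reports for the same account.
```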

Configure HPSS Credentials

Globus uses the account hpssftp to access HPSS initially, then changes user ID to the authenticated HPSS user (e.g., hpssuser1). This removes the need to maintain per-user keytab files on the HPSS client node. However, it requires that the Globus process have access to the hpssftp keytab entry during the authentication phase, which runs under the authenticating user’s UID.

Assuming the keytab for hpssftp is stored in /var/hpss/etc/hpss.keytab:

$ chmod 644 /var/hpss/etc/hpss.keytab

HPSS installations configured for Kerberos authentication must also allow non-privileged users write access to the HPSS temporary Kerberos ticket cache, typically /var/hpss/cred:

$ chmod 1777 /var/hpss/cred
Note

The hpssftp keytab file must not be exposed to unprivileged users. Prevent local shell access by non-privileged HPSS users (e.g., via PAM).

Verify Basic Operations Via Scrub

As a non-privileged HPSS user on the local node, verify that the local account is able to authenticate successfully to HPSS as hpssftp. For example:

$ /opt/hpss/bin/scrub -a krb5 -p hpssftp -k -t /var/hpss/etc/hpss.keytab
scrub> quit

As a non-privileged HPSS user, log into HPSS and perform some basic directory and file operations. Unlike the previous step, make sure these operations are performed as the non-privileged HPSS user itself, not as hpssftp:

$ /opt/hpss/bin/scrub
/hpss/home/testuser1
scrub> mkdir testdir
scrub> rmdir testdir
scrub> open testfile wc
File created using COS 1 (Small File COS)
scrub> write 5k
.done (144.981 KB/sec)
scrub> close
scrub> unlink testfile
scrub> quit

Installation

This section explains how to install and configure the HPSS connector. Your Globus Connect Server endpoint must already be installed; see the Globus Connect Server v5.4 Install Guide.

Upgrading from Version 2.8 and Earlier

As of version 2.9, the HPSS Connector is installed from an RPM instead of being built from source. Because of this, several changes made to the system when installing previous versions need to be reversed so that they do not conflict with the RPM installation. These include:

/etc/gridftp.d/hpss

Make sure this file does not exist. Also remove any other files within the same directory that supply configuration options for hpss_local.

/etc/ld.so.conf.d/gridftp_hpss_dsi.conf

Remove this file if it exists so that the GridFTP server can find the new HPSS DSI in /usr/lib64/. Be sure to run ldconfig to update system paths.

/var/hpss/etc/gridftp_hpss_dsi.conf

Delete this file, or move it aside if you want to keep a copy, to avoid conflict with the RPM-managed version of the file.
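Collecting the three steps above, the cleanup before an RPM install might look like the following; the .pre-2.9 suffix is just an illustrative name for the saved copy:

```shell
$ sudo rm -f /etc/gridftp.d/hpss
$ grep -l hpss_local /etc/gridftp.d/*    # find any other files configuring hpss_local
$ sudo rm -f /etc/ld.so.conf.d/gridftp_hpss_dsi.conf
$ sudo ldconfig
$ sudo mv /var/hpss/etc/gridftp_hpss_dsi.conf /var/hpss/etc/gridftp_hpss_dsi.conf.pre-2.9
```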

Installing the HPSS Connector

The HPSS Connector is installed by RPM in order to simplify installation and enforce version requirements against requisite software. Unfortunately, Globus does not have access to all HPSS and RHEL version combinations, so RPMs cannot be provided for all installations.

Globus Connect Server v5.4 support was added in version 2.17 of the HPSS Connector. Prior versions of the connector are not compatible with GCSv5.4.

Note

If you are interested in supporting this development by providing access to an HPSS build environment, please email support@globus.org.

Visit the release page and find the latest release. Note that release candidates are also available from this page; they are indicated with the Pre-release badge and have a Release Candidate name. For production use, make sure to choose the most recent official release, which will have the Latest Release badge.

Go to the Assets section under the latest release. When possible, RPMs for supported platforms will be available for your convenience. If an RPM is not available, go to the Git repository README for the latest instructions on how to build the RPM from source.

The RPM has a naming scheme of globus-gridftp-server-hpss-7.5-2.9-1.el7+gcsv5.x86_64.rpm.

globus-gridftp-server-hpss

package name

-7.5-

This package is for HPSS version 7.5.X

-2.9-1

This is release 1 of connector version 2.9

.el7

This package is for RHEL 7.X.

+gcsv5

This package is for Globus Connect Server

Download the RPM and the matching .asc file, which allows you to verify that the RPM has not been altered since it was created. Using a recent version of gpg (>= 2.0), import the public key used for signing:

$ gpg --keyserver hkp://keys.openpgp.org --recv-keys 1EA106A24003C353
gpg: requesting key 4003C353 from hkp server keys.openpgp.org
gpg: key 4003C353: "Jason Alt <jasonalt@globus.org>" imported
gpg: Total number processed: 1
gpg:               imported: 1

And then verify the downloaded RPM:

$ gpg --verify globus-gridftp-server-hpss-7.5-2.9-1.el7+gcsv5.x86_64.rpm.asc globus-gridftp-server-hpss-7.5-2.9-1.el7+gcsv5.x86_64.rpm
gpg: Signature made Wed 06 Nov 2019 09:45:26 PM UTC using RSA key ID 4003C353
gpg: Good signature from "Jason Alt <jasonalt@gmail.com>"
gpg:                 aka "Jason Alt <jasonalt@globus.org>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: C36C 826C 18ED 73C3 38DC  FA53 1EA1 06A2 4003 C353

And finally, install the downloaded RPM using YUM:

$ sudo yum install ./globus-gridftp-server-hpss-7.5-2.9-1.el7+gcsv5.x86_64.rpm

Creating HPSS Storage Gateways

An HPSS Storage Gateway is created with the command globus-connect-server storage-gateway create hpss, and can be updated with the command globus-connect-server storage-gateway update hpss.

Before looking into the policy options specific to the HPSS Connector, please familiarize yourself with the Globus Connect Server v5 Data Access Guide which describes the steps to create and update a storage gateway, using the POSIX connector as an example. The commands to create and update a storage gateway for the HPSS Connector are similar.

Note

Until HPSS bug BZ11137 is resolved, updating HPSS policy options on a storage gateway or updating the HPSS Connector packages requires restarting the GCS Manager process for the changes to take effect.

HPSS Connector Storage Gateway Policies

The HPSS Connector has the following policies to configure how it accesses HPSS.

HPSS Authentication

When a Globus user authenticates to an HPSS endpoint, the HPSS Connector will first authenticate to HPSS using either Kerberos or Unix authentication and the hpssftp user credential stored on the local node. Then the hpssftp user session will change user ID to the HPSS account that corresponds to the Globus user. In this way, Globus users can authenticate to HPSS without requiring a local HPSS key file for each user.

The authentication_mech and authenticator properties are used to control how the HPSS Connector authenticates to HPSS. authentication_mech sets the type of HPSS authentication to use and accepts either krb5 or unix. authenticator specifies the file containing an HPSS credential for the hpssftp user. The authenticator property accepts a value of the form: auth_keyfile|auth_keytab:<path_to_file>. This file must exist on all nodes in the endpoint.
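A creation command might look like the following sketch. The flag spellings are assumptions derived from the policy names above, and the display name, domain, and keytab path are placeholders; check globus-connect-server storage-gateway create hpss --help for the authoritative option set:

```shell
$ globus-connect-server storage-gateway create hpss \
    "HPSS Archive" \
    --domain example.org \
    --authentication-mech krb5 \
    --authenticator auth_keytab:/var/hpss/etc/hpss.keytab
```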

Store File Checksums in User Defined Attributes

The HPSS Connector can optionally store MD5 checksums in the User-Defined Attributes (UDA) for use later with the sync-by-checksum transfer mode. This allows the HPSS Connector to report the checksum of an existing file without recalling the file from tape. This behavior is controlled by the uda_checksum property which accepts either true or false.

Creating HPSS Collections

An HPSS Collection is created with the command globus-connect-server collection create, and can be updated with the command globus-connect-server collection update.

As the HPSS Connector does not introduce any policies beyond those used by the base collection type, you can follow the sequence in the Collections Section of the Globus Connect Server v5 Data Access Guide.

Log Collection

New in version 2.14.

The HPSS Connector provides three methods for collecting log messages for monitoring and debugging. Of these three methods, only HPSS Connector Standard Logging is recommended for production use. The other two methods are reserved for troubleshooting.

HPSS Connector Standard Logging

By default, the HPSS Connector will send log messages of severity INFO, ERROR and WARN to the standard GridFTP server log file, typically located at /var/log/gridftp.log. You can enable the collection of these messages using the standard GridFTP server logging option log_level. You can override the default log settings by creating /etc/gridftp.d/hpss_logging and setting these configuration options:

log_single /var/log/gridftp.log
log_level ERROR,WARN

Setting the log_level value to include any of the following severities will allow you to collect messages of that type.

ERROR

These messages require immediate action by the endpoint administrator. In the case of an error message, the connector has encountered a fatal condition and functionality is severely limited. These conditions typically impact all users of the connector. For example, the ERROR severity is used for configuration problems that prevent users from using the system.

WARN

This severity indicates a condition the endpoint administrator should be aware of, but one that the connector has recognized and worked around in order to preserve functionality. In these cases, functionality may be limited but overall transfer operations will succeed. For example, the WARN severity is used to alert the endpoint administrator that the HPSS installation is missing recommended HPSS patches.

INFO

This severity is useful for gathering more information about the activity of GridFTP connections. It is used to notify the endpoint administrator of the particular commands used during a GridFTP session. For example, INFO is used to record directory listing and file transfer events.

HPSS Connector log messages will have the format:

[22825] Tue Sep  1 13:18:09 2020 :: [HPSS Connector][INFO] User=johndoe TaskID=19c156ce-ec77-11ea-85ac-0e1702b77d41 :: Getting attributes of /home/johndoe/10M.dat
[22825]

The process id.

Tue Sep 1 13:18:09 2020

Current date.

[HPSS Connector]

Tag to indicate that this message is from the HPSS Connector.

[INFO]

The log message’s severity.

User=johndoe

The authenticated user. This field is not always available.

TaskID=19c156ce-ec77-11ea-85ac-0e1702b77d41

The associated Globus Transfer task ID. This field is not always available.

The remainder of the message includes the field separator :: followed by the actual HPSS Connector log message.
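Because the severity tag and TaskID field have a fixed shape, log messages can be filtered with ordinary grep. This sketch builds a sample log file (using the example lines above, plus a fabricated ERROR line) so the patterns can be demonstrated; in practice you would point the patterns at /var/log/gridftp.log:

```shell
#!/bin/sh
# Build a small sample log so the grep patterns can be shown end to end.
log=/tmp/gridftp.log.sample
cat > "$log" <<'EOF'
[22825] Tue Sep  1 13:18:09 2020 :: [HPSS Connector][INFO] User=johndoe TaskID=19c156ce-ec77-11ea-85ac-0e1702b77d41 :: Getting attributes of /home/johndoe/10M.dat
[22825] Tue Sep  1 13:18:10 2020 :: [HPSS Connector][ERROR] User=johndoe TaskID=19c156ce-ec77-11ea-85ac-0e1702b77d41 :: example error line
EOF

# All connector messages for one Transfer task:
grep 'TaskID=19c156ce-ec77-11ea-85ac-0e1702b77d41' "$log"

# Only ERROR-severity connector messages:
grep '\[HPSS Connector\]\[ERROR\]' "$log"
```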

HPSS Connector Debug Logging

The HPSS Connector can be configured to log additional debug information. This logging mechanism is independent of the GridFTP server’s log file used with standard logging and is designed for use in resolving difficult issues by tracing interactions with the HPSS API. This logging interface is disabled by default. It is recommended that this remain off during production because of its verbosity.

See /etc/gridftp.d/hpss_debug for details on debug logging.

HPSS API Debug Logging

You can optionally enable the HPSS API to log additional debug information to its own log file. This may be useful in conjunction with debug logging when resolving interactions with HPSS. This option should remain disabled during production use.

See /etc/gridftp.d/hpss_debug for details on HPSS API debug logging.

Mailing List

Releases, upcoming features and discussions take place on the mailing list: hpss-discuss@globus.org

Join the List Here

Appendix A: Troubleshooting

Below are some common issues encountered while using the Globus Transfer service with an endpoint running the HPSS connector along with possible resolutions to each problem.

Could not find home directory for <user>. Failed to log into HPSS. Error code is -5

This error message implies that the HPSS policy options on the HPSS Storage Gateway are incorrect. Update the HPSS Storage Gateway using globus-connect-server storage-gateway update hpss and restart the GCS Manager process.

Async Stage Requests Cause Red-Ball-of-Doom

Recent changes that make use of the async stage request API for HPSS, introduced to avoid inundating the core server with duplicate stage requests, have exposed a deficiency in the DSI use case of HPSS. The HPSS async stage API expects the caller to remain available long term in order to receive stage completion messages. However, the GridFTP/DSI use case is a short-lived, transient environment; the GridFTP process cannot wait minutes, hours, or days for stage completion messages. Users of DSI versions 2.6+ will see the impact as a red-ball-of-doom indicator in the HPSS GUI console. The warning is innocuous and can be ignored. IBM is aware of this issue and a change request has been created.

As a workaround, users of 2.6 should update to 2.7, and all users of 2.7+ can use the blackhole sync method. This configures nc (netcat) to listen for stage completion messages intended for the DSI and discard whatever it receives. nc should be launched on a highly available server reachable by the HPSS core servers (preferably run it directly on the core servers). Choose a port for receiving callback notifications and run this command:

admin@hpss-core $ nc -v -v -k -l <port>

Once nc is running, add this to /etc/gridftp.d/hpss_issue_35 on the GridFTP nodes running the HPSS DSI:

$ASYNC_CALLBACK_ADDR <host>:<port>
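Since the blackhole listener must stay up long term, it helps to supervise it. A minimal systemd unit might look like the following sketch; the unit name, nc path, and port are assumptions, and the port must match the one in $ASYNC_CALLBACK_ADDR:

```ini
# /etc/systemd/system/hpss-stage-blackhole.service (illustrative)
[Unit]
Description=Discard HPSS async stage callbacks (blackhole sync)
After=network.target

[Service]
# Port 12345 is a placeholder; use the port chosen for $ASYNC_CALLBACK_ADDR.
ExecStart=/usr/bin/nc -v -v -k -l 12345
Restart=always

[Install]
WantedBy=multi-user.target
```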

Login Failure: No such file or directory

This error message indicates that hpss_LoadDefaultThreadState() has returned ENOENT, causing the login procedure to fail. This occurs when the UID of the authenticating user as known to the GridFTP process does not match the user’s ID as known by HPSS. See Local user accounts must match user accounts in HPSS.

Command Failed: Error (login)
Endpoint: xxxx
Server: xxxx
Message: Login Failed ---
Details: 530-Login incorrect. : GlobusError: v=1 c=PATH_NOT_FOUND\r\n530-GridFTP-Errno: 2\r\n530-GridFTP-Reason: System error in hpss_LoadDefaultThreadState()\r\n530-GridFTP-Error-String: No such file or directory\r\n530 End.\r\n

Login Failure: Operation not permitted

This error message indicates that hpss_SetLoginCred() failed with EPERM during the login procedure. This step in the login process accesses the keytab defined in AuthenticationMech so that the DSI can connect to HPSS as user LoginName. The error value indicates that the GridFTP process was unable to access the keytab file. See hpssftp credentials must be accessible by local unprivileged accounts.

Command Failed: Error (login)
Endpoint: xxxx
Server: xxxx
Message: Login Failed ---
Details: 530-Login incorrect. : GlobusError: v=1 c=INTERNAL_ERROR\r\n530-GridFTP-Errno: 1\r\n530-GridFTP-Reason: System error in hpss_SetLoginCred()\r\n530-GridFTP-Error-String: Operation not permitted\r\n530 End.\r\n

Login Failure: Invalid argument

If you receive this message, it is likely that /var/hpss/etc/site.conf is invalid.

 Error (login)
 Endpoint: XXX
 Server: XXX
 Message: Login Failed
---
Details: 530-Login incorrect. : GlobusError: v=1 c=INTERNAL_ERROR\r\n530-GridFTP-Errno: 22\r\n530-GridFTP-Reason: System error in hpss_LoadDefaultThreadState()\r\n530-GridFTP-Error-String: Invalid argument\r\n530 End.\r\n

Transfer Error: Operation timed out

Large file transfers to/from HPSS tend to span multiple sets of HPSS mover processes. Each set is responsible for a large contiguous chunk of the file transfer. First set transfers offsets 0-N, second set transfers (N+1)-M, and so on. These mover sets are all initialized at the beginning of the transfer.

Any mover will timeout after MVR_CLIENT_TIMEOUT seconds (defaults to 15 minutes). If a mover set does not start the transfer within this timeout, the entire transfer aborts. This is an HPSS issue, not a DSI issue.

This error condition is usually obvious from the following errors, issued at intervals of MVR_CLIENT_TIMEOUT seconds plus 5 minutes. See MVR_CLIENT_TIMEOUT for more details.

 2019-06-12 14:29:39
 Error (transfer)
 Endpoint: XXXX HPSS Archive (e38ee901-6d04-11e5-ba46-22000b92c6ec)
 Server: XXXX:2811
 Command: STOR ~/scratch_backups/XXXX
 Message: The operation timed out
---
Details: Timeout waiting for response
 2019-06-12 14:49:47
 Error (transfer)
 Endpoint: XXXX HPSS Archive (e38ee901-6d04-11e5-ba46-22000b92c6ec)
 Server: XXXX:2811
 File: /~/scratch_backups/XXXX
 Command: STOR ~/scratch_backups/XXX
 Message: Fatal FTP response
---
Details: 451-GlobusError: v=1 c=INTERNAL_ERROR\r\n451-GridFTP-Errno: 5011\r\n451-GridFTP-Reason: System error in hpss_PIOExecute\r\n451-GridFTP-Error-String: \r\n451 End.\r\n

Appendix B: Performance

GridFTP installations benefit from, and take full advantage of, classes of service that use fixed-length, classic-style allocation. In short, you will get the best performance from the GridFTP interface (in fact, from any HPSS interface) if the segment count is below 32.

HPSS has multiple disk/tape allocation algorithms used to allocate space for incoming data. Fixed-length allocation gives you equal-size chunks to store data in. This was deemed wasteful because the last chunk is almost certainly never completely filled. Variable-length allocation was created to solve this problem; it allocates increasingly larger segments as data is stored and truncates the last one. This is a win in most situations where HPSS is unsure how much data will be stored for a given file.

Using either of these allocation mechanisms (any variable-length allocation, or fixed-length without knowing the file size), HPSS is free to continue to allocate segments until all the data is stored. This has a definite performance impact because internally HPSS retrieves data in 32-segment chunks. This means that when you request a file from HPSS, internally it breaks the request up into multiple transfers, each of which is ≤ 32 segments. Functionally, this is transparent to the client. In terms of performance, the client will see a high load, followed by a pause, followed by a high load, and so on.

In order to avoid this performance hit, you can use fixed-length allocation with segment counts < 32 and take advantage of the fact that any WELL-BEHAVED GridFTP client will inform HPSS of the size of the incoming file before the transfer begins. In fact, the DSI is designed to require this. If a GridFTP client is not well behaved, the DSI will act as though a zero-length transfer is about to occur and will handle it as such, so you will know if the client is not doing the right thing.
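The sizing rule above can be reduced to simple arithmetic: a fixed-length COS keeps a file within one internal retrieval pass only while the file fits in fewer than 32 segments. The 1 GiB segment size below is an illustrative assumption; substitute your COS's actual segment size.

```shell
#!/bin/sh
# Largest file a fixed-length COS serves without crossing the internal
# 32-segment retrieval boundary. Segment size here is an assumption.
seg_size_gib=1
max_segments=32
max_file_gib=$((seg_size_gib * max_segments))
echo "With ${seg_size_gib} GiB fixed segments, files up to ${max_file_gib} GiB stay within one 32-segment chunk"
```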
