HPSS Connector
Last Updated: Feb 23, 2022
The HPSS connector can be used to access and share data on an HPSS storage system. The connector is available as an add-on subscription to organizations with a Globus Standard subscription; please contact us for pricing.
This document describes the steps needed to install an endpoint and the HPSS connector needed to access the storage system. This installation should be done by a system administrator, and once completed, users can use the endpoint to access HPSS storage via Globus to transfer, share and publish data on the system.
Preinstallation Checklist
In order for the HPSS DSI to function properly on the HPSS client node, please verify the following items.
Review Release Notes
Recently discovered issues and workarounds will be documented in the GitHub repository prior to inclusion in this document. See the repo Readme for details.
Supported HPSS Versions
This connector supports HPSS versions 7.3 and newer. However, RPMs can be built and releases verified only for the HPSS configurations that we have access to.
Recommended HPSS Patches
These HPSS issues severely impact performance or prevent transfers from completing cleanly, so the patches are highly recommended.
BZ2819 - PIO 60 second delay impacts small file performance. There is a small percentage chance that, after a transfer completes, HPSS PIO will wait 60 seconds before informing the client that the transfer has completed. This fix has been implemented in 7.3.3p9, 7.3.4, 7.4.1p1 and 7.4.2.
BZ2856 - Enabling HPSS_API_REUSE_CONNECTIONS returns address already in use. This issue limits the number of active connections. GridFTP and HPSS make considerable use of ephemeral TCP ports, so quick, successive file transfers can cause the system to run out of available ports. There is no fix for this HPSS issue at this time. To help avoid it, increase the number of ephemeral ports and decrease the amount of time a socket spends in the TIME_WAIT state.
BZ7772 - PIO 5 second delay impacts small file performance. There is a high percentage chance that, after a transfer completes, HPSS PIO will wait 5 seconds before informing the client that the transfer has completed. This greatly impacts the performance of file retrieves and checksum operations. This fix has been implemented in 7.5.3+.
BZ7883 - Prevents successful transfers of files over 4GiB on HPSS versions 7.5.2+. Due to what appears to be a transfer length calculation error, transfers of files larger than 4GiB generate an EIO error at the 4GiB mark and terminate. This bug impacts all HPSS clients using the HPSS PIO interface. Upgrade to HPSS 7.5.2u5 / HPSS 7.5.3u1 to resolve this issue.
BZ10852 - Interrupted or canceled Transfer tasks result in lingering GridFTP processes. This fix will be released in HPSS 8.3+.
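For the BZ2856 mitigation, it can help to know how many ephemeral ports the node currently has. The following is a minimal Python sketch, assuming a Linux node where the range is exposed in /proc/sys/net/ipv4/ip_local_port_range (the value behind the net.ipv4.ip_local_port_range sysctl). The function names are illustrative and are not part of the connector.

```python
from pathlib import Path

# Linux exposes the ephemeral port range here as "low<whitespace>high".
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def ephemeral_port_count(range_text: str) -> int:
    """Return how many ports the inclusive range 'low high' contains."""
    low, high = (int(field) for field in range_text.split())
    if low > high:
        raise ValueError(f"invalid port range: {range_text!r}")
    return high - low + 1


def current_ephemeral_port_count() -> int:
    """Read the live range from /proc (Linux only)."""
    return ephemeral_port_count(PORT_RANGE_FILE.read_text())
```

Calling current_ephemeral_port_count() before and after widening the range is a quick way to confirm that the change took effect on the GridFTP node.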
HPSS Configuration Options
These options are generally set in /var/hpss/etc/env.conf and affect operation of data transfers. Some of these options may be required depending upon your configuration.
- HPSS_API_HOSTNAME: This option selects the network interface used for data transfers between the Globus services you are configuring and the HPSS mover machine(s). If it is unset, data transfers use the default network interface. This option is generally necessary on multihomed nodes. It should be set on the node running GridFTP.
- MVR_CLIENT_TIMEOUT: This option controls how long a mover process waits for data from the Globus service before it gives up in order to reclaim network resources. The default is 15 minutes. In very large file transfers, movers may time out before the transfer reaches the data offsets that those movers are responsible for. This option is set on the mover nodes.
Note on Kerberos Configurations
Kerberos must be configured for access to the Kerberos realm that contains HPSS. The Kerberos configuration is usually kept in /etc/krb5.conf. You may need to enable the allow_weak_crypto option in the [libdefaults] section if the DSI module cannot talk to the HPSS servers.
Required HPSS Files
The following HPSS files, located in /var/hpss/etc, are known to be required for operation of the HPSS DSI:
- auth.conf
- authz.conf
- env.conf
- ep.conf
- group
- HPSS.conf
- hpss.keytab (or hpss.unix.keytab)
- ieee_802_addr
- passwd
- site.conf
Configure Local User Accounts
When a user accesses HPSS via Globus, home directory lookups and translations between usernames and user IDs are performed using the OS Name Service (e.g. /etc/passwd, LDAP, NIS). The HPSS password file (i.e. HPSS_UNIX_AUTH_PASSWD) is not used by Globus. This has a direct impact on authentication and file access. Verify that HPSS users have the same UID on the local system and within HPSS. For example, given an HPSS user hpssuser1, the UID and GID returned from the following command should match the UID and GID of the same account within HPSS:
$ getent passwd hpssuser1
hpssuser1:x:12345:1000:HPSS User:/home/hpssuser1:/bin/bash
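When many accounts need checking, the comparison can be scripted. The sketch below is illustrative only: it parses a passwd-format line such as the getent output above and compares it with a UID and GID that you have looked up in HPSS by other means. The Account type and both function names are hypothetical.

```python
from typing import NamedTuple


class Account(NamedTuple):
    name: str
    uid: int
    gid: int


def parse_passwd_line(line: str) -> Account:
    """Parse one passwd(5)-format line, as printed by `getent passwd`."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 colon-separated fields, got {len(fields)}")
    return Account(name=fields[0], uid=int(fields[2]), gid=int(fields[3]))


def ids_match(local: Account, hpss_uid: int, hpss_gid: int) -> bool:
    """True when the local UID and GID equal the values recorded in HPSS."""
    return local.uid == hpss_uid and local.gid == hpss_gid
```

For the example account above, ids_match(parse_passwd_line(line), 12345, 1000) is true only if HPSS also records UID 12345 and GID 1000 for hpssuser1.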
Configure HPSS Credentials
Globus uses the account hpssftp to access HPSS initially, then changes user ID to the authenticated HPSS user (e.g. hpssuser1). This removes the need to maintain per-user keytab files on the HPSS client node. However, it requires that the Globus process have access to the hpssftp keytab entry during the authentication phase, which runs under the authenticating user’s UID.
Assuming the keytab for hpssftp is stored in /var/hpss/etc/hpss.keytab:
$ chmod 644 /var/hpss/etc/hpss.keytab
HPSS installations configured for Kerberos authentication must also allow non-privileged users write access to the HPSS temporary Kerberos ticket cache, typically /var/hpss/cred:
$ chmod 1777 /var/hpss/cred
The hpssftp keytab file must not be exposed to unprivileged users. Prevent local shell access by non-privileged HPSS users (e.g. using PAM).
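The two chmod settings in this section can be verified programmatically. This is a small sketch using only the Python standard library; the helper names are made up for illustration, and the expected modes are simply the 644 and 1777 values given above.

```python
import stat


def keytab_mode_ok(mode: int) -> bool:
    """True if the permission bits are exactly 0644, as set by `chmod 644`."""
    return stat.S_IMODE(mode) == 0o644


def cred_dir_mode_ok(mode: int) -> bool:
    """True if the bits are exactly 1777: world-writable plus the sticky bit."""
    return stat.S_IMODE(mode) == 0o1777
```

On the client node, pass in the st_mode of the real paths, for example keytab_mode_ok(os.stat("/var/hpss/etc/hpss.keytab").st_mode) after importing os.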
Verify Basic Operations Via Scrub
As a non-privileged HPSS user on the local node, verify that the local account is able to authenticate successfully to HPSS as hpssftp. For example:
$ /opt/hpss/bin/scrub -a krb5 -p hpssftp -k -t /var/hpss/etc/hpss.keytab
scrub> quit
Next, log into HPSS as that non-privileged user and perform some basic directory and file operations. Unlike the previous step, which authenticated as hpssftp, these operations must be performed under the non-privileged user’s own HPSS identity:
$ /opt/hpss/bin/scrub
/hpss/home/testuser1
scrub> mkdir testdir
scrub> rmdir testdir
scrub> open testfile wc
File created using COS 1 (Small File COS)
scrub> write 5k
.done (144.981 KB/sec)
scrub> close
scrub> unlink testfile
scrub> quit
Installation
This section explains how to install and configure the HPSS connector. Your Globus Connect Server endpoint must already be installed. See the Globus Connect Server v4 Install Guide.
Upgrading from Version 2.8 and Earlier
As of version 2.9, the HPSS Connector is installed from RPM instead of being built from source. Because of this, several system changes made when installing previous versions need to be reversed so that they do not conflict with the RPM installation. This includes:
- /etc/gridftp.d/hpss: Make sure this file does not exist. Also remove any other files within the same directory that supply configuration options for hpss_local.
- /etc/ld.so.conf.d/gridftp_hpss_dsi.conf: Remove this file if it exists so that the GridFTP server can find the new HPSS DSI in /usr/lib64/. Be sure to run ldconfig to update system paths.
- /var/hpss/etc/gridftp_hpss_dsi.conf: Delete this file, or move it aside if you want to keep a copy, to avoid conflict with the RPM-managed version of the file.
Installing the HPSS Connector
The HPSS Connector is installed by RPM in order to simplify installation and enforce version requirements with requisite software. Unfortunately, Globus does not have access to all HPSS and RHEL version combinations to provide RPMs for all installations.
Visit the release page and find the latest release. Note that release candidates are also available from this page; they are indicated with the Pre-release badge and have a Release Candidate name. For production use, make sure to choose the most recent official release, which will have the Latest Release badge.
Go to the Assets section under the latest release. When possible, RPMs for supported platforms will be available for your convenience. If an RPM is not available, go to the Git repo Readme for the latest instructions on how to build the RPM from source.
The RPM has a naming scheme of globus-gridftp-server-hpss-7.5-2.9-1.el7+gcsv4.x86_64.rpm.
- globus-gridftp-server-hpss: The package name.
- -7.5-: This package is for HPSS version 7.5.X.
- -2.9-1: This is the first release of connector version 2.9.
- .el7: This package is for RHEL 7.X.
- +gcsv4: This package is for Globus Connect Server version 4.
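To pick the right asset automatically, the naming scheme can be split into its parts with a regular expression. The pattern below is an assumption inferred from the single example file name above, so other releases may need adjustments; the group names are illustrative.

```python
import re
from typing import Dict

# Pattern inferred from the one example name in this guide (an assumption).
RPM_NAME = re.compile(
    r"^(?P<package>globus-gridftp-server-hpss)"
    r"-(?P<hpss_version>\d+\.\d+)"                        # e.g. 7.5
    r"-(?P<connector_version>\d+\.\d+)-(?P<release>\d+)"  # e.g. 2.9-1
    r"\.(?P<dist>el\d+)"                                  # e.g. el7
    r"\+(?P<gcs>gcsv\d+)"                                 # e.g. gcsv4
    r"\.(?P<arch>[^.]+)\.rpm$"                            # e.g. x86_64
)


def parse_rpm_name(filename: str) -> Dict[str, str]:
    """Split an HPSS connector RPM file name into its named components."""
    match = RPM_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized RPM name: {filename!r}")
    return match.groupdict()
```

Comparing hpss_version and dist against the local HPSS and RHEL versions before downloading helps catch a mismatched package early.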
Download the RPM and the matching .asc file, which will allow you to verify that the RPM has not changed since creation. Using a recent version of gpg (>= 2.0), import the public key used for signing:
$ gpg --keyserver hkp://keys.openpgp.org --recv-keys 1EA106A24003C353
gpg: requesting key 4003C353 from hkp server keys.openpgp.org
gpg: key 4003C353: "Jason Alt <jasonalt@globus.org>" imported
gpg: Total number processed: 1
gpg: imported: 1
And then verify the downloaded RPM:
$ gpg --verify globus-gridftp-server-hpss-7.5-2.9-1.el7+gcsv4.x86_64.rpm.asc globus-gridftp-server-hpss-7.5-2.9-1.el7+gcsv4.x86_64.rpm
gpg: Signature made Wed 06 Nov 2019 09:45:26 PM UTC using RSA key ID 4003C353
gpg: Good signature from "Jason Alt <jasonalt@gmail.com>"
gpg: aka "Jason Alt <jasonalt@globus.org>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: C36C 826C 18ED 73C3 38DC FA53 1EA1 06A2 4003 C353
And finally, install the downloaded RPM using YUM:
$ sudo yum install ./globus-gridftp-server-hpss-7.5-2.9-1.el7+gcsv4.x86_64.rpm
Configure the HPSS Connector
Review /var/hpss/etc/gridftp_hpss_dsi.conf for any changes you may wish to make for your site. You will likely leave most of these options commented out to use their default values.
- LoginName <user>: (optional) This is the HPSS service user used to initially authenticate with HPSS. GridFTP requires a privileged user with control permission on the core server’s client interface in order to log into HPSS and then change its credentials to that of the connecting user. Defaults to hpssftp, which is also handled specially by HPSS with regard to gatekeeper operations.
- AuthenticationMech [unix|krb5|gsi|spkm]: (optional) Defines the type of authentication that the DSI will perform when logging into HPSS. Note that this is not the authentication mechanism the GridFTP users will use; they always use GSI. Defaults to HPSS_API_AUTHN_MECH or HPSS_PRIMARY_AUTHN_MECH.
- Authenticator [auth_keytab|auth_keyfile|auth_key|auth_passwd][:<file>]: (optional) Defines the location of credentials to be used by the DSI to authenticate to HPSS as LoginName. Defaults to HPSS_PRIMARY_AUTHENTICATOR. When this option points to a file, that file’s contents must be accessible to the GridFTP process, which runs as the UID of the authenticated user.
  - For unix authentication, you can put the LoginName account credentials into their own file using hpss_unix_keytab and point Authenticator to that file instead of giving the GridFTP process read access to the target of HPSS_PRIMARY_AUTHENTICATOR.
  - For sites using Kerberos authentication with HPSS, you’ll need to create a Kerberos keytab file using the Kerberos utility ktutil if you wish to separate the LoginName credentials from the target of HPSS_PRIMARY_AUTHENTICATOR.
- UDAChecksumSupport [on|off]: (optional) Causes checksums to be stored within UDAs so that the checksum can be retrieved later without recalling the file from tape. It is recommended that you set this option to on to avoid unnecessary tape recalls. The default is off.
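A quick sanity check of the option values can catch typos before GridFTP is restarted. The sketch below validates values against the choices listed above. The file syntax it assumes (one whitespace-separated "Name value" pair per line, with # starting a comment) is a guess based on the option descriptions here and should be checked against the installed file; all names in the code are illustrative.

```python
from typing import Dict, List

# Allowed values as listed in this guide.
ALLOWED_VALUES = {
    "AuthenticationMech": {"unix", "krb5", "gsi", "spkm"},
    "UDAChecksumSupport": {"on", "off"},
}
AUTHENTICATOR_TYPES = {"auth_keytab", "auth_keyfile", "auth_key", "auth_passwd"}


def parse_options(text: str) -> Dict[str, str]:
    """ASSUMED syntax: 'Name value' per line, '#' begins a comment."""
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return options


def validate(options: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    problems: List[str] = []
    for name, allowed in ALLOWED_VALUES.items():
        value = options.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name}: unexpected value {value!r}")
    authenticator = options.get("Authenticator")
    if authenticator is not None:
        kind = authenticator.split(":", 1)[0]
        if kind not in AUTHENTICATOR_TYPES:
            problems.append(f"Authenticator: unexpected type {kind!r}")
    return problems
```

Options that are commented out are ignored, which matches the advice to leave most of them at their defaults.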
Basic Endpoint Functionality Test
After completing the installation, you should do some basic transfer tests with your endpoint to ensure that it is working. We document a process for basic endpoint functionality testing here.
Log Collection
New in version 2.14.
The HPSS Connector provides three methods for collecting log messages for monitoring and debugging. Of these three methods, only HPSS Connector Standard Logging is recommended for production use. The other two methods are reserved for troubleshooting.
HPSS Connector Standard Logging
By default, the HPSS Connector will send log messages of severity INFO, ERROR and WARN to the standard GridFTP server log file, typically located at /var/log/gridftp.log. You can enable the collection of these messages using the standard GridFTP server logging option log_level. You can override the default log settings by creating /etc/gridftp.d/hpss_logging and setting these configuration options:
log_single /var/log/gridftp.log
log_level ERROR,WARN
Setting the log_level value to include any of the following severities will allow you to collect messages of that type.
- ERROR: These messages require immediate action by the endpoint administrator. In the case of an error message, the connector has encountered a fatal condition and functionality is severely limited. These conditions typically impact all users of the connector. For example, the ERROR severity is used for configuration options that prevent users from using the system.
- WARN: This message implies that a condition has occurred that the endpoint administrator should be aware of but the connector has recognized the condition and worked around it in order to provide functionality. In these cases, functionality may be limited but overall transfer operations will succeed. For example, the WARN severity is used to alert the endpoint administrator that the HPSS installation is missing recommended HPSS patches.
- INFO: This severity is useful for gathering more information about the activity of GridFTP connections. It is used to notify the endpoint administrator of the particular commands used during a GridFTP session. For example, INFO is used to record directory listing and file transfer events.
HPSS Connector log messages have the format:
[22825] Tue Sep 1 13:18:09 2020 :: [HPSS Connector][INFO] User=johndoe TaskID=19c156ce-ec77-11ea-85ac-0e1702b77d41 :: Getting attributes of /home/johndoe/10M.dat
- [22825]: The process ID.
- Tue Sep 1 13:18:09 2020: The date and time at which the message was logged.
- [HPSS Connector]: Tag to indicate that this message is from the HPSS Connector.
- [INFO]: The log message’s severity.
- User=johndoe: The authenticated user. This field is not always available.
- TaskID=19c156ce-ec77-11ea-85ac-0e1702b77d41: The associated Globus Transfer task ID. This field is not always available.
The remainder of the message includes the field separator :: followed by the actual HPSS Connector log message.
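For monitoring, the format described above can be parsed with a regular expression. This is a sketch built from the single example line in this guide, so treat the pattern as an assumption rather than a specification of the log format; the function and key names are illustrative.

```python
import re
from typing import Dict, Optional

# Pattern derived from the example log line in this guide (an assumption).
LOG_LINE = re.compile(
    r"^\[(?P<pid>\d+)\]\s+"
    r"(?P<timestamp>.+?)\s+::\s+"
    r"\[HPSS Connector\]\[(?P<severity>[A-Z]+)\]"
    r"(?P<fields>.*?)\s*::\s+"
    r"(?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the parsed fields, or None if this is not a connector message."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    # User= and TaskID= are optional, so collect whatever key=value pairs exist.
    extras = dict(
        item.split("=", 1) for item in match["fields"].split() if "=" in item
    )
    return {
        "pid": match["pid"],
        "timestamp": match["timestamp"],
        "severity": match["severity"],
        "user": extras.get("User"),
        "task_id": extras.get("TaskID"),
        "message": match["message"],
    }
```

Lines from other GridFTP components return None, so the function can be applied to every line of /var/log/gridftp.log and the non-matching lines skipped.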
HPSS Connector Debug Logging
The HPSS Connector can be configured to log additional debug information. This logging mechanism is independent of the GridFTP server’s log file used with standard logging and is designed for use in resolving difficult issues by tracing interactions with the HPSS API. This logging interface is disabled by default. It is recommended that this remain off during production because of its verbosity.
See /etc/gridftp.d/hpss_debug for details on debug logging.
HPSS API Debug Logging
You can optionally enable the HPSS API to log additional debug information to its own log file. This may be useful in conjunction with debug logging when resolving interactions with HPSS. This option should remain disabled during production use.
See /etc/gridftp.d/hpss_debug for details on HPSS API debug logging.
Mailing List
Releases, upcoming features and discussions take place on the mailing list: hpss-discuss@globus.org
Appendix A: Troubleshooting
Below are some common issues encountered while using the Globus Transfer service with an endpoint running the HPSS connector along with possible resolutions to each problem.
Async Stage Requests Cause Red-Ball-of-Doom
Recent changes to use the HPSS async stage request API, made in order to avoid inundating the core server with duplicate stage requests, have exposed a deficiency for the DSI use case of HPSS. The HPSS async stage API expects the caller to be available long term in order to receive stage completion messages. However, the GridFTP/DSI use case is a short-lived transient environment; the GridFTP process cannot wait minutes/hours/days for stage completion messages. Users of DSI versions 2.6+ will see the impact as a red-ball-of-doom indicator in the HPSS GUI console. The warning is innocuous and can be ignored. IBM is aware of this issue and a change request has been created.
As a workaround, users of 2.6 should update to 2.7, and all users of 2.7+ can use the blackhole sync method. This configures nc (netcat) to listen for stage completion messages intended for the DSI and discard whatever it receives. nc should be launched on a highly-available server reachable by the HPSS core servers (preferably directly on the core servers). Choose a port on which to receive callback notifications and run this command:
admin@hpss-core $ nc -v -v -k -l <port>
Once nc is running, add this to /etc/gridftp.d/hpss_issue_35 on the GridFTP nodes running the HPSS DSI:
$ASYNC_CALLBACK_ADDR <host>:<port>
Login Failure: No such file or directory
This error message indicates that hpss_LoadDefaultThreadState() has returned ENOENT, causing the login procedure to fail. This occurs when the UID of the authenticating user as known to the GridFTP process does not match the user’s ID as known by HPSS. See Configure Local User Accounts.
Command Failed: Error (login) Endpoint: xxxx Server: xxxx Message: Login Failed --- Details: 530-Login incorrect. : GlobusError: v=1 c=PATH_NOT_FOUND\r\n530-GridFTP-Errno: 2\r\n530-GridFTP-Reason: System error in hpss_LoadDefaultThreadState()\r\n530-GridFTP-Error-String: No such file or directory\r\n530 End.\r\n
Login Failure: Operation not permitted
This error message indicates that hpss_SetLoginCred() failed with EPERM during the login procedure. This step in the login process accesses the keytab defined in Authenticator so that the DSI can connect to HPSS as user LoginName. The error value indicates that the GridFTP process was unable to access the keytab file. See Configure HPSS Credentials.
Command Failed: Error (login) Endpoint: xxxx Server: xxxx Message: Login Failed --- Details: 530-Login incorrect. : GlobusError: v=1 c=INTERNAL_ERROR\r\n530-GridFTP-Errno: 1\r\n530-GridFTP-Reason: System error in hpss_SetLoginCred()\r\n530-GridFTP-Error-String: Operation not permitted\r\n530 End.\r\n
Login Failure: Cannot access config file
The following error implies that /var/hpss/etc/gridftp_hpss_dsi.conf does not exist.
Error (login) Endpoint: XXX Server: XXX Message: Login Failed --- Details: 530-Login incorrect. : GlobusError: v=1 c=PATH_NOT_FOUND\r\n530-GridFTP-Errno: 2\r\n530-GridFTP-Reason: System error in Can not access config file\r\n530-GridFTP-Error-String: No such file or directory\r\n530 End.\r\n
Login Failure: Invalid argument
If you receive this message, it is likely that /var/hpss/etc/site.conf is invalid.
Error (login) Endpoint: XXX Server: XXX Message: Login Failed --- Details: 530-Login incorrect. : GlobusError: v=1 c=INTERNAL_ERROR\r\n530-GridFTP-Errno: 22\r\n530-GridFTP-Reason: System error in hpss_LoadDefaultThreadState()\r\n530-GridFTP-Error-String: Invalid argument\r\n530 End.\r\n
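The four login failures above differ only in the GridFTP-Errno and GridFTP-Reason fields, so a small lookup can speed up triage. The sketch below extracts those two fields and maps them to the explanations given in this appendix. The table contains only the cases documented here, and the function names are illustrative.

```python
import re
from typing import Optional, Tuple

ERRNO = re.compile(r"GridFTP-Errno: (\d+)")
REASON = re.compile(r"GridFTP-Reason: System error in (.+)")

# (errno, reason text) -> explanation taken from this appendix.
HINTS = {
    (2, "hpss_LoadDefaultThreadState()"):
        "UID known to GridFTP does not match the UID known to HPSS",
    (1, "hpss_SetLoginCred()"):
        "GridFTP process cannot access the keytab file",
    (2, "Can not access config file"):
        "/var/hpss/etc/gridftp_hpss_dsi.conf does not exist",
    (22, "hpss_LoadDefaultThreadState()"):
        "/var/hpss/etc/site.conf is likely invalid",
}


def extract(details: str) -> Tuple[Optional[int], Optional[str]]:
    """Pull the errno and the failing step out of a 'Details:' string."""
    errno_value: Optional[int] = None
    reason: Optional[str] = None
    # The error text shows CRLF as the literal characters "\r\n";
    # real CRLF sequences are accepted as well.
    for line in re.split(r"\\r\\n|\r\n", details):
        found = ERRNO.search(line)
        if found:
            errno_value = int(found.group(1))
        found = REASON.search(line)
        if found:
            reason = found.group(1).strip()
    return errno_value, reason


def diagnose(details: str) -> Optional[str]:
    """Return the documented explanation, or None for an unknown combination."""
    return HINTS.get(extract(details))
```

A result of None means the combination is not covered by this appendix and needs manual investigation.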
Transfer Error: Operation timed out
Large file transfers to/from HPSS tend to span multiple sets of HPSS mover processes. Each set is responsible for a large contiguous chunk of the file transfer: the first set transfers offsets 0-N, the second set transfers (N+1)-M, and so on. These mover sets are all initialized at the beginning of the transfer.
Any mover will time out after MVR_CLIENT_TIMEOUT seconds (defaults to 15 minutes). If a mover set does not start the transfer within this timeout, the entire transfer aborts. This is an HPSS issue, not a DSI issue.
This error condition is usually obvious from the following errors, which are issued at intervals of MVR_CLIENT_TIMEOUT seconds plus 5 minutes. See MVR_CLIENT_TIMEOUT for more details.
2019-06-12 14:29:39 Error (transfer) Endpoint: XXXX HPSS Archive (e38ee901-6d04-11e5-ba46-22000b92c6ec) Server: XXXX:2811 Command: STOR ~/scratch_backups/XXXX Message: The operation timed out --- Details: Timeout waiting for response
2019-06-12 14:49:47 Error (transfer) Endpoint: XXXX HPSS Archive (e38ee901-6d04-11e5-ba46-22000b92c6ec) Server: XXXX:2811 File: /~/scratch_backups/XXXX Command: STOR ~/scratch_backups/XXX Message: Fatal FTP response --- Details: 451-GlobusError: v=1 c=INTERNAL_ERROR\r\n451-GridFTP-Errno: 5011\r\n451-GridFTP-Reason: System error in hpss_PIOExecute\r\n451-GridFTP-Error-String: \r\n451 End.\r\n
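A rough estimate shows when this timeout becomes a risk. The sketch below uses a deliberately simplified model: it assumes constant throughput and that a mover set sits idle from the start of the transfer until the transfer reaches that set's first offset. The chunk boundaries (N, M) are site specific and are not given in this guide, so the offsets used are placeholders.

```python
DEFAULT_MVR_CLIENT_TIMEOUT_S = 15 * 60  # default stated in this guide


def seconds_until_offset(offset_bytes: int, throughput_bytes_per_s: float) -> float:
    """Time for a transfer at constant throughput to reach the given offset."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return offset_bytes / throughput_bytes_per_s


def mover_set_would_time_out(
    start_offset_bytes: int,
    throughput_bytes_per_s: float,
    timeout_s: float = DEFAULT_MVR_CLIENT_TIMEOUT_S,
) -> bool:
    """True if the set starting at start_offset_bytes idles past the timeout."""
    return seconds_until_offset(start_offset_bytes, throughput_bytes_per_s) > timeout_s
```

Under this model, a mover set whose chunk begins at 1 TiB would wait 1024 seconds at 1 GiB/s, which exceeds the 900-second default, so raising MVR_CLIENT_TIMEOUT on the mover nodes would be worth considering.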
Appendix B: Performance
GridFTP installations benefit from and take full advantage of classes of service that use fixed length classic style allocation. In short, you’ll get the best performance from the GridFTP interface (actually any HPSS interface) if the segment count is below 32.
HPSS has multiple disk/tape allocation algorithms used to allocate space for incoming data. Fixed length allocation gives you equal size chunks to store data in. This was deemed wasteful because the last block was almost never completely filled. Variable length allocation was created to solve this problem; it will give you increasingly larger segments as data is stored and truncates the last block. This is a win for most situations when HPSS is unsure how much data is to be stored for the given file.
Using either of these allocation mechanisms (any variable length allocation or fixed w/o knowing the file size), HPSS is free to continue to allocate segments until all the data is stored. This has a definite performance impact because internally HPSS retrieves data in 32-segment chunks. This means when you request a file from HPSS, internally it breaks it up into multiple transfers, each of which is <= 32 segments. Functionally, this is transparent to the client. In terms of performance, the client will see a high load followed by a pause followed by a high load, etc.
In order to avoid the performance hit, you can use fixed length allocation with segment counts < 32 and take advantage of the fact that any well-behaved GridFTP client will inform HPSS of the size of the incoming file before the transfer begins. In fact, the DSI is designed to require this. If a GridFTP client is not well behaved, the DSI will act as though a zero-length transfer is about to occur and will handle it as such, so you’ll know if the client is not doing the right thing.
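The segment arithmetic can be checked for a given class of service. This sketch assumes fixed length allocation with a known segment size; the 32-segment figure comes from the text above, and the function names and example sizes are illustrative.

```python
SEGMENTS_PER_INTERNAL_TRANSFER = 32  # HPSS retrieves data in 32-segment chunks


def segment_count(file_size_bytes: int, segment_size_bytes: int) -> int:
    """Number of fixed-length segments needed to hold the file."""
    if segment_size_bytes <= 0:
        raise ValueError("segment size must be positive")
    # Integer ceiling division avoids floating point rounding on large files.
    return (file_size_bytes + segment_size_bytes - 1) // segment_size_bytes


def internal_transfers(segments: int) -> int:
    """Number of internal transfers HPSS splits a retrieval into."""
    chunk = SEGMENTS_PER_INTERNAL_TRANSFER
    return (segments + chunk - 1) // chunk
```

For example, a 100 GiB file stored in 1 GiB segments needs 100 segments and is retrieved as 4 internal transfers, while a segment size of 4 GiB keeps the same file at 25 segments and a single transfer.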