HPSS Connector
The HPSS connector can be used to access and share data on an HPSS storage system. The connector is available as an add-on subscription to organizations with a Globus Standard subscription; please contact us for pricing.
This document describes how to install an endpoint and the HPSS connector it needs to access the storage system. This installation should be done by a system administrator. Once it is complete, users can access HPSS storage through the endpoint via Globus to transfer, share and publish data on the system.
Preinstallation Checklist
In order for the HPSS DSI to function properly on the HPSS client node, please verify the following items.
Review Release Notes
Recently discovered issues and workarounds will be documented in the GitHub repository prior to inclusion in this document. See the repo Readme for details.
Supported HPSS Versions
This connector supports HPSS versions 7.3 and newer. However, RPMs can only be built, and releases verified, against the HPSS configurations we have access to.
Recommended HPSS Patches
These HPSS issues severely impact performance or functionality, so the corresponding patches are highly recommended.
BZ2819 - PIO 60 second delay impacts small file performance. There is a small chance that, after a transfer completes, HPSS PIO will wait 60 seconds before informing the client that the transfer has completed. This fix has been implemented in 7.3.3p9, 7.3.4, 7.4.1p1 and 7.4.2.
BZ2856 - Enabling HPSS_API_REUSE_CONNECTIONS returns address already in use. This issue limits how many connections can be active at once. GridFTP and HPSS make considerable use of ephemeral TCP ports, and quick, successive file transfers can exhaust the available ports. There is no fix for this HPSS issue at this time. To help avoid it, the number of ephemeral ports can be increased and the amount of time a socket spends in the TIME_WAIT state can be decreased.
BZ7772 - PIO 5 second delay impacts small file performance. There is a high chance that, after a transfer completes, HPSS PIO will wait 5 seconds before informing the client that the transfer has completed. This greatly impacts the performance of file retrieves and checksum operations. This fix has been implemented in 7.5.3 and later.
BZ7883 - Prevents successful transfers of files over 4GiB on HPSS versions 7.5.2+. Due to what appears to be a transfer length calculation error, transfers of files larger than 4GiB generate an EIO error at the 4GiB mark and terminate. This bug impacts all HPSS clients using the HPSS PIO interface. Upgrade to HPSS 7.5.2u5 or HPSS 7.5.3u1 to resolve this issue.
BZ10852 - Interrupted or canceled Transfer tasks result in lingering GridFTP processes. This fix will be released in HPSS 8.3+.
BZ11137 - Prevents the HPSS Connector from reauthenticating to HPSS when the storage gateway configuration changes. Without this patch, changes to the HPSS storage gateway options require a restart of the GCS Manager process.
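If you want to script part of this review, the helpers below are a minimal sketch. version_at_least compares dotted HPSS release strings using GNU sort -V ordering, and ephemeral_port_count sizes the Linux ephemeral port range that is relevant to BZ2856. Both function names are hypothetical. The HPSS version string is assumed to be supplied by you; it is not queried from HPSS.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the patch review above (sketch only).

# version_at_least CURRENT REQUIRED
# Succeeds when CURRENT sorts at or after REQUIRED under GNU `sort -V`.
version_at_least() {
    local current=$1 required=$2
    [ "$(printf '%s\n' "$current" "$required" | sort -V | head -n 1)" = "$required" ]
}

# ephemeral_port_count "LOW HIGH"
# Counts the ports in an ip_local_port_range value, e.g. the contents of
# /proc/sys/net/ipv4/ip_local_port_range (two numbers separated by whitespace).
ephemeral_port_count() {
    local low high
    read -r low high <<< "$1"
    echo $(( high - low + 1 ))
}
```

For example, version_at_least "$HPSS_VERSION" 7.5.3 fails on releases that predate the BZ7772 fix. ephemeral_port_count "$(cat /proc/sys/net/ipv4/ip_local_port_range)" reports how many ephemeral ports the node currently allows. Confirm exact fix levels such as 7.5.2u5 with your HPSS support contact.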
HPSS Configuration Options
These options are generally set in /var/hpss/etc/env.conf and affect operation of data transfers. Some of these options may be required depending upon your configuration.
- HPSS_API_HOSTNAME: This option selects the network interface used for data transfers between the Globus services you are configuring and the HPSS mover machine(s). If it is unset, data transfers use the default network interface. This option is generally necessary on multihomed nodes. It should be set on the node running GridFTP.
- MVR_CLIENT_TIMEOUT: This controls how long a mover process waits for data from the Globus service before giving up in order to reclaim network resources. The default is 15 minutes. In very large file transfers, movers may time out before the transfer reaches the data offsets those movers are responsible for. This option is set on the mover nodes.
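Here is a minimal sketch for checking what is currently set. It assumes env.conf holds one KEY=VALUE entry per line with # comments, so verify that against your own file. hpss_env_value, a hypothetical helper, prints the last value assigned to a key and fails if the key is absent.

```shell
#!/usr/bin/env bash
# hpss_env_value FILE KEY  (hypothetical helper, sketch only)
# Assumes an env.conf-style file: one KEY=VALUE entry per line, '#' comments.
# Prints the last value assigned to KEY; exits non-zero if KEY is not set.
hpss_env_value() {
    local file=$1 key=$2
    awk -v k="$key" '
        /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comments and blank lines
        index($0, "=") == 0 { next }        # skip lines that are not assignments
        {
            name = $0; sub(/=.*/, "", name); gsub(/[ \t]/, "", name)
            if (name == k) { val = $0; sub(/^[^=]*=/, "", val); found = 1 }
        }
        END { if (found) print val; else exit 1 }
    ' "$file"
}
```

For example, running hpss_env_value /var/hpss/etc/env.conf HPSS_API_HOSTNAME on the GridFTP node shows which interface data transfers will use. A non-zero exit status means the key is unset and the default interface is in effect.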
Note on Kerberos Configurations
Kerberos must be configured for access to the Kerberos realm that contains HPSS. The Kerberos configuration is usually kept in /etc/krb5.conf. If the DSI module cannot communicate with the HPSS servers, you may need to enable the allow_weak_crypto option in the [libdefaults] section.
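Before editing the file, you can check whether the option is already enabled. The hypothetical weak_crypto_enabled function below reads a krb5.conf-style file and succeeds only if allow_weak_crypto is set to a true value inside [libdefaults]. It is a read-only sketch that accepts the boolean spellings MIT Kerberos recognizes (y, yes, true, t, 1, on). It does not handle every krb5.conf feature, such as included files.

```shell
#!/usr/bin/env bash
# weak_crypto_enabled [FILE]  (hypothetical helper, sketch only)
# Succeeds if allow_weak_crypto is true inside [libdefaults] of FILE
# (default /etc/krb5.conf). Never modifies the file.
weak_crypto_enabled() {
    local conf=${1:-/etc/krb5.conf}
    awk '
        # Any "[section]" header starts a new section.
        /^[ \t]*\[/ { in_lib = ($0 ~ /^[ \t]*\[libdefaults]/); next }
        in_lib && /^[ \t]*allow_weak_crypto[ \t]*=/ {
            v = $0
            sub(/^[^=]*=[ \t]*/, "", v)   # keep only the value
            sub(/[ \t]+$/, "", v)         # drop trailing whitespace
            enabled = (tolower(v) ~ /^(y|yes|true|t|1|on)$/)
        }
        END { exit (enabled ? 0 : 1) }
    ' "$conf"
}
```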
Required HPSS Files
The following HPSS files, located in /var/hpss/etc, are known to be required for operation of the HPSS DSI:
- auth.conf
- authz.conf
- env.conf
- ep.conf
- group
- HPSS.conf
- hpss.keytab (or hpss.unix.keytab)
- ieee_802_addr
- passwd
- site.conf
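The list above can be turned into a quick pre-flight check. The sketch below (check_hpss_files is a hypothetical name) prints each missing file and returns non-zero if anything is absent. It only tests whether the files exist, not whether their contents are valid.

```shell
#!/usr/bin/env bash
# check_hpss_files [DIR]  (hypothetical helper, sketch only)
# Reports required HPSS files missing from DIR (default /var/hpss/etc).
check_hpss_files() {
    local dir=${1:-/var/hpss/etc} missing=0 f
    for f in auth.conf authz.conf env.conf ep.conf group HPSS.conf \
             ieee_802_addr passwd site.conf; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    # Either keytab name satisfies the requirement.
    if [ ! -e "$dir/hpss.keytab" ] && [ ! -e "$dir/hpss.unix.keytab" ]; then
        echo "missing: $dir/hpss.keytab (or hpss.unix.keytab)"
        missing=1
    fi
    return "$missing"
}
```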
Configure Local User Accounts
When a user accesses HPSS via Globus, home directory lookups and translations between usernames and user IDs are performed using the OS name service (e.g. /etc/passwd, LDAP, NIS). The HPSS password file (i.e. HPSS_UNIX_AUTH_PASSWD) is not used by Globus. This has a direct impact on authentication and file access, so verify that HPSS users have the same UID on the local system and within HPSS. For example, given an HPSS user hpssuser1, the UID and GID returned by the following command should match the UID and GID of the same account within HPSS:
$ getent passwd hpssuser1
hpssuser1:x:12345:1000:HPSS User:/home/hpssuser1:/bin/bash
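To make the comparison explicit, the sketch below splits a passwd entry into its fields and compares them with the UID and GID you look up in HPSS. You pass the HPSS values in by hand, because this guide does not describe a command for querying them. uid_gid_match is a hypothetical helper.

```shell
#!/usr/bin/env bash
# uid_gid_match "PASSWD_ENTRY" HPSS_UID HPSS_GID  (hypothetical helper, sketch only)
# PASSWD_ENTRY is one line of `getent passwd` output; the HPSS values are
# supplied by the administrator.
uid_gid_match() {
    local entry=$1 hpss_uid=$2 hpss_gid=$3 name pw uid gid rest
    IFS=: read -r name pw uid gid rest <<< "$entry"
    if [ "$uid" = "$hpss_uid" ] && [ "$gid" = "$hpss_gid" ]; then
        echo "$name: OK (uid=$uid gid=$gid)"
    else
        echo "$name: MISMATCH local uid=$uid gid=$gid, HPSS uid=$hpss_uid gid=$hpss_gid"
        return 1
    fi
}
```

For example: uid_gid_match "$(getent passwd hpssuser1)" 12345 1000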
Configure HPSS Credentials
Globus initially accesses HPSS using the hpssftp account, then changes user ID to the authenticated HPSS user (i.e. hpssuser1). This removes the need to maintain per-user keytab files on the HPSS client node. However, it requires that the Globus process have access to the hpssftp keytab entry during the authentication phase, which runs under the authenticating user's UID.
Assuming the keytab for hpssftp is stored in /var/hpss/etc/hpss.keytab:
$ chmod 644 /var/hpss/etc/hpss.keytab
HPSS installations configured for Kerberos authentication must also allow non-privileged users write access to the HPSS temporary Kerberos ticket cache, typically /var/hpss/cred:
$ chmod 1777 /var/hpss/cred
Because these permissions make the hpssftp keytab file readable by every local account, the file must not be exposed to unprivileged users. Prevent local shell access by non-privileged HPSS users (for example, with PAM).
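After running the chmod commands, you can confirm the resulting modes with a small check such as the one below. This sketch relies on GNU coreutils stat -c %a, which prints the octal mode including the sticky bit, so a correctly configured /var/hpss/cred reports 1777. check_mode is a hypothetical name.

```shell
#!/usr/bin/env bash
# check_mode PATH EXPECTED_OCTAL  (hypothetical helper, sketch only)
# Uses GNU `stat -c %a`, which includes special bits such as sticky (1777).
check_mode() {
    local path=$1 expected=$2 actual
    actual=$(stat -c %a "$path") || return 1
    if [ "$actual" != "$expected" ]; then
        echo "$path: mode $actual, expected $expected"
        return 1
    fi
}
```

For example: check_mode /var/hpss/etc/hpss.keytab 644 && check_mode /var/hpss/cred 1777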
Verify Basic Operations Via Scrub
As a non-privileged HPSS user on the local node, verify that the local account is able to authenticate successfully to HPSS as hpssftp. For example:
$ /opt/hpss/bin/scrub -a krb5 -p hpssftp -k -t /var/hpss/etc/hpss.keytab
scrub> quit
As a non-privileged HPSS user, log into HPSS and perform some basic directory and file operations. Unlike the previous step, make sure these operations are performed as the non-privileged HPSS user rather than as hpssftp:
$ /opt/hpss/bin/scrub
/hpss/home/testuser1
scrub> mkdir testdir
scrub> rmdir testdir
scrub> open testfile wc
File created using COS 1 (Small File COS)
scrub> write 5k
.done (144.981 KB/sec)
scrub> close
scrub> unlink testfile
scrub> quit
Installation
This section explains how to install and configure the HPSS connector. Your Globus Connect Server endpoint must already be installed; see the Globus Connect Server v5.4 Install Guide.
Upgrading from Version 2.8 and Earlier
As of version 2.9, the HPSS Connector is installed from RPM instead of being built from source. Because of this, several changes made to the system when installing previous versions must be reversed so that they do not conflict with the RPM installation. These include:
- /etc/gridftp.d/hpss: Make sure this file does not exist. Also remove any other files in the same directory that supply configuration options for hpss_local.
- /etc/ld.so.conf.d/gridftp_hpss_dsi.conf: Remove this file if it exists so that the GridFTP server can find the new HPSS DSI in /usr/lib64/. Be sure to run ldconfig to update system paths.
- /var/hpss/etc/gridftp_hpss_dsi.conf: Delete this file, or move it elsewhere to keep a copy, to avoid conflict with the RPM-managed version of the file.
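The three cleanup steps can be sketched as a single function. This is an illustration, not a supported tool. The root argument exists only so you can rehearse the steps against a scratch directory, and the .pre-rpm suffix for the saved copy is an arbitrary choice. Other files that reference hpss_local are reported for manual review rather than deleted.

```shell
#!/usr/bin/env bash
# cleanup_legacy_hpss_dsi [ROOT]  (hypothetical helper, sketch only)
# Undoes the pre-2.9 source-build changes listed above. ROOT defaults to /
# and exists only so the steps can be rehearsed on a scratch directory tree.
cleanup_legacy_hpss_dsi() {
    local root=${1:-/}
    root=${root%/}    # "/" becomes "", so "$root/etc" is always well formed

    # 1. Old GridFTP config file and old linker path for the DSI.
    rm -f "$root/etc/gridftp.d/hpss" \
          "$root/etc/ld.so.conf.d/gridftp_hpss_dsi.conf"

    # 2. Keep a copy of the old DSI config rather than deleting it outright.
    local old="$root/var/hpss/etc/gridftp_hpss_dsi.conf"
    if [ -e "$old" ]; then
        mv "$old" "$old.pre-rpm"
    fi

    # 3. Report (but do not delete) other GridFTP config files that still
    #    mention hpss_local, so they can be reviewed by hand.
    if [ -d "$root/etc/gridftp.d" ]; then
        grep -l hpss_local "$root/etc/gridftp.d"/* 2>/dev/null | sed 's/^/review: /'
    fi

    # 4. Refresh the linker cache only when working on the live system.
    if [ -z "$root" ]; then
        ldconfig
    fi
    return 0
}
```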
Installing the HPSS Connector
The HPSS Connector is installed by RPM in order to simplify installation and enforce version requirements with requisite software. Unfortunately, Globus does not have access to every combination of HPSS and RHEL versions, so RPMs cannot be provided for all installations.
Globus Connect Server v5.4 support was added in version 2.17 of the HPSS Connector. Prior versions of the connector are not compatible with GCSv5.4.
Visit the release page and find the latest release. Note that release candidates are also available from this page; they are indicated with the Pre-release badge and have a Release Candidate name. For production use, make sure to choose the most recent official release, which has the Latest Release badge.
Go to the Assets section under the latest release. When possible, RPMs for supported platforms are available there for your convenience. If an RPM is not available, see the Git repository Readme for the latest instructions on how to build the RPM from source.
RPM names follow the scheme shown in this example: globus-gridftp-server-hpss-7.5-2.9-1.el7+gcsv5.x86_64.rpm. Its components are:
- globus-gridftp-server-hpss: The package name.
- -7.5-: This package is for HPSS version 7.5.X.
- -2.9-1: This is the first release of version 2.9 of the connector.
- .el7.: This package is for RHEL 7.X.
- +gcsv5: This package is for Globus Connect Server v5.
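To check a downloaded file name against this scheme, the sketch below splits it into those fields with a bash regular expression. The pattern is inferred from the single example name in this guide, so treat it as an assumption that may need adjusting for other releases. parse_hpss_rpm_name is a hypothetical helper.

```shell
#!/usr/bin/env bash
# parse_hpss_rpm_name FILENAME  (hypothetical helper, sketch only)
# Pattern inferred from the example name in this guide.
parse_hpss_rpm_name() {
    local re='^globus-gridftp-server-hpss-([0-9.]+)-([0-9.]+)-([0-9]+)\.(el[0-9]+)\+(gcsv[0-9]+)\.([a-z0-9_]+)\.rpm$'
    if [[ $1 =~ $re ]]; then
        echo "hpss=${BASH_REMATCH[1]} connector=${BASH_REMATCH[2]} release=${BASH_REMATCH[3]} os=${BASH_REMATCH[4]} gcs=${BASH_REMATCH[5]} arch=${BASH_REMATCH[6]}"
    else
        echo "unrecognized RPM name: $1" >&2
        return 1
    fi
}
```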
Download the RPM and the matching .asc file, which allows you to verify that the RPM has not changed since it was created. Using a recent version of gpg (>= 2.0), import the public key used for signing:
$ gpg --keyserver hkp://keys.openpgp.org --recv-keys 1EA106A24003C353
gpg: requesting key 4003C353 from hkp server keys.openpgp.org
gpg: key 4003C353: "Jason Alt <jasonalt@globus.org>" imported
gpg: Total number processed: 1
gpg: imported: 1
And then verify the downloaded RPM:
$ gpg --verify globus-gridftp-server-hpss-7.5-2.9-1.el7+gcsv5.x86_64.rpm.asc globus-gridftp-server-hpss-7.5-2.9-1.el7+gcsv5.x86_64.rpm
gpg: Signature made Wed 06 Nov 2019 09:45:26 PM UTC using RSA key ID 4003C353
gpg: Good signature from "Jason Alt <jasonalt@gmail.com>"
gpg: aka "Jason Alt <jasonalt@globus.org>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: C36C 826C 18ED 73C3 38DC FA53 1EA1 06A2 4003 C353
And finally, install the downloaded RPM using YUM:
$ sudo yum install ./globus-gridftp-server-hpss-7.5-2.9-1.el7+gcsv5.x86_64.rpm
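The verification and installation steps can be chained so that YUM runs only after gpg reports a good signature. This is a sketch. As we understand it, gpg --verify exits with status 0 for a good signature even when it prints the "not certified with a trusted signature" warning, so you should still compare the printed fingerprint with the one shown above. verify_and_install_rpm is a hypothetical name.

```shell
#!/usr/bin/env bash
# verify_and_install_rpm RPM  (hypothetical helper, sketch only)
# Runs the gpg check shown above and installs with YUM only if it passes.
verify_and_install_rpm() {
    local rpm=$1 f path
    for f in "$rpm" "$rpm.asc"; do
        [ -f "$f" ] || { echo "not found: $f" >&2; return 1; }
    done
    if ! gpg --verify "$rpm.asc" "$rpm"; then
        echo "signature check failed: $rpm" >&2
        return 1
    fi
    # Mirror the ./ prefix used in the command above for bare file names.
    case $rpm in
        */*) path=$rpm ;;
        *)   path=./$rpm ;;
    esac
    sudo yum install "$path"
}
```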
Creating HPSS Storage Gateways
An HPSS Storage Gateway is created with the command globus-connect-server storage-gateway create hpss, and can be updated with the command globus-connect-server storage-gateway update hpss.
Before looking into the policy options specific to the HPSS Connector, please familiarize yourself with the Globus Connect Server v5 Data Access Guide which describes the steps to create and update a storage gateway, using the POSIX connector as an example. The commands to create and update a storage gateway for the HPSS Connector are similar.
HPSS Connector Storage Gateway Policies
The HPSS Connector has the following policies to configure how it accesses HPSS.
HPSS Authentication
When a Globus user authenticates to an HPSS endpoint, the HPSS Connector first authenticates to HPSS using either Kerberos or Unix authentication and the hpssftp user credential stored on the local node. The hpssftp user session then changes user ID to the HPSS account that corresponds to the Globus user. In this way, Globus users can authenticate to HPSS without requiring a local HPSS key file for each user.
The authentication_mech and authenticator properties control how the HPSS Connector authenticates to HPSS. authentication_mech sets the type of HPSS authentication to use and accepts either krb5 or unix. authenticator specifies the file containing an HPSS credential for the hpssftp user; it accepts a value of the form auth_keyfile|auth_keytab:<path_to_file>. This file must exist on all nodes in the endpoint.
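Because the authenticator file must exist on every node, a per-node sanity check can catch typos early. The sketch below (check_authenticator is a hypothetical helper) validates the auth_keyfile|auth_keytab:<path_to_file> form and confirms the file is readable by the current user. Run it as a non-privileged account on each node. It does not check that the file actually contains a valid hpssftp credential.

```shell
#!/usr/bin/env bash
# check_authenticator VALUE  (hypothetical helper, sketch only)
# VALUE has the form auth_keyfile:<path> or auth_keytab:<path>.
check_authenticator() {
    local value=$1 kind path
    kind=${value%%:*}    # text before the first ':'
    path=${value#*:}     # text after the first ':'
    case $kind in
        auth_keyfile|auth_keytab) ;;
        *) echo "unknown authenticator type: $kind" >&2; return 1 ;;
    esac
    if [ -z "$path" ] || [ "$path" = "$value" ]; then
        echo "missing path in: $value" >&2
        return 1
    fi
    [ -r "$path" ] || { echo "not readable: $path" >&2; return 1; }
}
```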
Store File Checksums in User Defined Attributes
The HPSS Connector can optionally store MD5 checksums in the User-Defined Attributes (UDA) for later use with the sync-by-checksum transfer mode. This allows the HPSS Connector to report the checksum of an existing file without recalling the file from tape. This behavior is controlled by the uda_checksum property, which accepts either true or false.
Creating HPSS Collections
An HPSS Collection is created with the command globus-connect-server collection create, and can be updated with the command globus-connect-server collection update.
As the HPSS Connector does not introduce any policies beyond those used by the base collection type, you can follow the sequence in the Collections Section of the Globus Connect Server v5 Data Access Guide.
Log Collection
New in version 2.14.
The HPSS Connector provides three methods for collecting log messages for monitoring and debugging. Of these three methods, only HPSS Connector Standard Logging is recommended for production use. The other two methods are reserved for troubleshooting.
HPSS Connector Standard Logging
By default, the HPSS Connector sends log messages of severity INFO, ERROR and WARN to the standard GridFTP server log file, typically located at /var/log/gridftp.log. You can enable the collection of these messages using the standard GridFTP server logging option log_level. You can override the default log settings by creating /etc/gridftp.d/hpss_logging and setting these configuration options:
log_single /var/log/gridftp.log
log_level ERROR,WARN
Setting the log_level value to include any of the following severities allows you to collect messages of that type:
- ERROR: These messages require immediate action by the endpoint administrator. An error message means the connector has encountered a fatal condition and functionality is severely limited. These conditions typically impact all users of the connector. For example, the ERROR severity is used for misconfigured options that prevent users from using the system.
- WARN: These messages indicate a condition that the endpoint administrator should be aware of, but which the connector has recognized and worked around in order to provide functionality. In these cases, functionality may be limited, but overall transfer operations will succeed. For example, the WARN severity is used to alert the endpoint administrator that the HPSS installation is missing recommended HPSS patches.
- INFO: This severity is useful for gathering more information about the activity of GridFTP connections. It notifies the endpoint administrator of the particular commands used during a GridFTP session. For example, INFO is used to record directory listing and file transfer events.
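The hpss_logging file can also be generated from a severity list. In this sketch, write_hpss_logging (a hypothetical helper) accepts only the three severities listed above and writes the same two options shown earlier in this section. It takes the target directory as a parameter, so you can inspect the output before placing it in /etc/gridftp.d.

```shell
#!/usr/bin/env bash
# write_hpss_logging DIR LEVELS  (hypothetical helper, sketch only)
# Writes DIR/hpss_logging with log_single and log_level, accepting only
# the ERROR, WARN and INFO severities described in this section.
write_hpss_logging() {
    local dir=$1 levels=$2 level
    local -a parts
    [ -n "$levels" ] || { echo "no severities given" >&2; return 1; }
    IFS=, read -ra parts <<< "$levels"
    for level in "${parts[@]}"; do
        case $level in
            ERROR|WARN|INFO) ;;
            *) echo "unsupported severity: $level" >&2; return 1 ;;
        esac
    done
    printf 'log_single /var/log/gridftp.log\nlog_level %s\n' "$levels" > "$dir/hpss_logging"
}
```

For example, as root: write_hpss_logging /etc/gridftp.d ERROR,WARN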
HPSS Connector log messages have the following format:
[22825] Tue Sep 1 13:18:09 2020 :: [HPSS Connector][INFO] User=johndoe TaskID=19c156ce-ec77-11ea-85ac-0e1702b77d41 :: Getting attributes of /home/johndoe/10M.dat
- [22825]: The process ID.
- Tue Sep 1 13:18:09 2020: The date and time of the message.
- [HPSS Connector]: Tag indicating that this message is from the HPSS Connector.
- [INFO]: The log message's severity.
- User=johndoe: The authenticated user. This field is not always available.
- TaskID=19c156ce-ec77-11ea-85ac-0e1702b77d41: The associated Globus Transfer task ID. This field is not always available.
The remainder of the message consists of the field separator :: followed by the actual HPSS Connector log message.
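When searching the GridFTP log, it can help to split each HPSS Connector line into these fields. The sketch below does this with bash regular expressions. parse_hpss_log_line is a hypothetical helper, and it assumes the User and TaskID fields contain no colons, which holds for the example above.

```shell
#!/usr/bin/env bash
# parse_hpss_log_line LINE  (hypothetical helper, sketch only)
# Splits one HPSS Connector log line into the fields described above.
# Assumes the optional User=/TaskID= section contains no ':' characters.
parse_hpss_log_line() {
    local line=$1
    local re='^\[([0-9]+)] (.*) :: \[HPSS Connector]\[([A-Z]+)]([^:]*) :: (.*)$'
    local user_re='User=([^ ]+)' task_re='TaskID=([^ ]+)'
    [[ $line =~ $re ]] || return 1

    local pid=${BASH_REMATCH[1]} when=${BASH_REMATCH[2]} sev=${BASH_REMATCH[3]}
    local ctx=${BASH_REMATCH[4]} msg=${BASH_REMATCH[5]} user=- task=-

    # Optional fields; "-" means the field was not present.
    if [[ $ctx =~ $user_re ]]; then user=${BASH_REMATCH[1]}; fi
    if [[ $ctx =~ $task_re ]]; then task=${BASH_REMATCH[1]}; fi

    printf 'pid=%s severity=%s user=%s task=%s time="%s" msg="%s"\n' \
        "$pid" "$sev" "$user" "$task" "$when" "$msg"
}
```

For example, grep -F '[HPSS Connector][ERROR]' /var/log/gridftp.log | while IFS= read -r line; do parse_hpss_log_line "$line"; done lists only the ERROR messages.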
HPSS Connector Debug Logging
The HPSS Connector can be configured to log additional debug information. This logging mechanism is independent of the GridFTP server’s log file used with standard logging and is designed for use in resolving difficult issues by tracing interactions with the HPSS API. This logging interface is disabled by default. It is recommended that this remain off during production because of its verbosity.
See /etc/gridftp.d/hpss_debug for details on debug logging.
HPSS API Debug Logging
You can optionally enable the HPSS API to log additional debug information to its own log file. This may be useful in conjunction with debug logging when troubleshooting interactions with HPSS. This option should remain disabled during production use.
See /etc/gridftp.d/hpss_debug for details on HPSS API debug logging.
Mailing List
Releases, upcoming features and discussions take place on the mailing list: hpss-discuss@globus.org
Appendix A: Troubleshooting
Below are some common issues encountered while using the Globus Transfer service with an endpoint running the HPSS connector along with possible resolutions to each problem.
Could not find home directory for <user>. Failed to log into HPSS. Error code is -5
This error message implies that the HPSS policy options on the HPSS Storage Gateway are incorrect. Update the HPSS Storage Gateway using globus-connect-server storage-gateway update hpss and restart the GCS Manager process.
Async Stage Requests Cause Red-Ball-of-Doom
Recent changes that use the HPSS async stage request API, in order to avoid inundating the core server with duplicate stage requests, have exposed a deficiency for the DSI use case of HPSS. The HPSS async stage API expects the caller to remain available long term in order to receive stage completion messages. However, the GridFTP/DSI use case is a short-lived, transient environment; the GridFTP process cannot wait minutes, hours or days for stage completion messages. Users of DSI versions 2.6+ will see the impact as a red-ball-of-doom indicator in the HPSS GUI console. The warning is innocuous and can be ignored. IBM is aware of this issue and a change request has been created.
As a workaround, users of 2.6 should update to 2.7, and all users of 2.7+ can use the blackhole sync method. This configures nc (netcat) to listen for stage completion messages intended for the DSI and discard whatever it receives. nc should be launched on a highly available server reachable by the HPSS core servers (preferably directly on the core servers). Choose a port on which to receive callback notifications and run this command:
admin@hpss-core $ nc -v -v -k -l <port>
Once nc is running, add this to /etc/gridftp.d/hpss_issue_35 on the GridFTP nodes running the HPSS DSI:
$ASYNC_CALLBACK_ADDR <host>:<port>
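If you prefer to generate this file, the sketch below writes the same $ASYNC_CALLBACK_ADDR line after a basic port range check. write_async_callback is a hypothetical helper. The host and port must be the ones on which nc is listening.

```shell
#!/usr/bin/env bash
# write_async_callback DIR HOST PORT  (hypothetical helper, sketch only)
# Writes DIR/hpss_issue_35 (normally DIR=/etc/gridftp.d) on a GridFTP node.
write_async_callback() {
    local dir=$1 host=$2 port=$3
    if ! [[ $port =~ ^[0-9]+$ ]] || (( 10#$port < 1 || 10#$port > 65535 )); then
        echo "invalid port: $port" >&2
        return 1
    fi
    [ -n "$host" ] || { echo "missing host" >&2; return 1; }
    # Single quotes keep $ASYNC_CALLBACK_ADDR literal in the written file.
    printf '$ASYNC_CALLBACK_ADDR %s:%s\n' "$host" "$port" > "$dir/hpss_issue_35"
}
```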
Login Failure: No such file or directory
This error message indicates that hpss_LoadDefaultThreadState() has returned ENOENT, causing the login procedure to fail. This occurs when the UID of the authenticating user as known to the GridFTP process does not match the user's ID as known by HPSS. See Configure Local User Accounts.
Command Failed: Error (login) Endpoint: xxxx Server: xxxx Message: Login Failed --- Details: 530-Login incorrect. : GlobusError: v=1 c=PATH_NOT_FOUND\r\n530-GridFTP-Errno: 2\r\n530-GridFTP-Reason: System error in hpss_LoadDefaultThreadState()\r\n530-GridFTP-Error-String: No such file or directory\r\n530 End.\r\n
Login Failure: Operation not permitted
This error message indicates that hpss_SetLoginCred() failed with EPERM during the login procedure. This step in the login process accesses the keytab defined in AuthenticationMech so that the DSI can connect to HPSS as user LoginName. The error value indicates that the GridFTP process was unable to access the keytab file. See Configure HPSS Credentials.
Command Failed: Error (login) Endpoint: xxxx Server: xxxx Message: Login Failed --- Details: 530-Login incorrect. : GlobusError: v=1 c=INTERNAL_ERROR\r\n530-GridFTP-Errno: 1\r\n530-GridFTP-Reason: System error in hpss_SetLoginCred()\r\n530-GridFTP-Error-String: Operation not permitted\r\n530 End.\r\n
Login Failure: Invalid argument
If you receive this message, it is likely that /var/hpss/etc/site.conf is invalid.
Error (login) Endpoint: XXX Server: XXX Message: Login Failed --- Details: 530-Login incorrect. : GlobusError: v=1 c=INTERNAL_ERROR\r\n530-GridFTP-Errno: 22\r\n530-GridFTP-Reason: System error in hpss_LoadDefaultThreadState()\r\n530-GridFTP-Error-String: Invalid argument\r\n530 End.\r\n
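The three login failures above can be told apart by the function named in GridFTP-Reason and the value of GridFTP-Errno. The hypothetical triage_login_error helper below encodes only those three cases from this appendix and points to the section to check first. Any other message is reported as unrecognized.

```shell
#!/usr/bin/env bash
# triage_login_error MESSAGE  (hypothetical helper, sketch only)
# Recognizes only the three login failures documented in this appendix.
triage_login_error() {
    local msg=$1 errno='' reason=''
    local errno_re='GridFTP-Errno: ([0-9]+)'
    local reason_re='System error in ([A-Za-z_]+)\(\)'
    if [[ $msg =~ $errno_re ]]; then errno=${BASH_REMATCH[1]}; fi
    if [[ $msg =~ $reason_re ]]; then reason=${BASH_REMATCH[1]}; fi

    case "$reason:$errno" in
        hpss_LoadDefaultThreadState:2)
            echo "ENOENT: UID mismatch, see Configure Local User Accounts" ;;
        hpss_SetLoginCred:1)
            echo "EPERM: keytab not readable, see Configure HPSS Credentials" ;;
        hpss_LoadDefaultThreadState:22)
            echo "EINVAL: check /var/hpss/etc/site.conf" ;;
        *)
            echo "unrecognized login error"; return 1 ;;
    esac
}
```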
Transfer Error: Operation timed out
Large file transfers to/from HPSS tend to span multiple sets of HPSS mover processes. Each set is responsible for a large contiguous chunk of the file transfer. First set transfers offsets 0-N, second set transfers (N+1)-M, and so on. These mover sets are all initialized at the beginning of the transfer.
Any mover will time out after MVR_CLIENT_TIMEOUT seconds (15 minutes by default). If a mover set does not start the transfer within this timeout, the entire transfer aborts. This is an HPSS issue, not a DSI issue.
This error condition is usually recognizable from the following errors, which are issued at intervals of MVR_CLIENT_TIMEOUT plus 5 minutes. See MVR_CLIENT_TIMEOUT for more details.
2019-06-12 14:29:39 Error (transfer) Endpoint: XXXX HPSS Archive (e38ee901-6d04-11e5-ba46-22000b92c6ec) Server: XXXX:2811 Command: STOR ~/scratch_backups/XXXX Message: The operation timed out --- Details: Timeout waiting for response
2019-06-12 14:49:47 Error (transfer) Endpoint: XXXX HPSS Archive (e38ee901-6d04-11e5-ba46-22000b92c6ec) Server: XXXX:2811 File: /~/scratch_backups/XXXX Command: STOR ~/scratch_backups/XXX Message: Fatal FTP response --- Details: 451-GlobusError: v=1 c=INTERNAL_ERROR\r\n451-GridFTP-Errno: 5011\r\n451-GridFTP-Reason: System error in hpss_PIOExecute\r\n451-GridFTP-Error-String: \r\n451 End.\r\n
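In the example above, the two errors are 20 minutes and 8 seconds apart. That is consistent with the default MVR_CLIENT_TIMEOUT of 15 minutes plus 5 minutes. The sketch below performs the same check on your own log timestamps using GNU date -d. seconds_between and looks_like_mover_timeout are hypothetical helpers, and the 60-second tolerance is an arbitrary choice.

```shell
#!/usr/bin/env bash
# seconds_between START END  (hypothetical helper, sketch only)
# Parses "YYYY-MM-DD HH:MM:SS" timestamps with GNU `date -d`.
seconds_between() {
    local start end
    start=$(date -d "$1" +%s) || return 1
    end=$(date -d "$2" +%s) || return 1
    echo $(( end - start ))
}

# looks_like_mover_timeout GAP [TIMEOUT_SECONDS] [SLACK_SECONDS]
# Succeeds when GAP is within SLACK of TIMEOUT + 300 (5 minutes).
looks_like_mover_timeout() {
    local gap=$1 timeout=${2:-900} slack=${3:-60}
    local expected=$(( timeout + 300 ))
    (( gap >= expected - slack && gap <= expected + slack ))
}
```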
Appendix B: Performance
GridFTP installations benefit from, and take full advantage of, classes of service that use fixed-length, classic-style allocation. In short, you'll get the best performance from the GridFTP interface (in fact, from any HPSS interface) if the segment count is below 32.
HPSS has multiple disk/tape allocation algorithms used to allocate space for incoming data. Fixed-length allocation gives you equal-size chunks in which to store data. This was deemed wasteful because the last block was almost never completely filled. Variable-length allocation was created to solve this problem; it gives you increasingly larger segments as data is stored and truncates the last block. This works well in most situations where HPSS does not know in advance how much data will be stored for a given file.
Using either of these allocation mechanisms (any variable-length allocation, or fixed-length allocation without knowing the file size), HPSS is free to keep allocating segments until all the data is stored. This has a definite performance impact because HPSS internally retrieves data in 32-segment chunks. When you request a file from HPSS, it internally breaks the request into multiple transfers, each of which is at most 32 segments. Functionally, this is transparent to the client. In terms of performance, the client sees a high load, then a pause, then a high load, and so on.
To avoid this performance hit, you can use fixed-length allocation with segment counts below 32 and take advantage of the fact that any WELL-BEHAVED GridFTP client informs HPSS of the size of the incoming file before the transfer begins. In fact, the DSI is designed to require this. If a GridFTP client is not well behaved, the DSI acts as though a zero-length transfer is about to occur and handles it as such, so you will know if the client is not doing the right thing.
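The segment arithmetic above can be sketched as a small calculation. A file of size S stored in fixed-length segments of size L needs ceil(S / L) segments, and at most 32 segments are retrieved per internal pass. segment_plan is a hypothetical helper, and the sizes in the example below are illustrative, not recommended class of service settings.

```shell
#!/usr/bin/env bash
# segment_plan FILE_BYTES SEGMENT_BYTES  (hypothetical helper, sketch only)
# Segments needed under fixed-length allocation, and the number of internal
# retrieval passes if HPSS reads at most 32 segments per pass.
segment_plan() {
    local size=$1 seg=$2
    (( size > 0 && seg > 0 )) || { echo "sizes must be positive" >&2; return 1; }
    local segments=$(( (size + seg - 1) / seg ))   # ceiling division
    local passes=$(( (segments + 31) / 32 ))       # 32 segments per pass
    echo "segments=$segments passes=$passes"
}
```

For example, a 100 GiB file in 4 GiB segments needs 25 segments and a single pass. A 1000 GiB file with the same segment size needs 250 segments and 8 passes.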