GCSv5.4 Troubleshooting Guide
- 1. Introduction
- 2. Troubleshooting Firewall Issues
- 3. Collection Access Issues
- 3.1. Troubleshooting Identity Mapping Related Mapped Collection Access Issues
- 3.2. Troubleshooting Local Account Related Mapped Collection Access Issues
- 3.3. Troubleshooting Policy Based Mapped Collection Access Issues
- 3.4. Troubleshooting Policy and Globus Permission Based Guest Collection Access Issues
- 3.5. Troubleshooting Storage System Permission Related Mapped Collection Access Issues
- 4. Troubleshooting Certificate Issues
- Appendix A: How To Find a Globus User’s Identity Set
- Appendix B: How to Find Collection and Storage Gateway Details
- Appendix C: Globus Logging Locations
- Appendix D: Obtaining Debug Log Events
- Appendix E: Troubleshooting GCS Node Configurations
1. Introduction
This document will discuss methods to troubleshoot common issues for GCSv5.4 based endpoints.
2. Troubleshooting Firewall Issues
Firewall issues can be a source of problems for endpoint admins and users. Below we’ll discuss some of the most common firewall related issues.
2.1. Troubleshooting GCS Manager Related Firewall Issues
The GCS Manager provides the interface for your endpoint that the Globus service uses to communicate with and manage it. If the GCS Manager on your endpoint is not accessible, you will encounter problems when attempting to use your endpoint.
2.1.1. Common Errors
Below are some common errors that may indicate the GCS Manager for the involved endpoint is not accessible.
Example Timeout Error
You may see a timeout error when attempting to browse a collection hosted on your endpoint in the Globus file manager. This sort of error can also occur when you attempt to access a directory with a very large number of files and sub-directories.
- If you see this sort of error for all paths you attempt to browse for all collections on an endpoint, then you should suspect an issue with the GCS Manager.
- If you only see this issue when browsing particular paths, then the path in question likely has too many files and sub-directories to be listed in a timely manner. In such a case, rearranging the directory so that it has fewer entries should resolve the issue.
Example Unexpected Error
You may see an unexpected error when attempting to modify the properties of your endpoint, the subscription status of your endpoint, or the properties of a collection hosted on your endpoint in the Globus Web App. In such a context, this error is usually a sign that the GCS Manager on the endpoint is not accessible.
2.1.2. Troubleshooting Steps
To check if the GCS Manager is properly accessible on a given node in your endpoint, we’ll first want to check that the GCS Manager Service and Apache Service are both up and running. You can easily check this by looking at the outputs of the 'globus-connect-server self-diagnostic' command.
- If you find that either of these services is down, then the GCS Manager will not be accessible.
- If they’re down, you may simply be able to start the services with the 'systemctl start' command.
- If you find that they won’t properly start, then you’ll likely want to contact Globus Support for further assistance.
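As a minimal sketch of the service check described above, the function below reports the state of each systemd unit passed to it. The function name and the unit names in the usage comment (gcs_manager, apache2) are assumptions rather than values taken from this guide; the Apache unit is typically named httpd on RHEL-family systems, so confirm the names on your node first.

```shell
#!/usr/bin/env bash
# Report whether each systemd unit passed as an argument is active.
# Returns non-zero if any unit is not active.
check_units() {
    local unit state rc=0
    for unit in "$@"; do
        # "systemctl is-active" prints the unit state (active, inactive, failed, ...)
        state="$(systemctl is-active "$unit" 2>/dev/null || true)"
        [ -n "$state" ] || state="unknown"
        echo "${unit}: ${state}"
        [ "$state" = "active" ] || rc=1
    done
    return "$rc"
}

# Example usage (unit names are assumptions; adjust for your distribution):
#   check_units gcs_manager apache2 || echo "try: systemctl start <unit>"
```

The 'globus-connect-server self-diagnostic' command remains the primary check; a helper like this is only a quick way to see which unit to start.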
Assuming that the needed services are up and running, we’ll next want to see what public DNS shows for the FQDN of your endpoint’s GCS Manager. We’ll start by first looking up the GCS Manager URL by running the following command on one of the nodes in your endpoint:
$ globus-connect-server endpoint show
Display Name: ABC University Endpoint
ID: 00000000-1111-2222-3333-444444444444
Subscription ID: 01234567-89ab-cdef-0123-456789abcdef
Public: True
GCS Manager URL: https://a1b2c.9f8e.data.globus.org
Network Use: normal
Organization: ABC University
We’ll now use 'dig' to check the IP addresses associated with the GCS Manager’s FQDN in DNS like so:
$ dig +short a1b2c.9f8e.data.globus.org
198.51.100.2
198.51.100.3
You should see the public IP address of each node in your endpoint returned.
- If you find that the public IP address of the node in question isn’t returned, then that is a sign that the node wasn’t properly deployed.
- If you find that a private IP address for a node is returned, then this is also a sign that the node wasn’t properly deployed.
- If this endpoint consists of only a single node and you don’t see the expected IP address, then review our documentation on how to deploy the first node in an endpoint.
- If this endpoint consists of multiple nodes and you don’t see the expected IP addresses, then review our documentation on how to deploy additional nodes beyond the first node in an endpoint.
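The DNS checks above can be sketched as a small filter over the 'dig +short' output. This is an illustration only: the function names are hypothetical, the private-address test covers just the IPv4 RFC 1918 ranges, and the expected public IP addresses are supplied by you as arguments.

```shell
#!/usr/bin/env bash
# Succeeds if the IPv4 address is in an RFC 1918 private range.
is_rfc1918() {
    case "$1" in
        10.*|192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *) return 1 ;;
    esac
}

# Reads "dig +short" output on stdin; arguments are the expected public IPs.
# Prints PRIVATE for private addresses found in DNS, MISSING for expected
# addresses that were not returned, and OK when neither problem is seen.
check_dns_records() {
    local -a returned=()
    local line ip r found status=0
    while IFS= read -r line; do
        if [ -n "$line" ]; then returned+=("$line"); fi
    done
    for ip in "${returned[@]}"; do
        if is_rfc1918 "$ip"; then echo "PRIVATE $ip"; status=1; fi
    done
    for ip in "$@"; do
        found=no
        for r in "${returned[@]}"; do
            if [ "$r" = "$ip" ]; then found=yes; fi
        done
        if [ "$found" = no ]; then echo "MISSING $ip"; status=1; fi
    done
    if [ "$status" -eq 0 ]; then echo "OK"; fi
    return "$status"
}

# Example usage:
#   dig +short a1b2c.9f8e.data.globus.org | check_dns_records 198.51.100.2 198.51.100.3
```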
GCS Manager Connectivity Testing Process
After checking to make sure that your node’s IP address is properly listed in the DNS record for your GCS Manager’s FQDN, we’ll want to attempt to connect to the GCS Manager at that IP address from various different locations to test that it is accessible from those locations.
You’ll want to run these tests in the order presented, as each subsequent test assumes the previous tests were successful, and the suggested diagnoses are not necessarily accurate if the tests are run out of order. For the examples given below, our scenario assumes we want to test access to the GCS Manager with an FQDN of "a1b2c.9f8e.data.globus.org" on the 198.51.100.2 system. You will of course want to alter these commands to suit your own tests on your own systems. A successful connection attempt should produce a connection to the GCS Manager service that pulls down a JSON document showing various bits of information about the endpoint. If you encounter a failure at any step, you’ll need to resolve it before moving on to the next step.
The commands used for the 'curl' tests will vary depending on which system we’re running the test from.
- The following commands will be run from the terminal on the 198.51.100.2 system:
$ curl -vk --resolve a1b2c.9f8e.data.globus.org:443:127.0.0.1 https://a1b2c.9f8e.data.globus.org/api/info
$ curl -vk --resolve a1b2c.9f8e.data.globus.org:443:198.51.100.2 https://a1b2c.9f8e.data.globus.org/api/info
- The following command will be run from the terminal on all other systems:
$ curl -vk --resolve a1b2c.9f8e.data.globus.org:443:198.51.100.2 https://a1b2c.9f8e.data.globus.org/api/info
1) Run the 'curl' test on the 198.51.100.2 system. Please remember that the 'curl' test in this case consists of two commands.
- A "timeout" error or a "no route to host" error for the first command suggests a host firewall issue related to policy for self connections directed at loopback.
- A success with the first command, but a "timeout" error or a "no route to host" error for the second command suggests a host firewall issue related to policy for self connections directed at the public IP address or possibly a networking issue.
- A successful https connection to the GCS Manager address that doesn’t produce the expected output suggests a problem with the GCS Manager itself.
2) Run the 'curl' test on a second host on the same network segment as the 198.51.100.2 host. To be clear, there should be no network firewall between the second host and the 198.51.100.2 host.
- A "timeout" error or a "no route to host" error here suggests a host firewall issue related to policy for inbound connections from other hosts.
3) Run the 'curl' test on a third host on your campus network, but on a different subnetwork than the 198.51.100.2 host. For example, if the 198.51.100.2 host is in a DMZ, then the third host should be outside of the DMZ. Both hosts should still be behind the campus border firewall.
- A "timeout" error or a "no route to host" error here suggests an issue related to an internal network firewall.
4) Run the 'curl' test on a fourth host that is not on your campus network. The fourth host should be outside of your campus border firewall.
- A "timeout" error or a "no route to host" error here suggests an issue related to the campus border firewall.
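The four steps above can be sketched as a pair of helper functions: one that runs the 'curl' probe, and one that maps the step and the 'curl' exit status to the diagnosis suggested in this section. The function names and the step labels (1a for loopback, 1b for the node's own public IP) are hypothetical, and the 15 second limit is an arbitrary choice. The sketch relies on curl's documented exit statuses 7 (failed to connect, which includes "no route to host") and 28 (operation timed out).

```shell
#!/usr/bin/env bash
# Probe the GCS Manager at a specific IP address, bypassing DNS with --resolve.
probe_gcs_manager() {
    local fqdn="$1" ip="$2"
    curl -sk --max-time 15 --resolve "${fqdn}:443:${ip}" "https://${fqdn}/api/info"
}

# Map a test step and a curl exit status to the diagnosis suggested above.
# Steps: 1a (loopback), 1b (own public IP), 2, 3, 4.
diagnose_probe() {
    local step="$1" rc="$2"
    if [ "$rc" -eq 0 ]; then
        echo "connected: check that the JSON output looks as expected"
        return 0
    fi
    if [ "$rc" -ne 7 ] && [ "$rc" -ne 28 ]; then
        echo "curl exit ${rc}: not a timeout or connection failure"
        return 0
    fi
    case "$step" in
        1a) echo "host firewall: self connections to loopback" ;;
        1b) echo "host firewall: self connections to the public IP, or a networking issue" ;;
        2)  echo "host firewall: inbound connections from other hosts" ;;
        3)  echo "internal network firewall" ;;
        4)  echo "campus border firewall" ;;
        *)  echo "unknown step: ${step}"; return 2 ;;
    esac
}

# Example usage for step 2, run from a second host on the same network segment:
#   probe_gcs_manager a1b2c.9f8e.data.globus.org 198.51.100.2; diagnose_probe 2 "$?"
```

As with the manual tests, the mapping is only meaningful when the steps are run in order.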
If you are able to successfully complete the above troubleshooting steps but still find that you’re having problems related to your GCS Manager service, then you’ll likely want to open a ticket with Globus Support to look into the matter further.
2.2. Troubleshooting Data Channel Related Firewall Issues
During a transfer, data is moved between endpoints using Data Channel connections. If there are problems establishing these Data Channel connections between endpoints, then transfers will not work correctly.
2.2.1. Common Errors
You will most likely become aware of Data Channel issues with your endpoint after you or your users notice that transfers to or from your endpoint appear to fail. You can see the details for your transfers by going to the Activity page in the Globus Web App. When looking at the Event Log tab for a job that involves an endpoint that has Data Channel connectivity issues, you’ll see fault events that will clue you in to the problem. We’ll discuss these events generally below.
A fault event will look something like this:
Error (transfer)
Endpoint: XYZ University Endpoint
Server: dtn-hostname.xyzu.edu:2811
File: (Varies)
Command: (Varies)
Message: Data channel authentication failed
(This is common for data channel issues, but the 'Message' value may be different)
Details: (See below)
The above is telling us that it is the "dtn-hostname.xyzu.edu" node in the "XYZ University Endpoint" endpoint that is reporting the problem. Faults involving Data Channel issues will often have "Data channel authentication failed" as the fault "Message" value - but this is not always the case. The "Details" field for the fault event will contain a more complete explanation of the nature of the fault event. There are many possible variations on the values that could be seen for this field for a fault event related to data channel issues.
We’ll go over a few representative examples to give admins a better idea of what this field might be telling them. All "Details" field examples are given within the context of being part of a fault event with the other fault field values set as shown above.
Example Data Channel Error A - Connection Reset By Peer
Details: 500-Command failed. : globus_xio: The GSI XIO driver failed to
establish a secure connection. The failure occurred during a handshake
read.\r\n500-globus_xio: System error in recv: Connection reset by peer
\r\n500-globus_xio: A system call failed: Connection reset by peer
\r\n500 End.\r\n
Example Data Channel Error B - An Existing Connection Was Forcibly Closed By The Remote Host
Details: 500-Command failed. : globus_xio: The GSI XIO driver failed to
establish a secure connection. The failure occurred during a handshake
read.\r\n500-globus_xio: System error in recv: Unknown error\r\n500-globus_xio:
A system call failed: An existing connection was forcibly closed by the remote host.
\r\r\n500-\r\n500 End.\r\n
The examples above are telling us that the endpoint reporting the problem encountered an issue when attempting to negotiate a data channel session with the remote endpoint. A "Details" field value like this often means that the session negotiation process was able to start, but something interfered with that process. Firewall policy is often the root cause of such issues - especially policy that selectively filters ssl/tls traffic.
Example Data Channel Error C - Connection Timed Out
Details: 500-Command failed. : globus_gridftp_server_file.c:globus_l_gfs_file_server_write_cb:3163:
\r\n500-callback failed.\r\n500-globus_xio_tcp_driver.c:globus_l_xio_tcp_system_connect_cb:2022:
\r\n500-Unable to connect to 198.51.100.10:50672
\r\n500-globus_xio_system_select.c:
globus_l_xio_system_handle_write:1108:\r\n500-System error in connect: Connection timed out
\r\n500-globus_xio: A system call failed: Connection timed out
\r\n500 End.\r\n
Example Data Channel Error D - Connection Timed Out
Details: 500-Command failed. : globus_xio: The GSI XIO driver failed to establish a
connection via the underlying protocol.\r\n500-globus_xio: Unable to connect to 198.51.100.10:50778
\r\n500-globus_xio: System error in connect: Connection timed out
\r\n500-globus_xio:
A system call failed: Connection timed out
\r\n500 End.\r\n
The examples above also tell us that the endpoint reporting the problem encountered an issue when attempting to negotiate a data channel session with the remote endpoint. Here, however, a "Details" field value like this often means that the session negotiation process could not even start. This sort of issue is often due to firewall policy blocking traffic in the data port range.
Example Data Channel Error E - Could Not Verify Credential
Details: 500-Command failed. : an authentication operation failed
\r\n500-globus_xio_gsi: gss_init_sec_context failed.\r\n500-GSS
failure: \r\n500-GSS Major Status: Authentication Failed\r\n
500-GSS Minor Status Error Chain:\r\n500-globus_gsi_gssapi: SSL
handshake problems\r\n500-OpenSSL Error: ssl/statem/statem_clnt.c:1914:
in library: SSL routines, function tls_process_server_certificate:
certificate verify failed\r\n500-globus_gsi_callback_module:
Could not verify credential
\r\n500-globus_gsi_callback_module:
Can't get the local trusted CA certificate: Untrusted self-signed certificate in chain
with hash d4c3b2a1\r\n500-\r\n500 End.\r\n
This sort of error tells us that the endpoint doesn’t trust the cert being offered for the data channel connection. This generally only happens if there is something interfering with the establishment of the data channel session between the two endpoints involved in the transfer. Data channel traffic looks similar to https traffic in some ways, so firewall or network policy designed to limit or monitor such traffic can interfere with the establishment of data channel sessions between endpoints.
We sometimes see these sorts of errors for endpoints located behind https intercept proxies or similar devices. Globus data channel traffic cannot be proxied in this way. Sites that do operate with policy designed to intercept https/ssl traffic will need to configure exceptions for Globus data channel traffic for endpoints operating on their network.
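As a rough illustration of how the "Details" examples in this section sort into the three broad causes discussed, the hypothetical helper below matches on the key phrases from Examples A through E. It is a triage aid only; the phrases come from the examples above and real fault events may word things differently.

```shell
#!/usr/bin/env bash
# Classify a fault event "Details" string into the broad causes discussed above.
classify_details() {
    local details="$1"
    case "$details" in
        *"certificate verify failed"*|*"Could not verify credential"*)
            echo "untrusted-cert: check for an https intercept proxy or similar device" ;;
        *"Connection reset by peer"*|*"forcibly closed by the remote host"*)
            echo "handshake-interrupted: check for policy that filters ssl/tls traffic" ;;
        *"Connection timed out"*)
            echo "connect-timeout: check for policy blocking the data port range" ;;
        *)
            echo "unrecognized: read the full Details text" ;;
    esac
}

# Example usage:
#   classify_details "500-globus_xio: System error in connect: Connection timed out"
```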
2.2.2. Troubleshooting Steps
If you have not already done so, you’ll want to read our doc discussing the basics of Data Channel traffic.
When troubleshooting Data Channel issues it’s important to remember all transfers involve a source endpoint and a destination endpoint and that the factors causing the issues could be located at the site of the source endpoint, the destination endpoint, or possibly even both. You’ll want to first verify that firewall policy at your own site is consistent with the requirements for GCSv5.4 as given in our doc.
Data Channel traffic looks very similar to ssl/tls traffic in many ways, so you can test that firewall policy at your site is configured to permit Data Channel traffic to and from your endpoint using tools that allow you to set up ssl/tls sessions. A convenient way to do this is to use the 'openssl' utility.
We’ll want to create a cert and key pair that we can use for the tests we do with the 'openssl' utility. We’ll then use that cert and key pair in our commands to create a simple 'openssl' listener and a simple 'openssl' client that will connect to the listener. It is important that both the 'openssl' listener and client use this same cert and key pair or the tests discussed will not work correctly.
A successful connection attempt from the client to the listener will generate debug outputs on both sides showing a successful ssl session negotiation. Users at the terminal on both sides can communicate back and forth by simulating a sort of crude text chat via typing messages into the terminal and pressing the ENTER key, so long as the connection was successful and remains in place. By checking to ensure that the 'openssl' client is able to connect to the 'openssl' listener, and verifying that the client and listener are able to 'talk' back and forth to each other, we can determine if data channel traffic appears to be blocked and can also get an idea as to where it might be blocked.
In addition to ensuring that communication is possible over the connection between the 'openssl' listener and client, you’ll also want to verify that the cert offered by the listener in the connection test actually matches what the listener is expected to offer. When the session is initiated between the client and listener, the outputs in the terminal of the client will show the cert for the listener that the client was presented with for the connection attempt and will also show the verification status for that cert. The operator of the client will want to ensure that they see the line "Verify return code: 0 (ok)" for the cert offered by the listener.
If any return code other than '0 (ok)' is shown in the client outputs for the cert offered by the listener, then the operator of the client will want to compare the cert offered by the listener in the outputs to the local copy of the cert that was initially created for use in the testing. The reason to verify this manually is that some sites have devices (https intercept proxy or similar) that replace certs for ssl sessions so that the device can monitor (intercept) the connection. This sort of behavior will interfere with data channel traffic, so we’ll want to catch this if it’s happening.
Data Channel Connectivity Testing Process
We’ll use a few 'openssl' commands to create our cert/key pair and to create our 'openssl' listeners and clients that we’ll use for the testing.
The cert and key pair we’ll use can be created using the 'openssl' utility. The cert/key pair will be generated only once and that same cert/key pair will then need to be copied to each system involved in the testing.
openssl req -x509 -newkey rsa:2048 -nodes -sha256 -days 7 -subj "/C=US/O=Globus Online/CN=FXP DCAU Cert" -keyout key.pem -out cert.pem
A simple ssl listener can be created using the 'openssl' command.
openssl s_server -tls1_2 -port 50500 -key key.pem -cert cert.pem -CAfile cert.pem
The 'openssl' utility can be used to connect to such a listener as a client.
openssl s_client -tls1_2 -connect 198.51.100.20:50500 -key key.pem -cert cert.pem -CAfile cert.pem
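One way to compare the cert offered by the listener with the local copy, as described earlier in this section, is to compare SHA-256 fingerprints rather than reading the PEM text by eye. The sketch below is an assumption about how you might script that check, not part of the documented procedure; the function names are hypothetical, and it assumes the listener was started with the 's_server' command shown above.

```shell
#!/usr/bin/env bash
# SHA-256 fingerprint of a local PEM certificate file.
local_fingerprint() {
    openssl x509 -in "$1" -noout -fingerprint -sha256
}

# SHA-256 fingerprint of the cert presented by a listener at host:port.
# Redirecting stdin from /dev/null makes s_client exit after the handshake.
remote_fingerprint() {
    openssl s_client -tls1_2 -connect "$1" </dev/null 2>/dev/null \
        | openssl x509 -noout -fingerprint -sha256
}

# Compare two fingerprint strings; an empty value counts as a mismatch.
compare_fingerprints() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "MISMATCH: the cert may have been replaced in transit"
        return 1
    fi
}

# Example usage, run on the client host:
#   compare_fingerprints "$(local_fingerprint cert.pem)" \
#                        "$(remote_fingerprint 198.51.100.20:50500)"
```

A mismatch here, with a listener you control, is consistent with an intercepting device sitting between the two hosts.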
To troubleshoot suspected issues with inbound Data Channel connections to your endpoint, you’ll want to follow the steps below. You’ll want to perform these steps in the order presented, as each subsequent step assumes the previous steps were successful, and the suggested diagnoses are not necessarily accurate if the steps are run out of order. If you encounter a failure at any step, you’ll need to resolve it before moving on to subsequent steps.
1) Set up an 'openssl' listener on the system hosting your endpoint bound to a port in the data port range. Attempt to connect to that 'openssl' listener with an 'openssl' client on the same host via both loopback and your system’s public IP address.
- A failure with the connection to loopback suggests a host firewall policy issue related to self connections directed at loopback.
- A failure with the connection to the system’s public IP address suggests either a host firewall issue related to self connections directed at the system’s public IP address or possibly a networking issue.
2) Set up an 'openssl' listener on the system hosting your endpoint as discussed previously. Attempt to connect to that 'openssl' listener with an 'openssl' client on a second host on the same network segment as the system hosting the endpoint. To be clear, there should be no network firewall between these hosts.
- A failure here suggests an issue with host firewall policy related to inbound connections from other hosts.
3) Set up an 'openssl' listener on the system hosting your endpoint as discussed previously. Attempt to connect to that 'openssl' listener with an 'openssl' client on a third host on your campus network, but on a different subnetwork than the system hosting the endpoint. For example, if the system hosting the endpoint is in a DMZ, then the third host should be outside of the DMZ. Both hosts should still be behind the campus border firewall.
- A failure here suggests an issue related to an internal network firewall.
4) Set up an 'openssl' listener on the system hosting your endpoint as discussed previously. Attempt to connect to that 'openssl' listener with an 'openssl' client on a fourth host that is not on your campus network. The fourth host should be outside of your campus border firewall.
- A failure here suggests an issue related to the campus border firewall.
To troubleshoot suspected issues with outbound Data Channel connections from your endpoint you’ll use the same process described above for troubleshooting issues with inbound Data Channel traffic, except you’ll swap the locations where the 'openssl' listener and 'openssl' client are located.
- If an admin at site A has gone through the above steps and found that their endpoint seems to pass the tests for both inbound and outbound data channel connections, but there still appears to be a data channel issue with transfers involving endpoints at some other site B, the next step is to reach out to the admins at site B and request that they verify data channel connectivity for the systems hosting their endpoints in the same manner.
- If the site B admins report success in such verification, but data channel issues for transfers between endpoints at site A and site B persist, the next step is to directly test data channel connectivity between the endpoints involved. This is done by setting up an 'openssl' listener on one endpoint and attempting to connect to it with an 'openssl' client on the other endpoint. A proper test involves each node in each endpoint running as the 'openssl' client and also separately as the 'openssl' listener, so that the ability to establish data channel sessions in both directions (from site A to site B, as well as from site B to site A) is properly tested.
- If admins at both site A and site B find that they are not able to properly establish ssl sessions in one (or both) directions between the systems hosting their endpoints, then they will need to reach out to their networking teams for further assistance in discovering why this is so.
If you are able to successfully complete the above troubleshooting steps but still find that you’re having problems related to Data Channel traffic on your endpoint, then you’ll likely want to open a ticket with Globus Support to look into the matter further.
3. Collection Access Issues
For a Globus user to be able to access any collection, all applicable permissions, mappings, and policies must permit that access.
3.1. Troubleshooting Identity Mapping Related Mapped Collection Access Issues
For a Globus user to be able to access a mapped collection, they must have an identity in their Globus identity set which matches an allowed domain and maps to a valid user. The domains allowed are determined by policy set on the storage gateway, as is the mapping of the Globus user’s identity.
3.1.1. Common Errors
Below are some common errors that indicate a Globus user’s identity cannot be properly mapped on the storage gateway backing the collection they are trying to access.
Example Not From Allowed Domain Error
Command Failed: Error (login)
Endpoint: XYZ University Mapped Collection
Server: dtn-hostname.xyzu.edu:443
Message: Login Failed
---
Details: 530-Login incorrect. : GlobusError: v=1 c=LOGIN_DENIED
530-GridFTP-Message: None of your identities are from domains allowed by resource policies
530-GridFTP-JSON-Result: {
"DATA_TYPE": "result#1.0.0",
"code": "permission_denied",
"detail": {
"DATA_TYPE": "not_from_allowed_domain#1.0.0",
"allowed_domains": [
"xyzu.edu"
]
},
"has_next_page": false,
"http_response_code": 403,
"message": "None of your identities are from domains allowed by resource policies"
}
530 End.
This error is telling us that the Globus user attempting to access the "XYZ University Mapped Collection" doesn’t have an identity in their Globus identity set from the required "xyzu.edu" identity domain that the storage gateway backing the mapped collection is configured to permit.
Example Invalid User Error
Command Failed: Error (login)
Endpoint: XYZ University Mapped Collection
Server: dtn-hostname.xyzu.edu:443
Message: Login Failed
---
Details: 530-Login incorrect. : GlobusError: v=1 c=LOGIN_DENIED
530-GridFTP-Message: Identity set contains an identity from an allowed domain,
but it does not map to a valid username for this connector
530-GridFTP-JSON-Result: {
"DATA_TYPE": "result#1.0.0",
"code": "permission_denied",
"detail": {
"DATA_TYPE": "invalid_user#1.0.0"
},
"has_next_page": false,
"http_response_code": 403,
"message": "Identity set contains an identity from an allowed domain, but it does not map to a valid username for this connector"
}
530 End.
This error is telling us that the Globus user attempting to access the "XYZ University Mapped Collection" has an identity from an identity domain allowed by the storage gateway backing the collection, but that this identity cannot be mapped by the storage gateway to a valid user on the storage system that the storage gateway is configured to use.
3.1.2. Troubleshooting Steps
You’ll want to check the identity set of the user reporting the error and verify that it contains an identity from one of the identity domains the storage gateway backing the collection is configured to allow by checking the collection and storage gateway details. If the user’s identity set doesn’t contain an identity from an allowed identity domain, then that will have to be addressed before the user will be able to access the collection.
Once you’ve checked the user’s identity set and verified that it contains an identity from an identity domain allowed by the storage gateway, you’ll want to look up the collection and storage gateway details to see if the storage gateway is using a custom identity mapping policy or a default identity mapping policy. If the storage gateway is using a default identity mapping policy, then you’ll want to be sure that the user’s Globus identity maps to a valid user on the storage system backing the storage gateway per the default mapping policy for that storage gateway type. If the storage gateway is not using a default identity mapping policy, then you’ll need to look at the custom mapping policy being used by the storage gateway and ensure that it properly maps the user’s Globus identity to a valid user on the storage system backing the storage gateway. Our Identity Mapping Guide offers additional guidance on how to interpret and create custom identity mapping policy.
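The first check above, whether any identity in the user's identity set comes from an allowed domain, can be sketched as follows. The function name is hypothetical, and the sketch assumes identity usernames of the form user@domain and an exact, case-insensitive domain match; the matching rules actually applied by the storage gateway may differ, so treat a result here as a hint rather than a verdict.

```shell
#!/usr/bin/env bash
# Succeeds if the domain part of an identity username (user@domain)
# exactly matches one of the allowed domains, ignoring case.
# $1 = identity username, remaining arguments = allowed domains.
identity_in_allowed_domains() {
    local identity="$1" domain allowed
    shift
    case "$identity" in
        *@*) ;;
        *) return 1 ;;
    esac
    domain="$(printf '%s' "${identity##*@}" | tr '[:upper:]' '[:lower:]')"
    for allowed in "$@"; do
        allowed="$(printf '%s' "$allowed" | tr '[:upper:]' '[:lower:]')"
        if [ "$domain" = "$allowed" ]; then
            return 0
        fi
    done
    return 1
}

# Example usage, with the allowed_domains value from the error shown earlier:
#   identity_in_allowed_domains "alice@xyzu.edu" "xyzu.edu" && echo "domain allowed"
```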
3.2. Troubleshooting Local Account Related Mapped Collection Access Issues
The storage system user to which a Globus user’s permitted identity maps must be valid and must not be disabled.
3.2.1. Common Errors
Below are some common errors that indicate that there is an issue with the user account on the storage system to which a Globus user’s identity maps.
Example System Account is Disabled Error
Command Failed: Error (login)
Endpoint: XYZ University Mapped Collection
Server: dtn-hostname.xyzu.edu:443
Message: Login Failed
---
Details: 530 Login incorrect. : Access denied, user's system account is disabled.
This error is telling us that the Globus user attempting to access the "XYZ University Mapped Collection" has an identity from an identity domain allowed by the storage gateway backing the collection, but that mapping policy on the storage gateway backing the collection is mapping this identity to a disabled user account on the storage system that the storage gateway has been configured to use.
3.2.2. Troubleshooting Steps
This error is most commonly encountered when a Globus user is attempting to access a mapped collection backed by a POSIX storage gateway and the local account on the system to which their Globus identity maps is disabled. Make sure that the user’s local account is not disabled and make sure that it is configured to use a valid shell.
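A minimal sketch of the shell check follows, assuming that "a valid shell" means one listed in /etc/shells; that interpretation, the function name, and the account name alice are assumptions made for illustration. Whether the account itself is disabled depends on how your site manages accounts and is not covered here.

```shell
#!/usr/bin/env bash
# Check the login shell in a passwd-format entry against a shells file.
# $1 = passwd entry (name:pw:uid:gid:gecos:home:shell)
# $2 = shells file (defaults to /etc/shells)
check_login_shell() {
    local entry="$1" shells_file="${2:-/etc/shells}" login_shell
    login_shell="${entry##*:}"
    if [ -n "$login_shell" ] && grep -qxF -- "$login_shell" "$shells_file"; then
        echo "ok: ${login_shell}"
    else
        echo "not listed in ${shells_file}: ${login_shell:-<empty>}"
        return 1
    fi
}

# Example usage (alice is a hypothetical local account):
#   check_login_shell "$(getent passwd alice)"
```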
3.3. Troubleshooting Policy Based Mapped Collection Access Issues
In order for a Globus user to be able to access a given path on a mapped collection, that path must be permitted by the storage gateway’s applicable restrict paths policy.
3.3.1. Common Errors
Below are some common errors that indicate that a user is attempting to access a path on a mapped collection that is denied by restrict paths policy on the storage gateway.
Example Path Not Allowed Error
Denied by endpoint, Command Failed: Error (list)
Endpoint: XYZ University Mapped Collection
Server: dtn-hostname.xyzu.edu:443
Command: MLST /path/to/directory/
Message: Fatal FTP Response
---
Details: 500 Command failed : Path not allowed.
This error is telling us that the Globus user is being denied access to the "/path/to/directory/" path on "XYZ University Mapped Collection" because this path is not permitted by restrict paths policy on the storage gateway.
3.3.2. Troubleshooting Steps
You’ll want to ensure that the path shown in the error is permitted by the restrict paths policy on the storage gateway. You can see the restrict paths policy on the storage gateway by checking the storage gateway’s details. Keep in mind that the path shown in the error is relative to the collection root. You can see the mapped collection’s root path by checking the mapped collection’s details.
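Because the path in the error is relative to the collection root, it can help to work out the corresponding path on the storage system before comparing it with the restrict paths policy. The helper below is a hypothetical sketch that simply joins the two; the root /data/projects is an invented example, and the sketch assumes the policy you are reading is written in terms of storage system paths.

```shell
#!/usr/bin/env bash
# Join a collection root path and a collection-relative path from an error.
# $1 = collection root, $2 = path as shown in the error message
storage_system_path() {
    local root="${1%/}" rel="${2#/}"
    printf '%s/%s\n' "$root" "$rel"
}

# Example usage (the collection root /data/projects is hypothetical):
#   storage_system_path /data/projects /path/to/directory/
```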
3.4. Troubleshooting Policy and Globus Permission Based Guest Collection Access Issues
In order for a Globus user to be able to access a given path on a guest collection, that path must be permitted by the restrict paths policy on the storage gateway and the sharing restrict paths policy on the guest collection. Additionally, for a Globus user to be able to access a guest collection, their identity must be associated with a permission on the guest collection that grants them the appropriate level of access to the path in question.
3.4.1. Common Errors
Below are some common errors that indicate that a user is attempting to access a path on a guest collection that is denied by restrict paths policy on the storage gateway, sharing restrict paths policy on the guest collection, or permissions on the guest collection.
Example Path Not Allowed Error
Denied by endpoint, Command Failed: Error (list)
Endpoint: XYZ University Guest Collection
Server: dtn-hostname.xyzu.edu:443
Command: MLST /path/to/directory/
Message: Fatal FTP Response
---
Details: 500 Command failed : Path not allowed.
This error is telling us that the Globus user is being denied access to the "/path/to/directory/" path on "XYZ University Guest Collection" because this path is not permitted by one or more of 1) the restrict paths policy on the storage gateway, 2) the sharing restrict paths policy on the guest collection, or 3) the permissions set on the guest collection.
Example No Effective ACL Rules Error
No effective ACL rules on the endpoint XYZ University Guest Collection
This error is telling us that the Globus user is being denied access to the "XYZ University Guest Collection" due to there not being permissions on the collection associated with the user’s identity.
3.4.2. Troubleshooting Steps
You’ll want to ensure that the path shown in the error is permitted by the restrict paths policy on the storage gateway. You can see the restrict paths policy on the storage gateway by checking the storage gateway’s details. Keep in mind that the path shown in the error is relative to the collection root. You can see the guest collection’s root path by checking the guest collection’s details.
You’ll next want to ensure that the path shown in the error is permitted by the sharing restrict paths policy on the guest collection. You can see the sharing restrict paths policy on the guest collection by checking the guest collection’s details. Keep in mind that the path shown in the error is relative to the collection root. You can see the guest collection’s root path by checking the guest collection’s details.
You’ll also want to ensure that there exists a permission on the guest collection that grants access to the desired path on the collection to the Globus identity of the Globus user attempting to access it. You’ll first want to check the identities in the Globus user’s identity set. You then want to check the permissions on the guest collection to ensure that there exists a rule that grants one of the identities in the Globus user’s identity set access to the guest collection. You can see the permissions set on a guest collection by looking the guest collection up in the Globus Web App, clicking on it, and then checking the 'Permissions' tab.
3.5. Troubleshooting Storage System Permission Related Mapped Collection Access Issues
In order for a Globus user to be able to access a given path on a mapped collection, the storage system user that Globus user’s identity maps to must have the required storage system level permissions to be able to access that path. For example, for a Globus user to be able to access a path on a mapped collection backed by a POSIX storage gateway, that Globus user’s identity must map to a local user that has the needed file system permissions for the path in question.
3.5.1. Common Errors
Below are some common errors that indicate that a user is attempting to access a path on a mapped collection that is denied to them by storage system level permissions. Errors of this sort are usually particular to the specific storage system type that the storage gateway is backed by.
Example POSIX Storage System Permission Denied Error - Directory
Denied by endpoint, Command Failed: Error (list)
Endpoint: XYZ University Mapped Collection
Server: dtn-hostname.xyzu.edu:443
Command: MLST /path/to/directory/
Message: Fatal FTP Response
---
Details: 550-GlobusError: v=1 c=PERMISSION_DENIED
550-GridFTP-Errno: 13
550-GridFTP-Reason: System error in scandir
550-GridFTP-Error-String: Permission denied
550 End.
This error is telling us that the Globus user is unable to perform a directory listing of the "/path/to/directory/" path on the "XYZ University Mapped Collection" because the storage system user that their identity maps to lacks the needed permissions on the storage system.
Example POSIX Storage System Permission Denied Error - File
Error (transfer)
Endpoint: XYZ University Mapped Collection
Server: dtn-hostname.xyzu.edu:443
File: /path/to/file.txt
Command: RETR /path/to/file.txt
Message: Fatal FTP response
---
Details: 500-GlobusError: v=1 c=INTERNAL_ERROR\r\n500-GridFTP-Error:
globus_xio_register_open\r\n500-globus_xio: Unable to open
file /path/to/file.txt
\r\n500-globus_xio: System error in open:
Permission denied
\r\n500-globus_xio: A system call failed:
Permission denied
\r\n500 End.\r\n
This error is telling us that the Globus user is unable to transfer the "/path/to/file.txt" file on the "XYZ University Mapped Collection" because the storage system user that their identity maps to lacks the permissions needed on the storage system to access the file.
3.5.2. Troubleshooting Steps
You’ll first want to check the identities in the Globus user’s identity set. Once you have the Globus user’s identity set, look up the storage gateway details to see how identity mapping is configured for the storage gateway, so that you can determine which storage system user the Globus user’s identity is being mapped to. Our Identity Mapping Guide offers additional guidance on how to interpret identity mapping policy on the storage gateway.
You’ll next want to determine the full path on the storage system for the file/directory in the error message. Keep in mind that the path shown in the error is relative to the collection root. You can see the mapped collection’s root path by checking the mapped collection’s details.
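As a minimal sketch of that path arithmetic, the snippet below joins a collection root to the path from an error message. It assumes a POSIX storage gateway whose own root is "/"; the collection root /data/projects is a hypothetical placeholder, not a value from a real endpoint.

```shell
# Hypothetical values: substitute the root path from the mapped collection's
# details and the path reported in the error message.
collection_root="/data/projects"
error_path="/path/to/directory/"

# Join the two, avoiding a doubled "/" at the seam.
full_path="${collection_root%/}/${error_path#/}"
echo "$full_path"
```

If the collection root is simply "/", the result is the error path unchanged.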
Once you’ve completed the above steps, you’ll want to check permissions on the storage system to ensure that the storage system user mapped to by the Globus user’s identity has appropriate rights to the path/file in question.
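For a POSIX file system, one way to review those permissions is to list the mode, owner, and group of the target and of every parent directory, since a missing execute (search) bit on any parent is enough to deny access. The helper below is an illustrative sketch, not part of GCS; the function name and the example path are hypothetical.

```shell
# Print the permissions of a path and of each parent directory up to "/".
# Listing a directory needs read and execute on the directory itself and
# execute (search) on every parent; reading a file needs read on the file.
show_path_permissions() {
    p=$1
    while :; do
        ls -ld -- "$p" || return 1
        case $p in
            /|.) break ;;
        esac
        p=$(dirname -- "$p")
    done
}

# Example (hypothetical full path):
# show_path_permissions /data/projects/path/to/directory
```

Compare the owner, group, and mode on each line against the local user that the Globus identity maps to. Where sudo is available, running ls -ld on the path as that user (sudo -u LOCAL_USER ls -ld PATH) may confirm the result directly.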
4. Troubleshooting Certificate Issues
The GCSv5.4 software makes use of certificates so that your endpoint can identify itself and interoperate with other parts of the Globus ecosystem. These certificates must be valid or your endpoint will not work correctly.
4.1. Troubleshooting Certificate Expiration Issues
If the certificates being used by your endpoint expire then your endpoint will stop working.
4.1.1. Common Errors
Below are some common errors that may indicate that your endpoint’s certificate has expired.
Example Certificate Has Expired Error
Command Failed: Error (connect)
Endpoint: XYZ University Endpoint
Server: dtn-hostname.xyzu.edu:443
Message: Could not connect to server
---
Details: an authentication operation failed\nglobus_xio_gsi:
gss_init_sec_context failed.\nGSS failure: \nGSS Major Status:
Authentication Failed\nGSS Minor Status Error Chain:\nglobus_gsi_gssapi:
SSL handshake problems\nOpenSSL Error: ../ssl/statem/statem_clnt.c:1913:
in library: SSL routines, function tls_process_server_certificate:
certificate verify failed\nglobus_gsi_callback_module:
Could not verify credential\nglobus_gsi_callback_module:
The certificate has expired
: Credential with subject:
/CN=a1b2c.9f8e.data.globus.org has expired.\n\n
This error is letting us know that the certificate being used by the "XYZ University Endpoint" endpoint has expired. This error will most commonly be encountered by users of the Globus Web App attempting to access a collection on an endpoint with an expired certificate.
4.1.2. Troubleshooting Steps
By default, the GCSv5.4 software will configure your endpoint to use a certificate issued by the Let’s Encrypt service. The GCSv5.4 software will automatically renew such a certificate for you. If the automatic renewal process for the Let’s Encrypt certificate malfunctions in some way, then the certificate will expire and you may see errors such as the one shown above. This automatic renewal process is handled by the GCS Manager Assistant Service. If this service is down, then the automatic renewal of the endpoint’s Let’s Encrypt certificate will fail.
You can check the status of the GCS Manager Assistant Service with this command:
systemctl -l status gcs_manager_assistant.service
If the service is down, then you can try to restart it with a command such as this:
systemctl start gcs_manager_assistant.service
If the service was down, wait ~5 minutes after restarting it before attempting to access your endpoint’s collections so as to give the service time to catch up on its tasks. You’ll also want to ensure that the GCS Manager Assistant Service is enabled so that it will automatically restart when the system is rebooted. If the GCS Manager Assistant Service is disabled, you can enable it with a command such as this:
systemctl enable gcs_manager_assistant.service
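The status, start, and enable checks can be combined into one helper, sketched below. The function name ensure_service and the SYSTEMCTL override variable are illustrative assumptions; the systemctl subcommands used (is-active, is-enabled, start, enable) are standard systemd ones.

```shell
# SYSTEMCTL can be overridden for dry runs; it defaults to the real command.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Start the unit if it is not running and enable it if it is not enabled.
ensure_service() {
    unit=$1
    if ! "$SYSTEMCTL" is-active --quiet "$unit"; then
        "$SYSTEMCTL" start "$unit" || return 1
    fi
    if ! "$SYSTEMCTL" is-enabled --quiet "$unit"; then
        "$SYSTEMCTL" enable "$unit" || return 1
    fi
}

# Example (run as root):
# ensure_service gcs_manager_assistant.service
```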
It is also possible to configure an endpoint to use a custom certificate, issued by a CA other than Let’s Encrypt. Such certificates will NOT be automatically renewed by the GCSv5.4 software, so you will need to handle such renewals yourself.
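Whichever CA issued the certificate, its remaining lifetime can be checked with the openssl command line tool. The sketch below assumes openssl is installed and that you have the certificate as a PEM file; the function name is hypothetical. The commented example further assumes that the service on port 443 presents the certificate in question, which may not hold for every configuration.

```shell
# Succeed if the PEM certificate in $1 stays valid for at least $2 more days.
cert_valid_for_days() {
    cert_file=$1
    days=$2
    openssl x509 -in "$cert_file" -noout -checkend $((days * 86400)) >/dev/null
}

# Example: print the expiry date of the certificate presented on port 443.
# The hostname is the placeholder used in the error messages above.
# openssl s_client -connect dtn-hostname.xyzu.edu:443 </dev/null 2>/dev/null \
#     | openssl x509 -noout -enddate
```

For a custom certificate, a check such as cert_valid_for_days CERT_FILE 14 could be run periodically to give warning before a manual renewal is due.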
If you are able to successfully complete the above troubleshooting steps but still find that you’re having problems related to the certificate on your endpoint, then you’ll likely want to open a ticket with Globus Support to look into the matter further.
Appendix A: How To Find a Globus User’s Identity Set
A Globus user can see their identity set by logging into the Globus Web App and then going to their Identities page. The user’s primary identity will have a crown symbol by it; all other identities listed are linked identities. A user can click on the entry for a given identity to see additional details for it.
Appendix B: How to Find Collection and Storage Gateway Details
The details for a collection can be found with the globus-connect-server collection show command. For example:
globus-connect-server collection show --include-private-policies -F json COLLECTION_UUID
The details for a storage gateway can be found with the globus-connect-server storage-gateway show command. For example:
globus-connect-server storage-gateway show --include-private-policies \
-F json STORAGE_GATEWAY_UUID
A guest collection will inherit properties from the mapped collection that backs it. The "mapped_collection_id" property shown in the output of the globus-connect-server collection show command when it is run against a guest collection will show the mapped collection that backs that guest collection. When looking at a guest collection, it is a good idea to identify the mapped collection that backs it and to then run the globus-connect-server collection show command on that mapped collection as well. You can see the sharing restrict paths policy set on a guest collection by checking the "sharing_restrict_paths" property given by the globus-connect-server collection show command run against the guest collection as discussed above.
A mapped collection will inherit properties from the storage gateway that backs it. The "storage_gateway_id" property shown in the output of the above command when it is run against a mapped collection will show the storage gateway that backs that mapped collection. When looking at a mapped collection, it is a good idea to use the globus-connect-server storage-gateway show command to examine the properties of the storage gateway that backs that mapped collection as well.
When looking at the properties for a storage gateway, you can see if the gateway is using a default identity mapping policy or a custom identity mapping policy by looking for an "identity_mappings" entry in the output of the globus-connect-server storage-gateway show command discussed above. If the "identity_mappings" entry is present, then the storage gateway is using a custom identity mapping policy. If the "identity_mappings" entry is NOT present, then the storage gateway is using default identity mapping policy. You can see the restrict paths policy set on a storage gateway by checking the "restrict_paths" property given by the globus-connect-server storage-gateway show command run against the storage gateway as discussed above.
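The "identity_mappings" presence check can be scripted. The sketch below reads the storage gateway JSON on standard input and only tests whether the key appears anywhere in it; the function name is hypothetical, and a JSON-aware tool would be a more robust choice where one is installed.

```shell
# Read storage gateway JSON on stdin and report which mapping policy it uses.
# This only tests for the presence of the "identity_mappings" key.
mapping_policy() {
    if grep -q '"identity_mappings"'; then
        echo custom
    else
        echo default
    fi
}

# Example:
# globus-connect-server storage-gateway show --include-private-policies \
#     -F json STORAGE_GATEWAY_UUID | mapping_policy
```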
Appendix C: Globus Logging Locations
Logging for the Globus Connect Server and GridFTP daemon, as well as HTTPS Transfer logs, can be found in the below locations:
1) Globus Connect Server application log:
/var/log/globus-connect-server/gcs-manager/gcs.log
2) GridFTP log:
/var/log/gridftp.log
3) HTTPS transfer logs (Apache/HTTPD access and error logs):
/var/log/apache2/[access*,error*]
/var/log/httpd/[access*,error*]
4) In addition to standard logging, High Assurance (HA) Endpoint Collections also provide audit logging capabilities which are further detailed on the Globus Connect Server Audit page.
Appendix D: Obtaining Debug Log Events
When troubleshooting an issue, it can often be helpful to obtain debug log events from the GridFTP and GCS Manager services that correspond to the problem you’re having. The following steps will allow you to gather such log events.
1) Enable debug logging for the GridFTP service by creating a file named '/etc/gridftp.d/z_logging' that contains only the following:
log_level ALL
2) Enable debug logging for the GCS Manager by creating a file named '/etc/sysconfig/gcs_manager' (for RHEL derived distributions) or '/etc/default/gcs_manager' (for Debian derived distributions) that contains only the following:
GCS_MANAGER_LOG_LEVEL=DEBUG
After that, you’ll need to restart the service like so:
systemctl restart gcs_manager.service
If you simply want to enable debug logging for these services, then you can stop here. If you want to capture log events related to a specific action or associated with a particular error that you can reproduce, then continue below.
3) We’ll now put a marker in the log files for the GridFTP service and GCS Manager service to make the logs easier to parse:
for log in /var/log/gridftp.log /var/log/globus-connect-server/gcs-manager/gcs.log; do echo "----Start Test $(date)-----" >> "$log"; done
4) At this point, go ahead and take the actions needed to reproduce the error you’re seeing so we can capture the log events associated with the attempt.
5) We’ll wait 60 seconds after completing step 4, and then put more markers in the log files to make them easier to parse:
for log in /var/log/gridftp.log /var/log/globus-connect-server/gcs-manager/gcs.log; do echo "----End Test $(date)-----" >> "$log"; done
At this point, you can create a copy of the marked portions of the '/var/log/gridftp.log' and '/var/log/globus-connect-server/gcs-manager/gcs.log' files and use them to assist in your own troubleshooting or provide them to Globus support if you’ve been directed to follow these steps in a support ticket.
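One way to copy out the marked portions is a sed range match on the marker lines written in steps 3 and 5. This is a sketch: the function name and the output file names are hypothetical.

```shell
# Print every block of lines between a Start Test and an End Test marker,
# markers included.
extract_marked() {
    sed -n '/^----Start Test /,/^----End Test /p' "$1"
}

# Example: save the marked portion of each log for later review.
# extract_marked /var/log/gridftp.log > gridftp-debug.log
# extract_marked /var/log/globus-connect-server/gcs-manager/gcs.log > gcs-debug.log
```

If a log contains markers from several test runs, every marked block is printed in order.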
Appendix E: Troubleshooting GCS Node Configurations
GCSv5 supports multi-node configurations, with each node’s externally routable IP address being associated with the DNS record for an Endpoint.
In the case that your node’s IP address has changed or will change (e.g., due to planned maintenance), you can view the node’s current configuration, and update it, by using the globus-connect-server node command with its various options.
E.1. View your nodes:
You can determine your nodes' currently configured IP addresses by running the globus-connect-server node list command. (The IP addresses are the values associated with the nodes' DNS 'A' records and are used to route traffic to your Endpoint via round-robin DNS.)
In the example below, the output shows a two-node Endpoint. One node is inactive, which will prevent traffic from being routed to the node (useful for server patching, maintenance, or troubleshooting).
$ globus-connect-server node list
ID | IP Addresses | Status
------------------------------------ | ------------ | --------
8363556a-06f1-4778-81c7-734bd39ca623 | 18.118.13.43 | active
95d87888-6349-4c35-9afa-715090d64ccc | 3.16.79.168 | inactive
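On an endpoint with many nodes, the inactive ones can be picked out of this table with awk. The sketch below assumes the output keeps the three-column, pipe-separated layout of the example above, with a header line and a separator line; the function name is hypothetical.

```shell
# Read "node list" table output on stdin and print the IDs of inactive nodes.
# The first two lines (header and separator) are skipped.
inactive_nodes() {
    awk -F'|' 'NR > 2 {
        id = $1
        status = $3
        gsub(/[ \t]+/, "", id)
        gsub(/[ \t]+/, "", status)
        if (status == "inactive") print id
    }'
}

# Example:
# globus-connect-server node list | inactive_nodes
```

The status is compared exactly after trimming because a substring match on "active" would also match "inactive".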
E.2. View node details:
You can view details for a given node by using the globus-connect-server node show command.
The example below shows a node with matching data_interface and ip_addresses values, but note that there are circumstances where these values may not match. For example, an Endpoint node could be configured to use an internal/non-routable IP address.
$ globus-connect-server node show -F json 8363556a-06f1-4778-81c7-734bd39ca623
{
"DATA_TYPE": "node#1.2.0",
"data_interface": "18.118.13.43",
"id": "8363556a-06f1-4778-81c7-734bd39ca623",
"incoming_port_range": [
50000,
51000
],
"ip_addresses": [
"18.118.13.43"
],
"status": "active"
}
E.3. Updating node IP addresses
If your node’s IP address changes, update the node’s IP address (and potentially the data interface) by executing the 'globus-connect-server node update' command. If your endpoint or collections are configured to use a custom domain, you’ll need to update the IP addresses in that as well.
In the case that there are no currently active nodes supporting your endpoint (e.g., you have executed globus-connect-server node update --disable ${nodeUUID} on the last 'active' node), you will need to execute the GCS Node setup command on your server to re-establish a node in support of your endpoint, as there will be no active nodes to service requests in the endpoint’s current state.
E.3.1. Update node with matching Control Channel and Data Interface IPs:
In the case that your node’s data_interface will match its externally routable IP (this is the usual configuration), you would execute a command such as the below:
globus-connect-server node update 8363556a-06f1-4778-81c7-734bd39ca623 -i ${your_nodes_newIP} --use-explicit-host localhost
E.3.2. Update node with a Data Interface IP which differs from its Control Channel IP:
In the case that your node’s data_interface IP does not match its externally routable IP (non-standard configuration), you would execute a command such as the below:
globus-connect-server node update 8363556a-06f1-4778-81c7-734bd39ca623 -i ${your_nodes_newIP} --data-interface ${your_nodes_DataInterfaceIP} --use-explicit-host localhost
Use of the --use-explicit-host localhost option and argument in the above examples ensures that the commands are routed to your local GCS instance’s API. This is important because the command may otherwise be routed to your node’s original IP address as returned by DNS.