Globus Docs

Monitoring GCS Node Health

When nodes are added to a Globus Connect Server (GCS) endpoint, each node’s IP address is added to the endpoint’s DNS round robin record. When a client initiates a transfer or an HTTPS connection to the GCS endpoint or one of its collections, the DNS lookup returns the list of node IP addresses and the client connects to one of them. Across many connections, this spreads the load over the nodes of the GCS endpoint.

Problems arise when a node is down or unhealthy but clients are unaware of it and can still be directed to it. Users then see intermittent errors, which can be confusing and frustrating. In this post we discuss ways to monitor horizontally scaled GCS deployments to detect an unhealthy node before it affects users.

1. GCS Info API

A common, well-documented health check pattern is an anonymously accessible HTTP endpoint that returns 200 OK when the service is running well, and an error status code (or simply fails to accept the connection) when it is not.

The GCS Manager API provides an /api/info endpoint. This endpoint is available anonymously and returns 200 OK along with information about the GCS deployment when the node is running correctly. It returns an HTTP error or fails at the network level if GCS is not running, not reachable, or otherwise cannot handle the request. This makes it an effective health check for GCS nodes.
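As a minimal sketch, the check can be wrapped in a shell function that succeeds only when /api/info answers with HTTP 200. The function name check_gcs_info and the 10-second timeout are illustrative choices, not part of GCS; the argument is your endpoint’s domain name.

```shell
#!/bin/bash

# Hypothetical helper: return 0 only if the GCS Manager /api/info
# endpoint on the given host answers with HTTP 200.
function check_gcs_info {
	local host=${1}
	local code
	# -s: silent, -o /dev/null: discard the response body,
	# -w: print only the HTTP status code, --max-time: cap the whole request.
	# A failed connection or timeout yields a code other than 200.
	code=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" "https://${host}/api/info")
	[[ ${code} == "200" ]]
}
```

For example, `check_gcs_info gcs.example.org && echo healthy || echo unhealthy`, where gcs.example.org is a placeholder for your endpoint’s domain name. Note that this checks whichever node DNS happens to return; checking every node requires pinning the request to each IP address.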

1.1. GCS Health Check Script

The following script will look up the DNS record of a GCS endpoint and check each node’s IP address in the record to verify that the node’s GCS service is healthy. It does this by performing an HTTP GET to the /api/info endpoint of each node.

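#!/bin/bash

# Domain name of the GCS endpoint, passed as the first argument
DOMAIN_NAME=${1:?"usage: $0 <gcs-endpoint-domain-name>"}

# Called when a node answers /api/info with HTTP 200
function node_up {
	ip=${1}
	echo "${ip} is alive"
}

# Called when a node fails the check
function node_down {
	ip=${1}
	echo "The node at ${ip} is down"
}

# Check every IPv4 address in the endpoint's round robin A record
# (the grep skips any CNAME targets that dig +short may also print)
for ip in $(dig "${DOMAIN_NAME}" A +short | grep -E '^[0-9]+(\.[0-9]+){3}$'); do
	echo "Testing ${ip}"

	# --resolve sends the request to this node's IP while keeping the
	# endpoint's domain name for TLS (SNI and certificate validation)
	code=$(curl --connect-timeout 5 --max-time 15 -o /dev/null -s --fail \
		-w "%{http_code}" --resolve "${DOMAIN_NAME}:443:${ip}" \
		"https://${DOMAIN_NAME}/api/info")
	rc=$?
	if [[ ${rc} -ne 0 || ${code} != "200" ]]; then
		node_down "${ip}"
	else
		node_up "${ip}"
	fi
done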

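The script starts by running dig to look up the A records for the GCS endpoint domain name passed as its first argument. It then uses curl to check each returned IP address; the --resolve option sends the request to that specific IP while still using the endpoint’s domain name, so TLS certificate validation succeeds. If the /api/info endpoint returns HTTP status 200, the node_up function is called. If the request fails (connection error, timeout, or any other status code), node_down is called.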

Endpoint administrators can modify the node_up and node_down functions to respond to the health check results in any way they wish. When a node is down, the script could call an alerting system, send a metric to a monitoring system, disable the node, or take any other action that helps prevent connections to the unhealthy node.
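For example, here is a sketch of replacement handlers that log each failure to standard error and let the script finish with a non-zero exit status when any node is down, so that cron or a monitoring agent that checks exit codes can raise an alert. The FAILED_NODES array, the report_and_exit function, and the use of exit status 2 (CRITICAL in the Nagios plugin convention) are assumptions for illustration, not part of GCS.

```shell
#!/bin/bash

# Hypothetical: addresses of nodes that failed the check
FAILED_NODES=()

function node_up {
	ip=${1}
	echo "${ip} is alive"
}

function node_down {
	ip=${1}
	FAILED_NODES+=("${ip}")
	# Timestamped message on stderr so cron or agent logs capture it
	echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) The node at ${ip} is down" >&2
}

# Hypothetical: call once after the loop over all nodes has finished.
# Exits 2 (Nagios CRITICAL) if any node failed, 0 (OK) otherwise.
function report_and_exit {
	if (( ${#FAILED_NODES[@]} > 0 )); then
		echo "CRITICAL: ${#FAILED_NODES[@]} node(s) down: ${FAILED_NODES[*]}"
		exit 2
	fi
	echo "OK: all nodes are up"
	exit 0
}
```

Call report_and_exit after the done that closes the loop. Because the script’s for loop runs in the main shell rather than in a pipeline subshell, entries that node_down adds to FAILED_NODES are still visible after the loop.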

2. GCS Node Services

A healthy GCS node runs multiple services. These services are managed by systemd and can therefore be monitored with systemctl status <service name>. The following table lists the services and their systemd unit names:

Per Node:

Service Name            systemd Unit                        Description
Apache                  apache2.service or httpd.service    Provides web-based access to collections and APIs
GCS_Manager             gcs_manager.service                 Main GCS service
GCS_Manager.socket      gcs_manager.socket                  A communication channel for the services
GCS_Manager_Assistant   gcs_manager_assistant.service       Ensures that endpoint configuration is kept in sync
GridFTP                 globus-gridftp-server.service       Data transfer service
GCS OIDC (optional)     globus-oidc.service                 The GCS OIDC identity provider service, when enabled

The Apache unit is named apache2.service on Debian and Ubuntu and httpd.service on RHEL-based distributions.
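As a minimal sketch, the units from the table can be checked on a single node with systemctl is-active --quiet, which prints nothing and exits 0 only when the unit is active. The GCS_UNITS array and the check_gcs_services function are hypothetical names; pass the Apache unit for your distribution, and add globus-oidc.service only if the GCS OIDC server is enabled.

```shell
#!/bin/bash

# Units that every GCS node runs (from the services table). Apache's unit
# name depends on the distribution, so it is passed separately.
GCS_UNITS=(
	gcs_manager.service
	gcs_manager.socket
	gcs_manager_assistant.service
	globus-gridftp-server.service
)

# Hypothetical helper: report each unit's state and return 1
# if any of the given units is not active.
function check_gcs_services {
	local unit failed=0
	for unit in "$@"; do
		# is-active --quiet: no output, exit status 0 only if the unit is active
		if systemctl is-active --quiet "${unit}"; then
			echo "${unit}: active"
		else
			echo "${unit}: NOT active"
			failed=1
		fi
	done
	return "${failed}"
}
```

For example, on a RHEL-based node: `check_gcs_services "${GCS_UNITS[@]}" httpd.service`. The non-zero return value can feed the same alerting path as node_down in the health check script.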

© 2010- The University of Chicago