Globus Streaming Application Tools
Globus provides a globus-streams-libs package that includes a system-level library (the PRELOAD library). The library is dynamically loaded into your application's Linux process space, where it intercepts the standard socket library calls that establish connections. This allows the library to perform the authentication protocol and contact point lookups on the application's behalf.
Additionally, there is a CLI that works in conjunction with this library. Installation of both tools is covered separately. In this document we discuss how the CLI and the PRELOAD library work together to provide seamless integration of Globus data streaming with your application.
1. Streams CLI
The globus-streams CLI is a simple command line utility that provides two major functions:
- Associating tunnel IDs with application contact strings.
- Fetching LAN secrets from GCS for authentication.
1.1. Tunnel IDs and Contact Strings
When a user creates a tunnel, they are given a tunnel ID. A tunnel ID is simply an ID that serves as the key to metadata stored in the Globus Transfer service. To form the connections required for Globus data streaming, the TCP contact strings must be associated with the tunnel ID so that user applications can look up these contact strings by tunnel ID.
The above diagram shows the network overlay of a Globus data stream. The three arrows show the three TCP connections required for the initiating application to form a connection to the listening application. Arrow 1 represents the TCP connection from the user's initiating application to the stream access point of the initiating side GCS. Arrow 2 shows the connection between the two stream access points. Arrow 3 represents the connection from the stream access point of the listening side GCS to the user's listening application. Here we focus on the application-facing connections (arrows 1 and 3).
1.1.1. Initiating Application
Let’s start by looking at the connection from the initiating application to the initiating side GCS (arrow 1). The initiating application must discover the ip:port (contact string) to use to contact the initiating side GCS. The following globus-streams CLI command retrieves the ip:port from the Globus Transfer service and writes the ip:port to a secure file under $HOME/.globus/streams/<tunnel id>.conf, which can subsequently be accessed by the initiating application to contact the initiating side GCS.
globus-streams environment initialize $TUNNEL_ID
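To check from a script whether the environment has been initialized for a tunnel, the path of that per-tunnel file can be computed as shown here. This is a minimal Python sketch: the helper name tunnel_conf_path is hypothetical, and it only builds the path described above. It does not read the file, whose format is not documented here.

```python
from pathlib import Path
from typing import Optional


def tunnel_conf_path(tunnel_id: str, home: Optional[str] = None) -> Path:
    """Return $HOME/.globus/streams/<tunnel id>.conf (hypothetical helper)."""
    # Default to the current user's home directory when none is given.
    base = Path(home) if home is not None else Path.home()
    return base / ".globus" / "streams" / f"{tunnel_id}.conf"


if __name__ == "__main__":
    path = tunnel_conf_path("0a866857-141d-432f-a4b9-88dbbeb09cbb")
    # Only reports whether the file is present; its contents are not parsed.
    print(path, "exists" if path.exists() else "missing")
```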
The ip:port may be unavailable or may have changed: for example, the initiating side GCS may not have been ready to accept connections when the environment was first initialized, or the tunnel may have restarted, changing the ip:port. In such cases, the ip:port must be retrieved from the Globus Transfer service again using the following globus-streams CLI command.
globus-streams environment contact-lookup $TUNNEL_ID
This command first checks whether the file $HOME/.globus/streams/<tunnel id>.conf exists on the system. If it does, the command reads from the file the time the contact string was last fetched and the time at which it should next be looked up in Globus Transfer. If the contact string has not expired, the command simply prints it. If it has expired, the command looks up the new contact string in Globus Transfer, writes it to the file, and prints the new contact string.
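The expiry decision that contact-lookup makes can be sketched as follows. This is a hypothetical Python model, not the CLI's implementation: the CachedContact record stands in for the contents of the .conf file (whose real format is not described here), fetch_from_transfer is a placeholder for the Globus Transfer query, and the assumption that the query also yields a refresh interval in seconds is ours. The sketch also assumes a missing file is handled like an expired entry.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CachedContact:
    """Hypothetical stand-in for what <tunnel id>.conf records."""
    contact: str       # ip:port of the initiating side GCS
    fetched_at: float  # when the contact string was last fetched (epoch seconds)
    refresh_at: float  # when it should next be looked up in Globus Transfer


def contact_lookup(
    cached: Optional[CachedContact],
    now: float,
    fetch_from_transfer: Callable[[], Tuple[str, float]],
) -> CachedContact:
    """Model of the contact-lookup decision; returns the entry to print and store.

    fetch_from_transfer is a placeholder returning (contact, refresh_interval_s).
    """
    if cached is not None and now < cached.refresh_at:
        # Not expired: reuse the stored contact string without contacting Transfer.
        return cached
    # No stored entry (assumed behavior) or expired: query Globus Transfer again.
    contact, refresh_interval = fetch_from_transfer()
    return CachedContact(contact, fetched_at=now, refresh_at=now + refresh_interval)
```

The caller would print the returned entry's contact field and, when it differs from the cached one, write it back to the file.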
1.1.2. Listening Application
Next, let's look at the connection from the listening side GCS to the listening application (arrow 3). For the listening side GCS to contact the listening application, GCS must know the application's addressable hostname and port. The following command stores the listening application's addressable hostname and port in Globus Transfer, associated with the tunnel ID, so that they are available to the listening side GCS.
globus-streams environment initialize --listener-contact-string 192.168.0.10:8888 $TUNNEL_ID
1.2. LAN Secret Fetching
In addition to associating the tunnel ID with contact strings, the globus-streams environment initialize command requests that a LAN secret be associated with the tunnel. The secret is a random string that is written to the $HOME/.globus/streams/<tunnel id>.conf file and is used for the entire lifetime of the tunnel. It is used only for the LAN connection between GCS and the application associated with that tunnel. The command must be run on both the initiator and listener sides because each side must have its own LAN secret.
2. PRELOAD Library
The PRELOAD library is a lightweight, dynamically loadable library that can run inside the process space of your application to automatically redirect connections to a Globus stream access point.
The diagram here shows how the PRELOAD library works. When a user runs their application, the environment variable LD_PRELOAD is set to libglobus_streams_client.so.0. As a result, the Globus library is loaded into the application's process space when it starts, and it intercepts calls to the system's socket library that establish connections. When it intercepts those calls, the library checks whether they are destined for a Globus data stream. If not, the calls are passed directly through to the socket library.
If the connection calls are associated with a Globus data stream, then the PRELOAD library needs to determine the underlying contact points and perform the connection handshake protocol. Next, we describe the details of how the listening and initiator sides determine the contact points.
2.1. Listening Side
The following environment variable must be set prior to starting the listener application so that the PRELOAD library knows the listener ports and associated tunnel:
GLOBUS_STREAMS_INTERCEPT_PORT_<port number>=$TUNNEL_ID
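For example, a listener on port 8888 (the port used in the listener contact string above) could be launched from Python with an environment built as follows. This sketch only combines the two variables named in this document, LD_PRELOAD and GLOBUS_STREAMS_INTERCEPT_PORT_<port number>; the helper name listener_env is hypothetical.

```python
from typing import Dict, Iterable, Mapping

PRELOAD_LIB = "libglobus_streams_client.so.0"
INTERCEPT_PREFIX = "GLOBUS_STREAMS_INTERCEPT_PORT_"


def listener_env(tunnel_id: str, ports: Iterable[int],
                 base: Mapping[str, str]) -> Dict[str, str]:
    """Copy `base` and add the variables the PRELOAD library reads (hypothetical helper)."""
    env = dict(base)
    # Keep any preload entries that are already set; ld.so accepts a
    # colon-separated list in LD_PRELOAD.
    existing = env.get("LD_PRELOAD")
    env["LD_PRELOAD"] = f"{existing}:{PRELOAD_LIB}" if existing else PRELOAD_LIB
    # One GLOBUS_STREAMS_INTERCEPT_PORT_<port number>=<tunnel id> per listening port.
    for port in ports:
        env[f"{INTERCEPT_PREFIX}{port}"] = tunnel_id
    return env
```

It could then be used as subprocess.run(["./my_listener"], env=listener_env(tunnel_id, [8888], os.environ)), where ./my_listener is a placeholder for your listening application.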
When a call to bind() is intercepted, the PRELOAD library looks at the listening port being requested. If that port has a matching GLOBUS_STREAMS_INTERCEPT_PORT_<port> environment variable, the PRELOAD library knows this is a Globus data streams port and that the tunnel ID is the value of that environment variable. Once a bound listener is identified as a Globus data stream listener, the PRELOAD library intercepts calls to accept() on that listener and performs the required authentication protocol.
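The bind()/accept() decision described above can be modeled in a few lines. This is a toy Python sketch, not the PRELOAD library's code (which intercepts C socket calls inside the process); ListenerRegistry, on_bind and on_accept are hypothetical names, and socket descriptors are represented as plain integers.

```python
from typing import Dict, Mapping, Optional

INTERCEPT_PREFIX = "GLOBUS_STREAMS_INTERCEPT_PORT_"


class ListenerRegistry:
    """Toy model of the listener-side checks; not the real PRELOAD library."""

    def __init__(self, environ: Mapping[str, str]) -> None:
        self._environ = environ
        self._stream_listeners: Dict[int, str] = {}  # socket fd -> tunnel ID

    def on_bind(self, fd: int, port: int) -> Optional[str]:
        """Intercepted bind(): remember the socket if its port is a Globus stream port."""
        tunnel_id = self._environ.get(f"{INTERCEPT_PREFIX}{port}")
        if tunnel_id:
            self._stream_listeners[fd] = tunnel_id
        return tunnel_id or None

    def on_accept(self, fd: int) -> Optional[str]:
        """Intercepted accept(): the tunnel ID if authentication is needed, else None."""
        return self._stream_listeners.get(fd)
```

Any socket whose port has no matching variable is never recorded, so its accept() calls fall through untouched.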
2.2. Initiator Side
The PRELOAD library on the initiator side must determine the ip:port to use when connecting to the initiating side GCS, and whether the connection requires the handshake protocol. Consider the output of the globus-streams environment initialize command discussed above:
$ globus-streams environment initialize 0a866857-141d-432f-a4b9-88dbbeb09cbb
Initializing the environment for tunnel: 0a866857-141d-432f-a4b9-88dbbeb09cbb
The environment is initialized for use with tunnel 0a866857-141d-432f-a4b9-88dbbeb09cbb
Your application key file base directory is /home/ubuntu/.globus/streams/
Your contact string is: globus.0a866857-141d-432f-a4b9-88dbbeb09cbb:3425
The last line shows the contact string to use when trying to form a connection through a tunnel. The hostname is globus.0a866857-141d-432f-a4b9-88dbbeb09cbb and the port is 3425. These are dummy values formatted so that the PRELOAD library can identify them as Globus data stream connections.
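The contact string can be split into its parts as follows. This is a hypothetical Python helper: the sample output suggests that the hostname is "globus." followed by the tunnel ID, but that layout is inferred from this one example rather than documented.

```python
from typing import NamedTuple, Optional

GLOBUS_HOST_PREFIX = "globus."


class StreamContact(NamedTuple):
    hostname: str   # placeholder hostname, e.g. globus.<tunnel id>
    tunnel_id: str  # assumed to be the hostname minus the "globus." prefix
    port: int       # placeholder port printed by the CLI


def parse_stream_contact(contact: str) -> Optional[StreamContact]:
    """Return the parts of a Globus stream contact string, or None if it is not one."""
    host, sep, port_text = contact.rpartition(":")
    if not sep or not host.startswith(GLOBUS_HOST_PREFIX):
        return None
    # Accept only a plain ASCII decimal port number.
    if not (port_text.isascii() and port_text.isdigit()):
        return None
    tunnel_id = host[len(GLOBUS_HOST_PREFIX):]
    if not tunnel_id:
        return None
    return StreamContact(host, tunnel_id, int(port_text))
```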
When the PRELOAD library intercepts a call to gethostbyname() or getaddrinfo(), it checks whether the hostname being looked up starts with "globus.". If so, it knows that this connection is destined for a Globus data stream and that it must look up the real endpoint contact string. To do this, it runs the globus-streams environment contact-lookup command and associates the ip:port of the initiating side GCS with the lookup result that it returns to the application. Later, when the PRELOAD library intercepts a connect() call, if both the host information and the port match that earlier lookup, the PRELOAD library performs the connection handshake protocol.
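The initiator-side sequence (name lookup, then connect()) can be modeled as shown here. This is a toy Python sketch under stated assumptions, not the library's implementation: contact_lookup is a placeholder for running globus-streams environment contact-lookup, the tunnel ID is assumed to be the hostname minus the "globus." prefix, and the value returned to the application is assumed to be the GCS IP address itself (the real library may return something else).

```python
from typing import Callable, Dict, Optional, Tuple

Addr = Tuple[str, int]
GLOBUS_HOST_PREFIX = "globus."


class InitiatorInterceptor:
    """Toy model of the initiator-side decisions; not the real PRELOAD library."""

    def __init__(self, contact_lookup: Callable[[str], Addr]) -> None:
        # Placeholder for `globus-streams environment contact-lookup <tunnel id>`;
        # returns the real (ip, port) of the initiating side GCS.
        self._contact_lookup = contact_lookup
        # (host returned to the app, port) -> real address of the initiating side GCS
        self._routes: Dict[Addr, Addr] = {}

    def resolve(self, hostname: str, port: int) -> Optional[str]:
        """Intercepted name lookup. None means 'pass through to the system resolver'."""
        if not hostname.startswith(GLOBUS_HOST_PREFIX):
            return None
        tunnel_id = hostname[len(GLOBUS_HOST_PREFIX):]
        gcs_addr = self._contact_lookup(tunnel_id)
        returned_host = gcs_addr[0]  # assumption: hand the GCS IP back to the app
        self._routes[(returned_host, port)] = gcs_addr
        return returned_host

    def connect(self, host: str, port: int) -> Tuple[bool, Addr]:
        """Intercepted connect(). Returns (needs_handshake, address to dial)."""
        route = self._routes.get((host, port))
        if route is None:
            return False, (host, port)  # not a Globus stream: pass through unchanged
        return True, route
```

Keying the routes on both the returned host and the port mirrors the requirement that both must match before the handshake is performed; any other connect() falls through unchanged.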