Globus Toolkit/GT-2

Summary

GUC treats a directory as if it's a file for the purpose of sync comparison

Details

Type: Bug

Status: Open

Description

To reproduce, give guc a single-line input file and use the sync and dump-only options (non-recursive).

Specify source and destination paths that are directories, but omit the trailing /.

Guc will produce output as if there were a file that needs to be synced.

Code in globus_l_guc_expand_single_url() checks for a trailing slash to determine if target is a directory.  The stat type field is not inspected.
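
A minimal sketch of the difference between the two checks, in Python rather than the C of globus-url-copy. The helper names are hypothetical and local paths stand in for GridFTP URLs; the point is only that the type reported by stat does not depend on how the path is spelled.

```python
import os
import stat
import tempfile


def looks_like_directory(path: str) -> bool:
    """Trailing-slash heuristic: decides from the spelling of the path alone."""
    return path.endswith("/")


def is_directory(path: str) -> bool:
    """Inspect the type bits of the stat result instead of the spelling."""
    return stat.S_ISDIR(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # d has no trailing slash, so the two checks disagree.
        print(looks_like_directory(d), is_directory(d))
```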

I'm going to put in a patch for globus online so that an error is returned.

Comments

Globus Toolkit/GT-3

Summary

gridftp server incorrectly handles relative path configuration values

Details

Type: Bug

Status: Resolved 2012-05-22

Description

A regression in 5.2.1 causes config values set to relative paths to be invalid.
Example: these two invocations should behave the same, but 1) fails.

1) cd /tmp; globus-gridftp-server -l logfile

2) cd /home; globus-gridftp-server -l /tmp/logfile
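
One way to make the two invocations equivalent is to resolve relative values against the directory the server was started from, once, before anything can change the working directory. The sketch below is Python, not the server's C code, and resolve_config_path is a hypothetical helper; whether a later directory change is what causes the 5.2.1 regression is an assumption.

```python
import os


def resolve_config_path(value: str, startup_dir: str) -> str:
    """Return an absolute path for a config value given relative to startup_dir."""
    if os.path.isabs(value):
        return os.path.normpath(value)
    return os.path.normpath(os.path.join(startup_dir, value))


if __name__ == "__main__":
    # Both invocations from the report should end up naming the same file.
    print(resolve_config_path("logfile", "/tmp"))
    print(resolve_config_path("/tmp/logfile", "/home"))
```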

Comments

Mike Link - 2012-05-22

Fixed.

Globus Toolkit/GT-4

Summary

Can’t build gridftp server with alternate CC, needed by LTA

Details

Type: Bug

Status: Open

Description

This is my script.  It's failing in /Users/karl/koa/lta/mac_gui/ext-globus/gt5.0.2-all-source-installer/source-trees/xio/src (among other places) because configure ignores what I set for CC.

MACOSX_SDK=/Developer/SDKs/MacOSX10.4u.sdk
export CC=/usr/bin/gcc-4.0
export GLOBUS_CC=$CC
export CFLAGS="-DMACOSX_DEPLOYMENT_TARGET=10.4 -isysroot $MACOSX_SDK \
        -mmacosx-version-min=10.4"
export LDFLAGS="-isysroot $MACOSX_SDK"
export LIBTOOL="/tmp/gtscratch/sbin/libtool-gcc32dbg --tag=CC"

#rm -rf /tmp/gtscratch
./configure --prefix=/tmp/gtscratch \
    --disable-system-openssl \
    --with-flavor=gcc32dbg \
    --with-gsiopensshargs='--without-pam'

make gpt
make globus_core
make globus-data-management-server

Please make gridftp work with an alternate CC.  This is needed so I can set Mac OS 10.4+ as the ABI.

Comments

Karl Pickett - 2010-12-07

Since I got tired of fighting the build script complexity and needed to get this working, I just symlinked /usr/bin/gcc to gcc-4.0.  So LTA isn't blocked on this.

Globus Toolkit/GT-5

Summary

globus-url-copy doesn’t pipeline url input from a file

Details

Type: Bug

Status: Open

Description

guc has different queues for URLs from file input and URLs from listing.  When only the input queue has URLs, pipelining is very inefficient and only a small percentage of URLs get pipelined.

This is somewhat related to #KOA-1085 in that only the few pipelined URLs exhibit the bug in that issue.

Comments

Mike Link - 2011-01-21

The fix seems fairly obvious, but it results in URLs being lost if the transfer is cancelled or the process fails, which is a worse outcome than simple inefficiency.

After verifying that this isn't a blocker for KOA, it is lower priority, possibly after 5.0.3.

Globus Toolkit/GT-6

Summary

GridFTP installation on OpenBSD 4.9

Details

Type: Bug

Status: Open

Description

Dear all,

during installation of GridFTP (from GT 5.0.4) on an OpenBSD 4.9 x86_64 virtual machine, I stumbled upon a problem that blocks successful compilation.

I extracted the Globus Toolkit 5.0.4 sources, configured them with:

$ ./configure --prefix=$GLOBUS_LOCATION --with-flavor=gcc32

...and tried to compile and install GridFTP with:

$ time make gridftp

The make run starts with building the "gpt" target. But this fails after some time with the following message:

"
cd gpt && OBJECT_MODE=32 ./build_gpt
[...]
build_gpt ====> building /home/globus/tmp/gt5.0.4-all-source-installer/gpt/packaging_tools
/home/globus/usr/local/globus/sbin/gpt-build -srcdir=source-trees-thr/core/source gcc64dbgthr
sh: NOT: not found
/home/globus/usr/local/globus/etc/gpt/globus_core-src.tar.gz could not be untarred:512
Died at /home/globus/usr/local/globus/lib/perl/Grid/PkgMngmt/ExpandSource.pm line 42.
make: *** [globus_core-thr-compile] Error 2
"

Checking "[...]/ExpandSource.pm", I see that it calls an external function "get_tool_location()" from a non-existent Perl module named "LocalEnv.pm". Only a file "LocalEnv.pm.in" exists, both in the sources dir ("$GLOBUS_SOURCE/gpt/packaging_tools/perl/GPT") and in the installation dir ("$GLOBUS_LOCATION/lib/perl/Grid/GPT").

I don't know if Perl tries to use this module, or just refuses it, because it cannot find the *.pm file for it. The *.pm.in file still contains placeholders like "@@" for gtar, gunzip and others, that weren't replaced by the actual paths to the programs. I assume that this *.pm.in file isn't there by intention. BTW, there are other *.pm.in files located in the same "[...]/packaging_tools/perl/GPT" dir, but the corresponding *.pm files exist.
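
A quick way to confirm that a template was never processed is to scan it for leftover placeholders. The sketch below assumes autoconf-style @NAME@ tokens (the report only shows them as "@@"); the sample text and the names GTAR and GUNZIP in it are hypothetical, not copied from LocalEnv.pm.in.

```python
import re
from typing import List

# Assumption: placeholders look like @NAME@, as autoconf-style templates use.
PLACEHOLDER = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)@")


def unsubstituted_placeholders(text: str) -> List[str]:
    """Return the distinct placeholder names still present, in order of appearance."""
    seen: List[str] = []
    for name in PLACEHOLDER.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Hypothetical template content, for illustration only.
    sample = 'my $gtar = "@GTAR@";\nmy $gunzip = "@GUNZIP@";\n'
    print(unsubstituted_placeholders(sample))
```

An empty result for a generated *.pm file and a non-empty one for its *.pm.in counterpart would be consistent with the substitution step having been skipped.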

To work around this issue, I replaced the calls to "get_tool_location()" for "$gunzip = [...]" and "$gtar = [...]" in lines 83/84 of "[...]/ExpandSource.pm" with the actual paths to the tools. After that, making the gpt target continued successfully. The remainder of the compilation also runs through, and the gridftp target is built successfully.

I don't have much insight to the GPT tools, but I assume there is an error in the process that prepares the Perl modules. Could you have a look?

Best regards,
Frank Scheiner

--
Frank Scheiner

High Performance Computing Center Stuttgart (HLRS)
Department Project User Management & Accounting

Comments

Globus Toolkit/GT-7

Summary

Issues with netlogger style logs

Details

Type: Bug

Status: Open

Description

Dear all,

I've developed a usage numbers collection toolkit using the netlogger style logs provided by the Globus GridFTP service.
Information about GridFTP operations is stored in an SQL DB. This database can be queried later to draw some useful information
from the logged data. For example:

* aggregated traffic for the last month (or another period)
* average performance of transfers
* top senders
* etc.
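
As a rough illustration of that kind of query, the sketch below sums transferred bytes per user from netlogger-style lines. It assumes whitespace-separated KEY=VALUE pairs with USER and NBYTES fields; the field names and the sample lines are assumptions for illustration, not output captured from a server.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_line(line: str) -> Dict[str, str]:
    """Split one log line into a dict of KEY=VALUE pairs, ignoring other tokens."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def bytes_per_user(lines: Iterable[str]) -> Dict[str, int]:
    """Sum NBYTES per USER; lines lacking either field are skipped."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        fields = parse_line(line)
        if "USER" in fields and "NBYTES" in fields:
            totals[fields["USER"]] += int(fields["NBYTES"])
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical log lines; the second one has no USER field.
    sample = [
        "DATE=20110101000000.1 USER=alice NBYTES=100 TYPE=RETR",
        "DATE=20110101000005.2 NBYTES=400 TYPE=RETR",
    ]
    print(bytes_per_user(sample))
```

Lines without a user name simply fall out of the totals, so a log that omits that field cannot contribute to per-user numbers.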

During testing I noticed some issues with the logfiles that hinder the collection or even make it impossible:

Tests with striped transfers showed that a striped transfer is only logged on one "stripe" (usually the last stripe) and the frontend's netlog. The problem is, that this "stripe" only logs the amount of data transferred by itself and not the whole data that was transferred by all stripes. If one wants to know the full size, one has to gather this information from the netlog on the frontend. But the netlog on the frontend doesn't log the user name (if it's not running as root - please correct me if I'm wrong). Additionally the START and DATE values (on frontend and backend) differ slightly, so one also cannot correlate the line from the frontend's netlog with the corresponding line of the last "stripe" to get the user name.
In PRACE we like to monitor availability and performance of our GridFTP servers regularly, but it would be nice if we could filter this "monitoring" traffic. With the user name it would be easy, but as I described this is not really possible, as one can either have the user name (backend netlog) or the full amount of data that was transferred (frontend netlog).

It would be very nice, if either all the backends would log the amount of data they transferred, or if the frontend would log the username and the remote system's IP address (which is also missing in the frontend's netlog, as I recently found out, it's always "0.0.0.0").

Is the current behaviour intended? If you need more details, please let me know.

Best regards,
Frank Scheiner

--
Frank Scheiner

High Performance Computing Center Stuttgart (HLRS)
Department Project User Management & Accounting

Comments

Globus Toolkit/GT-8

Summary

globus-url-copy -rst … -st … segfault if the options are not correctly synchronized

Details

Type: Bug

Status: Open

Description

1) globus-url-copy -rst -rst-retries 15 -rst-interval 1  -st 10 ... 
2) stop the network after a while (e.g. by command: service network stop)
3) the application crashes just after the retries:

Segmentation Fault

The bug is 100% reproducible.

Comment ad 1: the crash occurs if the time needed for retries > stall timeout
In the above use case:
15 retries * (1 sec of interval + time for single retry) > 10 sec of stall timeout
The problem may be worked around by estimating the parameters, but we never know the "time for single retry" exactly,
so potentially any combination of -rst and -st is dangerous.
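
The condition can be written as a small check. This is only the arithmetic from the report, with the unknown per-retry time passed in as a guess; the function name is hypothetical.

```python
def retries_outlast_stall_timeout(retries: int, interval_s: float,
                                  per_retry_s: float, stall_timeout_s: float) -> bool:
    """True when the total retry window exceeds the stall timeout."""
    return retries * (interval_s + per_retry_s) > stall_timeout_s


if __name__ == "__main__":
    # Values from the report: -rst-retries 15, -rst-interval 1, -st 10.
    # Even with a per-retry time of zero, 15 * 1 = 15 s exceeds 10 s.
    print(retries_outlast_stall_timeout(15, 1.0, 0.0, 10.0))
```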

Comment ad 2: probably the same problem will occur in case of other communication errors, but this one is easy to simulate

GT was installed from the EPEL RPM repository (RPM version for CentOS: globus-ftp-client.x86_64 0:5.3-2.el5),
but I expect the same problem with a source-compiled version.

Comments

Globus Toolkit/GT-9

Summary

Failure in globus_ftp_client_operationattr_set_authorization() results in using freed memory

Details

Type: Bug

Status: Resolved 2012-06-05

Description

In globus_ftp_client_operationattr_set_authorization(), if a strdup() fails, i_attr->auth_info.account ends up pointing to freed memory. Specifically, if the strdup() of subject fails, the restoring of i_attr->auth_info.account isn't done correctly. The problem arises if i_attr->auth_info.account isn't NULL, account is NULL, and subject isn't NULL.
I'll attach a patch.

Comments

Mike Link - 2012-06-05

Thanks. This will go out with our 5.2.2 release.

Globus Toolkit/GT-10

Summary

Problem checking directory permissions with MLST

Details

Type: Bug

Status: Open

Description

If I'm not the owner of a directory, but a member of the group who owns the directory, the server does not show me permissions of the directory via MLST.

Example:
The directory chi-vm-4.isi.edu:/test/phaseI is owned by birn-dwei:fbirn_it_docs. I'm not birn-dwei, but I'm in the group fbirn_it_docs, and the permissions of that directory are 0770.

Snippet from the server logfile:
[27614] Tue Nov 10 07:26:20 2009 :: 152.16.51.164:52017: [CLIENT]: MLST /test/phaseI
[27614] Tue Nov 10 07:26:20 2009 :: 152.16.51.164:52017: [SERVER]: 250-status of /test/phaseI
 Type=dir;Modify=20091109221317;Size=4096;Perm=;UNIX.mode=0770;UNIX.owner=birn-dwei;UNIX.group=fbirn_it_docs;Unique=810-b58001; /test/phaseI

RFC 3659 says in section 7.5.5:
   7.5.5. The perm Fact
       The perm fact is used to indicate access rights the current FTP user
       has over the object listed. ...

Perm probably shouldn't be empty. If I own a directory, perm contains the right permissions.
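
For illustration, the sketch below derives a perm fact for a directory from its mode, picking the owner, group, or other bits depending on who is asking. It is Python and not the server's implementation; the function names and numeric ids are hypothetical. Only the letters that depend on the directory's own mode are produced (c, m, p, e, l); d and f depend on the parent directory and are left out, and root and ACLs are ignored.

```python
from typing import Iterable


def effective_bits(mode: int, owner_uid: int, owner_gid: int,
                   uid: int, gids: Iterable[int]) -> int:
    """Return the rwx triplet (0-7) that applies to this user, POSIX style."""
    if uid == owner_uid:
        return (mode >> 6) & 7
    if owner_gid in set(gids):
        return (mode >> 3) & 7
    return mode & 7


def directory_perm_fact(mode: int, owner_uid: int, owner_gid: int,
                        uid: int, gids: Iterable[int]) -> str:
    """Build the subset of RFC 3659 perm letters implied by the directory mode."""
    bits = effective_bits(mode, owner_uid, owner_gid, uid, gids)
    can_read, can_write, can_search = bits & 4, bits & 2, bits & 1
    perm = ""
    if can_write and can_search:
        perm += "cmp"   # create files, make directories, purge entries
    if can_search:
        perm += "e"     # enter (CWD)
    if can_read:
        perm += "l"     # list
    return perm


if __name__ == "__main__":
    # The case from the report: mode 0770, user is not the owner but is in the group.
    print(directory_perm_fact(0o770, 1000, 2000, 1001, [2000]))
```

With a group-aware check like this, the group member in the example would get a non-empty perm value rather than "Perm=;".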

Comments

Globus Toolkit/GT-11

Summary

globus-url-copy -rst segfault if the network is not working

Details

Type: Bug

Status: Open

Description

1) stop the network (e.g. service network stop)
2) run globus-url-copy with any "-rst" option (e.g. globus-url-copy -rst -rst-retries 15 -rst-interval 1 ...)
3) the application crashes just as the retries:

Segmentation Fault

The bug is 100% reproducible.

GT was installed from the EPEL RPM repository (RPM version for CentOS: globus-ftp-client.x86_64 0:5.3-2.el5),
but I expect the same problem with a source-compiled version.

Comments

Globus Toolkit/GT-12

Summary

GUC exits with zero when it hits a stall-timeout

Details

Type: Bug

Status: Open

Description

GFDL has a wrapper on top of guc that does retry transfers when guc exits with an error. As GUC is exiting with zero when it hits a stall-timeout, the tool wouldn't catch the error and retry the transfer.
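
A wrapper of the kind described might look like the sketch below (Python; the GFDL tool itself is not shown in this report, so the function name and retry count are assumptions). It relies entirely on the exit code, so a run that stalls and still exits with zero is treated as a success and never retried.

```python
import subprocess
import sys
from typing import Sequence


def run_with_retries(cmd: Sequence[str], attempts: int = 3) -> int:
    """Run cmd until it exits with 0 or attempts run out; return the last exit code."""
    code = 1
    for _ in range(attempts):
        code = subprocess.run(list(cmd)).returncode
        if code == 0:
            break
    return code


if __name__ == "__main__":
    # A stand-in command that always fails; a real wrapper would pass the guc command line.
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_with_retries(failing, attempts=2))
```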

Comments

Globus Toolkit/GT-13

Summary

Include Python client program in GridFTP releases

Details

Type: New Feature

Status: Open

Description

Scott Koranda has created a Python GridFTP client that is a thin wrapper around our C client.  We would like to enhance that using GRIDFTP-60 (e.g. "C client v42, Python wrapper v5"), and then ship it as part of GT.  This would allow us to monitor usage of this Python client over time.  If we are seeing significant adoption, we can take over support for it ourselves and make it a standard part of Globus.

Comments

Mike Link - 2012-03-07

maybe now that this has been used a bit we can determine if it should be included.

Globus Toolkit/GT-14

Summary

globus-url-copy preserve timestamp

Details

Type: New Feature

Status: Open

Description

feature request or a bug :-)

I see someone else cares enough to write a patch to preserve timestamps though the patch does not separate out this issue. See:
http://gridftp.bio-mirror.net/biomirror/
http://gridftp.bio-mirror.net/biomirror/gt5.0.2_patches.txt

Comments

Globus Toolkit/GT-15

Summary

Add explicit CWD command to client API

Details

Type: New Feature

Status: Resolved 2012-06-05

Description

Condor has been using the gridftp client library to interact with the NorduGrid ARC job scheduler. ARC uses the gridftp protocol as its job submission/management interface. Submitting a new job includes putting a file containing its JDL (job description language). The status of submitted jobs can be monitored by getting a series of files from the server.

In order to support submission to ARC, we had to expose an explicit CWD command in the gridftp client API. In ARC, to submit a new job, the client issues a CWD command to directory /jobs/new. The server's response prints a current working directory of 'jobs/########', where '########' is the id of the new job. The client then puts a file containing the job description to '/jobs/new/job'.

I would like to contribute our patch for inclusion in the gridftp codebase. This would allow us to use the standard gridftp libraries instead of maintaining our own patched version. The patch may also be of value to other users. The patch was made against Globus 5.0.0. Let me know if you have any questions or concerns.

Comments

Mike Link - 2012-06-05

Thanks Jaime, the patch is fine.  This will go out with our 5.2.2 release.

Globus Toolkit/GT-16

Summary

Delete option in GUC

Details

Type: New Feature

Status: Open

Description

GFDL has requested this feature.

Comments

Globus Toolkit/GT-17

Summary

Authz callout for LDAP

Details

Type: New Feature

Status: Open

Description

GFDL is using an authz callout for LDAP (developed by suragrid) in GridFTP. There is no support for this callout. They want such a callout to be included in GT.

Comments

Globus Toolkit/GT-18

Summary

Provide a way for GUC to specify multiple different credentials for multiple different transfers

Details

Type: New Feature

Status: Open

Description

associate each url with a credential set, possibly via existing alias mechanism

Comments

Globus Toolkit/GT-19

Summary

RPM packaging for UDT driver

Details

Type: New Feature

Status: Resolved 2013-11-07

Description

We have some colleagues who would benefit from using UDT (they have a high-bandwidth link, but a relatively high rate of packet loss).  Currently, OSG doesn't ship UDT because RPM support is missing in Globus.

I started an email thread about it a few months back, but I don't think it ever resulted in a ticket.  One sticking point was an RPM of UDT itself.  I have a spec file, attached, that packages UDT.

Once that was done, I got lost in the depths of Globus packaging and wasn't able to get a UDT plugin for GridFTP built.

Comments

Mike Link - 2012-03-29

Thanks Brian.  We talked about this recently and expect to work on it shortly after the 5.2.1 release.

nickbertrand - 2013-04-24

Just curious if any progress has been made on getting the UDT XIO driver packaged. Manually building libglobus_xio_udt_driver.so works, but it would be nice to be able to use an RPM instead.

bbockelm - 2013-04-24

I haven't heard much myself.  Mike?

OSG is still interested in this.

bbockelm - 2013-10-22

In case there was a question, OSG is still interested in this!

Thanks,

Brian

Joe Bester - 2013-11-07

This is included in GT 5.2.5

Globus Toolkit/GT-20

Summary

Document specification for GT GridFTP implementation specific commands/protocol

Details

Type: Task

Status: Resolved 2013-01-09

Description

Document the specification of the commands/protocol added in the GT GridFTP implementation, i.e., the commands that are not available in the OGF GridFTP v1 and v2 specs and the relevant RFCs.

Comments

Raj Kettimuthu - 2012-10-22

A first cut of the command list is available at: http://confluence.globus.org/display/GFTP/GridFTP+Command+List

Stuart Martin - 2013-01-09

resolving this since we have a first version.

Globus Toolkit/GT-21

Summary

Improve init script for gridftp

Details

Type: Task

Status: Open

Description

The GridFTP init script does not follow the fedora init conventions enough to be acceptable to EPEL, or to the OSG packaging effort. GRAM-241 has some info about things that were needed for the GRAM scripts---similar things will need to be done for the GridFTP init scripts. See http://fedoraproject.org/wiki/Packaging:SysVInitScript for more info on the fedora guidelines.

Comments

Globus Toolkit/GT-22

Summary

Include bottleneck detection and netlogger in the default GridFTP build

Details

Type: Task

Status: Open

Description

Bottleneck detection is not supported by most GridFTP servers. The main reason is that this code is not built by default. We need to include it in the default build, as it is critical for identifying the bottleneck in a transfer.

Comments

Globus Toolkit/GT-23

Summary

MLSD does not return broken symlinks

Details

Type: Task

Status: Open

Description

Karl is implementing rm in GO and he runs into this issue. As MLSD does not return broken symlinks, he has to use SITE RDEL and that has a bug too.

Comments

Globus Toolkit/GT-24

Summary

Make DATAIP usage target as default

Details

Type: Task

Status: Open

Description

Without this information, the logs are less useful.

Comments

Globus Toolkit/GT-25

Summary

Wildcard support in MLSD

Details

Type: Task

Status: Open

Description

Many scientific communities have directories that consist of 10,000+ files. In the Globus Online web GUI, they would like to filter files in a directory using wildcards. We need support in GridFTP to list files using wildcards.
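
Without server support, a client can only filter after fetching the full listing. A minimal sketch of that client-side filtering with shell-style wildcards, using Python's fnmatch module; the function name and the file names are hypothetical. It does not avoid transferring the 10,000+ entries, which is what server-side support would address.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def filter_listing(names: Iterable[str], pattern: str) -> List[str]:
    """Keep the names that match a shell-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    listing = ["run1.dat", "run12.dat", "notes.txt"]  # hypothetical directory entries
    print(filter_listing(listing, "run?.dat"))
```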

Comments

Globus Toolkit/GT-26

Summary

Determine any additional information that needs to be added to the GridFTP usage stats packets

Details

Type: Task

Status: Open

Description

Determine the list of additional things that need to be added to the usage statistics packet for GridFTP in the next release. Some of these include:
xio stack in use
gridftp session packet (session can consist of number of individual transfers)

Comments

Globus Toolkit/GT-27

Summary

EPSV spec prohibits responding with data address

Details

Type: Task

Status: Open

Description

The fix for GRIDFTP-185 to fix IPv6 compatibility required removing the data IP address from the EPSV response.  Apparently an earlier version of the spec made the address optional, while the final version prohibits it.  The spec intention is to make NAT traversal easier, but this breaks cases where we legitimately want to have different control channel and data channel addresses.

Some clients fail immediately when encountering the address in the EPSV response (google chrome), while the globus ftp client library silently ignores it and follows the intention of the spec, always connecting to the given port on the control channel address.

We previously extended our own SPAS command to address this issue, but the client library does not support this yet.  "SPAS [1|2]" will respond with (possible multiple) EPSV formatted data contact strings including an ipv4 (SPAS 1) or ipv6 (SPAS 2) ip address.
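
To make the two shapes concrete, the sketch below parses the parenthesised contact string of an EPSV-style reply, with or without an address. It assumes "|" as the delimiter and the "(|proto|addr|port|)" layout; the exact SPAS reply layout is an assumption here, and the sample addresses come from documentation ranges.

```python
import re
from typing import Optional, Tuple

# Assumption: the delimiter is "|", the character RFC 2428 recommends.
_EPSV = re.compile(r"\(\|([12]?)\|([^|]*)\|(\d+)\|\)")


def parse_epsv(reply: str) -> Tuple[Optional[str], int]:
    """Return (address or None, port) from the parenthesised part of a reply."""
    match = _EPSV.search(reply)
    if match is None:
        raise ValueError("no EPSV contact string in reply: %r" % reply)
    _proto, address, port = match.groups()
    return (address or None, int(port))


if __name__ == "__main__":
    # Port-only form: the client connects to the control channel address.
    print(parse_epsv("229 Entering Extended Passive Mode (|||6446|)"))
    # Form carrying an explicit data address.
    print(parse_epsv("229 Entering Extended Passive Mode (|1|192.0.2.10|6446|)"))
```

A client that returns None for the address and falls back to the control channel host behaves like the globus ftp client library described above.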

At some point we will need a solution in order to support both ipv6 and different control/data interfaces at the same time.

Comments

Globus Toolkit/GT-28

Summary

Manpages for some programs in the globus-gridftp-server package

Details

Type: Improvement

Status: Open

Description

Adrian Colesa from the IGE project wrote some missing man pages for the globus-gridftp-server package: https://rt.ige-project.eu/rt/Ticket/Display.html?id=32
I attach these here.

Comments

Globus Toolkit/GT-29

Summary

understand possible performance improvements using dedicated circuits

Details

Type: Improvement

Status: Open

Description

From the "How OSG uses Globus" doc, it was suggested to evaluate if GridFTP performance could be improved when being used over emerging dedicated circuits (as opposed to shared best effort networks)

Comments

Globus Toolkit/GT-30

Summary

GUC gives confusing error for two party DCAU failure

Details

Type: Improvement

Status: Open

Description

The Globus Online team was attempting recursive transfers for ESG using guc's dump-only and sync options.  The user cert used did not have a CA trusted by GO.  Thus, guc would fail the handshake on the data channel and initiate the TCP close, but the error given was:


Details       : error: Unable to list url gsiftp://cmip2.dkrz.de:2812/gpfs_750/transfer/replication_cmip5/cmip5/data/cmip5/output1/MOHC/HadGEM2-ES/piControl/6hr/atmos/6hrPlev/r1i1p1/v20101129/psl/:
globus_ftp_client: the server responded with an error
500 500-Command failed. : an end-of-file was reached
500-globus_xio: The GSI XIO driver failed to establish a secure connection. The failure occured during a handshake read.
500-globus_xio: An end of file occurred
500 End.


Command was:

running: ['/usr/local/globus/bin/globus-url-copy', '-src-cred', '/tmp/koauser1060.1001/tmptek_8rkoaproxy', '-dst-cred', '/tmp/koauser1060.1001/tmpjRR40akoaproxy', '-r', '-sync', '-sync-level', '2', '-do', '-', u'gsiftp://cmip2.dkrz.de:2812/gpfs_750/transfer/replication_cmip5/cmip5/data/cmip5/output1/MOHC/HadGEM2-ES/piControl/6hr/atmos/6hrPlev/r1i1p1/v20101129/psl/', u'gsiftp://cmip-bdm1.badc.rl.ac.uk:2811/disks/drizzle1/archive/test-data/psl/']

Comments

Globus Toolkit/GT-31

Summary

Allow add of custom SITE command without modifying server code

Details

Type: Improvement

Status: Resolved 2013-10-16

Description

This request came from Jason Alt at NCSA, in the context of better integration of GO Transfer with their MSS. He stated that he had to modify the GridFTP server code to add custom commands, which is not ideal, and would like to see a pluggable architecture there.

Rachana

Comments

Globus Toolkit/GT-32

Summary

Add environment variables to enable ftp client support for ipv6

Details

Type: Improvement

Status: Resolved 2013-04-15

Description

This will enable older middleware to work with ipv6.

Should also enable globus_io support, or possibly change the default to allowed.

Comments

Mike Link - 2013-01-14

Fixed for 5.2.4.  Added support for the environment variable GLOBUS_FTP_CLIENT_IPV6.  When defined, it will have the same effect as the API call globus_ftp_client_operationattr_set_allow_ipv6() with a value of TRUE.

fprelz - 2013-01-24

Are there any obvious cons to adding a similar environment variable that would have the effect of calling globus_io_attr_set_tcp_allow_ipv6() directly at the globus_xio level ?

Mike Link - 2013-01-29

Another variable for globus_io will be available in 5.2.4 to do just that: GLOBUS_IO_IPV6

fprelz - 2013-04-15

Hi: I downloaded the GT5.2.4 source. I do find the GLOBUS_IO_IPV6 variable there, but I cannot find
any reference to GLOBUS_FTP_CLIENT_IPV6, mentioned in the first comment to this ticket. Both are needed, as
GLOBUS_FTP_CLIENT_IPV6 will enable the 'extended' FTP protocol commands. Am I missing something ?
Thanks.

Globus Toolkit/GT-33

Summary

Data channel authentication is needlessly failing by trying to validate the user’s own cert

Details

Type: Improvement

Status: Open

Description

This has happened a couple of times in Globus Online file transfer and it is really frustrating to figure out.  We try to allow any *user* cert and only do CA cert checks on hostnames (require IGTF CAs, etc.).  That works fine to log in to a gridftp server and do transfers.  However, when doing a directory listing that uses DCAU, the control channel lib (this happens with our new dirlist tool and guc) ends up looking at signing policy files for the user cert and bombs out if there's a problem.  It would be nice if the user cert were just marked as trusted, period, and DCAU didn't fail if a proxy issued by that cert is returned by the server.

Comments

Globus Toolkit/GT-34

Summary

GSI XIO driver not reporting useful / clear errors to a client

Details

Type: Improvement

Status: Open

Description

On the GO side, we have had two cases thus far where directory listings failed and caused an obscure error:

globus_xio_gsi: gss_init_sec_context failed.
globus_gsi_gssapi: Unable to verify remote side's credentials
globus_gsi_gssapi: SSLv3 handshake problems: Couldn't do ssl handshake
OpenSSL Error: s3_pkt.c:1087: in library: SSL routines, function SSL3_READ_BYTES
: sslv3 alert unsupported certificate SSL alert number 43

One was where we didn't have a signing policy for a cred, the other was when the signing policy check didn't succeed.  (Actually, a third case where we didn't have the CA for the user cred).  All of those gave the fairly useless SSL3_READ_BYTES error from the *server*.

We would much rather have the *client* say "client error: signing policy check failed for cert X" or "signing policy X does not exist" to make this faster to debug.

Comments

Karl Pickett - 2011-07-07

See KOA-1401 for more context.  It took almost a week to track down what was a simple signing policy problem.

Globus Toolkit/GT-35

Summary

Improve error messages

Details

Type: User Story

Status: Resolved 2012-05-02

Description

From user complaints and from internal developer debugging, a number of error messages need to be improved

Comments

Globus Toolkit/GT-36

Summary

Enhancement of GridFTP performance through network reservation integration and hardware offloading

Details

Type: User Story

Status: Open

Description

Enhance GridFTP framework to support and utilize network reservations, including a pluggable interface for existing
and future end-to-end network reservation services, integration of OSCARS and TeraPaths via GridFTP
plugins, support for advanced transport protocols via XIO modules.

Comments

Globus Toolkit/GT-37

Summary

Quantify the benefits of various features in GridFTP

Details

Type: User Story

Status: Open

Description

At the CEDPS review in May '09, we were asked to quantify the benefits to the user community of various GridFTP features developed as part of the CEDPS project. Analysis of usage statistics needs to be improved to get this type of information in an automated fashion.

Comments

Globus Toolkit/GT-38

Summary

Single port GridFTP

Details

Type: User Story

Status: Open

Description

Firewalls pose a problem for data channel establishment in two-channel FTP-based protocols such as GridFTP. Common firewall configurations allow outbound connection requests but block all incoming connection requests. In other words, firewalls often block the path to a listener, thus making it impossible for the listening side of the FTP data channel to be properly contacted. Solutions such as opening a range of ports have been proposed but not embraced by security-conscious system administrators.
Here the idea of single port GridFTP is proposed. Server will listen on single port (2811) for both control and data channels.
- 2811 listener is a little more than current inetd-type process -- it will read one command from a connection to know whether to start a control or data process.
- new connections come in on 2811 and give control channel auth command, and control channel process is forked.
- on control channel, pasv-type command is sent, and response includes a host:2811:token.
- data connection comes in again on 2811, gives data command, and data process is forked.
- data chan process decrypts token with host cert or dummy stripe group cert, which tells it where to contact the control process.
- data process auths to control process with user cred, or whatever it would normally use, and then token is used to determine data transfer details.
- parallelism would always be 1 and multiple connections would only be supported via stripes.

Problems:
- parallel streams are very common, so now every connection takes N times as many resources as before.
- new clients and new servers would be needed at all ends for this to work.  hurts when source server requires this but dest server is not updated.

Possible fixes for problem #2:
- Must require new client. Have the client send a session id to both ends via delegated proxy.  The new server would read that id from data channel auth to know where the data belongs.  The trick is in manipulating the delegated cred to handle the session id.  There is trouble in that for any connection received on 2811, the GridFTP banner message is sent. When the data connection from an old server gets the banner, it will die.  But this can be solved by having separate ports for control and data (2811 for control and 2812 for data).
- No need for new client. Ports 0-1023 are reserved and those values will not be sent in the passive response by existing servers. A single/dual port GridFTP server can use 0-1023 as tokens and send them in the passive response. When the active server receives 0-1023 for the port, it should know that this value is a token and that the server is actually listening on 2811/2812. The problem is that this will work only if there is a new server on both ends.
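
The second fix amounts to a small decision on the receiving side of the passive response. The sketch below is one hypothetical reading of it in Python: a port value of 0-1023 is treated as a token and the connection goes to the fixed data port instead. The function name, the returned tuple, and the choice of 2812 as the data port (the dual-port variant) are assumptions.

```python
from typing import Optional, Tuple

SINGLE_PORT_DATA_PORT = 2812  # assumption: the dual-port variant (2811 control, 2812 data)


def resolve_data_endpoint(host: str, port: int,
                          data_port: int = SINGLE_PORT_DATA_PORT
                          ) -> Tuple[str, int, Optional[int]]:
    """Interpret the port from a passive response.

    Returns (host, port_to_connect, token); token is None for an ordinary port.
    """
    if not 0 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    if port <= 1023:
        # Reserved range: the value is a token, not a real listening port.
        return (host, data_port, port)
    return (host, port, None)


if __name__ == "__main__":
    print(resolve_data_endpoint("gridftp.example.org", 512))
    print(resolve_data_endpoint("gridftp.example.org", 50000))
```

An old server would take the first value literally and try to connect to a reserved port, which is the "new server on both ends" limitation noted above.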

Comments

Globus Toolkit/GT-39

Summary

Design and Implement protocol enhancements to GridFTP that would enable network reservation to be integrated in the GridFTP framework

Details

Type: Technical task

Status: Open

Description

Actual network reservation can be done either by one of the servers involved in the transfer or by the client, although it makes more sense to do it in the client end. Irrespective of where in the framework this interaction is implemented, it makes sense to have the GridFTP client control whether the network reservation needs to be done or not.

Network reservation integration on the GridFTP server: A new command to reserve resources (RSRV) has to be added to the protocol. This command can be used to reserve network bandwidth as well as end system resources such as memory. The GridFTP client has to send this command before the data channel connection is formed. It has to provide information on the resource requirement, such as the amount of bandwidth or memory required, the duration, etc.

GridFTP client globus-url-copy: The GridFTP protocol does not let the data channel connection map {source host, source port, destination host, destination port} be known to the client or the receiving server. Striping and parallel TCP connections introduce added potential complications and limit the client's ability to infer even the limited connection map {source host, destination host}. The control channel hosts and data channel hosts are not the same for striped (or multi-node) transfers. So, protocol changes are needed to provide the client with the data mover information needed to make the reservation. A new command (BIND) that lets the client determine the host and port information for the source data movers is needed. The client can determine the host and port information for the destination data movers using the current GridFTP protocol.

The goal is to design the new commands and implement them.

Comments

Globus Toolkit/GT-40

Summary

Integrate OSCARS with GridFTP/GlobusOnline framework

Details

Type: Technical task

Status: Open

Description

OSCARS is the network reservation system for ESnet. The network reservation component in GridFTP will be developed in a modular fashion so that it will be able to interact with multiple end-to-end reservation systems (e.g., OSCARS, TeraPaths, DRAGON) and can be used in many environments; DOE labs using ESNet, science labs using Internet II etc.

The goal here is to integrate with OSCARS.

Comments

Globus Toolkit/GT-41

Summary

Assist with the development of XIO modules for non-TCP protocols

Details

Type: Technical task

Status: Open

Description

Assist with the creation of XIO modules for emerging alternative protocols such as RDMAoE

Comments

Globus Toolkit/GT-42

Summary

OS native GridFTP-Lite only installer

Details

Type: Technical task

Status: Resolved 2012-05-09

Description

Create OS-native GridFTP-Lite Distribution with the goal to make it available as part of standard OS distributions and be available to a wide range of users.

Comments

Mike Link - 2012-03-07

accomplished in 5.2.0

Globus Toolkit/GT-43

Summary

Ability to allocate GridFTP resources

Details

Type: Technical task

Status: Open

Description

Develop capabilities in GridFTP to enable coarse-grain allocation of system resources such as CPU and memory for data transfers

Comments

Globus Toolkit/GT-44

Summary

audit not working when proxy expires

Details

Type: Bug

Status: Open

Description

When a user submits a job with proxy lifetime < job lifetime, its record never appears in the audit, unless the user restarts the job to check job status on completion, in which case there will be a record.
It seems that when the job manager dies, the job is not audited. The job manager can also die when the GT machine is restarted.

Comments

Globus Toolkit/GT-45

Summary

Manager lock double-locked

Details

Type: Bug

Status: Open

Description

While holding the manager lock, register_job_id is called.  Eventually, one of the child functions calls the manager lock again, resulting in deadlock.
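
The failure mode can be illustrated with a small Python sketch (the job manager itself is C; `manager_lock` and the inner `register_job_id` here are stand-ins, not the real code). A non-reentrant lock cannot be taken again by the thread that already holds it, while a reentrant lock can. The second acquire is non-blocking so the sketch reports the problem instead of hanging.

```python
import threading

def nested_acquire_succeeds(manager_lock):
    """Return True if a thread holding manager_lock can take it again."""
    def register_job_id():
        # Stand-in for the child function that locks the manager again.
        got_it = manager_lock.acquire(blocking=False)
        if got_it:
            manager_lock.release()
        return got_it

    with manager_lock:  # the caller already holds the manager lock
        return register_job_id()

plain_result = nested_acquire_succeeds(threading.Lock())
reentrant_result = nested_acquire_succeeds(threading.RLock())
print("plain lock, nested acquire ok:", plain_result)
print("reentrant lock, nested acquire ok:", reentrant_result)
```

With a blocking acquire the plain-lock case would never return, which is the deadlock described above. Either a reentrant lock or a "caller already holds the lock" variant of the child function avoids it.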

Comments

bbockelm - 2011-08-02

This is really a separate issue, but same idea - double lock.

bbockelm - 2011-08-02

Another deadlock - manager lock is held by expire proxies, then locked again by stop all jobs.

Joe Bester - 2011-10-20

Patches for the 1st and 3rd are committed, still investigating the 2nd.

Globus Toolkit/GT-46

Summary

globus-gatekeeper leaks logfile to globus-job-manager

Details

Type: Bug

Status: Open

Description

If you do an "lsof" on a globus-job-manager, you'll notice that it holds open file handles pointing at the globus-gatekeeper.log.

We should have globus-gatekeeper.log opened with FD_CLOEXEC so the job-manager doesn't inherit it.
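
A minimal sketch of setting the flag, written in Python for brevity (the gatekeeper is C, where the same `fcntl` call with `F_SETFD` applies). The scratch file stands in for globus-gatekeeper.log, and the helper names are made up for the example.

```python
import fcntl
import os
import tempfile

def set_cloexec(fd):
    """Set FD_CLOEXEC on fd so the descriptor is closed across exec()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

def has_cloexec(fd):
    """Report whether FD_CLOEXEC is currently set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)

with tempfile.TemporaryDirectory() as scratch:
    log_fd = os.open(os.path.join(scratch, "gatekeeper.log"),
                     os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    # Python opens descriptors non-inheritable by default; undo that here
    # to mimic a descriptor that a child process would inherit.
    os.set_inheritable(log_fd, True)
    leaked_before = not has_cloexec(log_fd)
    set_cloexec(log_fd)
    protected_after = has_cloexec(log_fd)
    os.close(log_fd)
```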

Comments

Globus Toolkit/GT-47

Summary

globus-job-manager null pointer dereference for some call paths

Details

Type: Bug

Status: Open

Description

In some call paths to restart a job, the **old_job_request object may be NULL.  There is an unchecked dereference, resulting in a segfault.

Note that, based on the code, I'm taking an educated guess of the correct error code.  Would be useful to have an expert review.

Comments

Globus Toolkit/GT-48

Summary

Held Condor jobs should be reported as SUSPENDED

Details

Type: Bug

Status: Open

Description

Adding this bugzilla entry to jira for tracking.

https://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=6768

when a Condor job is in the held state, GRAM should report the job's status as SUSPENDED, since it certainly isn't running.
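
A sketch of the requested mapping, in Python rather than the Perl of the LRM script. The numeric Condor JobStatus codes and the choice of FAILED for removed jobs are assumptions of this example; only the held-to-SUSPENDED entry is what this issue asks for.

```python
# Condor JobStatus code -> GRAM job state name (sketch, not the LRM code).
CONDOR_TO_GRAM = {
    1: "PENDING",    # idle
    2: "ACTIVE",     # running
    3: "FAILED",     # removed (assumed mapping)
    4: "DONE",       # completed
    5: "SUSPENDED",  # held: the change requested here
}

def gram_state(condor_job_status):
    """Translate a Condor JobStatus code into a GRAM state name."""
    try:
        return CONDOR_TO_GRAM[condor_job_status]
    except KeyError:
        raise ValueError("unknown Condor JobStatus: %r" % (condor_job_status,))
```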

Comments

Globus Toolkit/GT-49

Summary

GRAM Fork LRM’s softenv implementation doesn’t work without SEG

Details

Type: Bug

Status: Open

Description

The softenv implementation for fork only occurs in the fork-starter code path, so it won't work for job managers where the SEG is not used for fork.

Comments

Globus Toolkit/GT-50

Summary

Possible memory issues in globus-gram-job-manager-13.34

Details

Type: Bug

Status: Resolved 2012-05-10

Description

We have an OSG site who is complaining about segfaults regularly occurring in globus-job-manager.  See https://ticket.grid.iu.edu/goc/12056 (ticket includes a corefile).

Looking through the core, it segfaults at globus_gram_job_manager_contact.c:1471.  The context pointer from globus_fifo_peek has value 0x20.  I perused the source and didn't find any obvious way for this to occur, making me think there is a memory management issue.

Comments

bbockelm - 2012-04-29

Here's the request from the callback.  Note how several of the fields look nonsensical.

(gdb) p *request
$11 = {config = 0x0, manager = 0x3e54351860, status = 0, expected_terminal_state = 0, status_update_time = 427856416, failure_code = 426510048, gt3_failure_message = 0x0, gt3_failure_type = 0x196498d0 "8.5T\240",
  gt3_failure_source = 0x0, gt3_failure_destination = 0x0, exit_code = 1412765792, stop_reason = 62, job_id_string = 0x0, original_job_id_string = 0x19659150 "(+5Tx", poll_frequency = 426023456, dry_run = 0,
  two_phase_commit = 1, commit_extend = 0, creation_time = 0, queued_time = 267700738144, cache_tag = 0x0, symbol_table = 0x0, rsl = 0x0, rsl_spec = 0x0, jm_restart = 0x0, uniq_id = 0x0, job_contact = 0x0,
  job_contact_path = 0x0, job_state_file = 0x0, scratch_dir_base = 0x0, scratchdir = 0x0, remote_io_url = 0x0, remote_io_url_file = 0x0, x509_user_proxy = 0x0, mutex = {none = 0, pthread = {__data = {__lock = 0,
        __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' , __align = 0}, dummy = 0}, cond = {none = 0, pthread = {cond = {__data = {
          __lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' , __align = 0}, poll_space = 0, space = 0},
    dummy = 0}, client_contacts = 0x0, stage_in_todo = 0x3e54351860, stage_in_shared_todo = 0x0, stage_out_todo = 0xb9a8, stage_stream_todo = 0x2370, jobmanager_state = GLOBUS_GRAM_JOB_MANAGER_STATE_TWO_PHASE_COMMITTED,
  restart_state = GLOBUS_GRAM_JOB_MANAGER_STATE_TWO_PHASE_END, unsent_status_change = 8, poll_timer = 0, pending_queries = 0x18, job_dir = 0x100000055, streaming_requested = 6,
  cache_location = 0xdd18, cache_handle = 0xdd18, job_history_file = 0x18, job_history_status = 0, cached_stdout = 0x4, cached_stderr = 0x0, response_context = 0x100000050,
  old_job_contact = 0x6, seg_event_queue = 0xdd30, seg_last_timestamp = 56624, gateway_user = 0x17b0,
  job_stats = {unsubmitted_timestamp = {tv_sec = 0, tv_nsec = 4}, file_stage_in_timestamp = {tv_sec = 16, tv_nsec = 4294967387}, pending_timestamp = {tv_sec = 6, tv_nsec = 62688}, active_timestamp = {tv_sec = 62688, tv_nsec = 277368}, failed_timestamp = {tv_sec = 0, tv_nsec = 16}, file_stage_out_timestamp = {tv_sec = 0, tv_nsec = 4294967393}, done_timestamp = {tv_sec = 6, tv_nsec = 340056}, restart_count = 340056, callback_count = 0, status_count = 14, register_count = 0, unregister_count = 0, signal_count = 0, refresh_count = 4, file_clean_up_count = 0, file_stage_in_http_count = 0, file_stage_in_https_count = 0, file_stage_in_ftp_count = 103, file_stage_in_gsiftp_count = 1, file_stage_in_shared_http_count = 2, file_stage_in_shared_https_count = 0, file_stage_in_shared_ftp_count = 340080, file_stage_in_shared_gsiftp_count = 0, file_stage_out_http_count = 340080, file_stage_out_https_count = 0, file_stage_out_ftp_count = 32300, file_stage_out_gsiftp_count = 0, client_address = 0x0, user_dn = 0x10},
  job_log_level = 0, log_pattern = 0x10000006f}

Joe Bester - 2012-05-01

How do I download the core file? I see a mention of it but no link.

bbockelm - 2012-05-01

Hi Joe,

Sorry, you may need an OSG login to get attachments.  I have attached the core file to this ticket.

Brian

Joe Bester - 2012-05-01

Also, what OS and job manager RPM is used to create it?

bbockelm - 2012-05-01

I believe the OS is RHEL 5 (I was able to read it with gdb on a fully-updated CentOS 5.8).

Here's the job manager RPM: https://koji-hub.batlab.org/koji/buildinfo?buildID=1777

You can pull the other necessary RPMs (gram-protocol, io, etc) from the same server.  They were all the latest versions.

Joe Bester - 2012-05-01

The stack trace shows it happening during deactivation, so it's probably some callback happening after associated structures have been freed. I'll be able to investigate more tomorrow.

Joe Bester - 2012-05-10

I think this is also related to the allow-manager-restart.patch. The variable manager->done is set to GLOBUS_TRUE when it has outstanding callbacks instead of waiting for things to finish. At deactivation, the event handlers are polled, but the rest of the state has been freed already.

Joe Bester - 2012-05-10

I'll mark this as closed as the offending patch is in response to GT-156 which is open.

Globus Toolkit/GT-51

Summary

command filtering uses uninitialized variable

Details

Type: Bug

Status: Open

Description

The globus-job-manager-script.pl program doesn't pass the $job_description to the run_command() subroutine, so if $FILTER_COMMAND is non-NULL, it will not get the actual executable and arguments from job description passed to it.

Comments

Globus Toolkit/GT-52

Summary

SEG may deadlock with threads

Details

Type: Bug

Status: Open

Description

Investigating a user report, it seems likely that the SEG (at least the PBS SEG module) is capable of hitting deadlock states which prevent jobs from advancing at all from any job states expected by SEG events. This should be investigated and fixed.

Comments

Globus Toolkit/GT-53

Summary

RSL eval doesn’t indicate what symbol was not found

Details

Type: Bug

Status: Open

Description

The RSL substitution evaluation functions do not provide any context information when they fail. Their interface is defined to return only a success or failure value with no other error information included. At the minimum, it would be helpful to be able to have an API function to get info about the last evaluation error for the current thread.
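
One possible shape for such an interface, sketched in Python (the RSL library is C). The function names `rsl_eval` and `rsl_last_error` and the error text are hypothetical; the point is that the evaluator keeps its bare success/failure result while the detail is stored per thread.

```python
import re
import threading

_last_error = threading.local()           # one error slot per thread
_SUBSTITUTION = re.compile(r"\$\((\w+)\)")  # matches $(NAME)

def rsl_eval(text, symbols):
    """Expand $(NAME) references; return None on failure (bare failure code)."""
    _last_error.message = None
    for name in _SUBSTITUTION.findall(text):
        if name not in symbols:
            # Record which symbol was missing for the calling thread.
            _last_error.message = "undefined RSL substitution: " + name
            return None
    return _SUBSTITUTION.sub(lambda match: symbols[match.group(1)], text)

def rsl_last_error():
    """Return the last evaluation error recorded for the current thread."""
    return getattr(_last_error, "message", None)
```

Because the slot is thread-local, a failure in one thread does not overwrite the error seen by another.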

Comments

Globus Toolkit/GT-54

Summary

Globus XIO close call can deadlock

Details

Type: Bug

Status: Open

Description

The globus XIO close call can deadlock - it requires two free threads to complete.

The attached patch makes the code give up on close after 60s.  It will possibly leak some resources, or close things prematurely (if there are pending requests), but it beats a deadlock!
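
The give-up-after-a-timeout idea can be sketched as follows. This is a Python illustration under stated assumptions, not the attached patch: `close_with_timeout` and `close_fn` are made-up names, and the real change lives inside the C XIO code.

```python
import threading

def close_with_timeout(close_fn, timeout_seconds=60.0):
    """Run close_fn in a helper thread and wait at most timeout_seconds.

    Returns True if the close finished, False if the caller gave up.
    An abandoned close may leak whatever close_fn was going to release.
    """
    finished = threading.Event()

    def worker():
        try:
            close_fn()
        finally:
            finished.set()

    threading.Thread(target=worker, daemon=True).start()
    return finished.wait(timeout_seconds)
```

A caller would treat a False result as "resource possibly leaked" and log it rather than block indefinitely.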

Comments

bbockelm - 2011-08-19

While the threaded deadlock was the initial issue, I think this would be useful for non-threaded mode too.  There's no need to block the job-manager for a very long time in order to release the resource.

This is really relevant on the OSG, where we've seen nasty firewalls randomly block connections - it might take multiple hours to close a socket, if you're lucky.

Globus Toolkit/GT-55

Summary

GRAM5 job manager uses a lot of memory when SEG is pointed to incorrect log path

Details

Type: Bug

Status: Open

Description

Vladimir Mencl reports that having the job manager SEG module configured to parse PBS logs from an incorrect path causes it to go into a cycle of high CPU and memory use. This should be detected better and treated as a misconfiguration failure if possible.

Comments

Globus Toolkit/GT-56

Summary

Tear-down of object requires multiple threads

Details

Type: Bug

Status: Open

Description

In threaded mode, calling globus_gram_job_manager_destroy requires multiple threads (as it calls globus_io_close, which blocks on a second callback).  For fork'd g-j-m, no other threads exist.

The attached patch simply doesn't call destroy for these processes.

Comments

Globus Toolkit/GT-57

Summary

Fork LRM doesn’t include softenv RSL attribute in rvf file

Details

Type: Bug

Status: Open

Description

The fork LRM implementation has code to handle the softenv RSL attribute (in some cases) but the attribute is not defined in the fork.rvf file so it can't be used by default without some tricks.

Comments

Globus Toolkit/GT-58

Summary

Globus GRAM return codes

Details

Type: New Feature

Status: Open

Description

This is an IGE project internal ticket that project team has decided to forward to Globus:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The return code from the payload should be returned by Globus GRAM tools (like globus-job-status or globus-job-submit) - or there should be some way to easily and uniformly obtain this return code.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Can you comment if such functionality will/can be implemented?

Comments

Globus Toolkit/GT-59

Summary

Add support for OSG’s "NFS Lite" concept

Details

Type: New Feature

Status: Open

Description

OSG has some patches to GRAM's condor LRM script to avoid using NFS between the service node and compute nodes. We should investigate these patches and get the equivalent functionality into the LRM scripts we distribute.

Comments

Stuart Martin - 2010-02-18

This was brought up at a recent OSG/Globus collaboration meeting...

OSG has added an "NFS lite" job manager, and it would be useful if it was included within Globus. The name is confusing and it can be described instead as the "Condor with file transfer" job manager. The Condor job manager shipped with Globus assumes that Condor relies on a shared file system, but many sites strongly prefer not to use a shared file system for home directories on the gatekeeper, because many NFS implementations do not scale well. (Or if they scale well, they are expensive.) The NFS lite job manager tells Condor to use file transfer instead of a shared file system. The name derives from the fact that an OSG site can use this to eliminate one place where a shared filesystem is required, but it is still required elsewhere.

Globus Toolkit/GT-60

Summary

Create a program to help users and admin debug gram issues

Details

Type: New Feature

Status: Open

Description

GRAM log files can be difficult to read, and it can be hard to find the important error or debug information inside them.  A tool could be written to help a user find the important information inside GRAM log files.

Tasks:
        - Create and review examples with potential users of this program.
        - decide on the format and options required
        - implement program

Comments

Globus Toolkit/GT-61

Summary

Implement watchdog timer for globus-job-manager

Details

Type: New Feature

Status: Open

Description

Every so often, globus-job-manager may deadlock or have threads die.

In such a case, we ought to have a watchdog timer in the main thread that will cause the globus-job-manager to die if it hasn't heard from the Globus callback system in a while.
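
A rough sketch of the idea in Python (the job manager is C; the `Watchdog` class and its method names are hypothetical). The callback system would call `beat()` whenever it runs, and the main thread would call `check()` periodically and exit the process if it returns True.

```python
import threading
import time

class Watchdog:
    """Expire if beat() has not been called within timeout_seconds."""

    def __init__(self, timeout_seconds, on_expire):
        self._timeout = timeout_seconds
        self._on_expire = on_expire
        self._lock = threading.Lock()
        self._last_beat = time.monotonic()

    def beat(self):
        """Called from the callback system to show it is still alive."""
        with self._lock:
            self._last_beat = time.monotonic()

    def check(self, now=None):
        """Called from the main thread; fires on_expire if overdue."""
        now = time.monotonic() if now is None else now
        with self._lock:
            overdue = (now - self._last_beat) > self._timeout
        if overdue:
            self._on_expire()
        return overdue
```

In a real process `on_expire` would log and terminate (for example via `os.abort`), so a supervising service can restart the job manager.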

Comments

Globus Toolkit/GT-62

Summary

Add support for vector operations in gram

Details

Type: New Feature

Status: Open

Description

To improve scalability, performance and efficiency for a single client processing 1000s to 100,000s of jobs, GRAM could be enhanced to allow a client to send a vector of operations.  Currently, for a client to process 1000 jobs, each job requires its own set of operations (round trips) from client to service.  It would be much more efficient for the client to construct a vector of operations and to enhance both the client and service to be able to process a vector.  The operations include job submissions, 2-phase commits, job status queries, job cancels, subscribing for notifications, etc.

Work closely with the Condor-G team (and others) to make sure the new vector operations can be used by them to improve scalability, performance and efficiency.

Comments

Stuart Martin - 2010-02-24

Adding comments from Tuecke:

We shouldn't just hack this into/around the current protocol.  There are a bunch of things we should do to clean up and improve the protocol, so we should consider this item within that larger context.  For example, separating out delegation, removing the need for httpg, operating on a single port rather than having the job manager use an anonymous port, etc.

Joe Bester - 2010-02-24

Are there users looking for that functionality?

Globus Toolkit/GT-63

Summary

Should we drop globus-gram-job-manager-pbs-setup-seg’s dependency on torque-server?

Details

Type: Task

Status: Open

Description

From a user:

While installing and configuring a PBS based CE, I noticed that the globus-gram-job-manager-pbs-setup-seg rpm has a dependency on the torque-server.  However, it looks like the dependency is there only because the SEG uses the PBS accounting logs.  I'd argue that the typical resource is exporting these files to the CE using nfs and would not have the torque server installed on the CE.  Since globus gets the location of the pbs logs from the /etc/globus/globus-pbs.conf file, can we drop this dependency and just document what steps the admin needs to take in order to use the SEG?

What do you think? Is the user right? Should we change it?

Comments

Globus Toolkit/GT-64

Summary

Investigate thread safety of GRAM service

Details

Type: Task

Status: Open

Description

The RPMs generated using the nordugrid patches build everything threaded. There is a suspected bug in the SEG which causes deadlock [GRAM-139]. The job manager and clients have not been tested with threads as part of the release process, so there may be other issues. The task here is to compile and run the tests with threads and try to locate the problems that occur using the existing tests with threaded clients and service implementations. As issues are discovered, other bug issues should be added to jira. The test runs should include the protocol, client, and job manager test suites built with threads, configured with fork and some other LRM, in both cases with and without SEG enabled.

Comments

Globus Toolkit/GT-65

Summary

GRAM records datagram socket failure, but doesn’t record socket name

Details

Type: Task

Status: Resolved 2012-05-11

Description

I'm getting the following warning/error from GRAM:

ts=2012-02-26T10:18:27.325249Z id=6819 event=gram.send_job.end level=WARN status=-3 errno=2 msg="Error creating datagram socket" reason="No such file or directory"

However, I don't know what socket it is trying to create.  The log message should be extended to include this in order to help debugging.
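
For reference, the message above is a sequence of key=value fields with quoted values, which is simple to split mechanically. The Python sketch below (the function name is made up) parses the quoted line and shows that no field carries the socket path.

```python
import shlex

def parse_gram_log_line(line):
    """Split a key=value log line into a dict; quoted values may hold spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields

sample = ('ts=2012-02-26T10:18:27.325249Z id=6819 event=gram.send_job.end '
          'level=WARN status=-3 errno=2 msg="Error creating datagram socket" '
          'reason="No such file or directory"')
record = parse_gram_log_line(sample)
print(sorted(record))  # the field names present in the message
```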

Comments

Joe Bester - 2012-05-11

I've committed an update to that log message to use the correct string for the different error conditions and also to add the path to the socket it tries to connect to.

Globus Toolkit/GT-66

Summary

gram bugzilla cleanup

Details

Type: Task

Status: Resolved 2012-09-12

Description

There are many old bugs in bugzilla that probably no longer apply.  Joe and Stu will review and cleanup the bugzilla gram bugs.

Comments

Stuart Martin - 2012-09-05

Reviewed all open bugs and resolved almost all of them as "won't fix".  Down to just 30 open bugs left.  Next is for Joe to look through the remaining ones.

Joe Bester - 2012-09-12

Zarro Boogs found.

Globus Toolkit/GT-67

Summary

Gather Performance profile of GRAM5

Details

Type: Task

Status: Open

Description

We currently do not have a profile of the execution of the GRAM5 Job Manager. Having such a profile would help us focus on performance optimizations which will most improve GRAM5. I think that we can generate some high-level data from the CEDPS-style logging implementation, but there may be some events which are not logged, or which require additional start or end messages. Otherwise, we will have to add some other metrics collection code to record what is occurring in GRAM5.

Once we have a way to collect this information, we should generate performance profiles for various job loads so that we can have a better view of the performance picture and how it relates to service scalability.

Comments

Globus Toolkit/GT-68

Summary

Create systemd unit files for GRAM5

Details

Type: Task

Status: Open

Description

Fedora has replaced system v-style init scripts with systemd http://freedesktop.org/wiki/Software/systemd which has its own way of doing dependency-ordered process initialization. The GRAM5 services (globus-scheduler-event-generator and globus-gatekeeper) would need to have some glue written to adapt to this startup method.

Comments

Globus Toolkit/GT-69

Summary

comparison doc for GRAM5

Details

Type: Task

Status: Open

Description

Create a comparison document for GRAM5, comparing features with GRAM4 and CREAM, similar to that in http://www.globus.org/alliance/publications/papers/TG07-GRAM-comparison-final.pdf

Comments

Globus Toolkit/GT-70

Summary

Clarify new Condor polling scheme in GT 5.0.2 docs

Details

Type: Task

Status: Open

Description

The GRAM5 docs refer to the condor seg, but that is no longer used in 5.0.2. That is mentioned in the changes page, but the main documentation needs to be cleaned up as well.

Comments

Globus Toolkit/GT-71

Summary

Use SIGQUIT as a trigger to dump the request hash table

Details

Type: Task

Status: Open

Description

One incredibly useful mechanism for debugging in Java is that, when you send the process SIGQUIT, it will dump stack trace of all threads and memory statistics to stdout.

We should have a similar tool for globus-job-manager.  I propose we catch SIGQUIT and dump to the log all items in the request hash and a few pertinent pieces of information (at least job and jobmanager state).
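
A sketch of the proposal in Python, assuming the request table can be viewed as a mapping from job id to a (job state, jobmanager state) pair. That layout, the function names, and the output format are all hypothetical; the real table is a C hash inside the job manager.

```python
import signal
import sys

def format_requests(requests):
    """Return one log line per entry of the request table, sorted by job id."""
    return [
        "job=%s state=%s jobmanager_state=%s" % (job_id, job_state, jm_state)
        for job_id, (job_state, jm_state) in sorted(requests.items())
    ]

def install_sigquit_dump(requests, log=sys.stderr):
    """Dump the request table to log whenever the process receives SIGQUIT."""
    def handler(signum, frame):
        for line in format_requests(requests):
            log.write(line + "\n")
        log.flush()
    signal.signal(signal.SIGQUIT, handler)

table = {"job-2": ("ACTIVE", "TWO_PHASE_END"), "job-1": ("PENDING", "START")}
lines = format_requests(table)
```

Calling `install_sigquit_dump(table)` from the main thread and then sending the process `kill -QUIT <pid>` would write the same lines to the log.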

This would help immensely in debugging GRAM-319.

Comments

bbockelm - 2012-04-06

Joe - have you had time to look at this?  It would be very useful for debugging.  We're still struggling with job-managers that don't shut down, and I'd like to be able to trace it back to jobs.

For example, I have about 100 GRAM jobs in the jobmanager "START" state after a month.  I don't know if that's causing any job managers to stay running.

Joe Bester - 2012-04-06

No, I've been dealing with build and test issues for the 5.2.1 release and haven't had a chance to look at this.

bbockelm - 2012-04-06

Ok, no problem - just wanted to see where your thinking is on this.  Something for $VERSION_NEXT, I suppose.

Globus Toolkit/GT-72

Summary

Sanity Check GRAM5 LRM setup packages

Details

Type: Task

Status: Open

Description

The LRM setup packages treat the perl code installed in $GLOBUS_LOCATION/lib/perl as data that needs to be generated at runtime. It would simplify external packaging if the configuration could be separated from the LRM perl code, and make it have reasonable defaults if the configuration values are not present.

Comments

Joe Bester - 2010-07-13

For the condor LRM, the following is done in the LRM setup package:
1 Modify condor.in to condor.pm to set paths to condor_submit, condor_rm, and condor's mpi script, set environment variable CONDOR_CONFIG, and set an optional config parameter for vanilla jobs.
2 Create share/globus_gram_job_manager/condor.rvf
3 Probe for the current machine's condor os and arch
4 Create etc/grid-services/jobmanager-condor using the globus-job-manager-service.pl script

This could be replaced by a non-setup package that includes the following:
- condor.pm (not .in), with default paths to condor executables and CONDOR_CONFIG.  The default paths can be chosen at configure time, so that we can use the paths of native condor packages as the default.
- condor.rvf as a file instead of generating it by writing to the deploy directory at postinstall time
- jobmanager-condor as a file that is distributed (with substitutions for $prefix done at configure time).

Globus Toolkit/GT-73

Summary

Improve globus_scheduler_event_generator packaging

Details

Type: Sub-task

Status: Resolved 2012-05-02

Description

When the SEG was first designed, it was intended to be run by a GT4 container to process LRM events. One instance ran per container per LRM to process log events.

In GRAM2 and GRAM5, we didn't want to run a SEG per user per LRM, so we wrote a script "globus-job-manager-event-generator" that executes the (potentially privileged) globus-scheduler-event-generator program and writes its output to log files in a compact, LRM-independent format. One globus-job-manager-event-generator program is run per LRM. The job manager uses the  seg_job_manager_module to parse the logs generated by this program.

As a result of this architectural shift, the globus-scheduler-event-generator program is no longer used independently of the globus-job-manager-event-generator script. The functionality of these can be combined into a program that behaves like the latter.

Another thing to add is an init script for the SEG program to allow it to be started at boot time for all installed LRMs. By default, it can search $libdir for libglobus_seg_*.la and start a SEG process for each, with a configuration file to explicitly set which SEGs to start and what user to run those as.
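
The default discovery step could look roughly like the sketch below. It is written in Python for illustration (an init script would more likely be shell), and the function name is made up. The name returned is whatever sits between the `libglobus_seg_` prefix and `.la`, so with flavored library names it would still include the flavor.

```python
import glob
import os
import tempfile

PREFIX = "libglobus_seg_"
SUFFIX = ".la"

def installed_seg_modules(libdir):
    """List SEG module names found as libglobus_seg_*.la under libdir."""
    pattern = os.path.join(libdir, PREFIX + "*" + SUFFIX)
    names = []
    for path in sorted(glob.glob(pattern)):
        base = os.path.basename(path)
        names.append(base[len(PREFIX):-len(SUFFIX)])
    return names

# Demonstration against a scratch directory standing in for $libdir.
with tempfile.TemporaryDirectory() as libdir:
    for fname in ("libglobus_seg_fork.la", "libglobus_seg_pbs.la",
                  "libglobus_common.la"):
        open(os.path.join(libdir, fname), "w").close()
    found = installed_seg_modules(libdir)
print(found)
```

The init script would then start one SEG process per name, unless the configuration file lists an explicit set.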

Comments

Joe Bester - 2011-08-23

I've committed code to trunk that includes an init script for the SEG which determines based on installed files whether to start the SEG or not for each LRM.

I've removed the globus-job-manager-event-generator script and added an option to the SEG to write to log files in a directory like that script used to do, but without the extra forking and parsing code.

Globus Toolkit/GT-74

Summary

Improve fork packaging

Details

Type: Sub-task

Status: Resolved 2012-05-02

Description

Currently the fork LRM is implemented in the following packages:
globus_gram_job_manager_setup_fork
globus_scheduler_event_generator_fork
globus_scheduler_event_generator_fork_setup
globus_scheduler_event_generator_fork_test
globus_fork_starter
globus_scheduler_provider_setup_fork
globus_wsrf_gram_service_java_setup_fork

The last two should be removed from the packaging metadata and "cvs rm"ed as they are gram4/mds4 specific. The first 5 should be combined into the globus_gram_job_manager_fork package.  This package would provide
    lib/perl/Globus/GRAM/JobManager/fork.pm
    lib/libglobus_seg_fork_$(GLOBUS_FLAVOR_NAME).la
    libexec/globus-fork-starter
    etc/globus-fork.conf
    etc/grid-services/jobmanager-fork
    sbin/globus-gram-setup-fork

The fork.pm file will be made a distributed file, instead of shipping fork.in and a script to transform it to fork.pm. The config file will contain the configurable items and the globus-gram-setup-fork program will optionally do the probes. That can be used by either a gpt setup package or the rpm/deb postinstall phase. The default globus-fork.conf file should look something like:

# Path to the fork SEG log file. This is used to tell the fork starter where to
# write log entries and the fork seg module where to read them from.
#
# log_path=${localstatedir}/globus-fork.log

# Path to the mpiexec command used to launch mpi2 jobs
# mpiexec=/usr/bin/mpiexec

# Path to the mpirun command used to run older mpi jobs
# mpirun=/usr/bin/mpirun

# Path to the softenv installation used to set up the job environment
# softenv_dir=
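
To illustrate how a consumer could read such a file with built-in defaults, here is a Python sketch (the real consumers are fork.pm in Perl and the C SEG module). The default values are taken from the commented entries above; stripping optional double quotes around values is an assumption of this example.

```python
DEFAULTS = {
    "log_path": "${localstatedir}/globus-fork.log",
    "mpiexec": "/usr/bin/mpiexec",
    "mpirun": "/usr/bin/mpirun",
    "softenv_dir": "",
}

def parse_conf(text, defaults=DEFAULTS):
    """Parse key=value lines; '#' lines and blanks keep the defaults."""
    settings = dict(defaults)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Assumed: values may optionally be wrapped in double quotes.
            settings[key.strip()] = value.strip().strip('"')
    return settings

example = "# mpirun=/usr/bin/mpirun\nmpiexec=/opt/mpi/bin/mpiexec\n"
settings = parse_conf(example)
```

With every entry commented out, the result equals the defaults, which is the "reasonable defaults if the configuration values are not present" behaviour.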

Comments

Joe Bester - 2011-08-23

I've committed new code to the trunk which includes a common configuration file for all fork operations as well as a static service definition file. The fork LRM source package includes the SEG module, fork starter, and perl module. The default configuration uses the polling method, but the fork SEG can be enabled at install time.

Globus Toolkit/GT-75

Summary

Improve condor packaging

Details

Type: Sub-task

Status: Resolved 2012-05-02

Description

The GRAM Condor LRM packages are split among
globus_gram_job_manager_setup_condor
globus_wsrf_gram_service_java_setup_condor
globus_scheduler_provider_setup_condor

The latter two should be removed from the packaging list and "cvs rm"ed. The first should be changed into a non-setup package that provides the following:
    lib/perl/Globus/GRAM/JobManager/condor.pm
    etc/grid-services/jobmanager-condor
    etc/globus-condor.conf
    sbin/globus-gram-setup-condor

(Note that the condor SEG has been removed as of 5.0.2 and replaced by code in
the job manager to process per-job condor logs)

The perl module will, by default, look in the system default path for condor tools (to work with a natively-packaged condor).  That can be overridden by values in etc/globus-condor.conf (which by default will contain only comments about what parameters are valid and their default values).

The globus-gram-setup-condor script will probe for condor programs and modify the globus-gram-condor.conf file.  A separate GPT setup package can be created which just runs that at postinstall time. For native packages, this can be included in the postinstall rules of the rpm/deb.

I think we should remove condor-os and condor-arch from the default grid-services entry; we can easily compute that information (it's derived from uname()) in the condor.pm module in place of doing so at postinstall time. We can have overrides for that in the globus-condor.conf file. Some admins have asked for more flexibility in those values so that, for example, x86 jobs can be submitted to both INTEL and X86_64 machines. We can accomplish that by defining a multi-value format for the config file.
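
A sketch of how that could work, in Python rather than the Perl of condor.pm. The machine-to-Arch table, the upper-casing of the system name, and the comma-separated multi-value syntax are all assumptions made for illustration; no format has been decided.

```python
import platform

# Assumed mapping from uname machine strings to Condor Arch values.
ARCH_BY_MACHINE = {
    "x86_64": "X86_64", "amd64": "X86_64",
    "i386": "INTEL", "i486": "INTEL", "i586": "INTEL", "i686": "INTEL",
}

def _split_values(value):
    """Split a hypothetical comma-separated multi-value setting."""
    return [item.strip() for item in value.split(",") if item.strip()]

def condor_requirements(config=None, system=None, machine=None):
    """Return (os_values, arch_values), using uname data unless overridden."""
    config = config or {}
    system = system or platform.system()
    machine = machine or platform.machine()
    default_os = system.upper()
    default_arch = ARCH_BY_MACHINE.get(machine.lower(), machine.upper())
    os_values = _split_values(config.get("condor_os", default_os))
    arch_values = _split_values(config.get("condor_arch", default_arch))
    return os_values, arch_values
```

For example, `condor_arch=INTEL, X86_64` in the config file would let x86 jobs match both kinds of machine, while an empty config falls back to the values computed from uname.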

Example configuration file:
#
# Path to the condor_submit executable
# condor_submit=/usr/bin/condor_submit

# Path to the condor_rm executable
# condor_rm=/usr/bin/condor_rm

# Path to the condor configuration file
# condor_config=/etc/condor/condor_config

# Default CondorOS requirement
# condor_os=@CONDOR_OS@

# Default CondorArch requirement
# condor_arch=@CONDOR_ARCH@

# Do file existence checking on jobs in the standard universe. If set to no,
# then jobs which refer to files which do not exist will exit with ambiguous
# error messages. However, if the execution file system is not the same as
# the submit machine's file system, this may cause jobs to fail which would
# run otherwise
# check_vanilla_files=yes

# Path to a script to launch an mpi job in Condor. If set to no, then
# MPI jobs will be rejected
# mpi_script=no

Comments

Joe Bester - 2011-08-23

I've committed code to trunk that creates a new configuration file that contains the condor things we used to probe for at setup time. I've made it so condor_os and condor_arch are not required in the jobmanager config any more. If not present, then the condor-system default will be used. I've also added those into the rvf file, so that jobs can be targeted towards specific architecture/OS.

Globus Toolkit/GT-76

Summary

Improve pbs packaging

Details

Type: Sub-task

Status: Resolved 2012-05-02

Description

The PBS LRM is implemented across the following files:
globus_gram_job_manager_setup_pbs
globus_wsrf_gram_service_java_setup_pbs
globus_scheduler_event_generator_pbs
globus_scheduler_event_generator_pbs_setup
globus_scheduler_event_generator_pbs_test
globus_scheduler_provider_setup_pbs

The wsrf and scheduler_provider setup packages can be removed from CVS as they aren't needed for GRAM5.
The others can be combined into a single package that provides:

    lib/perl/Globus/GRAM/JobManager/pbs.pm
    lib/libglobus_seg_pbs_$(GLOBUS_FLAVOR_NAME).la
    etc/globus-pbs.conf
    etc/grid-services/jobmanager-pbs
    sbin/globus-gram-setup-pbs

The globus-gram-setup-pbs probes for pbs tools and offers command-line options for the other parameters, modifying the globus-pbs.conf file. The pbs.pm file will not be autoconf substituted, but instead will read the config file for values. The setup program can be run as a setup package or via native packaging postinstall support in the rpm/deb.

The default configuration file should look something like this:
# Path to mpiexec program to launch mpi2 tasks
# mpiexec=/usr/bin/mpiexec

# Path to mpirun program to launch older-style mpi tasks
# mpirun=/usr/bin/mpirun

# Path to qsub program to submit a job to the LRM
# qsub=/usr/bin/qsub

# Path to qstat program to poll a job's status
# qstat=/usr/bin/qstat

# Path to the qdel program to cancel a job
# qdel=/usr/bin/qdel

# Flag indicating whether PBS is configured as a cluster or simple SMP machine
# cluster=yes

# Number of compute elements per schedulable nodes
# cpu_per_node=1

# Remote shell program to start executables on different nodes in the
# $PBS_NODEFILE
# remote_shell=/usr/bin/ssh

# Path to the softenv installation used to set up the job environment
# softenv_dir=

Comments

Joe Bester - 2011-08-23

I've committed code to trunk to combine the pbs SEG and perl modules into a single package with a shared configuration file.

Globus Toolkit/GT-77

Summary

Improve lsf packaging

Details

Type: Sub-task

Status: Resolved 2012-08-21

Description

The LSF LRM implementation consists of the following packages:
- globus_gram_job_manager_setup_lsf
- globus_wsrf_gram_service_java_setup_lsf
- globus_scheduler_event_generator_lsf
- globus_scheduler_event_generator_lsf_setup
- globus_scheduler_event_generator_lsf_test
- globus_scheduler_provider_setup_lsf

The wsrf and provider packages can be removed from CVS. The others can be combined into a gram lsf package that provides:

    lib/perl/Globus/GRAM/JobManager/lsf.pm
    lib/libglobus_seg_lsf_$(GLOBUS_FLAVOR_NAME).la
    etc/globus-lsf.conf
    etc/grid-services/jobmanager-lsf
    sbin/globus-gram-setup-lsf

The substitutions previously done to lsf.in can be done instead to globus-lsf.conf, so that the deployed script isn't modified and all of the configuration parameters are clearly described. The globus-gram-setup-lsf program can probe for lsf tools and offer command-line options for the other parameters. A separate GPT setup package will be created which just runs that at postinstall time. For native packages, this can be included in the postinstall rules of the rpm/deb.

Example Configuration file:
# Path to the LSF shell profile
# lsf_profile=/opt/lsf/conf/profile.lsf

# Path to mpirun program to launch older-style mpi tasks
# mpirun=/usr/bin/mpirun

# Path to bhist program to poll a job's status
# bhist=. $lsf_profile && bhist

# Path to bsub program to submit an LSF job
# bsub=. $lsf_profile && bsub

# Path to the bjobs program to get information about a job
# bjobs=. $lsf_profile && bjobs

# Path to the bkill program to cancel a job
# bkill=. $lsf_profile && bkill

Comments

Joe Bester - 2012-05-09

I've been talking with OSG about access to an LSF system. This and GT-96 should be doable once I have access.

Joe Bester - 2012-05-09

Likely not going to have access until after mid-June.

Joe Bester - 2012-08-21

I've committed a new LSF package and RPM/Debian metadata to the 5.2 branch and trunk.

Globus Toolkit/GT-78

Summary

Improve SGE packaging

Details

Type: Sub-task

Status: Resolved 2012-05-02

Description

Update the SGE LRM package to be less dependent on GPT setup scripts. Move configuration code out of the LRM perl module into a configuration file. Combine the SEG and LRM modules into a single source package.

Comments

Joe Bester - 2011-08-30

Committed changes and metadata to trunk last week after testing.

Globus Toolkit/GT-79

Summary

Add a high-level diagram for the approach doc

Details

Type: Improvement

Status: Open

Description

A high-level diagram is needed in the "approach" documentation to help people understand the GRAM5 architecture.

Comments

Globus Toolkit/GT-80

Summary

globus-job-manager-event-generator loads all historical events the first time run

Details

Type: Improvement

Status: Open

Description

The first time the globus-job-manager-event-generator is run, it will read all existing LRM logs and write SEG events for them. In the case of a heavily used system or one with a long history, this can take a very long time and use much CPU. It might make sense to either add a command-line option to skip the historical events or base it off of the time when the software was installed, with the assumption that no events relevant to a new GRAM installation will occur before GRAM is installed.
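
The "base it off of the install time" idea could look roughly like the sketch below. It is written in Python for illustration only; the line layout ("ISO-8601 timestamp, space, rest") and the function name events_since are assumptions, and real LRM logs use their own formats.

```python
from datetime import datetime, timezone


def events_since(log_lines, cutoff):
    """Yield (timestamp, rest) for lines stamped at or after cutoff.

    Assumes a hypothetical layout of '<ISO-8601 timestamp> <rest of line>'.
    """
    for line in log_lines:
        stamp, _, rest = line.partition(" ")
        when = datetime.fromisoformat(stamp)
        if when >= cutoff:
            yield when, rest


# Hypothetical install time and log contents.
install_time = datetime(2012, 5, 1, tzinfo=timezone.utc)
log = [
    "2011-01-15T08:00:00+00:00 job 101 done",
    "2012-05-02T09:30:00+00:00 job 202 pending",
]
recent = list(events_since(log, install_time))
```

A command-line option to skip history would amount to passing the current time as the cutoff instead of the install time.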

Comments

Globus Toolkit/GT-81

Summary

Debug/verbose flags for globusrun, globus-job-run

Details

Type: Improvement

Status: Open

Description

An option for globusrun and globus-job-run like the -dbg option for gridftp, which turns on higher levels of debugging statements.

Comments

Globus Toolkit/GT-82

Summary

SGE on Ranger loading softenv instead of modules

Details

Type: Improvement

Status: Open

Description

sge.pm does not support modules on TACC Ranger. The MPI LD_LIBRARY_PATH and other variables are not getting loaded, and MPI jobs do not run through gram5 unless the user environment is explicitly added to the user login environment. Can sge.pm be changed to support modules as a permanent solution?

Thanks,

Comments

Stuart Martin - 2010-09-29

I'm following up with Warren to figure out what makes sense to do here.  Thanks for the report Suresh.

Globus Toolkit/GT-83

Summary

Add gram-level prologue and epilogue script execution for mpi jobs

Details

Type: Improvement

Status: Resolved 2013-12-06

Description

In the "how OSG uses Globus" doc (May 2009), OSG requests that GRAM provide support for gram-level prologue and epilogue script execution for mpi jobs (but it could be for any job).

Comments

Joe Bester - 2012-09-12

See https://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=5698 for more info

Stuart Martin - 2013-01-11

Here is some relevant work done in the past by LRZ. It was done a long time ago by a former colleague Gabriel Mateescu.
http://www.grid.lrz-muenchen.de/en/mware/globus/download_preamble.html

helmut - 2013-07-26

One often wants to execute some setup work before running an application. A typical example is setting the environment by way of the module command before running an MPI application.

The straightforward, but not elegant, way of doing that is to create a script, say mpijob.sh, which contains

module load mpi
mpirun -np 16 my_mpi_app
This has several disadvantages: (1) it requires submitting a script (mpijob.sh) to the execution site in addition to the job description; (2) it requires hard-coding the number of processes in the script; (3) it forces the user to specify low-level and site-dependent information such as the MPI-launcher program, e.g., mpirun or mpiexec.

A better way is to include the setup work as part of the job description submitted to Globus, and to leave the MPI-launching mechanism to be handled by the execution site. That is, we would like something like


  my_mpi_app
  ${GLOBUS_USER_HOME}
  ...
  16
  ...

  mpi
  
    
    module load mpi
    
  


This was supported in GT4 and should also be supported in GT5!

A very detailed description of the solution can be found at
http://www.lrz.de/services/compute/grid_en/software_en/preamble_support_en/

Stuart Martin - 2013-12-06

I talked with Helmut and we decided this is not a priority.  It can be reopened if/when things change.

Globus Toolkit/GT-84

Summary

softenv extensions for GRAM5

Details

Type: Improvement

Status: Open

Description

GRAM4 has softenv extensions but GRAM5 does not. Nanohub has requested this feature in GRAM5

Comments

Stuart Martin - 2011-09-15

Another request came in for this via gt-user  http://www.mail-archive.com/gt-user@lists.globus.org/msg02712.html

Globus Toolkit/GT-85

Summary

configurable control of number of perl scripts that can run simultaneously

Details

Type: Improvement

Status: Open

Description

The GRAM5 code by default will run up to 5 perl scripts per job manager simultaneously. We should probably investigate whether having more than 1 is worthwhile, and if so, make it a tunable parameter from the job manager configuration file.
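
To make the "tunable parameter" concrete, here is a minimal Python sketch of a cap on simultaneously running scripts that also reports the peak concurrency actually reached, which could help when investigating whether more than 1 is worthwhile. The name run_scripts and the max_scripts parameter are hypothetical; this is not the job manager's code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_scripts(tasks, max_scripts=5):
    """Run callables with at most max_scripts in flight.

    Returns (results in input order, peak number of tasks that ran at once).
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def wrapped(task):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return task()
        finally:
            with lock:
                state["active"] -= 1

    # The pool size is the tunable limit.
    with ThreadPoolExecutor(max_workers=max_scripts) as pool:
        results = list(pool.map(wrapped, tasks))
    return results, state["peak"]


tasks = [lambda i=i: i * i for i in range(10)]
results, peak = run_scripts(tasks, max_scripts=2)
```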

Comments

Globus Toolkit/GT-86

Summary

Modify job directory to increase number of concurrent jobs

Details

Type: Improvement

Status: Open

Description

The GRAM5 code uses a common parent directory for creation of the job-specific directories, which contain proxy, stdout, stderr, and lrm-specific files. On some filesystems (ext[234]), there is a limit to the number of hard links a file (in this case, the links ".." in the subdirs) can have, so after creating a large number of jobs, errors like the following occur

GRAM Job submission failed because mkdir failed: /home/bamboo/.globus/job/ip-10-190-201-161/16217855108623346281.6322278966767028136: Too many links (error code 22)
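
One possible way to stay under a per-directory link limit is to spread job directories across hash-derived subdirectories. The Python sketch below only illustrates that idea; the function name, the choice of SHA-1, and the two-level layout are assumptions, not a description of what GRAM5 does.

```python
import hashlib
import os


def sharded_job_dir(root, host, job_id, levels=2, width=2):
    """Return a job directory path nested under hash-derived subdirectories.

    With width=2 hex characters per level, each level has at most 256 entries.
    """
    digest = hashlib.sha1(job_id.encode("utf-8")).hexdigest()
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, host, *shards, job_id)


job_id = "16217855108623346281.6322278966767028136"
path = sharded_job_dir("/home/user/.globus/job", "host-a", job_id)
```

Because the shard names are derived from the job id alone, the directory for an existing job can be recomputed without any lookup table.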

Comments

Globus Toolkit/GT-87

Summary

Improve developer doc for a reliable client

Details

Type: Improvement

Status: Open

Description

It is not easy to figure out from our developer doc how to write a reliable gram client.  Condor-G has done it, but I don't think many others have.  The doc should be improved to describe how to write a reliable multi-threaded gram client using GRAM5.

Comments

Globus Toolkit/GT-88

Summary

Improved error codes and error reporting for users

Details

Type: Improvement

Status: Open

Description

More helpful error codes to assist in debugging would be a big help.  Perhaps a larger number of more specific errors could be used, or errors could include stack-trace-esque information that could be used to help debug.

Comments

Globus Toolkit/GT-89

Summary

Add fallback to poll when SEG does not respond with events

Details

Type: Improvement

Status: Open

Description

For any LRMs that can optionally use SEG or poll for LRM job monitoring, if SEG is configured but SEG events are not being received quickly enough (indicating problems or misconfiguration), the GRAM service should fall back to using poll to get the job status. Additionally, an error or warning should be output to notify the admin about the problem.
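
A minimal sketch of the fallback decision, in Python for illustration. The class name, the stale_after threshold, and the injected clock are all hypothetical; the real job manager is written in C and nothing here reflects its internals.

```python
import time


class JobMonitorMode:
    """Pick 'seg' or 'poll' depending on how recently a SEG event arrived."""

    def __init__(self, stale_after=300.0, clock=time.monotonic):
        self.stale_after = stale_after  # seconds without events before falling back
        self.clock = clock
        self.last_event = clock()
        self.warned = False

    def record_seg_event(self):
        """Call whenever a SEG event is received."""
        self.last_event = self.clock()
        self.warned = False

    def mode(self):
        if self.clock() - self.last_event > self.stale_after:
            if not self.warned:
                # Warn once per outage so the admin is notified without log spam.
                print("WARNING: no SEG events received recently; falling back to poll")
                self.warned = True
            return "poll"
        return "seg"
```

Silence from the SEG can also mean that no job changed state, so a real implementation would probably only count time during which active jobs exist.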

Comments

bbockelm - 2012-08-07

Hi Joe,

Any progress to report on this?  Last update I saw was in GT-225 in mid-June.

I ask because we've recently had another issue with a PBS site losing status updates.  We haven't been able to track down the precise issue, but I suspect it might be something this would avoid.

Thanks,

Brian

Globus Toolkit/GT-90

Summary

make the job audit logging easy to deploy

Details

Type: Improvement

Status: Open

Description

To be considered in the 5.2 repackaging work: make the job audit logging easy to deploy, ideally by installing a single package.

Comments

Globus Toolkit/GT-91

Summary

tracking gram client software

Details

Type: Improvement

Status: Open

Description

This was a significant issue last year when we were trying to understand what clients were out there using gram.

It is similar to GRIDFTP-60.

Comments

Globus Toolkit/GT-92

Summary

transition from httpg to https

Details

Type: Improvement

Status: Open

Description

The GRAM5 security protocol requires GSI delegation. Because of this, standard SSL implementations cannot be used. Delegation could be factored out of the GRAM protocol and reimplemented in the application layer.

Comments

Globus Toolkit/GT-93

Summary

Add job name attribute to SGE lrm adapter

Details

Type: Improvement

Status: Open

Description

Raminderjeet Singh has requested support for the "name" RSL attribute for the SGE LRM adapter. It currently is only implemented in the PBS adapter.

Comments

Globus Toolkit/GT-94

Summary

simplify the throughput tester program and use improved version as doc

Details

Type: Improvement

Status: Open

Description

The current throughput tester program could be simplified by stripping out the GRAM4-isms.

We probably want throughput tester programs written in C and Java.  These programs should be doc'd well in order to use them as GRAM5 client code examples.

Comments

Joe Bester - 2010-03-22

I found that there actually is a C version of the throughput tester in CVS, though it hasn't been making it to the releases. It lives in gram/testing/throughput/source

Globus Toolkit/GT-95

Summary

Add support for a "managed fork" service

Details

Type: Improvement

Status: Open

Description

OSG asks that GRAM2 add support for a "managed fork" service.  Today, condor is used.  But the requirements may be simple enough for improvements to be made to Fork in order to avoid the condor dependency.

Comments

Joe Bester - 2009-06-09

It would probably be much more straightforward to implement a cap on the number of active fork jobs that the job manager will process in gram5 than in gram2 because the single job manager process knows about all of the jobs.

Globus Toolkit/GT-96

Summary

Updating adapter for LSF v7

Details

Type: Improvement

Status: Resolved 2012-08-21

Description

Hi Stu,

I'm helping deploy some infrastructure at UNC Chapel Hill, and have run
into some GRAM5 (5.0.3) issues I would like help with.

The site is running LSF 7, and I have already made some changes to
lsf.pm (diff attached). I suspect that your lsf.pm was developed
against LSF 6, and that could explain the changes necessary. I think I
have a good handle on this problem, and we might even expand on the
changes once we get past the two problems below.

We are running without SEG. SEG testing will be done once we know that
the basic mode works. The client is Condor-G.

Problem 1: Sometimes, maybe every 100 jobs or so, a job status becomes
"stuck". For example, globus-job-status will keep on returning ACTIVE
forever. It seems like the poll method in lsf.pm is never reached, so I
assume that the status is just picked up from the state file. I'm
wondering why it is not updated, and why poll() is not called in this
case. Killing the globus-job-manager process, and then running
globus-job-status again will return DONE.

Problem 2: Possibly related to high load and/or NFS mounted home
directories, but sometimes we end up with errors about not being able to
open lock files. For example:

ts=2011-08-10T17:06:16.943530Z id=12875 event=gram.state_file_read.end
level=ERROR gramid=/16145785936629637586/6793341820049045889/
status=-158
path=/usr/local/globus-5.0.3/tmp/gram_job_state/job.gnet641.its.unc.edu.16145785936629637586.6793341820049045889
msg="Error opening job lock file" errno=2 reason="No such file or directory"

I don't have a good idea for why this is happening, but I feel like the
handling of the state files and lock files could probably be made a
little bit more robust.

Any help appreciated,

--
Mats Rynge
USC/ISI - Pegasus Team 

Comments

Joe Bester - 2012-08-21

I've incorporated the lsf patches into the new version that will be in 5.2.3 as RPM and deb packages. There have been quite a few changes to the lock file handling since this report. I think this issue should be fixed as well.

Globus Toolkit/GT-97

Summary

security concerns with gass file staging

Details

Type: Improvement

Status: Open

Description

OSG recently raised a security issue with file staging in Condor-G using GRAM. Condor-G uses a long-running GAHP service for submitting and processing a user's jobs. The GAHP server starts a GASS server for file staging. If a user's proxy is stolen, then it could be used to push/pull files from/to the GAHP's GASS server.

2 ideas proposed by OSG:
  a) change to a model where files are pushed to the GRAM service
  b) restrict the files/dirs available to a GASS server
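
Idea b) boils down to checking every requested path against a list of allowed directories after resolving symlinks and ".." components. The Python sketch below shows that check in isolation; the function name and the directory layout are hypothetical, and it is not GASS server code.

```python
import os


def is_path_allowed(requested, allowed_roots):
    """Return True if requested resolves to a location inside one of allowed_roots."""
    real = os.path.realpath(requested)
    for root in allowed_roots:
        root_real = os.path.realpath(root)
        # commonpath equals the root only when 'real' is the root or lies beneath it.
        if os.path.commonpath([real, root_real]) == root_real:
            return True
    return False
```

Resolving the path before comparing is the important step: a plain string-prefix test would accept something like /stage/../secret.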

Comments

Globus Toolkit/GT-98

Summary

Define and implement site specific callouts in the GRAM LRM interface

Details

Type: Improvement

Status: Open

Description

Sites add functionality to the GRAM LRM submission script.  Typically, a patch
is written and maintained.  It is applied to each release / install.  This
method can be problematic since the patch may not apply cleanly depending on
the changes made in the script.  One possibility would be to define a set of
callout(s) that will be run in the script.  Sites can then implement a callout
function instead of a patch.  This would avoid the fragile patch method.

Comments

Stuart Martin - 2010-02-18

Here are some details from OSG on this topic:

Currently OSG is carrying a couple of patches to Globus that would be better handled by having a plugin mechanism, so we don't need to change the Globus source code. The two main examples:

 We patch GRAM to extend the environment with OSG-specific environment variables. It would be great if there was a hook for us to do this, so we wouldn't need to patch each of the job managers.
 We patch GRAM so when a job finishes, we collect OSG-specific accounting information. Again, a properly placed hook would mean that we don't need to patch Globus.
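
The two OSG examples map naturally onto named hook points. The following Python sketch shows the general shape of such a callout mechanism; the registry class, the hook names pre_submit and job_done, and the SITE_APP_DIR variable are all hypothetical, and the actual LRM scripts are Perl.

```python
class CalloutRegistry:
    """Hold site-provided functions keyed by a named hook point."""

    def __init__(self):
        self._hooks = {}

    def register(self, point, func):
        self._hooks.setdefault(point, []).append(func)

    def run(self, point, job):
        """Call each hook for this point in registration order; hooks may modify job."""
        for func in self._hooks.get(point, []):
            func(job)
        return job


registry = CalloutRegistry()
accounting_log = []


def add_site_environment(job):
    # Extend the job environment with a site-specific variable (hypothetical name).
    job["environment"]["SITE_APP_DIR"] = "/opt/site/app"


def collect_accounting(job):
    # Record accounting information when the job finishes.
    accounting_log.append((job["id"], job.get("exit_code")))


registry.register("pre_submit", add_site_environment)
registry.register("job_done", collect_accounting)

job = {"id": "42", "environment": {}, "exit_code": 0}
registry.run("pre_submit", job)
registry.run("job_done", job)
```

With this shape a site ships only its hook functions, so a new release of the submission script does not require re-applying a patch.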

Globus Toolkit/GT-99

Summary

improve error output for globusrun

Details

Type: Improvement

Status: Open

Description

When David Carver was recently deploying and testing GRAM5, he could have used better error messages.  He used this table to help find solutions:  http://www.globus.org/toolkit/docs/5.0/5.0.1/execution/gram5/user/#gram5-error-codes
From that he requested the "possible solution" info be added to the error output of globusrun.

Here is an example of an error message you get from GRAM in GT 5.0.1:
-------
$ globusrun -s -r never-1.ci.uchicago.edu "&(executable=/bin/notThere)"
GRAM Job failed because the executable does not exist (error code 5)
-------

David's recommendation for the error output is this:
-------
$ globusrun -s -r never-1.ci.uchicago.edu "&(executable=/bin/notThere)"

GRAM Job failed because the executable does not exist
Error Code: 5
Reason: the executable "/bin/notThere" does not exist
Possible Solution: Check that the RSL executable attribute refers to an executable that exists on the target system.
-------

Looks like a good suggestion to me.
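
A small sketch of how the recommended output could be assembled from an error code and a lookup table of possible solutions. It is written in Python for illustration (globusrun is a C program); the table and the function name format_gram_error are hypothetical, and only the entry for error code 5 comes from the example above.

```python
# Hypothetical lookup table; only code 5 is taken from the ticket text.
POSSIBLE_SOLUTIONS = {
    5: "Check that the RSL executable attribute refers to an executable "
       "that exists on the target system.",
}


def format_gram_error(code, message, reason=None):
    """Build a multi-line error report; optional parts are omitted when unknown."""
    lines = [message, f"Error Code: {code}"]
    if reason:
        lines.append(f"Reason: {reason}")
    solution = POSSIBLE_SOLUTIONS.get(code)
    if solution:
        lines.append(f"Possible Solution: {solution}")
    return "\n".join(lines)


report = format_gram_error(
    5,
    "GRAM Job failed because the executable does not exist",
    reason='the executable "/bin/notThere" does not exist',
)
print(report)
```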

Comments

Globus Toolkit/GT-100

Summary

Investigate how to setup GRAM5 services in a HA setup

Details

Type: User Story

Status: Open

Description

ATLAS (and others) want to be able to cluster a set of GRAM2 services in a HA setup to provide greater scalability and reliability.

Comments

Stuart Martin - 2010-02-18

This was reiterated at a recent OSG/Globus collaboration meeting...

It would be really nice if there was a well-understood and tested mechanism to provide load balancing and failover in GRAM by having multiple gatekeepers. This is not trivial because submitting a job to one gatekeeper creates state on that gatekeeper. That said, production sites would like to find ways to keep a site running when a single gatekeeper goes down.

Globus Toolkit/GT-101

Summary

GASS Cache doesn’t check for updates

Details

Type: User Story

Status: Open

Description

This is from Ian Stokes-Rees (OSG) by way of Alain Roy:

We are struggling with a challenge presented by GASS cache.  A common mode for us to work in is:

1. User develops job script ~/osg/work/run.sh
2. User submits 20 jobs Monday which use run.sh as executable
3. User looks at results Tuesday and tweaks run.sh
4. User submits 20 more jobs Tuesday which use run.sh as executable.

We have just discovered that the GASS cache (at least at many sites with their current setting) will result in the FIRST "run.sh" script from Monday being used instead of the SECOND "run.sh" script edited on Tuesday.  Not surprisingly, this is undesired behavior.
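
The behavior users expect is a cache that revalidates an entry before reusing it. The Python sketch below shows that for local files by comparing size and modification time; it is only an analogue of the problem, since the GASS cache is keyed by URL and would need to obtain a timestamp or checksum from the remote side. The class name FileCache is hypothetical.

```python
import os


class FileCache:
    """Cache file contents, revalidating against size and modification time."""

    def __init__(self):
        self._entries = {}  # path -> ((size, mtime_ns), data)

    def get(self, path):
        st = os.stat(path)
        signature = (st.st_size, st.st_mtime_ns)
        cached = self._entries.get(path)
        if cached and cached[0] == signature:
            return cached[1]  # unchanged since it was cached
        with open(path, "rb") as handle:
            data = handle.read()
        self._entries[path] = (signature, data)
        return data
```

In the Monday/Tuesday scenario above, the tweaked run.sh has a different signature, so the second batch of jobs would pick up the new script.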

Comments

Globus Toolkit/GT-102

Summary

client connections can’t be timed out

Details

Type: User Story

Status: Open

Description

The GRAM client API uses the io_compat API instead of XIO directly.  In order to set timeouts, the GRAM client API needs to use XIO directly.  This should be changed in GRAM5.

Jaime Frey wrote:
Today, I investigated a problem observed by Igor Sfiligoi (CC'd) that I was hoping you could comment on. He's submitting Condor-G glideins to numerous sites. In the past couple days, all gram commands started returning connection failures, though the sites were functioning normally. His gahp_server had run out of fds (fd limit of 1024). All of the fds were tied up in established connections to 20 job-managers at one site, up to 100 connections to each job-manager. The number of connections to each job-manager was roughly equal to the number of gram commands the gahp had sent to that job-manager. The gahp was trying to read data off of the connections. When I killed the jobmanagers, all of the fds were quickly closed.
The gahp server was linked with Globus 4.0.5 libraries. The remote site appears to be running a late-2.4.x release of Globus. My question is why didn't the Globus libraries in the gahp server time out on the connections after several minutes? Are timeouts off by default in globus_xio?
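
For readers unfamiliar with the failure mode: a read with no timeout blocks forever if the peer never answers, which keeps the descriptor open. The sketch below shows the general remedy using plain Python sockets. It does not use or describe the globus_xio API; read_reply is a hypothetical helper.

```python
import socket


def read_reply(sock, nbytes, timeout=5.0):
    """Read up to nbytes, giving up after timeout seconds; returns None on timeout."""
    sock.settimeout(timeout)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


# Demonstration with a local socket pair: the peer stays silent at first,
# so the first read times out instead of blocking indefinitely.
client, server = socket.socketpair()
silent = read_reply(client, 1024, timeout=0.1)
server.sendall(b"ok")
answered = read_reply(client, 1024, timeout=0.1)
client.close()
server.close()
```

A caller that gets None back can close the connection and release the file descriptor, which is what was missing in the scenario described above.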

Comments

Globus Toolkit/GT-103

Summary

GRAM refresh credentials test sometimes fails because job terminates

Details

Type: Bug

Status: Open

Description

The tests which refresh credentials in the globus_gram_client_test package sometimes fail if the job terminates before the refresh completes. I've marked this as a TODO test in 5.0.5 so that it won't affect bamboo build results.

Comments

Globus Toolkit/GT-104

Summary

API docs are not easily searchable

Details

Type: Bug

Status: Open

Description

I find myself looking up specific function calls, and the current page does not make this easy. A search field would be nice, or an index containing all functions that I could Ctrl-f and link to the description.

Comments

Globus Toolkit/GT-105

Summary

NMI Build of Globus 5.0.1 for Debian 5.0 Platform is failing

Details

Type: Bug

Status: Resolved 2012-05-10

Description

The build appears to abort compilation when attempting to pull in ssl. This URL references the full output:
http://nmi-s003.cs.wisc.edu/nmi/index.php?page=results/runDetails&runid=231648&MetronomeSessID=d3h57hrb1ejv8p3hiiceuqc535&opt_user=wmihalo

Comments

Globus Toolkit/GT-106

Summary

Free requirement for cred_get_subject_name not in API docs

Details

Type: Bug

Status: Open

Description

This applies to cred_get_issuer_name as well. There is a comment:

/* ToDo: This logic needs fixing. The issuer_name is passed up and is
             freed by the caller - but it must be freed with OPENSSL_free(),
             not free() and the caller cant be expected to know that */

But the doc header (and generated API doc) does not say anything about using OPENSSL_free. cred_get_X509_*_name do document their free requirements. I imagine the behavior is relied on by many applications at this point, so just documenting the requirement seems the best course.

Comments

Globus Toolkit/GT-107

Summary

GSI XIO Driver hangs in delegation code

Details

Type: Bug

Status: Open

Description

Some of the globus_io test cases fail or hang because of a bug in the XIO GSI driver. The problem occurs in the handling of the GLOBUS_XIO_GSI_ACCEPT_DELEGATION and GLOBUS_XIO_GSI_REGISTER_ACCEPT_DELEGATION cntl implementations. If the read registered to get the accept_sec_context tokens reads all or part of the accept_delegation token, that data will be ignored by accept_sec_context(), but the cntl implementation will then try to register another read to get that token. If the token has already been completely read, the server will hang in the select waiting for the token; if the token has been partially read, the server could get an incorrect token length. The solution is probably to use the same buffering code used in the init/accept sec context code so that tokens are properly read. I've not seen anything that uses this except the globus_io_accept_delegation and globus_io_register_accept_delegation code and the tests for that code.
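
The buffering idea can be sketched as follows: bytes that arrive beyond the end of one token are kept and used for the next token instead of being dropped. This Python sketch assumes a 4-byte big-endian length prefix purely for illustration; the real GSI token framing and the driver's buffering code may differ, and the class name TokenBuffer is hypothetical.

```python
import struct


class TokenBuffer:
    """Accumulate stream bytes and hand out complete length-prefixed tokens."""

    HEADER = struct.Struct(">I")  # assumed framing: 4-byte big-endian length

    def __init__(self):
        self._pending = bytearray()

    def feed(self, data):
        """Append bytes from a read, however many tokens (or fragments) they span."""
        self._pending.extend(data)

    def next_token(self):
        """Return the next complete token, or None if more bytes are needed."""
        if len(self._pending) < self.HEADER.size:
            return None
        (length,) = self.HEADER.unpack_from(self._pending)
        end = self.HEADER.size + length
        if len(self._pending) < end:
            return None
        token = bytes(self._pending[self.HEADER.size:end])
        del self._pending[:end]  # keep any bytes that belong to the next token
        return token
```

If one read returns the last context token plus the start of the delegation token, the leftover bytes stay in the buffer, so the next read neither hangs nor misparses the token length.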

Comments

Globus Toolkit/GT-108

Summary

--libdir is being ignored

Details

Type: Bug

Status: Open

Description

Not even sure if this is the right project, but when building the 5.2 toolkit from source, configure --libdir=X is ignored on linux x64 and libdir keeps getting set to prefix/lib64. I don't want lib64.

Comments

alina - 2012-09-26

This issue affects our project because it is expecting to find globus libraries inside lib instead of lib64. I tried to build with --libdir, but it is ignored.
We use globus 5.2.1.

Thanks,
Alina

Globus Toolkit/GT-109

Summary

gsissh init script is broken

Details

Type: Bug

Status: Resolved 2013-03-22

Description

5.2 install from source, prefix to /usr/local/foo.

The installed sbin/SXXsshd has two problems.

. ${GLOBUS_LOCATION}/libexec/globus-script-initializer  # <-- libexec/ does not exist, It should be share/globus instead?
. ${libexecdir}/globus-sh-tools.sh

PID_FILE=${localstatedir}/sshd.pid  # <--- Should be gsisshd.pid
SSHD=${sbindir}/sshd

Comments

Jim Basney - 2012-05-28

Thanks for the report. gsi_openssh_setup-5.5-src.tar.gz in gsi_openssh_bundle-5.5-src.tar.gz contains the updated paths.

Globus Toolkit/GT-110

Summary

GSS_I_DISALLOW_ENCRYPTION not being enforced by GSI C GSSAPI

Details

Type: Bug

Status: Open

Description

The GSSAPI Extensions documents (http://www.ggf.org/documents/GFD.24.pdf  and older version at http://www.ggf.org/security/gsi/draft-ggf-gss-extensions-07.pdf) define id-gss-ext-context-opts-disallow-encryption and GSS_DISALLOW_ENCRYPTION respectively. GSI C defines GSS_DISALLOW_ENCRYPTION and when specified sets the GSS_I_DISALLOW_ENCRYPTION flag on the context, but the flag is NOT enforced.

It is recommended that either the flag be enforced or GSI C modified to return a "Not Implemented" error when GSS_DISALLOW_ENCRYPTION is specified by applications. The latter is probably preferable.

Comments

Globus Toolkit/GT-111

Summary

MyProxy SRPM in 5.1 fails to build

Details

Type: Bug

Status: Open

Description

MyProxy SRPM in 5.1.1 fails to rebuild.  It appears that when we run %check, it makes a call to "grid-proxy-init", which is not brought in by the BuildRequires:

Not sure of the correct Globus project to file this under, but hopefully this gets to the right place.

Comments

Globus Toolkit/GT-112

Summary

Building source-trees-thr/database/c/sqlite/sqlite-3.3.17 causes problems

Details

Type: Bug

Status: Open

Description

The readline header file is detected but the readline library (-lreadline) is not; this happens when GT is built with a 64-bit flavor and the installed readline is 32-bit. However, the macro -DHAVE_READLINE=1 is still set. It looks like the configure script decided that the readline library is also available. This causes problems because symbols that should be defined in the readline library cannot be found.

[gt-installer]$ ./configure --prefix=/home/condor/execute/dir_4041/userdir/install --with-flavor=gcc64dbg --with-buildopts='-verbose' LDFLAGS=-L/usr/local/lib



CPP='/prereq/gcc-3.4.3/bin/gcc -E'; export CPP; CPPFLAGS=' -I/home/condor/execute/dir_4041/userdir/install/include -I/home/condor/execute/dir_4041/userdir/install/include/gcc64dbgpthr'; export CPPFLAGS; CFLAGS='-g -D_REENTRANT -std=gnu99 -D_XOPEN_SOURCE=600 -D__EXTENSIONS__ -m64 -D_REENTRANT -Wall'; export CFLAGS; LDFLAGS='-L/usr/local/lib -L/home/condor/execute/dir_4041/userdir/install/lib -m64 '; export LDFLAGS; LIBS='-lsocket -lnsl -lpthread -lposix4'; export LIBS; CXX='/prereq/gcc-3.4.3/bin/g++'; export CXX; CXXCPP='/prereq/gcc-3.4.3/bin/g++ -E'; export CXXCPP; CXXFLAGS='-g -D_REENTRANT -m64 '; export CXXFLAGS; AR='/prereq/binutils-2.21/bin/ar'; export AR; ARFLAGS='ruv'; export ARFLAGS; RANLIB='/prereq/binutils-2.21/bin/ranlib'; export RANLIB; NM='/prereq/binutils-2.21/bin/nm -B'; export NM; CC='/prereq/gcc-3.4.3/bin/gcc'; export CC; ./configure --prefix=/home/condor/execute/dir_4041/userdir/install --enable-threadsafe --disable-tcl



checking for readline in -lreadline... no
checking readline.h usability... no
checking readline.h presence... no
checking for readline.h... no
checking for /usr/include/readline.h... no
checking for /usr/include/readline/readline.h... no
checking for /usr/local/include/readline.h... no
checking for /usr/local/include/readline/readline.h... yes



creating libsqlite3_gcc64dbgpthr.la
(cd .libs && rm -f libsqlite3_gcc64dbgpthr.la && ln -s ../libsqlite3_gcc64dbgpthr.la libsqlite3_gcc64dbgpthr.la)
./libtool --mode=link /prereq/gcc-3.4.3/bin/gcc -g -D_REENTRANT -std=gnu99 -D_XOPEN_SOURCE=600 -D__EXTENSIONS__ -m64 -D_REENTRANT -Wall -I. -I./src -DNDEBUG -DTHREADSAFE=1 -DSQLITE_THREAD_OVERRIDE_LOCK=-1 -DSQLITE_OMIT_LOAD_EXTENSION=1 -DHAVE_READLINE=1 -I/usr/local/include/readline -lpthread \
   -o sqlite3 ./src/shell.c libsqlite3_gcc64dbgpthr.la \
   -lcurses -lrt
/prereq/gcc-3.4.3/bin/gcc -g -D_REENTRANT -std=gnu99 -D_XOPEN_SOURCE=600 -D__EXTENSIONS__ -m64 -D_REENTRANT -Wall -I. -I./src -DNDEBUG -DTHREADSAFE=1 -DSQLITE_THREAD_OVERRIDE_LOCK=-1 -DSQLITE_OMIT_LOAD_EXTENSION=1 -DHAVE_READLINE=1 -I/usr/local/include/readline -o .libs/sqlite3 ./src/shell.c ./.libs/libsqlite3_gcc64dbgpthr.so -lpthread -lcurses -lrt -R/home/condor/execute/dir_4041/userdir/install/lib
Undefined                       first referenced
 symbol                             in file
write_history /home/condor/execute/dir_4041/ccdjLdc4.o
stifle_history /home/condor/execute/dir_4041/ccdjLdc4.o
read_history /home/condor/execute/dir_4041/ccdjLdc4.o
readline /home/condor/execute/dir_4041/ccdjLdc4.o
add_history /home/condor/execute/dir_4041/ccdjLdc4.o
ld: fatal: Symbol referencing errors. No output written to .libs/sqlite3
collect2: ld returned 1 exit status
make[1]: *** [sqlite3] Error 1
make[1]: Leaving directory `/home/condor/execute/dir_4041/userdir/gt-installer/source-trees-thr/database/c/sqlite/sqlite-3.3.17'
ERROR: Build has failed
make: *** [globus_database_sqlite-thr-compile] Error 9

Comments

Globus Toolkit/GT-113

Summary

Usage stats uploader gets confused after it hits an error

Details

Type: Bug

Status: Resolved 2012-12-04

Description

When the globus-usage-uploader hits an error trying to upload a set of data to the database, it aborts the current transaction. This has the side-effect of getting some of its locally-cached state out of sync, so that some things in future uploads which refer to the same host or service might end up with ids which aren't in the database and cause further errors. I think this is why when the usage stats uploader is behind and tries to load multiple files of packets to the database, they tend to be all-or-nothing failures. I think it should be possible to flush some of the caches when a transaction is aborted and keep things working (though somewhat slow). It might be better to separate the transactions into smaller chunks and then write failed chunks of packets to an error file instead of doing it at a full hour's worth of data level. In any case, data is not lost in this case, but requires some intervention to get into the database.
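
The "smaller chunks plus cache flush" suggestion could be structured roughly like this. The sketch is Python for illustration; upload_in_chunks, the upload_chunk callback, and the plain dictionary standing in for the locally cached state are hypothetical and do not reflect the uploader's real code or schema.

```python
def upload_in_chunks(packets, upload_chunk, cache, chunk_size=100):
    """Upload packets chunk by chunk; return the packets from chunks that failed.

    upload_chunk(chunk, cache) is supplied by the caller and raises on failure,
    in which case the caller's transaction for that chunk is assumed rolled back.
    """
    failed = []
    for start in range(0, len(packets), chunk_size):
        chunk = packets[start:start + chunk_size]
        try:
            upload_chunk(chunk, cache)
        except Exception:
            cache.clear()         # cached ids may refer to rows that were rolled back
            failed.extend(chunk)  # a real uploader would write these to an error file
    return failed
```

A failure then costs one chunk and a cache refill rather than a full hour of data, and later chunks that refer to the same host or service start from a consistent state.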

Comments

Joe Bester - 2012-12-04

I think this is fixed with the latest changes to the uploader which roll back to sane states when a packet upload fails.

Globus Toolkit/GT-114

Summary

i18n rules in installer don’t work

Details

Type: Bug

Status: Open

Description

The i18n rules in the installer's makefile for 5.0.0 and 5.0.1 don't work, as the i18n code and bundle aren't included in the installer. Either the rules should be eliminated or the i18n code should be included in the installer.

Comments

Steve Tuecke - 2010-03-31

Given IGE's mission of making Globus more European-friendly, they are likely to care a lot about Internationalization (i18n).  So we should figure out what work is involved in bringing this up-to-date with i18n, and then work with IGE to figure out who should do what.

Globus Toolkit/GT-115

Summary

Missing dependencies in the myproxy deb package for ubuntu

Details

Type: Bug

Status: Open

Description

When installing myproxy, a few dependencies are not listed as requirements.  In order to install myproxy, we need to install the following packages explicitly:

libglobus-usage0 libglobus-gss-assist3 globus-proxy-utils myproxy

Comments

Globus Toolkit/GT-116

Summary

doc how to build and create a source and binary RPM and Deb package to work with GT 5.2.x

Details

Type: Documentation

Status: Open

Description

step by step instructions for creating source RPMs and binary RPMs.  Same for Debian packages.

Comments

Globus Toolkit/GT-117

Summary

document callback_data lifecycle

Details

Type: Documentation

Status: Open

Description

I tried to re-use a globus_gsi_callback_data_t when making calls to globus_gsi_cred_verify_cert_chain, and discovered that it consumed ~100kb per call until _destroy was called. This was surprising behavior to me as a new GT user. I am also re-using a globus_gsi_cred_handle_t to load different certificates, which does work.

I assume that callback_data is designed to be passed down a call stack, so it can't return an error on attempted re-use. I think it would be helpful to add something to the callback_data_init/destroy API docs along these lines:

The callback_data will grow every time it is passed to a GT function; it is designed to be passed down a call stack, but it should be destroyed after each top level call is made.

Comments

Globus Toolkit/GT-118

Summary

IPv6

Details

Type: New Feature

Status: Open

Description

This is an internal ticket of IGE project:

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hello,

IPv6 is becoming more and more an issue everywhere. Can Globus work with IPv6? This is more a question than a request. However, if the answer is no, it cannot work with IPv6, then the request will be to support IPv6 in Globus.

Thanks!
Helmut
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Comments

Globus Toolkit/GT-119

Summary

Make the authz and mapping callout order configurable

Details

Type: New Feature

Status: Open

Description

Here is a request from Oscar Koeroo 

Making the "gsi-authz.conf" file use the order in which the
"globus_mapping" and "globus_authorization" statements are listed. Top
first, followed by another one. At the moment we have to push both the
call-outs to our LCAS authorization and LCMAPS mapping frameworks in
the "globus_mapping" specific call-out as it does not fit any of our
deployment scenarios to perform the mapping process before an
authorization call-out. We procure poolaccounts or other stateful
system resources at the mapping sequences which should not be procured
when the authorization failed.
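
The requested ordering can be illustrated with a small Python sketch. It is only a model under assumptions: the three-column line layout (callout type, library, symbol), the library paths, and the symbol names are hypothetical, and neither function is part of any Globus API. The point is simply that callouts run in file order and that a failed authorization stops processing before any mapping callout can procure resources.

```python
def ordered_callouts(conf_text):
    """Return (callout_type, library, symbol) tuples in file order.

    Assumes a hypothetical layout of one callout per line:
        <callout type> <library> <symbol>
    Blank lines and lines starting with '#' are skipped.
    """
    callouts = []
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 3:
            raise ValueError("malformed callout line: %r" % line)
        callouts.append(tuple(parts))
    return callouts


def run_callouts(callouts, handlers):
    """Invoke handlers in listed order and stop at the first failure.

    `handlers` maps a callout type to a function returning True on success.
    Returns the list of callout types that actually ran.
    """
    ran = []
    for callout_type, _library, _symbol in callouts:
        ran.append(callout_type)
        if not handlers[callout_type]():
            break
    return ran


# Hypothetical configuration: authorization is listed first.
EXAMPLE_CONF = """\
# type               library                        symbol
globus_authorization /path/to/liblcas_example.so    authz_callout
globus_mapping       /path/to/liblcmaps_example.so  mapping_callout
"""
```

With a failing authorization handler, run_callouts never reaches globus_mapping, so no pool account would be procured in this model.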

Comments

Globus Toolkit/GT-120

Summary

Add support for self-signed X.509 certificates

Details

Type: New Feature

Status: Open

Description

InCommon is moving to self-signed X.509 certificates, and away from having their own CA.

https://spaces.internet2.edu/display/InCCollaborate/X.509+Certificates+in+Metadata

This is something we talked about doing in Globus years ago.  We should take a look at exactly what they are doing in InCommon, and consider adding support for that approach in Globus.  At a minimum, support would presumably include making sure our C and Java security libraries will correctly validate a self-signed cert, and that the gridmap file can refer to them.

This may be useful for usability.  For example, it might provide a means for zero-config deployment of GridFTP servers that can be used by Globus.org.

Comments

Globus Toolkit/GT-121

Summary

include IGTF CA certificate distribution

Details

Type: New Feature

Status: Open

Description

To be considered in the 5.2 repackaging effort, include IGTF CA certificate distribution - and bundle in fetch-crl

Comments

Globus Toolkit/GT-122

Summary

Implement Globus native packaging plan

Details

Type: New Feature

Status: Open

Description

Implement the plan created in RIC-60

Comments

Globus Toolkit/GT-123

Summary

Add lcas and lcmaps as a new authorization option for GT5 services

Details

Type: New Feature

Status: Open

Description

Oscar Koeroo will provide a snapshot of the LCAS and LCMAPS (L&L) authorization code.  Integrate the L&L authorization code into Globus Toolkit 5 releases.  Retain gridmap authorization as the default.  Add new configuration options to use L&L authorization instead of gridmap for GRAM, GridFTP, and GSI-OpenSSH.

Comments

Globus Toolkit/GT-124

Summary

Homebrew package for OS X users

Details

Type: New Feature

Status: Open

Description

It would be awesome to get a Homebrew (http://mxcl.github.com/homebrew/) package for Globus Toolkit for those of us working on OS X.

Comments

Globus Toolkit/GT-125

Summary

Build RPMs for Fedora 17

Details

Type: Task

Status: Resolved 2012-06-05

Description

Fedora 17 is scheduled for release in May. We should add support for it in its current state so that we can shake out any GT-related bugs prior to its release.

Comments

Joe Bester - 2012-04-25

I've built the 5.2.1 binaries with a fedora 17 beta release, but the rpm signing tool seems to be missing or renamed, so I've disabled the fedora 17 builds for the 5.2.1 release. I'll move this to unscheduled and we can reschedule when fedora 17 release process completes.

Joe Bester - 2012-05-22

I've updated the fedora 17 amis to get the latest testing packages and the builds are working ok. I'll still have to update the ami when fedora 17 final occurs which should be in next sprint.

Joe Bester - 2012-05-29

The ami is now updated to the fedora 17 final, and I've begun a new build with that ami.

Joe Bester - 2012-06-05

This is in the updates repo. To install, first get http://www.globus.org/ftppub/gt5/5.2/testing/packages/rpm/fedora/17/x86_64/Globus-testing-config.fedora-17-1.noarch.rpm

Globus Toolkit/GT-126

Summary

gpt-build defaults to putting things in non-FHS locations

Details

Type: Task

Status: Open

Description

As described by Jim Basney:



The changes in GT 5.1.0 are causing some issues when I try to install
things to $GLOBUS_LOCATION using gpt-build.

For example, when I do 'make gsi-openssh install' in the
gt5.1.0-all-source-installer directory, I get things installed in:

  $GLOBUS_LOCATION/share/globus
  $GLOBUS_LOCATION/share/man

which I believe is due (at least in part) to the config.site file in the
gt5.1.0-all-source-installer directory setting
libexecdir='${datadir}/globus'. But then when I do 'gpt-build
gsi_openssh_bundle-5.3-src.tar.gz gcc32dbg' I get things installed in:

  $GLOBUS_LOCATION/libexec
  $GLOBUS_LOCATION/man

presumably because there's no config.site file in this case. So rather
than upgrading the gsi_openssh files I have installed, I end up with new
versions installed in $GL/libexec and $GL/man while the old versions in
$GL/share are untouched.

When I try 'gpt-build myproxy-5.4.tar.gz gcc32dbg' using a GT 5.1.0
$GLOBUS_LOCATION I get:

  ERROR: Flavor gcc32dbg has not been installed
  ERROR: Build has failed

then after I do

  ln -s $GLOBUS_LOCATION/share/globus/flavors \
        $GLOBUS_LOCATION/etc/globus_core

I get:

  $GLOBUS_LOCATION/libexec/globus-build-env-gcc32dbg.sh:
  No such file or directory
  ERROR: Build has failed

So I try:

  ln -s $GLOBUS_LOCATION/share/globus/globus-build-env-gcc32dbg.sh \
        $GLOBUS_LOCATION/libexec/globus-build-env-gcc32dbg.sh

and then get:

  $GLOBUS_LOCATION/sbin/libtool-gcc32dbg: No such file or directory

so it seems the current myproxy package isn't compatible with GT 5.1.0.

If I create a new myproxy GPT package from GT 5.1.0, then I get similar
errors when trying to install it using gpt-build to a GT 5.0.3
$GLOBUS_LOCATION.

Is this expected? Will it no longer be possible for me to release GPT
packages that can be installed via gpt-build into $GLOBUS_LOCATION for
the different supported GT versions because of GT 5.1.0 GPT changes?
I'll need to release different myproxy & gsi_openssh GPT package
versions for GT 5.0.x versus GT 5.1.x/5.2.x?

Comments

Eric Blau - 2012-05-30

There's a new option to gpt-build on the 5.2 branch.
The way it works is that GPT now installs a file similar to the 5.2 installer's config.site
file as part of the GPT installation, and then, if one does
   gpt-build -fhs 
it will put that file's location into CONFIG_SITE, and put that inside CONFIGENV_GPTMACRO.
Upshot is that your build will then get the right libexecdir set.

This seems not to have been included in the 5.2.1 source installer, so it needs to be included in 5.2.2 and newer.

Globus Toolkit/GT-127

Summary

Check 5.0 Delegation Doc for Relevancy

Details

Type: Task

Status: Open

Description

From Cristina:

Can you verify the delegation pieces displayed in this section of the GT Users' Guide (for GT5)?

http://www.globus.org/toolkit/docs/5.0/5.0.0/user/#gtuser-security

Comments

Globus Toolkit/GT-128

Summary

Add support for using external packages in the installer

Details

Type: Task

Status: Open

Description

Some of the packages distributed in GT 5.0.x are just gpt-wrapped versions of external programs. It would be helpful for the native repackaging effort to drop those so that we can better meet native packaging policies and reduce our support overhead.

There are three main ways I see to do this:
* Do as we do with globus_openssl in GT 5.0.x, where we have gpt packages that are just metadata wrappers around external libraries. The pro for this is that we'd not have to change metadata for dependent packages. The con is that we still have packages to deal with that don't really bring much value.
* Hand code configure.in rules to detect and use the external libraries in packages that need them. This is done in the globus_openssl package. We'd have to duplicate those configuration rules (which use pkg-config if present, falling back to environment variables, falling back to configuration options).
* Add a prerequisite for pkg-config, and use its autoconf macro packages to handle external dependencies. The pro for this is that we'd get some decent handling of compile and link lines for most everything we depend on and the actual expression of dependencies is quite simple (something like PKG_CHECK_MODULES([LIBXML2], [libxml-2.0]) is all we'd need in the configuration scripts). The con is that we'd add another external dependency.

It would be helpful to have this work done in such a way that the top-level configuration script for the installer is able to detect missing dependencies prior to trying to build dependent packages. This may be hard to automate, but the number of external dependencies is rather small, so some hard-coding might be ok.

Comments

Eric Blau - 2010-04-09

Easiest approach is GPT metadata wrappers.  Most comprehensive approach is hand coded configure.in rules.  I don't think we can necessarily count on pkg-config files to exist on all systems on which GT will be built (several TeraGrid systems have neither pkg-config nor .pc files for their libraries).  I like the globus_openssl approach.

Globus Toolkit/GT-129

Summary

Add Lintian testing to debian package build script

Details

Type: Task

Status: Open

Description

Debian has a tool, "lintian", which checks debian packages for compliance with the debian policies. I've looked at it a little, but haven't figured out how to run it. We should probably add it to the build system before 5.2 is released, so that the packages will be more useful to Mattias for inclusion in debian.

Comments

Globus Toolkit/GT-130

Summary

Myproxy should not have flavored libraries

Details

Type: Task

Status: Open

Description

Myproxy is creating flavored libraries, and putting its headers into flavored directories:

Also, I notice that GT 5.1.0 puts myproxy.h in

  $GLOBUS_LOCATION/include/globus//myproxy.h

rather than

  $GLOBUS_LOCATION/include//myproxy.h

and most other GT 5.1.0 headers are in

  $GLOBUS_LOCATION/include/globus

(i.e., not in a  subdirectory). Should I be changing something
in the myproxy package to put headers in
$GLOBUS_LOCATION/include/globus? I notice that myproxy is also the only
package installing flavored libraries in GT 5.1.0, which is probably
related.

Comments

Globus Toolkit/GT-131

Summary

Implement regular Solaris Build and Tests at NMI server for the LIGO project

Details

Type: Task

Status: Open

Description

Implement a cron job on the NMI server at Wisconsin that will reliably run Solaris Build and Tests for the LIGO project and for other users that might be interested in GT on Solaris.

Comments

Lukasz Lacinski - 2011-03-27

We have got the x86_64_sol_5.10 platform.

Globus Toolkit/GT-132

Summary

Document test-toolkit package and use this software for daily NMI tests

Details

Type: Task

Status: Open

Description

test-toolkit, which ships with all of the GT software, needs better documentation. It should be placed into production on the NMI server for TEST runs. We are currently running gram-gridftp-test-package for TEST runs on the NMI server and have discovered that some critical branches of the TEST runs complete "successfully" without making any runs. This needs to be fixed. We also need to better document this useful utility so that other people within the GT community can make better use of it.

Comments

Globus Toolkit/GT-133

Summary

Figure out RLS external dependencies

Details

Type: Task

Status: Resolved 2012-05-02

Description

For native packaging it would be helpful to create external dependencies for the packages we ship in GT5 but which don't contain any patches beyond packaging-related ones. RLS depends on iodbc, sqlite, sqlite_odbc, and psql_odbc. It's not clear to me whether those are all needed at compile time or whether RLS is safe to be updated for this work.

Tasks:
* Determine feasibility of updating RLS (who can do this?)
* Remove dependencies on sqlite, sqlite_odbc, and psql_odbc from RLS server and setup packages
* Add code to detect libiodbc and generate appropriate compile and link lines to RLS build

Comments

Stuart Martin - 2010-04-06

Comments and Tasks from Rob Schuler:

Since the RLS setup creates databases, we'd have to redefine what we mean by "set up". Like we could install the setup scripts and instruct the user to run them if/when the db requirements are in place.

* Add code to detect libiodbc and generate appropriate compile and link lines to RLS build

* Add code to detect sqlite (and/or mysql) libraries to setup the ODBC ini file for the user. Currently this is done through queries of GPT installation metadata.

Stuart Martin - 2012-05-02

RLS is currently not in GT 5.2

Globus Toolkit/GT-134

Summary

Remove MPI-related flavoring from globus_core and gpt

Details

Type: Task

Status: Resolved 2012-05-02

Description

The globus_core and gpt packages contain code to manage mpi flavoring of libraries and executables. Since we have removed nexus in GT5, there is no more code in GT that uses this information, and so I don't think we need to support these features any more.

Tasks:
* Remove flavor labels related to mpi from gpt
* Remove detection code for mpi from globus_core

Comments

Eric Blau - 2010-04-07

Per discussion on 4/7/2010 w/ Nick Karonis, Nick thinks that keeping this functionality will be useful for mpich-g2/mPIG/successor once a replacement for duroc/rendezvous is in place (and potentially useful for creation of said replacement.)

Stuart Martin - 2012-05-02

With the 5.2 release this is done

Globus Toolkit/GT-135

Summary

Add Java-based RLS test suite to GT Distribution

Details

Type: Task

Status: Resolved 2012-05-02

Description

Annette DeSchon has asked that the RLS test suite be added back to the GT distribution. It was included through GT 4.2.1.

Comments

Stuart Martin - 2012-05-02

RLS is currently not in GT 5.2

Globus Toolkit/GT-136

Summary

modify GPT_INIT/GLOBUS_INIT macros to allow a build using pkg-config in place of GPT scripts

Details

Type: Task

Status: Open

Description

The GPT build-time scripts do three major things.  They do the dependency checking to ensure that all compile time dependencies are met.  They convert the source metadata to binary metadata/filelists.  They assemble the linklines for the package from the dependent package linking information.  The first of these can be worked around by building in a context in which the dependency existence and ordering has been predetermined, such as the new installer model proposed in RIC-90.

Once each package outputs pkg-config metadata as well as GPT metadata, the linklines should be just as easily assembled using pkg-config as using the GPT script gpt_build_config (called from configure, as found in the GPT_INIT macro).

Altering the GPT_INIT macro to allow the use of pkg-config instead of gpt_build_config would open up the possibility of source builds that do not depend on the presence of an installed GPT.  Backwards compatible behavior could be maintained.

Conversion of source to binary GPT metadata could either be handled by an XSLT transformation, or simply ignored in the no-buildtime-GPT scenario.

Comments

Eric Blau - 2011-07-12

I've written some xslt files to pull necessary information out of GPT metadata to use pkg-config internally.  This obviously introduces a dependency on some xslt processor being present, but I think this can be scoped to bootstrap-only.

Eric Blau - 2011-08-02

Made some progress, still some issues to be worked out.  xslt can happen at bootstrap, need to get the right detection of xslt processors figured out.

Eric Blau - 2011-11-23

This is stalled and is definitely going to be a "would like to do for the future", as this feature won't be in 5.2.

Globus Toolkit/GT-137

Summary

Add tests for gsi credential library

Details

Type: Task

Status: Open

Description

Currently there are none. Some of the other tests likely exercise this library, but only certain use cases.

Comments

Globus Toolkit/GT-138

Summary

Add multiarch support for debian packages

Details

Type: Task

Status: Open

Description

Debian 7 will add multiarch support, which allows different architecture versions of the same package to be installed simultaneously on a single system without using the emulation libraries or chroots, which is nice. It is pretty different than the LSB, however, so work will need to be done to support this in our packaging metadata. Ubuntu has support for this as well. I'm not sure if all of the GT dependencies have been updated to support this, so we may be able to wait until things like openssl and libxml2 are updated for multiarch.

Comments

Globus Toolkit/GT-139

Summary

Update GT 5.2 release process document

Details

Type: Task

Status: Resolved 2013-01-14

Description

Get this page: http://confluence.globus.org/display/GT/GT+5.2+release+process up to date to include info about the bamboo scripts, frag generator, etc.

Comments

Joe Bester - 2012-04-26

I've made some changes to it (referring to the bamboo artifact fetcher script). The details will change when RIC-256 is complete, so I'm going to move this out of the current sprint.

Globus Toolkit/GT-140

Summary

move GT5 relevant JGlobus java client code to a new repo and distribution mechanism

Details

Type: Task

Status: Resolved 2012-05-10

Description

Jars certainly can be packaged up natively.  The standard way of doing this is like this:

http://packages.debian.org/lenny/all/libcommons-beanutils-java/filelist

However, as a Java developer, I rarely find this kind of packaging useful.  I would suggest that the most useful way to distribute JGlobus Java client libraries is 1) simple tar'ed distros and 2) via the Globus maven repository.

---

I can help Eric, et al. with how to set up build process/distribution process to facilitate distributing the clients via these channels.  I have some template files which could be fairly easily altered or extended for this purpose.

-Tom

Comments

Stuart Martin - 2012-05-10

JGlobus 2 release solved this

Globus Toolkit/GT-141

Summary

Non-atomic data in the database

Details

Type: Improvement

Status: Open

Description

First reported here: https://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=6148

There are some parts of the various packet tables which contain combinations of two data items (java style host/ip, list of service names). Since we do queries on that field, we end up splitting the data on the client side. In some cases we don't care much about what the data is, just the distribution of different hosts or different services. These make queries slow.
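
As an illustration of the client-side splitting described above, here is a minimal Python sketch. It assumes the combined field uses the Java InetAddress.toString() layout "hostname/address"; the sample rows and function names are hypothetical and not taken from the usage database.

```python
from collections import Counter


def split_host_ip(field):
    """Split a combined 'hostname/address' value into its two parts.

    An empty hostname (e.g. '/192.0.2.7') is returned as None.
    """
    host, sep, ip = field.partition("/")
    if not sep:
        raise ValueError("expected 'hostname/address', got %r" % field)
    return (host or None, ip)


def hosts_distribution(rows):
    """Count rows per IP address after splitting on the client side."""
    return Counter(split_host_ip(field)[1] for field in rows)


# Hypothetical sample rows using documentation-only addresses.
SAMPLE_ROWS = [
    "gridftp.example.org/192.0.2.7",
    "/192.0.2.7",
    "gram.example.org/198.51.100.3",
]

print(hosts_distribution(SAMPLE_ROWS))
```

If the hostname and address were stored in separate columns, the same distribution could be computed by the database with a plain GROUP BY instead of fetching and splitting every row.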

Comments

Globus Toolkit/GT-142

Summary

Adding support for SHA-2 credentials

Details

Type: Improvement

Status: Open

Description

investigate what is needed to add support for SHA-2 credentials in the Globus Toolkit

Comments

Globus Toolkit/GT-143

Summary

Make it easy for the service-side authz callout to use the same logging method configured for the service

Details

Type: Improvement

Status: Open

Description

From Oscar Koeroo 

Other differences are how logging output is pushed to a logfile or
syslog. There are differences experienced in that regard to slip-stream
it with the gatekeeper, gridftpd and gsi-opensshd logging facilities.
Ideally we'd like to hook our logging facility into that what has been
chosen for the service itself.

Comments

Globus Toolkit/GT-144

Summary

increase availability of Globus' common components in linux distributions

Details

Type: Improvement

Status: Resolved 2012-05-02

Description

The Nordic Grid computing project that produces the ARC middleware (see www.knowarc.eu) has done some work to port and get some of the Globus common components into some of the common Linux distributions, notably Debian & Ubuntu. Mattias Ellert in Uppsala, Sweden, has been doing the work; here are samples.

Here is the list of source packages (GPT + 30 Globus packages):

http://packages.debian.org/source/sid/grid-packaging-tools
http://packages.debian.org/search?keywords=globus&searchon=sourcenames&suite=unstable&section=all

The packages are slowly migrating from unstable (sid) to testing
(squeeze) [currently GPT + 3 of the Globus packages]:

http://packages.debian.org/source/squeeze/grid-packaging-tools
http://packages.debian.org/search?keywords=globus&searchon=sourcenames&suite=testing&section=all

We should review what has been done here and commit any changes to the common components to facilitate this process of including them in Debian and Ubuntu.

Comments

Stuart Martin - 2012-05-02

This is done with the 5.2 releases

Globus Toolkit/GT-145

Summary

usage report generators need to be optimized

Details

Type: Improvement

Status: Open

Description

The report generator java code is quite complicated for what it achieves, essentially creating and collating the relations on the client side instead of in the SQL. Part of this is because the database schemas are terrible, but also because the report generator code is pretty convoluted itself.  Joe has done some work on getting the reports to cache results in the database, but it needs more work before it can be used.

Comments

Globus Toolkit/GT-146

Summary

Add ability to renew CA certificates with the same key as before

Details

Type: Improvement

Status: Open

Description

The globus simple ca can generate CA certificates, but doesn't have an option to regenerate a new CA certificate using the same private key that was previously used. If this were implemented, one could generate a new CA certificate after the original expires which would still validate the signatures of keys issued with the previous CA cert. Originally requested: http://lists.globus.org/pipermail/gt-user/2011-October/010126.html

Comments

Globus Toolkit/GT-147

Summary

Improve CRL behavior

Details

Type: Improvement

Status: Open

Description

This request comes from an OSG/Globus meeting...

Our view is that Globus support for GSI is incomplete--how do we handle CA certificates? CRLs? OSG now has tools to address these needs from outside of Globus. That said, OSG would like to suggest that Globus revisit how CRLs work in GSI. Right now in the C implementation, a CA's CRL that has passed the "nextUpdate" is considered to be expired and no certificates issued by that CA can be accepted. However, if no CRL is present, authentication can proceed. We understand the Java GSI implementation is inconsistent with this. We realize that changing this behavior is not simple (it's not just a technical issue), but we suggest harmonization of the implementations.
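
The C-side rule described above can be written down as a small decision function. This Python sketch is a model of the behavior as stated in this ticket, not code from GSI; the function name and the handling of a current CRL (reject only if the certificate is listed) are assumptions.

```python
from datetime import datetime, timezone


def crl_decision(crl_next_update, now, serial_revoked=False):
    """Model of the C-side rule as described in this ticket.

    crl_next_update: datetime of the CRL's nextUpdate field, or None when
                     no CRL is installed for the issuing CA.
    serial_revoked:  whether the certificate is listed in the CRL
                     (assumption: only consulted when the CRL is current).
    """
    if crl_next_update is None:
        return "accept"   # no CRL present: authentication can proceed
    if now > crl_next_update:
        return "reject"   # expired CRL: nothing from this CA is accepted
    return "reject" if serial_revoked else "accept"


now = datetime(2012, 5, 1, tzinfo=timezone.utc)
stale = datetime(2012, 4, 1, tzinfo=timezone.utc)
print(crl_decision(None, now))   # no CRL installed
print(crl_decision(stale, now))  # CRL past nextUpdate
```

The asymmetry is visible in the first two branches: a site with no CRL at all is treated more leniently than a site whose CRL has merely gone stale.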

Comments

Globus Toolkit/GT-148

Summary

up-to-date usage graphs for GRAM and GridFTP

Details

Type: Improvement

Status: Open

Description

It would be useful to have usage graphs that are always up-to-date.  Currently, the daily reports only provide numbers.  We could create a script that loads data from the metrics database into Google Docs, so that we always have up-to-date graphs.

Comments

Globus Toolkit/GT-149

Summary

Memory leaks in globus-job-manager

Details

Type: Bug

Status: Resolved 2012-05-17

Description

RIC-265 had a leak in the gssapi that was showing up as a problem for OSG when running long-lived GRAM job managers. I did a valgrind run with some of the test cases and it showed a few other recurring leaks in some of the newer code.

Comments

Joe Bester - 2012-05-14

Attaching valgrind-logs.tar.gz, which contains some valgrind logs for the job manager, when running the client and jobmanager test suites with fork and condor LRMs.

Joe Bester - 2012-05-16

Also found and fixed some minor leaks in the config file parser where deleting the structure contents leaves some values allocated, but those are 1-time leaks.

bbockelm - 2012-05-16

Hi Joe,

As of 13.35, we see about 5-10 MB of leaks a day on a busy gatekeeper.  Prior to RIC-265, it was about 50-100MB / day.

What's the best way to get globus-job-manager going with valgrind?  I'll happily do that for a few processes if it helps us understand where the memory goes.

Brian

Joe Bester - 2012-05-17

Attached the latest job manager with the memory leak patches (13.39). I ran with valgrind using globus-personal-gatekeeper -start -valgrind.

In general, you can modify the /etc/grid-services/jobmanager file to start valgrind, but it'll be slow to run.

bbockelm - 2012-05-17

Hi Alain, Joe,

Working with Purdue's memory leak issue, I think there's a leak here:

globus_gsi_sysconfig_get_home_dir_unix

on the order of 5MB (per job: Purdue's globus-job-manager processes leak 15GB within 20 minutes).  Looking at the stack trace:

#23 0x000000310320b944 in globus_common_v_create_string () at globus_libc.c:2212
#24 0x0000003103004d75 in globus_gsi_sysconfig_get_home_dir_unix (home_dir=0x292c5) at globus_gsi_system_config.c:4454
#25 0x0000003103005dbe in globus_gsi_sysconfig_get_cert_dir_unix (cert_dir=0x7fff1d039388) at globus_gsi_system_config.c:5082

it seems that the leak looks a whole lot like the one Joe fixed two days ago.  Can we spin and rebuild for Purdue to try it out?

Brian

alainroy - 2012-05-17

Joe, we last took globus-gram-job-manager 13.35. What other changes should we expect in 13.39? Just memory leak fixes?

bbockelm - 2012-05-17

Shoot.  I'm wrong, that's not where the memory growth is coming from (it's munmap'd later, my grep expressions were missing it).

Regardless, we should try this and see if Purdue is hitting one of the cases Joe fixed.

Joe Bester - 2012-05-17

There were two which were very recurring: condor job file name is leaked every time a job finishes, and restart jobs leak the client callback contact string.

Since the 5.2.1 release, I've committed GT-65, GT-154, GT-159, GT-185 as well as the leak fixes.

bbockelm - 2012-05-17

Hi Joe,

Unfortunately, in the end, I had to simply nuke the gram_job_state directory to get Purdue back to health.  After doing that, the issues cleared up (for now).  Will let you know when they reappear.

Brian

alainroy - 2012-05-17

Thanks guys. We'll still try to build it and put it into osg-testing and see how it helps. -alain

Globus Toolkit/GT-150

Summary

Abort in job-manager shutdown code

Details

Type: Bug

Status: Resolved 2012-05-10

Description

When going through core files on my CE, I found the following abort relatively frequently:

(gdb) bt
#0  0x00000034ae630265 in raise () from /lib64/libc.so.6
#1  0x00000034ae631d10 in abort () from /lib64/libc.so.6
#2  0x00000000004125df in globus_gram_job_manager_request_free (request=0x1cee5cd0) at globus_gram_job_manager_request.c:1315
#3  0x000000000040c02c in globus_l_gram_job_manager_remove_reference_locked (manager=0x7fff33563290, key=0x1cd6d360 "/16217900475151569756/11840111202095113110/", reason=0x42c6fc "stop all jobs")
    at globus_gram_job_manager.c:1122
#4  0x000000000040c6c6 in globus_gram_job_manager_remove_reference (manager=0x7fff33563290, key=0x1cd6d360 "/16217900475151569756/11840111202095113110/", reason=0x42c6fc "stop all jobs") at globus_gram_job_manager.c:988
#5  0x000000000040c8f3 in globus_gram_job_manager_stop_all_jobs (manager=0x7fff33563290) at globus_gram_job_manager.c:2201
#6  0x00000034b361850e in globus_callback_space_poll_nothreads () from /usr/lib64/libglobus_common.so.0
#7  0x00000034b363846f in ?? () from /usr/lib64/libglobus_common.so.0
#8  0x0000000000409432 in main (argc=, argv=) at main.c:611

Sample core file attached.  Relevant RPMs (including debuginfo RPMs) are here: https://koji-hub.batlab.org/koji/buildinfo?buildID=1694

Looking at that line of code, it seems to be relatively harmless.  We could probably just log and move on - sysadmins get nervous when they see SIGABRT show up in the kernel logs.

Comments

Joe Bester - 2012-05-10

This assertion is saying that the request still has an outstanding callback which may fire, which is unexpected, since the reference count is 0 at that point. When that callback actually fires, it will probably hit a segfault anyway, as the structure is being freed in the code with the assertion.

Joe Bester - 2012-05-10

There's something in the OSG allow-manager-restart.patch that registers the callback without adding a reference.
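
The failure mode described in these comments can be modelled with a toy reference-counted object. The Python sketch below uses hypothetical names and is not the job manager's code; it only shows why registering a callback without taking a reference lets the count reach zero while a callback is still outstanding.

```python
class Request:
    """Toy model of a reference-counted request (hypothetical names)."""

    def __init__(self):
        self.refcount = 0
        self.pending_callbacks = 0
        self.freed = False

    def add_reference(self):
        self.refcount += 1

    def remove_reference(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.free()

    def register_callback(self, take_reference=True):
        # A callback that may fire later should hold its own reference.
        if take_reference:
            self.add_reference()
        self.pending_callbacks += 1

    def callback_fired(self, took_reference=True):
        self.pending_callbacks -= 1
        if took_reference:
            self.remove_reference()

    def free(self):
        # Same kind of check as the assertion discussed above.
        assert self.pending_callbacks == 0, "callback still outstanding"
        self.freed = True
```

In this model, calling register_callback(take_reference=False) and then dropping the last reference trips the assertion in free(), which corresponds to the abort seen in the backtrace.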

Joe Bester - 2012-05-10

I'm marking this as no need to fix, as I think the offending patch is in response to GT-156 which is open.

Globus Toolkit/GT-151

Summary

Build RPMs for SUSE 11

Details

Type: Task

Status: Resolved 2012-06-05

Description

SLES is used by XSEDE for Kraken and Nautilus. We don't yet generate RPMs for that platform. It has some differences from fedora and redhat in its package naming for dependencies, and uses a different tool zypper for downloading from a repository, so the spec files and build and test scripts will need some changes for that.

Comments

Joe Bester - 2012-05-07

I've created an AMI for SUSE 11 and have committed some changes to the build scripts for SUSE. I've created a build task in bamboo, and have started the builds, but they are not working yet.

Joe Bester - 2012-05-09

64-bit Builds are now completing, with the caveat that voms is not supported with myproxy, and there is no equivalent to the groupinstall for zypper as far as I can determine.

The test-rpms script will need some updates to use zypper in place of yum as well.

Joe Bester - 2012-05-09

http://en.opensuse.org/YaST_Metapackage_Handler and http://en.opensuse.org/openSUSE:Build_Service_Tutorial seem to show a way to handle package groups.

Joe Bester - 2012-06-05

The packages are uploaded to the testing repo, so they can be accessed if you install http://www.globus.org/ftppub/gt5/5.2/testing/packages/rpm/sles/11.2/RPMS/noarch/Globus-testing-config.sles-11-1.noarch.rpm
Package groups are not implemented.

Globus Toolkit/GT-152

Summary

MFMT / SITE UTIME not working properly on my mac. gt 5.0.5.

Details

Type: Bug

Status: Resolved 2012-05-22

Description

05-09 15:00:18.790 conn_send_cmd_v D0.14> SITE UTIME 20120504171508 ~/test5
05-09 15:00:18.790 conn_send_cmd_v D0.15> MDTM ~/test5


05-09 15:00:18.960 _conn_cmd_cb D0.14< 250 OK.
05-09 15:00:18.960 _conn_cmd_cb D0.15< 213 19700101010220

Comments

Mike Link - 2012-05-09

In 5.0.5 I am storing the date to set in a mode_t (used for chmod), but it appears mode_t on mac is only 2 bytes.  This would actually cause problems in at least one other way if a striped server was used, as those values would be read as 4 bytes when passed to the backend.

5.2.1 doesn't have this problem for MFMT since it uses a separate var.  But it would still have a problem passing the chmod value to the backend in split configurations.
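
The log in the description is consistent with this explanation. The Python sketch below is a check under two assumptions: the server treats the SITE UTIME stamp as UTC, and a 2-byte field keeps only the low 16 bits of the epoch seconds. The helper name is made up for illustration.

```python
import calendar
import time

MDTM_FORMAT = "%Y%m%d%H%M%S"


def truncate_to_16_bits(mdtm_stamp):
    """Parse a YYYYMMDDHHMMSS stamp as UTC, keep only the low 16 bits of
    the epoch seconds, and format the result the same way."""
    seconds = calendar.timegm(time.strptime(mdtm_stamp, MDTM_FORMAT))
    truncated = seconds & 0xFFFF  # what survives in a 2-byte field
    return seconds, truncated, time.strftime(MDTM_FORMAT, time.gmtime(truncated))


print(truncate_to_16_bits("20120504171508"))
# prints (1336151708, 3740, '19700101010220')
```

The truncated value, 3740 seconds after the epoch, formats to 19700101010220, which is exactly the MDTM reply shown in the description.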

Mike Link - 2012-05-22

5.2.x does not have the utime bug, but does have the related mode_t size problem.  I'm fixing 5.2.x by changing the chmod mode_t to an int.  I'll apply the same fix to 5.0.x cvs in case GC mac builds from there in the future, but for now it looks like GC mac will use 5.2.

Globus Toolkit/GT-153

Summary

Investigate making GET/PUT default for servers that support it

Details

Type: Task

Status: Resolved 2012-05-22

Description

VDT had patches in place to make globus-url-copy use GET/PUT (GridFTP V2) by default, as well as a ftp client library patch to allow enabling GET/PUT by environment variable, for older apps that had no way to enable it via the api.  As some users of VDT transition to standard Globus releases, it would be good to add a solution for this.  It should be alright to enable it by default for hosts that support it.  If there is a good reason not to do that, I can add the environment variable to enable the feature.

Comments

Mike Link - 2012-05-22

Made GET/PUT the default transfer commands when the server supports it.

Globus Toolkit/GT-154

Summary

Kill off perl processes when idle

Details

Type: Improvement

Status: Resolved 2012-05-11

Description

We see about 50% of the memory used by GRAM goes to the perl processes.  Even on a busy CE, a lot of these have been completely idle for quite a bit of time (I think).  Can they be killed after a given idle timeout?  We could really use the memory savings.

Comments

bbockelm - 2012-03-11

Hi Stu,

Can I bump this ticket just a bit?  When reviewing the last 3 weeks of operation, at least 1 crash was due to memory exhaustion.

Would be nice to get it scheduled for 5.2.1.

Brian

Joe Bester - 2012-05-09

here's a patch that implements this. If no script commands are queued and a script handle has been idle for more than 30 seconds, it gets closed. I'm not sure if that timeout should be tuned up or if that's sufficient to keep the condor startup load down. https://globus.atlassian.net/secure/attachment/10869/GT-154.diff
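
The rule described in this comment can be sketched roughly as follows. This is an illustration only, not the contents of GT-154.diff; the class and field names are hypothetical, and only the 30-second value comes from the comment.

```python
from dataclasses import dataclass, field

IDLE_TIMEOUT = 30.0  # seconds, the value mentioned in the comment above


@dataclass
class ScriptHandle:
    name: str
    last_used: float
    closed: bool = False


@dataclass
class ScriptPool:
    handles: list = field(default_factory=list)
    queued_commands: int = 0

    def reap_idle(self, now: float) -> int:
        """Close handles idle longer than IDLE_TIMEOUT; return how many."""
        if self.queued_commands > 0:
            return 0  # work is waiting, so keep every handle alive
        closed = 0
        for handle in self.handles:
            if not handle.closed and now - handle.last_used > IDLE_TIMEOUT:
                handle.closed = True
                closed += 1
        return closed
```

A periodic timer would call reap_idle with the current time; raising IDLE_TIMEOUT trades memory for fewer script restarts.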

bbockelm - 2012-05-09

Hi Joe,

This comment looks incorrect to me:

+    /**
+     * Periodic callback handle to abort if something removes the lock file.
+     */
+    globus_callback_handle_t            idle_script_handle;

Other than that, looks good.

Brian

Joe Bester - 2012-05-11

I've committed this fix with the comment updated to reflect the use of the timer.

Globus Toolkit/GT-155

Summary

Job manager deletes job dir sometimes

Details

Type: Bug

Status: Resolved 2012-05-10

Description

Originally reported to me by OSG via email.

The issue is reported in their tracker: https://ticket.grid.iu.edu/goc/12064

Comments

Joe Bester - 2012-05-03

I tracked this down to an interaction between multiple job managers managing different LRMs. In that case, the 2nd job manager started will attempt to reload all state files, and when it hits one for a different LRM, it destroys the request structure, deleting the state dir. In previous versions, the per-job lock file prevented this behavior.

Joe Bester - 2012-05-03

Attaching 13.35 which includes a fix for this issue and GRAM-329

Joe Bester - 2012-05-09

Committed to 5.2 branch and trunk.

bbockelm - 2012-05-10

Hi Joe,

After upgrade to 13.35, I had some jobs that showed these symptoms.  Here's the excerpt from the job log:

[root@red globus]# grep 16217979690222107261.11840111202095093410 *.log
gram_hcc.log:ts=2012-05-10T19:43:51.409895Z id=23516 event=gram.state_file.read.end level=ERROR path=/var/lib/globus/gram_job_state/job.red.unl.edu.16217979690222107261.11840111202095093410 status=-124 msg="Error reading state file" reason="old job manager is still alive"
gram_hcc.log:ts=2012-05-10T19:43:52.053672Z id=23534 event=gram.state_file.read.end level=ERROR path=/var/lib/globus/gram_job_state/job.red.unl.edu.16217979690222107261.11840111202095093410 status=-122 msg="Error reading state file" reason="could not read the job state file"
gram_hcc.log:ts=2012-05-10T19:43:52.053962Z id=23534 event=gram.reload_requests.info level=WARN statedir="/var/lib/globus/gram_job_state" msg="Error restarting job" gramid=/16217979690222107261/11840111202095093410/ status=-122 reason="could not read the job state file"

So far (after 10 minutes), this has happened to 7 jobs.  Looks like there are a few more incoming though:

[root@red globus]# grep status=\-122 *.log | wc -l
126
[root@red globus]# grep status=\-124 *.log | wc -l
18684


Brian
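
The grep counts above can also be produced by parsing the log lines, which look like space-separated key=value pairs with optional double-quoted values. This is a sketch under that assumption; the function names are hypothetical and this is not a documented GRAM log parser.

```python
import shlex
from collections import Counter


def parse_gram_log_line(line: str) -> dict:
    """Split one log line into a dict of its key=value fields."""
    try:
        tokens = shlex.split(line)  # honours the double-quoted values
    except ValueError:
        return {}  # unbalanced quote, e.g. a truncated line
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def count_by_status(lines) -> Counter:
    """Count log lines per status code, ignoring lines without one."""
    counts = Counter()
    for line in lines:
        fields = parse_gram_log_line(line)
        if "status" in fields:
            counts[fields["status"]] += 1
    return counts
```

Feeding the lines of the *.log files to count_by_status gives one tally per status value in a single pass, instead of one grep per code.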

bbockelm - 2012-05-10

Shouldn't the patched function also ignore the state file if request->config->service_tag matches?

Looking through our held jobs, I only see problems for users that have multiple DNs in use.

Joe Bester - 2012-05-10

Add this patch to ignore jobs with different tags.
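
The idea behind the fix, as discussed in this thread, is that a job manager should skip state files belonging to another LRM or service tag rather than destroy them. A rough sketch follows, assuming each state record exposes hypothetical "lrm" and "service_tag" fields; the real state file format and the patch itself are not shown here.

```python
def select_reloadable(states, lrm, service_tag):
    """Split state records into (mine, foreign); foreign ones are left untouched."""
    mine, foreign = [], []
    for state in states:
        same_lrm = state["lrm"] == lrm
        same_tag = state["service_tag"] == service_tag
        (mine if same_lrm and same_tag else foreign).append(state)
    return mine, foreign
```

Only the records in the first list would be reloaded; nothing is deleted for the second list, so another job manager can still pick those jobs up.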

Globus Toolkit/GT-156

Summary

globus-job-manager unable to shutdown

Details

Type: Bug

Status: Resolved 2012-07-26

Description

We've noticed that a few job-managers will get into a state where they believe they are shutting down, but never shutdown.  The symptom is that all jobs for that user fail and we get this message repeatedly in the log:

ts=2012-03-08T10:10:45.928806Z id=24514 event=gram.signal.end level=WARN gramid=/16217969582761169661/5881249661246812557/ signal="7" jmstate=GLOBUS_GRAM_JOB_MANAGER_STATE_STOP msg="Invalid query" status=-94 reason="the jobmanager does not accept any new requests (shutting down)"
ts=2012-03-08T10:10:45.928901Z id=24514 event=gram.query.end level=ERROR gramid=/16217969582761169661/5881249661246812557/ uri="/16217969582761169661/5881249661246812557/" msg="Error processing query" status=-94 reason="the jobmanager does not accept any new requests (shutting down)

This will go on indefinitely until an admin intervenes and manually kills the g-j-m.  Jobs from other users will continue to work: it appears this is specific to a jobmanager.

As this appears to hit the most active job-managers, it is blocking these CEs from doing useful work when it happens.

Comments

bbockelm - 2012-03-11

I think it's relevant - it may not be the *most active* job-managers, but those using gliteWMS instances that appear to trim the maximum proxy validity.  For example,

[root@red-gw1 ~]# grep gram.proxy_expire.end /var/log/globus/gram_*.log*
/var/log/globus/gram_lcgadmin.log-20120311:ts=2012-03-10T23:39:24.153378Z id=909 event=gram.proxy_expire.end level=WARN msg="Proxy expired, stopping job manager"
/var/log/globus/gram_uscmsPool1288.log-20120311:ts=2012-03-10T15:31:12.465138Z id=29545 event=gram.proxy_expire.end level=WARN msg="Proxy expired, stopping job manager"
/var/log/globus/gram_uscmsPool2652.log-20120311:ts=2012-03-11T02:48:28.713039Z id=26159 event=gram.proxy_expire.end level=WARN msg="Proxy expired, stopping job manager"
/var/log/globus/gram_uscmsPool2752.log-20120311:ts=2012-03-10T23:58:20.666248Z id=30703 event=gram.proxy_expire.end level=WARN msg="Proxy expired, stopping job manager"

When the proxy expires, the manager tries to shut down.  It appears to not be able to do this successfully (honestly, I don't think I've ever seen a g-j-m shut down).

If a new request is made against a stopped manager, even with a valid proxy, the manager stays stopped.

bbockelm - 2012-03-11

To reproduce:

1) voms-proxy-init -valid 0:11
2) globus-job-run $HOSTNAME:/jobmanager-fork `which sleep` 10m  (you'll have to hack globus-job-run for it to allow such a short proxy)
3) Background the globus-job-run process (Ctrl+Z)
4) Wait 1 minute.  The job-manager will notice that the proxy has expired and queue up a message for the sleeping globus-job-run instance.  This can't be delivered, of course, because it is SIGSTOP'd.
5) Start a new "globus-job-run" instance.  In the log, you'll get the following message:

ts=2012-03-11T20:03:49.516689Z id=5813 event=gram.add_request.end level=WARN gramid=/16217943208034520401/12470985504822659492/ status=-130 reason="the job manager was sent a stop signal (job is still running)"

However, *this doesn't get sent to the second client*.  The client hangs forever and is never informed it has been ignored.

I'll be attaching a patch shortly.  I still think there are issues with clients that "just go away" that prevent shutdown.

Joe Bester - 2012-03-12

There's a timeout in the IO library (90 seconds) so the callback contact not being reachable should not affect things. There are other things that might affect it (staging or streaming, maybe) that I still need to investigate.

Regarding the patch, it looks to me like the last section of the patch doesn't actually do anything, does it? It's a duplicate of the default case. The register oneshot in the 2nd to last section is a little worrisome to me. I think in that case, the job manager already has some outstanding thing that will trigger the job state machine elsewhere.

bbockelm - 2012-03-12

Hi Joe -

In the last hunk, the response_code defaults to:

response_code = request->failure_code

However, we're looking at the case where globus_gram_job_manager_add_request fails, so request->failure_code is never touched - instead, we want to look at the rc from globus_gram_job_manager_add_request.

In the end, I'm beginning to think a better course is to set "manager->done = GLOBUS_TRUE" immediately and let the next client be responsible for restarting jobs from the save state.  Where do we benefit from "cleaning up nicely" in a shutdown?

Brian

Joe Bester - 2012-03-14

Check again, the code you are patching is switching on request->failure_code, so it's really not any different than the default case.

The key thing is to keep handling the script functions until whatever is in progress has its result state change stored. That could include staging or job startup.

bbockelm - 2012-03-14

Hi Joe,

Ok, I see what you mean.  Likely, we need separate error handling to set the failures appropriately.

Of course, these are very unlikely cases: to reach this code, one needs a valid proxy.  Prior to calling the function, the first hunk will set manager->stop to false if there is a valid proxy.  I suppose we could still catch the malloc issue.

Joe Bester - 2012-03-23

I think I've found a spot where GRAM can wait for up to the two-phase commit timeout plus the gram protocol connect timeout when trying to shut down when a proxy expires. During that time, the job manager won't accept any proxy refresh or job submissions, even restarts with a new proxy. I'll try to get a patch for that Monday.

bbockelm - 2012-04-24

Hi Joe,

I don't think that's it.  We just found another instance where a g-j-m remained running for 10 hours after it initiated shutdown.

Is there any need to shutdown at all?  Why not just have it send SIGKILL to itself?

Brian

Joe Bester - 2012-04-24

There might be things in progress like job submits or cancels where if we sigkill we could lose some state returned from a script. I still haven't seen any logs or state info about what the job manager is up to, so it's hard for me to debug this.

bbockelm - 2012-04-24

Yes - we are unable to run production CEs at anything above WARNING for a long time.  I can give you the logs, but they're mostly empty.

We'll try running them at TRACE for a day at a time and see if we can catch this in action.

bbockelm - 2012-05-10

Hi Joe,

I feel stuck on this one.  I can't provide the information you need without reverting the OSG hack to kill the job-manager; I can't revert the hack because doing so would expose the errors to users.

An idea - we can probably make progress if I could reliably trigger a shutdown.  What if we try this:

https://globus.atlassian.net/browse/GT-71

and/or give me a handle to trigger a shutdown via a Unix signal?  This way, during working hours, I could possibly replicate the issue while you are logged in to the machine.

Brian

bbockelm - 2012-05-10

Actually, I think I might revert the patch anyway.  I just poked through some logs and found that, even with it, there are some job-managers that are in shutting-down state for many hours.  So, something else must be at fault.  I'll attach a core file momentarily.

bbockelm - 2012-05-10

Core file is too big to attach.  Here's a link: http://pages.cs.wisc.edu/~bbockelm/uscmsPool1500-globus-job-manager.core.gz

The associated binaries and debug symbols are here: https://koji-hub.batlab.org/koji/buildinfo?buildID=1695

bbockelm - 2012-05-10

Ah-ha!  I looked again at the original message:

ts=2012-03-08T10:10:45.928806Z id=24514 event=gram.signal.end level=WARN gramid=/16217969582761169661/5881249661246812557/ signal="7" jmstate=GLOBUS_GRAM_JOB_MANAGER_STATE_STOP msg="Invalid query" status=-94 reason="the jobmanager does not accept any new requests (shutting down)"

This is *not* the same thing as:

ts=2012-03-11T02:48:28.713039Z id=26159 event=gram.proxy_expire.end level=WARN msg="Proxy expired, stopping job manager"

or

ts=2012-03-11T20:03:49.516689Z id=5813 event=gram.add_request.end level=WARN gramid=/16217943208034520401/12470985504822659492/ status=-130 reason="the job manager was sent a stop signal (job is still running)"

The original error message is from globus_l_gram_job_manager_query_stop_manager when Condor-G tries to stop a job when the request is in FILE_CLEAN_UP, SCRATCH_CLEAN_UP, CACHE_CLEAN_UP, DONE, FAILED_DONE, FAILED, FAILED_FILE_CLEAN_UP, FAILED_SCRATCH_CLEAN_UP, FAILED_CACHE_CLEAN_UP, or FAILED_CLOSE_OUTPUT state.

While we may have chased down a real bug for when the proxy expired, I think this is just a poor choice of return code / error message and can be (mostly) ignored.

This might indicate some unexpected Condor-G issue; I don't know if it gracefully handles job-managers that refuse to stop.

Joe - if you agree with this diagnosis, I'm ready to close this ticket.

Joe Bester - 2012-05-10

It's not only when the client tries to stop the job manager. Prior to my patches, any attempt to send a signal (including proxy refresh) after proxy expiration has been triggered would fail, and submitting jobs (even with a fresher proxy) would also fail until the job manager has terminated. My patches address most of this, though I think there might be some other issues with jobs that get stopped and then a proxy is refreshed not getting into a running state again until a client intervenes.

I guess it would be helpful to know whether you see the bad behavior (processes remaining in shutdown for a long time) with the last tarball I gave you (and not the allow-manager-restart.patch), as it contains my patches for this issue.

bbockelm - 2012-05-10

Yeah - the reason why I had kept the allow-manager-restart patch is that I still saw issues after your patches.  However, upon closer inspection, the issues were all due to the client trying to stop the job-manager.

Basically, there was a bad log message and a bug.  You fixed the bug, but I still saw the bad log message, and thought the bug was not fixed.

I've reverted the allow-manager-restart patch and am rebuilding the RPM, and I'll upgrade all our local CEs.  Assuming everything is happy on our site, we can release the new RPM on Tuesday.

Joe Bester - 2012-05-11

I'm getting confused now. I thought the report was that the job manager process is not exiting. Is that still the case or not?

The message

ts=2012-03-08T10:10:45.928806Z id=24514 event=gram.signal.end level=WARN gramid=/16217969582761169661/5881249661246812557/ signal="7" jmstate=GLOBUS_GRAM_JOB_MANAGER_STATE_STOP msg="Invalid query" status=-94 reason="the jobmanager does not accept any new requests (shutting down)"

is from a stdio update signal, not a stop request. It is being rejected because either a stop request came in for that job (client explicitly paused things) or the proxy expiration callback was triggered. In either case, the client should be able to fix that particular job and effect the stdio update by submitting a restart request (perhaps after the submitting side has updated its proxy).

bbockelm - 2012-05-11

Hi Joe,

From the logs, there was no proxy expiration for this case.  I think the original bug is actually fixed, and the subsequent instances of this log message are a red herring.

Does Jaime have a JIRA account?  It would be nice to know if Condor-G has somehow gotten itself into an unrecoverable loop because it does not handle this response code.

Brian

Stuart Martin - 2012-05-11

I just added Jaime as a watcher on this issue.

Joe Bester - 2012-07-26

I'm going to mark this as done/incomplete, since I don't have a reproducible test case for this behavior and it's not been commented on for a few months.

Globus Toolkit/GT-157

Summary

Hash gram_job_state directory by user

Details

Type: Task

Status: Resolved 2012-06-18

Description

Some filesystems struggle with the number of files that GRAM can accumulate in the gram_job_state directory.  One great improvement was the elimination of the per-job lockfile: this reduces the number of files by a factor of 2.

A second improvement would be to create new jobs in /var/lib/globus/gram_job_state/$LOGNAME/.

I had a ticket open for this in the OSG (https://jira.opensciencegrid.org/browse/SOFTWARE-356), but it really belongs in the Globus JIRA.

Comments

bbockelm - 2012-04-19

Just found an example of this in the wild.  One site had 47k state files, which resulted in about 141k files in this directory.  Anything that scans through the directory (such as a periodic globus-job-run causing a globus-job-manager to fire up) would cause enough I/O to disrupt other parts of the system.

We ought to think about trying to schedule this sometime during 5.2.x.

Joe Bester - 2012-05-23

One thing I've been thinking about is replacing the per-job job state file with a per-job-manager-lrm-user-tag sqlite database. I think this would solve this problem and also allow some flexibility in the job manager to eventually simplify some of its job management tasks as things develop.

For example, the query operations could be done without requiring the request structure to be reloaded into memory. The stdio_update logic could be simplified to delay URL resolution until needed. The management of which jobs need to be operated on by the state machine can be done by querying the table instead of having per-job callbacks registered. The expiration shutdown could be done without loading the full set of jobs into memory to trigger their client callbacks. These are all difficult with the current state file format. I think long-term this work would help with gram scalability and responsiveness. Depending on how urgent you feel the problem is, we can pursue this alternative or the directory hashing solution.

bbockelm - 2012-05-31

Hi Joe,

It seems the two approaches are a different order of magnitude in terms of work.  I suggested hashing the gram_job_state because it appears to be relatively straightforward.

Long-term, I think the sqlite approach would be great.

Brian

Joe Bester - 2012-06-05

Attaching GT-157.diff which will put job state files in /var/lib/globus/gram_job_state/$LOGNAME/$SERVICETAG/$LRM/job.$JOBID

This should shut up some of the warnings about job files belonging to other job managers as well.
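
For reference, the layout described in this comment can be written as a small path builder. This is only a sketch: the function name is hypothetical, and the exact form of each component (in particular what the service tag is when none is configured) is an assumption not confirmed here.

```python
from pathlib import PurePosixPath

STATE_ROOT = PurePosixPath("/var/lib/globus/gram_job_state")


def hashed_state_path(logname, service_tag, lrm, job_id, root=STATE_ROOT):
    """Build $ROOT/$LOGNAME/$SERVICETAG/$LRM/job.$JOBID as described above."""
    return root / logname / service_tag / lrm / f"job.{job_id}"


# "alice", "untagged" and the job id are placeholder values.
print(hashed_state_path("alice", "untagged", "condor", "123.456"))
```

Because each user, tag and LRM gets its own subdirectory, a job manager only has to scan the directory for its own combination.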

bbockelm - 2012-06-05

Thanks Joe!

What are the upgrade concerns?

Joe Bester - 2012-06-06

The fix is available in the testing repos and will be in 5.2.2

bbockelm - 2012-06-06

Joe - I meant, can we upgrade in-place, or will all jobs be lost on upgrade?

Joe Bester - 2012-06-06

If you upgrade, all old jobs will be read at restart time. New jobs will be in the hashed subdirs.

alainroy - 2012-06-18

Joe, just to clarify: you think that in-place upgrades will be safe and nothing will be lost?

Globus Toolkit/GT-158

Summary

Decouple GRAM from home directories

Details

Type: Task

Status: Open

Description

OSG would like to completely decouple GRAM from our users' home directories. A few use cases:

1) In some cases, it's a pain to automate the creation of the directories if they aren't otherwise going to exist on the CE.
2) Home directories may be on an (expensive) sitewide shared file system, and any reduction in load is greatly appreciated by the local sysadmins. GRAM files could be separately exported directly from the CE if desired, for example.
3) If local users are mapped to their local username, they'll notice lots of gram_scratch directories and unexpected files in ~/.globus that count against their quota.
4) If the Globus-related files are in a dedicated location, it's much easier to migrate all the state to a separate machine/VM and failover to that one.

This decoupling is fairly easy to achieve.  Patches to follow.

Comments

Joe Bester - 2012-05-23

> GRAM files could be separately exported directly from the CE if desired, for example.

I think this is the key missing piece from this patch. Do you include something to effect this in the OSG LRM adapters? Without something to address this, file and executable staging would not work, and I'm not sure that the scratch directory stuff would work either. I guess as an alternative we could have those disabled with some configuration flag so that it would be clear why jobs don't have access to those features. Do you have requirements for these features?

> 3) If local users are mapped to their local username, they'll notice lots of gram_scratch directories and unexpected files in ~/.globus that count against their quota.

The scratch dir base dir can be changed via gram config options.

Globus Toolkit/GT-159

Summary

globus-gatekeeper init script should report errors better

Details

Type: Task

Status: Resolved 2012-05-14

Description

The globus-gatekeeper fails silently for a few reasons. I think it should report errors to the user instead. We had a user hit one of these (missing certificate), and he would have solved it more quickly if an error message had been printed.

The attached patch shows a tested modification to the init script.

Comments

Globus Toolkit/GT-160

Summary

improve the GRAM LRM adapter doc

Details

Type: Improvement

Status: Resolved 2012-07-25

Description

I just noticed that we still have ws-gram on these pages:
        http://www.globus.org/toolkit/docs/5.0/5.0.0/execution/gram5/developer/scheduler-tutorial.html

We need to review and update the documentation here.
  - Remove any WS GRAM details.
  - Remove the resource publishing section.
  - Add an overview section that describes in more details the issues with interfacing GRAM with an LRM.
  - Add a requirements section?  e.g. a host where the LRM commands and log file (for SEG) is available.
  - Change to use terms consistently.  e.g. LRM instead of scheduler.
  - Make the steps required to complete the adapter clearer.
     + step 1, step 2, ...

Comments

Joe Bester - 2010-03-23

Also, drop pre-ws name from the docs.

Stuart Martin - 2011-09-16

Here is an example of LRM troubleshooting doc to add.

Step 1: make sure the perl file is valid perl / runs ok:

% perl -I$GLOBUS_LOCATION/lib/perl $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/pbs.pm

Step 2: run the jm command by hand to see if you get better errors

(Assumes step 1 worked)

Add (save_job_description = yes) to your RSL and submit the job.  This will leave a perl file in your home directory that you can run with the job manager script. That file is called gram_$unique.pl where $unique is a string of characters unique for each job. Pass that to the script to see what's going on:

$GLOBUS_LOCATION/libexec/globus-job-manager-script.pl -m pbs -f ~/gram_UNIQUE.pl -c submit

Joe Bester - 2012-01-06

Attached current updates but haven't committed yet. The Perl, RVF, Configuration sections are more-or-less complete, but the SEG section is not started (though I've written the SEG module but not tested it yet). Probably also should have a packaging section.

Joe Bester - 2012-07-25

I've finished the SEG module documentation and committed the tutorial to the 5.2.2 gram5 developer document. Would be nice to have some packaging stuff there and more info about the APIs, but it's an improvement to what we had.

Globus Toolkit/GT-161

Summary

Memory leak in gss_accept_delegation()

Details

Type: Bug

Status: Resolved 2012-05-09

Description

One of our sysadmins did me a favor and started looking at the persistent memory leaks in globus-job-manager (folks have reported 5GB g-j-m processes after several weeks).

In accept_delegation.c, the following is called:

            local_result = globus_gsi_cred_get_cert(
                        context->peer_cred_handle->cred_handle,
                        &peer_cert);

Per the globus_gsi_cred_get_cert documentation, the caller is responsible for calling X509_free on peer_cert on success.  This is not done and the cert is leaked.

Leak was added by this commit: http://viewcvs.globus.org/viewcvs.cgi/gsi/gssapi/source/library/accept_delegation.c?r1=1.37&r2=1.38 (I think)

Comments

Joe Bester - 2012-05-07

Attached a version of gssapi that plugs that leak.

alainroy - 2012-05-07

Thanks Joe! I'm happy to take the tarball if needed, but if it's easy for you, could I just get the source RPM?

Joe Bester - 2012-05-09

Here's our SRPM.

Globus Toolkit/GT-162

Summary

Add support for GT packages for SLES 11

Details

Type: New Feature

Status: Resolved 2012-05-09

Description

Some XSEDE systems use SLES11, so we should add that OS to the build and test system.

Comments

Joe Bester - 2012-04-25

I've created an SLES 11 AMI and have some patches to the spec files which I will add after 5.2.1 is released. The changes are small for some dependencies that have different names than redhat (latex, perl). One outstanding issue is SLES doesn't appear to have voms rpms, which myproxy uses.

Also todo are yast repository creation (somewhat different than yum) and whatever the equivalent of groupinstall is. The documentation will need some changes as well to use zypper or yast2 instead of yum when dealing with SLES.

Joe Bester - 2012-05-09

duplicate of GT-151

Globus Toolkit/GT-163

Summary

Condor fake-SEG loses track of job

Details

Type: Bug

Status: Resolved 2012-05-09

Description

Due to a user complaint, I found a case where a user's job remained in "active" state indefinitely - even hours after the job actually finished.

Using strace, I still see the condor XML file being polled and read. I suspect there is some unforeseen race condition - or parsing issue - in the "fake" SEG for Condor.

open("/var/lib/globus/gram_job_state/condor.16217795921895230381.17186126506088338356", O_RDONLY) = 14
fstat(14, {st_mode=S_IFREG|0644, st_size=8265, ...}) = 0
getuid()                                = 30177
close(14)                               = 0

I've attached the Condor log and the job state file.

Comments

bbockelm - 2012-03-28

From gdb, the SEG believes it has already processed all 8265 bytes in the file.  So, there are two possibilities:

1) SEG is not properly parsing the attached file.
2) There is some race condition where not all events are read, but the ref->seg_last_size is still updated.

Joe Bester - 2012-03-28

Is there a gram log?

bbockelm - 2012-03-28

To help the user along, I restarted the job manager.  The file parsed successfully and the job ended with respect to GRAM.  So, I suppose we are looking for a race condition.

Oh - and another interesting thing.  Even though the job is "still running", notice that there aren't any destinations for stdout/err persisted in the state file.  When the job manager is restarted, Condor-G and GRAM will go into an infinite loop unless Globus 129 is suppressed.

bbockelm - 2012-03-28

Here's the GRAM log; not very useful:

I killed off the job-manager at around 13:10:12; PID 2605 is the restarted job-manager where it successfully parsed the state file.

ts=2012-03-28T09:47:59.433535Z id=5350 event=gram.send_job.end level=WARN status=-3 errno=2 msg="Error creating datagram socket" reason="No such file or directory"
ts=2012-03-28T10:49:30.658964Z id=20630 event=gram.send_job.end level=WARN status=-3 errno=2 msg="Error creating datagram socket" reason="No such file or directory"
ts=2012-03-28T11:51:03.060209Z id=15125 event=gram.send_job.end level=WARN status=-3 errno=2 msg="Error creating datagram socket" reason="No such file or directory"
ts=2012-03-28T12:52:35.342865Z id=28350 event=gram.send_job.end level=WARN status=-3 errno=2 msg="Error creating datagram socket" reason="No such file or directory"
ts=2012-03-28T13:11:12.414579Z id=2606 event=gram.send_job.end level=WARN status=-3 errno=111 msg="Error creating datagram socket" reason="Connection refused"
ts=2012-03-28T13:11:12.431932Z id=2605 event=gram.set_job_status.end level=WARN gramid=/16217795921895230381/17186126506088338356/ state=2 failure_code=0 status=-156 reason="the job contact string does not match any which the job manager is handling"
ts=2012-03-28T13:11:19.773143Z id=2605 event=gram.signal.end level=WARN gramid=/16217795921895230381/17186126506088338356/ signal="8" jmstate=GLOBUS_GRAM_JOB_MANAGER_STATE_TWO_PHASE_END msg="Stdout size mismatch" status=0 stdout_signal_size=0 stdout_actual_size=194 reason="Success"

bbockelm - 2012-03-28

Here's a second example of the same behavior.  The contents of the condor log file are a bit simpler - likely simple enough to eliminate the chance of a parsing issue.

This came from a different CE; as before, no useful GRAM log file.

bbockelm - 2012-03-28

Actually, it looks like this problem is fairly widespread.  I extended my Globus state parser script to also query Condor.

On one of my CEs, I count 3300 jobs in either the ACTIVE or PENDING Globus job state with no corresponding job in Condor.

[root@red-gw1 gram_job_state]# ~/globus_state_parser -x FAILED,DONE | grep Unknown | wc -l
3300

bbockelm - 2012-03-28

Hi Joe,

Talking with the Condor folks, I think the approach used in this code is unsafe.  Consider this ordering:
1) poll_time = time() is executed.
2) Condor writes an event to the log, where the event occurs after poll_time.
3) g-j-m calls fstat to check size of the file.
4) When the log is parsed, the event is read but rejected because the timestamp happens after poll_time
5) At the next SEG run, the file is ignored because it grew before step 3.

The immediate fix may be to no longer ignore "too new" events.  However, that's got other issues - you'll get duplicated events.

The Condor guys suggested to not use the date for ordering events, regardless.  They suggested to count the number of events that appear in the user log (it's never rotated) and to remember the last completely parsed event.  A bit more coding work, but a much more robust solution.

Brian
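
The ordering in steps 1 to 5 above can be replayed deterministically. The sketch below is a toy model, not the SEG code: the log is a Python list, its length stands in for the file size reported by fstat, and the function and key names are hypothetical.

```python
def poll_with_timestamp_filter(log, state, poll_time):
    """One poll that rejects events stamped later than poll_time.

    log is a list of (timestamp, name) tuples; state["last_size"] is the
    size seen by the previous poll. Returns the events delivered.
    """
    size = len(log)                  # step 3: size check (fstat stand-in)
    if size == state["last_size"]:
        return []                    # step 5: file looks unchanged, skip it
    new_events = log[state["last_size"]:]
    delivered = [e for e in new_events if e[0] <= poll_time]  # step 4
    state["last_size"] = size        # recorded even if events were rejected
    return delivered


log = []
state = {"last_size": 0}
poll_time = 100                      # step 1: poll_time = time()
log.append((101, "job finished"))    # step 2: event stamped after poll_time
first = poll_with_timestamp_filter(log, state, poll_time)
second = poll_with_timestamp_filter(log, state, poll_time=200)
print(first, second)                 # both lists are empty: the event is lost
```

In this model the event is rejected once as "too new" and never looked at again, because the recorded size already covers it.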

Joe Bester - 2012-03-28

Attached a patch. It keeps track of the parsed offset instead of the file length as the seg_last_size.
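
The approach of remembering the parsed offset rather than the file length might look roughly like this. It is a sketch, not the attached patch: it assumes newline-terminated events, which is a simplification of the real Condor log format, and the function name is hypothetical.

```python
def read_complete_events(data: bytes, offset: int):
    """Return (events, new_offset), consuming only complete events.

    An event counts as complete when its terminating newline is present;
    a partial event at the end is left for the next poll.
    """
    events = []
    while True:
        end = data.find(b"\n", offset)
        if end == -1:
            break                    # partial event: do not advance past it
        events.append(data[offset:end].decode())
        offset = end + 1             # consume the terminator as well
    return events, offset
```

Because the offset only moves past events that were fully parsed, a poll that races with a writer cannot skip data, and a fully parsed file leaves no unread bytes behind to trigger another parse.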

bbockelm - 2012-03-28

Dumb question (have a meeting, don't have time to look at the code) - is this info persisted to disk anywhere?  I ask because I want to make sure it is safe to upgrade for running jobs.

Joe Bester - 2012-03-28

It's not persisted, but the timestamp of the last handled event is. The size is only used to determine whether to check a file again (an optimization related to GRAM-273).

Joe Bester - 2012-03-29

Updated a new patch GRAM-329.diff that removes the timestamp stuff for parsing condor logs, and avoids reparsing events that have already been read by a job manager process.

bbockelm - 2012-04-09

Hi Joe,

There's a problem with the previous patch - it doesn't capture the "\n" at the end of the event.

This causes the log file to be re-parsed with each iteration, as there are always 5 bytes at the end of the file left unparsed - in turn, this causes the g-j-m to consume significantly more resources than before this fix.

This patch reduced, but did not eliminate, the problem.  Don't know what the remaining issue is.

Brian

alainroy - 2012-04-15

What's the state of this? Any chance of eliminating the final problems? I'd really like to ship an update next week if possible. Thanks!

Alain

Joe Bester - 2012-04-18

Brian, were you referring to the first or second patch?

bbockelm - 2012-04-18

This was the second patch (GRAM-329.patch, not GRAM-229.patch).

Joe Bester - 2012-04-18

Found the following: the parser didn't anchor the regex, so adding the match length didn't move past the end of the whole message. Also, the parser didn't know about the time() style values (which might be errors on condor's part?) and so it would fail to parse things that came after them.

This attached version is the same as 5.2.1rc2 + patches to fix the above issues.
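
The unanchored-regex problem described above can be illustrated in isolation. The sketch uses Python's re module and a made-up <event> tag rather than the real parser or the real Condor log syntax, so every name and pattern here is an assumption.

```python
import re

# Hypothetical event syntax, for illustration only.
LOOSE = re.compile(rb"<event>.*?</event>", re.DOTALL)
ANCHORED = re.compile(rb"\s*<event>.*?</event>\s*", re.DOTALL)


def advance_loose(buf: bytes, offset: int) -> int:
    """Buggy: the match may start after offset, so offset + length undershoots."""
    m = LOOSE.search(buf, offset)
    return offset + len(m.group(0)) if m else offset


def advance_anchored(buf: bytes, offset: int) -> int:
    """The match must begin exactly at offset, so its end is the new offset."""
    m = ANCHORED.match(buf, offset)
    return m.end() if m else offset


buf = b"\n<event>done</event>\n"
print(advance_loose(buf, 0), advance_anchored(buf, 0), len(buf))
```

With the loose pattern the computed offset stops short of the end of the message, so the tail of the buffer is parsed again on every pass; the anchored pattern consumes the whole message, surrounding whitespace included.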

Joe Bester - 2012-04-26

Do you have any feedback on this latest version?

Joe Bester - 2012-05-04

This has been added to OSG so I've committed it to trunk and 5.2 branch.

Globus Toolkit/GT-164

Summary

add a hybrid split/single mode which only creates backend connections if client requests stripes.

Details

Type: New Feature

Status: Resolved 2012-07-24

Description

In many cases with striped server setups, the majority of transfers are non-striped.  This causes less efficient transfers for those users and more resource usage for the servers.  A hybrid mode where the server runs as a standalone server until a striped command is used would enable the best of both worlds, and with simpler setup for admins.

Comments

Mike Link - 2012-04-17

Added this and seems stable after initial testing.

Mike Link - 2012-05-09

fully tested, ready to go for 5.2.2

Globus Toolkit/GT-165

Summary

Threaded server has a race condition with parallel data channels and loading crls

Details

Type: Bug

Status: Resolved 2012-05-22

Description

Gayane was testing threaded+encryption, and ran into an error where a data channel with more than one stream would sometimes fail to connect.   We traced the error to a race condition in loading the crls into the global hash table.  I believe if loading the crls is slow enough, the 2nd connection attempts to load them before the first attempt has finished (I think they should only be loaded once), which causes a failure when openssl thinks a duplicate crl was found.

Comments

Mike Link - 2012-05-22

Fixed by GT-166

Globus Toolkit/GT-166

Summary

Threaded server data channel connection error

Details

Type: Bug

Status: Resolved 2012-05-22

Description

possibly related to GRIDFTP-238.  multiple data channels with threaded servers occasionally fail with:
500-OpenSSL Error: a_verify.c:168: in library: asn1 encoding routines, function ASN1_item_verify: EVP lib
500-OpenSSL Error: rsa_eay.c:676: in library: rsa routines, function RSA_EAY_PUBLIC_DECRYPT: padding check failed
500-OpenSSL Error: rsa_pk1.c:100: in library: rsa routines, function RSA_padding_check_PKCS1_type_1: block type is not 01

I suspect with the proper fix for 238 this may be fixed as well.

Comments

Mike Link - 2012-05-22

Locking around x509_verify_cert() during the connection handshake fixes the problem.  This may be an openssl bug in older versions of openssl, where x509_verify_cert() is not thread safe.  I'm not able to reproduce the problem in 1.x, so it may have been fixed, but since I don't see any evidence of that in the openssl changelogs I'm not inclined to restrict the locking to older openssl versions.

After the fix I don't see a noticeable delay in forming the data channel with large numbers of parallel streams -- it is still slightly faster than a non-threaded run.
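
The race from GT-165 and the effect of serializing the critical step can be sketched as follows. This is a Python illustration, not the server's C code: `crl_table`, `crl_lock`, and `load_crls_once` are hypothetical names, and the actual fix locks around the certificate verification call rather than around a dictionary.

```python
import threading

# Hypothetical stand-ins for the global CRL hash table and its guard.
crl_table = {}
crl_lock = threading.Lock()
load_count = 0


def load_crls_once(issuers):
    """Insert each CRL exactly once, even when called from many threads."""
    global load_count
    with crl_lock:
        # Check and insert under the same lock so two connections
        # cannot both decide that an entry is missing.
        for issuer in issuers:
            if issuer not in crl_table:
                crl_table[issuer] = "crl-for-" + issuer
                load_count += 1


def open_parallel_streams(n, issuers):
    """Simulate n data channels starting their handshakes together."""
    threads = [threading.Thread(target=load_crls_once, args=(issuers,))
               for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


open_parallel_streams(8, ["ca-a", "ca-b"])
```

Without the lock, two threads could both see a missing entry and both try to insert it, which is the duplicate-CRL situation described in GT-165.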

Globus Toolkit/GT-167

Summary

UMD Criterion: EGI_GENERIC_SEC_1 (Writable files)

Details

Type: Bug

Status: Resolved 2012-05-22

Description

GridFTP server creates the "/var/log/gridftp.log" file with the following permission rights: 0666, which means that the file is writable by anyone.

See below an extract from the results I got using strace on the globus-gridftp-server execution.

open("/var/log/gridftp.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=164975, ...}) = 0

Comments

support@ige-project.eu - 2012-05-02

Hi,

Any news on this ticket?

Regards,
Sebastian

Mike Link - 2012-05-22

0666 is the mode used by fopen() when a file is created.  This only results in a world-writable file when the umask does not clear the other-write bit (for example a umask of 000), which shouldn't happen, but in case it does I've added a fix to set the umask to 022 for log files and the pidfile, which will force a default permission of 644 for those files.  The permissions can be further controlled with -log-filemode.
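
The interaction between the requested mode and the umask can be checked with a short Python sketch. It is not the server's code; `create_log` is a hypothetical helper that issues the same kind of open() call shown in the strace extract and reports the permission bits the file actually ends up with.

```python
import os
import stat
import tempfile


def create_log(path, umask):
    """Create path requesting mode 0666 under the given umask; return its mode bits."""
    old = os.umask(umask)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o666)
        os.close(fd)
    finally:
        os.umask(old)  # restore the caller's umask
    return stat.S_IMODE(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as d:
    mode_022 = create_log(os.path.join(d, "a.log"), 0o022)  # typical umask
    mode_000 = create_log(os.path.join(d, "b.log"), 0o000)  # permissive umask
```

On a POSIX system the effective mode is the requested mode with the umask bits cleared, which is why the strace output shows a request for 0666 but an fstat() result of 0644.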

Globus Toolkit/GT-168

Summary

Documentation regarding GridFTP in UDT isn’t up to date

Details

Type: Documentation

Status: Resolved 2012-09-05

Description

In the GridFTP configuration page, in particular in the UDT section (http://www.globus.org/toolkit/docs/5.2/5.2.0/gridftp/admin/#gridftp-config-udt), the GT website states that the administrator should use the threaded version of the server and/or the client component.
In GT 5.2 this difference has disappeared. The webpage should be updated accordingly.

Comments

support@ige-project.eu - 2012-05-02

Hi,

Any news on a fix for this?

Regards,
Sebastian

Raj Kettimuthu - 2012-09-05

This has been fixed.

Globus Toolkit/GT-169

Summary

get a native Windows 5.2 GridFTP build/install working

Details

Type: Task

Status: Resolved 2012-06-05

Description

Test in the existing Windows GC framework

Comments

Mike Link - 2012-02-14

only a little work done with this, mostly getting recent additions to build.  still ongoing.

Mike Link - 2012-04-17

this is finally stable and should be ready to release to testers.

Mike Link - 2012-05-02

Still need to make a standalone installer.

Mike Link - 2012-05-22

I updated my machine at some point in the last few weeks, which seems to have broken my ability to build: the new version of gcc no longer accepts '-no-undefined', which is exactly what I need in order to build dlls.  If I can't see an obvious fix I'll try to downgrade or try to build on an older machine.  Either way I'll have packages this week.

Mike Link - 2012-06-05

installers are ready.

Globus Toolkit/GT-170

Summary

create top level commands to eventually replace some site commands

Details

Type: Task

Status: Open

Description

Go through list of site commands and create synonymous top-level commands for 5.2.x where appropriate.  Work with the full protocol command list Raj is putting together.

Comments

Mike Link - 2012-03-27

Need to discuss this further.  The candidates for new names are:
 SITE RDEL  pathname
 SITE CHMOD  mode  pathname
 SITE CHGRP  group  pathname
 SITE SYMLINKFROM  reference-path
 SITE SYMLINKTO  link-path

but arguments can be made for keeping them site commands.  CHMOD in particular is long lived outside of globus and has always been SITE.

Globus Toolkit/GT-171

Summary

Prototype GridFTP OSCARS integration

Details

Type: User Story

Status: Resolved 2012-10-12

Description

OSCARS provides an API to create virtual circuits (with guaranteed bandwidth) between a pair of endpoints. Prototype the ability to reserve bandwidth in GUC using the OSCARS API. Eventually, we would like to add this ability to GO. Part of this exercise involves providing input to the ESnet folks on the OSCARS API (what kind of improvements are needed to their API, any additional functionality, etc.).

Comments

Mike Link - 2012-01-24

Started with a simple client library plugin which calls a configured script before a PORT command.  The script calls the example java oscars reservation app.

There are limitations to this approach because without server protocol support, the source data address has to be assumed to be the same as the source control channel.  Additionally the oscars reservation involves a request and then polling to find if the request has been approved.  If approval is not fairly quick there may be issues with keeping the server alive while we wait.

Mike Link - 2012-05-21

Asked Raj to talk to his esnet contacts to get me an account on their oscars system.  Might be a bit of work to figure out how to incorporate those credentials into the client, but I'm not sure yet.  Until now we thought it was an open system.

Mike Link - 2012-06-19

figured out the authentication and can successfully create a reservation via the plugin.  it is ready for meaningful tests.

Globus Toolkit/GT-172

Summary

Extend DSI interface to allow DSI-defined ftp commands

Details

Type: New Feature

Status: Resolved 2012-05-22

Description

This has been requested for some time, but the new HPSS dsi impl from ncsa would require this in order to install on to a standard server.  Additionally I need to look at the ability for the dsi to return custom tags in mlsx responses.

Comments

Mike Link - 2012-05-09

added required functionality and new api.  Will get in touch with Jason Alt to test using his new HPSS dsi.

Mike Link - 2012-05-22

I made a few modifications after seeing Jason's DSI, now this is done.

Globus Toolkit/GT-173

Summary

Allow a frontend→backend connection via admin defined credentials.

Details

Type: Task

Status: Resolved 2012-05-09

Description

ipc options to specify credentials to connect with, and DNs to allow connections from, will provide an alternative way to connect than client delegated credentials.  future work of adding extensions to credentials may tie into this.

Comments

Mike Link - 2012-04-17

added -ipc-cred option to specify a credential or subject of credential (in the default search path) to connect with on the frontend, and -ipc-allowed-dn to specify subjects that the backend will accept connections from.

Globus Toolkit/GT-174

Summary

Add support GT packages for Debian 7 (wheezy)

Details

Type: Task

Status: Resolved 2012-05-09

Description

The debian wheezy release is scheduled for freeze in June. We should add support for this in its current state so that we can shake out any GT-related bugs prior to its release.

Comments

Joe Bester - 2012-04-24

I have a bamboo build and test working for the current state of the debian 7 release at http://gtbamboo.dyndns.org:8085/browse/GT52-DEB7 and http://gtbamboo.dyndns.org:8085/browse/GT52-TESTDEB7

After final release, I will update the ami, but will close this as the work to support it is done.

Globus Toolkit/GT-175

Summary

Support Ubuntu LTS releases in GT 5.2.x build plans

Details

Type: Task

Status: Resolved 2012-05-09

Description

We currently do not create binaries for ubuntu LTS releases which are still supported by ubuntu. Create new AMIs to build and test on those, and add 5.2.x build plans.

Comments

Joe Bester - 2012-04-13

10.04 was added for 5.2.1rc1.

Joe Bester - 2012-04-24

12.04 was added after 5.2.1rc2

Joe Bester - 2012-04-25

8.04 is still supported by ubuntu until 2013, but our debian package metadata don't support the old version of debhelper available for that release. I'm going to mark this task as complete, and if we have a legitimate request for support for 8.04, we can address that version. In any case we will have binaries for 10.04 and 12.04 when 5.2.1 is released.

Globus Toolkit/GT-176

Summary

Create repo packages for bamboo, versioned, and testing repo roots

Details

Type: Task

Status: Resolved 2012-05-09

Description

The bamboo tasks for the 5.2.x releases create an apt or yum repo package for the packages on www.globus.org/ftppub/gt5/5.2/5.2.x/. We should add other repo packages for www.globus.org/ftppub/gt5/5.2/testing and www.globus.org/ftppub/gt5/5.2/stable, and for bamboo.globus.org artifacts for testing.

Comments

Joe Bester - 2012-04-25

This is implemented for 5.2.1 final release.

Globus Toolkit/GT-177

Summary

Add support GT packages for ubuntu 12.04LTS (precise pangolin)

Details

Type: Task

Status: Resolved 2012-05-09

Description

The base image used for Globus graph/nexus will be moving to ubuntu 12.04LTS at the end of April.  Add support for GT packages for that system.

Comments

Joe Bester - 2012-04-20

I've created an AMI for this based on the current beta state of 12.04 and have set up build and test tasks on http://gtbamboo.dyndns.org:8085/browse/GT52-UBUNTU1204 and http://gtbamboo.dyndns.org:8085/browse/GT52-TESTUBUNTU1204

I'll update that image as the final OS release comes around, and depending on test results, we'll publish the binaries with 5.2.1 when it is released.

Joe Bester - 2012-04-24

The latest rc from CVS passes the build and test; I will update the image when the final release is made.

Globus Toolkit/GT-178

Summary

Globus Release 5.2.1

Details

Type: Milestone

Status: Resolved 2012-05-09

Description

Milestone for tracking 5.2.1 bugfixes in the toolkit

Comments

Joe Bester - 2012-04-25

I'll start finalizing this release today.

Joe Bester - 2012-04-26

All of the 5.2.1 source, binaries, and docs are updated on the web site.

Globus Toolkit/GT-179

Summary

Missing directories $GLOBUS_LOCATION/var/lock and $GLOBUS_LOCATION/var/run

Details

Type: Bug

Status: Resolved 2012-05-09

Description

Some of the paths used by init scripts for gridftp and gram rely on ${localstatedir}/lock and ${localstatedir}/run to be created before running, but they aren't in the installer prefix. Either the installer should create those, or we should add some install hooks when the init scripts are built.

Comments

Joe Bester - 2012-04-03

I added these to the installer, but it would be nicer (post-5.2.1) to add those to the packages that actually need them.

Globus Toolkit/GT-180

Summary

Prepare Security and Common doc for 5.2.0

Details

Type: Task

Status: Resolved 2012-05-09

Description

Update the various Common and Security frags to describe new functionality, bug fixes, changes, etc.

Comments

Joe Bester - 2012-04-26

I added doc on the new thread model in the common docs.

Globus Toolkit/GT-181

Summary

move GT build tasks from Bamboo to jenkins

Details

Type: Task

Status: Resolved 2012-10-02

Description

Our MCS hosted Bamboo server is not functioning well.  It isn't handling the numerous simultaneous running builds that we're doing for GT.  The task here is to investigate if using a jenkins integration testing server is a viable and better option.

The Graph team is already maintaining one for their needs that we can use as well.
   https://jenkins.utils.globuscs.info/

Comments

Joe Bester - 2012-04-18

I have EC2 AMIs for all of the OS we support for 5.2.1 that can do things with bamboo. Is there anything I'd need to add to those to allow Jenkins to start them up and start build tasks on them? Bamboo had an agent that the images needed to run to be able to communicate with it.

Joe Bester - 2012-05-24

I'm spending a lot of time babysitting bamboo and would *really* like to see this done.

Mike Link - 2012-06-05

Started work on this.  I set up a jenkins instance on my test server, and created the simple installer creation, installer test jobs, and source package creation jobs.  I am able to link with EC2 and used Joe's amis to run test jobs.

I played with the configuration matrix bits and it seems like it would be a nice alternative to separate build and test jobs for each platform -- for instance we would just need one rpm build job, and then load it with the list of labeled platform images, vs one job for each platform.  I haven't figured out if I can use the platform labels in command parameters yet, which would be the way we need to use it.

I don't think the images will need to change, at least to support the basic functionality we have in bamboo.

Mike Link - 2012-06-19

Joe, I need access to the rest of your AMIs.  Plan is to copy the same basic job structure as bamboo has and get that working, before trying the more advanced jenkins features like cvs triggers and configuration matrices.

Joe Bester - 2012-06-19

I've added permissions on the amis I've been using for your account.

Mike Link - 2012-08-13

per a call between me, Joe, Josh and Steve, there doesn't seem to be an immediate fit for running on graph's box.

Joe has access to my test install.

Joe Bester - 2012-08-24

I got all of the tests (including the installer tests) working yesterday, with test results displayed in the mikelink.com jenkins setup.

I've started migrating the installation to ec2 (currently on ec2-184-73-162-120.compute-1.amazonaws.com pending getting a dns id). I've linked it with crowd.globus.org, reusing the bamboo-user and bamboo-admin group permissions as the equivalent for jenkins. So, anyone who had access to bamboo.globus.org should be able to use jenkins as before. There might be some configuration issues to shake out after migrating things, as I think some jobs refer to mikelink.com

I've tried to separate some of the permissions out for the ec2 side, so that the jenkins account is able to create and terminate instances only, and not do anything else. I've added Stu to an AWS IAM group to manage jenkins, and will do so for Mike later. This should make it easier to share debugging/tracking chores when nodes go awry, as we can have people in the management group look at instances created by jenkins, and get the console logs and whatever easily.

I've also moved the builds from M1Large to M1Small nodes, since those support 64-bit AMIs now. I'll continue to plug away at more configuration things and hopefully get things into the shape of having a reserved instance to run the jenkins service for us.

Joe Bester - 2012-08-28

builds.globus.org points to the jenkins installation I have running on ec2 (though might be a day to get dns propagated everywhere)

Joe Bester - 2012-09-12

DNS is working now (even at MCS) and the build scripts are all working with jenkins on the builds.globus.org machine.

Still todo:
- backup schedule so that changes to jenkins tasks won't be lost if the machine dies
- Set up reserved instance for builds.globus.org to reduce costs

Joe Bester - 2012-09-19

I added some rsync/git backup to my mcs home dir that will track changes in the jenkins system configuration files and job configuration files

Joe Bester - 2012-10-02

All done.

Globus Toolkit/GT-182

Summary

gridftp truncates pathnames over 4096 chars, misleading errors

Details

Type: Bug

Status: Resolved 2012-06-19

Description

I'm getting errors like 'System error in mkdir: File exists' on mkdir and '500-globus_xio: System error in open: Is a directory' on STOR.   Neither case is giving me something close to the error I want, like 'path too long'.

I'm using files like: python print "/home/karlito/" + (("a" * 100 + "/") * 44) + "somefilename"

Comments

Mike Link - 2012-06-19

fixed so that paths that are too long will now fail with path too long errors.
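
A rough Python sketch of the behavior described in the fix: reject an over-long path up front with a 'name too long' error instead of truncating it and failing later with a misleading one. `check_path` and the `PATH_MAX` constant are illustrative assumptions (the 4096 figure comes from the ticket summary, and the real limit is platform-specific); the server itself is C.

```python
import errno
import os

PATH_MAX = 4096  # figure from the ticket summary; platform-specific in practice


def check_path(path):
    """Raise ENAMETOOLONG rather than letting an over-long path be truncated."""
    # PATH_MAX counts the terminating NUL, so a path of exactly
    # PATH_MAX bytes is already too long.
    if len(os.fsencode(path)) >= PATH_MAX:
        raise OSError(errno.ENAMETOOLONG, os.strerror(errno.ENAMETOOLONG))
    return path


# Path built the same way as in the report: 44 components of 100 characters.
long_path = "/home/karlito/" + (("a" * 100 + "/") * 44) + "somefilename"

try:
    check_path(long_path)
    result = "accepted"
except OSError as e:
    result = errno.errorcode[e.errno]
```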

Globus Toolkit/GT-183

Summary

Usage stats server doesn’t discard bad packets

Details

Type: Bug

Status: Resolved 2012-05-10

Description

The usage stats server sometimes encounters a badly formatted gram packet with a missing or malformed uuid. After it fails to insert this packet, it tends to put an hour's worth of usage stats in the failed state for later. It's fairly easy to detect this type of bad packet (the uuid is missing or not in uuid form) so we should be able to discard these by filtering them out when the insert statement is generated for that packet.

Comments

Joe Bester - 2012-05-10

I've committed a change to usagepacket.py and gram5packet.py that will discard usage packets which have no uuid field or a uuid field that is not 36 bytes long. These are deployed on usage-stats.globus.org. The problem in GT-113 remains, where an error can cause the usage server to abort a transaction and end up caching bad data, so some failures can propagate to multiple hours of data, especially when multiple files are to be uploaded at once.
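
The filtering idea can be sketched like this. It is not the code from usagepacket.py or gram5packet.py: the dictionary packet shape and the names `has_valid_uuid` and `filter_packets` are assumptions made for the example, and the extra well-formedness check goes beyond the length test mentioned in the comment.

```python
import uuid


def has_valid_uuid(packet):
    """Return True if the packet carries a 36-character, well-formed uuid."""
    value = packet.get("uuid")
    if not isinstance(value, str) or len(value) != 36:
        return False
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def filter_packets(packets):
    """Drop bad packets before any insert statement is generated."""
    return [p for p in packets if has_valid_uuid(p)]


packets = [
    {"uuid": "123e4567-e89b-12d3-a456-426614174000", "jobs": 3},
    {"uuid": "not-a-uuid", "jobs": 1},   # malformed
    {"jobs": 7},                         # missing
]
good = filter_packets(packets)
```

Filtering before the insert keeps one bad packet from failing the whole batch it arrived with.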

Globus Toolkit/GT-184

Summary

gram5_rsl_attribute_groups.attributes column is too small

Details

Type: Bug

Status: Resolved 2012-05-10

Description

The gram5_rsl_attribute_groups.attributes column, which contains a comma-separated list of rsl attributes in a job is not wide enough to handle all of the RSL attributes. I don't think this table is used in any of our queries, so it might be ok to ignore it altogether. I started a query to double the size of that column yesterday morning, but it's been running for 24 hours without any success. I'll try to see if we have any use for that table, and if not, stop trying to insert into it.

Comments

Joe Bester - 2012-05-10

The resize completed. I've committed the schema change to CVS. I'll defer any concerns about the usage of this table for later.

Globus Toolkit/GT-185

Summary

globus-personal-gatekeeper creates too-long paths on MacOS

Details

Type: Improvement

Status: Resolved 2012-05-11

Description

The path to the job dir created by globus-personal-gatekeeper is based on $TMPDIR, which is, on MacOS, a rather long path unique to each user. Adding the various other components to the pathname, including a redundant $LOGNAME, creates a path which is longer than is valid for sockaddr_un addresses, so the job manager can't run correctly. Since the job directory gets the username and hostname appended to it, the path passed to gram can be shortened without worrying about having duplicate path names on different machines.

Comments

Joe Bester - 2012-05-11

fix committed to 5.2 branch and trunk
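
To make the length limit concrete, here is a small Python check. The directory layouts below are invented for illustration and are not the paths globus-personal-gatekeeper actually builds; the limit of 104 bytes is the size of sun_path on MacOS (108 on Linux), including the trailing NUL.

```python
import os

SUN_PATH_MAX = 104  # sizeof(sun_path) on MacOS; 108 on Linux


def fits_unix_socket(path, limit=SUN_PATH_MAX):
    """Return True if path (plus its trailing NUL) fits in sockaddr_un.sun_path."""
    return len(os.fsencode(path)) < limit


# Hypothetical values standing in for $TMPDIR, $LOGNAME and the hostname.
tmpdir = "/var/folders/ab/0123456789abcdefghijklmnopqrstuv/T"
logname = "jdoe"
host = "laptop.example.org"

# A long layout with the user name repeated, versus a shortened one.
long_sock = (tmpdir + "/personal-gatekeeper." + logname
             + "/.globus/job/" + host + "/job_manager.sock")
short_sock = "/tmp/gk." + logname + "/.globus/job/" + host + "/job_manager.sock"

long_ok = fits_unix_socket(long_sock)
short_ok = fits_unix_socket(short_sock)
```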

Globus Toolkit/GT-186

Summary

GRAM job manager leaks condor log path

Details

Type: Sub-task

Status: Resolved 2014-07-23

Description

In globus_l_gram_file_cleanup(), the job manager allocates a string for the condor log path and doesn't free it

==26696== 10,580 bytes in 115 blocks are definitely lost in loss record 665 of 670
==26696==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
==26696==    by 0x7B9FD15: globus_common_v_create_string (in /usr/lib64/libglobus_common.so.0.14.6)
==26696==    by 0x7B9FDCC: globus_common_create_string (in /usr/lib64/libglobus_common.so.0.14.6)
==26696==    by 0x4267BB: globus_l_gram_file_cleanup (globus_gram_job_manager_state.c:2226)
==26696==    by 0x424AE6: globus_l_gram_job_manager_state_machine (globus_gram_job_manager_state.c:782)
==26696==    by 0x423AB5: globus_gram_job_manager_state_machine_callback (globus_gram_job_manager_state.c:137)
==26696==    by 0x7B92033: globus_callback_space_poll_nothreads (in /usr/lib64/libglobus_common.so.0.14.6)
==26696==    by 0x7BB2E1F: ??? (in /usr/lib64/libglobus_common.so.0.14.6)
==26696==    by 0x409A3D: main (main.c:634)

Comments

Globus Toolkit/GT-187

Summary

GRAM job manager leaks during stdio update

Details

Type: Sub-task

Status: Resolved 2012-05-16

Description

There appear to be a couple of leaks in the stdio update processing, both in the rsl merging process and the stream list replacement.

The following stacks appear multiple times in the valgrind traces.

==9995== 35 bytes in 5 blocks are indirectly lost in loss record 244 of 526
==9995==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
==9995==    by 0x6F5B07F: globus_rsl_copy_recursive (in /usr/lib64/libglobus_rsl.so.2.7.1)
==9995==    by 0x41A46D: globus_gram_job_manager_rsl_merge (globus_gram_job_manager_rsl.c:77)
==9995==    by 0x4152F1: globus_l_gram_stdio_update_signal (globus_gram_job_manager_query.c:2067)
==9995==    by 0x4136B3: globus_l_gram_job_manager_signal (globus_gram_job_manager_query.c:881)

==9995== 120 bytes in 5 blocks are indirectly lost in loss record 341 of 526
==9995==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
==9995==    by 0x6F59C4D: globus_rsl_value_make_literal (in /usr/lib64/libglobus_rsl.so.2.7.1)
==9995==    by 0x6F5AF1C: globus_rsl_value_copy_recursive (in /usr/lib64/libglobus_rsl.so.2.7.1)
==9995==    by 0x4233BB: globus_l_gram_job_manager_staging_add_pair (globus_gram_job_manager_staging.c:896)
==9995==    by 0x422532: globus_l_staging_replace_one_stream (globus_gram_job_manager_staging.c:308)
==9995==    by 0x4226A6: globus_l_staging_replace_stream (globus_gram_job_manager_staging.c:408)
==9995==    by 0x422243: globus_gram_job_manager_streaming_list_replace (globus_gram_job_manager_staging.c:162)
==9995==    by 0x418374: globus_i_gram_request_stdio_update (globus_gram_job_manager_request.c:2359)


==9995== 300 bytes in 5 blocks are indirectly lost in loss record 400 of 526
==9995==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
==9995==    by 0x6F5AF06: globus_rsl_value_copy_recursive (in /usr/lib64/libglobus_rsl.so.2.7.1)
==9995==    by 0x4233CF: globus_l_gram_job_manager_staging_add_pair (globus_gram_job_manager_staging.c:897)
==9995==    by 0x422532: globus_l_staging_replace_one_stream (globus_gram_job_manager_staging.c:308)
==9995==    by 0x4226A6: globus_l_staging_replace_stream (globus_gram_job_manager_staging.c:408)
==9995==    by 0x422243: globus_gram_job_manager_streaming_list_replace (globus_gram_job_manager_staging.c:162)
==9995==    by 0x418374: globus_i_gram_request_stdio_update (globus_gram_job_manager_request.c:2359)
==9995==    by 0x41538E: globus_l_gram_stdio_update_signal (globus_gram_job_manager_query.c:2091)


==9995== 2,100 (120 direct, 1,980 indirect) bytes in 5 blocks are definitely lost in loss record 517 of 526
==9995==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
==9995==    by 0x7BA1C99: globus_list_insert (in /usr/lib64/libglobus_common.so.0.14.6)
==9995==    by 0x423672: globus_l_gram_job_manager_staging_add_pair (globus_gram_job_manager_staging.c:998)
==9995==    by 0x422532: globus_l_staging_replace_one_stream (globus_gram_job_manager_staging.c:308)
==9995==    by 0x4226A6: globus_l_staging_replace_stream (globus_gram_job_manager_staging.c:408)
==9995==    by 0x422243: globus_gram_job_manager_streaming_list_replace (globus_gram_job_manager_staging.c:162)
==9995==    by 0x418374: globus_i_gram_request_stdio_update (globus_gram_job_manager_request.c:2359)

Comments

Globus Toolkit/GT-188

Summary

gsi sysconfig leaves internal results in the error cache

Details

Type: Sub-task

Status: Resolved 2012-05-15

Description

The gsi sysconfig library generates some error objects (returned as globus_result_t) from its internal functions, and in some cases discards those before returning a GLOBUS_SUCCESS to the caller. These fill some space unnecessarily in the result cache and waste memory.

Comments

Globus Toolkit/GT-189

Summary

GRAM job manager regular expression storage grows

Details

Type: Sub-task

Status: Resolved 2012-05-16

Description

Like GT-188, not a true leak, but (at least on linux) the regexec() call to parse the condor log will add nodes to its internal state table as it parses different inputs. These will never be freed until a call to regfree() on the compiled regular expression.

Comments

Globus Toolkit/GT-190

Summary

GRAM job manager leaks callback contact

Details

Type: Sub-task

Status: Resolved 2012-05-16

Description

I've seen a few valgrind stacks like this after running the client test suite against a job manager running with valgrind. I'm not sure what sort of jobs are triggering this leak, but it seems to be recurring.

==1514== 855 bytes in 15 blocks are definitely lost in loss record 365 of 376
==1514==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
==1514==    by 0x9044B41: strdup (in /lib64/libc-2.12.so)
==1514==    by 0x50385CE: globus_l_gram_protocol_get_string_attribute (globus_gram_protocol_pack.c:2420)
==1514==    by 0x5038B86: globus_gram_protocol_unpack_job_request (globus_gram_protocol_pack.c:253)
==1514==    by 0x425B18: globus_gram_job_manager_read_request (globus_gram_job_manager_state.c:1533)
==1514==    by 0x416553: globus_gram_job_manager_request_load (globus_gram_job_manager_request.c:982)
==1514==    by 0x42EABD: globus_l_gram_startup_socket_callback (startup_socket.c:1680)
==1514==    by 0x6D032A7: globus_l_xio_read_write_callback_kickout (globus_xio_handle.c:1216)
==1514==    by 0x6D0383F: globus_i_xio_read_write_callback (globus_xio_handle.c:1184)
==1514==    by 0x6D0B38B: globus_l_xio_driver_op_read_kickout (globus_xio_driver.c:637)
==1514==    by 0x6D1678C: globus_xio_driver_finished_read (globus_xio_pass.c:1238)
==1514==    by 0x6D25B90: globus_l_xio_file_system_read_cb (globus_xio_file_driver.c:682)

Comments

Joe Bester - 2012-05-16

I confirmed that this is a restart job-related leak, as I am seeing it with restart-to-new-url-test but not a normal job request.

Globus Toolkit/GT-191

Summary

Investigate GRAM failures reported to usage stats

Details

Type: Task

Status: Open

Description

There have been a large number of jobs failing for reasons other than user cancel. Investigate those failures, and try to find interesting relationships between the errors and service deployments or client-requested features, or LRMs used.

Comments

Joe Bester - 2012-05-17

I've updated my prototype gram error viewer page with some recent data at http://www.mcs.anl.gov/~bester/gram_job_reports/
It's still a bit crude, but shows where the majority of errors are happening.

Globus Toolkit/GT-192

Summary

Segfault in globus-gram-streamer

Details

Type: Bug

Status: Resolved 2012-05-22

Description

See https://ticket.grid.iu.edu/goc/12125

Reported by Florida shortly after upgrade to 13.35.  Backtrace:

(gdb) bt
#0  0x00000000004092e9 in globus_gram_job_manager_request_log (
request=0x7fff1bdd87d0, level=GLOBUS_GRAM_JOB_MANAGER_LOG_TRACE,
format=0x429280 "event=gram.state_file.read.end level=TRACE path=%s status=%d \n") at globus_gram_job_manager_request.c:1515
#1  0x0000000000411b03 in globus_gram_job_manager_state_file_read (
request=0x7fff1bdd87d0) at globus_gram_job_manager_state_file.c:766
#2  0x0000000000408139 in main (argc=5, argv=0x7fff1bdd9048)
at globus_gram_streamer.c:150

It appears request->config is NULL and the log function doesn't check for this.

Comments

Joe Bester - 2012-05-22

I've committed a fix to this.

Globus Toolkit/GT-193

Summary

please document how openssl is found

Details

Type: Improvement

Status: Open

Description

just OPENSSL_CFLAGS and OPENSSL_LIBS should do it
OPENSSL_CFLAGS="-I ${openssl}/include" OPENSSL_LIBS="-L${openssl}/lib -lssl -lcrypto"

please put that in the INSTALL or somewhere on the web site for those of us on macs without pkg-config nor the inclination to build it and its crap dependencies.

Comments

Karl Pickett - 2012-05-16

also, I don't even see a doc that you use pkg-config.

Globus Toolkit/GT-194

Summary

Assert failure in cleanup code

Details

Type: Bug

Status: Open

Description

I saw the following assertion failure, probably when the job was shutting down.

(gdb) bt
#0  0x00000034ae630265 in raise () from /lib64/libc.so.6
#1  0x00000034ae631d10 in abort () from /lib64/libc.so.6
#2  0x00000034ae6296e6 in __assert_fail () from /lib64/libc.so.6
#3  0x000000000040cc9d in globus_gram_job_manager_stop_all_jobs (manager=0x7fff6b999180) at globus_gram_job_manager.c:2104
#4  0x00000034b361850e in globus_callback_space_poll_nothreads () from /usr/lib64/libglobus_common.so.0
#5  0x00000034b363846f in ?? () from /usr/lib64/libglobus_common.so.0
#6  0x0000000000409741 in main (argc=, argv=) at main.c:641
(gdb) up 3
#3  0x000000000040cc9d in globus_gram_job_manager_stop_all_jobs (manager=0x7fff6b999180) at globus_gram_job_manager.c:2104
2104            assert(rc == GLOBUS_SUCCESS);
(gdb) p ref
$2 = 
(gdb) p ref->key
Cannot access memory at address 0x0

This is with the latest release (13.35-0.4), *without* the OSG patch to globus_gram_job_manager_stop_all_jobs.

Comments

Joe Bester - 2012-05-17

Do you have a core file and link to the version of the job manager you are using?

Globus Toolkit/GT-195

Summary

GridFTP acts as wrong user when user doesn’t exist

Details

Type: Bug

Status: Resolved 2012-05-31

Description

We're using GridFTP from GT 5.2.1 and we (Doug Strain and Neha Sharma) found an interesting bug. Normally, GridFTP maps me to the user that I am mapped to in the grid-mapfile. For instance, when I'm mapped like this:

"/DC=org/DC=doegrids/OU=People/CN=Alain Roy 424511" alainroy

I'm mapped to the alainroy user. I can easily tell which user it is with UberFTP, though the client is irrelevant:

% uberftp fermicloud084
220 fermicloud084.fnal.gov GridFTP Server 6.5 (gcc64, 1323378368-83) [unknown] ready.
230 User alainroy logged in.
UberFTP> pwd
/cloud/login/alainroy

However, if I'm mapped to a user that doesn't exist, GridFTP appears to pick the last user in /etc/passwd. For example, when alainroy is misspelled:

"/DC=org/DC=doegrids/OU=People/CN=Alain Roy 424511" alainroyy

I'm mapped to the tomcat user:

% uberftp fermicloud084
220 fermicloud084.fnal.gov GridFTP Server 6.5 (gcc64, 1323378368-83) [unknown] ready.
230 User alainroyy logged in.
UberFTP> pwd
/usr/share/tomcat5

apparently because Tomcat is the last user in the passwd file:

% tail -1 /etc/passwd
tomcat:x:91:91:Tomcat:/usr/share/tomcat5:/bin/sh

Another example:

% globus-url-copy file:///cloud/login/alainroy/shar.pl gsiftp://fermicloud084.fnal.gov/tmp/shar.pl
% ls -l /tmp/shar.pl
-rw-r--r-- 1 tomcat tomcat 55051 May 17 12:11 /tmp/shar.pl

I would think that if the user doesn't exist, something safer would happen. Probably you should deny access.

Lest this seem like a rare condition, it's pretty common for people in OSG to mistakenly authorize users that don't have accounts. People authorize whole VOs because they authorize "everyone in OSG" but regularly forget to make any of the accounts for them. So this may well be a common problem and could cause security breaches. Definitely something to fix.

If you provide us with a patch, we can ship a patched version to OSG in advance of a new release from you.

Thanks!
-alain

Comments

Stuart Martin - 2012-05-17

Alain notes that this bug is not in 4.0

Mike Link - 2012-05-17

I see the problem and will have a patch soon.

Mike Link - 2012-05-17

Patch attached.  The problem was that we were improperly checking the return values from our getpw* wrapper functions.

alainroy - 2012-05-17

This is great, thanks! We'll try the patch and let you know how it goes.

alainroy - 2012-05-17

I built a new GridFTP server with the patch, and I get much better behavior now:

% uberftp fermicloud084
220 fermicloud084.fnal.gov GridFTP Server 6.5 (gcc64, 1323378368-83) [unknown] ready.
530 Login incorrect. : Mapped user 'alainroyy' is invalid.

This was an incredibly fast response time. Thank you!

Mike Link - 2012-05-17

Great.  Thanks for reporting it.

Mike Link - 2012-05-23

This is not limited to the 5.2 release.  Threaded builds of all earlier releases would be affected as well.
The vulnerability is related to the use of getpwnam_r() in our wrapper library.  That was used when certain autoconf macros were defined, which is the case in 5.2 and only threaded builds prior to 5.2 (*thr* in the flavor name).
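
For readers unfamiliar with the failure mode: POSIX getpwnam_r() reports "no such user" by returning 0 and setting its result pointer to NULL, so a caller has to test both values. The Python sketch below only models that convention; `lookup_user`, `map_user`, and the `Passwd` tuple are hypothetical names for illustration and are not the Globus wrapper functions or the attached patch.

```python
from collections import namedtuple

# Illustrative stand-in for a passwd entry.
Passwd = namedtuple("Passwd", "name uid home")

def lookup_user(name, database):
    """Model of a getpwnam_r()-style call: returns (rc, result).

    rc is 0 unless the lookup itself failed. An unknown name is NOT an
    error: it is reported as rc == 0 with result set to None.
    """
    for entry in database:
        if entry.name == name:
            return 0, entry
    return 0, None

def map_user(name, database):
    """Return the account for name, or raise PermissionError."""
    rc, result = lookup_user(name, database)
    # Checking rc alone would treat "no such user" as success.
    if rc != 0 or result is None:
        raise PermissionError("Mapped user '%s' is invalid." % name)
    return result
```

With a database containing alainroy and tomcat, `map_user("alainroyy", db)` raises instead of falling through to some other account.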

Globus Toolkit/GT-196

Summary

globus debian repos are missing debian source packages

Details

Type: Bug

Status: Resolved 2012-06-05

Description

In trying to create update packages for some bug fixes, I noticed that our repos are missing debian source packages altogether.

Comments

Joe Bester - 2012-05-22

I've updated the build-debs script to create those; next time we rebuild and distribute updates, we should get those in place.

Joe Bester - 2012-06-05

This is fixed in the testing repo.

Globus Toolkit/GT-197

Summary

debuginfo packages missing for rhel and centos 5

Details

Type: Bug

Status: Resolved 2012-05-23

Description

The debuginfo packages are missing for the RPMs for CentOS 5 and the RHEL builds. These packages contain the debugging symbols and source files so that users can use the globus tools in a debugger.

Comments

Joe Bester - 2012-05-21

The package redhat-rpm-config will need to be added to the build AMIs for this.

Joe Bester - 2012-05-22

I've updated CentOS 5 to help me debug the issue that caused me to add this bug report, and will do the rest next sprint.

Joe Bester - 2012-05-23

rhel images are updated as well

Globus Toolkit/GT-198

Summary

globusrun crashes when authentication fails for status check

Details

Type: Bug

Status: Resolved 2012-05-21

Description

globusrun -status will crash if it fails to authenticate with the job manager process, as it tries to look into an uninitialized extensions hashtable.

Comments

Globus Toolkit/GT-199

Summary

GRAM audit checks result username incorrectly

Details

Type: Bug

Status: Resolved 2012-05-22

Description

The gram audit code uses globus_libc_getpwuid_r() but doesn't check that res is non-NULL, so if the current uid can't be resolved, it will record the last uid that getpwuid returns.

Comments

Globus Toolkit/GT-200

Summary

Debug issues with GridFTP server at APS

Details

Type: Task

Status: Resolved 2012-06-05

Description

APS was having strange issues on a SUSE box, including firewall-like hangs with no firewall present, and slowing transfer speeds when it did work.   The machine is on a restricted network, so I tried debugging via email, unsuccessfully.  I then loaded the same distro on a vm image and tested with that, but didn't see a problem.

Suresh is going to see if he can arrange remote access to the machine, and if so I expect I could spend a day debugging directly.

Comments

Mike Link - 2012-06-05

Got access to machine, verified problem.  After debugging I saw it was a problem with multicast dns lookups hanging, possibly conflicting with their service.  Admins disabled mdns and problems are gone.

Globus Toolkit/GT-202

Summary

Evaluate RPM multi-relocate vs setting alternate RPM root for XSEDE GridFTP

Details

Type: Task

Status: Resolved 2012-06-05

Description

XSEDE sites want to be able to install a testing version of GridFTP (or other software) in an alternate install location.  We need to figure out an acceptable way for them to do this.  It seems that building the Globus Toolkit RPMs as relocatable is probably easiest for both our team and the RPs, but we need to try it out to confirm that this strategy is viable.

Comments

Eric Blau - 2012-06-05

It turns out that both approaches are viable; each has some small caveats:

RPM multi-relocate works well, and is easy to add to the spec files.
    Downsides:
    - Have to modify all spec files (though perhaps not until each package is updated, see below)
    - Cannot have multiple copies of the same package on the machine, even in different locations
    - No problem having multiple _versions_ of the same package, as long as there are no overlapping files after relocation
    - Will require LD_LIBRARY_PATH to be set to include relocated libdir and standard libdir
    - Don't think it is possible to use YUM to install relocated packages

Alternate RPM root:
    Benefits:
    - Can be done without having to change our spec files
    - Relatively easy to use YUM--don't have to individually download packages
    - Can have as many copies of the same packages as you want, in different roots
    Downsides:
    - Unless you play tricks with copying the system rpmdb, the alternate install root has to have a bunch of system libraries and binaries (e.g. bash, perl, libc), making the alternate install root significantly larger (though still generally under 200MB)
    - Will require LD_LIBRARY_PATH and GLOBUS_LOCATION to be set to alternate root

My position is that we should add the multi-relocate section to the spec files (it should be the same for all of them, and is straightforward), and recommend that relocatable packages be used for testing locations, but include information about alternate root installs as well.

Globus Toolkit/GT-203

Summary

Create XSEDE GridFTP documentation

Details

Type: Documentation

Status: Resolved 2012-08-13

Description

Flesh out the outline for XSEDE GridFTP deployment into full documentation

Comments

Eric Blau - 2012-06-05

RPM relocation strategies have been tested, so that part of the documentation can now be fleshed out.
Still need to confirm Solaris 10 builds (possibly with user submitted patch that removes a currently unused typedef)

Eric Blau - 2012-06-19

Have been adding information to the outline, have 70%+ draft.

Eric Blau - 2012-07-03

Outline in presentable form, in process of testing to confirm that the recommended steps and configurations (and tests) are correct.

Eric Blau - 2012-07-17

Have gotten feedback from Mike Link on a couple of points that need revision, and on some new(ly packaged) tests that should be added.

Stuart Martin - 2012-08-13

A complete version of the guide is done and sent to XSEDE staff.

Globus Toolkit/GT-204

Summary

Track XSEDE GridFTP requirements

Details

Type: Task

Status: Resolved 2012-09-05

Description

Track XSEDE requirements with regard to GridFTP deployment, to inform the documentation and software release process.

Comments

Eric Blau - 2012-09-05

This was an activity preceding and concurrent with the drafting of the 5.2.2 XSEDE GridFTP Admin documentation.
It is complete, and its results have been captured by said documentation and expanded list of platforms for GT 5.2.2 binaries.

Globus Toolkit/GT-205

Summary

gatekeeper should log a message when it exits due to the presence of /etc/nologin

Details

Type: Improvement

Status: Resolved 2012-05-24

Description

From Troy Baer from XSEDE ticket 215602:

We've had a long-standing problem on Kraken where the GRAM5 service wouldn't start up properly when the system boots.  Yesterday, we discovered that this was because the globus-gatekeeper process will exit silently (i.e. without logging *anything*) if it detects the existence of /etc/nologin.  This is highly suboptimal, and I'd like to get it fixed if possible.

(This is with gram5-5.0.4, BTW.)

Troy's recommendation:

In the presence of /etc/nologin the gatekeeper should behave just as it does in the presence of $GLOBUS_HOME/{etc,var}/globus-nologin: exit, but log why it is exiting.
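
The requested behaviour can be sketched as follows. This is a minimal Python illustration, not the gatekeeper's C code: the function names and the logger name are hypothetical, and the caller supplies the list of nologin paths to check.

```python
import logging
import os

log = logging.getLogger("gatekeeper-sketch")

def first_nologin_file(candidates):
    """Return the first existing path from candidates, or None."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None

def may_accept_logins(candidates):
    """Return True if no nologin file exists; otherwise log why and return False."""
    blocker = first_nologin_file(candidates)
    if blocker is not None:
        # The point of the ticket: say which file caused the exit.
        log.error("refusing connections: %s exists", blocker)
        return False
    return True
```

A caller would pass something like `["/etc/nologin", ...]` and exit when the function returns False, leaving a log line that names the file responsible.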

Comments

Joe Bester - 2012-05-23

> (This is with gram5-5.0.4, BTW.)

Does this mean we need to generate a patch for the 5.0 branch?

Stuart Martin - 2012-05-24

Yes.  XSEDE does not have any clear plans at present to upgrade to 5.2.x, so they will need a 5.0 patch.

Joe Bester - 2012-05-24

I've committed a fix for this to 5.2 and 5.0 and trunk and made a new 5.0 advisory. The 5.2 advisory will either come sometime later, or the fix will end up in 5.2.2.

Globus Toolkit/GT-206

Summary

Build RPMs for CentOS 4

Details

Type: Task

Status: Resolved 2012-06-05

Description

CentOS 4 is used by XSEDE for Ranger. It has older versions of many packages, so some of the build and runtime dependencies will need updates to work with it. Also, the version of yum on CentOS 4 is old and doesn't handle HTTP redirects like the newer versions do, so the build-rpms script will need some tweaks to generate repos to use. This build doesn't currently need myproxy.

Comments

Joe Bester - 2012-06-05

The packages are loaded into the testing repo. To install, first get http://www.globus.org/ftppub/gt5/5.2/testing/packages/rpm/centos/4/x86_64/Globus-testing-config.centos-4-1.noarch.rpm

Globus Toolkit/GT-207

Summary

define and document 5.2 release stream

Details

Type: Task

Status: Resolved 2014-07-23

Description

With native packaging, it's becoming much easier for us to push bug fixes to sites via the updates repository. Doing so, however, introduces some confusion about what things like Toolkit Version 5.2.1 really mean, as doing a yum or apt install will get 5.2.1 + bugfixes and eventually fold into 5.2.2 and whatever else follows. This is somewhat complicated by situations (solaris, macos) where we don't have native packages or updaters in place. In those cases, we'd need to push out update source packages as before.

So, we need to define some behaviors related to the new updating release:
- What do the 5.2 release versions mean?
  - What do you get when you install the repo packages linked to from the toolkit downloads page?
- How to upgrade and downgrade packages (to be documented)
- How will we keep documentation up to date, while still keeping relevant documentation for previous versions available?
- How will we push updates to the non-native packaged world?

Comments

Joe Bester - 2012-06-01

http://confluence.globus.org/display/GT/GT+5.2+Release+Stream+*DRAFT*

Some things have been done already:
- The frag generator adds a pass on the Known_Problems frag and adds links to update packages that are relevant
- The advisories.html page includes better documentation on how to install gpt, rpm, and debian update packages

Joe Bester - 2012-09-12

It looks like all of the critical issues in the release stream doc are in place or in jira as separate items.

Globus Toolkit/GT-208

Summary

Move toolkit documentation to globustoolkit.org

Details

Type: Task

Status: Open

Description

The globus.org website is going to become more globusonline.org and less globus.org/toolkit, so we'll need to figure out how to manage the documentation on the new site. This is somewhat related to GT-207, as it's not entirely clear what the 5.2 documentation will look like with the auto-updating release stream. At the least, the documentation parts of the http://confluence.globus.org/display/GT/GT+5.2+release+process should be updated to run on the new (virtual?) server, with perhaps some overhaul of that process to make it more automated if still needed.

Comments

Joe Bester - 2012-06-06

Reassigning to Stu, so he can see if he can find someone willing to maintain the web server on an external location.

Globus Toolkit/GT-209

Summary

job manager crash in query

Details

Type: Bug

Status: Resolved 2012-05-24

Description

The job manager crashed during a couple of tests on CentOS 4. It looks like the query callback is trying to log with a null request and this is causing the crash. Perhaps related to the GRAM-207 and GT-192 fixes.

Comments

Globus Toolkit/GT-210

Summary

grid-mapfile-check-consistency doesn’t work well

Details

Type: Bug

Status: Open

Description

The grid-mapfile-check-consistency script flags having multiple lines with the same DN as an error, but doesn't notice invalid local users except in the first line with a particular DN. I ran into this before suggesting we indicate this tool might help detect vulnerability to GT-195.
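
A check that covers every line, not just the first one per DN, might look like the sketch below. It is a Python illustration rather than a patch to the actual script; `invalid_mappings` and the `user_exists` callback are hypothetical, and the sketch assumes each line is a quoted DN followed by one or more comma-separated local user names.

```python
import shlex

def invalid_mappings(lines, user_exists):
    """Return (line_number, dn, local_user) for every mapping to a missing account.

    user_exists is a callable taking a user name and returning True or False.
    Every line is checked, including repeated DNs.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            parts = shlex.split(line)   # honours the quotes around the DN
        except ValueError:
            continue                    # malformed quoting: out of scope here
        if len(parts) != 2:
            continue                    # not a "DN user" line: out of scope here
        dn, users = parts
        for user in users.split(","):
            if not user_exists(user):
                problems.append((number, dn, user))
    return problems
```

On a real system `user_exists` could wrap `pwd.getpwnam` and catch `KeyError`; passing it in keeps the sketch testable without touching the local account database.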

Comments

Globus Toolkit/GT-211

Summary

Add details to download page about advisories

Details

Type: Documentation

Status: Resolved 2012-06-06

Description

On the download webpage, add a sentence or two on how update packages / advisories augment the 5.2.x releases.  Add a link to advisories.

http://www.globus.org/toolkit/downloads/latest-stable/

Comments

Joe Bester - 2012-06-06

I think this is handled by the changes for GT-216

Globus Toolkit/GT-212

Summary

Missing debian packages

Details

Type: Bug

Status: Resolved 2012-07-06

Description

The packages globus-gram-audit and globus-gridmap-callout-error exist in the installer and rpm packages, but not in our debian builds.

Comments

Joe Bester - 2012-06-25

globus-gram-audit is done.

Joe Bester - 2012-06-27

globus-gridmap-callout-error is done as well.

Globus Toolkit/GT-213

Summary

centos 5 test failures on sge

Details

Type: Bug

Status: Resolved 2012-05-24

Description

The CentOS 5 tests on bamboo are hanging/failing because of SGE misconfigurations. It looks like the gridengine_init script is not being run---not sure if there are leftovers in the /usr/share/gridengine directory. Also, the gridengine tools have -ge appended to them in that install (qstat-ge). The non-ge versions may be torque or gridengine depending on the order of installation.

Comments

Joe Bester - 2012-05-24

I removed the gridengine/default/common directory to get the gridengine_init script to create the SGE configuration for the EC2 image, and then modified the globus-gram-job-manager-sge.spec file to use commands with the -ge suffix when building on rhel5-based systems.

Globus Toolkit/GT-214

Summary

Leaks in the job manager restart code

Details

Type: Task

Status: Resolved 2012-06-05

Description

The unique IDs of jobs that get placed in the pending_restarts list seem to leak when a job manager is started. Also, the state file reader can leak pieces of the request structure if the request is discarded because of an LRM or service tag mismatch.

Comments

Joe Bester - 2012-05-25

valgrind log from Brian Bockelman

Joe Bester - 2012-06-01

First patch covers the definitely lost blocks from the valgrind.txt labeled "loss record 1,722", "loss record 1,739", and "loss record 1,742".

These can occur when a job manager is started with a different service tag than another job manager which has multiple job state files in place. Parts of the state files are read and then discarded when gram realizes it shouldn't bother with them, but fails to deallocate the client dn, staging file lists, and client callback contacts.

alainroy - 2012-06-01

Thanks Joe! I don't have a feeling as to how much that patch covers. Should we wait for more patches, or does this cover enough that we should be building and testing with it?

Joe Bester - 2012-06-01

I think this covers the majority of the leaks. I have another patch but am still working on testing it. That one should cover the other large leak that valgrind marks as a definite leak.

alainroy - 2012-06-01

If you think it's coming "soon" (today? Monday?) I'm tempted to wait for it. Any idea how long it will take?

Joe Bester - 2012-06-01

Probably today.

alainroy - 2012-06-01

Great, we'll wait. Thanks!

Joe Bester - 2012-06-01

Adding pending-restarts-leak.txt patch which covers loss record 1,722 in valgrind.txt

I think these two patches cover the definitely happening leaks that occur at job manager startup time that are dependent on the number of job state files on disk. I've tested that the identified leaks are gone, and have kicked off the GRAM test suite to ensure that things still work with the patches. I don't anticipate any new issues with these patches.

Joe Bester - 2012-06-05

These are tested and committed

Globus Toolkit/GT-215

Summary

hpss dsi hangs after an error during hpss read

Details

Type: Bug

Status: Resolved 2012-06-05

Description

Nick B. at nersc reported a bunch of errors from a particular user/GO transfer.  The gridftp processes were hung after the errors.

Comments

Mike Link - 2012-06-05

Fixed the hang and released an updated hpss dsi.

Globus Toolkit/GT-216

Summary

Add updates packages to the downloads page

Details

Type: Task

Status: Resolved 2012-06-06

Description

The toolkit downloads pages (such as http://www.globus.org/toolkit/downloads/5.2.1/ ) contain repos and the source installer, but the repos might contain fixes not in the source installer. Add support for parsing the advisories.txt file and generating something similar to the advisories page, but only with updates relevant to that particular release.

Comments

Joe Bester - 2012-06-06

I've updated http://www.globus.org/toolkit/downloads/5.2.1/index.html to include links to the update source packages and the advisories page.

Globus Toolkit/GT-217

Summary

review and automate documentation tasks

Details

Type: Task

Status: Resolved 2012-11-21

Description

The tasks in http://confluence.globus.org/display/GT/GT+5.2+release+process#GT5.2releaseprocess-Doccoordinatortasks should be automated so that making a release has fewer steps.
- Investigate CVS branching or automating the copy of the files
- Fix the make files so the olink and html toplevel make rules work instead of relying on find
- Use the frag generator script to create change summaries, fixed bugs, and known problems frags

Comments

Joe Bester - 2012-07-10

I've done the 2nd one of these (making the Makefile for documentation work). I've updated all of the makefiles in the  5.2.2 dirs to include a common set of rules, and then updated those rules to compute dependencies (based on xi:include and graphical inclusions) and build html and pdf output when dependencies change.

The frag generator creates the parts for #3, but those need to be manually copied into the documentation dirs.

Joe Bester - 2012-11-21

I've updated the doc for CVS branching so it's less complicated. It still might make sense to use a vendor release branch and just keep changes for new versions in branches, but it's not obviously a benefit over what we have now, so I'm closing this.

Globus Toolkit/GT-218

Summary

Document repository types and locations

Details

Type: Task

Status: Resolved 2012-07-06

Description

The various repositories (testing, stable, versioned) should be documented in the admin guide, including what they can contain and what their paths are.

Comments

Globus Toolkit/GT-219

Summary

Add instructions on how to install an update to the admin guide

Details

Type: Task

Status: Resolved 2012-06-06

Description

Modify the admin guide or toolkit release notes to contain information about how to go about installing an update package from source or from one of our repositories. Add a link to the rss page for update announcements http://www.globus.org/toolkit/rss/advisories/5.2.rss

Comments

Joe Bester - 2012-06-06

See http://www.globus.org/toolkit/docs/5.2/5.2.1/admin/install/#id2550691

Globus Toolkit/GT-220

Summary

Run automated tests when commits to metadata occur

Details

Type: Task

Status: Resolved 2012-08-13

Description

After the bamboo migration is complete, add triggers to automatically run the build and test tasks when the packaging metadata is modified. When these tasks complete, have artifacts published somewhere so that a periodic job on login.globus.org can download them and publish them to the testing repo.

Comments

Joe Bester - 2012-08-13

The automated execution is in place with jenkins. I've not implemented the updates to the testing repo, but that can be handled in a new item.

Globus Toolkit/GT-221

Summary

Make Globus RPMs relocatable

Details

Type: Task

Status: Open

Description

Eric investigated relocatable RPMS in GT-202. Support for this is apparently easy to add, so we should modify our spec files to include this feature. To do so, we will need to add relocations for /usr /etc /var to each spec file.

Comments

Joe Bester - 2012-06-29

It's less easy than originally thought, as some data in share can't be found unless we modify some code. Since XSEDE doesn't seem to need this for their multiple-version install plan, it's not urgent to do this.

Globus Toolkit/GT-222

Summary

Provide a feature to limit the number of GridFTP connections per user

Details

Type: New Feature

Status: Open

Description

xinetd allows you to configure a limit on connections from a particular ip address, but there is no way to limit the number of connections for the mapped user.  With the current server, it would require state files to keep track of the number of connections across processes, perhaps one per username populated with a single number.  The file would be locked and the number incremented or decremented for each connection or disconnection.   I believe the main difficulty would be ensuring that all disconnections get counted.
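
The state-file idea described above can be sketched in a few lines. This is an illustrative Python model, not server code: the one-file-per-username layout follows the description, while the function names, the use of `flock`, and the state directory are assumptions.

```python
import fcntl
import os

def _update_count(path, delta, limit=None):
    """Apply delta to the counter stored in path while holding an exclusive lock.

    Returns the new count, or None if the change would exceed limit.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)          # blocks until we own the file
        raw = os.read(fd, 32).decode().strip()
        count = int(raw) if raw else 0
        new_count = max(0, count + delta)
        if limit is not None and new_count > limit:
            return None                         # over the limit: leave file as is
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, str(new_count).encode())
        return new_count
    finally:
        os.close(fd)                            # closing the fd releases the lock

def connect(state_dir, user, limit):
    """Count a new connection for user; return False if the limit is reached."""
    return _update_count(os.path.join(state_dir, user), +1, limit) is not None

def disconnect(state_dir, user):
    """Count a disconnection for user."""
    _update_count(os.path.join(state_dir, user), -1)
```

The sketch does nothing about the difficulty named in the ticket: a process that dies without calling `disconnect` leaves the count too high, so a real implementation would need some way to reconcile the counter with the processes actually running.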

Comments

Globus Toolkit/GT-223

Summary

Replace job state files with something better

Details

Type: Improvement

Status: Open

Description

There are some problems with the job state files that affect the scalability and performance of GRAM. For example, each file must be read and written completely in order to be processed, as there is no way to do random access to the file because of its variable-length fields. Whenever GRAM needs to do processing on behalf of a job, it usually ends up reading the whole file, even though most of the fields are only needed when the job is submitted. Some things like stdio update and proxy refresh (which condor uses) could be handled without necessarily having to read the whole state file into memory if there was a way to do random access to get to certain pieces of the state. Also, the restart code ends up loading each state file in turn, and then doing some processing to get the job request structure into a consistent state to know what needs to be done with it.

One possibility would be to replace the state file with an sqlite database. This would add random access to the fields of the state file (add a row to register a new client callback contact, replace a row to do stdio update, query job state) and also allow the job manager to pull out the ids of jobs that need to poll or are ready to send a state callback without having a large set of job requests in memory at any given time.
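
To make the proposal concrete, here is a small Python sketch using the standard sqlite3 module. The table and column names are invented for illustration and do not reflect the real state file fields; the point is only that registering a callback contact or finding jobs due for polling becomes a single-row operation or a query.

```python
import sqlite3

# Hypothetical schema: two tables standing in for parts of the state file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job (
    job_id    TEXT PRIMARY KEY,
    state     TEXT NOT NULL,
    next_poll INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS callback_contact (
    job_id TEXT NOT NULL REFERENCES job(job_id),
    url    TEXT NOT NULL
);
"""

def open_store(path=":memory:"):
    """Open (and if needed initialise) the job store."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def add_job(db, job_id, state, next_poll):
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO job VALUES (?, ?, ?)", (job_id, state, next_poll))

def register_callback(db, job_id, url):
    """Add one callback contact without rewriting the rest of the job's state."""
    with db:
        db.execute("INSERT INTO callback_contact VALUES (?, ?)", (job_id, url))

def jobs_due(db, now):
    """Return ids of jobs whose next poll time has passed, without loading them."""
    rows = db.execute(
        "SELECT job_id FROM job WHERE next_poll <= ? ORDER BY job_id", (now,))
    return [row[0] for row in rows]
```

Only the job ids come back from `jobs_due`, so the job manager could load full request data for one job at a time instead of holding every request in memory.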

Comments

Globus Toolkit/GT-224

Summary

Manage GRAM execution per client host for scalability for different clients

Details

Type: Improvement

Status: Resolved 2012-07-06

Description

Several OSG sites have complained about a severe decrease in scalability between GRAM2 and GRAM5.  I think this is due to the fact that we previously had one grid_monitor per DN per submit host, and now have one globus-job-manager per DN.

Adding three submit hosts does not gain a factor of 3 in scalability: everything goes through the same g-j-m process.

Even worse - a significant issue at one submit host can render the other two inoperable (by hitting the concurrency limit, or making globus-url-copy stageout stall).

Can we include the submit host in the hash calculation for the g-j-m tag?
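
As an illustration of the question, the sketch below derives a tag from the DN and service tag, optionally mixing in the client host. How the job manager really computes its tag is not stated in this ticket, so the function name, the choice of SHA-1, and the field separator are all assumptions.

```python
import hashlib

def job_manager_tag(dn, service_tag, client_host=None):
    """Return a hex digest identifying one job manager instance.

    Without client_host, every submit host for a DN shares one tag;
    with it, each submit host gets its own.
    """
    parts = [dn, service_tag]
    if client_host is not None:
        parts.append(client_host)
    # NUL separator so that field boundaries cannot be confused.
    return hashlib.sha1("\0".join(parts).encode("utf-8")).hexdigest()
```

Two submit hosts using the same DN would then map to different job manager processes, which is the isolation asked for above.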

Comments

Joe Bester - 2012-06-27

Added a hack to the job manager script interface to have script instances independent for different client hosts, which should cover the case of things getting slowed when a client is slow to respond to gass copy.

alainroy - 2012-06-27

That sounds great! But I'm slightly confused--I thought you were going to approach this in a different way. Am I confused or did you not go that route?

Let me know when you have either a patch or a new version of the GRAM job manager and we'll test it out.

bbockelm - 2012-07-03

Hi Joe,

I'm catching up, coming back from vacation.  What tag is this work in?

Brian

Globus Toolkit/GT-225

Summary

GRAM5 skips some SEG events

Details

Type: Task

Status: Resolved 2012-06-18

Description

Our large PBS sites have reported issues with GRAM5 losing track of jobs when SEG is enabled.

If the SEG misses an event, the job might indefinitely stay in Idle or Running state.  After a few weeks of running, if there are enough Idle jobs, the various pilot factories will stop submitting - possibly causing the site to become non-operational.

The only workaround I have found is to restart the g-j-m without the SEG, let it do the explicit qstat, and then restart it again with the SEG.  This is a complex procedure for sites to follow.

Is it possible to have a "hybrid SEG" mode, where the job states are explicitly queried once every 1-4 hours?

Comments

Joe Bester - 2012-06-07

The hybrid seg idea has been talked about a while back (GT-89). Maybe time to bump that in priority.

Do you have more details about the PBS problem? Are the jobs ending up in the SEG log in /var/lib/globus/globus-seg-pbs/$DATE logs? Just trying to understand if it's a PBS SEG or job manager problem.

bbockelm - 2012-06-07

Hi Joe,

I can confirm the job state transitions are picked up by the SEG, but are not picked up by the job manager.  I don't have any insight into why it is not being picked up by the job-manager.

I believe we have seen this at two of the three large PBS sites (using our globus state file parser).  I have not looked for it at the third.

Brian

Joe Bester - 2012-06-11

I started an experiment on Friday and was able to hit the problem in the job manager SEG processing for a couple of jobs; it occurs when a job finishes at the last second of a (UTC) day.

I'll rerun the experiment today with a patch and see if the problem reoccurs. I am still investigating the GT-89 implementation as well.
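
The comment does not describe the defect itself, only the boundary at which it shows up. As a generic illustration of that boundary (not the job manager's code), the Python sketch below selects events for one UTC day with a half-open interval, so an event at 23:59:59 belongs to that day and one at 00:00:00 belongs to the next. The function names and the (timestamp, job_id) event shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def day_bounds(year, month, day):
    """Return (start, end) POSIX timestamps for one UTC day, end exclusive."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return start.timestamp(), end.timestamp()

def events_in_day(events, year, month, day):
    """Select (timestamp, job_id) events that fall inside the given UTC day."""
    start, end = day_bounds(year, month, day)
    # start <= t < end: the last second of the day is still included.
    return [event for event in events if start <= event[0] < end]
```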

Joe Bester - 2012-06-14

Source package with the fix for the SEG problem. Does not contain a GT-89 implementation, as I'm trying to unwind some of the code a bit to make it safe to add.

http://www.globus.org/ftppub/gt5/5.2/testing/packages/rpm/centos/5/SRPMS/globus-gram-job-manager-13.45-1.src.rpm

alainroy - 2012-06-14

Thanks Joe! Should we wait for you to make the other fixes (GT-89) as well, or should we be taking this version?

Joe Bester - 2012-06-14

GT-89 is turning out to be a little hairy to implement, so it might be better to grab this one. This fix is pretty straightforward.

alainroy - 2012-06-14

OK, I'll get out an update for testing soon. Thanks Joe!

alainroy - 2012-06-18

I've built it and asked for a few people to test. If testing goes well, we'll ship next week.

Globus Toolkit/GT-226

Summary

An authz callout mapping SAML ePPN/ePTID to a unix account

Details

Type: New Feature

Status: Resolved 2012-09-05

Description

UChicago IdP releases the eduPersonPrincipalName and eduPersonTargetedID attributes. https://test.cilogon.org/ can process the attributes and issue user credentials with the corresponding extensions (ePPN 1.3.6.1.4.1.5923.1.1.1.6, ePTID 1.3.6.1.4.1.5923.1.1.1.10). The ePTID can then be mapped by a GridFTP authz callout to a unix account. A request for such a callout has been submitted as the GO ticket https://support.globusonline.org/tickets/300237.

Comments

Lukasz Lacinski - 2012-08-31

The callout is available at http://mcs.anl.gov/~lukasz/gc/globus_gridmap_eppn_callout-0.1_5.0.tar.gz.
It requires that the following env variables be set:
GLOBUS_MYPROXY_CA_CERT=
GLOBUS_MYPROXY_AUTHORIZED_DN=

for example
# export GSI_AUTHZ_CONF=$GLOBUS_LOCATION/etc/gridmap_eptid_callout-gsi_authz.conf
# export GLOBUS_MYPROXY_CA_CERT="/etc/grid-security/certificates/28776852.0"
# export GLOBUS_MYPROXY_AUTHORIZED_DN="/DC=org/DC=cilogon/C=US/O=University of Chicago"

Lukasz Lacinski - 2012-08-31

The callout is used by GCMU 1.1.5UC
http://connect.globusonline.org/linux/stable/globusconnect-multiuser-1.1.5UC.tgz

Globus Toolkit/GT-227

Summary

API Documentation for Globus Priority Queue

Details

Type: Improvement

Status: Resolved 2012-06-13

Description

Trying to understand some old gram code, I ran into the use of the priority queue and got confused because of lack of documentation.  Add doxygen comments for the globus_priority_q structure and API to make it easier to understand what's going on.

Comments

Joe Bester - 2012-06-13

Committed some doxygen doc

Globus Toolkit/GT-228

Summary

build gsissh natively for windows

Details

Type: New Feature

Status: Resolved 2012-06-19

Description

after building gridftp natively for windows, I realized that gsissh's dependence on cygwin still meant that we needed the entire set of globus cygwin libraries.  http://www.nomachine.com/contributions  has a patched openssh for windows, and I should be able to apply Jim's gsi patches to that to get a native gsissh client.

Comments

Mike Link - 2012-06-19

success.  native gsissh is packaged in the latest GC test build.

Globus Toolkit/GT-229

Summary

GridFTP server windows native build

Details

Type: New Feature

Status: Resolved 2012-06-19

Description

Partially port and build gridftp server for windows.  Ignore the process handling such as fork and chroot, ignore user and file handling issues such as setuid/getuid and *pwent/*grent.  The server will run only as a single process (should be inetd-able), and only with permissions of the user running the process.  File listings and stats will have no ownership or permissions info.  Transfers, security, sync, etc, should all be fully functional, and it should do everything needed for the LTA.

Comments

Mike Link - 2011-03-24

Set up a mingw32 build env on linux.  Mattias already makes a windows build with deps up to globus-url-copy, so minimal changes were needed to the base libs.  I believe he builds via rpms using a separate mingw spec file for each package.  This isn't natural to me yet (at least while editing and debugging), so I worked out the build process using our standard tools (make-packages/gpt).  The majority of the server edits so far are ifdefs and redefines, with real code changes to handle the functions we're ignoring which were previously assumed to never fail.  For the most part I'm trying to avoid #ifdef blocks in the main program flow, and instead stub out the functions that need to work differently.

I currently have it building successfully and running a simple transfer.  Listings will work when I finish the win compatible readdir func.

Karl Pickett - 2011-03-24

How are you handling path / drive translation?  What does / get you?   /~/ should be easy, using HOME or HOMEPATH/HOMEDRIVE.  / is the rub.

Mike Link - 2011-03-24

Yeah, that's something we need to look at.  By default / is the base of the working dir drive, and other drives are available via //./X:/path.  That may be workable in a lot of situations (unlikely to need to serve data from multiple drives?), but we'd probably want to work out some virtual mappings that the user can configure (which is a feature linux users wanted too).  ~ is planned.

Mike Link - 2011-04-15

I've got single user functionality just about fully working.  The listing seems to be the tricky part at this point, as the readdir support in mingw is just very slow, or perhaps just buggy the way we use it.  I've started implementing listing using native win32 calls (findfirstfile/findnextfile), and at that point I think this stage would be complete.

Home directory support is working, but the default path situation is the same as in the prior comment.  A virtual mapping feature will help here but that isn't really specific to windows.

Outside the scope of this jira, we had a GSOC project proposal to implement native win32 support for the features we're disabling, but it didn't get any serious responses.

Mike Link - 2011-05-10

Done pending testing for LTA use.

Mike Link - 2012-06-19

My drive letter solution didn't work with GO; it only appeared to work because of a bug in a url library that GO was using.  Need to fix.

Mike Link - 2012-06-19

Added virtual base dir of / containing the available drive letters and fixed a few issues related to that and path restrictions.

Globus Toolkit/GT-230

Summary

hybrid mode leaks memory for each transfer after switching to striped

Details

Type: Bug

Status: Resolved 2012-06-19

Description

discovered this while testing in prep for 5.2.2.

Comments

Mike Link - 2012-06-19

fixed the memory and fd leak

Globus Toolkit/GT-231

Summary

5.2.1 broken Ubuntu download link

Details

Type: Bug

Status: Resolved 2012-07-09

Description

On this page (for 5.2.1) http://globus.org/toolkit/downloads/latest-stable/
The link for "Ubuntu 12.04 LTS (testing)" returns
"The requested URL /ftppub/gt5/5.2/5.2.1/installers/repo/globus-repository-5.2-stable-precise.0.3_all.deb was not found on this server."

The correct URL should be this:
http://www.globus.org/ftppub/gt5/5.2/5.2.1/installers/repo/globus-repository-5.2-stable-precise_0.0.3_all.deb

Looks like there is a missing -->  "_0"

Comments

Bryce Allen - 2012-07-06

Now the link is no longer missing a 0, but it has a . instead of _ after precise:

http://www.globus.org/ftppub/gt5/5.2/5.2.1/installers/repo/globus-repository-5.2-stable-precise.0.0.3_all.deb

Should be:
http://www.globus.org/ftppub/gt5/5.2/5.2.1/installers/repo/globus-repository-5.2-stable-precise_0.0.3_all.deb

Globus Toolkit/GT-232

Summary

gss_accept_sec_context() doesn’t return value as per specs

Details

Type: Bug

Status: Open

Description

The GSSAPI spec mandates that the mech_type and src_name parameters of gss_accept_sec_context() are filled upon the successful completion of a context establishment sequence. The Globus implementation returns these values only in the middle of the process, which makes it difficult to use it with stacked gssapi pseudomechanisms and glues. The attached patch addresses this shortcoming. Please note that even if it's not required, it'd be very useful to have the mech oid returned by every single call of the function (not just the last one) as indicated by the patch.

Comments

Globus Toolkit/GT-233

Summary

Logrotate on globus-gatekeeper.log causes g-j-m to write into wrong file.

Details

Type: Bug

Status: Open

Description

When logrotating was added to the gatekeeper, we didn't take into account the fact that the various processes do not re-open (or cannot re-open) the file descriptor for the gatekeeper log.

This means that the globus-job-manager processes in particular cannot reliably log to that file: after a day of running, they log to a deleted file descriptor!

This causes us to lose quite a few interesting error messages.
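
One common way for a long-running process to notice that its log has been rotated is to compare the inode behind its open descriptor with the inode currently at the log path, and re-open when they differ. The following is only a minimal sketch of that check, not code from the gatekeeper or job manager; `log_was_rotated` and `rotation_demo` are hypothetical names, and the demo assumes a writable /tmp.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if `path` no longer refers to the file open on `fd`
 * (renamed or replaced by logrotate), 0 if it still does, and -1 if
 * the descriptor itself cannot be examined. */
int log_was_rotated(int fd, const char *path)
{
    struct stat open_st, path_st;

    assert(path != NULL);
    if (fstat(fd, &open_st) != 0)
        return -1;
    if (stat(path, &path_st) != 0)
        return 1;               /* path is gone: the log was moved away */
    return open_st.st_dev != path_st.st_dev ||
           open_st.st_ino != path_st.st_ino;
}

/* Hypothetical demo: simulate a rotation in /tmp. Returns 1 if the check
 * reports "not rotated" before the rename and "rotated" after a new file
 * appears at the same path, 0 if not, and -1 on setup failure. */
int rotation_demo(void)
{
    char path[] = "/tmp/gk-log-XXXXXX";
    char rotated[sizeof(path) + 2];
    int fd = mkstemp(path);
    int before, after, newfd;

    if (fd < 0)
        return -1;
    before = log_was_rotated(fd, path);
    snprintf(rotated, sizeof(rotated), "%s.1", path);
    if (rename(path, rotated) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    newfd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    after = log_was_rotated(fd, path);

    if (newfd >= 0)
        close(newfd);
    close(fd);
    unlink(path);
    unlink(rotated);
    return (before == 0 && after == 1) ? 1 : 0;
}
```

A process that sees a nonzero result would close its descriptor and open the path again before writing the next message. Whether every process involved is able to re-open the file is exactly the open question in this issue.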

Comments

alainroy - 2012-07-23

Hi Joe. Any thoughts on this one? It's affecting our ability to keep logs on production sites. Thanks!

alainroy - 2012-08-10

Hi, we haven't heard about this in a while. It's definitely affecting our sites and we're concerned about it. It's pretty high priority.
Thanks!

Globus Toolkit/GT-234

Summary

Extra error messages when looking for CA dir

Details

Type: Bug

Status: Open

Description

When looking for the CA dir in globus_gsi_system_config.c, if a home directory is not found, the logic will pop off the error object and proceed to check elsewhere.

However, it appears there are two error objects in the stack (one for the missing home dir, one for the missing directory).  So, we still get

"File does not exist: /grid_home/uscmsPool014/.globus/certificates is not a valid directory"

in the logs (well, in strace).  Due to GT-233, it's a bit hard for me to follow whether this results in the entire operation failing (as the result has errors associated with it).

Comments

Globus Toolkit/GT-235

Summary

GSI does not reload CRLs if they are replaced

Details

Type: Bug

Status: Resolved 2012-07-30

Description

As far as I can tell, the globus-job-manager does not reload CRLs while it is running.  This means that, eventually, the CRL loaded into the process will expire and all SSL operations will be denied.

I've not had the time to reproduce, but I did find some messages along the lines of "CRL expired" in a non-functioning globus-job-manager process.  These cleared up when I killed off the process.  I also found an OpenSSL bug report stating that these do not get refreshed.

Obviously, we do not want to reload the CRL for every request - but probably cache it for some reasonable amount of time.

Does CRL verification even make sense within the g-j-m process?  If not, maybe we can avoid writing a lot of code?

Comments

Joe Bester - 2012-06-25

I'm going to reframe this as an OpenSSL/GSI issue.

I think what is happening is that during the first time a credential is validated or used to establish a security context with a peer, it will attempt to load a CRL (via OpenSSL's internal loading algorithm). This CRL is then cached by OpenSSL and reused for each subsequent context establishment done with the same credential. So, if a CRL expires after that first connection, any existing credentials which have the CRL cached will reject the context establishment with the expired credential.

Right now I'm looking to see if it's possible to force OpenSSL to forget about an expired CRL and try to reload it during validation.

Joe Bester - 2012-06-25

Here's a first cut at a patch (GT-235.diff) to try to reload the CRL after it fails to validate something because it has expired. I think it works, but need to test to make sure it doesn't leak or do nasty things to openssl memory.

matyas - 2012-07-06

Hi Joe,
I've applied the patch to the globus-gsi-callback package in the OSG and found no problems in a brief smoke test. Do you have any updates, or should we go ahead with the patch as-is?

Joe Bester - 2012-07-10

I was off last week, so haven't looked at it again yet.

Joe Bester - 2012-07-11

Looks like the patch doesn't work with the OpenSSL  1.0 in CentOS 6. I'm investigating some more, but it's slow / awkward to test

Joe Bester - 2012-07-12

With openssl 1.0, this patch will go into an infinite loop, because it will fail to load the CRL after it has been removed from the X509_STORE.

The way the file loading is done is a bit different between the two versions. For 0.9.8, openssl will only try to reload a cert or crl into the X509_STORE if it's not present (which is why this patch works for that version). For 1.0, if the object is a cert, it will only try to load it if it's not in the cache; if it's a CRL, it always tries to reload it, even if it is already in the X509_STORE. However, if it had found it previously, it will look for the next numeric extension of the CRL (.r1) instead of the .r0 file and fail, so the CRL checking stops being done if a CRL is expired with this patch. Replacing the CRL with a new file with .r1 extension wouldn't work properly with most openssl apps, and even if we modified the patch to loop over the extensions until it finds one that isn't expired, it would be required to parse each of the CRLs in turn before it finds one that is non-expired, and even so, won't necessarily use the most recent one.

I'll try explicitly reloading the CRL and putting it into the X509_STORE if it looks expired so that a patch would work for 1.0. It's kind of a messy solution but ought to work.

Joe Bester - 2012-07-26

I've created an update package that fixes this: http://www.globus.org/toolkit/advisories#globus_gsi_callback-4.4.tar.gz

A source RPM is also available from the http://www.globus.org/ftppub/gt5/5.2/5.2.2/updates/rpm/ dir

This update handles openssl 0.9.8 and 1.0.x behavior

alainroy - 2012-07-30

Thanks Joe! We'll test and release this to OSG users soon.

Globus Toolkit/GT-236

Summary

gram audit makefile has missing parameter to mkdir

Details

Type: Bug

Status: Resolved 2012-07-06

Description

The makefile in globus-gram-audit does

mkdir -p 01733 $(DESTDIR)$(localstatedir)/lib/globus/gram-audit

instead of

mkdir -p -m 01733 $(DESTDIR)$(localstatedir)/lib/globus/gram-audit

so instead of setting the mode it creates a directory named 01733.

Comments

Globus Toolkit/GT-237

Summary

Versioning document should be updated

Details

Type: Documentation

Status: Resolved 2012-07-06

Description

The page http://www.globus.org/toolkit/versioning.html should be updated to include info relating to the 5.2 release stream.

Comments

Globus Toolkit/GT-238

Summary

Implement SRFT or MODA to set remote file attributes (specifically timestamps)

Details

Type: Improvement

Status: Open

Description

As presented in http://www.rjh.org.uk/ftp-report.html, there's no standard way for a (grid)ftp server to allow a client to set remote file times.

With the -sync capability introduced in 5.X, there is an expectation in the user community that "some level of sync will preserve timestamps" ala rsync... not an entirely unreasonable expectation.  There's also the associated "make permissions match", which at least is possible via the GridFTP protocol if not GUC directly.... While I'd love to request GUC -sync options to preserve timestamps and permissions, this improvement needs to happen first.

thanks,

--Chan

Comments

Globus Toolkit/GT-239

Summary

installer gsissh build failure from 5.2.2rc1 installer on Mac OS X

Details

Type: Bug

Status: Resolved 2012-07-10

Description

I am seeing this on Mac OS X from the 5.2.2rc1 installer:

/usr/bin/gcc -g   -m64 -fno-common  -Wall -g -m64 -fno-common -Wall  -Wall -Wpointer-arith -Wsign-compare -Wformat-security -Wno-pointer-sign -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -fno-builtin-memset -fstack-protector-all  -I. -I.. -I. -I./..  -I/include/globus -I/ -no-cpp-precomp -I/Users/jbester/development/globus_5_2_branch/5.2.2rc1/gt5.2.2rc1-all-source-installer/globus-location/include/globus -I/Users/jbester/development/globus_5_2_branch/5.2.2rc1/gt5.2.2rc1-all-source-installer/globus-location/include/globus/gcc64dbg -no-cpp-precomp -I/Users/jbester/development/globus_5_2_branch/5.2.2rc1/gt5.2.2rc1-all-source-installer/globus-location/include/globus -I/Users/jbester/development/globus_5_2_branch/5.2.2rc1/gt5.2.2rc1-all-source-installer/globus-location/include/globus/ -I/Users/jbester/development/globus_5_2_branch/5.2.2rc1/gt5.2.2rc1-all-source-installer/globus-location/include/globus  -DHAVE_CONFIG_H -c port-tun.c
port-tun.c:111:20: error: /net/if.h: Input/output error
port-tun.c: In function 'sys_tun_open':

The include lines in the openbsd-compat directory contain -I${prefix}/include/globus -I${includedir}/${GLOBUS_FLAVOR_NAME}, but those macros are not defined. On Mac OS X, /net is a special filesystem, so #include <net/if.h> with -I/ causes an I/O error. The top-level Makefile has those macros defined, but they aren't propagated to all of the makefiles.

Comments

Eric Blau - 2012-06-29

CPPFLAGS contains  -I${prefix}/include/globus -I${includedir}/${GLOBUS_FLAVOR_NAME} because the presence of CONFIGOPTS_GPTMACRO in the Build_Steps from pkg_data_src.gpt sets that as its initial value (coming, fundamentally, from $GL/share/globus/flavors/flavor_gcc64.gpt), to which gsi-openssh's configure then adds the correctly expanded version obtained from globus-makefile-header.  It appears that gsi-openssh is already getting everything important that might be coming from CONFIGOPTS_GPTMACRO from globus-makefile-header, and, thus, we may be able to simply remove CONFIGOPTS_GPTMACRO from the build steps.

Joe Bester - 2012-07-10

Eric found a workaround in the gpt package data file for this that Jim Basney implemented and committed.

Globus Toolkit/GT-240

Summary

5.2.2rc1 globus-version on debian returns 5.2.1

Details

Type: Bug

Status: Resolved 2012-07-06

Description

The debian package builder for 5.2.2rc1 sets GLOBUS_VERSION to 5.2.1 before building globus-common, so it ends up with the wrong version string in the globus-version command.

Comments

Joe Bester - 2012-06-29

Fixed in CVS

Globus Toolkit/GT-241

Summary

wrong SIGINT handling in globus-url-copy

Details

Type: Bug

Status: Resolved 2012-07-19

Description

See original description of the problem sent by Frank Scheiner:

"
Dear all,

during software evaluation in PRACE-1IP we recognized some odd behaviour of guc
when it is killed with Ctrl+C (SIGINT). You can find an elaborate background on
[1].

The problem is, that guc exits with 0 when SIGINTed.

The reason is the following:
Currently guc (<=v8.4) does exit on SIGINT "unconventionally", meaning it
catches a SIGINT, but after doing its internal cleanup and writing out a
possible dumpfile, it does *not* reset the SIGINT handler to the default SIGINT
handler and kills itself with SIGINT, but simply exits normally (leading to "0"
as exit value in the bash shell).

I would expect it to exit with "killed by signal" (128 + signal number, which is
2 for SIGINT (see [2])). I use the exit values of guc to decide if guc worked or
not, or if it was killed by a signal.

I think the issue could be solved by integrating a clause to detect if guc was
interrupted (globus_l_globus_url_copy_ctrlc == GLOBUS_TRUE), then restore the
original SIGINT handler and at last kill itself with SIGINT.
The relevant part of the code for the clause is the end of main() in
"globus-url-copy.c" (l.2043).

I would appreciate it very much if the Globus developers could do something
about this issue, as it would help interoperability with other tools very much.

Best regards
Frank Scheiner
_____________
[1] 
[2] 

--
Frank Scheiner

High Performance Computing Center Stuttgart (HLRS)
Department Project User Management & Accounting

Email: scheiner@hlrs.de
Phone: ++49(0)711/685-68039
"
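
The approach proposed in the report (restore the default handler after cleanup, then send SIGINT to oneself) can be sketched as follows. This is an illustration only, not the patch that was committed for this issue; `die_from_sigint` and `sigint_exit_demo` are hypothetical names, and the demo forks a child just to observe how the process terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Call after cleanup when the run was interrupted: restore the default
 * SIGINT disposition, make sure the signal is not blocked, and re-raise
 * it so the parent sees "killed by SIGINT" instead of exit status 0. */
void die_from_sigint(void)
{
    sigset_t set;

    signal(SIGINT, SIG_DFL);
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    raise(SIGINT);
    _exit(128 + SIGINT);        /* fallback; not expected to be reached */
}

/* Hypothetical demo: run die_from_sigint() in a child process and return
 * the number of the signal that terminated it, or -1 if the child was
 * not killed by a signal. */
int sigint_exit_demo(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        die_from_sigint();      /* child: does not return */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

With this pattern a calling shell reports the "128 + signal number" status described above, rather than 0.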

Comments

Mike Link - 2012-07-17

Thanks for reporting.  This is fixed and will be released with GT 5.2.2.

Joe Bester - 2012-07-18

The patch for this is causing some test regressions for 5.2.2

Joe Bester - 2012-07-19

Mike's patched again to fix the regressions.

Globus Toolkit/GT-242

Summary

"-preserve" option in globus-url-copy does not work as expected

Details

Type: Bug

Status: Open

Description

See original description of problem sent by Frank Scheiner:

"
Dear all,

during software evaluation in PRACE-1IP we recognized that guc does not preserve file permissions from the local/source side to the remote side.

I made some tests with two Debian 6 driven VMs using the latest GridFTP server packages (v6.11) available and a Ubuntu 11.04 driven workstation with the latest guc package (v8.4) to get some more insight to this issue.

The v8.4 guc has an option named "-preserve" that should - according to [1] - preserve the permissions of files when transferring. It's not documented in the help output, but it's in the source code (according to [1] since 2010).

For these tests I created a source directory having the following content:

"
/home/johndoe/my_special_perm_files$ ls -la
total 24
drwxr-xr-x 2 johndoe johndoe 4096 Jun 20 07:58 .
drwxr-xr-x 4 johndoe johndoe 4096 Jun 20 07:57 ..
-rw------- 1 johndoe johndoe 1024 Jun 20 07:59 0600
-rw---x--x 1 johndoe johndoe 1024 Jun 20 07:59 0611
-rw-r-x-wx 1 johndoe johndoe    0 Jun 20 07:58 0653
-rwx------ 1 johndoe johndoe 1024 Jun 20 07:59 0700
-rwxr--r-- 1 johndoe johndoe 1024 Jun 20 07:59 0744
"

As you can see, the files are named like their permission mode. One file has zero length, to see if this would have an effect. I used the following command to transfer those files from source to destination controlled from my Ubuntu 11.04 driven workstation (reformatted for better reading):

"
$ globus-url-copy \
  -cd \
  -preserve \
  gsiftp://gridftp.zeta.orion:2811/~/my_special_perm_files/* \
  gsiftp://gridftp.epsilon.terra:2811/~/my_destination_dir/
"

The resulting dir contents look like that:

"
/home/johndoe/my_destination_dir$ ls -la
total 24
drwxr-xr-x 2 johndoe johndoe 4096 Jun 20 08:14 .
drwxr-xr-x 3 johndoe johndoe 4096 Jun  8 10:22 ..
-rw-r--r-- 1 johndoe johndoe 1024 Jun 20 08:14 0600
-rw-r--r-- 1 johndoe johndoe 1024 Jun 20 08:14 0611
-rw-r--r-- 1 johndoe johndoe    0 Jun 20 08:14 0653
-rw-r--r-- 1 johndoe johndoe 1024 Jun 20 08:14 0700
-rw-r--r-- 1 johndoe johndoe 1024 Jun 20 08:14 0744
"

All files have 0644 as permission mode - the default for newly created files. Obviously the "-preserve" option does not work as expected.

I then again had a look into the source code of guc (available from [2]) and found another option that looks like it could do the job. It's called "-copy-perms" and in lines 4045 to 4048 (of "globus_url_copy.c") you can see, that it's really intended to do the same job:

"
[...]
    case arg_preserve:
    case arg_perms:
        guc_info->perms = GLOBUS_TRUE;
        break;
[...]
"

Hence I tried the above guc command with "-copy-perms" instead of "-preserve", but this yields the same result, all files 0644. I saw how the "guc_info" struct is evaluated for "delayed passive" in lines 5859 to 5863 (of "globus_url_copy.c"):

"
[...]
    if(guc_info->delayed_pasv)
    {
            globus_ftp_client_operationattr_set_delayed_pasv(
                ftp_attr, GLOBUS_TRUE);
    }
[...]
"

...but there is nothing similar for the perms variable. Hence I assume this was forgotten or is not implemented yet.

I think this "preserve permissions" functionality is something really useful and also needed. Hence the Globus developers  should be contacted to have a look into this issue.

Best regards
Frank Scheiner
___________
[1] 
[2] 
[3] 

--
Frank Scheiner

High Performance Computing Center Stuttgart (HLRS)
Department Project User Management & Accounting

Email: scheiner@hlrs.de
Phone: ++49(0)711/685-68039
"

Comments

Globus Toolkit/GT-243

Summary

Split or striped mode frontends needlessly disconnect and reconnect to backends

Details

Type: Bug

Status: Resolved 2012-07-12

Description

The connection between the frontend and backend is duplicated needlessly each time a data channel is torn down and recreated.

Comments

Mike Link - 2012-07-12

Fixed.

Globus Toolkit/GT-244

Summary

GridFTP server memory leaks

Details

Type: Bug

Status: Resolved 2012-07-12

Description

The data handle can leak when using mode S.  Various small leaks in setup and config.

Comments

Mike Link - 2012-07-12

Fixed

Globus Toolkit/GT-245

Summary

Installer Makefile sets LD_LIBRARY_PATH, but doesn’t append existing value.

Details

Type: Bug

Status: Resolved 2012-07-12

Description

This can cause the build to fail.

Comments

Mike Link - 2012-07-12

Fixed

Globus Toolkit/GT-246

Summary

MFMT set times are offset by timezone on Windows

Details

Type: Bug

Status: Resolved 2012-07-17

Description

MFMT on a windows server results in the actual file time set being offset by the timezone.

Comments

Mike Link - 2012-07-17

this was a result of non-standard mktime() or tzset() behavior on windows.  fixed and released with final GC.

Globus Toolkit/GT-247

Summary

Race condition with perf markers can result in crash at end of transfer with many threads

Details

Type: Bug

Status: Resolved 2012-08-13

Description

Gayane saw this during her high threading security overhead tests.  Looks like multiple markers near the end of transfer can overlap and one will double-free.

Comments

Mike Link - 2012-07-17

the fix looked fairly straightforward. sent a patch to Gayane for testing.

Mike Link - 2012-08-13

Fixed

Globus Toolkit/GT-248

Summary

Update GC win release with GT 5.2.2 and prepare for public release

Details

Type: Improvement

Status: Resolved 2012-07-17

Description

Prep for native GC win public release.

Comments

Mike Link - 2012-07-17

Updated GridFTP to GT5.2.2, fixed an issue with config from cygwin version causing a conflict with native version, customized installer to include a notice about path changes, updated autoupdate metadata.

GC for Win 1.60 is released.

Globus Toolkit/GT-249

Summary

Merge 5.2.2 last minute patches

Details

Type: Task

Status: Resolved 2012-07-19

Description

Mattias sent some compile patches for S390, Hurd, and mingw32 to satisfy debian build process. Merge those and test them.

Comments

Joe Bester - 2012-07-19

The patch is committed to 5.2 branch and trunk and in 5.2.2

Globus Toolkit/GT-250

Summary

Add meta packages in place of groupinstall rules

Details

Type: Task

Status: Resolved 2012-07-18

Description

Replace the groupinstall targets with metapackages that contain the dependencies, so that we can use the similar install methods for suse and other rpm systems.

Comments

Joe Bester - 2012-07-17

I've committed some packages to the fedora directory and have started a build with those.

Joe Bester - 2012-07-18

I've committed the spec files for the metapackages to the 5.2 branch and have modified build-rpms to build them.

Globus Toolkit/GT-251

Summary

Bad compile flags for S390 build

Details

Type: Bug

Status: Resolved 2012-07-19

Description

The accompiler.m4 entry for S/390 uses -m32 but should use -m31 to generate code for the S/390 Linux ABI

Comments

Joe Bester - 2012-07-19

The patch is committed to 5.2 branch and trunk and in 5.2.2

Globus Toolkit/GT-252

Summary

Missing dependency in gass cache program

Details

Type: Bug

Status: Resolved 2012-07-19

Description

The gass cache program package is missing a compile dependency on globus_gass_transfer.

Comments

Joe Bester - 2012-07-19

The patch is committed to 5.2 branch and trunk and in 5.2.2

Globus Toolkit/GT-253

Summary

gatekeeper and job manager don’t build on hurd

Details

Type: Bug

Status: Resolved 2012-07-19

Description

The globus gatekeeper and job manager use SA_NOCLDWAIT which apparently is missing on gnu hurd. Mattias provided a compile patch to fix the compile. It's unclear whether these will work with the patch, but they will at least compile.

Comments

Joe Bester - 2012-07-19

The patch is committed to 5.2 branch and trunk and in 5.2.2

Globus Toolkit/GT-254

Summary

Gridftp server uses dynamic string as sprintf argument

Details

Type: Bug

Status: Resolved 2012-07-19

Description

The gridftp server passes a dynamic string to fprintf, which can cause problems if the string contains %.
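
The usual remedy for this class of bug is to pass the dynamic string as an argument to a fixed "%s" format. The sketch below illustrates the pattern in isolation; it is not the committed server patch, and `log_dynamic_message` and `format_demo` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write an externally supplied message to `out` verbatim. The message is
 * passed as an argument to "%s", never as the format itself, so any '%'
 * it contains is treated as ordinary text. */
int log_dynamic_message(FILE *out, const char *msg)
{
    assert(out != NULL && msg != NULL);
    /* Unsafe pattern being avoided: fprintf(out, msg); */
    return fprintf(out, "%s", msg);
}

/* Hypothetical demo: write a message containing conversion specifiers to
 * a temporary file and check that it reads back unchanged. Returns 1 on
 * a clean round trip, 0 on mismatch, -1 if no temp file is available. */
int format_demo(void)
{
    const char *msg = "progress: 100%s of %d%n";
    char buf[64] = {0};
    size_t got;
    FILE *tmp = tmpfile();

    if (tmp == NULL)
        return -1;
    log_dynamic_message(tmp, msg);
    rewind(tmp);
    got = fread(buf, 1, sizeof(buf) - 1, tmp);
    fclose(tmp);
    return got == strlen(msg) && strcmp(buf, msg) == 0;
}
```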

Comments

Joe Bester - 2012-07-19

The patch is committed to 5.2 branch and trunk and in 5.2.2

Globus Toolkit/GT-255

Summary

gridmapdir support doesn’t compile on non-POSIX systems

Details

Type: Bug

Status: Resolved 2012-07-19

Description

The gridmapdir code doesn't build on non-posix systems as it requires opendir. Mattias provides a patch to #ifdef out that code for win32.

Comments

Joe Bester - 2012-07-19

The patch is committed to 5.2 branch and trunk and in 5.2.2

Globus Toolkit/GT-256

Summary

Machine parseable 404 (file / dir does not exist) reply

Details

Type: Improvement

Status: Open

Description

We are living dangerously in GO land by doing a strstr() based check of the error text to determine if MLST failed due to file not found, as opposed to an internal server error.
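
For contrast with the strstr() approach, a client could key off the numeric reply code if the server used a distinct one for "not found". The sketch below only shows the parsing side. `ftp_reply_code` and `is_not_found` are hypothetical names, and which code the server would reserve is not decided in this issue, so the caller supplies it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return the three-digit FTP reply code at the start of `reply`, or -1
 * if the line does not begin with "ddd " or "ddd-" (the latter marks a
 * multi-line reply). */
int ftp_reply_code(const char *reply)
{
    assert(reply != NULL);
    if (!isdigit((unsigned char)reply[0]) ||
        !isdigit((unsigned char)reply[1]) ||
        !isdigit((unsigned char)reply[2]))
        return -1;
    if (reply[3] != ' ' && reply[3] != '-')
        return -1;
    return (reply[0] - '0') * 100 + (reply[1] - '0') * 10 + (reply[2] - '0');
}

/* Hypothetical classification: true when the reply carries the code the
 * caller treats as "file or directory does not exist". */
int is_not_found(const char *reply, int not_found_code)
{
    return ftp_reply_code(reply) == not_found_code;
}
```

This only helps if the chosen code is not shared with other failures; a code that also covers permission or internal errors would leave the client in the same position as today.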

Comments

Globus Toolkit/GT-257

Summary

gpt_create_automake_rules creates duplicate rules for man pages

Details

Type: Bug

Status: Resolved 2012-07-27

Description

When  gpt_create_automake_rules processes man pages, it creates filelist-man* targets for every man section present each time it encounters a different man section. This results in duplicate targets and warnings during builds.

For example, job manager includes man1, man5, and man8 section pages; when the Makefile.in is updated, it contains 3 copies of each of the targets filelist-man1, filelist-man5, and filelist-man8 because it processed them 3 times as it encountered each section.

Comments

Joe Bester - 2012-07-27

committed a fix for this

Globus Toolkit/GT-258

Summary

Callout dll loader cannot load callout libraries

Details

Type: Bug

Status: Resolved 2012-10-12

Description

GridFTP server from GT 5.2.1 with an auth callout pointed by GSI_AUTHZ_CONF:

# export GSI_AUTHZ_CONF=/usr/local/packages/globus-5.2.1/etc/gridmap_eppn_callout-gsi_authz.conf
# cat $GSI_AUTHZ_CONF
globus_mapping libglobus_gridmap_eppn_callout.so globus_gridmap_eppn_callout

Callout dll loader tries to load a callout library specified above but with a flavor string concatenated:

# globus-gridftp-server -p 2811 -d ALL -debug
[6558] Thu Aug  2 05:37:56 2012 :: GFork functionality not enabled.:
globus_gfork: GFork error: Env not set
[6558] Thu Aug  2 05:37:56 2012 :: No configuration file found.
[6558] Thu Aug  2 05:37:56 2012 :: Server started in daemon mode.
[6558] Thu Aug  2 05:38:01 2012 :: New connection from: transfer.api.globusonline.org:44513
[6558] Thu Aug  2 05:38:02 2012 :: transfer.api.globusonline.org:44513: [CLIENT]: USER :globus-mapping:
[6558] Thu Aug  2 05:38:02 2012 :: transfer.api.globusonline.org:44513: [SERVER]: 331 Password required for :globus-mapping:.
[6558] Thu Aug  2 05:38:02 2012 :: transfer.api.globusonline.org:44513: [CLIENT]: PASS dummy
[6558] Thu Aug  2 05:38:02 2012 :: transfer.api.globusonline.org:44513: [CLIENT]: PASS dummy
[6558] Thu Aug  2 05:38:02 2012 :: transfer.api.globusonline.org:44513: [SERVER]: 530-Login incorrect. : globus_gss_assist: Error invoking callout
530-globus_callout_module: Error with dynamic library: couldn't dlopen libglobus_gridmap_eppn_callout.so_gcc64dbg: file not found
530-
530 End.

Comments

Mike Link - 2012-09-24

I looked into this and there shouldn't be an issue with this code in 5.2.x.

Globus Toolkit/GT-259

Summary

Windows GridFTP lists all drive letters even if they are not accessible due to path restrictions

Details

Type: Bug

Status: Resolved 2012-08-13

Description

This causes confusion, but nothing is accessible that shouldn't be.

Comments

Mike Link - 2012-08-13

Fixed.  Will be released with the next GC update.

Globus Toolkit/GT-260

Summary

Windows GridFTP doesn’t translate /drive-letter correctly when ~ is mapped to /

Details

Type: Bug

Status: Resolved 2012-08-13

Description

~ may be mapped to / if the user's home directory is not accessible, for instance if they only allow drive D.  In this case, ~ lists the available drives correctly, but ~/D does not correctly translate to D:\
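
The intended mapping can be sketched as a pure string translation, assuming ~ has already been expanded so that ~/D arrives as /D. This is not the server's actual code; `virtual_to_windows_path` is a hypothetical name, and uppercasing the drive letter is a choice made here for readability.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Translate a virtual path of the form "/D/dir/file" into "D:\dir\file".
 * Returns 0 on success, 1 if `vpath` is the virtual root "/" (the drive
 * list, which has no single Windows path), and -1 on malformed input or
 * if `out` is too small. */
int virtual_to_windows_path(const char *vpath, char *out, size_t out_len)
{
    const char *rest;
    size_t i, rest_len, needed;

    assert(vpath != NULL && out != NULL);
    if (vpath[0] != '/')
        return -1;
    if (vpath[1] == '\0')
        return 1;
    if (!isalpha((unsigned char)vpath[1]) ||
        (vpath[2] != '/' && vpath[2] != '\0'))
        return -1;

    rest = vpath + 2;                       /* "" or "/dir/file" */
    rest_len = strlen(rest);
    needed = 2 + (rest_len ? rest_len : 1) + 1;
    if (out_len < needed)
        return -1;

    out[0] = (char)toupper((unsigned char)vpath[1]);
    out[1] = ':';
    if (rest_len == 0) {                    /* bare "/D" means the drive root */
        out[2] = '\\';
        out[3] = '\0';
        return 0;
    }
    for (i = 0; i < rest_len; i++)
        out[2 + i] = (rest[i] == '/') ? '\\' : rest[i];
    out[2 + rest_len] = '\0';
    return 0;
}
```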

Comments

Mike Link - 2012-08-13

Fixed.  Will be released with the next GC update.

Globus Toolkit/GT-261

Summary

installer makefile mangles PERL5LIB

Details

Type: Bug

Status: Resolved 2013-06-18

Description

The installer tries to add GPT's perl5 library path to the PERL5LIB environment variable, but in doing so mangles the old PERL5LIB value. It uses

export PERL5LIB="$(GPT_LOCATION)/lib/perl:$(shell echo $PERL5LIB)"

but $PERL5LIB needs a double $ to get things passed to the shell command like this:

export PERL5LIB="$(GPT_LOCATION)/lib/perl:$(shell echo $$PERL5LIB)"

Comments

Joe Bester - 2013-06-18

This was fixed in an earlier release.

Globus Toolkit/GT-262

Summary

Windows GridFTP doesn’t correctly handle relative paths

Details

Type: Bug

Status: Resolved 2012-08-13

Description

It builds the full path correctly but fails to map that back to a windows formatted path.

Comments

Mike Link - 2012-08-13

Fixed.  Will be released with the next GC update.

Globus Toolkit/GT-263

Summary

gsi cert_utils does not handle ASN1 GENERALIZEDTIME

Details

Type: Bug

Status: Open

Description

In globus_gsi_cred_handle.c, function globus_i_gsi_cred_goodtill:

        result = globus_gsi_cert_utils_make_time(
            X509_get_notAfter(current_cert),
            &tmp_goodtill);

This assumes that the value of X509_get_notAfter is ASN1_UTCTIME, which is not always the case. It has value ASN1_TIME, which may be UTCTIME or GENERALIZEDTIME. To make matters worse, if the cert does use generalized time, no error is returned, it just mis-interprets the date.

The proper place to fix this is probably globus_gsi_cert_utils_make_time, in cert_utils. It's probably used in other locations as well.

Prior to openssl 0.9.6, it appears that openssl did use UTCTIME. From CHANGELOG in the openssl source:

Changes between 0.9.5a and 0.9.6  [24 Sep 2000]
...
  *) Various fixes to use ASN1_TIME instead of ASN1_UTCTIME.
     Also change the functions X509_cmp_current_time() and
     X509_gmtime_adj() work with an ASN1_TIME structure,
     this will enable certificates using GeneralizedTime in validity
     dates to be checked.
     [Steve Henson]

If we require compatibility with 0.9.5a and earlier, we may need some special handling.
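
To illustrate the two encodings a validity date can carry, here is a minimal sketch that parses either form from its raw string. It does not use OpenSSL and is not the globus_gsi_cert_utils_make_time implementation; `parse_x509_time` is a hypothetical name. It assumes the X.509 restricted format (seconds present, trailing Z, no timezone offset) and uses the two-digit-year pivot from RFC 5280 (YY of 50 or more means 19YY).

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Days between 1970-01-01 and the given proleptic Gregorian date. */
static int64_t days_from_civil(int y, int m, int d)
{
    int64_t era;
    unsigned yoe, doy, doe;

    y -= (m <= 2);
    era = (y >= 0 ? y : y - 399) / 400;
    yoe = (unsigned)(y - era * 400);
    doy = (unsigned)((153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1);
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + (int64_t)doe - 719468;
}

static int two_digits(const char *p)
{
    return (p[0] - '0') * 10 + (p[1] - '0');
}

/* Parse "YYMMDDHHMMSSZ" (UTCTime) or "YYYYMMDDHHMMSSZ" (GeneralizedTime)
 * into seconds since the Unix epoch. Returns 0 on success, -1 on any
 * other shape. Day-of-month is range-checked only as 1..31. */
int parse_x509_time(const char *s, int64_t *out)
{
    size_t len, i, year_len;
    int year, mon, day, hour, min, sec;
    const char *p;

    assert(s != NULL && out != NULL);
    len = strlen(s);
    if (len == 13)
        year_len = 2;               /* UTCTime */
    else if (len == 15)
        year_len = 4;               /* GeneralizedTime */
    else
        return -1;
    if (s[len - 1] != 'Z')
        return -1;
    for (i = 0; i + 1 < len; i++)
        if (!isdigit((unsigned char)s[i]))
            return -1;

    if (year_len == 2) {
        year = two_digits(s);
        year += (year >= 50) ? 1900 : 2000;
    } else {
        year = two_digits(s) * 100 + two_digits(s + 2);
    }
    p = s + year_len;
    mon = two_digits(p);
    day = two_digits(p + 2);
    hour = two_digits(p + 4);
    min = two_digits(p + 6);
    sec = two_digits(p + 8);
    if (mon < 1 || mon > 12 || day < 1 || day > 31 ||
        hour > 23 || min > 59 || sec > 59)
        return -1;

    *out = days_from_civil(year, mon, day) * 86400
         + hour * 3600 + min * 60 + sec;
    return 0;
}
```

The key point is that the string length selects the year width; treating a 15-character GeneralizedTime value as if it were UTCTime shifts every field and yields a wrong date without any error, which matches the behavior described above.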

Comments

Bryce Allen - 2012-08-08

Here is a cert that uses GENERALIZEDTIME for notbefore/notafter. Lukasz originally discovered this when trying to use the cert with GO, and we discovered that the bug was not just in M2Crypto, but also in GT.

Bryce Allen - 2012-08-09

This alternate implementation seems to be working for both ASN1_TIME subtypes. It assumes X.509 restricted format though (e.g. ends with Z for GMT with no timezone offset). If make_time is called on ASN1_TIME objects that did not come from an X.509, that may be an issue. It would not be too hard to extend however.

It looks like the original make_time implementation was a sloppy adaptation of X509_cmp_time from openssl (crypto/x509/x509_vfy.c). It copies data into a buffer which is not necessary for the adapted version, and it doesn't handle errors properly (it just sets newtime to 0 and continues). ASN1_XTIME_print functions provide a better starting point, although they don't parse TZ offset (they assume X.509 format).

Bryce Allen - 2012-08-09

One reason we've been getting away with this:

http://tools.ietf.org/html/rfc2459#section-4.1.2.5
"CAs conforming to this profile MUST always encode certificate
   validity dates through the year 2049 as UTCTime; certificate validity
   dates in 2050 or later MUST be encoded as GeneralizedTime."

Bryce Allen - 2012-08-09

Decreasing the priority. CAs that follow the RFC shouldn't be producing GENERALIZEDTIME for a while.

Bryce Allen - 2012-08-09

The latest RFC adds this sentence:

http://tools.ietf.org/html/rfc5280#section-4.1.2.5
Conforming applications MUST be able to process validity dates that
are encoded in either UTCTime or GeneralizedTime.

So GT isn't technically conforming because of this bug. In practice it may still be a non-issue for many years though.

Globus Toolkit/GT-264

Summary

link error in globus-redia

Details

Type: Bug

Status: Resolved 2012-08-09

Description

The globus-redia program in 5.2.2 has an implicit dependency on ldl which it isn't linked with explicitly. On some systems (fedora 16), this causes a link failure. The easiest fix is probably to just use lt_dlopen and company instead of dlopen since the libtool library is linked with it.

Comments

Joe Bester - 2012-08-09

committed to 5.2 and trunk.

Globus Toolkit/GT-265

Summary

Windows GridFTP can’t create files/dirs with trailing spaces or periods.

Details

Type: Bug

Status: Resolved 2012-11-19

Description

It looks like windows is weird about trailing spaces and dots.

mkd /c/tmp/dirSP
mlst /c/tmp/dirSP
will work, but the dir being created and checked is just /c/tmp/dir

only when you try to access /c/tmp/dirSP/file will it fail.

Apparently I can force the creation by using a UNC path.

Comments

Mike Link - 2012-08-13

This needs more research.  It sounds like forcing the ability to end in a space opens up the possibility of the name having characters that are invalid in the Windows shell.

Globus Toolkit/GT-266

Summary

Add HTTP/S3 transfer support to GridFTP v0.1

Details

Type: New Feature

Status: Resolved 2014-07-23

Description

Need to add GridFTP commands to enable transfers between a GridFTP server and S3.  Additionally, we can use the same commands to use a GridFTP server to facilitate an S3 transfer over the GridFTP data channel.

Rough plans for the commands:

HTTP GET url  -- transfer url to the specified path, or to the client via the data channel.
HTTP PUT url  -- transfer to the url from the specified path or from the client via the data channel.

Partial gets and multi-part puts should be supported.

Other options including setting headers and auth tokens should be considered.

Comments

Mike Link - 2012-08-13

Some good progress on this.

HTTPS is not working -- the xio http driver will do https when combined with the gsi driver, but I couldn't make it work with standard ssl yet.

GETs are working to paths and data channel.  No partial or ranges/multiple streams yet.

PUTs are working from paths, not data channel yet.  No multipart yet.

main todos:

PUT DC
multi-part PUT
multi-stream GET
figure out ssl
Firm up command syntax and doc

Mike Link - 2012-08-27

PUT DC is working.

I had trouble with the way I was doing PUT and GET from/to the local storage, which wasn't compatible with other dsis (and wouldn't have worked well for parallel gets).  Added some framework to 'fake' a transfer to interact with the dsi interface like a normal transfer would, and now that is working correctly.

Ranged gets are working for partial transfers/restarts.  I'm still thinking about parallel gets.  Not clear yet about dealing with the multiple connections in the gridftp server code or in the http driver or in a separate driver, and then the general logic of the transfer.
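
As a rough illustration of the arithmetic behind ranged and parallel gets, here is a sketch that splits the remaining bytes of an object into one HTTP Range header per stream. The helper names split_ranges and range_header are hypothetical and are not part of the GridFTP server or the xio http driver. Only the standard "bytes=first-last" header form (inclusive offsets) is assumed.

```python
def split_ranges(start, size, streams):
    """Split the bytes [start, size) into up to `streams` inclusive ranges."""
    remaining = size - start
    if remaining <= 0 or streams < 1:
        return []
    streams = min(streams, remaining)   # never create empty ranges
    base, extra = divmod(remaining, streams)
    ranges = []
    first = start
    for i in range(streams):
        # The first `extra` ranges carry one additional byte.
        length = base + (1 if i < extra else 0)
        ranges.append((first, first + length - 1))
        first += length
    return ranges


def range_header(first, last):
    """Format an inclusive byte range as an HTTP Range header value."""
    return "bytes=%d-%d" % (first, last)
```

A restart from a marker is then the single-stream case with a nonzero start, and a parallel get is the same computation with more than one stream.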

Still no ssl or multi-part puts-- I have been using fakes3 to test against, which allows me to test locally without ssl, but fakes3 doesn't support POST or multi-part puts, so I have to get ssl working first.

Need to get a rough doc out to at least give karl something to write a test client against.

Currently have the following commands:

persistent setting:
HTTP S3ID 
HTTP S3KEY 

per-transfer:
HTTP RESTART 
HTTP GET  

HTTP PUT  


we talked about HTTP HEADER * to set headers.  easy enough but not done yet.  don't recall a specific need, maybe just for setting metadata.

Stuart Martin - 2012-09-05

At the last sprint review we decided these are the tasks required for an initial version:

- HTTP v .1
        + PUT DC (done)
                - Can't restart
        + figure out ssl
        + Firm up command syntax and doc

Mike Link - 2012-09-10

SSL is working.  Ran into some issues with PUT and the expect header once I was able to test against amazon, that is working now.  status/perf markers should also be in this version.  They are working with file transfers, but not yet with DC transfers.  Should be able to wrap this up in another day or two.

Mike Link - 2012-09-24

fixed status markers for all modes, so v1 should be complete.

Globus Toolkit/GT-267

Summary

/etc/globus/globus-condor.conf is not marked as a config file in RPM spec

Details

Type: Bug

Status: Resolved 2012-11-26

Description

The /etc/globus/globus-[jobmanager].conf files are not listed as config files in the various globus-gram-job-manager-[jobmanager] RPMs. If people edit things (like the path to the Condor binaries, which is reasonably common), their changes are lost on update. If the files were marked as configuration files, there wouldn't be a problem.

Could this be fixed in a future release?

Thanks!

Comments

Joe Bester - 2012-08-14

It looks to me like this only affects the condor package.

alainroy - 2012-08-14

It could be--I only checked the condor package and blindly assumed it affected the others.

Joe Bester - 2012-11-26

Forgot to mark this as done

Globus Toolkit/GT-268

Summary

GRAM job manager seg module fails to replay first log of the month on restart

Details

Type: Bug

Status: Resolved 2012-09-24

Description

While debugging http://ticket.grid.iu.edu/goc/viewer?id=12486, I noticed that a number of old jobs were not being cleaned up when the job manager restarts. The culprit seems to be the seg_jobmanager_module, which doesn't normalize the date string before looking to see if an old file exists. That causes the replay of the 1st log file of the month to fail because GRAM looks for the 32nd log of the previous month. When that fails, it increments again and normalizes, getting to the 2nd log of the new month.
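
The failure mode can be sketched in a few lines of Python. The YYYYMMDD name pattern and both function names are assumptions for illustration, not the module's actual log naming or code. The point is only the difference between bumping the day field and advancing a normalized date.

```python
from datetime import date, timedelta


def naive_next_log_name(year, month, day):
    """Increment the day field without normalizing (the buggy behavior)."""
    return "%04d%02d%02d" % (year, month, day + 1)


def normalized_next_log_name(year, month, day):
    """Advance by one calendar day, rolling over month and year as needed."""
    following = date(year, month, day) + timedelta(days=1)
    return following.strftime("%Y%m%d")
```

For July 31 the naive version asks for a log dated the 32nd, which never exists, while the normalized version asks for the 1st of August.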

Comments

Joe Bester - 2012-08-15

Attaching missing-normalize.diff to address this issue.

Joe Bester - 2012-09-24

This is committed to 5.2 branch and trunk.

Globus Toolkit/GT-269

Summary

GridFTP servers do not report the DEST IP address in transfer logs or usage stats when configured for striping or split processes

Details

Type: Bug

Status: Resolved 2012-08-27

Description

The problem is more of a design limitation of getting the information from the backend to the frontend, so I don't see an obvious fix.  If it is fixed, the logs and usage database would contain multiple addresses for striped transfers, so any outside parsing that is done would need to handle that.

Comments

Mike Link - 2012-08-27

Fixed.  Multiple data connections will be represented as comma-separated IP addresses.

Globus Toolkit/GT-270

Summary

job manager crash at shutdown (extra_envvar free)

Details

Type: Bug

Status: Resolved 2012-09-24

Description

The config->extra_envvars member is being freed with free(). This is incorrect with the implementation as a list of variable names, as the list contents must be freed with free() but the list nodes with globus_list_remove or globus_list_destroy_all. This has caused crashes on debian 7 at least.

Comments

Joe Bester - 2012-09-24

I've committed a fix to this to trunk and 5.2 branch.

Globus Toolkit/GT-271

Summary

Create Native Packages for gridmap-verify-myproxy-callout

Details

Type: Improvement

Status: Resolved 2012-09-24

Description

There is a gridmap callout to map myproxy CA certificates to local user accounts in the 5.0 branch, but it is not natively packaged and is not in the 5.2 branch.

Comments

Joe Bester - 2012-09-24

I've added metadata to the packaging directory for this package, and have added it to the build list. It'll show up the next time I get binaries onto the globus.org site

Globus Toolkit/GT-272

Summary

Increase default proxy key size

Details

Type: Improvement

Status: Resolved 2013-06-26

Description

The proxy keys generated by grid-proxy-init and the delegation protocol default to 512 bits. OpenSSL 1.0.1 adds support for TLS v1.2 that uses SHA-512 which fails when used with such small keys. Perhaps it's time to move that default to 1024 or higher.
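
One plausible reason for the failure, offered as an assumption since the ticket does not say which step fails: an RSA PKCS#1 v1.5 signature needs the modulus to hold the DigestInfo prefix, the digest, and at least 11 bytes of padding. The sketch below does that arithmetic. The function names are hypothetical, and the prefix lengths are the standard DER DigestInfo prefixes for each hash.

```python
# hash name -> (DigestInfo prefix length, digest length), both in bytes
DIGEST_INFO = {
    "sha1": (15, 20),
    "sha256": (19, 32),
    "sha384": (19, 48),
    "sha512": (19, 64),
}


def min_modulus_bytes(hash_name):
    """Smallest RSA modulus size (bytes) that fits a PKCS#1 v1.5 signature."""
    prefix, digest = DIGEST_INFO[hash_name]
    return prefix + digest + 11   # 11 bytes: 0x00 0x01, 8+ bytes 0xFF, 0x00


def can_sign(key_bits, hash_name):
    """True if a key of `key_bits` bits can carry this hash's signature."""
    key_bytes = (key_bits + 7) // 8
    return key_bytes >= min_modulus_bytes(hash_name)
```

Under this assumption a 512-bit key (64 bytes) is too small for SHA-512, which needs 94 bytes (752 bits), while a 1024-bit key has room to spare.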

Comments

dennisvd - 2013-03-20

Hi,

from the perspective of the European Grid Infrastructure, the IGTF and the Initiative for Globus in Europe project, this improvement would be very welcome indeed!

Joe Bester - 2013-05-15

This is fixed in CVS and updates are available from the GT repos.

Globus Toolkit/GT-273

Summary

can discover user names even with restrict path

Details

Type: Bug

Status: Open

Description

$ ls go#ep1/home/tuecke (as karlito)
500 Command failed : Path not allowed.
$ ls go#ep1/home/nosuchuser
500 Command failed : Path not allowed.

But the ~ syntax leaks it:
$ ls go#ep1/~tuecke
500 Command failed : Path not allowed.
$ ls go#ep1/~nosuchuser
500 Command failed : Cannot expand ~
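
One way to close the leak is to return the same reply whether the tilde expansion fails or the expanded path is disallowed. The sketch below is a toy model, not the server's code: KNOWN_USERS stands in for a passwd lookup, ALLOWED_PREFIXES for the restrict-path setting, and the "250 OK" reply is illustrative only.

```python
ALLOWED_PREFIXES = ("/home/karlito",)          # hypothetical restrict-path value
KNOWN_USERS = {                                # stand-in for a passwd lookup
    "karlito": "/home/karlito",
    "tuecke": "/home/tuecke",
}
DENIED = "500 Command failed : Path not allowed."


def expand_tilde(path):
    """Expand a leading /~user; return None when the user is unknown."""
    if not path.startswith("/~"):
        return path
    name, _, rest = path[2:].partition("/")
    home = KNOWN_USERS.get(name)
    if home is None:
        return None
    return home + ("/" + rest if rest else "")


def is_allowed(path):
    """True if path equals an allowed prefix or lies beneath one."""
    return any(path == p or path.startswith(p + "/") for p in ALLOWED_PREFIXES)


def check_path(path):
    """Return one reply for both unknown users and disallowed paths."""
    expanded = expand_tilde(path)
    if expanded is None or not is_allowed(expanded):
        return DENIED
    return "250 OK"
```

With this shape, listing ~tuecke and ~nosuchuser produce identical replies, so the response no longer reveals which account exists.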

Comments

Globus Toolkit/GT-274

Summary

Add HTTP/S3 transfer support to GridFTP v0.2

Details

Type: New Feature

Status: Resolved 2014-02-06

Description

Here are the additional tasks to get an HTTP 0.2 version

- HTTP v .2
        + multi-part PUT
        + multi-stream GET
        + Firm up additional command syntax and doc

        + http XIO Driver for GridFTP/S3
                - need a gridftp http client (GO/fxp)

Comments

Globus Toolkit/GT-275

Summary

Prototype single port FTP transfer

Details

Type: Task

Status: Resolved 2012-10-12

Description

Prototype a single port FTP transfer. Keep in mind ideas from Mode F - http://confluence.globus.org/display/~karl/Mode+F+Notes
but this isn't necessarily dependent on mode F.

Comments

Mike Link - 2012-09-10

Spent a few days on this.  Made a simple pre-auth control channel command to initiate a data connection based on a key.

DATA 

I look up that key and then hand off that connection to a standard stream mode data channel connection.  insecure for now (but the data connection would still do DCAU), and requires a non-forking server process, but most of the work is just figuring out how to do the control to data handoff.  This is almost working.

Mike Link - 2012-09-24

single port transfers are working with a nonforking server.   MODE E, reusing data channels, and parallel streams all work.


did some work on passing the data channel fd over unix sockets so that forking will work, but that isn't functional yet.

Mike Link - 2012-10-12

Done, lessons learned for future work.

Globus Toolkit/GT-276

Summary

PBS SEG module isn’t robust against log files becoming unavailable

Details

Type: Bug

Status: Resolved 2012-09-24

Description

The SEG module for PBS is sometimes missing events in the middle of the day on systems with NFS mounted LRM logs. The log parser should be made more robust so that it can handle transient errors better and not miss things that might have happened.

Comments

Joe Bester - 2012-09-24

I've passed a source package for this updated version to Mats at OSG. It'll be on globus.org the next time I get binaries synced to the ftp dirs.

Globus Toolkit/GT-278

Summary

Use standard output for -help and -usage output in GT tools

Details

Type: Improvement

Status: Open

Description

We provide a bunch of tools in GT, and many of them have the behavior of writing help output to standard error, so pagers aren't able to handle it. Make a pass through the executables and scripts and make them less annoying. See https://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=3310 for an older report of this issue.

Comments

Globus Toolkit/GT-279

Summary

ERROR regarding proxy lifetime should not be so frustrating - make it a WARNING

Details

Type: Improvement

Status: Open

Description

From https://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=5580

Description From Ben Clifford

If a proxy has less than 3(?) hours validity remaining, globus-job-run issues
an error and refuses to submit a job because there is not enough time. Often
the remaining proxy lifetime is adequate; the construction of a proxy
pseudolifetime smaller than the actual lifetime is in those situations
frustrating.

The ERROR could become a WARNING.

Comments

Globus Toolkit/GT-280

Summary

Create script to update globus package repositories from jenkins artifacts.

Details

Type: Task

Status: Resolved 2012-10-04

Description

Jenkins has REST and Python APIs to interact with builds and artifacts. We should use one of those to create a script to download packages from Jenkins and update our public repos on globus.org.

Comments

Joe Bester - 2012-10-04

I've created a script bamboo-scripts/fetch-build-results which will pull down all of the packages and populate a tree that looks like the GT5 distribution point.

Globus Toolkit/GT-281

Summary

Add exclusive execution mode to lsf module

Details

Type: New Feature

Status: Resolved 2012-11-12

Description

From Horst:

I just looked at the new lsf.pm, and I don't see any of the mods we put
into our old osg-1.2 lsf.pm, so I'm wondering what the plans are for
this, or if we should just plan on hacking this into the new version
again once it's stable ...

I'm specifically talking about the declaration of the 'exclusive' flag,
which in turn sets "#BSUB -x\n". There is also a corresponding entry
in lsf.rvf, the "Attribute: exclusive".

At some point we were also playing around with $mpirun in lsf.pm, but
that's been a long time ago, and I don't remember if we ever actually
used that for anything.

But I think the 'exclusive' / "#BSUB -x\n" declaration should be in the
default jobmanager, since it will be used extensively in the ATLAS
AthenaMP whole node job configuration soon.

Another question I had was that the old lsf.rvf contained an
'Attribute: queue', which is no longer in the new lsf.rvf.
Does this mean that in the new osg-3 version of the LSF jobmanager,
we no longer need to declare every single LSF queue in lsf.rvf,
and they will be picked up automagically? We would be STRONGLY in favor
of that, since it was quite a pain to keep these queues updated
manually in lsf.rvf, since new LSF queues are created all the time ...

Comments

Joe Bester - 2012-11-12

OSG has changed their mind on this and are instead using the host_xcount and xcount attributes for choosing resources for jobs.

Globus Toolkit/GT-282

Summary

Combine rectify-spec-versions and rectify-debian-versions scripts

Details

Type: Task

Status: Resolved 2012-09-20

Description

There are two scripts, rectify-spec-versions and rectify-debian-versions which do the same general thing for each of the native package metadata types. They update package version numbers to reflect gpt version numbers and then update dependencies so that they match the current gpt dependency versions, and modify the changelog to match the latest version update. These programs should be combined into a single process which has default behavior so that there isn't a need for a lot of command-line options to use them.

Comments

Joe Bester - 2012-09-20

New tool rectify-versions committed to 5.2 branch and the debian/rpm specific tools are removed.

Globus Toolkit/GT-283

Summary

http://www.globus.org/toolkit/downloads/ doesn’t list GT 5.2.2 release

Details

Type: Documentation

Status: Open

Description

http://www.globus.org/toolkit/downloads/ still points to GT 5.2.1 as recommended 5.2 release.

Comments

Globus Toolkit/GT-284

Summary

GridFTP pipelining and reliability issues

Details

Type: Bug

Status: Open

Description

The original report comes from Frank Scheiner (HLRS) on behalf of the PRACE project. In brief, the pipelining and reliability options of the GUC client do not work well together. I've added the full report in the attached file (FrankScheiner-GridFTP-GUC-report.txt).

Best wishes,

Emmanouil

Comments

Globus Toolkit/GT-285

Summary

Special handling for filenames with spaces in the GridFTP transfer log format

Details

Type: Task

Status: Open

Description

Here is the note from Charles Bacon from ALCF:

On Sep 27, 2012, at 1:28 PM, Charles Bacon wrote:

There is no special handling for filenames that contain spaces, which breaks most CSV parsers.  I'm stuck using a regex to parse it, which is (I measured) about 20 times slower.  It would be nice if you ran whatever the standard "I need to enclose this value in delimiters because it has a weird character in it" that CSV parsers expect.  Jamming quotes around it is perfect, but I forget what you're supposed to do when the thing you're enclosing contains the enclosing character itself.

Just blowing off steam, feel free to file this into the bitbucket.  I'm just parsing about 2.5GB of xferlogs from ALCF, and the filename with spaces didn't show up until the 3.1millionth row or so, just about when I thought I was done.  :-)
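
For reference, the usual CSV convention is to wrap a field in double quotes when it contains the delimiter and to double any quote character inside it. The sketch below shows that round trip with Python's csv module on a space-delimited line. The helper names are hypothetical and the FILE/NBYTES values are sample data, not output from a real transfer log.

```python
import csv
import io


def write_row(fields):
    """Serialize fields as one space-delimited line, quoting only when needed."""
    buf = io.StringIO()
    writer = csv.writer(
        buf, delimiter=" ", quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
    )
    writer.writerow(fields)
    return buf.getvalue()


def read_row(line):
    """Parse one space-delimited line back into its fields."""
    return next(csv.reader([line], delimiter=" "))
```

Fields without spaces or quotes are written unchanged, so existing well-behaved rows would look the same and only the awkward filenames gain quotes.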

Comments

Globus Toolkit/GT-286

Summary

Create Build Image for Ubuntu 12.10

Details

Type: Task

Status: Resolved 2012-11-26

Description

The next Ubuntu release is scheduled for October 18. Sometime before then, we should grab the latest snapshot and prepare a build image and build/test jobs to have a build ready for the release date. After the release date, we should be able to drop support for Ubuntu 10.10 and 11.04, as they will no longer be supported by Ubuntu.

Comments

Joe Bester - 2012-11-26

This is in place for the 5.2.3 release.

Globus Toolkit/GT-287

Summary

Create Build Image for Fedora 18

Details

Type: Task

Status: Open

Description

The next Fedora release is scheduled for November 27. Sometime before then, we should grab the latest snapshot and prepare a build image and build/test jobs to have a build ready for the release date. After December 27, we can drop support for Fedora 16 as it will no longer be supported by Fedora.

Comments

Globus Toolkit/GT-288

Summary

environ variable not accessible from shared libraries on Mac OS X

Details

Type: Bug

Status: Resolved 2012-10-09

Description

On Mac OS X, the system-provided global variable 'environ' may not be directly accessible to shared libraries. This affects globus_libc_setenv.c in globus_common, which directly manipulates the environ variable. Directly accessing environ appears to work in many instances. But one instance where it does not work is when Globus is built on Mountain Lion (10.8.x) and run on Lion (10.7.x). You get a runtime error about being unable to find _environ in /usr/lib/libSystem.B.dylib.

According to the man page for environ, shared libraries that need to access environ can call _NSGetEnviron(), declared in , which returns a pointer to environ.

Comments

Joe Bester - 2012-10-04

Those library functions are a non-issue on the current generation of Unixes, so it's probably best to stub those out as calls to the normal stdlib.h functions and not add more complexity.

Joe Bester - 2012-10-09

I've replaced the implementations of those with calls to functions from stdlib.h

Globus Toolkit/GT-289

Summary

globus-url-copy can hang on error with gass server

Details

Type: Bug

Status: Open

Description

When transferring a file from globus-url-copy to the gass-server
library, if the destination filename cannot be opened, globus-url-copy
can end up blocking for a long period of time (15+ minutes). The gass
server has closed the connection, while globus-url-copy has the
connection open in CLOSE_WAIT state.
For this bug to trigger, I needed a moderately-sized file (400k) and a
wide distance between machines (UW and BNL). Otherwise, the transfer
fails immediately with this error:
   error: [globus_l_gass_copy_gass_write_callback]: gass_transfer_request_status: 3

Comments

jfrey - 2012-10-09

This is with the latest OSG packages. Here is the output of
globus-gass-server -versions:
globus-gass-server: 4.1 (1319549256-1)
globus_xio_gsi: 2.1 (1147293372-1)
globus_xio_tcp: 3.2 (1323094639-1)
globus_xio_system_select: 3.2 (1323094639-1)
globus_xio_file: 3.2 (1323094639-1)
globus_xio: 3.2 (1323094639-1)
globus_io: 9.2 (1322772745-1)
globus_i_gass_transfer_http: 7.1 (1319549257-1)
globus_gass_transfer: 7.1 (1319549257-1)
globus_gass_server_ez: 4.1 (1319549256-1)
globus_callout_module: 2.1 (1319549254-1)
globus_gss_assist: 8.1 (1319549262-1)
globus_gsi_callback_module: 4.4 (1343249939-83)
globus_sysconfig: 5.1 (1319549264-1)
globus_credential: 5.1 (1319549261-1)
globus_gsi_proxy: 6.1 (1319549263-1)
globus_gsi_openssl_error: 2.1 (1319549263-1)
globus_openssl: 3.1 (1319549263-1)
globus_extension_module: 14.5 (1323116869-1)
globus_callback_nonthreaded: 14.5 (1323116869-1)
globus_callback: 14.5 (1323116869-1)
globus_object: 14.5 (1323116869-1)
globus_error: 14.5 (1323116869-1)
globus_common: 14.5 (1323116869-1)
globus_gsi_gssapi: 10.7 (1336415934-83)
globus_thread_common: 14.5 (1323116869-1)
globus_thread_none: 14.5 (1323116869-1)
globus_thread: 

Globus Toolkit/GT-290

Summary

Review how JSDL resource specifications are done

Details

Type: Task

Status: Resolved 2012-10-22

Description

Both XSEDE and OSG are investigating various ways to describe resources via extensions to RSL. OSG has a patch to add xcount (with some vague semantics) to the LSF resource manager. XSEDE is looking at adding something similar to PBS, based on older TG patches which define xcount and host_xcount. We'd like to come up with some coherent solution for both platforms. One thing to investigate is how JSDL and BES profiles implement resource constraints, so that we can do something compatible with that if we end up with a conversion of GRAM to be a JSDL-consuming service.

Comments

Joe Bester - 2012-10-22

http://confluence.globus.org/display/GRAMPUBLIC/Resource+Constraints+in+GRAM+and+JSDL

Globus Toolkit/GT-291

Summary

Reduce verbosity of INFO level debug log on GRAM

Details

Type: Improvement

Status: Resolved 2012-11-12

Description

OSG has asked that the default log level be changed to INFO, so that they can trace job executions, but not see all of the other internals about what GRAM is doing. I think this can be done by pushing a few things from INFO to DEBUG, and perhaps a few things from DEBUG to TRACE so that the lower-verbosity log levels are more useful.

Comments

Joe Bester - 2012-10-19

I've committed changes for this to CVS. Will push out an update when they are tested.

Globus Toolkit/GT-292

Summary

Service tags may not isolate services completely

Details

Type: Bug

Status: Open

Description

OSG is trying to use the service name and service tag to isolate the "managedfork" service (which is condor), but reports that there are some issues with those methods not completely isolating services from each other, so that condor and managed fork services end up mistakenly dealing with each other's jobs. Investigate this and come up with a solution to this problem.

Comments

Joe Bester - 2012-12-04

I've asked Brian a few times for more info about this issue, but haven't received any updates, so I'm going to move this to the unscheduled sprint until I learn more.

Globus Toolkit/GT-293

Summary

Mirror CVS in ec2 for build and test

Details

Type: Task

Status: Resolved 2012-10-22

Description

There have been some intermittent errors related to ec2 nodes getting blocked by the DOE firewall, so that they cannot check out code from our cvs repository. Perhaps medium term, we should move the source repository to github or some other externally-managed system, but in the meantime, it would be helpful for our build and test runs to have a mirror of the repo on some node that is outside of the DOE firewall.

Comments

Joe Bester - 2012-10-19

I'm mirroring the cvs tree on builds.globus.org for use by the jenkins execute nodes only. I've updated the debian build and test scripts to use those.

Joe Bester - 2012-10-22

I've updated the rest of the scripts and jobs to use the alternate cvsroot

Globus Toolkit/GT-294

Summary

Add support for HDFS to gcmu native package

Details

Type: Task

Status: Open

Description

OSG has a DSI for HDFS, which we'd like to include in the GCMU native packages, so that it becomes easy to install a GridFTP server with support for it. We'd like to get the package updated for native packaging with GT5.2 and add dependencies and hooks into the GCMU scripts to handle the integration with HDFS. There is an HDFS module that maps to an underlying POSIX filesystem, so we can use that to test the configuration of the gridftp server HDFS integration as part of our standard build and test process without having to configure a filesystem cluster.

Comments

Globus Toolkit/GT-295

Summary

Missing dependency in globus_scheduler_event_generator debian native packages

Details

Type: Bug

Status: Resolved 2012-11-12

Description

The native packages of globus-scheduler-event-generator-progs fail to uninstall cleanly because they have an unexpressed dependency on globus-common-progs, which may get removed prior to the pre-uninstall script being run.

Comments

Joe Bester - 2012-11-12

committed to 5.2 branch and available in the testing repo.

Globus Toolkit/GT-296

Summary

globus_ftp_control_data_read() race condition (Formerly bugzilla bug 1234)

Details

Type: Bug

Status: Resolved 2014-07-23

Description

Original post: https://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=1234

From Bugzilla:

globus_ftp_control_data_read() sets EOF on callbacks once all data is
delivered. It is the responsibility of the calling process to inform all
threads that EOF has been received so that no further reads are registered.
There is a short window between which the library sets the handle state to
EOF and the callback informs all threads that EOF has been received. During
this time, any new buffers introduced will result in an error.

This happens in the simplest of cases. If you register 2 reads with
globus_ftp_control_data_read() on a STREAMS mode data channel with each
buffer length = X and file size = Y where X > Y, then there is a chance that
you will receive EOF on the first buffer. The second buffer could still be
in the initial error-checking phases of globus_ftp_control_data_read()
resulting in the error.

This bug still exists in the ftp control libraries distributed with GT 5.2.2. This is a problem for client and DSI authors. It would be great if we could get a fix or workaround for this race condition. Even if we had some way to distinguish EOF-related errors from other errors returned by globus_ftp_control_data_read(), it would help tremendously.
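
The window can be replayed deterministically with a toy model. Everything below is hypothetical and single-threaded. ToyDataHandle is not the ftp control API, and the read completes immediately, which the real asynchronous call does not do. It only shows the ordering: the library-side state reaches EOF before the application's own EOF flag is published.

```python
class ToyDataHandle:
    """Toy stand-in for a stream-mode data handle (not the real API)."""

    def __init__(self, file_size):
        self.remaining = file_size
        self.library_eof = False      # state kept inside the "library"

    def register_read(self, buffer_len):
        # Mirrors the initial state check that rejects reads after EOF.
        if self.library_eof:
            raise RuntimeError("Handle not in proper state EOF")
        delivered = min(buffer_len, self.remaining)
        self.remaining -= delivered
        if self.remaining == 0:
            self.library_eof = True   # set before the callback runs
        return delivered, self.library_eof


def replay_race(file_size, buffer_len):
    """Replay the ordering from the report and return what happened."""
    handle = ToyDataHandle(file_size)
    app_saw_eof = False               # the flag other threads consult
    events = []
    delivered, eof = handle.register_read(buffer_len)
    events.append(("read 1", delivered, eof))
    # Window: the library is at EOF, but the callback has not published it.
    if not app_saw_eof:
        try:
            handle.register_read(buffer_len)
        except RuntimeError as exc:
            events.append(("read 2 failed", str(exc)))
    app_saw_eof = eof                 # the callback finally publishes EOF
    return events
```

With a buffer larger than the file, the first read carries EOF and the second registration fails even though the application has done nothing wrong by its own bookkeeping.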

Comments

Mike Link - 2013-01-14

Fixed for 5.2.4

Jason Alt - 2013-02-19

As posted erroneously in bug GT-297:

Tested version 5.2.4 with the HPSS DSI. I can recreate the bug when using REST but not without:

MLST 2GB
250-status of 2GB
 Type=file;Modify=20121211185330;Size=2147483648;Perm=awr;UNIX.mode=0644;UNIX.owner=jalt;UNIX.uid=27751;UNIX.group=bw_staff;UNIX.gid=14802;Unique=7649b200-3d000002; 2GB
250 End.
SIZE 2GB
213 2147483648
STAGE 0 2GB
501 Syntax error in parameters or arguments.
SITE STAGE 0 2GB
500 Invalid command.
ALLO 2146483648
200 ALLO command successful.
SITE SETFAM DEFAULT
250 Ok
SITE SETCOS DEFAULT
250 Ok
MODE S
200 Mode set to S.
DCAU A
200 DCAU A.
PBSZ 1135616
200 PBSZ=1135616
PROT C
200 Protection level set to C.
TYPE I
200 Type set to I.
PASV
227 Entering Passive Mode (141,142,176,151,180,180)
REST 1000000
350 Restart Marker OK. Send STORE or RETR to initiate transfer.
STOR 2GB
MODE S
200 Mode set to S.
DCAU A
200 DCAU A.
PBSZ 1135616
200 PBSZ=1135616
PROT C
200 Protection level set to C.
TYPE I
200 Type set to I.
PORT 141,142,176,151,180,180
200 PORT Command successful.
REST 1000000
350 Restart Marker OK. Send STORE or RETR to initiate transfer.
RETR 2GB
150 Beginning transfer.
150 Beginning transfer.
226 Transfer Complete.
500-Command failed. : globus_ftp_control_data_read failed.
500-globus_ftp_control_data_read(): Handle not in proper state EOF.
500 End.
226 Transfer Complete.
2GB: Error with remote service during transfer.
500-Command failed. : globus_ftp_control_data_read failed.
500-globus_ftp_control_data_read(): Handle not in proper state EOF.
500 End.


Using the debugger, I see one op registered and it is in the read callback with eof=TRUE. It is waiting for the lock to set the global EOF flag. Another thread has the lock and is submitting one request when the error occurs.

If there is some procedural way to avoid this, let me know.

Jason Alt - 2013-02-20

Bug was recreated last night w/o REST.

hpss02$rpm -qa |grep globus-gridftp
globus-gridftp-server-control-2.8-1.el6.x86_64
globus-gridftp-server-progs-6.19-1.el6.x86_64
globus-gridftp-server-debuginfo-6.19-1.el6.x86_64
globus-gridftp-server-control-debuginfo-2.8-1.el6.x86_64
globus-gridftp-server-6.19-1.el6.x86_64
globus-gridftp-5.2.2-1.el6.x86_64

/testdir/Performance_10_hpss02_grid/cos2.2048mb.9.dd: Error with remote service during transfer.
500-Command failed. : globus_ftp_control_data_read failed.
500-globus_ftp_control_data_read(): Handle not in proper state EOF.
500 End.

Globus Toolkit/GT-297

Summary

globus_ftp_control_data_query_channels() SIGSEGV on proxy expiration

Details

Type: Bug

Status: Resolved 2013-02-19

Description

If a user's certificate expires during an ftp session and the client attempts to transfer another file, globus_ftp_control_data_query_channels()  will segfault. From the attached stack trace, it appears as though the data channel is closing. I presume this is due to DCAU failing.

Comments

Mike Link - 2013-01-14

Fixed for 5.2.4

Jason Alt - 2013-02-19

Tested version 5.2.4 with the HPSS DSI. I can recreate the bug when using REST but not without:

MLST 2GB
250-status of 2GB
 Type=file;Modify=20121211185330;Size=2147483648;Perm=awr;UNIX.mode=0644;UNIX.owner=jalt;UNIX.uid=27751;UNIX.group=bw_staff;UNIX.gid=14802;Unique=7649b200-3d000002; 2GB
250 End.
SIZE 2GB
213 2147483648
STAGE 0 2GB
501 Syntax error in parameters or arguments.
SITE STAGE 0 2GB
500 Invalid command.
ALLO 2146483648
200 ALLO command successful.
SITE SETFAM DEFAULT
250 Ok
SITE SETCOS DEFAULT
250 Ok
MODE S
200 Mode set to S.
DCAU A
200 DCAU A.
PBSZ 1135616
200 PBSZ=1135616
PROT C
200 Protection level set to C.
TYPE I
200 Type set to I.
PASV
227 Entering Passive Mode (141,142,176,151,180,180)
REST 1000000
350 Restart Marker OK. Send STORE or RETR to initiate transfer.
STOR 2GB
MODE S
200 Mode set to S.
DCAU A
200 DCAU A.
PBSZ 1135616
200 PBSZ=1135616
PROT C
200 Protection level set to C.
TYPE I
200 Type set to I.
PORT 141,142,176,151,180,180
200 PORT Command successful.
REST 1000000
350 Restart Marker OK. Send STORE or RETR to initiate transfer.
RETR 2GB
150 Beginning transfer.
150 Beginning transfer.
226 Transfer Complete.
500-Command failed. : globus_ftp_control_data_read failed.
500-globus_ftp_control_data_read(): Handle not in proper state EOF.
500 End.
226 Transfer Complete.
2GB: Error with remote service during transfer.
500-Command failed. : globus_ftp_control_data_read failed.
500-globus_ftp_control_data_read(): Handle not in proper state EOF.
500 End.


Using the debugger, I see one op registered and it is in the read callback with eof=TRUE. It is waiting for the lock to set the global EOF flag. Another thread has the lock and is submitting one request when the error occurs.

If there is some procedural way to avoid this, let me know.

Jason Alt - 2013-02-19

Sorry, wrong window. This is meant as a follow up for bug GT-296.

Globus Toolkit/GT-298

Summary

Leading whitespace confuses rvf parser

Details

Type: Bug

Status: Resolved 2012-11-12

Description

OSG ran into a problem where they added a blank line to the rvf file, and it caused the service to fail to parse and thus reject all jobs.
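
A sketch of a more forgiving reader, assuming a simplified record format of "Key: value" lines with blank lines between records. Real rvf files also allow quoted multi-line values, which this ignores. The function name parse_records is hypothetical and this is not the rvf.diff patch.

```python
def parse_records(text):
    """Parse 'Key: value' records separated by one or more blank lines."""
    records = []
    current = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()            # tolerate leading/trailing whitespace
        if not line:
            # Any run of blank or whitespace-only lines ends the record.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("line %d: expected 'Key: value'" % number)
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records
```

Treating whitespace-only lines as separators means a stray blank line at the top of the file, or an extra one between records, yields no empty record and no parse failure.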

Comments

Joe Bester - 2012-10-11

I've applied the rvf.diff patch to CVS 5.2 branch and trunk for this issue. I'll issue an update package once the binary packages are built.

Joe Bester - 2012-11-12

committed to cvs and available from the testing repo.

Globus Toolkit/GT-299

Summary

data and finished command kickouts called twice?

Details

Type: Bug

Status: Resolved 2012-11-12

Description

While stress testing the GridFTP server w/ the HPSS DSI, the server became very unstable with moderate load on the system (~20 clients and servers). The server would crash in numerous locations with what appeared to be memory corruption. After debugging with valgrind and dmalloc, a common theme occurred;  somehow the globus_l_gfs_data_operation_t has already been destroyed before the call to globus_l_gfs_finished_command_kickout(). The same is happening with the request structure and globus_l_gfs_request_info_destroy(). The attached valgrind file shows this.

backtrace1 shows the server aborting on a failed globus_mutex_lock(). The contents of the globus_l_gfs_data_operation_t are included in the output. In this example, I had modified globus_l_gfs_data_operation_destroy() and globus_l_gfs_request_info_destroy() to overwrite the structure before free with '0xef' to be sure this corruption wasn't a write gone haywire.

backtrace2 and backtrace3 may be related, they may be separate issues. Both files show what appears to be aborts due to the server being in the wrong state. backtrace3 occurs quite frequently and usually with state GLOBUS_L_GSC_STATE_OPEN.

I was not able to reproduce this without the DSI loaded. This problem does not exist with GT 5.1.3 w/ the DSI loaded.

The order of FTP commands given were:

ALLO 52428800
MODE S
DCAU A
PBSZ 1181696
PROT C
TYPE I
PASV
STOR 50MB


Without GT 5.2.2, we lose the awesome custom commands and so we lose the ability to stage files.

Comments

Jason Alt - 2012-10-31

The acl_overlap_backtrace shows that the end of the STOR command (thread 2) is overlapping with the beginning of the MFMT command (thread 3). I believe that since acl_handle is shared, the op stored in acl_handle->user_arg by the MFMT thread is getting picked up in globus_l_gfs_authorize_cb() by the STOR thread. Thus the MFMT command (or possibly any command following STOR) is getting dispatched twice.

I'm still looking for the fix.
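
The overlap described above can be illustrated with a small single-threaded simulation. This is a hedged sketch, not the server code: AclHandle, start_authorize, and the two callbacks are hypothetical stand-ins for the shared acl_handle and globus_l_gfs_authorize_cb(). It shows how a callback that reads the operation back from a shared slot dispatches the later command twice, while a callback that is handed its own operation does not.

```python
class AclHandle:
    """Stand-in for a handle shared by every command in a session."""

    def __init__(self):
        self.user_arg = None


def start_authorize(handle, op):
    # Each authorization request stores its operation in the shared slot.
    # The operation is also returned so it can be passed along explicitly.
    handle.user_arg = op
    return op


def authorize_cb_shared(handle, dispatched):
    # Problem pattern: read the operation back from the shared slot.
    dispatched.append(handle.user_arg)


def authorize_cb_private(op, dispatched):
    # Safer pattern: the callback receives the operation it was started for.
    dispatched.append(op)


def simulate(shared):
    handle = AclHandle()
    dispatched = []
    stor = start_authorize(handle, "STOR")  # end of STOR requests authorization
    mfmt = start_authorize(handle, "MFMT")  # MFMT starts before STOR's callback runs
    for op in (stor, mfmt):                 # callbacks then fire in request order
        if shared:
            authorize_cb_shared(handle, dispatched)
        else:
            authorize_cb_private(op, dispatched)
    return dispatched
```

With shared=True the result lists MFMT twice and STOR not at all, which matches the double dispatch seen in the backtrace.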

Jason Alt - 2012-10-31

It appears that the problem is in source-trees/gridftp/server/src/globus_i_gfs_data.c: globus_l_gfs_data_end_read_kickout(). This function calls globus_gfs_acl_authorize() and then calls globus_l_gfs_data_end_transfer_kickout() to shutdown the control side (which causes the reply to be sent and the next command to be read). Stress testing shows that the server is stable after removing the call to globus_gfs_acl_authorize() in this function.

Mike Link - 2012-11-12

Thanks for tracking this down.  Fixed for 5.2.3.

Globus Toolkit/GT-300

Summary

gridftp-server should use threading by default

Details

Type: Bug

Status: Resolved 2012-12-04

Description

A default setup of the gridftp-server (as taken from the Fedora/EPEL RPM packages) does not set the threading model to pthreads. The server will run single-threaded as a consequence.

There is reason to believe this setup may lead to performance problems on a server that has to deal with multiple connections. In at least one case, the gridftp server on a WMS running in Europe slowed to a crawl until the following line was added to the configuration file /etc/gridftp.conf:

 $GLOBUS_THREAD_MODEL pthread

We were able to reproduce the performance difference on a testbed machine.

If there are no evident reasons why the server should not be run in a multi-threaded mode, then this should probably become the default, either by shipping the server with the above line in the default configuration file or by making pthread the built-in default.

Cheers,

Dennis van Dok

Comments

Mike Link - 2012-10-12

Can you show how you reproduced the performance difference on a test machine?  There should be minimal performance differences between threaded and nonthreaded models.

dennisvd - 2012-10-12

Originally this came from a WMS run by the eNMR people. My colleague reproduced it on a machine in our testbed. I think the setup was quite simple; it uses gridftp server from the EPEL5 distribution, together with the gsi-authz callout to LCMAPS for credential mapping. The first run was a transfer of a large single file; the second run was done with two parallel transfers of large files.

I've asked my colleague to contact you for further assistance.

UPDATE: my colleague is unable to reproduce the behaviour on our testbed; transfer rates seem to be stable with or without threading. At least for the eNMR people, setting the model to threading seems to have helped; but now it's only one data point in an environment where the influence of other factors can not be excluded.

dennisvd - 2012-11-12

The GGUS ticket [https://ggus.eu/tech/ticket_show.php?ticket=86999] that led to creation of this ticket has been updated. Apparently the real cause of the slowness of the WMS wasn't the threadedness of the gridftp server, but the size of the mysql database.

I'm not sure if this invalidates the ticket, but at least I should apologise for the false alarm.

Sorry,

Dennis

Stuart Martin - 2012-12-04

Non-threaded is a better default.  Running threaded is fine too, but under certain circumstances it can be slightly less stable.

Globus Toolkit/GT-301

Summary

Debug XSEDE GridFTP setup and testing process

Details

Type: Task

Status: Resolved 2012-12-04

Description

XSEDE SD&I team is testing a GT 5.2.2-based installation process.  A gridftp test program has been provided, but it returned errors.  Work with SD&I to figure out the root cause and update the installation guide and/or test program as necessary.

Comments

Mike Link - 2012-10-22

Updated test program to address known issues, added one line explanations of tests being run, and added a DCSC test.  Am in contact with SD&I team to diagnose problems with the suggested server config.

Stuart Martin - 2012-12-04

Mike fixed.  Eric and Galen (XSEDE SD&I) tested and confirmed.  So, this is done.

Globus Toolkit/GT-302

Summary

Add initial sharing support to the GridFTP server

Details

Type: New Feature

Status: Resolved 2013-01-14

Description

Add required features as per sharing discussion.
-option to enable sharing, takes DN of sharing account
-processing of .globus_sharing file
-command to add additional path restrictions

Comments

Mike Link - 2012-10-22

Added an option:  --enable-sharing .  If this is set, any USER can be sent.  If that user has a ~/.globus_sharing file and it processes correctly, access is granted.

added a protocol command: SITE RP [path restriction string], where the string is the same as can be passed as an rp config option.  No escaping is done, so a path can't contain a , or \r\n.
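
Since no escaping is done, a client has to reject unsafe paths before it builds the command. A minimal sketch, assuming entries are joined with commas as in the rp config option; build_site_rp is a hypothetical client-side helper, not part of any Globus API.

```python
FORBIDDEN = (",", "\r", "\n")


def build_site_rp(entries):
    """Return a 'SITE RP ...' command line for the given restriction entries.

    Raises ValueError if an entry contains a character that would be
    misread on the wire, since the string is sent without escaping.
    """
    if not entries:
        raise ValueError("at least one entry is required")
    for entry in entries:
        for ch in FORBIDDEN:
            if ch in entry:
                raise ValueError("entry %r contains forbidden %r" % (entry, ch))
    return "SITE RP " + ",".join(entries)
```

Rejecting the entry outright, rather than trying to quote it, keeps the client consistent with a server that treats every comma as a separator.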

Still need to process the .globus_sharing file.  Also spoke to Karl about a change where he can send USER  rather than a username, and the server would do a gridmap lookup on that dn to get the correct user.  Nothing done there yet.

Mike Link - 2013-01-14

Fixed a few more bugs from Karl.

site restrict (or -restrict-paths) with multiple entries works.
site restrict handles special characters including urlencoded colon.
site chroot properly errors when passed a disallowed path.
site chroot handles globbing characters properly.

Globus Toolkit/GT-303

Summary

Add support for virtual paths to GridFTP server

Details

Type: New Feature

Status: Resolved 2013-01-14

Description

Add support for mapping existing paths to virtual paths, likely via the path restriction interface.  Need to take care that any return paths via listings or error messages or other responses reflect the virtual path, not the real path.
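
A minimal sketch of two-way prefix translation between virtual and real paths. The class PathMap, its method names, and the example directories are hypothetical and only illustrate the idea; the ticket does not describe the server's actual data structures.

```python
import posixpath


def _swap_prefix(path, old, new):
    """Replace the prefix `old` of `path` with `new`, or return None."""
    path = posixpath.normpath(path)  # also collapses any '..' components
    if path == old:
        return new
    prefix = old.rstrip("/") + "/"
    if path.startswith(prefix):
        return posixpath.join(new, path[len(prefix):])
    return None


class PathMap:
    """Translate between real paths and virtual paths by prefix."""

    def __init__(self, mapping):
        # mapping: {virtual_prefix: real_prefix}, both absolute paths.
        self.pairs = [(posixpath.normpath(v), posixpath.normpath(r))
                      for v, r in mapping.items()]

    def to_real(self, virtual_path):
        # Longest virtual prefix first, so the most specific mapping wins.
        for virt, real in sorted(self.pairs, key=lambda p: -len(p[0])):
            out = _swap_prefix(virtual_path, virt, real)
            if out is not None:
                return out
        raise PermissionError("no mapping for %s" % virtual_path)

    def to_virtual(self, real_path):
        # Used for listings and error text. None means the real path has
        # no virtual form and should be left out of the response.
        for virt, real in sorted(self.pairs, key=lambda p: -len(p[1])):
            out = _swap_prefix(real_path, real, virt)
            if out is not None:
                return out
        return None
```

Returning None from to_virtual, instead of passing the real path through, is one way to make sure a response never leaks a path that has no virtual equivalent.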

Comments

Mike Link - 2012-10-22

This is working as far as translating paths between real and virtual for any file accesses or return listings.  Still need to find where command responses might include a real path that needs to be either removed or converted back to a virtual path.

Mike Link - 2013-01-14

SITE CHROOT done for 5.2.4

Globus Toolkit/GT-304

Summary

bashism in /bin/sh script

Details

Type: Bug

Status: Resolved 2014-07-23

Description

This issue was originally reported to the Debian Bug Tracking System:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=690652

possible bashism in ./usr/sbin/globus-gridftp-server-setup-chroot line 11 ($UID should be "$(id -ru)"):
if [ $UID -ne 0 ]; then

Comments

Mike Link - 2012-11-12

Thanks.  Fix committed for 5.2.3.

Globus Toolkit/GT-305

Summary

Improve doc for setting up the basic set of grid services

Details

Type: Documentation

Status: Open

Description

Begin forwarded message:

From: john alexander sanabria ordonez 
Date: October 17, 2012 8:58:30 AM CDT
To: gt-user@lists.globus.org
Subject: [gt-user] who maintains the Globus Toolkit documentation?

Hello,

I did try to follow the instructions given in the Globus Toolkit documentation web page but I found a couple of gaps that mislead the deployment of basic grid services (simpleCA + gram5 + gridftp), e.g.
The documentation does not mention that the grid-ca-certs package should be installed
The documentation should state more clearly that the grid-ca-create command must be run by the globus user
When this command is run by globus user some files (found in ~/.globus/simpleCA) must be copied to the /etc/grid-security directory
The documentation should provide a hint which describes what [YUM and OSG specific] repositories should be installed for setting up a machine where Globus services  would be provisioned (https://twiki.grid.iu.edu/bin/view/Documentation/Release3/YumRepositories)

I have to thank Marco Mambelli and Jose Caballero for their hints, which allowed me to define the correct set of steps for installing those services.

Regards,

PS:  I wrote some Chef recipes which allow to deploy the aforementioned services in a physical or virtual machine

Comments

Globus Toolkit/GT-306

Summary

add mode U for UDT based mode E with ICE based NAT traversal

Details

Type: New Feature

Status: Resolved 2013-03-22

Description

We'd like to demo this at SC 2012.

From Steve:

I think what we want to do is enhance the GridFTP control channel protocol, so that a GridFTP client can mediate the passing of UDP ip/port information between the two GridFTP servers. Assume we define a new "MODE U" that is UDP with NAT traversal + UDT + mode E.  When in that mode, I think we would basically replace the PASV and PORT commands with new ones that can pass a list of alternate ip/port pairs. I'm thinking as a short-term hack we can maybe use SPSV and SPOR for these, but instead of treating the list of ip/ports as striped servers, treat them as alternate UDP ip/ports.  So after passing MODE U to each side, the client would call SPSV to one server, which would then use STUN to discover its external ip/port, and would return both its external ip/port and its local (private) ip/port. The client would pass this list to the other server using SPOR. I don't recall if this exchange has to happen in both directions -- that is, whether the second server also does STUN discovery and passes the result through the client to the first server.  If so, then perhaps you use SPSV and SPOR a second time in the opposite direction to pass this.  Or maybe it's easier to just add new commands to do this (UPSV, UPOR).  In any case, the servers then use the ICE negotiation protocol to figure out which ip/ports to use to talk to each other.  For example, if they happen to be on the same private network, then they will talk directly using their private network addresses.  Otherwise they will use their external addresses.  I don't recall in what situations there might even be more than 2 ip/port pairs for each server -- perhaps if the server is multi-homed?

Once the two servers have found each other via UDP, then we should drop into UDT, and then drive MODE E over that.  I think the MODE E over UDT is already done. So I think the main missing bit is adding the STUN and ICE support to the UDP connection establishment.
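
A minimal sketch of how a list of alternate ip/port pairs could be carried on the control channel, using the h1,h2,h3,h4,p1,p2 form that PASV and PORT already use for a single IPv4 address. Whether the MODE U commands would use this encoding is an assumption, and all function names here are hypothetical.

```python
def encode_hostport(ip, port):
    """Encode an IPv4 address and port as 'h1,h2,h3,h4,p1,p2'."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not an IPv4 address: %r" % ip)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %r" % port)
    return ",".join(octets + [str(port >> 8), str(port & 0xFF)])


def decode_hostport(text):
    """Decode 'h1,h2,h3,h4,p1,p2' into an (ip, port) tuple."""
    parts = text.strip().split(",")
    if len(parts) != 6:
        raise ValueError("expected six comma-separated numbers: %r" % text)
    nums = [int(p) for p in parts]
    if any(n < 0 or n > 255 for n in nums):
        raise ValueError("field out of range: %r" % text)
    return ".".join(str(n) for n in nums[:4]), nums[4] * 256 + nums[5]


def encode_candidates(pairs):
    """One encoded pair per line, for example external address, then local."""
    return "\n".join(encode_hostport(ip, port) for ip, port in pairs)


def decode_candidates(text):
    """Inverse of encode_candidates; blank lines are skipped."""
    return [decode_hostport(line) for line in text.splitlines() if line.strip()]
```

The client only needs to relay the encoded list from one server to the other; choosing between the external and private pair is left to the negotiation between the servers.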

Comments

Bryce Allen - 2012-11-07

I was able to get hole punching to work with udp sockets and then start a UDT connection by binding to the existing udp sockets. Both clients are behind a Linux NAT firewall which allows all outgoing forward traffic, but only allows ESTABLISHED/RELATED in.

Bryce Allen - 2012-11-08

Linux Firewall notes:

- Default timeout on UDP "connections" is 30 seconds
- A connection becomes "ASSURED" and gets a timeout of 180 seconds only after three packets are sent - outgoing, incoming, and the third can be either incoming or outgoing.

It's important to maintain the connections by sending packets periodically, so the port mapping on any NAT firewall is not changed.
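
A minimal sketch of such a keepalive, assuming the 30 second timeout noted above. Sending at one third of the timeout is an arbitrary safety margin chosen for illustration, and both function names are hypothetical.

```python
import threading


def keepalive_interval(nat_timeout_s, safety_factor=3.0):
    """Pick a send interval comfortably below the NAT mapping timeout."""
    if nat_timeout_s <= 0 or safety_factor <= 1:
        raise ValueError("timeout must be positive and safety_factor > 1")
    return nat_timeout_s / safety_factor


def start_keepalive(sock, peer, interval_s, stop_event, payload=b"\x00"):
    """Send `payload` to `peer` every `interval_s` seconds until stopped.

    `sock` is any object with a sendto(data, address) method, such as a
    bound UDP socket. Returns the background thread.
    """
    def run():
        # Event.wait returns False on timeout and True once the event is set.
        while not stop_event.wait(interval_s):
            sock.sendto(payload, peer)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

For the 30 second default this gives one packet every 10 seconds; setting stop_event ends the loop without waiting out the current interval.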

Bryce Allen - 2012-11-08

Implementing STUN and ICE from scratch would be a fairly large project. There are three open source implementations worth looking at:

PJNATH: http://www.pjsip.org/pjnath/docs/html/
- No external requirements, but there is no obvious way to build only pjnath and pjlib (exclude the SIP and audio/video libraries)
- GPLv2 or later
- Developed primarily by small company (4 people?) in UK
- API looks like it's designed as a complete solution, may be hard to use selectively and then pass the socket off to UDT

LIBNICE: http://nice.freedesktop.org/wiki/
- LGPL v2.1 and MPL
- Requires glib >= 2.10. Dependencies on gupnp-igd and gstreamer are optional and easy to disable.
- API seems very clean, but no obvious way to extract a socket fd (or even a GSocket) for external use.

STUNTMAN: http://www.stunprotocol.org/
- STUN only, no ICE support
- Apache v2 license
- Requires Boost and openssl
- Server and Client implementation, but has a cleanly separated core library
- Most flexible, getting the socket fd should not be a problem, but will require significant work to implement ICE on top of. May be able to collaborate with other devs to add ICE support to the main library.

Bryce Allen - 2012-11-09

PJSIP includes a sample 'icedemo' application that does ICE step by step in an interactive CLI, with an optional STUN or TURN server. Wish I had found it sooner, it is perfect for testing that ICE really works on a given network topology:
  http://www.pjsip.org/pjnath/docs/html/ice_demo_sample.htm

The released (2.0.1) version has a critical bug, fixed by this patch:
 http://trac.pjsip.org/repos/changeset/4217/pjproject/trunk/pjsip-apps/src/samples/icedemo.c

So far I've tested in my lxc environment (two hosts behind linux fw cone nat), and between two ANL hosts (one secure wireless, second wired).

Bryce Allen - 2012-11-12

The following also work with icedemo:
- ANL wired w/ public IP (appears to have stateful firewall for non-ANL traffic), home computer behind linux nat
- two machines behind same linux nat
- public IP without firewall or NAT, symmetric NAT
- double NAT, single NAT (separate)
- double NAT, single NAT (outer NAT shared, so it's similar to no NAT/single NAT but the STUN server is outside)
- double NAT, double NAT (all 4 NATs are standard linux MASQUERADE)

The following does not work:
- standard linux NAT, symmetric NAT

Symmetric NAT was implemented with the following iptables rules:
-t nat -A POSTROUTING -o eth0 -p udp -j SNAT --to-source 10.0.3.111-10.0.3.113:50000-51000 --random
-t nat -A POSTROUTING -o eth0 ! -p udp -j SNAT --to-source 10.0.3.111-10.0.3.113 --random

I tested that different destinations for the same local port get mapped to different public addresses and ports. Not sure how typical the full behavior is of real symmetric NAT though.
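
The check described above can be sketched as a small classifier. It assumes the observations have already been collected, for example from STUN responses; the function name and the tuple layout are hypothetical.

```python
def looks_symmetric(observations):
    """Report whether the observed mappings look like symmetric NAT.

    observations: iterable of (local_port, destination, mapped) tuples,
    where `mapped` is the (public_ip, public_port) that `destination` saw
    for traffic sent from `local_port`.

    Returns True if any single local port was mapped to more than one
    public address or port depending on the destination.
    """
    seen = {}
    for local_port, destination, mapped in observations:
        seen.setdefault(local_port, {})[destination] = mapped
    for by_dest in seen.values():
        if len(by_dest) > 1 and len(set(by_dest.values())) > 1:
            return True
    return False
```

A False result is only meaningful when at least two destinations were probed from the same local port; with a single destination the test is inconclusive.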

Bryce Allen - 2012-11-13

libnice is packaged for RHEL, Debian, and Ubuntu (since lucid). However, getting glib and all its dependencies set up from source on Windows and Mac OS X is a royal pain. We might be able to leverage existing binary builds and/or dev builds.

pjnath is not packaged anywhere, but is generally very easy to build and has no external dependencies. I get a link error cross compiling with mingw-w64, but it's during the sample app build after all the libs (including pjnath) and icedemo get built, so it could be ignored.

Bryce Allen - 2012-11-16

More icedemo tests (working):

- ANL wired, ANL guest wifi (note that both have no NAT, just firewall)
- ANL guest wifi, home linux NAT

Bryce Allen - 2013-01-08

I have ICE working (using pjnath) with my udpc test client. I can do ICE negotiation, using upas/uprt commands in udpc, then start a UDT connection immediately after destroying the ICE session, binding to the negotiated addr:port.

There is some strange behavior in exotic network configs. My test machines behind linux double NAT take two minutes to initialize the ice session. icedemo has the same behavior, so it's either an intrinsic limitation of ICE, an issue with pjnath, or an issue with my test environment. It's a relatively minor issue, since everything still works - it just delays the connection. The response from the STUN server returns almost immediately, so there is no obvious reason for the delay.

Bryce Allen - 2013-01-09

Mystery solved: the delay has nothing to do with STUN or double NAT - it's caused by the hostname not resolving. I had hosts entries for the single NAT containers in my test environment, but not for the double NAT containers.

Bryce Allen - 2013-01-09

Scripts I used to setup a NAT test environment in Linux:
https://github.com/globusonline/lxc-nat

Bryce Allen - 2013-01-18

Need to see if pjnath can be dynamically linked, and make sure that Mike keeps the ice lib with code copied from icedemo.c separate.

Bryce Allen - 2013-01-18

Building .so should be possible, but it may require some work; I can't find any examples of people doing it for Linux:

http://www.pjsip.org/pjlib/docs/html/group__pj__dll__target.htm
http://lists.pjsip.org/pipermail/pjsip_lists.pjsip.org/2007-September/000111.html

Bryce Allen - 2013-01-22

We can't use PJNATH because of licensing (confirmed by Perry Ismangil ). Their proprietary license is not an option because it's not compatible with OSS.

Libnice is LGPL 2.1, as is glib which it depends on. I think we can link to that dynamically without problems.

Bryce Allen - 2013-01-22

One concern with libnice is that there are many versions in the wild, and RHEL/CentOS 5 has no package. The oldest is 0.0.9 in CentOS 6.

UBUNTU
lucid has 0.0.10, oneiric has 0.1.0, precise has 0.1.1, and quantal has 0.1.2

DEBIAN
squeeze has 0.0.12, wheezy (upcoming) has 0.1.2

RHEL/CENTOS
6 has 0.0.9, 5 has nothing and no EPEL package
Fedora 16 has 0.1.0, 17 has 0.1.2, 18 has 0.1.3

Bryce Allen - 2013-01-22

libnice lacks a public API for getting the selected candidates - it assumes you will be using their internal send/receive functions.
http://nice.freedesktop.org/libnice/libnice-NiceCandidate.html

There is a private API that we could use to hack around this, but that may not be consistent between 0.0.X and 0.1.X or even among minor point releases. They may be open to adding such an API, but we still can't take advantage of existing native packages.

Waiting on response from the mailing list, but I'm going to use the hack for now so I'm not blocked yet.

Bryce Allen - 2013-02-06

libnice 0.0.X and 0.1.X are not compatible. Our options are to not support older distros with 0.0.X, build packages for 0.1.X for those distros, or try to throw in some ifdefs to support both versions. One example is that nice_agent_gather_candidates returns void on 0.0.X and gboolean on 0.1.X. That isn't too hard to work around, but on 0.0.X the error handling is weak.

Note that 0.1.X won't compile on CentOS 5 - it requires glib 2.13, and 2.12 is installed. We'd have to also build glib 2.X, X>=13 package (or statically link) to make it work.

Bryce Allen - 2013-02-06

I've only found two differences between 0.0.X and 0.1.X that affect my code:

- use DRAFT19 compatibility instead of RFC, which maps to RFC on the newer versions anyway
- nice_agent_gather_candidates returns void on 0.0.X

The second issue is more problematic, because right after gathering candidates it waits on a condition variable for candidate-gathering-done. If the function failed, the signal will never get triggered, and the old version does not signal that back to the caller.
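
One way to avoid hanging in that case is to bound the wait. A minimal sketch in Python, where threading.Event stands in for the condition variable and `gather` is a hypothetical callable standing in for the gather call; none of this is libnice API.

```python
import threading


def gather_with_timeout(gather, timeout_s):
    """Start gathering and wait for completion, but never wait forever.

    `gather(done)` must arrange for done.set() to be called when gathering
    finishes. It may return False to report an immediate failure, or None
    if, like the older interface described above, it reports nothing.
    """
    done = threading.Event()
    result = gather(done)
    if result is False:
        return "failed"      # newer-style interface reported the error
    if done.wait(timeout_s):
        return "done"        # completion was signalled in time
    return "timed out"       # never signalled: treat as a failure
```

The timeout turns a silent failure on the older interface into an error the caller can handle, at the cost of choosing a limit long enough for slow networks.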

Another concern is whether they will interoperate with each other. It might be simpler to package the latest version than to get it to compile and then test on all the combinations.

Bryce Allen - 2013-02-08

Issue with libnice 0.0.9 in CentOS 6: nice_candidate_copy is not exported. Fairly easy to work around, just a surprising discrepancy.

Globus Toolkit/GT-307

Summary

stdio-update-after-failure-test hang on debian 7 and ubuntu 10.10

Details

Type: Bug

Status: Resolved 2014-09-11

Description

The test stdio-update-after-failure-test is hanging in the current build on debian 7 and ubuntu 10.10. The job manager is not running when the test is hung.

Comments

Joe Bester - 2014-09-11

The tests don't seem to be hanging in current builds. Unclear of the reason.

Globus Toolkit/GT-308

Summary

move some GT web pages to wiki

Details

Type: Documentation

Status: Open

Description

Consider moving some GT web pages to a Globus supported community wiki in order to enable the community to help maintain and improve them.

From: Pete Eby 
Date: October 19, 2012 9:28:35 AM CDT
Subject: Re: [gt-user] who maintains the Globus Toolkit documentation?

I think having the install and quickstart guides in a wiki format
would be extremely helpful. The globus user community seems well
suited to help maintain this documentation. These guides are very
helpful, and allowing them to be updated might help to improve them
further.

Also, perhaps an additional community wiki page could be created for
trouble shooting common issues. There seems to be some "top 10" things
users encounter frequently, and this might help save them a
considerable amount of debug / research time.

Pete

On Fri, Oct 19, 2012 at 10:22 AM, Stuart Martin  wrote:
I like that idea a lot.  We use docbook to generate most of the toolkit's web content.  We'll have to think through if and where taking on a transition like that would make sense.  Maybe there are some GT pages that are best left to be generated via docbook and some that could be changed to become wiki pages.

For example, I'm thinking (partially based on the input here) that the install and quick start might be good candidates to be wiki pages.
       http://www.globus.org/toolkit/docs/latest-stable/admin/install/#gtadmin
       http://www.globus.org/toolkit/docs/latest-stable/admin/quickstart/#quickstart

Maybe this developer guide too?
       http://www.globus.org/toolkit/docs/5.2/5.2.2/appendices/developer/

Thoughts?

-Stu

On Oct 19, 2012, at Oct 19, 6:09 AM, Pete Eby wrote:

Has the idea of placing the documentation on a wiki where it could be
updated by community members been discussed? Perhaps by users with a
Globus Online login?

Comments

Globus Toolkit/GT-309

Summary

gridftp: new SITE CHROOT command

Details

Type: New Feature

Status: Open

Description

We want a 'soft chroot', for input and output paths (error messages).  Or, turning off paths in error messages may also work, because GO doesn't push recursive operations to the server and shows the path itself in errors.   Perhaps if that was an option?  (SITE echo_file_names_in_errors_for_non_recursive_comands=0 for GO?)

This is lower priority for demo than SITE RP, but I think it's still wanted - if not by SC, then soon after.

Comments

Karl Pickett - 2012-10-25

you may want to only allow this command once.  GO will only send it once per session, at the very beginning.

Karl Pickett - 2012-11-01

appears to be working well in 5.2.3s

Globus Toolkit/GT-310

Summary

improve help for -rp-follow-symlinks

Details

Type: Improvement

Status: Resolved 2012-11-12

Description

security features deserve a bit of explanation.

current text:  'Allow following symlinks that lead to restricted paths.'

perhaps add this:  "Even if off, symlinks to allowed paths are allowed".

also could change current text to:  'Follow symlinks that lead to forbidden paths.'  or 'Symlinks can bypass restricted paths'.

option could also be renamed to -rp-trust-symlinks.

Comments

Karl Pickett - 2012-10-22

By the way, is there a great use case for even having this feature?  If it's on, I can just create a symlink to wherever the hell I want.   (Perhaps document WHY I would ever want to use this?)

Karl Pickett - 2012-10-22

Is it 'follow and create'?  Or just follow, that's affected by this option?

Mike Link - 2012-10-22

You couldn't create a symlink out of the allowed paths, just follow an existing one.  Both paths in the create commands would have to be allowed paths.

Karl Pickett - 2012-10-31

another idea for the name is -rp-always-follow-symlinks.  ("Follow existing symlinks that point to paths outside of the -rp paths.  If this option is off (default), only symlinks to allowed paths are followed.")
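
The semantics discussed in this thread can be sketched as follows. This is a hedged illustration, not the server's implementation: it assumes the restrictions are a plain list of allowed directory roots, and the function names are hypothetical.

```python
import os


def _inside(path, roots):
    """True if `path` equals one of `roots` or lies beneath it."""
    return any(path == root or path.startswith(root.rstrip(os.sep) + os.sep)
               for root in roots)


def access_allowed(path, allowed_roots, follow_symlinks=False):
    """Decide whether `path` may be accessed.

    The path as named must lie under an allowed root. With follow_symlinks
    off, the fully resolved target must lie under an allowed root as well,
    so a symlink is only followed when it leads to another allowed path.
    """
    named = os.path.abspath(path)
    if not _inside(named, [os.path.abspath(r) for r in allowed_roots]):
        return False
    if follow_symlinks:
        return True
    resolved_roots = [os.path.realpath(r) for r in allowed_roots]
    return _inside(os.path.realpath(path), resolved_roots)
```

With the option off, a link from an allowed directory to another allowed directory still works, while a link that points outside the allowed roots is refused.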

Globus Toolkit/GT-311

Summary

globus job manager is leaking memory

Details

Type: Bug

Status: Resolved 2013-06-26

Description

One of our OSG sites is reporting that the gatekeeper is leaking memory on his system.  His report follows:


It looks like the globus jobmanager is leaking memory slowly.
1) I found that the globus jobmanager on one of our CEs (the biggest one) is using 10.7g virtual memory and 1.8g resident memory:
6688 cmsprod7  25   0 10.7g 1.8g 3472 R 50.2 11.8   2831:18 globus-job-mana
I checked with Brian and he doesn't think it should use that much memory.
2) I checked other gatekeepers and it looks like the cms production account, which runs more jobs than the others, is
using around 1g+ (virt and res), while the analysis accounts stand at around the 500M (virt and res) level.

Are we really hitting another memory leak?

Comments

Joe Bester - 2012-11-07

Can you send some details about the workload for these (rate of job submissions, length of jobs, number of jobs in queue at any given time, rate of polling, lrm, seg configuration)? It's hard to diagnose without any information about how the service is being used.

fengping - 2012-11-07

Can you be specific on the configuration part. I can send you the configuration files. polling and seg should be default from the osg ce installations.

The particular job manager process is the one running cms production glideins. This one runs many more jobs than the others. The queue is limited to 4416 cores, so the number of running jobs doesn't exceed that number.
Number of jobs in queue is low these days, but I think it was running at full capacity when we saw the issue.

Here's the gratia report for a week for this particular DN. Purdue-Rossmann is the site where we observed the problem:
__________________________________________________________________________________________________________________________________________________________________
|                             CN                              |      Site       |  VO:Reporting   |    Job   |  # Jobs  |    Wall    |    Delta   |  Delta Wall  |
|                                                             |                 |      Name       | Success? |          |  Time (h)  |    Jobs    |   Time (h)   |
|_____________________________________________________________|_________________|_________________|__________|__________|____________|____________|______________|
| /CN=cmspilotjob/vocms157.cern.ch                            | Purdue-Carter   | CMS:cms         |    Yes   |    38996 |    89423.8 |     +24958 |    -127721.3 |
| /CN=cmspilotjob/vocms157.cern.ch                            | Purdue-Carter   | CMS:cms         |    No    |     1974 |    15908.7 |       -763 |      -4530.9 |
| /CN=cmspilotjob/vocms157.cern.ch                            | Purdue-RCAC     | CMS:cms         |    Yes   |    73023 |   175772.8 |     +54991 |    -123535.3 |
| /CN=cmspilotjob/vocms157.cern.ch                            | Purdue-RCAC     | CMS:cms         |    No    |      934 |     1016.8 |      -2022 |     -34767.6 |
| /CN=cmspilotjob/vocms157.cern.ch                            | Purdue-Rossmann | CMS:cms         |    Yes   |    30246 |   177130.6 |     +15005 |     -49252.4 |
| /CN=cmspilotjob/vocms157.cern.ch                            | Purdue-Rossmann | CMS:cms         |    No    |     1150 |     9986.7 |        -76 |      -3012.0 |
| /CN=cmspilotjob/vocms157.cern.ch                            | Purdue-Steele   | CMS:cms         |    Yes   |    36492 |    51104.9 |     +32611 |     -29317.4 |
| /CN=cmspilotjob/vocms157.cern.ch                            | Purdue-Steele   | CMS:cms         |    No    |       24 |      156.9 |        -18 |       -922.5 |

Joe Bester - 2012-11-08

I don't really understand what your gratia data means in terms of GRAM. What do the Job, #Jobs, and Jobs fields mean? Do any of those numbers indicate the job characteristics I asked about in my previous message? Doing some lookups, I think this is a PBS system using SEG, correct?

fengping - 2012-11-08

Yes. It's a PBS(specifically torque/moab) system using SEG.
I am not aware of other ways to retrieve the characteristics you asked about. Does globus record those if it's that important to globus?

Joe Bester - 2012-11-21

I've not been able to reproduce with 10k-job  condor-g driven loads. I'll need more info about how gram's being used in order to try to track this down.

hepmkj - 2013-03-20

Hi Neha,
    Joe asked for more information.  I am digging into the log files of batch system to get the required information.

Thanks,
Manoj

sthapa - 2013-03-28

Here's the latest update from the admin, is this what you were looking for Joe?

Hi Suchandra,
Please find the attached file which tells about  the load (number of jobs) on problematic CE 'rossmann-osg.rcac.purdue.edu' during a year .     This CE is a part of community clusters and max wall time  for a job is 30 days.   Usually, we faced the memory leak problem when number of jobs is more than 6K.   Following are information that is available to us

""""
Rate of job submissions:  Difficult to say.  From the attached file 'rossmannCE.png', on an average 5K jobs are in running state.
Length of jobs:  Attached file 'report.txt' shows information about all user jobs during this year.  Length of jobs varies from couple of minutes (~7) to several days (13).  For a given user,  walltime represents the average of  walltime from all his/her jobs.
Number of jobs in queue at any given time: Don't know.
Rate of polling:  We are using the default value as comes with globus.
LRM:  PBS (pbs-config --version 4.1.4)
Seg configuration:
"""
[jha2@rossmann-osg 25th]$ cat /etc/sysconfig/globus-scheduler-event-generator
GLOBUS_SEG_PIDFMT="${localstatedir}/run/globus-scheduler-event-generator-%s.pid"
GLOBUS_SEG_LOGFMT="${localstatedir}/lib/globus/globus-seg-%s"
GLOBUS_SEG_LRM_DIR="${sysconfdir}/globus/scheduler-event-generator"
# GLOBUS_SEG_NICE_LEVEL=0
[jha2@rossmann-osg 25th]$

"""

Site people are in CC.  If you need more information, then let us know.

Thanks,
Manoj

by /DC=org/DC=doegrids/OU=People/CN=Manoj Kumar Jha 945408

Joe Bester - 2013-05-16

Here's a patch that catches some memory leaks. I am still testing this one, but it seems to help quite a bit.

Globus Toolkit/GT-312

Summary

automate native simple_ca package more

Details

Type: Improvement

Status: Resolved 2012-11-12

Description

When the globus-simple-ca native package is installed, it should configure itself if there's no simple CA configured yet. It should also create a host cert if none is present so that things can work without manual configuration.

Comments

Globus Toolkit/GT-313

Summary

globus-url-copy manpage not in globus-gass-copy-progs

Details

Type: Bug

Status: Open

Description

The manpage for globus-url-copy is in the globus-gass-copy-docs package, but should probably be in the progs package, so that they will be installed together.

Comments

Globus Toolkit/GT-314

Summary

hybrid mode front end crashes when attempting to stripe and no data nodes are available.

Details

Type: Bug

Status: Resolved 2012-11-12

Description

When a front end server configured with -hybrid attempts a striping operation, and no data nodes are available, it will crash and the client is left without a clear error message.  The server should respond with a clear error as is done when hybrid mode is not enabled.

Comments

Mike Link - 2012-11-12

Fixed for 5.2.3.

Globus Toolkit/GT-315

Summary

Windows GridFTP server does not list filenames with extended characters

Details

Type: Bug

Status: Resolved 2012-11-19

Description

Windows has ASCII and Unicode versions of its directory listing functions.  The ASCII functions are being used and do not return usable entries for files with extended characters.  The end result is that these files get skipped in listings.

Comments

Globus Toolkit/GT-316

Summary

Reverse DNS failure on control channel connection results in 0.0.0.0 reported in logs.

Details

Type: Bug

Status: Resolved 2012-11-12

Description

[5034] Thu Oct 25 17:41:21 2012 :: Couldn't get remote contact. Possibly using a non-tcp protocol.
[5034] Thu Oct 25 17:41:21 2012 :: New connection from: 0.0.0.0

Should clarify the error and report the actual IP address.

Comments

Mike Link - 2012-11-12

Fixed for 5.2.3.  The cause of the error is not directly a reverse DNS failure, but in any case the logs will now report a valid IP address.

Globus Toolkit/GT-317

Summary

GRAM "current_jobs" usage stats are incorrect

Details

Type: Bug

Status: Open

Description

Looks like some counting is off in GRAM, as I see a lot of data in the usage stats tables with negative current_jobs:

select count(distinct job_manager_instance_id) from gram5_job_manager_status
 where current_jobs < 0 and date(status_time) = '2012-11-07';
 count
-------
   283
(1 row)

Comments

Globus Toolkit/GT-318

Summary

globus-url-copy (guc) and "-do" option inconsistency

Details

Type: Bug

Status: Resolved 2014-01-22

Description

While testing a new version of [tgftp] I discovered an inconsistency in globus-url-copy (guc) v8.6 (from the GT5.2.2 DEB packages from [1]) when using the "-do" option. When "transferring" only one file and requesting a dump, the dump does not contain the additional information a normal dump contains, such as offset, size, modify timestamp and permissions. This additional (and very useful) information is only included when "transferring" at least two files. Please check the following printouts for details:

"
$ guc -do - gsiftp://gridftp.omicron.jupiter/~/my_source_dir/1?_128MB gsiftp://gridftp.omicron.neptune/~/my_destination_dir/
"gsiftp://gridftp.omicron.jupiter/~/my_source_dir/10_128MB" "gsiftp://gridftp.omicron.neptune/~/my_destination_dir/10_128MB" 0,-1 size=134217728;modify=1350743120;mode=0644;
"gsiftp://gridftp.omicron.jupiter/~/my_source_dir/11_128MB" "gsiftp://gridftp.omicron.neptune/~/my_destination_dir/11_128MB" 0,-1 size=134217728;modify=1351177826;mode=0644;

$ guc -do - gsiftp://gridftp.omicron.jupiter/~/my_source_dir/10_128MB gsiftp://gridftp.omicron.neptune/~/my_destination_dir/
"gsiftp://gridftp.omicron.jupiter/~/my_source_dir/10_128MB" "gsiftp://gridftp.omicron.neptune/~/my_destination_dir/10_128MB"
"

Comments

frank.scheiner - 2012-11-15

Hi all,

just added the remainder of the original IGE ticket:

"
[...]
guc is not a wrapper here, but just a link to globus-url-copy to save me some keystrokes.

I use this functionality to get the size of a transfer in order to calculate the effective transfer rate. And just for consistency I think "-do" should behave the same for one or n files. Could you please fix this? Thanks in advance.

Best regards
Frank Scheiner
_______
[tgftp] 
[1] 
"

Best regards
Frank Scheiner

frank.scheiner - 2012-12-09

Dear all,

I recently also used transfer lists (with "guc -f ") and there's an issue (when using "-do") similar to the already described one in this ticket/issue.

Example:

"
$ cat transferList
"gsiftp://gridftp.omicron.jupiter:2811/~/my_source_dir/10_128MB" "gsiftp://gridftp.omicron.neptune:2811/~/my_destination_dir/10_128MB" 0,-1 size=134217728;modify=1350743120;mode=0644;
"gsiftp://gridftp.omicron.jupiter:2811/~/my_source_dir/11_128MB" "gsiftp://gridftp.omicron.neptune:2811/~/my_destination_dir/11_128MB" 0,-1 size=134217728;modify=1351177826;mode=0644;
"gsiftp://gridftp.omicron.jupiter:2811/~/my_source_dir/12_128MB" "gsiftp://gridftp.omicron.neptune:2811/~/my_destination_dir/12_128MB" 0,-1 size=134217728;modify=1351840800;mode=0644;

$  guc -do - -f transferList
"gsiftp://gridftp.omicron.jupiter:2811/~/my_source_dir/10_128MB" "gsiftp://gridftp.omicron.neptune:2811/~/my_destination_dir/10_128MB" 0
"gsiftp://gridftp.omicron.jupiter:2811/~/my_source_dir/11_128MB" "gsiftp://gridftp.omicron.neptune:2811/~/my_destination_dir/11_128MB" 0
"gsiftp://gridftp.omicron.jupiter:2811/~/my_source_dir/12_128MB" "gsiftp://gridftp.omicron.neptune:2811/~/my_destination_dir/12_128MB" 0
"

If I omit the additional info after the URLs in the transfer list - which would be the more common case, when users create a transfer list manually - the additional "0" is omitted in the output, as for the single file in the initial description.
Used tools (guc) and servers (ggs) are the same versions as in the initial description.

Best regards
Frank Scheiner
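
The dump and transfer-list lines quoted in this ticket share one shape: two quoted URLs, optionally followed by an offset/length field and a list of key=value attributes separated by semicolons. The following is a minimal Python sketch, assuming that shape (it is inferred only from the printouts above, not from the guc documentation), of how a script could sum the size fields to compute a transfer volume. The function names are hypothetical. It returns None when a line lacks the size attribute, which is exactly the single-file case reported here.

```python
import re

# Pattern inferred from the printouts in this ticket only (an assumption):
#   "src" "dst" [offset,length [key=value;key=value;...]]
LINE_RE = re.compile(r'^"([^"]*)"\s+"([^"]*)"(?:\s+(\S+))?(?:\s+(\S+))?\s*$')


def parse_dump_line(line):
    """Parse one dump line into a dict; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    src, dst, byte_range, attrs = match.groups()
    entry = {"src": src, "dst": dst, "range": byte_range, "attrs": {}}
    if attrs:
        for item in attrs.split(";"):
            if "=" in item:
                key, value = item.split("=", 1)
                entry["attrs"][key] = value
    return entry


def total_size(lines):
    """Sum the size= attribute over all lines; None if any line lacks it."""
    total = 0
    for line in lines:
        entry = parse_dump_line(line)
        if entry is None or "size" not in entry["attrs"]:
            return None
        total += int(entry["attrs"]["size"])
    return total
```

A caller would divide the returned byte count by the measured wall-clock time to get an effective transfer rate.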

Mike Link - 2014-01-22

Fixed after 5.2.5.

Globus Toolkit/GT-319

Summary

Fix usage uploader errors

Details

Type: Task

Status: Resolved 2012-11-26

Description

There have been some errors in the uploader logs lately where a database assertion is not being met by some values. This is causing the uploader to abort the upload for a particular hour's data.

Comments

Joe Bester - 2012-11-14

There was a bug in the gram5 packet parser where it used the packet send time instead of the packet's GLOBUS_L_GRAM_USAGE_STATUS__TIME value to determine when the packet data was created.

Joe Bester - 2012-11-14

something more is going on

Joe Bester - 2012-11-21

I've implemented some rollback points in the uploader so that if it hits an error it can pull out the bad packet and commit the rest. It also reports the sender and type of the packets it couldn't install. I'm using the new uploader manually now to push the old packets into the db and then will insert it into the crontab to automate.
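
The rollback-point idea described above can be sketched as follows. This is a hedged illustration only: the real uploader's database, schema, and code are not shown in this ticket, so the sketch uses Python's sqlite3 module with a hypothetical usage table and a CHECK constraint standing in for the database assertion. Each packet is wrapped in a savepoint so that a failing insert is undone on its own and the remaining packets are still committed.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical usage table."""
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE usage (sender TEXT, kind TEXT, "
        "job_count INTEGER CHECK (job_count >= 0))"
    )
    return conn


def upload_packets(conn, packets):
    """Insert packets in one transaction, skipping constraint violations.

    Returns the rejected packets so the caller can report sender and type.
    """
    rejected = []
    cur = conn.cursor()
    cur.execute("BEGIN")
    for packet in packets:
        cur.execute("SAVEPOINT packet")
        try:
            cur.execute(
                "INSERT INTO usage (sender, kind, job_count) VALUES (?, ?, ?)",
                (packet["sender"], packet["kind"], packet["job_count"]),
            )
        except sqlite3.IntegrityError:
            # Undo only this packet; earlier inserts stay in the transaction.
            cur.execute("ROLLBACK TO SAVEPOINT packet")
            rejected.append(packet)
        cur.execute("RELEASE SAVEPOINT packet")
    cur.execute("COMMIT")
    return rejected
```

The same SAVEPOINT, ROLLBACK TO SAVEPOINT, and RELEASE SAVEPOINT statements exist in PostgreSQL, where a failed statement otherwise aborts the whole transaction.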

Joe Bester - 2012-11-26

The new uploader is in place and committed to CVS trunk.

Globus Toolkit/GT-320

Summary

GRAM job submission support with SLURM scheduler on TACC Stampede

Details

Type: New Feature

Status: Resolved 2013-04-22

Description

Currently the Southern California Earthquake Center is using GRAM job submission to submit our scientific workflows for execution on TACC Ranger.  This has been extremely successful for us and has enabled us to execute hundreds of millions of tasks on Ranger during its lifespan.

In January 2013 a new TACC cluster and XSEDE resource, Stampede, is coming online, and in February 2013 Ranger will be decommissioned.  We would like to move our workflows from Ranger to Stampede.  Since Stampede will be using the SLURM scheduler, I suspect a GRAM/SLURM interface may need to be written, and I would like to request this so we can port our workflows to Stampede.  I understand that both Globus and TACC have many other priorities and we certainly don't need this in place when the system initially comes online, but it would be nice from our perspective if GRAM job submission was in place by Spring 2013.

I'm happy to help debug as a friendly user.  Please contact me if you have questions or if I can be of assistance.

Comments

Stuart Martin - 2013-04-22

GRAM has been installed and working on Stampede.  There have been some issues reported, but those will be handled separately.

Globus Toolkit/GT-321

Summary

Globus Toolkit installs some parts in $PREFIX directory while compiling

Details

Type: Improvement

Status: Open

Description

Globus Toolkit installs some parts into the $PREFIX directory while compiling. Is it possible to perform these installations during make install rather than during make all?

Comments

Joe Bester - 2012-11-19

Right now that is not possible, as the make steps require libraries and headers from the dependent components. It's a little buggy, but since most people use the RPM or Debian packages, it's a low priority. What platform are you using---is it one we support with the RPM or Debian packages?

bbiegun - 2012-11-19

I am using Gentoo Linux and build the whole package from source. I am currently writing ebuild files, the Gentoo equivalent of spec files.

Globus Toolkit/GT-322

Summary

Globus Toolkit doesn’t compile with multithread make

Details

Type: Bug

Status: Open

Description

Globus Toolkit doesn't compile with parallel make.
make -jN builds the packages in the wrong order. Compilation must be performed with a single job (-j1).

Comments

Joe Bester - 2012-11-19

This is noted in the install documentation on the web site.

bbiegun - 2012-11-19

Is it possible to make this build work in parallel?

Globus Toolkit/GT-323

Summary

memory leak in globus-gridftp-server

Details

Type: Bug

Status: Resolved 2013-01-14

Description

A rename command issued from a client causes the gridftp server to leak memory.  Over time, the server process consumes all machine memory.

The following bash command illustrates this.

#
# rename /tmp/foo -> /tmp/bar and then undo; repeat.
#
touch /tmp/foo; rm -f /tmp/bar; i=0; while [ 1 ]; do /usr/bin/uberftp -rename gsiftp://slave1.localdomain:15000/tmp/foo  /tmp/bar || break; /usr/bin/uberftp -rename gsiftp://slave1.localdomain:15000/tmp/bar  /tmp/foo || break; echo $(( i=i+1 )); done

To see the memory footprint, the following command was executed.  Attached is the output of 300s.

#
# run ps with some args to watch gridftp-server process, sleep 1s; repeat.
#
while [ 1 ]; do ps -eo pid,user,%mem,rss,vsz,%cpu,command | grep gridftp | grep -v grep >> gridftp.log; sleep 1; done

Also attached is the configuration file that was used by gridftp-server process.
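
To summarize a log produced by the ps loop above, a small script can compare the first and last resident set size recorded for each process. This is a minimal Python sketch, assuming the column order pid,user,%mem,rss,vsz,%cpu,command from that command and that rss is reported in KiB (as Linux ps does). The function name is hypothetical.

```python
def rss_growth(log_lines):
    """Return {pid: (first_rss_kib, last_rss_kib)} from ps output lines.

    Expects the column order pid,user,%mem,rss,vsz,%cpu,command.
    """
    seen = {}
    for line in log_lines:
        fields = line.split(None, 6)
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        pid, rss = fields[0], fields[3]
        if not rss.isdigit():
            continue  # skip anything that is not a numeric RSS column
        rss_kib = int(rss)
        first, _ = seen.get(pid, (rss_kib, rss_kib))
        seen[pid] = (first, rss_kib)
    return seen
```

Calling it as rss_growth(open("gridftp.log")) gives one pair per server process; a last value that keeps rising across runs is consistent with the leak described here.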

Comments

jeffk - 2012-11-20

The above report is from an old server version.  I tested this problem against the gridftp server from the Debian Squeeze GT 5.2.2 packages provided by Globus.  I see the same behavior, namely memory use increasing without bound.

#
# The server version report:
#
$ /usr/sbin/globus-gridftp-server -v
globus_gridftp_server: 6.15 (1348095937-83)

Thanks,
--Jeff

jeffk - 2012-11-20

One more note: when a client caches its connection to the server, it is able to perform many different actions without causing the memory problem server-side.  Memory use appears to be a function of the # of connections by a client and not actions taken by the client.  (This was observed using a small client written against the C API.)

Mike Link - 2013-01-14

Fixed for 5.2.4

Globus Toolkit/GT-324

Summary

Behaviour of globus-job-status

Details

Type: Bug

Status: Open

Description

I have installed the globus-gram-client-tools from the official Globus Debian
Repository. When I submit a job to a GRAM server using globus-job-submit I get
back a URL that I want to use to watch the status of the job. I noticed, that I
can change parts of that URL and globus-job-status will always return the status
"DONE". That can't be right - the logfile tells me something like that:
"status=-156 uri= msg="Unable to find job for URI" reason="the job contact string does not match any which the job manager is handling""

Will this be fixed anytime soon?

Comments

Joe Bester - 2012-11-21

From GRAM's point of view, unless you use a two-phase commit, the job manager will not remember the state of old jobs. So anything that doesn't match an actively running job will appear done.

 We've got some ideas about changes to GRAM that would make this go away as well as fix other problems, but they've not risen to the top priority yet.

Globus Toolkit/GT-325

Summary

Create 5.2.3 release

Details

Type: Task

Status: Resolved 2012-12-03

Description

Create and publish 5.2.3.

Comments

Joe Bester - 2012-11-27

The build is complete, and should be ready to get to the ftp site tomorrow morning.

Joe Bester - 2012-12-03

Announced this morning.

Globus Toolkit/GT-326

Summary

Merge etc/package-list and etc/package-list-5.1.0

Details

Type: Task

Status: Open

Description

In the packaging/etc directory, there are two files which map package names to source directories, with slightly different formats, used by different tools in the build process. Those should be merged and the tools updated to use the new file. Otherwise, each release we need to modify both files for things to work.

Comments

Globus Toolkit/GT-327

Summary

Security doc needs update for GT5

Details

Type: Improvement

Status: Open

Description

This page has a broken latest-stable link  http://www.globus.org/security/

It should probably point to this page http://www.globus.org/toolkit/security/ but it needs some updates:
  - a GT5 section is needed that maybe should grab sections from GT2?  http://www.globus.org/toolkit/docs/2.4/gsi/
  - The GT4 section should be moved to "older releases"
  - "Other security project pages" should have a link to GSI?
  - The SweGrid note should be removed.

Maybe we should move all "older releases" doc so it does not show up on these main pages.  Make it so you have to go to "older releases" to see doc on them.  Make it very clear that there is the current stuff and then the older stuff.  Time to relegate GT4 and older so it is generally not seen.

Comments

Globus Toolkit/GT-328

Summary

Remove obsolete links and docs from http://www.globus.org/toolkit

Details

Type: Task

Status: Open

Description

In making the 5.2.3 release, I saw a bunch of badly outdated information on the toplevel toolkit web pages: http://www.globus.org/toolkit
Things like links to nmi-build system which is no longer running, mentions of bugzilla and dev.globus.org, doc about 4.0 services, and links to latest-stable documentation that don't match the web layout for the 5.2 doc.

Comments

Globus Toolkit/GT-329

Summary

globus-version has rc number in debian packages

Details

Type: Bug

Status: Resolved 2012-12-05

Description

I forgot to update debian/rules to include the correct GT version, so globus-version will report 5.2.3rc0 instead of 5.2.3 on debian systems with the latest packages.

Comments

Joe Bester - 2012-12-05

Committed a fix to CVS and have added update packages for debian with the correct output.

Globus Toolkit/GT-330

Summary

GridFTP server does not UTF8 encode filenames

Details

Type: Bug

Status: Resolved 2012-12-21

Description

GridFTP server does not UTF8 encode filenames with special characters as required by RFC 2640.  This is mostly an issue on Windows, but can also be a problem in Linux if the locale isn't compatible with UTF8.

Comments

Globus Toolkit/GT-331

Summary

Debug xsede myproxy limited proxy issues

Details

Type: Bug

Status: Resolved 2012-12-04

Description

xsede myproxy limited proxies are failing on older openssl versions.

Comments

Mike Link - 2012-12-04

The problem was on the myproxy end -- the subject name in the proxy is not correctly encoded.

Globus Toolkit/GT-332

Summary

Prepare GC Windows for handoff

Details

Type: Task

Status: Resolved 2012-12-21

Description

Clean up code, update doc, update GridFTP binaries to latest code.

Comments

Globus Toolkit/GT-333

Summary

Add usage aggregation tables for gridftp

Details

Type: New Feature

Status: Resolved 2012-12-14

Description

Raj and IGE have some summary queries that they do regularly on the month scope, which take a very long time with the current usage stats tables. We can probably help these out by adding some summary tables that contain some aggregated information instead of a row for each transfer. For this feature, define new tables as necessary to do the queries and modify the uploader to insert things into the aggregator tables, and distribute some new queries that will produce the same reports as the earlier queries, but more efficiently.

Comments

Joe Bester - 2012-12-14

I've added a few new tables to aggregate the information from the gftp_transfers and gram5_jobs tables. These are: gftp_aggregations_hourly,
gftp_aggregations_daily, gram5_aggregations_hourly, and gram5_aggregations_daily.  The hourly ones are updated during the normal data upload as a summary of the packets contained in a particular globus-usage-collector data file.  The daily ones are generated during a cronjob that independently aggregates the last day's data from the hourly data. I'll attach some sample queries which use the new tables. The aggregation tables are being populated as of 2012-12-14, but I've started an aggregation query to get November and the earlier parts of December into the hourly aggregation tables. It will probably run over the weekend. We can probably aggregate about a month's worth in a day if we want to aggregate back to the last major set of queries.
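
The hourly-to-daily rollup described above can be sketched as follows. This is a hedged illustration only: the columns of the real aggregation tables are not given in this ticket, so the sketch uses Python's sqlite3 module with hypothetical demo_hourly and demo_daily tables. The daily row is deleted and rebuilt inside one transaction, so rerunning the job for the same day does not double-count.

```python
import sqlite3


def create_demo_tables(conn):
    """Create hypothetical stand-ins for the hourly and daily tables."""
    conn.execute(
        "CREATE TABLE demo_hourly "
        "(hour_start TEXT, transfers INTEGER, byte_count INTEGER)"
    )
    conn.execute(
        "CREATE TABLE demo_daily "
        "(day TEXT, transfers INTEGER, byte_count INTEGER)"
    )


def aggregate_daily(conn, day):
    """Rebuild the daily row for `day` (YYYY-MM-DD) from the hourly rows."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM demo_daily WHERE day = ?", (day,))
        conn.execute(
            "INSERT INTO demo_daily (day, transfers, byte_count) "
            "SELECT date(hour_start), SUM(transfers), SUM(byte_count) "
            "FROM demo_hourly WHERE date(hour_start) = ? "
            "GROUP BY date(hour_start)",
            (day,),
        )
```

Monthly reports can then read a few dozen daily rows instead of scanning one row per transfer.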

Joe Bester - 2012-12-14

Some example gridftp and gram5 usage queries that use the new aggregation tables.

Globus Toolkit/GT-334

Summary

segfault using ftp control lib

Details

Type: Bug

Status: Resolved 2013-01-14

Description

GO had a crasher with 5.0.5, here's trace: (https://globus.atlassian.net/browse/KOA-2351 has details)

#0 0x00007feaa7c4c7e7 in gss_unwrap ()
   from /usr/local/gogt/lib/libglobus_gssapi_gsi_gcc64dbg.so.0
#1 0x00007feaa888bbf8 in ?? ()
   from /usr/local/gogt/lib/libglobus_ftp_control_gcc64dbg.so.0
#2 0x00007feaa888b1e9 in ?? ()
   from /usr/local/gogt/lib/libglobus_ftp_control_gcc64dbg.so.0
#3 0x00007feaa6c0eaef in ?? ()
   from /usr/local/gogt/lib/libglobus_io_gcc64dbg.so.0
#4 0x00007feaa6173fc7 in globus_l_xio_read_write_callback_kickout ()
   from /usr/local/gogt/lib/libglobus_xio_gcc64dbg.so.0
#5 0x00007feaa6173e42 in globus_i_xio_read_write_callback ()
   from /usr/local/gogt/lib/libglobus_xio_gcc64dbg.so.0
#6 0x00007feaa617f426 in globus_l_xio_driver_op_read_kickout ()
   from /usr/local/gogt/lib/libglobus_xio_gcc64dbg.so.0
#7 0x00007feaa61942a2 in globus_xio_driver_finished_read ()
   from /usr/local/gogt/lib/libglobus_xio_gcc64dbg.so.0
#8 0x00007feaa61c2c15 in ?? ()
   from /usr/local/gogt/lib/libglobus_xio_gcc64dbg.so.0
#9 0x00007feaa61c2cfe in ?? ()
   from /usr/local/gogt/lib/libglobus_xio_gcc64dbg.so.0
#10 0x00007feaa6198843 in ?? ()
   from /usr/local/gogt/lib/libglobus_xio_gcc64dbg.so.0
#11 0x00007feaa71c4430 in globus_callback_space_poll ()
   from /usr/local/gogt/lib/libglobus_common_gcc64dbg.so.0
#12 0x00000000004079dd in WaitLock::wl_wait_until_done (this=0x7fff97694dd0)
    at conn/../lib/waitlock.h:37
#13 0x0000000000407453 in conn::conn_run () at conn/connect.cpp:483
#14 0x0000000000412d88 in main (argc=11, argv=0x7fff97694f18)
    at fxp/fxp.cpp:494

Comments

Mike Link - 2013-01-14

Fixed for 5.2.4

Globus Toolkit/GT-335

Summary

Root directory still visible for listing even if restrict_paths prohibits it

Details

Type: Bug

Status: Resolved 2013-01-29

Description

If the globus-gridftp-server configuration file contains a restrict_paths option that does not include the root directory, it still can be displayed even though you will be unable to transfer files to/from it.  For example, if the configuration file has a line similar to this:

restrict_paths rw/glade

You can still display the contents of the root directory with "globus-url-copy -list gsiftp://server/" or another client like uberftp with the dir command.

Comments

Mike Link - 2012-12-06

You will always be able to list /, but only allowed paths or their parents should be shown.  If that isn't the case, can you show a specific example?
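
The rule stated above (show only allowed paths or their parents) can be sketched as a filter over directory entries. This is a minimal Python sketch, not the server's implementation: it ignores the rw/ro prefixes and wildcard support of restrict_paths and treats each allowed root as a literal path. The function name is hypothetical.

```python
import posixpath


def visible_entries(directory, names, allowed_roots):
    """Return the names in `directory` that should appear in a listing.

    An entry is visible if it is an allowed root, lies below one,
    or is a parent directory of one.
    """
    visible = []
    for name in names:
        path = posixpath.normpath(posixpath.join(directory, name))
        for root in allowed_roots:
            root = posixpath.normpath(root)
            inside = path == root or path.startswith(root.rstrip("/") + "/")
            parent = root.startswith(path.rstrip("/") + "/")
            if inside or parent:
                visible.append(name)
                break
    return visible
```

With allowed_roots set to ["/glade"], a listing of / would be reduced to the glade entry.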

cruff@ucar.edu - 2012-12-06

For this example, the following are set in the server configuration file:

use_home_dirs 1
restrict_paths rw/glade

Since / is explicitly not permitted by the restrict_paths setting, the following is not intuitive behavior, and could potentially leak sensitive information.

UberFTP> open -P 388 gridftp01.ucar.edu
220-NCAR GLADE GridFTP Service
...
230 End.
UberFTP> cd /
UberFTP> dir
drwxr-xr-x  20     root     root         4096 Apr 19 09:55 .
drwxr-xr-x  16     root     root         4096 Apr 19 09:55 ..
-rw-r--r--   1     root     root            0 Jan  5 09:58 HEREIAM
drwxr-xr-x   2     root     root         4096 Apr 19 09:56 bin
drwxr-xr-x   6     root     root         1024 Nov 28 10:40 bluefire
drwxr-xr-x  19     root     root        65536 May  1 11:11 dasg_proj2
drwxr-xr-x   2     root     root         4096 Nov 12 09:30 dev
drwxr-xr-x   6     root     root         4096 Dec  5 15:49 etc
drwxr-xr-x   3     root     root         4096 Jan  4 14:33 fs
drwxr-xr-x  14     root     root         1024 Nov 28 10:02 glade
drwxr-xr-x   2     root     root         4096 Dec 14 09:28 gpfs
drwxr-xr-x   3     root     root        53248 Sep 28 10:32 home
drwxr-xr-x   3     root     root         4096 Dec  5 15:07 lib64
drwxr-xr-x   2     root     root         4096 Jan 12 08:55 proc
drwxr-xr-x 2771     root     root       262144 Nov 19 12:46 ptmp
drwxr-xr-x   2     root     root         4096 Apr 19 09:58 sbin
drwxr-xr-x   2     root     root         4096 Dec  5 15:11 tmp
drwxr-xr-x   6     root     root         4096 Dec  5 15:04 usr
drwxr-xr-x   5     root     root         4096 Sep 21 13:03 var
UberFTP> get HEREIAM
HEREIAM: 500 Command failed : Path not allowed.
UberFTP> cd proc
proc: 500 Command failed : Path not allowed.
UberFTP> cd dev
dev: 500 Command failed : Path not allowed.
UberFTP> dir home/skel
home/skel: 500 Command failed : Path not allowed.
UberFTP> cd glade/u/home/cruff
UberFTP> dir
drwxr-xr-x   3    cruff     univ        16384 Dec  5 14:02 .
drwxr-xr-x 2057     root     root       131072 Dec  5 09:37 ..
-rw-r--r--   1    cruff     ncar      3198499 Dec  4 11:19 gridftp-parallel-test
UberFTP> quit
221 Goodbye.

Mike Link - 2012-12-06

OK, that is a bug -- you should only see /glade in that list.

I'm not able to reproduce that behavior though.  Is there anything else interesting with your config?

cruff@ucar.edu - 2012-12-06

Figured it out.

The problem is that the documentation does not really make clear the division of labor when using front end and data nodes.  The problem was caused by the fact that the configuration file for the data nodes did not include the restrict_paths setting as used on the control node.  When I added the appropriate restrict_paths to the data node configuration, it behaves as you say it should.  It also wasn't obvious that the directory listing processing was getting pushed to a data node, instead of being performed by the front end.  I suspect a documentation update would be appropriate.

Mike Link - 2012-12-06

Ah.  Thanks for figuring it out.  I'll see what I can do to make it clear, or possibly have the data node inherit the settings from the front end.

Mike Link - 2013-01-29

Updated documentation to specify that restrict paths should be set on both the front end and data node.

Globus Toolkit/GT-336

Summary

Problem with globbing characters in SITE CHROOT path names when SITE RESTRICT is used

Details

Type: Bug

Status: Resolved 2013-01-15

Description

The sequence:

SITE CHROOT /globchar/
SITE RESTRICT rw/

results in a globchar in the rp list without indication that it is meant to be an explicit pathname.

Comments

Mike Link - 2013-01-14

Fixed.

Karl Pickett - 2013-01-15

can you clarify this?  your im told me it still assumes a glob unless \ escaped.

Mike Link - 2013-01-15

characters in CHROOT are always literal, no escaping or encoding allowed.

since RESTRICT supports wildcards, you need to escape any that are meant to be literal.
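
A client that wants a literal path in RESTRICT could escape it before sending the command. This is a minimal Python sketch under stated assumptions: the ticket confirms only that wildcards must be backslash-escaped, so the set of characters treated as wildcards here (*, ?, [, ] and the backslash itself) is a guess, and the helper name is hypothetical.

```python
import re

# Assumed wildcard set; the ticket does not list which characters
# RESTRICT treats as wildcards.
_WILDCARDS = re.compile(r"([*?\[\]\\])")


def escape_restrict_path(path):
    """Backslash-escape assumed wildcard characters in a literal path."""
    return _WILDCARDS.sub(r"\\\1", path)
```

A path passed to CHROOT would be sent unchanged, since characters there are always literal.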

Mike Link - 2013-01-15

Although the RESTRICT wildcard support is only there because it mimics the -rp config semantics.   so if wildcard stuff isn't worth the extra complication I could always treat RESTRICT paths as literals.

Globus Toolkit/GT-337

Summary

Add GridFTP protocol support for UDT+STUN

Details

Type: New Feature

Status: Resolved 2013-06-04

Description

Add the ftp commands UPSV and UPRT to pass a candidate list between source and destination of a UDT+STUN connection.

UPSV returns the candidate list from the server, UPRT accepts the remote candidate list from the client.

Comments

Globus Toolkit/GT-338

Summary

Investigate pre-authorization process hangs

Details

Type: Bug

Status: Resolved 2013-01-14

Description

Pre-authorization server process hangs were recently reported.

Comments

Mike Link - 2013-01-14

Could not reproduce a pre-auth hang.  Fixed a consistently reproduced striped hang.  Possibly fixed a not-easy-to-reproduce nonstriped hang.

Globus Toolkit/GT-339

Summary

Add STUN support to XIO UDT driver

Details

Type: New Feature

Status: Resolved 2013-06-04

Description

STUN support needs to be added to the XIO udt driver, with interfaces to enable the gridftp server to set and get candidate lists.

Comments

Mike Link - 2013-01-14

Got test app from Bryce and spent some time looking it over and started to integrate the functionality into the xio udt driver.

Globus Toolkit/GT-340

Summary

Add ability to relocate toolkit rpms

Details

Type: New Feature

Status: Open

Description

On production systems, it is often very important to be able to switch
quickly between multiple versions of software, particularly when testing
a new version of software for future deployment.  When doing this with
RPMs, it is helpful to be able to relocate where the RPMs are installed:

rpm -i --relocate /usr=/opt/pkg/$VERSION 

For instance, NICS does this on Nautilus for SGI's MPT software, which
by default installs in /opt.  On Nautilus, /opt is an NFS filesystem
with root-squash enabled, so these RPMs are installed into an alternate
location and then copied manually into /opt on the NFS server.

Comments

Globus Toolkit/GT-341

Summary

GPT metadata problem in new myproxy callout package

Details

Type: Bug

Status: Resolved 2013-06-26

Description

The package builds a library but lists its dependencies as pgm_link rather than lib_link. It also has a full stop at the end of its description string. Patch attached.

Comments

Joe Bester - 2013-03-22

Thanks. The fix is committed.

Globus Toolkit/GT-342

Summary

Warning from GPT with newer perl version

Details

Type: Bug

Status: Resolved 2013-06-20

Description

New perl version shows warnings when using GPT. Patch attached.

Comments

Joe Bester - 2013-06-20

This is fixed in CVS and will be available in 5.2.5

Globus Toolkit/GT-343

Summary

Add additional latex build requirements for newer fedora releases

Details

Type: Bug

Status: Resolved 2013-06-20

Description

Fedora 18 introduced TexLive 2012, and made the tex packaging more modular at the same time. This means that additional build requirements are needed to build the documentation. The globus-spec-creator script should be changed to reflect this. Patch attached.

Comments

Mattias Ellert - 2013-05-30

This patch in addition updates the spec creator script by removing some clutter no longer needed after RHEL4 is EOL.

Joe Bester - 2013-06-20

This is fixed in CVS and will be in 5.2.5

Globus Toolkit/GT-344

Summary

Cut and paste error in gpt metadata for GRAM LSF module

Details

Type: Bug

Status: Resolved 2013-11-07

Description

The LSF support package was created by modifying the SGE support package, but the package description was not changed in the gpt metadata file. Patch attached.

Comments

Joe Bester - 2013-11-07

This is fixed in 5.2.5

Globus Toolkit/GT-345

Summary

Portability problem in PBS GRAM support module

Details

Type: Bug

Status: Open

Description

The globus-gram-job-manager-pbs package fails to build on debian on a kfreebsd kernel because it uses the ENOSR error code which is not defined on this platform. Patch attached.

Comments

Globus Toolkit/GT-346

Summary

Improve GridFTP testing coverage by adding new tests that use FXP client

Details

Type: Improvement

Status: Open

Description

FXP would be useful for adding tests for:
        - DCSC
        - Pipelining
        - Sharing
        - no delegation
        - http (future)

Comments

Globus Toolkit/GT-347

Summary

Fix XSEDE GridFTP user guide to clarify logging section

Details

Type: Task

Status: Open

Description

From a reviewer:

In chapter 7 (Debugging), the explanation of log formats is inconsistent.
For example:

As of Globus 4.2.0, GridFTP server provides system administration logs in 2 different formats. The
CEDPS best practices compliant format is a new format provided by GridFTP server available in Globus 5.2.2.
In section 1.1 there is a link to CEDPS which points to http://cedps.net/index.php/LoggingBestPractices; this doesn't appear to be a relevant link and should be removed. An explanation of what CEDPS logging is could be added in its place. Also, in the same section, the links to sample logs return a 404. Perhaps explain why a user would choose one log format over another, or at least explain the differences between them.

Comments

Rachana Ananthakrishnan - 2013-01-09

http://globus.org/toolkit/docs/latest-stable/gridftp/admin/#gridftp-admin-debugging

Globus Toolkit/GT-348

Summary

gridftp log file entries are incorrect for striped transfers

Details

Type: Bug

Status: Resolved 2013-01-29

Description

Mike Link confirmed the below is a bug...

I am running gridftp 5.2.2 on Kraken on a test port and I see this in the logs for a transfer submitted by user zzzzz.

DATE=20130108205145.813704 HOST=gridftp5.nics.utk.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20130108205145.751106 USER=:globus-mapping: FILE=/xxxx/y/home/zzzzz/log_test/onemegfile BUFFER=0 BLOCK=262144 NBYTES=1048576 VOLUME=/ STREAMS=4 STRIPES=2 DEST=[192.249.6.25,192.249.6.25] TYPE=RETR CODE=226

Why does it have "USER=:globus-mapping:" for the username?   I am wondering why it doesn't just have sudarshan as the user.  I wonder does it have something to do with striping?

-Victor
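
Entries like the one above can be checked mechanically. This is a minimal Python sketch, assuming the log line is a sequence of whitespace-separated KEY=VALUE tokens as in the sample (a FILE path containing spaces would not parse correctly). The function names are hypothetical. It splits a line into fields and flags entries whose USER field holds the :globus-mapping: placeholder instead of a username.

```python
def parse_transfer_log(line):
    """Split a transfer log line into a dict of KEY -> VALUE strings."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without an '=' sign
            fields[key] = value
    return fields


def has_placeholder_user(line):
    """Return True if USER holds the placeholder seen in this ticket."""
    return parse_transfer_log(line).get("USER") == ":globus-mapping:"
```

Running has_placeholder_user over a whole log file shows how many transfers were recorded without a real username.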

Comments

Mike Link - 2013-01-29

Fixed for 5.2.4

Globus Toolkit/GT-349

Summary

SITE CHROOT on a disallowed path crashes

Details

Type: Bug

Status: Resolved 2013-01-14

Description

SITE CHROOT on a disallowed path crashes.

Comments

Mike Link - 2013-01-14

Fixed for 5.2.4

Globus Toolkit/GT-350

Summary

Various memory leaks in gridftp server

Details

Type: Bug

Status: Resolved 2013-01-14

Description

There are a few memory leaks in new code, as well as some older edge/error cases.

Comments

Mike Link - 2013-01-14

Fixed all leaks with sharing code.
Fixed a long time leak with mode S pasv and some LOSF cases.
Fixed a long time leak with errors on generic commands with striping.
Fixed a bunch of config initialization leaks.

Globus Toolkit/GT-351

Summary

GridFTP server config line limits

Details

Type: Bug

Status: Resolved 2013-01-29

Description

GridFTP server config lines are limited to 1024 chars.  There is no need for a limit and some ENV var settings can be very long.

Comments

Mike Link - 2013-01-29

Fixed for 5.2.4

Globus Toolkit/GT-352

Summary

*other* components that use the globus-ftp-client and globus-xio *libraries* (e.g. the FTS transfer agents), where the IPv6 option is *not* enabled by default.

Details

Type: Improvement

Status: Open

Description

There are *other* components that use the globus-ftp-client and globus-xio *libraries* (e.g. the FTS transfer agents), where the IPv6 option is *not* enabled by default, and can only be enabled via specific API calls (globus_ftp_client_operationattr_set_allow_ipv6, globus_io_attr_set_tcp_allow_ipv6).
It would be much preferable if IPv6 could be enabled (via an environment variable, or the like) at the Globus library level, rather than having to modify all existing users of the Globus libraries and add a configuration option in each of them.  The history of GGUS #80628 seems to imply that Globus was willing to address this at the library level, but, as of GT5.2.3 (I just checked it as well) nothing has occurred yet there.


Best Regards
Adrian

Comments

Globus Toolkit/GT-353

Summary

Globus-GridFTP-Server 6.14 Memory Explosion

Details

Type: Bug

Status: Resolved 2013-03-22

Description

If you use a dsi module with globus-gridftp-server-6.14 (specifically xrootd-dsi), the globus-gridftp-server uses up all the memory on the machine, drops the request unexpectedly, and then crashes.

An example conf file used was:
daemon 0
log_level ERROR,WARN,INFO,ALL
log_module stdio
debug 1
port 5002
blocksize 1048576
load_dsi_module posix

The client gets the following error:
error: an end-of-file was reached
globus_xio: An end of file occurred

No errors are reported on the server, but it uses up all the memory on the machine and is killed by the OS.
It seems to be stuck in this procedure:

#0  0x0000003d16672208 in _int_malloc () from /lib64/libc.so.6
#1  0x0000003d16673bae in malloc () from /lib64/libc.so.6
#2  0x00000033ae027663 in globus_list_insert ()
   from /usr/lib64/libglobus_common.so.0
#3  0x00000033ae0242fc in globus_hashtable_to_list ()
   from /usr/lib64/libglobus_common.so.0
#4  0x00000033b303e3e3 in globus_l_gfs_auth_session_cb (reply=0x7fffffffe270,
    user_arg=0x66d6a0) at globus_i_gfs_control.c:646
#5  0x00000033b302905c in globus_l_gfs_operation_finished_kickout (
    op=, result=0, finished_info=0x7fffffffe270)
    at globus_i_gfs_data.c:9017
#6  globus_gridftp_server_operation_finished (op=,
    result=0, finished_info=0x7fffffffe270) at globus_i_gfs_data.c:9191
#7  0x00002aaaab2fcabf in globus_l_gfs_posix_start ()
   from /usr/lib64/libglobus_gridftp_server_posix.so
#8  0x00000033b3021d14 in globus_l_gfs_data_brain_ready_delay_cb (
    user_arg=0x650170) at globus_i_gfs_data.c:1633
#9  0x00000033b3022124 in globus_l_gfs_data_auth_init_cb (
    object=, action=,
    user_arg=0x650170, result=0) at globus_i_gfs_data.c:1712
#10 0x00000033b3019641 in globus_l_gfs_acl_kickout (user_arg=0x66d780)
    at globus_i_gfs_acl.c:104
#11 0x00000033ae01850e in globus_callback_space_poll_nothreads ()
   from /usr/lib64/libglobus_common.so.0
#12 0x00000033ae03846f in ?? () from /usr/lib64/libglobus_common.so.0
#13 0x00000000004051d2 in main (argc=4, argv=)
    at globus_gridftp_server.c:1857

Seems to be a problem in globus_i_gfs_control.c with code added between 6.5 and 6.14.
            rc = globus_hashtable_to_list(
                &reply->op_info->custom_command_table, &list);

I can give more information on the xrootd posix DSI module if necessary.  It's possible it doesn't have all the proper methods defined or something, but it is still an unacceptable failure mode for the whole server to eat up all the memory and then die.


Doug Strain
OSG Software

Comments

Mike Link - 2013-01-17

Is it possible that the DSI was built against an older version of GridFTP?  There was an ABI break between 4.x and 6.x, and according to what I see here: http://vdt.cs.wisc.edu/upstream/xrootd-dsi/3.0.4-2/xrootd-dsi/globus_gridftp_server_posix.c, that is a likely explanation for the crash.

Despite the technical incompatibility, we've tried to work around situations where a DSI would be affected by the ABI change, and I may be able to do that here as well, but that won't always be the case.  If possible, the DSI should be rebuilt against a post-6.x version.

Mike Link - 2013-01-17

Actually, it appears the soname bump didn't happen until between 6.6 and 6.7, so you would want to rebuild against 6.7 or later.

dstrain - 2013-01-17

Ah, I see.  That would explain it.  I will try to rebuild xrootd-dsi against the newer version and let you know.

bbockelm - 2013-01-17

Hi Doug,

Look at the build info for xrootd-dsi:

https://koji-hub.batlab.org/koji/rpminfo?rpmID=13184

It's probably an issue with xrootd-dsi: it appears the xrootd-dsi library is not explicitly linked against the gridftp library (likely it means it resolves symbols at runtime as it is a loadable module).  Hence, the wrong version of the gridftp library got loaded (with respect to the version it was compiled against), the wrong data structure used, and garbage got fed to malloc.

Fault is probably at both sides:
* The DSI module design probably ought to use explicit version checking and refuse to load improperly versioned plugins.
* We should get the upstream plugin devs to explicitly link against a particular version so RPM dependency checking picks up this issue.
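
The first point above could be sketched roughly as follows. This is only an illustration: the descriptor layout, the DSI_ABI_VERSION constant, and dsi_check_abi() are hypothetical names, not the actual GridFTP DSI interface.

```c
#include <stddef.h>

/* Hypothetical ABI version the server was compiled with. */
#define DSI_ABI_VERSION 2

/* Hypothetical plugin descriptor. The first field records the ABI
 * version the plugin was built against. */
typedef struct
{
    int         abi_version;
    const char *name;
} dsi_descriptor_t;

/* Return 0 if the plugin may be loaded, -1 if it must be refused. */
int dsi_check_abi(const dsi_descriptor_t *desc)
{
    if (desc == NULL)
    {
        return -1;
    }
    if (desc->abi_version != DSI_ABI_VERSION)
    {
        /* Built against a different layout: refuse to load it rather
         * than misread its data structures later. */
        return -1;
    }
    return 0;
}
```

A loader would call such a check right after resolving the plugin's descriptor symbol and before touching any other field.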

I suspect gridftp-hdfs also has the same issue as xrootd-dsi.

Brian

dstrain - 2013-01-23

I have rebuilt the xrootd-dsi and gridftp-hdfs DSI modules using the new ABI from globus-gridftp-server 6.14.  This fixed the issues.
The modules still don't explicitly link against libglobus_gridftp_server for some reason (or at least rpm is not catching it), but that's a problem for us to fix.

You can close this ticket.  (I apparently do not have the permissions to do so.)

Mike Link - 2013-03-22

Added some protection against this which was released with 5.2.4.

Globus Toolkit/GT-354

Summary

Compatibility with automake 1.13

Details

Type: Bug

Status: Resolved 2013-06-26

Description

The AM_CONFIG_HEADER macro has been removed from automake 1.13. It should be replaced with AC_CONFIG_HEADERS:

AM_CONFIG_HEADER(header.h) -> AC_CONFIG_HEADERS([header.h])

Affected packages: globus-core, globus-common, globus-gsi-cert-utils, globus-gsi-callback, globus-xio, globus-gatekeeper, globus-gridftp-server

Comments

dennisvd - 2013-01-18

This is exactly what I ran into with the macports packaging; recently macports updated to automake 1.13.

Mattias Ellert - 2013-02-27

Patches for affected components.

Joe Bester - 2013-06-20

This is all fixed in cvs and will be in 5.2.5

Globus Toolkit/GT-355

Summary

Myproxy-server init script problems on debian

Details

Type: Bug

Status: Resolved 2013-06-26

Description

The myproxy-server init script in the debian package sets but does not export the X509_USER_* variables, so by default it fails to load its credentials. Also, the script doesn't explicitly leave the current directory, so it can fail oddly if the CWD is /root or another dir unreadable/unexecutable by the myproxy user.

Comments

Globus Toolkit/GT-356

Summary

Add configuration and a command to make the sharing authorization file easier to manage

Details

Type: New Feature

Status: Resolved 2013-06-04

Description

The default sharing auth file of $HOME/.globus_sharing may not work for all sites.  Add configuration to define the path to the sharing file.

To ease enabling of sharing from GO, and since the location of the sharing file may not be known by the client, a new command "SITE SHARING" will create the sharing file locally.

Add configuration to enable SITE SHARING.

Comments

Mike Link - 2013-06-04

changed sharing file management to support a per-uuid configuration.

Globus Toolkit/GT-357

Summary

Extend globus_ftp_control_authenticate() to allow the caller to set req flags such as delegation.

Details

Type: New Feature

Status: Resolved 2013-01-23

Description

A user of the globus_ftp_control lib may wish not to delegate its credentials when making a connection.  There is no way to do this.

Add globus_ftp_control_authenticate_ex() with the same semantics as globus_ftp_control_authenticate(), except that it will honor auth_info->req_flags. fxp will use this.

Comments

Globus Toolkit/GT-358

Summary

Invalid values for boolean config options silently set the option false.

Details

Type: Bug

Status: Resolved 2013-01-25

Description

Boolean options in a config file must be valued 0 or 1.  Any other string, including 'true', will result in a false setting with no error.
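
A stricter parser would report the bad value instead of defaulting to false. The sketch below assumes only "0" and "1" are valid, as described above; parse_bool_strict() is a hypothetical helper, not the function the server actually uses.

```c
#include <string.h>

/* Hypothetical helper: parse a boolean config value.
 * On success, store 0 or 1 in *out and return 0.
 * For anything other than "0" or "1", return -1 and leave *out
 * untouched so the caller can report a configuration error. */
int parse_bool_strict(const char *value, int *out)
{
    if (value == NULL || out == NULL)
    {
        return -1;
    }
    if (strcmp(value, "0") == 0)
    {
        *out = 0;
        return 0;
    }
    if (strcmp(value, "1") == 0)
    {
        *out = 1;
        return 0;
    }
    return -1;
}
```

With this shape, a value such as 'true' produces an error return that the config loader can log, rather than a silent false.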

Comments

Mike Link - 2013-01-25

Fixed.

Globus Toolkit/GT-359

Summary

SGE SEG hangs when log_path points to directory

Details

Type: Bug

Status: Resolved 2013-06-26

Description

Iwona Sakrejda found an apparent bug with the SGE SEG. When she set log_path (in globus-sge.conf) to a directory, it appeared that the SGE SEG hung, repeatedly trying to read the directory as a file. Here is her strace output fragment:

{noformat}
read(4, 0x7f0bfd3af000, 32768)          = -1 EISDIR (Is a directory)
read(4, 0x7f0bfd3af000, 32768)          = -1 EISDIR (Is a directory)
read(4, 0x7f0bfd3af000, 32768)          = -1 EISDIR (Is a directory)
read(4, 0x7f0bfd3af000, 32768)          = -1 EISDIR (Is a directory)
{noformat}

Changing to a file, especially the correct file path, allowed the SEG to make forward progress.

It would be nice if the SEG would check for a directory and, if so, log an error and give up, instead of hanging.
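
The requested check could be sketched with stat() as below. The function name is hypothetical and this is not the fix that was committed; it only shows how a directory can be detected before entering the read loop.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: classify a configured log path.
 * Returns 1 if the path is a directory, 0 if it exists and is not a
 * directory, and -1 if it cannot be examined at all. */
int path_is_directory(const char *path)
{
    struct stat st;

    if (path == NULL || stat(path, &st) != 0)
    {
        return -1;
    }
    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

A caller would log an error and give up when the result is 1, instead of retrying read() and getting EISDIR forever.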

Comments

Joe Bester - 2013-02-20

Fixed in CVS and pushed as an update package to globus.org

Globus Toolkit/GT-360

Summary

signals ignored by subprocesses

Details

Type: Bug

Status: Open

Description

We use globus libraries in a Python application that regularly spawns subprocesses.  The problem we have is that spawned subprocesses ignore SIGTERM, SIGHUP, etc whenever the globus libraries are used.  Specifically, we think the issue is triggered when

globus_module_activate(GLOBUS_COMMON_MODULE);

is called. Indeed, the problem extends beyond Python modules to programs written in C. Below is a simple example in pure C.  When the program is compiled as presented here, signals are received as expected.  But when the "globus_module_activate" line is uncommented and the program is recompiled, the spawned 'sleep' process ignores SIGTERM, SIGHUP, etc.

#include <stdio.h>
#include <stdlib.h>
#include "globus_common.h"

int main(int argc, char **argv){
  int rc = 0;
  // please uncomment the next line
  // rc = globus_module_activate(GLOBUS_COMMON_MODULE);
  if (rc != GLOBUS_SUCCESS) {
    printf("Warning!");
  }
  printf("calling 'sleep 100'\n");
  printf("please send SIGTERM to the new sleep process\n");
  system("sleep 100");
  printf("Awake!\n");
  return 0;
}

Comments

Globus Toolkit/GT-361

Summary

globus-url-copy inconsistent data when using stdout pipe with parallel option

Details

Type: Bug

Status: Open

Description

I need to transfer a file from a remote server to a pipe on stdout with globus-url-copy:
$ globus-url-copy gsiftp:/// - | 

But the data stream is corrupted when I use the parallel option and the file is larger than 512k.

For example, when I create a 513k file and copy it with the parallel option into an md5sum pipe, the md5sum of the stream does not match that of the source file.

$ dd if=/dev/urandom bs=1024 count=513 of=/tmp/513k
513+0 records in
513+0 records out
$ md5sum  /tmp/513k
c8a35e2f67617077e4149002a498e171  /tmp/513k
$ globus-url-copy -p 8 gsiftp://yvas7820.inetpsa.com/tmp/513k - | md5sum
7e60f67a267bf7aaf835c68f37a0757f  -

I get the same result when I write the pipe stream to a file:

$ globus-url-copy -p 8  gsiftp://yvas7820.inetpsa.com/tmp/513k - | cat - > /tmp/513k_dst
$ md5sum  /tmp/513k_dst
7e60f67a267bf7aaf835c68f37a0757f  /tmp/513k_dst

But I cannot reproduce the issue:

- When I redirect stdout to a file

$ globus-url-copy -p 8  gsiftp://yvas7820.inetpsa.com/tmp/513k - > /tmp/513k_dst
$ md5sum  /tmp/513k_dst
c8a35e2f67617077e4149002a498e171  /tmp/513k_dst

- When I don't use the parallel option

$ globus-url-copy  gsiftp://yvas7820.inetpsa.com/tmp/513k - | md5sum
c8a35e2f67617077e4149002a498e171  -

- When I use a file smaller than or equal to 512k

$ dd if=/dev/urandom bs=1024 count=512 of=/tmp/512k
512+0 records in
512+0 records out
$ md5sum /tmp/512k
6703c9b4087bf76249b420bef46316e6  /tmp/512k
$ globus-url-copy -p 8 gsiftp://yvas7820.inetpsa.com/tmp/512k - | md5sum
6703c9b4087bf76249b420bef46316e6  -

Comments

Globus Toolkit/GT-362

Summary

simple ca loses spaces in dn in signing policy

Details

Type: Bug

Status: Resolved 2013-06-26

Description

While testing gcmu, found a bug in simple ca where the signing policy loses spaces in the DN of the cond_subjects value.

Comments

Joe Bester - 2013-02-20

Fixed and updates pushed to globus.org

Globus Toolkit/GT-363

Summary

gss_get_mic/gss_verify_mic fail for some TLS ciphers with OpenSSL 1.0.1

Details

Type: Bug

Status: Resolved 2013-06-26

Description

In the transition of OpenSSL 1.0.0->1.0.1, some higher efficiency combined cipher and hash functions were added for AES-NI platforms which means the *_hash fields referenced in the Globus gss_get_mic/gss_verify_mic functions are NULL.

If the server has an Intel CPU which supports the AES_NI instruction set, the server will segfault if the client requests the use of the TLSv1 method.  In addition, this problem should only happen if a ciphersuite using a "stitched cipher" is used. For that to happen it has to support AES-NI and negotiate a ciphersuite using TLS 1.0 or later and AES and SHA1 MAC or RC4 and MD5 MAC. If the cipherstring avoids those cases, it won't happen.

This problem can be avoided by adding the following to the server environment:

OPENSSL_ia32cap=~0x200000200000000

A better solution was proposed by Dr. Henson (of the OpenSSL project) to make the get_mic.c and verify_mic.c more portable.

There is a second problem which could arise when using Globus with TLS 1.2.  There are ciphersuites used in TLS 1.2 which don't even have a MAC hash associated with them, the GCM ciphersuites. In that case the "hash" will still be NULL even with the Dr. Henson fix. Perhaps Globus can simply avoid using these ciphersuites.

The attached get_mic.c and verify_mic.c resolve the TLSv1 condition, but not the second problem related to TLSv1.2.

Thanks to Ken Robinette (InterSoft International, Inc.) for the above diagnosis.

Comments

Joe Bester - 2013-04-10

These patches are applied and in the GT package repository.

Globus Toolkit/GT-364

Summary

SSHFTP (GridFTP-over-SSH) segmentation fault

Details

Type: Bug

Status: Open

Description

I have not managed to use GridFTP over SSH with the GridFTP of GT 5.2.3 or GT 5.2.4. It works successfully with the GridFTP of GT 5.0.5.

I saw that the remote_contact variable in the globus_l_gfs_new_server_cb function might not be set. I modified globus_gridftp_server.c to resolve the problem for my case and now it works, but I do not know the impact of my modification. See the attached patch file.

Output of globus-url-copy
$  globus-url-copy -dbg /etc/group sshftp://127.0.0.1/tmp/group
debug: starting to put sshftp://127.0.0.1/tmp/group
debug: connecting to sshftp://127.0.0.1/tmp/group
debug: response from sshftp://127.0.0.1/tmp/group:
500 Server is not configured for SSHFTP connections.

debug: fault on connection to sshftp://127.0.0.1/tmp/group: globus_ftp_client: the server responded with an error
debug: data callback, error globus_ftp_client: the server responded with an error, buffer 0x2b10f22bd010, length 0, offset=0, eof=true
debug: operation complete

error: globus_ftp_client: the server responded with an error
500 Server is not configured for SSHFTP connections.

I added the debug and log options in the file ~/.globus/sshftp:
exec $GLOBUS_LOCATION/sbin/globus-gridftp-server -log-level all -logfile /scratch/lsftmp/gridftp.log -debug -ssh

I could then see an issue in the log with the remote hostname (with 5.0.5, I get "New connection from: 0.0.0.0"):

globus_gfork: GFork error: Env not set

[32268] Tue Feb 19 14:35:40 2013 :: No configuration file found.
[32268] Tue Feb 19 14:35:40 2013 :: Server started in inetd mode.
[32268] Tue Feb 19 14:35:40 2013 :: Couldn't get remote IP address.  Possibly using a non-tcp protocol.
[32268] Tue Feb 19 14:35:40 2013 :: New connection from: O     +
[32268] Tue Feb 19 14:35:40 2013 :: Couldn't get local contact.  Possibly using a non-tcp protocol.
[32268] Tue Feb 19 14:35:40 2013 :: Couldn't get local contact.  Possibly using a non-tcp protocol.
[32268] Tue Feb 19 14:35:40 2013 :: Couldn't enable TCP_NODELAY.  Possibly using a non-tcp protocol.
[1214] Tue Feb 19 14:37:25 2013 :: GFork functionality not enabled.:

When I tried to launch the gridftp-ssh wrapper directly, I got a segmentation fault with a core dump:

$  ${GLOBUS_LOCATION}/share/globus/gridftp-ssh sshftp://127.0.0.1/tmp/group 127.0.0.1 22
500 Server is not configured for SSHFTP connections.

ksh: line 1: 12017: Memory fault

The backtrace of the core dump from gdb:

#0  0x00002b36bc7fe027 in free () from /lib64/libc.so.6
#1  0x0000000000404330 in globus_l_gfs_new_server_cb (handle=0x53cc80, result=18, user_arg=0x0) at globus_gridftp_server.c:740
#2  0x00002b36bb986ba8 in globus_l_xio_open_close_callback_kickout (user_arg=0x53cd60) at globus_xio_handle.c:998
#3  0x00002b36bb986a62 in globus_l_xio_open_close_callback (op=0x53cd60, result=0, user_arg=0x0) at globus_xio_handle.c:965
#4  0x00002b36bb993601 in globus_l_xio_driver_open_op_kickout (user_arg=0x53cd60) at globus_xio_driver.c:906
#5  0x00002b36bbf15c1e in globus_callback_space_poll_nothreads (timestop=0x2b36bbf40e10, space=-2) at globus_callback_nothreads.c:1437
#6  0x00002b36bbf139e6 in globus_callback_space_poll (timestop=0x2b36bbf40e10, space=-2) at globus_callback.c:252
#7  0x00002b36bbf3de2c in globus_l_thread_none_cond_wait (cv=0x507e80, mut=0x507ec0) at globus_thread_none.c:371
#8  0x00002b36bbf33a9c in globus_cond_wait (cond=0x507e80, mutex=0x507ec0) at globus_thread.c:585
#9  0x000000000040604e in main (argc=7, argv=0x7fffc950ea88) at globus_gridftp_server.c:1863

Comments

Globus Toolkit/GT-365

Summary

Switch sharing user identification from DN to CERT

Details

Type: Task

Status: Resolved 2013-06-26

Description

Callouts may need extensions from the certs of shared users in order to identify them.  Drop USER , add USER CERT.

Comments

Globus Toolkit/GT-366

Summary

Delegation failures due to modification to wrong authinfo object in globus-ftp-control

Details

Type: Bug

Status: Resolved 2013-04-03

Description

Delegation starts failing after updating to GT 5.2.4

Patch to fix the issue attached.

Comments

Mike Link - 2013-02-27

Ugh.  Thanks.

Mike Link - 2013-02-27

Fix committed.

Mike Link - 2013-04-03

globus_ftp_control-4.6 update package added to http://www.globus.org/toolkit/advisories.html

Globus Toolkit/GT-367

Summary

segfault during globus_gsi_proxy_handle_destroy() because of problem in globus_gsi_proxy_create_req()

Details

Type: Bug

Status: Open

Description

If globus_gsi_proxy_create_req(h) encounters an error, it jumps to "error_exit", which then frees the RSA key.  However, it does not clear the pointer to the RSA from handle->proxy_key->pkey (or reassign the RSA to null).

When a client later calls globus_gsi_proxy_handle_destroy(h), that function calls EVP_PKEY_free(handle->proxy_key), which also attempts to free the RSA, resulting in a double free and likely a segfault.


I haven't looked at it too closely, but here are two options for suggested fixes in the body of globus_gsi_proxy_create_req():
1) Use EVP_PKEY_set1_RSA() in place of EVP_PKEY_assign_RSA(), assuming this copies instead of references the key.
2) In error_exit, call EVP_PKEY_assign(handle->proxy_key, NULL) to clear the pkey so it is not freed again by globus_gsi_proxy_handle_destroy().

I will of course leave the actual solution to someone who knows more about it than I do.

I am able to reproduce this by passing in a proxy handle that has handle->attrs->keybits set to 128.  Then, when X509_REQ_sign() fails, it causes the double free, which manifests later when cleaning up using globus_gsi_proxy_handle_destroy().
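
The ownership issue behind option 1 can be modelled without OpenSSL. In OpenSSL's documented behaviour, EVP_PKEY_assign_RSA() hands the caller's reference to the EVP_PKEY, while EVP_PKEY_set1_RSA() makes the EVP_PKEY take an additional reference (it does not copy the key). The toy types and functions below are stand-ins invented for this sketch, not OpenSSL calls.

```c
#include <stdlib.h>

/* Toy stand-ins for RSA and EVP_PKEY with a reference count. */
typedef struct { int refs; } toy_rsa_t;
typedef struct { toy_rsa_t *rsa; } toy_pkey_t;

static int toy_live_keys = 0; /* keys allocated and not yet freed */

toy_rsa_t *toy_rsa_new(void)
{
    toy_rsa_t *rsa = malloc(sizeof(*rsa));
    if (rsa != NULL)
    {
        rsa->refs = 1;
        toy_live_keys++;
    }
    return rsa;
}

/* Drop one reference; the memory is released with the last one. */
void toy_rsa_free(toy_rsa_t *rsa)
{
    if (rsa != NULL && --rsa->refs == 0)
    {
        free(rsa);
        toy_live_keys--;
    }
}

/* Modelled on set1 semantics: the container takes its own reference,
 * so the caller can still free its reference on an error path. */
void toy_pkey_set1(toy_pkey_t *pkey, toy_rsa_t *rsa)
{
    rsa->refs++;
    pkey->rsa = rsa;
}

/* Release the container's reference and clear the pointer so that a
 * second release is harmless. */
void toy_pkey_release(toy_pkey_t *pkey)
{
    toy_rsa_free(pkey->rsa);
    pkey->rsa = NULL;
}
```

In this model the error path frees the caller's reference, the destroy path frees the container's reference, and the key is released exactly once.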

Comments

Globus Toolkit/GT-368

Summary

GridFTP syslog transfer stats concatenated and truncated

Details

Type: Bug

Status: Resolved 2013-07-16

Description

When logging transfer stats to syslog ("-log-level TRANSFER -log-module syslog"), events that happen within the same second are concatenated to the same line and if the line exceeds a certain length, the line is truncated. This presents a problem for us since we collect this data for analysis and error detection. An example follows.

Feb 26 12:57:24 gap-md1 globus-gridftp-server[10660]: Transfer stats: DATE=20130226185724.074982 HOST=gap1.ncsa.illinois.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20130226185724.000367 USER=rbrunner FILE=/u/staff/rbrunner/MSS/jyc1-20120213/devel/charm-git/src/ck-core/ckmarshall.ci BUFFER=87380 BLOCK=262144 NBYTES=77 VOLUME=/ STREAMS=2 STRIPES=1 DEST=[141.142.31.99] TYPE=STOR CODE=226 Transfer stats: DATE=20130226185724.161207 HOST=gap1.ncsa.illinois.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20130226185724.100013 USER=rbrunner FILE=/u/staff/rbrunner/MSS/jyc1-20120213/devel/charm-20120118/src/arch/lapi/cc-mpcc64.h BUFFER=87380 BLOCK=262144 NBYTES=78 VOLUME=/ STREAMS=2 STRIPES=1 DEST=[0.0.0.0] TYPE=STOR CODE=226 Transfer stats: DATE=20130226185724.253174 HOST=gap1.ncsa.illinois.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20130226185724.185261 USER=rbrunner FILE=/u/staff/rbrunner/MSS/jyc1-20120213/devel/charm-20120104/src/arch/lapi/cc-mpcc64.h BUFFER=87380 BLOCK=262144 NBYTES=78 VO
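
Until the server side is fixed, a log consumer can at least separate the concatenated records and detect the truncated one. The sketch below assumes every record starts with the literal "Transfer stats:" and that CODE= is the last field, as in the example above; both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define RECORD_MARKER "Transfer stats:"

/* Split a syslog line into records that each begin with RECORD_MARKER.
 * Stores up to max start pointers and lengths; returns how many
 * records were stored. */
size_t split_transfer_stats(const char *line, const char **starts,
                            size_t *lens, size_t max)
{
    const size_t mlen = strlen(RECORD_MARKER);
    size_t       n = 0;
    const char  *p = strstr(line, RECORD_MARKER);

    while (p != NULL && n < max)
    {
        const char *next = strstr(p + mlen, RECORD_MARKER);

        starts[n] = p;
        lens[n] = (next != NULL) ? (size_t)(next - p) : strlen(p);
        n++;
        p = next;
    }
    return n;
}

/* Return 1 if the record contains a " CODE=" field, which suggests it
 * was not truncated, and 0 otherwise. */
int record_has_code(const char *start, size_t len)
{
    const char  *key = " CODE=";
    const size_t klen = strlen(key);
    size_t       i;

    for (i = 0; i + klen <= len; i++)
    {
        if (memcmp(start + i, key, klen) == 0)
        {
            return 1;
        }
    }
    return 0;
}
```

Records for which record_has_code() returns 0 can be counted as lost rather than parsed as valid transfers.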

Comments

Globus Toolkit/GT-369

Summary

GRAM5 skips some SEG events for PBS batch system

Details

Type: Bug

Status: Open

Description

The Purdue site uses PBS as one of its batch systems. We have noticed that, for a given local pool account, the jobs from the PBS batch system are not being reported to the Globus job manager. These jobs finished successfully on the batch system, but the Globus job manager did not receive any notification that they had ended. The admin of the CE received the following mail:

"""

PBS Job Id: 1492855.carter-adm.rcac.purdue.edu
Job Name:   STDIN
Exec host:  carter-a010/9
An error has occurred processing your job, see below.
Post job file processing error; job 1492855.carter-adm.rcac.purdue.edu on host carter-a010/9


"""
The mail above does not contain any useful information.  The 'globus-seg-pbs' log contains the following messages:

"""
[jha2@carter-osg globus-seg-pbs]$ grep -r '1492855' .
./20130228:001;1362056658;1492855.carter-adm.rcac.purdue.edu;2;0
./20130228:001;1362056985;1492855.carter-adm.rcac.purdue.edu;2;0
./20130228:001;1362057101;1492855.carter-adm.rcac.purdue.edu;8;0
[jha2@carter-osg globus-seg-pbs]$

"""




The site is using the following Globus RPMs:
***
[jha2@carter-osg ~]$ rpm -qa | grep globus
globus-gsi-openssl-error-2.1-4.osg.el6.x86_64
globus-rsl-9.1-4.osg.el6.x86_64
globus-gfork-3.1-4.osg.el6.x86_64
globus-gridftp-server-6.5-1.7.osg.el6.x86_64
globus-gsi-cert-utils-8.1-4.osg.el6.x86_64
globus-io-9.2-3.osg.el6.x86_64
globus-xio-devel-3.2-4.1.osg.el6.x86_64
globus-authz-2.1-4.osg.el6.x86_64
globus-gass-copy-8.2-4.osg.el6.x86_64
globus-gsi-cert-utils-progs-8.1-4.osg.el6.x86_64
globus-gss-assist-8.1-4.osg.el6.x86_64
globus-xio-gsi-driver-devel-2.1-4.osg.el6.x86_64
globus-core-8.5-2.osg.el6.x86_64
globus-gatekeeper-9.6-1.7.osg.el6.x86_64
globus-gridmap-callout-error-1.2-2.osg.el6.x86_64
globus-gsi-sysconfig-5.1-4.osg.el6.x86_64
globus-gram-protocol-11.2-3.1.osg.el6.x86_64
globus-gsi-proxy-ssl-devel-4.1-4.osg.el6.x86_64
globus-gsi-callback-devel-4.1-4.osg.el6.x86_64
globus-gass-copy-progs-8.2-4.osg.el6.x86_64
globus-gsi-proxy-core-devel-6.1-4.osg.el6.x86_64
globus-gram-job-manager-callout-error-2.1-4.osg.el6.x86_64
globus-common-14.5-2.2.osg.el6.x86_64
globus-gsi-proxy-core-6.1-4.osg.el6.x86_64
globus-usage-3.1-4.osg.el6.x86_64
globus-gram-job-manager-pbs-1.1-4.1.osg.el6.x86_64
globus-gsi-sysconfig-devel-5.1-4.osg.el6.x86_64
globus-gram-job-manager-fork-setup-poll-1.0-8.osg.el6.noarch
cog-jglobus-axis-1.2-1.osg.el6.noarch
globus-callout-devel-2.1-4.osg.el6.x86_64
globus-gridftp-server-control-2.3-1.1.osg.el6.x86_64
globus-gsi-credential-5.1-4.osg.el6.x86_64
globus-common-devel-14.5-2.2.osg.el6.x86_64
globus-ftp-control-4.2-6.osg.el6.x86_64
globus-scheduler-event-generator-4.4-1.osg.el6.x86_64
globus-gsi-cert-utils-devel-8.1-4.osg.el6.x86_64
globus-gass-server-ez-4.1-4.osg.el6.x86_64
globus-gass-cache-program-5.0-5.osg.el6.x86_64
globus-xio-pipe-driver-2.1-4.osg.el6.x86_64
globus-gsi-proxy-ssl-4.1-4.osg.el6.x86_64
globus-callout-2.1-4.osg.el6.x86_64
globus-gsi-openssl-error-devel-2.1-4.osg.el6.x86_64
globus-gram-job-manager-scripts-4.1-3.1.osg.el6.noarch
globus-authz-callout-error-2.1-4.osg.el6.x86_64
globus-xio-popen-driver-2.2-3.osg.el6.x86_64
globus-gram-client-tools-10.0-5.osg.el6.x86_64
globus-scheduler-event-generator-progs-4.4-1.osg.el6.x86_64
globus-gssapi-error-devel-4.1-4.osg.el6.x86_64
globus-gridftp-server-progs-6.5-1.7.osg.el6.x86_64
globus-scheduler-event-generator-devel-4.4-1.osg.el6.x86_64
globus-gssapi-gsi-10.7-2.osg.el6.x86_64
globus-common-progs-14.5-2.2.osg.el6.x86_64
globus-proxy-utils-5.0-5.osg.el6.x86_64
globus-gram-client-12.3-3.osg.el6.x86_64
globus-gram-job-manager-fork-1.0-8.osg.el6.x86_64
globus-gss-assist-devel-8.1-4.osg.el6.x86_64
globus-xio-3.2-4.1.osg.el6.x86_64
globus-io-devel-9.2-3.osg.el6.x86_64
globus-openssl-module-3.1-4.osg.el6.x86_64
globus-xio-gsi-driver-2.1-4.osg.el6.x86_64
globus-gram-job-manager-pbs-setup-seg-1.1-4.1.osg.el6.x86_64
globus-ftp-client-7.2-4.osg.el6.x86_64
globus-gsi-callback-4.1-4.osg.el6.x86_64
globus-gass-transfer-7.1-4.osg.el6.x86_64
globus-openssl-module-devel-3.1-4.osg.el6.x86_64
globus-gsi-credential-devel-5.1-4.osg.el6.x86_64
globus-gass-cache-8.1-4.osg.el6.x86_64
globus-gssapi-gsi-devel-10.7-2.osg.el6.x86_64
globus-gram-job-manager-13.45-1.1.osg.el6.x86_64
globus-gssapi-error-4.1-4.osg.el6.x86_64
cog-jglobus-1.8.0-1.osg.el6.noarch
globus-gram-protocol-devel-11.2-3.1.osg.el6.x86_64
[jha2@carter-osg ~]$

***

Globus version is '5.2.0' .



Thanks,
Manoj

Comments

Globus Toolkit/GT-370

Summary

Add symlink from jobmanager to jobmanager-fork

Details

Type: Improvement

Status: Open

Description

Request from an XSEDE user of a GRAM5 installation:

Todd L Miller  writes:
>       I've been using HTCondor to access Stampede, and I noticed a
>small inconsistency: Stampede offers GRAM 5, but HTCondor treats it as a
>GRAM 2 site.  Everything works, except that I can't get the return status
>of my job back.  The reason, I am told by the developers, is that HTCondor
>checks the GRAM version number, it asks the 'jobmanager' resource, as
>opposed to the one I actually specified, e.g.
>
>login5.stampede.tacc.utexas.edu:2119/jobmanager
>
>instead of
>
>login5.stampede.tacc.utexas.edu:2119/jobmanager-fork
>
>       Although I can work around this on the client side (by changing
>the HTCondor configuration to trust me, rather than probing for the
>version), it would be easier for me, and probably less confusing for other
>of your users, to symlink
>
>/opt/apps/xsede/gram5-5.2.3/etc/grid-services/jobmanager-fork
>
>to
>
>/opt/apps/xsede/gram5-5.2.3/etc/grid-services/jobmanager
>
>I am told this is what OSG does, although other sites will symlink
>'jobmanager' to their preferred default service instead.
>
>       I've cc'd the HTCondor developer I've spoken with in case you have
>any questions.  Thank you.
>
>- Todd

Comments

Globus Toolkit/GT-371

Summary

Build of gt 5.4.2 fails to use --includedir flag

Details

Type: Improvement

Status: Open

Description

Building from source on Solaris, we have libtool installed in a custom location, $ROOT. We tried two different configure commands:

 ./configure --prefix=${ROOT} --with-flavor=gcc32dbgpthr

and

 ./configure --prefix=${ROOT} --with-flavor=gcc32dbgpthr --includedir=$ROOT/include

In both cases, the build fails with an error "ltdl.h not found". The problem is that $ROOT/include is not passed to the gcc command. It is possible to complete the build by manual editing of Makefiles and appending the $ROOT/include to the appropriate variables.

Comments

Globus Toolkit/GT-372

Summary

Assemble comprehensive and readable firewall documentation for Globus Online scenarios

Details

Type: Task

Status: Open

Description

The firewall documentation we have is sorely lacking.  There is nothing (that I could find) regarding GridFTP servers and firewalls that discusses firewalling outbound connections (which some sites want to or must do), aside from a mention that servers "should have outbound connections enabled on all ports".  There is no discussion of the different scenarios when using Globus Online (when the server is sending data, when it is receiving data, when an MLSD is sent, etc.), and no discussion of Globus Connect.

Comments

Eric Blau - 2013-03-12

As a potential starting point, I'm pasting in part of an email exchange.  The quoted bits were written by me, the top section by Mike Link, in response.



That's all correct.  One additional point:  There are a few sites we
know of that attempt to restrict outbound connections, and the method
they use is to set GLOBUS_TCP_SOURCE_RANGE on their GridFTP servers, and
have their firewall key on the source ports of the outbound connections.
  So the firewall would be configured to allow a connection to any
remote host if the source port is between 60000-61000, and the GridFTP
server will bind to those source ports for any outbound connections.

Mike

On 3/12/2013 9:35 AM, Eric Blau wrote:
> Here's the issue, as I understand it.  I've cc:ed Mike Link and Karl Pickett who are GridFTP/Globus Online developers in the hopes that they'll chime in if I am making any major technical misstatements.
>
>
> GLOBUS_TCP_PORT_RANGE specifies the range of ports that are open in the firewall for GridFTP to use for _incoming_ connections.
> GLOBUS_TCP_SOURCE_RANGE specifies the range of local, source ports to use when connecting to a remote destination.
>
> In Globus Online, in a mode E transfer, the side receiving the data is always the one to which the data channel connections will be made.  Thus, it gets to decide, according to its GLOBUS_TCP_PORT_RANGE,
> what ports the sending side will connect to.
>
> For example:  trying to transfer a file from pcmdi9.llnl.gov to tg-steele.purdue.teragrid.org:
>
> Steele is the receiving side.  Its GLOBUS_TCP_PORT_RANGE is set to 50000,51000, so it tells pcmdi9
> to connect to it at 128.211.128.46:50673 and send the file.  In this scenario, to my understanding, the fact that pcmdi9 has GLOBUS_TCP_PORT_RANGE and GLOBUS_TCP_SOURCE_RANGE set to 60000,61000 is irrelevant.  pcmdi9's GLOBUS_TCP_PORT_RANGE doesn't come into play at all here, and SOURCE_RANGE refers to the local, source ports, not the remote destination port (steele:50673).
>
> So, the question here is:  is pcmdi9 behind a firewall that restricts outbound connections by remote destination port?  It seems like it is.  If it is, and we want Globus Online transfers to be fully functional, we're going to have to open up the firewall to let it make outbound connections in _at_least_ the 50000,51000 range of remote destination ports.
>
> Eric
>
>
> (Note:  Globus Connect endpoints are handled differently, as it is assumed that one using Globus Connect has less control over the firewall issues:  It is my understanding that a Globus Connect endpoint will always be the one to connect to the other server, regardless of direction of data transfer).
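
The Steele example from the exchange above can be checked mechanically. The sketch below assumes the "min,max" value format shown in the email (for example 50000,51000); parse_port_range() and port_in_range() are hypothetical helpers, not Globus functions.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: parse a "min,max" port range.
 * Returns 0 and fills both outputs on success, -1 on malformed input. */
int parse_port_range(const char *value, int *min_port, int *max_port)
{
    int  lo;
    int  hi;
    char trailing;

    if (value == NULL || min_port == NULL || max_port == NULL)
    {
        return -1;
    }
    /* Exactly two integers separated by a comma, nothing after them. */
    if (sscanf(value, "%d,%d%c", &lo, &hi, &trailing) != 2)
    {
        return -1;
    }
    if (lo < 1 || hi > 65535 || lo > hi)
    {
        return -1;
    }
    *min_port = lo;
    *max_port = hi;
    return 0;
}

/* Return 1 if port lies inside the inclusive range, 0 otherwise. */
int port_in_range(int port, int min_port, int max_port)
{
    return port >= min_port && port <= max_port;
}
```

Here the receiver's listening port 50673 falls inside its own 50000,51000 range, and the sender's firewall has to allow outbound connections to that destination port regardless of the sender's own 60000,61000 settings.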

Globus Toolkit/GT-373

Summary

gridmap_eppn_callout doesn’t work with proxies

Details

Type: Bug

Status: Resolved 2015-12-15

Description

The globus_gridmap_eppn_callout will fail with a mysterious "gridmap lookup error" if the client attempts to use a proxy of a CILogon credential. It looks like the X509_verify_cert() call is returning X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY. Related to this, the result is cleared prior to doing the gridmap lookup, so the error message is lost before it comes to the user.

Comments

Bryce Allen - 2015-11-24

Shouldn't this be using globus_gsi_cred_verify_cert_chain, instead of the plain openssl verify routine? That being said, setting OPENSSL_ALLOW_PROXY_CERTS in the environment might work.

Bryce Allen - 2015-11-24

This is the vanilla openssl doc for proxy certs:
https://github.com/openssl/openssl/blob/master/doc/HOWTO/proxy_certificates.txt

Bryce Allen - 2015-11-24

Looks like it's assuming a single CA cert for verification, from env GLOBUS_MYPROXY_CA_CERT, and extracts only the 0th from the input cert to verify against that CA. Makes no attempt to set up a verification context with the full chain. Seems like this would fail if there were intermediate CAs, not just proxies.

Mike Link - 2015-12-15

Fixed in globus-gridmap-eppn-callout-1.9.

Globus Toolkit/GT-374

Summary

Can’t share files in a path structure with symlinks: path not allowed error

Details

Type: Bug

Status: Resolved 2013-10-15

Description

Rachana and Eric ran into this.  A path like:

/esg -> /disks/space0/esg

and a sharing chroot path of /esg/some/subdir
will not work - path not allowed.  Simply changing the full path of the chroot to never have symlinks (/disks/space0/esg/) fixes the problem.

The SITE RESTRICT command was simply R/.
The MLST ~/ command was what failed.
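
One common way to make such a check insensitive to symlinks is to canonicalize both the restricted root and the requested path before comparing them. The sketch below uses POSIX realpath(); path_within_root() is a hypothetical helper and is not how the server implements its restriction check.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if path resolves to root or to
 * something below it, 0 if it resolves elsewhere, and -1 if either
 * path cannot be resolved. */
int path_within_root(const char *root, const char *path)
{
    char   real_root[PATH_MAX];
    char   real_path[PATH_MAX];
    size_t n;

    if (realpath(root, real_root) == NULL ||
        realpath(path, real_path) == NULL)
    {
        return -1;
    }
    if (strcmp(real_root, "/") == 0)
    {
        return 1; /* everything is below the filesystem root */
    }
    n = strlen(real_root);
    if (strncmp(real_path, real_root, n) != 0)
    {
        return 0;
    }
    /* Require a component boundary so /esg does not match /esg2. */
    return real_path[n] == '\0' || real_path[n] == '/';
}
```

With /esg symlinked to /disks/space0/esg, both spellings resolve to the same canonical prefix, so a path under either one passes the check.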

Comments

Karl Pickett - 2013-03-19

also, both -rp and -sharing-rp were empty.

Globus Toolkit/GT-375

Summary

Gsi-openssh fails in get_mic with null pointer dereference on openSUSE 12.2

Details

Type: Bug

Status: Open

Description

I built the current Globus release for openSUSE 12.2 from source RPMs. I ran across an issue that I believe is related to bug GT-363: gsi-openssh failed with a null pointer dereference inside get_mic(). I followed the recommendation in GT-363 and patched get_mic.c and verify_mic.c, which did fix the null pointer dereference, but now I get:

... from the client .....
debug3: authmethod_is_enabled gssapi-with-mic
debug1: Next authentication method: gssapi-with-mic
debug2: we sent a gssapi-with-mic packet, wait for reply
debug1: Delegating credentials
debug1: Delegating credentials
debug1: Delegating credentials
debug1: Delegating credentials
debug1: GSS Major Status: General failure

GSS Minor Status Error Chain:
globus_gsi_gssapi: Out of memory: Success
globus_gsi_gssapi: A system call failed: Success


debug2: we did not send a packet, disable method
debug1: No more authentication methods to try.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password,keyboard-interactive).
.... from the server....
debug3: mm_request_receive entering
debug3: monitor_read: checking request 39
debug1: Got no client credentials
debug3: mm_request_send entering: type 40
debug3: mm_request_send entering: type 39 [preauth]
debug3: mm_request_receive_expect entering: type 40 [preauth]
debug3: mm_request_receive entering [preauth]
debug3: mm_request_receive entering
debug3: monitor_read: checking request 39
debug1: Got no client credentials
debug3: mm_request_send entering: type 40
debug3: mm_request_send entering: type 39 [preauth]
debug3: mm_request_receive_expect entering: type 40 [preauth]
debug3: mm_request_receive entering [preauth]
debug3: mm_request_receive entering
debug3: monitor_read: checking request 39
debug1: Got no client credentials
debug3: mm_request_send entering: type 40
Connection closed by 192.168.0.2 [preauth]
debug1: do_cleanup [preauth]
debug1: monitor_read_log: child log fd closed
debug3: mm_request_receive entering
debug1: do_cleanup
debug1: Killing privsep child 6040



This is from the client logging into the same machine with gsissh -vvvv.

I needed to add -bits 2048 to the grid-proxy-init line to avoid other issues that I believe come from OpenSSL 1.0.1.

Attached are the changes I made to globus_gssapi_gsi-10.7.

globus_gssapi_gsi.patch.1.0.1 is a patch to fix a problem with an older application, gssklogd. When getting a context lifetime, the returned value is the maximum of the local cert's and the peer cert's lifetimes. It would seem the returned value should be the minimum of those two, as the context will become invalid when the shorter-lived of the two expires.
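
The lifetime argument can be sketched as follows. This is only an illustration of the reasoning, not the attached patch; `context_lifetime` and its parameters are hypothetical names, and lifetimes are assumed to be seconds remaining.

```c
#include <time.h>

/* Hypothetical helper: remaining lifetime of a security context, given
 * the remaining lifetimes of the local and peer credentials. The context
 * is usable only while both are valid, so the shorter one bounds it.
 * Negative inputs are treated as already expired. */
time_t
context_lifetime(time_t local_remaining, time_t peer_remaining)
{
    if (local_remaining < 0)
    {
        local_remaining = 0;
    }
    if (peer_remaining < 0)
    {
        peer_remaining = 0;
    }
    return (local_remaining < peer_remaining)
        ? local_remaining
        : peer_remaining;
}
```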

If I did patch this correctly, what else can I do to correct the GSSAPI MIC issue? Or does the problem lie in gsi-openssh?
I should note that other services do seem to work correctly: GRAM seems OK, as well as gsiftp, and with some finesse so does gssklog.

Comments

mcoyne - 2013-03-13

Some additional information:

mcoyne@oldtimer:/usr/include/openssl> openssl version
OpenSSL 1.0.1e 11 Feb 2013

As far as I can tell, I am getting NID_aes_256_gcm in the switch statement added in gss_get_mic.c. Which hash, if any, goes with it: EVP_md5, EVP_sha1, or is there an EVP_gcm?

from openssl's obj_mac.h

#define SN_aes_256_gcm          "id-aes256-GCM"
#define LN_aes_256_gcm          "aes-256-gcm"
#define NID_aes_256_gcm         901
#define OBJ_aes_256_gcm         OBJ_aes,46L



gdb /usr/bin/gsissh -vvv ...

Breakpoint 2, gss_get_mic (minor_status=minor_status@entry=0x6a7c84,
    context_handle=0x6a80f0, qop_req=qop_req@entry=0,
    message_buffer=message_buffer@entry=0x7fffffffb5f0,
    message_token=message_token@entry=0x7fffffffb5e0) at get_mic.c:153
153         hash = context->gss_ssl->write_hash->digest;
(gdb) print context->gss_ssl->write_hash->digest
$1 = (const EVP_MD *) 0x0
(gdb) s

Breakpoint 4, gss_get_mic (minor_status=minor_status@entry=0x6a7c84,
    context_handle=0x6a80f0, qop_req=qop_req@entry=0,
    message_buffer=message_buffer@entry=0x7fffffffb5f0,
    message_token=message_token@entry=0x7fffffffb5e0) at get_mic.c:161
161         if(hash == NULL)
(gdb) s
164              switch(EVP_CIPHER_CTX_nid(cctx))
(gdb) print EVP_CIPHER_CTX_nid(cctx)
$2 = 901
(gdb) s
178              GLOBUS_GSI_GSSAPI_MALLOC_ERROR(minor_status);
(gdb) list
173         #endif
174         #endif
175         if(hash == NULL)
176              {
177              /* Shouldn't happen: some error occurred */
178              GLOBUS_GSI_GSSAPI_MALLOC_ERROR(minor_status);
179              major_status = GSS_S_FAILURE;
180              goto unlock_mutex;
181              }
182         md_size = EVP_MD_size(hash);
(gdb) print minor_status
$3 = (OM_uint32 *) 0x6a7c84
(gdb) help s
Step program until it reaches a different source line.
Argument N means do this N times (or till program stops for another reason).
(gdb) list 160
155         /* Some versions of OpenSSL use special ciphers which
156         * combine HMAC with the encryption operation:
157         * for these ssl->write_hash is NULL.
158         * If the cipher context is one of these set the

mcoyne - 2013-03-23

By adding this
+              case NID_aes_128_gcm:
+              case NID_aes_192_gcm:
+              case NID_aes_256_gcm: hash = EVP_md_null();
+                                              break;
to the patch proposed in GT-363 and correcting some divide-by-zero errors, I got gsissh to work around the GCM crypto type.

Globus Toolkit/GT-376

Summary

globus_ftp_control_data_query_channels() SIGSEGV on ABRT

Details

Type: Bug

Status: Open

Description

On connection abort, a call to globus_gridftp_server_get_optimal_concurrency() causes a segfault in globus_ftp_control_data_query_channels() because the transfer handle is null (line 2316):

2314     globus_mutex_lock(&dc_handle->mutex);
2315     {
2316         if(stripe_ndx >= transfer_handle->stripe_count)
2317         {
2318             res = globus_error_put(globus_error_construct_string(
2319                       GLOBUS_FTP_CONTROL_MODULE,
2320                       GLOBUS_NULL,
2321                       "Invalid Stripe index."));
2322         }
2323         else
2324         {
2325             stripe = &transfer_handle->stripes[stripe_ndx];
2326             *num_channels = stripe->connection_count;
2327         }
2328     }
2329     globus_mutex_unlock(&dc_handle->mutex);

Looking at the globus_gfs_operation_t Op:

op->state = GLOBUS_L_GFS_DATA_ABORTING
op->type  = GLOBUS_L_GFS_DATA_INFO_TYPE_RECV
op->data_handle->state = GLOBUS_L_GFS_DATA_HANDLE_TE_PRE_CLOSED
op->data_handle->data_channel->dc_handle->state = GLOBUS_FTP_DATA_NONE
op->data_handle->data_channel->cc_handle->cc_state = GLOBUS_FTP_CONTROL_UNCONNECTED

This all happens in a short amount of time. The call sequence is:

DSI's recv_func()
globus_gridftp_server_begin_transfer()
globus_gridftp_server_get_optimal_concurrency()

I suspect that this has to do with how GO queues commands to a service and the abort can occur out of band. The DSI is preparing for the STOR while the server is already tearing it down.

Normally I would consider this to be minor since the service is apparently shutting down. However, with GO this is happening quite often and leaving behind a large number of core files.
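
A defensive check of the following shape would turn the crash into an error return. This is a minimal sketch, not a proposed patch: the structures are simplified stand-ins that model only the fields used in the excerpt above, and `query_channels` is a hypothetical name.

```c
#include <stddef.h>

/* Simplified stand-ins for the control-library structures. */
typedef struct { int connection_count; } stripe_t;
typedef struct { int stripe_count; stripe_t *stripes; } transfer_handle_t;

/* Returns 0 and stores the channel count on success, or -1 when the
 * transfer handle is gone (for example after an abort) or the stripe
 * index is out of range. */
int
query_channels(const transfer_handle_t *transfer_handle,
               int stripe_ndx,
               int *num_channels)
{
    if (transfer_handle == NULL || num_channels == NULL)
    {
        return -1;
    }
    if (stripe_ndx < 0 || stripe_ndx >= transfer_handle->stripe_count)
    {
        return -1;
    }
    *num_channels = transfer_handle->stripes[stripe_ndx].connection_count;
    return 0;
}
```

In the real function the check would presumably sit inside the existing mutex-protected block, before the stripe index comparison on line 2316.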

Comments

Globus Toolkit/GT-377

Summary

abort() in globus_ftp_control_handle_destroy() → globus_list_remove()

Details

Type: Bug

Status: Open

Description

On connection shutdown, we get cores with the following trace:

#4  0x00b3bdf0 in raise () from /lib/libc.so.6
#5  0x00b3d701 in abort () from /lib/libc.so.6
#6  0x00b3526b in __assert_fail () from /lib/libc.so.6
#7  0x558fdfab in globus_list_rest (head=0x0) at globus_list.c:95
#8  0x558fe8e7 in globus_list_remove (headp=0x555eb700, entry=0x56132370) at globus_list.c:474
#9  0x555c899d in globus_ftp_control_handle_destroy (handle=0x56133cf0) at globus_ftp_control_client.c:261
#10 0x55576531 in globus_l_gfs_data_handle_free (data_handle=0x56133ca8) at globus_i_gfs_data.c:4975
#11 0x55575d3f in globus_l_gfs_data_fc_return (op=0x5613a120) at globus_i_gfs_data.c:4758
#12 0x55575e86 in globus_l_gfs_data_complete_fc_cb (callback_arg=0x5613a120, ftp_handle=0x56133cf0, error=0x0) at globus_i_gfs_data.c:4795
#13 0x555da6b4 in globus_l_ftp_control_close_kickout (user_args=0x56133cf0) at globus_ftp_control_data.c:7450
#14 0x558f0d79 in globus_l_callback_thread_poll (user_arg=0x559196a0) at globus_callback_threads.c:2512
#15 0x5590b677 in globus_l_thread_pool_thread_start (user_arg=0x5611bf80) at globus_thread_pool.c:284
#16 0x5572f89b in thread_starter (temparg=0x805223c) at globus_thread_pthreads.c:285
#17 0x55a92852 in start_thread () from /lib/libpthread.so.0
#18 0x00be584e in clone () from /lib/libc.so.6

Comments

Jason Alt - 2013-04-01

It appears that if a problem arises with the transfer after calling globus_gridftp_server_begin_transfer(), then the data handle is freed twice:

Normal tear down:
#0  globus_ftp_control_handle_destroy (handle=0x7fffe404b8e8) at globus_ftp_control_client.c:233
#1  0x00007ffff7b91ec9 in globus_l_gfs_data_handle_free (data_handle=0x7fffe404b870) at globus_i_gfs_data.c:5056
#2  0x00007ffff7b9553f in globus_i_gfs_data_request_handle_destroy (ipc_handle=, in_session_arg=0x7fffe401cba0, data_arg=0x1) at globus_i_gfs_data.c:5229
#3  0x00007ffff7bb6bff in globus_l_gfs_request_data_destroy (user_data_arg=0x1, user_arg=0x7fffec000eb0) at globus_i_gfs_control.c:2736
#4  0x00007ffff7519716 in globus_l_gsc_user_data_destroy_cb_kickout (user_arg=0x7fffe404ee70) at globus_gridftp_server_control.c:1612
#5  0x00007ffff4dd71cc in globus_l_callback_thread_poll (user_arg=0x7ffff5005820) at globus_callback_threads.c:2512
#6  0x00007ffff4def0dd in globus_l_thread_pool_thread_start (user_arg=0x60beb0) at globus_thread_pool.c:222
#7  0x00007ffff25d803b in thread_starter (temparg=) at globus_thread_pthreads.c:285
#8  0x00007ffff4923851 in start_thread () from /lib64/libpthread.so.0
#9  0x00007ffff46716dd in clone () from /lib64/libc.so.6

Second tear down via globus_i_gfs_data_request_transfer_event():
#0  globus_ftp_control_handle_destroy (handle=0x7fffe404b8e8) at globus_ftp_control_client.c:233
#1  0x00007ffff7b91ec9 in globus_l_gfs_data_handle_free (data_handle=0x7fffe404b870) at globus_i_gfs_data.c:5056
#2  0x00007ffff7b9211d in globus_l_gfs_data_fc_return (op=0x7fffe404eec0) at globus_i_gfs_data.c:4839
#3  0x00007ffff7b96b1c in globus_l_gfs_data_complete_fc_cb (callback_arg=0x7fffe404eec0, ftp_handle=, error=) at globus_i_gfs_data.c:4876
#4  0x00007ffff4dd71cc in globus_l_callback_thread_poll (user_arg=0x7ffff5005820) at globus_callback_threads.c:2512
#5  0x00007ffff4def0dd in globus_l_thread_pool_thread_start (user_arg=0x60c010) at globus_thread_pool.c:222
#6  0x00007ffff25d803b in thread_starter (temparg=) at globus_thread_pthreads.c:285
#7  0x00007ffff4923851 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffff46716dd in clone () from /lib64/libc.so.6

Jason Alt - 2015-01-08

I spent a significant amount of time debugging this over Christmas break. It turns out that in order to support restart markers with the HPSS DSI (see also GT-517), I need to send the restart marker on error (it's the only chance I have to send it). Unfortunately, this bug causes the server to crash before I can get the restart marker off, so this issue is of greater importance to us than originally believed.

This was all verified with GT6. The main culprit seems to be this function here:

globus_gridftp_server_control-3.6/globus_gridftp_server_control_events.c:globus_l_gsc_event_done_cb()

static
void
globus_l_gsc_event_done_cb(
    void *                              user_arg)
{
    globus_i_gsc_op_t *                 op;
    globus_i_gsc_event_data_t *         event;
    globus_i_gsc_server_handle_t *      server_handle;

    op = (globus_i_gsc_op_t *) user_arg;
    event = &op->event;
    server_handle = op->server_handle;

    event->user_cb(
        op,
        GLOBUS_GRIDFTP_SERVER_CONTROL_EVENT_TRANSFER_COMPLETE,
        event->user_arg);

    if(event->stripe_total != NULL)
    {
        globus_free(event->stripe_total);
    }

    globus_mutex_lock(&server_handle->mutex);
    {
        if(op->data_destroy_obj)
        {
            globus_i_guc_data_object_destroy(
                op->server_handle, op->data_destroy_obj);
        }
        globus_i_gsc_op_destroy(op);
    }
    globus_mutex_unlock(&server_handle->mutex);
}

Both the event->user_cb() and globus_i_guc_data_object_destroy() start off callbacks that eventually result in calls to globus_l_gfs_data_handle_free(). I believe this is not a problem that is unique to transfers ending with error; I believe ABRT will result in the same effect as will anything that would require a call to globus_ftp_control_data_force_close(). It seems the error handling logic in globus_gridftp_server-7.17/globus_i_gfs_data.c is duplicating the 'clean' shutdown logic.

I don't know the intent of the original author, but I took a stab at a fix in order to verify my assumptions. The server does not crash on "transfer ends in error" with the following change:

globus_gridftp_server-7.17/globus_i_gfs_data.c:globus_l_gfs_data_fc_return()

static
void
globus_l_gfs_data_fc_return(
    globus_l_gfs_data_operation_t *     op)
{
    GlobusGFSName(globus_l_gfs_data_fc_return);

    GlobusGFSDebugEnter();

    switch(op->data_handle->state)
    {
        case GLOBUS_L_GFS_DATA_HANDLE_CLOSING:
            op->data_handle->state = GLOBUS_L_GFS_DATA_HANDLE_TE_PRE_CLOSED;
            break;

        case GLOBUS_L_GFS_DATA_HANDLE_CLOSING_AND_DESTROYED:
            /* ok free it */
#ifdef NOT
            globus_l_gfs_data_handle_free(op->data_handle);
            op->data_handle = NULL;
#endif
            break;

        case GLOBUS_L_GFS_DATA_HANDLE_TE_PRE_CLOSED:
        case GLOBUS_L_GFS_DATA_HANDLE_VALID:
        case GLOBUS_L_GFS_DATA_HANDLE_INUSE:
        case GLOBUS_L_GFS_DATA_HANDLE_CLOSED:
        default:
            globus_assert(0 && "possible memory corruption");
            break;
    }

    GlobusGFSDebugExit();
}

This takes globus_l_gfs_data_handle_free() out of the error logic. It may result in a memory leak or something worse since "op->data_handle = NULL" is also skipped but it works.
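
An alternative to skipping the free on one path is to let both teardown paths release a counted reference, so the handle is destroyed exactly once by whichever runs last. The sketch below illustrates that general technique only; the type and function names are hypothetical, it is not how globus_i_gfs_data.c is written, and real code would need to do the counting under the handle's mutex.

```c
#include <stdlib.h>

/* Hypothetical, simplified data handle with a reference count. */
typedef struct { int refs; } data_handle_t;

/* Create a handle owned by one reference. Returns NULL on failure. */
data_handle_t *
data_handle_new(void)
{
    data_handle_t *h = malloc(sizeof(*h));
    if (h != NULL)
    {
        h->refs = 1;
    }
    return h;
}

/* Record an additional owner, e.g. the error path taking its own hold. */
void
data_handle_ref(data_handle_t *h)
{
    h->refs++;
}

/* Drop one reference. Frees the handle and returns 1 when the last
 * reference goes away, otherwise returns 0 and leaves it intact. */
int
data_handle_unref(data_handle_t *h)
{
    if (--h->refs > 0)
    {
        return 0;
    }
    free(h);
    return 1;
}
```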

Globus Toolkit/GT-378

Summary

abort() in globus_i_gfs_data_session_stop()

Details

Type: Bug

Status: Open

Description

Full backtrace plus dump of session handle attached.

Thread 1 (Thread 0x560abb90 (LWP 10050)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x00b3bdf0 in raise () from /lib/libc.so.6
#2  0x00b3d701 in abort () from /lib/libc.so.6
#3  0x55906e27 in globus_silent_fatal () at globus_print.c:62
#4  0x55906e9d in globus_fatal (msg=0x559186cf "%s %s\n%s %s") at globus_print.c:93
#5  0x5590b251 in globus_i_thread_report_bad_rc (rc=22, message=0x557327c4 "GLOBUSTHREAD: pthread_mutex_lock() failed\n") at globus_thread_common.c:145
#6  0x55731ece in globus_l_pthread_mutex_lock (mut=0x806d184) at globus_thread_pthreads.c:594
#7  0x5590be06 in globus_mutex_lock (mutex=0x806d184) at globus_thread.c:357
#8  0x55581455 in globus_i_gfs_data_session_stop (ipc_handle=0x0, session_arg=0x806d0e0) at globus_i_gfs_data.c:8892
#9  0x555a0f03 in globus_l_gfs_channel_close_cb (handle=0x564290b0, result=0, user_arg=0x805d628) at globus_i_gfs_control.c:517
#10 0x556c6bff in globus_l_xio_open_close_callback_kickout (user_arg=0x8077a80) at globus_xio_handle.c:998
#11 0x556c6ac1 in globus_l_xio_open_close_callback (op=0x8077a80, result=0, user_arg=0x0) at globus_xio_handle.c:965
#12 0x556d2697 in globus_l_xio_driver_op_close_kickout (user_arg=0x8077a80) at globus_xio_driver.c:813
#13 0x558f2d79 in globus_l_callback_thread_poll (user_arg=0x5591b6a0) at globus_callback_threads.c:2512
#14 0x5590d444 in globus_l_thread_pool_thread_start (user_arg=0x8053aa0) at globus_thread_pool.c:222
#15 0x5573189b in thread_starter (temparg=0x8052204) at globus_thread_pthreads.c:285
#16 0x55a94852 in start_thread () from /lib/libpthread.so.0
#17 0x00be584e in clone () from /lib/libc.so.6

Comments

Jason Alt - 2013-03-24

This one seems to be related:

#0  0xffffe410 in __kernel_vsyscall ()
#1  0x00b3bdf0 in raise () from /lib/libc.so.6
#2  0x00b3d701 in abort () from /lib/libc.so.6
#3  0x00b743ab in __libc_message () from /lib/libc.so.6
#4  0x00b7c6c5 in _int_free () from /lib/libc.so.6
#5  0x00b7cb09 in free () from /lib/libc.so.6
#6  0x5556f33b in globus_l_gfs_free_session_handle (session_handle=0x806d1b0) at globus_i_gfs_data.c:1026
#7  0x555814ed in globus_i_gfs_data_session_stop (ipc_handle=0x0, session_arg=0x806d1b0) at globus_i_gfs_data.c:8914
#8  0x555a0f03 in globus_l_gfs_channel_close_cb (handle=0x56429028, result=0, user_arg=0x805d6f8) at globus_i_gfs_control.c:517
#9  0x556c6bff in globus_l_xio_open_close_callback_kickout (user_arg=0x8077b50) at globus_xio_handle.c:998
#10 0x556c6ac1 in globus_l_xio_open_close_callback (op=0x8077b50, result=0, user_arg=0x0) at globus_xio_handle.c:965
#11 0x556d2697 in globus_l_xio_driver_op_close_kickout (user_arg=0x8077b50) at globus_xio_driver.c:813
#12 0x558f2d79 in globus_l_callback_thread_poll (user_arg=0x5591b6a0) at globus_callback_threads.c:2512
#13 0x5590d444 in globus_l_thread_pool_thread_start (user_arg=0x80539f8) at globus_thread_pool.c:222
#14 0x5573189b in thread_starter (temparg=0x80521f8) at globus_thread_pthreads.c:285
#15 0x55a94852 in start_thread () from /lib/libpthread.so.0
#16 0x00be584e in clone () from /lib/libc.so.6

Globus Toolkit/GT-379

Summary

Abort() in globus_i_gfs_data_request_recv()

Details

Type: Bug

Status: Open

Description

Thread 1 (Thread 0x560abb90 (LWP 32058)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x00b3bdf0 in raise () from /lib/libc.so.6
#2  0x00b3d701 in abort () from /lib/libc.so.6
#3  0x5557aa0a in globus_i_gfs_data_request_recv (ipc_handle=0x0, session_arg=0x56143f80, id=0, recv_info=0x561b6508, cb=0x555a44e9 , event_cb=0x555a42d5 , user_arg=0x56159c50) at globus_i_gfs_data.c:5957
#4  0x555a4d8c in globus_l_gfs_request_recv (op=0x56165628, data_handle=0x1, path=0x5615dc00 "~/BW-10560/dumps/0002/FV--hires01/FV-hires01-0002.bob8iak", mod_name=0x0, mod_parms=0x0, range_list=0x561657b0, user_arg=0x561294e0) at globus_i_gfs_control.c:2291
#5  0x55616316 in globus_i_gsc_recv (op=0x56165628, path=0x56164650 "~/BW-10560/dumps/0002/FV--hires01/FV-hires01-0002.bob8iak", mod_name=0x0, mod_parms=0x0, transfer_cb=0x556204b5 , user_arg=0x56163ee8) at globus_gridftp_server_control.c:5043
#6  0x556206db in globus_l_gsc_cmd_transfer (wrapper=0x56163ee8) at globus_gridftp_server_control_commands.c:2696
#7  0x55621876 in globus_l_gsc_cmd_stor_retr (op=0x56165628, full_command=0x56165708 "STOR ~/BW-10560/dumps/0002/FV--hires01/FV-hires01-0002.bob8iak\r\n", cmd_a=0x561591f8, argc=2, user_arg=0x0) at globus_gridftp_server_control_commands.c:3131
#8  0x5560fc3a in globus_l_gsc_command_callout (user_arg=0x56165628) at globus_gridftp_server_control.c:2355
#9  0x558f2d79 in globus_l_callback_thread_poll (user_arg=0x5591b6a0) at globus_callback_threads.c:2512
#10 0x5590d444 in globus_l_thread_pool_thread_start (user_arg=0x8053aa0) at globus_thread_pool.c:222
#11 0x5573189b in thread_starter (temparg=0x8052204) at globus_thread_pthreads.c:285
#12 0x55a94852 in start_thread () from /lib/libpthread.so.0
#13 0x00be584e in clone () from /lib/libc.so.6

(gdb) print data_handle->state
$2 = GLOBUS_L_GFS_DATA_HANDLE_CLOSED

Comments

Globus Toolkit/GT-380

Summary

Abort in globus_l_gfs_new_server_cb()

Details

Type: Bug

Status: Open

Description

Thread 1 (Thread 0x560abb90 (LWP 2739)):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x00b3bdf0 in raise () from /lib/libc.so.6
#2  0x00b3d701 in abort () from /lib/libc.so.6
#3  0x00b743ab in __libc_message () from /lib/libc.so.6
#4  0x00b7c6c5 in _int_free () from /lib/libc.so.6
#5  0x00b7cb09 in free () from /lib/libc.so.6
#6  0x0804b52f in globus_l_gfs_new_server_cb (handle=0x561291d0, result=0, user_arg=0x0) at globus_gridftp_server.c:734
#7  0x556c6bff in globus_l_xio_open_close_callback_kickout (user_arg=0x56129240) at globus_xio_handle.c:998
#8  0x556c6ac1 in globus_l_xio_open_close_callback (op=0x56129240, result=0, user_arg=0x0) at globus_xio_handle.c:965
#9  0x556d2ad6 in globus_l_xio_driver_open_op_kickout (user_arg=0x56129240) at globus_xio_driver.c:906
#10 0x558f2d79 in globus_l_callback_thread_poll (user_arg=0x5591b6a0) at globus_callback_threads.c:2512
#11 0x5590d444 in globus_l_thread_pool_thread_start (user_arg=0x8053aa0) at globus_thread_pool.c:222
#12 0x5573189b in thread_starter (temparg=0x8052204) at globus_thread_pthreads.c:285
#13 0x55a94852 in start_thread () from /lib/libpthread.so.0
#14 0x00be584e in clone () from /lib/libc.so.6

Comments

Globus Toolkit/GT-381

Summary

globus-url-copy client (GT5.2.4) fails to authenticate

Details

Type: Bug

Status: Resolved 2013-04-22

Description

We are seeing an access issue using the GT5.2.4 client (source build) and Globus Server versions 5.2.4 (ORNL), 5.0.2 (ORNL), and 5.0.3 (NERSC). Switching the client from GT5.2.4 to our previous build of GT5.0.4 (compiled using the same build script) allows the transfers to work properly with no other changes on the client or server side.


Errors Seen:

Server Side:

 [CLIENT]: USER :globus-mapping:
 [SERVER]: 331 Password required for :globus-mapping:.
 [CLIENT]: PASS
 [CLIENT]: PASS
 [SERVER]: 530 Login incorrect. : Access denied by configuration.

Client Side :

debug: data callback, error globus_ftp_client: the server responded with an error, buffer 0x7f23899a0010, length 0, offset=0, eof=true
debug: operation complete

error: globus_ftp_client: the server responded with an error
530 Login incorrect. : Access denied by configuration.

Comments

Mike Link - 2013-04-03

This issue has been fixed.  The globus_ftp_control-4.6 update package is now available at http://www.globus.org/toolkit/advisories.html

Globus Toolkit/GT-382

Summary

globus-url-copy GT 5.2.4 fails data transfers with server side error whereas GT 5.2.3 works fine

Details

Type: Bug

Status: Resolved 2013-04-22

Description

All attempts to transfer files fail using the GT 5.2.4 version of globus-url-copy when the GT 5.2.3 version succeeds using the same user proxy cert.  I've attached debug output from the globus-url-copy and the server side.  The client was built with:

./configure
make gridftp gsi-myproxy

Comments

Mike Link - 2013-04-03

This issue has been fixed.  The globus_ftp_control-4.6 update package is now available at http://www.globus.org/toolkit/advisories.html

cruff@ucar.edu - 2013-04-04

I am unable to build the contents of that tar archive against the contents
of /usr/local/globus-5.2.4.  How should I be applying this against the
5.2.4 source tree?

bells% ./configure --prefix=/usr/local/globus-5.2.4 --with-flavor=gcc64dbg
checking whether to enable maintainer-specific portions of Makefiles... no
ERROR: Flavor gcc64dbg has not been installed

bells% ls /usr/local/globus-5.2.4/share/globus
aclocal  config.guess  globus-args-parser-header     globus-build-env-noflavor.sh  globus-gram-protocol-constants.sh globus-sh-tools.sh       gpt-bootstrap.sh  packages
amdir    flavors       globus-bootstrap.sh           globus-gass-cache-util.pl     globus-job-manager-script.pl globus-sh-tools-vars.sh  gram-audit        sftp-server
bundles  gcc64dbg      globus-build-env-gcc64dbg.sh  globus_gram_job_manager       globus-script-initializer gpt                      gridftp-ssh       ssh-keysign

David Carver - 2013-04-05

Craig,
See section "Installing updates from source" at http://www.globus.org/toolkit/advisories.html

But you should first set your environment variable to point to the installed package

Example:
 export GLOBUS_LOCATION=/usr/local/globus-5.2.4
 export GPT_LOCATION=/usr/local/globus-5.2.4
 export PATH=$PATH:$GLOBUS_LOCATION/bin:$GLOBUS_LOCATION/sbin

 gpt-build -update globus_ftp_control-4.6.tar.gz gcc64dbg

cruff@ucar.edu - 2013-04-05

Ah, thanks.  I didn't even notice the text above the table nor that there was something below it.  It would be helpful if an INSTALL instructions file was included inside the update.

Globus Toolkit/GT-383

Summary

sshftp is broken in 5.2.4

Details

Type: Task

Status: Open

Description

globus-gridftp-server-enable-sshftp crashes in 5.2.4. Mike thinks this probably broke when he fixed the other bug that resulted in 0.0.0.0 in the logs.

Comments

Globus Toolkit/GT-384

Summary

GRAM mishandles long script responses

Details

Type: Bug

Status: Resolved 2013-06-26

Description

The SLURM debugging hit a snag when a large block of log info was returned from the LRM adapter. The GRAM script response parser doesn't handle continued messages (longer than the first read) nor does it handle a message longer than the buffer size (it overwrites part of the message in either case).
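
The usual remedy for both cases is to accumulate reads in a growable buffer and only hand complete lines to the parser. The following is a minimal sketch of that approach, not the GRAM parser itself; `line_buffer_t` and its functions are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator; initialise with { NULL, 0, 0 }. */
typedef struct { char *data; size_t len; size_t cap; } line_buffer_t;

/* Append n bytes from one read, growing the buffer as needed.
 * Returns 0 on success, -1 if memory cannot be allocated. */
int
line_buffer_append(line_buffer_t *b, const char *chunk, size_t n)
{
    if (b->len + n + 1 > b->cap)
    {
        size_t new_cap = b->cap ? b->cap : 64;
        char *p;

        while (new_cap < b->len + n + 1)
        {
            new_cap *= 2;
        }
        p = realloc(b->data, new_cap);
        if (p == NULL)
        {
            return -1;
        }
        b->data = p;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, chunk, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Remove and return the first complete line, without its '\n'.
 * Returns NULL if no full line is buffered yet. Caller frees. */
char *
line_buffer_take_line(line_buffer_t *b)
{
    char *nl;
    char *line;
    size_t line_len;

    if (b->data == NULL)
    {
        return NULL;
    }
    nl = memchr(b->data, '\n', b->len);
    if (nl == NULL)
    {
        return NULL;
    }
    line_len = (size_t) (nl - b->data);
    line = malloc(line_len + 1);
    if (line == NULL)
    {
        return NULL;
    }
    memcpy(line, b->data, line_len);
    line[line_len] = '\0';
    /* Shift whatever follows the newline to the front of the buffer. */
    b->len -= line_len + 1;
    memmove(b->data, nl + 1, b->len);
    b->data[b->len] = '\0';
    return line;
}
```

A message split across two reads stays in the buffer after the first append and comes out whole after the second, and a message longer than the initial capacity simply triggers a reallocation.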

Comments

Globus Toolkit/GT-385

Summary

Implement a native packaged GCMU

Details

Type: New Feature

Status: Resolved 2013-05-24

Description

Globus Online has provided a tarball version of GCMU for some time.  In order to support easier updates and other improvements, create a native packaged version of the Globus Connect Multiuser functionality.

Comments

Stuart Martin - 2013-05-24

An initial version has been made available for Beta testing.

Globus Toolkit/GT-386

Summary

Add support for a GRAM SLURM adapter in GT

Details

Type: New Feature

Status: Resolved 2013-10-22

Description

There is use of SLURM in XSEDE (TACC/Stampede), OSG and Europe.  Adding support for SLURM will help these communities by providing a single trusted implementation.

Comments

Joe Bester - 2013-09-10

This is checked into CVS and being built, though needs better testing. It has been tested with the slurm-llnl package on debian 7.

Globus Toolkit/GT-387

Summary

Bug 3934 is not fixed in Globus 5.x branches

Details

Type: Bug

Status: Resolved 2013-09-13

Description

http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=3934 is not fixed in newer versions,
it breaks --stdin_pass of myproxy-init

Comments

salvet - 2013-04-30

read_passphrase_stdin() in myproxy_read_pass.c has the same issue

Jim Basney - 2013-07-12

Fix for myproxy_read_pass.c committed to CVS:
http://lists.globus.org/pipermail/myproxy-commit/2013-July/000760.html

Fix will appear in MyProxy v6.0.

I agree the original fix for Bug 3934 in gsi/proxy/proxy_utils/source/programs/grid_proxy_init.c on the globus_4_0 branch (revision 1.37.4.3) was never merged to the trunk and is therefore missing from Globus 5.x sources. Stu, please re-assign this issue to the appropriate person to re-apply the grid_proxy_init.c fix.

Jim Basney - 2013-07-12

Stu please assign to the appropriate person to re-apply the grid_proxy_init.c fix in Globus GSI code.

Globus Toolkit/GT-388

Summary

Sharing access check is done outside of chroot, but share control files are created inside chroot.

Details

Type: Bug

Status: Resolved 2013-06-26

Description

This is only when using the -chroot-path config option.

Comments

Mike Link - 2013-05-17

Fixed in server version 6.27 and in the build repo.

Globus Toolkit/GT-389

Summary

globusrun and globus-job-run don’t report job failures to user

Details

Type: Bug

Status: Open

Description

When using globus-job-run and globusrun with the -quiet flag, if the GRAM jobmanager reports the job's status as FAILED, the tool exits with exit code 0. This is identical to what happens if the job completes normally. The failure needs to be reported to the user in some fashion.
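
One simple way to report the failure is through the process exit status. The sketch below assumes a hypothetical `job_state_t` enum for illustration; it does not use the GRAM protocol constants or reflect how globusrun is structured.

```c
#include <stdlib.h>

/* Hypothetical final job states, for illustration only. */
typedef enum { JOB_DONE, JOB_FAILED } job_state_t;

/* Map the final job state to a process exit status so that scripts
 * calling the tool can tell a failed job from a completed one. */
int
exit_code_for(job_state_t final_state)
{
    return (final_state == JOB_FAILED) ? EXIT_FAILURE : EXIT_SUCCESS;
}
```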

Comments

Globus Toolkit/GT-390

Summary

globus_callout loader fails to load plugins on systems with broken libtool-ltdl

Details

Type: Bug

Status: Resolved 2013-06-26

Description

globus_callout uses lt_dlopenext to load the gridmap callouts, which are installed as lib*.so.  In older versions of libtool that call is broken such that it only looks for lib*.la instead of lib*.so.  globus_extension (the plugin loading code used for xio and others) already has a work-around for this issue.

Comments

Mike Link - 2013-05-17

Fixed in the globus-callout package in cvs and the build repo.

Globus Toolkit/GT-391

Summary

Problem with share created with a root containing a double quote

Details

Type: Bug

Status: Resolved 2013-10-22

Description

The share file uses the standard gridftp config file parser, which doesn't allow quotes in parameter values.  The share root should be stored encoded to avoid this issue.
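
Percent-encoding is one encoding that would keep quotes out of the stored value. The sketch below shows the idea under that assumption; `encode_share_root` is a hypothetical helper, and the encoding actually used by the fix may differ.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: percent-encode every byte of path except ASCII
 * letters, digits, and the characters / - _ . ~ so the result contains
 * no quotes or whitespace. Returns a new string the caller must free,
 * or NULL if memory cannot be allocated. */
char *
encode_share_root(const char *path)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = strlen(path);
    char *out = malloc(3 * n + 1); /* worst case: every byte encoded */
    char *p = out;
    size_t i;

    if (out == NULL)
    {
        return NULL;
    }
    for (i = 0; i < n; i++)
    {
        unsigned char c = (unsigned char) path[i];

        if ((c < 0x80 && isalnum(c)) ||
            c == '/' || c == '-' || c == '_' || c == '.' || c == '~')
        {
            *p++ = (char) c;
        }
        else
        {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```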

Comments

Mike Link - 2013-05-17

Fixed in server version 6.27 (sharing rev6) in the build repo.

Globus Toolkit/GT-392

Summary

Allow variables in restrict-paths and sharing-restrict-paths

Details

Type: Improvement

Status: Resolved 2013-05-17

Description

Multiple users have requested the ability to use the $USER and $HOME variables with path restrictions, e.g., RW/scratch/$USER to only allow users to access their own scratch space.
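
The substitution itself can be sketched as below. This is an illustration of the requested behaviour, not the server's implementation; `expand_path_vars` is a hypothetical helper that takes the user name and home directory as arguments.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a copy of pattern with each "$USER"
 * replaced by user and each "$HOME" replaced by home. The caller frees
 * the result. Returns NULL if memory cannot be allocated. */
char *
expand_path_vars(const char *pattern, const char *user, const char *home)
{
    size_t user_len = strlen(user);
    size_t home_len = strlen(home);
    size_t out_len = 0;
    const char *s;
    char *out;
    char *p;

    /* First pass: compute the size of the expanded string. */
    for (s = pattern; *s != '\0'; )
    {
        if (strncmp(s, "$USER", 5) == 0) { out_len += user_len; s += 5; }
        else if (strncmp(s, "$HOME", 5) == 0) { out_len += home_len; s += 5; }
        else { out_len++; s++; }
    }
    out = malloc(out_len + 1);
    if (out == NULL)
    {
        return NULL;
    }
    /* Second pass: copy literal bytes and substitute the variables. */
    for (s = pattern, p = out; *s != '\0'; )
    {
        if (strncmp(s, "$USER", 5) == 0)
        {
            memcpy(p, user, user_len);
            p += user_len;
            s += 5;
        }
        else if (strncmp(s, "$HOME", 5) == 0)
        {
            memcpy(p, home, home_len);
            p += home_len;
            s += 5;
        }
        else
        {
            *p++ = *s++;
        }
    }
    *p = '\0';
    return out;
}
```

Note that this sketch matches by prefix, so a longer name such as $USERNAME would have its $USER part expanded; a real implementation would need to decide how to treat that case.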

Comments

Mike Link - 2013-05-17

Done in 6.27 in the build repo.

Globus Toolkit/GT-393

Summary

grid-proxy-init doesn’t support LDAP by default

Details

Type: Bug

Status: Open

Description

The grid-proxy-init command doesn't work with LDAP in its default behaviour.

Thus, running "grid-proxy-init" command produces the following error message:
$GLOBUS_LOCATION/bin/grid-proxy-init
Error: Couldn't find valid credentials to generate a proxy.
Use -debug for further information.

To get things running properly, the cert and key options have to be specified and are mandatory:
$GLOBUS_LOCATION/bin/grid-proxy-init -cert $HOME/.globus/usercert.pem -key $HOME/.globus/userkey.pem

When using LDAP, user accounts don't appear in the /etc/passwd file, which is one of the files used by the grid-proxy-init command.
Using user environment variables such as $HOME instead should solve this issue. Could you please modify it?


Best regards,
Agnes Ansari.
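
The lookup order suggested above can be sketched as follows. This is an assumption about how such a fallback might look, not the grid-proxy-init code; `lookup_home_dir` is a hypothetical name. It prefers $HOME and falls back to getpwuid(), which consults the system's configured account sources rather than reading the password file directly.

```c
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the user's home directory, or NULL if it
 * cannot be determined. $HOME wins when set and non-empty; otherwise the
 * account database entry for the current uid is used. The returned
 * pointer must not be freed and may be overwritten by later calls. */
const char *
lookup_home_dir(void)
{
    const char *home = getenv("HOME");
    struct passwd *pw;

    if (home != NULL && home[0] != '\0')
    {
        return home;
    }
    pw = getpwuid(getuid());
    return (pw != NULL) ? pw->pw_dir : NULL;
}
```

The default credential paths would then be built from the result, for example by appending /.globus/usercert.pem and /.globus/userkey.pem.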

Comments

Globus Toolkit/GT-394

Summary

new lines in file names not working

Details

Type: Bug

Status: Resolved 2013-06-04

Description

On Globus Toolkit 5.2.5rc0.  Commands are URL-encoded for clarity.

Command: MLST ~/tricky/hi%0A2
Message: Fatal FTP response
---
500-Command failed : System error in stat: No such file or directory
500-A system call failed: No such file or directory
500 End.


Command: STOR /tmp/try5/%0A
Message: Fatal FTP response
---
500-Command failed. : globus_l_gfs_file_open failed.
500-globus_xio: Unable to open file /tmp/try5
500-globus_xio: System error in open: Is a directory
500-globus_xio: A system call failed: Is a directory
500 End.

Comments

Mike Link - 2013-06-04

Dup of GT-395

Globus Toolkit/GT-395

Summary

utf-8 bytes don’t work

Details

Type: Bug

Status: Open

Description

This doesn't work in gt 5.2.  A directory of (bytes encoded):
/tmp/special/funny/utfdir/abcd%E2%80%99/

Command: MLSC /tmp/special/funny/utfdir/abcd%E2%80%99/
Message: Fatal FTP Response
---
500-Command failed : System error in stat: No such file or directory
500-A system call failed: No such file or directory
500 End.

MLSD fails too.  Maybe related to \n also not working?

Comments

Mike Link - 2013-06-04

This is an issue with clear ftp sessions.  The telnet driver swallows most non-7-bit ascii.

Globus Toolkit/GT-396

Summary

gsiftp: MLST mangles file with trailing newline

Details

Type: Bug

Status: Resolved 2013-06-26

Description

this is using gsiftp.  file: /share/special/notfunny/Directory%20Two/tfile6%3Fblah%3Darg%0A

05-22 18:12:24.246 _conn_send_command_now S0.28> MLST /share/special/notfunny/Directory Two/tfile6%3Fblah=arg%0A
05-22 18:12:24.282 _conn_cc_response S0.28< 250-status of /share/special/notfunny/Directory Two/tfile6%3Fblah=arg%0D%0A %0D%0A Type=file;Modify=20110202165452;Size=4;Perm=r;UNIX.mode=0644;UNIX.owner=koa;UNIX.uid=1001;UNIX.group=koa;UNIX.gid=1001;Unique=801-25cc7; /share/special/notfunny/Directory Two/tfile6%3Fblah=arg%0D%0A %0D%0A250 End.%0D%0A

Comments

Mike Link - 2013-06-04

Fixed in gridftp 6.30

Globus Toolkit/GT-397

Summary

gridftp doesn’t error if file with unrepresentable name is used (MLSC)

Details

Type: Bug

Status: Open

Description

I created a file with a name of abc\r\n.
MLSC just returns it as is, which ends up mangling it; GO thinks it's just called abc.  Recommend erroring on the MLSC in this case.

250-Contents of /tmp/rn/%0D%0A Type=cdir;Modify=20130522191920;Size=4096;Perm=el;UNIX.mode=0755;UNIX.owner=koa;UNIX.uid=1001;UNIX.group=koa;UNIX.gid=1001;Unique=801-a2af;X.count=3; .%0D%0A Type=pdir;Modify=20130522192002;Size=4096;Perm=cfmpel;UNIX.mode=1777;UNIX.owner=root;UNIX.uid=0;UNIX.group=root;UNIX.gid=0;Unique=801-3a981; ..%0D%0A Type=file;Modify=20130522191920;Size=0;Perm=r;UNIX.mode=0644;UNIX.owner=koa;UNIX.uid=1001;UNIX.group=koa;UNIX.gid=1001;Unique=801-a2b0; abc%0D%0A%0D%0A250 End.%0D%0A
05-22 19:20:09.836 _conn_unlink_cmd C0.4: unlinking
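
The mangling can be reproduced without a server. The sketch below assumes a client that splits the listing on CRLF and takes the text after the facts as the name; the shortened entry line and both function names are hypothetical and only illustrate why the name arrives as abc.

```python
def names_from_listing(payload):
    """Split a listing on CRLF and return the name part of each entry line."""
    names = []
    for line in payload.split("\r\n"):
        if not line.strip():
            continue  # blank lines carry no entry
        facts, sep, name = line.partition("; ")
        if sep:
            names.append(name)
    return names


def has_line_breaks(name):
    """True if the name cannot be sent safely in a CRLF-delimited listing."""
    return "\r" in name or "\n" in name


# Entry for a file literally named "abc\r\n", followed by the line terminator.
payload = " Type=file;Size=0; abc\r\n" + "\r\n"
print(names_from_listing(payload))  # ['abc']
print(has_line_breaks("abc\r\n"))   # True
```

A server-side check along the lines of has_line_breaks is the kind of test that would let MLSC return an error instead of a listing the client cannot parse correctly.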

Comments

Globus Toolkit/GT-398

Summary

Unable to transfer files using globus-url-copy without using -noedcu option

Details

Type: Bug

Status: Resolved 2013-06-04

Description

I'm having problems using globus-url-copy to copy a file from a Linux gsiftp server to my Mac OS X 10.8 laptop. I'm using gt-5.2.4 compiled using:

GLOBUS_LOCATION=/opt/ldg
FLAVOUR=gcc64dbg

./configure --prefix=${GLOBUS_LOCATION} \
  --with-gsiopensshargs="--without-openssl-header-check --sbindir=${GLOBUS_LOCATION}/sbin/ssh.d --sysconfdir=${GLOBUS_LOCATION}/etc/ssh --with-globus=${GLOBUS_LOCATION} --with-globus-flavor=${FLAVOUR}" \
  --with-flavor=${FLAVOUR}

CPPFLAGS="-I$GLOBUS_LOCATION/include" LDFLAGS="-L$GLOBUS_LOCATION/lib" make globus-gsi globus_gass_copy globus_usage globus-resource-management-client gsi-openssh gsi-myproxy

And then using the command:

$ globus-url-copy -dbg gsiftp://server/path/to/file/to/transfer file:///Users/ram/

results in the error:

[ram@mimir ~]$ globus-url-copy -dbg gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat  file:///Users/ram/
debug: starting to get gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat
debug: connecting to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat
debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
220 ldas-pcdev1.ligo.caltech.edu GridFTP Server 6.10 (gcc64, 1334324800-83) [Globus Toolkit 5.2.0] ready.

debug: authenticating with gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat
debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
230 User ram logged in.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
SITE HELP

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
214-The following commands are recognized:
    ALLO    APPE    REST    CWD     CDUP    DCAU    EPSV    FEAT
    ERET    MDTM    STAT    ESTO    HELP    LIST    MODE    NLST
    MLSC    MLSD    PASV    RNFR    MLSR    MLST    NOOP    OPTS
    STOR    PASS    PBSZ    PORT    PROT    SITE    EPRT    RETR
    SPOR    MFMT    SCKS    TREV    PWD     QUIT    SBUF    SIZE
    SPAS    STRU    SYST    RNTO    TYPE    USER    LANG    MKD
    RMD     DELE    CKSM    DCSC
214 End

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
FEAT

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
211-Extensions supported
 DCSC P,D
 MFMT
 AUTHZ_ASSERT
 MLSR
 MLSC
 UTF8
 LANG EN
 DCAU
 PARALLEL
 SIZE
 MLST Type*;Size*;Modify*;Perm*;Charset;UNIX.mode*;UNIX.owner*;UNIX.uid*;UNIX.group*;UNIX.gid*;Unique*;UNIX.slink*;X.count;
 ERET
 ESTO
 SPAS
 SPOR
 REST STREAM
 MDTM
 PASV AllowDelayed;
211 End.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
SITE CLIENTINFO scheme=gsiftp;appname="globus-url-copy";appver="8.6 (gcc64dbg, 1342633606-83) [Globus Toolkit 5.2.4]";
debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
250 OK.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
TYPE I
debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
200 Type set to I.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
PBSZ 1048576

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
200 PBSZ=1048576

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
PASV

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
227 Entering Passive Mode (131,215,115,249,157,191)

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
RETR /home/tzs/MaxEntWork/ER2/rawwave.mat

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
500-Command failed. : callback failed.
500-globus_gsi_gssapi: Error with gss credential handle
500-globus_credential: Valid credentials could not be found in any of the possible locations specified by the credential search order.
500-Valid credentials could not be found in any of the possible locations specified by the credential search order.
500-Attempt 1
500-globus_credential: Error reading host credential
500-globus_sysconfig: Error with certificate filename
500-globus_sysconfig: Error with certificate filename
500-globus_sysconfig: File is not owned by current user: /etc/grid-security/hostcert.pem is not owned by current user
500-Attempt 2
500-globus_credential: Error reading proxy credential
500-globus_sysconfig: Could not find a valid proxy certificate file location
500-globus_sysconfig: Error with key filename
500-globus_sysconfig: File does not exist: /tmp/x509up_u4142 is not a valid file
500-Attempt 3
500-globus_credential: Error reading user credential
500-globus_credential: Key is password protected: GSI does not currently support password protected private keys.
500-OpenSSL Error: pem_pkey.c:109: in library: PEM routines, function PEM_READ_BIO_PRIVATEKEY: bad password read
500 End.

debug: fault on connection to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat: globus_ftp_client: the server responded with an error
debug: data callback, error globus_ftp_client: the server responded with an error, buffer 0x11023c000, length 0, offset=0, eof=true
debug: operation complete

error: globus_ftp_client: the server responded with an error
500 500-Command failed. : callback failed.
500-globus_gsi_gssapi: Error with gss credential handle
500-globus_credential: Valid credentials could not be found in any of the possible locations specified by the credential search order.
500-Valid credentials could not be found in any of the possible locations specified by the credential search order.
500-Attempt 1
500-globus_credential: Error reading host credential
500-globus_sysconfig: Error with certificate filename
500-globus_sysconfig: Error with certificate filename
500-globus_sysconfig: File is not owned by current user: /etc/grid-security/hostcert.pem is not owned by current user
500-Attempt 2
500-globus_credential: Error reading proxy credential
500-globus_sysconfig: Could not find a valid proxy certificate file location
500-globus_sysconfig: Error with key filename
500-globus_sysconfig: File does not exist: /tmp/x509up_u4142 is not a valid file
500-Attempt 3
500-globus_credential: Error reading user credential
500-globus_credential: Key is password protected: GSI does not currently support password protected private keys.
500-OpenSSL Error: pem_pkey.c:109: in library: PEM routines, function PEM_READ_BIO_PRIVATEKEY: bad password read
500 End.
$

If I specify the number of parallel transfers to use, i.e. "-p 1" then it again results in a similar error:

[ram@mimir ~]$ globus-url-copy -dbg -p 1 gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat  file:///Users/ram/
debug: starting to get gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat
debug: connecting to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat
debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
220 ldas-pcdev1.ligo.caltech.edu GridFTP Server 6.10 (gcc64, 1334324800-83) [Globus Toolkit 5.2.0] ready.

debug: authenticating with gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat
debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
230 User ram logged in.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
SITE HELP

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
214-The following commands are recognized:
    ALLO    APPE    REST    CWD     CDUP    DCAU    EPSV    FEAT
    ERET    MDTM    STAT    ESTO    HELP    LIST    MODE    NLST
    MLSC    MLSD    PASV    RNFR    MLSR    MLST    NOOP    OPTS
    STOR    PASS    PBSZ    PORT    PROT    SITE    EPRT    RETR
    SPOR    MFMT    SCKS    TREV    PWD     QUIT    SBUF    SIZE
    SPAS    STRU    SYST    RNTO    TYPE    USER    LANG    MKD
    RMD     DELE    CKSM    DCSC
214 End

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
FEAT

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
211-Extensions supported
 DCSC P,D
 MFMT
 AUTHZ_ASSERT
 MLSR
 MLSC
 UTF8
 LANG EN
 DCAU
 PARALLEL
 SIZE
 MLST Type*;Size*;Modify*;Perm*;Charset;UNIX.mode*;UNIX.owner*;UNIX.uid*;UNIX.group*;UNIX.gid*;Unique*;UNIX.slink*;X.count;
 ERET
 ESTO
 SPAS
 SPOR
 REST STREAM
 MDTM
 PASV AllowDelayed;
211 End.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
SITE CLIENTINFO scheme=gsiftp;appname="globus-url-copy";appver="8.6 (gcc64dbg, 1342633606-83) [Globus Toolkit 5.2.4]";
debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
250 OK.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
TYPE I
debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
200 Type set to I.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
MODE E

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
200 Mode set to E.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
OPTS RETR Parallelism=1,1,1;

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
200 OPTS Command Successful.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
PBSZ 1048576

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
200 PBSZ=1048576

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
PORT 129,89,61,162,221,177

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
200 PORT Command successful.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
RETR /home/tzs/MaxEntWork/ER2/rawwave.mat

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
500-Command failed. : callback failed.
500-globus_gsi_gssapi: Error with gss credential handle
500-globus_credential: Valid credentials could not be found in any of the possible locations specified by the credential search order.
500-Valid credentials could not be found in any of the possible locations specified by the credential search order.
500-Attempt 1
500-globus_credential: Error reading host credential
500-globus_sysconfig: Error with certificate filename
500-globus_sysconfig: Error with certificate filename
500-globus_sysconfig: File is not owned by current user: /etc/grid-security/hostcert.pem is not owned by current user
500-Attempt 2
500-globus_credential: Error reading proxy credential
500-globus_sysconfig: Could not find a valid proxy certificate file location
500-globus_sysconfig: Error with key filename
500-globus_sysconfig: File does not exist: /tmp/x509up_u4142 is not a valid file
500-Attempt 3
500-globus_credential: Error reading user credential
500-globus_credential: Key is password protected: GSI does not currently support password protected private keys.
500-OpenSSL Error: pem_pkey.c:109: in library: PEM routines, function PEM_READ_BIO_PRIVATEKEY: bad password read
500 End.

debug: fault on connection to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat: globus_ftp_client: the server responded with an error
debug: data callback, error globus_ftp_client: the server responded with an error, buffer 0x10c5c6000, length 0, offset=0, eof=true
debug: operation complete

error: globus_ftp_client: the server responded with an error
500 500-Command failed. : callback failed.
500-globus_gsi_gssapi: Error with gss credential handle
500-globus_credential: Valid credentials could not be found in any of the possible locations specified by the credential search order.
500-Valid credentials could not be found in any of the possible locations specified by the credential search order.
500-Attempt 1
500-globus_credential: Error reading host credential
500-globus_sysconfig: Error with certificate filename
500-globus_sysconfig: Error with certificate filename
500-globus_sysconfig: File is not owned by current user: /etc/grid-security/hostcert.pem is not owned by current user
500-Attempt 2
500-globus_credential: Error reading proxy credential
500-globus_sysconfig: Could not find a valid proxy certificate file location
500-globus_sysconfig: Error with key filename
500-globus_sysconfig: File does not exist: /tmp/x509up_u4142 is not a valid file
500-Attempt 3
500-globus_credential: Error reading user credential
500-globus_credential: Key is password protected: GSI does not currently support password protected private keys.
500-OpenSSL Error: pem_pkey.c:109: in library: PEM routines, function PEM_READ_BIO_PRIVATEKEY: bad password read
500 End.
$

Therefore I believe this affects both passive and extended mode.

The only workaround that has been found is to "turn off data channel authentication for ftp transfers", i.e. pass the "-noedcu" option; the transfer then occurs without issue.

Comments

skymoo - 2013-05-23

Sorry, there's a typo in that last sentence. The option that allows the transfer to succeed is "-nodcau".

skymoo - 2013-05-23

I think this is actually issue GT-366.

skymoo - 2013-05-24

This is indeed GT-366, applying the advisory results in successful transfer. This issue can be closed.

skymoo - 2013-05-25

Actually I spoke too soon, transfers using "-p 1", and higher, fail with:

[ram@liquid ~]$ globus-url-copy -p 1 gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat  file:///Users/ram/
^C
Cancelling copy...

[ram@liquid ~]$ globus-url-copy -p 1 -dbg gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat  file:///Users/ram/
debug: starting to get gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat
debug: connecting to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat
debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
220 ldas-pcdev1.ligo.caltech.edu GridFTP Server 6.10 (gcc64, 1334324800-83) [Globus Toolkit 5.2.0] ready.

debug: authenticating with gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat
debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
230 User ram logged in.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
SITE HELP

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
214-The following commands are recognized:
    ALLO    APPE    REST    CWD     CDUP    DCAU    EPSV    FEAT
    ERET    MDTM    STAT    ESTO    HELP    LIST    MODE    NLST
    MLSC    MLSD    PASV    RNFR    MLSR    MLST    NOOP    OPTS
    STOR    PASS    PBSZ    PORT    PROT    SITE    EPRT    RETR
    SPOR    MFMT    SCKS    TREV    PWD     QUIT    SBUF    SIZE
    SPAS    STRU    SYST    RNTO    TYPE    USER    LANG    MKD
    RMD     DELE    CKSM    DCSC
214 End

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
FEAT

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
211-Extensions supported
 DCSC P,D
 MFMT
 AUTHZ_ASSERT
 MLSR
 MLSC
 UTF8
 LANG EN
 DCAU
 PARALLEL
 SIZE
 MLST Type*;Size*;Modify*;Perm*;Charset;UNIX.mode*;UNIX.owner*;UNIX.uid*;UNIX.group*;UNIX.gid*;Unique*;UNIX.slink*;X.count;
 ERET
 ESTO
 SPAS
 SPOR
 REST STREAM
 MDTM
 PASV AllowDelayed;
211 End.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
SITE CLIENTINFO scheme=gsiftp;appname="globus-url-copy";appver="8.6 (gcc64dbg, 1342633606-83) [Globus Toolkit 5.2.4]";
debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
250 OK.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
TYPE I
debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
200 Type set to I.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
MODE E

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
200 Mode set to E.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
OPTS RETR Parallelism=1,1,1;

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
200 OPTS Command Successful.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
PBSZ 1048576

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
200 PBSZ=1048576

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
PORT 192,168,1,107,216,163

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
200 PORT Command successful.

debug: sending command to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
RETR /home/tzs/MaxEntWork/ER2/rawwave.mat

debug: response from gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat:
500-Command failed. : callback failed.
500-globus_xio: Unable to connect to 192.168.1.107:55459
500-globus_xio: System error in connect: Connection timed out
500-globus_xio: A system call failed: Connection timed out
500 End.

debug: fault on connection to gsiftp://ldas-pcdev1.ligo.caltech.edu/home/tzs/MaxEntWork/ER2/rawwave.mat: globus_ftp_client: the server responded with an error
debug: data callback, error globus_ftp_client: the server responded with an error, buffer 0x100a17000, length 0, offset=0, eof=true
debug: operation complete

error: globus_ftp_client: the server responded with an error
500 500-Command failed. : callback failed.
500-globus_xio: Unable to connect to 192.168.1.107:55459
500-globus_xio: System error in connect: Connection timed out
500-globus_xio: A system call failed: Connection timed out
500 End.

[ram@liquid ~]$

skymoo - 2013-05-27

Never mind, this seems to be a combination of firewall and NAT issues.
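
The NAT part is visible in the log itself: the client sent PORT 192,168,1,107,216,163, and the server then failed to connect to 192.168.1.107:55459. A small sketch of how the six numbers in a PORT argument or PASV reply map to an address and port, and how to check whether the address is private; decode_host_port is a hypothetical helper name.

```python
import ipaddress


def decode_host_port(arg):
    """Decode 'h1,h2,h3,h4,p1,p2' (optionally parenthesised) to (host, port)."""
    parts = [int(p) for p in arg.strip("() ").split(",")]
    if len(parts) != 6 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected six comma-separated values in 0..255")
    host = ".".join(str(p) for p in parts[:4])
    port = parts[4] * 256 + parts[5]  # high byte first
    return host, port


host, port = decode_host_port("192,168,1,107,216,163")
print(host, port, ipaddress.ip_address(host).is_private)
# 192.168.1.107 55459 True
```

An address in a private range is not reachable from a server outside the local network, which matches the connection timeout reported by the server.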

Mike Link - 2013-06-04

Thanks for following up Adam.  Let us know if you find any issues.

Globus Toolkit/GT-399

Summary

eppn callout might need to support multiple CA certificates

Details

Type: Bug

Status: Resolved 2013-06-06

Description

The eppn callout requires the GLOBUS_MYPROXY_CA_CERT environment variable to be set to the path to the CA which issues client certificates. However, for CILogon, there are two different CAs which have different levels of service, so that callout should allow both to be used.

Comments

Mike Link - 2013-06-06

Fixed via GT-411

Globus Toolkit/GT-400

Summary

gridftp server doesn’t send CONF_ID

Details

Type: Bug

Status: Resolved 2013-06-26

Description

The gridftp server has a configuration option to send a usage_stats_id that describes the configuration of a server. I'd like to use that in gcmu, but it doesn't get sent because that option is not in the default set of tags that are sent to the usage stats server. Since that option is only used for usage stats, it should probably be added to the default set.

Comments

Mike Link - 2013-06-04

Fixed in build repo.

Globus Toolkit/GT-401

Summary

no threads created in gridftp c client debian packages

Details

Type: Bug

Status: Resolved 2013-06-04

Description

Attached is a C client that anonymously connects to a running gridftp server and tests existence of a file.  The client should spawn at least 1 thread where a callback function is executed, but when building against the current Wheezy debian packages, no threads are spawned.  Attached is the relevant code demonstrating the problem.

I installed the latest gt from source, building the "gcc64dbgpthr" flavor.  When the attached test C client compiles against these custom libraries, several threads are spawned and everything works as expected.

Our particular use case involves a Python C extension module. If no threads are spawned, the callback is never executed, and our client deadlocks, so we require an alternative to the debian packages that are currently distributed.

Note that this issue relates to both the native Debian Wheezy packages and the most recent Globus.org-supplied packages.

Comments

Joe Bester - 2013-05-28

See the note on migrating from GT5.0 in
http://www.globus.org/toolkit/docs/5.2/5.2.4/ccommonlib/mig/#ccommonlibMig

Mike Link - 2013-06-04

I discussed with Jeff and verified the issue was no runtime selection of threads.

Globus Toolkit/GT-402

Summary

gpt-bootstrap should call automake with --force-missing

Details

Type: Bug

Status: Resolved 2013-06-20

Description

This suggestion is made so that bootstrapping the sources picks up a newer ltmain.sh from the system that supports newer architectures like aarch64.

Comments

Mattias Ellert - 2013-06-02

globus-core uses its own bootstrap script instead of gpt-bootstrap, so needs a separate fix.

Joe Bester - 2013-06-20

This is fixed in CVS and will be in 5.2.5

Globus Toolkit/GT-403

Summary

More porting issues (aarch64 and x32)

Details

Type: Bug

Status: Resolved 2013-06-20

Description

Fedora is planning support for aarch64, Debian is already building for x32. The accompiler.m4 needs to be updated to support these architectures. Patch attached. (If you know a more clever way to detect x32 let me know - at least this one works.)

Comments

Joe Bester - 2013-06-20

This is fixed in CVS and will be in 5.2.5

Globus Toolkit/GT-404

Summary

5 times latex sometimes not enough

Details

Type: Bug

Status: Resolved 2013-06-20

Description

The makefile generated by doxygen for building the documentation stops after five iterations if the labels have not stabilized. A recent build of the documentation of globus-ftp-control on Fedora 20 required 6 iterations to succeed. The attached patch increases the allowed number of iterations.
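
The behaviour described here is a bounded rerun loop. The sketch below models it in Python with a stand-in for a latex pass; it is not the doxygen-generated makefile, and run_until_stable and make_fake_latex are hypothetical names.

```python
def run_until_stable(run_pass, max_passes):
    """Call run_pass() until it reports no rerun is needed; return pass count."""
    for pass_number in range(1, max_passes + 1):
        needs_rerun = run_pass()
        if not needs_rerun:
            return pass_number
    raise RuntimeError("labels still changing after {} passes".format(max_passes))


def make_fake_latex(passes_needed):
    """Stand-in for latex: asks for a rerun until passes_needed runs are done."""
    state = {"done": 0}

    def run_pass():
        state["done"] += 1
        return state["done"] < passes_needed

    return run_pass


# A document that needs 6 passes succeeds only if the limit is at least 6.
print(run_until_stable(make_fake_latex(6), max_passes=8))  # 6
```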

Comments

Joe Bester - 2013-06-20

This is fixed in CVS and will be in 5.2.5

Globus Toolkit/GT-405

Summary

Non-portable use of echo in shell script

Details

Type: Bug

Status: Resolved 2013-11-07

Description

Using \t in a string printed by echo is not portable. Patch attached.

Originally reported in Debian BTS - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=701557

Comments

Joe Bester - 2013-11-07

This is fixed in 5.2.5

Globus Toolkit/GT-406

Summary

Keep globus files in globus namespace

Details

Type: Bug

Status: Open

Description

All globus packages install their data files in /usr/share/globus or a subdirectory thereof, except for globus-simple-ca which creates a new directory /usr/share/globus_simple_ca in the global namespace. The attached patch cleans this up and moves the files to /usr/share/globus/simple-ca instead.

Comments

Globus Toolkit/GT-407

Summary

globus-gridftp-server status returns 0 when not running on ubuntu

Details

Type: Bug

Status: Resolved 2013-06-26

Description

The return value from service globus-gridftp-server status on ubuntu is 0 when the job is not running and its pid file is not present. It should return 3 (not running) in that case.
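
The expected exit codes follow the LSB convention for init script status actions (0 running, 1 dead but pid file exists, 3 not running, 4 unknown). A minimal Python sketch of that decision, for Unix-like systems; it is not the actual init script, and the function names are hypothetical.

```python
import os

STATUS_RUNNING = 0        # service is running
STATUS_DEAD_PIDFILE = 1   # pid file exists but the process is gone
STATUS_NOT_RUNNING = 3    # no pid file, service is not running
STATUS_UNKNOWN = 4        # pid file unreadable or malformed


def pid_alive(pid):
    """Probe a pid with signal 0, which checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def service_status(pidfile, is_alive=pid_alive):
    """Return an LSB-style status code based on the pid file."""
    try:
        with open(pidfile) as handle:
            pid = int(handle.read().strip())
    except FileNotFoundError:
        return STATUS_NOT_RUNNING
    except (OSError, ValueError):
        return STATUS_UNKNOWN
    return STATUS_RUNNING if is_alive(pid) else STATUS_DEAD_PIDFILE
```

A status action built on this would end with sys.exit(service_status(path)), where path is wherever the service writes its pid file.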

Comments

Joe Bester - 2013-05-31

fixed in cvs and builds.globus.org repo

Globus Toolkit/GT-408

Summary

service globus-gridftp-server status returns incorrect status

Details

Type: Bug

Status: Resolved 2013-06-26

Description

The command globus-gridftp-server status returns the pid of the /etc/init.d/globus-gridftp-server script if the gridftp server is not running and prints that the service is running correctly.
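
The report does not say how the script looks up the pid. One plausible cause, stated here as an assumption, is matching the service name anywhere on the command line, which also matches the init script because its path contains the same name. The sketch compares that with matching only the executable; the process table and the /usr/sbin path are made-up examples.

```python
import os

SERVER = "globus-gridftp-server"


def naive_match(processes):
    """Match the name anywhere in the command line (prone to false positives)."""
    return [pid for pid, argv in processes
            if any(SERVER in arg for arg in argv)]


def strict_match(processes):
    """Match only processes whose executable is the server itself."""
    return [pid for pid, argv in processes
            if argv and os.path.basename(argv[0]) == SERVER]


# Only the init script is running; the server is not.
only_script = [(100, ["/bin/sh", "/etc/init.d/globus-gridftp-server", "status"])]
print(naive_match(only_script))   # [100]
print(strict_match(only_script))  # []
```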

Comments

Joe Bester - 2013-06-03

Fixed in CVS.

Globus Toolkit/GT-409

Summary

Globus GSI OpenSSH

Details

Type: New Feature

Status: Open

Description

If an OS internally supports FIPS, such as CentOS 6, and that OS is placed in FIPS mode as required to adhere to government standards/regulations, then the GSI-OpenSSH server will abend when started.

Comments

Globus Toolkit/GT-410

Summary

Build binary packages with UDT support for GC

Details

Type: Task

Status: Resolved 2013-06-04

Description

Build binaries compatible with all supported platforms for Mac, Linux, and Windows.

Comments

Mike Link - 2013-06-04

All platforms built and working.

Globus Toolkit/GT-411

Summary

Extend authz callout framework to support chaining of gridmap callouts

Details

Type: New Feature

Status: Resolved 2013-06-26

Description

Currently you can define multiple gridmap callouts in an authz.conf, but they will all run and must all succeed in order to pass.  It is useful to be able to have multiple callouts configured and only require one to succeed.

Comments

Mike Link - 2013-06-04

Also investigate a way to run the same callout with different arguments or callout-specific configuration.  Specifically we want to run the eppn callout with two different CA certs.

Mike Link - 2013-06-06

Added the ability to set environment variables in the callout configuration file, so the same callout can be run multiple times with a different set of env var values.
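
As a rough model of the behaviour discussed in this issue, the sketch below runs a chain of callouts, each with its own environment overrides, and passes when any one succeeds (or all, if requested). It is not the authz callout framework's API or configuration syntax; run_chain, temporary_env, fake_eppn_callout, and the CA paths are hypothetical. Only the GLOBUS_MYPROXY_CA_CERT variable name comes from the related eppn callout report.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(overrides):
    """Apply environment overrides for the duration of one callout."""
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old


def run_chain(callouts, subject, require_all=False):
    """Run (function, env) pairs; pass on any success unless require_all."""
    results = []
    for func, env in callouts:
        with temporary_env(env):
            results.append(bool(func(subject)))
    return all(results) if require_all else any(results)


def fake_eppn_callout(issuer_ca):
    """Stand-in callout: accepts only the CA named in the environment."""
    return os.environ.get("GLOBUS_MYPROXY_CA_CERT") == issuer_ca


chain = [
    (fake_eppn_callout, {"GLOBUS_MYPROXY_CA_CERT": "/hypothetical/ca-a.pem"}),
    (fake_eppn_callout, {"GLOBUS_MYPROXY_CA_CERT": "/hypothetical/ca-b.pem"}),
]
print(run_chain(chain, "/hypothetical/ca-b.pem"))                    # True
print(run_chain(chain, "/hypothetical/ca-b.pem", require_all=True))  # False
```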

Mike Link - 2013-06-06

Released to stable.

Globus Toolkit/GT-412

Summary

Add configuration to append a runtime version string to the existing toolkit version reported in the banner and usage stats.

Details

Type: New Feature

Status: Resolved 2013-06-26

Description

GCMU will set this.

Comments

Mike Link - 2013-06-06

Added -version-tag config.  Released to stable.

Globus Toolkit/GT-413

Summary

Clarify errors whenever sharing file operations fail

Details

Type: Bug

Status: Open

Description

The error returned by the server in cases where the state dir does not exist, isn't writable, or can't be created is not useful.  Return specific errors.

Comments

Bryce Allen - 2014-10-30

This is the error I see when quota is exceeded (from KOA-3081):

Command failed: sharing state dir has invalid permissions

Globus Toolkit/GT-414

Summary

globus-gridftp-server closing connection before returning cksm value

Details

Type: Bug

Status: Open

Description

When users submit a transfer via GO with the verification option selected, the connection closes before the cksm value is returned.

GO Error:
Error (transfer)
Server: mulroony#hpss37 (hpss-md37.ncsa.illinois.edu:2811)
Command: CKSM MD5 0 -1 ~/stage_asyrowski_runs2_copy2.sh
Message: The connection to the server was broken
---
an end-of-file was reached
globus_xio: An end of file occurred

CMD Log from system:
CMD (b:140737018816368 l:13 n:13 r:0)
ALLO 757677

REPLY (b:140737152928832 l:30)
200 ALLO command successful.

CMD (b:140737018820096 l:39 n:39 r:0)
STOR ~/stage_asyrowski_runs2_copy2.sh

CMD (b:140737018831664 l:48 n:48 r:0)
CKSM MD5 0 -1 ~/stage_asyrowski_runs2_copy2.sh

REPLY (b:140737153062528 l:25)
150 Beginning transfer.

REPLY (b:140737153132768 l:126)
112-Perf Marker
 Timestamp:  1371136671.0
 Stripe Index: 0
 Stripe Bytes Transferred: 0
 Total Stripe Count: 1
112 End.

REPLY (b:140736549178352 l:126)
112-Perf Marker
 Timestamp:  1371136671.0
 Stripe Index: 0
 Stripe Bytes Transferred: 0
 Total Stripe Count: 1
112 End.

REPLY (b:140737153132912 l:24)
226 Transfer Complete.

Comments

Mike Link - 2013-06-13

Can you update the server?  There have been a few bug fixes since that release.  If you installed it from the globus.org repo a yum update should get the latest, otherwise you can install the globus.org repo configuration here: http://www.globus.org/toolkit/downloads/5.2.4/

mulroony - 2013-06-13

Thanks for the quick response. It will take some significant effort to test the upgrade first, get approval, then implement it across our nodes. This was working as expected up until a few days ago, then stopped working.

Is there anything else I can do to increase the verbosity to try to determine why this is happening?

Mike Link - 2013-06-13

Nothing has changed on your end from when it was working?  Unfortunately most of the debug tracing ability is unavailable in the non-dbg builds.  Do you see a return code from the server processes? It should be either in the gridftp log or xinetd/syslog, depending on how it is running.

mulroony - 2013-06-13

To our knowledge nothing changed. We did have to restart xinetd on our nodes, but nothing should have changed with the restart, as these are stateless nodes. We also restarted a few nodes, which would have pulled the same image they have been running for some time, and saw no difference.

It looks like it is exiting with code 127.

Turns out we might be able to test a newer version after all. I will follow up with the results of that test.

Thanks again.

mulroony - 2013-06-13

It worked when tested with the below versions. With the time it would take to implement a system wide upgrade we still hope to find a different solution if possible.

globus_gridftp_server: 6.32 (1370414947-83)
globus_gfork: 3.2 (1142319502-1)
globus_xio_queue: 3.3 (1331018989-83)
globus_gridftp_server_file: 6.32 (1370414947-83)
globus_xio_udp: 3.3 (1331018989-83)
globus_usage_stats_module: 3.1 (1319549265-1)
globus_gsi_authz_callout_error_module: 2.2 (1329144117-83)
globus_gsi_authz: 2.2 (1329144142-83)
globus_xio_pipe: 2.2 (1137650252-1)
globus_xio_telnet: 3.3 (1331018989-83)
globus_xio_gssapi_ftp: 2.8 (1359003884-83)
globus_gridftp_server_control: 2.8 (1359003884-83)
globus_gsi_callback_module: 4.4 (1343249939-83)
globus_credential: 5.3 (1329144244-83)
globus_gsi_proxy: 6.2 (1329144361-83)
globus_gsi_openssl_error: 2.1 (1319549263-1)
globus_openssl: 3.2 (1329144334-83)
globus_gsi_gssapi: 10.7 (1336415934-83)
globus_sysconfig: 5.3 (1337105279-83)
globus_callout_module: 2.2 (1329143466-83)
globus_gss_assist: 8.7 (1359003886-83)
globus_xio_gsi: 2.3 (1147293372-1)
globus_xio_tcp: 3.3 (1331018989-83)
globus_xio_system_select: 3.3 (1331018989-83)
globus_xio_file: 3.3 (1331018989-83)
globus_io: 9.4 (1359994550-83)
globus_ftp_control: 4.5 (1359004328-83)
globus_gridftp_server: 6.32 (1370414947-83)
globus_xio: 3.3 (1331018989-83)
globus_extension_module: 14.9 (1349797671-83)
globus_callback_nonthreaded: 14.9 (1349797671-83)
globus_callback: 14.9 (1349797671-83)
globus_object: 14.9 (1349797671-83)
globus_error: 14.9 (1349797671-83)
globus_common: 14.9 (1349797671-83)
globus_thread_common: 14.9 (1349797671-83)
globus_thread_none: 14.9 (1349797671-83)
globus_thread: 

Mike Link - 2013-06-13

Interesting.  There wouldn't be any way to prevent it from happening locally without an update, but I'll check if GO has started to do something differently that is triggering a bug in the old version.

mulroony - 2013-06-13

Mike, Tested this with uberftp to one of our clients. Any time I issue the 'CKSM' command it panics with exit code 127.

What little I could get from gdb is below:

(gdb) bt
#0  0x00007ffff472543c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000040581b in main (argc=, argv=) at globus_gridftp_server.c:1863
(gdb) frame 1
#1  0x000000000040581b in main (argc=, argv=) at globus_gridftp_server.c:1863
1863                globus_cond_wait(&globus_l_gfs_cond, &globus_l_gfs_mutex);
(gdb) frame 0
#0  0x00007ffff472543c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) quit

Mike Link - 2013-06-14

Running under gdb should be helpful.  Did you run it manually or did you attach to a running xinetd process?  Add the gridftp -debug argument and try again, and if you get another crash show: "thread apply all bt".

glasgow - 2013-06-14

Hi Mike,

We resolved the issue at our end by reverting to the previous version of our hpss dsi. Things are working as expected at this point. I will leave the investigations into what is going on in the new dsi for Jason when he returns from vacation.

Thanks for your prompt assistance!

Jim

Globus Toolkit/GT-415

Summary

GridFTP server fails when listing a directory with specific numbers of entries.

Details

Type: Bug

Status: Resolved 2013-07-02

Description

With GridFTP from GT 5.0.5, a directory listing will fail where the number of entries not including . and .. equals 999 + any multiple of 1000 (e.g. 999, 1999, 2999).
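
The failing counts follow a simple arithmetic pattern. A minimal Python sketch of that pattern, assuming only what is stated above (the function name is hypothetical and this is not code from the server):

```python
def listing_count_triggers_bug(entry_count: int) -> bool:
    """Return True if entry_count (excluding . and ..) matches the
    failing pattern from this ticket: 999 + k * 1000 for k >= 0."""
    return entry_count >= 999 and entry_count % 1000 == 999


# Collect the counts up to 3000 that match the reported pattern.
failing = [n for n in range(0, 3001) if listing_count_triggers_bug(n)]
print(failing)
```

A check like this may help when picking directory sizes for a regression test.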

Comments

Mike Link - 2013-07-02

update package released at http://www.globus.org/toolkit/advisories.html?version=5.0

Globus Toolkit/GT-416

Summary

Prepare GT 5.2.5 release

Details

Type: Task

Status: Resolved 2013-11-07

Description

Do the prep work for the 5.2.5 release. See linked issues.

Comments

Globus Toolkit/GT-417

Summary

globus-xio-gridftp-driver package fails in doxygen

Details

Type: Bug

Status: Resolved 2013-06-18

Description

The gridftp xio driver has some issues with its comments which cause the latex code generated by doxygen to fail compilation.

Comments

Globus Toolkit/GT-418

Summary

globus-gatekeeper leaves stale processes behind if port 2119 is probed

Details

Type: Bug

Status: Open

Description

Probing globus-gatekeeper on port 2119 (using a port scanner, or simply by telnetting to the port and closing the connection) leaves a zombie process behind. A malicious user could conceivably use this to stage a DoS attack.
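
The probe described above amounts to opening a TCP connection and closing it without sending anything. A minimal Python sketch for reproducing it against a test host, assuming nothing beyond a plain connect-and-close (the function name is hypothetical):

```python
import socket


def probe_port(host: str, port: int, timeout: float = 5.0) -> bool:
    """Open a TCP connection and close it immediately without sending data.

    Returns True if the connection was accepted, False otherwise.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

After calling it with the gatekeeper host and port 2119, the process table on the server can be inspected for defunct globus-gatekeeper children.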

Comments

Globus Toolkit/GT-419

Summary

Add support for ubuntu 13.04

Details

Type: Task

Status: Resolved 2013-06-20

Description

Add support in the build system for Ubuntu 13.04.

Comments

Joe Bester - 2013-06-20

I've added ubuntu 13.04 to the list of builds done by default, and have binaries of the current 5.2 branch available on the builds.globus.org repository.

Globus Toolkit/GT-420

Summary

Change in behaviour of the globus-gridftp-server

Details

Type: Bug

Status: Resolved 2013-08-15

Description

Forwarded from the IGE RT: https://rt.ige-project.eu/rt/Ticket/Display.html?id=349

The IGE certification of the gridftp server in GT 5.2.4 discovered a change in behaviour:


I found a problem testing this criterion: when running the gridftp server with the "-restrict-paths" option, I found that write access is allowed only if mentioned explicitly. This was not the case until now, and even the man page says that full access is allowed if not explicitly specified.

I ran the server like:
globus-gridftp-server -c /etc/gridftp.conf -C /etc/gridftp.d -pidfile /var/run/globus-gridftp-server.pid -l /var/log/gridftp.log -restrict-paths /home/$test_user_home,/tmp

and the test like:
globus-url-copy gsiftp://`hostname -f`/$test_user_home/../`basename $test_user_home`/$test_file_name gsiftp://`hostname -f`//$test_user_home/../../tmp/test

I get a report that the /tmp/test path is not allowed. This is only a matter of permissions, though, because if I run the server like this:
globus-gridftp-server -c /etc/gridftp.conf -C /etc/gridftp.d -pidfile /var/run/globus-gridftp-server.pid -l /var/log/gridftp.log -restrict-paths RW/home/$test_user_home,RW/tmp

i.e. prefixing the paths with RW, then the transfer completes successfully.

Was this a deliberate change in behaviour, or an unintended regression?
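
The documented behaviour the reporter expects can be sketched in Python as follows. This is an illustration only, assuming a comma-separated list whose entries may carry an R and/or W prefix before an absolute path; the function name is hypothetical and this is not the server's own parser:

```python
def parse_restrict_paths(value: str) -> dict:
    """Map each path in a -restrict-paths style list to its access set.

    An entry may carry a prefix made of 'R' and/or 'W' before the absolute
    path; an entry with no prefix gets full (read and write) access.
    """
    access = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        slash = entry.find("/")
        if slash < 0:
            raise ValueError(f"not an absolute path: {entry!r}")
        prefix, path = entry[:slash].upper(), entry[slash:]
        if any(ch not in "RW" for ch in prefix):
            raise ValueError(f"unknown access prefix in {entry!r}")
        # No prefix means full access, which is the documented default.
        access[path] = set(prefix) if prefix else {"R", "W"}
    return access
```

Under this reading, "/home/user,/tmp" and "RW/home/user,RW/tmp" yield the same access map, which is the equivalence the certification test relied on.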

Comments

Mike Link - 2013-08-15

Reverted to documented behavior and committed for 5.2.5.

Globus Toolkit/GT-421

Summary

Inconsistent documentation in GridFTP release notes in GT 5.2.4

Details

Type: Documentation

Status: Resolved 2013-10-16

Description

There are some inconsistencies in the Release documentation for GridFTP in GT 5.2.4.

This issue is forwarded from the IGE RT: https://rt.ige-project.eu/rt/Ticket/Display.html?id=350

This is not really a bug, but something that is not clear. In the "New features" description on the "release notes" page (at http://www.globus.org/toolkit/docs/latest-stable/gridftp/rn/) they mention a new feature described here: https://globus.atlassian.net/browse/GT-302. That page is not clear about what is newly introduced. It mentions an "--enable-sharing" option, which I cannot find either on the GridFTP Web page (http://www.globus.org/toolkit/docs/latest-stable/gridftp/pi/#globus-gridftp-server) or in its manual (and -help) page.

So the release notes seem to indicate that there should be a new --enable-sharing option in GT 5.2.4, but it does not seem to be there. This might simply be a misreading of the release notes, but could you comment on the issue?

Comments

Stuart Martin - 2013-10-16

There has been subsequent work on the sharing feature since this jira issue was created.  The 5.2.5 release contains the stable sharing feature and the 5.2.5 doc will be updated to reflect these changes.

Globus Toolkit/GT-422

Summary

Update documentation for 5.2.5

Details

Type: Task

Status: Resolved 2013-11-07

Description

Create a new documentation tree for 5.2.5 and update the frags to reflect changes since 5.2.4.

Comments

Globus Toolkit/GT-423

Summary

Windows build uses wrong format specifier for globus_off_t

Details

Type: Bug

Status: Resolved 2013-06-20

Description

%lld is not supported in the WinXP crt, at least for vsnprintf, and can cause invalid memory accesses.

Comments

Mike Link - 2013-06-20

Fixed, binaries updated.

Globus Toolkit/GT-424

Summary

New Fedora Packaging Guideline - no %_isa in BuildRequires

Details

Type: Bug

Status: Resolved 2013-06-28

Description

There was recently an addition made to the Fedora Packaging Guidelines which bans the use of %_isa in BuildRequires in specfiles:

http://fedoraproject.org/wiki/Packaging:Guidelines#BuildRequires_and_.25.7B_isa.7D

The applied patch fixes the globus-spec-creator script to comply with this change.

Comments

Mattias Ellert - 2013-06-26

PS. The patch also adds back the %clean section. It is still needed for EPEL 5 builds.

Joe Bester - 2013-06-28

Thanks. This is committed to CVS and will be in 5.2.5.

Globus Toolkit/GT-425

Summary

CLONE - Add environment variables to enable ftp client support for ipv6

Details

Type: Improvement

Status: Resolved 2013-08-15

Description

This will enable older middleware to work with ipv6.

Should also enable globus_io support, or possibly change the default to allowed.

Comments

Mattias Ellert - 2013-06-28

We have received feedback that the fix in globus-io is not enough. A similar fix is needed for globus-ftp-client.

Mattias Ellert - 2013-06-28

This patch makes a similar change to globus-ftp-client.

Mike Link - 2013-08-15

Committed for 5.2.5

Globus Toolkit/GT-426

Summary

memory leaks in globus-gsi-callback package

Details

Type: Bug

Status: Resolved 2013-09-16

Description

Memory leaks reported by valgrind:
==43594== 7,946,490 (1,328 direct, 7,945,162 indirect) bytes in 83 blocks are definitely lost in loss record 1,938 of 1,938
==43594== at 0x4C278FE: malloc (vg_replace_malloc.c:270)
==43594== by 0xA8999CD: CRYPTO_malloc (mem.c:306)
==43594== by 0xA92CBC9: X509_STORE_add_crl (x509_lu.c:373)
==43594== by 0xA92E43C: X509_load_crl_file (by_file.c:238)
==43594== by 0xA92ECA2: get_cert_by_subject (by_dir.c:413)
==43594== by 0xA92C4DB: X509_STORE_get_by_subject (x509_lu.c:306)
==43594== by 0xCC16AAA: globus_i_gsi_callback_check_revoked (globus_gsi_callback.c:1050)
==43594== by 0xCC1782A: globus_i_gsi_callback_cred_verify (globus_gsi_callback.c:860)
==43594== by 0xCC17E61: globus_gsi_callback_handshake_callback (globus_gsi_callback.c:550)
==43594== by 0xA928CCF: internal_verify (x509_vfy.c:1656)
==43594== by 0xA92963C: X509_verify_cert (x509_vfy.c:372)
==43594== by 0xCC17BD8: globus_gsi_callback_X509_verify_cert (globus_gsi_callback.c:403)

How to reproduce:
I have a valid (unexpired) proxy and am submitting transfers to FTS. If the CRLs expire in the meantime (I manually stopped the fetch-crl cron job and waited a few days), memory grows out of control with every submission. When I run fetch-crl while FTS is running and submitting transfers, it stops leaking memory immediately.

I am wondering if it is related to the ticket below (since I first saw the problem with Globus 5.2):
https://globus.atlassian.net/browse/GT-235

Info:
uname -a
Linux fts3src2 2.6.32-279.1.1.el6.x86_64 #1 SMP Tue Jul 10 20:50:49 CEST 2012 x86_64 x86_64 x86_64 GNU/Linux
globus version: latest from EPEL 6

Comments

Joe Bester - 2013-09-13

I've committed a fix for this to CVS and am working on generating new binary packages.

Joe Bester - 2013-09-16

New update packages are available in our package repository now (globus-gsi-callback 4.5)

Globus Toolkit/GT-427

Summary

globus-user-env.csh sets environment incorrectly for 64-bit builds

Details

Type: Bug

Status: Open

Description

globus-user-env.csh does not correctly set LD_LIBRARY_PATH to ${GLOBUS_LOCATION}/lib64 on 64-bit systems, unlike globus-user-env.sh which sets things correctly.  This is because globus-user-env.csh.in has ${GLOBUS_LOCATION}/lib hardcoded in the LD_LIBRARY_PATH setting instead of using ${libdir}.  In fact, it looks like all of the various path settings in globus-user-env.csh.in hardcode ${GLOBUS_LOCATION}/blah throughout rather than using the libdir, bindir, sbindir, datadir, and perlmoduledir variables that are set earlier in the script.

Patch for this is attached.

Comments

Globus Toolkit/GT-428

Summary

Improve handling of hanging GridFTP server processes

Details

Type: Bug

Status: Resolved 2013-10-16

Description

Fix the process watchdog timer to detect more cases of hung processes, not just processes that hang during the exit state.

Comments

Mike Link - 2013-10-16

Fixed major root causes of hangs (linked) and minor issues including data channel race conditions.

Globus Toolkit/GT-429

Summary

Provide debug state dump when hung server process is detected

Details

Type: Improvement

Status: Resolved 2013-10-16

Description

Dump useful info when https://globus.atlassian.net/browse/GT-428 is triggered to help fix the root of the problem.

Comments

Globus Toolkit/GT-430

Summary

Consider providing Mac OS X packages

Details

Type: Task

Status: Open

Description

Hi,

I've packaged a selection of the Globus Toolkit client tools as a
native Mac OS X package. For details on the packaging, see [1].

1. https://github.com/dvandok/globus-macports

The actual download is linked from the EGCF page[2].

2. http://www.egcf.eu/software/

Would you please consider including [2] and maybe [1] on the "third
party" download page[3]?

3. http://www.globus.org/toolkit/downloads/#thirdparty

By the way, on [3] there is a link to the IGE v2.0 release of GT 5.2;
it's outdated; perhaps it's best to update it just to point to the
general release page [4] that links to all the IGE releases.

4.  http://www.ige-project.eu/downloads/software/releases

Best regards,

Dennis van Dok

Comments

Globus Toolkit/GT-431

Summary

Large leaks found during 5.2.5 release testing

Details

Type: Bug

Status: Resolved 2013-10-16

Description

These seem to exist in 5.2.4 and possibly older versions. They could be long-standing leaks that are revealed by an improved valgrind.

Comments

Globus Toolkit/GT-432

Summary

Add DSI API to provide direction of transfer within a delayed passive call

Details

Type: New Feature

Status: Open

Description

There is no way for the DSI to know whether a transfer is a READ or WRITE during a delayed passive call.

Comments

Mike Link - 2013-10-28

This will need an accessor added to server-lib to get the pending transfer command at the time of the passive call.  From that we get a direction that can be added to data-info->op_info, and the dsi can get that with an op_info query.

Globus Toolkit/GT-433

Summary

Add GCMU config option to enable UDT

Details

Type: New Feature

Status: Resolved 2013-10-16

Description

The corresponding server options would be -allow-udt -threads 1, and the xio udt driver package becomes a dependency.

Comments

Mike Link - 2013-10-16

Added AllowUDT

;  Set this to true to Allow Globus Online to use UDT with NAT traversal
; instead of TCP for data transfers.
;
;AllowUDT = False

Globus Toolkit/GT-434

Summary

Add client and commands to summary tables

Details

Type: Improvement

Status: Resolved 2013-08-27

Description

Currently the summary tables contain information about the gridftp server deployments, numbers and sizes of transfers, and histograms of sizes and transfer rates. Add new tables or fields to include the other information that is included in the daily summary reports via email: the number of different types of commands and the number of different types of clients per summary time.

Comments

Globus Toolkit/GT-435

Summary

Publish usage stats summaries in a google docs spreadsheet

Details

Type: Improvement

Status: Resolved 2013-08-27

Description

Add some hooks to the usage stats summarizer to upload the daily summary data into a google docs spreadsheet so that it can be analyzed offline with less effort. Include a link to the usage summary sheet(s) in the daily email.

Comments

Globus Toolkit/GT-436

Summary

Create process to archive and store old usage data

Details

Type: Improvement

Status: Resolved 2013-08-27

Description

We'd like to export the historical data from the usage stats database into a backed-up file on mcs or s3 and remove the per-transfer records from the database, leaving only summary data for previous years. I think the process is to create a dump of the old data, verify that it can be reconstituted, and then look at deleting and cleaning the live database so that it holds less data, with the goal of helping us avoid frequent upgrades of the database disks (as they fill).

Comments

Globus Toolkit/GT-437

Summary

grid-proxy-init broken for PKCS12 files with CA certificates

Details

Type: Bug

Status: Resolved 2013-11-07

Description

See also: https://rt.ige-project.eu/rt/Ticket/Display.html?id=245

Often PKCS12 files contain not only the user cert and key but also the CA chain certificates. Unless ordered in a specific way, grid-proxy-init fails. Single PKCS12 keystore with user cert and key + ca certificate is commonly distributed by CAs, e.g. in France.

EGCF/IGE analyzed the source code of grid-proxy-init and found that a small coding mistake breaks this part of the code. The reason it doesn't work is a bug in the source (the cert stack counter decrements too fast for its own good), and the following patch fixes it:

*** gsi/credential/source/library/globus_gsi_credential.c.org 2013-07-15 22:18:40.000000000 +0200
--- gsi/credential/source/library/globus_gsi_credential.c 2013-07-15 22:19:03.000000000 +0200
***************
*** 1580,1586 ****
goto exit;
}

! for(i = 0 ; i < sk_X509_num(pkcs12_certs); i++)
{
handle->cert = sk_X509_pop(pkcs12_certs);

--- 1580,1588 ----
goto exit;
}

! j = sk_X509_num(pkcs12_certs);
!
! for(i = 0 ; i < j; i++)
{
handle->cert = sk_X509_pop(pkcs12_certs);

The code was there forever, but the way it was expressed means that
sk_X509_num(pkcs12_certs) is decrementing whilst the loop is running, because
the certs are popped from the pkcs12_certs stack. You just need to iterate over
the complete depth of the initial stack.
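
The effect of re-reading the bound while popping can be reproduced outside OpenSSL. The following Python sketch is only an analogy, with a list standing in for the pkcs12_certs stack and hypothetical function names; it is not the Globus code:

```python
def pop_all_buggy(stack: list) -> list:
    """Mimic the original loop: the bound is re-read on every iteration."""
    popped = []
    i = 0
    while i < len(stack):  # len(stack) shrinks as items are popped
        popped.append(stack.pop())
        i += 1
    return popped


def pop_all_fixed(stack: list) -> list:
    """Mimic the patched loop: the bound is captured once before the loop."""
    popped = []
    j = len(stack)
    for _ in range(j):
        popped.append(stack.pop())
    return popped


certs = ["user", "intermediate CA", "root CA"]
print(pop_all_buggy(list(certs)))  # stops before reaching "user"
print(pop_all_fixed(list(certs)))  # visits every entry
```

With three entries the first variant stops after two pops, so whichever certificate sits at the bottom of the stack is never examined. That matches the report that the failure depends on the order of certificates in the file.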

Since EGCF here provides a very easy solution, can you please try to get this into the GT5.2.5 release?

Thanks
Helmut

Comments

Joe Bester - 2013-11-07

This patch is included in 5.2.5

Globus Toolkit/GT-438

Summary

globus-connect-multiuser-setup keeps prompting for username/password infinitely upon entering wrong username/password

Details

Type: Improvement

Status: Open

Description

It would be helpful if it printed an error message such as "incorrect username" or "incorrect password" and quit after a few unsuccessful attempts.

Comments

Globus Toolkit/GT-439

Summary

globus-connect-multiuser-setup has no output on successful setup

Details

Type: Improvement

Status: Resolved 2013-10-28

Description

globus-connect-multiuser-setup has no output at all.

It would be nice if it said something like:

create endpoint endpointname on Globus Online
Done.

Comments

Joe Bester - 2013-09-10

I've updated the script to print out a summary of the services and endpoint configurations that it made, as well as the DNs of the services.

Globus Toolkit/GT-440

Summary

GRAM Job submission should not require an absolute path to executable

Details

Type: Improvement

Status: Open

Description

Job submission via globus-job-run or RSL for both fork and pbs (and probably all other LRMSs) job managers requires an absolute path to be used for the executable unless it is in $HOME.

This behaviour was patched out in VDT since VDT 1.6 and is an assumed default in the LCG-CE. The patch is attached to this ticket.

Could you please back-port this to GT?

A patch

Comments

Globus Toolkit/GT-441

Summary

Jobmanagers should produce a single identifiable line in syslog for incident response or accounting

Details

Type: Improvement

Status: Open

Description

The job managers fail to write out key information about a job for tracking in a single identifiable line which can be used for incident response (or accounting). Having all key information about user, end-points involved, and LRMS jobID in a single line is essential for incident handling. Currently there is no way except for time stamp correlation to find the submitter IP address in relation to the LRMS jobID.

The same line also helps in finding data for accounting services quickly.

A good example of a line would be:
Apr 29 19:40:44 dissel jobmanager-pbs[31563]: jobmanagement accounting; REMOTE_REQUEST_ADDRESS=::ffff:131.169.255.255; USER_DN=/O=GRID-FR/C=FR/O=Institute/CN=Myname Family; USER_FQAN={ /biomed/Role=NULL/Capability=NULL; /biomed/lcg1/Role=NULL/Capability=NULL; /biomed/team/Role=NULL/Capability=NULL; }; USER_VO=biomed; JOB_REPOSITORY_ID=2012-04-28.10:38:51.0000031532.0000000000; CMD_NAME=JOB_START; uid=biome070; gid=biome; jobId=16217781746731561326.1180412860493205382; lrmsAbsJobId=pbs/20120429/23078451.stro.nikhef.nl;
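
With everything on one line, correlating the submitter address with the LRMS job ID becomes a single pattern match. A minimal Python sketch, assuming the key names and the "key=value;" layout of the example line above (the function name is hypothetical, and the sample is shortened from that line):

```python
import re

# Keys taken from the example line; each value runs up to the next ';'.
FIELD = re.compile(
    r"\b(REMOTE_REQUEST_ADDRESS|USER_DN|USER_VO|lrmsAbsJobId)=([^;]*);"
)


def extract_fields(line: str) -> dict:
    """Pull selected key=value pairs out of one accounting line."""
    return {key: value.strip() for key, value in FIELD.findall(line)}


sample = (
    "Apr 29 19:40:44 dissel jobmanager-pbs[31563]: jobmanagement accounting; "
    "REMOTE_REQUEST_ADDRESS=::ffff:131.169.255.255; "
    "USER_DN=/O=GRID-FR/C=FR/O=Institute/CN=Myname Family; "
    "USER_VO=biomed; "
    "lrmsAbsJobId=pbs/20120429/23078451.stro.nikhef.nl;"
)
fields = extract_fields(sample)
print(fields["REMOTE_REQUEST_ADDRESS"], fields["lrmsAbsJobId"])
```

The USER_FQAN field is left out on purpose, because its value contains semicolons inside braces and would need a slightly richer parser.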

Patches require GRID_ID,GATEKEEPER_PEER,GATEKEEPER_JM_ID to be passed to the jobmanager.

Patch is attached!

Comments

Globus Toolkit/GT-442

Summary

PBS job manager leaves single-core-single-node jobs by default in $HOME, which may be a shared directory

Details

Type: Improvement

Status: Open

Description

This is a really optional request - since it changes the behaviour of the system. What about making the default job directory be $TMPDIR on the worker node if that is defined and exists for single-core-single-node jobs unless the user defines a different directory?

$HOME may be on a shared file system and users writing scratch data there usually kill a shared file server. Putting $CWD by default to $TMPDIR on the worker node prevents such incidents.
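
The proposed selection rule can be written down as a short decision function. This Python sketch only illustrates the request; the function and parameter names are hypothetical and it is not taken from the attached patch:

```python
import os
from typing import Mapping, Optional


def default_job_directory(env: Mapping[str, str],
                          user_directory: Optional[str] = None,
                          single_core_single_node: bool = True) -> str:
    """Pick the working directory for a job, following the proposal above."""
    if user_directory:
        return user_directory  # an explicit user choice always wins
    tmpdir = env.get("TMPDIR")
    if single_core_single_node and tmpdir and os.path.isdir(tmpdir):
        return tmpdir  # node-local scratch space
    return env["HOME"]  # current default behaviour
```

Jobs that span several cores or nodes keep $HOME in this sketch, since the request only covers single-core-single-node jobs.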

Patched example (including other patches) is in
https://ndpfsvn.nikhef.nl/repos/pdpsoft/trunk/nl.nikhef.ndpf.tools/globus-gram-job-manager-pbs-nikhef/

Patch is attached!

Comments

Globus Toolkit/GT-443

Summary

allow PBS job manager to set GLOBUS_LOCATION to a different value for the generated scheduler script than on the GRAM5 service node

Details

Type: Improvement

Status: Open

Description

Since the GRAM5 head node and the worker nodes in the batch system do not necessarily run the same OS version or distribution, the GLOBUS_LOCATION variable may have to be different on the WNs. Currently, the generated LRMS job script forces GLOBUS_LOCATION in the resulting job to be set to the value on the GRAM5 head node.
This should be configurable.

Patched example HARD-CODED (Sorry!) (including other patches) is in
https://ndpfsvn.nikhef.nl/repos/pdpsoft/trunk/nl.nikhef.ndpf.tools/globus-gram-job-manager-pbs-nikhef/

Patch is attached!

Comments

Joe Bester - 2013-07-25

Does setting -target-globus-location path-on-exec-node work for you? That's been available for a while now. That provides a different GLOBUS_LOCATION in the execution node's environment.

Globus Toolkit/GT-444

Summary

Support for gsi-authz.conf in data nodes not working

Details

Type: Bug

Status: Open

Description

Here at LRZ we use a split configuration for the production GridFTP instances. We would like to add VOMS support but we found a problem.

Given a VOMS configuration (see {lcas, lcmaps}.db_lxcluster) working for a single instance of GridFTP, we tried to extend it to a split setup. The scripts used to configure and launch the front end and the back end are, respectively, gridftp_fe and gridftp_be.



First of all, we compiled libraries and plugins from scratch, since our systems are SLES11 based, in particular:

- lcas-1.3.19

- voms-2.0.8

- lcmaps-1.5.7

- lcmaps-plugins-basic-1.5.1

- lcmaps-plugins-voms-1.5.5

- lcas-lcmaps-gt4-interface-0.2.5

I assume that all the environment variables are properly set, since everything is working for the single server case, as already said.



If the grid-mapfile has an entry for my DN, I can use the split service. If I want to use a pool account, i.e.

- removing my DN from the grid-mapfile

- adding the entry ' "/esr/*" .ops ' in order to map members of the ESR VO into accounts ops000* (gridmapdir, groupmapfile and groupmapdir are correctly populated)

then nothing works.



We tried many configurations:



- front end running as unprivileged user and back end as root: nothing works, even the log file of VOMS is not created (not a problem of permissions, the specific environment variable set in the configuration script of the service points to /tmp)

- both front end and back end running as root, VOMS disabled on the front end: of course it doesn't work if I want to use a pool account, the front end can't do the authentication and the error message is consistent: "530-Login incorrect. : globus_gss_assist: Gridmap lookup failure: Could not map ..."

- both front end and back end running as root, VOMS enabled on the front end (VOMS enabled/disabled on the back end doesn't matter): it doesn't work. The error message is

"530-Login incorrect. : globus_gss_assist: Error invoking callout

530-globus_callout_module: The callout returned an error"

The VOMS logfile is created, voms_fe-root-voms_be-root-novoms.log: it doesn't show any error, on the contrary, I'm mapped to the correct pool account.


Preliminary investigations seem to suggest that the back-end does not use the GSI authentication callout, by which we've implemented the VOMS integration. Come to think of it, this actually makes sense as it is the front-end that connects to the back-end, and probably not by using GSI authentication. Without GSI authentication, globus-gridftp-server falls back to using plain grid-mapfile lookups.

Can you please support split configurations in the GSI call-out?

Tracked as https://rt.ige.psnc.pl/rt/Ticket/Display.html?id=340 in EGCF RT.

Comments

dennisvd - 2013-07-24

To contribute to what Helmut reported: I've done some testing locally on the Nikhef testbed and what I see is that the gsi-authz call-out is run only for the front-end, not for the back-end. This can be illustrated by using separate LCMAPS log files for front-end and backend (no logfile is created for the back-end at all).

I've been using the documentation http://www.globus.org/toolkit/docs/5.2/5.2.4/gridftp/admin/#globus-gridftp-server and tried several combinations of -ipc-auth-mode, to no effect.

Globus Toolkit/GT-445

Summary

Doxygen fixes

Details

Type: Bug

Status: Resolved 2013-11-07

Description

The attached patches fix some doxygen issues in globus-gsi-cert-utils, globus-gssapi-gsi and globus-xio.

Comments

Joe Bester - 2013-11-07

This is fixed in 5.2.5

Globus Toolkit/GT-446

Summary

Documentation built twice

Details

Type: Bug

Status: Resolved 2013-11-07

Description

The doxygen documentation is built in a phony makefile target. This means that the documentation is built twice, once during "make" and once again during "make install". If a real target is used instead of a phony one this can be avoided, since make will then realise in the second invocation that the documentation has already been built. The attached patch implements this.

Comments

Joe Bester - 2013-11-07

This is fixed in 5.2.5

Globus Toolkit/GT-447

Summary

Data Management virtual packages don’t work

Details

Type: Bug

Status: Resolved 2013-11-07

Description

From: Stephen Rosen 
Date: July 24, 2013 12:15:34 PM CDT
Subject: [globus-dev] Misleading Instructions for GT 5.2.4 Install on Ubuntu

I was setting up a GridFTP server on a fresh 12.04 Ubuntu instance, and I realized that either the instructions for a fresh install of GT from native packaging are wrong, or they are misleading.
The instructions I was following are here: http://www.globus.org/toolkit/docs/5.2/5.2.4/admin/install/

For the record, it is possible that I did something wrong. The commands issued were
wget http://www.globus.org/ftppub/gt5/5.2/5.2.4/installers/repo/globus-repository-5.2-stable-precise_0.0.3_all.deb
sudo dpkg -i globus-repository-5.2-stable-precise_0.0.3_all.deb
sudo aptitude update
sudo aptitude install globus-data-management-server

After a conversation with Jack, he pointed out that I probably wanted to install the globus-gridftp package instead. That install worked, but the packages
globus-data-management-server
globus-data-management-client
globus-data-management-sdk
globus-resource-management-server
globus-resource-management-client
globus-resource-management-sdk

All fail with messages about unmet dependencies which are virtual packages.
The doc implies that they should all install successfully after adding the package. I only checked Ubuntu 12.04 and 10.04, but I bet this holds on all or many flavors of Debian.

I reread the doc, and if this is expected behavior it should be more clear there.

Best,
-Stephen

Comments

Joe Bester - 2013-11-07

This is fixed in 5.2.5

Globus Toolkit/GT-448

Summary

Flavor dependent file in globus_core/noflavor_data package

Details

Type: Bug

Status: Resolved 2013-11-07

Description

grep gcc /usr/share/globus/packages/globus_core/noflavor_data.filelist
/share/globus/flavors/flavor_gcc64.gpt

The flavor description metadata file, which is flavor dependent, is part of the noflavor_data package, which as a result is no longer flavor independent.

Suggestion - move to e.g. dev_${flavor} package. Patch attached.

Comments

Joe Bester - 2013-11-07

This is fixed in 5.2.5

Globus Toolkit/GT-449

Summary

GCMU Install Incomplete When User Aborts via no response to not fully qualified domain question

Details

Type: Bug

Status: Resolved 2013-11-07

Description

box does not have fqdn as shown by hostname -f

User runs sudo aptitude -y install globus-connect-multiuser (on ubuntu 12.04 in this case)

User is prompted as follows and answers no:
The hostname fri-interview does not appear to be fully qualified.
Do you wish to continue? [n]

It looks like install continues but it is not fully baked

Note this question is coming from grid-cert-request as run by globus-simple-ca.postinst

Comments

Jack Kordas - 2013-09-10

zendesk ticket: https://support.globusonline.org/tickets/301463

Joe Bester - 2013-09-10

It's a quirk of debian packages that if a postinstall or preinstall script fails it leaves things in a half-installed state. So, while we can fix it for globus-simple-ca, we can't make all installs work in general. In any case, I've moved this to GSI, since that's where the issue exists.

Joe Bester - 2013-11-07

I've fixed the simple ca postinst to not abort in this case.

Globus Toolkit/GT-450

Summary

extra newline in PASV error message

Details

Type: Bug

Status: Open

Description

09-11 21:41:13.128 _conn_cc_response S0.178< 500-500 Command failed.%0D%0D%0A500- : globus_ftp_control_local_pasv failed.%0D%0A500-globus_xio: globus_l_xio_tcp_bind failed.%0D%0A500-globus_xio: System error in bind: Address already in use%0D%0A500-globus_xio: A system call failed: Address already in use%0D%0A500 End.%0D%0A

becomes:

500-500 Command failed.

500- : globus_ftp_control_local_pasv failed.
500-globus_xio: globus_l_xio_tcp_bind failed.
500-globus_xio: System error in bind: Address already in use
500-globus_xio: A system call failed: Address already in use
500 End.
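
The blank line traces back to the doubled %0D in the first line of the logged reply, which decodes to CR CR LF rather than a single CR LF. A small Python sketch that decodes such a log entry and flags the doubled carriage return (the function name is hypothetical, and the sample contains only the first two lines of the reply above):

```python
from urllib.parse import unquote


def has_doubled_cr(logged_reply: str) -> bool:
    """Decode a percent-encoded reply and report whether any line ends in CR CR LF."""
    decoded = unquote(logged_reply)
    return "\r\r\n" in decoded


# First two lines of the logged reply quoted above.
logged = ("500-500 Command failed.%0D%0D%0A"
          "500- : globus_ftp_control_local_pasv failed.%0D%0A")
print(has_doubled_cr(logged))
```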

This appears on:

 telnet gridftp2.cac.cornell.edu 2811
Trying 128.84.3.47...
Connected to gridftp2.cac.cornell.edu.
Escape character is '^]'.
220 gridftp2.cac.cornell.edu GridFTP Server 6.35 (gcc64, 1375286616-83) [Globus Toolkit 5.2.4 GCMU-2.0.54] ready.

Comments

Globus Toolkit/GT-451

Summary

Cannot transfer using globus-url-copy. globus-xio Authentication error

Details

Type: Bug

Status: Open

Description

I tried doing a globus-url-copy from the server which has GridFTP 5.2.4 installed. It throws an Authentication error.

The stack trace is as follows:

530-globus_xio: Authentication Error
530-OpenSSL Error: a_verify.c:168: in library: asn1 encoding routines, function ASN1_item_verify: EVP lib
530-OpenSSL Error: fips_rsa_eay.c:748: in library: rsa routines, function RSA_EAY_PUBLIC_DECRYPT: padding check failed
530-OpenSSL Error: rsa_pk1.c:100: in library: rsa routines, function RSA_padding_check_PKCS1_type_1: block type is not 01
530 End.

I created the certificates using the following process:
1. grid-ca-create
2. grid-cert-request (for user)
3. grid-cert-request (for host)
4. grid-ca-sign (for user) - created usercert.pem and the corresponding pem in /home/username/.globus
5. grid-ca-sign (for host) - created hostcert.pem and the corresponding pem in /var/lib/globus/simple_ca/newcerts

Do you know what I'm doing wrong?

Thanks,
Sonali

Comments

Globus Toolkit/GT-452

Summary

autoconf macros for globus-core can’t handle absolute path in CC variable (e.g. CC=/usr/bin/gcc)

Details

Type: Bug

Status: Open

Description

When the CC environment variable is set to the absolute path of the compiler, the configure script of globus-core-8.8 initially respects this setting but at some point in the configuration switches to the first 'gcc' in the PATH. This behaviour does not occur when setting CC to just the name of the compiler.

Steps to reproduce:
 ln -s /usr/bin/gcc /tmp/mygcc
 cd globus-core-8.8
 mkdir build
 cd build
 CC=/tmp/mygcc ../configure --with-flavor=gcc64pthr

The result:
grep '^CC=' config.log
CC='/usr/bin/gcc'

The expected result:
grep '^CC=' config.log
CC='/tmp/mygcc'

If the configure script is run alternatively with
 PATH=/tmp:$PATH CC=mygcc ../configure --with-flavor=gcc64pthr

The result is as expected.

This issue breaks compatibility with macports; fixing the compiler to use 'gcc' as the preferred compiler does not work as the macports build scripts will use CC=/usr/bin/gcc.

See also https://trac.macports.org/wiki/UsingTheRightCompiler

Comments

Globus Toolkit/GT-453

Summary

multiple globus-job-manager instances appear for a single job

Details

Type: Bug

Status: Resolved 2013-10-28

Description

One of the people at the OSG GOC is seeing multiple globus-job-manager processes start up whenever a job is started.  See https://ticket.grid.iu.edu/goc/15953 for details.

Comments

Joe Bester - 2013-10-10

This is normal behavior if there are other jobs that are present in GRAM's view (state files in /var/lib/globus/gram_job_state). Those processes will terminate when the job manager is idle, and the main GRAM job manager process will terminate when all jobs are removed from the system either by expiration or job termination and two-phase commit.

sthapa - 2013-10-11

Hi Joe,

Elizabeth says that the job never completes when this happens.  I think she's running into some sort of anomalous behavior here instead of typical operations.

Suchandra

Globus Toolkit/GT-454

Summary

memory leak in gss_accept_sec_context

Details

Type: Bug

Status: Resolved 2014-09-26

Description

Valgrind reports memory as definitely lost, as follows:

==7809== 4,651,691 (261,200 direct, 4,390,491 indirect) bytes in 1,306
blocks are definitely lost in loss record 1,385 of 1,386
==7809==    at 0x4A05FDE: malloc (vg_replace_malloc.c:236)
==7809==    by 0x3D7025D95D: CRYPTO_malloc (in /usr/lib64/libcrypto.so.1.0.0)
==7809==    by 0x3D702D4E6C: ??? (in /usr/lib64/libcrypto.so.1.0.0)
==7809==    by 0x3D702D7D19: ASN1_item_ex_d2i (in /usr/lib64/libcrypto.so.1.0.0)
==7809==    by 0x3D702D83B3: ASN1_item_d2i (in /usr/lib64/libcrypto.so.1.0.0)
==7809==    by 0x3D702CC0F5: ASN1_item_dup (in /usr/lib64/libcrypto.so.1.0.0)
==7809==    by 0x552FEEF: globus_gsi_cred_get_cert (in /usr/lib64/libglobus_gsi_credential.so.1.4.6)
==7809==    by 0x4ED2916: gss_accept_sec_context (in /usr/lib64/libglobus_gssapi_gsi.so.4.6.8)

I guess this is the same problem as the memory leak in gss_accept_delegation() reported by

https://globus.atlassian.net/browse/GT-161

Comments

Joe Bester - 2014-09-26

This was fixed a while back and is included in GT6.0

Globus Toolkit/GT-455

Summary

Incorporate OSG patches

Details

Type: Task

Status: Open

Description

OSG has a large number of patches to various components that we'd like to include in upstream if possible.

This is an umbrella ticket; specific patches will have their own sub-tickets.

Comments

Joe Bester - 2014-09-18

All of the GRAM ones are committed to the GT6 branch and are now being processed by the build system. I'll update when the new packages are available in the GT6 testing repository.

Joe Bester - 2014-10-01

The remaining patches are GridFTP-related

Globus Toolkit/GT-456

Summary

OSG patch "load_requests_before_activating_socket.patch" for globus-gram-job-manager

Details

Type: Sub-task

Status: Resolved 2014-10-01

Description

This is a patch to cleanly recover when globus-job-manager processes die. It applied cleanly to gt525/packaging/source-trees/gram/jobmanager/source in globus_5_2_branch.

Comments

Joe Bester - 2014-10-01

Published to stable repo for GT 6 updates. See http://toolkit.globus.org/toolkit/advisories.html

Globus Toolkit/GT-457

Summary

OSG patch "nfslite.patch" for globus-gram-job-manager-condor

Details

Type: Sub-task

Status: Resolved 2014-10-01

Description

This is a patch to add NFSLite support to the Condor jobmanager. It applies to gt525/packaging/source-trees/gram/jobmanager/lrms/condor/source in globus_5_2_branch.

Comments

Joe Bester - 2014-10-01

Published to stable repo for GT 6 updates. See http://toolkit.globus.org/toolkit/advisories.html

Globus Toolkit/GT-458

Summary

OSG patch "groupacct.patch" for globus-gram-job-manager-condor

Details

Type: Sub-task

Status: Resolved 2014-10-01

Description

This is a patch to add a callout for accounting groups to the Condor jobmanager. It applies to gt525/packaging/source-trees/gram/jobmanager/lrms/condor/source in globus_5_2_branch.

Comments

Joe Bester - 2014-10-01

Published to stable repo for GT 6 updates. See http://toolkit.globus.org/toolkit/advisories.html

Globus Toolkit/GT-459

Summary

OSG patch "669-xcount.patch" for globus-gram-job-manager-condor

Details

Type: Sub-task

Status: Resolved 2014-10-01

Description

This is a patch for the Condor jobmanager to add support for the xcount and min_memory attributes. It applies to gt525/packaging/source-trees/gram/jobmanager/lrms/condor/source in globus_5_2_branch.

Comments

Joe Bester - 2014-10-01

Published to stable repo for GT 6 updates. See http://toolkit.globus.org/toolkit/advisories.html

Globus Toolkit/GT-460

Summary

OSG patch "717-max-walltime.patch" for globus-gram-job-manager-condor

Details

Type: Sub-task

Status: Resolved 2014-10-01

Description

This is a patch to add a periodic_remove statement for the max_wall_time attribute for the Condor jobmanager. It applies to gt525/packaging/source-trees/gram/jobmanager/lrms/condor/source in globus_5_2_branch.

Comments

Joe Bester - 2014-10-01

Published to stable repo for GT 6 updates. See http://toolkit.globus.org/toolkit/advisories.html

Globus Toolkit/GT-461

Summary

OSG patch "increase-concurrency.patch" for globus-gram-protocol

Details

Type: Sub-task

Status: Resolved 2014-10-01

Description

This is a patch to increase maximum concurrency in globus-gram-protocol. It applies to gt525/packaging/source-trees/gram/protocol/source in globus_5_2_branch.

Comments

Joe Bester - 2014-10-01

Published to stable repo for GT 6 updates. See http://toolkit.globus.org/toolkit/advisories.html

Globus Toolkit/GT-462

Summary

OSG patch "level-out-connection-speeds.patch" for globus-ftp-control

Details

Type: Sub-task

Status: Open

Description

This is a patch to fix connection speed leveling on servers with different buffer sizes. It is critical for HDFS, but may need additional review. It applies to gt525/packaging/source-trees/gridftp/control/source in globus_5_2_branch.

Comments

Globus Toolkit/GT-463

Summary

OSG patch "osg-path.patch" for globus-gram-job-manager-scripts

Details

Type: Sub-task

Status: Resolved 2014-10-01

Description

This is a patch to add a default PATH to the environment in JobDescription.pm. This patch may need review. It applies to gt525/packaging/source-trees/gram/jobmanager/scripts in globus_5_2_branch.

Comments

Joe Bester - 2014-10-01

Published to stable repo for GT 6 updates. See http://toolkit.globus.org/toolkit/advisories.html

Globus Toolkit/GT-464

Summary

OSG patch "gridftp-conf-logging.patch" for globus-gridftp-server

Details

Type: Sub-task

Status: Reopened

Description

This patch adds logging options to the default gridftp.conf. This patch may need additional review. It applies to gt525/packaging/source-trees/gridftp/server/src in globus_5_2_branch.

Comments

Joe Bester - 2014-10-01

Published to stable repo for GT 6 updates. See http://toolkit.globus.org/toolkit/advisories.html

Joe Bester - 2014-10-01

Oops. Wrong issue #. This one needs review by Mike

Globus Toolkit/GT-465

Summary

OSG patch "gatekeeper-logrotate-copytruncate.patch" for globus-gatekeeper

Details

Type: Sub-task

Status: Resolved 2014-10-01

Description

This is a patch to the logrotate config file for globus-gatekeeper to work around an issue where the gatekeeper keeps writing to a logfile after it has been rotated. This may need additional review. This applies to gt525/packaging/source-trees/gatekeeper/source in globus_5_2_branch. (There is a similar patch for globus-gram-job-manager).

Comments

Joe Bester - 2014-10-01

Published to stable repo for GT 6 updates. See http://toolkit.globus.org/toolkit/advisories.html

Globus Toolkit/GT-466

Summary

OSG patch "logrotate-copytruncate-jobmanager.patch" for globus-gram-job-manager

Details

Type: Sub-task

Status: Resolved 2014-10-01

Description

This is a patch to the logrotate config file for globus-gram-job-manager to work around an issue where the job-manager keeps writing to a logfile after it has been rotated. This may need additional review. This applies to gt525/packaging/source-trees/gram/jobmanager/source in globus_5_2_branch. (There is a similar patch for globus-gatekeeper).

Comments

Joe Bester - 2014-10-01

Published to stable repo for GT 6 updates. See http://toolkit.globus.org/toolkit/advisories.html

Globus Toolkit/GT-467

Summary

OSG patch "gratia.patch" for globus-gram-job-manager-scripts

Details

Type: Sub-task

Status: Resolved 2014-10-01

Description

This patch adds a callout to Gratia (if it's present) to the job manager script. It is OSG specific, but should be harmless on non-OSG sites. This patch may need additional review. It applies to gt525/packaging/source-trees/gram/jobmanager/scripts in globus_5_2_branch.

Comments

Joe Bester - 2014-10-01

Published to stable repo for GT 6 updates. See http://toolkit.globus.org/toolkit/advisories.html

Globus Toolkit/GT-468

Summary

OSG patch "osg-environment.patch" for globus-gram-job-manager-scripts

Details

Type: Sub-task

Status: Resolved 2014-10-01

Description

This patch adds OSG job environment information to the job manager scripts. It's OSG-specific, but should be harmless on non-OSG sites. This patch may need additional review. It applies to gt525/packaging/source-trees/gram/jobmanager/scripts in globus_5_2_branch.

Comments

Joe Bester - 2014-10-01

Published to stable repo for GT 6 updates. See http://toolkit.globus.org/toolkit/advisories.html

Globus Toolkit/GT-469

Summary

MFMT/UTIME update access time but shouldn’t

Details

Type: Bug

Status: Resolved 2013-10-15

Description

The FTP commands MFMT/UTIME set the modification time via utime(), which also sets the access time. MFMT is only supposed to set the modification time. I don't see a reference to this in RFC 959 or RFC 3659 but it was described in an RFC draft (http://tools.ietf.org/html/draft-somers-ftp-mfxx-03#page-7).

This change seems trivial but consider the consequence. Clients that maintain modification times post transfer (like GO) cause the access time to be set back as well. On systems that purge data based, in part, on last access time, the data that just transferred is instantly considered out of date and becomes a purge candidate.
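
A minimal sketch of the intended semantics, written in Python rather than the server's C code: read the current access time and hand it back unchanged when setting the new modification time. The helper name set_mtime_preserving_atime is hypothetical and is not part of the GridFTP server.

```python
import os
import tempfile


def set_mtime_preserving_atime(path, mtime_ns):
    """Set only the modification time of `path` (hypothetical helper).

    os.utime() always writes both timestamps, so the current access
    time is read first and passed back unchanged.
    """
    st = os.stat(path)
    os.utime(path, ns=(st.st_atime_ns, mtime_ns))


if __name__ == "__main__":
    # Demonstration on a scratch file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        scratch = tmp.name
    before = os.stat(scratch).st_atime_ns
    set_mtime_preserving_atime(scratch, 1_000_000_000 * 10**9)
    after = os.stat(scratch)
    print(after.st_atime_ns == before, after.st_mtime_ns)
    os.remove(scratch)
```

With this approach a purge policy keyed on last access time sees the time of the transfer, not the preserved modification time.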

Comments

Globus Toolkit/GT-470

Summary

Globus IO reports timeout error as cancellation

Details

Type: Bug

Status: Resolved 2013-10-15

Description

IO compat lib wraps timeout errors from xio as caller generated cancellations, which is different from original IO behavior.  This causes issues with globus_ftp_control which only expects cancellation errors on connections it actively cancels.

Comments

Globus Toolkit/GT-471

Summary

select() for writes on a broken socket never indicates ready

Details

Type: Sub Issue

Status: Open

Description

The GridFTP server is seeing issues where a connection is written to without reading, and eventually the writes hang.  The root cause is that select never indicates write-ready despite the connection being broken.  Simple tests show that select normally indicates write-ready after disconnection, but it appears that if the write buffer fills before the disconnection is detected (write() does not return an error yet), then select will not indicate write-ready.  If it did, the resulting write attempt would fail.  SO_ERROR does not indicate an error.

This is mainly for documentation.  I am not sure if this is valid kernel behavior or if there is a real workaround when not reading on the socket.  The most specific doc I can find only indicates that select for write will indicate ready if there is buffer space.  epoll() does not seem to suffer from this issue, so that may be the way to go if this is only a Linux issue.
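
Independent of whether the kernel behavior is valid, a caller can at least bound the wait so that a stalled write is detected instead of hanging. The sketch below is in Python, not the server's C code; send_with_timeout is a hypothetical helper, and it neither explains nor fixes the select behavior described above.

```python
import select
import socket


def send_with_timeout(sock, data, timeout):
    """Send once, waiting at most `timeout` seconds for write-readiness.

    Raises TimeoutError instead of waiting forever when select() never
    reports the socket as writable.
    """
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        raise TimeoutError("socket not writable after %.1f s" % timeout)
    return sock.send(data)


if __name__ == "__main__":
    left, right = socket.socketpair()
    left.setblocking(False)
    try:
        # Nobody reads from `right`, so `left` stops being writable.
        while True:
            send_with_timeout(left, b"x" * 65536, 0.2)
    except (TimeoutError, BlockingIOError) as exc:
        print("stalled write detected:", type(exc).__name__)
    finally:
        left.close()
        right.close()
```

The timeout value is a policy choice; too short a value would misreport a slow but healthy peer as stalled.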

Comments

Globus Toolkit/GT-472

Summary

GridFTP server fails to detect client disconnection with pipelining

Details

Type: Bug

Status: Resolved 2013-10-15

Description

The GridFTP server reads ahead to queue up client commands, which is the basis of our pipelining support.  After the queue limit is reached the server will stop reading client commands.  If the client disconnects in this state, the server continues to write, which eventually will not finish (select doesn't indicate the socket is ready for writes, GT-471), and the server will hang waiting for that write callback.

Comments

Mike Link - 2013-10-15

Reconfigured server to never stop reading from the queue, but to return an error if the pipeline read queue limit is exceeded.  The limit is set at 1000 commands, configurable via env GFS_MAX_READ_QUEUE
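
The policy described in this fix can be sketched as follows. This is an illustration in Python, not the server's implementation: the class and function names are hypothetical, and the fallback to 1000 for an unset, non-numeric, or non-positive GFS_MAX_READ_QUEUE is an assumption.

```python
import os
from collections import deque

DEFAULT_LIMIT = 1000  # default stated in the comment above


class ReadQueueOverflow(Exception):
    """Raised when a client pipelines more commands than allowed."""


def queue_limit_from_env(environ=os.environ):
    """Read the limit from GFS_MAX_READ_QUEUE, falling back to the default."""
    raw = environ.get("GFS_MAX_READ_QUEUE")
    try:
        limit = int(raw) if raw is not None else DEFAULT_LIMIT
    except ValueError:
        limit = DEFAULT_LIMIT
    return limit if limit > 0 else DEFAULT_LIMIT


class CommandQueue:
    """Keep accepting commands, but fail loudly past the limit."""

    def __init__(self, limit):
        self.limit = limit
        self._pending = deque()

    def enqueue(self, command):
        if len(self._pending) >= self.limit:
            raise ReadQueueOverflow(
                "pipeline read queue limit of %d exceeded" % self.limit)
        self._pending.append(command)

    def dequeue(self):
        return self._pending.popleft()
```

Because reading never stops, a client disconnect is still noticed on the read side, which is the point of the change.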

Globus Toolkit/GT-473

Summary

GridFTP sharing extensions specification

Details

Type: Documentation

Status: Resolved 2014-03-19

Description

Write a spec (in IETF RFC style) for the sharing extensions to Globus Online. The two other GridFTP server implementations, dCache and Gryphon for iRods, are interested in adding support for those extensions. Once complete, submit the spec to OGF for standardization.

Comments

msalle - 2014-03-19

Is the specification available somewhere? That could (partially) solve ticket https://globus.atlassian.net/browse/GT-476 and thereby my still open question on ticket https://globus.atlassian.net/browse/GT-475.
Mischa

Globus Toolkit/GT-474

Summary

Report platform/distro name and version with usage stats

Details

Type: New Feature

Status: Open

Description

Right now the GridFTP server reports its own version and release information to usage stats, but it could be useful to know the platform, distro, and version of the host machine.  Possibly include this in the usage stats library rather than requiring each application to query it.
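
As a rough illustration of what such a query could collect, the Python sketch below combines the standard platform module with a parser for os-release style text. It assumes the distro information comes from a file such as /etc/os-release where that file exists; the function names and the report layout are hypothetical and are not part of the usage stats library.

```python
import platform


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"\'')
    return fields


def host_platform_report(os_release_text=""):
    """Collect fields a usage-stats packet might carry (hypothetical)."""
    distro = parse_os_release(os_release_text)
    return {
        "system": platform.system(),
        "kernel": platform.release(),
        "machine": platform.machine(),
        "distro": distro.get("ID", "unknown"),
        "distro_version": distro.get("VERSION_ID", "unknown"),
    }
```

Doing this once in a shared library, as suggested above, keeps every application reporting the same fields.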

Comments

Globus Toolkit/GT-475

Summary

Globus Online Sharing does not work with LCAS/LCMAPS libraries

Details

Type: Bug

Status: Open

Description

This issue is tracked at https://rt.ige-project.eu/rt/Ticket/Display.html?id=367


The IGE testbed machine at LRZ, gt5-ige.drg.lrz.de, hosts a GridFTP server "GO Share" enabled on port 2812 (endpoint: mlanati#gt5-ige). The sharing feature works, but when GridFTP is configured with LCAS/LCMAPS support for VOMS support, sharing fails with the following message:



[11160] Sun Aug 25 16:49:13 2013 :: New connection from: cli.globusonline.org:58654

[11160] Sun Aug 25 16:49:14 2013 :: cli.globusonline.org:58654: [CLIENT]: USER :globus-sharing:cert=LS0tLS1CRUdJTiBD...

...
BVEUtLS0tLQ==;id=222ee528-ed20-11e2-9f2f-22000a972bd6;sharee=matteodemo;

[11160] Sun Aug 25 16:49:14 2013 :: cli.globusonline.org:58654: [CLIENT]: PASS dummy

[11160] Sun Aug 25 16:49:14 2013 :: DN /C=US/O=Globus Consortium/OU=Globus Online/OU=Transfer User/CN=__transfer__ has provided sharing credentials for DN /C=DE/O=GridGermany/OU=Leibniz-Rechenzentrum/CN=Matteo Lanati.

[11160] Sun Aug 25 16:49:14 2013 :: cli.globusonline.org:58654: [CLIENT]: PASS dummy

[11160] Sun Aug 25 16:49:14 2013 :: cli.globusonline.org:58654: [SERVER]: 530-Login incorrect. : globus_gss_assist: Error invoking callout

530-globus_callout_module: The callout returned an error

530-an unknown error occurred

530 End.

[11160] Sun Aug 25 16:49:14 2013 :: Closed connection from cli.globusonline.org:58654

[10494] Sun Aug 25 16:49:14 2013 :: Child process 11160 ended with rc = 0


It looks like the library performing the credential mapping should be updated.

Please, get in touch if you need access to the testbed.


Later investigation revealed:
I found out that if I add to the grid-mapfile the DN used by the Globus Share service and I associate it with a user having sharing privileges on the folder, then it works.
Note that this is not necessary in a regular GridFTP installation (i.e. without LCAS/LCMAPS callout)

I have been looking in the logs (I was on holiday last week), somehow LCMAPS receives the
 "/C=US/O=Globus Consortium/OU=Globus Online/OU=Transfer User/CN=__transfer__"
credential, and not the "/C=DE/O=GridGermany/OU=Leibniz-Rechenzentrum/CN=Matteo Lanati" credential.
Now I don't know how this is supposed to work, but it doesn't sound like anything we can change. LCMAPS is simply responding to what it receives in the gsi credential. From the gsiftp.log it seems like both credentials are available, but the wrong one is passed in/used.

Comments

Mike Link - 2013-10-18

Hi Helmut,

I need to create a formal doc, but basically you'll have to add sharing support to your callout.

Here is a good example of the simplest change you would need to make:
http://viewcvs.globus.org/viewcvs.cgi/gsi/gridmap_callout/source/globus_gridmap_callout.c?r1=1.4&r2=1.4.22.2

The function globus_gsi_cred_read_cert_buffer() can return you the subject, x509 cert, x509 chain, and the globus_gsi_cred_handle_t of the original user cert.

In order to support sharing in a gridmap callout, you must not fail if the credentials are expired, and any extensions that you reference in the credential must be ageless (i.e. an auth token to a remote system that itself expires would not be compatible with sharing).

msalle - 2013-10-28

Hi Mike,

In order to be able to implement this, I have a few questions.
The input is a PEM-encoded proxy cert as far as I can see. Which parts are allowed to have expired? Only the proxy cert itself, or also the EEC, or maybe even the (root) CA cert(s)? And should revoked EEC and CA certs be accepted? Should the chain itself be validated? Furthermore, what is the precise reason for wanting to accept expired credentials? Is it because the (proxy) cert is basically used as a convenient wrapper around the DN? It makes me feel a bit uncomfortable to rely on expired proxy certificates; in my opinion once a proxy is expired it's useless, but it seems it could still be used here.

Best wishes,
Mischa

helmut - 2014-02-10

Hi Mike,

I think Mischa still needs your answers in order to be able to proceed. Could you please advise him?

Thanks!
Helmut

Anonymous - 2014-05-16

Hi Mike,

as I commented in the IGE rt tracker:

I am implementing a setup where the sharing proxy is completely (i.e. also chain, VOMS AC etc.) checked, but always and entirely at the notBefore time. So everything might be long expired, including even root CA certs, but they may *not* be revoked.
Would this be a correct implementation?

Best wishes,
Mischa

Globus Toolkit/GT-476

Summary

Document gridmap callout requirements for sharing support

Details

Type: Documentation

Status: Open

Description

Existing callouts will need to be changed to support sharing.  Document.

Comments

Globus Toolkit/GT-477

Summary

Tracking TCP retransmits on the GridFTP server

Details

Type: Task

Status: Open

Description

It could be *really* useful for the GridFTP server to log the number of TCP retransmits for each transfer (with "-log-transfer" option would be fine). This way for performance troubleshooting, it would be easy to tell network issues from end-host/disk issues. It's really easy to collect this. Some sample code from iperf3 is below.

#if defined(linux) || defined(__FreeBSD__)
   socklen_t tcp_info_length = sizeof(struct tcp_info);

   if (getsockopt(sp->socket, IPPROTO_TCP, TCP_INFO, (void *)&irp->tcpInfo, &tcp_info_length) < 0)
       iperf_err(sp->test, "getsockopt - %s", strerror(errno));
#endif


/*************************************************************/
long
get_tcpinfo_total_retransmits(struct iperf_interval_results *irp)
{
#if defined(linux) && defined(TCP_MD5SIG)
   return irp->tcpInfo.tcpi_total_retrans;
#else
#if defined(__FreeBSD__) && __FreeBSD_version >= 600000
   return irp->tcpInfo.__tcpi_retransmits;
#else
   return -1;
#endif
#endif
}

Comments

Raj Kettimuthu - 2014-02-10

Tierney tells me that there is often a burst of retransmits at the end of slow start. These retransmits are not really of interest to us. The interesting data is whether there is a steady rate of retransmits throughout the data transfer. So, we should either log a histogram of retransmits (say 1 minute buckets) or just wait 10 seconds (for slow start to end) before tracking retransmits.
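
A sketch of the bucketing idea, in Python for brevity. It assumes the server periodically samples a cumulative counter such as tcpi_total_retrans and records (elapsed seconds, total) pairs; the function name is hypothetical. Combining the histogram with the 10 second grace period in one function is a choice made for this illustration, since the comment above offers them as alternatives.

```python
def retransmit_histogram(samples, bucket_seconds=60, skip_seconds=10):
    """Bucket cumulative retransmit counters into per-interval counts.

    `samples` is an iterable of (elapsed_seconds, total_retransmits)
    pairs in time order. Increases observed at samples taken before
    `skip_seconds` are dropped so the burst at the end of slow start
    does not dominate the result.
    """
    histogram = {}
    previous_total = None
    for elapsed, total in samples:
        # Difference from the previous reading of the cumulative counter.
        delta = 0 if previous_total is None else total - previous_total
        previous_total = total
        if elapsed < skip_seconds:
            continue
        bucket = int(elapsed // bucket_seconds)
        histogram[bucket] = histogram.get(bucket, 0) + delta
    return histogram
```

Passing skip_seconds=0 gives the plain histogram. A steady non-zero count across buckets would point at the network rather than the end host or disk.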

Globus Toolkit/GT-478

Summary

Comments on GT 5.2.5 Release Candidate

Details

Type: Bug

Status: Open

Description

Here are some comments on the GT 5.2.5 Release candidate.

=====================
globus-gsi-credential
=====================

This version (5.7) adds symbols to the library w.r.t previous version (5.3)
 - globus_gsi_cred_read_cert_buffer
 - globus_gsi_cred_verify_cert_chain_when

The Major version and Age should therefore increase, and Minor version
should be reset. (The Age part is important! - otherwise the soname changes.)

-    
+    
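
The versioning rule above can be illustrated with a few lines of Python. This is a sketch under the assumption that the GPT Major/Minor/Age fields map onto libtool's current:revision:age, so that the number in the soname on Linux is Major minus Age; the function names and the example values are illustrative only.

```python
def bump_for_added_symbols(major, minor, age):
    """Return (major, minor, age) after a backward-compatible addition.

    Major and Age both increase by one and Minor is reset to zero.
    """
    return major + 1, 0, age + 1


def soname_number(major, age):
    """Number in the soname under libtool's Linux naming: current - age."""
    return major - age


if __name__ == "__main__":
    old = (5, 7, 4)  # illustrative values, not taken from the package
    new = bump_for_added_symbols(*old)
    # Same soname before and after, so existing binaries keep working.
    print(soname_number(old[0], old[2]) == soname_number(new[0], new[2]))
```

Increasing Major without Age changes the difference, which is why the soname would change in that case.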

Packages using any of the new symbols must declare dependencies
(compile and linking) on globus-gsi-credential version 6 in their GPT
metadata. For most of them this is a new dependency that must be added
rather than an existing dependency that should change version.
 - globus-gridftp-server
 - globus-gridmap-callout
 - globus-gridmap-eppn-callout
 - globus-gridmap-verify-myproxy-callout
 - globus-gss-assist


=================
globus-gss-assist
=================

Add new dependency on globus-gsi-credential in GPT metadata - see
above for details.

This version (8.9) adds symbols to the library w.r.t previous version (8.7)
 - globus_gss_assist_map_and_authorize_sharing

The Major version and Age should therefore increase, and Minor version
should be reset. (The Age part is important! - otherwise the soname changes.)

-    
+    

The new function is missing in the header file. This results in warnings
about implicit declarations when compiling packages that depend on it.

Add the new function to the header file!

Packages using the new symbol must declare dependencies (compile and
linking) on globus-gss-assist version 9 in their GPT metadata.
This seems to be only one: globus-gridftp-server

             
                 
-                    
+                    
                 
             

(The change above should be applied three times: compile, lib_link, pgm_link.)


=====================
globus-gridftp-server
=====================

Add compile and link dependency on globus-gsi-credential version 6 in GPT
metadata (see above).

Update compile and link dependency on globus-gss-assist to version 9 in GPT
metadata (see above)


=====================================
globus-gridmap-callout
=====================================

Add compile and link dependency on globus-gsi-credential version 6 in GPT
metadata (see above).


=====================================
globus-gridmap-eppn-callout
=====================================

Add compile and link dependency on globus-gsi-credential version 6 in GPT
metadata (see above).

The GPT metadata contains a  tag that is not quite empty
(This confuses the globus-spec-creator script).

-            
-            
+            


=====================================
globus-gridmap-verify-myproxy-callout
=====================================

Add compile and link dependency on globus-gsi-credential version 6 in GPT
metadata (see above).

The GPT metadata contains a  tag that is not quite empty
(This confuses the globus-spec-creator script).

-            
-            
+            

Comments

Mattias Ellert - 2013-10-26

Another comment - The SLURM module description is inconsistent with the existing ones:

    Condor Job Manager Support
    Fork Job Manager Support
    LSF Job Manager Support
    PBS Job Manager Support
    Grid Engine Job Manager Support

But:

    SLURM GRAM LRM

I strongly suggest changing this to be consistent with the others:

    SLURM Job Manager Support

Mattias Ellert - 2013-10-26

The 5.2.5 RC is not tagged in the CVS like the release candidates of earlier releases, so it looks more like a random snapshot than a proper release candidate. I hope the final release will not be done without a CVS tag.

The RC source directory http://www.globus.org/ftppub/gt5/5.2/5.2.5rc1/packages/src/ contains a lot of files that are not part of the release. A release (or an RC) should only contain one version of each package. This RC contains multiple versions of many packages and its contents are therefore ambiguous. It looks like the RC source directory started from a complete copy of the previous release's update directory http://www.globus.org/ftppub/gt5/5.2/5.2.4/updates/src/ without removing old versions of packages. The RC source directory should contain only the package versions in the release candidate. Compare to the contents of the source directories of previous RCs.

Mattias Ellert - 2013-12-11

Here is a patch for one of the remaining issues.

Globus Toolkit/GT-479

Summary

memory leak in LSF job manager

Details

Type: Bug

Status: Resolved 2014-01-16

Description

Wei Yang at SLAC is running into problems with a memory leak or something similar in the LSF jobmanager.  I've included his report below and am getting more details.  There's an OSG GOC ticket for this open at https://ticket.grid.iu.edu/goc/17605


At SLAC we constantly have problems with the LSF jobmanager with the new GT5 installation. It runs a few days (now a few hours) and it starts to pile up globus-job-manager processes. One of them seems to be still running OK but its memory usage grows quickly. All other globus-job-manager processes (last time I checked, there were 11k of them) are basically hanging at

connect(X,X,"/var/lib/globus/gram_job_state/osgatlas01/osgserv02/lsf.18ac9349.sock")

This happens with both SEG enabled and disabled. In most cases, the memory leak is fast and I don't get much chance to find it before the machine becomes unresponsive. But there are two cases (one with SEG and one without SEG) where I stopped globus-gatekeeper, and the system slowly went back to normal (that particular globus-job-manager is still there but all others disappeared).

After I stopped globus-gatekeeper, there were still jobs being submitted to our LSF (probably by those remaining globus-job-manager processes?)

The CE is a RHEL6 x86_64 with 16GB in KVM, with OSG 3.1.25 rpms. Any idea?

Comments

sthapa - 2013-10-30

It looks like Wei may have resolved this issue:


I found that the SEG module is using the secondary LSF log, not the primary log. I believe this is the root cause of most of the problems I saw, if not all. I fixed it. Let me run for a month (previously we were able to run for two weeks with the secondary log) and see if we still run into problems. If not, I will close this ticket, or we can follow up if there are still problems.

I think the globus-job-manager may still have some memory problem but I hope that by using the primary log, the problem will never be triggered.

regards,
Wei Yang  |  yangw@slac.stanford.edu  |  1-650-926-3338

Stuart Martin - 2013-10-30

Thanks for the update.  Marking this resolved for now.  Reopen or create a new issue if something specific is identified.

sthapa - 2013-12-03

Can this be reopened?  Wei indicates that it's still a problem:

Hi Chris,

We still see a large number of gram state files left behind at /var/lib/globus/gram_job_state/osgatlas01/18ac9349/lsf (not today since we have a site wide downtime today), and it seems one of the (long running) globus-job-manager processes spent a lot of time on those files. This still results in a huge # of globus-job-manager processes hanging around (I guess they were all waiting for the long running globus-job-manager to respond to a unix socket). When the # of these processes reaches ~20k, the system will run out of memory.

To mitigate these issues, I run a script to delete state files older than 3 days, and temporarily stop the globus-gatekeeper service if the # of globus-job-manager processes goes beyond 1000. Doing so allows me to keep the number of state files below 14k and thus we are able to achieve stable operation. But I think the problem is clearly still there.

regards,
Wei Yang  |  yangw@slac.stanford.edu  |  650-926-3338(O)
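
The first part of the mitigation described in this report (finding state files older than 3 days) could look roughly like the Python sketch below. The actual script is not shown in the ticket, so the function name, the use of modification time as the age criterion, and the report-only behavior are all assumptions.

```python
import os
import tempfile
import time


def stale_state_files(state_dir, max_age_days=3, now=None):
    """List regular files in `state_dir` not modified for `max_age_days`.

    Only reports candidates; deleting them is left to the caller.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in os.scandir(state_dir):
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            stale.append(entry.path)
    return sorted(stale)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as state_dir:
        old = os.path.join(state_dir, "job.old")
        new = os.path.join(state_dir, "job.new")
        for path in (old, new):
            open(path, "w").close()
        four_days_ago = time.time() - 4 * 86400
        os.utime(old, (four_days_ago, four_days_ago))
        print(stale_state_files(state_dir))
```

Removing a state file for a job that is still running would lose that job, so any real cleanup would need to confirm the job has finished first.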

Joe Bester - 2013-12-10

This is difficult for me to debug as I don't have any LSF capable machines that I can access. I guess some things to figure out:

- Why isn't the job manager responding to the startup socket connects?
- Why are there so many jobs in the job state directory? Are they finished? Why aren't they being cleaned/expired?
- Why is the client sending so many job requests when the service is not responding?

The first one we might be able to get some info from the logs at higher levels of verbosity. I think OSG has some sort of state file parser that might help figure out whether the jobs are completed for the second item.

Stuart Martin - 2014-01-16

resolving this.  Reopen with additional info if it comes up again.

Globus Toolkit/GT-480

Summary

Implement GO plan for HTTP protocol support

Details

Type: New Feature

Status: Resolved 2014-01-22

Description

Changes existing http command protocol in concert with GO.

Comments

Mike Link - 2014-01-22

Major functionality finished and released in a test build.

Globus Toolkit/GT-481

Summary

Avoid UDT driver blocking

Details

Type: Improvement

Status: Resolved 2013-11-06

Description

The udt driver is written using a blocking udt api, which is supported in xio's framework via a wrapper to blocking drivers (wrapblock).  A particular issue here is a blocking accept, which results in a hang before eventual timeout on the listening side of failed udt connections.   libudt has a select interface so that should be used for at least the accept call.

Comments

Mike Link - 2013-11-06

fix committed after 5.2.5 release.

Globus Toolkit/GT-482

Summary

misleading error message when filenames contain invalid characters

Details

Type: Bug

Status: Resolved 2013-11-06

Description

Trying to create a directory containing '>', which is not allowed in Windows, results in the following error:

500-Command failed : System error in mkdir: Invalid argument
500-A system call failed: The device does not recognize the command.

It would be nice to display:
A filename cannot contain any of the following characters:
\ / : * ? " < > |
http://support.microsoft.com/kb/177506
Which is what Windows Explorer does, at least in Windows 7.
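
A check along these lines could produce the friendlier message. The sketch is in Python and is only an illustration: the function names are hypothetical, the character list is the one quoted above, and the check is meant for a single path component, since / and \ also act as separators in a full path.

```python
WINDOWS_INVALID_FILENAME_CHARS = '\\/:*?"<>|'


def invalid_filename_chars(name):
    """Return the forbidden characters found in `name`, without repeats."""
    found = []
    for ch in name:
        if ch in WINDOWS_INVALID_FILENAME_CHARS and ch not in found:
            found.append(ch)
    return found


def describe_filename_error(name):
    """Build a user-facing message, or None if the name is acceptable."""
    if not invalid_filename_chars(name):
        return None
    return ("A filename cannot contain any of the following characters: "
            + " ".join(WINDOWS_INVALID_FILENAME_CHARS))
```

Running the check before calling mkdir lets the server report the real cause instead of the generic system error.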

Comments

Mike Link - 2013-11-06

fixed in windows-3.

Globus Toolkit/GT-483

Summary

windows: using -home-dir causes conflicts with sharing state dir

Details

Type: Bug

Status: Resolved 2013-11-06

Description

On Windows, -home-dir overrides the internal setting for the true home dir, which is used to set sharing state dir and is expected to be in native path format.  This can cause sharing state dir to be an unwanted or invalid path and break sharing.

Comments

Mike Link - 2013-11-06

Fixed in windows-2.

Globus Toolkit/GT-484

Summary

xio gsi driver hardcodes "TCP" in failed connection error message

Details

Type: Bug

Status: Resolved 2013-11-06

Description

Error message references TCP driver even if tcp isn't the underlying protocol.

Comments

Globus Toolkit/GT-485

Summary

~ defaults to / if the home dir path contains a symlink that leads outside of rp list

Details

Type: Bug

Status: Resolved 2014-01-22

Description

The internal access check that sets the home dir to / happens before the rp symlink normalization (from GT-374) that would enable that check to pass.

Comments

Mike Link - 2014-01-22

Fixed in GCP build 4.

Globus Toolkit/GT-486

Summary

Perform udt driver connection using the socket negotiated by ice instead of binding to the negotiated source addr.

Details

Type: New Feature

Status: Resolved 2014-01-22

Description

By using the negotiated socket we can avoid binding to a specific port, which the local firewall might detect as an inbound connection. Bryce did some work in KOA-2817 to expose the socket from within libnice.

Comments

Mike Link - 2014-01-22

Done, released in latest GCP builds.

Globus Toolkit/GT-487

Summary

Normalize paths passed via key=value; parameters

Details

Type: Bug

Status: Resolved 2014-01-22

Description

HTTP will pass paths in a parameter string, so we need to support normalization for this to work with restricted paths and sharing.  This will also apply to sharing creation and bugs like KOA-2833 which required normalization.
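
To make the requirement concrete, here is a small Python sketch of normalizing path values inside a key=value; string before checking them against a restricted path. The key name path, the function names, and the purely lexical normalization (no symlink resolution) are assumptions for illustration and do not describe the server's actual code.

```python
import posixpath


def parse_params(param_string):
    """Split a 'key=value;key=value;' string into a dict."""
    params = {}
    for item in param_string.split(";"):
        if not item:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("malformed parameter: %r" % item)
        params[key] = value
    return params


def normalize_path_params(param_string, path_keys=("path",)):
    """Parse the string and lexically normalize the path-valued keys."""
    params = parse_params(param_string)
    for key in path_keys:
        if key in params:
            params[key] = posixpath.normpath(params[key])
    return params


def is_within(path, allowed_root):
    """True if the normalized `path` lies at or below `allowed_root`."""
    root = posixpath.normpath(allowed_root)
    path = posixpath.normpath(path)
    return path == root or path.startswith(root.rstrip("/") + "/")
```

Without the normalization step, a value such as /data/../etc/passwd would pass a naive prefix check against /data.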

Comments

Mike Link - 2014-01-22

Released with HTTP test.

Globus Toolkit/GT-488

Summary

Add a guc example for using DCSC

Details

Type: Documentation

Status: Open

Description

See gt-user email with subject "GridFTP DCSC transfer using different source and destination certs"

Add an example or 2 to the gridftp user guide for using guc where DCSC is needed between the 2 endpoints.
http://toolkit.globus.org/toolkit/docs/latest-stable/gridftp/user/#gridftp-user-basic

Comments

Globus Toolkit/GT-489

Summary

xio_gsi errors with OpenSSL 1.0.1e

Details

Type: Bug

Status: Open

Description

RHEL 6.5 / CentOS 6.5 just came out, with OpenSSL version 1.0.1e. Upgrading to this version causes the following failures with several grid operations:

$ globus-job-run localhost/jobmanager /usr/bin/id
globus_xio_gsi: Token size exceeds limit. Usually happens when someone tries to establish a insecure connection with a secure endpoint, e.g. when someone sends plain HTTP to a HTTPS endpoint without first establishing a SSL session.
 (error code 10)

$ globus-url-copy  -v file:///etc/virc gsiftp://localhost/tmp/foo
Source: file:///etc/
Dest:   gsiftp://localhost/tmp/
  virc  ->  foo

error: globus_ftp_client: the server responded with an error
500 500-Command failed. : globus_gridftp_server_file.c:globus_l_gfs_file_server_read_cb:2094:
500-callback failed.
500-an end-of-file was reached
500-globus_xio_gsi.c:globus_l_xio_gsi_read_token_cb:1069:
500-The GSI XIO driver failed to establish a secure connection. The failure occured during a handshake read.
500-globus_i_xio_system_common.c:globus_i_xio_system_try_recv:243:
500-An end of file occurred
500 End.

These are the commands I've tried, though others may be affected.
Downgrading openssl to 1.0.0 fixes both problems, which is what we are currently telling people to do.

I'm attaching some of the logfiles.
/var/log/messages shows that I've been successfully authenticated, but doesn't show any errors

I've replicated the problem on both the OSG packages (based on 5.2.1) and the 5.2.5 packages from EPEL.

Comments

Joe Bester - 2013-12-06

This patch (https://globus.atlassian.net/secure/attachment/11550/impexp.diff) should fix the GRAM issue (or alternatively, use the configuration option -launch_method fork_and_proxy in the gatekeeper configuration).

Joe Bester - 2013-12-06

I've not been able to duplicate the gridftp issue yet.

Joe Bester - 2013-12-06

It'd help if you could rerun the gridftp command with GLOBUS_GSSAPI_DEBUG_LEVEL=9 environment variable so I can see what's going on. Mike thinks this is probably a client-side issue, so running the globus-url-copy with that should provide some insight.

Mike Link - 2013-12-06

Additionally, set these environment variables for the globus-url-copy test:
GLOBUS_ERROR_OUTPUT=1
GLOBUS_ERROR_VERBOSE=1
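
For anyone scripting the rerun, a small Python sketch that sets the three debug variables requested in the comments above before launching a command. The helper names debug_env and run_with_debug are made up for illustration. Only the variable names and values come from this ticket.

```python
import os
import subprocess

# Debug settings requested in the comments above.
DEBUG_VARS = {
    "GLOBUS_GSSAPI_DEBUG_LEVEL": "9",
    "GLOBUS_ERROR_OUTPUT": "1",
    "GLOBUS_ERROR_VERBOSE": "1",
}


def debug_env(base=None):
    """Return a copy of the environment with the debug variables added."""
    env = dict(os.environ if base is None else base)
    env.update(DEBUG_VARS)
    return env


def run_with_debug(command):
    """Run command (a list of arguments) with debugging on; return its exit code."""
    return subprocess.call(command, env=debug_env())
```

For example, run_with_debug(["globus-url-copy", "-v", "file:///etc/virc", "gsiftp://localhost/tmp/foo"]) repeats the failing transfer from the description with debugging enabled.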

matyas - 2013-12-06

Looks like the globus-url-copy issue is only with GT 5.2.1. I've attached the debug output anyway; globus-url-copy-debug1.txt is the log without GLOBUS_GSSAPI_DEBUG_LEVEL=9, and globus-url-copy-debug2.txt is the log with it.

Joe Bester - 2013-12-06

I've committed the GSSAPI fix above and will generate new packages with that. I'm inclined to mark the other issue as resolved, though I'm not sure exactly which patch between 5.2.1 and current fixes it.

matyas - 2013-12-06

I rebuilt the globus-gssapi-gsi package with your patch for both 5.2.1 and 5.2.5. Now I'm getting a different error:
$ globus-job-run localhost/jobmanager-fork-poll /usr/bin/id
GRAM Job submission failed because an end-of-file was reached
globus_xio: An end of file occurred
 (error code 10)
I'll attach debugging output from 5.2.5.

Joe Bester - 2013-12-06

That one looks like a service-side problem. The client is authenticating and then the server is hanging up on it. Can you send the gatekeeper/job manager logs for those?

matyas - 2013-12-06

Attaching gatekeeper log.
A jobmanager log didn't get created, but I found this in /var/log/messages:

kernel: globus-job-mana[16372]: segfault at 1400000357 ip 0000003dea8e7500 sp 00007fffa7e11ef0 error 4 in libcrypto.so.1.0.1e[3dea800000+1b5000]

matyas - 2013-12-06

That segfault occurs if I take a build of globus-gssapi-gsi that was built against openssl 1.0.0, but run it on a system with openssl 1.0.1.
Unfortunately, if I build it with openssl 1.0.1 then I can't even install it on a system with openssl 1.0.0 due to shared library version requirements.

I've done a lot of testing of various combinations and I think the easiest way to summarize is to post a table; hope JIRA doesn't mangle it too much.
(These were all from running globus-job-run)

{code}
| gssapi-gsi | patch | build openssl | run openssl | outcome                              |
|------------+-------+---------------+-------------+--------------------------------------|
|       10.7 | no    |         1.0.0 |       1.0.0 | ok                                   |
|       10.7 | no    |         1.0.0 |      1.0.1e | FAIL globus-xio error                |
|       10.7 | yes   |         1.0.0 |       1.0.0 | FAIL globus-xio error                |
|       10.7 | yes   |         1.0.0 |      1.0.1e | FAIL jobmanager segfault             |
|       10.7 | no    |        1.0.1e |       1.0.0 | untested presumably does not install |
|       10.7 | no    |        1.0.1e |      1.0.1e | FAIL hangs forever                   |
|       10.7 | yes   |        1.0.1e |       1.0.0 | FAIL does not install                |
|       10.7 | yes   |        1.0.1e |      1.0.1e | FAIL hangs forever                   |
|      10.10 | no    |         1.0.0 |       1.0.0 | ok                                   |
|      10.10 | no    |         1.0.0 |      1.0.1e | FAIL globus-xio error                |
|      10.10 | yes   |         1.0.0 |       1.0.0 | ok                                   |
|      10.10 | yes   |         1.0.0 |      1.0.1e | FAIL jobmanager segfault             |
|      10.10 | no    |        1.0.1e |       1.0.0 | FAIL does not install                |
|      10.10 | no    |        1.0.1e |      1.0.1e | ok                                   |
|      10.10 | yes   |        1.0.1e |       1.0.0 | FAIL does not install                |
|      10.10 | yes   |        1.0.1e |      1.0.1e | ok                                   |
{code}

Some conclusions:

1. If GSSAPI-GSI was built against OpenSSL 1.0.0 then globus-job-run does not work on OpenSSL 1.0.1e.
If it's not the "Token size" error then it's a jobmanager segfault.

2. If GSSAPI-GSI was built against OpenSSL 1.0.1e then it cannot be installed on a system with OpenSSL 1.0.0.
This is a dependency issue due to shared library versions.

3. If GSSAPI-GSI 10.7 was built against OpenSSL 1.0.1e then globus-job-run will hang forever -- patched or unpatched.

4. The patch is not necessary with 10.10: as long as it is rebuilt with OpenSSL 1.0.1e, it will work on OpenSSL 1.0.1e (and no older).
The patch breaks 10.7: I get a "Token size" error even when GSSAPI-GSI was built with and running on OpenSSL 1.0.0.
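
The conclusions can be checked mechanically against the table. The Python sketch below copies the sixteen rows as data and filters them. The helper outcomes is only an illustration and adds no results beyond what was posted.

```python
# (gssapi-gsi, patched, build openssl, run openssl, outcome), copied from the table.
RESULTS = [
    ("10.7", False, "1.0.0", "1.0.0", "ok"),
    ("10.7", False, "1.0.0", "1.0.1e", "FAIL globus-xio error"),
    ("10.7", True, "1.0.0", "1.0.0", "FAIL globus-xio error"),
    ("10.7", True, "1.0.0", "1.0.1e", "FAIL jobmanager segfault"),
    ("10.7", False, "1.0.1e", "1.0.0", "untested presumably does not install"),
    ("10.7", False, "1.0.1e", "1.0.1e", "FAIL hangs forever"),
    ("10.7", True, "1.0.1e", "1.0.0", "FAIL does not install"),
    ("10.7", True, "1.0.1e", "1.0.1e", "FAIL hangs forever"),
    ("10.10", False, "1.0.0", "1.0.0", "ok"),
    ("10.10", False, "1.0.0", "1.0.1e", "FAIL globus-xio error"),
    ("10.10", True, "1.0.0", "1.0.0", "ok"),
    ("10.10", True, "1.0.0", "1.0.1e", "FAIL jobmanager segfault"),
    ("10.10", False, "1.0.1e", "1.0.0", "FAIL does not install"),
    ("10.10", False, "1.0.1e", "1.0.1e", "ok"),
    ("10.10", True, "1.0.1e", "1.0.0", "FAIL does not install"),
    ("10.10", True, "1.0.1e", "1.0.1e", "ok"),
]


def outcomes(version=None, patched=None, build=None, run=None):
    """Return the set of outcomes for rows matching the given columns."""
    wanted = (version, patched, build, run)
    return {
        row[4]
        for row in RESULTS
        if all(w is None or w == got for w, got in zip(wanted, row))
    }
```

For instance, outcomes(build="1.0.0", run="1.0.1e") contains no "ok" entry, which is conclusion 1.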

Joe Bester - 2013-12-09

Version 10.8 of GSSAPI contains some fixes to work with TLS 1.2, which is probably why your old version doesn't work. Is there any reason for this testing matrix to include 10.7 since we've already fixed bugs in that version?

Would something like this work:
build:
globus-gssapi-gsi-1.10-1 build depend/depend on openssl >= 1.0.0 and openssl < 1.0.1
build:
globus-gssapi-gsi-1.10-2 build depend/depend on openssl >= 1.0.1

matyas - 2013-12-09

I tested with 10.7 since that's the version that was in GT 5.2.1, which is the version we ship.
I tried out 10.8 (well, 10.7 with the code changes backported from 10.8) in an otherwise GT 5.2.1 install, but got the same symptoms.
(Also tried out 10.10 in a 5.2.1 install -- same thing).

I tried your second suggestion. Unfortunately, yum is not smart enough to automatically choose the older version of globus-gssapi-gsi if the requirement isn't met, and it will cause errors on upgrade. If the user knows to explicitly specify the version then it will work, but we can't expect that.

Joe Bester - 2013-12-10

I guess I'm not sure what you are expecting from us. We provide a version that works with the old ssl and will have a new one that works with the new ssl. You're willing to upgrade to an openssl that breaks things but not to a gt that fixes the breakage?

matyas - 2013-12-10

The openssl upgrade happened on the distro side, and we have no control over it. Actually, we've been telling people to avoid upgrading openssl until we have a fix.

We're in the process of upgrading to 5.2.5, but that's not something that will be ready and tested this month, let alone this week, so we were hoping for something that can tide the users over until 5.2.5 is ready.

If possible, we'd also like something that works with both openssl 1.0.0 and 1.0.1e, since some of our users are very conservative and only upgrade software when they absolutely have to, whereas other users aren't.

matyas - 2013-12-10

We have some good news to report: using globus-xio-3.6 and globus-xio-gsi-driver-2.4 in an otherwise GT 5.2.1 install fixes gridftp; adding -launch_method fork_and_proxy to the arguments when starting up globus-gatekeeper fixes globus-job-run as well.

How do you recommend we make fork_and_proxy the default? Right now I just patch the init script; is there a better place for it?

Mattias Ellert - 2013-12-11

I was able to make the globus-gatekeeper work by simply recompiling the GT 5.2.5 globus-gssapi-gsi source unchanged, i.e. without any patch applied, against openssl 1.0.1.

The globus-gssapi-gsi library, if compiled against openssl 1.0.0 but used with openssl 1.0.1 at runtime, will not do the same thing as a globus-gssapi-gsi library compiled against openssl 1.0.1. The reason is that openssl 1.0.1 uses versioned (labelled) symbols and will use different versions of some functions depending on whether the binary or library that uses it at runtime was compiled against openssl 1.0.0 or 1.0.1. The globus-gssapi-gsi library is the only globus library that uses any of the versioned symbols from openssl 1.0.1, so it is the only one that needs to be recompiled. It is also the only one that has an rpm requires on libcrypto.so.10(OPENSSL_1.0.1)(64bit) when compiled against openssl 1.0.1.

The recompiled library is available in EPEL testing - https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-12307/

If you are installing globus from EPEL and this update works for you, please provide karma so that it can be pushed to stable earlier than the 2 week waiting time that is imposed for updates that do not reach the karma threshold.

David Carver - 2015-07-27

TACC have a user that is seeing this problem with gridftp-6.0 running on Solaris 5.10 (gridftp.ranch.tacc.utexas.edu or gridftp.ranch.tacc.xsede.org)
https://portal.xsede.org/group/xup/tickets/-/tickets/33713

bash-4.1$ cat grid-ftp
service gsiftp
{
    disable = no
    instances = 50
    socket_type = stream
    protocol = tcp
    per_source = 42
    wait = no
    user = root
    banner_fail = /usr/local/apps/gridftp-6.0/etc/gridftp.full.msg
    env += GLOBUS_HOSTNAME=gridftp1.ranch.tacc.utexas.edu
    env += GLOBUS_TCP_PORT_RANGE=50000,51000
    env += GLOBUS_LOCATION=/usr/local/apps/gridftp-6.0
    env += LD_LIBRARY_PATH=/usr/local/apps/gridftp-6.0/lib:/usr/local/ssl-1.0.0m/lib:/usr/local/lib
    server = /usr/local/apps/gridftp-6.0/sbin/globus-gridftp-server
    server_args = -c /usr/local/apps/gridftp-6.0/etc/gridftp.ranch.conf -fs-whitelist file,popen,ordering -popen-whitelist tar:/bin/tar,md5sum:/usr/bin/md5sum
    log_on_failure += USERID
    nice = 19
}

I am not sure if the "+=" operator is a post or prefix operator.

bash-4.1$ LD_LIBRARY_PATH=/usr/local/apps/gridftp-6.0/lib:/usr/local/ssl-1.0.0m/lib:/usr/local/lib:$LD_LIBRARY_PATH
bash-4.1$ ldd  /usr/local/apps/gridftp-6.0/sbin/globus-gridftp-server
        libglobus_gridftp_server.so.6 =>         /usr/local/apps/gridftp-6.0/lib/libglobus_gridftp_server.so.6
        libglobus_gfork.so.0 =>  /usr/local/apps/gridftp-6.0/lib/libglobus_gfork.so.0
        libglobus_ftp_control.so.1 =>    /usr/local/apps/gridftp-6.0/lib/libglobus_ftp_control.so.1
        libglobus_io.so.3 =>     /usr/local/apps/gridftp-6.0/lib/libglobus_io.so.3
        libglobus_usage.so.0 =>  /usr/local/apps/gridftp-6.0/lib/libglobus_usage.so.0
        libglobus_gridftp_server_control.so.0 =>         /usr/local/apps/gridftp-6.0/lib/libglobus_gridftp_server_control.so.0
        libglobus_xio.so.0 =>    /usr/local/apps/gridftp-6.0/lib/libglobus_xio.so.0
        libglobus_gssapi_error.so.2 =>   /usr/local/apps/gridftp-6.0/lib/libglobus_gssapi_error.so.2
        libglobus_gss_assist.so.3 =>     /usr/local/apps/gridftp-6.0/lib/libglobus_gss_assist.so.3
        libglobus_authz.so.0 =>  /usr/local/apps/gridftp-6.0/lib/libglobus_authz.so.0
        libglobus_callout.so.0 =>        /usr/local/apps/gridftp-6.0/lib/libglobus_callout.so.0
        libglobus_gssapi_gsi.so.4 =>     /usr/local/apps/gridftp-6.0/lib/libglobus_gssapi_gsi.so.4
        libglobus_gsi_proxy_core.so.0 =>         /usr/local/apps/gridftp-6.0/lib/libglobus_gsi_proxy_core.so.0
        libglobus_gsi_credential.so.1 =>         /usr/local/apps/gridftp-6.0/lib/libglobus_gsi_credential.so.1
        libglobus_gsi_callback.so.0 =>   /usr/local/apps/gridftp-6.0/lib/libglobus_gsi_callback.so.0
        libglobus_oldgaa.so.0 =>         /usr/local/apps/gridftp-6.0/lib/libglobus_oldgaa.so.0
        libglobus_gsi_cert_utils.so.0 =>         /usr/local/apps/gridftp-6.0/lib/libglobus_gsi_cert_utils.so.0
        libglobus_gsi_sysconfig.so.1 =>  /usr/local/apps/gridftp-6.0/lib/libglobus_gsi_sysconfig.so.1
        libglobus_openssl.so.0 =>        /usr/local/apps/gridftp-6.0/lib/libglobus_openssl.so.0
        libglobus_openssl_error.so.0 =>  /usr/local/apps/gridftp-6.0/lib/libglobus_openssl_error.so.0
        libglobus_proxy_ssl.so.1 =>      /usr/local/apps/gridftp-6.0/lib/libglobus_proxy_ssl.so.1
        libglobus_gsi_authz_callout_error.so.0 =>        /usr/local/apps/gridftp-6.0/lib/libglobus_gsi_authz_callout_error.so.0
        libglobus_common.so.0 =>         /usr/local/apps/gridftp-6.0/lib/libglobus_common.so.0
        libnsl.so.1 =>   /lib/libnsl.so.1
        libsocket.so.1 =>        /lib/libsocket.so.1
        libssl.so.1.0.0 =>       /usr/local/ssl-1.0.0m/lib/libssl.so.1.0.0
        libcrypto.so.1.0.0 =>    /usr/local/ssl-1.0.0m/lib/libcrypto.so.1.0.0
        libltdl.so.3 =>  /usr/local/lib/libltdl.so.3
        libdl.so.1 =>    /lib/libdl.so.1
        libc.so.1 =>     /lib/libc.so.1
        libgcc_s.so.1 =>         /usr/local/lib/libgcc_s.so.1
        libmp.so.2 =>    /lib/libmp.so.2
        libmd.so.1 =>    /lib/libmd.so.1
        libsoftcrypto.so.1 =>    /lib/libsoftcrypto.so.1
        libelf.so.1 =>   /lib/libelf.so.1
        libcryptoutil.so.1 =>    /lib/libcryptoutil.so.1
        libz.so.1 =>     /usr/local/lib/libz.so.1
        libz.so.1 (SUNW_1.1) =>  (version not found)
        libm.so.2 =>     /lib/libm.so.2
bash-4.1$

Other libssl libraries found.
bash-4.1$  find /usr /lib /opt -name "libssl.so.1.0.*" -ls 2> /dev/null
55992    1 lrwxrwxrwx   1 root     root           34 Feb 10 11:18 /usr/lib/amd64/libssl.so.1.0.0 -> ../../../lib/amd64/libssl.so.1.0.0
56031    1 lrwxrwxrwx   1 root     root           25 Feb 10 11:18 /usr/lib/libssl.so.1.0.0 -> ../../lib/libssl.so.1.0.0
51752 1024 -r-xr-xr-x   1 2        bin        338695 Feb 11  2011 /usr/local/ssl/lib/libssl.so.1.0.0
232910 1024 -r-xr-xr-x   1 root     root       371516 Jun  9  2014 /usr/local/ssl-1.0.0m/lib/libssl.so.1.0.0
56017    1 lrwxrwxrwx   1 root     root           31 Feb 10 11:18 /lib/libssl.so.1.0.0 -> openssl/default/libssl.so.1.0.0
55996    1 lrwxrwxrwx   1 root     root           40 Feb 10 11:18 /lib/amd64/libssl.so.1.0.0 -> ../openssl/default/amd64/libssl.so.1.0.0
80602  643 -r-xr-xr-x   1 root     bin        615992 May  5 10:06 /lib/openssl/default/amd64/libssl.so.1.0.0
80597  515 -r-xr-xr-x   1 root     bin        468792 May  5 10:06 /lib/openssl/default/libssl.so.1.0.0
239368 1024 -r-xr-xr-x   1 makeda   G-815144   427336 Dec 13  2013 /opt/splunkforwarder/lib/libssl.so.1.0.0

Joe Bester - 2015-07-27

I don't have permissions to view the xsede ticket.

David Carver - 2015-07-27

Joe,
Try it now.
https://tickets.xsede.org/Ticket/Display.html?id=33713

Globus Toolkit/GT-490

Summary

No Functional Group in globus-xio-gridftp-driver GPT metadata

Details

Type: Bug

Status: Open

Description

The GPT metadata for the globus-xio-gridftp-driver package has no Functional Group tag

The patch sets the Functional Group to "Communication", which is the one used for the other XIO drivers.

$ grep Group /usr/share/globus/packages/globus_xio_*_driver/pkg_data_gcc64_rtl.gpt
/usr/share/globus/packages/globus_xio_gsi_driver/pkg_data_gcc64_rtl.gpt:Communication
/usr/share/globus/packages/globus_xio_pipe_driver/pkg_data_gcc64_rtl.gpt:Communication
/usr/share/globus/packages/globus_xio_popen_driver/pkg_data_gcc64_rtl.gpt:Communication

(The patch also changes source dependency type from pgm link to lib link - again as the other XIO driver plugin libraries.)

Comments

Globus Toolkit/GT-491

Summary

GridFTP data channel authentication broken when dc_default is in use

Details

Type: Bug

Status: Open

Description

When playing with alternate XIO data stacks, I noticed that data channel authentication is broken for default 'globus-url-copy' parameters.

- dc_default=tcp is broken if the client requests data channel authentication
- dc_default=tcp,gsi appears to look for a certificate as the target user.

Can the gridftp server make dc_default and globus-url-copy "play nicely" automatically?

Comments

Mike Link - 2014-01-16

Are you testing between two servers that both set the same -dc-default?  The doc doesn't make this clear, but unless you're using it to add a read-only driver, that is the only way it will work.  There is no negotiation involved with the client or the other end of the transfer.

Globus Toolkit/GT-492

Summary

GSI-OpenSSH server/client debian packages do not install cleanly

Details

Type: Bug

Status: Resolved 2014-09-11

Description

I've had to perform the steps I list below to get the GSI-enabled SSH server and client installed on Debian wheezy. Some of the issues are fairly obvious bugs that appear not to have been caught over the last several releases (I tagged only releases I've tried this on).

I was told by Jim Basney that "At first glance, the issues appear to all be with the Debian packaging of GSI-OpenSSH by the Globus project rather than something coming from the GSI-OpenSSH sources that I maintain."

Workaround for installation:

1. apt-get install openssh-server # at least as of July 2013, the GSI packages do not attempt to create the sshd user, this does
2. Login via ssh
3. service ssh stop
4. update-rc.d -f ssh remove

DON'T LOGOUT! WE JUST NEED PORT 22 TO BE UNUSED

5. wget http://www.globus.org/ftppub/gt5/5.2/stable/packages/deb/debian/wheezy/pool/contrib/g/globus-repository/globus-repository-5.2-stable-wheezy_0.0.3_all.deb

6. dpkg -i globus-repository-5.2-stable-wheezy_0.0.3_all.deb
7. apt-get update
8. apt-get install libglobus-gss-assist3 libglobus-usage0 # dependencies that are not enforced in the packaging
9. ln -s /usr/bin/ssh-keygen /usr/bin/gsissh-keygen # a necessary file not provided by the client/server packages
10. apt-get install gsi-openssh-clients gsi-openssh-server # works A-OK
11. edit /etc/init.d/gsi-openssh-server
11a. Change:

ECDSA_KEY=$sysconfdir/ssh_host_ecdsa_key
to
ECDSA_KEY=/etc/gsissh/ssh_host_ecdsa_key

This must be something I'm missing - what is the purpose of having the ECDSA key installed somewhere different from the RSA keys?

11b. remove second 'e' from eecdsa later in the file # obvious typo
11c. add 'do_ecdsa_keygen' after 'do_rsa_keygen' in the start() function. # without this, no ECDSA key will be generated and a warning will be printed at gsisshd startup
12. service gsi-openssh-server restart

Comments

Joe Bester - 2014-09-11

I think all of these issues are fixed in the GT6.0 release.

Globus Toolkit/GT-493

Summary

Missing configure option in GRAM LSF package

Details

Type: Bug

Status: Open

Description

The configure.in uses AC_SUBST(globusstatedir), but doesn't provide a way to configure it.
Patch (with a few lines of code copied from one of the other GRAM packages) attached.

Comments

Globus Toolkit/GT-494

Summary

Typo in configure for GRAM SGE package

Details

Type: Bug

Status: Open

Description

There is a mismatch between the variable name defined by configure (SEG_REPORTING_FILE) and the name used in the .in files where its value should be replaced (SGE_REPORTING_FILE). Patch attached.
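
Mismatches of this kind can be caught with a quick scan of the .in templates. The Python sketch below lists @NAME@ placeholders that have no matching substitution variable. It relies only on the standard autoconf @NAME@ placeholder syntax. The helper name and the sample template line are hypothetical.

```python
import re


def unsubstituted(template_text, defined_names):
    """Return @NAME@ placeholders in template_text that configure never defines."""
    placeholders = set(re.findall(r"@([A-Za-z_][A-Za-z0-9_]*)@", template_text))
    return sorted(placeholders - set(defined_names))


# Hypothetical template line using the name from the .in files,
# checked against the name that configure actually defines.
print(unsubstituted("reporting_file=@SGE_REPORTING_FILE@\n", {"SEG_REPORTING_FILE"}))
```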

Comments

Globus Toolkit/GT-495

Summary

Another architecture porting fix

Details

Type: Bug

Status: Open

Description

The attached patch was created by Ubuntu to support the new Ubuntu port ppc64le.

Comments

Globus Toolkit/GT-496

Summary

GridFTP server frequently fails to log the remote IP address for transfers

Details

Type: Bug

Status: Resolved 2014-07-02

Description

We've noticed that the GridFTP server is frequently failing to log an IP address for the remote side of data transfers.  Examining the log for today shows that overall about 45% of transfers are reported as "DEST=[0.0.0.0]", and 99.3% of STOR transfers did so.  This obviously makes it impossible to categorize the set of remote sites using our GridFTP service.  For a while I thought it was related to Globus Online vs other transfer management methods, but that does not appear to be the case.  I can't imagine that the getpeername system calls are failing.

Do you have any suggestions for pinning this down?

Comments

Mike Link - 2014-02-13

Hi Craig,

How are your servers configured?  There was an issue with this in the past, but only with a striped or split configuration, and that should be fixed.

cruff@ucar.edu - 2014-02-13

We were using a split configuration, but disabled that to start testing the sharing feature.  Even after disabling it, the xferlog still contains entries like this:

DATE=20140213191944.136513 HOST=gridftp01.ucar.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140213191944.136048 USER=zcao FILE=/glade/scratch/zcao/20140214/1995/case3/SZ0215.opt BUFFER=65536 BLOCK=4194304 NBYTES=0 VOLUME=/ STREAMS=2 STRIPES=1 DEST=[0.0.0.0] TYPE=STOR CODE=226

This is the contents of the currently active configuration file:

daemon 1
detach 1
hostname gridftp01.ucar.edu
port 2811
auth_level 2
log_level ERROR,WARN,INFO
#log_level ALL
log_module stdio:buffer=0
log_filemode 0644
log_single /var/log/gridftp/gridftp.log
log_transfer /var/log/gridftp/xferlog
use_home_dirs 1
home_dir /glade/scratch/$USER
banner_file /etc/grid-security/gridftp-banner
login_msg_file /etc/grid-security/gridftp-login-msg
restrict_paths rw/glade,rw/data
disable_usage_stats 1
stripe_blocksize 4194304
blocksize 4194304
sharing_dn      "/C=US/O=Globus Consortium/OU=Globus Online/OU=Transfer User/CN=__transfer__"
sharing_rp RW/data/share
$X509_USER_CERT "/etc/grid-security/hostcert.pem"
$X509_USER_KEY "/etc/grid-security/hostkey.pem"
$X509_CERT_DIR "/etc/grid-security/certificates"
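
The share of transfers logged with DEST=[0.0.0.0] can be tallied from the xferlog with a few lines of Python. This sketch assumes every entry is a single line of space-separated KEY=VALUE tokens, as in the sample entry quoted above, and that values contain no spaces. The function names are made up for illustration.

```python
def parse_xferlog_line(line):
    """Split one KEY=VALUE xferlog entry into a dict."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def missing_dest_fraction(lines, xfer_type=None):
    """Fraction of entries (optionally of one TYPE) whose DEST is [0.0.0.0]."""
    entries = [parse_xferlog_line(line) for line in lines]
    if xfer_type is not None:
        entries = [e for e in entries if e.get("TYPE") == xfer_type]
    if not entries:
        return 0.0
    missing = sum(1 for e in entries if e.get("DEST") == "[0.0.0.0]")
    return missing / float(len(entries))
```

Calling missing_dest_fraction(open("/var/log/gridftp/xferlog"), "STOR") would give the STOR figure for the log path configured above.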

Mike Link - 2014-02-13

I'm able to reproduce this.  When a data connection with multiple streams is reused for multiple transfers, the subsequent transfers will report 0.0.0.0.  Only the first transfer of a connection with any number of streams and all transfers using 1 stream are reported correctly.

A globus-ftp-control package update with the fix should be available next week.

Thanks for reporting this.

Anonymous - 2014-06-26

Hi Mike,

1. Do you know if the patch already made it into version 4.7-1 of globus-ftp-control?

2. Is it possible that I have a look at the patch?

Thanks,

Edgar
OSG Software Team

Mike Link - 2014-07-02

Hi Edgar,

The fix is in globus-ftp-control 4.8.   I added the source update package to http://toolkit.globus.org/toolkit/advisories.html

The small patch is at: https://github.com/globus/globus-toolkit/commit/fb9aed8f5bc8a3883c53453552f940ae99e2d048

Mike

Globus Toolkit/GT-497

Summary

globus_url_string_hex_encode crashes when input contains high ascii characters

Details

Type: Bug

Status: Open

Description

globus_url_string_hex_encode crashes when input contains high ascii (>127).
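
The ticket does not say why the C function crashes. As a reference for the expected result only, here is a Python sketch of percent-encoding that treats the input as unsigned bytes, so values above 127 are encoded like any other byte. Which characters the real function encodes is not stated here. The enc_chars default and the rule of encoding everything outside printable ASCII are assumptions.

```python
def hex_encode(data, enc_chars=b" %"):
    """Percent-encode the bytes in enc_chars plus every byte outside printable ASCII.

    Illustrative only; not the Globus implementation.
    """
    must_encode = bytearray(enc_chars)
    out = []
    for byte in bytearray(data):  # iterates as integers 0..255
        if byte < 0x20 or byte > 0x7E or byte in must_encode:
            out.append("%%%02X" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)
```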

Comments

Globus Toolkit/GT-498

Summary

HTTP: automatically reconnect on demand when persistent connection is closed by http server.

Details

Type: New Feature

Status: Open

Description

A persistent HTTP connection can be closed even after a small delay. Rather than relaying the connection-closed error to the client, I can automatically retry.  If the reconnection fails, that error will be reported to the client.
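
A sketch of the proposed behavior, in Python rather than the XIO driver's C. The ConnectionClosed exception, the connect callable and the send method are all hypothetical stand-ins. The point is only the control flow: one retry on a fresh connection, with any reconnect failure passed through to the caller.

```python
class ConnectionClosed(Exception):
    """Raised by a connection whose peer has already closed it."""


def send_with_reconnect(connect, conn, request):
    """Send request on conn, reconnecting once if the server closed the connection.

    Returns (connection_in_use, reply). Errors from connect() or from the
    second send are not caught, so they reach the client.
    """
    try:
        return conn, conn.send(request)
    except ConnectionClosed:
        conn = connect()
        return conn, conn.send(request)
```

Retrying only once keeps a server that closes every connection from causing an endless loop.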

Comments

Globus Toolkit/GT-499

Summary

HTTP: during a GET, premature connection close is not distinguishable from end of transfer

Details

Type: Bug

Status: Open

Description

If an HTTP connection closes in the middle of a GET body, the server cannot detect that it was a premature close other than that the data is smaller than expected.  Only the short read is reported in the FTP error response.

Comments

Mike Link - 2014-01-22

I think the only way around this is for the HTTP driver to rewrite (rather than wrap) the tcp driver's eof error so it doesn't get detected as an eof.  Not clear if that would have other implications.

Globus Toolkit/GT-500

Summary

Add debug tracing to XIO HTTP driver

Details

Type: Improvement

Status: Open

Description

No tracing in the http driver, particularly icky for debugging threading issues.

Comments

Globus Toolkit/GT-501

Summary

HTTP: threading issues

Details

Type: Bug

Status: Open

Description

Seeing GridFTP HTTP transfer race conditions on opens for put with expect, and on close.

Comments

Globus Toolkit/GT-502

Summary

Insufficient dependencies in MyProxy GPT metadata

Details

Type: Bug

Status: Open

Description

When GT 5.2 was released, all package version dependencies in the GPT metadata were changed to require as a minimum the versions of the packages that were released with GT 5.2, to prevent older versions from being used to resolve dependencies. However, this change was never implemented for the MyProxy package.

Also, the MyProxy GPT file does not list all direct dependencies needed for the compilation, which means that the success of the build relies on indirect dependencies.

The attached patch fixes these issues.

Comments

Globus Toolkit/GT-503

Summary

Support authentication using username/password for GCMU registering with Globus Online

Details

Type: Task

Status: Resolved 2014-02-06

Description

Support authentication using username/password for GCMU registering with Globus Online. This should also support the SSH mechanism, and should use the OAuth protocol support in Nexus.

Comments

Joe Bester - 2013-07-23

The username/password version is part of the native packaged gcmu. The other authentication methods are not used for that. Not sure if that's sufficient for resolving this.

Globus Toolkit/GT-504

Summary

Add GCMU code base to GitHub

Details

Type: Task

Status: Resolved 2014-02-06

Description

Add GCMU code base to GitHub

Comments

Joe Bester - 2014-01-23

https://github.com/globus/globus-toolkit/tree/globus_5_2_branch/globusonline/globus-connect-server

Globus Toolkit/GT-505

Summary

GCMU fails when installed on MAC in a directory with a space

Details

Type: Bug

Status: Open

Description

GO ticket #300301. When GCMU is installed on Mac OS X in a directory with space characters, it fails. All paths should be enclosed in quotation marks.
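
To illustrate the quoting fix, a Python sketch using the standard library's shlex.quote to build a shell command line from parts that may contain spaces. The install path in the usage note is invented for the example, and GCMU's own scripts may need the equivalent fix in shell instead.

```python
import shlex


def shell_command(program, *args):
    """Join program and args into one shell-safe command line."""
    return " ".join(shlex.quote(part) for part in (program,) + args)
```

For example, shell_command("/Applications/Globus Connect/bin/setup", "--verbose") wraps the program path in single quotes, so the shell sees it as one word instead of splitting it at the space.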

Comments

Globus Toolkit/GT-506

Summary

The ssh public key authentication method used by GCMU needs a replacement

Details

Type: New Feature

Status: Resolved 2014-02-06

Description

GCMU should use a GC credential to execute the CLI commands endpoint-{add|remove|modify|list} instead of asking admins to add an ssh public key to GO in "My Profile" and use the public key authentication method.

Comments

Globus Toolkit/GT-507

Summary

Build GCMU with GT 5.2.2

Details

Type: Task

Status: Resolved 2014-02-06

Description

GCMU uses the GridFTP server from GT 5.0.5. That GridFTP server does not support the -home-dir option that was requested by CRC at UND. The option was added in GT 5.2.2.

Comments

Globus Toolkit/GT-508

Summary

Add update command to GCMU

Details

Type: Task

Status: Resolved 2014-02-06

Description

Here is the suggestion from Steve:
* Is there a version number in the GCMU config file?  If not, that would be a good thing to add in the next version.  Then in the future if the update requires updates to the config or other files, it can use the version number to know what version is being run.

* We should add an "update" command in the top level GCMU directory that the user should always run after doing the untar.  It may not do anything in some updates.  But if there are other things we need to do to update an old GCMU to a new one, that gives us a place to put it.  Point is to just make this part of the standard update process.

* Have the update program check to see if there is a new version.  If not, then print "You are already running the latest version."  If there is a new version, then it could actually download it and untar it, probably after prompting the user if this is what they want to do.
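
A rough Python sketch of the version check in the last suggestion, assuming dotted numeric version strings such as 3.0.3. How the latest version would be discovered and where the installed version would be stored are left open here, and the prompt wording for the newer-version case is invented.

```python
def parse_version(text):
    """Turn '3.0.10' into (3, 0, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def update_message(installed, latest):
    """Return the message the update command would print."""
    if parse_version(latest) > parse_version(installed):
        return "A newer version (%s) is available; download and untar it? [y/N]" % latest
    return "You are already running the latest version."
```

Comparing tuples of integers rather than raw strings keeps 3.0.10 ordered after 3.0.9.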

Comments

Joe Bester - 2014-02-03

Mooted by GCS redesign

Globus Toolkit/GT-509

Summary

Add a new installation type to GCMU

Details

Type: Improvement

Status: Resolved 2014-02-06

Description

GCMU should support an installation type with many GridFTP servers and one MyProxy server acting as one GO endpoint.

Comments

Globus Toolkit/GT-510

Summary

Create a GT 5.2 package for Globus Connect

Details

Type: Improvement

Status: Resolved 2014-02-06

Description

All of the Connect dependencies will be packaged in GT 5.2.  To make installing Connect easy, it should be packaged too.

Comments

Globus Toolkit/GT-511

Summary

Globus Connect Server Install Service name of GridFTP Server Incorrect

Details

Type: Bug

Status: Open

Description

This was on Centos 6.5, using the 6.1 RPM.

After installing,

 service --status-all

shows as "GridFTP server is running (pid=xxx)"

Trying:

 service "GridFTP server" status

doesn't work.

Looking in /etc/init.d, I see that it is globus-gridftp-server.  I think this is what it should advertise itself as for consistency.

Comments

Globus Toolkit/GT-512

Summary

Globus Connect Server Setup Result Not Clear With Bad MyProxy Server

Details

Type: Bug

Status: Resolved 2014-02-03

Description

This was on centos-6.5.

Change configuration file to use external MyProxy as follows.

[MyProxy]
Server = foo.jjk.info

Run setup. Without verbose, all that is shown is TypeError: putenv() argument 2 must be string, not None

Here is a snippet with verbose on.

ENTER: get_myproxy_dn_from_server()
fetching myproxy dn from server
MyProxy DN is None
EXIT: get_myproxy_dn_from_server()
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/globus/connect/server/setup.py", line 125, in 
    io.setup(reset=reset)
  File "/usr/lib/python2.6/site-packages/globus/connect/server/io/__init__.py", line 63, in setup
    self.configure_trust_roots(**kwargs)
  File "/usr/lib/python2.6/site-packages/globus/connect/server/io/__init__.py", line 261, in configure_trust_roots
    super(IO, self).configure_trust_roots(**kwargs)
  File "/usr/lib/python2.6/site-packages/globus/connect/server/__init__.py", line 478, in configure_trust_roots
    self.get_myproxy_dn_from_server()
  File "/usr/lib64/python2.6/os.py", line 471, in __setitem__
    putenv(key, item)
TypeError: putenv() argument 2 must be string, not None

It should report that the MyProxy server foo.jjk.info was not available, or some such.
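
The kind of guard being asked for might look like the Python sketch below. It is not the actual Globus Connect Server code. The function name, the exception class and the environment variable name MYPROXY_SERVER_DN are assumptions made for illustration.

```python
import os


class MyProxyUnavailable(Exception):
    """Raised when the MyProxy server's DN could not be fetched."""


def export_myproxy_dn(dn, server, env=None, var="MYPROXY_SERVER_DN"):
    """Store dn in the environment, or fail with a readable message if it is None."""
    if env is None:
        env = os.environ
    if dn is None:
        # Without this check, env[var] = None raises the TypeError shown above.
        raise MyProxyUnavailable(
            "Could not fetch the MyProxy DN from %s; is the server reachable?" % server
        )
    env[var] = dn
```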

Comments

Joe Bester - 2014-01-24

I think this is the same as https://globusonline.zendesk.com/agent/#/tickets/301799 which will be fixed in the next GCS release.

Joe Bester - 2014-02-03

Fixed in Globus Connect Server 3.0.3

Globus Toolkit/GT-513

Summary

Support multiple identity providers in the GCMU CILogon authorization callout

Details

Type: Improvement

Status: Open

Description

The whitelist of accepted identity providers can only be configured to have one identity provider. Change that to support multiple identity providers.

Comments

Rachana Ananthakrishnan - 2013-08-12

This feature is not critical, so no implied timeline in the JIRA. Would be good to add this at the first convenient opportunity.

Globus Toolkit/GT-514

Summary

Make it easier for users to get the right version of GCS for their system

Details

Type: Improvement

Status: Resolved 2014-09-11

Description

Unfortunately, I don't think we can make a generic config RPM for all distributions, because the distribution name is not available in yum as a variable like the arch and version are. We can probably collapse the different versions of the same distribution into one file (all fedoras as one, all scientific linuxes as one) which would make it possible to do something like this:

# distro="$(rpm -qf /etc/[Sr]*-release | cut -d- -f1)"
# http://www.globus.org/ftppub/gt5/5.2/stable/installers/repo/Globus-5.2.stable-config.$distro-1.noarch.rpm

Which is still kind of ugly, but cut-and-pasteable.

Joe
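
The lookup in the message above can be sketched in Python as follows. This is only an illustration: it assumes the release package name looks like `fedora-release-18-1.noarch`, the helper names are hypothetical, and the URL pattern is copied from the message rather than checked against the repository.

```python
REPO_BASE = "http://www.globus.org/ftppub/gt5/5.2/stable/installers/repo"


def distro_from_release_package(package_name):
    """Return the text before the first '-', as `cut -d- -f1` would."""
    return package_name.split("-", 1)[0]


def config_rpm_url(package_name):
    """Build the per-distribution config RPM URL from the pattern above."""
    distro = distro_from_release_package(package_name)
    return "%s/Globus-5.2.stable-config.%s-1.noarch.rpm" % (REPO_BASE, distro)
```

For a package name such as `fedora-release-18-1.noarch` the first helper yields `fedora`, which only helps if the repository actually carries one config package per distribution name, as proposed above.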

======

Joe,

We've had a couple of people run into problems installing GCMU/GCS on RPM distributions other than Fedora 18 because they copy the command from the support forum (https://support.globusonline.org/entries/23857088), which points to the package Globus-5.2.stable-config.fedora-18-1.noarch.rpm, so they end up with some kind of dependency error. For example, if I do this on RHEL 6.4 I get:

error: Failed dependencies:
        fedora-release >= 18 is needed by Globus-5.2.stable-config.fedora-18-1.noarch

Is there a way to tell yum (or give it a generic link) to pull the correct package for the distribution that's running on the user's machine? If not, can the user run some command prior to pulling the installer to fix this? Alternatively, I guess we could provide a set of links for all the distributions, but that seems like a brute-force way to handle it.

Thanks,
Vas

Comments

Joe Bester - 2014-09-11

The new GT6 repository configuration packages will automatically decide which repo paths to use at install time, so a generic deb and rpm package are provided. These are included in the Linux packages of http://toolkit.globus.org/toolkit/downloads/6.0/

Globus Toolkit/GT-515

Summary

Increase default proxy key size in gsi-proxy-core

Details

Type: Improvement

Status: Resolved 2014-09-11

Description

The default key size in the gsi-proxy-core library still seems to be 512 bits (in globus_gsi_proxy_handle_attrs.c:30). We've seen a number of problems in grid middleware applications linked against this library caused by this (mainly TLSv1.2 related). Would it be possible to change the default to 1024? While the applications that use this are now being patched to request larger keys (much like with ticket GT-272), I think it would also make sense to change the default in the library to ensure that these problems don't appear again in the future.

Comments

Globus Toolkit/GT-516

Summary

GridFTP server: data transfers cannot be interrupted by the client

Details

Type: Bug

Status: Resolved 2014-07-15

Description

In the PRACE project we recently discovered an anomaly in the current GridFTP server version (v6.38) included in the Globus Toolkit v5.2.5. Due to this, GridFTP data transfer processes (DTPs as per [1]) continue to transfer data after the initiating client was interrupted. I attached my analysis with a more detailed description, the symptoms, the problems created and the GridFTP server versions affected and not affected according to my testing.

I am unsure, but this issue might be related to the following Globus JIRA tickets:

* https://globus.atlassian.net/browse/GRIDFTP-160
* https://globus.atlassian.net/browse/GRIDFTP-178

Best regards
Frank Scheiner

[1]: http://www.ietf.org/rfc/rfc959.txt

--
Frank Scheiner


Additional information:

## Description ##

Data transfers or benchmarks initiated with globus-url-copy (guc) persist, even
if the initiating guc call is interrupted, regardless of whether this is
done with SIGINT, SIGTERM or SIGKILL. It makes no difference if guc is called
directly or by another program. It also makes no difference if guc runs locally
on the system that hosts the GridFTP service or remotely.


## Symptoms ##

1. When using the guc option `-len X`, a memory to memory transfer/benchmark
seems to continue - meaning the GridFTP DTPs (as per [1]) stay active hogging
CPU power and network bandwidth - until the specified amount of data was
transferred,...

2. ...if this option is not used, a memory to memory transfer/benchmark
continues until one of the involved DTPs is killed manually. But this can only
be done if the user has command line access to one of the involved hosts or by a
local admin. It is sufficient to kill all involved processes on one side of the
transfer.

3. File to file/memory transfers show the same effect as memory to memory
transfers with specified length, most likely because the length is determined
implicitly by the file size.

[1]: http://www.ietf.org/rfc/rfc959.txt


## Workaround ##

There is no real workaround for this issue except going back to a version that
does not include this bug.

Nevertheless the impact can be limited by the following actions:

For memory to memory tests use a specified length (option `-len X`) for a test.
This at least makes sure that the transfer ends some time in the future without
manual intervention. For a file to file/memory transfer the length is determined
implicitly.


## Problems ##

* Interrupted benchmarks and data transfers continue to hog CPU power and
network bandwidth without reason.

* An authenticated user can trigger a denial of service situation.

* This bug also breaks the interruption and continuation of data transfers
performed by gtransfer, as interrupted data transfers continue although the
controlling client (guc) is no longer active.

* In principle data transfers can no longer be interrupted safely.


## Versions affected ##

| BINARY | VERSION | OS |
| --------------------- | --------------------- | --------------------------------------------------------------------------- |
| globus-gridftp-server | v6.38 (1382984154-83) | Debian 7 (Globus package), SLES11 (compiled manually), SL6.1 (EPEL package) |


## Versions not affected ##

| BINARY | VERSION | OS |
| --------------------- | --------------------- | -------------------------- |
| globus-gridftp-server | v6.19 (1359994843-83) | Debian 7 (Globus package) |
| globus-gridftp-server | v6.10 (1334324800-83) | Debian 7 (Globus package) |

All tests were performed with the same client (guc v8.6) on different OSes. As
the same client version works with different server versions, I assume this is
a bug in the server.

Comments

Mike Link - 2014-02-04

Can you report the exact client parameters and server configuration?  I can't reproduce this using the Globus package on Fedora, but I'll try Debian 7.

helmut - 2014-02-06

Hi,

I attached a document (used-client-and-server-config.md.txt) with the configurations I used to reproduce the issue and four additional documents (init script file, init script configuration file, PI configuration file, DTP configuration file). The server was driven by Debian 7, the client by Ubuntu 12.04.

I've included instructions on how to switch between different package versions in Debian 7.

Bye
Frank

Mike Link - 2014-07-15

This has been fixed in the latest version of 5.2.5 update packages, and there is a source package available at http://toolkit.globus.org/toolkit/advisories.html

Globus Toolkit/GT-517

Summary

Decouple restart and perf markers

Details

Type: Improvement

Status: Open

Description

GridFTP's DSI interface allows the DSI to send information on the bytes it has received so that the server can report perf markers and restart markers back to the client. This interface is implemented in a single function call, globus_gridftp_server_update_bytes_written(), which records information used for both restart and perf markers. Although this works for standard file systems, the same is not true for HPSS. In HPSS, restart markers (successfully written ranges) are not available until the end of the transfer. Waiting until the end of the transfer to call globus_gridftp_server_update_bytes_written() would result in the client believing the transfer is hung.

Instead, this function needs to be broken into two portions for DSIs like HPSS so that restart markers (ranges successfully written) and perf markers (bytes written) can be reported separately. I have patched our local libglobus_gridftp_server.so with the following changes:

globus_gridftp_server.h:
/*
 * update bytes
 *
 * This should be called during a recv(), after each successful write
 * to the storage system.
 */
void
globus_gridftp_server_update_bytes_written(
    globus_gfs_operation_t              op,
    globus_off_t                        offset,
    globus_off_t                        length);

/*
 * update total byte counts written; used for perf markers
 */
void
globus_gridftp_server_update_bytes_recvd(
    globus_gfs_operation_t              op,
    globus_off_t                        length);


globus_i_gfs_data.c:

void
globus_gridftp_server_update_bytes_recvd(
    globus_gfs_operation_t              op,
    globus_off_t                        length)
{
    GlobusGFSName(globus_gridftp_server_update_bytes_recvd);
    GlobusGFSDebugEnter();

    globus_l_gfs_data_alive(op->session_handle);

    globus_mutex_lock(&op->session_handle->mutex);
    {
        op->recvd_bytes += length;
    }
    globus_mutex_unlock(&op->session_handle->mutex);

    GlobusGFSDebugExit();
}

void
globus_gridftp_server_update_range_recvd(
    globus_gfs_operation_t              op,
    globus_off_t                        offset,
    globus_off_t                        length)
{
    GlobusGFSName(globus_gridftp_server_update_range_recvd);
    GlobusGFSDebugEnter();

    globus_l_gfs_data_alive(op->session_handle);

    globus_mutex_lock(&op->session_handle->mutex);
    {
        globus_range_list_insert(
            op->recvd_ranges, offset + op->transfer_delta, length);
    }
    globus_mutex_unlock(&op->session_handle->mutex);

    GlobusGFSDebugExit();
}
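
The split proposed above can be modelled in a few lines of Python. This is a toy sketch of the idea only, not the server's data structures: the class `TransferProgress` is hypothetical, and the locking and the `transfer_delta` offset from the C patch are left out.

```python
class TransferProgress:
    """Toy model of the two kinds of progress a DSI might report."""

    def __init__(self):
        self.recvd_bytes = 0     # running total, used for perf markers
        self.recvd_ranges = []   # (offset, length) pairs, used for restart markers

    def update_bytes_recvd(self, length):
        """Count bytes as soon as they arrive (perf marker data)."""
        self.recvd_bytes += length

    def update_range_recvd(self, offset, length):
        """Record a range once the storage system confirms it (restart marker data)."""
        self.recvd_ranges.append((offset, length))
```

An HPSS-like DSI would call `update_bytes_recvd()` throughout the transfer so that the client keeps seeing progress, and call `update_range_recvd()` only at the end, when the written ranges become known.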

Comments

Jason Alt - 2015-01-08

Needed for GT6.

Globus Toolkit/GT-518

Summary

Wrong CA Cert Referenced After globus-connect-server-io-setup with configured MyProxy Server

Details

Type: Bug

Status: Open

Description

I fired up two ec2 instances.

One I configured by running globus-connect-server-id-setup.

One I configured by running globus-connect-server-io-setup after setting the MyProxy server to the FQDN of the first above.

I created users with the same name on both, and then did an ls on the endpoint.  Logged in with username / pass, etc.

I get the error as follows.
Server: kordas#gcsmyproxtest1 (ec2-54-198-229-126.compute-1.amazonaws.com:2811)

02-07 16:15:50.233 conn_error Server: kordas#gcsmyproxtest1 (ec2-54-198-229-126.compute-1.amazonaws.com:2811)

Message: Login Failed

02-07 16:15:50.233 conn_error Message: Login Failed

---

530-Login incorrect. : globus_gss_assist: Error invoking callout

530-globus_callout_module: The callout returned an error

530-globus_gridmap_callout_error: Gridmap lookup failure: Could not map /C=US/O=Globus Consortium/OU=Globus Connect Service/CN=0d63a586-9011-11e3-bce2-123139074522/CN=kordas

530-

530 End.

02-07 16:15:50.233 conn_error ---\n530-Login incorrect. : globus_gss_assist: Error invoking callout\r\n530-globus_callout_module: The callout returned an error\r\n530-globus_gridmap_callout_error: Gridmap lookup failure: Could not map /C=US/O=Globus Consortium/OU=Globus Connect Service/CN=0d63a586-9011-11e3-bce2-123139074522/CN=kordas\r\n530-\r\n530 End.

Investigating, I found in /etc/gridftp.d/globus-connect-server-authorization, the GLOBUS_MYPROXY_CA_CERT variable was pointing to the general CA from globus.org rather than the CA of the MyProxy server.  The CA cert of the MyProxy server was installed, just not referenced properly.

After changing it to the MyProxy CA cert, it works as expected.

Comments

Joe Bester - 2014-02-10

What OS are you using for this?

Jack Kordas - 2014-02-13

Ubuntu 12.04 LTS

$ cat /etc/issue
Ubuntu 12.04.4 LTS \n \l

$ uname -a
Linux ec2-54-235-230-60.compute-1.amazonaws.com 3.2.0-58-virtual #88-Ubuntu SMP Tue Dec 3 17:58:13 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Joe Bester - 2014-02-14

I have a fix for this that I'm testing.

Globus Toolkit/GT-519

Summary

Globus-url-copy transfer failing between TACC and SLAC with message Can’t get the local trusted CA certificate: Untrusted self-signed certificate in chain with hash aece7839

Details

Type: Bug

Status: Resolved 2014-02-20

Description

Globus-url-copy transfer failing between TACC and SLAC with the following error message:
500-globus_gsi_callback_module: Can't get the local trusted CA certificate: Untrusted self-signed certificate in chain with hash aece7839
500 End.

Here is a debug trace on the transfer:
login2$ globus-url-copy -dst-cred /tmp/x509_ubeckermr.slac -dbg -tcp-bs 13M gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar
debug: starting to size gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar
debug: connecting to gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar
debug: response from gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
220 data3.stampede.tacc.utexas.edu GridFTP Server 6.38 (gcc64dbg, 1382984154-83) [Globus Toolkit 5.2.5] ready.

debug: authenticating with gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar
debug: response from gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
230 User beckermr logged in.

debug: sending command to gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
SITE HELP

debug: response from gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
214-The following commands are recognized:
    ALLO    APPE    REST    CWD     CDUP    DCAU    EPSV    FEAT
    ERET    MDTM    STAT    ESTO    HELP    LIST    MODE    NLST
    MLSC    MLSD    PASV    RNFR    MLSR    MLST    NOOP    OPTS
    STOR    PASS    PBSZ    PORT    PROT    SITE    EPRT    RETR
    SPOR    MFMT    SCKS    TREV    PWD     QUIT    SBUF    SIZE
    SPAS    STRU    SYST    RNTO    TYPE    USER    LANG    MKD
    RMD     DELE    CKSM    DCSC
214 End

debug: sending command to gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
FEAT

debug: response from gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
211-Extensions supported
 DCSC P,D
 MFMT
 AUTHZ_ASSERT
 MLSR
 MLSC
 UTF8
 LANG EN
 DCAU
 PARALLEL
 SIZE
 MLST Type*;Size*;Modify*;Perm*;Charset;UNIX.mode*;UNIX.owner*;UNIX.uid*;UNIX.group*;UNIX.gid*;Unique*;UNIX.slink*;X.count;
 ERET
 ESTO
 SPAS
 SPOR
 REST STREAM
 MDTM
 PASV AllowDelayed;
211 End.

debug: sending command to gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
SITE CLIENTINFO scheme=gsiftp;appname="globus-url-copy";appver="5.14 (gcc64, 1305182462-80) [Globus Toolkit 5.0.4]";
debug: response from gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
250 OK.

debug: sending command to gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
TYPE I
debug: response from gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
200 Type set to I.

debug: sending command to gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
SIZE /scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar

debug: response from gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
213 76832051200

debug: operation complete
debug: starting to transfer gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar to gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar
debug: connecting to gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar
debug: response from gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
220 osggridftp01.slac.stanford.edu GridFTP Server 6.38 (gcc64, 1382984154-83) [Globus Toolkit 5.2.5] ready.

debug: authenticating with gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar
debug: response from gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
230 User beckermr logged in.

debug: sending command to gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
SITE HELP

debug: response from gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
214-The following commands are recognized:
    ALLO    APPE    REST    CWD     CDUP    DCAU    EPSV    FEAT
    ERET    MDTM    STAT    ESTO    HELP    LIST    MODE    NLST
    MLSC    MLSD    PASV    RNFR    MLSR    MLST    NOOP    OPTS
    STOR    PASS    PBSZ    PORT    PROT    SITE    EPRT    RETR
    SPOR    MFMT    SCKS    TREV    PWD     QUIT    SBUF    SIZE
    SPAS    STRU    SYST    RNTO    TYPE    USER    LANG    MKD
    RMD     DELE    CKSM    DCSC
214 End

debug: sending command to gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
FEAT

debug: response from gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
211-Extensions supported
 DCSC P,D
 MFMT
 AUTHZ_ASSERT
 MLSR
 MLSC
 UTF8
 LANG EN
 DCAU
 PARALLEL
 SIZE
 MLST Type*;Size*;Modify*;Perm*;Charset;UNIX.mode*;UNIX.owner*;UNIX.uid*;UNIX.group*;UNIX.gid*;Unique*;UNIX.slink*;X.count;
 ERET
 ESTO
 SPAS
 SPOR
 REST STREAM
 MDTM
 PASV AllowDelayed;
211 End.

debug: sending command to gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
SITE CLIENTINFO scheme=gsiftp;appname="globus-url-copy";appver="5.14 (gcc64, 1305182462-80) [Globus Toolkit 5.0.4]";
debug: response from gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
250 OK.

debug: sending command to gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
TYPE I
debug: response from gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
200 Type set to I.

debug: sending command to gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
SITE STORBUFSIZE 13631488

debug: response from gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
200 Site Command Successful.

debug: sending command to gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
DCAU S /C=US/O=National Center for Supercomputing Applications/CN=Matthew Becker

debug: response from gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
200 DCAU S.

debug: sending command to gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
PBSZ 1048576

debug: response from gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
200 PBSZ=1048576

debug: sending command to gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
PASV

debug: response from gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
227 Entering Passive Mode (134,79,120,8,196,207)

debug: sending command to gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
ALLO 76832051200

debug: response from gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
200 ALLO command successful.

debug: sending command to gsiftp://osggridftp01.slac.stanford.edu:2811/nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar:
STOR /nfs/slac/g/ki/ki21/cosmo/beckermr/mass_function/lasdamas/Consuelo_4001_snapshot_046.tar

debug: sending command to gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
SITE RETRBUFSIZE 13631488

debug: response from gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
200 Site Command Successful.

debug: sending command to gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
DCAU S /O=Grid/OU=GlobusTest/OU=simpleCA-osgmyproxy.slac.stanford.edu/OU=SLAC/CN=uid:beckermr

debug: response from gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
200 DCAU S.

debug: sending command to gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
PBSZ 1048576

debug: response from gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
200 PBSZ=1048576

debug: sending command to gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
PORT 134,79,120,8,196,207

debug: response from gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
200 PORT Command successful.

debug: sending command to gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
RETR /scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar

debug: response from gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar:
500-Command failed. : callback failed.
500-OpenSSL Error: s3_clnt.c:1068: in library: SSL routines, function SSL3_GET_SERVER_CERTIFICATE: certificate verify failed
500-globus_gsi_callback_module: Could not verify credential
500-globus_gsi_callback_module: Can't get the local trusted CA certificate: Untrusted self-signed certificate in chain with hash aece7839
500 End.

debug: fault on connection to gsiftp://data3.stampede.tacc.utexas.edu:2811/scratch/01756/beckermr/lasdamas/Consuelo_4001_snapshot_046.tar: globus_ftp_client: the server responded with an error
debug: operation complete

error: globus_ftp_client: the server responded with an error
500 500-Command failed. : callback failed.
500-OpenSSL Error: s3_clnt.c:1068: in library: SSL routines, function SSL3_GET_SERVER_CERTIFICATE: certificate verify failed
500-globus_gsi_callback_module: Could not verify credential
500-globus_gsi_callback_module: Can't get the local trusted CA certificate: Untrusted self-signed certificate in chain with hash aece7839
500 End.

However, I am at a loss as to where the "Untrusted self-signed certificate in chain with hash aece7839" is referenced.
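
For reference when reading the trace above: the 227 reply from SLAC and the PORT command sent to TACC carry the same data-channel address, encoded per RFC 959 as four host octets followed by two port bytes. A minimal sketch of the decoding (the function name `parse_pasv_reply` is hypothetical):

```python
import re


def parse_pasv_reply(reply):
    """Decode an RFC 959 '227' reply into (ip_address, port)."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("no host/port tuple in reply: %r" % reply)
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    # The port is sent as two bytes, high byte first.
    return "%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2
```

Applied to `227 Entering Passive Mode (134,79,120,8,196,207)`, this gives host 134.79.120.8 and port 50383. The SLAC server listens there and the TACC server connects to it, which is where the certificate verification in the 500 reply takes place.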

Simple grep on CAs for aece7839
data3# grep -i aece7839 /etc/grid-security/certificates/* /etc/grid-security/h*pem
data3#

Grep on OpenSSL hashes of all defined CAs at TACC
data3# for C in `ls -1 /etc/grid-security/certificates/*.0 /etc/grid-security/certificates/*.pem` ; do echo $C; openssl x509 -in $C -subject_hash -issuer_hash -noout -subject_hash_old -issuer_hash_old; done | grep aece7839
data3#

The GridFTP server logs at both TACC and SLAC report success (return code 226).

Here is the gridftp log at SLAC end.

DATE=20140214223132.875137 HOST=osggridftp01.slac.stanford.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140214222702.516169 USER=beckermr FILE=/u/ki/beckermr/nfs_links/cosmo21/beckermr/mass_function/lasdamas/lasdamas/Consuelo_4001_snapshot_046.tar BUFFER=65536 BLOCK=262144 NBYTES=4363 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[129.114.62.19] TYPE=ESTO CODE=226
DATE=20140214223638.361932 HOST=osggridftp01.slac.stanford.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140214223207.854667 USER=beckermr FILE=/u/ki/beckermr/nfs_links/cosmo21/beckermr/mass_function/lasdamas/lasdamas/Consuelo_4001_snapshot_046.tar BUFFER=65536 BLOCK=262144 NBYTES=4363 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[129.114.62.20] TYPE=ESTO CODE=226
DATE=20140214224147.945451 HOST=osggridftp01.slac.stanford.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140214223716.785862 USER=beckermr FILE=/u/ki/beckermr/nfs_links/cosmo21/beckermr/mass_function/lasdamas/lasdamas/Consuelo_4001_snapshot_046.tar BUFFER=65536 BLOCK=262144 NBYTES=4363 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[129.114.62.19] TYPE=ESTO CODE=226
DATE=20140214224654.549034 HOST=osggridftp01.slac.stanford.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140214224224.555878 USER=beckermr FILE=/u/ki/beckermr/nfs_links/cosmo21/beckermr/mass_function/lasdamas/lasdamas/Consuelo_4001_snapshot_046.tar BUFFER=65536 BLOCK=262144 NBYTES=4363 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[129.114.62.19] TYPE=ESTO CODE=226
DATE=20140214225158.424155 HOST=osggridftp01.slac.stanford.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140214224728.346313 USER=beckermr FILE=/u/ki/beckermr/nfs_links/cosmo21/beckermr/mass_function/lasdamas/lasdamas/Consuelo_4001_snapshot_046.tar BUFFER=65536 BLOCK=262144 NBYTES=4363 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[129.114.62.19] TYPE=ESTO CODE=226
DATE=20140214225704.255575 HOST=osggridftp01.slac.stanford.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140214225233.847119 USER=beckermr FILE=/u/ki/beckermr/nfs_links/cosmo21/beckermr/mass_function/lasdamas/lasdamas/Consuelo_4001_snapshot_046.tar BUFFER=65536 BLOCK=262144 NBYTES=24803 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[129.114.62.20] TYPE=ESTO CODE=226

And here is the log at TACC end

NBYTES=262144 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[134.79.120.8] TYPE=RETR CODE=226
data3.stampede.transfer.log:DATE=20140214215040.116685 HOST=data3.stampede.tacc.utexas.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140214214610.608660 USER=beckermr FILE=/home1/01756/beckermr/scratch/lasdamas/Consuelo_4001_snapshot_046.tar BUFFER=87380 BLOCK=262144 NBYTES=262144 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[134.79.120.8] TYPE=RETR CODE=226
data3.stampede.transfer.log:DATE=20140214220052.253903 HOST=data3.stampede.tacc.utexas.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140214215621.868786 USER=beckermr FILE=/home1/01756/beckermr/scratch/lasdamas/Consuelo_4001_snapshot_046.tar BUFFER=87380 BLOCK=262144 NBYTES=262144 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[134.79.120.8] TYPE=RETR CODE=226
data3.stampede.transfer.log:DATE=20140214230209.212324 HOST=data3.stampede.tacc.utexas.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140214225738.739932 USER=beckermr FILE=/home1/01756/beckermr/scratch/lasdamas/Consuelo_4001_snapshot_046.tar BUFFER=87380 BLOCK=262144 NBYTES=262144 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[134.79.120.8] TYPE=RETR CODE=226
data3.stampede.transfer.log:DATE=20140214233747.773684 HOST=data3.stampede.tacc.utexas.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140214233317.162566 USER=beckermr FILE=/home1/01756/beckermr/scratch/lasdamas/Consuelo_4001_snapshot_046.tar BUFFER=87380 BLOCK=262144 NBYTES=262144 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[134.79.120.8] TYPE=RETR CODE=226
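
Although every entry above ends with CODE=226, the NBYTES values are far below the 76832051200 bytes reported by the SIZE reply in the debug trace. A small sketch for checking this across many log lines follows. It assumes whitespace-separated KEY=value fields with no spaces inside values, and the helper names are hypothetical.

```python
def parse_transfer_log_line(line):
    """Split a transfer-log line into a dict of KEY=value fields."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            # Drop a leading 'filename:' prefix such as the one grep adds.
            fields[key.rsplit(":", 1)[-1]] = value
    return fields


def is_short_transfer(line, expected_size):
    """Return True if the logged NBYTES is smaller than the expected size."""
    return int(parse_transfer_log_line(line)["NBYTES"]) < expected_size
```

For example, `is_short_transfer(line, 76832051200)` is true for an entry with NBYTES=4363, even though the same entry reports CODE=226.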


Also, GlobusOnline loops trying to send the file, since the transfer fails with an error message.

Comments

David Carver - 2014-02-17

Here is the original email describing the problem


On Feb 13, 2014, at 5:47 PM, Matthew Becker wrote:

Hi everyone,

I am trying to move 10 TB of data off of scratch on stampede to a machine here at SLAC.

Thanks for the help!

Cheers,
Matt

On Thu, Feb 13, 2014 at 3:46 PM, Peter Onyisi wrote:
Hi Wei:

I cc Chris Hempel at TACC, who will know better who to contact.  I do note that there seems to be some trouble with the home disks on Stampede right now - is the user trying to transfer from xsede#stampede or some other resource?

Cheers,
Peter

On Thu, Feb 13, 2014 at 4:41 PM, Yang, Wei wrote:
Thanks Horst!

Hi Peter,

Please redirect me to the right person at TACC. I am helping a user at SLAC. He is using Globus Online to transfer data from TACC to SLAC. He said that he had no problem transferring data from NERSC to SLAC via GO but not from TACC. The error looks like

Error (transfer)
Server: slac#kipac (osggridftp01.slac.stanford.edu:2811)
Command: STOR

~/nfs_links/cosmo21/beckermr/mass_function/lasdamas/consuelo/Consuelo_4001_snapshot_046.tar
Message: The operation timed out
---
Timeout waiting for response

Do you have any suggestions?

regards,
Wei Yang  |  yangw@slac.stanford.edu  |  1-650-926-3338


On Feb 13, 2014, at 2:35 PM, Horst Severini wrote:

> Hi Wei,
>
> I would start with Peter Onyisi; hi Peter. :)
>
> Cheers,
>
>       Horst
>
> On 02/13/2014 04:13 PM, Yang, Wei wrote:
>> Hi Horst,
>>
>> Do you know anyone at TACC that are familiar with their GridFTP and GlobusOnline stuffs?
>>
>> regards,
>> Wei Yang  |yangw@slac.stanford.edu   |  1-650-926-3338




--
Peter Onyisi   |
CERN: 4-R-028  | Department of Physics
UT: RLM 10.211 | University of Texas at Austin

Mike Link - 2014-02-18

Hi David,

That is the expected behavior when one server doesn't have the CA cert of the credential used for the data connection.  This is common when using different credentials.  You can use the globus-url-copy -data-cred option to use the server feature designed to work around this (which is what Globus Transfer does).  Either -data-cred auto or -data-cred  should work.

I understand Jack Kordas is helping debug the initial stalling issue.

David Carver - 2014-02-18

Mike, the original problem was reported to GlobusOnline by Matthew Becker and referred to TACC.  I used globus-url-copy to get additional debug information and worked with Wei to gather GridFTP logs before opening this ticket.

I am not sure how the "data-cred" is specified for GlobusOnline or what to tell Matthew Becker.

Is it possible to reopen and assign this ticket to Globus Online to enable this feature?

Later,
David Carver

David Carver - 2014-02-19

FYI, update from  GlobusOnline Request #301979

How to specify gridftp "-data-cred" feature for Globus-online

David Carver
Feb 18 16:41

I opened a ticket on gridftp (GT-519) and Mike Link said to add "data-cred" feature in globus-url-copy. But the user (Matthew Becker) is using GlobusOnline to move his files from TACC to SLAC at Stanford. Not sure how Matthew Becker can enable this feature in GlobusOnline.

Thanks,
David Carver


Comments
Globus Team - Jack
globus support
GlobusOnline automatically uses this option to avoid the unsupported certificate error.

We discovered yesterday that, for some reason, if the user chooses to encrypt the data transfer between stampede and space, the transfers start to work. There must be some different use of the network stack in this case.

Given that this ticket is specifically about enabling the DCSC protocol, and GlobusOnline does that, I'm going to close this ticket.

Matthew Becker still has a ticket open for the original issue.

Jack

February 19, 2014 09:18

Globus Toolkit/GT-520

Summary

GRAM Jobmanager crash

Details

Type: Bug

Status: Resolved 2014-03-17

Description

Hi,

UFlorida was having issues with globus-job-manager segfaulting regularly.  This
happened after an upgrade to OSG 3.2.4 (which included the GT 5.2.5 packages)
and was solved by downgrading "globus-gram-job-manager",
"globus-scheduler-event-generator" and "globus-gatekeeper" packages to their
pre-5.2.5 versions (globus-gram-job-manager-13.45, globus-gatekeeper-9.6,
globus-scheduler-event-generator-4.4).

They've provided stack traces, which I'll link. The interesting thing about
both stack traces is that globus_l_gram_script_queue was somehow called with
arguments for globus_l_gram_job_manager_script_run even though the function
signatures are not at all the same.

In the first stack trace:
...
   #3  0x00000000004172b9 in globus_l_gram_script_queue (request=0xaa42c0, script_cmd=, callback=, callback_arg=)    at globus_gram_job_manager_script.c:2366
   #4  globus_l_gram_job_manager_script_run (request=0xaa42c0, script_cmd=, callback=, callback_arg=)    at globus_gram_job_manager_script.c:317
...
But globus_l_gram_script_queue takes 2 arguments, not 4!

At first I thought this might be a mismatch between the globus-gram-job-manager package
and the globus-gram-job-manager-debuginfo package, but their package list shows
this not to be the case.


Unfortunately, we have not been able to replicate this in a controlled
environment. UFlorida is fairly unique in that they pass a huge number of jobs
through a single CE that most other sites would split up across multiple CEs.
So it's possible that this only exhibits itself under heavy load.

The stack traces are available at:
http://oo.ihepa.ufl.edu:8080/t2/operations/GDB.OSG.3.24.core.txt
http://oo.ihepa.ufl.edu:8080/t2/operations/GDB.OSG.3.24.core.2.txt

The site's osg-system-profiler output (which lists, among other things, their
full set of packages (pre-downgrade)) is here:
http://oo.ihepa.ufl.edu:8080/t2/operations/osg-profile.txt

I hope you can help. If you need further information, the main contact for
UFlorida on this issue is Bockjoo Kim (bockjoo at phys.ufl.edu).

Thanks,
-Mat

Comments

Joe Bester - 2014-02-26

I think I'd ignore the extra parameters to the function---the optimizer is probably inlining the function because it's only called in one place.

The two bugs look to point to the same issue: request->job_stats.client_address is NULL. I guess that's the thing to investigate for this. You might be able to see if there are any jobs that look like that in the state file directory.

matyas - 2014-02-27

Unfortunately, Bockjoo doesn't have those state files anymore, but he has some core files saved that might help:
http://oo.ihepa.ufl.edu:8080/t2/operations/cores/

bockjoo - 2014-02-28

I just signed up for the Globus JIRA so that I can reply directly.
We have also upgraded our glibc to the same version Nebraska has, because they run the same OSG 3.2.4.
That did not help, and we still had the segfault every couple of seconds.
I think 'ip' point became different after the glibc upgrade.
But I hope the provided core files can help pin down the issue.
Please let me know if you need more info from me.
Thanks for looking into it!

Joe Bester - 2014-03-11

I think I've found a way that this situation can arise. Briefly, the job state file is created before the peer's connection information is stored in the request structure. So, if the job manager terminates (two phase timeout or crash or reboot) between the time the job has been created and the state machine has begun processing, it'll have an invalid value of the client_address. Then when the job manager is restarted and tries to process that job, it is unable to create a script because it assumes the client_address value is non-NULL.

I'll have to fix two issues. 1: ensure that the state file includes the client address when it is first written. 2: handle the case where the client address is unknown without crashing.
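
The second point above, tolerating a request whose client address was never recorded, can be illustrated with a short sketch. This is not the code from the actual patch: struct job_stats is a trimmed-down stand-in for the real request structure, and client_address_or_unknown(), format_client_arg() and the "unknown" placeholder are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-request statistics kept by the job manager. */
struct job_stats {
    const char *client_address; /* may be NULL if the state file predates the peer info */
};

/* Return the recorded client address, or a placeholder when none was stored. */
const char *client_address_or_unknown(const struct job_stats *stats)
{
    if (stats == NULL || stats->client_address == NULL) {
        return "unknown"; /* assumed placeholder, not taken from the real patch */
    }
    return stats->client_address;
}

/* Format one argument line for a script without ever dereferencing NULL.
 * Returns what snprintf() returns: the length the full line would need. */
int format_client_arg(char *buf, size_t len, const struct job_stats *stats)
{
    return snprintf(buf, len, "client_address=%s",
                    client_address_or_unknown(stats));
}
```

With a restarted job whose client_address is NULL, format_client_arg() writes "client_address=unknown" instead of crashing in the formatting call.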

Joe Bester - 2014-03-12

I have some RHEL 6 RPMS with a patch for this issue:
Source:
http://builds.globus.org/repo/rpm/redhat/6Server/SRPMS/globus-gram-job-manager-13.54-1gt.src.rpm
Binary:
http://builds.globus.org/repo/rpm/redhat/6Server/x86_64/globus-gram-job-manager-13.54-1gt.i686.rpm
http://builds.globus.org/repo/rpm/redhat/6Server/x86_64/globus-gram-job-manager-13.54-1gt.x86_64.rpm

I'm currently testing those, but preliminary results seem to show the issue resolved.

matyas - 2014-03-13

Hi Joe,
Can you give us a patch file of the changes that we can apply to our existing package? We get our packages from EPEL now, and merging the packages would be slower for us. We now have another site reporting the same problem, so we need to get a fix out fast.
Thanks,
-Mat

Joe Bester - 2014-03-13

https://github.com/globus/globus-toolkit/commit/d8bf2d72d4e14c7765964a4c155a89ce031fbacf.patch

matyas - 2014-03-13

Can you give me instructions on how to reproduce the original issue? Here's the procedure I tried:

1. globus-job-submit a long sleep job
2. Kill the job-manager
3. Edit the state file to remove the client_address (I tried both truncating the file to just before the client_address and also just blanking out that field)
4. Restart the job manager (by doing another globus-job-submit).

I expected the job manager to start crashing at that point, but it just complained about not being able to read the state file.

Joe Bester - 2014-03-13

If you convert that line to " \n" (SPACE NEWLINE), it'll be caught by the parser.

What I did was attach a job manager to the debugger and kill it after globus_gram_job_manager_request_load() returns but before the request->job_stats.client_address = peer_str line in startup_socket.c.

matyas - 2014-03-14

When I tried " \n", the job manager stopped complaining, but didn't crash.

How did you use gdb on the job manager? I tried attaching it to the gatekeeper and then following the child process once it forked, but the globus-job-manager did not seem to touch either startup_socket.c or globus_gram_job_manager_request_load().

Joe Bester - 2014-03-14

Either attach to an existing job manager process, or run the job manager with the command-line options in the grid-service entry before a job manager is running.

matyas - 2014-03-14

That did it. I've been able to replicate both the original crash and verify that the patched version fixes it.

Thanks Joe!

Globus Toolkit/GT-521

Summary

Encrypted Hostkey Support for GSI

Details

Type: New Feature

Status: Open

Description

Currently, GSI does not support encrypted hostkeys, and as a result, GCS will fail on an encrypted key with:

Attempt 1 globus_credential: Error reading host credential
globus_credential: Key is password protected: GSI does not currently
support password protected private keys.

A password prompt could be issued on interactive gridftp server startup, with startup failing in non-interactive cases.

Comments

Globus Toolkit/GT-522

Summary

GridFTP server hangs indefinitely during 3rd party transfer when remote service does not connect

Details

Type: Bug

Status: Open

Description

A GO transfer is hitting permission denied on a remote endpoint which is causing GridFTP processes on our end to linger. These lingering GridFTP processes add up and eventually xinetd will not allow any more instances of the GridFTP server. The hang occurs during a STOR operation; the remote server never connects (presumably because it has returned EPERM on RETR).

Stack Trace:
(gdb) thread apply all where

Thread 6 (Thread 0x2aaab0954700 (LWP 31355)):
#0  0x00002aaaadf9a43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002aaaadae11af in globus_l_thread_pool_thread_start (user_arg=)
    at globus_thread_pool.c:263
#2  0x00002aaab035103b in thread_starter (temparg=) at globus_thread_pthreads.c:285
#3  0x00002aaaadf96851 in start_thread () from /lib64/libpthread.so.0
#4  0x00002aaaaecfa6dd in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x2aaab0b55700 (LWP 31356)):
#0  0x00002aaaadf9a7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002aaab0351589 in globus_l_pthread_cond_timedwait (cv=0x2aaaadcf7888, mut=0x2aaaadcf7860,
    abstime=0x2aaab0b54d20) at globus_thread_pthreads.c:768
#2  0x00002aaaadac92df in globus_l_callback_thread_poll (user_arg=0x2aaaadcf7820)
    at globus_callback_threads.c:2487
#3  0x00002aaaadae10dd in globus_l_thread_pool_thread_start (user_arg=0x609fe0) at globus_thread_pool.c:222
#4  0x00002aaab035103b in thread_starter (temparg=) at globus_thread_pthreads.c:285
#5  0x00002aaaadf96851 in start_thread () from /lib64/libpthread.so.0
#6  0x00002aaaaecfa6dd in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x2aaabc301700 (LWP 31357)):
#0  0x00002aaaaecf32c3 in select () from /lib64/libc.so.6
#1  0x00002aaaad063630 in globus_l_xio_system_poll (user_args=)
    at globus_xio_system_select.c:1346
#2  0x00002aaaadac8e72 in globus_l_callback_thread_callback (user_arg=0x617230) at globus_callback_threads.c:2248
#3  0x00002aaaadae10dd in globus_l_thread_pool_thread_start (user_arg=0x2aaab4000bf0) at globus_thread_pool.c:222
#4  0x00002aaab035103b in thread_starter (temparg=) at globus_thread_pthreads.c:285
#5  0x00002aaaadf96851 in start_thread () from /lib64/libpthread.so.0
#6  0x00002aaaaecfa6dd in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x2aaabd164700 (LWP 31361)):
#0  0x00002aaaadf9e2a5 in sigwait () from /lib64/libpthread.so.0
#1  0x00002aaaadaca07b in globus_l_callback_thread_signal_poll (user_arg=0x2aaabd163ddc)
    at globus_callback_threads.c:2868
#2  0x00002aaab0350ea8 in globus_l_pthread_thread_cancellable_func (
    func=0x2aaaadaca000 , func_arg=0x2aaabd163ddc,
    cleanup_func=0x2aaaadac8810 , cleanup_arg=0x2aaabd163ddc,
    execute_cleanup=1) at globus_thread_pthreads.c:888
#3  0x00002aaaadac94cd in globus_l_callback_thread_signal_poll_wrapper (user_arg=)
    at globus_callback_threads.c:2944
#4  0x00002aaab035103b in thread_starter (temparg=) at globus_thread_pthreads.c:285
#5  0x00002aaaadf96851 in start_thread () from /lib64/libpthread.so.0
#6  0x00002aaaaecfa6dd in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x2aaabcd3c700 (LWP 31362)):
#0  0x00002aaaadf9a7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002aaab0351589 in globus_l_pthread_cond_timedwait (cv=0x2aaaadcf7888, mut=0x2aaaadcf7860,
    abstime=0x2aaabcd3bd20) at globus_thread_pthreads.c:768
#2  0x00002aaaadac92df in globus_l_callback_thread_poll (user_arg=0x2aaaadcf7820)
    at globus_callback_threads.c:2487
#3  0x00002aaaadae10dd in globus_l_thread_pool_thread_start (user_arg=0x2aaab4034030) at globus_thread_pool.c:222
#4  0x00002aaab035103b in thread_starter (temparg=) at globus_thread_pthreads.c:285
#5  0x00002aaaadf96851 in start_thread () from /lib64/libpthread.so.0
#6  0x00002aaaaecfa6dd in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x2aaab034cfc0 (LWP 31353)):
#0  0x00002aaaadf9a43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004059ce in ?? ()
#2  0x00002aaaaec31cdd in __libc_start_main () from /lib64/libc.so.6
#3  0x0000000000403009 in ?? ()
#4  0x00007fffffffea98 in ?? ()
#5  0x000000000000001c in ?? ()

Thread 4 is at globus_xio_system_select.c:1346:

globus_l_xio_system_poll()
        ....
        nready = select(
            num,
            globus_l_xio_system_ready_reads,
            globus_l_xio_system_ready_writes,
            GLOBUS_NULL,
            (time_left_is_infinity ? GLOBUS_NULL : &time_left));

(gdb) thread 4
[Switching to thread 4 (Thread 0x2aaabc301700 (LWP 31357))]#0  0x00002aaaaecf32c3 in select ()
   from /lib64/libc.so.6
(gdb) up
#1  0x00002aaaad063630 in globus_l_xio_system_poll (user_args=)
    at globus_xio_system_select.c:1346
1346            nready = select(
(gdb) print time_left_is_infinity
$1 = 1
(gdb) print num
$2 = 9
(gdb) print globus_l_xio_system_read_operations[0]
$3 = (globus_i_xio_system_op_info_t *) 0x0
(gdb) print globus_l_xio_system_read_operations[1]
$4 = (globus_i_xio_system_op_info_t *) 0x0
(gdb) print globus_l_xio_system_read_operations[2]
$5 = (globus_i_xio_system_op_info_t *) 0x0
(gdb) print globus_l_xio_system_read_operations[3]
$6 = (globus_i_xio_system_op_info_t *) 0x0
(gdb) print globus_l_xio_system_read_operations[4]
$7 = (globus_i_xio_system_op_info_t *) 0x0
(gdb) print globus_l_xio_system_read_operations[5]
$8 = (globus_i_xio_system_op_info_t *) 0x0
(gdb) print globus_l_xio_system_read_operations[6]
$9 = (globus_i_xio_system_op_info_t *) 0x0
(gdb) print globus_l_xio_system_read_operations[7]
$10 = (globus_i_xio_system_op_info_t *) 0x0
(gdb) print globus_l_xio_system_read_operations[8]
$11 = (globus_i_xio_system_op_info_t *) 0x2aaab8013370
(gdb) print globus_l_xio_system_read_operations[8][0]
$12 = {type = GLOBUS_I_XIO_SYSTEM_OP_ACCEPT, state = GLOBUS_I_XIO_SYSTEM_OP_PENDING, op = 0x2aaab8013630,
  handle = 0x2aaab800d480, error = 0x0, user_arg = 0x2aaab8011c50, nbytes = 0, waitforbytes = 1, offset = 0,
  sop = {non_data = {callback = 0x2aaaad082a20 , out_fd = 0x2aaab8011c58},
    data = {callback = 0x2aaaad082a20 , start_iov = 0x2aaab8011c58,
      start_iovc = 0, iov = 0x0, iovc = 0, addr = 0x0, flags = 0}}}


ie01$lsof -p 31353
COMMAND     PID    USER   FD   TYPE      DEVICE SIZE/OFF                NODE NAME
globus-gr 31353 arnoldg  cwd    DIR        0,28      640               30976 /
globus-gr 31353 arnoldg  rtd    DIR        0,28      640               30976 /
globus-gr 31353 arnoldg  txt    REG        0,28    35760           164340244 /usr/sbin/globus-gridftp-server
globus-gr 31353 arnoldg  mem    REG        0,28   154464               40768 /lib64/ld-2.12.so
globus-gr 31353 arnoldg  mem    REG        0,28   440504           164340225 /usr/lib64/libglobus_gridftp_server.so.6.0.38
globus-gr 31353 arnoldg  mem    REG        0,28   161168               59623 /usr/lib64/libglobus_ftp_control.so.1.3.6
globus-gr 31353 arnoldg  mem    REG        0,28   104712               59638 /usr/lib64/libglobus_io.so.3.6.4
globus-gr 31353 arnoldg  mem    REG        0,28   165656               59628 /usr/lib64/libglobus_gridftp_server_control.so.0.2.9
globus-gr 31353 arnoldg  mem    REG        0,28     9624               59636 /usr/lib64/libglobus_gssapi_error.so.2.2.1
globus-gr 31353 arnoldg  mem    REG        0,28    57680               59635 /usr/lib64/libglobus_gss_assist.so.3.5.9
globus-gr 31353 arnoldg  mem    REG        0,28    13328               59619 /usr/lib64/libglobus_authz.so.0.2.2
globus-gr 31353 arnoldg  mem    REG        0,28   127768               59637 /usr/lib64/libglobus_gssapi_gsi.so.4.6.8
globus-gr 31353 arnoldg  mem    REG        0,28    73104               59633 /usr/lib64/libglobus_gsi_proxy_core.so.0.6.2
globus-gr 31353 arnoldg  mem    REG        0,28    70336           164340212 /usr/lib64/libglobus_gsi_credential.so.1.5.0
globus-gr 31353 arnoldg  mem    REG        0,28    48136               59630 /usr/lib64/libglobus_gsi_callback.so.0.4.6
globus-gr 31353 arnoldg  mem    REG        0,28    39496               59639 /usr/lib64/libglobus_oldgaa.so.0.4.6
globus-gr 31353 arnoldg  mem    REG        0,28    52240               59634 /usr/lib64/libglobus_gsi_sysconfig.so.1.4.3
globus-gr 31353 arnoldg  mem    REG        0,28    23232               59631 /usr/lib64/libglobus_gsi_cert_utils.so.0.8.5
globus-gr 31353 arnoldg  mem    REG        0,28    15976               59644 /usr/lib64/libglobus_usage.so.0.3.1
globus-gr 31353 arnoldg  mem    REG        0,28    10512               59640 /usr/lib64/libglobus_openssl.so.0.3.2
globus-gr 31353 arnoldg  mem    REG        0,28    29352               59626 /usr/lib64/libglobus_gfork.so.0.3.2
globus-gr 31353 arnoldg  mem    REG        0,28   429928               59645 /usr/lib64/libglobus_xio.so.0.3.5
globus-gr 31353 arnoldg  mem    REG        0,28    18800               59641 /usr/lib64/libglobus_openssl_error.so.0.2.1
globus-gr 31353 arnoldg  mem    REG        0,28    19640               59620 /usr/lib64/libglobus_callout.so.0.2.4
globus-gr 31353 arnoldg  mem    REG        0,28     4864               59629 /usr/lib64/libglobus_gsi_authz_callout_error.so.0.2.2
globus-gr 31353 arnoldg  mem    REG        0,28    19296               59642 /usr/lib64/libglobus_proxy_ssl.so.1.3.1
globus-gr 31353 arnoldg  mem    REG        0,28   298600               59621 /usr/lib64/libglobus_common.so.0.14.10
globus-gr 31353 arnoldg  mem    REG        0,28   595800               40834 /lib64/libm-2.12.so
globus-gr 31353 arnoldg  mem    REG        0,28   142464               40862 /lib64/libpthread-2.12.so
globus-gr 31353 arnoldg  mem    REG        0,28   437016           164340124 /usr/lib64/libssl.so.1.0.1e
globus-gr 31353 arnoldg  mem    REG        0,28  1950976           164340122 /usr/lib64/libcrypto.so.1.0.1e
globus-gr 31353 arnoldg  mem    REG        0,28    19536               40797 /lib64/libdl-2.12.so
globus-gr 31353 arnoldg  mem    REG        0,28    88600               40880 /lib64/libz.so.1.2.3
globus-gr 31353 arnoldg  mem    REG        0,28  1912432               40784 /lib64/libc-2.12.so
globus-gr 31353 arnoldg  mem    REG        0,28    36880               59750 /usr/lib64/libltdl.so.7.2.1
globus-gr 31353 arnoldg  mem    REG        0,28   269472               40814 /lib64/libgssapi_krb5.so.2.2
globus-gr 31353 arnoldg  mem    REG        0,28   912944               40826 /lib64/libkrb5.so.3.3
globus-gr 31353 arnoldg  mem    REG        0,28    14664               40789 /lib64/libcom_err.so.2.1
globus-gr 31353 arnoldg  mem    REG        0,28   178952               40824 /lib64/libk5crypto.so.3.1
globus-gr 31353 arnoldg  mem    REG        0,28    43696               40827 /lib64/libkrb5support.so.0.1
globus-gr 31353 arnoldg  mem    REG        0,28    10192               40825 /lib64/libkeyutils.so.1.3
globus-gr 31353 arnoldg  mem    REG        0,28   110960               40865 /lib64/libresolv-2.12.so
globus-gr 31353 arnoldg  mem    REG        0,28   122040               40867 /lib64/libselinux.so.1
globus-gr 31353 arnoldg  mem    REG        0,28    16640               59643 /usr/lib64/libglobus_thread_pthread.so.0.14.10
globus-gr 31353 arnoldg  mem    REG        0,28    61704               59647 /usr/lib64/libglobus_xio_gsi_driver.so.0.2.3
globus-gr 31353 arnoldg  mem    REG        0,28    19664               59649 /usr/lib64/libglobus_xio_pipe_driver.so.0.2.2
globus-gr 31353 arnoldg  mem    REG        0,28    65928               40845 /lib64/libnss_files-2.12.so
globus-gr 31353 arnoldg  mem    REG        0,28    27424               40844 /lib64/libnss_dns-2.12.so
globus-gr 31353 arnoldg  mem    REG        0,28    90784               40807 /lib64/libgcc_s-4.4.6-20120305.so.1
globus-gr 31353 arnoldg  mem    REG        0,28   400051               64382 /usr/local/gridftp_lustre_dsi_r701/libglobus_gridftp_server_lustre.so.0.0.0
globus-gr 31353 arnoldg  mem    REG        0,28   182802               64414 /usr/local/gridmap_callout_1.3/libgridmap_callout.so
globus-gr 31353 arnoldg  mem    REG        0,28   305984               40829 /lib64/libldap-2.4.so.2.5.6
globus-gr 31353 arnoldg  mem    REG        0,28    55848               40850 /lib64/libpam.so.0.82.2
globus-gr 31353 arnoldg  mem    REG        0,28    60512               40828 /lib64/liblber-2.4.so.2.5.6
globus-gr 31353 arnoldg  mem    REG        0,28   242112               59877 /usr/lib64/libssl3.so
globus-gr 31353 arnoldg  mem    REG        0,28   181168               59869 /usr/lib64/libsmime3.so
globus-gr 31353 arnoldg  mem    REG        0,28  1288016               59796 /usr/lib64/libnss3.so
globus-gr 31353 arnoldg  mem    REG        0,28   154456               59802 /usr/lib64/libnssutil3.so
globus-gr 31353 arnoldg  mem    REG        0,28    14560               40857 /lib64/libplds4.so
globus-gr 31353 arnoldg  mem    REG        0,28    18720               40856 /lib64/libplc4.so
globus-gr 31353 arnoldg  mem    REG        0,28   240592               40842 /lib64/libnspr4.so
globus-gr 31353 arnoldg  mem    REG        0,28   106160               59863 /usr/lib64/libsasl2.so.2.0.23
globus-gr 31353 arnoldg  mem    REG        0,28   113096               40780 /lib64/libaudit.so.1.0.0
globus-gr 31353 arnoldg  mem    REG        0,28    40400               40790 /lib64/libcrypt-2.12.so
globus-gr 31353 arnoldg  mem    REG        0,28   383504               40805 /lib64/libfreebl3.so
globus-gr 31353 arnoldg  mem    REG        0,28    44328               40847 /lib64/libnss_ldap.so.2
globus-gr 31353 arnoldg    0u  sock         0,6      0t0           164352086 can't identify protocol
globus-gr 31353 arnoldg    1u  sock         0,6      0t0           164352086 can't identify protocol
globus-gr 31353 arnoldg    2u   CHR         1,3      0t0                3829 /dev/null
globus-gr 31353 arnoldg    3r  FIFO         0,8      0t0           164352091 pipe
globus-gr 31353 arnoldg    4w  FIFO         0,8      0t0           164352091 pipe
globus-gr 31353 arnoldg    5w   CHR         1,3      0t0                3829 /dev/null
globus-gr 31353 arnoldg    6w   CHR         1,3      0t0                3829 /dev/null
globus-gr 31353 arnoldg    7u  IPv4   164352100      0t0                 UDP *:41535
globus-gr 31353 arnoldg    8u  IPv4   164352162      0t0                 TCP *:47853 (LISTEN)
globus-gr 31353 arnoldg    9w   REG 1590,444806        0 1125442682274034226 /mnt/b/projects/staff/bw_seas/arnoldg/xt-cle3.1/cp2k/2.4/cnl3.1_gnu4.6.2/test.log

The only 'live' TCP port is 47853, which was opened in response to PASV.
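
The gdb session above shows select() being entered with time_left_is_infinity set, so the thread blocks until the remote side connects. As a minimal sketch of bounding such a wait with plain POSIX select(), and not the change shipped in any Globus package, the following uses a hypothetical helper wait_for_connection() with a hypothetical timeout_sec parameter:

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Wait until listen_fd becomes readable (for a listening socket this means a
 * peer is ready to be accepted) or until timeout_sec seconds have passed.
 * Returns 1 if ready, 0 on timeout, -1 on error (including EINTR). */
int wait_for_connection(int listen_fd, long timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int nready;

    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);

    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* Passing &tv instead of NULL is what keeps the wait finite. */
    nready = select(listen_fd + 1, &readfds, NULL, NULL, &tv);
    if (nready < 0) {
        return -1;
    }
    return nready > 0 ? 1 : 0;
}
```

A caller that gets 0 back can abort the pending transfer and release the process instead of lingering under xinetd.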

ie01$rpm -qa |grep globus
globus-gsi-openssl-error-2.1-11.el6.x86_64
globus-gssapi-gsi-10.8-1.el6.x86_64
globus-authz-callout-error-2.2-8.el6.x86_64
globus-xio-pipe-driver-2.2-6.el6.x86_64
globus-gram-job-manager-pbs-debuginfo-1.6-4.el6.x86_64
globus-gass-transfer-debuginfo-7.2-7.el6.x86_64
globus-gridftp-server-control-debuginfo-2.9-2.el6.x86_64
globus-gridmap-eppn-callout-debuginfo-0.4-1.el6.x86_64
globus-gfork-debuginfo-3.2-6.el6.x86_64
globus-xio-popen-driver-debuginfo-2.3-6.el6.x86_64
globus-xio-debuginfo-3.5-1.el6.x86_64
globus-gridftp-5.2.2-1.el6.x86_64
globus-gsi-callback-4.6-1.el6.x86_64
globus-xio-gsi-driver-2.3-8.el6.x86_64
globus-xio-popen-driver-2.3-6.el6.x86_64
globus-gsi-cert-utils-progs-8.5-2.el6.x86_64
globus-common-debuginfo-14.10-1.el6.x86_64
globus-gsi-cert-utils-debuginfo-8.5-2.el6.x86_64
globus-gram-job-manager-condor-debuginfo-1.4-3.el6.x86_64
globus-gram-job-manager-sge-debuginfo-1.7-1.el6.x86_64
globus-gram-client-debuginfo-12.4-7.el6.x86_64
globus-ftp-client-debuginfo-7.5-1.el6.x86_64
globus-gsi-credential-6.0-2.el6.x86_64
globus-usage-3.1-9.el6.x86_64
globus-openssl-module-3.2-7.el6.x86_64
globus-gss-assist-8.9-1.el6.x86_64
globus-io-9.4-2.el6.x86_64
globus-gass-transfer-7.2-7.el6.x86_64
globus-xioperf-debuginfo-3.1-6.el6.x86_64
globus-rsl-debuginfo-9.1-11.el6.x86_64
globus-openssl-module-debuginfo-3.2-7.el6.x86_64
globus-gsi-proxy-core-debuginfo-6.2-7.el6.x86_64
globus-gatekeeper-debuginfo-9.15-1.el6.x86_64
globus-gram-protocol-debuginfo-11.3-7.el6.x86_64
globus-gss-assist-debuginfo-8.9-1.el6.x86_64
globus-ftp-client-7.5-1.el6.x86_64
globus-gss-assist-progs-8.9-1.el6.x86_64
globus-scheduler-event-generator-debuginfo-4.7-3.el6.x86_64
globus-gram-job-manager-callout-error-debuginfo-2.1-11.el6.x86_64
globus-usage-debuginfo-3.1-9.el6.x86_64
globus-gass-copy-debuginfo-8.6-3.el6.x86_64
globus-gridftp-server-6.38-1.el6.x86_64
globus-common-14.10-1.el6.x86_64
globus-gsi-cert-utils-8.5-2.el6.x86_64
globus-gsi-proxy-core-6.2-7.el6.x86_64
globus-xio-3.5-1.el6.x86_64
globus-ftp-control-4.6-1.el6.x86_64
globus-gfork-3.2-6.el6.x86_64
globus-gass-copy-8.6-3.el6.x86_64
globus-gridmap-callout-error-debuginfo-1.2-9.el6.x86_64
globus-gsi-credential-debuginfo-5.6-1.el6.x86_64
globus-gsi-callback-debuginfo-4.6-1.el6.x86_64
globus-gssapi-gsi-debuginfo-10.8-1.el6.x86_64
globus-gass-cache-debuginfo-8.1-9.el6.x86_64
globus-gass-server-ez-debuginfo-4.3-5.el6.x86_64
globus-gram-job-manager-fork-debuginfo-1.5-8.el6.x86_64
globus-gsi-openssl-error-debuginfo-2.1-11.el6.x86_64
globus-callout-debuginfo-2.4-1.el6.x86_64
globus-gram-job-manager-debuginfo-13.53-1.el6.x86_64
globus-io-debuginfo-9.4-2.el6.x86_64
globus-gram-job-manager-lsf-debuginfo-1.1-1.el6.x86_64
globus-gridftp-server-debuginfo-6.37-1.el6.x86_64
globus-gridftp-server-progs-6.38-1.el6.x86_64
globus-proxy-utils-5.1-1.el6.x86_64
globus-gsi-sysconfig-5.3-5.el6.x86_64
globus-gssapi-error-4.1-11.el6.x86_64
globus-common-progs-14.10-1.el6.x86_64
globus-gass-copy-progs-8.6-3.el6.x86_64
globus-gssapi-error-debuginfo-4.1-11.el6.x86_64
globus-gsi-proxy-ssl-debuginfo-4.1-11.el6.x86_64
globus-authz-callout-error-debuginfo-2.2-8.el6.x86_64
globus-gsi-sysconfig-debuginfo-5.3-5.el6.x86_64
globus-authz-debuginfo-2.2-8.el6.x86_64
globus-gram-client-tools-debuginfo-10.4-4.el6.x86_64
globus-gsi-proxy-ssl-4.1-11.el6.x86_64
globus-callout-2.4-1.el6.x86_64
globus-authz-2.2-8.el6.x86_64
globus-gridftp-server-control-2.9-2.el6.x86_64
globus-xio-gsi-driver-debuginfo-2.3-8.el6.x86_64
globus-gass-cache-program-debuginfo-5.2-2.el6.x86_64
globus-gridmap-verify-myproxy-callout-debuginfo-1.2-2.el6.x86_64
globus-proxy-utils-debuginfo-5.1-1.el6.x86_64
globus-ftp-control-debuginfo-4.6-1.el6.x86_64
globus-xio-pipe-driver-debuginfo-2.2-6.el6.x86_64

Comments

Mike Link - 2014-03-17

This was fixed in the 5.2.5 release, which it seems you only partially have.  The specific fix was in globus-gridftp-server-control 2.10, which was released at the same time as globus-gridftp-server 6.37.  It looks like 2.10 is currently in epel.

We don't directly control when updates make it into epel, but it would be strange for those two versions to be part of the same install.  Did you update only the globus-gridftp-server package at some point?

Jason Alt - 2014-03-17

You are right. After I encountered the problem, I did update Globus in order to make sure it was at the latest and greatest before diagnosing the problem. But looking at the history on that machine, I only did a yum update of globus-gridftp-server-progs; globus-gridftp-server-control did not get updated.

We'll do a full Globus update and verify that the problem is gone. Thanks.

Globus Toolkit/GT-523

Summary

Update GSI OpenSSH configuration instructions to include "--with-default-path=/usr/local/bin:/bin:/usr/bin"

Details

Type: Improvement

Status: Open

Description

Update build instructions at http://toolkit.globus.org/toolkit/docs/5.2/5.2.5/gsiopenssh/admin/#gsiopenssh-admin-installing-configure-options to reflect the
default-path of "/usr/local/bin:/bin:/usr/bin" used in the Globus Toolkit RPMs.  Otherwise a user building GSI OpenSSH will get a default-path of "/usr/bin:/bin:/usr/sbin:/sbin".

Example
--with-gsiopensshargs="--with-default-path=/usr/local/bin:/bin:/usr/bin"

The default-path in the Globus Toolkit gsi-openssh RPM:

staff$ wget -q http://toolkit.globus.org/ftppub/gt5/5.2/5.2.5/packages/rpm/centos/6/x86_64/gsi-openssh-5.6-1.el6.x86_64.rpm
staff$ rpm2cpio gsi-openssh-server-5.6-1.el6.x86_64.rpm |  cpio -idmv
staff$ strings ./usr/sbin/gsisshd | grep /usr/bin
/usr/bin/xauth
/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
/usr/bin/passwd
/usr/local/bin:/bin:/usr/bin


The CentOS /usr/sbin/sshd also has /usr/local/bin in its default path:

master$ rpm -qf /usr/sbin/sshd
openssh-server-6.2p2-1.x86_64
master$  strings /usr/sbin/sshd | grep /usr/bin
/usr/bin/xauth
/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
/usr/bin/passwd
/usr/local/bin:/bin:/usr/bin
master$

But, building GSI OpenSSH from scratch using current instructions (without default-path option) does not include /usr/local/bin

staff$ wget -q http://toolkit.globus.org/ftppub/gt5/5.2/5.2.5/installers/src/gt5.2.5-all-source-installer.tar.gz
staff$ tar xf gt5.2.5-all-source-installer.tar.gz
staff$ cd gt5.2.5-all-source-installer
staff$ ./configure --prefix=/opt/apps/xsede/gsi-openssh-6.2p1  --with-gsiopensshargs="--with-pam --with-md5-passwords --with-tcp-wrappers"
staff$ make gsi-openssh
staff$ make gsi-openssh install
staff$ strings /opt/apps/xsede/gsi-openssh-6.2p1/sbin/sshd | grep /usr/bin

/usr/bin/xauth
/usr/bin:/bin:/usr/sbin:/sbin:/opt/apps/xsede/gsi-openssh-6.2p1/bin
/usr/bin/passwd
/usr/bin:/bin:/usr/sbin:/sbin:/opt/apps/xsede/gsi-openssh-6.2p1/bin

staff$ head -5 /opt/apps/xsede/gsi-openssh-6.2p1/etc/ssh/sshd_config
#       $OpenBSD: sshd_config,v 1.89 2013/02/06 00:20:42 dtucker Exp $

# This is the sshd server system-wide configuration file.  See
# sshd_config(5) for more information.

# This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin

Comments

Globus Toolkit/GT-524

Summary

Can’t build gsi_openssh_bundle-5.7-src.tar.gz with --with-nerscmod option

Details

Type: Bug

Status: Open

Description

Can't build the gsi-openssh-5.7 update for Globus 5.2.5 with the "with-nerscmod" option

staff$ wget http://toolkit.globus.org/ftppub/gt5/5.2/5.2.5/updates/src/gsi_openssh-5.7-src.tar.gz
staff$ tar xf gsi_openssh-5.7-src.tar.gz

or from

staff$ wget --no-check-certificate https://sourceforge.net/projects/cilogon/files/gsissh/gsi_openssh_bundle-5.7-src.tar.gz/download
staff$ tar zxf gsi_openssh_bundle-5.7-src.tar.gz
staff$ tar xf gsi_openssh-5.7-src.tar.gz



staff$ cd gsi_openssh-5.7-src
staff$ ./configure --prefix=/opt/apps/xsede/gsi-openssh-5.7 --with-default-path=/usr/local/bin:/bin:/usr/bin --with-superuser-path=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin --with-pam --with-md5-passwords --with-tcp-wrappers --with-nerscmod

staff$ make
 ...

gcc -g -O2 -Wall -Wpointer-arith -Wsign-compare -Wformat-security -Wno-pointer-sign -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -fno-builtin-memset -fstack-protector-all  -I. -I.  -D_PATH_PRIVSEP_CHROOT_DIR=\"/var/empty\" -DHAVE_CONFIG_H -c channels.c
channels.c: In function ‘channel_handle_wfd’:
channels.c:1926: error: ‘Channel’ has no member named ‘wfd_isatty’
make: *** [channels.o] Error 1

It looks like "wfd_isatty" in channels.h is only defined when compiling on AIX, while the nerscmod code in channels.c uses it unconditionally:

staff$ grep -A 1 -B 1 wfd_isatty *.h *.c
channels.h-#ifdef _AIX
channels.h:     int     wfd_isatty;     /* wfd is a tty */
channels.h-#endif
--
channels.c-     /* XXX: Later AIX versions can't push as much data to tty */
channels.c:     c->wfd_isatty = is_tty || isatty(c->wfd);
channels.c-#endif
--
channels.c-             /* XXX: Later AIX versions can't push as much data to tty */
channels.c:             if (compat20 && c->wfd_isatty)
channels.c-                     dlen = MIN(dlen, 8*1024);
--
channels.c-                     /* this section for filtering unwanted data */
channels.c:                     if ( !c->wfd_isatty  && c->audit_enable == 1 ) {
channels.c-                             int print_len = 0;
staff$
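
The grep output above shows a struct member declared only under #ifdef _AIX but read from code that is compiled everywhere. One general way to avoid that kind of mismatch is to route every read through an accessor that carries the same guard. The sketch below illustrates the idea only; it is not the fix committed to GSI-OpenSSH, struct channel is a trimmed-down stand-in for the real Channel type, and channel_wfd_is_tty() and channel_should_audit() are hypothetical names.

```c
#include <unistd.h>

/* Hypothetical, trimmed-down channel record; not the real OpenSSH Channel. */
struct channel {
    int wfd;          /* write file descriptor */
#ifdef _AIX
    int wfd_isatty;   /* cached tty flag, present on AIX builds only */
#endif
    int audit_enable; /* 1 when auditing is switched on */
};

/* Portable accessor: use the cached AIX-only field where it exists,
 * and ask the OS directly on every other platform. */
int channel_wfd_is_tty(const struct channel *c)
{
#ifdef _AIX
    return c->wfd_isatty;
#else
    return isatty(c->wfd);
#endif
}

/* Same shape as the failing condition, but going through the accessor. */
int channel_should_audit(const struct channel *c)
{
    return !channel_wfd_is_tty(c) && c->audit_enable == 1;
}
```

Because the guard lives in one place, code such as the audit check compiles on Linux as well as on AIX.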

I am glad that you all are picking up the ncsa gsi, psc hpn, and now nersc patches!

Thanks,
David

Comments

Anonymous - 2014-05-23

David,

I've committed a fix for this build problem to the GSI-OpenSSH CVS. You can apply the fix as a patch like this:

$ cd gsi_openssh-5.7-src
$  wget http://lists.globus.org/pipermail/gsi-openssh-commit/2014-May/000505.html
$ patch < 000505.html
patching file channels.h
patching file channels.c
Hunk #1 succeeded at 273 with fuzz 2.

The fix will appear in the next GSI-OpenSSH release. I've also reported the issue to the NERSC developers.

Thanks,
Jim Basney

Globus Toolkit/GT-525

Summary

globus-gridftp-server crashes at startup with assertion failed

Details

Type: Bug

Status: Open

Description

Hello.

When trying to start globus-gridftp-server, I get an assertion failure:

# /usr/sbin/globus-gridftp-server -p 2811 -S -d error,warn,info -l /var/log/gridftp-session.log -Z /var/log/globus-gridftp.log
Assertion 0 && "globus_hashtable_lookup bad parms" failed in file globus_hashtable.c at line 390
Aborted

GDB output:

# gdb /usr/sbin/globus-gridftp-server
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /usr/sbin/globus-gridftp-server...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/sbin/globus-gridftp-server
[Thread debugging using libthread_db enabled]
Assertion 0 && "globus_hashtable_lookup bad parms" failed in file globus_hashtable.c at line 390

Program received signal SIGABRT, Aborted.
0x00007ffff6f3a925 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install globus-gridftp-server-progs-6.38-1.el6.x86_64
(gdb) bt
#0  0x00007ffff6f3a925 in raise () from /lib64/libc.so.6
#1  0x00007ffff6f3c105 in abort () from /lib64/libc.so.6
#2  0x00007ffff74dd2be in globus_hashtable_lookup () from /usr/lib64/libglobus_common.so.0
#3  0x00007ffff7b85631 in globus_i_gfs_config_int () from /usr/lib64/libglobus_gridftp_server.so.6
#4  0x00007ffff7b8290e in globus_gfs_log_exit_message () from /usr/lib64/libglobus_gridftp_server.so.6
#5  0x00007ffff7b86ea6 in ?? () from /usr/lib64/libglobus_gridftp_server.so.6
#6  0x00007ffff7b873ef in globus_i_gfs_config_init_envs () from /usr/lib64/libglobus_gridftp_server.so.6
#7  0x0000000000404cd1 in main ()

Version:
globus-gridftp-server-6.38-1.el6.x86_64

Comments

Globus Toolkit/GT-526

Summary

GridFTP server allows disabled accounts

Details

Type: Improvement

Status: Resolved 2015-03-12

Description

GridFTP does not check a user's shell to determine if the account is valid for login. It is common to set a shell to /dev/null to disable user logins, but this will not prevent a GridFTP login.
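
A minimal sketch of the kind of check described above, written in Python for illustration only; the ticket does not show the server's actual fix. It assumes an account counts as enabled only when its shell appears in a list in /etc/shells format. The function names and the sample list are hypothetical.

```python
def parse_shells(text):
    """Return the set of shells listed in /etc/shells-style text."""
    shells = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            shells.add(line)
    return shells


def shell_allows_login(shell, valid_shells):
    """Treat an account as enabled only if its shell is in valid_shells."""
    return shell in valid_shells


# Hypothetical contents of a shells file, for illustration.
SAMPLE_SHELLS = """\
# /etc/shells: valid login shells
/bin/sh
/bin/bash
"""

valid = parse_shells(SAMPLE_SHELLS)
print(shell_allows_login("/bin/bash", valid))   # True
print(shell_allows_login("/dev/null", valid))   # False
```

With this rule, an account whose shell is set to /dev/null is rejected because /dev/null is not in the list.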

Comments

Globus Toolkit/GT-527

Summary

GridFTP doesn’t verify sharing state files/dir permissions

Details

Type: Improvement

Status: Resolved 2015-03-12

Description

GridFTP server creates sharing config files and the sharing state dir with user-only permissions, but it does not verify that the permissions are proper at the time of use. This is important in case of tampering, but also in cases where admins manage the sharing files themselves and may not set restrictive enough permissions.
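
A minimal sketch of a check at time of use, in Python for illustration only; it is not the server's code. It assumes "user-only" means the path is owned by the current user and has no group or other permission bits set. The function name and the file name are hypothetical.

```python
import os
import stat
import tempfile


def is_user_only(path):
    """Return True if path is owned by the current user and has no
    group or other permission bits set."""
    st = os.stat(path)
    if st.st_uid != os.getuid():
        return False
    return (stat.S_IMODE(st.st_mode) & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Illustration with a temporary file standing in for a sharing state file.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "share_state")  # hypothetical file name
    with open(path, "w") as f:
        f.write("example\n")
    os.chmod(path, 0o600)
    print(is_user_only(path))   # True
    os.chmod(path, 0o644)
    print(is_user_only(path))   # False
```

A caller would refuse to use the file when the check fails, which covers both tampering and files created by an admin with permissions that are too loose.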

Comments

Globus Toolkit/GT-528

Summary

SITE RESTRICT parameters may cause errors

Details

Type: Bug

Status: Resolved 2015-03-12

Description

When SITE RESTRICT is given multiple paths, one of them is /, and chroot is also /, the rp list can become corrupted.

Comments

Globus Toolkit/GT-529

Summary

port_range config is not applied in inetd or sshftp mode

Details

Type: Bug

Status: Resolved 2015-03-12

Description

The port_range config option sets GLOBUS_TCP_PORT_RANGE, but does not do it before xio is activated, which is when the setting is checked.  With daemon mode the children are forked and the initial env is inherited, so this isn't a problem, but that isn't the case with sshftp and inetd.

Comments

Globus Toolkit/GT-530

Summary

Assertion in event callout module, which results in connection resets on STOR or RETR

Details

Type: Bug

Status: Open

Description

With threads and small files, there can be an assertion in the event callout module due to the finishing op and the next op overlapping.

Comments

Globus Toolkit/GT-531

Summary

Overlapping operations while threaded cause assertion

Details

Type: Bug

Status: Open

Description

Operation 2 can start before Operation 1 resets the session state, causing operation 2 to abort.

Comments

Globus Toolkit/INF-459

Summary

git release-tag should not allow date specification

Details

Type: Task

Status: Closed 2014-05-27

Description

In line with the philosophy that less is more, git release-tag should not allow a user to specify a date. This means that release tags are specified by the date on which they are created.

Comments

Globus Toolkit/GT-534

Summary

globus gridftp client library random crashes

Details

Type: Bug

Status: Resolved 2015-04-10

Description

Hi,

We are noticing random gridftp client library crashes in FTS.
Attached are the FTS log file and the GridFTP debug module output.

Stack-trace:
/lib64/libc.so.6() [0x3aaec329a0]
/lib64/libc.so.6(gsignal+0x35) [0x3aaec32925]
/lib64/libc.so.6(abort+0x175) [0x3aaec34105]
/usr/lib64/libglobus_ftp_client.so.2(globus_i_ftp_client_response_callback+0x5ff) [0x7fd17445b73f]
/usr/lib64/libglobus_ftp_control.so.1(+0xab76) [0x7fd16fde3b76]
/usr/lib64/libglobus_io.so.3(+0x10143) [0x7fd174019143]
/usr/lib64/libglobus_xio.so.0(globus_l_xio_read_write_callback_kickout+0xc8) [0x7fd16fb86298]
/usr/lib64/libglobus_xio.so.0(globus_i_xio_read_write_callback+0x370) [0x7fd16fb86830]
/usr/lib64/libglobus_xio.so.0(globus_l_xio_driver_op_read_kickout+0x14c) [0x7fd16fb8e37c]
/usr/lib64/libglobus_xio.so.0(globus_xio_driver_finished_read+0x40d) [0x7fd16fb9977d]
/usr/lib64/libglobus_xio.so.0(+0x4fdf5) [0x7fd16fbbfdf5]
/usr/lib64/libglobus_xio.so.0(+0x502e6) [0x7fd16fbc02e6]
/usr/lib64/libglobus_xio.so.0(+0x2f12f) [0x7fd16fb9f12f]
/usr/lib64/libglobus_common.so.0() [0x30ae21a1bc]
/usr/lib64/libglobus_common.so.0(globus_l_thread_pool_thread_start+0x1d) [0x30ae2320cd]
/usr/lib64/libglobus_thread_pthread.so.0(+0x202b) [0x7fd16f76c02b]
/lib64/libpthread.so.0() [0x3aaf4079d1]
/lib64/libc.so.6(clone+0x6d) [0x3aaece8b6d]

From the control channel:
gridftp debug :: response from gsiftp://stormgf2.pi.infn.it:2811//gpfs/ddn/srm/cms/store/user/riahi/ZJetsToNuNu_PtZ-100_8TeV-madgraph/140505_082330_crab_20140505_102307/140505_082330/0000/out_124.root:
500-500 Command failed.^M^M
500- : globus_ftp_control_local_pasv failed.^M
500-globus_xio: globus_l_xio_tcp_contact_string failed.^M
500-globus_xio: globus_libc_addr_to_contact_string failed.^M
500-globus_common: globus_libc_gethostaddr failed^M
500 End.^M

gridftp debug :: fault on connection to gsiftp://stormgf2.pi.infn.it:2811//gpfs/ddn/srm/cms/store/user/riahi/ZJetsToNuNu_PtZ-100_8TeV-madgraph/140505_082330_crab_20140505_102307/140505_082330/0000/out_124.root: globus_ftp_client: the server responded with an error
Assertion client_handle->state == GLOBUS_FTP_CLIENT_HANDLE_SOURCE_LIST || client_handle->state == GLOBUS_FTP_CLIENT_HANDLE_SOURCE_RETR_OR_ERET || client_handle->state == GLOBUS_FTP_CLIENT_HANDLE_DEST_STOR_OR_ESTO || client_handle->state == GLOBUS_FTP_CLIENT_HANDLE_THIRD_PARTY_TRANSFER || client_handle->state == GLOBUS_FTP_CLIENT_HANDLE_THIRD_PARTY_TRANSFER_ONE_COMPLETE || (client_handle->op == GLOBUS_FTP_CLIENT_TRANSFER && client_handle->state == GLOBUS_FTP_CLIENT_HANDLE_SOURCE_SETUP_CONNECTION) || (client_handle->op == GLOBUS_FTP_CLIENT_TRANSFER && client_handle->state == GLOBUS_FTP_CLIENT_HANDLE_SOURCE_CONNECT) failed in file globus_ftp_client_state.c at line 4989
GridFTP Version:
rpm -qa | grep -i globus-
globus-gsi-cert-utils-8.6-2.el6.x86_64
globus-io-9.5-1.el6.x86_64
globus-gsi-credential-6.0-2.el6.x86_64
globus-xio-popen-driver-2.3-1.el6.x86_64
globus-gsi-proxy-ssl-4.1-10.el6.x86_64
globus-gsi-callback-4.6-2.el6.x86_64
globus-gssapi-gsi-10.10-2.el6.x86_64
globus-gass-transfer-7.2-9.el6.x86_64
globus-callout-2.4-2.el6.x86_64
globus-ftp-client-7.6-1.el6.x86_64
globus-gass-copy-8.6-7.el6.x86_64
globus-gsi-openssl-error-2.1-10.el6.x86_64
globus-gsi-sysconfig-5.3-8.el6.x86_64
globus-gsi-proxy-core-6.2-9.el6.x86_64
globus-gssapi-error-4.1-10.el6.x86_64
globus-xio-3.6-2.el6.x86_64
globus-gss-assist-9.0-1.el6.x86_64
globus-ftp-control-4.7-1.el6.x86_64
globus-common-14.10-2.el6.x86_64
globus-openssl-module-3.3-2.el6.x86_64
globus-xio-gsi-driver-2.4-1.el6.x86_64

Comments

Anonymous - 2014-05-27

More detailed info on versions used:

[root@fts105 ~]# uname -a
Linux fts105.cern.ch 2.6.32-431.1.2.el6.x86_64 #1 SMP Fri Dec 13 08:31:15 CET 2013 x86_64 x86_64 x86_64 GNU/Linux


globus version 5.2.5 from EPEL 6

[root@fts105 ~]# globus-version
5.2.5

Anonymous - 2014-05-27

How can I add attachments?

Right now attachments can be found here:
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105158

and here:
https://rt.ige-project.eu/rt/Ticket/Display.html?id=380

Eric Blau - 2014-08-15

I believe that I'm hitting this bug in the esg#esg-anl-gov endpoint, as seen in Globus support ticket 302710

https://globusonline.zendesk.com/agent/#/tickets/302710

Mattias Ellert - 2014-10-23

These are the attachments from the GGUS ticket.

Mattias Ellert - 2014-10-23

I promised to forward relevant parts of the discussion from the GGUS ticket to here.

Most of the relevant discussion (i.e. excluding entries related to the transfer of the ticket between different support units) is already here. The attachments from the original report were not here before, so I have added them. The only part of the discussion not yet here was the last part, where I said:

"The log ends with an assertion failure. The assertion checks for a number of allowed values for client_handle->state. Knowing what value client_handle->state has when the assertion fails would help."

The reply from the reporter was:

"I would safely assume it's "GLOBUS_FTP_CLIENT_HANDLE_DEST_SETUP_CONNECTION" since it failed when GridFTP client attempted to connect to the destination GridFTP server."

I cannot really judge how safe this assumption is.

Mattias Ellert - 2015-03-09

Feedback received on the GGUS ticket:
https://ggus.eu/index.php?mode=ticket_info&ticket_id=105158

"The ticket should indeed be closed, tested it and it seems to have fixed the issue reported."

Mike Link - 2015-03-12

Fixed in globus-ftp-client-8.20.

Globus Toolkit/UX-2333

Summary

Testing: Update test suite to Jasmine 2.0

Details

Type: Task

Status: Closed 2015-01-21

Description

This will be easier to do if we clean up test suites such that all waitsFor() calls (the main thing that has been removed) are consolidated in one place, which is covered by UX-2334. The general model for doing this was covered by UX-2298.

Comments

Dan Morgan - 2015-01-21

Because Ember

Globus Toolkit/GT-536

Summary

globus-job-manager segfaults from time to time

Details

Type: Task

Status: Open

Description

Hi,
I still have a globus-job-manager that segfaults from time to time.
I have
globus-gram-job-manager-13.53-1.3.osg32.el6.x86_64
installed via OSG version 3.2.4 on an EL6 machine
with a 12-core AMD 2435 and 32 GB of memory.

The segfaults are not as frequent as the ones reported in
http://jira.globus.org/browse/GT-520, but still they occur every week or so.

Recently, three different types of core dumps were created by globus-job-manager.
I have included the three backtraces below.
I tried to create an issue in the Globus JIRA, but I think it went to the wrong project:
http://jira.globus.org/browse/FACEIT-8#
Can you check them and see what you can do?

Thanks,
Bockjoo

gdb /usr/sbin/globus-job-manager core-globus-job-mana-11-723-5001-11833-1401323320

(gdb) bt
#0 0x0000003588a1dee8 in globus_l_common_path_error_instance (errmsg=0x3588a3950f "Can't resolve path") at globus_common_paths.c:75
#1 0x0000003588a1e0a0 in globus_eval_path (pathstring=, bufp=0x7fffb39e6038) at globus_common_paths.c:240
#2 0x0000003588a3099c in globus_i_thread_pre_activate () at globus_thread.c:166
#3 0x0000003588a28d8d in globus_l_module_initialize (module_descriptor=0x63e480, deactivate_cb=0, user_arg=0x0) at globus_module.c:871
#4 globus_module_activate_proxy_new (module_descriptor=0x63e480, deactivate_cb=0, user_arg=0x0) at globus_module.c:196
#5 0x00000000004091e1 in main (argc=7, argv=0x7fffb39e6668) at main.c:156

gdb /usr/sbin/globus-job-manager core-globus-job-mana-11-723-5001-30226-1401021087

(gdb) bt
#0 0x0000003582289fa4 in _wordcopy_fwd_dest_aligned () from /lib64/libc.so.6
#1 0x0000003582283b5e in memmove () from /lib64/libc.so.6
#2 0x0000000000417ef7 in globus_l_gram_job_manager_script_read (handle=, result=0, buffer=,
    len=, nbytes=, data_desc=, user_arg=0x3219cf0)
    at /usr/include/bits/string3.h:58
#3 0x000000358da162f8 in globus_l_xio_read_write_callback_kickout (user_arg=0x185b840) at globus_xio_handle.c:1224
#4 0x000000358da16890 in globus_i_xio_read_write_callback (op=0x185b840, result=0, nbytes=5778, user_arg=)
    at globus_xio_handle.c:1192
#5 0x000000358da1e46c in globus_l_xio_driver_op_read_kickout (user_arg=0x185b840) at globus_xio_driver.c:637
#6 0x000000358da2986d in globus_xio_driver_finished_read (in_op=0x185b840, result=0, nbytes=)
    at globus_xio_pass.c:1238
#7 0x00002b45e8499935 in ?? () from /usr/lib64/libglobus_xio_popen_driver.so.0
#8 0x000000358da2f21f in globus_l_xio_system_kickout (user_arg=0x48d0390) at globus_xio_system_select.c:920
#9 0x0000003588a18044 in globus_callback_space_poll_nothreads (timestop=0x3588a3b510, space=)
    at globus_callback_nothreads.c:1437
#10 0x0000003588a38ab0 in globus_l_thread_none_cond_wait (cv=0x7fff1eb7e110, mut=0x7fff1eb7e0e8) at globus_thread_none.c:371
#11 0x000000000040988b in main (argc=, argv=) at main.c:646



gdb /usr/sbin/globus-job-manager core-globus-job-mana-6-723-5001-26169-1401200405
(gdb) bt
#0 0x0000003582232925 in raise () from /lib64/libc.so.6
#1 0x0000003582234105 in abort () from /lib64/libc.so.6
#2 0x0000003582270837 in __libc_message () from /lib64/libc.so.6
#3 0x0000003582276166 in malloc_printerr () from /lib64/libc.so.6
#4 0x0000003582278c93 in _int_free () from /lib64/libc.so.6
#5 0x0000000000417eb4 in globus_l_gram_job_manager_script_read (handle=, result=0, buffer=,
    len=, nbytes=, data_desc=, user_arg=0xf2c8600)
    at globus_gram_job_manager_script.c:678
#6 0x000000358da162f8 in globus_l_xio_read_write_callback_kickout (user_arg=0xf764180) at globus_xio_handle.c:1224
#7 0x000000358da16890 in globus_i_xio_read_write_callback (op=0xf764180, result=0, nbytes=53350, user_arg=)
    at globus_xio_handle.c:1192
#8 0x000000358da1e46c in globus_l_xio_driver_op_read_kickout (user_arg=0xf764180) at globus_xio_driver.c:637
#9 0x000000358da2986d in globus_xio_driver_finished_read (in_op=0xf764180, result=0, nbytes=)
    at globus_xio_pass.c:1238
#10 0x00002b3bd9a10935 in ?? () from /usr/lib64/libglobus_xio_popen_driver.so.0
#11 0x000000358da2f21f in globus_l_xio_system_kickout (user_arg=0xf19ed10) at globus_xio_system_select.c:920
#12 0x0000003588a18044 in globus_callback_space_poll_nothreads (timestop=0x3588a3b510, space=)
    at globus_callback_nothreads.c:1437
#13 0x0000003588a38ab0 in globus_l_thread_none_cond_wait (cv=0x7fff6ef418d0, mut=0x7fff6ef418a8) at globus_thread_none.c:371
#14 0x000000000040988b in main (argc=, argv=) at main.c:646

Comments

Joe Bester - 2014-06-05

Can you share the core files?

Joe Bester - 2014-06-06

I spent most of today trying to get ideas from those. The core core-globus-job-mana-11-723-5001-11833-1401323320 seems to be a memory allocation failing extremely early in the run of the process (before the program really does anything). Is there some low memory situation occurring or some tight limits for the GRAM process?

I couldn't get gdb to parse core-globus-job-mana-6-723-5001-26169-1401200405 at all (just a bunch of ?? lines) so I can't see anything there.

The other one seems to show some memory corruption, with the stack showing problems in the script response parsing. Poking around a bit, it looked like globus-url-copy was returning multiple-line errors, which may be confusing the parser, but I'm not sure, as the heap is pretty messed up by the time the core was generated and not much of the data values seemed legitimate.

Anonymous - 2014-06-06

Below is what appeared on the terminal at the time of the issue.
So, your memory comment makes sense.
I did not have the chance to check what caused the memory exhaustion, though.
Bockjoo

> [root@osg ~]#
> Message from syslogd@osg at May 28 20:34:19 ...
>  t of memory [14579]
>
> Message from syslogd@osg at May 28 20:34:19 ...
>  t of memory [14394]
>
> Message from syslogd@osg at May 28 20:34:19 ...
>  t of memory [14452]
>
> Message from syslogd@osg at May 28 20:34:19 ...
>  t of memory [14594]
>
> Message from syslogd@osg at May 28 20:34:19 ...
>  t of memory [14635]
>
> Message from syslogd@osg at May 28 20:34:19 ...
>  t of memory [14563]
>
> Message from syslogd@osg at May 28 20:34:19 ...
>  t of memory [14653]
>

Joe Bester - 2014-06-09

Can you run rpm -V globus-gram-job-manager-scripts so I can see if there are any local modifications to the GRAM script? If there are any files that indicate changes from that, could you send me them so I can see if they might be contributing to this problem?

Tim Cartwright - 2014-06-09

OSG applies three patches to the globus-gram-job-manager-scripts package. They are all in JIRA (GT-463, GT-467, GT-468), waiting to be incorporated into a future GT release, but for reference, I will attach the latest patches. You can also browse our Koji build system, which might provide a little extra information about the patches and build; the globus-gram-job-manager-scripts source package is here:

https://koji-hub.batlab.org/koji/packageinfo?packageID=37

I will attach our modified spec file as well, so that you can see how the patches are applied, etc.

Let me know if there is anything else that I can provide or help with!

Anonymous - 2014-06-09

 rpm -V globus-gram-job-manager-scripts
S.5....T.    /usr/share/perl5/vendor_perl/Globus/GRAM/JobManager.pm

Yes, I have a lot of modifications in
/usr/share/perl5/vendor_perl/Globus/GRAM/JobManager.pm
to debug the stageout issue:
http://oo.ihepa.ufl.edu:8080/t2/operations/glidein/stageout.failure.txt

The modified one is uploaded here:
http://oo.ihepa.ufl.edu:8080/t2/operations/glidein/JobManager.pm

Thanks,
Bockjoo

Anonymous - 2014-06-09

One of the motivations for the modification was the empty stdout/stderr.
In the pm, if any output file fails to be staged out, JobManager.pm
reports the error condition without sending the stdout/stderr.
One such case is a job hitting the walltime limit, where stdout/stderr sometimes
(not always) becomes empty.

Joe Bester - 2014-06-13

I notice that the output from globus-url-copy -v is being returned to the script processor without any escaping. That might be causing the problem, as it can contain newlines and throw off the parser. If you pass that through the log() method in the JobManager object it might help things be more stable.
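
To illustrate why an unescaped newline can throw off a line-oriented parser, here is a small Python sketch. The "KEY: value" format, the field names, and both functions are hypothetical stand-ins and do not reproduce the actual GRAM script response format.

```python
def parse_response(text):
    """Parse hypothetical 'KEY: value' lines into a dict (illustrative format only)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


def escape(value):
    """Escape backslashes and newlines so a value stays on one line."""
    return value.replace("\\", "\\\\").replace("\n", "\\n")


stderr = "error: transfer failed\nSTATUS: done"   # multi-line tool output

raw = parse_response("MESSAGE: " + stderr)
safe = parse_response("MESSAGE: " + escape(stderr))

print(sorted(raw))    # ['MESSAGE', 'STATUS']  (second line was read as a field)
print(sorted(safe))   # ['MESSAGE']
```

Without escaping, the second line of the tool output is misread as a separate field; with escaping, the whole message stays in one field.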

Anonymous - 2014-06-17

Matyas Selmeci at Wisconsin wrote the patch for me.
The working patch is
--- JobManager.pm       2014/06/16 21:25:13     1.1
+++ JobManager.pm       2014/06/16 21:34:24     1.2
@@ -132,6 +132,15 @@
     return undef;
 }

+sub quotestr
+{
+    my $str = $_[0];
+    $str =~ s/\\/\\\\/g;
+    $str =~ s/\n/\\n/g;
+    $str =~ s/\"/\\\"/g;
+    return "\"$str\"";
+}
+
 =item $manager->log($string)

 Log a message to the job manager log file. The message is preceded by
@@ -1013,7 +1022,7 @@

             $self->respond( {
                 'GT3_FAILURE_TYPE' => 'filestageout',
-                'GT3_FAILURE_MESSAGE' => $stderr,
+                'GT3_FAILURE_MESSAGE' => quotestr($stderr),
                 'GT3_FAILURE_SOURCE' => $local,
                 'GT3_FAILURE_DESTINATION' => $remote
             });
and is applied to
http://oo.ihepa.ufl.edu:8080/t2/operations/glidein/JobManager.pm.0.8.5
Bockjoo Kim at U of Florida

Globus Toolkit/GT-537

Summary

GCS uses multiuser in config settings and doc

Details

Type: Improvement

Status: Open

Description

Globus Connect Server config file includes references to globus-connect-multiuser in the comments and in the settings (e.g. some default paths like /var/lib/globus-connect-multiuser/grid-security/hostcert.pem). Any references to multiuser should be removed/changed to server.

Comments

Globus Toolkit/GT-538

Summary

Improve checking of FQDNs in globus-connect-server-setup

Details

Type: Task

Status: Open

Description

There have been many support tickets for GCS where the FQDN on either the MyProxy server or GridFTP server has not been set to a real FQDN.

I would suggest a resolution check using a globally available DNS server such as Google's 8.8.8.8 to check that a resolvable FQDN has been provided.

The downside would be if the hardcoded DNS server was not available, but given the alternative of a support ticket, I'll take my chances with Google.
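
A rough Python sketch of such a check, under stated assumptions: it uses the system resolver through `socket.getaddrinfo` rather than querying 8.8.8.8 directly (the standard library cannot select a DNS server), and it adds a syntactic test that rejects single-label names and bare IP addresses. The function names are hypothetical and are not part of globus-connect-server-setup.

```python
import re
import socket

# One DNS label: 1-63 letters, digits, or hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def looks_like_fqdn(name):
    """Syntactic check: at least two valid labels and not a bare IP address."""
    name = name.rstrip(".")
    labels = name.split(".")
    if len(name) > 253 or len(labels) < 2:
        return False
    if labels[-1].isdigit():          # e.g. "10.0.0.1" is an address, not a name
        return False
    return all(_LABEL.match(label) for label in labels)


def fqdn_resolves(name, resolve=socket.getaddrinfo):
    """Return True if name is FQDN-shaped and the resolver returns an address."""
    if not looks_like_fqdn(name):
        return False
    try:
        return bool(resolve(name, None))
    except OSError:                   # socket.gaierror is a subclass of OSError
        return False


print(looks_like_fqdn("localhost"))            # False
print(looks_like_fqdn("gridftp.example.org"))  # True
```

The `resolve` parameter exists so that a different lookup function, for example one that targets a specific public DNS server, can be substituted without changing the callers.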

Comments

Globus Toolkit/GT-539

Summary

MyProxy Server Startup Not Configured Properly on Debian 7

Details

Type: Bug

Status: Resolved 2014-07-03

Description

Installed the following deb: http://www.globus.org/ftppub/gt5/5.2/stable/installers/repo/globus-repository-5.2-stable-wheezy_0.0.3_all.deb

Installed GCS with default options changing only endpoint name.

After install, I was able to authenticate via MyProxy and list a directory via the Globus CLI.

I then rebooted the machine. The GridFTP server restarts properly but the MyProxy server does not. I don't see any entries for MyProxy in /etc/rc*.d.

If I try to manually start it I get an error as shown below:

service myproxy-server start
[ ok ] Started myproxy-server.
/etc/init.d/myproxy-server: 62: /etc/init.d/myproxy-server: cannot create /var/run/myproxy/myproxy.pid: Directory nonexistent
root@ip-10-151-50-144:/home/admin# /etc/init.d/myproxy-server status
[FAIL] myproxy-server is not running ... failed!
root@ip-10-151-50-144:/home/admin# /etc/init.d/myproxy-server start
[warn] myproxy-server already running ... (warning).
root@ip-10-151-50-144:/home/admin# /etc/init.d/myproxy-server status
[FAIL] myproxy-server is not running ... failed!

Comments

Jack Kordas - 2014-06-10

Related ticket which I'm going to reassign to Joe as well.

https://globusonline.zendesk.com/agent/#/tickets/302426

Joe Bester - 2014-06-12

I've committed changes to git to include default runlevels for the myproxy-server service. I'm in the process of building new binaries, but am also rebuilding to add Ubuntu 14.04, so it may be a few days before it is updated.

Globus Toolkit/GT-540

Summary

gsi_openssh install problems with alpha2

Details

Type: Bug

Status: Resolved 2014-09-11

Description

The gsi_openssh package in the alpha2 source installer seems to have an issue where it attempts to use install -s on a libtool wrapper script, instead of using libtool --mode=install install -s, so it ends up copying the wrong file and printing an error about the script not being a binary executable. This was reported by Venkat at NCSA.

Comments

Globus Toolkit/GT-541

Summary

gsi_openssh error message during build from source installer in 6.0 alpha 2

Details

Type: Bug

Status: Open

Description

The 6.0 alpha2 install prints out "Error: More #endif's than #if's found." during make all. This appears to be generated by doxygen during the build, but there are no doxygen docs in gsi_openssh, so it can be removed from the list of directories to process.

This was reported to me by Venkat at NCSA.

Comments

Globus Toolkit/GT-542

Summary

gsi_openssh program links (gsissh → ssh.d/ssh, gsiscp → ssh.d/scp, etc) missing in 6.0 alpha 2

Details

Type: Bug

Status: Resolved 2014-09-11

Description

The gsi-openssh in the source installer for 6.0 alpha2 is not creating the gsi-prefixed name links (gsissh, gsiscp, etc). This appears to be because the gpt-install rule isn't invoked by the default install rule. This should be done when configured with GSI.

Comments

Globus Toolkit/GT-543

Summary

Add builds for openssl for use in GCP

Details

Type: Task

Status: Resolved 2014-08-25

Description

Need to include openssl binaries with Linux and Windows GT binary tarballs

Linux
-Make a separate tarball of GT binaries for GCP to use
-Add system openssl libs to tarball
-Make sure build distro gets updated before build, to ensure latest openssl

Mac
-Drop support for 10.5, which means we can rely on system openssl and not provide our own.

Win
-Need to be able to build OpenSSL by hand in order to respond to vulnerabilities in a timely manner
--Current release is based on Fedora 19's 1.0.1e+patches
--Any reason to stick with that vs a std openssl version?

Comments

Globus Toolkit/GT-544

Summary

Number of globus-job-manager processes becomes very high once in a while

Details

Type: Task

Status: Open

Description

Hi,
I am seeing a very large number of globus-job-manager processes and globus-gatekeeper
processes once in a while.
I have uploaded the process lists, top command output, and output of free:
http://oo.ihepa.ufl.edu:8080/t2/operations/gjm_too_many
Can you check and see if there is anything obvious?
Or please let me know if you need other system info.
Thanks,
Bockjoo Kim at U of Florida

Comments

Anonymous - 2014-06-23

So, I stopped the gatekeeper, killed all the gatekeeper procs under the user cmspilot
and the globus-job-manager processes that became orphaned,
and started the gatekeeper.
And an hour later, I eventually get the segfaults:
http://oo.ihepa.ufl.edu:8080/t2/operations/cores-GT544.tar.gz
After getting the segfault, the gatekeeper does not seem to accept any more jobs.
Just now (6:25 pm EDT), I have stopped the gatekeeper again and executed
killall globus-job-manager and restarted the gatekeeper.
This lets jobs run, now.
Bockjoo

Globus Toolkit/GT-545

Summary

gridftp server can hang during exit() from globus_l_gfs_bad_signal_handler

Details

Type: Task

Status: Open

Description

Hello,

(using linux, slc6, 64 bit)

We had a problem with our gridftp server plugin with gt 5.2.5 (probably unrelated to the Globus Toolkit) which caused a SIGSEGV. However, the gridftp process hung, apparently due to a weakness in globus_l_gfs_bad_signal_handler. Traces which may be helpful:

Thread 3 (Thread 0x7fa750536700 (LWP 32734)):
#0  0x0000003f2b8f82be in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x0000003f2b87d0b0 in _L_lock_5195 () from /lib64/libc.so.6
#2  0x0000003f2b878a0b in _int_free () from /lib64/libc.so.6
#3  0x0000003f2b835e55 in exit () from /lib64/libc.so.6
#4  0x00000033072422e1 in ?? () from /usr/lib64/libglobus_gridftp_server.so.6
#5  0x0000003f30c1a1bc in ?? () from /usr/lib64/libglobus_common.so.0
#6  0x0000003f30c320cd in globus_l_thread_pool_thread_start () from /usr/lib64/libglobus_common.so.0
#7  0x00007fa75053902b in ?? () from /usr/lib64/libglobus_thread_pthread.so.0
#8  0x0000003f2bc079d1 in start_thread () from /lib64/libpthread.so.0
#9  0x0000003f2b8e8b5d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fa737fff700 (LWP 32735)):
#0  0x0000003f2b8f82be in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x0000003f2b87d0b0 in _L_lock_5195 () from /lib64/libc.so.6
#2  0x0000003f2b878a0b in _int_free () from /lib64/libc.so.6
#3  0x0000003f2b835e55 in exit () from /lib64/libc.so.6
#4  0x0000000000403009 in ?? ()
#5  <signal handler called>
#6  0x0000003f2b876272 in malloc_consolidate () from /lib64/libc.so.6
#7  0x0000003f2b878c38 in _int_free () from /lib64/libc.so.6
#8  0x00007fa73d66bf21 in free_root () from /usr/lib64/mysql/libmysqlclient.so.16
#9  0x00007fa73d68f36c in free_old_query () from /usr/lib64/mysql/libmysqlclient.so.16
#10 0x00007fa73d68fbbd in mysql_close () from /usr/lib64/mysql/libmysqlclient.so.16
#11 0x00007fa73d9f50aa in _M_set_node (this=0x7fa74c7aafd0, __vtt_parm=0x7fa73dc19288, __in_chrg=) at /usr/include/c++/4.4.7/bits/stl_deque.h:224
#12 _M_pop_front_aux (this=0x7fa74c7aafd0, __vtt_parm=0x7fa73dc19288, __in_chrg=) at /usr/include/c++/4.4.7/bits/deque.tcc:446
#13 pop_front (this=0x7fa74c7aafd0, __vtt_parm=0x7fa73dc19288, __in_chrg=) at /usr/include/c++/4.4.7/bits/stl_deque.h:1241
#14 ~PoolContainer (this=0x7fa74c7aafd0, __vtt_parm=0x7fa73dc19288, __in_chrg=) at /usr/src/debug/dmlite-0.7.0/include/dmlite/cpp/utils/poolcontainer.h:54
#15 dmlite::NsMySqlFactory::~NsMySqlFactory (this=0x7fa74c7aafd0, __vtt_parm=0x7fa73dc19288, __in_chrg=) at /usr/src/debug/dmlite-0.7.0/src/plugins/mysql/MySqlFactories.cpp:129
#16 0x000000000234d000 in ?? ()
#17 0x000000000235ca73 in ?? ()
#18 0x000000000234cff9 in ?? ()

A problem in thread 2 triggers the signal handler. The handler, globus_l_gfs_bad_signal_handler, calls exit(), but exit() is considered an "unsafe" function for a signal handler (_exit() may be an alternative). The thread hangs because exit() triggers dynamic memory operations while the malloc system is still in the intermediate state it was in when the signal occurred.

(Thread 3 is the globus_l_gfs_control_watchdog_exit(), but it also hangs in exit()).

Comments

Globus Toolkit/GT-546

Summary

HTTP transfers larger than 4GB fail

Details

Type: Bug

Status: Open

Description

The xio http driver reads and writes the content-length header into a type that doesn't support large integers.
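
A small Python illustration of the failure mode, assuming the type in question is a 32-bit unsigned integer; the ticket does not name the actual type. `ctypes` is used only to mimic fixed-width C integers.

```python
import ctypes

FIVE_GIB = 5 * 1024**3  # a Content-Length just over the 4 GiB boundary

# ctypes integer types do no overflow checking; the value wraps modulo 2**32.
as_u32 = ctypes.c_uint32(FIVE_GIB).value
as_u64 = ctypes.c_uint64(FIVE_GIB).value

print(as_u32)  # 1073741824 (1 GiB: the upper bits were lost)
print(as_u64)  # 5368709120
```

Under that assumption, a 64-bit type such as C's `uint64_t` or `off_t` with large-file support would hold the full value.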

Comments

Globus Toolkit/GT-547

Summary

GridFTP-HDFS transfers corrupt order of data blocks (pthreads on, single-stream)

Details

Type: Bug

Status: Open

Description

With GridFTP-HDFS, enabling pthreads (by setting GLOBUS_THREAD_MODEL="pthread") seems to cause data corruption for *single stream* transfers that span multiple HDFS blocks. In particular, we believe that the order of blocks can be scrambled in the destination file. This issue has affected a few OSG sites, such as GLOW here in Wisconsin, that have all of the relevant components.

Carl Edquist (edquist@cs.wisc.edu) is the OSG Software team member who has done most of the debugging on our end, and who will handle questions, etc. He does not have an account here, so I can relay comments as needed. If you want to provide a machine, Carl would be happy to try to set up a failing scenario to play with.

*Details*

Transfers that use parallelism (we tried from 2-10 streams), and single stream transfers that only span a single HDFS block seem to be fine.

Transfers using a single stream but spanning multiple (3 or more) HDFS blocks result in the correct size, but usually the wrong checksum at the destination. This was reported to us by a remote user transferring via srm-copy, and all our testing had the same problems using local globus-url-copy tools.

Looking at the actual differences in the corrupted files, it appears that blocks sporadically get mis-ordered in the destination file. For example, with an HDFS blocksize of 1M, we can take a corrupted output file and split it into 1M blocks, and we can see that the checksum for each block corresponds to a 1M block from the original input file, except that occasionally pairs of adjacent blocks are swapped in the output file.
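
The block-by-block comparison described above can be sketched in Python as follows. This is an illustration, not the script used in the investigation; the function names are hypothetical, SHA-256 stands in for whatever checksum was actually used, and the example works on tiny in-memory buffers instead of 1M blocks of real files.

```python
import hashlib


def block_digests(data, block_size):
    """Return the SHA-256 hex digest of each block_size chunk of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def find_swapped_pairs(src, dst, block_size):
    """Return indices i where blocks i and i+1 of dst are blocks i+1 and i of src."""
    a = block_digests(src, block_size)
    b = block_digests(dst, block_size)
    swapped = []
    for i in range(min(len(a), len(b)) - 1):
        if a[i] != b[i] and a[i] == b[i + 1] and a[i + 1] == b[i]:
            swapped.append(i)
    return swapped


# Tiny in-memory illustration: four 4-byte blocks, with blocks 1 and 2 swapped.
source = b"AAAABBBBCCCCDDDD"
corrupt = b"AAAACCCCBBBBDDDD"
print(find_swapped_pairs(source, corrupt, 4))  # [1]
```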

We had enabled pthreads by adding the line

{noformat}
$GLOBUS_THREAD_MODEL pthread
{noformat}

to /etc/gridftp.d/gridftp-hdfs.conf. Turning off pthreads (or rather, not enabling it) by removing this line gives no corruption in all cases.

We created a patch to attempt to fix this based on a suggestion from Brian Bockelman (attached). However, the problem persisted and the failures still happen consistently.

We used the following gridftp-hdfs build (which includes the patch) in our testing, which still has the failures:

https://koji-hub.batlab.org/koji/buildinfo?buildID=5322

A recipe for reproducing the failures on a system running this gridftp-hdfs + globus-gridftp-server, with example output:

{noformat}
$ dd if=/dev/urandom of=/tmp/random.in bs=1M count=2K
$ globus-url-copy gsiftp://$HOSTNAME:2811/tmp/random.in file:///tmp/random.out
$ md5sum /tmp/random.{in,out}
6566088eb4686738b9d27d1f8363fba2  /tmp/random.in
b7a7d94625da8a4b9c18444e4136cef2  /tmp/random.out
{noformat}

For history, see the original OSG GOC and JIRA tickets:

https://ticket.grid.iu.edu/21157
https://ticket.grid.iu.edu/21825
https://jira.opensciencegrid.org/browse/SOFTWARE-1495

---
1495-pthread-mutex.patch:
{noformat}
diff -ur gridftp-hdfs-0.5.4.orig/src/gridftp_hdfs_send.c gridftp-hdfs-0.5.4/src/gridftp_hdfs_send.c
--- gridftp-hdfs-0.5.4.orig/src/gridftp_hdfs_send.c     2012-06-15 07:57:50.000000000 -0500
+++ gridftp-hdfs-0.5.4/src/gridftp_hdfs_send.c  2014-06-19 13:55:22.760210044 -0500
@@ -313,7 +313,9 @@
     remaining_read = read_length;
     cur_offset = offset;
     while (remaining_read != 0) {
+       globus_mutex_lock(hdfs_handle->mutex);
        nbytes = hdfsPread(hdfs_handle->fs, hdfs_handle->fd, cur_offset, cur_buffer_pos, remaining_read);
+       globus_mutex_unlock(hdfs_handle->mutex);
        if (nbytes == 0) {    /* eof */
            // No error
            globus_gfs_log_message(GLOBUS_GFS_LOG_DUMP, "hdfs_perform_read_cb EOF.\n");
{noformat}

Comments

Mike Link - 2014-08-20

I believe the issue is due to out of order globus_gridftp_server_register_write() calls.  In stream mode these are required to be in order, as there is no offset information transmitted with the data as in mode E/parallel streams.  This also means the problem should be avoidable by using single-stream mode E (i.e. globus-url-copy -p 1).

Locking around the entire hdfsPread/gridftp_write pair might help, but a better solution might be to wrap the globus_gridftp_server_register_write() call with a function that checks if the offset is in order, and if not, saves the buffer to a queue.  After an in-order write, the func would check the queue to see if it contains a buffer that is next in line. In this case, that queue should never contain more than a block or two so memory isn't an issue.
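The queueing wrapper suggested above might look roughly like this. It is a minimal Python sketch of the idea only: the real DSI is written in C and is callback-driven, and the `write` callable here is a hypothetical stand-in for the globus_gridftp_server_register_write() call, not its actual signature.

```python
class InOrderWriter:
    """Forward buffers to `write` strictly in offset order, holding back
    any buffer that arrives before its predecessor."""

    def __init__(self, write, start_offset=0):
        self._write = write              # hypothetical stand-in for the real write call
        self._next_offset = start_offset
        self._pending = {}               # offset -> buffer waiting for its turn

    def submit(self, offset, data):
        if offset != self._next_offset:
            # Out of order: save the buffer until the gap before it is filled.
            self._pending[offset] = data
            return
        self._write(offset, data)
        self._next_offset += len(data)
        # After an in-order write, drain any queued buffers that are next in line.
        while self._next_offset in self._pending:
            queued = self._pending.pop(self._next_offset)
            self._write(self._next_offset, queued)
            self._next_offset += len(queued)

    def pending_count(self):
        return len(self._pending)
```

With reads completing only a block or two out of order, as described in this ticket, `pending_count()` should stay very small, which matches the expectation that memory is not an issue.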

I'm not sure what I can do on the server side about this.  I consider it a design flaw in the DSI interface, since out of order writes may be useful and they are essentially disallowed.  Even though they are valid in mode E, the DSI can't learn what the current mode is (and that info is out of scope even if I added it).

Tim Cartwright - 2014-09-08

(From Carl Edquist, 4 September 2014:)

I was looking over your comment on GT-547, and I noticed something:

{quote}
I believe the issue is due to out of order globus_gridftp_server_register_write() calls. In stream mode these are required to be in order, as there is no offset information transmitted with the data as in mode E/parallel streams. This also means the problem should be avoidable by using single-stream mode E (i.e. globus-url-copy -p 1).
{quote}

Not sure if this was clear in our report, but we observed the problem with globus-url-copy only when no -p/parallelism argument was specified, and it was avoided by using -p N, with N > 1.

I see now that it is also avoided with -p 1, as you mentioned, but -p 0 seems to be equivalent to no -p argument, and results in the out-of-sequence blocks.

Also, from what I can tell -fast and -pipeline each turn on -p 1 implicitly, and in my tests they also avoid the problem.

Apparently there is an important difference here between -p 0 (no parallelism) and -p 1 ... Is the difference just to turn on mode E, as you mentioned?

If so, is there any way to force using mode E from the server side?

If there is not a good server-side solution, I'm wondering if we should make the globus-url-copy default -p 1 instead of 0

-    guc_info->num_streams = 0;
+    guc_info->num_streams = 1;

in globus_url_copy.c in globus_gass_copy ... Or is there any reason this is a bad idea?

Globus Toolkit/GT-548

Summary

GridFTP GUI problem

Details

Type: Bug

Status: Open

Description

Some months ago I used GridFTP GUI, but recently, the Java web start application does not work.
On page: http://toolkit.globus.org/toolkit/docs/4.0/data/gridftp/GridFTP_Public_Interfaces.html#id2542892, the link: http://www-unix.globus.org/cog/demo/ogce/ftp.jnlp is unavailable.
Regards, Akos Hajnal

Comments

Globus Toolkit/GT-549

Summary

HTTP upload where the file size is shorter than the requested write

Details

Type: Bug

Status: Open

Description

When performing an HTTP upload with a file that is shorter than the requested length, the HTTP connection sits idle and no error is reported unless the connection drops due to an idle timeout (as it usually will).

Comments

Mike Link - 2014-08-20

Looks like a failure in setting END_OF_ENTITY on the http handle, and then the response read attempt hangs until the connection fails.

Globus Toolkit/GT-550

Summary

GridFTP pipelining and throughput plugin

Details

Type: User Story

Status: Open

Description

Hi,

I am trying to use the pipelining feature in the GridFTP client library, and so far so good, I got it working.

However, I would like to get the performance for each pair, but I don't seem to be able to do so.

I install the throughput plugin, and I do get the begin callback entered once per pair, with source and destination properly set up, but as far as I can see, the other callbacks (throughput and end of transfer) are only called for the last pair.

Is it possible to get the throughput per transfer when pipelining?

Regards.

Comments

Globus Toolkit/GT-551

Summary

Pipelining doesn’t work with delayed passive move

Details

Type: Bug

Status: Open

Description

Hi,

When pipelining is used, and at the same time delayed passive is enabled, usually the transfer "fails" straight away with "125 Beginning transfer", which doesn't really look like an error.

Is this expected?

Regards.

Comments

Globus Toolkit/GT-552

Summary

Terminate with error if using UDT and no threading

Details

Type: Improvement

Status: Open

Description

[From Brian Bockelman. Please include him in email about this ticket.]

If UDT is enabled but the Globus threading mode is set to "none", then the GridFTP transfer will deadlock 100% of the time. This is a known issue (and doesn't sound like it is easy to fix).

Can the UDT module query the threading mode and simply throw an error if it is configured in this way? This provides a much better feedback mechanism to sysadmins.

Comments

Globus Toolkit/GT-553

Summary

Add supported platforms to downloads page

Details

Type: Task

Status: Resolved 2014-09-16

Description

Incorporate the information from platforms.xml into the download page at http://toolkit.globus.org/toolkit/downloads/6.0/

Comments

Globus Toolkit/GT-554

Summary

gridftp.conf can not be overridden by gridftp.d/ conf entries

Details

Type: Improvement

Status: Open

Description

Any config set in the -c gridftp config file can not be overridden with entries from the -C gridftp.d config entries.  This means default config entries that we may add to gridftp.conf, like port or log settings, can't be overridden by GCS or other uses of the gridftp.d entries.

Comments

Globus Toolkit/GT-555

Summary

gridftp daemon not reporting Globus version for 6.0

Details

Type: Bug

Status: Resolved 2015-03-12

Description

Small minor "bug-a-boo" with gridftp daemon not reporting Globus version for 6.0

Telnet to port running Globus Toolkit 6.0

login5.stampede(2)$ echo exit | sleep 5 | telnet gridftp.stampede.tacc.xsede.org 2811
Trying 129.114.62.20...
Connected to gridftp.stampede.tacc.xsede.org.
Escape character is '^]'.
220 data2.stampede.tacc.utexas.edu GridFTP Server 7.11 (gcc64, 1409949382-85) [unknown] ready.
Connection closed by foreign host.

Comments

Mike Link - 2015-03-12

This is fixed in globus-gridftp-server 7.17.

Globus Toolkit/GT-556

Summary

GT 6.0 globus-makefile-header returns bad GLOBUS_CFLAGS on MacOS 10.9.4

Details

Type: Bug

Status: Resolved 2014-09-19

Description

On MacOS 10.9.4 a source install (globus_toolkit-6.0.tar.gz) gives me good output from globus-makefile-header:

{quote}
$ $GLOBUS_LOCATION/bin/globus-makefile-header -flavor gcc64dbg globus_gss_assist globus_usage | grep GLOBUS_CFLAGS
GLOBUS_CFLAGS = -g -O2 -pthread
{quote}

However, a binary install (globus_toolkit-6.0.pkg or
Globus-6.0-build121.tar.gz) gives output that causes compiler errors:

{quote}
$ $GLOBUS_LOCATION/bin/globus-makefile-header -flavor gcc64dbg globus_gss_assist globus_usage | grep GLOBUS_CFLAGS
GLOBUS_CFLAGS = -mmacosx-version-min=10.6 -arch i386 x86_64 -Wno-deprecated-declarations -pthread
$ gcc -mmacosx-version-min=10.6 -arch i386 x86_64 -Wno-deprecated-declarations -pthread hello.c
clang: error: no such file or directory: 'x86_64'
{quote}

This is causing problems for the GSI-OpenSSH configure script which relies on globus-makefile-header.

Comments

Joe Bester - 2014-09-19

I'm pretty sure this is a bug in older versions of pkg-config which incorrectly handle compiler command-line options (turning "-arch i386 -arch x86_64" into the "-arch i386 x86_64" you see).
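To illustrate the failure mode described above: if duplicate removal is applied token by token, the second "-arch" is dropped and its argument is left dangling. The Python sketch below only demonstrates that effect and a pair-aware alternative; it is not pkg-config's actual code, and both function names are hypothetical.

```python
def dedupe_tokens(flags):
    """Naive de-duplication: drop any token that was already seen."""
    seen = set()
    kept = []
    for token in flags.split():
        if token not in seen:
            seen.add(token)
            kept.append(token)
    return " ".join(kept)


def dedupe_keeping_pairs(flags, takes_argument=("-arch",)):
    """De-duplicate while treating an option and its argument as one unit."""
    tokens = flags.split()
    seen = set()
    kept = []
    i = 0
    while i < len(tokens):
        if tokens[i] in takes_argument and i + 1 < len(tokens):
            unit = (tokens[i], tokens[i + 1])
            i += 2
        else:
            unit = (tokens[i],)
            i += 1
        if unit not in seen:
            seen.add(unit)
            kept.extend(unit)
    return " ".join(kept)
```

Applied to "-arch i386 -arch x86_64", the naive version yields the broken "-arch i386 x86_64" seen in this ticket, while the pair-aware version leaves the flags intact.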

This version fixes that problem: http://pkgconfig.freedesktop.org/releases/pkg-config-0.28.tar.gz

Jim Basney - 2014-09-19

Upgrading to pkg-config 0.28 fixed it. Thanks!

Globus Toolkit/GT-557

Summary

Improve gridmap failure error string

Details

Type: Improvement

Status: Open

Description

Rachana hit this error string when attempting to use the gridmap fallback behavior in the callout from globus_gridmap_verify_myproxy_callout:

Command Failed: Error (login) Server: ranantha#mdftest (mdfdemo.ncsa.illinois.edu:2811) Message: Login Failed --- 530-Login incorrect. : globus_gss_assist: Error invoking callout 530-globus_callout_module: The callout returned an error 530-globus_gridmap_callout_error: Gridmap lookup failure: Could not map /C=US/O=Globus Consortium/OU=Globus Connect User/CN=ranantha 530- 530 End.

There was some confusion because the entry was added to a gridmap file, but it wasn't in the correct place. It would be helpful if the error message included the path to the gridmap file that was read.

Comments

Globus Toolkit/GT-558

Summary

Access denied with symlinked home dir and restricted paths

Details

Type: Task

Status: Open

Description

GT-485, which fixed the problem generally, added its own bug that causes access to the home dir to be denied when the realpath of the home dir is longer than any other path in the restricted path list.

Comments

Globus Toolkit/GT-559

Summary

Bundle targets documented in admin guide don’t exist in makefile

Details

Type: Documentation

Status: Resolved 2014-10-01

Description

4) The source installer will build all of the globus toolkit packages in the default make
   rule. The same [package groups] as the native packages may be used to build and
   install a subset of the toolkit.

Run

globus@elephant% make PACKAGE-GROUPS


These admin guide instructions, suggesting that the PACKAGE-GROUPS there can be
replaced by the packages listed above the source build section, are a little wide of the mark.

On the 5.2.5 build, yes there are targets for

globus-data-management-server:
globus-data-management-client:
globus-data-management-sdk:
globus-resource-management-server:
globus-resource-management-client:
globus-resource-management-sdk:

and

globus-gsi

from the package list, but not the first two that get mentioned in the docs

These packages are:

globus-gridftp   GridFTP client and server tools
globus-gram5   GRAM5 client and server tools
globus-gsi ...


are no longer targets in the Makefile.

$ make globus-data-management-server
make: *** No rule to make target 'globus-data-management-server'.  Stop.
$ make globus-data-management-client
make: *** No rule to make target 'globus-data-management-client'.  Stop.
$ make globus-data-management-sdk
make: *** No rule to make target 'globus-data-management-sdk'.  Stop.
$ make globus-resource-management-server
make: *** No rule to make target 'globus-resource-management-server'.  Stop.
$ make globus-resource-management-client
make: *** No rule to make target 'globus-resource-management-client'.  Stop.
$ make globus-resource-management-sdk
make: *** No rule to make target 'globus-resource-management-sdk'.  Stop.
$ make globus-gsi
make: *** No rule to make target 'globus-gsi'.  Stop.
$

I am sure that this will be a case of the docs not keeping up with the
code (or rather, the Makefile), but I thought I might as well flag it up
whilst I was "experiencing" it first hand.

Comments

Joe Bester - 2014-10-01

I've updated the documentation to reflect the targets in the makefile.

Globus Toolkit/GT-560

Summary

Verify sharing certs in gridmap_verify_myproxy_callout

Details

Type: Bug

Status: Open

Description

Verify sharing certs in gridmap_verify_myproxy_callout.

Comments

Globus Toolkit/GT-561

Summary

Gridftp 5.2.5 (with patch globus_ftp_control 4.8.3) and 6.0 still log DEST=[0.0.0.0] after "Requesting abort..."; otherwise DEST=[0.0.0.0] is not seen

Details

Type: Bug

Status: Open

Description

Gridftp 5.2.5 (with patch globus_ftp_control 4.8.3) and 6.0 still log DEST=[0.0.0.0] after "Requesting abort...";
otherwise DEST=[0.0.0.0] is not seen.

Log from patched gridftp 5.2.5
[3009] Mon Sep 29 17:23:58 2014 :: Requesting abort...
[3009] Mon Sep 29 17:23:58 2014 :: Transfer stats: DATE=20140929222358.304696 HOST=data2.stampede.tacc.utexas.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140929222358.261524 USER=tg803251 FILE=/home1/01062/tg803251/ToTyler/state200:7/min4-sim.dcd BUFFER=0 BLOCK=262144 NBYTES=0 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[0.0.0.0] TYPE=STOR CODE=226
[3656] Mon Sep 29 22:27:54 2014 :: Requesting abort...
[3667] Mon Sep 29 22:39:44 2014 :: Requesting abort...
[3883] Tue Sep 30 00:44:45 2014 :: Requesting abort...
[8336] Tue Sep 30 19:55:19 2014 :: Requesting abort...
[8336] Tue Sep 30 19:55:19 2014 :: Transfer stats: DATE=20141001005519.894538 HOST=data2.stampede.tacc.utexas.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20141001005519.879223 USER=tg811191 FILE=/scratch/01823/tg811191/rtdensity/k8/R100Sc1/512exp/postavg2/Kux/Kux000.plt BUFFER=0 BLOCK=262144 NBYTES=0 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[0.0.0.0] TYPE=RETR CODE=226
[8337] Tue Sep 30 19:55:19 2014 :: Requesting abort...
[8337] Tue Sep 30 19:55:19 2014 :: Transfer stats: DATE=20141001005519.925649 HOST=data2.stampede.tacc.utexas.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20141001005519.860277 USER=tg811191 FILE=/scratch/01823/tg811191/rtdensity/k8/R100Sc1/512exp/postavg2/Krhowz/Srhowz029.plt BUFFER=0 BLOCK=262144 NBYTES=0 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[0.0.0.0] TYPE=RETR CODE=226

Log from gridftp 6.0
[39960] Tue Oct 7 06:17:49 2014 :: Requesting abort...
[39960] Tue Oct 7 06:17:49 2014 :: Transfer stats: DATE=20141007111749.685957 HOST=data1.stampede.tacc.utexas.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20141007111749.305876 USER=duanl FILE=/home1/02682/duanl/Postprocessing/.git/objects/0d/2acd3a5aedc85d7a5067ce911c94eedf8ce1b9 BUFFER=0 BLOCK=262144 NBYTES=0 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[0.0.0.0] TYPE=RETR CODE=226
[40027] Tue Oct 7 06:17:49 2014 :: Requesting abort...
[40027] Tue Oct 7 06:17:49 2014 :: Transfer stats: DATE=20141007111749.686162 HOST=data1.stampede.tacc.utexas.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20141007111748.226227 USER=duanl FILE=/home1/02682/duanl/Postprocessing/.git/objects/08/11ccdd47c8a1514733b769b06e1b3649eb54a1 BUFFER=0 BLOCK=262144 NBYTES=0 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[0.0.0.0] TYPE=RETR CODE=226
[40077] Tue Oct 7 06:27:08 2014 :: Requesting abort...
[40077] Tue Oct 7 06:27:08 2014 :: Transfer stats: DATE=20141007112708.191737 HOST=data1.stampede.tacc.utexas.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20141007112707.694558 USER=duanl FILE=/home1/02682/duanl/Postprocessing/.ptp-sync/objects/18/bb96e364d0217e0b65c30b2cedd9c9866b33fa BUFFER=0 BLOCK=262144 NBYTES=0 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[0.0.0.0] TYPE=RETR CODE=226
[40078] Tue Oct 7 06:27:08 2014 :: Requesting abort...
[40078] Tue Oct 7 06:27:08 2014 :: Transfer stats: DATE=20141007112708.191922 HOST=data1.stampede.tacc.utexas.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20141007112707.694232 USER=duanl FILE=/home1/02682/duanl/Postprocessing/.ptp-sync/objects/1b/1849b803717153c9d14c6712e117665cc50c99 BUFFER=0 BLOCK=262144 NBYTES=0 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[0.0.0.0] TYPE=RETR CODE=226
[40115] Tue Oct 7 06:33:10 2014 :: Requesting abort...
[40116] Tue Oct 7 06:33:10 2014 :: Requesting abort...
[40116] Tue Oct 7 06:33:10 2014 :: Transfer stats: DATE=20141007113310.649237 HOST=data1.stampede.tacc.utexas.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20141007113310.357252 USER=duanl FILE=/home1/02682/duanl/Postprocessing/.ptp-sync/objects/70/89813b8e1f9133b98802ba3b5d1dadcbd002b7 BUFFER=0 BLOCK=262144 NBYTES=0 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[0.0.0.0] TYPE=RETR CODE=226
[40131] Tue Oct 7 06:41:00 2014 :: Requesting abort...
[40131] Tue Oct 7 06:41:00 2014 :: Transfer stats: DATE=20141007114100.917673 HOST=data1.stampede.tacc.utexas.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20141007114057.863798 USER=duanl FILE=/home1/02682/duanl/Code_Init/Calsgr.f90 BUFFER=0 BLOCK=262144 NBYTES=0 VOLUME=/ STREAMS=1 STRIPES=1 DEST=[0.0.0.0] TYPE=RETR CODE=226

Comments

Mike Link - 2014-10-14

This is expected when the connection to the remote server could not be made.

Globus Toolkit/GT-562

Summary

globus-gatekeeper-9.15 missing globus-common-progs dependency

Details

Type: Bug

Status: Open

Description

This ticket is about the OSG version of the globus-gatekeeper-9.15 package, which we source from EPEL, so it is possible that your packaging is different or that the issue has been fixed in a more recent release.

The spec file for globus-gatekeeper (9.15) does not have globus-common-progs as a Requires line. We believe that it used to be there in older versions, but is absent in this one.

If globus-common-progs is not installed, the init script for the globus-gatekeeper service will fail because it will not find /usr/share/globus/globus-script-initializer. To fix, add

Requires: globus-common-progs%{?_isa} >= 14

to the spec file for globus-gatekeeper.

If possible, please keep Mat Selmeci informed on this ticket, too, as he is the person who found the issue and patched our package.

Comments

Globus Toolkit/UX-2565

Summary

"Create Globus Connect Personal Endpoint" allows submission even if the endpoint name is invalid or in-use

Details

Type: Bug

Status: Closed 2014-11-02

Description

Steps to reproduce:

1. go to /xfer/ManageEndpoints (or /xfer/StartTransfer)
2. click "add Globus Connect Personal" (or "Get Globus Connect Personal")
3. enter an endpoint name that has disallowed characters
4. click "Generate Setup Key" button

Expected: Button should refuse to submit the form.
Actual: Button becomes "Generating Setup Key" and sticks

Comments

Dave Shifflett - 2014-10-16

Change at https://github.com/globusonline/webapp/compare/dev...JIRA-UX-2565 or https://github.com/globusonline/webapp/commit/11f704a06b82289e41

Diane Collins - 2014-11-02

Entering incorrect values into the EP box now throws errors upon attempt to submit, as expected. -dpc

Globus Toolkit/GT-564

Summary

gridmap- shell scripts parse input badly

Details

Type: Task

Status: Open

Description

Per http://lists.globus.org/pipermail/gt-user/2014-October/010749.html

The grid-mapfile-add-entry script doesn't process its -dn and -ln options correctly and adding some quotes to those inputs can create gridmap files that could grant access to the wrong user.

Comments

Globus Toolkit/GT-565

Summary

MyProxy error messages like "unknown username" disclose too much information

Details

Type: Improvement

Status: Open

Description

https://www.owasp.org/index.php/Guide_to_Authentication states, "Authentication and registration processes, particularly login failures, SHOULD provide no information as to if an account exists or password or is wrong. A single error message for the end user covering both scenarios is more than adequate."

The MyProxy server should implement the above advice rather than providing errors like "unknown username" to the client.

Comments

Globus Toolkit/GT-566

Summary

Debian packages look for configuration in /usr/etc and state in /usr/var

Details

Type: Bug

Status: Open

Description

The globus_common package doesn't set localstatedir and sysconfdir in the configuration script invocation, so they default to be relative to ${prefix} (/usr). This causes problems when some components look for configuration files or (gram especially) try to write state.

Comments

Globus Toolkit/GT-567

Summary

Remove requirement that GRAM5 use SSLv3

Details

Type: Task

Status: Resolved 2015-01-08

Description

The GRAM client library uses the IO attribute GLOBUS_IO_SECURE_CHANNEL_MODE_GSI_WRAP_SSL3, which forces SSLv3 to be used when authenticating with the gatekeeper. This was done for compatibility with an old version of GRAM that didn't support TLS. Remove this so that we can begin a transition period to remove SSLv3 support altogether.

Comments

Globus Toolkit/GT-568

Summary

IPv4 only SE <→ Dual stack SE interoperability problems using globus-url-copy

Details

Type: Bug

Status: Resolved 2015-04-10

Description

Originally reported in GGUS: https://ggus.eu/index.php?mode=ticket_info&ticket_id=109576

We are unable to copy a file from an IPv4-only SE to a dual-stack SE, e.g.

$ export GLOBUS_IO_IPV6=TRUE
$ export GLOBUS_FTP_CLIENT_IPV6=TRUE
$ globus-url-copy gsiftp://srm.glite.ecdf.ed.ac.uk/dpm/ecdf.ed.ac.uk/home/dteam/generated/2014-09-21/file89244586-e79c-41c8-b20c-99ed40051c2e gsiftp://dc2-grid-23.brunel.ac.uk:2811/dpm/brunel.ac.uk/home/dteam/testfile-$$

error: globus_ftp_client: the server responded with an error
500 500-Command failed. : globus_ftp_control_data_write failed.
500-globus_ftp_control_data_write(): Handle not in proper state. PORT
500 End.

note the copy works when the IPv6 flags are not set:

$ unset GLOBUS_FTP_CLIENT_IPV6
$ unset GLOBUS_IO_IPV6
$ globus-url-copy gsiftp://srm.glite.ecdf.ed.ac.uk/dpm/ecdf.ed.ac.uk/home/dteam/generated/2014-09-21/file89244586-e79c-41c8-b20c-99ed40051c2e gsiftp://dc2-grid-23.brunel.ac.uk:2811/dpm/brunel.ac.uk/home/dteam/testfile-$$
$

We think globus-url-copy is picking the IPv6 address family for the data channel when the SE at one end is IPv4 only. This problem makes it seemingly impossible to run a production dual-stack SE at the moment because it affects FTS3 servers if the FTS3 server is also IPv6 enabled (please see ggus ticket #109089). More detailed debug output is attached.

Comments

Mattias Ellert - 2014-10-23

Additional information provided by the reporter in the original GGUS ticket:

$ globus-url-copy -versions
globus-url-copy: 9.11 (1408739578-85)
globus_ftp_client_restart_plugin: 8.12 (1408739578-85)
globus_ftp_client_debug_plugin: 8.12 (1408739578-85)
globus_ftp_client_perf_plugin: 8.12 (1408739578-85)
globus_ftp_client_throughput_plugin: 8.12 (1408739578-85)
globus_xio_popen: 3.5 (1408739578-85)
globus_ftp_control: 5.11 (1408739578-85)
globus_ftp_client: 8.12 (1408739578-85)
globus_xio_gsi: 3.5 (1408739578-85)
globus_xio_tcp: 4.14 (1408739578-85)
globus_xio_system_select: 4.14 (1408739578-85)
globus_xio_file: 4.14 (1408739578-85)
globus_xio: 4.14 (1408739578-85)
globus_io: 10.11 (1408739578-85)
globus_gsi_callback_module: 5.5 (1408739578-85)
globus_credential: 7.6 (1408739578-85)
globus_gsi_proxy: 7.6 (1408739578-85)
globus_gsi_openssl_error: 3.4 (1408739578-85)
globus_openssl: 4.5 (1408739578-85)
globus_gsi_gssapi: 11.12 (1408739578-85)
globus_sysconfig: 6.7 (1408739578-85)
globus_callout_module: 3.12 (1408739578-85)
globus_gss_assist: 10.11 (1408739578-85)
globus_i_gass_transfer_http: 8.7 (1408739578-85)
globus_extension_module: 15.25 (1409949382-85)
globus_callback_nonthreaded: 15.25 (1409949382-85)
globus_callback: 15.25 (1409949382-85)
globus_object: 15.25 (1409949382-85)
globus_error: 15.25 (1409949382-85)
globus_common: 15.25 (1409949382-85)
globus_gass_transfer: 8.7 (1408739578-85)
globus_gass_copy: 9.11 (1408739578-85)
globus_thread_common: 15.25 (1409949382-85)
globus_thread_none: 15.25 (1409949382-85)
globus_thread: 

220 srm.glite.ecdf.ed.ac.uk GridFTP Server 6.38 (gcc64, 1382984154-83) [Globus Toolkit 5.2.5] ready
220 dc2-grid-23.brunel.ac.uk GridFTP Server 7.11 (gcc64, 1408739578-85) [Globus Toolkit 6.0] ready.

$ rpm -qf /usr/bin/globus-url-copy
globus-gass-copy-progs-9.11-1.el6.x86_64
$ globus-version
6.0
$ globus-url-copy -version
globus-url-copy: 9.11
$ uname -a
Linux v6ui00.grid.hep.ph.ic.ac.uk 2.6.32-431.20.3.el6.x86_64 #1 SMP Thu Jun 19 21:14:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Mattias Ellert - 2015-02-25

There was a comment regarding the update on GGUS:
https://ggus.eu/index.php?mode=ticket_info&ticket_id=109089

Hi,

I installed globus-ftp-client-8.19-1.el6 and the latest version of FTS, 3.2.32, but I'm still seeing errors such as the following when transferring files between an IPv4 and dual-stack endpoint:

Wed Feb 25 14:33:09 2015 ERROR TRANSFER globus_ftp_client: the server responded with an error 500 'eprt |2|2001:630:12:580:92E2:BAFF:FE20:1CBC|21762|': command not und

I saw some others with the error "500-glob" in the same place. I don't know why the error messages are getting so truncated.

Regards,
Simon

[~mlink] could you comment?

Mike Link - 2015-03-03

Is this the same sort of transfer as the one in the original report?  A transfer between two servers, source ipv4 to destination ipv6?  If so, the source is an FTP server that doesn't support the EPRT command -- but is it possible that it was contacted on an ipv6 address?

Mattias Ellert - 2015-03-04

A reply was added to the GGUS ticket:

Hi,

Yes, these transfers are the same as the original report: source is IPv4, destination is dual-stack... The source must have connected over IPv4, none of the sources involved have IPv6 addresses. The destination almost certainly connected over IPv6. The following list describes the various combinations of source/dest:

Source / Dest -> Result
IPv4 / IPv4 -> Works
IPv4 / IPv6 -> Fails
IPv6 / IPv4 -> Works
IPv6 / IPv6 -> Untested recently, but used to work

The source servers we are using are all the production SEs around the grid... They probably don't support the EPRT command, but they shouldn't need to (and even if they did, passing them an IPv6 address would seem to be the wrong behavior anyway).

If you want to do any testing, our dual-stack FTS server is at https://fts00.grid.hep.ph.ic.ac.uk:8449/ (transfer port is standard 8443)... We have a dual-stack SE at gfe02.grid.hep.ph.ic.ac.uk which supports dteam and IPv6 testbed VOs.

Regards,
Simon

Mike Link - 2015-03-04

I believe the problem is that the first patch was still treating an IPV4 mapped address as an IPV6 address.  I've committed a fix for globus-ftp-client-8.20 to correctly handle that case.
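The IPv4-mapped case mentioned above can be illustrated with a short Python sketch. This is not the globus-ftp-client fix itself (which is C code not shown in this ticket); it only shows, under the assumption that a mapped address such as ::ffff:192.0.2.10 should be handled as IPv4, how the address family might be decided before choosing between the IPv4-style and extended commands. The function names are hypothetical.

```python
import ipaddress


def effective_family(address):
    """Return 4 or 6, treating IPv4-mapped IPv6 addresses as IPv4."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        return 4
    return ip.version


def data_channel_command(address):
    """Pick PORT for IPv4 (including mapped) addresses, EPRT otherwise.

    Simplified assumption for illustration; a real client has more cases.
    """
    return "PORT" if effective_family(address) == 4 else "EPRT"
```

A server that does not understand EPRT would then only ever be sent PORT when its peer is reachable over IPv4, which is the behavior the reporters expected.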

Mattias Ellert - 2015-03-11

Feedback from GGUS:

Simon Fayer 2015-03-11 12:04:

Hi,

This seems to now work in the IPv4 -> Dual-stack case, unfortunately it seems IPv6 - IPv6 transfers are now broken.

When transferring between two IPv6 only hosts, it appears the gsiftp endpoint addresses are getting mangled:

Wed Mar 11 11:22:13 2015 INFO [1426072933797] BTH SRM PREPARE:ENTER
Wed Mar 11 11:22:14 2015 INFO [1426072934531] SRC GFAL2::PLUGINS::SRM SRM:GET Got TURL srm://t2dpm1-v6.physics.ox.ac.uk:8446/srm/managerv2?SFN=/dpm/physics.ox.ac.uk/home/dteam/generated/2014-08-29/file53159295-b603-4741-a3f3-535edf360ce8 => gsiftp://t2dpm1-v6.physics.ox.ac.uk/t2dpm1-v6.physics.ox.ac.uk:/dpm/pool1/dteam/2014-08-29/file53159295-b603-4741-a3f3-535edf360ce8.10928.0
Wed Mar 11 11:22:15 2015 INFO [1426072935374] DST GFAL2::PLUGINS::SRM SRM:PUT Got TURL srm://v6se00.grid.hep.ph.ic.ac.uk:8446/srm/managerv2?SFN=/dpm/grid.hep.ph.ic.ac.uk/home/dteam/dtr-test-31222 => gsiftp://v6se00.grid.hep.ph.ic.ac.uk/v6se00.grid.hep.ph.ic.ac.uk:/srv/pool00/dteam/2015-03-11/dtr-test-31222.1231466.0
Wed Mar 11 11:22:15 2015 INFO [1426072935374] BTH SRM PREPARE:EXIT
Wed Mar 11 11:22:15 2015 INFO [1426072935375] BTH GSIFTP TRANSFER:ENTER (2001:630:441:905::6f:0) gsiftp://t2dpm1-v6.physics.ox.ac.uk/t2dpm1-v6.physics.ox.ac.uk:/dpm/pool1/dteam/2014-08-29/file53159295-b603-4741-a3f3-535edf360ce8.10928.0 => (2001:630:12:580:216:3eff:fe7f:14b:0) gsiftp://v6se00.grid.hep.ph.ic.ac.uk/v6se00.grid.hep.ph.ic.ac.uk:/srv/pool00/dteam/2015-03-11/dtr-test-31222.1231466.0
Wed Mar 11 11:22:17 2015 ERROR TRANSFER globus_ftp_client: the server responded with an error 500 500-500 Command failed. 500- : globus_libc_addr_to_contact_string f

Regards,
Simon

Simon Fayer 2015-03-11 12:20:

Hmm, it actually seems to be mangling some dual-stack and IPv4 only ones too, but not all of them... Perhaps this is a new bug?

Regards,
Simon

Zdenek Salvet 2015-03-11 15:07:

Hi Simon,
the most interesting last message is truncated in your last post. Do you have more?

Simon Fayer 2015-03-11 15:12:

Unfortunately not... All of the interesting log lines seem to be truncated in the log files themselves (I suspect this will be a major problem for tracking down future errors). I don't know whether it's globus or FTS that's causing the truncation, but I suspect the latter.

Regards,
Simon

Mike Link - 2015-03-11

Can you attempt to reproduce with globus-url-copy as in the original report?  I can't see how any recent changes could result in rewriting urls in that way.

Mattias Ellert - 2015-03-16

More feedback from GGUS:

Duncan Rand 2015-03-12 12:08:

$ rpm -qa|grep globus-ftp-client
globus-ftp-client-8.20-1.el6.i686
globus-ftp-client-devel-8.20-1.el6.x86_64
globus-ftp-client-8.20-1.el6.x86_64

$ env|grep GLOBUS
GLOBUS_IO_IPV6=TRUE
GLOBUS_TCP_PORT_RANGE=20000,25000
GLOBUS_FTP_CLIENT_IPV6=TRUE

ipv4 --> dual-stack

$ globus-url-copy gsiftp://heplnx236.pp.rl.ac.uk/pnfs/pp.rl.ac.uk/data/dteam/generated/2014-07-12/file8c1a0130-9d6d-4c97-9d56-9969bd84b696 gsiftp://sedsk63.grid.hep.ph.ic.ac.uk:2811//pnfs/hep.ph.ic.ac.uk/data/dteam/dtr-test-$$;echo $?
0

ipv6 --> ipv6

$ globus-url-copy gsiftp://t2dpm1-v6.physics.ox.ac.uk/dpm/physics.ox.ac.uk/home/dteam/generated/2014-08-29/file53159295-b603-4741-a3f3-535edf360ce8 gsiftp://v6se00.grid.hep.ph.ic.ac.uk/dpm/grid.hep.ph.ic.ac.uk/home/dteam/dtr-test-$$;echo $?
0

Mattias Ellert 2015-03-11 21:28:

[ellert@localhost ~]$ host t2dpm1-v6.physics.ox.ac.uk
t2dpm1-v6.physics.ox.ac.uk has IPv6 address 2001:630:441:905::6f
[ellert@localhost ~]$ host v6se00.grid.hep.ph.ic.ac.uk
v6se00.grid.hep.ph.ic.ac.uk has IPv6 address 2001:630:12:580:216:3eff:fe7f:14b

However, in the log you quote it says 2001:630:441:905::6f:0 and 2001:630:12:580:216:3eff:fe7f:14b:0 with an extra :0 appended. If this 0 is supposed to mean "undefined port" the IPv6 address should be in [ brackets ] i.e. [2001:630:441:905::6f]:0. Is this just a bad rendering in the log or is this string used for something important?
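The bracket convention referred to above can be sketched in a few lines of Python. The helper name is hypothetical and is not taken from gfal2 or Globus; it only shows the expected rendering of a host and port pair.

```python
import ipaddress


def format_host_port(host, port):
    """Render host:port, enclosing numeric IPv6 addresses in square brackets."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_ipv6 = False  # not a numeric address, so treat it as a hostname
    if is_ipv6:
        return f"[{host}]:{port}"
    return f"{host}:{port}"
```

With this convention the address from the log would be written as [2001:630:441:905::6f]:0, which cannot be mistaken for a nine-group address.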

Mattias Ellert 2015-03-16 09:52:

As far as I can tell, the strings with the :0 appended to the numerical IPv6 addresses, without enclosing the IPv6 address in square brackets, are not an issue with the globus ftp client library, but with the gfal2 library. See function return_hostname in source file src/plugins/gridftp/gridftp_filecopy.cpp in the gfal2 sources. Whether these strings are used only for creating log messages or also for something else that affects the functionality of the code is not fully clear to me, though it looks like most calls to return_hostname are used only for writing log messages.

Mattias Ellert - 2015-03-16

From the last comments quoted from the GGUS ticket it looks like it works as expected when globus-url-copy is used as the client. The weird logging messages with the :0 appended to the IPv6 numeric address are not from globus, but from gfal2. The only remaining issue is the origin of the last quoted message from the log, but it seems to be a bit mangled and incomplete, so what it is trying to say is unclear.

Mattias Ellert - 2015-03-30

More feedback from GGUS:

Mattias Ellert 2015-03-17 13:18

If I try to summarize.

The issue reported in ticket #109576 (IPv4 only SE <-> Dual stack SE interoperability problems using globus-url-copy) - which was merged with this ticket - has been fixed according to update #34 in this ticket.

The original issue reported in this thread - the use of PASV/PORT and EPSV/EPRT respectively with the wrong kind of IP addresses - seems to also be fixed with the latest update.

Later comments have reported bad rendering of numerical IPv6 addresses in log messages - which is unrelated to the original issue and can be attributed to a different software component.

Also reported in later comments is an issue with log messages that appear truncated or scrambled and therefore not easily interpretable. These messages look different from the error messages in the original report. So this looks like a different issue.

If the originally reported issue was fixed I suggest that this ticket is closed, and that new tickets are created for the other issues reported in the comments. These should then be assigned to relevant components.

What do you think?

Mattias Ellert 2015-03-24 10:25:

A week ago I suggested closing this ticket since as far as I could tell the originally reported issues have been fixed. There has so far not been any feedback on this suggestion. If there are no objections within the next few days I will go ahead and close. But feedback stating agreement to the proposal would also be welcome.

Mattias Ellert 2015-03-30 08:43

Since there were no objections to my proposal to close this bug report I will now close it. On the other hand no one provided support for the proposal either.
If you experience the originally reported issue again, please reopen the ticket. If you still experience some of the unrelated issues that were mentioned in some of the comments in this ticket please open new separate tickets for these.
The globus-ftp-client-8.20-1 updates were pushed to EPEL stable on 2015-03-25.

(Ticket marked "solved".)

Mattias Ellert - 2015-04-04

The GGUS ticket has been marked "verified" by the original reporter.

Globus Toolkit/GT-569

Summary

globus-connect-server doesn’t catch some configuration errors

Details

Type: Task

Status: Open

Description

In https://globusonline.zendesk.com/agent/tickets/302949 the issue turned out to be an invalid value for the AuthorizationMethod (the user tried to use | to join different methods). This wasn't caught by the parser, and the resulting authorization method string didn't match anything that globus-connect-server knew what to do with. In this case, it caused the id setup to not install the myproxy mapapp for the DN used by the myproxy ca.

Comments

Globus Toolkit/GT-570

Summary

globus-connect-server SELinux implementation can generate odd errors

Details

Type: Task

Status: Open

Description

The globus-connect-server setup scripts attempt to do the right thing when the SELinux tools are available, but the actual things that need to be done differ between distribution versions. We should probably have different implementations of the SELinux commands for each OS that might be using SELinux, and skip the filesystem labeling if SELinux is not configured to enforce policies.

Comments

Globus Toolkit/GT-571

Summary

Move

Details

Type: Bug

Status: Open

Description

Move the /etc/myproxy.d searching into the default myproxy.sysconfig file instead of patching it at build time in the RPM spec. See https://support.globus.org/entries/102139983-myproxy-server-6-0-2-is-DOA-on-install-how-to-correct-so-myproxy-server-will-start

Comments

Globus Toolkit/GT-572

Summary

globus-ftp-client performs MLSD with incorrect TYPE

Details

Type: Bug

Status: Open

Description

globus-ftp-client performs MLSD listings in TYPE A, which leads to encoding errors when the server is spec compliant.  It should use TYPE I.

Comments

Globus Toolkit/GT-573

Summary

errors from voms_proxy_init are hidden in myproxy-logon

Details

Type: Bug

Status: Open

Description

myproxy-logon
1) incorrectly interprets return code from voms_proxy_init()
(return value <0 is understood to indicate error while the return value
can be status code from wait())
2) does not pass error from voms_proxy_init() to its own exit status
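
A minimal sketch of the distinction described above, not the myproxy-logon code itself: a wait()-style status word is non-negative even when the child failed, so it has to be decoded before being used as an exit status. `exit_code_from_wait_status` is a hypothetical helper, and the shell command stands in for the real child process.

```python
import os


def exit_code_from_wait_status(status: int) -> int:
    """Translate a wait()-style status word into a shell-like exit code."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        # Shell convention for a child killed by a signal.
        return 128 + os.WTERMSIG(status)
    return 1


# On Unix, os.system() returns the status encoded as for wait().
status = os.system("exit 3")
print(status < 0)                          # False, although the child failed
print(exit_code_from_wait_status(status))  # 3
```

A caller that only tests for a return value below zero would treat this failing child as a success; passing the decoded code on to its own exit status covers the second point.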

Comments

Mattias Ellert - 2016-02-26

Hi

I was asked about news on the progress of this issue in the corresponding GGUS ticket. Is there any?

Mattias

Jim Basney - 2016-02-26

Sorry no progress from me on it. A GitHub pull request (https://github.com/globus/globus-toolkit/pulls) would be welcome.

Globus Toolkit/GT-574

Summary

globus-gram-job-manager test suite fails on GNU/Hurd

Details

Type: Bug

Status: Open

Description

The "make check" fails on GNU/Hurd when building the globus-gram-job-manager package (version 14.22) on Debian. It completes successfully on all the other Debian architectures.

https://buildd.debian.org/status/fetch.php?pkg=globus-gram-job-manager&arch=hurd-i386&ver=14.22-2&stamp=1414745865

I would appreciate it if you could provide any hints on how to fix this, or give pointers to the best way to debug it. The globus-gram-job-manager test suite is quite complex, with the test programs connecting to a globus-gatekeeper started by the globus-personal-gatekeeper script and the gatekeeper starting a globus-job-manager. So there are many components at play, and it is not clear to me exactly which one is not behaving as expected.

Comments

Globus Toolkit/GT-575

Summary

GridFTP pluggable network manager

Details

Type: New Feature

Status: Open

Description

Support network event callouts and provide them with the Globus Transfer task id.

Comments

Globus Toolkit/GT-577

Summary

http://toolkit.globus.org/toolkit/downloads/ says "GT 5.2.0 is highly recommended."

Details

Type: Documentation

Status: Open

Description

http://toolkit.globus.org/toolkit/downloads/ says "GT 5.2.0 is highly recommended." Probably it should be updated now to say "GT 6.0 is highly recommended."

Comments

Globus Toolkit/GT-578

Summary

User was mapped but not allowed

Details

Type: Bug

Status: Resolved 2015-01-20

Description

Hello,

I am trying to transfer data using the Globus toolkit and I get the error message: 530--User was mapped as root but root is not allowed
root is the sudoers user name.
Any idea how I can work around that issue?

Comments

Stuart Martin - 2015-01-20

Hi,

Mapping to root is not allowed.  Try creating a user account and add a gridmap entry to that account.

-Stu

Globus Toolkit/GT-579

Summary

GSI-OpenSSH V5.7 bug on Lustre file systems

Details

Type: Bug

Status: Open

Description

This was started as a globus.org support ticket (#303096) that is being moved to the Toolkit Jira system.

Hi Gigi,

Yes, it's since upgrading to 5.7....5.5 did not produce this error.

We're running the "gsi-openssh-server-5.7-1gt" package provided via
Globus' RPM repository on CentOS 6.5 x86_64.

The issue seems to only occur when scp'ing or gsiscp'ing large files to
a path that resides on a Lustre filesystem.

I took a look at the openssh code in question, it seems there's a
hardcoded max buffer of 64MB, which seems to be overrun in this case.

Scott

--------------------

Hello,

Since upgrading to the latest gsi-openssh-server packages, large file transfers have been failing with:

"fatal: buffer_append_space: alloc 67141632 not supported"

Manually setting HPNBufferSize=[number less than 64MB] works around the issue. This started in 5.6.

HPN references the problem at the bottom of this page: http://www.psc.edu/index.php/hpn-ssh#patches

Thanks,

Scott Watson
Systems Administrator - University of Manitoba
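
For reference, the size in the error message is only slightly above the reported limit. A quick check, assuming the hardcoded limit is exactly 64 MiB (an assumption; the report only says "64MB"):

```python
HPN_MAX_BUFFER = 64 * 1024 * 1024  # assumed value of the 64MB limit, in bytes
requested = 67141632               # size from the "alloc ... not supported" message

overrun = requested - HPN_MAX_BUFFER
print(HPN_MAX_BUFFER)   # 67108864
print(overrun)          # 32768
print(overrun // 1024)  # 32, i.e. 32 KiB over the assumed limit
```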

Comments

Jim Basney - 2015-01-22

Subject: HPN-SSH fatal: buffer_append_space: alloc 67141632 not supported
From: Jim Basney 
To: 
CC: Scott Watson 

Hello HPN-SSH developers,

Scott Watson (cc'ed) has reported an issue with the HPN14 patch that we
include in GSI-OpenSSH:

  https://globus.atlassian.net/browse/GT-579

The error he sees is:

  "fatal: buffer_append_space: alloc 67141632 not supported"

Manually setting HPNBufferSize=[number less than 64MB] works around the
issue.

At the bottom of http://www.psc.edu/index.php/hpn-ssh I read, "If you are
experiencing disconnects due to a failure in buffer_append_space please
let us know." So I'm writing to connect you with Scott so you can gather
the information you need to resolve the problem, then I'll be happy to
roll the fix into GSI-OpenSSH.

Thanks,
Jim

Jim Basney - 2015-07-21

Note that buffer_append_space() changed in OpenSSH 6.7 and later, so there's a good chance that updating GSI-OpenSSH to the current OpenSSH release will resolve this issue.

Dan Powers - 2015-07-22

We've got a user of gsi-openssh-server-5.7-3.el6+gt6.x86_64 who is reporting the same issue described here in ticket:

https://globusonline.zendesk.com/agent/tickets/304090

They report that:

"We also tried setting HPNBufferSize to 16384, in both the sshd_config file and using the option "oHPNBufferSize=16384", but it did not make any difference."

and also that:

"The only thing that worked was disabling HPN completely, using "HPNDisabled yes" in the sshd_config file."

Gigi Rohder - 2015-10-28

We recently received the following comment as a ticket at our Globus Support Help Desk. Please feel free to contact him directly if warranted. Also adding the patch which he just sent.
Thanks,
  Gigi

https://globusonline.zendesk.com/attachments/token/GtnMa4IJGi4D4QrKUmK1TtBaa/?name=hpn-14v1-256MBbuffer.txt

from: Adam Dorsey  - adam.dorsey@noaa.gov

Hi,
I am trying to provide information regarding https://globus.atlassian.net/browse/GT-579 but I have no idea how to request an account for that portal, so I'm emailing you instead. Sorry in advance.

Regarding that issue (buffer_append_space error causing disconnects from clients) we've found a workaround. Recompiling the gsi-openssh package and changing the maximum HPN buffer size from 64MB to 256MB seems to resolve the issue. I have a patch, which I will gladly send to whoever is responsible for that package.

Can you please either get me an account on your bug tracker, or point me towards whoever maintains the gsi-openssh package?

Thanks,
Adam Dorsey (NOAA)

Globus Toolkit/GT-580

Summary

add HPN Multithreaded AES-CTR Cipher to GSI-OpenSSH

Details

Type: New Feature

Status: Open

Description

GSI-OpenSSH doesn't include the HPN Multithreaded AES-CTR Cipher patch from http://www.psc.edu/index.php/hpn-ssh because in my previous tests it caused instability (connection freezing). This issue is to try again to add this functionality for the next GSI-OpenSSH release and see if we can diagnose the source of instability if it occurs again. Given past instability, we need this functionality to be controlled by ssh_config/sshd_config so it can be enabled/disabled at run-time as needed.

See mailing list discussion at:
http://lists.globus.org/pipermail/gsi-openssh-dev/2015-February/000013.html

Comments

Globus Toolkit/GT-581

Summary

Prefer IPV6 address family when creating a listener on all interfaces

Details

Type: Bug

Status: Open

Description

A change in glibc causes getaddrinfo() results to now come sorted with v4 family addrs before v6 family addrs, and the tcp driver creates the listener on the first one only.  The result is that a listener can currently only be on a v4 interface or v6, but not both.  The tcp driver should choose the v6 family addr when listening on all interfaces.
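
A minimal sketch of the proposed selection, written in Python rather than in the tcp driver's C. `pick_listen_address` and `open_listener` are hypothetical names, and clearing IPV6_V6ONLY so that the v6 socket also accepts v4 connections is an assumption of this sketch, not something stated in this issue.

```python
import socket


def pick_listen_address(addrinfos):
    """Return the first AF_INET6 entry if there is one, else the first entry."""
    for info in addrinfos:
        if info[0] == socket.AF_INET6:
            return info
    return addrinfos[0]


def open_listener(port: int = 0) -> socket.socket:
    # AI_PASSIVE with host=None asks for wildcard ("all interfaces") addresses.
    infos = socket.getaddrinfo(None, port, type=socket.SOCK_STREAM,
                               flags=socket.AI_PASSIVE)
    family, socktype, proto, _, sockaddr = pick_listen_address(infos)
    sock = socket.socket(family, socktype, proto)
    if family == socket.AF_INET6 and hasattr(socket, "IPV6_V6ONLY"):
        # Assumption: allow v4-mapped connections on the v6 wildcard socket.
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    sock.bind(sockaddr)
    sock.listen()
    return sock
```

Choosing by address family instead of by position makes the result independent of the order in which getaddrinfo() returns its entries.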

Comments

Globus Toolkit/GT-582

Summary

RestrictPaths option in GCS config file doesn’t work properly if it includes an entry for RW~

Details

Type: Bug

Status: Resolved 2015-03-13

Description

Issue discovered in ticket https://globusonline.zendesk.com/agent/tickets/303326

The current workaround is to have the user simply put RW/home into RestrictPaths instead of RW~.

On a managed endpoint with sharing enabled, if the RestrictPaths option in the /etc/globus-connect-server.conf file contains an entry for RW~, then sharing outside the home directory doesn't work properly.

Instead, you get "Message: Fatal FTP Response --- 500 Command failed : Path not allowed." messages when attempting to access a share pointed outside the home directory (but still within the RestrictPaths limits) that was created before RW~ was added to RestrictPaths, or a "The endpoint said permission denied for this directory" error when attempting to create a new share pointed outside of the home directory (but still within the RestrictPaths limits) after RW~ has been added to RestrictPaths.

UPDATE: User also reported similar issues using the * character in RestrictPaths.

Comments

Mike Link - 2015-03-13

Fixed in globus-gridftp-server 7.22.

Globus Toolkit/GT-583

Summary

SEG hangs looking for moved log directory (EPEL packaging)

Details

Type: Bug

Status: Closed 2015-03-08

Description

We've run into a problem with the latest GT6 packages where a globus-job-run against a PBS jobmanager using the SEG will hang indefinitely. Attaching gdb to the globus-scheduler-event-generator process showed that globus_l_job_manager_find_logfile() repeatedly fails with the error SEG_JOB_MANAGER_ERROR_LOG_NOT_PRESENT, yet when I looked at the file system, the SEG logfile for the day was definitely there and the SEG had successfully been writing to it.

This appears to be caused by the move of the SEG log directories from /var/lib/globus/* to /var/log/globus/* -- creating the symlink /var/lib/globus/globus-seg-pbs -> /var/log/globus/globus-seg-pbs fixed the problem. I haven't been able to find the part of the code that needs to be changed, so as a workaround, we've just modified the init script to create those symlinks.

Instructions to reproduce on EL6 using the EPEL repos (assuming grid certificates are already present):

# as root:
yum install  globus-gatekeeper  globus-gram-client-tools \
 globus-gram-job-manager-pbs-setup-seg  globus-proxy-utils \
 globus-scheduler-event-generator-progs  munge \
 torque-{client,mom,scheduler,server}

# set up a 1-node pbs cluster
mom_config=/var/lib/torque/mom_priv/config
nodes_file=/var/lib/torque/server_priv/nodes
servername_file=/var/lib/torque/server_name

/usr/sbin/create-munge-key -f
service munge start

echo '$pbsserver' $(hostname) > $mom_config
service pbs_mom start

service pbs_sched start

echo $(hostname) np=1 > $nodes_file
hostname > $servername_file
service pbs_server start

pbs_config='
create queue batch queue_type=execution
set queue batch started=true
set queue batch enabled=true
set queue batch resources_default.nodes=1
set queue batch resources_default.walltime=3600
set server default_queue=batch
set server keep_completed = 600
set server job_nanny = True
set server scheduling=true
set server acl_hosts += *
set server acl_host_enable = True
'

qmgr $(hostname) <<< "$pbs_config"

# wait for the output of `qnodes -s $(hostname)` to contain 'state = free'

globus-scheduler-event-generator-admin -e pbs
globus-gatekeeper-admin -e jobmanager-pbs-seg

service globus-gatekeeper start
service globus-scheduler-event-generator start

# as a user:
grid-proxy-init
globus-job-run `hostname`/jobmanager-pbs-seg /usr/bin/id
# ^ this should hang
# until you do `ln -s  /var/log/globus/globus-seg-pbs  /var/lib/globus`

(Discovered and documented by Mat Selmeci, UWMadison/OSG Software)

Comments

Mattias Ellert - 2015-02-18

Also reported in redhat bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1193992

Mattias Ellert - 2015-02-19

Updates in EPEL testing:

EPEL5: https://admin.fedoraproject.org/updates/FEDORA-EPEL-2015-0827
EPEL6: https://admin.fedoraproject.org/updates/FEDORA-EPEL-2015-0817
EPEL7: https://admin.fedoraproject.org/updates/FEDORA-EPEL-2015-0820

Mattias Ellert - 2015-02-24

Positive feedback (karma: +1) reported on the EPEL 6 update.

Mattias Ellert - 2015-03-08

EPEL updates are now in EPEL stable.

Globus Toolkit/GT-584

Summary

/Library/Globus/etc/ssh not found

Details

Type: Bug

Status: Open

Description

If you choose the single-user-only option in the GT MacOS binary installer, gsissh returns "/Library/Globus/etc/ssh not found" because it doesn't find the installation in $HOME/Library/Globus. A work-around is to do "export GLOBUS_LOCATION=$HOME" or "export GLOBUS_LOCATION=$HOME/Library/Globus". The GSI-OpenSSH search path code (pathnames.c) needs to be updated for compatibility with this GT6 installation method.

Comments

Jim Basney - 2015-03-24

Also if $GLOBUS_LOCATION isn't set, then gsiscp can't find gsissh:

$ gsiscp hello.c xd-login.opensciencegrid.org:~
/usr/bin/gsissh: No such file or directory
lost connection
$ export GLOBUS_LOCATION=/Library/Globus
$ gsiscp hello.c xd-login.opensciencegrid.org:~
/Library/Globus/etc/ssh not found.
/Library/Globus/etc/ssh not found.
hello.c                             100%   79     0.1KB/s   0.1KB/s   00:00

Globus Toolkit/GT-585

Summary

Environment and threading config not loaded from config dir

Details

Type: Bug

Status: Open

Description

Some env vars and threading config need to be processed early in the server init in order to affect lower GT libraries (i.e. threads need to be set before globus-common loads).  The conf-dir files are not included in this early processing.

Comments

Globus Toolkit/GT-586

Summary

Restrict sharing based on username or group membership

Details

Type: New Feature

Status: Open

Description

Add the ability to restrict creation of and access to shares based on user or group membership of the share owner.

Four new config parameters will be added:
sharing_users_allow
sharing_users_deny
sharing_groups_allow
sharing_groups_deny

If a user matches multiple lists, the order of precedence is userdeny, userallow, groupdeny, groupallow (processing stops after the first match).

If userallow or groupallow lists are set, the user must match at least one of them.  If a list is unset, it is skipped.
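
The precedence rules above can be sketched as follows. This is only an illustration of the described evaluation order, not the server implementation: `sharing_permitted` is a hypothetical function, `None` stands for an unset list, and exact name matching is an assumption.

```python
def sharing_permitted(user, groups, users_allow=None, users_deny=None,
                      groups_allow=None, groups_deny=None):
    """Evaluate the four lists in the stated precedence; None means unset."""
    groups = set(groups)
    if users_deny is not None and user in users_deny:
        return False
    if users_allow is not None and user in users_allow:
        return True
    if groups_deny is not None and groups & set(groups_deny):
        return False
    if groups_allow is not None and groups & set(groups_allow):
        return True
    # Nothing matched: permitted only if no allow list was configured.
    return users_allow is None and groups_allow is None


# A user allow entry is reached before a group deny entry.
print(sharing_permitted("alice", ["banned"],
                        users_allow=["alice"], groups_deny=["banned"]))  # True
# An allow list is set and nothing matches, so sharing is refused.
print(sharing_permitted("bob", ["staff"], groups_allow=["research"]))    # False
```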

Comments

Globus Toolkit/GT-587

Summary

path to SSH is not getting set correctly in binary packages for sshftp

Details

Type: Bug

Status: Resolved 2015-03-13

Description

> In gt6 the last line of the file is
>
> exec @SSH_BIN@ $port_str $remote_host $remote_program
>
> which causes globus-url-copy to fail with sshftp. In gt5 the last line of the same file is
>
> exec /usr/bin/ssh $port_str $remote_host $remote_program
>
> Changing the line fixes gt6 for sshftp.

This issue was first reported here:
https://globusonline.zendesk.com/agent/tickets/303447

Comments

Mike Link - 2015-03-13

Fixed in globus-ftp-client-8.21.

Globus Toolkit/GT-588

Summary

When conducting transfers with globus-url-copy against a GridFTP server configured for ssh auth, the GridFTP server will attempt to use ports outside of the range defined by $GLOBUS_TCP_PORT_RANGE or port_range.

Details

Type: Task

Status: Open

Description

https://globusonline.zendesk.com/agent/tickets/303477

When conducting transfers with globus-url-copy against a GridFTP server configured for ssh auth, the GridFTP server will attempt to use ports outside of the range defined by $GLOBUS_TCP_PORT_RANGE or  port_range. I set config values in both /etc/gridftp.conf and /etc/gridftp.d/, but issue persisted. GUC was installed from the globus-data-management-client package from the GT 6 toolkit and GridFTP was installed from the globus-data-management-server package from the GT 6 toolkit. GridFTP was configured for ssh auth using the globus-gridftp-server-enable-sshftp command. User reported noticing this problem when running GUC with -sync option, but I found the issue occurred even when no sync option was set.

I was able to reproduce this behavior using two Ubuntu 14.04 aws instances (one with GUC installed acting as client and the other acting as the GridFTP server) using:

globus_gridftp_server: 7.17 (1417812052-85)

and

globus-url-copy: 9.13

Comments

Mike Link - 2015-03-27

 I believe the failure in 7.17 was related to GT-585.  This should work with the latest version.  The config needs to be in /etc/gridftp.conf, or directly set in the file /etc/gridftp-sshftp.

Dan Powers - 2015-03-27

Hi Mike,

When I pull either the 'globus-gridftp' package or the 'globus-data-management-server' package from our repo, I'm still getting "globus_gridftp_server: 7.17 (1417812052-85)". What version of gridftp do I need for the fix and what package should I pull it from? Thanks.

-Dan

Globus Toolkit/PM-251

Summary

Managed endpoint admin wants additional control over who users are permitted to grant share access to

Details

Type: User Story

Status: Done

Description

https://globusonline.zendesk.com/agent/tickets/303542

User would like to be able to restrict users who are creating shares on a managed endpoint from being able to grant access to that share to "all users", but would still like to allow users to be able to grant access to other users or groups individually.

Comments

Rachana Ananthakrishnan - 2015-07-01

https://support.globus.org/entries/24005071

Globus Toolkit/GT-590

Summary

GT5 shows running jobs as being in pending state

Details

Type: Bug

Status: Open

Description

With the LSF job manager, the GT5 GRAM shows jobs as being pending even though they are listed as running. Wei upgraded to OSG 3.2, and noticed that jobs show up as pending when LSF lists them as running.  After looking at the SEG logs, it looks like the SEG correctly records the job state as running but GRAM indicates that the jobs are pending.

See https://ticket.grid.iu.edu/24681  for details.

Comments

Joe Bester - 2015-04-03

The original message in the ticket states that this was after an update from OSG 3.1 to 3.2. What versions of GRAM packages do those refer to?

Globus Toolkit/GT-591

Summary

Regular expressions with sharing_rp

Details

Type: Bug

Status: Open

Description

From ticket: #302913

We are having some difficulties setting up sharing_rp as needed for our site. We wish to restrict sharing to a specific share directory contained within the user's project directory. The share directory path has the form

/projects/*/*/share

Sharing works when I do not use regular expressions:

sharing_rp "N/,R/mnt/b/projects/sciteam/xxx/share"

But every attempt to use regular expressions acceptable to fnmatch() has failed. Are regular expressions supported? Thanks!

RPM list in case it is useful:

ie01$rpm -qa |grep globus
globus-gsi-cert-utils-progs-8.6-2gt.x86_64
globus-xio-debuginfo-3.9-1gt.x86_64
globus-gssapi-error-debuginfo-4.1-12gt.x86_64
globus-gfork-debuginfo-3.2-7gt.x86_64
globus-ftp-client-debuginfo-7.6-1gt.x86_64
globus-gsi-sysconfig-debuginfo-5.3-8.el6.x86_64
globus-gsi-sysconfig-5.3-8.el6.x86_64
globus-xio-gsi-driver-2.4-1gt.x86_64
globus-gfork-3.2-7gt.x86_64
globus-xio-popen-driver-2.3-7gt.x86_64
globus-rsl-debuginfo-9.1-12gt.x86_64
globus-gsi-proxy-ssl-debuginfo-4.1-12gt.x86_64
globus-gridmap-callout-error-debuginfo-1.2-10gt.x86_64
globus-gram-job-manager-debuginfo-13.54-1gt.x86_64
globus-gass-server-ez-debuginfo-4.3-6gt.x86_64
globus-authz-debuginfo-2.2-9gt.x86_64
globus-gram-protocol-debuginfo-11.3-11.el6.x86_64
globus-gss-assist-9.0-2gt.x86_64
globus-authz-callout-error-2.2-9gt.x86_64
globus-xio-pipe-driver-2.2-7gt.x86_64
globus-gsi-callback-debuginfo-4.6-2.el6.x86_64
globus-gass-copy-debuginfo-8.6-7.el6.x86_64
globus-gass-copy-progs-8.6-7.el6.x86_64
globus-proxy-utils-5.2-1gt.x86_64
globus-xio-pipe-driver-debuginfo-2.2-7gt.x86_64
globus-io-debuginfo-9.6-1gt.x86_64
globus-gridmap-eppn-callout-debuginfo-0.6-1gt.x86_64
globus-gram-job-manager-fork-debuginfo-1.5-9gt.x86_64
globus-gram-client-tools-debuginfo-10.5-1gt.x86_64
globus-gass-cache-debuginfo-8.1-10gt.x86_64
globus-callout-debuginfo-2.5-1gt.x86_64
globus-gsi-openssl-error-2.1-13gt.x86_64
globus-gsi-cert-utils-8.6-2gt.x86_64
globus-gsi-proxy-core-6.3-1gt.x86_64
globus-xio-3.9-1gt.x86_64
globus-ftp-control-4.8-1gt.x86_64
globus-authz-2.2-9gt.x86_64
globus-common-progs-14.12-1ggt.x86_64
globus-gridftp-server-control-2.10-1gt.x86_64
globus-gass-copy-8.6-7.el6.x86_64
globus-gridftp-5.2.2-1gt.x86_64
globus-gsi-credential-debuginfo-6.0-2.el6.x86_64
globus-gass-transfer-debuginfo-7.2-9.el6.x86_64
globus-gsi-proxy-ssl-4.1-12gt.x86_64
globus-gsi-callback-4.6-2.el6.x86_64
globus-gssapi-gsi-10.12-3gt.x86_64
globus-gss-assist-progs-9.0-2gt.x86_64
globus-xio-popen-driver-debuginfo-2.3-7gt.x86_64
globus-usage-debuginfo-3.1-10gt.x86_64
globus-openssl-module-debuginfo-3.3-4gt.x86_64
globus-gss-assist-debuginfo-9.0-2gt.x86_64
globus-gsi-openssl-error-debuginfo-2.1-13gt.x86_64
globus-gridmap-verify-myproxy-callout-debuginfo-1.5-1gt.x86_64
globus-gridftp-server-control-debuginfo-2.10-1gt.x86_64
globus-gram-job-manager-lsf-debuginfo-1.3-1gt.x86_64
globus-gram-job-manager-callout-error-debuginfo-2.1-12gt.x86_64
globus-gatekeeper-debuginfo-9.16-1gt.x86_64
globus-gass-cache-program-debuginfo-5.2-3gt.x86_64
globus-common-debuginfo-14.12-1ggt.x86_64
globus-gssapi-error-4.1-12gt.x86_64
globus-gass-transfer-7.2-9.el6.x86_64
globus-gridftp-server-6.43-1gt.x86_64
globus-xioperf-debuginfo-3.1-7gt.x86_64
globus-proxy-utils-debuginfo-5.2-1gt.x86_64
globus-gsi-proxy-core-debuginfo-6.3-1gt.x86_64
globus-gridftp-server-debuginfo-6.43-1gt.x86_64
globus-gram-job-manager-condor-debuginfo-1.4-4gt.x86_64
globus-authz-callout-error-debuginfo-2.2-9gt.x86_64
globus-gram-job-manager-pbs-debuginfo-1.6-7.el6.x86_64
globus-common-14.12-1ggt.x86_64
globus-gsi-credential-6.0-2.el6.x86_64
globus-callout-2.5-1gt.x86_64
globus-gridftp-server-progs-6.43-1gt.x86_64
globus-xio-gsi-driver-debuginfo-2.4-1gt.x86_64
globus-gssapi-gsi-debuginfo-10.12-3gt.x86_64
globus-gsi-cert-utils-debuginfo-8.6-2gt.x86_64
globus-gram-job-manager-sge-debuginfo-1.7-2gt.x86_64
globus-gram-client-debuginfo-12.4-8gt.x86_64
globus-ftp-control-debuginfo-4.8-1gt.x86_64
globus-scheduler-event-generator-debuginfo-4.7-8.el6.x86_64
globus-openssl-module-3.3-4gt.x86_64
globus-io-9.6-1gt.x86_64
globus-usage-3.1-10gt.x86_64
globus-ftp-client-7.6-1gt.x86_64


Reply 1:
GridFTP supports shell globbing with asterisk, but not regular expressions, with fnmatch. Any testing you do with fnmatch() should give you viable paths to give to GridFTP.
That means that patterns of the form ".../*/*/..." should all be supported.

However, there are some bugs (with fixes on the way) relating to symlink handling in RestrictPaths. Are there links in the paths you're working with, perhaps after the globbing pattern?

Reply 2:
There are symlinks. /mnt/b/projects -> /projects. So we need to share out /projects/sciteam//share, the real path is /mnt/b/projects/sciteam//share.

This works:

"sharing_rp N/,R/projects/sciteam/xxx/share/".

These do not work:

sharing_rp N/,R/mnt/b/projects/sciteam/xxx/share/
sharing_rp N/,R/mnt/b/projects/sciteam/*/share/
sharing_rp N/,R/projects/sciteam/*/share/

Any reason fnmatch doesn't use FNM_PATHNAME?

Reply 3:
There is an upcoming update that fixes a problem when paths contain symlinks. However, it should work if you add both the symlinked and true path to the RP list.

sharing_rp R/mnt/b/projects/sciteam/*/share/,R/projects/sciteam/*/share/

The update also changes to using FNM_PATHNAME, though I don't believe that is the problem here.

Reply 4:
That has helped with setting up sharing on HPSS and Blue Waters. Thanks.

Blue Waters:

sharing_rp N/,R/projects/sciteam/*/share/,R/mnt/b/projects/sciteam/*/share

HPSS (no symlink magic):

sharing_rp N/,R/projects/sciteam/*/share

User arnoldg can create shares (arnoldg#onlinetest and arnoldg#nearlinetest) but I (user jasonalt) can not access them (Permission Denied). Looking at the debug info available through the "Transfer Files" web interface:

"Command Failed: Error (list) Server: arnoldg#onlinetest (ie25.ncsa.illinois.edu:2811) Command: MLST / Message: Fatal FTP Response --- 500 Command failed : Path not allowed."

arnoldg has allowed me explicitly, anonymous access and sent me an email invite, still with the same result "Permission Denied".

Reply 5:

Explicitly adding his share path to sharing_rp makes it work:

sharing_rp N/,R/projects/sciteam/*/share,R/projects/sciteam/jn7/share

So it appears to be a problem with the wildcard

Reply 6:
I seem to have this working now on nearline and bluewaters but it is not pretty. On both systems, I wanted to share /projects/*/*/share. nearline does not have symlinks in the path. Bluewaters does (/mnt/b/project => /projects).

In looking at globus_i_gfs_data_check_path(), it seems there is a problem with comparing the true_path/check_path/start_path to the alias (active_rp_list / rp_list) because start_path is a concatenation of chroot_path and in_path, so in other words, it ends with a trailing '/'. Combine that with the fact that the alias has wildcards and you can follow the failure path through globus_i_gfs_data_check_path().

I debugged this on nearline which does have the HPSS DSI but hopefully the attachment will still be of use. I set up the debugger to print the stack trace, session_handle, chroot_path, active_rp_list and rp_list. The final call to globus_i_gfs_data_check_path() fails.

Once I added a trailing '/' to each entry in sharing_rp, I was able to access the share but I was not able to descend into sub directories. For that to work, I had to append '*' to each entry in sharing_rp. Then I had to append '/' to make the matching work.

So, I would have expected nearline to work with 'sharing_rp N/,R/projects/*/*/share'. Instead I have to use this:

sharing_rp N/,R/projects/*/*/share/,R/projects/*/*/share,R/projects/*/*/share/*,R/projects/*/*/share/*/

On bluewaters, with the symlink in the path, I have to use this:

sharing_rp N/,R/projects/*/*/share,R/projects/*/*/share/,R/projects/*/*/share/*,R/projects/*/*/share/*/,R/mnt/b/projects/*/*/share,R/mnt/b/projects/*/*/share/,R/mnt/b/projects/*/*/share/*,R/mnt/b/projects/*/*/share/*/
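
The trailing-slash and subdirectory behaviour described in Reply 6 can be reproduced with plain glob matching. This sketch uses Python's fnmatch module, which is a separate implementation from the C fnmatch() that GridFTP calls; like fnmatch() without FNM_PATHNAME, its `*` also matches '/'. It illustrates only the pattern matching, not the logic of globus_i_gfs_data_check_path().

```python
from fnmatch import fnmatchcase

pattern = "/projects/*/*/share"

# A path with a trailing '/' does not match a pattern that lacks one.
print(fnmatchcase("/projects/sciteam/jn7/share", pattern))         # True
print(fnmatchcase("/projects/sciteam/jn7/share/", pattern))        # False
print(fnmatchcase("/projects/sciteam/jn7/share/", pattern + "/"))  # True

# Entries below the share directory need a trailing wildcard as well.
print(fnmatchcase("/projects/sciteam/jn7/share/data", pattern))           # False
print(fnmatchcase("/projects/sciteam/jn7/share/data", pattern + "/*"))    # True
print(fnmatchcase("/projects/sciteam/jn7/share/data/", pattern + "/*/"))  # True
```

This matches the observation that four variants of each entry (plain, with '/', with '/*', and with '/*/') were needed to cover every path form that was checked.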

Comments

Globus Toolkit/GT-592

Summary

Allow for extra ssh options when initiating gridftp-ssh transfers

Details

Type: Improvement

Status: Open

Description

When doing GridFTP-SSH transfers, it would be nice if the user could pass in extra ssh options. My use case is remote workflow jobs, which makes it difficult to control ssh via for example the .ssh/config file. It is common for us to specify options like what ssh key to use (-i) and -o StrictHostKeyChecking=no.

A simple fix would be to expose this via a new environment variable: $GLOBUS_SSHFTP_EXTRA_OPTS. For example:

diff --git a/gridftp/client/source/gridftp-ssh.in b/gridftp/client/source/gridftp-ssh.in
index 5db1863..7dbe15e 100644
--- a/gridftp/client/source/gridftp-ssh.in
+++ b/gridftp/client/source/gridftp-ssh.in
@@ -44,4 +44,4 @@ if [ "X" != "X$GLOBUS_SSHFTP_PRINT_ON_CONNECT" ]; then
     echo "Connecting to $1 ..." >/dev/tty
 fi

-exec @SSH_BIN@ $port_str $remote_host $remote_program
+exec @SSH_BIN@ $GLOBUS_SSHFTP_EXTRA_OPTS $port_str $remote_host $remote_program

Comments

Globus Toolkit/GT-593

Summary

Would like to be able to set gridftp server umask in /etc/gridftp.conf file

Details

Type: Improvement

Status: Open

Description

https://globusonline.zendesk.com/agent/tickets/303623

The user in this ticket is reporting that the default umask that gridftp is running with doesn't meet their needs. They have to hand edit the start up script to change this. They'd like for /etc/gridftp.conf to have a config option that would allow setting the umask.

Comments

Globus Toolkit/GT-594

Summary

Add keep-alive messages to GridFTP control channel

Details

Type: Improvement

Status: Open

Description

See EGCF ticket
https://rt.ige.psnc.pl/rt/Ticket/Display.html?id=381

See EGI ticket
https://ggus.eu/index.php?mode=ticket_info&ticket_id=112473

Summary: modern routing hardware drops inactive connections after a few minutes. This may block the control channel in GridFTP in case of long-running transfers. Then the transfer cannot complete properly.

Detailed description:
Our ICT department confirmed that they recover idle connections after 5 minutes. But keepalive settings should solve the issue. I went back to the keepalive settings (which I already tried before without success) and set very aggressive configurations, with keepalives being sent every 15s after 30s of idleness.

$ sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 20

Then, I made some tcpdumps, and realized that those keepalive packets were not being sent. Running 'netstat --timers' showed that timers were off for the gridftp connections, while for other applications they were on:

Proto Recv-Q Send-Q Local Address   Foreign Address  State        PID/Program name     Timer
(...)
tcp        0      0 Y.Y.Y.Y:22      X.X.X.X:41183    ESTABLISHED  3906/sshd            keepalive (6948.79/0/0)
(...)
tcp        0      0 Z.Z.Z.Z:51079   K.K.K.K:23375    ESTABLISHED  3964/globus-url-cop  off (0.00/0/0)
tcp        0      0 Z.Z.Z.Z:45386   M.M.M.M:2811     ESTABLISHED  3964/globus-url-cop  off (0.00/0/0)
(...)
tcp        0      0 Y.Y.Y.Y:22      X.X.X.X:41181    ESTABLISHED  3749/sshd            keepalive (6940.96/0/0)


This means that the kernel is doing its job, but the application does not seem to speak the keepalive language...

Then, I went to the keepalive project [1]. It is possible to add keepalive support to a binary simply by preloading a library. They make available a "shared library that overrides the socket system call in most binaries, without the need to recompile or modify them. The technique is based on the preloading feature of the ld.so(8) loader included in Linux, which allows you to force the loading of shared libraries with higher priority than normal. Programs usually use the socket(2) function call located in the glibc shared library; with libkeepalive you can wrap it and inject the setsockopt (2) just after the socket creation, returning a socket with keepalive already set to the main program."

I have downloaded it, compiled it and launched the globus-url-copy as

$ LD_PRELOAD=/tmp/libkeepalive.so globus-url-copy -dbg gsiftp://:2811//my/domain/atlas/disk-only/atlasscratchdisk/rucio/panda/f8/62/panda.um.user.williams.5206161._001779.physics.root /dev/null

And now, running 'netstat --timers' shows that keepalive is being used by the application:

tcp 0 0 Z.Z.Z.Z:57632 K.K.K.K:20241 ESTABLISHED 11615/globus-url-co keepalive (28.74/0/0)
tcp 0 0 Z.Z.Z.Z:39697 M.M.M.M:2811 ESTABLISHED 11615/globus-url-co keepalive (13.40/0/0)

and finally, the damn transfer succeeded:

debug: response from gsiftp://:2811//my/domain/atlas/disk-only/atlasscratchdisk/rucio/panda/f8/62/panda.um.user.williams.5206161._001779.physics.root:
226 Transfer complete.
debug: data callback, no error, buffer 0x7fd4c51a6010, length 377513, offset=212860928, eof=true
debug: operation complete

So, the conclusion was that globus-url-copy is not prepared to deal with keepalives, and we are seeing it here because we have a very restrictive firewall setting that reclaims idle connections.

I am using the following version.

$ which globus-url-copy
/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/emi/3.14.0-1_v4.sl6/usr/bin/globus-url-copy

This makes me think (again) that the problem lies in the control channel, which is kept idle for too long (when we are using only one stream) while the transfer is happening on the data channel. Some routers and firewalls in between may want to reclaim the connection if nothing is going on there. When I saw this, I thought I would be able to fix the issue by playing with the TCP keepalive parameters, as Pablo suggested, but that did not solve the problem.

It certainly makes sense to add (optional) keep-alive to all Globus components that work with connections which can be idle for a long time but cannot handle timeouts and/or reconnects internally (the Globus XIO layer already supports a keepalive option, and GASS uses it for HTTP transfers).
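
The observation in this report (sysctl values set, but the timer shown as "off") is consistent with the kernel keepalive settings only applying to sockets that have SO_KEEPALIVE enabled. A minimal sketch of enabling it per socket follows. It is written in Python for illustration and is not the Globus XIO API or the change made in globus-ftp-control; enable_keepalive is a hypothetical helper, and its default values simply mirror the sysctl settings quoted above.

```python
import socket


def enable_keepalive(sock, idle=30, interval=15, probes=20):
    """Turn on TCP keepalive for one socket.

    SO_KEEPALIVE is portable; the per-socket timing options are
    platform-specific, so each is applied only where it exists.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # A non-zero value means keepalive is enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Without the per-socket timing options, a socket with SO_KEEPALIVE set falls back to the system-wide values such as net.ipv4.tcp_keepalive_time on Linux.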

Comments

Mattias Ellert - 2015-08-26

EGCF, as a support unit in GGUS, gets reminders about providing feedback regarding this issue. Is there any information from the Globus developers about this issue?

Stuart Martin - 2015-10-02

Thanks for the bug/feature report.  We'll try and get this added soon.

Mike Link - 2015-10-26

globus-ftp-control-6.8 has been released and enables keepalives on the control channel.

Globus Toolkit/GT-595

Summary

Remove GRAM slurm option: SBATCH -l h_cpu

Details

Type: Bug

Status: Resolved 2015-05-08

Description

From: David Carver

In slurm.pm, the cpu_time directive "#SBATCH -l h_cpu=" is not valid on Stampede and should be removed. I think this was a carryover from SGE.

'cpu_time', sub { "#SBATCH -l h_cpu=" . format_time_value(shift) }],

Example

login5.stampede(6)$ cat slurm.job.35402
#!/bin/bash
# Grid Engine batch job script built by Globus job manager

#SBATCH -N 4
#SBATCH -n 4
#SBATCH -p development
#SBATCH -A TG-STA110011S
#SBATCH -t 0:5:00
#SBATCH -l h_cpu=0:5:00



login5.stampede(7)$ sbatch slurm.job.35402
-----------------------------------------------------------------
             Welcome to the Stampede Supercomputer
-----------------------------------------------------------------

sbatch: invalid option -- 'l'
sbatch: error: Try "sbatch --help" for more information

Comments

Joe Bester - 2015-05-08

A patch to remove that is in the new slurm LRM package: http://toolkit.globus.org/ftppub/gt6/packages/globus_gram_job_manager_slurm-2.6.tar.gz
and binary packages are available from the GT RPM/Deb repositories.

Globus Toolkit/GT-596

Summary

Add ability to set cipher suite in GSSAPI

Details

Type: New Feature

Status: Resolved 2015-06-18

Description

New feature: OpenSSL encryption ciphers allowed by Globus.org and Globus Toolkit clients and services. Globus should prevent the use of weak, vulnerable ciphers during encryption negotiation when using OpenSSL for SSL/TLS.

Add an option to Globus to allow the HIGH cipher list to be specified, rather than DEFAULT. This could be a runtime option to the Globus libraries that use OpenSSL.
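
A minimal sketch of what selecting an OpenSSL cipher string does, using Python's ssl module as a stand-in. This is not the Globus GSSAPI configuration interface, which this ticket does not describe; cipher_names is a hypothetical helper. It lists which suites the local OpenSSL build enables for DEFAULT versus HIGH. The result depends on the OpenSSL version, and on recent builds the two lists may differ very little.

```python
import ssl


def cipher_names(cipher_string):
    """Return the cipher suite names OpenSSL enables for a cipher string."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.set_ciphers(cipher_string)   # raises ssl.SSLError if nothing matches
    return {c["name"] for c in ctx.get_ciphers()}


if __name__ == "__main__":
    default = cipher_names("DEFAULT")
    high = cipher_names("HIGH")
    print("DEFAULT: %d suites, HIGH: %d suites" % (len(default), len(high)))
    print("in DEFAULT but not HIGH:", sorted(default - high))
```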

Comments

Joe Bester - 2015-06-18

Fixed in GT 5.2 branch: globus_gssapi_gsi 10.13
Fixed in GT 6 branch: globus_gssapi_gsi 11.17

Globus Toolkit/PM-252

Summary

User wants share file in ~/.globus/sharing on an endpoint to get deleted when a share is deleted

Details

Type: Product Feature

Status: Open