File Operations
1. Overview
This document describes synchronous operations that can be performed on a collection.
The operations described in this document are short foreground operations that don’t return data until they complete or encounter an error. The API resources for these operations have the prefix '/operation/'. The prefix indicates that they involve communication with the collection, and could raise errors related to network communication or authentication failures.
Long-running operations, including delete and transfer, are documented elsewhere. They result in the creation of a task to track progress. See Task Submission for details.
2. Path Encoding
For maximum compatibility with different filesystems, it’s recommended to use only ASCII characters in file and directory names. If using other characters is absolutely necessary, all systems involved should be configured to use UTF-8 encoding when possible. See platform notes below.
The API uses JSON as its data format, and all strings in JSON are Unicode. Since Linux filesystems and the GridFTP protocol use raw bytes for path names, the Transfer API service must decode the bytes in order to display them as characters. It does this assuming UTF-8 encoding, which is the most common encoding on Linux systems.
2.1. Invalid Path Names
Globus does not allow the string "\r\n" in any file or directory names, and passing such a path will result in an error from the API.
Depending on the collection’s underlying filesystem, other characters or strings may be disallowed. For example, Windows filesystems do not allow several common punctuation characters, including '<', '>', and '*'. Globus attempts to classify such errors with code InvalidPath, but there may be combinations of GridFTP server and filesystem that result in a generic EndpointError[1] code.
2.2. Linux and Unix
In Linux and Unix filesystems, file names are stored as raw bytes. The common case is that the bytes will be UTF-8 encoded Unicode, but it depends on user space configuration, which can be set system wide and overridden by individual users or even by individual applications or login shells. On most modern Linux systems, UTF-8 will be used everywhere unless a user goes out of their way to use something else. Transferring data between two such Linux systems using UTF-8 encoding is the best-case scenario: no path name corruption will occur.
2.3. Windows
Windows systems use UTF-16 encoding but do not enforce any particular normalization. It is recommended that Windows users limit themselves to ASCII characters for file names. Non-ASCII special characters, accented characters, and non-English characters could be incorrectly encoded, resulting in file name corruption. We plan on fixing this in a future update to Globus Connect Personal for Windows, by having the GridFTP server convert everything to/from UTF-8 for communication with Globus. Please contact support@globus.org if you have concerns.
2.4. Mac OS X
Mac OS X uses UTF-8 by default, but HFS+ also forces NFD normalization. This can cause path name corruption when copying files with non-ASCII names from Linux or Windows systems.
The new file system, APFS, will not force NFD normalization, which fixes the most common cause of name mangling (a single file). However, there is still a potential issue: two file names that differ only in normalization are allowed on Linux and Windows, but will alias a single file on Mac APFS because it is normalization-insensitive (an issue similar to case-insensitivity).
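The normalization issue described above can be illustrated with a short Python sketch. It uses only the standard `unicodedata` module; the file name `café.txt` and the helper `same_name_ignoring_normalization` are hypothetical and chosen for illustration.

```python
import unicodedata

# The same visible name in two normalization forms:
# NFC stores "é" as one code point, NFD as "e" plus a combining accent.
name_nfc = unicodedata.normalize("NFC", "caf\u00e9.txt")
name_nfd = unicodedata.normalize("NFD", "caf\u00e9.txt")

# The names render identically but differ as code points and as UTF-8 bytes,
# so a byte-oriented filesystem treats them as two distinct names.
print(name_nfc == name_nfd)            # False
print(len(name_nfc.encode("utf-8")))   # 9
print(len(name_nfd.encode("utf-8")))   # 10


def same_name_ignoring_normalization(a: str, b: str) -> bool:
    """Return True if two names differ at most in Unicode normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A normalization-insensitive filesystem would treat these as the same file.
print(same_name_ignoring_normalization(name_nfc, name_nfd))  # True
```

A check like this could be run before a transfer to spot names that would collide on a normalization-insensitive destination.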
3. Document Types
3.1. Result
The "result" family of document types, which includes resource-specific result types like "mkdir_result", represents the result of a foreground operation. If the operation fails, an error result will be returned. Some operations have multiple success cases.
Field Name | JSON Type | Description |
---|---|---|
DATA_TYPE | string | Has value "result" or "(subtype)_result" to indicate a result family document type. Some result subtypes have additional fields. |
code | string | Code indicating how the operation succeeded. Depends on the specific operation. |
message | string | Message describing how the operation succeeded in more detail. |
resource | string | Path relative to the API version root of the request. |
request_id | string | ID of the request, which can be used by Globus admins to look up the request in the server logs. Useful when submitting support requests or posting to the mailing list. |
{
"DATA_TYPE": "mkdir_result",
"code": "DirectoryCreated",
"message": "The directory was created successfully",
"request_id": "ABCdef789",
"resource": "/operation/endpoint/6c54cade-bde5-45c1-bdea-f4bd71dba2cc/mkdir"
}
3.2. file_list Document
Field Name | JSON Type | Description |
---|---|---|
DATA_TYPE | string | Always has value "file_list" to indicate this document type. |
endpoint[1] | string | The collection ID that was requested. |
path | string | The path that was listed; may start with |
absolute_path | string | The path that was listed. This field will not include the host path for guest collections; it is always a virtual root based path. This field is not a "physical path", meaning it does not resolve symlinks like "pwd -P". |
rename_supported | bool | Indicates if the collection supports rename operations. This does not necessarily mean the current user has authorization to rename a file. |
symlink_supported | bool | Indicates if the collection supports creating symbolic links. |
DATA | list | List of "file" documents. |
{
"DATA_TYPE": "file_list",
"path": "/~/path/to/dir",
"endpoint": "5d3c6c59-5244-11e5-84dd-22000bb3f45d",
"rename_supported": true,
"symlink_supported": true,
"DATA": [
{
"DATA_TYPE": "file",
...
},
...
]
}
3.3. File Document
Field Name | JSON Type | Description |
---|---|---|
DATA_TYPE | string | Always has value "file" to indicate this document type. |
name | string | The name of this entry in the filesystem |
type | string | The type of the entry: "dir", "file", or "invalid_symlink". For unix special files "chr", "blk", "pipe", or "other". If this entry is a valid symlink, the If this entry is an invalid symlink, the |
link_target | string | If this entry is a symlink (valid or invalid), this is the path of its target, which may be an absolute or relative path. If this entry is not a symlink, this field is null. |
permissions | string | The unix permissions, as an octal mode string. |
size | int | The file size in bytes. |
user | string | The user owning the file or directory, if applicable on the collection’s filesystem. |
group | string | The group owning the file or directory, if applicable. |
last_modified | string | The date and time the file or directory was last modified, in modified ISO 8601 format: YYYY-MM-DD HH:MM:SS+00:00, i.e. using space instead of "T" to separate date and time. Always in UTC, indicated explicitly with a trailing "+00:00" timezone. |
link_size, link_user, link_group, link_last_modified | various | If this entry is a symlink (valid or invalid), these fields show attributes of the symlink itself, not its target. Same format as the |
{
"DATA_TYPE": "file",
"name": "somefile",
"type": "file",
"user": "auser",
"group": "agroup",
"permissions": "0644",
"last_modified": "2000-01-02 03:45:06+00:00",
"link_target": null,
"size": 1024
}
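As a minimal sketch of consuming a "file" document, the Python below parses the `last_modified` and `permissions` fields of the example above using only the standard library. The helper names `parse_last_modified` and `parse_permissions` are hypothetical and not part of the API.

```python
import json
from datetime import datetime

# Sample "file" document copied from the example above.
file_doc = json.loads("""
{
  "DATA_TYPE": "file",
  "name": "somefile",
  "type": "file",
  "user": "auser",
  "group": "agroup",
  "permissions": "0644",
  "last_modified": "2000-01-02 03:45:06+00:00",
  "link_target": null,
  "size": 1024
}
""")


def parse_last_modified(value: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS+00:00' into a timezone-aware datetime."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S%z")


def parse_permissions(value: str) -> int:
    """Convert an octal mode string such as '0644' into an integer mode."""
    return int(value, 8)


modified = parse_last_modified(file_doc["last_modified"])
mode = parse_permissions(file_doc["permissions"])

print(modified.isoformat())   # 2000-01-02T03:45:06+00:00
print(oct(mode))              # 0o644
print(bool(mode & 0o200))     # True: the owner write bit is set
```

Because the timestamp carries an explicit "+00:00" offset, the parsed value is timezone-aware and can be compared safely with other aware datetimes.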
5. Common Query Parameters
Name | Type | Description |
---|---|---|
fields | string | Comma separated list of fields to include in the response. This can be used to save bandwidth on large list responses when not all fields are needed. |
6. Common Errors
The error code can be found in the HTTP response body JSON document. See error overview.
Code | HTTP Status | Description |
---|---|---|
ServiceUnavailable | 503 | The service is down for maintenance. |
OperationPaused | 409 | An administrator of the endpoint or collection has set a pause rule for the operation. The error response will include a 'pause_message' string field that contains a message from the administrator about why the pause rule was set. |
ConsentRequired | 403 | The collection requires consent to a data_access scope missing from the user’s current consents. See Data Access Consent for details. |
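A client might branch on these codes roughly as sketched below. This is an illustration under assumptions: it assumes the error document exposes the code in a field named "code" (as result documents do), and the function `describe_error` and the sample body are hypothetical.

```python
import json


def describe_error(http_status: int, body: str) -> str:
    """Map a common error response to a short, human-readable action."""
    doc = json.loads(body)
    # Assumption: the error code lives in a top-level "code" field.
    code = doc.get("code", "Unknown")
    if code == "ServiceUnavailable":
        return "service down for maintenance; retry later"
    if code == "OperationPaused":
        # 'pause_message' carries the administrator's explanation.
        return "paused by administrator: " + doc.get("pause_message", "(no message)")
    if code == "ConsentRequired":
        return "consent to the data_access scope is required"
    return f"unhandled error {code} (HTTP {http_status})"


# Hypothetical response body, for illustration only.
sample_body = json.dumps({
    "code": "OperationPaused",
    "message": "The operation is paused",
    "pause_message": "Storage maintenance",
})
print(describe_error(409, sample_body))
# paused by administrator: Storage maintenance
```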
7. Operations
7.1. List Directory Contents
List the contents of the directory at the specified path on a collection.
The path is specified in the path query parameter. If the parameter is not passed, the default path depends on the type of collection:
- For guest collections, the default is '/'.
- For mapped collections, the default is '/~/'. Most of the time this will be the mapped user’s home directory.
Results can be paged, sorted, and filtered. By default all entries up to the 100,000 entry limit are returned, sorted by (type, name).
URL | /operation/endpoint/<collection_id>/ls [?path=/path/to/dir/][1] |
---|---|
Method | GET |
7.1.1. Directory Listing Query Parameters
Name | Type | Description |
---|---|---|
path | string | Path to a directory on the collection to list. Non-absolute paths are treated as relative to |
show_hidden | boolean | If |
limit | int | Change the page size. Defaults to 100,000, which is also the maximum. Note that the entire directory is still fetched from the collection on every request. This is because the GridFTP protocol does not support paging, so paging must be handled by the Transfer service. |
offset | int | If using a |
orderby | string | A comma separated list of order by options. Each order by option is either a field name, or a field name followed by space and 'ASC' or 'DESC' for ascending and descending; ascending is the default. For the directory listing results, any "file" document field can be used in the |
filter | string | Return only file documents that match the filter clauses specified in this string. This parameter can be passed multiple times. See Directory Listing Filtering for details. |
local_user | string | Optional value passed to identity mapping specifying which local user account to map to. Only usable with Globus Connect Server v5 mapped collections. |
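The query parameters above can be combined into a request URL as in the following sketch. The base URL is a hypothetical placeholder, and `build_ls_url` is an illustrative helper rather than part of any official client. The `filter` parameter is emitted once per filter string, since it can be passed multiple times.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical API root; substitute the real one for your deployment.
BASE_URL = "https://transfer.example.org/v0.10"


def build_ls_url(collection_id, path=None, limit=None, offset=None,
                 orderby=None, filters=()):
    """Build a directory listing URL; omitted parameters use server defaults."""
    params = []
    if path is not None:
        params.append(("path", path))
    if limit is not None:
        params.append(("limit", str(limit)))
    if offset is not None:
        params.append(("offset", str(offset)))
    if orderby is not None:
        params.append(("orderby", orderby))
    for filter_string in filters:
        # Repeating the parameter expresses a logical OR across filters.
        params.append(("filter", filter_string))
    url = f"{BASE_URL}/operation/endpoint/{collection_id}/ls"
    return f"{url}?{urlencode(params)}" if params else url


url = build_ls_url(
    "5d3c6c59-5244-11e5-84dd-22000bb3f45d",
    path="/~/path/to/dir/",
    limit=1000,
    offset=2000,
    orderby="size DESC,name",
    filters=["type:dir", "name:~*.txt"],
)

# Decode the query again to confirm that nothing was lost in encoding.
query = parse_qs(urlsplit(url).query)
print(query["filter"])    # ['type:dir', 'name:~*.txt']
print(query["orderby"])   # ['size DESC,name']
```

Letting `urlencode` handle escaping avoids hand-built query strings, which matters for paths and patterns that contain characters such as '/', '*', or spaces.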
7.1.2. Directory Listing Filtering
An individual filter parameter for directory listing is made up of filter clauses separated by forward slashes. Each clause starts with a field from the Directory Listing Response followed by a colon and filter syntax dependent on the field chosen. An item must match every clause given to be matched by that filter.
For example, "name:~.*/type:dir" would match only hidden directories.
The filter parameter can be passed multiple times to allow a logical OR across the different filter clauses. An item will be included in the response if it matches at least one of the filters.
For example, "filter=type:dir&filter=name:~*.txt" would match both directories and txt files.
String Fields
String fields such as name and type accept comma separated lists of patterns. An item matching any of the patterns is considered matching the clause. Patterns start with special characters to determine how they are applied. If no character is given, = is assumed.
- = requires the strings match exactly.
- ~ matches against a pattern that can include * and ? as wildcards. * will match any number of other characters. ? will match any single character.
- ! is the inverse of =, allowing any string that doesn’t match exactly.
- !~ is the inverse of ~, allowing any string that doesn’t match the pattern.
Some examples:
- "type:=file" or just "type:file" would filter out any items that are not files.
- "name:~*.txt,~*.pdf" would filter out any items that do not end with the .txt or .pdf extensions.
- "user:!alice/user:!bob" would filter out any items owned by local users named "alice" or "bob". Note that this is made up of two separate filter clauses, since "user:!alice,!bob" would match all items.
- "name:!~.*" would filter out hidden items.
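The matching rules for string fields can be modelled locally as sketched below. This is an approximation written from the description above, not the service's implementation. The function names are hypothetical, and `*` and `?` are the only wildcards handled.

```python
import re


def _wildcard_regex(pattern):
    """Translate '*' and '?' wildcards into an anchored regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches_pattern(value, pattern):
    """Apply one pattern; the prefix selects the comparison."""
    if pattern.startswith("!~"):
        return _wildcard_regex(pattern[2:]).fullmatch(value) is None
    if pattern.startswith("~"):
        return _wildcard_regex(pattern[1:]).fullmatch(value) is not None
    if pattern.startswith("!"):
        return value != pattern[1:]
    if pattern.startswith("="):
        return value == pattern[1:]
    return value == pattern  # no prefix given, "=" is assumed


def matches_clause(value, patterns):
    """A clause matches if ANY of its comma separated patterns matches."""
    return any(matches_pattern(value, p) for p in patterns.split(","))


# One clause "!alice,!bob" matches everyone: "alice" is not "bob".
print(matches_clause("alice", "!alice,!bob"))                        # True
# Two clauses must BOTH match, which excludes alice and bob.
print(all(matches_clause("alice", c) for c in ["!alice", "!bob"]))   # False
print(matches_clause("notes.txt", "~*.txt,~*.pdf"))                  # True
print(matches_clause(".bashrc", "!~.*"))                             # False
```

The first two calls show why excluding several values requires separate clauses joined by '/' rather than a comma separated list inside one clause.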
size
The size field supports a comma separated list of comparison operators along with an integer value in bytes. An item matching any of the operations is considered matching the clause.
The supported comparison operators are: =, !, <, >, <=, and >=. If no operator is given, = is assumed.
Some examples:
- "size:=1,=2,=3" or "size:1,2,3" would filter out any items that weren’t 1, 2, or 3 bytes in size.
- "size:!0" would filter out any items that were 0 bytes in size.
- "size:>=500/size:<1000" would filter out any items that were not between 500 and 1000 bytes in size, keeping 500 byte items but excluding 1000 byte items. Note that this is made up of two separate filter clauses, since "size:>=500,<1000" would match all items.
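The difference between one clause with a comma and two clauses joined by '/' can be checked with a small local model of the size rules. This is a sketch based on the description above, and all function names are hypothetical.

```python
import operator

# Two-character operators come first so ">=" is not read as ">".
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("=", operator.eq),
    ("!", operator.ne),
    ("<", operator.lt),
    (">", operator.gt),
]


def matches_size_term(size, term):
    """Apply one comparison such as '>=500' to a size in bytes."""
    for symbol, compare in _OPERATORS:
        if term.startswith(symbol):
            return compare(size, int(term[len(symbol):]))
    return size == int(term)  # no operator given, "=" is assumed


def matches_size_clause(size, clause):
    """A clause matches if ANY of its comma separated terms matches."""
    return any(matches_size_term(size, t) for t in clause.split(","))


def matches_size_filter(size, clauses):
    """A filter matches only if EVERY clause matches."""
    return all(matches_size_clause(size, c) for c in clauses)


# Two clauses: a half-open range from 500 up to, but excluding, 1000 bytes.
print(matches_size_filter(500, [">=500", "<1000"]))    # True
print(matches_size_filter(1000, [">=500", "<1000"]))   # False
# One clause with a comma: every size satisfies at least one term.
print(matches_size_clause(5, ">=500,<1000"))           # True
print(matches_size_clause(5000, ">=500,<1000"))        # True
```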
last_modified
The last_modified field supports a comma separated date range with dates specified in ISO 8601 format. Either end of the date range may be left out to specify an open range. If no comma is given the range defaults to after the given time.
Some examples:
- "last_modified:2020-01-01," or "last_modified:2020-01-01" would list only items that were last modified on or after Jan 1, 2020.
- "last_modified:,2021-01-01" would list only items that were last modified before Jan 1, 2021.
- "last_modified:2020-01-01,2021-01-01" would list only items that were last modified in 2020.
7.1.3. Directory Listing Response
The response is a "file_list" document, containing a list of "file" documents, and some additional directory-level fields. Each "file" document represents a single file or directory. See the "Document Types" section for details.
{
"DATA_TYPE": "file_list",
"path": "/~/path/to/dir/",
"endpoint": "5d3c6c59-5244-11e5-84dd-22000bb3f45d",
"rename_supported": true,
"symlink_supported": true,
"DATA": [
{
"DATA_TYPE": "file",
"name": "somefile",
"type": "file",
"link_target": null,
"user": "auser",
"group": "agroup",
"permissions": "0644",
"last_modified": "2000-01-02 03:45:06+00:00",
"size": 1024
}
]
}
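A listing response can be processed as in the sketch below, which groups entries by type and totals file sizes. The first entry is taken from the example above. The second entry ("subdir") and the helper `summarize` are hypothetical additions for illustration.

```python
import json

response_text = """
{
  "DATA_TYPE": "file_list",
  "path": "/~/path/to/dir/",
  "endpoint": "5d3c6c59-5244-11e5-84dd-22000bb3f45d",
  "rename_supported": true,
  "symlink_supported": true,
  "DATA": [
    {"DATA_TYPE": "file", "name": "somefile", "type": "file",
     "link_target": null, "user": "auser", "group": "agroup",
     "permissions": "0644",
     "last_modified": "2000-01-02 03:45:06+00:00", "size": 1024},
    {"DATA_TYPE": "file", "name": "subdir", "type": "dir",
     "link_target": null, "user": "auser", "group": "agroup",
     "permissions": "0755",
     "last_modified": "2000-01-02 03:45:06+00:00", "size": 4096}
  ]
}
"""


def summarize(file_list):
    """Return (directory names, file names, total file bytes) for a listing."""
    if file_list.get("DATA_TYPE") != "file_list":
        raise ValueError("not a file_list document")
    entries = file_list["DATA"]
    dirs = sorted(e["name"] for e in entries if e["type"] == "dir")
    files = sorted(e["name"] for e in entries if e["type"] == "file")
    total_bytes = sum(e["size"] for e in entries if e["type"] == "file")
    return dirs, files, total_bytes


listing = json.loads(response_text)
dirs, files, total_bytes = summarize(listing)
print(dirs, files, total_bytes)   # ['subdir'] ['somefile'] 1024
```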
7.1.4. Errors
Code | HTTP Status | Description |
---|---|---|
ClientError.NotFound | 404 | Collection not found. |
EndpointError[1] | 502 | Catch all for errors returned by the collection that don’t have specific types. |
7.2. Get File or Directory Status
Stat the file or directory at the specified path on a collection.
Like ls, the path is specified in the path query parameter. Unlike ls, there is no "default" path; it must be specified.
URL | /operation/endpoint/<collection_id>/stat [?path=/path/to/item][1] |
---|---|
Method | GET |
7.2.1. Stat Query Parameters
Name | Type | Description |
---|---|---|
path | string | Path to a file or directory on the collection. Non-absolute paths are treated as relative to |
local_user | string | Optional value passed to identity mapping specifying which local user account to map to. Only usable with Globus Connect Server v5 mapped collections. |
7.2.2. Stat Response
The response is a File document, similar to ls, which returns a list of such documents. The type field of the response can be used to determine if an entity is a file or directory. See the "Document Types" section for more details.
{
"DATA_TYPE": "file",
"group": "agroup",
"last_modified": "2024-01-02 03:45:06+00:00",
"link_group": null,
"link_last_modified": null,
"link_size": null,
"link_target": null,
"link_user": null,
"name": "my_directory",
"permissions": "0755",
"size": 4096,
"type": "dir",
"user": "auser"
}
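One way a client might interpret a stat response is sketched below. It relies on two documented facts: `type` identifies the kind of entry, and `link_target` is null unless the entry is a symlink. The helper `classify` and the second sample document are hypothetical.

```python
def classify(stat_doc):
    """Return a short label for a stat response document."""
    kind = stat_doc["type"]
    target = stat_doc.get("link_target")
    if target is not None:
        # link_target is only non-null for symlinks, valid or invalid.
        return f"symlink to {target} ({kind})"
    return kind


# Fields taken from the example response above.
directory_doc = {"DATA_TYPE": "file", "name": "my_directory",
                 "type": "dir", "link_target": None, "size": 4096}

# Hypothetical broken symlink, for illustration only.
broken_link_doc = {"DATA_TYPE": "file", "name": "old_link",
                   "type": "invalid_symlink", "link_target": "/missing"}

print(classify(directory_doc))     # dir
print(classify(broken_link_doc))   # symlink to /missing (invalid_symlink)
```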
7.2.3. Errors
Code | HTTP Status | Description |
---|---|---|
InvalidPath | 400 | The path contains characters that are not supported by the remote filesystem or is otherwise not valid. |
EndpointPermissionDenied[1] | 403 | The user does not have permission to read the status of the specified path on the collection. For example, if the path is a symlink to outside the collection. |
ClientError.NotFound | 404 | Collection or path not found. |
EndpointError[1] | 502 | Catch all for errors returned by the collection that don’t have specific types. |
7.3. Make Directory
Create a directory at the specified path on a collection.
URL | /operation/endpoint/<collection_id>/mkdir[1] |
---|---|
Method | POST |
Request Body | |
Response Body | |
7.3.1. Mkdir Request Fields
Field Name | JSON Type | Description |
---|---|---|
DATA_TYPE | string | Always has value "mkdir" to indicate this document type. |
path | string | Path of the directory to be created. Non-absolute paths are treated as relative to |
local_user | string | Optional value passed to identity mapping specifying which local user account to map to. Only usable with Globus Connect Server v5 mapped collections. |
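A request body built from these fields might look like the following sketch. The helper `build_mkdir_request` is hypothetical. The early check for "\r\n" mirrors the restriction in the Invalid Path Names section and is an optional client-side convenience, since the API rejects such paths anyway.

```python
import json


def build_mkdir_request(path, local_user=None):
    """Build the JSON document for a mkdir request."""
    if "\r\n" in path:
        # Globus rejects paths containing "\r\n", so fail before sending.
        raise ValueError("path must not contain '\\r\\n'")
    doc = {"DATA_TYPE": "mkdir", "path": path}
    if local_user is not None:
        doc["local_user"] = local_user
    return doc


body = json.dumps(build_mkdir_request("/~/newdir"))
print(body)   # {"DATA_TYPE": "mkdir", "path": "/~/newdir"}
```

The optional `local_user` field is left out unless it is set explicitly, because it is only usable with Globus Connect Server v5 mapped collections.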
7.3.2. Result Codes
The "code" field of the result document will be one of the following:
Code | HTTP Status | Description |
---|---|---|
DirectoryCreated | 202 | Directory created successfully. |
7.3.3. Errors
The mkdir operation can return any error returned by directory listing, as well as the following errors.
Code | HTTP Status | Description |
---|---|---|
ExternalError.MkdirFailed.Exists | 502 | The path already exists. |
ExternalError.MkdirFailed.PermissionDenied | 403 | The user does not have permission to read or write one of the specified files or directories. |
7.4. Rename
Rename or move a file, directory, or symlink on a collection. If the object is a symlink, the symlink itself is renamed, not its target.
When moving to a different parent directory, the parent directory of the new path must already exist.
URL | /operation/endpoint/<collection_id>/rename[1] |
---|---|
Method | POST |
Request Body | |
Response Body | |
7.4.1. Rename Request Fields
JSON strings are Unicode, but will be encoded as UTF-8 to interact with byte oriented filesystems. See the Path Encoding section for details.
Field Name | JSON Type | Description |
---|---|---|
DATA_TYPE | string | Always has value "rename" to indicate this document type. |
old_path | string | Current path of a file, directory, or symlink. Non-absolute paths are treated as relative to |
new_path | string | Path the item at |
local_user | string | Optional value passed to identity mapping specifying which local user account to map to. Only usable with Globus Connect Server v5 mapped collections. |
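The sketch below builds a rename document and shows the UTF-8 bytes that a non-ASCII path corresponds to under the encoding assumption described in the Path Encoding section. The paths and the helper `build_rename_request` are hypothetical.

```python
import json


def build_rename_request(old_path, new_path):
    """Build the JSON document for a rename request."""
    return {"DATA_TYPE": "rename", "old_path": old_path, "new_path": new_path}


old_path = "/~/r\u00e9sum\u00e9.txt"            # contains two non-ASCII characters
new_path = "/~/archive/r\u00e9sum\u00e9.txt"    # parent "/~/archive" must already exist

doc = build_rename_request(old_path, new_path)

# JSON carries the path as Unicode text; ensure_ascii=False keeps it readable.
print(json.dumps(doc, ensure_ascii=False))

# A byte-oriented filesystem sees the UTF-8 encoding of that text:
# 13 characters become 15 bytes, because each "é" takes two bytes.
path_bytes = old_path.encode("utf-8")
print(len(old_path), len(path_bytes))   # 13 15
```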
7.4.2. Result Codes
The "code" field of the result document will be one of the following:
Code | HTTP Status | Description |
---|---|---|
FileRenamed | 200 | File or directory renamed successfully. |
7.4.3. Errors
Code | HTTP Status | Description |
---|---|---|
NotSupported | 409 | Collection does not support the rename operation. |
EndpointNotFound[1] | 404 | Collection doesn’t exist or is not visible to the current user. |
GCDisconnectedException | 409 | The Globus Connect Personal collection is not currently connected. |
GCPausedException | 409 | The Globus Connect Personal collection is paused. |
EndpointPermissionDenied[1] | 403 | The user does not have permission to read or write one of the specified paths on the collection. |
NotFound | 404 | |
InvalidPath | 400 | One of the specified paths contains characters that are not supported by the remote filesystem or is otherwise not valid. |
Exists | 409 | |
EndpointError[1] | 502 | Catch all for other errors received from the collection. Examples include connection failure, authentication failure, and filesystem failures like |