1. Description

The Search API is used to query data from Globus Search.

It has two forms. Simple queries (for example, those that do not define facets) can use the GET form for convenience. More complex queries use the POST form, which can express richer requirements. Both forms return results in the same format.
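As a rough guide to choosing between the two forms, the hypothetical helper can_use_get below checks whether a request only uses fields that the GET form's query parameters (q, limit, offset, advanced) can carry. It is a sketch under that assumption; for simplicity it treats query_template/resource_type as POST-only.

```python
# Keys that map directly onto GET query parameters (see the Query Parameters
# table for the GET method). Anything else, such as filters, facets, boosts or
# sort, needs the POST form. This set is an illustrative assumption.
GET_COMPATIBLE_KEYS = {"q", "limit", "offset", "advanced"}


def can_use_get(request: dict) -> bool:
    """Return True if a GSearchRequest-style dict could be sent as a simple GET query."""
    # The GET form requires q, and may only carry the keys listed above.
    return "q" in request and set(request) <= GET_COMPATIBLE_KEYS
```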

1.1. Timeouts

Query execution time is capped at 30 seconds. If query processing takes longer than this time, the API will terminate it and return a 504 error.

2. Document Types

There are several documents associated with Search queries. They fall into two categories: request documents, which make up complex queries sent with the POST Query method, and result documents.

2.1. Request Documents

2.1.1. GSearchRequest

This is the main document type for encoding a complex Search query, as in the POST Query API.

Field Name Type Description

q

String

User-supplied query, conforming to the query syntax

limit

Integer

Optional. The maximum number of results to return. Defaults to 10.

offset

Integer

Optional. The zero-based position of the first result to return; used together with limit to page through results. Defaults to 0.

facets

Array

Optional. An array of GFacet Documents. Facets to compute over the search results

filters

Array

Optional. An array of GFilter Documents. Filters to apply to the search

boosts

Array

Optional. An array of GBoost Documents. Fields whose matches should weigh more (or less) when ranking unsorted searches

sort

Array

Optional. An array of GSort Documents. Fields on which to sort returned values

resource_type

String

Optional. Name of a resource_type.

advanced

Boolean

Optional. When true, interpret q with the advanced query syntax. Defaults to false.

Note: resource_type is an older name for what are now called Query Templates. See the documentation on the Query Templates API to find out which values are valid for resource_type.
Note: A GSearchRequest requires at least one of q, filters, or resource_type.
Note: If sort is specified, boosts are ignored, because results are ordered by the sort fields rather than by the relevance score that boosts influence.
Examples
{
  "q": "the quick brown fox jumps"
}
{
  "q": "author: \"John Doe\"",
  "advanced": true,
  "limit": 5
}
{
  "q": "a search with paging",
  "offset": 100,
  "limit": 100
}
{
  "q": "a search with filtering and faceting",
  "filters": [
    {
      "type": "range",
      "field_name": "path.to.date",
      "values": [
        {
          "from": "*",
          "to": "2014-11-07"
        }
      ]
    }
  ],
  "facets": [
    {
      "name": "Publication Date",
      "field_name": "path.to.date",
      "type": "date_histogram",
      "date_interval": "year"
    }
  ],
  "sort": [
    {
      "field_name": "path.to.date",
      "order": "asc"
    }
  ]
}
{
  "q": "(queries can be fancy AND cool) OR (NOT extravagant)",
  "advanced": true
}
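The notes above can be checked on the client before a request is sent. The function below is a minimal sketch, assuming requests are plain Python dicts; the name validate_search_request and the non-negative check on limit and offset are illustrative assumptions, and the service performs its own authoritative validation.

```python
def validate_search_request(request: dict) -> list[str]:
    """Return a list of problems found in a GSearchRequest dict (empty if none)."""
    problems = []
    # At least one of q, filters, or resource_type must be present.
    if not any(key in request for key in ("q", "filters", "resource_type")):
        problems.append("one of q, filters, or resource_type is required")
    # limit and offset, when given, must be integers (bool is excluded because
    # it is a subclass of int); requiring >= 0 is an assumption of this sketch.
    for key in ("limit", "offset"):
        value = request.get(key)
        if value is not None and (
            not isinstance(value, int) or isinstance(value, bool) or value < 0
        ):
            problems.append(f"{key} must be a non-negative integer")
    if "advanced" in request and not isinstance(request["advanced"], bool):
        problems.append("advanced must be a boolean")
    # boosts have no effect whenever sort is present.
    if request.get("sort") and request.get("boosts"):
        problems.append("boosts will be ignored because sort is specified")
    return problems
```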

2.1.2. GBoost

Field Name Type Description

field_name

String

Field whose query matches should rank higher in results. Any dots (".") must be escaped with a preceding backslash ("\") character, or they will be treated as separators in a path to a field rather than as part of the field name

factor

Floating Point

Factor by which to weight query matches on field_name. A factor greater than 1 ranks matching results higher; a factor less than 1 ranks them lower (negative boosting). Minimum 0, maximum 10

Examples
{
  "field_name": "author",
  "factor": 5
}
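Because a field name's literal dots must be backslash-escaped, and JSON itself uses the backslash as an escape character, the backslash has to be doubled when it appears in a JSON request body. The sketch below shows both steps; escape_field_name and make_boost are hypothetical helpers, "dc.creator" is a made-up field name, and escape_field_name does not handle field names that already contain backslashes.

```python
import json


def escape_field_name(name: str) -> str:
    """Escape literal dots so they are read as part of the field name, not as path separators."""
    return name.replace(".", "\\.")


def make_boost(field_name: str, factor: float) -> dict:
    """Build a GBoost document, enforcing the documented 0 to 10 range for factor."""
    if not 0 <= factor <= 10:
        raise ValueError("factor must be between 0 and 10")
    return {"field_name": field_name, "factor": factor}


boost = make_boost(escape_field_name("dc.creator"), 2.5)
# json.dumps doubles the backslash, which is how the escape appears in a JSON body.
print(json.dumps(boost))  # prints {"field_name": "dc\\.creator", "factor": 2.5}
```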

2.1.3. GFacet

Field Name Type Description

name

String

A name for this facet which is referenced in the results. If name is omitted, it will default to the value of the field_name property. If more than one facet in a single search request references the same field, a name must be provided.

type

String

One of {"terms", "date_histogram", "numeric_histogram"}

field_name

String

The field to which the facet refers. Any dots (".") must be escaped with a preceding backslash ("\") character.

size

Integer

For terms and numeric_histogram facets, the number of facet values (buckets) to return. For terms, these are the most common values (the buckets with the highest counts). For numeric_histogram, this is the number of intervals into which the histogram_range between low and high is divided

histogram_range

Object

An object containing the following fields:

low: a number or date-formatted String giving the lower bound of the first bucket

high: a number or date-formatted String giving the upper bound of the last bucket

date_interval

String

Must be one of: {"year", "quarter", "month", "week", "day", "hour", "minute", "second"}. Only allowed when the type is "date_histogram". Sets the interval covered by each bucket within the histogram_range

Note: histogram_range is required when type is "date_histogram" or "numeric_histogram".
Note: size is required when type is "numeric_histogram".
Note: For a terms facet, values containing more than 10,000 characters are not tabulated into the results, and no bucket is created for such a value.
Examples
{
  "name": "File Extension",
  "type": "terms",
  "field_name": "extension",
  "size": 10
}
{
  "name": "pub_date",
  "type": "date_histogram",
  "field_name": "http://dublincore\\.org/schemas/xmls/qdc/2008/02/11/dcterms\\.xsd#created",
  "histogram_range": {
    "low": "2000-01-01",
    "high": "2010-01-01"
  },
  "date_interval": "year"
}
{
  "name": "file size",
  "type": "numeric_histogram",
  "field_name": "https://transfer\\.api\\.globus\\.org/file#size",
  "size": 100,
  "histogram_range": {
    "low": 0,
    "high": 100000000
  }
}
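The type-dependent requirements in the GFacet notes can be checked locally. The sketch below encodes the rules stated above; treating field_name as always required is an assumption, and validate_facet and facet_names_unique are hypothetical helpers, not part of any Globus library.

```python
FACET_TYPES = {"terms", "date_histogram", "numeric_histogram"}
DATE_INTERVALS = {"year", "quarter", "month", "week", "day", "hour", "minute", "second"}


def validate_facet(facet: dict) -> list[str]:
    """Return a list of problems found in a GFacet dict (empty if none)."""
    problems = []
    ftype = facet.get("type")
    if ftype not in FACET_TYPES:
        problems.append(f"unknown facet type: {ftype!r}")
    if "field_name" not in facet:  # assumed required
        problems.append("field_name is required")
    # Both histogram types need a histogram_range with low and high.
    if ftype in ("date_histogram", "numeric_histogram"):
        rng = facet.get("histogram_range")
        if not isinstance(rng, dict) or "low" not in rng or "high" not in rng:
            problems.append("histogram_range with low and high is required")
    if ftype == "numeric_histogram" and "size" not in facet:
        problems.append("size is required for numeric_histogram")
    # date_interval is only meaningful for date histograms.
    if "date_interval" in facet:
        if ftype != "date_histogram":
            problems.append("date_interval is only allowed on date_histogram")
        elif facet["date_interval"] not in DATE_INTERVALS:
            problems.append(f"invalid date_interval: {facet['date_interval']!r}")
    return problems


def facet_names_unique(facets: list[dict]) -> bool:
    """A facet without a name defaults to its field_name, so compare the effective names."""
    names = [f.get("name", f.get("field_name")) for f in facets]
    return len(names) == len(set(names))
```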

2.1.4. GFilter

Field Name Type Description

type

String

One of {"match_any", "match_all", "range"}

field_name

String

The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character.

values

Array of Strings or Objects

The values to evaluate against the field_name.

If type is "match_any" or "match_all" this must be a list of Strings.

If type is "range", this must be a list of Objects each with the fields from and to.

"match_any" and "match_all" refer to the different possible behaviors of the filter values. As their names imply, if "match_any" is specified, the filter matches results for which any of the filter values match, while "match_all" requires every matching result to match all of the values.

Note:

"match_any" and "match_all" behave identically for filtering when there is only one value, but they may affect how facets are computed differently.

Note:

values.from and values.to may be the special string "*" indicating that the range is unbounded on this end. An example is given below.

Examples
{
  "type": "match_any",
  "field_name": "https://schema\\.labs\\.datacite\\.org/meta/kernel-4\\.0/metadata\\.xsd#resourceTypeGeneral",
  "values": ["Globus Endpoint"]
}
{
  "type": "range",
  "field_name": "path.to.date",
  "values": [
    {
      "from": "1970-01-01",
      "to": "2015-01-01"
    }
  ]
}
{
  "type": "range",
  "field_name": "path.to.date",
  "values": [
    {
      "from": "*",
      "to": "2014-11-07"
    }
  ]
}
{
  "type": "match_all",
  "field_name": "https://transfer\\.api\\.globus\\.org/endpoint#keywords",
  "values": ["hpc", "internet2", "uchicago"]
}
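To make the match_any, match_all and range semantics concrete, the function below evaluates a GFilter against the list of values a single record holds for the filtered field. It is a local illustration only, not how the service evaluates filters: inclusive range endpoints and "a record matches if any of its values falls in any range" are assumptions, and ISO-formatted date strings are compared lexically.

```python
def _in_range(value, bound: dict) -> bool:
    """Check one {"from": ..., "to": ...} range; "*" leaves that end unbounded."""
    lower, upper = bound["from"], bound["to"]
    # Inclusive endpoints are an assumption of this sketch.
    return (lower == "*" or value >= lower) and (upper == "*" or value <= upper)


def filter_matches(gfilter: dict, field_values: list) -> bool:
    """Evaluate a GFilter against the values a record holds for gfilter["field_name"]."""
    ftype, wanted = gfilter["type"], gfilter["values"]
    if ftype == "match_any":
        # At least one filter value must be present on the record.
        return any(v in field_values for v in wanted)
    if ftype == "match_all":
        # Every filter value must be present on the record.
        return all(v in field_values for v in wanted)
    if ftype == "range":
        return any(_in_range(fv, bound) for fv in field_values for bound in wanted)
    raise ValueError(f"unknown filter type: {ftype!r}")
```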

2.1.5. GSort

Field Name Type Description

field_name

String

The field on which to sort. Any dots (".") must be escaped with a preceding backslash ("\") character.

order

String

Must be one of "asc" or "desc" indicating the ordering of the sort: ascending ("asc") or descending ("desc"). Also, see note on sorting when multiple values are present for a particular field.

Note: A single field may contain multiple values for a single subject, such as when an array of values is provided or when multiple GMetaEntry structures refer to the same subject. In this situation, the value used for sorting is the "smallest" when sorting in ascending order and the "largest" when sorting in descending order.
Note: Any record that does not contain a value for the sort field appears at the end of the sorted list, whether the sort is ascending or descending. If more than one record lacks a value, the ordering among those records is arbitrary.
Note: For purposes of sorting, a field containing more than 10,000 characters is considered missing and is therefore sorted to the end of the list.
Examples
{
  "field_name": "author",
  "order": "asc"
}
{
  "field_name": "path.to.date",
  "order": "desc"
}
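The sorting notes above (smallest value for ascending, largest for descending, missing values last) can be mimicked locally. This sketch assumes each record maps a field name to a list of values; sort_records is a hypothetical helper, and it ignores the 10,000-character rule for brevity.

```python
def sort_records(records: list[dict], field: str, order: str = "asc") -> list[dict]:
    """Locally mimic the GSort rules on records that map field -> list of values."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    present = [r for r in records if r.get(field)]
    missing = [r for r in records if not r.get(field)]
    # Multi-valued fields sort on their smallest value ascending, largest descending.
    pick = min if order == "asc" else max
    present.sort(key=lambda r: pick(r[field]), reverse=(order == "desc"))
    # Records without a value always go last; the service leaves their order arbitrary.
    return present + missing
```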

2.2. Result Documents

2.2.1. GSearchResult

This is the document type for all results from Search queries.

Field Name Type Description

gmeta

Array

An array of GMetaResult documents, the main body of the result

facet_result

Array

Optional. An array of GFacetResult documents with counts for all facets requested on the search request

offset

Integer

The offset provided on the input search request

count

Integer

The number of results returned; i.e. the size of the gmeta array. May be 0

total

Integer

The total number of matches for the search. May be 0 if no matches are found

Examples
{
  "count": 1,
  "offset": 0,
  "total": 1,
  "gmeta": [
    {
      "content": [
        {
          "alpha": {
            "beta": "gamma"
          }
        }
      ],
      "subject": "http://example.com"
    }
  ]
}
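The offset, count and total fields are enough to drive paging. The helper below is a minimal sketch that computes the offset of the next page, or None when there is nothing left to fetch; applying the GET method's documented 10,000 maximum offset to POST queries as well is an assumption.

```python
from typing import Optional

MAX_OFFSET = 10_000  # documented maximum offset for the GET form (assumed to apply generally)


def next_offset(result: dict, page_size: int) -> Optional[int]:
    """Return the offset for the next page of a GSearchResult, or None when done."""
    following = result["offset"] + result["count"]
    # Stop once every match has been returned or a short (or empty) page came back.
    if following >= result["total"] or result["count"] < page_size:
        return None
    # Stop if the next offset would exceed the maximum the API accepts.
    if following > MAX_OFFSET:
        return None
    return following
```

For each page, the matched subjects can be collected with `[entry["subject"] for entry in result["gmeta"]]`.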

2.2.2. GMetaResult

GMetaResult documents are the elements of the gmeta array in a GSearchResult.

A GMetaResult is a structure similar to a GMetaEntry from the Ingest API, with the following significant differences:

  • visibility information is not exposed; i.e. visible_to is not included

  • metadata for any subject may be an aggregate of multiple documents with different visibility rules or sources. Thus, the content block is always returned as an array in which each element represents data provided by a different source or with different visibility

Field Name Type Description

subject

String

The resource described by this metadata, often a URI

content

Array

An array of GMetaContent documents containing the metadata pertaining to the subject

2.2.3. GFacetResult & GBucket

Table 1. GBucket
Field Name Type Description

value

String or Object

If the bucket represents a single value (e.g. in a "terms" GFacet), this is that value. If the bucket represents a range of values, this is an object with "from" and "to", as in a GFilter document. The range is closed at "from" and open at "to", i.e. [from, to).

count

Integer

The number of results in this bucket

{
  "value": ".docx",
  "count": 1234
}
{
  "value": {
    "from": "0",
    "to": "10"
  },
  "count": 0
}
{
  "value": {
    "from": "2011-01-01",
    "to": "2012-01-01"
  },
  "count": 17
}
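The half-open [from, to) convention means a value equal to "to" belongs to the next bucket, not this one. The sketch below shows the membership test, plus one way numeric_histogram buckets could be laid out. Equal-width buckets are an assumption based on the size description in GFacet, and bucket bounds that arrive as strings (as in the "0" to "10" example) would need converting to numbers before comparison.

```python
def in_bucket(value, bucket_value: dict) -> bool:
    """Half-open [from, to) membership test for a range-valued GBucket value."""
    return bucket_value["from"] <= value < bucket_value["to"]


def numeric_bucket_edges(low: float, high: float, size: int) -> list[dict]:
    """Split [low, high) into `size` equal-width buckets (equal width is an assumption)."""
    width = (high - low) / size
    return [{"from": low + i * width, "to": low + (i + 1) * width} for i in range(size)]
```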
Table 2. GFacetResult
Field Name Type Description

name

String

The name of the corresponding GFacet in the search request

buckets

Array

An array of GBucket documents

{
  "name": "extensions",
  "buckets": [
    {
      "@version": "2017-09-01",
      "value": ".docx",
      "count": 1234
    },
    {
      "@version": "2017-09-01",
      "value": ".png",
      "count": 12
    }
  ]
}
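Client code often wants facet results as a lookup table. The hypothetical helper below turns the facet_result array of a GSearchResult into a dict of facet name to {bucket value: count}; range-valued buckets become (from, to) tuples so they can serve as dict keys, and extra keys such as "@version" are ignored.

```python
def facet_counts(search_result: dict) -> dict:
    """Map each facet name to {bucket value: count}; range values become (from, to) tuples."""
    counts = {}
    # facet_result is optional, so default to an empty list.
    for facet in search_result.get("facet_result", []):
        buckets = {}
        for bucket in facet["buckets"]:
            value = bucket["value"]
            # Range buckets carry an object; convert it into a hashable key.
            key = (value["from"], value["to"]) if isinstance(value, dict) else value
            buckets[key] = bucket["count"]
        counts[facet["name"]] = buckets
    return counts
```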

3. API Methods

3.1. Simple GET Query

URL

/v1/index/<index_id>/search

Method

GET

HTTP Headers

(optional) Authorization: Bearer <Globus Auth token> 1
If no Authorization header is provided, only public metadata is queried

Query Parameters

See the Query Parameters table below

Response Body

A GSearchResult

1 The token must have the urn:globus:scopes:search.api.globus.org:all or urn:globus:scopes:search.api.globus.org:search scope

Table 3. Query Parameters
Parameter Name Description

q

Required. A string representation of the query to be executed

offset

Zero-based offset into the result set, used for paging. Default 0, maximum 10,000

limit

Maximum number of results to return. Default 10, maximum 10,000

query_template

The name of a pre-defined GSearchRequest to be included in the search criteria.

advanced

For expert users only. When "true", use the advanced form of the query syntax. Default "false"

3.1.1. Examples

Searching

  • in the index 4de0e89e-a395-11e7-bc54-8c705ad34f60

  • with a simple query of globus documentation

  • getting the first page of results

curl -XGET 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/search?q=globus%20documentation'

Searching

  • in the index 4de0e89e-a395-11e7-bc54-8c705ad34f60

  • with an advanced query of type:html content:"globus documentation"

  • getting the second page of results, with a page size of 20

curl -XGET 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/search?q=type%3Ahtml%20content%3A%22globus%20documentation%22&advanced=true&limit=20&offset=20'
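The same GET URLs can be built programmatically with the Python standard library. This sketch reproduces the two curl examples above; build_get_url is a hypothetical helper, and passing quote (rather than the default quote_plus) makes spaces encode as %20, matching the examples.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://search.api.globus.org/v1/index/{index_id}/search"


def build_get_url(index_id: str, q: str, **params) -> str:
    """Build a Simple GET Query URL from q plus optional limit/offset/advanced parameters."""
    query = {"q": q}
    for key, value in params.items():
        # Booleans are sent as the lowercase strings "true"/"false".
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return BASE_URL.format(index_id=index_id) + "?" + urlencode(query, quote_via=quote)
```

To search non-public metadata, the request would also carry an Authorization: Bearer header with a suitably scoped token, as described above.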

3.2. Complex POST Query

URL

/v1/index/<index_id>/search

Method

POST

HTTP Headers

(optional) Authorization: Bearer <Globus Auth token> 1
Content-Type: application/json

Request Body

A GSearchRequest document

Response Body

A GSearchResult document

1 As with the GET query, the token must have one of the scopes listed for that method

3.2.1. GSearchRequest Documents

A GSearchRequest is a JSON document that can encode more information than a simple GET query. Let's start with some trivial examples that could also be encoded as GET queries:

{
  "q": "the quick brown fox jumps"
}
{
  "q": "a search with paging",
  "offset": 100,
  "limit": 100,
  "advanced": false
}

These simple cases are easy enough, but not very compelling.

This example query can’t be encoded in a GET query:

{
  "q": "a search with filtering and faceting",
  "filters": [
    {
      "type": "range",
      "field_name": "path.to.date",
      "values": [
        {
          "from": "*",
          "to": "2014-11-07"
        }
      ]
    }
  ],
  "facets": [
    {
      "name": "Publication Date",
      "field_name": "path.to.date",
      "type": "date_histogram",
      "date_interval": "year"
    }
  ],
  "sort": [
    {
      "field_name": "path.to.date",
      "order": "asc"
    }
  ]
}

This requests a search with a simple query, but with the additional conditions that

  • documents will be inspected for their {"path": {"to": {"date": ...}}} values and filtered to those prior to 2014-11-07

  • the search should generate a "facet", a count of different values for path.to.date

  • the type of facet is a date_histogram

  • the counts should be bucketed by year

  • the results should be sorted by path.to.date in ascending order, rather than by relevance ranking

3.2.2. Examples

To run our sample query from above, we send it via a POST to the API, e.g.

curl -XPOST 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/search' \
    -H 'Content-Type: application/json' \
    --data '
{
  "q": "a search with filtering and faceting",
  "filters": [
    {
      "type": "range",
      "field_name": "path.to.date",
      "values": [
        {
          "from": "*",
          "to": "2014-11-07"
        }
      ]
    }
  ],
  "facets": [
    {
      "name": "Publication Date",
      "field_name": "path.to.date",
      "type": "date_histogram",
      "date_interval": "year"
    }
  ],
  "sort": [
    {
      "field_name": "path.to.date",
      "order": "asc"
    }
  ]
}'
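The equivalent POST request can be prepared with the Python standard library. This is a minimal sketch: build_post_request is a hypothetical helper, the token value is a placeholder, and the request is only constructed here, not sent. Omitting the token means only public metadata is searched.

```python
import json
import urllib.request
from typing import Optional


def build_post_request(index_id: str, body: dict, token: Optional[str] = None) -> urllib.request.Request:
    """Prepare a Complex POST Query carrying a GSearchRequest body."""
    url = f"https://search.api.globus.org/v1/index/{index_id}/search"
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Sending it requires network access, e.g.:
#   with urllib.request.urlopen(req, timeout=35) as resp:
#       result = json.load(resp)
# (35 s leaves headroom over the service's 30 s query cap.)
```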

3.3. Query Syntax

Two separate syntaxes for specifying the query are supported in both GET and POST queries: standard and advanced.

The standard syntax allows only basic text matching. Any query string is accepted, and the results that best match the input are returned. This is appropriate for environments where end users supply the content of the query string, e.g. a search box in a web UI.

The advanced syntax is more powerful and supports ranges, regular expressions, matching on particular fields, and other more sophisticated capabilities. It is also subject to parsing errors, such as badly formed ranges or misnamed fields. As such, it requires expertise in the search language and should not be exposed directly to users without care and forethought.

3.3.1. Inspiring Influence: the Elasticsearch Query String

The Globus Search syntax is based on the Elasticsearch query string syntax and should be familiar to Elasticsearch users.

There are a couple of notable exceptions:

  • Wildcards in field names are not allowed. So, for example, the query “book.\*:(quick brown)” is not permitted

  • The missing and exists query terms are not permitted

The full grammar for Globus Search queries is as follows, with some notational liberties taken:

EMPTY := ""
SPACE := " "
SPACES := SPACE SPACES | EMPTY
ESCAPE := \\
LPAREN := (
RPAREN := )
FIELD_SEP := :
OPERATOR := + | - | = | & | "|" | > | < | ! | { | } | [ | ] | ^ | ~ | * | ?
OPS := OPERATOR OPS | EMPTY
NEEDS_ESCAPE := OPERATOR | ESCAPE | ' | " | : | ( | ) | /

ESCAPED_SPACE := ESCAPE SPACE
ESCAPED_SPECIAL := ESCAPE NEEDS_ESCAPE | ESCAPED_SPACE

PRINTABLES := <all printable UTF-8 characters, except spaces>
PRINTABLE_STRING := PRINTABLES PRINTABLE_STRING | EMPTY

ESCAPED_DQUOTE := ESCAPE '"'
ESCAPED_SQUOTE := ESCAPE "'"
ESCAPED_SLASH := ESCAPE "/"

# taking a liberty, using `-` to mean "without this char"
PRINTABLE_ESCAPED_DQUOTE :=
    PRINTABLE_STRING - '"' |
    PRINTABLE_ESCAPED_DQUOTE ESCAPED_DQUOTE PRINTABLE_ESCAPED_DQUOTE
# same as above, but with "'" and ESCAPED_SQUOTE
PRINTABLE_ESCAPED_SQUOTE := ...
# same as above, but with "/" and ESCAPED_SLASH
PRINTABLE_ESCAPED_SLASH := ...


QUOTED_STRING := '"' PRINTABLE_ESCAPED_DQUOTE '"' |
                 "'" PRINTABLE_ESCAPED_SQUOTE "'" |
                 "/" PRINTABLE_ESCAPED_SLASH "/"

ATOM := PRINTABLE_STRING | PRINTABLE_STRING ATOM | OPERATOR ATOM |
        ESCAPED_SPECIAL ATOM | ESCAPE

VALUE := OPS QUOTED_STRING OPS | OPS ATOM OPS

VALUE_EXPR := OPS (LPAREN EXPR RPAREN | VALUE) OPS

FIELD := OPS ATOM FIELD_SEP SPACES VALUE_EXPR

EXPR := FIELD | VALUE_EXPR |
        EXPR SPACE SPACES EXPR | SPACES EXPR SPACES
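When embedding user-supplied text as an unquoted ATOM in an advanced query, every character matched by the NEEDS_ESCAPE rule, and every space, needs a preceding backslash. The sketch below applies exactly that rule from the grammar; escape_atom is a hypothetical helper, and quoting the text as a QUOTED_STRING is an alternative.

```python
# OPERATOR characters from the grammar, plus the remaining NEEDS_ESCAPE members:
# the escape character itself, both quote characters, ':', '(', ')' and '/'.
OPERATOR_CHARS = set("+-=&|><!{}[]^~*?")
NEEDS_ESCAPE = OPERATOR_CHARS | set("\\'\":()/")


def escape_atom(text: str) -> str:
    """Backslash-escape special characters and spaces so text is read as a literal ATOM."""
    out = []
    for ch in text:
        if ch in NEEDS_ESCAPE or ch == " ":
            out.append("\\")
        out.append(ch)
    return "".join(out)
```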

3.3.2. Advanced Query String Usage

Field names in advanced query strings may express paths within documents, using dots to separate field names. Use an escaped dot (\.) to express a literal dot within a field name. For example, given the document

{
  "a": {
    "b": 1
  },
  "a.b": 2
}

we take a.b to refer to the value 1, and a\.b to refer to the value 2.
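The path rule can be illustrated by resolving a field name against a plain dict: split on dots that are not preceded by a backslash, then unescape the remaining "\." sequences. This is a simplified sketch of the semantics described above, not the service's parser; split_field_path and resolve are hypothetical, and escaped backslashes immediately before a dot are not handled.

```python
import re


def split_field_path(field: str) -> list[str]:
    """Split on unescaped dots; an escaped dot stays part of its segment."""
    parts = re.split(r"(?<!\\)\.", field)
    return [p.replace("\\.", ".") for p in parts]


def resolve(doc: dict, field: str):
    """Follow a (possibly dotted) field name through nested dicts."""
    value = doc
    for key in split_field_path(field):
        value = value[key]
    return value
```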

Quotes

Quoted strings are literal strings. For example, to search for abc.def:ghi in the advanced query language, use "abc.def:ghi" so that the contents of the string are not parsed as an advanced query.

Fields

example: foo searches for records in which the field named example has the value foo.

Ranges

foo:[X to Y] searches for the field foo with values between X and Y.

Boolean Ops + Grouping

Use parentheses together with AND, OR, and NOT to group and combine expressions.

For example, foo:[10 to 20] AND (example:bar OR NOT baz:"[ERROR]")


© 2010- The University of Chicago