POST Query

Method | POST
URL | /v1/index/<index_id>/search
Authentication required? | Only for non-public data
Required Roles | None
Request Body | a GSearchRequest document
Response Body | a GSearchResult document

Authentication & Authorization

Tokens for this call must have one of these scopes.

urn:globus:scopes:search.api.globus.org:all
urn:globus:scopes:search.api.globus.org:search

Examples

Query via curl

To run a query, we send it via a POST to the API, e.g.

curl -XPOST \
    -H 'Content-Type: application/json' \
    'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/search' \
    --data '
{
  "q": "a search with filtering and faceting",
  "filters": [
    {
      "type": "range",
      "field_name": "path.to.date",
      "values": [
        {
          "from": "*",
          "to": "2014-11-07"
        }
      ]
    }
  ],
  "facets": [
    {
      "name": "Publication Date",
      "field_name": "path.to.date",
      "type": "date_histogram",
      "date_interval": "year"
    }
  ],
  "sort": [
    {
      "field_name": "path.to.date",
      "order": "asc"
    }
  ]
}'
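
For callers who prefer Python over curl, the same request can be sketched with the standard library alone. The helper name build_search_request is hypothetical, and passing the token as a Bearer Authorization header is an assumption rather than something this page states. The request is built but not sent.

```python
import json
import urllib.request

SEARCH_API = "https://search.api.globus.org"  # host taken from the curl example


def build_search_request(index_id, body, token=None):
    """Build (but do not send) a POST request for /v1/index/<index_id>/search."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: the token travels in a standard Bearer header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{SEARCH_API}/v1/index/{index_id}/search",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_search_request(
    "4de0e89e-a395-11e7-bc54-8c705ad34f60",
    {"q": "a search with filtering and faceting", "limit": 5},
)
# To actually run the query:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

No token is needed for public data, so the token argument is optional here.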

Request Schemas

GSearchRequest

This is the main document type for encoding a complex Search query.

Field Name | Type | Description
q | String | User-supplied query, conforming to the query syntax. Required if there are no filters.
advanced | Boolean | Defaults to False. When true, interpret q with the advanced query syntax.
limit | Integer | Optional. The maximum number of results to return. Defaults to 10.
offset | Integer | Optional. Start at the result numbered offset. Used together with limit, this allows result paging. Defaults to 0.
bypass_visible_to | Boolean | Defaults to False. Allowed for Index Admins only. When true, visible_to restrictions will be ignored for this search query.
result_format_version | String | One of {"2019-08-27", "2017-09-01"}. Defaults to 2019-08-27. When given as 2017-09-01, results will be returned in the legacy format.
filter_principal_sets | List of Strings | Optional. A list of principal_set names. The caller’s identity set will be matched against any principal_sets assigned to entry documents, and results are filtered to entries that match any of these strings. If this parameter is provided, at least one match must be present.
filters | Array | Optional. An array of GFilter documents. Filters to apply to the search.
facets | Array | Optional. An array of GFacet documents. Facets to count on the search.
boosts | Array | Optional. An array of GBoost documents. Fields whose matches are weighted more heavily in unsorted searches.
sort | Array | Optional. An array of GSort documents. Fields on which to sort returned values.

Note

If sort is specified, boosts is ignored: results are ordered by the sort fields rather than by the relevance calculation, which is what boosts influence.
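
The constraints described for GSearchRequest can be checked on the client before sending. The following is a minimal sketch, not the service's own validation. The name check_request and the message strings are hypothetical, and only the rules stated on this page are covered.

```python
ALLOWED_RESULT_FORMATS = {"2019-08-27", "2017-09-01"}


def check_request(body):
    """Return a list of human-readable problems found in a GSearchRequest dict."""
    problems = []
    # q is required unless at least one filter is given.
    if "q" not in body and not body.get("filters"):
        problems.append("q is required when there are no filters")
    version = body.get("result_format_version")
    if version is not None and version not in ALLOWED_RESULT_FORMATS:
        problems.append(f"unknown result_format_version: {version!r}")
    # Not an error, but worth flagging: boosts has no effect alongside sort.
    if body.get("sort") and body.get("boosts"):
        problems.append("boosts is ignored because sort is specified")
    return problems
```

An empty list means none of these particular rules were violated. The API may still reject the document for other reasons.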

Examples

{
  "q": "the quick brown fox jumps"
}

{
  "q": "a search with filtering",
  "filters": [
    {
      "type": "range",
      "field_name": "path.to.date",
      "values": [
        {
          "from": "*",
          "to": "2014-11-07"
        }
      ]
    }
  ]
}

{
  "q": "author: \"John Doe\"",
  "advanced": true,
  "limit": 5
}

{
  "q": "a search with paging",
  "offset": 100,
  "limit": 100
}

{
  "q": "a search with filtering and faceting",
  "filters": [
    {
      "type": "range",
      "field_name": "path.to.date",
      "values": [
        {
          "from": "*",
          "to": "2014-11-07"
        }
      ]
    }
  ],
  "facets": [
    {
      "name": "Publication Date",
      "field_name": "path.to.date",
      "type": "date_histogram",
      "date_interval": "year"
    }
  ],
  "sort": [
    {
      "field_name": "path.to.date",
      "order": "asc"
    }
  ]
}

{
  "q": "(queries can be fancy AND cool) OR (NOT extravagant)",
  "advanced": true
}

GFilter

A GFilter document is one of several document types which encode a filter. The type of filter is identified by the type field. See the table below for the various filter types.

Type | Schema
match_all | GFilterMatch
match_any | GFilterMatch
range | GFilterRange
geo_bounding_box | GFilterGeoBoundingBox

GFilterMatch

A matching filter for finding results which match some set of text terms.

"match_any" and "match_all" refer to the different possible behaviors of the filter values. As their names imply, if "match_any" is specified, the filter will match results for which any of the filter values match, while "match_all" requires that all of the values match on every result.

Field Name | Type | Description
type | String | One of {"match_any", "match_all"}
field_name | String | The field to which the filter refers. Any dots (".") that are part of a field name must be escaped with a preceding backslash ("\") character; unescaped dots are treated as path separators.
values | Array of Strings | The values to evaluate against the field_name. For both "match_any" and "match_all", this must be a list of Strings.

Note

"match_any" and "match_all" are the same when there’s only one value as far as filtering is concerned, but they may have different impact on the way that facets are interpreted.

Examples

{
  "type": "match_any",
  "field_name": "globus_metadata.resource_type",
  "values": ["Globus Endpoint"]
}

{
  "type": "match_all",
  "field_name": "globus_metadata.keywords",
  "values": ["hpc", "internet2", "uchicago"]
}
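
The dot-escaping rule for field_name is easy to get wrong by hand, so a small helper may be useful. This is a sketch with hypothetical helper names (escape_field_name, field_path, match_filter). It assumes each argument to field_path is one complete field name, and it does not handle names that already contain backslashes.

```python
def escape_field_name(name):
    """Escape dots that belong to a single field name with a backslash."""
    return name.replace(".", "\\.")


def field_path(*names):
    """Join field names into a path; the unescaped dots separate levels."""
    return ".".join(escape_field_name(name) for name in names)


def match_filter(kind, path, values):
    """Build a GFilterMatch document; kind is "match_any" or "match_all"."""
    if kind not in ("match_any", "match_all"):
        raise ValueError(f"unsupported filter type: {kind!r}")
    return {"type": kind, "field_name": path, "values": list(values)}


keywords_filter = match_filter(
    "match_all",
    field_path("globus_metadata", "keywords"),
    ["hpc", "internet2", "uchicago"],
)
```

Here field_path("globus_metadata", "keywords") addresses the nested field keywords inside globus_metadata, while a single argument containing dots, such as a URL, is escaped as one field name.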

GFilterRange

A range filter for finding results which have numeric or date values within a specified range.

Field Name | Type | Description
type | String | Must have the value "range"
field_name | String | The field to which the filter refers. Any dots (".") that are part of a field name must be escaped with a preceding backslash ("\") character; unescaped dots are treated as path separators.
values | Array of Objects | The values to evaluate against the field_name. This must be an Array of Objects, each with the fields from and to.

Note

values.from and values.to may be the special string "*" indicating that the range is unbounded on this end. An example is given below.

Examples

{
  "type": "range",
  "field_name": "path.to.date",
  "values": [
    {
      "from": "1970-01-01",
      "to": "2015-01-01"
    }
  ]
}

{
  "type": "range",
  "field_name": "cardinality_of_foobar",
  "values": [
    {
      "from": "10",
      "to": "50"
    }
  ]
}

This example filter has multiple clauses. The clauses are implicitly joined with "or" semantics, so the filter allows values from 0 to 5 as well as values greater than or equal to 10.

{
  "type": "range",
  "field_name": "cardinality_of_foobar",
  "values": [
    {
      "from": "0",
      "to": "5"
    },
    {
      "from": "10",
      "to": "*"
    }
  ]
}
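
A range filter with several clauses can be assembled programmatically. The helper below is a sketch with a hypothetical name (range_filter). It writes None as the special string "*" and converts bounds to strings, as in the examples above. Requiring at least one clause is an assumption of this sketch.

```python
def range_filter(field_name, *bounds):
    """Build a GFilterRange document from (low, high) pairs.

    None on either side becomes "*", meaning unbounded on that end.
    """
    if not bounds:
        # Assumption: an empty values array would not be meaningful.
        raise ValueError("at least one (low, high) pair is required")
    values = []
    for low, high in bounds:
        values.append({
            "from": "*" if low is None else str(low),
            "to": "*" if high is None else str(high),
        })
    return {"type": "range", "field_name": field_name, "values": values}


# Reproduces the multi-clause example: 0 to 5, or 10 and above.
cardinality_filter = range_filter("cardinality_of_foobar", (0, 5), (10, None))
```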

GFilterGeoBoundingBox

A bounding box filter for finding geo_shape and geo_point values which intersect with a specified bounding box.

Field Name | Type | Description
type | String | Must have the value "geo_bounding_box"
field_name | String | The field to which the filter refers. Any dots (".") that are part of a field name must be escaped with a preceding backslash ("\") character; unescaped dots are treated as path separators.
top_left | Object | An object describing a coordinate pair. It must contain the keys lat and lon, each of which must have a numeric value.
bottom_right | Object | An object describing a coordinate pair. It must contain the keys lat and lon, each of which must have a numeric value.

Note

top_left is required to be northwest of bottom_right.

Examples

{
  "type": "geo_bounding_box",
  "field_name": "country.center",
  "top_left": {
    "lat": 49.1,
    "lon": -124.9
  },
  "bottom_right": {
    "lat": 24.9,
    "lon": -67.1
  }
}
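
The northwest requirement can be checked before the filter is sent. This sketch uses a hypothetical helper name (geo_bounding_box) and interprets "northwest" as a strictly greater latitude and a strictly smaller longitude. That interpretation is an assumption, and boxes crossing the antimeridian are not handled.

```python
def geo_bounding_box(field_name, top_left, bottom_right):
    """Build a GFilterGeoBoundingBox document from two (lat, lon) tuples."""
    (top, left), (bottom, right) = top_left, bottom_right
    # Assumption: northwest means higher latitude and lower longitude.
    if not (top > bottom and left < right):
        raise ValueError("top_left must be northwest of bottom_right")
    return {
        "type": "geo_bounding_box",
        "field_name": field_name,
        "top_left": {"lat": top, "lon": left},
        "bottom_right": {"lat": bottom, "lon": right},
    }


# Reproduces the example above.
box = geo_bounding_box("country.center", (49.1, -124.9), (24.9, -67.1))
```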

GFacet

Field Name | Type | Description
name | String | A name for this facet which is referenced in the results. If name is omitted, it will default to the value of the field_name property. If more than one facet in a single search request references the same field, a name must be provided.
type | String | One of terms, date_histogram, numeric_histogram, sum, avg
field_name | String | The field to which the facet refers. Any dots (.) that are part of a field name must be escaped with a preceding backslash (\) character; unescaped dots are treated as path separators.
size | Integer | The number of distinct facet values (buckets) to return. For terms, size=N limits results to the N most common values (buckets with highest count). For numeric_histogram, this is the number of intervals to create between low and high of the histogram_range. Required if type=numeric_histogram. Optional if type=terms. Forbidden otherwise.
missing | Float | The value to use for entries that do not contain the field named by the value of field_name. By default, missing values are ignored and do not count towards sums and averages. Optional if type=sum or type=avg. Forbidden otherwise.
histogram_range | Object | An object containing the following fields. low: Numeric or date formatted String containing the value at the low end of the histogram range. high: Numeric or date formatted String containing the value at the high end of the histogram range. Required if type=numeric_histogram. Optional if type=date_histogram. Forbidden otherwise.
date_interval | String | Indicates the unit for the buckets returned within the histogram_range. Must be one of: year, quarter, month, week, day, hour, minute, second. Required when type=date_histogram. Forbidden otherwise.

Note

For a terms facet, any values containing more than 10,000 characters will not be tabulated into the results and no buckets containing a value with more than 10,000 characters will be created.

date_histogram faceting requires that the field was detected as a date type. See the Globus Search supported Date Formats to see how data is detected as being a date. The histogram also requires that low and high are both in one of the supported date formats.

{
  "name": "File Extension",
  "type": "terms",
  "field_name": "extension",
  "size": 10
}

{
  "name": "pub_date",
  "type": "date_histogram",
  "field_name": "http://dublincore\\.org/schemas/xmls/qdc/2008/02/11/dcterms\\.xsd#created",
  "histogram_range": {
    "low": "2000-01-01",
    "high": "2010-01-01"
  },
  "date_interval": "year"
}

{
  "name": "file size",
  "type": "numeric_histogram",
  "field_name": "https://transfer\\.api\\.globus\\.org/file#size",
  "size": 100,
  "histogram_range": {
    "low": 0,
    "high": 100000000
  }
}

{
  "name": "calculate total cost",
  "type": "sum",
  "field_name": "price"
}

{
  "name": "calculate average cost per item",
  "type": "avg",
  "missing": 1.20,
  "field_name": "price"
}
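
The required, optional, and forbidden combinations in the GFacet table can be encoded as data and checked locally. This is a sketch only. The names FACET_RULES and check_facet are hypothetical, and the rules are transcribed from the table above rather than taken from the service's implementation.

```python
# For each field: (facet types where it is required, types where it is optional).
# Any other facet type forbids the field.
FACET_RULES = {
    "size": ({"numeric_histogram"}, {"terms"}),
    "missing": (set(), {"sum", "avg"}),
    "histogram_range": ({"numeric_histogram"}, {"date_histogram"}),
    "date_interval": ({"date_histogram"}, set()),
}
FACET_TYPES = {"terms", "date_histogram", "numeric_histogram", "sum", "avg"}


def check_facet(facet):
    """Return a list of rule violations for one GFacet dict."""
    facet_type = facet.get("type")
    if facet_type not in FACET_TYPES:
        return [f"unknown facet type: {facet_type!r}"]
    problems = []
    for field, (required_for, optional_for) in FACET_RULES.items():
        present = field in facet
        if facet_type in required_for and not present:
            problems.append(f"{field} is required for {facet_type}")
        elif present and facet_type not in required_for | optional_for:
            problems.append(f"{field} is forbidden for {facet_type}")
    return problems
```

Keeping the rules in a table rather than in branching code makes it straightforward to compare them against the documentation line by line.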

GBoost

Field Name | Type | Description
field_name | String | Field to rank higher in results. Any dots (".") that are part of a field name must be escaped with a preceding backslash ("\") character, or they will be treated as paths to a field and not part of a field name.
factor | Floating Point | Factor for weighting results for query matches on the field_name. A factor greater than 1 ranks matches higher; a factor less than 1 ranks them lower (negative boosting). Maximum of 10, minimum of 0.

Examples

{
  "field_name": "author",
  "factor": 5
}

GSort

Field Name | Type | Description
field_name | String | The field on which to sort. Any dots (".") that are part of a field name must be escaped with a preceding backslash ("\") character; unescaped dots are treated as path separators.
order | String | Must be one of "asc" or "desc", indicating the ordering of the sort: ascending ("asc") or descending ("desc"). See also the note below on sorting when multiple values are present for a particular field.

Note

A single field may contain multiple values for a single subject, such as when an array of values is provided or when there are multiple GMetaEntry structures which refer to the same subject. In this situation, the value used during sorting will be the "smallest" when sorting in ascending order and the "largest" when sorting in descending order.

Note

Any record which does not contain a value for a field which is sorted upon will appear at the end of the sorted list, regardless of whether the sort is ascending or descending. If more than one record does not contain a value, the ordering among those records is arbitrary.

Note

For purposes of sorting, a field containing more than 10,000 characters will be considered missing, and will thus be sorted to the end of the list.

Examples

{
  "field_name": "author",
  "order": "asc"
}

{
  "field_name": "path.to.date",
  "order": "desc"
}
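
The sorting notes above can be illustrated with a small local emulation. This is not how the service is implemented. It is a sketch over flat Python dicts (a hypothetical simplification of real entries) that uses ordinary Python comparison and applies the 10,000-character rule to strings only.

```python
MAX_SORTABLE_CHARS = 10_000  # longer values are treated as missing


def sort_key_values(record, field):
    """Collect the sortable values a record holds for field (possibly none)."""
    raw = record.get(field)
    values = raw if isinstance(raw, list) else [raw]
    return [
        v for v in values
        if v is not None
        and not (isinstance(v, str) and len(v) > MAX_SORTABLE_CHARS)
    ]


def sort_records(records, field, order="asc"):
    """Order records as the notes describe; records without a value go last."""
    descending = order == "desc"
    present, missing = [], []
    for record in records:
        values = sort_key_values(record, field)
        if values:
            # Ascending sorts use the smallest value, descending the largest.
            key = max(values) if descending else min(values)
            present.append((key, record))
        else:
            missing.append(record)
    present.sort(key=lambda pair: pair[0], reverse=descending)
    return [record for _, record in present] + missing
```

Note that a record holding the values 2001 and 2010 sorts as 2001 in ascending order but as 2010 in descending order, so reversing the sort direction does not simply reverse the result list.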

Response Schemas

GSearchResult

This is the document type for all results from Search queries.

Field Name | Type | Description
gmeta | Array | An array of GMetaResult documents, the main body of the result
facet_result | Array | Optional. An array of GFacetResult documents with counts for all facets requested on the search request
offset | Integer | The offset provided on the input search request
count | Integer | The number of results returned; i.e. the size of the gmeta array. May be 0
total | Integer | The total number of matches for the search. May be 0 if no matches are found
has_next_page | Boolean | True if there’s another page of results available, False otherwise
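
The offset, count, and has_next_page fields are enough to drive a simple paging loop. The generator below is a sketch under stated assumptions. The name iter_pages and the fetch callable are hypothetical, with fetch standing in for whatever code sends a GSearchRequest and returns the decoded GSearchResult.

```python
def iter_pages(fetch, body, page_size=10):
    """Yield GSearchResult dicts page by page until has_next_page is false.

    fetch is any callable that takes a GSearchRequest dict and returns the
    decoded GSearchResult dict; supply your own HTTP call.
    """
    offset = body.get("offset", 0)
    while True:
        page = fetch({**body, "offset": offset, "limit": page_size})
        yield page
        # Stop on the last page, and guard against an empty page so the
        # loop cannot spin forever on an unchanged offset.
        if not page.get("has_next_page") or page.get("count", 0) == 0:
            break
        offset += page["count"]
```

Advancing by the returned count rather than by page_size keeps the loop correct even if a page comes back shorter than requested.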

Examples

This result is in the 2019-08-27 format for GMetaResult documents.

{
  "@datatype": "GSearchResult",
  "@version": "2017-09-01",
  "count": 1,
  "gmeta": [
    {
      "@datatype": "GMetaResult",
      "@version": "2019-08-27",
      "entries": [
        {
          "content": {
            "cuisine": [
              "mexican"
            ],
            "handle": "salsa-verde",
            "ingredients": [
              {
                "amount": {
                  "number": 10
                },
                "default": "tomatillo",
                "preparation": "simmer 20 minutes",
                "type": "fruit"
              },
              {
                "amount": {
                  "number": 2
                },
                "default": "serrano pepper",
                "preparation": "seeded",
                "substitutes": [
                  "jalapeno",
                  "thai bird chili"
                ],
                "type": "fruit"
              },
              {
                "amount": {
                  "number": 2,
                  "unit": "clove"
                },
                "default": "garlic",
                "type": "vegetable"
              },
              {
                "amount": {
                  "number": 0.5
                },
                "default": "yellow onion",
                "type": "vegetable"
              },
              {
                "amount": {
                  "number": 2,
                  "unit": "tsp"
                },
                "default": "salt",
                "type": "spice"
              },
              {
                "amount": {
                  "number": 2,
                  "unit": "tbsp"
                },
                "default": "coriander",
                "preparation": "ground",
                "substitutes": [
                  "cumin"
                ],
                "type": "spice"
              }
            ],
            "keywords": [
              "salsa",
              "tomatillo",
              "coriander",
              "serrano pepper"
            ],
            "origin": {
              "author": "Diana Kennedy",
              "title": "Regional Mexican Cooking",
              "type": "book"
            }
          },
          "entry_id": null
        }
      ],
      "subject": "https://en.wikipedia.org/wiki/Salsa_verde"
    }
  ],
  "offset": 0,
  "total": 1
}

This result is in the 2017-09-01 format for GMetaResult documents.

{
  "count": 1,
  "offset": 0,
  "total": 1,
  "gmeta": [
    {
      "content": [
        {
          "alpha": {
            "beta": "gamma"
          }
        }
      ],
      "entry_ids": [
        null
      ],
      "subject": "http://example.com"
    }
  ]
}

GMetaResult

These are components in a search result.

A GMetaResult is a structure similar to a GMetaEntry from the Ingest API, with the following significant differences:

  • visibility information is not exposed; i.e. visible_to is not included

  • metadata for any subject may be an aggregate of multiple documents with different visibility rules or sources. Thus, the result is always returned as an array in which each element represents data provided by a different source or with different visibility

GMetaResult version 2019-08-27

Field Name | Type | Description
subject | String | the resource described by this metadata, often a URI
entries | Array | An array of objects containing the data pertaining to the subject. Each object has the fields content, entry_id, and matched_principal_sets. The content is an object with the entry data which was sent to Search, and the entry_id is its ID. If there are any assigned principal_sets for the entry which match the current caller, they will be returned as an array of strings in matched_principal_sets.

{
  "entries": [
    {
      "content": {
        "alpha": {
          "beta": "gamma"
        }
      },
      "matched_principal_sets": [],
      "entry_id": null
    },
    {
      "content": {
        "alpha": {
          "beta": "delta"
        }
      },
      "matched_principal_sets": [],
      "entry_id": "with_delta"
    }
  ],
  "subject": "http://example.com"
}
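
Because each entry in the 2019-08-27 format is self-contained, flattening a result into individual entries takes only a nested loop. This is a sketch with a hypothetical helper name (iter_entries), and it expects an already decoded GSearchResult dict.

```python
def iter_entries(search_result):
    """Yield (subject, entry_id, content) for every entry in a GSearchResult.

    Expects GMetaResult documents in the 2019-08-27 format.
    """
    for gmeta in search_result.get("gmeta", []):
        for entry in gmeta.get("entries", []):
            yield gmeta["subject"], entry.get("entry_id"), entry["content"]
```

A subject with two entries, as in the example above, yields two tuples that share the same subject.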

GMetaResult version 2017-09-01 (legacy format)

Field Name | Type | Description
subject | String | the resource described by this metadata, often a URI
content | Array | an array of objects containing the metadata pertaining to the subject
entry_ids | Array | an array of Entry IDs matching the content such that the entry ID at index i has content found in content[i]. See note below

Note

entry_ids and content are kept separate to maintain backwards compatibility. They can easily be unified with a zip operation in many programming languages.

{
  "content": [
    {
      "alpha": {
        "beta": "gamma"
      }
    }
  ],
  "entry_ids": [
    null
  ],
  "subject": "http://example.com"
}
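
The zip operation mentioned in the note can be sketched in Python as follows. The helper name legacy_to_entries is hypothetical. Since the legacy format carries no matched_principal_sets, that field is left out of the converted entries rather than guessed.

```python
def legacy_to_entries(legacy):
    """Convert a 2017-09-01 GMetaResult dict into the 2019-08-27 layout."""
    entries = [
        {"content": content, "entry_id": entry_id}
        # content[i] belongs to entry_ids[i], so zip pairs them up.
        for content, entry_id in zip(legacy["content"], legacy["entry_ids"])
    ]
    return {"subject": legacy["subject"], "entries": entries}
```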

Note how, in the example below, the new format makes it easier to associate entry_id values with content blobs. Additionally, this new format is more extensible — if new fields are needed in the new format, they can be added as siblings of the content and entry_id fields.

Table 1. Format Comparison

New Format (2019-08-27): each entry is a complete, standalone subdocument

{
  "entries": [
    {
      "content": {
        "alpha": {
          "beta": "gamma"
        }
      },
      "matched_principal_sets": [],
      "entry_id": null
    },
    {
      "content": {
        "alpha": {
          "beta": "delta"
        }
      },
      "matched_principal_sets": [],
      "entry_id": "with_delta"
    }
  ],
  "subject": "http://example.com"
}

Old Format (2017-09-01): entry_ids needs to be zipped with content to make sense of the structure

{
  "content": [
    {
      "alpha": {
        "beta": "gamma"
      }
    },
    {
      "alpha": {
        "beta": "delta"
      }
    }
  ],
  "entry_ids": [
    null,
    "with_delta"
  ],
  "subject": "http://example.com"
}

GBucket

Field Name | Type | Description
value | String or Object | If the bucket represents a single value (e.g. in a "terms" GFacet), the value is provided. If the bucket represents a range of values, then this is an object with "from" and "to" as in a GFilterRange document. This range is closed at the "from" value and open at the "to" value, as in [from, to)
count | Integer | The number of results in this bucket

{
  "value": ".docx",
  "count": 1234
}

{
  "value": {
    "from": "0",
    "to": "10"
  },
  "count": 0
}

{
  "value": {
    "from": "2011-01-01",
    "to": "2012-01-01"
  },
  "count": 17
}

GFacetResult

Field Name | Type | Description
name | String | Name of the GFacet in the search request
value | Float | Result of the GFacet if it was a sum or avg facet
buckets | Array | An array of GBucket documents if it was a terms, numeric_histogram or date_histogram facet

{
  "name": "extensions",
  "buckets": [
    {
      "@version": "2017-09-01",
      "value": ".docx",
      "count": 1234
    },
    {
      "@version": "2017-09-01",
      "value": ".png",
      "count": 12
    }
  ]
}

{
  "name": "calculations",
  "value": 24.5
}
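
A GFacetResult carries either buckets or a single value, so client code typically branches on which one is present. The sketch below uses hypothetical helper names (bucket_label, summarize_facet) and renders range buckets in the half-open [from, to) notation described for GBucket.

```python
def bucket_label(value):
    """Render a GBucket value: plain values as-is, ranges as "[from, to)"."""
    if isinstance(value, dict):
        return f"[{value['from']}, {value['to']})"
    return str(value)


def summarize_facet(facet_result):
    """Return (name, value) for sum/avg facets, else (name, {label: count})."""
    name = facet_result["name"]
    if "buckets" in facet_result:
        counts = {
            bucket_label(bucket["value"]): bucket["count"]
            for bucket in facet_result["buckets"]
        }
        return name, counts
    return name, facet_result["value"]
```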
© 2010- The University of Chicago