
POST Query

Method: POST
URL: /v1/index/<index_id>/search
Authentication required? Only for non-public data
Required Roles: None
Request Body: a GSearchRequest document
Response Body: a GSearchResult document

Authentication & Authorization

Tokens for this call must have one of these scopes.

urn:globus:scopes:search.api.globus.org:all
urn:globus:scopes:search.api.globus.org:search

Examples

Query via curl

To run a query, we send it via a POST to the API, e.g.

curl -XPOST \
    -H 'Content-Type: application/json' \
    'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/search' \
    --data '
{
  "q": "a search with filtering and faceting",
  "filters": [
    {
      "type": "range",
      "field_name": "path.to.date",
      "values": [
        {
          "from": "*",
          "to": "2014-11-07"
        }
      ]
    }
  ],
  "facets": [
    {
      "name": "Publication Date",
      "field_name": "path.to.date",
      "type": "date_histogram",
      "date_interval": "year"
    }
  ],
  "sort": [
    {
      "field_name": "path.to.date",
      "order": "asc"
    }
  ]
}'
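
The same request can be prepared from Python. The following is a minimal sketch using only the standard library; the helper name build_search_request is hypothetical, and sending the token as an "Authorization: Bearer" header is an assumption not stated on this page. The request is built but not sent, so the sketch runs without network access.

```python
import json
import urllib.request

SEARCH_BASE_URL = "https://search.api.globus.org"


def build_search_request(index_id, body, token=None):
    """Build (but do not send) a POST request for /v1/index/<index_id>/search.

    Hypothetical helper for illustration only.
    """
    url = f"{SEARCH_BASE_URL}/v1/index/{index_id}/search"
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: the token is passed as a Bearer token. Authentication
        # is only required for non-public data.
        headers["Authorization"] = f"Bearer {token}"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


request = build_search_request(
    "4de0e89e-a395-11e7-bc54-8c705ad34f60",
    {"q": "a search with filtering and faceting", "limit": 5},
)
```

To actually run the query, pass the request to urllib.request.urlopen and decode the JSON response body, which is a GSearchResult document.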

Request Schemas

GSearchRequest

This is the main document type for encoding a complex Search query.

| Field Name | Type | Description |
|---|---|---|
| q | String | User-supplied query, conforming to the query syntax. Required if there are no filters. |
| advanced | Boolean | Defaults to False. When true, interpret q with the advanced query syntax. |
| limit | Integer | Optional. Return at most limit results. Defaults to 10. |
| offset | Integer | Optional. Start at the result numbered offset. Used together with limit, this allows result paging. Defaults to 0. |
| query_template | String | Optional. Name of a query_template. |
| bypass_visible_to | Boolean | Defaults to False. Allowed for Index Admins only. When true, visible_to restrictions are ignored for this search query. |
| result_format_version | String | One of {"2019-08-27", "2017-09-01"}. Defaults to 2019-08-27. When given as 2017-09-01, results are returned in the legacy format. |
| filter_principal_sets | List of Strings | Optional. A list of principal_set names. The caller's identity set is matched against any principal_sets assigned to entry documents, and results are filtered to those that match any of these strings. If this parameter is provided, at least one match must be present. |
| filters | Array | Optional. An array of GFilter documents. Filters to apply to the search. |
| facets | Array | Optional. An array of GFacet documents. Facets to count on the search. |
| boosts | Array | Optional. An array of GBoost documents. Fields whose matches are weighted more heavily in unsorted searches. |
| sort | Array | Optional. An array of GSort documents. Fields on which to sort returned values. |

Note

If sort is specified, boosts is ignored: results are ordered by the sort fields rather than by the relevance calculation, which is what boosts influence.

Examples

Simple Query

{
  "q": "the quick brown fox jumps"
}

Search with Filtering

{
  "q": "a search with filtering",
  "filters": [
    {
      "type": "range",
      "field_name": "path.to.date",
      "values": [
        {
          "from": "*",
          "to": "2014-11-07"
        }
      ]
    }
  ]
}

Advanced Query with Limit

{
  "q": "author: \"John Doe\"",
  "advanced": true,
  "limit": 5
}

Paginated Search using Limit+Offset

{
  "q": "a search with paging",
  "offset": 100,
  "limit": 100
}
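
As a sketch of how offset and limit combine with has_next_page from the response, the loop below walks through all pages of a query. The names iter_results and fake_run_query are hypothetical; fake_run_query stands in for a real call to the API so that the example runs offline, and it only imitates the response fields described on this page.

```python
def iter_results(run_query, query, page_size=100):
    """Yield every gmeta item for a query, one page at a time.

    run_query is any callable that takes a GSearchRequest dict and
    returns a GSearchResult dict.
    """
    offset = 0
    while True:
        result = run_query({"q": query, "offset": offset, "limit": page_size})
        yield from result["gmeta"]
        if not result.get("has_next_page"):
            break
        offset += page_size


def fake_run_query(body):
    """Offline stand-in for the API: 250 fake results, sliced by offset/limit."""
    data = [{"subject": f"item-{i}"} for i in range(250)]
    start = body["offset"]
    stop = start + body["limit"]
    page = data[start:stop]
    return {
        "gmeta": page,
        "count": len(page),
        "offset": start,
        "total": len(data),
        "has_next_page": stop < len(data),
    }


subjects = [item["subject"] for item in iter_results(fake_run_query, "a search with paging")]
```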

Search with Filtering and Faceting

{
  "q": "a search with filtering and faceting",
  "filters": [
    {
      "type": "range",
      "field_name": "path.to.date",
      "values": [
        {
          "from": "*",
          "to": "2014-11-07"
        }
      ]
    }
  ],
  "facets": [
    {
      "name": "Publication Date",
      "field_name": "path.to.date",
      "type": "date_histogram",
      "date_interval": "year"
    }
  ],
  "sort": [
    {
      "field_name": "path.to.date",
      "order": "asc"
    }
  ]
}

Advanced Query with Boolean Operators

{
  "q": "(queries can be fancy AND cool) OR (NOT extravagant)",
  "advanced": true
}

GFilter

| Field Name | Type | Description |
|---|---|---|
| type | String | One of {"match_any", "match_all", "range"} |
| field_name | String | The field to which the filter refers. Any dots (".") that are part of the field name must be escaped with a preceding backslash ("\") character; unescaped dots are treated as paths to a field. |
| values | Array of Strings or Objects | The values to evaluate against the field_name. If type is "match_any" or "match_all", this must be a list of Strings. If type is "range", this must be a list of Objects, each with the fields from and to. |

"match_any" and "match_all" refer to the different possible behaviors of the filter values. As their names imply, if "match_any" is specified, the filter matches results for which any of the filter values match, while "match_all" requires that all of the values match on every result.

Note

"match_any" and "match_all" are the same when there’s only one value as far as filtering is concerned, but they may have different impact on the way that facets are interpreted.

Note

values.from and values.to may be the special string "*" indicating that the range is unbounded on this end. An example is given below.

Examples

Example 1

{
  "type": "match_any",
  "field_name": "https://schema\\.labs\\.datacite\\.org/meta/kernel-4\\.0/metadata\\.xsd#resourceTypeGeneral",
  "values": ["Globus Endpoint"]
}

Example 2

{
  "type": "range",
  "field_name": "path.to.date",
  "values": [
    {
      "from": "1970-01-01",
      "to": "2015-01-01"
    }
  ]
}

Example 3

{
  "type": "range",
  "field_name": "path.to.date",
  "values": [
    {
      "from": "*",
      "to": "2014-11-07"
    }
  ]
}

Example 4

{
  "type": "match_all",
  "field_name": "https://transfer\\.api\\.globus\\.org/endpoint#keywords",
  "values": ["hpc", "internet2", "uchicago"]
}
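
The filter documents above can also be assembled programmatically. This is a minimal sketch with hypothetical helper names (escape_field_name, match_filter, range_filter); it assumes that only literal dots in a field name need escaping and does not handle names that already contain backslashes.

```python
import json


def escape_field_name(name):
    """Escape literal dots so they are not treated as path separators."""
    return name.replace(".", "\\.")


def match_filter(field_name, values, require_all=False):
    """Build a match_any (default) or match_all GFilter document."""
    return {
        "type": "match_all" if require_all else "match_any",
        "field_name": field_name,
        "values": list(values),
    }


def range_filter(field_name, low="*", high="*"):
    """Build a range GFilter document; "*" leaves that end unbounded."""
    return {
        "type": "range",
        "field_name": field_name,
        "values": [{"from": low, "to": high}],
    }


keywords = match_filter(
    escape_field_name("https://transfer.api.globus.org/endpoint#keywords"),
    ["hpc", "internet2", "uchicago"],
    require_all=True,
)
before_date = range_filter("path.to.date", high="2014-11-07")
print(json.dumps({"q": "a search with filtering", "filters": [keywords, before_date]}, indent=2))
```

Note that "path.to.date" is passed unescaped here because its dots are meant as a path into nested fields, as in the examples above.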

GFacet

| Field Name | Type | Description |
|---|---|---|
| name | String | A name for this facet which is referenced in the results. If name is omitted, it defaults to the value of the field_name property. If more than one facet in a single search request references the same field, a name must be provided. |
| type | String | One of {"terms", "date_histogram", "numeric_histogram", "sum", "avg"} |
| field_name | String | The field to which the facet refers. Any dots (".") that are part of the field name must be escaped with a preceding backslash ("\") character. |
| size | Integer | For terms and numeric_histogram facets, the number of facet values (buckets) to return. For terms, these are the most common values (the buckets with the highest counts). For numeric_histogram, this is the number of intervals to create between low and high of the histogram_range. |
| missing | Float | For sum and avg facets, the value to use for entries that do not contain the field named by field_name. Optional. By default, missing values are ignored and do not count towards sums and averages. |
| histogram_range | Object | For "date_histogram" and "numeric_histogram" facets, an object containing the fields low (a numeric or date-formatted String containing the low value for the first bucket) and high (a numeric or date-formatted String containing the high value for the last bucket). Required when type is "numeric_histogram". |
| date_interval | String | One of {"year", "quarter", "month", "week", "day", "hour", "minute", "second"}. Indicates the unit for the buckets returned within the histogram_range. Required when type is "date_histogram". |

Note

For a terms facet, any values containing more than 10,000 characters will not be tabulated into the results and no buckets containing a value with more than 10,000 characters will be created.

date_histogram faceting requires that the field was detected as a date type. See the Globus Search supported Date Formats to see how data is detected as being a date. The histogram also requires that low and high are both in one of the supported date formats.

Example 1

{
  "name": "File Extension",
  "type": "terms",
  "field_name": "extension",
  "size": 10
}

Example 2

{
  "name": "pub_date",
  "type": "date_histogram",
  "field_name": "http://dublincore\\.org/schemas/xmls/qdc/2008/02/11/dcterms\\.xsd#created",
  "histogram_range": {
    "low": "2000-01-01",
    "high": "2010-01-01"
  },
  "date_interval": "year"
}

Example 3

{
  "name": "file size",
  "type": "numeric_histogram",
  "field_name": "https://transfer\\.api\\.globus\\.org/file#size",
  "size": 100,
  "histogram_range": {
    "low": 0,
    "high": 100000000
  }
}

Example 4

{
  "name": "calculate total cost",
  "type": "sum",
  "field_name": "price"
}

Example 5

{
  "name": "calculate average cost per item",
  "type": "avg",
  "missing": 1.20,
  "field_name": "price"
}
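
The requirements in the GFacet table can be checked on the client before a request is sent. The sketch below is illustrative only: check_facets is a hypothetical helper, it covers just the rules stated above (known type, histogram_range for numeric_histogram, date_interval for date_histogram, and a name when several facets share a field), and the service may enforce further rules.

```python
FACET_TYPES = {"terms", "date_histogram", "numeric_histogram", "sum", "avg"}
DATE_INTERVALS = {"year", "quarter", "month", "week", "day", "hour", "minute", "second"}


def check_facets(facets):
    """Return a list of human-readable problems found in a list of GFacet dicts."""
    problems = []
    by_field = {}
    for index, facet in enumerate(facets):
        label = facet.get("name", facet.get("field_name", f"facet #{index}"))
        facet_type = facet.get("type")
        if facet_type not in FACET_TYPES:
            problems.append(f"{label}: unknown type {facet_type!r}")
        if facet_type == "numeric_histogram" and "histogram_range" not in facet:
            problems.append(f"{label}: histogram_range is required")
        if facet_type == "date_histogram" and facet.get("date_interval") not in DATE_INTERVALS:
            problems.append(f"{label}: date_interval is missing or invalid")
        by_field.setdefault(facet.get("field_name"), []).append(facet)
    # Facets that share a field must all be named so results can be told apart.
    for field, group in by_field.items():
        if len(group) > 1 and any("name" not in facet for facet in group):
            problems.append(f"{field}: facets sharing a field must each have a name")
    return problems


ok = check_facets([{"name": "File Extension", "type": "terms", "field_name": "extension", "size": 10}])
bad = check_facets([
    {"type": "numeric_histogram", "field_name": "size"},
    {"type": "sum", "field_name": "size"},
])
```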

GBoost

| Field Name | Type | Description |
|---|---|---|
| field_name | String | Field to rank higher in results. Any dots (".") must be escaped with a preceding backslash ("\") character, or they will be treated as paths to a field and not as part of a field name. |
| factor | Floating Point | Factor for weighting results for query matches on the field_name. A factor greater than 1 ranks matches higher; a factor less than 1 is negative boosting. Maximum of 10, minimum of 0. |

Examples

Example 1

{
  "field_name": "author",
  "factor": 5
}
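
A small sketch of building a GBoost document while enforcing the documented factor range. The helper name make_boost is hypothetical, and rejecting out-of-range values on the client is a design choice of this example rather than documented service behavior.

```python
def make_boost(field_name, factor):
    """Build a GBoost document, rejecting factors outside the range 0 to 10."""
    if not 0 <= factor <= 10:
        raise ValueError(f"factor must be between 0 and 10, got {factor}")
    return {"field_name": field_name, "factor": factor}


request_body = {
    "q": "regional cooking",
    # boosts only affects relevance ordering; it is ignored if sort is given.
    "boosts": [make_boost("author", 5), make_boost("keywords", 0.5)],
}
```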

GSort

| Field Name | Type | Description |
|---|---|---|
| field_name | String | The field on which to sort. Any dots (".") that are part of the field name must be escaped with a preceding backslash ("\") character. |
| order | String | Must be one of "asc" or "desc", indicating the ordering of the sort: ascending ("asc") or descending ("desc"). See also the note on sorting when multiple values are present for a particular field. |

Note

A single field may contain multiple values for a single subject, such as when an array of values is provided or when there are multiple GMetaEntry structures which refer to the same subject. In this situation, the value used during sorting is the "smallest" when sorting in ascending order and the "largest" when sorting in descending order.

Note

Any record which does not contain a value for a field which is sorted upon will appear at the end of the sorted list, regardless of whether the sort is ascending or descending. If more than one record does not contain a value, the ordering among those records is arbitrary.

Note

For purposes of sorting, a field containing more than 10,000 characters is considered missing, and is thus sorted to the end of the list.

Examples

Example 1

{
  "field_name": "author",
  "order": "asc"
}

Example 2

{
  "field_name": "path.to.date",
  "order": "desc"
}
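
The sorting notes above can be illustrated with a local simulation. This is not the service's implementation, only a sketch of the described rules: the smallest value is used for ascending sorts, the largest for descending sorts, and records with no usable value go last. The function names are hypothetical, and treating each over-long string value individually as missing is an interpretation of the 10,000 character note.

```python
MAX_SORTABLE_CHARS = 10_000


def sortable_values(record, field):
    """Return the values of a field that can take part in sorting."""
    values = record.get(field, [])
    if not isinstance(values, list):
        values = [values]
    # Interpretation: over-long string values are treated as missing.
    return [v for v in values if not (isinstance(v, str) and len(v) > MAX_SORTABLE_CHARS)]


def sort_like_search(records, field, order="asc"):
    """Order records following the documented multi-value and missing-value rules."""
    present = [r for r in records if sortable_values(r, field)]
    missing = [r for r in records if not sortable_values(r, field)]
    pick = min if order == "asc" else max
    present.sort(key=lambda r: pick(sortable_values(r, field)), reverse=(order == "desc"))
    return present + missing


records = [
    {"id": "a", "year": [1995, 2005]},
    {"id": "b"},
    {"id": "c", "year": [1990]},
    {"id": "d", "year": [2010]},
]
ascending = [r["id"] for r in sort_like_search(records, "year", "asc")]
descending = [r["id"] for r in sort_like_search(records, "year", "desc")]
```

Record "a" sorts by 1995 in the ascending case and by 2005 in the descending case, while "b" stays last in both.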

Response Schemas

GSearchResult

This is the document type for all results from Search queries.

| Field Name | Type | Description |
|---|---|---|
| gmeta | Array | An array of GMetaResult documents, the main body of the result. |
| facet_result | Array | Optional. An array of GFacetResult documents with counts for all facets requested on the search request. |
| offset | Integer | The offset provided on the input search request. |
| count | Integer | The number of results returned; i.e. the size of the gmeta array. May be 0. |
| total | Integer | The total number of matches for the search. May be 0 if no matches are found. |
| has_next_page | Boolean | True if there's another page of results available, False otherwise. |

Examples

Example 1

This result is in the 2019-08-27 format for GMetaResult documents.

{
  "@datatype": "GSearchResult",
  "@version": "2017-09-01",
  "count": 1,
  "gmeta": [
    {
      "@datatype": "GMetaResult",
      "@version": "2019-08-27",
      "entries": [
        {
          "content": {
            "cuisine": [
              "mexican"
            ],
            "handle": "salsa-verde",
            "ingredients": [
              {
                "amount": {
                  "number": 10
                },
                "default": "tomatillo",
                "preparation": "simmer 20 minutes",
                "type": "fruit"
              },
              {
                "amount": {
                  "number": 2
                },
                "default": "serrano pepper",
                "preparation": "seeded",
                "substitutes": [
                  "jalapeno",
                  "thai bird chili"
                ],
                "type": "fruit"
              },
              {
                "amount": {
                  "number": 2,
                  "unit": "clove"
                },
                "default": "garlic",
                "type": "vegetable"
              },
              {
                "amount": {
                  "number": 0.5
                },
                "default": "yellow onion",
                "type": "vegetable"
              },
              {
                "amount": {
                  "number": 2,
                  "unit": "tsp"
                },
                "default": "salt",
                "type": "spice"
              },
              {
                "amount": {
                  "number": 2,
                  "unit": "tbsp"
                },
                "default": "coriander",
                "preparation": "ground",
                "subsitutes": [
                  "cumin"
                ],
                "type": "spice"
              }
            ],
            "keywords": [
              "salsa",
              "tomatillo",
              "coriander",
              "serrano pepper"
            ],
            "origin": {
              "author": "Diana Kennedy",
              "title": "Regional Mexican Cooking",
              "type": "book"
            }
          },
          "entry_id": null
        }
      ],
      "subject": "https://en.wikipedia.org/wiki/Salsa_verde"
    }
  ],
  "offset": 0,
  "total": 1
}

Example 2

This result is in the 2017-09-01 format for GMetaResult documents.

{
  "count": 1,
  "offset": 0,
  "total": 1,
  "gmeta": [
    {
      "content": [
        {
          "alpha": {
            "beta": "gamma"
          }
        }
      ],
      "entry_ids": [
        null
      ],
      "subject": "http://example.com"
    }
  ]
}

GMetaResult

These are components in a search result.

A GMetaResult is a structure similar to a GMetaEntry from the Ingest API, with the following significant differences:

  • visibility information is not exposed; i.e. visible_to is not included

  • metadata for any subject may be an aggregate of multiple documents with different visibility rules or sources. Thus, the result is always returned as an array in which each element represents data provided by a different source or with different visibility

GMetaResult version 2019-08-27

| Field Name | Type | Description |
|---|---|---|
| subject | String | The resource described by this metadata, often a URI. |
| entries | Array | An array of objects containing the data pertaining to the subject. Each object has the fields content, entry_id, and matched_principal_sets. The content is an object with the entry data which was sent to Search, and the entry_id is its ID. If there are any assigned principal_sets for the entry which match the current caller, they are returned as an array of strings in matched_principal_sets. |

Example 1

{
  "entries": [
    {
      "content": {
        "alpha": {
          "beta": "gamma"
        }
      },
      "matched_principal_sets": [],
      "entry_id": null
    },
    {
      "content": {
        "alpha": {
          "beta": "delta"
        }
      },
      "matched_principal_sets": [],
      "entry_id": "with_delta"
    }
  ],
  "subject": "http://example.com"
}
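
As a sketch of consuming the 2019-08-27 format, the generator below flattens a GSearchResult into (subject, entry_id, content) tuples. The name iter_entries is hypothetical, and the sample result is a trimmed, hand-written document rather than real API output.

```python
def iter_entries(search_result):
    """Yield (subject, entry_id, content) for every entry in a GSearchResult."""
    for gmeta in search_result["gmeta"]:
        for entry in gmeta["entries"]:
            yield gmeta["subject"], entry["entry_id"], entry["content"]


sample_result = {
    "count": 1,
    "offset": 0,
    "total": 1,
    "gmeta": [
        {
            "subject": "http://example.com",
            "entries": [
                {"content": {"alpha": {"beta": "gamma"}}, "matched_principal_sets": [], "entry_id": None},
                {"content": {"alpha": {"beta": "delta"}}, "matched_principal_sets": [], "entry_id": "with_delta"},
            ],
        }
    ],
}

rows = list(iter_entries(sample_result))
```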

GMetaResult version 2017-09-01 (legacy format)

| Field Name | Type | Description |
|---|---|---|
| subject | String | The resource described by this metadata, often a URI. |
| content | Array | An array of objects containing the metadata pertaining to the subject. |
| entry_ids | Array | An array of Entry IDs matching the content, such that the entry ID at index i has its content at content[i]. See the note below. |

Note

entry_ids and content are kept separate to maintain backwards compatibility. They can easily be unified with a zip operation in many programming languages.
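
In Python, for instance, the zip operation could look like the sketch below. The function name unify_legacy is hypothetical, and the output shape (a list of objects with entry_id and content) is chosen here to resemble the entries of the newer format.

```python
def unify_legacy(gmeta):
    """Pair each entry ID with its content from a 2017-09-01 GMetaResult."""
    return [
        {"entry_id": entry_id, "content": content}
        for entry_id, content in zip(gmeta["entry_ids"], gmeta["content"])
    ]


legacy = {
    "content": [
        {"alpha": {"beta": "gamma"}},
        {"alpha": {"beta": "delta"}},
    ],
    "entry_ids": [None, "with_delta"],
    "subject": "http://example.com",
}

entries = unify_legacy(legacy)
```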

Example 1

{
  "content": [
    {
      "alpha": {
        "beta": "gamma"
      }
    }
  ],
  "entry_ids": [
    null
  ],
  "subject": "http://example.com"
}

Side-by-side comparison with the new format

Note how, in the comparison below, the new format makes it easier to associate entry_id values with content blobs. Additionally, the new format is more extensible: if new fields are needed, they can be added as siblings of the content and entry_id fields.

Table 1. Format Comparison

New Format (2019-08-27): each entry is a complete, standalone subdocument.

{
  "entries": [
    {
      "content": {
        "alpha": {
          "beta": "gamma"
        }
      },
      "matched_principal_sets": [],
      "entry_id": null
    },
    {
      "content": {
        "alpha": {
          "beta": "delta"
        }
      },
      "matched_principal_sets": [],
      "entry_id": "with_delta"
    }
  ],
  "subject": "http://example.com"
}

Old Format (2017-09-01): entry_ids needs to be zipped with content to make sense of the structure.

{
  "content": [
    {
      "alpha": {
        "beta": "gamma"
      }
    },
    {
      "alpha": {
        "beta": "delta"
      }
    }
  ],
  "entry_ids": [
    null,
    "with_delta"
  ],
  "subject": "http://example.com"
}

GBucket

| Field Name | Type | Description |
|---|---|---|
| value | String or Object | If the bucket represents a single value (e.g. in a "terms" GFacet), the value is provided. If the bucket represents a range of values, this is an object with "from" and "to", as in a GFilter document. The range is closed on the "from" value and open on the "to" value, as in [from, to). |
| count | Integer | The number of results in this bucket. |

Example 1

{
  "value": ".docx",
  "count": 1234
}

Example 2

{
  "value": {
    "from": "0",
    "to": "10"
  },
  "count": 0
}

Example 3

{
  "value": {
    "from": "2011-01-01",
    "to": "2012-01-01"
  },
  "count": 17
}
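
The half-open [from, to) convention can be made concrete with a small membership check. This is a sketch under stated assumptions: bucket_contains is a hypothetical helper, and the convert parameter (float for numeric buckets, date.fromisoformat for YYYY-MM-DD dates) is this example's way of turning the string bounds shown above into comparable values.

```python
from datetime import date


def bucket_contains(bucket, candidate, convert=float):
    """Return True if candidate falls in the bucket.

    Range buckets are closed on "from" and open on "to"; single-value
    buckets match by equality.
    """
    value = bucket["value"]
    if isinstance(value, dict):
        return convert(value["from"]) <= candidate < convert(value["to"])
    return candidate == value


numeric_bucket = {"value": {"from": "0", "to": "10"}, "count": 0}
date_bucket = {"value": {"from": "2011-01-01", "to": "2012-01-01"}, "count": 17}

in_low = bucket_contains(numeric_bucket, 0)      # lower bound is included
in_high = bucket_contains(numeric_bucket, 10)    # upper bound is excluded
in_2011 = bucket_contains(date_bucket, date(2011, 6, 1), convert=date.fromisoformat)
```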

GFacetResult

| Field Name | Type | Description |
|---|---|---|
| name | String | Name of the GFacet in the search request. |
| value | Float | Result of the GFacet if it was a sum or avg facet. |
| buckets | Array | An array of GBucket documents if it was a terms, numeric_histogram, or date_histogram facet. |

Example 1

{
  "name": "extensions",
  "buckets": [
    {
      "@version": "2017-09-01",
      "value": ".docx",
      "count": 1234
    },
    {
      "@version": "2017-09-01",
      "value": ".png",
      "count": 12
    }
  ]
}

Example 2

{
  "name": "calculations",
  "value": 24.5
}
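
Since a GFacetResult carries either buckets or a single value, client code usually has to branch on which one is present. The sketch below shows one way to do that; summarize_facets is a hypothetical helper, and returning buckets as (value, count) pairs is a choice made here because range bucket values are objects and cannot be used as dictionary keys.

```python
def summarize_facets(facet_results):
    """Map each facet name to its (value, count) pairs or its single value."""
    summary = {}
    for facet in facet_results:
        if "buckets" in facet:
            # terms, numeric_histogram, and date_histogram facets
            summary[facet["name"]] = [(b["value"], b["count"]) for b in facet["buckets"]]
        else:
            # sum and avg facets
            summary[facet["name"]] = facet["value"]
    return summary


facet_results = [
    {
        "name": "extensions",
        "buckets": [
            {"@version": "2017-09-01", "value": ".docx", "count": 1234},
            {"@version": "2017-09-01", "value": ".png", "count": 12},
        ],
    },
    {"name": "calculations", "value": 24.5},
]

summary = summarize_facets(facet_results)
```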

© 2010- The University of Chicago Legal