POST Query
Method | POST |
URL | /v1/index/<index_id>/search |
Authentication required? | Only for non-public data |
Required Roles | None |
Request Body | a GSearchRequest document |
Response Body | a GSearchResult document |
Authentication & Authorization
Tokens for this call must have one of these scopes.
urn:globus:auth:scope:search.api.globus.org:all
urn:globus:auth:scope:search.api.globus.org:search
Examples
To run a query, we send it via a POST to the API, e.g.
curl -XPOST \
-H 'Content-Type: application/json' \
'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/search' \
--data '
{
"q": "a search with filtering and faceting",
"filters": [
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "*",
"to": "2014-11-07"
}
]
}
],
"facets": [
{
"name": "Publication Date",
"field_name": "path.to.date",
"type": "date_histogram",
"date_interval": "year"
}
],
"sort": [
{
"field_name": "path.to.date",
"order": "asc"
}
]
}'
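The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper name `build_search_request` is hypothetical, and passing the token as a Bearer `Authorization` header is an assumption that only matters for non-public data.

```python
import json
import urllib.request

SEARCH_API = "https://search.api.globus.org"


def build_search_request(index_id, query, token=None):
    """Build (but do not send) a POST request for the search endpoint."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumption: the access token is sent as a standard Bearer header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{SEARCH_API}/v1/index/{index_id}/search",
        data=json.dumps(query).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_search_request(
    "4de0e89e-a395-11e7-bc54-8c705ad34f60",
    {"q": "a search with filtering and faceting", "limit": 5},
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the body and headers easy to inspect before any network call is made.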
Request Schemas
GSearchRequest
This is the main document type for encoding a complex Search query.
Field Name | Type | Description |
---|---|---|
q | String | User-supplied query, conforming to the query syntax. Required if there are no filters. |
advanced | Boolean | Defaults to False. When true, q is interpreted with the advanced query syntax. |
limit | Integer | Optional. The maximum number of results to return. Defaults to 10. |
offset | Integer | Optional. Start at the result numbered offset; used together with limit, this allows result paging. Defaults to 0. |
bypass_visible_to | Boolean | Defaults to False. Allowed for Index Admins only. When true, visible_to restrictions are ignored for this search query. |
filter_principal_sets | List of Strings | Optional. A list of The caller’s identity set will be matched against any |
filters | Array | Optional. An array of GFilter documents. Filters to apply to the search. |
facets | Array | Optional. An array of GFacet documents. Facets to count on the search. |
boosts | Array | Optional. An array of GBoost documents. Fields to boost in unsorted searches. |
sort | Array | Optional. An array of GSort documents. Fields on which to sort the returned results. |
Examples
{
"q": "the quick brown fox jumps"
}
{
"q": "a search with filtering",
"filters": [
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "*",
"to": "2014-11-07"
}
]
}
]
}
{
"q": "author: \"John Doe\"",
"advanced": true,
"limit": 5
}
{
"q": "a search with paging",
"offset": 100,
"limit": 100
}
{
"q": "a search with filtering and faceting",
"filters": [
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "*",
"to": "2014-11-07"
}
]
}
],
"facets": [
{
"name": "Publication Date",
"field_name": "path.to.date",
"type": "date_histogram",
"date_interval": "year"
}
],
"sort": [
{
"field_name": "path.to.date",
"order": "asc"
}
]
}
{
"q": "(queries can be fancy AND cool) OR (NOT extravagant)",
"advanced": true
}
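Paging with offset and limit can be sketched as a loop that keeps requesting pages until has_next_page (described under GSearchResult) is false. In this sketch `search` is a hypothetical callable that takes a GSearchRequest dict and returns a GSearchResult dict, and `fake_search` is a stand-in so the example runs without network access.

```python
def iter_pages(search, q, limit=100):
    """Yield successive GSearchResult dicts until no further page exists."""
    offset = 0
    while True:
        result = search({"q": q, "offset": offset, "limit": limit})
        yield result
        # Stop when the service reports no next page, or returns nothing.
        if not result.get("has_next_page") or result["count"] == 0:
            break
        # Advance by the number of results actually returned.
        offset += result["count"]


def fake_search(request):
    """Stand-in for the real API: serves 250 numbered results."""
    total = 250
    start = request["offset"]
    stop = min(start + request["limit"], total)
    return {
        "gmeta": [{"subject": f"item-{i}"} for i in range(start, stop)],
        "offset": start,
        "count": stop - start,
        "total": total,
        "has_next_page": stop < total,
    }


pages = list(iter_pages(fake_search, "a search with paging", limit=100))
```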
GFilter
A GFilter document is one of several document types which encode a filter. The type of filter is identified by the type field.
See the table below for the various filter types.
Type | Schema |
---|---|
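match_any | GFilterMatch |
match_all | GFilterMatch |
range | GFilterRange |
geo_bounding_box | GFilterGeoBoundingBox |
geo_shape | GFilterGeoShape |
exists | GFilterExists |
like | GFilterLike |
not | GFilterNot |
and | GFilterAnd |
or | GFilterOr |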
All filters support a post_filter field. Note that post_filter is only valid on filters when they are in the top level filters array of a request. When filters are nested under and, or, or not filters, post_filter is no longer valid.
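The rule about nested filters can be checked on the client before a request is sent. The following is a minimal sketch; the function name `find_nested_post_filters` is hypothetical, and it assumes that "and" and "or" filters nest their children under filters while "not" nests one child under filter.

```python
def find_nested_post_filters(filters):
    """Return the paths of nested filters that set post_filter."""
    problems = []

    def walk(node, path, top_level):
        if not top_level and "post_filter" in node:
            problems.append(path)
        # "and" / "or" hold a list under "filters"; "not" holds one under "filter".
        for i, child in enumerate(node.get("filters", [])):
            walk(child, f"{path}.filters[{i}]", False)
        if "filter" in node:
            walk(node["filter"], f"{path}.filter", False)

    for i, top in enumerate(filters):
        walk(top, f"filters[{i}]", True)
    return problems


request_filters = [
    {
        "type": "or",
        "post_filter": False,  # allowed: this filter is at the top level
        "filters": [
            {"type": "exists", "field_name": "a", "post_filter": True},  # not allowed
            {"type": "not", "filter": {"type": "exists", "field_name": "b"}},
        ],
    }
]
problems = find_nested_post_filters(request_filters)
```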
GFilterMatch
A matching filter for finding results which match some set of text terms.
"match_any" and "match_all" refer to the different possible behaviors of the filter values. As their names imply, if "match_any" is specified, the filter will match results for which any of filter values match, while "match_all" requires that all of the values match on every result.
Field Name | Type | Description |
---|---|---|
type | String | One of "match_any" or "match_all". |
field_name | String | The field to which the filter refers. Any dots (".") that are part of a field name, rather than path separators, must be escaped with a preceding backslash ("\") character. |
values | Array of Strings or Booleans | The values to evaluate against the field_name. If the field is a boolean field, this must be an array of booleans only. For string fields, it may be a mixture of strings and booleans. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True for |
"match_any" and "match_all" are the same when there’s only one value as far as filtering is concerned, but they may have different impact on the way that facets are interpreted.
Examples
{
"type": "match_any",
"field_name": "globus_metadata.resource_type",
"values": [
"Globus Endpoint"
]
}
{
"type": "match_all",
"field_name": "globus_metadata.keywords",
"values": [
"hpc",
"internet2",
"uchicago"
]
}
{
"type": "match_any",
"field_name": "globus_metadata.snorkels",
"values": [
"few",
"many",
true
]
}
This filter is only valid if globus_metadata.snorkels is a string field, because string fields can contain boolean values. If it is a boolean field (which cannot contain string values), the query will fail with an error regarding the improper mapping of "few" and "many" onto a boolean field.
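The difference between "match_any" and "match_all" can be illustrated locally. This is only a sketch of the semantics described above: it uses exact equality on a list of field values, whereas the service may analyze text differently, and the function name `matches` is hypothetical.

```python
def matches(filter_doc, field_values):
    """Evaluate a GFilterMatch-style document against one field's values."""
    present = [wanted in field_values for wanted in filter_doc["values"]]
    if filter_doc["type"] == "match_any":
        return any(present)  # at least one filter value is present
    if filter_doc["type"] == "match_all":
        return all(present)  # every filter value is present
    raise ValueError(f"not a match filter: {filter_doc['type']!r}")


doc_keywords = ["hpc", "uchicago"]
any_filter = {
    "type": "match_any",
    "field_name": "globus_metadata.keywords",
    "values": ["hpc", "internet2", "uchicago"],
}
all_filter = dict(any_filter, type="match_all")
```

With these inputs the "match_any" filter accepts the document, while "match_all" rejects it because "internet2" is missing.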
GFilterRange
A range filter for finding results which have numeric or date values within a specified range.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "range". |
field_name | String | The field to which the filter refers. Any dots (".") that are part of a field name, rather than path separators, must be escaped with a preceding backslash ("\") character. |
values | Array of Objects | The values to evaluate against the field_name. Each object has the fields "from" and "to"; the last example below also uses "gte", "lte", "gt", and "lt". |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
values.from and values.to may be the special string "*", indicating that the range is unbounded on that end. An example is given below.
Examples
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "1970-01-01",
"to": "2015-01-01"
}
]
}
{
"type": "range",
"field_name": "cardinality_of_foobar",
"values": [
{
"from": "10",
"to": "50"
}
]
}
This example filter has multiple clauses. The combination is implicitly joined with "or" semantics. This means that we allow values from 0 to 5, and greater than or equal to 10.
{
"type": "range",
"field_name": "cardinality_of_foobar",
"values": [
{
"from": "0",
"to": "5"
},
{
"from": "10",
"to": "*"
}
]
}
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "1970-01-01",
"to": "2015-01-01"
},
{
"gte": "2015-01-01",
"lte": "2016-01-01"
},
{
"gt": "2016-01-15",
"lt": "*"
}
]
}
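The "or" semantics of multiple range clauses, and the meaning of "*", can be sketched for numeric values as follows. The function name `in_range` is hypothetical, only "from" and "to" are handled, and treating the upper bound as inclusive is an assumption that the text above does not settle.

```python
def in_range(value, clauses):
    """True if `value` satisfies at least one clause ("or" semantics)."""

    def satisfies(clause):
        low = clause.get("from", "*")
        high = clause.get("to", "*")
        # "*" means the range is unbounded on that end.
        above = low == "*" or value >= float(low)
        # Assumption: the upper bound is inclusive.
        below = high == "*" or value <= float(high)
        return above and below

    return any(satisfies(clause) for clause in clauses)


clauses = [{"from": "0", "to": "5"}, {"from": "10", "to": "*"}]
```

With these clauses, 3 and 1000 are accepted while 7 falls into the gap between the two ranges.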
GFilterGeoBoundingBox
A bounding box filter for finding geo_shape and geo_point values which intersect with a specified bounding box.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "geo_bounding_box". |
field_name | String | The field to which the filter refers. Any dots (".") that are part of a field name, rather than path separators, must be escaped with a preceding backslash ("\") character. |
top_left | Object | An object describing a coordinate pair. It must contain the keys |
bottom_right | Object | An object describing a coordinate pair. It must contain the keys |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
top_left is required to be northwest of bottom_right.
GFilterGeoShape
A geo filter for finding geo_shape and geo_point values which intersect with or are contained within a given shape.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "geo_shape". |
field_name | String | The field to which the filter refers. Any dots (".") that are part of a field name, rather than path separators, must be escaped with a preceding backslash ("\") character. |
shape | Object | A GeoJSON formatted Geometry. See note below on supported geometries. |
relation | String | The shape relationship to test. One of |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
Supported Geometries
Only two-dimensional GeoJSON data are allowed in geo_shape filters. That means that coordinates should be encoded as JSON arrays of length 2.
Globus Search only supports filters using GeoJSON Polygons. Furthermore, Polygons are restricted to simple polygons, consisting of only one coordinate ring. This means that polygons with internal cut-outs are forbidden.
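The geometry restrictions above can be checked before sending a filter. This is a minimal client-side sketch with a hypothetical function name, `check_filter_shape`; it checks only the three stated rules (Polygon type, a single ring, two-element coordinates) and does not validate the geometry any further.

```python
def check_filter_shape(shape):
    """Return a list of problems with a GeoJSON geometry for a geo_shape filter."""
    if shape.get("type") != "Polygon":
        return ["only GeoJSON Polygons are supported"]
    problems = []
    rings = shape.get("coordinates", [])
    if len(rings) != 1:
        # Extra rings would describe internal cut-outs, which are forbidden.
        problems.append("polygon must consist of exactly one coordinate ring")
    for ring in rings:
        if any(len(position) != 2 for position in ring):
            problems.append("coordinates must be arrays of length 2")
            break
    return problems


square = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]],
}
```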
GFilterExists
An "existence" filter which checks if a field is present in a document with a non-null value.
Note that a field being present but with a value of null is considered the same, under exists filters, as the field being absent from the document.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "exists". |
field_name | String | The field to which the filter refers. Any dots (".") that are part of a field name, rather than path separators, must be escaped with a preceding backslash ("\") character. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
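The exists semantics, together with the dot-escaping rule for field_name, can be sketched against plain nested dictionaries. The helper names `resolve` and `exists` are hypothetical, and the sketch assumes that an unescaped dot separates path components while an escaped dot belongs to the key itself.

```python
import re


def resolve(document, field_name):
    """Look up a field path, treating unescaped dots as path separators."""
    node = document
    # Split only on dots that are not preceded by a backslash.
    for part in re.split(r"(?<!\\)\.", field_name):
        key = part.replace("\\.", ".")  # unescape literal dots in the key
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def exists(document, field_name):
    """True only if the field is present with a non-null value."""
    return resolve(document, field_name) is not None


doc = {
    "path": {"to": {"date": "2014-11-07"}},
    "example.org": None,  # present, but null
    "a.b": 1,             # the dot is part of the field name
}
```

Here `exists(doc, "example\\.org")` is false because a null value is treated the same as a missing field.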
GFilterLike
A "like" filter which checks if a field matches a "like-expression". Like expressions are matching strings containing the wildcard characters:
- * matches any number of characters
- ? matches any one character
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "like". |
field_name | String | The field to which the filter refers. It must be a text field. |
value | String | The filter expression to apply as a match. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
Examples
The following filter finds documents where the field filename contains a string ending in .csv.
{
"type": "like",
"field_name": "filename",
"value": "*.csv"
}
Note that this does not technically guarantee that the filename ends with .csv. For example, it is possible for the filter to match on a value like "filename": "foo.csv bar".
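The wildcard rules can be sketched by translating a like-expression into a regular expression. Note the assumption in `like_matches`: it tests the expression against each whitespace-separated token of the text. That is one plausible reading of why "foo.csv bar" can match "*.csv", not a description of how the service is implemented, and both function names are hypothetical.

```python
import re


def like_to_regex(expression):
    """Translate a like-expression into a compiled regular expression."""
    parts = []
    for ch in expression:
        if ch == "*":
            parts.append(".*")  # any number of characters
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def like_matches(expression, text):
    """Assumed behavior: match the expression against each token of the text."""
    pattern = like_to_regex(expression)
    return any(pattern.fullmatch(token) for token in text.split())
```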
GFilterNot
A "not" filter for inverting any other valid filter.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "not". |
filter | Object | Any valid GFilter object. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
GFilterAnd
An "and" filter for joining any other valid filters. In order for an "and" filter to match on documents, all of the filters it contains must match.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "and". |
filters | Array of Objects | An array of GFilter objects. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
GFilterOr
An "or" filter for joining any other valid filters. In order for an "or" filter to match on documents, at least one of the filters it contains must match.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "or". |
filters | Array of Objects | An array of GFilter objects. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
Examples
The following filter finds documents where either the author.institution or the dataset.institution is uchicago.edu. One or both can be a match:
{
"type": "or",
"filters": [
{
"type": "match_any",
"field_name": "author.institution",
"values": [
"uchicago.edu"
]
},
{
"type": "match_any",
"field_name": "dataset.institution",
"values": [
"uchicago.edu"
]
}
]
}
GFacet
Field Name | Type | Description |
---|---|---|
name | String | A name for this facet which is referenced in the results. If name is omitted, it will default to the value of the |
type | String | One of "terms", "date_histogram", "numeric_histogram", "sum", or "avg". |
field_name | String | The field to which the facet refers. Any dots (".") that are part of a field name, rather than path separators, must be escaped with a preceding backslash ("\") character. |
size | Integer | The number of distinct facet values (buckets) to return. For terms, Required if |
missing | Float | The value to use for entries that do not contain the field named by the value of By default, missing values will be ignored and do not count towards sums and averages. Optional if |
histogram_range | Object | An object containing the fields "low" and "high". Required if |
date_interval | String | Indicates the unit for the buckets returned within the Must be one of: Required when |
For a terms facet, any values containing more than 10,000 characters will not be tabulated into the results, and no buckets containing a value with more than 10,000 characters will be created.
date_histogram faceting requires that the field was detected as a date type. See the Globus Search supported Date Formats to see how data is detected as being a date. The histogram also requires that low and high are both in one of the supported date formats.
{
"name": "File Extension",
"type": "terms",
"field_name": "extension",
"size": 10
}
{
"name": "pub_date",
"type": "date_histogram",
"field_name": "http://dublincore\\.org/schemas/xmls/qdc/2008/02/11/dcterms\\.xsd#created",
"histogram_range": {
"low": "2000-01-01",
"high": "2010-01-01"
},
"date_interval": "year"
}
{
"name": "file size",
"type": "numeric_histogram",
"field_name": "https://transfer\\.api\\.globus\\.org/file#size",
"size": 100,
"histogram_range": {
"low": 0,
"high": 100000000
}
}
{
"name": "calculate total cost",
"type": "sum",
"field_name": "price"
}
{
"name": "calculate average cost per item",
"type": "avg",
"missing": 1.2,
"field_name": "price"
}
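The escaped dots in the field_name values above can be produced programmatically. This is a small sketch with a hypothetical helper, `escape_field_name`; it escapes every dot, so it is only appropriate when the whole string is a single field name rather than a path.

```python
import json


def escape_field_name(name):
    """Escape dots that are part of a field name rather than path separators."""
    return name.replace(".", "\\.")


facet = {
    "name": "file size",
    "type": "numeric_histogram",
    "field_name": escape_field_name("https://transfer.api.globus.org/file#size"),
    "size": 100,
    "histogram_range": {"low": 0, "high": 100000000},
}

# JSON encoding doubles each backslash, as in the examples above.
encoded = json.dumps(facet, indent=2)
```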
GBoost
Field Name | Type | Description |
---|---|---|
field_name | String | The field to rank higher in results. Any dots (".") must be escaped with a preceding backslash ("\") character, or they will be treated as paths to a field and not as part of a field name. |
factor | Floating Point | Factor for weighting results for query matches on the field_name. Values greater than 1 rank matches higher; values less than 1 apply negative boosting. Maximum of 10, minimum of 0. |
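The bounds on factor can be enforced when building a GBoost document. This is a minimal sketch; `make_boost` is a hypothetical helper, and it assumes that both limits, 0 and 10, are allowed values.

```python
def make_boost(field_name, factor):
    """Build a GBoost document, rejecting factors outside the range 0 to 10."""
    if not 0 <= factor <= 10:
        raise ValueError("factor must be between 0 and 10")
    return {"field_name": field_name, "factor": float(factor)}


boost = make_boost("title", 2)  # matches on "title" rank higher
```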
GSort
Field Name | Type | Description |
---|---|---|
field_name | String | The field to which the sort refers. Any dots (".") that are part of a field name, rather than path separators, must be escaped with a preceding backslash ("\") character. |
order | String | Must be one of "asc" or "desc", indicating the ordering of the sort: ascending ("asc") or descending ("desc"). Also, see note on sorting when multiple values are present for a particular field. |
Response Schemas
GSearchResult
This is the document type for all results from Search queries.
Field Name | Type | Description |
---|---|---|
gmeta | Array | An array of GMetaResult documents, the main body of the result. |
facet_results | Array | Optional. An array of GFacetResult documents with counts for all facets requested on the search request. |
offset | Integer | The offset provided on the input search request. |
count | Integer | The number of results returned, i.e. the size of the gmeta array. May be 0. |
total | Integer | The total number of matches for the search. May be 0 if no matches are found. |
has_next_page | Boolean | True if there’s another page of results available, False otherwise. |
Examples
This result is in the 2019-08-27 format for GMetaResult documents.
{
"@datatype": "GSearchResult",
"@version": "2017-09-01",
"count": 1,
"gmeta": [
{
"@datatype": "GMetaResult",
"@version": "2019-08-27",
"entries": [
{
"content": {
"cuisine": [
"mexican"
],
"handle": "salsa-verde",
"ingredients": [
{
"amount": {
"number": 10
},
"default": "tomatillo",
"preparation": "simmer 20 minutes",
"type": "fruit"
},
{
"amount": {
"number": 2
},
"default": "serrano pepper",
"preparation": "seeded",
"substitutes": [
"jalapeno",
"thai bird chili"
],
"type": "fruit"
},
{
"amount": {
"number": 2,
"unit": "clove"
},
"default": "garlic",
"type": "vegetable"
},
{
"amount": {
"number": 0.5
},
"default": "yellow onion",
"type": "vegetable"
},
{
"amount": {
"number": 2,
"unit": "tsp"
},
"default": "salt",
"type": "spice"
},
{
"amount": {
"number": 2,
"unit": "tbsp"
},
"default": "coriander",
"preparation": "ground",
"substitutes": [
"cumin"
],
"type": "spice"
}
],
"keywords": [
"salsa",
"tomatillo",
"coriander",
"serrano pepper"
],
"origin": {
"author": "Diana Kennedy",
"title": "Regional Mexican Cooking",
"type": "book"
}
},
"entry_id": null
}
],
"subject": "https://en.wikipedia.org/wiki/Salsa_verde"
}
],
"offset": 0,
"total": 1
}
This result is in the 2017-09-01 format for GMetaResult documents.
{
"count": 1,
"offset": 0,
"total": 1,
"gmeta": [
{
"content": [
{
"alpha": {
"beta": "gamma"
}
}
],
"entry_ids": [
null
],
"subject": "http://example.com"
}
]
}
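A client that may receive either GMetaResult format can normalize both into the same shape. The sketch below is based only on the two examples above: the 2019-08-27 format has an entries array, while the 2017-09-01 format has parallel content and entry_ids arrays. The function name `iter_entries` is hypothetical.

```python
def iter_entries(gmeta_result):
    """Yield (subject, entry_id, content) from either GMetaResult format."""
    subject = gmeta_result["subject"]
    if "entries" in gmeta_result:
        # 2019-08-27 format: a list of objects with "content" and "entry_id".
        for entry in gmeta_result["entries"]:
            yield subject, entry.get("entry_id"), entry["content"]
    else:
        # 2017-09-01 format: parallel "entry_ids" and "content" arrays.
        pairs = zip(gmeta_result["entry_ids"], gmeta_result["content"])
        for entry_id, content in pairs:
            yield subject, entry_id, content


new_style = {
    "subject": "http://example.com",
    "entries": [{"content": {"alpha": {"beta": "gamma"}}, "entry_id": None}],
}
old_style = {
    "subject": "http://example.com",
    "content": [{"alpha": {"beta": "gamma"}}],
    "entry_ids": [None],
}
```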
GMetaResult
These are components in a search result.
A GMetaResult is a structure similar to a GMetaEntry from the Ingest API, with the following significant differences:
- visibility information is not exposed; i.e. visible_to is not included
- metadata for any subject may be an aggregate of multiple documents with different visibility rules or sources. Thus, the result is always returned as an array in which each element represents data provided by a different source or with different visibility
GMetaResult
Field Name | Type | Description |
---|---|---|
subject | String | The resource described by this metadata, often a URI. |
entries | Array | An array of objects containing the data pertaining to the subject. Each object has the fields shown in the example below, such as "content" and "entry_id". |
{
"entries": [
{
"content": {
"alpha": {
"beta": "gamma"
}
},
"matched_principal_sets": [],
"entry_id": null
},
{
"content": {
"alpha": {
"beta": "delta"
}
},
"matched_principal_sets": [],
"entry_id": "with_delta"
}
],
"subject": "http://example.com"
}
GBucket
Field Name | Type | Description |
---|---|---|
value | String or Object | If the bucket represents a single value (e.g. in a "terms" GFacet), the value is provided. If the bucket represents a range of values, then this is an object with "from" and "to" as in a GFilter document. This range is closed on the "from" value and open on the "to" value, as in [from, to). |
count | Integer | The number of results in this bucket. |
{
"value": ".docx",
"count": 1234
}
{
"value": {
"from": "0",
"to": "10"
},
"count": 0
}
{
"value": {
"from": "2011-01-01",
"to": "2012-01-01"
},
"count": 17
}
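The half-open [from, to) convention for range buckets can be sketched as a membership test. The function name `in_bucket` and its `parse` parameter are hypothetical: `parse` converts the string bounds into comparable values, with `float` for numeric histograms and `date.fromisoformat` for dates in the YYYY-MM-DD form used above.

```python
from datetime import date


def in_bucket(bucket, value, parse=float):
    """True if `value` belongs to the bucket."""
    bounds = bucket["value"]
    if isinstance(bounds, dict):
        # Range bucket: closed on "from", open on "to".
        return parse(bounds["from"]) <= parse(value) < parse(bounds["to"])
    # Single-value bucket, e.g. from a "terms" facet.
    return bounds == value


numeric_bucket = {"value": {"from": "0", "to": "10"}, "count": 0}
date_bucket = {"value": {"from": "2011-01-01", "to": "2012-01-01"}, "count": 17}
terms_bucket = {"value": ".docx", "count": 1234}
```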
GFacetResult
Field Name | Type | Description |
---|---|---|
name | String | Name of the GFacet in the search request. |
value | Float | Result of the GFacet if it was a sum or avg facet. |
buckets | Array | An array of GBucket documents if it was a terms, numeric_histogram or date_histogram facet. |
{
"name": "extensions",
"buckets": [
{
"@version": "2017-09-01",
"value": ".docx",
"count": 1234
},
{
"@version": "2017-09-01",
"value": ".png",
"count": 12
}
]
}
{
"name": "calculations",
"value": 24.5
}
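Since a GFacetResult carries either buckets or a single value, a client can fold the facet_results array into one dictionary keyed by facet name. This is a minimal sketch with hypothetical helper names; range buckets are keyed by a (from, to) tuple so they can serve as dictionary keys.

```python
def bucket_key(value):
    """Use (from, to) for range buckets and the plain value otherwise."""
    if isinstance(value, dict):
        return (value["from"], value["to"])
    return value


def summarize_facets(facet_results):
    """Map each facet name to its value or to a {bucket: count} dict."""
    summary = {}
    for facet in facet_results:
        if "buckets" in facet:
            summary[facet["name"]] = {
                bucket_key(b["value"]): b["count"] for b in facet["buckets"]
            }
        else:
            summary[facet["name"]] = facet["value"]
    return summary


facet_results = [
    {
        "name": "extensions",
        "buckets": [
            {"@version": "2017-09-01", "value": ".docx", "count": 1234},
            {"@version": "2017-09-01", "value": ".png", "count": 12},
        ],
    },
    {"name": "calculations", "value": 24.5},
]
summary = summarize_facets(facet_results)
```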