POST Query
Method | POST |
URL | /v1/index/<index_id>/search |
Authentication required? | Only for non-public data |
Required Roles | None |
Request Body | a GSearchRequest document |
Response Body | a GSearchResult document |
Authentication & Authorization
Tokens for this call must have one of these scopes:
- urn:globus:scopes:search.api.globus.org:all
- urn:globus:scopes:search.api.globus.org:search
Examples
To run a query, send it to the API in a POST request, e.g.:
curl -XPOST \
-H 'Content-Type: application/json' \
'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/search' \
--data '
{
"q": "a search with filtering and faceting",
"filters": [
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "*",
"to": "2014-11-07"
}
]
}
],
"facets": [
{
"name": "Publication Date",
"field_name": "path.to.date",
"type": "date_histogram",
"date_interval": "year"
}
],
"sort": [
{
"field_name": "path.to.date",
"order": "asc"
}
]
}'
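The same request can be assembled with Python's standard library. This minimal sketch only builds the request object and does not send it. `build_query_request` and the `GLOBUS_TOKEN` environment variable are hypothetical names, and sending the token as an OAuth2 `Bearer` value is an assumption; the header is added only when a token is supplied, since authentication is only required for non-public data.

```python
import json
import os
import urllib.request

SEARCH_BASE = "https://search.api.globus.org/v1/index"

def build_query_request(index_id, body, token=None):
    """Build (but do not send) a POST /v1/index/<index_id>/search request."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the access token is sent as a standard bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{SEARCH_BASE}/{index_id}/search",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )

query = {
    "q": "a search with filtering and faceting",
    "filters": [{"type": "range", "field_name": "path.to.date",
                 "values": [{"from": "*", "to": "2014-11-07"}]}],
    "facets": [{"name": "Publication Date", "field_name": "path.to.date",
                "type": "date_histogram", "date_interval": "year"}],
    "sort": [{"field_name": "path.to.date", "order": "asc"}],
}

req = build_query_request(
    "4de0e89e-a395-11e7-bc54-8c705ad34f60",
    query,
    token=os.environ.get("GLOBUS_TOKEN"),  # hypothetical variable name
)
# To actually run the query: urllib.request.urlopen(req)
```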
Request Schemas
GSearchRequest
This is the main document type for encoding a complex Search query.
Field Name | Type | Description |
---|---|---|
q | String | User-supplied query, conforming to the query syntax. Required if there are no filters. |
advanced | Boolean | Optional. Defaults to False. When true, q is interpreted with the advanced query syntax. |
limit | Integer | Optional. Return at most limit results. Defaults to 10. |
offset | Integer | Optional. Start at the result numbered offset; together with limit, this allows paging through results. Defaults to 0. |
query_template | String | Optional. The name of a query_template. |
bypass_visible_to | Boolean | Optional. Defaults to False. Allowed for index admins only. When true, visible_to restrictions are ignored for this search query. |
result_format_version | String | One of {"2019-08-27", "2017-09-01"}. Defaults to "2019-08-27". When set to "2017-09-01", results are returned in the legacy format. |
filter_principal_sets | List of Strings | Optional. A list of The caller’s identity set will be matched against any |
filters | Array | Optional. An array of GFilter documents: filters to apply to the search. |
facets | Array | Optional. An array of GFacet documents: facets to count on the search. |
boosts | Array | Optional. An array of GBoost documents: fields whose matches rank higher in unsorted searches. |
sort | Array | Optional. An array of GSort documents: fields on which to sort the returned results. |
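The fields above map directly onto a JSON object, so a request can be assembled with a small helper. `build_search_request` is a hypothetical helper, not part of any SDK; it omits optional fields that are left at their documented defaults and enforces the rule that q is required when there are no filters.

```python
def build_search_request(q=None, *, advanced=False, limit=10, offset=0,
                         filters=None, facets=None, boosts=None, sort=None):
    """Assemble a GSearchRequest dict (hypothetical helper).

    Optional fields are only included when they differ from the documented
    defaults or are non-empty, which keeps the JSON body minimal.
    """
    if not q and not filters:
        # Per the table above: q is required if there are no filters.
        raise ValueError("q is required when no filters are given")
    body = {}
    if q:
        body["q"] = q
    if advanced:
        body["advanced"] = True
    if limit != 10:
        body["limit"] = limit
    if offset != 0:
        body["offset"] = offset
    for key, value in (("filters", filters), ("facets", facets),
                       ("boosts", boosts), ("sort", sort)):
        if value:
            body[key] = value
    return body

# Reproduces three of the examples that follow.
simple = build_search_request("the quick brown fox jumps")
advanced = build_search_request('author: "John Doe"', advanced=True, limit=5)
paged = build_search_request("a search with paging", offset=100, limit=100)
```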
Examples
{
"q": "the quick brown fox jumps"
}
{
"q": "a search with filtering",
"filters": [
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "*",
"to": "2014-11-07"
}
]
}
]
}
{
"q": "author: \"John Doe\"",
"advanced": true,
"limit": 5
}
{
"q": "a search with paging",
"offset": 100,
"limit": 100
}
{
"q": "a search with filtering and faceting",
"filters": [
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "*",
"to": "2014-11-07"
}
]
}
],
"facets": [
{
"name": "Publication Date",
"field_name": "path.to.date",
"type": "date_histogram",
"date_interval": "year"
}
],
"sort": [
{
"field_name": "path.to.date",
"order": "asc"
}
]
}
{
"q": "(queries can be fancy AND cool) OR (NOT extravagant)",
"advanced": true
}
GFilter
Field Name | Type | Description |
---|---|---|
type | String | One of {"match_any", "match_all", "range"} |
field_name | String | The field to which the filter refers. Any literal dots (".") in the field name must be escaped with a preceding backslash ("\") character; unescaped dots are treated as paths to a nested field. |
values | Array of Strings or Objects | The values to evaluate against field_name. If type is "match_any" or "match_all", this must be a list of Strings. If type is "range", this must be a list of Objects, each with the fields from and to. |
"match_any" and "match_all" determine how multiple filter values are combined. As their names imply, "match_any" matches results for which any of the filter values match, while "match_all" requires that every one of the values match on each result.
When there is only one value, "match_any" and "match_all" filter identically, but the choice may still change how facets are interpreted.
values.from and values.to may be the special string "*", indicating that the range is unbounded at that end. An example is given below.
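To make the distinction concrete, the following client-side sketch checks one record's values for a field against a GFilter. It only illustrates the prose above and is not how Globus Search evaluates filters. `filter_matches` is a hypothetical helper, and treating a range as closed on from and open on to is an assumption borrowed from the convention documented for GBucket.

```python
def filter_matches(gfilter, field_values):
    """Illustrative reading of GFilter semantics for one record.

    field_values is the list of values the record holds for
    gfilter["field_name"].
    """
    ftype = gfilter["type"]
    wanted = gfilter["values"]
    if ftype == "match_any":
        present = set(field_values)
        return any(v in present for v in wanted)
    if ftype == "match_all":
        present = set(field_values)
        return all(v in present for v in wanted)
    if ftype == "range":
        for rng in wanted:
            lo, hi = rng["from"], rng["to"]
            for v in field_values:
                # "*" leaves that end of the range unbounded.
                # Assumption: from is inclusive and to is exclusive.
                if (lo == "*" or v >= lo) and (hi == "*" or v < hi):
                    return True
        return False
    raise ValueError(f"unknown filter type: {ftype}")

keywords = ["hpc", "internet2", "uchicago", "globus"]
print(filter_matches({"type": "match_all", "field_name": "keywords",
                      "values": ["hpc", "internet2", "uchicago"]}, keywords))  # True
print(filter_matches({"type": "match_all", "field_name": "keywords",
                      "values": ["hpc", "cloud"]}, keywords))  # False
```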
Examples
{
"type": "match_any",
"field_name": "https://schema\\.labs\\.datacite\\.org/meta/kernel-4\\.0/metadata\\.xsd#resourceTypeGeneral",
"values": ["Globus Endpoint"]
}
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "1970-01-01",
"to": "2015-01-01"
}
]
}
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "*",
"to": "2014-11-07"
}
]
}
{
"type": "match_all",
"field_name": "https://transfer\\.api\\.globus\\.org/endpoint#keywords",
"values": ["hpc", "internet2", "uchicago"]
}
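Because an unescaped dot is read as a path separator, a small helper that escapes literal field names avoids a common mistake. `escape_field_name` and `make_filter` are hypothetical helpers; this sketch rebuilds the first example above. The JSON shows each backslash doubled only because JSON itself escapes backslashes, so the actual field name contains a single backslash before each dot.

```python
import json

def escape_field_name(name):
    """Escape every literal '.' so it is not treated as a path separator."""
    return name.replace(".", "\\.")

def make_filter(ftype, field_name, values, *, literal=True):
    """Build a GFilter dict; literal=True escapes the dots in field_name."""
    if ftype not in ("match_any", "match_all", "range"):
        raise ValueError(f"unsupported filter type: {ftype}")
    return {
        "type": ftype,
        "field_name": escape_field_name(field_name) if literal else field_name,
        "values": values,
    }

resource_type = make_filter(
    "match_any",
    "https://schema.labs.datacite.org/meta/kernel-4.0/metadata.xsd#resourceTypeGeneral",
    ["Globus Endpoint"],
)
# Dots in "path.to.date" are intended as a path, so they are left unescaped.
by_date = make_filter("range", "path.to.date",
                      [{"from": "*", "to": "2014-11-07"}], literal=False)
print(json.dumps(resource_type, indent=1))
```

Note that the helper is not idempotent: passing an already-escaped name escapes it a second time.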
GFacet
Field Name | Type | Description |
---|---|---|
name | String | A name for this facet, which is referenced in the results. If name is omitted, it defaults to the value of the field_name property. If more than one facet in a single search request references the same field, a name must be provided. |
type | String | One of {"terms", "date_histogram", "numeric_histogram", "sum", "avg"} |
field_name | String | The field to which the facet refers. Any literal dots (".") in the field name must be escaped with a preceding backslash ("\") character; unescaped dots are treated as paths to a nested field. |
size | Integer | For "terms" and "numeric_histogram" facets, the number of facet values (buckets) to return. For "terms", these are the most common values (the buckets with the highest counts). For "numeric_histogram", this is the number of intervals to create between the low and high values of histogram_range. |
missing | Float | Optional. For "sum" and "avg" facets, the value to use for entries that do not contain the field named by field_name. By default, missing values are ignored and do not count towards sums and averages. |
histogram_range | Object | For "date_histogram" and "numeric_histogram" facets, an object with two fields: low, a number or date-formatted String containing the low value of the first bucket; and high, a number or date-formatted String containing the high value for the last bucket. Required when type is "numeric_histogram". |
date_interval | String | One of {"year", "quarter", "month", "week", "day", "hour", "minute", "second"}. The unit for the buckets returned within histogram_range. Required when type is "date_histogram". |
For terms facets, any values containing more than 10,000 characters will not be tabulated into the results, and no buckets containing a value with more than 10,000 characters will be created.
date_histogram faceting requires that the field was detected as a date type. See the Globus Search supported Date Formats for how data is detected as a date. The histogram also requires that low and high are both in one of the supported date formats.
Examples
{
"name": "File Extension",
"type": "terms",
"field_name": "extension",
"size": 10
}
{
"name": "pub_date",
"type": "date_histogram",
"field_name": "http://dublincore\\.org/schemas/xmls/qdc/2008/02/11/dcterms\\.xsd#created",
"histogram_range": {
"low": "2000-01-01",
"high": "2010-01-01"
},
"date_interval": "year"
}
{
"name": "file size",
"type": "numeric_histogram",
"field_name": "https://transfer\\.api\\.globus\\.org/file#size",
"size": 100,
"histogram_range": {
"low": 0,
"high": 100000000
}
}
{
"name": "calculate total cost",
"type": "sum",
"field_name": "price"
}
{
"name": "calculate average cost per item",
"type": "avg",
"missing": 1.20,
"field_name": "price"
}
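The effect of missing on sum and avg facets can be illustrated locally. This sketch mirrors only the documented behavior (records without the field are ignored unless missing supplies a value); `facet_sum_avg` is a hypothetical helper, and the service's own computation may differ in details such as rounding.

```python
def facet_sum_avg(records, field_name, missing=None):
    """Client-side illustration of "sum" and "avg" facet semantics.

    Records without field_name are skipped, unless `missing` is given, in
    which case `missing` is used as their value.
    """
    values = []
    for record in records:
        if field_name in record:
            values.append(record[field_name])
        elif missing is not None:
            values.append(missing)
    total = sum(values)
    average = total / len(values) if values else None
    return total, average

items = [{"price": 1.00}, {"name": "no price listed"}, {"price": 3.00}]
print(facet_sum_avg(items, "price"))  # (4.0, 2.0)
# With missing=1.20, the middle item now counts towards the sum and average.
print(facet_sum_avg(items, "price", missing=1.20))
```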
GBoost
Field Name | Type | Description |
---|---|---|
field_name | String | The field to rank higher in results. Any literal dots (".") in the field name must be escaped with a preceding backslash ("\") character; otherwise they are treated as paths to a field rather than as part of a field name. |
factor | Floating Point | Factor for weighting query matches on field_name. Values greater than 1 rank matches higher; values less than 1 boost negatively. Minimum of 0, maximum of 10. |
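A GBoost is a plain object, so it can be validated before it is sent. The hypothetical `make_boost` helper below enforces the documented 0 to 10 range for factor. The keywords and origin.author field names are borrowed from the recipe data in the GSearchResult example; the unescaped dot in origin.author is intentionally a path into the nested origin object.

```python
def make_boost(field_name, factor):
    """Build a GBoost dict, enforcing the documented factor range of 0 to 10."""
    if not 0 <= factor <= 10:
        raise ValueError("factor must be between 0 and 10")
    return {"field_name": field_name, "factor": float(factor)}

request = {
    "q": "tomatillo",
    "boosts": [
        make_boost("keywords", 2),         # > 1: matches here rank higher
        make_boost("origin.author", 0.5),  # < 1: matches here rank lower
    ],
}
```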
GSort
Field Name | Type | Description |
---|---|---|
field_name | String | The field on which to sort. Any literal dots (".") in the field name must be escaped with a preceding backslash ("\") character; unescaped dots are treated as paths to a nested field. |
order | String | Either "asc" (ascending) or "desc" (descending). Also see the note on sorting when multiple values are present for a particular field. |
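Similarly, a GSort entry can be checked before it is sent. `make_sort` is a hypothetical helper that rejects any order other than "asc" or "desc"; because sort is an array, several entries can be listed in one request.

```python
VALID_ORDERS = ("asc", "desc")

def make_sort(field_name, order="asc"):
    """Build a GSort dict; order must be "asc" or "desc"."""
    if order not in VALID_ORDERS:
        raise ValueError(f"order must be one of {VALID_ORDERS}, got {order!r}")
    return {"field_name": field_name, "order": order}

request = {
    "q": "a search with sorting",
    "sort": [make_sort("path.to.date", "desc"), make_sort("origin.title")],
}
```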
Response Schemas
GSearchResult
This is the document type for all results from Search queries.
Field Name | Type | Description |
---|---|---|
gmeta | Array | An array of GMetaResult documents; the main body of the result |
facet_result | Array | Optional. An array of GFacetResult documents with counts for all facets requested on the search request |
offset | Integer | The offset provided on the input search request |
count | Integer | The number of results returned, i.e. the size of the gmeta array. May be 0. |
total | Integer | The total number of matches for the search. May be 0 if no matches are found. |
has_next_page | Boolean | True if another page of results is available, False otherwise |
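offset, count, and has_next_page are enough to page through a large result set. This sketch assumes a hypothetical `search` callable that sends a GSearchRequest to POST /v1/index/<index_id>/search and returns the parsed GSearchResult; an in-memory `fake_search` stands in for it so the example runs offline.

```python
def iter_all_results(search, body, page_size=100):
    """Yield every GMetaResult for a query by walking offset/limit pages."""
    offset = 0
    while True:
        page = search({**body, "offset": offset, "limit": page_size})
        yield from page["gmeta"]
        # Stop when no further page is reported, or nothing was returned.
        if not page.get("has_next_page") or page["count"] == 0:
            return
        offset += page["count"]

def fake_search(request, _data=tuple(range(250))):
    """Offline stand-in for the real API: slices a fixed list of 250 items."""
    start, limit = request["offset"], request["limit"]
    chunk = list(_data[start:start + limit])
    return {
        "gmeta": chunk,
        "count": len(chunk),
        "offset": start,
        "total": len(_data),
        "has_next_page": start + len(chunk) < len(_data),
    }

results = list(iter_all_results(fake_search, {"q": "salsa"}))
print(len(results))  # 250
```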
Examples
This result is in the 2019-08-27 format for GMetaResult documents.
{
"@datatype": "GSearchResult",
"@version": "2017-09-01",
"count": 1,
"gmeta": [
{
"@datatype": "GMetaResult",
"@version": "2019-08-27",
"entries": [
{
"content": {
"cuisine": [
"mexican"
],
"handle": "salsa-verde",
"ingredients": [
{
"amount": {
"number": 10
},
"default": "tomatillo",
"preparation": "simmer 20 minutes",
"type": "fruit"
},
{
"amount": {
"number": 2
},
"default": "serrano pepper",
"preparation": "seeded",
"substitutes": [
"jalapeno",
"thai bird chili"
],
"type": "fruit"
},
{
"amount": {
"number": 2,
"unit": "clove"
},
"default": "garlic",
"type": "vegetable"
},
{
"amount": {
"number": 0.5
},
"default": "yellow onion",
"type": "vegetable"
},
{
"amount": {
"number": 2,
"unit": "tsp"
},
"default": "salt",
"type": "spice"
},
{
"amount": {
"number": 2,
"unit": "tbsp"
},
"default": "coriander",
"preparation": "ground",
"substitutes": [
"cumin"
],
"type": "spice"
}
],
"keywords": [
"salsa",
"tomatillo",
"coriander",
"serrano pepper"
],
"origin": {
"author": "Diana Kennedy",
"title": "Regional Mexican Cooking",
"type": "book"
}
},
"entry_id": null
}
],
"subject": "https://en.wikipedia.org/wiki/Salsa_verde"
}
],
"offset": 0,
"total": 1
}
This result is in the 2017-09-01 format for GMetaResult documents.
{
"count": 1,
"offset": 0,
"total": 1,
"gmeta": [
{
"content": [
{
"alpha": {
"beta": "gamma"
}
}
],
"entry_ids": [
null
],
"subject": "http://example.com"
}
]
}
GMetaResult
These are components in a search result.
A GMetaResult is a structure similar to a GMetaEntry from the Ingest API, with the following significant differences:
- visibility information is not exposed, i.e. visible_to is not included
- metadata for any subject may be an aggregate of multiple documents with different visibility rules or sources. Thus, the result is always returned as an array in which each element represents data provided by a different source or with different visibility
GMetaResult version 2019-08-27
Field Name | Type | Description |
---|---|---|
subject | String | The resource described by this metadata, often a URI |
entries | Array | An array of objects containing the data pertaining to the subject. Each object has fields such as content and entry_id, as in the example below. |
{
"entries": [
{
"content": {
"alpha": {
"beta": "gamma"
}
},
"matched_principal_sets": [],
"entry_id": null
},
{
"content": {
"alpha": {
"beta": "delta"
}
},
"matched_principal_sets": [],
"entry_id": "with_delta"
}
],
"subject": "http://example.com"
}
GMetaResult version 2017-09-01 (legacy format)
Field Name | Type | Description |
---|---|---|
subject | String | The resource described by this metadata, often a URI |
content | Array | An array of objects containing the metadata pertaining to the subject |
entry_ids | Array | An array of entry IDs matching content, such that the entry ID at index i belongs to the content found in content[i]. See the note below. |
{
"content": [
{
"alpha": {
"beta": "gamma"
}
}
],
"entry_ids": [
null
],
"subject": "http://example.com"
}
Comparing the two formats, the new format makes it easier to associate entry_id values with content blobs. The new format is also more extensible: if new fields are needed, they can be added as siblings of the content and entry_id fields.
New Format (2019-08-27) | Old Format (2017-09-01) |
---|---|
each entry is a complete, standalone subdocument | entry_ids needs to be zipped with content to make sense of the structure |
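The zipping that the legacy format requires can be sketched as a conversion into the 2019-08-27 entries shape. `legacy_to_entries` is a hypothetical helper. The sample input is the two-entry 2019-08-27 example above rewritten in the legacy shape. matched_principal_sets has no counterpart in the legacy format, so the converted entries omit it.

```python
def legacy_to_entries(legacy):
    """Convert a 2017-09-01 GMetaResult into the 2019-08-27 entries shape.

    content[i] is paired with entry_ids[i], as described for the legacy format.
    """
    if len(legacy["content"]) != len(legacy["entry_ids"]):
        raise ValueError("content and entry_ids must have the same length")
    return {
        "subject": legacy["subject"],
        "entries": [
            {"content": content, "entry_id": entry_id}
            for content, entry_id in zip(legacy["content"], legacy["entry_ids"])
        ],
    }

legacy = {
    "subject": "http://example.com",
    "content": [{"alpha": {"beta": "gamma"}}, {"alpha": {"beta": "delta"}}],
    "entry_ids": [None, "with_delta"],
}
print(legacy_to_entries(legacy)["entries"][1])
# {'content': {'alpha': {'beta': 'delta'}}, 'entry_id': 'with_delta'}
```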
GBucket
Field Name | Type | Description |
---|---|---|
value | String or Object | If the bucket represents a single value (e.g. in a "terms" GFacet), this is that value. If the bucket represents a range of values, this is an object with "from" and "to" fields, as in a GFilter document. The range is closed on the "from" value and open on the "to" value, i.e. [from, to). |
count | Integer | The number of results in this bucket |
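The [from, to) convention determines which bucket a boundary value falls into. This sketch assumes that a numeric_histogram splits histogram_range into size equal-width intervals, which is an assumption rather than something stated here. `numeric_buckets` and `count_into_buckets` are hypothetical helpers that only illustrate the bucket shape and boundary rule.

```python
def numeric_buckets(low, high, size):
    """Equal-width buckets between low and high (equal width is an assumption)."""
    width = (high - low) / size
    return [
        {"value": {"from": low + i * width, "to": low + (i + 1) * width},
         "count": 0}
        for i in range(size)
    ]

def count_into_buckets(buckets, values):
    """Count values into buckets, closed on "from" and open on "to"."""
    for v in values:
        for bucket in buckets:
            if bucket["value"]["from"] <= v < bucket["value"]["to"]:
                bucket["count"] += 1
                break  # each value lands in at most one bucket
    return buckets

buckets = count_into_buckets(numeric_buckets(0, 100, 4), [0, 24.9, 25, 99.9, 100])
print([b["count"] for b in buckets])  # [2, 1, 0, 1]
```

Under this rule, 25 counts towards [25, 50) rather than [0, 25), and a value equal to high (100 here) falls outside the last bucket.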