Scroll Query
Property | Value |
---|---|
Method | POST |
URL | /v1/index/<index_id>/scroll |
Authentication required? | Only for non-public data |
Required Roles | None |
Request Body | a GScrollRequest document |
Response Body | a GScrollResult document |
Authentication & Authorization
Tokens for this call must have one of these scopes:
urn:globus:auth:scope:search.api.globus.org:all
urn:globus:auth:scope:search.api.globus.org:search
Examples
To run a scroll query, we send it to the API via a POST request, e.g.:
curl -XPOST \
-H 'Content-Type: application/json' \
'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/scroll' \
--data '
{
"q": "a scroll request with filtering",
"filters": [
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "*",
"to": "2014-11-07"
}
]
}
]
}'
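The same request can be sent from Python. The sketch below is a minimal example that uses only the standard library's urllib. The helper names build_scroll_request and post_scroll are hypothetical. The optional bearer token is an assumption that applies only to non-public data, and the token must carry one of the scopes listed above.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

SEARCH_API = "https://search.api.globus.org"


def build_scroll_request(
    index_id: str, body: Dict[str, Any], token: Optional[str] = None
) -> urllib.request.Request:
    """Build, but do not send, a POST to /v1/index/<index_id>/scroll."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Only needed for non-public data; the token needs a search scope.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{SEARCH_API}/v1/index/{index_id}/scroll",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def post_scroll(
    index_id: str, body: Dict[str, Any], token: Optional[str] = None
) -> Dict[str, Any]:
    """Send the request and decode the GScrollResult document (hypothetical helper)."""
    request = build_scroll_request(index_id, body, token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# The body from the curl example above.
body = {
    "q": "a scroll request with filtering",
    "filters": [
        {
            "type": "range",
            "field_name": "path.to.date",
            "values": [{"from": "*", "to": "2014-11-07"}],
        }
    ],
}
request = build_scroll_request("4de0e89e-a395-11e7-bc54-8c705ad34f60", body)
```

Calling post_scroll(index_id, body) performs the actual network call. Keeping construction separate from sending makes the request easy to inspect before it goes out.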
Request Schemas
GScrollRequest
This is the main document type for encoding a scrolling query.
Field Name | Type | Description |
---|---|---|
q | String | User-supplied query, conforming to the query syntax. Required if there are no filters. |
advanced | Boolean | Defaults to False. When true, interpret q with the advanced query syntax. |
limit | Integer | Optional. Limit the number of results returned to at most limit items. Defaults to 10. |
bypass_visible_to | Boolean | Defaults to False. Allowed for Index Admins only. When true, visible_to restrictions are ignored for this search query. |
filter_principal_sets | List of Strings | Optional. A list of The caller’s identity set will be matched against any |
filters | Array | Optional. An array of GFilter documents: filters to apply to the search. |
marker | String | Optional. An opaque token from a previous scroll result document, used to request the next page of results. |
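A client can catch malformed requests before sending them. The following is a hypothetical client-side check (check_scroll_request is not part of any Globus library). It encodes only the rules stated in the table above: q is required when there are no filters, and each field has the listed type. The service performs its own validation, which may be stricter.

```python
from typing import Any, Dict, List

# Expected JSON type of each GScrollRequest field, per the table above.
FIELD_TYPES = {
    "q": str,
    "advanced": bool,
    "limit": int,
    "bypass_visible_to": bool,
    "filter_principal_sets": list,
    "filters": list,
    "marker": str,
}


def check_scroll_request(body: Dict[str, Any]) -> List[str]:
    """Return client-side problems found in a GScrollRequest body (empty if none)."""
    problems = []
    # q is required unless at least one filter is supplied.
    if "q" not in body and not body.get("filters"):
        problems.append("q is required when there are no filters")
    for name, expected in FIELD_TYPES.items():
        if name not in body:
            continue
        value = body[name]
        # bool is a subclass of int in Python, so reject True/False for limit.
        if not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        ):
            problems.append(f"{name} should be of type {expected.__name__}")
    return problems
```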
GFilter
A GFilter document is one of several document types which encode a filter.
The type of filter is identified by the type field. See the table below for the various filter types.
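Type | Schema |
---|---|
match_any, match_all | GFilterMatch |
range | GFilterRange |
 | GFilterGeoBoundingBox |
exists | GFilterExists |
not | GFilterNot |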
GFilterMatch
A matching filter for finding results which match some set of text terms.
"match_any" and "match_all" refer to the two possible behaviors of the filter values. As their names imply, "match_any" matches results for which any of the filter values match, while "match_all" requires that all of the values match on every result.
Field Name | Type | Description |
---|---|---|
type | String | One of "match_any" or "match_all". |
field_name | String | The field to which the filter refers. Dots (".") separate nested field names; any dot that is part of a field name itself must be escaped with a preceding backslash ("\") character. |
values | Array of Strings or Booleans | The values to evaluate against the field_name. If the field is a boolean field, this must be an array of booleans only. For string fields, it may be a mixture of strings and booleans. |
As far as filtering is concerned, "match_any" and "match_all" behave the same when there is only one value, but they may have a different impact on how facets are interpreted.
Examples
{
"type": "match_any",
"field_name": "globus_metadata.resource_type",
"values": [
"Globus Endpoint"
]
}
{
"type": "match_all",
"field_name": "globus_metadata.keywords",
"values": [
"hpc",
"internet2",
"uchicago"
]
}
{
"type": "match_any",
"field_name": "globus_metadata.snorkels",
"values": [
"few",
"many",
true
]
}
The last example is only valid if globus_metadata.snorkels is a string field, because string fields can contain boolean values. If it is a boolean field (which cannot contain string values), the query will fail with an error about the improper mapping of "few" and "many" onto a boolean field.
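To make the match_any/match_all difference concrete, the following sketch evaluates a match filter locally against the values a single result holds in the filtered field. It is a simplified model written for illustration (match_filter_accepts is hypothetical) that assumes exact value equality. The service matches against its own index and may apply text analysis, so real results can differ. The sketch also shows that both types accept the same results when there is only one value.

```python
from typing import Any, Dict, Iterable


def match_filter_accepts(filter_doc: Dict[str, Any], field_values: Iterable[Any]) -> bool:
    """Mimic GFilterMatch semantics for the values one result holds in a field.

    Illustration only: assumes exact equality, unlike the service's index.
    """
    present = list(field_values)
    wanted = filter_doc["values"]
    if filter_doc["type"] == "match_any":
        # At least one filter value must appear among the result's values.
        return any(value in present for value in wanted)
    if filter_doc["type"] == "match_all":
        # Every filter value must appear among the result's values.
        return all(value in present for value in wanted)
    raise ValueError(f"not a match filter: {filter_doc['type']!r}")
```

With the keywords filter from the examples above, a result tagged only "hpc" passes the match_any form of the filter but fails the match_all form.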
GFilterRange
A range filter for finding results which have numeric or date values within a specified range.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "range". |
field_name | String | The field to which the filter refers. Dots (".") separate nested field names; any dot that is part of a field name itself must be escaped with a preceding backslash ("\") character. |
values | Array of Objects | The values to evaluate against the field_name. This must be an array of objects, each with the fields from and to. |
values.from and values.to may be the special string "*", indicating that the range is unbounded on that end. An example is given below.
Examples
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "1970-01-01",
"to": "2015-01-01"
}
]
}
{
"type": "range",
"field_name": "cardinality_of_foobar",
"values": [
{
"from": "10",
"to": "50"
}
]
}
This example filter has multiple clauses, which are implicitly joined with "or" semantics: it allows values from 0 to 5, or values greater than or equal to 10.
{
"type": "range",
"field_name": "cardinality_of_foobar",
"values": [
{
"from": "0",
"to": "5"
},
{
"from": "10",
"to": "*"
}
]
}
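The "or" semantics of multiple clauses can be sketched locally for numeric fields. range_filter_accepts is a hypothetical helper written for illustration. It treats both bounds as inclusive, which is an assumption because this page does not state whether "to" is inclusive, and it does not handle dates.

```python
from typing import Any, Dict


def _clause_contains(clause: Dict[str, str], value: float) -> bool:
    """Check one {"from": ..., "to": ...} clause; "*" leaves that end unbounded."""
    low, high = clause["from"], clause["to"]
    if low != "*" and value < float(low):
        return False
    # Assumption: "to" is treated as inclusive here.
    if high != "*" and value > float(high):
        return False
    return True


def range_filter_accepts(filter_doc: Dict[str, Any], value: float) -> bool:
    """Multiple clauses are joined with "or" semantics: any clause may match."""
    return any(_clause_contains(clause, value) for clause in filter_doc["values"])


# The multi-clause filter from the last example above.
foobar_filter = {
    "type": "range",
    "field_name": "cardinality_of_foobar",
    "values": [{"from": "0", "to": "5"}, {"from": "10", "to": "*"}],
}
```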
GFilterGeoBoundingBox
A bounding box filter for finding geo_shape and geo_point values that intersect with a specified bounding box.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value |
field_name | String | The field to which the filter refers. Dots (".") separate nested field names; any dot that is part of a field name itself must be escaped with a preceding backslash ("\") character. |
top_left | Object | An object describing a coordinate pair. It must contain the keys |
bottom_right | Object | An object describing a coordinate pair. It must contain the keys |
top_left must be northwest of bottom_right.
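A client can check this ordering constraint before sending a filter. The hypothetical helper below takes plain (latitude, longitude) tuples rather than the API's coordinate objects, whose key names are not shown on this page. It uses strict comparisons and ignores boxes that cross the antimeridian; both are assumptions made for the sketch.

```python
from typing import Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def is_northwest(top_left: LatLon, bottom_right: LatLon) -> bool:
    """True if top_left is strictly north and west of bottom_right.

    Boxes that cross the antimeridian (180th meridian) are not handled.
    """
    (tl_lat, tl_lon), (br_lat, br_lon) = top_left, bottom_right
    # North means a larger latitude; west means a smaller longitude.
    return tl_lat > br_lat and tl_lon < br_lon
```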
GFilterExists
An "existence" filter which checks whether a field is present in a document with a non-null value. Under exists filters, a field that is present with a value of null is treated the same as a field that is absent from the document.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "exists". |
field_name | String | The field to which the filter refers. Dots (".") separate nested field names; any dot that is part of a field name itself must be escaped with a preceding backslash ("\") character. |
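The rule that null counts as absent can be modeled locally. In this sketch, field_exists and split_field_path are hypothetical helpers. They assume that an unescaped dot separates nested field names and that a backslash-escaped dot is part of a name. They also do not descend into arrays.

```python
import re
from typing import Any, Dict, List


def split_field_path(field_name: str) -> List[str]:
    """Split on unescaped dots, then turn each escaped dot back into a literal dot."""
    parts = re.split(r"(?<!\\)\.", field_name)
    return [part.replace("\\.", ".") for part in parts]


def field_exists(doc: Dict[str, Any], field_name: str) -> bool:
    """Mimic an exists filter: the field must be present with a non-null value."""
    node: Any = doc
    for key in split_field_path(field_name):
        if not isinstance(node, dict) or key not in node:
            return False  # absent
        node = node[key]
    return node is not None  # present-but-null counts as absent
```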
GFilterNot
A "not" filter for inverting any other valid filter.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "not". |
filter | Object | Any valid GFilter object, except for another "not" filter. |
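A "not" filter may wrap any filter except another "not" filter, so a small builder can enforce that rule. negate is a hypothetical helper that assumes the type value is the string "not", as the filter's name suggests. A doubly negated filter is just the original filter, so rejecting that case loses nothing.

```python
from typing import Any, Dict


def negate(filter_doc: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a GFilter in a "not" filter, refusing to nest "not" filters."""
    if filter_doc.get("type") == "not":
        # Double negation is not allowed; use the inner filter directly instead.
        raise ValueError("a not filter cannot wrap another not filter")
    return {"type": "not", "filter": filter_doc}


# Example: results whose date falls outside the range below.
before_cutoff = {
    "type": "range",
    "field_name": "path.to.date",
    "values": [{"from": "*", "to": "2014-11-07"}],
}
after_cutoff = negate(before_cutoff)
```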
Response Schemas
GScrollResult
This is the document type for all results from scrolling queries.
Field Name | Type | Description |
---|---|---|
gmeta | Array | An array of GMetaResult documents, the main body of the result. |
count | Integer | The number of results returned, i.e. the size of the gmeta array. May be 0. |
total | Integer | The total number of matches for the search. May be 0 if no matches are found. |
has_next_page | Boolean | True if there’s another page of results available, False otherwise. |
marker | String | An opaque marker used to request the next page of results. |
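Together, has_next_page and marker drive pagination. The client sends the request and collects gmeta, and while has_next_page is true it repeats the same request with the returned marker. The generator below is a minimal sketch of that loop. scroll_all is hypothetical, and fetch_page stands for any callable that POSTs a GScrollRequest body to the scroll URL and returns the decoded GScrollResult.

```python
from typing import Any, Callable, Dict, Iterator

Doc = Dict[str, Any]


def scroll_all(fetch_page: Callable[[Doc], Doc], request: Doc) -> Iterator[Doc]:
    """Yield every GMetaResult from all pages of a scroll query."""
    body = dict(request)  # copy so the caller's request is not modified
    while True:
        page = fetch_page(body)
        yield from page["gmeta"]
        if not page["has_next_page"]:
            return
        # Pass the opaque marker back unchanged to request the next page.
        body = dict(request, marker=page["marker"])
```

Because the marker is opaque, the client only copies it from one response into the next request. It never parses or constructs a marker.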
Examples
{
"count": 1,
"gmeta": [
{
"@datatype": "GMetaResult",
"@version": "2019-08-27",
"entries": [
{
"content": {
"cuisine": [
"mexican"
],
"handle": "salsa-verde",
"ingredients": [
{
"amount": {
"number": 10
},
"default": "tomatillo",
"preparation": "simmer 20 minutes",
"type": "fruit"
},
{
"amount": {
"number": 2
},
"default": "serrano pepper",
"preparation": "seeded",
"substitutes": [
"jalapeno",
"thai bird chili"
],
"type": "fruit"
},
{
"amount": {
"number": 2,
"unit": "clove"
},
"default": "garlic",
"type": "vegetable"
},
{
"amount": {
"number": 0.5
},
"default": "yellow onion",
"type": "vegetable"
},
{
"amount": {
"number": 2,
"unit": "tsp"
},
"default": "salt",
"type": "spice"
},
{
"amount": {
"number": 2,
"unit": "tbsp"
},
"default": "coriander",
"preparation": "ground",
"substitutes": [
"cumin"
],
"type": "spice"
}
],
"keywords": [
"salsa",
"tomatillo",
"coriander",
"serrano pepper"
],
"origin": {
"author": "Diana Kennedy",
"title": "Regional Mexican Cooking",
"type": "book"
}
},
"entry_id": null
}
],
"subject": "https://en.wikipedia.org/wiki/Salsa_verde"
}
],
"total": 1,
"has_next_page": true,
"marker": "3d34900e3e4211ebb0a806b2af333354"
}
GMetaResult
These are components in a search result.
A GMetaResult is a structure similar to a GMetaEntry from the Ingest API, with the following significant differences:
- visibility information is not exposed, i.e. visible_to is not included.
- metadata for any subject may be an aggregate of multiple documents with different visibility rules or sources. Thus, the result is always returned as an array in which each element represents data provided by a different source or with different visibility.
GMetaResult version 2019-08-27
Field Name | Type | Description |
---|---|---|
subject | String | The resource described by this metadata, often a URI. |
entries | Array | An array of objects containing the data pertaining to the subject. Each object has the fields |
{
"entries": [
{
"content": {
"alpha": {
"beta": "gamma"
}
},
"matched_principal_sets": [],
"entry_id": null
},
{
"content": {
"alpha": {
"beta": "delta"
}
},
"matched_principal_sets": [],
"entry_id": "with_delta"
}
],
"subject": "http://example.com"
}
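Each element of entries is self-contained, so a client can index a result's content by entry_id directly. entries_by_id is a hypothetical helper. It keeps a null entry_id as the key None.

```python
from typing import Any, Dict, Optional


def entries_by_id(gmeta_result: Dict[str, Any]) -> Dict[Optional[str], Any]:
    """Map each entry_id (None for null) to its content in a 2019-08-27 GMetaResult."""
    return {entry["entry_id"]: entry["content"] for entry in gmeta_result["entries"]}


# The example document above, as Python data.
result = {
    "entries": [
        {
            "content": {"alpha": {"beta": "gamma"}},
            "matched_principal_sets": [],
            "entry_id": None,
        },
        {
            "content": {"alpha": {"beta": "delta"}},
            "matched_principal_sets": [],
            "entry_id": "with_delta",
        },
    ],
    "subject": "http://example.com",
}
```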
GMetaResult version 2017-09-01 (legacy format)
Field Name | Type | Description |
---|---|---|
subject | String | The resource described by this metadata, often a URI. |
content | Array | An array of objects containing the metadata pertaining to the subject. |
entry_ids | Array | An array of entry IDs matching the content, such that the entry ID at index i has content found in content[i]. See the note below. |
{
"content": [
{
"alpha": {
"beta": "gamma"
}
}
],
"entry_ids": [
null
],
"subject": "http://example.com"
}
Note how, in the comparison below, the new format makes it easier to associate entry_id values with content blobs. Additionally, the new format is more extensible: if new fields are needed, they can be added as siblings of the content and entry_id fields.
New Format (2019-08-27) | Old Format (2017-09-01) |
---|---|
each entry is a complete, standalone subdocument | entry_ids needs to be zipped with content to make sense of the structure |
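The zipping required by the old format can be written directly. legacy_to_entries is a hypothetical helper that converts a 2017-09-01 result into the 2019-08-27 entries shape. The legacy format does not carry matched_principal_sets, so the sketch does not produce that field.

```python
from typing import Any, Dict


def legacy_to_entries(result: Dict[str, Any]) -> Dict[str, Any]:
    """Zip content[i] with entry_ids[i] to build the newer "entries" layout."""
    content, entry_ids = result["content"], result["entry_ids"]
    if len(content) != len(entry_ids):
        raise ValueError("content and entry_ids must have the same length")
    return {
        "subject": result["subject"],
        "entries": [
            {"content": blob, "entry_id": entry_id}
            for blob, entry_id in zip(content, entry_ids)
        ],
    }
```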