Query
1. Overview
The Query APIs are used to query data from Globus Search.
There are two forms of the Query API: GET and POST. Simple queries (for example, those that define no facets) can use the GET form for convenience. More complex queries use the POST form, which can express richer requirements. In either case, the result format is the same.
1.1. Timeouts
Query execution time is capped at 30 seconds. If query processing takes longer than this time, the API will terminate it and return a 504 error.
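A client may want to tell a capped query apart from other failures. The following is a minimal Python sketch, not part of the Globus Search API: the `fetch` helper, the `QueryTimedOut` exception, and the 5-second margin are illustrative assumptions. It sets the client-side timeout slightly above the 30-second cap so that the server's 504 is observed rather than masked by a local timeout.

```python
import urllib.error
import urllib.request

SERVER_QUERY_CAP_SECONDS = 30  # the cap stated in this section
CLIENT_TIMEOUT_SECONDS = SERVER_QUERY_CAP_SECONDS + 5  # assumed safety margin


class QueryTimedOut(Exception):
    """Hypothetical exception: the service terminated the query (HTTP 504)."""


def fetch(url, opener=urllib.request.urlopen):
    """Fetch `url`, translating an HTTP 504 into QueryTimedOut.

    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        with opener(url, timeout=CLIENT_TIMEOUT_SECONDS) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 504:
            raise QueryTimedOut(
                f"query exceeded the {SERVER_QUERY_CAP_SECONDS}s cap"
            ) from err
        raise  # other HTTP errors are left for the caller to handle
```

A caller that catches `QueryTimedOut` could, for example, narrow the query or reduce the number of facets before trying again.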
2. POST vs GET Queries
Many simple queries can be encoded as a GET query.
However, many complex query types can only be encoded in a search request document, which is sent as a POST query. Here are some examples of the differences between these two query types.
Let’s start with some trivial examples which can be encoded in GET queries:
{
  "q": "the quick brown fox jumps"
}

{
  "q": "a search with paging",
  "offset": 100,
  "limit": 100,
  "advanced": false
}
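To make the GET form concrete, here is a hedged Python sketch that URL-encodes the paging example. The `build_get_url` helper and the `search.example.org` URL are placeholders invented for illustration, and the lowercase `true`/`false` encoding of booleans is an assumption; consult the API reference for the actual endpoint and parameter encoding.

```python
from urllib.parse import urlencode


def build_get_url(search_url, params):
    """Append `params` to `search_url` as a URL-encoded query string."""
    encoded = {
        # Assumption: booleans are sent as lowercase "true"/"false".
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    return f"{search_url}?{urlencode(encoded)}"


url = build_get_url(
    "https://search.example.org/search",  # placeholder, not the real endpoint
    {"q": "a search with paging", "offset": 100, "limit": 100, "advanced": False},
)
print(url)
```

Every value here is a flat scalar, which is why it fits comfortably in a query string.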
These simple cases are easy enough, but not very compelling. A more complex request, with filters, facets, and sorting, looks like this:
{
  "q": "a search with filtering and faceting",
  "filters": [
    {
      "type": "range",
      "field_name": "path.to.date",
      "values": [
        {
          "from": "*",
          "to": "2014-11-07"
        }
      ]
    }
  ],
  "facets": [
    {
      "name": "Publication Date",
      "field_name": "path.to.date",
      "type": "date_histogram",
      "date_interval": "year"
    }
  ],
  "sort": [
    {
      "field_name": "path.to.date",
      "order": "asc"
    }
  ]
}
This requests a search with a simple query, but with the following additional conditions:
- documents will be inspected for their {"path": {"to": {"date": …}}} values and filtered to only those prior to 2014-11-07
- the search should generate a "facet", a count of different values for path.to.date
- the type of facet is a date_histogram
- the bucket size for the counts should be by year
- the search should return sorted results (as opposed to results ordered by ranking), sorted by path.to.date in ascending order
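The request document above can also be assembled programmatically. The sketch below is one possible way to do so in Python; `range_filter` and `build_request` are hypothetical helper names, and treating "*" as an open bound is inferred from the example rather than stated in this section.

```python
import json


def range_filter(field_name, start="*", end="*"):
    """Build a range filter; "*" appears to leave that bound open."""
    return {
        "type": "range",
        "field_name": field_name,
        "values": [{"from": start, "to": end}],
    }


def build_request(query, date_field, before):
    """Assemble a request document equivalent to the example above."""
    return {
        "q": query,
        "filters": [range_filter(date_field, end=before)],
        "facets": [
            {
                "name": "Publication Date",
                "field_name": date_field,
                "type": "date_histogram",
                "date_interval": "year",
            }
        ],
        "sort": [{"field_name": date_field, "order": "asc"}],
    }


request_doc = build_request(
    "a search with filtering and faceting", "path.to.date", "2014-11-07"
)
body = json.dumps(request_doc, indent=2)  # JSON text to send as the POST body
print(body)
```

Nested lists of objects such as `filters` and `facets` are what make this request a natural fit for a JSON body rather than a query string.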
2.1. Query Syntax
Two separate syntaxes for specifying the query are supported in both GET and POST queries: standard and advanced.
The standard query allows only for basic text matching. All queries will be processed, and results which best match the input will be provided.
The advanced syntax is more powerful and supports ranges, matching on particular fields and other more sophisticated capabilities. However, it is possible for an advanced query to be rejected as malformed.
As a result, advanced queries should not be directly exposed to users without forethought.
2.1.1. Advanced Query String Syntax
The Globus Search advanced query syntax should be very familiar to users of Elasticsearch. It provides a subset of the expressions supported by Elasticsearch, with a very similar syntax.
The full grammar for Globus Search queries is as follows (in EBNF):
(* character classes for use throughout the grammar *)
escape = "\" ;
operator = "+" | "-" | "=" | "&" | "|" | ">" | "<" | "!" | "^" | "~" | "*" | "?" ;
needs_escape = operator | escape | "'" | '"' | ":" | "(" | ")" |
               "[" | "]" | "{" | "}" | " " ;
printables = ? all printable utf-8 characters, except spaces ? ;

(* zero-or-more spaces *)
spaces = { " " } ;

(* boolean operators *)
infix_bool_ops = " AND " | " OR " ;

printable_escaped_dquote = printables - '"' | escape , '"' |
                           printable_escaped_dquote , { printable_escaped_dquote } ;
quoted_string = '"' , printable_escaped_dquote , '"' ;

(* an atom is a bare string of some variety, potentially including
   escaped characters *)
atom_char = printables - operator | escape , ( needs_escape | atom_char ) ;
atom = atom_char, { atom_char };

range_expr = ( "[" | "{" ) , spaces,
             ( atom | "*" ),
             spaces , " TO " , spaces ,
             ( atom | "*" ) ,
             spaces , ( "]" | "}" ) ;

term = quoted_string | atom , "*" | atom ;

field_expr = term | spaces , field_expr , spaces |
             "(" , field_expr , ")" |
             field_expr , ( " " | infix_bool_ops ) , field_expr ;
field_rhs = term | range_expr | ( "(" , field_expr , ")" );

(* a field is an expression like 'foo: bar' or 'foo: [1 TO 10}' *)
field = ( "+" | "-" | "" ) , atom , ":" , spaces , field_rhs ;

(* an expression is a bare value or field or series of expressions
   expressions can be joined with spaces or infix operators *)
expr = field | "(" , expr , ")" |
       expr , ( " " | infix_bool_ops ) , expr |
       spaces , expr , spaces ;
Note that the concrete parser used may differ slightly from the above grammar, especially where it is ambiguous.
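When user-supplied text has to be embedded in an advanced query, the `needs_escape` rule suggests which characters to backslash-escape. The following Python sketch applies that rule literally; `escape_atom` is a hypothetical helper, and because the concrete parser may differ from the grammar, its output should be treated as a starting point rather than a guarantee.

```python
# Characters listed under `operator` in the grammar.
OPERATORS = "+-=&|><!^~*?"
# `needs_escape` adds the escape character, both quotes, colon, brackets, and space.
NEEDS_ESCAPE = frozenset(OPERATORS + "\\" + "'\":()[]{} ")


def escape_atom(text):
    """Backslash-escape every character that appears in `needs_escape`."""
    return "".join("\\" + ch if ch in NEEDS_ESCAPE else ch for ch in text)


print(escape_atom("abc.def:ghi"))  # abc.def\:ghi
print(escape_atom("[ERROR]"))      # \[ERROR\]
```

Note that the dot is not in `needs_escape`, so it passes through unchanged here.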
2.1.2. Advanced Query String Usage
These are the basic grammatical constructs of the advanced query language.
Quotes
Quoted strings are literal strings. For example, to search for abc.def:ghi in the advanced query language, you would use "abc.def:ghi" to signify that the contents of the string should not be parsed as an advanced query.
Fields
example: foo searches for instances of the field named example with a value of foo.
Field names in Advanced Query strings may express paths within documents, using dots as the separator between field names. Escaped dots should be used to express "the dot character". For example, given the document
{
  "a": {
    "b": 1
  },
  "a.b": 2
}
we take a.b to refer to the value 1, and a\.b to refer to the value 2.
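The path rule can be illustrated with a short Python sketch. It only models the interpretation described here (unescaped dots separate keys, escaped dots are literal) and is not how the service itself resolves fields; `split_field_path` and `lookup` are hypothetical names.

```python
def split_field_path(path):
    """Split a field path on unescaped dots, unescaping as it goes."""
    parts, current, i = [], [], 0
    while i < len(path):
        ch = path[i]
        if ch == "\\" and i + 1 < len(path):
            current.append(path[i + 1])  # keep the escaped character literally
            i += 2
        elif ch == ".":
            parts.append("".join(current))  # an unescaped dot ends a key
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def lookup(document, path):
    """Walk `document` along the keys named by `path`."""
    value = document
    for key in split_field_path(path):
        value = value[key]
    return value


doc = {"a": {"b": 1}, "a.b": 2}
print(lookup(doc, "a.b"))    # 1
print(lookup(doc, "a\\.b"))  # 2
```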
Boolean Ops + Grouping
Use parentheses, AND, OR, and NOT to combine expressions.
For example: foo:[10 TO 20] AND (example:bar OR NOT baz:"[ERROR]")
Note that field expressions cannot contain fields.
The following query is invalid: foo:(bar baz:buzz).
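Query strings like these can be composed from small helpers instead of being written by hand. The Python sketch below is illustrative only: `field_range`, `all_of`, and `any_of` are hypothetical names, it uses the uppercase TO keyword from the grammar, and it assumes (following Elasticsearch, since this section does not say) that square brackets are inclusive bounds and curly braces exclusive.

```python
def field_range(field, lower=None, upper=None, include_lower=True, include_upper=True):
    """Build `field:[lower TO upper]`; None becomes the open bound "*".

    Assumption: "[" and "]" are inclusive, "{" and "}" exclusive.
    """
    open_bracket = "[" if include_lower else "{"
    close_bracket = "]" if include_upper else "}"
    low = "*" if lower is None else str(lower)
    high = "*" if upper is None else str(upper)
    return f"{field}:{open_bracket}{low} TO {high}{close_bracket}"


def all_of(*expressions):
    """Join expressions with AND inside one parenthesized group."""
    return "(" + " AND ".join(expressions) + ")"


def any_of(*expressions):
    """Join expressions with OR inside one parenthesized group."""
    return "(" + " OR ".join(expressions) + ")"


query = all_of(field_range("foo", 10, 20), any_of("example:bar", 'baz:"[ERROR]"'))
print(query)  # (foo:[10 TO 20] AND (example:bar OR baz:"[ERROR]"))
```

Grouping every combination in parentheses avoids relying on operator precedence, which this section does not define.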
Dates and Date Ranges
Dates can be handled with exact matching or with range expressions. When using a date within a range, be sure to use a supported date format for your query.