Delete by Query
Delete by query provides a powerful method for removing a large number of documents in a single operation.
The operation removes an entire subject where there is a match on the query. That is, if even a single entry for a subject matches the query, then all entries, as well as the subject itself, will be removed from the index.
This is similar to the result of performing a query: the set of subjects deleted will exactly match the set of subjects returned by the query used for delete by query.
You may want to test delete by query operations by first executing the query as a search.
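The subject-level behavior described above can be illustrated with a small in-memory model. This is only a sketch: the entries list, the is_old predicate, and both helper functions are hypothetical stand-ins and are not part of the Globus Search API.

```python
def subjects_matching(entries, matches):
    """Return the set of subjects with at least one matching entry."""
    return {e["subject"] for e in entries if matches(e)}


def delete_by_query(entries, matches):
    """Drop every entry whose subject has at least one matching entry."""
    doomed = subjects_matching(entries, matches)
    return [e for e in entries if e["subject"] not in doomed]


# Hypothetical index contents: subject "a" has two entries, "b" has one.
entries = [
    {"subject": "a", "content": {"year": 2013}},
    {"subject": "a", "content": {"year": 2020}},
    {"subject": "b", "content": {"year": 2021}},
]
is_old = lambda e: e["content"]["year"] < 2014

# Dry run: inspect what the query selects before deleting anything.
preview = subjects_matching(entries, is_old)
# Both entries of subject "a" go, although only one of them matched.
remaining = delete_by_query(entries, is_old)
```

The set computed in the dry run is the same set of subjects that the delete removes, which is why running the query as a search first is a useful check.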
Due to the broad capability of delete by query to change the state of the index, it can only be executed by a user with 'owner' or 'admin' permissions on the index.
Delete By Query is submitted as an asynchronous Task, which can then be monitored using the Get Task API. Once your task is complete, the data will be removed from the search index and will no longer appear in query results.
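As a rough sketch, a request for this endpoint could be assembled as follows. Nothing is sent over the network here. The base URL is an assumption inferred from the host name in the scope strings, the Bearer authorization header is likewise assumed, and the function name, index ID, and token are placeholders.

```python
import json

# Assumed base URL, inferred from the host name used in the scope strings.
BASE_URL = "https://search.api.globus.org"


def build_delete_by_query(index_id, token, q=None, filters=None):
    """Assemble method, URL, headers, and body without sending anything."""
    body = {"@version": "delete_by_query#1.0.0"}
    if q is not None:
        body["q"] = q
    if filters:
        body["filters"] = filters
    return {
        "method": "POST",
        "url": f"{BASE_URL}/v1/index/{index_id}/delete_by_query",
        "headers": {
            # Assumption: the token is passed as a Bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "data": json.dumps(body),
    }


# Placeholder index ID and token, for illustration only.
request = build_delete_by_query(
    "my-index-id", "MY_TOKEN", q="the quick brown fox jumps"
)
```

The resulting dictionary can be handed to an HTTP client of your choice. Once submitted, the Task can be monitored with the Get Task API.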
Method | POST |
URL | /v1/index/<index_id>/delete_by_query |
Authentication required? | Yes |
Required Roles | You must have 'owner' or 'admin' permissions on the index |
Request Body | DeleteByQueryRequest |
Response Body | |
Authentication & Authorization
Tokens for this call must have one of these scopes:
- urn:globus:auth:scope:search.api.globus.org:all
- urn:globus:auth:scope:search.api.globus.org:ingest
Request Schemas
DeleteByQueryRequest
This is the main document type for encoding a Delete By Query task.
A DeleteByQueryRequest document is versioned with the @version field as either 2017-09-01 or delete_by_query#1.0.0.
When omitted, @version defaults to the current service default version, which is 2017-09-01.
The delete_by_query#1.0.0 version is the newer version of a Delete By Query request.
It will become the default in a future release of Globus Search.
Until then, users must request it explicitly with the @version field.
Field Name | Type | Description |
---|---|---|
@version | String | Must be "delete_by_query#1.0.0" |
q | String | User-supplied query, conforming to the query syntax. Required if there are no filters. |
advanced | Boolean | Optional. When true, interpret q with the advanced query syntax. Defaults to False. |
filters | Array | An array of GFilter Documents. Filters to apply to the search. Required if q is omitted. |
{
"@version": "delete_by_query#1.0.0",
"q": "the quick brown fox jumps"
}
{
"@version": "delete_by_query#1.0.0",
"q": "a search with filtering",
"filters": [
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "*",
"to": "2014-11-07"
}
]
}
]
}
{
"@version": "delete_by_query#1.0.0",
"q": "(queries can be fancy AND cool) OR (NOT extravagant)",
"advanced": true
}
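The field rules in the table above can be checked on the client before a request is submitted. The following is a minimal sketch, not the service's own validation. The function name and messages are made up for illustration, and it assumes that @version must be the literal delete_by_query#1.0.0 for this version.

```python
def validate_delete_by_query(doc):
    """Return a list of problems found in a delete_by_query#1.0.0 document."""
    problems = []
    # Assumption: this version must be requested explicitly by name.
    if doc.get("@version") != "delete_by_query#1.0.0":
        problems.append("@version must be delete_by_query#1.0.0")
    has_q = isinstance(doc.get("q"), str)
    has_filters = bool(doc.get("filters"))
    # q may only be left out when filters are supplied instead.
    if not has_q and not has_filters:
        problems.append("q is required when there are no filters")
    if "advanced" in doc and not isinstance(doc["advanced"], bool):
        problems.append("advanced must be a boolean")
    if "filters" in doc and not isinstance(doc["filters"], list):
        problems.append("filters must be an array")
    return problems


good = {"@version": "delete_by_query#1.0.0", "q": "the quick brown fox jumps"}
empty = {"@version": "delete_by_query#1.0.0"}
good_problems = validate_delete_by_query(good)
empty_problems = validate_delete_by_query(empty)
```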
The 2017-09-01 version is the legacy version of a Delete By Query request.
It is the default when @version is omitted, for compatibility while users migrate to the delete_by_query#1.0.0 version.
Field Name | Type | Description |
---|---|---|
@version | String | Must be "2017-09-01" |
q | String | User-supplied query, conforming to the query syntax. Required if there are no filters. |
advanced | Boolean | Optional. When true, interpret q with the advanced query syntax. Defaults to False. |
filters | Array | An array of GFilter Documents. Filters to apply to the search. Required if q is omitted. |
{
"@version": "2017-09-01",
"q": "the quick brown fox jumps"
}
{
"@version": "2017-09-01",
"q": "a search with filtering",
"filters": [
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "*",
"to": "2014-11-07"
}
]
}
]
}
{
"@version": "2017-09-01",
"q": "(queries can be fancy AND cool) OR (NOT extravagant)",
"advanced": true
}
GFilter
A GFilter document is one of several document types which encode a filter.
The type of filter is identified by the type field.
See the table below for the various filter types.
These filter documents are defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.
Type | Schema |
---|---|
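match_any | GFilterMatch |
match_all | GFilterMatch |
range | GFilterRange |
geo_bounding_box | GFilterGeoBoundingBox |
geo_shape | GFilterGeoShape |
exists | GFilterExists |
like | GFilterLike |
not | GFilterNot |
and | GFilterAnd |
or | GFilterOr |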
These filter documents are defined for documents on version 2017-09-01.
Type | Schema |
---|---|
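match_any | GFilterMatch |
match_all | GFilterMatch |
range | GFilterRange |
geo_bounding_box | GFilterGeoBoundingBox |
geo_shape | GFilterGeoShape |
exists | GFilterExists |
like | GFilterLike |
not | GFilterNot |
and | GFilterAnd |
or | GFilterOr |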
All filters on document version 2017-09-01 support a post_filter field.
Note that post_filter is only valid on filters when they are in the top level filters array of a request.
When filters are nested under and, or, or not filters, post_filter is no longer valid.
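The top-level restriction on post_filter can be checked by walking the filter tree. The sketch below is illustrative only, and its function names are hypothetical. It assumes that and and or filters hold their children in a filters array and that not filters hold a single child in filter, as the field tables later in this document describe.

```python
def _check(f, path, nested, out):
    # Record any filter that sets post_filter below the top level.
    if nested and "post_filter" in f:
        out.append(path)
    kind = f.get("type")
    if kind in ("and", "or"):
        for i, child in enumerate(f.get("filters", [])):
            _check(child, f"{path}.filters[{i}]", True, out)
    elif kind == "not" and "filter" in f:
        _check(f["filter"], f"{path}.filter", True, out)


def misplaced_post_filters(filters):
    """Return paths of filters that set post_filter below the top level."""
    out = []
    for i, f in enumerate(filters):
        _check(f, f"filters[{i}]", False, out)
    return out


filters = [
    # Top level: post_filter is allowed here.
    {"type": "exists", "field_name": "foo", "post_filter": False},
    # The nested "exists" filter sets post_filter, which is not valid.
    {
        "type": "not",
        "post_filter": True,
        "filter": {"type": "exists", "field_name": "bar", "post_filter": True},
    },
]
bad = misplaced_post_filters(filters)
```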
GFilterMatch
A matching filter for finding results which match some set of text terms.
"match_any" and "match_all" refer to the different possible behaviors of the filter values. As their names imply, if "match_any" is specified, the filter will match results for which any of the filter values match, while "match_all" requires that all of the values match on every result.
This is the version of a "match" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.
Field Name | Type | Description |
---|---|---|
type | String | One of "match_any" or "match_all" |
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character. |
values | Array of Strings or Booleans | The values to evaluate against the field_name. If the field is a boolean field, this must be an array of booleans only. For string fields, it may be a mixture of strings or booleans. |
"match_any" and "match_all" are the same when there’s only one value as far as filtering is concerned, but they may have different impact on the way that facets are interpreted.
{
"type": "match_any",
"field_name": "globus_metadata.resource_type",
"values": [
"Globus Endpoint"
]
}
{
"type": "match_all",
"field_name": "globus_metadata.keywords",
"values": [
"hpc",
"internet2",
"uchicago"
]
}
{
"type": "match_any",
"field_name": "globus_metadata.snorkels",
"values": [
"few",
"many",
true
]
}
This filter is only valid if globus_metadata.snorkels is a string field because string fields can contain boolean values.
If it is a boolean field (which cannot contain string values), the query will fail with an error regarding the improper mapping of "few" and "many" onto a boolean field.
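The difference between the two match types can be sketched with a toy evaluator. This is an assumption-laden model: it compares values by exact equality against a list of field values, whereas the service matches against indexed terms. The function name and the sample keywords are hypothetical.

```python
def match_filter(filter_doc, field_values):
    """Evaluate a match_any / match_all filter against one field's values."""
    present = [wanted in field_values for wanted in filter_doc["values"]]
    if filter_doc["type"] == "match_any":
        return any(present)
    if filter_doc["type"] == "match_all":
        return all(present)
    raise ValueError(f"not a match filter: {filter_doc['type']!r}")


# Hypothetical document whose keywords lack "internet2".
keywords = ["hpc", "uchicago"]
wanted = ["hpc", "internet2", "uchicago"]
any_filter = {"type": "match_any", "field_name": "globus_metadata.keywords", "values": wanted}
all_filter = {"type": "match_all", "field_name": "globus_metadata.keywords", "values": wanted}

matched_any = match_filter(any_filter, keywords)
matched_all = match_filter(all_filter, keywords)
```

Here matched_any is true because at least one value is present, while matched_all is false because "internet2" is missing.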
This is the version of a "match" filter defined for legacy document versions (2017-09-01).
Field Name | Type | Description |
---|---|---|
type | String | One of "match_any" or "match_all" |
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character. |
values | Array of Strings or Booleans | The values to evaluate against the field_name. If the field is a boolean field, this must be an array of booleans only. For string fields, it may be a mixture of strings or booleans. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True for |
"match_any" and "match_all" are the same when there’s only one value as far as filtering is concerned, but they may have different impact on the way that facets are interpreted.
{
"type": "match_any",
"field_name": "globus_metadata.resource_type",
"values": [
"Globus Endpoint"
]
}
{
"type": "match_all",
"field_name": "globus_metadata.keywords",
"values": [
"hpc",
"internet2",
"uchicago"
]
}
{
"type": "match_any",
"field_name": "globus_metadata.snorkels",
"values": [
"few",
"many",
true
]
}
This filter is only valid if globus_metadata.snorkels is a string field because string fields can contain boolean values.
If it is a boolean field (which cannot contain string values), the query will fail with an error regarding the improper mapping of "few" and "many" onto a boolean field.
GFilterRange
A range filter for finding results which have numeric or date values within a specified range.
This is the version of a "range" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "range" |
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character. |
values | Array of Objects | The values to evaluate against the field_name. Each object has range-bound fields such as from and to; see the examples below. |
values.from and values.to may be the special string "*" indicating that the range is unbounded on this end. An example is given below.
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "1970-01-01",
"to": "2015-01-01"
}
]
}
{
"type": "range",
"field_name": "cardinality_of_foobar",
"values": [
{
"from": "10",
"to": "50"
}
]
}
This example filter has multiple clauses. The combination is implicitly joined with "or" semantics. This means that we allow values from 0 to 5, and greater than or equal to 10.
{
"type": "range",
"field_name": "cardinality_of_foobar",
"values": [
{
"from": "0",
"to": "5"
},
{
"from": "10",
"to": "*"
}
]
}
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "1970-01-01",
"to": "2015-01-01"
},
{
"gte": "2015-01-01",
"lte": "2016-01-01"
},
{
"gt": "2016-01-15",
"lt": "*"
}
]
}
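The "or" semantics of multiple range clauses can be sketched as follows. This toy evaluator handles only numeric from/to clauses and assumes both bounds are inclusive, which this document does not state explicitly. The function names are hypothetical.

```python
def in_range(value, clause):
    """Check one {"from": ..., "to": ...} clause; "*" means unbounded."""
    low, high = clause["from"], clause["to"]
    # Assumption: both ends of the range are inclusive.
    above = low == "*" or value >= float(low)
    below = high == "*" or value <= float(high)
    return above and below


def range_filter(filter_doc, value):
    """A value passes if any clause in "values" accepts it ("or" semantics)."""
    return any(in_range(value, clause) for clause in filter_doc["values"])


cardinality = {
    "type": "range",
    "field_name": "cardinality_of_foobar",
    "values": [{"from": "0", "to": "5"}, {"from": "10", "to": "*"}],
}
results = [range_filter(cardinality, v) for v in (3, 7, 10, 1000)]
```

With the multi-clause example above, 3 and 10 and 1000 pass, while 7 falls in the gap between the two clauses and is rejected.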
This is the version of a "range" filter defined for legacy document versions (2017-09-01).
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "range" |
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character. |
values | Array of Objects | The values to evaluate against the field_name. Each object has range-bound fields such as from and to; see the examples below. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
values.from and values.to may be the special string "*" indicating that the range is unbounded on this end. An example is given below.
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "1970-01-01",
"to": "2015-01-01"
}
]
}
{
"type": "range",
"field_name": "cardinality_of_foobar",
"values": [
{
"from": "10",
"to": "50"
}
]
}
This example filter has multiple clauses. The combination is implicitly joined with "or" semantics. This means that we allow values from 0 to 5, and greater than or equal to 10.
{
"type": "range",
"field_name": "cardinality_of_foobar",
"values": [
{
"from": "0",
"to": "5"
},
{
"from": "10",
"to": "*"
}
]
}
{
"type": "range",
"field_name": "path.to.date",
"values": [
{
"from": "1970-01-01",
"to": "2015-01-01"
},
{
"gte": "2015-01-01",
"lte": "2016-01-01"
},
{
"gt": "2016-01-15",
"lt": "*"
}
]
}
GFilterGeoBoundingBox
A bounding box filter for finding geo_shape and geo_point values which intersect with a specified bounding box.
This is the version of a "geo bounding box" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "geo_bounding_box" |
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character. |
top_left | Object | An object describing a coordinate pair. It must contain the keys lat and lon. |
bottom_right | Object | An object describing a coordinate pair. It must contain the keys lat and lon. |
top_left is required to be northwest of bottom_right.
{
"type": "geo_bounding_box",
"field_name": "country.center",
"top_left": {
"lat": 49.1,
"lon": -124.9
},
"bottom_right": {
"lat": 24.9,
"lon": -67.1
}
}
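The northwest requirement can be checked before submitting a filter. The sketch below also includes a simple point-in-box test. Both functions are hypothetical helpers, not service behavior, and they assume the box does not cross the antimeridian and that "northwest" means strictly greater latitude and strictly smaller longitude.

```python
def is_northwest(top_left, bottom_right):
    """True when top_left lies strictly north and west of bottom_right."""
    return (top_left["lat"] > bottom_right["lat"]
            and top_left["lon"] < bottom_right["lon"])


def box_contains(filter_doc, point):
    """Check whether a {"lat", "lon"} point falls inside the bounding box."""
    tl, br = filter_doc["top_left"], filter_doc["bottom_right"]
    if not is_northwest(tl, br):
        raise ValueError("top_left must be northwest of bottom_right")
    return (br["lat"] <= point["lat"] <= tl["lat"]
            and tl["lon"] <= point["lon"] <= br["lon"])


box = {
    "type": "geo_bounding_box",
    "field_name": "country.center",
    "top_left": {"lat": 49.1, "lon": -124.9},
    "bottom_right": {"lat": 24.9, "lon": -67.1},
}
inside = box_contains(box, {"lat": 41.9, "lon": -87.6})
outside = box_contains(box, {"lat": 51.5, "lon": -0.1})
```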
This is the version of a "geo bounding box" filter defined for legacy document versions (2017-09-01).
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "geo_bounding_box" |
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character. |
top_left | Object | An object describing a coordinate pair. It must contain the keys lat and lon. |
bottom_right | Object | An object describing a coordinate pair. It must contain the keys lat and lon. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
top_left is required to be northwest of bottom_right.
{
"type": "geo_bounding_box",
"field_name": "country.center",
"top_left": {
"lat": 49.1,
"lon": -124.9
},
"bottom_right": {
"lat": 24.9,
"lon": -67.1
}
}
GFilterGeoShape
A geo filter for finding geo_shape and geo_point values which intersect with or are contained within a given shape.
This is the version of a "geo shape" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "geo_shape" |
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character. |
shape | Object | A GeoJSON formatted Geometry. See note below on supported geometries. |
relation | String | The shape relationship to test. One of |
{
"type": "geo_shape",
"field_name": "city.boundary",
"shape": {
"type": "Polygon",
"coordinates": [
[
[
-5.8,
51.5
],
[
10.0,
51.5
],
[
10.0,
41.0
],
[
-5.8,
41.0
],
[
-5.8,
51.5
]
]
]
}
}
This is the version of a "geo shape" filter defined for legacy document versions (2017-09-01).
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "geo_shape" |
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character. |
shape | Object | A GeoJSON formatted Geometry. See note below on supported geometries. |
relation | String | The shape relationship to test. One of |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
{
"type": "geo_shape",
"field_name": "city.boundary",
"shape": {
"type": "Polygon",
"coordinates": [
[
[
-5.8,
51.5
],
[
10.0,
51.5
],
[
10.0,
41.0
],
[
-5.8,
41.0
],
[
-5.8,
51.5
]
]
]
}
}
Supported Geometries
Only two-dimensional GeoJSON data are allowed in geo_shape filters.
That means that coordinates should be encoded as JSON arrays of length 2.
Globus Search only supports filters using GeoJSON Polygons. Furthermore, Polygons are restricted to simple polygons, consisting of only one coordinate ring. This means that polygons with internal cut-outs are forbidden.
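These restrictions are simple to check on the client side. The following is a minimal sketch with a hypothetical function name and hypothetical messages. It tests only the three rules stated above and does not perform full GeoJSON validation.

```python
def polygon_problems(shape):
    """Return reasons a GeoJSON geometry would break the rules above."""
    if shape.get("type") != "Polygon":
        return ["only Polygon geometries are supported"]
    problems = []
    rings = shape.get("coordinates", [])
    # A second ring would describe an internal cut-out.
    if len(rings) != 1:
        problems.append("polygon must consist of exactly one coordinate ring")
    for ring in rings:
        if any(len(position) != 2 for position in ring):
            problems.append("coordinates must be arrays of length 2")
            break
    return problems


outer = [[-5.8, 51.5], [10.0, 51.5], [10.0, 41.0], [-5.8, 41.0], [-5.8, 51.5]]
hole = [[0.0, 45.0], [1.0, 45.0], [1.0, 44.0], [0.0, 44.0], [0.0, 45.0]]

simple = polygon_problems({"type": "Polygon", "coordinates": [outer]})
with_hole = polygon_problems({"type": "Polygon", "coordinates": [outer, hole]})
```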
GFilterExists
An "existence" filter which checks if a field is present in a document with a non-null value.
Note that a field being present but with a value of null is considered the same, under exists filters, as the field being absent from the document.
This is the version of an "exists" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "exists" |
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character. |
The following filter finds documents where the field foo exists:
{
"type": "exists",
"field_name": "foo"
}
This is the version of an "exists" filter defined for legacy document versions (2017-09-01).
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "exists" |
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
The following filter finds documents where the field foo exists:
{
"type": "exists",
"field_name": "foo"
}
GFilterLike
A "like" filter which checks if a field matches a "like-expression". Like expressions are matching strings containing the wildcard characters:
- * matches any number of characters
- ? matches any one character
This is the version of a "like" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "like" |
field_name | String | The field to which the filter refers. It must be a text field. |
value | String | The filter expression to apply as a match. |
The following filter finds documents where the field filename contains a string ending in .csv.
{
"type": "like",
"field_name": "filename",
"value": "*.csv"
}
Note that this does not technically guarantee that the filename ends with .csv. For example, it is possible for the filter to match on a value like "filename": "foo.csv bar".
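The wildcard rules can be sketched by translating a like-expression into a regular expression. How the service applies the expression to a text field is not specified here. The model below assumes, purely for illustration, that the expression is tested against each whitespace-separated term of the value, which is one plausible reading of why "foo.csv bar" can match. Both function names are hypothetical.

```python
import re


def like_to_regex(expression):
    """Translate a like-expression into a compiled regular expression."""
    parts = []
    for ch in expression:
        if ch == "*":
            parts.append(".*")   # any number of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def like_matches(expression, text):
    """Assumption: the expression is tested against whitespace-separated terms."""
    pattern = like_to_regex(expression)
    return any(pattern.fullmatch(term) for term in text.split())


plain = like_matches("*.csv", "foo.csv")
trailing_word = like_matches("*.csv", "foo.csv bar")
single_char = like_matches("data_?.csv", "data_12.csv")
```

Under this assumption both plain and trailing_word are true, while single_char is false because ? stands for exactly one character.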
This is the version of a "like" filter defined for legacy document versions (2017-09-01).
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "like" |
field_name | String | The field to which the filter refers. It must be a text field. |
value | String | The filter expression to apply as a match. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
The following filter finds documents where the field filename contains a string ending in .csv.
{
"type": "like",
"field_name": "filename",
"value": "*.csv"
}
Note that this does not technically guarantee that the filename ends with .csv. For example, it is possible for the filter to match on a value like "filename": "foo.csv bar".
GFilterNot
A "not" filter for inverting any other valid filter.
This is the version of a "not" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "not" |
filter | Object | Any valid GFilter object. |
The following filter finds documents where the field foo does not exist:
{
"type": "not",
"filter": {
"type": "exists",
"field_name": "foo"
}
}
This is the version of a "not" filter defined for legacy document versions (2017-09-01).
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "not" |
filter | Object | Any valid GFilter object. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
The following filter finds documents where the field foo does not exist:
{
"type": "not",
"filter": {
"type": "exists",
"field_name": "foo"
}
}
GFilterAnd
An "and" filter for joining any other valid filters. In order for an "and" filter to match on documents, all of the filters it contains must match.
This is the version of an "and" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "and" |
filters | Array of Objects | An array of GFilter objects. |
The following filter finds documents where the field title exists and keywords contains hpc:
{
"type": "and",
"filters": [
{
"type": "exists",
"field_name": "title"
},
{
"type": "match_any",
"field_name": "keywords",
"values": [
"hpc"
]
}
]
}
This is the version of an "and" filter defined for legacy document versions (2017-09-01).
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "and" |
filters | Array of Objects | An array of GFilter objects. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
The following filter finds documents where the field title exists and keywords contains hpc:
{
"type": "and",
"filters": [
{
"type": "exists",
"field_name": "title"
},
{
"type": "match_any",
"field_name": "keywords",
"values": [
"hpc"
]
}
]
}
GFilterOr
An "or" filter for joining any other valid filters. In order for an "or" filter to match on documents, at least one of the filters it contains must match.
This is the version of an "or" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "or" |
filters | Array of Objects | An array of GFilter objects. |
The following filter finds documents where either the author.institution or the dataset.institution is uchicago.edu. One or both can be a match:
{
"type": "or",
"filters": [
{
"type": "match_any",
"field_name": "author.institution",
"values": [
"uchicago.edu"
]
},
{
"type": "match_any",
"field_name": "dataset.institution",
"values": [
"uchicago.edu"
]
}
]
}
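How "not", "and", and "or" filters compose can be sketched with a small recursive evaluator. This is a toy model over a flat dictionary whose keys are full field names, not the service's implementation. Following the field tables, nested filters are read from the filters key for "and" and "or", and from the filter key for "not". The evaluate function and the sample document are hypothetical.

```python
def evaluate(f, doc):
    """Evaluate a small subset of GFilter types against a flat dict."""
    kind = f["type"]
    if kind == "exists":
        # A null value counts the same as a missing field.
        return doc.get(f["field_name"]) is not None
    if kind == "match_any":
        value = doc.get(f["field_name"])
        values = value if isinstance(value, list) else [value]
        return any(wanted in values for wanted in f["values"])
    if kind == "not":
        return not evaluate(f["filter"], doc)
    if kind == "and":
        return all(evaluate(child, doc) for child in f["filters"])
    if kind == "or":
        return any(evaluate(child, doc) for child in f["filters"])
    raise ValueError(f"unsupported filter type: {kind!r}")


def institution_is(field):
    return {"type": "match_any", "field_name": field, "values": ["uchicago.edu"]}


# Hypothetical document: only the dataset institution matches; title is null.
doc = {"author.institution": "mit.edu", "dataset.institution": "uchicago.edu", "title": None}
either = {"type": "or", "filters": [institution_is("author.institution"), institution_is("dataset.institution")]}
both = {"type": "and", "filters": [institution_is("author.institution"), institution_is("dataset.institution")]}
no_title = {"type": "not", "filter": {"type": "exists", "field_name": "title"}}

results = (evaluate(either, doc), evaluate(both, doc), evaluate(no_title, doc))
```

For this document the "or" filter matches, the "and" filter does not, and the "not" filter matches because a null title is treated as absent.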
This is the version of an "or" filter defined for legacy document versions (2017-09-01).
Field Name | Type | Description |
---|---|---|
type | String | Must have the value "or" |
filters | Array of Objects | An array of GFilter objects. |
post_filter | Boolean | Controls whether this filter is applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True. |
The following filter finds documents where either the author.institution or the dataset.institution is uchicago.edu. One or both can be a match:
{
"type": "or",
"filters": [
{
"type": "match_any",
"field_name": "author.institution",
"values": [
"uchicago.edu"
]
},
{
"type": "match_any",
"field_name": "dataset.institution",
"values": [
"uchicago.edu"
]
}
]
}