Delete by Query

Delete by query provides a powerful method for removing a large number of documents in a single operation.

The operation removes an entire subject where there is a match on the query. That is, if even a single entry for a subject matches the query, then all entries, as well as the subject itself, will be removed from the index.

Warning

Delete by query should be used with care, as it may not always be obvious which documents it will delete.

Its behavior mirrors that of a query: the set of subjects deleted exactly matches the set of subjects that the same query would return.

You may want to test delete by query operations by first executing the query as a search.
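
One way to follow this advice is to derive a search document from the delete document and review its results first. The sketch below is a minimal illustration, not part of the Globus API: the helper name to_preview_query is hypothetical, and it assumes that a query#1.0.0 search document accepts the same q, advanced, q_settings, and filters fields as a delete_by_query#1.0.0 document.

```python
import copy

# Hypothetical mapping from delete-by-query versions to the search version
# assumed to accept the same query fields.
VERSION_MAP = {
    "delete_by_query#1.0.0": "query#1.0.0",  # assumed counterpart version
    "2017-09-01": "2017-09-01",              # legacy documents share one version
}


def to_preview_query(delete_doc):
    """Return a copy of delete_doc re-versioned for use as a search request."""
    # 2017-09-01 is the documented default when @version is omitted.
    version = delete_doc.get("@version", "2017-09-01")
    if version not in VERSION_MAP:
        raise ValueError(f"unknown @version: {version!r}")
    preview = copy.deepcopy(delete_doc)
    preview["@version"] = VERSION_MAP[version]
    return preview


delete_doc = {"@version": "delete_by_query#1.0.0", "q": "the quick brown fox jumps"}
preview_doc = to_preview_query(delete_doc)
```

The preview document would be sent to the Query - POST endpoint. A search response may be paginated, so a single page of results is not necessarily the full set of subjects that would be deleted.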

Due to the broad capability of delete by query to change the state of the index, it can only be executed by a user with 'owner' or 'admin' permissions on the index.

Delete By Query is submitted as an asynchronous Task, which can then be monitored using the Get Task API. Once your task is complete, the data will be removed from the search index and will no longer appear in query results.

Method | POST
URL | /v1/index/<index_id>/delete_by_query
Authentication required? | Yes
Required Roles | You must have owner, admin, or writer access
Request Body | A DeleteByQueryRequest
Response Body | A DeleteByQueryResponse

Authentication & Authorization

Tokens for this call must have one of these scopes.

urn:globus:auth:scope:search.api.globus.org:all
urn:globus:auth:scope:search.api.globus.org:ingest
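
As a rough sketch of how the method, URL, and token fit together, the following builds the HTTP request using only the Python standard library. It does not send anything. The base URL, the placeholder index ID, and the placeholder token are assumptions for illustration; the token is assumed to be a Bearer access token carrying one of the scopes above.

```python
import json
import urllib.request

# Assumption: the public Globus Search base URL.
BASE_URL = "https://search.api.globus.org"


def build_delete_by_query_request(index_id, document, access_token):
    """Build (but do not send) the POST request for a delete-by-query task."""
    url = f"{BASE_URL}/v1/index/{index_id}/delete_by_query"
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


request = build_delete_by_query_request(
    "00000000-0000-0000-0000-000000000000",  # placeholder index ID
    {"@version": "delete_by_query#1.0.0", "q": "the quick brown fox jumps"},
    "PLACEHOLDER_TOKEN",  # placeholder access token
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```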

Request Schemas

DeleteByQueryRequest

This is the main document type for encoding a Delete By Query task.

A DeleteByQueryRequest document is versioned with the @version field as either 2017-09-01 or delete_by_query#1.0.0. When omitted, @version defaults to the current service default version, which is 2017-09-01.

  • version delete_by_query#1.0.0
  • version 2017-09-01 (legacy)

This is the newer version of a Delete By Query request. It will become the default in a future release of Globus Search. Until then, users must request it explicitly with the @version field.

Field Name | Type | Description
@version | String | Must be "delete_by_query#1.0.0"
q | String | User-supplied query, conforming to the query syntax. Required if there are no filters.
advanced | Boolean | Optional. When true, interpret q with the advanced query syntax. Defaults to False. Mutually exclusive with q_settings.
q_settings | Object | Optional. A q_settings object, with settings indicating how to interpret q. Mutually exclusive with advanced.
filters | Array | An array of GFilter Documents. Filters to apply to the search. Required if q is not provided.

Examples
{
  "@version": "delete_by_query#1.0.0",
  "q": "the quick brown fox jumps"
}
{
  "@version": "delete_by_query#1.0.0",
  "q": "a search with filtering",
  "filters": [
    {
      "type": "range",
      "field_name": "path.to.date",
      "values": [
        {
          "from": "*",
          "to": "2014-11-07"
        }
      ]
    }
  ]
}
{
  "@version": "delete_by_query#1.0.0",
  "q": "(queries can be fancy AND cool) OR (NOT extravagant)",
  "q_settings": {
    "mode": "advanced_query_string",
    "default_operator": "or"
  }
}
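
The field rules in the table above can be checked on the client before submitting. This is a minimal sketch with a hypothetical function name; it covers only the constraints stated on this page, and it assumes that an empty filters array counts as "no filters".

```python
def validate_delete_by_query(doc):
    """Return a list of problems found in a delete_by_query#1.0.0 document."""
    problems = []
    if doc.get("@version") != "delete_by_query#1.0.0":
        problems.append('@version must be "delete_by_query#1.0.0"')
    has_q = "q" in doc
    # Assumption: an empty filters array is treated the same as no filters.
    has_filters = bool(doc.get("filters"))
    if not has_q and not has_filters:
        problems.append("either q or filters is required")
    if "advanced" in doc and "q_settings" in doc:
        problems.append("advanced and q_settings are mutually exclusive")
    return problems


ok = validate_delete_by_query(
    {"@version": "delete_by_query#1.0.0", "q": "the quick brown fox jumps"}
)
conflict = validate_delete_by_query(
    {
        "@version": "delete_by_query#1.0.0",
        "q": "fancy AND cool",
        "advanced": True,
        "q_settings": {"mode": "advanced_query_string"},
    }
)
```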

This is the legacy version of a Delete By Query request. It is the default when @version is omitted for compatibility while users migrate to the delete_by_query#1.0.0 version.

Field Name | Type | Description
@version | String | Must be "2017-09-01"
q | String | User-supplied query, conforming to the query syntax. Required if there are no filters.
advanced | Boolean | Optional. When true, interpret q with the advanced query syntax. Defaults to False.
filters | Array | An array of GFilter Documents. Filters to apply to the search. Required if q is not provided.

Examples
{
  "@version": "2017-09-01",
  "q": "the quick brown fox jumps"
}
{
  "@version": "2017-09-01",
  "q": "a search with filtering",
  "filters": [
    {
      "type": "range",
      "field_name": "path.to.date",
      "values": [
        {
          "from": "*",
          "to": "2014-11-07"
        }
      ]
    }
  ]
}
{
  "@version": "2017-09-01",
  "q": "(queries can be fancy AND cool) OR (NOT extravagant)",
  "advanced": true
}
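
For users migrating from the legacy version, a conversion could look like the sketch below. It is based only on the differences visible in the field tables on this page: the @version value changes, and post_filter is listed only for 2017-09-01 filters. Because post_filter is described as affecting facet calculation rather than which results match, this sketch drops it; whether that is appropriate for a given request is an assumption to verify. The function name is hypothetical.

```python
import copy


def upgrade_legacy_request(legacy_doc):
    """Sketch: rewrite a 2017-09-01 document as delete_by_query#1.0.0."""
    upgraded = copy.deepcopy(legacy_doc)
    upgraded["@version"] = "delete_by_query#1.0.0"
    for gfilter in upgraded.get("filters", []):
        # post_filter is only listed for 2017-09-01 filters, and only at the
        # top level of the filters array, so it is removed here.
        gfilter.pop("post_filter", None)
    return upgraded


legacy = {
    "@version": "2017-09-01",
    "q": "a search with filtering",
    "advanced": True,
    "filters": [
        {
            "type": "range",
            "field_name": "path.to.date",
            "values": [{"from": "*", "to": "2014-11-07"}],
            "post_filter": True,
        }
    ],
}
upgraded = upgrade_legacy_request(legacy)
```

The advanced flag is left in place because it is also a valid field of the newer version.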

GFilter

A GFilter document is one of several document types which encode a filter. The type of filter is identified by the type field. See the table below for the various filter types.

  • version 1.0.0
  • version 2017 (legacy)

These filter documents are defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.

Type | Schema
match_all | GFilterMatch
match_any | GFilterMatch
range | GFilterRange
geo_bounding_box | GFilterGeoBoundingBox
geo_shape | GFilterGeoShape
exists | GFilterExists
like | GFilterLike
not | GFilterNot
and | GFilterAnd
or | GFilterOr

These filter documents are defined for documents on version 2017-09-01.

Type | Schema
match_all | GFilterMatch
match_any | GFilterMatch
range | GFilterRange
geo_bounding_box | GFilterGeoBoundingBox
geo_shape | GFilterGeoShape
exists | GFilterExists
like | GFilterLike
not | GFilterNot
and | GFilterAnd
or | GFilterOr

Note

All filters on document version 2017-09-01 support a post_filter field. Note that post_filter is only valid on filters when they are in the top level filters array of a request.

When filters are nested under and, or, or not filters, post_filter is no longer valid.
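
To make the nesting rule concrete, here is a small sketch that walks a filters array and reports every place where post_filter appears below the top level. The function name and the path notation are hypothetical; the traversal relies only on the filters field of and/or filters and the filter field of not filters, as documented in the schemas on this page.

```python
def nested_post_filters(filters):
    """Return paths of filters that set post_filter below the top level."""
    found = []

    def visit(gfilter, path, nested):
        if nested and "post_filter" in gfilter:
            found.append(path)
        kind = gfilter.get("type")
        if kind in ("and", "or"):
            for i, child in enumerate(gfilter.get("filters", [])):
                visit(child, f"{path}.filters[{i}]", True)
        elif kind == "not" and "filter" in gfilter:
            visit(gfilter["filter"], f"{path}.filter", True)

    for i, gfilter in enumerate(filters):
        visit(gfilter, f"filters[{i}]", False)
    return found


example_filters = [
    {
        "type": "and",
        "post_filter": True,  # valid: top level of the filters array
        "filters": [
            {"type": "exists", "field_name": "title", "post_filter": False},
            {
                "type": "not",
                "filter": {"type": "exists", "field_name": "foo", "post_filter": True},
            },
        ],
    }
]
invalid_paths = nested_post_filters(example_filters)
```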

GFilterMatch

A matching filter for finding results which match some set of text terms.

"match_any" and "match_all" refer to the two possible behaviors of the filter values. As their names imply, "match_any" matches results for which any of the filter values match, while "match_all" requires that all of the values match on every result.

  • version 1.0.0
  • version 2017 (legacy)

This is the version of a "match" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.

Field Name | Type | Description
type | String | One of {"match_any", "match_all"}
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character.
values | Array of Strings or Booleans | The values to evaluate against the field_name. If the field is a boolean field, this must be an array of booleans only. For string fields, it may be a mixture of strings or booleans.

Note

"match_any" and "match_all" are the same when there’s only one value as far as filtering is concerned, but they may have different impact on the way that facets are interpreted.

Examples
{
  "type": "match_any",
  "field_name": "globus_metadata.resource_type",
  "values": [
    "Globus Endpoint"
  ]
}
{
  "type": "match_all",
  "field_name": "globus_metadata.keywords",
  "values": [
    "hpc",
    "internet2",
    "uchicago"
  ]
}
{
  "type": "match_any",
  "field_name": "globus_metadata.snorkels",
  "values": [
    "few",
    "many",
    true
  ]
}
Note

This filter is only valid if globus_metadata.snorkels is a string field because string fields can contain boolean values.

If it is a boolean field (which cannot contain string values), the query will fail with an error regarding the improper mapping of "few" and "many" onto a boolean field.
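
The difference between the two match types can be modeled locally with any() and all(). This is only an illustration of the semantics described above, under the assumption that values are compared as exact terms; it is not how the service evaluates filters, which may analyze text differently.

```python
def matches(filter_type, field_values, filter_values):
    """Local model of match_any / match_all over one document's field values."""
    present = set(field_values)
    if filter_type == "match_any":
        return any(value in present for value in filter_values)
    if filter_type == "match_all":
        return all(value in present for value in filter_values)
    raise ValueError(f"unsupported filter type: {filter_type!r}")


keywords = ["hpc", "uchicago"]  # hypothetical field values of one document
any_result = matches("match_any", keywords, ["hpc", "internet2"])
all_result = matches("match_all", keywords, ["hpc", "internet2"])
```

With a single filter value, both types give the same answer, which agrees with the note above.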

This is the version of a "match" filter defined for legacy document versions (2017-09-01).

Field Name | Type | Description
type | String | One of {"match_any", "match_all"}
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character.
values | Array of Strings or Booleans | The values to evaluate against the field_name. If the field is a boolean field, this must be an array of booleans only. For string fields, it may be a mixture of strings or booleans.
post_filter | Boolean | Control whether or not this filter should be applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True for match_any and False for match_all.

Note

"match_any" and "match_all" are the same when there’s only one value as far as filtering is concerned, but they may have different impact on the way that facets are interpreted.

Examples
{
  "type": "match_any",
  "field_name": "globus_metadata.resource_type",
  "values": [
    "Globus Endpoint"
  ]
}
{
  "type": "match_all",
  "field_name": "globus_metadata.keywords",
  "values": [
    "hpc",
    "internet2",
    "uchicago"
  ]
}
{
  "type": "match_any",
  "field_name": "globus_metadata.snorkels",
  "values": [
    "few",
    "many",
    true
  ]
}
Note

This filter is only valid if globus_metadata.snorkels is a string field because string fields can contain boolean values.

If it is a boolean field (which cannot contain string values), the query will fail with an error regarding the improper mapping of "few" and "many" onto a boolean field.

GFilterRange

A range filter for finding results which have numeric or date values within a specified range.

  • version 1.0.0
  • version 2017 (legacy)

This is the version of a "range" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.

Field Name | Type | Description
type | String | Must have the value "range"
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character.
values | Array of Objects | The values to evaluate against the field_name. Each object has the fields from and to OR each object has exactly one of gte or gt and exactly one of lt or lte.

Note

values.from and values.to may be the special string "*" indicating that the range is unbounded on this end. An example is given below.

Examples
{
  "type": "range",
  "field_name": "path.to.date",
  "values": [
    {
      "from": "1970-01-01",
      "to": "2015-01-01"
    }
  ]
}
{
  "type": "range",
  "field_name": "cardinality_of_foobar",
  "values": [
    {
      "from": "10",
      "to": "50"
    }
  ]
}

This example filter has multiple clauses. The combination is implicitly joined with "or" semantics. This means that we allow values from 0 to 5, and greater than or equal to 10.

{
  "type": "range",
  "field_name": "cardinality_of_foobar",
  "values": [
    {
      "from": "0",
      "to": "5"
    },
    {
      "from": "10",
      "to": "*"
    }
  ]
}
{
  "type": "range",
  "field_name": "path.to.date",
  "values": [
    {
      "from": "1970-01-01",
      "to": "2015-01-01"
    },
    {
      "gte": "2015-01-01",
      "lte": "2016-01-01"
    },
    {
      "gt": "2016-01-15",
      "lt": "*"
    }
  ]
}
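
The shape rule for each object in values (either from and to, or exactly one lower bound and exactly one upper bound) can be expressed as a short check. This is a sketch with a hypothetical function name; it validates only the keys, not the bound values, and it assumes no other keys are allowed in a clause.

```python
def is_valid_range_clause(clause):
    """Check the key combination of one object in a range filter's values."""
    keys = set(clause)
    if keys == {"from", "to"}:
        return True
    lower = keys & {"gte", "gt"}
    upper = keys & {"lte", "lt"}
    # Exactly one lower bound, exactly one upper bound, and nothing else.
    return len(lower) == 1 and len(upper) == 1 and keys == lower | upper


clauses = [
    {"from": "1970-01-01", "to": "2015-01-01"},
    {"gte": "2015-01-01", "lte": "2016-01-01"},
    {"gt": "2016-01-15", "lt": "*"},
]
all_valid = all(is_valid_range_clause(clause) for clause in clauses)
```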

This is the version of a "range" filter defined for legacy document versions (2017-09-01).

Field Name | Type | Description
type | String | Must have the value "range"
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character.
values | Array of Objects | The values to evaluate against the field_name. Each object has the fields from and to OR each object has exactly one of gte or gt and exactly one of lt or lte.
post_filter | Boolean | Control whether or not this filter should be applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True.

Note

values.from and values.to may be the special string "*" indicating that the range is unbounded on this end. An example is given below.

Examples
{
  "type": "range",
  "field_name": "path.to.date",
  "values": [
    {
      "from": "1970-01-01",
      "to": "2015-01-01"
    }
  ]
}
{
  "type": "range",
  "field_name": "cardinality_of_foobar",
  "values": [
    {
      "from": "10",
      "to": "50"
    }
  ]
}

This example filter has multiple clauses. The combination is implicitly joined with "or" semantics. This means that we allow values from 0 to 5, and greater than or equal to 10.

{
  "type": "range",
  "field_name": "cardinality_of_foobar",
  "values": [
    {
      "from": "0",
      "to": "5"
    },
    {
      "from": "10",
      "to": "*"
    }
  ]
}
{
  "type": "range",
  "field_name": "path.to.date",
  "values": [
    {
      "from": "1970-01-01",
      "to": "2015-01-01"
    },
    {
      "gte": "2015-01-01",
      "lte": "2016-01-01"
    },
    {
      "gt": "2016-01-15",
      "lt": "*"
    }
  ]
}

GFilterGeoBoundingBox

A bounding box filter for finding geo_shape and geo_point values which intersect with a specified bounding box.

  • version 1.0.0
  • version 2017 (legacy)

This is the version of a "geo bounding box" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.

Field Name | Type | Description
type | String | Must have the value "geo_bounding_box"
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character.
top_left | Object | An object describing a coordinate pair. It must contain the keys lat and lon, each of which must have a numeric value.
bottom_right | Object | An object describing a coordinate pair. It must contain the keys lat and lon, each of which must have a numeric value.

Note

top_left is required to be northwest of bottom_right.

Examples
{
  "type": "geo_bounding_box",
  "field_name": "country.center",
  "top_left": {
    "lat": 49.1,
    "lon": -124.9
  },
  "bottom_right": {
    "lat": 24.9,
    "lon": -67.1
  }
}
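
The "northwest" requirement can be checked before sending the filter. The sketch below uses a hypothetical function name and makes two assumptions not stated on this page: the comparison is strict, and the box does not cross the antimeridian (where a western corner can have a larger longitude than an eastern one).

```python
def is_northwest_of(top_left, bottom_right):
    """Return True if top_left lies north and west of bottom_right."""
    # Assumption: strict inequality, and no antimeridian crossing.
    return (
        top_left["lat"] > bottom_right["lat"]
        and top_left["lon"] < bottom_right["lon"]
    )


top_left = {"lat": 49.1, "lon": -124.9}
bottom_right = {"lat": 24.9, "lon": -67.1}
box_ok = is_northwest_of(top_left, bottom_right)
```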

This is the version of a "geo bounding box" filter defined for legacy document versions (2017-09-01).

Field Name | Type | Description
type | String | Must have the value "geo_bounding_box"
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character.
top_left | Object | An object describing a coordinate pair. It must contain the keys lat and lon, each of which must have a numeric value.
bottom_right | Object | An object describing a coordinate pair. It must contain the keys lat and lon, each of which must have a numeric value.
post_filter | Boolean | Control whether or not this filter should be applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True.

Note

top_left is required to be northwest of bottom_right.

Examples
{
  "type": "geo_bounding_box",
  "field_name": "country.center",
  "top_left": {
    "lat": 49.1,
    "lon": -124.9
  },
  "bottom_right": {
    "lat": 24.9,
    "lon": -67.1
  }
}

GFilterGeoShape

A geo filter for finding geo_shape and geo_point values which intersect with or are contained within a given shape.

  • version 1.0.0
  • version 2017 (legacy)

This is the version of a "geo shape" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.

Field Name | Type | Description
type | String | Must have the value "geo_shape"
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character.
shape | Object | A GeoJSON formatted Geometry. See note below on supported geometries.
relation | String | The shape relationship to test. One of {"intersects", "within"}. Defaults to "intersects".

Examples
{
  "type": "geo_shape",
  "field_name": "city.boundary",
  "shape": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -5.8,
          51.5
        ],
        [
          10.0,
          51.5
        ],
        [
          10.0,
          41.0
        ],
        [
          -5.8,
          41.0
        ],
        [
          -5.8,
          51.5
        ]
      ]
    ]
  }
}

This is the version of a "geo shape" filter defined for legacy document versions (2017-09-01).

Field Name | Type | Description
type | String | Must have the value "geo_shape"
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character.
shape | Object | A GeoJSON formatted Geometry. See note below on supported geometries.
relation | String | The shape relationship to test. One of {"intersects", "within"}. Defaults to "intersects".
post_filter | Boolean | Control whether or not this filter should be applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True.

Examples
{
  "type": "geo_shape",
  "field_name": "city.boundary",
  "shape": {
    "type": "Polygon",
    "coordinates": [
      [
        [
          -5.8,
          51.5
        ],
        [
          10.0,
          51.5
        ],
        [
          10.0,
          41.0
        ],
        [
          -5.8,
          41.0
        ],
        [
          -5.8,
          51.5
        ]
      ]
    ]
  }
}
Supported Geometries

Only two-dimensional GeoJSON data are allowed in geo_shape filters. That means that coordinates should be encoded as JSON arrays of length 2.

Globus Search only supports filters using GeoJSON Polygons. Furthermore, Polygons are restricted to simple polygons, consisting of only one coordinate ring. This means that polygons with internal cut-outs are forbidden.
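
These restrictions lend themselves to a client-side check. The following sketch, with a hypothetical function name, tests the constraints stated above plus the general GeoJSON rule (RFC 7946) that a linear ring is closed and has at least four positions. It does not check for self-intersection.

```python
def polygon_problems(shape):
    """Return a list of problems with a geo_shape filter's shape value."""
    if shape.get("type") != "Polygon":
        return ["only GeoJSON Polygons are supported"]
    problems = []
    rings = shape.get("coordinates", [])
    if len(rings) != 1:
        problems.append("polygon must consist of exactly one coordinate ring")
    for ring in rings:
        if any(len(position) != 2 for position in ring):
            problems.append("coordinates must be arrays of length 2")
        # GeoJSON linear rings are closed and have at least four positions.
        if len(ring) < 4 or ring[0] != ring[-1]:
            problems.append("ring must be closed with at least four positions")
    return problems


shape = {
    "type": "Polygon",
    "coordinates": [
        [[-5.8, 51.5], [10.0, 51.5], [10.0, 41.0], [-5.8, 41.0], [-5.8, 51.5]]
    ],
}
shape_problems = polygon_problems(shape)
```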

GFilterExists

An "existence" filter which checks if a field is present in a document with a non-null value. Note that a field being present but with a value of null is considered the same, under exists filters, as the field being absent from the document.

  • version 1.0.0
  • version 2017 (legacy)

This is the version of an "exists" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.

Field Name | Type | Description
type | String | Must have the value "exists"
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character.

Examples

The following filter finds documents where the field foo exists:

{
  "type": "exists",
  "field_name": "foo"
}

This is the version of an "exists" filter defined for legacy document versions (2017-09-01).

Field Name | Type | Description
type | String | Must have the value "exists"
field_name | String | The field to which the filter refers. Any dots (".") must be escaped with a preceding backslash ("\") character.
post_filter | Boolean | Control whether or not this filter should be applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True.

Examples

The following filter finds documents where the field foo exists:

{
  "type": "exists",
  "field_name": "foo"
}

GFilterLike

A "like" filter which checks if a field matches a "like-expression". Like expressions are matching strings containing the wildcard characters:

  • * matches any number of characters

  • ? matches any one character

  • version 1.0.0
  • version 2017 (legacy)

This is the version of a "like" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.

Field Name | Type | Description
type | String | Must have the value "like".
field_name | String | The field to which the filter refers. It must be a text field.
value | String | The filter expression to apply as a match.

Examples

The following filter finds documents where the field filename contains a string ending in .csv.

{
  "type": "like",
  "field_name": "filename",
  "value": "*.csv"
}

Note that this does not technically guarantee that the filename ends with .csv. For example, it is possible for the filter to match on a value like "filename": "foo.csv bar".
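
The wildcard rules, and the reason "foo.csv bar" can match, can be illustrated with a local model. This sketch translates a like-expression into a regular expression and tests it against each whitespace-separated token of the value. The token-by-token comparison is an assumption made to reproduce the behavior described in the note; the service's actual text analysis is not specified here.

```python
import re


def like_to_regex(expression):
    """Translate * and ? wildcards into an equivalent compiled regex."""
    parts = []
    for char in expression:
        if char == "*":
            parts.append(".*")   # any number of characters
        elif char == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


def like_matches(expression, field_value):
    """Assumed model: the value matches if any single token matches."""
    pattern = like_to_regex(expression)
    return any(pattern.fullmatch(token) for token in field_value.split())


surprising = like_matches("*.csv", "foo.csv bar")
```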

This is the version of a "like" filter defined for legacy document versions (2017-09-01).

Field Name | Type | Description
type | String | Must have the value "like".
field_name | String | The field to which the filter refers. It must be a text field.
value | String | The filter expression to apply as a match.
post_filter | Boolean | Control whether or not this filter should be applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True.

Examples

The following filter finds documents where the field filename contains a string ending in .csv.

{
  "type": "like",
  "field_name": "filename",
  "value": "*.csv"
}

Note that this does not technically guarantee that the filename ends with .csv. For example, it is possible for the filter to match on a value like "filename": "foo.csv bar".

GFilterNot

A "not" filter for inverting any other valid filter.

  • version 1.0.0
  • version 2017 (legacy)

This is the version of a "not" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.

Field Name | Type | Description
type | String | Must have the value "not"
filter | Object | Any valid GFilter object.

Examples

The following filter finds documents where the field foo does not exist:

{
  "type": "not",
  "filter": {
    "type": "exists",
    "field_name": "foo"
  }
}

This is the version of a "not" filter defined for legacy document versions (2017-09-01).

Field Name | Type | Description
type | String | Must have the value "not"
filter | Object | Any valid GFilter object.
post_filter | Boolean | Control whether or not this filter should be applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True.

Examples

The following filter finds documents where the field foo does not exist:

{
  "type": "not",
  "filter": {
    "type": "exists",
    "field_name": "foo"
  }
}

GFilterAnd

An "and" filter for joining any other valid filters. In order for an "and" filter to match on documents, all of the filters it contains must match.

  • version 1.0.0
  • version 2017 (legacy)

This is the version of an "and" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.

Field Name | Type | Description
type | String | Must have the value "and"
filters | Array of Objects | An array of GFilter objects.

Examples

The following filter finds documents where the field title exists and keywords contains hpc:

{
  "type": "and",
  "filters": [
    {
      "type": "exists",
      "field_name": "title"
    },
    {
      "type": "match_any",
      "field_name": "keywords",
      "values": [
        "hpc"
      ]
    }
  ]
}

This is the version of an "and" filter defined for legacy document versions (2017-09-01).

Field Name | Type | Description
type | String | Must have the value "and"
filters | Array of Objects | An array of GFilter objects.
post_filter | Boolean | Control whether or not this filter should be applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True.

Examples

The following filter finds documents where the field title exists and keywords contains hpc:

{
  "type": "and",
  "filters": [
    {
      "type": "exists",
      "field_name": "title"
    },
    {
      "type": "match_any",
      "field_name": "keywords",
      "values": [
        "hpc"
      ]
    }
  ]
}

GFilterOr

An "or" filter for joining any other valid filters. In order for an "or" filter to match on documents, at least one of the filters it contains must match.

  • version 1.0.0
  • version 2017 (legacy)

This is the version of an "or" filter defined for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.

Field Name | Type | Description
type | String | Must have the value "or"
filters | Array of Objects | An array of GFilter objects.

Examples

The following filter finds documents where either the author.institution or the dataset.institution is uchicago.edu. One or both can be a match:

{
  "type": "or",
  "filters": [
    {
      "type": "match_any",
      "field_name": "author.institution",
      "values": [
        "uchicago.edu"
      ]
    },
    {
      "type": "match_any",
      "field_name": "dataset.institution",
      "values": [
        "uchicago.edu"
      ]
    }
  ]
}

This is the version of an "or" filter defined for legacy document versions (2017-09-01).

Field Name | Type | Description
type | String | Must have the value "or"
filters | Array of Objects | An array of GFilter objects.
post_filter | Boolean | Control whether or not this filter should be applied before or after facets are calculated. If True, the filter will not impact facet results, but will filter the query results. Defaults to True.

Examples

The following filter finds documents where either the author.institution or the dataset.institution is uchicago.edu. One or both can be a match:

{
  "type": "or",
  "filters": [
    {
      "type": "match_any",
      "field_name": "author.institution",
      "values": [
        "uchicago.edu"
      ]
    },
    {
      "type": "match_any",
      "field_name": "dataset.institution",
      "values": [
        "uchicago.edu"
      ]
    }
  ]
}
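
Since "and", "or", and "not" filters nest arbitrarily, small builder functions can keep composed filters readable. The helpers below are hypothetical conveniences, not part of any Globus library. They follow the field tables on this page: and/or filters carry a filters array, while not filters carry a single filter object.

```python
def exists(field_name):
    return {"type": "exists", "field_name": field_name}


def match_any(field_name, *values):
    return {"type": "match_any", "field_name": field_name, "values": list(values)}


def and_(*filters):
    return {"type": "and", "filters": list(filters)}


def or_(*filters):
    return {"type": "or", "filters": list(filters)}


def not_(gfilter):
    return {"type": "not", "filter": gfilter}


# title exists, and the institution is uchicago.edu on either field,
# and the field foo does not exist.
combined = and_(
    exists("title"),
    or_(
        match_any("author.institution", "uchicago.edu"),
        match_any("dataset.institution", "uchicago.edu"),
    ),
    not_(exists("foo")),
)
```

The resulting dictionary can be placed directly in the filters array of a request document.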

q_settings

  • version 1.0.0

This is the definition of q_settings documents for query#1.0.0, delete_by_query#1.0.0, and scroll#1.0.0.

Field Name | Type | Description
mode | String | query_string: Evaluate q using the normal query mode. advanced_query_string: Evaluate q using the advanced query mode.
default_operator | String | or: Return results that match any of the query terms. and: Return results that match all of the query terms.

Examples
{
  "mode": "query_string",
  "default_operator": "or"
}

Response Schemas

DeleteByQueryResponse

Field Name | Type | Description
task_id | UUID | The ID of the submitted task
num_subjects_deleted | Integer | This is a legacy field kept for compatibility. It will always be 0.

Example
{
  "task_id": "450538fb-cf9c-48fc-bd6f-08abc5e86da9",
  "num_subjects_deleted": 0
}
© 2010- The University of Chicago Legal Privacy Accessibility