
Ingest

The Ingest API allows you to perform a bulk create or update operation, (over)writing entries via a single API call.

The API allows you to submit bulk data to be written to an index. The data is queued to be added to the index identified by index_id, and the call returns a Task ID, which can be used with the Get Task API to check the status of the ingest request.

Submitted ingest tasks are guaranteed to execute in the order received, and subsequent tasks to re-ingest data for the same subject will update that subject to use the latest data.

You may also want to read the documentation for Types and Type Detection in order to understand how Globus Search will assign types to your data.
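
The submit-then-check flow described above can be sketched in Python. This is a minimal sketch, not an official client: wait_for_task and the injected get_state callable are hypothetical names, and the terminal state strings SUCCESS and FAILED are assumptions that should be checked against the Get Task documentation.

```python
import time
from typing import Callable


def wait_for_task(
    get_state: Callable[[], str],
    poll_interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the task reaches a terminal state or the polls run out.

    get_state is any zero-argument callable that returns the task's current
    state, for example by calling the Get Task API with the returned Task ID.
    The terminal state names below are assumptions, not taken from this page.
    """
    terminal_states = {"SUCCESS", "FAILED"}  # assumed names
    for _ in range(max_polls):
        state = get_state()
        if state in terminal_states:
            return state
        sleep(poll_interval)
    raise TimeoutError(f"task still not finished after {max_polls} polls")
```

Passing the state lookup in as a callable keeps the loop independent of any particular HTTP library and makes it easy to exercise without network access.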

Method: POST

URL: /v1/index/<index_id>/ingest

Authentication required? Yes

Required Roles: You must have owner, admin, or writer access.

Request Body: a GIngest document

Response Body: an IngestResponse

Authentication & Authorization

Tokens for this call must have one of these scopes:

  • urn:globus:scopes:search.api.globus.org:all

  • urn:globus:scopes:search.api.globus.org:ingest
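
As a rough illustration of the method, URL, and token requirements above, the following sketch builds the request with Python's standard library without sending it. build_ingest_request is a hypothetical helper, and the base URL is taken from the curl examples on this page; the token is assumed to have been obtained elsewhere with one of the scopes listed.

```python
import json
import urllib.request

# Base URL as used in the curl examples on this page.
SEARCH_BASE_URL = "https://search.api.globus.org"


def build_ingest_request(
    index_id: str, token: str, gingest: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Ingest API."""
    return urllib.request.Request(
        url=f"{SEARCH_BASE_URL}/v1/index/{index_id}/ingest",
        data=json.dumps(gingest).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to submit it; the response body is then expected to be an IngestResponse document.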

Examples

Ingest a single entry:

  • in the index 4de0e89e-a395-11e7-bc54-8c705ad34f60

  • with a subject of https://example.com/foo/bar

  • with a null entry_id

  • public visibility

curl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer xxxxx' \
  -XPOST 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/ingest' \
  --data '
{
  "ingest_type": "GMetaEntry",
  "ingest_data": {
    "subject": "https://example.com/foo/bar",
    "visible_to": ["public"],
    "content": {
      "foo/bar": "some val"
    }
  }
}
'

For a single entry, the ingest_type is GMetaEntry, and the ingest_data document is a GMetaEntry.

content is an arbitrary JSON body.

Ingest multiple entries for two subjects:

  • in the index 4de0e89e-a395-11e7-bc54-8c705ad34f60

  • with subject values of https://example.com/foo/bar and https://example.com/foo/bar/baz

  • with entry_id values of null, "alpha", and "beta"

  • public visibility and visibility only to the user globus@globus.org

    • The ID of globus@globus.org is 46bd0f56-e24f-11e5-a510-131bef46955c, so this is the value which will be used

curl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer xxxxx' \
  -XPOST 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/ingest' \
  --data '
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "subject": "https://example.com/foo/bar",
        "visible_to": ["public"],
        "content": {
          "foo/bar": "some val"
        }
      },
      {
        "subject": "https://example.com/foo/bar",
        "id": "alpha",
        "visible_to": [
          "urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"
        ],
        "content": {
          "foo/bar": "some otherval"
        }
      },
      {
        "subject": "https://example.com/foo/bar/baz",
        "id": "alpha",
        "visible_to": [
          "urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"
        ],
        "content": {
          "foo/bar/baz": "some val"
        }
      },
      {
        "subject": "https://example.com/foo/bar/baz",
        "id": "beta",
        "visible_to": ["public"],
        "content": {
          "foo/bar/baz": "some otherval"
        }
      }
    ]
  }
}
'

This time, the ingest_data is of type GMetaList.

GMetaList.gmeta is an array of GMetaEntry documents.

The first entry does not specify an id, so its entry_id is null.

The notation used in visible_to is a Principal URN.
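
To make the pairing of subject and entry_id concrete, the sketch below lists the (subject, entry_id) pair of each entry in a GMetaList. entry_keys is a hypothetical helper, not part of any Globus library; it relies only on the behavior stated above, that an entry without an id has a null entry_id.

```python
from typing import List, Optional, Tuple


def entry_keys(gmeta_list: dict) -> List[Tuple[str, Optional[str]]]:
    """Return the (subject, entry_id) pair for each entry, in order.

    An entry without an "id" field is reported with an entry_id of None,
    mirroring the null entry_id described above.
    """
    return [(entry["subject"], entry.get("id")) for entry in gmeta_list["gmeta"]]
```

Applied to the request above, this yields four distinct pairs: one null-id entry and one "alpha" entry for https://example.com/foo/bar, plus "alpha" and "beta" entries for https://example.com/foo/bar/baz.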

Ingest entries visible to a Globus Group and to all authenticated users:

  • in the index 4de0e89e-a395-11e7-bc54-8c705ad34f60

  • with a subject of https://example.com/foo

  • with entry_id values of "alpha" and "beta"

  • all_authenticated_users visibility and visibility only to the Group with ID 0a4dea26-44cd-11e8-847f-0e6e723ad808

curl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer xxxxx' \
  -XPOST 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/ingest' \
  --data '
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "subject": "https://example.com/foo",
        "id": "alpha",
        "visible_to": [
          "urn:globus:group:id:0a4dea26-44cd-11e8-847f-0e6e723ad808"
        ],
        "content": {
          "foo/bar": "some val"
        }
      },
      {
        "subject": "https://example.com/foo",
        "id": "beta",
        "visible_to": ["all_authenticated_users"],
        "content": {
          "foo/bar/baz": "some otherval"
        }
      }
    ]
  }
}
'

The value in the visible_to field of the first entry above is a Principal URN for a Globus Group.
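
The two Principal URN forms used in the examples so far can be produced with small formatting helpers. This is a sketch: identity_urn and group_urn are hypothetical names, and the URN prefixes are copied from the examples on this page. The uuid.UUID call is there only to reject malformed IDs early.

```python
import uuid


def identity_urn(identity_id: str) -> str:
    """Format a Globus Auth identity ID as a Principal URN."""
    # uuid.UUID raises ValueError if the ID is not a well-formed UUID.
    return f"urn:globus:auth:identity:{uuid.UUID(identity_id)}"


def group_urn(group_id: str) -> str:
    """Format a Globus Group ID as a Principal URN."""
    return f"urn:globus:group:id:{uuid.UUID(group_id)}"
```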

Ingest entries with a geo_point field:

  • in the index 4de0e89e-a395-11e7-bc54-8c705ad34f60

  • with subject values of https://example.com/chicago and https://example.com/vancouver

  • public visibility

  • a field_mapping which specifies that location is a geo_point (a coordinate pair)

curl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer xxxxx' \
  -XPOST 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/ingest' \
  --data '
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "subject": "https://example.com/chicago",
        "visible_to": ["public"],
        "content": {
          "name": "Chicago",
          "location": "41.9, -87.6"
        }
      },
      {
        "subject": "https://example.com/vancouver",
        "visible_to": ["public"],
        "content": {
          "name": "Vancouver",
          "location": {
            "lat": "49.3",
            "lon": "-123.1"
          }
        }
      }
    ]
  },
  "field_mapping": {
    "location": "geo_point"
  }
}
'

Note that the geo_point values given in location can be expressed as comma-separated strings (latitude, longitude) or as objects with the keys lat and lon. Globus Search supports a wide variety of geo-data formats. For more details, see the Geospatial Search Guide.
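
A client may want to check coordinates before submitting them. The sketch below parses only the two forms mentioned above, the comma-separated string and the lat/lon object, into a (latitude, longitude) pair of floats. parse_geo_point is a hypothetical helper and does not cover the other formats the service accepts.

```python
from typing import Tuple, Union


def parse_geo_point(value: Union[str, dict]) -> Tuple[float, float]:
    """Return (latitude, longitude) as floats for the two forms shown above."""
    if isinstance(value, str):
        # "lat, lon"; unpacking fails with ValueError if there are not two parts.
        lat_text, lon_text = value.split(",")
        lat, lon = float(lat_text), float(lon_text)
    elif isinstance(value, dict):
        lat, lon = float(value["lat"]), float(value["lon"])
    else:
        raise TypeError(f"unsupported geo_point form: {type(value).__name__}")
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    return lat, lon
```

Note that western longitudes are negative, so a point in Chicago would be written as "41.9, -87.6".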

Request Schemas

GMetaEntry

A GMetaEntry is a single block of data pertaining to a given subject.

Field Name (Type): Description

  • subject (String): The entity described by this data, typically a URI or other identifier for the document in question.

  • visible_to (Array of Strings): A list of security principals allowed to read the metadata. Each string is either a Principal URN or one of the special strings "public" or "all_authenticated_users".

  • principal_sets (Object): A mapping from strings to lists of principals in Principal URN format. These allow search results to be filtered based on a user’s principals (identities and groups) using the filter_principal_sets query feature.

    The principal_sets of an entry do not impact visibility positively or negatively. They only apply as filters when filter_principal_sets is used.

    principal_sets also appear in search results in the matched_principal_sets field.

  • content (Object): An arbitrary object containing data. This is the actual data which will be indexed and queryable.

  • id (String): Optional. A unique identifier for this metadata entry. This value is used in further API operations which reference this entry, such as updates or deletes. When id is not provided, it defaults to null.
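
The field types above lend themselves to a quick client-side check before submitting an entry. The following is a sketch only: gmeta_entry_problems is a hypothetical helper, and treating subject, visible_to, and content as required while principal_sets is optional is an assumption, since this page explicitly marks only id as optional.

```python
from typing import List


def gmeta_entry_problems(entry: dict) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if not isinstance(entry.get("subject"), str):
        problems.append("subject must be a string")
    visible_to = entry.get("visible_to")
    if not (isinstance(visible_to, list)
            and all(isinstance(p, str) for p in visible_to)):
        problems.append("visible_to must be an array of strings")
    if not isinstance(entry.get("content"), dict):
        problems.append("content must be an object")
    # id is optional; a missing or null id is allowed.
    if entry.get("id") is not None and not isinstance(entry["id"], str):
        problems.append("id must be a string when provided")
    # Assumption: principal_sets may be omitted entirely.
    principal_sets = entry.get("principal_sets", {})
    if not (isinstance(principal_sets, dict)
            and all(isinstance(members, list)
                    and all(isinstance(m, str) for m in members)
                    for members in principal_sets.values())):
        problems.append("principal_sets must map names to arrays of strings")
    return problems
```

The service remains the authority on what is accepted; a check like this only catches obvious shape errors before a task is queued.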

Warning

For older indices, @context is a field within content with special meaning.

Example 1:

{
  "subject": "https://search.api.globus.org/robots.txt",
  "visible_to": ["public"],
  "content": {
    "type": "file"
  }
}

Example 2:

{
  "subject": "https://search.api.globus.org/robots.txt",
  "mimetype": "application/json",
  "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
  "id" : "visible_to_globus@globus.org",
  "content": {
    "type": "file",
    "extension": "txt",
    "name" : "robots.txt"
  }
}

This document is a superset of Example 1, but is only visible to the user globus@globus.org. This demonstrates how multiple entries about the same subject, but with different IDs, can be useful: some data is only visible to certain users or groups, while other data is public.

Example 3:

{
  "subject": "https://search.api.globus.org/robots.txt",
  "mimetype": "application/json",
  "visible_to": ["all_authenticated_users"],
  "principal_sets": {
    "admin": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"]
  },
  "id" : "visible_to_globus@globus.org",
  "content": {
    "type": "file",
    "extension": "txt",
    "name" : "robots.txt"
  }
}

This document is similar to the others, but includes a principal_sets declaration listing globus@globus.org under the admin set. The document is visible to all_authenticated_users, meaning anyone can see it once they log in.

For most users, the admin list is not visible, but for globus@globus.org, the search results will include "matched_principal_sets": ["admin"]. A query by this user with filter_principal_sets=admin would return the document, but a query with filter_principal_sets=monitor (not listed here) would not.

This demonstrates how principal_sets can be used to implement role-based filtering on documents without impacting visibility.
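
The matching behavior described above can be modeled locally. This is a simplified sketch of the described semantics, not the service's implementation: matched_principal_sets and passes_filter are hypothetical function names, and returning the set names in sorted order is a choice made here for determinism.

```python
from typing import Dict, List, Set


def matched_principal_sets(
    principal_sets: Dict[str, List[str]], user_principals: Set[str]
) -> List[str]:
    """Names of the sets that contain at least one of the user's principals."""
    return sorted(
        name
        for name, members in principal_sets.items()
        if user_principals.intersection(members)
    )


def passes_filter(
    principal_sets: Dict[str, List[str]],
    user_principals: Set[str],
    filter_name: str,
) -> bool:
    """Model of a query using filter_principal_sets=<filter_name>."""
    return filter_name in matched_principal_sets(principal_sets, user_principals)
```

With the principal_sets from the document above and the identity URN of globus@globus.org, the matched sets are ["admin"], a filter on admin passes, and a filter on monitor does not. Visibility is decided separately by visible_to and is not modeled here.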

Special Note: Long Fields

All text or string type fields are constrained on their total length when used for faceting or sorting. A record containing more than 10,000 characters in a field will not appear in any facet buckets for that field. Such a record will also appear at the end of any sort operation on that field, even though it may sort lexicographically earlier in the list.
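
Before ingesting, it can be useful to find the fields that would be affected by this limit. The sketch below walks a content object and reports string fields longer than 10,000 characters. long_string_fields is a hypothetical helper; the dotted path notation is borrowed from the field_mapping description, and strings inside arrays are not examined.

```python
from typing import Iterator, Tuple

MAX_FACET_SORT_CHARS = 10_000  # limit stated in the text above


def long_string_fields(content: dict, prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (dotted_field_path, length) for strings longer than the limit."""
    for key, value in content.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, str) and len(value) > MAX_FACET_SORT_CHARS:
            yield path, len(value)
        elif isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path.
            yield from long_string_fields(value, path)
```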

Content Schema is Recorded

The format and schema of content must match any existing documents in your index.

For example, if last_modified is a field in content which is formatted as a date, and parsed as a date (see Type Detection), then last_modified must always be formatted as a date. Passing an integer, String, Object, or other datatype will cause the ingest task to fail.
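
A batch can be screened for obvious type mismatches before it is submitted. The following sketch compares the JSON type of each top-level content field across a batch. json_type_name and type_conflicts are hypothetical helpers; the check cannot tell a date-formatted string from any other string, it does not know what is already in the index, and treating null as its own type is an assumption that may be stricter than the service.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def json_type_name(value) -> str:
    """Map a decoded JSON value to the name of its JSON type."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must be tested before int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    raise TypeError(f"not a JSON value: {type(value).__name__}")


def type_conflicts(contents: Iterable[dict]) -> Dict[str, Set[str]]:
    """Return top-level fields that appear with more than one JSON type."""
    seen = defaultdict(set)
    for content in contents:
        for field, value in content.items():
            seen[field].add(json_type_name(value))
    return {field: types for field, types in seen.items() if len(types) > 1}
```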

GMetaList

A GMetaList is a collection of GMetaEntry documents.

Field Name (Type): Description

  • gmeta (Array): An array of GMetaEntry documents.

{
  "gmeta": [
    {
      "subject": "https://datasearch.demo.globus.org/",
      "mimetype": "application/json",
      "visible_to": ["public"],
      "id" : "valid_doc_1",
      "content": {
        "type": "file",
        "extension": "txt",
        "name": "abc.txt"
      }
    }
  ]
}

GIngest

A GIngest document is a wrapper around a GMetaList or GMetaEntry which supplies attributes relevant to the ingest and indexing of metadata into the Globus Search service.

Field Name (Type): Description

  • ingest_type (String): Must be one of {"GMetaList", "GMetaEntry"}. Describes the type of ingest_data.

  • ingest_data (Object): Must be a document of the type named in ingest_type. This is the data to add to the Search Index.

  • field_mapping (Object): A mapping from field names (dotted) to the types for those fields. Currently only supports geo_point and geo_shape as types.

    The field_mapping will be applied as an update to the index prior to any data being added to the index. If the field_mapping cannot be applied, the task will fail with a message about the mapping.

    For details about the geo_point and geo_shape types, see the Geospatial Search Guide.

{
  "ingest_type": "GMetaEntry",
  "ingest_data": {
    "subject": "https://search.api.globus.org/",
    "mimetype": "application/json",
    "visible_to": ["public"],
    "id": "stephen_test_doc_2016_11_13",
    "content": {
      "type": "file",
      "extension": "txt",
      "name" : "stephen's test document with spaces.txt"
    }
  }
}

{
  "ingest_type": "GMetaEntry",
  "ingest_data": {
    "subject": "https://search.api.globus.org/",
    "mimetype": "application/json",
    "visible_to": ["public"],
    "id": "test_doc_2017_06_14",
    "content": {
      "type": "file",
      "extension": "txt",
      "name" : "another_document_without_spaces.txt"
    }
  }
}
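
The GIngest constraints above can be enforced when the wrapper is assembled. This is a minimal sketch under the rules stated on this page; make_gingest is a hypothetical helper, and the service may apply further checks that are not modeled here.

```python
from typing import Dict, Optional

INGEST_TYPES = {"GMetaList", "GMetaEntry"}
FIELD_MAPPING_TYPES = {"geo_point", "geo_shape"}


def make_gingest(
    ingest_type: str,
    ingest_data: dict,
    field_mapping: Optional[Dict[str, str]] = None,
) -> dict:
    """Wrap ingest_data in a GIngest document, checking the documented rules."""
    if ingest_type not in INGEST_TYPES:
        raise ValueError(f"ingest_type must be one of {sorted(INGEST_TYPES)}")
    if ingest_type == "GMetaList" and not isinstance(ingest_data.get("gmeta"), list):
        raise ValueError("a GMetaList needs a gmeta array")
    document = {"ingest_type": ingest_type, "ingest_data": ingest_data}
    if field_mapping:
        unsupported = {
            field: kind
            for field, kind in field_mapping.items()
            if kind not in FIELD_MAPPING_TYPES
        }
        if unsupported:
            raise ValueError(f"unsupported field_mapping types: {unsupported}")
        document["field_mapping"] = dict(field_mapping)
    return document
```

The field_mapping key is omitted when no mapping is given, matching the examples above, which include it only for the geo_point case.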

Response Schemas

IngestResponse

Field Name (Type): Description

  • task_id (UUID): The ID of the submitted Task.

  • as_identity (String): The Principal URN of the caller’s primary ID.

  • success (Boolean): Deprecated; kept for backwards compatibility. Always true.

  • num_documents_ingested (Integer): Deprecated; kept for backwards compatibility. Always 0.
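
A client typically needs only task_id and as_identity from this response. The sketch below parses a response body into a small typed record and ignores the two deprecated fields. The Python class IngestResponse and the function parse_ingest_response are hypothetical names chosen to mirror the schema.

```python
import json
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestResponse:
    task_id: uuid.UUID
    as_identity: str


def parse_ingest_response(body: str) -> IngestResponse:
    """Parse an IngestResponse JSON body, keeping only the live fields."""
    payload = json.loads(body)
    # success and num_documents_ingested are deprecated, so they are ignored.
    return IngestResponse(
        task_id=uuid.UUID(payload["task_id"]),
        as_identity=payload["as_identity"],
    )
```

The resulting task_id is the value to pass to the Get Task API when checking on the ingest.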

© 2010- The University of Chicago