Ingest

The Ingest API performs a bulk create-or-update operation, (over)writing entries in an index with a single API call.

Submitted data is queued to be written to the index identified by index_id. The call returns a Task ID, which you can pass to the Get Task API to check the status of the ingest request.

You may also want to read the documentation for Types and Type Detection to understand how Globus Search assigns types to your data.

Method: POST

URL: /v1/index/<index_id>/ingest

Authentication required? Yes

Required Roles: You must have either admin or writer access.

Request Body: a GIngest document

Response Body: an IngestResponse

Authentication & Authorization

Tokens for this call must have one of these scopes.

urn:globus:scopes:search.api.globus.org:all
urn:globus:scopes:search.api.globus.org:ingest
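For callers working in Python rather than curl, the request described above can be prepared with the standard library alone. This is a minimal sketch, not the Globus Python SDK: the helper name build_ingest_request is hypothetical, the request is built but not sent, and the token is assumed to already carry one of the scopes listed above.

```python
import json
import urllib.request

# Base URL taken from the curl examples on this page.
SEARCH_API = "https://search.api.globus.org"


def build_ingest_request(index_id: str, token: str, gingest: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST /v1/index/<index_id>/ingest request.

    `gingest` is the GIngest request body; `token` is assumed to be an
    access token with one of the search.api.globus.org scopes.
    """
    url = f"{SEARCH_API}/v1/index/{index_id}/ingest"
    body = json.dumps(gingest).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(req) as resp:
#       ingest_response = json.load(resp)
```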

Examples

Ingesting a single entry

  • in the index 4de0e89e-a395-11e7-bc54-8c705ad34f60

  • with a subject of https://example.com/foo/bar

  • with a null entry_id

  • public visibility

curl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer xxxxx' \
  -XPOST 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/ingest' \
  --data '
{
  "ingest_type": "GMetaEntry",
  "ingest_data": {
    "subject": "https://example.com/foo/bar",
    "visible_to": ["public"],
    "content": {
      "foo/bar": "some val"
    }
  }
}
'

For a single entry, ingest_type is GMetaEntry and ingest_data is a GMetaEntry document.

content is an arbitrary JSON object.
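The single-entry body above can also be assembled programmatically. In this sketch the helper make_entry is hypothetical; it omits the "id" key when no entry_id is given, which matches how the example produces a null entry_id.

```python
import json


def make_entry(subject, content, visible_to, entry_id=None):
    """Build a GMetaEntry dict.

    Leaving entry_id as None omits the "id" key, so the entry
    gets a null entry_id, as in the curl example.
    """
    entry = {"subject": subject, "visible_to": list(visible_to), "content": content}
    if entry_id is not None:
        entry["id"] = entry_id
    return entry


# Reproduces the request body of the single-entry curl example.
body = {
    "ingest_type": "GMetaEntry",
    "ingest_data": make_entry(
        "https://example.com/foo/bar", {"foo/bar": "some val"}, ["public"]
    ),
}
print(json.dumps(body, indent=2))
```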

Ingesting a list of entries (1)

  • in the index 4de0e89e-a395-11e7-bc54-8c705ad34f60

  • with subject values of https://example.com/foo/bar and https://example.com/foo/bar/baz

  • with entry_id values of null, "alpha", and "beta"

  • public visibility and visibility only to the user globus@globus.org

    • The identity ID of globus@globus.org is 46bd0f56-e24f-11e5-a510-131bef46955c, so this is the value used in visible_to

curl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer xxxxx' \
  -XPOST 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/ingest' \
  --data '
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "subject": "https://example.com/foo/bar",
        "visible_to": ["public"],
        "content": {
          "foo/bar": "some val"
        }
      },
      {
        "subject": "https://example.com/foo/bar",
        "id": "alpha",
        "visible_to": [
          "urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"
        ],
        "content": {
          "foo/bar": "some otherval"
        }
      },
      {
        "subject": "https://example.com/foo/bar/baz",
        "id": "alpha",
        "visible_to": [
          "urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"
        ],
        "content": {
          "foo/bar/baz": "some val"
        }
      },
      {
        "subject": "https://example.com/foo/bar/baz",
        "id": "beta",
        "visible_to": ["public"],
        "content": {
          "foo/bar/baz": "some otherval"
        }
      }
    ]
  }
}
'

This time, the ingest_data is of type GMetaList.

GMetaList.gmeta is an array of GMetaEntry documents.

The first entry does not specify an id, so its entry_id is null.

The values in visible_to other than "public" use Principal URN notation.
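A sketch that builds the same GMetaList body from plain tuples, using a hypothetical helper named make_gmeta_list. The example reuses "alpha" under two different subjects, which suggests an entry is identified by its subject together with its entry_id; the final lines list those pairs under that assumption.

```python
def make_gmeta_list(rows):
    """Build a GIngest body of type GMetaList.

    rows: iterable of (subject, entry_id, visible_to, content) tuples.
    An entry_id of None omits "id", giving that entry a null entry_id.
    """
    gmeta = []
    for subject, entry_id, visible_to, content in rows:
        entry = {"subject": subject, "visible_to": list(visible_to), "content": content}
        if entry_id is not None:
            entry["id"] = entry_id
        gmeta.append(entry)
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": gmeta}}


USER = "urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"
body = make_gmeta_list([
    ("https://example.com/foo/bar", None, ["public"], {"foo/bar": "some val"}),
    ("https://example.com/foo/bar", "alpha", [USER], {"foo/bar": "some otherval"}),
    ("https://example.com/foo/bar/baz", "alpha", [USER], {"foo/bar/baz": "some val"}),
    ("https://example.com/foo/bar/baz", "beta", ["public"], {"foo/bar/baz": "some otherval"}),
])

# (subject, entry_id) pairs; the same id may recur under different subjects.
pairs = [(e["subject"], e.get("id")) for e in body["ingest_data"]["gmeta"]]
```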

Ingesting a list of entries (2)

  • in the index 4de0e89e-a395-11e7-bc54-8c705ad34f60

  • with a subject of https://example.com/foo

  • with entry_id values of "alpha" and "beta"

  • public visibility and visibility only to the Group with ID 0a4dea26-44cd-11e8-847f-0e6e723ad808

curl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer xxxxx' \
  -XPOST 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/ingest' \
  --data '
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "subject": "https://example.com/foo",
        "id": "alpha",
        "visible_to": [
          "urn:globus:group:id:0a4dea26-44cd-11e8-847f-0e6e723ad808"
        ],
        "content": {
          "foo/bar": "some val"
        }
      },
      {
        "subject": "https://example.com/foo",
        "id": "beta",
        "visible_to": ["public"],
        "content": {
          "foo/bar/baz": "some otherval"
        }
      }
    ]
  }
}
'

The value in the visible_to field of the first entry above is a Principal URN for a Globus Group.
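Both URN forms used in these examples can be produced from a bare ID. In this sketch the helper names are hypothetical, and the only format knowledge assumed is the two prefixes shown above; passing the ID through uuid.UUID rejects malformed input and normalizes it to the lowercase, hyphenated form.

```python
import uuid


def identity_urn(identity_id: str) -> str:
    """Principal URN for a Globus Auth identity, in the form used above."""
    # uuid.UUID raises ValueError for malformed IDs.
    return f"urn:globus:auth:identity:{uuid.UUID(identity_id)}"


def group_urn(group_id: str) -> str:
    """Principal URN for a Globus Group, in the form used above."""
    return f"urn:globus:group:id:{uuid.UUID(group_id)}"
```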

Request Schemas

GMetaEntry

A GMetaEntry is a single block of data pertaining to a given subject.

Fields:

  • subject (String): The entity described by this metadata, typically a URI.

  • visible_to (Array of Strings): The list of security principals allowed to read the metadata. Each string must be either a Principal URN or the special string "public".

  • content (Object): A GMetaContent. This is the actual metadata to assert about subject.

  • id (String): Optional. A unique identifier for this metadata entry, used by later API operations that reference the entry, such as updates or deletes. When id is not provided, it defaults to null.
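Before submitting, a client may want to catch obvious shape errors. The following sketch checks an entry against the field table above only; the function name is hypothetical, and the service's own validation may be stricter or differ.

```python
def check_gmeta_entry(entry: dict) -> list:
    """Return a list of problems found in a GMetaEntry dict (empty if none)."""
    problems = []
    if not isinstance(entry.get("subject"), str):
        problems.append("subject must be a string")

    visible_to = entry.get("visible_to")
    if not isinstance(visible_to, list) or not all(isinstance(p, str) for p in visible_to):
        problems.append("visible_to must be an array of strings")
    else:
        for principal in visible_to:
            # Each principal is "public" or a Principal URN.
            if principal != "public" and not principal.startswith("urn:"):
                problems.append(f"visible_to entry {principal!r} is neither 'public' nor a URN")

    if not isinstance(entry.get("content"), dict):
        problems.append("content must be an object")

    # Treating an explicit null id like an omitted one is an assumption.
    if entry.get("id") is not None and not isinstance(entry["id"], str):
        problems.append("id, when present, must be a string")
    return problems
```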

Example 1

{
  "subject": "https://search.api.globus.org/abc.txt",
  "visible_to": ["public"],
  "content": {
    "http://transfer.api.globus.org/metadata-schema/file#type": "file"
  }
}

Example 2

{
  "subject": "https://search.api.globus.org/abc.txt",
  "mimetype": "application/json",
  "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
  "id" : "visible_to_globus@globus.org",
  "content": {
    "http://transfer.api.globus.org/metadata-schema/file#type": "file",
    "http://transfer.api.globus.org/metadata-schema/file#extension": "txt",
    "http://transfer.api.globus.org/metadata-schema/file#name" : "abc.txt"
  }
}

This document is a superset of Example 1, but is only visible to the user globus@globus.org. This demonstrates how multiple entries about the same subject, but with different IDs, can be useful: some data is only visible to certain users or groups, while other data is public.

GMetaContent

GMetaContent is arbitrary structured data provided by data sources for Globus Search. It has only one special field, @context.

Fields:

  • @context (Object): A set of shorthands which will be expanded in all other fields of the document.

The @context field defines shorthands for values that are interpolated into the document's keys. Example 1 below shows how the expansion works.

Special Note: Long Fields

All text or string fields are subject to a length limit when used for faceting or sorting. A record whose value in a field exceeds 10,000 characters will not appear in any facet buckets for that field. Such a record will also appear at the end of any sort on that field, even if it would sort earlier lexically.
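To find fields that will be excluded from facets and pushed to the end of sorts, content can be scanned for long strings before ingest. This sketch assumes the limit applies to each individual string value and reports fields as dot-separated paths; both the path format and the helper name are illustrative, not part of the API.

```python
FACET_SORT_LIMIT = 10_000  # characters, per the note above


def long_string_fields(content, prefix=""):
    """Yield dot-separated paths of string values longer than FACET_SORT_LIMIT."""
    if isinstance(content, dict):
        for key, value in content.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from long_string_fields(value, path)
    elif isinstance(content, list):
        for item in content:
            # Array elements are reported under their parent's field path.
            yield from long_string_fields(item, prefix)
    elif isinstance(content, str) and len(content) > FACET_SORT_LIMIT:
        yield prefix
```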

Example 1

{
  "@context": {
    "f": "file_meta"
  },
  "f:type": "file",
  "f:extension": "txt",
  "f:name" : "abc.txt"
}

which is equivalent to and will be expanded as:

{
  "file_meta#type": "file",
  "file_meta#extension": "txt",
  "file_meta#name": "abc.txt"
}
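The expansion in Example 1 can be reproduced client-side, for instance to preview the keys that will be indexed. This sketch infers its rules from that single example: a key of the form prefix:name whose prefix appears in @context becomes value#name, only top-level keys are handled, and @context itself is dropped from the result. The function name is hypothetical.

```python
def expand_context(doc: dict) -> dict:
    """Expand "prefix:name" keys using the document's @context (top level only)."""
    context = doc.get("@context", {})
    expanded = {}
    for key, value in doc.items():
        if key == "@context":
            continue  # the expanded form in Example 1 omits @context
        prefix, sep, rest = key.partition(":")
        # Keys like "http://..." pass through unless "http" is a declared prefix.
        if sep and prefix in context:
            key = f"{context[prefix]}#{rest}"
        expanded[key] = value
    return expanded
```

With "f" mapped to "http://transfer.api.globus.org/metadata-schema/file", the same rule would yield keys like those in the GMetaEntry examples, such as "http://transfer.api.globus.org/metadata-schema/file#type".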

GMetaList

A GMetaList is a collection of GMetaEntry documents.

Fields:

  • gmeta (Array): An array of GMetaEntry documents.

Example 1

{
  "gmeta": [
    {
      "subject": "https://datasearch.demo.globus.org/",
      "mimetype": "application/json",
      "visible_to": ["public"],
      "id" : "valid_doc_1",
      "content": {
          "type": "file",
          "extension": "txt",
          "name" : "abc.txt"
      }
    }
  ]
}

GIngest

A GIngest document is a wrapper around a GMetaList or GMetaEntry which supplies attributes relevant to the ingest and indexing of metadata into the Globus Search service.

Fields:

  • ingest_type (String): Must be either "GMetaList" or "GMetaEntry". Describes the type of ingest_data.

  • ingest_data (Object): Must be a document of the type named in ingest_type. This is the data to add to the search index.
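Because ingest_type must match the shape of ingest_data, a small wrapper can choose it automatically. This is a sketch with a hypothetical function name: a list is treated as the gmeta array of a GMetaList, a dict with a gmeta key as an existing GMetaList, and any other dict as a single GMetaEntry.

```python
def wrap_gingest(data) -> dict:
    """Wrap GMetaEntry/GMetaList data in a GIngest document."""
    if isinstance(data, list):
        # A bare list of GMetaEntry dicts becomes GMetaList.gmeta.
        return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": data}}
    if isinstance(data, dict) and "gmeta" in data:
        # Already shaped like a GMetaList.
        return {"ingest_type": "GMetaList", "ingest_data": data}
    if isinstance(data, dict):
        return {"ingest_type": "GMetaEntry", "ingest_data": data}
    raise TypeError("expected a GMetaEntry dict, a GMetaList dict, or a list of entries")
```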

Example 1

{
  "ingest_type": "GMetaEntry",
  "ingest_data": {
    "subject": "https://search.api.globus.org/",
    "mimetype": "application/json",
    "visible_to": ["public"],
    "id": "stephen_test_doc_2016_11_13",
    "content": {
      "type": "file",
      "extension": "txt",
      "name" : "stephen's test document with spaces.txt"
    }
  }
}

Example 2

{
  "ingest_type": "GMetaEntry",
  "ingest_data": {
    "subject": "https://search.api.globus.org/",
    "mimetype": "application/json",
    "visible_to": ["public"],
    "id": "test_doc_2017_06_14",
    "content": {
      "type": "file",
      "extension": "txt",
      "name" : "another_document_without_spaces.txt"
    }
  }
}

Response Schemas

IngestResponse

Fields:

  • task_id (UUID): The ID of the submitted Task.

  • as_identity (String): The principal URN of the caller’s primary ID.

  • success (Boolean): Deprecated; kept for backwards compatibility. Always true.

  • num_documents_ingested (Integer): Deprecated; kept for backwards compatibility. Always 0.
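Since success and num_documents_ingested are fixed values, the only actionable field in an IngestResponse is task_id, which is passed to the Get Task API to learn the ingest's outcome. A minimal sketch for extracting it from a response body follows; the function name is hypothetical.

```python
import json
import uuid


def task_id_from_ingest_response(body: str) -> uuid.UUID:
    """Extract task_id from an IngestResponse JSON body.

    success and num_documents_ingested are deprecated constants, so they
    say nothing about whether the ingest worked; check the task instead.
    """
    resp = json.loads(body)
    # uuid.UUID raises ValueError if task_id is not a valid UUID.
    return uuid.UUID(resp["task_id"])
```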

© 2010- The University of Chicago