
Ingest

Table of Contents
  • 1. Overview
    • 1.1. API Methods
  • 2. Ingest Document Walkthrough
    • 2.1. GMetaEntry, Subjects, and Entries
    • 2.2. Complete Example Document
  • 3. Monitoring an Ingest Task

1. Overview

The Ingest API is used to send metadata into Globus Search. It is the primary way in which you add data to an index.

You send documents by POSTing them to the Ingest API, which returns a Task ID. The Task ID lets you check on the status of your request to add data to an index. Globus Search will automatically retry certain failures and will guarantee the ordered delivery of ingest requests.

You can then check the status of that Task using the Get Task API. Once your task is complete, the data will be available to search queries.

1.1. API Methods

  • Ingest API: Submit an Ingest Task
  • Get Task: Get the status of a Task

2. Ingest Document Walkthrough

The Ingest API accepts GIngest documents.

You can read the full GIngest specification below for a more rigorous definition and some examples, but we will first cover the two forms this document can take and include several examples.

Every GIngest document is either a single GMetaEntry document:

{
  "ingest_type": "GMetaEntry",
  "ingest_data": { ... }
}

or a single GMetaList document:

{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        ...
      }
    ]
  }
}

A GMetaList has the field gmeta, containing an array of GMetaEntry documents. There’s no constraint on the documents themselves, and they do not have to be related, so from here on we’ll focus on the GMetaEntry.
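
The two envelope forms can be produced by one small helper. This is only a sketch: the function name wrap_gingest is hypothetical and not part of any Globus library, and the choice to emit a GMetaEntry envelope for a single entry is an illustrative convention rather than a requirement stated here.

```python
import json


def wrap_gingest(entries):
    """Wrap GMetaEntry dicts in a GIngest envelope (hypothetical helper).

    One entry becomes a "GMetaEntry" document; several become a "GMetaList"
    whose "gmeta" field holds the array of entries.
    """
    if not entries:
        raise ValueError("at least one GMetaEntry is required")
    if len(entries) == 1:
        return {"ingest_type": "GMetaEntry", "ingest_data": entries[0]}
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": list(entries)}}


abc = {
    "subject": "https://search.api.globus.org/abc.txt",
    "visible_to": ["public"],
    "content": {"type": "file"},
}
# The entries in a GMetaList do not have to be related to one another.
other = dict(abc, subject="https://search.api.globus.org/def.txt")

single = wrap_gingest([abc])
many = wrap_gingest([abc, other])
print(json.dumps(many, indent=2))
```

Either result can be serialized with json.dumps and used as the body of an ingest request.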

2.1. GMetaEntry, Subjects, and Entries

If you haven’t read the Overview of Globus Search, you should really stop and read it now. We’re going to talk about Subjects and Entries and you’ll need to know what these are to read and write sensible GMetaEntry documents.

Let’s start with a really simple GMetaEntry document:

{
  "subject": "https://search.api.globus.org/abc.txt",
  "visible_to": ["public"],
  "content": {
    "name": "abc.txt",
    "extension": "txt",
    "type": "file"
  }
}

This describes the Subject https://search.api.globus.org/abc.txt, a "Search result", with public visibility and three searchable fields: name, extension, and type.

The subject is any string you wish to use as a search result (we just made this one up), and the content is an almost arbitrary JSON blob describing it.

As mentioned in the Overview, there is no Entry ID for this data, so the Entry ID is null. We’ll cover the few restrictions on content in a moment, but first, let’s look at that special field: visible_to.

visible_to is a list of security principals allowed to read the metadata. That is to say people, or descriptors for groups of people, who can see this result when they query Globus Search. Each string must be a Principal URN, or one of the special strings "public" or "all_authenticated_users".

The meanings of these strings are covered in the overview of visibility values.
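
A client-side check of visible_to before submitting could look like the sketch below. It rests on assumptions: the identity URN shape is taken from the example on this page, the group URN shape (urn:globus:groups:id:<uuid>) is assumed rather than quoted from here, and invalid_principals is a hypothetical helper. The service remains the authority on which values are accepted.

```python
import re

# The two special strings named in the text above.
SPECIAL_VALUES = {"public", "all_authenticated_users"}

# Assumed Principal URN shapes: an identity URN (as used on this page) or a
# group URN (an assumption), each ending in a UUID.
_UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
PRINCIPAL_URN = re.compile(
    rf"^urn:globus:(?:auth:identity|groups:id):{_UUID}$", re.IGNORECASE
)


def invalid_principals(visible_to):
    """Return the visible_to values that match none of the accepted forms."""
    return [
        value
        for value in visible_to
        if value not in SPECIAL_VALUES and not PRINCIPAL_URN.match(value)
    ]


ok = invalid_principals(
    ["public", "urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"]
)
bad = invalid_principals(["everyone", "all_authenticated_users"])
print(ok, bad)
```

An empty result means every value has a recognized form; it does not prove that the identity or group exists.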

2.1.1. GMetaEntry.content

content is the main body of a GMetaEntry, and it is this data which will be indexed and queryable in Search.

2.1.2. GMetaEntry.id

Our original example entry document did not specify an id and therefore used a null ID.

The ID is used to distinguish between multiple Entries for a single Subject.

It is also needed for entry operations such as the Get Entry API, which provides read access to individual entries.

id is an arbitrary string field in a GMetaEntry document. For example, here’s a GMetaEntry with an explicit id:

{
  "id": "filetype",
  "subject": "https://search.api.globus.org/abc.txt",
  "visible_to": ["public"],
  "content": {
    "type": "file"
  }
}

2.2. Complete Example Document

Now that each piece has been introduced, let’s combine them into a single GIngest document with multiple subjects and multiple entries.

{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "filetype",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["public"],
        "content": {
          "type": "file"
        }
      },
      {
        "id": "size",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
        "content": {
          "size": "1000000",
          "size_human": "1MB"
        }
      },
      {
        "subject": "https://search.api.globus.org/def.txt",
        "visible_to": ["public"],
        "content": {
          "type": "file",
          "size": "1000000",
          "size_human": "1MB"
        }
      }
    ]
  }
}

Two of the entries have explicit id fields and one is using the implicit null id.

One of them sets visible_to to only let data be viewed by a single specific identity, while the other two are public.

Two documents describe https://search.api.globus.org/abc.txt and one describes https://search.api.globus.org/def.txt.
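
The three observations above can be checked mechanically. The sketch below rebuilds the gmeta array from the example and groups entry ids by subject; ids_by_subject is a hypothetical helper, and a missing id is represented as None to stand for the null id.

```python
from collections import defaultdict


def ids_by_subject(gmeta):
    """Map each subject to the ids of its entries (None means the null id)."""
    grouped = defaultdict(list)
    for entry in gmeta:
        grouped[entry["subject"]].append(entry.get("id"))
    return dict(grouped)


ABC = "https://search.api.globus.org/abc.txt"
DEF = "https://search.api.globus.org/def.txt"
IDENTITY = "urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"

# The same three entries as in the complete example document.
gmeta = [
    {"id": "filetype", "subject": ABC, "visible_to": ["public"],
     "content": {"type": "file"}},
    {"id": "size", "subject": ABC, "visible_to": [IDENTITY],
     "content": {"size": "1000000", "size_human": "1MB"}},
    {"subject": DEF, "visible_to": ["public"],
     "content": {"type": "file", "size": "1000000", "size_human": "1MB"}},
]

summary = ids_by_subject(gmeta)
restricted = [entry for entry in gmeta if entry["visible_to"] != ["public"]]
print(summary)
print(len(restricted), "entry is not public")
```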

To submit this to Search, use the Ingest API.

For example, to submit the above document to index 4de0e89e-a395-11e7-bc54-8c705ad34f60 using a token xxxxx, you could run the following command:

curl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer xxxxx' \
  -XPOST 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/ingest' \
  --data '
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "filetype",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["public"],
        "content": {
          "type": "file"
        }
      },
      {
        "id": "size",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
        "content": {
          "size": "1000000",
          "size_human": "1MB"
        }
      },
      {
        "subject": "https://search.api.globus.org/def.txt",
        "visible_to": ["public"],
        "content": {
          "type": "file",
          "size": "1000000",
          "size_human": "1MB"
        }
      }
    ]
  }
}
'
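
The same request can be prepared from Python using only the standard library. This is a sketch rather than an official client: build_ingest_request is a hypothetical helper, the index ID and token are the placeholders from the curl example, and the request is only constructed here, not sent.

```python
import json
import urllib.request

SEARCH_BASE = "https://search.api.globus.org/v1"


def build_ingest_request(index_id, token, document):
    """Build (but do not send) a POST request carrying a GIngest document."""
    return urllib.request.Request(
        url=f"{SEARCH_BASE}/index/{index_id}/ingest",
        data=json.dumps(document).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


document = {
    "ingest_type": "GMetaEntry",
    "ingest_data": {
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["public"],
        "content": {"type": "file"},
    },
}
request = build_ingest_request(
    "4de0e89e-a395-11e7-bc54-8c705ad34f60", "xxxxx", document
)
print(request.get_method(), request.full_url)

# Sending requires a real token and index, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     task_id = json.load(response)["task_id"]
```

Keeping construction separate from sending makes the URL, headers, and body easy to inspect before any network call is made.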

3. Monitoring an Ingest Task

When you submit an Ingest task, the response will include a task_id.

You can then poll the status of the task to wait for it to succeed or fail. For example:

Example: Get task with task_id="05c1ec1b-2400-44e2-9797-922c29199042"
curl \
  -H 'Authorization: Bearer xxxxx' \
  -XGET 'https://search.api.globus.org/v1/task/05c1ec1b-2400-44e2-9797-922c29199042'

This may output:

{
  "state_description": "Task succeeded",
  "task_id": "05c1ec1b-2400-44e2-9797-922c29199042",
  "state": "SUCCESS",
  "creation_date": "2018-12-13T18:08:42.746911",
  "completion_date": "2018-12-13T18:08:44.539611",
  "additional_details": null,
  "message": null,
  "index_id": "696af25c-8c24-469a-b5e0-67d3e4b71df7"
}

When state is SUCCESS or FAILED, the task is complete.
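
A polling loop built on that rule might look like the following sketch. The names wait_for_task and fake_fetch, the polling interval, and the poll limit are all illustrative assumptions. The fetch function is injected so the loop can be exercised without network access, and the "PENDING" value used by the fake is just a stand-in for any non-terminal state.

```python
import time

# Per the text above, these two states mean the task is complete.
TERMINAL_STATES = {"SUCCESS", "FAILED"}


def wait_for_task(fetch_task, task_id, interval=1.0, max_polls=60, sleep=time.sleep):
    """Call fetch_task(task_id) until the task reaches a terminal state.

    fetch_task is any callable returning the task document as a dict,
    for example a wrapper around a GET to /v1/task/<task_id>.
    """
    for attempt in range(max_polls):
        task = fetch_task(task_id)
        if task["state"] in TERMINAL_STATES:
            return task
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"task {task_id} not complete after {max_polls} polls")


# Fake fetcher for demonstration: two non-terminal responses, then success.
_states = iter(["PENDING", "PENDING", "SUCCESS"])
calls = []


def fake_fetch(task_id):
    calls.append(task_id)
    return {"task_id": task_id, "state": next(_states)}


result = wait_for_task(
    fake_fetch, "05c1ec1b-2400-44e2-9797-922c29199042", sleep=lambda seconds: None
)
print(result["state"], "after", len(calls), "polls")
```

A caller would still need to inspect the returned state, since a FAILED task also ends the loop.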
