Ingest
The Ingest API allows you to perform a bulk create or update operation, (over)writing entries via a single API call.
The submitted data is queued to be written to the index identified by index_id. The call returns a Task ID, which can be used with the Get Task API to check the status of the ingest request.
You may also want to read the documentation for Types and Type Detection in order to understand how Globus Search will assign types to your data.
Method | POST |
URL | /v1/index/<index_id>/ingest |
Authentication required? | Yes |
Required Roles | You must have either |
Request Body | a GIngest document |
Response Body |
Authentication & Authorization
Tokens for this call must have one of these scopes:
- urn:globus:scopes:search.api.globus.org:all
- urn:globus:scopes:search.api.globus.org:ingest
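As a minimal sketch, the request described above might be assembled in Python with only the standard library. The function name build_ingest_request and the token value are illustrative assumptions; the method, URL pattern, and headers follow this page. The request is built but not sent.

```python
import json
import urllib.request

# Host taken from the curl examples on this page.
SEARCH_BASE_URL = "https://search.api.globus.org"


def build_ingest_request(index_id, ingest_doc, token):
    """Build (but do not send) a POST request for the Ingest API.

    `token` is assumed to be an access token carrying one of the
    scopes listed above; obtaining it is outside this sketch.
    """
    url = f"{SEARCH_BASE_URL}/v1/index/{index_id}/ingest"
    body = json.dumps(ingest_doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would submit it; keeping construction separate from sending makes the request easy to inspect first.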
Examples
Create or update a single entry:
- in the index 4de0e89e-a395-11e7-bc54-8c705ad34f60
- with a subject of https://example.com/foo/bar
- with a null entry_id
- public visibility
curl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer xxxxx' \
  -XPOST 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/ingest' \
  --data '
{
  "ingest_type": "GMetaEntry",
  "ingest_data": {
    "subject": "https://example.com/foo/bar",
    "visible_to": ["public"],
    "content": {
      "foo/bar": "some val"
    }
  }
}
'
For a single entry, the datatype of the ingest_data document, named in ingest_type, is GMetaEntry.
content is an arbitrary JSON body.
Create or update multiple entries:
- in the index 4de0e89e-a395-11e7-bc54-8c705ad34f60
- with subject values of https://example.com/foo/bar and https://example.com/foo/bar/baz
- with entry_id values of null, "alpha", and "beta"
- public visibility and visibility only to the user globus@globus.org
  - The ID of globus@globus.org is 46bd0f56-e24f-11e5-a510-131bef46955c, so this is the value which will be used
curl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer xxxxx' \
  -XPOST 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/ingest' \
  --data '
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "subject": "https://example.com/foo/bar",
        "visible_to": ["public"],
        "content": {
          "foo/bar": "some val"
        }
      },
      {
        "subject": "https://example.com/foo/bar",
        "id": "alpha",
        "visible_to": [
          "urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"
        ],
        "content": {
          "foo/bar": "some otherval"
        }
      },
      {
        "subject": "https://example.com/foo/bar/baz",
        "id": "alpha",
        "visible_to": [
          "urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"
        ],
        "content": {
          "foo/bar/baz": "some val"
        }
      },
      {
        "subject": "https://example.com/foo/bar/baz",
        "id": "beta",
        "visible_to": ["public"],
        "content": {
          "foo/bar/baz": "some otherval"
        }
      }
    ]
  }
}
'
This time, the ingest_data is of type GMetaList.
GMetaList.gmeta is an array of GMetaEntry documents.
The first entry does not specify an id, so its entry_id is null.
The notation in visible_to is a Principal URN.
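The relationship between id and entry_id described above can be sketched in Python as follows. The helper names make_entry, make_gmeta_list_ingest, and entry_keys are hypothetical; only the document shapes come from the example request.

```python
def make_entry(subject, content, visible_to, entry_id=None):
    """Return a GMetaEntry dict; `id` is omitted when entry_id is None."""
    entry = {
        "subject": subject,
        "visible_to": list(visible_to),
        "content": content,
    }
    if entry_id is not None:
        entry["id"] = entry_id
    return entry


def make_gmeta_list_ingest(entries):
    """Wrap GMetaEntry dicts in a GIngest document of type GMetaList."""
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": list(entries)}}


def entry_keys(ingest_doc):
    """List the (subject, entry_id) pair of each entry in a GMetaList ingest.

    An entry without an `id` is reported with an entry_id of None (null).
    """
    return [(e["subject"], e.get("id")) for e in ingest_doc["ingest_data"]["gmeta"]]
```

Listing the pairs before submitting is a quick way to confirm which entries a request will create or overwrite.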
Create or update multiple entries for a single subject:
- in the index 4de0e89e-a395-11e7-bc54-8c705ad34f60
- with a subject of https://example.com/foo
- with entry_id values of "alpha" and "beta"
- public visibility and visibility only to the Group with ID 0a4dea26-44cd-11e8-847f-0e6e723ad808
curl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer xxxxx' \
  -XPOST 'https://search.api.globus.org/v1/index/4de0e89e-a395-11e7-bc54-8c705ad34f60/ingest' \
  --data '
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "subject": "https://example.com/foo",
        "id": "alpha",
        "visible_to": [
          "urn:globus:group:id:0a4dea26-44cd-11e8-847f-0e6e723ad808"
        ],
        "content": {
          "foo/bar": "some val"
        }
      },
      {
        "subject": "https://example.com/foo",
        "id": "beta",
        "visible_to": ["public"],
        "content": {
          "foo/bar/baz": "some otherval"
        }
      }
    ]
  }
}
'
The value in the visible_to field of the first entry above is a Principal URN for a Globus Group.
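The two Principal URN forms that appear in the examples can be produced with small helpers like the ones below. This is a sketch: the function names are hypothetical, and validating the ID with uuid.UUID assumes that identity and group IDs are always UUIDs, as they are in the examples.

```python
import uuid


def identity_urn(identity_id):
    """Principal URN for a Globus Auth identity, in the form used above."""
    # uuid.UUID raises ValueError if identity_id is not a well-formed UUID.
    return f"urn:globus:auth:identity:{uuid.UUID(identity_id)}"


def group_urn(group_id):
    """Principal URN for a Globus Group, in the form used above."""
    return f"urn:globus:group:id:{uuid.UUID(group_id)}"
```

For instance, `visible_to` for the group-restricted entry above could be written as `[group_urn("0a4dea26-44cd-11e8-847f-0e6e723ad808")]`.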
Request Schemas
GMetaEntry
A GMetaEntry is a single block of data pertaining to a given subject.
Field Name | Type | Description |
---|---|---|
subject | String | The entity described by this metadata, typically a URI |
visible_to | Array of Strings | A list of security principals allowed to read the metadata. Each string is either a Principal URN or the special string "public" |
content | Object | A GMetaContent. This is the actual metadata to assert about subject |
id | String | Optional. A unique identifier for this metadata entry. This value is used by later API operations that reference this entry, such as updates or deletes. When id is not provided, it is assumed to have a default value of null. |
Example 1
{
  "subject": "https://search.api.globus.org/abc.txt",
  "visible_to": ["public"],
  "content": {
    "http://transfer.api.globus.org/metadata-schema/file#type": "file"
  }
}
Example 2
{
  "subject": "https://search.api.globus.org/abc.txt",
  "mimetype": "application/json",
  "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
  "id": "visible_to_globus@globus.org",
  "content": {
    "http://transfer.api.globus.org/metadata-schema/file#type": "file",
    "http://transfer.api.globus.org/metadata-schema/file#extension": "txt",
    "http://transfer.api.globus.org/metadata-schema/file#name": "abc.txt"
  }
}
This document is a superset of Example 1, but is only visible to the user globus@globus.org. This demonstrates how multiple entries about the same subject, but with different IDs, can be useful: some data is only visible to certain users or groups, while other data is public.
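The field types listed in the GMetaEntry table can be checked on the client before submitting. The sketch below is a hypothetical helper, not the validation performed by Globus Search, which may accept or reject documents differently (the examples, for instance, also carry a mimetype field that the table does not list).

```python
def check_gmeta_entry(entry):
    """Return a list of problems found in a GMetaEntry-shaped dict.

    Client-side sanity check only, based on the field table above.
    """
    problems = []
    if not isinstance(entry.get("subject"), str):
        problems.append("subject must be a string")
    visible_to = entry.get("visible_to")
    if not (isinstance(visible_to, list)
            and all(isinstance(p, str) for p in visible_to)):
        problems.append("visible_to must be an array of strings")
    if not isinstance(entry.get("content"), dict):
        problems.append("content must be an object")
    # id is optional; leaving it out gives the entry a null entry_id.
    if "id" in entry and not isinstance(entry["id"], str):
        problems.append("id, when present, must be a string")
    return problems
```

An empty list means that no problems were found by these particular checks.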
GMetaContent
GMetaContent is arbitrary structured data provided by data sources for Globus Search. It has only one special field, @context.
Field Name | Type | Description |
---|---|---|
@context | Object | A set of shorthands which will be expanded in all other fields of the document |
The @context field defines shorthands which are interpolated into the document keys. This is best understood through the examples section.
All text or string type fields are constrained on their total length when used for faceting or sorting. A record containing more than 10,000 characters in a field will not appear in any facet buckets for that field. Such a record will also appear at the end of any sort operation on that field, even though it may lexically belong earlier in the list.
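To spot values affected by this limit before ingesting, a content document can be scanned for long strings. The following is a sketch under assumptions: the helper name is hypothetical, Python's len is assumed to count characters the same way the service does, and the limit is assumed to apply to strings nested inside objects and arrays as well.

```python
FACET_SORT_LIMIT = 10_000  # character limit stated above


def oversized_string_fields(content, limit=FACET_SORT_LIMIT, _path=""):
    """Return paths of string values longer than `limit` characters.

    Walks nested objects and arrays. Paths use "." and "[i]" purely
    for display; they are not a Globus Search field syntax.
    """
    found = []
    if isinstance(content, str):
        if len(content) > limit:
            found.append(_path)
    elif isinstance(content, dict):
        for key, value in content.items():
            child = f"{_path}.{key}" if _path else key
            found.extend(oversized_string_fields(value, limit, child))
    elif isinstance(content, list):
        for i, value in enumerate(content):
            found.extend(oversized_string_fields(value, limit, f"{_path}[{i}]"))
    return found
```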
{
  "@context": {
    "f": "file_meta"
  },
  "f:type": "file",
  "f:extension": "txt",
  "f:name": "abc.txt"
}
which is equivalent to and will be expanded as:
{
  "file_meta#type": "file",
  "file_meta#extension": "txt",
  "file_meta#name": "abc.txt"
}
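The expansion shown above can be reproduced locally to preview the keys a document will end up with. This is only a sketch that mirrors the example: the function name is hypothetical, the "#" separator is inferred from the expanded output, and only top-level keys are handled. The service's actual expansion rules may cover more cases.

```python
def expand_context(content):
    """Expand `prefix:name` keys using the document's @context mapping.

    With {"f": "file_meta"}, the key "f:type" becomes "file_meta#type".
    Keys whose prefix is not defined in @context are left unchanged.
    """
    context = content.get("@context", {})
    expanded = {}
    for key, value in content.items():
        if key == "@context":
            continue  # the shorthand table itself is not emitted
        prefix, sep, name = key.partition(":")
        if sep and prefix in context:
            key = f"{context[prefix]}#{name}"
        expanded[key] = value
    return expanded
```

Because unknown prefixes are left alone, full URI keys such as those in the GMetaEntry examples pass through untouched.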
GMetaList
A GMetaList is a collection of GMetaEntry documents.
Field Name | Type | Description |
---|---|---|
gmeta | Array | an array of GMetaEntry documents |
{
  "gmeta": [
    {
      "subject": "https://datasearch.demo.globus.org/",
      "mimetype": "application/json",
      "visible_to": ["public"],
      "id": "valid_doc_1",
      "content": {
        "type": "file",
        "extension": "txt",
        "name": "abc.txt"
      }
    }
  ]
}
GIngest
A GIngest document is a wrapper around a GMetaList or GMetaEntry which supplies attributes relevant to the ingest and indexing of metadata into the Globus Search service.
Field Name | Type | Description |
---|---|---|
ingest_type | String | must be one of {"GMetaList", "GMetaEntry"}. Describes the type of ingest_data |
ingest_data | Object | must be a document of the type named in ingest_type. This is the data to add to the Search Index |
{
  "ingest_type": "GMetaEntry",
  "ingest_data": {
    "subject": "https://search.api.globus.org/",
    "mimetype": "application/json",
    "visible_to": ["public"],
    "id": "stephen_test_doc_2016_11_13",
    "content": {
      "type": "file",
      "extension": "txt",
      "name": "stephen's test document with spaces.txt"
    }
  }
}
{
  "ingest_type": "GMetaEntry",
  "ingest_data": {
    "subject": "https://search.api.globus.org/",
    "mimetype": "application/json",
    "visible_to": ["public"],
    "id": "test_doc_2017_06_14",
    "content": {
      "type": "file",
      "extension": "txt",
      "name": "another_document_without_spaces.txt"
    }
  }
}
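Since ingest_type must name the type of ingest_data, the wrapper can be generated rather than written by hand. The sketch below uses hypothetical helper names and infers the type from the presence of a gmeta array; that inference is a convenience of this example, not something the API itself does.

```python
def wrap_for_ingest(data):
    """Wrap a GMetaEntry or GMetaList dict in a GIngest document.

    A dict with a `gmeta` array is treated as a GMetaList; anything
    else is treated as a GMetaEntry.
    """
    ingest_type = "GMetaList" if isinstance(data.get("gmeta"), list) else "GMetaEntry"
    return {"ingest_type": ingest_type, "ingest_data": data}


def check_gingest(doc):
    """Raise ValueError if ingest_type and ingest_data do not agree."""
    if doc.get("ingest_type") not in ("GMetaList", "GMetaEntry"):
        raise ValueError("ingest_type must be 'GMetaList' or 'GMetaEntry'")
    if not isinstance(doc.get("ingest_data"), dict):
        raise ValueError("ingest_data must be an object")
    # Only a GMetaList carries a `gmeta` field.
    has_gmeta = "gmeta" in doc["ingest_data"]
    if has_gmeta != (doc["ingest_type"] == "GMetaList"):
        raise ValueError("ingest_data does not match ingest_type")
    return doc
```

Running check_gingest on hand-written documents catches a mismatched ingest_type before the request is queued.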
Response Schemas
IngestResponse
Field Name | Type | Description |
---|---|---|
task_id | UUID | The ID of the submitted Task |
as_identity | String | The principal URN of the caller’s primary ID |
success | Boolean | This is a deprecated field kept for backwards compatibility. Always |
num_documents_ingested | Integer | This is a deprecated field kept for backwards compatibility. Always |
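A client typically only needs task_id from this response, so that it can later check the task with the Get Task API. The sketch below uses a hypothetical function name and reads just the two non-deprecated fields from a JSON response body; the format of the Get Task request is not covered here.

```python
import json
import uuid


def parse_ingest_response(body):
    """Extract task_id and as_identity from an IngestResponse JSON body.

    The deprecated fields (success, num_documents_ingested) are
    deliberately ignored.
    """
    data = json.loads(body)
    return {
        # uuid.UUID raises ValueError if task_id is not a well-formed UUID.
        "task_id": uuid.UUID(data["task_id"]),
        "as_identity": data["as_identity"],
    }
```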