worker-globfilter
The Glob Filter Worker is a Batch Worker that takes a glob filter as its batch definition, filters a given directory for matches, and creates items of work from those matches.
Input
The following is an example input JSON for the Glob Filter Worker:
{
"batchDefinition":"input-sub-folder/**.txt",
"batchType":"GlobPattern",
"taskMessageType":"DocumentMessage",
"taskMessageParams":{
"pi:datastorePartialReference":"74c98be740b44d64b5b7a4e224555917",
"field:binaryFile":"CONTENT",
"field:fileName":"FILE_NAME",
"field:binaryFileReference":"STORAGE_REFERENCE",
"newField:aNewField":"aNewFieldValue",
"cd:aCustomDataField":"aCustomDataFieldValue"
},
"targetPipe":"langdetect-in"
}
Input JSON fields
- batchDefinition: The glob filter to match; in this case input-sub-folder/**.txt.
- batchType: The plugin used to process the batchDefinition. Currently the only supported batch type is GlobPattern. Other batch types may be added in the future as required.
- taskMessageType: The type of TaskMessage that the worker should output. This must be set to "DocumentMessage" (different types may become configurable in the future).
- taskMessageParams: A list of namespaced message parameters that the worker uses to build Task Messages (the pi and field namespaces are described under Task Message Parameter Namespaces):
  - pi:datastorePartialReference: The DataStore service partial reference to store file binaries against.
  - field:binaryFile: The name of the field that will hold the reference to the content of the file as a storage_ref encoded string.
  - field:fileName: The name of the field that will hold the name of the file as a utf-8 encoded string.
  - field:binaryFileReference: The name of the field that will hold the storage reference of the file as a utf-8 encoded string.
  - newField:aNewField: A new field and value to be added to the output Documents' taskData fields. Given the example above, a field with the key aNewField and the value aNewFieldValue will be added.
  - cd:aCustomDataField: A field and value to add to the output Documents' taskData customData. Given the example above, a field with the key aCustomDataField and the value aCustomDataFieldValue will be added.
- targetPipe: The queue that generated TaskMessages should be output to.
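As an illustration of how the fields above fit together, the following is a minimal Python sketch that assembles the example input document. The helper name build_glob_batch and its arguments are hypothetical and are not part of the worker; only the JSON keys and values come from the example above.

```python
import json


def build_glob_batch(glob_pattern, target_pipe, datastore_ref,
                     new_fields=None, custom_data=None):
    """Assemble a Glob Filter Worker input document as a plain dict.

    Hypothetical helper for illustration; the output field names
    (CONTENT, FILE_NAME, STORAGE_REFERENCE) mirror the example input.
    """
    params = {
        "pi:datastorePartialReference": datastore_ref,
        "field:binaryFile": "CONTENT",
        "field:fileName": "FILE_NAME",
        "field:binaryFileReference": "STORAGE_REFERENCE",
    }
    # Each extra taskData field is passed under the newField namespace.
    for name, value in (new_fields or {}).items():
        params["newField:" + name] = value
    # Each customData entry is passed under the cd namespace.
    for name, value in (custom_data or {}).items():
        params["cd:" + name] = value
    return {
        "batchDefinition": glob_pattern,
        "batchType": "GlobPattern",            # the only supported batch type
        "taskMessageType": "DocumentMessage",  # the only supported message type
        "taskMessageParams": params,
        "targetPipe": target_pipe,
    }


batch = build_glob_batch(
    "input-sub-folder/**.txt",
    "langdetect-in",
    "74c98be740b44d64b5b7a4e224555917",
    new_fields={"aNewField": "aNewFieldValue"},
    custom_data={"aCustomDataField": "aCustomDataFieldValue"},
)
print(json.dumps(batch, indent=2))
```

How the resulting document is delivered to the worker is outside the scope of this sketch.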
Task Message Parameter Namespaces
The following tables describe the Glob Filter Worker's processing instruction and field namespace parameters:
Processing instructions
The processing instructions, denoted with pi, are parameters that are used by worker operations.
The following table lists the processing instructions that are used by the Glob Filter worker:
pi | Description |
---|---|
datastorePartialReference | The DataStore service partial reference to store file binaries against. |
Field
The fields, denoted with field, are parameters that name the fields in which information about each glob-matched file will be stored.
The following table lists the fields that are added to the taskData fields of each TaskMessage output by the Glob Filter Worker, when specified:
field | Description |
---|---|
binaryFile | The name of the field that will hold the reference to the content. |
binaryFileReference | The name of the field that will hold the storage reference of the file. |
fileName | The name of the field that will hold the name of the file. |
Note that if a name is not specified for any of the above fields then that field will not be added to the taskData.
Output
The following is an example TaskMessage taskData JSON returned from the Glob Filter Worker for a file that matched the glob operation:
{
"fields":{
"aNewField":[
{
"data":"aNewFieldValue"
}
],
"CONTENT":[
{
"data":"74c98be740b44d64b5b7a4e224555917/adb4cdce-62f1-4ada-a7e7-463e9abd4b95",
"encoding":"storage_ref"
}
],
"FILE_NAME":[
{
"data":"ATextFileThatMatchedTheGlobFilterOutput.txt"
}
],
"STORAGE_REFERENCE":[
{
"data":"74c98be740b44d64b5b7a4e224555917/adb4cdce-62f1-4ada-a7e7-463e9abd4b95"
}
]
},
"customData":{
"aCustomDataField":"aCustomDataFieldValue"
}
}
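The relationship between the example input parameters and this output can be sketched in Python as follows. This is an approximation inferred from the two examples, not the worker's actual implementation; the function names split_params and build_task_data are hypothetical, and the storage reference is passed in rather than obtained from a DataStore service.

```python
def split_params(task_message_params):
    """Group 'namespace:name' keys by namespace (pi, field, newField, cd)."""
    groups = {"pi": {}, "field": {}, "newField": {}, "cd": {}}
    for key, value in task_message_params.items():
        namespace, _, name = key.partition(":")
        if namespace in groups and name:
            groups[namespace][name] = value
    return groups


def build_task_data(task_message_params, file_name, storage_ref):
    """Build taskData for one matched file (illustrative only)."""
    groups = split_params(task_message_params)
    fields = {}
    # newField entries are copied into the taskData fields as-is.
    for name, value in groups["newField"].items():
        fields[name] = [{"data": value}]
    # A field is only added when a name was supplied for it.
    targets = groups["field"]
    if "binaryFile" in targets:
        fields[targets["binaryFile"]] = [
            {"data": storage_ref, "encoding": "storage_ref"}
        ]
    if "fileName" in targets:
        fields[targets["fileName"]] = [{"data": file_name}]
    if "binaryFileReference" in targets:
        fields[targets["binaryFileReference"]] = [{"data": storage_ref}]
    # cd entries become the customData map.
    return {"fields": fields, "customData": dict(groups["cd"])}


example_params = {
    "pi:datastorePartialReference": "74c98be740b44d64b5b7a4e224555917",
    "field:binaryFile": "CONTENT",
    "field:fileName": "FILE_NAME",
    "field:binaryFileReference": "STORAGE_REFERENCE",
    "newField:aNewField": "aNewFieldValue",
    "cd:aCustomDataField": "aCustomDataFieldValue",
}
task_data = build_task_data(
    example_params,
    "ATextFileThatMatchedTheGlobFilterOutput.txt",
    "74c98be740b44d64b5b7a4e224555917/adb4cdce-62f1-4ada-a7e7-463e9abd4b95",
)
```

With the example parameters, task_data has the same fields and customData as the JSON above. Dropping one of the field: keys from example_params leaves the corresponding field out of the result.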
Environment Variables
CAF_GLOB_WORKER_BINARY_DATA_INPUT_FOLDER
The input folder to be scanned for matches, e.g. /mnt/caf-datastore-root/sample-files.
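To give a rough idea of what scanning the input folder for glob matches involves, here is a minimal Python sketch. It is not the worker's implementation: the function find_matches is hypothetical, and it assumes the pattern is matched against each file's path relative to the input folder, with wildcards allowed to cross directory boundaries. The worker's exact glob semantics may differ.

```python
import os
from fnmatch import fnmatchcase
from pathlib import Path


def find_matches(input_folder, glob_pattern):
    """Return relative paths of files under input_folder that match the pattern.

    Assumption: fnmatch-style matching, where '*' (and so '**') also
    matches '/' characters in the relative path.
    """
    root = Path(input_folder)
    if not root.is_dir():
        return []
    matches = []
    for path in root.rglob("*"):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            if fnmatchcase(relative, glob_pattern):
                matches.append(relative)
    return sorted(matches)


# Read the folder from the environment variable described above,
# falling back to the example path when it is not set.
folder = os.environ.get(
    "CAF_GLOB_WORKER_BINARY_DATA_INPUT_FOLDER",
    "/mnt/caf-datastore-root/sample-files",
)
for match in find_matches(folder, "input-sub-folder/**.txt"):
    print(match)
```

Each printed path corresponds to one file for which an item of work would be created.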