Enriching adds enhancements to the processed data that Unstructured produces. These enrichments include:
- Providing a summarized description of the contents of a detected image.
- Providing a summarized description of the contents of a detected table.
- Providing a representation of a detected table in HTML markup format.
- Providing a list of recognized entities and their types, through a process known as named entity recognition (NER).
- Having a vision language model (VLM) use advanced optical character recognition (OCR) to improve the accuracy of initially processed text blocks.
You can change enrichment settings only through Custom workflow settings.
Unstructured can generate image summary descriptions, table summary descriptions, table-to-HTML output, and generative OCR optimizations only for workflows that are configured in one of the following ways:
- With a Partitioner node set to use the Auto or High Res partitioning strategy, and with an image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node added.
- With a Partitioner node set to use the VLM partitioning strategy. In this case, no image summary description node, table summary description node, table-to-HTML output node, or generative OCR optimization node is needed (or allowed).
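Even with one of these configurations, Unstructured generates these enrichments only for files that are eligible for processing with the following partitioning strategies:
- High Res, when the workflow’s Partitioner node is set to use Auto or High Res.
- VLM or High Res, when the workflow’s Partitioner node is set to use VLM.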
Unstructured never generates these enrichments for workflows that are configured as follows:
- With a Partitioner node set to use the Fast partitioning strategy.
- With a Partitioner node set to use the Auto, High Res, or VLM partitioning strategy, for any file that does not contain images or tables.
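
The configuration rules above can be summarized as a small decision function. The following is an illustrative sketch only, not part of any Unstructured API: the `Strategy` enum, the `may_generate_enrichment` function, and all of its parameters are hypothetical names. It also assumes that the "High Res" and "VLM or High Res" bullets refer to the strategy that is actually applied to a given file. The rules cover image summary descriptions, table summary descriptions, table-to-HTML output, and generative OCR optimizations; the text above does not state them for NER.

```python
from enum import Enum


class Strategy(Enum):
    """Partitioning strategies named in the text (hypothetical enum)."""
    AUTO = "Auto"
    FAST = "Fast"
    HIGH_RES = "High Res"
    VLM = "VLM"


def may_generate_enrichment(
    partitioner: Strategy,
    has_enrichment_node: bool,
    applied_strategy: Strategy,
    file_has_images_or_tables: bool,
) -> bool:
    """Return True if the rules above allow the enrichment for one file.

    partitioner: strategy set on the workflow's Partitioner node.
    has_enrichment_node: whether a matching enrichment node was added.
    applied_strategy: strategy assumed to be actually used for this file.
    file_has_images_or_tables: whether the file contains images or tables.
    """
    # With VLM partitioning, enrichment nodes are neither needed nor allowed.
    if partitioner is Strategy.VLM and has_enrichment_node:
        raise ValueError("Enrichment nodes are not allowed with the VLM strategy")

    # Never generated with Fast, or for files without images or tables.
    if partitioner is Strategy.FAST or not file_has_images_or_tables:
        return False

    if partitioner is Strategy.VLM:
        # The file must be eligible for VLM or High Res processing.
        return applied_strategy in (Strategy.VLM, Strategy.HIGH_RES)

    # Auto or High Res: an enrichment node is required, and the file
    # must be eligible for High Res processing.
    return has_enrichment_node and applied_strategy is Strategy.HIGH_RES
```

For example, `may_generate_enrichment(Strategy.AUTO, True, Strategy.HIGH_RES, True)` returns `True`, while the same call with `Strategy.FAST` as the partitioner returns `False`.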
When you add an enrichment node to a workflow, choose one of the following types:
- Image to provide a summarized description of the contents of each detected image.
- Table to provide a summarized description of the contents of each detected table.
- Table can also provide a representation of each detected table in HTML markup format.
- Text to provide a list of recognized entities and their types by using a technique called named entity recognition (NER).
- Generative OCR to have a VLM use advanced OCR to improve the accuracy of initially processed text blocks.

