To start using Unstructured right away, skip ahead to the UI quickstart or API quickstart now!

What is Unstructured?

Unstructured provides a platform and tools to ingest and process unstructured documents for retrieval-augmented generation (RAG) and agentic AI. This 60-second video describes what Unstructured does and its benefits (no sound):
This 40-second video demonstrates a simple use case that Unstructured helps solve (no sound):
This 60-second video shows why using Unstructured is preferable to building your own similar solution:
You can use Unstructured through a user interface (UI), an API, or both. Read on to learn more.

Unstructured UI quickstart

This quickstart shows how, in just a few minutes, you can use the Unstructured user interface (UI) to see Unstructured’s best-in-class transformation results for a single file that is stored on your local computer.
This quickstart focuses on a single, local file for ease-of-use demonstration purposes. To use Unstructured later to do large-scale batch processing of multiple files and semi-structured data that are stored in remote locations, skip over to the remote quickstart after you finish this one.
If you do not already have an Unstructured account, sign up for free. After you sign up, you are automatically signed in to your new Unstructured Starter account, at https://platform.unstructured.io. Do the following:
  1. After you are signed in, the Start page appears.
  2. In the Welcome area, do one of the following:
    • Click one of the sample files, such as realestate.pdf, to have Unstructured parse and transform that sample file.
    • Click Browse files, and then browse to and select one of your own files, to have Unstructured parse and transform it. If you choose to use your own file, the file must be 10 MB or less in size. Also, the file must be one of the following supported file types:
      File extension
      .bmp
      .csv
      .doc
      .docx
      .eml
      .epub
      .heic
      .html
      .jpeg
      .jpg
      .md
      .msg
      .odt
      .org
      .p7s
      .pdf
      .png
      .ppt
      .pptx
      .rst
      .rtf
      .tif
      .tiff
      .tsv
      .txt
      .xls
      .xlsx
      .xml
  3. After Unstructured has finished parsing and transforming the file (a process known as partitioning), you will see the file’s contents in the Preview pane in the center and Unstructured’s results in the Result pane on the right.
  4. The Result pane shows a formatted view of Unstructured’s results by default. This formatted view is designed for human readability. To see the underlying JSON view of the results, which is designed for RAG and agentic AI, click JSON at the top of the Result pane. Learn about what’s in the JSON view.
  5. Unstructured’s initial results are based on its High Res partitioning strategy, which processes the file’s contents and converts them into a series of Unstructured document elements and metadata. This partitioning strategy provides good results overall, depending on the complexity of the file’s contents. This partitioning strategy also generates a bounding box for each detected object in the file. A bounding box is an imaginary rectangular box drawn around the object to show its location and extent within the file. After the High Res partitioning results are shown, Unstructured begins improving these initial results by using vision language models (VLMs) to apply a series of generative refinements known as enrichments. These enrichments include:
    • An image description enrichment, which uses a VLM to provide a text-based summary of the contents of each detected image.
    • A generative OCR enrichment, which uses a VLM to improve the accuracy of each block of initially processed text.
    • A table to HTML enrichment, which uses a VLM to provide an HTML-structured representation of each detected table.
    While these enrichments are being applied, a banner appears at the top of the Result pane. To see these enrichments applied to the initial results, click Update results in the banner as soon as this button appears, which might take a minute or more.
    Each page that Unstructured processes by using this approach is counted as two pages for usage and billing purposes. This is because Unstructured processes each page once with its High Res partitioning strategy and then reprocesses each page with a VLM to improve the quality, accuracy, and relevance of the initial partitioning results. This two-pass process happens regardless of whether you click Update results in the banner. This two-page usage and billing behavior is a known issue and will be addressed in a future release.
  6. To synchronize the scrolling of the Preview pane’s selected contents with the Result pane’s Formatted results, rest your mouse pointer anywhere inside the contents of the Preview pane until a bounding box appears. Then click the bounding box. Unstructured automatically scrolls the Result pane’s Formatted results to match the selected bounding box. (You cannot synchronize the scrolling of the JSON results.) To show all of the bounding boxes in the Preview pane at once, turn on the Show all bounding boxes toggle at the top of the Preview pane. You can now click any of the bounding boxes without first needing to rest your mouse pointer on them to show them.
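The file requirements in step 2 (10 MB or less, and one of the listed file extensions) can be checked locally before you upload your own file. The following is a minimal sketch, not part of Unstructured’s tooling: the `check_upload` helper is hypothetical, and it assumes that 10 MB means 10 × 1024 × 1024 bytes and that extensions are matched case-insensitively, neither of which this page states.

```python
from pathlib import Path

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024

# File extensions listed in step 2 of this quickstart.
SUPPORTED_EXTENSIONS = frozenset({
    ".bmp", ".csv", ".doc", ".docx", ".eml", ".epub", ".heic", ".html",
    ".jpeg", ".jpg", ".md", ".msg", ".odt", ".org", ".p7s", ".pdf",
    ".png", ".ppt", ".pptx", ".rst", ".rtf", ".tif", ".tiff", ".tsv",
    ".txt", ".xls", ".xlsx", ".xml",
})


def check_upload(file_name, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Assumption: extension matching is case-insensitive.
    extension = Path(file_name).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is larger than {MAX_BYTES} bytes")
    return problems


if __name__ == "__main__":
    print(check_upload("realestate.pdf", 1_000_000))
    print(check_upload("archive.zip", 20 * 1024 * 1024))
```

For a file on disk, you could pass `Path(p).name` and `Path(p).stat().st_size` as the two arguments.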
You can also do the following:
  • To download the JSON view of the results as a local JSON file, click the download icon to the left of the Formatted and JSON buttons in the Result pane. (You cannot download the formatted view of the results.)
  • To have Unstructured partition a different file, click Add new file in the Files pane on the left, and then browse to and select the target file.
  • To view the results for a file that was previously partitioned during this session, click the file’s name in the Recent files list in the Files pane.
  • To return to the Start page, click the X (close) button at the left on the title bar, next to Transform.
  • To have Unstructured do more (such as chunking, embedding, applying additional kinds of enrichments, and processing larger files and semi-structured data in batches at scale), click Edit in Workflow Editor at the right on the title bar, and then skip over to the walkthrough.
  Learn how to add chunking, embeddings, and additional enrichments to your results.   Learn more about the Unstructured user interface.
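After you download the JSON view of the results, you can inspect it with a few lines of code. The following is a minimal sketch under stated assumptions: it assumes the downloaded file is a JSON array of element objects and that each element has a `type` field, which this page does not spell out. The helper names `load_elements` and `summarize_elements` are hypothetical.

```python
import json
from collections import Counter


def load_elements(path):
    """Load a downloaded results file.

    Assumption: the file contains a JSON array of element objects.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of elements")
    return data


def summarize_elements(elements):
    """Count elements by their "type" field (assumed field name)."""
    return Counter(element.get("type", "unknown") for element in elements)


if __name__ == "__main__":
    # Illustrative, made-up elements; real results come from the downloaded file.
    sample = [
        {"type": "Title", "text": "Listing overview"},
        {"type": "NarrativeText", "text": "A three-bedroom home."},
        {"type": "NarrativeText", "text": "Close to public transit."},
    ]
    for element_type, count in summarize_elements(sample).items():
        print(element_type, count)
```

A summary like this is a quick way to see how a file was broken into elements before you move on to chunking or embedding.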

Unstructured API quickstart

This quickstart shows how you can use the Unstructured API to quickly and easily see Unstructured’s transformation results for a single file that is stored locally.
This quickstart uses the Unstructured API’s Partition Endpoint and focuses on a single, local file for ease-of-use demonstration purposes. This quickstart also focuses only on a limited set of Unstructured’s full capabilities. To unlock Unstructured’s full feature set, as well as use Unstructured to do large-scale batch processing of multiple files and semi-structured data that are stored in remote locations, skip over to an expanded, advanced version of this quickstart that uses the Unstructured API’s Workflow Endpoint instead.
  1. If you do not already have an Unstructured account, sign up for free. After you sign up, you are automatically signed in to your Unstructured Starter account, at https://platform.unstructured.io.
  2. Watch the following 3-minute video:
  Run this quickstart as a notebook on Google Colab instead.   Get the sample code for this video.   Get the full setup instructions for this video.   Learn more.
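For readers who prefer text to video, the general shape of a Partition Endpoint call for a single local file can be sketched as follows. This is a rough illustration, not the sample code from the video. The endpoint URL, the `unstructured-api-key` header, and the `files` and `strategy` form fields are assumptions that do not appear on this page, so confirm them against the Partition Endpoint documentation before relying on them. The helper names are hypothetical, and only the Python standard library is used.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# Assumption: this URL, the header name, and the form field names below are not
# stated in this quickstart; check the Partition Endpoint documentation.
PARTITION_URL = "https://api.unstructuredapp.io/general/v0/general"


def build_partition_request(file_name, file_bytes, api_key,
                            strategy="hi_res", url=PARTITION_URL):
    """Build (but do not send) a multipart/form-data POST for one file."""
    boundary = uuid.uuid4().hex
    mime_type = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
    strategy_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="strategy"\r\n\r\n'
        f"{strategy}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{file_name}"\r\n'
        f"Content-Type: {mime_type}\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = strategy_part + file_header + file_bytes + closing
    headers = {
        "unstructured-api-key": api_key,  # assumed header name
        "Accept": "application/json",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def partition_file(path, api_key):
    """Send one local file and return the parsed JSON response."""
    with open(path, "rb") as f:
        file_bytes = f.read()
    request = build_partition_request(os.path.basename(path), file_bytes, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `partition_file("realestate.pdf", os.environ["UNSTRUCTURED_API_KEY"])` would then send the request; the environment variable name is only a suggestion.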

Pricing

Unstructured offers several account types with different pricing plans:
  •   Starter - A single user, with a single workspace, hosted alongside other accounts on Unstructured’s cloud infrastructure.
  •   Team - Multiple users and workspaces, hosted alongside other accounts on Unstructured’s cloud infrastructure.
  •   Enterprise - Multiple users and workspaces, isolated from all other accounts, with two hosting options for additional security and control:
    •   Dedicated instance - Hosted within a virtual private cloud (VPC) running inside Unstructured’s cloud infrastructure.
    •   In-VPC - Hosted within your own VPC on your own cloud infrastructure.
    Enterprise accounts also allow for robust customization of Unstructured’s features for your unique needs.
For more details, see the Unstructured Pricing page.
To upgrade your account from Starter to Team, or from Team to Enterprise, email Unstructured Sales at sales@unstructured.io.
Some of these plans have billing details that are determined on a per-page basis. Unstructured calculates a page as follows:
  • For these file types, a page is a page, slide, or image: .pdf, .pptx, and .tiff.
  • For .docx files that have page metadata, Unstructured calculates the number of pages based on that metadata.
  • For all other file types, Unstructured calculates the number of pages as the file’s size divided by 100 KB.
  • For non-file data, Unstructured calculates a page as 100 KB of incoming data to be processed.
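The page rules above can be sketched as a small estimator. This is an illustration only, not Unstructured’s billing code: it assumes that 100 KB means 100,000 bytes and that a partial page is rounded up, neither of which this page specifies. The function names are hypothetical, and the caller must supply the page, slide, or image count for .pdf, .pptx, and .tiff files, and the page metadata for .docx files when it exists.

```python
import math

# Assumption: "100 KB" is interpreted as 100,000 bytes.
PAGE_BYTES = 100_000

# File types where a page is a page, slide, or image.
COUNTED_PAGE_TYPES = {".pdf", ".pptx", ".tiff"}


def pages_from_size(size_bytes):
    """Size-based rule; also used for non-file data.

    Assumption: a partial page is rounded up.
    """
    return math.ceil(size_bytes / PAGE_BYTES)


def estimate_pages(extension, size_bytes, page_count=None):
    """Estimate the page count for one file under the rules listed above."""
    extension = extension.lower()
    if extension in COUNTED_PAGE_TYPES:
        if page_count is None:
            raise ValueError(f"{extension} files need an explicit page_count")
        return page_count
    if extension == ".docx" and page_count is not None:
        # .docx files that have page metadata use that metadata.
        return page_count
    return pages_from_size(size_bytes)


def ui_quickstart_usage(pages):
    """The UI quickstart currently counts each processed page as two pages."""
    return pages * 2


if __name__ == "__main__":
    print(estimate_pages(".pdf", 5_000_000, page_count=12))
    print(estimate_pages(".txt", 250_000))
    print(ui_quickstart_usage(estimate_pages(".txt", 250_000)))
```

Treat the result as a rough estimate for planning; the Unstructured Pricing page remains the authoritative source for billing.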

Questions? Need help?

  • For general questions about Unstructured products and pricing, email Unstructured Sales at sales@unstructured.io.
  • For technical support for Unstructured accounts, email Unstructured Support at support@unstructured.io.
  • For technical support for the Unstructured open source library, use our Slack community.