Docling

Bundles group provider-specific components that integrate third-party services with SkillFlaw.

SkillFlaw integrates with Docling through components for remote conversion, export, and chunking of documents. The current public Docling bundle exposes Docling Serve, Chunk DoclingDocument, and Export DoclingDocument. Local inline Docling parsing is no longer exposed as a public component.

Prerequisites

  • Docling Serve access: You need a reachable Docling Serve instance to use the Docling Serve component.

  • Bundled Docling support: Current SkillFlaw builds include the bundled support required by Chunk DoclingDocument and Export DoclingDocument.

  • Earlier self-managed environments: If your environment does not include the bundled Docling support yet, install the Docling extra with uv pip install 'skillflaw[docling]'. For packaged desktop variants, add the corresponding dependency to the application's requirements.txt. For more information, see Install custom dependencies.
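
To see which of these cases applies to your environment, you can check whether the Docling package is importable. The following is a minimal sketch. It assumes the Docling extra installs a top-level Python package named docling, and the helper name has_package is hypothetical, not part of SkillFlaw.

```python
import importlib.util


def has_package(name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# Assumption: the Docling extra provides a top-level package named "docling".
print("Docling available:", has_package("docling"))
```

If this prints False, install the extra as described above before using Chunk DoclingDocument or Export DoclingDocument.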

Use Docling components in a flow

Tip: To learn more about content extraction with Docling, see the video tutorial Docling document processing for AI workflows.

This example demonstrates how to use Docling components to split a PDF in a flow:

  1. Connect a Docling Serve and an Export DoclingDocument component to a Split Text component.

    The Docling Serve component converts the uploaded document through your Docling Serve instance, and the Export DoclingDocument component converts the returned DoclingDocument into the format you select. This example exports the document as Markdown, with images represented as placeholders. The Split Text component then splits the Markdown into chunks, which the vector database stores in the next part of the flow.

  2. Connect a Chroma DB vector store component to the Split Text component's Chunks output.

  3. Connect an embedding model component to the Chroma DB component's Embedding port, and then connect a Chat Output component so that you can view the extracted DataFrame.

  4. In the embedding model component, select your preferred model, provide credentials, and configure other settings as needed.

  5. In Docling Serve, set the service URL and add a file to process.

  6. To run the flow, click Playground.

    The chunked document is loaded as vectors into your vector database.

Docling components

The following sections describe the purpose and configuration options for each component in the Docling bundle.

Docling Serve

The Docling Serve component ingests documents and processes them with a Docling API service rather than a local model.

It outputs a DataFrame containing the processed DoclingDocument data.

For more information, see the Docling serve project repository.

Docling Serve parameters

| Name | Type | Description |
|------|------|-------------|
| files | File | The files to process. |
| api_url | String | URL of the Docling Serve instance. |
| max_concurrency | Integer | Maximum number of concurrent requests for the server. |
| max_poll_timeout | Float | Maximum waiting time for the document conversion to complete. |
| api_headers | Dict | Optional dictionary of additional headers required for connecting to Docling Serve. |
| docling_serve_opts | Dict | Optional dictionary of additional options for Docling Serve. |
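
To illustrate how max_concurrency and max_poll_timeout typically interact, here is a minimal sketch of bounded concurrency combined with polling under a deadline. It is not the component's actual implementation. The names start_conversion, check_status, and fake_start_conversion are hypothetical stand-ins for a remote service, and treating the timeout as seconds is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def poll_until_done(check_status, max_poll_timeout, interval=0.01):
    """Call check_status() until it returns a result or the timeout elapses.

    check_status is a hypothetical callable that returns None while the
    conversion is still running and the finished result once it is ready.
    """
    deadline = time.monotonic() + max_poll_timeout  # assumption: seconds
    while True:
        result = check_status()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"conversion not finished after {max_poll_timeout}s")
        time.sleep(interval)


def convert_all(files, start_conversion, max_concurrency, max_poll_timeout):
    """Convert files with at most max_concurrency conversions in flight."""

    def convert_one(name):
        # Hypothetical: starting a conversion returns a status checker.
        check_status = start_conversion(name)
        return poll_until_done(check_status, max_poll_timeout)

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map returns results in the same order as the input files.
        return list(pool.map(convert_one, files))


def fake_start_conversion(name, polls_needed=3):
    """Stand-in for a remote service: finishes after a few status checks."""
    remaining = [polls_needed]

    def check_status():
        remaining[0] -= 1
        return f"converted:{name}" if remaining[0] <= 0 else None

    return check_status


results = convert_all(
    ["a.pdf", "b.pdf", "c.pdf"],
    fake_start_conversion,
    max_concurrency=2,
    max_poll_timeout=1.0,
)
print(results)
```

In this sketch, a lower max_concurrency reduces load on the server at the cost of throughput, and a conversion that never finishes raises TimeoutError once max_poll_timeout has elapsed.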

Chunk DoclingDocument

The Chunk DoclingDocument component splits DoclingDocument objects into chunks.

It outputs the chunked documents as a DataFrame.

For more information, see the Docling core project repository.

Chunk DoclingDocument parameters

| Name | Type | Description |
|------|------|-------------|
| data_inputs | Data/DataFrame | The data with documents to split into chunks. |
| chunker | String | Which chunker to use (HybridChunker, HierarchicalChunker). |
| provider | String | Which tokenizer provider to use with HybridChunker (OpenAI or Hugging Face). Default: OpenAI. |
| hf_model_name | String | Model name of the tokenizer to use with the HybridChunker when Hugging Face is chosen. |
| openai_model_name | String | Model name of the tokenizer to use with the HybridChunker when OpenAI is chosen. |
| max_tokens | Integer | Maximum number of tokens for the HybridChunker. |
| doc_key | String | The key to use for the DoclingDocument column. |
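
To make the effect of max_tokens concrete, the following sketch greedily packs paragraphs into chunks that stay within a token budget. It is a simplified illustration, not Docling's HybridChunker. The function name chunk_by_token_budget is hypothetical, and the whitespace word count is a stand-in for a real OpenAI or Hugging Face tokenizer.

```python
def chunk_by_token_budget(paragraphs, max_tokens, count_tokens=None):
    """Greedily merge consecutive paragraphs while staying within max_tokens.

    count_tokens defaults to a whitespace word count, which only approximates
    what a model tokenizer would report.
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())

    chunks, current, current_tokens = [], [], 0
    for para in paragraphs:
        n = count_tokens(para)
        # Close the current chunk if adding this paragraph would exceed the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        # A single paragraph larger than max_tokens is kept whole in this sketch.
        current.append(para)
        current_tokens += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


paragraphs = ["one two three", "four five", "six seven eight nine", "ten"]
chunks = chunk_by_token_budget(paragraphs, max_tokens=5)
print(chunks)
```

Because token counts depend on the tokenizer, the same max_tokens value can produce different chunk boundaries when you switch between providers or model names.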

Export DoclingDocument

The Export DoclingDocument component exports a DoclingDocument to Markdown, HTML, plaintext, or DocTags.

It can output the exported data as either Data or DataFrame.

For more information, see the Docling core project repository.

Export DoclingDocument parameters

| Name | Type | Description |
|------|------|-------------|
| data_inputs | Data/DataFrame | The data with documents to export. |
| export_format | String | Select the export format to convert the input (Markdown, HTML, Plaintext, DocTags). |
| image_mode | String | Specify how images are exported in the output (placeholder, embedded). |
| md_image_placeholder | String | Specify the image placeholder for markdown exports. |
| md_page_break_placeholder | String | Add this placeholder between pages in the markdown output. |
| doc_key | String | The key to use for the DoclingDocument column. |
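
The two Markdown placeholder parameters are easiest to understand by looking at the output they shape. The sketch below is a toy renderer, not the Docling exporter. The page structure, the function name render_markdown, and the placeholder strings are illustrative assumptions.

```python
def render_markdown(pages, md_image_placeholder, md_page_break_placeholder=""):
    """Render pages of text and image items as a single Markdown string.

    Each page is a list of dicts: {"kind": "text", "text": ...} or {"kind": "image"}.
    This structure is hypothetical and much simpler than a DoclingDocument.
    """
    rendered_pages = []
    for page in pages:
        parts = [
            md_image_placeholder if item["kind"] == "image" else item["text"]
            for item in page
        ]
        rendered_pages.append("\n\n".join(parts))

    # Only insert a page-break marker when a placeholder is configured.
    if md_page_break_placeholder:
        separator = f"\n\n{md_page_break_placeholder}\n\n"
    else:
        separator = "\n\n"
    return separator.join(rendered_pages)


pages = [
    [{"kind": "text", "text": "# Title"}, {"kind": "image"}],
    [{"kind": "text", "text": "Body"}],
]
markdown = render_markdown(
    pages,
    md_image_placeholder="<!-- image -->",
    md_page_break_placeholder="<!-- page break -->",
)
print(markdown)
```

A distinctive page-break placeholder is useful downstream, for example as a separator for a text splitter that should keep pages apart.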

See also