Paperless-ngx with AI

Self-hosted document archive with OCR, full-text search and AI-powered classification, tagging and chat — built on AWS for UK parish and town councils

Paperless-ngx
Try Before You Buy

View source on GitHub (opens in new tab)

Overview

Your filing cabinet, but searchable.
Paperless-ngx is a popular open-source document management system used by parish councils, small charities and individuals to digitise and search their paper records. The NDX:Try deployment adds Amazon Bedrock so every document is automatically tagged, titled, classified and summarised — and you can chat with the whole archive in plain English.

Learning Artifact: This is a pre-deployed demonstration environment for learning and exploration, not a production-ready deployment.

Paperless-ngx ingests scanned PDFs, images, Word documents and emails, runs OCR with Tesseract, converts Office documents through Apache Tika and Gotenberg, and gives you a fast searchable archive in your browser.

The NDX:Try version layers six AI features on top using Amazon Bedrock: every uploaded document is auto-classified (tags, document type, correspondent), retitled into something human-readable, and summarised in two sentences — all in the seconds after upload. A separate chat-with-archive interface lets you ask plain-English questions and get answers grounded in your real documents, with citations back to the source files and PII protection from Amazon Bedrock Guardrails.

Important After requesting your session, the environment will deploy automatically in approximately 15-20 minutes. The walkthrough will guide you through the pre-loaded archive, the AI features, and the chat interface.

What you’ll explore

Core features

Feature AWS Service
Document OCR Tesseract OCR + Fargate
Full-text search Built into Paperless-ngx
AI auto-tagging Amazon Bedrock (Nova Pro)
AI title rewriting Amazon Bedrock (Nova Pro)
Document type & correspondent Amazon Bedrock (Nova Pro)
Two-sentence AI summary Amazon Bedrock (Nova Pro)
Chat with archive Bedrock Knowledge Base + S3 Vectors + Guardrails

Infrastructure

Component AWS Service
Compute AWS Fargate (multi-container task)
Database Amazon RDS PostgreSQL
Cache / broker Amazon ElastiCache Redis
File storage Amazon S3 Files + S3
Vector store Amazon S3 Vectors
Foundation models Amazon Bedrock (Nova Pro, Titan Embed)
Safety Bedrock Guardrails
CDN & HTTPS Amazon CloudFront

Getting started

  1. Request your session — Click “Try this now” above. Your environment will deploy automatically.
  2. Find your credentials — In the AWS Console, go to CloudFormation → PaperlessNgxStack → Outputs tab. Copy the PaperlessUrl and AdminPassword (the username is admin).
  3. Sign in — Open the PaperlessUrl in your browser and log in.
  4. Browse the pre-loaded archive — The sandbox arrives with around 30 sample parish council documents (planning notices, minutes, agendas, invoices, correspondence) already OCR’d and AI-classified.
  5. Open the chat — From the same CloudFormation outputs, copy the ChatUrl. Ask plain-English questions across the archive; answers cite the source documents.
  6. Upload a document of your own — Drop a PDF, image, Word or email file in. Within a minute or two it will be OCR’d, AI-classified, retitled, summarised and indexed for chat.
View the full walkthrough →

Why this matters for local government

A filing cabinet that thinks for itself

Most parish and town councils still rely on paper or scattered shared drives. Paperless-ngx gives you a single searchable archive, and the AI layer means a clerk no longer has to manually tag, title or classify every document — Bedrock does it on upload.

Ask the archive

The chat interface lets a clerk or councillor ask questions like “what did we decide about the recreation ground play equipment?” or “show me planning notices from the last quarter” and get answers grounded in the real documents, with links back to the source.

Built on a thriving open-source project

Paperless-ngx has more than 40,000 stars on GitHub and an active community. The NDX:Try deployment uses the upstream container image directly — no fork — so anything you learn here applies to a self-hosted deployment elsewhere.

Safety baked in

Bedrock Guardrails sit in front of the chat interface, blocking attempts to extract personal data, off-topic questions and jailbreak prompts. The AWS infrastructure is private to your sandbox account and torn down at the end of the session.


Constraints

This is a time-limited evaluation environment provided through the NDX Innovation Sandbox:

  • Budget: Fixed allocation per session (sufficient for the walkthrough and extended exploration)
  • Duration: Sessions are time-limited — complete your evaluation within the allocated period
  • Purpose: Learning and evaluation only — do not upload documents containing sensitive or classified information
  • Data: All data, including any documents you upload, is deleted when the session ends

Important notes

  • OCR language: English only in this deployment (Paperless-ngx upstream supports many more)
  • AI model: Amazon Nova Pro via Bedrock for classification, titling, summary and chat — no Anthropic marketplace agreement required
  • Vector store: Amazon S3 Vectors (serverless) backs the chat-with-archive Knowledge Base
  • HTTPS: All access is via Amazon CloudFront with the default *.cloudfront.net certificate
  • Authentication: A randomly generated admin password is shown in the CloudFormation Outputs tab
  • Sample data: Around 30 fictional UK parish council documents are auto-loaded on first boot so the archive isn’t empty when you arrive

How this was built

Paperless-ngx is built and maintained by the open-source community at github.com/paperless-ngx/paperless-ngx (opens in new tab) (GPL-3.0). The NDX:Try version uses the upstream container image unchanged, and adds AWS-native bits around the edges:

  • Amazon Bedrock Nova Pro called from a Paperless-ngx post-consume hook to classify, title and summarise every uploaded document
  • Amazon Bedrock Knowledge Base with S3 Vectors for retrieval-augmented chat over the archive
  • Amazon Bedrock Guardrails to filter PII, off-topic queries and prompt-injection attempts on the chat interface
  • Amazon S3 Files so a single S3 bucket acts as both the Paperless-ngx working file system and the source for the Bedrock Knowledge Base
  • AWS Fargate for the multi-container task (Paperless web, Apache Tika, Gotenberg, init)
  • Amazon RDS PostgreSQL and Amazon ElastiCache Redis for the application database and Celery broker

Source code: paperless-ngx/paperless-ngx (opens in new tab) (upstream) | NDX:Try CDK stack (opens in new tab) (deployment)


Explore more scenarios

  • Council Chatbot — AI-powered chatbot for citizen enquiries using Amazon Bedrock
  • Minute — Meeting transcription and AI minute generation
  • FOI Redaction — Detect and redact PII in FOI responses with AI
  • Simply Readable — Document translation and Easy Read conversion

Troubleshooting

Cannot reach the PaperlessUrl
Wait for the CloudFormation stack to reach CREATE_COMPLETE (around 15-20 minutes). The URL won’t respond before then.

Documents arrive but have no AI tags or title
The post-consume hook installs its Python dependencies on first run, so the very first uploaded document can take a little longer than later ones. After that, classification typically completes within 30 seconds.

Chat says it cannot find an answer
The Bedrock Knowledge Base ingests new documents on a roughly 60-second batching window. If you’ve just uploaded a document, give it a minute or two and try again.

Chat blocks a question I expected to work
Bedrock Guardrails filter PII, jailbreak attempts and off-topic content. Rephrase the question, or check the walkthrough for examples that work well.


Support

For help with this scenario or NDX:Try: