Paperless-ngx with AI
Self-hosted document archive with OCR, full-text search and AI-powered classification, tagging and chat — built on AWS for UK parish and town councils
Overview
Paperless-ngx is a popular open-source document management system used by parish councils, small charities and individuals to digitise and search their paper records. The NDX:Try deployment adds Amazon Bedrock so every document is automatically tagged, titled, classified and summarised — and you can chat with the whole archive in plain English.
Learning Artifact: This is a pre-deployed demonstration environment for learning and exploration, not a production-ready deployment.
Paperless-ngx ingests scanned PDFs, images, Word documents and emails, runs OCR with Tesseract, converts Office documents through Apache Tika and Gotenberg, and gives you a fast searchable archive in your browser.
The NDX:Try version layers six AI features on top using Amazon Bedrock: every uploaded document is auto-classified (tags, document type, correspondent), retitled into something human-readable, and summarised in two sentences — all in the seconds after upload. A separate chat-with-archive interface lets you ask plain-English questions and get answers grounded in your real documents, with citations back to the source files and PII protection from Amazon Bedrock Guardrails.
What you’ll explore
Core features
| Feature | AWS Service |
|---|---|
| Document OCR | Tesseract OCR + Fargate |
| Full-text search | Built into Paperless-ngx |
| AI auto-tagging | Amazon Bedrock (Nova Pro) |
| AI title rewriting | Amazon Bedrock (Nova Pro) |
| Document type & correspondent | Amazon Bedrock (Nova Pro) |
| Two-sentence AI summary | Amazon Bedrock (Nova Pro) |
| Chat with archive | Bedrock Knowledge Base + S3 Vectors + Guardrails |
Infrastructure
| Component | AWS Service |
|---|---|
| Compute | AWS Fargate (multi-container task) |
| Database | Amazon RDS PostgreSQL |
| Cache / broker | Amazon ElastiCache Redis |
| File storage | Amazon S3 Files + S3 |
| Vector store | Amazon S3 Vectors |
| Foundation models | Amazon Bedrock (Nova Pro, Titan Embed) |
| Safety | Bedrock Guardrails |
| CDN & HTTPS | Amazon CloudFront |
Getting started
- Request your session — Click “Try this now” above. Your environment will deploy automatically.
- Find your credentials — In the AWS Console, go to CloudFormation → PaperlessNgxStack → Outputs tab. Copy the PaperlessUrl and AdminPassword (the username is
admin). - Sign in — Open the PaperlessUrl in your browser and log in.
- Browse the pre-loaded archive — The sandbox arrives with around 30 sample parish council documents (planning notices, minutes, agendas, invoices, correspondence) already OCR’d and AI-classified.
- Open the chat — From the same CloudFormation outputs, copy the ChatUrl. Ask plain-English questions across the archive; answers cite the source documents.
- Upload a document of your own — Drop a PDF, image, Word or email file in. Within a minute or two it will be OCR’d, AI-classified, retitled, summarised and indexed for chat.
Why this matters for local government
A filing cabinet that thinks for itself
Most parish and town councils still rely on paper or scattered shared drives. Paperless-ngx gives you a single searchable archive, and the AI layer means a clerk no longer has to manually tag, title or classify every document — Bedrock does it on upload.
Ask the archive
The chat interface lets a clerk or councillor ask questions like “what did we decide about the recreation ground play equipment?” or “show me planning notices from the last quarter” and get answers grounded in the real documents, with links back to the source.
Built on a thriving open-source project
Paperless-ngx has more than 40,000 stars on GitHub and an active community. The NDX:Try deployment uses the upstream container image directly — no fork — so anything you learn here applies to a self-hosted deployment elsewhere.
Safety baked in
Bedrock Guardrails sit in front of the chat interface, blocking attempts to extract personal data, off-topic questions and jailbreak prompts. The AWS infrastructure is private to your sandbox account and torn down at the end of the session.
Constraints
This is a time-limited evaluation environment provided through the NDX Innovation Sandbox:
- Budget: Fixed allocation per session (sufficient for the walkthrough and extended exploration)
- Duration: Sessions are time-limited — complete your evaluation within the allocated period
- Purpose: Learning and evaluation only — do not upload documents containing sensitive or classified information
- Data: All data, including any documents you upload, is deleted when the session ends
Important notes
- OCR language: English only in this deployment (Paperless-ngx upstream supports many more)
- AI model: Amazon Nova Pro via Bedrock for classification, titling, summary and chat — no Anthropic marketplace agreement required
- Vector store: Amazon S3 Vectors (serverless) backs the chat-with-archive Knowledge Base
- HTTPS: All access is via Amazon CloudFront with the default
*.cloudfront.netcertificate - Authentication: A randomly generated admin password is shown in the CloudFormation Outputs tab
- Sample data: Around 30 fictional UK parish council documents are auto-loaded on first boot so the archive isn’t empty when you arrive
How this was built
Paperless-ngx is built and maintained by the open-source community at github.com/paperless-ngx/paperless-ngx (opens in new tab) (GPL-3.0). The NDX:Try version uses the upstream container image unchanged, and adds AWS-native bits around the edges:
- Amazon Bedrock Nova Pro called from a Paperless-ngx post-consume hook to classify, title and summarise every uploaded document
- Amazon Bedrock Knowledge Base with S3 Vectors for retrieval-augmented chat over the archive
- Amazon Bedrock Guardrails to filter PII, off-topic queries and prompt-injection attempts on the chat interface
- Amazon S3 Files so a single S3 bucket acts as both the Paperless-ngx working file system and the source for the Bedrock Knowledge Base
- AWS Fargate for the multi-container task (Paperless web, Apache Tika, Gotenberg, init)
- Amazon RDS PostgreSQL and Amazon ElastiCache Redis for the application database and Celery broker
Source code: paperless-ngx/paperless-ngx (opens in new tab) (upstream) | NDX:Try CDK stack (opens in new tab) (deployment)
Explore more scenarios
- Council Chatbot — AI-powered chatbot for citizen enquiries using Amazon Bedrock
- Minute — Meeting transcription and AI minute generation
- FOI Redaction — Detect and redact PII in FOI responses with AI
- Simply Readable — Document translation and Easy Read conversion
Troubleshooting
Cannot reach the PaperlessUrl
Wait for the CloudFormation stack to reach CREATE_COMPLETE (around 15-20 minutes). The URL won’t respond before then.
Documents arrive but have no AI tags or title
The post-consume hook installs its Python dependencies on first run, so the very first uploaded document can take a little longer than later ones. After that, classification typically completes within 30 seconds.
Chat says it cannot find an answer
The Bedrock Knowledge Base ingests new documents on a roughly 60-second batching window. If you’ve just uploaded a document, give it a minute or two and try again.
Chat blocks a question I expected to work
Bedrock Guardrails filter PII, jailbreak attempts and off-topic content. Rephrase the question, or check the walkthrough for examples that work well.
Support
For help with this scenario or NDX:Try:
- Walkthrough: Step-by-step guide (opens in new tab)
- Source code: paperless-ngx/paperless-ngx on GitHub (opens in new tab)
- NDX Team: Contact via the NDX support channel (opens in new tab)