OCR Document Scanner
A skills demo that recreates a receipt-scanning pipeline from past contract work. The original system is closed-source under the contract terms. This demo showcases the same techniques (document OCR with data extraction and fraud detection) running entirely client-side in the browser.
Interactive Demo
All processing runs in your browser - no data is sent to any server.
Upload a document image - fuel receipts, retail receipts, invoices, or any text document. The OCR engine auto-detects the document type, extracts structured data, and runs validation checks.
How It Works
Client-side processing pipeline. No data leaves your browser.
OCR Engine
Tesseract.js v5 processes receipt images via Web Workers. A key-value parser extracts structured fields, then regex patterns handle edge cases and fallbacks.
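The two-pass extraction described above can be sketched roughly as follows. This is a minimal illustration, not the original parser, which is closed-source: the label table, field names, and fallback patterns are all assumptions, and the input is assumed to be the plain text that Tesseract.js returns from recognition.

```typescript
type Fields = Record<string, string>;

// Hypothetical mapping from printed labels to canonical field names.
const LABELS: Record<string, string> = {
  total: "total",
  "amount due": "total",
  date: "date",
  gallons: "volume",
  "price/gal": "unitPrice",
};

// Hypothetical fallback patterns, used only when pass 1 found nothing.
const FALLBACKS: Record<string, RegExp> = {
  total: /\btotal\b[^\d\n]*(\d+[.,]\d{2})/i,
  date: /\b(\d{1,2}[\/-]\d{1,2}[\/-]\d{2,4})\b/,
};

// Pass 1: treat each "Label: value" line as a key-value pair.
function parseKeyValues(text: string): Fields {
  const fields: Fields = {};
  for (const line of text.split(/\r?\n/)) {
    const match = line.match(/^\s*([A-Za-z\/ ]+?)\s*[:=]\s*(.+?)\s*$/);
    if (!match) continue;
    const canonical = LABELS[match[1].toLowerCase()];
    // Keep the first occurrence of each field.
    if (canonical && !(canonical in fields)) fields[canonical] = match[2];
  }
  return fields;
}

// Pass 2: fill any still-missing fields with regex fallbacks over the full text.
function extractFields(text: string): Fields {
  const fields = parseKeyValues(text);
  for (const [name, pattern] of Object.entries(FALLBACKS)) {
    if (name in fields) continue;
    const match = text.match(pattern);
    if (match?.[1]) fields[name] = match[1];
  }
  return fields;
}
```

Running the key-value pass first keeps the common case simple, while the fallbacks catch receipts that print a total or date without a separator such as a colon.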
Image Preprocessing
A Web Worker pipeline upscales the image, converts it to grayscale, applies median noise reduction, normalizes contrast, and runs adaptive thresholding to improve OCR accuracy.
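Two of these stages, grayscale conversion and adaptive thresholding, might look something like the sketch below. It assumes RGBA input laid out like `ImageData.data` and uses a mean-based threshold computed from an integral image; the function names, the `radius` and `offset` defaults, and the choice of mean thresholding are illustrative assumptions, and the upscaling, median filter, and contrast steps are omitted.

```typescript
// Convert RGBA pixels to one luminance byte per pixel (Rec. 601 weights).
function toGrayscale(rgba: Uint8ClampedArray, width: number, height: number): Uint8ClampedArray {
  const gray = new Uint8ClampedArray(width * height);
  for (let i = 0; i < gray.length; i++) {
    gray[i] = 0.299 * rgba[4 * i] + 0.587 * rgba[4 * i + 1] + 0.114 * rgba[4 * i + 2];
  }
  return gray;
}

// Mark a pixel black when it is darker than its local mean minus `offset`.
// `radius` and `offset` are assumed defaults, not values from the original system.
function adaptiveThreshold(
  gray: Uint8ClampedArray, width: number, height: number, radius = 7, offset = 10,
): Uint8ClampedArray {
  // Integral image with a zero border row and column, so any window sum
  // can be read with four lookups.
  const stride = width + 1;
  const integral = new Float64Array(stride * (height + 1));
  for (let y = 0; y < height; y++) {
    let rowSum = 0;
    for (let x = 0; x < width; x++) {
      rowSum += gray[y * width + x];
      integral[(y + 1) * stride + (x + 1)] = integral[y * stride + (x + 1)] + rowSum;
    }
  }

  const out = new Uint8ClampedArray(width * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Clamp the window to the image bounds.
      const x0 = Math.max(0, x - radius), y0 = Math.max(0, y - radius);
      const x1 = Math.min(width - 1, x + radius), y1 = Math.min(height - 1, y + radius);
      const area = (x1 - x0 + 1) * (y1 - y0 + 1);
      const sum =
        integral[(y1 + 1) * stride + (x1 + 1)] - integral[y0 * stride + (x1 + 1)] -
        integral[(y1 + 1) * stride + x0] + integral[y0 * stride + x0];
      out[y * width + x] = gray[y * width + x] < sum / area - offset ? 0 : 255;
    }
  }
  return out;
}

// Compose the two stages into a single binarization step.
function binarize(rgba: Uint8ClampedArray, width: number, height: number): Uint8ClampedArray {
  return adaptiveThreshold(toGrayscale(rgba, width, height), width, height);
}
```

Comparing each pixel with its neighborhood rather than a single global cutoff helps with unevenly lit photos, where one side of a receipt may be in shadow.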
Fraud Detection
Six heuristic checks flag anomalies: price range validation, volume sanity, math cross-referencing, duplicate amount detection, date presence, and OCR confidence scoring.
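As a rough sketch of how the six checks could be arranged, the function below runs each heuristic against an already-parsed receipt and collects flags. Every threshold, the `Receipt` shape, the flag names, and the `seenTotals` parameter are assumptions made for illustration; the confidence value is assumed to be on a 0 to 100 scale.

```typescript
interface Receipt {
  unitPrice?: number; // price per unit of fuel
  volume?: number;    // units dispensed
  total?: number;     // amount charged
  date?: string;
  confidence: number; // OCR confidence, assumed 0-100
}

interface Flag {
  check: string;
  message: string;
}

// All limits below are hypothetical placeholders, not the original system's values.
function runChecks(receipt: Receipt, seenTotals: number[] = []): Flag[] {
  const flags: Flag[] = [];
  const { unitPrice, volume, total, date, confidence } = receipt;

  // 1. Price range validation
  if (unitPrice !== undefined && (unitPrice < 1 || unitPrice > 10)) {
    flags.push({ check: "price", message: `unit price ${unitPrice} outside expected range` });
  }
  // 2. Volume sanity
  if (volume !== undefined && (volume <= 0 || volume > 150)) {
    flags.push({ check: "volume", message: `implausible volume ${volume}` });
  }
  // 3. Math cross-referencing: unit price times volume should match the total.
  if (unitPrice !== undefined && volume !== undefined && total !== undefined) {
    if (Math.abs(unitPrice * volume - total) > 0.05) {
      flags.push({ check: "math", message: "unit price x volume does not match total" });
    }
  }
  // 4. Duplicate amount detection against previously seen totals
  if (total !== undefined && seenTotals.includes(total)) {
    flags.push({ check: "duplicate", message: `total ${total} already seen` });
  }
  // 5. Date presence
  if (!date) {
    flags.push({ check: "date", message: "no date found" });
  }
  // 6. OCR confidence scoring
  if (confidence < 60) {
    flags.push({ check: "confidence", message: `low OCR confidence (${confidence})` });
  }
  return flags;
}
```

Returning a list of flags instead of a single pass/fail result lets the interface show which heuristic fired and leaves the final judgment to a reviewer.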