
OCR Document Scanner

A skills demo that recreates a receipt-scanning pipeline from past contract work. The original system is closed-source under the contract terms, so this demo reimplements the same techniques (document OCR, structured data extraction, and fraud detection) to run entirely client-side in the browser.

01 / Demo

Interactive Demo

All processing runs in your browser; no data is sent to any server.

Upload a document image: a fuel receipt, retail receipt, invoice, or any other text document. The OCR engine detects the document type automatically, extracts structured data, and runs validation checks.
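How the document type is detected is not described in detail here. A minimal sketch, assuming a simple keyword-scoring classifier over the raw OCR text, could look like the following. The keyword lists and the `classifyDocument` name are hypothetical, not the demo's actual rules.

```javascript
// Hypothetical keyword lists; the demo's actual classification rules are not published.
const DOC_TYPE_KEYWORDS = {
  fuel: ["gallons", "liters", "litres", "pump", "unleaded", "diesel", "price/gal"],
  invoice: ["invoice", "bill to", "due date", "remit", "terms"],
  retail: ["subtotal", "tax", "cashier", "change", "qty", "total"],
};

/**
 * Scores lowercased OCR text against each keyword list and returns the type with
 * the most keyword hits. Ties go to the type listed first; zero hits -> "generic".
 */
function classifyDocument(ocrText) {
  const text = ocrText.toLowerCase();
  let best = { type: "generic", score: 0 };
  for (const [type, keywords] of Object.entries(DOC_TYPE_KEYWORDS)) {
    const score = keywords.filter((kw) => text.includes(kw)).length;
    if (score > best.score) best = { type, score };
  }
  return best;
}
```

For example, `classifyDocument("SUBTOTAL 10.00\nTAX 0.80\nTOTAL 10.80")` returns `{ type: "retail", score: 3 }`. Plain substring counting is cheap but sensitive to OCR misreads, so a real classifier would likely need fuzzier matching.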

02 / Pipeline

How It Works

Client-side processing pipeline. No data leaves your browser.

01

OCR Engine

Tesseract.js v5 runs OCR on receipt images inside Web Workers. A key-value parser then extracts structured fields, and regex patterns handle edge cases and serve as fallbacks when the parser misses a field.

Tesseract.js · Web Workers · WASM
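The parse-then-fallback order described above can be sketched as two passes over the OCR text. This is a minimal sketch, not the demo's code: the `FIELD_ALIASES` table, the `extractFields` function, and the rule that the largest two-decimal amount is the total when no labelled total is found are all assumptions made for illustration.

```javascript
// Hypothetical label aliases; the demo's actual field list is not published.
const FIELD_ALIASES = {
  total: ["total", "total due", "amount due", "balance due"],
  subtotal: ["subtotal", "sub total"],
  tax: ["tax", "sales tax"],
  gallons: ["gallons", "gal", "volume"],
  pricePerGallon: ["price/gal", "price per gallon", "ppg"],
};

// One "LABEL: 12.34" / "LABEL $12.34" pair per line: a text label, optional colon,
// optional dollar sign, then a trailing number (commas allowed as thousands separators).
const KEY_VALUE_LINE = /^\s*([a-z][a-z /]*?)\s*:?\s*\$?(\d[\d,]*(?:\.\d+)?)\s*$/i;

const parseAmount = (raw) => Number.parseFloat(raw.replace(/,/g, ""));

// Maps a raw OCR label to a field name, or null if it is not a known alias.
function lookupField(label) {
  const normalized = label.toLowerCase().replace(/\s+/g, " ").trim();
  for (const [field, aliases] of Object.entries(FIELD_ALIASES)) {
    if (aliases.includes(normalized)) return field;
  }
  return null;
}

function extractFields(ocrText) {
  const fields = {};

  // Pass 1: key-value parsing. The first occurrence of each field wins.
  for (const line of ocrText.split(/\r?\n/)) {
    const match = KEY_VALUE_LINE.exec(line);
    if (!match) continue;
    const field = lookupField(match[1]);
    if (field && !(field in fields)) fields[field] = parseAmount(match[2]);
  }

  // Pass 2: regex patterns for values that do not fit the label/number shape (dates)...
  const dateMatch = ocrText.match(/\b\d{1,2}[\/-]\d{1,2}[\/-]\d{2,4}\b/);
  if (dateMatch) fields.date = dateMatch[0];

  // ...and a fallback for a missing total: assume it is the largest two-decimal amount.
  if (!("total" in fields)) {
    const amounts = [...ocrText.matchAll(/\$?(\d[\d,]*\.\d{2})\b/g)].map((m) => parseAmount(m[1]));
    if (amounts.length > 0) fields.total = Math.max(...amounts);
  }
  return fields;
}
```

For example, `extractFields("Gallons: 12.500\nPrice/Gal 3.499\nTOTAL $43.74")` yields `{ gallons: 12.5, pricePerGallon: 3.499, total: 43.74 }`. The date pattern only matches slash- or dash-separated dates such as `03/14/2024`, not ISO dates like `2024-03-14`.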
02

Image Preprocessing

A Web Worker pipeline upscales the image, converts it to grayscale, reduces noise with a median filter, normalizes contrast, and applies adaptive thresholding to improve OCR accuracy.

Web Workers · Canvas API · Adaptive Threshold
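The preprocessing steps above can be sketched as pure functions over RGBA pixel data, which is the layout of `ImageData.data` in the Canvas API. This is a minimal sketch under stated assumptions: the function names, the 3×3 median window, the 15×15 threshold window, the offset of 10, and the choice of a local-mean threshold rule are illustrative, not the demo's actual parameters. Upscaling and the Web Worker message plumbing are omitted. In a browser, upscaling could be done by drawing the image onto a larger canvas with `drawImage` and reading the pixels back with `getImageData`.

```javascript
// Grayscale via ITU-R BT.601 luma weights. `rgba` has 4 bytes per pixel,
// the same layout as ImageData.data.
function toGrayscale(rgba, width, height) {
  const gray = new Uint8ClampedArray(width * height);
  for (let i = 0; i < width * height; i++) {
    const r = rgba[4 * i], g = rgba[4 * i + 1], b = rgba[4 * i + 2];
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

// 3x3 median filter; out-of-bounds neighbours are clamped to the nearest edge pixel.
function medianFilter3x3(gray, width, height) {
  const out = new Uint8ClampedArray(gray.length);
  const neighbours = new Array(9);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let k = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          const yy = Math.min(height - 1, Math.max(0, y + dy));
          const xx = Math.min(width - 1, Math.max(0, x + dx));
          neighbours[k++] = gray[yy * width + xx];
        }
      }
      neighbours.sort((a, b) => a - b);
      out[y * width + x] = neighbours[4]; // middle of 9 sorted values
    }
  }
  return out;
}

// Linear contrast stretch: darkest pixel -> 0, brightest -> 255.
function stretchContrast(gray) {
  let min = 255, max = 0;
  for (const v of gray) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  if (max === min) return gray.slice();
  const scale = 255 / (max - min);
  return gray.map((v) => Math.round((v - min) * scale));
}

// Adaptive mean threshold: a pixel becomes black (0) when it is darker than the mean of its
// (2*radius+1)^2 neighbourhood minus `offset`, otherwise white (255). A summed-area table
// (integral image) makes each neighbourhood mean O(1).
function adaptiveThreshold(gray, width, height, radius = 7, offset = 10) {
  const W = width + 1;
  const integral = new Float64Array(W * (height + 1));
  for (let y = 0; y < height; y++) {
    let rowSum = 0;
    for (let x = 0; x < width; x++) {
      rowSum += gray[y * width + x];
      integral[(y + 1) * W + x + 1] = integral[y * W + x + 1] + rowSum;
    }
  }
  const out = new Uint8ClampedArray(gray.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const x0 = Math.max(0, x - radius), x1 = Math.min(width - 1, x + radius);
      const y0 = Math.max(0, y - radius), y1 = Math.min(height - 1, y + radius);
      const sum = integral[(y1 + 1) * W + x1 + 1] - integral[y0 * W + x1 + 1]
                - integral[(y1 + 1) * W + x0] + integral[y0 * W + x0];
      const mean = sum / ((x1 - x0 + 1) * (y1 - y0 + 1));
      out[y * width + x] = gray[y * width + x] < mean - offset ? 0 : 255;
    }
  }
  return out;
}

// Hypothetical entry point: grayscale -> median denoise -> contrast stretch -> threshold.
function preprocess(rgba, width, height) {
  const gray = toGrayscale(rgba, width, height);
  const denoised = medianFilter3x3(gray, width, height);
  const normalized = stretchContrast(denoised);
  return adaptiveThreshold(normalized, width, height);
}
```

Because each pixel is compared with its own neighbourhood rather than a single global cutoff, text stays separable from the background under uneven lighting, which is common in photographed receipts. The median filter runs before thresholding so that isolated specks are removed instead of being binarized into stray marks.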
03

Fraud Detection

Six heuristic checks flag anomalies: price-range validation, volume sanity checks, arithmetic cross-referencing of the extracted values, duplicate-amount detection, date presence, and OCR confidence scoring.

Heuristics · Validation · Anomaly Detection
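The six checks above could be expressed as simple rules over the extracted fields. This is a sketch, not the demo's implementation: every threshold is an assumption chosen for illustration, the field names (`gallons`, `pricePerGallon`, `total`, `date`) and the `runFraudChecks` function are hypothetical, and "duplicate amount" is interpreted here as a total that matches one from an earlier scan. The demo may define these checks differently.

```javascript
// Illustrative thresholds only; the demo's actual limits are not published.
const LIMITS = {
  pricePerGallon: { min: 1.5, max: 8.0 }, // plausible retail fuel price, USD per gallon
  gallons: { min: 0.5, max: 40 },         // plausible single fill-up volume
  mathTolerance: 0.05,                    // allowed rounding error, in dollars
  minConfidence: 60,                      // OCR confidence on an assumed 0-100 scale
};

/**
 * Runs six heuristic checks and returns a list of flag names (empty = nothing suspicious).
 * `fields` uses hypothetical keys { gallons, pricePerGallon, total, date };
 * `previousTotals` holds totals from earlier scans (an assumed meaning of "duplicate amount").
 */
function runFraudChecks(fields, ocrConfidence, previousTotals = []) {
  const { gallons, pricePerGallon, total, date } = fields;
  const has = (v) => typeof v === "number" && Number.isFinite(v);
  const flags = [];

  // 1. Price range validation.
  if (has(pricePerGallon) &&
      (pricePerGallon < LIMITS.pricePerGallon.min || pricePerGallon > LIMITS.pricePerGallon.max)) {
    flags.push("price_out_of_range");
  }
  // 2. Volume sanity.
  if (has(gallons) && (gallons < LIMITS.gallons.min || gallons > LIMITS.gallons.max)) {
    flags.push("volume_implausible");
  }
  // 3. Math cross-reference: gallons x price per gallon should match the printed total.
  if (has(gallons) && has(pricePerGallon) && has(total) &&
      Math.abs(gallons * pricePerGallon - total) > LIMITS.mathTolerance) {
    flags.push("math_mismatch");
  }
  // 4. Duplicate amount: the same total (to the cent) was already seen in an earlier scan.
  if (has(total) && previousTotals.some((t) => Math.round(t * 100) === Math.round(total * 100))) {
    flags.push("duplicate_amount");
  }
  // 5. Date presence.
  if (!date) flags.push("missing_date");
  // 6. OCR confidence: low confidence means the extracted values themselves are unreliable.
  if (!has(ocrConfidence) || ocrConfidence < LIMITS.minConfidence) flags.push("low_ocr_confidence");

  return flags;
}
```

Returning named flags rather than a single score keeps each heuristic's result visible, so a reviewer can see why a document was flagged. For a receipt with 12.5 gallons at 3.499 per gallon and a printed total of 43.74, check 3 passes because 12.5 × 3.499 = 43.7375 falls within the 0.05 tolerance.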
03 / Stack

Technology

Languages
JavaScript · HTML / CSS
Libraries
Tesseract.js v5 · Web Workers
Techniques
OCR / Machine Vision · Image Preprocessing · Fraud Detection Heuristics · Document Type Classification