OCR Document Scanner

This skills demo recreates a receipt-scanning pipeline from past contract work. The original system is closed-source under the contract terms, so this demo showcases the same techniques (document OCR, structured data extraction, and fraud detection) running entirely client-side in the browser.

01 / Demo

Interactive Demo

All processing runs in your browser; no data is sent to any server.

Upload a document image: a fuel receipt, retail receipt, invoice, or any other text document. The OCR engine auto-detects the document type, extracts structured data, and runs validation checks.
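
As a rough illustration of how document type auto-detection could work on OCR text, here is a minimal keyword-scoring sketch. The keyword lists, the type names, and the `classifyDocument` function are all hypothetical; the demo's actual classifier is not published and may work differently.

```javascript
// Hypothetical keyword lists per document type (assumed for illustration only).
const TYPE_KEYWORDS = {
  fuel_receipt: ["gallons", "litres", "liters", "pump", "unleaded", "diesel"],
  retail_receipt: ["subtotal", "cashier", "change due", "items sold", "tender"],
  invoice: ["invoice", "bill to", "due date", "payment terms", "po number"],
};

// Score each type by how many of its keywords appear in the OCR text and
// return the best match; fall back to "generic_text" when nothing matches.
function classifyDocument(ocrText) {
  const text = ocrText.toLowerCase();
  let best = { type: "generic_text", score: 0 };
  for (const [type, keywords] of Object.entries(TYPE_KEYWORDS)) {
    const score = keywords.filter((kw) => text.includes(kw)).length;
    if (score > best.score) best = { type, score };
  }
  return best;
}
```

Calling `classifyDocument(text).type` on the raw OCR output would then select which field extractors and validation checks to run. A keyword count is crude but cheap, which suits a pipeline that has to run in the browser.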

Supported image formats: PNG, JPG, WEBP.

02 / Pipeline

How It Works

Client-side processing pipeline. No data leaves your browser.

OCR Engine

Tesseract.js v5 processes receipt images via Web Workers. A key-value parser extracts structured fields, then regex patterns handle edge cases and fallbacks.

Tesseract.js, Web Workers, WASM
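
The two-pass extraction described for the OCR engine (a key-value parser first, regex fallbacks second) might look something like the sketch below. The function names, the `FALLBACKS` table, and the specific patterns are assumptions made for illustration, not the demo's actual code.

```javascript
// Pass 1: generic "Label: value" (or "Label = value") parser, one field per line.
function parseKeyValues(text) {
  const fields = {};
  for (const line of text.split(/\r?\n/)) {
    const m = line.match(/^\s*([A-Za-z][A-Za-z #\/]*?)\s*[:=]\s*(.+?)\s*$/);
    if (m) fields[m[1].toLowerCase()] = m[2];
  }
  return fields;
}

// Pass 2: hypothetical regex fallbacks for fields the first pass missed,
// for example when the OCR dropped the colon after a label.
const FALLBACKS = {
  total: /\btotal\b[^\d\n]*(\d+[.,]\d{2})/i,
  date: /\b(\d{1,2}[\/-]\d{1,2}[\/-]\d{2,4})\b/,
};

function extractFields(text) {
  const fields = parseKeyValues(text);
  for (const [name, pattern] of Object.entries(FALLBACKS)) {
    if (fields[name] === undefined) {
      const m = text.match(pattern);
      if (m) fields[name] = m[1];
    }
  }
  return fields;
}
```

In the browser, the input text would presumably come from Tesseract.js v5, where `createWorker('eng')` returns a worker and `worker.recognize(image)` resolves to a result whose `data.text` holds the recognized text. The sketch keeps the parsing separate from the OCR call so it can be exercised on plain strings.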

Image Preprocessing

A Web Worker pipeline upscales the image, converts it to grayscale, applies median noise reduction, normalizes contrast, and runs adaptive thresholding to improve OCR accuracy.

Web Workers, Canvas API, Adaptive Threshold
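
Two of the preprocessing stages, grayscale conversion and adaptive thresholding, can be sketched as pure functions over raw pixel arrays. This is a minimal illustration that assumes a mean-based threshold computed with an integral image; the `radius` and `offset` defaults are made-up values, and upscaling, median filtering, and contrast normalization are omitted.

```javascript
// Convert RGBA pixel data (laid out like ImageData.data) to one luminance byte per pixel.
function toGrayscale(rgba, width, height) {
  const gray = new Uint8ClampedArray(width * height);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[4 * i], g = rgba[4 * i + 1], b = rgba[4 * i + 2];
    gray[i] = 0.299 * r + 0.587 * g + 0.114 * b; // Rec. 601 luma weights
  }
  return gray;
}

// Mean-based adaptive threshold: a pixel becomes black (0) when it is more than
// `offset` darker than the mean of its (2 * radius + 1)^2 neighborhood, else white (255).
// The default radius and offset are illustrative assumptions.
function adaptiveThreshold(gray, width, height, radius = 7, offset = 10) {
  // Integral image with one row and one column of zero padding.
  const W = width + 1;
  const integral = new Float64Array(W * (height + 1));
  for (let y = 0; y < height; y++) {
    let rowSum = 0;
    for (let x = 0; x < width; x++) {
      rowSum += gray[y * width + x];
      integral[(y + 1) * W + (x + 1)] = integral[y * W + (x + 1)] + rowSum;
    }
  }
  const out = new Uint8ClampedArray(width * height);
  for (let y = 0; y < height; y++) {
    const y0 = Math.max(0, y - radius), y1 = Math.min(height - 1, y + radius);
    for (let x = 0; x < width; x++) {
      const x0 = Math.max(0, x - radius), x1 = Math.min(width - 1, x + radius);
      const area = (x1 - x0 + 1) * (y1 - y0 + 1);
      // Sum of the window, clipped at the image borders, in four lookups.
      const sum = integral[(y1 + 1) * W + (x1 + 1)] - integral[y0 * W + (x1 + 1)]
                - integral[(y1 + 1) * W + x0] + integral[y0 * W + x0];
      out[y * width + x] = gray[y * width + x] < sum / area - offset ? 0 : 255;
    }
  }
  return out;
}
```

In a browser the RGBA input would typically come from `getImageData` on a canvas 2D context, and the functions could run inside a Web Worker so the page stays responsive. Thresholding against a local mean rather than one global cutoff helps with unevenly lit receipt photos.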

Fraud Detection

Six heuristic checks flag anomalies: price range validation, volume sanity, math cross-referencing, duplicate amount detection, date presence, and OCR confidence scoring.

Heuristics, Validation, Anomaly Detection
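
The six checks listed above could be expressed as one function that returns a list of flags. Everything concrete in this sketch is an assumption: the field names on `receipt`, the thresholds in `LIMITS`, the flag labels, and the reading of "duplicate amount detection" as "the same total was already seen in this session".

```javascript
// All thresholds are illustrative assumptions, not the demo's real values.
const LIMITS = {
  minPricePerUnit: 0.5,
  maxPricePerUnit: 10,
  maxVolume: 500,
  minConfidence: 60, // assumes a 0-100 OCR confidence scale
  mathTolerance: 0.02,
};

// `receipt` is a hypothetical shape: { pricePerUnit, volume, total, date, confidence }.
// `seenAmounts` is a Set of previously seen totals, updated in place.
function runFraudChecks(receipt, seenAmounts = new Set()) {
  const { pricePerUnit, volume, total, date, confidence } = receipt;
  const flags = [];
  // 1. Price range validation
  if (pricePerUnit != null &&
      (pricePerUnit < LIMITS.minPricePerUnit || pricePerUnit > LIMITS.maxPricePerUnit)) {
    flags.push("price_out_of_range");
  }
  // 2. Volume sanity
  if (volume != null && (volume <= 0 || volume > LIMITS.maxVolume)) {
    flags.push("implausible_volume");
  }
  // 3. Math cross-reference: price x volume should match the printed total
  if (pricePerUnit != null && volume != null && total != null &&
      Math.abs(pricePerUnit * volume - total) > LIMITS.mathTolerance) {
    flags.push("math_mismatch");
  }
  // 4. Duplicate amount detection
  if (total != null) {
    const key = total.toFixed(2);
    if (seenAmounts.has(key)) flags.push("duplicate_amount");
    seenAmounts.add(key);
  }
  // 5. Date presence
  if (!date) flags.push("missing_date");
  // 6. OCR confidence scoring
  if (confidence != null && confidence < LIMITS.minConfidence) {
    flags.push("low_ocr_confidence");
  }
  return flags;
}
```

Returning flags instead of a single pass/fail verdict lets the interface explain why a document looks suspicious. Missing fields skip the checks that depend on them, so an incomplete OCR result is not double-penalized.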

03 / Stack

Technology

Languages

JavaScript, HTML / CSS

Libraries

Tesseract.js v5, Web Workers

Techniques

OCR / Machine Vision, Image Preprocessing, Fraud Detection Heuristics, Document Type Classification