Files
EverShelf/assets/js
dadaloop82 a6c2fb93cf feat: offline OCR (Tesseract) + embedding category classifier (@xenova/transformers)
Tesseract OCR (PHP, server-side):
- Dockerfile: adds tesseract-ocr + tesseract-ocr-ita + libgd-dev (gd extension)
- api/index.php: new tesseractReadExpiry() — decodes base64 image, pre-processes with GD (2× upscale, greyscale, auto-contrast, sharpen), runs tesseract CLI with ita+eng PSM-6, extracts date with multi-pattern regex (DD/MM/YYYY, MM/YYYY, ISO, named-month), returns YYYY-MM-DD + confidence
- geminiReadExpiry() now: (1) tries Tesseract first; (2) falls back to Gemini Vision if OCR returns null or no date found; (3) passes source ('ocr'|'gemini') in response

@xenova/transformers embedding classifier (browser-side):
- index.html: ES-module bootstrap that lazy-loads 'Xenova/all-MiniLM-L6-v2' quantized (~23 MB, cached in browser) via window._getCategoryPipeline(); pre-warms on first scan page visit
- assets/js/app.js: classifyCategoryByEmbedding(name) — embeds product name + 16 category anchor descriptions, cosine similarity, threshold 0.30; results cached in _embeddingCache Map
- autoDetectCategory(): after keyword map misses, fires classifyCategoryByEmbedding async and updates select when resolved (respects manuallySet flag)
- createQuickProduct(): if regex returned 'altro', silently patches category with embedding result via a background api call
2026-05-03 13:17:14 +00:00
..