EverShelf/Dockerfile
dadaloop82 a6c2fb93cf feat: offline OCR (Tesseract) + embedding category classifier (@xenova/transformers)
Tesseract OCR (PHP, server-side):
- Dockerfile: adds tesseract-ocr + tesseract-ocr-ita + libgd-dev (gd extension)
- api/index.php: new tesseractReadExpiry() — decodes base64 image, pre-processes with GD (2× upscale, greyscale, auto-contrast, sharpen), runs tesseract CLI with ita+eng PSM-6, extracts date with multi-pattern regex (DD/MM/YYYY, MM/YYYY, ISO, named-month), returns YYYY-MM-DD + confidence
- geminiReadExpiry() now: (1) tries Tesseract first; (2) falls back to Gemini Vision if OCR returns null or no date found; (3) passes source ('ocr'|'gemini') in response
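
The multi-pattern date extraction described above can be sketched as follows. This is a minimal illustration in JavaScript, not the PHP code in api/index.php: the helper names (`extractExpiryDate`, `toIso`), the exact regexes, their order, and the choice to map a bare MM/YYYY to the last day of that month are all assumptions made for the example.

```javascript
// Hypothetical sketch: pull an expiry date out of raw OCR text and
// normalise it to YYYY-MM-DD. Patterns and their order are assumptions.

// First three letters of Italian and English month names -> month number.
const MONTHS = {
  gen: 1, jan: 1, feb: 2, mar: 3, apr: 4, mag: 5, may: 5, giu: 6, jun: 6,
  lug: 7, jul: 7, ago: 8, aug: 8, set: 9, sep: 9, ott: 10, oct: 10,
  nov: 11, dic: 12, dec: 12,
};

const pad = (n) => String(n).padStart(2, "0");

// Last calendar day of month m (1-12): day 0 of the following month.
const lastDay = (y, m) => new Date(Date.UTC(y, m, 0)).getUTCDate();

// Validate the parts and format them; a null day means "end of month" (assumption).
function toIso(y, m, d) {
  if (m < 1 || m > 12) return null;
  const day = d ?? lastDay(y, m);
  if (day < 1 || day > lastDay(y, m)) return null;
  return `${y}-${pad(m)}-${pad(day)}`;
}

function extractExpiryDate(text) {
  let m;
  // ISO: 2026-05-03
  if ((m = text.match(/\b(\d{4})-(\d{2})-(\d{2})\b/))) {
    return toIso(+m[1], +m[2], +m[3]);
  }
  // DD/MM/YYYY (also accepts "." and "-" as separators)
  if ((m = text.match(/\b(\d{1,2})[\/.\-](\d{1,2})[\/.\-](\d{4})\b/))) {
    return toIso(+m[3], +m[2], +m[1]);
  }
  // Named month: "3 maggio 2026", "3 May 2026"
  if ((m = text.match(/\b(\d{1,2})\s+([a-z]{3})[a-z]*\.?\s+(\d{4})\b/i))) {
    const month = MONTHS[m[2].toLowerCase()];
    if (month) return toIso(+m[3], month, +m[1]);
  }
  // MM/YYYY: no day printed on the label
  if ((m = text.match(/\b(\d{1,2})[\/.\-](\d{4})\b/))) {
    return toIso(+m[2], +m[1], null);
  }
  return null; // no date found: the caller would fall back to Gemini Vision
}
```

For example, `extractExpiryDate("DA CONSUMARSI ENTRO 03/05/2026")` returns `"2026-05-03"`, and `extractExpiryDate("EXP 02/2027")` returns `"2027-02-28"`. Trying the most specific patterns first keeps a full date from being misread as a bare month and year.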

@xenova/transformers embedding classifier (browser-side):
- index.html: ES-module bootstrap that lazy-loads 'Xenova/all-MiniLM-L6-v2' quantized (~23 MB, cached in browser) via window._getCategoryPipeline(); pre-warms on first scan page visit
- assets/js/app.js: classifyCategoryByEmbedding(name) — embeds product name + 16 category anchor descriptions, cosine similarity, threshold 0.30; results cached in _embeddingCache Map
- autoDetectCategory(): after keyword map misses, fires classifyCategoryByEmbedding async and updates select when resolved (respects manuallySet flag)
- createQuickProduct(): if regex returned 'altro', silently patches category with embedding result via a background api call
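
The embedding classifier described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in assets/js/app.js: `makeCategoryClassifier` and its `embed` parameter are hypothetical names, and `embed` stands in for whatever the real code obtains from the @xenova/transformers pipeline (an async function from text to a numeric vector). Only the cosine-similarity comparison, the 0.30 threshold, and the Map cache come from the commit message.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Hypothetical factory. `embed` is an assumed async (text) => number[];
// `anchors` maps a category id to its anchor description.
function makeCategoryClassifier(embed, anchors, threshold = 0.30) {
  const cache = new Map();   // product name -> category id or null
  let anchorVectors = null;  // anchor embeddings, computed on first use

  return async function classify(name) {
    const key = name.trim().toLowerCase();
    if (cache.has(key)) return cache.get(key);

    if (!anchorVectors) {
      anchorVectors = await Promise.all(
        Object.entries(anchors).map(
          async ([category, text]) => [category, await embed(text)]
        )
      );
    }
    const vec = await embed(key);

    // Pick the anchor whose embedding is closest to the product name.
    let best = null;
    let bestScore = -Infinity;
    for (const [category, anchorVec] of anchorVectors) {
      const score = cosine(vec, anchorVec);
      if (score > bestScore) {
        best = category;
        bestScore = score;
      }
    }

    // Below the threshold, report "no confident match" instead of guessing.
    const result = bestScore >= threshold ? best : null;
    cache.set(key, result);
    return result;
  };
}
```

A caller would treat a `null` result as "keep the current category" and must still check any manually-set flag once the promise resolves, since the user may have picked a category while the embedding was being computed.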
2026-05-03 13:17:14 +00:00


FROM php:8.2-apache
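# Install required PHP extensions + Tesseract OCR for offline expiry date reading.
# gd is configured with JPEG support; without the configure step it is built
# PNG-only (this assumes the uploaded photos may be JPEG).
RUN apt-get update && apt-get install -y \
        libsqlite3-dev \
        libcurl4-openssl-dev \
        libonig-dev \
        libgd-dev \
        libjpeg-dev \
        tesseract-ocr \
        tesseract-ocr-ita \
        tesseract-ocr-eng \
    && docker-php-ext-configure gd --with-jpeg \
    && docker-php-ext-install pdo_sqlite curl mbstring gd \
    && apt-get clean && rm -rf /var/lib/apt/lists/*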
# Enable Apache mod_rewrite and mod_headers
RUN a2enmod rewrite headers
# Set working directory
WORKDIR /var/www/html
# Copy application files
COPY . /var/www/html/
# Create data directory with proper permissions
RUN mkdir -p /var/www/html/data/backups \
    && chown -R www-data:www-data /var/www/html/data \
    && chmod -R 775 /var/www/html/data
# Create .env from example if it doesn't exist (will be overridden by volume mount)
RUN [ ! -f /var/www/html/.env ] && cp /var/www/html/.env.example /var/www/html/.env || true
# Apache configuration: serve from app root
RUN echo '<Directory /var/www/html>\n\
AllowOverride All\n\
Require all granted\n\
</Directory>' > /etc/apache2/conf-available/evershelf.conf \
&& a2enconf evershelf
# Expose port 80
EXPOSE 80
# Health check
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
    CMD curl -f http://localhost/ || exit 1
CMD ["apache2-foreground"]