Settings

Settings are read from environment variables and a .env file. To change a setting, edit the .env file on disk and restart the service.
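
A minimal sketch of how settings like these could be loaded, using only the standard library. It is not crawltracker's actual loader: the function names parse_env_file and load_settings are hypothetical, and the precedence shown (real environment variables override .env entries) is an assumption based on common convention.

```python
import os
from pathlib import Path

# Setting names as listed under "Active configuration".
SETTING_NAMES = (
    "CRAWLTRACKER_DB",
    "CRAWLTRACKER_LOG_FILE",
    "CRAWLTRACKER_SITEMAP_URL",
    "CRAWLTRACKER_VERIFY_TTL_DAYS",
)


def parse_env_file(path):
    """Parse KEY=VALUE lines from a .env file, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_settings(env_file=".env", environ=None):
    """Return the settings that are defined, as strings.

    Assumption: a variable set in the process environment wins over .env.
    """
    environ = os.environ if environ is None else environ
    file_values = parse_env_file(env_file)
    settings = {}
    for name in SETTING_NAMES:
        if name in environ:
            settings[name] = environ[name]
        elif name in file_values:
            settings[name] = file_values[name]
    return settings
```

Because values are read once at startup in this sketch, a change to .env has no effect until the process restarts, which matches the restart requirement above.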

Active configuration

Variable                       Value
CRAWLTRACKER_DB                /opt/crawltracker/crawltracker.db
CRAWLTRACKER_LOG_FILE          /var/log/caddy/straten-access.log
CRAWLTRACKER_SITEMAP_URL       https://straten.jiskta.com/sitemap-index.xml
CRAWLTRACKER_VERIFY_TTL_DAYS   7

Caddy configuration note

For accurate Googlebot verification, Caddy must log the real visitor IP from CF-Connecting-IP rather than the Cloudflare proxy IP.
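
To illustrate why the logged IP matters, here is a sketch of the reverse-then-forward DNS check that Google documents for verifying Googlebot. It is not taken from crawltracker's code; the function names are hypothetical, and the resolver functions are passed in as parameters so the logic can be exercised without network access. If the log holds a Cloudflare proxy IP instead of the visitor IP, the reverse lookup cannot resolve to a Google hostname and the check fails.

```python
import ipaddress
import socket

# Hostname suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the set of addresses (as strings) that hostname resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def is_verified_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Reverse-resolve ip, check the domain, then confirm with a forward lookup."""
    hostname = reverse(ip)
    if not hostname:
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    # The forward lookup must lead back to the original address; otherwise
    # anyone controlling their own PTR record could claim a Google hostname.
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(addr) == target for addr in forward(hostname))
```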

Add the following to your global Caddy config and site block:

{
    servers {
        trusted_proxies static 103.21.244.0/22 103.22.200.0/22 103.31.4.0/22 \
            104.16.0.0/13 104.24.0.0/14 108.162.192.0/18 131.0.72.0/22 \
            141.101.64.0/18 162.158.0.0/15 172.64.0.0/13 173.245.48.0/20 \
            188.114.96.0/20 190.93.240.0/20 197.234.240.0/22 198.41.128.0/17 \
            2400:cb00::/32 2606:4700::/32 2803:f800::/32 2405:b500::/32 \
            2405:8100::/32 2a06:98c0::/29 2c0f:f248::/32
        client_ip_headers CF-Connecting-IP X-Forwarded-For
    }
}

straten.jiskta.com {
    log {
        output file /var/log/caddy/straten-access.log {
            roll_size 100MiB
            roll_keep 10
        }
        format json
    }
    file_server
    root * /opt/straten/site
}

⚠️ Only trust CF-Connecting-IP and X-Forwarded-For when the request comes from a known Cloudflare IP range; any other client can set these headers to an arbitrary value. The trusted_proxies directive ensures Caddy only uses the headers from trusted sources. Current Cloudflare IP ranges: cloudflare.com/ips
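
The rule in the warning can be sketched in Python as follows. This illustrates the idea only and is not Caddy's implementation: the function names are hypothetical, the headers argument is assumed to be a plain dict keyed by the exact header names, and taking the leftmost X-Forwarded-For entry is a simplification. The ranges are copied from the trusted_proxies list above.

```python
import ipaddress

# Cloudflare ranges from the trusted_proxies directive above.
CLOUDFLARE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in """
    103.21.244.0/22 103.22.200.0/22 103.31.4.0/22
    104.16.0.0/13 104.24.0.0/14 108.162.192.0/18 131.0.72.0/22
    141.101.64.0/18 162.158.0.0/15 172.64.0.0/13 173.245.48.0/20
    188.114.96.0/20 190.93.240.0/20 197.234.240.0/22 198.41.128.0/17
    2400:cb00::/32 2606:4700::/32 2803:f800::/32 2405:b500::/32
    2405:8100::/32 2a06:98c0::/29 2c0f:f248::/32
    """.split()
]


def is_trusted_proxy(remote_ip):
    """True if the directly connected peer is inside a listed Cloudflare range."""
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in CLOUDFLARE_RANGES)


def client_ip(remote_ip, headers):
    """Return the visitor IP, honoring proxy headers only from trusted peers."""
    if not is_trusted_proxy(remote_ip):
        # Untrusted peer: ignore whatever headers it sent.
        return remote_ip
    for name in ("CF-Connecting-IP", "X-Forwarded-For"):
        value = headers.get(name)
        if not value:
            continue
        # X-Forwarded-For may carry a comma-separated chain; use the first entry.
        candidate = value.split(",")[0].strip()
        try:
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            continue  # Malformed header value: try the next header.
    return remote_ip
```

A request arriving from 203.0.113.9 (not a Cloudflare address) keeps that address even if it sends CF-Connecting-IP, while the same header from 172.64.0.5 is honored.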

CLI commands

python -m crawltracker ingest-logs --file /var/log/caddy/straten-access.log
python -m crawltracker ingest-sitemap --url https://straten.jiskta.com/sitemap-index.xml
python -m crawltracker verify-ips
python -m crawltracker serve --host 127.0.0.1 --port 8787
python -m crawltracker report summary