Provider-Crawl/CLAUDE.md
Richie cc91427789 Initial commit: funeral provider discovery pipeline
Python crawlers for VIC Register, Funerals Australia, NFDA
n8n workflows for scheduled discovery and enrichment
SQLite schema and seeded dev database (1,463 providers)
End-to-end process documentation in n8n/PROCESS.md
2026-04-24 10:27:08 +10:00

Claude Code orientation

You've been handed a funeral-provider discovery pipeline. Before doing anything:

  1. Read README.md for the repo layout.
  2. Read n8n/PROCESS.md for the end-to-end flow and how data conforms to the DB schema. This is the authoritative doc.
  3. Read crawlers/PIPELINE.md for Python module internals.

Project shape

  • crawlers/ — Python modules, one per data source. Invoked either by run_overnight.sh (manual) or by n8n workflows via executeCommand.
  • n8n/workflows/*.json — four scheduled workflows that drive the pipeline end-to-end.
  • database/providers.db — live SQLite snapshot (~1,463 providers, 121 with pricing). Safe to inspect; re-creatable from schema_sqlite.sql.
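
Since the snapshot is described as safe to inspect, one cautious way to look around is to open it read-only so that nothing can be modified by accident. The following is a minimal sketch, not part of the repo: the helper names are hypothetical, and the only table name taken from this document is funeral_brand.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file in read-only mode (hypothetical helper)."""
    # A file: URI with mode=ro makes SQLite reject every write statement.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the table names recorded in the SQLite catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def count_rows(conn: sqlite3.Connection, table: str) -> int:
    """Count rows in a table after checking that the table really exists."""
    # Table names cannot be bound as parameters, so validate against the catalog.
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
```

For example, `count_rows(open_read_only("database/providers.db"), "funeral_brand")` would report how many brand rows the snapshot holds, assuming that table is present in the SQLite schema.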

Key constraints

  • Never write to funeral_brand.verified or funeral_brand.hidden — those are admin-only. The pipeline keeps providers hidden and unverified until a human reviews them.
  • Do not use Gathered Here data as a source of truth. It's a competitor. crawl_gathered_here.py exists as historical tooling but isn't part of the active pipeline — all enrichment comes from providers' own websites or regulatory disclosure PDFs.
  • Listing tier is computed, not stored as the source of truth. compute_tiers.py derives it from package/inclusion data. Don't set it manually.
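
The first constraint can be enforced in code rather than by convention. The sketch below is an illustration under stated assumptions, not the pipeline's actual write path: build_brand_update is a hypothetical helper, and the id primary key and the website column are assumed for the example. Only funeral_brand, verified, and hidden come from this document.

```python
# Columns the pipeline must never write; per the constraint above they are admin-only.
ADMIN_ONLY_COLUMNS = frozenset({"verified", "hidden"})


def build_brand_update(brand_id: int, changes: dict) -> tuple[str, list]:
    """Build a parameterized UPDATE for funeral_brand, refusing admin-only columns.

    Hypothetical helper: assumes funeral_brand has an `id` primary key.
    """
    blocked = sorted(ADMIN_ONLY_COLUMNS & changes.keys())
    if blocked:
        # Fail loudly instead of silently dropping the fields, so bugs surface early.
        raise PermissionError(f"admin-only columns in update: {blocked}")
    if not changes:
        raise ValueError("no columns to update")
    for column in changes:
        # Column names cannot be bound as parameters, so allow plain identifiers only.
        if not column.isidentifier():
            raise ValueError(f"invalid column name: {column!r}")
    assignments = ", ".join(f"{column} = ?" for column in changes)
    sql = f"UPDATE funeral_brand SET {assignments} WHERE id = ?"
    return sql, [*changes.values(), brand_id]
```

Raising instead of stripping the offending keys is a deliberate choice here: a crawler that tries to set verified or hidden has a bug worth seeing, not one worth hiding.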

Running locally

You'll need a Serper API key (free 2,500/mo at serper.dev) for website discovery, and an Anthropic key for the AI pricing extraction in Workflow 3. Everything else runs without keys.

cp crawlers/config.example.json crawlers/config.json
# add keys to config.json
cd crawlers && ./run_overnight.sh
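
Before starting a long run, it can help to confirm that the keys are actually filled in. This is a rough pre-flight sketch only: the field names serper_api_key and anthropic_api_key are assumptions made for illustration, so check config.example.json for the real ones.

```python
import json
from pathlib import Path

# Hypothetical field names; the real ones are defined in config.example.json.
REQUIRED_KEYS = ("serper_api_key",)      # needed for website discovery
OPTIONAL_KEYS = ("anthropic_api_key",)   # only needed for AI pricing extraction


def check_config(path: str) -> tuple[list[str], list[str]]:
    """Return (missing required keys, missing optional keys) for a config file."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    # Treat absent keys and empty strings the same way: the key is not usable.
    missing_required = [key for key in REQUIRED_KEYS if not config.get(key)]
    missing_optional = [key for key in OPTIONAL_KEYS if not config.get(key)]
    return missing_required, missing_optional
```

A non-empty first list means website discovery cannot run; a non-empty second list only means the pricing extraction step would have to be skipped.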

Things that aren't here

  • No live secrets / API keys: crawlers/config.json is gitignored, so use config.example.json as a template.
  • No admin review UI: that's a separate frontend project.
  • No Postgres migration tooling: database/schema.sql is the target schema, but the repo uses SQLite for dev.