- Python crawlers for VIC Register, Funerals Australia, NFDA
- n8n workflows for scheduled discovery and enrichment
- SQLite schema and seeded dev database (1,463 providers)
- End-to-end process documentation in `n8n/PROCESS.md`
Claude Code orientation
You've been handed a funeral-provider discovery pipeline. Before doing anything:
- Read `README.md` for the repo layout.
- Read `n8n/PROCESS.md` for the end-to-end flow and how data conforms to the DB schema. This is the authoritative doc.
- Read `crawlers/PIPELINE.md` for Python module internals.
Project shape
- `crawlers/`: Python modules, one per data source. Invoked either by `run_overnight.sh` (manual) or by n8n workflows via `executeCommand`.
- `n8n/workflows/*.json`: four scheduled workflows that drive the pipeline end-to-end.
- `database/providers.db`: live SQLite snapshot (~1,463 providers, 121 with pricing). Safe to inspect; re-creatable from `schema_sqlite.sql`.
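Because `database/providers.db` is a live snapshot, a read-only connection is a cheap way to inspect it without risking accidental writes. The following is a minimal sketch using Python's standard `sqlite3` module. It assumes only that the path above is correct. It makes no assumptions about table names: it lists every user table together with its row count.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every user table, opened read-only."""
    # mode=ro makes SQLite reject writes and refuse to create a missing file.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Names come from sqlite_master; escape embedded quotes before quoting.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

From the repo root, `table_row_counts("database/providers.db")` gives a quick overview of the snapshot without modifying it.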
Key constraints
- Never write to `funeral_brand.verified` or `funeral_brand.hidden`: those are admin-only. The pipeline keeps providers hidden and unverified until a human reviews them.
- Do not use Gathered Here data as a source of truth; it's a competitor. `crawl_gathered_here.py` exists as historical tooling but isn't part of the active pipeline. All enrichment comes from providers' own websites or regulatory disclosure PDFs.
- Listing tier is computed, not stored as the source of truth. `compute_tiers.py` derives it from package/inclusion data. Don't set it manually.
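One way to make the first constraint hard to violate by accident is to route pipeline updates through a helper that refuses admin-only columns. The sketch below is hypothetical. `update_brand`, the `id` key column and the `ADMIN_ONLY` set are illustrative names and are not taken from the repo. Only the `funeral_brand` table and its `verified`/`hidden` columns come from this document.

```python
import sqlite3

# Hypothetical guard: columns only a human reviewer may change.
ADMIN_ONLY = frozenset({"verified", "hidden"})


def update_brand(conn: sqlite3.Connection, brand_id: int, fields: dict) -> None:
    """Hypothetical pipeline helper: UPDATE funeral_brand, never touching admin-only columns."""
    blocked = ADMIN_ONLY.intersection(fields)
    if blocked:
        raise ValueError(f"pipeline may not write admin-only columns: {sorted(blocked)}")
    if not fields:
        return
    # Column names come from pipeline code, not external input; values are bound as parameters.
    assignments = ", ".join(f'"{col}" = ?' for col in fields)
    conn.execute(
        f"UPDATE funeral_brand SET {assignments} WHERE id = ?",
        [*fields.values(), brand_id],
    )
```

Rejecting the whole update, instead of silently dropping the blocked keys, makes a misbehaving crawler fail loudly instead of half-applying its changes.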
Running locally
You'll need a Serper API key (free 2,500/mo at serper.dev) for website discovery. Everything else runs without keys, except AI pricing extraction in Workflow 3, which needs an Anthropic API key.
```shell
cp crawlers/config.example.json crawlers/config.json
# add keys to config.json
cd crawlers && ./run_overnight.sh
```
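Before starting an overnight run, it can help to confirm which stages your `config.json` actually enables. Below is a minimal sketch. It assumes, hypothetically, that the key fields are top-level entries named `serper_api_key` and `anthropic_api_key`. Check `config.example.json` for the real field names before relying on it.

```python
import json
from pathlib import Path

# Hypothetical key names; the real ones are defined in crawlers/config.example.json.
REQUIRED_KEYS = {
    "serper_api_key": "website discovery (Serper)",
    "anthropic_api_key": "AI pricing extraction (Workflow 3)",
}


def missing_features(config_path: str) -> list[str]:
    """List the features that will be unavailable because their key is absent or blank."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return [
        feature
        for key, feature in REQUIRED_KEYS.items()
        if not str(config.get(key) or "").strip()
    ]
```

`missing_features("crawlers/config.json")` returns an empty list when both keys are filled in. The check only detects missing or blank values, so a placeholder copied unchanged from the template would still count as present.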
Things that aren't here
- No live secrets or API keys: `crawlers/config.json` is gitignored; use `config.example.json` as a template.
- No admin review UI: that's a separate frontend project.
- No Postgres migration tooling: `database/schema.sql` is the target, but the repo uses SQLite for dev.