- Python crawlers for VIC Register, Funerals Australia, NFDA
- n8n workflows for scheduled discovery and enrichment
- SQLite schema and seeded dev database (1,463 providers)
- End-to-end process documentation in n8n/PROCESS.md
N8N Workflow Setup
For a plain-English walkthrough of what the pipeline does end-to-end and how
its output conforms to the database schema, see PROCESS.md.
Prerequisites
- Docker & Docker Compose
- API keys (see below)
API Keys
Create crawlers/config.json from the template:
cp crawlers/config.example.json crawlers/config.json
| Key | Service | Cost | Get it at |
|---|---|---|---|
| serper_api_key | Serper.dev (Google search) | 2,500 free | https://serper.dev |
| abr_guid | ABR (ABN lookup) | Free | https://abr.business.gov.au/Tools/WebServices |
| anthropic_api_key | Claude Haiku (AI extraction) | ~$2/full run | https://console.anthropic.com |
Also set ANTHROPIC_API_KEY as an N8N credential/environment variable.
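A missing key tends to surface only partway through a run, so it can help to check the config up front. The sketch below assumes crawlers/config.json is a flat JSON object with the three keys from the table at the top level; the actual layout of config.example.json may differ, and the helper names `missing_keys` and `load_config` are hypothetical.

```python
import json
from pathlib import Path

# Key names taken from the API key table; a flat top-level layout is an assumption.
REQUIRED_KEYS = ("serper_api_key", "abr_guid", "anthropic_api_key")


def missing_keys(config: dict) -> list[str]:
    """Return the required keys that are absent or blank in config."""
    return [key for key in REQUIRED_KEYS if not str(config.get(key) or "").strip()]


def load_config(path: str = "crawlers/config.json") -> dict:
    """Load the JSON config and fail early if any required key is missing."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = missing_keys(config)
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return config
```

Calling `load_config()` from the repository root before starting a crawl would raise a `ValueError` naming any key that still needs to be filled in.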
Start N8N
cd n8n/
docker compose up -d
N8N will be available at http://localhost:5678
Import Workflows
In the N8N UI:
- Go to Workflows → Import from File
- Import each file from n8n/workflows/:
  - 1_weekly_discovery.json: discovers new providers from registries
  - 2_daily_website_discovery.json: finds provider websites
  - 3_daily_enrichment.json: crawls sites & AI-extracts pricing
  - 4_monthly_refresh.json: re-checks pricing for stale data
- Activate each workflow
Workflow Schedule
| # | Workflow | Schedule | What It Does |
|---|---|---|---|
| 1 | Weekly Discovery | Mon 2am AEST | Crawls VIC Register, Funerals Australia, NFDA, then deduplicates |
| 2 | Daily Website Discovery | 4am AEST | Finds websites for 100 providers/day |
| 3 | Daily Enrichment | 6am AEST | Crawls 50 websites/day → AI extracts pricing |
| 4 | Monthly Refresh | 1st of month, 3am | Re-checks pricing older than 30 days |
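For reference, the schedule in the table can be written as standard five-field cron expressions. This is a sketch only: the expressions are inferred from the table, not read from the workflow JSON files, the dictionary keys are hypothetical labels, and the times are assumed to be interpreted in the n8n instance's configured time zone. The small matcher handles only plain numbers and `*`, which is enough for these four schedules.

```python
from datetime import datetime

# Assumed cron equivalents of the schedule table: minute hour day month weekday.
SCHEDULES = {
    "weekly_discovery": "0 2 * * 1",         # Mondays at 02:00
    "daily_website_discovery": "0 4 * * *",  # every day at 04:00
    "daily_enrichment": "0 6 * * *",         # every day at 06:00
    "monthly_refresh": "0 3 1 * *",          # 1st of the month at 03:00
}


def cron_matches(expr: str, when: datetime) -> bool:
    """Check a datetime against a cron expression made only of numbers and '*'."""
    fields = expr.split()
    # Cron counts Sunday as 0; datetime.weekday() counts Monday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    actual = (when.minute, when.hour, when.day, when.month, cron_weekday)
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))
```

For example, `cron_matches(SCHEDULES["weekly_discovery"], datetime(2024, 1, 1, 2, 0))` is true because 1 January 2024 was a Monday.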
Workflow Flow
  Mon 2am        Daily 4am      Daily 6am       Monthly
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│ Registry │   │   ABN    │   │  Crawl   │   │  Reset   │
│ Crawlers │   │  Lookup  │   │ Websites │   │  Stale   │
│ (VIC,FA, │   │  (free)  │   │ (50/day) │   │Providers │
│  NFDA)   │   │          │   │          │   │          │
└────┬─────┘   └────┬─────┘   └────┬─────┘   └────┬─────┘
     │              │              │              │
     ▼              ▼              ▼              ▼
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│  Dedup   │   │  Serper  │   │  Claude  │   │Re-enrich │
│ & Merge  │   │  Search  │   │ Haiku AI │   │  Batch   │
│          │   │(100/day) │   │ Extract  │   │          │
└────┬─────┘   └────┬─────┘   └────┬─────┘   └────┬─────┘
     │              │              │              │
     ▼              ▼              ▼              ▼
New providers  Websites found Packages &     Updated tiers
queued         in DB          tiers updated
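The Monthly column's "Reset Stale Providers" step depends on a staleness rule, which the schedule table gives as pricing older than 30 days. A minimal sketch of that rule follows. The function name, the idea of a per-provider "last checked" timestamp, and the choice to treat a never-checked provider as stale are all assumptions for illustration, not the pipeline's actual logic.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_stale(last_checked: Optional[datetime], now: datetime,
             max_age_days: int = 30) -> bool:
    """Return True when pricing was last checked more than max_age_days ago."""
    if last_checked is None:
        # Assumption: a provider that was never checked is due for enrichment.
        return True
    return now - last_checked > timedelta(days=max_age_days)
```

Both datetimes should be either naive or time-zone-aware; mixing the two raises a `TypeError` in Python.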
Manual Run
You can also run the pipeline manually without N8N:
cd crawlers/
# Full pipeline
python3 crawl_all.py
python3 dedup.py
python3 lookup_abn.py --limit=100
python3 discover_websites.py --limit=100
python3 enrich_websites.py --limit=50
python3 compute_tiers.py
# Test mode
python3 crawl_all.py --test
python3 discover_websites.py --limit=5 --state=VIC
python3 enrich_websites.py --limit=3
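When running the full pipeline by hand, a later step should not run if an earlier one failed. The sketch below wraps the same commands in a small runner that stops at the first non-zero exit code. The script names and flags are copied from the commands above; the `run_pipeline` wrapper itself is hypothetical and not part of the repository.

```python
import subprocess
import sys

# Steps and flags copied from the "Full pipeline" commands.
FULL_PIPELINE = [
    ["crawl_all.py"],
    ["dedup.py"],
    ["lookup_abn.py", "--limit=100"],
    ["discover_websites.py", "--limit=100"],
    ["enrich_websites.py", "--limit=50"],
    ["compute_tiers.py"],
]


def run_pipeline(steps=FULL_PIPELINE, cwd="crawlers", runner=subprocess.run):
    """Run each step in order; return the first non-zero exit code, else 0."""
    for step in steps:
        # Use the current interpreter so the steps share one Python environment.
        command = [sys.executable, *step]
        print("running:", " ".join(step))
        result = runner(command, cwd=cwd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

From the repository root, `sys.exit(run_pipeline())` would run all six steps. The `runner` parameter exists only so the sequencing can be exercised without launching the real crawlers.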
Database
The pipeline uses SQLite at database/providers.db for the demo.
A Postgres schema is at database/schema.sql for production.
To reset:
rm database/providers.db
sqlite3 database/providers.db < database/schema_sqlite.sql
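The same reset can be done from Python with the standard-library `sqlite3` module, which is convenient where the `sqlite3` command-line tool is not installed. This is a sketch under the assumption that schema_sqlite.sql is a plain SQL script; the `reset_database` helper is hypothetical.

```python
import sqlite3
from pathlib import Path


def reset_database(db_path="database/providers.db",
                   schema_path="database/schema_sqlite.sql") -> None:
    """Delete the SQLite file, then recreate it from the schema script."""
    # Read the schema first so a missing schema file does not cost us the old DB.
    schema_sql = Path(schema_path).read_text(encoding="utf-8")
    db_file = Path(db_path)
    db_file.unlink(missing_ok=True)  # like `rm`, but no error if the file is absent
    connection = sqlite3.connect(db_file)
    try:
        connection.executescript(schema_sql)
        connection.commit()
    finally:
        connection.close()
```

Run it from the repository root so the default relative paths resolve.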