Provider-Crawl/n8n/README.md
Richie cc91427789 Initial commit: funeral provider discovery pipeline
Python crawlers for VIC Register, Funerals Australia, NFDA
n8n workflows for scheduled discovery and enrichment
SQLite schema and seeded dev database (1,463 providers)
End-to-end process documentation in n8n/PROCESS.md
2026-04-24 10:27:08 +10:00

N8N Workflow Setup

For a plain-English walkthrough of what the pipeline does end-to-end and how its output conforms to the database schema, see PROCESS.md.

Prerequisites

  • Docker & Docker Compose
  • API keys (see below)

API Keys

Create crawlers/config.json from the template:

cp crawlers/config.example.json crawlers/config.json
| Key | Service | Cost | Get it at |
|---|---|---|---|
| serper_api_key | Serper.dev (Google search) | 2,500 free queries | https://serper.dev |
| abr_guid | ABR (ABN lookup) | Free | https://abr.business.gov.au/Tools/WebServices |
| anthropic_api_key | Claude Haiku (AI extraction) | ~$2 per full run | https://console.anthropic.com |
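Before the first run it can help to confirm that all three keys are filled in. The sketch below assumes config.json is a flat JSON object keyed exactly as in the table; missing_keys and check_config are hypothetical helpers, not part of the crawlers.

```python
import json
from pathlib import Path

# Keys from the API key table; the flat-object layout of config.json is an assumption.
REQUIRED_KEYS = ("serper_api_key", "abr_guid", "anthropic_api_key")


def missing_keys(config: dict) -> list[str]:
    """Return required keys that are absent, not strings, or blank."""
    missing = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            missing.append(key)
    return missing


def check_config(path: str = "crawlers/config.json") -> list[str]:
    """Load the config file and report which required keys still need values."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    return missing_keys(config)
```

Call check_config() from the repository root; an empty list means every key has a value.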

Also set ANTHROPIC_API_KEY as an N8N credential or environment variable.

Start N8N

cd n8n/
docker compose up -d

N8N is then available at http://localhost:5678.
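docker compose up -d returns before the container is ready to serve requests. For scripts that need to wait, here is a minimal sketch, assuming the instance exposes n8n's /healthz endpoint (present in current n8n releases; confirm for the image you run). wait_for_n8n is a hypothetical helper using only the standard library.

```python
import time
import urllib.error
import urllib.request


def wait_for_n8n(base_url: str = "http://localhost:5678", timeout: float = 60.0) -> bool:
    """Poll /healthz until it answers 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/healthz", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # container still starting, or nothing listening on the port yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(1)
```

wait_for_n8n() returns True once the UI is reachable and False if it never comes up within the timeout.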

Import Workflows

In the N8N UI:

  1. Go to Workflows → Import from File
  2. Import each file from n8n/workflows/:
    • 1_weekly_discovery.json — discovers new providers from registries
    • 2_daily_website_discovery.json — finds provider websites
    • 3_daily_enrichment.json — crawls sites & AI-extracts pricing
    • 4_monthly_refresh.json — re-checks pricing for stale data
  3. Activate each workflow
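If an import fails, a quick local check of the four files can rule out a missing or truncated export. This sketch assumes each file is a standard n8n workflow export, that is, a JSON object with a nodes list; check_workflows is a hypothetical helper.

```python
import json
from pathlib import Path

# File names from the import list above.
EXPECTED = (
    "1_weekly_discovery.json",
    "2_daily_website_discovery.json",
    "3_daily_enrichment.json",
    "4_monthly_refresh.json",
)


def check_workflows(folder: str = "n8n/workflows") -> dict[str, str]:
    """Map each expected file to 'ok' or a short description of the problem."""
    report = {}
    for name in EXPECTED:
        path = Path(folder) / name
        if not path.is_file():
            report[name] = "missing"
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            report[name] = f"invalid JSON: {exc.msg}"
            continue
        # A workflow export without a nodes list will not import as a workflow.
        if isinstance(data, dict) and isinstance(data.get("nodes"), list):
            report[name] = "ok"
        else:
            report[name] = "no 'nodes' list"
    return report
```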

Workflow Schedule

| # | Workflow | Schedule | What it does |
|---|---|---|---|
| 1 | Weekly Discovery | Mon 2am AEST | Crawls VIC Register, Funerals AU, NFDA → dedup |
| 2 | Daily Website Discovery | Daily, 4am AEST | Finds websites for 100 providers/day |
| 3 | Daily Enrichment | Daily, 6am AEST | Crawls 50 websites/day → AI extracts pricing |
| 4 | Monthly Refresh | 1st of month, 3am | Re-checks pricing older than 30 days |
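To sanity-check when each workflow should next fire, the schedule table can be expressed in code. This sketch treats every time as fixed AEST (UTC+10). That is an assumption: Victoria moves to AEDT (UTC+11) in summer, and the actual trigger times depend on the timezone configured in N8N. The workflow keys and rule labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+10; this sketch ignores the switch to AEDT (UTC+11) in summer.
AEST = timezone(timedelta(hours=10), "AEST")

# Start hour and repeat rule per workflow, copied from the schedule table.
SCHEDULES = {
    "weekly_discovery": (2, "weekly_monday"),
    "daily_website_discovery": (4, "daily"),
    "daily_enrichment": (6, "daily"),
    "monthly_refresh": (3, "monthly_first"),
}


def next_run(workflow: str, now: datetime) -> datetime:
    """Return the next scheduled start strictly after `now` (an aware datetime)."""
    hour, rule = SCHEDULES[workflow]
    now = now.astimezone(AEST)
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if rule == "daily":
        if candidate <= now:
            candidate += timedelta(days=1)
    elif rule == "weekly_monday":
        candidate += timedelta(days=(0 - now.weekday()) % 7)  # Monday == 0
        if candidate <= now:
            candidate += timedelta(days=7)
    elif rule == "monthly_first":
        candidate = candidate.replace(day=1)
        if candidate <= now:
            # Roll over to the 1st of next month, wrapping December into January.
            if now.month == 12:
                candidate = candidate.replace(year=now.year + 1, month=1)
            else:
                candidate = candidate.replace(month=now.month + 1)
    return candidate
```

In standard five-field cron syntax the same schedules would be 0 2 * * 1, 0 4 * * *, 0 6 * * * and 0 3 1 * *; whether the exported workflows use cron expressions or N8N's interval fields is not stated here.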

Workflow Flow

  Mon 2am         Daily 4am       Daily 6am       Monthly
  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
  │ Registry │    │   ABN    │    │  Crawl   │    │  Reset   │
  │ Crawlers │    │  Lookup  │    │ Websites │    │  Stale   │
  │ (VIC,FA, │    │  (free)  │    │ (50/day) │    │Providers │
  │  NFDA)   │    │          │    │          │    │          │
  └────┬─────┘    └────┬─────┘    └────┬─────┘    └────┬─────┘
       │               │               │               │
       ▼               ▼               ▼               ▼
  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
  │  Dedup   │    │  Serper  │    │  Claude  │    │Re-enrich │
  │ & Merge  │    │  Search  │    │ Haiku AI │    │  Batch   │
  │          │    │(100/day) │    │ Extract  │    │          │
  └────┬─────┘    └────┬─────┘    └────┬─────┘    └────┬─────┘
       │               │               │               │
       ▼               ▼               ▼               ▼
  New providers   Websites found  Packages &      Updated tiers
  queued          in DB           tiers updated
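The diagram names a Dedup & Merge stage without saying how duplicates are matched. One common approach, shown here only as a sketch, keys each record on a normalised name plus state and lets later sources fill fields the first match left blank. The field names (name, state, source), the noise-word list, and both functions are assumptions, not a description of dedup.py.

```python
import re
from collections import defaultdict

# Words dropped before comparing names (hypothetical list; tune against real data).
NOISE = re.compile(r"\b(pty|ltd|limited|funerals?|services?)\b")


def dedup_key(provider: dict) -> tuple[str, str]:
    """Match key: lower-cased name without noise words or punctuation, plus state."""
    name = NOISE.sub(" ", provider.get("name", "").lower())
    name = re.sub(r"[^a-z0-9]+", " ", name).strip()
    return name, provider.get("state", "").upper()


def dedup_and_merge(records: list[dict]) -> list[dict]:
    """Collapse records sharing a key; earlier values win, later records fill blanks."""
    merged: dict[tuple[str, str], dict] = {}
    sources: dict[tuple[str, str], set] = defaultdict(set)
    for rec in records:
        key = dedup_key(rec)
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            if field != "source" and value and not target.get(field):
                target[field] = value
        sources[key].add(rec.get("source", "unknown"))
    # Keep track of every registry that listed the provider.
    for key, target in merged.items():
        target["sources"] = sorted(sources[key])
    return list(merged.values())
```

A name made up entirely of noise words would collapse to an empty key, so a real matcher would need a guard for that case (and probably a secondary signal such as ABN or postcode).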

Manual Run

You can also run the pipeline manually without N8N:

cd crawlers/

# Full pipeline
python3 crawl_all.py
python3 dedup.py
python3 lookup_abn.py --limit=100
python3 discover_websites.py --limit=100
python3 enrich_websites.py --limit=50
python3 compute_tiers.py

# Test mode
python3 crawl_all.py --test
python3 discover_websites.py --limit=5 --state=VIC
python3 enrich_websites.py --limit=3
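The full pipeline can also be driven from one Python script that stops as soon as a step fails. This sketch assumes it is run from the repository root, so the scripts live in crawlers/; run_pipeline is a hypothetical helper, and it uses the current interpreter (sys.executable) in place of python3.

```python
import subprocess
import sys

# Same steps and flags as the "Full pipeline" commands above, in the same order.
PIPELINE = [
    ["crawl_all.py"],
    ["dedup.py"],
    ["lookup_abn.py", "--limit=100"],
    ["discover_websites.py", "--limit=100"],
    ["enrich_websites.py", "--limit=50"],
    ["compute_tiers.py"],
]


def run_pipeline(workdir: str = "crawlers", dry_run: bool = False) -> list[list[str]]:
    """Run each step with the current interpreter, stopping at the first failure."""
    commands = [[sys.executable, *step] for step in PIPELINE]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        # check=True raises CalledProcessError on a non-zero exit,
        # so later steps never run against a half-finished stage.
        subprocess.run(cmd, cwd=workdir, check=True)
    return commands
```

run_pipeline(dry_run=True) prints the commands without executing them, which is a cheap way to confirm the order before a real run.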

Database

The pipeline uses SQLite at database/providers.db for the demo. A Postgres schema is at database/schema.sql for production.
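To confirm the demo database is populated, count the rows in every table. This sketch discovers table names from sqlite_master rather than assuming them; table_counts and open_read_only are hypothetical helpers.

```python
import sqlite3


def open_read_only(db_path: str = "database/providers.db") -> sqlite3.Connection:
    """Open the database read-only; a wrong path raises instead of creating an empty file."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def table_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return the row count of every user table, keyed by table name."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    counts = {}
    for table in tables:
        quoted = table.replace('"', '""')  # escape any quotes in the identifier
        counts[table] = conn.execute(f'SELECT COUNT(*) FROM "{quoted}"').fetchone()[0]
    return counts
```

On the seeded dev database, table_counts(open_read_only()) should show the provider rows (the initial commit describes 1,463 providers), whatever the table is called.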

To reset:

rm database/providers.db
sqlite3 database/providers.db < database/schema_sqlite.sql
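Where the sqlite3 command-line shell is not installed, the same reset can be done with Python's built-in sqlite3 module. A sketch with default paths matching the commands above; reset_db is a hypothetical helper.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def reset_db(db_path: str = "database/providers.db",
             schema_path: str = "database/schema_sqlite.sql") -> None:
    """Delete the SQLite file and rebuild it from the schema script."""
    # Read the schema first, so a bad schema path leaves the existing database untouched.
    schema = Path(schema_path).read_text(encoding="utf-8")
    Path(db_path).unlink(missing_ok=True)
    with closing(sqlite3.connect(db_path)) as conn:
        conn.executescript(schema)
        conn.commit()
```

Like the shell version, this deletes the existing providers.db, seeded rows included.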