Initial commit: funeral provider discovery pipeline
- Python crawlers for VIC Register, Funerals Australia, NFDA
- n8n workflows for scheduled discovery and enrichment
- SQLite schema and seeded dev database (1,463 providers)
- End-to-end process documentation in n8n/PROCESS.md
110
n8n/README.md
Normal file
@@ -0,0 +1,110 @@
# N8N Workflow Setup

For a plain-English walkthrough of what the pipeline does end-to-end and how
its output conforms to the database schema, see [`PROCESS.md`](./PROCESS.md).

## Prerequisites

- Docker & Docker Compose
- API keys (see below)

## API Keys

Create `crawlers/config.json` from the template:

```bash
cp crawlers/config.example.json crawlers/config.json
```

| Key | Service | Cost | Get it at |
|-----|---------|------|-----------|
| `serper_api_key` | Serper.dev (Google search) | 2,500 free queries | https://serper.dev |
| `abr_guid` | ABR (ABN lookup) | Free | https://abr.business.gov.au/Tools/WebServices |
| `anthropic_api_key` | Claude Haiku (AI extraction) | ~$2 per full run | https://console.anthropic.com |

Also set `ANTHROPIC_API_KEY` in N8N, either as a credential or as an environment variable.
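
A missing or blank key tends to surface only partway through a run, so it can help to check the file up front. The sketch below is a hypothetical helper, not part of the repository: it assumes `crawlers/config.json` is a flat JSON object whose top-level keys are the three names in the table above.

```python
import json
from pathlib import Path

# Key names come from the table above; everything else here is illustrative.
REQUIRED_KEYS = ("serper_api_key", "abr_guid", "anthropic_api_key")


def load_config(path="crawlers/config.json"):
    """Read the config file and fail early if any key is missing or blank."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    # Treat absent keys, null values and whitespace-only strings as missing.
    missing = [key for key in REQUIRED_KEYS if not str(config.get(key) or "").strip()]
    if missing:
        raise ValueError(f"{path} is missing values for: {', '.join(missing)}")
    return config
```

Calling `load_config()` from the repository root either returns the parsed dictionary or raises a `ValueError` naming the keys that still need values.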

## Start N8N

```bash
cd n8n/
docker compose up -d
```

N8N will be available at http://localhost:5678
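
The container can take a little while to start answering after `docker compose up -d` returns. A minimal sketch of a readiness check follows; the helper names `n8n_is_up` and `wait_for_n8n` are illustrative, and the only thing taken from this README is the URL. It treats any HTTP response as "up" rather than relying on a specific health endpoint.

```python
import time
import urllib.error
import urllib.request


def n8n_is_up(url="http://localhost:5678", timeout=2.0):
    """Return True if anything answers HTTP at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return False


def wait_for_n8n(probe=n8n_is_up, attempts=30, delay=2.0):
    """Poll until the probe succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```

With the defaults this waits up to about a minute; `wait_for_n8n()` returning `False` suggests checking `docker compose logs`.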

## Import Workflows

In the N8N UI:

1. Go to **Workflows** → **Import from File**
2. Import each file from `n8n/workflows/`:
   - `1_weekly_discovery.json`: discovers new providers from registries
   - `2_daily_website_discovery.json`: finds provider websites
   - `3_daily_enrichment.json`: crawls sites & AI-extracts pricing
   - `4_monthly_refresh.json`: re-checks pricing for stale data
3. Activate each workflow

## Workflow Schedule

| # | Workflow | Schedule | What It Does |
|---|---------|----------|-------------|
| 1 | Weekly Discovery | Mon 2am AEST | Crawls VIC Register, Funerals Australia, NFDA → dedup |
| 2 | Daily Website Discovery | 4am AEST | Finds websites for 100 providers/day |
| 3 | Daily Enrichment | 6am AEST | Crawls 50 websites/day → AI extracts pricing |
| 4 | Monthly Refresh | 1st of month, 3am | Re-checks pricing older than 30 days |
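
The daily limits determine how long a first full pass takes. As a rough worked example, the arithmetic below uses the 1,463 providers in the seeded dev database and assumes, as an upper bound, that every provider needs both a website lookup and an enrichment crawl, with one Serper query per provider. The real numbers will be lower wherever a provider has no website.

```python
import math

PROVIDERS = 1_463                # seeded dev database size
WEBSITE_LOOKUPS_PER_DAY = 100    # workflow 2 limit
ENRICHMENTS_PER_DAY = 50         # workflow 3 limit
SERPER_FREE_QUERIES = 2_500      # free allowance from the API Keys table


def days_to_clear(backlog, per_day):
    """Whole days needed to work through a backlog at a fixed daily rate."""
    return math.ceil(backlog / per_day)


website_days = days_to_clear(PROVIDERS, WEBSITE_LOOKUPS_PER_DAY)   # 15
enrichment_days = days_to_clear(PROVIDERS, ENRICHMENTS_PER_DAY)    # 30

# Assumption: one Serper query per provider.
fits_free_tier = PROVIDERS <= SERPER_FREE_QUERIES                  # True
```

Under these assumptions website discovery clears in about 15 days and enrichment in about 30, which is roughly the same length as the 30-day staleness window used by the monthly refresh.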

## Workflow Flow

```
  Mon 2am        Daily 4am       Daily 6am        Monthly
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ Registry │    │   ABN    │    │  Crawl   │    │  Reset   │
│ Crawlers │    │  Lookup  │    │ Websites │    │  Stale   │
│ (VIC,FA, │    │  (free)  │    │ (50/day) │    │Providers │
│  NFDA)   │    │          │    │          │    │          │
└────┬─────┘    └────┬─────┘    └────┬─────┘    └────┬─────┘
     │               │               │               │
     ▼               ▼               ▼               ▼
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  Dedup   │    │  Serper  │    │  Claude  │    │Re-enrich │
│ & Merge  │    │  Search  │    │ Haiku AI │    │  Batch   │
│          │    │(100/day) │    │ Extract  │    │          │
└────┬─────┘    └────┬─────┘    └────┬─────┘    └────┬─────┘
     │               │               │               │
     ▼               ▼               ▼               ▼
New providers   Websites found  Packages &      Updated tiers
queued          in DB           tiers updated
```

## Manual Run

You can also run the pipeline manually without N8N:

```bash
cd crawlers/

# Full pipeline
python3 crawl_all.py
python3 dedup.py
python3 lookup_abn.py --limit=100
python3 discover_websites.py --limit=100
python3 enrich_websites.py --limit=50
python3 compute_tiers.py

# Test mode
python3 crawl_all.py --test
python3 discover_websites.py --limit=5 --state=VIC
python3 enrich_websites.py --limit=3
```
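
Run by hand, the full pipeline keeps going even if an earlier step fails. One way to stop at the first failure is a small wrapper like the sketch below. The script names and flags are the ones listed above; `run_pipeline` and its `runner` parameter are hypothetical and exist only for this example.

```python
import subprocess
import sys

# Steps and flags copied from the "Full pipeline" commands above.
FULL_PIPELINE = [
    ["crawl_all.py"],
    ["dedup.py"],
    ["lookup_abn.py", "--limit=100"],
    ["discover_websites.py", "--limit=100"],
    ["enrich_websites.py", "--limit=50"],
    ["compute_tiers.py"],
]


def run_pipeline(steps=FULL_PIPELINE, cwd="crawlers", runner=subprocess.run):
    """Run each step in order and stop at the first non-zero exit code.

    Returns (failed_script, exit_code), or (None, 0) if every step succeeded.
    """
    for step in steps:
        # Use the current interpreter so the steps share one Python environment.
        command = [sys.executable, *step]
        result = runner(command, cwd=cwd)
        if result.returncode != 0:
            return step[0], result.returncode
    return None, 0
```

Calling `run_pipeline()` from the repository root runs all six steps inside `crawlers/`. Passing a shorter `steps` list, for example the test-mode commands, reuses the same stop-on-failure behaviour.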

## Database

The pipeline uses SQLite at `database/providers.db` for the demo.
A Postgres schema is at `database/schema.sql` for production.

To reset:
```bash
rm database/providers.db
sqlite3 database/providers.db < database/schema_sqlite.sql
```
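
Where the `sqlite3` command-line tool is not installed, the same reset can be done from Python's standard library. This is a sketch under the assumption that `database/schema_sqlite.sql` is a plain SQL script; `reset_database` is a hypothetical helper, not something the repository ships.

```python
import sqlite3
from pathlib import Path


def reset_database(db_path="database/providers.db",
                   schema_path="database/schema_sqlite.sql"):
    """Delete the SQLite file and recreate it from the schema script.

    Returns the names of the tables that exist after the reset.
    """
    # Read the schema first so a missing schema file leaves the old DB untouched.
    schema_sql = Path(schema_path).read_text(encoding="utf-8")
    Path(db_path).unlink(missing_ok=True)

    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(schema_sql)
        conn.commit()
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Run from the repository root, `reset_database()` mirrors the two shell commands above and returns the table list as a quick sanity check.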