# N8N Workflow Setup

For a plain-English walkthrough of what the pipeline does end-to-end and how its output conforms to the database schema, see [`PROCESS.md`](./PROCESS.md).

## Prerequisites

- Docker & Docker Compose
- API keys (see below)

## API Keys

Create `crawlers/config.json` from the template:

```bash
cp crawlers/config.example.json crawlers/config.json
```

| Key | Service | Cost | Get it at |
|-----|---------|------|-----------|
| `serper_api_key` | Serper.dev (Google search) | 2,500 free | https://serper.dev |
| `abr_guid` | ABR (ABN lookup) | Free | https://abr.business.gov.au/Tools/WebServices |
| `anthropic_api_key` | Claude Haiku (AI extraction) | ~$2/full run | https://console.anthropic.com |

Also set `ANTHROPIC_API_KEY` as an N8N credential/environment variable.

## Start N8N

```bash
cd n8n/
docker compose up -d
```

N8N will be available at http://localhost:5678

## Import Workflows

In the N8N UI:

1. Go to **Workflows** → **Import from File**
2. Import each file from `n8n/workflows/`:
   - `1_weekly_discovery.json`: discovers new providers from registries
   - `2_daily_website_discovery.json`: finds provider websites
   - `3_daily_enrichment.json`: crawls sites & AI-extracts pricing
   - `4_monthly_refresh.json`: re-checks pricing for stale data
3. Activate each workflow

## Workflow Schedule

| # | Workflow | Schedule | What It Does |
|---|----------|----------|--------------|
| 1 | Weekly Discovery | Mon 2am AEST | Crawls VIC Register, Funerals AU, NFDA → dedup |
| 2 | Daily Website Discovery | 4am AEST | Finds websites for 100 providers/day |
| 3 | Daily Enrichment | 6am AEST | Crawls 50 websites/day → AI extracts pricing |
| 4 | Monthly Refresh | 1st of month, 3am | Re-checks pricing older than 30 days |

## Workflow Flow

```
   Mon 2am        Daily 4am       Daily 6am        Monthly
┌───────────┐   ┌───────────┐   ┌───────────┐   ┌───────────┐
│ Registry  │   │    ABN    │   │   Crawl   │   │   Reset   │
│ Crawlers  │   │  Lookup   │   │ Websites  │   │   Stale   │
│ (VIC,FA,  │   │  (free)   │   │ (50/day)  │   │ Providers │
│  NFDA)    │   │           │   │           │   │           │
└─────┬─────┘   └─────┬─────┘   └─────┬─────┘   └─────┬─────┘
      │               │               │               │
      ▼               ▼               ▼               ▼
┌───────────┐   ┌───────────┐   ┌───────────┐   ┌───────────┐
│   Dedup   │   │  Serper   │   │  Claude   │   │ Re-enrich │
│  & Merge  │   │  Search   │   │ Haiku AI  │   │   Batch   │
│           │   │ (100/day) │   │  Extract  │   │           │
└─────┬─────┘   └─────┬─────┘   └─────┬─────┘   └─────┬─────┘
      │               │               │               │
      ▼               ▼               ▼               ▼
New providers   Websites found  Packages &      Updated tiers
queued in DB                    tiers updated
```

## Manual Run

You can also run the pipeline manually without N8N:

```bash
cd crawlers/

# Full pipeline
python3 crawl_all.py
python3 dedup.py
python3 lookup_abn.py --limit=100
python3 discover_websites.py --limit=100
python3 enrich_websites.py --limit=50
python3 compute_tiers.py

# Test mode
python3 crawl_all.py --test
python3 discover_websites.py --limit=5 --state=VIC
python3 enrich_websites.py --limit=3
```

## Database

The pipeline uses SQLite at `database/providers.db` for the demo. A Postgres schema is at `database/schema.sql` for production.

To reset:

```bash
rm database/providers.db
sqlite3 database/providers.db < database/schema_sqlite.sql
```
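
The same reset can also be scripted in Python, which may be convenient when it has to run from a test or a setup script. The following is a minimal sketch, assuming only the standard-library `sqlite3` module and the two paths mentioned above; `reset_database` is a hypothetical helper name and is not part of the repository.

```python
import sqlite3
from pathlib import Path


def reset_database(db_path: str = "database/providers.db",
                   schema_path: str = "database/schema_sqlite.sql") -> None:
    """Delete the SQLite file and recreate it from the schema script.

    Hypothetical helper: the default paths are the ones this README uses
    and are resolved relative to the current working directory.
    """
    # Read the schema first so a missing schema file fails before anything is deleted.
    schema_sql = Path(schema_path).read_text(encoding="utf-8")

    # Remove the old database; unlike plain `rm`, this does not fail if it is absent.
    Path(db_path).unlink(missing_ok=True)

    # Connecting creates a fresh, empty database file.
    conn = sqlite3.connect(str(db_path))
    try:
        # executescript runs every statement in the schema file in order.
        conn.executescript(schema_sql)
        conn.commit()
    finally:
        conn.close()
```

Calling `reset_database()` from the repository root would use the default paths; passing explicit paths lets the same function rebuild a throwaway database elsewhere.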