Initial commit: funeral provider discovery pipeline

Python crawlers for VIC Register, Funerals Australia, NFDA
n8n workflows for scheduled discovery and enrichment
SQLite schema and seeded dev database (1,463 providers)
End-to-end process documentation in n8n/PROCESS.md
This commit is contained in:
Richie
2026-04-24 10:27:08 +10:00
commit cc91427789
30 changed files with 4706 additions and 0 deletions

View File

@@ -0,0 +1,209 @@
# Provider Data Model — Verified & Unverified Providers
This document extends the CMS schema (`schemas/cms-schema-spec.md`) with support for
unverified (auto-discovered) providers alongside the existing verified (signed-up) providers.
---
## Overview
The platform lists funeral directors in two categories:
- **Verified providers** — Signed up to the platform. Full branding (logo, badge, colours),
complete package configuration, and online arrangement booking enabled.
- **Unverified providers** — Auto-discovered from public registries and their own websites.
Listed with whatever public information is available. Can apply to become verified.
All providers share the same `funeral_brand` table and schema. The difference is driven
by data completeness and the `verified` / `listing_tier` fields.
---
## Schema Changes to FuneralBrand
These fields are **added** to the existing FuneralBrand collection from `cms-schema-spec.md`:
| Field | Type | Default | Purpose |
|-------|------|---------|---------|
| `verified` | Boolean | `false` | `true` for signed-up partners, `false` for auto-discovered |
| `listing_tier` | Enum | `'listed'` | Display tier, computed from data quality (see below) |
| `hidden` | Boolean | `true` | Unverified providers start hidden until admin-reviewed |
| `source_key` | String (unique) | `null` | Provenance identifier, e.g. `"nfda:1234"` |
| `source_url` | String (URL) | `null` | Where this record was discovered |
| `last_enriched_at` | DateTime | `null` | When data was last refreshed from provider's website |
| `enrichment_status` | Enum | `'pending'` | `pending` / `partial` / `complete` / `failed` |
### Fields that become optional for unverified providers
These fields are **required** for verified providers but **nullable** for unverified:
| Field | Verified | Unverified |
|-------|----------|------------|
| `logo` | Required (brand logo image) | `null` — no images until they sign up |
| `badge` | Required (card badge image) | `null` — no images until they sign up |
| `description` | Required | Optional (extracted from their website if available) |
| `backgroundColour` | Set (brand theme) | `null` — use platform default |
| `foregroundColour` | Set (brand theme) | `null` — use platform default |
| `modalDescription` | Set | `null` |
| `code` | Set (URL slug) | Auto-generated from business name |
### Fields present for both verified and unverified
| Field | Notes |
|-------|-------|
| `title` | Business name (always present) |
| `phone` | Contact phone (present for ~94% of providers) |
| `email` | Contact email (present for ~66%) |
| `website` | External website URL (present for ~68%) |
| `abn` | Australian Business Number (strongest dedup key) |
| `businessAddress/Suburb/State/Postcode` | Business location |
| `availableFuneralTypes` | Comma-separated funeral type IDs |
---
## Listing Tiers
Every provider is assigned a `listing_tier` that determines how they appear on the platform.
The tier is **computed from data quality** — specifically from what package/pricing data exists.
| Tier | Value | Criteria | UI Treatment |
|------|-------|----------|-------------|
| **Verified** | `'verified'` | `verified = true` | Full branding, package selection, online arrangements, custom images |
| **Priced** | `'priced'` | Unverified + 2 or more packages with itemized inclusion prices | Show packages with line-item breakdowns, no arrangements |
| **Estimated** | `'estimated'` | Unverified + at least 1 package with a total price | Show package prices, "Contact for full details" on breakdowns |
| **Listed** | `'listed'` | Unverified + no pricing data | Show contact info only, "Contact for pricing" CTA |
### Tier computation logic
```
if brand.verified:
tier = 'verified'
elif brand has 2+ packages, each with 2+ priced inclusions:
tier = 'priced'
elif brand has 1+ packages with any price:
tier = 'estimated'
else:
tier = 'listed'
```
### Upgrade incentive
Each tier below verified creates a natural CTA for the provider:
- `listed` → "Publish your pricing to help families compare"
- `estimated` → "Add detailed breakdowns to stand out"
- `priced` → "Sign up to enable online arrangements and add your branding"
---
## Data Relationships (unchanged from CMS spec, but applied to both tiers)
```
FuneralBrand (verified or unverified)
├── Location[] (physical offices — at least 1 per provider)
├── Package[] (funeral plan bundles — 0 for 'listed' tier)
│ └── PackageInclusion[] (fee line items — 0 for 'estimated' tier)
├── KnownFor[] (feature badges — verified only typically)
└── FuneralArea[] (service regions — M:N)
```
### Package (same schema as CMS spec, with additions)
| Field | Type | Notes |
|-------|------|-------|
| `id` | PK | |
| `title` | String | e.g. "Direct Cremation", "Chapel Service" |
| `description` | Text | What's included |
| `funeral_type` | Enum | `Service & Cremation`, `Service & Burial`, `Cremation Only`, `Graveside Burial`, `Water Cremation` |
| `brand_id` | FK → FuneralBrand | |
| `source_url` | String | Where this pricing was found (provider's website) |
| `extraction_confidence` | Float 0-1 | How reliable the extracted data is (0.7 = HTML, 0.6 = PDF) |
| `sort` | Integer | Display order |
| `hidden` | Boolean | |
### PackageInclusion (same schema as CMS spec)
| Field | Type | Notes |
|-------|------|-------|
| `id` | PK | |
| `price` | Decimal | Dollar amount |
| `optional` | Boolean | User can opt in/out |
| `complimentary` | Boolean | Included free |
| `display` | Boolean | Whether shown to user |
| `inclusion_type_title` | String | Category label (see standard types below) |
| `package_id` | FK → Package | |
### Standard inclusion type names
These are the consistent labels used across all providers:
**Standard fees:** Professional Service Fee, Transportation Service Fee, Professional Mortuary Care, Death Registration Certificate, Cremation Certificate/Permit, Government Levy, Accommodation
**Products:** Coffin, Cremation Fee, Cemetery Fee, Celebrant Fee
**Optional extras:** Saturday Service Fee, Twilight Service Surcharge, Viewing Fee, After Hours Transfer Surcharge, Dressing Fee, Embalming, Digital Recording, Webstreaming, Coffin Bearing by Funeral Directors
---
## Current Data
The database (`database/providers.db`, SQLite) contains:
| Metric | Count |
|--------|-------|
| Total providers | 1,463 |
| With phone | 1,380 (94%) |
| With email | 972 (66%) |
| With website | 994 (68%) |
| With description | 618 (42%) |
| Total packages | 416 |
| Total inclusions | 388 |
### Tier distribution
| Tier | Providers |
|------|-----------|
| Verified | 0 (existing 12 brands not yet imported as verified) |
| Priced | 10 |
| Estimated | 111 |
| Listed | 1,342 |
### State distribution
| State | Providers | With Pricing |
|-------|-----------|-------------|
| VIC | 701 | 77 |
| NSW | 269 | 8 |
| QLD | 151 | 21 |
| SA | 85 | 1 |
| WA | 79 | 12 |
| TAS | 25 | 0 |
| NT | 7 | 0 |
| ACT | 9 | 0 |
---
## Database Schema Files
- **`database/schema.sql`** — Full Postgres schema (production-ready)
- **`database/schema_sqlite.sql`** — SQLite schema (dev/demo)
- **`database/providers.db`** — Live SQLite database with 1,463 providers
- **`database/seed_verified.sql`** — Script to mark imported CMS brands as verified
The schema is designed to be **additive** to the existing CMS schema from `schemas/cms-schema-spec.md`.
The original 12 verified brands and their packages/products should be imported first, then
`seed_verified.sql` marks them as `verified = true, listing_tier = 'verified'`.
---
## Verified Provider Upgrade Path
When an unverified provider applies to become verified:
1. They claim their listing (email verification or ABN match)
2. They fill in missing fields: description, logo, badge, brand colours
3. They configure packages with full inclusion breakdowns
4. They enable arrangement booking
5. Admin approves → `verified = true, listing_tier = 'verified'`
The backend should support this flow — updating an existing unverified brand
record rather than creating a new one.