Initial commit: funeral provider discovery pipeline

- Python crawlers for VIC Register, Funerals Australia, NFDA
- n8n workflows for scheduled discovery and enrichment
- SQLite schema and seeded dev database (1,463 providers)
- End-to-end process documentation in n8n/PROCESS.md
69
database/IMAGE-MAPPING.md
Normal file
@@ -0,0 +1,69 @@
# Image Assets & Verified Provider Mapping

## Image Directory Structure

All images are downloaded locally in `images/` with the following structure:

```
images/
├── manifest.json               # Full index mapping CMS IDs → local paths
├── providers/{slug}/           # 12 verified brands
│   ├── logo.{ext}              # Rectangular/stacked logo
│   └── badge.{ext}             # Circular/square badge (for cards)
├── funeral-homes/{slug}/       # 7 parent organisations
│   └── logo.{ext}
├── locations/{slug}/           # 20 physical offices
│   └── photo.{ext}             # Building/staff hero photo
├── coffins/{category}/         # 201 coffins by range
│   └── {slug}/01.{ext}         # 1-4 images per coffin
├── venues/{slug}/              # 1,678 service venues
│   └── 01.{ext}
└── crematoriums/{slug}/        # 38 crematoriums
    └── 01.{ext}
```

## Verified Brand → Image Mapping

These are the 12 existing verified brands from the CMS, with their image paths:

| CMS ID | Brand | Logo | Badge |
|--------|-------|------|-------|
| 1 | H.Parsons Funeral Directors | `providers/hparsons-funeral-directors/logo.png` | `providers/hparsons-funeral-directors/badge.png` |
| 3 | Rankins Funerals | `providers/rankins-funerals/logo.webp` | `providers/rankins-funerals/badge.png` |
| 4 | Parsons Ladies Funeral Directors | `providers/parsons-ladies-funeral-directors/logo.png` | `providers/parsons-ladies-funeral-directors/badge.png` |
| 5 | Wollongong City Funerals | `providers/wollongong-city-funerals/logo.webp` | `providers/wollongong-city-funerals/badge.png` |
| 6 | Easy Funerals | `providers/easy-funerals/logo.webp` | `providers/easy-funerals/badge.png` |
| 7 | Mackay Family Funerals | `providers/mackay-family-funerals/logo.webp` | `providers/mackay-family-funerals/badge.png` |
| 8 | H.Parsons Shoalhaven | `providers/hparsons-funeral-directors-shoalhaven/logo.png` | `providers/hparsons-funeral-directors-shoalhaven/badge.png` |
| 9 | Killick Family Funerals | `providers/killick-family-funerals/logo.webp` | `providers/killick-family-funerals/badge.png` |
| 10 | Kenneally's Funerals | `providers/kenneallys-funerals/logo.webp` | `providers/kenneallys-funerals/badge.png` |
| 11 | Lady Anne Funerals | `providers/lady-anne-funerals/logo.webp` | `providers/lady-anne-funerals/badge.png` |
| 12 | Mannings Funerals | `providers/mannings-funerals/logo.webp` | `providers/mannings-funerals/badge.png` |
| 13 | Botanical Funerals | `providers/botanical-funerals-by-ian-allison/logo.webp` | `providers/botanical-funerals-by-ian-allison/badge.png` |

## How to Use on the Demo Site

### For verified providers:
- Serve images from `images/providers/{slug}/` for logos and badges
- Serve location photos from `images/locations/{slug}/`
- Serve product images from `images/coffins/`, `images/venues/`, `images/crematoriums/`
- The `manifest.json` contains the full mapping from CMS record IDs to local file paths

### For unverified providers:
- **No images** — they have no logo, badge, or photos
- Use a generic placeholder or text-based display (business name initials, etc.)
- Images are only added when a provider signs up to become verified
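
The text-based fallback for unverified providers could look like the sketch below. It is only an illustration: the function names `placeholder_initials` and `card_image`, the two-letter limit, and the shape of the `provider` dict are assumptions, not part of the repository.

```python
import re


def placeholder_initials(business_name: str, max_letters: int = 2) -> str:
    """Return up to max_letters uppercase initials for a text-based avatar."""
    # Drop punctuation so "H.Parsons" and "Kenneally's" each count as one word.
    cleaned = re.sub(r"[^\w\s]", "", business_name)
    words = cleaned.split()
    return "".join(word[0].upper() for word in words[:max_letters])


def card_image(provider: dict) -> dict:
    """Use the badge for verified providers, initials for everyone else."""
    # `provider` is a hypothetical dict with "title", "verified" and "badge" keys.
    if provider.get("verified") and provider.get("badge"):
        return {"kind": "image", "src": provider["badge"]}
    return {"kind": "initials", "text": placeholder_initials(provider["title"])}
```

Keeping the decision in one helper means the demo site never has to look up an image path for a provider that has none.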

### Importing verified brands:
The 12 verified brands need to be imported into the database with their full data from
`schemas/brands-full.json` (brand details, locations, packages, inclusions) and linked
to their images. Some of these brands were also discovered by the crawler and already
exist in `providers.db` as unverified — they should be **upgraded** (set `verified = true`,
add images) rather than duplicated.
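
One way to upgrade rather than duplicate is to look for an existing row before inserting. The sketch below uses Python's standard `sqlite3` module against the `funeral_brand` table. The function name `upgrade_or_insert_brand`, the ABN-first match order, and the exact name plus postcode fallback are assumptions for illustration; visibility (`hidden`) is deliberately left to admin review.

```python
import sqlite3


def upgrade_or_insert_brand(conn: sqlite3.Connection, brand: dict) -> int:
    """Mark a matching crawled row as verified, or insert a new verified row."""
    row = None
    if brand.get("abn"):
        row = conn.execute(
            "SELECT id FROM funeral_brand WHERE abn = ?", (brand["abn"],)
        ).fetchone()
    if row is None:
        # Fallback: exact title + postcode (a hypothetical matching rule).
        row = conn.execute(
            "SELECT id FROM funeral_brand "
            "WHERE title = ? AND business_postcode = ?",
            (brand["title"], brand.get("business_postcode")),
        ).fetchone()
    if row is not None:
        # Upgrade the crawled record in place instead of creating a duplicate.
        conn.execute(
            "UPDATE funeral_brand SET verified = 1, listing_tier = 'verified', "
            "updated_at = datetime('now') WHERE id = ?",
            (row[0],),
        )
        return row[0]
    cur = conn.execute(
        "INSERT INTO funeral_brand "
        "(title, abn, business_postcode, verified, listing_tier) "
        "VALUES (?, ?, ?, 1, 'verified')",
        (brand["title"], brand.get("abn"), brand.get("business_postcode")),
    )
    return cur.lastrowid
```

Returning the row id in both branches lets the caller attach locations, packages, and image paths to whichever record ended up holding the brand.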

### Product images:
- 201 coffins with 1-4 images each, organised by range (solid-timber, custom-board, etc.)
- 1,678 venue photos
- 38 crematorium photos
- These are only relevant for verified provider flows (arrangement booking)
- The `manifest.json` maps each product's CMS ID to its local image path
209
database/PROVIDER-SCHEMA-SPEC.md
Normal file
@@ -0,0 +1,209 @@
# Provider Data Model — Verified & Unverified Providers

This document extends the CMS schema (`schemas/cms-schema-spec.md`) with support for
unverified (auto-discovered) providers alongside the existing verified (signed-up) providers.

---

## Overview

The platform lists funeral directors in two categories:

- **Verified providers** — Signed up to the platform. Full branding (logo, badge, colours),
  complete package configuration, and online arrangement booking enabled.
- **Unverified providers** — Auto-discovered from public registries and their own websites.
  Listed with whatever public information is available. Can apply to become verified.

All providers share the same `funeral_brand` table and schema. The difference is driven
by data completeness and the `verified` / `listing_tier` fields.

---

## Schema Changes to FuneralBrand

These fields are **added** to the existing FuneralBrand collection from `cms-schema-spec.md`:

| Field | Type | Default | Purpose |
|-------|------|---------|---------|
| `verified` | Boolean | `false` | `true` for signed-up partners, `false` for auto-discovered |
| `listing_tier` | Enum | `'listed'` | Display tier, computed from data quality (see below) |
| `hidden` | Boolean | `true` | Unverified providers start hidden until admin-reviewed |
| `source_key` | String (unique) | `null` | Provenance identifier, e.g. `"nfda:1234"` |
| `source_url` | String (URL) | `null` | Where this record was discovered |
| `last_enriched_at` | DateTime | `null` | When data was last refreshed from provider's website |
| `enrichment_status` | Enum | `'pending'` | `pending` / `partial` / `complete` / `failed` |

### Fields that become optional for unverified providers

These fields are **required** (or always set) for verified providers but **nullable** or auto-generated for unverified ones:

| Field | Verified | Unverified |
|-------|----------|------------|
| `logo` | Required (brand logo image) | `null` — no images until they sign up |
| `badge` | Required (card badge image) | `null` — no images until they sign up |
| `description` | Required | Optional (extracted from their website if available) |
| `backgroundColour` | Set (brand theme) | `null` — use platform default |
| `foregroundColour` | Set (brand theme) | `null` — use platform default |
| `modalDescription` | Set | `null` |
| `code` | Set (URL slug) | Auto-generated from business name |
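
Auto-generating `code` from the business name could be done along the lines below. This is a sketch under assumptions: the helper names `slugify` and `unique_code` are hypothetical, and the rule of dropping dots and apostrophes is inferred from the image directory slugs (for example `hparsons-funeral-directors`), not from the crawler source. The numeric suffix exists because `code` carries a `UNIQUE` constraint in the schema.

```python
import re
import unicodedata


def slugify(business_name: str) -> str:
    """Build a lowercase, hyphen-separated slug from a business name."""
    # Fold accented characters to plain ASCII where possible.
    text = unicodedata.normalize("NFKD", business_name)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    # Remove dots and apostrophes outright so "H.Parsons" becomes "hparsons".
    text = re.sub(r"[.']", "", text)
    # Collapse every other run of non-alphanumerics into a single hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text)
    return text.strip("-")


def unique_code(business_name: str, taken: set[str]) -> str:
    """Append -2, -3, ... until the slug is not already in `taken`."""
    base = slugify(business_name)
    candidate, n = base, 2
    while candidate in taken:
        candidate = f"{base}-{n}"
        n += 1
    return candidate
```

Two providers with the same trading name in different suburbs would otherwise collide on the unique `code` column, which is why the suffix step is included.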

### Fields present for both verified and unverified

| Field | Notes |
|-------|-------|
| `title` | Business name (always present) |
| `phone` | Contact phone (present for ~94% of providers) |
| `email` | Contact email (present for ~66%) |
| `website` | External website URL (present for ~68%) |
| `abn` | Australian Business Number (strongest dedup key) |
| `businessAddress/Suburb/State/Postcode` | Business location |
| `availableFuneralTypes` | Comma-separated funeral type IDs |
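
To illustrate how these shared fields can drive deduplication, here is a rough matching sketch. The match labels (`source_key`, `abn`, `name_postcode`, `new`) follow the `match_type` comment in `database/schema.sql`; the order in which keys are tried, the helper names, and the in-memory list of dicts are assumptions, and fuzzy matching is left out entirely.

```python
import re
from typing import Optional


def normalise_abn(abn: Optional[str]) -> Optional[str]:
    """Strip spaces and punctuation; return None unless 11 digits remain."""
    digits = re.sub(r"\D", "", abn or "")
    return digits if len(digits) == 11 else None


def find_match(candidate: dict, existing: list[dict]) -> tuple[Optional[int], str]:
    """Return (brand_id, match_type): source_key, then ABN, then name + postcode."""
    key = candidate.get("source_key")
    abn = normalise_abn(candidate.get("abn"))
    name = (candidate.get("title") or "").strip().lower()
    postcode = candidate.get("business_postcode")
    if key:
        for brand in existing:
            if brand.get("source_key") == key:
                return brand["id"], "source_key"
    if abn:
        for brand in existing:
            if normalise_abn(brand.get("abn")) == abn:
                return brand["id"], "abn"
    if name and postcode:
        for brand in existing:
            same_name = (brand.get("title") or "").strip().lower() == name
            if same_name and brand.get("business_postcode") == postcode:
                return brand["id"], "name_postcode"
    # Nothing matched: the caller would create a new unverified brand.
    return None, "new"
```

Normalising the ABN before comparing matters because registries publish it both with and without spaces.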

---

## Listing Tiers

Every provider is assigned a `listing_tier` that determines how they appear on the platform.
Verified providers take the `verified` tier; for everyone else the tier is **computed from data quality**, specifically from what package and pricing data exists.

| Tier | Value | Criteria | UI Treatment |
|------|-------|----------|-------------|
| **Verified** | `'verified'` | `verified = true` | Full branding, package selection, online arrangements, custom images |
| **Priced** | `'priced'` | Unverified + 2 or more packages, each with 2 or more itemized inclusion prices | Show packages with line-item breakdowns, no arrangements |
| **Estimated** | `'estimated'` | Unverified + at least 1 package with a total price | Show package prices, "Contact for full details" on breakdowns |
| **Listed** | `'listed'` | Unverified + no pricing data | Show contact info only, "Contact for pricing" CTA |

### Tier computation logic

```
if brand.verified:
    tier = 'verified'
elif brand has 2+ packages, each with 2+ priced inclusions:
    tier = 'priced'
elif brand has 1+ packages with any price:
    tier = 'estimated'
else:
    tier = 'listed'
```
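
The pseudocode above can be turned into a small runnable function. The version below is a sketch under stated assumptions: the `Package` dataclass and its `total_price` field are hypothetical (the schema shown stores prices on inclusions only), and "2+ packages, each with 2+ priced inclusions" is read as "at least two packages that each have two or more priced inclusions".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Package:
    # Hypothetical shape: `total_price` stands in for however a
    # package-level price is recorded; inclusion prices are line items.
    total_price: Optional[float] = None
    inclusion_prices: list[float] = field(default_factory=list)

    def has_any_price(self) -> bool:
        return self.total_price is not None or len(self.inclusion_prices) > 0


def compute_listing_tier(verified: bool, packages: list[Package]) -> str:
    """Apply the four tier rules in order, strongest first."""
    if verified:
        return "verified"
    # Packages that carry a line-item breakdown (2+ priced inclusions).
    itemized = [p for p in packages if len(p.inclusion_prices) >= 2]
    if len(itemized) >= 2:
        return "priced"
    if any(p.has_any_price() for p in packages):
        return "estimated"
    return "listed"
```

Because the branches are checked top to bottom, a provider with a single fully itemized package still lands in `estimated`, which matches the "2 or more packages" wording of the table.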

### Upgrade incentive

Each tier below verified creates a natural CTA for the provider:
- `listed` → "Publish your pricing to help families compare"
- `estimated` → "Add detailed breakdowns to stand out"
- `priced` → "Sign up to enable online arrangements and add your branding"

---

## Data Relationships (unchanged from CMS spec, but applied to verified and unverified providers)

```
FuneralBrand (verified or unverified)
├── Location[]              (physical offices — at least 1 per provider)
├── Package[]               (funeral plan bundles — 0 for 'listed' tier)
│   └── PackageInclusion[]  (fee line items — 0 for 'estimated' tier)
├── KnownFor[]              (feature badges — typically verified only)
└── FuneralArea[]           (service regions — M:N)
```

### Package (same schema as CMS spec, with additions)

| Field | Type | Notes |
|-------|------|-------|
| `id` | PK | |
| `title` | String | e.g. "Direct Cremation", "Chapel Service" |
| `description` | Text | What's included |
| `funeral_type` | Enum | `Service & Cremation`, `Service & Burial`, `Cremation Only`, `Graveside Burial`, `Water Cremation` |
| `brand_id` | FK → FuneralBrand | |
| `source_url` | String | Where this pricing was found (provider's website) |
| `extraction_confidence` | Float 0-1 | How reliable the extracted data is (0.7 = HTML, 0.6 = PDF) |
| `sort` | Integer | Display order |
| `hidden` | Boolean | |

### PackageInclusion (same schema as CMS spec)

| Field | Type | Notes |
|-------|------|-------|
| `id` | PK | |
| `price` | Decimal | Dollar amount |
| `optional` | Boolean | User can opt in/out |
| `complimentary` | Boolean | Included free |
| `display` | Boolean | Whether shown to user |
| `inclusion_type_title` | String | Category label (see standard types below) |
| `package_id` | FK → Package | |

### Standard inclusion type names

These are the consistent labels used across all providers:

**Standard fees:** Professional Service Fee, Transportation Service Fee, Professional Mortuary Care, Death Registration Certificate, Cremation Certificate/Permit, Government Levy, Accommodation

**Products:** Coffin, Cremation Fee, Cemetery Fee, Celebrant Fee

**Optional extras:** Saturday Service Fee, Twilight Service Surcharge, Viewing Fee, After Hours Transfer Surcharge, Dressing Fee, Embalming, Digital Recording, Webstreaming, Coffin Bearing by Funeral Directors

---

## Current Data

The database (`database/providers.db`, SQLite) contains:

| Metric | Count |
|--------|-------|
| Total providers | 1,463 |
| With phone | 1,380 (94%) |
| With email | 972 (66%) |
| With website | 994 (68%) |
| With description | 618 (42%) |
| Total packages | 416 |
| Total inclusions | 388 |

### Tier distribution

| Tier | Providers |
|------|-----------|
| Verified | 0 (existing 12 brands not yet imported as verified) |
| Priced | 10 |
| Estimated | 111 |
| Listed | 1,342 |
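
Counts like the ones above could be recomputed with a couple of aggregate queries. This is a minimal sketch using Python's standard `sqlite3` module, assuming the `funeral_brand` columns from the SQLite schema; the helper names `tier_distribution` and `coverage` are illustrative and not part of the repository.

```python
import sqlite3

TIER_ORDER = ["verified", "priced", "estimated", "listed"]


def tier_distribution(conn: sqlite3.Connection) -> dict[str, int]:
    """Count providers per listing_tier, including tiers with zero rows."""
    counts = dict.fromkeys(TIER_ORDER, 0)
    rows = conn.execute(
        "SELECT listing_tier, COUNT(*) FROM funeral_brand GROUP BY listing_tier"
    )
    for tier, n in rows:
        counts[tier] = n
    return counts


def coverage(conn: sqlite3.Connection, column: str) -> tuple[int, int]:
    """Return (rows with a non-empty value, total rows) for one column."""
    # Whitelist the column name because it is interpolated into the SQL text.
    if column not in {"phone", "email", "website", "description"}:
        raise ValueError(f"unsupported column: {column}")
    filled, total = conn.execute(
        f"SELECT COUNT(NULLIF(TRIM({column}), '')), COUNT(*) FROM funeral_brand"
    ).fetchone()
    return filled, total
```

A caller would open the dev database with `sqlite3.connect("database/providers.db")` and pass the connection to either helper.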

### State distribution

| State | Providers | With Pricing |
|-------|-----------|-------------|
| VIC | 701 | 77 |
| NSW | 269 | 8 |
| QLD | 151 | 21 |
| SA | 85 | 1 |
| WA | 79 | 12 |
| TAS | 25 | 0 |
| NT | 7 | 0 |
| ACT | 9 | 0 |

---

## Database Schema Files

- **`database/schema.sql`** — Full Postgres schema (production-ready)
- **`database/schema_sqlite.sql`** — SQLite schema (dev/demo)
- **`database/providers.db`** — Seeded SQLite dev database with 1,463 providers
- **`database/seed_verified.sql`** — Script to mark imported CMS brands as verified

The schema is designed to be **additive** to the existing CMS schema from `schemas/cms-schema-spec.md`.
The original 12 verified brands and their packages/products should be imported first, then
`seed_verified.sql` marks them as `verified = true, listing_tier = 'verified'`.

---

## Verified Provider Upgrade Path

When an unverified provider applies to become verified:

1. They claim their listing (email verification or ABN match)
2. They fill in missing fields: description, logo, badge, brand colours
3. They configure packages with full inclusion breakdowns
4. They enable arrangement booking
5. Admin approves → `verified = true, listing_tier = 'verified'`

The backend should support this flow — updating an existing unverified brand
record rather than creating a new one.
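
A backend check before step 5 might look like the following sketch. The field list mirrors the "required for verified" fields described earlier; the function names, the dict-based record, and the rule that at least one package must exist are assumptions for illustration only.

```python
REQUIRED_FOR_VERIFIED = (
    "description",
    "logo",
    "badge",
    "backgroundColour",
    "foregroundColour",
)


def missing_for_verification(brand: dict, package_count: int) -> list[str]:
    """List what still blocks admin approval for this brand record."""
    missing = [name for name in REQUIRED_FOR_VERIFIED if not brand.get(name)]
    if package_count < 1:
        missing.append("packages")
    return missing


def approve(brand: dict, package_count: int) -> dict:
    """Return an updated copy of the same record; no new record is created."""
    missing = missing_for_verification(brand, package_count)
    if missing:
        raise ValueError("cannot verify, missing: " + ", ".join(missing))
    # The id is carried over unchanged, so the unverified row is upgraded in place.
    return {**brand, "verified": True, "listing_tier": "verified"}
```

Raising on incomplete records keeps a half-configured brand from being flipped to `verified` by accident.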
BIN
database/providers.db
Normal file
Binary file not shown.
285
database/schema.sql
Normal file
@@ -0,0 +1,285 @@
-- Provider Discovery Pipeline - Database Schema
-- Designed for Postgres. Compatible with SilverStripe CMS adaptation.
--
-- This schema covers the provider-facing tables needed for both
-- verified (signed-up) and unverified (auto-discovered) providers.
-- Product catalog tables (coffins, venues, etc.) are NOT included here —
-- those only apply to verified providers and live in the main CMS.

BEGIN;

-- ============================================================
-- ENUMS
-- ============================================================

CREATE TYPE enrichment_status AS ENUM ('pending', 'partial', 'complete', 'failed');

-- Listing tier determines how a provider appears on the platform.
-- Computed from data quality: verified status + packages + inclusions.
CREATE TYPE listing_tier AS ENUM (
    'verified',   -- Tier 1: Signed up, full branding, arrangements enabled
    'priced',     -- Tier 2: Unverified, 2+ packages with itemized inclusion prices
    'estimated',  -- Tier 3: Unverified, at least one total package price
    'listed'      -- Tier 4: Unverified, contact info only, no pricing
);

CREATE TYPE funeral_type_enum AS ENUM (
    'Service & Cremation',
    'Service & Burial',
    'Cremation Only',
    'Graveside Burial',
    'Water Cremation'
);

-- ============================================================
-- 1. FUNERAL HOME (parent organisation)
-- ============================================================

CREATE TABLE funeral_home (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    website TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- ============================================================
-- 2. FUNERAL BRAND (customer-facing provider)
-- ============================================================

CREATE TABLE funeral_brand (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    description TEXT,
    modal_description TEXT,
    email TEXT,
    phone TEXT,
    website TEXT,
    abn TEXT,
    code TEXT UNIQUE, -- URL slug (e.g. "hparsons")
    sort INTEGER DEFAULT 0,
    hidden BOOLEAN NOT NULL DEFAULT TRUE, -- unverified start hidden

    -- Address
    business_address TEXT,
    business_suburb TEXT,
    business_state TEXT,
    business_postcode TEXT,

    -- Branding (nullable — unverified providers have no images)
    background_colour TEXT,
    foreground_colour TEXT,

    -- Organisation
    funeral_home_id INTEGER REFERENCES funeral_home(id) ON DELETE SET NULL,

    -- Verified vs auto-discovered
    verified BOOLEAN NOT NULL DEFAULT FALSE,

    -- Provenance tracking
    source_key TEXT UNIQUE, -- "{source}:{externalId}" for dedup
    source_url TEXT, -- where this record was found
    last_enriched_at TIMESTAMPTZ,
    enrichment_status enrichment_status NOT NULL DEFAULT 'pending',

    -- Listing tier (computed from data quality)
    listing_tier listing_tier NOT NULL DEFAULT 'listed',

    -- Funeral types offered (comma-separated IDs, same as existing CMS)
    available_funeral_types TEXT,

    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Deduplication indexes
CREATE INDEX idx_brand_abn ON funeral_brand(abn) WHERE abn IS NOT NULL;
CREATE INDEX idx_brand_listing_tier ON funeral_brand(listing_tier);
CREATE INDEX idx_brand_source_key ON funeral_brand(source_key) WHERE source_key IS NOT NULL;
CREATE INDEX idx_brand_name_postcode ON funeral_brand(title, business_postcode);
CREATE INDEX idx_brand_verified ON funeral_brand(verified);
CREATE INDEX idx_brand_hidden ON funeral_brand(hidden);
CREATE INDEX idx_brand_enrichment ON funeral_brand(enrichment_status) WHERE verified = FALSE;

-- ============================================================
-- 3. LOCATION (physical office/chapel)
-- ============================================================

CREATE TABLE location (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL, -- display name (e.g. "Kingaroy, QLD")
    address TEXT,
    suburb TEXT,
    state TEXT,
    postcode TEXT,
    country TEXT DEFAULT 'Australia',
    lat DOUBLE PRECISION,
    lng DOUBLE PRECISION,
    rating REAL, -- Google rating 0-5
    rating_num INTEGER, -- number of Google reviews
    google_place_key TEXT, -- Google Places ID

    brand_id INTEGER NOT NULL REFERENCES funeral_brand(id) ON DELETE CASCADE,

    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_location_brand ON location(brand_id);
CREATE INDEX idx_location_state ON location(state);
CREATE INDEX idx_location_postcode ON location(postcode);
CREATE INDEX idx_location_coords ON location(lat, lng);
CREATE INDEX idx_location_google ON location(google_place_key) WHERE google_place_key IS NOT NULL;

-- ============================================================
-- 4. FUNERAL AREA (service region)
-- ============================================================

CREATE TABLE funeral_area (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    code TEXT,
    description TEXT,
    postcodes TEXT, -- comma-separated postcode list
    sort INTEGER DEFAULT 0,
    hidden BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Junction: brand <-> funeral_area
CREATE TABLE brand_funeral_area (
    brand_id INTEGER NOT NULL REFERENCES funeral_brand(id) ON DELETE CASCADE,
    funeral_area_id INTEGER NOT NULL REFERENCES funeral_area(id) ON DELETE CASCADE,
    PRIMARY KEY (brand_id, funeral_area_id)
);

-- ============================================================
-- 5. PACKAGE (funeral plan bundle)
-- ============================================================

CREATE TABLE package (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    description TEXT,
    sort INTEGER DEFAULT 0,
    hidden BOOLEAN DEFAULT FALSE,
    for_whom TEXT, -- 'myself' / 'someone' / null (both)
    religion TEXT, -- comma-separated supported religions
    funeral_type funeral_type_enum,

    brand_id INTEGER NOT NULL REFERENCES funeral_brand(id) ON DELETE CASCADE,

    -- Provenance (for AI-extracted packages)
    source_url TEXT, -- page this was extracted from
    extraction_confidence REAL, -- 0-1 confidence score from AI

    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_package_brand ON package(brand_id);
CREATE INDEX idx_package_type ON package(funeral_type);

-- Junction: package <-> funeral_area
CREATE TABLE package_funeral_area (
    package_id INTEGER NOT NULL REFERENCES package(id) ON DELETE CASCADE,
    funeral_area_id INTEGER NOT NULL REFERENCES funeral_area(id) ON DELETE CASCADE,
    PRIMARY KEY (package_id, funeral_area_id)
);

-- ============================================================
-- 6. PACKAGE INCLUSION (fee line item within a package)
-- ============================================================

CREATE TABLE package_inclusion (
    id SERIAL PRIMARY KEY,
    price NUMERIC(10,2) NOT NULL,
    optional BOOLEAN NOT NULL DEFAULT FALSE,
    complimentary BOOLEAN NOT NULL DEFAULT FALSE,
    display BOOLEAN NOT NULL DEFAULT TRUE,
    description TEXT,
    sort INTEGER DEFAULT 0,
    inclusion_type_title TEXT NOT NULL, -- category label (e.g. "Professional Service Fee")

    package_id INTEGER NOT NULL REFERENCES package(id) ON DELETE CASCADE,

    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_inclusion_package ON package_inclusion(package_id);

-- ============================================================
-- 7. KNOWN FOR (feature badges on provider cards)
-- ============================================================

CREATE TABLE known_for (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    brand_id INTEGER NOT NULL REFERENCES funeral_brand(id) ON DELETE CASCADE
);

CREATE INDEX idx_known_for_brand ON known_for(brand_id);

-- ============================================================
-- 8. SOURCE LOG (audit trail of scrape runs)
-- ============================================================

CREATE TABLE source_log (
    id SERIAL PRIMARY KEY,
    source_name TEXT NOT NULL, -- 'vic_register', 'gathered_here', 'nfda', 'funerals_australia'
    run_started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    run_finished_at TIMESTAMPTZ,
    records_found INTEGER DEFAULT 0,
    records_new INTEGER DEFAULT 0,
    records_updated INTEGER DEFAULT 0,
    records_skipped INTEGER DEFAULT 0,
    status TEXT DEFAULT 'running', -- 'running', 'completed', 'failed'
    error_message TEXT,
    metadata JSONB -- any extra run info
);

-- ============================================================
-- 9. SOURCE RECORD (raw scraped data, kept for audit)
-- ============================================================

CREATE TABLE source_record (
    id SERIAL PRIMARY KEY,
    source_name TEXT NOT NULL,
    source_id TEXT NOT NULL, -- external ID from the source
    source_url TEXT,
    raw_data JSONB NOT NULL, -- original scraped data
    normalized_data JSONB, -- mapped to intermediate format
    matched_brand_id INTEGER REFERENCES funeral_brand(id) ON DELETE SET NULL,
    match_type TEXT, -- 'source_key', 'abn', 'name_postcode', 'fuzzy', 'new'
    processed_at TIMESTAMPTZ,
    log_id INTEGER REFERENCES source_log(id) ON DELETE SET NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    UNIQUE(source_name, source_id)
);

CREATE INDEX idx_source_record_source ON source_record(source_name, source_id);
CREATE INDEX idx_source_record_brand ON source_record(matched_brand_id) WHERE matched_brand_id IS NOT NULL;

-- ============================================================
-- UPDATED_AT TRIGGER
-- ============================================================

CREATE OR REPLACE FUNCTION update_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_funeral_home_updated BEFORE UPDATE ON funeral_home FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_funeral_brand_updated BEFORE UPDATE ON funeral_brand FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_location_updated BEFORE UPDATE ON location FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_funeral_area_updated BEFORE UPDATE ON funeral_area FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_package_updated BEFORE UPDATE ON package FOR EACH ROW EXECUTE FUNCTION update_updated_at();
CREATE TRIGGER trg_package_inclusion_updated BEFORE UPDATE ON package_inclusion FOR EACH ROW EXECUTE FUNCTION update_updated_at();

COMMIT;
221
database/schema_sqlite.sql
Normal file
@@ -0,0 +1,221 @@
-- Provider Discovery Pipeline - SQLite Schema (for local dev/testing)
-- Production uses Postgres (see schema.sql)
-- Note: SQLite enforces the REFERENCES clauses below only when
-- PRAGMA foreign_keys = ON is set on each connection.

-- ============================================================
-- FUNERAL HOME
-- ============================================================

CREATE TABLE IF NOT EXISTS funeral_home (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    website TEXT,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

-- ============================================================
-- FUNERAL BRAND
-- ============================================================

CREATE TABLE IF NOT EXISTS funeral_brand (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    description TEXT,
    modal_description TEXT,
    email TEXT,
    phone TEXT,
    website TEXT,
    abn TEXT,
    code TEXT UNIQUE,
    sort INTEGER DEFAULT 0,
    hidden INTEGER NOT NULL DEFAULT 1,

    business_address TEXT,
    business_suburb TEXT,
    business_state TEXT,
    business_postcode TEXT,

    background_colour TEXT,
    foreground_colour TEXT,

    funeral_home_id INTEGER REFERENCES funeral_home(id) ON DELETE SET NULL,

    verified INTEGER NOT NULL DEFAULT 0,
    source_key TEXT UNIQUE,
    source_url TEXT,
    last_enriched_at TEXT,
    enrichment_status TEXT NOT NULL DEFAULT 'pending' CHECK(enrichment_status IN ('pending','partial','complete','failed')),

    -- Listing tier: verified | priced | estimated | listed
    listing_tier TEXT NOT NULL DEFAULT 'listed'
        CHECK(listing_tier IN ('verified','priced','estimated','listed')),

    available_funeral_types TEXT,

    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_brand_abn ON funeral_brand(abn);
CREATE INDEX IF NOT EXISTS idx_brand_source_key ON funeral_brand(source_key);
CREATE INDEX IF NOT EXISTS idx_brand_listing_tier ON funeral_brand(listing_tier);
CREATE INDEX IF NOT EXISTS idx_brand_name_postcode ON funeral_brand(title, business_postcode);
CREATE INDEX IF NOT EXISTS idx_brand_verified ON funeral_brand(verified);
CREATE INDEX IF NOT EXISTS idx_brand_hidden ON funeral_brand(hidden);

-- ============================================================
-- LOCATION
-- ============================================================

CREATE TABLE IF NOT EXISTS location (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    address TEXT,
    suburb TEXT,
    state TEXT,
    postcode TEXT,
    country TEXT DEFAULT 'Australia',
    lat REAL,
    lng REAL,
    rating REAL,
    rating_num INTEGER,
    google_place_key TEXT,

    brand_id INTEGER NOT NULL REFERENCES funeral_brand(id) ON DELETE CASCADE,

    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_location_brand ON location(brand_id);
CREATE INDEX IF NOT EXISTS idx_location_postcode ON location(postcode);

-- ============================================================
-- FUNERAL AREA
-- ============================================================

CREATE TABLE IF NOT EXISTS funeral_area (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    code TEXT,
    description TEXT,
    postcodes TEXT,
    sort INTEGER DEFAULT 0,
    hidden INTEGER DEFAULT 0,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TABLE IF NOT EXISTS brand_funeral_area (
    brand_id INTEGER NOT NULL REFERENCES funeral_brand(id) ON DELETE CASCADE,
    funeral_area_id INTEGER NOT NULL REFERENCES funeral_area(id) ON DELETE CASCADE,
    PRIMARY KEY (brand_id, funeral_area_id)
);

-- ============================================================
-- PACKAGE
-- ============================================================

CREATE TABLE IF NOT EXISTS package (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    description TEXT,
    sort INTEGER DEFAULT 0,
    hidden INTEGER DEFAULT 0,
    for_whom TEXT,
    religion TEXT,
    funeral_type TEXT CHECK(funeral_type IN (
        'Service & Cremation','Service & Burial','Cremation Only',
        'Graveside Burial','Water Cremation'
    )),

    brand_id INTEGER NOT NULL REFERENCES funeral_brand(id) ON DELETE CASCADE,

    source_url TEXT,
    extraction_confidence REAL,

    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_package_brand ON package(brand_id);

CREATE TABLE IF NOT EXISTS package_funeral_area (
    package_id INTEGER NOT NULL REFERENCES package(id) ON DELETE CASCADE,
    funeral_area_id INTEGER NOT NULL REFERENCES funeral_area(id) ON DELETE CASCADE,
    PRIMARY KEY (package_id, funeral_area_id)
);

-- ============================================================
-- PACKAGE INCLUSION
-- ============================================================

CREATE TABLE IF NOT EXISTS package_inclusion (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    price REAL NOT NULL,
    optional INTEGER NOT NULL DEFAULT 0,
    complimentary INTEGER NOT NULL DEFAULT 0,
    display INTEGER NOT NULL DEFAULT 1,
    description TEXT,
    sort INTEGER DEFAULT 0,
    inclusion_type_title TEXT NOT NULL,

    package_id INTEGER NOT NULL REFERENCES package(id) ON DELETE CASCADE,

    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_inclusion_package ON package_inclusion(package_id);

-- ============================================================
-- KNOWN FOR
-- ============================================================

CREATE TABLE IF NOT EXISTS known_for (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    brand_id INTEGER NOT NULL REFERENCES funeral_brand(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_known_for_brand ON known_for(brand_id);

-- ============================================================
-- SOURCE LOG
-- ============================================================

CREATE TABLE IF NOT EXISTS source_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_name TEXT NOT NULL,
    run_started_at TEXT NOT NULL DEFAULT (datetime('now')),
    run_finished_at TEXT,
    records_found INTEGER DEFAULT 0,
    records_new INTEGER DEFAULT 0,
    records_updated INTEGER DEFAULT 0,
    records_skipped INTEGER DEFAULT 0,
    status TEXT DEFAULT 'running',
    error_message TEXT,
    metadata TEXT -- JSON string
);

-- ============================================================
-- SOURCE RECORD
-- ============================================================

CREATE TABLE IF NOT EXISTS source_record (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_name TEXT NOT NULL,
    source_id TEXT NOT NULL,
    source_url TEXT,
    raw_data TEXT NOT NULL, -- JSON string
    normalized_data TEXT, -- JSON string
    matched_brand_id INTEGER REFERENCES funeral_brand(id) ON DELETE SET NULL,
    match_type TEXT,
    processed_at TEXT,
    log_id INTEGER REFERENCES source_log(id) ON DELETE SET NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),

    UNIQUE(source_name, source_id)
);

CREATE INDEX IF NOT EXISTS idx_source_record_source ON source_record(source_name, source_id);

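The schema relies on `ON DELETE CASCADE` and `ON DELETE SET NULL`, but SQLite only enforces foreign keys on connections that have run `PRAGMA foreign_keys = ON`; enforcement is off by default. The sketch below illustrates the difference with Python's standard `sqlite3` module. The `funeral_brand` table here is a pared-down stand-in (its real definition is not shown in this part of the diff), and the helper name and sample rows are hypothetical.

```python
import sqlite3

# Pared-down stand-ins for the real tables; the funeral_brand columns are assumed.
DDL = """
CREATE TABLE funeral_brand (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL
);
CREATE TABLE known_for (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    brand_id INTEGER NOT NULL REFERENCES funeral_brand(id) ON DELETE CASCADE
);
"""


def known_for_rows_after_brand_delete(enforce_fk: bool) -> int:
    """Delete a brand and return how many of its known_for rows survive."""
    conn = sqlite3.connect(":memory:")
    # Foreign-key enforcement is a per-connection setting in SQLite.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_fk else 'OFF'}")
    conn.executescript(DDL)
    conn.execute("INSERT INTO funeral_brand (title) VALUES ('Example Funerals')")
    conn.execute("INSERT INTO known_for (title, brand_id) VALUES ('Family owned', 1)")
    conn.execute("DELETE FROM funeral_brand WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM known_for").fetchone()[0]
    conn.close()
    return remaining


with_fk = known_for_rows_after_brand_delete(enforce_fk=True)
without_fk = known_for_rows_after_brand_delete(enforce_fk=False)
```

With enforcement on, deleting the brand also removes its `known_for` row; with it off, the child row is left orphaned. Any crawler or n8n step that writes to this database would likely want to set the pragma right after connecting.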
24
database/seed_verified.sql
Normal file
@@ -0,0 +1,24 @@
-- Seed script: Mark existing brands as verified
-- Run after importing existing CMS data into the new schema.
--
-- This updates all pre-existing brands (imported from brands-full.json)
-- to verified=true, hidden=false, enrichment_status='complete',
-- listing_tier='verified'.

UPDATE funeral_brand
SET verified = TRUE,
    hidden = FALSE,
    enrichment_status = 'complete',
    listing_tier = 'verified',
    updated_at = datetime('now')
WHERE id IN (
    -- IDs from the existing 12 brands in brands-full.json
    -- These will be populated during the initial CMS data import.
    -- Update this list to match actual imported IDs.
    SELECT id FROM funeral_brand WHERE source_key IS NULL
);

-- Alternatively, if importing with known codes:
-- UPDATE funeral_brand SET verified = TRUE, hidden = FALSE, enrichment_status = 'complete'
-- WHERE code IN ('hparsons', 'parsons-ladies', 'rankins', 'killick', 'botanical',
--                'easy', 'wollongong-city', 'kenneallys', 'lady-anne',
--                'mackay', 'mannings', 'guardian');
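One way to sanity-check the seed logic is to run it against an in-memory SQLite database before touching the dev database. The sketch below is only an illustration: the `funeral_brand` columns and their defaults are inferred from the `UPDATE` statement rather than taken from the real table definition, the second sample row and its `source_key` value are made up, and the timestamp uses `datetime('now')` because SQLite has no `NOW()` function. Flags are written as `1`/`0` so the statement also runs on SQLite builds older than 3.23, which do not recognise `TRUE`/`FALSE`.

```python
import sqlite3

# Assumed, cut-down version of funeral_brand; defaults are illustrative only.
SCHEMA = """
CREATE TABLE funeral_brand (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    source_key TEXT,                        -- NULL for brands imported from the CMS
    verified INTEGER NOT NULL DEFAULT 0,
    hidden INTEGER NOT NULL DEFAULT 1,
    enrichment_status TEXT DEFAULT 'pending',
    listing_tier TEXT DEFAULT 'basic',
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""

SEED = """
UPDATE funeral_brand
SET verified = 1,
    hidden = 0,
    enrichment_status = 'complete',
    listing_tier = 'verified',
    updated_at = datetime('now')
WHERE source_key IS NULL
"""


def seed_verified(conn: sqlite3.Connection) -> int:
    """Mark CMS-imported brands as verified and return the number of rows changed."""
    cur = conn.execute(SEED)
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO funeral_brand (title, source_key) VALUES (?, ?)",
    [
        ("Rankins Funerals", None),                       # imported from the CMS
        ("Discovered Provider", "hypothetical-source:1"),  # found by a crawler
    ],
)
updated = seed_verified(conn)
rows = conn.execute(
    "SELECT title, verified, hidden, listing_tier FROM funeral_brand ORDER BY id"
).fetchall()
```

Only the row with a NULL `source_key` is promoted; the crawler-discovered row keeps its defaults, which is the behaviour the seed script's `WHERE` clause is meant to guarantee.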