Operations
The operational setup blueprint and day-to-day runbooks. This is an execution checklist, not an architecture document — for architecture see the engineering overview.
Core principle
Before writing any code, five things must be true:
- accounts exist
- infrastructure is accessible
- development tools are installed
- data sources are reachable
- the deployment path is defined
If any are missing, development is blocked.
Accounts and platforms
| Area | Choice | Purpose |
|---|---|---|
| Source control | GitHub (org recommended) | code hosting, collaboration, CI/CD |
| Backend hosting | FastAPI Cloud (MVP) — Render is the fallback | run FastAPI, serve the API |
| Frontend hosting | Cloudflare Pages | static app hosting |
| App database | Neon (serverless Postgres, free tier to start) | events, sessions, analytics |
| Auth | Supabase Auth (managed login + JWT) | identity only — app data stays in Neon |
| Containers | Docker Hub (optional) | image storage / deploy consistency |
| Error tracking | Sentry or equivalent | detect production failures |
| Logs | cloud logs dashboard | monitor system health |
Local development environment
System tools: Git, Docker, Node.js (LTS), Python 3.10+. Python stack: a virtual-environment
manager (the repo uses uv), PyTorch, OpenCV and a YOLO library. Frontend stack:
Node.js + bun, React (TanStack Start + Vite, scaffolded in frontend/).
See Local development for the exact run steps and port map.
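A quick way to confirm the "development tools are installed" condition is a preflight script. The sketch below is hypothetical and not part of the repo. It only checks that each tool is on `PATH` and that the interpreter is Python 3.10+. It does not verify the Node.js LTS version or that the Docker daemon is running.

```python
import shutil
import sys

# Hypothetical tool list, taken from the stacks above; adjust to your setup.
REQUIRED_TOOLS = ("git", "docker", "node", "bun", "uv")
MIN_PYTHON = (3, 10)


def preflight(tools=REQUIRED_TOOLS, python_version=sys.version_info[:2]):
    """Return a list of problems; an empty list means nothing is blocking."""
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {python_version[0]}.{python_version[1]}"
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling `preflight()` with no arguments checks the current machine. Print each returned item and treat a non-empty list as "development is blocked".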
Video / camera data access
You must prepare at least one source: an RTSP camera stream, or recorded CCTV
video files. Credentials to collect for a live camera: IP address,
username/password (if needed), and the stream URL format. See the research doc
camera-access.html for the full set of ways to reach a feed (LAN RTSP, NVR,
mesh VPN, tunnel, push, offline) and which to use per stage.
Near-real-time layer
The MVP latency budget is seconds to minutes, so the transport is deliberately simple:
- The dashboard polls the REST API every few seconds.
- The edge batches events and POSTs over HTTP (buffered + retry on the box).
- No WebSockets, no Redis.
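"Buffered + retry on the box" can be as small as an in-memory queue flushed in batches with exponential backoff. The class below is a hypothetical sketch. `send` is an assumed callable that POSTs one batch and raises `OSError` (which includes `urllib`'s `URLError`) on network failure. The batch size, retry count and delays are illustrative. An in-memory queue is lost on reboot, so a real box might persist pending events to disk.

```python
import time
from collections import deque
from itertools import islice


class EventBuffer:
    """Hypothetical edge-side buffer: queue events, send them in batches, retry on failure."""

    def __init__(self, send, batch_size=50, max_retries=5, base_delay=1.0, sleep=time.sleep):
        self.send = send              # callable(list_of_events); raises OSError on failure
        self.batch_size = batch_size
        self.max_retries = max_retries
        self.base_delay = base_delay  # seconds before the first retry
        self.sleep = sleep            # injectable so tests don't actually wait
        self.pending = deque()

    def add(self, event):
        self.pending.append(event)

    def flush(self):
        """Send queued events oldest-first; return True once the queue is empty.

        If a batch still fails after max_retries attempts, it stays queued
        and flush returns False, so nothing is dropped while the uplink is down.
        """
        while self.pending:
            batch = list(islice(self.pending, self.batch_size))
            for attempt in range(self.max_retries):
                try:
                    self.send(batch)
                    break
                except OSError:
                    # Exponential backoff (base_delay, 2x, 4x, ...), no sleep after the last try.
                    if attempt < self.max_retries - 1:
                        self.sleep(self.base_delay * 2 ** attempt)
            else:
                return False
            # Only drop events from the queue once their batch was accepted.
            for _ in batch:
                self.pending.popleft()
        return True
```

One design consequence: if the server stores a batch but the response is lost, the retry sends it again, so the API side may want to de-duplicate events.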
Phase 2 (only if a sub-second live-tick UX is ever required): a WebSocket server
plus a Redis event buffer.
Deployment strategy
Chosen: hybrid / edge-first. The CV worker runs where the camera is; the cloud only stores events and serves the dashboard.
Store (edge box / laptop): Camera → CV worker (YOLO+ByteTrack) → batched HTTP events (JSON) → Cloud: FastAPI (validates Supabase JWT) → Postgres (Neon) → Dashboard (Cloudflare).
- Cloud-only (dev/demo, not for live): worker + everything on one laptop reading a recorded file. Fine for building or processing a prospect’s clip.
- Edge: worker on a small in-store box, reads RTSP locally, sends only events upstream. Real-time and privacy-friendly (video never leaves the store).
- Hybrid (chosen): edge does detection/tracking/events; cloud does storage, auth, analytics and dashboard. Early pilots can skip buying a box if the stream is reachable via VPN/public RTSP.
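The edge-to-cloud hop is a plain HTTP POST of a JSON batch. The sketch below shows one possible shape. The `Event` fields, the `{"events": [...]}` envelope and the endpoint URL are all hypothetical, since the real schema is whatever the FastAPI backend defines. It also assumes the worker sends a bearer token in the `Authorization` header; how the edge box obtains a credential the backend accepts isn't specified on this page.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass


@dataclass
class Event:
    # Hypothetical fields for illustration only.
    camera_id: str
    occurred_at: str  # ISO 8601 timestamp, e.g. "2024-05-01T10:15:00Z"
    kind: str         # hypothetical event type, e.g. "enter" / "exit"
    track_id: int     # ByteTrack track ID


def build_batch_request(api_url, token, events):
    """Build (but do not send) the POST that carries one batch of events."""
    body = json.dumps({"events": [asdict(e) for e in events]}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Sending it is `urllib.request.urlopen(req, timeout=10)`, which raises `urllib.error.HTTPError` on a non-2xx response (for example, when the backend rejects the token).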
Runbook — database migrations
Migrations use Alembic and run against the database configured in `app_settings.database_url`.
Local: run `cd backend`, then `uv run alembic upgrade head`.
Production (Neon) migrations run automatically in CI before every backend deploy
(the migrate step in .github/workflows/backend.yml). To run them out of band, for example
when your local network can't reach Neon on port 5432, use the manual workflow:
`gh workflow run migrate.yml`, which runs `alembic upgrade head` against PROD_DATABASE_URL. Alternatively, trigger it from the Actions tab → “Migrate (manual)” → Run workflow.
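Before falling back to the manual workflow, it can help to confirm that port 5432 really is the problem. The sketch below is a hypothetical helper that only attempts a TCP connection to the host and port in a database URL. It does not test credentials, TLS or Alembic itself, so a `True` result means "the network path is open", nothing more.

```python
import socket
from urllib.parse import urlsplit


def can_reach(host, port=5432, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False


def database_reachable(database_url, timeout=3.0):
    """Extract host and port from a Postgres URL (default port 5432) and probe them."""
    parts = urlsplit(database_url)
    return can_reach(parts.hostname, parts.port or 5432, timeout)
```

Running `database_reachable(<your database URL>)` locally and getting `False` points at the network (firewall, VPN, captive Wi-Fi) rather than the migration code.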
Runbook — monitoring and health
- Error tracking: Sentry (or equivalent) for backend + frontend exceptions.
- Logs: the cloud logs dashboard for the deployed backend.
- Camera liveness: the worker sends events plus a heartbeat; a camera shows as
live in the backend `/health` view when both are landing. If a camera goes stale, that view is the first place to look.
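The liveness rule above ("live when both events and heartbeat are landing") reduces to two recency checks. The sketch below is a hypothetical illustration, not the backend's actual `/health` logic. The two-minute staleness window is an assumed value that should be tuned to the worker's real heartbeat and batch intervals.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold; tune to the worker's actual heartbeat/batch cadence.
STALE_AFTER = timedelta(minutes=2)


def camera_is_live(last_event_at, last_heartbeat_at, now=None, stale_after=STALE_AFTER):
    """Live only if both the latest event and the latest heartbeat are recent.

    Timestamps are timezone-aware datetimes, or None if nothing has arrived yet.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if last_event_at is None or last_heartbeat_at is None:
        return False
    return (now - last_event_at) <= stale_after and (now - last_heartbeat_at) <= stale_after
```

Taken literally, this rule marks a camera in an empty store (no events) as stale even though its worker is healthy; keying liveness on the heartbeat alone, with a longer window for events, is one alternative.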
Runbook — post-deploy smoke test
- Backend `/docs` loads at the FastAPI Cloud URL.
- Log in on the dashboard (OTP + Google) → your store appears.
- Admin: provision a store, invite an owner, add a camera.
- Point the on-site worker at the deployed API → events + heartbeat land →
camera shows live in `/health`.
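The first smoke-test step, and the last one if `/health` is a plain GET route (an assumption), can be scripted; the login and admin steps stay manual. A minimal stdlib sketch with a hypothetical `check_endpoint` helper:

```python
import urllib.error
import urllib.request


def check_endpoint(base_url, path, timeout=10.0):
    """GET base_url + path and return (ok, status).

    ok is True only for a 2xx response; status is None when no HTTP
    response arrived at all (DNS failure, refused connection, timeout).
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300, resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return False, err.code
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return False, None
```

For example, `check_endpoint("https://<your FastAPI Cloud URL>", "/docs")` should return `(True, 200)` right after a deploy.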
Operational risks
| Risk | Impact |
|---|---|
| No camera access | Blocks the CV pipeline completely |
| Weak hardware | CV inference becomes slow |
| No RTSP support | System cannot ingest live feeds |
| Cloud cost | Must optimize inference cost early |
Execution rule
Build the CV counter locally first: on sample video, no cloud, no camera. Prove the count is right, then add API, DB and deployment. Infrastructure serves a working product, not the other way around.
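This page leaves "the count is right" to judgment. One hypothetical way to make it concrete is to hand-count a few sample clips and compare the counter's output per clip. The helper below, and any pass threshold you choose (say, total error under 5%), are assumptions rather than project policy. It sums absolute per-clip errors so an over-count on one clip cannot hide an under-count on another.

```python
def count_error(predicted, ground_truth):
    """Compare counter output with hand counts for each sample clip.

    Both arguments map clip name -> count. Returns (per_clip_abs_error,
    overall_pct), where overall_pct is the sum of absolute per-clip errors
    as a percentage of the total hand count.
    """
    missing = ground_truth.keys() - predicted.keys()
    if missing:
        raise ValueError(f"no prediction for clips: {sorted(missing)}")
    per_clip = {clip: abs(predicted[clip] - truth) for clip, truth in ground_truth.items()}
    total_truth = sum(ground_truth.values())
    total_error = sum(per_clip.values())
    if total_truth == 0:
        overall_pct = 0.0 if total_error == 0 else float("inf")
    else:
        overall_pct = 100.0 * total_error / total_truth
    return per_clip, overall_pct
```

For instance, a counter that reports 48 on a clip with 50 hand-counted and 20 on a clip with 20 gives per-clip errors of 2 and 0 and an overall error of about 2.9%.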
Related
- Local development — run services locally
- Deployment — the full deploy runbook
- Testing — test strategy and CI