
Architecture Decision Log (ADR)

This document records the major architectural decisions made during the development of GrantMaster.

1. Multi-Tenancy: Logical Isolation vs. Physical Isolation

  • Status: Accepted
  • Decision: Use Logical Isolation (shared collections, organizationId filter).
  • Rationale:
    • Firestore scaling handles large collections efficiently.
    • Easier to implement platform-wide analytics and cross-tenant support (impersonation).
    • Keeps operational costs lower by avoiding per-tenant database instances.
  • Consequence: Strictly requires organizationId checks at the Service and Security Rule layers.
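
The service-layer half of this requirement can be sketched as follows. This is an illustrative stand-in, not the actual dataService.ts API — the function names and shapes are hypothetical:

```typescript
// Hypothetical sketch of the service-layer tenant guard implied by this ADR.
interface TenantScoped {
  organizationId: string;
}

// Refuse to return a document that belongs to another tenant, even when
// the caller holds a valid document id from a different organization.
function assertTenant<T extends TenantScoped>(doc: T, organizationId: string): T {
  if (doc.organizationId !== organizationId) {
    throw new Error(`Cross-tenant access denied for org ${organizationId}`);
  }
  return doc;
}

// Logical isolation: one shared collection, filtered per tenant on every read.
function listForTenant<T extends TenantScoped>(all: T[], organizationId: string): T[] {
  return all.filter((d) => d.organizationId === organizationId);
}
```

The same predicate must be mirrored in Firestore Security Rules, since client-side filtering alone is not a security boundary.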

2. Infrastructure: Firebase as Primary Backend

  • Status: Accepted
  • Decision: Use Firebase (Auth, Firestore, Cloud Functions).
  • Rationale:
    • Rapid development velocity.
    • Real-time capabilities (onSnapshot) are native.
    • Generous free tier for NGOs in early stages.
  • Consequence: Vendor lock-in; requires careful abstraction in dataService.ts to allow for potential future migration.

3. Communication: In-Memory EventBus

  • Status: Accepted
  • Decision: Use an in-memory EventBus for inter-module communication.
  • Rationale:
    • Avoids complexity of a message broker (RabbitMQ/PubSub) for internal sync.
    • Immediate UI consistency.
    • Persistence only where needed (Audit Logs).
  • Consequence: Listeners are synchronous; long-running tasks must be offloaded to Cloud Functions or background fetches.
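
A minimal sketch of the synchronous in-memory pattern (the shape is illustrative; the production EventBus API may differ):

```typescript
// Minimal synchronous in-memory EventBus sketch, assuming a topic-string API.
type Handler<T> = (payload: T) => void;

class EventBus {
  private handlers = new Map<string, Handler<unknown>[]>();

  // Returns an unsubscribe function, mirroring the onSnapshot convention.
  subscribe<T>(topic: string, handler: Handler<T>): () => void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler as Handler<unknown>);
    this.handlers.set(topic, list);
    return () => {
      const current = this.handlers.get(topic) ?? [];
      this.handlers.set(topic, current.filter((h) => h !== handler));
    };
  }

  // Listeners run synchronously in the caller's frame — hence the rule that
  // long-running work must be offloaded rather than done in a handler.
  emit<T>(topic: string, payload: T): void {
    for (const h of this.handlers.get(topic) ?? []) h(payload);
  }
}
```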

4. Frontend: React Context vs. Redux/Zustand

  • Status: Accepted
  • Decision: Use React Context + Feature Providers.
  • Rationale:
    • Native to React; no additional library weight.
    • Data is naturally scoped to sections of the tree.
    • Encourages a “Domain-Driven” frontend structure.
  • Consequence: Careful management needed to avoid unnecessary re-renders in large lists.

5. Security: Auditor Access Grants

  • Status: Accepted
  • Decision: Implement temporary, scope-limited access tokens for auditors.
  • Rationale:
    • Compliance requirement: Auditors should not have full Admin credentials.
    • Allows for time-boxed reviews of organizational data.
  • Consequence: Requires a dedicated AuditorContext and complex security rules that check for “Grant” active windows.
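
The active-window check can be illustrated with a small sketch. Field names here are hypothetical, not the real security-rule schema:

```typescript
// Illustrative time-boxed auditor access grant; the real document shape
// checked by AuditorContext and the security rules may differ.
interface AuditorGrant {
  auditorId: string;
  organizationId: string;
  scopes: string[];   // e.g. ['reports:read', 'expenses:read']
  validFrom: number;  // epoch ms
  validUntil: number; // epoch ms
}

function isGrantActive(grant: AuditorGrant, now: number): boolean {
  return now >= grant.validFrom && now <= grant.validUntil;
}

// Access requires an active window, a matching tenant, and the right scope.
function canAudit(
  grant: AuditorGrant,
  scope: string,
  organizationId: string,
  now: number,
): boolean {
  return (
    isGrantActive(grant, now) &&
    grant.organizationId === organizationId &&
    grant.scopes.includes(scope)
  );
}
```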

6. Billing: Stripe Payment Service Migration

  • Status: Accepted
  • Decision: Replace monolithic stripeService.ts with stripePaymentService.ts backed by Firebase Cloud Functions.
  • Rationale:
    • All sensitive Stripe operations (secret keys, subscription mutations) must run server-side.
    • Cloud Functions enforce authentication and authorization before touching Stripe.
    • Frontend only handles the publishable key for Stripe Elements.
  • Consequence: Frontend calls httpsCallable wrappers; no direct Stripe SDK usage in the browser.

7. Billing: Credit & Entitlement System

  • Status: Accepted
  • Decision: Introduce a credit-based metering system (creditService.ts) with a reservation/consume/release lifecycle.
  • Rationale:
    • AI agent runs are expensive; per-call billing requires atomic credit accounting.
    • Firestore transactions prevent concurrent agent runs from overdrawing the credit balance.
    • Credit packs can be purchased as one-time top-ups alongside the subscription.
  • Consequence: AgentExecutionService must reserve credits before a run and release unused credits on completion or failure. Entitlements are gated via src/config/entitlements.ts.
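
The reserve/consume/release lifecycle can be modeled in memory as follows. This is a sketch of the accounting logic only — the real creditService.ts wraps these invariants in Firestore transactions to stay atomic under concurrency:

```typescript
// In-memory model of the credit lifecycle described above (illustrative).
interface CreditAccount {
  balance: number;  // purchased + subscription credits
  reserved: number; // held for in-flight agent runs
}

// Reserve before a run: the available amount is balance minus outstanding holds.
function reserve(acc: CreditAccount, amount: number): CreditAccount {
  if (acc.balance - acc.reserved < amount) throw new Error('Insufficient credits');
  return { ...acc, reserved: acc.reserved + amount };
}

// Settle after a run: consume the actual cost, release the unused remainder.
function settle(acc: CreditAccount, reservedAmount: number, consumed: number): CreditAccount {
  if (consumed > reservedAmount) throw new Error('Consumed more than reserved');
  return {
    balance: acc.balance - consumed,
    reserved: acc.reserved - reservedAmount,
  };
}
```

A failed run settles with `consumed: 0`, which releases the full hold back to the balance.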

8. Extensions: Pluggable Feature Architecture

  • Status: Accepted
  • Decision: Build a multi-milestone extension system with a stable public API facade, contribution registry, settings panels, lifecycle hooks, dependency validation, data migrations, observability, and DX scaffolding.
  • Rationale:
    • Features like Grant Calendar or Impact Tracking can be independently enabled/disabled per tenant.
    • Marketplace model allows pricing and trial periods per extension.
    • Clean separation from core via ExtensionAPI facade prevents internal coupling.
  • Consequence: Extensions register contributions (routes, sidebar items, widgets) through the contribution registry. Dependency graph validation prevents circular or missing dependencies.

9. AI Agents: Execution Architecture

  • Status: Accepted
  • Decision: Implement autonomous AI agents with a step-based execution model, tool registry, quota enforcement, and human-in-the-loop escalation.
  • Rationale:
    • Agents need a bounded execution model (max steps, credit budgets) for cost control.
    • Tool registry (AgentToolRegistry) restricts each agent to its declared allowed tools.
    • Escalation pattern (awaiting_human status) lets agents pause for human approval on sensitive actions.
  • Consequence: Agent runs follow a state machine (queued → running → paused/awaiting_human → completed/failed/cancelled). Every tool execution is scoped to the triggering user’s RBAC permissions.
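
The state machine above can be sketched as an explicit transition table. The exact set of legal edges is illustrative — only the statuses themselves come from this ADR:

```typescript
// Sketch of the agent run state machine; the transition table is an
// assumption consistent with the statuses listed in this ADR.
type RunStatus =
  | 'queued' | 'running' | 'paused'
  | 'awaiting_human' | 'completed' | 'failed' | 'cancelled';

const TRANSITIONS: Record<RunStatus, RunStatus[]> = {
  queued: ['running', 'cancelled'],
  running: ['paused', 'awaiting_human', 'completed', 'failed', 'cancelled'],
  paused: ['running', 'cancelled'],
  awaiting_human: ['running', 'cancelled'], // human approves → resume
  completed: [],  // terminal
  failed: [],     // terminal
  cancelled: [],  // terminal
};

function transition(from: RunStatus, to: RunStatus): RunStatus {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`Illegal transition ${from} -> ${to}`);
  }
  return to;
}
```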

10. Frontend: Tailwind CSS 4 with OKLCH Design Tokens

  • Status: Accepted
  • Decision: Use Tailwind CSS 4 with @theme block and OKLCH color functions for the design token system.
  • Rationale:
    • OKLCH produces perceptually uniform color scales (consistent perceived lightness across hues).
    • Tailwind 4 @theme replaces tailwind.config.js for token definitions.
    • primary-* tokens decouple brand color from hardcoded blue-* classes.
  • Consequence: All color references must use semantic tokens (primary-*, surface-*). Direct blue-*, gray-*, red-*, green-* Tailwind classes are prohibited (use primary-*, slate-*, rose-*, emerald-*).
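
As a hedged illustration, a Tailwind 4 @theme block defining such tokens might look like this — the token names follow the primary-*/surface-* convention above, but the OKLCH values are invented for the example:

```css
/* Illustrative @theme token definitions; actual design-system values differ. */
@theme {
  --color-primary-500: oklch(0.55 0.15 250);
  --color-primary-600: oklch(0.48 0.15 250);
  --color-surface-100: oklch(0.97 0.01 250);
  /* Perceptually uniform scales: hold L and C constant, vary only hue. */
  --color-accent-500: oklch(0.55 0.15 140);
}
```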

11. Module Boundaries: Feature Public API Enforcement

  • Status: Accepted
  • Decision: Enforce a feature boundary rule where cross-feature imports must use the feature public API (@/features/<feature> or @/features/<feature>/index), not internal feature files.
  • Rationale:
    • Reduces accidental tight coupling across feature implementations.
    • Makes each feature easier to refactor internally without cascading breakages.
    • Encourages explicit public surface design at feature boundaries.
  • Consequence: CI runs check:feature-public-api with a no-regression baseline (config/feature-public-api-violations-baseline.json) and publishes a report artifact (artifacts/feature-public-api-report.json).

12. Code Stewardship: Layer Ownership Coverage

  • Status: Accepted
  • Decision: Require explicit ownership mapping for all source files under src/ via config/layer-ownership.json.
  • Rationale:
    • Clarifies accountability for architectural layers.
    • Prevents unowned code paths from accumulating hidden maintenance risk.
    • Supports review routing and faster incident response.
  • Consequence: CI runs check:layer-ownership and fails on unmapped files. A machine-readable report is published at artifacts/layer-ownership-report.json.

13. EventBus Migration: Superadmin → Platform Feature

  • Status: Accepted
  • Decision: Move the EventBus monitoring page from src/features/superadmin/ to src/features/platform/eventbus/ and simplify it to a 2-tab layout (Stream + Dead Letter).
  • Rationale:
    • The EventBus page serves as operational infrastructure monitoring, not tenant-specific admin — it belongs in platform/.
    • The old 4-tab layout (Overview, Events, Topics, Config) was backed by mock data from PlatformConsole seeds and provided no real operational value.
    • Replacing mock-backed tabs with Firestore-backed Stream and Dead Letter tabs provides actionable monitoring.
    • The Dead Letter tab now reads from the eventDlq collection and supports Cloud Function replay.
  • Consequence: Old superadmin EventBus components, seeds, and routes were deleted. The useEventBus and useDeadLetterQueue hooks now live under platform/eventbus/hooks/.

14. Procurement P2P: Server-Side tRPC Architecture

  • Status: Accepted
  • Decision: Implement the Procurement (Procure-to-Pay) feature as a server-side tRPC router with ServerProcurementService in Cloud Functions, rather than using client-side Firestore services.
  • Rationale:
    • Procurement involves multi-step approval workflows and vendor qualification — these benefit from server-side validation and authorization.
    • Zod schemas in packages/domain-schema/src/trpc/procurement.ts enforce input validation at the API boundary.
    • Aligns with the established pattern for new features (expenses and journals also gained server-side services in this iteration).
  • Consequence: Frontend consumes procurement data via tRPC hooks. Domain schemas are shared between the frontend and Cloud Functions via the domain-schema package.

15. Integration OAuth: Server-Side Token Delegation

  • Status: Accepted
  • Decision: Move OAuth token exchange and refresh for third-party integrations (Google Calendar, HubSpot) to Cloud Functions. Raw tokens are stored in Firestore and never returned to the client.
  • Rationale:
    • Client-side OAuth token handling exposes client secrets in the browser bundle.
    • Server-side delegation keeps secrets in environment variables and tokens in Firestore integrationConfigs/ subcollections.
    • The pattern is consistent with the existing Stripe payment service architecture (ADR #6).
  • Consequence: Frontend calls exchangeGoogleCalendarCode, refreshGoogleCalendarToken, and refreshHubSpotToken Cloud Functions. Environment variables GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, HUBSPOT_CLIENT_ID, HUBSPOT_CLIENT_SECRET must be configured on the server.

16. Skeleton Loading States: AppShell Skeleton System

  • Status: Accepted
  • Decision: Introduce a set of skeleton loading components (AppShellSkeleton, DashboardSkeleton, TablePageSkeleton, FormPageSkeleton, WorkspaceSkeleton, ApprovalsSkeleton) for use as React.Suspense fallbacks in route-level code splitting.
  • Rationale:
    • Default spinner fallbacks cause layout shift when chunked route components load.
    • Purpose-built skeletons match the layout of common page patterns (dashboard grids, data tables, form pages) and provide instant perceived responsiveness.
  • Consequence: Route lazy() boundaries should use the appropriate skeleton as the Suspense fallback. Skeletons live in src/components/ui/skeletons/ with the shell wrapper in src/components/ui/AppShellSkeleton.tsx.

17. Grant→Project Auto-Linking on Conversion

  • Status: Accepted (2026-04-14)
  • Decision: When a pipeline entry is converted to an ActiveGrant via ServerActiveGrantServiceClass.convert(), automatically create a linked Project document and bidirectionally link the two records.
  • Rationale:
    • Previously, convertToActiveGrant() created only an ActiveGrant document; no project was spawned. This left won grants as data islands, disconnecting the entire downstream MVP flow (expenses, journals, compliance, reports) from the grant origin.
    • The Project schema already carried grantId/activeGrantId fields but they were never populated automatically.
    • Operational delivery (tasks, budget, expenses, journals, compliance) is scoped by projectId, so a missing project meant the grant had no operational hooks.
  • Consequence: ServerActiveGrantServiceClass.convert() (in functions/src/api/services/ServerGrantService.ts) now:
    1. Creates the ActiveGrant document.
    2. Looks up the pipeline entry to derive the project name and grant title.
    3. Calls serverProjectService.create(...) with grantId, managerId, startDate/endDate, and budget derived from the award.
    4. Back-links the project’s id onto the ActiveGrant as projectId.
    5. Emits GRANT_WON with { grantId, pipelineId, projectId } in the payload so downstream subscribers receive the project link.
  • Client impact: useGrants.convertToActiveGrant invalidates utils.projects.list and utils.projects.stats on success so the new project appears immediately. Local-workspace (demo) mode displays a toast noting the linked project was created.
  • Follow-up: Budget cascade from grant-approved line items to project budget lines is not yet wired; see docs/planning/ for the work-tracking entry.
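
The five conversion steps above can be sketched with in-memory stand-ins for the server services (names and shapes hypothetical, not the actual ServerActiveGrantServiceClass signature):

```typescript
// In-memory sketch of the convert() flow described in this ADR.
interface PipelineEntry { id: string; title: string; award: number; managerId: string; }
interface ActiveGrant { id: string; pipelineId: string; projectId?: string; }
interface Project { id: string; grantId: string; budget: number; }

type Emit = (event: string, payload: unknown) => void;

function convert(
  entry: PipelineEntry,
  createGrant: (pipelineId: string) => ActiveGrant,
  createProject: (grantId: string, name: string, budget: number) => Project,
  emit: Emit,
): { grant: ActiveGrant; project: Project } {
  const grant = createGrant(entry.id);                                // 1. create ActiveGrant
  const project = createProject(grant.id, entry.title, entry.award);  // 2–3. derive name/budget, create project
  grant.projectId = project.id;                                       // 4. back-link projectId onto the grant
  emit('GRANT_WON', { grantId: grant.id, pipelineId: entry.id, projectId: project.id }); // 5. notify subscribers
  return { grant, project };
}
```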

18. Grant-Aware Compliance Policy Auto-Attach

  • Status: Accepted (2026-04-14)
  • Decision: When a grant is converted to active (ADR #17), the server also auto-provisions draft compliance policies derived from the grant opportunity’s complianceRequirements and grantorType.
  • Rationale:
    • Funder-specific compliance rules (e.g. EU procurement thresholds, USAID branding requirements) should flow from the grant context into the organization’s compliance surface without manual re-entry.
    • Seeding policies as status: 'draft' with adoptionSource: 'grant_conversion' gives compliance officers visibility and a reviewable starting point rather than silent auto-enforcement.
  • Consequence: ServerActiveGrantServiceClass.convert() reads the source opportunity from grantOpportunities/{id}, extracts string-based complianceRequirements plus any overrides from grantDetails, and fans out serverCompliancePolicyService.create(...) calls with:
    • name (truncated to 80 chars), description, category: 'Grant Compliance'
    • severity: 'high', status: 'draft', frequency: 'ongoing'
    • projectIds: [<newProjectId>], adoptionSource: 'grant_conversion', grantId, grantorType
    • Failures are logged but non-fatal to the grant conversion (Promise.allSettled).
  • Follow-up: Full integration with GrantorComplianceService.getRecommendedRules() (platform rule recommendations by donor type) is a future enhancement; current implementation seeds from grant-specific requirements only.

19. Report Narrative Enrichment with Financial Context

  • Status: Accepted (2026-04-14)
  • Decision: generateReportNarrative() (in src/features/ai/services/geminiForecast.ts) accepts an optional ReportFinancialContext parameter that summarizes expenses and journal hours for the reporting period.
  • Rationale:
    • Prior to this change, report generation received only projects[] and user. The AI produced generic, data-starved narratives with no real expenditure or effort figures.
    • Funders expect concrete numbers (total spend, budget utilization %, hours by activity). Providing a structured summary into the prompt lets the AI cite specific figures rather than hallucinate.
  • Consequence:
    • New exported type ReportFinancialContext encapsulates totalExpenses, expensesByCategory, currency, budgetUtilization, totalJournalHours, journalHoursByProject.
    • useReportGeneration({ projects, user, financialContext }) forwards the context to the AI service.
    • The Reports container (src/features/reports/components/Reports.tsx) composes the context from useExpenses() and useJournals() via useMemo and passes it to the hook.
    • Dev-mode fallback narrative (when no Gemini API key is present) also emits a financial overview section when financialContext is populated.
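
The composition step in the Reports container can be sketched as a pure function. This is illustrative — the real ReportFinancialContext also carries a currency field, and the exact aggregation in Reports.tsx may differ:

```typescript
// Sketch of composing a financial context from expenses and journal entries.
interface Expense { category: string; amount: number; }
interface JournalEntry { projectId: string; hours: number; }

interface FinancialContext {
  totalExpenses: number;
  expensesByCategory: Record<string, number>;
  totalJournalHours: number;
  journalHoursByProject: Record<string, number>;
  budgetUtilization: number; // fraction of total budget spent, 0–1
}

function composeFinancialContext(
  expenses: Expense[],
  journals: JournalEntry[],
  totalBudget: number,
): FinancialContext {
  const expensesByCategory: Record<string, number> = {};
  let totalExpenses = 0;
  for (const e of expenses) {
    expensesByCategory[e.category] = (expensesByCategory[e.category] ?? 0) + e.amount;
    totalExpenses += e.amount;
  }
  const journalHoursByProject: Record<string, number> = {};
  let totalJournalHours = 0;
  for (const j of journals) {
    journalHoursByProject[j.projectId] = (journalHoursByProject[j.projectId] ?? 0) + j.hours;
    totalJournalHours += j.hours;
  }
  return {
    totalExpenses,
    expensesByCategory,
    totalJournalHours,
    journalHoursByProject,
    budgetUtilization: totalBudget > 0 ? totalExpenses / totalBudget : 0,
  };
}
```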

20. Partnership Routes and Mission Routes Build-Gated for Launch

  • Status: Superseded by ADR #22 (2026-04-15)
  • Decision: Set BUILD_ENABLE_PARTNERSHIP_ROUTES = false in src/config/launchScope.ts. The single flag gated both partnershipRoutes and missionRoutes in src/routes/config/grantDomainRoutes.tsx.
  • Rationale:
    • Partnerships were already enumerated in DISABLED_ROUTE_PATHS, creating a contradictory state where the route bundle was emitted but the paths were filtered at the router. Setting the build flag to false removes dead code from the production bundle.
    • Mission, invitations, and partner detail pages are not part of the core MVP operational flow (discover → pipeline → win → deliver → report).
  • Consequence:
    • partnershipRoutes and missionRoutes resolved to empty arrays at build time; their lazy imports tree-shook out.
    • PlatformNavItemDef gained an optional launchCheck: () => boolean field so the platform sidebar can hide the Partnerships item at runtime.
    • The tenant sidebar Mission nav item carried launchCheck: () => BUILD_ENABLE_PARTNERSHIP_ROUTES and was filtered out by useNavResolver.
    • routeConfig.test.ts was updated to assert Portal/Stakeholders routes remain while Mission is gated.

21. Route-Level Error Boundaries on Suspense-Wrapped Tab Content

  • Status: Accepted (2026-04-14)
  • Decision: Wrap Suspense boundaries that load tab/section content with RouteErrorBoundary in DashboardTabs.tsx and ComplianceWorkspace.tsx.
  • Rationale:
    • Without an error boundary, a failed React.lazy chunk load or a runtime error inside the loaded component causes a blank white screen for the whole dashboard/compliance workspace.
    • RouteErrorBoundary already existed for page-level route boundaries (withRouteBoundary). Re-using it at the nested Suspense level isolates errors to the active tab/section.
  • Consequence:
    • DashboardTabs wraps KeepAliveTabPanels in <RouteErrorBoundary isolationLevel="route">. A crashed widget tab shows the error UI inline while the rest of the dashboard (KPIs, header, nav) remains interactive.
    • ComplianceWorkspace wraps the section Suspense the same way — a failed AlertsSection or TrendsSection load no longer takes down the PageTabs header.

22. Decouple Partnerships from Mission Behind Independent Launch Flags

  • Status: Accepted (2026-04-15)
  • Decision: Split the shared BUILD_ENABLE_PARTNERSHIP_ROUTES gate into two flags in src/config/launchScope.ts:
    • BUILD_ENABLE_PARTNERSHIP_ROUTES = true — governs partnershipRoutes and the platform sidebar Partnerships item.
    • BUILD_ENABLE_MISSION_ROUTE = false — governs missionRoutes (mission + invitations) and the tenant sidebar Mission item.
  • Rationale:
    • The Partnerships platform workspace was wanted for launch, but Mission/invitations are still not part of MVP scope. Gating both under one flag forced an all-or-nothing choice.
    • Keeping the gates independent lets each surface ship when its content is ready without ceremony around edits to the shared flag or the DISABLED_ROUTE_PATHS set.
  • Consequence:
    • partnershipRoutes now emits its lazy bundles; missionRoutes still resolves to [] and tree-shakes out.
    • Platform sidebar Partnerships nav item becomes visible; tenant sidebar Mission remains filtered out by useNavResolver via its new launchCheck: () => BUILD_ENABLE_MISSION_ROUTE.
    • partnerships and partnerships/:partnerId removed from DISABLED_ROUTE_PATHS — the build flag is the single source of truth now.
    • LAUNCH_BUILD_CONTROLS gains a mission-route entry alongside partnership-routes.
    • routeConfig.test.ts was updated to assert Mission/invitations are absent while the rest of the Impact surface (Portal, Stakeholders) remains.
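
The launchCheck filtering behavior described above can be sketched as follows — a simplified model of how useNavResolver might drop gated items (the real nav item type has more fields):

```typescript
// Sketch of launchCheck-based nav filtering; illustrative, not the
// actual useNavResolver implementation.
interface NavItemDef {
  label: string;
  path: string;
  launchCheck?: () => boolean; // optional gate; absent means always visible
}

function resolveNavItems(items: NavItemDef[]): NavItemDef[] {
  return items.filter((item) => item.launchCheck?.() ?? true);
}
```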

23. Repository Pattern for Persistence (IFirestoreRepository<T>)

  • Status: Accepted (2026-04-18)
  • Decision: Services do not import from firebase/firestore (client) or firebase-admin/firestore (server) directly. Persistence is routed through IFirestoreRepository<T> from @/core/repository (client) and ./core/repository (server). The interface is symmetric across both runtimes.
  • Rationale:
    • The pre-existing dual-path pattern on BaseService (optional IFirestoreClient) left a ~1,400-site raw-SDK surface area in feature services; BaseService’s audit / quota / validation guarantees only fired when subclasses remembered to call them.
    • A narrow, typed persistence interface lets the repository enforce tenant scoping (organizationId required on reads), schema validation (Zod), and policy (cache invalidation, error taxonomy) in one place rather than 1,400.
    • Client/server symmetry means service migration patterns are portable — a migration on the client (e.g. grantDataAccess.ts) applies almost verbatim to the equivalent server handler.
  • Consequence:
    • Two baselines enforce the rule: check:no-raw-firestore-in-services (client) and check:no-raw-firestore-in-functions (server). New violations fail CI; migrations monotonically shrink the baselines.
    • API surface (both runtimes): getById, getByIdUnscoped, list, listByIds, listAcrossTenants, paginate, paginateAcrossTenants, create, update, setMerge, delete, batchUpdate, cursorById (admin), plus options transform (pre-validation hook) and skipTenantCheck (explicit opt-out).
    • Client-only: stream(options) returns an Unsubscribe and routes through listenerManager for automatic listener pooling (30–50% fewer onSnapshot calls when the same query fans out to multiple components).
    • Cross-repo coordination helpers: fallbackGetByIdUnscoped / fallbackList (for COLLECTION_FALLBACKS read order), createBatchWriter (cross-collection atomic writes), runInTransaction (admin; read-then-write atomic correctness).
    • Shared taxonomy: @grantmaster/shared/errors (AppError hierarchy) and @grantmaster/shared/schemas (passthrough schemas for server-side collections) ensure both runtimes throw/catch the same classes and parse the same shapes.

When to use which repo method

  • Single doc, tenant-scoped → getById(id, organizationId)
  • Single doc, cross-tenant or subcollection → getByIdUnscoped(id)
  • List, tenant-scoped → list({ organizationId, where?, orderBy?, limit? })
  • List, cross-tenant / platform → listAcrossTenants({ where?, orderBy?, limit? })
  • Cursor pagination, tenant-scoped → paginate({ organizationId, cursor, pageSize })
  • Cursor pagination, platform → paginateAcrossTenants({ cursor, pageSize })
  • Bulk by id list → listByIds(ids, { organizationId })
  • Real-time subscription (client) → stream({ organizationId, onNext, onError })
  • Create (auto-id) → create(data)
  • Create (explicit id) → create(data, { id })
  • Update existing → update(id, data, { organizationId? })
  • Upsert (create-or-merge) → setMerge(id, data, { organizationId? })
  • Delete → delete(id, { organizationId? })
  • Bulk same-collection update → batchUpdate(entries)
  • Cross-collection atomic write → createBatchWriter(db).set(repo, id, data).commit()
  • Read-then-write atomic (server) → runInTransaction(db, tx => { ... })
  • Multi-collection fallback read → fallbackGetByIdUnscoped([r1, r2], id) / fallbackList([r1, r2], opts)

Subcollection pattern

Subcollections under a parent id (e.g. grantTracker/<pid>/tasks) are addressed by constructing a per-parent repo on the fly:
function pipelineTasksRepo(pipelineId: string) {
  return new FirestoreRepository<PipelineTask>({
    collectionName: `grantTracker/${pipelineId}/tasks`,
    schema: pipelineTaskSchema,
    getCollectionRef: () => collections.path(`grantTracker`, pipelineId, `tasks`),
    getDocRef: (id) => docs.path(`grantTracker`, pipelineId, `tasks`, id),
  });
}
Subcollection docs typically don’t carry their own organizationId (the parent path is the tenant boundary), so these repos use listAcrossTenants + getByIdUnscoped — tenancy is enforced by the parent lookup.
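
The tenant-scoping guarantee the repository layer centralizes can be modeled in memory. This sketch mirrors a slice of the getById / getByIdUnscoped / list surface; the real IFirestoreRepository<T> wraps Firestore and adds schema validation, caching policy, and the error taxonomy:

```typescript
// In-memory model of the repository's tenant-scoping contract (illustrative).
interface Doc { id: string; organizationId?: string; }

class InMemoryRepository<T extends Doc> {
  private store = new Map<string, T>();

  create(data: T): T {
    this.store.set(data.id, data);
    return data;
  }

  // Tenant-scoped read: a doc belonging to another org behaves as not found.
  getById(id: string, organizationId: string): T | null {
    const doc = this.store.get(id);
    return doc && doc.organizationId === organizationId ? doc : null;
  }

  // Explicit opt-out, mirroring getByIdUnscoped for subcollection repos
  // where the parent path is the tenant boundary.
  getByIdUnscoped(id: string): T | null {
    return this.store.get(id) ?? null;
  }

  list(organizationId: string): T[] {
    return [...this.store.values()].filter((d) => d.organizationId === organizationId);
  }
}
```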

24. Observability: Cloud Trace Disabled at Launch (H0.1.2)

  • Status: Accepted — 2026-04-22, launch posture
  • Decision: Do NOT set OTEL_CLOUD_TRACE_ENABLED=true in production Cloud Functions at launch. Ship with the env var unset. Sentry backend tracesSampleRate=0.2 and frontend tracesSampleRate=0.1 provide the launch-critical observability baseline.
  • Rationale:
    • Cloud Trace implementation already exists in functions/src/core/sentry.ts (installCloudTraceExporter) and is one env-var flip to enable — this is a reversible default, not a permanent exclusion.
    • At launch-scale traffic (3 pilot tenants, ~100 sessions/day) cost is effectively zero either way — under the 2.5M-span/month free tier until ≥100 tenants. Cost is not the gating factor.
    • Real gating concerns are (a) operational complexity — one more sink to monitor, one more “is it healthy?” to check weekly — and (b) an unfinished data-residency audit on the production GCP project region. Shipping a trace pipeline to an unverified region would be a self-inflicted credibility problem for an EU-first NGO tool under GDPR scrutiny.
    • Sentry already covers 90% of the debug surface. Cloud Trace’s marginal value is distributed traces across Cloud Functions + Firestore/Pub-Sub spans — a real benefit for “why is this slow?” investigations, but not launch-blocking.
  • Consequence:
    • Perf debugging for pilots relies on Sentry’s 20% sample. If a specific session isn’t sampled, we can raise the relevant handler’s rate to 1.0 via Sentry.startSpan on demand — documented workaround.
    • No GCP-native alerting on Cloud Trace metrics (e.g. “alert if p95 of sendNotification exceeds 2s”). This gap is accepted for the launch window; Sentry performance thresholds cover it.
    • Revisit trigger: 30 days post-launch, review on the H1 agenda. Flip the flag ON iff we have seen ≥1 perf incident that full-sample traces would have resolved faster. Pre-flip: run gcloud config get-value compute/region --project=<prod> and confirm EU region; if not, set GCP data-residency restrictions before enabling.
    • Decision is reversible in ~5 minutes via firebase functions:config:set or the v2 env-var mechanism. No code change needed — the exporter picks up the flag on init.
  • Related: docs/planning/2026-04-22-h0-1-2-cloud-trace-decision.md (full decision memo with cost model, decision matrix, and revisit criteria).