Search and Indexing Strategy

This document details how GrantMaster implements high-performance search across structured data (tenants, users) and unstructured content (grant listings, documents).

Hybrid Search Architecture

We use a three-tier strategy to balance speed, consistency, and offline resilience:

1. Firestore Native Search (Point Queries)

Used for simple, exact-match filtering (e.g., “Find user by email”, “List all grants for Tenant A”).

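Pros: Real-time consistency; no additional service cost beyond standard Firestore reads.
Cons: No full-text, substring, or typo-tolerant (fuzzy) matching, and only limited support for OR queries (Firestore caps the number of disjunctions per query).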

2. Typesense (Full-Text Search — Primary)

Typesense Cloud is the primary full-text search engine (`typesense` package v3.0.1). It powers the command palette's global search and cross-entity queries.

Multi-collection search: A single query fans out across all indexed collections via client.multiSearch.perform().
Fuzzy matching: Supports typo-tolerant search on configurable query fields per collection.
Faceted filters: Status, project, user, and date-range filters are applied server-side via Typesense filter_by clauses.
Hosted on Typesense Cloud (*.a2.typesense.net); no self-hosted infra required.

3. Fuse.js (Client-Side Fallback)

When Typesense is unreachable (offline or misconfigured), the platform degrades to in-memory fuzzy search via Fuse.js. The hybridSearch() function in searchService.ts orchestrates the fallback:

Mode	Behavior
`server`	Always queries Typesense; throws on failure.
`local`	Always uses Fuse.js against a pre-built in-memory index.
`auto`	Tries Typesense when online and configured; falls back to Fuse.js on error.
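The three modes can be sketched as follows. This is a minimal illustration, not the actual searchService.ts code: the `SearchBackend` interface, the `HybridDeps` shape, and this `hybridSearch` signature are assumptions, with Typesense and Fuse.js hidden behind injected backends so the fallback logic stands on its own.

```typescript
// Sketch of mode-based fallback. SearchBackend, HybridDeps, and this
// hybridSearch signature are hypothetical, not the real searchService.ts API.
type SearchMode = 'server' | 'local' | 'auto';

interface SearchHit {
  id: string;
  title: string;
}

interface SearchBackend {
  search(query: string): Promise<SearchHit[]>;
}

interface HybridDeps {
  server: SearchBackend;       // e.g. a wrapper around Typesense multi-search
  local: SearchBackend;        // e.g. a wrapper around a Fuse.js index
  isOnline: () => boolean;     // e.g. () => navigator.onLine in the browser
  isConfigured: () => boolean; // e.g. host and search key env vars are set
}

async function hybridSearch(
  query: string,
  mode: SearchMode,
  deps: HybridDeps,
): Promise<{ source: 'typesense' | 'fuse'; hits: SearchHit[] }> {
  if (mode === 'server') {
    // No fallback: any Typesense error propagates to the caller.
    return { source: 'typesense', hits: await deps.server.search(query) };
  }
  if (mode === 'local') {
    return { source: 'fuse', hits: await deps.local.search(query) };
  }
  // 'auto': prefer Typesense when reachable, degrade to the local index on error.
  if (deps.isOnline() && deps.isConfigured()) {
    try {
      return { source: 'typesense', hits: await deps.server.search(query) };
    } catch {
      // Swallow the error and fall through to the local index.
    }
  }
  return { source: 'fuse', hits: await deps.local.search(query) };
}
```

Injecting the online and configured checks keeps the fallback decision testable without a network or a real Typesense cluster.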

Indexed Collections

Seven Firestore collections are synced to Typesense. Schemas are defined in functions/src/search/typesenseSchema.ts and collection names use the gm_ prefix.

Typesense Collection	Firestore Source	Query Fields	Default Sort
`gm_projects`	`projects`	name, description, funder, grantNumber, projectManager	createdAt
`gm_employees`	`employees`	name, email, jobTitle, department	createdAt
`gm_expenses`	`expenses`	description, vendor, category, employeeName, projectName	createdAt
`gm_journals`	`journals`	activityDescription, activityType, employeeName, projectName	date
`gm_documents`	`documents`	filename, title, content, projectName	createdAt
`gm_contacts`	`contacts`	name, email, company, notes	createdAt
`gm_compliance_rules`	`complianceRules`	name, description, category, projectName	createdAt

Each schema includes faceted fields for organizationId, status, projectId, and domain-specific dimensions (e.g., funder, category, severity). See typesenseSchema.ts for the full field definitions.
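For orientation, a schema for gm_projects might look roughly like the object below. This is a hypothetical reconstruction from the table above, not the contents of typesenseSchema.ts: the local `FieldDef`/`CollectionSchema` types only mirror the shape Typesense expects when creating a collection (`name`, `fields`, `default_sorting_field`), and the optional flags are guesses.

```typescript
// Hypothetical gm_projects schema; the real definitions live in typesenseSchema.ts.
type FieldType = 'string' | 'string[]' | 'int32' | 'int64' | 'float' | 'bool';

interface FieldDef {
  name: string;
  type: FieldType;
  facet?: boolean;
  optional?: boolean;
}

interface CollectionSchema {
  name: string;
  fields: FieldDef[];
  default_sorting_field?: string;
}

const projectsSchema: CollectionSchema = {
  name: 'gm_projects',
  fields: [
    // Tenant isolation and facet dimensions
    { name: 'organizationId', type: 'string', facet: true },
    { name: 'status', type: 'string', facet: true },
    { name: 'projectId', type: 'string', facet: true, optional: true },
    { name: 'funder', type: 'string', facet: true, optional: true }, // also a query field
    // Full-text query fields
    { name: 'name', type: 'string' },
    { name: 'description', type: 'string', optional: true },
    { name: 'grantNumber', type: 'string', optional: true },
    { name: 'projectManager', type: 'string', optional: true },
    // Firestore Timestamps are indexed as Unix milliseconds, hence int64.
    { name: 'createdAt', type: 'int64' },
  ],
  default_sorting_field: 'createdAt',
};
```

Typesense expects `default_sorting_field` to name a numeric field, which fits the conversion of timestamps to integers during indexing.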

Data Synchronization

Firestore-to-Typesense Triggers

Every indexed Firestore collection has an onDocumentWritten Cloud Function trigger defined in functions/src/search/indexingTriggers.ts:

Cloud Function	Firestore Path	Typesense Collection
`searchIndexProject`	`projects/{projectId}`	`gm_projects`
`searchIndexEmployee`	`employees/{employeeId}`	`gm_employees`
`searchIndexExpense`	`expenses/{expenseId}`	`gm_expenses`
`searchIndexJournal`	`journals/{journalId}`	`gm_journals`
`searchIndexDocument`	`documents/{documentId}`	`gm_documents`
`searchIndexContact`	`contacts/{contactId}`	`gm_contacts`
`searchIndexComplianceRule`	`complianceRules/{ruleId}`	`gm_compliance_rules`

Sync Lifecycle

Trigger: onDocumentWritten fires on any create, update, or delete in the source collection.
Guard: If Typesense is not configured (isTypesenseConfigured() returns false), the trigger exits silently — the system degrades gracefully.
Schema Assurance: ensureCollection() creates the Typesense collection on first write (idempotent).
Transformation: The Cloud Function extracts a flat search document, converting Firestore Timestamp values to Unix milliseconds and stripping internal-only fields.
Upsert / Delete: upsertDocument() or deleteDocument() is called on the Typesense client.
Error Logging: Failures are logged via firebase-functions/v2 structured logger with document ID and collection name for observability.
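The transformation and upsert/delete decision above can be sketched as a pure function. This is a hypothetical outline, not indexingTriggers.ts: `WriteChange` stands in for the before/after data of a v2 `onDocumentWritten` event, `INTERNAL_FIELDS` is an invented list of internal-only fields, and the organizationId skip mirrors the guard described under Security and Tenancy.

```typescript
// Hypothetical per-event planning step; Firebase types are replaced by minimal shapes.
interface TimestampLike {
  toMillis(): number; // Firestore Timestamp exposes toMillis()
}

type DocData = Record<string, unknown>;

interface WriteChange {
  id: string;
  before?: DocData; // undefined on create
  after?: DocData;  // undefined on delete
}

type IndexOp =
  | { kind: 'upsert'; doc: DocData }
  | { kind: 'delete'; id: string }
  | { kind: 'skip'; reason: string };

const INTERNAL_FIELDS = new Set(['_internalNotes', '_migrationVersion']); // hypothetical names

function isTimestampLike(v: unknown): v is TimestampLike {
  return typeof v === 'object' && v !== null && typeof (v as TimestampLike).toMillis === 'function';
}

// Copies top-level fields, converting Timestamps and dropping internal-only fields.
function toSearchDocument(id: string, data: DocData): DocData {
  const doc: DocData = { id }; // Typesense documents are keyed by a string id
  for (const [key, value] of Object.entries(data)) {
    if (INTERNAL_FIELDS.has(key)) continue;
    doc[key] = isTimestampLike(value) ? value.toMillis() : value;
  }
  return doc;
}

function planIndexOperation(change: WriteChange): IndexOp {
  if (!change.after) return { kind: 'delete', id: change.id };
  const orgId = change.after.organizationId;
  if (typeof orgId !== 'string' || orgId === '') {
    // Tenancy guard: documents without organizationId are never indexed.
    return { kind: 'skip', reason: 'missing organizationId' };
  }
  return { kind: 'upsert', doc: toSearchDocument(change.id, change.after) };
}
```

In a real trigger, the returned operation would be dispatched to upsertDocument() or deleteDocument(), and a skip would be logged as a warning.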

Manual Reindex

The reindexCollection callable function lets admins trigger a full reindex of a specific collection for a given organization. The caller's auth token must carry the admin custom claim.

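```typescript
// Client-side invocation (Firebase modular Web SDK)
import { getFunctions, httpsCallable } from 'firebase/functions';

const functions = getFunctions(); // uses the default initialized Firebase app
const reindex = httpsCallable(functions, 'reindexCollection');
await reindex({ collection: 'gm_projects', organizationId: 'org_abc' });
```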
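On the server side, the callable would validate the caller and payload before reading any data. The sketch below is a hypothetical version of those checks: `AuthContext` approximates the `request.auth` object of a v2 callable (uid plus decoded token claims), and `ReindexError` stands in for firebase-functions' `HttpsError`.

```typescript
// Hypothetical input and permission checks for reindexCollection.
const REINDEXABLE = [
  'gm_projects', 'gm_employees', 'gm_expenses', 'gm_journals',
  'gm_documents', 'gm_contacts', 'gm_compliance_rules',
] as const;
type ReindexableCollection = (typeof REINDEXABLE)[number];

type ReindexErrorCode = 'unauthenticated' | 'permission-denied' | 'invalid-argument';

interface AuthContext {
  uid: string;
  token: Record<string, unknown>; // decoded ID token, including custom claims
}

class ReindexError extends Error {
  readonly code: ReindexErrorCode;
  constructor(code: ReindexErrorCode, message: string) {
    super(message);
    this.code = code;
  }
}

function validateReindexRequest(
  auth: AuthContext | undefined,
  data: { collection?: unknown; organizationId?: unknown },
): { collection: ReindexableCollection; organizationId: string } {
  if (!auth) throw new ReindexError('unauthenticated', 'Sign-in required');
  // Only callers with the admin custom claim may reindex.
  if (auth.token.admin !== true) throw new ReindexError('permission-denied', 'admin claim required');
  const collection = data.collection;
  if (typeof collection !== 'string' || !(REINDEXABLE as readonly string[]).includes(collection)) {
    throw new ReindexError('invalid-argument', `Unknown collection: ${String(collection)}`);
  }
  const organizationId = data.organizationId;
  if (typeof organizationId !== 'string' || organizationId === '') {
    throw new ReindexError('invalid-argument', 'organizationId is required');
  }
  return { collection: collection as ReindexableCollection, organizationId };
}
```

An allow-list of collection names keeps a malformed request from targeting an arbitrary Typesense collection.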

Frontend Search Service

Client Initialization

The frontend Typesense client (src/shared/platform/typesenseSearchService.ts) is a lazy-initialized singleton configured via environment variables:

Variable	Purpose
`VITE_TYPESENSE_HOST`	Typesense Cloud node hostname
`VITE_TYPESENSE_SEARCH_API_KEY`	Read-only Search API key (no write access)
`VITE_TYPESENSE_PORT`	Port (default `443`)
`VITE_TYPESENSE_PROTOCOL`	Protocol (default `https`)
`VITE_TYPESENSE_USE_CLOUD_FUNCTIONS`	If `true`, route searches through callable `searchTypesenseSecure` (recommended for production)
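A lazy, env-driven setup along these lines is one way to read the table above. It is a sketch under assumptions: `env` is passed in (in the app it would be `import.meta.env`), `createClient` stands in for constructing the Typesense client, and the config object follows the typesense-js shape (`nodes: [{ host, port, protocol }]`, `apiKey`); the timeout value is arbitrary.

```typescript
// Hypothetical env-to-config mapping with a lazily created singleton client.
interface SearchEnv {
  VITE_TYPESENSE_HOST?: string;
  VITE_TYPESENSE_SEARCH_API_KEY?: string;
  VITE_TYPESENSE_PORT?: string;
  VITE_TYPESENSE_PROTOCOL?: string;
  VITE_TYPESENSE_USE_CLOUD_FUNCTIONS?: string;
}

interface TypesenseConfig {
  nodes: { host: string; port: number; protocol: string }[];
  apiKey: string;
  connectionTimeoutSeconds: number;
}

function buildTypesenseConfig(env: SearchEnv): TypesenseConfig | null {
  const host = env.VITE_TYPESENSE_HOST;
  const apiKey = env.VITE_TYPESENSE_SEARCH_API_KEY;
  if (!host || !apiKey) return null; // not configured: callers fall back to Fuse.js
  return {
    nodes: [{
      host,
      port: Number(env.VITE_TYPESENSE_PORT || '443'),      // default 443
      protocol: env.VITE_TYPESENSE_PROTOCOL || 'https',    // default https
    }],
    apiKey,
    connectionTimeoutSeconds: 5, // assumption: tune per environment
  };
}

// Vite exposes custom env values as strings, so the flag is compared to 'true'.
function useSecureCallable(env: SearchEnv): boolean {
  return env.VITE_TYPESENSE_USE_CLOUD_FUNCTIONS === 'true';
}

function makeLazyClient<C>(env: SearchEnv, createClient: (cfg: TypesenseConfig) => C): () => C | null {
  let client: C | null = null;
  return () => {
    if (client === null) {
      const cfg = buildTypesenseConfig(env);
      if (cfg === null) return null;
      client = createClient(cfg); // constructed once, on first use
    }
    return client;
  };
}
```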

Multi-Collection Search

searchTypesense() builds a multiSearch request that fans out across all permitted collections in a single round-trip. Each sub-request includes:

query_by — collection-specific text fields.
filter_by — always starts with organizationId:={orgId} (mandatory tenant isolation), then appends optional status/project/user filters.
per_page — configurable limit per collection (default 10).
highlight_full_fields — fields to highlight in full (rather than as truncated snippets) for UI display.
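The per-collection sub-requests described above can be assembled roughly as follows. This is a sketch, not typesenseSearchService.ts: `QUERY_FIELDS` mirrors the "Query Fields" table, while `SearchFilters`, `buildMultiSearch`, and the backtick quoting of values are assumptions. The output is the `searches` array that `client.multiSearch.perform()` accepts.

```typescript
// Hypothetical builder for multi-search sub-requests.
const QUERY_FIELDS: Record<string, string> = {
  gm_projects: 'name,description,funder,grantNumber,projectManager',
  gm_employees: 'name,email,jobTitle,department',
  gm_expenses: 'description,vendor,category,employeeName,projectName',
  gm_journals: 'activityDescription,activityType,employeeName,projectName',
  gm_documents: 'filename,title,content,projectName',
  gm_contacts: 'name,email,company,notes',
  gm_compliance_rules: 'name,description,category,projectName',
};

interface SearchFilters {
  status?: string;
  projectId?: string;
  fromMs?: number; // inclusive lower bound on createdAt (Unix ms); gm_journals would use date
  toMs?: number;   // inclusive upper bound on createdAt (Unix ms)
}

// Backtick-quoting keeps values with commas, spaces, or operators from
// being parsed as additional filter clauses.
function quote(value: string): string {
  if (value.includes('`')) throw new Error('backtick not allowed in filter value');
  return '`' + value + '`';
}

function buildMultiSearch(q: string, orgId: string, collections: string[], f: SearchFilters = {}, perPage = 10) {
  return collections.map((collection) => {
    const queryBy = QUERY_FIELDS[collection];
    if (!queryBy) throw new Error(`Unknown collection: ${collection}`);
    const clauses = [`organizationId:=${quote(orgId)}`]; // tenant clause always first
    if (f.status) clauses.push(`status:=${quote(f.status)}`);
    if (f.projectId) clauses.push(`projectId:=${quote(f.projectId)}`);
    if (f.fromMs !== undefined) clauses.push(`createdAt:>=${f.fromMs}`);
    if (f.toMs !== undefined) clauses.push(`createdAt:<=${f.toMs}`);
    return {
      collection,
      q,
      query_by: queryBy,
      filter_by: clauses.join(' && '),
      per_page: perPage,
      highlight_full_fields: queryBy,
    };
  });
}
```

Quoting matters most for the tenant clause: Typesense filters also support `||`, so an unescaped value spliced into `organizationId:=` could widen the result set.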

When VITE_TYPESENSE_USE_CLOUD_FUNCTIONS=true, the frontend does not query Typesense directly. It calls the callable Cloud Function searchTypesenseSecure, which executes multi-search server-side and returns only mapped result payloads.

RBAC Post-Filtering

After Typesense returns results, applyRBACFilter() removes items the current user lacks permission to view:

Collection-level: Each collection maps to one or more Permission enums. If the user lacks the required permission, the entire collection’s results are dropped.
Ownership-level: For journals and expenses, non-approvers only see their own records (filtered by userId).
Pre-optimization: Collections the user cannot access are excluded from the multiSearch request entirely, reducing payload and latency.
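The two RBAC passes can be sketched together. Everything named here is illustrative: the permission strings (used in place of the real `Permission` enum), the `records:approve` permission that marks an approver, and the rule that holding any one mapped permission is sufficient are all assumptions.

```typescript
// Hypothetical RBAC pre-filter (which collections to query) and post-filter (which hits to keep).
type Permission = 'projects:read' | 'expenses:read' | 'journals:read' | 'contacts:read' | 'records:approve';

const REQUIRED: Record<string, Permission[]> = {
  gm_projects: ['projects:read'],
  gm_expenses: ['expenses:read'],
  gm_journals: ['journals:read'],
  gm_contacts: ['contacts:read'],
};

// Collections where non-approvers only see records they own.
const OWNERSHIP_SCOPED = new Set(['gm_expenses', 'gm_journals']);

interface UserCtx { userId: string; permissions: Set<Permission>; }
interface Hit { collection: string; id: string; userId?: string; }

function canReadCollection(user: UserCtx, collection: string): boolean {
  const required = REQUIRED[collection];
  // Deny unknown collections; assume any one mapped permission is sufficient.
  return required !== undefined && required.some((p) => user.permissions.has(p));
}

// Pre-optimization: drop collections from the multiSearch request entirely.
function permittedCollections(user: UserCtx, collections: string[]): string[] {
  return collections.filter((c) => canReadCollection(user, c));
}

function applyRBACFilter(user: UserCtx, hits: Hit[]): Hit[] {
  const isApprover = user.permissions.has('records:approve');
  return hits.filter((h) => {
    if (!canReadCollection(user, h.collection)) return false; // collection-level
    if (OWNERSHIP_SCOPED.has(h.collection) && !isApprover) {
      return h.userId === user.userId;                        // ownership-level
    }
    return true;
  });
}
```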

Result Transformation

Each Typesense hit is transformed into a unified SearchResult object that includes:

title, description, subtitle — human-readable display fields.
navigationTarget — deep-link path (e.g., /projects/{id}, /expenses?id={id}).
icon — Lucide icon name for UI rendering.
score — Typesense text_match score for relevance ranking.
matchedFields — fields that contributed to the match (for highlight display).
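A transformation like the one described could be written as below. This is a sketch with assumptions: the `ROUTES` table, the Lucide icon names, and which document fields feed `title`, `description`, and `subtitle` are illustrative; the hit shape (`document`, `highlights[].field`, `text_match`) follows Typesense's search response.

```typescript
// Hypothetical Typesense hit to SearchResult mapping.
interface TypesenseHit {
  document: Record<string, unknown> & { id: string };
  highlights?: { field: string }[];
  text_match?: number;
}

interface SearchResult {
  id: string;
  collection: string;
  title: string;
  description: string;
  subtitle: string;
  navigationTarget: string;
  icon: string;
  score: number;
  matchedFields: string[];
}

const ROUTES: Record<string, { path: (id: string) => string; icon: string; titleField: string }> = {
  gm_projects: { path: (id) => `/projects/${encodeURIComponent(id)}`, icon: 'FolderKanban', titleField: 'name' },
  gm_expenses: { path: (id) => `/expenses?id=${encodeURIComponent(id)}`, icon: 'Receipt', titleField: 'description' },
};

function str(v: unknown): string {
  return typeof v === 'string' ? v : '';
}

function toSearchResult(collection: string, hit: TypesenseHit): SearchResult {
  const route = ROUTES[collection];
  if (!route) throw new Error(`No route configured for ${collection}`);
  const d = hit.document;
  return {
    id: d.id,
    collection,
    title: str(d[route.titleField]) || d.id, // fall back to the id if the title field is empty
    description: str(d.description),
    subtitle: str(d.projectName),
    navigationTarget: route.path(d.id),
    icon: route.icon,
    score: hit.text_match ?? 0,
    matchedFields: (hit.highlights ?? []).map((h) => h.field),
  };
}
```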

Vector Search (AI Discovery)

For advanced semantic matching (e.g., “Find grants related to rural healthcare for seniors”), the platform uses Vector Embeddings via the RAG service (src/features/ai/services/ragService.ts). This is a separate system from the Typesense full-text search:

Embedding: Text content is converted into vector arrays using Google Gemini models.
Similarity search: Cosine similarity queries run against the vectorized grant listings.
Hybrid scoring: Semantic results are combined with traditional filters (location, dollar amount) to produce a final Match Score.
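The similarity and scoring steps can be illustrated with a small sketch. It is not ragService.ts: the `GrantListing` fields, treating location and amount as hard filters, and rescaling cosine similarity to a 0 to 100 Match Score are assumptions chosen to show one simple way of combining the signals. Embeddings would come from a Gemini embedding model; here they are plain number arrays.

```typescript
// Hypothetical cosine-similarity ranking with traditional filters applied first.
interface GrantListing {
  id: string;
  embedding: number[];
  region: string;
  maxAmount: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors; treat as no similarity
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rankGrants(
  queryEmbedding: number[],
  grants: GrantListing[],
  filters: { region?: string; minAmount?: number },
  topK = 5,
): { id: string; matchScore: number }[] {
  return grants
    // Traditional filters remove ineligible listings before semantic scoring.
    .filter((g) => !filters.region || g.region === filters.region)
    .filter((g) => filters.minAmount === undefined || g.maxAmount >= filters.minAmount)
    .map((g) => ({
      id: g.id,
      // Rescale cosine similarity from [-1, 1] to a 0-100 Match Score.
      matchScore: Math.round(((cosineSimilarity(queryEmbedding, g.embedding) + 1) / 2) * 100),
    }))
    .sort((x, y) => y.matchScore - x.matchScore)
    .slice(0, topK);
}
```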

Security and Tenancy

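Search indices are logically partitioned by organizationId: all tenants share each gm_ collection, so isolation depends on every indexed document carrying an organizationId and every query filtering on it.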

Server-Side (Cloud Functions)

Every indexing trigger validates that organizationId exists before upserting. Documents without an organizationId are skipped and logged as warnings.
The Typesense admin API key (used for writes) is stored as a Firebase Secret (TYPESENSE_API_KEY), never exposed to the client.

Client-Side (Frontend)

The frontend uses a read-only Search API key (VITE_TYPESENSE_SEARCH_API_KEY) that cannot modify indices.
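Every search query programmatically injects organizationId:=${orgId} into filter_by. This is enforced in application code, not via Typesense scoped keys, so anyone who extracts the search key from the browser bundle can issue queries without the tenant filter. This is the main reason the callable search proxy is recommended for production.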
Data masking: Search results contain only the metadata needed for display. Full records are always fetched from Firestore when the user navigates to a detail view.

Callable Search Proxy (Recommended)

searchTypesenseSecure (Cloud Functions callable):

Verifies authentication.
Resolves the caller's tenant context from the people collection.
Rejects cross-tenant searches unless the caller is a superadmin.
Applies mandatory organizationId filters server-side before querying Typesense.
Supports a phased rollout behind a frontend feature flag:
Enable: set VITE_TYPESENSE_USE_CLOUD_FUNCTIONS=true
Roll back: set VITE_TYPESENSE_USE_CLOUD_FUNCTIONS=false
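The first three checks can be sketched as a single function that returns the organization to filter on. This is a hypothetical outline rather than searchCallable.ts: the `PersonRecord` fields (`organizationId`, `role`), the `'superadmin'` role value, and the injected `lookupPerson` function (standing in for a read of the people collection) are assumptions, and `SearchAccessError` stands in for firebase-functions' `HttpsError`.

```typescript
// Hypothetical tenant resolution for searchTypesenseSecure.
interface PersonRecord {
  organizationId: string;
  role: string;
}

type AccessErrorCode = 'unauthenticated' | 'permission-denied';

class SearchAccessError extends Error {
  readonly code: AccessErrorCode;
  constructor(code: AccessErrorCode, message: string) {
    super(message);
    this.code = code;
  }
}

async function resolveSearchOrg(
  uid: string | undefined,
  requestedOrgId: string | undefined,
  lookupPerson: (uid: string) => Promise<PersonRecord | null>,
): Promise<string> {
  // Verify authentication.
  if (!uid) throw new SearchAccessError('unauthenticated', 'Sign-in required');
  // Resolve tenant context from the people collection.
  const person = await lookupPerson(uid);
  if (!person) throw new SearchAccessError('permission-denied', 'No tenant context for caller');
  const targetOrg = requestedOrgId ?? person.organizationId;
  // Reject cross-tenant search unless the caller is a superadmin.
  if (targetOrg !== person.organizationId && person.role !== 'superadmin') {
    throw new SearchAccessError('permission-denied', 'Cross-tenant search is not allowed');
  }
  // The caller then builds every filter_by clause from this value, server-side.
  return targetOrg;
}
```

Because the organization comes from server-side data rather than the request alone, a tampered client cannot widen its own search scope.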

Environment Configuration

Frontend (`src/.env.local`)

```bash
VITE_TYPESENSE_HOST=<cluster>.a2.typesense.net
VITE_TYPESENSE_SEARCH_API_KEY=<search-only-key>
VITE_TYPESENSE_PORT=443
VITE_TYPESENSE_PROTOCOL=https
VITE_TYPESENSE_ENABLED=true
VITE_TYPESENSE_USE_CLOUD_FUNCTIONS=false
```

Cloud Functions (`functions/.env`)

```bash
# Write access; use Firebase Secrets in production
TYPESENSE_API_KEY=<admin-key>
TYPESENSE_HOST=<cluster>.a2.typesense.net
TYPESENSE_PORT=443
TYPESENSE_PROTOCOL=https
```

Rollout Playbook (Secure Search Mode)

1. Deploy Functions containing searchTypesenseSecure.
2. Keep VITE_TYPESENSE_USE_CLOUD_FUNCTIONS=false and verify there are no regressions.
3. Set VITE_TYPESENSE_USE_CLOUD_FUNCTIONS=true in the staging frontend.
4. Validate command palette search:
   - works for a normal user in their own organization
   - rejects cross-tenant access
   - logs show a non-zero searchTime and the expected collections
5. Promote the env flag to production.
6. If issues occur, roll back by setting VITE_TYPESENSE_USE_CLOUD_FUNCTIONS=false and redeploying the frontend (Vite inlines env values at build time, so the change only takes effect after a rebuild).

Search Observability and Operations

Metrics Counters

Search telemetry is aggregated into searchMetricsDaily/{YYYY-MM-DD} with per-day counters:

typesense_success
secure_callable_error
fuse_fallback_count

Frontend reports telemetry through callable reportSearchTelemetry; secure callable search also records success/error metrics server-side.
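Server-side handling of a telemetry report might reduce to choosing a document path and a counter field. This sketch assumes the daily key is the UTC date and that unknown counter names are rejected; in a Cloud Function the returned field would then be incremented with `FieldValue.increment(1)` from `firebase-admin/firestore`.

```typescript
// Hypothetical planning step for reportSearchTelemetry writes.
const SEARCH_COUNTERS = ['typesense_success', 'secure_callable_error', 'fuse_fallback_count'] as const;
type SearchCounter = (typeof SEARCH_COUNTERS)[number];

function isSearchCounter(name: unknown): name is SearchCounter {
  return typeof name === 'string' && (SEARCH_COUNTERS as readonly string[]).includes(name);
}

// searchMetricsDaily/{YYYY-MM-DD}, keyed by the UTC calendar date (assumption).
function metricsDocPath(now: Date): string {
  return `searchMetricsDaily/${now.toISOString().slice(0, 10)}`;
}

function planTelemetryWrite(counter: unknown, now: Date): { path: string; field: SearchCounter } {
  // An allow-list stops clients from creating arbitrary fields on the metrics doc.
  if (!isSearchCounter(counter)) throw new Error(`Unknown counter: ${String(counter)}`);
  return { path: metricsDocPath(now), field: counter };
}
```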

No-Result Query Capture

No-result queries are aggregated daily in searchNoResultQueries and feed the nightly search-tuning job.
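Aggregation only helps if near-identical queries collapse into one key. The sketch below shows one way to normalize and rank them; the normalization rules, the two-character minimum, and the `NoResultEntry` shape are assumptions, not the stored format of searchNoResultQueries.

```typescript
// Hypothetical normalization and top-N selection for no-result queries.
interface NoResultEntry { query: string; count: number; }

function normalizeQuery(raw: string): string {
  return raw.trim().toLowerCase().replace(/\s+/g, ' ');
}

function topNoResultQueries(rawQueries: string[], limit = 20): NoResultEntry[] {
  const counts = new Map<string, number>();
  for (const raw of rawQueries) {
    const q = normalizeQuery(raw);
    if (q.length < 2) continue; // ignore empty or single-character noise
    counts.set(q, (counts.get(q) ?? 0) + 1);
  }
  return Array.from(counts, ([query, count]) => ({ query, count }))
    // Highest count first; ties broken alphabetically for stable output.
    .sort((a, b) => b.count - a.count || a.query.localeCompare(b.query))
    .slice(0, limit);
}
```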

Nightly Jobs

Two scheduled Cloud Functions run daily (Europe/Amsterdam timezone):

nightlyTuneSearchFromNoResults (02:00) - derives synonym updates + typo config from top no-result queries.
nightlyReconcileSearchIndex (02:30) - compares Firestore vs Typesense collection counts and writes drift reports to searchReconciliationReports.
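The comparison at the heart of nightlyReconcileSearchIndex can be expressed as a pure function over two count maps. This is a sketch under assumptions: both maps are keyed by Typesense collection name, the counts are gathered elsewhere (for example with a Firestore `count()` aggregation and the Typesense collection's `num_documents`), and the `DriftReport` shape and `tolerance` parameter are illustrative.

```typescript
// Hypothetical drift calculation for the reconciliation report.
interface DriftRow {
  collection: string;
  firestoreCount: number;
  typesenseCount: number;
  drift: number; // positive: documents missing from Typesense
}

interface DriftReport { hasDrift: boolean; rows: DriftRow[]; }

function buildDriftReport(
  firestoreCounts: Record<string, number>,
  typesenseCounts: Record<string, number>,
  tolerance = 0,
): DriftReport {
  const rows = Object.keys(firestoreCounts).map((collection) => {
    const firestoreCount = firestoreCounts[collection];
    const typesenseCount = typesenseCounts[collection] ?? 0; // a missing collection counts as empty
    return { collection, firestoreCount, typesenseCount, drift: firestoreCount - typesenseCount };
  });
  return { hasDrift: rows.some((r) => Math.abs(r.drift) > tolerance), rows };
}
```

Because triggers intentionally skip documents without an organizationId, excluding those from the Firestore count (or allowing a small tolerance) avoids reporting drift that is expected by design.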

Key Source Files

File	Purpose
`functions/src/search/typesenseSchema.ts`	Collection schemas and field definitions
`functions/src/search/typesenseClient.ts`	Server-side Typesense client (upsert, delete, search, health check)
`functions/src/search/indexingTriggers.ts`	Firestore `onDocumentWritten` triggers for all 7 collections
`functions/src/search/searchCallable.ts`	Callable secure search endpoint (`searchTypesenseSecure`)
`functions/src/search/index.ts`	Barrel export for the search module
`src/shared/platform/typesenseSearchService.ts`	Frontend Typesense client (multi-search, RBAC, result transformation)
`src/shared/platform/searchService.ts`	Hybrid search orchestrator (Typesense + Fuse.js fallback)
