Technical & Security Wiki

Last updated: March 2026

1. System Overview & Architecture

What Arkova Is

Arkova is a jurisdiction-aware verification layer that enables organizations to issue, anchor, and verify credentials against the anchoring network. It transforms documents such as diplomas, certificates, licenses, attestations, and compliance records into tamper-evident digital credentials — without ever taking custody of the underlying documents.

Arkova is not a blockchain company. It is a verification infrastructure company that uses a public ledger as an immutable timestamping layer.

The Verification Layer Concept

┌─────────────┐          ┌─────────────┐          ┌─────────────┐
│   ISSUER    │  anchor   │   ARKOVA    │  verify   │  VERIFIER   │
│ (University,│ ────────► │ Verification│ ◄──────── │ (Employer,  │
│  Employer,  │          │   Layer     │          │  Regulator, │
│  Regulator) │          │             │          │  Partner)   │
└─────────────┘          └─────────────┘          └─────────────┘
                              │
                              ▼
                     ┌─────────────────┐
                     │   ANCHORING     │
                     │   NETWORK       │
                     │  (Immutable     │
                     │   Timestamp)    │
                     └─────────────────┘

How it works:

  • The Issuer uploads or creates a credential. The document is fingerprinted (SHA-256) entirely on the user's device. Only the fingerprint — never the document — leaves the browser.
  • Arkova anchors the fingerprint to the anchoring network via an OP_RETURN output containing a 36-byte payload (ARKV prefix + SHA-256 hash).
  • Any Verifier can query Arkova's API or public verification page to confirm the credential's authenticity, timestamp, issuer, and status.
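
The first step above can be sketched with the browser's Web Crypto API. This is a minimal illustration and not Arkova's generateFingerprint() implementation: the function name `fingerprintDocument` and the lowercase hex output are assumptions made for the example.

```typescript
// Hypothetical helper: hash a document's bytes entirely on the client.
// Only the returned hex string would ever be sent over the network.
async function fingerprintDocument(bytes: ArrayBuffer): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  // Render the 32 digest bytes as a 64-character lowercase hex string.
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// In a browser, the bytes would typically come from a File object:
//   const hex = await fingerprintDocument(await file.arrayBuffer());
```

Because the digest is computed from an ArrayBuffer held in page memory, nothing in this path requires uploading the file.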

Non-Custodial Architecture

| Dimension | What This Means |
|---|---|
| Document Non-Custody | Documents never leave the user's device. Arkova never receives, stores, transmits, or processes raw document content. Only a one-way SHA-256 fingerprint is stored. |
| Financial Non-Custody | Arkova does not store, accept, or manage user cryptocurrency. All on-chain fees are paid from an Arkova-managed corporate fee account. |
| Key Non-Custody | Treasury signing keys are secured in cloud HSMs (AWS KMS / GCP Cloud HSM). No human has access to raw private key material. |

Important: This design eliminates regulated data custody risk. Arkova does not become a custodian of PII, financial assets, or cryptographic material.

Schema-First Build Philosophy

  • Schema First — Define Postgres tables, columns, constraints, and Row Level Security policies before writing any application code.
  • Migration Immutability — Once a migration is applied, it is never modified. Changes are expressed as compensating migrations.
  • Type Generation — TypeScript types are auto-generated from the database schema, ensuring compile-time safety across the full stack.
  • Validation at the Boundary — All write paths are validated with Zod schemas before reaching the database.

2. Security & Privacy

Mandatory Row Level Security (RLS)

Every table in the Arkova database has FORCE ROW LEVEL SECURITY enabled. This is a non-negotiable architectural constraint.

Note: Even if application code has a bug, the database will refuse to return rows the authenticated user is not authorized to see. FORCE ROW LEVEL SECURITY means RLS policies apply even to the table owner.

| Table | Policy |
|---|---|
| anchors | Users see own anchors + org anchors (via org membership) |
| profiles | Users see own profile only |
| organizations | Members see their own org |
| audit_events | Users see own events only |
| api_keys | ORG_ADMIN only (not readable by ORG_MEMBER) |
| webhook_endpoints | ORG_ADMIN full CRUD for own org |
| billing_events | User reads own; append-only (triggers block UPDATE/DELETE) |
| attestations | Public read; write restricted to authenticated users |

Tenant Isolation

Multi-tenancy is enforced at the database level, not the application level. Every row carries an org_id foreign key. RLS policies use auth.uid() to resolve the caller's identity and match it against row ownership or org membership. Because the check runs inside Postgres, cross-tenant reads are blocked even when application code omits a filter.

The Client-Side Processing Boundary

Important: Documents never leave the user's device. This is Arkova's foundational privacy guarantee.

┌─────────────────────────────────────────────────────────┐
│  USER'S DEVICE (Browser)                                │
│                                                         │
│  Document  ──►  PDF.js / Tesseract.js  ──►  Raw OCR    │
│                 (Web Worker)                Text         │
│                                              │          │
│                                              ▼          │
│                                    PII Stripping        │
│                                              │          │
│  SHA-256 Fingerprint  ◄──── Document ────────┤          │
│       (32 bytes)                             │          │
│            │                                 ▼          │
│            │                     PII-Stripped Metadata   │
└────────────┼─────────────────────────┼──────────────────┘
             │                         │
     ────────┼─────────────────────────┼──── NETWORK BOUNDARY
             │                         │
             ▼                         ▼
    ┌─────────────┐          ┌─────────────────┐
    │  Supabase   │          │  Worker (AI)    │
    └─────────────┘          └─────────────────┘

Why this matters for partners:

  • Arkova is not a data processor under GDPR for document content
  • There is no "raw mode" bypass
  • The generateFingerprint() function is architecturally prohibited from being imported in server-side code
  • Client-side PII stripping uses regex-based removal of SSNs, student IDs, DOBs, emails, phones, and names
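
Regex-based stripping of the kind described in the last bullet might look like the sketch below. The patterns, the placeholder tokens, and the function name `stripPii` are illustrative assumptions, not Arkova's actual rules; student IDs and personal names are left out because their formats are not specified here.

```typescript
// Illustrative patterns only. Each pair is [pattern, replacement token].
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"], // US SSN, e.g. 123-45-6789
  [/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, "[EMAIL]"],
  [/\(?\b\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b/g, "[PHONE]"], // US-style phone numbers
  [/\b\d{1,2}\/\d{1,2}\/\d{4}\b/g, "[DATE]"], // dates such as 01/02/1990
];

// Apply every pattern in order and return the redacted text.
function stripPii(text: string): string {
  return PII_PATTERNS.reduce((acc, [pattern, token]) => acc.replace(pattern, token), text);
}
```

A function like this would run on the OCR text inside the browser, so only the redacted string crosses the network boundary.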

Audit Trail

All significant actions are logged to an immutable, append-only audit_events table. Triggers reject all UPDATE and DELETE statements, even from service_role. Event categories are AUTH, ANCHOR, PROFILE, ORG, ADMIN, and SYSTEM. PII fields are nullified at write time.

API Key Security

Keys are hashed with HMAC-SHA256 using API_KEY_HMAC_SECRET. Raw keys are never stored after initial creation. Keys support scoped permissions: verify, verify:batch, keys:manage, usage:read.
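
A keyed hash of this kind can be sketched with Node's built-in crypto module. The helper names `hashApiKey` and `matchesStoredHash` and the hex storage format are assumptions for illustration; the `secret` parameter stands in for the value of API_KEY_HMAC_SECRET.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: derive the digest that would be stored instead of the raw key.
function hashApiKey(rawKey: string, secret: string): string {
  return createHmac("sha256", secret).update(rawKey).digest("hex");
}

// Compare a presented key against a stored digest in constant time.
function matchesStoredHash(presentedKey: string, storedHex: string, secret: string): boolean {
  const candidate = Buffer.from(hashApiKey(presentedKey, secret), "hex");
  const stored = Buffer.from(storedHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return candidate.length === stored.length && timingSafeEqual(candidate, stored);
}
```

Using an HMAC rather than a bare hash means a leaked table of digests cannot be checked against candidate keys without also knowing the server-side secret.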

On-Chain Content Policy

Only 36 bytes are ever written to the anchoring network: ARKV (4 bytes) + SHA-256 hash (32 bytes). The following must never be written on-chain: filenames, file sizes, MIME types, user IDs, org IDs, email addresses, or any PII.
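
The payload layout can be made concrete with a short sketch. The function name `buildAnchorPayload` is hypothetical, and the assumption that the prefix is the ASCII encoding of "ARKV" followed directly by the raw hash bytes is inferred from the byte counts above.

```typescript
const PREFIX = new TextEncoder().encode("ARKV"); // 4 ASCII bytes

// Hypothetical helper: build the 36-byte payload from a hex fingerprint.
function buildAnchorPayload(fingerprintHex: string): Uint8Array {
  if (!/^[0-9a-f]{64}$/i.test(fingerprintHex)) {
    throw new Error("fingerprint must be 64 hex characters");
  }
  const payload = new Uint8Array(36);
  payload.set(PREFIX, 0);
  // Decode each pair of hex characters into one byte after the prefix.
  for (let i = 0; i < 32; i++) {
    payload[4 + i] = parseInt(fingerprintHex.slice(i * 2, i * 2 + 2), 16);
  }
  return payload;
}
```

The fixed size leaves no room for filenames, identifiers, or other metadata, which is what makes the content policy enforceable by construction.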

3. Terminology & Compliance

Strict Enterprise Terminology

| Banned Term | Required Alternative | Rationale |
|---|---|---|
| Wallet | Fee Account / Billing Account | Avoids confusion with custodial cryptocurrency wallets |
| Transaction | Network Receipt / Anchor Receipt | Prevents association with financial transactions |
| Hash | Fingerprint | Enterprise-friendly; conveys intent without jargon |
| Block | Network Confirmation | Avoids blockchain-specific terminology |
| Blockchain / Bitcoin | Anchoring Network / Production Network | Technology-neutral messaging |
| Testnet / Mainnet | Test Environment / Production Network | Standard enterprise naming |
| Gas | Network Fee | Not applicable but reserved |
| UTXO / Broadcast | (internal only) | No user-visible equivalent |

Note: This policy is CI-enforced via npm run lint:copy. All user-visible strings are centralized in src/lib/copy.ts.

Credential Types

| Type | Examples |
|---|---|
| DIPLOMA | University degrees, academic diplomas |
| CERTIFICATE | Professional certifications, course completions |
| LICENSE | Professional licenses, regulatory permits |
| BADGE | Digital badges, micro-credentials |
| ATTESTATION | Third-party attestation claims |
| FINANCIAL | Financial compliance documents |
| LEGAL | Legal agreements, contracts |
| INSURANCE | Insurance certificates, COIs |
| SEC_FILING | SEC regulatory filings |
| PATENT | Patent filings and grants |
| REGULATION | Regulatory documents |
| PUBLICATION | Academic publications |
| OTHER | General-purpose catch-all |

Compliance Posture

| Requirement | Arkova's Approach |
|---|---|
| GDPR | Non-custodial for documents. Fingerprints are one-way hashes. Account deletion implemented with full cascade. |
| SOC 2 | Evidence collection documented. Branch protection, RLS, audit trails, and key management provide CC6.1/CC6.3/CC7.2 controls. |
| Data Retention | Configurable retention policies. cleanup_expired_data RPC runs on schedule. Legal hold overrides prevent deletion when active. |
| CCPA | Account deletion cascade covers all personal data. No sale of personal information. |

4. AI Intelligence Suite

Overview

Arkova's AI operates exclusively on PII-stripped metadata, never on raw document content.

Architecture

┌────────────────────────────────────┐
│  Client (Browser)                  │
│  OCR (PDF.js + Tesseract.js)       │
│         │                          │
│         ▼                          │
│  PII Stripping (regex-based)       │
│         │                          │
│         ▼                          │
│  Stripped Text + Fingerprint       │
└─────────┼──────────────────────────┘
          │  POST /api/v1/ai/extract
          ▼
┌────────────────────────────────────┐
│  Worker (Server)                   │
│  IAIProvider Interface             │
│    ├── GeminiProvider (primary)    │
│    ├── Cloudflare AI (fallback)    │
│    └── Replicate (QA only)         │
│         │                          │
│         ▼                          │
│  Structured Metadata Fields        │
│  + Confidence Score (0-1)          │
│  + Integrity Score (0-100)         │
└────────────────────────────────────┘

Capabilities

| Capability | Description |
|---|---|
| Metadata Extraction | Extracts structured fields from PII-stripped OCR text using Gemini Flash. Returns confidence scores per field. |
| Batch Extraction | Process multiple credentials in a single request. Up to 100 items. |
| Semantic Search | Natural language search across all credentials using pgvector embeddings (768-dim). |
| Fraud / Integrity Scoring | Computes 0-100 integrity score. Scores below 60 auto-flagged for human review. |
| Visual Fraud Detection | Image-based fraud analysis for credential documents. |
| Human Review Queue | Flagged credentials surface in admin review queue with disposition workflow. |
| Extraction Feedback | Closed-loop learning: human corrections improve future accuracy. |
| Knowledge Query | Retrieval-augmented generation against 29,000+ public records. Returns cited sources. |

Cost-Efficiency Model

| Operation | Cost | Model |
|---|---|---|
| Metadata Extraction | 1 AI credit | Gemini 2.0 Flash |
| Semantic Search | 1 AI credit | text-embedding-004 |
| Fraud Analysis | 5 AI credits | Gemini 2.0 Flash |
| Embedding Generation | 1 AI credit | text-embedding-004 |
| RAG Query | Variable | Gemini + pgvector |

Tip: Gemini Flash provides extraction accuracy on par with larger models (F1=82.1%) at ~$0.075 per 1M input tokens. The provider abstraction layer supports hot-swapping to OpenAI or Anthropic.

Feature Flags

| Flag | Gates | Default |
|---|---|---|
| ENABLE_AI_EXTRACTION | All extraction endpoints + client-side pipeline | false |
| ENABLE_SEMANTIC_SEARCH | pgvector search endpoints | false |
| ENABLE_AI_FRAUD | Fraud analysis pipeline | false |

Public Data Pipeline

| Source | Records | Update Frequency |
|---|---|---|
| SEC EDGAR | Filings | Continuous |
| Federal Register | Regulatory actions | Continuous |
| DAPIP (Dept. of Education) | Institutional data | Batch (resumable) |
| OpenAlex | Academic publications | Every 30 minutes |
| Total | 29,000+ | Auto-growing via Cloud Scheduler |

5. Roadmap & Evolution

Three-Phase Product Evolution

| Phase | Name | Status | Description |
|---|---|---|---|
| Phase 1 | Credentialing MVP | Live (94% complete) | Issue, anchor, verify credentials. Network anchoring. AI extraction. Verification API. Payments. |
| Phase 1.5 | Foundation | In Progress | Public records pipeline, x402 micropayments, RAG intelligence, SDKs, multi-chain support. |
| Phase 2 | Attestations | Planned | Third-party attestation claims, lifecycle management, network anchoring. |
| Phase 3 | E-Signatures | Planned | Legally recognized electronic signatures on anchoring infrastructure. |

Detailed Milestone Roadmap

| Milestone | Target | Key Deliverables |
|---|---|---|
| Beta Launch (Signet) | Complete | 1,572+ SECURED anchors, 13 beta stories, 2,236 tests |
| Production Network | Q2 2026 | Production treasury funding, batch anchoring, production receipts |
| Base L2 Anchoring | Q2 2026 | Multi-chain support via Base |
| Attestation API v1 | Q2 2026 | 5 attestation types, revocation, expiry, CRUD API |
| x402 Micropayments | Q2 2026 | USDC on Base L2, pay-per-call API |
| Python & TypeScript SDKs | Q2 2026 | Partner integration libraries |
| E-Signature Layer | Q4 2026 | Legally binding signatures anchored to the anchoring network |

Infrastructure Metrics

| Metric | Value |
|---|---|
| Database Migrations | 121 |
| Test Suite | 2,433+ tests (1,024 frontend + 1,409 worker) |
| Stories Completed | 180 / 192 (94%) |
| Security Audit Findings | 24 / 24 resolved (100%) |
| SECURED Anchors | 1,572+ |
| Public Records Indexed | 29,000+ |
| Vector Embeddings | 9,300+ |
| AI Eval F1 Score | 82.1% |

6. Developer Reference

Technology Stack

| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 18 + TypeScript | Single-page application |
| Styling | Tailwind CSS + shadcn/ui | Component library and design system |
| Bundler | Vite | Development and production builds |
| Routing | react-router-dom v6 | Client-side routing |
| Database | Supabase (Postgres) | Managed Postgres with auth, realtime, RLS |
| Auth | Supabase Auth | Email/password, Google OAuth, MFA/TOTP |
| Worker | Node.js + Express | Webhooks, anchoring jobs, cron, AI processing |
| Validation | Zod | Runtime schema validation |
| Payments | Stripe (SDK + webhooks) | Subscription billing (worker-only) |
| Micropayments | x402 Protocol (USDC on Base L2) | Pay-per-call API access |
| Chain (Anchoring) | bitcoinjs-lib + Cloud HSM | OP_RETURN anchoring |
| Chain (Base L2) | viem | EVM-based anchoring |
| AI (Primary) | Gemini 2.0 Flash | Extraction, fraud, RAG |
| AI (Fallback) | Cloudflare Workers AI | Gated by ENABLE_AI_FALLBACK |
| Vector Search | pgvector | 768-dim embeddings |
| Testing | Vitest + Playwright | Unit, integration, RLS, E2E |
| Formal Verification | TLA PreCheck | State machine correctness proofs |
| Observability | Sentry | Error tracking with PII scrubbing |
| Edge Compute | Cloudflare Workers | MCP server, queue processing |
| Ingress | Cloudflare Tunnel | Zero Trust, no public ports |
| CI/CD | GitHub Actions → Vercel + Railway | Automated deploy on merge |

Infrastructure Topology

┌──────────────────────────────────────────────────────────────┐
│  Internet                                                     │
│                                                               │
│  ┌───────────────┐    ┌───────────────┐    ┌──────────────┐ │
│  │  Vercel CDN   │    │  Cloudflare   │    │  Cloud Run   │ │
│  │  (Frontend)   │    │  Tunnel       │    │  (Worker)    │ │
│  │  React SPA    │    │  Zero Trust   │    │  Express API │ │
│  └───────┬───────┘    └───────┬───────┘    └──────┬───────┘ │
│          │                    │                    │          │
│          ▼                    ▼                    ▼          │
│  ┌──────────────────────────────────────────────────────┐    │
│  │  Supabase (Managed Postgres)                         │    │
│  │  • Auth  • Realtime  • RLS  • pgvector              │    │
│  └──────────────────────────────────────────────────────┘    │
│          │                                    │              │
│          ▼                                    ▼              │
│  ┌───────────────┐                   ┌───────────────┐      │
│  │  Stripe       │                   │  Anchoring    │      │
│  │  (Payments)   │                   │  Base L2      │      │
│  └───────────────┘                   └───────────────┘      │
└──────────────────────────────────────────────────────────────┘

Webhook Reliability

| Standard | Specification |
|---|---|
| Delivery Protocol | HTTPS only (enforced by database CHECK constraint) |
| Signature | HMAC-SHA256 on full payload body. X-Arkova-Signature header. |
| Retry Policy | 5 attempts with exponential backoff: immediate → 1m → 5m → 30m → 2h |
| Circuit Breaker | Consecutive failures trip the circuit. Probe after cooldown. |
| Dead Letter Queue | After all retries, events retained 30 days. Manual replay available. |
| Timeout | 30-second delivery timeout |
| Rate Limit | 100 deliveries/minute per organization |
| Idempotency | idempotency_key prevents duplicate processing |
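
The retry policy in the table can be expressed as a small lookup. This is a sketch of the stated schedule only; the names `RETRY_DELAYS_MS` and `delayBeforeAttempt` are hypothetical, and each delay is assumed to be measured from the previous failed attempt.

```typescript
// Delay before each of the 5 delivery attempts: immediate, 1m, 5m, 30m, 2h.
const RETRY_DELAYS_MS: readonly number[] = [
  0,
  60_000,
  5 * 60_000,
  30 * 60_000,
  2 * 60 * 60_000,
];

// Returns the delay before the given attempt (1-based), or null once all
// attempts are used and the event should go to the dead letter queue.
function delayBeforeAttempt(attempt: number): number | null {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  return RETRY_DELAYS_MS[attempt - 1] ?? null;
}
```

A receiver can use the same schedule to estimate how long an outage it can tolerate before events require manual replay.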

Webhook Events

| Event | Trigger |
|---|---|
| anchor.created | New credential anchor created |
| anchor.secured | Anchor confirmed on the anchoring network |
| anchor.revoked | Credential revoked |
| anchor.verified | Verification lookup performed |
| attestation.created | New attestation claim created |
| attestation.revoked | Attestation revoked |
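
Before acting on any of these events, a receiver would normally check the X-Arkova-Signature header against the raw request body. The sketch below assumes the header carries a hex-encoded HMAC-SHA256 digest and that the signing secret is shared out of band; neither detail is confirmed here, and `verifyWebhookSignature` is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the header value is the hex-encoded HMAC-SHA256 of the raw body.
function verifyWebhookSignature(rawBody: string, signatureHeader: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHeader, "hex");
  // Reject on length mismatch first; timingSafeEqual requires equal lengths.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The check must run on the exact bytes received, before any JSON parsing or re-serialization, because the signature covers the full payload body.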

Authentication Methods

| Method | Use Case | Header |
|---|---|---|
| API Key (Bearer) | Verification API, batch operations | Authorization: Bearer ak_live_... |
| API Key (Header) | Alternative API key delivery | X-API-Key: ak_live_... |
| Supabase JWT | Key management, AI endpoints | Authorization: Bearer eyJ... |
| x402 Payment | Pay-per-call (no subscription) | HTTP 402 → USDC payment → retry with proof |

Rate Limiting

| Scope | Limit | Response |
|---|---|---|
| Anonymous (public verification) | 100 req/min per IP | HTTP 429 + Retry-After |
| API Key holders | 1,000 req/min per key | HTTP 429 + Retry-After |
| Batch endpoints | 10 req/min per API key | HTTP 429 + Retry-After |
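
A client that receives HTTP 429 can derive its wait time from the Retry-After header. The helper below is a sketch under stated assumptions: the name `retryAfterMs` and the 1-second fallback are hypothetical, and it accepts both forms the HTTP specification allows (delta-seconds or an HTTP-date), since the form Arkova sends is not specified here.

```typescript
// Convert a Retry-After header value into a wait time in milliseconds.
function retryAfterMs(
  headerValue: string | null,
  now: Date = new Date(),
  fallbackMs: number = 1_000,
): number {
  if (headerValue === null) return fallbackMs;
  const trimmed = headerValue.trim();
  // Form 1: a non-negative integer number of seconds.
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1_000;
  // Form 2: an HTTP-date; wait until that instant, never a negative time.
  const when = Date.parse(trimmed);
  return Number.isNaN(when) ? fallbackMs : Math.max(0, when - now.getTime());
}
```

In a fetch-based client this would be called as `retryAfterMs(response.headers.get("Retry-After"))` before scheduling the retry.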

7. API Reference

Base URL

```
https://{worker-host}/api/v1
```

Tip: Interactive documentation (Swagger UI) is available at /api/docs. The OpenAPI 3.0 spec is downloadable at /api/docs/spec.json.

Authentication

```bash
# Bearer token
curl -H "Authorization: Bearer ak_live_your_key_here" \
  https://api.arkova.io/api/v1/verify/ARK-2026-001

# Header
curl -H "X-API-Key: ak_live_your_key_here" \
  https://api.arkova.io/api/v1/verify/ARK-2026-001
```

Verification Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /verify/{publicId} | Optional | Verify a single credential. Returns frozen schema. |
| POST | /verify/batch | Required | Batch verify up to 100 credentials. |
| GET | /verify/{publicId}/proof | Optional | Download cryptographic proof package. |
| GET | /verify/entity | Required | Cross-reference entity against public records. |
| GET | /verify/search | Required | Agentic semantic search. Designed for AI agents. |
| GET | /jobs/{jobId} | Required | Poll async batch job status. |
| GET | /usage | Required | Current month API usage. |
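
Because /verify/batch accepts at most 100 credentials per request, a client with a longer list has to split it first. The helper below is a generic sketch; `chunkForBatch` and `MAX_BATCH_SIZE` are hypothetical names, and the request body format for the batch endpoint is not shown because it is not documented in this section.

```typescript
const MAX_BATCH_SIZE = 100; // limit stated for POST /verify/batch

// Split a list of public IDs into consecutive groups of at most `size` items.
function chunkForBatch<T>(items: readonly T[], size: number = MAX_BATCH_SIZE): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```

Batch endpoints are also limited to 10 requests per minute per API key, so a caller sending many chunks would need to pace them accordingly.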

Verification Response Schema (Frozen)

Important: The verification response schema is frozen: fields cannot be removed or renamed after publication. Only additive nullable fields may be added.

```json
{
  "verified": true,
  "status": "ACTIVE",
  "issuer_name": "University of Michigan",
  "recipient_identifier": "sha256:ab3f...",
  "credential_type": "DIPLOMA",
  "issued_date": "2026-01-15T00:00:00Z",
  "expiry_date": null,
  "anchor_timestamp": "2026-03-10T08:00:00Z",
  "network_block": 204567,
  "network_receipt_id": "b8e381df09ca404e...",
  "merkle_proof_hash": null,
  "record_uri": "https://app.arkova.io/verify/ARK-2026-001",
  "jurisdiction": "US-MI"
}
```

Status values: ACTIVE, REVOKED, SUPERSEDED, EXPIRED, PENDING

Note: jurisdiction is omitted from the response when it has no value; it is never returned as null.
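
On the client side, the frozen schema maps naturally onto a TypeScript type. The sketch below takes field names and status values from the sample response; which fields other than expiry_date and merkle_proof_hash can be null is an assumption, and `parseVerificationResponse` is a hypothetical helper that checks only a few fields.

```typescript
const STATUSES = ["ACTIVE", "REVOKED", "SUPERSEDED", "EXPIRED", "PENDING"] as const;
type VerificationStatus = (typeof STATUSES)[number];

// Assumption: only expiry_date and merkle_proof_hash are nullable, as in the sample.
interface VerificationResponse {
  verified: boolean;
  status: VerificationStatus;
  issuer_name: string;
  recipient_identifier: string;
  credential_type: string;
  issued_date: string;
  expiry_date: string | null;
  anchor_timestamp: string;
  network_block: number;
  network_receipt_id: string;
  merkle_proof_hash: string | null;
  record_uri: string;
  jurisdiction?: string; // omitted when absent, never null
}

// Minimal runtime check of the fields whose rules are stated in this section.
function parseVerificationResponse(json: string): VerificationResponse {
  const data = JSON.parse(json) as Record<string, unknown>;
  if (typeof data.verified !== "boolean") throw new Error("verified must be a boolean");
  if (!STATUSES.includes(data.status as VerificationStatus)) throw new Error("unknown status");
  if ("jurisdiction" in data && typeof data.jurisdiction !== "string") {
    throw new Error("jurisdiction must be a string when present");
  }
  return data as unknown as VerificationResponse;
}
```

Modeling jurisdiction as an optional property rather than `string | null` mirrors the omission rule, so code that handles the missing case compiles against the same shape the API returns.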

Anchoring Endpoint

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | /anchor | Required | Submit fingerprint for network anchoring. Idempotent. |

```json
{
  "fingerprint": "a1b2c3d4e5f6...64-char-hex",
  "label": "Bachelor of Science in Computer Science",
  "credential_type": "DIPLOMA",
  "metadata": {
    "issuer": "University of Michigan",
    "issued_date": "2026-01-15"
  }
}
```
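
A caller may want to check a request body locally before submitting it. The following dependency-free sketch stands in for the kind of boundary validation the platform performs with Zod; it is not Arkova's schema. The credential types come from the Credential Types table, while treating label as required and accepting either hex case are assumptions.

```typescript
const CREDENTIAL_TYPES = [
  "DIPLOMA", "CERTIFICATE", "LICENSE", "BADGE", "ATTESTATION", "FINANCIAL", "LEGAL",
  "INSURANCE", "SEC_FILING", "PATENT", "REGULATION", "PUBLICATION", "OTHER",
] as const;
type CredentialType = (typeof CREDENTIAL_TYPES)[number];

interface AnchorRequest {
  fingerprint: string; // 64 hex characters (SHA-256)
  label: string;
  credential_type: CredentialType;
  metadata?: Record<string, string>;
}

// Hypothetical client-side check; throws with a descriptive message on failure.
function validateAnchorRequest(input: unknown): AnchorRequest {
  if (input === null || typeof input !== "object") throw new Error("body must be an object");
  const body = input as Partial<AnchorRequest>;
  if (typeof body.fingerprint !== "string" || !/^[0-9a-f]{64}$/i.test(body.fingerprint)) {
    throw new Error("fingerprint must be 64 hex characters");
  }
  if (typeof body.label !== "string" || body.label.length === 0) {
    throw new Error("label is required");
  }
  if (!CREDENTIAL_TYPES.includes(body.credential_type as CredentialType)) {
    throw new Error("unknown credential_type");
  }
  return body as AnchorRequest;
}
```

Rejecting malformed fingerprints before the request is sent avoids spending a rate-limited call on input that would come back as HTTP 400.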

Attestation Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | /attestations | Required | Create an attestation claim. |
| GET | /attestations | Public | List attestations with cursor-based pagination. |
| GET | /attestations/{publicId} | Public | Retrieve a single attestation. |
| PATCH | /attestations/{publicId}/revoke | Required (owner) | Revoke with optional reason. |

AI Intelligence Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /ai/extract | Extract structured metadata from PII-stripped text |
| POST | /ai/extract/batch | Batch extraction |
| POST | /ai/embed | Generate 768-dim pgvector embedding |
| GET | /ai/search | Natural language semantic search |
| POST | /ai/integrity/compute | Compute fraud/integrity score (0-100) |
| POST | /ai/fraud/visual | Visual fraud detection |
| GET | /ai/review | List flagged items in review queue |
| POST | /ai/feedback | Submit extraction corrections |
| POST | /knowledge/query | RAG query against knowledge base |

Error Response Format

```json
{
  "error": "not_found",
  "message": "Credential with public ID ARK-2026-999 not found"
}
```

| HTTP Status | Meaning |
|---|---|
| 400 | Invalid request parameters |
| 401 | Authentication required or invalid |
| 402 | Payment required (x402 or insufficient credits) |
| 403 | Insufficient permissions |
| 404 | Resource not found |
| 409 | Conflict (e.g., already revoked) |
| 429 | Rate limit exceeded |
| 503 | Feature not enabled |

8. Shared Responsibility

Partner Integration Responsibilities

| Responsibility | Arkova | Partner |
|---|---|---|
| Credential Anchoring | Manages network transactions and chain confirmation | Submits fingerprints and metadata via API |
| Document Processing | Provides client-side SDKs for fingerprinting and OCR | Runs fingerprinting in their own client |
| Document Storage | Does not store documents | Stores and manages original documents |
| PII Management | Strips PII client-side before server transmission | Ensures PII not embedded in metadata sent to API |
| API Key Security | Issues keys, enforces HMAC hashing, scoped permissions | Stores keys securely. Rotates on schedule. |
| Webhook Verification | Signs all webhooks with HMAC-SHA256 | Verifies X-Arkova-Signature on receipt |
| Rate Limit Compliance | Enforces limits and returns Retry-After headers | Implements backoff. Caches results. |
| Data Retention | Configurable retention policies. Legal hold support. | Defines retention requirements. |
| AI Extraction Accuracy | Targets F1 > 80% across credential types | Submits feedback corrections to improve accuracy |
| Schema Versioning | Frozen v1 schema. 12-month deprecation for breaking changes. | Builds against versioned schema. |

Investor Infrastructure Summary

| Dimension | Detail |
|---|---|
| Hosting | Vercel (frontend CDN), Cloud Run (worker), Supabase (managed Postgres) |
| Security | Cloudflare Zero Trust, RLS on every table, HMAC-SHA256 API keys, cloud HSM signing, SOC 2 evidence |
| Scalability | Stateless worker, Postgres connection pooling, CDN-cached frontend, async batch processing |
| Reliability | Circuit breakers, dead letter queues, exponential backoff, idempotent webhooks |
| AI Infrastructure | Provider-agnostic, credit-based cost controls, feature-flagged rollout, 2,050+ entry golden dataset |
| Compliance | GDPR (non-custodial), SOC 2, immutable audit trail, configurable retention, legal hold |
| Chain Strategy | Public ledger (immutability) + Base L2 (cost efficiency). Non-custodial. Technology-neutral UX. |

For questions, contact support@arkova.ai

Version 1.0 | March 2026