dayz-search-index
Overview
Index vanilla DayZ source (Enforce Script, .layout, config.cpp, .rvmat) into a per-user vector database so DayZ agents can do semantic search via the dayz-rag MCP server. One-time setup; rerun with --full after a DayZ update. Required before agents can use the search_dayz_source tool.
/dayz-search-index
Build a semantic-search index over vanilla DayZ source so DayZ specialist agents stop flying blind through P:\ with Grep. Backs the dayz-rag MCP server.
Follow .claude/skills/_shared/dayz-conventions.md.
What it indexes
| Source | Files | Chunking |
|---|---|---|
| P:\scripts\**\*.c | Enforce Script (~2,800 files) | per class/method |
| P:\gui\**\*.layout | UI layouts (~210 files) | per top-level Widget block |
| P:\dz\**\config.cpp | Vanilla configs (~200 files) | per class X { ... } block |
| P:\**\*.rvmat | Materials (~35,000 files) | whole file, content-deduped by SHA-256 |
| P:\**\*.xml | XML (types, etc.) | batched 50 elements/chunk |
model.cfg files are intentionally skipped: vanilla extracted data doesn't ship them.
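The two less obvious chunking rules in the table (content dedup for .rvmat, fixed-size batching for XML) can be sketched as follows. This is a minimal illustration, not the indexer's actual code: the function names `dedupe_by_sha256` and `batch` and the sample file contents are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator


def dedupe_by_sha256(files: Iterable[tuple[str, bytes]]) -> dict[str, list[str]]:
    """Group file paths by the SHA-256 digest of their content.

    Only one path per digest would need to be embedded; the others are
    byte-identical copies.
    """
    groups: dict[str, list[str]] = {}
    for path, content in files:
        digest = hashlib.sha256(content).hexdigest()
        groups.setdefault(digest, []).append(path)
    return groups


def batch(items: list[str], size: int = 50) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical sample data: two byte-identical materials and one distinct one.
sample = [
    ("a.rvmat", b"ambient[]={1,1,1,1};"),
    ("b.rvmat", b"ambient[]={1,1,1,1};"),
    ("c.rvmat", b"ambient[]={0,0,0,1};"),
]
groups = dedupe_by_sha256(sample)
unique_count = len(groups)

# 120 XML elements become chunks of 50, 50 and 20 elements.
xml_batches = list(batch([f"<type name='item{i}'/>" for i in range(120)]))
```

Hashing raw bytes means two materials that differ only in whitespace still count as distinct; whether the real indexer normalizes content first is not stated here.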
Where the index lives
~/.claude/dayz-search-index/ (= C:\Users\<you>\.claude\dayz-search-index\)
- lancedb/: vector store
- manifest.json: embed model used, per-source chunk counts, indexed-at timestamp, token usage and $ estimate
- config.json: model name the MCP server should use at query time
Per-user, gitignored (lives outside the repo). Survives clones.
Embedding via Voyage AI (cloud)
Embedding runs through Voyage AI's hosted API. The model is code-tuned and gives much higher retrieval quality than local CPU options on our DayZ corpus. The trade-off is a network call per batch and an API key.
Default model: voyage-code-3 — code-tuned, 32k context, 1024-dim. $0.18 per 1M tokens with 200M tokens free (Series-4 free allowance, doesn't expire when payment method is added).
Cost reality: a full vanilla rebuild is ~65M tokens (measured, not estimated — dominated by ~34k unique rvmats and ~10k XML chunks from types.xml batches). At voyage-code-3's $0.18/M that's ~$12, but the 200M free tier covers ~3 rebuilds. After that, switch to voyage-4-lite ($0.02/M ≈ $1.30/rebuild) via VOYAGE_MODEL env var if cost matters. Live token + $ counter prints during the embed loop, and final totals are saved to manifest.json.
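The arithmetic behind those figures, as a small sketch. The prices, the ~65M token count, and the 200M allowance come from the paragraph above; the function and constant names are hypothetical.

```python
# Prices in USD per 1M tokens, as quoted above.
PRICE_PER_MILLION_USD = {
    "voyage-code-3": 0.18,
    "voyage-4-lite": 0.02,
}
FREE_TIER_TOKENS = 200_000_000
FULL_REBUILD_TOKENS = 65_000_000  # measured size of a full vanilla rebuild


def rebuild_cost_usd(tokens: int, model: str) -> float:
    """Cost of embedding `tokens` tokens with `model`, ignoring any free tier."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_USD[model]


def free_rebuilds(tokens_per_rebuild: int, free_tokens: int = FREE_TIER_TOKENS) -> int:
    """Number of complete rebuilds that fit inside the free allowance."""
    return free_tokens // tokens_per_rebuild


for model in PRICE_PER_MILLION_USD:
    cost = rebuild_cost_usd(FULL_REBUILD_TOKENS, model)
    print(f"{model}: ${cost:.2f} per full rebuild")
print(f"free rebuilds: {free_rebuilds(FULL_REBUILD_TOKENS)}")
```

This gives about $11.70 per rebuild on voyage-code-3, $1.30 on voyage-4-lite, and three complete rebuilds inside the free allowance.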
Free-tier projection check: before kicking off the embed phase, the skill estimates this build's tokens (sum of chunk text lengths ÷ chars-per-token) and projects what the cumulative monthly usage would be after this run. If the projection exceeds 80% of the 200M free tier, you get a prompt to confirm. Cumulative usage is tracked locally in ~/.claude/dayz-search-index/usage.log (one JSONL entry per build). Caveats: the local counter only sees builds run via these skills — it does NOT see usage by other apps sharing your VOYAGE_API_KEY. The authoritative usage view is at dash.voyageai.com. Free tier resets monthly per Voyage's billing cycle. Pass --ignore-tier-warning to skip the prompt (CI / scripted use). Override the budget with VOYAGE_FREE_TIER_TOKENS=<int> env var if you're on a non-default plan.
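A rough sketch of how such a projection check could work. Several details are assumptions rather than facts about the skill: the chars-per-token ratio of 4.0, the `tokens` field name in usage.log, and all function names. For brevity it sums every log entry instead of filtering to the current month.

```python
import json
from pathlib import Path

FREE_TIER_TOKENS = 200_000_000   # default budget; VOYAGE_FREE_TIER_TOKENS would override it
WARN_FRACTION = 0.80             # prompt when the projection exceeds 80% of the budget
CHARS_PER_TOKEN = 4.0            # assumed ratio, not taken from the skill


def estimate_tokens(chunks: list[str], chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate tokens as total characters divided by chars-per-token."""
    return int(sum(len(chunk) for chunk in chunks) / chars_per_token)


def used_tokens(usage_log: Path) -> int:
    """Sum the (hypothetical) `tokens` field over a JSONL usage log."""
    if not usage_log.exists():
        return 0
    total = 0
    for line in usage_log.read_text(encoding="utf-8").splitlines():
        if line.strip():
            total += int(json.loads(line).get("tokens", 0))
    return total


def needs_confirmation(already_used: int, this_build: int,
                       budget: int = FREE_TIER_TOKENS) -> bool:
    """True when usage after this build would exceed the warning threshold."""
    return already_used + this_build > WARN_FRACTION * budget
```

With the default budget, two prior full rebuilds (130M tokens) plus a third 65M build project to 195M, which is above the 160M threshold and would trigger the prompt.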
One-time setup
- Get a key at dash.voyageai.com.
- Add it to .env at the repo root (gitignored, so it is never committed):
  VOYAGE_API_KEY=pa-xxxxxxxxxxxx
- (Optional) Override the model via env var if you want a cheaper or higher-quality variant:
  VOYAGE_MODEL=voyage-4-lite    # cheapest ($0.02/M, 16M TPM)
  VOYAGE_MODEL=voyage-code-3    # default, code-tuned
  VOYAGE_MODEL=voyage-4-large   # best general quality
The indexer auto-installs voyageai + python-dotenv on first run.
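For illustration, resolving these settings might look like the sketch below. It reads from a plain mapping so it runs without python-dotenv; `load_settings` and `mask_key` are hypothetical names, and the masking format simply mimics the `pa-GRJ...edF` style shown in the sample output.

```python
import os
from typing import Mapping

DEFAULT_MODEL = "voyage-code-3"


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read the API key and model from the environment, failing hard without a key."""
    key = env.get("VOYAGE_API_KEY", "").strip()
    if not key:
        raise SystemExit("VOYAGE_API_KEY is missing; add it to .env at the repo root")
    # An unset or empty VOYAGE_MODEL falls back to the default model.
    model = env.get("VOYAGE_MODEL", "").strip() or DEFAULT_MODEL
    return {"api_key": key, "model": model}


def mask_key(key: str) -> str:
    """Show only the first six and last three characters of the key."""
    return f"{key[:6]}...{key[-3:]}"
```

Passing the environment in as a parameter keeps the function easy to exercise with a fake mapping instead of the real process environment.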
How to run
First time / after DayZ update:
python .claude\skills\dayz-search-index\index.py --full
Status only:
python .claude\skills\dayz-search-index\index.py --status
When to run
- After a fresh clone, before using any DayZ specialist agent that needs vanilla source recall.
- After a DayZ update (vanilla source on P:\ may have changed).
- After re-extracting vanilla data via DayZ Tools.
Output
DayZ RAG indexer
[OK] P:\ mounted
[OK] Vanilla data: P:\dz
[OK] VOYAGE_API_KEY loaded (pa-GRJ...edF)
[INFO] Index: C:\Users\you\.claude\dayz-search-index\
[INFO] Embedding model: voyage-code-3 (1024D, via Voyage cloud)
[INFO] Pricing: $0.18 per 1M tokens (Series-4 includes 200M free)
Walking vanilla folders: scripts, gui, dz, core, graphics, languagecore
scripts/ c=28432c/2805f
gui/ layout=1842c/214f
dz/ cpp=3201c/206f
core/ ...
rvmat dedup: 33800 duplicates (97.6%)
Embedding 34322 chunks via Voyage (voyage-code-3, document mode)...
embedding: 100%|██| 34322/34322 [03:42<00:00] 12,847,231 tok / ~$2.312
[OK] Index built: 34322 chunks
[INFO] Tokens used: 12,847,231 | Estimated cost: $2.3125
[INFO] Embed time: 222.4s
Performance expectations
- Full index: ~25-30 min (network-bound, ~52k chunks at ~75 chunks/s).
- Index disk size: ~200-400 MB for the vector store + manifest.
- Tokens consumed: ~65M (covered by Voyage's 200M free tier for the first ~3 rebuilds).
Safety rails
- Hard-fails if VOYAGE_API_KEY is missing; no silent partial runs.
- Refuses to embed >150k chunks without --force (cost/runtime backstop).
- Retries on 429 / transient network errors with exponential backoff.
- Live token counter + $ estimate so an unexpected blow-up surfaces instantly.
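Two of the rails above, the chunk-count backstop and retry with exponential backoff, can be sketched generically. This is not the indexer's code: `TransientError` stands in for whatever 429 or network exception the embedding client raises, and the attempt count, base delay, and jitter are illustrative assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
MAX_CHUNKS = 150_000


class TransientError(Exception):
    """Stand-in for a 429 or transient network failure."""


def check_chunk_limit(chunk_count: int, force: bool = False) -> None:
    """Refuse oversized runs unless the caller explicitly forces them."""
    if chunk_count > MAX_CHUNKS and not force:
        raise SystemExit(f"{chunk_count} chunks exceeds {MAX_CHUNKS}; pass --force to continue")


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying transient failures with doubling delays plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error instead of hiding it
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

Injecting `sleep` makes the retry loop testable without real waiting; the jitter term spreads out retries when several batches hit a rate limit at once.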
Dependencies
pip install -r requirements.txt (the indexer auto-installs on first run if missing).
- lancedb: file-based vector store
- tqdm: progress bars
- voyageai: Voyage embedding SDK
- python-dotenv: loads VOYAGE_API_KEY from .env
Do not
- Don't gate this skill on /dayz-preflight's warnings (Tools, Workshop); only P:\ mounted is mandatory.
- Don't index per clone. The index reflects the user's machine (their DayZ install), not the project. It lives at ~/.claude/dayz-search-index/.
- Don't commit .env. It's in .gitignore; keep it there.
- Don't switch models without re-running --full: the LanceDB table is fixed-dim, and switching providers/models will mismatch query and document vectors.
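The last point could be enforced with a guard like the one below, run before any query or incremental build. It is a sketch under assumptions: the `model` key in config.json and the function name are hypothetical, and the config is passed as an already-parsed dict.

```python
def check_model_matches(config: dict, requested_model: str) -> None:
    """Abort when the requested model differs from the one the index was built with.

    `config` is the parsed content of config.json; the `model` key is an assumed
    field name.
    """
    stored = config.get("model")
    if stored and stored != requested_model:
        raise SystemExit(
            f"Index was built with {stored!r} but {requested_model!r} was requested; "
            "re-run the indexer with --full before switching models."
        )


# A matching model passes silently.
check_model_matches({"model": "voyage-code-3"}, "voyage-code-3")
```

Comparing model names rather than vector dimensions also catches swaps between two models that happen to share a dimension but produce incompatible embeddings.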