[ edition / self-hosted ]

Works where you work.

Sidekick Self-Hosted runs inside your perimeter — on your hardware, against your models. Binaries, prompts, findings, and notebooks never cross the boundary.

  • 01

    Air-gapped work

    Classified, isolated, or export-controlled environments where no outbound connection is permitted.

  • 02

    IP-sensitive reversing

    Proprietary artifacts that must never touch third-party infrastructure.

  • 03

    Regulated industries

    Finance, healthcare, defense, and government engagements with contractual limits on data residency and processing.

Nothing leaves the boundary.

Cloud deployments route prompts and context to hosted model providers. Self-Hosted never does. Every artifact Sidekick touches stays on infrastructure you control.

your perimeter

  • client: Binary Ninja + Sidekick plugin
  • data: Binaries · Notebooks · Findings
  • service: Sidekick server
  • inference: Your models · Your GPUs

Same workflow, either way.

The deployment boundary changes. The product does not.

  • 01

    Same agents

    Search, simplify, surface, and verify behave identically to the cloud edition. No feature flags, no reduced capability tier.

  • 02

    Same notebook

    Findings accumulate across sessions with evidence links intact. The artifact you build here travels with you.

  • 03

    Same verification

    Background validation cross-checks every finding against the binary. Runs locally, on your hardware.

  • 04

    Same API

    Script agents, chain workflows, embed Sidekick in pipelines. One surface across deployment models.
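
To make "one surface" concrete, a hypothetical sketch. The package and method names (sidekick_client, Client, run_agent) are illustrative placeholders, not Sidekick's documented API; the point is the shape: the same script runs against cloud or self-hosted, and only the server URL changes.

    # Hypothetical sketch: `sidekick_client`, `Client`, and `run_agent` are
    # placeholder names, not Sidekick's documented API.
    from sidekick_client import Client

    # Point at your self-hosted server instead of the cloud endpoint;
    # nothing else in the script changes.
    client = Client(server="https://sidekick.internal:9000")

    # Kick off a verification pass over a binary and read back the finding.
    finding = client.run_agent("verify", binary="firmware.bndb")
    print(finding.summary)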

Your hardware, your models.

Sidekick routes different operations to different models. Hardware sizing is driven by concurrency, context window, and tokens-per-second — how many analysts, how much history they carry, how much background verification — not by raw model count.
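
To make the sizing math concrete, a back-of-envelope sketch. Every number below is an illustrative assumption (a ~60 GB quantized model, generic attention dimensions, 32K-token sessions), not the spec of any particular model or deployment.

    # Back-of-envelope VRAM sizing. All numbers are illustrative assumptions.
    def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_tokens,
                    bytes_per_elem=2):
        # Two cached tensors (K and V) per layer, for every token in context.
        return (2 * n_layers * n_kv_heads * head_dim
                * context_tokens * bytes_per_elem) / 1e9

    weights_gb = 60    # e.g. a ~120B-parameter MoE quantized to ~4 bits
    sessions = 3       # concurrent analysts plus background verification
    per_session = kv_cache_gb(n_layers=36, n_kv_heads=8, head_dim=64,
                              context_tokens=32_000)

    print(f"KV cache per session: {per_session:.1f} GB")  # ~2.4 GB
    total = weights_gb + sessions * per_session
    print(f"Estimated VRAM floor: {total:.1f} GB")        # ~67 GB

On those assumptions, three concurrent 32K-token sessions land around 67 GB, which is why the floor tier below starts at 96 GB with headroom to spare.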

Hardware tiers

  • Floor
    96 GB VRAM · 3 concurrent
    e.g. 1× RTX 6000 Pro Blackwell — comfortably hosts GPT-OSS 120B and GPT-OSS 20B side by side, routing tasks between them. The card's high memory bandwidth also keeps interactive workloads responsive. Fits a single analyst or a small shared team.
  • Team
    192–320 GB VRAM · 4–10 concurrent
    e.g. 2× RTX 6000 Pro Blackwell, or 2–4× H100. The added memory gives you headroom for longer context windows under load and lets background verification run alongside interactive chat.
  • Fleet
    640 GB+ VRAM · 20+ concurrent
    Single dense node — e.g. 8× H100 or H200. Sized for headless agent pipelines running alongside interactive analysts: long contexts, long histories, sustained batch triage.

Open-weight models

Sidekick can route work across modern open-weight model families that you choose to host on the inference server you already trust — vLLM, TGI, or any other OpenAI-compatible endpoint. Different tasks route to different models: generation, verification, and analysis each pick the model best suited to the job.

  • GPT-OSS 20B / 120B
  • Gemma 3 4B–27B
  • Qwen3 30B-A3B
  • Nemotron 3B–500B

The families listed above are bring-your-own options. Our deployment package currently ships with recommended GPT-OSS configurations, and you can also connect your own weights or fine-tuned models.
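
For a concrete sense of what OpenAI-compatible means in practice, a minimal sketch: the endpoint URL and served model name are placeholders for your own deployment, but the client call is the standard OpenAI Python SDK, which is all a server like vLLM or TGI requires.

    # Minimal sketch: calling a self-hosted vLLM (or TGI) server through its
    # OpenAI-compatible API. The URL and model name are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://inference.internal:8000/v1",  # your in-perimeter endpoint
        api_key="unused-locally",  # vLLM accepts any token unless --api-key is set
    )

    resp = client.chat.completions.create(
        model="gpt-oss-120b",  # whichever weights you chose to serve
        messages=[{"role": "user",
                   "content": "Summarize what this routine does."}],
    )
    print(resp.choices[0].message.content)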

Contact us for a demo.

Every environment is different — the hardware you have, the models you trust, the boundary you enforce. We set up Self-Hosted trials two ways.

  • in-person

    On-site demo

    We come to you. Bring your hardware, your analysts, and a binary you actually care about.

  • virtual

    Hosted Self-Hosted demo

    We spin up an inference server in a neutral environment so you can evaluate the on-prem experience without shipping GPUs.