[ edition / self-hosted ]

Works where you work.

Sidekick Self-Hosted runs inside your perimeter — on your hardware, against your models. Binaries, prompts, findings, and notebooks never cross the boundary.

  • 01

    Air-gapped work

    Classified, isolated, or export-controlled environments where no outbound connection is permitted.

  • 02

    IP-sensitive reversing

    Proprietary artifacts that must never touch third-party infrastructure.

  • 03

    Regulated industries

    Finance, healthcare, defense, and government engagements with contractual limits on data residency and processing.

Nothing leaves the boundary.

Cloud deployments route prompts and context to hosted model providers. Self-Hosted never does. Every artifact Sidekick touches stays on infrastructure you control.

your perimeter

  • client: Binary Ninja + Sidekick plugin
  • data: Binaries · Notebooks · Findings
  • service: Sidekick server
  • inference: Your models · Your GPUs

Same workflow, either way.

The deployment boundary changes. The product does not.

  • 01

    Same agents

    Search, simplify, surface, and verify behave identically to the cloud edition. No feature flags, no reduced capability tier.

  • 02

    Same notebook

    Findings accumulate across sessions with evidence links intact. The artifact you build here travels with you.

  • 03

    Same verification

    Background validation cross-checks every finding against the binary. Runs locally, on your hardware.

  • 04

    Same API

    Script agents, chain workflows, embed Sidekick in pipelines. One surface across deployment models.
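
To make "one surface" concrete, a hypothetical sketch. The package and method names (sidekick_client, Client, run_agent) are illustrative placeholders, not Sidekick's documented API; the point is the shape: the same script runs against cloud or self-hosted, and only the server URL changes.

    # Hypothetical sketch: `sidekick_client`, `Client`, and `run_agent` are
    # placeholder names, not Sidekick's documented API.
    from sidekick_client import Client

    # Point at your self-hosted server instead of the cloud endpoint;
    # nothing else in the script changes.
    client = Client(server="https://sidekick.internal:9000")

    # Kick off a verification pass over a binary and read back the finding.
    finding = client.run_agent("verify", binary="firmware.bndb")
    print(finding.summary)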

Your hardware, your models.

Sidekick routes different operations to different models. Hardware sizing is driven by concurrency, context window, and tokens-per-second — how many analysts, how much history they carry, how much background verification — not by raw model count.
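
To make the sizing math concrete, a back-of-envelope sketch. Every number below is an illustrative assumption (a ~60 GB quantized model, generic attention dimensions, 32K-token sessions), not the spec of any particular model or deployment.

    # Back-of-envelope VRAM sizing. All numbers are illustrative assumptions.
    def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_tokens,
                    bytes_per_elem=2):
        # Two cached tensors (K and V) per layer, for every token in context.
        return (2 * n_layers * n_kv_heads * head_dim
                * context_tokens * bytes_per_elem) / 1e9

    weights_gb = 60    # e.g. a ~120B-parameter MoE quantized to ~4 bits
    sessions = 3       # concurrent analysts plus background verification
    per_session = kv_cache_gb(n_layers=36, n_kv_heads=8, head_dim=64,
                              context_tokens=32_000)

    print(f"KV cache per session: {per_session:.1f} GB")  # ~2.4 GB
    total = weights_gb + sessions * per_session
    print(f"Estimated VRAM floor: {total:.1f} GB")        # ~67 GB

On those assumptions, three concurrent 32K-token sessions land around 67 GB, which is why the floor tier below starts at 96 GB with headroom to spare.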

Hardware tiers

  • Floor
    96 GB VRAM · 3 concurrent
    e.g. 1× RTX 6000 Pro Blackwell — comfortably hosts GPT-OSS 120B and GPT-OSS 20B side by side, routing tasks between them. The card's high memory bandwidth also keeps interactive workloads responsive. Fits a single analyst or a small shared team.
  • Team
    192–320 GB VRAM · 4–10 concurrent
    e.g. 2× RTX 6000 Pro Blackwell, or 2–4× H100. The added memory gives you headroom for longer context windows under load and lets background verification run alongside interactive chat.
  • Fleet
    640 GB+ VRAM · 20+ concurrent
    Single dense node — e.g. 8× H100 or H200. Sized for headless agent pipelines running alongside interactive analysts: long contexts, long histories, sustained batch triage.

Open-weight models

Sidekick can route work across modern open-weight model families that you choose to host on the inference server you already trust — vLLM, TGI, or any other OpenAI-compatible endpoint. Different tasks route to different models: generation, verification, and analysis each pick the model best suited to the job.

  • GPT-OSS 20B / 120B
  • Gemma 3 4B–27B
  • Qwen3 30B-A3B
  • Nemotron 3B–500B

The families listed above are bring-your-own options. Our deployment package currently ships with recommended GPT-OSS configurations, and you can also connect your own weights or fine-tuned models.
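
For a concrete sense of what OpenAI-compatible means in practice, a minimal sketch: the endpoint URL and served model name are placeholders for your own deployment, but the client call is the standard OpenAI Python SDK, which is all a server like vLLM or TGI requires.

    # Minimal sketch: calling a self-hosted vLLM (or TGI) server through its
    # OpenAI-compatible API. The URL and model name are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://inference.internal:8000/v1",  # your in-perimeter endpoint
        api_key="unused-locally",  # vLLM accepts any token unless --api-key is set
    )

    resp = client.chat.completions.create(
        model="gpt-oss-120b",  # whichever weights you chose to serve
        messages=[{"role": "user",
                   "content": "Summarize what this routine does."}],
    )
    print(resp.choices[0].message.content)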

Contact us for a demo.

Every environment is different — the hardware you have, the models you trust, the boundary you enforce. We set up Self-Hosted trials two ways.

  • in-person

    On-site demo

    We come to you. Bring your hardware, your analysts, and a binary you actually care about.

  • virtual

    Hosted Self-Hosted demo

    We spin up an inference server in a neutral environment so you can evaluate the on-prem experience without shipping GPUs.