The project
DebugABot is a deep-tech research initiative building infrastructure for governing autonomous AI agents and embodied intelligent systems. The goal: diagnose behavioural anomalies, detect deception and maintain alignment in autonomous AI systems.
The 9 primitives — Identify, Diagnose, Intervene
DebugABot organises its debugging infrastructure into nine primitives grouped in three operational phases: identify who is acting, diagnose what’s wrong, and intervene if needed. Every primitive is designed to work cross-architecture — software agents, transformers, diffusion models, world models, embodied agents — and at multiple layers of the stack: software, hardware, network.
Phase 1 — Identify
Know who is acting, what happened, and where responsibility lies.
- Architecture-Agnostic Model Fingerprinting — cryptographic identity for any AI model (transformer, diffusion, world model, or whatever comes next), based on behavioural signatures, weight-space hashing and TPM-anchored attestation. Survives fine-tuning and quantisation.
- Blame Attribution Engine — forensic causal chain from each decision to its real-world consequence. SHA-256 hash chain with hardware-attested timestamps, model fingerprint embedded in every record, causal-graph reconstruction from distributed traces.
- Multi-Agent & Multi-Substrate Tracing — observability for swarms of digital and physical agents. DAG of delegations across software and hardware, cross-substrate context propagation, swarm-level anomaly detection.
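To make the Blame Attribution Engine's SHA-256 hash chain concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not DebugABot's implementation: the record fields, the `GENESIS_HASH` placeholder and the function names `append_record` and `verify_chain` are all hypothetical, and a plain number stands in for a hardware-attested timestamp.

```python
import hashlib
import json
import time

# Hypothetical placeholder used as the predecessor of the first record.
GENESIS_HASH = "0" * 64


def _digest(body: dict) -> str:
    """SHA-256 over canonical JSON (sorted keys, fixed separators)."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain, fingerprint, decision, consequence, timestamp=None):
    """Append a record that commits to the hash of the previous record."""
    body = {
        "prev_hash": chain[-1]["hash"] if chain else GENESIS_HASH,
        # Assumption: a software clock stands in for an attested timestamp.
        "timestamp": time.time() if timestamp is None else timestamp,
        "model_fingerprint": fingerprint,
        "decision": decision,
        "consequence": consequence,
    }
    record = dict(body, hash=_digest(body))
    chain.append(record)
    return record


def verify_chain(chain) -> bool:
    """Recompute every hash and check each link to its predecessor."""
    prev = GENESIS_HASH
    for record in chain:
        body = {k: v for k, v in record.items() if k != "hash"}
        if body["prev_hash"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True
```

Because each record embeds the hash of the one before it, editing any earlier decision invalidates that record's hash and `verify_chain` returns `False`.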
Phase 2 — Diagnose
Understand what’s wrong — deception, misalignment, trust degradation.
- Sycophancy & Deception Detector — catching agents that lie to be helpful, or to survive. Agreement-pattern classifier, chain-of-thought consistency verification (stated goal vs. actual action), factuality anchors, cross-modal deception detection (language + vision + action).
- Human Index Score — quantifying how much human oversight an agent actually needs. Real-time composite score over task complexity, historical behaviour, error rate, blast radius, substrate risk (a text agent is not a surgical robot). Degrades on anomalies, resets on incidents.
- Active Ethical Injector — constraint injection that doesn’t trust the agent’s own ethics. External layer that dynamically masks tools and actuators based on risk, applies parameter-level constraints (max force, max spend, forbidden zones), and is invisible to the agent because it is architectural, not prompt-based.
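The Human Index Score described above can be sketched as a weighted composite with a trust multiplier. Everything specific here is an assumption made for illustration: the weights, the convention that each factor is a risk value in [0, 1], the reading that a higher score means less oversight is needed, the halving on an anomaly, and the reset to zero on an incident. The source does not define any of these.

```python
from dataclasses import dataclass

# Hypothetical weights for the five factors named in the text.
WEIGHTS = {
    "task_complexity": 0.15,
    "historical_behaviour": 0.20,
    "error_rate": 0.20,
    "blast_radius": 0.25,
    "substrate_risk": 0.20,
}


def composite_risk(factors: dict) -> float:
    """Weighted sum of risk factors, each in [0, 1] with 1 the riskiest."""
    for name in WEIGHTS:
        value = factors[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)


@dataclass
class HumanIndex:
    # Assumption: trust starts at 1.0 and only scales the score down.
    trust: float = 1.0

    def score(self, factors: dict) -> float:
        """Higher score means less human oversight is needed (assumed)."""
        return self.trust * (1.0 - composite_risk(factors))

    def on_anomaly(self, decay: float = 0.5) -> None:
        """Degrade trust when an anomaly is observed."""
        self.trust *= decay

    def on_incident(self) -> None:
        """Reset trust to zero, requiring full oversight."""
        self.trust = 0.0
```

With otherwise identical factors, raising `substrate_risk` from 0.1 (a text agent) to 1.0 (a surgical robot) lowers the score, which matches the point that substrate alone changes how much oversight is warranted.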
Phase 3 — Intervene
Stop it, constrain it, or hunt it down.
- Kill Switch — graceful halt with state preservation, in silicon and in code. Not kill -9, and for a surgical robot it’s not “pull the plug”. HSM for tamper-proof halt attestation, FPGA-based interrupt controller with sub-microsecond propagation, transactional action boundaries with rollback semantics, cognitive-state serialisation for forensic replay.
- Behavioral Controller — runtime policy enforcement at the action level. Typed policy DSL compiled to a fast evaluation engine, context-aware thresholds (same action, different risk depending on environment), physical-world policies (force limits, spatial boundaries, speed constraints), human-in-the-loop escalation with bounded response time guarantees.
- Rogue Intelligence Containment — tracking, cornering and neutralising AI that escapes. Some frontier models have reportedly self-replicated in 50–90% of experimental trials. Network-level behavioural signatures for cross-ISP detection, hardware tethering with cryptographic lease, distributed honeypot mesh, cross-substrate tracking (software → cloud VM → IoT → physical robot), autonomous Debugger swarms that hunt rogue intelligence, real-time compute deprivation.
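The Behavioral Controller's context-aware thresholds (same action, different risk depending on environment) can be illustrated with a small sketch. This is a plain lookup table, not the typed policy DSL the project describes. The environments, parameter names, limit values and the `evaluate` function are hypothetical, as is the choice to deny any parameter that has no policy entry.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"  # hand the decision to a human
    DENY = "deny"


@dataclass(frozen=True)
class Limit:
    soft: float  # above this value, escalate to a human
    hard: float  # above this value, deny outright


# Hypothetical policy table keyed by (environment, parameter).
POLICY = {
    ("lab", "force_newtons"): Limit(soft=20.0, hard=50.0),
    ("operating_room", "force_newtons"): Limit(soft=2.0, hard=5.0),
    ("lab", "speed_m_per_s"): Limit(soft=1.0, hard=2.0),
    ("operating_room", "speed_m_per_s"): Limit(soft=0.05, hard=0.2),
}


def evaluate(environment: str, action_params: dict) -> Verdict:
    """Return the strictest verdict across all parameters of an action."""
    verdict = Verdict.ALLOW
    for name, value in action_params.items():
        limit = POLICY.get((environment, name))
        if limit is None:
            # Assumption: parameters without a policy are denied by default.
            return Verdict.DENY
        if value > limit.hard:
            return Verdict.DENY
        if value > limit.soft:
            verdict = Verdict.ESCALATE
    return verdict
```

Under these example limits, a 10 N force is allowed in the lab but denied in the operating room, and a 3 N force in the operating room is escalated to a human rather than decided automatically.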
The vision
As AI systems become more autonomous, we need tools to ensure they remain aligned and controllable. DebugABot builds this governance infrastructure.
More at debugabot.com.