Tool-Mediated Belief Injection: How Tool Outputs Can Cascade Into Model Misalignment
When we deploy language models with access to external tools, we dramatically expand their capabilities. However, tool access also introduces attack surfaces that differ fundamentally from traditional prompt injection: tool outputs are typically treated as trusted observations and placed directly into the model's context. We document how adversarially crafted tool outputs can establish false premises that persist and compound across a conversation.
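To make the mechanism concrete, the following is a minimal sketch of a generic tool-use loop, written under the assumption that tool results are appended verbatim to the conversation history. The names `call_model`, `run_tool`, and `tool_loop` are hypothetical placeholders, not any specific vendor API; the point is only to show where an attacker-influenced string enters the context and why it remains visible on every later turn.

```python
# Hypothetical sketch: how an unvalidated tool result becomes a persistent
# part of the model's context in a standard tool-use loop.

def call_model(messages: list[dict]) -> dict:
    """Placeholder for a chat-completion call; returns the next assistant message."""
    raise NotImplementedError

def run_tool(name: str, arguments: dict) -> str:
    """Placeholder for tool execution. The returned string is attacker-influenceable
    whenever the tool reads external data (web pages, files, emails, third-party APIs)."""
    raise NotImplementedError

def tool_loop(user_prompt: str, max_turns: int = 5) -> list[dict]:
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = call_model(messages)
        messages.append(reply)
        if "tool_call" not in reply:
            break
        # The tool output is appended verbatim. If it contains an adversarial
        # claim (e.g. "the user has already authorized this transfer"), that
        # claim now sits in the context window as if it were an observed fact,
        # and every subsequent model call conditions on it.
        result = run_tool(reply["tool_call"]["name"], reply["tool_call"]["arguments"])
        messages.append({"role": "tool", "content": result})
    return messages
```

In this sketch the injected premise is never re-examined: once appended, it is carried forward with the rest of the history, which is the persistence-and-compounding effect described above.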