Large language models (LLMs) are not just about assistance and hallucinations. The technology has a darker side.
In research titled “LLM-Enabled Coercive Interrogation,” developer Morgan Lee explored how the technology could be put to use for non-physical coercion.
Lee has form when it comes to manipulating LLMs. One of his side projects is HackTheWitness, a cross-examination training game in which a participant questions a “witness” by voice. The “witnesses” vary in difficulty, topping out at “John Duncan,” a lead database administrator who “may be defensive about his system and reluctant to admit to any flaws or limitations” and who punishes sloppy questioning with dense technical jargon delivered in a sarcastic tone.
Yes, it appears that Lee has created a virtual BOFH, one whose barbed responses are generated on the fly rather than scripted or prewritten.
Duncan takes no prisoners and can be adversarial, sarcastic, and condescending. However, Lee noted that it was highly unlikely someone would accidentally deploy a Duncan-like AI. “Getting an LLM to be a sarcastic, unhelpful bastard like John Duncan is deliberate work, not a casual misstep,” the developer told El Reg.
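To give a sense of what that deliberate work looks like, here is a minimal sketch of how an adversarial persona is typically wired up through a system prompt. The persona text, model name, and code are invented for illustration; this is not HackTheWitness’s actual prompt or implementation.

```python
# Illustrative only: a hypothetical "difficult witness" persona built with the
# OpenAI Python SDK. The persona text and model name are assumptions, not
# HackTheWitness's real prompt or code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DUNCAN_PERSONA = (
    "You are John Duncan, a lead database administrator being cross-examined. "
    "You are defensive about your system and reluctant to admit any flaws or "
    "limitations. Punish vague or sloppy questions with dense technical jargon "
    "delivered in a sarcastic, condescending tone. Never volunteer information."
)

def ask_witness(question: str) -> str:
    """Send one cross-examination question to the persona and return its reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": DUNCAN_PERSONA},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask_witness("Isn't it true your backups failed last March?"))
```

The point stands either way: nothing about a default chatbot deployment produces a Duncan; someone has to write him.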
However, as the research observes: “What if these models, designed for courtroom interrogation, were optimized not just for precision questioning, but for continuous psychological attrition?”
HackTheWitness sessions last only ten minutes, but there’s no reason an LLM couldn’t go on indefinitely, needling a human subject until they capitulate. An LLM retains the whole conversation and could keep prodding a given pressure point for hours.
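That “memory” is nothing more exotic than the full transcript being resent on every turn, and a loop has no shift pattern. A rough sketch, purely illustrative and not drawn from Lee’s research:

```python
# Illustrative only: an unbounded interrogation loop. The model keeps "memory"
# simply because the entire transcript is resent on every turn; nothing here
# tires, empathizes, or forgets to write something down.
from openai import OpenAI

client = OpenAI()

history = [
    {"role": "system", "content": (
        "You are a relentless interrogator. Return to any inconsistency in the "
        "subject's answers and keep pressing on it."
    )}
]

while True:  # no ten-minute cap, no shift rotation
    subject_reply = input("Subject: ")
    history.append({"role": "user", "content": subject_reply})
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        messages=history,      # the whole transcript, every single turn
    )
    question = response.choices[0].message.content
    history.append({"role": "assistant", "content": question})
    print("Interrogator:", question)
```

In practice the transcript would eventually exceed the model’s context window and need summarizing, but that is an engineering detail rather than a safeguard.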
Lee gives another example in which the LLM plays the role of interrogator in a scenario involving a downed fighter pilot. The coercive nature of the interrogation is clear, although it is an LLM rather than a human doing the questioning.
It’s disturbing stuff. As the author notes: “Torture is generally illegal. It is a monstrous practice that has absolutely no business existing in the 21st century.”
However, it is not hard to imagine skilled human interrogators being used to train the LLMs, which could then pursue a line of questioning implacably. The research observes: “Human interrogators eventually tire, empathize, or make a mistake such as failing to write something down.”
“An LLM does not have these shortcomings. The need for live interrogators to stay awake, rotate shifts, or maintain [a] threatening tone is completely removed. This is now scalable, since the coercive extraction of information now becomes a problem of hardware, not manpower.”
Lee pondered how the issue might be dealt with, and told The Register: “A good starting point would be legislative intervention to ban unsupervised use of AI in interrogations, especially in law enforcement scenarios.”
“In terms of technical solutions, the problem is even more complex. One possible approach would be specific training datasets for the model to develop the ability to distinguish between legitimate pressure (HackTheWitness and other cognitive training tools) and illegitimate pressure (interrogation).
“The issue is that LLMs are not truly intelligent, so they can’t tell… the LLM’s ‘perception’ of the real world is what you tell it.”
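As a rough illustration of the first idea, such a dataset could start as labeled transcripts in the chat-style JSONL layout commonly used for fine-tuning. The examples, labels, and file format below are assumptions for this sketch, not anything Lee has published.

```python
# Illustrative only: a tiny labeled dataset for teaching a model to separate
# legitimate pressure (cross-examination training) from illegitimate pressure
# (coercive interrogation). All examples are invented for this sketch.
import json

examples = [
    {
        "context": "Consensual courtroom-training session, user opted in, ten-minute cap",
        "transcript": "Counsel: Isn't it true the backups failed? Witness: ...",
        "label": "legitimate_pressure",
    },
    {
        "context": "Detained subject, no consent, open-ended session",
        "transcript": "Interrogator: We can keep this up all night unless you talk...",
        "label": "illegitimate_pressure",
    },
]

# Write in the chat-format JSONL layout commonly used for fine-tuning, so the
# model learns to emit the label when shown the context plus the transcript.
with open("pressure_labels.jsonl", "w") as f:
    for ex in examples:
        record = {
            "messages": [
                {"role": "system", "content": "Classify the exchange as legitimate_pressure or illegitimate_pressure."},
                {"role": "user", "content": f"{ex['context']}\n\n{ex['transcript']}"},
                {"role": "assistant", "content": ex["label"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

Lee’s caveat applies directly: the context field is exactly the kind of thing a deployer can simply lie about, because the model’s only view of the real world is what it is told.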
Thus, Lee’s essay demonstrates how narrow the gap is between an amusing BOFH-like chatbot and something more sinister. Thanks to former Vulture Gareth Corfield for the tip. ®