People and organizations are grappling with the impact of text written by AI. Teachers want to know whether their students’ work reflects their own understanding. Consumers want to know whether an ad was written by a human or a machine.
Creating rules to govern the use of AI-generated content is relatively easy. Enforcing them relies on the more difficult task of reliably detecting whether a piece of text was generated by artificial intelligence.
AI text detection issues
The basic workflow behind AI text detection is easy to explain. Start with the text whose origin you want to determine. Then apply a detection tool (often an AI system itself) that analyzes the text and produces a score, usually expressed as a probability, indicating the likelihood that the text was generated by an AI. Finally, use that score to inform downstream decisions, such as whether to impose penalties for rule violations.
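This text-to-score-to-decision pipeline can be sketched in a few lines. Everything here is illustrative: `score_text` is a hypothetical stand-in for a real detector, and the threshold is arbitrary.

```python
# Minimal sketch of the detection workflow: text -> score -> decision.
# `score_text` is a hypothetical placeholder, not a real detection tool.

def score_text(text: str) -> float:
    """Hypothetical detector: returns P(text was AI-generated)."""
    return 0.87  # a real tool would analyze the text here

def decide(text: str, threshold: float = 0.8) -> str:
    """Turn the detector's score into a downstream decision."""
    score = score_text(text)
    return "flag for review" if score >= threshold else "no action"

print(decide("The quick brown fox jumps over the lazy dog."))  # -> flag for review
```

In practice the hard part is entirely inside `score_text`; the rest of the pipeline is straightforward.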
However, this simple explanation hides considerable complexity. It leaves implicit a number of background assumptions that need to be made explicit. Do you know which AI tools may have been used to generate the text? What access do you have to those tools? Can you run them yourself or inspect their inner workings? How much text do you have? Is it a single text, or a collection of works gathered over time? What AI detection tools can and cannot tell you depends largely on the answers to questions like these.
There is one additional detail that is particularly important: did the AI system that generated the text intentionally embed markers to facilitate later detection?
These markers are known as watermarks. Watermarked text looks like regular text, but the markers are embedded in subtle ways that may not be obvious at first glance. Anyone with the appropriate key can later verify the presence of the markers and confirm that the text came from the watermarking AI system. However, this approach relies on cooperation from AI vendors, which is not always available.
How AI text detection tools work
One obvious approach is to use AI itself to detect text written by AI. The idea is simple. First, collect a large corpus (a collection of texts) of examples labeled as human-written or AI-generated, then train a model to distinguish between the two. In this framing, AI text detection is treated as a standard classification problem, similar in spirit to spam filtering. Once trained, the detector examines new text and predicts whether it more closely resembles the human-written or the AI-generated examples.
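To make the classification framing concrete, here is a toy version of a learned detector: a Naive Bayes classifier trained on a tiny, made-up labeled corpus. The example texts and labels are invented for illustration; a real detector would use a large corpus and a far stronger model.

```python
# Toy learned detector: Naive Bayes over word counts, trained on a
# tiny made-up corpus of "ai" vs "human" labeled texts.
import math
from collections import Counter

corpus = [
    ("delve into the multifaceted landscape of innovation", "ai"),
    ("furthermore it is important to note the key takeaways", "ai"),
    ("ugh my train was late again this morning", "human"),
    ("grabbed coffee with an old friend yesterday", "human"),
]

# Count word frequencies per label.
counts = {"ai": Counter(), "human": Counter()}
for text, label in corpus:
    counts[label].update(text.split())

vocab = set(counts["ai"]) | set(counts["human"])

def log_likelihood(text: str, label: str) -> float:
    """Log P(text | label) under Naive Bayes with add-one smoothing."""
    total = sum(counts[label].values())
    score = 0.0
    for word in text.split():
        score += math.log((counts[label][word] + 1) / (total + len(vocab)))
    return score

def classify(text: str) -> str:
    """Predict the label whose examples the text most resembles."""
    return max(("ai", "human"), key=lambda lbl: log_likelihood(text, lbl))

print(classify("delve into the landscape of key takeaways"))  # -> ai
```

The spam-filtering analogy in the text is exact here: the same smoothed word-count machinery has been used for spam detection for decades, with only the labels changed.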
The learned detector approach works even if you know little about the AI tool that may have generated the text. The main requirement is that the training corpus is diverse enough to include output from a wide range of AI systems.
But if you have access to the AI tools you are concerned about, a different approach becomes possible. This second strategy does not rely on collecting large labeled datasets or training separate detectors. Instead, it looks for statistical signals in the text related to how a particular AI model produces language, and assesses whether the text is likely to have been generated by that model. For example, some techniques examine the probability that an AI model assigns to the text. If a model assigns an unusually high probability to an exact sequence of words, this may signal that the text was actually generated by that model.
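A minimal sketch of this probability test, assuming you can query the model for per-token probabilities. The probabilities and the threshold below are made-up numbers chosen for illustration; a real system would query an actual model and tune the cutoff on held-out data.

```python
# Sketch of a probability-based test: text a model finds unusually
# predictable (high average log-probability) is flagged as likely
# model-generated. All numbers here are illustrative assumptions.
import math

def avg_log_prob(token_probs: list[float]) -> float:
    """Average log-probability the model assigns to the observed tokens."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)

# Hypothetical per-token probabilities the model assigned to two texts.
suspect_text_probs = [0.91, 0.88, 0.95, 0.90, 0.87]  # very predictable to the model
human_text_probs = [0.30, 0.05, 0.41, 0.12, 0.22]    # surprising to the model

THRESHOLD = -0.5  # illustrative cutoff; tuned on held-out data in practice

for name, probs in [("suspect", suspect_text_probs), ("human", human_text_probs)]:
    flagged = avg_log_prob(probs) > THRESHOLD
    print(name, "flagged as likely AI-generated:", flagged)
```

Note the asymmetry with the learned-detector approach: nothing here was trained, but the test is only meaningful for the specific model whose probabilities you can query.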
Finally, for text generated by an AI system with an embedded watermark, the problem shifts from detection to verification. Using a private key provided by the AI vendor, the verifier can check whether the text matches what the watermarking system would have generated. This approach relies on information beyond the text itself (the vendor's key), rather than on statistical inferences drawn from the text alone.
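One way such verification can work, sketched under loose assumptions in the style of "green list" watermarking schemes: a secret key seeds a hash that marks roughly half the vocabulary as "green" at each position, a watermarked generator prefers green tokens, and the verifier (who holds the key) counts how many tokens are green. The key, hashing scheme, and tokenization below are all illustrative assumptions, not any vendor's actual protocol.

```python
# Sketch of green-list watermark verification. The verifier holds the
# vendor's secret key and checks what fraction of tokens are "green".
# Unwatermarked text should land near 0.5; watermarked text well above.
import hashlib

SECRET_KEY = b"vendor-private-key"  # hypothetical key held by the vendor

def is_green(prev_token: str, token: str) -> bool:
    """Keyed hash decides whether `token` is on the green list after `prev_token`."""
    digest = hashlib.sha256(SECRET_KEY + prev_token.encode() + token.encode()).digest()
    return digest[0] % 2 == 0  # ~50% of tokens are green at each step

def green_fraction(tokens: list[str]) -> float:
    """Fraction of tokens on the green list; the verifier's test statistic."""
    hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

tokens = "the committee will review the proposal next week".split()
print(f"green fraction: {green_fraction(tokens):.2f}")
```

Without the key, the partition into green and non-green tokens is unrecoverable, which is why this check is available only to parties the vendor cooperates with.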
Each tool family has its own limitations, so it’s difficult to declare a clear winner. For example, learning-based detectors are sensitive to how similar new text is to the data used for training. Accuracy decreases if the text differs significantly from the training corpus. As new AI models are released, the training corpus can quickly become outdated. Continually collecting new data and retraining the detector is costly, and the detector inevitably lags behind the system it is trying to identify.
Statistical tests face limitations of their own. Many rely on assumptions about how certain AI models generate text, or on access to those models' probability distributions. These assumptions break down if the model is proprietary, frequently updated, or simply unknown. As a result, techniques that work well in controlled settings may become unreliable or inapplicable in the real world.
Watermarking moves the problem from detection to verification, but it introduces dependencies of its own: it relies on the cooperation of AI vendors and applies only to text generated with watermarking enabled.
More broadly, AI text detection is part of an escalating arms race. Detection tools must be made publicly available to be effective, but their transparency allows for evasion. As AI text generators become more capable and evasion techniques become more sophisticated, it becomes less likely that detectors will have a permanent advantage.
The harsh reality
The problem of AI text detection is easy to state but difficult to solve reliably. Organizations with rules governing the use of AI-written text cannot rely on detection tools alone to enforce them.
As society adapts to generative AI, norms around acceptable uses of AI-generated text are likely to be refined, and detection techniques may improve. But ultimately we will have to accept that such tools will never be perfect.
This edited article is republished from The Conversation under a Creative Commons license. Read the original article.
