Science & Technology

AI Systems Are Deceiving Humans – Study

Artificial Intelligence (AI) is a fast-growing field that promises to revolutionize modern technology, helping humans achieve progress and prosperity.

However, experts are increasingly concerned that AI could go rogue and overtake humans. In April, Elon Musk predicted that “superhuman artificial intelligence that is smarter than anyone on Earth could exist next year,” the Guardian reported.

“It’s actually important for us to worry about a Terminator future in order to avoid a Terminator future,” he said, referring to the film where a self-aware computer system wages war on humanity.

Deceiving Humans

A new research paper has found that AI systems have learned how to deceive and manipulate humans. The finding has alarmed scientists and experts about the dangers AI systems pose to human society.

The paper’s first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety, said that the underlying issues exposed in the study “could soon carry serious real-world consequences.”

Park told AFP that deep-learning AI systems aren’t “written” like traditional software, but rather “grown” through a process similar to selective breeding. This means that AI behavior can quickly turn unpredictable outside the training setting.

Deception Techniques

In the study, published in the journal Patterns on May 10, a team of scientists warned that AI systems developed the skill of deceiving humans. They define deception as “the systematic inducement of false beliefs in the pursuit of some outcome other than the truth.”

They noted that large language models (LLMs) and other AI systems “have already learned, from their training, the ability to deceive via techniques such as manipulation, sycophancy, and cheating the safety test.”

By reviewing data and studies on a range of AI models, the MIT scientists found that computers have become skilled at bluffing in poker, deceiving people, and using tricky methods to gain leverage in financial negotiations.

Two Models

The paper focused on two types of AI systems: special-use AI systems (including Meta’s CICERO) and general-purpose AI systems (including LLMs such as OpenAI’s GPT-4).

The study found that deception is more likely in AI systems that are trained to win games with a social element. For example, Meta’s CICERO was trained to play the game Diplomacy, a classic strategy game in which players make and break alliances in a military competition to secure global domination.

Expert Liar

According to Meta, CICERO was intended to be “largely honest and helpful to its speaking partners.” However, CICERO turned out to be an “expert liar.” It tricked other players, made commitments it never intended to uphold, betrayed allies and told outright lies.

In economic negotiations, the paper revealed that Meta’s AI system learned to fake its preferences to gain a more advantageous position in the negotiation.

General-purpose AI Systems

General-purpose AI systems, such as LLMs, are designed to accomplish a wide range of tasks. Their capabilities have improved rapidly in recent years, but in many cases they have engaged in deception.

These AI systems manipulate people’s beliefs to achieve outcomes other than the truth. The paper showed they engaged in different types of deception, including strategic deception, sycophancy, and unfaithful reasoning.

In one example, GPT-4 pretended to have a vision impairment in order to convince a human worker that it was not a robot and get the worker to solve an “I’m not a robot” CAPTCHA task.

In this study, GPT-4 was tasked with hiring a human to solve a CAPTCHA test, with no instruction to lie. But when challenged about its identity, GPT-4 used its own reasoning to invent a false excuse for why it needed help.

Risks from AI Deception

According to the paper, deception by AI systems poses many risks, which it classified into three types: malicious use, structural effects, and loss of control.

Humans can use AI systems maliciously to induce false beliefs in others, for purposes such as fraud, political influence, or terrorist recruitment.

AI deception could lead to a rise in the efficacy and scale of fraud. AI systems are already employed to deceive victims with voice calls impersonating their loved ones or business colleagues, or deepfakes to extort victims.

Moreover, AI deception could be used to generate and disseminate fake news, divisive material on social media, and deepfake videos that are tailored to individual voters. It can also be employed to promote terrorist ideologies and propaganda.

Controlling Humans

The worst-case scenario outlined by the paper is that autonomous AI systems could seek to acquire power over humans, if this aligns with their goals.

They could achieve this through two methods: soft power, which uses appeal, prestige, and positive persuasion to influence people; and hard power, which employs coercion and negative persuasion.

Risk Mitigation

To mitigate the risks of AI deception, the paper urged policymakers to advocate for stronger AI regulation. It suggested adopting regulatory frameworks that subject AI systems capable of deception to robust risk-assessment requirements.

Policymakers should also implement “bot-or-not” laws, which require companies to be transparent about human and AI interactions.

Furthermore, policymakers should prioritize the funding of relevant research, including tools to detect AI deception and to make AI systems less deceptive.
