Zürcher Nachrichten - AI systems are already deceiving us -- and that's a problem, experts warn

AI systems are already deceiving us -- and that's a problem, experts warn / Photo: OLIVIER MORIN - AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argues in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see a risk that AI could commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether an interaction is with a human or an AI, digital watermarks for AI-generated content, and techniques for detecting AI deception by checking a system's internal "thought processes" against its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

T.Furrer--NZN