Zürcher Nachrichten - AI systems are already deceiving us -- and that's a problem, experts warn

AI systems are already deceiving us -- and that's a problem, experts warn / Photo: OLIVIER MORIN - AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argue in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see a risk that AI could be used to commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether an interaction involves a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing a system's internal "thought processes" with its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

T.Furrer--NZN