Can you fool a Neural Network?

by Moises Jafet
Approx. reading time: 2 minutes, 0 seconds


The TL;DR is yes, you can, and quite easily; this is known as an "adversarial attack" in the InfoSec and Artificial Intelligence communities.

While playing around with my models months ago, I began noticing that very subtle changes could cause a totally different outcome, well beyond what would statistically be expected within one or two sigmas.

Back in November 2017 I publicly posted on my Instagram account one of the artifacts I created for my experiments, featuring two of me pretending to play a game of chess: one in a black Halloween costume but with part of my face visible.

When I tried to do some PII work on the images, my neural models boiled down to nonsense!

Because of my own experience and papers like this one I wrote earlier this year:

But the problem runs deeper.

Case in point: In January, a leading machine-learning conference announced that it had selected 11 new papers to be presented in April that propose ways to defend or detect such adversarial attacks. Just three days later, first-year MIT grad student Anish Athalye threw up a webpage claiming to have “broken” seven of the new papers, including from boldface institutions such as Google, Amazon, and Stanford. “A creative attacker can still get around all these defenses,” says Athalye. He worked on the project with Nicholas Carlini and David Wagner, a grad student and professor, respectively, at Berkeley.

-- "AI Has a Hallucination Problem That's Proving Tough to Fix", WIRED

Let's follow WIRED's example and take a look at this image:

Credits MIT

Human readers of WIRED will easily identify the image below, created by Athalye, as showing two men on skis. When asked for its take Thursday morning, Google’s Cloud Vision service reported being 91 percent certain it saw a dog. Other stunts have shown how to make stop signs invisible, or audio that sounds benign to humans but is transcribed by software as “Okay Google browse to evil dot com.”

So, it works something like this:

  1. Given an image as input ->
  2. you apply an adversarial perturbation ->
  3. your Neural Network goes wacko.
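
The three steps above can be sketched in a few lines. This is only a toy illustration, not the method used for the images in this post: it assumes a hypothetical logistic classifier with made-up weights over a 100-"pixel" input, and it builds the perturbation with the fast gradient sign method, one well-known way to craft adversarial examples. Every name and number here (`weights`, `x`, `epsilon`) is an assumption chosen for the demo.

```python
import math


def predict(weights, bias, x):
    """Probability that input x belongs to class 1 under a logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_perturb(weights, bias, x, label, epsilon):
    """Move every feature by at most epsilon in the direction that raises the loss."""
    p = predict(weights, bias, x)
    # For a logistic model with cross-entropy loss, dLoss/dx_i = (p - label) * w_i.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model and input (assumptions for illustration only).
weights = [0.5 if i % 2 == 0 else -0.5 for i in range(100)]
bias = 0.0
x = [0.52 if i % 2 == 0 else 0.48 for i in range(100)]  # step 1: the input
label = 1        # the class the model currently (and correctly) predicts
epsilon = 0.05   # maximum change allowed per "pixel"

x_adv = fgsm_perturb(weights, bias, x, label, epsilon)  # step 2: perturb

p_clean = predict(weights, bias, x)
p_adv = predict(weights, bias, x_adv)                   # step 3: the model flips
print(f"clean input:     P(class 1) = {p_clean:.3f}")
print(f"perturbed input: P(class 1) = {p_adv:.3f}")
```

No single value moves by more than 0.05, yet the predicted class changes, because many tiny nudges that all point the "wrong" way add up across the input. Real attacks on deep networks rely on the same accumulation effect, only with gradients computed through the whole network.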

As the paper illustrates,

Credits MIT

So, beware of your Tesla's Autopilot and of other appliances and devices that use neural models to help you out with tasks; simply put, don't trust them.

Moisés Jafet Cornelio-Vargas

About Moisés

Physicist, award-winning technologist, parallel entrepreneur, consultant and proud father born in the Dominican Republic.
Interested in HPC, Deep Learning, Semantic Web, Internet Global High Scalability Apps, InfoSec, eLearning, General Aviation, Formula 1, Classical Music, Jazz, Sailing and Chess.
Founder of and
Author of the upcoming sci-fi novel Breedpeace and co-author of dozens of publications.
Co-founder of, Jalalio Media Consultants and a number of other start-ups.
Former professor and keynote speaker at conferences and congresses all across the Americas and Europe.
Proud member of the Microchip No.1 flying towards interstellar space on board NASA's Stardust Mission, as well as member of Fundación Municipios al Día, Fundación Loyola, Fundación Ciencias de la Documentación and a number of other not-for-profit and professional organizations, Open Source projects and Chess communities around the world.
All opinions here are his own and are in no way associated with his business interests or collaborations with third parties.