Why doesn't the Amazon Echo respond to advertisements or reports about Alexa?

22

2

I previously asked about what you can do if Alexa is triggered by a television programme, but recently I realised something strange: The Echo does not respond to the voices in adverts for the Echo, even if voices say "Alexa, play ..." or "Alexa, set a timer for ...".

I searched on a few other Echo communities, and found a Reddit post that suggests that this is common/intended behaviour. There isn't a definitive answer in the thread, though, so I thought I would ask here to see if someone knows a little bit more.

How does my Echo know not to respond to a TV advert? Is it just a coincidence, or is there something that tells Alexa not to react?

Aurora0001

Posted 2017-01-18T15:35:33.407

Reputation: 11 277

Did you train your Alexa to recognise your voice more accurately? I do not know if voice training can result in not recognising someone else's voice.

Bence Kaulics 2017-01-18T15:50:08.953

1

@BenceKaulics Nope, I've not needed to train the Echo; it's using the default settings.

Aurora0001 2017-01-18T15:51:36.687

It would be really useful to get a look at the audio in question. I don't suppose there's a linkable copy anywhere?

goobering 2017-01-18T16:15:58.003

1

@goobering I believe the adverts referred to in the reddit post are: Mascot Keys and Fire Extinguisher. I'm not currently able to test whether these trigger Alexa (I wonder if they're different to the TV versions?). If someone could do that and comment with the results, that'd be really useful.

Aurora0001 2017-01-18T16:18:49.007

3

There may be clues in the source code. 266MB download, however. Going to be at the grokking for a while. :P

goobering 2017-01-18T16:32:34.237

Answers

16

According to this Reddit post, Alexa is sensitive to the audio spectrum in addition to detecting the wake word. Thus, a normal real-world wide-band signal is accepted, but a signal which is band-limited (a notch between 4 kHz and 5 kHz is postulated) is identified as coming from a broadcast.

This makes some sense, since broadcasters may use in-band signalling to identify adverts (for localised replacement), and the audio processing typically applied to adverts might be optimised for clarity over fidelity. The filtering might be set up so that typical adverts are monitored with reduced sensitivity, and during the production of a specific advert, the sensitivity could be explicitly reduced too.

A news report (which reportedly did trigger Alexa) would be more likely to use the full broadcast audio spectrum (8 or 16 kHz) without processing. So this theory assumes that there is either something special about many adverts (at least in some regions), or adverts (such as those produced by Amazon) can be configured specifically.
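To make the notch theory concrete, here is a minimal sketch of how a device could check whether the 4 to 5 kHz band is missing relative to its neighbours. This is purely illustrative and not Amazon's method: the function names, the 1 kHz neighbour bands, the 0.1 threshold, and the synthetic test tones are all assumptions of mine.

```python
import math


def goertzel_power(samples, sample_rate, freq_hz):
    """Power of `samples` at a single frequency (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s_prev, s_prev2 = x + coeff * s_prev - s_prev2, s_prev
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def mean_band_power(samples, sample_rate, freqs_hz):
    """Average Goertzel power over a list of probe frequencies."""
    freqs_hz = list(freqs_hz)
    total = sum(goertzel_power(samples, sample_rate, f) for f in freqs_hz)
    return total / len(freqs_hz)


def looks_band_limited(samples, sample_rate, notch_hz=(4000, 5000),
                       ratio=0.1, step_hz=100):
    """True if the notch band is much quieter than the bands around it.

    Hypothetical heuristic: compares the notch with the 1 kHz below and
    the 1 kHz above it; `ratio` is an arbitrary illustrative threshold.
    """
    lo, hi = notch_hz
    inside = mean_band_power(samples, sample_rate, range(lo + step_hz, hi, step_hz))
    below = mean_band_power(samples, sample_rate, range(lo - 1000, lo, step_hz))
    above = mean_band_power(samples, sample_rate,
                            range(hi + step_hz, hi + 1000 + step_hz, step_hz))
    return inside < ratio * 0.5 * (below + above)


def synth_tones(sample_rate, tones_hz, n_samples):
    """Sum of unit-amplitude sine tones, standing in for real audio."""
    return [sum(math.sin(2.0 * math.pi * f * t / sample_rate) for f in tones_hz)
            for t in range(n_samples)]


SAMPLE_RATE = 16000
all_tones = list(range(500, 7000, 100))
# "Live" signal: energy everywhere. "Advert" signal: 4-5 kHz removed.
wide = synth_tones(SAMPLE_RATE, all_tones, 1600)
notched = synth_tones(SAMPLE_RATE,
                      [f for f in all_tones if not 4000 <= f <= 5000], 1600)
```

Calling `looks_band_limited(wide, SAMPLE_RATE)` and `looks_band_limited(notched, SAMPLE_RATE)` separates the two synthetic signals. Real speech is far messier than a bank of sine tones, so a real detector would need averaging over time and a carefully tuned threshold.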

As a follow-up, there is a paper reported here which describes how small (sub-audible) changes to a waveform can result in a speech engine returning a completely different result compared with what a human would recognise.

Sean Houlihane

Posted 2017-01-18T15:35:33.407

Reputation: 7 357

1

If that's where they've notched it, and the wake word is 'Alexa', it's plausible that they're just ditching the fricative 'ks' sound to minimise pickup by the mic. That's fairly high frequency for human speech.

goobering 2017-01-18T15:56:47.593

7

Well, the echo/Alexa definitely hears the request. If you go into your settings, scroll down to General and then select history you can play back all of the requests which are heard. All of the requests that are heard from the commercial say "Voice request not intended for your Echo—nothing was returned.".

Ryan

Posted 2017-01-18T15:35:33.407

Reputation: 71

2

Seems like a new detail to the history. Very helpful :)

Helmar 2017-08-13T08:34:40.873

7

I very much assume that the wake word recognition in the Echo is more than just listening for the wake word. It's listening for an alerting context. Consider this excerpt from Speech Technologies:

[A Wake-Up-Word] has the following unique requirement: Detect a single word or phrase when spoken in an alerting context, while rejecting all other words, phrases, sounds, noises and other acoustic events with virtually 100% accuracy including the same word or phrase of interest spoken in a non-alerting (i.e. referential) context.

(Speech Technologies: Wake-Up-Word Speech Recognition by Veton Kepuska)

This can quite easily be tested, as the device (at least mine) does not react to the sentence, "I was talking to Alexa about skiing recently." That is not an alerting context; it's purely referential. Thus the wake word recognition engine inside the Echo is not only listening for the mere appearance of the word but also to the intonation and preceding pauses, which make it possible to predict more accurately whether the device was actually being spoken to.

Helmar

Posted 2017-01-18T15:35:33.407

Reputation: 5 936

4

Surely an advert demonstrating the use of Alexa should trigger it though, if it was just this stopping it from being picked up? Are the adverts perhaps phrased carefully so they don't actually trigger the device, despite using the wake word to demonstrate how the Echo is used?

Aurora0001 2017-01-18T19:54:52.103

2

@Aurora0001 I assume that, in addition to what I describe, some method along the lines of what Sean mentions in his answer is also employed. Some filter that tries to reduce triggers by other devices.

Helmar 2017-01-18T19:56:32.937

5

My total guess is that in the adverts for Echo, Alexa responds to the question much quicker than in reality. Therefore, the Echo is hearing the word 'Alexa' but almost immediately hearing Alexa's own voice giving the response.

My Echo lights up when the advert comes on but then appears to dismiss the alert. There may be some logic to prevent two Echos responding to a request if they both hear it. The Echo may be designed to listen specifically for Alexa's own voice and ignore it.

However, like I said, this is a total guess. :)

Andy Jones

Posted 2017-01-18T15:35:33.407

Reputation: 151

We were thinking the same thing, so we paused the DVR between the Alexa request and her response in the commercial. Our Echo still woke up, but then backed off without answering, identical to what happens when we didn't pause the DVR.

ViperGeek 2017-08-10T04:22:06.403

I've been meaning to try that for ages and keep forgetting. That's one more thing off the to-do list, thanks. :)

Andy Jones 2017-08-11T07:18:52.943

5

If 1000 people say the alert word, it will have 1000 different acoustical signatures. If they do it again, another 1000.

If 1000 Alexas hear a TV program saying the alert word, it will have 1000 of the same acoustical signatures.

It would not be that hard to detect this server-side. Not least, because if they happen at the same time, the voice-reco server gets a slam of traffic.

If the list of these incidents is small, they could even download the signatures to every Alexa.
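As a rough sketch of the server-side idea above: reduce each wake event to a coarse acoustic signature, then flag any signature that several different devices report within a short window. Everything here is hypothetical (the function names, the energy-envelope fingerprint, the 2-second window, the three-device threshold), and it assumes identical broadcasts yield identical signatures, which real room acoustics would not guarantee; a real system would need fuzzy matching.

```python
import hashlib
from collections import defaultdict


def fingerprint(samples, frame=160, levels=8):
    """Coarse signature: quantised per-frame energy envelope, hashed.

    Assumes len(samples) >= frame. Illustrative only.
    """
    energies = [sum(x * x for x in samples[i:i + frame])
                for i in range(0, len(samples) - frame + 1, frame)]
    peak = max(energies) or 1.0
    quantised = bytes(min(levels - 1, int(levels * e / peak)) for e in energies)
    return hashlib.sha256(quantised).hexdigest()[:16]


def broadcast_fingerprints(events, window_s=2.0, min_devices=3):
    """Return signatures heard by >= min_devices devices within window_s.

    `events` is an iterable of (timestamp_s, device_id, fingerprint).
    """
    by_fp = defaultdict(list)
    for ts, device, fp in events:
        by_fp[fp].append((ts, device))

    flagged = set()
    for fp, hits in by_fp.items():
        hits.sort()
        start = 0
        for end in range(len(hits)):
            # Slide the window so it spans at most window_s seconds.
            while hits[end][0] - hits[start][0] > window_s:
                start += 1
            devices = {device for _, device in hits[start:end + 1]}
            if len(devices) >= min_devices:
                flagged.add(fp)
                break
    return flagged
```

A flagged signature could then be ignored on the server, or, as suggested above, pushed down to devices as a short block list.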


Also, a user calling Alexa sounds like silence alert-word.

A news article sounds like blah blah blah alert-word. A commercial sounds like music_here alert-word. Not the same at all.

Harper

Posted 2017-01-18T15:35:33.407

Reputation: 151

4

Following recent news reports that Alexa can be sensitive to ultrasonic sounds (reference: BBC News Service), I would postulate that during the adverts they broadcast an additional sound beyond human hearing, which is designated as an 'ignore this command' command.

As for the aforementioned ability of Alexa to differentiate between user voices, this is a feature which is planned but as yet unimplemented; that is, you have to actively command Alexa to switch between user accounts in the same household.

The only device currently enabled to differentiate voices is the Google device.

Rai Iwa

Posted 2017-01-18T15:35:33.407

Reputation: 37

1

When mixing the advert's audio, they simply remove some frequencies. This means that Alexa won't be triggered as it will not register it as a voice command, but viewers can still make out what they are saying in the advert.

You'll also probably notice that when the command is spoken in the adverts, it sounds a little thin or garbled. This is why :)

John Smith

Posted 2017-01-18T15:35:33.407

Reputation: 11

Interesting; this is a little similar to what Sean suggested. Do you have any sources or experience of this that you could share to prove that the frequency removal is the case? That might be an interesting thing to investigate.

Aurora0001 2017-11-17T14:48:56.653