A challenge for computer speech recognition? — Parallax Forums

Phil Pilgrim (PhiPi) Posts: 23,514
edited 2018-07-11 16:06 in General Discussion
What do you hear when you play this video?



"Laurel" or "yanni?" How can two people hear two such different words from the same recording?

-Phil

Comments

  • erco Posts: 20,244
    I get "yammy" just hearing that.
  • W9GFO Posts: 4,010
    You should be able to hear either one if you try. I think the recording is both words right on top of each other. Your brain has to pick one to make sense of what it is receiving, so you perceive one or the other. A few people hear both at the same time.
  • erco Posts: 20,244
    edited 2018-07-12 04:34
    I just listened to it again and I swear they changed it. ~ 4 hours ago it sounded exactly like yammy, nothing like laurel. Made me wonder what the gag was. And now I hear laurel very clearly, nothing close to yammy at all.
  • I hear Laurel and think the Yanni's are nuts.

    Here's a simulator.

    https://www.nytimes.com/interactive/2018/05/16/upshot/audio-clip-yanny-laurel-debate.html

  • I was convinced for a time that they were messing with phasing between the two stereo channels. When I sit in front of my MacBook, I hear "laurel." When I move off to the side, I hear "yanni." But then, I slowly moved from the side to the front and still heard "yanni."

    I'm not convinced by the frequency argument yet. There has to be something more subtle going on than that.

    -Phil
  • I think it has to do with the nature of the digital synthesizer used to render the speech. There is a coarse (digital) component and a somewhat smoother (still digital, but better) component. I think the original word to be synthesized is "Laurel," which would be the smoother component; the coarse component reveals itself as "Yanni" as the synthesizer struggles to render the sounds together. It's a bad example anyway... I hear "Yummy." What is Yanni anyway? Just digital artifacts, if you ask me.
  • Shawn Lowe Posts: 635
    edited 2018-07-12 16:18
    This is so weird. I distinctly hear Laurel. Played it for my wife, and she hears yammi. Weird.
  • As Dubya would have said. It must be sublimaniminals.
  • Heater. Posts: 21,230
    I think a lot of this is down to what you expect to hear. What you are used to and what you are not.

    For me as an old Brit "Laurel" is familiar, "yanni" is not even a word. Why would my brain match a sound to something it knows nothing about?



  • Here is what wiki says. It depends on what frequencies you hear best.
  • Yes, the frequency. Your hearing and the sound system in your computer/TV play a huge part in what you can hear. There are a ton of misheard lyrics ("bathroom on the right," "wrapped up like a douche," etc.) that are a testament to this.

    It has nothing to do with some mystical brain magic, except that the brain can interpret only the sounds that come through the cochlea. Impair the quality of that sound and your brain will make a closest association, even if it's a nonsensical word.
  • GordonMcComb Posts: 3,366
    edited 2018-07-12 23:54
    Heater. wrote: »
    For me as an old Brit "Laurel" is familiar, "yanni" is not even a word.

    It may not be a "word" but it is the name of a famous musician who's been playing for nearly four decades. Odds are you've heard the word/name at one point or another in your life, though that has nothing to do with why you or anyone would hear "yanni." Our brains will approximate; "Earn more sessions by sleeving / Ten more seconds and I'm leaving." Steve Martin knows, just ask him.

  • When you brought up the point about frequencies, I decided to try something. I listened through the speaker on my tablet, and then I plugged a set of earbuds in. The difference was surprising...when I use the speakers on the tablet I hear "yanni", and when I use the earbuds I hear "laurel".
  • GordonMcComb Posts: 3,366
    edited 2018-07-13 17:17
    For many years I worked for several Hollywood film labs, including Technicolor, developing software used in the transcription of film and TV dialogue. These to-the-frame transcripts are done to create lists for subtitling, foreign language translations, captioning, and other uses. Film sound is often highly augmented and compressed, with effects and music added, making interpreting a character's spoken lines a real art. I was never very good at it, but fortunately I only had to write the software.

    There were a lot of times when a transcriber got the words totally wrong, and the studios always made an unholy fuss about it, even though they were fully aware of the challenges. It can be a real problem in translations because if they've already started dubbing they have to pay for a retranslation, and maybe get the actor to come back in and voice the new dialogue. It's not unusual for voice actors to be paid a minimum four-hour day, even if they work for 10 minutes. Mistakes add up.

    After a couple of instances of this with a particular studio that liked to be a PITA for price leverage, we always demanded the isolated dialogue track only. Arnold Schwarzenegger is hard enough to understand without loud music and explosions. In a particular project involving robots from the future, when hearing the actual dialogue stems without filtering, augmentation, or effects, a number of lines came out very differently.
  • Arnold Schwarzenegger is hard enough to understand without loud music and explosions. In a particular project involving robots from the future, when hearing the actual dialogue stems without filtering, augmentation, or effects, a number of lines came out very differently.
    Hasta la vista, maybe!

    -Phil
  • I just hear 'jello'

    Mike
  • Heater. Posts: 21,230
    GordonMcComb,
    It may not be a "word" but it is the name of a famous musician who's been playing for nearly four decades.
    Ah, you mean Yanni as in Yiannis Chryssomallis. Interesting, never heard of him, thanks for the heads up. Although that's not really my style.

    I'm more likely to think "jänis" the Finnish word for "hare".







  • Alexa, how do you wreck-a-nice beach?
  • erco Posts: 20,244
    The BIGGER question here in the Matrix is choice: What do you WANT to hear? Yanni or Kenny G?



  • Heater. Posts: 21,230
    edited 2018-07-18 17:17
    Neither. Same to me. All sounds like that stuff you hear while waiting for a call to a customer support center to be answered.

    I want Janis:

  • GordonMcComb Posts: 3,366
    edited 2018-07-18 17:07
    I never listened much to Yanni. I throw rocks at the stereo system when there's a Kenny G song on. I cannot stand soprano sax. I'm a Stan Getz (mostly tenor sax) kind of guy.

    Now, was that Girl from Ipanema or Grill for Hypotenuse?
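
The frequency argument raised earlier in the thread (tablet speaker versus earbuds) can be illustrated with a toy sketch. This is not an analysis of the actual clip: the two tones, the sample rate, and the cutoff are all made-up illustrative values, and the first-order filters are only a crude stand-in for how a bass-rich or bass-poor playback device might shape a signal. The sketch mixes one low and one high tone at equal amplitude, then shows that which band dominates depends on the filtering applied.

```python
import math

SAMPLE_RATE = 8000   # Hz, assumed for this toy signal
N_SAMPLES = 800      # 0.1 s, a whole number of cycles for both tones
LOW_HZ = 300         # illustrative "low band" component (assumption)
HIGH_HZ = 3000       # illustrative "high band" component (assumption)
CUTOFF_HZ = 1000     # illustrative device cutoff (assumption)


def tone_mix():
    """Equal-amplitude mix of one low and one high tone."""
    return [
        math.sin(2 * math.pi * LOW_HZ * n / SAMPLE_RATE)
        + math.sin(2 * math.pi * HIGH_HZ * n / SAMPLE_RATE)
        for n in range(N_SAMPLES)
    ]


def low_pass(samples, cutoff_hz=CUTOFF_HZ):
    """First-order IIR low-pass (exponential smoothing)."""
    alpha = 1.0 - math.exp(-2 * math.pi * cutoff_hz / SAMPLE_RATE)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def high_pass(samples, cutoff_hz=CUTOFF_HZ):
    """Complementary high-pass: the input minus its low-passed version."""
    smoothed = low_pass(samples, cutoff_hz)
    return [x - s for x, s in zip(samples, smoothed)]


def power_at(samples, freq_hz):
    """Power at one frequency, found by correlating with cosine and sine."""
    w = 2 * math.pi * freq_hz / SAMPLE_RATE
    re = sum(x * math.cos(w * n) for n, x in enumerate(samples))
    im = sum(x * math.sin(w * n) for n, x in enumerate(samples))
    return (re * re + im * im) / len(samples) ** 2


def dominant_band(samples):
    """Report which of the two test tones carries more power."""
    if power_at(samples, LOW_HZ) > power_at(samples, HIGH_HZ):
        return "low"
    return "high"


if __name__ == "__main__":
    clip = tone_mix()
    print("bass-rich playback:", dominant_band(low_pass(clip)))
    print("bass-poor playback:", dominant_band(high_pass(clip)))
```

If the two words in the recording really do sit in different frequency bands, as the frequency explanation suggests, the same mechanism would let a speaker or an ear tip the balance toward one or the other. Mapping "low" and "high" onto particular words is left out on purpose, since the thread only reports what individual listeners heard.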


