Accents in latent spaces: How AI hears accent strength in English by ilyausorov

20 Comments

  • Post Author
    treetalker
    Posted May 6, 2025 at 2:55 pm

    This is cool and one of the applications of LLMs that I'm actually looking forward to: accent training when acquiring a new language, particularly hearing what you would sound like without an accent!

    That said, I found the recording of Victor's speech after practicing with the recording of his own unaccented voice to be far less intelligible than his original recording.

    Looking forward to seeing the developments in this particular application.

  • Post Author
    georgewsinger
    Posted May 6, 2025 at 3:02 pm

    This is so cool. Real-time accent feedback is something language learners have never had throughout all of human history, until now.

    Along similar lines, it would be useful to map a speaker's vowels in vowel-space (and likewise for consonants?) to compare native to non-native speakers.

    I can't wait until something like this is available for Japanese.

  • Post Author
    mckirk
    Posted May 6, 2025 at 3:06 pm

    Is it just me, or did the sound files get hugged-to-death?

  • Post Author
    pjc50
    Posted May 6, 2025 at 3:21 pm

    What the vector-space data gets right, and what the human commentary tends not to, is the idea that accents are a complex statistical distribution. You should be careful about the concept of a "default" or "neutral" accent. Telecommunications has spent the 20th century flattening accents together, as has accent discrimination. There's always the tendency for people to say "my accent is the neutral standard against which all others should be measured".

  • Post Author
    fxtentacle
    Posted May 6, 2025 at 3:24 pm

    What a great AI use-case! At first, I felt excited …

    But then I read their privacy policy. They want permission to save all of my audio interactions for all eternity. It's so sad that I will never try out their (admittedly super cool) AI tech.

  • Post Author
    joshjhargreaves
    Posted May 6, 2025 at 3:28 pm

    Damn, this is really cool.

  • Post Author
    vessenes
    Posted May 6, 2025 at 3:29 pm

    This is super cool.

    A suggestion and some surprise: I’m surprised by your assertion that there’s no clustering. I see the representation shows no clustering, and believe you that there is therefore no broad high-dimensional clustering. I also agree that the demo where Victor’s voice moves closer to Eliza’s sounds more native.

    But, how can it be that you can show directionality toward “native” without clustering? I would read this as a problem with my embedding, not a feature. Perhaps there are some smaller-dimensional sub-axes that do encode what sort of accent someone has?

    Suggestion for the BoldVoice team: if you’d like to go viral, I suggest you dig into American idiolects — two that are hard not to talk about / opine on / retweet are AAVE and Gay male speech (not sure if there’s a more formal name for this, it’s what Wikipedia uses).

    I’m in a mixed race family, and we spent a lot of time playing with ChatGPT’s AAVE abilities which have, I think sadly, been completely nerfed over the releases. Chat seems to have no sense of shame when it says speaking like one of my kids is harmful; I imagine the well intentioned OpenAI folks were sort of thinking the opposite when they cut it out. It seems to have a list of “okay” and “bad” idiolects baked in – for instance, it will give you a thick Irish accent, a Boston accent, a NY/Bronx accent, but no Asian/SE Asian accents.

    I like the idea of an idiolect-manager, something that could help me move my speech more or less toward a given idiolect. Similarly England is a rich minefield of idiolects, from scouse to highly posh.

    I’m guessing you guys are aimed at the call center market based on your demo, but there could be a lot more applications! Voice coaches in Hollywood (the good ones) charge hundreds of dollars per hour, so there’s a valuable if small market out there for much of this. Thanks for the demo and write-up. Very cool.

  • Post Author
    adhsu01
    Posted May 6, 2025 at 3:35 pm

    Super cool work, congrats BoldVoice team! I've always thought that one of the non-obvious applications of voice cloning/matching is the ability to show a language learner what they would sound like with a more native accent.

  • Post Author
    asveikau
    Posted May 6, 2025 at 3:44 pm

    Victor's problem isn't really the vowels or pacing. The final consonants are soft or not really audible. I am not hearing the /ŋ/ of "long" as the most marked example. It sounds closer to "law". In his "improved" recording he hasn't fixed this.

    I sometimes see content on social media encouraging people to sound more native or improve their accent. But IMO it's perfectly ok to have an accent, as long as the speech meets some baseline of intelligibility. (So Victor needs to work on "long" but not "days".) I've even come across people who are trying to mimic a native accent but lose intelligibility, where they'd sound better with their foreign accent. (An example I've seen is a native Spanish speaker trying to imitate the American accent's intervocalic T and D, and I don't understand them. A Spanish /t/ or /d/ would be different from most English language accents, but be way more understandable.)

  • Post Author
    wbroo
    Posted May 6, 2025 at 4:05 pm

    Very interesting! Have you tested for other factors like speaking speed, emotional tone, or microphone quality to see what else is (or isn’t) influencing model perception?

  • Post Author
    ccppurcell
    Posted May 6, 2025 at 4:08 pm

    Oh pssh. There's no such thing as accent strength. There's only accent distance. Accent strength is just an artefact of distance from the accent of a socially dominant group.

  • Post Author
    Goofy_Coyote
    Posted May 6, 2025 at 4:24 pm

    Glad to see BoldVoice here.

    I’ve been using it for a few months, and I can confirm it’s working.

  • Post Author
    sonny3690
    Posted May 6, 2025 at 4:50 pm

    This is some insanely cool work. It's going to help so many people.

  • Post Author
    childintime
    Posted May 6, 2025 at 4:50 pm

    I didn't find international English; that would have been interesting.

    Also, the USA writing convention falls short, like "who put the dot inside the string."

    Crazy. Rationals "put the dot after the string". No spelling corrector should change that.

  • Post Author
    Unearned5161
    Posted May 6, 2025 at 5:00 pm

    I'm always very entertained when I'm talking with someone and pick up on some very slight deviation from the "norm" in their accent. I think it shows two things: that it's nearly impossible to totally wipe that fingerprint of a past tongue, and that our ears are incredibly adept pieces of tooling.

  • Post Author
    SamBam
    Posted May 6, 2025 at 5:14 pm

    Like others recently, I've been extremely impressed by LLMs' ability to play GeoGuessr, or, more generally, to geo-locate random snapshots that you give them, with what seem (to me) to be almost no context clues. (I gave ChatGPT loads of holiday snapshots, screenshotted to remove metadata, and it did amazingly.)

    I assume that, with enough training, we could get similarly accurate guesses of a person's linguistic history from their voice data.

    Obviously it would be extremely tricky for lots of people. For instance, many people think I sound English or Irish. I grew up in France to American parents who both went to Oxford and spent 15 years in England. I wouldn't be surprised, though, if a well-trained model could do much better on my accent than "you sound kinda Irish."

  • Post Author
    dgan
    Posted May 6, 2025 at 5:37 pm

    Wow, I've always wanted an objective measure of my Russian accent in French. I've been living here for a long, long time, and some people tell me it's impossible to recognise where I come from. I'd like to put that to the test.

  • Post Author
    oezi
    Posted May 6, 2025 at 5:48 pm

    Did you publish that accent dataset somewhere?

  • Post Author
    ccheever
    Posted May 6, 2025 at 6:11 pm

    This is really cool.

    Just had an employee at our company start expensing BoldVoice. Being able to be understood more easily is a big deal for global remote employees.

    (Note – I am a small investor in BoldVoice)

  • Post Author
    runelohrhauge
    Posted May 6, 2025 at 6:33 pm

    This is fascinating work. Love seeing how you’re combining machine learning with practical coaching to support real accent improvement. The concept of an “accent fingerprint” is especially clever, and the visualization of progress in latent space really brings it to life. Excited to see where you take this next!

© 2025 HackTech.info. All Rights Reserved.
