Some thoughts about ASR

Victor Tan | July 30, 2023

What you’re reading is the result of dictation on an M2 Max MacBook Pro, which I upgraded to after finally realizing that I needed a new computer.

I don’t know how good it is going to be, but I expect it to be a little better than what we have on the iPhone and the other devices in the Apple ecosystem. So far, there seem to be some issues, because the system recognizes some words incorrectly. Part of that could be because I am pronouncing those words in a way that does not match what the algorithm is able to recognize. When that happens, it transcribes the wrong thing, because what it hears is, in effect, incomprehensible to it.

When it comes to automatic speech recognition, the algorithms that process our speech run into problems when the input is ambiguous and could match several different outputs. Automatic speech recognition is, after all, based on prediction: the system guesses the most likely words given the sound it receives and the input it would expect from a user over time. When a prediction is incorrect, the software may be penalized for the mismatch between the input and the prediction. The ultimate goal is for every prediction the model generates, as the person speaks, to match what the person intended to say, with as few corrections as possible.
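To make "as few corrections as possible" a little more concrete, here is a minimal sketch of word error rate, a common way of scoring transcriptions: count the word-level insertions, deletions, and substitutions needed to turn the transcription into what was intended, then divide by the number of intended words. The function name and the example sentences are my own illustration; I am not claiming this is how Apple scores its dictation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words (classic Levenshtein table, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # a reference word was dropped
                curr[j - 1] + 1,     # an extra word was inserted
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: what I meant to say versus what was transcribed.
intended = "recognize speech with this system"
transcribed = "wreck a nice beach with this system"
print(word_error_rate(intended, transcribed))  # 4 edits / 5 words = 0.8
```

A score of 0.0 means nothing needed correcting. The score can exceed 1.0 when the transcription contains many extra words.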

Let’s start with the voice’s pitch and ambient noise.

For an algorithm to produce the correct output, the sound it receives should resemble the waveforms in the training dataset that guided the development of the automatic speech recognition system. If the sound pattern deviates from that, whether because of ambient noise or tone of voice, the generated text can be inaccurate because the system received problematic input in the first place. This is to be expected: if you feed garbage in, you will receive garbage out. There is a sort of equivalent exchange at play.
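One rough way to put a number on "ambient noise" is the signal-to-noise ratio, which compares the power of the voice with the power of the background. The sketch below is only an illustration under my own assumptions: the 220 Hz tone standing in for a voice, the 16 kHz sample rate, and the noise levels are all made up, and nothing here reflects how Apple's system measures its input.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels (higher means a cleaner input)."""
    return 10 * math.log10(power(signal) / power(noise))


# Hypothetical input: one second of a 220 Hz tone sampled at 16 kHz,
# standing in for a voice. These values are illustrative assumptions.
sample_rate = 16_000
voice = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]

rng = random.Random(0)  # fixed seed so repeated runs give the same noise
for noise_level in (0.01, 0.1, 0.5):
    noise = [rng.gauss(0, noise_level) for _ in range(sample_rate)]
    print(f"noise level {noise_level}: SNR is about {snr_db(voice, noise):.1f} dB")
```

The louder the background relative to the voice, the lower the ratio, and the further the input drifts from the clean recordings a system was likely trained on.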

Let’s now talk about something different altogether: the device’s processing power. I’m not sure whether this is a factor for the automatic speech recognition system on Apple devices, but I suspect it is. Each device needs to perform complex calculations that let it make predictions at a relatively high rate of 100 to 130 words per minute. If you look at the computer’s memory usage while it does this, you may find that it does not use much memory; that is something I plan to test in the coming days.
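As a back-of-the-envelope check on that rate, the arithmetic below converts words per minute into the average time available per word. It assumes words arrive evenly spaced, which real speech does not do, so treat it as a rough budget and not a measurement; the function name is my own.

```python
def seconds_per_word(words_per_minute: float) -> float:
    """Average time available per word at a given speaking rate."""
    return 60.0 / words_per_minute


for wpm in (100, 130):
    print(f"{wpm} words per minute leaves about {seconds_per_word(wpm):.2f} seconds per word")
```

At 100 to 130 words per minute, that works out to roughly 0.6 down to 0.46 seconds per word on average for the device to keep pace with the speaker.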

The last possibility is that there is a problem with the algorithm itself in recognizing certain waveform patterns. There can be some variation depending on the quality of the input, but it’s also possible that the algorithm used to process the data simply makes mistakes on occasion. I’m confident that Apple is making strides to improve the output quality of its algorithms, and for that reason, I am optimistic about the improvements they can bring about.

I give this much thought because it is one of the most important aspects of any generative artificial intelligence system. These systems rely on good input into the algorithms, and speech recognition systems are extremely important as sources of input. As we interact with natural language on a daily basis, which is often faster than typing or pressing buttons on our devices, I believe that accurate dictation and procedural proofreading of our daily writing will lead us to a new era of AI.

I can’t wait to see what the future holds, particularly as we approach the end of 2023 with developments like iOS 17, AI generators, and all the different forms of technology that are becoming more prevalent. Time passes, and it is somewhat sad to think that we are coming closer to the end of our lives before these things come to fruition – still, the show is not over until it is over, and I can’t wait to see what is going to come!
