What you’re reading is the result of dictation on an M2 Max MacBook Pro, to which I upgraded after spending some time coming to terms with the fact that I needed a new computer.

I don’t know how good it is going to be, but I expect it to be a little better than what we have on the iPhone and the other devices in the Apple ecosystem. So far, there seem to be some issues, because the system recognizes some words incorrectly. Part of that could be because I am pronouncing those words in a way that does not match what the algorithm is able to recognize. Hence, it ends up transcribing the wrong things, because what it hears is, in effect, something incomprehensible.

When it comes to automatic speech recognition, the algorithms that process our speech run into problems when the input is ambiguous and could match several possible outputs. Automatic speech recognition is, after all, based on prediction: the system guesses the most likely words given the sounds it hears and the kind of input it has learned to expect from users over time. When a prediction is incorrect, the model is penalized, at least during training, because its output does not match what was actually said. The ultimate goal is a scenario where every prediction the model generates as the person speaks matches what the person intended to say, with the fewest possible corrections required.
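To make "the fewest possible corrections" concrete, here is a minimal sketch in Python that counts word-level corrections (substitutions, insertions, and deletions) between what was intended and what was transcribed, and divides by the length of the intended sentence. This is the common word error rate measure, not anything Apple has confirmed using; the function names and the sample sentences are my own illustration.

```python
def word_corrections(intended: str, transcribed: str) -> int:
    """Count the word-level edits needed to turn the transcript into the intended text."""
    ref = intended.lower().split()
    hyp = transcribed.lower().split()
    # prev[j] holds the edit distance between the words of ref seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # intended word missing from the transcript
                            curr[j - 1] + 1,      # extra word in the transcript
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(intended: str, transcribed: str) -> float:
    """Corrections divided by the number of intended words (illustrative helper)."""
    ref_len = len(intended.split())
    if ref_len == 0:
        raise ValueError("intended text must contain at least one word")
    return word_corrections(intended, transcribed) / ref_len


# Hypothetical example: "a new" was heard as the single word "anew".
print(word_error_rate("I needed a new computer", "I needed anew computer"))
```

A perfect transcript scores 0, and every correction the speaker has to make pushes the number up, which is why a lower score corresponds to the goal described above.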

Let’s start with the voice’s pitch and ambient noise.

For an algorithm to produce the correct output, the sound it receives should resemble the kinds of waveforms in the training dataset that guided the development of the automatic speech recognition system. If the sound pattern deviates, whether because of ambient noise or tone of voice, the generated text can be inaccurate, because the system is receiving problematic input in the first place. This is natural: if you feed garbage in, you will receive garbage out. There is a sort of equivalent exchange at play.
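One rough way to put a number on how much ambient noise is getting into the input is the signal-to-noise ratio, expressed in decibels. The sketch below is only an illustration in plain Python: the function names and the toy sample values are assumptions of mine, and it says nothing about how Apple's dictation actually measures or handles noise.

```python
import math


def mean_power(samples: list[float]) -> float:
    """Average of the squared sample values."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech: list[float], noise: list[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    noise_power = mean_power(noise)
    if noise_power == 0:
        return math.inf  # no noise at all
    return 10 * math.log10(mean_power(speech) / noise_power)


# Toy values (hypothetical): speech with amplitude 1.0, background noise with amplitude 0.1.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, 0.1, -0.1]
print(f"SNR: {snr_db(speech, noise):.1f} dB")
```

The higher the figure, the more the voice dominates the background; as the noise grows relative to the speech, the ratio falls and the input drifts further from the clean recordings a system is likely to have been trained on.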

Let’s now talk about something different altogether: the device’s processing power. I’m not sure if this is a factor with the automatic speech recognition system on Apple devices, but I suspect it is. Each device needs to perform complex calculations quickly enough to keep up with speech at a rate of roughly 100 to 130 words per minute. If you look at the computer’s memory usage while it performs this process, you may see that memory is not heavily consumed; that is something I plan to test in the coming days.
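To get a feel for what 100 to 130 words per minute means for the device, the small calculation below converts a speaking rate into the average time available per word. It is only back-of-the-envelope arithmetic; the function name is my own, and real systems work on short slices of audio rather than on whole words.

```python
def per_word_budget_ms(words_per_minute: float) -> float:
    """Average number of milliseconds available per spoken word at a given rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return 60_000 / words_per_minute


for rate in (100, 130):
    words_per_second = rate / 60
    print(f"{rate} wpm = {words_per_second:.2f} words/s, "
          f"{per_word_budget_ms(rate):.0f} ms per word")
```

At 100 words per minute that works out to 600 ms per word, and at 130 words per minute to a little over 460 ms, so the system has roughly half a second, on average, to settle on each word if it wants to keep pace with the speaker.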

The last possibility is that the algorithm itself has a problem recognizing certain waveform patterns. There can be some variation depending on the quality of the input, but it’s also possible that the algorithm used to process the data simply makes mistakes on occasion. I’m confident that Apple is making strides to improve the output quality of its algorithms, and for that reason, I am optimistic about the improvements they can bring about.

I give this so much thought because input quality is one of the most important aspects of any generative artificial intelligence system. These systems rely on good input, and speech recognition is an extremely important source of it. We use natural language every day, and speaking is often faster than typing or pressing buttons on our devices, so I believe that accurate dictation and procedural proofreading of our daily writing will lead us to a new era of AI.

I can’t wait to see what the future holds, particularly as we approach the end of 2023 with developments like iOS 17, AI generators, and all the different forms of technology that are becoming more prevalent. Time passes, and it is somewhat sad to think that we are coming closer to the end of our lives before these things come to fruition – still, the show is not over until it is over, and I can’t wait to see what is going to come!

