What you’re reading is the result of dictation on an M2 Max MacBook Pro, which I upgraded to after spending some time realizing that I simply needed a new computer.

I don’t know how good it is going to be, but I expect it to be a little better than what we have on the iPhone and other devices in the Apple ecosystem. So far, there seem to be some issues, because it recognizes some words incorrectly – partly, that could be because I am pronouncing those words in a way that doesn’t match what the algorithm has learned to recognize. Hence, it ends up transcribing the wrong things, because what it hears is, in effect, incomprehensible to it.

When it comes to automatic speech recognition, the algorithms that process our speech run into problems when the input is ambiguous and could match several possible outputs. Automatic speech recognition is, after all, based on making predictions about the input a user is likely to provide. When a prediction turns out to be incorrect, the software may be penalized because the input did not match the prediction. The ultimate goal is a scenario where every prediction the model generates, as the person speaks, matches what the person intended to say, with as few corrections as possible.
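One way to put a number on "the least possible corrections" is the word error rate: count the word-level insertions, deletions, and substitutions needed to turn the transcript into what was actually said, then divide by the length of what was said. Below is a minimal sketch of that idea. The function name and the example sentences are my own, chosen for illustration, and I am not claiming this is how Apple scores its dictation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    `reference` is what the speaker meant to say and `hypothesis` is what
    the dictation produced. Both names are illustrative.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # reference word was dropped
                curr[j - 1] + 1,                  # extra word was inserted
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    said = "I needed a new computer"
    heard = "I kneaded a new computer"
    print(f"WER: {word_error_rate(said, heard):.2f}")
```

A lower score means fewer corrections for me to make by hand. The score can exceed 1.0 when the transcript contains many extra words.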

Let’s start with the voice’s pitch and ambient noise.

For an algorithm to produce the correct output, the sound it receives should resemble the waveforms in the training dataset that guided the development of the automatic speech recognition system. If the sound pattern deviates from that, whether because of ambient noise or tone of voice, the generated text can be inaccurate, because the system is receiving problematic inputs in the first place. This is natural: if you feed garbage in, you will receive garbage out. There is a sort of equivalent exchange at play.
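To make the ambient noise point a little more concrete, here is a small sketch that mixes a clean tone with increasing amounts of random noise and reports the signal-to-noise ratio in decibels. Everything in it is an assumption for illustration: the 220 Hz tone standing in for a voice, the 16 kHz sample rate, and the noise levels. It says nothing about how any real dictation system handles noise; it only shows how quickly the "clean" part of the input gets drowned out.

```python
import math
import random


def power(samples):
    """Average power of a list of samples (mean of the squares)."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(power(signal) / power(noise))


def make_tone(freq_hz=220.0, sample_rate=16000, seconds=0.1):
    """A pure sine tone, used here as a stand-in for a voice (an assumption)."""
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
    tone = make_tone()
    for noise_level in (0.01, 0.1, 0.5):  # hypothetical noise amplitudes
        noise = [rng.gauss(0.0, noise_level) for _ in tone]
        print(f"noise level {noise_level}: SNR = {snr_db(tone, noise):.1f} dB")
```

The louder the background relative to the voice, the lower the ratio, and the less the system has to work with.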

Let’s now talk about something different altogether – the device’s processing power. I’m not sure if this is a factor with the automatic speech recognition system on Apple devices, but I suspect it is. Each device needs to perform complex calculations that allow it to make predictions fast enough to keep up with speech at a relatively high rate of 100 to 130 words per minute. If you look at the computer’s memory usage while it does this, you may find that it does not consume much memory – that is something I plan to test in the coming days.

The last possibility is that the algorithm itself has a problem recognizing certain waveform patterns. There can be some variation depending on the quality of the input, but it’s also possible that the algorithm processing the data simply makes mistakes on occasion. I’m confident that Apple is making strides to improve the output quality of its algorithms, and for that reason, I am optimistic about the improvements they can bring about.

I give this so much thought because it is one of the most important aspects of any generative artificial intelligence system. These systems rely on good input into the algorithms, and speech recognition systems are extremely important as sources of input. Since we use natural language every day, and speaking is often faster than typing or pressing buttons on our devices, I believe that accurate dictation and procedural proofreading of our daily writing will lead us to a new era of AI.

I can’t wait to see what the future holds, particularly as we approach the end of 2023 with developments like iOS 17, AI generators, and all the different forms of technology that are becoming more prevalent. Time passes, and it is somewhat sad to think that we are coming closer to the end of our lives before these things come to fruition – still, the show is not over until it is over, and I can’t wait to see what is going to come!
