What you’re reading is the result of dictation on an M2 Max MacBook Pro, which I upgraded to after finally accepting that I needed a new computer.

I don’t know how good it is going to be, but I expect it to be a little better than what we have on the iPhone and the other devices in the Apple ecosystem. So far, there seem to be some issues, because it recognizes some words incorrectly – partly, that could be because I am pronouncing those words in a way that doesn’t match what the algorithm has learned to recognize. It then transcribes the wrong thing, because what it hears is, from its point of view, incomprehensible.

When it comes to automatic speech recognition, the algorithms that process our speech run into trouble when the input is ambiguous and could match several possible outputs. Speech recognition is, after all, built on prediction: the system guesses the most likely words given the sound it receives. During training, a model is typically penalized whenever its prediction does not match the correct transcript. The ultimate goal is for every prediction the model generates, as the person speaks, to match what the person intended to say, with as few corrections as possible.
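To make "as few corrections as possible" concrete, here is a minimal sketch of word error rate (WER), a common way to score a transcript: count the word-level substitutions, insertions, and deletions needed to turn the transcript into what was actually said, then divide by the number of words spoken. The function name and the example sentences are my own illustration, and nothing here is meant to describe how Apple's dictation is evaluated.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    `reference` is what the speaker meant; `hypothesis` is what the
    dictation system produced. Both names are illustrative.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words (classic Levenshtein table,
    # kept one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    said = "the cat sat on the mat"
    heard = "the cat sat on a mat"
    print(word_error_rate(said, heard))
```

A score of 0 means the transcript needed no corrections. Because inserted words count as errors too, the score can go above 1 when the system writes far more words than were spoken.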

Let’s start with the voice’s pitch and ambient noise.

For an algorithm to produce the correct output, the sound it receives should resemble the kinds of waveforms found in the training data that guided the development of the speech recognition system. If the sound pattern deviates from that, whether because of ambient noise or tone of voice, the generated text can be inaccurate, because the system is receiving problematic input in the first place. This is natural: if you feed garbage in, you will get garbage out. There is a sort of equivalent exchange at play.
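One common way to put a number on ambient noise is the signal-to-noise ratio (SNR), measured in decibels: the louder the voice is relative to the background, the higher the figure. The sketch below is only an illustration under my own assumptions. The sample values are made up, the function names are hypothetical, and I am not claiming that any particular dictation system computes this.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square amplitude of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(signal: list[float], noise: list[float]) -> float:
    """Signal-to-noise ratio in decibels, from amplitude samples."""
    return 20 * math.log10(rms(signal) / rms(noise))


if __name__ == "__main__":
    # Made-up samples: a voice ten times louder than the background.
    voice = [1.0, -1.0, 1.0, -1.0]
    background = [0.1, -0.1, 0.1, -0.1]
    print(round(snr_db(voice, background), 2))
```

In this framing, a noisy room or a quiet voice both show up the same way: the ratio drops, and the input drifts further from the clean recordings a model was likely trained on.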

Let’s now talk about something different altogether – the device’s processing power. I’m not sure whether this is a factor in the speech recognition system on Apple devices, but I suspect it is. Each device needs to perform complex calculations that let it keep up with speech at a relatively high rate of 100 to 130 words per minute. If you look at the computer’s memory usage while it does this, you may find that it does not climb very high – that is something I plan to test in the coming days.

The last possibility is that the algorithm itself has trouble recognizing certain waveform patterns. Some variation will come from the quality of the input, but it’s also possible that the algorithm processing the data simply makes mistakes on occasion. I’m confident that Apple is making strides in improving the output quality of its algorithms, and for that reason, I am optimistic about the improvements they can bring about.

I give this so much thought because it is one of the most important aspects of any generative artificial intelligence system. These systems rely on good input, and speech recognition systems are extremely important as sources of that input. Since we use natural language every day, and speaking is often faster than typing or pressing buttons on our devices, I believe that accurate dictation and procedural proofreading of our daily writing will lead us to a new era of AI.

I can’t wait to see what the future holds, particularly as we approach the end of 2023 with developments like iOS 17, AI generators, and all the different forms of technology that are becoming more prevalent. Time passes, and it is somewhat sad to think that we are coming closer to the end of our lives before these things come to fruition – still, the show is not over until it is over, and I can’t wait to see what is going to come!
