USApps and More

Over the last week, I had the chance to drop by USApps, the event that Chen Chow began many years ago and that caters to the needs of Malaysian students all around my country who want to, for some reason or another, pursue higher education in the States.

I too benefited from this event many years ago, and it was a joy to come back for the second time in two years, an opportunity that I would not have had if not for Hamdi Hakimi, who randomly called me out of nowhere and asked me to speak at last year’s USApps event.

It was strange then, and it remains strange now, that things happened the way that they did, but suffice to say, I think it was all for the better.

I was very grateful to speak with many interesting people along the way, and I started a small new project in which I interview different people about education and their reasons for valuing it. It has brought me into contact with people from lots of different universities, all of uniformly high and boundary-breaking intelligence.

Beyond that though, it’s opened up an interesting new vista of… well, I don’t really think I should call this opportunity, but rather just interest in pursuing a course that I just find interesting and meaningful.

It is my pleasure to welcome you to watch the interviews that I have been doing in the hope that they will be interesting and enlightening to you. Have a look here.


– V

Whirlwind

My mind constantly catches itself entangled in various thoughts, leading to frequent distractions. My thoughts whirl around like a tornado, often appearing out of control, beyond my reach. However, in reality, many things are within grasp. Sometimes, I believe I just need to pause, lie down, and allow my brain to enter a catatonic state, much like Nao. 

As I lie on my bed, eyes closed and heart open to the world, I find myself pondering the future. I question why things happen the way they do and often find no answers. It would be misleading of me to claim that I understand every step of the journey that unfolds, but somehow, things have always worked out in the past.

It would be equally inaccurate to predict a smooth journey in the future or even the present, based on past success. Yet, I find myself in a better situation now than I was before. Despite the complexities of life, I see no reason to believe that the issues I face are unfixable. 

Striving forward seems necessary, even when it feels peculiar. As I rest my weary body, I realize that life is like a Newton’s cradle, a constant state of motion and rest. This cycle repeats until one day, everything comes to an end. The inevitable entropy of the universe expresses itself through the cessation of my bodily functions. This mystifying end remains beyond my comprehension, except through literature, art, and history, which paint a tantalizing image in my imagination.

As I gaze at the screen that has been my silent confidant, I am pleasantly surprised. The screen effortlessly transcribes the words I’ve been uttering for the past few minutes, revealing that technology is progressing at a speed I hadn’t anticipated. My vision of a time when we could converse with our devices seems to be materializing. These devices are beginning to power our lives in ways currently beyond our comprehension. 

I have no concept of what the future holds, nor any predictions. How could I possibly foresee what’s to come when things are moving as rapidly as my thoughts? The reality we’re transitioning into is something I could not have fathomed just a year or two ago. There are countless things to look forward to, endless unique possibilities, some of which I hope for, others I find unlikely or impractical. Yet, everything seems inevitable as we move forward, and the intricate pieces of a grand puzzle, too vast for our full appreciation, begin to fall into place.

Some thoughts about ASR

What you’re reading is the result of dictation on an M2 Max MacBook Pro, to which I upgraded after some time of pretty much just realizing that I needed a new computer.

I don’t know how good it is going to be, but I expect it to be a little better than what we have on the iPhone and other devices in the Apple ecosystem. So far, there seem to be some issues, because the system recognizes some words incorrectly – partly, that could be because I am pronouncing those words in a way that does not match what the algorithm is able to recognize. Hence, it ends up transcribing the wrong things, because what it hears is, in effect, something incomprehensible.

When it comes to automatic speech recognition, the algorithms that process our speech run into problems when the input is ambiguous and could match several possible outputs. Automatic speech recognition is, after all, based on predicting the likely input that would be expected from a user over time. In the event that a prediction made by the algorithm is incorrect, the software may be penalized because the input does not match the prediction. The ultimate goal is a scenario where every prediction generated by the model, as the person speaks, matches what the person intended to say, with as few corrections as possible.
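
To make the idea of "corrections required" a little more concrete, here is a minimal sketch in Python of one common way to count them: the word error rate, which is the number of word-level substitutions, insertions, and deletions needed to turn a transcript into the intended sentence, divided by the length of the intended sentence. This is only an illustration and says nothing about how Apple's dictation is built or evaluated; the function names and the example sentences are hypothetical.

```python
def word_edit_distance(intended, transcribed):
    """Count the word-level edits (substitute, insert, delete) needed to
    turn the intended sentence into the transcribed one."""
    ref = intended.lower().split()
    hyp = transcribed.lower().split()

    # prev[j] holds the distance between the words of ref processed so far
    # and the first j words of hyp (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # a word of ref is missing from hyp
                curr[j - 1] + 1,     # hyp contains an extra word
                prev[j - 1] + cost,  # words match, or one replaces the other
            ))
        prev = curr
    return prev[-1]


def word_error_rate(intended, transcribed):
    """Edits needed, divided by the number of intended words."""
    n_words = len(intended.split())
    if n_words == 0:
        raise ValueError("the intended sentence must contain at least one word")
    return word_edit_distance(intended, transcribed) / n_words


if __name__ == "__main__":
    # Hypothetical example of an ambiguous input that was mis-heard.
    said = "recognize speech with ease"
    heard = "wreck a nice beach with ease"
    print(word_edit_distance(said, heard), word_error_rate(said, heard))
```

A rate of 0 means the transcript needed no corrections at all. The rate can exceed 1 when the transcript contains many extra words.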

Let’s start with the voice’s pitch and ambient noise.

For an algorithm to receive the correct input, the sound it receives should match the specific waveforms used in the training dataset that guided the development of the automatic speech recognition system. If there is a deviation in the sound pattern, either due to ambient noise or tone of voice, the generated text can be inaccurate because the system receives problematic inputs in the first place. This is natural, because if you feed garbage in, you will naturally receive garbage out. There is a sort of equivalent exchange at play.
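
One rough way to put a number on how much ambient noise has crept into the input is the signal-to-noise ratio, in decibels. The sketch below is a minimal illustration in Python and assumes something a real dictation session would not have: a clean copy of the same recording to compare against. The function name `snr_db` and the sample values are hypothetical.

```python
import math


def snr_db(clean, noisy):
    """Signal-to-noise ratio in decibels, where the noise is taken to be
    the sample-by-sample difference between the noisy and clean recordings."""
    if len(clean) != len(noisy):
        raise ValueError("both recordings must have the same number of samples")

    signal_power = sum(c * c for c in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))

    if signal_power == 0:
        raise ValueError("the clean recording is silent")
    if noise_power == 0:
        return math.inf  # no noise at all
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Hypothetical samples: the same short waveform, lightly and heavily disturbed.
    clean = [1.0, -1.0, 1.0, -1.0]
    quiet_room = [1.1, -0.9, 1.1, -0.9]
    noisy_cafe = [2.0, 0.0, 2.0, 0.0]
    print(snr_db(clean, quiet_room), snr_db(clean, noisy_cafe))
```

The higher the figure, the closer the sound reaching the system is to the clean waveform. A figure near 0 dB means the noise is about as strong as the voice itself.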

Let’s now talk about something different altogether – the device’s processing power. I’m not sure if this is a factor with the automatic speech recognition system on Apple devices, but I suspect it is. Each device needs to perform complex calculations that allow it to make predictions at a relatively high rate of 100 to 130 words per minute. If you look at the computer’s memory usage as it performs this process, you may find that not much memory is consumed – that is something I plan to test in the coming days.

The last possibility is that there is a problem with the algorithm itself in recognizing certain patterns of waves. There can be some variation depending on the quality of the input, but it’s also possible that the algorithm used to process the data can make mistakes on occasion. I’m confident that Apple is making strides to improve the output quality of its algorithms, and for that reason, I am optimistic about the improvements they can bring about.

I give this much thought because it is one of the most important aspects of any generative artificial intelligence system. These systems rely on good input into the algorithms, and speech recognition systems are extremely important as sources of input. As we interact with natural language on a daily basis, which is often faster than typing or pressing buttons on our devices, I believe that accurate dictation and procedural proofreading of our daily writing will lead us to a new era of AI.

I can’t wait to see what the future holds, particularly as we approach the end of 2023 with developments like iOS 17, AI generators, and all the different forms of technology that are becoming more prevalent. Time passes, and it is somewhat sad to think that we are coming closer to the end of our lives before these things come to fruition – still, the show is not over until it is over, and I can’t wait to see what is going to come!