Automated Speech Recognition

Victor Tan, October 18, 2023

Once in a while, a technology comes along that just completely transforms the way that we think, we live, and we experience the entire world.

Certainly the entire world has been captivated by the rise of AI in recent days. How could it not be, when millions of influencers around the world endeavor every day to showcase the 500th AI tool that you ABSOLUTELY NEED TO USE?

Well, I don’t know much about technologies beyond ChatGPT, to be honest, but there is definitely one thing that has come out of it that I probably use more than pretty much anything else on the planet: automated speech recognition, specifically OpenAI’s Whisper ASR system as it appears in the ChatGPT app.

Automated speech recognition is how I’m communicating everything here to you. It is how I’m putting down my thoughts, word by word, by simply sitting down next to this open door on a rainy morning, narrating out the story as if I were talking to you.

The automated speech recognition system, which in this case is Whisper inside the ChatGPT app, is transcribing everything that I’m saying with almost perfect accuracy, apart from some small issues with punctuation that I will fix after the fact. It is incredible, tremendously accurate, and something that I could never have imagined just three to four months ago.

As a result of this technology, as you read, you’re actually listening, in a sense, to what I said on that morning when the air was cool and the rain was falling. It was 7.49am, and 48 seconds had passed on the clock.

As I narrated these words then, eventually, the clock turned to 7.50, indicating a shift in time.

I made a mental note to myself at that time that I would look at the total number of words that I had spoken during this time, because it bears a significant meaning, which I would like to elaborate upon.

Automated speech recognition is wonderful for me. It has done the following things:

Dramatically sped up my rate of interactions
Reduced the strain on my body
Given me extensive practice in public speaking and articulation

Let me go into all of these one after another.

Dramatically sped up my rate of interactions

Every form of communication has its idiosyncrasies, and can be considered a skill in its own right.

In terms of speed, handwriting is the slowest, clocking in at 12-20 words per minute.

Typing comes next at around 40-80 words per minute, depending on the typist, with some people going far above that, assuming they’ve had professional training.

And finally, speaking clocks in at around 130-150 words per minute.

The clear corollary of all this, I think, is that if a person adopts speaking as their dominant mode of communication, they will be able to get things done at a much faster rate than they would by texting or writing to others. In fact, this is the reason why communicating by phone or meeting in person can be so much more efficient than just sending out messages and waiting for email conversations to proceed.
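To put rough numbers on the comparison above, here is a minimal Python sketch that converts the quoted words-per-minute ranges into the time needed for a document of a given length. The speed ranges are the ones stated above. The 1,800-word length is only an illustrative assumption, and the function names are hypothetical. The arithmetic also ignores pauses, thinking time, and later editing, so treat the results as best-case estimates rather than measurements.

```python
# Words-per-minute ranges quoted in the text above: (slow end, fast end).
SPEEDS_WPM = {
    "handwriting": (12, 20),
    "typing": (40, 80),
    "speaking": (130, 150),
}


def minutes_needed(words, wpm_range):
    """Return (fastest, slowest) minutes to produce `words` at the given range."""
    slow_wpm, fast_wpm = wpm_range
    return words / fast_wpm, words / slow_wpm


def summarize(words):
    """Map each mode of communication to its (fastest, slowest) time in minutes."""
    return {mode: minutes_needed(words, rng) for mode, rng in SPEEDS_WPM.items()}


if __name__ == "__main__":
    # 1,800 words is an assumed, illustrative document length.
    for mode, (fastest, slowest) in summarize(1800).items():
        print(f"{mode:12s} {fastest:6.1f} to {slowest:6.1f} minutes")
```

Under these assumptions, 1,800 words takes about 90 to 150 minutes by hand, 22.5 to 45 minutes by typing, and roughly 12 to 14 minutes by speaking.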

As a user of automated speech recognition technology, I get to take advantage of the fact that I can speak quickly in order to create documents, which in turn helps me to very rapidly think of different things.

In a sense, I am constantly on my toes and crafting different ways of dealing with problems, for the simple reason that I can now deal with more of them within a smaller amount of time than I used to.

Rather than taking, say, 5-10 minutes to reply to a text message, as I did before, I can now simply speak out the contents of what I want to say to others, very simply and very easily, without really thinking too much about typing down all the words, which itself is a long exercise.

This has allowed me to take many more opportunities within shorter periods of time, and in turn to try faster and more frequently. This, for me, has been a game-changer in many different ways, and the consequences are something that I have yet to even fully understand, although they will need to be accompanied by developments in planning in the days to come.

Reduced the strain on my body

Texting is physically strenuous.

It might not initially seem so, but it absolutely is, because whether you’re typing on a computer keyboard or on a phone, you are actuating your fingers and joints to hit keys over and over again, moving them around in just the right way to create the desired pattern of output on the screen.

Having said that, these implications alone are far from the only problems that one could associate with texting for long periods of time. Here’s a helpful list created by ChatGPT.

Using automated speech recognition can reduce repetitive strain, and even text claw, by cutting down the repetitive movements associated with typing. This is something that I discovered when I began using these technologies after a long period in which I had been facing finger and wrist pain from texting too much and using devices too extensively.

This has been a game changer for me because I was in so much pain on some days that I found it difficult to type but found it necessary to continue typing anyway.

Being able to address this problem was truly incredible because it opened up possibilities of communicating without pain.

It’s also worthwhile to note that typing via automated speech recognition allows a person to communicate with better posture and under more relaxed circumstances. Even as we speak right now, I am casually narrating all of this while sitting down on my secret lab chair and leaning back with my feet on the gate in front of me, just communicating everything that I intend to say in a relatively free manner and dramatically faster than I would have been able to just a short while ago.

This helps to prevent a variety of problems associated with texting or writing. These include text neck, which occurs when a person’s neck is hunched over as they look at a device, and the postural problems associated with keeping one’s eyes fixed on a device in order to watch the words being produced in a document. At the moment I am simply holding my phone in my left hand, watching the transcription to see if it is coming out properly, and everything is coming out easily.

Even right now, for that matter, I am witnessing other benefits, such as reduced eye strain. My eyes are closed as I am narrating all of this, and it can seem as if I am speaking to myself, but that is not exactly the case. What is real, though, is that I am able to perform this entire task without looking at my phone screen even for a single moment, which in turn allows me to go right ahead and type out everything without fear or favour.

This benefit also offers significant advantages in accessibility to anyone out there who needs such access. It allows a person to communicate at an extremely quick rate even if they happen to have a disability that would otherwise impede them from performing this type of communication.

It also allows for multitasking: I can do different things and look around me, dividing my focus between different things rather than having my entire attention fixed on the screen and the process of creating a single document. This in turn leads to a lower cognitive load overall and to a very natural communicative style, which is manifested in the words that I am saying at this point in time.

Given me practice in public speaking and articulation

Using an ASR system is a truly unique experience.

It’s an experience that involves speaking to a device, which in turn involves thinking about what you’re going to say, thinking about how it’s going to come out, and arguably thinking more intuitively about what a listener on the other side might actually be hearing, feeling, or imagining.

It’s not a complete substitute for speaking to an actual audience, or to actual people, of course, but the very act of articulating things through speech itself gives a person significant practice in understanding how to develop their manner of speech, the cadences of their voice, the structure of their thoughts, and the rises and falls of emotion along the way.

This is tremendously good practice, I think, for situations in which a person might later need to communicate via speech, particularly as one can do it in a relatively relaxed manner, as I mentioned in the previous few points, while at the same time communicating at a much faster rate than they would if they were simply to type things out.

This type of practice, repeated constantly and experimented with over time, has dramatically improved my personal speaking skills. It has made me more articulate, not only because I have had to think about what to write about, and because I do so much more frequently now, but also because it constantly keeps me on my toes, forcing me in various ways to source things from my imagination and my thoughts in order to put them upon the page. This in turn reinforces a continual cycle of thought retrieval, building, structuring, and articulation that develops, or at least I feel has developed, my process of thought formation in many different ways. It will continue to be tremendously useful over the course of time for practice, for creation, and for preparing me to speak on progressively larger stages in the days to come.

Using an ASR system is a unique experience. It certainly is a brand new technology, but at the same time it is something that allows a person to engage with his or her human abilities on a level that I have never truly encountered before, and that stands as unique to me within the history of humanity.

Concluding thoughts

I spoke extensively about the ways in which using ChatGPT’s Whisper ASR system has dramatically sped up my rate of interactions, reduced the strain on my body, and given me practice in public speaking and articulation. And I cannot emphasize enough that this has been transformative, to say the very least.

The last time I checked the word count of the piece, prior to saying these words, it was already above 1,800, and I had started this project at 8am, which testifies to just how quick it is. I conducted the entire thing without straining my neck in any way, and in fact in an ergonomically comfortable position, either sitting reclined on a chair or standing up and casually carrying my phone around, which reduces the possibility of text neck and completely resolves the problem of text claw and repetitive strain injuries.

Along the way, after having experienced both of these incredible benefits, I received some very extensive practice in public speaking and articulation, which admittedly was directed towards this device, but at the same time was also directed towards everyone who was capable of hearing me within a certain range. This in and of itself has been truly incredible, and the process of writing this piece has been a wonderful practice session. If it is not clear to those out there who haven’t used this before, I truly consider this to be a transformative technology, and one that has catalyzed a sea change within my own personal life.

ChatGPT continues to hold the throne, of course, among technologies that have enabled seemingly reasoning AI systems, systems that are capable of creating outputs that shock us and that, even now, I am continually learning from. In fact, it even houses the technology that I am making use of at the moment, possibly training its systems on the type of communication that I have chosen to initiate.

Perhaps OpenAI’s engineers will keep track of this entire speech or conversation that I have initiated and that I have in turn released onto their servers, but that for me is not ultimately a matter of concern, because I do believe that if one’s thoughts are sound and otherwise valuable, they should be shared with the world anyway, as a matter of individual and collective responsibility. Whether these thoughts are worthwhile, desirable, and likely to have a beneficial impact upon the world is, of course, a matter of contention, but one that I believe is being continually refined through the development of these technologies themselves. Of course, a person should be self-aware about the extent to which they truly are able to contribute, and should not overstate or over-inflate the extent of their capabilities.

What I can say, however, is that it feels like a tremendous privilege to live in this day and age, and to be able to make use of something that has had such a profound impact on our ability to interact, to utilise our cognition, and to create in turn. It may seem like something trivial or otherwise small in the grand scheme of things, but this for me has been truly profound, and it is one of the many things that I cite and will continue to cite as my rationale for undertaking a journey of constant self-improvement as I move forward into the future.

Thank you for reading (or was it listening?) and I will see you in the next piece.
