What is Dysarthria?
Per the Mayo Clinic, dysarthria occurs when the muscles you use for speech are weak or you have difficulty controlling them. It often causes slurred or slow speech that can be difficult to understand.
Dysarthria is typically caused by an underlying condition, such as:
- Amyotrophic lateral sclerosis (ALS, or Lou Gehrig’s disease)
- Brain injury
- Brain tumor
- Cerebral palsy
- Guillain-Barré syndrome
- Head injury
- Huntington’s disease
- Lyme disease
- Multiple sclerosis
- Muscular dystrophy
- Myasthenia gravis
- Parkinson’s disease
- Stroke
- Wilson’s disease
This list is not meant to be exhaustive but to indicate the most common reasons why an individual develops dysarthria.
I am hoping to create a workflow and machine learning model that can listen to an individual’s voice and produce a model unique to them. That model would receive voice data and produce text output. Google has already done something very similar with former NFL player Tim Shaw. Tim Shaw announced he had ALS back in 2014, and Google recently worked with him to develop an application that listens to his voice and produces text for others to read. Along with this, they used old audio recordings to reproduce his original voice.
I hope to take that research one step further by translating the speech from one language to another before displaying the text.
![](http://objectpartners.com/wp-content/uploads/2020/09/pexels-pixabay-278888-1024x576.jpg)
Why is this important to me?
My grandmother is an immigrant from Quebec, Canada. She came to the United States with my grandfather many years ago. When I was very young, she suffered two strokes. In every interaction I can remember with my grandmother, it was a struggle to understand her. Phone calls have always been difficult; the combination of poor audio quality and my grandmother’s condition made it almost impossible to communicate. On top of all that, my grandmother’s English is not up to par. This isn’t her fault, since she is actually fluent in Canadian French, but because she lives in the United States, most of the people around her don’t speak it, which makes communicating with her even more difficult. That, combined with my own inability to speak Canadian French, led me to want to use machine learning to fix the problem.
Due to my grandmother’s age and lack of interaction with technology, recordings of her voice prior to her strokes are practically non-existent. That only limits my ability to recreate her original voice; I should still be able to create something that listens to her speech and produces translated text. If audio output is required, an alternative voice can read the text that is produced.
Technology Stack: GCP vs. AWS vs. Azure
I plan to create my own model using data that I collect. I unfortunately haven’t collected any data from prior conversations, but I plan to going forward. I hope to make the code open source on my GitHub so that others can collect their own data and build a similar model. I don’t plan to release my model or data, since the data is sensitive. I haven’t decided which cloud platform I will use. The platform will need to provide fast API inference at a reasonable price, and there shouldn’t be much data processing, if any, between training and inference. My initial thought is to use GCP, but as time goes on this may change. I will follow a traditional way of framing the ML problem, but with a slightly different weighting of the tasks.
There are five major areas where effort should be allocated:
- Defining KPIs – 5%
- Collecting Data – 50%
- Building Infrastructure – 25%
- Optimizing the ML Algorithm – 10%
- Integration – 10%
KPIs
My KPI is the accuracy of the translation, and I hope to achieve as accurate a translation as possible. This is a relatively straightforward KPI, so not much time will be spent here.
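That said, it helps to pick a concrete metric early. For the speech-to-text step, word error rate (WER) is the standard measure; below is a minimal sketch using the open-source jiwer package, with made-up reference and hypothesis strings.

```python
# A minimal sketch of measuring transcription accuracy with word error rate
# (WER), using the open-source jiwer package (pip install jiwer).
from jiwer import wer

reference = "hello how are you today"    # ground-truth transcript (made up)
hypothesis = "hello how are you to day"  # model output (made up)

# WER = (substitutions + deletions + insertions) / words in reference
error_rate = wer(reference, hypothesis)
print(f"WER: {error_rate:.2%}")
```

For the translation step, a metric like BLEU would be the analogous choice.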
Collection of Data
This is literally half of the work. Since I have no data, it is difficult to generate a model. I am searching for an existing model to potentially perform transfer learning on, but I haven’t been able to find one. The initial collection will come from conversations I have with my grandmother: the more conversations I have, the more data I can train the model on. Once I have a model deployed in production, I hope to have a separate pipeline that collects audio submitted to the platform and saves it to help train future models (a rough sketch follows below). As time goes on, the models should hopefully improve to the point where serious analysis must be performed to determine which model is better.
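As a sketch of what that collection pipeline could look like, the snippet below stores each submitted clip next to a human-corrected transcript so future training runs can use it. The directory layout and metadata fields here are my own assumptions, not a finished design.

```python
# A minimal sketch of an audio-collection step: each submitted clip is stored
# with its (human-corrected) transcript so it can feed future training runs.
# The directory layout and metadata schema are assumptions, not a final design.
import json
import shutil
import uuid
from datetime import datetime, timezone
from pathlib import Path

DATA_DIR = Path("collected_audio")  # hypothetical local staging directory

def save_sample(audio_path: str, transcript: str) -> Path:
    """Copy an audio clip into the dataset and record its transcript."""
    DATA_DIR.mkdir(exist_ok=True)
    sample_id = uuid.uuid4().hex
    dest = DATA_DIR / f"{sample_id}.wav"
    shutil.copy(audio_path, dest)
    metadata = {
        "id": sample_id,
        "audio_file": dest.name,
        "transcript": transcript,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(DATA_DIR / f"{sample_id}.json", "w") as f:
        json.dump(metadata, f, indent=2)
    return dest
```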
Building Infrastructure
This section will vary as time goes on. It mostly depends on which platform has the cheapest storage for the audio recordings, models, and various other data, and on which platform offers the fastest inference at the lowest cost. Another determining factor is the available APIs and how well they perform. Both GCP and AWS have excellent ML APIs, and I plan to test both sets of speech and NLP APIs against my specific needs.
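As an example of the kind of API test I have in mind, here is a minimal sketch using Google Cloud’s Speech-to-Text client library. It assumes credentials are already configured and that a 16 kHz LINEAR16 WAV file is on hand; the file path is a placeholder, and “fr-CA” is the language code for Canadian French.

```python
# A minimal sketch of testing Google Cloud Speech-to-Text on a single clip.
# Assumes `pip install google-cloud-speech` and GOOGLE_APPLICATION_CREDENTIALS
# are already set up; the file path is a placeholder.
from google.cloud import speech

client = speech.SpeechClient()

with open("grandmother_sample.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="fr-CA",  # Canadian French
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```

The AWS counterpart would be Amazon Transcribe, which offers a similar single-clip transcription flow.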
Optimizing ML Algorithm
This section is more of a future conversation: with no data, we can’t really optimize an ML algorithm yet. The initial model will most likely use GRU or LSTM layers, or a similar recurrent architecture. I will most likely use some form of transfer learning or an existing language translation API; it all depends on whether a translation model for Canadian French already exists. Initially, I may also train an additional model to translate her English audio.
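To make the starting point concrete, here is a minimal Keras sketch of the kind of recurrent model I mean: bidirectional GRU layers over per-frame audio features, predicting a character at each frame. The feature and vocabulary sizes are illustrative placeholders, not tuned values.

```python
# A minimal Keras sketch of a recurrent speech model: bidirectional GRU layers
# over per-frame audio features (e.g., MFCCs), predicting a character at each
# frame. Feature size and vocabulary size are illustrative placeholders.
import tensorflow as tf

NUM_FEATURES = 13   # e.g., 13 MFCC coefficients per audio frame (assumption)
VOCAB_SIZE = 32     # characters + blank token, placeholder value

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, NUM_FEATURES)),  # variable-length audio
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(128, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(128, return_sequences=True)),
    tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax"),
])
model.summary()
```

A model like this would typically be trained with a CTC loss so that variable-length frame predictions can be aligned to transcripts of a different length.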
Integration
Since there isn’t an existing app or system that will interface with this API/model, there aren’t many constraints here. I hope to create an app that integrates the API so that my immediate relatives who interact with my grandmother can use it as well. I will also have to figure out an easy way for my grandmother to use it with other individuals; she is very smart but still struggles with newer technology, so this may be a hurdle I need to face. Another possibility is using TensorFlow Lite and creating an application that runs the model on-device for speech prediction. Since I typically communicate with my grandmother via FaceTime, I hope to make the app integrate with video chat applications for ease of use.
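Since TensorFlow Lite came up, here is a minimal sketch of what the conversion step looks like, assuming a trained Keras model saved at a hypothetical path:

```python
# A minimal sketch of converting a trained Keras model to TensorFlow Lite
# so it can run on-device inside a mobile app. The saved-model path is a
# hypothetical placeholder.
import tensorflow as tf

model = tf.keras.models.load_model("speech_model")  # hypothetical saved model

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
# Recurrent layers sometimes need the TF ops fallback to convert cleanly:
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()

with open("speech_model.tflite", "wb") as f:
    f.write(tflite_model)
```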
![](http://objectpartners.com/wp-content/uploads/2020/09/pexels-andrea-piacquadio-3762940-1024x683.jpg)
End Goal
The future goal is to have an app that can listen to any voice, making this tool available to anyone. If someone in your life suffers from a similar condition, I hope this work will eventually help make life easier. As the project nears production readiness, I hope to create an easy way to adapt the model to other individuals, hopefully through transfer learning.
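As a rough sketch of that adaptation step, assuming a Keras model like the one sketched earlier: load the trained model, freeze the lower layers, and fine-tune only the top on the new speaker’s small dataset. All paths and variables here are hypothetical.

```python
# A minimal sketch of adapting a trained model to a new speaker via transfer
# learning: freeze the lower (acoustic) layers and fine-tune only the top on
# the new speaker's small dataset. Paths and dataset variables are hypothetical.
import tensorflow as tf

base_model = tf.keras.models.load_model("speech_model")  # hypothetical saved model

# Freeze everything except the final classification layer.
for layer in base_model.layers[:-1]:
    layer.trainable = False

base_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # small LR for fine-tuning
    loss="sparse_categorical_crossentropy",
)

# new_speaker_dataset would be a tf.data.Dataset of (features, labels)
# built from the new individual's recordings (assumption):
# base_model.fit(new_speaker_dataset, epochs=5)
```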
Wrap-up
This is the first part of a multi-part series that will hopefully be fruitful and produce an open-source method for translating speech from individuals who suffer from dysarthria. This pre-part 1 discussed the problem, the approach, and the end goal. Thank you for reading, and please be on the lookout for the official release of part 1.