Early onset cognitive impairment detection from speech and language signals
Award Number | 2431571 |
---|---|
Award Type | Studentship |
Status / Stage | Active |
Dates | 29 September 2020 to 28 September 2024 |
Duration (calculated) | 03 years 11 months |
Funder(s) | EPSRC (UKRI) |
Funding Amount | £0.00 |
Funder/Grant study page | EPSRC |
Contracted Centre | University of Sheffield |
Principal Investigator | Samuel Hollands |
WHO Categories | Methodologies and approaches for risk reduction research |
Disease Type | Mild Cognitive Impairment (MCI) |
CPEC Review Info
Reference ID | 755 |
---|---|
Researcher | Reside Team |
Published | 24/07/2023 |
Abstract
The detection of medical conditions using speech processing techniques is a rapidly emerging field, focused on developing non-invasive diagnostic strategies for conditions ranging from dementia to depression, anxiety, and even Covid-19. One of the largest issues facing medical speech classification, and one that seldom affects other areas of speech technology (at least in English), is the scarcity of available data. Many of the largest dementia corpora, for example, contain only tens of thousands of words. In addition, these datasets usually contain very few speakers. Training a model on a small number of speakers creates a risk of overfitting to the idiosyncratic, dialectal, or accentual features of the participants, which in turn can severely reduce the classifier's efficacy depending on the language features of the test subject.
Accent variation can have a large impact on speech technologies. The traditional approach to countering it is to use either a very large corpus or accent-dependent models, which are trained specifically on speakers with a similar accent and are either selected by the user or assigned based on geography. Unfortunately, both approaches depend on a rich training dataset, which simply does not exist for medical classification systems.
This project aims to explore approaches for reducing the impact of accent and language variation on systems that detect the onset of cognitive impairment. It will examine the impact of accents both on the construction of cognitive impairment detection classifiers and on the compilation, initial processing, and feature extraction of the datasets. Whilst much of the feature extraction work will take shape as the literature review is compiled, one possible direction is to investigate bilingual features. Does dementia have a consistent detrimental effect on second-language production that is distinctly different from the non-native production of an early learner of that language? For example, speakers are known to make the most speech production errors on phones that are more similar between their first language (L1) and second language (L2), particularly while learning the L2. Do we see a loss in the ability to maintain these phonetic distinctions as a person's cognitive state declines? If so, do the resulting features differ from those of an L2 learner, whose trajectory is the inverse, and are they therefore classifiable? This project aims to develop normalisation strategies and new feature extraction methods that limit the impact of accent and language variation on medical speech classification systems.
The importance of this research stems from the growing enthusiasm for deploying systems that detect the onset of cognitive impairment in healthcare. Problems arise if these tools are effective only for certain demographic groups, raising significant concern that the technology could inadvertently segregate patients. Technologies ranging from facial recognition to credit scoring have drawn, and continue to draw, substantial criticism for performing poorly on, or adversely affecting, certain demographic groups. It is vital that medical speech technology is non-discriminatory and performs consistently across as many demographic groups as possible.
Year 1:
Write Literature Review
Formulate Cohesive Meaningful Research Questions
Engage with Appropriate Training Materials
Explore Existing Datasets & Data Collection
Early Experimentation
Year 2:
Primary Experimentation
Expand Training to Encompass Experimentation Methodologies
Write Methodology
Begin Analysis
Year 3:
Write the Bulk of the Analysis
Formulate Conclusions
Submit Thesis
Aims
This project aims to explore approaches for reducing the impact of accent and language variation on systems that detect the onset of cognitive impairment.