Early onset cognitive impairment detection from speech and language signals

Award Number
2431571
Award Type
Studentship
Status / Stage
Active
Dates
29 September 2020 - 28 September 2024
Duration (calculated)
03 years 11 months
Funder(s)
EPSRC (UKRI)
Funding Amount
£0.00
Funder/Grant study page
EPSRC
Contracted Centre
University of Sheffield
Principal Investigator
Samuel Hollands
WHO Categories
Methodologies and approaches for risk reduction research
Disease Type
Mild Cognitive Impairment (MCI)

CPEC Review Info
Reference ID
755
Researcher
Reside Team
Published
24/07/2023

Abstract

The detection of medical conditions using speech processing techniques is a rapidly emerging field, focusing on the development of non-invasive diagnostic strategies for conditions ranging from dementia to depression and anxiety, and even Covid-19. One of the largest issues facing medical speech classification, and one that seldom affects other areas of speech technology (at least in English), is the scarcity of available data. Many of the largest corpora looking at dementia, for example, contain only tens of thousands of words. In addition, these datasets usually contain very few speakers. Training a model on so few speakers creates a risk of overfitting to idiosyncratic, dialectal, or accentual features of the participants, which in turn can gravely reduce the efficacy of the classifier depending on the language features of the test subject.
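One common way to expose the speaker-overfitting risk described above is to evaluate on speakers the model has never seen. The following is a minimal sketch of a speaker-disjoint train/test split, not a method taken from the project itself. The function name `speaker_disjoint_split`, the `(speaker_id, features, label)` tuple layout, and the default `test_fraction` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def speaker_disjoint_split(samples, test_fraction=0.25, seed=0):
    """Split samples so that no speaker appears in both train and test.

    `samples` is a list of (speaker_id, features, label) tuples; this
    layout is a hypothetical choice for the sketch.
    """
    # Group every recording under the speaker who produced it.
    by_speaker = defaultdict(list)
    for sample in samples:
        by_speaker[sample[0]].append(sample)

    # Shuffle speakers (not recordings) so whole speakers are held out.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    # Hold out at least one speaker for testing.
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train = [s for spk in speakers if spk not in test_speakers
             for s in by_speaker[spk]]
    test = [s for spk in speakers if spk in test_speakers
            for s in by_speaker[spk]]
    return train, test
```

With a split like this, a classifier that has memorised individual voices rather than markers of impairment should score noticeably worse on the held-out speakers than on a split that mixes recordings from the same speaker across both sets.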
Accent variation can have a large impact on speech technologies. The traditional approach to countering this impact is to use either a colossal corpus or separate accent-specific models, each trained on individuals with a similar accent and either selected by the user or assigned based on geography. Unfortunately, this approach depends on a rich dataset from which to train said models, which, in the case of medical classification systems, simply does not exist.
This project aims to explore approaches for reducing the impact of accent and language variation on onset cognitive impairment detection systems. It will explore the impact of accents both on the construction of cognitive impairment detection classifiers and on the compilation, initial processing, and feature extraction of the datasets. Whilst much of this feature extraction work will take shape over the course of compiling a literature review, one example may be to investigate bilingual features. Is it possible that dementia has a consistently detrimental impact on second language production that is distinctly different from the broken language found in an early learner of the language? For example, we know that individuals make the largest number of speech production errors on phones which are more similar between their L1 and L2, particularly when learning the language. Do we see a loss in the ability to maintain phonetic distinctions as someone’s cognitive state declines, and are the resulting features different from those of the inverse process in an L2 learner, and thus classifiable? This project aims to develop normalisation strategies and new feature extraction methods for limiting the impact of accents and language variation on medical speech classification systems.
The importance of this research stems from the growing enthusiasm for implementing onset cognitive impairment detection systems in the medical industry. Issues arise where the tools may only be effective for certain demographics, creating significant concern over inadvertent segregation caused by the technologies. Tools from facial recognition systems to credit scoring systems have drawn, and continue to draw, substantial criticism where they either perform poorly for or adversely impact certain groups of people. It remains vital that medical speech technology is non-discriminatory and provides stable efficacy across as many demographics of people as possible.
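The concern about uneven efficacy across demographics can be made measurable by reporting classifier accuracy separately for each group. The sketch below is one simple way to do so under stated assumptions: the function names `accuracy_by_group` and `accuracy_gap`, the `(group, true_label, predicted_label)` record layout, and the choice of plain accuracy as the metric are all hypothetical and are not drawn from the project.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return classification accuracy for each demographic group.

    `records` is an iterable of (group, true_label, predicted_label)
    tuples; this layout is an assumption made for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Difference between the best and worst performing group."""
    return max(per_group.values()) - min(per_group.values())
```

A large gap between, say, two accent groups would indicate that the tool is not providing stable efficacy, even if the overall accuracy looks acceptable.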
Year 1:
Write Literature Review
Formulate Cohesive Meaningful Research Questions
Interact with Appropriate Training Materials
Explore Existing Datasets & Data Collection
Early Experimentation

Year 2:
Primary Experimentation
Expand Training to Encompass Experimentation Methodologies
Write Methodology
Begin Analysis

Year 3:
Write the Bulk of the Analysis
Formulate Conclusions
Submit Thesis

Aims

This project aims to explore approaches for reducing the impact of accent and language variation on onset cognitive impairment detection systems.