Project to detect COVID-19 from coughs and speech
What if it were possible to detect whether someone has COVID-19 just from the sound of their coughing or talking? It sounds like science fiction, but it may soon come true. This is the goal of the project “Detecção de COVID-19 a partir de tosse e fala” (“COVID-19 detection from coughs and speech”), developed by a team of researchers from Instituto Superior Técnico and INESC-ID.
Using Artificial Intelligence (AI) technologies, the project aims to develop a robust system that helps to identify who is infected with the SARS-CoV-2 virus from recordings of their voice and cough. “The main purpose of this project is to be one more clue that can indicate the disease or even be combined with other biomarkers”, highlights the project coordinator, Isabel Trancoso, a professor at Técnico (Department of Electrical and Computer Engineering – DEEC) and an INESC-ID researcher.
Although not yet conclusive, the research carried out on this topic is already producing some exciting results. Several published articles suggest that even asymptomatic patients show changes in their voice, due to the impact of the virus on the lungs and vocal cords, with slight differences when compared with a healthy person. Although these differences are not perceptible to the human ear, an AI model may be able to detect them.
RT-PCR testing and, more recently, antigen testing are the mainstay of COVID-19 diagnosis. This testing protocol has several disadvantages, notably delayed results caused by the increased workload in laboratories and the huge demand. Consequently, there is growing interest in developing a cheap, immediate and easy-to-use system that helps to optimize the testing process. This project was created to meet this need and to take advantage of the solid body of knowledge that already exists about the potential of speech as a biomarker for health, strongly based on AI methods.
Analyzing speech patterns can help diagnose diseases
Speaking requires the coordination of numerous anatomical structures and systems. The lungs send air through the vocal cords, which produce sounds that are shaped by the tongue, lips and nasal cavities, among other structures. The brain, along with other parts of the nervous system, helps to regulate all these processes and determine the words someone is saying. A disease that affects any one of these systems might leave diagnostic clues in a patient’s speech.
The Técnico professor explains “the potential of speech as a biomarker for health has already been identified for diseases that affect respiratory organs, such as simple cold, or sleep apnea; for mental disorders, such as depression, bipolar disorder, autism spectrum; and for neurodegenerative diseases such as Parkinson’s disease, Alzheimer’s disease, Huntington’s disease; or amyotrophic lateral sclerosis, among many other diseases”. Over the past decade, scientists have used machine learning systems to identify potential vocal biomarkers for a wide variety of these clinical conditions.
The idea for this project came up right at the beginning of the first lockdown. “Our experience with these diseases clearly pointed to the need to make a great effort to collect an extensive set of sound data related to COVID-19”, says professor Isabel Trancoso.
A similar project, carried out by a team of researchers at the University of Cambridge, explored the use of traditional acoustic features (cepstral coefficients, energy, fundamental frequency, etc.) and features obtained through transfer learning with neural networks, combined with different classifiers for COVID-19 detection. The resulting models reach a performance close to 80%, even when the comparison group consists of users who tested negative for COVID-19 but had a cough due to a cold or asthma.
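To give a rough idea of what two of the acoustic features named above measure, here is a minimal sketch in Python. It is not the Cambridge team's code nor the method used in this project: the function names (`synth_tone`, `short_time_energy`, `estimate_f0`), the synthetic 200 Hz tone standing in for a voiced sound, and the 60 to 400 Hz search range are all assumptions made for illustration. Cepstral coefficients and the classifiers themselves are left out.

```python
import math

def synth_tone(freq_hz, sample_rate, duration_s):
    """Hypothetical helper: a pure sine wave standing in for a voiced sound."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame of audio samples."""
    return sum(x * x for x in frame) / len(frame)

def estimate_f0(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency with a plain autocorrelation search.

    The lag (in samples) with the highest autocorrelation inside the
    assumed pitch range [fmin, fmax] is taken as the period.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    # Return 0.0 when no positive correlation peak was found (unvoiced frame).
    return sample_rate / best_lag if best_lag else 0.0

# Assumed test signal: 50 ms of a 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = synth_tone(200.0, sample_rate, 0.05)
energy = short_time_energy(frame)
f0 = estimate_f0(frame, sample_rate)
print(round(energy, 3), round(f0, 1))
```

On this clean synthetic tone the energy comes out near 0.5 and the estimated fundamental frequency near 200 Hz. Real cough and speech recordings are noisy and only partly voiced, so a practical system would need framing, windowing and a more robust pitch tracker before such features could feed a classifier.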
According to the INESC-ID researcher, “the results of the various research works on this topic are very promising, but there are still many areas left unexplored”.
The importance of the community in this project
The first phase of the project is to collect an extensive dataset with representative examples of speech and simulated coughs and snores from both COVID-19 positive (symptomatic and asymptomatic) and negative individuals (ideally including also participants with respiratory conditions other than COVID-19, such as flu, cold, asthma, etc.).
These data will be crucial for the development and success of the project, and for this reason the participation of the community is essential and warmly appreciated. The invitation to participate in this study extends to the whole of society.
To participate, just follow this link (where you can find the informed consent form), or use the QR code available here.
The participants will have to supply an audio recording of their cough and snoring, as well as speech – sustained vowel, reading a short text, free description of an image. In addition, participants just need to provide some personal data, namely demographic data – age, sex, mother tongue; health data – date and result of the COVID test (for those who were already tested), symptoms in the last 15 days, chronic diseases or chronic medical conditions, voice disorders. All necessary measures will be taken to ensure the security and anonymity of the data collected.
After the necessary data is collected, the research team will use signal processing and machine learning techniques to assess the presence of biomarkers indicative of COVID-19 in coughs and speech, and to develop robust systems for the detection of COVID-19. Once properly tested, these systems can be easily deployed as a web tool and/or a mobile application.
An important screening tool
The research team does not intend to develop a clinical diagnostic test, but rather a complementary and low-cost test – a simple screening tool – that uses non-intrusive techniques and does not depend on health professionals. In the future, the effective implementation of this screening tool may be essential to curb the spread of COVID-19 if, for example, it is used at the entrance of schools or companies/institutions.
The data collected in this study will also make it possible to continue studying other diseases that affect the respiratory system. “It is extremely important to have a volume of data that allows us to carry out this study”, stresses professor Isabel Trancoso.
“My vision is that collecting speech samples will become as common as a blood test”, says the INESC-ID researcher. “It is a ubiquitous signal and can be collected in a non-invasive way, both in person and by teleconsultations”, she stresses.
Source: Instituto Superior Técnico