
Project to detect COVID-19 from coughs and speech
What if it was possible to detect whether someone has COVID-19 or not, just from the sounds of their coughing or talking? It sounds like science fiction, but it may soon come true. This is the goal of the project “Detecção de COVID-19 a partir de tosse e fala” (“COVID-19 detection from coughs and speech”), developed by a team of researchers from Instituto Superior Técnico and INESC-ID.
Using Artificial Intelligence (AI) technologies, the project aims to develop a robust system that helps identify who is infected with the SARS-CoV-2 virus from recordings of their voice and cough. “The main purpose of this project is to be one more clue that can indicate the disease or even be combined with other biomarkers”, highlights the project coordinator, Isabel Trancoso, a professor at Técnico (Department of Electrical and Computer Engineering – DEEC) and INESC-ID researcher.
Although not yet conclusive, the research carried out on this topic is already producing some exciting answers. Several published articles put forward the hypothesis that even asymptomatic patients show changes in their voice, due to the impact of the virus on the lungs and vocal cords, with slight differences when compared with a healthy person. Although this difference is not perceptible to the human ear, an AI model may be able to detect it.
RT-PCR testing is the mainstay of COVID-19 diagnosis, joined more recently by antigen tests. This testing protocol has several disadvantages, notably delayed results caused by the increased workload in laboratories and the huge demand. Consequently, there is growing interest in developing a cheap, immediate and easy-to-use system that helps optimize the testing process. This project was created to meet this need and to take advantage of the solid knowledge that already exists about the potential of speech as a biomarker for health, strongly based on AI methods.
Analyzing speech patterns can help diagnose diseases
Speaking requires the coordination of numerous anatomical structures and systems. The lungs send air through the vocal cords, which produce sounds that are shaped by the tongue, lips and nasal cavities, among other structures. The brain, along with other parts of the nervous system, helps to regulate all these processes and determine the words someone is saying. A disease that affects any one of these systems might leave diagnostic clues in a patient’s speech.
The Técnico professor explains that “the potential of speech as a biomarker for health has already been identified for diseases that affect respiratory organs, such as the common cold or sleep apnea; for mental disorders, such as depression, bipolar disorder, autism spectrum; and for neurodegenerative diseases, such as Parkinson’s disease, Alzheimer’s disease, Huntington’s disease or amyotrophic lateral sclerosis, among many other diseases”. Over the past decade, scientists have used machine learning systems to identify potential vocal biomarkers for a wide variety of these clinical conditions.
The idea for this project came up right at the beginning of the first lockdown. “Our experience with these diseases clearly pointed to the need to make a great effort to collect extensive sound data related to COVID-19”, says professor Isabel Trancoso.
A similar project, carried out by a team of researchers at the University of Cambridge, explored the use of traditional acoustic features (cepstral coefficients, energy, fundamental frequency, etc.) and features obtained through transfer learning techniques using neural networks, along with different classifiers for COVID-19 detection. The resulting COVID-19 detection models show a performance close to 80%, even for users who tested negative for COVID-19 but had a cough due to a cold or asthma.
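As a rough illustration of two of the traditional acoustic measures named above, the sketch below computes the short-time energy of a single frame and estimates its fundamental frequency from the autocorrelation peak. It is not the Cambridge team's code: the function names, the 8 kHz sampling rate, the 60 to 400 Hz search range and the synthetic 200 Hz test tone are all assumptions made for this example, and cepstral coefficients are left out for brevity.

```python
import math

def short_time_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)

def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    Only lags corresponding to pitches between f_min and f_max are searched
    (an assumed range, chosen for this example).
    """
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation of the frame at this lag.
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# Synthetic stand-in for a voiced frame: 50 ms of a 200 Hz tone at 8 kHz.
# A real system would read frames from a recorded cough or speech signal.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)]

energy = short_time_energy(frame)
f0 = estimate_f0(frame, SAMPLE_RATE)
print(f"energy={energy:.3f}, f0={f0:.1f} Hz")
```

In practice such values would be computed frame by frame over a whole recording and summarised (for example by their mean and variance) before being passed to a classifier.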
According to the INESC-ID researcher, “the results of the various research works on this topic are very promising, but there are still many areas left unexplored”.
The importance of the community in this project
The first phase of the project is to collect an extensive dataset with representative examples of speech and simulated coughs and snores from both COVID-19 positive (symptomatic and asymptomatic) and negative individuals (ideally also including participants with respiratory conditions other than COVID-19, such as flu, cold, asthma, etc.).
These data will be crucial for the development and success of the project, and for this reason the participation of the community is essential and warmly appreciated. The invitation to participate in this study extends to the whole of society.
To participate, just follow this link (where you can find the informed consent form), or use the QR code available here.
The participants will have to supply an audio recording of their cough and snoring, as well as speech – sustained vowel, reading a short text, free description of an image. In addition, participants just need to provide some personal data, namely demographic data – age, sex, mother tongue; health data – date and result of the COVID test (for those who were already tested), symptoms in the last 15 days, chronic diseases or chronic medical conditions, voice disorders. All necessary measures will be taken to ensure the security and anonymity of the data collected.
After the necessary data is collected, the research team will use signal processing and machine learning techniques to assess the presence of biomarkers indicative of COVID-19 in coughs and speech, and to develop robust systems for the detection of COVID-19. Once properly tested, these systems can be easily deployed as a web tool and/or a mobile application.
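The general shape of such a pipeline (feature vectors extracted from recordings, then a classifier trained on labeled examples) can be sketched with a deliberately simple nearest-centroid classifier. This is only an illustration, not the research team's method: the function names are hypothetical, the two-dimensional feature vectors are invented placeholder numbers with no clinical meaning, and the labels `group_a` and `group_b` merely stand in for the test results that a real study would use.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def train_nearest_centroid(features, labels):
    """Return a dict mapping each label to the centroid of its examples."""
    by_label = {}
    for vec, label in zip(features, labels):
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(model, vec):
    """Assign vec to the label whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(vec, model[label]))

# Invented placeholder features, e.g. [mean energy, mean F0 in Hz].
# A real system would standardise features so that no single one dominates.
train_features = [[0.50, 200.0], [0.48, 210.0], [0.30, 150.0], [0.28, 140.0]]
train_labels = ["group_a", "group_a", "group_b", "group_b"]

model = train_nearest_centroid(train_features, train_labels)
print(predict(model, [0.47, 198.0]))
print(predict(model, [0.31, 152.0]))
```

A real detector would rely on far richer features, a much larger dataset and careful evaluation on held-out speakers; the point here is only the split between feature extraction, training and prediction.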
An important screening tool
The research team does not intend to develop a clinical diagnostic test, but rather a complementary, low-cost test – a simple screening tool – that uses non-intrusive techniques and does not depend on health professionals. In the future, the effective implementation of this screening tool may be essential to curb the spread of the COVID-19 pandemic if, for example, it is used at the entrance of schools or companies/institutions.
The data collected in this study will also make it possible to continue studying other diseases that affect the respiratory system. “It is extremely important to have a volume of data that allows us to carry out this study”, stresses professor Isabel Trancoso.
“My vision is that collecting speech samples will become as common as a blood test”, says the INESC-ID researcher. “It is a ubiquitous signal and can be collected in a non-invasive way, both in person and by teleconsultations”, she stresses.
Source: Instituto Superior Técnico
Upcoming Events
11th Lisbon Machine Learning Summer School

LxMLS 2021 will take place from July 7 to July 15 in an online format (via Zoom and Slack). It is organized jointly by Instituto Superior Técnico (IST), a leading Engineering and Science school in Portugal, the Instituto de Telecomunicações, the Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa (INESC-ID), Unbabel and Cleverly.
Click here for information about past editions (LxMLS 2011, LxMLS 2012, LxMLS 2013, LxMLS 2014, LxMLS 2015, LxMLS 2016, LxMLS 2017, LxMLS 2018, LxMLS 2019, LxMLS 2020) and to watch the videos of the lectures (2016, 2017, 2018, 2020).
Call for Participation
Important Dates
—————
* Application Deadline: May 15, 2021
* Decision: June 1, 2021
* Early Registration: June 15 – July 1, 2021
* Summer School: July 7 – 15, 2021
Topics and Intended Audience
—————
The school will cover a range of Machine Learning (ML) topics, from theory to practice, that are important in solving Natural Language Processing (NLP) problems that arise in the analysis and use of Web data.
Our target audience is:
- Researchers and graduate students in the fields of NLP and Computational Linguistics;
- Computer scientists who have interests in statistics and machine learning;
- Industry practitioners who desire a more in-depth understanding of these subjects.
Features of LxMLS:
- No deep previous knowledge of ML or NLP is required, but attendees are assumed to have some basic background in mathematics and programming
- Lecturers are leading researchers in machine learning and natural language processing (see speakers)
- Days are divided into morning lectures and afternoon lab sessions and practical talks (see schedule)
- The Labs guide will be provided one month in advance. Last year’s guide can be found here
- A day zero is scheduled to review basic concepts and introduce the necessary tools for implementation exercises
- Both basic (e.g. linear classifiers) and advanced topics (e.g. deep learning, reinforcement learning) will be covered
Online Format:
Due to the current COVID-19 pandemic, the 11th Lisbon Machine Learning School will be held online (via Zoom and Slack). As we did last year, we are excited for the opportunity to create a virtual school, where you will be able to attend all the lectures and participate in the Q&As and labs remotely. We will also provide the tools for students to engage with each other remotely. The lectures will also be streamed to YouTube and will later become freely available on our YouTube channel. The Q&A, labs and social activities will remain restricted to accepted students.
List of Confirmed Speakers
—————
LUIS PEDRO COELHO Fudan University | China
MÁRIO FIGUEIREDO Instituto de Telecomunicações & Instituto Superior Técnico | Portugal
ANDRE MARTINS Instituto de Telecomunicações & Unbabel | Portugal
IRYNA GUREVYCH Technical University of Darmstadt | Germany
NOAH SMITH University of Washington & Allen Institute for Artificial Intelligence | USA
SLAV PETROV Google Inc. | USA
XAVIER CARRERAS dMetrics | USA
GRAHAM NEUBIG Carnegie Mellon University | USA
BHIKSHA RAJ Carnegie Mellon University | USA
CHRIS DYER Google DeepMind | UK
ELIAS BARENBOIM Columbia University | USA
ADELE RIBEIRO Columbia University | USA
STEFAN RIEZLER Institut für Computerlinguistik, Universität Heidelberg | Germany
BARBARA PLANK IT University of Copenhagen | Denmark
SASHA RUSH Cornell Tech | USA
Please visit the webpage for up-to-date information: http://lxmls.it.pt/2021
To apply, please fill in the form at https://lisbonmls.wufoo.com/forms/application-form-lxmls-2021/
Any questions should be directed to: lxmls-2021@lx.it.pt.
International European Conference on Parallel and Distributed Computing

The 27th International European Conference on Parallel and Distributed Computing (Euro-Par 2021) will take place from August 30 to September 3, 2021 in Lisbon.
Euro-Par is the prime European conference covering all aspects of parallel and distributed processing, ranging from theory to practice, from small to the largest parallel and distributed systems and infrastructures, from fundamental computational problems to full-fledged applications, from architecture, compiler, language and interface design and implementation, to tools, support infrastructures, and application performance aspects.
The 2021 edition of Euro-Par will be organized as a collaboration between INESC-ID and Instituto Superior Técnico (IST).
Important Dates:
– Abstract Submission: February 5, 2021
– Paper Submission Deadline: February 12, 2021
– Author Notification: April 30, 2021
– Camera-Ready Papers: June 6, 2021
More information is available here.