Project to detect COVID-19 from coughs and speech
What if it were possible to detect whether someone has COVID-19 just from the sound of their coughing or talking? It sounds like science fiction, but it may soon come true. This is the goal of the project “Detecção de COVID-19 a partir de tosse e fala” (“COVID-19 detection from coughs and speech”), developed by a team of researchers from Instituto Superior Técnico and INESC-ID.
Using Artificial Intelligence (AI) technologies, the project aims to develop a robust system that helps identify who is infected with the SARS-CoV-2 virus from recordings of their voice and cough. “The main purpose of this project is to be one more clue that can indicate the disease or even be combined with other biomarkers”, highlights the project coordinator, Isabel Trancoso, a professor at Técnico (Department of Electrical and Computer Engineering – DEEC) and an INESC-ID researcher.
Although not yet conclusive, research on this topic is already producing some exciting results. Several published articles suggest that even asymptomatic patients show changes in their voice, caused by the impact of the virus on the lungs and vocal cords, that differ slightly from a healthy person’s voice. Although this difference is not perceptible to the human ear, an AI model may be able to detect it.
RT-PCR testing is the mainstay of COVID-19 diagnosis, joined more recently by antigen tests. This testing protocol has several disadvantages, notably delayed results caused by huge demand and the increased workload in laboratories. Consequently, there is growing interest in a cheap, immediate and easy-to-use system that can streamline the testing process. This project was created to meet this need and to take advantage of the solid existing knowledge, strongly based on AI methods, about the potential of speech as a biomarker for health.
Analyzing speech patterns can help diagnose diseases
Speaking requires the coordination of numerous anatomical structures and systems. The lungs send air through the vocal cords, which produce sounds that are shaped by the tongue, lips and nasal cavities, among other structures. The brain, along with other parts of the nervous system, helps to regulate all these processes and determine the words someone is saying. A disease that affects any one of these systems might leave diagnostic clues in a patient’s speech.
The Técnico professor explains that “the potential of speech as a biomarker for health has already been identified for diseases that affect respiratory organs, such as a simple cold or sleep apnea; for mental disorders, such as depression, bipolar disorder, autism spectrum; and for neurodegenerative diseases such as Parkinson’s disease, Alzheimer’s disease, Huntington’s disease or amyotrophic lateral sclerosis, among many other diseases”. Over the past decade, scientists have used machine learning systems to identify potential vocal biomarkers for a wide variety of these clinical conditions.
The idea for this project came up right at the beginning of the first lockdown. “Our experience with these diseases clearly pointed to the need to make a great effort to collect extensive sound data related to COVID-19”, says professor Isabel Trancoso.
A similar project, carried out by a team of researchers at the University of Cambridge, explored the use of traditional acoustic features (cepstral coefficients, energy, fundamental frequency, etc.) and features obtained through transfer learning techniques using neural networks, along with different classifiers for COVID-19 detection. The models developed show a performance close to 80%, even for users who tested negative for COVID-19 but had a cough due to a cold or asthma.
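To make two of the acoustic features named above more concrete, here is a minimal sketch of how short-time energy and fundamental frequency might be computed for a single audio frame. It is only an illustration under stated assumptions, not the pipeline used by the Cambridge team or by this project: the function names, the 60-400 Hz search range and the synthetic 150 Hz test tone are all hypothetical, and cepstral coefficients are left out for brevity.

```python
import math

def short_time_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(x * x for x in frame) / len(frame)

def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    The search range f_min..f_max is an assumption chosen for illustration.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation of the frame with itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic stand-in for a voiced frame: a 150 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [0.5 * math.sin(2 * math.pi * 150 * n / sample_rate) for n in range(800)]

energy = short_time_energy(frame)
f0 = estimate_f0(frame, sample_rate)

# A feature vector like this one could then be passed to a classifier.
features = [energy, f0]
print(features)
```

In a real system, features of this kind would be computed over many frames of recorded speech or cough and summarised before classification; the integer lag search also limits the frequency resolution of this simple estimator.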
According to the INESC-ID researcher, “the results of the various research works on this topic are very promising, but there are still many areas left unexplored”.
The importance of the community in this project
The first phase of the project is to collect an extensive dataset with representative examples of speech and simulated coughs and snores from both COVID-19 positive (symptomatic and asymptomatic) and negative individuals (ideally also including participants with respiratory conditions other than COVID-19, such as flu, cold, asthma, etc.).
These data will be crucial for the development and success of the project, and for this reason the participation of the community is essential and warmly appreciated. The invitation to take part in this study extends to the whole of society.
Participants will have to supply an audio recording of their cough and snoring, as well as speech: a sustained vowel, the reading of a short text and a free description of an image. In addition, participants only need to provide some personal data: demographic data (age, sex, mother tongue) and health data (date and result of the COVID test for those who have already been tested, symptoms in the last 15 days, chronic diseases or chronic medical conditions, voice disorders). All necessary measures will be taken to ensure the security and anonymity of the data collected.
After the necessary data is collected, the research team will use signal processing and machine learning techniques to assess the presence of biomarkers indicative of COVID-19 in coughs and speech, and to develop robust systems for the detection of COVID-19. Once properly tested, these systems can be easily deployed as a web tool and/or a mobile application.
An important screening tool
The research team does not intend to develop a clinical diagnostic test, but rather a complementary and low-cost test – a simple screening tool – that uses non-intrusive techniques and does not depend on health professionals. In the future, the effective implementation of this screening tool may be essential to curb the spread of the COVID-19 pandemic if, for example, it is used at the entrance of schools or companies/institutions.
The data collected in this study will also make it possible to continue studying other diseases that affect the respiratory system. “It is extremely important to have a volume of data that allows us to carry out this study”, stresses professor Isabel Trancoso.
“My vision is that collecting speech samples will become as common as a blood test”, says the INESC-ID researcher. “It is a ubiquitous signal and can be collected in a non-invasive way, both in person and by teleconsultations”, she stresses.
Source: Instituto Superior Técnico
OLISSIPO Twin Seminars on Computational Biology
Sparse regularization for multi-omics data
20th May 2021
13:00-14:30 (WEST – Lisbon) / 14:00-15:30 (CEST) (held online)
ZOOM link: https://videoconf-colibri.zoom.us/j/84981014599
No password or registration needed for this session
The Twin Seminars will help disseminate the scientific work and expertise of INESC-ID and of the whole Olissipo Project Consortium, which includes Inria, ETH Zürich and EMBL. Each seminar will comprise two short presentations, one by a researcher from Lisbon and one by a researcher from a twin international institution working on similar topics in Computational Biology. The seminars will be open to everyone interested and will include a discussion to further promote interaction between all the participants.
Regularized optimization has proved to be a promising and valuable strategy to solve regression problems in high-dimensional spaces by imposing constraints on the parameters. We will discuss novel methods beyond the classical elastic net that allow a priori knowledge, such as network-based information, to be included. The application to multi-omics patient data, from classification problems to survival analysis, illustrates the potential of sparse structured models for more interpretable and personalized medicine.
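As background for the abstract above, the following is a minimal sketch of the classical elastic net fitted by coordinate descent, minimising (1/2n)·||y − Xw||² + λ(α·||w||₁ + (1 − α)/2·||w||²). It is not the network-based method discussed in the seminar, which is not described here; the function names, the toy data and the values of `lam` and `alpha` are assumptions chosen only for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero; this is the step that produces sparsity."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, lam=0.2, alpha=0.5, n_iter=50):
    """Fit elastic net coefficients by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            # L1 part: soft-thresholding; L2 part: extra shrinkage in the denominator.
            w[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
    return w

# Toy data (hypothetical): y depends strongly on feature 0, barely on feature 1,
# and not at all on feature 2.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.05, -1.95, 1.95, -2.05]

w = elastic_net(X, y)
print(w)
```

On this toy problem the weak and irrelevant features are driven exactly to zero while the strong one is kept but shrunk, which is the kind of sparsity that can make such models easier to interpret.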
Susana Vinga, Instituto Superior Técnico (IST) and INESC-ID (Lisbon, Portugal)
Susana is an Associate Professor at IST (ULisboa) in a joint position at the Dept. of Computer Science and Engineering (DEI) and the Dept. of Bioengineering (DBE). She is a Senior Researcher at INESC-ID in the Information and Decision Support Systems lab, a member of the INESC-ID Board of Directors, and Vice-President of DEI. Prof. Vinga received a Mechanical Engineering degree (1999), a post-graduate degree in Probability and Statistics at IST, a Biomedical specialization at Politecnico of Milan, and a PhD degree in Bioinformatics (2005) at ITQB-UNL (Portugal). From 2006 to 2013, she was a researcher in the Knowledge Discovery and Bioinformatics group at INESC-ID and an invited assistant professor of Biostatistics and Informatics at the Faculty of Medical Sciences. Between 2013 and 2018 she was a Principal Investigator at the Mechanical Engineering Institute (IDMEC/IST). In 2010, she was granted the Young Research Award of the Technical University of Lisbon, and in 2017 she was awarded the Scientific Prize of ULisboa/CGD in the area of Computer Science and Engineering for the impact of her publications. Susana’s main scientific achievements are in the area of systems biology, with the development of models for the analysis of biological networks, and in computational biology and bioinformatics, where she is interested in data science and machine learning methods for the analysis of high-dimensional clinical data. Susana is the Principal Coordinator of the OLISSIPO Twinning Project.
Valentina Boeva, ETH Zürich (Zürich, Switzerland)
Valentina is a Tenure Track Assistant Professor of Biomedical Informatics at the Department of Computer Science of ETH Zürich (Switzerland). She was previously a group leader of the laboratory of Computational Epigenetics of Cancer at Inserm, located at the Cochin Institute in Paris, France (2016-2021). Prof. Boeva received an MSc degree in Applied Mathematics (2003) at Lomonosov Moscow State University (MSU) (Russia) and a PhD degree in Biophysics and Bioinformatics (2007) at MSU. From 2002 to 2006, she also worked at Inria (France) on sequence analysis algorithms and statistics of DNA motifs, and from 2004 to 2007 at GosNIIgenetika, developing statistical methods for DNA sequence analysis. Valentina also worked at Ecole Polytechnique (France), where she contributed to the analysis of cancer-related metabolic networks (2007-2008). Before joining the Cochin Institute in 2016, she worked for about seven years at the Curie Institute, also in Paris, first as a postdoc and then as a research scientist. Valentina was an ATIP-Avenir 2015 laureate and received the French Embassy Award in 2012. Her research focuses on understanding the role of epigenetic cancer drivers, and developing computational approaches to process multi-omics information to help clinicians make treatment choices for cancer patients based on genomic, epigenetic, transcriptomic, and other information.
Learn more about the Olissipo Project at https://olissipo.inesc-id.pt/
11th Lisbon Machine Learning Summer School
LxMLS 2021 will take place from July 7th to July 15th in an online format (via Zoom and Slack). It is organized jointly by Instituto Superior Técnico (IST), a leading Engineering and Science school in Portugal, the Instituto de Telecomunicações, the Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa (INESC-ID), Unbabel and Cleverly.
Click here for information about past editions (LxMLS 2011, LxMLS 2012, LxMLS 2013, LxMLS 2014, LxMLS 2015, LxMLS 2016, LxMLS 2017, LxMLS 2018, LxMLS 2019, LxMLS 2020) and to watch the videos of the lectures (2016, 2017, 2018, 2020).
Call for Participation
* Application Deadline: May 15, 2021
* Decision: June 1, 2021
* Early Registration: June 15 – July 1, 2021
* Summer School: July 7 – 15, 2021
Topics and Intended Audience
The school will cover a range of Machine Learning (ML) topics, from theory to practice, that are important in solving Natural Language Processing (NLP) problems that arise in the analysis and use of Web data.
Our target audience is:
- Researchers and graduate students in the fields of NLP and Computational Linguistics;
- Computer scientists who have interests in statistics and machine learning;
- Industry practitioners who desire a more in-depth understanding of these subjects.
Features of LxMLS:
- No deep previous knowledge of ML or NLP is required, but attendees are assumed to have some basic background in mathematics and programming
- Lecturers are leading researchers in machine learning and natural language processing (see speakers)
- Days are divided into morning lectures and afternoon lab sessions and practical talks (see schedule)
- The Labs guide will be provided one month in advance. Last year’s guide can be found here
- A day zero is scheduled to review basic concepts and introduce the necessary tools for implementation exercises
- Both basic (e.g. linear classifiers) and advanced topics (e.g. deep learning, reinforcement learning) will be covered
Due to the current COVID-19 pandemic, the 11th Lisbon Machine Learning School will be held online (via Zoom and Slack). As we did last year, we are excited for the opportunity to create a virtual school, where you will be able to attend all the lectures and participate in the Q&As and labs remotely. We will also provide the tools for students to engage with each other remotely. The lectures will also be streamed to YouTube and will later become freely available on our YouTube channel. The Q&A, labs and social activities will remain restricted to accepted students only.
List of Confirmed Speakers
LUIS PEDRO COELHO Fudan University | China
MÁRIO FIGUEIREDO Instituto de Telecomunicações & Instituto Superior Técnico | Portugal
ANDRE MARTINS Instituto de Telecomunicações & Unbabel | Portugal
IRYNA GUREVYCH Technical University of Darmstadt | Germany
NOAH SMITH University of Washington & Allen Institute for Artificial Intelligence | USA
SLAV PETROV Google Inc. | USA
XAVIER CARRERAS dMetrics | USA
GRAHAM NEUBIG Carnegie Mellon University | USA
BHIKSHA RAJ Carnegie Mellon University | USA
CHRIS DYER Google DeepMind | UK
ELIAS BARENBOIM Columbia University | USA
ADELE RIBEIRO Columbia University | USA
STEFAN RIEZLER Institut für Computerlinguistik, Universität Heidelberg | Germany
BARBARA PLANK IT University of Copenhagen | Denmark
SASHA RUSH Cornell Tech | USA
Please visit the webpage for up-to-date information: http://lxmls.it.pt/2021
To apply, please fill in the form at https://lisbonmls.wufoo.com/forms/application-form-lxmls-2021/
Any questions should be directed to: firstname.lastname@example.org.
International European Conference on Parallel and Distributed Computing
The 27th International European Conference on Parallel and Distributed Computing (Euro-Par 2021) will take place from August 30 to September 3, 2021 in Lisbon.
Euro-Par is the prime European conference covering all aspects of parallel and distributed processing, ranging from theory to practice, from small to the largest parallel and distributed systems and infrastructures, from fundamental computational problems to full-fledged applications, from architecture, compiler, language and interface design and implementation, to tools, support infrastructures, and application performance aspects.
The 2021 edition of Euro-Par will be organized as a collaboration between INESC-ID and Instituto Superior Técnico (IST).
– Abstract Submission: February 5, 2021
– Paper Submission Deadline: February 12, 2021
– Author Notification: April 30, 2021
– Camera-Ready Papers: June 6, 2021
More information is available here.