INESC-ID Talk: “Machine Learning as a magnifying glass to study society” by Joana Gonçalves de Sá (SPAC, LIP)
On January 23, at 13:00, INESC-ID will host a talk by Joana Gonçalves de Sá, a researcher at LIP and coordinator of the Social Physics and Complexity (SPAC) research group at Técnico. The talk is titled “Machine Learning as a magnifying glass to study society”.
Where: INESC Lisboa, Rua Alves Redol, 9, 1000-029 Lisboa | Room 9 (Auditorium), Ground Floor
Summary:
“Machine Learning Algorithms (MLAs) are trained on vast amounts of data and work by learning patterns, finding non-linear and often black-box mathematical relations in the data. A central challenge MLAs face is that the data used to train them is not generated in a social vacuum: if the data or the targets are biased, the models will also be biased. This creates an important problem: how should MLAs be trained to identify relevant differences in the data while not perpetuating, or even amplifying, prejudice or social bias? To date, the main approach has been deductive, or top-down: researchers or coders start by listing known biases, such as racial prejudice, and then search for signs of their presence in the data, the models, or society.
The implicit assumptions are that a) all biases, or all types of biased features, are known a priori; b) they are identifiable; and c) once identified, they can be debiased against. However, there is no comprehensive and universal list of biases, new biases emerge dynamically, and the contextual backgrounds of coders and researchers influence the debiasing approaches themselves.
In summary, even screened datasets or models are likely to contain biased patterns. Therefore, it is crucial to develop inductive systems to identify biases in MLAs.
The talk will be divided into two parts. In the first, I will describe the first (to the best of our knowledge) experimental audit study for detecting possible differential tracking on misinformation websites and its impact on third-party content and search-engine results. We created a two-stage experimental audit that uses stateful crawlers to mimic users browsing the web, while experimentally controlling for website, time, and geolocation, and collecting online tracking data. By analyzing differences in search-engine recommendations to bots with different browsing experiences (and, thus, different collected cookies), it should be possible to audit these algorithms for biased customization. I will present results indicating that 1) disinformation websites are tracked more heavily by third parties than non-disinformation websites, 2) simply changing the location of the bots is sufficient to customize the content being recommended, and 3) this has implications for polarization and the spread of misinformation.

In the second part, I will discuss the possibility of expanding on this and other work to take advantage of MLAs to identify novel biases. That MLAs so efficiently learn widely recognized prejudices suggests that it should be possible to reverse the problem and use algorithms to develop statistical, bottom-up tools that identify latent, unknown biases. This is a very preliminary project, and I would value the community’s input.”
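As a concrete illustration of the deductive, top-down approach the abstract critiques, here is a minimal sketch in Python: a known bias dimension is listed up front and the data are tested for it. The toy data and the standard four-fifths disparate-impact threshold are illustrative assumptions, not material from the talk.

```python
# Minimal sketch of the "deductive" (top-down) approach: pick a bias you
# already know to look for, then test the data for it. Toy data and the
# 0.8 "four-fifths" threshold are illustrative assumptions.
from collections import defaultdict

# (group, outcome) pairs, e.g. decisions recorded in a training set
records = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 1), ("group_b", 0), ("group_b", 0), ("group_b", 0),
]

totals, positives = defaultdict(int), defaultdict(int)
for group, outcome in records:
    totals[group] += 1
    positives[group] += outcome

rates = {g: positives[g] / totals[g] for g in totals}
print("positive-outcome rate per group:", rates)

# Flag the pre-listed bias if one group's rate falls below 80% of another's
# (the classic disparate-impact rule of thumb).
worst, best = min(rates.values()), max(rates.values())
if worst / best < 0.8:
    print("known bias detected: disparate impact ratio =", round(worst / best, 2))
```

The sketch makes the abstract's objection visible: the check only ever finds the bias that was named in advance, which is exactly what fails when biases are unknown or emerge dynamically.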
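The two-stage audit described in the first part of the talk can be pictured with a minimal sketch, assuming hypothetical seed sites, a placeholder search URL, and per-location proxies; none of these are the study's actual infrastructure or pipeline.

```python
# Minimal sketch of a two-stage audit: stateful "bot" sessions accumulate
# cookies while browsing, then we compare what the same query returns to
# bots with different locations. All URLs and proxies are placeholders.
import requests

SEED_SITES = ["https://example-news.test", "https://example-blog.test"]
SEARCH_URL = "https://search.example.test/?q=vaccines"

# Hypothetical per-location proxies to control geolocation experimentally.
PROXIES = {
    "lisbon": {"https": "http://proxy-lisbon.example.test:8080"},
    "berlin": {"https": "http://proxy-berlin.example.test:8080"},
}

def run_bot(location: str) -> tuple[list[str], list[str]]:
    """Stage 1: browse seed sites statefully (cookies persist in the session).
    Stage 2: issue the same query and record what comes back."""
    session = requests.Session()  # keeps cookies across requests = "stateful"
    for url in SEED_SITES:
        session.get(url, proxies=PROXIES[location], timeout=10)
    cookies = sorted(c.name for c in session.cookies)  # tracking state so far
    results = session.get(SEARCH_URL, proxies=PROXIES[location], timeout=10)
    # A real audit would parse ranked results; here we keep raw lines only.
    return cookies, results.text.splitlines()[:10]

if __name__ == "__main__":
    runs = {loc: run_bot(loc) for loc in PROXIES}
    for loc, (cookies, _) in runs.items():
        print(loc, "cookies:", cookies)
    # Differences across otherwise-identical bots point to differential
    # tracking and customized recommendations.
    a, b = (set(results) for _, results in runs.values())
    print("results shown to only one bot:", a ^ b)
```

The design choice worth noting is the controlled experiment: every bot visits the same sites at the same time, so any divergence in cookies or results can be attributed to the single variable that was changed (here, location).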
Bio:
Joana Gonçalves de Sá is a researcher at LIP and coordinator of the Social Physics and Complexity (SPAC) research group. She has a degree in Physics Engineering from Instituto Superior Técnico – University of Lisbon, and a PhD in Systems Biology from NOVA – ITQB, having developed her thesis at Harvard University, USA. Her current research uses data analytics and machine learning to study complex problems at the interface between Biomedicine, Social Sciences, and Computation, with a strong ethical and societal focus. Before that, she was an Associate Professor at Nova School of Business and Economics and a Principal Investigator at Instituto Gulbenkian de Ciência, where she also coordinated the Science for Society Initiative and was the founder and Director of the Graduate Program Science for Development (PGCD), aimed at improving scientific research in Africa. She received two ERC grants (Starting Grant 2019 and Proof of Concept 2022) to study human and algorithmic biases using fake news as a model system.