Séminaire IA & Sciences Sociales

The AI & Social Sciences seminar meets regularly throughout the year. External sessions are open to the public via Zoom (feel free to email arnault.chatelain[at]ensae.fr if you would like to receive the Zoom link).

2024-2025

The seminar takes place at 5pm on Zoom. The Zoom link is sent to those registered on our mailing lists.

25 September 2024 – Sayash Kapoor (Princeton): « AI Snake Oil »

Confused about AI and worried about what it means for your future and the future of the world? You’re not alone. AI is everywhere–and few things are surrounded by so much hype, misinformation, and misunderstanding. In AI Snake Oil, computer scientists Arvind Narayanan and Sayash Kapoor cut through the confusion to give you an essential understanding of how AI works and why it often doesn’t, where it might be useful or harmful, and when you should suspect that companies are using AI hype to sell AI snake oil–products that don’t work, and probably never will.

While acknowledging the potential of some AI, such as ChatGPT, AI Snake Oil uncovers rampant misleading claims about the capabilities of AI and describes the serious harms AI is already causing in how it’s being built, marketed, and used in areas such as education, medicine, hiring, banking, insurance, and criminal justice. The book explains the crucial differences between types of AI, why organizations are falling for AI snake oil, why AI can’t fix social media, why AI isn’t an existential risk, and why we should be far more worried about what people will do with AI than about anything AI will do on its own. The book also warns of the dangers of a world where AI continues to be controlled by largely unaccountable big tech companies.

By revealing AI’s limits and real risks, AI Snake Oil will help you make better decisions about whether and how to use AI at work and home.

02 October 2024 – internal seminar

16 October 2024 – Thomas Davidson (Rutgers) & Youngjin Chae (Rutgers): « Large Language Models for Text Classification: From Zero-Shot Learning to Instruction-Tuning »

Advances in large language models (LLMs) have transformed the field of natural language processing and have enormous potential for social scientific analysis. We explore the application of LLMs to supervised text classification. As a case study, we consider stance detection and examine variation in predictive accuracy across different architectures, training regimes, and task specifications. We compare ten models ranging in size from 86 million to 1.7 trillion parameters and four distinct training regimes: prompt-based zero-shot learning; few-shot learning; fine-tuning; and instruction-tuning. The largest models generally offer the best predictive performance, but fine-tuning smaller models is a competitive solution due to their relatively high accuracy and low cost. For complex prediction tasks, instruction-tuned open-weights models can perform well, rivaling state-of-the-art commercial models. We provide recommendations for the use of LLMs for text classification in sociological research and discuss the limitations and challenges related to the use of these technologies.
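
To make the prompt-based zero-shot regime concrete, here is a minimal sketch of stance classification with an open-weights instruction model through the Hugging Face transformers pipeline. The model name, target, and label set are illustrative placeholders, not the setup used in the paper.

```python
# Sketch of prompt-based zero-shot stance detection (not the authors' code).
# Assumes the `transformers` library; the model name below is a placeholder
# for any open-weights instruction-tuned model.
from transformers import pipeline

LABELS = ["favor", "against", "neutral"]  # hypothetical label set

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

def zero_shot_stance(text: str, target: str) -> str:
    prompt = (
        "Classify the stance of the text toward the target.\n"
        f"Target: {target}\nText: {text}\n"
        f"Answer with one word among: {', '.join(LABELS)}.\nAnswer:"
    )
    out = generator(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"]
    completion = out[len(prompt):].strip().lower()
    # Fall back to 'neutral' if the model answers outside the label set.
    return next((label for label in LABELS if label in completion), "neutral")

print(zero_shot_stance("Wind farms are ruining our coastline.", "renewable energy"))
```

Few-shot learning would add labeled examples to the prompt, while fine-tuning and instruction-tuning, as the abstract describes, update the model weights on annotated or instruction-formatted data.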

30 October 2024 – internal seminar

13 November 2024 – Christopher Barrie (NYU): « Prompt Stability Scoring for Text Annotation with Large Language Models »

Researchers are increasingly using language models (LMs) for text annotation. These approaches rely only on a prompt telling the model to return a given output according to a set of instructions. The reproducibility of LM outputs may nonetheless be vulnerable to small changes in the prompt design. This calls into question the replicability of classification routines. To tackle this problem, researchers have typically tested a variety of semantically similar prompts to determine what we call « prompt stability ». These approaches remain ad hoc and task-specific. In this article, we propose a general framework for diagnosing prompt stability by adapting traditional approaches to intra- and inter-coder reliability scoring. We call the resulting metric the Prompt Stability Score (PSS) and provide a Python package, PromptStability, for its estimation. Using six different datasets and twelve outcomes, we classify >150k rows of data to: a) diagnose when prompt stability is low; and b) demonstrate the functionality of the package. We conclude by providing best practice recommendations for applied researchers.
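
The intuition behind the metric can be sketched independently of the package: annotate the same rows with several paraphrases of a prompt and score cross-prompt agreement with a standard reliability coefficient. The snippet below uses Krippendorff's alpha via the krippendorff package; it illustrates the idea only and is not the PromptStability API.

```python
# Sketch of the idea behind prompt stability scoring (not the PromptStability API):
# treat each paraphrased prompt as a "coder" and compute inter-coder reliability
# over the labels it assigns to the same rows of data.
import numpy as np
import krippendorff  # pip install krippendorff

def prompt_stability(annotations_by_prompt):
    """annotations_by_prompt: one list of integer labels per prompt paraphrase,
    all covering the same rows in the same order."""
    data = np.array(annotations_by_prompt, dtype=float)  # shape (n_prompts, n_rows)
    return krippendorff.alpha(reliability_data=data, level_of_measurement="nominal")

# Hypothetical labels produced by three paraphrased prompts over five documents.
labels = [
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
]
print(round(prompt_stability(labels), 3))  # values near 1 indicate stable prompts
```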

27 November 2024 – Oana Balalau (INRIA): « Argumentation mining and its applications to propaganda detection »

11 December 2024 – internal seminar


2023-2024

The seminars are at 5:15pm (CET) both at CREST and on Zoom.

26 June 2024 – internal seminar

12 June 2024 – Hannah Waight (NYU): « Propaganda Bias and Large Language Models »

Artificial Intelligence (AI) systems have been shown to display various social biases. While many such biases arise from content and data produced by individual internet users, we uncover a more insidious, centralized form of bias in AI – political biases that likely stem from government propaganda in the training data. Leveraging two unique datasets of Chinese propaganda news articles, we quantify the amount of propaganda in open-source training datasets for large language models (LLMs). We find large footprints of propaganda in the Chinese portions of open-source training datasets, especially for political topics. Using audit experiments with both human and machine evaluations, we document systematic differences in the output of LLMs in response to political questions – Chinese-language queries consistently generate more positive responses on Chinese political institutions and figures than the same queries in English. We further show evidence that the most widely used LLM systems to date memorize common propaganda phrases. In future versions of this paper, we will report on our pre-training experiments, demonstrating that the introduction of additional documents from the propaganda apparatus to pre-training can shape open-source LLMs to be more favorable to the Chinese government. While our evidence is primarily drawn from the Chinese case, our paper broadly introduces the possibility of propaganda bias – the potential for strategic manipulation of and unintended influence on LLMs through training data by existing political institutions.

29 May 2024 – Fabrizio Gilardi (U. Zurich): « Open-Source LLMs for Text Annotation: A Practical Guide for Model Setting and Fine-Tuning »

This paper studies the performance of open-source Large Language Models (LLMs) in text classification tasks typical for political science research. By examining tasks like stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis. Specifically, we conduct an assessment of both zero-shot and fine-tuned LLMs across a range of text annotation tasks using news article and tweet datasets. Our analysis shows that fine-tuning improves the performance of open-source LLMs, allowing them to match or even surpass zero-shot GPT-3.5 and GPT-4, though still lagging behind fine-tuned GPT-3.5. We further establish that fine-tuning is preferable to few-shot training with a relatively modest quantity of annotated text. Our findings show that fine-tuned open-source LLMs can be effectively deployed in a broad spectrum of text annotation applications. We provide a Python notebook facilitating the application of LLMs in text annotation for other researchers.
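
For orientation, the snippet below shows the generic shape of such a fine-tuning run with the Hugging Face Trainer. A small encoder stands in for the open-source LLMs evaluated in the paper, and the model name, data, and hyperparameters are placeholders rather than the authors' setup.

```python
# Generic fine-tuning sketch for a supervised annotation task (not the paper's notebook).
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

texts = ["example text one", "example text two"]   # annotated texts (placeholder)
labels = [0, 1]                                     # e.g. 0 = irrelevant, 1 = relevant

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

ds = Dataset.from_dict({"text": texts, "label": labels})
ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                    padding="max_length", max_length=128),
            batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=ds,
)
trainer.train()
```

The same recipe extends to larger open-weights models, usually with parameter-efficient fine-tuning to keep compute costs manageable.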

16 May 2024 – internal seminar

10 April 2024 – Petter Törnberg (U. Amsterdam): « Simulating Agents with LLMs »

Social media is often criticized for amplifying toxic discourse and discouraging constructive conversations. But designing social media platforms to promote better conversations is inherently challenging. This paper asks whether simulating social media through a combination of Large Language Models (LLM) and Agent-Based Modeling can help researchers study how different news feed algorithms shape the quality of online conversations. We create realistic personas using data from the American National Election Study to populate simulated social media platforms. Next, we prompt the agents to read and share news articles – and like or comment upon each other’s messages – within three platforms that use different news feed algorithms. In the first platform, users see the most liked and commented posts from users whom they follow. In the second, they see posts from all users – even those outside their own network. The third platform employs a novel « bridging » algorithm that highlights posts that are liked by people with opposing political views. We find this bridging algorithm promotes more constructive, non-toxic conversation across political divides than the other two models. Though further research is needed to evaluate these findings, we argue that LLMs hold considerable potential to improve simulation research on social media and many other complex social settings.
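
As a toy illustration of the third feed, the sketch below ranks posts by the likes they receive from users with the opposite political leaning, so that cross-cutting content rises to the top. The data structures are invented for the example and are not the paper's implementation.

```python
# Toy sketch of a "bridging" news feed: surface posts liked across the political divide.
from dataclasses import dataclass, field

@dataclass
class Post:
    author_leaning: str                 # "left" or "right"
    text: str
    likes_by_leaning: dict = field(default_factory=dict)  # e.g. {"left": 4, "right": 9}

def bridging_score(post: Post) -> int:
    # Count only likes coming from the side opposite to the author's.
    opposite = "right" if post.author_leaning == "left" else "left"
    return post.likes_by_leaning.get(opposite, 0)

def bridging_feed(posts):
    # Posts appreciated by people with opposing views float to the top.
    return sorted(posts, key=bridging_score, reverse=True)

posts = [
    Post("left", "Partisan dunk", {"left": 25, "right": 0}),
    Post("left", "Measured take on the economy", {"left": 7, "right": 9}),
]
for p in bridging_feed(posts):
    print(bridging_score(p), p.text)
```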

20 Mar 2024 – internal seminar

31 Jan 2024 – internal seminar

17 Jan 2024 – Alexander Kindel (Sciences Po médialab): « A multivariate perspective on word embedding association tests »

Word embedding association tests are a popular family of linear models for measuring conceptual associations observable in text corpora (e.g., biases, stereotypes, schemas) using word embeddings. The key quantity in such measurement models is the arithmetic mean cosine similarity (MCS) between pairs of word vectors with labels drawn from keyword lists that relate to the targeted concepts. This quantity is always distorted by the choice of keyword lists whenever the number of words in each list is greater than two. Model-based linear adjustments (e.g. controlling for word frequency) do not fix the distortion. I describe the degree of distortion in several exemplary MCS models published in computational social science, and I show how to obtain a valid metric using results from the literature on multivariate correlation. An important implication is that MCS is a valid metric for conceptual association problems only under a contradictory assumption about the relevance of the keyword lists to their target concepts.
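
For readers unfamiliar with the quantity, here is a small sketch of an MCS computation over two keyword lists using toy random embeddings. The keyword lists and vectors are placeholders, and the sketch implements only the raw mean, not the multivariate correction the talk proposes.

```python
# Sketch of arithmetic mean cosine similarity (MCS) between two keyword lists.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def mcs(embeddings, list_a, list_b):
    """Mean cosine similarity over all pairs (a, b) with a in list_a, b in list_b."""
    sims = [cosine(embeddings[a], embeddings[b]) for a in list_a for b in list_b]
    return float(np.mean(sims))

# Toy random vectors stand in for trained word embeddings.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["doctor", "nurse", "he", "she"]}
print(mcs(emb, ["doctor", "nurse"], ["he", "she"]))
```

The talk's point is that once each keyword list contains more than two words, this raw mean is distorted by the list composition itself; results from the literature on multivariate correlation are needed to recover a valid metric.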

10 Jan 2024 – internal seminar

20 Dec 2023 – internal seminar

6 Dec 2023 – Antonin Descampe & Louis Escouflaire (UC Louvain): « Analyzing Subjectivity in Journalism: A Multidisciplinary Discourse Analysis Using Linguistics, Machine Learning, and Human Evaluation »

We present the results of three experiments on subjectivity detection in French press articles. Our research lies at the crossroads of journalism studies and linguistics and aims to uncover the mechanisms of objective writing in journalistic discourse. First, we evaluated a range of linguistic features for a text classification task of news articles and opinion pieces. Then, we fine-tuned a transformer model (CamemBERT) on the same task and compared it with the feature-based model in terms of accuracy, computational cost and explainability. We used model explanation methods to extract linguistic patterns from the transformer model in order to build a more accurate and more transparent hybrid classification model. Finally, we conducted an annotation experiment in which 36 participants were tasked with highlighting “subjective elements” in 150 press articles. This allowed us to compare human-based and machine-derived insights on subjectivity, and to confront these results with journalistic guidelines on objective writing.

22 Nov 2023 – Isabelle Augenstein (University of Copenhagen): « Transparent Cross-Domain Stance Detection »

Understanding attitudes expressed in text is an important task for content moderation, market research, or the detection of false information online. Stance detection has been framed in many different ways, e.g. targets can be explicit or implicit, and contexts can range from short tweets to entire articles. Moreover, datasets differ by domain, use varying label inventories and annotation protocols, and cover different languages. This requires novel methods that can bridge domains as well as languages. In addition, for content moderation, a model that can provide a reason for a certain stance can be useful.
In this talk, I will present our research on cross-domain and cross-lingual stance detection, as well as on methods for creating transparent predictions by additionally providing explanations.

08 Nov 2023 – Yiwei Luo (Stanford University): « Othering and low prestige framing of immigrant cuisines in US restaurant reviews and large language models » (with Kristina Gligorić and Dan Jurafsky)

Identifying and understanding implicit attitudes toward food can help efforts to mitigate social prejudice due to food’s pervasive role as a marker of cultural and ethnic identity. Stereotypes about food are a form of microaggression that contribute to harmful public discourse that may in turn perpetuate prejudice toward ethnic groups and negatively impact economic outcomes for restaurants. Through careful linguistic analyses, we evaluate social theories about attitudes toward immigrant cuisine in a large-scale study of framing differences in 2.1M English language Yelp reviews of restaurants in 14 US states. Controlling for factors such as restaurant price and neighborhood racial diversity, we find that immigrant cuisines are more likely to be framed in objectifying and othering terms of authenticity (e.g., authentic, traditional), exoticism (e.g., exotic, different), and prototypicality (e.g., typical, usual), but that non-Western immigrant cuisines (e.g., Indian, Mexican) receive more othering than European cuisines (e.g., French, Italian). We further find that non-Western immigrant cuisines are framed less positively and as lower status, being evaluated in terms of affordability and hygiene. Finally, we show that reviews generated by large language models (LLMs) reproduce many of the same framing tendencies. Our results empirically corroborate social theories of taste and gastronomic stereotyping, and reveal linguistic processes by which such attitudes are reified.

25 Oct 2023 – Pierre-Carl Langlais (Head of Research, OpSic): « From operationalization to fine-tuning: building LLMs for corpus analysis in the social sciences » (in French)

For several months now, ChatGPT has faced competition from a new generation of open LLMs. Llama, Mistral, Falcon: these more compact models can be adapted to a wide variety of tasks, provided they are trained beforehand. This presentation describes early experiments in fine-tuning for the annotation of large corpora in the social sciences and humanities: literary texts, social media posts, and exchanges with public services. The widening of context windows (up to 3,000 words for Llama) and the growing sophistication of LLMs now make it possible to operationalize complex analytical categories (sarcasm, conspiracy thinking, intertextuality, diegetic time). Alongside these first results, we will also discuss the methodological issues involved in training LLMs today, including the increasingly frequent use of synthetic data.