Programme

The workshop will be held fully online. Participation is free of charge, but registration is required. All times are in Central European Time (CET).

Download the event programme here.

9:30 AM – 9:35 AM

WELCOMING AND LOGGING IN

9:35 AM – 9:45 AM

WORKSHOP OPENING

Valentina Bambini 

Laboratory of Neurolinguistics and Experimental Pragmatics (NEPLab), University School for Advanced Studies IUSS, Pavia, Italy

9:45 AM – 10:10 AM

USING LARGE LANGUAGE MODELS TO IDENTIFY METAPHORICAL EXPRESSIONS IN TEXT

Matteo Fuoli1, Weihang Huang1, Jeannette Littlemore1, Sarah Turner2, Ellen Wilding1

1Department of Linguistics and Communication, University of Birmingham, UK

2Centre for Arts, Memory & Communities, Coventry University, UK

Metaphor is a pervasive feature of discourse, yet it remains particularly difficult to quantify reliably due to its highly context-sensitive nature. For this reason, most existing studies rely on manual corpus annotation, a process that is both time-consuming and tedious. But what if we could use large language models (LLMs) to at least partially automate this task? In this talk, we present a study in which we tested a range of LLMs and three different methodological approaches to LLM-assisted metaphor identification. Our findings show that state-of-the-art closed-source models can achieve high levels of accuracy, with fine-tuning yielding a median F1 score of 0.79. A comparison of human and model outputs further reveals that most disagreements are systematic, reflecting well-known grey areas and enduring conceptual challenges in metaphor theory.
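For readers unfamiliar with the metric reported above, F1 is the harmonic mean of precision and recall. The following is a minimal Python sketch, not the authors' evaluation code; the function name and the counts in the usage line are invented for illustration only.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall)."""
    predicted = true_pos + false_pos   # items the model labelled as metaphor
    actual = true_pos + false_neg      # items annotators labelled as metaphor
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (not taken from the study).
example = f1_score(true_pos=8, false_pos=2, false_neg=2)
```

Because F1 ignores true negatives, it suits tasks such as metaphor identification, where most tokens are non-metaphorical.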

10:10 AM – 10:35 AM

LANGUAGE MODELS AND THE MAGIC OF METAPHOR, REVISITED: A COMPARATIVE EVALUATION OF ITALIAN BABY AND LARGE LANGUAGE MODELS WITH HUMAN INTERPRETATIONS

Simone Mazzoli1, Alice Suozzi1, Gianluca E. Lebani1,2

1QuaCLing Lab, Dipartimento di Studi Linguistici e Culturali Comparati, Università Ca’ Foscari Venezia, Dorsoduro, Venice, Italy

2European Centre for Living Technology (ECLT), Ca’ Bottacin, Dorsoduro, Venice, Italy

This study investigates the metaphor comprehension abilities of Italian-trained Baby and Large Language Models (LMs) by comparing their behavior with human judgments and human-produced interpretations. Building on previous work (Mazzoli et al., 2025), which evaluated models using a log-likelihood–based multiple-choice task, we combine fixed-choice evaluation with a finer-grained, distributional approach grounded in human-generated paraphrases. In more detail, we annotated and organized a large set of human-generated interpretations into semantic clusters. We then tested whether models’ log-likelihood scores correlate with the salience and frequency of these interpretation types in human cognition. This methodology allows for a more nuanced evaluation of whether LLMs truly capture the multifaceted nature of figurative language or merely rely on surface-level statistical patterns. The results provide new insights into the alignment between the probabilistic outputs of LLMs and the collective interpretative behavior of native speakers.

10:35 AM – 11:00 AM

A NOVEL METAPHOR DATASET OF TOXIC POSTS FROM CHRISTIAN SUBREDDITS

Sebastian Reimann, Tatjana Scheffler

Department for German Language and Literature Bochum, Ruhr University Bochum, Germany

Metaphors are a linguistic strategy heavily used in toxic language, from explicit insults (“You’re a pig”) to implicit propaganda (“immigrant flood”) (Camp, 2017). We aim to provide empirical data to support the joint analysis of toxic language and metaphor. We present a dataset of posts from two Christian subreddits (r/OpenChristian and r/TrueChristian) annotated for metaphor using MIP-SFU (Hashemi et al., 2025), a modified version of the Metaphor Identification Procedure VU Amsterdam (Steen et al., 2010) developed by the Discourse Processing Lab at Simon Fraser University. The dataset is part of an effort to ultimately combine the detection of metaphor and toxic language. Specifically, we sampled posts that both received more downvotes than upvotes from other users and were assigned a toxicity score above 0.5 by the Perspective API (https://perspectiveapi.com/). In total, we annotated 3,881 tokens, of which 736 are metaphoric. We then used this dataset to evaluate two state-of-the-art BERT-based metaphor detection approaches (Choi et al., 2021; Zhang and Liu, 2023). We found that these approaches perform well on conventional toxic metaphors (e.g., “piece of shit”) but struggle with more creative metaphoric insults. The dataset is a work in progress: we aim to extend it with annotated posts that were heavily downvoted but received low toxicity scores, in order to cover toxic language that may have been made implicit through the use of metaphor.
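The sampling criterion described above (more downvotes than upvotes, and a toxicity score above 0.5) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `Post` class, its field names, and the example posts are hypothetical, and the toxicity score is assumed to have been obtained from the Perspective API beforehand.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record; field names are assumptions for illustration.
    text: str
    upvotes: int
    downvotes: int
    toxicity: float  # assumed precomputed score in [0, 1]


TOXICITY_THRESHOLD = 0.5  # threshold stated in the abstract


def sample_toxic_posts(posts: list[Post]) -> list[Post]:
    """Keep posts that are net-downvoted and scored above the threshold."""
    return [
        p for p in posts
        if p.downvotes > p.upvotes and p.toxicity > TOXICITY_THRESHOLD
    ]


# Invented examples: only the first post satisfies both conditions.
posts = [
    Post("example a", upvotes=1, downvotes=9, toxicity=0.82),
    Post("example b", upvotes=7, downvotes=2, toxicity=0.91),
    Post("example c", upvotes=0, downvotes=5, toxicity=0.12),
]
selected = sample_toxic_posts(posts)
```

The planned extension mentioned in the abstract would correspond to keeping the downvote condition while selecting low rather than high toxicity scores.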

11:00 AM – 11:15 AM

BREAK

11:15 AM – 11:40 AM

EXPLORING METAPHORS IN LLMs WITH A HUMAN-CENTRIC AND CULTURE-AWARE APPROACH

Bolette S. Pedersen, Ali Basirat, Alberto Parola, Sussi Olsen

Centre for Language Technology, Department of Nordic Studies and Linguistics, Copenhagen University, Copenhagen, Denmark

The METALLM project, funded by the Independent Research Fund Denmark from 2026, explores how LLMs interpret metaphorical language and how this interpretation relates to human cognition, linguistics, and cultural background. Drawing on empirical cognitive and linguistic data compiled in the project, we strive to provide new insights into LLMs’ inner workings when processing metaphors and to enhance transparency and explainability in AI. We thereby aim to pave the way for the development of models that do not homogenize language and culture to the extent we currently see, but are more human-centric and inclusive in terms of linguistic and cultural diversity. In our talk, we will present the ongoing data development and plans for future experiments.

11:40 AM – 12:05 PM

COMMONSENSE REASONING FOR AUTOMATIC METAPHOR ELABORATION: A CONCEPTUAL COMBINATION APPROACH

Antonio Lieto1,2, Gian Luca Pozzato3, Stefano Zoia3

1Cognition, Interaction and Intelligent Technologies Lab/DISPC, Università di Salerno, Fisciano (SA), Italy

2Cognitive Systems for Robotics, ICAR-CNR Institute, Palermo, Italy

3Dipartimento di Informatica, Università degli Studi di Torino, Turin, Italy

Conceptual combination is a cognitive mechanism applied in several contexts where human language and reasoning are involved, including metaphor elaboration. From a computational perspective, many obstacles to reproducing such a mechanism were overcome by the logic-based tool TCL (Lieto and Pozzato, 2020). We show how TCL can be applied to automatic metaphor generation and classification (Lieto et al., 2025), improving the performance of LLMs in metaphor classification with metaphorical representations that received good evaluations from human judges. We are currently working on a multimodal extension of this approach, involving metaphorical images and satirical comics.

12:05 PM – 12:20 PM

HUMMUS: A DATASET OF HUMOROUS MULTIMODAL METAPHOR USE

Xiaoyu Tong1, Zhi Zhang1, Pia Sommerauer2, Martha Lewis1, Ekaterina Shutova1

1ILLC, University of Amsterdam, the Netherlands

2Vrije Universiteit Amsterdam, the Netherlands

Metaphor and humor share a lot of common ground, and metaphor is one of the most common humorous mechanisms. This study focuses on the humorous capacity of multimodal metaphors. Taking inspiration from well-established metaphor and humor theories, we developed a novel annotation scheme for humorous multimodal metaphor use in image-caption pairs. We create the Hummus Dataset of Humorous Multimodal Metaphor Use, providing expert annotation on 1k New Yorker cartoons with humorous captions. We use this dataset to test state-of-the-art multimodal large language models on their ability to detect and understand humorous multimodal metaphor use.

12:20 PM – 12:35 PM

PARTICIPANTS AND MODELS: THE ROLE OF LLMs IN PSYCHOLINGUISTIC APPROACHES TO METAPHOR

Veronica Mangiaterra1, Hamad Al-Azary2, Chiara Barattieri di San Pietro1, Paolo Canal1, Valentina Bambini1

1Laboratory of Neurolinguistics and Experimental Pragmatics (NEPLab), Department of Humanities and Life Sciences, University School for Advanced Studies IUSS, Pavia, Italy

2Department of Humanities, Social Sciences and Communication, College of Arts and Sciences, Lawrence Technological University, Southfield, MI, USA

Large language models are increasingly employed in experimental research on language processing. In this talk, I will present two studies showcasing the complementary roles that LLMs can play in psycholinguistic research on metaphor: as participants, by approximating human judgments, and as models or quantitative tools, by providing estimates that operationalize specific theoretical assumptions. In the first study, we assessed the viability of using LLMs to generate valid and reliable ratings for metaphorical stimuli, thereby accelerating the norming stages of psycholinguistic experiments, which have traditionally relied on human participants. In the second study, we used different computational models to clarify the functional nature of electrophysiological responses to metaphors. Together, these studies support the implementation of LLMs within metaphor research, not as substitutes for human participants, who remain the reference in the study of processing, but as reliable tools to quantify stimulus features and theoretical predictions.

12:35 PM – 12:50 PM

METAMAP – MAPPING METAPHORS ACROSS LANGUAGES AND CULTURES

Ginevra Martinelli, Chiara Barattieri di San Pietro, Maddalena Bressler, Veronica Mangiaterra, Valentina Bambini

Laboratory of Neurolinguistics and Experimental Pragmatics (NEPLab), Department of Humanities and Life Sciences, University School for Advanced Studies IUSS, Pavia, Italy

Metaphors bridge language, thought, and culture, straddling the border between cognitively driven mechanisms and culturally specific conceptual patterns. To examine this intersection at scale and assess cross-linguistic variation in metaphorical associations, we introduce the MetaMap project, which leverages Large Language Models to systematically detect and categorize metaphorical expressions across languages, including those typically underrepresented. In this talk, I will present findings from a large-scale corpus analysis including 30 languages from nine linguistic families, with a focus on emotional metaphors. Specifically, we identified shared and culture-specific metaphorical mappings, the range of metaphors used to describe each emotional concept, and the factors driving cross-linguistic variation.

12:50 PM – 1:00 PM

FINAL REMARKS

The event is organized by the Neurolinguistics and Experimental Pragmatics Lab at IUSS Pavia and is supported by the ERC project “PROcessing MEtaphors: Neurochronometry, Acquisition and DEcay” (PROMENADE). For information about the event, e-mail: ginevra.martinelli@iusspavia.it