publications | Karolina Stańczak

2024

EMNLP 2024
Benchmarking Vision Language Models for Cultural Understanding

Shravan Nayak, Kanishk Jain, Rabiul Awal, and 5 more authors

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024

Abs Bib HTML PDF

Foundation models and vision-language pre-training have notably advanced Vision Language Models (VLMs), enabling multimodal processing of visual and linguistic data. However, their performance has been typically assessed on general scene understanding - recognizing objects, attributes, and actions - rather than cultural comprehension. This study introduces CulturalVQA, a visual question-answering benchmark aimed at assessing VLM’s geo-diverse cultural understanding. We curate a diverse collection of 2378 image - question pairs with 1-5 answers per question representing cultures from 11 countries across 5 continents. The questions probe understanding of various facets of culture such as clothing, food, drinks, rituals, and traditions. Benchmarking VLMs on CulturalVQA, including GPT-4V and Gemini, reveals disparity in their level of cultural understanding across regions, with strong cultural understanding capabilities for North America while significantly weaker capabilities for Africa. We observe disparity in their performance across cultural facets too, with clothing, rituals, and traditions seeing higher performances than food and drink. These disparities help us identify areas where VLMs lack cultural understanding and demonstrate the potential of CulturalVQA as a comprehensive evaluation set for gauging VLM progress in understanding diverse cultures.
@inproceedings{nayak2024benchmarkingvisionlanguagemodels, title = {Benchmarking Vision Language Models for Cultural Understanding}, author = {Nayak, Shravan and Jain, Kanishk and Awal, Rabiul and Reddy, Siva and van Steenkiste, Sjoerd and Hendricks, Lisa Anne and Stańczak, Karolina and Agrawal, Aishwarya}, booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing}, month = nov, year = {2024}, address = {Miami, Florida, USA}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2024.emnlp-main.329}, pages = {5769--5790}, dimensions = {true} }

TACL

The Causal Influence of Grammatical Gender on Distributional Semantics

Karolina Stańczak, Kevin Du, Adina Williams, and 2 more authors

Transactions of the Association for Computational Linguistics, Nov 2024

@article{stanczak2023grammatical,
  author = {Stańczak, Karolina and Du, Kevin and Williams, Adina and Augenstein, Isabelle and Cotterell, Ryan},
  title = {The Causal Influence of Grammatical Gender on Distributional Semantics},
  year = {2024},
  shorthand = {TACL},
  journal = {Transactions of the Association for Computational Linguistics},
  url = {https://arxiv.org/abs/2311.18567},
  publisher = {MIT Press},
}

EMNLP 2024
Social Bias Probing: Fairness Benchmarking for Language Models

Marta Marchiori Manerba^*, Karolina Stańczak^*, Riccardo Guidotti, and 1 more author

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024

Abs Bib

While the impact of social biases in language models has been recognized, prior methods for bias evaluation have been limited to binary association tests on small datasets, limiting our understanding of bias complexities. This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment, which involves treating individuals differently according to their affiliation with a sensitive demographic group. We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections. SoFa expands the analysis beyond the binary comparison of stereotypical versus anti-stereotypical identities to include a diverse range of identities and stereotypes. Comparing our methodology with existing benchmarks, we reveal that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized. Benchmarking LMs on SoFa, we expose how identities expressing different religions lead to the most pronounced disparate treatments across all models. Finally, our findings indicate that real-life adversities faced by various groups such as women and people with disabilities are mirrored in the behavior of these models.
@inproceedings{marchiori-manerba-etal-2024-social, title = {Social Bias Probing: Fairness Benchmarking for Language Models}, author = {Marchiori Manerba, Marta and Stańczak, Karolina and Guidotti, Riccardo and Augenstein, Isabelle}, shorthand = {EMNLP'24a}, booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing}, month = nov, year = {2024}, address = {Miami, Florida, USA}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2024.emnlp-main.812}, pages = {14653--14671}, author+an = {2=highlight}, }

2023

SlavicNLP@EACL’23
Measuring Gender Bias in West Slavic Language Models

Sandra Martinková, Karolina Stańczak, and Isabelle Augenstein

In Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023), May 2023

Abs Bib

Pre-trained language models have been known to perpetuate biases from the underlying datasets to downstream tasks. However, these findings are predominantly based on monolingual language models for English, whereas there are few investigative studies of biases encoded in language models for languages beyond English. In this paper, we fill this gap by analysing gender bias in West Slavic language models. We introduce the first template-based dataset in Czech, Polish, and Slovak for measuring gender bias towards male, female and non-binary subjects. We complete the sentences using both mono- and multilingual language models and assess their suitability for the masked language modelling objective. Next, we measure gender bias encoded in West Slavic language models by quantifying the toxicity and genderness of the generated words. We find that these language models produce hurtful completions that depend on the subject’s gender. Perhaps surprisingly, Czech, Slovak, and Polish language models produce more hurtful completions with men as subjects, which, upon inspection, we find is due to completions being related to violence, death, and sickness.
@inproceedings{martinkova-etal-2023-measuring, title = {Measuring Gender Bias in {W}est {S}lavic Language Models}, author = {Martinkov{\'a}, Sandra and Stańczak, Karolina and Augenstein, Isabelle}, booktitle = {Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023)}, shorthand = {SlavicNLP@EACL'23}, month = may, year = {2023}, address = {Dubrovnik, Croatia}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.bsnlp-1.17}, pages = {146--154}, }
ACL 2023
Measuring Intersectional Biases in Historical Documents

Nadav Borenstein^*, Karolina Stańczak^*, Thea Rolskov, and 3 more authors

In Findings of the Association for Computational Linguistics: ACL 2023, Jul 2023

Abs Bib

Data-driven analyses of biases in historical texts can help illuminate the origin and development of biases prevailing in modern society. However, digitised historical documents pose a challenge for NLP practitioners as these corpora suffer from errors introduced by optical character recognition (OCR) and are written in an archaic language. In this paper, we investigate the continuities and transformations of bias in historical newspapers published in the Caribbean during the colonial era (18th to 19th centuries). Our analyses are performed along the axes of gender, race, and their intersection. We examine these biases by conducting a temporal study in which we measure the development of lexical associations using distributional semantics models and word embeddings. Further, we evaluate the effectiveness of techniques designed to process OCR-generated data and assess their stability when trained on and applied to the noisy historical newspapers. We find that there is a trade-off between the stability of the word embeddings and their compatibility with the historical dataset. We provide evidence that gender and racial biases are interdependent, and their intersection triggers distinct effects. These findings align with the theory of intersectionality, which stresses that biases affecting people with multiple marginalised identities compound to more than the sum of their constituents.
@inproceedings{borenstein-etal-2023-measuring, title = {Measuring Intersectional Biases in Historical Documents}, author = {Borenstein, Nadav and Stańczak, Karolina and Rolskov, Thea and Klein K{\"a}fer, Natacha and da Silva Perez, Nat{\'a}lia and Augenstein, Isabelle}, shorthand = {ACL'23}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2023}, month = jul, year = {2023}, address = {Toronto, Canada}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2023.findings-acl.170}, pages = {2711--2730}, }

AAAI 2023

A Latent-Variable Model for Intrinsic Probing

Karolina Stańczak^*, Lucas Torroba Hennigen^*, Adina Williams, and 2 more authors

Proceedings of the AAAI Conference on Artificial Intelligence, Jun 2023

Bib

@article{stanczak2023latent,
  title = {A Latent-Variable Model for Intrinsic Probing},
  volume = {37},
  url = {https://ojs.aaai.org/index.php/AAAI/article/view/26593},
  number = {11},
  shorthand = {AAAI'23},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
  author = {Stańczak, Karolina and Torroba Hennigen, Lucas and Williams, Adina and Cotterell, Ryan and Augenstein, Isabelle},
  year = {2023},
  month = jun,
  pages = {13591-13599},
}

arXiv 2023

Invisible Women in Digital Diplomacy: A Multidimensional Framework for Online Gender Bias Against Women Ambassadors Worldwide

Yevgeniy Golovchenko, Karolina Stańczak, Rebecca Adler-Nissen, and 2 more authors

arXiv:2311.17627 [cs], Jun 2023

Bib

@article{golovchenko,
  title = {Invisible Women in Digital Diplomacy: A Multidimensional Framework for Online Gender Bias Against Women Ambassadors Worldwide},
  author = {Golovchenko, Yevgeniy and Stańczak, Karolina and Adler-Nissen, Rebecca and Wangen, Patri-ce and Augenstein, Isabelle},
  shorthand = {arXiv'23},
  year = {2023},
  eprint = {2311.17627},
  archiveprefix = {arXiv},
  primaryclass = {cs.SI},
  url = {https://arxiv.org/abs/2311.17627},
  publisher = {arXiv},
  copyright = {arXiv.org perpetual, non-exclusive license},
  journal = {arXiv:2311.17627 [cs]},
}

PLOS ONE
Quantifying gender bias towards politicians in cross-lingual language models

Karolina Stańczak, Sagnik Ray Choudhury, Tiago Pimentel, and 2 more authors

PLOS ONE, Nov 2023

Abs Bib

Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language. The present paper introduces a simple method for probing language models to conduct a multilingual study of gender bias towards politicians. We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender. To this end, we curate a dataset of 250k politicians worldwide, including their names and gender. Our study is conducted in seven languages across six different language modeling architectures. The results demonstrate that pre-trained language models’ stance towards politicians varies strongly across analyzed languages. We find that while some words such as dead, and designated are associated with both male and female politicians, a few specific words such as beautiful and divorced are predominantly associated with female politicians. Finally, and contrary to previous findings, our study suggests that larger language models do not tend to be significantly more gender-biased than smaller ones.
@article{stanczak2021quantifying, author = {Stańczak, Karolina and Ray Choudhury, Sagnik and Pimentel, Tiago and Cotterell, Ryan and Augenstein, Isabelle}, journal = {PLOS ONE}, shorthand = {PLOS One 18}, publisher = {Public Library of Science}, title = {Quantifying gender bias towards politicians in cross-lingual language models}, year = {2023}, month = nov, volume = {18}, url = {https://doi.org/10.1371/journal.pone.0277640}, pages = {1-24}, number = {11}, }

2022

NAACL 2022

Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models

Karolina Stańczak, Edoardo Ponti, Lucas Torroba Hennigen, and 2 more authors

In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jul 2022

Bib Video

@inproceedings{stanczak-etal-2022-neuron,
  title = {Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models},
  author = {Stańczak, Karolina and Ponti, Edoardo and Torroba Hennigen, Lucas and Cotterell, Ryan and Augenstein, Isabelle},
  shorthand = {NAACL'22},
  booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  month = jul,
  year = {2022},
  address = {Seattle, United States},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2022.naacl-main.114},
  pages = {1589--1598},
}

PLOS ONE
Quantifying gender biases towards politicians on Reddit

Sara Marjanovic, Karolina Stańczak, and Isabelle Augenstein

PLOS ONE, Oct 2022

Abs Bib

Despite attempts to increase gender parity in politics, global efforts have struggled to ensure equal female representation. This is likely tied to implicit gender biases against women in authority. In this work, we present a comprehensive study of gender biases that appear in online political discussion. To this end, we collect 10 million comments on Reddit in conversations about male and female politicians, which enables an exhaustive study of automatic gender bias detection. We address not only misogynistic language, but also other manifestations of bias, like benevolent sexism in the form of seemingly positive sentiment and dominance attributed to female politicians, or differences in descriptor attribution. Finally, we conduct a multi-faceted study of gender bias towards politicians investigating both linguistic and extra-linguistic cues. We assess 5 different types of gender bias, evaluating coverage, combinatorial, nominal, sentimental and lexical biases extant in social media language and discourse. Overall, we find that, contrary to previous research, coverage and sentiment biases suggest equal public interest in female politicians. Rather than overt hostile or benevolent sexism, the results of the nominal and lexical analyses suggest this interest is not as professional or respectful as that expressed about male politicians. Female politicians are often named by their first names and are described in relation to their body, clothing, or family; this is a treatment that is not similarly extended to men. On the now banned far-right subreddits, this disparity is greatest, though differences in gender biases still appear in the right and left-leaning subreddits. We release the curated dataset to the public for future studies.
@article{marjanovic2022quantifying, author = {Marjanovic, Sara and Stańczak, Karolina and Augenstein, Isabelle}, journal = {PLOS ONE}, publisher = {Public Library of Science}, title = {Quantifying gender biases towards politicians on Reddit}, shorthand = {PLOS One 17}, year = {2022}, month = oct, volume = {17}, url = {https://doi.org/10.1371/journal.pone.0274317}, pages = {1-36}, number = {10}, }

2021

arXiv 2021

A Survey on Gender Bias in Natural Language Processing

Karolina Stańczak, and Isabelle Augenstein

arXiv:2112.14168 [cs], Oct 2021

Bib

@article{stanczak-etal-2021-survey,
  url = {https://arxiv.org/abs/2112.14168},
  author = {Stańczak, Karolina and Augenstein, Isabelle},
  shorthand = {arXiv'21},
  keywords = {Computer Science - Computation and Language},
  title = {A Survey on Gender Bias in Natural Language Processing},
  publisher = {arXiv},
  year = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license},
  journal = {arXiv:2112.14168 [cs]},
  primaryclass = {cs},
}