Call for Papers

In the evolving landscape of AI, advances in NLP hold promise across diverse sectors, yet underserved communities often miss out on these benefits. Because of limited resources, a lack of suitable computational models, or limited commercial interest, many languages, including Indigenous languages, regional dialects, and languages spoken by smaller populations, remain unsupported by advanced NLP technologies. Examples include Yoruba, Igbo, Native American languages, and minority languages in multilingual countries such as India, China, and Indonesia.

This workshop seeks to advance NLP for underserved communities, addressing unique challenges in deploying language models (LMs) to enable equitable, sustainable, and culturally sensitive NLP technologies. Our framework centers on three key pillars:

AI Governance: Developing legal and ethical frameworks for NLP, ensuring fairness, transparency, and respect for data sovereignty and cultural rights.
Cultural NLP: Building culturally nuanced models that understand and preserve language-specific terms and values, safeguarding linguistic diversity.
Sustainable NLP: Creating resource-efficient, scalable models suitable for low-resource and environmentally constrained contexts.

We invite full and short papers on topics including, but not limited to, the following:

Democratization: Democratization of AI, open-access data & models, community-driven LLMs
Data Sovereignty: Data ownership & protection for language data
Legal Concerns: Ethical LLMs, privacy in language data, intellectual property for LLMs
Preserving Diversity in LLMs: Human-centric data collection for endangered languages, LLMs for minority languages
Preserving Cultural Norms: Encoding cultural norms, training & evaluating culturally specific LLMs
Efficient LMs for Broader Accessibility: Efficient LLMs, model compression, distributed computing, transfer learning, and knowledge distillation

This workshop will explore current research and methods for building LLMs for underserved communities, from training through deployment. By bringing together insights from AI governance, cultural NLP, and sustainable NLP, we aim to foster inclusive, impactful NLP technologies.

Submission Details

Submission Deadline: February 7, 2025 (extended from January 30, 2025)
Pre-reviewed (ARR) Submission Deadline: February 20, 2025
Notification of Acceptance: March 01, 2025
Workshop Date: May 04, 2025
Submission Format: We welcome long papers (8 pages) and short papers (4 pages), excluding references. Submissions are double-blind and must follow the ARR guidelines.

We accept both archival and non-archival submissions. Accepted papers on the archival track will be published in the NAACL workshop proceedings and archived on the workshop website. For inquiries, please contact the workshop organizers at lm4uc.organizers (at) gmail.com.

We look forward to contributions that drive innovation and inclusivity in NLP, supporting underserved communities globally.

Online Participation

We have set up the Gathertown platform for online participation. You can join the workshop using the link below: Join Gathertown

Proceedings

The Proceedings of the 1st Workshop on Language Models for Underserved Communities (LM4UC 2025) are published in the ACL Anthology.

Schedule

Time	Event
09:00am - 09:30am	(In-person) LM4UC Opening Remarks: Alice Oh
09:30am - 10:00am	(In-person) LM4UC Keynote 1: David Ifeoluwa Adelani
10:00am - 10:30am	(In-person) LM4UC Keynote 2: Cynthia Bailey
10:30am - 11:00am	(Hybrid) Structured Networking Event + Tea Break
11:00am - 11:30am	(In-person) LM4UC Keynote 3: Genta Indra Winata
11:30am - 12:00pm	(Virtual) LM4UC Keynote 4: Timothy Baldwin
12:00pm - 12:30pm	(Virtual) LM4UC Keynote 5: Pratyusha Ria Kalluri
12:30pm - 01:00pm	(Hybrid) Structured Networking Event + Lunch Break
01:00pm - 01:50pm	(Hybrid) LM4UC Panel Discussion by Angelina Wang
01:50pm - 03:30pm	(Hybrid) Student Oral Presentations
03:30pm - 04:30pm	(Hybrid) Poster Session
04:30pm - 05:00pm	(Virtual) LM4UC Conclusion and Awards: Sanmi Koyejo

Awards

Best Paper Award: Enhancing Small Language Models for Cross-Lingual Generalized Zero-Shot Classification with Soft Prompt Tuning by Fred Philippy, Siwen Guo, Cedric Lothritz, Jacques Klein, Tegawendé F. Bissyandé

Honorable Mention: Direct Preference Optimization With Unobserved Preference Heterogeneity by Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis

Accepted Papers

Title	Authors	Presentation
Enhance Contextual Learning in ASR for Endangered Low-resource Languages	Zhaolin Li, Jan Niehues	Poster
Empowering Low-Resource Languages: TraSe Architecture for Enhanced Retrieval-Augmented Generation in Bangla	Atia Shahnaz Ipa, Mohammad Abu Tareq Rony, Mohammad Shariful Islam	Poster
ABDUL: a new Approach to Build language models for Dialects Using formal Language corpora only	Yassine Toughrai, Kamel Smaïli, David Langlois	Oral
Untangling the Influence of Typology, Data and Model Architecture on Ranking Transfer Languages for Cross-Lingual POS Tagging	Enora Rice, Ali Marashian, Hannah J. Haynie, Katharina von der Wense, Alexis Palmer	Oral
Serving the Underserved: Leveraging BARTBahnar Language Model for Bahnaric-Vietnamese Translation	Long Nguyen, Tran Le, Huong Nguyen, Quynh Vo, Phong Nguyen, Tho Quan	Poster
Caption generation in Cultural Heritage: Crowdsourced Data and Tuning Multimodal Large Language Models	Artem Reshetnikov, Maria-Cristina Marinescu	Oral
Preserving Cultural Identity with Context-Aware Translation Through Multi-Agent AI Systems	Mahfuz Ahmed Anik, Abdur Rahman, Azmine Toushik Wasi, Md Manjurul Ahsan	Oral
Enhancing Small Language Models for Cross-Lingual Generalized Zero-Shot Classification with Soft Prompt Tuning	Fred Philippy, Siwen Guo, Cedric Lothritz, Jacques Klein, Tegawendé F. Bissyandé	Oral
Cognate and Contact-Induced Transfer Learning for Hamshentsnag: A Low-Resource and Endangered Language	Onur Keleş, Baran Günay, Berat Doğan	Oral
PALbot+: An Empathetic Dialogue System for the Marginalized Populations in Korea with AI Safety Features	Suyeon Lee, Hye Jin Lee, Hyunjong Kim, Sohhyung Park, Sungzoon Cho	Poster
Nayana OCR: A Scalable Framework for Document OCR in Low-Resource Languages	Adithya S Kolavi, Vyoman Jain, Samarth P	Poster
On Tables with Numbers, with Numbers	Konstantinos Kogkalidis, Stergios Chatzikyriakidis	Oral
Direct Preference Optimization With Unobserved Preference Heterogeneity	Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis	Oral
La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America	María Grandury, Javier Aula-Blasco, Júlia Falcão, et al.	Poster

NAACL 2025 Workshop Language Models for Underserved Communities

Albuquerque, New Mexico

May 04, 2025

List of Speakers

Timothy Baldwin

MBZUAI

Cynthia Bailey

Stanford

Pratyusha Ria Kalluri

Stanford

David Ifeoluwa Adelani

McGill

Genta Indra Winata

Capital One

List of Organizers

Sang Truong

Stanford

Rifki Afina Putri

KAIST

Duc Nguyen

VNU-HCMUT

Angelina Wang

Stanford

Daniel Ho

Stanford

Alice Oh

KAIST

Sanmi Koyejo

Stanford
