Call for Papers
In the evolving landscape of AI, NLP advancements have potential across diverse sectors, yet underserved communities often miss out on these benefits. Because of limited resources, limited computational models, or limited commercial interest, many languages, including Indigenous languages, regional dialects, and languages spoken by smaller populations, remain unsupported by advanced NLP technologies. Examples include Yoruba, Igbo, Native American languages, and minority languages in multilingual countries such as India, China, and Indonesia.
This workshop seeks to advance NLP for underserved communities, addressing unique challenges in deploying language models (LMs) to enable equitable, sustainable, and culturally sensitive NLP technologies. Our framework centers on three key pillars:
- AI Governance: Developing legal and ethical frameworks for NLP, ensuring fairness, transparency, and respect for data sovereignty and cultural rights.
- Cultural NLP: Building culturally nuanced models that understand and preserve language-specific terms and values, safeguarding linguistic diversity.
- Sustainable NLP: Creating resource-efficient, scalable models suitable for low-resource and environmentally constrained contexts.
We invite long and short papers on topics including, but not limited to, the following:
- Democratization: Democratization of AI, open-access data & models, community-driven LLMs
- Data Sovereignty: Data ownership & protection for language data
- Legal Concerns: Ethical LLMs, privacy in language data, intellectual property for LLMs
- Preserving Diversity in LLMs: Human-centric data collection for endangered languages, LLMs for minority languages
- Preserving Cultural Norms: Encoding cultural norms, training & evaluating culturally specific LLMs
- Efficient LMs for Broader Accessibility: Efficient LLMs, model compression, distributed computing, transfer learning, and knowledge distillation
This workshop will explore cutting-edge research and methodologies to develop and deploy LLMs for underserved communities, from training to deployment. By gathering insights from AI governance, cultural NLP, and sustainable NLP, we aim to create inclusive, impactful NLP technologies.
Submission Details
- Submission Deadline: February 7, 2025 (originally January 30, 2025)
- Pre-reviewed (ARR) Submission Deadline: February 20, 2025
- Notification of Acceptance: March 01, 2025
- Workshop Date: May 04, 2025
- Submission Format: We welcome long papers (8 pages) and short papers (4 pages), excluding references. Submissions are double-blind and must follow the ARR guidelines.
We accept both archival and non-archival submissions. Accepted papers that choose the archival track will be published in the NAACL workshop proceedings and archived on the workshop website. For inquiries, please contact the workshop organizers through this email: lm4uc.organizers (at) gmail.com.
We look forward to contributions that drive innovation and inclusivity in NLP, supporting underserved communities globally.
Online Participation
We have set up the Gathertown platform for online participation. You can join the workshop using the link below: Join Gathertown
Proceedings
The Proceedings of the 1st Workshop on Language Models for Underserved Communities (LM4UC 2025) is published on the ACL Anthology.
List of Speakers
List of Organizers
Schedule
| Time | Event |
|---|---|
| 09:00am - 09:30am | (In-person) LM4UC Opening Remarks: Alice Oh |
| 09:30am - 10:00am | (In-person) LM4UC Keynote 1: David Ifeoluwa Adelani |
| 10:00am - 10:30am | (In-person) LM4UC Keynote 2: Cynthia Bailey |
| 10:30am - 11:00am | (Hybrid) Structured Networking Event + Tea Break |
| 11:00am - 11:30am | (In-person) LM4UC Keynote 3: Genta Winata |
| 11:30am - 12:00pm | (Virtual) LM4UC Keynote 4: Timothy Baldwin |
| 12:00pm - 12:30pm | (Virtual) LM4UC Keynote 5: Pratyusha Ria Kalluri |
| 12:30pm - 01:00pm | (Hybrid) Structured Networking Event + Lunch Break |
| 01:00pm - 01:50pm | (Hybrid) LM4UC Panel Discussion by Angelina Wang |
| 01:50pm - 03:30pm | (Hybrid) Student Oral |
| 03:30pm - 04:30pm | (Hybrid) Poster Session |
| 04:30pm - 05:00pm | (Virtual) LM4UC Conclusion and Award: Sanmi Koyejo |
Awards
Best Paper Award: Enhancing Small Language Models for Cross-Lingual Generalized Zero-Shot Classification with Soft Prompt Tuning by Fred Philippy, Siwen Guo, Cedric Lothritz, Jacques Klein, Tegawendé F. Bissyandé
Honorable Mention: Direct Preference Optimization With Unobserved Preference Heterogeneity by Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis
Accepted Papers
| Title | Authors | Presentation |
|---|---|---|
| Enhance Contextual Learning in ASR for Endangered Low-resource Languages | Zhaolin Li, Jan Niehues | Poster |
| Empowering Low-Resource Languages: TraSe Architecture for Enhanced Retrieval-Augmented Generation in Bangla | Atia Shahnaz Ipa, Mohammad Abu Tareq Rony, Mohammad Shariful Islam | Poster |
| ABDUL: a new Approach to Build language models for Dialects Using formal Language corpora only | Yassine Toughrai, Kamel Smaïli, David Langlois | Oral |
| Untangling the Influence of Typology, Data and Model Architecture on Ranking Transfer Languages for Cross-Lingual POS Tagging | Enora Rice, Ali Marashian, Hannah J. Haynie, Katharina von der Wense, Alexis Palmer | Oral |
| Serving the Underserved: Leveraging BARTBahnar Language Model for Bahnaric-Vietnamese Translation | Long Nguyen, Tran Le, Huong Nguyen, Quynh Vo, Phong Nguyen, Tho Quan | Poster |
| Caption generation in Cultural Heritage: Crowdsourced Data and Tuning Multimodal Large Language Models | Artem Reshetnikov, Maria-Cristina Marinescu | Oral |
| Preserving Cultural Identity with Context-Aware Translation Through Multi-Agent AI Systems | Mahfuz Ahmed Anik, Abdur Rahman, Azmine Toushik Wasi, Md Manjurul Ahsan | Oral |
| Enhancing Small Language Models for Cross-Lingual Generalized Zero-Shot Classification with Soft Prompt Tuning | Fred Philippy, Siwen Guo, Cedric Lothritz, Jacques Klein, Tegawendé F. Bissyandé | Oral |
| Cognate and Contact-Induced Transfer Learning for Hamshentsnag: A Low-Resource and Endangered Language | Onur Keleş, Baran Günay, Berat Doğan | Oral |
| PALbot+: An Empathetic Dialogue System for the Marginalized Populations in Korea with AI Safety Features | Suyeon Lee, Hye Jin Lee, Hyunjong Kim, Sohhyung Park, Sungzoon Cho | Poster |
| Nayana OCR: A Scalable Framework for Document OCR in Low-Resource Languages | Adithya S Kolavi, Vyoman Jain, Samarth P | Poster |
| On Tables with Numbers, with Numbers | Konstantinos Kogkalidis, Stergios Chatzikyriakidis | Oral |
| Direct Preference Optimization With Unobserved Preference Heterogeneity | Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis | Oral |
| La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America | María Grandury, Javier Aula-Blasco, Júlia Falcão, et al. | Poster |