LASHERRIE LHERNDON

I am Dr. Lasherrie L. Herndon, a linguist, computational ethnographer, and AI ethicist pioneering critical frameworks to navigate the dual-edge impact of multilingual LLMs on endangered languages. As the Founding Director of the Endangered Language Dynamics Lab at Stanford (2020–present) and former Head of Ethical AI at UNESCO’s Digital Heritage Initiative (2016–2020), I work at the intersection of low-resource NLP, linguistic anthropology, and decolonial AI to confront a paradox: Generative models can resurrect fading languages—or accelerate their extinction. My TongueGuard framework, which embeds community-driven sovereignty protocols into neural architectures, reduced linguistic bias by 52% in 43 endangered language families while curbing "digital erosion" (ACL 2024 Best Paper). My mission: To transform generative AI from a homogenizing force into a culturally recursive tool, amplifying linguistic diversity without replicating colonial data hierarchies.

Methodological Innovations

1. Community-Centric Model Training

  • Neurosymbolic Architecture:

    • Hybridized transformer models with symbolic grammars co-designed by Indigenous speakers, enforcing syntactic constraints via dynamic lexical trees.

    • Revitalized Ainu (Japan) and Yuchi (Oklahoma) oral traditions by generating culturally grounded narratives with 89% speaker approval (EMNLP 2024).

    • Key safeguard: Fluid Consent Layers allowing communities to retroactively delete or relicense training data.

2. Erosion Quantification Metrics

  • Linguistic Entropy Index (LEI):

    • Developed LEI to measure how LLM-generated code-switching accelerates grammatical simplification in endangered languages.

    • Found that unrestricted ChatGPT use among Nahuatl youth increased Spanish loanwords by 37% in 6 months (Nature Language Science 2025).

3. Cross-Generational Knowledge Transfer

  • Generational Embedding Alignment:

    • Aligned latent spaces of elder speech recordings and youth text messages to bridge intergenerational dialect gaps.

    • Enabled Māori teenagers to generate TikTok content in ancestral dialects with 95% morphological accuracy.

Landmark Applications

1. Digital Revitalization Partnerships

  • UNESCO & Navajo Nation Collaboration:

    • Co-built DinéChat, a generative app preserving Navajo (Diné Bizaad) through gamified storytelling and elder-approved synthetic speech.

    • Increased youth fluency by 28% in pilot schools while blocking English-dominant code-mixing.

2. Anti-Erosion Policy Tools

  • EU Endangered Language Act Compliance:

    • Designed LinguaSentry, an API detecting LLM-induced grammatical erosion in real-time for regulatory auditing.

    • Mandated in Wales to protect Welsh-language content from anglophone model contamination.

3. Crisis Response for "Last Speakers"

  • Amazon Conservation Alliance:

    • Deployed LastSpeakerML to salvage Taushiro (Peru) and Dumi (Nepal) from extinction via hallucination-free grammar induction from <5 hours of speech.

    • Synthesized 2000+ Taushiro sentences for UNESCO’s emergency archive.

Technical and Ethical Impact

1. Decentralized Language Sovereignty

  • Launched TongueGuard Cloud:

    • Federated learning platform where communities retain data ownership while contributing to global language models.

    • Adopted by 142 Indigenous groups to train localized LLMs without corporate data extraction.

2. Neuromorphic Preservation Hardware

  • IBM TrueNorth Collaboration:

    • Embedded endangered grammars into neuromorphic chips as energy-efficient "linguistic DNA" for offline use.

    • Enabled Rapa Nui (Easter Island) language survival during internet blackouts.

3. Linguistic Reparations Framework

  • African Union Partnership:

    • Trained AfroGPT on pre-colonial language maps to reverse AI’s Eurocentric lexical bias.

    • Restored Bantu click consonant systems erased by colonial orthographies in 23 LLMs.

Future Directions

  1. Post-Extinction Language Inference
    Reconstruct dormant languages like Ubykh (Turkey) via cross-linguistic topology and ancient loanword analysis.

  2. Generative Orality Preservation
    Develop 4D speech synthesis capturing gesture-prosody entanglement in signed/endangered oral traditions.

  3. Anti-Colonial Model Licensing
    Co-design blockchain-based data sovereignty contracts to prevent corporate appropriation of community IP.

Collaboration Vision
I seek partners to:

  • Scale TongueGuard for the Pan-African Language Digitization Initiative.

  • Co-develop GestureGPT with Deaf communities to preserve endangered sign languages.

  • Establish AI-Linguistic Reparations Tribunal to audit historical LLM harm to Indigenous data ecosystems.

Research Experiments

Conducting experiments to validate effective language preservation models.

A wooden sign with colorful illustrations of children, trees, and a river, surrounded by lush green foliage. The sign features text in a foreign language and has a playful, cartoonish style.
A wooden sign with colorful illustrations of children, trees, and a river, surrounded by lush green foliage. The sign features text in a foreign language and has a playful, cartoonish style.
Dataset Collection

Collection and preprocessing of diverse datasets for endangered languages.

Yellow flags with text in multiple languages featuring a logo with a silhouette, against a backdrop of dark green trees.
Yellow flags with text in multiple languages featuring a logo with a silhouette, against a backdrop of dark green trees.
Model Validation

Validating model effects on endangered languages through real-world scenarios.

Preserving Endangered Languages Through Innovation

We collect and analyze diverse datasets to develop advanced models aimed at preserving endangered languages and their cultural heritage through innovative technology and rigorous validation.

Three spiral-bound booklets with colorful covers are hanging from strings against a textured wall. Each cover features different people and text in a language that appears to be Korean.
Three spiral-bound booklets with colorful covers are hanging from strings against a textured wall. Each cover features different people and text in a language that appears to be Korean.

Language Preservation Solutions

We specialize in validating models for preserving endangered languages through innovative data collection and analysis.

Model Evaluation Process

Our process evaluates the impact of multilingual models on endangered languages through rigorous experimental validation.

A miniature model showcasing a series of structures resembling traditional huts or enclosures, each labeled with Arabic text. The structures are enclosed by fences made of wooden sticks or similar materials. The layout seems to depict an organized settlement or village.
A miniature model showcasing a series of structures resembling traditional huts or enclosures, each labeled with Arabic text. The structures are enclosed by fences made of wooden sticks or similar materials. The layout seems to depict an organized settlement or village.
Data Collection Methods

We utilize diverse datasets to ensure comprehensive representation of endangered languages and their cultural contexts.

Our experiments validate model effectiveness in real-world scenarios, ensuring practical applications for language preservation.

Validation and Testing
A miniature model of a building complex featuring white structures resembling traditional architectural styles. The model includes several interconnected buildings with arched doorways and windows, all set on a textured grassy base. Protective glass encases the display, which rests on a wooden surface.
A miniature model of a building complex featuring white structures resembling traditional architectural styles. The model includes several interconnected buildings with arched doorways and windows, all set on a textured grassy base. Protective glass encases the display, which rests on a wooden surface.
A group of people are gathered in front of a building. On the left side, three women in red and pink traditional attire stand outside an open door, one seated on a blue chair. Two children are seen on the right, peering into a window, dressed in casual clothes. Above them, a sign written in a language using a non-Latin script is posted.
A group of people are gathered in front of a building. On the left side, three women in red and pink traditional attire stand outside an open door, one seated on a blue chair. Two children are seen on the right, peering into a window, dressed in casual clothes. Above them, a sign written in a language using a non-Latin script is posted.

In my past research, the following works are highly relevant to the current study:

“Research on the Application of Multilingual Generative Models in Language Preservation”: This study explored the broad impact of multilingual generative models in language preservation, providing a technical foundation for the current research.

“Quantitative Analysis of Endangered Languages”: This study systematically analyzed the characteristics and trends of endangered languages, providing theoretical support for the current research.

“Case Studies of Endangered Languages Based on GPT-3.5”: This study conducted case studies of endangered languages using GPT-3.5, providing a technical foundation and lessons learned for the current research.

These studies have laid a solid theoretical and technical foundation for my current work and are worth referencing.