Keeping Endangered Languages Alive Online

Keeping Endangered Languages Alive #GV2012

Will we all converge on English and Chinese online? Or will technology help sustain our many mother tongues?

Eddie Avila, Director of Rising Voices, notes the difference between endangered languages and underrepresented languages. Three of the panelists work with the latter: African languages whose presence online is small relative to the number of people who speak them. Young people need to see their language reflected on the internet and in localized software (like OpenOffice) to understand that their language belongs to the future, and not to history.

Boukary Konate

Boukary (@fasokan, blogging at Fasokan.com) is from Mali and speaks Bambara, one of maybe fifteen languages in Mali. Bambara is spoken by some 80% of the people. The educational system begins with the mother tongue, followed by French and English. He thinks it's critical to keep his native language online, so he blogs in Bambara and French. Language is passed on in villages through literacy lessons, so Boukary has designed lessons to encourage children to write and tell stories in their native language.

Bambara depends on four special letters that Western keyboards don't feature. But the standard QWERTY keyboard has letters that Bambara does not need, like the letter Q itself, so those keys can be manually reassigned to the missing characters. Someone developed a Facebook / Twitter application that supports Bambara, allowing people to post statuses in their native tongue.
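The key-substitution idea can be sketched in a few lines of Python. This is only an illustration, not the layout Boukary or anyone else actually uses: it assumes the special letters in question include ɛ, ɔ and ɲ, and the choice of which spare key stands for which letter (including treating X and V as spare, which the panel did not mention) is hypothetical.

```python
# Hypothetical mapping from spare QWERTY keys to Bambara letters.
# Only Q is named as unneeded in the talk; X and V are assumptions.
SPARE_KEY_MAP = {
    "q": "ɛ",
    "x": "ɔ",
    "v": "ɲ",
}

# str.maketrans builds a lookup table that str.translate applies per character.
TRANSLATION_TABLE = str.maketrans(SPARE_KEY_MAP)


def remap_keys(typed: str) -> str:
    """Replace each spare key in the typed text with its assigned letter."""
    return typed.translate(TRANSLATION_TABLE)


if __name__ == "__main__":
    # Characters outside the mapping pass through unchanged.
    print(remap_keys("q x v a b"))
```

A real keyboard layout would do this at the operating-system level rather than after the fact, and would still need a fourth key or a key combination for the remaining letter.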

Might on-screen, software-based keyboards help us overcome the limitations of physical keyboards designed for dominant languages? (See ANLOC's language keyboards page for more.) Accentuate.us is another solution. It's a great Firefox extension that “allows you to type quickly and easily in more than 100 languages without extra keystrokes or a special keyboard.”


Abdoulaye Bah

Abdoulaye is a Global Voices blogger in the French-speaking group. He is from Guinea, but lives in Italy and France. Over the course of his life, he has spoken eight languages, but the one he knows least is his own, Fulani. He has used the other languages more because practical needs required them. Fula is spoken as a first or second language in as many as 18 African countries, including Guinea, where 40% of the people speak it, and Mauritania, Cameroon, Chad, and parts of Ethiopia.

Many of the problems the language faces stem from the fact that it isn't taught in school. People learn to write it through individual lessons, or not at all. Abdoulaye sees blogging as one of the only ways to keep the language alive. There are many blogs in Fulani. Online videos are another way the language is represented online, and they sidestep the problem that more people speak the language than write it.


Oliver Stegen

Oliver is originally from Germany, but has been in East Africa for many years (his mother tongue is most definitely not endangered). He came to Global Voices not as a blogger, but as a linguist with SIL. They work with mother tongues and translation.

Rangi is one of 120 languages spoken in Tanzania, and has historically been only a spoken language. The global linguistics community wanted to document a written version, so Oliver spent years in Tanzania talking to government officials and teachers to record the language. He wrote linguistic articles in academic journals, but quickly realized that this form of publication would not help this endangered language survive. Instead, he worked with teachers to establish literacy classes in villages. But the language still didn't gain much ground.

Come 2006, Oliver discovered citizen media. He began posting Facebook statuses in Rangi, to the dismay of his friends. He couldn't even convince Rangi speakers to use Rangi online, because many of their own friends spoke English. Languages appear to benefit and suffer from the same network effects as social networking sites themselves.

indigenoustweets.com Twitter List

Oliver then began writing tweets in Rangi, and discovered indigenoustweets.com, where Kevin Scannell identifies and traces tweets posted in small languages. The list represents well over 100 small languages, and the most influential tweeters in each language (as measured by followers). Oliver's Rangi tweets made the list. He provided Kevin with 150 pages of Rangi text, which was included in the software so the language can now be recognized when it is used online.
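To give a rough sense of how a body of sample text can let software recognize a language, here is a minimal sketch of one common approach, character trigram profiles compared by cosine similarity. It is not a description of how indigenoustweets.com works, which the panel did not detail. The function names are hypothetical, and the two short training strings (English and Swahili) are toy stand-ins for a real corpus like the 150 pages of Rangi text.

```python
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count overlapping three-character sequences in the text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram profiles (0.0 if either is empty)."""
    dot = sum(a[gram] * b[gram] for gram in set(a) & set(b))
    norm = (sum(v * v for v in a.values()) * sum(v * v for v in b.values())) ** 0.5
    return dot / norm if norm else 0.0


def identify(text: str, profiles: dict) -> str:
    """Return the language whose profile is most similar to the text."""
    target = trigram_profile(text)
    return max(profiles, key=lambda lang: similarity(target, profiles[lang]))


# Toy training samples (assumptions for illustration, far too small for real use).
TRAINING = {
    "english": "the quick brown fox jumps over the lazy dog and the cat",
    "swahili": "habari ya asubuhi rafiki yangu karibu nyumbani kwetu",
}
PROFILES = {lang: trigram_profile(sample) for lang, sample in TRAINING.items()}

if __name__ == "__main__":
    print(identify("karibu rafiki", PROFILES))
```

The more text a language contributes to its profile, the more reliable the match becomes, which is why a donation of real text matters so much for a small language.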

The African Network for Localization (ANLOC) organizes forums in local languages.
Oliver started a Facebook Group for Rangi speakers, and after years of trying, the group grew to 26 members within 2 hours. The earlier literacy efforts had paid off, in terms of people being able to write the language. The group now has well over 300 members, with daily usage across Tanzania and worldwide. A Rangi speaker in Miami was able to use the group to connect with a Rangi speaker from Minnesota who was traveling through on a business trip. Challenges include the special characters of the Rangi language, which are hard to represent online. Accentuate.us is a great service, but Rangi relies on unique characters.

Oliver is also on the language committee of Wikimedia and an editor at the Swahili Wikipedia. There's a debate with Swahili, which is certainly not endangered, but is underrepresented online. Global Voices has had challenges finding Swahili translators, resulting in months without fresh Swahili content on the site.

EndangeredLanguages.com screenshot of languages by location

Last week, Google launched their EndangeredLanguages.com initiative with the Alliance for Linguistic Diversity. The first challenge we face is counting endangered languages. The Google initiative lists over 3,000 languages, some of which may be underrepresented rather than endangered. Only 285 languages have a Wikipedia edition at all (much less a robust edition). The gap between Google's 3,054 languages and the 285 Wikipedia editions is the challenge of getting endangered languages online.

A group is working in Romania to localize software, and faces challenges like inventing new words for modern objects like the computer. Rising Voices offers microgrants to amplify the voices of people in underrepresented communities, and this includes three grants to people working to revive underrepresented languages:

  • Aché in Paraguay
  • Quechua in Peru
  • Powhatan, considered an extinct language (with no more active speakers), but being reconstructed and retaught by one man.

Lingual Convergence Online

It appears that online, people default to the most common and dominant languages. One person wrote on a friend's wall in Quechua, but the friend replied in Spanish. Especially on Facebook, people write in the language that the majority of their friends will understand. Oliver suggests that one way around this phenomenon is to create a dedicated space online where an underrepresented language can be practiced. If people come into the Rangi forums writing only in Swahili, they're told off. Language norms can be enforced in online fora just like other community norms.

As the world becomes more global, people seek to conform. In Africa, people are moving to cities and learning English and especially Chinese. Even languages that weren't previously endangered are becoming so, because newly urban children don't learn or use their mother tongue. We all speak English on these panels, and the concept of a lingua franca is nothing new.

An audience member shares a linguistic joke with us: Finnish is such a difficult language that when two Finns want to tell each other something quickly, they use English instead.

Another joke: Which language is closer to heaven, Hungarian or Finnish?
It doesn't matter, they both take eternity to learn.

Is automated translation an option?

Google Chrome, and now Gmail, offer inline translation. Bing provides Facebook status translation. It's not perfect, but it helps us get the gist of what the message is about. And it works OK between Hungarian and Finnish. But it's terrible for Swahili. We need many human translators to improve the automated algorithms. We're unlikely to ever have an automated option for endangered languages, as we don't even have a corpus to train the machine.

8 comments

  • Many thanks for the liveblog.

  • […] In the current disposition, beyond the enthusiasm of many Africans in the face of an avalanche of proposals by computer specialists on the possibility of writing African languages with divergent scripts, there is need to remain lucid. Indeed, thanks somehow to significant developments in translation and translation technology, the 21st century has ushered in a paradigm shift whereby linguistic diversity is no longer seen as a threat as was the case over the years with certain sociolinguistic models. Rather, the in-thing is to advocate for the preservation of all endangered languages (http://www.endangeredlanguages.com/about/#about_alliance; http://www.paradisec.org.au/blog/2012/06/elar-cracks-a-ton/; http://summit2012.globalvoicesonline.org/2012/07/keeping-endangered-languages-alive-online/). In the process, each of the estimated 7000 languages or so across the world, including well over 2500 in Africa alone, is now seen as a true investment opportunity, with many computer specialists outside Africa already positioning themselves for exploitation. However, while such investors fundamentally focus on demonstrating that hitherto unwritten endogenic languages of Africa can effectively be written via electronic and internet means, it cannot be overemphasised that the matter goes far beyond the possibility or effectiveness of writing African languages electronically. The issue is actually that of efficiency and competitive advantage. More precisely, the relevant question is whether electronic writing in this case is competitive and efficient enough in the face of writing in the existing official languages. Until now, a fact remains: the actual large scale implementation of proposals by potential or real investors in divergent script writing across Africa can hardly be obtained in the much-desired short term.
There is urgent need not simply to save most endogenic languages from further attrition and death but, most importantly, to ensure effective learning, teaching and hobbying in those languages as well as sustainable content development during and after training by grassroots users themselves. Some may want to argue that writing in languages like English, French, Spanish, German, took a very long time to gain relative stability. But then, the science of writing had hardly developed to the level it is today. With the current scientific knowledge available in matters of writing, it is possible to develop and stabilise writing systems and writing habits much faster than in the past, with the proviso that the right decisions are taken during the design stage of the writing systems. As of now, the efficient writing of an African language with scripts that diverge from the writing symbols of the official language of the setting remains an uphill task, especially in terms of digital interoperability and transferability. […]

  • […] second group of simultaneous sessions covered a range of topics. One of the sessions was about endangered languages and how citizen media can do something for their […]

  • Joesela

    I thought the establishment of ACALAN (http://www.acalan.org/) signified a continental political will which should lay the foundation for developing and implementing good policies regarding the (sustainable) development and use of indigenous African languages in all domains. But media reports from yesterday show that the Kenyan CJ has just banned the use of mother tongues in court registries, whatever that means (http://www.kenyan-post.com/2012/07/willy-mutunga-comes-up-with-yet-another.html), for the “sake” of national cohesion and unity.

  • […] Owigar (cofounder and president of Akirachix). Eddie Avila, Director of Rising Voices, hosted another panel discussion about the role of Internet in keeping smaller native languages alive. One session […]

  • […] Keeping Endangered Languages Alive Online / Rising Voices: Wiring Offgrid Villages & Preserving Language Online Given the history and location of the Caribbean, there are several languages within the region which are in danger of extinction, or in the case of some languages like Antillean Creole, in danger of extinction from some territories. The Jamaica Language Unit at the University of the West Indies Mona Campus has recorded some of these languages for posterity and posted them to YouTube: French Creole in Trinidad; Lokono (Arawak) in Guyana; Kromanti (Maroon community in Jamaica); Berbice Dutch Creole (Guyana); and Garifuna (Belize). Other languages in the region include Saramaccan, spoken by the Saramaka ('Bush Negroes') in Suriname. […]
