The Center for Advanced Language Proficiency Education and Research (CALPER) at the Pennsylvania State University is one of fourteen National Language Resource Centers in the United States.
"Bijankhan corpus is a tagged corpus that is suitable for natural language processing research on the Persian (Farsi) language. This collection is gathered from daily news and common texts. In this collection all documents are categorized into different subjects such as political, cultural and so on. Totally, there are 4300 different subjects."
Hamshahri is one of the most popular daily newspapers in Iran and has been published for more than 20 years. The Hamshahri corpus is a Persian test collection consisting of 345 MB of news texts from this newspaper from 1996 to 2002 (564 MB including tags). The corpus contains more than 160,000 news articles on a variety of subjects (82 categories such as politics, literature, art, and economy) and includes nearly 417,000 distinct words.
"There are more than 200 Australian Indigenous languages. Less than 20 languages are strong, and even these are endangered: the others have been destroyed, live in the memories of the elderly, or are being revived by their communities. This site has annotated links to 180 resources for about 60 languages. About 25% of these resources are produced or published by Indigenous people. "
A digital archive of recordings and texts in and about the indigenous languages of Latin America. It includes recordings of naturally occurring discourse in a wide range of genres, including narratives, ceremonies, oratory, conversations, and songs. Many of these recordings are accompanied by transcriptions and translations in Spanish, English, or Portuguese. The archive also collects materials about these languages, such as grammars, dictionaries, ethnographies, and research notes. See also Center for Indigenous Languages of Latin America (CILLA).
"The ASLLRP includes:
investigation of the syntactic structure of American Sign Language, and the relationship of syntax to semantics and prosody;
development of multimedia tools to facilitate access to and analysis of primary data for sign language research;
collaboration with computer scientists interested in problems involved in computer-based recognition and generation of signed languages."
From the Linguistics Laboratory, University of Pennsylvania, the Atlas is part of "The Telsur Project ... a survey of linguistic changes in progress in North American English, supported by the National Science Foundation and the National Endowment for the Humanities." The print edition with CD-ROM is available locally at:
-F- PE 2808 L26 2006 CD-ROM Electronic Information Center PCL 2.200
-F- PE 2808 L26 2006 TEXT Map Collection PCL Level 1
Language data, maps, and bibliographies, such as the Languages of Mexico.
A project of SIL International (formerly the Summer Institute of Linguistics). Print version: Ethnologue: languages of the world. / Ethnologue. / 14th ed. / Dallas, Tex. / 2000
P 123 G73 2000 CD-ROM PCL Reference - Electronic Information Center
P 123 G73 2000 TEXT Vols.1-2 PCL Reference Dept USE IN LIBRARY ONLY
Includes links to "University Programs of Study for Endangered Language Research" and "Organizations Engaged in Language Revitalization and Maintenance".
"The MLA Language Map uses data from the 2000 United States census to display the locations and numbers of speakers of thirty languages and three groups of less commonly spoken languages in the United States."
"The UCLA Language Materials Project (LMP) is an on-line bibliographic database of teaching and learning materials for over 100 Less Commonly Taught Languages (LCTLs)."