Linguistic Data Consortium
The Linguistic Data Consortium at the University of Pennsylvania supports language-related education, research and technology development by creating and sharing linguistic resources: data, tools and standards.
Corpora licensed by the UT Libraries include selections from LDC catalog years 2008, 2007, 2006, 2005, 2004, 2003, 1997, 1996, 1995 and 1994, and are available for checkout to current UT students, faculty and staff. Request these CDs and DVDs by call number at the Electronic Information Center of the Perry-Castaneda Library. The proctor's desk is at the left inside the glass doors on the right side of the PCL Lobby. For information about using LDC data see http://www.ldc.upenn.edu/Using/. Links by title go to the LDC catalog description.
- LDC2008T01 Hungarian-English Parallel Text, Version 1.0 (1 CD)
CDROM 2430 Electronic Information Center PCL 2.200
- LDC2008T04 OntoNotes Release 2.0 (1 DVD)
DVDROM 77 Electronic Information Center PCL 2.200
- LDC2008T05 Penn Discourse Treebank Version 2 (1 CD)
CDROM 2421 Electronic Information Center PCL 2.200
- LDC2008T15 North American News Text, Complete (1 DVD)
DVDROM 124 Electronic Information Center PCL 2.200
- LDC2008T19 The New York Times Annotated Corpus (1 DVD)
DVDROM 123 Electronic Information Center PCL 2.200
- LDC2007S01 Levantine Arabic Conversational Telephone Speech (2 DVDs)
DVDROM 73 Electronic Information Center PCL 2.200
- LDC2007S08 CSLU: Foreign Accented English Release 1.2 (1 DVD)
DVDROM 86 Electronic Information Center PCL 2.200
- LDC2007S09 Mandarin Affective Speech (1 DVD)
DVDROM 87 Electronic Information Center PCL 2.200
- LDC2007S13 CSLU: Apple Words and Phrases (1 DVD)
DVDROM 88 Electronic Information Center PCL 2.200
- LDC2007S15 Nationwide Speech Project (1 DVD)
DVDROM 89 Electronic Information Center PCL 2.200
- LDC2007S18 CSLU: Kids' Speech Version 1.1 (3 DVDs)
DVDROM 90 Electronic Information Center PCL 2.200
- LDC2007T01 Levantine Arabic Conversational Telephone Speech, Transcripts (1 CD)
CDROM 2345 Electronic Information Center PCL 2.200
- LDC2007T02 English Chinese Translation Treebank v 1.0 (1 CD)
CDROM 2346 Electronic Information Center PCL 2.200
- LDC2007T07 English Gigaword Third Edition (2 DVDs)
DVDROM 74 Electronic Information Center PCL 2.200
- LDC2007T08 ISI Arabic-English Automatically Extracted Parallel Text (1 CD)
CDROM 2347 Electronic Information Center PCL 2.200
- LDC2007T09 ISI Chinese-English Automatically Extracted Parallel Text (1 CD)
CDROM 2348 Electronic Information Center PCL 2.200
- LDC2007T22 2001 Topic Annotated Enron Email Data Set (1 DVD)
CDROM 2429 Electronic Information Center PCL 2.200
- LDC2007T23 GALE Phase 1 Chinese Broadcast News Parallel Text - Part 1 (1 CD)
CDROM 2349 Electronic Information Center PCL 2.200
- LDC2007T24 GALE Phase 1 Arabic Broadcast News Parallel Text - Part 1 (1 CD)
CDROM 2350 Electronic Information Center PCL 2.200
- LDC2007T38 Chinese Gigaword Third Edition (1 DVD)
DVDROM 75 Electronic Information Center PCL 2.200
- LDC2007T40 Arabic Gigaword Third Edition (1 DVD)
DVDROM 76 Electronic Information Center PCL 2.200
- LDC2006S13 N4 NATO Native and Non-Native Speech (1 DVD)
DVDROM 78 Electronic Information Center PCL 2.200
- LDC2006S14 CSLU: Stories v 1.2 (1 CD)
CDROM 2427 Electronic Information Center PCL 2.200
- LDC2006S29 Levantine Arabic QT Training Data Set 5, Speech (3 DVDs)
DVDROM 64 Electronic Information Center PCL 2.200
- LDC2006S31 NIST 2003 Language Recognition Evaluation (1 DVD)
DVDROM 65 Electronic Information Center PCL 2.200
- LDC2006S34 Russian through Switched Telephone Network (1 DVD)
DVDROM 66 Electronic Information Center PCL 2.200
- LDC2006S35 CSLU: Multilanguage Telephone Speech Version 1.2 (1 DVD)
DVDROM 79 Electronic Information Center PCL 2.200
- LDC2006S36 West Point Korean Speech (2 DVDs)
DVDROM 67 Electronic Information Center PCL 2.200
- LDC2006S37 West Point Heroico Spanish Speech (1 DVD)
DVDROM 68 Electronic Information Center PCL 2.200
- LDC2006T01 Prague Dependency Treebank 2.0 (1 CD)
CDROM 2428 Electronic Information Center PCL 2.200
- LDC2006T06 ACE 2005 Multilingual Training Corpus (1 DVD)
DVDROM 69 Electronic Information Center PCL 2.200
- LDC2006T07 Levantine Arabic QT Training Data Set 5, Transcripts (1 CD)
CDROM 2343 Electronic Information Center PCL 2.200
- LDC2006T12 Spanish Gigaword First Edition (1 DVD)
DVDROM 70 Electronic Information Center PCL 2.200
- LDC2006T13 Web 1T 5-gram Version 1 (6 DVDs)
DVDROM 80 Electronic Information Center PCL 2.200
- LDC2006T17 French Gigaword First Edition (1 DVD)
DVDROM 71 Electronic Information Center PCL 2.200
- LDC2006T18 TDT5 Multilingual Text (1 DVD)
DVDROM 72 Electronic Information Center PCL 2.200
- LDC2006T19 TDT5 Topics and Annotations (1 CD)
CDROM 2344 Electronic Information Center PCL 2.200
- LDC2005S14 Levantine Arabic QT Training Data Set 4 (speech + transcripts) (2 DVDs)
DVDROM 36 DISCS 1-2 Electronic Information Center PCL 2.200
- LDC2005T01 Chinese Treebank 5.0 (1 CD)
CDROM 1785 Electronic Information Center PCL 2.200
- LDC2005T02 Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis) (1 DVD)
DVDROM 26 Electronic Information Center PCL 2.200
- LDC2005T06 Chinese News Translation Text Part 1 (1 CD)
CDROM 1787 Electronic Information Center PCL 2.200
- LDC2005T07 ACE Time Normalization (TERN) 2004 English Training Data V1.0 (1 CD)
CDROM 1796 Electronic Information Center PCL 2.200
- LDC2005T08 Discourse Graphbank (1 CD)
CDROM 1797 Electronic Information Center PCL 2.200
- LDC2005T09 ACE 2004 Multilingual Training Corpus (1 CD)
CDROM 1798 PCL Reference - Electronic Information
- LDC2005T12 English Gigaword Second Edition (2 DVDs)
DVDROM 33 DISCS 1-2 Electronic Information Center PCL 2.200
- LDC2005T13 CCGbank (1 DVD)
DVDROM 29 Electronic Information Center PCL 2.200
- LDC2005T16 TDT4 Multilingual Text and Annotations (1 DVD)
DVDROM 34 Electronic Information Center PCL 2.200
- LDC2005T20 Arabic Treebank: Part 3 (full corpus) v2.0 (MPG + syntactic analysis) (1 DVD)
DVDROM 35 Electronic Information Center PCL 2.200
- LDC2005S25 Santa Barbara Corpus of Spoken American English Part-IV (1 DVD)
DVDROM 37 Electronic Information Center PCL 2.200
- LDC2005T28 HARD 2004 Text (1 DVD)
DVDROM 38 Electronic Information Center PCL 2.200
- LDC2005T29 HARD 2004 Topics and Annotations (1 CD)
CDROM 1915 Electronic Information Center PCL 2.200
- LDC2005T30 Arabic Treebank: Part 4 v1.0 (MPG annotation) (1 CD)
CDROM 1914 Electronic Information Center PCL 2.200
- LDC2005T33 BBN Pronoun Coreference and Entity Type Corpus (1 CD)
CDROM 1916 Electronic Information Center PCL 2.200
- LDC2004S09 NIST Meeting Pilot Corpus Speech (9 DVDs)
DVDROM 28 NOs. 1-9 Electronic Information Center PCL 2.200
- LDC2004S13 Fisher English Training Speech Part 1 Speech (7 DVDs)
DVDROM 24 NOs. 1-7 Electronic Information Center PCL 2.200
- LDC2004T02 Arabic Treebank: Part 2 v 2.0 (1 CD)
CDROM 1782 Electronic Information Center PCL 2.200
- LDC2004T05 Chinese Treebank Version 4.0 (1 CD)
CDROM 1784 Electronic Information Center PCL 2.200
- LDC2004T07 Multiple-Translation Chinese (MTC) Part 3 (1 CD)
CDROM 1786 Electronic Information Center PCL 2.200
- LDC2004T09 TIDES Extraction (ACE) 2003 Multilingual Training Data (1 CD)
CDROM 1788 Electronic Information Center PCL 2.200
- LDC2004T11 Arabic Treebank: Part 3 v 1.0 (1 CD)
CDROM 1783 Electronic Information Center PCL 2.200
- LDC2004T12 MDE RT-03 Training Data Text and Annotations (1 DVD)
DVDROM 27 Electronic Information Center PCL 2.200
- LDC2004T13 NIST Meeting Pilot Corpus Transcripts and Metadata (1 CD)
CDROM 1789 Electronic Information Center PCL 2.200
- LDC2004T14 Proposition Bank I (1 CD)
CDROM 1790 Electronic Information Center PCL 2.200
- LDC2004T15 2000 Communicator Dialogue Act Tagged (1 CD)
CDROM 1791 Electronic Information Center PCL 2.200
- LDC2004T16 2001 Communicator Dialogue Act Tagged (1 CD)
CDROM 1792 PCL Reference - Electronic Information
- LDC2004T17 Arabic News Translation Text Part 1 (1 CD)
CDROM 1793 Electronic Information Center PCL 2.200
- LDC2004T18 Arabic English Parallel News Part 1 (1 CD)
CDROM 1794 Electronic Information Center PCL 2.200
- LDC2004T19 Fisher English Training Speech Part 1, Transcripts (1 CD)
CDROM 1795 Electronic Information Center PCL 2.200
- LDC2004V01 FORM1 Kinematic Gesture (1 DVD)
DVDROM 25 Electronic Information Center PCL 2.200
- LDC2003S01 2001 Communicator Evaluation (1 DVD)
DVDROM 18 Electronic Information Center PCL 2.200
- LDC2003T03 1997 HUB5 German Transcripts (1 CD)
CDROM 1218 Electronic Information Center PCL 2.200
- LDC2003T04 1997 HUB5 Spanish Transcripts (1 CD)
CDROM 1219 Electronic Information Center PCL 2.200
- LDC2003T02 1998 HUB5 English Transcripts (1 CD)
CDROM 1220 Electronic Information Center PCL 2.200
- LDC2003T01 2001 HUB5 Mandarin Transcripts (1 CD)
CDROM 1221 Electronic Information Center PCL 2.200
- LDC2003T11 ACE-2 Version 1.0 (1 CD)
CDROM 1222 Electronic Information Center PCL 2.200
- LDC2003T12 Arabic Gigaword (1 DVD)
DVDROM 17 Electronic Information Center PCL 2.200
- LDC2003T07 Arabic Treebank: Part 1 - 10K-word English Translation (1 CD)
CDROM 1223 Electronic Information Center PCL 2.200
- LDC2003T06 Arabic Treebank: Part 1 v 2.0 (1 CD)
CDROM 1224 Electronic Information Center PCL 2.200
- LDC2003T09 Chinese Gigaword (1 DVD)
DVDROM 16 Electronic Information Center PCL 2.200
- LDC2003T05 English Gigaword (1 DVD)
DVDROM 15 Electronic Information Center PCL 2.200
- LDC2003V01 FORM2 Kinematic Gesture (1 CD)
CDROM 1225 Electronic Information Center PCL 2.200
- LDC2003L01 Grassfields Bantu Fieldwork: Dschang Lexicon (1 CD)
CDROM 1226 Electronic Information Center PCL 2.200
- LDC2003L02 Korean Telephone Conversations Lexicon (1 CD)
CDROM 1228 Electronic Information Center PCL 2.200
- LDC2003S02 Grassfields Bantu Fieldwork: Dschang Tone Paradigms (1 CD)
CDROM 1227 Electronic Information Center PCL 2.200
- LDC2003S03 Korean Telephone Conversations Speech (3 CDs)
CDROM 1229 DISCS 1-3 Electronic Information Center PCL 2.200
- LDC2003S06 Santa Barbara Corpus of Spoken American English Part-II (1 DVD)
DVDROM 14 Electronic Information Center PCL 2.200
- LDC2003T08 Korean Telephone Conversations Transcripts (1 CD)
CDROM 1230 Electronic Information Center PCL 2.200
- LDC2003T13 Message Understanding Conference (MUC) 6 (1 CD)
CDROM 1231 Electronic Information Center PCL 2.200
- LDC2003T18 Multiple-Translation Arabic (MTA) Part 1 (1 CD)
CDROM 1232 Electronic Information Center PCL 2.200
- LDC2003T17 Multiple-Translation Chinese (MTC) Part 2 (1 CD)
CDROM 1233 Electronic Information Center PCL 2.200
- LDC2003T10 SAID (1 CD)
CDROM 1234 Electronic Information Center PCL 2.200
- LDC2003T15 SLX Corpus of Classic Sociolinguistic Interviews (1 DVD)
DVDROM 19 Electronic Information Center PCL 2.200
- LDC2003T16 SummBank 1.0 (4 DVDs)
DVDROM 20 DISCS 1-4 Electronic Information Center PCL 2.200
- LDC2003S05 West Point Russian Speech (1 CD)
CDROM 1235 Electronic Information Center PCL 2.200
- LDC97T14 CALLHOME American English Transcripts (1 CD)
CDROM 2342 Electronic Information Center PCL 2.200
- LDC96T17 CALLHOME Spanish Transcripts (1 CD)
CDROM 2341 Electronic Information Center PCL 2.200
- CELEX 2 (1 CD)
CDROM 26521 Electronic Information Center PCL 2.200
- LDC95S26 ATIS3 Test Data (2 CDs) - Air Travel Information System (ATIS3): multi-site speech collection.
CDROM 1911 DISCS: 4.2, 5.1 Electronic Information Center PCL 2.200
- LDC94S19 ATIS3 Training Data (3 CDs) - Air Travel Information System (ATIS3): multi-site speech collection.
CDROM 1910 DISCS: 1.1, 2.1, 3.1 Electronic Information Center PCL 2.200
Please refer to the LDC site for specific information on using LDC corpora.
