Library of statistical profiles for language recognition

---------------------------------------------------------

The sample texts for different languages have been taken from:

Perl module: Lingua::LanguageGuesser - http://gensen.dl.itc.u-tokyo.ac.jp/LanguageGuesser/LanguageGuesser_demo.html

Statistical Text Analysis - http://boxoffice.ch/pseudo/

Some random sample texts have been taken from Wikipedia - http://wikipedia.org/

All the sample texts should be UTF-8 encoded!
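
A quick way to check the encoding requirement above is to try decoding every sample file as UTF-8. The following is only a minimal sketch in Python, not part of this library; the function names are hypothetical, and the directory you pass in (for example the sample texts folder) is your own choice.

```python
from pathlib import Path
from typing import List


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8_files(directory: str) -> List[str]:
    """List files under `directory` whose contents are not valid UTF-8.

    Hypothetical helper: it walks the directory recursively and reads
    each regular file as raw bytes before attempting to decode it.
    """
    bad = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and not is_valid_utf8(path.read_bytes()):
            bad.append(str(path))
    return bad
```

Any path returned by find_non_utf8_files() points to a file that should be re-encoded before it is used as a sample text.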

To understand how language recognition works, read the following paper:

W. B. Cavnar and J. M. Trenkle. N-gram-based text categorization. Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994.

http://citeseer.ist.psu.edu/cache/papers/cs/810/http:zSzzSzwww.info.unicaen.frzSz~giguetzSzclassifzSzcavnar_trenkle_ngram.pdf/n-gram-based-text.pdf
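
The general idea of the paper cited above can be sketched in a few lines of Python. This is a simplified illustration, not the code of this library: the function names are hypothetical, tokens are split on whitespace only, and each token is padded with a single "_" on both sides. The limits of 5 characters per N-gram and 300 N-grams per profile follow the paper; the penalty for a missing N-gram is assumed here to be the length of the language profile.

```python
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character N-grams, most frequent first."""
    counts = Counter()
    for token in text.lower().split():
        padded = "_" + token + "_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between a document and a language profile."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)  # assumed penalty for a missing N-gram
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def guess_language(text, profiles):
    """Return the key of the profile with the smallest out-of-place distance."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))
```

In this sketch a language profile is built by calling ngram_profile() on a sample text for that language, and `profiles` is a plain dictionary that maps a language name to such a profile. Longer sample texts give more stable rankings.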

License: GNU General Public License, version 3, as published by the Free Software Foundation (http://www.fsf.org/).

Assembled by Ivan Tcholakov, November 2009