Library of statistical profiles for language recognition

---------------------------------------------------------

The sample texts for different languages have been taken from:

Perl module: Lingua::LanguageGuesser - http://gensen.dl.itc.u-tokyo.ac.jp/LanguageGuesser/LanguageGuesser_demo.html

Statistical Text Analysis - http://boxoffice.ch/pseudo/

Some random sample texts have been taken from Wikipedia - http://wikipedia.org/

All the sample texts should be UTF-8 encoded!
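
A sample text can be checked for valid UTF-8 before it is added. The following is a minimal Python sketch, not part of this library; the function names and the "*.txt" file pattern are assumptions made for illustration only.

```python
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_files(directory: str, pattern: str = "*.txt") -> list:
    """List files under `directory` whose content is not valid UTF-8.

    The "*.txt" pattern is an assumption; adjust it to the actual
    naming of the sample texts.
    """
    return sorted(
        str(path)
        for path in Path(directory).rglob(pattern)
        if path.is_file() and not is_utf8(path.read_bytes())
    )
```

Calling non_utf8_files() on the directory that holds the sample texts returns the files that need to be re-encoded. Note that pure ASCII text is also valid UTF-8, so such files pass the check.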

To understand how language recognition works, you need to read the following remarkable work:

W. B. Cavnar and J. M. Trenkle. N-gram-based text categorization. Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994.

http://citeseer.ist.psu.edu/cache/papers/cs/810/http:zSzzSzwww.info.unicaen.frzSz~giguetzSzclassifzSzcavnar_trenkle_ngram.pdf/n-gram-based-text.pdf
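
The idea of the paper can be sketched in a few lines of Python. This is an illustrative sketch only, not the code of this library: the function names, the whitespace tokenization, and the profile size of 300 are assumptions. Each text is reduced to a list of its most frequent character n-grams (a profile), and a document is assigned to the language whose profile has the smallest "out-of-place" rank distance to the document profile.

```python
from collections import Counter


def ngram_profile(text, max_n=5, top=300):
    """Return the most frequent character n-grams (n = 1..max_n), best first.

    Tokens are padded with "_" to mark word boundaries. Splitting on
    whitespace is a simplification of the tokenization in the paper.
    """
    counts = Counter()
    for token in text.lower().split():
        padded = "_" + token + "_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically to break ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language
    profile get a fixed maximum penalty (the profile length)."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def guess_language(text, profiles):
    """Return the key of the profile closest to the text."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Hypothetical toy samples; real profiles need much longer texts.
samples = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze",
}
profiles = {lang: ngram_profile(text) for lang, text in samples.items()}
```

With profiles built this way, guess_language(some_text, profiles) returns the key of the closest language profile. Short inputs and short sample texts give unreliable results, which is why reasonably long sample texts are needed for each language.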

License: GNU General Public License 3 as published by the Free Software Foundation (http://www.fsf.org/).

Assembled by Ivan Tcholakov, 

November, 2009