Language Detection Library/Code

Shashwat Anand anand.shashwat at gmail.com
Mon Dec 27 19:42:00 EST 2010


On Tue, Dec 28, 2010 at 6:03 AM, Katie T <katie at coderstack.co.uk> wrote:

> On Mon, Dec 27, 2010 at 7:10 PM, Shashwat Anand
> <anand.shashwat at gmail.com> wrote:
> > Can anyone suggest a language detection library in python which works
> > on a phrase of say 2-5 words.
>
> Generally such libraries work by bi/trigram frequency analysis, which
> means you're going to have a fairly high error rate with such small
> phrases. If you're only dealing with a handful of languages it may
> make more sense to combine an existing library with a simple
> dictionary lookup model to improve accuracy.
>
> Katie
>

In fact, I'm dealing with very few languages: German, French, Italian,
Portuguese and Russian.
I have read papers that mention bi/trigram frequency analysis, but I was
unable to find any library that implements it.
'guess-language' does not perform well at all. The cld (Compact Language
Detector) module of Google Chrome performs well, but it is not a standalone
library (I hope someone ports it).
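
For reference, the trigram idea from the papers can be sketched in a few
lines. This is only a rough illustration, not what guess-language or cld
actually do: the SAMPLES text below is made up and far too small for real
use, and the names trigrams, build_profile, score and detect are just
placeholders. Real profiles would be built from a large corpus per language.

```python
from collections import Counter

def trigrams(text):
    # Lowercase, normalise whitespace, and pad with spaces so that
    # word boundaries show up in the trigrams as well.
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def build_profile(sample):
    # Relative frequency of each character trigram in the sample text.
    counts = Counter(trigrams(sample))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}

# ASSUMPTION: toy training text, invented for this sketch only.
SAMPLES = {
    "de": "der die das und ich bin nicht ein eine mit auf ist guten morgen wie geht es dir",
    "fr": "le la les et je suis ne pas un une avec pour sur est bonjour comment allez vous",
    "it": "il lo la gli e io sono non un una con per su buongiorno come stai grazie mille",
    "pt": "o a os as e eu sou não um uma com para em bom dia como você está obrigado",
    "ru": "и в не на я что он с это как доброе утро спасибо большое привет",
}
PROFILES = {lang: build_profile(text) for lang, text in SAMPLES.items()}

def score(text, profile):
    # Sum of profile frequencies over the trigrams of the input;
    # trigrams unknown to the profile contribute nothing.
    return sum(profile.get(gram, 0.0) for gram in trigrams(text))

def detect(text):
    # Pick the language whose profile gives the highest score.
    return max(PROFILES, key=lambda lang: score(text, PROFILES[lang]))

print(detect("guten morgen"))
```

A 2-5 word phrase yields only a few dozen trigrams, so the scores of
related languages can end up very close to each other.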

Regarding the dictionary lookup + n-gram approach, I didn't quite understand
what you meant.
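
My best guess at what such a combination could look like is below. It is
only a sketch under my own assumptions, not necessarily what was meant: the
COMMON_WORDS lists are tiny and hand-picked for illustration, and
ngram_fallback is a hypothetical hook where an existing n-gram based
detector would be plugged in.

```python
# ASSUMPTION: tiny hand-picked word lists, for illustration only.
# A real list would hold a few hundred frequent words per language.
COMMON_WORDS = {
    "de": {"der", "die", "das", "und", "ich", "nicht", "ist", "ein", "guten"},
    "fr": {"le", "la", "les", "et", "je", "ne", "pas", "est", "un", "bonjour"},
    "it": {"il", "lo", "gli", "che", "non", "sono", "una", "per", "grazie"},
    "pt": {"o", "os", "as", "que", "não", "um", "uma", "para", "obrigado"},
    "ru": {"и", "в", "не", "на", "я", "что", "он", "это", "спасибо"},
}

def dictionary_votes(phrase):
    # Count, per language, how many words of the phrase are in its list.
    words = phrase.lower().split()
    return {
        lang: sum(word in vocab for word in words)
        for lang, vocab in COMMON_WORDS.items()
    }

def detect(phrase, ngram_fallback=None):
    votes = dictionary_votes(phrase)
    best = max(votes.values())
    winners = [lang for lang, n in votes.items() if n == best]
    # A single language with at least one dictionary hit wins outright.
    if best > 0 and len(winners) == 1:
        return winners[0]
    # Otherwise (no hit, or a tie) defer to the n-gram based detector,
    # passing along the tied candidates so it can choose among them.
    if ngram_fallback is not None:
        return ngram_fallback(phrase, winners)
    return None

print(detect("ich bin nicht müde"))
```

The idea would be that the dictionary settles the easy cases, and the
n-gram statistics are only consulted for phrases where no listed word
matches or where several languages tie.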