<br><br><div class="gmail_quote">On Tue, Dec 28, 2010 at 6:03 AM, Katie T <span dir="ltr"><<a href="mailto:katie@coderstack.co.uk">katie@coderstack.co.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<div class="im">On Mon, Dec 27, 2010 at 7:10 PM, Shashwat Anand<br>

<<a href="mailto:anand.shashwat@gmail.com">anand.shashwat@gmail.com</a>> wrote:<br>

> Can anyone suggest a language detection library in python which works on a<br>

> phrase of say 2-5 words.<br>

<br>

</div>Generally such libraries work by bi/trigram frequency analysis, which<br>

means you're going to have a fairly high error rate with such small<br>

phrases. If you're only dealing with a handful of languages it may<br>

make more sense to combine an existing library with a simple<br>

dictionary lookup model to improve accuracy.<br>

<br>

Katie<br></blockquote><div><br></div><div>In fact, I'm dealing with very few languages: German, French, Italian, Portuguese and Russian.</div><div>I have read papers mentioning bi/trigram frequency analysis but was unable to find any library that implements it.</div>


<div>'guess-language' doesn't perform well at all.  The cld (Compact Language Detector) module of </div><div>Google Chrome performs well, but it is not a standalone library (I hope someone ports it).</div><div><br>


</div><div>Regarding the dictionary lookup + n-gram approach, I didn't quite understand what you meant.</div></div>
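<div><br></div><div>My rough guess at what such a combination might look like is sketched below. It is only an illustration under assumptions of my own: the word lists are tiny made-up samples, the trigram profiles are built from those same word lists rather than from a real corpus or an existing library, and the names (COMMON_WORDS, ngram_score, dictionary_score, detect) and the 50/50 weighting are hypothetical.</div>

```python
from collections import Counter

# Hypothetical toy data: a handful of very common words per language.
# A real system would load much larger word lists.
COMMON_WORDS = {
    "de": "der die das und ist nicht ich ein eine mit",
    "fr": "le la les et est pas je un une avec",
    "it": "il lo la che non sono un una con per",
    "pt": "o os as que não um uma com para você",
    "ru": "и в не на я что это он она с",
}
WORD_SETS = {lang: set(words.split()) for lang, words in COMMON_WORDS.items()}


def trigrams(text):
    """Return the character trigrams of text, padded with one space per side."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


# Toy trigram profiles built from the same word lists (an assumption made
# to keep the sketch self-contained); real profiles need a large corpus.
PROFILES = {lang: Counter(trigrams(words)) for lang, words in COMMON_WORDS.items()}


def ngram_score(phrase, lang):
    """Fraction of the phrase's trigrams that occur in the language profile."""
    grams = trigrams(phrase)
    if not grams:
        return 0.0
    profile = PROFILES[lang]
    return sum(1 for g in grams if g in profile) / len(grams)


def dictionary_score(phrase, lang):
    """Fraction of the phrase's words found in the language's word list."""
    words = phrase.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w in WORD_SETS[lang]) / len(words)


def detect(phrase, weight=0.5):
    """Return the language code with the highest combined score."""
    scores = {
        lang: weight * dictionary_score(phrase, lang)
        + (1 - weight) * ngram_score(phrase, lang)
        for lang in WORD_SETS
    }
    return max(scores, key=scores.get)


print(detect("das ist nicht"))
print(detect("je ne sais pas"))
```

<div>If this is roughly the idea, then the dictionary part would presumably carry most of the weight on 2-5 word phrases, with the trigram part acting as a fallback when none of the words are in the lists.</div>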