[Tutor] Is there a package to "un-mangle" characters?

Steven D'Aprano steve at pearwood.info
Fri Nov 22 16:30:34 CET 2013


On Thu, Nov 21, 2013 at 12:04:19PM -0800, Albert-Jan Roskam wrote:
> Hi,
> 
> Today I had a csv file in utf-8 encoding, but part of the accented 
> characters were mangled. The data were scraped from a website and it 
> turned out that at least some of the data were mangled on the website 
> already. Bits of the text were actually cp1252 (or cp850), I think, 
> even though the webpage was in utf-8. Is there any package that helps 
> to correct such issues?

Python has superpowers :-)

http://blog.luminoso.com/2012/08/20/fix-unicode-mistakes-with-python/
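
For the common case where UTF-8 bytes were wrongly decoded as cp1252,
the repair can be sketched by hand with nothing but the standard
library. This is only a minimal illustration under that assumption, not
necessarily what the linked post does; the function name unmangle and
its wrong_encoding parameter are made up for this example.

```python
def unmangle(text, wrong_encoding="cp1252"):
    """Try to undo one round of UTF-8 bytes mis-decoded as wrong_encoding.

    Hypothetical helper: returns the input unchanged when the round trip
    fails, which usually means the text was not mangled in this way.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


mangled = "caf\u00c3\u00a9"   # what 'café' looks like after the mix-up
print(unmangle(mangled))
print(unmangle("caf\u00e9"))  # already correct, so left alone
```

A file with only some mangled fields would need this applied per field
rather than to the whole text, since one correct accented character
makes the round trip fail for the entire string. Python's cp1252 codec
also leaves five byte values undefined, so some mangled characters will
not survive the encode step; for the cp850 case, pass
wrong_encoding="cp850".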



-- 
Steven

