hex dump w/ or w/out utf-8 chars

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue Jul 9 08:53:39 CEST 2013

On Tue, 09 Jul 2013 07:49:45 +1000, Chris Angelico wrote:

> On Tue, Jul 9, 2013 at 6:56 AM, Dave Angel <davea at davea.name> wrote:
>> But Unicode has nothing to do with Guido, and it has existed for about
>> 25 years (if I recall correctly).
>
> Depends how you measure. According to [1], the work kinda began back
> then (25 years ago being 1988), but it wasn't till 1991/92 that the spec
> was published. Also, the full Unicode range with multiple planes came
> about in 1996, with Unicode 2.0, so that could also be considered the
> beginning of Unicode. But that still means it's nearly old enough to
> drink, so programmers ought to be aware of it.

Yes, yes, a thousand times yes. It's really not that hard to get the 
basics of Unicode.

"When I discovered that the popular web development tool PHP has almost 
complete ignorance of character encoding issues, blithely using 8 bits 
for characters, making it darn near impossible to develop good 
international web applications, I thought, enough is enough.

So I have an announcement to make: if you are a programmer working in 
2003 and you don't know the basics of characters, character sets, 
encodings, and Unicode, and I catch you, I'm going to punish you by 
making you peel onions for 6 months in a submarine. I swear I will."
    -- Joel Spolsky, "The Absolute Minimum Every Software Developer
       Absolutely, Positively Must Know About Unicode and Character
       Sets (No Excuses!)"


Also: http://nedbatchelder.com/text/unipain.html

To start with, if you're writing code for Python 2.x, and not using u'' 
for strings, then you're making a rod for your own back. Do yourself a 
favour and get into the habit of always using u'' strings in Python 2.
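
As a rough illustration of why the habit matters, here is a minimal sketch (the names text, data and hex_dump are made up for this example). It keeps text as a u'' string and only turns it into bytes at the point where a hex dump is wanted. It should behave the same on Python 2.7 and on Python 3.3 or later, where the u'' prefix is accepted again.

```python
# -*- coding: utf-8 -*-
import binascii

# Text: a sequence of code points. The u'' prefix makes this a unicode
# string in Python 2 and is harmless in Python 3.3+.
text = u'caf\xe9'            # "cafe" with an acute accent on the e

# Bytes: what actually goes into a file or over a socket. Encoding is
# an explicit step, and the encoding is named.
data = text.encode('utf-8')

# Four characters, but five bytes, because U+00E9 takes two bytes in UTF-8.
print(len(text))             # 4
print(len(data))             # 5

# A hex dump is a view of the bytes, not of the text.
hex_dump = binascii.hexlify(data)
print(hex_dump)

# Decoding with the same encoding gets the original text back.
assert data.decode('utf-8') == text
```

The last two bytes in the dump, c3 a9, are the UTF-8 encoding of the single character U+00E9. Without the u'' prefix in Python 2, the literal would already be a byte string and the distinction between the two lengths would be lost.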

I'll-start-taking-my-own-advice-next-week-I-promise-ly yrs,

