Hi!

My first comment after having installed Nevow 0.3: my Polish cp1250-encoded characters are not displayed:

- htmlfile replaces them all with question marks (it works OK when using UTF-8 encoding)
- xmlfile works fine when reading the text from a file, but again it fails to display the national characters when it gets them returned from Python code

Maybe it's me doing something wrong somewhere; I'm getting more and more confused with that encoding stuff :)

Bartek Bargiel
On Sep 28, 2004, at 12:47 AM, Bartek Bargiel wrote:
My first comment after having installed Nevow 0.3: my Polish cp1250-encoded characters are not displayed:
- htmlfile replaces them all with question marks (it works OK when using UTF-8 encoding)
- xmlfile works fine when reading the text from a file, but again it fails to display the national characters when it gets them returned from Python code
Maybe it's me doing something wrong somewhere; I'm getting more and more confused with that encoding stuff :)
Firstly, Nevow is and always will be designed to use only unicode internally. Doing anything else at this point in time is complete madness. This has a few consequences:

1) You should always use unicode strings in your Python code if they contain any non-ASCII characters, e.g.

    u"새카만 커피 oh no~ 새하얀 우유 oh yes~"

Additionally, you have to make sure your source code file encoding is declared properly <http://www.python.org/peps/pep-0263.html>, or else use unicode escapes instead of the actual characters, e.g.

    u"\uc0c8\uce74\ub9cc \ucee4\ud53c oh no~ \uc0c8\ud558\uc580 \uc6b0\uc720 oh yes~"

2) xmlfile and htmlfile must decode from the file's encoding to unicode. However, htmlfile is completely broken in this regard: it does not decode the file encoding at all. If the file happens to be in UTF-8 already, it will "work", but only because it returns byte strings, which are not re-encoded on output. This really ought to be fixed; people have lots of pre-existing files in strange encodings, and UTF-8 editor support isn't quite all there yet, either. htmlfile should do META content-type tag sniffing (like a browser would), and also allow the developer to specify a default encoding in the htmlfile constructor.

Fortunately, xmlfile does work right: use a standard <?xml version="1.0" encoding="cp1250"?> declaration at the top of the file and it'll do the right thing.

3) When writing the response to the client, Nevow must encode from unicode into the proper response encoding. Currently there is no way to specify any response encoding besides UTF-8. I do not believe this needs to be (or even should be) fixed: any browser that cannot handle UTF-8 encoding is utterly worthless, and I don't think there are any browsers that worthless still in use. At least I hope there aren't.

James
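The decode-on-input, encode-on-output flow described in points 2 and 3 can be sketched roughly as below. This is only an illustration written for current Python 3, not Nevow code: `sniff_encoding`, `load_template` and `render_response` are hypothetical helper names, the META sniffing is a deliberately naive regular expression, and the sample page and Polish pangram are made up for the example.

```python
import re

# Hypothetical helper (not part of Nevow): naive search for a charset
# declared in an HTML META tag, roughly what a browser would sniff.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def sniff_encoding(raw, default="utf-8"):
    """Return the charset named in a META tag, or the given default."""
    match = _META_CHARSET.search(raw[:1024])
    if match:
        return match.group(1).decode("ascii")
    return default


def load_template(raw, default="utf-8"):
    """Decode template bytes into unicode text (point 2)."""
    return raw.decode(sniff_encoding(raw, default))


def render_response(text):
    """Encode unicode text as UTF-8 for the client (point 3)."""
    return text.encode("utf-8")


# Unicode escapes keep this source file pure ASCII (point 1).
polish = "za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144"
page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=cp1250"></head>'
    "<body>%s</body></html>" % polish
)

raw = page.encode("cp1250")    # what a cp1250 template file holds on disk
text = load_template(raw)      # unicode internally
body = render_response(text)   # UTF-8 bytes on the wire
```

Passing the cp1250 bytes through without decoding, or decoding them as UTF-8, is where the mangled characters would come from; decoding with the declared charset first keeps the text intact all the way to the UTF-8 response.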
participants (2): Bartek Bargiel, James Y Knight