On Nov 26, 2014, at 02:30 PM, Abhilash Raj wrote:
I am working on porting mailman3 to python3. I have a few doubts which may sound stupid, but I have only a little knowledge about encodings and charsets, which I think is important for the port. Also, this is my first time working on python3, so forgive me for asking anything obvious (I did try googling first). The code is in very preliminary stages with tons of errors. It's up on launchpad.
Yay! Note that I also have been playing around with it a bit. I wouldn't say it's a functional branch, but you might compare notes:
pkg_resources.resource_string, used to read the configuration files during
initialization and testing, now returns bytes instead of a string (as in python2).
Right. "resource_string" is really a misnomer. In another Python 3 project I work on, I use:
from pkg_resources import resource_string as resource_bytes
just so that the call sites are accurate.
So instead I use pkg_resources.resource_stream and decode the stream as
utf-8 (why? to avoid errors for the time being). I wanted to ask what would be best in this case?
You can of course also decode the bytestring that "resource_bytes" returns. That's generally what I do in that other project.
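As a rough sketch of that decode step: the snippet below reads a packaged file as bytes and decodes it explicitly. To stay runnable without setuptools installed, it uses the standard library's pkgutil.get_data, which also returns bytes, as a stand-in for resource_string. The helper name resource_text and the package/file used are hypothetical illustrations, not anything from Mailman.

```python
import pkgutil


def resource_text(package, name, encoding="utf-8"):
    """Read a packaged resource as bytes, then decode it to str.

    resource_text is a hypothetical helper; pkgutil.get_data stands in
    for pkg_resources.resource_string, which likewise returns bytes.
    """
    data = pkgutil.get_data(package, name)
    if data is None:
        # get_data returns None when the loader cannot supply the data.
        raise FileNotFoundError(f"{package}:{name}")
    return data.decode(encoding)


# Arbitrary example: read a source file shipped inside the stdlib json package.
text = resource_text("json", "__init__.py")
```

Keeping the encoding as an explicit argument means a call site that knows better than utf-8 can say so.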
- To create a hash, unicode strings must be encoded everywhere. So again the
same question: what should I encode them to? UTF-8 or US-ASCII? Or is there some way to determine which encoding should be used?
I think utf-8 is generally the right encoding to use, except in contexts where
you are given a better hint. E.g. if you're decoding say the payload of a
message with a Content-Type header, and that header has a charset parameter,
it will name the encoding that you're supposed to use.
There may be cases where us-ascii or latin-1 would be better, but I think those should be determined on a case-by-case basis.
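A minimal sketch of both rules, using only the standard library: encode as utf-8 before hashing by default, and use the charset from Content-Type when decoding a message payload. The function names hash_text and decode_payload, the choice of SHA-256, and the sample message are all assumptions for illustration, not Mailman code.

```python
import hashlib
from email import message_from_bytes


def hash_text(text, encoding="utf-8"):
    """Hash a str by encoding it first (hypothetical helper)."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()


def decode_payload(msg, default="utf-8"):
    """Decode a non-multipart body using the charset the header names."""
    raw = msg.get_payload(decode=True)           # bytes, transfer encoding undone
    charset = msg.get_content_charset(default)   # falls back to utf-8 if absent
    return raw.decode(charset)


# Made-up sample message whose header names a non-utf-8 charset.
raw_message = (
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"caf=E9\n"
)
msg = message_from_bytes(raw_message)
body = decode_payload(msg)
digest = hash_text(body)
```

Here the header's charset wins over the utf-8 default when decoding, while the hash uses utf-8 because nothing gives a better hint.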
Also, if anybody has suggestions on porting, please let me know. I am online sporadically on irc as
maxking on #mailman (I see all of the messages sent while I am away).
The trunk should now be "python2.7 -3" clean, so it's ready to be ported.
I'd probably not try to run the full test suite yet, but start at the lowest
level, e.g. database/model, and work your way up. If there are changes that
make sense even in Python 2 (i.e. for bilingual support for now), try to
put those in separate branches and merge proposals, so they can be merged into
trunk even before a full port.
This page will be indispensable: