[Python-Dev] Dropping bytes "support" in json

Robert Brewer fumanchu at aminus.org
Fri Apr 10 18:47:11 CEST 2009


On Thu, 2009-04-09 at 22:38 -0400, Barry Warsaw wrote:
> On Apr 9, 2009, at 11:55 AM, Daniel Stutzbach wrote:
> 
> > On Thu, Apr 9, 2009 at 6:01 AM, Barry Warsaw <barry at python.org> wrote:
> > Anyway, aside from that decision, I haven't come up with an elegant  
> > way to allow /output/ in both bytes and strings (input is I think  
> > theoretically easier by sniffing the arguments).
> >
> > Won't this work? (assuming dumps() always returns a string)
> >
> > def dumpb(obj, encoding='utf-8', *args, **kw):
> >     s = dumps(obj, *args, **kw)
> >     return s.encode(encoding)
> 
> So, what I'm really asking is this.  Let's say you agree that there  
> are use cases for accessing a header value as either the raw encoded  
> bytes or the decoded unicode.  What should this return:
> 
>  >>> message['Subject']
> 
> The raw bytes or the decoded unicode?
> 
> Okay, so you've picked one.  Now how do you spell the other way?
> 
> The Message class probably has these explicit methods:
> 
>  >>> Message.get_header_bytes('Subject')
>  >>> Message.get_header_string('Subject')
> 
> (or better names... it's late and I'm tired ;).  One of those maps to  
> message['Subject'] but which is the more obvious choice?
> 
> Now, setting headers.  Sometimes you have some unicode thing and  
> sometimes you have some bytes.  You need to end up with bytes in the  
> ASCII range and you'd like to leave the header value unencoded if so.   
> But in both cases, you might have bytes or characters outside that  
> range, so you need an explicit encoding, defaulting to utf-8 probably.
> 
>  >>> Message.set_header('Subject', 'Some text', encoding='utf-8')
>  >>> Message.set_header('Subject', b'Some bytes')
> 
> One of those maps to
> 
>  >>> message['Subject'] = ???
> 
> I'm open to any suggestions here!

Syntactically, there's no sense in providing:

    Message.set_header('Subject', 'Some text', encoding='utf-16')

...since you could more clearly write the same as:

    Message.set_header('Subject', 'Some text'.encode('utf-16'))

The only interesting case is if you provided a *default* encoding, so that:

    Message.default_header_encoding = 'utf-16'
    Message.set_header('Subject', 'Some text')

...has the same effect.
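
A minimal sketch of that default-encoding idea, assuming a hypothetical Message class that stores header values as bytes. The class, its attribute, and its methods are illustrative names only, not the email package's actual API:

```python
class Message:
    # Hypothetical class-level default, used only when a str is passed in.
    default_header_encoding = 'utf-8'

    def __init__(self):
        self._headers = {}

    def set_header(self, name, value):
        # A str is encoded with the default; bytes are stored untouched,
        # so callers who want another encoding just call .encode() themselves.
        if isinstance(value, str):
            value = value.encode(self.default_header_encoding)
        self._headers[name] = value

    def get_header_bytes(self, name):
        return self._headers[name]
```

With the default set to 'utf-16', set_header('Subject', 'Some text') stores the same bytes as set_header('Subject', 'Some text'.encode('utf-16')), so a per-call encoding argument adds nothing.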

But it would be far easier to do all the encoding at once in an output()
or serialize() method. Do different headers need different encodings? If
so, make message['Subject'] a subclass of str and give it an .encoding
attribute (with a default). If not, Message.header_encoding should be
sufficient.
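
Roughly, and only as a sketch under assumptions: HeaderValue, header_encoding as a class attribute, and serialize() below are hypothetical names, and the output ignores the escaping that real mail headers need for non-ASCII text. The point is only that headers stay as text until one place encodes them:

```python
class HeaderValue(str):
    """A str that remembers which encoding to use when serialized."""

    def __new__(cls, value, encoding=None):
        self = super().__new__(cls, value)
        self.encoding = encoding  # None means "use the message default"
        return self


class Message:
    header_encoding = 'utf-8'  # assumed message-wide default

    def __init__(self):
        self._headers = {}

    def __setitem__(self, name, value):
        self._headers[name] = value

    def __getitem__(self, name):
        return self._headers[name]

    def serialize(self):
        # All encoding happens here, once, at output time.
        lines = []
        for name, value in self._headers.items():
            encoding = getattr(value, 'encoding', None) or self.header_encoding
            lines.append(name.encode('ascii') + b': ' + value.encode(encoding))
        return b''.join(line + b'\r\n' for line in lines)
```

Here message['Subject'] always returns text, and a header that needs its own encoding is assigned as HeaderValue('...', encoding='latin-1').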


Robert Brewer
fumanchu at aminus.org
