About size of Unicode string
Frank Abel Cancio Bello
frankabel at tesla.cujae.edu.cu
Mon Jun 6 20:48:53 CEST 2005
Well, I will repeat the question:
How can I get the number of bytes in a string object, independently of its encoding?
Is the "len" function the right way to get it?
Laci, look at the following code:
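import urllib2
data = u'data to send\n'.encode('utf_8')  # encode the text first: the body is sent as bytes
request = urllib2.Request(url='http://localhost:6000', data=data)  # passing data makes this a POST
response = urllib2.urlopen(request)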
Is it always true that "the size of the entity-body" is "len(data)",
independently of the encoding of "data"?
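A minimal sketch in Python 3 (not the Python 2 of this thread, where a plain str was already a byte string). Once the text has been encoded, len() counts octets, which is what the entity-body size is measured in; on the unencoded text it counts code points instead. The sample text is hypothetical.

```python
# Python 3 sketch: str holds text (code points), bytes holds octets.
text = 'caf\u00e9 data to send\n'   # hypothetical body with one non-ASCII character
data = text.encode('utf-8')        # the bytes that would actually be sent

code_points = len(text)   # counts characters, not bytes
octets = len(data)        # counts bytes: this is what Content-Length reports
print(code_points)  # → 18
print(octets)       # → 19
```

With an already-encoded byte string, len(data) is the octet count whichever encoding produced it.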
> -----Original Message-----
> From: Laszlo Zsolt Nagy [mailto:gandalf at geochemsource.com]
> Sent: Monday, June 06, 2005 1:43 PM
> To: Frank Abel Cancio Bello; python-list at python.org
> Subject: Re: About size of Unicode string
> Frank Abel Cancio Bello wrote:
> >Hi all!
> >I need to know the size of a string object independently of its encoding. For example:
> > len('123') == len('123'.encode('utf_8'))
> >while the size of the '123' object is different from the size of
> >I need to send a string in an HTTP request, so I need to know the length of the
> >string to set the "content-length" header independently of its encoding.
> >Any idea?
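A short Python 3 sketch of why a size "independently of its encoding" has no single answer: the same text occupies a different number of bytes in each encoding, so an encoding has to be chosen before the length can be known. The list of encodings is just an illustrative selection; note that Python's 'utf-16' and 'utf-32' codecs prepend a byte order mark.

```python
# Python 3 sketch: the byte size of the same text depends on the chosen encoding.
text = '123'
encodings = ('utf-8', 'latin-1', 'utf-16-le', 'utf-16', 'utf-32')
sizes = {enc: len(text.encode(enc)) for enc in encodings}

for enc, size in sizes.items():
    # utf-16 and utf-32 include a BOM (2 and 4 bytes respectively)
    print(enc, size)
```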
> This is from the RFC (RFC 2616, section 14.13):
> > The Content-Length entity-header field indicates the size of the
> > entity-body, in decimal number of OCTETs, sent to the recipient or, in
> > the case of the HEAD method, the size of the entity-body that would
> > have been sent had the request been a GET.
> > Content-Length = "Content-Length" ":" 1*DIGIT
> > An example is
> > Content-Length: 3495
> > Applications SHOULD use this field to indicate the transfer-length of
> > the message-body, unless this is prohibited by the rules in section
> > 4.4 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4>.
> > Any Content-Length greater than or equal to zero is a valid value.
> > Section 4.4 describes how to determine the length of a message-body if
> > a Content-Length is not given.
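To make the quoted grammar concrete, here is a sketch in Python 3 that frames a body into a raw HTTP/1.1 POST request by hand. The helper name build_post and the Content-Type header are hypothetical choices for illustration; the point is that the header value is the decimal count of octets in the encoded body, computed after encoding.

```python
# Hypothetical helper: frame a text body into a raw HTTP/1.1 POST request.
def build_post(host, path, text, encoding='utf-8'):
    body = text.encode(encoding)   # encode first; Content-Length counts these octets
    head = (
        'POST {} HTTP/1.1\r\n'
        'Host: {}\r\n'
        'Content-Type: text/plain; charset={}\r\n'
        'Content-Length: {}\r\n'   # decimal number of octets (1*DIGIT)
        '\r\n'
    ).format(path, host, encoding, len(body))
    return head.encode('ascii') + body

request = build_post('localhost:6000', '/', 'caf\u00e9\n')
print(request)
```

Here 'café\n' is 5 characters but 6 octets in UTF-8, so the header reads Content-Length: 6.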
> It looks to me like the Content-Length header has nothing to do with the
> encoding. It is very low-level stuff. The content length is given in
> OCTETs and it represents the size of the body. Clearly, it has nothing
> to do with MIME/encoding etc. It is about the number of octets transferred
> in the body. Try writing your unicode strings, encoded, into a StringIO and
> take its length....
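A Python 3 take on the StringIO suggestion, offered as a sketch: io.StringIO holds text, so its length would count characters. To measure octets, write the encoded pieces into an io.BytesIO and take the length of the result. The body pieces below are hypothetical.

```python
import io

# Python 3 sketch: accumulate the encoded body in a BytesIO, then measure it.
buf = io.BytesIO()
for chunk in ('caf\u00e9\n', 'data to send\n'):   # hypothetical pieces of the body
    buf.write(chunk.encode('utf-8'))           # write bytes, not text

body = buf.getvalue()
content_length = len(body)   # octet count to put in the Content-Length header
print(content_length)  # → 19
```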