[Tutor] Confusing Unicode Conversion Problem.

Wed Dec 13 20:24:14 CET 2006

Thanks for the detailed reply.

The reason I am forcing each value to a string and splitting it is that the
purely numeric values coming from the Excel sheet all arrive as decimals,
with an appended .0 at the end.
So 123456 in Excel becomes 123456.0 when I use the loop to extract it. I
was told by another person here in the office that Excel and COM aren't the
most intelligent collaborators =P
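One way to drop the trailing .0 without calling str() at all is sketched
below. It assumes, as described above, that COM hands numeric cells back as
Python floats and text cells as unicode objects; the function name
cell_to_text is hypothetical. Unlike str(tempValue).split('.')[0], it also
leaves text cells that happen to contain a period intact. The code avoids
print statements, so it should run unchanged on Python 2.6+ or Python 3.3+.

```python
# Hypothetical helper: turn one COM cell value into text without str().
# Assumption: numeric cells arrive as floats, text cells as unicode.

def cell_to_text(value):
    """Return a unicode string for an Excel cell value."""
    if isinstance(value, float):
        if value.is_integer():
            # 123456.0 -> u"123456": formatted directly as unicode,
            # so the ascii codec is never involved
            return u"%d" % value
        # keep genuine decimals such as 12.5 as they are
        return u"%s" % value
    # text cells (which may contain u'\xa0') stay unicode, untouched;
    # any other type is also returned unchanged
    return value


# Usage with made-up cell values
values = [cell_to_text(v) for v in (123456.0, 12.5, u"FRAMEMRISER\xa0")]
```

In the extraction loop, cell_to_text(tempValue) would take the place of the
str(tempValue).split('.')[0] call.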

The destination for the list (you guessed correctly) is another loop that
creates SQL commands and then posts them into a database. Essentially this
script just reads some 6K rows of data (at most, in the longest column) from
Excel and posts them into SQL tables. The script works fine for just about
every column, which is what has me so puzzled. I guess when MicroStrategy
pulls this data into Excel format it must be adding extra data or something.
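Since the values end up in SQL, another place an implicit ascii encoding can
creep in is when the SQL text is built by hand. A hedged sketch of the
alternative: pass the unicode values as query parameters and let the
database driver do the encoding. Here sqlite3's in-memory database stands in
for the real server, and the table excel_rows and the helper tidy() are
hypothetical; with another DB-API driver the placeholder may be %s instead
of ? (it depends on the driver's paramstyle).

```python
import sqlite3


def tidy(value):
    """Hypothetical, optional: replace non-breaking spaces and trim."""
    return value.replace(u"\xa0", u" ").strip()


# Made-up values as the Excel loop might produce them, still unicode.
values = [u"123456", u"FRAMEMRISER\xa0", u"plain text"]

conn = sqlite3.connect(":memory:")  # stand-in for the real database
try:
    conn.execute("CREATE TABLE excel_rows (raw TEXT, tidied TEXT)")
    # The unicode objects go to the driver as parameters; no str() call,
    # so no ascii encoding step happens in our code.
    conn.executemany(
        "INSERT INTO excel_rows (raw, tidied) VALUES (?, ?)",
        [(v, tidy(v)) for v in values],
    )
    conn.commit()
    rows = conn.execute(
        "SELECT raw, tidied FROM excel_rows ORDER BY rowid").fetchall()
finally:
    conn.close()
```

Whether to keep the non-breaking space or normalise it is a data decision;
the driver stores either form.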

Anyway, now that I've explained what I'm doing, maybe I could get a more
focused solution to my problem? I think I understand what is happening,
but I'm still not sure of the fix. If it was in the last e-mail, it must be
over my head and I'll need it pointed out with neon lights :)

thanks

On 12/13/06, Tim Golden <Tim.Golden at viacom-outdoor.co.uk> wrote:
>
> [Chris Hengge]
>
> | 'ascii' codec can't encode character u'\xa0' in position 11:
> | ordinal not in range(128)
> | Error with: FRAMEMRISER  of type: <type 'unicode'>
> | Excel Row : 6355
>
> OK. Let's get to the basics first:
>
> <code>
> import unicodedata
> print unicodedata.name (u'\xa0')
> # outputs: NO-BREAK SPACE
>
> </code>
>
> So somewhere in your unicode string (maybe at the end)
> there is a non-breaking space. (Notice the extra space
> between "FRAMEMRISER" and "of" in the message above.)
>
> Next, when you print to the screen, you're implicitly
> using the sys.stdout encoding, which on my XP machine
> is cp437:
>
> <code>
> import sys
> print sys.stdout.encoding
> # outputs: cp437
>
> print u'\xa0'.encode (sys.stdout.encoding)
> # outputs a blank line, presumably including a non-breaking space
>
> </code>
>
> But when you convert to a str using str (...), Python
> will use the default ascii encoding. So let's try that:
>
> <code>
> print str (u'\xa0')
> # sure enough: UnicodeError, blah, blah
>
> </code>
>
> In essence, when you're using Unicode data, you either
> need to encode immediately to a consistent encoding of
> your choice (or possibly forced upon you) or to retain
> Unicode data throughout until you need to output, to
> screen or database or file, and then convert as needed.
>
> Let's take your code (snipped a bit):
>
> 1             while xlSht.Cells(row,col).Value != None:
> 2                      tempValue = xlSht.Cells(row,col).Value
> 3                      tempString = str(tempValue).split('.')[0]
> 4                      ExcelValues.append(tempString)
> 5                      row = 1 + row # Increment Rows.
>
> It's not clear what ExcelValues is, but let's assume
> it's a list of things you're going to output later
> to a file. Your line 3 is doing an implicit conversion
> when it doesn't look like it needs to. Have a look
> at this trivial example:
>
> <code>
> import codecs
>
> fake_excel_data = ([u"Stuff.0", u"\xa0and\xa0.1", u"nonsense.2"])
> values = []
>
> for data in fake_excel_data:
>   pre, post = data.split (".")
>   values.append (pre)
>
> #
> # later...
> #
> f = codecs.open ("excel_values.txt", "w", "utf-8")
> try:
>   # writelines () adds no line endings, so supply them
>   f.writelines (value + u"\n" for value in values)
> finally:
>   f.close ()
>
> </code>
>
> Notice I haven't done the encoding until I finally
> output to a file, where I've used the codecs module
> to specify an encoding. You could do this string by
> string or some other way.
>
> If I were simply writing back to, say, another
> Excel sheet, or any other target which was expecting
> Unicode data, I wouldn't encode it anywhere. The Unicode
> objects offer nearly all the same methods as string
> objects, so you can use them just as you would strings.
>
> What you have to look out for is situations like
> your str () conversion where an implicit encoding-to-ascii
> goes on.
>
> HTH
> TJG
>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>