[Tutor] Topic #2 of Tutor Digest

Todd Tabern TTabern at ddti.net
Sat Jul 28 19:53:38 CEST 2012


Mark Lawrence: Yes, I did... I kept encountering errors when trying to post the first time. I didn't think my question went through, so I tried this one.
Even if I were to purposefully ask the question in multiple places, why does that concern you? I wasn't aware that asking for help in multiple places is forbidden.
I'm sorry that it offended you so much that you felt the need to respond in that manner instead of providing assistance...

Cheers

tutor-request at python.org wrote:


Send Tutor mailing list submissions to
        tutor at python.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://mail.python.org/mailman/listinfo/tutor
or, via email, send a message with subject or body 'help' to
        tutor-request at python.org

You can reach the person managing the list at
        tutor-owner at python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Tutor digest..."


Today's Topics:

   1. Re: Encoding error when reading text files in Python 3
      (Steven D'Aprano)
   2. Re: Search and replace text in XML file? (Mark Lawrence)
   3. Re: Encoding error when reading text files in Python 3 (Dat Huynh)
   4. Re: Flatten a list in tuples and remove doubles
      (Francesco Loffredo)
   5. Re: Flatten a list in tuples and remove doubles
      (Francesco Loffredo)


----------------------------------------------------------------------

Message: 1
Date: Sat, 28 Jul 2012 20:09:28 +1000
From: Steven D'Aprano <steve at pearwood.info>
To: tutor at python.org
Subject: Re: [Tutor] Encoding error when reading text files in Python
        3
Message-ID: <5013BA58.1040404 at pearwood.info>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dat Huynh wrote:
> Dear all,
>
> I have written a simple application in Python to read data from text files.
>
> Currently I have both Python version 2.7.2 and Python 3.2.3 on my laptop.
> I don't know why it does not run on Python version 3 while it runs
> well on Python 2.

Python 2 is more forgiving of beginner errors when dealing with text and
bytes, but makes it harder to deal with text correctly.

Python 3 makes it easier to deal with text correctly, but is less forgiving.

When you read from a file in Python 2, it will give you *something*, even if
it is the wrong thing. It will not raise a decoding error, even if the text
you are reading is not valid text. It will just give you junk bytes, sometimes
known as mojibake.

Python 3 no longer does that. It tells you when there is a problem, so you can
fix it.


> Could you please tell me how I can run it on python 3?
> Following is my Python code.
>
>  ------------------------------
>    for subdir, dirs, files in os.walk(rootdir):
>         for file in files:
>             print("Processing [" +file +"]...\n" )
>             f = open(rootdir+file, 'r')
>             data = f.read()
>             f.close()
>             print(data)
> ------------------------------
>
> This is the error message:
[...]
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position
> 4980: ordinal not in range(128)


This tells you that you are reading a non-ASCII file but haven't told Python
what encoding to use, so Python falls back on the default encoding from your
locale settings, which on your system is ASCII.
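To see where the message comes from, here is a small Python 3 sketch that reproduces the same kind of failure on made-up bytes. The helper first_bad_byte and the sample bytes are hypothetical; the default encoding that open() uses when none is given is the one locale.getpreferredencoding(False) reports, which on the poster's system was evidently ASCII.

```python
import locale


def first_bad_byte(raw, encoding="ascii"):
    """Return (position, byte as hex) of the first undecodable byte, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the offending byte, as in the traceback
        return exc.start, hex(raw[exc.start])
    return None


# What open() uses when you pass no encoding= (depends on the system).
print("default text encoding:", locale.getpreferredencoding(False))

# Hypothetical sample: 0xd1 is not valid ASCII, just like in the traceback.
print(first_bad_byte(b"Caf\xd1 menu"))  # (3, '0xd1')
```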

Do you know what encoding the file is?

Do you understand the difference between Unicode text and bytes? If not, I suggest you read
this article:

http://www.joelonsoftware.com/articles/Unicode.html


In Python 3, you can either tell Python what encoding to use:

f = open(rootdir+file, 'r', encoding='utf8')  # for example

or you can set an error handler:

f = open(rootdir+file, 'r', errors='ignore')  # for example

or both:

f = open(rootdir+file, 'r', encoding='ascii', errors='replace')
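A minimal Python 3 sketch of what those options actually do, using a throwaway file holding the Latin-1 bytes for "café" (the file, its contents and the read_text helper are made up for illustration):

```python
import os
import tempfile


def read_text(path, **options):
    """Open path in text mode with the given encoding/errors options."""
    with open(path, "r", **options) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(b"caf\xe9")  # 'café' encoded as Latin-1; 0xe9 is not ASCII

    try:
        read_text(path, encoding="ascii")  # errors="strict" is the default
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    results = {
        "ignore": read_text(path, encoding="ascii", errors="ignore"),    # byte dropped
        "replace": read_text(path, encoding="ascii", errors="replace"),  # U+FFFD inserted
        "latin-1": read_text(path, encoding="latin-1"),                  # correct text
    }

print(strict_failed, results)
```

Note that "ignore" silently loses data, so it is best kept for cases where you really don't care about the undecodable bytes.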


You can see the list of encodings and error handlers here:

http://docs.python.org/py3k/library/codecs.html


Unfortunately, Python 2's built-in open function does not accept these
arguments. Instead, you have to use codecs.open, like this:

import codecs
f = codecs.open(rootdir+file, 'r', encoding='utf8')  # for example

which fortunately works in both Python 2 and 3.


Or you can read the file in binary mode, and then decode it into text:

f = open(rootdir+file, 'rb')
data = f.read()
f.close()
text = data.decode('cp866', 'replace')
print(text)
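Applied to the loop from the question, the binary-mode approach might look like the Python 3 sketch below. Besides decoding, it joins each file name to subdir rather than rootdir: os.walk reports names relative to the directory it is currently visiting, so rootdir+file points at the wrong place for files in subdirectories (and only works at all if rootdir ends in a path separator). The name read_tree is hypothetical, and UTF-8 with errors='replace' is only an assumed default; pass the real encoding if you know it.

```python
import os
import tempfile


def read_tree(rootdir, encoding="utf-8", errors="replace"):
    """Yield (path, text) for every file under rootdir (a sketch)."""
    for subdir, dirs, files in os.walk(rootdir):
        for name in files:
            # os.walk gives names relative to subdir, not rootdir
            path = os.path.join(subdir, name)
            with open(path, "rb") as f:  # the with block closes the file for us
                data = f.read()
            yield path, data.decode(encoding, errors)


# Usage: build a small throwaway tree and read it back.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "sub", "notes.txt"), "wb") as f:
        f.write(b"caf\xe9")  # Latin-1 bytes, not valid UTF-8
    demo = [(os.path.relpath(p, tmp), text) for p, text in read_tree(tmp)]

print(demo)
```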


If you don't know the encoding, you can try opening the file in Firefox or
Internet Explorer and see if they can guess it, or you can use the chardet
library in Python.

http://pypi.python.org/pypi/chardet

Or, if you don't care about getting mojibake, you can pretend that the file is
encoded using Latin-1. Latin-1 maps every possible byte to a character, so it
will read anything, although what it gives you may be junk.
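A hedged sketch of that fallback idea in Python 3: try a short list of candidate encodings strictly and fall back to Latin-1 last, because it never fails. This is a simple swapped-in heuristic, not what chardet does; decode_best_effort and its candidate list are hypothetical, and "it decoded" does not mean "it decoded correctly".

```python
def decode_best_effort(data, candidates=("utf-8", "latin-1")):
    """Try each candidate encoding strictly; return (text, encoding used)."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not this one, try the next candidate
    raise ValueError("no candidate encoding could decode the data")


# Latin-1 maps each of the 256 byte values to exactly one character,
# so decoding never fails and encoding back gives the original bytes.
all_bytes = bytes(range(256))
print(all_bytes.decode("latin-1").encode("latin-1") == all_bytes)  # True
```

Because Latin-1 accepts everything, it must stay last in the list; anything after it would never be tried.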



--
Steven


------------------------------

Message: 2
Date: Sat, 28 Jul 2012 11:25:30 +0100
From: Mark Lawrence <breamoreboy at yahoo.co.uk>
To: tutor at python.org
Subject: Re: [Tutor] Search and replace text in XML file?
Message-ID: <jv0emn$eda$1 at dough.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 28/07/2012 02:38, Todd Tabern wrote:
> I'm looking to search an entire XML file for specific text and replace that text, while maintaining the structure of the XML file. The text occurs within multiple nodes throughout the file.
> I basically need to replace every occurrence of C:\Program Files with C:\Program Files (x86), regardless of location. For example, that text appears within:
> <URL>C:\Program Files\\Map Data\Road_Centerlines.shp</URL>
> and also within:
> <RoutingIndexPathName>C:\Program Files\Templates\RoadNetwork.rtx</RoutingIndexPathName>
> ...among others.
> I've tried some non-python methods and they all ruined the XML structure. I've been Google searching all day and can only seem to find solutions that look for a specific node and replace the whole string between the tags.
> I've been looking at using minidom to achieve this but I just can't seem to figure out the right method.
> My end goal, once I have working code, is to compile an exe that works on machines without Python, which a user can click to perform the XML modification.
> Thanks in advance.
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>

Did you really have to ask the same question on two separate Python
mailing lists and only 15 minutes apart?

--
Cheers.

Mark Lawrence.
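For reference, the underlying question can be approached with the standard library's xml.etree.ElementTree (a swapped-in choice; the poster mentioned minidom). This Python 3 sketch walks every element and rewrites text, tail and attribute values, leaving the element structure alone. The helper names fix_text and fix_paths and the sample document are hypothetical. ElementTree's default parser discards comments, and re-serialising can change cosmetic details, so try it on a copy of the file first.

```python
import re
import xml.etree.ElementTree as ET

OLD = "C:\\Program Files"
NEW = "C:\\Program Files (x86)"
# Match OLD only when not already followed by " (x86)", so running the
# script twice does not produce "Program Files (x86) (x86)".
PATTERN = re.compile(re.escape(OLD) + r"(?! \(x86\))")


def fix_text(value):
    # A function replacement stops re from treating "\P" in NEW as an escape.
    return PATTERN.sub(lambda match: NEW, value)


def fix_paths(xml_text):
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # every element, at any depth
        if elem.text:
            elem.text = fix_text(elem.text)
        if elem.tail:
            elem.tail = fix_text(elem.tail)
        for key, value in list(elem.attrib.items()):
            elem.set(key, fix_text(value))
    return ET.tostring(root, encoding="unicode")


sample = (
    "<Config>"
    "<URL>C:\\Program Files\\Map Data\\Road_Centerlines.shp</URL>"
    "<RoutingIndexPathName>C:\\Program Files\\Templates\\RoadNetwork.rtx"
    "</RoutingIndexPathName>"
    "</Config>"
)
print(fix_paths(sample))
```

For a file on disk, ET.parse(path) and tree.write(path, encoding="utf-8", xml_declaration=True) would play the roles of fromstring and tostring.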



------------------------------

Message: 3
Date: Sat, 28 Jul 2012 18:45:47 +0800
From: Dat Huynh <htdatcse at gmail.com>
To: tutor at python.org
Subject: Re: [Tutor] Encoding error when reading text files in Python
        3
Message-ID:
        <CAPw=odian5_MYMudR+OiaWstgoG9i+zDoSk-0rRy1arxif1G0g at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

I changed my code and it runs on Python 3 now.

f = open(rootdir+file, 'rb')
data = f.read().decode('utf8', 'ignore')

Thank you very much.
Sincerely,
Dat.




On Sat, Jul 28, 2012 at 6:09 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> [...]


------------------------------

Message: 4
Date: Sat, 28 Jul 2012 17:12:57 +0200
From: Francesco Loffredo <fal at libero.it>
To: tutor at python.org
Subject: Re: [Tutor] Flatten a list in tuples and remove doubles
Message-ID: <50140179.2080306 at libero.it>
Content-Type: text/plain; charset=windows-1251; format=flowed

On 19/07/2012 19:33, PyProg PyProg wrote:
> Hi all,
>
> I would like to get a new list like this:
>
> [(0, '3eA', 'Dupont', 'Juliette', '11.0/10.0', '4.0/5.0', '17.5/30.0',
> '3.0/5.0', '4.5/10.0', '35.5/60.0'), (1, '3eA', 'Pop', 'Iggy',
> '12.0/10.0', '3.5/5.0', '11.5/30.0', '4.0/5.0', '5.5/10.0',
> '7.5/10.0', '40.5/60.0')]
>
> ... from this one:
>
> [(0, '3eA', 'Dupont', 'Juliette', 0, 11.0, 10.0), (0, '3eA', 'Dupont',
> 'Juliette', 1, 4.0, 5.0), (0, '3eA', 'Dupont', 'Juliette', 2, 17.5,
> 30.0), (0, '3eA', 'Dupont', 'Juliette', 3, 3.0, 5.0), (0, '3eA',
> 'Dupont', 'Juliette', 4, 4.5, 10.0), (0, '3eA', 'Dupont', 'Juliette',
> 5, 35.5, 60.0), (1, '3eA', 'Pop', 'Iggy', 0, 12.0, 10.0), (1, '3eA',
> 'Pop', 'Iggy', 1, 3.5, 5.0), (1, '3eA', 'Pop', 'Iggy', 2, 11.5, 30.0),
> (1, '3eA', 'Pop', 'Iggy', 3, 4.0, 5.0), (1, '3eA', 'Pop', 'Iggy', 4,
> 5.5, 10.0), (1, '3eA', 'Pop', 'Iggy', 5, 40.5, 60.0)]
>
> How can I do that? I've been looking, but so far I can't do it.
>
> Thanks in advance.
>
> a+
>
I had to study your present and desired lists carefully, and this is what
I understood (next time, please explain!):
- each 7-tuple in your present list is a record of one measure relating
to a person. Its fields are as follows:
     - field 0: code (I think you want these in ascending order)
     - field 1: group code (could be a class or group to which both of
your example persons belong)
     - fields 2, 3: surname and first name of the person
     - field 4: sequence number of the measure (these are already in order,
but I think you want to enforce this), which you want to exclude from
the output list while keeping the order
     - fields 5, 6: numerator and denominator of a ratio that is the
measure. You want the ratio written as a single string: "%s/%s" %
(field5, field6)

Assuming this structure, plus my educated guesses about what you didn't
tell us, here's my solution:

def flatten(inlist)
    """
      takes PyProg PyProg's current list and returns his/her desired one,
      given my guesses about the structure of inlist and the desired result.
    """
    tempdict = {}
    for item in inlist:
        if len(item) != 7:
            print "Item errato: \n", item
        id = tuple(item[:4])
        progr = item[4]
        payload = "%s/%s" % item[5:]
        if id in tempdict:
            tempdict[id].extend([(progr, payload)])
        else:
            tempdict[id] = [(progr, payload)]
    for item in tempdict:
        tempdict[item].sort() # so we set payloads in progressive order, if they aren't already
    # print "Temporary Dict: ", tempdict
    tmplist2 = []
    for item in tempdict:
        templist = []
        templist.extend(item)
        templist.extend(tempdict[item])
        tmplist2.append(tuple(templist))
    tmplist2.sort() # so we set IDs in order
    # print "Temporary List: ", tmplist2
    outlist = []
    for item in tmplist2:
        templist = []
        if isinstance(item, tuple):
            for subitem in item:
                if isinstance(subitem, tuple):
                    templist.append(subitem[1])
                else:
                    templist.append(subitem)
            outlist.append(tuple(templist))
        else:
            outlist.append(item)
    # print "\nOutput List: ", outlist
    return outlist
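A shorter alternative, sketched in Python 3 under the same guesses about the fields, uses a different technique: sort the records, then merge adjacent ones with itertools.groupby. flatten_records is a hypothetical name. On the posted data it reproduces the Dupont row exactly; for Iggy it yields six ratios, since the '7.5/10.0' in the desired output does not appear anywhere in the input.

```python
from itertools import groupby
from operator import itemgetter


def flatten_records(records):
    """Group 7-field records by their first four fields (a sketch)."""
    for rec in records:
        if len(rec) != 7:
            raise ValueError("expected 7 fields, got %r" % (rec,))
    # groupby only merges *adjacent* items, so sort by ID, then sequence number.
    ordered = sorted(records, key=itemgetter(0, 1, 2, 3, 4))
    flat = []
    for ident, group in groupby(ordered, key=itemgetter(0, 1, 2, 3)):
        # ident is the (code, group, surname, name) tuple; append the ratios
        ratios = tuple("%s/%s" % (rec[5], rec[6]) for rec in group)
        flat.append(ident + ratios)
    return flat
```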


------------------------------

Message: 5
Date: Sat, 28 Jul 2012 18:29:20 +0200
From: Francesco Loffredo <fal at libero.it>
To: tutor at python.org
Subject: Re: [Tutor] Flatten a list in tuples and remove doubles
Message-ID: <50141360.6030606 at libero.it>
Content-Type: text/plain; charset=windows-1251; format=flowed

On 28/07/2012 17:12, Francesco Loffredo wrote:
> [...]
OK, as usual, when I look again at something I wrote, I find some small
mistakes. Here are my corrections:

1- of course, a function definition must end with a colon...
    line 1:
def flatten(inlist):

2- sorry, English is not my first language ("errato" is Italian for "wrong")...
    line 9:
              print "Item length wrong!\n", item

3- I didn't put a break (or anything else) after line 9, but if inlist
contained a malformed item it would be better to do more than simply
tell the user: for example, we could skip that item, trim or pad it,
stop the execution, or raise an exception. As written, the function just
warns the unsuspecting user and carries on, which will very probably lead
to an exception later on or (much worse) to wrong results. So:
    lines 8-9:
         if len(item) != 7:
              print "Item length wrong!\n", item
              raise ValueError("item length != 7")


... now I feel better ... but I must avoid reading my function again, or
I'll find some more bugs!

Francesco
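For convenience, here is a Python 3 sketch of Francesco's function with the three corrections applied. Beyond the errata it makes a few assumed, behaviour-preserving changes: print becomes a function, id is renamed ident to avoid shadowing the built-in, and the last two loops are condensed into one pass over the sorted keys.

```python
def flatten(inlist):
    """Francesco's flatten() with the three corrections applied (Python 3)."""
    tempdict = {}
    for item in inlist:
        if len(item) != 7:
            print("Item length wrong!\n", item)
            raise ValueError("item length != 7")
        ident = tuple(item[:4])              # code, group, surname, name
        progr = item[4]                      # sequence number of the measure
        payload = "%s/%s" % tuple(item[5:])  # "numerator/denominator"
        tempdict.setdefault(ident, []).append((progr, payload))
    outlist = []
    for ident in sorted(tempdict):           # IDs in order
        measures = sorted(tempdict[ident])   # payloads by sequence number
        outlist.append(ident + tuple(payload for _, payload in measures))
    return outlist
```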


------------------------------

_______________________________________________
Tutor maillist  -  Tutor at python.org
http://mail.python.org/mailman/listinfo/tutor


End of Tutor Digest, Vol 101, Issue 99
**************************************

