[Tutor] Critique and Question

Mon Nov 28 18:31:31 CET 2011

Sorry for not providing all the required info. I am running Python 3.2 on
Windows Vista. I determined the files are double spaced by viewing them
(random sampling) in Notepad++. They are not double spaced on the server; I
checked by downloading one in the browser. Can I use the 'Wu' flag when
writing? I might just be 'w'-ing. I will look when I get home.

Thanks

Mark

Also, I'm a little bummed that the subprocess module doesn't appear to work
on Windows. I probably (hopefully) won't need it, but it still bums me.
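
On the 'Wu' question, a hedged note: as far as I know, the universal-newlines
flag only applies to reading, and in Python 3 it is the newline argument of
open() that controls translation on write. If the downloaded text already
contains \r\n, writing it in plain 'w' mode on Windows would translate each \n
to \r\n again, giving \r\r\n, which some editors display as an extra blank
line. Below is a minimal sketch of one way to avoid that. The helper names
normalize_newlines and save_text and the demo file name are hypothetical, not
part of the original script.

```python
import os
import tempfile


def normalize_newlines(text):
    """Collapse CRLF and lone CR line endings to a single LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def save_text(path, text):
    """Write text with LF line endings, regardless of platform."""
    # newline="" turns off newline translation on write, so the LF
    # characters are stored exactly as given (no extra CR on Windows).
    with open(path, mode="w", encoding="utf-8", newline="") as a_file:
        a_file.write(normalize_newlines(text))


if __name__ == "__main__":
    # Hypothetical demo file standing in for one downloaded example.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.py")
    save_text(demo_path, "import os\r\nprint(os.name)\r\n")
    with open(demo_path, "rb") as raw:
        print(repr(raw.read()))
```

If Windows-style line endings are wanted on disk instead, dropping the
newline="" argument after normalizing should let Python write the platform
default.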
 On Nov 28, 2011 4:27 AM, "Dave Angel" <d at davea.name> wrote:

> On 11/28/2011 04:28 AM, Mark Lybrand wrote:
>
>> Okay, so I just started to learn Python.  I have been working through Dive
>> Into Python 3 and the Google stuff (great exercises IMHO, totally fun).
>>  However, with Dive, I had an issue with him referencing the files in the
>> example directory, which from the website seem very unhandy.  Although I
>> have since stumbled upon his GitHub, I made a Python script to grab those
>> files for me and it works great, with the exception of doubling the line
>> spacing.  So here is my code. I hope you critique the heck out of me and
>> that you point out what I did wrong to introduce double line-spacing.
>>  Thanks a bunch:
>>
>> import os
>> import urllib.request
>> import re
>>
>> url_root = 'http://diveintopython3.ep.io/examples/'
>> file_root = os.path.join(os.path.expanduser("~"), "diveintopython3",
>>                          "examples")
>>
>> main_page = urllib.request.urlopen(url_root).read()
>> main_page = main_page.decode("utf-8")
>>
>> pattern = 'href="([^"].*?.)(py|xml)"'
>> matches = re.findall(pattern, main_page)
>> for my_tuple in matches:
>>     this_file = my_tuple[0] + my_tuple[1]
>>     data = urllib.request.urlopen(url_root + this_file).read()
>>     data = data.decode("utf-8")
>>     with open(os.path.join(file_root, this_file), mode='w',
>>               encoding='utf-8') as a_file:
>>         a_file.write(data)
>>
> You don't say what your environment is, nor how you decided that the
> file is double-spaced.  You also don't mention whether you're using Python
> 2.x or 3.x.
>
> My guess is that you are using a Unix/Linux environment, and that the Dive
> author(s) used Windows.  And that your text editor is interpreting the
> cr/lf pair (hex 0d 0a) as two line-endings.  I believe emacs would have
> ignored the redundant cr.  Python likewise probably won't care, though I'm
> not positive about things like lines that continue across newline
> boundaries.
>
> You can figure out what is actually in the file by using repr() on bytes
> read from the file in binary mode.  Exactly how you do that will differ
> between Python 2.x and 3.x.
>
> As for fixing it, you could either just use one of the dos2unix utilities
> kicking around (one's available on my Ubuntu from the Synaptic package
> manager), or you could make your utility manage it.  On a regular file
> open, there's a mode parameter where you can use "U", or better "rU", to say
> Universal.  It's intended to handle any of the three common line endings,
> and use a simple newline for all 3 cases.  I don't know whether urlopen()
> also has that option, but if not, you can always copy the file after you
> have it locally.
>
>
> --
>
> DaveA
>
>
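
A minimal sketch of the repr() check described above, assuming Python 3. It
reads a file in binary mode so no newline translation happens, prints the raw
bytes, and counts each kind of line ending. The helper names read_raw and
count_line_endings and the sample file are hypothetical; point read_raw at one
of the downloaded files to inspect it.

```python
import os
import tempfile


def read_raw(path):
    """Return the file's contents as bytes, with no newline translation."""
    with open(path, "rb") as f:
        return f.read()


def count_line_endings(raw):
    """Count CRLF pairs, lone CRs, and lone LFs in a bytes object."""
    crlf = raw.count(b"\r\n")
    return {
        "crlf": crlf,
        "cr": raw.count(b"\r") - crlf,  # CRs not followed by LF
        "lf": raw.count(b"\n") - crlf,  # LFs not preceded by CR
    }


if __name__ == "__main__":
    # Hypothetical sample imitating a file whose CRLF was translated twice.
    sample_path = os.path.join(tempfile.mkdtemp(), "sample.py")
    with open(sample_path, "wb") as f:
        f.write(b"line one\r\r\nline two\r\r\n")

    raw = read_raw(sample_path)
    print(repr(raw[:40]))           # shows \r and \n explicitly
    print(count_line_endings(raw))  # a nonzero "cr" count hints at \r\r\n
```

A file with clean Windows endings should show only "crlf" counts, and a clean
Unix file only "lf" counts; stray "cr" entries alongside "crlf" suggest the
line endings were translated twice.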