[Tutor] Critique and Question
mlybrand at gmail.com
Mon Nov 28 18:31:31 CET 2011
Sorry for not providing all the required info. I am running Python 3.2 on
Windows Vista. I determined the files are double spaced by viewing a
random sample of them in Notepad++. They are not double spaced on the
server: I checked by downloading one in the browser. Can I use the 'Wu'
flag when writing? I might just be 'w'-ing. I will look when I get home.
Also a little bummed that the subprocess module doesn't appear to work on
Windows. I probably (hopefully) won't need it, but it still bums me.
On Nov 28, 2011 4:27 AM, "Dave Angel" <d at davea.name> wrote:
> On 11/28/2011 04:28 AM, Mark Lybrand wrote:
>> Okay, so I just started to learn Python. I have been working through Dive
>> Into Python 3 and the Google stuff (great exercises IMHO, totally fun).
>> However, with Dive, I had an issue with him referencing the files in the
>> example directory, which from the website seem very unhandy. Although I
>> have since stumbled upon his GitHub, I made a Python script to grab those
>> files for me and it works great, with the exception of doubling the line
>> spacing. So here is my code. I hope you critique the heck out of my code and
>> that you point out what I did wrong to introduce double line-spacing.
>> Thanks a bunch:
>> import os
>> import urllib.request
>> import re
>> url_root = 'http://diveintopython3.ep.io/examples/'
>> file_root = os.path.join(os.path.expanduser("~"), "diveintopython3",
>> main_page = urllib.request.urlopen(url_root).read()
>> main_page = main_page.decode("utf-8")
>> pattern = 'href="([^"].*?.)(py|xml)"'
>> matches = re.findall(pattern, main_page)
>> for my_tuple in matches:
>>     this_file = my_tuple[0] + my_tuple[1]
>>     data = urllib.request.urlopen(url_root + this_file).read()
>>     data = data.decode("utf-8")
>>     with open(os.path.join(file_root, this_file), mode='w', encoding='utf-8') as a_file:
>>         a_file.write(data)
> You don't tell what your environment is, nor how you decide that the
> file is double-spaced. You also don't mention whether you're using Python
> 2.x or 3.x.
> My guess is that you are using a Unix/Linux environment, and that the Dive
> author(s) used Windows. And that your text editor is interpreting the
> cr/lf pair (hex 0d 0a) as two line-endings. I believe emacs would have
> ignored the redundant cr. Python likewise probably won't care, though I'm
> not positive about things like lines that continue across a newline.
> You can figure out what is actually in the file by using repr() on bytes
> read from the file in binary mode. Exactly how you do that will differ
> between Python 2.x and 3.x
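
A minimal Python 3 sketch of the repr() check described above. The helper
name show_line_endings and the sample bytes are made up for illustration;
point the function at one of the downloaded files instead of the temp file.

```python
import os
import tempfile

def show_line_endings(path, limit=200):
    # Binary mode ("rb") returns the bytes exactly as stored on disk,
    # with no newline translation applied by Python.
    with open(path, "rb") as f:
        return repr(f.read(limit))

# Throwaway demo file with hypothetical contents: each line ends in
# CR CR LF, which many editors display as double spacing.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"line one\r\r\nline two\r\r\n")
    demo_path = tmp.name

preview = show_line_endings(demo_path)
print(preview)
os.remove(demo_path)
```

If the output shows \r\r\n at the end of each line, the extra carriage
return is being added when the file is written, not on the server.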
> As for fixing it, you could either just use one of the dos2unix utilities
> kicking around (one's available on my Ubuntu from the Synaptic package
> manager), or you could make your utility manage it. On a regular file
> open, there's a mode parameter where you can use "U", or better "rU", to say
> Universal. It's intended to handle any of the three common line endings,
> and use a simple newline for all 3 cases. I don't know whether urlopen()
> also has that option, but if not, you can always copy the file after you
> have it locally.
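
One plausible cause, assuming the server sends CRLF line endings: bytes
from urlopen().read() are not newline-translated, so the decoded string
still contains '\r\n'. Writing it in text mode on Windows then turns each
'\n' into '\r\n', leaving '\r\r\n' on disk. Below is a rough sketch of
making the script manage this itself. The helper name save_text and the
sample payload are hypothetical; the newline parameter of open() is part
of Python 3.

```python
import os
import tempfile

def save_text(path, text):
    # Collapse CRLF and bare CR to a single LF before writing.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # newline="" turns off newline translation on output, so the LF
    # characters are written unchanged on every platform.
    with open(path, mode="w", encoding="utf-8", newline="") as f:
        f.write(normalized)

# Hypothetical payload standing in for a decoded download with CRLF endings.
downloaded = "import os\r\nprint(os.name)\r\n"
path = os.path.join(tempfile.mkdtemp(), "example.py")
save_text(path, downloaded)

# Read the file back in binary mode to see what actually landed on disk.
with open(path, "rb") as f:
    raw = f.read()
print(repr(raw))
```

Another option, if no text processing is needed, is to skip decode()
entirely and write the downloaded bytes with mode='wb', which stores
exactly what the server sent.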