[Tutor] Critique and Question
Dave Angel
d at davea.name
Mon Nov 28 13:26:36 CET 2011
On 11/28/2011 04:28 AM, Mark Lybrand wrote:
> Okay, so I just started to learn Python. I have been working through Dive
> Into Python 3 and the Google stuff (great exercises IMHO, totally fun).
> However, with Dive, I had an issue with him referencing the files in the
> example directory, which from the website seem very unhandy. Although I
> have since stumbled upon his GitHub, I made a Python script to grab those
> files for me and it works great, with the exception of doubling the line
> spacing. So here is my code. I hope you critique the heck out of my and
> that you point out what I did wrong to introduce double line-spacing.
> Thanks a bunch:
>
> import os
> import urllib.request
> import re
>
> url_root = 'http://diveintopython3.ep.io/examples/'
> file_root = os.path.join(os.path.expanduser("~"), "diveintopython3",
> "examples")
>
> main_page = urllib.request.urlopen(url_root).read()
> main_page = main_page.decode("utf-8")
>
> pattern = 'href="([^"].*?.)(py|xml)"'
> matches = re.findall(pattern, main_page)
> for my_tuple in matches:
>     this_file = my_tuple[0] + my_tuple[1]
>     data = urllib.request.urlopen(url_root + this_file).read()
>     data = data.decode("utf-8")
>     with open(os.path.join(file_root, this_file), mode='w', encoding='utf-8') as a_file:
>         a_file.write(data)
>
You don't say what your environment is, nor how you decided that the
file is double-spaced. You also don't mention whether you're using
Python 2.x or 3.x.

My guess is that you are using a Unix/Linux environment, that the
Dive author(s) used Windows, and that your text editor is interpreting
the cr/lf pair (hex 0d 0a) as two line endings. I believe emacs would
have ignored the redundant cr. Python likewise probably won't care,
though I'm not positive about things like lines that continue across
newline boundaries.

You can figure out what is actually in the file by using repr() on bytes
read from the file in binary mode. Exactly how you do that will differ
between Python 2.x and 3.x.
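
A minimal sketch of that check, assuming Python 3 (the urllib.request
import in the quoted script suggests it). The helper names
read_raw_repr and count_line_endings are made up for illustration, and
the path you pass in is whichever file the script saved.

```python
def read_raw_repr(path, limit=200):
    """Return repr() of the first `limit` bytes, read in binary mode.

    Binary mode ("rb") matters: text mode may translate line endings
    before you get to see them.
    """
    with open(path, "rb") as f:
        return repr(f.read(limit))


def count_line_endings(data):
    """Count cr/lf pairs, bare lf and bare cr in a bytes object."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,   # lf not preceded by cr
        "cr": data.count(b"\r") - crlf,   # cr not followed by lf
    }
```

If the repr() output shows \r\n, the file contains cr/lf pairs. A
\r\r\n sequence would instead suggest that an extra cr was added when
the file was written, which text-mode writes on Windows can do.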

As for fixing it, you could either just use one of the dos2unix
utilities kicking around (one's available on my Ubuntu from the Synaptic
package manager), or you could make your utility manage it. On a
regular file open, there's a mode parameter for which you can use "U",
or better "rU", to say Universal. It's intended to handle any of the
three common line endings, and use a simple newline for all 3 cases. I
don't know whether urlopen() also has that option, but if not, you can
always copy the file after you have it locally.
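
One way to make the utility manage it, sketched under the assumption
of Python 3, where text-mode reads with open() already translate all
three line endings by default and the translation on write is
controlled by the newline argument. The helper names
normalize_newlines and save_text are hypothetical, not part of the
original script.

```python
def normalize_newlines(text):
    """Collapse cr/lf pairs and bare cr into a single lf."""
    # Replace the pairs first so a cr/lf does not become two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def save_text(path, text, encoding="utf-8"):
    """Write text with lf line endings, whatever the platform."""
    # newline="\n" tells open() not to translate "\n" on output.
    with open(path, "w", encoding=encoding, newline="\n") as f:
        f.write(normalize_newlines(text))
```

In the quoted script this would stand in for the final with block,
for example save_text(os.path.join(file_root, this_file), data).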
--
DaveA