[Tutor] Critique and Question

Dave Angel d at davea.name
Mon Nov 28 13:26:36 CET 2011

On 11/28/2011 04:28 AM, Mark Lybrand wrote:
> Okay, so I just started to learn Python.  I have been working through Dive
> Into Python 3 and the Google stuff (great exercises IMHO, totally fun).
>   However, with Dive, I had an issue with him referencing the files in the
> example directory, which from the website seem very unhandy.  Although I
> have since stumbled upon his GitHub, I made a Python script to grab those
> files for me and it works great, with the exception of doubling the line
> spacing.  So here is my code. I hope you critique the heck out of it and
> that you point out what I did wrong to introduce double line-spacing.
>   Thanks a bunch:
> import os
> import urllib.request
> import re
> url_root = 'http://diveintopython3.ep.io/examples/'
> file_root = os.path.join(os.path.expanduser("~"), "diveintopython3",
> "examples")
> main_page = urllib.request.urlopen(url_root).read()
> main_page = main_page.decode("utf-8")
> pattern = 'href="([^"].*?.)(py|xml)"'
> matches = re.findall(pattern, main_page)
> for my_tuple in matches:
>     this_file = my_tuple[0] + my_tuple[1]
>     data = urllib.request.urlopen(url_root + this_file).read()
>     data = data.decode("utf-8")
>     with open(os.path.join(file_root, this_file), mode='w', encoding='utf-8') as a_file:
>         a_file.write(data)

You don't say what your environment is, nor how you decided that the
file is double-spaced.  You also don't mention whether you're using
Python 2.x or 3.x.

My guess is that you are using a Unix/Linux environment, and that the 
Dive author(s) used Windows.  And that your text editor is interpreting 
the cr/lf pair (hex 0d 0a) as two line-endings.  I believe emacs would 
have ignored the redundant cr.  Python likewise probably won't care, 
though I'm not positive about things like lines that continue across 
newline boundaries.

You can figure out what is actually in the file by using repr() on bytes 
read from the file in binary mode.  Exactly how you do that will differ 
between Python 2.x and 3.x.
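
A minimal sketch of that check, assuming Python 3 (the quoted script uses
urllib.request, which only exists there).  The helper names peek_raw and
count_line_endings are made up for illustration:

```python
def peek_raw(path, count=200):
    """Return repr() of the first `count` bytes of the file at `path`."""
    # Binary mode ("rb") means Python does no newline translation at all,
    # so any \r characters in the file show up literally in the repr.
    with open(path, "rb") as f:
        return repr(f.read(count))


def count_line_endings(path):
    """Count CR/LF pairs, lone CRs and lone LFs in the file at `path`."""
    with open(path, "rb") as f:
        raw = f.read()
    crlf = raw.count(b"\r\n")
    return {
        "crlf": crlf,
        "lone_cr": raw.count(b"\r") - crlf,
        "lone_lf": raw.count(b"\n") - crlf,
    }
```

Call peek_raw() on one of the downloaded files and look for \r\n (or
\r\r\n) in the output.  In Python 2.x the same open(path, "rb") works,
but the result is a plain str rather than a bytes object.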

As for fixing it, you could either just use one of the dos2unix
utilities kicking around (one's available on my Ubuntu from the Synaptic
package manager), or you could make your utility manage it.  On a
regular file open, there's a mode parameter where you can use "U", or
better "rU", to say Universal.  It's intended to handle any of the three
common line endings, and use a simple newline for all 3 cases.  (In
Python 3.x, a file opened for reading in text mode already does this by
default.)  I don't know whether urlopen() also has that option, but if
not, you can always copy the file after you have it locally.
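
If you'd rather have the script handle it, here is a minimal sketch,
again assuming Python 3.  Rather than relying on any option of
urlopen(), it normalizes the decoded text itself before writing.  The
names normalize_newlines and save_text are hypothetical helpers, not
anything from the standard library:

```python
def normalize_newlines(text):
    """Turn CR/LF pairs and lone CRs into plain LF newlines."""
    # Replace the pairs first, so a CR/LF does not become two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def save_text(path, text):
    """Write `text` to `path` as UTF-8 with LF line endings on any platform."""
    # newline="\n" turns off newline translation on write, so the LFs
    # are stored as-is instead of being expanded to the platform default.
    with open(path, mode="w", encoding="utf-8", newline="\n") as a_file:
        a_file.write(normalize_newlines(text))
```

In the quoted loop, the a_file.write(data) step would become something
like save_text(os.path.join(file_root, this_file), data).  Drop the
newline="\n" argument if you want the platform's native line endings
instead.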


