[Tutor] problem with back slash
Alex Kleider
alexkleider at gmail.com
Wed Feb 23 22:33:28 EST 2022
Thank you, Cameron, for pointing me in the right direction: it's just a
coincidence that the program failed on the line that happened to contain
backslashes!
Changing the conditional for the 'break' solved the problem:
if (next_space == -1) or (next_space > max_len):
    break
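
For the record, here is a minimal self-contained sketch of the loop with that conditional in place. The function name last_space_within, its signature and the starting value of i_space are made up for illustration; only the find() call and the break test come from the actual program.

```python
def last_space_within(text, max_len):
    """Return the index of the last space at or before max_len, or -1.

    Hypothetical helper: the name and signature are illustrative only.
    """
    i_space = -1  # assumed starting point: "no space found yet"
    while True:
        next_space = text.find(' ', i_space + 1)
        # find() returns -1 when there is no further space. Without the
        # first test, i_space would be reset to -1 and the scan would
        # start over from the beginning of the string.
        if (next_space == -1) or (next_space > max_len):
            break
        i_space = next_space
    return i_space


print(last_space_within("You can activate", 70))
```

With the extra test the loop ends as soon as the spaces run out, whether or not any of them lies beyond max_len.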
And thank you for your other suggestions as well.
Thanks to Dennis and Matts for drawing my attention to the textwrap
module; I was not aware of its existence.
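
A rough sketch of how textwrap might do the same job, assuming the goal is still lines of at most 70 characters with the original leading spaces repeated on each output line. The helper name wrap_line is hypothetical, and this is not how my utility currently works.

```python
import textwrap

MAX_LEN = 70  # same limit as chosen for the utility


def wrap_line(line, width=MAX_LEN):
    """Wrap one long line, repeating its leading spaces on every output line.

    Hypothetical helper, shown only to illustrate textwrap.fill().
    """
    stripped = line.lstrip(' ')
    indent = line[:len(line) - len(stripped)]
    return textwrap.fill(
        stripped.rstrip(),
        width=width,
        initial_indent=indent,
        subsequent_indent=indent,
        break_long_words=False,   # keep over-long words (e.g. URLs) whole
        break_on_hyphens=False,   # do not split at the hyphens inside a URL
    )


print(wrap_line("    " + "word " * 40))
```

The two break_* options are switched off here so that a long URL stays on one line even though it exceeds the width.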
On Wed, Feb 23, 2022 at 3:30 PM Cameron Simpson <cs at cskk.id.au> wrote:
> On 23Feb2022 11:37, Alex Kleider <alexkleider at gmail.com> wrote:
> >I've written myself a little utility that accepts a text file which
> >might have very long lines and returns a file with the same text but
> >(as much as possible) with the lines no longer than MAX_LEN
> >characters. (I've chosen 70.)
> >It seems to work except when the source file contains back
> >slashes! (Presence of a back slash appears to cause the program
> >to go into an endless loop.)
>
> There's _nothing_ in your code which cares about backslashes.
>
> >I've tried converting to raw strings but to no avail.
>
> I have no idea what you mean here - all the strings you're manipulating
> come from the text file. They're just "strings". A Python "raw string"
> is just a _syntactic_ way to express a string in a programme, eg:
>
> r'some regexp maybe \n foo'
>
> After that's evaluated, it is just a string.
>
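
A small sketch to illustrate the point about raw strings, assuming nothing beyond the standard library. The file name sample.txt is made up; the file is created in a temporary directory purely for the demonstration.

```python
import os
import tempfile

# A raw literal and an ordinary literal with a doubled backslash
# evaluate to exactly the same string.
assert r'C:\new' == 'C:\\new'
assert len(r'\n') == 2   # a backslash followed by 'n'
assert len('\n') == 1    # a single newline character

# Text read from a file is never "escaped": a backslash in the file is
# simply one character in the resulting string.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')   # hypothetical file name
    with open(path, 'w') as f:
        f.write('a\\b\n')                    # writes: a, backslash, b, newline
    with open(path) as f:
        line = f.readline()

print(repr(line), len(line))
```

The r prefix only changes how the source code is read; it has no meaning for strings that arrive from a file.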
> >Here's the code, followed by an example source file.
>
> Thank you. This shows the bug. Here's me running it:
>
> [~/tmp/p1]fleet2*> py3 foo.py input.txt
> line: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935
> '
> writing: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935
> '
> remaining: ''
> line: '
> '
> writing: ''
> remaining: ''
> line: 'You can activate Windows 10 using the product key for your hardware which
> '
>
> ^CTraceback (most recent call last):
>   File "/Users/cameron/tmp/p1/foo.py", line 63, in <module>
>     line2write, line = split_on_space_closest_to_max_len(
>   File "/Users/cameron/tmp/p1/foo.py", line 47, in split_on_space_closest_to_max_len
>     if next_space > max_len: break
> KeyboardInterrupt
>
> It hung just before the traceback, where I interrupted it with ^C.
>
> Now, that tells me where in the code it was - the programme is not hung,
> it is spinning. When interrupted it was in this loop:
>
> while True:
>     next_space = unindented_line.find(' ', i_space+1)
>     if next_space > max_len: break
>     else: i_space =next_space
>
> On the face of it, that loop should always advance i_space and therefore
> exit. But find() can return -1:
>
> >>> help(str.find)
> Help on method_descriptor:
>
> find(...)
>     S.find(sub[, start[, end]]) -> int
>
>     Return the lowest index in S where substring sub is found,
>     such that sub is contained within S[start:end]. Optional
>     arguments start and end are interpreted as in slice notation.
>
>     Return -1 on failure.
>
> i.e. when there is no space from the search point onward. So this could
> spin out. Let's see with modified code:
>
> print("LOOP1")
> while True:
>     assert ' ' in unindented_line[i_space+1:], (
>         "no space in unindented_line[i_space(%d)+1:]: %r"
>         % (i_space, unindented_line[i_space+1:])
>     )
>     next_space = unindented_line.find(' ', i_space+1)
>     if next_space > max_len: break
>     else: i_space =next_space
> print("LOOP1 DONE")
>
> thus:
>
> [~/tmp/p1]fleet2*> py3 foo.py input.txt
> line: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935
> '
> writing: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935
> '
> remaining: ''
> line: '
> '
> writing: ''
> remaining: ''
> line: 'You can activate Windows 10 using the product key for your hardware which
> '
> LOOP1
> Traceback (most recent call last):
>   File "/Users/cameron/tmp/p1/foo.py", line 69, in <module>
>     line2write, line = split_on_space_closest_to_max_len(
>   File "/Users/cameron/tmp/p1/foo.py", line 47, in split_on_space_closest_to_max_len
>     assert ' ' in unindented_line[i_space+1:], (
> AssertionError: no space in unindented_line[i_space(67)+1:]: 'which'
>
> As suspected. Commenting out the assert and printing next_space shows
> the cycle, with this code:
>
> print("LOOP1")
> while True:
>     ##assert ' ' in unindented_line[i_space+1:], (
>     ##    "no space in unindented_line[i_space(%d)+1:]: %r"
>     ##    % (i_space, unindented_line[i_space+1:])
>     ##)
>     next_space = unindented_line.find(' ', i_space+1)
>     print("next_space =", next_space)
>     if next_space > max_len: break
>     else: i_space =next_space
> print("LOOP1 DONE")
>
> which outputs this:
>
> [~/tmp/p1]fleet2*> py3 foo.py input.txt 2>&1 | sed 50q
> line: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935
> '
> writing: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935
> '
> remaining: ''
> line: '
> '
> writing: ''
> remaining: ''
> line: 'You can activate Windows 10 using the product key for your hardware which
> '
> LOOP1
> next_space = 7
> next_space = 16
> next_space = 24
> next_space = 27
> next_space = 33
> next_space = 37
> next_space = 45
> next_space = 49
> next_space = 53
> next_space = 58
> next_space = 67
> next_space = -1
> next_space = 3
> next_space = 7
> next_space = 16
> next_space = 24
> next_space = 27
> next_space = 33
>
> and so on indefinitely. You can see next_space reset to -1, which sets
> i_space to -1, so the next search starts again from the beginning.
>
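
The cycle can be reproduced in isolation with a sketch like the one below. The function trace_find and its iteration cap are hypothetical additions so that the demonstration terminates; the starting value of i_space is also an assumption, which is why the trace begins one space earlier than the output above.

```python
def trace_find(text, max_len, limit=20):
    """Record successive find() results for the original loop.

    Hypothetical helper: stops after `limit` iterations so that the
    demonstration ends even though the original loop would not.
    """
    seen = []
    i_space = -1  # assumed starting value
    for _ in range(limit):
        next_space = text.find(' ', i_space + 1)
        seen.append(next_space)
        if next_space > max_len:   # the original, incomplete exit test
            break
        i_space = next_space       # -1 here restarts the scan at index 0
    return seen


text = ('You can activate Windows 10 using the product key for your '
        'hardware which')
print(trace_find(text, 70))
```

Every space in that line sits at or before index 70, so the exit test never fires and only the cap ends the loop.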
> While I'm here, some random remarks about the code:
>
> > original_line = line[:]
>
> There's no need for this. Because strings are immutable, you can just
> go:
>
> original_line = line
>
> All the other operations on "line" return new strings (because strings
> are immutable), leaving original_line untouched.
>
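
A short sketch of why the plain assignment is enough; the sample text is made up.

```python
line = '   some text\n'
original_line = line        # no copy needed: both names refer to one string

# lstrip() cannot modify the string in place; it builds a new string and
# the assignment merely rebinds the name "line" to it.
line = line.lstrip()

print(repr(original_line))
print(repr(line))
```

Since no operation can alter a string in place, keeping a second name for it is all that is required to preserve the original value.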
> > unindented_line = line.lstrip()
> > n_leading_spaces = line_length - len(unindented_line)
> > if n_leading_spaces > max_len: # big indentation!!!
> >     return ('', line[max_len:])
> > indentation = ' ' * n_leading_spaces
>
> Isn't this also line[:n_leading_spaces]? I would be inclined
> to use that in case the whitespace isn't just spaces (eg TABs). Because
> "line.lstrip()" strips leading whitespace, not just leading spaces, this
> would preserve whatever was there.
>
> However, your code is focussed on the space character, so maybe a more
> precise lstrip() would be better:
>
> line.lstrip(' ')
>
> stripping only space characters.
>
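
A sketch contrasting the two forms of lstrip(), and taking the indentation by slicing the original line so that tabs survive. The helper name split_indent is hypothetical.

```python
line = '\t  indented text'

# With no argument, lstrip() removes any leading whitespace.
assert line.lstrip() == 'indented text'
# With ' ' as the argument, only spaces are removed; the leading tab
# stops the stripping straight away.
assert line.lstrip(' ') == '\t  indented text'


def split_indent(line):
    """Return (indentation, rest), keeping the indentation exactly as found.

    Hypothetical helper, shown only to illustrate the slicing idea.
    """
    stripped = line.lstrip()
    n_leading = len(line) - len(stripped)
    return line[:n_leading], stripped


print(split_indent(line))
```

Slicing keeps a mix of tabs and spaces as it was, whereas rebuilding the indentation from space characters would turn each tab into a single space.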
> [...]
> > while True:
> >     next_space = unindented_line.find(' ', i_space+1)
> >     if next_space > max_len: break
> >     else: i_space =next_space
>
> A lot of us might write this:
>
> while True:
>     next_space = unindented_line.find(' ', i_space+1)
>     if next_space > max_len:
>         break
>     i_space =next_space
>
> dropping the "else:". It is just style, but to my eye it is more clear
> that the "i_space =next_space" is an "unconditional" part of the normal
> loop iteration.
>
> > for line in source:
> ># line = repr(line.rstrip())
>
> The commented out line above would damage "line" (by adding quotes and
> stuff to it), if uncommented.
>
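
A sketch of what repr() would do to a line containing a backslash; the sample text is made up.

```python
line = 'a back\\slash here\n'     # one real backslash in the text

shown = repr(line.rstrip())

print(line.rstrip())              # the text as it is
print(shown)                      # the text as Python source would spell it
print(len(line.rstrip()), len(shown))
```

The result of repr() is a different, longer string: it gains a pair of quotes and every backslash is doubled, so any length-based splitting would be working on the wrong text.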
> > print("line: '{}'".format(line)) # for debugging
>
> You know you can just write this?
>
> print("line:", line)
>
> Cheers,
> Cameron Simpson <cs at cskk.id.au>