[Tutor] problem with back slash
Cameron Simpson
cs at cskk.id.au
Wed Feb 23 16:41:36 EST 2022
On 23Feb2022 11:37, Alex Kleider <alexkleider at gmail.com> wrote:
>I've written myself a little utility that accepts a text file which
>might have very long lines and returns a file with the same text but
>(as much as possible) with the lines no longer than MAX_LEN
>characters. (I've chosen 70.)
>It seems to work except when the source file contains back
>slashes! (Presence of a back slash appears to cause the program
>to go into an endless loop.)
There's _nothing_ in your code which cares about backslashes.
>I've tried converting to raw strings but to no avail.
I have no idea what you mean here - all the strings you're manipulating
come from the text file. They're just "strings". A Python "raw string"
is just a _syntactic_ way to express a string in a programme, eg:
r'some regexp maybe \n foo'
After that's evaluated, it is just a string.
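A small sketch of that point, using made-up sample text and io.StringIO as a stand-in for a real file (both are my assumptions, not taken from your programme): a raw literal and an escaped literal are two spellings of the same value, and a backslash read from a file is just one ordinary character.

```python
import io

# A raw literal and an escaped literal are two spellings of the same string.
raw = r'some regexp maybe \n foo'
escaped = 'some regexp maybe \\n foo'
print(raw == escaped)
print(type(raw))

# Text read from a file is neither "raw" nor "cooked": a backslash in the
# file is simply one character in the resulting string.
# (io.StringIO stands in for a real file here.)
fake_file = io.StringIO('C:\\Users\\alex\n')
line = fake_file.readline()
print(len(line), repr(line))
```

So there is nothing to "convert": by the time your code sees the line, it is already a plain str.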
>Here's the code, followed by an example source file.
Thank you. This shows the bug. Here's me running it:
[~/tmp/p1]fleet2*> py3 foo.py input.txt
line: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935
'
writing: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935'
remaining: ''
line: '
'
writing: ''
remaining: ''
line: 'You can activate Windows 10 using the product key for your hardware which
'
^CTraceback (most recent call last):
File "/Users/cameron/tmp/p1/foo.py", line 63, in <module>
line2write, line = split_on_space_closest_to_max_len(
File "/Users/cameron/tmp/p1/foo.py", line 47, in
split_on_space_closest_to_max_len
if next_space > max_len: break
KeyboardInterrupt
It hung just before the traceback, where I interrupted it with ^C.
Now, that tells me where in the code it was - the programme is not hung,
it is spinning. When interrupted it was in this loop:
    while True:
        next_space = unindented_line.find(' ', i_space+1)
        if next_space > max_len: break
        else: i_space =next_space
On the face of it, that loop should always advance i_space and therefore
eventually exit. But find() can return -1:
>>> help(str.find)
Help on method_descriptor:
find(...)
S.find(sub[, start[, end]]) -> int
Return the lowest index in S where substring sub is found,
such that sub is contained within S[start:end]. Optional
arguments start and end are interpreted as in slice notation.
Return -1 on failure.
i.e. when there is no space from the search point onward. So this could
spin out. Let's see with modified code:
    print("LOOP1")
    while True:
        assert ' ' in unindented_line[i_space+1:], (
            "no space in unindented_line[i_space(%d)+1:]: %r"
            % (i_space, unindented_line[i_space+1:])
        )
        next_space = unindented_line.find(' ', i_space+1)
        if next_space > max_len: break
        else: i_space =next_space
    print("LOOP1 DONE")
thus:
[~/tmp/p1]fleet2*> py3 foo.py input.txt
line: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935
'
writing: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935'
remaining: ''
line: '
'
writing: ''
remaining: ''
line: 'You can activate Windows 10 using the product key for your hardware which
'
LOOP1
Traceback (most recent call last):
File "/Users/cameron/tmp/p1/foo.py", line 69, in <module>
line2write, line = split_on_space_closest_to_max_len(
File "/Users/cameron/tmp/p1/foo.py", line 47, in
split_on_space_closest_to_max_len
assert ' ' in unindented_line[i_space+1:], (
AssertionError: no space in unindented_line[i_space(67)+1:]: 'which'
As suspected. Commenting out the assert and printing next_space shows
the cycle, with this code:
    print("LOOP1")
    while True:
        ##assert ' ' in unindented_line[i_space+1:], (
        ##    "no space in unindented_line[i_space(%d)+1:]: %r"
        ##    % (i_space, unindented_line[i_space+1:])
        ##)
        next_space = unindented_line.find(' ', i_space+1)
        print("next_space =", next_space)
        if next_space > max_len: break
        else: i_space =next_space
    print("LOOP1 DONE")
which outputs this:
[~/tmp/p1]fleet2*> py3 foo.py input.txt 2>&1 | sed 50q
line: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935
'
writing: 'https://superuser.com/questions/1313241/install-windows-10-from-an-unbooted-oem-drive-into-virtualbox/1329935#1329935'
remaining: ''
line: '
'
writing: ''
remaining: ''
line: 'You can activate Windows 10 using the product key for your hardware which
'
LOOP1
next_space = 7
next_space = 16
next_space = 24
next_space = 27
next_space = 33
next_space = 37
next_space = 45
next_space = 49
next_space = 53
next_space = 58
next_space = 67
next_space = -1
next_space = 3
next_space = 7
next_space = 16
next_space = 24
next_space = 27
next_space = 33
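and so on indefinitely. You can see next_space reset to -1. Since -1 is
not greater than max_len, the loop sets i_space to -1 and the next find()
starts again from the beginning of the line.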
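One way to break the cycle, sketched as a standalone helper rather than a patch to your function (the name last_space_within is my own invention, and I'm assuming you want the last space at or before max_len), is to treat the -1 from find() as an exit condition:

```python
def last_space_within(text, max_len):
    """Return the index of the last space at or before max_len,
    or -1 if there is no such space.
    """
    i_space = -1
    while True:
        next_space = text.find(' ', i_space + 1)
        # find() returns -1 when no further space exists; treat that
        # as an exit condition instead of feeding it back into the loop.
        if next_space == -1 or next_space > max_len:
            break
        i_space = next_space
    return i_space


line = ('You can activate Windows 10 using the product key '
        'for your hardware which')
print(last_space_within(line, 70))            # 67
print(last_space_within('nospaceshere', 70))  # -1
```

The caller still has to decide what to do when the result is -1 (no usable space at all). As far as I can tell, text.rfind(' ', 0, max_len + 1) computes the same index without an explicit loop.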
While I'm here, some random remarks about the code:
> original_line = line[:]
There's no need for this. Because strings are immutable, you can just
go:
original_line = line
All the other operations on "line" return new strings (because strings
are immutable), leaving original_line untouched.
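A quick demonstration, with a made-up sample line: plain assignment is enough, because the later method calls build new strings and rebind "line" rather than modifying anything in place.

```python
line = '    some indented text\n'
original_line = line     # no copy needed: just another name for the same string

line = line.lstrip()     # builds a new string and rebinds "line"
line = line.rstrip()     # likewise

print(repr(original_line))   # still has its indentation and newline
print(repr(line))
```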
> unindented_line = line.lstrip()
> n_leading_spaces = line_length - len(unindented_line)
> if n_leading_spaces > max_len: # big indentation!!!
> return ('', line[max_len:])
> indentation = ' ' * n_leading_spaces
Isn't this also line[:n_leading_spaces]? I would be inclined to use
that in case the whitespace isn't just spaces (eg TABs), because
"line.lstrip()" strips leading whitespace, not just leading spaces. This
would preserve whatever was there.
However, your code is focussed on the space character, so maybe a more
precise lstrip() would be better:
line.lstrip(' ')
stripping only space characters.
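To make the difference concrete, here is a sketch with an invented line that starts with a TAB followed by two spaces (n_leading is just my abbreviation of your n_leading_spaces):

```python
line = '\t  indented with a TAB and two spaces'

# lstrip() with no argument removes all leading whitespace, TAB included.
unindented = line.lstrip()
n_leading = len(line) - len(unindented)
print(n_leading)

# Rebuilding the indentation from spaces loses the TAB...
print(repr(' ' * n_leading))
# ...while slicing the original line preserves what was actually there.
print(repr(line[:n_leading]))

# lstrip(' ') removes only space characters, so it stops at the TAB.
print(repr(line.lstrip(' ')))
```

Which behaviour you want depends on whether your input can contain TABs at all; either way it is worth choosing deliberately.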
[...]
> while True:
> next_space = unindented_line.find(' ', i_space+1)
> if next_space > max_len: break
> else: i_space =next_space
A lot of us might write this:
    while True:
        next_space = unindented_line.find(' ', i_space+1)
        if next_space > max_len:
            break
        i_space =next_space
dropping the "else:". It is just style, but to my eye it is clearer
that the "i_space =next_space" is an "unconditional" part of the normal
loop iteration.
> for line in source:
># line = repr(line.rstrip())
The commented out line above would damage "line" (by adding quotes and
stuff to it), if uncommented.
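For example, with an invented line containing one backslash and a trailing newline, repr() wraps the text in quotes and doubles the backslash, so the result is a different, longer string than the one you read:

```python
line = 'a back\\slash and a newline\n'   # one backslash, one newline

shown = repr(line.rstrip())
print(shown)
print(len(line.rstrip()), len(shown))
```

That is fine for a debugging print, but you would not want to carry on wrapping the repr()ed text.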
> print("line: '{}'".format(line)) # for debugging
You know you can just write this?
print("line:", line)
Cheers,
Cameron Simpson <cs at cskk.id.au>