basic question: target assignment in for loop

Mon Feb 24 02:42:05 EST 2003

Kawaldeep Grewal wrote:

> hello,
> 
> this may be a faq, and if it is, I would appreciate a pointer in the
> right direction.
> 
> I'm using python to edit some text/html with the re module. I want to do
> this:
> 
> html = htmlFile.readlines()
> for line in html:
>        line = re.sub("regexString", functionReturningString, line)
> 
> but python assigns the target by value and not by reference, which (in
> my mind) breaks the abstraction. So, I have to resort to this:

The terms "by value" and "by reference" are somewhat muddled.  But
anyway, in Python, ANY statement of the form

    <simple name> = <expression>

in ANY context and WITHOUT exceptions, re-binds the simple name to
the expression's value, and NEVER affects in any way, shape, or
form, whatever object (if any) was previously bound to the same
simple name (a footnote for pedants: if the re-binding leaves no
references at all to the "whatever object", Python is then free
to garbage collect it either immediately or later, of course, but
that doesn't affect the rule).
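A small sketch of the rule in action (Python 3 syntax; the names are made up for the demo). Rebinding a name leaves the old object alone, and a for-loop target is just another simple name that gets rebound on each pass:

```python
# Two names bound to the same list object.
original = ["a", "b"]
alias = original
alias = ["x", "y"]        # rebinds 'alias'; the first list is not touched
print(original)           # ['a', 'b']

# A for-loop target is a simple name, rebound on every iteration.
lines = ["one", "two"]
for line in lines:
    line = line.upper()   # rebinds 'line' only; the items of 'lines' are unchanged
print(lines)              # ['one', 'two']
```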

I don't see how this simple, general, universal rule bereft of any
exception can be said to "break the abstraction".

> html = htmlFile.readlines()
> i = 0
> while i < len(html):
>         html[i] = re.sub("regexString", functionReturningString, html[i])
>         i = i + 1
> 
> this code is decidedly not elegant, and looks very C-ish. As I'm new to
> python, can anyone tell me whether I'm just confused or whether this is the
> way to do things?

A more idiomatic but essentially equivalent way:

html = htmlFile.readlines()
for i in range(len(html)):
        html[i] = re.sub("regexString", functionReturningString, html[i])

A variant using enumerate, which is new in Python 2.3:

html = htmlFile.readlines()
for i, line in enumerate(html):
        html[i] = re.sub("regexString", functionReturningString, line)
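A self-contained sketch of the enumerate variant, for a current Python 3 interpreter. The pattern and the replacement function are hypothetical stand-ins for "regexString" and functionReturningString, and io.StringIO stands in for the opened file. When re.sub is given a function instead of a string, it calls that function once per match, passing a match object, and uses the returned string:

```python
import io
import re

# Hypothetical stand-ins for "regexString" and functionReturningString.
PATTERN = r"<b>(.*?)</b>"

def to_strong(match):
    # Called by re.sub once per match; the return value replaces the match.
    return "<strong>%s</strong>" % match.group(1)

# In-memory stand-in for the real HTML file.
htmlFile = io.StringIO("<p><b>hi</b></p>\n<p>plain</p>\n<b>x</b> and <b>y</b>\n")

html = htmlFile.readlines()
for i, line in enumerate(html):
    html[i] = re.sub(PATTERN, to_strong, line)   # rebind item i of the list

print(html)
```

Lines without a match come back unchanged, and a line with several matches has each one replaced.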

A list-comprehension (LC) variant:

html = [ re.sub("regexString", functionReturningString, line)
    for line in htmlFile.readlines() ]

or, more naturally, iterating over the file object directly:

html = [ re.sub("regexString", functionReturningString, line)
    for line in htmlFile ]
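A runnable sketch of that last form, again assuming io.StringIO as a stand-in for the file and a made-up replacement function. Iterating over a file object yields its lines one at a time, so no readlines() call is needed:

```python
import io
import re

# In-memory stand-in for the opened file (an assumption for the demo).
htmlFile = io.StringIO("Title: foo\nBody: bar\n")

def shout(match):
    # Hypothetical replacement function: uppercase whatever was matched.
    return match.group(0).upper()

# Uppercase the last word on each line; '$' matches just before the final newline.
html = [re.sub(r"\w+$", shout, line) for line in htmlFile]
print(html)   # ['Title: FOO\n', 'Body: BAR\n']
```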

The key to conceptualizing any of these variants (except the LC
ones): you are not "changing each line" -- this is impossible as
strings are immutable -- rather, you ARE changing (by rebinding
items) the list bound to name 'html'.  The one clean way to
rebind an item of that list is to assign to html[i], of course.
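A minimal sketch of what "rebinding items" means, using str.replace in place of re.sub to keep it short. Another name bound to the same list sees the new items, while the old string objects themselves are never altered:

```python
html = ["<b>a</b>\n", "<b>b</b>\n"]
same_list = html          # a second name for the very same list object
old_first = html[0]       # keep a reference to the original first string

for i in range(len(html)):
    html[i] = html[i].replace("b>", "strong>")   # rebinds item i of the list

print(same_list)          # ['<strong>a</strong>\n', '<strong>b</strong>\n']
print(old_first)          # the old string is unchanged: strings are immutable
```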

The LC variants do away with the concept of "changing" anything:
they simply build the list you want, period.  I like that a lot,
since I think conceptualizing by "building the object I want" is
quite a bit simpler and more elegant than any conceptualization
based on altering data structures, when both are applicable.
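For contrast, a sketch of the LC approach on the same made-up data: the comprehension builds a brand-new list and then rebinds the name 'html' to it, so any other name still bound to the old list keeps seeing the old items:

```python
old = ["<b>a</b>\n", "<b>b</b>\n"]
html = old                # 'html' and 'old' name one list object

# Build a new list, then rebind 'html' to it; 'old' is not modified.
html = [line.replace("b>", "strong>") for line in html]

print(old)                # ['<b>a</b>\n', '<b>b</b>\n']
print(html)               # ['<strong>a</strong>\n', '<strong>b</strong>\n']
print(html is old)        # False
```

The difference only matters when some other part of the program still holds a reference to the original list.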

Alex