Perl is worse!

Sun Jul 30 08:52:12 EDT 2000

"Steve Lamb" <grey at despair.rpglink.com> wrote in message
news:slrn8o74um.fjt.grey at teleute.rpglink.com...
> On Sun, 30 Jul 2000 02:22:31 +0200, Alex Martelli <alex at magenta.com> wrote:
> >And of course None can also be converted, it's just not converted
> >by a plain int(x).  But it just takes an int(x or 7) to fix that,
> >for example (if one wants None to be converted to 7...:-).
>
>     It is my understand that in an int(None or 0) what is happening is that
> the None fails, the or kicks in and we're really int()ing an integer in the
> first place, not actually converting the None.

There is no 'fail' involved (Icon is an interesting programming language
that models Boolean expressions as succeed/fail, but Python, like most
other languages, doesn't; its modeling of Booleans is rather in terms
of true/false).  What does it mean to 'convert' an immutable object
(such as a character string, a number, or None) to another immutable
object (ditto)?  Clearly, it means having a one-argument function (in
the mathematical sense of a mapping; syntactically, any expression),
where the object we are 'starting from' takes part, and which yields
as its result the object we are 'ending with'.  Presumably, if we are
using the verb 'convert', we are thinking of the result object as
somehow 'equivalent and corresponding' to the argument object in a
different domain.

The expression int(x or 0) yields a result depending solely on x (for
some values of x, it fails, i.e., raises an exception).  If x is a
non-empty sequence of decimal digits (that is not too long), it yields
the integer 'corresponding' to that sequence in decimal notation; if
x is None, it yields 0; if x is a floating-point number (that is not
too large in magnitude), it yields its integer-part; and so on.

So, _of course_ we ARE 'converting' whatever value x happens to refer
to (or, if failing to do so, an exception is raised).  The work is
done by the int builtin function if x evaluates to 'true' (e.g., any
non-empty string); if x evaluates to 'false', then int sees 0 as an
argument, so it just returns it.  What purpose is served (except a
feeble attempt at rhetoric) by maintaining that 'we are not actually
converting x' in either case?  Of course *we* are doing no such thing,
in a sense -- rather, the *program fragment* in question is doing so
(in accordance with language-rules, specifically the semantics of int
and or).  What a captious distinction to try to draw here!
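To make the point concrete, here is a minimal sketch of what int(x or 0)
does for a few values of x.  The helper name to_int is just an
illustrative assumption of mine, not anything built in:

```python
def to_int(x, default=0):
    # `x or default` evaluates to x when x is true, else to default;
    # the `or` itself involves no exception and no "failure".
    return int(x or default)

print(to_int("42"))     # a non-empty digit string is converted by int()
print(to_int(None))     # None is false, so int() receives the default
print(to_int(3.9))      # a float is truncated to its integer part
print(to_int(None, 7))  # any other default works the same way

try:
    to_int("abc")       # true, but not convertible: int() raises
except ValueError as err:
    print("not convertible:", err)
```

In every case the result depends solely on x (and the chosen default),
which is all that 'converting' can reasonably mean here.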

> >Losing accumulated logs, and/or logging false data, would be utter
> >disasters.
>
>     You're speaking in absolute terms on something which is best determined by
> the end user, not the language.  To me, for a lot of the logs I process, no,
> it isn't a disaster.  It isn't even a minor annoyance.  It is just a speck of
> dust on a rather large desk that I couldn't care less about.  Having the logs
> processing puke on me, or any other of dozens of applications I can think of,
> because of a inconsiquential burp in the data is the real annoyance.

If it's OK for logs to be destroyed, or made up of arbitrary data, then
the appropriate program to use is probably rm or whatever equivalent way
of deleting files your chosen platform supplies (this will save disk
space compared to filling the logs from /dev/random).

Given you don't care about such wholesale destruction, what is the 'puke
on me' behaviour that is such a huge annoyance to you?  I assume you
refer to ERROR MESSAGES -- any indication that the failure (which you
do not care about, so it's hard to fathom why you're running the
program, or had it written at all) has taken place.  Very well, then,
o peculiar 'end user', append a 2>/dev/null to the program you're
running, if your platform supports that idiom.  In programming terms,
the most cost-effective program to satisfy such requirements is
    pass
but if your program must also satisfy sensible users, ones that
*desperately WANT* error indications, rather than silent corruption
of data, if errors occur, then it's also very easy: just write all
your processing as a function named dowork, and the main line of
processing as:
    try:
        dowork()
    except:
        if sensible_user():
            diagnose_error()

On the other hand, if you let undetected errors creep into the
data, there is no simple way to ever make your program at all
acceptable to the sensible users who care about the integrity
of their data (or else they wouldn't be bothering to process
them in the first place!).
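As a rough sketch of that structure (the record format and the names
parse_line and main are assumptions made up for illustration, and
sensible_user is reduced to a plain flag), the processing raises on bad
data and only the outermost level decides whether to report it:

```python
import sys

def parse_line(line):
    # Hypothetical record format: "<name> <count>".  A malformed
    # record raises ValueError instead of being silently replaced.
    name, count = line.split()
    return name, int(count)

def dowork(lines):
    # Sum the counts per name; any bad record aborts the whole run.
    totals = {}
    for line in lines:
        name, count = parse_line(line)
        totals[name] = totals.get(name, 0) + count
    return totals

def main(lines, sensible_user=True):
    try:
        return dowork(lines)
    except ValueError as err:
        if sensible_user:
            print("bad record:", err, file=sys.stderr)
        return None
```

The user who does not care gets no output and no message; the one who
does care gets a diagnosis.  Neither ever gets totals silently built
from made-up numbers.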

> >bug is clearly the priority; if subtly-false data were being logged
> >instead,
>
>     Uhm, logs processing process logs for one of a variety of uses.  IE, the
> data has already been logged, how can it be falsely logged again?

*instead*, not *again*.  If you're summarizing existing logs into
new more selective ones (as opposed to other forms of query), then
(if the program has subtle data-falsifying bugs) the subtly-false
data will be logged (into the new logs).

One is presumably processing existing logs to extract data upon
which business decisions will be based (if there is no interest
whatsoever in the data extracted, why is it being extracted?!).
Basing decisions upon false data is far worse than _knowing_ the
data are not available because of some error!

> >Why would you need to keep track of all things in the former
> >paragraph?  Just be explicit about what is to be done for each
> >of the cases it can be, and you're all set.
>
>     Except now you need to do tons of conversion at different levels depending
> on what you do with the data.  You either type it up front and have it fail
> later, or type it later dozens of times.  Neither is appealing to me.

This makes no sense to me.  What tons of conversion?  What dozens of
times?  I think the likeliest hypothesis is that you're tackling the
whole problem in a way that could easily be improved, just as it was for
the key-into-dictionary issue (nothing particularly wrong with using
a string as key, but using the tuple directly is far simpler) and even
more for the extract-numbers-from-re-matches one (where any of a half
dozen or more alternative solutions were far simpler and easier than
the one originally considered).  So, try again, giving any example of
the 'tons of conversions' and/or the 'type it dozens of times', and
once again we'll presumably show how _it just isn't so_...

> >Furthermore, I keep pointing out that, if you call the groups method
> >of the match object, what Python returns for non-matching groups IS
> >UP TO YOU: it will return the argument you pass to groups, None if
> >you choose to pass no argument.  So, *what* is supposed to be 'sad'?!
>
>     I don't see where I can tell it to pass something other than None on a
> match.  The docs I read, (not here) state that it /will/ return None on a
> no-match, not that it /can/ and that you can override it.

Look at file match-objects.html in your Python docs.  The entry for
the groups method says, essentially:

groups ([default])
    Return a tuple containing all the subgroups of the match, from 1 up to
    however many groups are in the pattern. The default argument is used for
    groups that did not participate in the match; it defaults to None.

This seems quite clear to me!  Maybe your confusion comes from the group()
method (singular, not plural) that behaves differently.  So, if one method
does what you want and another doesn't, use the method that does fit your
needs -- that seems rather obvious advice.
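A small sketch of the difference, using a made-up pattern (two required
numbers and one optional one) purely for illustration:

```python
import re

# Hypothetical pattern: two required numbers, one optional third.
pattern = re.compile(r"(\d+) (\d+)(?: (\d+))?")
match = pattern.match("10 20")

print(match.groups())    # the non-participating group comes back as None
print(match.groups(0))   # ... or as whatever default you pass to groups
print(match.group(3))    # group(), singular, takes no default argument
```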

Quoting from your message of 07/27, what you were doing was:

: var1 = int(match.group(1))
: var2 = int(match.group(2))
: var3 = match.group(3)        # This is never an int, always a string.
: if match.group(4):           # need to check since you cannot
:   var4 = int(match.group(4)) # int() a None
:
:     Maybe I am missing something, I mean this is my first script after
"hello

And the answer, yes, you ARE (were?) missing something.  One thing, if I
recall correctly, was to arrange the RE so that the sign (plus or minus,
which you were stuffing into var3) and the digits accompanying it (if any:
what you were stuffing into var4) got matched together.  Then, apart from
the likely inappropriateness of the variable names,

    var1,var2,var3 = map(int,match.groups(0))

becomes the most concise (and no doubt fastest) solution.
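For instance, here is a sketch under an assumed input format; the
pattern and the function name parse are my own illustrative inventions,
not the RE from the original script.  Sign and digits share one
optional group, so int handles both at once:

```python
import re

# Hypothetical input format "<n>d<m>" with an optional signed
# modifier, e.g. "3d6+2"; the sign and its digits share one group.
pattern = re.compile(r"(\d+)d(\d+)([+-]\d+)?$")

def parse(text):
    match = pattern.match(text)
    if match is None:
        raise ValueError("unrecognized input: %r" % text)
    # groups(0) substitutes 0 for the optional group when it is
    # absent, so every item can go through int() unconditionally.
    return tuple(map(int, match.groups(0)))

print(parse("3d6+2"))
print(parse("3d6-1"))
print(parse("3d6"))
```

No test for None is needed anywhere, since int accepts a leading sign
in a string and also accepts the integer 0 supplied as the default.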

If, in another situation, you did need to extract both strings and
integers (possibly 0's in correspondence to no-matches) separately,
several other approaches offer themselves, of course.  For example:

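def extract(match, indices=None, default=None, transform=None):
    # indices: group numbers to extract (all groups if omitted)
    # default: value used for groups that did not match (or matched empty)
    # transform: optional one-argument function applied to each result
    results = []
    for i in indices or range(1,1+len(match.groups())):
        x = match.group(i) or default
        if transform: x = transform(x)
        results.append(x)
    return results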

Of course, you can recast the body of this function in many
different ways, also depending on whether you have list
comprehensions (Python 2.0 or later), and on your favourite
stylistic tradeoffs between concision, readability, and
performance considerations.  But these half dozen lines, of
course, are not the essence; as long as their functionality
is there, packed in a reusable form (function or class),
and suitably general, then how exactly you implement that
functionality is a third-order issue.

Then, to extract certain groups as integers defaulting to 7,
    vara,varb,varc=extract(match,(1,4,6),7,int)
while others are extracted as strings defaulting to "foo":
    varx,vary,varz=extract(match,(2,3,5),"foo")
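
To try this end to end, here is a self-contained sketch.  The function
is restated so the fragment runs on its own, and the pattern and sample
strings are hypothetical, chosen only to give one optional numeric
group and one optional string group:

```python
import re

def extract(match, indices=None, default=None, transform=None):
    results = []
    for i in indices or range(1, 1 + len(match.groups())):
        # None (or an empty match) is replaced by the default.
        x = match.group(i) or default
        if transform: x = transform(x)
        results.append(x)
    return results

# Hypothetical pattern: a name, an optional "=<number>", an optional unit.
pattern = re.compile(r"([a-z]+)(?:=(\d+))?(?: ([a-z]+))?")

full = pattern.match("width=80 px")
bare = pattern.match("width")

print(extract(full, (2,), 7, int))    # number present: converted by int
print(extract(bare, (2,), 7, int))    # number absent: the default, 7
print(extract(bare, (1, 3), "foo"))   # strings, defaulting to "foo"
```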

Alex