Literal concatenation, strings vs. numbers (was: Numeric literals in other than base 10 - was Annoying octal notation)

Mon Aug 24 06:56:08 EDT 2009

On Aug 23, 7:45 pm, Ben Finney <ben+pyt... at benfinney.id.au> wrote:
> greg <g... at cosc.canterbury.ac.nz> writes:
> > J. Cliff Dyer wrote:
>
> > > What happens if you use a literal like 0x10f 304?
>
> > To me the obvious thing to do is concatenate them textually and then
> > treat the whole thing as a single numeric literal. Anything else
> > wouldn't be sane, IMO.
>
> Yet, as was pointed out, that behaviour would be inconsistent with the
> concatenation of string literals::
>
>     >>> "abc" r'def' u"ghi" 'jkl'
>     u'abcdefghijkl'

Well, my take is that this would not be the same as string
concatenation: the series of digits would be parsed as a single token,
with the spaces removed automatically.  That does make a difference to
users (it's not just under the covers).

For instance, string concatenation works across lines (inside
parentheses, anyway):

("abc"
 "def")

but if the numbers were parsed as a single token, a line break inside
one wouldn't necessarily be allowed (and allowing it would be unwise),
so this is out:

100
200
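
To make the contrast concrete, here is a small sketch of how the
current parser treats the two cases. The variable names are mine, and
the check only shows today's behaviour, not the proposed one: the
strings are joined only because the parentheses let the expression
continue onto the next line, while bare literals on separate lines are
separate statements.

```python
import ast

# Adjacent string literals are joined at compile time; the parentheses
# are what allow the expression to continue onto the next line.
joined = ("abc"
          "def")
print(joined)

# Without parentheses, each line is its own expression statement,
# for numbers and for strings alike.
number_lines = ast.parse("100\n200")
string_lines = ast.parse('"abc"\n"def"')
print(len(number_lines.body))
print(len(string_lines.body))
```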

You might also want to enforce rules such as allowing only a single
space between groups of digits (no tabs, no runs of spaces), so this

100  200

would also be right out.  You might even want to require that the
spaces come at regular intervals.  I don't think it matters much that
digit separation superficially resembles string concatenation when the
strings aren't broken across lines: the difference isn't hard to
explain, and there's really not much chance anyone would confuse the
two.
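
The rules above could be sketched roughly as follows. This is only an
illustration, not anything the tokenizer does: parse_spaced_decimal is
a made-up helper, it handles decimal digits only, and the "regular
intervals" rule is assumed to mean groups of three.

```python
import re

# Hypothetical rule: decimal digits in groups of three (the leading group
# may be shorter), with exactly one space between groups.
_SPACED_DECIMAL = re.compile(r"[0-9]{1,3}(?: [0-9]{3})*")


def parse_spaced_decimal(text):
    """Return the integer written as space-separated digit groups."""
    if _SPACED_DECIMAL.fullmatch(text) is None:
        raise ValueError("not a valid spaced literal: %r" % (text,))
    # The whole thing is one token, so the spaces are simply dropped.
    return int(text.replace(" ", ""))


candidates = ["100 200", "1 000 000", "100  200", "100\t200", "100\n200", "10 00"]
for candidate in candidates:
    try:
        print(repr(candidate), "->", parse_spaced_decimal(candidate))
    except ValueError:
        print(repr(candidate), "-> rejected")
```

Only the first two candidates are accepted; the double space, the tab,
the line break and the irregular grouping all raise ValueError.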

Having said all that, I would favor _ as a digit separator in Python
any day of the week, and I don't think it's all that important to have
one at all.
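
For readers on a newer interpreter: Python 3.6 and later (PEP 515)
accept underscores between digits. That postdates this thread, so take
the snippet below as a note on later behaviour rather than part of the
argument. The variable names are just for illustration.

```python
# Requires Python 3.6 or later (PEP 515).
million = 1_000_000
mask = 0x10f_304

print(million == 1000000)
print(mask == 0x10f304)

# int() accepts the same grouping in strings.
thousand = int("1_000")
print(thousand)

# Underscores must sit between digits one at a time; a doubled one is rejected.
try:
    compile("1__000", "<example>", "eval")
    doubled_ok = True
except SyntaxError:
    doubled_ok = False
print(doubled_ok)
```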

HOWEVER, I once proposed that if I were designing a new language I'd
consider allowing spaces in identifiers.  (That didn't stop people
from arguing why it would be confusing in Python, but never mind
that.)  If spaces were allowed in identifiers, then I'd also be in
favor of spaces in numeric literals.

> So, different representations of literals are parsed as separate
> literals, then concatenated. To have the behaviour you describe, the
> case needs to be made separately that digit concatenation should not be
> consistent with the established string literal parsing behaviour.

Well, one doesn't really *need* to make that case; one just might not
care about consistency.

But if one did, I think Erik's case is a good one: very little chance
of confusion because there's really only one reasonable
interpretation.  The point of consistency is to help understand things
by analogy, but if analogy doesn't help understanding--and it wouldn't
in this case--there's no point.

Carl Banks