[Python-ideas] strptime without second argument as an inverse to __str__

Andrew Barnert abarnert at yahoo.com
Fri Aug 8 08:10:57 CEST 2014


Mixing responses to a previous email, which you had already answered, into the new one makes it harder to reply, but I'll try.


On Thursday, August 7, 2014 10:43 PM, Akira Li <4kir4.1i at gmail.com> wrote:
> > Andrew Barnert <abarnert at yahoo.com> writes:
> 
>>  On Aug 7, 2014, at 5:35, Akira Li <4kir4.1i at gmail.com> wrote:
>> 
>>>  Please, don't spread misinformation.
>>> 
>>>  Among the explicit rfc 3339 design goals are simplicity and human
>>>  readability. 
>>> 
>>>  Just read http://tools.ietf.org/html/rfc3339 (for an rfc it is
>>>  relatively short and readable).
> ...
>>  (And there's also a whole section of interpreting
>>  "legacy"/"deprecated" 2-digit years and how you should
>>  handle them.)
>> 
>>  So, is the RFC "spreading misinformation" about itself?
> 
> You are *obviously* wrong for the rfc 3339 Internet Date/Time Format
> itself (used by __str__, isoformat -- relevant to the current topic).


You accused me of "spreading misinformation" by saying that RFC 3339 is more complicated than what Python's str generates. You also suggested that people often parse RFC 3339 with a simple strptime.

I pointed out that the RFC itself clearly defines something more complicated than what Python's str generates, and that it can't be fully parsed by a simple strptime, and then added a parenthetical remark about support for 2-digit years.

You cut out everything but the parenthetical remark, and replied to that as if it were the whole point of my message. And you even responded to nothing but the parenthetical 2-digit-year remark in replies to completely separate sections of the message. I have no idea how to respond to that except to say: read it again.

The fact that a compliant parser should accept any of "T", "t", or " " as a separator already makes it impossible to parse with strptime, and more complicated than parsing Python's str output. I don't see how you can dispute that, or how it's "spreading misinformation" to point that out.
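
A minimal sketch of the separator problem, assuming the two format strings below (they are illustrative, not quoted from anyone's code in this thread). It shows only that one fixed strptime format cannot accept both the "T" and the " " separator:

```python
from datetime import datetime

# Illustrative format strings (assumptions, not quoted from the thread).
FMT_T = "%Y-%m-%dT%H:%M:%S"      # matches isoformat()'s default separator
FMT_SPACE = "%Y-%m-%d %H:%M:%S"  # matches the separator str() uses


def parses(text, fmt):
    """Return True if datetime.strptime accepts text under fmt."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


iso_style = datetime(2014, 8, 8, 8, 10, 57).isoformat()  # "T" separator
str_style = str(datetime(2014, 8, 8, 8, 10, 57))         # " " separator

# Each format accepts its own separator and rejects the other one.
print(parses(iso_style, FMT_T), parses(str_style, FMT_T))
print(parses(iso_style, FMT_SPACE), parses(str_style, FMT_SPACE))
```

Trying several formats in turn works around this, but at that point it is no longer a single simple strptime call.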

>   "and other things that you shouldn't generate but some code
>   might."
>   [Andrew Barnert] 
> 
> What things? Could you be more specific?

I was specific, and you chopped it out of my message, or moved it to a different part of the message.

>>>  The format is so simple that people just write adhoc parsers using
>>>  strptime() without installing any formal rfc3339 module (if it even
>>>  exists).
>> 
>>  Sure, and people also do that and call it an ISO parser. 
>> 
>>  If it can't interoperate with everything compliant applications may
>>  generate (much less deprecated formats the standard doesn't allow you
>>  to generate but mandates how you parse and interpret), it's not
>>  accurate to call it an RFC 3339 parser (at least not in a
>>  general-purpose library).
>> 
>>  As I said before, it's still certainly much easier to write an RFC
>>  3339 parser than an ISO 8601 parser, but it's not as trivial as
>>  you're implying.
> 
> Compared to the full ISO 8601 format (I don't know whether it can be
> parsed unambiguously), rfc 3339 (a conformant subset of the ISO 8601
> extended format) is simple by design.


Since you're arguing here exactly what I said, in the very paragraph you just quoted—"it's certainly much easier to write an RFC 3339 parser than an ISO 8601 parser"—I don't know why you think you have to convince me of that fact.

But again, that doesn't mean that a trivial strptime call is sufficient for a general-purpose library that claims to be a compliant RFC 3339 parser.

> Regardless the rfc language nuances, 2-digit year is harmful in practice


Of course. So what?

If my point actually were that RFC 3339 is complicated because of 2-digit years, and if that point were true, then this rebuttal might be relevant. But in that case it would only prove the opposite of the point you're trying to make: that RFC 3339 parsing is _hard_. Fortunately, that isn't my point, so your statement is merely irrelevant to your whole argument rather than contradictory to it.

> P.S.
> ...
> What is the point of the copy-paste? Does handling lowercase "t", "z",
> and a space complicates the parsing in a meaningful manner?


Sure. Even ignoring the fact that you're claiming that people can parse it with strptime, the very fact that many people have written code that claims to be able to parse RFC 3339, and yet doesn't handle lowercase "t" and "z", shows that there is something you can get wrong; therefore, it is not trivial.
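
For illustration, here is a hedged sketch of what handling those variants might look like. The name parse_rfc3339 and the regular expression are hypothetical, written for this example rather than taken from any existing module. The sketch covers the "T"/"t"/" " separators, "Z"/"z", numeric offsets, and fractional seconds of any length, and it still rejects the leap-second value ":60" that the RFC permits:

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern for the RFC 3339 date-time form, including the
# lowercase and space variants discussed above.
_RFC3339_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})"
    r"[Tt ]"                                  # separator variants
    r"(?P<time>\d{2}:\d{2}:\d{2})"
    r"(?:\.(?P<frac>\d+))?"                   # optional fraction, 1+ digits
    r"(?:(?P<utc>[Zz])|(?P<sign>[+-])(?P<oh>\d{2}):(?P<om>\d{2}))",
    re.ASCII,
)


def parse_rfc3339(text):
    """Parse one RFC 3339 style timestamp into an aware datetime (sketch)."""
    m = _RFC3339_RE.fullmatch(text)
    if m is None:
        raise ValueError("not an RFC 3339 date-time: %r" % (text,))
    # The separator is normalized here, so one strptime format is enough.
    # A seconds value of 60 (leap second) raises ValueError at this point.
    base = datetime.strptime(
        m.group("date") + " " + m.group("time"), "%Y-%m-%d %H:%M:%S"
    )
    # datetime stores microseconds, so digits past the sixth are truncated.
    micro = int((m.group("frac") or "0")[:6].ljust(6, "0"))
    if m.group("utc"):
        tz = timezone.utc
    else:
        offset = timedelta(hours=int(m.group("oh")), minutes=int(m.group("om")))
        tz = timezone(-offset if m.group("sign") == "-" else offset)
    return base.replace(microsecond=micro, tzinfo=tz)
```

Even at this size there are several decisions (fraction length, leap seconds, offset range) that a one-line strptime call never has to make, and each is a place to get something wrong.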

> Imagine a parser that supports sep='T' (default for isoformat()) and
> sep=' ' (space for __str__). How hard do you think to extend such parser
> to support 't' as a separator?


Once again: "it's certainly much easier to write an RFC 3339 parser than an ISO 8601 parser," but it's not completely trivial.

You seem to think that this is a black-or-white issue: either RFC 3339 is a complete failure and it's actually as hard to parse as ISO 8601, or RFC 3339 is trivially parseable and no one should bother to write a parser for a subset of the language because it's just as easy to parse the whole thing. In reality, neither one of those is true. The latter may be closer to the truth than the former, but it's still wrong.

In other words, it makes perfect sense to write a parser for exactly what Python generates, and not claim it as RFC 3339 compliant.
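
As a rough sketch of that narrower goal, assuming naive datetimes only (aware datetimes would also need the UTC offset handled), a helper that inverts str() could look like the following. The name parse_datetime_str is hypothetical and makes no RFC 3339 claim:

```python
from datetime import datetime

# str() on a naive datetime emits the fractional part only when the
# microsecond field is nonzero, so two formats are needed.
_STR_FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def parse_datetime_str(text):
    """Invert str() for a naive datetime (hypothetical helper, sketch only)."""
    for fmt in _STR_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("not in the form produced by str(datetime): %r" % (text,))


for original in (
    datetime(2014, 8, 8, 8, 10, 57),
    datetime(2014, 8, 8, 8, 10, 57, 123456),
):
    # Round trip: parsing the str() output gives back an equal datetime.
    print(parse_datetime_str(str(original)) == original)
```

Note that even this case needs two formats rather than one, because the fractional seconds are optional in the output.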

