[Python-ideas] strptime without second argument as an inverse to __str__

Akira Li 4kir4.1i at gmail.com
Fri Aug 8 07:43:32 CEST 2014


Andrew Barnert <abarnert at yahoo.com> writes:

> On Aug 7, 2014, at 5:35, Akira Li <4kir4.1i at gmail.com> wrote:
>
>> Andrew Barnert
>> <abarnert at yahoo.com.dmarc.invalid> writes:
>> 
>>> On Aug 5, 2014, at 14:46, Petr Viktorin
>>> <encukou at gmail.com> wrote:
>>>> When people say "iso" in the context of datetimes, they usually mean RFC 3339.
>>> 
>>> RFC 3339 is still more complicated than just reversing Python's str or
>>> isoformat. IIRC (it's hard to check on my phone), it mandates that
>>> parsers should accept 2-digit years (including 3-digit or
>>> semicolon-and-two-digit years), lowercase T and Z, missing "-", and
>>> other things that you shouldn't generate but some code might.
>> 
>> Please, don't spread misinformation.
>> 
>> Among the explicit rfc 3339 design goals are simplicity and human
>> readability. 
>> 
>> Just read http://tools.ietf.org/html/rfc3339 (for an rfc it is
>> relatively short and readable).
...
> (And there's also a whole section of interpreting
> "legacy"/"deprecated" 2-digit years and how you should handle them.)
>
> So, is the RFC "spreading misinformation" about itself?

You are *obviously* wrong for the rfc 3339 Internet Date/Time Format
itself (used by __str__, isoformat -- relevant to the current topic).
http://tools.ietf.org/html/rfc3339

You can be *subtly* wrong for the rfc as a whole. 

The full ABNF from my previous message contains date-fullyear
definition. "The following profile of ISO 8601 [ISO8601] dates SHOULD be
used in new protocols on the Internet" [rfc3339]:

   date-fullyear   = 4DIGIT

It means that the year SHOULD be *exactly* 4 digits i.e., the rfc 3339
Internet Date/Time Format uses only 4-digit years.

I don't know what

  "mandates that parsers should accept 2-digit years" [Andrew Barnert]

means (does "mandates" mean SHOULD or MUST here?) but it seems inspired
by:

  "Internet Protocols MUST generate four digit years in dates. The use
  of 2-digit years is deprecated.  If a 2-digit year is received, it
  should be accepted ONLY if an incorrect interpretation will not cause
  a protocol or processing failure" [rfc3339]

and the words MUST and SHOULD are well-defined [rfc2119].

Do you see that 2-digit year MUST or SHOULD be accepted? (I don't see
it). 

  'missing "-"' [Andrew Barnert]

seems like an obvious mistake. All punctuation except the optional
time-secfrac (fraction of a second) is mandatory.
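
To illustrate how little room the grammar leaves, here is a minimal
sketch that transcribes the date-time ABNF into a regular expression.
The names RFC3339_SYNTAX and looks_like_rfc3339 are my own (nothing in
the rfc or the stdlib is called that), and the pattern checks syntax
only: it does not validate ranges such as month 01-12.

```python
import re

# Syntax-only transcription of the rfc 3339 date-time ABNF.
# Hypothetical names; range checks (month, day, hour, ...) are left out.
RFC3339_SYNTAX = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}"      # full-date: 4DIGIT "-" 2DIGIT "-" 2DIGIT
    r"[Tt]"                            # separator ("t" allowed per the rfc's note)
    r"[0-9]{2}:[0-9]{2}:[0-9]{2}"      # partial-time without the fraction
    r"(?:\.[0-9]+)?"                   # time-secfrac, the only optional part
    r"(?:[Zz]|[+-][0-9]{2}:[0-9]{2})"  # time-offset: "Z" or a numeric offset
)


def looks_like_rfc3339(text):
    """Return True if text matches the date-time syntax as a whole."""
    return RFC3339_SYNTAX.fullmatch(text) is not None
```

With this pattern a 2-digit year or a missing "-" simply fails to
match, while the fraction of a second may be present or absent.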

  "and other things that you shouldn't generate but some code might."
  [Andrew Barnert] 

What things? Could you be more specific? There is not much room in the
format. Redundant information is not included.

>> The format is so simple that people just write ad hoc parsers using
>> strptime() without installing any formal rfc3339 module (if it even
>> exists).
>
> Sure, and people also do that and call it an ISO parser. 
>
> If it can't interoperate with everything compliant applications may
> generate (much less deprecated formats the standard doesn't allow you
> to generate but mandates how you parse and interpret), it's not
> accurate to call it an RFC 3339 parser (at least not in a
> general-purpose library).
>
> As I said before, it's still certainly much easier to write an RFC
> 3339 parser than an ISO 8601 parser, but it's not as trivial as you're
> implying.

Compared to the full ISO 8601 format (I don't know whether it can be
parsed unambiguously), rfc 3339 (a conformant subset of the ISO 8601
extended format) is simple by design.

It *is* accurate to call it (with only 4-digit year support) an rfc 3339
parser in a general purpose library: compliant software MUST generate
4-digit years, and the Internet Date/Time Format SHOULD contain ONLY
4-digit years (if you disagree, show quotes from the rfc that contain
MUST, SHOULD, etc. and say otherwise).

To be fair, the rfc *does not forbid* 2-digit year outright in *all
possible cases*.

Regardless of the rfc's language nuances, a 2-digit year is harmful in
practice -- different software may interpret it differently: software
that is aware of rfc 3339 MUST generate 4-digit years, while software
that is not aware of it can interpret a 2-digit year in its own way. If
an error is possible then a 2-digit year should not be used: "it should
be accepted ONLY if an incorrect interpretation will not cause a
protocol or processing failure" [rfc3339]
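
As a concrete example of one such interpretation, here is a small
sketch using Python's own strptime(). Its %y directive follows the
POSIX convention described in the datetime documentation (69-99 map to
1969-1999, 00-68 map to 2000-2068). The helper name
expand_two_digit_year is hypothetical, and other software is free to
pick a different pivot, which is exactly the problem.

```python
from datetime import datetime


def expand_two_digit_year(yy):
    """Expand a 2-digit year string the way strptime()'s %y does.

    Hypothetical helper for illustration; the pivot (68/69) is the
    documented POSIX convention, not something rfc 3339 prescribes.
    """
    return datetime.strptime(yy, "%y").year


print(expand_two_digit_year("68"))  # 2068
print(expand_two_digit_year("69"))  # 1969
```

Two adjacent inputs land a century apart, so a 2-digit year is only
safe when both sides agree on the pivot in advance.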


P.S.
...
>
> OK, I just read it. Among other things:
>
>> NOTE: Per [ABNF] and ISO8601, the "T" and "Z" characters in this
>> syntax may alternatively be lower case "t" or "z" respectively. This
>> date/time format may be used in some environments or contexts that
>> distinguish between the upper- and lower-case letters 'A'-'Z' and
>> 'a'-'z' (e.g. XML). Specifications that use this format in such
>> environments MAY further limit the date/time syntax so that the
>> letters 'T' and 'Z' used in the date/time syntax must always be
>> upper case. Applications that generate this format SHOULD use upper
>> case letters. NOTE: ISO 8601 defines date and time separated by
>> "T". Applications using this syntax may choose, for the sake of
>> readability, to specify a full-date and full-time separated by (say)
>> a space character.

What is the point of the copy-paste? Does handling lowercase "t", "z",
and a space complicate the parsing in a meaningful manner?

Imagine a parser that supports sep='T' (default for isoformat()) and
sep=' ' (space for __str__). How hard do you think it is to extend such
a parser to support 't' as a separator?
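
A rough sketch of such a parser, assuming Python 3.7+ (where
strptime()'s %z accepts an offset with a colon). The function name
parse_rfc3339 and its seps parameter are made up for this message, not
an existing API. It normalizes the separator and a trailing "Z"/"z"
and leaves the rest to strptime(), so it inherits strptime()'s limits
(for example, %f takes at most 6 fraction digits) and it may be more
lenient than the ABNF in places.

```python
from datetime import datetime


def parse_rfc3339(text, seps="Tt "):
    """Parse an rfc 3339-style timestamp into an aware datetime.

    Hypothetical helper; `seps` lists the accepted date/time
    separators, so supporting 't' is one extra character.
    """
    if len(text) < 11 or text[10] not in seps:
        raise ValueError("unsupported date/time separator: %r" % (text,))
    # Normalize the separator to "T" and a trailing Z/z to +00:00.
    normalized = text[:10] + "T" + text[11:]
    if normalized[-1] in "Zz":
        normalized = normalized[:-1] + "+00:00"
    # time-secfrac is optional, so pick the format accordingly.
    if "." in normalized:
        fmt = "%Y-%m-%dT%H:%M:%S.%f%z"
    else:
        fmt = "%Y-%m-%dT%H:%M:%S%z"
    return datetime.strptime(normalized, fmt)
```

For an aware datetime dt without microseconds, both
parse_rfc3339(str(dt)) and parse_rfc3339(dt.isoformat()) should give
back dt, which is the inverse this thread is about.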


--
Akira


