[Python-ideas] PEP 426, YAML in the stdlib and implementation discovery

Andrew Barnert abarnert at yahoo.com
Wed Jun 5 09:48:23 CEST 2013


From: Stefan Drees <stefan at drees.name>

Sent: Tuesday, June 4, 2013 11:25 PM


> On 04.06.13 21:57, Vinay Sajip wrote:
>>  Philipp A. <flying-sheep at ...> writes:
>> 
>>>  PyYAML might not implement YAML 1.2 fully on paper, but the most useful
>>>  part of 1.2 (parsing arbitrary JSON) works flawlessly.
>> 
>>  Does it? What about this issue?
>> 
>>  https://bitbucket.org/xi/pyyaml/issue/11/valid-json-not-being-loaded ...
> 
> parsing arbitrary JSON is not guaranteed by[1] "the spec" (version 
> 1.2, section 1.3, third paragraph).

Parsing arbitrary JSON _is_ guaranteed by the spec, except for the semantics of repeated keys (undefined in JSON, illegal in YAML).

Parsing arbitrary JSON is not guaranteed by section 1.3, but that's non-normative text from the introduction, and doesn't really guarantee anything.

Of course it's possible that there are errors in the spec, and therefore YAML is not a strict superset of JSON. (The errata list has been empty since the third patch to YAML 1.2 on 2009-10-01, but that just means nobody has _found_ any errors, not that none exist.)

But the example provided doesn't show that.

> Here YAML spec is very clear, as it uses the C-escape characters.


The YAML spec is very clear, as it uses _a superset of_ the C escape sequences, as you quoted immediately below. As a superset, it includes sequences that C does not, and "\/" is one of them. In section 5.7, #53 is:

    ns-esc-slash ::= "/"


With comment:

    Escaped ASCII slash (#x2F), for JSON compatibility.

Then, #62 is:

    c-ns-esc-char ::= "\"
                      ( ns-esc-null | … | ns-esc-slash | … )

So, "\/" is unambiguously a valid escape sequence.
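
To make the reading of those two productions concrete, here is a rough sketch of how a decoder could treat the single-character escapes. It is not taken from PyYAML or libyaml; the table is only a subset of the 1.2 escapes, and the names YAML_1_2_SIMPLE_ESCAPES and decode_simple_escapes are made up for illustration.

```python
# Hypothetical sketch, not any real parser's code.
# Subset of the YAML 1.2 single-character escapes (spec section 5.7).
YAML_1_2_SIMPLE_ESCAPES = {
    "0": "\x00",   # ns-esc-null
    "t": "\t",     # ns-esc-horizontal-tab
    "n": "\n",     # ns-esc-line-feed
    "r": "\r",     # ns-esc-carriage-return
    '"': '"',      # ns-esc-double-quote
    "/": "/",      # ns-esc-slash, added for JSON compatibility
    "\\": "\\",    # ns-esc-backslash
}


def decode_simple_escapes(text):
    """Replace each backslash escape in text with its target character."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling backslash at end of input")
        name = text[i + 1]
        if name not in YAML_1_2_SIMPLE_ESCAPES:
            raise ValueError("unsupported escape: \\" + name)
        out.append(YAML_1_2_SIMPLE_ESCAPES[name])
        i += 2
    return "".join(out)


print(decode_simple_escapes("hi\\/there"))
```

With the "/" entry present, the content of the ticket's string decodes to hi/there; delete that entry and the same input raises an error instead.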

You could argue that the semantics could be explained more clearly. (It's a huge improvement over 1.1, which can be read to imply that each escape sequence is interpreted as itself.) But it's pretty clear what's intended, and I can't think of any other reasonable way to interpret it besides the intended way.

You could also argue that it would be clearer for the spec to give the official Unicode names and/or codepoints of each escaped character, instead of informal descriptions. But I can't imagine anyone would interpret "ASCII slash (#x2F)" as ambiguous, or as a description of any Unicode character other than "Solidus (Slash)" (#x2F).

> As the JSON of the ticket is {"key": "hi\/there"} this 
> will not work in YAML as specified (in the relevant escape sequence section, as 
> "\/" will not find the target Unicode character to replace it.

Sure it will. The target Unicode character is "/".

This will not work in YAML 1.1 (see section 5.6), but it will work in YAML 1.2; this was, in fact, one of the changes specifically made to fix YAML so that it's a strict superset of JSON.

> This is not a PyYAML or libyaml problem.


Well, it's not a PyYAML problem in that PyYAML supports YAML 1.1 properly, and 1.1, unlike 1.2, did not allow "\/".

Whether it's a problem for using PyYAML as the basis for a stdlib package is a different question. Is it important that a stdlib yaml package support YAML 1.2, or that it support a strict superset of JSON? If so, it's definitely a problem; if not, it probably isn't.

> The above mentioned 3rd paragraph claiming the JSON - YAML relation as 
> automatic, as long as the keys are unique is wrong and should be corrected, in 
> a version 1.3 or as errata to 1.2 (while I would prefer the former, as this is 
> IMO a nasty and irritating inconsistency).

Both JSON and YAML 1.2 will interpret the sequence "\/" as the Unicode character "/". So, the 3rd paragraph is correct.
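
The JSON half of that is easy to check with the stdlib json module. This is only a quick sketch using the document from the ticket; it says nothing about what any particular YAML library does with the same input.

```python
import json

# The JSON document from the ticket, written as a raw string so the
# backslash reaches the JSON parser unchanged.
doc = r'{"key": "hi\/there"}'

value = json.loads(doc)
print(value["key"])  # hi/there
```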

At any rate, if you're just correcting Vinay Sajip's raising of errors with PyYAML by pointing out that this isn't an error… well, that's true, but not for the reasons you gave.


But if you're arguing that YAML 1.2 is a broken spec that's impossible to implement, or not worth implementing, you haven't made that point.

And if you're not arguing either of those, I think I've missed your point entirely. That could easily be my fault.


