[Python-ideas] PEP 426, YAML in the stdlib and implementation discovery
Stefan Drees
stefan at drees.name
Wed Jun 5 08:25:50 CEST 2013
On 04.06.13 21:57, Vinay Sajip wrote:
> Philipp A. <flying-sheep at ...> writes:
>
>> PyYAML might not implement YAML 1.2 fully on paper, but the most useful
>> part of 1.2 (parsing arbitrary JSON) works flawlessly.
>
> Does it? What about this issue?
>
> https://bitbucket.org/xi/pyyaml/issue/11/valid-json-not-being-loaded ...
if "TL;DR":
    summary()
Parsing arbitrary JSON is not guaranteed by "the spec" [1] (version 1.2,
section 1.3, third paragraph). There I read wording such as """YAML can
therefore be viewed as a natural superset of JSON, offering improved
human readability and a more complete information model. This is also
the case in practice; every JSON file is also a valid YAML file.[...]"""
and it even states that the only issue might be """JSON's RFC4627 requires
that mappings keys merely “SHOULD” be unique, while YAML insists they
“MUST” be. Technically, YAML therefore complies with the JSON spec,
choosing to treat duplicates as an error. In practice, since JSON is
silent on the semantics of such duplicates, the only portable JSON files
are those with unique keys, which are therefore valid YAML files. """
(4th paragraph ibid).
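To make the SHOULD versus MUST difference concrete, here is a minimal sketch using only Python's standard json module. The helper name reject_duplicates and the sample document are made up for illustration; they come neither from the spec nor from the ticket.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key (hypothetical helper)."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError("duplicate key: %r" % (key,))
        seen[key] = value
    return seen

doc = '{"a": 1, "a": 2}'

# Default behaviour of json.loads: the last value silently wins.
lenient = json.loads(doc)

# Strict behaviour, closer in spirit to YAML's "MUST be unique".
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

With the default settings the duplicate is accepted; only the hook turns it into an error, which is roughly the stance the YAML wording takes.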
So the first sentence might even match Perl "can [...] be viewed" as
Python :-) and the second (as the 4th paragraph) is in error, as JSON
allows a backslash inside a double-quoted string value and associates
no meaning with it, but YAML (read on) does!
So "arbitrary JSON" would have to cover the "Russian doll" style of
escaping data in serialization for some end target (like the use case in
the ticket, where some client likes slashes to be preceded by a
backslash), and that style is brittle at best.
Here the YAML spec is very clear, as it uses the C escape characters.
Cf. section 2.3: "The double-quoted style provides escape sequences."
Escape sequences in YAML are explained in section 5.7, which starts
with """ All non-printable characters must be escaped. YAML
escape sequences use the “\” notation common to most modern computer
languages. Each escape sequence must be parsed into the appropriate
Unicode character. The original escape sequence is a presentation detail
and must not be used to convey content information.
Note that escape sequences are only interpreted in double-quoted
scalars. In all other scalar styles, the “\” character has no special
meaning and non-printable characters are not available. """
and continues with """ YAML escape sequences are a superset of C’s
escape sequences:"""
As the JSON of the ticket is {"key": "hi\/there"}, this will not work in
YAML as specified (in the relevant escape sequence section), as "\/" will
not find a target Unicode character to replace it.
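For reference, the JSON side of the ticket can be checked with Python's standard json module. This is only a sketch of what a JSON parser does with the ticket's document; it says nothing about what any YAML loader does, and the variable names are mine.

```python
import json

# The document from the ticket, as a raw string so the backslash survives.
ticket_doc = r'{"key": "hi\/there"}'

# The stdlib JSON parser accepts the escaped slash and yields a plain "/".
parsed = json.loads(ticket_doc)
print(parsed)  # {'key': 'hi/there'}
```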
This is not a PyYAML or libyaml problem. Consider the following C program:
int main(void) { char a[] = "\/"; return 0; }
Compiling this draws a diagnostic, as the compiler catches the problem:
unknown escape sequence '\/'
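The same check can be sketched in a few lines of Python. The set below lists the characters that may follow a backslash in a C simple escape sequence, plus octal digits and "x" for hex escapes; universal character names (\u, \U) are left out for brevity. The names C_SIMPLE_ESCAPES and is_c_escape_start are hypothetical, chosen just for this illustration.

```python
# Characters allowed after a backslash in a C simple escape sequence:
# \' \" \? \\ \a \b \f \n \r \t \v
C_SIMPLE_ESCAPES = set("'\"?\\abfnrtv")

def is_c_escape_start(ch):
    """True if backslash + ch begins a standard C escape sequence.

    Covers simple, octal and hex escapes; \\u and \\U are not handled.
    """
    return ch in C_SIMPLE_ESCAPES or ch in "01234567" or ch == "x"

slash_ok = is_c_escape_start("/")    # the case from the ticket
newline_ok = is_c_escape_start("n")  # an ordinary escape, for contrast
```

Here slash_ok comes out False and newline_ok True, which is the situation the compiler complains about.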
This is where PyYAML (and libyaml) is correct in throwing an error, as
the spec mandates escape sequences (and their interpretation).
The above-mentioned 3rd paragraph, claiming the JSON - YAML relation is
automatic as long as the keys are unique, is wrong and should be
corrected, in a version 1.3 or as errata to 1.2 (while I would prefer the
former, as this is IMO a nasty and irritating inconsistency).
References:
[1]: http://www.yaml.org/spec/1.2/spec.html
def summary():
    """If the post is too long, summarize."""
    print('YAML v1.2 is inconsistent and')
    print("can't parse \\/ in a double quoted string")
All the best,
Stefan.