PyYaml?

Paul Moore pf_moore at yahoo.co.uk
Sun Sep 19 09:53:22 EDT 2004


"Chris S." <chrisks at NOSPAM.udel.edu> writes:

> Wilk wrote:
>
>> "Chris S." <chrisks at NOSPAM.udel.edu> writes:
>> 
>>>Is there any benefit to Pickle over YAML? Given that Pickle is
>>>insecure, wouldn't it make more sense to support a secure
>>>serialization format, one that's even readable to boot, such as YAML?
>>>There's even a pure Python implementation at www.pyyaml.org
>> There are other advantages to using yaml instead of pickle anyway
>> (portability, readability...) Syck is even faster than pickle, I
>> think.
>> http://whytheluckystiff.net/syck/
>
> I agree completely, although I've been surprised by the general lack of 
> interest around here. You'd think a more secure, portable, and readable 
> serialization format would be welcomed with open arms, yet most of the 
> comments I've read past and present have been almost hostile.

"Hostile" seems a little exaggerated. The original posting (quoted
above) asked the question "Is there any benefit to Pickle over YAML?"
I suppose that a reasonable answer (from me) might be "not that I
know of", but that begs the question, as I know very little of YAML.

Maybe the original poster (or some other supporter of YAML) could
provide some reasons to think that YAML *might* be superior to
Pickle. Then the people who know about Pickle could respond more
helpfully.

For example, you (Chris S) claim that YAML is "more secure, portable,
and readable". OK, let's take these in turn:

More secure - as others have pointed out, Pickle allows pickling and
unpickling of class instances, and class code can do what it likes in
the constructor (I oversimplify here, as I don't know the details well
myself). Sure, this is a security issue, but it's an inherent
insecurity in the feature, and not limited to Pickle. If YAML
implemented the same feature, it would have the same issues to
resolve. Improving security by removing features isn't a clear win for
YAML (note that I am not saying that security in exchange for reduced
features might not be a good tradeoff in some cases - I'm addressing
the "replace Pickle with YAML" suggestion, not a suggestion that we
have both).
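
To make the point concrete, here is a minimal sketch of why loading an
untrusted pickle is risky. It is written for current Python 3 rather
than anything from this thread, and the class name Sneaky is purely
hypothetical. The hook involved is __reduce__ rather than the
constructor: it lets an object name any callable to be invoked at load
time. A harmless eval of arithmetic stands in for a real attack here.

```python
import pickle


class Sneaky:
    """Hypothetical class whose pickle asks the loader to call eval()."""

    def __reduce__(self):
        # pickle stores "call this callable with these arguments" and
        # performs that call when the data is loaded.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Sneaky())

# The loader never sees the Sneaky class; it just runs the stored call.
result = pickle.loads(payload)
print(result)  # 42
```

Any format that can name a callable to run during loading would face
the same problem, which is the sense in which the insecurity belongs to
the feature rather than to Pickle.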

More portable - hmm, OK. I'm not sure where you want portability
*between*, though. Pickle is, as far as I know, portable across
platforms. Are you talking about portability between languages? I
can't think where I'd want to dump a Python object for loading into
Perl or Ruby, though. Can you offer me some real-life use cases?

More readable - I'll give you this. And yes, it can be useful. I've
been stuffed before now with Java programs whose configuration is
stored as a serialised-to-disk object which is completely opaque to
external tools, let alone human readers. But this is a property that
is useful only in case of failure (if the config gets stuffed, I can
hand-hack the dump file, or if I forget what I set parameter X to, I
can look in the dump). If the application design *requires* the dump
format to be readable, we've moved away from serialisation, and
started to talk about configuration formats (which is a separate
issue, one in which it is quite possible that YAML is strong, but
*not* one in which it is competing with Pickle).
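
As a small illustration of the "opaque dump" problem, the sketch below
pickles a made-up configuration dictionary (the keys parameter_x and
log_dir are invented for the example) and then uses the standard
pickletools module to disassemble it. It assumes current Python 3. The
value can be recovered, but only with tooling, not by opening the file
in an editor.

```python
import io
import pickle
import pickletools

# Hypothetical application settings, invented for this example.
config = {"parameter_x": 42, "log_dir": "/tmp/app"}

blob = pickle.dumps(config)    # a bytes object, not text
restored = pickle.loads(blob)  # round-trips to an equal dict

# Disassemble the pickle into an opcode listing held in a string.
listing = io.StringIO()
pickletools.dis(blob, out=listing)
listing_text = listing.getvalue()

print(type(blob).__name__)
print(listing_text)
```

The key names do appear in the opcode listing, so a determined reader
can find what parameter X was set to, but hand-editing the dump is
another matter.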

>> But all these projects seem to sleep...
>
> Can you blame them for the lack of interest? No good idea goes
> unpunished... Ironically, YAML borrows key ideas from several
> languages, including Python.

I have certainly looked at YAML. I have to say that I wasn't really
sure what it *was* though. It seems to claim to be different things
at different times - a serialisation format, a config file format, a
replacement for XML, ... At the time, I was looking for a config
format, and it wasn't *quite* what I wanted, because some of the
serialisation and XML aspects made it slightly clumsy as a config
format. I suspect that people who want to use YAML for serialisation,
or as an XML replacement, may feel the same way. And yet, I don't get
the feeling that YAML is being developed as a "compromise" format, so
I am obviously missing a key design principle.

As regards the existing YAML libraries for Python, when I looked I
found that the PyYAML website claimed that it was out of date with
respect to the latest spec. I also tried SYCK, which looks OK, but
which I managed to crash without trying too hard.
Also, there were a number of features (not that I know how important
they are) marked with "Available in Ruby" (and hence not Python, I
assume, given that other features mention Python explicitly).

None of this is a criticism of YAML and/or its libraries themselves.
However, it does make any suggestion that YAML be used to replace a
key part of the Python standard library seem a little premature, at
least.

I hope this response didn't come across as hostile - I certainly
don't intend it that way. But I do believe that it is the
responsibility of those making the suggestion that YAML replace
pickle to come up with decent arguments. (Or a robust, tested,
documented patch for the Python core, of course - that avoids the
impression that the requester is hoping that someone else will do the
work for him :-))

I'd like to see a strong (this includes "well-documented"!! :-)) YAML
library for Python, if only so I could try it out and find out what
YAML *is* good for, in my environment. In theory, I like YAML - it's
just the practicalities that elude me.

[Later]
I just re-read some of the YAML website. It appears clear from there
that YAML is designed as a serialisation format. But there seems to
be a lack of justification as to *why* the design goals (section 1.1
of the spec) are important. Also, security is *not* an explicit goal,
and section 3.1.6 (the "Construct" process) is completely lacking in
any discussion of the security or other implications of converting a
YAML file to a native language object. This seems somewhat surprising
in a specification for a serialisation format...

Paul.
-- 
Home computers are being called upon to perform many new functions,
including the consumption of homework formerly eaten by the dog --
Doug Larson


