[BangPypers] (no subject)

Dhananjay Nene dhananjay.nene at gmail.com
Thu Sep 26 00:21:31 CEST 2013


On Wed, Sep 25, 2013 at 7:09 PM, Vineet Naik <naikvin at gmail.com> wrote:
> On Tue, Sep 24, 2013 at 6:19 PM, Dhananjay Nene <dhananjay.nene at gmail.com> wrote:
>
>> On Tue, Sep 24, 2013 at 6:11 PM, Dhananjay Nene
>> <dhananjay.nene at gmail.com> wrote:
[..]
>>
>> which just trashed the ordering of an and followed by a plus followed by
>> an and.
>>
>
> This is a more serious problem particularly if the dict is required to be
> serialized back to xml.
= = =
TL;DR. Let the domain, rather than the current use cases, guide the design
of data structures, especially where a wrong choice can cause a lot of
change later. Semantic consistency is important; don't treat it lightly.
An impedance mismatch can be fragile. If feasible, stay away from it.
= = =

You probably did not mean this literally, but interpreting it literally
helps unravel a couple of interesting issues, since the statement reaches
a judgement based on a use case rather than on the semantics of the data.

a) Do you design a data structure based on a use case or on the domain?
I've often preferred to use the underlying domain representation
rather than the minimal one needed by the use cases. E.g. when designing
tables, I would choose not to add a column unless it is necessary for
the current use cases. However, if the current use cases simplify a
one-to-many relationship to a one-to-one, and a reasonable domain
assessment concludes that the relationship in the wild is one-to-many, I
prefer to model it as one-to-many. That is because over time more
aspects of the domain get implemented and some of these early decisions
need to be revisited. Adding columns to a table is relatively
inexpensive compared to changing the cardinality of table relationships.
(YAGNI proponents may choose to disagree, and that's fair; I am just
stating my opinion here.)
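A minimal sketch of that choice, using Python's built-in sqlite3 module. The customer/address tables and their data are hypothetical, made up only to illustrate the point: today's use case needs a single address per customer, but the domain says there can be many, so the address gets its own table with a foreign key.

```python
import sqlite3

# Hypothetical schema: the domain says one customer -> many addresses,
# even though today's use cases only ever show one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE address (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    line        TEXT NOT NULL
);
""")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'example customer')")
conn.execute("INSERT INTO address (customer_id, line) VALUES (1, 'home')")

# Today's one-address use case is just a query over the one-to-many table.
def primary_address(customer_id):
    row = conn.execute(
        "SELECT line FROM address WHERE customer_id = ? ORDER BY id LIMIT 1",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None

first = primary_address(1)  # 'home'

# A later use case with several addresses needs new rows, not a new schema.
conn.execute("INSERT INTO address (customer_id, line) VALUES (1, 'office')")
all_addresses = [
    r[0]
    for r in conn.execute(
        "SELECT line FROM address WHERE customer_id = ? ORDER BY id", (1,)
    )
]
```

Had the address been a single column on the customer row, supporting a second address would mean migrating the schema and every query that reads it.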

Why is this relevant here? It doesn't matter whether the dict is
required to be serialized back to XML today. What matters more is
whether the domain suggests that ordering is important. I would use
that as the deciding factor rather than the operations performed on the
data in the current use cases. If order is unimportant, losing it in a
round trip won't hurt; if it is important, just throw xmltodict away.
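To make the ordering loss concrete, here is a sketch using only the standard library. `naive_to_dict` is a hypothetical helper that follows the same general idea as a dict-based converter such as xmltodict (child elements grouped by tag name, repeated tags collected into a list); it is not xmltodict's actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical dict-style converter (NOT xmltodict's code): children are
# grouped by tag, and a repeated tag turns its value into a list.
def naive_to_dict(elem):
    result = {}
    for child in elem:
        value = naive_to_dict(child) if len(child) else (child.text or "")
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

xml = "<expr><and>a</and><plus>b</plus><and>c</and></expr>"
root = ET.fromstring(xml)

# The document order is: and, plus, and.
document_order = [child.tag for child in root]

# After grouping by tag both <and> values share one key, so the fact that
# <plus> sat between them is gone: {'and': ['a', 'c'], 'plus': 'b'}
as_dict = naive_to_dict(root)

# Anything rebuilt from the dict only sees the keys: ['and', 'plus']
rebuilt_order = list(as_dict)
```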

b) There is yet another issue here. The thread began by attempting a
conversion that assumed syntactic equivalence without verifying semantic
compatibility. While JSON and YAML are quite compatible with each other,
XML has a different set of semantics (and I am not even referring to some
of the issues discussed at
http://www.w3.org/DesignIssues/XML-Semantics.html and
http://www.balisage.net/Proceedings/vol6/html/Dombrowski01/BalisageVol6-Dombrowski01.html
). I refer to a simple difference. JSON / YAML have two container types:
a sequence and a map. The sequence is ordered, the map isn't. Either may
contain primitives or other sequences or maps. There is no SequenceMap
type: a type which has a name and a map of key-value pairs, but is also
a sequence of other SequenceMaps whose names may repeat, not necessarily
consecutively. There simply isn't any similar pythonic structure. (The
way I have worded it makes it sound like a structural characteristic
rather than a semantic one, but I'll continue with that word for the
moment.)
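A sketch of what such a SequenceMap looks like in practice. Plain dicts and lists have no such type; the closest standard-library object is `xml.etree.ElementTree.Element`, which carries a name (`tag`), a map (`attrib`) and an ordered sequence of children. Encoding it into JSON then needs a convention of your own; the `[name, attributes, text, *children]` layout below is an assumption, loosely in the style of JsonML, and it ignores tail text for brevity.

```python
import json
import xml.etree.ElementTree as ET

xml = '<expr op="root"><and>a</and><plus>b</plus><and>c</and></expr>'
root = ET.fromstring(xml)

# Name, map and ordered (possibly repeating) children, all on one object.
name = root.tag                           # 'expr'
attributes = root.attrib                  # {'op': 'root'}
children = [child.tag for child in root]  # ['and', 'plus', 'and']

# Hypothetical JSON encoding: [name, attributes, text, *children].
# Tail text is ignored in this sketch.
def to_json_like(elem):
    return [elem.tag, dict(elem.attrib), elem.text or ""] + [
        to_json_like(child) for child in elem
    ]

def from_json_like(node):
    tag, attrib, text, *kids = node
    elem = ET.Element(tag, attrib)
    elem.text = text or None
    elem.extend([from_json_like(kid) for kid in kids])
    return elem

encoded = json.dumps(to_json_like(root))
rebuilt = from_json_like(json.loads(encoded))
```

The order survives only because the convention keeps children in a JSON array; a map-based encoding would be back to the previous problem.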

Why is this relevant? When one decides on a JSON / YAML file format,
one commits to one set of semantics, and when one commits to XML it
is a different set of semantics. The issue is that as the software keeps
growing, a file in JSON will likely continue to evolve its schema
differently from an equivalent XML file. So if you are the author of the
file format and you want simple pythonic structures, just make it JSON.
But if you are choosing to deal with XML because that format is not in
your control, there is a good likelihood that the maintainer of that file
will continue to grow it with XML semantics in mind. Even if under
today's use cases you can find a way to coerce the XML into primitive
Python structures as you would with JSON, that whole approach could break
over the next few years of maintenance.
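A sketch of how such growth can break the coercion. `naive_to_dict` is a hypothetical dict-style converter (not xmltodict itself) in which a single child becomes a value and a repeated child becomes a list; the config/server/host element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical converter: one child -> value, repeated children -> list.
def naive_to_dict(elem):
    result = {}
    for child in elem:
        value = naive_to_dict(child) if len(child) else (child.text or "")
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

# Code written against today's files, where only one <server> ever appears.
def server_host(config):
    return config["server"]["host"]

today = "<config><server><host>alpha</host></server></config>"
# A later version adds a second <server>: natural growth for an XML format,
# but it changes the dict shape of "server" from a dict to a list.
later = ("<config><server><host>alpha</host></server>"
         "<server><host>beta</host></server></config>")

host_today = server_host(naive_to_dict(ET.fromstring(today)))  # 'alpha'

try:
    server_host(naive_to_dict(ET.fromstring(later)))
    broke = False
except TypeError:
    # config["server"] is now a list, so indexing it with "host" fails.
    broke = True
```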

