Alternatives to XML?
Joonas Liik
liik.joonas at gmail.com
Fri Aug 26 11:00:30 EDT 2016
On 26 August 2016 at 17:58, Joonas Liik <liik.joonas at gmail.com> wrote:
> On 26 August 2016 at 16:10, Frank Millman <frank at chagford.com> wrote:
>> "Joonas Liik" wrote in message
>> news:CAB1GNpQnJDENaA-GZgt0TbcvWjaKNgD3YRoiXgyY+Mim7fw0zQ at mail.gmail.com...
>>
>>> On 26 August 2016 at 08:22, Frank Millman <frank at chagford.com> wrote:
>>> >
>>> > So this is my conversion routine -
>>> >
>>> > lines = string.split('"')  # split on attributes
>>> > for pos, line in enumerate(lines):
>>> >     if pos%2:  # every 2nd line is an attribute
>>> >         lines[pos] = line.replace('<', '&lt;').replace('>', '&gt;')
>>> > return '"'.join(lines)
>>> >
>>>
>>> or.. you could just escape all & as &amp; before escaping the > and <,
>>> and do the reverse on decode
>>>
>>
>> Thanks, Joonas, but I have not quite grasped that.
>>
>> Would you mind explaining how it would work?
>>
>> Just to confirm that we are talking about the same thing -
>>
>> This is not allowed - '<root><fld name="<new>"/></root>' [A]
>>
>>>>> import xml.etree.ElementTree as etree
>>>>> x = '<root><fld name="<new>"/></root>'
>>>>> y = etree.fromstring(x)
>>
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "C:\Users\User\AppData\Local\Programs\Python\Python35\lib\xml\etree\ElementTree.py", line 1320, in XML
>>     parser.feed(text)
>> xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 17
>>
>> You have to escape it like this - '<root><fld name="&lt;new&gt;"/></root>' [B]
>>
>>>>> x = '<root><fld name="&lt;new&gt;"/></root>'
>>>>> y = etree.fromstring(x)
>>>>> y.find('fld').get('name')
>>
>> '<new>'
>>>>>
>>>>>
>>
>> I want to convert the string from [B] to [A] for editing, and then back to
>> [B] before saving.
>>
>> Thanks
>>
>> Frank
>>
>
> something like.. (untested)
>
> def escape(untrusted_string):
>     ''' Use on the user provided strings to render them inert for storage.
>     Escaping & first ensures that the user can't type something like '&gt;'
>     in source and have it magically decode as '>'
>     '''
>     return untrusted_string.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
>
> def unescape(escaped_string):
>     '''Once the user string is retrieved from storage, use this
>     function to restore it to its original form'''
>     # decode &amp; last, so a stored '&amp;lt;' comes back as '&lt;', not '<'
>     return escaped_string.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")
>
> I should note, tho, that this example is very ad hoc; I'm no XML expert,
> I just know a bit about XML entities.
> If you decide to go this route, there are probably much better
> tested functions out there to escape text for storage in XML
> documents.
you might want to un-wrap that before testing tho.. no idea why my
messages get mutilated like that :(
(sent using gmail, maybe somebody can comment on that?)
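There are better-tested functions for this in the Python 3 standard library, and they can be used to cross-check the ad hoc escape()/unescape() pair. The following is a minimal sketch, not what either poster proposed.

`xml.sax.saxutils.escape` and `unescape` handle &, < and > in the same order suggested above. The second part swaps in a different technique: letting ElementTree do the attribute escaping. It parses the stored [B] string, edits the attribute as plain text, and serializes it again. `attr_round_trip` is a hypothetical helper name used only for illustration.

```python
import xml.etree.ElementTree as etree
from xml.sax.saxutils import escape, unescape

raw = '<new> & more'

# saxutils.escape() replaces & before < and >, so the escaped text
# decodes unambiguously; unescape() reverses it (&amp; handled last).
escaped = escape(raw)
print(escaped)  # &lt;new&gt; &amp; more
assert unescape(escaped) == raw


def attr_round_trip(stored_xml, new_name):
    """Hypothetical helper: edit the 'name' attribute of <fld> without
    splitting the document text on '"' by hand."""
    root = etree.fromstring(stored_xml)  # parser turns &lt;/&gt; back into </>
    fld = root.find('fld')
    fld.set('name', new_name)            # set the raw, unescaped value
    # tostring() escapes <, > and & in attribute values on output
    return etree.tostring(root, encoding='unicode')


stored = '<root><fld name="&lt;new&gt;"/></root>'
print(etree.fromstring(stored).find('fld').get('name'))  # <new>
print(attr_round_trip(stored, '<edited>'))  # <root><fld name="&lt;edited&gt;" /></root>
```

With the parser approach there is no intermediate [A] string at all. The attribute value is only ever seen unescaped through `get()`/`set()`, and the serializer re-escapes it before saving.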