Specifying an API for a streaming parser

tyler at monkeypox.org
Fri Dec 4 16:51:15 EST 2009


Howdy folks, I'm working on a JSON Python module [1] and I'm struggling to find
an appropriate syntax for incrementally parsing streams of data as they come in
(off a socket or file object).

The underlying C-level parsing library that I'm using (Yajl [2]) already uses a
callback system internally for handling such things, but I'm worried about:
  * Ease of use, simplicity
  * Python method invocation overhead going from C back into Python

One of the ideas I've had is to "iterparse" a la:

    >>> for k, v in yajl.iterloads(fp):
    ...     print ('key, value', k, v)
    >>> 

This effectively builds a generator over the JSON string coming off of the `fp`
object; each call to generator.next() reads more of the stream object.
This has some shortcomings, however:
  * For JSON like: '''{"rc":0,"data":<large JSON object>}''' the iterloads() 
    function would block for some time when processing the value of the "data"
    key.
  * Presumes the developer has prior knowledge of the kind of JSON strings
    being passed in
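
To make the idea and its first shortcoming concrete, here is a minimal sketch
of such an iterloads(), assuming Python 3 and the standard library's
json.JSONDecoder.raw_decode() in place of Yajl. The chunk_size parameter and
the helper names are hypothetical, it only handles a single top-level JSON
object, and it re-parses its buffer every time a chunk arrives, so it is an
illustration rather than an efficient implementation.

```python
import io
import json

WS = " \t\r\n"  # JSON whitespace


def iterloads(fp, chunk_size=4096):
    """Yield (key, value) pairs from one top-level JSON object read from fp."""
    decoder = json.JSONDecoder()
    buf = ""

    def more():
        # Pull another chunk from the stream; fail if the stream is exhausted.
        nonlocal buf
        chunk = fp.read(chunk_size)
        if not chunk:
            raise ValueError("stream ended before the JSON object was complete")
        buf += chunk

    def peek():
        # Skip whitespace and return the next character, reading as needed.
        nonlocal buf
        while True:
            buf = buf.lstrip(WS)
            if buf:
                return buf[0]
            more()

    def take_value(followers):
        # Decode one complete value. It is accepted only once the delimiter
        # that must follow it is visible, so "1" is not mistaken for "1.5".
        nonlocal buf
        peek()
        while True:
            try:
                value, end = decoder.raw_decode(buf)
            except ValueError:
                more()  # value is still incomplete: wait for more data
                continue
            rest = buf[end:].lstrip(WS)
            if rest and rest[0] in followers:
                buf = rest
                return value
            more()

    if peek() != "{":
        raise ValueError("expected a top-level JSON object")
    buf = buf[1:]
    if peek() == "}":
        return
    while True:
        key = take_value(":")
        buf = buf[1:]  # consume ':'
        value = take_value(",}")
        yield key, value
        delim, buf = buf[0], buf[1:]
        if delim == "}":
            return


fp = io.StringIO('{"rc": 0, "data": {"items": [1, 2.5, "x"]}}')
for k, v in iterloads(fp, chunk_size=8):
    print('key, value', k, v)
```

Note that nothing is yielded for the "data" key until its entire value has been
read, which is exactly the blocking behaviour described above. For a socket,
something like sock.makefile() would be needed to get an object with read().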

I've searched around for a tree generator along the lines of this "iterloads"
notion, but I came up with nothing.

Any suggestions on how to accomplish iterloads, or perhaps a suggestion for a
more sensible syntax for incrementally parsing objects from the stream and
passing them up into Python?
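
One alternative shape for "objects from the stream" is to yield each complete
top-level document as it arrives, rather than key/value pairs. The sketch below
assumes Python 3 and the standard library's json.JSONDecoder.raw_decode(), not
Yajl; iter_documents and chunk_size are hypothetical names, and it assumes
every document on the stream is an object or an array.

```python
import io
import json


def iter_documents(fp, chunk_size=4096):
    """Yield each complete top-level JSON object or array read from fp."""
    decoder = json.JSONDecoder()
    buf = ""
    while True:
        buf = buf.lstrip(" \t\r\n")
        if buf:
            if buf[0] not in "{[":
                raise ValueError("expected a JSON object or array")
            try:
                value, end = decoder.raw_decode(buf)
            except ValueError:
                pass  # document is incomplete: fall through and read more
            else:
                yield value
                buf = buf[end:]
                continue
        chunk = fp.read(chunk_size)
        if not chunk:
            if buf:
                raise ValueError("stream ended inside a JSON document")
            return
        buf += chunk


fp = io.StringIO('{"rc": 0}\n[1, 2]\n{"data": {"x": "y"}}')
for doc in iter_documents(fp, chunk_size=3):
    print('document', doc)
```

This sidesteps the need to know the layout of the JSON in advance, at the cost
of still buffering each whole document before handing it to the caller.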

Cheers,
-R. Tyler Ballance
--------------------------------------
 Jabber: rtyler at jabber.org
 GitHub: http://github.com/rtyler
Twitter: http://twitter.com/agentdero
   Blog: http://unethicalblogger.com



[1] http://github.com/rtyler/py-yajl
[2] http://lloyd.github.com/yajl/