[Chicago] AppEngine Bulk Loading

Cosmin Stejerean cstejerean at gmail.com
Thu May 15 17:51:11 CEST 2008


On Thu, May 15, 2008 at 8:03 AM, Feihong Hsu <hsu.feihong at yahoo.com> wrote:
> Argh, bulk loading to production was a disaster. I kept getting
> "BadStatusLine" and "Software caused connection abort" errors. It
> might have been due to an unstable connection, though.
>
> During bulk load I would try to process 50 lines of CSV text.
> I think I might have gotten some timeouts, so when I retried the
> upload I got some redundant entities in my datastore. To be on the
> safe side, I'll try going one line at a time from now on. I think
> that's what Google's bulk loader does anyway.
>

I ended up writing a custom loader and giving every entry a key name to
prevent duplicate imports (importing the same thing twice just updates
the existing entry in the datastore). My scenario was slightly more
complicated than the bulk loader seemed to handle (or at least I was too
lazy to read the entire documentation).
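
A minimal sketch of the key-name idea, not the actual loader. A plain
dict stands in for the datastore so it runs anywhere, and the column
names and the "stop:..." key scheme are made up for illustration. On
App Engine the same effect should come from passing key_name= to the
model constructor before put(), since a put() with an existing key
overwrites that entity instead of creating a second one.

```python
import csv
import io


def key_name_for(row):
    # Hypothetical scheme: build the key name from the columns that
    # uniquely identify a row, so the same row always maps to the same key.
    return "stop:%s:%s" % (row["line"], row["station"])


def load_rows(store, csv_text):
    """Upsert every CSV row into `store`, a dict standing in for the datastore."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Writing under a deterministic key overwrites any earlier copy,
        # so re-running the import cannot create duplicates.
        store[key_name_for(row)] = dict(row)
    return store


CSV_TEXT = "line,station,zone\nUP-N,Evanston,C\nUP-N,Wilmette,D\n"

store = {}
load_rows(store, CSV_TEXT)
load_rows(store, CSV_TEXT)  # simulate a retry after a timeout
print(len(store))
```

With this in place, retrying a failed batch is safe regardless of how
many rows the first attempt managed to write.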

I also wanted the ability to delete items using the bulk import tool
(so I can feed it a diff between CSV files and have it correctly add
and delete items). My code is available at
http://github.com/cosmin/metratime/tree/master/metratime/bulkloader
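
For illustration only, here is one way a diff-driven import could work.
This is a sketch under assumptions, not the code in the repository: it
assumes the diff is in unified format, that the first two CSV columns
identify a row, and it again uses a dict in place of the datastore. The
function names are hypothetical.

```python
import csv
import difflib


def key_name_for(fields):
    # Hypothetical scheme: the first two columns identify the row.
    return "stop:%s:%s" % (fields[0], fields[1])


def apply_diff(store, diff_lines):
    """Apply the '+' and '-' lines of a unified diff to `store` (a dict)."""
    adds, deletes = [], []
    in_hunk = False
    for line in diff_lines:
        if line.startswith("@@"):
            in_hunk = True  # everything before the first hunk is file header
            continue
        if not in_hunk:
            continue
        if line.startswith("+"):
            adds.append(line[1:])
        elif line.startswith("-"):
            deletes.append(line[1:])
    # Delete first, then add, so a changed row (which shows up as a '-'
    # followed by a '+' with the same key) ends up with its new values.
    for fields in csv.reader(deletes):
        if fields:
            store.pop(key_name_for(fields), None)
    for fields in csv.reader(adds):
        if fields:
            store[key_name_for(fields)] = fields
    return store


old = ["UP-N,Evanston,C", "UP-N,Wilmette,D"]
new = ["UP-N,Evanston,B", "UP-N,Winnetka,D"]

store = {key_name_for(f): f for f in csv.reader(old)}
apply_diff(store, list(difflib.unified_diff(old, new, lineterm="")))
print(sorted(store))
```

Because adds are written under deterministic key names, applying the
same diff twice leaves the store in the same state.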


-- 
Cosmin Stejerean
http://blog.offbytwo.com

