[Chicago] AppEngine Bulk Loading

Feihong Hsu hsu.feihong at yahoo.com
Thu May 15 19:44:18 CEST 2008


Ah, yes. I didn't know that it's possible to just call
Model.get_or_insert() with an explicit key_name. I should've done
that from the beginning. Now I don't have to worry about processing
50 rows and timing out in the middle, since a retry won't create
duplicates.
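
To make the idea concrete, here is a rough sketch of why a key name
makes a retried upload harmless. It does not use the App Engine SDK:
FakeDatastore is a hypothetical in-memory stand-in that only imitates
the "return the existing entity or create it" behaviour of
Model.get_or_insert(key_name, **kwds), and the "id" and "name" columns
and the sample rows are made up for illustration.

```python
import csv
import io


class FakeDatastore:
    """Hypothetical in-memory stand-in for the datastore, keyed by key_name."""

    def __init__(self):
        self.entities = {}

    def get_or_insert(self, key_name, **props):
        # Imitates get_or_insert: an existing entity is returned untouched,
        # a missing one is created from the given properties.
        if key_name not in self.entities:
            self.entities[key_name] = dict(props)
        return self.entities[key_name]


def load_rows(store, csv_text):
    # Assumption: the CSV has an "id" column that is unique per row,
    # so it can serve as the key name.
    for row in csv.DictReader(io.StringIO(csv_text)):
        store.get_or_insert(row["id"], **row)


CSV_TEXT = "id,name\n1,alpha\n2,beta\n"  # made-up sample data

store = FakeDatastore()
load_rows(store, CSV_TEXT)
load_rows(store, CSV_TEXT)  # simulated retry after a timeout
print(len(store.entities))
```

Note that get_or_insert leaves an existing entity as it is. If a
re-import should overwrite the stored values instead, my understanding
is that constructing the entity with key_name=... and calling put()
does that, but treat that as something to check against the docs.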


--- Cosmin Stejerean <cstejerean at gmail.com> wrote:

> On Thu, May 15, 2008 at 8:03 AM, Feihong Hsu
> <hsu.feihong at yahoo.com> wrote:
> > Argh, bulk loading to production was a disaster. I kept getting
> > "BadStatusLine" and "Software caused connection abort" errors. It
> > might have been due to an unstable connection, though.
> >
> > During bulk load I would try to process 50 lines of CSV text.
> > I think I might have gotten some timeouts, so when I retried the
> > upload I got some redundant entities in my datastore. To be on the
> > safe side, I'll try going one line at a time from now on. I think
> > that's what Google's bulk loader does anyway.
> >
> 
> I ended up writing a custom loader and giving every entry a key name
> to prevent duplicate imports (importing the same thing twice would
> just update the entry in the datastore). I had a slightly more
> complicated scenario than the bulk loader seemed to handle (or at
> least I was too lazy to read the entire documentation).
> 
> I also wanted the ability to delete items using the bulk import tool
> (so I can feed it a diff between CSV files and correctly add and
> delete items). My code is available at
> http://github.com/cosmin/metratime/tree/master/metratime/bulkloader
> 
> 
> -- 
> Cosmin Stejerean
> http://blog.offbytwo.com
> _______________________________________________
> Chicago mailing list
> Chicago at python.org
> http://mail.python.org/mailman/listinfo/chicago
> 
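
The diff-driven add/delete idea from the quoted message could look
roughly like the sketch below. This is not the code from the linked
repository. It is a guess at the general shape, with a plain dict
standing in for the datastore; diff_csv, apply_diff, and the "id"
key column are hypothetical names chosen for the example.

```python
import csv
import io


def rows_by_key(csv_text, key_column):
    """Index the CSV rows by the column used as the key name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row for row in reader}


def diff_csv(old_text, new_text, key_column="id"):
    """Return (rows to add or update, key names to delete)."""
    old = rows_by_key(old_text, key_column)
    new = rows_by_key(new_text, key_column)
    to_delete = sorted(key for key in old if key not in new)
    to_put = {key: row for key, row in new.items() if old.get(key) != row}
    return to_put, to_delete


def apply_diff(store, to_put, to_delete):
    # Keyed writes and deletes are safe to repeat, so a retried
    # upload leaves the store in the same state.
    for key_name, row in to_put.items():
        store[key_name] = row
    for key_name in to_delete:
        store.pop(key_name, None)


OLD_CSV = "id,name\n1,a\n2,b\n"  # made-up sample data
NEW_CSV = "id,name\n2,B\n3,c\n"

store = rows_by_key(OLD_CSV, "id")  # dict standing in for the datastore
to_put, to_delete = diff_csv(OLD_CSV, NEW_CSV)
apply_diff(store, to_put, to_delete)
apply_diff(store, to_put, to_delete)  # simulated retry
print(sorted(store), to_delete)
```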