[summerofcode] Application sent

ChunWei Ho fuzzybr80 at gmail.com
Sat Jun 4 08:32:16 CEST 2005


Hi all,

I have just sent an application to Google's Summer of Code, based on
Python web programming (WSGI) and an old project of mine that I will
make more generic. I would appreciate it if anyone could look it over
and give comments.

Thanks in advance for the time. :)


Some people told me that publishing the idea may not be wise, but I
thought:
(a) It's not worth stealing
(b) Python programmers are above this (*naivety* point)
(c) The potential mentors would be reading this anyway

-------------------------------------------------

Project Title: Data Serving/Collection Framework in Python/WSGI

Proposed Mentor/Sponsoring Organization: Python Software Foundation

Project Description:
A framework for bulk data serving/collection over the internet.
Bulk data here means files that could easily range from several MB to
several hundred MB (not surveys or simple POST data).

The client has a file repository that it wishes to sync to the server
(a WSGI application). The server should be able to accept transfers
via a number of protocols, including HTTP file transfer, HTTP form
upload, FTP, and email.
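
As a rough sketch of the HTTP file transfer case only (not a settled
design), the receiving side could be a small WSGI application that
stores the body of a PUT request. The name make_upload_app, the
storage_dir argument, and the use of PUT with the file name taken from
the URL path are all assumptions made for illustration.

```python
import os


def make_upload_app(storage_dir):
    """Return a WSGI app that stores PUT request bodies in storage_dir.

    Hypothetical helper: the name and the PUT-to-path convention are
    assumptions, not part of the proposal.
    """
    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "PUT":
            start_response("405 Method Not Allowed", [("Allow", "PUT")])
            return [b""]
        # Keep only the last path component so a client cannot write
        # outside storage_dir.
        name = os.path.basename(environ.get("PATH_INFO", ""))
        if name in ("", ".", ".."):
            start_response("400 Bad Request",
                           [("Content-Type", "text/plain")])
            return [b"missing file name\n"]
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
        stream = environ["wsgi.input"]
        with open(os.path.join(storage_dir, name), "wb") as out:
            # Copy in blocks so a file of several hundred MB is never
            # held in memory all at once.
            while remaining > 0:
                block = stream.read(min(remaining, 64 * 1024))
                if not block:
                    break
                out.write(block)
                remaining -= len(block)
        start_response("201 Created", [("Content-Type", "text/plain")])
        return [b"stored\n"]
    return app
```

Authentication, form upload, FTP, and email would each need their own
front end; this only shows that the storage step fits in a plain WSGI
callable.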

This project is not meant to be yet another ad-hoc file transfer or
p2p file-sharing program, but a persistent production setup for
transferring data from data collection sites/areas to a server,
possibly over the internet, using different methods to get through
strict organizational firewalls and web admins.

Unlike a straightforward file transfer application, the framework
should support:
+ Authentication and encryption
+ Verification scheme for data transfer, retries, etc. - MD5 hash compare?
+ Chunking of large files and reassembly on receipt
+ Partial/resumed file transfers - may depend on nature of data
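
To make the chunking and verification items above more concrete, here
is a minimal sketch, assuming per-chunk MD5 digests plus one digest
for the whole file. The function names, the (index, digest, chunk)
tuple layout, and the 1 MiB chunk size are illustrative assumptions.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # assumed chunk size (1 MiB), not from the proposal


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split a byte string into (index, md5_hex, chunk) tuples."""
    chunks = []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        chunks.append((index, hashlib.md5(chunk).hexdigest(), chunk))
    return chunks


def reassemble(chunks, expected_md5):
    """Verify every chunk, rebuild the payload, then check the full hash.

    Raises ValueError so the caller can retry the transfer.
    """
    parts = []
    # Chunks may arrive in any order; sort them by index first.
    for index, digest, chunk in sorted(chunks, key=lambda c: c[0]):
        if hashlib.md5(chunk).hexdigest() != digest:
            raise ValueError("chunk %d failed verification" % index)
        parts.append(chunk)
    data = b"".join(parts)
    if hashlib.md5(data).hexdigest() != expected_md5:
        raise ValueError("reassembled file does not match expected hash")
    return data
```

A failed chunk can be re-requested on its own, which is also the
starting point for resuming a partial transfer. MD5 here only detects
accidental corruption; it is not a substitute for the authentication
and encryption item.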

Also, unlike advanced commercial file transfer programs, the framework
should provide:
+ Support for multiple transfer protocols (HTTP/FTP/Email)
+ Automatic identification of files to synchronize (compare the server
and client repositories and request the differences automatically)
+ Conditional processing (triggers - resync file if modified? logic -
user specified)
+ A robust and considerate client. It may run on a shared machine,
which means a service (I initially designed it for Windows clients -
platform choice was not up to me) that must be configurable as to when
it runs and how long it runs
+ Prioritization: if the configured limit does not allow the client to
sync all data, what must be synced first (Latest file first, Earliest
file first, Latest file only, etc.). This form of consideration seems
to be important on production sites or factory machines where the
machine is in use during the day but idle for our use at night, or
where machines have internet connectivity (possibly dialup) only at
certain times of the day.
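
The repository comparison and the "what must be synced first" choice
above could look roughly like this. It is a sketch under assumptions:
each side is described by a manifest mapping file name to
(mtime, md5_hex), and the policy names are made-up strings mirroring
the options listed.

```python
def files_to_sync(client_manifest, server_manifest):
    """Return names that are missing or have a different hash on the server.

    Each manifest maps file name -> (mtime, md5_hex); this layout is
    an assumption made for the example.
    """
    return [name for name, (mtime, digest) in client_manifest.items()
            if name not in server_manifest
            or server_manifest[name][1] != digest]


def order_by_policy(names, client_manifest, policy="latest_first"):
    """Order pending files according to a user-configured policy."""
    # Sort by modification time, with the name as a tie-breaker.
    by_mtime = sorted(names, key=lambda n: (client_manifest[n][0], n))
    if policy == "earliest_first":
        return by_mtime
    if policy == "latest_first":
        return by_mtime[::-1]
    if policy == "latest_only":
        return by_mtime[-1:]
    raise ValueError("unknown policy: %r" % policy)
```

The client would then work through the ordered list until its
configured run window ends, leaving the rest for the next run.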


Development will be based on the WSGI/Paste model, although I will
also investigate Zope/CherryPy/Plone and other frameworks purely for
comparison or design consideration purposes. WSGI is chosen for its
small learning curve, as well as the fact that data collection for an
application can be kept separate from its other functions.


Benefits To The Community:
I had attempted to do such work for data collection in an earlier
project in Python. My lack of knowledge of web frameworks probably
made it a messy piece of code. Also there wasn't anything like this
(perhaps I was not thinking generic enough) that I could just adapt to
fill my needs.

Even if the project itself has a niche audience (not everyone needs a
framework to collect data from behind firewalls and strict site
admins), working through the project will produce components for
general use in Python/WSGI, such as these from the TODO list:
+ A WSGI file-serving application. This application should understand
all the relevant conditional headers (If-Modified-Since, etc.), gzip
encoding, etc.
+ A scheme to translate different protocols, e.g. FTP, into WSGI
executable actions (HTTP GET, PUT, PROPFIND)
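
For the first TODO item, a minimal sketch of a WSGI application that
serves a single file and honors If-Modified-Since is shown below. The
name make_file_app and the fixed application/octet-stream content type
are assumptions; gzip encoding and the other conditional headers are
left out.

```python
import os
from email.utils import formatdate, mktime_tz, parsedate_tz
from wsgiref.util import FileWrapper


def make_file_app(path):
    """Return a WSGI app serving one file, honoring If-Modified-Since.

    Hypothetical helper name; only this one conditional is handled.
    """
    def app(environ, start_response):
        stat = os.stat(path)
        mtime = int(stat.st_mtime)  # HTTP dates have one-second resolution
        headers = [("Last-Modified", formatdate(mtime, usegmt=True))]
        since = environ.get("HTTP_IF_MODIFIED_SINCE")
        if since:
            parsed = parsedate_tz(since)  # None if the header is malformed
            if parsed is not None and mtime <= mktime_tz(parsed):
                # The client's copy is current, so send no body.
                start_response("304 Not Modified", headers)
                return []
        headers.append(("Content-Type", "application/octet-stream"))
        headers.append(("Content-Length", str(stat.st_size)))
        start_response("200 OK", headers)
        # FileWrapper streams the file in blocks instead of loading it whole.
        return FileWrapper(open(path, "rb"))
    return app
```

A malformed If-Modified-Since header is simply ignored and the file is
sent in full, which is the safe default for a sync client.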


Brief Bio:

<stuff omitted>

