python file synchronization

Sherif Shehab Aldin silentquote at gmail.com
Thu Feb 16 21:11:52 CET 2012


Hi Cameron,

First, sorry for my very, very late reply; I was overloaded at work last
week :(
Anyway... I will reply inline this time ;)

On Wed, Feb 8, 2012 at 11:59 AM, Cameron Simpson <cs at zip.com.au> wrote:

> [ Please reply inline; it makes the discussion read like a conversation,
>  with context. - Cameron
> ]
>
> On 08Feb2012 08:57, Sherif Shehab Aldin <silentquote at gmail.com> wrote:
> | Thanks a lot for your help. I just forgot to state that the FTP server is
> | not under my control: I can't control how the file grows or how the
> | records are added; I can only log in to it and copy the whole file.
>
> Oh. That's a pity.
>
> | The reason why I am parsing the file, getting the diffs between the new
> | file and the old one, and copying them to new_file.time_stamp is that I
> | need to cut down the file size, so that when server (C) grabs the file it
> | grabs only new data. It also cuts down the network bandwidth.
>
> Can a simple byte count help here? Copy the whole file with FTP. From
> the new copy, extract the bytes from the last byte count offset onward.
> Then parse the smaller file, extracting whole records for use by (C).
> That way you can just keep the unparsed tail (partial record I imagine)
> around for the next fetch.
>
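A minimal Python sketch of the byte-count idea above. It assumes, hypothetically, that each record ends with a newline byte (the thread never says what a record looks like), and `extract_new_records` is an illustrative name. The offset only advances past complete records, so a trailing partial record is simply read again on the next fetch.

```python
from typing import List, Tuple


def extract_new_records(full_copy: bytes, offset: int) -> Tuple[List[bytes], int]:
    """Return the complete records found after `offset`, plus the offset to use next time.

    Assumption: every record is terminated by a newline byte.
    """
    new_data = full_copy[offset:]
    # Everything up to and including the last newline consists of whole records.
    end = new_data.rfind(b"\n")
    if end == -1:
        # No complete record yet: keep the old offset so the tail is re-read later.
        return [], offset
    complete = new_data[: end + 1]
    # Splitting b"a\nb\n" gives [b"a", b"b", b""]; drop the empty piece after the last newline.
    records = complete.split(b"\n")[:-1]
    return records, offset + len(complete)
```

The returned offset would need to be persisted between runs (for example in a small state file); that part is left out here.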
> Looking at RFC959 (the FTP protocol):
>
>  http://www.w3.org/Protocols/rfc959/4_FileTransfer.html
>
> it looks like you can do a partial file fetch, also, by issuing a REST
> (restart) command to set a file offset and then issuing a RETR (retrieve)
> command to get the rest of the file. These all need to be in binary mode
> of course.
>
> So in principle you could track the byte offset of what you have fetched
> with FTP so far, and fetch only what is new.
>
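A sketch of the REST-then-RETR fetch using Python's standard ftplib. `FTP.retrbinary()` switches the connection to binary mode itself and accepts a `rest` argument, which makes ftplib send `REST <offset>` before `RETR`. The function name, host, credentials and file name are placeholders, and whether a given server honours REST before RETR in stream mode is an assumption worth checking against that server.

```python
import ftplib


def fetch_from_offset(ftp: ftplib.FTP, remote_name: str, offset: int) -> bytes:
    """Fetch the bytes of `remote_name` starting at byte `offset`."""
    chunks = []
    # retrbinary() issues "TYPE I" (binary mode) on its own;
    # rest=offset makes ftplib send "REST <offset>" before the RETR command.
    ftp.retrbinary("RETR " + remote_name, chunks.append, rest=offset)
    return b"".join(chunks)


# Hypothetical usage (host, credentials and file name are placeholders):
#   with ftplib.FTP("ftp.example.com") as ftp:
#       ftp.login("user", "password")
#       new_bytes = fetch_from_offset(ftp, "data.log", last_offset)
#       last_offset += len(new_bytes)
```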

I am actually grabbing the file from FTP with a bash script using lftp. It
seemed a simple task for Python at the beginning, and then I noticed the
many problems. I have checked lftp and did not find how to continue
downloading a file. Do I have to use an FTP library, maybe in Python, so I
can use that feature?

>
> | One of my problems was that, after mounting server (B)'s diffs_dir on
> | server (A) through NFS, I used to create filename.lock on server (B)
> | first, then start copying filename to server (B), then remove
> | filename.lock, so that when the daemon running on server (C) parses the
> | files in the local_diffs dir, it ignores the files that are still being
> | copied.
> |
> | After searching more yesterday, I found that a local mv is atomic, so
> | instead of creating the lock files, I will copy the new diffs to a tmp
> | dir and, after the copy is over, mv them to the actual diffs dir. That
> | will avoid reading a file while it's still being copied.
>
> Yes, this sounds good. Provided the mv is on the same filesystem.
>
> For example: "mv /tmp/foo /home/username/foo" is actually a copy and not
> a rename because /tmp is normally a different filesystem from /home.
>
Yes, they are on the same file system; I am making sure of that ;)
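A sketch of the copy-to-tmp-then-rename approach in Python, assuming the staging dir and the diffs dir live on the same filesystem (for the NFS setup that means the same mount). `publish_atomically` and its parameters are illustrative names. A useful difference from mv: `os.replace()` raises OSError when the two paths are on different filesystems instead of quietly falling back to a copy.

```python
import os
import shutil
import tempfile


def publish_atomically(src_path: str, staging_dir: str, dest_dir: str) -> str:
    """Copy src_path into dest_dir so readers never see a half-written file.

    Assumption: staging_dir and dest_dir are on the same filesystem.
    """
    name = os.path.basename(src_path)
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as tmp, open(src_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before it becomes visible
        # mkstemp() creates files readable only by the owner; widen if other users read them.
        os.chmod(tmp_path, 0o644)
        final_path = os.path.join(dest_dir, name)
        # Atomic rename on POSIX when both paths are on the same filesystem;
        # raises OSError (rather than copying) if they are not.
        os.replace(tmp_path, final_path)
        return final_path
    except BaseException:
        # Do not leave a half-written temp file behind in the staging dir.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The daemon on (C) then only ever sees complete files in the diffs dir, and no lock files are needed.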


> | Sorry if the above is a bit confusing; the system is a bit complex.
>
> Complex systems often need fiddly solutions.
>
> | Also there is one more factor that confuses me: I am very bad at testing,
> | and I am trying to actually start writing unit tests for my code. What I
> | find hard is how to test code like the code that does the copy, mv and so
> | on, and also the code that fetches data from the web.
>
> Ha. I used to be very bad at testing, now I am improving and am merely
> weak.
>
> One approach to testing is to make a mock up of the other half of the
> system, and test against the mockup.
>
> For example, you have code to FTP new data and then feed it to (C). You
> don't control the server side of the FTP. So you might make a small mock
> up program that writes valid (but fictitious) data records progressively
> to a local data file (write record, flush, pause briefly, etc). If you
> can FTP to your own test machine you could then treat _that_ growing
> file as the remote server's data file.
>
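A minimal sketch of such a mock-up writer, following the "write record, flush, pause briefly" recipe. The record format and the `write_fake_records` name are made up for illustration; the records should be shaped like the real data.

```python
import time


def write_fake_records(path: str, count: int, pause: float = 0.5) -> None:
    """Append `count` fictitious records to `path`, one at a time."""
    with open(path, "a", encoding="ascii") as f:
        for i in range(count):
            # Hypothetical record format; replace with something like the real records.
            f.write("record %d,fake-value-%d\n" % (i, i))
            f.flush()  # make each record visible to readers of the file right away
            time.sleep(pause)  # let the reader side catch up between records
```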
> Then you could copy it progressively using a byte count to keep track of
> the bits you have seen to skip them, and the the
>
> If you can't FTP to your test system, you could abstract out the "fetch
> part of this file by FTP" into its own function. Write an equivalent
> function that fetches part of a local file just by opening it.
>
> Then you could use the local file version in a test that doesn't
> actually do the FTP, but could exercise the rest of it.
>
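One way to do that abstraction: the rest of the code takes a "fetcher" (any callable mapping a byte offset to the new bytes), so a test can pass a local-file fetcher while production passes one that wraps the FTP REST/RETR fetch. `Fetcher`, `local_fetcher` and `save_new_data` are hypothetical names chosen for this sketch.

```python
from typing import Callable

# A fetcher takes a byte offset and returns every byte of the data file after it.
Fetcher = Callable[[int], bytes]


def local_fetcher(path: str) -> Fetcher:
    """Stand-in for the FTP fetch: read a local file from a given offset."""
    def fetch(offset: int) -> bytes:
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read()
    return fetch


def save_new_data(fetch: Fetcher, offset: int, out_path: str) -> int:
    """Write whatever is new since `offset` to out_path and return the next offset."""
    new_bytes = fetch(offset)
    if new_bytes:  # no new data means no diff file at all
        with open(out_path, "wb") as out:
            out.write(new_bytes)
    return offset + len(new_bytes)
```

`save_new_data` never knows where the bytes came from, so it can be exercised without any FTP server.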
> It is also useful to make simple tests of small pieces of the code.
> So make the code to get part of the data a simple function, and write
> tests to execute it in a few ways (no new data, part of a record,
> several records etc).
>
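For example, small unittest cases for the three situations just mentioned (no new data, part of a record, several records). The function under test assumes newline-terminated records, which is a guess about the data format, and `split_complete_records` is an illustrative name.

```python
import unittest
from typing import List, Tuple


def split_complete_records(buf: bytes) -> Tuple[List[bytes], bytes]:
    """Split buf into complete newline-terminated records plus the unfinished tail."""
    # bytes.split() always returns at least one element, so the unpacking is safe.
    *records, tail = buf.split(b"\n")
    return records, tail


class SplitCompleteRecordsTest(unittest.TestCase):
    def test_no_new_data(self):
        self.assertEqual(split_complete_records(b""), ([], b""))

    def test_part_of_a_record(self):
        self.assertEqual(split_complete_records(b"rec"), ([], b"rec"))

    def test_several_records(self):
        self.assertEqual(
            split_complete_records(b"r1\nr2\nr3"), ([b"r1", b"r2"], b"r3"))
```

Saved in a file named like test_records.py, `python -m unittest` will discover and run these.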
You are right, my problem is that I don't care about testing until my code
grows badly, and then I notice what I got myself into :)
But your suggestion is cool. I will try to implement it once I get back to
that project... I ran into some problems with another project, so I had to
go fix them first... and then the customer wanted some changes...
;-)

> There are many people better than I to teach testing.
>
I really appreciate your help. I am trying to learn from the mailing list,
and I have noticed many interesting posts in the list already. I wish I could
read the python-list the same way, but unfortunately the mail digest they
send is quite annoying :(

Many thanks to you, and I will keep you posted if I get other ideas. :)

> Cheers,
> --
> Cameron Simpson <cs at zip.com.au> DoD#743
> http://www.cskk.ezoshosting.com/cs/
>
> Testing can show the presence of bugs, but not their absence.   - Dijkstra
>


More information about the Python-list mailing list