Re: [Python-Dev] PEP 3147: PYC Repository Directories

On Feb 01, 2010, at 02:04 PM, Paul Du Bois wrote:
> It's an interesting challenge to write the file in such a way that it's safe for a reader and writer to co-exist. Like Brett, I considered an append-only scheme, but one needs to handle the case where the bytecode for a particular magic number changes. At some point you'd need to sweep garbage from the file. All solutions seem unnecessarily complex, and unnecessary since in practice the case should not come up.

I don't think that part's difficult. The byte code's only going to change if the source file has changed, and in that case, /all/ the byte code in the "fat pyc" file will be invalidated, so the whole thing can be deleted by the first writer. I'd worked that out in the original fat pyc version of the PEP.

-Barry

On Sat, Feb 6, 2010 at 3:28 PM, Barry Warsaw <barry@python.org> wrote:
> On Feb 01, 2010, at 02:04 PM, Paul Du Bois wrote:
>> It's an interesting challenge to write the file in such a way that it's safe for a reader and writer to co-exist. Like Brett, I considered an append-only scheme, but one needs to handle the case where the bytecode for a particular magic number changes. At some point you'd need to sweep garbage from the file. All solutions seem unnecessarily complex, and unnecessary since in practice the case should not come up.
>
> I don't think that part's difficult. The byte code's only going to change if the source file has changed, and in that case, /all/ the byte code in the "fat pyc" file will be invalidated, so the whole thing can be deleted by the first writer. I'd worked that out in the original fat pyc version of the PEP.

I'm sorry, but I'm totally against fat bytecode files. They make things harder for all tools. The beauty of the existing bytecode format is that it's totally trivial: magic number, source mtime, unmarshalled code object. You can't beat the beauty of that.

For the traditional "skinny" bytecode files, I believe that the existing algorithm which writes zeros in the place of the magic number first, writes the rest of the file, and then goes back to write the correct magic number, is correct with a single writer and multiple readers (assuming the readers ignore the file if its magic number is invalid). The creat(O_EXCL) option ensures that there won't be multiple writers. No rename() is necessary; POSIX rename() may be atomic, but it's a directory modification which makes it potentially slow.

--
--Guido van Rossum (python.org/~guido)
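The write protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the interpreter's actual C code: the function names write_skinny_pyc and read_skinny_pyc are hypothetical, and the file layout is the simple one named in the message (4-byte magic number, 4-byte source mtime, then the code object as serialized by marshal), which is not the header that current CPython releases write.

```python
import importlib.util
import marshal
import os
import struct

# Magic number of the running interpreter (4 bytes).
MAGIC = importlib.util.MAGIC_NUMBER


def write_skinny_pyc(path, code, source_mtime):
    """Hypothetical writer: zeros first, body next, real magic last.

    Returns True if the file was written, False if it already exists
    (meaning another writer got there first).
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_BINARY", 0)
    try:
        # O_EXCL makes creation fail if the file exists: a single writer.
        fd = os.open(path, flags, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * len(MAGIC))  # placeholder that readers reject
        f.write(struct.pack("<I", int(source_mtime) & 0xFFFFFFFF))
        marshal.dump(code, f)
        f.flush()
        f.seek(0)
        f.write(MAGIC)  # only now does the file look valid
    return True


def read_skinny_pyc(path, source_mtime):
    """Hypothetical reader: return the code object, or None if unusable."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    if len(data) < 8 or data[:4] != MAGIC:
        return None  # missing, half-written, or from another Python version
    (stored_mtime,) = struct.unpack("<I", data[4:8])
    if stored_mtime != int(source_mtime) & 0xFFFFFFFF:
        return None  # source changed since the bytecode was written
    return marshal.loads(data[8:])
```

A reader that opens the file while the writer is still working sees four zero bytes where the magic number belongs and simply ignores the file, which is why no rename() is needed in this scheme. Note that with O_EXCL a stale file has to be removed before it can be regenerated; the sketch leaves that step to the caller.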

On Feb 06, 2010, at 04:02 PM, Guido van Rossum wrote:
> On Sat, Feb 6, 2010 at 3:28 PM, Barry Warsaw <barry@python.org> wrote:
>> On Feb 01, 2010, at 02:04 PM, Paul Du Bois wrote:
>>> It's an interesting challenge to write the file in such a way that it's safe for a reader and writer to co-exist. Like Brett, I considered an append-only scheme, but one needs to handle the case where the bytecode for a particular magic number changes. At some point you'd need to sweep garbage from the file. All solutions seem unnecessarily complex, and unnecessary since in practice the case should not come up.
>>
>> I don't think that part's difficult. The byte code's only going to change if the source file has changed, and in that case, /all/ the byte code in the "fat pyc" file will be invalidated, so the whole thing can be deleted by the first writer. I'd worked that out in the original fat pyc version of the PEP.
>
> I'm sorry, but I'm totally against fat bytecode files. They make things harder for all tools. The beauty of the existing bytecode format is that it's totally trivial: magic number, source mtime, unmarshalled code object. You can't beat the beauty of that.

Just for the record, I totally agree. I was just explaining something I had figured out in the original version of the PEP, which wasn't published but which Martin had seen an early draft of. When Martin made the suggestion of sibling cache directories, I immediately realized that it was much cleaner, better, and easier to implement than fat files (especially because I already had some nasty complex code that implemented the fat files ;). I'm beginning to be convinced <wink> that a folder-per-folder approach is the best take on this yet.

> For the traditional "skinny" bytecode files, I believe that the existing algorithm which writes zeros in the place of the magic number first, writes the rest of the file, and then goes back to write the correct magic number, is correct with a single writer and multiple readers (assuming the readers ignore the file if its magic number is invalid). The creat(O_EXCL) option ensures that there won't be multiple writers. No rename() is necessary; POSIX rename() may be atomic, but it's a directory modification which makes it potentially slow.

Agreed, and the current approach is time and battle tested. I don't think we need to be mucking around with it. My current effort on this PEP will be spent on fleshing out the folder-per-folder approach, understanding the implications of that, and integrating all the other great comments in this thread.

-Barry