Uniquely identifying each & every html template

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Jan 22 00:43:20 CET 2013


On 21 January 2013 23:01, Tom P <werotizy at freent.dd> wrote:
> On 01/21/2013 01:39 PM, Oscar Benjamin wrote:
>>
>> On 21 January 2013 12:06, Ferrous Cranus <nikos.gr33k at gmail.com> wrote:
>>>
>>> On Monday, 21 January 2013 11:31:24 AM UTC+2, Chris Angelico wrote:
>>>>
>>>>
>>>> Seriously, you're asking for something that's beyond the power of
>>>> humans or computers. You want to identify that something's the same
>>>> file, without tracking the change or having any identifiable tag.
>>>>
>>>> That's a fundamentally impossible task.
>>>
>>>
>>> No, it is difficult but not impossible.
>>> It just cannot be done by tagging the file by:
>>>
>>> 1. filename
>>> 2. filepath
>>> 3. hash (math algorithm producing a string based on the file's contents)
>>>
>>> We need another way to identify the file WITHOUT using the above
>>> attributes.
>>
>>
>> This is a very old problem (still unsolved I believe):
>> http://en.wikipedia.org/wiki/Ship_of_Theseus
>>
> That wiki article hints at a possible solution: use a timestamp to
> determine which key is valid when.

In the Ship of Theseus, it is only argued that it is the same ship
because people were aware of the incremental changes that took place
along the way. The same applies here: if you don't track the
incremental changes and the two files have nothing concrete in common,
what does it mean to say that a file is "the same file" as some older
file?

That being said, I've always been impressed with the way that git can
understand when I think that a file is the same as some older file
(though it does sometimes go wrong):

~/tmp$ git init
Initialized empty Git repository in /home/oscar/tmp/.git/
~/tmp$ vim old.py
~/tmp$ cat old.py
#!/usr/bin/env python

print('This is a fairly useless script.')
print("Maybe I'll improve it later...")
~/tmp$ git add old.py
~/tmp$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#   new file:   old.py
#
~/tmp$ git commit
[master (root-commit) 8e91665] First commit
 1 file changed, 4 insertions(+)
 create mode 100644 old.py
~/tmp$ ls
old.py
~/tmp$ cat old.py > new.py
~/tmp$ rm old.py
~/tmp$ vim new.py
~/tmp$ cat new.py
#!/usr/bin/env python

print('This is a fairly useless script.')
print("Maybe I'll improve it later...")

print("Although, I've edited it somewhat, it's still useless")
~/tmp$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add/rm <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   deleted:    old.py
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   new.py
no changes added to commit (use "git add" and/or "git commit -a")
~/tmp$ git add -A .
~/tmp$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   renamed:    old.py -> new.py
#

So it *is* Theseus' ship!
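
For anyone wondering how a "renamed" verdict can come out without any
tracking of the edit itself: the rough idea is to compare the content of
the deleted file with the content of the added file and call it a rename
when they are similar enough. Below is a minimal sketch of that idea in
Python. It is not git's actual algorithm; it uses difflib from the
standard library as a stand-in, and the names similarity,
looks_like_rename and RENAME_THRESHOLD are made up for illustration. The
0.5 cutoff is an assumption chosen to mirror what I believe is git's
default 50% similarity threshold.

```python
import difflib

# Assumption: cutoff chosen to mirror git's default 50% rename threshold.
RENAME_THRESHOLD = 0.5


def similarity(old_text, new_text):
    """Return a 0..1 similarity ratio, comparing the texts line by line."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return difflib.SequenceMatcher(None, old_lines, new_lines).ratio()


def looks_like_rename(old_text, new_text, threshold=RENAME_THRESHOLD):
    """Guess whether new_text is an edited copy of old_text."""
    return similarity(old_text, new_text) >= threshold


# The two versions of the script from the session above.
OLD_PY = """#!/usr/bin/env python

print('This is a fairly useless script.')
print("Maybe I'll improve it later...")
"""

NEW_PY = OLD_PY + """
print("Although, I've edited it somewhat, it's still useless")
"""

# A hypothetical file that shares no lines with the script.
UNRELATED = """<html>
<body>
<p>Nothing in common with the script.</p>
</body>
</html>
"""

if __name__ == "__main__":
    print(similarity(OLD_PY, NEW_PY))            # 4 of the lines are shared
    print(looks_like_rename(OLD_PY, NEW_PY))     # above the cutoff
    print(looks_like_rename(OLD_PY, UNRELATED))  # nothing in common
```

With this line-based measure, old.py and new.py come out well above the
cutoff, while the unrelated file scores zero. Of course this only works
when the two files still share a good part of their content; once every
plank has been replaced, a comparison like this has nothing left to go on.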


Oscar


