Discussion about crash tolerance feature for gdbm module
Hi folks,

Since version 1.21, gdbm supports a crash tolerance feature.
See: https://www.gnu.org.ua/software/gdbm/manual/Crash-Tolerance.html

I would like to introduce this feature because the Python standard library is the only gdbm binding library available to end users. It is also part of an effort to give end users access to the latest features.

If this feature is added, the following will be added:
- An 'x' flag for the extended gdbm format.
- A gdbm.gdbm_failure_atomic('path1', 'path2') API for declaring the snapshot paths.
- A gdbm.gdbm_latest_snapshot('path1', 'path2') API for getting the path of the latest valid snapshot.

These APIs are meant for people who need to recover their gdbm file after a disaster.

However, it has not yet been decided whether this feature will land in CPython. We have already discussed it at https://bugs.python.org/issue45452, but I would like to hear other developers' opinions.

I have cc'd the original authors of this feature, and Serhiy, who has thankfully already participated in the discussion :)

Warm Regards,
Dong-hee
Hi Dong-hee,

Can you please show me a short, complete example that opens a database with automatic snapshot recovery (using your proposed API)?

Are you asking for our opinion on the Python API that you propose, or on whether the whole feature is worth it?

Victor

On Tue, Jan 18, 2022 at 2:41 AM Dong-hee Na <donghee.na@python.org> wrote:
> Hi folks, since version 1.21, gdbm supports a crash tolerance feature. [...]
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RFICESJK...
> Code of Conduct: http://python.org/psf/codeofconduct/

--
Night gathers, and now my watch begins. It shall not end until my death.
Hello Victor :)

On Tue, Jan 18, 2022 at 6:57 PM Victor Stinner <vstinner@python.org> wrote:
> Do you ask us our opinion on the Python API that you propose? Or if the whole feature is worth it?

The latter must be settled first :)

> Can you please show me a short full example opening a database with automatic snapshot recovery (with your proposed API)?

The following code shows how a user sets up a gdbm file so that it can use the crash tolerance feature.

    import dbm.gnu as dbm

    db = dbm.open('x.db', 'nx')  # 'x' selects the extended format
    db.gdbm_failure_atomic('even_snapshot.bin', 'odd_snapshot.bin')  # Declare the snapshot files
    for k, v in zip('abcdef', 'ghijkl'):
        db[k] = v
    db.sync()
    db.close()

Recovery is done separately, because that situation is a very special case.

    import dbm.gnu as dbm

    latest_snapshot = dbm.gdbm_latest_snapshot('even_snapshot.bin', 'odd_snapshot.bin')
    db = dbm.open(latest_snapshot, 'r')  # Open the latest valid snapshot
    # Do whatever the user wants here, e.g. os.rename(latest_snapshot, 'x.db').

Warm regards,
Dong-hee
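One way to read the two snapshot paths in the proposal is that snapshots alternate between an "even" and an "odd" file, so a crash while one is being written still leaves the other complete. The sketch below illustrates that idea with plain JSON files. It is not gdbm's mechanism and does not use the proposed API; `write_snapshot`, `latest_snapshot`, and the `seq` counter are hypothetical names chosen for this illustration.

```python
import json
import os


def write_snapshot(data, seq, even_path, odd_path):
    """Write one snapshot, alternating between the even and odd file."""
    path = even_path if seq % 2 == 0 else odd_path
    with open(path, "w") as f:
        json.dump({"seq": seq, "data": data}, f)
        f.flush()
        os.fsync(f.fileno())  # push the snapshot to stable storage
    return path


def latest_snapshot(even_path, odd_path):
    """Return the path of the newest readable snapshot, or None."""
    best_path, best_seq = None, -1
    for path in (even_path, odd_path):
        try:
            with open(path) as f:
                seq = json.load(f)["seq"]
        except (OSError, ValueError, KeyError, TypeError):
            continue  # missing or half-written snapshot: skip it
        if seq > best_seq:
            best_path, best_seq = path, seq
    return best_path
```

If a crash leaves one file half-written, `latest_snapshot` ignores it and falls back to the other file, which is the property the recovery step relies on.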
How does someone know if a database is corrupted? Why is a separate script needed? Can't a single script automatically detect a corrupted database and load the latest snapshot?

How is this different from simply copying the whole database file?

Victor
I exchanged emails with the author of the feature to investigate the details.

Before getting started, please keep the following fact in mind:
- The latest snapshot is always valid, even if no corruption has occurred.

On Tue, Jan 18, 2022 at 10:54 PM Victor Stinner <vstinner@python.org> wrote:
> Why is a separated script needed? / A single script cannot automatically detect a corrupted database and load the latest snapshot?

The author said that a separate script is never needed: if the user reads from the latest snapshot file, the data will always be recovered. So the user code will look like this.

    import os
    import dbm.gnu as dbm

    # origin, even_snapshot, and odd_snapshot are paths; checks that the snapshot files exist are omitted.
    if not os.path.exists(origin):
        db = dbm.open(origin, 'nx')  # 'x' selects the extended format
        db.gdbm_failure_atomic(even_snapshot, odd_snapshot)  # Declare the snapshot files
    else:
        latest_snapshot = dbm.gdbm_latest_snapshot(even_snapshot, odd_snapshot)
        db = dbm.open(latest_snapshot, 'r')  # Open the latest valid snapshot

    for k, v in zip('abcdef', 'ghijkl'):
        db[k] = v
    db.sync()
    db.close()

> How is different from simply copying the whole database file?

Under the hood, the gdbm crash-tolerance mechanism *does* (logically) copy the whole database file, but it does so efficiently, using "reflink" copies, so the amount of physical storage used is minimal.

You may find this paper worth reading: https://dl.acm.org/doi/pdf/10.1145/3487019.3487353

Warm Regards,
Dong-hee
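To make the "reflink" point concrete: on Linux, a reflink copy can be requested with the FICLONE ioctl, which makes the destination share the source's data blocks until one of them is modified. The sketch below is only an illustration, not gdbm's code. `clone_or_copy` is a hypothetical helper, and it falls back to an ordinary copy wherever the filesystem or platform rejects the ioctl.

```python
import fcntl
import shutil

# Linux ioctl request number for FICLONE, i.e. _IOW(0x94, 9, int).
FICLONE = 0x40049409


def clone_or_copy(src, dst):
    """Try a reflink clone of src into dst; fall back to a full copy.

    Returns 'reflink' or 'copy' to show which path was taken.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to share src's blocks with dst.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            pass  # not supported here: fall through to a regular copy
    shutil.copyfile(src, dst)
    return "copy"
```

Either way the destination ends up with the same bytes. The difference is that a successful clone consumes almost no extra storage and completes without copying the data.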
For more readable code: https://gist.github.com/corona10/d4fe0b6367ea6865e37b4369a7d60912

On Fri, Jan 21, 2022 at 12:50 PM Dong-hee Na <donghee.na@python.org> wrote:
> I exchanged emails with the author of the feature to investigate the details. [...]
After discussing this with Victor via DM, I decided to provide a high-level API instead of the low-level APIs.
- gdbm.open(filename, snapshots=(foo, bar)) will do everything at once.

Regards,
Dong-hee

On Fri, Jan 21, 2022 at 12:52 PM Dong-hee Na <donghee.na@python.org> wrote:
> For more readable code: https://gist.github.com/corona10/d4fe0b6367ea6865e37b4369a7d60912
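The thread does not spell out what "do everything at once" would mean for gdbm.open(filename, snapshots=(foo, bar)). One possible reading, assuming it folds the earlier if/else into a single call, is sketched below. `open_with_snapshots`, `create`, and `recover` are hypothetical names; the two callables stand in for the proposed low-level steps and are not part of any real library.

```python
import os


def open_with_snapshots(filename, snapshots, *, create, recover):
    """Hypothetical helper: pick first-time setup or recovery.

    `create(filename, even, odd)` and `recover(filename, even, odd)` are
    caller-supplied stand-ins for the low-level steps discussed in the
    thread; this function only decides which one to run.
    """
    even, odd = snapshots
    if not os.path.exists(filename):
        # First use: create the database and declare its snapshot files.
        return create(filename, even, odd)
    # Database already exists: reopen it via the latest valid snapshot.
    return recover(filename, even, odd)
```

Passing the two steps in as callables keeps the sketch runnable without the proposed API and makes the branching logic easy to test in isolation.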
participants (2)
- Dong-hee Na
- Victor Stinner