[Python-Dev] A few questions about setobject
Noam Raphael
noamraph at gmail.com
Wed Dec 28 12:58:38 CET 2005
On 12/28/05, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>
> The setentry typedef clearly violates the principles of the API, so
> it should be renamed.
(And so will _setobject, although I think it will be much easier to remove it.)
>
> > Perhaps the header file should stick
> > with writing "struct { long hash; PyObject *key; }" three times (or
> > define it in a macro and then undefine it), and the typedef be left to
> > the .c file?
>
> That would not be conforming to the C language standard (although
> accepted by most compilers).
Can you explain why it is not conforming to the standard? Can't a
typedef be used intechangably with the original type?
> Not sure what "this" refers to in your message: the text of the C API
> documentation certainly is desirable as it stands (although it should
> be clearer as to whether struct names should be prefixed).
> > Even if it is, it seems that the second sentence
> > contradicts the first sentence.
>
> Why does that seem so? To quote, the first sentence is
>
> 'All user visible names defined by Python.h (except those defined by
> the included standard headers) have one of the prefixes "Py" or "_Py".'
>
> and the second sentence is
>
> 'Names beginning with "_Py" are for internal use by the Python
> implementation and should not be used by extension writers.'
>
> I cannot see any contradiction between these.
>
Oops. It's the first and the third:
The first: All user visible names defined by Python.h (except those
defined by the included standard headers) have one of the prefixes
"Py" or "_Py".
The third: Structure member names do not have a reserved prefix.
I think that "structure member names" refers to things like setentry.
The third sentence contradicts the first since structure member names
are user visible names. Anyway, it seems to me best if the third
sentence will be removed and all names will start with Py or _Py.
>
> > I think it should be ok because it's never used
> > really as a PyObject. Am I missing something? (Ok, I now thought that
> > maybe it's because some parts don't treat dummy elements specially.
> > But it seems to me that most parts do treat them specially, so perhaps
> > it would be better to make a few changes so that all parts will treat
> > them specially?)
>
> In principle, you are right. One place that doesn't special-case the
> dummy is set_clear_internal (in fact, the Py_DEBUG assertion is
> completely useless there, AFAICT).
>
> The tricky question is: can we be certain that we find all places,
> in all code paths, where we have to special-case the dummy? Having
> PyObject* which don't point to PyObject is error-prone.
>
> Also, what would we gain? If you think it is speed: I doubt it. In
> one place, a comment suggests that actually seeing the dummy element
> is so much more unlikely than the other cases that, for performance,
> the test for the dummy is done last. We would lose additional speed
> in the cases where the dummy isn't yet considered.
>
Ok, I tried. It took me 25 minutes to change the code, while going
over every occurence of "key" and "decref" in setobject.c. (It seems
to me that the dummy element can only be accessed from entry->key.)
Most of the code isn't bothered by the dummy element, since it uses
the data structure in a more abstract way. I think that it simplifies
the code, while adding a condition in only two places, which I don't
think should make anything noticably slower. The result passes the
complete test suite. In one sentence: I think that it makes the code
slightly better, and the risk is pretty low.
I thought to post it as a patch, but sourceforge didn't work for me,
and it's not that long, so I paste it at the end of this message. Feel
free to do whatever you want with it.
> > 3) The type of the result of a binary operator applied on a set and a
> > frozenset is the type of the left set. You are welcomed to ignore
> > this, but I just wanted to say that it seems to me better to make the
> > operator symmetric, and to return a frozenset only if both sets are
> > frozen.
>
> How would you implement this? The result is obtained from copying the
> left operand, and then applying the other operand. This is done so
> that set subtyping becomes possible:
>
> >>> class myset(set):pass
> ...
> >>> x=myset([2,6])
> >>> y=set([2,6])
> >>> x.union(y)
> myset([2, 6])
>
> So if the result is not obtained by copying the left operand first,
> how would you compute the result type, so that this example still
> works?
>
The behaviour will change to work like in other types - the returned
value will be of the base type:
>>> class MyInt(int): pass
...
>>> x = MyInt(3)
>>> y = 5
>>> x.__add__(y)
8
I'm not talking about backwards compatibility - I'm just currently
asking if others also feel that the symmetric version is preferable.
Ok, here's the diff:
=== modified file 'Objects/setobject.c'
--- Objects/setobject.c
+++ Objects/setobject.c
@@ -13,8 +13,12 @@
/* This must be >= 1. */
#define PERTURB_SHIFT 5
-/* Object used as dummy key to fill deleted entries */
-static PyObject *dummy = NULL; /* Initialized by first call to
make_new_set() */
+/* Object used as dummy key to fill deleted entries.
+ * The only requirement is that it won't be a valid pointer to a
+ * PyObject, so it is instead a pointer to a dummy int.
+ */
+static int dummy_int;
+static PyObject *dummy = (PyObject *)(&dummy_int);
#define INIT_NONZERO_SET_SLOTS(so) do { \
(so)->table = (so)->smalltable; \
@@ -199,7 +203,6 @@
entry->key = key;
entry->hash = hash;
so->used++;
- Py_DECREF(dummy);
} else {
/* ACTIVE */
Py_DECREF(key);
@@ -283,8 +286,6 @@
} else if (entry->key == dummy) {
/* DUMMY */
--i;
- assert(entry->key == dummy);
- Py_DECREF(entry->key);
} else {
/* ACTIVE */
--i;
@@ -356,7 +357,6 @@
if (entry->key == NULL || entry->key == dummy)
return DISCARD_NOTFOUND;
old_key = entry->key;
- Py_INCREF(dummy);
entry->key = dummy;
so->used--;
Py_DECREF(old_key);
@@ -383,7 +383,6 @@
if (entry->key == NULL || entry->key == dummy)
return DISCARD_NOTFOUND;
old_key = entry->key;
- Py_INCREF(dummy);
entry->key = dummy;
so->used--;
Py_DECREF(old_key);
@@ -439,14 +438,12 @@
assert(i < n);
++i;
#endif
- if (entry->key) {
+ if (entry->key != NULL) {
--fill;
- Py_DECREF(entry->key);
- }
-#ifdef Py_DEBUG
- else
- assert(entry->key == NULL || entry->key == dummy);
-#endif
+ if (entry->key != dummy) {
+ Py_DECREF(entry->key);
+ }
+ }
}
if (table_is_malloced)
@@ -499,9 +496,11 @@
PyObject_ClearWeakRefs((PyObject *) so);
for (entry = so->table; fill > 0; entry++) {
- if (entry->key) {
+ if (entry->key != NULL) {
--fill;
- Py_DECREF(entry->key);
+ if (entry->key != dummy) {
+ Py_DECREF(entry->key);
+ }
}
}
if (so->table != so->smalltable)
@@ -661,7 +660,6 @@
}
}
key = entry->key;
- Py_INCREF(dummy);
entry->key = dummy;
so->used--;
so->table[0].hash = i + 1; /* next place to start */
@@ -889,12 +887,6 @@
{
register PySetObject *so = NULL;
- if (dummy == NULL) { /* Auto-initialize dummy */
- dummy = PyString_FromString("<dummy key>");
- if (dummy == NULL)
- return NULL;
- }
-
/* create PySetObject structure */
if (num_free_sets &&
(type == &PySet_Type || type == &PyFrozenSet_Type)) {
@@ -971,7 +963,6 @@
so = free_sets[num_free_sets];
PyObject_GC_Del(so);
}
- Py_XDECREF(dummy);
Py_XDECREF(emptyfrozenset);
}
More information about the Python-Dev
mailing list