Python-checkins
March 2022
March 2, 2022
https://github.com/python/cpython/commit/b6b711a1aa233001c1874af1d920e459b6…
commit: b6b711a1aa233001c1874af1d920e459b6bf962c
branch: main
author: Victor Stinner <vstinner(a)python.org>
committer: vstinner <vstinner(a)python.org>
date: 2022-03-02T14:15:26+01:00
summary:
bpo-46848: Move _PyBytes_Find() to internal C API (GH-31642)
Move _PyBytes_Find() and _PyBytes_ReverseFind() functions to the
internal C API.
bytesobject.c now includes pycore_bytesobject.h.
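The search contract described in the moved declarations (return the index of the first occurrence of a needle in a haystack, with an offset added on success, or -1 when absent) can be sketched in Python. The function name below is hypothetical, purely an illustration of the internal C API's documented behavior, not CPython code:

```python
def pybytes_find(haystack: bytes, needle: bytes, offset: int = 0) -> int:
    """Sketch of the _PyBytes_Find() contract: index of the first
    occurrence of needle in haystack, plus offset on success,
    or -1 when the needle is not found."""
    i = haystack.find(needle)
    return i + offset if i >= 0 else -1
```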
files:
M Include/cpython/bytesobject.h
M Include/internal/pycore_bytesobject.h
M Modules/mmapmodule.c
M Objects/bytesobject.c
diff --git a/Include/cpython/bytesobject.h b/Include/cpython/bytesobject.h
index 38a0fe0af660f..6b3f55224fc55 100644
--- a/Include/cpython/bytesobject.h
+++ b/Include/cpython/bytesobject.h
@@ -116,22 +116,3 @@ PyAPI_FUNC(void*) _PyBytesWriter_WriteBytes(_PyBytesWriter *writer,
void *str,
const void *bytes,
Py_ssize_t size);
-
-/* Substring Search.
-
- Returns the index of the first occurence of
- a substring ("needle") in a larger text ("haystack").
- If the needle is not found, return -1.
- If the needle is found, add offset to the index.
-*/
-
-PyAPI_FUNC(Py_ssize_t)
-_PyBytes_Find(const char *haystack, Py_ssize_t len_haystack,
- const char *needle, Py_ssize_t len_needle,
- Py_ssize_t offset);
-
-/* Same as above, but search right-to-left */
-PyAPI_FUNC(Py_ssize_t)
-_PyBytes_ReverseFind(const char *haystack, Py_ssize_t len_haystack,
- const char *needle, Py_ssize_t len_needle,
- Py_ssize_t offset);
diff --git a/Include/internal/pycore_bytesobject.h b/Include/internal/pycore_bytesobject.h
index 18d9530aaf41e..8739a759ec36b 100644
--- a/Include/internal/pycore_bytesobject.h
+++ b/Include/internal/pycore_bytesobject.h
@@ -14,6 +14,25 @@ extern "C" {
extern PyStatus _PyBytes_InitTypes(PyInterpreterState *);
+/* Substring Search.
+
+ Returns the index of the first occurence of
+ a substring ("needle") in a larger text ("haystack").
+ If the needle is not found, return -1.
+ If the needle is found, add offset to the index.
+*/
+
+PyAPI_FUNC(Py_ssize_t)
+_PyBytes_Find(const char *haystack, Py_ssize_t len_haystack,
+ const char *needle, Py_ssize_t len_needle,
+ Py_ssize_t offset);
+
+/* Same as above, but search right-to-left */
+PyAPI_FUNC(Py_ssize_t)
+_PyBytes_ReverseFind(const char *haystack, Py_ssize_t len_haystack,
+ const char *needle, Py_ssize_t len_needle,
+ Py_ssize_t offset);
+
#ifdef __cplusplus
}
#endif
diff --git a/Modules/mmapmodule.c b/Modules/mmapmodule.c
index 6a038e72f93cf..ec36465728c3a 100644
--- a/Modules/mmapmodule.c
+++ b/Modules/mmapmodule.c
@@ -24,6 +24,7 @@
#define PY_SSIZE_T_CLEAN
#include <Python.h>
+#include "pycore_bytesobject.h" // _PyBytes_Find()
#include "pycore_fileutils.h" // _Py_stat_struct
#include "structmember.h" // PyMemberDef
#include <stddef.h> // offsetof()
diff --git a/Objects/bytesobject.c b/Objects/bytesobject.c
index 4c67b8f7af213..c6160aad790be 100644
--- a/Objects/bytesobject.c
+++ b/Objects/bytesobject.c
@@ -4,6 +4,7 @@
#include "Python.h"
#include "pycore_abstract.h" // _PyIndex_Check()
+#include "pycore_bytesobject.h" // _PyBytes_Find()
#include "pycore_bytes_methods.h" // _Py_bytes_startswith()
#include "pycore_call.h" // _PyObject_CallNoArgs()
#include "pycore_format.h" // F_LJUST
https://github.com/python/cpython/commit/03642df1a1cfddcd740b62e78bddfa3ea6…
commit: 03642df1a1cfddcd740b62e78bddfa3ea6863da4
branch: main
author: Inada Naoki <songofacandy(a)gmail.com>
committer: methane <songofacandy(a)gmail.com>
date: 2022-03-02T19:05:12+09:00
summary:
dict: Internal cleanup (GH-31641)
* Make empty_key from split table to combined table.
* Use unicode_get_hash() when possible.
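The second cleanup consolidates direct reads of the cached hash field (`((PyASCIIObject *) key)->hash`) behind a `unicode_get_hash()` helper. The pattern, visible throughout the diff below, is: for exact `str` keys, use the cached hash if present; `-1` means "not yet computed", so fall back to computing it. A rough Python sketch of that fallback logic (the attribute name is a made-up stand-in for the C-level field):

```python
def lookup_hash(key):
    """Sketch of the cached-hash pattern the diff consolidates behind
    unicode_get_hash(): exact str keys cache their hash; -1 marks
    "not yet computed", triggering a recomputation via hash()."""
    cached = getattr(key, "cached_hash", -1)  # stand-in for the C field
    if type(key) is not str or cached == -1:
        return hash(key)  # slow path: compute (and, in C, cache) the hash
    return cached
```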
files:
M Objects/dictobject.c
diff --git a/Objects/dictobject.c b/Objects/dictobject.c
index 20d7edab93ab1..abe455e4ae034 100644
--- a/Objects/dictobject.c
+++ b/Objects/dictobject.c
@@ -454,7 +454,7 @@ static PyDictKeysObject empty_keys_struct = {
1, /* dk_refcnt */
0, /* dk_log2_size */
0, /* dk_log2_index_bytes */
- DICT_KEYS_SPLIT, /* dk_kind */
+ DICT_KEYS_UNICODE, /* dk_kind */
1, /* dk_version */
0, /* dk_usable (immutable) */
0, /* dk_nentries */
@@ -462,16 +462,6 @@ static PyDictKeysObject empty_keys_struct = {
DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY}, /* dk_indices */
};
-
-struct {
- uint8_t prefix[sizeof(PyObject *)];
- PyDictValues values;
-} empty_values_struct = {
- { [sizeof(PyObject *)-1] = sizeof(PyObject *) },
- {{NULL}}
-};
-#define empty_values (&empty_values_struct.values)
-
#define Py_EMPTY_KEYS &empty_keys_struct
/* Uncomment to check the dict content in _PyDict_CheckConsistency() */
@@ -495,7 +485,6 @@ get_index_from_order(PyDictObject *mp, Py_ssize_t i)
static void
dump_entries(PyDictKeysObject *dk)
{
- int kind = dk->dk_kind;
for (Py_ssize_t i = 0; i < dk->dk_nentries; i++) {
if (DK_IS_UNICODE(dk)) {
PyDictUnicodeEntry *ep = &DK_UNICODE_ENTRIES(dk)[i];
@@ -531,7 +520,7 @@ _PyDict_CheckConsistency(PyObject *op, int check_content)
if (!splitted) {
/* combined table */
CHECK(keys->dk_kind != DICT_KEYS_SPLIT);
- CHECK(keys->dk_refcnt == 1);
+ CHECK(keys->dk_refcnt == 1 || keys == Py_EMPTY_KEYS);
}
else {
CHECK(keys->dk_kind == DICT_KEYS_SPLIT);
@@ -688,7 +677,8 @@ free_keys_object(PyDictKeysObject *keys)
// free_keys_object() must not be called after _PyDict_Fini()
assert(state->keys_numfree != -1);
#endif
- if (DK_LOG_SIZE(keys) == PyDict_LOG_MINSIZE && state->keys_numfree < PyDict_MAXFREELIST
+ if (DK_LOG_SIZE(keys) == PyDict_LOG_MINSIZE
+ && state->keys_numfree < PyDict_MAXFREELIST
&& DK_IS_UNICODE(keys)) {
state->keys_free_list[state->keys_numfree++] = keys;
return;
@@ -845,7 +835,7 @@ PyObject *
PyDict_New(void)
{
dictkeys_incref(Py_EMPTY_KEYS);
- return new_dict(Py_EMPTY_KEYS, empty_values, 0, 0);
+ return new_dict(Py_EMPTY_KEYS, NULL, 0, 0);
}
/* Search index of hash table from offset of entry table */
@@ -1478,9 +1468,7 @@ dictresize(PyDictObject *mp, uint8_t log2_newsize, int unicode)
}
dictkeys_decref(oldkeys);
mp->ma_values = NULL;
- if (oldvalues != empty_values) {
- free_values(oldvalues);
- }
+ free_values(oldvalues);
}
else { // oldkeys is combined.
if (oldkeys->dk_kind == DICT_KEYS_GENERAL) {
@@ -1506,7 +1494,7 @@ dictresize(PyDictObject *mp, uint8_t log2_newsize, int unicode)
if (unicode) { // combined unicode -> combined unicode
PyDictUnicodeEntry *newentries = DK_UNICODE_ENTRIES(mp->ma_keys);
if (oldkeys->dk_nentries == numentries && mp->ma_keys->dk_kind == DICT_KEYS_UNICODE) {
- memcpy(newentries, oldentries, numentries * sizeof(PyDictUnicodeEntry));
+ memcpy(newentries, oldentries, numentries * sizeof(PyDictUnicodeEntry));
}
else {
PyDictUnicodeEntry *ep = oldentries;
@@ -1533,27 +1521,31 @@ dictresize(PyDictObject *mp, uint8_t log2_newsize, int unicode)
}
}
- assert(oldkeys->dk_kind != DICT_KEYS_SPLIT);
- assert(oldkeys->dk_refcnt == 1);
+ // We can not use free_keys_object here because key's reference
+ // are moved already.
+ if (oldkeys != Py_EMPTY_KEYS) {
+ assert(oldkeys->dk_kind != DICT_KEYS_SPLIT);
+ assert(oldkeys->dk_refcnt == 1);
#ifdef Py_REF_DEBUG
- _Py_RefTotal--;
+ _Py_RefTotal--;
#endif
#if PyDict_MAXFREELIST > 0
- struct _Py_dict_state *state = get_dict_state();
+ struct _Py_dict_state *state = get_dict_state();
#ifdef Py_DEBUG
- // dictresize() must not be called after _PyDict_Fini()
- assert(state->keys_numfree != -1);
+ // dictresize() must not be called after _PyDict_Fini()
+ assert(state->keys_numfree != -1);
#endif
- if (DK_LOG_SIZE(oldkeys) == PyDict_LOG_MINSIZE &&
- DK_IS_UNICODE(oldkeys) &&
- state->keys_numfree < PyDict_MAXFREELIST)
- {
- state->keys_free_list[state->keys_numfree++] = oldkeys;
- }
- else
+ if (DK_LOG_SIZE(oldkeys) == PyDict_LOG_MINSIZE &&
+ DK_IS_UNICODE(oldkeys) &&
+ state->keys_numfree < PyDict_MAXFREELIST)
+ {
+ state->keys_free_list[state->keys_numfree++] = oldkeys;
+ }
+ else
#endif
- {
- PyObject_Free(oldkeys);
+ {
+ PyObject_Free(oldkeys);
+ }
}
}
@@ -1844,9 +1836,7 @@ _PyDict_LoadGlobal(PyDictObject *globals, PyDictObject *builtins, PyObject *key)
Py_hash_t hash;
PyObject *value;
- if (!PyUnicode_CheckExact(key) ||
- (hash = ((PyASCIIObject *) key)->hash) == -1)
- {
+ if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1) {
hash = PyObject_Hash(key);
if (hash == -1)
return NULL;
@@ -1873,9 +1863,7 @@ _PyDict_SetItem_Take2(PyDictObject *mp, PyObject *key, PyObject *value)
assert(value);
assert(PyDict_Check(mp));
Py_hash_t hash;
- if (!PyUnicode_CheckExact(key) ||
- (hash = ((PyASCIIObject *) key)->hash) == -1)
- {
+ if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1) {
hash = PyObject_Hash(key);
if (hash == -1) {
Py_DECREF(key);
@@ -1998,8 +1986,7 @@ PyDict_DelItem(PyObject *op, PyObject *key)
{
Py_hash_t hash;
assert(key);
- if (!PyUnicode_CheckExact(key) ||
- (hash = ((PyASCIIObject *) key)->hash) == -1) {
+ if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1) {
hash = PyObject_Hash(key);
if (hash == -1)
return -1;
@@ -2091,12 +2078,13 @@ PyDict_Clear(PyObject *op)
mp = ((PyDictObject *)op);
oldkeys = mp->ma_keys;
oldvalues = mp->ma_values;
- if (oldvalues == empty_values)
+ if (oldkeys == Py_EMPTY_KEYS) {
return;
+ }
/* Empty the dict... */
dictkeys_incref(Py_EMPTY_KEYS);
mp->ma_keys = Py_EMPTY_KEYS;
- mp->ma_values = empty_values;
+ mp->ma_values = NULL;
mp->ma_used = 0;
mp->ma_version_tag = DICT_NEXT_VERSION();
/* ...then clear the keys and values */
@@ -2257,8 +2245,7 @@ _PyDict_Pop(PyObject *dict, PyObject *key, PyObject *deflt)
_PyErr_SetKeyError(key);
return NULL;
}
- if (!PyUnicode_CheckExact(key) ||
- (hash = ((PyASCIIObject *) key)->hash) == -1) {
+ if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1) {
hash = PyObject_Hash(key);
if (hash == -1)
return NULL;
@@ -2372,16 +2359,14 @@ dict_dealloc(PyDictObject *mp)
PyObject_GC_UnTrack(mp);
Py_TRASHCAN_BEGIN(mp, dict_dealloc)
if (values != NULL) {
- if (values != empty_values) {
- for (i = 0, n = mp->ma_keys->dk_nentries; i < n; i++) {
- Py_XDECREF(values->values[i]);
- }
- free_values(values);
+ for (i = 0, n = mp->ma_keys->dk_nentries; i < n; i++) {
+ Py_XDECREF(values->values[i]);
}
+ free_values(values);
dictkeys_decref(keys);
}
else if (keys != NULL) {
- assert(keys->dk_refcnt == 1);
+ assert(keys->dk_refcnt == 1 || keys == Py_EMPTY_KEYS);
dictkeys_decref(keys);
}
#if PyDict_MAXFREELIST > 0
@@ -2498,8 +2483,7 @@ dict_subscript(PyDictObject *mp, PyObject *key)
Py_hash_t hash;
PyObject *value;
- if (!PyUnicode_CheckExact(key) ||
- (hash = ((PyASCIIObject *) key)->hash) == -1) {
+ if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1) {
hash = PyObject_Hash(key);
if (hash == -1)
return NULL;
@@ -2862,9 +2846,7 @@ dict_merge(PyObject *a, PyObject *b, int override)
dictkeys_decref(mp->ma_keys);
mp->ma_keys = keys;
if (mp->ma_values != NULL) {
- if (mp->ma_values != empty_values) {
- free_values(mp->ma_values);
- }
+ free_values(mp->ma_values);
mp->ma_values = NULL;
}
@@ -3257,8 +3239,7 @@ dict___contains__(PyDictObject *self, PyObject *key)
Py_ssize_t ix;
PyObject *value;
- if (!PyUnicode_CheckExact(key) ||
- (hash = ((PyASCIIObject *) key)->hash) == -1) {
+ if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1) {
hash = PyObject_Hash(key);
if (hash == -1)
return NULL;
@@ -3289,8 +3270,7 @@ dict_get_impl(PyDictObject *self, PyObject *key, PyObject *default_value)
Py_hash_t hash;
Py_ssize_t ix;
- if (!PyUnicode_CheckExact(key) ||
- (hash = ((PyASCIIObject *) key)->hash) == -1) {
+ if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1) {
hash = PyObject_Hash(key);
if (hash == -1)
return NULL;
@@ -3317,8 +3297,7 @@ PyDict_SetDefault(PyObject *d, PyObject *key, PyObject *defaultobj)
return NULL;
}
- if (!PyUnicode_CheckExact(key) ||
- (hash = ((PyASCIIObject *) key)->hash) == -1) {
+ if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1) {
hash = PyObject_Hash(key);
if (hash == -1)
return NULL;
@@ -3707,8 +3686,7 @@ PyDict_Contains(PyObject *op, PyObject *key)
PyDictObject *mp = (PyDictObject *)op;
PyObject *value;
- if (!PyUnicode_CheckExact(key) ||
- (hash = ((PyASCIIObject *) key)->hash) == -1) {
+ if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1) {
hash = PyObject_Hash(key);
if (hash == -1)
return -1;
@@ -3780,7 +3758,7 @@ dict_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
d->ma_version_tag = DICT_NEXT_VERSION();
dictkeys_incref(Py_EMPTY_KEYS);
d->ma_keys = Py_EMPTY_KEYS;
- d->ma_values = empty_values;
+ d->ma_values = NULL;
ASSERT_CONSISTENT(d);
if (type != &PyDict_Type) {
March 2, 2022
https://github.com/python/cpython/commit/eb6c840a2414dc057ffcfbb5ad68d6253c…
commit: eb6c840a2414dc057ffcfbb5ad68d6253c8dd57c
branch: 3.8
author: Miss Islington (bot) <31488909+miss-islington(a)users.noreply.github.com>
committer: ambv <lukasz(a)langa.pl>
date: 2022-03-02T10:19:33+01:00
summary:
bpo-46794: Bump up the libexpat version into 2.4.6 (GH-31487) (GH-31520)
(cherry picked from commit 1935e1cc284942bec8006287c939e295e1a7bf13)
Co-authored-by: Dong-hee Na <donghee.na(a)python.org>
files:
A Misc/NEWS.d/next/Core and Builtins/2022-02-22-12-07-53.bpo-46794.6WvJ9o.rst
M Modules/expat/expat.h
M Modules/expat/xmlparse.c
M Modules/expat/xmlrole.c
M Modules/expat/xmltok.c
M Modules/expat/xmltok_impl.c
diff --git a/Misc/NEWS.d/next/Core and Builtins/2022-02-22-12-07-53.bpo-46794.6WvJ9o.rst b/Misc/NEWS.d/next/Core and Builtins/2022-02-22-12-07-53.bpo-46794.6WvJ9o.rst
new file mode 100644
index 0000000000000..127387d32cb7a
--- /dev/null
+++ b/Misc/NEWS.d/next/Core and Builtins/2022-02-22-12-07-53.bpo-46794.6WvJ9o.rst
@@ -0,0 +1 @@
+Bump up the libexpat version into 2.4.6
diff --git a/Modules/expat/expat.h b/Modules/expat/expat.h
index 4c5704fd9336b..46a0e1bcd22de 100644
--- a/Modules/expat/expat.h
+++ b/Modules/expat/expat.h
@@ -1041,7 +1041,7 @@ XML_SetBillionLaughsAttackProtectionActivationThreshold(
*/
#define XML_MAJOR_VERSION 2
#define XML_MINOR_VERSION 4
-#define XML_MICRO_VERSION 4
+#define XML_MICRO_VERSION 6
#ifdef __cplusplus
}
diff --git a/Modules/expat/xmlparse.c b/Modules/expat/xmlparse.c
index 4b43e61321691..7db28d07acbcd 100644
--- a/Modules/expat/xmlparse.c
+++ b/Modules/expat/xmlparse.c
@@ -1,4 +1,4 @@
-/* 2e2c8ce5f11a473d65ec313ab20ceee6afefb355f5405afc06e7204e2e41c8c0 (2.4.4+)
+/* a30d2613dcfdef81475a9d1a349134d2d42722172fdaa7d5bb12ed2aa74b9596 (2.4.6+)
__ __ _
___\ \/ /_ __ __ _| |_
/ _ \\ /| '_ \ / _` | __|
@@ -11,7 +11,7 @@
Copyright (c) 2000-2006 Fred L. Drake, Jr. <fdrake(a)users.sourceforge.net>
Copyright (c) 2001-2002 Greg Stein <gstein(a)users.sourceforge.net>
Copyright (c) 2002-2016 Karl Waclawek <karl(a)waclawek.net>
- Copyright (c) 2005-2009 Steven Solie <ssolie(a)users.sourceforge.net>
+ Copyright (c) 2005-2009 Steven Solie <steven(a)solie.ca>
Copyright (c) 2016 Eric Rahm <erahm(a)mozilla.com>
Copyright (c) 2016-2022 Sebastian Pipping <sebastian(a)pipping.org>
Copyright (c) 2016 Gaurav <g.gupta(a)samsung.com>
@@ -718,8 +718,7 @@ XML_ParserCreate(const XML_Char *encodingName) {
XML_Parser XMLCALL
XML_ParserCreateNS(const XML_Char *encodingName, XML_Char nsSep) {
- XML_Char tmp[2];
- *tmp = nsSep;
+ XML_Char tmp[2] = {nsSep, 0};
return XML_ParserCreate_MM(encodingName, NULL, tmp);
}
@@ -1344,8 +1343,7 @@ XML_ExternalEntityParserCreate(XML_Parser oldParser, const XML_Char *context,
would be otherwise.
*/
if (parser->m_ns) {
- XML_Char tmp[2];
- *tmp = parser->m_namespaceSeparator;
+ XML_Char tmp[2] = {parser->m_namespaceSeparator, 0};
parser = parserCreate(encodingName, &parser->m_mem, tmp, newDtd);
} else {
parser = parserCreate(encodingName, &parser->m_mem, NULL, newDtd);
@@ -2563,6 +2561,7 @@ storeRawNames(XML_Parser parser) {
while (tag) {
int bufSize;
int nameLen = sizeof(XML_Char) * (tag->name.strLen + 1);
+ size_t rawNameLen;
char *rawNameBuf = tag->buf + nameLen;
/* Stop if already stored. Since m_tagStack is a stack, we can stop
at the first entry that has already been copied; everything
@@ -2574,7 +2573,11 @@ storeRawNames(XML_Parser parser) {
/* For re-use purposes we need to ensure that the
size of tag->buf is a multiple of sizeof(XML_Char).
*/
- bufSize = nameLen + ROUND_UP(tag->rawNameLength, sizeof(XML_Char));
+ rawNameLen = ROUND_UP(tag->rawNameLength, sizeof(XML_Char));
+ /* Detect and prevent integer overflow. */
+ if (rawNameLen > (size_t)INT_MAX - nameLen)
+ return XML_FALSE;
+ bufSize = nameLen + (int)rawNameLen;
if (bufSize > tag->bufEnd - tag->buf) {
char *temp = (char *)REALLOC(parser, tag->buf, bufSize);
if (temp == NULL)
@@ -3756,6 +3759,17 @@ addBinding(XML_Parser parser, PREFIX *prefix, const ATTRIBUTE_ID *attId,
if (! mustBeXML && isXMLNS
&& (len > xmlnsLen || uri[len] != xmlnsNamespace[len]))
isXMLNS = XML_FALSE;
+
+ // NOTE: While Expat does not validate namespace URIs against RFC 3986,
+ // we have to at least make sure that the XML processor on top of
+ // Expat (that is splitting tag names by namespace separator into
+ // 2- or 3-tuples (uri-local or uri-local-prefix)) cannot be confused
+ // by an attacker putting additional namespace separator characters
+ // into namespace declarations. That would be ambiguous and not to
+ // be expected.
+ if (parser->m_ns && (uri[len] == parser->m_namespaceSeparator)) {
+ return XML_ERROR_SYNTAX;
+ }
}
isXML = isXML && len == xmlLen;
isXMLNS = isXMLNS && len == xmlnsLen;
@@ -7317,44 +7331,15 @@ nextScaffoldPart(XML_Parser parser) {
return next;
}
-static void
-build_node(XML_Parser parser, int src_node, XML_Content *dest,
- XML_Content **contpos, XML_Char **strpos) {
- DTD *const dtd = parser->m_dtd; /* save one level of indirection */
- dest->type = dtd->scaffold[src_node].type;
- dest->quant = dtd->scaffold[src_node].quant;
- if (dest->type == XML_CTYPE_NAME) {
- const XML_Char *src;
- dest->name = *strpos;
- src = dtd->scaffold[src_node].name;
- for (;;) {
- *(*strpos)++ = *src;
- if (! *src)
- break;
- src++;
- }
- dest->numchildren = 0;
- dest->children = NULL;
- } else {
- unsigned int i;
- int cn;
- dest->numchildren = dtd->scaffold[src_node].childcnt;
- dest->children = *contpos;
- *contpos += dest->numchildren;
- for (i = 0, cn = dtd->scaffold[src_node].firstchild; i < dest->numchildren;
- i++, cn = dtd->scaffold[cn].nextsib) {
- build_node(parser, cn, &(dest->children[i]), contpos, strpos);
- }
- dest->name = NULL;
- }
-}
-
static XML_Content *
build_model(XML_Parser parser) {
+ /* Function build_model transforms the existing parser->m_dtd->scaffold
+ * array of CONTENT_SCAFFOLD tree nodes into a new array of
+ * XML_Content tree nodes followed by a gapless list of zero-terminated
+ * strings. */
DTD *const dtd = parser->m_dtd; /* save one level of indirection */
XML_Content *ret;
- XML_Content *cpos;
- XML_Char *str;
+ XML_Char *str; /* the current string writing location */
/* Detect and prevent integer overflow.
* The preprocessor guard addresses the "always false" warning
@@ -7380,10 +7365,96 @@ build_model(XML_Parser parser) {
if (! ret)
return NULL;
- str = (XML_Char *)(&ret[dtd->scaffCount]);
- cpos = &ret[1];
+ /* What follows is an iterative implementation (of what was previously done
+ * recursively in a dedicated function called "build_node". The old recursive
+ * build_node could be forced into stack exhaustion from input as small as a
+ * few megabyte, and so that was a security issue. Hence, a function call
+ * stack is avoided now by resolving recursion.)
+ *
+ * The iterative approach works as follows:
+ *
+ * - We have two writing pointers, both walking up the result array; one does
+ * the work, the other creates "jobs" for its colleague to do, and leads
+ * the way:
+ *
+ * - The faster one, pointer jobDest, always leads and writes "what job
+ * to do" by the other, once they reach that place in the
+ * array: leader "jobDest" stores the source node array index (relative
+ * to array dtd->scaffold) in field "numchildren".
+ *
+ * - The slower one, pointer dest, looks at the value stored in the
+ * "numchildren" field (which actually holds a source node array index
+ * at that time) and puts the real data from dtd->scaffold in.
+ *
+ * - Before the loop starts, jobDest writes source array index 0
+ * (where the root node is located) so that dest will have something to do
+ * when it starts operation.
+ *
+ * - Whenever nodes with children are encountered, jobDest appends
+ * them as new jobs, in order. As a result, tree node siblings are
+ * adjacent in the resulting array, for example:
+ *
+ * [0] root, has two children
+ * [1] first child of 0, has three children
+ * [3] first child of 1, does not have children
+ * [4] second child of 1, does not have children
+ * [5] third child of 1, does not have children
+ * [2] second child of 0, does not have children
+ *
+ * Or (the same data) presented in flat array view:
+ *
+ * [0] root, has two children
+ *
+ * [1] first child of 0, has three children
+ * [2] second child of 0, does not have children
+ *
+ * [3] first child of 1, does not have children
+ * [4] second child of 1, does not have children
+ * [5] third child of 1, does not have children
+ *
+ * - The algorithm repeats until all target array indices have been processed.
+ */
+ XML_Content *dest = ret; /* tree node writing location, moves upwards */
+ XML_Content *const destLimit = &ret[dtd->scaffCount];
+ XML_Content *jobDest = ret; /* next free writing location in target array */
+ str = (XML_Char *)&ret[dtd->scaffCount];
+
+ /* Add the starting job, the root node (index 0) of the source tree */
+ (jobDest++)->numchildren = 0;
+
+ for (; dest < destLimit; dest++) {
+ /* Retrieve source tree array index from job storage */
+ const int src_node = (int)dest->numchildren;
+
+ /* Convert item */
+ dest->type = dtd->scaffold[src_node].type;
+ dest->quant = dtd->scaffold[src_node].quant;
+ if (dest->type == XML_CTYPE_NAME) {
+ const XML_Char *src;
+ dest->name = str;
+ src = dtd->scaffold[src_node].name;
+ for (;;) {
+ *str++ = *src;
+ if (! *src)
+ break;
+ src++;
+ }
+ dest->numchildren = 0;
+ dest->children = NULL;
+ } else {
+ unsigned int i;
+ int cn;
+ dest->name = NULL;
+ dest->numchildren = dtd->scaffold[src_node].childcnt;
+ dest->children = jobDest;
+
+ /* Append scaffold indices of children to array */
+ for (i = 0, cn = dtd->scaffold[src_node].firstchild;
+ i < dest->numchildren; i++, cn = dtd->scaffold[cn].nextsib)
+ (jobDest++)->numchildren = (unsigned int)cn;
+ }
+ }
- build_node(parser, 0, ret, &cpos, &str);
return ret;
}
@@ -7412,7 +7483,7 @@ getElementType(XML_Parser parser, const ENCODING *enc, const char *ptr,
static XML_Char *
copyString(const XML_Char *s, const XML_Memory_Handling_Suite *memsuite) {
- int charsRequired = 0;
+ size_t charsRequired = 0;
XML_Char *result;
/* First determine how long the string is */
diff --git a/Modules/expat/xmlrole.c b/Modules/expat/xmlrole.c
index 77746ee42d10a..3f0f5c150c627 100644
--- a/Modules/expat/xmlrole.c
+++ b/Modules/expat/xmlrole.c
@@ -11,7 +11,7 @@
Copyright (c) 2002 Greg Stein <gstein(a)users.sourceforge.net>
Copyright (c) 2002-2006 Karl Waclawek <karl(a)waclawek.net>
Copyright (c) 2002-2003 Fred L. Drake, Jr. <fdrake(a)users.sourceforge.net>
- Copyright (c) 2005-2009 Steven Solie <ssolie(a)users.sourceforge.net>
+ Copyright (c) 2005-2009 Steven Solie <steven(a)solie.ca>
Copyright (c) 2016-2021 Sebastian Pipping <sebastian(a)pipping.org>
Copyright (c) 2017 Rhodri James <rhodri(a)wildebeest.org.uk>
Copyright (c) 2019 David Loffredo <loffredo(a)steptools.com>
diff --git a/Modules/expat/xmltok.c b/Modules/expat/xmltok.c
index 502ca1adc33b9..c659983b4008b 100644
--- a/Modules/expat/xmltok.c
+++ b/Modules/expat/xmltok.c
@@ -11,8 +11,8 @@
Copyright (c) 2001-2003 Fred L. Drake, Jr. <fdrake(a)users.sourceforge.net>
Copyright (c) 2002 Greg Stein <gstein(a)users.sourceforge.net>
Copyright (c) 2002-2016 Karl Waclawek <karl(a)waclawek.net>
- Copyright (c) 2005-2009 Steven Solie <ssolie(a)users.sourceforge.net>
- Copyright (c) 2016-2021 Sebastian Pipping <sebastian(a)pipping.org>
+ Copyright (c) 2005-2009 Steven Solie <steven(a)solie.ca>
+ Copyright (c) 2016-2022 Sebastian Pipping <sebastian(a)pipping.org>
Copyright (c) 2016 Pascal Cuoq <cuoq(a)trust-in-soft.com>
Copyright (c) 2016 Don Lewis <truckman(a)apache.org>
Copyright (c) 2017 Rhodri James <rhodri(a)wildebeest.org.uk>
@@ -98,11 +98,6 @@
+ ((((byte)[1]) & 3) << 1) + ((((byte)[2]) >> 5) & 1)] \
& (1u << (((byte)[2]) & 0x1F)))
-#define UTF8_GET_NAMING(pages, p, n) \
- ((n) == 2 \
- ? UTF8_GET_NAMING2(pages, (const unsigned char *)(p)) \
- : ((n) == 3 ? UTF8_GET_NAMING3(pages, (const unsigned char *)(p)) : 0))
-
/* Detection of invalid UTF-8 sequences is based on Table 3.1B
of Unicode 3.2: http://www.unicode.org/unicode/reports/tr28/
with the additional restriction of not allowing the Unicode
diff --git a/Modules/expat/xmltok_impl.c b/Modules/expat/xmltok_impl.c
index 0430591b42636..4072b06497d1c 100644
--- a/Modules/expat/xmltok_impl.c
+++ b/Modules/expat/xmltok_impl.c
@@ -10,7 +10,7 @@
Copyright (c) 2000 Clark Cooper <coopercc(a)users.sourceforge.net>
Copyright (c) 2002 Fred L. Drake, Jr. <fdrake(a)users.sourceforge.net>
Copyright (c) 2002-2016 Karl Waclawek <karl(a)waclawek.net>
- Copyright (c) 2016-2021 Sebastian Pipping <sebastian(a)pipping.org>
+ Copyright (c) 2016-2022 Sebastian Pipping <sebastian(a)pipping.org>
Copyright (c) 2017 Rhodri James <rhodri(a)wildebeest.org.uk>
Copyright (c) 2018 Benjamin Peterson <benjamin(a)python.org>
Copyright (c) 2018 Anton Maklakov <antmak.pub(a)gmail.com>
@@ -69,7 +69,7 @@
case BT_LEAD##n: \
if (end - ptr < n) \
return XML_TOK_PARTIAL_CHAR; \
- if (! IS_NAME_CHAR(enc, ptr, n)) { \
+ if (IS_INVALID_CHAR(enc, ptr, n) || ! IS_NAME_CHAR(enc, ptr, n)) { \
*nextTokPtr = ptr; \
return XML_TOK_INVALID; \
} \
@@ -98,7 +98,7 @@
case BT_LEAD##n: \
if (end - ptr < n) \
return XML_TOK_PARTIAL_CHAR; \
- if (! IS_NMSTRT_CHAR(enc, ptr, n)) { \
+ if (IS_INVALID_CHAR(enc, ptr, n) || ! IS_NMSTRT_CHAR(enc, ptr, n)) { \
*nextTokPtr = ptr; \
return XML_TOK_INVALID; \
} \
@@ -1142,6 +1142,10 @@ PREFIX(prologTok)(const ENCODING *enc, const char *ptr, const char *end,
case BT_LEAD##n: \
if (end - ptr < n) \
return XML_TOK_PARTIAL_CHAR; \
+ if (IS_INVALID_CHAR(enc, ptr, n)) { \
+ *nextTokPtr = ptr; \
+ return XML_TOK_INVALID; \
+ } \
if (IS_NMSTRT_CHAR(enc, ptr, n)) { \
ptr += n; \
tok = XML_TOK_NAME; \
@@ -1270,7 +1274,7 @@ PREFIX(attributeValueTok)(const ENCODING *enc, const char *ptr, const char *end,
switch (BYTE_TYPE(enc, ptr)) {
# define LEAD_CASE(n) \
case BT_LEAD##n: \
- ptr += n; \
+ ptr += n; /* NOTE: The encoding has already been validated. */ \
break;
LEAD_CASE(2)
LEAD_CASE(3)
@@ -1339,7 +1343,7 @@ PREFIX(entityValueTok)(const ENCODING *enc, const char *ptr, const char *end,
switch (BYTE_TYPE(enc, ptr)) {
# define LEAD_CASE(n) \
case BT_LEAD##n: \
- ptr += n; \
+ ptr += n; /* NOTE: The encoding has already been validated. */ \
break;
LEAD_CASE(2)
LEAD_CASE(3)
@@ -1518,7 +1522,7 @@ PREFIX(getAtts)(const ENCODING *enc, const char *ptr, int attsMax,
state = inName; \
}
# define LEAD_CASE(n) \
- case BT_LEAD##n: \
+ case BT_LEAD##n: /* NOTE: The encoding has already been validated. */ \
START_NAME ptr += (n - MINBPC(enc)); \
break;
LEAD_CASE(2)
@@ -1730,7 +1734,7 @@ PREFIX(nameLength)(const ENCODING *enc, const char *ptr) {
switch (BYTE_TYPE(enc, ptr)) {
# define LEAD_CASE(n) \
case BT_LEAD##n: \
- ptr += n; \
+ ptr += n; /* NOTE: The encoding has already been validated. */ \
break;
LEAD_CASE(2)
LEAD_CASE(3)
@@ -1775,7 +1779,7 @@ PREFIX(updatePosition)(const ENCODING *enc, const char *ptr, const char *end,
switch (BYTE_TYPE(enc, ptr)) {
# define LEAD_CASE(n) \
case BT_LEAD##n: \
- ptr += n; \
+ ptr += n; /* NOTE: The encoding has already been validated. */ \
pos->columnNumber++; \
break;
LEAD_CASE(2)
March 2, 2022
https://github.com/python/cpython/commit/20a1c8ee4bcb1c421b7cca1f3f5d6ad7ce…
commit: 20a1c8ee4bcb1c421b7cca1f3f5d6ad7ce30a9c9
branch: main
author: Nikita Sobolev <mail(a)sobolevn.me>
committer: JelleZijlstra <jelle.zijlstra(a)gmail.com>
date: 2022-03-01T21:29:46-08:00
summary:
bpo-46195: Do not add `Optional` in `get_type_hints` (GH-30304)
Co-authored-by: Ken Jin <28750310+Fidget-Spinner(a)users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra(a)gmail.com>
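The behavior change can be demonstrated directly: before this commit, a parameter annotated `int` with a default of `None` came back from `get_type_hints` as `Optional[int]`; after it, the annotation is returned unchanged. A small sketch (the observed result depends on the Python version you run it under):

```python
from typing import Optional, get_type_hints

def f(x: int = None):  # a None default no longer implies Optional[int]
    ...

hints = get_type_hints(f)
# Python 3.11+ (with this change): {'x': int}
# earlier versions:                {'x': Optional[int]}
```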
files:
A Misc/NEWS.d/next/Library/2021-12-30-21-38-51.bpo-46195.jFKGq_.rst
M Doc/library/typing.rst
M Lib/test/test_typing.py
M Lib/typing.py
diff --git a/Doc/library/typing.rst b/Doc/library/typing.rst
index a02ad244d9fbb..bfcbeb8c7e680 100644
--- a/Doc/library/typing.rst
+++ b/Doc/library/typing.rst
@@ -2185,9 +2185,7 @@ Introspection helpers
This is often the same as ``obj.__annotations__``. In addition,
forward references encoded as string literals are handled by evaluating
- them in ``globals`` and ``locals`` namespaces. If necessary,
- ``Optional[t]`` is added for function and method annotations if a default
- value equal to ``None`` is set. For a class ``C``, return
+ them in ``globals`` and ``locals`` namespaces. For a class ``C``, return
a dictionary constructed by merging all the ``__annotations__`` along
``C.__mro__`` in reverse order.
@@ -2214,6 +2212,11 @@ Introspection helpers
.. versionchanged:: 3.9
Added ``include_extras`` parameter as part of :pep:`593`.
+ .. versionchanged:: 3.11
+ Previously, ``Optional[t]`` was added for function and method annotations
+ if a default value equal to ``None`` was set.
+ Now the annotation is returned unchanged.
+
.. function:: get_args(tp)
.. function:: get_origin(tp)
diff --git a/Lib/test/test_typing.py b/Lib/test/test_typing.py
index dc1514d63b777..8fcc24c25eb95 100644
--- a/Lib/test/test_typing.py
+++ b/Lib/test/test_typing.py
@@ -2828,16 +2828,15 @@ def add_right(self, node: 'Node[T]' = None):
t = Node[int]
both_hints = get_type_hints(t.add_both, globals(), locals())
self.assertEqual(both_hints['left'], Optional[Node[T]])
- self.assertEqual(both_hints['right'], Optional[Node[T]])
- self.assertEqual(both_hints['left'], both_hints['right'])
- self.assertEqual(both_hints['stuff'], Optional[int])
+ self.assertEqual(both_hints['right'], Node[T])
+ self.assertEqual(both_hints['stuff'], int)
self.assertNotIn('blah', both_hints)
left_hints = get_type_hints(t.add_left, globals(), locals())
self.assertEqual(left_hints['node'], Optional[Node[T]])
right_hints = get_type_hints(t.add_right, globals(), locals())
- self.assertEqual(right_hints['node'], Optional[Node[T]])
+ self.assertEqual(right_hints['node'], Node[T])
def test_forwardref_instance_type_error(self):
fr = typing.ForwardRef('int')
@@ -3630,6 +3629,18 @@ def __iand__(self, other: Const["MySet[T]"]) -> "MySet[T]":
{'other': MySet[T], 'return': MySet[T]}
)
+ def test_get_type_hints_annotated_with_none_default(self):
+ # See: https://bugs.python.org/issue46195
+ def annotated_with_none_default(x: Annotated[int, 'data'] = None): ...
+ self.assertEqual(
+ get_type_hints(annotated_with_none_default),
+ {'x': int},
+ )
+ self.assertEqual(
+ get_type_hints(annotated_with_none_default, include_extras=True),
+ {'x': Annotated[int, 'data']},
+ )
+
def test_get_type_hints_classes_str_annotations(self):
class Foo:
y = str
diff --git a/Lib/typing.py b/Lib/typing.py
index ad1435ed23d27..9d668b3cf4a2a 100644
--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -1879,26 +1879,6 @@ def cast(typ, val):
return val
-def _get_defaults(func):
- """Internal helper to extract the default arguments, by name."""
- try:
- code = func.__code__
- except AttributeError:
- # Some built-in functions don't have __code__, __defaults__, etc.
- return {}
- pos_count = code.co_argcount
- arg_names = code.co_varnames
- arg_names = arg_names[:pos_count]
- defaults = func.__defaults__ or ()
- kwdefaults = func.__kwdefaults__
- res = dict(kwdefaults) if kwdefaults else {}
- pos_offset = pos_count - len(defaults)
- for name, value in zip(arg_names[pos_offset:], defaults):
- assert name not in res
- res[name] = value
- return res
-
-
_allowed_types = (types.FunctionType, types.BuiltinFunctionType,
types.MethodType, types.ModuleType,
WrapperDescriptorType, MethodWrapperType, MethodDescriptorType)
@@ -1908,8 +1888,7 @@ def get_type_hints(obj, globalns=None, localns=None, include_extras=False):
"""Return type hints for an object.
This is often the same as obj.__annotations__, but it handles
- forward references encoded as string literals, adds Optional[t] if a
- default value equal to None is set and recursively replaces all
+ forward references encoded as string literals and recursively replaces all
'Annotated[T, ...]' with 'T' (unless 'include_extras=True').
The argument may be a module, class, method, or function. The annotations
@@ -1989,7 +1968,6 @@ def get_type_hints(obj, globalns=None, localns=None, include_extras=False):
else:
raise TypeError('{!r} is not a module, class, method, '
'or function.'.format(obj))
- defaults = _get_defaults(obj)
hints = dict(hints)
for name, value in hints.items():
if value is None:
@@ -2002,10 +1980,7 @@ def get_type_hints(obj, globalns=None, localns=None, include_extras=False):
is_argument=not isinstance(obj, types.ModuleType),
is_class=False,
)
- value = _eval_type(value, globalns, localns)
- if name in defaults and defaults[name] is None:
- value = Optional[value]
- hints[name] = value
+ hints[name] = _eval_type(value, globalns, localns)
return hints if include_extras else {k: _strip_annotations(t) for k, t in hints.items()}
diff --git a/Misc/NEWS.d/next/Library/2021-12-30-21-38-51.bpo-46195.jFKGq_.rst b/Misc/NEWS.d/next/Library/2021-12-30-21-38-51.bpo-46195.jFKGq_.rst
new file mode 100644
index 0000000000000..03ea46c3a83f1
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2021-12-30-21-38-51.bpo-46195.jFKGq_.rst
@@ -0,0 +1,3 @@
+:func:`typing.get_type_hints` no longer adds ``Optional`` to parameters with
+``None`` as a default. This aligns to changes to PEP 484 in
+https://github.com/python/peps/pull/689
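The behavior change can be sketched with a minimal example (which result you see depends on the Python version; 3.11 is the first release with the new behavior):

```python
from typing import Optional, get_type_hints

def greet(name: str = None):  # default of None, annotation written without Optional
    ...

hints = get_type_hints(greet)
# Python 3.11+: annotation returned unchanged -> {'name': str}
# Earlier versions: implicitly wrapped       -> {'name': Optional[str]}
print(hints)
```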
https://github.com/python/cpython/commit/6ddb09f35b922a3bbb59e408a3ca7636a6…
commit: 6ddb09f35b922a3bbb59e408a3ca7636a6938468
branch: main
author: Dennis Sweeney <36520290+sweeneyde(a)users.noreply.github.com>
committer: sweeneyde <36520290+sweeneyde(a)users.noreply.github.com>
date: 2022-03-01T23:46:30-05:00
summary:
bpo-46848: Use stringlib/fastsearch in mmap (GH-31625)
Speed up mmap.find(). Add _PyBytes_Find() and _PyBytes_ReverseFind().
files:
A Misc/NEWS.d/next/Library/2022-03-01-01-16-13.bpo-46848.BB01Fr.rst
M Include/cpython/bytesobject.h
M Modules/mmapmodule.c
M Objects/bytesobject.c
diff --git a/Include/cpython/bytesobject.h b/Include/cpython/bytesobject.h
index 6b3f55224fc55..38a0fe0af660f 100644
--- a/Include/cpython/bytesobject.h
+++ b/Include/cpython/bytesobject.h
@@ -116,3 +116,22 @@ PyAPI_FUNC(void*) _PyBytesWriter_WriteBytes(_PyBytesWriter *writer,
void *str,
const void *bytes,
Py_ssize_t size);
+
+/* Substring Search.
+
+ Returns the index of the first occurence of
+ a substring ("needle") in a larger text ("haystack").
+ If the needle is not found, return -1.
+ If the needle is found, add offset to the index.
+*/
+
+PyAPI_FUNC(Py_ssize_t)
+_PyBytes_Find(const char *haystack, Py_ssize_t len_haystack,
+ const char *needle, Py_ssize_t len_needle,
+ Py_ssize_t offset);
+
+/* Same as above, but search right-to-left */
+PyAPI_FUNC(Py_ssize_t)
+_PyBytes_ReverseFind(const char *haystack, Py_ssize_t len_haystack,
+ const char *needle, Py_ssize_t len_needle,
+ Py_ssize_t offset);
diff --git a/Misc/NEWS.d/next/Library/2022-03-01-01-16-13.bpo-46848.BB01Fr.rst b/Misc/NEWS.d/next/Library/2022-03-01-01-16-13.bpo-46848.BB01Fr.rst
new file mode 100644
index 0000000000000..bd20a843ab6ce
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2022-03-01-01-16-13.bpo-46848.BB01Fr.rst
@@ -0,0 +1,3 @@
+For performance, use the optimized string-searching implementations
+from :meth:`~bytes.find` and :meth:`~bytes.rfind`
+for :meth:`~mmap.find` and :meth:`~mmap.rfind`.
diff --git a/Modules/mmapmodule.c b/Modules/mmapmodule.c
index 26cedf1b9006d..6a038e72f93cf 100644
--- a/Modules/mmapmodule.c
+++ b/Modules/mmapmodule.c
@@ -315,12 +315,8 @@ mmap_gfind(mmap_object *self,
if (!PyArg_ParseTuple(args, reverse ? "y*|nn:rfind" : "y*|nn:find",
&view, &start, &end)) {
return NULL;
- } else {
- const char *p, *start_p, *end_p;
- int sign = reverse ? -1 : 1;
- const char *needle = view.buf;
- Py_ssize_t len = view.len;
-
+ }
+ else {
if (start < 0)
start += self->size;
if (start < 0)
@@ -335,21 +331,19 @@ mmap_gfind(mmap_object *self,
else if (end > self->size)
end = self->size;
- start_p = self->data + start;
- end_p = self->data + end;
-
- for (p = (reverse ? end_p - len : start_p);
- (p >= start_p) && (p + len <= end_p); p += sign) {
- Py_ssize_t i;
- for (i = 0; i < len && needle[i] == p[i]; ++i)
- /* nothing */;
- if (i == len) {
- PyBuffer_Release(&view);
- return PyLong_FromSsize_t(p - self->data);
- }
+ Py_ssize_t res;
+ if (reverse) {
+ res = _PyBytes_ReverseFind(
+ self->data + start, end - start,
+ view.buf, view.len, start);
+ }
+ else {
+ res = _PyBytes_Find(
+ self->data + start, end - start,
+ view.buf, view.len, start);
}
PyBuffer_Release(&view);
- return PyLong_FromLong(-1);
+ return PyLong_FromSsize_t(res);
}
}
diff --git a/Objects/bytesobject.c b/Objects/bytesobject.c
index 3d8a21696d1c8..4c67b8f7af213 100644
--- a/Objects/bytesobject.c
+++ b/Objects/bytesobject.c
@@ -1247,6 +1247,24 @@ PyBytes_AsStringAndSize(PyObject *obj,
#undef STRINGLIB_GET_EMPTY
+Py_ssize_t
+_PyBytes_Find(const char *haystack, Py_ssize_t len_haystack,
+ const char *needle, Py_ssize_t len_needle,
+ Py_ssize_t offset)
+{
+ return stringlib_find(haystack, len_haystack,
+ needle, len_needle, offset);
+}
+
+Py_ssize_t
+_PyBytes_ReverseFind(const char *haystack, Py_ssize_t len_haystack,
+ const char *needle, Py_ssize_t len_needle,
+ Py_ssize_t offset)
+{
+ return stringlib_rfind(haystack, len_haystack,
+ needle, len_needle, offset);
+}
+
PyObject *
PyBytes_Repr(PyObject *obj, int smartquotes)
{
[3.9] bpo-43853: Expand test suite for SQLite UDF's (GH-27642) (GH-31030) (GH-31586)
by miss-islington March 2, 2022
March 2, 2022
https://github.com/python/cpython/commit/3ea2a8f425d26e81d914c54d477e9d56eb…
commit: 3ea2a8f425d26e81d914c54d477e9d56eb27ac98
branch: 3.9
author: Erlend Egeberg Aasland <erlend.aasland(a)innova.no>
committer: miss-islington <31488909+miss-islington(a)users.noreply.github.com>
date: 2022-03-01T20:46:16-08:00
summary:
[3.9] bpo-43853: Expand test suite for SQLite UDF's (GH-27642) (GH-31030) (GH-31586)
(cherry picked from commit 3eb3b4f270757f66c7fb6dcf5afa416ee1582a4b)
files:
M Lib/sqlite3/test/userfunctions.py
M Modules/_sqlite/connection.c
M Modules/_sqlite/statement.c
diff --git a/Lib/sqlite3/test/userfunctions.py b/Lib/sqlite3/test/userfunctions.py
index 1bceefe8e69d9..8fc8b4c3abb5a 100644
--- a/Lib/sqlite3/test/userfunctions.py
+++ b/Lib/sqlite3/test/userfunctions.py
@@ -44,22 +44,6 @@ def func_returnlonglong():
def func_raiseexception():
5/0
-def func_isstring(v):
- return type(v) is str
-def func_isint(v):
- return type(v) is int
-def func_isfloat(v):
- return type(v) is float
-def func_isnone(v):
- return type(v) is type(None)
-def func_isblob(v):
- return isinstance(v, (bytes, memoryview))
-def func_islonglong(v):
- return isinstance(v, int) and v >= 1<<31
-
-def func(*args):
- return len(args)
-
class AggrNoStep:
def __init__(self):
pass
@@ -160,15 +144,13 @@ def setUp(self):
self.con.create_function("returnnull", 0, func_returnnull)
self.con.create_function("returnblob", 0, func_returnblob)
self.con.create_function("returnlonglong", 0, func_returnlonglong)
+ self.con.create_function("returnnan", 0, lambda: float("nan"))
+ self.con.create_function("returntoolargeint", 0, lambda: 1 << 65)
self.con.create_function("raiseexception", 0, func_raiseexception)
- self.con.create_function("isstring", 1, func_isstring)
- self.con.create_function("isint", 1, func_isint)
- self.con.create_function("isfloat", 1, func_isfloat)
- self.con.create_function("isnone", 1, func_isnone)
- self.con.create_function("isblob", 1, func_isblob)
- self.con.create_function("islonglong", 1, func_islonglong)
- self.con.create_function("spam", -1, func)
+ self.con.create_function("isblob", 1, lambda x: isinstance(x, bytes))
+ self.con.create_function("isnone", 1, lambda x: x is None)
+ self.con.create_function("spam", -1, lambda *x: len(x))
self.con.execute("create table test(t text)")
def tearDown(self):
@@ -245,6 +227,16 @@ def CheckFuncReturnLongLong(self):
val = cur.fetchone()[0]
self.assertEqual(val, 1<<31)
+ def CheckFuncReturnNaN(self):
+ cur = self.con.cursor()
+ cur.execute("select returnnan()")
+ self.assertIsNone(cur.fetchone()[0])
+
+ def CheckFuncReturnTooLargeInt(self):
+ cur = self.con.cursor()
+ with self.assertRaises(sqlite.OperationalError):
+ self.con.execute("select returntoolargeint()")
+
def CheckFuncException(self):
cur = self.con.cursor()
with self.assertRaises(sqlite.OperationalError) as cm:
@@ -252,50 +244,62 @@ def CheckFuncException(self):
cur.fetchone()
self.assertEqual(str(cm.exception), 'user-defined function raised exception')
- def CheckParamString(self):
- cur = self.con.cursor()
- for text in ["foo", str()]:
- with self.subTest(text=text):
- cur.execute("select isstring(?)", (text,))
- val = cur.fetchone()[0]
- self.assertEqual(val, 1)
-
- def CheckParamInt(self):
- cur = self.con.cursor()
- cur.execute("select isint(?)", (42,))
- val = cur.fetchone()[0]
- self.assertEqual(val, 1)
-
- def CheckParamFloat(self):
- cur = self.con.cursor()
- cur.execute("select isfloat(?)", (3.14,))
- val = cur.fetchone()[0]
- self.assertEqual(val, 1)
-
- def CheckParamNone(self):
- cur = self.con.cursor()
- cur.execute("select isnone(?)", (None,))
- val = cur.fetchone()[0]
- self.assertEqual(val, 1)
-
- def CheckParamBlob(self):
- cur = self.con.cursor()
- cur.execute("select isblob(?)", (memoryview(b"blob"),))
- val = cur.fetchone()[0]
- self.assertEqual(val, 1)
-
- def CheckParamLongLong(self):
- cur = self.con.cursor()
- cur.execute("select islonglong(?)", (1<<42,))
- val = cur.fetchone()[0]
- self.assertEqual(val, 1)
-
def CheckAnyArguments(self):
cur = self.con.cursor()
cur.execute("select spam(?, ?)", (1, 2))
val = cur.fetchone()[0]
self.assertEqual(val, 2)
+ def CheckEmptyBlob(self):
+ cur = self.con.execute("select isblob(x'')")
+ self.assertTrue(cur.fetchone()[0])
+
+ def CheckNaNFloat(self):
+ cur = self.con.execute("select isnone(?)", (float("nan"),))
+ # SQLite has no concept of nan; it is converted to NULL
+ self.assertTrue(cur.fetchone()[0])
+
+ def CheckTooLargeInt(self):
+ err = "Python int too large to convert to SQLite INTEGER"
+ self.assertRaisesRegex(OverflowError, err, self.con.execute,
+ "select spam(?)", (1 << 65,))
+
+ def CheckNonContiguousBlob(self):
+ self.assertRaisesRegex(ValueError, "could not convert BLOB to buffer",
+ self.con.execute, "select spam(?)",
+ (memoryview(b"blob")[::2],))
+
+ def CheckParamSurrogates(self):
+ self.assertRaisesRegex(UnicodeEncodeError, "surrogates not allowed",
+ self.con.execute, "select spam(?)",
+ ("\ud803\ude6d",))
+
+ def CheckFuncParams(self):
+ results = []
+ def append_result(arg):
+ results.append((arg, type(arg)))
+ self.con.create_function("test_params", 1, append_result)
+
+ dataset = [
+ (42, int),
+ (-1, int),
+ (1234567890123456789, int),
+ (4611686018427387905, int), # 63-bit int with non-zero low bits
+ (3.14, float),
+ (float('inf'), float),
+ ("text", str),
+ ("1\x002", str),
+ ("\u02e2q\u02e1\u2071\u1d57\u1d49", str),
+ (b"blob", bytes),
+ (bytearray(range(2)), bytes),
+ (memoryview(b"blob"), bytes),
+ (None, type(None)),
+ ]
+ for val, _ in dataset:
+ cur = self.con.execute("select test_params(?)", (val,))
+ cur.fetchone()
+ self.assertEqual(dataset, results)
+
# Regarding deterministic functions:
#
# Between 3.8.3 and 3.15.0, deterministic functions were only used to
diff --git a/Modules/_sqlite/connection.c b/Modules/_sqlite/connection.c
index 30e333a4b86d8..90327376cc0d4 100644
--- a/Modules/_sqlite/connection.c
+++ b/Modules/_sqlite/connection.c
@@ -518,7 +518,11 @@ _pysqlite_set_result(sqlite3_context* context, PyObject* py_val)
return -1;
sqlite3_result_int64(context, value);
} else if (PyFloat_Check(py_val)) {
- sqlite3_result_double(context, PyFloat_AsDouble(py_val));
+ double value = PyFloat_AsDouble(py_val);
+ if (value == -1 && PyErr_Occurred()) {
+ return -1;
+ }
+ sqlite3_result_double(context, value);
} else if (PyUnicode_Check(py_val)) {
Py_ssize_t sz;
const char *str = PyUnicode_AsUTF8AndSize(py_val, &sz);
diff --git a/Modules/_sqlite/statement.c b/Modules/_sqlite/statement.c
index 23c204e7521f0..0272ce11207f1 100644
--- a/Modules/_sqlite/statement.c
+++ b/Modules/_sqlite/statement.c
@@ -152,9 +152,16 @@ int pysqlite_statement_bind_parameter(pysqlite_Statement* self, int pos, PyObjec
rc = sqlite3_bind_int64(self->st, pos, value);
break;
}
- case TYPE_FLOAT:
- rc = sqlite3_bind_double(self->st, pos, PyFloat_AsDouble(parameter));
+ case TYPE_FLOAT: {
+ double value = PyFloat_AsDouble(parameter);
+ if (value == -1 && PyErr_Occurred()) {
+ rc = -1;
+ }
+ else {
+ rc = sqlite3_bind_double(self->st, pos, value);
+ }
break;
+ }
case TYPE_UNICODE:
string = PyUnicode_AsUTF8AndSize(parameter, &buflen);
if (string == NULL)
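The error paths exercised by the new tests can be observed directly from Python (a sketch: NaN has no SQLite representation and surfaces as NULL, and an int outside the 64-bit INTEGER range is rejected at bind time):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.create_function("returnnan", 0, lambda: float("nan"))

# SQLite has no concept of NaN; the UDF's return value becomes NULL (None).
row = con.execute("select returnnan()").fetchone()

# Binding a Python int that does not fit SQLite's 64-bit INTEGER fails cleanly.
overflowed = False
try:
    con.execute("select ?", (1 << 65,))
except OverflowError:
    overflowed = True
print(row[0], overflowed)
```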
https://github.com/python/cpython/commit/7190617b562eae44e961dd6cc8c3c139b4…
commit: 7190617b562eae44e961dd6cc8c3c139b48af303
branch: 3.9
author: Jelle Zijlstra <jelle.zijlstra(a)gmail.com>
committer: JelleZijlstra <jelle.zijlstra(a)gmail.com>
date: 2022-03-01T20:45:54-08:00
summary:
[3.9] Minor fixes to C API docs (GH-31501) (GH-31526)
* C API docs: move PyErr_SetImportErrorSubclass docs
It was in the section about warnings, but it makes more sense to
put it with PyErr_SetImportError.
* C API docs: document closeit argument to PyRun_AnyFileExFlags
It was already documented for PyRun_SimpleFileExFlags.
* textual fixes to unicode docs
* Move paragraph about tp_dealloc into tp_dealloc section
* __aiter__ returns an async iterator, not an awaitable.
(cherry picked from commit 43cf44ddcce6b225f959ea2a53e4817244ca6054)
files:
M Doc/c-api/exceptions.rst
M Doc/c-api/typeobj.rst
M Doc/c-api/unicode.rst
M Doc/c-api/veryhigh.rst
diff --git a/Doc/c-api/exceptions.rst b/Doc/c-api/exceptions.rst
index 614eb24525541..df38ba43ec431 100644
--- a/Doc/c-api/exceptions.rst
+++ b/Doc/c-api/exceptions.rst
@@ -253,6 +253,14 @@ For convenience, some of these functions will always return a
.. versionadded:: 3.3
+.. c:function:: PyObject* PyErr_SetImportErrorSubclass(PyObject *exception, PyObject *msg, PyObject *name, PyObject *path)
+
+ Much like :c:func:`PyErr_SetImportError` but this function allows for
+ specifying a subclass of :exc:`ImportError` to raise.
+
+ .. versionadded:: 3.6
+
+
.. c:function:: void PyErr_SyntaxLocationObject(PyObject *filename, int lineno, int col_offset)
Set file, line, and offset information for the current exception. If the
@@ -320,13 +328,6 @@ an error value).
:mod:`warnings` module and the :option:`-W` option in the command line
documentation. There is no C API for warning control.
-.. c:function:: PyObject* PyErr_SetImportErrorSubclass(PyObject *exception, PyObject *msg, PyObject *name, PyObject *path)
-
- Much like :c:func:`PyErr_SetImportError` but this function allows for
- specifying a subclass of :exc:`ImportError` to raise.
-
- .. versionadded:: 3.6
-
.. c:function:: int PyErr_WarnExplicitObject(PyObject *category, PyObject *message, PyObject *filename, int lineno, PyObject *module, PyObject *registry)
diff --git a/Doc/c-api/typeobj.rst b/Doc/c-api/typeobj.rst
index a800616730b9d..d58a53b1d69fb 100644
--- a/Doc/c-api/typeobj.rst
+++ b/Doc/c-api/typeobj.rst
@@ -474,7 +474,7 @@ PyObject Slots
--------------
The type object structure extends the :c:type:`PyVarObject` structure. The
-:attr:`ob_size` field is used for dynamic types (created by :func:`type_new`,
+:attr:`ob_size` field is used for dynamic types (created by :func:`type_new`,
usually called from a class statement). Note that :c:data:`PyType_Type` (the
metatype) initializes :c:member:`~PyTypeObject.tp_itemsize`, which means that its instances (i.e.
type objects) *must* have the :attr:`ob_size` field.
@@ -1909,6 +1909,17 @@ and :c:type:`PyType_Type` effectively act as defaults.)
For this field to be taken into account (even through inheritance),
you must also set the :const:`Py_TPFLAGS_HAVE_FINALIZE` flags bit.
+ Also, note that, in a garbage collected Python,
+ :c:member:`~PyTypeObject.tp_dealloc` may be called from
+ any Python thread, not just the thread which created the object (if the object
+ becomes part of a refcount cycle, that cycle might be collected by a garbage
+ collection on any thread). This is not a problem for Python API calls, since
+ the thread on which tp_dealloc is called will own the Global Interpreter Lock
+ (GIL). However, if the object being destroyed in turn destroys objects from some
+ other C or C++ library, care should be taken to ensure that destroying those
+ objects on the thread which called tp_dealloc will not violate any assumptions
+ of the library.
+
**Inheritance:**
This field is inherited by subtypes.
@@ -1933,17 +1944,6 @@ and :c:type:`PyType_Type` effectively act as defaults.)
.. versionadded:: 3.9 (the field exists since 3.8 but it's only used since 3.9)
-Also, note that, in a garbage collected Python, :c:member:`~PyTypeObject.tp_dealloc` may be called from
-any Python thread, not just the thread which created the object (if the object
-becomes part of a refcount cycle, that cycle might be collected by a garbage
-collection on any thread). This is not a problem for Python API calls, since
-the thread on which tp_dealloc is called will own the Global Interpreter Lock
-(GIL). However, if the object being destroyed in turn destroys objects from some
-other C or C++ library, care should be taken to ensure that destroying those
-objects on the thread which called tp_dealloc will not violate any assumptions
-of the library.
-
-
.. _heap-types:
Heap Types
@@ -2340,7 +2340,8 @@ Async Object Structures
PyObject *am_aiter(PyObject *self);
- Must return an :term:`awaitable` object. See :meth:`__anext__` for details.
+ Must return an :term:`asynchronous iterator` object.
+ See :meth:`__anext__` for details.
This slot may be set to ``NULL`` if an object does not implement
asynchronous iteration protocol.
diff --git a/Doc/c-api/unicode.rst b/Doc/c-api/unicode.rst
index 0d13d949e38c7..a6ae8ba13657d 100644
--- a/Doc/c-api/unicode.rst
+++ b/Doc/c-api/unicode.rst
@@ -1025,7 +1025,7 @@ Error handling is set by errors which may also be set to ``NULL`` meaning to use
the default handling defined for the codec. Default error handling for all
built-in codecs is "strict" (:exc:`ValueError` is raised).
-The codecs all use a similar interface. Only deviation from the following
+The codecs all use a similar interface. Only deviations from the following
generic ones are documented for simplicity.
@@ -1239,7 +1239,7 @@ These are the UTF-16 codec APIs:
``1``, any byte order mark is copied to the output (where it will result in
either a ``\ufeff`` or a ``\ufffe`` character).
- After completion, *\*byteorder* is set to the current byte order at the end
+ After completion, ``*byteorder`` is set to the current byte order at the end
of input data.
If *byteorder* is ``NULL``, the codec starts in native order mode.
@@ -1457,7 +1457,7 @@ Character Map Codecs
This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
-included in the :mod:`encodings` package). The codec uses mapping to encode and
+included in the :mod:`encodings` package). The codec uses mappings to encode and
decode characters. The mapping objects provided must support the
:meth:`__getitem__` mapping interface; dictionaries and sequences work well.
@@ -1619,7 +1619,7 @@ They all return ``NULL`` or ``-1`` if an exception occurs.
.. c:function:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)
Split a Unicode string at line breaks, returning a list of Unicode strings.
- CRLF is considered to be one line break. If *keepend* is ``0``, the Line break
+ CRLF is considered to be one line break. If *keepend* is ``0``, the line break
characters are not included in the resulting strings.
diff --git a/Doc/c-api/veryhigh.rst b/Doc/c-api/veryhigh.rst
index 551846ea6d720..3595a7b995626 100644
--- a/Doc/c-api/veryhigh.rst
+++ b/Doc/c-api/veryhigh.rst
@@ -75,6 +75,8 @@ the same library that the Python runtime is using.
:c:func:`PyRun_SimpleFile`. *filename* is decoded from the filesystem
encoding (:func:`sys.getfilesystemencoding`). If *filename* is ``NULL``, this
function uses ``"???"`` as the filename.
+ If *closeit* is true, the file is closed before
+ ``PyRun_SimpleFileExFlags()`` returns.
.. c:function:: int PyRun_SimpleString(const char *command)
March 2, 2022
https://github.com/python/cpython/commit/9833bb91e4d5c2606421d9ec2085f5c2df…
commit: 9833bb91e4d5c2606421d9ec2085f5c2dfb6f72c
branch: main
author: Inada Naoki <songofacandy(a)gmail.com>
committer: methane <songofacandy(a)gmail.com>
date: 2022-03-02T08:09:28+09:00
summary:
bpo-46845: Reduce dict size when all keys are Unicode (GH-31564)
files:
A Misc/NEWS.d/next/Core and Builtins/2022-02-25-14-57-21.bpo-46845.TUvaMG.rst
M Doc/whatsnew/3.11.rst
M Include/internal/pycore_dict.h
M Lib/test/test_sys.py
M Objects/call.c
M Objects/dictnotes.txt
M Objects/dictobject.c
M Python/ceval.c
M Tools/gdb/libpython.py
diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 8ebe1cb8cc4ca..fbfe02ccfc2c0 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -404,6 +404,11 @@ Optimizations
larger *k*).
(Contributed by Serhiy Storchaka in :issue:`37295`.)
+* Dict don't store hash value when all inserted keys are Unicode objects.
+ This reduces dict size. For example, ``sys.getsizeof(dict.fromkeys("abcdefg"))``
+ becomes 272 bytes from 352 bytes on 64bit platform.
+ (Contributed by Inada Naoki in :issue:`46845`.)
+
CPython bytecode changes
========================
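The size reduction described in the What's New entry is easy to observe with `sys.getsizeof` (exact byte counts vary by version and platform; the point is that on 3.11+ an all-str-key dict needs no per-entry hash and so is no larger than an int-keyed dict of the same length):

```python
import sys

str_keys = dict.fromkeys("abcdefg")   # all keys are str: stored hash can be omitted
int_keys = dict.fromkeys(range(7))    # non-str keys: a hash is stored per entry

print(sys.getsizeof(str_keys), sys.getsizeof(int_keys))
# On a 64-bit build of Python 3.11+, the str-keyed dict is the smaller one.
```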
diff --git a/Include/internal/pycore_dict.h b/Include/internal/pycore_dict.h
index 68f6663dc6445..24d2a711878ce 100644
--- a/Include/internal/pycore_dict.h
+++ b/Include/internal/pycore_dict.h
@@ -43,6 +43,11 @@ typedef struct {
PyObject *me_value; /* This field is only meaningful for combined tables */
} PyDictKeyEntry;
+typedef struct {
+ PyObject *me_key; /* The key must be Unicode and have hash. */
+ PyObject *me_value; /* This field is only meaningful for combined tables */
+} PyDictUnicodeEntry;
+
extern PyDictKeysObject *_PyDict_NewKeysForClass(void);
extern PyObject *_PyDict_FromKeys(PyObject *, PyObject *, PyObject *);
@@ -70,6 +75,7 @@ extern PyObject *_PyDict_Pop_KnownHash(PyObject *, PyObject *, Py_hash_t, PyObje
#define DKIX_EMPTY (-1)
#define DKIX_DUMMY (-2) /* Used internally */
#define DKIX_ERROR (-3)
+#define DKIX_KEY_CHANGED (-4) /* Used internally */
typedef enum {
DICT_KEYS_GENERAL = 0,
@@ -114,7 +120,7 @@ struct _dictkeysobject {
Dynamically sized, SIZEOF_VOID_P is minimum. */
char dk_indices[]; /* char is required to avoid strict aliasing. */
- /* "PyDictKeyEntry dk_entries[dk_usable];" array follows:
+ /* "PyDictKeyEntry or PyDictUnicodeEntry dk_entries[USABLE_FRACTION(DK_SIZE(dk))];" array follows:
see the DK_ENTRIES() macro */
};
@@ -148,13 +154,20 @@ struct _dictvalues {
2 : sizeof(int32_t))
#endif
#define DK_ENTRIES(dk) \
- ((PyDictKeyEntry*)(&((int8_t*)((dk)->dk_indices))[(size_t)1 << (dk)->dk_log2_index_bytes]))
+ (assert(dk->dk_kind == DICT_KEYS_GENERAL), (PyDictKeyEntry*)(&((int8_t*)((dk)->dk_indices))[(size_t)1 << (dk)->dk_log2_index_bytes]))
+#define DK_UNICODE_ENTRIES(dk) \
+ (assert(dk->dk_kind != DICT_KEYS_GENERAL), (PyDictUnicodeEntry*)(&((int8_t*)((dk)->dk_indices))[(size_t)1 << (dk)->dk_log2_index_bytes]))
+#define DK_IS_UNICODE(dk) ((dk)->dk_kind != DICT_KEYS_GENERAL)
extern uint64_t _pydict_global_version;
#define DICT_NEXT_VERSION() (++_pydict_global_version)
extern PyObject *_PyObject_MakeDictFromInstanceAttributes(PyObject *obj, PyDictValues *values);
+extern PyObject *_PyDict_FromItems(
+ PyObject *const *keys, Py_ssize_t keys_offset,
+ PyObject *const *values, Py_ssize_t values_offset,
+ Py_ssize_t length);
static inline void
_PyDictValues_AddToInsertionOrder(PyDictValues *values, Py_ssize_t ix)
diff --git a/Lib/test/test_sys.py b/Lib/test/test_sys.py
index 70768f56fa9f1..f4deb1763b95f 100644
--- a/Lib/test/test_sys.py
+++ b/Lib/test/test_sys.py
@@ -1346,8 +1346,12 @@ def inner():
check({}.__iter__, size('2P'))
# empty dict
check({}, size('nQ2P'))
- # dict
- check({"a": 1}, size('nQ2P') + calcsize(DICT_KEY_STRUCT_FORMAT) + 8 + (8*2//3)*calcsize('n2P'))
+ # dict (string key)
+ check({"a": 1}, size('nQ2P') + calcsize(DICT_KEY_STRUCT_FORMAT) + 8 + (8*2//3)*calcsize('2P'))
+ longdict = {str(i): i for i in range(8)}
+ check(longdict, size('nQ2P') + calcsize(DICT_KEY_STRUCT_FORMAT) + 16 + (16*2//3)*calcsize('2P'))
+ # dict (non-string key)
+ check({1: 1}, size('nQ2P') + calcsize(DICT_KEY_STRUCT_FORMAT) + 8 + (8*2//3)*calcsize('n2P'))
longdict = {1:1, 2:2, 3:3, 4:4, 5:5, 6:6, 7:7, 8:8}
check(longdict, size('nQ2P') + calcsize(DICT_KEY_STRUCT_FORMAT) + 16 + (16*2//3)*calcsize('n2P'))
# dictionary-keyview
@@ -1506,14 +1510,14 @@ def delx(self): del self.__x
)
class newstyleclass(object): pass
# Separate block for PyDictKeysObject with 8 keys and 5 entries
- check(newstyleclass, s + calcsize(DICT_KEY_STRUCT_FORMAT) + 64 + 42*calcsize("n2P"))
+ check(newstyleclass, s + calcsize(DICT_KEY_STRUCT_FORMAT) + 64 + 42*calcsize("2P"))
# dict with shared keys
[newstyleclass() for _ in range(100)]
check(newstyleclass().__dict__, size('nQ2P') + self.P)
o = newstyleclass()
o.a = o.b = o.c = o.d = o.e = o.f = o.g = o.h = 1
# Separate block for PyDictKeysObject with 16 keys and 10 entries
- check(newstyleclass, s + calcsize(DICT_KEY_STRUCT_FORMAT) + 64 + 42*calcsize("n2P"))
+ check(newstyleclass, s + calcsize(DICT_KEY_STRUCT_FORMAT) + 64 + 42*calcsize("2P"))
# dict with shared keys
check(newstyleclass().__dict__, size('nQ2P') + self.P)
# unicode
diff --git a/Misc/NEWS.d/next/Core and Builtins/2022-02-25-14-57-21.bpo-46845.TUvaMG.rst b/Misc/NEWS.d/next/Core and Builtins/2022-02-25-14-57-21.bpo-46845.TUvaMG.rst
new file mode 100644
index 0000000000000..518a67c4dd527
--- /dev/null
+++ b/Misc/NEWS.d/next/Core and Builtins/2022-02-25-14-57-21.bpo-46845.TUvaMG.rst
@@ -0,0 +1,3 @@
+Reduces dict size by removing hash value from hash table when all inserted
+keys are Unicode. For example, ``sys.getsizeof(dict.fromkeys("abcdefg"))``
+becomes 272 bytes from 352 bytes on 64bit platform.
diff --git a/Objects/call.c b/Objects/call.c
index 9646ad2d77507..cf8fa1eeffe1c 100644
--- a/Objects/call.c
+++ b/Objects/call.c
@@ -934,26 +934,11 @@ PyObject *
_PyStack_AsDict(PyObject *const *values, PyObject *kwnames)
{
Py_ssize_t nkwargs;
- PyObject *kwdict;
- Py_ssize_t i;
assert(kwnames != NULL);
nkwargs = PyTuple_GET_SIZE(kwnames);
- kwdict = _PyDict_NewPresized(nkwargs);
- if (kwdict == NULL) {
- return NULL;
- }
-
- for (i = 0; i < nkwargs; i++) {
- PyObject *key = PyTuple_GET_ITEM(kwnames, i);
- PyObject *value = *values++;
- /* If key already exists, replace it with the new value */
- if (PyDict_SetItem(kwdict, key, value)) {
- Py_DECREF(kwdict);
- return NULL;
- }
- }
- return kwdict;
+ return _PyDict_FromItems(&PyTuple_GET_ITEM(kwnames, 0), 1,
+ values, 1, nkwargs);
}
diff --git a/Objects/dictnotes.txt b/Objects/dictnotes.txt
index f89720c9f604e..db6a3cf1d634b 100644
--- a/Objects/dictnotes.txt
+++ b/Objects/dictnotes.txt
@@ -70,8 +70,8 @@ A values array
Tunable Dictionary Parameters
-----------------------------
-See comments for PyDict_MINSIZE_SPLIT, PyDict_MINSIZE_COMBINED,
-USABLE_FRACTION and GROWTH_RATE in dictobject.c
+See comments for PyDict_MINSIZE, USABLE_FRACTION and GROWTH_RATE in
+dictobject.c
Tune-ups should be measured across a broad range of applications and
use cases. A change to any parameter will help in some situations and
diff --git a/Objects/dictobject.c b/Objects/dictobject.c
index 68b79f2515682..20d7edab93ab1 100644
--- a/Objects/dictobject.c
+++ b/Objects/dictobject.c
@@ -40,8 +40,8 @@ Size of indices is dk_size. Type of each index in indices is vary on dk_size:
* int32 for 2**16 <= dk_size <= 2**31
* int64 for 2**32 <= dk_size
-dk_entries is array of PyDictKeyEntry. Its size is USABLE_FRACTION(dk_size).
-DK_ENTRIES(dk) can be used to get pointer to entries.
+dk_entries is array of PyDictKeyEntry when dk_kind == DICT_KEYS_GENERAL or
+PyDictUnicodeEntry otherwise. Its length is USABLE_FRACTION(dk_size).
NOTE: Since negative value is used for DKIX_EMPTY and DKIX_DUMMY, type of
dk_indices entry is signed integer and int16 is used for table which
@@ -123,6 +123,8 @@ As a consequence of this, split keys have a maximum size of 16.
#include "pycore_pystate.h" // _PyThreadState_GET()
#include "stringlib/eq.h" // unicode_eq()
+#include <stdbool.h>
+
/*[clinic input]
class dict "PyDictObject *" "&PyDict_Type"
[clinic start generated code]*/
@@ -230,7 +232,7 @@ equally good collision statistics, needed less code & used less memory.
*/
-static int dictresize(PyDictObject *mp, uint8_t log_newsize);
+static int dictresize(PyDictObject *mp, uint8_t log_newsize, int unicode);
static PyObject* dict_iter(PyDictObject *dict);
@@ -280,6 +282,12 @@ _PyDict_Fini(PyInterpreterState *interp)
#endif
}
+static inline Py_hash_t
+unicode_get_hash(PyObject *o)
+{
+ assert(PyUnicode_CheckExact(o));
+ return ((PyASCIIObject*)o)->hash;
+}
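The new unicode_get_hash() helper works because CPython caches a string's hash in its object header and uses -1 as the "not yet computed" sentinel. A standalone sketch of that caching pattern (the struct layout and the toy FNV-1a hash are simplified stand-ins, not CPython's actual PyASCIIObject or SipHash):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for PyASCIIObject: the hash is computed once and
 * cached in the object; -1 means "not yet computed". */
typedef struct {
    const char *data;
    int64_t hash;               /* cached hash, -1 if uninitialized */
} str_obj;

static int64_t compute_hash(const char *s)
{
    /* Toy FNV-1a hash; CPython uses SipHash, this is only a demo. */
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++) {
        h = (h ^ (unsigned char)*s) * 1099511628211ULL;
    }
    int64_t out = (int64_t)h;
    return out == -1 ? -2 : out;   /* -1 is reserved as the sentinel */
}

/* Analogue of unicode_get_hash(): a cheap read of the cached slot. */
static int64_t str_get_hash(const str_obj *o)
{
    return o->hash;
}

/* Analogue of the tp_hash fallback when the cache is cold. */
static int64_t str_hash(str_obj *o)
{
    if (o->hash == -1) {
        o->hash = compute_hash(o->data);
    }
    return o->hash;
}
```

This is why the diff can drop me_hash from unicode entries: the hash is always one pointer-chase away on the key itself.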
/* Print summary info about the state of the optimized allocator */
void
@@ -467,7 +475,7 @@ struct {
#define Py_EMPTY_KEYS &empty_keys_struct
/* Uncomment to check the dict content in _PyDict_CheckConsistency() */
-/* #define DEBUG_PYDICT */
+// #define DEBUG_PYDICT
#ifdef DEBUG_PYDICT
# define ASSERT_CONSISTENT(op) assert(_PyDict_CheckConsistency((PyObject *)(op), 1))
@@ -483,6 +491,24 @@ get_index_from_order(PyDictObject *mp, Py_ssize_t i)
return ((char *)mp->ma_values)[-3-i];
}
+#ifdef DEBUG_PYDICT
+static void
+dump_entries(PyDictKeysObject *dk)
+{
+ int kind = dk->dk_kind;
+ for (Py_ssize_t i = 0; i < dk->dk_nentries; i++) {
+ if (DK_IS_UNICODE(dk)) {
+ PyDictUnicodeEntry *ep = &DK_UNICODE_ENTRIES(dk)[i];
+ printf("key=%p value=%p\n", ep->me_key, ep->me_value);
+ }
+ else {
+ PyDictKeyEntry *ep = &DK_ENTRIES(dk)[i];
+ printf("key=%p hash=%lx value=%p\n", ep->me_key, ep->me_hash, ep->me_value);
+ }
+ }
+}
+#endif
+
int
_PyDict_CheckConsistency(PyObject *op, int check_content)
{
@@ -504,41 +530,56 @@ _PyDict_CheckConsistency(PyObject *op, int check_content)
if (!splitted) {
/* combined table */
+ CHECK(keys->dk_kind != DICT_KEYS_SPLIT);
CHECK(keys->dk_refcnt == 1);
}
else {
+ CHECK(keys->dk_kind == DICT_KEYS_SPLIT);
CHECK(mp->ma_used <= SHARED_KEYS_MAX_SIZE);
}
if (check_content) {
- PyDictKeyEntry *entries = DK_ENTRIES(keys);
-
for (Py_ssize_t i=0; i < DK_SIZE(keys); i++) {
Py_ssize_t ix = dictkeys_get_index(keys, i);
CHECK(DKIX_DUMMY <= ix && ix <= usable);
}
- for (Py_ssize_t i=0; i < usable; i++) {
- PyDictKeyEntry *entry = &entries[i];
- PyObject *key = entry->me_key;
+ if (keys->dk_kind == DICT_KEYS_GENERAL) {
+ PyDictKeyEntry *entries = DK_ENTRIES(keys);
+ for (Py_ssize_t i=0; i < usable; i++) {
+ PyDictKeyEntry *entry = &entries[i];
+ PyObject *key = entry->me_key;
- if (key != NULL) {
- if (PyUnicode_CheckExact(key)) {
- Py_hash_t hash = ((PyASCIIObject *)key)->hash;
- CHECK(hash != -1);
- CHECK(entry->me_hash == hash);
- }
- else {
+ if (key != NULL) {
/* test_dict fails if PyObject_Hash() is called again */
CHECK(entry->me_hash != -1);
- }
- if (!splitted) {
CHECK(entry->me_value != NULL);
+
+ if (PyUnicode_CheckExact(key)) {
+ Py_hash_t hash = unicode_get_hash(key);
+ CHECK(entry->me_hash == hash);
+ }
}
}
+ }
+ else {
+ PyDictUnicodeEntry *entries = DK_UNICODE_ENTRIES(keys);
+ for (Py_ssize_t i=0; i < usable; i++) {
+ PyDictUnicodeEntry *entry = &entries[i];
+ PyObject *key = entry->me_key;
+
+ if (key != NULL) {
+ CHECK(PyUnicode_CheckExact(key));
+ Py_hash_t hash = unicode_get_hash(key);
+ CHECK(hash != -1);
+ if (!splitted) {
+ CHECK(entry->me_value != NULL);
+ }
+ }
- if (splitted) {
- CHECK(entry->me_value == NULL);
+ if (splitted) {
+ CHECK(entry->me_value == NULL);
+ }
}
}
@@ -561,11 +602,12 @@ _PyDict_CheckConsistency(PyObject *op, int check_content)
static PyDictKeysObject*
-new_keys_object(uint8_t log2_size)
+new_keys_object(uint8_t log2_size, bool unicode)
{
PyDictKeysObject *dk;
Py_ssize_t usable;
int log2_bytes;
+ size_t entry_size = unicode ? sizeof(PyDictUnicodeEntry) : sizeof(PyDictKeyEntry);
assert(log2_size >= PyDict_LOG_MINSIZE);
@@ -591,7 +633,7 @@ new_keys_object(uint8_t log2_size)
// new_keys_object() must not be called after _PyDict_Fini()
assert(state->keys_numfree != -1);
#endif
- if (log2_size == PyDict_LOG_MINSIZE && state->keys_numfree > 0) {
+ if (log2_size == PyDict_LOG_MINSIZE && unicode && state->keys_numfree > 0) {
dk = state->keys_free_list[--state->keys_numfree];
}
else
@@ -599,7 +641,7 @@ new_keys_object(uint8_t log2_size)
{
dk = PyObject_Malloc(sizeof(PyDictKeysObject)
+ ((size_t)1 << log2_bytes)
- + sizeof(PyDictKeyEntry) * usable);
+ + entry_size * usable);
if (dk == NULL) {
PyErr_NoMemory();
return NULL;
@@ -611,23 +653,34 @@ new_keys_object(uint8_t log2_size)
dk->dk_refcnt = 1;
dk->dk_log2_size = log2_size;
dk->dk_log2_index_bytes = log2_bytes;
- dk->dk_kind = DICT_KEYS_UNICODE;
+ dk->dk_kind = unicode ? DICT_KEYS_UNICODE : DICT_KEYS_GENERAL;
dk->dk_nentries = 0;
dk->dk_usable = usable;
dk->dk_version = 0;
memset(&dk->dk_indices[0], 0xff, ((size_t)1 << log2_bytes));
- memset(DK_ENTRIES(dk), 0, sizeof(PyDictKeyEntry) * usable);
+ memset(&dk->dk_indices[(size_t)1 << log2_bytes], 0, entry_size * usable);
return dk;
}
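new_keys_object() now sizes the trailing entry array by entry_size, so an all-unicode table saves one machine word per entry (the cached me_hash). A sketch of the size arithmetic, assuming USABLE_FRACTION(n) == (n*2)/3 as in dictobject.c; the struct names here are stand-ins for the two entry layouts:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the two entry layouts in the diff. */
typedef struct { void *me_key; intptr_t me_hash; void *me_value; } KeyEntry;  /* general */
typedef struct { void *me_key; void *me_value; } UnicodeEntry;                /* unicode */

static size_t usable_fraction(size_t size) { return (size * 2) / 3; }

/* Bytes for the variable part of a keys object with 2**log2_size slots:
 * the index array plus the entry array (fixed header omitted). */
static size_t keys_alloc(unsigned log2_size, size_t index_bytes, int unicode)
{
    size_t size = (size_t)1 << log2_size;
    size_t entry = unicode ? sizeof(UnicodeEntry) : sizeof(KeyEntry);
    return size * index_bytes + usable_fraction(size) * entry;
}
```

The unicode layout is strictly smaller for the same table size, which is the memory win this commit is after.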
static void
free_keys_object(PyDictKeysObject *keys)
{
- PyDictKeyEntry *entries = DK_ENTRIES(keys);
- Py_ssize_t i, n;
- for (i = 0, n = keys->dk_nentries; i < n; i++) {
- Py_XDECREF(entries[i].me_key);
- Py_XDECREF(entries[i].me_value);
+ assert(keys != Py_EMPTY_KEYS);
+ if (DK_IS_UNICODE(keys)) {
+ PyDictUnicodeEntry *entries = DK_UNICODE_ENTRIES(keys);
+ Py_ssize_t i, n;
+ for (i = 0, n = keys->dk_nentries; i < n; i++) {
+ Py_XDECREF(entries[i].me_key);
+ Py_XDECREF(entries[i].me_value);
+ }
+ }
+ else {
+ PyDictKeyEntry *entries = DK_ENTRIES(keys);
+ Py_ssize_t i, n;
+ for (i = 0, n = keys->dk_nentries; i < n; i++) {
+ Py_XDECREF(entries[i].me_key);
+ Py_XDECREF(entries[i].me_value);
+ }
}
#if PyDict_MAXFREELIST > 0
struct _Py_dict_state *state = get_dict_state();
@@ -635,7 +688,8 @@ free_keys_object(PyDictKeysObject *keys)
// free_keys_object() must not be called after _PyDict_Fini()
assert(state->keys_numfree != -1);
#endif
- if (DK_LOG_SIZE(keys) == PyDict_LOG_MINSIZE && state->keys_numfree < PyDict_MAXFREELIST) {
+ if (DK_LOG_SIZE(keys) == PyDict_LOG_MINSIZE && state->keys_numfree < PyDict_MAXFREELIST
+ && DK_IS_UNICODE(keys)) {
state->keys_free_list[state->keys_numfree++] = keys;
return;
}
@@ -751,15 +805,30 @@ clone_combined_dict_keys(PyDictObject *orig)
/* After copying key/value pairs, we need to incref all
keys and values and they are about to be co-owned by a
new dict object. */
- PyDictKeyEntry *ep0 = DK_ENTRIES(keys);
+ PyObject **pkey, **pvalue;
+ size_t offs;
+ if (DK_IS_UNICODE(orig->ma_keys)) {
+ PyDictUnicodeEntry *ep0 = DK_UNICODE_ENTRIES(keys);
+ pkey = &ep0->me_key;
+ pvalue = &ep0->me_value;
+ offs = sizeof(PyDictUnicodeEntry) / sizeof(PyObject*);
+ }
+ else {
+ PyDictKeyEntry *ep0 = DK_ENTRIES(keys);
+ pkey = &ep0->me_key;
+ pvalue = &ep0->me_value;
+ offs = sizeof(PyDictKeyEntry) / sizeof(PyObject*);
+ }
+
Py_ssize_t n = keys->dk_nentries;
for (Py_ssize_t i = 0; i < n; i++) {
- PyDictKeyEntry *entry = &ep0[i];
- PyObject *value = entry->me_value;
+ PyObject *value = *pvalue;
if (value != NULL) {
Py_INCREF(value);
- Py_INCREF(entry->me_key);
+ Py_INCREF(*pkey);
}
+ pvalue += offs;
+ pkey += offs;
}
/* Since we copied the keys table we now have an extra reference
@@ -801,10 +870,11 @@ lookdict_index(PyDictKeysObject *k, Py_hash_t hash, Py_ssize_t index)
Py_UNREACHABLE();
}
+// Search for a non-Unicode key in a table of all-Unicode keys.
static Py_ssize_t
-dictkeys_stringlookup(PyDictKeysObject* dk, PyObject *key, Py_hash_t hash)
+unicodekeys_lookup_generic(PyDictObject *mp, PyDictKeysObject* dk, PyObject *key, Py_hash_t hash)
{
- PyDictKeyEntry *ep0 = DK_ENTRIES(dk);
+ PyDictUnicodeEntry *ep0 = DK_UNICODE_ENTRIES(dk);
size_t mask = DK_MASK(dk);
size_t perturb = hash;
size_t i = (size_t)hash & mask;
@@ -812,11 +882,57 @@ dictkeys_stringlookup(PyDictKeysObject* dk, PyObject *key, Py_hash_t hash)
for (;;) {
ix = dictkeys_get_index(dk, i);
if (ix >= 0) {
- PyDictKeyEntry *ep = &ep0[ix];
+ PyDictUnicodeEntry *ep = &ep0[ix];
+ assert(ep->me_key != NULL);
+ assert(PyUnicode_CheckExact(ep->me_key));
+ if (ep->me_key == key) {
+ return ix;
+ }
+ if (unicode_get_hash(ep->me_key) == hash) {
+ PyObject *startkey = ep->me_key;
+ Py_INCREF(startkey);
+ int cmp = PyObject_RichCompareBool(startkey, key, Py_EQ);
+ Py_DECREF(startkey);
+ if (cmp < 0) {
+ return DKIX_ERROR;
+ }
+ if (dk == mp->ma_keys && ep->me_key == startkey) {
+ if (cmp > 0) {
+ return ix;
+ }
+ }
+ else {
+ /* The dict was mutated, restart */
+ return DKIX_KEY_CHANGED;
+ }
+ }
+ }
+ else if (ix == DKIX_EMPTY) {
+ return DKIX_EMPTY;
+ }
+ perturb >>= PERTURB_SHIFT;
+ i = mask & (i*5 + perturb + 1);
+ }
+ Py_UNREACHABLE();
+}
+
+// Search for a Unicode key in a table of all-Unicode keys.
+static Py_ssize_t _Py_HOT_FUNCTION
+unicodekeys_lookup_unicode(PyDictKeysObject* dk, PyObject *key, Py_hash_t hash)
+{
+ PyDictUnicodeEntry *ep0 = DK_UNICODE_ENTRIES(dk);
+ size_t mask = DK_MASK(dk);
+ size_t perturb = hash;
+ size_t i = (size_t)hash & mask;
+ Py_ssize_t ix;
+ for (;;) {
+ ix = dictkeys_get_index(dk, i);
+ if (ix >= 0) {
+ PyDictUnicodeEntry *ep = &ep0[ix];
assert(ep->me_key != NULL);
assert(PyUnicode_CheckExact(ep->me_key));
if (ep->me_key == key ||
- (ep->me_hash == hash && unicode_eq(ep->me_key, key))) {
+ (unicode_get_hash(ep->me_key) == hash && unicode_eq(ep->me_key, key))) {
return ix;
}
}
@@ -827,11 +943,11 @@ dictkeys_stringlookup(PyDictKeysObject* dk, PyObject *key, Py_hash_t hash)
i = mask & (i*5 + perturb + 1);
ix = dictkeys_get_index(dk, i);
if (ix >= 0) {
- PyDictKeyEntry *ep = &ep0[ix];
+ PyDictUnicodeEntry *ep = &ep0[ix];
assert(ep->me_key != NULL);
assert(PyUnicode_CheckExact(ep->me_key));
if (ep->me_key == key ||
- (ep->me_hash == hash && unicode_eq(ep->me_key, key))) {
+ (unicode_get_hash(ep->me_key) == hash && unicode_eq(ep->me_key, key))) {
return ix;
}
}
@@ -844,6 +960,51 @@ dictkeys_stringlookup(PyDictKeysObject* dk, PyObject *key, Py_hash_t hash)
Py_UNREACHABLE();
}
+// Search for a key in a generic table.
+static Py_ssize_t
+dictkeys_generic_lookup(PyDictObject *mp, PyDictKeysObject* dk, PyObject *key, Py_hash_t hash)
+{
+ PyDictKeyEntry *ep0 = DK_ENTRIES(dk);
+ size_t mask = DK_MASK(dk);
+ size_t perturb = hash;
+ size_t i = (size_t)hash & mask;
+ Py_ssize_t ix;
+ for (;;) {
+ ix = dictkeys_get_index(dk, i);
+ if (ix >= 0) {
+ PyDictKeyEntry *ep = &ep0[ix];
+ assert(ep->me_key != NULL);
+ if (ep->me_key == key) {
+ return ix;
+ }
+ if (ep->me_hash == hash) {
+ PyObject *startkey = ep->me_key;
+ Py_INCREF(startkey);
+ int cmp = PyObject_RichCompareBool(startkey, key, Py_EQ);
+ Py_DECREF(startkey);
+ if (cmp < 0) {
+ return DKIX_ERROR;
+ }
+ if (dk == mp->ma_keys && ep->me_key == startkey) {
+ if (cmp > 0) {
+ return ix;
+ }
+ }
+ else {
+ /* The dict was mutated, restart */
+ return DKIX_KEY_CHANGED;
+ }
+ }
+ }
+ else if (ix == DKIX_EMPTY) {
+ return DKIX_EMPTY;
+ }
+ perturb >>= PERTURB_SHIFT;
+ i = mask & (i*5 + perturb + 1);
+ }
+ Py_UNREACHABLE();
+}
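All three lookup loops above share the same probe recurrence: i = (i*5 + perturb + 1) & mask, with perturb shifted right by PERTURB_SHIFT each step. Once perturb decays to zero the recurrence is a full-period LCG modulo the power-of-two table size, so every slot is eventually visited and the loop cannot spin forever. A minimal sketch of that property (PERTURB_SHIFT == 5 as in CPython; tables up to 64 slots so a bitmask can track visits):

```c
#include <assert.h>
#include <stddef.h>

#define PERTURB_SHIFT 5

/* One step of the probe sequence used by the dict lookup loops. */
static size_t next_probe(size_t i, size_t *perturb, size_t mask)
{
    *perturb >>= PERTURB_SHIFT;
    return (i * 5 + *perturb + 1) & mask;
}

/* Check that mask+1 probe steps starting from slot 0 with a decayed
 * (zero) perturb touch every slot of the table exactly once. */
static int visits_all_slots(size_t mask)
{
    size_t seen = 0;            /* bitmask of visited slots */
    size_t i = 0, perturb = 0;
    for (size_t n = 0; n <= mask; n++) {
        seen |= (size_t)1 << i;
        i = next_probe(i, &perturb, mask);
    }
    return seen == (((size_t)1 << (mask + 1)) - 1);
}
```

A nonzero initial perturb only changes the early steps of the walk; the guarantee comes from the zero-perturb tail shown here.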
+
/* Lookup a string in a (all unicode) dict keys.
* Returns DKIX_ERROR if key is not a string,
* or if the dict keys is not all strings.
@@ -857,7 +1018,7 @@ _PyDictKeys_StringLookup(PyDictKeysObject* dk, PyObject *key)
if (!PyUnicode_CheckExact(key) || kind == DICT_KEYS_GENERAL) {
return DKIX_ERROR;
}
- Py_hash_t hash = ((PyASCIIObject *)key)->hash;
+ Py_hash_t hash = unicode_get_hash(key);
if (hash == -1) {
hash = PyUnicode_Type.tp_hash(key);
if (hash == -1) {
@@ -865,7 +1026,7 @@ _PyDictKeys_StringLookup(PyDictKeysObject* dk, PyObject *key)
return DKIX_ERROR;
}
}
- return dictkeys_stringlookup(dk, key, hash);
+ return unicodekeys_lookup_unicode(dk, key, hash);
}
/*
@@ -883,74 +1044,53 @@ _Py_dict_lookup() is general-purpose, and may return DKIX_ERROR if (and only if)
comparison raises an exception.
When the key isn't found a DKIX_EMPTY is returned.
*/
-Py_ssize_t _Py_HOT_FUNCTION
+Py_ssize_t
_Py_dict_lookup(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject **value_addr)
{
PyDictKeysObject *dk;
+ DictKeysKind kind;
+ Py_ssize_t ix;
+
start:
dk = mp->ma_keys;
- DictKeysKind kind = dk->dk_kind;
- if (PyUnicode_CheckExact(key) && kind != DICT_KEYS_GENERAL) {
- Py_ssize_t ix = dictkeys_stringlookup(dk, key, hash);
- if (ix == DKIX_EMPTY) {
- *value_addr = NULL;
- }
- else if (kind == DICT_KEYS_SPLIT) {
- *value_addr = mp->ma_values->values[ix];
+ kind = dk->dk_kind;
+
+ if (kind != DICT_KEYS_GENERAL) {
+ if (PyUnicode_CheckExact(key)) {
+ ix = unicodekeys_lookup_unicode(dk, key, hash);
}
else {
- *value_addr = DK_ENTRIES(dk)[ix].me_value;
- }
- return ix;
- }
- PyDictKeyEntry *ep0 = DK_ENTRIES(dk);
- size_t mask = DK_MASK(dk);
- size_t perturb = hash;
- size_t i = (size_t)hash & mask;
- Py_ssize_t ix;
- for (;;) {
- ix = dictkeys_get_index(dk, i);
- if (ix == DKIX_EMPTY) {
- *value_addr = NULL;
- return ix;
+ ix = unicodekeys_lookup_generic(mp, dk, key, hash);
+ if (ix == DKIX_KEY_CHANGED) {
+ goto start;
+ }
}
+
if (ix >= 0) {
- PyDictKeyEntry *ep = &ep0[ix];
- assert(ep->me_key != NULL);
- if (ep->me_key == key) {
- goto found;
+ if (kind == DICT_KEYS_SPLIT) {
+ *value_addr = mp->ma_values->values[ix];
}
- if (ep->me_hash == hash) {
- PyObject *startkey = ep->me_key;
- Py_INCREF(startkey);
- int cmp = PyObject_RichCompareBool(startkey, key, Py_EQ);
- Py_DECREF(startkey);
- if (cmp < 0) {
- *value_addr = NULL;
- return DKIX_ERROR;
- }
- if (dk == mp->ma_keys && ep->me_key == startkey) {
- if (cmp > 0) {
- goto found;
- }
- }
- else {
- /* The dict was mutated, restart */
- goto start;
- }
+ else {
+ *value_addr = DK_UNICODE_ENTRIES(dk)[ix].me_value;
}
}
- perturb >>= PERTURB_SHIFT;
- i = (i*5 + perturb + 1) & mask;
- }
- Py_UNREACHABLE();
-found:
- if (dk->dk_kind == DICT_KEYS_SPLIT) {
- *value_addr = mp->ma_values->values[ix];
+ else {
+ *value_addr = NULL;
+ }
}
else {
- *value_addr = ep0[ix].me_value;
+ ix = dictkeys_generic_lookup(mp, dk, key, hash);
+ if (ix == DKIX_KEY_CHANGED) {
+ goto start;
+ }
+ if (ix >= 0) {
+ *value_addr = DK_ENTRIES(dk)[ix].me_value;
+ }
+ else {
+ *value_addr = NULL;
+ }
}
+
return ix;
}
@@ -985,31 +1125,40 @@ _PyDict_MaybeUntrack(PyObject *op)
PyDictObject *mp;
PyObject *value;
Py_ssize_t i, numentries;
- PyDictKeyEntry *ep0;
if (!PyDict_CheckExact(op) || !_PyObject_GC_IS_TRACKED(op))
return;
mp = (PyDictObject *) op;
- ep0 = DK_ENTRIES(mp->ma_keys);
numentries = mp->ma_keys->dk_nentries;
if (_PyDict_HasSplitTable(mp)) {
for (i = 0; i < numentries; i++) {
if ((value = mp->ma_values->values[i]) == NULL)
continue;
if (_PyObject_GC_MAY_BE_TRACKED(value)) {
- assert(!_PyObject_GC_MAY_BE_TRACKED(ep0[i].me_key));
return;
}
}
}
else {
- for (i = 0; i < numentries; i++) {
- if ((value = ep0[i].me_value) == NULL)
- continue;
- if (_PyObject_GC_MAY_BE_TRACKED(value) ||
- _PyObject_GC_MAY_BE_TRACKED(ep0[i].me_key))
- return;
+ if (DK_IS_UNICODE(mp->ma_keys)) {
+ PyDictUnicodeEntry *ep0 = DK_UNICODE_ENTRIES(mp->ma_keys);
+ for (i = 0; i < numentries; i++) {
+ if ((value = ep0[i].me_value) == NULL)
+ continue;
+ if (_PyObject_GC_MAY_BE_TRACKED(value))
+ return;
+ }
+ }
+ else {
+ PyDictKeyEntry *ep0 = DK_ENTRIES(mp->ma_keys);
+ for (i = 0; i < numentries; i++) {
+ if ((value = ep0[i].me_value) == NULL)
+ continue;
+ if (_PyObject_GC_MAY_BE_TRACKED(value) ||
+ _PyObject_GC_MAY_BE_TRACKED(ep0[i].me_key))
+ return;
+ }
}
}
_PyObject_GC_UNTRACK(op);
@@ -1036,16 +1185,16 @@ find_empty_slot(PyDictKeysObject *keys, Py_hash_t hash)
}
static int
-insertion_resize(PyDictObject *mp)
+insertion_resize(PyDictObject *mp, int unicode)
{
- return dictresize(mp, calculate_log2_keysize(GROWTH_RATE(mp)));
+ return dictresize(mp, calculate_log2_keysize(GROWTH_RATE(mp)), unicode);
}
static Py_ssize_t
insert_into_dictkeys(PyDictKeysObject *keys, PyObject *name)
{
assert(PyUnicode_CheckExact(name));
- Py_hash_t hash = ((PyASCIIObject *)name)->hash;
+ Py_hash_t hash = unicode_get_hash(name);
if (hash == -1) {
hash = PyUnicode_Type.tp_hash(name);
if (hash == -1) {
@@ -1053,7 +1202,7 @@ insert_into_dictkeys(PyDictKeysObject *keys, PyObject *name)
return DKIX_EMPTY;
}
}
- Py_ssize_t ix = dictkeys_stringlookup(keys, name, hash);
+ Py_ssize_t ix = unicodekeys_lookup_unicode(keys, name, hash);
if (ix == DKIX_EMPTY) {
if (keys->dk_usable <= 0) {
return DKIX_EMPTY;
@@ -1063,11 +1212,10 @@ insert_into_dictkeys(PyDictKeysObject *keys, PyObject *name)
keys->dk_version = 0;
Py_ssize_t hashpos = find_empty_slot(keys, hash);
ix = keys->dk_nentries;
- PyDictKeyEntry *ep = &DK_ENTRIES(keys)[ix];
+ PyDictUnicodeEntry *ep = &DK_UNICODE_ENTRIES(keys)[ix];
dictkeys_set_index(keys, hashpos, ix);
assert(ep->me_key == NULL);
ep->me_key = name;
- ep->me_hash = hash;
keys->dk_usable--;
keys->dk_nentries++;
}
@@ -1085,11 +1233,11 @@ static int
insertdict(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject *value)
{
PyObject *old_value;
- PyDictKeyEntry *ep;
- if (mp->ma_values != NULL && !PyUnicode_CheckExact(key)) {
- if (insertion_resize(mp) < 0)
+ if (DK_IS_UNICODE(mp->ma_keys) && !PyUnicode_CheckExact(key)) {
+ if (insertion_resize(mp, 0) < 0)
goto Fail;
+ assert(mp->ma_keys->dk_kind == DICT_KEYS_GENERAL);
}
Py_ssize_t ix = _Py_dict_lookup(mp, key, hash, &old_value);
@@ -1104,24 +1252,32 @@ insertdict(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject *value)
assert(old_value == NULL);
if (mp->ma_keys->dk_usable <= 0) {
/* Need to resize. */
- if (insertion_resize(mp) < 0)
+ if (insertion_resize(mp, 1) < 0)
goto Fail;
}
- if (!PyUnicode_CheckExact(key) && mp->ma_keys->dk_kind != DICT_KEYS_GENERAL) {
- mp->ma_keys->dk_kind = DICT_KEYS_GENERAL;
- }
+
Py_ssize_t hashpos = find_empty_slot(mp->ma_keys, hash);
- ep = &DK_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
dictkeys_set_index(mp->ma_keys, hashpos, mp->ma_keys->dk_nentries);
- ep->me_key = key;
- ep->me_hash = hash;
- if (mp->ma_values) {
- Py_ssize_t index = mp->ma_keys->dk_nentries;
- _PyDictValues_AddToInsertionOrder(mp->ma_values, index);
- assert (mp->ma_values->values[index] == NULL);
- mp->ma_values->values[index] = value;
+
+ if (DK_IS_UNICODE(mp->ma_keys)) {
+ PyDictUnicodeEntry *ep;
+ ep = &DK_UNICODE_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
+ ep->me_key = key;
+ if (mp->ma_values) {
+ Py_ssize_t index = mp->ma_keys->dk_nentries;
+ _PyDictValues_AddToInsertionOrder(mp->ma_values, index);
+ assert (mp->ma_values->values[index] == NULL);
+ mp->ma_values->values[index] = value;
+ }
+ else {
+ ep->me_value = value;
+ }
}
else {
+ PyDictKeyEntry *ep;
+ ep = &DK_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
+ ep->me_key = key;
+ ep->me_hash = hash;
ep->me_value = value;
}
mp->ma_used++;
@@ -1143,7 +1299,12 @@ insertdict(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject *value)
}
else {
assert(old_value != NULL);
- DK_ENTRIES(mp->ma_keys)[ix].me_value = value;
+ if (DK_IS_UNICODE(mp->ma_keys)) {
+ DK_UNICODE_ENTRIES(mp->ma_keys)[ix].me_value = value;
+ }
+ else {
+ DK_ENTRIES(mp->ma_keys)[ix].me_value = value;
+ }
}
mp->ma_version_tag = DICT_NEXT_VERSION();
}
@@ -1166,15 +1327,13 @@ insert_to_emptydict(PyDictObject *mp, PyObject *key, Py_hash_t hash,
{
assert(mp->ma_keys == Py_EMPTY_KEYS);
- PyDictKeysObject *newkeys = new_keys_object(PyDict_LOG_MINSIZE);
+ int unicode = PyUnicode_CheckExact(key);
+ PyDictKeysObject *newkeys = new_keys_object(PyDict_LOG_MINSIZE, unicode);
if (newkeys == NULL) {
Py_DECREF(key);
Py_DECREF(value);
return -1;
}
- if (!PyUnicode_CheckExact(key)) {
- newkeys->dk_kind = DICT_KEYS_GENERAL;
- }
dictkeys_decref(Py_EMPTY_KEYS);
mp->ma_keys = newkeys;
mp->ma_values = NULL;
@@ -1182,11 +1341,18 @@ insert_to_emptydict(PyDictObject *mp, PyObject *key, Py_hash_t hash,
MAINTAIN_TRACKING(mp, key, value);
size_t hashpos = (size_t)hash & (PyDict_MINSIZE-1);
- PyDictKeyEntry *ep = DK_ENTRIES(mp->ma_keys);
dictkeys_set_index(mp->ma_keys, hashpos, 0);
- ep->me_key = key;
- ep->me_hash = hash;
- ep->me_value = value;
+ if (unicode) {
+ PyDictUnicodeEntry *ep = DK_UNICODE_ENTRIES(mp->ma_keys);
+ ep->me_key = key;
+ ep->me_value = value;
+ }
+ else {
+ PyDictKeyEntry *ep = DK_ENTRIES(mp->ma_keys);
+ ep->me_key = key;
+ ep->me_hash = hash;
+ ep->me_value = value;
+ }
mp->ma_used++;
mp->ma_version_tag = DICT_NEXT_VERSION();
mp->ma_keys->dk_usable--;
@@ -1198,7 +1364,7 @@ insert_to_emptydict(PyDictObject *mp, PyObject *key, Py_hash_t hash,
Internal routine used by dictresize() to build a hashtable of entries.
*/
static void
-build_indices(PyDictKeysObject *keys, PyDictKeyEntry *ep, Py_ssize_t n)
+build_indices_generic(PyDictKeysObject *keys, PyDictKeyEntry *ep, Py_ssize_t n)
{
size_t mask = DK_MASK(keys);
for (Py_ssize_t ix = 0; ix != n; ix++, ep++) {
@@ -1212,6 +1378,22 @@ build_indices(PyDictKeysObject *keys, PyDictKeyEntry *ep, Py_ssize_t n)
}
}
+static void
+build_indices_unicode(PyDictKeysObject *keys, PyDictUnicodeEntry *ep, Py_ssize_t n)
+{
+ size_t mask = DK_MASK(keys);
+ for (Py_ssize_t ix = 0; ix != n; ix++, ep++) {
+ Py_hash_t hash = unicode_get_hash(ep->me_key);
+ assert(hash != -1);
+ size_t i = hash & mask;
+ for (size_t perturb = hash; dictkeys_get_index(keys, i) != DKIX_EMPTY;) {
+ perturb >>= PERTURB_SHIFT;
+ i = mask & (i*5 + perturb + 1);
+ }
+ dictkeys_set_index(keys, i, ix);
+ }
+}
+
/*
Restructure the table by allocating a new table and reinserting all
items again. When entries have been deleted, the new table may
@@ -1220,14 +1402,17 @@ If a table is split (its keys and hashes are shared, its values are not),
then the values are temporarily copied into the table, it is resized as
a combined table, then the me_value slots in the old table are NULLed out.
After resizing a table is always combined.
+
+This function supports:
+ - Unicode split -> Unicode combined or Generic
+ - Unicode combined -> Unicode combined or Generic
+ - Generic -> Generic
*/
static int
-dictresize(PyDictObject *mp, uint8_t log2_newsize)
+dictresize(PyDictObject *mp, uint8_t log2_newsize, int unicode)
{
- Py_ssize_t numentries;
PyDictKeysObject *oldkeys;
PyDictValues *oldvalues;
- PyDictKeyEntry *oldentries, *newentries;
if (log2_newsize >= SIZEOF_SIZE_T*8) {
PyErr_NoMemory();
@@ -1236,6 +1421,11 @@ dictresize(PyDictObject *mp, uint8_t log2_newsize)
assert(log2_newsize >= PyDict_LOG_MINSIZE);
oldkeys = mp->ma_keys;
+ oldvalues = mp->ma_values;
+
+ if (!DK_IS_UNICODE(oldkeys)) {
+ unicode = 0;
+ }
/* NOTE: Current odict checks mp->ma_keys to detect resize happen.
* So we can't reuse oldkeys even if oldkeys->dk_size == newsize.
@@ -1243,32 +1433,48 @@ dictresize(PyDictObject *mp, uint8_t log2_newsize)
*/
/* Allocate a new table. */
- mp->ma_keys = new_keys_object(log2_newsize);
+ mp->ma_keys = new_keys_object(log2_newsize, unicode);
if (mp->ma_keys == NULL) {
mp->ma_keys = oldkeys;
return -1;
}
// New table must be large enough.
assert(mp->ma_keys->dk_usable >= mp->ma_used);
- if (oldkeys->dk_kind == DICT_KEYS_GENERAL)
- mp->ma_keys->dk_kind = DICT_KEYS_GENERAL;
- numentries = mp->ma_used;
- oldentries = DK_ENTRIES(oldkeys);
- newentries = DK_ENTRIES(mp->ma_keys);
- oldvalues = mp->ma_values;
+ Py_ssize_t numentries = mp->ma_used;
+
if (oldvalues != NULL) {
+ PyDictUnicodeEntry *oldentries = DK_UNICODE_ENTRIES(oldkeys);
/* Convert split table into new combined table.
* We must incref keys; we can transfer values.
*/
- for (Py_ssize_t i = 0; i < numentries; i++) {
- int index = get_index_from_order(mp, i);
- PyDictKeyEntry *ep = &oldentries[index];
- assert(oldvalues->values[index] != NULL);
- Py_INCREF(ep->me_key);
- newentries[i].me_key = ep->me_key;
- newentries[i].me_hash = ep->me_hash;
- newentries[i].me_value = oldvalues->values[index];
+ if (mp->ma_keys->dk_kind == DICT_KEYS_GENERAL) {
+ // split -> generic
+ PyDictKeyEntry *newentries = DK_ENTRIES(mp->ma_keys);
+
+ for (Py_ssize_t i = 0; i < numentries; i++) {
+ int index = get_index_from_order(mp, i);
+ PyDictUnicodeEntry *ep = &oldentries[index];
+ assert(oldvalues->values[index] != NULL);
+ Py_INCREF(ep->me_key);
+ newentries[i].me_key = ep->me_key;
+ newentries[i].me_hash = unicode_get_hash(ep->me_key);
+ newentries[i].me_value = oldvalues->values[index];
+ }
+ build_indices_generic(mp->ma_keys, newentries, numentries);
+ }
+ else { // split -> combined unicode
+ PyDictUnicodeEntry *newentries = DK_UNICODE_ENTRIES(mp->ma_keys);
+
+ for (Py_ssize_t i = 0; i < numentries; i++) {
+ int index = get_index_from_order(mp, i);
+ PyDictUnicodeEntry *ep = &oldentries[index];
+ assert(oldvalues->values[index] != NULL);
+ Py_INCREF(ep->me_key);
+ newentries[i].me_key = ep->me_key;
+ newentries[i].me_value = oldvalues->values[index];
+ }
+ build_indices_unicode(mp->ma_keys, newentries, numentries);
}
dictkeys_decref(oldkeys);
mp->ma_values = NULL;
@@ -1276,16 +1482,54 @@ dictresize(PyDictObject *mp, uint8_t log2_newsize)
free_values(oldvalues);
}
}
- else { // combined table.
- if (oldkeys->dk_nentries == numentries) {
- memcpy(newentries, oldentries, numentries * sizeof(PyDictKeyEntry));
+ else { // oldkeys is combined.
+ if (oldkeys->dk_kind == DICT_KEYS_GENERAL) {
+ // generic -> generic
+ assert(mp->ma_keys->dk_kind == DICT_KEYS_GENERAL);
+ PyDictKeyEntry *oldentries = DK_ENTRIES(oldkeys);
+ PyDictKeyEntry *newentries = DK_ENTRIES(mp->ma_keys);
+ if (oldkeys->dk_nentries == numentries) {
+ memcpy(newentries, oldentries, numentries * sizeof(PyDictKeyEntry));
+ }
+ else {
+ PyDictKeyEntry *ep = oldentries;
+ for (Py_ssize_t i = 0; i < numentries; i++) {
+ while (ep->me_value == NULL)
+ ep++;
+ newentries[i] = *ep++;
+ }
+ }
+ build_indices_generic(mp->ma_keys, newentries, numentries);
}
- else {
- PyDictKeyEntry *ep = oldentries;
- for (Py_ssize_t i = 0; i < numentries; i++) {
- while (ep->me_value == NULL)
+ else { // oldkeys is combined unicode
+ PyDictUnicodeEntry *oldentries = DK_UNICODE_ENTRIES(oldkeys);
+ if (unicode) { // combined unicode -> combined unicode
+ PyDictUnicodeEntry *newentries = DK_UNICODE_ENTRIES(mp->ma_keys);
+ if (oldkeys->dk_nentries == numentries && mp->ma_keys->dk_kind == DICT_KEYS_UNICODE) {
+ memcpy(newentries, oldentries, numentries * sizeof(PyDictUnicodeEntry));
+ }
+ else {
+ PyDictUnicodeEntry *ep = oldentries;
+ for (Py_ssize_t i = 0; i < numentries; i++) {
+ while (ep->me_value == NULL)
+ ep++;
+ newentries[i] = *ep++;
+ }
+ }
+ build_indices_unicode(mp->ma_keys, newentries, numentries);
+ }
+ else { // combined unicode -> generic
+ PyDictKeyEntry *newentries = DK_ENTRIES(mp->ma_keys);
+ PyDictUnicodeEntry *ep = oldentries;
+ for (Py_ssize_t i = 0; i < numentries; i++) {
+ while (ep->me_value == NULL)
+ ep++;
+ newentries[i].me_key = ep->me_key;
+ newentries[i].me_hash = unicode_get_hash(ep->me_key);
+ newentries[i].me_value = ep->me_value;
ep++;
- newentries[i] = *ep++;
+ }
+ build_indices_generic(mp->ma_keys, newentries, numentries);
}
}
@@ -1301,6 +1545,7 @@ dictresize(PyDictObject *mp, uint8_t log2_newsize)
assert(state->keys_numfree != -1);
#endif
if (DK_LOG_SIZE(oldkeys) == PyDict_LOG_MINSIZE &&
+ DK_IS_UNICODE(oldkeys) &&
state->keys_numfree < PyDict_MAXFREELIST)
{
state->keys_free_list[state->keys_numfree++] = oldkeys;
@@ -1312,15 +1557,14 @@ dictresize(PyDictObject *mp, uint8_t log2_newsize)
}
}
- build_indices(mp->ma_keys, newentries, numentries);
mp->ma_keys->dk_usable -= numentries;
mp->ma_keys->dk_nentries = numentries;
ASSERT_CONSISTENT(mp);
return 0;
}
-PyObject *
-_PyDict_NewPresized(Py_ssize_t minused)
+static PyObject *
+dict_new_presized(Py_ssize_t minused, bool unicode)
{
const uint8_t log2_max_presize = 17;
const Py_ssize_t max_presize = ((Py_ssize_t)1) << log2_max_presize;
@@ -1341,12 +1585,56 @@ _PyDict_NewPresized(Py_ssize_t minused)
log2_newsize = estimate_log2_keysize(minused);
}
- new_keys = new_keys_object(log2_newsize);
+ new_keys = new_keys_object(log2_newsize, unicode);
if (new_keys == NULL)
return NULL;
return new_dict(new_keys, NULL, 0, 0);
}
+PyObject *
+_PyDict_NewPresized(Py_ssize_t minused)
+{
+ return dict_new_presized(minused, false);
+}
+
+PyObject *
+_PyDict_FromItems(PyObject *const *keys, Py_ssize_t keys_offset,
+ PyObject *const *values, Py_ssize_t values_offset,
+ Py_ssize_t length)
+{
+ bool unicode = true;
+ PyObject *const *ks = keys;
+
+ for (Py_ssize_t i = 0; i < length; i++) {
+ if (!PyUnicode_CheckExact(*ks)) {
+ unicode = false;
+ break;
+ }
+ ks += keys_offset;
+ }
+
+ PyObject *dict = dict_new_presized(length, unicode);
+ if (dict == NULL) {
+ return NULL;
+ }
+
+ ks = keys;
+ PyObject *const *vs = values;
+
+ for (Py_ssize_t i = 0; i < length; i++) {
+ PyObject *key = *ks;
+ PyObject *value = *vs;
+ if (PyDict_SetItem(dict, key, value) < 0) {
+ Py_DECREF(dict);
+ return NULL;
+ }
+ ks += keys_offset;
+ vs += values_offset;
+ }
+
+ return dict;
+}
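_PyDict_FromItems walks keys and values with independent strides, which is what lets a single helper consume both a kwnames tuple (keys contiguous, stride 1) and other layouts such as interleaved key/value pairs (stride 2). A sketch of the strided key scan it uses to pick the table kind, with plain C strings standing in for PyObject keys and is_short standing in for PyUnicode_CheckExact:

```c
#include <assert.h>
#include <stddef.h>

/* Scan `length` keys spaced `keys_offset` elements apart and report
 * whether every one satisfies the predicate -- the same walk
 * _PyDict_FromItems uses to choose unicode vs. general keys. */
static int all_keys(const char *const *keys, ptrdiff_t keys_offset,
                    ptrdiff_t length, int (*pred)(const char *))
{
    const char *const *ks = keys;
    for (ptrdiff_t i = 0; i < length; i++) {
        if (!pred(*ks)) {
            return 0;           /* one miss decides the table kind */
        }
        ks += keys_offset;
    }
    return 1;
}

/* Hypothetical predicate for the demo. */
static int is_short(const char *s)
{
    size_t n = 0;
    while (s[n]) n++;
    return n < 8;
}
```

This is also why the first hunk of the diff could delete the hand-rolled kwdict loop: stride-1 keys plus stride-1 values cover that caller exactly.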
+
/* Note that, for historical reasons, PyDict_GetItem() suppresses all errors
* that may occur (originally dicts supported only string keys, and exceptions
* weren't possible). So, while the original intent was that a NULL return
@@ -1366,9 +1654,7 @@ PyDict_GetItem(PyObject *op, PyObject *key)
PyDictObject *mp = (PyDictObject *)op;
Py_hash_t hash;
- if (!PyUnicode_CheckExact(key) ||
- (hash = ((PyASCIIObject *) key)->hash) == -1)
- {
+ if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1) {
hash = PyObject_Hash(key);
if (hash == -1) {
PyErr_Clear();
@@ -1410,23 +1696,41 @@ _PyDict_GetItemHint(PyDictObject *mp, PyObject *key,
if (hint >= 0 && hint < mp->ma_keys->dk_nentries) {
PyObject *res = NULL;
- PyDictKeyEntry *ep = DK_ENTRIES(mp->ma_keys) + (size_t)hint;
- if (ep->me_key == key) {
- if (mp->ma_keys->dk_kind == DICT_KEYS_SPLIT) {
- assert(mp->ma_values != NULL);
- res = mp->ma_values->values[(size_t)hint];
- }
- else {
- res = ep->me_value;
+ if (DK_IS_UNICODE(mp->ma_keys)) {
+ PyDictUnicodeEntry *ep = DK_UNICODE_ENTRIES(mp->ma_keys) + (size_t)hint;
+ if (ep->me_key == key) {
+ if (mp->ma_keys->dk_kind == DICT_KEYS_SPLIT) {
+ assert(mp->ma_values != NULL);
+ res = mp->ma_values->values[(size_t)hint];
+ }
+ else {
+ res = ep->me_value;
+ }
+ if (res != NULL) {
+ *value = res;
+ return hint;
+ }
}
- if (res != NULL) {
- *value = res;
- return hint;
+ }
+ else {
+ PyDictKeyEntry *ep = DK_ENTRIES(mp->ma_keys) + (size_t)hint;
+ if (ep->me_key == key) {
+ if (mp->ma_keys->dk_kind == DICT_KEYS_SPLIT) {
+ assert(mp->ma_values != NULL);
+ res = mp->ma_values->values[(size_t)hint];
+ }
+ else {
+ res = ep->me_value;
+ }
+ if (res != NULL) {
+ *value = res;
+ return hint;
+ }
}
}
}
- Py_hash_t hash = ((PyASCIIObject *) key)->hash;
+ Py_hash_t hash = unicode_get_hash(key);
if (hash == -1) {
hash = PyObject_Hash(key);
if (hash == -1) {
@@ -1474,8 +1778,7 @@ PyDict_GetItemWithError(PyObject *op, PyObject *key)
PyErr_BadInternalCall();
return NULL;
}
- if (!PyUnicode_CheckExact(key) ||
- (hash = ((PyASCIIObject *) key)->hash) == -1)
+ if (!PyUnicode_CheckExact(key) || (hash = unicode_get_hash(key)) == -1)
{
hash = PyObject_Hash(key);
if (hash == -1) {
@@ -1506,7 +1809,7 @@ _PyDict_GetItemIdWithError(PyObject *dp, _Py_Identifier *key)
kv = _PyUnicode_FromId(key); /* borrowed */
if (kv == NULL)
return NULL;
- Py_hash_t hash = ((PyASCIIObject *) kv)->hash;
+ Py_hash_t hash = unicode_get_hash(kv);
assert (hash != -1); /* interned strings have their hash value initialised */
return _PyDict_GetItem_KnownHash(dp, kv, hash);
}
@@ -1652,14 +1955,12 @@ delitem_common(PyDictObject *mp, Py_hash_t hash, Py_ssize_t ix,
PyObject *old_value)
{
PyObject *old_key;
- PyDictKeyEntry *ep;
Py_ssize_t hashpos = lookdict_index(mp->ma_keys, hash, ix);
assert(hashpos >= 0);
mp->ma_used--;
mp->ma_version_tag = DICT_NEXT_VERSION();
- ep = &DK_ENTRIES(mp->ma_keys)[ix];
if (mp->ma_values) {
assert(old_value == mp->ma_values->values[ix]);
mp->ma_values->values[ix] = NULL;
@@ -1671,9 +1972,19 @@ delitem_common(PyDictObject *mp, Py_hash_t hash, Py_ssize_t ix,
else {
mp->ma_keys->dk_version = 0;
dictkeys_set_index(mp->ma_keys, hashpos, DKIX_DUMMY);
- old_key = ep->me_key;
- ep->me_key = NULL;
- ep->me_value = NULL;
+ if (DK_IS_UNICODE(mp->ma_keys)) {
+ PyDictUnicodeEntry *ep = &DK_UNICODE_ENTRIES(mp->ma_keys)[ix];
+ old_key = ep->me_key;
+ ep->me_key = NULL;
+ ep->me_value = NULL;
+ }
+ else {
+ PyDictKeyEntry *ep = &DK_ENTRIES(mp->ma_keys)[ix];
+ old_key = ep->me_key;
+ ep->me_key = NULL;
+ ep->me_value = NULL;
+ ep->me_hash = 0;
+ }
Py_DECREF(old_key);
}
Py_DECREF(old_value);
@@ -1814,8 +2125,8 @@ _PyDict_Next(PyObject *op, Py_ssize_t *ppos, PyObject **pkey,
{
Py_ssize_t i;
PyDictObject *mp;
- PyDictKeyEntry *entry_ptr;
- PyObject *value;
+ PyObject *key, *value;
+ Py_hash_t hash;
if (!PyDict_Check(op))
return 0;
@@ -1826,30 +2137,48 @@ _PyDict_Next(PyObject *op, Py_ssize_t *ppos, PyObject **pkey,
if (i < 0 || i >= mp->ma_used)
return 0;
int index = get_index_from_order(mp, i);
- entry_ptr = &DK_ENTRIES(mp->ma_keys)[index];
value = mp->ma_values->values[index];
+
+ key = DK_UNICODE_ENTRIES(mp->ma_keys)[index].me_key;
+ hash = unicode_get_hash(key);
assert(value != NULL);
}
else {
Py_ssize_t n = mp->ma_keys->dk_nentries;
if (i < 0 || i >= n)
return 0;
- entry_ptr = &DK_ENTRIES(mp->ma_keys)[i];
- while (i < n && entry_ptr->me_value == NULL) {
- entry_ptr++;
- i++;
+ if (DK_IS_UNICODE(mp->ma_keys)) {
+ PyDictUnicodeEntry *entry_ptr = &DK_UNICODE_ENTRIES(mp->ma_keys)[i];
+ while (i < n && entry_ptr->me_value == NULL) {
+ entry_ptr++;
+ i++;
+ }
+ if (i >= n)
+ return 0;
+ key = entry_ptr->me_key;
+ hash = unicode_get_hash(entry_ptr->me_key);
+ value = entry_ptr->me_value;
+ }
+ else {
+ PyDictKeyEntry *entry_ptr = &DK_ENTRIES(mp->ma_keys)[i];
+ while (i < n && entry_ptr->me_value == NULL) {
+ entry_ptr++;
+ i++;
+ }
+ if (i >= n)
+ return 0;
+ key = entry_ptr->me_key;
+ hash = entry_ptr->me_hash;
+ value = entry_ptr->me_value;
}
- if (i >= n)
- return 0;
- value = entry_ptr->me_value;
}
*ppos = i+1;
if (pkey)
- *pkey = entry_ptr->me_key;
- if (phash)
- *phash = entry_ptr->me_hash;
+ *pkey = key;
if (pvalue)
*pvalue = value;
+ if (phash)
+ *phash = hash;
return 1;
}
@@ -1958,7 +2287,8 @@ _PyDict_FromKeys(PyObject *cls, PyObject *iterable, PyObject *value)
PyObject *key;
Py_hash_t hash;
- if (dictresize(mp, estimate_log2_keysize(PyDict_GET_SIZE(iterable)))) {
+ int unicode = DK_IS_UNICODE(((PyDictObject*)iterable)->ma_keys);
+ if (dictresize(mp, estimate_log2_keysize(PyDict_GET_SIZE(iterable)), unicode)) {
Py_DECREF(d);
return NULL;
}
@@ -1979,7 +2309,7 @@ _PyDict_FromKeys(PyObject *cls, PyObject *iterable, PyObject *value)
PyObject *key;
Py_hash_t hash;
- if (dictresize(mp, estimate_log2_keysize(PySet_GET_SIZE(iterable)))) {
+ if (dictresize(mp, estimate_log2_keysize(PySet_GET_SIZE(iterable)), 0)) {
Py_DECREF(d);
return NULL;
}
@@ -2217,10 +2547,7 @@ static PyObject *
dict_keys(PyDictObject *mp)
{
PyObject *v;
- Py_ssize_t i, j;
- PyDictKeyEntry *ep;
- Py_ssize_t n, offset;
- PyObject **value_ptr;
+ Py_ssize_t n;
again:
n = mp->ma_used;
@@ -2234,23 +2561,15 @@ dict_keys(PyDictObject *mp)
Py_DECREF(v);
goto again;
}
- ep = DK_ENTRIES(mp->ma_keys);
- if (mp->ma_values) {
- value_ptr = mp->ma_values->values;
- offset = sizeof(PyObject *);
- }
- else {
- value_ptr = &ep[0].me_value;
- offset = sizeof(PyDictKeyEntry);
- }
- for (i = 0, j = 0; j < n; i++) {
- if (*value_ptr != NULL) {
- PyObject *key = ep[i].me_key;
- Py_INCREF(key);
- PyList_SET_ITEM(v, j, key);
- j++;
- }
- value_ptr = (PyObject **)(((char *)value_ptr) + offset);
+
+ /* Nothing we do below makes any function calls. */
+ Py_ssize_t j = 0, pos = 0;
+ PyObject *key;
+ while (_PyDict_Next((PyObject*)mp, &pos, &key, NULL, NULL)) {
+ assert(j < n);
+ Py_INCREF(key);
+ PyList_SET_ITEM(v, j, key);
+ j++;
}
assert(j == n);
return v;
@@ -2260,10 +2579,7 @@ static PyObject *
dict_values(PyDictObject *mp)
{
PyObject *v;
- Py_ssize_t i, j;
- PyDictKeyEntry *ep;
- Py_ssize_t n, offset;
- PyObject **value_ptr;
+ Py_ssize_t n;
again:
n = mp->ma_used;
@@ -2277,23 +2593,15 @@ dict_values(PyDictObject *mp)
Py_DECREF(v);
goto again;
}
- ep = DK_ENTRIES(mp->ma_keys);
- if (mp->ma_values) {
- value_ptr = mp->ma_values->values;
- offset = sizeof(PyObject *);
- }
- else {
- value_ptr = &ep[0].me_value;
- offset = sizeof(PyDictKeyEntry);
- }
- for (i = 0, j = 0; j < n; i++) {
- PyObject *value = *value_ptr;
- value_ptr = (PyObject **)(((char *)value_ptr) + offset);
- if (value != NULL) {
- Py_INCREF(value);
- PyList_SET_ITEM(v, j, value);
- j++;
- }
+
+ /* Nothing we do below makes any function calls. */
+ Py_ssize_t j = 0, pos = 0;
+ PyObject *value;
+ while (_PyDict_Next((PyObject*)mp, &pos, NULL, &value, NULL)) {
+ assert(j < n);
+ Py_INCREF(value);
+ PyList_SET_ITEM(v, j, value);
+ j++;
}
assert(j == n);
return v;
@@ -2303,11 +2611,8 @@ static PyObject *
dict_items(PyDictObject *mp)
{
PyObject *v;
- Py_ssize_t i, j, n;
- Py_ssize_t offset;
- PyObject *item, *key;
- PyDictKeyEntry *ep;
- PyObject **value_ptr;
+ Py_ssize_t i, n;
+ PyObject *item;
/* Preallocate the list of tuples, to avoid allocations during
* the loop over the items, which could trigger GC, which
@@ -2333,28 +2638,18 @@ dict_items(PyDictObject *mp)
Py_DECREF(v);
goto again;
}
+
/* Nothing we do below makes any function calls. */
- ep = DK_ENTRIES(mp->ma_keys);
- if (mp->ma_values) {
- value_ptr = mp->ma_values->values;
- offset = sizeof(PyObject *);
- }
- else {
- value_ptr = &ep[0].me_value;
- offset = sizeof(PyDictKeyEntry);
- }
- for (i = 0, j = 0; j < n; i++) {
- PyObject *value = *value_ptr;
- value_ptr = (PyObject **)(((char *)value_ptr) + offset);
- if (value != NULL) {
- key = ep[i].me_key;
- item = PyList_GET_ITEM(v, j);
- Py_INCREF(key);
- PyTuple_SET_ITEM(item, 0, key);
- Py_INCREF(value);
- PyTuple_SET_ITEM(item, 1, value);
- j++;
- }
+ Py_ssize_t j = 0, pos = 0;
+ PyObject *key, *value;
+ while (_PyDict_Next((PyObject*)mp, &pos, &key, &value, NULL)) {
+ assert(j < n);
+ PyObject *item = PyList_GET_ITEM(v, j);
+ Py_INCREF(key);
+ PyTuple_SET_ITEM(item, 0, key);
+ Py_INCREF(value);
+ PyTuple_SET_ITEM(item, 1, value);
+ j++;
}
assert(j == n);
return v;
@@ -2528,8 +2823,6 @@ static int
dict_merge(PyObject *a, PyObject *b, int override)
{
PyDictObject *mp, *other;
- Py_ssize_t i, n;
- PyDictKeyEntry *entry, *ep0;
assert(0 <= override && override <= 2);
@@ -2592,58 +2885,52 @@ dict_merge(PyObject *a, PyObject *b, int override)
* that there will be no (or few) overlapping keys.
*/
if (USABLE_FRACTION(DK_SIZE(mp->ma_keys)) < other->ma_used) {
- if (dictresize(mp, estimate_log2_keysize(mp->ma_used + other->ma_used))) {
+ int unicode = DK_IS_UNICODE(other->ma_keys);
+ if (dictresize(mp, estimate_log2_keysize(mp->ma_used + other->ma_used), unicode)) {
return -1;
}
}
- ep0 = DK_ENTRIES(other->ma_keys);
- for (i = 0, n = other->ma_keys->dk_nentries; i < n; i++) {
- PyObject *key, *value;
- Py_hash_t hash;
- entry = &ep0[i];
- key = entry->me_key;
- hash = entry->me_hash;
- if (other->ma_values)
- value = other->ma_values->values[i];
- else
- value = entry->me_value;
- if (value != NULL) {
- int err = 0;
+ Py_ssize_t orig_size = other->ma_keys->dk_nentries;
+ Py_ssize_t pos = 0;
+ Py_hash_t hash;
+ PyObject *key, *value;
+
+ while (_PyDict_Next((PyObject*)other, &pos, &key, &value, &hash)) {
+ int err = 0;
+ Py_INCREF(key);
+ Py_INCREF(value);
+ if (override == 1) {
Py_INCREF(key);
Py_INCREF(value);
- if (override == 1) {
+ err = insertdict(mp, key, hash, value);
+ }
+ else {
+ err = _PyDict_Contains_KnownHash(a, key, hash);
+ if (err == 0) {
Py_INCREF(key);
Py_INCREF(value);
err = insertdict(mp, key, hash, value);
}
- else {
- err = _PyDict_Contains_KnownHash(a, key, hash);
- if (err == 0) {
- Py_INCREF(key);
- Py_INCREF(value);
- err = insertdict(mp, key, hash, value);
- }
- else if (err > 0) {
- if (override != 0) {
- _PyErr_SetKeyError(key);
- Py_DECREF(value);
- Py_DECREF(key);
- return -1;
- }
- err = 0;
+ else if (err > 0) {
+ if (override != 0) {
+ _PyErr_SetKeyError(key);
+ Py_DECREF(value);
+ Py_DECREF(key);
+ return -1;
}
+ err = 0;
}
- Py_DECREF(value);
- Py_DECREF(key);
- if (err != 0)
- return -1;
+ }
+ Py_DECREF(value);
+ Py_DECREF(key);
+ if (err != 0)
+ return -1;
- if (n != other->ma_keys->dk_nentries) {
- PyErr_SetString(PyExc_RuntimeError,
- "dict mutated during update");
- return -1;
- }
+ if (orig_size != other->ma_keys->dk_nentries) {
+ PyErr_SetString(PyExc_RuntimeError,
+ "dict mutated during update");
+ return -1;
}
}
}
@@ -2880,23 +3167,36 @@ dict_equal(PyDictObject *a, PyDictObject *b)
return 0;
/* Same # of entries -- check all of 'em. Exit early on any diff. */
for (i = 0; i < a->ma_keys->dk_nentries; i++) {
- PyDictKeyEntry *ep = &DK_ENTRIES(a->ma_keys)[i];
- PyObject *aval;
- if (a->ma_values)
- aval = a->ma_values->values[i];
- else
+ PyObject *key, *aval;
+ Py_hash_t hash;
+ if (DK_IS_UNICODE(a->ma_keys)) {
+ PyDictUnicodeEntry *ep = &DK_UNICODE_ENTRIES(a->ma_keys)[i];
+ key = ep->me_key;
+ if (key == NULL) {
+ continue;
+ }
+ hash = unicode_get_hash(key);
+ if (a->ma_values)
+ aval = a->ma_values->values[i];
+ else
+ aval = ep->me_value;
+ }
+ else {
+ PyDictKeyEntry *ep = &DK_ENTRIES(a->ma_keys)[i];
+ key = ep->me_key;
aval = ep->me_value;
+ hash = ep->me_hash;
+ }
if (aval != NULL) {
int cmp;
PyObject *bval;
- PyObject *key = ep->me_key;
/* temporarily bump aval's refcount to ensure it stays
alive until we're done with it */
Py_INCREF(aval);
/* ditto for key */
Py_INCREF(key);
/* reuse the known hash value */
- _Py_dict_lookup(b, key, ep->me_hash, &bval);
+ _Py_dict_lookup(b, key, hash, &bval);
if (bval == NULL) {
Py_DECREF(key);
Py_DECREF(aval);
@@ -3033,9 +3333,10 @@ PyDict_SetDefault(PyObject *d, PyObject *key, PyObject *defaultobj)
return defaultobj;
}
- if (mp->ma_values != NULL && !PyUnicode_CheckExact(key)) {
- if (insertion_resize(mp) < 0)
+ if (!PyUnicode_CheckExact(key) && DK_IS_UNICODE(mp->ma_keys)) {
+ if (insertion_resize(mp, 0) < 0) {
return NULL;
+ }
}
Py_ssize_t ix = _Py_dict_lookup(mp, key, hash, &value);
@@ -3044,35 +3345,38 @@ PyDict_SetDefault(PyObject *d, PyObject *key, PyObject *defaultobj)
if (ix == DKIX_EMPTY) {
mp->ma_keys->dk_version = 0;
- PyDictKeyEntry *ep, *ep0;
value = defaultobj;
if (mp->ma_keys->dk_usable <= 0) {
- if (insertion_resize(mp) < 0) {
+ if (insertion_resize(mp, 1) < 0) {
return NULL;
}
}
- if (!PyUnicode_CheckExact(key) && mp->ma_keys->dk_kind != DICT_KEYS_GENERAL) {
- mp->ma_keys->dk_kind = DICT_KEYS_GENERAL;
- }
Py_ssize_t hashpos = find_empty_slot(mp->ma_keys, hash);
- ep0 = DK_ENTRIES(mp->ma_keys);
- ep = &ep0[mp->ma_keys->dk_nentries];
dictkeys_set_index(mp->ma_keys, hashpos, mp->ma_keys->dk_nentries);
- Py_INCREF(key);
- Py_INCREF(value);
- MAINTAIN_TRACKING(mp, key, value);
- ep->me_key = key;
- ep->me_hash = hash;
- if (_PyDict_HasSplitTable(mp)) {
- Py_ssize_t index = (int)mp->ma_keys->dk_nentries;
- assert(index < SHARED_KEYS_MAX_SIZE);
- assert(mp->ma_values->values[index] == NULL);
- mp->ma_values->values[index] = value;
- _PyDictValues_AddToInsertionOrder(mp->ma_values, index);
+ if (DK_IS_UNICODE(mp->ma_keys)) {
+ assert(PyUnicode_CheckExact(key));
+ PyDictUnicodeEntry *ep = &DK_UNICODE_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
+ ep->me_key = key;
+ if (_PyDict_HasSplitTable(mp)) {
+ Py_ssize_t index = (int)mp->ma_keys->dk_nentries;
+ assert(index < SHARED_KEYS_MAX_SIZE);
+ assert(mp->ma_values->values[index] == NULL);
+ mp->ma_values->values[index] = value;
+ _PyDictValues_AddToInsertionOrder(mp->ma_values, index);
+ }
+ else {
+ ep->me_value = value;
+ }
}
else {
+ PyDictKeyEntry *ep = &DK_ENTRIES(mp->ma_keys)[mp->ma_keys->dk_nentries];
+ ep->me_key = key;
+ ep->me_hash = hash;
ep->me_value = value;
}
+ Py_INCREF(key);
+ Py_INCREF(value);
+ MAINTAIN_TRACKING(mp, key, value);
mp->ma_used++;
mp->ma_version_tag = DICT_NEXT_VERSION();
mp->ma_keys->dk_usable--;
@@ -3160,7 +3464,6 @@ dict_popitem_impl(PyDictObject *self)
/*[clinic end generated code: output=e65fcb04420d230d input=1c38a49f21f64941]*/
{
Py_ssize_t i, j;
- PyDictKeyEntry *ep0, *ep;
PyObject *res;
/* Allocate the result tuple before checking the size. Believe it
@@ -3182,7 +3485,7 @@ dict_popitem_impl(PyDictObject *self)
}
/* Convert split table to combined table */
if (self->ma_keys->dk_kind == DICT_KEYS_SPLIT) {
- if (dictresize(self, DK_LOG_SIZE(self->ma_keys))) {
+ if (dictresize(self, DK_LOG_SIZE(self->ma_keys), 1)) {
Py_DECREF(res);
return NULL;
}
@@ -3190,23 +3493,45 @@ dict_popitem_impl(PyDictObject *self)
self->ma_keys->dk_version = 0;
/* Pop last item */
- ep0 = DK_ENTRIES(self->ma_keys);
- i = self->ma_keys->dk_nentries - 1;
- while (i >= 0 && ep0[i].me_value == NULL) {
- i--;
+ PyObject *key, *value;
+ Py_hash_t hash;
+ if (DK_IS_UNICODE(self->ma_keys)) {
+ PyDictUnicodeEntry *ep0 = DK_UNICODE_ENTRIES(self->ma_keys);
+ i = self->ma_keys->dk_nentries - 1;
+ while (i >= 0 && ep0[i].me_value == NULL) {
+ i--;
+ }
+ assert(i >= 0);
+
+ key = ep0[i].me_key;
+ hash = unicode_get_hash(key);
+ value = ep0[i].me_value;
+ ep0[i].me_key = NULL;
+ ep0[i].me_value = NULL;
}
- assert(i >= 0);
+ else {
+ PyDictKeyEntry *ep0 = DK_ENTRIES(self->ma_keys);
+ i = self->ma_keys->dk_nentries - 1;
+ while (i >= 0 && ep0[i].me_value == NULL) {
+ i--;
+ }
+ assert(i >= 0);
- ep = &ep0[i];
- j = lookdict_index(self->ma_keys, ep->me_hash, i);
+ key = ep0[i].me_key;
+ hash = ep0[i].me_hash;
+ value = ep0[i].me_value;
+ ep0[i].me_key = NULL;
+ ep0[i].me_hash = -1;
+ ep0[i].me_value = NULL;
+ }
+
+ j = lookdict_index(self->ma_keys, hash, i);
assert(j >= 0);
assert(dictkeys_get_index(self->ma_keys, j) == i);
dictkeys_set_index(self->ma_keys, j, DKIX_DUMMY);
- PyTuple_SET_ITEM(res, 0, ep->me_key);
- PyTuple_SET_ITEM(res, 1, ep->me_value);
- ep->me_key = NULL;
- ep->me_value = NULL;
+ PyTuple_SET_ITEM(res, 0, key);
+ PyTuple_SET_ITEM(res, 1, value);
/* We can't dk_usable++ since there is DKIX_DUMMY in indices */
self->ma_keys->dk_nentries = i;
self->ma_used--;
@@ -3220,29 +3545,30 @@ dict_traverse(PyObject *op, visitproc visit, void *arg)
{
PyDictObject *mp = (PyDictObject *)op;
PyDictKeysObject *keys = mp->ma_keys;
- PyDictKeyEntry *entries = DK_ENTRIES(keys);
Py_ssize_t i, n = keys->dk_nentries;
- if (keys->dk_kind == DICT_KEYS_GENERAL) {
- for (i = 0; i < n; i++) {
- if (entries[i].me_value != NULL) {
- Py_VISIT(entries[i].me_value);
- Py_VISIT(entries[i].me_key);
- }
- }
- }
- else {
+ if (DK_IS_UNICODE(keys)) {
if (mp->ma_values != NULL) {
for (i = 0; i < n; i++) {
Py_VISIT(mp->ma_values->values[i]);
}
}
else {
+ PyDictUnicodeEntry *entries = DK_UNICODE_ENTRIES(keys);
for (i = 0; i < n; i++) {
Py_VISIT(entries[i].me_value);
}
}
}
+ else {
+ PyDictKeyEntry *entries = DK_ENTRIES(keys);
+ for (i = 0; i < n; i++) {
+ if (entries[i].me_value != NULL) {
+ Py_VISIT(entries[i].me_value);
+ Py_VISIT(entries[i].me_key);
+ }
+ }
+ }
return 0;
}
@@ -3258,9 +3584,7 @@ static PyObject *dictiter_new(PyDictObject *, PyTypeObject *);
Py_ssize_t
_PyDict_SizeOf(PyDictObject *mp)
{
- Py_ssize_t size, res;
-
- size = DK_SIZE(mp->ma_keys);
+ Py_ssize_t res;
res = _PyObject_SIZE(Py_TYPE(mp));
if (mp->ma_values) {
@@ -3269,10 +3593,7 @@ _PyDict_SizeOf(PyDictObject *mp)
/* If the dictionary is split, the keys portion is accounted-for
in the type object. */
if (mp->ma_keys->dk_refcnt == 1) {
- Py_ssize_t usable = USABLE_FRACTION(size);
- res += (sizeof(PyDictKeysObject)
- + DK_IXSIZE(mp->ma_keys) * size
- + sizeof(PyDictKeyEntry) * usable);
+ res += _PyDict_KeysSize(mp->ma_keys);
}
return res;
}
@@ -3280,9 +3601,11 @@ _PyDict_SizeOf(PyDictObject *mp)
Py_ssize_t
_PyDict_KeysSize(PyDictKeysObject *keys)
{
+ size_t es = keys->dk_kind == DICT_KEYS_GENERAL
+ ? sizeof(PyDictKeyEntry) : sizeof(PyDictUnicodeEntry);
return (sizeof(PyDictKeysObject)
- + DK_IXSIZE(keys) * DK_SIZE(keys)
- + USABLE_FRACTION(DK_SIZE(keys)) * sizeof(PyDictKeyEntry));
+ + ((size_t)1 << keys->dk_log2_index_bytes)
+ + USABLE_FRACTION(DK_SIZE(keys)) * es);
}
static PyObject *
@@ -3754,19 +4077,31 @@ dictiter_iternextkey(dictiterobject *di)
if (i >= d->ma_used)
goto fail;
int index = get_index_from_order(d, i);
- key = DK_ENTRIES(k)[index].me_key;
+ key = DK_UNICODE_ENTRIES(k)[index].me_key;
assert(d->ma_values->values[index] != NULL);
}
else {
Py_ssize_t n = k->dk_nentries;
- PyDictKeyEntry *entry_ptr = &DK_ENTRIES(k)[i];
- while (i < n && entry_ptr->me_value == NULL) {
- entry_ptr++;
- i++;
+ if (DK_IS_UNICODE(k)) {
+ PyDictUnicodeEntry *entry_ptr = &DK_UNICODE_ENTRIES(k)[i];
+ while (i < n && entry_ptr->me_value == NULL) {
+ entry_ptr++;
+ i++;
+ }
+ if (i >= n)
+ goto fail;
+ key = entry_ptr->me_key;
+ }
+ else {
+ PyDictKeyEntry *entry_ptr = &DK_ENTRIES(k)[i];
+ while (i < n && entry_ptr->me_value == NULL) {
+ entry_ptr++;
+ i++;
+ }
+ if (i >= n)
+ goto fail;
+ key = entry_ptr->me_key;
}
- if (i >= n)
- goto fail;
- key = entry_ptr->me_key;
}
// We found an element (key), but did not expect it
if (di->len == 0) {
@@ -3847,14 +4182,26 @@ dictiter_iternextvalue(dictiterobject *di)
}
else {
Py_ssize_t n = d->ma_keys->dk_nentries;
- PyDictKeyEntry *entry_ptr = &DK_ENTRIES(d->ma_keys)[i];
- while (i < n && entry_ptr->me_value == NULL) {
- entry_ptr++;
- i++;
+ if (DK_IS_UNICODE(d->ma_keys)) {
+ PyDictUnicodeEntry *entry_ptr = &DK_UNICODE_ENTRIES(d->ma_keys)[i];
+ while (i < n && entry_ptr->me_value == NULL) {
+ entry_ptr++;
+ i++;
+ }
+ if (i >= n)
+ goto fail;
+ value = entry_ptr->me_value;
+ }
+ else {
+ PyDictKeyEntry *entry_ptr = &DK_ENTRIES(d->ma_keys)[i];
+ while (i < n && entry_ptr->me_value == NULL) {
+ entry_ptr++;
+ i++;
+ }
+ if (i >= n)
+ goto fail;
+ value = entry_ptr->me_value;
}
- if (i >= n)
- goto fail;
- value = entry_ptr->me_value;
}
// We found an element, but did not expect it
if (di->len == 0) {
@@ -3930,21 +4277,34 @@ dictiter_iternextitem(dictiterobject *di)
if (i >= d->ma_used)
goto fail;
int index = get_index_from_order(d, i);
- key = DK_ENTRIES(d->ma_keys)[index].me_key;
+ key = DK_UNICODE_ENTRIES(d->ma_keys)[index].me_key;
value = d->ma_values->values[index];
assert(value != NULL);
}
else {
Py_ssize_t n = d->ma_keys->dk_nentries;
- PyDictKeyEntry *entry_ptr = &DK_ENTRIES(d->ma_keys)[i];
- while (i < n && entry_ptr->me_value == NULL) {
- entry_ptr++;
- i++;
+ if (DK_IS_UNICODE(d->ma_keys)) {
+ PyDictUnicodeEntry *entry_ptr = &DK_UNICODE_ENTRIES(d->ma_keys)[i];
+ while (i < n && entry_ptr->me_value == NULL) {
+ entry_ptr++;
+ i++;
+ }
+ if (i >= n)
+ goto fail;
+ key = entry_ptr->me_key;
+ value = entry_ptr->me_value;
+ }
+ else {
+ PyDictKeyEntry *entry_ptr = &DK_ENTRIES(d->ma_keys)[i];
+ while (i < n && entry_ptr->me_value == NULL) {
+ entry_ptr++;
+ i++;
+ }
+ if (i >= n)
+ goto fail;
+ key = entry_ptr->me_key;
+ value = entry_ptr->me_value;
}
- if (i >= n)
- goto fail;
- key = entry_ptr->me_key;
- value = entry_ptr->me_value;
}
// We found an element, but did not expect it
if (di->len == 0) {
@@ -4048,20 +4408,33 @@ dictreviter_iternext(dictiterobject *di)
}
if (d->ma_values) {
int index = get_index_from_order(d, i);
- key = DK_ENTRIES(k)[index].me_key;
+ key = DK_UNICODE_ENTRIES(k)[index].me_key;
value = d->ma_values->values[index];
assert (value != NULL);
}
else {
- PyDictKeyEntry *entry_ptr = &DK_ENTRIES(k)[i];
- while (entry_ptr->me_value == NULL) {
- if (--i < 0) {
- goto fail;
+ if (DK_IS_UNICODE(k)) {
+ PyDictUnicodeEntry *entry_ptr = &DK_UNICODE_ENTRIES(k)[i];
+ while (entry_ptr->me_value == NULL) {
+ if (--i < 0) {
+ goto fail;
+ }
+ entry_ptr--;
+ }
+ key = entry_ptr->me_key;
+ value = entry_ptr->me_value;
+ }
+ else {
+ PyDictKeyEntry *entry_ptr = &DK_ENTRIES(k)[i];
+ while (entry_ptr->me_value == NULL) {
+ if (--i < 0) {
+ goto fail;
+ }
+ entry_ptr--;
}
- entry_ptr--;
+ key = entry_ptr->me_key;
+ value = entry_ptr->me_value;
}
- key = entry_ptr->me_key;
- value = entry_ptr->me_value;
}
di->di_pos = i-1;
di->len--;
@@ -4970,7 +5343,7 @@ dictvalues_reversed(_PyDictViewObject *dv, PyObject *Py_UNUSED(ignored))
PyDictKeysObject *
_PyDict_NewKeysForClass(void)
{
- PyDictKeysObject *keys = new_keys_object(NEXT_LOG2_SHARED_KEYS_MAX_SIZE);
+ PyDictKeysObject *keys = new_keys_object(NEXT_LOG2_SHARED_KEYS_MAX_SIZE, 1);
if (keys == NULL) {
PyErr_Clear();
}
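The dictobject.c hunks above split the old all-purpose entry into two layouts: general tables keep a stored hash per slot (`PyDictKeyEntry`: hash, key, value), while unicode-only tables store just key and value (`PyDictUnicodeEntry`), since an exact `str` caches its hash on the object and `unicode_get_hash()` can read it back. The space win is one machine word per usable slot. A rough sketch with `ctypes` stand-ins (the class names mirror CPython's structs but are simplified illustrations, not the real definitions):

```python
import ctypes

class GeneralEntry(ctypes.Structure):
    """Stand-in for PyDictKeyEntry: hash stored inline for arbitrary keys."""
    _fields_ = [("me_hash", ctypes.c_ssize_t),
                ("me_key", ctypes.c_void_p),
                ("me_value", ctypes.c_void_p)]

class UnicodeEntry(ctypes.Structure):
    """Stand-in for PyDictUnicodeEntry: the key is an exact str, whose
    hash is already cached on the str object itself."""
    _fields_ = [("me_key", ctypes.c_void_p),
                ("me_value", ctypes.c_void_p)]

# Dropping me_hash shrinks each entry by one pointer-sized word.
print(ctypes.sizeof(GeneralEntry), ctypes.sizeof(UnicodeEntry))
```

This is why so many call sites in the diff now branch on `DK_IS_UNICODE()`: the two layouts have different strides, so every direct walk of the entry array needs both cases.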
diff --git a/Python/ceval.c b/Python/ceval.c
index b3673d7d04ab2..e47e0521ea941 100644
--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -1457,7 +1457,7 @@ eval_frame_handle_pending(PyThreadState *tstate)
LOAD_##attr_or_method); \
assert(dict->ma_keys->dk_kind == DICT_KEYS_UNICODE); \
assert(cache0->index < dict->ma_keys->dk_nentries); \
- PyDictKeyEntry *ep = DK_ENTRIES(dict->ma_keys) + cache0->index; \
+ PyDictUnicodeEntry *ep = DK_UNICODE_ENTRIES(dict->ma_keys) + cache0->index; \
res = ep->me_value; \
DEOPT_IF(res == NULL, LOAD_##attr_or_method); \
STAT_INC(LOAD_##attr_or_method, hit); \
@@ -1595,6 +1595,19 @@ is_method(PyObject **stack_pointer, int args) {
return PEEK(args+2) != NULL;
}
+static PyObject*
+dictkeys_get_value_by_index(PyDictKeysObject *dk, int index)
+{
+ if (DK_IS_UNICODE(dk)) {
+ PyDictUnicodeEntry *ep = DK_UNICODE_ENTRIES(dk) + index;
+ return ep->me_value;
+ }
+ else {
+ PyDictKeyEntry *ep = DK_ENTRIES(dk) + index;
+ return ep->me_value;
+ }
+}
+
#define KWNAMES_LEN() \
(call_shape.kwnames == NULL ? 0 : ((int)PyTuple_GET_SIZE(call_shape.kwnames)))
@@ -3030,8 +3043,7 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
_PyLoadGlobalCache *cache = (_PyLoadGlobalCache *)next_instr;
uint32_t version = read32(&cache->module_keys_version);
DEOPT_IF(dict->ma_keys->dk_version != version, LOAD_GLOBAL);
- PyDictKeyEntry *ep = DK_ENTRIES(dict->ma_keys) + cache->index;
- PyObject *res = ep->me_value;
+ PyObject *res = dictkeys_get_value_by_index(dict->ma_keys, cache->index);
DEOPT_IF(res == NULL, LOAD_GLOBAL);
JUMPBY(INLINE_CACHE_ENTRIES_LOAD_GLOBAL);
STAT_INC(LOAD_GLOBAL, hit);
@@ -3051,8 +3063,7 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
uint16_t bltn_version = cache->builtin_keys_version;
DEOPT_IF(mdict->ma_keys->dk_version != mod_version, LOAD_GLOBAL);
DEOPT_IF(bdict->ma_keys->dk_version != bltn_version, LOAD_GLOBAL);
- PyDictKeyEntry *ep = DK_ENTRIES(bdict->ma_keys) + cache->index;
- PyObject *res = ep->me_value;
+ PyObject *res = dictkeys_get_value_by_index(bdict->ma_keys, cache->index);
DEOPT_IF(res == NULL, LOAD_GLOBAL);
JUMPBY(INLINE_CACHE_ENTRIES_LOAD_GLOBAL);
STAT_INC(LOAD_GLOBAL, hit);
@@ -3272,20 +3283,12 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
}
TARGET(BUILD_MAP) {
- Py_ssize_t i;
- PyObject *map = _PyDict_NewPresized((Py_ssize_t)oparg);
+ PyObject *map = _PyDict_FromItems(
+ &PEEK(2*oparg), 2,
+ &PEEK(2*oparg - 1), 2,
+ oparg);
if (map == NULL)
goto error;
- for (i = oparg; i > 0; i--) {
- int err;
- PyObject *key = PEEK(2*i);
- PyObject *value = PEEK(2*i - 1);
- err = PyDict_SetItem(map, key, value);
- if (err != 0) {
- Py_DECREF(map);
- goto error;
- }
- }
while (oparg--) {
Py_DECREF(POP());
@@ -3351,7 +3354,6 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
}
TARGET(BUILD_CONST_KEY_MAP) {
- Py_ssize_t i;
PyObject *map;
PyObject *keys = TOP();
if (!PyTuple_CheckExact(keys) ||
@@ -3360,20 +3362,12 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
"bad BUILD_CONST_KEY_MAP keys argument");
goto error;
}
- map = _PyDict_NewPresized((Py_ssize_t)oparg);
+ map = _PyDict_FromItems(
+ &PyTuple_GET_ITEM(keys, 0), 1,
+ &PEEK(oparg + 1), 1, oparg);
if (map == NULL) {
goto error;
}
- for (i = oparg; i > 0; i--) {
- int err;
- PyObject *key = PyTuple_GET_ITEM(keys, oparg - i);
- PyObject *value = PEEK(i + 1);
- err = PyDict_SetItem(map, key, value);
- if (err != 0) {
- Py_DECREF(map);
- goto error;
- }
- }
Py_DECREF(POP());
while (oparg--) {
@@ -3538,9 +3532,16 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
PyObject *name = GETITEM(names, cache0->original_oparg);
uint16_t hint = cache0->index;
DEOPT_IF(hint >= (size_t)dict->ma_keys->dk_nentries, LOAD_ATTR);
- PyDictKeyEntry *ep = DK_ENTRIES(dict->ma_keys) + hint;
- DEOPT_IF(ep->me_key != name, LOAD_ATTR);
- res = ep->me_value;
+ if (DK_IS_UNICODE(dict->ma_keys)) {
+ PyDictUnicodeEntry *ep = DK_UNICODE_ENTRIES(dict->ma_keys) + hint;
+ DEOPT_IF(ep->me_key != name, LOAD_ATTR);
+ res = ep->me_value;
+ }
+ else {
+ PyDictKeyEntry *ep = DK_ENTRIES(dict->ma_keys) + hint;
+ DEOPT_IF(ep->me_key != name, LOAD_ATTR);
+ res = ep->me_value;
+ }
DEOPT_IF(res == NULL, LOAD_ATTR);
STAT_INC(LOAD_ATTR, hit);
Py_INCREF(res);
@@ -3630,15 +3631,27 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, _PyInterpreterFrame *frame, int
PyObject *name = GETITEM(names, cache0->original_oparg);
uint16_t hint = cache0->index;
DEOPT_IF(hint >= (size_t)dict->ma_keys->dk_nentries, STORE_ATTR);
- PyDictKeyEntry *ep = DK_ENTRIES(dict->ma_keys) + hint;
- DEOPT_IF(ep->me_key != name, STORE_ATTR);
- PyObject *old_value = ep->me_value;
- DEOPT_IF(old_value == NULL, STORE_ATTR);
- STAT_INC(STORE_ATTR, hit);
- STACK_SHRINK(1);
- PyObject *value = POP();
- ep->me_value = value;
+ PyObject *value, *old_value;
+ if (DK_IS_UNICODE(dict->ma_keys)) {
+ PyDictUnicodeEntry *ep = DK_UNICODE_ENTRIES(dict->ma_keys) + hint;
+ DEOPT_IF(ep->me_key != name, STORE_ATTR);
+ old_value = ep->me_value;
+ DEOPT_IF(old_value == NULL, STORE_ATTR);
+ STACK_SHRINK(1);
+ value = POP();
+ ep->me_value = value;
+ }
+ else {
+ PyDictKeyEntry *ep = DK_ENTRIES(dict->ma_keys) + hint;
+ DEOPT_IF(ep->me_key != name, STORE_ATTR);
+ old_value = ep->me_value;
+ DEOPT_IF(old_value == NULL, STORE_ATTR);
+ STACK_SHRINK(1);
+ value = POP();
+ ep->me_value = value;
+ }
Py_DECREF(old_value);
+ STAT_INC(STORE_ATTR, hit);
/* Ensure dict is GC tracked if it needs to be */
if (!_PyObject_GC_IS_TRACKED(dict) && _PyObject_GC_MAY_BE_TRACKED(value)) {
_PyObject_GC_TRACK(dict);
diff --git a/Tools/gdb/libpython.py b/Tools/gdb/libpython.py
index e3d73bce6cfe5..8b227e61082be 100755
--- a/Tools/gdb/libpython.py
+++ b/Tools/gdb/libpython.py
@@ -787,12 +787,6 @@ def write_repr(self, out, visited):
def _get_entries(keys):
dk_nentries = int(keys['dk_nentries'])
dk_size = 1<<int(keys['dk_log2_size'])
- try:
- # <= Python 3.5
- return keys['dk_entries'], dk_size
- except RuntimeError:
- # >= Python 3.6
- pass
if dk_size <= 0xFF:
offset = dk_size
@@ -805,7 +799,10 @@ def _get_entries(keys):
ent_addr = keys['dk_indices'].address
ent_addr = ent_addr.cast(_type_unsigned_char_ptr()) + offset
- ent_ptr_t = gdb.lookup_type('PyDictKeyEntry').pointer()
+ if int(keys['dk_kind']) == 0: # DICT_KEYS_GENERAL
+ ent_ptr_t = gdb.lookup_type('PyDictKeyEntry').pointer()
+ else:
+ ent_ptr_t = gdb.lookup_type('PyDictUnicodeEntry').pointer()
ent_addr = ent_addr.cast(ent_ptr_t)
return ent_addr, dk_nentries

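The gdb helper above locates the entry array by skipping past `dk_indices`, whose element width grows with the table: CPython packs indices as 1, 2, 4, or 8 bytes depending on table size (only the 1-byte branch is visible in the hunk; the wider cases follow the same known scheme). A hedged Python sketch of that offset arithmetic:

```python
def entries_offset(dk_log2_size):
    """Byte offset from dk_indices to the entry array, following
    CPython's variable-width index scheme (a sketch, not the real code)."""
    dk_size = 1 << dk_log2_size
    if dk_size <= 0xFF:
        index_bytes = 1
    elif dk_size <= 0xFFFF:
        index_bytes = 2
    elif dk_size <= 0xFFFFFFFF:
        index_bytes = 4
    else:
        index_bytes = 8
    return dk_size * index_bytes

print(entries_offset(3))   # 8 slots x 1-byte indices -> 8
print(entries_offset(8))   # 256 slots x 2-byte indices -> 512
```

The new `dk_kind` check then decides whether the pointer at that offset is cast to `PyDictKeyEntry*` or `PyDictUnicodeEntry*`.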
bpo-46712: Let generate_global_objects.py Run on Earlier Python Versions (gh-31637)
by ericsnowcurrently March 1, 2022
https://github.com/python/cpython/commit/21099fc064c61d59c936a2f6a0db3e07cd…
commit: 21099fc064c61d59c936a2f6a0db3e07cd5c8de5
branch: main
author: Eric Snow <ericsnowcurrently(a)gmail.com>
committer: ericsnowcurrently <ericsnowcurrently(a)gmail.com>
date: 2022-03-01T14:29:54-07:00
summary:
bpo-46712: Let generate_global_objects.py Run on Earlier Python Versions (gh-31637)
https://bugs.python.org/issue46712
files:
M Makefile.pre.in
M Tools/scripts/generate_global_objects.py
diff --git a/Makefile.pre.in b/Makefile.pre.in
index 0383853901df1..7b6f54a9ae0a7 100644
--- a/Makefile.pre.in
+++ b/Makefile.pre.in
@@ -1176,7 +1176,7 @@ regen-importlib: regen-frozen
# Global objects
.PHONY: regen-global-objects
-regen-global-objects: $(srcdir)/Tools/scripts/generate_global_objects.py
+regen-global-objects: regen-deepfreeze $(srcdir)/Tools/scripts/generate_global_objects.py
$(PYTHON_FOR_REGEN) $(srcdir)/Tools/scripts/generate_global_objects.py
############################################################################
diff --git a/Tools/scripts/generate_global_objects.py b/Tools/scripts/generate_global_objects.py
index 639d8fa91c68b..867358cda8919 100644
--- a/Tools/scripts/generate_global_objects.py
+++ b/Tools/scripts/generate_global_objects.py
@@ -259,7 +259,7 @@ def generate_runtime_init(identifiers, strings):
printer.write(after)
-def get_identifiers_and_strings() -> tuple[set[str], dict[str, str]]:
+def get_identifiers_and_strings() -> 'tuple[set[str], dict[str, str]]':
identifiers = set(IDENTIFIERS)
strings = dict(STRING_LITERALS)
for name, string, *_ in iter_global_strings():
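The one-line change above quotes the return annotation because `tuple[set[str], dict[str, str]]` is evaluated when the `def` statement runs, and subscripting builtin types like `tuple` and `set` only works at runtime on Python 3.9+ (PEP 585). As a string literal the annotation is stored unevaluated, so the script also imports cleanly under earlier interpreters. A minimal stand-in (not the real function) showing the effect:

```python
def get_pair() -> 'tuple[set[str], dict[str, str]]':
    # Hypothetical stand-in for get_identifiers_and_strings().
    return {"a"}, {"b": "c"}

# The quoted annotation stays a plain string until something
# (e.g. typing.get_type_hints) explicitly evaluates it.
print(get_pair.__annotations__['return'])
# -> tuple[set[str], dict[str, str]]
```

An unquoted `-> tuple[set[str], ...]` would instead raise `TypeError: 'type' object is not subscriptable` at definition time on Python 3.8.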
March 1, 2022
https://github.com/python/cpython/commit/7dbb2f8eaf07c105f4d2bb0fe61763463e…
commit: 7dbb2f8eaf07c105f4d2bb0fe61763463e68372d
branch: 3.10
author: Miss Islington (bot) <31488909+miss-islington(a)users.noreply.github.com>
committer: ned-deily <nad(a)python.org>
date: 2022-03-01T15:56:25-05:00
summary:
bpo-42982: update pbkdf2 example & add another link (GH-30966) (#30968)
Automerge-Triggered-By: GH:gpshead
(cherry picked from commit ace0aa2a2793ba4a2b03e56c4ec375c5470edee8)
Co-authored-by: Gregory P. Smith <greg(a)krypto.org>
files:
M Doc/library/hashlib.rst
diff --git a/Doc/library/hashlib.rst b/Doc/library/hashlib.rst
index 269e8a834d58d..aa24131f8bf44 100644
--- a/Doc/library/hashlib.rst
+++ b/Doc/library/hashlib.rst
@@ -251,15 +251,17 @@ include a `salt <https://en.wikipedia.org/wiki/Salt_%28cryptography%29>`_.
The number of *iterations* should be chosen based on the hash algorithm and
computing power. As of 2022, hundreds of thousands of iterations of SHA-256
are suggested. For rationale as to why and how to choose what is best for
- your application, read *Appendix A.2.2* of NIST-SP-800-132_.
+ your application, read *Appendix A.2.2* of NIST-SP-800-132_. The answers
+ on the `stackexchange pbkdf2 iterations question`_ explain in detail.
*dklen* is the length of the derived key. If *dklen* is ``None`` then the
digest size of the hash algorithm *hash_name* is used, e.g. 64 for SHA-512.
- >>> import hashlib
- >>> dk = hashlib.pbkdf2_hmac('sha256', b'password', b'salt', 100000)
+ >>> from hashlib import pbkdf2_hmac
+ >>> our_app_iters = 500_000 # Application specific, read above.
+ >>> dk = pbkdf2_hmac('sha256', b'password', b'bad salt'*2, our_app_iters)
>>> dk.hex()
- '0394a2ede332c9a13eb82e9b24631604c31df978b4e2f0fbd2c549944f9d79a5'
+ '15530bba69924174860db778f2c6f8104d3aaf9d26241840c8c4a641c8d000a9'
.. versionadded:: 3.4
@@ -733,7 +735,7 @@ Domain Dedication 1.0 Universal:
.. _ChaCha: https://cr.yp.to/chacha.html
.. _pyblake2: https://pythonhosted.org/pyblake2/
.. _NIST-SP-800-132: https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-132.p…
-
+.. _stackexchange pbkdf2 iterations question: https://security.stackexchange.com/questions/3959/recommended-of-iterations…
.. seealso::
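The updated doc example deliberately uses a throwaway salt (`b'bad salt'*2`) and an application-specific iteration count. In practice the salt should be random and stored alongside the derived key; a hedged sketch along those lines (parameter choices here are illustrative, not a recommendation):

```python
import hashlib
import os

our_app_iters = 500_000       # tune per NIST SP 800-132 guidance
salt = os.urandom(16)         # random per-password salt, saved with the hash
dk = hashlib.pbkdf2_hmac('sha256', b'password', salt, our_app_iters)

# With dklen left at its default, pbkdf2_hmac returns the digest size
# of the chosen hash: 32 bytes for SHA-256.
print(len(dk))  # 32
```

Verification later recomputes `pbkdf2_hmac` with the stored salt and iteration count and compares the result in constant time (e.g. `hmac.compare_digest`).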