[Mailman-Users] MM3 Test "" Hangs

Tom Browder writes:
We really appreciate your efforts to test the betas of Mailman 3. But please do be aware that although there are sites already successfully using Mailman 3 in production, the development team doesn't recommend use of any of the components (core, Postorius, HyperKitty) in production yet.
I have no idea what to do next,
Nothing. :-) I've already Cc'd (and set Reply-To to) mailman-developers, which is a more appropriate place for this report. (Many Mailman-Users are not interested in MM3 yet, while Mailman-Developers are by definition, as MM2 is basically end-of-life. Also, some relevant developers may read mailman-developers more frequently than they read mailman-users.)
Actually, I do have a couple of ideas. First, you should always report the whole error trace (if you think that's ugly in an email, attach it as a file). In this particular case, I suspect that the problem is in the test before the one that caused the Exception, which failed leaving the database locked. It would be very helpful to identify that test, which would probably be the *first* frame in the trace.
Second, you should look in the server's logs to see if there were any errors that might have caused the incomplete transaction.
and help or ideas would be greatly appreciated.
If you have no idea, then reporting to the developers is the best you can do. Use "Mailman Developers" <mailman-developers@python.org> or report via Launchpad.
Pretty clearly what's happened is that some previous test locked the database (probably anything that accesses the database does so at least long enough to read the whole record), and either (1) that test failed to unlock the database, (2) the test framework failed to unlock the database, or (3) the tests were improperly sequenced in some way and the database didn't get unlocked. It's quite possible that this failure could never be replicated in actual use, as tests often mock up some component that would normally ensure that any pending transaction gets discarded and the database unlocked.
Unfortunately I don't have an up-to-date test installation (it's on my list for early next week), and looking at the test file doesn't tell me anything. Perhaps Barry has an idea for a fix, or a workaround. And there's probably a way to skip that test, but I don't know nose very well.
Steve
Original report follows:

I encountered database locks when using SQLite with Mailman 3 and its tools. Switching to PostgreSQL avoids the locking. You might want to explore that route.
HyperKitty has moved to GitHub, so the Launchpad reference is quite out of date for that project.
Hope that helps.
Regards
On 26 Feb 2014 06:46, "Stephen J. Turnbull" <stephen@xemacs.org> wrote:

On Feb 26, 2014, at 02:45 PM, Stephen J. Turnbull wrote:
I have no idea what to do next,
This is clearly a bug, although I think it's relatively recent, so it might be worth seeing if earlier revisions avoid the problem. Yes, I can reproduce it.
The interesting thing is that the test is in rest/docs/membership.rst, so these are multiprocess-related bugs. Typically when this happens (and it will only be with the default SQLite database, as observed by others), it's a bug in the test, not necessarily a bug in the core.
Tests which involve multiple processes, as the REST tests do (i.e. the foreground testing process and a background runner process) have to be careful to release the database lock when they expect background processes to access the database. Releasing the lock means calling .commit() or .abort() at the appropriate time. The thing to keep in mind is that with Storm, even doing an operation that results in a database query opens a transaction and thus acquires the lock.
In the context of membership.rst, what this probably means is that somewhere in the doctest there's a database query with a missing explicit .commit() or .abort() before the background REST runner process executes. Tracing through the doctest to find out exactly where it hangs usually helps isolate where the missing commit/abort should go.
Of course, it's possible that there's a missing commit/abort in the core, but I rather doubt it, since that's pretty well tied into the REST runner's HTTP transaction machinery, and other REST tests don't exhibit the hang.
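The lock behaviour described above can be reproduced outside Mailman. What follows is only a minimal sketch using Python's standard sqlite3 module, not Storm or Mailman's test harness. The table name, the connection names `foreground` and `background`, and the explicit BEGIN IMMEDIATE (standing in for the transaction that Storm opens implicitly) are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def demo_lock():
    """Show a second connection being blocked until the first one commits."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    # timeout=0.1: give up quickly instead of hanging the way the tests did.
    foreground = sqlite3.connect(path, timeout=0.1, isolation_level=None)
    background = sqlite3.connect(path, timeout=0.1, isolation_level=None)
    foreground.execute("CREATE TABLE member (email TEXT)")

    # The "test" side opens a transaction and does not end it.
    foreground.execute("BEGIN IMMEDIATE")
    foreground.execute("INSERT INTO member VALUES ('anne@example.com')")

    # The "runner" side now tries to write and cannot get the lock.
    try:
        background.execute("BEGIN IMMEDIATE")
    except sqlite3.OperationalError:
        blocked = True
    else:
        background.execute("ROLLBACK")
        blocked = False

    # Ending the first transaction releases the lock ...
    foreground.execute("COMMIT")

    # ... and the second connection can proceed.
    background.execute("BEGIN IMMEDIATE")
    background.execute("INSERT INTO member VALUES ('bart@example.com')")
    background.execute("COMMIT")
    count = background.execute("SELECT COUNT(*) FROM member").fetchone()[0]

    foreground.close()
    background.close()
    return blocked, count
```

With a short timeout the second connection fails fast. With no timeout limit, or a test that simply waits for the background process, the same situation shows up as a hang.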
Tom, if you're up for debugging it, that would be great. If not, no worries. The test suite hangs for me, so I'll find some time this weekend to take a look.
-Barry

Barry Warsaw <barry@list.org> wrote:
I am trying to get a bit caught up with MM 3, so I am trying to follow the instructions in src/mailman/docs/START.rst (I'm using the head of the lp:mailman branch).
I set up a virtualenv in /var/py27 on Ubuntu 13.10 with Python 2.7.5+ and activated it and ran
python setup.py develop
in my branch directory. My first issue was I had zope.interface-4.0.5 in /usr/lib/python2.7/dist-packages/, and instead of ignoring that and installing zope.interface-4.1.0 in /var/py27/lib/python2.7/site-packages/, setup just complained that zope.interface wasn't new enough.
I moved zope.interface-4.0.5 aside and reran setup and it was happy this time.
I then ran
nose2 -v &> test.log
which encountered one error before hanging apparently with a locked sqlite3 database. My test.log is attached. It is somewhat different from Tom's in that it hung on a different test, and the actual sequence of tests is different, and I encountered an error in a test.
I then tried to run
python setup.py build_sphinx
to build the docs, and this gives
error: invalid command 'build_sphinx'
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

On 02/26/2014 02:39 PM, Mark Sapiro wrote:
The error (failed test) was due to the default encoding in this Python installation being UTF-8 rather than ASCII. Changing sitecustomize.py to not set UTF-8 allowed the test to succeed.
-- Mark Sapiro <mark@msapiro.net>

On 02/26/2014 06:42 PM, Barry Warsaw wrote:
It's more than just locale. You have to run
import sys
sys.getdefaultencoding()
and see what it returns. Normally, it returns 'ascii' even on systems whose locale is UTF-8. You have to add or enable something in sitecustomize.py (on Ubuntu, a symlink to /etc/python2.7/sitecustomize.py) to get it to use the locale.
Anyway, when sys.getdefaultencoding() returns 'UTF-8', one of the tests in src/mailman/model/docs/registration.rst fails because it's expecting
InvalidEmailAddressError: \xa0@example.com
and instead of \xa0, it gets the UTF-8 encoding of a no-break space.
I've filed <https://bugs.launchpad.net/mailman/+bug/1285496> for this.
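To make the mismatch concrete, here is a small sketch of how a no-break space (U+00A0) looks under different encodings. It is written in Python 3 syntax, which is an assumption of this example (the thread concerns Python 2.7), and the helper names are hypothetical.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE, the character in the failing address


def encodings_of_nbsp():
    """Return the byte sequences for U+00A0 in two encodings."""
    return {
        "utf-8": NBSP.encode("utf-8"),      # two bytes
        "latin-1": NBSP.encode("latin-1"),  # one byte, 0xA0
    }


def ascii_can_encode(text):
    """Report whether text survives a strict ASCII encode."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True
```

The single code point that the doctest writes as \xa0 becomes the two bytes 0xC2 0xA0 once it is encoded as UTF-8, so output that depends on the default encoding no longer matches the expected text.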
-- Mark Sapiro <mark@msapiro.net>

On Feb 26, 2014, at 09:21 PM, Mark Sapiro wrote:
You have to add or enable something in sitecustomize.py (on Ubuntu, a symlink to /etc/python2.7/sitecustomize.py) to get it to use the locale.
Have you changed your sitecustomize.py file?
You're right that sys.getdefaultencoding() gets initialized to 'ascii' everywhere for Python 2.7. The default site.py doesn't change that (even on Ubuntu), and then it deletes sys.setdefaultencoding, so it's not possible to change that after initialization.
How are you getting sys.getdefaultencoding() to return 'UTF-8'?
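A quick way to inspect both facts from inside a running interpreter is sketched below. The function name is hypothetical. On Python 3 (an assumption of this example) sys.setdefaultencoding does not exist at all, while on Python 2.7 it is normally absent because site.py removed it.

```python
import sys


def encoding_report():
    """Summarize the interpreter's default encoding state."""
    return {
        # What implicit str/unicode conversions would use.
        "default_encoding": sys.getdefaultencoding(),
        # False once site.py has deleted the setter (or on Python 3).
        "can_set": hasattr(sys, "setdefaultencoding"),
    }
```

If "can_set" is False but "default_encoding" is not 'ascii' on a Python 2.7 system, something that ran during startup, such as sitecustomize.py, must have changed it.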
-Barry

On 02/27/2014 01:04 PM, Barry Warsaw wrote:
Yes, I think so. I no longer remember exactly, but this is what I have now:
$ ll /usr/lib/python2.7/sitecustomize.py
lrwxrwxrwx 1 root root 31 Sep 19 07:47 /usr/lib/python2.7/sitecustomize.py -> /etc/python2.7/sitecustomize.py
$ cat /etc/python2.7/sitecustomize.py
# Set default encoding for locale (see site.py)
import sys
import locale
loc = locale.getdefaultlocale()
if loc[1]:
    sys.setdefaultencoding(loc[1])
# install the apport exception handler if available
try:
    import apport_python_hook
except ImportError:
    pass
else:
    apport_python_hook.install()
I think I added the first part based on stuff in the setencoding() function in site.py
Right, so if you are going to setdefaultencoding(), you need to do it in site.py or sitecustomize.py.
-- Mark Sapiro <mark@msapiro.net>

On Feb 26, 2014, at 02:39 PM, Mark Sapiro wrote:
which encountered one error before hanging apparently with a locked sqlite3 database.
This is my fault. I had a couple of revisions sitting in my local branch that I hadn't pushed to Launchpad. Please pull the latest trunk revision (r7234), recreate your virtualenv and try again.
Do you have the python-sphinx package installed? E.g. on Debian/Ubuntu:
$ sudo apt-get install python-sphinx
You'll also need to be sure that you created your virtualenv with --system-site-packages, otherwise your venv won't find Sphinx. (I checked that the START.rst file does mention --system-site-packages).
That's not terrible though. Inside your venv you can always do
$ pip install sphinx
and get the command back.
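Whether the active interpreter can actually see Sphinx can be checked directly. This is a rough sketch with hypothetical helper names, written for Python 3 (an assumption; the thread uses Python 2.7). It only handles top-level module names such as "sphinx".

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


def in_virtualenv():
    """Best-effort check for running inside a venv or virtualenv."""
    # venv sets sys.base_prefix; older virtualenv releases set sys.real_prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix
```

Running `module_available("sphinx")` inside the activated environment shows whether the package is visible there, whichever way it was installed.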
Cheers, -Barry

On 02/26/2014 06:40 PM, Barry Warsaw wrote:
OK. After pulling the latest trunk, the tests no longer hang.
My fault here. I work on a couple of different machines with different versions of Ubuntu and different packages installed, and this one didn't have python-sphinx. Fixed now.
-- Mark Sapiro <mark@msapiro.net>

On Feb 26, 2014, at 02:39 PM, Mark Sapiro wrote:
I moved zope.interface-4.0.5 aside and reran setup and it was happy this time.
Looks like upstream released zope.interface 4.1.0, but this hasn't been pulled into Debian or Ubuntu yet. This breaks other dependencies referenced by --system-site-packages.
I'll work on getting a newer zope.interface into the distros, but in the meantime, when the
python setup.py develop
command stops with this error, just
pip install -U zope.interface
and then run the develop command again.
-Barry

On 03/02/2014 11:46 AM, Barry Warsaw wrote:
OK. I now have a puzzle I can't seem to solve.
I have two machines with different software installation histories and different OS. Various hardware/driver issues preclude my running the same OS on both. Actually, there's a third machine with yet another OS version, but I haven't tried running MM 3 on it.
Anyway, on a machine with Ubuntu 13.10 64 bit, I have a virtualenv with packages up to date, and "python setup.py develop" runs fine and "nose2 -v" runs fine.
On the other machine with Ubuntu 12.10 32 bit, I again have a virtualenv with packages up to date, and "python setup.py develop" runs fine, but "nose2 -v" dies immediately with the attached traceback. Both zope.component and zope.interface are the same version on both machines. /var/py27 is the virtualenv.
-- Mark Sapiro <mark@msapiro.net>

On Mar 03, 2014, at 07:00 PM, Mark Sapiro wrote:
I don't know for sure, but I suspect that zope.interface in your virtualenv isn't actually 4.1.0. I say this because the changelog on PyPI says this:
-----snip snip-----
zope.interface Changelog
4.1.0 (2014-02-05)
- Updated boostrap.py to version 2.2.
- Added @named(name) declaration, that specifies the component name, so it does not have to be passed in during registration.
-----snip snip-----
https://pypi.python.org/pypi/zope.interface/4.1.0
What happens if you
pip install -U zope.interface
in the virtualenv?
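One way to confirm which version the environment really resolves is sketched below. It relies on importlib.metadata from Python 3.8 or later, which is an assumption of this example and not available on the Python 2.7 used in the thread. The helper names are hypothetical, and the version comparison is deliberately simplistic: it only handles purely numeric dotted versions such as "4.1.0".

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '4.1.0' into (4, 1, 0); numeric dotted versions only."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, required):
    """Compare two numeric dotted version strings."""
    return version_tuple(installed) >= version_tuple(required)
```

For example, calling `installed_version("zope.interface")` inside the activated virtualenv and passing the result to `meets_minimum(..., "4.1.0")` would show whether the newer release is the one being picked up.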
-Barry

On 03/03/2014 07:10 PM, Barry Warsaw wrote:
What happens if you
pip install -U zope.interface
in the virtualenv?
I don't know what was wrong, but I started over. I removed the virtualenv and then did
virtualenv --system-site-packages /var/py27
source /var/py27/bin/activate
pip install -U zope.interface
cd /var/MM/3.0 (my MM 3 directory)
python setup.py develop
nose2 -v
and this time all was well except two tests failed. The only thing I can think of is at one point, not realizing this was a 32-bit OS, I rsync'd the virtualenv from the 64-bit machine. I thought I had cleaned that all up, but there may have been some .pyc or ?? left.
Anyway, it appears this was my fault in not setting up the virtualenv properly somehow.
Regarding the failed tests, they seem to possibly be test issues rather than core issues, but the interesting thing is they don't fail on the other machine. Possibly there's a race condition. Also, if I just run the failing tests (and one other) with
nose2 -v -P "users.rst|inject.rst"
they don't fail.
-- Mark Sapiro <mark@msapiro.net>

On Mar 03, 2014, at 09:30 PM, Mark Sapiro wrote:
That might have been the problem. I think zope.interface has some C extension modules for performance, and that would not be cross-architecture compatible. So it's possible that broke the installation or importability of 4.1.0 in your venv.
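A copied environment can be inspected for this kind of problem. The sketch below is Python 3 (an assumption of this example) with hypothetical helper names. It reports the interpreter's pointer width and lists files under a directory that look like compiled extension modules for the running interpreter. It does not prove that any particular file is incompatible.

```python
import importlib.machinery
import os
import struct


def pointer_bits():
    """Return 32 or 64 depending on how the interpreter was built."""
    return struct.calcsize("P") * 8


def find_extension_modules(root):
    """List files under root whose names end in an extension-module suffix."""
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(suffixes):
                found.append(os.path.join(dirpath, filename))
    return sorted(found)
```

Any file this turns up in a site-packages directory copied from a machine with a different architecture is a candidate for rebuilding, which is effectively what recreating the virtualenv did.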
-Barry

participants (6)
- Barry Warsaw
- Barry Warsaw
- Mark Sapiro
- Nicolas Karageuzian
- Stephen J. Turnbull
- Tom Browder