From 2QdxY4RzWzUUiLuE at potatochowder.com Tue Oct 1 11:34:45 2024
From: 2QdxY4RzWzUUiLuE at potatochowder.com (2QdxY4RzWzUUiLuE at potatochowder.com)
Date: Tue, 1 Oct 2024 11:34:45 -0400
Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
In-Reply-To: <87jzesr3u5.fsf@nosuchdomain.example.com>
References: <4XHQPG4LzsznVwM@mail.python.org> <4XHbxS5jl4znVGD@mail.python.org> <87jzesr3u5.fsf@nosuchdomain.example.com>
Message-ID:

On 2024-09-30 at 18:48:02 -0700,
Keith Thompson via Python-list wrote:

> 2QdxY4RzWzUUiLuE at potatochowder.com writes:
> [...]
> > In Common Lisp, you can write integers as #nnR[digits], where nn is the
> > decimal representation of the base (possibly without a leading zero),
> > the # and the R are literal characters, and the digits are written in
> > the intended base. So the input #16fFFFF is read as the integer 65535.
>
> Typo: You meant #16RFFFF, not #16fFFFF.

Yep. Sorry.

From 2QdxY4RzWzUUiLuE at potatochowder.com Tue Oct 1 11:47:24 2024
From: 2QdxY4RzWzUUiLuE at potatochowder.com (2QdxY4RzWzUUiLuE at potatochowder.com)
Date: Tue, 1 Oct 2024 11:47:24 -0400
Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
In-Reply-To:
References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
Message-ID:

On 2024-09-30 at 21:34:07 +0200,
Regarding "Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API,"
Left Right via Python-list wrote:

> > What am I missing? Handwavingly, start with the first digit, and as
> > long as the next character is a digit, multiply the accumulated result
> > by 10 (or the appropriate base) and add the next value. Oh, and handle
> > scientific notation as a special case, and perhaps fail spectacularly
> > instead of recovering gracefully in certain edge cases.
> > And in the
> > pathological case of a single number with 60 billion digits, run out of
> > memory (and complain loudly to the person who claimed that the file
> > contained a "dataset"). But why do I need to start with the least
> > significant digit?
>
> You probably forgot that it has to be _streaming_. Suppose you parse
> the first digit: can you hand this information over to an external
> function to process the parsed data? -- No! because you don't know the
> magnitude yet. What about two digits? -- Same thing. You cannot
> leave the parser code until you know the magnitude (otherwise the
> information is useless to the external code).

If I recognize the first digit, then I *can* hand that over to an
external function to accumulate the digits that follow.

> So, even if you have enough memory and don't care about special cases
> like scientific notation: yes, you will be able to parse it, but it
> won't be a streaming parser.

Under that constraint, I'm not sure I can parse anything. How can I
parse a string (and hand it over to an external function) until I've
found the closing quote?

How much state can a parser maintain (before it invokes an external
function) and still be considered streaming? I fear that we may be
getting hung up on terminology rather than solving the problem at hand.

From thomas at python.org Tue Oct 1 12:39:31 2024
From: thomas at python.org (Thomas Wouters)
Date: Tue, 1 Oct 2024 09:39:31 -0700
Subject: [RELEASE] Python 3.13.0rc3 and 3.12.7 released.
Message-ID:

This is not the release you're looking for... (unless you're looking for
3.12.7.)

Because no plan survives contact with reality, instead of the actual
Python 3.13.0 release we have a new Python 3.13 release candidate today.
Python 3.13.0rc3 rolls back the incremental cyclic garbage collector
(GC), which was added in one of the alpha releases. The incremental GC
had more significant performance regressions in specific workloads than
we expected.
Rather than try to fiddle with its details in the hope of fixing them
(and not making anything else worse) we decided to revert back to the
old GC in 3.13. Work on the incremental GC will continue in 3.14. We
also took the opportunity to fix some other (rare) bugs and issues found
in 3.13.0rc2. The final release of Python 3.13.0 will now happen next
week, Monday October 7th.

In an effort to return to normalcy, we've also released Python 3.12.7 as
scheduled, despite the expedited release a month ago. It's important to
be regular!

3.13.0rc3

https://www.python.org/downloads/release/python-3130rc3/

The final cut of 3.13.0 (really, honest). Besides the incremental GC
revert it contains a small number of other fixes, as well as many
documentation improvements and testsuite improvements (~145 changes in
total).

Call to action

We strongly encourage maintainers of third-party Python projects to
prepare their projects for 3.13 compatibility during this phase, and
where necessary publish Python 3.13 wheels on PyPI to be ready for the
final release of 3.13.0. Any binary wheels built against Python
3.13.0rc1 and later will work with future versions of Python 3.13. As
always, report any issues to the Python bug tracker.

Please keep in mind that this is a preview release and while it's as
close to the final release as we can get it, its use is not recommended
for production environments. Next week, though!

New features in Python 3.13

- A new and improved interactive interpreter, based on PyPy's,
  featuring multi-line editing and color support, as well as colorized
  exception tracebacks.
- An *experimental* free-threaded build mode, which disables the Global
  Interpreter Lock, allowing threads to run more concurrently. The build
  mode is available as an experimental feature in the Windows and macOS
  installers as well.
- A preliminary, *experimental* JIT, providing the ground work for
  significant performance improvements.
- The locals() builtin function (and its C equivalent) now has
  well-defined semantics when mutating the returned mapping, which
  allows debuggers to operate more consistently.
- A modified version of mimalloc is now included, optional but enabled
  by default if supported by the platform, and required for the
  free-threaded build mode.
- Docstrings now have their leading indentation stripped, reducing
  memory use and the size of .pyc files. (Most tools handling docstrings
  already strip leading indentation.)
- The dbm module has a new dbm.sqlite3 backend that is used by default
  when creating new files.
- The minimum supported macOS version was changed from 10.9 to 10.13
  (High Sierra). Older macOS versions will not be supported going
  forward.
- WASI is now a Tier 2 supported platform. Emscripten is no longer an
  officially supported platform (but Pyodide continues to support
  Emscripten).
- iOS is now a Tier 3 supported platform.
- Android is now a Tier 3 supported platform as well.

Python 3.12.7

https://www.python.org/downloads/release/python-3127/

A small release since 3.12.6 was only a month ago, but nevertheless
3.12.7 contains ~120 bug fixes, build improvements and documentation
changes.

More resources

- Python 3.13 Online Documentation
- PEP 719, Python 3.13 Release Schedule
- Report bugs at Issues · python/cpython · GitHub.
- Help fund Python directly (or via GitHub Sponsors), and support the
  Python community.

Enjoy the new releases

Thanks to all of the many volunteers who help make Python Development
and these releases possible! Please consider supporting our efforts by
volunteering yourself or through organization contributions to the
Python Software Foundation.
Regards from a positively *melting* Menlo Park for some reason this time,
Your release team,
Thomas Wouters
Łukasz Langa
Ned Deily
Steve Dower

From olegsivokon at gmail.com Tue Oct 1 17:03:01 2024
From: olegsivokon at gmail.com (Left Right)
Date: Tue, 1 Oct 2024 23:03:01 +0200
Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
In-Reply-To:
References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
Message-ID:

> If I recognize the first digit, then I *can* hand that over to an
> external function to accumulate the digits that follow.

And what is that external function going to do with this information?
The point is you didn't parse anything if you just sent the digit. You
just delegated the parsing further. Parsing is only meaningful if you
extracted some information, but your idea is, essentially "what if I do
nothing?".

> Under that constraint, I'm not sure I can parse anything. How can I
> parse a string (and hand it over to an external function) until I've
> found the closing quote?

Nobody says that parsing a number is the only pathological case. You,
however, exaggerate by saying you cannot parse _anything_. You can
parse booleans or null, for example. There's no problem there.

Again, I think you misunderstand what streaming is for. Let me remind
you: it's for processing information as it comes, potentially,
indefinitely. This has far more important implications than what you
find in computer science. For example, some mathematicians use the same
argument to show that real numbers are either fiction or useless:
consider adding two real numbers (where real numbers are potentially
infinite strings of decimal digits after the period) -- there's no way
to prove that such an addition is possible because you would need an
infinite proof for that (because you need to start adding from the least
significant digit).
In principle, any language that has infinite words will have the same
problem with streaming. If you ever pondered h/w or low-level
protocols s.a. SCSI or IP, you'd see that they are specifically
designed in such a way as to never have infinite words (because they
must be amenable to streaming). Consider also an interesting
consequence of SCSI not being able to have infinite words: this means,
besides other things that fsync() is nonsense! :) If you aren't
familiar with the concept: UNIX filesystem API suggests that it's
possible to destage arbitrary large file (or a chunk of file) to disk.
But SCSI is built of finite "words" and to describe an arbitrary large
file you'd need to list all the blocks that constitute the file! And
that's why fsync() and family are so hated by people who deal with
storage: the only way to implement fsync() in compliance with the
standard is to sync _everything_ (and it hurts!)

From greg.ewing at canterbury.ac.nz Tue Oct 1 17:48:24 2024
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 2 Oct 2024 10:48:24 +1300
Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
In-Reply-To:
References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
Message-ID:

On 1/10/24 8:34 am, Left Right wrote:
> You probably forgot that it has to be _streaming_. Suppose you parse
> the first digit: can you hand this information over to an external
> function to process the parsed data? -- No! because you don't know the
> magnitude yet.

By that definition of "streaming", no parser can ever be streaming,
because there will be some constructs that must be read in their
entirety before a suitably-structured piece of output can be
emitted.
The context of this discussion about integers is the claim that
they *could* be parsed incrementally if they were written little
endian instead of big endian, but the same argument applies either
way.

--
Greg

From greg.ewing at canterbury.ac.nz Tue Oct 1 18:07:41 2024
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 2 Oct 2024 11:07:41 +1300
Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
In-Reply-To:
References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
Message-ID:

On 2/10/24 10:03 am, Left Right wrote:
> Consider also an interesting
> consequence of SCSI not being able to have infinite words: this means,
> besides other things that fsync() is nonsense! :) If you aren't
> familiar with the concept: UNIX filesystem API suggests that it's
> possible to destage arbitrary large file (or a chunk of file) to disk.
> But SCSI is built of finite "words" and to describe an arbitrary large
> file you'd need to list all the blocks that constitute the file!

I don't follow. What fsync() does is ensure that any data buffered
in the kernel relating to the file is sent to the storage device.
It can send as many blocks of data over SCSI as required to
achieve this. There's no requirement for it to be atomic at the
level of the interface between the kernel and the hardware.

Some devices do their own buffering in ways that are invisible to
the software, so fsync() can't guarantee that the data is actually
written to the storage medium. But that's a problem stemming from
the design of the hardware, not the design of the protocol for
communicating with the hardware.

> the only way to implement fsync() in compliance with the
> standard is to sync _everything_

Again I'm not sure what you mean here.
It may be difficult for the
kernel to track down exactly what data is relevant to a particular
file, and so the kernel programmers take the easy way out and
just implement fsync() as sync(). But again that has nothing to
do with the protocol.

--
Greg

From avi.e.gross at gmail.com Tue Oct 1 19:26:52 2024
From: avi.e.gross at gmail.com (avi.e.gross at gmail.com)
Date: Tue, 1 Oct 2024 19:26:52 -0400
Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
In-Reply-To:
References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
Message-ID: <020101db1459$65b0c4d0$31124e70$@gmail.com>

This discussion has become less useful.

We can all agree that in Computer Science, real infinities are avoided,
and frankly, need not be taken seriously in any serious program.

You can store all kinds of infinities quite compactly as in a
transcendental number you can derive to as many decimal points as you
like. Want 1/7 to a thousand decimal places, no problem. You can be
given a digit 1 and a digit 7 and asked to do a division to as many
digits as you wish in a deterministic manner. I can think of quite a few
generators that could easily supply the next digit, or just keep giving
the next element from 142857 each time from a circular loop.

Sines, cosines, pi, e and so on, can often be calculated to arbitrary
precision by evaluating things like infinite Taylor Series as many times
as needed up to the precision of the data holding the number as you move
along.

Similar ideas allow generators to give you as many primes as you want,
and no more.

So, if you can store arbitrary python code as part of your JSON, you can
send quite a bit of somewhat compressed data.

The real problem is how the JSON is set up.
If you take umpteen data
structures and wrap them all in something like a list, then it may be a tad
hard to stream as you may not necessarily be examining the contents till the
list finishes gigabytes later. But if, instead, you send lots of smaller
parts, such as perhaps sending each row of something like a data.frame
individually, the other side can recombine them incrementally to a larger
structure such as a data.frame and do some logic on it as it streams, such
as keeping only some columns and discarding the rest, or applying filters
that only keep rows you care about. And, of course, all rows could be
appended to one and perhaps more .CSV files as well so if you need multiple
passes on the data, it can now be processed locally in various modes,
including "streamed".

I think that for some purposes, it makes some sense to not stream anything
but results. I mean consider any database that allows a remote login and SQL
commands that only stream results. If I only want info on records about
company X between July 1 and September 15 of a particular year and only if
the amount paid remains zero or is less than the amount owed, ...

From 2QdxY4RzWzUUiLuE at potatochowder.com Tue Oct 1 20:20:59 2024
From: 2QdxY4RzWzUUiLuE at potatochowder.com (2QdxY4RzWzUUiLuE at potatochowder.com)
Date: Tue, 1 Oct 2024 20:20:59 -0400
Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
In-Reply-To:
References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
Message-ID:

On 2024-10-01 at 23:03:01 +0200,
Left Right wrote:

> > If I recognize the first digit, then I *can* hand that over to an
> > external function to accumulate the digits that follow.
>
> And what is that external function going to do with this information?
> The point is you didn't parse anything if you just sent the digit.
> You just delegated the parsing further. Parsing is only meaningful if
> you extracted some information, but your idea is, essentially "what if
> I do nothing?".

If the parser detects the first digit of a number, then the parser can
read digits one at a time (i.e., "streaming"), assimilate and accumulate
the value of the number being parsed, and successfully finish parsing
the number when it reads a non-digit. Whether the function that
accumulates the value during the process is internal or external isn't
relevant; the point is that it is possible to parse integers from most
significant digit to least significant digit under a streaming model
(and if you're sufficiently clever, you can even write partial results
to external storage and/or another transmission protocol, thus allowing
for numbers bigger (as measured by JSON or your internal representation)
than your RAM).
At most, the parser has to remember the non-digit character it read so
that it (the parser) can begin to parse whatever comes after the number.
Does that break your notion of "streaming"?

Why do I have to start with the least significant digit?

> > Under that constraint, I'm not sure I can parse anything. How can I
> > parse a string (and hand it over to an external function) until I've
> > found the closing quote?
>
> Nobody says that parsing a number is the only pathological case. You,
> however, exaggerate by saying you cannot parse _anything_. You can
> parse booleans or null, for example. There's no problem there.

My intent was only to repeat what you implied: that any parser that
reads its input until it has parsed a value is not streaming.

So how much information can the parser keep before you consider it not
to be "streaming"?

[...]

> In principle, any language that has infinite words will have the same
> problem with streaming [...]

So what magic allows anyone to stream any JSON file over SCSI or IP?
Let alone some kind of "live stream" that by definition is indefinite,
even if it only lasts a few tenths of a second?

> [...] If you ever pondered h/w or low-level
> protocols s.a. SCSI or IP [...]

I spent a good deal of my career designing and implementing all manner
of communications protocols, from transmitting and receiving single bits
over a wire all the way up to what are now known as session and
presentation layers. Some imposed maximum lengths in certain places;
some allowed for indefinite amounts of data to be transferred from one
end to the other without stopping, resetting, or overflowing. And yet
somehow, the universe never collapsed.

If you believe that some implementation of fsync fails to meet a
specification, or fails to work correctly on files containing JSON, then
file a bug report.
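
The digit-at-a-time accumulation described in the message above can be
sketched in Python roughly as follows. This is only an illustration and
not code posted to the thread: the function name parse_integers is
hypothetical, and the sketch assumes plain unsigned decimal integers,
ignoring signs, fractions, exponents and JSON strings. It reads one
character at a time, most significant digit first, and keeps no state
beyond the running value.

```python
import io
from typing import Iterator, TextIO

DIGITS = "0123456789"


def parse_integers(stream: TextIO) -> Iterator[int]:
    """Yield each run of decimal digits in `stream` as an int.

    Hypothetical helper for illustration: reads one character at a
    time and never looks back at earlier input.
    """
    value = None  # None means "not currently inside a number"
    while True:
        ch = stream.read(1)
        if ch and ch in DIGITS:
            # Most significant digit first: shift the accumulated
            # value one decimal place and add the new digit.
            value = (value or 0) * 10 + int(ch)
            continue
        if value is not None:
            # A non-digit (or end of input) terminates the number.
            yield value
            value = None
        if not ch:
            return  # end of input


if __name__ == "__main__":
    for number in parse_integers(io.StringIO("[12, 345, 6]")):
        print(number)
```

The only thing carried from one character to the next is the
accumulator itself; whether that counts as "streaming" is the
terminology question the thread is arguing about.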
From greg.ewing at canterbury.ac.nz Wed Oct 2 01:27:54 2024
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 2 Oct 2024 18:27:54 +1300
Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
In-Reply-To:
References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net> <020101db1459$65b0c4d0$31124e70$@gmail.com>
Message-ID:

On 2/10/24 12:26 pm, avi.e.gross at gmail.com wrote:
> The real problem is how the JSON is set up. If you take umpteen data
> structures and wrap them all in something like a list, then it may be a tad
> hard to stream as you may not necessarily be examining the contents till the
> list finishes gigabytes later.

Yes, if you want to process the items as they come in, you might
be better off sending a series of separate JSON strings, rather than
one JSON string containing a list.

Or, use a specialised JSON parser that processes each item of the
list as soon as it's finished parsing it, instead of collecting the
whole list first.

--
Greg

From guenther.sohler at gmail.com Wed Oct 2 09:26:47 2024
From: guenther.sohler at gmail.com (Guenther Sohler)
Date: Wed, 2 Oct 2024 15:26:47 +0200
Subject: Python crash together with threads
Message-ID:

My Software project is working fine in most of the cases
(www.pythonscad.org)
however I am right now isolating a scenario, which makes it crash
permanently.

It does not happen with Python 3.11.6 (and possibly below), it happens with
3.12 and above
It does not happen when not using Threads.

However due to the architecture of the program I am forced to evaluate some
parts in main thread and some parts in a dedicated Thread. The Thread is
started with QThread(QT 5.0)
whereas I am quite sure that program flows do not overlap.

When I just execute my 1st very simple Python function inside the newly
created thread, like:

PyObject *a = PyFloat_FromDouble(3.3);

my program crashes with this Stack trace

#0  0x00007f6837fe000f in _PyInterpreterState_GET () at ./Include/internal/pycore_pystate.h:179
#1  get_float_state () at Objects/floatobject.c:38
#2  PyFloat_FromDouble (fval=3.2999999999999998) at Objects/floatobject.c:136
#3  0x00000000015a021f in python_testfunc() ()
#4  0x0000000001433301 in CGALWorker::work() ()
#5  0x0000000000457135 in CGALWorker::qt_static_metacall(QObject*, QMetaObject::Call, int, void**) ()
#6  0x00007f68364d0f9f in void doActivate(QObject*, int, void**) () at /lib64/libQt5Core.so.5
#7  0x00007f68362e66ee in QThread::started(QThread::QPrivateSignal) () at /lib64/libQt5Core.so.5
#8  0x00007f68362e89c4 in QThreadPrivate::start(void*) () at /lib64/libQt5Core.so.5
#9  0x00007f6835cae19d in start_thread () at /lib64/libc.so.6
#10 0x00007f6835d2fc60 in clone3 () at /lib64/libc.so.6


I suspect that this is a null pointer here

   See also _PyInterpreterState_Get()
   and _PyGILState_GetInterpreterStateUnsafe().  */
static inline PyInterpreterState* _PyInterpreterState_GET(void) {
    PyThreadState *tstate = _PyThreadState_GET();
#ifdef Py_DEBUG
    _Py_EnsureTstateNotNULL(tstate);
#endif
    # <<----------- suspect state is nullpointer
    return tstate->interp;
}

Any clues what's going on here, and how I can mitigate that?

From olegsivokon at gmail.com Wed Oct 2 02:05:02 2024
From: olegsivokon at gmail.com (Left Right)
Date: Wed, 2 Oct 2024 08:05:02 +0200
Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
In-Reply-To:
References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
Message-ID:

> By that definition of "streaming", no parser can ever be streaming,
> because there will be some constructs that must be read in their
> entirety before a suitably-structured piece of output can be
> emitted.

In the same email you replied to, I gave examples of languages for
which parsers can be streaming (in general): SCSI or IP. For some
languages (eg. everything in the context-free family) streaming
parsers are _in general_ impossible, because there are pathological
cases like the one with parsing numbers. But this doesn't mean that
you cannot come up with a parser that is only useful _sometimes_.
And, in practice, languages like XML or JSON do well with streaming,
even though in general it's impossible.

I'm sorry if this comes as a surprise. On one hand I don't want to
sound condescending, on the other hand, this is something that you'd
typically study in automata theory class. Well, not exactly in the
very same words, but you should be able to figure this stuff out if
you had that class.

From rosuav at gmail.com Wed Oct 2 09:59:41 2024
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 2 Oct 2024 23:59:41 +1000
Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
In-Reply-To:
References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
Message-ID:

On Wed, 2 Oct 2024 at 23:53, Left Right via Python-list wrote:
> In the same email you replied to, I gave examples of languages for
> which parsers can be streaming (in general): SCSI or IP.

You can't validate an IP packet without having all of it. Your notion
of "streaming" is nonsensical.

ChrisA

From rosuav at gmail.com Wed Oct 2 18:51:01 2024
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 3 Oct 2024 08:51:01 +1000
Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
In-Reply-To:
References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
Message-ID:

On Thu, 3 Oct 2024 at 08:48, Left Right wrote:
>
> > You can't validate an IP packet without having all of it. Your notion
> > of "streaming" is nonsensical.
>
> Whoa, whoa, hold your horses! "nonsensical" needs a little bit of
> justification :)
>
> It seems you don't understand the difference between words and
> languages! In my examples, IP _protocol_ is the language, sequences of
> IP packets are the words in the language. A language is amenable to
> streaming if the words of the language are repetition of sequences of
> symbols of the alphabet of fixed length. This is, essentially, like
> saying that the words themselves are regular.

One single IP packet is all you can parse. You're playing shenanigans
with words the way Humpty Dumpty does. IP packets are not sequences,
they are individuals.

ChrisA

From lkrupp at invalid.pssw.com.invalid Wed Oct 2 17:06:03 2024
From: lkrupp at invalid.pssw.com.invalid (Louis Krupp)
Date: Wed, 2 Oct 2024 15:06:03 -0600
Subject: Python crash together with threads
In-Reply-To:
References:
Message-ID: <%AiLO.42528$s7Ce.9174@fx46.iad>

On 10/2/2024 7:26 AM, Guenther Sohler wrote:
> My Software project is working fine in most of the cases
> (www.pythonscad.org)
> however I am right now isolating a scenario, which makes it crash
> permanently.
>
> It does not happen with Python 3.11.6 (and possibly below), it happens with
> 3.12 and above
> It does not happen when not using Threads.
>
> However due to the architecture of the program I am forced to evaluate some
> parts in main thread and some parts in a dedicated Thread. The Thread is
> started with QThread(QT 5.0)
> whereas I am quite sure that program flows do not overlap.
>
> When I just execute my 1st very simple Python function inside the newly
> created thread, like:
>
> PyObject *a = PyFloat_FromDouble(3.3);
>
> my program crashes with this Stack trace
>
> #0  0x00007f6837fe000f in _PyInterpreterState_GET () at
> ./Include/internal/pycore_pystate.h:179
> #1  get_float_state () at Objects/floatobject.c:38
> #2  PyFloat_FromDouble (fval=3.2999999999999998) at
> Objects/floatobject.c:136
> #3  0x00000000015a021f in python_testfunc() ()
> #4  0x0000000001433301 in CGALWorker::work() ()
> #5  0x0000000000457135 in CGALWorker::qt_static_metacall(QObject*,
> QMetaObject::Call, int, void**) ()
> #6  0x00007f68364d0f9f in void doActivate(QObject*, int, void**) ()
> at /lib64/libQt5Core.so.5
> #7  0x00007f68362e66ee in QThread::started(QThread::QPrivateSignal) () at
> /lib64/libQt5Core.so.5
> #8  0x00007f68362e89c4 in QThreadPrivate::start(void*) () at
> /lib64/libQt5Core.so.5
> #9  0x00007f6835cae19d in start_thread () at /lib64/libc.so.6
> #10 0x00007f6835d2fc60 in clone3 () at /lib64/libc.so.6
>
>
> I suspect that this is a null pointer here
>    See also _PyInterpreterState_Get()
>    and _PyGILState_GetInterpreterStateUnsafe().  */
> static inline PyInterpreterState* _PyInterpreterState_GET(void) {
>     PyThreadState *tstate = _PyThreadState_GET();
> #ifdef Py_DEBUG
>     _Py_EnsureTstateNotNULL(tstate);
> #endif
>     # <<----------- suspect state is nullpointer
>     return tstate->interp;
> }
>
> Any clues what's going on here, and how I can mitigate that?

Can you post a small, self-contained program that demonstrates the problem?

Louis

From olegsivokon at gmail.com Wed Oct 2 18:48:10 2024
From: olegsivokon at gmail.com (Left Right)
Date: Thu, 3 Oct 2024 00:48:10 +0200
Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
In-Reply-To:
References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
Message-ID:

> You can't validate an IP packet without having all of it. Your notion
> of "streaming" is nonsensical.

Whoa, whoa, hold your horses! "nonsensical" needs a little bit of
justification :)

It seems you don't understand the difference between words and
languages! In my examples, IP _protocol_ is the language, sequences of
IP packets are the words in the language. A language is amenable to
streaming if the words of the language are repetition of sequences of
symbols of the alphabet of fixed length. This is, essentially, like
saying that the words themselves are regular.

So, the follow-up question from you to me should be: how come strictly
context-free languages can still be parsed with streaming parsers? --
And the answer to that is it's possible to approximate context-free
languages with regular languages. In fact, this is a very interesting
subject, which unfortunately is usually overlooked in automata
classes. It's interesting in a sense that it's very accessible to the
students who already mastered the understanding of regular and
context-free formalisms.

So, streaming parsers (eg. SAX) are written for a regular language
that approximates XML. This is because in practice we will almost
never encounter more than N nesting levels in an XML, more than N
characters in an element name etc. (for some large enough N).
Something which allows us to create a regular language from a
context-free one.

NB. "Nonsensical" has a very precise meaning, when it comes to
discussing the truth value of a proposition, which I think you also
somehow didn't know about.
You seem to use "nonsensical" as a synonym to "wrong". But, unbeknownst to you, you said something else. You actually implied that there's no way to tell if my notion of streaming is correct or not. But, for the future reference: my notion of streaming is correct, and you would do better learning some materials about it before jumping to conclusions. From olegsivokon at gmail.com Wed Oct 2 18:56:36 2024 From: olegsivokon at gmail.com (Left Right) Date: Thu, 3 Oct 2024 00:56:36 +0200 Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API In-Reply-To: References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net> Message-ID: > One single IP packet is all you can parse. I worked for an undisclosed company which manufactures h/w for ISPs (4- and 8-unit boxes you mount on a rack in a datacenter). Essentially, big-big routers. So, I had the pleasure of writing software that parses IP _protocol_, and let me tell you: you have no idea what you just wrote. But, like I wrote earlier: you don't understand the distinction between languages and words. And in general, are just being stubborn and rude because you are trying to prove a point to someone you don't like, but, in reality, you just look more and more ridiculous. On Thu, Oct 3, 2024 at 12:51?AM Chris Angelico wrote: > > On Thu, 3 Oct 2024 at 08:48, Left Right wrote: > > > > > You can't validate an IP packet without having all of it. Your notion > > > of "streaming" is nonsensical. > > > > Whoa, whoa, hold your horses! "nonsensical" needs a little bit of > > justification :) > > > > It seems you don't understand the difference between words and > > languages! In my examples, IP _protocol_ is the language, sequences of > > IP packets are the words in the language. A language is amenable to > > streaming if the words of the language are repetition of sequences of > > symbols of the alphabet of fixed length. 
This is, essentially, like > > saying that the words themselves are regular. > > One single IP packet is all you can parse. You're playing shenanigans > with words the way Humpty Dumpty does. IP packets are not sequences, > they are individuals. > > ChrisA From ethan at stoneleaf.us Wed Oct 2 21:57:51 2024 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 2 Oct 2024 18:57:51 -0700 Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API In-Reply-To: References: Message-ID: This thread is derailing. Please consider it closed. -- ~Ethan~ Moderator From greg.ewing at canterbury.ac.nz Thu Oct 3 03:08:35 2024 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 3 Oct 2024 20:08:35 +1300 Subject: doRe: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API In-Reply-To: References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net> Message-ID: On 3/10/24 11:48 am, Left Right wrote: > So, streaming parsers (eg. SAX) are written for a regular language > that approximates XML. SAX doesn't parse a whole XML document, it parses small pieces of it independently and passes them on. It's more like a lexical analyser than a parser in that respect. -- Greg From olegsivokon at gmail.com Thu Oct 3 17:01:53 2024 From: olegsivokon at gmail.com (Left Right) Date: Thu, 3 Oct 2024 23:01:53 +0200 Subject: Python crash together with threads In-Reply-To: References: Message-ID: > whereas I am quite sure that program flows do not overlap. You can never be sure of this in Python. Virtually all objects in Python are allocated on heap, so instantiating integers, doing simple arithmetic etc. -- all of this requires synchronization because it will allocate memory for a shared pool. The description of _PyThreadState_GET states that callers must hold GIL. Does your code do that? It's not possible to divine that from the stack trace, but you'd probably know that. 
On Wed, Oct 2, 2024 at 3:29?PM Guenther Sohler via Python-list wrote: > > My Software project is working fine in most of the cases > (www.pythonscad.org) > however I am right now isolating a scenario, which makes it crash > permanently. > > It does not happen with Python 3.11.6 (and possibly below), it happens with > 3.12 and above > It does not happen when not using Threads. > > However due to the architecture of the program I am forced to evaluate some > parts in main thread and some parts in a dedicated Thread. The Thread is > started with QThread(QT 5.0) > whereas I am quite sure that program flows do not overlap. > > When I just execute my 1st very simple Python function inside the newly > created thread, like: > > PyObject *a = PyFloat_FromDouble(3.3); > > my program crashes with this Stack trace > > 0 0x00007f6837fe000f in _PyInterpreterState_GET () at > ./Include/internal/pycore_pystate.h:179 > #1 get_float_state () at Objects/floatobject.c:38 > #2 PyFloat_FromDouble (fval=3.2999999999999998) at > Objects/floatobject.c:136 > #3 0x00000000015a021f in python_testfunc() () > #4 0x0000000001433301 in CGALWorker::work() () > #5 0x0000000000457135 in CGALWorker::qt_static_metacall(QObject*, > QMetaObject::Call, int, void**) () > #6 0x00007f68364d0f9f in void doActivate(QObject*, int, void**) () > at /lib64/libQt5Core.so.5 > #7 0x00007f68362e66ee in QThread::started(QThread::QPrivateSignal) () at > /lib64/libQt5Core.so.5 > #8 0x00007f68362e89c4 in QThreadPrivate::start(void*) () at > /lib64/libQt5Core.so.5 > #9 0x00007f6835cae19d in start_thread () at /lib64/libc.so.6 > #10 0x00007f6835d2fc60 in clone3 () at /lib64/libc.so.6 > > > I suspect, that this is a Null pointer here > See also _PyInterpreterState_Get() > and _PyGILState_GetInterpreterStateUnsafe(). 
*/
> static inline PyInterpreterState* _PyInterpreterState_GET(void) {
>     PyThreadState *tstate = _PyThreadState_GET();
> #ifdef Py_DEBUG
>     _Py_EnsureTstateNotNULL(tstate);
> #endif
>     # <<----------- suspect state is nullpointer
>     return tstate->interp;
> }
>
> any clues , whats going on here, and how I can mitigate that ?
> --
> https://mail.python.org/mailman/listinfo/python-list

From dciprus at cisco.com Thu Oct 3 18:12:15 2024
From: dciprus at cisco.com (Dan Ciprus (dciprus))
Date: Thu, 3 Oct 2024 22:12:15 +0000
Subject: [Tutor] How to stop a specific thread in Python 2.7?
In-Reply-To:
References:
Message-ID:

I'd be interested too :-).

On Thu, Sep 26, 2024 at 03:34:05AM GMT, marc nicole via Python-list wrote:
>Could you show a python code example of this?
>
>
>On Thu, 26 Sept 2024, 03:08 Cameron Simpson, wrote:
>
>> On 25Sep2024 22:56, marc nicole wrote:
>> >How to create a per-thread event in Python 2.7?
>>
>> Every time you make a Thread, make an Event. Pass it to the thread
>> worker function and keep it to hand for your use outside the thread.
>> _______________________________________________
>> Tutor maillist - Tutor at python.org
>> To unsubscribe or change subscription options:
>> https://mail.python.org/mailman/listinfo/tutor
>>
>--
>https://mail.python.org/mailman/listinfo/python-list

--
Dan Ciprus

[ curl -L http://git.io/unix ]

From cs at cskk.id.au Thu Oct 3 19:17:19 2024
From: cs at cskk.id.au (Cameron Simpson)
Date: Fri, 4 Oct 2024 09:17:19 +1000
Subject: [Tutor] How to stop a specific thread in Python 2.7?
In-Reply-To:
References:
Message-ID:

On 03Oct2024 22:12, Dan Ciprus (dciprus) wrote:
>I'd be interested too :-).

Untested sketch:

    from threading import Event, Thread

    # Note: the keyword-only `E=None` after `*a` and the `[E, *a]`
    # unpacking are Python 3 syntax.
    def make_thread(target, *a, E=None, **kw):
        ''' Make a new Event E and Thread T,
            pass `[E,*a]` as the target positional arguments.
            A shared preexisting Event may be supplied.
            Return a 2-tuple of `(T,E)`.
        '''
        if E is None:
            E = Event()
        T = Thread(target=target, args=[E, *a], kwargs=kw)
        return T, E

Something along those lines.

Cheers,
Cameron Simpson

From mk1853387 at gmail.com Sat Oct 5 13:55:36 2024
From: mk1853387 at gmail.com (marc nicole)
Date: Sat, 5 Oct 2024 19:55:36 +0200
Subject: How to check whether lip movement is significant using face landmarks in dlib?
Message-ID:

I am trying to assess whether the lips of a person are moving too much
while the mouth is closed (to conclude they are chewing).

I try to assess the lip movement through landmarks (dlib):

Inspired by the mouth example (
https://github.com/mauckc/mouth-open/blob/master/detect_open_mouth.py#L17),
and using it before the following function (as a primary condition for
telling the person is chewing), I wrote the following function:

    def lips_aspect_ratio(shape):
        # grab the indexes of the facial landmarks for the lip
        (mStart, mEnd) = (61, 68)
        lip = shape[mStart:mEnd]
        print(len(lip))
        # compute the euclidean distances between the two sets of
        # vertical lip landmarks (x, y)-coordinates
        # to reach landmark 68 I need to get lip[7] not lip[6]
        # (while I get lip[7] I get IndexOutOfBoundError)
        A = dist.euclidean(lip[1], lip[6])  # 62, 68
        B = dist.euclidean(lip[3], lip[5])  # 64, 66

        # compute the euclidean distance between the horizontal
        # lip landmark (x, y)-coordinates
        C = dist.euclidean(lip[0], lip[4])  # 61, 65

        # compute the lip aspect ratio
        mar = (A + B) / (2.0 * C)

        # return the lip aspect ratio
        return mar

How to define an aspect ratio for the lips to conclude they are moving
significantly? Is the mentioned function able to tell whether the lips are
significantly moving while the mouth is closed?
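
One possible reading of the IndexOutOfBoundError mentioned in the comment above, offered as a sketch rather than a verified fix: the slice shape[61:68] holds only seven points, so lip[7] cannot exist. Assuming the usual 68-point model with zero-based rows (so landmarks 61 to 68 of the one-based diagram are rows 60 to 67), slicing from 60 gives eight points and the pairs named in the comments become reachable. The landmark layout, the max-minus-min movement measure, and the 0.02 threshold are all assumptions made for illustration; none of them comes from dlib or from the linked example, and plain (x, y) tuples stand in for the landmark array.

```python
import math


def lips_aspect_ratio(shape):
    """Inner-lip aspect ratio from a sequence of 68 (x, y) landmark points."""
    lip = shape[60:68]               # rows 60..67 = landmarks 61..68 (one-based)
    a = math.dist(lip[1], lip[7])    # landmarks 62 and 68
    b = math.dist(lip[3], lip[5])    # landmarks 64 and 66
    c = math.dist(lip[0], lip[4])    # landmarks 61 and 65 (the mouth corners)
    return (a + b) / (2.0 * c)


def lips_moving(ratios, threshold=0.02):
    """True if the ratio varies by more than `threshold` across the frames.

    `threshold` is a hypothetical value that would need tuning on real video.
    """
    return max(ratios) - min(ratios) > threshold
```

In use, the ratio would be computed once per video frame and the most recent values passed to lips_moving(); whether a single ratio can separate chewing from speaking is left open here.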
From ml at fam-goebel.de Sat Oct 5 16:27:33 2024
From: ml at fam-goebel.de (Ulrich Goebel)
Date: Sat, 5 Oct 2024 22:27:33 +0200
Subject: Best Practice Virtual Environment
Message-ID: <20241005222733.fd60f7e672e849aa63c8b360@fam-goebel.de>

Hi,

I learned to use virtual environments wherever possible, and I learned to pip install the required packages there.

That works quite nicely at home. Now I come to deploy a Python script on a Debian Linux server, making it usable for a couple of users there.

Debian (or even Python3 itself) doesn't allow to pip install required packages system wide, so I have to use virtual environments even there. But is it right, that I have to do that for every single user?

Can someone give me a hint to find a howto for that?

Best regards
Ulrich

--
Ulrich Goebel

From cs at cskk.id.au Sat Oct 5 17:59:56 2024
From: cs at cskk.id.au (Cameron Simpson)
Date: Sun, 6 Oct 2024 08:59:56 +1100
Subject: Best Practice Virtual Environment
In-Reply-To: <20241005222733.fd60f7e672e849aa63c8b360@fam-goebel.de>
References: <20241005222733.fd60f7e672e849aa63c8b360@fam-goebel.de>
Message-ID:

On 05Oct2024 22:27, Ulrich Goebel wrote:
>Debian (or even Python3 itself) doesn't allow to pip install required
>packages system wide,

This is generally a good thing. You might modify a critical system-used
package.

>But is it right, that I have to do that for every single user?

No. Just make a shared virtualenv, eg in /usr/local or /opt somewhere.
Have the script commence with:

    #!/path/to/the/shared/venv/bin/python

and make it readable and executable.

Problem solved.
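
The shared-virtualenv suggestion above could be scripted roughly as follows, using the standard library's venv module. This is a minimal sketch under assumptions: the function names are made up for the example, and any target path such as /opt/myapp/venv is hypothetical and would need to be writable by whoever runs the script and readable by the users of the deployed program.

```python
import sys
import venv
from pathlib import Path


def create_shared_venv(target, with_pip=True):
    """Create a virtual environment at `target`; return its interpreter path."""
    target = Path(target)
    venv.EnvBuilder(with_pip=with_pip).create(target)
    if sys.platform == "win32":
        return target / "Scripts" / "python.exe"
    return target / "bin" / "python"


def shebang_line(interpreter):
    """First line for a script that should always run inside the shared venv."""
    return f"#!{interpreter}"
```

Calling create_shared_venv() once with the chosen path and putting the result of shebang_line() at the top of the deployed script gives every user the same interpreter and packages, with no per-user environment to maintain.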
Cheers, Cameron Simpson From list1 at tompassin.net Sat Oct 5 17:31:34 2024 From: list1 at tompassin.net (Thomas Passin) Date: Sat, 5 Oct 2024 17:31:34 -0400 Subject: Best Practice Virtual Environment In-Reply-To: <20241005222733.fd60f7e672e849aa63c8b360@fam-goebel.de> References: <20241005222733.fd60f7e672e849aa63c8b360@fam-goebel.de> Message-ID: <10ddef1d-d1e1-4614-8958-1f1c278c1ce1@tompassin.net> On 10/5/2024 4:27 PM, Ulrich Goebel via Python-list wrote: > Hi, > > I learned to use virtual environments where ever possible, and I learned to pip install the required packages there. > > That works quite nice at home. Now I come to deploy a Python script on a debian linux server, making it usable for a couple of users there. > > Debian (or even Python3 itself) doesn't allow to pip install required packages system wide, so I have to use virtual environments even there. But is it right, that I have to do that for every single user? > > Can someone give me a hint to find an howto for that? One alternative is to install a different version of Python without replacing the system's version. For example, if the system uses Python 3.11, install Python 3.12. That way there is no risk of breaking system operation, and you can install what you like where you like. From Karsten.Hilbert at gmx.net Sat Oct 5 18:21:09 2024 From: Karsten.Hilbert at gmx.net (Karsten Hilbert) Date: Sun, 6 Oct 2024 00:21:09 +0200 Subject: Best Practice Virtual Environment In-Reply-To: <20241005222733.fd60f7e672e849aa63c8b360@fam-goebel.de> References: <20241005222733.fd60f7e672e849aa63c8b360@fam-goebel.de> Message-ID: Am Sat, Oct 05, 2024 at 10:27:33PM +0200 schrieb Ulrich Goebel via Python-list: > Debian (or even Python3 itself) doesn't allow to pip install required packages system wide, so I have to use virtual environments even there. But is it right, that I have to do that for every single user? > > Can someone give me a hint to find an howto for that? 
AFAICT the factual consensus appears to be install modules as packaged by the system you won't need anything else If you do find how to cleanly install non-packaged modules in a system-wide way (even if that means installing every application into its own *system-wide* venv) - do let me know. Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B From Karsten.Hilbert at gmx.net Sun Oct 6 09:44:02 2024 From: Karsten.Hilbert at gmx.net (Karsten Hilbert) Date: Sun, 6 Oct 2024 15:44:02 +0200 Subject: Best Practice Virtual Environment In-Reply-To: References: <20241005222733.fd60f7e672e849aa63c8b360@fam-goebel.de> Message-ID: Am Sun, Oct 06, 2024 at 12:21:09AM +0200 schrieb Karsten Hilbert via Python-list: > Am Sat, Oct 05, 2024 at 10:27:33PM +0200 schrieb Ulrich Goebel via Python-list: > > > Debian (or even Python3 itself) doesn't allow to pip install required packages system wide, so I have to use virtual environments even there. But is it right, that I have to do that for every single user? > > > > Can someone give me a hint to find an howto for that? > > If you do find how to cleanly install non-packaged modules > in a system-wide way (even if that means installing every > application into its own *system-wide* venv) - do let me > know. It seems dh-virtualenv is one way to do it. On Debian. Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B From transreductionist at gmail.com Sun Oct 6 13:30:24 2024 From: transreductionist at gmail.com (transreductionist) Date: Sun, 6 Oct 2024 13:30:24 -0400 Subject: Best Practice Virtual Environment In-Reply-To: References: <20241005222733.fd60f7e672e849aa63c8b360@fam-goebel.de> Message-ID: This is how we handle this problem at a large organization. In the repository there are a number of build scripts. For convenience we use poetry (poetry.toml) to manage the virtual environment. A pyproduct.toml is used to define dependencies, how tests are run, the linter config, etc. 
So there are scripts for poetry lock, poetry install, and whatever else is needed. A user pulls down the repository and runs 1. poetry lock 2. poetry install And they have their environment with the proper dependencies. On Sun, Oct 6, 2024, 09:47 Karsten Hilbert via Python-list < python-list at python.org> wrote: > Am Sun, Oct 06, 2024 at 12:21:09AM +0200 schrieb Karsten Hilbert via > Python-list: > > > Am Sat, Oct 05, 2024 at 10:27:33PM +0200 schrieb Ulrich Goebel via > Python-list: > > > > > Debian (or even Python3 itself) doesn't allow to pip install required > packages system wide, so I have to use virtual environments even there. But > is it right, that I have to do that for every single user? > > > > > > Can someone give me a hint to find an howto for that? > > > > If you do find how to cleanly install non-packaged modules > > in a system-wide way (even if that means installing every > > application into its own *system-wide* venv) - do let me > > know. > > It seems dh-virtualenv is one way to do it. On Debian. > > Karsten > -- > GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B > -- > https://mail.python.org/mailman/listinfo/python-list > From transreductionist at gmail.com Sun Oct 6 13:31:09 2024 From: transreductionist at gmail.com (transreductionist) Date: Sun, 6 Oct 2024 13:31:09 -0400 Subject: Best Practice Virtual Environment In-Reply-To: References: <20241005222733.fd60f7e672e849aa63c8b360@fam-goebel.de> Message-ID: byproduct.toml On Sun, Oct 6, 2024, 13:30 transreductionist wrote: > This is how we handle this problem at a large organization. > > In the repository there are a number of build scripts. For convenience we > use poetry (poetry.toml) to manage the virtual environment. A > pyproduct.toml is used to define dependencies, how tests are run, the > linter config, etc. > > So there are scripts for poetry lock, poetry install, and whatever else is > needed. > > A user pulls down the repository and runs > 1. poetry lock > 2. 
poetry install > And they have their environment with the proper dependencies. > > On Sun, Oct 6, 2024, 09:47 Karsten Hilbert via Python-list < > python-list at python.org> wrote: > >> Am Sun, Oct 06, 2024 at 12:21:09AM +0200 schrieb Karsten Hilbert via >> Python-list: >> >> > Am Sat, Oct 05, 2024 at 10:27:33PM +0200 schrieb Ulrich Goebel via >> Python-list: >> > >> > > Debian (or even Python3 itself) doesn't allow to pip install required >> packages system wide, so I have to use virtual environments even there. But >> is it right, that I have to do that for every single user? >> > > >> > > Can someone give me a hint to find an howto for that? >> > >> > If you do find how to cleanly install non-packaged modules >> > in a system-wide way (even if that means installing every >> > application into its own *system-wide* venv) - do let me >> > know. >> >> It seems dh-virtualenv is one way to do it. On Debian. >> >> Karsten >> -- >> GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B >> -- >> https://mail.python.org/mailman/listinfo/python-list >> > From antoon.pardon at vub.be Sun Oct 6 16:19:10 2024 From: antoon.pardon at vub.be (Antoon Pardon) Date: Sun, 6 Oct 2024 22:19:10 +0200 Subject: Beazley's Problem In-Reply-To: <0709b4b8b0bbf2a32d53649d1a6fbefbcd44a68a.camel@tilde.green> References: <87tte941ko.fsf@nightsong.com> <87plow4v4p.fsf@nightsong.com> <0709b4b8b0bbf2a32d53649d1a6fbefbcd44a68a.camel@tilde.green> Message-ID: Op 23/09/2024 om 09:44 schreef Annada Behera via Python-list: > The "next-level math trick" Newton-Raphson has nothing to do with > functional programming. I have written solvers in purely iterative > style. What is your point. Any problem solved in a functional style can also be solved in a pure interative style. So you having written something in an interative style doesn't contradict Newton-Raphson being expressable in a functional style. 
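
As a rough illustration of the point that an iterative method can also be written in a functional style, here is a minimal sketch in which the current guess is never stored in a mutable variable; it is handed to the next recursive call instead. The function name, the tolerance, and the step limit are assumptions made for the example.

```python
def newton(f, df, guess, tolerance=1e-12, max_steps=50):
    """Newton-Raphson with the current guess passed along as an argument."""
    if max_steps == 0 or abs(f(guess)) < tolerance:
        return guess
    # The "state" is just the argument of the next call; nothing is reassigned.
    return newton(f, df, guess - f(guess) / df(guess), tolerance, max_steps - 1)


# Square root of 2 as the positive root of x*x - 2.
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
print(root)
```

Python does not eliminate tail calls, so the step limit also keeps the recursion depth bounded.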
> As far as I know, Newton-Raphson is the opposite of functional
> programming as you iteratively solve for the root. Functional programming
> is stateless where you are not allowed to store any state (current best
> guess root).

That doesn't prevent you from passing state along as a parameter, usually
in some helper function.

--
Antoon Pardon.

From thomas at python.org Mon Oct 7 14:57:24 2024
From: thomas at python.org (Thomas Wouters)
Date: Mon, 7 Oct 2024 11:57:24 -0700
Subject: [RELEASE] Python 3.13.0 (final) released
Message-ID:

After all the shenanigans two weeks ago -- everyone discovering nasty
little problems in release candidate 2 -- the last week was suspiciously
quiet, and therefore I can finally say: Python 3.13.0 is now available

https://www.python.org/downloads/release/python-3130/

This is the stable release of Python 3.13.0

Python 3.13.0 is the newest major release of the Python programming
language, and it contains many new features and optimizations compared to
Python 3.12. (Compared to the last release candidate, 3.13.0rc3, 3.13.0
contains two small bug fixes and some documentation and testing changes.)

Major new features of the 3.13 series, compared to 3.12

Some of the major new features and changes in Python 3.13 are:

New features

   - A new and improved interactive interpreter, based on PyPy's,
   featuring multi-line editing and color support, as well as colorized
   exception tracebacks.
   - An *experimental* free-threaded build mode, which disables the
   Global Interpreter Lock, allowing threads to run more concurrently. The
   build mode is available as an experimental feature in the Windows and
   macOS installers as well.
   - A preliminary, *experimental* JIT, providing the groundwork for
   significant performance improvements.
   - The locals() builtin function (and its C equivalent) now has
   well-defined semantics when mutating the returned mapping, which allows
   debuggers to operate more consistently.
   - A modified version of mimalloc is now included, optional but enabled
   by default if supported by the platform, and required for the
   free-threaded build mode.
   - Docstrings now have their leading indentation stripped, reducing
   memory use and the size of .pyc files. (Most tools handling docstrings
   already strip leading indentation.)
   - The dbm module has a new dbm.sqlite3 backend that is used by default
   when creating new files.
   - The minimum supported macOS version was changed from 10.9 to 10.13
   (High Sierra). Older macOS versions will not be supported going forward.
   - WASI is now a Tier 2 supported platform. Emscripten is no longer an
   officially supported platform (but Pyodide continues to support
   Emscripten).
   - iOS is now a Tier 3 supported platform.
   - Android is now a Tier 3 supported platform.

Typing

   - Support for type defaults in type parameters.
   - A new type narrowing annotation, typing.TypeIs.
   - A new annotation for read-only items in TypedDicts.
   - A new annotation for marking deprecations in the type system.

Removals and new deprecations

   - PEP 594 (Removing dead batteries from the standard library)
   scheduled removals of many deprecated modules: aifc, audioop, chunk,
   cgi, cgitb, crypt, imghdr, mailcap, msilib, nis, nntplib, ossaudiodev,
   pipes, sndhdr, spwd, sunau, telnetlib, uu, xdrlib, lib2to3.
   - Many other removals of deprecated classes, functions and methods in
   various standard library modules.
   - C API removals and deprecations. (Some removals present in alpha 1
   were reverted in alpha 2, as the removals were deemed too disruptive at
   this time.)
   - New deprecations, most of which are scheduled for removal from
   Python 3.15 or 3.16.

For more details on the changes to Python 3.13, see What's new in Python
3.13.

More resources

   - Online Documentation
   - PEP 719, 3.13 Release Schedule
   - Report bugs at Issues · python/cpython · GitHub.
   - Help fund Python directly (or via GitHub Sponsors), and support the
   Python community.

We hope you enjoy the new releases!

Thanks to all of the many volunteers who help make Python Development and
these releases possible! Please consider supporting our efforts by
volunteering yourself or through organization contributions to the Python
Software Foundation.

Choo-choo from the release train,

Your release team,
Thomas Wouters
Ned Deily
Steve Dower
Łukasz Langa

From olegsivokon at gmail.com Sun Oct 6 07:42:18 2024
From: olegsivokon at gmail.com (Left Right)
Date: Sun, 6 Oct 2024 13:42:18 +0200
Subject: Best Practice Virtual Environment
In-Reply-To: <20241005222733.fd60f7e672e849aa63c8b360@fam-goebel.de>
References: <20241005222733.fd60f7e672e849aa63c8b360@fam-goebel.de>
Message-ID:

Hi.

The advice here is from a perspective of someone who does this
professionally, for large, highly loaded systems. This doesn't
necessarily apply to your case / not to the full extent.

> Debian (or even Python3 itself) doesn't allow to pip install required packages system wide, so I have to use virtual environments even there. But is it right, that I have to do that for every single user?

1. Yes, you can install packages system-wide with pip, but you don't need to.
2. pip is OK to install requirements once, to figure out what they are
(in dev. environment). It's bad for production environment: it's slow,
inconsistent, and insecure. For more context: pip dependency
resolution is especially slow when installing local interdependent
packages. Sometimes it can take up to a minute per package.
Inconsistency comes from pip not using package checksums and
signatures (by default): so, if the package being installed was
updated w/o version update, to pip it's going to be the same package.
Not just that, for some packages pip has to resort to building them
from source, in which case nobody can guarantee the end result.
Insecurity comes from Python allowing out-of-index package downloads
during install.
You can distribute your package through PyPI, while its dependency will point to a random Web site in a country with very permissive laws (and, essentially, just put malware on your computer). It's impossible to properly audit such situations because the outside Web site doesn't have to provide any security guarantees. To package anything Linux-related, use the packaging mechanism provided by the flavor of Linux you are using. In the case of Debian, use DEB. Don't use virtual environments for this (it's possible to roll the entire virtual environment into a DEB package, but that's a bad idea). The reason to do this is so that your package plays nice with other Python packages available as DEB packages. This will allow your users to use a consistent interface when dealing with installing packages, and to avoid situation when an out-of-bound tool installed something in the same path where dpkg will try to install the same files, but coming from a legitimate package. If you package the whole virtual environment, you might run into problems with locating native libraries linked from Python native modules. You will make it hard to audit the installation, especially when it comes to certificates, TLS etc. stuff that, preferably, should be handled in a centralized way by the OS. Of course, countless times I've seen developers do the exact opposite of what I'm suggesting here. Also, the big actors in the industry s.a. Microsoft and Amazon do the exact opposite of what I suggest. I have no problem acknowledging this and still maintaining that they are wrong and I'm right :) But, you don't have to trust me! From michael.stemper at gmail.com Mon Oct 7 09:35:32 2024 From: michael.stemper at gmail.com (Michael F. Stemper) Date: Mon, 7 Oct 2024 08:35:32 -0500 Subject: Correct syntax for pathological re.search() Message-ID: I'm trying to discard lines that include the string "\sout{" (which is TeX, for those who are curious. 
I have tried: if not re.search("\sout{", line): if not re.search("\sout\{", line): if not re.search("\\sout{", line): if not re.search("\\sout\{", line): But the lines with that string keep coming through. What is the right syntax to properly escape the backslash and the left curly bracket? -- Michael F. Stemper No animals were harmed in the composition of this message. From michael.stemper at gmail.com Mon Oct 7 10:14:53 2024 From: michael.stemper at gmail.com (Michael F. Stemper) Date: Mon, 7 Oct 2024 09:14:53 -0500 Subject: Correct syntax for pathological re.search() In-Reply-To: References: Message-ID: On 07/10/2024 08.56, Stefan Ram wrote: > "Michael F. Stemper" wrote or quoted: >> if not re.search("\\sout\{", line): > > So, if you're not down to slap an "r" before your string literals, > you're going to end up doubling down on every backslash. Never heard of that before, but it did the trick. > Long story short, those double backslashes in your regex? > They'll be quadrupling up in your Python string literal! > for line in lines: > product = re.search( "\\\\sout\\{", line ) This also worked. For now, I'll use the "r" in a cargo-cult fashion, until I decide which syntax I prefer. (Is there any reason that one or the other is preferable?) Thanks for your help, Mike -- Michael F. Stemper Economists have correctly predicted seven of the last three recessions. From jon+usenet at unequivocal.eu Mon Oct 7 11:43:59 2024 From: jon+usenet at unequivocal.eu (Jon Ribbens) Date: Mon, 7 Oct 2024 15:43:59 -0000 (UTC) Subject: Correct syntax for pathological re.search() References: Message-ID: On 2024-10-07, Stefan Ram wrote: > "Michael F. Stemper" wrote or quoted: >>For now, I'll use the "r" in a cargo-cult fashion, until I decide which >>syntax I prefer. (Is there any reason that one or the other is preferable?) > > I'd totally go with the r-style notation! > > It's got one bummer though - you can't end such a string literal with > a backslash. 
But hey, no biggie, you could use one of those notations: > > main.py > > path = r'C:\Windows\example' + '\\' > > print( path ) > > path = r''' > C:\Windows\example\ > '''.strip() > > print( path ) > > stdout > > C:\Windows\example\ > C:\Windows\example\ > > . ... although of course in this example you should probably do neither of those things, and instead do: from pathlib import Path path = Path(r'C:\Windows\example') since in a Path the trailing '\' or '/' is unnecessary. Which leaves very few remaining uses for a raw-string with a trailing '\'... From pieter-l at vanoostrum.org Tue Oct 8 13:50:14 2024 From: pieter-l at vanoostrum.org (Pieter van Oostrum) Date: Tue, 08 Oct 2024 19:50:14 +0200 Subject: Correct syntax for pathological re.search() References: Message-ID: ram at zedat.fu-berlin.de (Stefan Ram) writes: > "Michael F. Stemper" wrote or quoted: > > path = r'C:\Windows\example' + '\\' > You could even omit the '+'. Then the concatenation is done at parsing time instead of run time. -- Pieter van Oostrum www: http://pieter.vanoostrum.org/ PGP key: [8DAE142BE17999C4] From Karsten.Hilbert at gmx.net Tue Oct 8 14:30:34 2024 From: Karsten.Hilbert at gmx.net (Karsten Hilbert) Date: Tue, 8 Oct 2024 20:30:34 +0200 Subject: Correct syntax for pathological re.search() In-Reply-To: References: Message-ID: Am Mon, Oct 07, 2024 at 08:35:32AM -0500 schrieb Michael F. Stemper via Python-list: > I'm trying to discard lines that include the string "\sout{" (which is TeX, for > those who are curious. 
I have tried: > if not re.search("\sout{", line): > if not re.search("\sout\{", line): > if not re.search("\\sout{", line): > if not re.search("\\sout\{", line): unwanted_tex = '\sout{' if unwanted_tex not in line: do_something_with_libreoffice() Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B From python at mrabarnett.plus.com Tue Oct 8 15:07:04 2024 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 8 Oct 2024 20:07:04 +0100 Subject: Correct syntax for pathological re.search() In-Reply-To: References: Message-ID: <1e13579f-693d-44cf-a563-7c0c9767e04e@mrabarnett.plus.com> On 2024-10-08 19:30, Karsten Hilbert via Python-list wrote: > Am Mon, Oct 07, 2024 at 08:35:32AM -0500 schrieb Michael F. Stemper via Python-list: > >> I'm trying to discard lines that include the string "\sout{" (which is TeX, for >> those who are curious. I have tried: >> if not re.search("\sout{", line): >> if not re.search("\sout\{", line): >> if not re.search("\\sout{", line): >> if not re.search("\\sout\{", line): > > unwanted_tex = '\sout{' > if unwanted_tex not in line: do_something_with_libreoffice() > That should be: unwanted_tex = r'\sout{' or: unwanted_tex = '\\sout{' From python at mrabarnett.plus.com Tue Oct 8 15:11:40 2024 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 8 Oct 2024 20:11:40 +0100 Subject: Correct syntax for pathological re.search() In-Reply-To: References: Message-ID: On 2024-10-07 14:35, Michael F. Stemper via Python-list wrote: > I'm trying to discard lines that include the string "\sout{" (which is TeX, for > those who are curious. I have tried: > if not re.search("\sout{", line): > if not re.search("\sout\{", line): > if not re.search("\\sout{", line): > if not re.search("\\sout\{", line): > > But the lines with that string keep coming through. What is the right syntax to > properly escape the backslash and the left curly bracket? 
>
String literals use backslash as an escape character, so it needs to be
escaped, or you need to use a "raw" string.

However, regex also uses backslash as an escape character.

That means that a literal backslash in a regex that's in a plain string
literal needs to be doubly-escaped, once for the string literal and again
for the regex.

From Karsten.Hilbert at gmx.net Tue Oct 8 16:17:49 2024
From: Karsten.Hilbert at gmx.net (Karsten Hilbert)
Date: Tue, 8 Oct 2024 22:17:49 +0200
Subject: Correct syntax for pathological re.search()
In-Reply-To: <1e13579f-693d-44cf-a563-7c0c9767e04e@mrabarnett.plus.com>
References: <1e13579f-693d-44cf-a563-7c0c9767e04e@mrabarnett.plus.com>
Message-ID:

On Tue, Oct 08, 2024 at 08:07:04PM +0100, MRAB via Python-list wrote:

> >unwanted_tex = '\sout{'
> >if unwanted_tex not in line: do_something_with_libreoffice()
> >
> That should be:
>
> unwanted_tex = r'\sout{'

Hm.

Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> tex = '\sout{'
>>> tex
'\\sout{'
>>>

Am I missing something ?

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

From alan at csail.mit.edu Tue Oct 8 16:59:48 2024
From: alan at csail.mit.edu (Alan Bawden)
Date: Tue, 08 Oct 2024 16:59:48 -0400
Subject: Correct syntax for pathological re.search()
References: <1e13579f-693d-44cf-a563-7c0c9767e04e@mrabarnett.plus.com>
Message-ID: <864j5mfgzf.fsf@williamsburg.bawden.org>

Karsten Hilbert writes:

Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> tex = '\sout{'
>>> tex
'\\sout{'
>>>

Am I missing something ?

You're missing the warning it generates:

> python -E -Wonce
Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> tex = '\sout{'
<stdin>:1: DeprecationWarning: invalid escape sequence '\s'
>>>

From python at mrabarnett.plus.com Tue Oct 8 18:10:03 2024
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 8 Oct 2024 23:10:03 +0100
Subject: Correct syntax for pathological re.search()
In-Reply-To: <864j5mfgzf.fsf@williamsburg.bawden.org>
References: <1e13579f-693d-44cf-a563-7c0c9767e04e@mrabarnett.plus.com> <864j5mfgzf.fsf@williamsburg.bawden.org>
Message-ID: <3ab03165-185b-45f7-9fba-1918b83afdd8@mrabarnett.plus.com>

On 2024-10-08 21:59, Alan Bawden via Python-list wrote:
> Karsten Hilbert writes:
>
> Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> tex = '\sout{'
> >>> tex
> '\\sout{'
> >>>
>
> Am I missing something ?
>
> You're missing the warning it generates:
>
> > python -E -Wonce
> Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> tex = '\sout{'
> <stdin>:1: DeprecationWarning: invalid escape sequence '\s'
> >>>

You got lucky that \s is invalid. If it had been \t you would've got a
tab character.

Historically, Python treated invalid escape sequences as literals, but
it's deprecated now and will become an outright error in the future
(probably) because it often hides a mistake, such as the aforementioned
\t being treated as a tab character when the user expected it to be a
literal backslash followed by letter t. (This can occur within Windows
file paths written in plain string literals.)
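
To make the two cases above concrete, here is a minimal sketch, assuming
CPython 3.6 or later. The variable names are illustrative and do not come
from the thread, and eval() is used only so that the compile-time warning
can be captured at run time. Which warning category shows up
(DeprecationWarning or SyntaxWarning) depends on the Python version, and a
future version may reject the literal outright.

```python
import warnings

# '\t' is a recognised escape: it becomes a single TAB character.
tab_path = 'C:\temp'
# A raw string keeps the backslash and the letter 't' as two characters.
raw_path = r'C:\temp'
print(len(tab_path), len(raw_path))  # 6 7

# The two spellings suggested earlier in the thread give the same string.
unwanted_tex = r'\sout{'
print(unwanted_tex == '\\sout{')  # True

# '\s' is not a recognised escape, so the backslash survives, but
# compiling the literal emits a warning. The literal is kept in a raw
# string here (hypothetical helper name) so this file itself stays clean.
literal_source = r"'\sout{'"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = eval(literal_source)

print(value == unwanted_tex)  # True
# Category name varies by Python version, so no expected output is given.
print([w.category.__name__ for w in caught])
```

The raw-string form is probably the safer habit, since it behaves the same
way whether or not the character after the backslash happens to be a
recognised escape.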
From avi.e.gross at gmail.com Tue Oct 8 19:43:35 2024 From: avi.e.gross at gmail.com (avi.e.gross at gmail.com) Date: Tue, 8 Oct 2024 19:43:35 -0400 Subject: Signing off In-Reply-To: <007701d89150$1dea86b0$59bf9410$@gmail.com> References: <008a01d890e0$756336a0$6029a3e0$@gmail.com> <7143f0d4-0fdf-88eb-22d9-391065b28044@yahoo.co.uk> <007701d89150$1dea86b0$59bf9410$@gmail.com> Message-ID: <012d01db19db$e42d7c40$ac8874c0$@gmail.com> Just a final brief note. I am leaving the python community so don't worry that anything happened to me. I have a disagreement with the direction some people are taking with the python community that is my issue and it that probably will not bother most people. I have lots of other interests including many other programming languages and it is time I stopped using python when I have so much else to choose from. My best wishes to everyone here. Avi From Karsten.Hilbert at gmx.net Wed Oct 9 14:06:10 2024 From: Karsten.Hilbert at gmx.net (Karsten Hilbert) Date: Wed, 9 Oct 2024 20:06:10 +0200 Subject: Correct syntax for pathological re.search() In-Reply-To: <864j5mfgzf.fsf@williamsburg.bawden.org> References: <1e13579f-693d-44cf-a563-7c0c9767e04e@mrabarnett.plus.com> <864j5mfgzf.fsf@williamsburg.bawden.org> Message-ID: Am Tue, Oct 08, 2024 at 04:59:48PM -0400 schrieb Alan Bawden via Python-list: > Karsten Hilbert writes: > > Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux > Type "help", "copyright", "credits" or "license" for more information. > >>> tex = '\sout{' > >>> tex > '\\sout{' > >>> > > Am I missing something ? 
> > You're missing the warning it generates: > > :1: DeprecationWarning: invalid escape sequence '\s' I knew it'd be good to ask :-D Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B From martin.stopka at gmail.com Thu Oct 10 09:07:20 2024 From: martin.stopka at gmail.com (stopa) Date: Thu, 10 Oct 2024 15:07:20 +0200 Subject: dis.get_instructions not showing CACHE instructions Message-ID: Hello, I noticed the change in dis module, no longer requiring show_caches to be set to True to show cache instructions. However I am not able to display them with get_instructions. Is there by any chance some bug preventing me to see them? Thanks Martin From vinay_sajip at yahoo.co.uk Thu Oct 10 10:50:38 2024 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Thu, 10 Oct 2024 14:50:38 +0000 (UTC) Subject: Announcement: distlib 0.3.9 released on PyPI References: <517012813.550286.1728571838732.ref@mail.yahoo.com> Message-ID: <517012813.550286.1728571838732@mail.yahoo.com> Version 0.3.9 of distlib has recently been released on PyPI [1]. For newcomers, distlib is a library of packaging functionality which is intended to be usable as the basis for third-party packaging tools. The main changes in this release are as follows: * Merge #215: Preload script wrappers on Windows to assist with a pip issue. * Fix #220: Remove duplicated newline in shebang of windows launcher. * Fix #222: Support mounting wheels that use extensions without an EXTENSIONS file. * Fix #224: Do not use the absolute path to cache wheel extensions. * Fix #225: Add support for wheel compatibility with the limited API. * Fix #230: Add handling for cross-compilation environments. A more detailed change log is available at [2]. Please try it out, and if you find any problems or have any suggestions for improvements, please give some feedback using the issue tracker at [3]. 
Regards,

Vinay Sajip

[1] https://pypi.org/project/distlib/0.3.9/
[2] https://distlib.readthedocs.io/en/latest/overview.html#change-log-for-distlib
[3] https://github.com/pypa/distlib/issues/new/choose

From barry at barrys-emacs.org Thu Oct 10 12:53:37 2024
From: barry at barrys-emacs.org (Barry)
Date: Thu, 10 Oct 2024 17:53:37 +0100
Subject: dis.get_instructions not showing CACHE instructions
In-Reply-To:
References:
Message-ID: <0FF12307-A10D-489B-8BF7-B397A93D698D@barrys-emacs.org>

> On 10 Oct 2024, at 14:18, stopa via Python-list wrote:
>
> Hello,
> I noticed the change in dis module, no longer requiring show_caches to be
> set to True to show cache instructions. However I am not able to display
> them with get_instructions.
> Is there by any chance some bug preventing me to see them?

We need more information to be able to comment.

In what version of Python do you see this working?
In what version of Python do you see it change?

Can you show an example function that demonstrates the issue, please?

Barry

>
> Thanks
>
> Martin
> --
> https://mail.python.org/mailman/listinfo/python-list
>

From martin.stopka at gmail.com Thu Oct 10 13:31:57 2024
From: martin.stopka at gmail.com (stopa)
Date: Thu, 10 Oct 2024 19:31:57 +0200
Subject: dis.get_instructions not showing CACHE instructions
In-Reply-To: <0FF12307-A10D-489B-8BF7-B397A93D698D@barrys-emacs.org>
References: <0FF12307-A10D-489B-8BF7-B397A93D698D@barrys-emacs.org>
Message-ID:

Oh god I am sorry :/ I somehow missed the information about the cache_info
field. I was expecting to see those cache instructions as normal opcodes.
So it's working as expected.

Thanks for your help.

M.

On Thu, 10 Oct 2024 at 18:53, Barry wrote:

>
>
> > On 10 Oct 2024, at 14:18, stopa via Python-list wrote:
> >
> > Hello,
> > I noticed the change in dis module, no longer requiring show_caches to be
> > set to True to show cache instructions. However I am not able to display
> > them with get_instructions.
> > Is there by any chance some bug preventing me to see them? > > We need more information to be able to comment. > > What version of python do you see this working for? > What version of python are you see it change? > > Can you show an example function that demonstrates the issue please. > > Barry > > > > > Thanks > > > > Martin > > -- > > https://mail.python.org/mailman/listinfo/python-list > > > > From dciprus at cisco.com Fri Oct 11 14:32:40 2024 From: dciprus at cisco.com (Dan Ciprus (dciprus)) Date: Fri, 11 Oct 2024 18:32:40 +0000 Subject: [Tutor] How to stop a specific thread in Python 2.7? In-Reply-To: References: Message-ID: Thank you for the hint ! On Fri, Oct 04, 2024 at 09:17:19AM GMT, Cameron Simpson wrote: >On 03Oct2024 22:12, Dan Ciprus (dciprus) wrote: >>I'd be interested too :-). > >Untested sketch: > > def make_thread(target, *a, E=None, **kw): > ''' > Make a new Event E and Thread T, pass `[E,*a]` as the target >positional arguments. > A shared preexisting Event may be supplied. > Return a 2-tuple of `(T,E)`. > ''' > if E is None: > E = Event() > T = Thread(target=target, args=[E, *a], kwargs=kw) > return T, E > >Something along those lines. > >Cheers, >Cameron Simpson -- Dan Ciprus [ curl -L http://git.io/unix ] -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 659 bytes Desc: not available URL: From avi.e.gross at gmail.com Fri Oct 11 17:13:07 2024 From: avi.e.gross at gmail.com (avi.e.gross at gmail.com) Date: Fri, 11 Oct 2024 17:13:07 -0400 Subject: Correct syntax for pathological re.search() In-Reply-To: References: Message-ID: <011301db1c22$5e7519c0$1b5f4d40$@gmail.com> Is there some utility function out there that can be called to show what the regular expression you typed in will look like by the time it is ready to be used? Obviously, life is not that simple as it can go through multiple layers with each dealing with a layer of backslashes. 
But for simple cases, ... -----Original Message----- From: Python-list On Behalf Of Gilmeh Serda via Python-list Sent: Friday, October 11, 2024 10:44 AM To: python-list at python.org Subject: Re: Correct syntax for pathological re.search() On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote: > I'm trying to discard lines that include the string "\sout{" (which is > TeX, for those who are curious. I have tried: > if not re.search("\sout{", line): if not re.search("\sout\{", line): > if not re.search("\\sout{", line): if not re.search("\\sout\{", > line): > > But the lines with that string keep coming through. What is the right > syntax to properly escape the backslash and the left curly bracket? $ python Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import re >>> s = r"testing \sout{WHADDEVVA}" >>> re.search(r"\\sout{", s) You want a literal backslash, hence, you need to escape everything. It is not enough to escape the "\s" as "\\s", because that only takes care of Python's demands for escaping "\". You also need to escape the "\" for the RegEx as well, or it will read it like it means "\s", which is the RegEx for a space character and therefore your search doesn't match, because it reads it like you want to search for " out{". Therefore, you need to escape it either as per my example, or by using four "\" and no "r" in front of the first quote, which also works: >>> re.search("\\\\sout{", s) You don't need to escape the curly braces. We call them "seagull wings" where I live. -- Gilmeh Sometimes I simply feel that the whole world is a cigarette and I'm the only ashtray. 
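
The two layers of escaping described in the message above can be checked
with a short sketch. This is only an illustration under stated
assumptions: the names line, needle, raw_pattern and plain_pattern are
made up here, the sample text is the one used in the quoted session, and
re.escape() and the plain "in" test are shown as alternatives rather than
as anything the posters recommended.

```python
import re

line = r"testing \sout{WHADDEVVA}"
needle = r"\sout{"  # the literal text we are looking for

# Layer 2 only (raw string): the regex engine receives \\sout\{ ,
# i.e. an escaped backslash followed by "sout" and an escaped brace.
raw_pattern = r"\\sout\{"
# Layers 1 and 2 (plain string): every backslash is doubled once more.
plain_pattern = "\\\\sout\\{"
print(raw_pattern == plain_pattern)  # True
print(re.search(raw_pattern, line) is not None)  # True

# With a single backslash the regex engine reads \s as "whitespace",
# so the pattern looks for a whitespace character followed by "out{".
print(re.search(r"\sout\{", line))  # None
print(re.search(r"\sout\{", "x out{") is not None)  # True

# re.escape() builds a pattern that matches the needle literally.
print(re.search(re.escape(needle), line) is not None)  # True

# For a fixed substring no regex is needed at all.
print(needle in line)  # True
```

Printing raw_pattern shows what the regex engine will actually receive,
which is often enough to spot a missing layer of backslashes.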
-- https://mail.python.org/mailman/listinfo/python-list From python at mrabarnett.plus.com Fri Oct 11 20:37:55 2024 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 12 Oct 2024 01:37:55 +0100 Subject: Correct syntax for pathological re.search() In-Reply-To: <011301db1c22$5e7519c0$1b5f4d40$@gmail.com> References: <011301db1c22$5e7519c0$1b5f4d40$@gmail.com> Message-ID: On 2024-10-11 22:13, AVI GROSS via Python-list wrote: > Is there some utility function out there that can be called to show what the > regular expression you typed in will look like by the time it is ready to be > used? > > Obviously, life is not that simple as it can go through multiple layers with > each dealing with a layer of backslashes. > > But for simple cases, ... > Yes. It's called 'print'. :-) > > > -----Original Message----- > From: Python-list On > Behalf Of Gilmeh Serda via Python-list > Sent: Friday, October 11, 2024 10:44 AM > To: python-list at python.org > Subject: Re: Correct syntax for pathological re.search() > > On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. Stemper wrote: > >> I'm trying to discard lines that include the string "\sout{" (which is >> TeX, for those who are curious. I have tried: >> if not re.search("\sout{", line): if not re.search("\sout\{", line): >> if not re.search("\\sout{", line): if not re.search("\\sout\{", >> line): >> >> But the lines with that string keep coming through. What is the right >> syntax to properly escape the backslash and the left curly bracket? > > $ python > Python 3.12.6 (main, Sep 8 2024, 13:18:56) [GCC 14.2.1 20240805] on linux > Type "help", "copyright", "credits" or "license" for more information. >>>> import re >>>> s = r"testing \sout{WHADDEVVA}" >>>> re.search(r"\\sout{", s) > > > You want a literal backslash, hence, you need to escape everything. > > It is not enough to escape the "\s" as "\\s", because that only takes care > of Python's demands for escaping "\". 
You also need to escape the "\" for > the RegEx as well, or it will read it like it means "\s", which is the > RegEx for a space character and therefore your search doesn't match, > because it reads it like you want to search for " out{". > > Therefore, you need to escape it either as per my example, or by using > four "\" and no "r" in front of the first quote, which also works: > >>>> re.search("\\\\sout{", s) > > > You don't need to escape the curly braces. We call them "seagull wings" > where I live. > From hjp-python at hjp.at Sat Oct 12 06:59:58 2024 From: hjp-python at hjp.at (Peter J. Holzer) Date: Sat, 12 Oct 2024 12:59:58 +0200 Subject: Correct syntax for pathological re.search() In-Reply-To: <011301db1c22$5e7519c0$1b5f4d40$@gmail.com> References: <011301db1c22$5e7519c0$1b5f4d40$@gmail.com> Message-ID: <20241012105958.cbctekv7vustleha@hjp.at> On 2024-10-11 17:13:07 -0400, AVI GROSS via Python-list wrote: > Is there some utility function out there that can be called to show what the > regular expression you typed in will look like by the time it is ready to be > used? I assume that by "ready to be used" you mean the compiled form? No, there doesn't seem to be a way to dump that. You can p = re.compile("\\\\sout{") print(p.pattern) but that just prints the input string, which you could do without compiling it first. But - without having looked at the implementation - it's far from clear that the compiled form would be useful to the user. It's probably some kind of state machine, and a large table of state transitions isn't very readable. There are a number of websites which visualize regular expressions. Those are probably better for debugging a regular expression than anything the re module could reasonably produce (although with the caveat that such a web site would use a different implementation and therefore might produce different results). hp -- _ | Peter J. Holzer | Story must make more sense than reality. 
|_|_) | | | | | hjp at hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From list1 at tompassin.net Sat Oct 12 08:51:57 2024 From: list1 at tompassin.net (Thomas Passin) Date: Sat, 12 Oct 2024 08:51:57 -0400 Subject: Correct syntax for pathological re.search() In-Reply-To: <20241012105958.cbctekv7vustleha@hjp.at> References: <011301db1c22$5e7519c0$1b5f4d40$@gmail.com> <20241012105958.cbctekv7vustleha@hjp.at> Message-ID: <966b510d-9bd7-4472-a858-7e042d78461d@tompassin.net> On 10/12/2024 6:59 AM, Peter J. Holzer via Python-list wrote: > On 2024-10-11 17:13:07 -0400, AVI GROSS via Python-list wrote: >> Is there some utility function out there that can be called to show what the >> regular expression you typed in will look like by the time it is ready to be >> used? > > I assume that by "ready to be used" you mean the compiled form? > > No, there doesn't seem to be a way to dump that. You can > > p = re.compile("\\\\sout{") > print(p.pattern) > > but that just prints the input string, which you could do without > compiling it first. It prints the escaped version, so you can see if you escaped the string as you intended. In this case, the print will display '\\sout{'. That's worth something. > > But - without having looked at the implementation - it's far from clear > that the compiled form would be useful to the user. It's probably some > kind of state machine, and a large table of state transitions isn't very > readable. > > There are a number of websites which visualize regular expressions. > Those are probably better for debugging a regular expression than > anything the re module could reasonably produce (although with the > caveat that such a web site would use a different implementation and > therefore might produce different results). 
> > hp > > From avi.e.gross at gmail.com Sat Oct 12 10:10:41 2024 From: avi.e.gross at gmail.com (avi.e.gross at gmail.com) Date: Sat, 12 Oct 2024 10:10:41 -0400 Subject: Correct syntax for pathological re.search() In-Reply-To: <20241012105958.cbctekv7vustleha@hjp.at> References: <011301db1c22$5e7519c0$1b5f4d40$@gmail.com> <20241012105958.cbctekv7vustleha@hjp.at> Message-ID: <003201db1cb0$85ac8760$91059620$@gmail.com> Peter, Matthew understood what I was hinting at in one way and you in another. The question asked how to add some power of two backslashes or make other changes, so the RE functionality sees what you want. The goal is to see what happens when one or more intermediate evaluations may change the string. So, a simple print may suffice as a parallel way to force the same evaluations. Thomas made his point. And, I am starting to feel like I need to change my name to something like Luke since this discussion must be gospel. FYI, I was not planning on posting at all. Time to detach. -----Original Message----- From: Python-list On Behalf Of Peter J. Holzer via Python-list Sent: Saturday, October 12, 2024 7:00 AM To: python-list at python.org Subject: Re: Correct syntax for pathological re.search() On 2024-10-11 17:13:07 -0400, AVI GROSS via Python-list wrote: > Is there some utility function out there that can be called to show what the > regular expression you typed in will look like by the time it is ready to be > used? I assume that by "ready to be used" you mean the compiled form? No, there doesn't seem to be a way to dump that. You can p = re.compile("\\\\sout{") print(p.pattern) but that just prints the input string, which you could do without compiling it first. But - without having looked at the implementation - it's far from clear that the compiled form would be useful to the user. It's probably some kind of state machine, and a large table of state transitions isn't very readable. There are a number of websites which visualize regular expressions. 
Those are probably better for debugging a regular expression than anything the re module could reasonably produce (although with the caveat that such a web site would use a different implementation and therefore might produce different results). hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp at hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" From list1 at tompassin.net Sat Oct 12 09:06:54 2024 From: list1 at tompassin.net (Thomas Passin) Date: Sat, 12 Oct 2024 09:06:54 -0400 Subject: Correct syntax for pathological re.search() In-Reply-To: References: <011301db1c22$5e7519c0$1b5f4d40$@gmail.com> Message-ID: On 10/11/2024 8:37 PM, MRAB via Python-list wrote: > On 2024-10-11 22:13, AVI GROSS via Python-list wrote: >> Is there some utility function out there that can be called to show >> what the >> regular expression you typed in will look like by the time it is ready >> to be >> used? >> >> Obviously, life is not that simple as it can go through multiple >> layers with >> each dealing with a layer of backslashes. >> >> But for simple cases, ... >> > Yes. It's called 'print'. :-) There is section in the Python docs about this backslash subject. It's titled "The Backslash Plague" in https://docs.python.org/3/howto/regex.html You can also inspect the compiled expression to see what string it received after all the escaping: >>> import re >>> >>> re_string = '\\w+\\\\sub' >>> re_pattern = re.compile(re_string) >>> >>> # Should look as if we had used r'\w+\\sub' >>> print(re_pattern.pattern) \w+\\sub >> -----Original Message----- >> From: Python-list > bounces+avi.e.gross=gmail.com at python.org> On >> Behalf Of Gilmeh Serda via Python-list >> Sent: Friday, October 11, 2024 10:44 AM >> To: python-list at python.org >> Subject: Re: Correct syntax for pathological re.search() >> >> On Mon, 7 Oct 2024 08:35:32 -0500, Michael F. 
Stemper wrote: >> >>> I'm trying to discard lines that include the string "\sout{" (which is >>> TeX, for those who are curious. I have tried: >>> ?? if not re.search("\sout{", line): if not re.search("\sout\{", line): >>> ?? if not re.search("\\sout{", line): if not re.search("\\sout\{", >>> ?? line): >>> >>> But the lines with that string keep coming through. What is the right >>> syntax to properly escape the backslash and the left curly bracket? >> >> $ python >> Python 3.12.6 (main, Sep? 8 2024, 13:18:56) [GCC 14.2.1 20240805] on >> linux >> Type "help", "copyright", "credits" or "license" for more information. >>>>> import re >>>>> s = r"testing \sout{WHADDEVVA}" >>>>> re.search(r"\\sout{", s) >> >> >> You want a literal backslash, hence, you need to escape everything. >> >> It is not enough to escape the "\s" as "\\s", because that only takes >> care >> of Python's demands for escaping "\". You also need to escape the "\" for >> the RegEx as well, or it will read it like it means "\s", which is the >> RegEx for a space character and therefore your search doesn't match, >> because it reads it like you want to search for " out{". >> >> Therefore, you need to escape it either as per my example, or by using >> four "\" and no "r" in front of the first quote, which also works: >> >>>>> re.search("\\\\sout{", s) >> >> >> You don't need to escape the curly braces. We call them "seagull wings" >> where I live. >> > From martin.schoon at gmail.com Tue Oct 15 16:16:41 2024 From: martin.schoon at gmail.com (Martin =?UTF-8?Q?Sch=C3=B6=C3=B6n?=) Date: 15 Oct 2024 20:16:41 GMT Subject: Old matplotlib animation now fails Message-ID: Some years ago I created a Python program that reads GPS data and creates an animation stored in an mp4 file. Not very elegant but it worked. Not very original as it was based on the example found here: https://shorturl.at/dTCZZ Last time it worked was about a year ago. 
Since then I have moved to a later version of Debian and Conda and as a consequence a later version of Python 3 (now 3.12.2). Now my code fails. I have downloaded the latest version of the example and it also fails. It is the second to last line that throws an error: l.set_data(x0, y0) The error messages drills down to something called "/home/.../matplotlib/lines.py", line 1289, in set_xdata and tells me 'x must be a sequence' I have started to dig around in matplotlib's documentation but my strategy is clearly wanting. I don't really know where to start looking for information on how to correct my code. Hence, this call for help. Any ideas? TIA /Martin From python at mrabarnett.plus.com Tue Oct 15 19:38:01 2024 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 16 Oct 2024 00:38:01 +0100 Subject: Old matplotlib animation now fails In-Reply-To: References: Message-ID: <2136e51b-c556-4bb3-bcb3-d7299ae80be5@mrabarnett.plus.com> On 2024-10-15 21:16, Martin Sch??n via Python-list wrote: > Some years ago I created a Python program that reads GPS data and > creates an animation stored in an mp4 file. Not very elegant but it > worked. Not very original as it was based on the example found here: > > https://shorturl.at/dTCZZ > > Last time it worked was about a year ago. Since then I have moved to a > later version of Debian and Conda and as a consequence a later version > of Python 3 (now 3.12.2). > > Now my code fails. I have downloaded the latest version of the example > and it also fails. > > It is the second to last line that throws an error: > > l.set_data(x0, y0) > > The error messages drills down to something called > "/home/.../matplotlib/lines.py", line 1289, in set_xdata > > and tells me 'x must be a sequence' > > I have started to dig around in matplotlib's documentation but my > strategy is clearly wanting. I don't really know where to start > looking for information on how to correct my code. Hence, this > call for help. > > Any ideas? 
>
This is from the help:

"""
Help on function set_data in module matplotlib.lines:

set_data(self, *args)
    Set the x and y data.

    Parameters
    ----------
    *args : (2, N) array or two 1D arrays

    See Also
    --------
    set_xdata
    set_ydata
"""

So, the arguments should be arrays:

For example:

x0, y0 = np.array([0.0]), np.array([0.0])

Has the API changed at some point?

From hugo at python.org Wed Oct 16 04:09:13 2024
From: hugo at python.org (Hugo van Kemenade)
Date: Wed, 16 Oct 2024 11:09:13 +0300
Subject: [RELEASE] Python 3.14.0 alpha 1 is now available
Message-ID:

It's now time for a new alpha of a new version of Python!

https://www.python.org/downloads/release/python-3140a1/

**This is an early developer preview of Python 3.14**

# Major new features of the 3.14 series, compared to 3.13

Python 3.14 is still in development. This release, 3.14.0a1, is the first
of seven planned alpha releases.

Alpha releases are intended to make it easier to test the current state of
new features and bug fixes and to test the release process.

During the alpha phase, features may be added up until the start of the
beta phase (2025-05-06) and, if necessary, may be modified or deleted up
until the release candidate phase (2025-07-22). Please keep in mind that
this is a preview release and its use is **not** recommended for
production environments.

Many new features for Python 3.14 are still being planned and written.
Among the major new features and changes so far:

* PEP 649 (https://peps.python.org/pep-0649/): deferred evaluation of
  annotations (
  https://docs.python.org/3.14/whatsnew/3.14.html#pep-649-deferred-evaluation-of-annotations
  )
* Improved error messages (
  https://docs.python.org/3.14/whatsnew/3.14.html#improved-error-messages)
* (Hey, **fellow core developer,** if a feature you find important is
  missing from this list, let Hugo know (hugo at python.org).)

The next pre-release of Python 3.14 will be 3.14.0a2, currently scheduled
for 2024-11-19.

# More resources

* Online documentation: https://docs.python.org/3.14/
* PEP 745, 3.14 Release Schedule: https://peps.python.org/pep-0745/
* Report bugs at https://github.com/python/cpython/issues
* Help fund Python and its community: https://www.python.org/psf/donations/

# And now for something completely different

π (or pi) is a mathematical constant, approximately 3.14, for the ratio of
a circle's circumference to its diameter. It is an irrational number,
which means it cannot be written as a simple fraction of two integers.
When written as a decimal, its digits go on forever without ever repeating
a pattern.

Here's 76 digits of π:

3.141592653589793238462643383279502884197169399375105820974944592307816406286

Piphilology is the creation of mnemonics to help remember digits of π.

In a pi-poem, or "piem", the number of letters in a word equals the
corresponding digit. This covers 9 digits, 3.14159265:

> *How I wish I could recollect pi easily today!*

One of the most well-known covers 15 digits, 3.14159265358979:

> *How I want a drink, alcoholic of course, after the heavy chapters
involving quantum mechanics!*

Here's a 35-word piem in the shape of a circle,
3.1415926535897932384626433832795728:

It's a fact
A ratio immutable
Of circle round and width,
Produces geometry's deepest conundrum.
For as the numerals stay random,
No repeat lets out its presence,
Yet it forever stretches forth.
Nothing to eternity.

The Guinness World Record for memorising the most digits is held by
Rajveer Meena, who recited 70,000 digits blindfold in 2015. The unofficial
record is held by Akira Haraguchi, who recited 100,000 digits in 2006.

# Enjoy the new release

Thanks to all of the many volunteers who help make Python Development and
these releases possible! Please consider supporting our efforts by
volunteering yourself or through organization contributions to the Python
Software Foundation.

Regards from a bright and colourful Helsinki,

Your release team,
Hugo van Kemenade
Ned Deily
Steve Dower
Łukasz Langa

From martin.schoon at gmail.com Wed Oct 16 04:20:10 2024
From: martin.schoon at gmail.com (Martin Schöön)
Date: 16 Oct 2024 08:20:10 GMT
Subject: Old matplotlib animation now fails
References:
Message-ID:

On 2024-10-15, Stefan Ram wrote:
> Martin Schöön wrote or quoted:
>>l.set_data(x0, y0)
>
> Well, I got to say, it's pretty rad that you're rocking Python!
> That language is the bee's knees, for real.
>
> As for your question, here's my two cents off the cuff:
> Could it be that the newer Matplotlib versions are jonesing
> for something like "l.set_data( [ x0 ],[ y0 ])" in that spot?
>
Thanks, that was quick and adding square brackets fixed my code.

Me rocking Python?

/Martin

From martin.schoon at gmail.com Wed Oct 16 04:23:17 2024
From: martin.schoon at gmail.com (Martin Schöön)
Date: 16 Oct 2024 08:23:17 GMT
Subject: Old matplotlib animation now fails
References: <2136e51b-c556-4bb3-bcb3-d7299ae80be5@mrabarnett.plus.com>
Message-ID:

On 2024-10-15, MRAB wrote:
> On 2024-10-15 21:16, Martin Schöön via Python-list wrote:
>> Some years ago I created a Python program that reads GPS data and
>> It is the second to last line that throws an error:
>>
>> l.set_data(x0, y0)
>>
>> The error messages drills down to something called
>> "/home/.../matplotlib/lines.py", line 1289, in set_xdata
>>
>> and tells me 'x must be a sequence'
>>
> """
> Help on function set_data in module matplotlib.lines:
>
> set_data(self, *args)
> Set the x and y data.
>
> Parameters
> ----------
> *args : (2, N) array or two 1D arrays
>
> See Also
> --------
> set_xdata
> set_ydata
> """
>
> So, the arguments should be arrays:
>
> For example:
>
> x0, y0 = np.array([0.0]), np.array([0.0])
>
> Has the API changed at some point?
>
So it seems. Thanks for the quick reply.
/Martin

From roland.em0001 at googlemail.com Wed Oct 16 14:32:32 2024
From: roland.em0001 at googlemail.com (Roland Müller)
Date: Wed, 16 Oct 2024 21:32:32 +0300
Subject: Common objects for CLI commands with Typer
In-Reply-To:
References: <87tteayavt.fsf@zedat.fu-berlin.de> <28833A4D-B57C-4195-87BF-FAAF9EFF5F19@barrys-emacs.org> <1E3ED29E-161E-430C-9E99-F89266472ADB@barrys-emacs.org>
Message-ID:

On 9/23/24 22:51, Dan Sommers via Python-list wrote:
> On 2024-09-23 at 19:00:10 +0100,
> Barry Scott wrote:
>
>>> On 21 Sep 2024, at 11:40, Dan Sommers via Python-list wrote:
>> But once your code gets big the discipline of using classes helps
>> maintenance. Code with lots of globals is problematic.
> Even before your code gets big, discipline helps maintenance. :-)
>
> Every level of your program has globals. An application with too many
> classes is no better (or worse) than a class with too many methods, or a
> module with too many functions. Insert your own definitions of (and
> tolerances for) "too many," which will vary in flexibility.
>
I think the need for classes arises when you need objects, that is, a set of variables with an identity that may be created N times. Classes are object factories.

A second aspect is inheritance: classes may inherit from other classes and reuse existing functionality and data structures for their objects.

In cases where you only need to encapsulate a single set of data and functions, modules are the best choice.

From martin.schoon at gmail.com Wed Oct 16 11:52:55 2024
From: martin.schoon at gmail.com (Martin Schöön)
Date: 16 Oct 2024 15:52:55 GMT
Subject: Old matplotlib animation now fails
References:
Message-ID:

On 2024-10-16, Stefan Ram wrote:
> Martin Schöön wrote or quoted:
>>Me rocking Python?
>
>|to rock
>|1. To use. To make do with, usually to great effect.
>|"You don't need to make up the guest bed; we can rock the couch."
> Urban Dictionary (2005) - Aaron Peckham (editor) (1979-04-03/), > Andrews McMeel Publishing, Kansas City > That is a use and meaning of rock I was not aware of. An example of what I use this Python code for (track top right): https://shorturl.at/m3ZKp (Youtube's compression algorithm clearly did not like this video.) /Martin From bowman at montana.com Wed Oct 16 17:47:08 2024 From: bowman at montana.com (rbowman) Date: 16 Oct 2024 21:47:08 GMT Subject: Old matplotlib animation now fails References: Message-ID: On 16 Oct 2024 08:20:10 GMT, Martin Sch??n wrote: > Den 2024-10-15 skrev Stefan Ram : >> Martin =?UTF-8?Q?Sch=C3=B6=C3=B6n?= wrote or >> quoted: >>>l.set_data(x0, y0) >> >> Well, I got to say, it's pretty rad that you're rocking Python! >> That language is the bee's knees, for real. >> >> As for your question, here's my two cents off the cuff: >> Could it be that the newer Matplotlib versions are jonesing for >> something like "l.set_data( [ x0 ],[ y0 ])" in that spot? >> > Thanks, that was quick and adding square brackets fixed my code. > > Me rocking Python? > > /Martin You have to understand Stefan tries to use American slang, not always entirely accurately. I think 'bee's knees' died out around 1931. From news at cct-net.co.uk Wed Oct 16 18:30:42 2024 From: news at cct-net.co.uk (Chris Townley) Date: Wed, 16 Oct 2024 23:30:42 +0100 Subject: Old matplotlib animation now fails In-Reply-To: References: Message-ID: On 16/10/2024 22:47, rbowman wrote: > On 16 Oct 2024 08:20:10 GMT, Martin Sch??n wrote: > >> Den 2024-10-15 skrev Stefan Ram : >>> Martin =?UTF-8?Q?Sch=C3=B6=C3=B6n?= wrote or >>> quoted: >>>> l.set_data(x0, y0) >>> >>> Well, I got to say, it's pretty rad that you're rocking Python! >>> That language is the bee's knees, for real. >>> >>> As for your question, here's my two cents off the cuff: >>> Could it be that the newer Matplotlib versions are jonesing for >>> something like "l.set_data( [ x0 ],[ y0 ])" in that spot? 
>>> >> Thanks, that was quick and adding square brackets fixed my code. >> >> Me rocking Python? >> >> /Martin > > You have to understand Stefan tries to use American slang, not always > entirely accurately. I think 'bee's knees' died out around 1931. > Not sure about America, but the bee's knees is still in common use in the UK -- Chris From bowman at montana.com Wed Oct 16 23:19:17 2024 From: bowman at montana.com (rbowman) Date: 17 Oct 2024 03:19:17 GMT Subject: Old matplotlib animation now fails References: Message-ID: On Wed, 16 Oct 2024 23:30:42 +0100, Chris Townley wrote: > Not sure about America, but the bee's knees is still in common use in > the UK https://en.wikipedia.org/wiki/Bee's_knees That version? A local bakery makes a honey flavored pastry they call 'bee's knees' but using it in a conversation would be campy. From hjp-python at hjp.at Fri Oct 18 17:09:41 2024 From: hjp-python at hjp.at (Peter J. Holzer) Date: Fri, 18 Oct 2024 23:09:41 +0200 Subject: Correct syntax for pathological re.search() In-Reply-To: <966b510d-9bd7-4472-a858-7e042d78461d@tompassin.net> References: <011301db1c22$5e7519c0$1b5f4d40$@gmail.com> <20241012105958.cbctekv7vustleha@hjp.at> <966b510d-9bd7-4472-a858-7e042d78461d@tompassin.net> Message-ID: <20241018210941.f5azh2lvz7cxzcy5@hjp.at> On 2024-10-12 08:51:57 -0400, Thomas Passin via Python-list wrote: > On 10/12/2024 6:59 AM, Peter J. Holzer via Python-list wrote: > > On 2024-10-11 17:13:07 -0400, AVI GROSS via Python-list wrote: > > > Is there some utility function out there that can be called to show what the > > > regular expression you typed in will look like by the time it is ready to be > > > used? > > > > I assume that by "ready to be used" you mean the compiled form? > > > > No, there doesn't seem to be a way to dump that. You can > > > > p = re.compile("\\\\sout{") > > print(p.pattern) > > > > but that just prints the input string, which you could do without > > compiling it first. 
> > It prints the escaped version, Did you mean the *un*escaped version? Well, yeah, that's what print does. > so you can see if you escaped the string as you intended. In this > case, the print will display '\\sout{'. print("\\\\sout{") will do the same. It seems to me that for any string s which is a valid regular expression (i.e. re.compile doesn't throw an exception) assert re.compile(s).pattern == s holds. So it doesn't give you anything you didn't already know. As a trivial example, the regular expressions r"\\sout{" and r"\\sout\{" are equivalent (the \ before the { is redundant). Yet re.compile(s).pattern preserves the difference between the two strings. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp at hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From nospam at please.ty Fri Oct 18 18:15:23 2024 From: nospam at please.ty (jak) Date: Sat, 19 Oct 2024 00:15:23 +0200 Subject: Correct syntax for pathological re.search() In-Reply-To: References: <011301db1c22$5e7519c0$1b5f4d40$@gmail.com> <20241012105958.cbctekv7vustleha@hjp.at> <966b510d-9bd7-4472-a858-7e042d78461d@tompassin.net> <20241018210941.f5azh2lvz7cxzcy5@hjp.at> Message-ID: Peter J. Holzer ha scritto: > As a trivial example, the regular expressions r"\\sout{" and r"\\sout\{" > are equivalent (the \ before the { is redundant). Yet > re.compile(s).pattern preserves the difference between the two strings. Hi, Allow me to be fussy: r"\\sout{" and r"\\sout\{" are similar but not equivalent. If you omit the backslash, the parser will have to determine if the graph is part of regular expression {n, m} and will take more time. 
Some online regex testers give these results:

r"\\sout{"  : 1 match ( 7 steps, 620 µs )
r"\\sout\{" : 1 match ( 7 steps, 360 µs )

From hjp-python at hjp.at Mon Oct 21 15:10:49 2024
From: hjp-python at hjp.at (Peter J. Holzer)
Date: Mon, 21 Oct 2024 21:10:49 +0200
Subject: Correct syntax for pathological re.search()
In-Reply-To:
References: <011301db1c22$5e7519c0$1b5f4d40$@gmail.com> <20241012105958.cbctekv7vustleha@hjp.at> <966b510d-9bd7-4472-a858-7e042d78461d@tompassin.net> <20241018210941.f5azh2lvz7cxzcy5@hjp.at>
Message-ID: <20241021191049.iclg7pmpfrpkel55@hjp.at>

On 2024-10-19 00:15:23 +0200, jak via Python-list wrote:
> Peter J. Holzer wrote:
> > As a trivial example, the regular expressions r"\\sout{" and r"\\sout\{"
> > are equivalent (the \ before the { is redundant). Yet
> > re.compile(s).pattern preserves the difference between the two strings.
>
> Allow me to be fussy: r"\\sout{" and r"\\sout\{" are similar but not
> equivalent.

They are. Both will match the 6 character string

0005c \ REVERSE SOLIDUS
00073 s LATIN SMALL LETTER S
0006f o LATIN SMALL LETTER O
00075 u LATIN SMALL LETTER U
00074 t LATIN SMALL LETTER T
0007b { LEFT CURLY BRACKET

> If you omit the backslash, the parser will have to determine if the
> graph is part of regular expression {n, m} and will take more time.

Yes, that's the parser. But the result of parsing will be the same: The string will end in a literal backslash.

hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at      |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From jacob.kruger.work at gmail.com Tue Oct 22 08:03:14 2024 From: jacob.kruger.work at gmail.com (Jacob Kruger) Date: Tue, 22 Oct 2024 14:03:14 +0200 Subject: Capturing screenshots and recording audio in an ongoing basis, and submitting data to a RESTFul API Message-ID: <5d797501-cc7d-46e8-9f72-098a5c4d0748@gmail.com> Hi there - know this might be a silly question, but asking anyway... As in, know these formats/data-types are probably not really possible to compress any more than they already are. Have managed to sort out capturing screenshots repeatedly, while recording audio in the background, using combination of PIL's ImageGrab, and pyaudio, and can then use moviepy, which is a sort of wrapper around/interface to the FFMPEG command-line utility - this all allows me to record forms of screencast recordings, setting my own forms of time-frames, etc. in terms of the looping interval when I want to capture screenshots, etc., before then combining them into video clips with the audio recording merged in as a background track, and, all works fine, but, we want to use this as a form of monitoring service for call-centre staff, at times, and, the only real remaining issue is file-size/data in terms of both hard-drive storage space, and, bandwidth in terms of submitting resulting data to a RESTFul API. For example, a test video clip, generated using the libvpx codec, resulting in a .webm file, with a total length of 14 seconds, has a file size of 100KB. Also, don't think it's really relevant, but, am then just using things like requests module to submit data to the RESTFul API, which have used flask to implement. 
So, know this question might be a waste of time since have already played around with selecting the video codec that generates the smallest resulting file-size, and, not sure if might be able to drop image snapshot file sizes by using something like grayscale, which moviepy doesn't want to work with directly during generating original video clips, but just wondering if there might be any way to try converting binary data into smaller data chunks to then upload these via my RESTFul API, where could then convert them back to multimedia formats, etc.? Any thoughts/suggestions on this type of thing, and, on that note, all of this will be running as something like a background service on call-centre staff's windows 11 machines, if relevant.? As in, if there might be some way to store data and then generate multimedia later on, on the server handling the RESTFul API, that could also work, but, main thing is to both save storage data on workstations, as well as limit amount of bandwidth required overall since the number of target machines could easily be enough to use up a lot of bandwidth, etc., so, what we are looking into at the moment relates to only triggerring recordings at certain times on certain machines, in between. Thanks in advance --- Jacob Kruger +2782 413 4791 "Resistance is futile!...Acceptance is versatile..." From sjeik_appie at hotmail.com Wed Oct 23 13:07:14 2024 From: sjeik_appie at hotmail.com (Albert-Jan Roskam) Date: Wed, 23 Oct 2024 19:07:14 +0200 Subject: Chardet oddity Message-ID: Today I used chardet.detect in the repl and it returned windows-1252 (incorrect, because it later resulted in a UnicodeDecodeError). When I ran chardet as a script (which uses UniversalLineDetector) this returned MacRoman. Isn't charset.detect the correct way? I've used this method many times. 
# Interpreter >>> contents = open(FILENAME, "rb").read() >>> chardet.detect(content) {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''} # Terminal $ python -m chardet FILENAME FILENAME: MacRoman with confidence 0.7167379080370483 Thanks! Albert-Jan From nntp.mbourne at spamgourmet.com Wed Oct 23 15:42:00 2024 From: nntp.mbourne at spamgourmet.com (Mark Bourne) Date: Wed, 23 Oct 2024 20:42:00 +0100 Subject: Chardet oddity In-Reply-To: References: Message-ID: Albert-Jan Roskam wrote: > Today I used chardet.detect in the repl and it returned windows-1252 > (incorrect, because it later resulted in a UnicodeDecodeError). When I ran > chardet as a script (which uses UniversalLineDetector) this returned > MacRoman. Isn't charset.detect the correct way? I've used this method many > times. > # Interpreter > >>> contents = open(FILENAME, "rb").read() > >>> chardet.detect(content) Is that copy and pasted from the terminal, or retyped with possible transcription errors? As written, you've assigned the open file handle to `contents`, but passed `content` (with no "s") to `chardet.detect` - so the result would depend on whatever was previously assigned to `content`. > {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': > ''} > # Terminal > $ python -m chardet FILENAME > FILENAME: MacRoman with confidence 0.7167379080370483 > Thanks! > Albert-Jan -- Mark. From c.buhtz at posteo.jp Thu Oct 24 03:33:04 2024 From: c.buhtz at posteo.jp (c.buhtz at posteo.jp) Date: Thu, 24 Oct 2024 07:33:04 +0000 Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment Message-ID: <4a13731716200669342338ae409e73ca@posteo.de> Hello, I am upstream maintainer of "Back In Time" [1] investigating an issue a distro maintainer from Fedora reported [2] to me. On one hand Fedora seems to use a tool called "mock" to build packages in a chroot environment. 
On the other hand the test suite of "Back In Time" does read and write to the real file system. One test fails because a temporary directory is cleaned up using shutil.rmtree(). Please see the output below. I am not familiar with Fedora and "mock". So I am not able to reproduce this on my own. It seems the Fedora maintainer also has no clue how to solve it or why it happens. Can you please have a look (especially at the line "assert func is os.lstat"). Maybe you have an idea what is the intention behind this error raised by an "assert" statement inside "shutil.rmtree()". Thanks in advance, Christian Buhtz [1] -- [2] -- __________________________ General.test_ctor_defaults __________________________ self = def test_ctor_defaults(self): """Default values in constructor.""" > with TemporaryDirectory(prefix='bit.') as temp_name: test/test_uniquenessset.py:47: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ /usr/lib64/python3.13/tempfile.py:946: in __exit__ self.cleanup() /usr/lib64/python3.13/tempfile.py:950: in cleanup self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors) /usr/lib64/python3.13/tempfile.py:930: in _rmtree _shutil.rmtree(name, onexc=onexc) /usr/lib64/python3.13/shutil.py:763: in rmtree _rmtree_safe_fd(stack, onexc) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ stack = [] onexc = .onexc at 0xffffb39bc860> def _rmtree_safe_fd(stack, onexc): # Each stack item has four elements: # * func: The first operation to perform: os.lstat, os.close or os.rmdir. # Walking a directory starts with an os.lstat() to detect symlinks; in # this case, func is updated before subsequent operations and passed to # onexc() if an error occurs. # * dirfd: Open file descriptor, or None if we're processing the top-level # directory given to rmtree() and the user didn't supply dir_fd. # * path: Path of file to operate upon. This is passed to onexc() if an # error occurs. 
# * orig_entry: os.DirEntry, or None if we're processing the top-level # directory given to rmtree(). We used the cached stat() of the entry to # save a call to os.lstat() when walking subdirectories. func, dirfd, path, orig_entry = stack.pop() name = path if orig_entry is None else orig_entry.name try: if func is os.close: os.close(dirfd) return if func is os.rmdir: os.rmdir(name, dir_fd=dirfd) return # Note: To guard against symlink races, we use the standard # lstat()/open()/fstat() trick. > assert func is os.lstat E AssertionError /usr/lib64/python3.13/shutil.py:663: AssertionError From python at mrabarnett.plus.com Thu Oct 24 10:45:47 2024 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 24 Oct 2024 15:45:47 +0100 Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment In-Reply-To: <4a13731716200669342338ae409e73ca@posteo.de> References: <4a13731716200669342338ae409e73ca@posteo.de> Message-ID: <8d82b75d-a773-4854-bf44-cf480fdf3b84@mrabarnett.plus.com> On 2024-10-24 08:33, Christian Buhtz via Python-list wrote: > Hello, > I am upstream maintainer of "Back In Time" [1] investigating an issue a > distro maintainer from Fedora reported [2] to me. > > On one hand Fedora seems to use a tool called "mock" to build packages > in a chroot environment. > On the other hand the test suite of "Back In Time" does read and write > to the real file system. > One test fails because a temporary directory is cleaned up using > shutil.rmtree(). Please see the output below. > > I am not familiar with Fedora and "mock". So I am not able to reproduce > this on my own. > It seems the Fedora maintainer also has no clue how to solve it or why > it happens. > > Can you please have a look (especially at the line "assert func is > os.lstat"). > Maybe you have an idea what is the intention behind this error raised by > an "assert" statement inside "shutil.rmtree()". 
> > Thanks in advance, > Christian Buhtz > > [1] -- > [2] -- > > __________________________ General.test_ctor_defaults > __________________________ > self = > def test_ctor_defaults(self): > """Default values in constructor.""" >> with TemporaryDirectory(prefix='bit.') as temp_name: > test/test_uniquenessset.py:47: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ _ _ _ > /usr/lib64/python3.13/tempfile.py:946: in __exit__ > self.cleanup() > /usr/lib64/python3.13/tempfile.py:950: in cleanup > self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors) > /usr/lib64/python3.13/tempfile.py:930: in _rmtree > _shutil.rmtree(name, onexc=onexc) > /usr/lib64/python3.13/shutil.py:763: in rmtree > _rmtree_safe_fd(stack, onexc) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ _ _ _ > stack = [] > onexc = .onexc at > 0xffffb39bc860> > def _rmtree_safe_fd(stack, onexc): > # Each stack item has four elements: > # * func: The first operation to perform: os.lstat, os.close or > os.rmdir. > # Walking a directory starts with an os.lstat() to detect > symlinks; in > # this case, func is updated before subsequent operations and > passed to > # onexc() if an error occurs. > # * dirfd: Open file descriptor, or None if we're processing the > top-level > # directory given to rmtree() and the user didn't supply > dir_fd. > # * path: Path of file to operate upon. This is passed to > onexc() if an > # error occurs. > # * orig_entry: os.DirEntry, or None if we're processing the > top-level > # directory given to rmtree(). We used the cached stat() of > the entry to > # save a call to os.lstat() when walking subdirectories. 
>          func, dirfd, path, orig_entry = stack.pop()
>          name = path if orig_entry is None else orig_entry.name
>          try:
>              if func is os.close:
>                  os.close(dirfd)
>                  return
>              if func is os.rmdir:
>                  os.rmdir(name, dir_fd=dirfd)
>                  return
>
>              # Note: To guard against symlink races, we use the standard
>              # lstat()/open()/fstat() trick.
>>             assert func is os.lstat
> E           AssertionError
> /usr/lib64/python3.13/shutil.py:663: AssertionError
>
What does "mock" do?

func should be either os.close, os.rmdir or os.lstat.

If mock is somehow replacing one of those functions, then it might break the code.

From olegsivokon at gmail.com Thu Oct 24 11:17:26 2024
From: olegsivokon at gmail.com (Left Right)
Date: Thu, 24 Oct 2024 17:17:26 +0200
Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment
In-Reply-To: <4a13731716200669342338ae409e73ca@posteo.de>
References: <4a13731716200669342338ae409e73ca@posteo.de>
Message-ID:

From reading the code where the exception is coming from, this is how I interpret the intention of the author: they build a list (not sure why they used list, when there's a stack datastructure in Python) which they use as a stack, where the elements of the stack are 4-tuples. The important part about these tuples is that the first element, the operation to be performed by rmtree(), has to be one of the known filesystem-related functions. The code raising the exception checks that it's one of those kinds and, if it isn't, crashes.

There is, however, a problem with testing equality (more strictly, identity in this case) between functions. I.e. it's possible that a function isn't identical to itself if, e.g., the "os" module was somehow loaded twice. I'm not sure if that's a real possibility with how Python works... but maybe in some cases, like multithreaded environments, it could happen...

To investigate this, I'd edit the file with the assertion and make it print the actual value found in os.lstat and func.
My guess is that they are both somehow "lstat", but with different memory addresses. On Thu, Oct 24, 2024 at 4:06?PM Christian Buhtz via Python-list wrote: > > Hello, > I am upstream maintainer of "Back In Time" [1] investigating an issue a > distro maintainer from Fedora reported [2] to me. > > On one hand Fedora seems to use a tool called "mock" to build packages > in a chroot environment. > On the other hand the test suite of "Back In Time" does read and write > to the real file system. > One test fails because a temporary directory is cleaned up using > shutil.rmtree(). Please see the output below. > > I am not familiar with Fedora and "mock". So I am not able to reproduce > this on my own. > It seems the Fedora maintainer also has no clue how to solve it or why > it happens. > > Can you please have a look (especially at the line "assert func is > os.lstat"). > Maybe you have an idea what is the intention behind this error raised by > an "assert" statement inside "shutil.rmtree()". > > Thanks in advance, > Christian Buhtz > > [1] -- > [2] -- > > __________________________ General.test_ctor_defaults > __________________________ > self = > def test_ctor_defaults(self): > """Default values in constructor.""" > > with TemporaryDirectory(prefix='bit.') as temp_name: > test/test_uniquenessset.py:47: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ _ _ _ > /usr/lib64/python3.13/tempfile.py:946: in __exit__ > self.cleanup() > /usr/lib64/python3.13/tempfile.py:950: in cleanup > self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors) > /usr/lib64/python3.13/tempfile.py:930: in _rmtree > _shutil.rmtree(name, onexc=onexc) > /usr/lib64/python3.13/shutil.py:763: in rmtree > _rmtree_safe_fd(stack, onexc) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ _ _ _ > stack = [] > onexc = .onexc at > 0xffffb39bc860> > def _rmtree_safe_fd(stack, onexc): > # Each stack item has four elements: > # * func: The first 
operation to perform: os.lstat, os.close or > os.rmdir. > # Walking a directory starts with an os.lstat() to detect > symlinks; in > # this case, func is updated before subsequent operations and > passed to > # onexc() if an error occurs. > # * dirfd: Open file descriptor, or None if we're processing the > top-level > # directory given to rmtree() and the user didn't supply > dir_fd. > # * path: Path of file to operate upon. This is passed to > onexc() if an > # error occurs. > # * orig_entry: os.DirEntry, or None if we're processing the > top-level > # directory given to rmtree(). We used the cached stat() of > the entry to > # save a call to os.lstat() when walking subdirectories. > func, dirfd, path, orig_entry = stack.pop() > name = path if orig_entry is None else orig_entry.name > try: > if func is os.close: > os.close(dirfd) > return > if func is os.rmdir: > os.rmdir(name, dir_fd=dirfd) > return > > # Note: To guard against symlink races, we use the standard > # lstat()/open()/fstat() trick. 
> > assert func is os.lstat > E AssertionError > /usr/lib64/python3.13/shutil.py:663: AssertionError > > -- > https://mail.python.org/mailman/listinfo/python-list From python at mrabarnett.plus.com Thu Oct 24 11:44:30 2024 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 24 Oct 2024 16:44:30 +0100 Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment In-Reply-To: References: <4a13731716200669342338ae409e73ca@posteo.de> Message-ID: <0c080a7d-92de-4518-ac44-78d97fc9e3bb@mrabarnett.plus.com> On 2024-10-24 16:17, Left Right via Python-list wrote: > From reading the code where the exception is coming from, this is how > I interpret the intention of the author: they build a list (not sure > why they used list, when there's a stack datastructure in Python) > which they use as a stack, where the elements of the stack are > 4-tuples, the important part about these tuples is that the first > element is the operation to be performed by rmtree() has to be one of > the known filesystem-related functions. The code raising the exception > checks that it's one of those kinds and if it isn't, crashes. > > There is, however, a problem with testing equality (more strictly, > identity in this case) between functions. I.e. it's possible that a > function isn't identical to itself is, eg. "os" module was somehow > loaded twice. I'm not sure if that's a real possibility with how > Python works... but maybe in some cases, like, multithreaded > environments it could happen... > > To investigate this, I'd edit the file with the assertion and make it > print the actual value found in os.lstat and func. My guess is that > they are both somehow "lstat", but with different memory addresses. > The stack is created on line 760 with os.lstat and entries are appended on lines 677 (os.rmdir), 679 (os.close) and 689 (os.lstat). 'func' is popped off the stack on line 651 and check in the following lines. 
I can't see anywhere else where something else is put onto the stack or an entry is replaced. > On Thu, Oct 24, 2024 at 4:06?PM Christian Buhtz via Python-list > wrote: >> >> Hello, >> I am upstream maintainer of "Back In Time" [1] investigating an issue a >> distro maintainer from Fedora reported [2] to me. >> >> On one hand Fedora seems to use a tool called "mock" to build packages >> in a chroot environment. >> On the other hand the test suite of "Back In Time" does read and write >> to the real file system. >> One test fails because a temporary directory is cleaned up using >> shutil.rmtree(). Please see the output below. >> >> I am not familiar with Fedora and "mock". So I am not able to reproduce >> this on my own. >> It seems the Fedora maintainer also has no clue how to solve it or why >> it happens. >> >> Can you please have a look (especially at the line "assert func is >> os.lstat"). >> Maybe you have an idea what is the intention behind this error raised by >> an "assert" statement inside "shutil.rmtree()". 
>> >> Thanks in advance, >> Christian Buhtz >> >> [1] -- >> [2] -- >> >> __________________________ General.test_ctor_defaults >> __________________________ >> self = >> def test_ctor_defaults(self): >> """Default values in constructor.""" >> > with TemporaryDirectory(prefix='bit.') as temp_name: >> test/test_uniquenessset.py:47: >> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ >> _ _ _ _ >> /usr/lib64/python3.13/tempfile.py:946: in __exit__ >> self.cleanup() >> /usr/lib64/python3.13/tempfile.py:950: in cleanup >> self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors) >> /usr/lib64/python3.13/tempfile.py:930: in _rmtree >> _shutil.rmtree(name, onexc=onexc) >> /usr/lib64/python3.13/shutil.py:763: in rmtree >> _rmtree_safe_fd(stack, onexc) >> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ >> _ _ _ _ >> stack = [] >> onexc = .onexc at >> 0xffffb39bc860> >> def _rmtree_safe_fd(stack, onexc): >> # Each stack item has four elements: >> # * func: The first operation to perform: os.lstat, os.close or >> os.rmdir. >> # Walking a directory starts with an os.lstat() to detect >> symlinks; in >> # this case, func is updated before subsequent operations and >> passed to >> # onexc() if an error occurs. >> # * dirfd: Open file descriptor, or None if we're processing the >> top-level >> # directory given to rmtree() and the user didn't supply >> dir_fd. >> # * path: Path of file to operate upon. This is passed to >> onexc() if an >> # error occurs. >> # * orig_entry: os.DirEntry, or None if we're processing the >> top-level >> # directory given to rmtree(). We used the cached stat() of >> the entry to >> # save a call to os.lstat() when walking subdirectories. 
>> func, dirfd, path, orig_entry = stack.pop() >> name = path if orig_entry is None else orig_entry.name >> try: >> if func is os.close: >> os.close(dirfd) >> return >> if func is os.rmdir: >> os.rmdir(name, dir_fd=dirfd) >> return >> >> # Note: To guard against symlink races, we use the standard >> # lstat()/open()/fstat() trick. >> > assert func is os.lstat >> E AssertionError >> /usr/lib64/python3.13/shutil.py:663: AssertionError >> >> -- >> https://mail.python.org/mailman/listinfo/python-list From roland.em0001 at googlemail.com Thu Oct 24 11:51:47 2024 From: roland.em0001 at googlemail.com (Roland Mueller) Date: Thu, 24 Oct 2024 18:51:47 +0300 Subject: Chardet oddity In-Reply-To: References: Message-ID: ke 23. lokak. 2024 klo 20.11 Albert-Jan Roskam via Python-list ( python-list at python.org) kirjoitti: > Today I used chardet.detect in the repl and it returned windows-1252 > (incorrect, because it later resulted in a UnicodeDecodeError). When I > ran > chardet as a script (which uses UniversalLineDetector) this returned > MacRoman. Isn't charset.detect the correct way? I've used this method > many > times. > # Interpreter > >>> contents = open(FILENAME, "rb").read() > >>> chardet.detect(content) > {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, > 'language': > ''} > # Terminal > $ python -m chardet FILENAME > FILENAME: MacRoman with confidence 0.7167379080370483 > Thanks! > Albert-Jan > The entry point for the module chardet is chardet.cli.chardetect:main and main() calls function description_of(lines, name). 'lines' is an opened file in mode 'rb' and name will hold the filename. Following way I tried this in interactive mode: I think the crucial difference is that description_of(lines, name) reads the opened file line by line and stops after something has been detected in some line. When reading the whole file into the variable contents probably gives another result depending on the input. This behaviour I was not able to repeat. 
I am assuming that you used the same Python for both tests.

    >>> from chardet.cli import chardetect
    >>> chardetect.description_of(open('/tmp/DATE', 'rb'), 'some file')
    'some file: ascii with confidence 1.0'
    >>>

Your approach:

    >>> from chardet import detect
    >>> detect(open('/tmp/DATE','rb').read())
    {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}

From /usr/lib/python3/dist-packages/chardet/cli/chardetect.py:

    def description_of(lines, name='stdin'):
        u = UniversalDetector()
        for line in lines:
            line = bytearray(line)
            u.feed(line)
            # shortcut out of the loop to save reading further - particularly useful if we read a BOM.
            if u.done:
                break
        u.close()
        result = u.result
        ...

> --
> https://mail.python.org/mailman/listinfo/python-list
>

From python at mrabarnett.plus.com Thu Oct 24 13:08:02 2024
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 24 Oct 2024 18:08:02 +0100
Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment
In-Reply-To:
References: <4a13731716200669342338ae409e73ca@posteo.de> <0c080a7d-92de-4518-ac44-78d97fc9e3bb@mrabarnett.plus.com>
Message-ID: <69934a74-6d04-40a0-a75f-b8024bd0af43@mrabarnett.plus.com>

On 2024-10-24 17:30, Left Right wrote:
> > The stack is created on line 760 with os.lstat and entries are appended
> > on lines 677 (os.rmdir), 679 (os.close) and 689 (os.lstat).
> >
> > 'func' is popped off the stack on line 651 and checked in the following lines.
> >
> > I can't see anywhere else where something else is put onto the stack or
> > an entry is replaced.
>
> But how do you know this code isn't executed from different threads?
> What I anticipate to be the problem is that the "os" module is
> imported twice, and there are two references to "os.lstat". Normally,
> this wouldn't cause a problem, because they are the same function that
> doesn't have any state, but once you are trying to compare them, the
> identity test will fail, because those functions were loaded multiple
> times into different memory locations.
>
> I don't know of any specific mechanism for forcing the interpreter to
> import the same module multiple times, but if that was possible (which
> in principle it is), then it would explain the behavior.

The stack is a local variable and os.lstat, etc., are pushed and popped in
one function and then another that it calls, so they're in the same thread.

From olegsivokon at gmail.com Thu Oct 24 12:30:21 2024
From: olegsivokon at gmail.com (Left Right)
Date: Thu, 24 Oct 2024 18:30:21 +0200
Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment
In-Reply-To: <0c080a7d-92de-4518-ac44-78d97fc9e3bb@mrabarnett.plus.com>
References: <4a13731716200669342338ae409e73ca@posteo.de> <0c080a7d-92de-4518-ac44-78d97fc9e3bb@mrabarnett.plus.com>
Message-ID:

> The stack is created on line 760 with os.lstat and entries are appended
> on lines 677 (os.rmdir), 679 (os.close) and 689 (os.lstat).
>
> 'func' is popped off the stack on line 651 and checked in the following lines.
>
> I can't see anywhere else where something else is put onto the stack or
> an entry is replaced.

But how do you know this code isn't executed from different threads?
What I anticipate to be the problem is that the "os" module is
imported twice, and there are two references to "os.lstat". Normally,
this wouldn't cause a problem, because they are the same function that
doesn't have any state, but once you are trying to compare them, the
identity test will fail, because those functions were loaded multiple
times into different memory locations.

I don't know of any specific mechanism for forcing the interpreter to
import the same module multiple times, but if that was possible (which
in principle it is), then it would explain the behavior.
From olegsivokon at gmail.com Thu Oct 24 15:21:17 2024
From: olegsivokon at gmail.com (Left Right)
Date: Thu, 24 Oct 2024 21:21:17 +0200
Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment
In-Reply-To: <69934a74-6d04-40a0-a75f-b8024bd0af43@mrabarnett.plus.com>
References: <4a13731716200669342338ae409e73ca@posteo.de> <0c080a7d-92de-4518-ac44-78d97fc9e3bb@mrabarnett.plus.com> <69934a74-6d04-40a0-a75f-b8024bd0af43@mrabarnett.plus.com>
Message-ID:

> > > The stack is created on line 760 with os.lstat and entries are appended
> > > on lines 677 (os.rmdir), 679 (os.close) and 689 (os.lstat).
> > >
> > > 'func' is popped off the stack on line 651 and check in the following lines.
> > >
> > > I can't see anywhere else where something else is put onto the stack or
> > > an entry is replaced.

But the _rmtree_safe_fd() compares func to a *dynamically* resolved
reference: os.lstat. If the reference to os changed (or os object was
modified to have new reference at lstat) between the time os.lstat was
added to the stack and the time of comparison, then comparison
would've failed. To illustrate my idea:

    os.lstat = lambda x: x         # thread 1
    stack.append((os.lstat, ...))  # thread 1
    os.lstat = lambda x: x         # thread 2
    func, *_ = stack.pop()         # thread 1
    assert func is os.lstat        # thread 1 (failure!)

The only question is: is it possible to modify os.lstat like that, and
if so, how?

Other alternatives include a malfunctioning "is" operator,
malfunctioning module cache... all those are a lot less likely.
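
The idea in the message above can be tried out without any threads. What follows is only a minimal sketch: it assumes nothing about how os.lstat was actually rebound in the failing build, and the names identity_check_after_patch and fake_lstat are hypothetical, invented for this illustration. The 4-tuple merely mirrors the (func, dirfd, path, orig_entry) stack items quoted from shutil.py earlier in the thread.

```python
import os


def identity_check_after_patch():
    """Return whether `func is os.lstat` still holds after os.lstat is rebound."""
    original_lstat = os.lstat
    # Push an item shaped like the ones in shutil._rmtree_safe_fd().
    stack = [(os.lstat, None, "some/path", None)]

    def fake_lstat(path, *args, **kwargs):
        # Hypothetical stand-in for whatever replaced os.lstat.
        return original_lstat(path, *args, **kwargs)

    try:
        os.lstat = fake_lstat                    # rebinding happens after the push
        func, _dirfd, _path, _entry = stack.pop()
        return func is os.lstat                  # the comparison shutil asserts on
    finally:
        os.lstat = original_lstat                # always restore the real function


print(identity_check_after_patch())
```

The popped func is still the original built-in, while the name os.lstat is looked up again at comparison time and now yields the replacement, so the identity test is false even though both callables behave the same.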
From python at mrabarnett.plus.com Thu Oct 24 15:54:53 2024
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 24 Oct 2024 20:54:53 +0100
Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment
In-Reply-To:
References: <4a13731716200669342338ae409e73ca@posteo.de> <0c080a7d-92de-4518-ac44-78d97fc9e3bb@mrabarnett.plus.com> <69934a74-6d04-40a0-a75f-b8024bd0af43@mrabarnett.plus.com>
Message-ID:

On 2024-10-24 20:21, Left Right wrote:
> > > > The stack is created on line 760 with os.lstat and entries are appended
> > > > on lines 677 (os.rmdir), 679 (os.close) and 689 (os.lstat).
> > > >
> > > > 'func' is popped off the stack on line 651 and check in the following lines.
> > > >
> > > > I can't see anywhere else where something else is put onto the stack or
> > > > an entry is replaced.
>
> But the _rmtree_safe_fd() compares func to a *dynamically* resolved
> reference: os.lstat. If the reference to os changed (or os object was
> modified to have new reference at lstat) between the time os.lstat was
> added to the stack and the time of comparison, then comparison
> would've failed. To illustrate my idea:
>
>     os.lstat = lambda x: x         # thread 1
>     stack.append((os.lstat, ...))  # thread 1
>     os.lstat = lambda x: x         # thread 2
>     func, *_ = stack.pop()         # thread 1
>     assert func is os.lstat        # thread 1 (failure!)
>
> The only question is: is it possible to modify os.lstat like that, and
> if so, how?
>
> Other alternatives include a malfunctioning "is" operator,
> malfunctioning module cache... all those are a lot less likely.

What is the probability of replacing os.lstat, os.close or os.rmdir from
another thread at just the right time?
From olegsivokon at gmail.com Thu Oct 24 16:08:31 2024
From: olegsivokon at gmail.com (Left Right)
Date: Thu, 24 Oct 2024 22:08:31 +0200
Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment
In-Reply-To:
References: <4a13731716200669342338ae409e73ca@posteo.de> <0c080a7d-92de-4518-ac44-78d97fc9e3bb@mrabarnett.plus.com> <69934a74-6d04-40a0-a75f-b8024bd0af43@mrabarnett.plus.com>
Message-ID:

> What is the probability of replacing os.lstat, os.close or os.rmdir from
> another thread at just the right time?

If the thread does "import os", and its start is logically connected to
calling _rmtree_safe_fd(), I'd say it's a very good chance! That is,
again, granted that the reference to os.lstat *can* be modified in this
way.

But, before we keep guessing any further, it'd be best if OP could get
us the info on what's stored in "func" and "os.lstat" at the time the
assertion fails.

From 2QdxY4RzWzUUiLuE at potatochowder.com Thu Oct 24 16:25:56 2024
From: 2QdxY4RzWzUUiLuE at potatochowder.com (2QdxY4RzWzUUiLuE at potatochowder.com)
Date: Thu, 24 Oct 2024 16:25:56 -0400
Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment
In-Reply-To:
References: <4a13731716200669342338ae409e73ca@posteo.de> <0c080a7d-92de-4518-ac44-78d97fc9e3bb@mrabarnett.plus.com> <69934a74-6d04-40a0-a75f-b8024bd0af43@mrabarnett.plus.com>
Message-ID:

On 2024-10-24 at 20:54:53 +0100, MRAB via Python-list wrote:

> On 2024-10-24 20:21, Left Right wrote:
> > > > > The stack is created on line 760 with os.lstat and entries are appended
> > > > > on lines 677 (os.rmdir), 679 (os.close) and 689 (os.lstat).
> > > > >
> > > > > 'func' is popped off the stack on line 651 and check in the following lines.
> > > > >
> > > > > I can't see anywhere else where something else is put onto the stack or
> > > > > an entry is replaced.
> >
> > But the _rmtree_safe_fd() compares func to a *dynamically* resolved
> > reference: os.lstat.
> > If the reference to os changed (or os object was
> > modified to have new reference at lstat) between the time os.lstat was
> > added to the stack and the time of comparison, then comparison
> > would've failed. To illustrate my idea:
> >
> >     os.lstat = lambda x: x         # thread 1
> >     stack.append((os.lstat, ...))  # thread 1
> >     os.lstat = lambda x: x         # thread 2
> >     func, *_ = stack.pop()         # thread 1
> >     assert func is os.lstat        # thread 1 (failure!)
> >
> > The only question is: is it possible to modify os.lstat like that, and
> > if so, how?
> >
> > Other alternatives include a malfunctioning "is" operator,
> > malfunctioning module cache... all those are a lot less likely.

> What is the probability of replacing os.lstat, os.close or os.rmdir from
> another thread at just the right time?

That is never the right question in a multi-threaded system. The answer
is always that it doesn't matter, the odds will beat you in the end. Or
sometimes right in the middle of a CPU instruction; does anyone remember
the MC680XX series?

Yes, as a matter of fact, I did use to make my living designing,
building, delivering, and maintaining such systems.

From barry at barrys-emacs.org Thu Oct 24 18:44:43 2024
From: barry at barrys-emacs.org (Barry)
Date: Thu, 24 Oct 2024 23:44:43 +0100
Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment
In-Reply-To: <4a13731716200669342338ae409e73ca@posteo.de>
References: <4a13731716200669342338ae409e73ca@posteo.de>
Message-ID: <36BEA5DD-D591-4D0E-B7CF-78609AA42B92@barrys-emacs.org>

> On 24 Oct 2024, at 15:07, Christian Buhtz via Python-list wrote:
>
> On one hand Fedora seems to use a tool called "mock" to build packages in a chroot environment.
> On the other hand the test suite of "Back In Time" does read and write to the real file system.

I am a Fedora packager and can help explain what the tools are doing.

Mock runs the build in a chroot env that allows for reproducible clean
room builds. Sort of like a container.
This has nothing to do with the Python mock package.

What do you mean by the real file system?

You cannot write to the /usr file system. Is that what your tests do?
If so, that needs changing.

Barry

From c.buhtz at posteo.jp Fri Oct 25 02:59:28 2024
From: c.buhtz at posteo.jp (c.buhtz at posteo.jp)
Date: Fri, 25 Oct 2024 06:59:28 +0000
Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment
In-Reply-To:
References: <4a13731716200669342338ae409e73ca@posteo.de>
Message-ID:

Thank you very much for all your responses.

On 24.10.2024 17:17, Left Right wrote:
> To investigate this, I'd edit the file with the assertion and make it
> print the actual value found in os.lstat and func. My guess is that
> they are both somehow "lstat", but with different memory addresses.

My reporter provided this [1]. I think this is the relevant output:

=========================== short test summary info ============================
FAILED test/test_plugin_usercallback.py::SystemTest::test_local_snapshot - As...
FAILED test/test_uniquenessset.py::General::test_ctor_defaults - AssertionError
FAILED test/test_uniquenessset.py::General::test_deep_check - AssertionError
FAILED test/test_uniquenessset.py::General::test_fail_equal_without_equal_to
FAILED test/test_uniquenessset.py::General::test_size_mtime - AssertionError
FAILED test/test_uniquenessset.py::General::test_unique_myself - AssertionError
FAILED test/test_uniquenessset.py::General::test_unique_size_but_different_mtime
================== 7 failed, 267 passed, 16 skipped in 20.79s ==================
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
os.lstat=
make: Leaving directory '/home/johannes/rpmbuild/BUILD/backintime-1.5.3-build/backintime-1.5.3-rc1/common'

RPM build errors:

[1] --

From c.buhtz at posteo.jp Fri Oct 25 03:06:35 2024
From: c.buhtz at posteo.jp (c.buhtz at posteo.jp)
Date: Fri, 25 Oct 2024 07:06:35 +0000
Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment
In-Reply-To: <36BEA5DD-D591-4D0E-B7CF-78609AA42B92@barrys-emacs.org>
References: <4a13731716200669342338ae409e73ca@posteo.de> <36BEA5DD-D591-4D0E-B7CF-78609AA42B92@barrys-emacs.org>
Message-ID: <95ec01236b917c5717c8ef6d2da8ffae@posteo.de>

Hello Barry,

thank you for your reply and for clarifying the Fedora aspects.

On 25.10.2024 00:44, Barry wrote:
> What do you mean by the real file system?
>
> You cannot write to the /usr file system. Is that what your tests do?
> If so that needs changing.

Asking the right questions brings up important details. While I was
writing and trying to explain that the relevant test does use
"tempfile.TemporaryDirectory" as a context, I realized that "PyFakeFS"
is used in the background [1].

But that makes me wonder. On a "regular" system all tests pass.
So the issue might exist because of a combination of 3 factors:
shutil.rmtree(), PyFakeFS, and a chroot/mock build environment.

[1] --

From c.buhtz at posteo.jp Fri Oct 25 03:29:19 2024
From: c.buhtz at posteo.jp (c.buhtz at posteo.jp)
Date: Fri, 25 Oct 2024 07:29:19 +0000
Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment
In-Reply-To: <95ec01236b917c5717c8ef6d2da8ffae@posteo.de>
References: <4a13731716200669342338ae409e73ca@posteo.de> <36BEA5DD-D591-4D0E-B7CF-78609AA42B92@barrys-emacs.org> <95ec01236b917c5717c8ef6d2da8ffae@posteo.de>
Message-ID:

On 25.10.2024 09:06, Christian Buhtz via Python-list wrote:
> On a "regular" system all tests pass.

To clarify: "regular" does not exclude PyFakeFS. It means that on my own
local development machine and on the Travis CI machines (Ubuntu 22 with
Python 3.9 up to 3.13), with PyFakeFS used in that test, everything is
fine. The failure happens only when mock/chroot is involved.

From sjeik_appie at hotmail.com Fri Oct 25 06:31:25 2024
From: sjeik_appie at hotmail.com (Albert-Jan Roskam)
Date: Fri, 25 Oct 2024 12:31:25 +0200
Subject: Chardet oddity
In-Reply-To:
Message-ID:

On Oct 24, 2024 17:51, Roland Mueller via Python-list wrote:

On Wed, 23 Oct 2024 at 20:11, Albert-Jan Roskam via Python-list
(python-list at python.org) wrote:

>    Today I used chardet.detect in the repl and it returned windows-1252
>    (incorrect, because it later resulted in a UnicodeDecodeError). When I ran
>    chardet as a script (which uses UniversalDetector) this returned
>    MacRoman. Isn't chardet.detect the correct way? I've used this method many
>    times.
>    # Interpreter
>    >>> contents = open(FILENAME, "rb").read()
>    >>> chardet.detect(contents)
>    {'encoding': 'Windows-1252', 'confidence': 0.7282676610947401, 'language': ''}
>    # Terminal
>    $ python -m chardet FILENAME
>    FILENAME: MacRoman with confidence 0.7167379080370483
>    Thanks!
>    Albert-Jan
>

> The entry point for the module chardet is chardet.cli.chardetect:main, and
> main() calls the function description_of(lines, name). 'lines' is a file
> opened in mode 'rb' and 'name' holds the filename.
>
> I think the crucial difference is that description_of(lines, name) reads
> the opened file line by line and stops as soon as something has been
> detected in some line. Reading the whole file into the variable contents
> probably gives another result, depending on the input.
>
> I was not able to reproduce this behaviour. I am assuming that you used
> the same Python for both tests. I tried this in interactive mode as
> follows:
>
>     >>> from chardet.cli import chardetect
>     >>> chardetect.description_of(open('/tmp/DATE', 'rb'), 'some file')
>     'some file: ascii with confidence 1.0'
>     >>>
>
> Your approach:
>
>     >>> from chardet import detect
>     >>> detect(open('/tmp/DATE','rb').read())
>     {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
>
> From /usr/lib/python3/dist-packages/chardet/cli/chardetect.py:
>
>     def description_of(lines, name='stdin'):
>         u = UniversalDetector()
>         for line in lines:
>             line = bytearray(line)
>             u.feed(line)
>             # shortcut out of the loop to save reading further - particularly useful if we read a BOM.
>             if u.done:
>                 break
>         u.close()
>         result = u.result

=============

Hi Mark, Roland,

Thanks for your replies. I experimented a bit with both methods and the
derived encoding still differed, even after I removed the "if u.done:
break" (I removed that because I've seen cp1252 files with a utf8 BOM in
the past. I kid you not!).

BUT the next day, on closer inspection, I saw that the file was quite a
mess. It contained mojibake. So I don't blame chardet for not being able
to figure out the encoding.

Albert-Jan

From mk1853387 at gmail.com Fri Oct 25 12:25:19 2024
From: mk1853387 at gmail.com (marc nicole)
Date: Fri, 25 Oct 2024 18:25:19 +0200
Subject: How to check whether audio bytes contain empty noise or actual voice/signal?
Message-ID:

Hello Python fellows,

I hope this question is not very far from the main topic of this list, but
I have a hard time finding a way to check whether audio data samples
contain empty noise or actual significant voice/noise.

I am using PyAudio to collect the sound through my PC mic as follows:

    import pyaudio

    FRAMES_PER_BUFFER = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 48000
    RECORD_SECONDS = 2

    audio = pyaudio.PyAudio()
    stream = audio.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=RATE,
                        input=True,
                        frames_per_buffer=FRAMES_PER_BUFFER,
                        input_device_index=2)
    data = stream.read(FRAMES_PER_BUFFER)

I want to know whether data contains voice signals or only empty sound.
Note that the variable always contains bytes (empty or sound) if I print
it.

Is there a straightforward "easy way" to check whether data is filled with
empty noise or whether somebody has made a noise or spoken?

Thanks.

From c.buhtz at posteo.jp Sat Oct 26 07:08:05 2024
From: c.buhtz at posteo.jp (c.buhtz at posteo.jp)
Date: Sat, 26 Oct 2024 11:08:05 +0000
Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment
In-Reply-To: <4a13731716200669342338ae409e73ca@posteo.de>
References: <4a13731716200669342338ae409e73ca@posteo.de>
Message-ID: <4XbH214CRzz6tyG@submission01.posteo.de>

As you can see in the linked issue it seems it was an incompatibility
between the version of Python and PyFakeFS.

In the end it was a Fedora packaging bug because that pyfakefs version
was not compatible with Python 3.13.

Thanks in advance for helping out.

From python at mrabarnett.plus.com Sat Oct 26 11:35:47 2024
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 26 Oct 2024 16:35:47 +0100
Subject: How to check whether audio bytes contain empty noise or actual voice/signal?
In-Reply-To:
References:
Message-ID: <13ca3f2d-3a86-4771-9318-73ab806179a9@mrabarnett.plus.com>

On 2024-10-25 17:25, marc nicole via Python-list wrote:
> Hello Python fellows,
>
> I hope this question is not very far from the main topic of this list, but
> I have a hard time finding a way to check whether audio data samples are
> containing empty noise or actual significant voice/noise.
>
> I am using PyAudio to collect the sound through my PC mic as follows:
>
>     import pyaudio
>     FRAMES_PER_BUFFER = 1024
>     FORMAT = pyaudio.paInt16
>     CHANNELS = 1
>     RATE = 48000
>     RECORD_SECONDS = 2
>     audio = pyaudio.PyAudio()
>     stream = audio.open(format=FORMAT,
>                         channels=CHANNELS,
>                         rate=RATE,
>                         input=True,
>                         frames_per_buffer=FRAMES_PER_BUFFER,
>                         input_device_index=2)
>     data = stream.read(FRAMES_PER_BUFFER)
>
> I want to know whether or not data contains voice signals or empty sound,
> To note that the variable always contains bytes (empty or sound) if I print
> it.
>
> Is there a straightforward "easy way" to check whether data is filled with
> empty noise or that somebody has made noise/spoke?
>
> Thanks.

If you do a spectral analysis and find peaks at certain frequencies, then
there might be a "significant" sound.

From list1 at tompassin.net Sat Oct 26 12:07:10 2024
From: list1 at tompassin.net (Thomas Passin)
Date: Sat, 26 Oct 2024 12:07:10 -0400
Subject: How to check whether audio bytes contain empty noise or actual voice/signal?
In-Reply-To:
References:
Message-ID: <0cb64539-790f-40e8-818a-74e32bd476a0@tompassin.net>

On 10/25/2024 12:25 PM, marc nicole via Python-list wrote:
> Hello Python fellows,
>
> I hope this question is not very far from the main topic of this list, but
> I have a hard time finding a way to check whether audio data samples are
> containing empty noise or actual significant voice/noise.
>
> I am using PyAudio to collect the sound through my PC mic as follows:
>
>     import pyaudio
>     FRAMES_PER_BUFFER = 1024
>     FORMAT = pyaudio.paInt16
>     CHANNELS = 1
>     RATE = 48000
>     RECORD_SECONDS = 2
>     audio = pyaudio.PyAudio()
>     stream = audio.open(format=FORMAT,
>                         channels=CHANNELS,
>                         rate=RATE,
>                         input=True,
>                         frames_per_buffer=FRAMES_PER_BUFFER,
>                         input_device_index=2)
>     data = stream.read(FRAMES_PER_BUFFER)
>
> I want to know whether or not data contains voice signals or empty sound,
> To note that the variable always contains bytes (empty or sound) if I print
> it.
>
> Is there a straightforward "easy way" to check whether data is filled with
> empty noise or that somebody has made noise/spoke?

It's not always so easy. The Fast Fourier Transform will be your friend.
The most straightforward way would be to do an autocorrelation on the
recorded interval, possibly with some pre-filtering to enhance the
typical vocal frequency range. If the data is only noise, the
autocorrelation will show a large signal at lag 0 and only small,
obviously noisy numbers everywhere else.

There are practical aspects that make things less clear. For example,
voices tend to be spiky and erratic, so you need to use small intervals to
have a better chance of getting an interval with a good S/N ratio, but
small intervals will have a lower signal-to-noise ratio. Human speech is
produced with various statistical regularities and these can sometimes be
detected by various means, including the autocorrelation.

You will also need to test-record your entire signal chain because it
might be producing artifacts that could fool some tests. And background
sounds could fool some tests as well.
Here are some Python libraries that could be very helpful:

librosa (I have not worked with this but it sounds right on target);
scipy.signal (I have used SciPy but not specifically scipy.signal);
python-speech-features (another I haven't used);
https://python-speech-features.readthedocs.io/en/latest/

Other people will know of others.

From barry at barrys-emacs.org Sun Oct 27 04:56:59 2024
From: barry at barrys-emacs.org (Barry)
Date: Sun, 27 Oct 2024 08:56:59 +0000
Subject: shutil.rmtree() fails when used in Fedora (rpm) "mock" environment
In-Reply-To: <4XbH214CRzz6tyG@submission01.posteo.de>
References: <4XbH214CRzz6tyG@submission01.posteo.de>
Message-ID:

> On 26 Oct 2024, at 12:11, Christian Buhtz via Python-list wrote:
>
> As you can see in the linked issue it seems it was an incompatibility
> between the version of Python and PyFakeFS.
>
> In the end it was a Fedora packaging bug because that pyfakefs version
> was not compatible with Python 3.13.

That makes sense.

>
> Thanks in advance for helping out.

No problem.

Barry

> --
> https://mail.python.org/mailman/listinfo/python-list
>

From o1bigtenor at gmail.com Sun Oct 27 18:51:13 2024
From: o1bigtenor at gmail.com (o1bigtenor)
Date: Sun, 27 Oct 2024 17:51:13 -0500
Subject: learning Python
Message-ID:

Greetings

There are mountains of books out there.

Any suggestions for documents for someone just learning how to program
and starting with Python (3)?

Preference to a tool where I would be learning by doing; that works well
for me.

TIA

From PythonList at DancesWithMice.info Sun Oct 27 20:53:29 2024
From: PythonList at DancesWithMice.info (dn)
Date: Mon, 28 Oct 2024 13:53:29 +1300
Subject: learning Python
In-Reply-To:
References:
Message-ID: <540111dc-a5e6-4582-8981-66974f10d3d3@DancesWithMice.info>

On 28/10/24 11:51, o1bigtenor via Python-list wrote:
> Greetings
>
> There are mountains of books out there.
>
> Any suggestions for documents for a just learning how to program and
> starting with Python (3)?
>
> Preference to a tool where I would be learning by doing - - - that
> works well for me.

Coursera and edX have many courses.

Harvard CS50-P (for Python) may suit...

--
Regards,
=dn

From lal at solute.de Mon Oct 28 04:57:09 2024
From: lal at solute.de (Lars Liedtke)
Date: Mon, 28 Oct 2024 09:57:09 +0100
Subject: How to check whether audio bytes contain empty noise or actual voice/signal?
In-Reply-To: <0cb64539-790f-40e8-818a-74e32bd476a0@tompassin.net>
References: <0cb64539-790f-40e8-818a-74e32bd476a0@tompassin.net>
Message-ID: <2728f80a-bb15-400c-9c0a-2d7df57cf78f@solute.de>

There are also the concepts of cepstrum
(https://en.wikipedia.org/wiki/Cepstrum) and quefrency, which are derived
from spectrum and frequency. With them you can even do speaker
recognition, and also detect events.

Lars Liedtke
Lead Developer
solute GmbH, Karlsruhe, Germany

On 26.10.24 at 18:07, Thomas Passin via Python-list wrote:
> On 10/25/2024 12:25 PM, marc nicole via Python-list wrote:
> > Hello Python fellows,
> >
> > I hope this question is not very far from the main topic of this list, but
> > I have a hard time finding a way to check whether audio data samples are
> > containing empty noise or actual significant voice/noise.
> > I am using PyAudio to collect the sound through my PC mic as follows:
> >
> >     import pyaudio
> >     FRAMES_PER_BUFFER = 1024
> >     FORMAT = pyaudio.paInt16
> >     CHANNELS = 1
> >     RATE = 48000
> >     RECORD_SECONDS = 2
> >     audio = pyaudio.PyAudio()
> >     stream = audio.open(format=FORMAT,
> >                         channels=CHANNELS,
> >                         rate=RATE,
> >                         input=True,
> >                         frames_per_buffer=FRAMES_PER_BUFFER,
> >                         input_device_index=2)
> >     data = stream.read(FRAMES_PER_BUFFER)
> >
> > I want to know whether or not data contains voice signals or empty sound,
> > To note that the variable always contains bytes (empty or sound) if I print
> > it.
> >
> > Is there a straightforward "easy way" to check whether data is filled with
> > empty noise or that somebody has made noise/spoke?
>
> It's not always so easy. The Fast Fourier Transform will be your friend.
> The most straightforward way would be to do an autocorrelation on the
> recorded interval, possibly with some pre-filtering to enhance the
> typical vocal frequency range. If the data is only noise, the
> autocorrelation will show a large signal at lag 0 and only small,
> obviously noisy numbers everywhere else.
>
> There are practical aspects that make things less clear. For example,
> voices tend to be spiky and erratic, so you need to use small intervals to
> have a better chance of getting an interval with a good S/N ratio, but
> small intervals will have a lower signal-to-noise ratio. Human speech is
> produced with various statistical regularities and these can sometimes be
> detected by various means, including the autocorrelation.
>
> You will also need to test-record your entire signal chain because it
> might be producing artifacts that could fool some tests. And background
> sounds could fool some tests as well.
>
> Here are some Python libraries that could be very helpful:
>
> librosa (I have not worked with this but it sounds right on target);
> scipy.signal (I have used SciPy but not specifically scipy.signal);
> python-speech-features (another I haven't used);
> https://python-speech-features.readthedocs.io/en/latest/
>
> Other people will know of others.
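
The autocorrelation test described in the thread can be sketched with the standard library alone. This is only an illustration under stated assumptions, not anyone's recommended detector: the function name peak_autocorrelation, the lag range (48 to 480 samples, roughly 100 Hz to 1 kHz at the 48 kHz rate from the original post) and the synthetic test signals are all made up for this example. It assumes 16-bit mono samples in native byte order, as PyAudio's paInt16 format delivers them, and it only reacts to periodic content; real speech and real microphone noise will be far less clear-cut than a pure tone.

```python
import array
import math
import random


def peak_autocorrelation(data, min_lag=48, max_lag=480):
    """Largest normalized autocorrelation of 16-bit mono samples over a lag range.

    A value near 1 means the buffer repeats itself at some lag (periodic
    sound); values near 0 are what uncorrelated noise produces.
    """
    samples = array.array("h")             # signed 16-bit, native byte order
    samples.frombytes(data)
    n = len(samples)
    mean = sum(samples) / n if n else 0.0
    centered = [s - mean for s in samples]
    energy = sum(c * c for c in centered)  # autocorrelation at lag 0
    if energy == 0:
        return 0.0                         # digital silence (or empty buffer)
    best = 0.0
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        acc = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        best = max(best, acc / energy)
    return best


# Synthetic stand-ins for `data = stream.read(FRAMES_PER_BUFFER)`.
rate, frames = 48000, 1024
tone = array.array(
    "h", (int(8000 * math.sin(2 * math.pi * 200 * i / rate)) for i in range(frames))
).tobytes()
rng = random.Random(0)
noise = array.array("h", (rng.randint(-8000, 8000) for _ in range(frames))).tobytes()

print("tone: ", peak_autocorrelation(tone))
print("noise:", peak_autocorrelation(noise))
```

Any decision threshold between the two printed values is a tuning choice that would have to be calibrated against recordings from the actual signal chain. For buffers much longer than 1024 frames the pure-Python loops get slow, and an FFT-based autocorrelation from a numerical library would be the more practical route.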
From mal at egenix.com Mon Oct 28 11:06:41 2024
From: mal at egenix.com (Marc-Andre Lemburg)
Date: Mon, 28 Oct 2024 16:06:41 +0100
Subject: Call for Participation: Python devroom @ FOSDEM 2025
Message-ID:

Call for Participation

We are happy to announce that we will again be running a *Python devroom
at FOSDEM 2025*. This year's edition will be exclusively in-person, and
take place on February 1 and 2, with the Python devroom being held on
Sunday, February 2.

If you haven't heard about FOSDEM before or are looking for more
information, you can visit the official website at https://www.fosdem.org/.

As usual, we are looking for multiple Pythonistas to help us shape the
devroom schedule. We are now open to receiving your proposals!

With over 8500 participants, FOSDEM is the perfect place to share your
story and meet fellow Python enthusiasts.

Good luck to everyone applying. We're looking forward to meeting you all
at FOSDEM 2025!

About FOSDEM

 * Official FOSDEM 2025 website
 * FOSDEM Code of Conduct

FOSDEM is a free and non-commercial event organized by the community for
the community. The goal is to provide free and open source software
developers and communities a place to meet to:

 * get in touch with other developers and projects;
 * be informed about the latest developments in the free software world;
 * be informed about the latest developments in the open source world;
 * attend interesting talks and presentations on various topics by
   project leaders and contributors;
 * to promote the development and benefits of free software and open
   source solutions.

Participation and attendance is totally free, though the organizers
gratefully accept donations and sponsorship.

Essential Information

The Python devroom will be held on February 2 2025, from 09:00 until
17:00 CET.

 * *Submission deadline: December 1 2024*
 * There will be no extension of the deadline
 * Announcement of selected talks: December 15 2024
 * The reference time is Brussels local time (CET).
 * Talk format: 25 minutes presentation, including Q&A, if any.
   In-person only.
 * Live streaming of the talks will be available.

Speaker Guidelines

Please submit your talk proposals using the CFP submission page.

 * FOSDEM is using a self-hosted Pretalx installation for managing talk
   submissions. You will need to create an account, if you don't already
   have one.
 * In the Submission notes field, please also confirm that if your talk
   is accepted, you will be able to attend FOSDEM and deliver your
   presentation. We will not consider proposals from prospective
   speakers who are unsure whether they will be able to secure funds for
   travel and lodging to attend FOSDEM. Sadly, we are not able to offer
   travel funding for prospective speakers.
 * You will need to select the "*Python*" track
 * Keep in mind presentations must be related to Python.

All presentations will be recorded and made available under the Creative
Commons licenses CC-BY-SA or CC-BY. Captured footage will later be shared
online using the FOSDEM archives. By submitting you also agree to these
terms.

List of Desirable Topics

We'd like to make the devroom topics as diverse as possible, so we are
looking to offer a mixture of presentations, short tutorials, demos, live
coding, etc.

Aside from the usual talks about free and open source, we will also
gladly welcome talks about e.g.

 * Best practices for Python developers
 * New developments in Python land
 * How to get started with a specific library/framework
 * Launching and growing Python communities
 * How Python is being used for education
 * Python for Hardware / Infrastructure
 * Security tools in Python and securing Python
 * Data science, AI and Machine Learning
 * Data engineering and management
 * Video Games (or game-related tooling) written in Python
 * MicroPython, CircuitPython, embedded software
 * Scaling Python applications

Volunteers

We will also call for volunteers to help us run the event and help us
with the devroom operation.
Please email the organizers, in case you are interested.

Organizers

You can reach out directly to the organizers, if you have a specific request or question:

* Marc-André Lemburg: mal+fosdem [at] egenix.com
* Rosie Wood: rwood [at] turing.ac.uk
* Ludo

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Oct 28 2024)
>>> Python Projects, Coaching and Support ... https://www.egenix.com/
>>> Python Product Development ... https://consulting.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
https://www.egenix.com/company/contact/
https://www.malemburg.com/

From loris.bennett at fu-berlin.de Tue Oct 29 09:56:01 2024
From: loris.bennett at fu-berlin.de (Loris Bennett)
Date: Tue, 29 Oct 2024 14:56:01 +0100
Subject: Using 'with open(...) as ...' together with configparser.ConfigParser.read
Message-ID: <87plnj3te6.fsf@zedat.fu-berlin.de>

Hi,

With Python 3.9.18, if I do

    try:
        with open(args.config_file, 'r') as config_file:
            config = configparser.ConfigParser()
            config.read(config_file)
            print(config.sections())

i.e. try to read the configuration with the variable defined via 'with ... as', I get

    []

whereas if I use the file name directly

    try:
        with open(args.config_file, 'r') as config_file:
            config = configparser.ConfigParser()
            config.read(args.config_file)
            print(config.sections())

I get

    ['loggers', 'handlers', 'formatters', 'logger_root', 'handler_fileHandler', 'handler_consoleHandler', 'formatter_defaultFormatter']

which is what I expect.

If I print the type of 'config_file' I get

    <class '_io.TextIOWrapper'>

whereas 'args.config_file' is just

    <class 'str'>

Should I be able to use the '_io.TextIOWrapper' object variable here? If so how?
Here https://docs.python.org/3.9/library/configparser.html there are examples which use the 'with open ... as' variable for writing a configuration file, but not for reading one. Cheers, Loris -- This signature is currently under constuction. From python at mrabarnett.plus.com Tue Oct 29 12:10:47 2024 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 29 Oct 2024 16:10:47 +0000 Subject: Using 'with open(...) as ...' together with configparser.ConfigParser.read In-Reply-To: <87plnj3te6.fsf@zedat.fu-berlin.de> References: <87plnj3te6.fsf@zedat.fu-berlin.de> Message-ID: <61f0d076-a72e-467d-b4a7-91c1c5792933@mrabarnett.plus.com> On 2024-10-29 13:56, Loris Bennett via Python-list wrote: > Hi, > > With Python 3.9.18, if I do > > try: > with open(args.config_file, 'r') as config_file: > config = configparser.ConfigParser() > config.read(config_file) > print(config.sections()) > > i.e try to read the configuration with the variable defined via 'with > ... as', I get > > [] > > whereas if I use the file name directly > > try: > with open(args.config_file, 'r') as config_file: > config = configparser.ConfigParser() > config.read(args.config_file) > print(config.sections()) > I get > > ['loggers', 'handlers', 'formatters', 'logger_root', 'handler_fileHandler', 'handler_consoleHandler', 'formatter_defaultFormatter'] > > which is what I expect. > > If I print type of 'config_file' I get > > > > whereas 'args.config_file' is just > > > > Should I be able to use the '_io.TextIOWrapper' object variable here? If so how? > > Here > > https://docs.python.org/3.9/library/configparser.html > > there are examples which use the 'with open ... as' variable for writing > a configuration file, but not for reading one. > > Cheers, > > Loris > 'config.read' expects a path or paths. If you give it a file handle, it treats it as an iterable. (It might be reading the line as paths of files, but I haven't tested it). If you want to read from an open file, use 'config.read_file' instead. 
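
A minimal sketch of the difference between the two methods, assuming a small made-up INI file (the name `example.ini` and its two sections are hypothetical and only loosely modelled on the logging configuration in the question):

```python
import configparser
import os
import tempfile

# Hypothetical config content, used only for this illustration.
INI_TEXT = "[loggers]\nkeys = root\n\n[handlers]\nkeys = consoleHandler\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "example.ini")
    with open(config_path, "w", encoding="utf-8") as out:
        out.write(INI_TEXT)

    # read() takes a file name (or an iterable of names) and opens it itself.
    by_name = configparser.ConfigParser()
    files_parsed = by_name.read(config_path)

    # read_file() takes an already-open file, or any iterable of text lines.
    by_handle = configparser.ConfigParser()
    with open(config_path, "r", encoding="utf-8") as config_file:
        by_handle.read_file(config_file)

    # Passing the open file to read() raises no error: each line is treated
    # as a file name, none of them can be opened, and nothing is parsed.
    wrong = configparser.ConfigParser()
    with open(config_path, "r", encoding="utf-8") as config_file:
        wrong_parsed = wrong.read(config_file)

print(by_name.sections())    # sections found via the file name
print(by_handle.sections())  # same sections via the open file
print(wrong.sections())      # empty, as in the original report
```

Since read() returns the list of files it managed to parse, checking that return value is one way to notice this kind of silent failure.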
From jon+usenet at unequivocal.eu Tue Oct 29 11:33:57 2024 From: jon+usenet at unequivocal.eu (Jon Ribbens) Date: Tue, 29 Oct 2024 15:33:57 -0000 (UTC) Subject: Using 'with open(...) as ...' together with configparser.ConfigParser.read References: <87plnj3te6.fsf@zedat.fu-berlin.de> Message-ID: On 2024-10-29, Loris Bennett wrote: > Hi, > > With Python 3.9.18, if I do > > try: > with open(args.config_file, 'r') as config_file: > config = configparser.ConfigParser() > config.read(config_file) > print(config.sections()) > > i.e try to read the configuration with the variable defined via 'with > ... as', I get > > [] > > whereas if I use the file name directly > > try: > with open(args.config_file, 'r') as config_file: > config = configparser.ConfigParser() > config.read(args.config_file) > print(config.sections()) > I get > > ['loggers', 'handlers', 'formatters', 'logger_root', 'handler_fileHandler', 'handler_consoleHandler', 'formatter_defaultFormatter'] > > which is what I expect. > > If I print type of 'config_file' I get > > > > whereas 'args.config_file' is just > > > > Should I be able to use the '_io.TextIOWrapper' object variable here? If so how? > > Here > > https://docs.python.org/3.9/library/configparser.html > > there are examples which use the 'with open ... as' variable for writing > a configuration file, but not for reading one. As per the docs you link to, the read() method only takes filename(s) as arguments, if you have an already-open file you want to read then you should use the read_file() method instead. From mats at wichmann.us Tue Oct 29 18:05:53 2024 From: mats at wichmann.us (Mats Wichmann) Date: Tue, 29 Oct 2024 16:05:53 -0600 Subject: learning Python In-Reply-To: References: Message-ID: On 10/27/24 16:51, o1bigtenor via Python-list wrote: > Greetings > > There are mountains of books out there. > > Any suggestions for documents for a just learning how to program and > starting with Python (3)? 
> > Preference to a tool where I would be learning by doing - - - that > works well for me. > > TIA Frankly, the mountain of resources is so vast that none of us can have experience of more than a small fraction, and effective learning is a factor not only of the quality of the teacher/book/training course, but how it meshes with your own learning style. If you like learn-by-doing, you might take a look at PyBites (https://pybit.es/). But they're by no means the only players in that space! From bowman at montana.com Tue Oct 29 21:22:00 2024 From: bowman at montana.com (rbowman) Date: 30 Oct 2024 01:22:00 GMT Subject: learning Python References: Message-ID: On Tue, 29 Oct 2024 16:05:53 -0600, Mats Wichmann wrote: > Frankly, the mountain of resources is so vast that none of us can have > experience of more than a small fraction, and effective learning is a > factor not only of the quality of the teacher/book/training course, but > how it meshes with your own learning style. It isn't a beginners tutorial but at some point 'Python Distilled' is helpful. https://www.dabeaz.com/python-distilled/ Usual disclaimer: i don't know Beazley and am not getting any kickback. From loris.bennett at fu-berlin.de Wed Oct 30 09:03:55 2024 From: loris.bennett at fu-berlin.de (Loris Bennett) Date: Wed, 30 Oct 2024 14:03:55 +0100 Subject: Using 'with open(...) as ...' together with configparser.ConfigParser.read References: <87plnj3te6.fsf@zedat.fu-berlin.de> Message-ID: <87bjz1vj2c.fsf@zedat.fu-berlin.de> Jon Ribbens writes: > On 2024-10-29, Loris Bennett wrote: >> Hi, >> >> With Python 3.9.18, if I do >> >> try: >> with open(args.config_file, 'r') as config_file: >> config = configparser.ConfigParser() >> config.read(config_file) >> print(config.sections()) >> >> i.e try to read the configuration with the variable defined via 'with >> ... 
as', I get >> >> [] >> >> whereas if I use the file name directly >> >> try: >> with open(args.config_file, 'r') as config_file: >> config = configparser.ConfigParser() >> config.read(args.config_file) >> print(config.sections()) >> I get >> >> ['loggers', 'handlers', 'formatters', 'logger_root', 'handler_fileHandler', 'handler_consoleHandler', 'formatter_defaultFormatter'] >> >> which is what I expect. >> >> If I print type of 'config_file' I get >> >> >> >> whereas 'args.config_file' is just >> >> >> >> Should I be able to use the '_io.TextIOWrapper' object variable here? If so how? >> >> Here >> >> https://docs.python.org/3.9/library/configparser.html >> >> there are examples which use the 'with open ... as' variable for writing >> a configuration file, but not for reading one. > > As per the docs you link to, the read() method only takes filename(s) > as arguments, if you have an already-open file you want to read then > you should use the read_file() method instead. As you and others have pointed out, this is indeed covered in the docs, so mea culpa. However, whereas I can see why you might want to read the config from a dict or a string, what would be a use case in which I would want to read from an open file rather than just reading from a file(name)? Cheers, Loris -- This signature is currently under constuction. From jon+usenet at unequivocal.eu Wed Oct 30 11:41:13 2024 From: jon+usenet at unequivocal.eu (Jon Ribbens) Date: Wed, 30 Oct 2024 15:41:13 -0000 (UTC) Subject: Using 'with open(...) as ...' together with configparser.ConfigParser.read References: <87plnj3te6.fsf@zedat.fu-berlin.de> <87bjz1vj2c.fsf@zedat.fu-berlin.de> Message-ID: On 2024-10-30, Loris Bennett wrote: > Jon Ribbens writes: >> As per the docs you link to, the read() method only takes filename(s) >> as arguments, if you have an already-open file you want to read then >> you should use the read_file() method instead. 
> > As you and others have pointed out, this is indeed covered in the docs, > so mea culpa. > > However, whereas I can see why you might want to read the config from a > dict or a string, what would be a use case in which I would want to > read from an open file rather than just reading from a file(name)? The ConfigParser module provides read(), read_file(), read_string(), and read_dict() methods. I think they were just trying to be comprehensive. It's a bit non-Pythonic really. From loris.bennett at fu-berlin.de Wed Oct 30 11:57:44 2024 From: loris.bennett at fu-berlin.de (Loris Bennett) Date: Wed, 30 Oct 2024 16:57:44 +0100 Subject: Using 'with open(...) as ...' together with configparser.ConfigParser.read References: <87plnj3te6.fsf@zedat.fu-berlin.de> <87bjz1vj2c.fsf@zedat.fu-berlin.de> Message-ID: <87r07xtwg7.fsf@zedat.fu-berlin.de> Jon Ribbens writes: > On 2024-10-30, Loris Bennett wrote: >> Jon Ribbens writes: >>> As per the docs you link to, the read() method only takes filename(s) >>> as arguments, if you have an already-open file you want to read then >>> you should use the read_file() method instead. >> >> As you and others have pointed out, this is indeed covered in the docs, >> so mea culpa. >> >> However, whereas I can see why you might want to read the config from a >> dict or a string, what would be a use case in which I would want to >> read from an open file rather than just reading from a file(name)? > > The ConfigParser module provides read(), read_file(), read_string(), > and read_dict() methods. I think they were just trying to be > comprehensive. It's a bit non-Pythonic really. OK, but is there a common situation might I be obliged to use 'read_file'? I.e. is there some common case where the file name is not available, only a corresponding file-like object or stream? -- This signature is currently under constuction. 
From jon+usenet at unequivocal.eu Wed Oct 30 13:57:23 2024 From: jon+usenet at unequivocal.eu (Jon Ribbens) Date: Wed, 30 Oct 2024 17:57:23 -0000 (UTC) Subject: Using 'with open(...) as ...' together with configparser.ConfigParser.read References: <87plnj3te6.fsf@zedat.fu-berlin.de> <87bjz1vj2c.fsf@zedat.fu-berlin.de> <87r07xtwg7.fsf@zedat.fu-berlin.de> Message-ID: On 2024-10-30, Loris Bennett wrote: > Jon Ribbens writes: >> On 2024-10-30, Loris Bennett wrote: >>> Jon Ribbens writes: >>>> As per the docs you link to, the read() method only takes filename(s) >>>> as arguments, if you have an already-open file you want to read then >>>> you should use the read_file() method instead. >>> >>> As you and others have pointed out, this is indeed covered in the docs, >>> so mea culpa. >>> >>> However, whereas I can see why you might want to read the config from a >>> dict or a string, what would be a use case in which I would want to >>> read from an open file rather than just reading from a file(name)? >> >> The ConfigParser module provides read(), read_file(), read_string(), >> and read_dict() methods. I think they were just trying to be >> comprehensive. It's a bit non-Pythonic really. > > OK, but is there a common situation might I be obliged to use > 'read_file'? I.e. is there some common case where the file name is not > available, only a corresponding file-like object or stream? Well, sure - any time it's not being read from a file. A bit ironic that the method to use in that situation is "read_file", of course. In my view the read() and read_file() methods have their names the wrong way round. But bear in mind this code is 27 years old, and the read() function came first. From loris.bennett at fu-berlin.de Thu Oct 31 02:47:17 2024 From: loris.bennett at fu-berlin.de (Loris Bennett) Date: Thu, 31 Oct 2024 07:47:17 +0100 Subject: Using 'with open(...) as ...' 
together with configparser.ConfigParser.read References: <87plnj3te6.fsf@zedat.fu-berlin.de> <87bjz1vj2c.fsf@zedat.fu-berlin.de> <87r07xtwg7.fsf@zedat.fu-berlin.de> Message-ID: <87y124db0q.fsf@zedat.fu-berlin.de> Jon Ribbens writes: > On 2024-10-30, Loris Bennett wrote: >> Jon Ribbens writes: >>> On 2024-10-30, Loris Bennett wrote: >>>> Jon Ribbens writes: >>>>> As per the docs you link to, the read() method only takes filename(s) >>>>> as arguments, if you have an already-open file you want to read then >>>>> you should use the read_file() method instead. >>>> >>>> As you and others have pointed out, this is indeed covered in the docs, >>>> so mea culpa. >>>> >>>> However, whereas I can see why you might want to read the config from a >>>> dict or a string, what would be a use case in which I would want to >>>> read from an open file rather than just reading from a file(name)? >>> >>> The ConfigParser module provides read(), read_file(), read_string(), >>> and read_dict() methods. I think they were just trying to be >>> comprehensive. It's a bit non-Pythonic really. >> >> OK, but is there a common situation might I be obliged to use >> 'read_file'? I.e. is there some common case where the file name is not >> available, only a corresponding file-like object or stream? > > Well, sure - any time it's not being read from a file. A bit ironic > that the method to use in that situation is "read_file", of course. > In my view the read() and read_file() methods have their names the > wrong way round. But bear in mind this code is 27 years old, and > the read() function came first. Yes, I suppose history has a lot to answer for :-) However I didn't make myself clear: I understand that there are different functions, depending on whether I have a file name or a stream. Nevertheless, I just can't think of a practical example where I might just have *only* a stream, especially one containing my configuration data. I was just interested to know if anyone can give an example. 
-- This signature is currently under constuction. From jon+usenet at unequivocal.eu Thu Oct 31 04:41:22 2024 From: jon+usenet at unequivocal.eu (Jon Ribbens) Date: Thu, 31 Oct 2024 08:41:22 -0000 (UTC) Subject: Using 'with open(...) as ...' together with configparser.ConfigParser.read References: <87plnj3te6.fsf@zedat.fu-berlin.de> <87bjz1vj2c.fsf@zedat.fu-berlin.de> <87r07xtwg7.fsf@zedat.fu-berlin.de> <87y124db0q.fsf@zedat.fu-berlin.de> Message-ID: On 2024-10-31, Loris Bennett wrote: > Jon Ribbens writes: >> On 2024-10-30, Loris Bennett wrote: >>> Jon Ribbens writes: >>>> On 2024-10-30, Loris Bennett wrote: >>>>> Jon Ribbens writes: >>>>>> As per the docs you link to, the read() method only takes filename(s) >>>>>> as arguments, if you have an already-open file you want to read then >>>>>> you should use the read_file() method instead. >>>>> >>>>> As you and others have pointed out, this is indeed covered in the docs, >>>>> so mea culpa. >>>>> >>>>> However, whereas I can see why you might want to read the config from a >>>>> dict or a string, what would be a use case in which I would want to >>>>> read from an open file rather than just reading from a file(name)? >>>> >>>> The ConfigParser module provides read(), read_file(), read_string(), >>>> and read_dict() methods. I think they were just trying to be >>>> comprehensive. It's a bit non-Pythonic really. >>> >>> OK, but is there a common situation might I be obliged to use >>> 'read_file'? I.e. is there some common case where the file name is not >>> available, only a corresponding file-like object or stream? >> >> Well, sure - any time it's not being read from a file. A bit ironic >> that the method to use in that situation is "read_file", of course. >> In my view the read() and read_file() methods have their names the >> wrong way round. But bear in mind this code is 27 years old, and >> the read() function came first. 
>
> Yes, I suppose history has a lot to answer for :-)
>
> However I didn't make myself clear: I understand that there are
> different functions, depending on whether I have a file name or a
> stream. Nevertheless, I just can't think of a practical example where I
> might just have *only* a stream, especially one containing my
> configuration data. I was just interested to know if anyone can give an
> example.

That was answered in the first sentence of my reply. It's a bit vague because in most of the situations I can think of, one of the other read_*() methods would probably be more appropriate. But again, the history is that read_file() was added first (originally called readfp() ) so it had to handle all cases where the data being read was not coming from a named filesystem file - e.g. it's coming over a Unix socket, or an HTTP request, or from a database.

It is good practice in general to provide a method that allows your class to read data as a stream, if that is appropriate for what you're doing, so that people aren't unnecessarily forced to load data fully into memory or write it to a file, as well as perhaps a convenience method that will read from a named file for people who are doing that.

From loris.bennett at fu-berlin.de Thu Oct 31 07:05:50 2024
From: loris.bennett at fu-berlin.de (Loris Bennett)
Date: Thu, 31 Oct 2024 12:05:50 +0100
Subject: Poetry: endpoints with endpoints
Message-ID: <874j4sbkhd.fsf@zedat.fu-berlin.de>

Hi,

I am using Poetry and have the following in my pyproject.toml

    [tool.poetry.scripts]
    frobnicate = "frobnicator.cli:frobnicate"

The CLI provides an option '--flavour' and I would like to add further endpoints for specific values of 'flavour'.
I tried adding

    frobnicate_foo = "frobnicator.cli:frobnicate --flavour foo"

to '[tool.poetry.scripts]', but when I call this I get the error

    $ poetry run frobnicate_foo --verbose
      File "<string>", line 1
        import sys; from importlib import import_module; sys.argv = ['frobnicate, '--verbose']; import_module('frobniator.cli').frobnicate --flavour foo()
        ^
    SyntaxError: invalid syntax

Is it possible to add such an endpoint? If so, how?

Cheers,

Loris

--
This signature is currently under constuction.

From loris.bennett at fu-berlin.de Thu Oct 31 11:33:41 2024
From: loris.bennett at fu-berlin.de (Loris Bennett)
Date: Thu, 31 Oct 2024 16:33:41 +0100
Subject: Printing UTF-8 mail to terminal
Message-ID: <878qu49tii.fsf@zedat.fu-berlin.de>

Hi,

I have a command-line program which creates an email containing German umlauts. On receiving the mail, my mail client displays the subject and body correctly:

    Subject: Übung

    Sehr geehrter Herr Dr. Bennett,

    Dies ist eine Übung.

So far, so good. However, when I use the --verbose option to print the mail to the terminal via

    if args.verbose:
        print(mail)

I get:

    Subject: Übungsbetreff

    Sehr geehrter Herr Dr. Bennett,

    Dies ist eine =C3=9Cbung.

What do I need to do to prevent the body from getting mangled?

I seem to remember that I had issues in the past with a Perl version of a similar program. As far as I recall there was an issue with the fact that the greeting is generated by querying a server, whereas the body is being read from a file, which led to oddities when the two bits were concatenated. But that might just have been a Perl thing.

Cheers,

Loris

--
This signature is currently under constuction.
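
The `=C3=9C` in the printed body is the quoted-printable form of the two UTF-8 bytes of 'Ü'. Here is a minimal sketch of undoing that encoding with the standard library. Since the program itself is not shown, it is only an assumption that the mail is (or could be) built as an `email.message.EmailMessage`; the variable names are made up for the example.

```python
import quopri
from email.message import EmailMessage

# The body line exactly as it appeared on the terminal.
body_as_printed = "Dies ist eine =C3=9Cbung."

# quopri works on bytes: undo quoted-printable, then decode the UTF-8 bytes.
decoded_body = quopri.decodestring(body_as_printed.encode("ascii")).decode("utf-8")
print(decoded_body)

# Assumption: the mail is an EmailMessage. Printing the message object shows
# the transfer-encoded form, while get_content() returns the readable text.
msg = EmailMessage()
msg.set_content("Dies ist eine Übung.\n")
readable_body = msg.get_content()
print(readable_body)
```

For a verbose option, printing the headers and the result of `get_content()` separately, rather than the whole message object, would keep the body readable.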
From info at egenix.com Thu Oct 31 07:29:25 2024
From: info at egenix.com (eGenix Team)
Date: Thu, 31 Oct 2024 12:29:25 +0100
Subject: ANN: PyDDF Python Herbst Sprint 2024
Message-ID:

/This announcement was originally in German since it targets a local user group meeting in Düsseldorf, Germany/

Announcement

Python Meeting Herbst Sprint 2024 (autumn sprint) in Düsseldorf

Saturday, 09.11.2024, 10:00-18:00
Sunday, 10.11.2024, 10:00-18:00

Eviden / Atos Information Technology GmbH, Am Seestern 1, 40547 Düsseldorf

Information

The Python Meeting Düsseldorf (PyDDF) is organizing a Python sprint weekend, kindly supported by Eviden Deutschland.

The sprint takes place on the weekend of 09./10.11.2024 at the Eviden / Atos office, Am Seestern 1, in Düsseldorf.

* Sprint location in Google Maps

The following topics have already been suggested as inspiration:

* *AI/ML: image recognition* with Azure Computervision
* *AI/ML: extracting text and metadata from press pages*, using a local LLM
* *AI/ML: transcription* of video/audio files with Whisper
* *Kodi add-ons* for ARD, ZDF and ARTE

Of course, participants are welcome to propose and work on further topics.

Registration, costs and further information

Everything else, including registration, can be found on the Meetup sprint page:

* *Python Spring Sprint & Hackathon in Düsseldorf*

*IMPORTANT*: Without registration we cannot arrange access to the building. Signing up spontaneously on the day of the sprint will therefore probably not work.

Participants should also register in the PyDDF Telegram group, since that is where we coordinate:

* *PyDDF Telegram group*

About the Python Meeting Düsseldorf

The Python Meeting Düsseldorf is a regular event in Düsseldorf aimed at Python enthusiasts from the region.

Our PyDDF YouTube channel, where we publish videos of the talks after the meetings, gives a good overview of the talks.

The meeting is organized by eGenix.com GmbH, Langenfeld, in cooperation with Clark Consulting & Research, Düsseldorf.

Have fun!

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Oct 31 2024)
>>> Python Projects, Coaching and Support ... https://www.egenix.com/
>>> Python Product Development ... https://consulting.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
https://www.egenix.com/company/contact/
https://www.malemburg.com/

From Karsten.Hilbert at gmx.net Thu Oct 31 12:10:42 2024
From: Karsten.Hilbert at gmx.net (Karsten Hilbert)
Date: Thu, 31 Oct 2024 17:10:42 +0100
Subject: Using 'with open(...) as ...' together with configparser.ConfigParser.read
In-Reply-To: <87y124db0q.fsf@zedat.fu-berlin.de>
References: <87plnj3te6.fsf@zedat.fu-berlin.de> <87bjz1vj2c.fsf@zedat.fu-berlin.de> <87r07xtwg7.fsf@zedat.fu-berlin.de> <87y124db0q.fsf@zedat.fu-berlin.de>
Message-ID:

On Thu, Oct 31, 2024 at 07:47:17AM +0100, Loris Bennett via Python-list wrote:

> However I didn't make myself clear: I understand that there are
> different functions, depending on whether I have a file name or a
> stream. Nevertheless, I just can't think of a practical example where I
> might just have *only* a stream, especially one containing my
> configuration data. I was just interested to know if anyone can give an
> example.

Apart from the fact that any data source can be made into a file: one might have a stream of data coming in over, say, http, as in a centralized configuration repository.
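
A minimal sketch of that situation, with an in-memory byte stream standing in for the body of an HTTP response. The section name, keys and the `<http response>` label are made up for the example; no real configuration service is contacted.

```python
import configparser
import io

# Stand-in for bytes received from a (hypothetical) central config service.
# There is no file name here, only a stream.
response_body = io.BytesIO(b"[server]\nhost = example.org\nport = 8080\n")

config = configparser.ConfigParser()

# Wrap the byte stream so it yields text lines, then hand it to read_file().
# The optional source argument is only used as a label in error messages.
text_stream = io.TextIOWrapper(response_body, encoding="utf-8")
config.read_file(text_stream, source="<http response>")

host = config.get("server", "host")
port = config.getint("server", "port")
print(host, port)
```

The same pattern applies to any object that yields lines of text, which is why read_file() does not need to know where the data came from.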
Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

From olegsivokon at gmail.com Thu Oct 31 12:38:50 2024
From: olegsivokon at gmail.com (Left Right)
Date: Thu, 31 Oct 2024 17:38:50 +0100
Subject: Printing UTF-8 mail to terminal
In-Reply-To: <878qu49tii.fsf@zedat.fu-berlin.de>
References: <878qu49tii.fsf@zedat.fu-berlin.de>
Message-ID:

There's quite a lot of misuse of terminology around terminal / console / shell. Please, correct me if I'm wrong, but it looks like you are printing that on MS Windows, right? MS Windows doesn't have or use terminals (that's more of a Unix-related concept). And, by "terminal" I mean terminal emulator (i.e. a program that emulates the behavior of a physical terminal). You can, of course, find some terminal programs for Windows (e.g. mintty), but I doubt that that's what you are dealing with.

What MS Windows users usually end up using is the console. If you run, e.g., cmd.exe, it will create a process that displays a graphical console. The console uses an encoding scheme to represent the text output. I believe that the default on MS Windows is to use some single-byte encoding. This answer from an SE family site tells you how to set the console encoding to UTF-8 permanently: https://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8 , which, I believe, will solve your problem with how the text is displayed.

On Thu, Oct 31, 2024 at 5:19 PM Loris Bennett via Python-list wrote:
>
> Hi,
>
> I have a command-line program which creates an email containing German
> umlauts. On receiving the mail, my mail client displays the subject and
> body correctly:
>
> Subject: Übung
>
> Sehr geehrter Herr Dr. Bennett,
>
> Dies ist eine Übung.
>
> So far, so good. However, when I use the --verbose option to print
> the mail to the terminal via
>
> if args.verbose:
> print(mail)
>
> I get:
>
> Subject: Übungsbetreff
>
> Sehr geehrter Herr Dr. Bennett,
>
> Dies ist eine =C3=9Cbung.
> > What do I need to do to prevent the body from getting mangled? > > I seem to remember that I had issues in the past with a Perl version of > a similar program. As far as I recall there was an issue with fact the > greeting is generated by querying a server, whereas the body is being > read from a file, which lead to oddities when the two bits were > concatenated. But that might just have been a Perl thing. > > Cheers, > > Loris > > -- > This signature is currently under constuction. > -- > https://mail.python.org/mailman/listinfo/python-list From python at mrabarnett.plus.com Thu Oct 31 13:06:11 2024 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 31 Oct 2024 17:06:11 +0000 Subject: Using 'with open(...) as ...' together with configparser.ConfigParser.read In-Reply-To: <87y124db0q.fsf@zedat.fu-berlin.de> References: <87plnj3te6.fsf@zedat.fu-berlin.de> <87bjz1vj2c.fsf@zedat.fu-berlin.de> <87r07xtwg7.fsf@zedat.fu-berlin.de> <87y124db0q.fsf@zedat.fu-berlin.de> Message-ID: On 2024-10-31 06:47, Loris Bennett via Python-list wrote: > Jon Ribbens writes: > >> On 2024-10-30, Loris Bennett wrote: >>> Jon Ribbens writes: >>>> On 2024-10-30, Loris Bennett wrote: >>>>> Jon Ribbens writes: >>>>>> As per the docs you link to, the read() method only takes filename(s) >>>>>> as arguments, if you have an already-open file you want to read then >>>>>> you should use the read_file() method instead. >>>>> >>>>> As you and others have pointed out, this is indeed covered in the docs, >>>>> so mea culpa. >>>>> >>>>> However, whereas I can see why you might want to read the config from a >>>>> dict or a string, what would be a use case in which I would want to >>>>> read from an open file rather than just reading from a file(name)? >>>> >>>> The ConfigParser module provides read(), read_file(), read_string(), >>>> and read_dict() methods. I think they were just trying to be >>>> comprehensive. It's a bit non-Pythonic really. 
>>> >>> OK, but is there a common situation might I be obliged to use >>> 'read_file'? I.e. is there some common case where the file name is not >>> available, only a corresponding file-like object or stream? >> >> Well, sure - any time it's not being read from a file. A bit ironic >> that the method to use in that situation is "read_file", of course. >> In my view the read() and read_file() methods have their names the >> wrong way round. But bear in mind this code is 27 years old, and >> the read() function came first. > > Yes, I suppose history has a lot to answer for :-) > > However I didn't make myself clear: I understand that there are > different functions, depending on whether I have a file name or a > stream. Nevertheless, I just can't think of a practical example where I > might just have *only* a stream, especially one containing my > configuration data. I was just interested to know if anyone can give an > example. > What if the config file was inside a zipped folder? Although I haven't used ConfigParser like that, I have read the contents of files that are in a zipped folder. It means that I don't have to extract the file first. From cs at cskk.id.au Thu Oct 31 16:50:56 2024 From: cs at cskk.id.au (Cameron Simpson) Date: Fri, 1 Nov 2024 07:50:56 +1100 Subject: Printing UTF-8 mail to terminal In-Reply-To: <878qu49tii.fsf@zedat.fu-berlin.de> References: <878qu49tii.fsf@zedat.fu-berlin.de> Message-ID: On 31Oct2024 16:33, Loris Bennett wrote: >I have a command-line program which creates an email containing German >umlauts. On receiving the mail, my mail client displays the subject and >body correctly: [...] >So far, so good. However, when I use the --verbose option to print >the mail to the terminal via > > if args.verbose: > print(mail) > >I get: > > Subject: ?bungsbetreff > > Sehr geehrter Herr Dr. Bennett, > > Dies ist eine =C3=9Cbung. > >What do I need to do to prevent the body from getting mangled? That looks to me like quoted-printable. 
This is an encoding for binary transport of text to make it robust against transports which are not 8-bit clean.

So your Unicode text is encoded as UTF-8, and then that is encoded in quoted-printable for transport through the email system.

Your terminal probably accepts UTF-8 - I imagine other German text renders correctly?

You need to get the text and undo the quoted-printable encoding.

If you're using the Python email module to parse (or construct) the message as a `Message` object I'd expect that to happen automatically.

If you're just dealing with this directly, use the `quopri` stdlib module: https://docs.python.org/3/library/quopri.html

Cheers,
Cameron Simpson

From learn2program at gmail.com Thu Oct 31 17:53:31 2024
From: learn2program at gmail.com (Alan Gauld)
Date: Thu, 31 Oct 2024 21:53:31 +0000
Subject: Printing UTF-8 mail to terminal
Message-ID: <9f754662-4b73-4d3d-a9b8-ac0a40762143@yahoo.co.uk>

On 31/10/2024 20:50, Cameron Simpson via Python-list wrote:

> That looks to me like quoted-printable. This is an encoding for binary
> transport of text to make it robust against not 8-bit clean ...

> If you're just dealing with this directly, use the `quopri` stdlib
> module: https://docs.python.org/3/library/quopri.html

One of the things I love about this list are these little features that I didn't know existed. Despite having used Python for over 25 years, I've never noticed that module before! :-)

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

From thjmmj15 at gmail.com Thu Oct 31 19:37:44 2024
From: thjmmj15 at gmail.com (Tim Johnson)
Date: Thu, 31 Oct 2024 15:37:44 -0800
Subject: Correct module for site customization of path
Message-ID:

FYI: I am a retired programmer using a recent upgrade to ubuntu 24.04 and python 3.12. My needs are those of a hobbyist at this time.
I am on a single user home desktop with root privileges available.

After the recent upgrades I had to install youtube_dl with pipx for the new python version.

When I ran the script which imported youtube_dl, I got an import error, as it appears the path to the module was not in sys.path.

For me, it was a simple matter of appending the path for youtube_dl to sys.path; however, I would prefer not to have to do an append in every script using it.

There is a boatload of documentation on site path configuration, but still, I am not sure what option to take.

Recommendations are invited and welcome.

Thanks
--
Tim
thjmmj15 at gmail.com
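
One of the documented mechanisms for this is a `.pth` file: a plain text file in a site directory, listing one extra directory per line, which the `site` module adds to sys.path at interpreter startup. Here is a minimal sketch of the mechanism only, run against temporary directories. The directory and file names are hypothetical stand-ins; on a real system the `.pth` file would go into a site directory such as the one reported by `site.getusersitepackages()`, and whether that is a good fit for a package installed by pipx into its own environment is not settled here.

```python
import os
import site
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the directory that contains the package.
    package_dir = os.path.join(tmp, "where_the_package_lives")
    os.mkdir(package_dir)

    # Stand-in for a real site directory (e.g. the user site-packages).
    site_dir = os.path.join(tmp, "site_dir")
    os.mkdir(site_dir)

    # A .pth file is plain text: one directory per line.
    pth_path = os.path.join(site_dir, "extra_paths.pth")
    with open(pth_path, "w", encoding="utf-8") as pth:
        pth.write(package_dir + "\n")

    # At startup, site processes every real site directory this way;
    # here it is triggered by hand for the temporary one.
    site.addsitedir(site_dir)
    package_dir_on_path = package_dir in sys.path

    # Undo the demonstration so later imports are unaffected.
    for entry in (package_dir, site_dir):
        while entry in sys.path:
            sys.path.remove(entry)

print(package_dir_on_path)
```

Only directories that exist are added, so a stale line in a `.pth` file is silently ignored rather than causing an error.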