[Python-checkins] python/dist/src/Misc NEWS,1.397,1.398

tim_one@sourceforge.net
Sun, 28 Apr 2002 18:37:34 -0700


Update of /cvsroot/python/python/dist/src/Misc
In directory usw-pr-cvs1:/tmp/cvs-serv32409/python/Misc

Modified Files:
	NEWS 
Log Message:
Mostly in SequenceMatcher.{__chain_b, find_longest_match}:
This now does a dynamic analysis of which elements are so frequently
repeated as to constitute noise.  The primary benefit is an enormous
speedup in find_longest_match: when the sequences have many duplicate
elements, the innermost loop can have hundreds of times fewer potential
matches to consider.  In effect, matching now zooms in on sequences of
non-ubiquitous elements.

While I like what I've seen of the effects so far, I still consider
this experimental.  Please give it a try!
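
To illustrate the idea of the dynamic analysis described above, here is a
minimal sketch.  It is not the code from this checkin: the helper names
(popular_elements, build_index) and the thresholds (only analyze sequences
of at least 200 elements, and treat anything making up more than 1% of the
sequence as noise) are assumptions chosen for illustration.

```python
from collections import Counter


def popular_elements(seq, min_len=200, percent=1):
    """Return the elements of seq frequent enough to count as noise.

    Hypothetical helper: min_len and percent are illustrative
    assumptions, not values taken from the checkin.
    """
    n = len(seq)
    if n < min_len:
        # Short sequences are cheap to match in full; skip the analysis.
        return set()
    counts = Counter(seq)
    return {elt for elt, c in counts.items() if c * 100 > n * percent}


def build_index(seq):
    """Map each non-noise element to the list of positions where it occurs."""
    noise = popular_elements(seq)
    index = {}
    for i, elt in enumerate(seq):
        if elt not in noise:
            index.setdefault(elt, []).append(i)
    return index, noise


# C-like text: two lines repeated 150 times each, plus 100 unique lines.
lines = ["}", "return NULL;"] * 150 + ["unique line %d" % i for i in range(100)]
index, noise = build_index(lines)
print(sorted(noise))   # the two heavily repeated lines
print(len(index))      # only the unique lines remain as match candidates
```

With the noisy elements left out of the index, a search for the longest
match only has to consider positions of the remaining elements, which is
where the speedup comes from.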


Index: NEWS
===================================================================
RCS file: /cvsroot/python/python/dist/src/Misc/NEWS,v
retrieving revision 1.397
retrieving revision 1.398
diff -C2 -d -r1.397 -r1.398
*** NEWS	26 Apr 2002 20:11:29 -0000	1.397
--- NEWS	29 Apr 2002 01:37:32 -0000	1.398
***************
*** 73,79 ****
  Extension modules
  
! - The bsddb.*open functions can now take 'None' as a filename. 
    This will create a temporary in-memory bsddb that won't be
!   written to disk. 
  
  - posix.mknod was added.
--- 73,79 ----
  Extension modules
  
! - The bsddb.*open functions can now take 'None' as a filename.
    This will create a temporary in-memory bsddb that won't be
!   written to disk.
  
  - posix.mknod was added.
***************
*** 99,102 ****
--- 99,111 ----
  
  Library
+ 
+ - difflib's SequenceMatcher class now does a dynamic analysis of
+   which elements are so frequent as to constitute noise.  For
+   comparing files as sequences of lines, this generally works better
+   than the IS_LINE_JUNK function, and function ndiff's linejunk
+   argument defaults to None now as a result.  A happy benefit is
+   that SequenceMatcher may run much faster now when applied
+   to large files with many duplicate lines (for example, C program
+   text with lots of repeated "}" and "return NULL;" lines).
  
  - New Text.dump() method in Tkinter module.
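
The difflib NEWS entry above says ndiff's linejunk argument now defaults to
None.  A small usage sketch, assuming a Python whose difflib includes this
change; the sample input lines are made up for illustration:

```python
import difflib

# Hypothetical sample input: two versions of a tiny C function.
old = ["int f(void)\n", "{\n", "    return NULL;\n", "}\n"]
new = ["int f(void)\n", "{\n", "    return 0;\n", "}\n"]

# No linejunk argument is passed, so the default (None) applies and
# SequenceMatcher's own analysis decides what counts as noise.
delta = list(difflib.ndiff(old, new))
print("".join(delta), end="")

# The delta still carries both inputs; restore() recovers either one.
recovered_old = list(difflib.restore(delta, 1))
recovered_new = list(difflib.restore(delta, 2))
```

A caller who wants the earlier behaviour can still pass
linejunk=difflib.IS_LINE_JUNK explicitly.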