Python pattern repository

Fri Oct 18 01:07:04 EDT 2002

On 16 Oct 2002 03:19:26 +0200, Chris Liechti <cliechti at gmx.net> wrote:

>[posted and mailed]
>
>bokr at oz.net (Bengt Richter) wrote in news:aoi76i$b1m$0 at 216.39.172.122:
>> On Tue, 15 Oct 2002 18:33:13 GMT, Robin Munn <rmunn at pobox.com> wrote:
>> The fact that there is so much that can be found via Google
>...
>> Introducing PyPAN: Python Pervasive Archive Network ;-)
>> 
>> Here's the concept: If you want to include your code snippet
>> in the PyPAN, post it embedded in a document that Google will see.
>
>while i think your idea is nice, this can be the first problem...
>google is only including pages that are linked somewhere. it took many
>months until it indexed my page with a gcc port...
Your post was on google the day after (maybe sooner). clp posts get there
very quickly.

> 
>> You embed it for easy extraction by putting a PyPAN expression
>> in the first and last (+/- 1, discussed later[1]) lines of your
>> snippet, e.g., 
>> 
>> # ++PyPAN++ mySnippet.py /clp/forcomment/ -- minimal PyPan snippet
>> def mySnippet():
>>     print('Hello PyPAN!')
>> # --PyPAN--
>> 
>> I think Google would find '++PyPan++' and show an interesting list.
>
>why do you use "+-" etc in the marker? those are special characters to 
>google and it ignores punctuation/special chars in other cases.
>i'd stay with letters only.
Ok, point taken. But I want something that will stand out visually as well
as make a unique search target. Since google seems to treat underscores
as letters, I will use those. If google just whitewashes punctuation
it shouldn't hurt to use +- for PyPAN processing purposes, so long as
it's not meant to be used as part of a google search pattern. Will have
to test... ;-)
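Here is a minimal sketch, in present-day Python 3, of reading the fields off a start-marker line. It assumes the grammar implied by the mySnippet.py example: the tag, then the recommended file name, then an optional /hier/archy/ path, then an optional "-- description". START_RE and parse_start_marker are hypothetical names and are not the unfinished PyPAN.py module. The regex requires whitespace after '++PyPAN++', so the quoted variants such as '++PyPAN+++' do not match.

```python
import re

# Hypothetical start-marker grammar, inferred from the mySnippet.py example:
#   <comment> ++PyPAN++ <filename> [/hier/archy/] [-- description]
START_RE = re.compile(
    r"\+\+PyPAN\+\+\s+"           # the start tag, followed by whitespace
    r"(?P<filename>\S+)"          # recommended file name (required)
    r"(?:\s+(?P<path>/\S*))?"     # optional classification path
    r"(?:\s+--\s*(?P<desc>.*))?"  # optional human-readable description
)


def parse_start_marker(line):
    """Return (filename, path, description), or None if no start tag."""
    m = START_RE.search(line)
    if m is None:
        return None
    return (m.group("filename"),
            m.group("path") or "",
            (m.group("desc") or "").strip())


print(parse_start_marker(
    "# ++PyPAN++ mySnippet.py /clp/forcomment/ -- minimal PyPan snippet"))
```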

>
>> The "/clp/forcomment/" part of the expression is optional, but the
>> intent is to express the location of mySnippet.py in a classification
>> hierarchy, to aid in searching, to limit hits to particular topics
>> etc. mySnippet.py is a recommended file name, and comes first after
>> the '++PyPAN++' tag. 
>> 
>> The classification path is also for optional use as an actual
>> directory path which can be rooted anywhere convenient for the user
>> (e.g., ~/PyPAN or C:\pywk\PyPan etc.) and thereby support automatic
>> extraction/downloading/placement from e.g., newsgroup archives, disk
>> files, etc. 
>> 
>> One common usage would be to see a PyPAN snippet in a post -- like the
>> above in this post. To make it available to the extraction tool as a
>> file, I wrote a little program [2] called getclip.exe which simply
>> gets the text from the windows clipboard and writes it to stdout. This
>> makes the clipboard visible as a file object using os.popen('getclip')
>> -- which you can pass to anything that wants to read a file. (getclip
>> is also handy outside of Python, since you can easily pipe the output
>> or redirect it to a file, without having to go to an editor, paste,
>> and save-as. Instead you just type getclip>theFile.txt.)
>> 
>> The intent is not to require you to select the exact lines, but just
>> do a select-all, copy or whatever is easy. Then getclip will make all
>> that available to the actual snippet extractor, which can put it in
>> particular directories, etc. 
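The extractor side can be sketched as follows, in Python 3. It assumes that tags are matched as whitespace-delimited tokens, as footnote [1] describes, and it keeps every line from the start marker through the end marker. extract_snippets is a hypothetical helper. Any file-like object works as input: os.popen("getclip") would do on Windows if getclip.exe is on PATH, and an in-memory file stands in for it here.

```python
import io

START_TAG = "++PyPAN++"
END_TAG = "--PyPAN--"


def extract_snippets(lines):
    """Yield each tagged snippet as a list of lines, marker lines included.

    Text outside the markers (headers, prose, signatures) is skipped, so a
    select-all copy of a whole post is fine as input.  Tags are compared as
    whitespace-delimited tokens, so a quoted '--PyPAN--' does not count.
    An unterminated snippet at end of input is dropped.
    """
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")  # clipboard text may use CR-LF
        tokens = line.split()
        if current is None:
            if START_TAG in tokens:
                current = [line]
        else:
            current.append(line)
            if END_TAG in tokens:
                yield current
                current = None


# Stand-in for os.popen("getclip") after a select-all/copy of a post.
post = io.StringIO(
    "Subject: Re: Python pattern repository\n"
    "\n"
    "# ++PyPAN++ mySnippet.py /clp/forcomment/ -- minimal PyPan snippet\n"
    "def mySnippet():\n"
    "    print('Hello PyPAN!')\n"
    "# --PyPAN--\n"
    "Regards,\n"
)
for snippet in extract_snippets(post):
    print("\n".join(snippet))
```

Because this sketch does not strip quote prefixes, a snippet taken from a quoted post keeps its ">> " prefixes.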
>> 
>> I am putting together a PyPAN.py module to provide convenient methods
>> for retrieving PyPAN snippets from clipboard, files, or urls, etc. by
>> regex pattern matches, but it's not finished. It will be runnable from
>> the command line or importable for programmatic use. There will be 
>> options for file placement similar to winzip extraction. I.e., you can
>> ignore paths and put everything in a specified directory, or you can
>> root the paths where you like etc. I guess if there is no interest in
>> PyPAN, I may only be able to retrieve my own snippets ;-) 
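The two placement options (rooting the classification path anywhere, or ignoring paths the way an unzip tool can) might look like this in Python 3. target_path and its parameters are hypothetical. The check against ".." is an added assumption, there to keep a snippet from writing outside the chosen root.

```python
import os


def target_path(root, hier_path, filename, ignore_paths=False):
    """Map a snippet's classification path onto a local directory tree.

    root         -- where the user wants the tree rooted, e.g. "~/PyPAN"
    hier_path    -- the optional "/clp/forcomment/" style path ("" if none)
    filename     -- the recommended file name from the marker line
    ignore_paths -- drop the hierarchy and put every file directly in root
    """
    root = os.path.expanduser(root)
    parts = [] if ignore_paths else [p for p in hier_path.split("/") if p]
    # Refuse names that could escape the chosen root (e.g. "/../../etc/").
    if (".." in parts or filename in ("", ".", "..")
            or os.path.basename(filename) != filename):
        raise ValueError("unsafe snippet path: %r %r" % (hier_path, filename))
    return os.path.join(root, *parts, filename)


print(target_path("~/PyPAN", "/clp/forcomment/", "mySnippet.py"))
print(target_path("~/PyPAN", "/clp/forcomment/", "mySnippet.py",
                  ignore_paths=True))
```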
>
>oh, i think if it's that simple to use that many people will use the marker 
>in their news posts.
Well, I hope so, but there has to be a way to separate the wheat from the chaff.
People shouldn't be afraid to use it casually, but real gems should wind up
with a different search key when they have been polished to that state. I am
thinking a tag should be reserved for casual ng use, e.g., __PyPAN_CLP.

How to classify stuff requires some thought...
>
>> In any case, I would be interested in hearing of any standard
>> hierarchy for classifying software. Is there a real librarian in the
>> house? 
>
>well i'm no expert in that area... but i found the reverse URL type of 
>hierarchy of Java a good idea. that way you avoid name conflicts. on the 
>other hand many people will prefer an order by topic rather than 
>organization/author.
I prefer to search by topic unless I know an exact file...
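For comparison, here is a small sketch of the Java-style reverse-domain ordering Chris mentions, with an optional topic path appended after it. reverse_domain_path is a hypothetical helper, and "oz.net" is used only because it is the domain in the attribution line above. Swapping the two lists would give the topic-first ordering instead.

```python
def reverse_domain_path(domain, topic_path=""):
    """Build a Java-package-style hierarchy from an author's domain name.

    The reversed domain comes first, which avoids clashes between authors;
    an optional topic path follows for people who browse by topic.
    """
    owner = [p for p in reversed(domain.lower().split(".")) if p]
    topic = [p for p in topic_path.split("/") if p]
    return "/" + "/".join(owner + topic) + "/"


print(reverse_domain_path("oz.net", "/clp/forcomment/"))
```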

>
>> ---------------
>> [1] Variations on the PyPAN tags:
>> (Note that PyPAN will search based on space-delimited tags, therefore
>> quoting them as in the following makes them safe against inadvertently
>> interfering with searching for an actual snippet like the (not quite)
>> minimal one above). 
>> 
>> '++PyPAN++'   => start with current line
>> '++PyPAN++-'  => start with previous line
>> '++PyPAN+++'  => start with next line
>> '++PyPAN--'   => reserved for future expressions within a snippet
>> '--PyPAN--'   => end with current line
>> '--PyPAN---'  => end with previous line
>> '--PyPAN--+'  => end with next line
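The variant table above amounts to a mapping from each tag to a kind and a line offset. A Python 3 sketch follows, assuming whitespace-delimited tag tokens. TAG_OFFSETS and snippet_bounds are hypothetical names, and the reserved '++PyPAN--' is left out on purpose.

```python
# tag -> (kind, line offset relative to the line that carries the tag)
TAG_OFFSETS = {
    "++PyPAN++": ("start", 0),
    "++PyPAN++-": ("start", -1),
    "++PyPAN+++": ("start", +1),
    "--PyPAN--": ("end", 0),
    "--PyPAN---": ("end", -1),
    "--PyPAN--+": ("end", +1),
}


def snippet_bounds(lines):
    """Return inclusive (first, last) line indexes of the first snippet.

    Tags are matched as whitespace-delimited tokens; returns None if no
    complete start/end pair is found.
    """
    first = None
    for i, line in enumerate(lines):
        for token in line.split():
            kind, offset = TAG_OFFSETS.get(token, (None, 0))
            if kind == "start" and first is None:
                first = max(i + offset, 0)
            elif kind == "end" and first is not None:
                return first, min(i + offset, len(lines) - 1)
    return None


post = [
    "#!/usr/bin/env python",
    "# ++PyPAN++- demo.py /Demo/ -- start one line up, at the #! line",
    "print('hello')",
    "# --PyPAN--",
]
first, last = snippet_bounds(post)
print("\n".join(post[first:last + 1]))
```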
>
>as mentioned above, i'm not sure how well the special characters will work. 
>e.g. google searches for entire words and ignores special chars, but i 
>think it understands "+" and "-" as include/exclude word so that --PyPAN--
>would possibly mean no results containing that string...
>
>why do you want to complicate things with that many magic tags anyway?
>why not '# PyPANsnippet filename.py /hier/archy version'

Yes that would work. But I'd like something to make the delimiting lines
stand out visually, so I'd tend to prefix a couple of underscores.
I still need to decide whether a preceding #! line gets included or not.
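One possible policy for the #! question is to pull in a shebang line automatically when it sits directly above the tag, so no special tag variant is needed. The sketch below assumes a one-line tag in the spirit of Chris's suggestion, with the leading underscores optional. TAG_RE, find_tag and the exact field order are hypothetical.

```python
import re

# Hypothetical one-line tag:  # __PyPANsnippet filename.py /hier/archy version
TAG_RE = re.compile(r"#\s*(?:__)?PyPANsnippet\s+(\S+)\s+(/\S*)\s+(\S+)")


def find_tag(lines):
    """Return (start_index, filename, hierarchy, version) for the first tag.

    If the line just before the tag is a '#!' line, the snippet is taken to
    start there, so the interpreter line travels with the code.
    """
    for i, line in enumerate(lines):
        m = TAG_RE.search(line)
        if m:
            start = i - 1 if i > 0 and lines[i - 1].startswith("#!") else i
            return (start,) + m.groups()
    return None


print(find_tag(["#!/usr/bin/env python",
                "# __PyPANsnippet hello.py /Demo/ 1.0",
                "print('hi')"]))
```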

>
>> ---------------
>> 
>> [2]
>> /* ++PyPAN++ getclip.c -- get and write win32 clipboard text to stdout */
>> /*
>> ** To compile with msvc++60 at command line use
>> **      cl getclip.c /link /defaultlib:user32
>> */
>> 
>> #include <io.h>
>> #include <string.h>
>> #include <windows.h>
>> 
>> int main (){
>>     HANDLE hClipData;                   /* handle to clip data  */ 
>>     LPSTR lpClipData;                   /* pointer to clip data */ 
>>     if (!OpenClipboard(NULL)) return 1; /* NULL <=> current task; 1 => failure */
>>     /* get text from the clipboard */ 
>>     if( (hClipData = GetClipboardData(CF_TEXT)) &&
>>         (lpClipData = GlobalLock(hClipData))
>>     ){ 
>>         write(1, lpClipData, strlen(lpClipData)); /* text string to stdout */
>>         GlobalUnlock(hClipData);
>>         CloseClipboard(); return 0;
>>     } else {
>>         CloseClipboard(); return 1; 
>>     }
>> }
>> /* --PyPAN-- */
>> 
>> 
>> Note: The cl command assumes environment settings, which
>> you can set by invoking D:\VC98\Bin\VCVARS32.BAT (or
>> whatever your path to it is).
>> 
>> BTW, getclip.exe is not big (freshly recompiled):
>> 
>>     02-10-15  15:58                 24,576 getclip.exe
>
>hehe. 3'584 Bytes GCC/stripped.
I'll have to look into that ;-) I remember when hello world was a 34-byte .com ;-)

>
>--makefile--
>CFLAGS = -mno-cygwin
>getclip.exe: getclip.o
>	$(CC) -mno-cygwin -o $@ $^
>	strip $@
>------------
>
>> Further ideas?
>
>as mentioned above, google is a bit picky if a site is not referenced 
>anywhere. so it would maybe make sense if there were a page (where many
>links point to) where anyone could enter his URL, so that the chances are 
>increased that it is found by google and other search engines.
>a simple wiki page would do, or a mailinglist with HTML accessible
>archive.
>a bunch of people should place a link to that site on their pages to 
>increase the google pagerank and to increase the probability that the 
>linked pages are found.
>
At least for newsgroup postings, it seems to go fast. And for that we
also have the python.org archives.

>i think this is a nice idea. with your PyPAN module there could be an easy 
>access to the information and we get a lot of infrastructure "sponsored" 
>(storage space for the snippets (with redundancy :-) search engines, ...)
>
Thanks. I think the idea generalizes to embedded documents (which of course
rings the MIME bell), but I want to resist the urge to widen the scope
too much ;-)

Other things to leverage might be reStructuredText and/or YAML.
But anything can be put between the covers if the covers are simple.
I am thinking of PyPAN_822 as a flag meaning that RFC 822 headers follow
immediately (ending normally at a blank line, after which the snippet
continues as if the header block weren't there). Sort of transparent metadata.

I would allow it to be indented as a block lined up with the flag, so
it could be inside of python comments and posting quotes etc.
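A sketch of reading such a block with the standard email package, in Python 3. It assumes that the header lines start in the same column as the flag token, so any "# " or "> " prefix in front of them is cut off, and that the block ends at the first line that is blank after that cut. read_822_block is a hypothetical helper, and the Keywords/Version headers are made-up examples.

```python
import email.parser
import re


def read_822_block(lines, flag_index):
    """Parse the RFC 822-style header block that follows a PyPAN_822 flag.

    Header lines are assumed to be indented to the flag token's column, so
    any prefix in front of it ('# ', '> ', ...) is removed.  The block ends
    at the first line that is blank once that prefix is removed.
    Returns (message, index of the first line after the block).
    """
    m = re.search(r"(?<!\S)\S*PyPAN_822", lines[flag_index])
    if m is None:
        raise ValueError("no PyPAN_822 flag on line %d" % flag_index)
    col = m.start()
    header_lines = []
    i = flag_index + 1
    while i < len(lines) and lines[i][col:].strip():
        header_lines.append(lines[i][col:].rstrip())
        i += 1
    text = "\n".join(header_lines) + "\n\n"
    headers = email.parser.Parser().parsestr(text, headersonly=True)
    # Skip the terminating blank line; the snippet continues after it.
    return headers, min(i + 1, len(lines))


lines = [
    "# PyPAN_822",
    "# Keywords: clipboard, win32",
    "# Version: 0.1",
    "#",
    "import os",
]
headers, resume = read_822_block(lines, 0)
print(headers["Keywords"], "|", lines[resume])
```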

>it may be a bit slow 'cause most pages are not refreshed that often. so it 
>may take a week or two until a snippet is indexed.
>
As mentioned, this doesn't seem to apply to clp postings.

>and another problem might come up. i'll just mention "versions"...
>it will become difficult to find the best snippet if there are so many
>similar entries etc.
A really important point! I don't have a good plan yet...

>
>oh, and maybe there should be some convention for keywords, like that they 
>should follow right after the tagline as python comment or so. or do you 
>think that the source code + message around it is enough to find it?
Well, google seems to include the search tag +- a small amount of context
which may just amount to seeking backwards n characters in the file and
reading forward 2*n and sanitizing it a bit. I don't know. But the
tag and the line it's on stand a good chance of being listed, and I think
it's important to make that human-usable.
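To show why the tag line itself should carry a human-usable description, here is a sketch of the guessed heuristic: back up n characters from the tag, read forward past it by about n more, and collapse whitespace. This illustrates the guess only and does not describe what Google actually does. context_window is a hypothetical name.

```python
import re


def context_window(text, tag, n=40):
    """Return about n characters either side of the first `tag` in `text`.

    Whitespace runs are collapsed to single spaces, a crude stand-in for
    "sanitizing".  Returns None if the tag does not occur.
    """
    pos = text.find(tag)
    if pos < 0:
        return None
    window = text[max(pos - n, 0):pos + len(tag) + n]
    return re.sub(r"\s+", " ", window).strip()


page = ("Some chatter before the snippet...\n\n"
        "# __PyPAN_CLP getclip.py /Tool/ -- read win32 clipboard text\n"
        "import os\n")
print(context_window(page, "__PyPAN_CLP", n=20))
```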

I was hoping the hierarchy would work somewhat like keywords. Of course
with PyPan_822 you could follow with Keywords: whatever ...
and other metadata as well.

I am thinking that casual use for discussion should use a tag like
__PyPan_CLP filename /hier/archy/ -- human-usable description
Then, when newsgroup discussion has evolved a snippet to where it's archive-ready,
it might be useful to differentiate types of snippets by xxx in __PyPAN_xxx.
For non-code, xxx might be e.g. PEP, PEPX, Tip, Faq, Arcanum, Warn, Alarm, Doc, Info, etc.
For code, maybe __PyPAN_Py fname.ext followed by hierarchy prefixes such as /Tool/,
/Tcl/, /Demo/, /Tut/, etc. Maybe /PEP/nnn/ for suggested code re PEPnnn.

This is just OTTOMH ;-)
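Recognizing those tag families could be sketched like this, in Python 3.6+ because of the scoped (?i:...) flag. The grammar __PyPAN_<type> <name> [/path/] [-- description] is an assumption pieced together from the examples above. The "PyPAN" part is matched case-insensitively because both __PyPan_CLP and __PyPAN_xxx spellings appear in this thread. classify and NON_CODE_KINDS are hypothetical.

```python
import re

# Hypothetical grammar:  __PyPAN_<type> <name> [/hier/archy/] [-- description]
TAG_RE = re.compile(
    r"(?<!\w)__(?i:pypan)_(?P<kind>[A-Za-z0-9]+)\s+(?P<name>\S+)"
    r"(?:\s+(?P<path>/\S*))?(?:\s+--\s*(?P<desc>.*))?"
)

NON_CODE_KINDS = {"PEP", "PEPX", "Tip", "Faq", "Arcanum",
                  "Warn", "Alarm", "Doc", "Info"}


def classify(line):
    """Return (kind, status, name, path, description), or None if untagged."""
    m = TAG_RE.search(line)
    if m is None:
        return None
    kind = m.group("kind")
    if kind == "CLP":
        status = "casual"      # casual newsgroup use
    elif kind == "Py":
        status = "code"        # archive-ready code
    elif kind in NON_CODE_KINDS:
        status = "non-code"
    else:
        status = "unknown"
    return (kind, status, m.group("name"), m.group("path") or "",
            (m.group("desc") or "").strip())


print(classify("# __PyPAN_Py getclip.py /Tool/ -- read win32 clipboard text"))
```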
>
>there will be ground for many tools, like pygoogle combined with xx etc, 
>indexing bots, that collect all snippets and place them on their own page, 
>specialized search engines, ...
>
There are lots of possibilities.
I hope I haven't miffed the catalog and cookbook people by trespassing on their turf.
That's not the intent. This is to start something where it isn't happening, not to
replace something. I think there could be some synergy. ;-)

Regards,
Bengt Richter