[Tutor] Tutor Digest, Vol 63, Issue 8

Tue May 5 05:43:23 CEST 2009

Hello Spir, Alan, and Paul,

Thank you for your help. I have been working on the file, but I still have a
problem getting it to do what I wanted. As a reminder, this is what I have:

#!/usr/bin/python
import sys

tags = {
'noun-prop': 'noun_prop null null'.split(),
'case_def_gen': 'case_def gen null'.split(),
'dem_pron_f': 'dem_pron f null'.split(),
'case_def_acc': 'case_def acc null'.split(),
}

TAB = '\t'

def newlyTaggedWord(line):
       # separate parts of line, keeping data only (strip() drops the newline)
       (word,tag) = line.strip().split(TAB)
       new_tags = tags[tag]          # read in dict
       tagging = TAB.join(new_tags)    # join with TABs
       return word + TAB + tagging   # formatted result

def replaceTagging(source_name, target_name):
       target_file = open(target_name, "w")
       # replacement loop
       for line in open(source_name, "r"):
           new_line = newlyTaggedWord(line) + '\n'
           target_file.write(new_line)
       target_file.close()             # close inside the function, after the loop

if __name__ == "__main__":
       source_name = sys.argv[1]
       target_name = sys.argv[2]
       replaceTagging(source_name, target_name)
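
For reference, here is a minimal sketch of the per-line transformation the script is aiming for, assuming every input line is a word and a tag separated by a single TAB. The name retag_line, the sample words, and the fallback for unknown tags are assumptions made for illustration only; they are not part of the script above.

```python
# Sketch only: sample input and the unknown-tag fallback are assumptions.
TAB = '\t'

tags = {
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
}

def retag_line(line):
    """Return 'word<TAB>tag1<TAB>tag2<TAB>tag3' for one 'word<TAB>tag' line."""
    word, tag = line.strip().split(TAB)   # strip() drops the trailing newline
    new_tags = tags.get(tag, [tag])       # assumed fallback: keep unknown tags as-is
    return word + TAB + TAB.join(new_tags)

print(repr(retag_line('word1\tcase_def_gen\n')))
print(repr(retag_line('word2\tunknown_tag\n')))
```

Using tags.get() here is only one possible choice; a plain tags[tag] lookup would instead raise KeyError for any tag missing from the dict, which may be preferable while the dict is still being completed.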

On Mon, May 4, 2009 at 12:38 PM, <tutor-request at python.org> wrote:

> Today's Topics:
>
>   1. Re: Iterating over a long list with regular expressions and
>      changing each item? (Paul McGuire)
>   2. Advanced String Search using operators AND, OR etc.. (Alex Feddor)
>   3. Re: Encode problem (Pablo P. F. de Faria)
>   4. Re: Encode problem (Pablo P. F. de Faria)
>   5. Re: Advanced String Search using operators AND, OR etc..
>      (vince spicer)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 4 May 2009 11:17:53 -0500
> From: "Paul McGuire" <ptmcg at austin.rr.com>
> Subject: Re: [Tutor] Iterating over a long list with regular
>        expressions and changing each item?
> To: <tutor at python.org>
> Message-ID: <99B447F3C7EF4996AA2ED683F1EE6DB6 at AWA2>
> Content-Type: text/plain;       charset="us-ascii"
>
> Original:
>  'case_def_gen':['case_def','gen','null'],
>  'nsuff_fem_pl':['nsuff','null', 'null'],
>  'abbrev': ['abbrev, null, null'],
>  'adj': ['adj, null, null'],
>  'adv': ['adv, null, null'],}
>
> Note the values for 'abbrev', 'adj' and 'adv' are not lists, but strings
> containing comma-separated lists.
>
> Should be:
>  'case_def_gen':['case_def','gen','null'],
>  'nsuff_fem_pl':['nsuff','null', 'null'],
>  'abbrev': ['abbrev', 'null', 'null'],
>  'adj': ['adj', 'null', 'null'],
>  'adv': ['adv', 'null', 'null'],}
>
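The difference between the two forms is easy to check interactively. This is a small sketch; the variable names wrong and right are made up for the illustration.

```python
# A one-element list holding a comma-separated string,
# versus a proper three-element list of subtags.
wrong = ['abbrev, null, null']
right = ['abbrev', 'null', 'null']

print(len(wrong), len(right))
print(repr("\t".join(wrong)))   # no TAB is inserted: there is only one item
print(repr("\t".join(right)))   # the three subtags come out TAB-separated
```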
> For much of my own code, I find lists of string literals to be tedious to
> enter, and it is easy to drop a ' character.  This style is a little easier
> on the eyes, and harder to screw up.
>
>  'case_def_gen': 'case_def gen null'.split(),
>  'nsuff_fem_pl': 'nsuff null null'.split(),
>  'abbrev': 'abbrev null null'.split(),
>  'adj': 'adj null null'.split(),
>  'adv': 'adv null null'.split(),}
>
> Since all that your code does at runtime with the value strings is
> "\t".join() them, then you might as well initialize the dict with these
> computed values, for at least some small gain in runtime performance:
>
>  T = lambda s : "\t".join(s.split())
>  'case_def_gen' : T('case_def gen null'),
>  'nsuff_fem_pl' : T('nsuff null null'),
>  'abbrev' :       T('abbrev null null'),
>  'adj' :          T('adj null null'),
>  'adv' :          T('adv null null'),}
>  del T
>
> (Yes, I know PEP8 says *not* to add spaces to line up assignments or other
> related values, but I think there are isolated cases where it does help to
> see what's going on.  You could even write this as:
>
>  T = lambda s : "\t".join(s.split())
>  'case_def_gen' : T('case_def  gen  null'),
>  'nsuff_fem_pl' : T('nsuff     null null'),
>  'abbrev' :       T('abbrev    null null'),
>  'adj' :          T('adj       null null'),
>  'adv' :          T('adv       null null'),}
>  del T
>
> and the extra spaces help you to see the individual subtags more easily,
> with no change in the resulting values since split() splits on multiple
> whitespace the same as a single space.)
>
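A short sketch of why the aligned spelling is safe: str.split() with no argument splits on any run of whitespace, so the extra spaces disappear before the join. T is written here as a def rather than a lambda purely for readability; the two-entry dict is a cut-down example.

```python
def T(s):
    """Split on any run of whitespace and rejoin with single TABs."""
    return "\t".join(s.split())

compact = T('case_def gen null')
aligned = T('case_def  gen  null')

tags = {
    'case_def_gen': T('case_def  gen  null'),
    'nsuff_fem_pl': T('nsuff     null null'),
}

print(compact == aligned)
print(repr(tags['nsuff_fem_pl']))
```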
> Of course you could simply code as:
>
>  'case_def_gen' : 'case_def\tgen\tnull',
>  'nsuff_fem_pl' : 'nsuff\tnull\tnull',
>  'abbrev' :       'abbrev\tnull\tnull',
>  'adj' :          'adj\tnull\tnull',
>  'adv' :          'adv\tnull\tnull',}
>
> But I think readability definitely suffers here; I would probably go with
> the penultimate version.
>
> -- Paul
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 4 May 2009 14:45:06 +0200
> From: Alex Feddor <alex.feddor at gmail.com>
> Subject: [Tutor] Advanced String Search using operators AND, OR etc..
> To: tutor at python.org
> Message-ID:
>        <5bf184e30905040545i78bc75b8ic78eabf44a55aa20 at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi
>
> I am looking for a method that enables advanced text string search. The
> string.find() method and the re module do not seem to support what I am
> looking for. The idea is as follows:
>
> Text ="FDA meeting was successful. New drug is approved for whole sale
> distribution!"
>
> I would like to scan the text using AND and OR operators and get -1 or
> another value if the search terms are not found in the text.
> Example 01:
> search criteria:  "FDA" AND ( "approve*" OR "supported")
> The catch is that in the Text variable the words FDA and approve are not
> adjacent (other words are in between).
> Example 02:
> search criteria: "Ben"
> The catch is that the code should find only the exact word Ben, not words
> whose first three letters are Ben, such as Benquick or Benseek. Only Ben is
> the word we are looking for.
>
> I would really appreciate your advice, code samples, or links on how the
> above can be achieved! If possible, I would appreciate a solution based on a
> free-of-charge module.
>
> Cheers,  Alex
> PS:
> A few months ago I discovered Python. I am amazed at what can be done
> with it. Really cool programming language.
>
> ------------------------------
>
> Message: 3
> Date: Mon, 4 May 2009 11:09:25 -0300
> From: "Pablo P. F. de Faria" <pablofaria at gmail.com>
> Subject: Re: [Tutor] Encode problem
> To: Kent Johnson <kent37 at tds.net>
> Cc: *tutor python <tutor at python.org>
> Message-ID:
>        <3ea81d4c0905040709m78a45d11j2037943380817297 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Thanks, Kent, but that doesn't solve my problem. In fact, I need
> ConfigParser to work with non-ascii characters, since my App may run
> in "latin-1" environments (folder and file names). I must find out why
> the str() function in the module ConfigParser doesn't use the encoding
> defined for the application (# -*- coding: utf-8 -*-). The rest of the
> application works properly with utf-8, except for ConfigParser. What I
> found out is that ConfigParser seems to make use of the configuration
> in Site.py (which is set to 'ascii'), instead of the configuration
> defined for the App (if I change . But having to change Site.py on
> every computer is very problematic... So I wonder if there is a
> way to override the settings in Site.py only for my App.
>
> 2009/5/1 Kent Johnson <kent37 at tds.net>:
> > On Fri, May 1, 2009 at 4:54 PM, Pablo P. F. de Faria
> > <pablofaria at gmail.com> wrote:
> >> Hi, Kent.
> >>
> >> The stack trace is:
> >>
> >> Traceback (most recent call last):
> >>   File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1057, in OnClose
> >>     self.SavePreferences()
> >>   File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1068, in SavePreferences
> >>     self.cfg.set(u'File Settings',u'Recent files', unicode(",".join(self.recent_files)))
> >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 12: ordinal not in range(128)
> >>
> >> The "unicode" function actually doesn't make any difference... The
> >> content of the string being saved is "/home/pablo/Área de
> >> Trabalho/teste.xml".
> >
> > OK, this error is in your code, not the ConfigParser. The problem is with
> > ",".join(self.recent_files)
> >
> > Are the entries in self.recent_files unicode strings? If so, then I
> > think the join is trying to convert to a string using the default
> > codec. Try
> >
> > self.cfg.set('File Settings','Recent files',
> > ','.join(name.encode('utf-8') for name in self.recent_files))
> >
> > Looking at the ConfigParser.write() code, it wants the values to be
> > strings or convertible to strings by calling str(), so non-ascii
> > unicode values will be a problem there. I would use plain strings for
> > all the interaction with ConfigParser and convert to Unicode yourself.
> >
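The error message itself can be reproduced in isolation. The sketch below assumes the path contains the character Á encoded as UTF-8 (bytes 0xc3 0x81), which matches the byte and position reported in the traceback; the variable names are made up. It shows that decoding those bytes with the 'ascii' codec fails at offset 12, while naming the codec explicitly works.

```python
# Assumption: the path is UTF-8 encoded and its 13th byte starts the letter 'Á'.
raw = b'/home/pablo/\xc3\x81rea de Trabalho/teste.xml'

try:
    raw.decode('ascii')            # what an implicit conversion falls back to
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

decoded = raw.decode('utf-8')      # an explicit codec succeeds

print(ascii_error.start)                  # offset of the offending byte
print(decoded.encode('utf-8') == raw)     # round-trips to the same bytes
```

The general idea is to convert between bytes and text at one well-defined place with an explicit codec, rather than letting a default codec be picked implicitly.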
> > Kent
> >
> > PS Please Reply All to reply to the list.
> >
>
>
>
> --
> ---------------------------------
> "We are all in the gutter, but some of us are looking at the stars."
> (Oscar Wilde)
> ---------------------------------
> Pablo Faria
> Master's student in Language Acquisition - IEL/Unicamp
> FAPESP technical fellow on the project Padrões Rítmicos e Mudança Lingüística
> (19) 3521-1570
> http://www.tycho.iel.unicamp.br/~pablofaria/
> pablofaria at gmail.com
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 4 May 2009 11:11:58 -0300
> From: "Pablo P. F. de Faria" <pablofaria at gmail.com>
> Subject: Re: [Tutor] Encode problem
> To: Kent Johnson <kent37 at tds.net>
> Cc: *tutor python <tutor at python.org>
> Message-ID:
>        <3ea81d4c0905040711p62376925n26fb93a8955fefe4 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Here is the traceback, after the last change you suggested:
>
> Traceback (most recent call last):
>   File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1057, in OnClose
>     self.SavePreferences()
>   File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1069, in SavePreferences
>     self.cfg.write(codecs.open(self.properties_file,'w','utf-8'))
>   File "/usr/lib/python2.5/ConfigParser.py", line 373, in write
>     (key, str(value).replace('\n', '\n\t')))
>   File "/usr/lib/python2.5/codecs.py", line 638, in write
>     return self.writer.write(data)
>   File "/usr/lib/python2.5/codecs.py", line 303, in write
>     data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 27: ordinal not in range(128)
>
> So, in "str(value)" the content is a folder name with an accented character
> (?).
>
>
>
> ------------------------------
>
> Message: 5
> Date: Mon, 4 May 2009 10:38:31 -0600
> From: vince spicer <vinces1979 at gmail.com>
> Subject: Re: [Tutor] Advanced String Search using operators AND, OR
>        etc..
> To: Alex Feddor <alex.feddor at gmail.com>
> Cc: tutor at python.org
> Message-ID:
>        <1e53c510905040938q25d787f3w17f7a18f65bd0410 at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
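> Advanced string searches are done with regular expressions via the re module.
>
> EX:
>
> import re
>
> # FDA followed later by approve*/supported, or the exact word Ben
> m = re.compile(r"FDA.*?\b(approve\w*|supported)\b|\bBen\b")
>
> found = m.search(Text)
> if found:
>    print(found.group())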
>
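The search criteria from the question can also be spelled out with ordinary Python and/or around small whole-word tests, which keeps each term readable. This is only a sketch: the helper name has_word and its handling of a trailing * as "any word ending" are assumptions about what the question intends, and it returns True/False rather than -1.

```python
import re

def has_word(text, word):
    """True if `word` occurs as a whole word; a trailing * allows any ending."""
    if word.endswith('*'):
        pattern = r'\b' + re.escape(word[:-1]) + r'\w*'
    else:
        pattern = r'\b' + re.escape(word) + r'\b'
    return re.search(pattern, text) is not None

Text = ("FDA meeting was successful. New drug is approved for whole sale "
        "distribution!")

# Example 01: "FDA" AND ("approve*" OR "supported")
example_01 = has_word(Text, 'FDA') and (
    has_word(Text, 'approve*') or has_word(Text, 'supported'))

# Example 02: only the exact word "Ben" should match
example_02 = has_word('Benquick and Benseek', 'Ben')
example_03 = has_word('Ask Ben about it', 'Ben')

print(example_01, example_02, example_03)
```

The \b anchors are what keep Ben from matching inside Benquick or Benseek; the terms may appear anywhere in the text and in any order, since each one is tested separately.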
>
> Vince
>
>
>
> ------------------------------
>
>
> End of Tutor Digest, Vol 63, Issue 8
> ************************************
>