[Tutor] create 1000 000 variables

Sun Jul 16 14:34:49 CEST 2006

Michael P. Reilly wrote:
> Good.  But one VERY important point to note is that that you are not working
> with "variables" here.  You are working with members of a class instance.
> This is a very different beast.  You could just use getattr(), setattr() and
> delattr() for these.
>
> But continuing... you might want to think about this in a step back.  Each
> of the self.hN and self.in_hN have something in common and they all have the
> same behavior.  That sounds a lot like a job for "object oriented
> programming", no?  We can create a class that look and acts like a list
> (like hN), but is only active if we have set it (if in_hN is True).
>
> Actually, because of the structure of the SGML code, "BAD CODE1" isn't quite
> the "bad code", the "handle_data" code is actually worse.  The reason "BAD
> CODE1" looks bad is not because of your code, but because SGMLParser forces
> you to create so many methods in the subclass.  There are no "start_hN" and
> "end_hN" catch-all methods available.  For this reason, I made only a minor
> change to the "start_hN" and "end_hN" methods, but changed the reset and
> handle_data methods quite a bit.
>
> class HeaderCapture:
>     def __init__(self, contents=[]):
>         self.contents = contents[:] # copy
>         self.deactivate()
>     def append(self, item):
>         # could raise an exception, but for now, ignore
>         if self.active:
>             self.contents.append(item)
>     def __len__(self):
>         return len(self.contents)
>     def __getitem__(self, idx):
>         return self.contents[idx]
>     def activate(self):
>         self.active = True
>     def deactivate(self):
>         self.active = False
> ...
> class Lister(SGMLParser):
>
>     def reset(self):
>         SGMLParser.reset(self)
>         self.headers = {
>             'h1': HeaderCapture(),
>             'h2': HeaderCapture(),
>             'h3': HeaderCapture(),
>             'h4': HeaderCapture(),
>             'h5': HeaderCapture(),
>             'h6': HeaderCapture(),
>         }
>
>     def handle_data(self, text):
>         # only one would be active, but legally, two could
>         for hc in self.headers.values():
>             hc.append(text)  # if not active, ignore
>
>     def start_h1(self, attrs):
>         self.headers['h1'].activate()
>     def end_h1(self):
>         self.headers['h1'].deactivate()
>     def start_h2(self, attrs):
>         self.headers['h2'].activate()
>     def end_h2(self):
>         self.headers['h2'].deactivate()
>     def start_h3(self, attrs):
>         self.headers['h3'].activate()
>     def end_h3(self):
>         self.headers['h3'].deactivate()
>     def start_h4(self, attrs):
>         self.headers['h4'].activate()
>     def end_h4(self):
>         self.headers['h4'].deactivate()
>     def start_h5(self, attrs):
>         self.headers['h5'].activate()
>     def end_h5(self):
>         self.headers['h5'].deactivate()
>     def start_h6(self, attrs):
>         self.headers['h6'].activate()
>     def end_h6(self):
>         self.headers['h6'].deactivate()
>   
To continue this, your "BAD CODE2" becomes

for tag in 'h1 h2 h3 h4 h5 h6'.split():
  Show_step(tag)
  for i in parser.headers[tag]:
    print i

Kent

> On 7/15/06, Сергій <kyxaxa at gmail.com> wrote:
>   
>>  But again, like others have suggested, you should rethink your problem
>>     
>>> and your solution before starting down your path.  What are you really
>>> capturing?
>>>
>>>       
>> Rethink problem...
>> I try to use sgmllib - get all info tagged in "h1"... "h6"
>> I've created file lister.py:
>>
>> "from sgmllib import SGMLParser
>>
>> class Lister(SGMLParser):
>>
>>  def reset(self):
>>   SGMLParser.reset(self)
>>   self.h1 = []
>>   self.h2 = []
>>   self.h3 = []
>>   self.h4 = []
>>   self.h5 = []
>>   self.h6 = []
>>
>>   self.in_h1 = False
>>   self.in_h2 = False
>>   self.in_h3 = False
>>   self.in_h4 = False
>>   self.in_h5 = False
>>   self.in_h6 = False
>>
>>  def handle_data(self, text):
>>   if self.in_h1 == True:
>>    self.h1.append(text)
>>   elif self.in_h2 == True:
>>    self.h2.append(text)
>>   elif self.in_h3 == True:
>>    self.h3.append(text)
>>   elif self.in_h4 == True:
>>    self.h4.append(text)
>>   elif self.in_h5 == True:
>>    self.h5.append(text)
>>   elif self.in_h6 == True:
>>    self.h6.append(text)
>>
>> #AND NOW "BAD CODE1":
>>
>>  def start_h1(self, attrs):
>>   self.in_h1 = True
>>
>>  def end_h1(self):
>>   self.in_h1 = False
>>
>>  def start_h2(self, attrs):
>>   self.in_h2 = True
>>
>>  def end_h2(self):
>>   self.in_h2 = False
>>
>>  def start_h3(self, attrs):
>>   self.in_h3 = True
>>
>>  def end_h3(self):
>>   self.in_h3 = False
>>
>>  def start_h4(self, attrs):
>>   self.in_h4 = True
>>
>>  def end_h4(self):
>>   self.in_h4 = False
>>
>>  def start_h5(self, attrs):
>>   self.in_h5 = True
>>
>>  def end_h5(self):
>>   self.in_h5 = False
>>
>>  def start_h6(self, attrs):
>>   self.in_h6 = True
>>
>>  def end_h6(self):
>>   self.in_h6 = False
>>
>> "
>>
>> And now I want to print all text in this tags.
>>
>> file use_lister.py:
>>
>> "
>>
>> import urllib, lister
>>
>> f = open('_1.html', 'r')
>> text = f.read()
>> f.close()
>>
>> parser = urllister.Lister()
>> parser.feed(text)
>> parser.close()
>>
>> #AND NOW "BAD CODE2":
>>
>> Show_step('h1')
>> for i in parser.h1:
>>   print i
>>
>> Show_step('h2')
>> for i in parser.h2:
>>   print i
>>
>> Show_step('h3')
>> for i in parser.h3:
>>   print i
>>
>> Show_step('h4')
>> for i in parser.h4:
>>   print i
>>
>> Show_step('h5')
>> for i in parser.h5:
>>   print i
>>
>> Show_step('h6')
>> for i in parser.h6:
>>   print i
>>
>> "
>>
>>
>>
>> And I don't like this "BAD CODE1" and "BAD CODE2"
>>
>> How to rewrite bad codes???
>>
>> _______________________________________________
>> Tutor maillist  -  Tutor at python.org
>> http://mail.python.org/mailman/listinfo/tutor
>>
>>
>>
>>     
>
>
>   
> ------------------------------------------------------------------------
>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>