Remove empty strings from list
Bruno Desthuilliers
bruno.42.desthuilliers at websiteburo.invalid
Tue Sep 15 04:16:38 EDT 2009
Helvin wrote:
> Hi,
>
> Sorry I did not want to bother the group, but I really do not
> understand this seeming trivial problem.
> I am reading from a textfile, where each line has 2 values, with
> spaces before and between the values.
> I would like to read in these values, but of course, I don't want the
> whitespaces between them.
> I have looked at documentation, and how strings and lists work, but I
> cannot understand the behaviour of the following:
> line = f.readline()
> line = line.lstrip() # take away whitespace at the beginning of the
> readline.
file.readline returns the line with the ending newline character (which
is considered whitespace by the str.strip method), so you may want to
use line.strip() instead of line.lstrip().
> list = line.split(' ')
Slightly OT, but: don't use the names of builtin types or functions as
identifiers, since this shadows the builtin object.
Also, the default behaviour of str.split (called with no argument) is to
split on runs of whitespace and to ignore leading and trailing
whitespace, so it never yields empty strings. You would get better
results by not specifying the delimiter here:
>>> " a  a  a  a ".split(' ')
['', 'a', '', 'a', '', 'a', '', 'a', '']
>>> " a  a  a  a ".split()
['a', 'a', 'a', 'a']
>>>
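On the shadowing point above, here is a minimal sketch of what goes wrong once the name `list` has been rebound. The function name `shadowing_demo` and the sample string are made up for illustration; the rebinding is kept inside a function so it does not affect the rest of the session.

```python
def shadowing_demo():
    # Rebinding `list` hides the builtin type for the rest of this scope.
    list = "44 0.000000000".split()
    try:
        # This now tries to call the list instance, not the builtin type.
        list("abc")
    except TypeError as exc:
        return str(exc)
    return None


message = shadowing_demo()
print(message)
```

Picking another name (`fields`, `values`, ...) avoids the problem entirely.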
> # the list has empty strings in it, so now,
> remove these empty strings
A problem you could have avoided right from the start !-)
> for item in list:
>     if item is ' ':
Don't use identity comparison when you want to test for equality. It
happens to kind of work in your above example but only because CPython
implements a cache for _some_ small strings, but you should _never_ rely
on such implementation details. A string containing accented characters
would not have been cached:
>>> s = 'ééé'
>>> s is 'ééé'
False
>>>
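To make the difference concrete, here is a small sketch comparing equality and identity on two strings with the same content. The variable names are illustrative only. Whether the two strings end up being the same object depends on the implementation, so the code prints that result without relying on it.

```python
a = "abc"
b = "".join(["a", "b", "c"])   # same characters, built at runtime

values_equal = (a == b)        # compares content: this is the test you want
same_object = (a is b)         # compares identity: an implementation detail

print(values_equal)
print(same_object)

# Testing for the empty string: compare with == (or just use `not item`).
items = ['44', '', ' ']
is_empty = [item == '' for item in items]
print(is_empty)
```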
Also, this is surely not your actual code : ' ' is not an empty string,
it's a string with a single space character. The empty string is ''. And
FWIW, empty strings (like most empty sequences and collections, all
numerical zeros, and the None object) have a false value in a boolean
context, so you can just test the string directly:
for s in ['', 0, 0.0, [], {}, (), None]:
    if not s:
        print "'%s' is empty, so it's false" % str(s)
>         print 'discard these: ',item
>         index = list.index(item)
>         del list[index] # remove this item from the list
And then you do have a big problem : the internal pointer used by the
iterator is not in sync with the list anymore, so the next iteration
will skip one item.
As a general rule: *don't* add or remove elements to/from a sequence
while iterating over it. If you really need to modify the sequence while
iterating over it, do a reverse iteration - but there are usually better
solutions.
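For completeness, a minimal sketch of the reverse iteration mentioned above, using the list from the original question (with the trailing newline already stripped). Walking the indexes from the end means a deletion only shifts items that have already been visited. Treat it as an illustration rather than a recommendation.

```python
items = ['44', '', '', '', '', '', '0.000000000']

# Visit indexes from last to first so that deleting an item never
# changes the position of the items still to be examined.
for index in range(len(items) - 1, -1, -1):
    if not items[index]:
        del items[index]

print(items)  # ['44', '0.000000000']
```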
>     else:
>         print 'keep this: ',item
> The problem is,
Make it a plural - there's more than 1 problem here !-)
> when my list is : ['44', '', '', '', '', '',
> '0.000000000\n']
> The output is:
> len of list: 7
> keep this: 44
> discard these:
> discard these:
> discard these:
> So finally the list is: ['44', '', '', '0.000000000\n']
> The code above removes all the empty strings in the middle, all except
> two. My code seems to miss two of the empty strings.
>
> Would you know why this is occuring?
cf above... and below:
>>> alist = ['44', '', '', '', '', '', '0.000000000']
>>> for i, it in enumerate(alist):
...     print 'i : %s - it : "%s"' % (i, it)
...     if not it:
...         del alist[i]
...     print "alist is now %s" % alist
...
i : 0 - it : "44"
alist is now ['44', '', '', '', '', '', '0.000000000']
i : 1 - it : ""
alist is now ['44', '', '', '', '', '0.000000000']
i : 2 - it : ""
alist is now ['44', '', '', '', '0.000000000']
i : 3 - it : ""
alist is now ['44', '', '', '0.000000000']
>>>
Ok, now for practical answers:
1/ in the above case, use line.strip().split() and the problem goes
away !-)
2/ as a general rule, if you need to filter a sequence, don't try to do
it in place (unless it's a *very* big sequence and you run into memory
problems but then there are probably better solutions).
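As a rough sketch of point 1/, here is how the two values per line could be read. The sample data, the `io.StringIO` stand-in for the real file, and the conversion to `int` and `float` are all assumptions, since the original post does not show the file or say what the values mean. Calling split() with no argument also ignores leading and trailing whitespace, so it gives the same result as line.strip().split() here.

```python
import io

# Hypothetical sample standing in for the text file from the question.
sample = io.StringIO(u"   44      0.000000000\n  45   1.5\n\n")

rows = []
for line in sample:
    fields = line.split()      # no separator: split on runs of whitespace
    if len(fields) != 2:
        continue               # skip blank or malformed lines
    # Assumed types: an integer followed by a float.
    rows.append((int(fields[0]), float(fields[1])))

print(rows)  # [(44, 0.0), (45, 1.5)]
```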
The common idioms for filtering a sequence are:
* filter(predicate, sequence):
the 'predicate' param is a callback function which takes an item from
the sequence and returns a boolean value (True to keep the item, False
to discard it). The following example will filter out even integers:
def is_odd(n):
    return n % 2
alist = range(10)
odds = filter(is_odd, alist)
print alist
print odds
Alternatively, filter() can take None as its first param, in which case
it will filter out items that have a false value in a boolean context,
i.e.:
alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
result = filter(None, alist)
print result
* list comprehensions
Here you directly build the result list:
alist = range(10)
odds = [n for n in alist if n % 2]
alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
result = [item for item in alist if item]
print result
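Applied to the list from the original question, both idioms could look like the sketch below. One caveat: on Python 3, filter() returns a lazy iterator rather than a list, so the sketch wraps it in list(); on Python 2 the wrapper is harmless. The variable names are illustrative.

```python
fields = ['44', '', '', '', '', '', '0.000000000']

# filter(None, ...) keeps only the items that are true in a boolean
# context; list() turns the Python 3 iterator into a real list.
kept_by_filter = list(filter(None, fields))

# The equivalent list comprehension builds the new list directly.
kept_by_comprehension = [item for item in fields if item]

print(kept_by_filter)          # ['44', '0.000000000']
print(kept_by_comprehension)   # ['44', '0.000000000']
```

Both leave the original `fields` list untouched, which is the point of not filtering in place.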
HTH