Iterate over text file, discarding some lines via context manager
wxjmfauth at gmail.com
Sat Nov 29 03:00:33 EST 2014
>>> with open('UnicodeData.txt', 'rb') as f:
...     t = f.read()
...
>>> t = t.decode('ascii')
>>> z = t.splitlines()
>>> # process
>>> zz = [e.split(';') for e in z]
>>> for e in zz[:3]:
...     print(e)
...
['0000', '<control>', 'Cc', '0', 'BN', '', '', '', '', 'N', 'NULL', '', '', '', '']
['0001', '<control>', 'Cc', '0', 'BN', '', '', '', '', 'N', 'START OF HEADING', '', '', '', '']
['0002', '<control>', 'Cc', '0', 'BN', '', '', '', '', 'N', 'START OF TEXT', '', '', '', '']
>>> (len(t), len(z), len(zz))
(1509570, 27268, 27268)
>>>
Fast, simple, unbeatable (without aspirin).
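
For reuse, the same read/decode/split steps could be wrapped in a small function. This is only a minimal sketch: the name `parse_records`, its `keep` parameter (a predicate for discarding unwanted records), and the inline sample bytes are assumptions for illustration, not part of the session above. The sample reuses the three records printed earlier, so it runs without UnicodeData.txt.

```python
def parse_records(data, encoding='ascii', sep=';', keep=None):
    """Decode raw bytes, split into lines, split each line into fields.

    `keep` is a hypothetical optional predicate; records for which it
    returns False are discarded.
    """
    text = data.decode(encoding)
    records = [line.split(sep) for line in text.splitlines()]
    if keep is not None:
        records = [r for r in records if keep(r)]
    return records


# Illustrative sample: the three records shown in the session above.
sample = (
    b"0000;<control>;Cc;0;BN;;;;;N;NULL;;;;\n"
    b"0001;<control>;Cc;0;BN;;;;;N;START OF HEADING;;;;\n"
    b"0002;<control>;Cc;0;BN;;;;;N;START OF TEXT;;;;\n"
)

records = parse_records(sample)

# Discard the record whose field at index 10 is 'NULL'.
kept = parse_records(sample, keep=lambda r: r[10] != 'NULL')
```

With the real file, the bytes would come from `open('UnicodeData.txt', 'rb')` exactly as in the session, and the result of `f.read()` would be passed as `data`.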
jmf