[Tutor] what am I not understanding?
Steven D'Aprano
steve at pearwood.info
Mon Oct 20 03:16:23 CEST 2014
On Sun, Oct 19, 2014 at 03:26:44PM -0700, Clayton Kirkwood wrote:
> raw_table = ('''
> a: Ask y: Dividend Yield
[...]
> o: Open''')
> key_name = raw_table.rstrip('\t')
> print(key_name)
[...]
> #why is the tab not being removed?
How do you know that the raw_table contains tabs rather than spaces? As
far as I can see, it contains spaces, although that might just be an
artifact of copying and pasting text into your email.
You think you have a problem with tabs not being removed, but it seems
to me that your *actual* problem is that you're not really sure whether
there are tabs in the string in the first place! Just because you press
the TAB key on your keyboard doesn't mean that a tab is inserted. Your
system might be configured to insert a number of spaces instead. Your
editor might automatically convert the tab into spaces. Who knows?
Instead of typing TAB, the best way to get tabs into a Python string is
to use the \t (backslash-t) escape code. And the best way to see that
they are inside a string is to print the repr() of the string:
py> raw = "        some text        "
py> print raw # Looks like a leading tab, trailing tab is invisible.
        some text
py> print repr(raw) # But repr() reveals the truth, no tabs here!
'        some text        '
py> raw = "\tsome text\t" # Actually are tabs.
py> print raw
        some text
py> print repr(raw)
'\tsome text\t'
So the first thing for you to do is to satisfy yourself that what you
think are tabs actually are tabs.
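If you would rather have Python answer the question than eyeball the repr(), something along these lines should do it. This is only a sketch: the sample string is made up for illustration (one real tab, then a run of spaces that merely looks like one), and it is written with print() as a function so that it runs on Python 3.

```python
# Hypothetical sample: one real tab, then four spaces that only look like a tab.
sample = "a: Ask\ty: Dividend Yield    o: Open"

has_tab = "\t" in sample          # True only if a real tab character is present
tab_count = sample.count("\t")    # number of tab characters in the string

print(repr(sample))               # repr() shows each tab as \t, spaces stay spaces
print(has_tab, tab_count)
```

If has_tab comes back False for your real table, then rstrip('\t') and split('\t') have nothing to work on, whatever the text looks like on screen.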
Then, the second thing is to understand what the rstrip() method
actually does. It only removes characters from the right hand end of the
string, not everywhere in the string:
py> raw = "--some-text----"
py> print raw.rstrip("-")
--some-text
(There is also an lstrip() to do only the left hand end of the string,
and a strip() method to do both.)
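Here are the three methods side by side, as a small sketch using the same sample string. One detail worth knowing: the argument is treated as a set of characters to strip, not as a substring, which the last line tries to show (that whitespace sample is my own, made up for the purpose).

```python
raw = "--some-text----"

left_only = raw.lstrip("-")    # strips the left hand end only
right_only = raw.rstrip("-")   # strips the right hand end only
both_ends = raw.strip("-")     # strips both ends

# The argument is a set of characters, not a substring: any mix of
# them is removed from the end until some other character is reached.
mixed = "some text \t\n \t".rstrip(" \t\n")

print(repr(left_only))
print(repr(right_only))
print(repr(both_ends))
print(repr(mixed))
```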
"Right hand end" doesn't mean the right hand end of *each line*, only of
the entire string:
py> raw = """---some-text----
... --more--text---
... -and-even-more--text-----"""
py> print raw.rstrip("-")
---some-text----
--more--text---
-and-even-more--text
If you want to strip something from the end of each line, first you have
to break the string up into individual lines, strip each one, then put
them back together again:
py> lines = raw.split('\n')
py> lines = [line.rstrip('-') for line in lines]
py> print '\n'.join(lines)
---some-text
--more--text
-and-even-more--text
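The same split, strip and join steps can be wrapped up in a small helper. This is just a sketch: the name rstrip_each_line is hypothetical, and I have used splitlines() instead of split('\n'), which means a newline at the very end of the text is dropped rather than kept.

```python
def rstrip_each_line(text, chars):
    """Strip the given characters from the right hand end of every line."""
    # Hypothetical helper, not a built-in string method.
    return "\n".join(line.rstrip(chars) for line in text.splitlines())

raw = "---some-text----\n--more--text---\n-and-even-more--text-----"
cleaned = rstrip_each_line(raw, "-")
print(cleaned)
```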
To remove from the middle of the string, use the replace() method:
py> print raw.replace("-", "")
sometext
moretext
andevenmoretext
> key_name = raw_table.split('\t')
> print(key_name)
[...]
> #great the tab is being removed but now I
> have to remove the \n but it is no longer a string
Right. The split() method isn't a "remove" method, that's why it's
called "split" and not "remove". It splits the string into pieces, and
returns a list of substrings.
You can always join the pieces back together, if you want, or find a
better way to remove things.
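For instance, joining the pieces with a single space gives you a string again, with each tab turned into a space. A rough sketch, with a made-up sample string; replace() gets the same result in one step.

```python
# Made-up sample containing two tabs and one newline.
table = "one\ttwo\nthree\tfour"

pieces = table.split("\t")        # a list of substrings, the tabs are gone
rejoined = " ".join(pieces)       # back to a single string
direct = table.replace("\t", " ") # same result without the detour through a list

print(pieces)
print(repr(rejoined))
```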
> key_name = raw_table.split('\t\n')
> print(key_name)
[...]
> #why isn't the \t and \n not both "removed"
Because you didn't tell Python to remove \t and \n separately. You told
it to split the string everywhere it sees the pair of characters \t\n.
The argument to split is not a list of individual characters to split
on, but an exact substring to split on. If it doesn't match that
substring exactly, it won't split and you'll only get one piece:
py> print raw
---some-text----
--more--text---
-and-even-more--text-----
py> raw.split('ex')
['---some-t', 't----\n--more--t', 't---\n-and-even-more--t', 't-----']
py> raw.split('xe')
['---some-text----\n--more--text---\n-and-even-more--text-----']
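If what you actually want is to split wherever there is a tab or a newline, here are two possible approaches, as a sketch only. The sample string is my guess at the kind of layout you have, not your real data, and re.split with a character class is one option among several.

```python
import re

# Hypothetical layout: entries separated by a tab or by a newline.
text = "a: Ask\ty: Dividend Yield\no: Open"

# split('\t\n') looks for a tab immediately followed by a newline.
# That pair never occurs here, so you get the whole string back in one piece.
as_pair = text.split("\t\n")

# Option 1: a regular expression that matches a run of tabs and/or newlines.
by_regex = re.split(r"[\t\n]+", text)

# Option 2: turn every tab into a newline first, then split on newlines.
by_replace = text.replace("\t", "\n").split("\n")

print(as_pair)
print(by_regex)
print(by_replace)
```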
> I am trying to get to where I can create a dict using the ':' separator
It takes about three steps (excluding print statements, which I have
only included so you can see the individual steps in action):
# Step 1: split the string into "key:value" substrings.
py> raw = "a : something, b: another thing, c:yet a third thing"
py> items = [s.strip() for s in raw.split(',')]
py> print items
['a : something', 'b: another thing', 'c:yet a third thing']
# Step 2: split each substring into a separate (key, value) tuple,
# cleaning up any whitespace around them.
py> items = [s.split(':') for s in items]
py> print items
[['a ', ' something'], ['b', ' another thing'], ['c', 'yet a third thing']]
py> items = [(key.strip(), value.strip()) for key,value in items]
py> print items
[('a', 'something'), ('b', 'another thing'), ('c', 'yet a third thing')]
# Step 3: convert into a dict.
py> d = dict(items)
py> print d
{'a': 'something', 'c': 'yet a third thing', 'b': 'another thing'}
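Once the individual steps make sense, they can be folded into one function. This is a sketch rather than the one true way to do it: the name parse_pairs and its parameters are made up, and it uses split(key_sep, 1) so that a value which itself contains a ':' stays in one piece.

```python
def parse_pairs(text, item_sep=",", key_sep=":"):
    """Turn 'key: value' items separated by item_sep into a dict."""
    # Hypothetical helper; the separators are assumptions you can change.
    result = {}
    for item in text.split(item_sep):
        if key_sep not in item:
            continue  # skip blank or malformed pieces
        # Split on the first separator only, so 'time: 12:30' keeps its value whole.
        key, value = item.split(key_sep, 1)
        result[key.strip()] = value.strip()
    return result

d = parse_pairs("a : something, b: another thing, c:yet a third thing")
print(d)
```

For a table with one entry per tab or newline, you would first bring it down to a single separator (for example by replacing each tab with a newline) and then pass item_sep="\n".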
--
Steven