[Tutor] what am I not understanding?

Mon Oct 20 03:16:23 CEST 2014

On Sun, Oct 19, 2014 at 03:26:44PM -0700, Clayton Kirkwood wrote:
> raw_table = ('''
> a: Ask    y: Dividend Yield
[...]
> o: Open''')
> key_name = raw_table.rstrip('\t')
> print(key_name)
[...]
> #why is the tab not being removed?

How do you know that the raw_table contains tabs rather than spaces? As 
far as I can see, it contains spaces, although that might just be an 
artifact of copying and pasting text into your email.

You think you have a problem with tabs not being removed, but it seems 
to me that your *actual* problem is that you're not really sure whether 
there are tabs in the string in the first place! Just because you press 
the TAB key on your keyboard doesn't mean that a tab is inserted. Your 
system might be configured to insert a number of spaces instead. Your 
editor might automatically convert the tab into spaces. Who knows?

Instead of typing TAB, the best way to get tabs into a Python string is 
to use the \t (backslash-t) escape code. And the best way to see that 
they are inside a string is to print the repr() of the string:

py> raw = "        some text      "
py> print raw  # Looks like a leading tab, trailing tab is invisible.
        some text
py> print repr(raw)  # But repr() reveals the truth, no tabs here!
'        some text      '
py> raw = "\tsome text\t"  # These actually are tabs.
py> print raw
        some text
py> print repr(raw)
'\tsome text\t'

So the first thing for you to do is to satisfy yourself that what you 
think are tabs actually are tabs.
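
One way to run that check is sketched below. It is written with print() as a function, so it assumes Python 3. The sample strings and the count_tabs helper are made up for illustration and are not part of the original code.

```python
# Hypothetical sample strings: one padded with spaces, one with real tabs.
with_spaces = "        some text      "
with_tabs = "\tsome text\t"

def count_tabs(s):
    """Return how many tab characters the string actually contains."""
    return s.count("\t")

for label, s in [("spaces", with_spaces), ("tabs", with_tabs)]:
    # repr() shows a tab as \t, so it cannot be mistaken for spaces.
    print(label, repr(s), "tab count:", count_tabs(s))
```

A tab count of zero means there is nothing for rstrip('\t') to remove, whatever the string looks like on screen.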

Then, the second thing is to understand what the rstrip() method 
actually does. It only removes characters from the right hand end of the 
string, not everywhere in the string:

py> raw = "--some-text----"
py> print raw.rstrip("-")
--some-text

(There is also a lstrip() to do only the left hand end of the string, 
and a strip() method to do both.)
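
The three methods side by side, as a small sketch in Python 3 syntax. The "padded" string is an invented example. Note that the argument is treated as a set of characters to remove, not as a substring, so any mix of them is stripped from the end.

```python
raw = "--some-text----"

print(raw.rstrip("-"))   # right-hand end only: --some-text
print(raw.lstrip("-"))   # left-hand end only:  some-text----
print(raw.strip("-"))    # both ends:           some-text

# The argument is a set of characters: tabs, newlines and spaces are
# all removed from the end here, in whatever order they appear.
padded = "some text\t \n\t"
print(repr(padded.rstrip("\t\n ")))   # 'some text'
```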

"Right hand end" doesn't mean the right hand end of *each line*, only of 
the entire string:

py> raw = """---some-text----
... --more--text---
... -and-even-more--text-----"""
py> print raw.rstrip("-")
---some-text----
--more--text---
-and-even-more--text

If you want to strip something from the end of each line, first you have 
to break the string up into individual lines, strip each one, then put 
them back together again:

py> lines = raw.split('\n')
py> lines = [line.rstrip('-') for line in lines]
py> print '\n'.join(lines)
---some-text
--more--text
-and-even-more--text
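
The same split, strip, join recipe can be wrapped in a small function. This is only a sketch in Python 3 syntax, and the name rstrip_each_line is hypothetical:

```python
def rstrip_each_line(text, chars=None):
    """Strip chars (whitespace if None) from the right of every line."""
    return "\n".join(line.rstrip(chars) for line in text.split("\n"))

raw = "---some-text----\n--more--text---\n-and-even-more--text-----"
print(rstrip_each_line(raw, "-"))
```

Called with no second argument, it strips trailing whitespace from each line, tabs included.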

To remove from the middle of the string, use the replace() method:

py> print raw.replace("-", "")
sometext
moretext
andevenmoretext

> key_name = raw_table.split('\t')
> print(key_name) 
[...]
>                                 #great the tab is being removed but now I
> have to remove the \n but it is no longer a string

Right. The split() method isn't a "remove" method, that's why it's 
called "split" and not "remove". It splits the string into pieces, and 
returns a list of substrings.

You can always join the pieces back together, if you want, or find a 
better way to remove things.
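
A minimal sketch of joining the pieces back together, in Python 3 syntax with an invented tab-separated string:

```python
raw = "one\ttwo\tthree"   # hypothetical tab-separated string

pieces = raw.split("\t")      # a list of substrings, tabs gone
print(pieces)                 # ['one', 'two', 'three']

rejoined = " ".join(pieces)   # back to a single string
print(repr(rejoined))         # 'one two three'
```

The string you call join() on is the glue placed between the pieces. Use "" to join them with nothing in between.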

> key_name = raw_table.split('\t\n')
> print(key_name)
[...] 
>                                 #why isn't the \t and \n not both "removed" 

Because you didn't tell Python to remove \t and \n separately. You told 
it to split the string everywhere it sees the pair of characters \t\n. 
The argument to split is not a list of individual characters to split 
on, but an exact substring to split on. If it doesn't match that 
substring exactly, it won't split and you'll only get one piece:

py> print raw
---some-text----
--more--text---
-and-even-more--text-----
py> raw.split('ex')
['---some-t', 't----\n--more--t', 't---\n-and-even-more--t', 't-----']
py> raw.split('xe')
['---some-text----\n--more--text---\n-and-even-more--text-----']
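
If the goal is to split on tabs and newlines separately, one option is the standard library's re.split() with a character class. The sketch below is in Python 3 syntax. It uses a made-up string shaped like the table and assumes the separators really are tabs.

```python
import re

# Hypothetical data: entries separated by a tab or a newline.
raw = "a: Ask\ty: Dividend Yield\no: Open"

# Exact-substring split: the pair "\t\n" never occurs, so one piece.
whole = raw.split("\t\n")
print(whole)

# Character-class split: break on any run of tabs or newlines.
parts = re.split(r"[\t\n]+", raw)
print(parts)   # ['a: Ask', 'y: Dividend Yield', 'o: Open']
```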

> I am trying to get to where I can create a dict using the ':' separator

It takes about three steps (excluding print statements, which I have 
only included so you can see the individual steps in action):

# Step 1: split the string into "key:value" substrings.
py> raw = "a : something, b: another thing, c:yet a third thing"
py> items = [s.strip() for s in raw.split(',')]
py> print items
['a : something', 'b: another thing', 'c:yet a third thing']

# Step 2: split each substring into a separate (key, value) tuple,
# cleaning up any whitespace around them.
py> items = [s.split(':') for s in items]
py> print items
[['a ', ' something'], ['b', ' another thing'], ['c', 'yet a third thing']]
py> items = [(key.strip(), value.strip()) for key,value in items]
py> print items
[('a', 'something'), ('b', 'another thing'), ('c', 'yet a third thing')]

# Step 3: convert into a dict.
py> d = dict(items)
py> print d
{'a': 'something', 'c': 'yet a third thing', 'b': 'another thing'}
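
The same steps applied to a table shaped like yours might look like the sketch below, in Python 3 syntax. The sample data and the name table_to_dict are hypothetical, and it assumes entries are separated by tabs or newlines. It uses partition(':') rather than split(':') so that only the first colon in an entry counts as the separator.

```python
import re

# Hypothetical stand-in for raw_table: entries separated by tabs or newlines.
raw_table = "\na: Ask\ty: Dividend Yield\no: Open"

def table_to_dict(text):
    """Split on tabs/newlines, then split each entry on its first colon."""
    result = {}
    for entry in re.split(r"[\t\n]+", text.strip()):
        if not entry:
            continue  # skip empty pieces, e.g. from an empty input
        key, _, value = entry.partition(":")
        result[key.strip()] = value.strip()
    return result

print(table_to_dict(raw_table))
```

If the table turns out to use runs of spaces rather than tabs, the pattern passed to re.split() would have to change to match.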

-- 
Steven