Thanks Martin-- this is really great.  My major question now is that I need to transition to Python for a project and I need to learn how to think in Python instead of in R.  The two strategies I have used so far are: a) going through the description and exercises in <a href="http://www.openbookproject.net/thinkcs/python/english2e/">http://www.openbookproject.net/thinkcs/python/english2e/</a> and b) trying to convert my R code into Python.<div>

<br></div><div>At a high level, do you have any other suggestions for how I could become more proficient in Python? </div><div><br></div><div>Thanks again to you and everyone else who responded.  I am really very much obliged.</div>

<div><br></div><div>Benjamin<br><br><div class="gmail_quote">On Sat, May 19, 2012 at 5:32 PM, Martin A. Brown <span dir="ltr">&lt;<a href="mailto:martin@linux-ip.net" target="_blank">martin@linux-ip.net</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

Greetings Benjamin,<br>

<br>

To begin: I do not know R.<br>

<br>

 : I&#39;m trying to improve my python by translating R code that I<br>

 : wrote into Python.<br>

 :<br>

 : *All I am trying to do is take in a specific column in<br>

<div class="im"> : &quot;uncurated&quot; and write that whole column as output to &quot;curated.&quot;<br>

 : It should be a pretty basic command, I&#39;m just not clear on how to<br>

</div> : execute it.*<br>

<br>

The hardest part about translation is learning how to think in a<br>

different language.  If you know any other human languages, you<br>

probably know that you can say things in some languages that do not<br>

translate particularly well (other than circumlocution) into another<br>

language.  Why am I starting with this?  I am starting here because<br>

you seem quite comfortable with thinking and operating in R, but you<br>

don&#39;t seem as comfortable yet with thinking and operating in Python.<br>

<br>

Naturally, that&#39;s why you are asking the Tutor list about this, so<br>

welcome to the right place!  Let&#39;s see if we can get you some help.<br>

<div class="im"><br>

 : As background, GSEXXXXX_full_pdata.csv has different patient<br>

 : information (such as unique patient ID&#39;s, whether the tissue used<br>

 : was tumor or normal, and other things. I&#39;ll just use the first<br>

 : two characteristics for now). Template.csv is a template we built<br>

 : that allows us to take different datasets and standardize them<br>

 : for meta-analysis.  So for example, &quot;curated$alt_sample_name&quot;<br>

 : refers to the unique patient ID, and &quot;curated$sample_type&quot; refers<br>

 : to the type of tissue used.<br>

<br>

</div>I have fabricated some data after your description that looks like<br>

this:<br>

<br>

  patientID,title,sample_type<br>

  V6IF0OqVu,0.5788,70<br>

  GXj51ljB2,0.3449,88<br>

<br>

You doubtless have more columns and the data here are probably<br>

nothing like yours, but consider it useful for illustrative purposes<br>

only.  (Illustrating porpoises!  How did they get here?  Next thing<br>

you know we will have illuminating egrets and animating<br>

dromedaries!)<br>

<div class="im"><br>

 : I&#39;ve been reading about the python csv module and realized it was<br>

 : best to get some expert input to clarify some confusion on my<br>

 : part.<br>

<br>

</div>The csv module is very useful and quite powerful for reading data in<br>

different ways and iterating over data sets.  Supposing you know the<br>

index of the column of interest to you...well this is quite trivial:<br>

<br>

  import csv<br>

  def main(f,field):<br>

      # print the first column next to the requested column of each row<br>

      for row in csv.reader(f):<br>

          print(row[0], row[field])<br>

<br>

  # -- lists/tuples are zero-based [0,1,2], so 2 is the third column<br>

  #<br>

  #<br>

  with open(&#39;GSEXXXXX_full_pdata.csv&#39;, newline=&#39;&#39;) as f:<br>

      main(f, 2)<br>

<br>

OK, but if your data files differ in the number or ordering of<br>

columns, then this can become a bit fragile.  So maybe you would<br>

want to learn how to use the csv.DictReader, which will give you the<br>

same thing but uses the first (header) line to name the columns, so<br>

then you could do something more like this:<br>

<br>

  import csv<br>

  def main(f,id,field):<br>

      # DictReader names each column after the header line<br>

      for row in csv.DictReader(f):<br>

          print(row[id], row[field])<br>

<br>

  with open(&#39;GSEXXXXX_full_pdata.csv&#39;, newline=&#39;&#39;) as f:<br>

      main(f, &#39;patientID&#39;, &#39;sample_type&#39;)<br>

<br>
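To actually write the column out rather than print it, a minimal<br>
sketch with csv.DictReader and csv.DictWriter might look like the<br>
following.  The function name copy_columns, its mapping argument and<br>
the io.StringIO objects standing in for real files are assumptions<br>
for illustration only; the column names come from the fabricated<br>
data above and from the template described in the question.<br>
<br>
<pre>
```python
import csv
import io

def copy_columns(infile, outfile, mapping):
    """Copy selected columns from infile to outfile, renaming them.

    mapping is a dict of {source column name: output column name}.
    """
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=list(mapping.values()))
    writer.writeheader()
    for row in reader:
        # keep only the requested columns, under their new names
        writer.writerow({new: row[old] for old, new in mapping.items()})

# stand-ins for the real files (hypothetical data, same shape as above)
uncurated = io.StringIO(
    "patientID,title,sample_type\n"
    "V6IF0OqVu,0.5788,70\n"
    "GXj51ljB2,0.3449,88\n"
)
curated = io.StringIO()
copy_columns(uncurated, curated,
             {"patientID": "alt_sample_name", "sample_type": "sample_type"})
print(curated.getvalue())
```
</pre>
With real files you would pass open(&#39;GSEXXXXX_full_pdata.csv&#39;, newline=&#39;&#39;)<br>
and an output file opened with mode &#39;w&#39; and newline=&#39;&#39; (the output<br>
file name is yours to choose) instead of the StringIO objects.<br>
<br>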

Would you like more detail on this?  Well, have a look at this nice<br>

little summary:<br>

<br>

  <a href="http://www.doughellmann.com/PyMOTW/csv/" target="_blank">http://www.doughellmann.com/PyMOTW/csv/</a><br>

<br>

Now, that really is just giving you a glimpse of the csv module.<br>

This is not really your question.  Your question was more along the<br>

lines of &#39;How do I, in Python, accomplish this task that is quite<br>

simple in R?&#39;<br>

<br>

You may find that list-comprehensions, generators and iterators are<br>

all helpful in mangling the data according to your nefarious will<br>

once you have used the csv module to load the data into a data<br>

structure.<br>

<br>
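As a small sketch of that idea, assuming the rows have been loaded<br>
into a list of dicts with csv.DictReader (the sample data and the<br>
variable names here are made up for illustration):<br>
<br>
<pre>
```python
import csv
import io

# hypothetical data standing in for the real file
data = io.StringIO(
    "patientID,title,sample_type\n"
    "V6IF0OqVu,0.5788,70\n"
    "GXj51ljB2,0.3449,88\n"
)
rows = list(csv.DictReader(data))  # load everything into a list of dicts

# list comprehension: pull one whole column out as a list
sample_types = [row["sample_type"] for row in rows]

# generator expression: lazily pair each patient ID with its sample type
pairs = ((row["patientID"], row["sample_type"]) for row in rows)

print(sample_types)
for patient, sample_type in pairs:
    print(patient, sample_type)
```
</pre>
Note that the csv module hands back every value as a string, so<br>
numeric columns need an explicit int() or float() if you want to do<br>
arithmetic on them.<br>
<br>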

In point of fact, though, Python does not have this particular<br>

feature that you are seeking, at least not in the core libraries.<br>

<br>

The lack of this capability has bothered a few people over the<br>

years, so there are a few different types of solutions.  You have<br>

already heard a reference to RPy (about which I know nothing):<br>

<br>

  <a href="http://rpy.sourceforge.net/" target="_blank">http://rpy.sourceforge.net/</a><br>

<br>

There are, however, a few other tools that you may find quite<br>

useful.  One chap wanted access to some features of R that he used<br>

all the time along with many of the other convenient features of<br>

Python, so he decided to implement dataframes (an R concept?) in<br>

Python.  This idea was present at the genesis of the pandas library.<br>

<br>

  <a href="http://pandas.pydata.org/" target="_blank">http://pandas.pydata.org/</a><br>

<br>

So, how would you do this with pandas?  Well, you could:<br>

<br>

  import pandas<br>

  def main(f,field):<br>

      uncurated = pandas.read_csv(f)<br>

      curated = uncurated[field]<br>

      print(curated)<br>

<br>

  main(open(&#39;GSEXXXXX_full_pdata.csv&#39;),&#39;sample_type&#39;)<br>

<br>

Note that pandas is geared to allow you to access your data by the<br>

&#39;handles&#39;, the unique identifier for the row and the column name.<br>

This will produce a tabular output of just the single column you<br>

want.  You may find that pandas affords you access to tools with<br>

which you are already intellectually familiar.<br>

<br>

Good luck,<br>

<br>

-Martin<br>

<br>

P.S. While I was writing this, you sent in some sample data that<br>

   looked tab-separated (well, anyway, not comma-separated).  The<br>

   csv and pandas libraries allow for delimiter=&#39;\t&#39; options to<br>

   most object constructor calls.  So, you could do:<br>

     csv.reader(f,delimiter=&#39;\t&#39;)<br>
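<br>
   A slightly fuller sketch of the tab-separated case, using made-up<br>
   data in an io.StringIO in place of the real file:<br>
<br>
<pre>
```python
import csv
import io

# hypothetical tab-separated data standing in for the real file
data = io.StringIO(
    "patientID\ttitle\tsample_type\n"
    "V6IF0OqVu\t0.5788\t70\n"
    "GXj51ljB2\t0.3449\t88\n"
)

# delimiter is passed through to the underlying csv reader
rows = list(csv.DictReader(data, delimiter="\t"))
for row in rows:
    print(row["patientID"], row["sample_type"])
```
</pre>
   The pandas equivalent would be along the lines of<br>
   pandas.read_csv(f, sep=&#39;\t&#39;).<br>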

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Martin A. Brown<br>

<a href="http://linux-ip.net/" target="_blank">http://linux-ip.net/</a><br>

</font></span></blockquote></div><br></div>