[Tutor] Parsing

bob gailer bgailer at gmail.com
Sun Nov 27 22:46:22 CET 2011


Welcome to the Tutor List.

We are a few volunteers who enjoy tutoring folk on specific Python 
learning issues.

We like posts that are in plain text rather than HTML. Please post plain 
text in future. Also, your code has a blank line between every line of 
code. Please remove these in future posts.

I failed to see any specific request in your post, so all I can say is 
welcome, and how can we help you? Did you run the code? Did you get 
errors or unexpected results? Please report these.

Errors usually appear as a traceback. Include the entire traceback 
(after putting in some effort to figure out the error on your own).

Unexpected results? Tell us what you expected and how the results differ.

Good luck!

On 11/27/2011 3:45 PM, Deanna Wilson wrote:
> Project 4: Parsing rhinoceros sightings
>
> In this project, I'm working for a wildlife conservation group that 
> is tracking rhinos in the African savannah. My field workers' 
> software resources and GIS expertise are limited, but you have managed 
> to obtain an Excel spreadsheet 
> <https://www.e-education.psu.edu/drupal6/files/geog485py/data/RhinoObservations.xlsx> 
> showing the positions of several rhinos over time. Each record in the 
> spreadsheet shows the latitude/longitude coordinate of a rhino along 
> with the rhino's name (these rhinos are well known to your field workers).
>
> I want to write a script that will turn the readings in the 
> spreadsheet into a vector dataset that I can place on a map. This will 
> be a polyline dataset showing the tracks the rhinos followed over the 
> time the data was collected.
>
> I will deliver:
>
> A Python script that reads the data from the spreadsheet and creates, 
> from scratch, a polyline shapefile with n polylines, n being the 
> number of rhinos in the spreadsheet. Each polyline should represent a 
> rhino's track chronologically from the beginning of the spreadsheet to 
> the end of the spreadsheet. Each polyline should also have a text 
> attribute containing the rhino's name. The shapefile should use the 
> WGS 1984 geographic coordinate system.
>
> Challenges
>
> The data is in a format (XLSX) that you cannot easily parse. The first 
> step I must do is manually open the file in Excel and save it as a 
> comma-delimited format that I can easily read with a script. Choose 
> the option CSV (comma-delimited) (*.csv). I did this
>
>   * The rhinos in the spreadsheet appear in no guaranteed order, and
>     not all the rhinos appear at the beginning of the spreadsheet. As
>     I parse each line, I must determine which rhino the reading
>     belongs to and update that rhino's polyline track accordingly. I
>     am not allowed to sort the Rhino column in Excel before I export
>     to the CSV file. My script must be "smart" enough to work with an
>     unsorted spreadsheet in the order that the records appear.
>   * I do not immediately know how many rhinos are in the file or even
>     what their names are. Although I could visually comb the
>     spreadsheet for this information and hard-code each rhino's name,
>     your script is required to handle all the rhino names
>     programmatically. The idea is that I should be able to run this
>     script on a different file, possibly containing more rhinos,
>     without having to make many manual adjustments.
>
> sample of my code:
>
> import arcpy
>
> shapefile = "C:\\...shp"
>
> pointFilePath = "C:\\...csv"
>
> pointFile = open(pointFilePath, "r")
>
> lineOfText = pointFile.readline()
>
> dataPairList = lineOfText.split(",")
>
> def addVertex(lat, lon, array):
>
>     vertex = arcpy.CreateObject("Point")
>
>     vertex.X = lon
>
>     vertex.Y = lat
>
>     array.add(vertex)
>
> def addPolyline(cursor, array):
>
>    feature = cursor.newRow()
>
>    feature.shape = array
>
>    cursor.insertRow(feature)
>
>    array.removeAll()
>
> def rhinoName(Rhino, dictionary):
>
>     if rhinoName in rhinoDictionary:
>
>         dictionary[rhinoName].append([latValue, lonValueIndex])
>
>     if rhinoName not in dictionary:
>
>         dictionary[rhinoName] = []
>
>     else:
>
>         dictionary[rhinoName]= ([latValue, lonValue])
>
> latValueIndex = dataPairList.index("X")
>
> lonValueIndex = dataPairList.index("Y")
>
> vertexArray = arcpy.CreateObject("Array")
>
> for line in pointFile.readlines():
>
>     segmentedLine = line.split(",")
>
>     latValue = segmentedLine[latValueIndex]
>
>     lonValue = segmentedLine[lonValueIndex]
>
>     vertex = arcpy.CreateObject("Point")
>
>     vertex.X = lonValue
>
>     vertex.Y = latValue
>
>     vertexArray.add(vertex)
>
>     polylineArray.add(currentPoint)
>
> cursor = arcpy.InsertCursor(shapefile)
>
> row = cursor.newRow()
>
> row.Shape = vertexArray
>
> cursor.insertRow(row)
>
> del cursor
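
For the "which rhino does this reading belong to" step described in the 
challenges above, one possible approach is a dictionary keyed by rhino 
name, with a list of points per name. The following is only a minimal 
sketch, not a fix of the posted script: it leaves arcpy out entirely, and 
it assumes the CSV header has columns named "X" (longitude), "Y" 
(latitude) and "Rhino". The function name group_tracks, the sample rows, 
and the names in them are made up for illustration.

```python
import csv
import io


def group_tracks(csv_file, name_field="Rhino", lon_field="X", lat_field="Y"):
    """Return {rhino name: [(lon, lat), ...]}, points in file order.

    The column names are assumptions; adjust them to the real header.
    """
    tracks = {}
    for row in csv.DictReader(csv_file):
        point = (float(row[lon_field]), float(row[lat_field]))
        # setdefault creates the empty list the first time a name is seen,
        # so no rhino names need to be hard-coded or known in advance.
        tracks.setdefault(row[name_field], []).append(point)
    return tracks


# Made-up rows standing in for the exported CSV file.
sample = io.StringIO(
    "Observer,X,Y,Rhino\n"
    "Ben,26.96,-19.10,Bo\n"
    "Ben,27.01,-19.12,Tulip\n"
    "Ann,26.97,-19.11,Bo\n"
)
tracks = group_tracks(sample)
for name, points in tracks.items():
    print(name, points)
```

Because rows are appended as they are read, each list keeps the 
chronological order of the spreadsheet even when the rhinos are 
interleaved. Each list could then be turned into one polyline, for 
example with helpers along the lines of addVertex and addPolyline in the 
posted code.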


-- 
Bob Gailer
919-636-4239
Chapel Hill NC
