[Tutor] Parsing

Deanna Wilson denawilson33 at gmail.com
Sun Nov 27 21:45:30 CET 2011


 Project 4: Parsing rhinoceros sightings

In this project, I’m  working for a wildlife conservation group that is
tracking rhinos in the African savannah. My  field workers' software
resources and GIS expertise are limited, but you have managed to obtain an
Excel spreadsheet<https://www.e-education.psu.edu/drupal6/files/geog485py/data/RhinoObservations.xlsx>showing
the positions of several rhinos over time. Each record in the
spreadsheet shows the latitude/longitude coordinate of a rhino along with
the rhino's name (these rhinos are well known to your field workers).

I want to write a script that will turn the readings in the spreadsheet
into a vector dataset that I can place on a map. This will be a polyline
dataset showing the tracks the rhinos followed over the time the data was
collected.

I will deliver:

A Python script that reads the data from the spreadsheet and creates, from
scratch, a polyline shapefile with *n* polylines, *n* being the number of
rhinos in the spreadsheet. Each polyline should represent a rhino's track
chronologically from the beginning of the spreadsheet to the end of the
spreadsheet. Each polyline should also have a text attribute containing the
rhino's name. The shapefile should use the WGS 1984 geographic coordinate
system.

*Challenges*

The data is in a format (XLSX) that you cannot easily parse. The first step
I must do is manually open the file in Excel and save it as a
comma-delimited format that I can easily read with a script. Choose the
option *CSV (comma-delimited) (*.csv)*. I did this

   - The rhinos in the spreadsheet appear in no guaranteed order, and not
   all the rhinos appear at the beginning of the spreadsheet. As I parse each
   line, I must determine which rhino the reading belongs to and update that
   rhino's polyline track accordingly. *I am not allowed to sort the Rhino
   column in Excel before I export to the CSV file. My script must be "smart"
   enough to work with an unsorted spreadsheet in the order that the records
   appear.*
   - I do not immediately know how many rhinos are in the file or even what
   their names are. Although I could visually comb the spreadsheet for this
   information and hard-code each rhino's name, your script is required to
   handle all the rhino names programmatically. The idea is that I should be
   able to run this script on a different file, possibly containing more
   rhinos, without having to make many manual adjustments.

sample of my code:

import arcpy



shapefile = "C:\\...shp"

pointFilePath = "C:\\...csv"



pointFile = open(pointFilePath, "r")

lineOfText = pointFile.readline()



dataPairList = lineOfText.split(",")



def addVertex(lat, lon, array):

    vertex = arcpy.CreateObject("Point")

    vertex.X = lon

    vertex.Y = lat

    array.add(vertex)



def addPolyline(cursor, array):

   feature = cursor.newRow()

   feature.shape = array

   cursor.insertRow(feature)

   array.removeAll()





def rhinoName(Rhino, dictionary):

    if rhinoName in rhinoDictionary:

        dictionary[rhinoName].append([latValue, lonValueIndex])



    if rhinoName not in dictionary:

        dictionary[rhinoName] = []



    else:

        dictionary[rhinoName]= ([latValue, lonValue])







latValueIndex = dataPairList.index("X")

lonValueIndex = dataPairList.index("Y")

vertexArray = arcpy.CreateObject("Array")







for line in pointFile.readlines():

    segmentedLine = line.split(",")

    latValue = segmentedLine[latValueIndex]

    lonValue = segmentedLine[lonValueIndex]

    vertex = arcpy.CreateObject("Point")

    vertex.X = lonValue

    vertex.Y = latValue

    vertexArray.add(vertex)





    polylineArray.add(currentPoint)





cursor = arcpy.InsertCursor(shapefile)

row = cursor.newRow()

row.Shape = vertexArray

cursor.insertRow(row)



del cursor
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20111127/9c55a348/attachment-0001.html>


More information about the Tutor mailing list