[Tutor] Parsing
Deanna Wilson
denawilson33 at gmail.com
Sun Nov 27 21:45:30 CET 2011
Project 4: Parsing rhinoceros sightings
In this project, I’m working for a wildlife conservation group that is
tracking rhinos in the African savannah. My field workers' software
resources and GIS expertise are limited, but you have managed to obtain an
Excel spreadsheet<https://www.e-education.psu.edu/drupal6/files/geog485py/data/RhinoObservations.xlsx>showing
the positions of several rhinos over time. Each record in the
spreadsheet shows the latitude/longitude coordinate of a rhino along with
the rhino's name (these rhinos are well known to your field workers).
I want to write a script that will turn the readings in the spreadsheet
into a vector dataset that I can place on a map. This will be a polyline
dataset showing the tracks the rhinos followed over the time the data was
collected.
I will deliver:
A Python script that reads the data from the spreadsheet and creates, from
scratch, a polyline shapefile with *n* polylines, *n* being the number of
rhinos in the spreadsheet. Each polyline should represent a rhino's track
chronologically from the beginning of the spreadsheet to the end of the
spreadsheet. Each polyline should also have a text attribute containing the
rhino's name. The shapefile should use the WGS 1984 geographic coordinate
system.
*Challenges*
The data is in a format (XLSX) that you cannot easily parse. The first step
I must do is manually open the file in Excel and save it as a
comma-delimited format that I can easily read with a script. Choose the
option *CSV (comma-delimited) (*.csv)*. I did this
- The rhinos in the spreadsheet appear in no guaranteed order, and not
all the rhinos appear at the beginning of the spreadsheet. As I parse each
line, I must determine which rhino the reading belongs to and update that
rhino's polyline track accordingly. *I am not allowed to sort the Rhino
column in Excel before I export to the CSV file. My script must be "smart"
enough to work with an unsorted spreadsheet in the order that the records
appear.*
- I do not immediately know how many rhinos are in the file or even what
their names are. Although I could visually comb the spreadsheet for this
information and hard-code each rhino's name, your script is required to
handle all the rhino names programmatically. The idea is that I should be
able to run this script on a different file, possibly containing more
rhinos, without having to make many manual adjustments.
sample of my code:
import arcpy
shapefile = "C:\\...shp"
pointFilePath = "C:\\...csv"
pointFile = open(pointFilePath, "r")
lineOfText = pointFile.readline()
dataPairList = lineOfText.split(",")
def addVertex(lat, lon, array):
vertex = arcpy.CreateObject("Point")
vertex.X = lon
vertex.Y = lat
array.add(vertex)
def addPolyline(cursor, array):
feature = cursor.newRow()
feature.shape = array
cursor.insertRow(feature)
array.removeAll()
def rhinoName(Rhino, dictionary):
if rhinoName in rhinoDictionary:
dictionary[rhinoName].append([latValue, lonValueIndex])
if rhinoName not in dictionary:
dictionary[rhinoName] = []
else:
dictionary[rhinoName]= ([latValue, lonValue])
latValueIndex = dataPairList.index("X")
lonValueIndex = dataPairList.index("Y")
vertexArray = arcpy.CreateObject("Array")
for line in pointFile.readlines():
segmentedLine = line.split(",")
latValue = segmentedLine[latValueIndex]
lonValue = segmentedLine[lonValueIndex]
vertex = arcpy.CreateObject("Point")
vertex.X = lonValue
vertex.Y = latValue
vertexArray.add(vertex)
polylineArray.add(currentPoint)
cursor = arcpy.InsertCursor(shapefile)
row = cursor.newRow()
row.Shape = vertexArray
cursor.insertRow(row)
del cursor
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20111127/9c55a348/attachment-0001.html>
More information about the Tutor
mailing list