[Tutor] reading and processing xml files with python

Norman Khine norman at khine.net
Sun Jun 21 08:17:44 CEST 2009


You can use something like itools (http://docs.hforge.org/itools/xml.html)
to process the XML response, or another Python XML parser such as
BeautifulStoneSoup.

To get at the <name> tag value, you could perhaps start by collecting every text node (the name is among them):

>>> firstdata = '<Hotel> <hotelId>134388</hotelId> <name>Milford Plaza at Times Square</name> <address1>700 8th Avenue</address1> <address2/> <address3/> <city>New York</city> <stateProvince>NY</stateProvince> <country>US</country> <postalCode>10036</postalCode> </Hotel>'
>>>
>>> from itools.xml import (XMLParser, START_ELEMENT,
...     END_ELEMENT, TEXT)
>>> names = []
>>> for type, value, line in XMLParser(firstdata):
...     if type == TEXT:
...         names.append(value)
...
>>> names
[' ', '134388', ' ', 'Milford Plaza at Times Square', ' ', '700 8th Avenue', ' ', ' ', ' ', 'New York', ' ', 'NY', ' ', 'US', ' ', '10036', ' ']
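
The loop above gathers every text node, whitespace included. If only the
<name> value is wanted, here is a minimal sketch using the standard
library's xml.etree.ElementTree instead of itools (a swapped-in parser, not
the itools API). The sample string is a shortened copy of the record above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample <Hotel> record from this thread.
firstdata = (
    '<Hotel> <hotelId>134388</hotelId> '
    '<name>Milford Plaza at Times Square</name> '
    '<city>New York</city> </Hotel>'
)

root = ET.fromstring(firstdata)
# iter('name') walks the whole tree and yields every <name> element.
names = [element.text for element in root.iter('name')]
print(names)
```

Because the lookup is by tag, the whitespace between elements never ends up
in the list.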

Here is another version, using BeautifulStoneSoup:

>>> from BeautifulSoup import BeautifulStoneSoup
>>> soup = BeautifulStoneSoup(firstdata)
>>> names = soup.findAll('name')
>>> names
[<name>Milford Plaza at Times Square</name>]
>>>
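
On the question of remote versus local data: a parser generally does not
care where the bytes come from. As a rough sketch (again with the standard
library's ElementTree, not BeautifulSoup), the function below reads from any
object that has a read() method and returns the hotel names as plain
strings. The helper name hotel_names is hypothetical, the BytesIO object
only stands in for the HTTP response, and the second hotel is invented for
illustration.

```python
import io
import xml.etree.ElementTree as ET


def hotel_names(stream):
    """Return the <name> text of every <Hotel> element in an XML stream."""
    tree = ET.parse(stream)  # accepts any file-like object with read()
    root = tree.getroot()
    # findtext('name') returns the text of the first <name> child, or None.
    return [hotel.findtext('name') for hotel in root.iter('Hotel')]


# Stand-in for the remote response; the second <Hotel> is made up.
sample = io.BytesIO(
    b"<HotelAvailabilityListResults size='2'>"
    b"<Hotel><hotelId>134388</hotelId>"
    b"<name>Milford Plaza at Times Square</name></Hotel>"
    b"<Hotel><hotelId>0</hotelId><name>Example Hotel</name></Hotel>"
    b"</HotelAvailabilityListResults>"
)
print(hotel_names(sample))
```

The object returned by opener.open(request) also has a read() method, so it
should be usable in place of sample, assuming the server sends back
well-formed XML.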


On Sat, Jun 20, 2009 at 9:34 PM, <python.list at safe-mail.net> wrote:
> Hi,
> I am a total Python XML noob and wanted some clarification on using Python to read remote XML data.
>
> All the examples I have found assume the data is stored locally, or have I misunderstood this?
>
> If I browse to:
> 'user:password at domain.com/external/xmlinterface.jsp?cid=xxx&resType=hotel200631&intfc=ws&xml='
>
> This request returns a page like:
>
> <HotelAvailabilityListResults size="25">
> <Hotel>
> <hotelId>134388</hotelId>
> <name>Milford Plaza at Times Square</name>
> <address1>700 8th Avenue</address1>
> <address2/>
> <address3/>
> <city>New York</city>
> <stateProvince>NY</stateProvince>
> <country>US</country>
> <postalCode>10036</postalCode>
> <airportCode>NYC</airportCode>
> <lowRate>155.4</lowRate>
> <highRate>259.0</highRate>
> <rateCurrencyCode>USD</rateCurrencyCode>
> <latitude>40.75905</latitude>
> <longitude>-73.98844</longitude>
> <shortDescription>
> &lt;b&gt;Location.&lt;/b&gt;&lt;br&gt; &lt;UL&gt;&lt;LI&gt;The Milford Plaza is located in New York, N.Y.
> </shortDescription>
> <thumbNailUrl>/hotels/thumbs/NYC_MILF-exter-1-thumb.jpg</thumbNailUrl>
> <supplierType>H</supplierType>
> <location>TIMES SQUARE/THEATER DISTRICT</location>
> <propertyRating>2.5</propertyRating>
> <propertyType>1</propertyType>
> <marketingLevel>1</marketingLevel>
> <hasMap>true</hasMap>
> <hotelInDestination>true</hotelInDestination>
> <referenceProximityDistance>3.9202964</referenceProximityDistance>
> <referenceProximityUnit>MI</referenceProximityUnit>
> <HotelProperty>
> <specialRate>N</specialRate>
> <promoDescription>72 Hour Sale - Don&apos;t miss this great deal!</promoDescription>
> <promoType/>
> <promoDetailText/>
> <hrnQuoteKey>17A828141014136319</hrnQuoteKey>
> <currentAllotment>-1</currentAllotment>
> <propertyId>25033</propertyId>
> <propertyAvailable>true</propertyAvailable>
> <propertyRestricted>false</propertyRestricted>
> <roomDescription>Standard room</roomDescription>
> <roomTypeCode>108606</roomTypeCode>
> <rateCode>252427</rateCode>
> <RateInfo>
> <displayCurrencyCode>USD</displayCurrencyCode>
> <DisplayNightlyRates size="2">
> <displayNightlyRate>259.0</displayNightlyRate>
> <displayNightlyRate>259.0</displayNightlyRate>
> </DisplayNightlyRates>
> <displayRoomRate>575.76</displayRoomRate>
> <chargeableRoomRateTotal>575.76</chargeableRoomRateTotal>
> <chargeableRoomRateTaxesAndFees>57.76</chargeableRoomRateTaxesAndFees>
> <nativeCurrencyCode>USD</nativeCurrencyCode>
> <NativeNightlyRates size="2">
> <nativeNightlyRate>259.0</nativeNightlyRate>
> <nativeNightlyRate>259.0</nativeNightlyRate>
> </NativeNightlyRates>
> <nativeRoomRate>575.76</nativeRoomRate>
> <rateFrequency>B</rateFrequency>
> </RateInfo>
> <PromoRateInfo>
> <displayCurrencyCode>USD</displayCurrencyCode>
> <DisplayNightlyRates size="2">
> <displayNightlyRate>155.4</displayNightlyRate>
> <displayNightlyRate>155.4</displayNightlyRate>
> </DisplayNightlyRates>
> <displayRoomRate>368.56</displayRoomRate>
> <chargeableRoomRateTotal>368.56</chargeableRoomRateTotal>
> <chargeableRoomRateTaxesAndFees>57.76</chargeableRoomRateTaxesAndFees>
> <nativeCurrencyCode>USD</nativeCurrencyCode>
> <NativeNightlyRates size="2">
> <nativeNightlyRate>155.4</nativeNightlyRate>
> <nativeNightlyRate>155.4</nativeNightlyRate>
> </NativeNightlyRates>
> <nativeRoomRate>368.56</nativeRoomRate>
> <rateFrequency>B</rateFrequency>
> </PromoRateInfo>
> </HotelProperty>
> </Hotel>
>
>
> I got this so far:
>
> >>> import urllib2
> >>> request = urllib2.Request('user:password at domain.com/external/xmlinterface.jsp?cid=xxx&resType=hotel200631&intfc=ws&xml=')
> >>> opener = urllib2.build_opener()
> >>> firstdatastream = opener.open(request)
> >>> firstdata = firstdatastream.read()
> >>> print firstdata
>
>
> <HotelAvailabilityListResults size='25'>
>  <Hotel>
>    <hotelId>134388</hotelId>
>    <name>Milford Plaza at Times Square</name>
>    <address1>700 8th Avenue</address1>
>    <address2/>
>    <address3/>
>    <city>New York</city>
>    <stateProvince>NY</stateProvince>
>    <country>US</country>
>    <postalCode>10036</postalCode>
>
> ...
>
> >>>
>
> I would like to understand how to manipulate the data further and extract, for example, all the hotel names into a list.
>
> Thank you
> Marti
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>

