identifying and parsing string in text file
Bernard
bernard.chhun at gmail.com
Sat Mar 8 15:01:37 EST 2008
Hey Bryan,
It seems the text you are trying to parse is XML/HTML-like markup,
so I'd use BeautifulSoup[1] if I were you :)
Here's some sample code for your scraping case:
<python>
from BeautifulSoup import BeautifulSoup

# assume the s variable holds the contents of your file
s = "whatever xml or html here"
# turn it into a tasty & parsable soup :)
soup = BeautifulSoup(s)
# for every <element> tag in the soup
for el in soup.findAll("element"):
    # print out its tag & name attributes plus its inner value!
    print el["tag"], el["name"], el.string
</python>
That's it!
[1] http://www.crummy.com/software/BeautifulSoup/
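If you'd rather not install anything, the standard library's `xml.etree.ElementTree` can probably do the same job. This is a minimal sketch, written for Python 3, assuming every `<element ...>...</element>` fits on one physical line and is well-formed XML by itself, as in your sample. The function `find_by_tag` and the second sample line are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_by_tag(lines, wanted_tag):
    """Yield (name, value) for every <element> line whose tag attribute matches."""
    for line in lines:
        line = line.strip()
        # skip anything that isn't an <element ...> line
        if not line.startswith("<element"):
            continue
        # assumption: each such line is a complete, well-formed XML element
        el = ET.fromstring(line)
        if el.get("tag") == wanted_tag:
            # el.text is None for an empty element, so fall back to ""
            yield el.get("name"), el.text or ""


# The first line comes from the question; the second is hypothetical.
sample = [
    '<element tag="300a,0014" vr="CS" vm="1" len="4" '
    'name="DoseReferenceStructureType">SITE</element>\n',
    '<element tag="300a,0016" vr="LO" vm="1" len="4" '
    'name="ExampleName">TEST</element>\n',
]

for name, value in find_by_tag(sample, "300a,0014"):
    print("%s = %s" % (name, value))
```

If the whole file happens to be one well-formed XML document, `ET.parse(path).iter("element")` would avoid the one-element-per-line assumption.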
On 8 mar, 14:49, "Bryan.Fodn... at gmail.com" <Bryan.Fodn... at gmail.com> wrote:
> I have a large file that has many lines like this,
>
> <element tag="300a,0014" vr="CS" vm="1" len="4" name="DoseReferenceStructureType">SITE</element>
>
> I would like to identify the line by the tag (300a,0014) and then grab
> the name (DoseReferenceStructureType) and value (SITE).
>
> I would like to create a file with the following structure:
>
> DoseReferenceStructureType = SITE
> ...
> ...
>
> Also, there may be multiple lines with the same tag but different
> values. These all need to be recorded.
>
> So far, I have a little bit of code that prints everything that is
> available:
>
> import sys
>
> for line in open(sys.argv[1]):
>     i_line = line.split()
>     if i_line:
>         if i_line[0] == "<element":
>             a = i_line[1]
>             b = i_line[5]
>             print "%s | %s" % (a, b)
>
> but I do not see a clever way of doing what I would like.
>
> Any help or guidance would be appreciated.
>
> Bryan
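Your two extra requirements (picking lines by tag, and keeping every value when a tag repeats) can also be handled without splitting on whitespace, which depends on every line having the same attributes in the same order. Here's a minimal sketch for Python 3 using the standard `re` module plus `collections.defaultdict`; note it's a swapped-in regex approach rather than BeautifulSoup, and it assumes double-quoted attributes and values containing no `<`. The names `collect` and `format_report`, and the second (VOLUME) sample entry, are hypothetical.

```python
import re
from collections import defaultdict

# <element ...attributes...>VALUE</element>; \s+ and [^>]* also match newlines,
# so an element wrapped across two physical lines is still found.
# Assumes double-quoted attributes and a VALUE containing no '<'.
ELEMENT_RE = re.compile(r'<element\s+([^>]*)>([^<]*)</element>')
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def collect(text, wanted_tags):
    """Return {tag: [(name, value), ...]}, keeping every occurrence of a tag."""
    found = defaultdict(list)
    for attrs, value in ELEMENT_RE.findall(text):
        attr = dict(ATTR_RE.findall(attrs))
        tag = attr.get("tag")
        if tag in wanted_tags:
            found[tag].append((attr.get("name", ""), value.strip()))
    return found


def format_report(found):
    """Return one 'Name = Value' line per occurrence, in input order."""
    lines = []
    for pairs in found.values():
        for name, value in pairs:
            lines.append("%s = %s\n" % (name, value))
    return "".join(lines)


# Hypothetical input: the same tag appears twice with different values.
sample = '''<element tag="300a,0014" vr="CS" vm="1" len="4"
 name="DoseReferenceStructureType">SITE</element>
<element tag="300a,0014" vr="CS" vm="1" len="6"
 name="DoseReferenceStructureType">VOLUME</element>
'''

found = collect(sample, {"300a,0014"})
print(format_report(found), end="")
```

For the real file, pass `open(sys.argv[1]).read()` as `text` and write the returned string to your output file. Because each tag maps to a list, a repeated tag appends another pair instead of overwriting the first one.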
More information about the Python-list mailing list