
Parsing An Xml File For Unknown Elements Using Python Elementtree

I want to extract all the tag names and their corresponding data from a multi-purpose XML file, then save that information into a Python dictionary (e.g. tag = key, data = value).

Solution 1:

from lxml import etree as ET

xmlString = """
    <some_root_name>
        <tag_x>bubbles</tag_x>
        <tag_y>car</tag_y>
        <tag...>42</tag...>
    </some_root_name> """

document = ET.fromstring(xmlString)
for elementtag in document.iter():
    print("elementtag name:", elementtag.tag)

EDIT: To read from a file instead of from a string:

document = ET.parse("myxmlfile.xml")
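
Since the question asks for a dictionary rather than printed tag names, the same iteration can collect tag/text pairs. Here is a minimal sketch, assuming the standard-library xml.etree.ElementTree (its iter() behaves like lxml's for this purpose). The helper name tags_to_dict and the tag name tag_z are hypothetical; tag_z only stands in for the elided tag so that the sample is valid XML, and io.StringIO stands in for a real file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; tag_z is a placeholder for the elided tag name.
xml_string = """
<some_root_name>
    <tag_x>bubbles</tag_x>
    <tag_y>car</tag_y>
    <tag_z>42</tag_z>
</some_root_name>
"""

def tags_to_dict(source):
    """Parse a filename or file object and map each non-root tag to its text."""
    root = ET.parse(source).getroot()
    # iter() walks the whole tree, root included, so the root is skipped here.
    return {el.tag: el.text for el in root.iter() if el is not root}

# StringIO is used so the sketch runs without a file on disk;
# a path such as "myxmlfile.xml" could be passed instead.
result = tags_to_dict(io.StringIO(xml_string))
print(result)
```

Note that a repeated tag name would overwrite the earlier entry, because dictionary keys are unique.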

Solution 2:

>>> import xml.etree.ElementTree as et
>>> xml = """
...    <some_root_name>
...        <tag_x>bubbles</tag_x>
...        <tag_y>car</tag_y>
...        <tag...>42</tag...>
...    </some_root_name>
... """
>>> doc = et.fromstring(xml)
>>> print(dict((el.tag, el.text) for el in doc))
{'tag_x': 'bubbles', 'tag_y': 'car', 'tag...': '42'}

If you really want 42 instead of '42', you'll need to work a little harder and less elegantly.
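
One possible way to do that extra work is to try numeric conversions and fall back to the string. This is only a sketch: the helper name coerce is hypothetical, and the choice to try int first and then float is an assumption about what "really want 42" should mean for other values.

```python
def coerce(text):
    """Return text as int if possible, else float, else the unchanged string."""
    if text is None:
        return None  # empty elements have no text
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text

# The dictionary produced by the snippet above, with string values.
raw = {'tag_x': 'bubbles', 'tag_y': 'car', 'tag...': '42'}
typed = {tag: coerce(value) for tag, value in raw.items()}
print(typed)
```

Be aware that float() also accepts strings such as 'nan' and 'inf', so a stricter check may be needed if those can appear as ordinary text.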

Solution 3:

You could use xml.sax.handler to parse the XML:

import xml.sax as sax
import xml.sax.handler as saxhandler
import pprint

class TagParser(saxhandler.ContentHandler):
    # http://docs.python.org/library/xml.sax.handler.html#contenthandler-objects
    def __init__(self):
        saxhandler.ContentHandler.__init__(self)
        self.tags = {}
        self.tag = None   # name of the most recently opened element
        self.data = None  # most recent character data seen

    def startElement(self, name, attrs):
        self.tag = name

    def endElement(self, name):
        if self.tag:
            self.tags[self.tag] = self.data
            self.tag = None
            self.data = None

    def characters(self, content):
        self.data = content

parser = TagParser()
src = '''\
<some_root_name>
    <tag_x>bubbles</tag_x>
    <tag_y>car</tag_y>
    <tag...>42</tag...>
</some_root_name>'''
sax.parseString(src, parser)
pprint.pprint(parser.tags)

yields

{u'tag...': u'42', u'tag_x': u'bubbles', u'tag_y': u'car'}

Solution 4:

This can be done using lxml in Python:

from lxml import etree

myxml = """
          <root>
             value
          </root> """

doc = etree.XML(myxml)

d = {}
for element in doc.iter():
    key = element.tag
    value = element.text
    d[key] = value

print(d)
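
Two things to watch with this loop: element.text keeps the surrounding whitespace, and a tag that appears more than once overwrites its earlier value. Here is a rough sketch of one way around both, assuming the standard-library xml.etree.ElementTree (the same loop should work with lxml's etree). The sample document and its tag names item and name are made up for illustration.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical sample with a repeated tag and padded text.
myxml = """
<root>
    <item>first</item>
    <item>second</item>
    <name> padded </name>
</root>
"""

doc = ET.fromstring(myxml)

d = defaultdict(list)
for element in doc.iter():
    # text is None for empty elements; strip() drops indentation-only text
    text = (element.text or "").strip()
    if text:
        d[element.tag].append(text)

d = dict(d)
print(d)
```

Every value is a list here, which keeps the structure uniform at the cost of an extra index when a tag occurs only once.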
