XML Data Processing

Hi guys, in this tutorial we will see how XML parsing works. Many of you are probably already familiar with XML; if not, don't worry, we will cover the basics of XML here too.

So, to begin with, let's see what XML is.

“XML is a portable, open source language that allows programmers to develop applications that can be read by other applications, regardless of operating system and/or development language.” – Standard definition.

XML stands for Extensible Markup Language.

XML in layman's terms:

  1. XML is Extensible, which means that you have the freedom to define your own tags, the order in which they occur, and how they should be displayed or processed.
  2. Markup means that the elements you create in XML will be very similar to the elements you have already been creating in your HTML documents.
  3. It is a Language, and a lot like HTML. It is more flexible than HTML because it allows you to create your own custom tags.

Example :

<?xml version="1.0"?> 
 <productListing> 
  <product title="ABC Products">
    <name>Product One</name> 
    <description>Product One is an exciting new widget that will 
      simplify your life.</description> 
    <cost>$19.95</cost> 
    <shipping>$2.95</shipping> 
  </product>
</productListing>

The above is all you need to know about XML to start working with it in Python.

xml.etree.ElementTree — The ElementTree XML API

The Element type is a flexible container object, designed to store hierarchical data structures in memory. Each element has the following properties:

  1. a tag, a string that identifies the kind of data the element represents
  2. a number of attributes, stored in a Python dictionary
  3. a number of child elements, stored in a Python sequence
  4. a text string
  5. an optional tail string

We can use the Element constructor or the SubElement() function to create an element instance.
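
The five properties listed above can be seen on a small hand-built tree. This is only a minimal sketch: the element names and values are borrowed from the product example, and the tail value is made up purely for illustration.

```python
import xml.etree.ElementTree as ET

# Build <product title="ABC Products"> by hand with the Element constructor.
product = ET.Element("product", {"title": "ABC Products"})

# SubElement() creates a child and appends it to the parent in one step.
name = ET.SubElement(product, "name")
name.text = "Product One"  # text inside <name>...</name>
name.tail = "\n"           # text that follows </name> (illustrative value)

print(product.tag)         # the tag string
print(product.attrib)      # the attributes, as a dictionary
print(len(product))        # the number of child elements
print(product[0].text)     # children can be indexed like a sequence

# Serialize the tree back to a string to see the resulting markup.
markup = ET.tostring(product, encoding="unicode")
print(markup)
```

Note that tail is the text after an element's closing tag, which is why it shows up outside the name element when the tree is serialized.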

Installation

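xml.etree.ElementTree is part of the Python standard library (since Python 2.5), so on a current Python there is nothing to install. The steps below apply only to the older standalone elementtree package. To install that package from source, unpack the distribution archive, change to the distribution directory, and run the setup.py script as follows:

$ python setup.py install

If you already have pip installed on your machine, you can simply use the following command:

$ pip install elementtree

Check your installation by typing the following:

$ python
>>> from elementtree import ElementTree

With the standard library version, the equivalent check is:

>>> import xml.etree.ElementTree as ET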

Basic Usages

Consider the following XML data:

<?xml version="1.0"?>
<productListing title="MainProductTitle">
  <product title="ABC Products">
    <name>Product One</name>
    <pid>100</pid>
    <description>Product One is an exciting new widget that will
      simplify your life.</description>
    <cost>$19.95</cost>
    <shipping>$2.95</shipping>
  </product>
  <product title="DEF Products">
    <name>Product Two</name>
    <pid>109</pid>
    <description>Product Two is amazing household products.</description>
    <cost>$21.05</cost>
    <shipping>$3.00</shipping>
  </product>
  <product title="XYZ Products">
    <name>Product Three</name>
    <pid>110</pid>
    <description>Product Three is all related to PC's and Laptop's.</description>
    <cost>$1000.89</cost>
    <shipping>$5.00</shipping>
  </product>
</productListing>

One way to import the data from an XML file is shown below:

import xml.etree.ElementTree as ET
tree = ET.parse('Product.xml')
root = tree.getroot()

In the above example we import the xml.etree.ElementTree module under the alias ET. The parse() function loads an entire XML document into an ElementTree instance, and the getroot() function then returns its root element.
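
When the XML is already in memory as a string rather than in a file, fromstring() can be used instead of parse(). The sketch below assumes a shortened copy of the product listing held in a variable called xml_data (a name chosen here for illustration).

```python
import xml.etree.ElementTree as ET

# A shortened copy of the product listing kept in a string. The tutorial
# itself reads the data from Product.xml; this string is an assumption.
xml_data = """<productListing title="MainProductTitle">
  <product title="ABC Products">
    <name>Product One</name>
    <pid>100</pid>
  </product>
</productListing>"""

# fromstring() parses the text and returns the root Element directly,
# so no separate getroot() call is needed.
root = ET.fromstring(xml_data)

print(root.tag)
print(len(root))  # number of direct children of the root
```

The difference to keep in mind is that parse() returns an ElementTree wrapper, while fromstring() returns the root Element itself.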

Now, root has a tag and a dictionary of attributes:

>>> root.tag
'productListing'
>>> root.attrib
{'title': 'MainProductTitle'}
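
Since attrib is an ordinary dictionary, a single attribute can be read by key, or through the get() shortcut on the element. A small sketch follows; the currency attribute does not exist in the example data and is used only to show what happens when an attribute is missing.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<productListing title="MainProductTitle"/>')

# Read one attribute by key from the dictionary, or with get().
by_key = root.attrib["title"]
by_get = root.get("title")

# "currency" is a hypothetical attribute that is not in the data:
# get() returns None, or the fallback given as the second argument.
missing = root.get("currency")
fallback = root.get("currency", "USD")

print(by_key, by_get, missing, fallback)
```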

We can also iterate over the child elements:

for child in root:
    print(child.tag, child.attrib)

Output:

product {'title': 'ABC Products'}
product {'title': 'DEF Products'}
product {'title': 'XYZ Products'}
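
Iterating over the children gives the product elements, but the actual values sit one level deeper. One way to reach them is with findall(), find() and findtext(), sketched below on a trimmed copy of the data (the rows list is just a helper introduced for this example).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the product listing, for illustration only.
xml_data = """<productListing title="MainProductTitle">
  <product title="ABC Products"><name>Product One</name><cost>$19.95</cost></product>
  <product title="DEF Products"><name>Product Two</name><cost>$21.05</cost></product>
</productListing>"""
root = ET.fromstring(xml_data)

rows = []
# findall() returns the direct children with the given tag.
for product in root.findall("product"):
    name = product.find("name").text  # find() returns the first matching child
    cost = product.findtext("cost")   # findtext() returns that child's text
    rows.append((product.get("title"), name, cost))
    print(product.get("title"), name, cost)
```

find() returns None when no child matches, so on real data it is worth checking the result before reading .text.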

So far we have covered the basics of the module. Let's get a bit more advanced and end our discussion with an interesting example in which we modify our XML file.

Consider the following piece of code:

>>> for pid in root.iter('pid'):
...   n_pid = int(pid.text) + 1
...   pid.text = str(n_pid)
...   pid.set('unique', 'yo')
...
>>> tree.write('output.xml')

In the above code, we iterate over the pid elements in the XML, increase each pid value by one, and add a unique attribute to each element.

Here root is the root element we discussed earlier. We set the attribute using the Element.set() function, which can also be used to modify existing attributes. We then wrote the result to the file output.xml using the write() function.
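
The same steps can be tried without touching the file system. The sketch below is a variation on the code above, not the exact same call: it works on a trimmed copy of the data, writes to an in-memory buffer instead of output.xml, and passes xml_declaration=True to ask for the XML declaration explicitly.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the product listing, for illustration only.
xml_data = """<productListing>
  <product><pid>100</pid></product>
  <product><pid>109</pid></product>
</productListing>"""

# Wrap the parsed root in an ElementTree so that write() is available.
tree = ET.ElementTree(ET.fromstring(xml_data))
root = tree.getroot()

# iter() walks the whole tree, so nested <pid> elements are found too.
for pid in root.iter("pid"):
    pid.text = str(int(pid.text) + 1)  # element text is always a string
    pid.set("unique", "yo")            # adds or overwrites the attribute

# Write to an in-memory buffer (assumption: used here instead of a file).
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
result = buffer.getvalue().decode("utf-8")
print(result)
```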

Finally, our output looks like this:

<?xml version="1.0"?>
<productListing title="MainProductTitle">
  <product title="ABC Products">
    <name>Product One</name>
    <pid unique="yo">101</pid>
    <description>Product One is an exciting new widget that will
      simplify your life.</description>
    <cost>$19.95</cost>
    <shipping>$2.95</shipping>
  </product>
  <product title="DEF Products">
    <name>Product Two</name>
    <pid unique="yo">110</pid>
    <description>Product Two is amazing household products.</description>
    <cost>$21.05</cost>
    <shipping>$3.00</shipping>
  </product>
  <product title="XYZ Products">
    <name>Product Three</name>
    <pid unique="yo">111</pid>
    <description>Product Three is all related to PC's and Laptop's.</description>
    <cost>$1000.89</cost>
    <shipping>$5.00</shipping>
  </product>
</productListing>

This was our small effort to help you understand the concepts well enough that the more advanced parts of this topic come easily. You can also visit the official Python docs to gain further insight.

Hope you enjoyed the session. If you have any questions, please feel free to comment below. We are always here to help you.
