Parsing XML Files in Python with ElementTree

For an introduction to XML itself, the Runoob (菜鸟教程) XML tutorial is all you need.

Suppose we have an XML file of movie data, movies.xml, with the following content:

<?xml version="1.0"?>
<collection>
    <genre category="Action">
        <decade years="1980s">
            <movie favorite="True" title="Indiana Jones: The raiders of the lost Ark">
                <format multiple="No">DVD</format>
                <year>1981</year>
                <rating>PG</rating>
                <description>'Archaeologist and adventurer Indiana Jones is hired by the U.S. government to find the Ark of  the Covenant before the Nazis.'</description>
            </movie>
            <movie favorite="True" title="THE KARATE KID">
                <format multiple="Yes">DVD,Online</format>
                <year>1984</year>
                <rating>PG</rating>
                <description>None provided.</description>
            </movie>
            <movie favorite="False" title="Back 2 the Future">
                <format multiple="False">Blu-ray</format>
                <year>1985</year>
                <rating>PG</rating>
                <description>Marty McFly</description>
            </movie>
        </decade>
        <decade years="1990s">
            <movie favorite="False" title="X-Men">
                <format multiple="Yes">dvd, digital</format>
                <year>2000</year>
                <rating>PG-13</rating>
                <description>Two mutants come to a private academy for their kind whose resident superhero team must oppose a terrorist organization with similar powers.</description>
            </movie>
            <movie favorite="True" title="Batman Returns">
                <format multiple="No">VHS</format>
                <year>1992</year>
                <rating>PG13</rating>
                <description>NA.</description>
            </movie>
            <movie favorite="False" title="Reservoir Dogs">
                <format multiple="No">Online</format>
                <year>1992</year>
                <rating>R</rating>
                <description>WhAtEvER I Want!!!?!</description>
            </movie>
        </decade>
    </genre>
    <genre category="Thriller">
        <decade years="1970s">
            <movie favorite="False" title="ALIEN">
                <format multiple="Yes">DVD</format>
                <year>1979</year>
                <rating>R</rating>
                <description>"""""""""</description>
            </movie>
        </decade>
        <decade years="1980s">
            <movie favorite="True" title="Ferris Bueller's Day Off">
                <format multiple="No">DVD</format>
                <year>1986</year>
                <rating>PG13</rating>
                <description>Funny movie on funny guy </description>
            </movie>
            <movie favorite="FALSE" title="American Psycho">
                <format multiple="No">blue-ray</format>
                <year>2000</year>
                <rating>Unrated</rating>
                <description>psychopathic Bateman</description>
            </movie>
        </decade>
    </genre>
</collection>

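As you can see, an XML file is made up of units called elements. Every element has a beginning and an end: it opens with <xxx> and closes with </xxx>. You can think of the elements as the nodes of a tree. Each element has three main features:
1. tag: the name inside the angle brackets, such as movie or year; it is a string;
2. attrib: the attributes, i.e. the name="value" pairs inside the opening tag; together they form a dict, with each name as a key and each quoted value as the corresponding value;
3. text: the text that sits outside the angle brackets, between the opening and closing tags, for example: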

                ...<description>'Archaeologist and adventurer Indiana Jones is hired by the U.S. government to find the Ark of  the Covenant before the Nazis.'</description>...

Parsing an XML file in Python is straightforward. First import the ElementTree module and read the file:

import xml.etree.ElementTree as ET
tree = ET.parse('movies.xml')
root = tree.getroot()
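
When the XML is already in memory rather than on disk, ET.fromstring returns the root element directly, and malformed input raises ET.ParseError. The following is a minimal sketch; the SAMPLE string is a hypothetical, trimmed stand-in for movies.xml, not the full file.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample standing in for the contents of movies.xml.
SAMPLE = """<?xml version="1.0"?>
<collection><genre category="Action"></genre></collection>"""

# fromstring() parses a string and returns the root Element (no tree object).
root = ET.fromstring(SAMPLE)
print(root.tag)

# A closing tag that does not match its opening tag is not well-formed XML.
parse_failed = False
try:
    ET.fromstring("<collection><genre></collection>")
except ET.ParseError as err:
    parse_failed = True
    print("malformed XML:", err)
```

Wrapping ET.parse('movies.xml') in the same try/except is a reasonable way to report a broken file instead of crashing.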

If we inspect root at this point, the output is a single element:

<Element 'collection' at 0x0000026DF3130728>

The element's three features are easy to access:

print(root.tag)
print(root.attrib)
print(root.text)  # None, or only whitespace if the file is indented
'''
collection
{}
'''

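This shows that the element's tag is collection, its attrib is an empty dict, and its text is empty: root.text is None when the first child follows the opening tag directly, and whitespace only when the file is indented.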
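
What "empty" text looks like depends on the file's layout, which the sketch below illustrates with three small inline strings (hypothetical samples, not the real file). The clean_text helper is an illustrative name introduced here, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline samples: same structure, different whitespace.
compact = ET.fromstring('<collection><genre category="Action"/></collection>')
indented = ET.fromstring('<collection>\n    <genre category="Action"/>\n</collection>')
leaf = ET.fromstring('<year>1981</year>')

print(repr(compact.text))   # None: nothing sits between <collection> and <genre>
print(repr(indented.text))  # '\n    ': the indentation before <genre>
print(repr(leaf.text))      # '1981'


def clean_text(element):
    """Return the element's text with None and pure whitespace normalized to ''."""
    return (element.text or "").strip()


print(repr(clean_text(compact)), repr(clean_text(indented)), repr(clean_text(leaf)))
```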

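Because this element is also the root node, we can traverse its child nodes. There are several ways to do so:

1. Treat the element as a list of its child nodes and index into it directly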

print(root[0])
print(root[0].tag)
print(root[0].attrib)
print(root[0].text)  # None or whitespace: genre holds only child elements
'''
<Element 'genre' at 0x0000026DF3130778>
genre
{'category': 'Action'}
'''

print(root[0][0][0][3])
print(root[0][0][0][3].tag)
print(root[0][0][0][3].attrib)
print(root[0][0][0][3].text)
'''
<Element 'description' at 0x0000026DF3130B38>
description
{}
'Archaeologist and adventurer Indiana Jones is hired by the U.S. government to find the Ark of  the Covenant before the Nazis.'
'''
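
Chained indexes such as root[0][0][0][3] depend on the exact order of the children. As an alternative not used in the original walkthrough, Element.find, Element.findtext and Element.get look things up by name instead. The sketch below uses a hypothetical one-movie sample rather than the full movies.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a single movie taken from the structure of movies.xml.
SAMPLE = """<collection>
  <genre category="Action">
    <decade years="1980s">
      <movie favorite="True" title="THE KARATE KID">
        <format multiple="Yes">DVD,Online</format>
        <year>1984</year>
        <rating>PG</rating>
        <description>None provided.</description>
      </movie>
    </decade>
  </genre>
</collection>"""

root = ET.fromstring(SAMPLE)
movie = root[0][0][0]            # genre -> decade -> movie, by position

print(len(movie))                # number of direct children
print(movie[3].text)             # fourth child, by position
print(movie.find('description').text)  # same child, looked up by tag
print(movie.findtext('year'))    # text of the first <year> child
print(movie.get('title'))        # attribute lookup, like dict.get
print(movie.find('director'))    # no such child: find() returns None
```

Because find returns None for a missing child, it is worth checking the result before reading .text from it.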

A for loop visits every direct child in turn:

for child in root:
    print(child.tag, child.attrib)
'''
genre {'category': 'Action'} 
genre {'category': 'Thriller'} 
'''
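
Iterating an element only yields its direct children, so reaching the movies this way takes one loop per level. A minimal sketch on a hypothetical, shortened sample (the rows list is an illustrative name):

```python
import xml.etree.ElementTree as ET

# Hypothetical shortened sample with the same nesting as movies.xml.
SAMPLE = """<collection>
  <genre category="Action">
    <decade years="1980s">
      <movie favorite="True" title="THE KARATE KID"/>
    </decade>
    <decade years="1990s">
      <movie favorite="False" title="X-Men"/>
    </decade>
  </genre>
  <genre category="Thriller">
    <decade years="1970s">
      <movie favorite="False" title="ALIEN"/>
    </decade>
  </genre>
</collection>"""

root = ET.fromstring(SAMPLE)

# One loop per nesting level: collection -> genre -> decade -> movie.
rows = []
for genre in root:
    for decade in genre:
        for movie in decade:
            rows.append((genre.get('category'), decade.get('years'), movie.get('title')))

for row in rows:
    print(row)
```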

2. Use root.iter(tag) to iterate over every element with a given tag, at any depth

for movie in root.iter('movie'):
    print(movie.tag, movie.attrib)
'''
movie {'favorite': 'True', 'title': 'Indiana Jones: The raiders of the lost Ark'}
movie {'favorite': 'True', 'title': 'THE KARATE KID'}
movie {'favorite': 'False', 'title': 'Back 2 the Future'}
movie {'favorite': 'False', 'title': 'X-Men'}
movie {'favorite': 'True', 'title': 'Batman Returns'}
movie {'favorite': 'False', 'title': 'Reservoir Dogs'}
movie {'favorite': 'False', 'title': 'ALIEN'}
movie {'favorite': 'True', 'title': "Ferris Bueller's Day Off"}
movie {'favorite': 'FALSE', 'title': 'American Psycho'}
'''
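
Since iter searches the whole subtree, it combines well with a filter on attrib. The sketch below, on a hypothetical shortened sample, collects the titles marked as favorites. Attribute values are always strings, so the comparison is against 'True' and values such as 'FALSE' are distinct strings, not booleans.

```python
import xml.etree.ElementTree as ET

# Hypothetical shortened sample with the same nesting as movies.xml.
SAMPLE = """<collection>
  <genre category="Action">
    <decade years="1980s">
      <movie favorite="True" title="THE KARATE KID"/>
      <movie favorite="False" title="Back 2 the Future"/>
    </decade>
  </genre>
  <genre category="Thriller">
    <decade years="1980s">
      <movie favorite="FALSE" title="American Psycho"/>
    </decade>
  </genre>
</collection>"""

root = ET.fromstring(SAMPLE)

# iter('movie') finds <movie> elements at any depth below root.
favorite_titles = [
    movie.get('title')
    for movie in root.iter('movie')
    if movie.get('favorite') == 'True'   # string comparison, not a boolean test
]
print(favorite_titles)  # ['THE KARATE KID']

# Called without a tag, iter() yields every element, starting with root itself.
all_tags = sorted({element.tag for element in root.iter()})
print(all_tags)  # ['collection', 'decade', 'genre', 'movie']
```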