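For an introduction to XML files, the Runoob (菜鸟教程) tutorial is enough; the link is here.
Suppose we have an XML file of movie data, movies.xml, with the following content: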
<?xml version="1.0"?>
<collection><genre category="Action"><decade years="1980s"><movie favorite="True" title="Indiana Jones: The raiders of the lost Ark"><format multiple="No">DVD</format><year>1981</year><rating>PG</rating><description>'Archaeologist and adventurer Indiana Jones is hired by the U.S. government to find the Ark of the Covenant before the Nazis.'</description></movie><movie favorite="True" title="THE KARATE KID"><format multiple="Yes">DVD,Online</format><year>1984</year><rating>PG</rating><description>None provided.</description></movie><movie favorite="False" title="Back 2 the Future"><format multiple="False">Blu-ray</format><year>1985</year><rating>PG</rating><description>Marty McFly</description></movie></decade><decade years="1990s"><movie favorite="False" title="X-Men"><format multiple="Yes">dvd, digital</format><year>2000</year><rating>PG-13</rating><description>Two mutants come to a private academy for their kind whose resident superhero team must oppose a terrorist organization with similar powers.</description></movie><movie favorite="True" title="Batman Returns"><format multiple="No">VHS</format><year>1992</year><rating>PG13</rating><description>NA.</description></movie><movie favorite="False" title="Reservoir Dogs"><format multiple="No">Online</format><year>1992</year><rating>R</rating><description>WhAtEvER I Want!!!?!</description></movie></decade> </genre><genre category="Thriller"><decade years="1970s"><movie favorite="False" title="ALIEN"><format multiple="Yes">DVD</format><year>1979</year><rating>R</rating><description>"""""""""</description></movie></decade><decade years="1980s"><movie favorite="True" title="Ferris Bueller's Day Off"><format multiple="No">DVD</format><year>1986</year><rating>PG13</rating><description>Funny movie on funny guy </description></movie><movie favorite="FALSE" title="American Psycho"><format multiple="No">blue-ray</format><year>2000</year><rating>Unrated</rating><description>psychopathic 
Bateman</description></movie></decade></genre>
</collection>
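As the listing shows, an XML file is made up of elements (Element). Every element has a start and an end: it opens with <xxx> and closes with </xxx>. Elements can be thought of as the nodes of a tree, and each one has three main features: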
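1. tag: the name that follows the opening angle bracket (shown in red in the highlighted listing), e.g. movie; it is a string;
2. attrib: the name="value" pairs inside the opening tag (shown in yellow and green in the highlighted listing); together they form a dict in which each name is a key and each quoted string is a value;
3. text: the content that sits between the opening and closing tags rather than inside the brackets, for example: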
...<description>'Archaeologist and adventurer Indiana Jones is hired by the U.S. government to find the Ark of the Covenant before the Nazis.'</description>...
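The three features can be seen on a single element. The following is a minimal sketch that parses an inline fragment of the listing with ET.fromstring instead of reading the file; the names snippet and movie are chosen here for illustration only.

```python
import xml.etree.ElementTree as ET

# A fragment of movies.xml, inlined so the example runs without the file.
snippet = (
    '<movie favorite="True" title="THE KARATE KID">'
    '<format multiple="Yes">DVD,Online</format>'
    '<year>1984</year>'
    '</movie>'
)
movie = ET.fromstring(snippet)

print(movie.tag)      # the element name, a string
print(movie.attrib)   # the attributes, as a dict of str -> str
print(movie[0].text)  # the text of the first child element (format)
```

Note that attribute values are always strings, so favorite="True" arrives as the string 'True', not as a boolean.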
Parsing an XML file in Python is straightforward. First import the ElementTree module and read the file:
import xml.etree.ElementTree as ET
tree = ET.parse('movies.xml')
root = tree.getroot()
Inspecting root now shows that it is simply an element:
<Element 'collection' at 0x0000026DF3130728>
Its three features are easy to read:
print(root.tag)
print(root.attrib)
print(root.text)
'''
collection
{}'''
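This shows that the element's tag is collection, its attrib is an empty dict, and its text is empty. What "empty" looks like depends on the file's layout: if <collection> is followed by a line break and indentation, text is that whitespace; if the first child follows immediately, as in the single-line listing above, text is None and print shows None.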
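The difference between the two kinds of "empty" text can be checked directly. This is a small sketch on two inline documents; compact and indented are illustrative names, not part of the original example.

```python
import xml.etree.ElementTree as ET

# Same structure, different layout: no whitespace vs. newline plus indentation.
compact = ET.fromstring('<collection><genre category="Action"/></collection>')
indented = ET.fromstring('<collection>\n    <genre category="Action"/>\n</collection>')

print(repr(compact.text))   # None: nothing precedes the first child
print(repr(indented.text))  # '\n    ': the whitespace before the first child

# One way to treat both cases alike is to normalise before testing.
for root in (compact, indented):
    has_text = bool((root.text or '').strip())
    print(root.tag, has_text)
```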
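This element is also the root node, so its children can be traversed in several ways:
1. Treat the element as a list of its child nodes and index it directly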
print(root[0])
print(root[0].tag)
print(root[0].attrib)
print(root[0].text)
'''
<Element 'genre' at 0x0000026DF3130778>
genre
{'category': 'Action'}'''
print(root[0][0][0][3])
print(root[0][0][0][3].tag)
print(root[0][0][0][3].attrib)
print(root[0][0][0][3].text)
'''
<Element 'description' at 0x0000026DF3130B38>
description
{}
'Archaeologist and adventurer Indiana Jones is hired by the U.S. government to find the Ark of the Covenant before the Nazis.'
'''
A for loop visits every direct child in turn:
for child in root:
    print(child.tag, child.attrib)
'''
genre {'category': 'Action'}
genre {'category': 'Thriller'}
'''
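Because an element behaves like a list of its children, len() and chained indexing both work on it. The sketch below uses a trimmed-down inline sample rather than movies.xml. It also uses Element.find, which the text above does not cover, as a tag-based alternative to a positional index; it is included only for comparison.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample with the same nesting: collection > genre > decade > movie.
root = ET.fromstring(
    '<collection>'
    '<genre category="Action"><decade years="1980s">'
    '<movie title="THE KARATE KID"><year>1984</year><rating>PG</rating></movie>'
    '</decade></genre>'
    '<genre category="Thriller"/>'
    '</collection>'
)

print(len(root))                  # number of direct children of the root
movie = root[0][0][0]             # first genre -> first decade -> first movie
print(movie.attrib['title'])
print(movie[1].text)              # second child of the movie, by position
print(movie.find('rating').text)  # the same element, looked up by tag
```

Positional indexing is short but breaks as soon as the order of the children changes, whereas a lookup by tag does not depend on position.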
2. Call root.iter(tag) to walk the whole tree and get every element with the given tag, at any depth
for movie in root.iter('movie'):
    print(movie.tag, movie.attrib)
'''
movie {'favorite': 'True', 'title': 'Indiana Jones: The raiders of the lost Ark'}
movie {'favorite': 'True', 'title': 'THE KARATE KID'}
movie {'favorite': 'False', 'title': 'Back 2 the Future'}
movie {'favorite': 'False', 'title': 'X-Men'}
movie {'favorite': 'True', 'title': 'Batman Returns'}
movie {'favorite': 'False', 'title': 'Reservoir Dogs'}
movie {'favorite': 'False', 'title': 'ALIEN'}
movie {'favorite': 'True', 'title': "Ferris Bueller's Day Off"}
movie {'favorite': 'FALSE', 'title': 'American Psycho'}
'''
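Since iter reaches elements at any depth, it combines naturally with a filter on attrib. The following sketch collects the titles of favorite movies from a trimmed-down inline sample. The case-insensitive comparison is an assumption made here because the data mixes 'True', 'False' and 'FALSE'; favorites is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample; movies sit two levels below genre, as in movies.xml.
root = ET.fromstring(
    '<collection><genre category="Mixed">'
    '<decade years="1970s"><movie favorite="False" title="ALIEN"/></decade>'
    '<decade years="1990s">'
    '<movie favorite="True" title="Batman Returns"/>'
    '<movie favorite="FALSE" title="American Psycho"/>'
    '</decade></genre></collection>'
)

# Attribute values are strings, so compare text rather than booleans.
favorites = [
    movie.attrib['title']
    for movie in root.iter('movie')
    if movie.attrib['favorite'].lower() == 'true'
]
print(favorites)
```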