Parsing XML Documents with Python

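Parsing XML in Python mainly relies on the xml package that ships with Python; the third-party lxml library is the next most common choice.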
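As a rough illustration of how the two libraries relate, the sketch below tries lxml first and falls back to the standard library when lxml is not installed. The small `note` document is a made-up example, not part of this article's data; the calls shown (`fromstring`, `tag`, `attrib`, `find`) exist in both libraries.

```python
# Prefer lxml when it is installed; otherwise fall back to the standard library.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# A tiny hypothetical document, used only to show that both libraries
# expose the same basic calls.
doc = "<note lang='en'><to>reader</to></note>"
root = etree.fromstring(doc)

print(root.tag)              # note
print(root.attrib["lang"])   # en
print(root.find("to").text)  # reader
```

The rest of this article sticks to the standard library, so nothing extra needs to be installed.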

XML structure: we start with an XML document that is fairly simple but still covers most features.

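<feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
  <title>dive into mark</title>
  <subtitle>currently between addictions</subtitle>
  <id>tag:diveintomark.org,2001-07-29:/</id>
  <updated>2009-03-27T21:56:07Z</updated>
  <link href='http://diveintomark.org/' type='text/html' rel='alternate'/>
  <entry>
    <author>
      <name>Mark</name>
      <uri>http://diveintomark.org/</uri>
    </author>
    <title>Dive into history, 2009 edition</title>
    <link href='http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'/>
    <id>tag:diveintomark.org,2009-03-27:/archives/20090327172042</id>
    <updated>2009-03-27T21:56:07Z</updated>
    <published>2009-03-27T17:20:42Z</published>
    <summary>Putting an entire chapter on one page sounds
      bloated, but consider this &#8212; my longest chapter so far
      would be 75 printed pages, and it loads in under 5 seconds&#8230;
      On dialup.</summary>
  </entry>
  <entry>
    <author>
      <name>Mark</name>
      <uri>http://diveintomark.org/</uri>
    </author>
    <title>Accessibility is a harsh mistress</title>
    <link href='http://diveintomark.org/archives/2009/03/21/accessibility-is-a-harsh-mistress'/>
    <id>tag:diveintomark.org,2009-03-21:/archives/20090321200928</id>
    <updated>2009-03-22T01:05:37Z</updated>
    <published>2009-03-21T20:09:28Z</published>
    <summary>The accessibility orthodoxy does not permit people to
      question the value of features that are rarely useful and rarely used.</summary>
  </entry>
  <entry>
    <author>
      <name>Mark</name>
    </author>
    <title>A gentle introduction to video encoding, part 1: container formats</title>
    <link href='http://diveintomark.org/archives/2008/12/18/give-part-1-container-formats'/>
    <id>tag:diveintomark.org,2008-12-18:/archives/20081218155422</id>
    <updated>2009-01-11T19:39:22Z</updated>
    <published>2008-12-18T15:54:22Z</published>
    <summary>These notes will eventually become part of a
      tech talk on video encoding.</summary>
  </entry>
</feed>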

Let's take a quick look at the structure of this XML.

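<feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>  # the namespace http://www.w3.org/2005/Atom is declared here

# this link element has no text, but it does carry attributes
<link href='http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'/>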

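First, there is a single top-level root element, feed.

Under the root element are the child elements title, subtitle, id, updated, link, and entry.

Under each entry element are the child elements author, title, link, id, updated, published, category, and summary (call them grandchildren).

Under the author element are the child elements name and uri (great-grandchildren, if you like).

The structure is quite clear.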
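To make the nesting described above visible, here is a minimal sketch that walks a tree recursively and prints one indented line per element. `SAMPLE` is a shortened, hypothetical stand-in for the feed, and `local_name` and `outline` are helper names made up for this illustration.

```python
import xml.etree.ElementTree as etree

# A shortened, hypothetical stand-in for the feed, kept inline so the
# example runs without feed.xml.
SAMPLE = """
<feed xmlns='http://www.w3.org/2005/Atom'>
  <title>dive into mark</title>
  <entry>
    <author>
      <name>Mark</name>
    </author>
    <title>Dive into history, 2009 edition</title>
  </entry>
</feed>
"""

def local_name(tag):
    """Drop the leading '{namespace}' part that ElementTree adds to tags."""
    return tag.split('}', 1)[-1]

def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ['  ' * depth + local_name(element.tag)]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

structure = outline(etree.fromstring(SAMPLE.strip()))
print('\n'.join(structure))
```

Each level of indentation in the printed outline corresponds to one generation: children of feed, then grandchildren, then great-grandchildren.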

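Next we use Python to extract, step by step, the content between an element's opening and closing tags, as well as the attributes inside the element.

The main calls used are:

tree = etree.parse()    parse the XML

root = tree.getroot()    get the root element

root.tag    name of the root element

root.attrib    attributes of the element

root.findall()    find elements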
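The calls listed above can be tried on a small inline document before touching a real file. In this sketch the `library` document is hypothetical, and `io.StringIO` is used only so that `parse()` has a file-like object to read from.

```python
import io
import xml.etree.ElementTree as etree

# Hypothetical mini-document standing in for a real file on disk.
xml_text = "<library city='Oslo'><book id='1'>A</book><book id='2'>B</book></library>"

tree = etree.parse(io.StringIO(xml_text))  # parse() accepts a file name or a file object
root = tree.getroot()                      # the root element

print(root.tag)                  # library
print(root.attrib)               # {'city': 'Oslo'}
books = root.findall('book')     # direct children named 'book'
print([b.text for b in books])   # ['A', 'B']
print(books[1].attrib['id'])     # 2
```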

Here is the code; the comments and results are written inline.

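import xml.etree.ElementTree as etree  # import xml.etree.ElementTree under the name etree

tree = etree.parse('feed.xml')  # parse the XML file
root = tree.getroot()
print(root)
# <Element '{http://www.w3.org/2005/Atom}feed' at 0x...>

# Elements behave like lists
print(root.tag)
# {http://www.w3.org/2005/Atom}feed
# ElementTree writes XML element names as {namespace}localname

for child in root:
    print(child)
# Only direct children are listed here; the children of those children are not traversed

# Attributes behave like dictionaries
print(root.attrib)
# {'{http://www.w3.org/XML/1998/namespace}lang': 'en'}

# Note that the link element under feed has attributes
print(root[4].attrib)
# {'href': 'http://diveintomark.org/', 'type': 'text/html', 'rel': 'alternate'}
print(root[3].attrib)
# {}  an empty dictionary, because the updated element has no attributes

# Finding elements
entrylist = root.findall('{http://www.w3.org/2005/Atom}entry')
print(entrylist)
# [<Element '{http://www.w3.org/2005/Atom}entry' at 0x...>,
#  <Element '{http://www.w3.org/2005/Atom}entry' at 0x...>,
#  <Element '{http://www.w3.org/2005/Atom}entry' at 0x...>]

print(root.findall('{http://www.w3.org/2005/Atom}author'))
# []  an empty list, because author is not a direct child of feed

# Finding nested elements
entries = tree.findall('{http://www.w3.org/2005/Atom}entry')  # first find the entry elements
title = entries[0].find('{http://www.w3.org/2005/Atom}title')  # then find the title inside the first entry
print(title.text)
# Dive into history, 2009 edition

# Prefixing the path with './/' searches every level, including children and grandchildren
all_links = tree.findall('.//{http://www.w3.org/2005/Atom}link')
# [<Element '{http://www.w3.org/2005/Atom}link' at 0x...>,
#  ...]

print(all_links[0].attrib)  # the attribute dictionary of this link
# {'href': 'http://diveintomark.org/',
#  'type': 'text/html',
#  'rel': 'alternate'}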
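Spelling out the full {namespace}tag string in every query gets long. One possible alternative is to pass a prefix-to-URI mapping as the second argument of `find` and `findall`. The sketch below assumes a shortened stand-in for feed.xml, and the prefix `atom` is an arbitrary choice made for this illustration.

```python
import xml.etree.ElementTree as etree

# Shortened, hypothetical stand-in for feed.xml.
SAMPLE = """<feed xmlns='http://www.w3.org/2005/Atom'>
  <link href='http://diveintomark.org/'/>
  <entry>
    <title>Dive into history, 2009 edition</title>
    <link href='http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'/>
  </entry>
</feed>"""

root = etree.fromstring(SAMPLE)

# The prefix 'atom' is our own choice; only the URI has to match the document.
ns = {'atom': 'http://www.w3.org/2005/Atom'}

entries = root.findall('atom:entry', ns)              # direct children only
first_title = entries[0].find('atom:title', ns).text  # title inside the first entry
all_links = root.findall('.//atom:link', ns)          # link elements at any depth

print(len(entries))
print(first_title)
print([link.get('href') for link in all_links])
```

The mapping only affects how the query is written; the elements that come back still report their tags in the {namespace}localname form.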

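Those are the basic methods the XML library offers for parsing and searching an XML document. Now let's put them to use in a real example.

Back to parsing WeChat XML: WeChat POSTs the user's message to your server in roughly the following form.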

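<xml>
  <ToUserName><![CDATA[toUser]]></ToUserName>
  <FromUserName><![CDATA[fromUser]]></FromUserName>
  <CreateTime>1348831860</CreateTime>
  <MsgType><![CDATA[text]]></MsgType>
  <Content><![CDATA[this is a test]]></Content>
  <MsgId>1234567890123456</MsgId>
</xml>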

Now let's use the methods introduced above to extract the 'this is a test' text from the Content element.

import xml.etree.ElementTree as etree

weixinxml = etree.parse('weixinpost.xml')
wroot = weixinxml.getroot()
print(wroot.tag)

for child in wroot:
    print(child.tag)

content = wroot.find('Content')  # look the element up once and reuse the result
if content is not None:
    print(content.text)
else:
    print('Nothing found')

With these few simple steps, the content we want can be extracted.
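On a server the message arrives as a request body, not as a file on disk, so parsing it directly with `fromstring` may be more convenient. This is a minimal sketch under that assumption: `body` is a hypothetical request body containing only the fields this article names, and `parse_message` is a helper name invented for the illustration.

```python
import xml.etree.ElementTree as etree

# Hypothetical request body; in a real handler it would come from the
# web framework's request object.
body = b"""<xml>
<CreateTime>1348831860</CreateTime>
<Content><![CDATA[this is a test]]></Content>
<MsgId>1234567890123456</MsgId>
</xml>"""

def parse_message(raw):
    """Turn a flat XML message into a dict of {tag: text}."""
    root = etree.fromstring(raw)
    return {child.tag: child.text for child in root}

message = parse_message(body)
print(message.get('Content', 'Nothing found'))  # this is a test
```

A dict lookup with a default replaces the explicit `is not None` check, which keeps the handler short when several fields are needed.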
