I. Installation

pip install python-docx

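II. Writing a Word document

Word text formatting works at two main levels: block level and inline level. Most of the content in a Word document is made up of objects at these two levels. (Other items, such as headers and footers, were still under development by the python-docx author at the time of writing.)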

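Block level: essentially, paragraphs

Block-level objects generally include: paragraphs, inline pictures, tables, headings, numbered lists, and bulleted lists.

The paragraph is the primary block-level object in a Word file. A block-level item flows the text it contains between its left and right boundaries, starting a new line whenever the text reaches the right boundary. For a paragraph, the boundaries are normally the page margins, or the column boundaries when the page is laid out in columns. A table is also a block-level object.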

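Inline level: essentially, character (font) formatting

Inline-level objects are the building blocks of block-level objects. All the content of a block object is held in inline objects, and one block object consists of one or more inline objects. The run is the most commonly used inline object, for example: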

p = document.add_paragraph('This is paragraph')
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

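In this example, one paragraph (a block object) contains four runs (inline objects): the text passed to add_paragraph becomes the first run, and each of the three add_run calls appends another run with its own properties.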

Example: writing a Word document:

# coding: utf-8
from docx import Document
from docx.enum.style import WD_STYLE_TYPE
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.oxml.ns import qn
from docx.shared import Inches, Pt, RGBColor


def main():
    # Create the document object
    document = Document()

    # Add a new style: the first argument is the style name, the second the
    # style type (1 = paragraph, 2 = character, 3 = table)
    style = document.styles.add_style('style name 1', WD_STYLE_TYPE.CHARACTER)

    # Pick the 'Normal' style from the style collection and set its font
    style = document.styles['Normal']
    style.font.name = 'Microsoft YaHei UI'
    style.font.size = Pt(50)
    # Apply the style, with its font settings, to a paragraph
    # p = document.add_paragraph('change font attribution', style='Normal')

    # Pick the 'Heading 2' style and set its paragraph format
    style = document.styles['Heading 2']
    style.paragraph_format.left_indent = Pt(20)
    style.paragraph_format.widow_control = True
    # Apply the style, with its paragraph format, to a paragraph
    # p = document.add_paragraph('This is Heading, level 1', style=style)

    # Set the document title; Python 3 strings are Unicode, so Chinese text
    # needs no u prefix
    document.add_heading('我的一个新文档', 0)

    # Add a paragraph to the document
    p = document.add_paragraph('This is a paragraph having some ')
    # WD_ALIGN_PARAGRAPH.LEFT aligns left; WD_ALIGN_PARAGRAPH.RIGHT aligns right
    p.paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER
    p.paragraph_format.left_indent = Inches(0.5)   # left indent, in inches
    p.paragraph_format.right_indent = Pt(20)       # right indent, in points
    # First-line indent; it is added on top of the left indent
    p.paragraph_format.first_line_indent = Inches(0.5)
    p.paragraph_format.space_after = Pt(5)     # spacing after this paragraph
    p.paragraph_format.space_before = Pt(10)   # spacing before this paragraph
    p.paragraph_format.line_spacing = Pt(18)   # line spacing

    p_run = p.add_run('xxx')
    p_run.font.italic = True                   # italic
    p_run.font.size = Pt(12)                   # font size
    p_run.font.color.rgb = RGBColor(0, 0, 0)   # font color
    p_run.font.name = '宋体'                   # typeface
    # East Asian typeface, set directly on the underlying XML element
    p_run.font._element.rPr.rFonts.set(qn('w:eastAsia'), '宋体')
    p_run.font.underline = False               # no underline
    p_run.font.bold = None                     # inherit bold
    # Properties like these have three states: True turns the property on,
    # False turns it off, and None inherits the value from the style hierarchy

    # Add a level-1 heading
    document.add_heading('Heading, level = 1', level=1)
    document.add_paragraph('Intense quote', style='Intense Quote')

    # Add a bulleted (unordered) list
    document.add_paragraph('first item in unordered list', style='List Bullet')

    # Add a numbered (ordered) list
    document.add_paragraph('first item in ordered list', style='List Number')
    document.add_paragraph('second item in ordered list', style='List Number')
    document.add_paragraph('third item in ordered list', style='List Number')

    # Add a picture with a given width
    document.add_picture('e:/docs/pic.png', width=Inches(1.25))

    # Add a table with 1 row and 3 columns
    table = document.add_table(rows=1, cols=3)
    # Get the cells of the first row
    hdr_cells = table.rows[0].cells
    # Assign each cell a value; values must be strings
    hdr_cells[0].text = 'Name'
    hdr_cells[1].text = 'Age'
    hdr_cells[2].text = 'Tel'
    # Append a row to the table
    new_cells = table.add_row().cells
    new_cells[0].text = 'Tom'
    new_cells[1].text = '19'
    new_cells[2].text = '12345678'

    # Add a page break
    document.add_page_break()
    # Add a paragraph on the new page
    p = document.add_paragraph('This is a paragraph in new page.')

    # Save the document
    document.save('e:/docs/demo1.docx')


if __name__ == '__main__':
    main()

Running the program writes the resulting document to e:/docs/demo1.docx.

III. Reading a document

Reading a file whose name contains Chinese characters may raise an error.

doc.paragraphs # collection of paragraphs
doc.tables # collection of tables
doc.sections # collection of sections
doc.styles # collection of styles
doc.inline_shapes # inline shapes, and so on...

Example: reading an existing Word document

# coding: utf-8
from docx import Document


def main():
    # Open the document; use the path to your own Word file
    document = Document('e:/docs/demo2.docx')

    # List of all paragraphs in the document
    ps = document.paragraphs
    # Each paragraph exposes, among others, the attributes style and text
    ps_detail = [(x.text, x.style.name) for x in ps]

    # List of all tables in the document
    tables = document.tables

    # Opening with 'w' discards any previous output
    with open('out.tmp', 'w', encoding='utf-8') as fout:
        # Write each paragraph's text and style name
        for text, style_name in ps_detail:
            fout.write(text + '\t' + style_name + '\n\n')
        # Walk every table and write the content of every cell
        for table in tables:
            for row in table.rows:
                for cell in row.cells:
                    fout.write(cell.text + '\t')
                fout.write('\n')


if __name__ == '__main__':
    main()

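IV. Other notes

1. If a paragraph contains a hyperlink, the paragraph object may not return the hyperlink's text. Convert the hyperlinks to plain text first: select the entire content of the Word document and press Ctrl+Shift+F9.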
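If editing the document by hand is not an option, the text can also be read straight from the package's XML. The sketch below is not the method described above but an alternative that uses only the standard library: it opens the .docx as a ZIP archive and joins every `w:t` element inside each `w:p`, which includes text nested in `w:hyperlink`. The helper name `paragraph_texts` and the hand-written sample package are hypothetical, and paragraphs inside tables are returned along with body paragraphs.

```python
import xml.etree.ElementTree as ET
import zipfile
from io import BytesIO

W = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'


def paragraph_texts(docx_file):
    """Return the text of each paragraph, including hyperlink text."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read('word/document.xml'))
    texts = []
    for para in root.iter('{%s}p' % W):
        # iter() descends into <w:hyperlink>, so its <w:t> text is included.
        texts.append(''.join(t.text or '' for t in para.iter('{%s}t' % W)))
    return texts


# A hand-written, minimal package used only to demonstrate the function.
xml = (
    '<w:document xmlns:w="%s"><w:body><w:p>'
    '<w:r><w:t xml:space="preserve">See </w:t></w:r>'
    '<w:hyperlink><w:r><w:t>the docs</w:t></w:r></w:hyperlink>'
    '</w:p></w:body></w:document>' % W
)
sample = BytesIO()
with zipfile.ZipFile(sample, 'w') as package:
    package.writestr('word/document.xml', xml)
sample.seek(0)

result = paragraph_texts(sample)
print(result)
```

Whether `paragraph.text` itself includes hyperlink text may depend on the python-docx version in use, so it is worth checking against your own files before relying on either approach.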

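2. Reading some files raises docx.opc.exceptions.PackageNotFoundError: Package not found. Cause: python-docx cannot read the legacy .doc format; convert the file to .docx first, either manually or with win32com: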

from win32com import client as wc


def doSaveAas():
    word = wc.Dispatch('Word.Application')
    # Source file in the legacy .doc format
    doc = word.Documents.Open(r'E:\old.doc')
    # Destination file; 12 is the .docx format (wdFormatXMLDocument)
    doc.SaveAs(r'E:\new_path.docx', 12, False, "", True, "", False, False, False, False)
    doc.Close()
    word.Quit()


doSaveAas()
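Before converting, it can help to check whether a file is really a .docx package. A .docx file is a ZIP archive, while a legacy .doc file is not, so a rough pre-check is possible with the standard library alone. The helper name `looks_like_docx` and the sample byte strings are hypothetical; this is a heuristic sketch, not a full validation of the format.

```python
import zipfile
from io import BytesIO


def looks_like_docx(file):
    """Return True if `file` is a ZIP package containing word/document.xml."""
    if not zipfile.is_zipfile(file):
        return False
    with zipfile.ZipFile(file) as package:
        return 'word/document.xml' in package.namelist()


# Legacy .doc files begin with the OLE2 signature rather than a ZIP header.
legacy = BytesIO(b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1' + b'\x00' * 64)

# A stand-in for a real .docx: a ZIP archive holding word/document.xml.
modern = BytesIO()
with zipfile.ZipFile(modern, 'w') as package:
    package.writestr('word/document.xml', '<w:document/>')

legacy_ok = looks_like_docx(legacy)
modern_ok = looks_like_docx(modern)
print(legacy_ok, modern_ok)
```

The function accepts a path as well as a file-like object, so it can be called on a file name before handing that name to `Document`.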

Link: https://www.cnblogs.com/jiayongji/p/7290410.html
