Extracting images from a .docx file with Python

Environment

  • Python 3.6.9

  • python-docx (0.8.11)

Getting started

  • Install the library
```shell
$ pip3 install python-docx
```
  • Create docx_extract_img.py with the following code:
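```python
#  -*- coding: utf-8 -*-
import os
import posixpath

import docx
from docx.opc.constants import RELATIONSHIP_TYPE as RT


def main(word_path, result_path):
    """
    Extract the images embedded in a Word document.
    :param word_path: path to the .docx file
    :param result_path: directory the images are written to
    :return: None
    """
    doc = docx.Document(word_path)

    # Base name of the Word file without directory or extension, e.g. "origin".
    # os.path.basename handles both "/" and the platform separator.
    word_name = os.path.splitext(os.path.basename(word_path))[0]
    os.makedirs(result_path, exist_ok=True)

    # Relationships of the main document part (rId -> relationship).
    # Only the document body is scanned; headers and footers are separate parts.
    for rel in doc.part.rels.values():
        # Skip external targets (hyperlinks, linked images) and non-image parts;
        # target_part raises an error for external relationships.
        if rel.is_external or rel.reltype != RT.IMAGE:
            continue
        # target_ref is a package-relative path such as "media/image1.png"
        img_name = posixpath.basename(rel.target_ref)
        out_path = os.path.join(result_path, f'{word_name}_{img_name}')
        with open(out_path, 'wb') as f:
            f.write(rel.target_part.blob)


if __name__ == '__main__':
    main('origin.docx', 'images')
```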
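  • Run it. The images in origin.docx are saved to the images directory, each file name prefixed with the document name: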
```shell
$ python3 docx_extract_img.py
```
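A .docx file is a ZIP package, and Word normally stores embedded images under word/media/ inside it. As a cross-check, or where installing python-docx is not an option, the images can be copied straight out of the archive with the standard library's zipfile module. This is a different technique from the python-docx script (it reads the archive entries instead of the document's relationships), so treat it as a minimal sketch: the function name extract_media is hypothetical, and the word/media/ location is an assumption about how the package is laid out. Because headers and footers normally share the same media folder, it may also pick up images the script above skips.

```python
import os
import posixpath
import zipfile


def extract_media(word_path, result_path):
    """Copy every entry under word/media/ in a .docx archive into result_path.

    Hypothetical helper. Output names follow the same
    "<docx base name>_<image name>" pattern as docx_extract_img.py.
    Returns the list of written file paths.
    """
    word_name = os.path.splitext(os.path.basename(word_path))[0]
    written = []
    with zipfile.ZipFile(word_path) as zf:
        for name in zf.namelist():
            # Assumption: embedded images live under word/media/
            if not name.startswith('word/media/') or name.endswith('/'):
                continue
            img_name = posixpath.basename(name)  # ZIP entry names always use "/"
            os.makedirs(result_path, exist_ok=True)
            out_path = os.path.join(result_path, f'{word_name}_{img_name}')
            with open(out_path, 'wb') as f:
                f.write(zf.read(name))
            written.append(out_path)
    return written
```

Usage: extract_media('origin.docx', 'images') returns the paths it wrote, which makes it easy to compare against the output of docx_extract_img.py.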