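Writing regex results from multiple HTML files to a .txt outfile

I can't write the regex results I get from multiple HTML files (the text is not in English) to a .txt file. The results print to the screen as several strings, each on a new line, but when I try to write them to an outfile, only one seemingly random string gets written. Can you help me write all the strings from all of the roughly 100 files to the outfile? My code looks like this: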
    from bs4 import BeautifulSoup
    import glob  # needed for glob.glob below; missing from the original imports
    import sys
    import string
    import re
    import os

    text = glob.glob('C:/Users/dell/Desktop/python-for-text-analysis-master/Notebooks/MEK/*')
    for filename in text:
        with open(filename, encoding='ISO-8859-1', errors="ignore") as f:
            mytext = f.read()
            soup = BeautifulSoup(mytext, "lxml")
            extracted_text = soup.getText()
            pattern = r"\ba\b\s\bleg[\w]+bb\b\s\b[\w]+\b"
            result = (", ".join(re.findall(pattern, mytext)))
            file = "C:/Users/dell/Desktop/python-for-text-analysis-master/Data/Charlie/charlie_neww.txt"
            for row in result:
                with open (file, "w", encoding="iso-8859-1", errors="ignore") as outfile:
                    print(result, end='\n', file=outfile)
2017-02-09
Lee
I don't think you mean 'print(result)'...
2017-02-09 21:46:15
Er, wait... 'result' is a string... what do you think 'for row in result' is doing? Because I suspect it isn't doing what you think it's doing.
2017-02-09 21:47:27
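Following up on the comments, here is a minimal sketch of one way to get every match into the outfile. It rests on two observations: 'result' is a single joined string, so 'for row in result' iterates over its characters, and opening the outfile with "w" inside the loops truncates it each time, so only the last write survives. The function names collect_matches and write_matches are hypothetical, not from the question, and the sketch assumes (as the posted code does) that the regex runs on the raw file contents rather than on the BeautifulSoup-extracted text.

```python
import glob
import re

# Pattern copied unchanged from the question.
PATTERN = re.compile(r"\ba\b\s\bleg[\w]+bb\b\s\b[\w]+\b")


def collect_matches(input_glob, pattern=PATTERN, encoding="ISO-8859-1"):
    """Return a list with every regex match from every file matching input_glob.

    collect_matches is a hypothetical helper name used for illustration.
    """
    matches = []
    # sorted() only makes the output order predictable.
    for filename in sorted(glob.glob(input_glob)):
        with open(filename, encoding=encoding, errors="ignore") as f:
            # findall returns a list of strings; keep it as a list
            # instead of joining it into one string.
            matches.extend(pattern.findall(f.read()))
    return matches


def write_matches(matches, out_path, encoding="ISO-8859-1"):
    """Open the output file once and write one match per line."""
    with open(out_path, "w", encoding=encoding, errors="ignore") as outfile:
        for match in matches:
            print(match, file=outfile)
```

To use it, call collect_matches with the glob pattern for the HTML folder and pass the returned list to write_matches together with the outfile path. Opening the outfile in "a" (append) mode inside the per-file loop would also keep earlier results, but it would keep adding to the file on every rerun, which is why this sketch opens it once with "w".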