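1. Problem background
We need to compare a text file F against several other text files under a given path. The code below was written for this, but it only outputs the comparison result for one file. It needs to be changed so that every file is compared and all results are printed.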
import difflib
import fnmatch
import os

filelist = []
f = open("D:/Desktop/data/sample/ff69c.txt")
flines = f.readlines()
path = "D:/Desktop/data/sample/sample2"
for root, dirnames, filenames in os.walk(path):
    for filename in fnmatch.filter(filenames, '*.txt'):
        filelist.append(os.path.join(root, filename))
for m in filelist:
    g = open(m, 'r')
    glines = g.readlines()
    # g.close()
    d = difflib.Differ()
    diff_list = list(d.compare(flines, glines))
    # print("".join(diff))
# The counting starts only after the loop has finished,
# so only the diff_list of the last file is counted.
n_adds, n_subs, n_eqs, n_wiered = 0, 0, 0, 0
for diff_item in diff_list:
    if diff_item[0] == '+':
        n_adds += 1
    elif diff_item[0] == '-':
        n_subs += 1
    elif diff_item[0] == ' ':
        n_eqs += 1
    else:
        n_wiered += 1
print('lines files #1: %d #2: %d' % (len(flines), len(glines)))
print('adds: %d subs: %d eqs: %d ?:%d ' % (n_adds, n_subs, n_eqs, n_wiered))
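2. Solutions
Method 1:
The problem is that diff_list is overwritten on every pass through the loop, so only the diff for the last file read survives to the counting step. We can change the code to append each file's differences to diff_list instead of overwriting it.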
import difflib
import fnmatch
import os

filelist = []
with open("D:/Desktop/data/sample/ff69c.txt") as f:
    flines = f.readlines()
path = "D:/Desktop/data/sample/sample2"
for root, dirnames, filenames in os.walk(path):
    for filename in fnmatch.filter(filenames, '*.txt'):
        filelist.append(os.path.join(root, filename))

diff_list = []  # Initialize an empty list to store all differences
for m in filelist:
    with open(m, 'r') as g:
        glines = g.readlines()
    d = difflib.Differ()
    diff_list.extend(d.compare(flines, glines))  # Append differences to diff_list

# Count the diff lines of all comparisons together
n_adds, n_subs, n_eqs, n_wiered = 0, 0, 0, 0
for diff_item in diff_list:
    if diff_item[0] == '+':
        n_adds += 1
    elif diff_item[0] == '-':
        n_subs += 1
    elif diff_item[0] == ' ':
        n_eqs += 1
    else:
        n_wiered += 1
# glines only holds the last file read, so report the number of files instead
print('lines file #1: %d, files compared: %d' % (len(flines), len(filelist)))
print('adds: %d subs: %d eqs: %d ?:%d ' % (n_adds, n_subs, n_eqs, n_wiered))
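With this change every file is compared, and the printed counts are totals across all comparisons. To get a separate result for each file instead, the counting and printing would have to move inside the for m in filelist loop.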
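If a separate summary per file is what is actually wanted, one option is to do the counting inside the loop. The following is only a minimal sketch under that assumption: the helper names diff_counts, compare_against_all and report are hypothetical and do not appear in the original code, and collections.Counter stands in for the four manual counters.

```python
import difflib
import fnmatch
import os
from collections import Counter


def diff_counts(a_lines, b_lines):
    """Count Differ output lines by their first character ('+', '-', ' ', '?')."""
    counts = Counter(line[0] for line in difflib.Differ().compare(a_lines, b_lines))
    return {
        'adds': counts['+'],
        'subs': counts['-'],
        'eqs': counts[' '],
        'hints': counts['?'],  # '?' lines are Differ's intraline hints
    }


def compare_against_all(reference_path, folder, pattern='*.txt'):
    """Yield (path, line count, counts) for every matching file under folder."""
    with open(reference_path) as f:
        flines = f.readlines()
    for root, dirnames, filenames in os.walk(folder):
        for filename in fnmatch.filter(filenames, pattern):
            other = os.path.join(root, filename)
            with open(other) as g:
                glines = g.readlines()
            # Counting happens here, once per file, so nothing is overwritten
            yield other, len(glines), diff_counts(flines, glines)


def report(reference_path, folder):
    """Print one summary line per compared file."""
    for other, n_lines, counts in compare_against_all(reference_path, folder):
        print('%s (%d lines) adds: %d subs: %d eqs: %d ?: %d' % (
            other, n_lines, counts['adds'], counts['subs'],
            counts['eqs'], counts['hints']))
```

Calling report("D:/Desktop/data/sample/ff69c.txt", "D:/Desktop/data/sample/sample2") would then print one line for each .txt file found, rather than a single combined total.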
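Method 2:
Another approach is to use the filecmp.cmp function to compare the files. filecmp.cmp takes two file paths as arguments and returns a boolean indicating whether the two files are equal.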
import filecmp
import fnmatch
import os

filelist = []
reference = "D:/Desktop/data/sample/ff69c.txt"  # file F
path = "D:/Desktop/data/sample/sample2"
for root, dirnames, filenames in os.walk(path):
    for filename in fnmatch.filter(filenames, '*.txt'):
        filelist.append(os.path.join(root, filename))
# Compare file F against every file that was found
for m in filelist:
    # shallow=False forces a comparison of the file contents
    if filecmp.cmp(reference, m, shallow=False):
        print(f"{reference} and {m} are equal.")
    else:
        print(f"{reference} and {m} are different.")
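With shallow=False, filecmp.cmp still reads both files, but it only compares them byte by byte and does not compute a line-level diff, so it is usually faster than difflib. The trade-off is that it only reports whether two files are identical, not which lines were added or removed. Because the comparison is byte-wise, files that differ only in line endings or encoding are also reported as different.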
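The two methods can also be combined: filecmp.cmp serves as a cheap first check, and difflib runs only on files that actually differ. The sketch below illustrates that idea and is not part of the original solution. The function name describe_difference is hypothetical, and difflib.unified_diff is used here in place of Differ simply because its output is more compact.

```python
import difflib
import filecmp


def describe_difference(reference_path, other_path):
    """Return [] for byte-identical files, otherwise the unified diff lines."""
    # Byte-for-byte check first; identical files skip the costlier diff
    if filecmp.cmp(reference_path, other_path, shallow=False):
        return []
    with open(reference_path) as f:
        flines = f.readlines()
    with open(other_path) as g:
        glines = g.readlines()
    # Text mode normalizes line endings, so files that differ only in
    # line endings can still produce an empty diff here
    return list(difflib.unified_diff(flines, glines,
                                     fromfile=reference_path,
                                     tofile=other_path))
```

An empty list means there is no line-level difference to report; otherwise the returned lines can be written out with print("".join(lines)).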