Using Python to remove spaces and blank lines from the text files in a directory

1. Approach:

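To remove the spaces from every txt file in a directory, we loop over all the files in that directory,
read each line, remove the spaces from it, and then save the result back to the same file.
To handle spaces we use a regular expression, which removes spaces in the middle of the string as well as at both ends:
line = re.sub(r'\s+', '', line)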
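To make the difference from str.strip() concrete, here is a small sketch. The sample string is made up for illustration. Note that \s also matches the trailing newline, so line breaks have to be added back when the file is written.

```python
import re

sample = "  hello   wor ld \n"  # made-up sample line

stripped = sample.strip()                # trims whitespace at both ends only
collapsed = re.sub(r"\s+", "", sample)   # removes every whitespace run, including "\n"

print(repr(stripped))
print(repr(collapsed))
```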

To remove blank lines, we check len(line.strip()): a length of 0 means the line is blank.
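A minimal sketch of that check, using a made-up list of lines in place of real file contents:

```python
lines = ["first\n", "\n", "   \t\n", "second\n"]  # made-up sample data

# Keep only the lines that still have content after stripping whitespace.
non_blank = [line for line in lines if len(line.strip()) != 0]

print(non_blank)
```

Lines that contain only spaces or tabs count as blank here, because strip() removes them before the length is measured.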

We collect the processed lines in a list, join them with newlines, and write the result back to the file:
file.write('\n'.join(lineList))
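The write step can be tried without touching a real file. The sketch below uses io.StringIO as an in-memory stand-in for the opened file, and the list contents are made up. Since the join already produces a single string, write() is enough; writelines() accepts any iterable of strings and adds no separators of its own.

```python
import io

line_list = ["alpha", "beta", "gamma"]  # made-up cleaned lines, no newlines left

buffer = io.StringIO()  # in-memory stand-in for an opened file
buffer.write("\n".join(line_list))  # join puts one "\n" between neighbouring lines

print(repr(buffer.getvalue()))
```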

2. Code:

A version found online:

import os
import glob

def remove_spaces_and_empty_lines(directory):
    for filename in glob.glob(os.path.join(directory, '*.txt')):  # find every .txt file in the directory
        with open(filename, 'r') as file:
            lines = file.readlines()  # read all lines

        # strip spaces and drop blank lines
        new_lines = [line.strip() for line in lines if line.strip()]

        with open(filename, 'w') as file:  # write the file back
            file.writelines(new_lines)

# set the directory you want to process
directory = "/path/to/your/directory"
remove_spaces_and_empty_lines(directory)

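This version cannot remove spaces in the middle of a line, because strip() only trims both ends. It also does not give the result we want for line breaks: strip() removes the newline at the end of each line and writelines() does not add one, so the remaining lines run together into a single line. We use a regular expression instead.

The revised version: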

import os
import re

# directory path
path = '/path/to/your/directory/'

# walk through every file in the directory
for filename in os.listdir(path):
    # only handle .txt files
    if filename.endswith('.txt'):
        file_path = os.path.join(path, filename)
        # read the file
        with open(file_path, 'r') as file:
            lines = file.readlines()
        lineList = []
        for line in lines:
            # remove every whitespace character, including the trailing newline
            line = re.sub(r'\s+', '', line)
            # skip lines that are empty after cleaning
            if len(line) != 0:
                lineList.append(line)
        # rewrite the file
        with open(file_path, 'w') as file:
            file.write('\n'.join(lineList))
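The same steps can also be wrapped in a function and tried on a throwaway directory before pointing it at real files. This is only a sketch under a few assumptions: the function name clean_txt_files is hypothetical, the files are assumed to be UTF-8, and pathlib is used here in place of os.listdir.

```python
import re
import tempfile
from pathlib import Path


def clean_txt_files(directory):
    """Remove all whitespace and blank lines from every .txt file in directory."""
    for file_path in Path(directory).glob("*.txt"):
        text = file_path.read_text(encoding="utf-8")  # assumption: files are UTF-8
        cleaned = []
        for line in text.splitlines():
            line = re.sub(r"\s+", "", line)
            if line:  # skip lines that are empty after cleaning
                cleaned.append(line)
        file_path.write_text("\n".join(cleaned), encoding="utf-8")


# Try it on a temporary directory so that no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text(" a b \n\n   \nc  d\n", encoding="utf-8")
    clean_txt_files(tmp)
    result = sample.read_text(encoding="utf-8")

print(repr(result))
```

Because the files are overwritten in place, it is worth running this on a copy of the directory first.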

Handling an error:

(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte

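The message says that the 'utf-8' codec could not decode the byte 0xc0 at position 0 because it is an invalid start byte. What does that mean? In short, the file contains bytes that are not valid UTF-8, which usually means it was saved in a different encoding. Changing the file's encoding to UTF-8 in an editor such as Notepad fixes the problem.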
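If there are too many files to re-save by hand, the conversion can also be done in Python. The sketch below is one possible approach, not the method used above: the helper name read_text_with_fallback is hypothetical, and GBK as the fallback encoding is purely an assumption, since the error message does not say which encoding the file really uses.

```python
import tempfile
from pathlib import Path


def read_text_with_fallback(file_path, encodings=("utf-8", "gbk")):
    """Return the file's text using the first encoding that decodes it."""
    raw = Path(file_path).read_bytes()
    for encoding in encodings:  # assumption: the real encoding is in this tuple
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode {file_path}")


with tempfile.TemporaryDirectory() as tmp:
    legacy = Path(tmp) / "legacy.txt"
    legacy.write_bytes("中文 文本".encode("gbk"))  # simulate a file that is not UTF-8
    text = read_text_with_fallback(legacy)
    legacy.write_text(text, encoding="utf-8")  # re-save the file as UTF-8
    roundtrip = legacy.read_text(encoding="utf-8")

print(text)
```

A fallback encoding can decode bytes without error and still yield the wrong characters, so it is worth checking a converted file by eye before processing a whole directory.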