I am writing a simple application that splits a large text file into smaller files, and I have written two versions of it, one using lists and the other using generators. I profiled both versions with the memory_profiler module, and it clearly showed the better memory efficiency of the generator version. Strangely, though, when the generator version is run under the profiler, its execution time increases. The demo below shows what I mean.
The version using lists:
    from memory_profiler import profile

    @profile()
    def main():
        file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
        input_file = open(file_name).readlines()
        num_lines_orig = len(input_file)
        parts = int(input("Enter the number of parts you want to split in: "))
        output_files = [(file_name + str(i)) for i in range(1, parts + 1)]
        st = 0
        p = int(num_lines_orig / parts)
        ed = p
        for i in range(parts - 1):
            with open(output_files[i], "w") as OF:
                OF.writelines(input_file[st:ed])
            st = ed
            ed = st + p

        with open(output_files[-1], "w") as OF:
            OF.writelines(input_file[st:])

    if __name__ == "__main__":
        main()
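As an aside, the `st`/`ed` window arithmetic above (`p` lines per part, remainder into the last part) can be checked in isolation on made-up data. A minimal sketch, not part of the original script:

```python
# Sketch: verify the slicing logic used in the list version on a small fake file.
lines = [f"line {i}\n" for i in range(10)]
parts = 3
p = len(lines) // parts  # lines per part, as in the script

chunks = []
st, ed = 0, p
for _ in range(parts - 1):
    chunks.append(lines[st:ed])
    st, ed = ed, ed + p
chunks.append(lines[st:])  # the last part takes the remainder

assert sum(chunks, []) == lines              # nothing lost or duplicated
assert [len(c) for c in chunks] == [3, 3, 4]  # remainder lands in the last part
```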
When run with the profiler:
    $ time py36 Splitting\ text\ files_BAD_usingLists.py
    Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
    Enter the number of parts you want to split in: 3
    Filename: Splitting text files_BAD_usingLists.py

    Line #    Mem usage    Increment   Line Contents
    ================================================
         6     47.8 MiB      0.0 MiB   @profile()
         7                             def main():
         8     47.8 MiB      0.0 MiB       file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
         9    107.3 MiB     59.5 MiB       input_file = open(file_name).readlines()
        10    107.3 MiB      0.0 MiB       num_lines_orig = len(input_file)
        11    107.3 MiB      0.0 MiB       parts = int(input("Enter the number of parts you want to split in: "))
        12    107.3 MiB      0.0 MiB       output_files = [(file_name + str(i)) for i in range(1, parts + 1)]
        13    107.3 MiB      0.0 MiB       st = 0
        14    107.3 MiB      0.0 MiB       p = int(num_lines_orig / parts)
        15    107.3 MiB      0.0 MiB       ed = p
        16    108.1 MiB      0.7 MiB       for i in range(parts - 1):
        17    107.6 MiB     -0.5 MiB           with open(output_files[i], "w") as OF:
        18    108.1 MiB      0.5 MiB               OF.writelines(input_file[st:ed])
        19    108.1 MiB      0.0 MiB           st = ed
        20    108.1 MiB      0.0 MiB           ed = st + p
        21
        22    108.1 MiB      0.0 MiB       with open(output_files[-1], "w") as OF:
        23    108.1 MiB      0.0 MiB           OF.writelines(input_file[st:])

    real    0m6.115s
    user    0m0.764s
    sys     0m0.052s
Running without the profiler:
    $ time py36 Splitting\ text\ files_BAD_usingLists.py
    Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
    Enter the number of parts you want to split in: 3

    real    0m5.916s
    user    0m0.696s
    sys     0m0.080s
Now the version using generators:
    from memory_profiler import profile

    @profile()
    def main():
        file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
        input_file = open(file_name)
        num_lines_orig = sum(1 for _ in input_file)
        input_file.seek(0)
        parts = int(input("Enter the number of parts you want to split in: "))
        output_files = ((file_name + str(i)) for i in range(1, parts + 1))
        st = 0
        p = int(num_lines_orig / parts)
        ed = p
        for i in range(parts - 1):
            file = next(output_files)
            with open(file, "w") as OF:
                for _ in range(st, ed):
                    OF.writelines(input_file.readline())
            st = ed
            ed = st + p

        if num_lines_orig - ed:
            file = next(output_files)
            with open(file, "w") as OF:
                for _ in range(st, ed):
                    OF.writelines(input_file.readline())

    if __name__ == "__main__":
        main()
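For what it's worth, the inner `for _ in range(st, ed): OF.writelines(input_file.readline())` loop can also be expressed with `itertools.islice`, which pulls the same number of lines lazily. A sketch on an in-memory file, not the original script:

```python
from itertools import islice
import io

def split_lazily(src, num_lines, parts):
    """Yield each part's lines without ever holding the whole file in memory."""
    p = num_lines // parts
    for _ in range(parts - 1):
        yield list(islice(src, p))  # consume exactly the next p lines
    yield list(src)                 # remainder goes into the last part

src = io.StringIO("".join(f"line {i}\n" for i in range(10)))
sizes = [len(part) for part in split_lazily(src, 10, 3)]
assert sizes == [3, 3, 4]
```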
When run with the profiler option:
    $ time py36 -m memory_profiler Splitting\ text\ files_GOOD_usingGenerators.py
    Enter the full path of file you want to split into smaller inputFiles: /apps/nttech/rbhanot/Downloads/test.txt
    Enter the number of parts you want to split in: 3
    Filename: Splitting text files_GOOD_usingGenerators.py

    Line #    Mem usage    Increment   Line Contents
    ================================================
         4   47.988 MiB    0.000 MiB   @profile()
         5                             def main():
         6   47.988 MiB    0.000 MiB       file_name = input("Enter the full path of file you want to split into smaller inputFiles: ")
         7   47.988 MiB    0.000 MiB       input_file = open(file_name)
         8   47.988 MiB    0.000 MiB       num_lines_orig = sum(1 for _ in input_file)
         9   47.988 MiB    0.000 MiB       input_file.seek(0)
        10   47.988 MiB    0.000 MiB       parts = int(input("Enter the number of parts you want to split in: "))
        11   48.703 MiB    0.715 MiB       output_files = ((file_name + str(i)) for i in range(1, parts + 1))
        12   47.988 MiB   -0.715 MiB       st = 0
        13   47.988 MiB    0.000 MiB       p = int(num_lines_orig / parts)
        14   47.988 MiB    0.000 MiB       ed = p
        15   48.703 MiB    0.715 MiB       for i in range(parts - 1):
        16   48.703 MiB    0.000 MiB           file = next(output_files)
        17   48.703 MiB    0.000 MiB           with open(file, "w") as OF:
        18   48.703 MiB    0.000 MiB               for _ in range(st, ed):
        19   48.703 MiB    0.000 MiB                   OF.writelines(input_file.readline())
        20
        21   48.703 MiB    0.000 MiB           st = ed
        22   48.703 MiB    0.000 MiB           ed = st + p
        23   48.703 MiB    0.000 MiB       if num_lines_orig - ed
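A hedged guess at where the slowdown comes from: memory_profiler records usage line by line through a trace hook, and any trace hook adds a fixed cost to every executed line, which hits the generator version hardest because it executes far more lines (one `readline()` per line of input). The stdlib `sys.settrace` shows this per-line cost in isolation; the sketch below uses only a do-nothing tracer, not memory_profiler itself:

```python
import sys
import time

def work():
    # A line-heavy loop, standing in for the per-line readline() loop above.
    total = 0
    for i in range(200_000):
        total += i
    return total

t0 = time.perf_counter()
work()
plain = time.perf_counter() - t0

def tracer(frame, event, arg):
    return tracer  # returning a local tracer enables 'line' events; do nothing else

sys.settrace(tracer)
t0 = time.perf_counter()
work()
traced = time.perf_counter() - t0
sys.settrace(None)

# Even a tracer that does no work slows every traced line;
# memory_profiler additionally samples memory on each event.
assert traced > plain
```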