Notes from 《Python高性能(第2版)》 (Python High Performance, 2nd Edition)
1. Data Structures and Algorithms
Lists and deques
list
Backed by an array: inserting or deleting at the head costs O(n), inserting or deleting at the tail is O(1), and accessing any element by index is O(1).
deque
Backed by a doubly linked list: operations at both ends are O(1); the trade-off is that accessing an element in the middle is O(n).
- To look up an element in a sorted list, use binary search via the bisect module, which takes O(log n); see the sketch below.
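A minimal sketch of binary search with the standard-library bisect module (the example list is made up for illustration):
import bisect

collection = [1, 3, 5, 7, 9]            # must already be sorted

# Find the index where 6 would go to keep the list sorted: O(log n)
idx = bisect.bisect_left(collection, 6)
print(idx)                               # 3

# Insert while keeping the list sorted (the search is O(log n),
# but shifting the underlying array makes the insert itself O(n))
bisect.insort(collection, 6)
print(collection)                        # [1, 3, 5, 6, 7, 9]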
Dictionaries and inverted indexes
Finding all documents among N that contain the word X: [doc for doc in docs if 'X' in doc]
When N is very large, this linear scan is inefficient.
Instead, build an index once that maps word : [ids of the docs containing that word];
after that, each query is a single dictionary lookup, O(1) on average. A sketch follows below.
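A minimal sketch of building such an inverted index (the sample documents and the name index are illustrative, not from the book):
from collections import defaultdict

docs = {0: 'the quick brown fox', 1: 'lazy dog', 2: 'quick dog'}

# Build once: word -> set of ids of the documents that contain it
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

# Every later query is one dictionary lookup, O(1) on average
print(index['quick'])   # {0, 2}
print(index['dog'])     # {1, 2}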
Sets
Also hash-based under the hood, so membership tests are O(1) on average; a small sketch follows.
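A quick illustrative sketch (made-up data, not from the book) of the operations this makes cheap:
a = {1, 3, 5, 8}
b = {2, 3, 8}

print(3 in a)      # membership test: O(1) on average
print(a & b)       # intersection: the elements 3 and 8
print(a | b)       # union of both sets
print(a - b)       # difference: elements only in a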
Heaps
Inserting and removing an element both take O(log n).
heapq
>>> import heapq
>>> c = [10,1,3,2,5]
>>> heapq.heapify(c)
>>> c
[1, 2, 3, 10, 5]
>>> heapq.heappop(c) # pop the smallest value: O(log n) (peeking at c[0] is O(1))
1
>>> heapq.heappush(c, 9)
>>> c
[2, 5, 3, 10, 9]
PriorityQueue: a priority queue; it is thread-safe (unlike a bare heapq heap, which is not synchronized)
>>> from queue import PriorityQueue
>>> q = PriorityQueue()
>>> for x in [2, 5, 10, 9, 1]:
...     q.put(x)
...
>>> q
<queue.PriorityQueue object at 0x0000017DD91EED08>
>>> while not q.empty():
...     q.get() # retrieve the smallest value
...
1
2
5
9
10
If the queued items are tuples or lists, they are ordered by their first element; ties are broken by the second element, and so on.
>>> q.put([1,2])
>>> q.put([1,3])
>>> q.put([1,-1])
>>> q.get()
[1, -1]
>>> q.get()
[1, 2]
>>> q.get()
[1, 3]
Tries (prefix trees)
Not implemented in the standard library, but third-party packages provide implementations.
A trie supports fast lookups of strings by prefix and can be used for text autocompletion.
pip install patricia-trie
There are also optimized trie libraries written in C, such as datrie and marisa-trie:
https://pypi.org/project/datrie/
https://pypi.org/project/marisa-trie/0.7.1/
from random import choice
from patricia import trie
from string import ascii_lowercase
import cProfile


def random_string(length):
    return ''.join(choice(ascii_lowercase) for i in range(length))


def gen_strings():
    strings = [random_string(32) for _ in range(100000)]
    return strings


strings = gen_strings()
strings_dict = {s: 0 for s in strings}
strings_trie = trie(**strings_dict)


def func1():
    # linear scan over the plain list
    matches = [s for s in strings if s.startswith('abc')]
    print(matches)


def func2():
    # prefix lookup in the trie
    matches = list(strings_trie.iter('abc'))
    print(matches)


if __name__ == "__main__":
    cProfile.run('func1()', sort='tottime')
    cProfile.run('func2()', sort='tottime')
D:\ProgramData\Anaconda3\envs\cv\python.exe D:/gitcode/Python_learning/Python-High-Performance-Second-Edition-master/Chapter02/trie.py
['abcimostjvrhrhuswjhshhsnoyawtuaq', 'abcymrvuyenkyuppjuwnstzpaxarjpjo', 'abcbltxrvuktanyoiezntijmaryojbvg', 'abckxrzwjmdjaekmiqquezgwjabdhohe', 'abcecoluitahquyqxoidjiwaupatgszd', 'abcppzcrqjxgzuoiuskethybnlgbougt', 'abcbaznirxvyhnvabmbhwmsefdtultpj']
         100006 function calls in 0.036 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100000    0.019    0.000    0.019    0.000 {method 'startswith' of 'str' objects}
        1    0.017    0.017    0.036    0.036 trie.py:18(<listcomp>)
        1    0.000    0.000    0.036    0.036 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        1    0.000    0.000    0.036    0.036 trie.py:17(func1)
        1    0.000    0.000    0.036    0.036 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

['abcimostjvrhrhuswjhshhsnoyawtuaq', 'abcymrvuyenkyuppjuwnstzpaxarjpjo', 'abcbltxrvuktanyoiezntijmaryojbvg', 'abcbaznirxvyhnvabmbhwmsefdtultpj', 'abckxrzwjmdjaekmiqquezgwjabdhohe', 'abcecoluitahquyqxoidjiwaupatgszd', 'abcppzcrqjxgzuoiuskethybnlgbougt']
         83 function calls (66 primitive calls) in 0.000 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
     25/8    0.000    0.000    0.000    0.000 patricia.py:57(_items)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        3    0.000    0.000    0.000    0.000 patricia.py:161(_next)
        1    0.000    0.000    0.000    0.000 trie.py:21(func2)
        1    0.000    0.000    0.000    0.000 patricia.py:354(iter)
        8    0.000    0.000    0.000    0.000 patricia.py:51(_keys)
        7    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
        9    0.000    0.000    0.000    0.000 {method 'values' of 'dict' objects}
        8    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
        1    0.000    0.000    0.000    0.000 patricia.py:366(_accumulate)
        8    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        3    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
        5    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
As the output shows, the trie version is at least several tens of times faster than the linear scan.
2. Caching
lru_cache
Example: computing the Fibonacci sequence, where caching the recursive calls amounts to dynamic programming.
Python provides the functools.lru_cache decorator; the cache size can be set with maxsize.
>>> from functools import lru_cache
>>> @lru_cache()
... def sum2(a, b):
...     print('calculate')
...     return a+b
...
>>> sum2(1,2)
calculate
3
>>> sum2(1,2) # no recomputation; the cached result is returned directly
3
>>> sum2(1,3)
calculate
4
Inspecting cache statistics:
>>> sum2.cache_info()
CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
>>> sum2.cache_clear()
>>> sum2.cache_info()
CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)
from functools import lru_cache
import cProfile
@lru_cache(maxsize=None)
def fibonacci_mem(n):
    if n < 1:
        return 1
    else:
        return fibonacci_mem(n - 1) + fibonacci_mem(n - 2)


def fibonacci(n):
    if n < 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)


if __name__ == '__main__':
    cProfile.run('fibonacci(30)', sort='tottime')
    print(fibonacci_mem)
    cProfile.run('fibonacci_mem(30)', sort='tottime')
Output:
D:\ProgramData\Anaconda3\envs\cv\python.exe D:/gitcode/Python_learning/Python-High-Performance-Second-Edition-master/Chapter02/cache.py
         4356620 function calls (4 primitive calls) in 1.695 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
4356617/1    1.695    0.000    1.695    1.695 cache.py:8(fibonacci)
        1    0.000    0.000    1.695    1.695 {built-in method builtins.exec}
        1    0.000    0.000    1.695    1.695 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

<functools._lru_cache_wrapper object at 0x000001882C23E550>
         35 function calls (4 primitive calls) in 0.000 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
     32/1    0.000    0.000    0.000    0.000 cache.py:3(fibonacci_mem)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
joblib
Provides simple disk-based caching of function results.
pip install joblib
import os
from joblib import Memory
# the first argument is the cache directory (older joblib called it `cachedir`,
# newer versions call it `location`; passing it positionally works for both)
mem = Memory(os.path.join(os.path.dirname(__file__), 'cache_dir'))


@mem.cache
def fibonacci(n):
    if n < 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)


if __name__ == '__main__':
    fibonacci(32)
3. Comprehensions and Generators
Both can replace explicit for loops and are generally faster than the equivalent for loop.
When iterating over a generator object, only one result is produced at a time, which keeps memory usage low.
In Jupyter:
%load_ext memory_profiler
import os

numbers = range(1000000)


def map_comprehension(numbers):
    a = [n * 2 for n in numbers]
    b = [n ** 2 for n in a]
    c = [n ** 0.33 for n in b]
    return max(c)


%memit map_comprehension(numbers)
peak memory: 236.46 MiB, increment: 112.98 MiB
def map_normal(numbers):
    a = map(lambda n: n * 2, numbers)
    b = map(lambda n: n ** 2, a)
    c = map(lambda n: n ** 0.33, b)
    return max(c)


%memit map_normal(numbers)
peak memory: 123.73 MiB, increment: 0.00 MiB
As you can see, the generator-based version uses far less memory, because no intermediate lists are materialized; see the generator-expression sketch below.
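For comparison, a sketch of the same pipeline written with generator expressions instead of map (the name map_genexp is mine, not the book's); it likewise avoids building the intermediate lists:
def map_genexp(numbers):
    a = (n * 2 for n in numbers)     # generator expressions: nothing is computed yet
    b = (n ** 2 for n in a)
    c = (n ** 0.33 for n in b)
    return max(c)                    # values are produced one at a time here


%memit map_genexp(numbers)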