01 Sets: No Two Tigers on One Mountain

1. Informally, a set is an unordered collection of unique elements: a container for storing data in which no value appears twice.

1) Elements in a set cannot repeat:

For example:

s = {1,2,3,4,1,2,3}
print(s,type(s))
# Output
{1, 2, 3, 4} <class 'set'>

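2) How to define an empty set:

s1 = {}                   # this creates an empty dict, not a set
print(s1,type(s1))
# Output: {} <class 'dict'>
s2 = set([])              # creates an empty set
print(s2,type(s2))
# Output: set() <class 'set'>
s3 = set()                # also creates an empty set
print(s3,type(s3))
# Output: set() <class 'set'>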

2. Creating a set: use braces { } or the set() function.

Note:

An empty set must be created with set(), not { }, because { } creates an empty dict.

s1 = {}
print(s1, type(s1))
s2 = set([])
print(s2, type(s2))
s3 = set()
print(s3, type(s3))

Output:

{} <class 'dict'>
set() <class 'set'>
set() <class 'set'>

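Elements of a set must be immutable (hashable) data types. A list is mutable, so it cannot be a set element.

s3 = {1, 3.14, True, 'hello', [1, 2, 3], (1, 2, 3)}      # a list is a mutable data type
Traceback (most recent call last):
  File "E:\software-python\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2961, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-7c24614d5a6b>", line 1, in <module>
    s3 = {1, 3.14, True, 'hello', [1, 2, 3], (1, 2, 3)}
TypeError: unhashable type: 'list'     # this definition fails: set elements must be immutable

s3 = {1, 3.14, True, 'hello',(1, 2, 3)}          # with the mutable list removed, the set is created successfully
print(s3,type(s3))
{1, 3.14, 'hello', (1, 2, 3)} <class 'set'>

True is missing from the output because True == 1 and both hash to the same value, so the set treats True as a duplicate of 1.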

The set() function converts a list, tuple, or string into a set.

s4 = set('abracadabra')
print(s4,type(s4))
{'r', 'c', 'b', 'd', 'a'} <class 'set'>

s5 = {'apple','orange','apple','pear','orange','banana'}
print(s5,type(s5))
{'orange', 'banana', 'pear', 'apple'} <class 'set'>

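3. Set characteristics:

A set is an unordered data type. The element added last is not necessarily stored last.

Its behavior can be examined through indexing, slicing, repetition, concatenation, the membership operators, and the for loop.

Of these, a set supports only the membership operators and for-loop iteration. Indexing, slicing, repetition, and concatenation are not supported.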

s = {1,3,5,7}
print(1 in s)
True
print(1 not in s)
False
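
As a minimal sketch of the characteristics just listed, the snippet below checks that a set supports membership tests and for-loop iteration while indexing raises TypeError. The variable names (has_one, total, index_error) are illustrative and not from the original text.

```python
s = {1, 3, 5, 7}

# Membership tests work on sets.
has_one = 1 in s

# Iteration works too, but the iteration order should not be relied on.
total = 0
for item in s:
    total += item

# Indexing is not supported: a set has no positions.
try:
    s[0]
    index_error = None
except TypeError as exc:
    index_error = type(exc).__name__

print(has_one, total, index_error)  # True 16 TypeError
```

Slicing (s[0:2]), repetition (s * 2), and concatenation (s + s) fail with TypeError in the same way.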

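4. Built-in set methods:

1) Adding: a set is a mutable, unordered data type, so the order in which elements are added can differ from the order in which they are stored.

(add: adds one element; update: adds several elements)

s = {2,3,4,3,5}
s.add(1)                  # add one element
print(s)
{1, 2, 3, 4, 5}
s.update({7,8,9})         # add several elements
print(s)
{1, 2, 3, 4, 5, 7, 8, 9}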

2) Removing:

1> pop (removes an arbitrary element and returns it; raises KeyError if the set is empty)

s = {6,7,3,1,2,4}
print(s)
{1, 2, 3, 4, 6, 7}
s.pop()
Out[41]: 1
print(s)
{2, 3, 4, 6, 7}

2> remove (if the element exists, it is removed; if it does not exist, KeyError is raised)

s = {6,7,3,1,2,3}
print(s)
{1, 2, 3, 6, 7}
s.remove(6)             # remove the specified element
print(s)
{1, 2, 3, 7}

3> discard (if the element exists, it is removed; if it does not exist, nothing happens). It removes only one element per call.

s1 = {1,2,3,4,5}
s1.discard(3,4)               # discard accepts only one element at a time
Traceback (most recent call last):
  File "E:\software-python\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2961, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-44fe7abbc30b>", line 1, in <module>
    s1.discard(3,4)
TypeError: discard() takes exactly one argument (2 given)
s1 = {1,2,3,4,5}
s1.discard(3)
print(s1)
{1, 2, 4, 5}
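
To make the difference between remove and discard concrete, here is a small sketch that calls both with an element that is not in the set. The value 99 and the flag name removed are illustrative assumptions.

```python
s = {1, 2, 3}

# discard() on a missing element is a silent no-op.
s.discard(99)

# remove() on a missing element raises KeyError.
try:
    s.remove(99)
    removed = True
except KeyError:
    removed = False

print(s, removed)  # {1, 2, 3} False
```

A reasonable rule of thumb: use remove when a missing element indicates a bug, and discard when absence is acceptable.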

5. Set use cases (deduplicating a list)

Case 1: deduplicating URLs:

>Method 1: iterate over the list and add items with .append()

urls = [
    'http://www.baidu.com',
    'http://www.qq.com',
    'http://www.qq.com',
    'http://www.163.com',
    'http://www.csdn.com',
    'http://www.csdn.com',
]
# Stores the deduplicated URLs
analyze_urls = []
# Visit every URL in turn
for url in urls:
    # If url is not yet in analyze_urls, append it to the end of the list.
    # If url is already in analyze_urls, do nothing.
    if url not in analyze_urls:
        analyze_urls.append(url)
print("Deduplicated URLs: ", analyze_urls)

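>Method 2: deduplicate using the properties of a set:

The code is as follows (defining urls directly as a set literal {...} gives the same result):

urls = [
    'http://www.baidu.com',
    'http://www.qq.com',
    'http://www.qq.com',
    'http://www.163.com',
    'http://www.csdn.com',
    'http://www.csdn.com',
]
print("Deduplicated URLs: ", set(urls))          # the result is a set, so the original order is not preserved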

Case 2: check whether a list contains duplicates by comparing its length before and after deduplication

nums = [1,2,3,4,2,3]           # if nums were defined as a set {...}, the result would always be False
# Duplicates exist: compare the length of nums with its length after deduplication.
#       If they differ, there are duplicates: True
#       If they match, there are no duplicates: False
print(len(nums) != len(set(nums)))  # True

nums = [1, 2, 3, 4]
print(len(nums) != len(set(nums)))  # False

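Case 3: Huawei written-test programming problem: Mingming's random numbers (deduplicate and sort)

Problem: Mingming wants to invite some classmates to take part in a survey. To keep the experiment objective, he first uses a computer to generate N random integers between 1 and 1000 (N ≤ 1000). Among repeated numbers only one is kept and the other copies are removed; different numbers correspond to different student IDs. He then sorts the numbers in descending order and visits the classmates in that order. Help Mingming with the "deduplicate" and "sort" steps (a single test case may contain several groups of data, so handle that correctly).

Approach:

1). Generate N random integers between 1 and 1000 (N ≤ 1000)
2). Deduplicate: keep only one copy of each repeated number
3). Sort in descending order

import random

# 2). Deduplicate: start from an empty set, which keeps only one copy of each number
nums = set()
N = int(input('N: '))
# 1). Generate N random integers between 1 and 1000 (N ≤ 1000)
for count in range(N):
    num = random.randint(1, 1000)
    nums.add(num)
# 3). Sort in descending order. li.sort() works only on lists; sorted() accepts any iterable.
print(sorted(nums, reverse=True))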

Case 4: intersection of two arrays:

Example 1:

Input: nums1 = [1,2,2,1], nums2 = [2,2]
Output: [2]

Example 2:

Input: nums1 = [4,9,5], nums2 = [9,4,9,8,4]
Output: [9,4]
Notes: every element in the output must be unique.
The order of the output does not matter.

class Solution:
    def intersection(self, nums1, nums2):
        a = set(nums1)
        b = set(nums2)
        c = a & b          # set intersection
        return list(c)
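
As a usage sketch for the intersection approach, the standalone function below (a hypothetical helper, not part of the original Solution class) runs the two examples from the problem statement. The results are sorted only so they are easy to compare, since set order is not guaranteed.

```python
def intersection(nums1, nums2):
    """Return the unique elements that appear in both lists."""
    return list(set(nums1) & set(nums2))

# Sort the results only to make them easy to compare; order is not guaranteed.
print(sorted(intersection([1, 2, 2, 1], [2, 2])))        # [2]
print(sorted(intersection([4, 9, 5], [9, 4, 9, 8, 4])))  # [4, 9]
```

The & operator is equivalent to calling set(nums1).intersection(nums2).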

02 frozenset

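frozenset is the immutable version of set, so it supports none of the set methods that modify the set itself (such as add, remove, discard, and the xxx_update methods);
every set method that does not modify the set is supported by frozenset.

These frozenset methods behave exactly like the set methods of the same name.

frozenset serves two main purposes:

When the elements of a set do not need to change, using frozenset instead of set is safer.
When an API requires an immutable object, frozenset must be used instead of set. For example, a dict key must be immutable (hashable), so a set cannot be a key but a frozenset can;
likewise, the elements of a set must be immutable, so a set cannot contain a set, only a frozenset.

For example: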

set1 = frozenset({1,2,3,4})
print(set1,type(set1))
frozenset({1, 2, 3, 4}) <class 'frozenset'>
set2 = {1,2,set1}
print(set2)
{1, 2, frozenset({1, 2, 3, 4})}
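
The two purposes described above can be sketched as follows. The names labels, set_key_ok, has_add, and union are illustrative assumptions, not from the original.

```python
fs = frozenset({1, 2, 3})

# A frozenset is hashable, so it can serve as a dict key.
labels = {fs: "small numbers"}

# A plain set is unhashable and cannot be a dict key.
try:
    bad = {{1, 2, 3}: "small numbers"}
    set_key_ok = True
except TypeError:
    set_key_ok = False

# Mutating methods such as add() do not exist on frozenset.
has_add = hasattr(fs, "add")

# Non-mutating operations still work and return a new frozenset.
union = fs | {4}

print(labels[frozenset({3, 2, 1})], set_key_ok, has_add)  # small numbers False False
print(union == frozenset({1, 2, 3, 4}))                   # True
```

Lookup with frozenset({3, 2, 1}) succeeds because two frozensets with the same elements are equal and hash identically, regardless of the order in which they were written.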

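03 Dictionaries

A dictionary is another mutable container type and can store objects of any type. Keys are unique: if a key is repeated, the last key-value pair replaces the earlier ones. Values do not need to be unique.

d = {key1 : value1, key2 : value2 }
d = {'Z' : '字', 'D' : '典' }

1). A dictionary looks up a value by its key quickly, in O(1) time on average;

2). Keys cannot repeat; values may;

3). Dictionary keys must be an immutable (hashable) data type; values can be any data type.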
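
A short sketch of the key rules above: a tuple works as a key, a list does not, and a repeated key keeps only its last value. The dictionary contents here are made up for illustration.

```python
# A tuple is immutable and hashable, so it works as a key.
points = {(0, 0): "origin", (1, 0): "unit x"}

# A list is mutable and unhashable, so using it as a key fails.
try:
    points[[0, 1]] = "unit y"
    list_key_ok = True
except TypeError:
    list_key_ok = False

# Repeating a key in a literal keeps only the last value.
d = {"a": 1, "a": 2}

print(points[(0, 0)], list_key_ok, d)  # origin False {'a': 2}
```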

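1. Creating and deleting dictionaries

1). Creating a simple dictionary

# Avoid naming a variable dict: that would shadow the built-in dict() used below.
user = {"name": "fentiao", "age": 4, "gender": "male"}
print(user['name'])
print(user["age"])
# Output
# fentiao
# 4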

2). Defining an empty dictionary:

s = {}
print(type(s))
d = dict()
print(d)
print(type(d))
print(type(dict()))
# Output
# <class 'dict'>
# {}
# <class 'dict'>
# <class 'dict'>

3). Built-in method: fromkeys

All keys in the new dictionary share the same value; if no value is given, the shared value defaults to None.

ddict = {}.fromkeys(('username', 'password'), 'wlh')
print(ddict)
ddict = {}.fromkeys(('username', 'password'))
print(ddict)
# Output
# {'username': 'wlh', 'password': 'wlh'}
# {'username': None, 'password': None}

4). Creating a dictionary indirectly with zip (pairing keys with values)

userInfo = zip(["name", 'age'], ["wlh", 12])
print(userInfo)
print(dict(userInfo))
# Output
# <zip object at 0x0000000002949EC8>
# {'name': 'wlh', 'age': 12}

2. Built-in dictionary methods

1> Viewing a dictionary

students = {
    'user1': [100, 100, 100],
    'user2': [98, 100, 100],
    'user3': [100, 89, 100],
}
# Viewing a dictionary
# Get the value for a key by indexing with the key
print(students['user1'])
# print(students['user4'])    # KeyError: 'user4', because the key is not in the dictionary

# Especially important: get() returns the value if the key exists,
# otherwise the default (None if no default is given)
print(students.get('user1'))    # [100, 100, 100]
print(students.get('user4', 'no user'))    # 'no user'
print(students.get('user1', 'user2')) # [100, 100, 100]

# View all keys / values / key-value pairs
print(students.keys())
print(students.values())
print(students.items())     # key-value pairs: [(key1, value1), (key2, value2)]

Output:
[100, 100, 100]
[100, 100, 100]
no user
[100, 100, 100]
dict_keys(['user1', 'user2', 'user3'])
dict_values([[100, 100, 100], [98, 100, 100], [100, 89, 100]])
dict_items([('user1', [100, 100, 100]), ('user2', [98, 100, 100]), ('user3', [100, 89, 100])])

2> Looping over a dictionary:

# for loop over a string
# for item in 'abc':
#     print(item)
# for loop over a tuple
# for item in (1, 2, 3):
#     print(item)
# for loop over a set
# for item in {1, 2, 3}:
#     print(item)

students = {
    'user1': [100, 100, 100],
    'user2': [98, 100, 100],
    'user3': [100, 89, 100],
}
# Iterating over a dictionary visits its keys by default
for key in students:
    print(key, students[key])

# The recommended way to iterate over key-value pairs
for key, value in students.items():
    # items() yields ('user1', [100, 100, 100]), ('user2', [98, 100, 100]), ('user3', [100, 89, 100])
    # Each pair is unpacked into key and value, e.g. key, value = ('user1', [100, 100, 100])
    print(key, value)

Output:
user1 [100, 100, 100]
user2 [98, 100, 100]
user3 [100, 89, 100]
user1 [100, 100, 100]
user2 [98, 100, 100]
user3 [100, 89, 100]

3> Adding to a dictionary

1). Add or modify a key-value pair by key

If the key exists, its value is modified
If the key does not exist, the key-value pair is added

students = {
    'user1': [100, 100, 100],
    'user2': [98, 100, 100],
    'user3': [100, 89, 100],
}
students['user4'] = [90, 99, 89]
print(students)

Output:
{'user1': [100, 100, 100], 'user2': [98, 100, 100], 'user3': [100, 89, 100], 'user4': [90, 99, 89]}

2). The setdefault method

If the key exists, nothing is changed

If the key does not exist, the key-value pair is added

students = {
    'user1': [100, 100, 100],
    'user2': [98, 100, 100],
    'user3': [100, 89, 100],
}
students.setdefault('user1', [100, 89, 88])   # 'user1' already exists, so nothing changes
print(students)

Output:
{'user1': [100, 100, 100], 'user2': [98, 100, 100], 'user3': [100, 89, 100]}
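
Because setdefault returns the value stored under the key (the existing one, or the newly inserted default), it is often used to group items. The sketch below assumes a made-up list of (name, score) pairs.

```python
scores = [("user1", 100), ("user2", 98), ("user1", 89)]

grouped = {}
for name, score in scores:
    # setdefault returns the existing list, or inserts a new empty list and returns it.
    grouped.setdefault(name, []).append(score)

print(grouped)  # {'user1': [100, 89], 'user2': [98]}

# An existing key is left untouched, and its current value is returned.
existing = grouped.setdefault("user2", [])
print(existing)  # [98]
```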

3). The update method

Adds key-value pairs in bulk

If a key exists, its value is modified

If a key does not exist, the key-value pair is added

import pprint

students = {
    'user1': [100, 100, 100],
    'user2': [98, 100, 100],
    'user3': [100, 89, 100],
}
new_student = {
    'westos': [100, 100, 100],
    'root': [100, 100, 100],
    'user1': [0, 0, 0],
}
students.update(new_student)
pprint.pprint(students)      # pprint sorts the keys when printing

Output:
{'root': [100, 100, 100],
 'user1': [0, 0, 0],
 'user2': [98, 100, 100],
 'user3': [100, 89, 100],
 'westos': [100, 100, 100]}

4> Removing from a dictionary

1). del dict[key]

# If the key exists, the key-value pair is removed

# If the key does not exist, KeyError is raised

students = {
    'user1': [100, 100, 100],
    'user2': [98, 100, 100],
    'user3': [100, 89, 100],
}
del students['user1']
print(students)

Output:
{'user2': [98, 100, 100], 'user3': [100, 89, 100]}

2). The pop method

# If the key exists, the key-value pair is removed and the value is returned

# If the key does not exist, the default is returned if one was given; otherwise KeyError is raised

students = {
    'user1': [100, 100, 100],
    'user2': [98, 100, 100],
    'user3': [100, 89, 100],
}
delete_item = students.pop('user6', 'no user')
print("Removed item: ", delete_item)
print(students)

Output:
Removed item:  no user
{'user1': [100, 100, 100], 'user2': [98, 100, 100], 'user3': [100, 89, 100]}

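3). The popitem method: removes and returns one key-value pair. Since Python 3.7 it removes the most recently inserted pair (last in, first out); in earlier versions the pair was arbitrary.

students = {
    'user1': [100, 100, 100],
    'user2': [98, 100, 100],
    'user3': [100, 89, 100],
}
key, value = students.popitem()
print("Removed item: ", key, value)

Output:
Removed item:  user3 [100, 89, 100]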
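
On Python 3.7 and later, popitem is not random: it removes the most recently inserted pair. The sketch below, using made-up values, illustrates that ordering and the KeyError raised on an empty dictionary.

```python
students = {"user1": 1, "user2": 2, "user3": 3}
students["user4"] = 4  # inserted last

# On Python 3.7+, popitem() returns the most recently inserted pair.
key, value = students.popitem()
print(key, value)  # user4 4

# popitem() on an empty dict raises KeyError.
try:
    {}.popitem()
    empty_ok = True
except KeyError:
    empty_ok = False
print(empty_ok)  # False
```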

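3. English text preprocessing: an English word-frequency counter

As a classic application of dictionaries (key-value pairs), word counting is a standard exercise after learning key-value types in almost every language.

Main requirement: write a function wordcount that counts how many times each word occurs in an article (word-frequency statistics).

After counting, sort the result by word frequency.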

from collections import  Counter
text = """
Introducing Red Hat Enterprise Linux 7
Red Hat Enterprise Linux 7 showcases the latest features in an enterprise operating system
Introducing Red Hat Enterprise Linux 7
Red Hat Enterprise Linux 7 showcases the latest features in an enterprise operating system
Introducing Red Hat Enterprise Linux 7
Red Hat Enterprise Linux 7 showcases the latest features in an enterprise operating system
Introducing Red Hat Enterprise Linux 7
Red Hat Enterprise Linux 7 showcases the latest features in an enterprise operating system
Enterprise architects will appreciate new capabilities such as lightweight application isolation.
Application developers will welcome an updated development environment and application-profiling tools. Read more at the Red Hat Developer Blog.
System administrators will appreciate new management tools and expanded file-system options with improved performance and scalability.
Deployed on physical hardware, virtual machines, or in the cloud, Red Hat Enterprise Linux 7 delivers the advanced features required for next-generation architectures.
Where to go from here:
Red Hat Enterprise Linux 7 Product Page
The landing page for Red Hat Enterprise Linux 7 information. Learn how to plan, deploy, maintain, and troubleshoot your Red Hat Enterprise Linux 7 system.
Red Hat Customer Portal
Your central access point to finding articles, videos, and other Red Hat content, as well as manage your Red Hat support cases.
Documentation
Provides documentation related to Red Hat Enterprise Linux and other Red Hat offerings.
Red Hat Subscription Management
Web-based administration interface to efficiently manage systems.
Red Hat Enterprise Linux Product Page
Provides an entry point to Red Hat Enterprise Linux product offerings.
"""
# 1. Extract all the words from the string
words = text.split()        # e.g. ['hello', 'world', 'hello', 'python', 'hello', 'java']
# Worked example of the idea:
#   text = "hello world hello python hello java"
#   words = ['hello', 'world', 'hello', 'python', 'hello', 'java']
#   word_count_dict = {'hello': 3, 'world': 1, 'python': 1, 'java': 1}
# 2. Count how many times each word occurs
#       1). How to store the counts: in a dict such as {'hello': 3, 'world': 1, 'python': 1, 'java': 1}
#       2). How to process the words: see the loop below
word_count_dict = {}
for word in words:
    if word not in word_count_dict:
        word_count_dict[word] = 1
    else:
        word_count_dict[word] += 1
print(word_count_dict)
# 3. Sort and get the most frequent words
counter = Counter(word_count_dict)
print(counter.most_common(5))

Output:

{'Introducing': 4, 'Red': 21, 'Hat': 21, 'Enterprise': 16, 'Linux': 15, '7': 12, 'showcases': 4, 'the': 7, 'latest': 4, 'features': 5, 'in': 5, 'an': 6, 'enterprise': 4, 'operating': 4, 'system': 4, 'architects': 1, 'will': 3, 'appreciate': 2, 'new': 2, 'capabilities': 1, 'such': 1, 'as': 3, 'lightweight': 1, 'application': 1, 'isolation.': 1, 'Application': 1, 'developers': 1, 'welcome': 1, 'updated': 1, 'development': 1, 'environment': 1, 'and': 6, 'application-profiling': 1, 'tools.': 1, 'Read': 1, 'more': 1, 'at': 1, 'Developer': 1, 'Blog.': 1, 'System': 1, 'administrators': 1, 'management': 1, 'tools': 1, 'expanded': 1, 'file-system': 1, 'options': 1, 'with': 1, 'improved': 1, 'performance': 1, 'scalability.': 1, 'Deployed': 1, 'on': 1, 'physical': 1, 'hardware,': 1, 'virtual': 1, 'machines,': 1, 'or': 1, 'cloud,': 1, 'delivers': 1, 'advanced': 1, 'required': 1, 'for': 2, 'next-generation': 1, 'architectures.': 1, 'Where': 1, 'to': 6, 'go': 1, 'from': 1, 'here:': 1, 'Product': 2, 'Page': 2, 'The': 1, 'landing': 1, 'page': 1, 'information.': 1, 'Learn': 1, 'how': 1, 'plan,': 1, 'deploy,': 1, 'maintain,': 1, 'troubleshoot': 1, 'your': 2, 'system.': 1, 'Customer': 1, 'Portal': 1, 'Your': 1, 'central': 1, 'access': 1, 'point': 2, 'finding': 1, 'articles,': 1, 'videos,': 1, 'other': 2, 'content,': 1, 'well': 1, 'manage': 2, 'support': 1, 'cases.': 1, 'Documentation': 1, 'Provides': 2, 'documentation': 1, 'related': 1, 'offerings.': 2, 'Subscription': 1, 'Management': 1, 'Web-based': 1, 'administration': 1, 'interface': 1, 'efficiently': 1, 'systems.': 1, 'entry': 1, 'product': 1}
[('Red', 21), ('Hat', 21), ('Enterprise', 16), ('Linux', 15), ('7', 12)]
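
The output above counts 'system', 'system.', and 'System' as different words, because split() keeps punctuation and case. One possible refinement, sketched here as an assumption rather than the author's method, wraps the logic in the wordcount function the requirement asks for, strips surrounding punctuation, lowercases each word, and passes the list straight to Counter. The sample sentence is made up.

```python
import string
from collections import Counter

def wordcount(text):
    """Count words case-insensitively, ignoring surrounding punctuation."""
    words = []
    for raw in text.split():
        word = raw.strip(string.punctuation).lower()
        if word:  # skip tokens that consisted only of punctuation
            words.append(word)
    return Counter(words)

sample = "Red Hat Enterprise Linux 7. red hat enterprise linux, Linux!"
counts = wordcount(sample)
print(counts["linux"], counts["red"])  # 3 2
print(counts.most_common(1))           # [('linux', 3)]
```

Counter accepts any iterable of hashable items, so the manual counting loop is not needed when Counter is already imported.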

4. Deduplicating a list

li = [1,2,3,4,65,1,2,3]
print({}.fromkeys(li).keys())

Output:
dict_keys([1, 2, 3, 4, 65])
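
Unlike set(), the fromkeys approach keeps the first occurrence of each element in its original position, because dictionaries preserve insertion order (guaranteed since Python 3.7). A minimal sketch, with the words list added as an illustrative second input:

```python
li = [1, 2, 3, 4, 65, 1, 2, 3]

# dict keys keep insertion order (guaranteed since Python 3.7),
# so the first occurrence of each element stays in place.
unique = list(dict.fromkeys(li))
print(unique)  # [1, 2, 3, 4, 65]

words = ["b", "a", "b", "c", "a"]
unique_words = list(dict.fromkeys(words))
print(unique_words)  # ['b', 'a', 'c']
```

Wrapping the result in list() turns the dict_keys view into an ordinary list that supports indexing.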

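5. Implementing a switch statement

Note: Python has no switch statement. How can one be implemented indirectly?

In C/C++/Java/JavaScript, the switch statement is used to simplify chains of if statements.

1) The old way:

grade = input('Grade: ')
if grade == 'A':
    print("Excellent")
elif grade == 'B':
    print("Good")
elif grade == 'C':
    print("Pass")
else:
    print('Invalid grade')

Output:
Grade: D
Invalid grade

Grade: A
Excellent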

2) The new way (more concise)

grade = input('Grade: ')
grades_dict = {
    'A': 'Excellent',
    'B': 'Good',
    'C': 'Pass',
    'D': 'Fail',
}
print(grades_dict.get(grade, "Invalid grade"))
# Equivalent to:
# if grade in grades_dict:
#     print(grades_dict[grade])
# else:
#     print("Invalid grade")

Output:
Grade: B
Good

Grade: C
Pass
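
The same dictionary lookup extends from strings to behavior: the values can be functions, and the looked-up function is then called. The sketch below is an illustrative assumption (the operations table and calculate helper are not from the original) showing how a missing key plays the role of the default branch.

```python
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

# Map each "case" label to the function that handles it.
operations = {"+": add, "-": subtract}

def calculate(op, a, b):
    handler = operations.get(op)
    if handler is None:          # plays the role of the default branch
        return "invalid operator"
    return handler(a, b)

print(calculate("+", 2, 3))  # 5
print(calculate("-", 2, 3))  # -1
print(calculate("*", 2, 3))  # invalid operator
```

Adding a new case means adding one entry to the dictionary, with no change to calculate itself.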

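04 defaultdict (default dictionary)

1. With an ordinary dictionary, the usual pattern is d = {}; adding an element only requires d[element] = value, and reading it back is d[element]. Reading requires that element already be in the dictionary; if it is not, a KeyError is raised.

2. This is where defaultdict comes in handy: when a key that does not exist in the dictionary is looked up, it returns a default value instead of raising KeyError.

The collections.defaultdict class provides default values itself; the default can be an integer, a list, a set, and so on.
defaultdict is a subclass of dict. The biggest difference is that when a program accesses a value through a key that does not exist, a dict raises KeyError,
whereas a defaultdict uses its default_factory attribute to generate a default value for that key automatically.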

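Requirement:

We want a dictionary that maps each key to multiple values (a so-called multi-value dictionary).

Solution:

1). A dictionary is an associative container in which each key maps to a single value. To map a key to multiple values, store those values in a container (a list or a set).

2). Use the defaultdict class from the collections module to initialize the first value automatically, so the code only has to add elements.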

from collections import defaultdict

info = defaultdict(int)
info['a'] += 1
print(info['a'])

Output:
1

from collections import defaultdict

info = defaultdict(list)
info['a'].append(1)
print(info['a'])

Output:
[1]
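
For the multi-value requirement, the choice between list and set as the default factory matters. The sketch below uses made-up pairs to show that a list keeps duplicates and insertion order while a set drops duplicates, and that merely reading a missing key inserts it.

```python
from collections import defaultdict

pairs = [("a", 1), ("a", 2), ("b", 3), ("a", 1)]

as_lists = defaultdict(list)   # keeps duplicates and insertion order
as_sets = defaultdict(set)     # drops duplicates
for key, value in pairs:
    as_lists[key].append(value)
    as_sets[key].add(value)

print(dict(as_lists))  # {'a': [1, 2, 1], 'b': [3]}
print(dict(as_sets))   # {'a': {1, 2}, 'b': {3}}

# Merely reading a missing key creates it with the default value.
as_lists["missing"]
print("missing" in as_lists)  # True
```

To test for a key without creating it, use the in operator or get(), neither of which triggers default_factory.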

3. Practice case:

Use defaultdict to split a list (50 random numbers between 1 and 100) into the elements greater than 66 and the remaining elements:

{
'greater than 66': [71, 82, 83],
'66 or less': [1, 2, 3],
}

from collections import defaultdict
import random

# 1). Generate 50 random numbers between 1 and 100
nums = []
for count in range(50):
    nums.append(random.randint(1, 100))

# 2). Store the elements greater than 66 and the remaining elements separately
sort_nums_dict = defaultdict(list)      # a default dictionary whose default value is an empty list []
for num in nums:
    if num > 66:
        sort_nums_dict['greater than 66'].append(num)
    else:                               # note: 66 itself lands in this group
        sort_nums_dict['66 or less'].append(num)
print(sort_nums_dict)

Output:
defaultdict(<class 'list'>, {'66 or less': [64, 39, 13, 1, 21, 9, 33, 27, 54, 8, 19, 36, 7, 7, 32, 54, 4, 20, 27, 17, 41, 6, 35, 60, 2, 21, 51, 23, 17, 49, 29, 7, 53, 56], 'greater than 66': [89, 70, 81, 91, 78, 98, 82, 87, 90, 100, 85, 71, 95, 80, 93, 82]})

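05 Summary of built-in data structures

1. Mutable and immutable data types

Mutable data types: elements can be added, removed, and modified.

A mutable data type lets an object's value change in place: operations such as append or += modify the existing object rather than creating a new one,
so the address of the object the variable refers to does not change. Two different mutable objects with equal values, however, are separate objects in memory;
each has its own address, so equal values are stored as separate copies.
Each one is an independent object, and changing one does not affect the other.

Immutable data types: elements cannot be added, removed, or modified.

An immutable data type in Python does not let an object's value change;
an operation that appears to change the variable actually creates a new object and rebinds the variable to it.
For some equal immutable values (for example small integers and short strings in CPython), the interpreter may keep a single object in memory and use its reference count to track how many variables refer to it. This is an implementation detail and should not be relied on.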
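
The in-place versus new-object distinction can be observed with id(), which returns an object's identity. This is a minimal sketch with illustrative names; the exact id values differ from run to run, so only comparisons are printed.

```python
# Mutable: += on a list changes the object in place, so its identity is unchanged.
nums = [1, 2]
list_id_before = id(nums)
nums += [3]
list_same_object = id(nums) == list_id_before

# Immutable: += on a tuple builds a new object and rebinds the name.
point = (1, 2)
tuple_id_before = id(point)
point += (3,)
tuple_same_object = id(point) == tuple_id_before

# Two equal lists are still two distinct objects.
a = [1, 2, 3]
b = [1, 2, 3]

print(list_same_object, tuple_same_object, a == b, a is b)  # True False True False
```

The == operator compares values, while is compares identity, which is why a == b is True but a is b is False.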

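2. Ordered sequences and unordered collections

An ordered sequence supports indexing, slicing, the concatenation operator, the repetition operator, and the membership operators.

So lists, strings, and tuples are Python's ordered sequences;

sets and dictionaries are not sequences: they support the membership operators but not positional indexing, slicing, concatenation, or repetition. (Since Python 3.7 a dictionary does preserve insertion order, but it still cannot be indexed by position.)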

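Exercise 1:

Problem:

Given a list of 10 elements, for example [1,2,3,4,5,6,7,8,9,0], move every element one position forward, with the first element moving to the end of the list, and then print the list. The final result is [2,3,4,5,6,7,8,9,0,1].

a = input("Enter a list: ")
alist = a.split(",")
a1 = [int(item) for item in alist]
first = a1.pop(0)       # remove the first element
a1.append(first)        # and append it to the end
print(a1)

Output:
Enter a list: 1,2,3,4,5,6,7,8,9,0
[2, 3, 4, 5, 6, 7, 8, 9, 0, 1]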
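
The same rotation can also be written without reading input, which makes it easier to reuse. The sketch below shows two alternatives to the pop/append solution: slicing, and collections.deque.rotate. The rotate_left helper and its k parameter are illustrative assumptions.

```python
from collections import deque

def rotate_left(items, k=1):
    """Return a new list with the first k elements moved to the end."""
    if not items:
        return []
    k %= len(items)
    return items[k:] + items[:k]

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
print(rotate_left(nums))  # [2, 3, 4, 5, 6, 7, 8, 9, 0, 1]

# deque.rotate takes a negative count to rotate to the left, in place.
d = deque(nums)
d.rotate(-1)
print(list(d))  # [2, 3, 4, 5, 6, 7, 8, 9, 0, 1]
```

Slicing leaves the original list unchanged, whereas list.pop(0) has to shift every remaining element; a deque avoids that cost when rotating repeatedly.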