Chapter 10: Dictionaries | Python for Everybody 讲义笔记_En

文章目录

  • Python for Everybody
    • 课程简介
    • Dictionaries
      • Dictionaries
      • Dictionary as a set of counters
      • Dictionaries and files
      • Looping and dictionaries
      • Advanced text parsing
      • Debugging
      • Glossary


Python for Everybody

Exploring Data Using Python 3
Dr. Charles R. Severance


课程简介

Python for Everybody 零基础程序设计(Python 入门)

  • This course aims to teach everyone the basics of programming computers using Python. 本课程旨在向所有人传授使用 Python 进行计算机编程的基础知识。
  • We cover the basics of how one constructs a program from a series of simple instructions in Python. 我们介绍了如何通过 Python 中的一系列简单指令构建程序的基础知识。
  • The course has no pre-requisites and avoids all but the simplest mathematics. Anyone with moderate computer experience should be able to master the materials in this course. 该课程没有任何先决条件,除了最简单的数学之外,避免了所有内容。任何具有中等计算机经验的人都应该能够掌握本课程中的材料。
  • This course will cover Chapters 1-5 of the textbook “Python for Everybody”. Once a student completes this course, they will be ready to take more advanced programming courses. 本课程将涵盖《Python for Everyday》教科书的第 1-5 章。学生完成本课程后,他们将准备好学习更高级的编程课程。
  • This course covers Python 3.

在这里插入图片描述

coursera

Python for Everybody 零基础程序设计(Python 入门)

Charles Russell Severance
Clinical Professor

在这里插入图片描述

个人主页
Twitter

在这里插入图片描述

University of Michigan


课程资源

coursera原版课程视频
coursera原版视频-中英文精校字幕-B站
Dr. Chuck官方翻录版视频-机器翻译字幕-B站

PY4E-课程配套练习
Dr. Chuck Online - 系列课程开源官网



Dictionaries

The dictionary data structures allows us to store multiple values in an object and look up the values by their key.


Dictionaries

A dictionary is like a list, but more general. In a list, the index positions have to be integers; in a dictionary, the indices can be (almost) any type.

You can think of a dictionary as a mapping between a set of indices (which are called keys) and a set of values. Each key maps to a value. The association of a key and a value is called a key-value pair or sometimes an item.

As an example, we’ll build a dictionary that maps from English to Spanish words, so the keys and the values are all strings.

The function dict creates a new dictionary with no items. Because dict is the name of a built-in function, you should avoid using it as a variable name.

>>> eng2sp = dict()
>>> print(eng2sp)
{}

The curly brackets, {}, represent an empty dictionary. To add items to the dictionary, you can use square brackets:

>>> eng2sp['one'] = 'uno'

This line creates an item that maps from the key 'one' to the value “uno”. If we print the dictionary again, we see a key-value pair with a colon between the key and value:

>>> print(eng2sp)
{'one': 'uno'}

This output format is also an input format. For example, you can create a new dictionary with three items.

>>> eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
>>> print(eng2sp)
{'one': 'uno', 'two': 'dos', 'three': 'tres'}

Since Python 3.7x the order of key-value pairs is the same as their input order, i.e. dictionaries are now ordered structures.

But that doesn’t really matter because the elements of a dictionary are never indexed with integer indices. Instead, you use the keys to look up the corresponding values:

>>> print(eng2sp['two'])
'dos'

The key 'two' always maps to the value “dos” so the order of the items doesn’t matter.

If the key isn’t in the dictionary, you get an exception:

>>> print(eng2sp['four'])
KeyError: 'four'

The len function works on dictionaries; it returns the number of key-value pairs:

>>> len(eng2sp)
3

The in operator works on dictionaries; it tells you whether something appears as a key in the dictionary (appearing as a value is not good enough).

>>> 'one' in eng2sp
True
>>> 'uno' in eng2sp
False

To see whether something appears as a value in a dictionary, you can use the method values, which returns the values as a type that can be converted to a list, and then use the in operator:

>>> vals = list(eng2sp.values())
>>> 'uno' in vals
True

The in operator uses different algorithms for lists and dictionaries. For lists, it uses a linear search algorithm. As the list gets longer, the search time gets longer in direct proportion to the length of the list. For dictionaries, Python uses an algorithm called a hash table that has a remarkable property: the in operator takes about the same amount of time no matter how many items there are in a dictionary. I won’t explain why hash functions are so magical, but you can read more about it at wikipedia.org/wiki/Hash_table.

Exercise 1:
Download a copy of the file www.py4e.com/code3/words.txt
Write a program that reads the words in words.txt and stores them as keys in a dictionary. It doesn’t matter what the values are. Then you can use the in operator as a fast way to check whether a string is in the dictionary.

Dictionary as a set of counters

Suppose you are given a string and you want to count how many times each letter appears. There are several ways you could do it:

  1. You could create 26 variables, one for each letter of the alphabet. Then you could traverse the string and, for each character, increment the corresponding counter, probably using a chained conditional.

  2. You could create a list with 26 elements. Then you could convert each character to a number (using the built-in function ord), use the number as an index into the list, and increment the appropriate counter.

  3. You could create a dictionary with characters as keys and counters as the corresponding values. The first time you see a character, you would add an item to the dictionary. After that you would increment the value of an existing item.

Each of these options performs the same computation, but each of them implements that computation in a different way.

An implementation is a way of performing a computation; some implementations are better than others. For example, an advantage of the dictionary implementation is that we don’t have to know ahead of time which letters appear in the string and we only have to make room for the letters that do appear.

Here is what the code might look like:

word = 'brontosaurus'
d = dict()
for c in word:if c not in d:d[c] = 1else:d[c] = d[c] + 1
print(d)

We are effectively computing a histogram, which is a statistical term for a set of counters (or frequencies).

The for loop traverses the string. Each time through the loop, if the character c is not in the dictionary, we create a new item with key c and the initial value 1 (since we have seen this letter once). If c is already in the dictionary we increment d[c].

Here’s the output of the program:

{'a': 1, 'b': 1, 'o': 2, 'n': 1, 's': 2, 'r': 2, 'u': 2, 't': 1}

The histogram indicates that the letters “a” and “b” appear once; “o” appears twice, and so on.

Dictionaries have a method called get that takes a key and a default value. If the key appears in the dictionary, get returns the corresponding value; otherwise it returns the default value. For example:

>>> counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
>>> print(counts.get('jan', 0))
100
>>> print(counts.get('tim', 0))
0

We can use get to write our histogram loop more concisely. Because the get method automatically handles the case where a key is not in a dictionary, we can reduce four lines down to one and eliminate the if statement.

word = 'brontosaurus'
d = dict()
for c in word:d[c] = d.get(c,0) + 1
print(d)

The use of the get method to simplify this counting loop ends up being a very commonly used “idiom” in Python and we will use it many times in the rest of the book. So you should take a moment and compare the loop using the if statement and in operator with the loop using the get method. They do exactly the same thing, but one is more succinct.

Dictionaries and files

One of the common uses of a dictionary is to count the occurrence of words in a file with some written text. Let’s start with a very simple file of words taken from the text of Romeo and Juliet.

For the first set of examples, we will use a shortened and simplified version of the text with no punctuation. Later we will work with the text of the scene with punctuation included.

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief

We will write a Python program to read through the lines of the file, break each line into a list of words, and then loop through each of the words in the line and count each word using a dictionary.

You will see that we have two for loops. The outer loop is reading the lines of the file and the inner loop is iterating through each of the words on that particular line. This is an example of a pattern called nested loops because one of the loops is the outer loop and the other loop is the inner loop.

Because the inner loop executes all of its iterations each time the outer loop makes a single iteration, we think of the inner loop as iterating “more quickly” and the outer loop as iterating more slowly.

The combination of the two nested loops ensures that we will count every word on every line of the input file.

fname = input('Enter the file name: ')
try:fhand = open(fname)
except:print('File cannot be opened:', fname)exit()counts = dict()
for line in fhand:words = line.split()for word in words:if word not in counts:counts[word] = 1else:counts[word] += 1print(counts)# Code: http://www.py4e.com/code3/count1.py

In our else statement, we use the more compact alternative for incrementing a variable. counts[word] += 1 is equivalent to counts[word] = counts[word] + 1. Either method can be used to change the value of a variable by any desired amount. Similar alternatives exist for -=, *=, and /=.

When we run the program, we see a raw dump of all of the counts in unsorted hash order. (the romeo.txt file is available at www.py4e.com/code3/romeo.txt)

python count1.py
Enter the file name: romeo.txt
{'But': 1, 'soft': 1, 'what': 1, 'light': 1, 'through': 1, 'yonder': 1,
'window': 1, 'breaks': 1, 'It': 1, 'is': 3, 'the': 3, 'east': 1, 'and': 3,
'Juliet': 1, 'sun': 2, 'Arise': 1, 'fair': 1, 'kill': 1, 'envious': 1,
'moon': 1, 'Who': 1, 'already': 1, 'sick': 1, 'pale': 1, 'with': 1,
'grief': 1}

It is a bit inconvenient to look through the dictionary to find the most common words and their counts, so we need to add some more Python code to get us the output that will be more helpful.

Looping and dictionaries

If you use a dictionary as the sequence in a for statement, it traverses the keys of the dictionary. This loop prints each key and the corresponding value:

counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:print(key, counts[key])

Here’s what the output looks like:

chuck 1
annie 42
jan 100

Again, the keys are ordered.

We can use this pattern to implement the various loop idioms that we have described earlier. For example if we wanted to find all the entries in a dictionary with a value above ten, we could write the following code:

counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:if counts[key] > 10 :print(key, counts[key])

The for loop iterates through the keys of the dictionary, so we must use the index operator to retrieve the corresponding value for each key. Here’s what the output looks like:

annie 42
jan 100

We see only the entries with a value above 10.

If you want to print the keys in alphabetical order, you first make a list of the keys in the dictionary using the keys method available in dictionary objects, and then sort that list and loop through the sorted list, looking up each key and printing out key-value pairs in sorted order as follows:

counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
lst = list(counts.keys())
print(lst)
lst.sort()
for key in lst:print(key, counts[key])

Here’s what the output looks like:

['chuck', 'annie', 'jan']
annie 42
chuck 1
jan 100

First you see the list of keys in non-alphabetical order that we get from the keys method. Then we see the key-value pairs in alphabetical order from the for loop.

Advanced text parsing

In the above example using the file romeo.txt, we made the file as simple as possible by removing all punctuation by hand. The actual text has lots of punctuation, as shown below.

But, soft! what light through yonder window breaks?
It is the east, and Juliet is the sun.
Arise, fair sun, and kill the envious moon,
Who is already sick and pale with grief,

Since the Python split function looks for spaces and treats words as tokens separated by spaces, we would treat the words “soft!” and “soft” as different words and create a separate dictionary entry for each word.

Also since the file has capitalization, we would treat “who” and “Who” as different words with different counts.

We can solve both these problems by using the string methods lower, punctuation, and translate. The translate is the most subtle of the methods. Here is the documentation for translate:

line.translate(str.maketrans(fromstr, tostr, deletestr))

Replace the characters in fromstr with the character in the same position in tostr and delete all characters that are in deletestr. The fromstr and tostr can be empty strings and the deletestr parameter can be omitted.

We will not specify the tostr but we will use the deletestr parameter to delete all of the punctuation. We will even let Python tell us the list of characters that it considers “punctuation”:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

The parameters used by translate were different in Python 2.0.

We make the following modifications to our program:

import stringfname = input('Enter the file name: ')
try:fhand = open(fname)
except:print('File cannot be opened:', fname)exit()counts = dict()
for line in fhand:line = line.rstrip()line = line.translate(line.maketrans('', '', string.punctuation))line = line.lower()words = line.split()for word in words:if word not in counts:counts[word] = 1else:counts[word] += 1print(counts)# Code: http://www.py4e.com/code3/count2.py

Part of learning the “Art of Python” or “Thinking Pythonically” is realizing that Python often has built-in capabilities for many common data analysis problems. Over time, you will see enough example code and read enough of the documentation to know where to look to see if someone has already written something that makes your job much easier.

The following is an abbreviated version of the output:

Enter the file name: romeo-full.txt
{'romeo': 40, 'and': 42, 'juliet': 32, 'act': 1, '2': 2, 'scene': 2,
'ii': 1, 'capulets': 1, 'orchard': 2, 'enter': 1, 'he': 5, 'jests': 1,
'at': 9, 'scars': 1, 'that': 30, 'never': 2, 'felt': 1, 'a': 24, 'wound': 1,
'appears': 1, 'above': 6, 'window': 2, 'but': 18, 'soft': 1, 'what': 11,
'light': 5, 'through': 2, 'yonder': 2, 'breaks': 1, ...}

Looking through this output is still unwieldy and we can use Python to give us exactly what we are looking for, but to do so, we need to learn about Python tuples. We will pick up this example once we learn about tuples.

Debugging

As you work with bigger datasets it can become unwieldy to debug by printing and checking data by hand. Here are some suggestions for debugging large datasets:

Scale down the input
If possible, reduce the size of the dataset. For example if the program reads a text file, start with just the first 10 lines, or with the smallest example you can find. You can either edit the files themselves, or (better) modify the program so it reads only the first n lines.

If there is an error, you can reduce n to the smallest value that manifests the error, and then increase it gradually as you find and correct errors.

Check summaries and types
Instead of printing and checking the entire dataset, consider printing summaries of the data: for example, the number of items in a dictionary or the total of a list of numbers.

A common cause of runtime errors is a value that is not the right type. For debugging this kind of error, it is often enough to print the type of a value.

Write self-checks
Sometimes you can write code to check for errors automatically. For example, if you are computing the average of a list of numbers, you could check that the result is not greater than the largest element in the list or less than the smallest. This is called a “sanity check” because it detects results that are “completely illogical”.

Another kind of check compares the results of two different computations to see if they are consistent. This is called a “consistency check”.

Pretty print the output
Formatting debugging output can make it easier to spot an error.
Again, time you spend building scaffolding can reduce the time you spend debugging.

Glossary

dictionary
A mapping from a set of keys to their corresponding values.
hashtable
The algorithm used to implement Python dictionaries.
hash function
A function used by a hashtable to compute the location for a key.
histogram
A set of counters.
implementation
A way of performing a computation.
item
Another name for a key-value pair.
key
An object that appears in a dictionary as the first part of a key-value pair.
key-value pair
The representation of the mapping from a key to a value.
lookup
A dictionary operation that takes a key and finds the corresponding value.
nested loops
When there are one or more loops “inside” of another loop. The inner loop runs to completion each time the outer loop runs once.
value
An object that appears in a dictionary as the second part of a key-value pair. This is more specific than our previous use of the word “value”.

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/18969.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

linux安装Tomcat及部署jpress的详细教程!!!

一、YUM在线安装 1、查看Tomcat相关安装包 [rootlocalhost ~]# yum list | grep tomcat tomcat.noarch 7.0.76-16.el7_9 updates tomcat-admin-webapps.noarch 7.0.76-16.el7_9 updates tomcat-docs…

2D视觉检测算法整理

1.ROI pooling 和 ROI align的区别 ROI pooling第一步根据候选区域找特征图的位置&#xff0c;可能不是刚好对应&#xff0c;需要一次量化&#xff0c;如上图所示&#xff0c;第二次是特征图需要转化为特定的大小&#xff0c;这时候pooling可能也不能正好整除&#xff0c;所以第…

【Linux命令200例】tee将输入内容输出到屏幕和文件

&#x1f3c6;作者简介&#xff0c;黑夜开发者&#xff0c;全栈领域新星创作者✌&#xff0c;阿里云社区专家博主&#xff0c;2023年6月csdn上海赛道top4。 &#x1f3c6;本文已收录于专栏&#xff1a;Linux命令大全。 &#x1f3c6;本专栏我们会通过具体的系统的命令讲解加上鲜…

计算机毕设 深度学习猫狗分类 - python opencv cnn

文章目录 0 前言1 课题背景2 使用CNN进行猫狗分类3 数据集处理4 神经网络的编写5 Tensorflow计算图的构建6 模型的训练和测试7 预测效果8 最后 0 前言 &#x1f525; 这两年开始毕业设计和毕业答辩的要求和难度不断提升&#xff0c;传统的毕设题目缺少创新和亮点&#xff0c;往…

RocketMQ 在业务消息场景的优势详解

作者&#xff1a;隆基 01 消息场景 RocketMQ 5.0 是消息事件流一体的实时数据处理平台&#xff0c;是业务消息领域的事实标准&#xff0c;很多互联网公司在业务消息场景会使用 RocketMQ。 我们反复提到的“消息、业务消息”&#xff0c;指的是分布式应用解耦&#xff0c;是 R…

Linux系统CPU和磁盘性能进程分析工具pidstat

一、pidstat对CPU的分析 Linux 上的pidstat(1)工具按进程或线程打印CPU 用量&#xff0c;包括用户态和系统态时间的分解。默认情况下&#xff0c;仅循环输出活动进程的信息。例如&#xff1a; 这个例子捕捉到了系统备份&#xff0c;包含了tar(1)命令&#xff0c;从文件系统读取…

攻防世界zorropub题解与subprocess模块

目录 题目分析&#xff1a; subprocess模块&#xff1a; subprocess.Popen()函数&#xff1a; subprocess.run()函数&#xff1a; 题目脚本&#xff1a; 在攻防世界做到一个题目感觉还挺有意思&#xff0c;记录一下 这个放链接也只是攻防世界的页面&#xff0c;所以直接说…

AI技术快讯:清华开源ChatGLM2双语对话语言模型

ChatGLM2-6B是一个开源项目&#xff0c;提供了ChatGLM2-6B模型的代码和资源。根据提供的搜索结果&#xff0c;以下是对该项目的介绍&#xff1a; 论文&#xff1a;https://arxiv.org/pdf/2103.10360.pdf ChatGLM2-6B是一个开源的双语对话语言模型&#xff0c;是ChatGLM-6B模…

0802|IO进程线程day5 作业(打印时钟在终端上,若终端输入quit,结束时钟)

作业1&#xff1a;守护进程 守护进程的创建&#xff08;5步&#xff09;&#xff1a; 创建孤儿进程&#xff1a;所有工作都在子进程中执行&#xff0c;从形式上脱离终端控制。 fork(), 退出父进程 创建新的会话组&#xff1a;使子进程完全独立出来&#xff0c;防止兄弟进程对其…

Python集成开发环境IDE:Spyder自动换行、函数列表outline、代码折叠

Spyder是一个用PythonQt编写的集成开发环境&#xff0c;包含许多有用的函数和工具。以下是一些常用功能&#xff1a; 变量浏览器&#xff1a;可以动态交互并修改变量&#xff0c;可以进行绘制直方图、时间序列&#xff0c;编辑日期框架或Numpy数组&#xff0c;对集合进行排序&…

【python】两数之和 python实现(详细讲解)

&#x1f449;博__主&#x1f448;&#xff1a;米码收割机 &#x1f449;技__能&#x1f448;&#xff1a;C/Python语言 &#x1f449;公众号&#x1f448;&#xff1a;测试开发自动化【获取源码商业合作】 &#x1f449;荣__誉&#x1f448;&#xff1a;阿里云博客专家博主、5…

【Java可执行命令】(十三)策略工具policytool:界面化创建、编辑和管理策略文件中的权限和配置 ~

Java可执行命令之policytool 1️⃣ 概念2️⃣ 优势和缺点3️⃣ 使用3.1 使用方式3.2 使用技巧3.3 注意事项 4️⃣ 应用场景&#x1f33e; 总结 1️⃣ 概念 在Java平台上&#xff0c;安全性是至关重要的。为了提供细粒度的安全管理机制&#xff0c;Java引入了policytool命令。p…

cmake使用笔记

vim CMakeLists.txt mkdir build cd build cmake ..创建 CMakeLists.txt&#xff0c;添加内容 cmake_minimum_required(VERSION 3.26) #工程名称 project(hello) #宏定义 add_definitions(-D宏名称) #头文件路径 include_directories(${PROJECT_SOURCE_DIR}/inc) #搜索源文件…

Python爬虫教程篇+图形化整理数据(数学建模可用)

一、首先我们先看要求 1.写一个爬虫程序 2、爬取目标网站数据&#xff0c;关键项不能少于5项。 3、存储数据到数据库&#xff0c;可以进行增删改查操作。 4、扩展&#xff1a;将库中数据进行可视化展示。 二、操作步骤&#xff1a; 首先我们根据要求找到一个适合自己的网…

Socket本质、实战演示两个进程建立TCP连接通信的过程

文章目录 Socket是什么引入面试题, 使你更深刻的理解四元组 Socket网络通信大体流程实战演示TCP连接建立过程需要用到的linux 查看网络的一些命令测试的程序一些准备工作启动服务端, 并没有调用accept启动客户端开启服务accept Socket是什么 通俗来说,Socket是套接字,是一种编…

InnoDB引擎底层逻辑讲解——架构之磁盘架构

1. System Tablespaces区域 系统表空间是change buffer&#xff08;更改缓冲区&#xff09;的存放区域&#xff0c;这是在8.0之后重新规划的&#xff0c;在5.x版本的时候&#xff0c;系统表空间还会存放innodb的数据字典undolog日志等信息&#xff0c;在8.0之后主要主要存放更…

Gitlab CI/CD笔记-第一天-GitOps和以前的和jenkins的集成的区别

一、GitOps-CI/CD的流程图 简单解释&#xff1a; 1.提交代码 2.编译构建 3.测试 4.部署 二、gitlab的实现 1、Runer 1.这个就是jenkins里的worker-slave的角色&#xff0c; 2.git-lab server 下发任务&#xff0c;Runner执行。 3.这个R…

关于样本方差为什么除以 n-1

今天上午集训摸鱼看到同学给我发的这个问题感觉挺有意思的 感性理解 这一部分的内容仅代表本蒟蒻没看严谨证明之前的个人见解&#xff0c;如果您想看严谨的证明&#xff0c;请翻到下一部分 还是先把图放上来罢省的有人不知道讲的什么东西 呃我知道这是生物竞赛的东西&#…

下载列表视频的具体操作

主要是介绍怎样获取上篇博客需要的HAR文件和请求域名

docker: Error response from daemon: No command specified.

执行 docker run -it -d -v /home/dell/workspace/workspace/test_192.168.1.202_pipeline:/home/workspace1 --name test_192.168.1.202_pipeline_10 qnx:7.1报错 问题定位&#xff1a;export导入的镜像需要带上command&#xff0c;以下命令查看command信息 docker ps --no…