python 字符串驻留机制

偶然发现一个python字符串的现象:

>>> a = '123_abc'
>>> b = '123_abc'
>>> a is b
True
>>> c = 'abc#123'
>>> d = 'abc#123'
>>> c is d
False

这是为什么呢,原来它们的id不一样。

>>> id(a) == id(b)
True
>>> id(c) == id(d)
False

那为什么它们的地址有的相同,有的不同呢?查询后得知这是一种 Python 的字符串驻留机制。

字符串驻留机制

也称为字符串常量优化(string interning),是一种在 Python 解释器中自动进行的优化过程。它主要目的是减少内存的使用,提高程序的运行效率。

工作原理

  1. 小字符串:Python 只会对短小的字符串进行驻留。但是,这个长度并不是固定的,它可能会因 Python 的不同版本或实现而有所不同。
  2. 字符串池(String Pool):Python 解释器维护一个字符串池,用于存储所有已经出现过的字符串常量。
  3. 驻留(Interning):当解释器遇到一个新的字符串字面量时,它会首先检查这个字符串是否已经存在于字符串池中。如果存在,则直接使用池中的引用;如果不存在,就将这个字符串添加到池中,并返回这个字符串的引用。
  4. 内存节省:由于相同的字符串字面量在程序中可能被多次使用,通过字符串驻留机制,可以确保这些重复的字符串只存储一次,从而节省内存。
  5. 性能提升:字符串比较操作可以通过比较它们的引用地址来完成,这比逐字符比较要快得多。因此,字符串驻留可以提高字符串比较的性能。
  6. 自动和透明:字符串驻留是自动进行的,程序员不需要显式地进行任何操作。Python 解释器会在后台处理这一过程。
  7. 不可变类型:字符串驻留机制只适用于不可变类型,因为可变类型的对象内容可能会改变,这会使得引用地址比较失去意义。如果字符串可以修改,那么驻留机制可能会导致意外的副作用。
  8. 限制:字符串驻留机制虽然有诸多好处,但也存在一些限制。例如,如果程序中使用了大量的动态生成的字符串,那么字符串驻留可能不会带来太大的好处,因为这些字符串可能不会被重复使用。
  9. 字符串字面量:只有当字符串是字面量时,Python 才会尝试进行驻留。通过其他方式(如 str() 函数、字符串拼接等)创建的字符串通常不会被驻留。
  10. 编译时驻留:字符串驻留是在 Python 源代码编译成字节码时进行的,而不是在运行时。这意味着在运行时动态生成的字符串通常不会被驻留。

显式驻留

Python 提供了一个sys库函数 intern(),允许程序员显式地将一个字符串驻留。使用这个函数可以手动控制字符串的驻留过程:

>>> from sys import intern
>>> s = intern('abc#123')
>>> t = intern('abc#123')
>>> s is t
True
>>> s = 'abc#123'
>>> t = 'abc#123'
>>> s is t
False
>>> a = intern('abc_123')
>>> b = 'abc_123'
>>> a is b
True

字符串长短的分界

小字符串才进行驻留,具体多少长度是界线也没有细查,大概用二分法也能枚举出来。

>>> x = 'abcdefghijklmnopqrstuvwxyz'*100
>>> y = 'abcdefghijklmnopqrstuvwxyz'*100
>>> x is y
True
>>> x = 'abcdefghijklmnopqrstuvwxyz'*200
>>> y = 'abcdefghijklmnopqrstuvwxyz'*200
>>> x is y
False
>>> x = 'abcdefghijklmnopqrstuvwxyz'*150
>>> y = 'abcdefghijklmnopqrstuvwxyz'*150
>>> x is y
True
>>> x = 'abcdefghijklmnopqrstuvwxyz'*175
>>> y = 'abcdefghijklmnopqrstuvwxyz'*175
>>> x is y
False
......

附:字符串方法

(版本python 3.12)

capitalize(self, /)
    Return a capitalized version of the string.

    More specifically, make the first character have upper case and the rest lower
    case.

casefold(self, /)
    Return a version of the string suitable for caseless comparisons.

center(self, width, fillchar=' ', /)
    Return a centered string of length width.

    Padding is done using the specified fill character (default is a space).

count(...)
    S.count(sub[, start[, end]]) -> int

    Return the number of non-overlapping occurrences of substring sub in
    string S[start:end].  Optional arguments start and end are
    interpreted as in slice notation.

encode(self, /, encoding='utf-8', errors='strict')
    Encode the string using the codec registered for encoding.

    encoding
      The encoding in which to encode the string.
    errors
      The error handling scheme to use for encoding errors.
      The default is 'strict' meaning that encoding errors raise a
      UnicodeEncodeError.  Other possible values are 'ignore', 'replace' and
      'xmlcharrefreplace' as well as any other name registered with
      codecs.register_error that can handle UnicodeEncodeErrors.

endswith(...)
    S.endswith(suffix[, start[, end]]) -> bool

    Return True if S ends with the specified suffix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    suffix can also be a tuple of strings to try.

expandtabs(self, /, tabsize=8)
    Return a copy where all tab characters are expanded using spaces.

    If tabsize is not given, a tab size of 8 characters is assumed.

find(...)
    S.find(sub[, start[, end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Return -1 on failure.

format(...)
    S.format(*args, **kwargs) -> str

    Return a formatted version of S, using substitutions from args and kwargs.
    The substitutions are identified by braces ('{' and '}').

format_map(...)
    S.format_map(mapping) -> str

    Return a formatted version of S, using substitutions from mapping.
    The substitutions are identified by braces ('{' and '}').

index(...)
    S.index(sub[, start[, end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Raises ValueError when the substring is not found.

isalnum(self, /)
    Return True if the string is an alpha-numeric string, False otherwise.

    A string is alpha-numeric if all characters in the string are alpha-numeric and
    there is at least one character in the string.

isalpha(self, /)
    Return True if the string is an alphabetic string, False otherwise.

    A string is alphabetic if all characters in the string are alphabetic and there
    is at least one character in the string.

isascii(self, /)
    Return True if all characters in the string are ASCII, False otherwise.

    ASCII characters have code points in the range U+0000-U+007F.
    Empty string is ASCII too.

isdecimal(self, /)
    Return True if the string is a decimal string, False otherwise.

    A string is a decimal string if all characters in the string are decimal and
    there is at least one character in the string.

isdigit(self, /)
    Return True if the string is a digit string, False otherwise.

    A string is a digit string if all characters in the string are digits and there
    is at least one character in the string.

isidentifier(self, /)
    Return True if the string is a valid Python identifier, False otherwise.

    Call keyword.iskeyword(s) to test whether string s is a reserved identifier,
    such as "def" or "class".

islower(self, /)
    Return True if the string is a lowercase string, False otherwise.

    A string is lowercase if all cased characters in the string are lowercase and
    there is at least one cased character in the string.

isnumeric(self, /)
    Return True if the string is a numeric string, False otherwise.

    A string is numeric if all characters in the string are numeric and there is at
    least one character in the string.

isprintable(self, /)
    Return True if the string is printable, False otherwise.

    A string is printable if all of its characters are considered printable in
    repr() or if it is empty.

isspace(self, /)
    Return True if the string is a whitespace string, False otherwise.

    A string is whitespace if all characters in the string are whitespace and there
    is at least one character in the string.

istitle(self, /)
    Return True if the string is a title-cased string, False otherwise.

    In a title-cased string, upper- and title-case characters may only
    follow uncased characters and lowercase characters only cased ones.

isupper(self, /)
    Return True if the string is an uppercase string, False otherwise.

    A string is uppercase if all cased characters in the string are uppercase and
    there is at least one cased character in the string.

join(self, iterable, /)
    Concatenate any number of strings.

    The string whose method is called is inserted in between each given string.
    The result is returned as a new string.

    Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'

ljust(self, width, fillchar=' ', /)
    Return a left-justified string of length width.

    Padding is done using the specified fill character (default is a space).

lower(self, /)
    Return a copy of the string converted to lowercase.

lstrip(self, chars=None, /)
    Return a copy of the string with leading whitespace removed.

    If chars is given and not None, remove characters in chars instead.

partition(self, sep, /)
    Partition the string into three parts using the given separator.

    This will search for the separator in the string.  If the separator is found,
    returns a 3-tuple containing the part before the separator, the separator
    itself, and the part after it.

    If the separator is not found, returns a 3-tuple containing the original string
    and two empty strings.

removeprefix(self, prefix, /)
    Return a str with the given prefix string removed if present.

    If the string starts with the prefix string, return string[len(prefix):].
    Otherwise, return a copy of the original string.

removesuffix(self, suffix, /)
    Return a str with the given suffix string removed if present.

    If the string ends with the suffix string and that suffix is not empty,
    return string[:-len(suffix)]. Otherwise, return a copy of the original
    string.

replace(self, old, new, count=-1, /)
    Return a copy with all occurrences of substring old replaced by new.

      count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.

    If the optional argument count is given, only the first count occurrences are
    replaced.

rfind(...)
    S.rfind(sub[, start[, end]]) -> int

    Return the highest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Return -1 on failure.

rindex(...)
    S.rindex(sub[, start[, end]]) -> int

    Return the highest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Raises ValueError when the substring is not found.

rjust(self, width, fillchar=' ', /)
    Return a right-justified string of length width.

    Padding is done using the specified fill character (default is a space).

rpartition(self, sep, /)
    Partition the string into three parts using the given separator.

    This will search for the separator in the string, starting at the end. If
    the separator is found, returns a 3-tuple containing the part before the
    separator, the separator itself, and the part after it.

    If the separator is not found, returns a 3-tuple containing two empty strings
    and the original string.

rsplit(self, /, sep=None, maxsplit=-1)
    Return a list of the substrings in the string, using sep as the separator string.

      sep
        The separator used to split the string.

        When set to None (the default value), will split on any whitespace
        character (including \n \r \t \f and spaces) and will discard
        empty strings from the result.
      maxsplit
        Maximum number of splits (starting from the left).
        -1 (the default value) means no limit.

    Splitting starts at the end of the string and works to the front.

rstrip(self, chars=None, /)
    Return a copy of the string with trailing whitespace removed.

    If chars is given and not None, remove characters in chars instead.

split(self, /, sep=None, maxsplit=-1)
    Return a list of the substrings in the string, using sep as the separator string.

      sep
        The separator used to split the string.

        When set to None (the default value), will split on any whitespace
        character (including \n \r \t \f and spaces) and will discard
        empty strings from the result.
      maxsplit
        Maximum number of splits (starting from the left).
        -1 (the default value) means no limit.

    Note, str.split() is mainly useful for data that has been intentionally
    delimited.  With natural text that includes punctuation, consider using
    the regular expression module.

splitlines(self, /, keepends=False)
    Return a list of the lines in the string, breaking at line boundaries.

    Line breaks are not included in the resulting list unless keepends is given and
    true.

startswith(...)
    S.startswith(prefix[, start[, end]]) -> bool

    Return True if S starts with the specified prefix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    prefix can also be a tuple of strings to try.

strip(self, chars=None, /)
    Return a copy of the string with leading and trailing whitespace removed.

    If chars is given and not None, remove characters in chars instead.

swapcase(self, /)
    Convert uppercase characters to lowercase and lowercase characters to uppercase.

title(self, /)
    Return a version of the string where each word is titlecased.

    More specifically, words start with uppercased characters and all remaining
    cased characters have lower case.

translate(self, table, /)
    Replace each character in the string using the given translation table.

      table
        Translation table, which must be a mapping of Unicode ordinals to
        Unicode ordinals, strings, or None.

    The table must implement lookup/indexing via __getitem__, for instance a
    dictionary or list.  If this operation raises LookupError, the character is
    left untouched.  Characters mapped to None are deleted.

upper(self, /)
    Return a copy of the string converted to uppercase.

zfill(self, width, /)
    Pad a numeric string with zeros on the left, to fill a field of the given width.

    The string is never truncated.


目录

字符串驻留机制

工作原理

显式驻留

字符串长短的分界

附:字符串方法


本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/bicheng/32716.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

浙大宁波理工学院2024年成人高等继续教育招生简章

浙大宁波理工学院,这所承载着深厚学术底蕴和卓越教育理念的学府,正热烈开启2024年成人高等继续教育的招生之门。这里,是知识的殿堂,是智慧的摇篮,更是您实现个人梦想、追求更高境界的起点。 ​浙大宁波理工学院始终坚…

实战指南:部署Elasticsearch 8.4.1与Kibana 8.4.1并集成IK分词器

首先拉取elasticsearch和kibana镜像 docker pull elasticsearch:8.4.1 docker pull kibana:8.4.1如果遇到镜像拉去不下来,遇到如下问题: [ERROR] error pulling image configuration: Get " https://production.cloudflare.docker.com/registry-v…

【吊打面试官系列-Mysql面试题】视图有哪些优点?

大家好,我是锋哥。今天分享关于 【视图有哪些优点?】面试题,希望对大家有帮助; 视图有哪些优点? 答: (1) 视图能够简化用户的操作; (2) 视图使用户能以多种角度看待同一数据; (3) 视…

【C#】使用数字和时间方法ToString()格式化输出字符串显示

在C#编程项目开发中,几乎所有对象都有格式化字符串方法,其中常见的是数字和时间的格式化输出多少不一样,按实际需要而定吧,现记录如下,以后会用得上。 文章目录 数字格式化时间格式化 数字格式化 例如,保留…

【docker1】指令,docker-compose,Dockerfile

文章目录 1.pull/image,run/ps(进程),exec/commit2.save/load:docker save 镜像id,不是容器id3.docker-compose:多容器:宿主机(eth0网卡)安装docker会生成一…

4、SpringMVC 实战小项目【加法计算器、用户登录、留言板、图书管理系统】

SpringMVC 实战小项目 3.1 加法计算器3.1.1 准备⼯作前端 3.1.2 约定前后端交互接⼝需求分析接⼝定义请求参数:响应数据: 3.1.3 服务器代码 3.2 ⽤⼾登录3.2.1 准备⼯作3.2.2 约定前后端交互接⼝3.2.3 实现服务器端代码 3.3 留⾔板实现服务器端代码 3.4 图书管理系统准备后端 3…

【电路笔记】-共发射极放大器

共发射极放大器 文章目录 共发射极放大器1、概述2、完整的CEA配置3、直流等效电路4、交流等效电路5、输入阻抗6、输出阻抗7、电压增益8、微分电容的重要性9、信号源的衰减10、电流增益11、相位反转12、总结1、概述 在本文中,我们将介绍基于双极晶体管的放大器的最后一种拓扑:…

2024 WaniCTF repwn 部分wp

lambda 文本编辑器打开附件 稍微格式化一下 结合gpt理解题目意思。 脚本 home 附件拖入ida 简单的检查环境和反调试,进构造flag的函数 简单的ollvm,用d810嗦一下 下断点调试,通过修改eip跳过反调试。查看dest内容,需要稍微向下翻一…

QT中利用动画弄一个侧边栏窗口,以及贴条效果

1、效果 2、关键代码 void Widget::on_sliderBtn_clicked() {m_sliderWidget->show();QPropertyAnimation* animation = new QPropertyAnimation(m

第14章. GPIO简介

目录 0. 《STM32单片机自学教程》专栏 14.1 GPIO基本结构 14.1.1 保护二极管 14.1.2 上拉、下拉电阻 14.1.3 施密特触发器 14.1.4 P-MOS 管和 N-MOS 管 14.1.5 输出数据寄存器 14.1.6 输入数据寄存器 14.2 GPIO工作模式 14.2.1 输入模式 14.2.1.1 输入浮空模式 1…

ABB机器人教程:工具载荷与有效载荷数据自动标定操作方法

目录 概述 工具载荷自动标定前的准备工作 进入载荷识别服务例行程序 工具载荷识别与标定操作 有效载荷识别与标定操作要点 4轴码垛类型机器人载荷数据标定说明 概述 在使用ABB机器人前需要正确标定一些关键数据,其中就包含载荷数据。理论上讲,安装…

issues.sonatype.org网站废弃,Maven仓库账号被废弃问题解决

问题起因: 今天自己的项目发布了一个新版本,打算通过GitHub流水线直接推送至Maven中央仓库,结果发现报错 401,说我的账号密码认证失败。我充满了疑惑我寻思难度我的号被盗掉了吗。于是我打开Nexus Repository Manager尝试登录账号…

【b站-湖科大教书匠】2 物理层-计算机网络微课堂

课程地址:【计算机网络微课堂(有字幕无背景音乐版)】 https://www.bilibili.com/video/BV1c4411d7jb/?share_sourcecopy_web&vd_sourceb1cb921b73fe3808550eaf2224d1c155 目录 2 物理层 2.1 物理层的基本概念 2.2 物理层下面的传输媒…

Android Studio 安卓手机上实现火柴人动画(Java源代码—Python)

android:layout_marginLeft“88dp” android:layout_marginTop“244dp” android:text“Python” android:textSize“25sp” app:layout_constraintStart_toStartOf“parent” app:layout_constraintTop_toTopOf“parent” /> </androidx.constraintlayout.widget.…

卤货商家配送小程序商城是怎样的模式

无论生意大小、打造品牌都是必要的一步&#xff0c;只要货品新鲜、味道高、性价比高&#xff0c;其新客转化/老客复购数量都不少&#xff0c;卤货种类多且复购多个单独/聚会场景&#xff0c;以同城主要经营&#xff0c;也有部分品牌有外地食品配送需要。 想要进一步品牌传播、…

Linux PXE高效批量装机

部署PXE远程安装服务 在大规模的 Linux 应用环境中&#xff0c;如 Web 群集、分布式计算等&#xff0c;服务器往往并不配备光驱设备&#xff0c;在这种情况下&#xff0c;如何为数十乃至上百台服务器裸机快速安装系统呢?传统的USB光驱、移动硬盘等安装方法显然已经难以满足需…

电子电气架构——由NRC优先级引起的反思

我是穿拖鞋的汉子,魔都中坚持长期主义的汽车电子工程师。 老规矩,分享一段喜欢的文字,避免自己成为高知识低文化的工程师: 屏蔽力是信息过载时代一个人的特殊竞争力,任何消耗你的人和事,多看一眼都是你的不对。非必要不费力证明自己,无利益不试图说服别人,是精神上的节…

一文详解去噪扩散概率模型(DDPM)

节前&#xff0c;我们星球组织了一场算法岗技术&面试讨论会&#xff0c;邀请了一些互联网大厂朋友、参加社招和校招面试的同学。 针对算法岗技术趋势、大模型落地项目经验分享、新手如何入门算法岗、该如何准备、面试常考点分享等热门话题进行了深入的讨论。 合集&#x…

Django之云存储(二)

一、Django使用云存储 建立项目 django-admin startproject project_demo创建子应用 python manage.py startapp app_name修改配置文件,设置模板视图路径 settings.py TEMPLATES = [{BACKEND: django.template.backends.django.DjangoTemplates,DIRS: [os.path.join(BASE_DIR,…

Ike-scan一键发现通过互联网的IPsec VPN服务器(KALI工具系列二十八)

目录 1、KALI LINUX 简介 2、Ike-scan工具简介 3、信息收集 3.1 目标主机IP&#xff08;服务器&#xff09; 3.2 KALI的IP 4、操作示例 4.1 简单扫描 4.2 范围扫描 4.3 扫描多个目标 4.4 输出扫描结果 4.5 特殊扫描 5、总结 1、KALI LINUX 简介 Kali Linux 是一个功…