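Strings in C, known as C-style strings, are character arrays terminated by '\0'. The <string.h> library offers only a limited set of string functions, and many of them are not particularly safe. A declaration such as char str[] can only define a string whose size is fixed at compile time, while a dynamic character array on the heap forces you to manage deallocation yourself, and making it grow on demand involves many fiddly details. The string class in the C++ STL addresses these problems in one package. In addition, STL algorithms operate on containers, string included, through the intermediate layers of traits and iterators.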
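The string class commonly used in C++ is actually a typedef:
    typedef basic_string<char> string;
basic_string is a class template in the STL; instantiating it with char gives the type that is aliased as string.
The <string> header defines a basic_string class template rather than a single string class because the element type is not limited to one-byte ASCII characters: it can also be the wide character type wchar_t, or char16_t and char32_t. Other character-like types such as unsigned char are possible in principle, though they may require supplying a suitable traits class.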
1 Why strings are encapsulated
1.1 Drawbacks of not encapsulating
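    const char *cp = "abc";    // a string literal is const: "abc" lives in read-only storage, not on the stack
    // cp[2] = 'D';            // error: a string literal cannot be modified
    char arr[4] = "abc";       // fixed size
    int size = 4;
    char *dp = new char[size];
    dp[0] = 'a'; dp[1] = 'b'; dp[2] = 'c'; dp[3] = '\0';
    cout << dp << endl;
    delete[] dp;               // the heap array has to be released by hand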
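1.2 Encapsulating in a class: variable length and automatic release of heap memory
Encapsulate a character pointer that points to a block of heap memory obtained with new, and copy the string literal that is passed in into that block.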
    // string.h
    #ifndef MYSTRING_H
    #define MYSTRING_H

    class String
    {
    public:
        String(const char* cstr = 0);
        String(const String& str);
        String& operator=(const String& str);
        ~String();
        char* get_c_str() const { return m_data; }
    private:
        char* m_data;
    };

    #include <cstring>

    inline
    String::String(const char* cstr)
    {
        if (cstr) {
            m_data = new char[strlen(cstr) + 1];
            strcpy(m_data, cstr);
        }
        else {                      // no argument: store an empty string
            m_data = new char[1];
            *m_data = '\0';
        }
    }

    inline
    String::~String()
    {
        delete[] m_data;
    }

    inline
    String& String::operator=(const String& str)
    {
        if (this == &str)           // guard against self-assignment
            return *this;
        delete[] m_data;
        m_data = new char[strlen(str.m_data) + 1];
        strcpy(m_data, str.m_data);
        return *this;
    }

    inline
    String::String(const String& str)
    {
        m_data = new char[strlen(str.m_data) + 1];
        strcpy(m_data, str.m_data);
    }

    #include <iostream>
    using namespace std;

    inline
    ostream& operator<<(ostream& os, const String& str)
    {
        os << str.get_c_str();
        return os;
    }

    #endif

    // String_test.cpp
    #include "string.h"
    #include <iostream>
    using namespace std;

    int main()
    {
        String s1("hello");
        String s2("world");
        String s3(s2);
        cout << s3 << endl;
        s3 = s1;
        cout << s3 << endl;
        cout << s2 << endl;
        cout << s1 << endl;

        String *p = new String[3];
        p[0] = "We";
        p[1] = "are";
        p[2] = "one";
        char* str = p[2].get_c_str();
        cout << str << endl;
        delete[] p;
        cin.get();
    }
    /*
    world
    hello
    world
    hello
    one
    */
Going one step further, the class can be turned into a class template so that it works with different character types.
2 STL
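The STL string class template builds on the idea above by generalizing the character type. It wraps a large number of member functions and is supported by the function templates of the STL algorithm library.
2.1 Two class templates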
basic_string Generic string class (class template )
char_traits Character traits (class template )
2.2 Four template classes
string String class (class )
u16string String of 16-bit characters (class )
u32string String of 32-bit characters (class )
wstring Wide string (class )
2.3 Conversion functions
stoi Convert string to integer (function )
stol Convert string to long int (function )
stoul Convert string to unsigned long (function )
stoll Convert string to long long (function )
stoull Convert string to unsigned long long (function )
stof Convert string to float (function )
stod Convert string to double (function )
stold Convert string to long double (function )
to_string Convert numerical value to string (function )
to_wstring Convert numerical value to wide string (function )
3 The basic_string class template
Declaration of the basic_string class template:
    template < class charT,
               class traits = char_traits<charT>,    // basic_string::traits_type
               class Alloc = allocator<charT>        // basic_string::allocator_type
               > class basic_string;
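basic_string encapsulates a pointer to the character type; in other words, it manages a dynamic character array terminated by '\0' and wraps the many operations on it. Between the container and the rest of the library sit two intermediate layers, iterators and traits. The traits class describes the character type and its basic operations, such as how characters are compared and copied. An iterator wraps a position inside a basic_string object, much like a pointer to an element, so that container-independent algorithms can operate on the elements through it.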
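char_traits selects the behavior that goes with the container's element type. It is itself a class template, with specializations for the standard character types:
    template <class charT> struct char_traits;
    template <> struct char_traits<char>;
    template <> struct char_traits<wchar_t>;
    template <> struct char_traits<char16_t>;
    template <> struct char_traits<char32_t>;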
This traits class specifies how characters are copied and compared. If no traits class is given, the default one for the character type in use is chosen.
Its member functions include:
eq Compare characters for equality (public static member function )
lt Compare characters for inequality (public static member function )
length Get length of null-terminated string (public static member function )
assign Assign character (public static member function )
compare Compare sequences of characters (public static member function )
find Find first occurrence of character (public static member function )
move Move character sequence (public static member function )
copy Copy character sequence (public static member function )
eof End-of-File character (public static member function )
not_eof Not End-of-File character (public static member function )
to_char_type To char type (public static member function )
to_int_type To int type (public static member function )
eq_int_type Compare int_type values (public static member function )
Example:
    // char_traits::compare
    #include <iostream>   // std::cout
    #include <string>     // std::basic_string, std::char_traits
    #include <cctype>     // std::tolower
    #include <cstddef>    // std::size_t

    // case-insensitive traits:
    struct custom_traits: std::char_traits<char> {
      static bool eq (char c, char d) { return std::tolower(c)==std::tolower(d); }
      static bool lt (char c, char d) { return std::tolower(c)<std::tolower(d); }
      static int compare (const char* p, const char* q, std::size_t n) {
        while (n--) { if (!eq(*p,*q)) return lt(*p,*q)?-1:1; ++p; ++q; }
        return 0;
      }
    };

    int main ()
    {
      std::basic_string<char,custom_traits> foo,bar;
      foo = "Test";
      bar = "test";
      if (foo==bar) std::cout << "foo and bar are equal\n";
      return 0;
    }
    // output:
    // foo and bar are equal
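The allocator parameter has a default value. It defines the memory model used by the string class, separates the container from the details of physical storage, and provides a standard way to allocate and release heap memory. Its value type is charT, which can be char, wchar_t, char16_t or char32_t.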
Template classes instantiated from the class template:
string String class (class )
wstring Wide string (class )
u16string String of 16-bit characters (class )
u32string String of 32-bit characters (class )
All of these string types share the same interface, so their usage and pitfalls are the same. Some of the member functions and related non-member functions are:
(constructor) Construct basic_string object (public member function )
(destructor) String destructor (public member function )
operator= String assignment (public member function )
size Return size (public member function )
length Return length of string (public member function )
operator[] Get character of string (public member function )
at Get character of string (public member function )
operator+= Append to string (public member function )
append Append to string (public member function )
push_back Append character to string (public member function )
insert Insert into string (public member function )
replace Replace portion of string (public member function )
c_str Get C-string equivalent (public member function )
find Find first occurrence in string (public member function )
rfind Find last occurrence in string (public member function )
substr Generate substring (public member function )
compare Compare strings (public member function )
operator+ Concatenate strings (function template )
operator>> Extract string from stream (function template )
getline Get line from stream into string (function template )
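4 The string class
The string class is an instantiation of the basic_string class template that uses char (that is, bytes) as its character type, with the default char_traits and allocator types.
Its iterator is a random access iterator to char.
The interface of string resembles that of a standard container of bytes, with additional features designed specifically for operating on strings of single-byte characters.
The string class handles bytes independently of the encoding used. If it holds sequences of multi-byte or variable-length characters (such as UTF-8), all members of the class (such as length or size) and its iterators still operate in terms of bytes, not actual encoded characters. Because its element type char is one byte, string is naturally suited to byte-wise processing, which is also why text in other encodings, Chinese characters for example, can be stored in and handled by string: once processing is done, the bytes are simply interpreted again in their original encoding.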
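The wstring class is an instantiation of the basic_string class template that uses wchar_t (that is, wide characters) as its character type.
On Windows, wchar_t occupies 2 bytes; on Linux, it occupies 4 bytes.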
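u16string and u32string, by contrast, use the character types char16_t and char32_t, which hold 16-bit and 32-bit characters regardless of platform.
wstring, u16string and u32string provide member functions with the same interface as string.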
-End-