Contents
- Preface
- FString's operator==
- ESearchCase
- Stricmp
- BothAscii
- LowerAscii
- Stricmp conclusion
- Verifying Stricmp in code
- Summary
Preface
I recently wrote code roughly like this:
TArray<FString> TestArray;
FString Z1 = TEXT("Z1"), z1 = TEXT("z1");
TestArray.Emplace(Z1);

if (TestArray.Contains(z1))
{
    UE_LOG(LogTemp, Error, TEXT("Contains z1"));
}
else
{
    UE_LOG(LogTemp, Error, TEXT("Not Contains z1"));
}
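Take a guess at what this prints.
The natural guess is "Not Contains z1", since Z1 is obviously not equal to z1.
But the actual output is "Contains z1".
We often use a TArray to check whether some key exists, yet here the check appears to ignore case.
Let's step straight into the Contains function: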
/**
 * Checks if this array contains the element.
 *
 * @returns True if found. False otherwise.
 * @see ContainsByPredicate, FilterByPredicate, FindByPredicate
 */
template <typename ComparisonType>
bool Contains(const ComparisonType& Item) const
{
    for (const ElementType* RESTRICT Data = GetData(), *RESTRICT DataEnd = Data + ArrayNum; Data != DataEnd; ++Data)
    {
        if (*Data == Item)
        {
            return true;
        }
    }
    return false;
}
Contains decides membership by calling the element's == operator. Our element type is FString, so let's look at FString's ==.
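To make the point concrete outside the engine, here is a minimal standalone sketch in standard C++ (not UE code). `CaseInsensitiveString` and `LinearContains` are hypothetical names invented for this illustration; the struct merely stands in for a type whose == ignores ASCII case. It shows that a linear search has no opinion about case of its own and simply inherits whatever the element's == does.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a string type whose == ignores ASCII case.
struct CaseInsensitiveString
{
    std::string Value;

    bool operator==(const CaseInsensitiveString& Other) const
    {
        if (Value.size() != Other.Value.size())
        {
            return false;
        }
        for (std::size_t Index = 0; Index < Value.size(); ++Index)
        {
            // Cast to unsigned char first: std::tolower requires a non-negative value.
            const unsigned char A = static_cast<unsigned char>(Value[Index]);
            const unsigned char B = static_cast<unsigned char>(Other.Value[Index]);
            if (std::tolower(A) != std::tolower(B))
            {
                return false;
            }
        }
        return true;
    }
};

// Linear search in the spirit of TArray::Contains: it only ever asks operator==.
template <typename ElementType>
bool LinearContains(const std::vector<ElementType>& Items, const ElementType& Item)
{
    for (const ElementType& Element : Items)
    {
        if (Element == Item)
        {
            return true;
        }
    }
    return false;
}
```

With a `std::vector<CaseInsensitiveString>` holding "Z1", `LinearContains` reports a match for "z1", while the same search over plain `std::string` elements does not, because `std::string`'s == compares characters exactly.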
FString's operator==
Straight to the source:
/**
 * Lexicographically test whether the left string is == the right string
 *
 * @param Lhs String to compare against.
 * @param Rhs String to compare against.
 * @return true if the left string is lexicographically == the right string, otherwise false
 * @note case insensitive
 */
[[nodiscard]] FORCEINLINE bool operator==(const UE_STRING_CLASS& Rhs) const
{
    return Equals(Rhs, ESearchCase::IgnoreCase);
}
So == simply calls Equals, and it passes a second argument. Let's see what that argument is:
/** Determines case sensitivity options for string comparisons. */
namespace ESearchCase
{
    enum Type
    {
        /** Case sensitive. Upper/lower casing must match for strings to be considered equal. */
        CaseSensitive,

        /** Ignore case. Upper/lower casing does not matter when making a comparison. */
        IgnoreCase,
    };
}
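ESearchCase
- CaseSensitive: upper and lower case must match
- IgnoreCase: case is ignored
That largely explains it: == calls Equals with IgnoreCase as the second argument, so the comparison ignores case. Now let's look at the Equals source.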
/**
 * Lexicographically tests whether this string is equivalent to the Other given string
 *
 * @param Other The string test against
 * @param SearchCase Whether or not the comparison should ignore case
 * @return true if this string is lexicographically equivalent to the other, otherwise false
 */
[[nodiscard]] FORCEINLINE bool Equals(const UE_STRING_CLASS& Other, ESearchCase::Type SearchCase = ESearchCase::CaseSensitive) const
{
    int32 Num = Data.Num();
    int32 OtherNum = Other.Data.Num();

    if (Num != OtherNum)
    {
        // Handle special case where FString() == FString("")
        return Num + OtherNum == 1;
    }
    else if (Num > 1)
    {
        if (SearchCase == ESearchCase::CaseSensitive)
        {
            return TCString<ElementType>::Strcmp(Data.GetData(), Other.Data.GetData()) == 0;
        }
        else
        {
            return TCString<ElementType>::Stricmp(Data.GetData(), Other.Data.GetData()) == 0;
        }
    }

    return true;
}
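There is an if branch on SearchCase: the case-sensitive path calls Strcmp, and the other path calls Stricmp. Strcmp is the familiar one; it is an ordinary string comparison.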
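The shape of Equals can be imitated in standard C++ to see how the length check and the case flag interact. This is only a sketch under assumptions: `SearchCase`, `MakeData`, `FoldAscii` and `EqualsSketch` are hypothetical names, a `std::vector<char>` that includes the null terminator stands in for FString's Data array, and the two library calls are replaced by a single loop with an ASCII-only fold.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

enum class SearchCase { CaseSensitive, IgnoreCase };

// Hypothetical helper: copies Text plus its null terminator, mimicking a
// character array that stores the terminator as its last element.
inline std::vector<char> MakeData(const char* Text)
{
    return std::vector<char>(Text, Text + std::strlen(Text) + 1);
}

// ASCII-only case fold; every other character is returned unchanged.
inline char FoldAscii(char C)
{
    return (C >= 'A' && C <= 'Z') ? static_cast<char>(C - 'A' + 'a') : C;
}

inline bool EqualsSketch(const std::vector<char>& Data, const std::vector<char>& Other,
                         SearchCase Mode = SearchCase::CaseSensitive)
{
    const std::size_t Num = Data.size();
    const std::size_t OtherNum = Other.size();

    if (Num != OtherNum)
    {
        // An empty array (0 elements) and a lone terminator (1 element)
        // both represent the empty string, so they compare equal.
        return Num + OtherNum == 1;
    }

    // Compare everything except the trailing terminator.
    for (std::size_t Index = 0; Index + 1 < Num; ++Index)
    {
        char A = Data[Index];
        char B = Other[Index];
        if (Mode == SearchCase::IgnoreCase)
        {
            A = FoldAscii(A);
            B = FoldAscii(B);
        }
        if (A != B)
        {
            return false;
        }
    }
    return true;
}
```

Calling `EqualsSketch(MakeData("Z1"), MakeData("z1"))` uses the default CaseSensitive mode and reports a mismatch; passing `SearchCase::IgnoreCase` reports a match, which mirrors the difference between calling Equals directly and going through ==.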
Stricmp
Let's see what the second one does. Following the calls all the way down leads to this code:
template<typename CharType1, typename CharType2>
int32 StricmpImpl(const CharType1* String1, const CharType2* String2)
{
    while (true)
    {
        CharType1 C1 = *String1++;
        CharType2 C2 = *String2++;

        uint32 U1 = TChar<CharType1>::ToUnsigned(C1);
        uint32 U2 = TChar<CharType2>::ToUnsigned(C2);

        // Quickly move on if characters are identical but
        // return equals if we found two null terminators
        if (U1 == U2)
        {
            if (U1)
            {
                continue;
            }

            return 0;
        }
        else if (BothAscii(U1, U2))
        {
            if (int32 Diff = LowerAscii[U1] - LowerAscii[U2])
            {
                return Diff;
            }
        }
        else
        {
            return U1 - U2;
        }
    }
}
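The rough logic: the two strings are compared one character at a time. If the two characters are identical, the loop moves on. If they differ but are both ASCII and map to the same lowercase letter, the loop also moves on. In every other case the function returns the nonzero difference immediately. When both strings reach the null terminator (code 0) at the same position, the function hits return 0;. That value goes back to Equals, where 0 == 0 holds, so Equals returns true: the two strings are equal.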
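The same control flow can be exercised outside the engine. The following is a minimal standalone sketch restricted to `const char*` input; `ToLowerAsciiCode` and `StricmpSketch` are hypothetical names, and the lookup table is replaced by a small arithmetic helper, so it is an approximation of the logic above rather than the engine implementation.

```cpp
#include <cstdint>

// ASCII-only lowercase mapping: 'A'..'Z' (0x41..0x5A) shift by 0x20.
constexpr std::uint32_t ToLowerAsciiCode(std::uint32_t Code)
{
    return (Code >= 0x41u && Code <= 0x5Au) ? Code + 0x20u : Code;
}

// Returns 0 when the strings are equal ignoring ASCII case,
// otherwise the signed difference at the first mismatch.
inline std::int32_t StricmpSketch(const char* String1, const char* String2)
{
    while (true)
    {
        // Go through unsigned char so bytes >= 0x80 do not sign-extend.
        const std::uint32_t U1 = static_cast<unsigned char>(*String1++);
        const std::uint32_t U2 = static_cast<unsigned char>(*String2++);

        if (U1 == U2)
        {
            if (U1 != 0)
            {
                continue; // identical characters, keep scanning
            }
            return 0; // both strings ended at the same position
        }

        const bool bBothAscii = ((U1 | U2) & 0xffffff80u) == 0;
        if (bBothAscii)
        {
            const std::int32_t Diff = static_cast<std::int32_t>(ToLowerAsciiCode(U1))
                                    - static_cast<std::int32_t>(ToLowerAsciiCode(U2));
            if (Diff != 0)
            {
                return Diff;
            }
            // Same letter in different case: fall through and keep scanning.
        }
        else
        {
            return static_cast<std::int32_t>(U1) - static_cast<std::int32_t>(U2);
        }
    }
}
```

For "Z1" and "z1" the first pair differs only in case and the second pair is identical, so the loop runs to the terminators and returns 0. A string that ends early produces a nonzero difference at that position, so the loop never reads past either terminator.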
uint32 U1 = TChar<CharType1>::ToUnsigned(C1);
uint32 U2 = TChar<CharType2>::ToUnsigned(C2);
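The two lines above convert the characters C1 and C2 to unsigned integers, that is, to their numeric character codes. For ASCII characters this is simply the ASCII code.
For example, with C1 = 'Z' and C2 = 'z', looking them up in the ASCII table gives U1 = 90 and U2 = 122.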
(Figure: ASCII code table)
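A quick way to check those numbers without the engine is a small conversion helper in standard C++. `ToUnsignedSketch` is a hypothetical name; it assumes the conversion amounts to reinterpreting the character as the unsigned integer of the same width and then widening to 32 bits, which is what the behaviour described above suggests.

```cpp
#include <cstdint>
#include <type_traits>

// Convert a character to its unsigned code, widened to 32 bits.
// Going through the same-width unsigned type first means a negative
// char value does not sign-extend into the upper bits.
template <typename CharType>
constexpr std::uint32_t ToUnsignedSketch(CharType C)
{
    return static_cast<std::uint32_t>(static_cast<std::make_unsigned_t<CharType>>(C));
}

// Compile-time checks of the values used in the text.
static_assert(ToUnsignedSketch('Z') == 90, "'Z' is 90 in ASCII");
static_assert(ToUnsignedSketch('z') == 122, "'z' is 122 in ASCII");
```

The intermediate cast matters for non-ASCII input: a signed 8-bit value of -1 becomes 255 rather than 0xFFFFFFFF, so the later "is it ASCII" test sees a sensible code.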
The next line of code:
if (U1 == U2)
That clearly does not hold (90 != 122), so we fall through to the next else if:
else if (BothAscii(U1, U2))
BothAscii
Let's look at this function:
FORCEINLINE bool BothAscii(uint32 C1, uint32 C2)
{
    return ((C1 | C2) & 0xffffff80) == 0;
}
Writing the mask in binary:
0xffffff80 = 1111 1111 1111 1111 1111 1111 1000 0000
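ASCII codes range from 0 to 127, so they fit in the low 7 bits. The mask 0xffffff80 selects bit 7 and every bit above it. If C1 | C2 has any of those bits set, that is, if the result is greater than 127, then at least one of C1 and C2 is not an ASCII character. So BothAscii checks whether both C1 and C2 are values that exist in the ASCII table.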
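The mask test is easy to try on a few values in standalone C++. `BothAsciiSketch` is a hypothetical copy of the function with standard integer types swapped in for the engine typedefs.

```cpp
#include <cstdint>

// True only when neither code has bit 7 or any higher bit set,
// i.e. both values lie in the ASCII range 0..127.
constexpr bool BothAsciiSketch(std::uint32_t C1, std::uint32_t C2)
{
    return ((C1 | C2) & 0xffffff80u) == 0;
}

// 'Z' (90) and 'z' (122): 90 | 122 == 122, which has no bit above bit 6.
static_assert(BothAsciiSketch(90u, 122u), "both are ASCII");

// 128 is the first value with bit 7 set, so the pair is rejected.
static_assert(!BothAsciiSketch(128u, 97u), "128 is outside ASCII");
```

OR-ing the two codes first lets a single mask check cover both operands: a high bit in either one survives the OR and trips the test.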
With BothAscii covered, let's move on to the next if:
if (int32 Diff = LowerAscii[U1] - LowerAscii[U2])
{
    return Diff;
}
LowerAscii
It is an array. Here is its declaration:
static constexpr uint8 LowerAscii[128] = {
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
    0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
    0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2A, 0x2B, 0x2C, 0x2D, 0x2E, 0x2F,
    0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F,
    0x40, 'a',  'b',  'c',  'd',  'e',  'f',  'g',  'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
    'p',  'q',  'r',  's',  't',  'u',  'v',  'w',  'x',  'y',  'z',  0x5B, 0x5C, 0x5D, 0x5E, 0x5F,
    0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6A, 0x6B, 0x6C, 0x6D, 0x6E, 0x6F,
    0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7A, 0x7B, 0x7C, 0x7D, 0x7E, 0x7F
};
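Clearly this is an ASCII table with 16 entries per row, in which the slots for the uppercase letters 'A' to 'Z' hold the corresponding lowercase letters.
Suppose the characters passed in are 'Z' and 'z', whose ASCII codes are 90 and 122.
So U1 = 90 and U2 = 122. From LowerAscii we get:
LowerAscii['Z'] = 'z' = 122;
LowerAscii['z'] = 0x7A = 122;
Therefore LowerAscii['Z'] == LowerAscii['z'], Diff is 0, and the loop moves on to the next character.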
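The table does not have to be typed out by hand to experiment with it. As a sketch (C++17 or later, standard library only), the same 128 entries can be generated at compile time; `MakeLowerAsciiTable` and `LowerAsciiSketch` are hypothetical names, and generating the table this way is this example's choice, not how the engine declares it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Build the 128-entry lowercase map: identity everywhere except
// 'A'..'Z' (0x41..0x5A), which map to 'a'..'z' by adding 0x20.
constexpr std::array<std::uint8_t, 128> MakeLowerAsciiTable()
{
    std::array<std::uint8_t, 128> Table{};
    for (std::size_t Code = 0; Code < Table.size(); ++Code)
    {
        const bool bIsUpper = Code >= 0x41 && Code <= 0x5A;
        Table[Code] = static_cast<std::uint8_t>(bIsUpper ? Code + 0x20 : Code);
    }
    return Table;
}

constexpr std::array<std::uint8_t, 128> LowerAsciiSketch = MakeLowerAsciiTable();

// The two lookups from the text: both land on 122 (0x7A).
static_assert(LowerAsciiSketch['Z'] == 122, "'Z' folds to 'z'");
static_assert(LowerAsciiSketch['z'] == 0x7A, "'z' stays 'z'");
```

A table lookup keeps the per-character work to one array read per side, and the BothAscii check beforehand guarantees the index is always below 128.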
Stricmp conclusion
This leads to the conclusion that StricmpImpl compares strings while ignoring ASCII case by design.
Verifying Stricmp in code
uint32 U1 = TChar<TCHAR>::ToUnsigned(TEXT('Z'));
uint32 U2 = TChar<TCHAR>::ToUnsigned(TEXT('z'));

static constexpr uint8 LowerAscii[128] = {
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
    0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
    0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2A, 0x2B, 0x2C, 0x2D, 0x2E, 0x2F,
    0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F,
    0x40, 'a',  'b',  'c',  'd',  'e',  'f',  'g',  'h',  'i',  'j',  'k',  'l',  'm',  'n',  'o',
    'p',  'q',  'r',  's',  't',  'u',  'v',  'w',  'x',  'y',  'z',  0x5B, 0x5C, 0x5D, 0x5E, 0x5F,
    0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6A, 0x6B, 0x6C, 0x6D, 0x6E, 0x6F,
    0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7A, 0x7B, 0x7C, 0x7D, 0x7E, 0x7F
};

// Lowercase both codes and take the difference, as StricmpImpl does.
int32 LowerU1 = LowerAscii[U1], LowerU2 = LowerAscii[U2];
int32 Diff = LowerU1 - LowerU2;

// Reproduce the BothAscii mask test step by step.
int32 U1U2 = (U1 | U2), U1U2And = (U1 | U2) & 0xffffff80;

UE_LOG(LogTemp, Error, TEXT("U1U2: %d, U1U2And: %d, Diff: %d"), U1U2, U1U2And, Diff);

if (U1 == U2)
{
    UE_LOG(LogTemp, Error, TEXT("U1 == U2"));
}
else
{
    UE_LOG(LogTemp, Error, TEXT("U1 != U2"));
}

FString Z1 = TEXT("Z1"), z1 = TEXT("z1");
if (Z1 == z1)
{
    UE_LOG(LogTemp, Error, TEXT("Z1 == z1"));
}
Think about what this prints before reading on.
The output is:
LogTemp: Error: U1U2: 122, U1U2And: 0, Diff: 0
LogTemp: Error: U1 != U2
LogTemp: Error: Z1 == z1
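Summary
Back to the original question, to pull it all together: TArray's Contains calls the element's ==, and FString's == calls Equals with case ignored, which is why Contains("z1") returns true.
So if your logic must not ignore case, loop over the array yourself and check each element with Equals, passing ESearchCase::CaseSensitive.
Alternatively, use another function UE provides: ContainsByPredicate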
/**
 * Checks if this array contains an element for which the predicate is true.
 *
 * @param Predicate to use
 * @returns True if found. False otherwise.
 * @see Contains, Find
 */
template <typename Predicate>
FORCEINLINE bool ContainsByPredicate(Predicate Pred) const
{
    return FindByPredicate(Pred) != nullptr;
}
Usage looks roughly like this:
TArray<FString> TestArray;
FString Z1 = TEXT("Z1"), z1 = TEXT("z1");
TestArray.Emplace(Z1);

// Capture z1 by reference to avoid copying the string into the lambda.
if (TestArray.ContainsByPredicate([&z1](const FString& Item) { return Item.Equals(z1, ESearchCase::CaseSensitive); }))
{
    UE_LOG(LogTemp, Error, TEXT("ByPredicate Contains z1"));
}
else
{
    UE_LOG(LogTemp, Error, TEXT("ByPredicate Not Contains z1"));
}
This prints: ByPredicate Not Contains z1, which shows that the comparison is now case sensitive.