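This article on regular expressions is mainly a study of the Regular Expressions chapter of the GNU Grep 3.11 manual. Text marked in a special color does not need much attention.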
Regular Expressions
A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions. grep understands three different versions of regular expression syntax: basic (BRE), extended (ERE), and Perl-compatible (PCRE). In GNU grep, basic and extended regular expressions are merely different notations for the same pattern-matching functionality. In other implementations, basic regular expressions are ordinarily less powerful than extended, though occasionally it is the other way around. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards. Perl-compatible regular expressions have different functionality, and are documented in the pcre2syntax(3) and pcre2pattern(3) manual pages, but work only if PCRE is available in the system.
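In short: a regular expression is a pattern that matches a set of strings. It is built the way an arithmetic expression is, by using operators to combine smaller expressions. grep supports three syntaxes: basic (BRE), extended (ERE), and Perl-compatible (PCRE). In GNU grep, BRE and ERE provide the same matching functionality and differ only in notation. In other implementations BRE is usually less powerful than ERE, though occasionally the reverse holds. The description below applies to ERE, and the differences for BRE are summarized afterwards. PCRE has different functionality, is documented in the pcre2syntax(3) and pcre2pattern(3) manual pages, and works only on systems where PCRE is available.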
1. Fundamental Structure
In regular expressions, the characters ‘.?*+{|()[\^$’ are special characters and have uses described below. All other characters are ordinary characters, and each ordinary character is a regular expression that matches itself.
The period ‘.’ matches any single character. It is unspecified whether ‘.’ matches an encoding error.
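As a rough illustration of ordinary characters and ‘.’, the sketch below uses Python's re module as a stand-in for grep -E. That substitution is an assumption: Python's syntax is not POSIX ERE, although the two agree for these simple patterns. The helper name matches_whole is hypothetical and only mimics whole-line matching.

```python
import re


def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern.

    Hypothetical helper; it approximates whole-line matching and uses
    Python's re engine rather than grep itself.
    """
    return re.fullmatch(pattern, text) is not None


# An ordinary character is a regular expression that matches itself.
print(matches_whole("a", "a"))      # True
print(matches_whole("a", "b"))      # False

# '.' matches any single character, but exactly one of them.
print(matches_whole("c.t", "cat"))  # True
print(matches_whole("c.t", "c-t"))  # True
print(matches_whole("c.t", "ct"))   # False
```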
A regular expression may be followed by one of several repetition operators; the operators beginning with ‘{’ are called interval expressions.
‘?’
The preceding item is optional and is matched at most once.
‘*’
The preceding item is matched zero or more times.
‘+’
The preceding item is matched one or more times.
‘{n}’
The preceding item is matched exactly n times.
‘{n,}’
The preceding item is matched n or more times.
‘{,m}’
The preceding item is matched at most m times. This is a GNU extension.
‘{n,m}’
The preceding item is matched at least n times, but not more than m times.
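The repetition operators listed above can be compared side by side with a small sketch. It again assumes Python's re module as a stand-in for grep -E (the two agree on these operators, including the ‘{,m}’ form), and the helper names matches_whole and accepted_counts are hypothetical.

```python
import re


def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern (Python re, not grep)."""
    return re.fullmatch(pattern, text) is not None


def accepted_counts(pattern: str, max_count: int = 5) -> list[int]:
    """List how many consecutive 'a' characters (0..max_count) the pattern accepts."""
    return [n for n in range(max_count + 1) if matches_whole(pattern, "a" * n)]


# One pattern per repetition operator, each applied to the item 'a'.
patterns = ["a?", "a*", "a+", "a{2}", "a{2,}", "a{,2}", "a{2,4}"]

for pattern in patterns:
    print(f"{pattern:8} {accepted_counts(pattern)}")
```

Each printed list shows the repetition counts from 0 to 5 that the operator accepts; for example, ‘a?’ accepts only 0 or 1 occurrences, while ‘a{2,4}’ accepts 2, 3, or 4.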
The empty regular expression matches the empty string. Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated expressions.
Two regular expressions may be joined by the infix operator ‘|’. The resulting regular expression matches any string matching either of the two expressions, which are called alternatives.
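Concatenation, the empty regular expression, and alternation can be checked with a few one-liners. This is a minimal sketch that assumes Python's re module behaves like grep -E for these constructs; matches_whole is a hypothetical helper, not part of grep.

```python
import re


def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern (Python re, not grep)."""
    return re.fullmatch(pattern, text) is not None


# Concatenation: "ab" followed by "c." matches a string made of a part
# matching "ab" and a part matching "c.".
concatenated = "ab" + "c."
print(matches_whole(concatenated, "abcd"))  # True
print(matches_whole(concatenated, "ab"))    # False

# The empty regular expression matches the empty string.
print(matches_whole("", ""))                # True
print(matches_whole("", "a"))               # False

# Alternation: the string may match either alternative.
print(matches_whole("cat|dog", "cat"))      # True
print(matches_whole("cat|dog", "dog"))      # True
print(matches_whole("cat|dog", "bird"))     # False
```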
Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole expression may be enclosed in parentheses to override these precedence rules and form a subexpression. An unmatched ‘)’ matches just itself.
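The precedence rules are easiest to see by comparing a pattern with and without parentheses. The sketch below assumes Python's re module as a stand-in for grep -E, and matches_whole is a hypothetical helper. The unmatched ‘)’ case is not shown, because Python's re rejects such a pattern instead of treating the parenthesis as a literal.

```python
import re


def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern (Python re, not grep)."""
    return re.fullmatch(pattern, text) is not None


# Repetition binds tighter than concatenation: in "ab*" the '*' applies to 'b' only.
print(matches_whole("ab*", "abbb"))     # True
print(matches_whole("ab*", "abab"))     # False

# Parentheses make "ab" a subexpression, so '*' repeats the whole group.
print(matches_whole("(ab)*", "abab"))   # True
print(matches_whole("(ab)*", "abbb"))   # False

# Concatenation binds tighter than alternation: "ab|cd" means "ab" or "cd".
print(matches_whole("ab|cd", "ab"))     # True
print(matches_whole("ab|cd", "cd"))     # True
print(matches_whole("ab|cd", "abd"))    # False

# Parentheses limit the alternation to 'b' or 'c' between 'a' and 'd'.
print(matches_whole("a(b|c)d", "abd"))  # True
print(matches_whole("a(b|c)d", "acd"))  # True
print(matches_whole("a(b|c)d", "ab"))   # False
```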
References:
Regular Expressions
https://www.cnblogs.com/rebrobot/p/15929285.html