Linux Regular Expressions (Basic and Extended)

Regular Expressions, compiled from:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
grep
sed

Definition

Regular Expressions (REs) provide a mechanism to select specific strings from a set of character strings.
Regular expressions are a context-independent syntax that can represent a wide variety of character sets and character set orderings, where these character sets are interpreted according to the current locale.

What is a “locale”?

Reference: Locale

A locale is the definition of the subset of a user’s environment that depends on language and cultural conventions.
It is made up from one or more categories.
Each category is identified by its name and controls specific aspects of the behavior of components of the system.
Category names correspond to the following environment variable names:

LC_CTYPE

Character classification and case conversion.

LC_COLLATE

Collation order.

LC_MONETARY

Monetary formatting.

LC_NUMERIC

Numeric, non-monetary formatting.

LC_TIME

Date and time formats.

LC_MESSAGES

Formats of informative and diagnostic messages and interactive responses.

The locale we use most often is the POSIX locale.
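
How the category variables combine can be sketched in a few lines of Python. The lookup order used here (LC_ALL first, then the category's own variable, then LANG) is the precedence POSIX gives these environment variables; the helper name `effective_locale` and the sample environment are made up for illustration.

```python
import os

# The six category names listed above.
CATEGORIES = ("LC_CTYPE", "LC_COLLATE", "LC_MONETARY",
              "LC_NUMERIC", "LC_TIME", "LC_MESSAGES")

def effective_locale(category, env=None):
    """Return the locale name that applies to one category, or None.

    Hypothetical helper: checks LC_ALL, then the category variable,
    then LANG, skipping unset or empty values. None means the
    implementation-defined default locale would be used.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown locale category: {category}")
    env = os.environ if env is None else env
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return None

# Illustrative environment, not read from the real process.
sample_env = {"LANG": "en_US.UTF-8", "LC_COLLATE": "C"}
print(effective_locale("LC_COLLATE", sample_env))
print(effective_locale("LC_CTYPE", sample_env))
```

With this sample environment, collation follows the C (POSIX) locale while character classification follows LANG, which is why the same bracket expression can behave differently from one category setting to another.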

collating element

First, a concept that is easy to get confused about: what is a collating element?

In many languages, collation (sorting like in a dictionary) is not only done per-character.
For instance, in Czech, ch doesn’t sort between cg and ci like it would in English, but is considered as a whole for sorting.
It is a collating element (we can’t refer to a character here; characters are a subset of collating elements) that sorts in between h and i.

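In other words, in some languages a collating element is a sequence of several characters that is treated as a single character. This reminds me of phonetic symbols.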
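
The Czech example can be modelled with a small sort key. This is only a sketch under simplifying assumptions: the alphabet below is plain lowercase ASCII with "ch" inserted between "h" and "i", diacritics are ignored, and the names `collating_elements` and `collation_key` are invented for the illustration.

```python
# Simplified Czech-like alphabet: "ch" is one collating element placed
# between "h" and "i". Real Czech collation also handles diacritics.
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {element: index for index, element in enumerate(ALPHABET)}

def collating_elements(word):
    """Split a lowercase ASCII word into collating elements."""
    elements = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            elements.append("ch")   # two characters, one element
            i += 2
        else:
            elements.append(word[i])
            i += 1
    return elements

def collation_key(word):
    """Sort key: the rank of each collating element, in order."""
    return [RANK[element] for element in collating_elements(word)]

words = ["chata", "cihla", "hrad", "cesta"]
print(sorted(words))                     # per-character order
print(sorted(words, key=collation_key))  # "ch" sorts after "h"
```

Sorting by character puts "chata" between "cesta" and "cihla"; sorting by collating element moves it after "hrad", because the whole digraph is compared as one unit.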

In regular expressions:
What does [[.ch.]] mean in a regex?

When you use [.ch.] in a regexp, you basically say:
“I expect a non-English input sequence with the digraph ch.
I want my regexp to match the single character ch.
My programming language/regex engine/keyboard does not allow me to write this digraph’s sign, so I type in [.ch.].
I don’t mean a c followed by an h. Please only find occurrences of the digraph as a single character.”
[[.ch.]] means that the digraph is part of a set of characters.
In this case the set contains only one element. It is just standard regexp notation.

Reference: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03

The example of a multi-character collating element given there is “ch”.

Subject

  1. Basic Regular Expression
  2. Extended Regular Expression

Both BREs and EREs are supported by the Regular Expression Matching interface in the System Interfaces volume of POSIX.1-2017 under regcomp(), regexec(), and related functions.

Definition of “matched”

A sequence of zero or more characters shall be said to be matched by a BRE or ERE when the characters in the sequence correspond to a sequence of characters defined by the pattern.

The search for a matching sequence starts at the beginning of a string and stops when the first sequence matching the expression is found, where “first” is defined to mean “begins earliest in the string”.
If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the longest such sequence is matched.

For example, the BRE “bb*” matches the second to fourth characters of the string “abbbc”, and the ERE “(wee|week)(knights|night)” matches all ten characters of the string “weeknights”.

This is a greedy strategy: among the matches that begin earliest, the longest one wins.

Consistent with the whole match being the longest of the leftmost matches, each subpattern, from left to right, shall match the longest possible string.
For this purpose, a null string shall be considered to be longer than no match at all.
For example, matching the BRE "\(.*\).*" against “abcdef”, the subexpression "(\1)" is “abcdef”, and matching the BRE "\(a*\)*" against “bc”, the subexpression “(\1)” is the null string.
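
The leftmost-longest rule can be modelled by brute force. The sketch below is not a POSIX engine: it borrows Python's `re` module only to test candidate substrings, so it is limited to patterns whose syntax Python and POSIX share, and the function name `leftmost_longest` is made up here.

```python
import re

def leftmost_longest(pattern, text):
    """Return (start, end) of the leftmost-longest match, or None.

    Brute force: try each start position from the left and, for each
    start, the longest candidate substring first.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text[start:end]):
                return start, end
    return None

print(leftmost_longest("bb*", "abbbc"))
print(leftmost_longest("(wee|week)(knights|night)", "weeknights"))

# Python's own search is leftmost-first: it stops at the first
# alternative that works, which is not the POSIX rule.
print(re.search("a|ab", "ab").group())
print(leftmost_longest("a|ab", "ab"))
```

The offsets are zero-based and end-exclusive, so `(1, 4)` for “abbbc” is the second to fourth characters. The last two lines show why the distinction matters: a backtracking engine returns “a” for `a|ab`, while the POSIX rule requires “ab”.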

BRE (ERE) matching a single character

A BRE or ERE that shall match either a single character or a single collating element.

Only a BRE or ERE of this type that includes a bracket expression (see RE Bracket Expression) can match a collating element.

BRE (ERE) matching multiple characters

A BRE or ERE that shall match a concatenation of single characters or collating elements.

Such a BRE or ERE is made up from a BRE (ERE) matching a single character and BRE (ERE) special characters.

invalid

character class expression

[:alnum:]   [:cntrl:]   [:lower:]   [:space:]
[:alpha:]   [:digit:]   [:print:]   [:upper:]
[:blank:]   [:graph:]   [:punct:]   [:xdigit:]

The characters that belong to each class name are listed below:

LC_CTYPE Category in the POSIX Locale
The minimum character classifications for the POSIX locale follow; the code listing depicts the localedef input, and the table represents the same information, sorted by character. Implementations may add additional characters to the cntrl and punct classifications but shall not make any other additions.

LC_CTYPE
# The following is the minimum POSIX locale LC_CTYPE.
# "alpha" is by definition "upper" and "lower"
# "alnum" is by definition "alpha" and "digit"
# "print" is by definition "alnum", "punct", and the <space>
# "graph" is by definition "alnum" and "punct"
#
upper    <A>;<B>;<C>;<D>;<E>;<F>;<G>;<H>;<I>;<J>;<K>;<L>;<M>;\
         <N>;<O>;<P>;<Q>;<R>;<S>;<T>;<U>;<V>;<W>;<X>;<Y>;<Z>
#
lower    <a>;<b>;<c>;<d>;<e>;<f>;<g>;<h>;<i>;<j>;<k>;<l>;<m>;\
         <n>;<o>;<p>;<q>;<r>;<s>;<t>;<u>;<v>;<w>;<x>;<y>;<z>
#
digit    <zero>;<one>;<two>;<three>;<four>;<five>;<six>;\
         <seven>;<eight>;<nine>
#
space    <tab>;<newline>;<vertical-tab>;<form-feed>;\
         <carriage-return>;<space>
#
cntrl    <alert>;<backspace>;<tab>;<newline>;<vertical-tab>;\
         <form-feed>;<carriage-return>;\
         <NUL>;<SOH>;<STX>;<ETX>;<EOT>;<ENQ>;<ACK>;<SO>;\
         <SI>;<DLE>;<DC1>;<DC2>;<DC3>;<DC4>;<NAK>;<SYN>;\
         <ETB>;<CAN>;<EM>;<SUB>;<ESC>;<IS4>;<IS3>;<IS2>;\
         <IS1>;<DEL>
#
punct    <exclamation-mark>;<quotation-mark>;<number-sign>;\
         <dollar-sign>;<percent-sign>;<ampersand>;<apostrophe>;\
         <left-parenthesis>;<right-parenthesis>;<asterisk>;\
         <plus-sign>;<comma>;<hyphen-minus>;<period>;<slash>;\
         <colon>;<semicolon>;<less-than-sign>;<equals-sign>;\
         <greater-than-sign>;<question-mark>;<commercial-at>;\
         <left-square-bracket>;<backslash>;<right-square-bracket>;\
         <circumflex>;<underscore>;<grave-accent>;<left-curly-bracket>;\
         <vertical-line>;<right-curly-bracket>;<tilde>
#
xdigit   <zero>;<one>;<two>;<three>;<four>;<five>;<six>;<seven>;\
         <eight>;<nine>;<A>;<B>;<C>;<D>;<E>;<F>;<a>;<b>;<c>;<d>;<e>;<f>
#
blank    <space>;<tab>
#
toupper  (<a>,<A>);(<b>,<B>);(<c>,<C>);(<d>,<D>);(<e>,<E>);\
         (<f>,<F>);(<g>,<G>);(<h>,<H>);(<i>,<I>);(<j>,<J>);\
         (<k>,<K>);(<l>,<L>);(<m>,<M>);(<n>,<N>);(<o>,<O>);\
         (<p>,<P>);(<q>,<Q>);(<r>,<R>);(<s>,<S>);(<t>,<T>);\
         (<u>,<U>);(<v>,<V>);(<w>,<W>);(<x>,<X>);(<y>,<Y>);(<z>,<Z>)
#
tolower  (<A>,<a>);(<B>,<b>);(<C>,<c>);(<D>,<d>);(<E>,<e>);\
         (<F>,<f>);(<G>,<g>);(<H>,<h>);(<I>,<i>);(<J>,<j>);\
         (<K>,<k>);(<L>,<l>);(<M>,<m>);(<N>,<n>);(<O>,<o>);\
         (<P>,<p>);(<Q>,<q>);(<R>,<r>);(<S>,<s>);(<T>,<t>);\
         (<U>,<u>);(<V>,<v>);(<W>,<w>);(<X>,<x>);(<Y>,<y>);(<Z>,<z>)
END LC_CTYPE
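
The listing can be mirrored as Python sets to make the class relationships concrete. This is a sketch of the minimum POSIX-locale classes only, assuming ASCII; it relies on the constants in Python's `string` module matching the lists above, and the names `POSIX_CLASSES` and `in_class` are invented for the example.

```python
import string

# Primary classes, taken from the localedef listing above.
upper = set(string.ascii_uppercase)
lower = set(string.ascii_lowercase)
digit = set(string.digits)
space = set("\t\n\v\f\r ")
blank = set(" \t")
punct = set(string.punctuation)
xdigit = set(string.hexdigits)
cntrl = {chr(code) for code in range(32)} | {chr(127)}

# Derived classes, following the comments in the listing.
alpha = upper | lower
alnum = alpha | digit
graph = alnum | punct
print_ = graph | {" "}   # trailing underscore avoids shadowing print()

POSIX_CLASSES = {
    "upper": upper, "lower": lower, "digit": digit, "space": space,
    "blank": blank, "punct": punct, "xdigit": xdigit, "cntrl": cntrl,
    "alpha": alpha, "alnum": alnum, "graph": graph, "print": print_,
}

def in_class(char, name):
    """True if `char` belongs to [:name:] in this POSIX-locale model."""
    return char in POSIX_CLASSES[name]

print(in_class("_", "punct"), in_class("_", "alnum"))
print(in_class(" ", "print"), in_class(" ", "graph"))
```

The model makes one detail easy to check: the underscore is punctuation, not an alphanumeric, and the only difference between graph and print is the space character.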

Regular Expression General Requirements

The requirements in this section shall apply to both basic and extended regular expressions.

The use of regular expressions is generally associated with text processing.
REs (BREs and EREs) operate on text strings, that is, zero or more characters followed by an end-of-string delimiter (typically NUL). Some utilities employing regular expressions limit the processing to lines; that is, zero or more characters followed by a <newline>.

The passage above mentions two special characters: NUL and <newline>.

In the functions processing regular expressions described in System Interfaces volume of POSIX.1-2017, the <newline> is regarded as an ordinary character and both a <period> and a non-matching list can match one.
A <period> here is simply ".".
In the POSIX.1-2017 System Interfaces functions, <newline> is an ordinary character, and both a <period> and a non-matching list can match it.

The Shell and Utilities volume of POSIX.1-2017 specifies within the individual descriptions of those standard utilities employing regular expressions whether they permit matching of <newline> characters; if not stated otherwise, the use of literal <newline> characters or any escape sequence equivalent in either patterns or matched text produces undefined results.

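In other words, each utility that uses regular expressions must state for itself whether it permits matching <newline>.
For utilities that do not say so, a <newline> in either the pattern or the text to be matched produces undefined results.
For example, grep cannot match <newline>. The grep documentation says: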

If the final byte of an input file is not a newline, grep silently supplies one.
Since newline is also a separator for the list of patterns, there is no way to match newline characters in a text.

Those utilities (like grep) that do not allow <newline> characters to match are responsible for eliminating any <newline> from strings before matching against the RE.
The regcomp() function in the System Interfaces volume of POSIX.1-2017, however, can provide support for such processing without violating the rules of this section.
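
The line-oriented behaviour described above can be sketched as follows. This is a model of the idea, not of grep itself: it uses Python's `re` for the matching step, and the function name `grep_like` is hypothetical.

```python
import re

def grep_like(pattern, text):
    """Return the lines of `text` that contain a match for `pattern`.

    Models a line-oriented utility: the text is cut at every <newline>,
    the <newline> is dropped before matching, and a missing final
    <newline> makes no difference.
    """
    compiled = re.compile(pattern)
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # a final <newline> ends the last line
    return [line for line in lines if compiled.search(line)]

print(grep_like("b", "ab\ncd\nb"))
print(grep_like("b", "ab\ncd\nb\n"))
# No single line ever contains a <newline>, so this cannot match:
print(grep_like("a\nb", "a\nb\n"))
```

Because the <newline> is removed before the pattern is applied, a pattern that requires one has nothing to match against, no matter what the input contains.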

The interfaces specified in POSIX.1-2017 do not permit the inclusion of a NUL character in an RE or in the string to be matched.
If during the operation of a standard utility a NUL is included in the text designated to be matched, that NUL may designate the end of the text string for the purposes of matching.

The next paragraph defines how case-insensitive matching must work.
When a standard utility or function that uses regular expressions specifies that pattern matching shall be performed without regard to the case (uppercase or lowercase) of either data or patterns, then when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched.

This definition of case-insensitive processing is intended to allow matching of multi-character collating elements as well as characters, as each character in the string is matched using both its cases.
For example, in a locale where “Ch” is a multi-character collating element and where a matching list expression matches such elements, the RE "[[.Ch.]]" when matched against the string “char” is in reality matched against “ch”, “Ch”, “cH”, and “CH”.
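
One way to picture "each character is matched using both its cases" is to expand a literal into a sequence of two-character bracket expressions. The sketch below does this for plain strings only; it uses Python's `re` to check the result, treats “Ch” as two ordinary characters rather than a real collating element, and the name `both_cases` is invented.

```python
import re

def both_cases(literal):
    """Build a pattern that matches `literal` in any mix of cases."""
    parts = []
    for ch in literal:
        lo, up = ch.lower(), ch.upper()
        if lo != up and len(lo) == 1 and len(up) == 1:
            parts.append("[" + lo + up + "]")   # the character and its case counterpart
        else:
            parts.append(re.escape(ch))         # no case counterpart: match literally
    return "".join(parts)

pattern = both_cases("Ch")
print(pattern)
print([s for s in ("ch", "Ch", "cH", "CH", "ah") if re.fullmatch(pattern, s)])
```

The four accepted spellings are exactly the ones the standard lists for “Ch”.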

The implementation shall support any regular expression that does not exceed 256 bytes in length.

Basic Regular Expression (BRE)

BREs Matching a Single Character or Collating Element

  • A BRE ordinary character, a special character preceded by a
    <backslash>, or a <period> shall match a single character.
  • A bracket expression shall match a single character or a single collating
    element.

What is an ordinary character?

An ordinary character is a BRE that matches itself: any character in the supported character set, except for the BRE special characters listed in BRE Special Characters.

The interpretation of an ordinary character preceded by an unescaped <backslash> ( ‘\’ ) is undefined, except for:

  • The characters ‘)’, ‘(’, ‘{’, and ‘}’
  • The digits 1 to 9 inclusive (see BREs Matching Multiple Characters)
  • A character inside a bracket expression.

BRE Special Characters

A BRE special character has special properties in certain contexts.
Outside those contexts, or when preceded by a <backslash>, such a character is a BRE that matches the special character itself.
The BRE special characters and the contexts in which they have their special meaning are as follows:

special character   usage      literal character (matches itself)   bracket expression (matches itself)   notation
.                   .          \.                                   [.]                                   The period '.' matches any single character; it needs no backslash.
?                   \?         \? (ERE) or ? (BRE)                  [?]                                   The preceding item is optional and is matched at most once.
*                   *          \* or * (at the start of a BRE)      [*]                                   The preceding item is matched zero or more times.
+                   \+         \+ (ERE) or + (BRE)                  [+]                                   The preceding item is matched one or more times.
{n}                 \{n\}      \{ or \} (ERE), { or } (BRE)         [{] or [}]                            The preceding item is matched exactly n times.
{n,}                \{n,\}     same as above                        same as above                         The preceding item is matched n or more times.
{,m}                \{,m\}     same as above                        same as above                         The preceding item is matched at most m times. This is a GNU extension.
{n,m}               \{n,m\}    same as above                        same as above                         The preceding item is matched at least n times, but not more than m times.
^                   ^          \^                                                                         Only sequences starting at the first character of a string are matched.
$                   $          \$                                   [$]                                   Matches the end-of-string following the last character.
\                   \          \\                                   [\]                                   A literal backslash is written as two backslashes.
|                   \|         \| (ERE) or | (BRE)                  [|]                                   Alternation: "expression1\|expression2".
()                  \(\)       ( or )                               [(] or [)]                            Group expression: groups the inner regexp as a whole.
[]                  []         \[ or \]                             [[] or []]                            Bracket expression; no backslash is needed.
-                   -          -                                    [-]                                   Range expression inside a bracket expression; no backslash is needed.

Note that:
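A <period> ( '.' ), when used outside a bracket expression, is a BRE that shall match any character in the supported character set except NUL. In line-oriented utilities such as grep the <newline> is removed before matching, so '.' never matches it there.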

Note where the table above requires a backslash: some constructs need one and others do not.
To match one of these special characters literally, escape it.
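
Escaping a literal string so that it matches itself can be sketched as below. The character sets are assumptions drawn from the POSIX lists of special characters outside bracket expressions; the function names are made up, and delimiter handling for tools such as sed (for example the "/" in "s/…/…/") is deliberately left out.

```python
# Characters that can be special in a BRE outside a bracket expression.
BRE_SPECIAL = set(".[\\*^$")
# An ERE adds these operators and grouping characters.
ERE_SPECIAL = BRE_SPECIAL | set("()|+?{")

def escape_for(literal, special):
    """Put a backslash in front of every character found in `special`."""
    return "".join("\\" + ch if ch in special else ch for ch in literal)

def bre_escape(literal):
    """Escape `literal` for use as a basic regular expression."""
    return escape_for(literal, BRE_SPECIAL)

def ere_escape(literal):
    """Escape `literal` for use as an extended regular expression."""
    return escape_for(literal, ERE_SPECIAL)

print(bre_escape("a.b*c"))
print(bre_escape("1+1=2?"))
print(ere_escape("1+1=2?"))
```

The asymmetry is the point: in a BRE, "+", "?", "{", "(" and "|" must stay unescaped to be literal, because adding a backslash is exactly what turns them into operators.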

Special Backslash Expressions

The ‘\’ character followed by a special character is a regular expression that matches the special character. The ‘\’ character, when followed by certain ordinary characters, takes a special meaning:

character   notation
‘\b’        Match the empty string at the edge of a word.
‘\B’        Match the empty string provided it’s not at the edge of a word.
‘\<’        Match the empty string at the beginning of a word.
‘\>’        Match the empty string at the end of a word.
‘\w’        Match word constituent, it is a synonym for [_[:alnum:]].
‘\W’        Match non-word constituent, it is a synonym for [^_[:alnum:]].
‘\s’        Match whitespace, it is a synonym for [[:space:]].
‘\S’        Match non-whitespace, it is a synonym for [^[:space:]].
‘\]’        Match ‘]’.
‘\}’        Match ‘}’.

For example, ‘\brat\b’ matches the separate word ‘rat’, ‘\Brat\B’ matches ‘crate’ but not ‘furry rat’.
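
The word-boundary example can be tried out with Python's `re` module, which happens to use the same ‘\b’, ‘\B’, ‘\w’ and ‘\s’ notation. This is an approximation rather than grep: the `re.ASCII` flag is assumed here to keep ‘\w’ close to [_[:alnum:]] in the POSIX locale, Python has no ‘\<’ or ‘\>’, and the helper name `found` is invented.

```python
import re

def found(pattern, text):
    """True if `pattern` matches somewhere in `text` (ASCII word rules)."""
    return re.search(pattern, text, flags=re.ASCII) is not None

for pattern in (r"\brat\b", r"\Brat\B"):
    for text in ("furry rat", "crate"):
        print(pattern, repr(text), found(pattern, text))

# \w covers letters, digits and the underscore, so "a_b" is one word.
print(re.findall(r"\w+", "a_b c-d", flags=re.ASCII))
```

The loop reproduces the example above: the bounded pattern finds the separate word in “furry rat” only, and the ‘\B’ form finds “rat” inside “crate” only.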

The behavior of grep is unspecified if an unescaped backslash is not followed by a special character, a nonzero digit, or a character in the above list. Although grep might issue a diagnostic and/or give the backslash an interpretation now, its behavior may change if the syntax of regular expressions is extended in future versions.

RE Bracket Expression

A bracket expression (an expression enclosed in square brackets, “[]” ) is an RE that shall match a specific set of single characters, and may match a specific set of multi-character collating elements, based on the non-empty set of list expressions contained in the bracket expression.

The following rules and definitions apply to bracket expressions:

  1. A bracket expression is either a matching list expression or a non-matching list expression.
  2. It consists of one or more expressions: ordinary characters, collating elements, collating symbols, equivalence classes, character classes, or range expressions.
  3. The <right-square-bracket> ( ']' ) shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list (after an initial <circumflex> ( ‘^’ ), if any). Otherwise, it shall terminate the bracket expression, unless it appears in a collating symbol (such as “[.].]” ) or is the ending for a collating symbol, equivalence class, or character class.
  4. The special characters '.', '*', '[', and '\\' ( <period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within a bracket expression.
  5. The character sequences "[.", "[=", and "[:" ( <left-square-bracket> followed by a <period>, <equals-sign>, or <colon>) shall be special inside a bracket expression and are used to delimit collating symbols, equivalence class expressions, and character class expressions. These symbols shall be followed by a valid expression and the matching terminating sequence ".]", "=]", or ":]", as described in the following items.
  6. A matching list expression specifies a list that shall match any single character that is matched by one of the expressions represented in the list.
  7. The first character in the list cannot be the <circumflex>.
  8. An ordinary character in the list should only match that character, but may match any single character that collates equally with that character; for example, “[abc]” is an RE that should only match one of the characters ‘a’, ‘b’, or ‘c’.
  9. A non-matching list expression begins with a <circumflex> ( '^' ), and the matching behavior shall be the logical inverse of the corresponding matching list expression (the same bracket expression but without the leading <circumflex>). For example, if the RE "[abc]" only matches 'a', 'b', or 'c', then "[^abc]" is an RE that matches any character except 'a', 'b', or 'c'. It is unspecified whether a non-matching list expression matches a multi-character collating element that is not matched by any of the expressions.
  10. The <circumflex> shall have this special meaning only when it occurs first in the list, immediately following the <left-square-bracket>.
  11. A collating symbol is a collating element enclosed within bracket-period ( “[.” and “.]” ) delimiters. Collating elements are defined as described in Collation Order.
    Conforming applications shall represent multi-character collating elements as collating symbols when it is necessary to distinguish them from a list of the individual characters that make up the multi-character collating element. For example, if the string “ch” is a collating element defined using the line:
    collating-element <ch-digraph> from "<c><h>"
    in the locale definition, the expression “[[.ch.]]” shall be treated as an RE containing the collating symbol ‘ch’, while “[ch]” shall be treated as an RE matching ‘c’ or ‘h’.
    Collating symbols are recognized only inside bracket expressions. If the string is not a collating element in the current locale, the expression is invalid.
  12. An equivalence class expression shall represent the set of collating elements belonging to an equivalence class, as described in Collation Order.
    Only primary equivalence classes shall be recognized.
    The class shall be expressed by enclosing any one of the collating elements in the equivalence class within bracket-equal ( “[=” and “=]” ) delimiters.
    For example, if ‘a’, ‘à’, and ‘â’ belong to the same equivalence class, then “[[=a=]b]”, “[[=à=]b]”, and “[[=â=]b]” are each equivalent to “[aàâb]”. If the collating element does not belong to an equivalence class, the equivalence class expression shall be treated as a collating symbol.
  13. A character class expression shall represent the union of two sets:
    The set of single characters that belong to the character class, as defined in the LC_CTYPE category in the current locale.
    An unspecified set of multi-character collating elements.
    All character classes specified in the current locale shall be recognized.
    A character class expression is expressed as a character class name enclosed within bracket- <colon> ( "[:" and ":]" ) delimiters.

The following character class expressions shall be supported in all locales:

[:alnum:]   [:cntrl:]   [:lower:]   [:space:]
[:alpha:]   [:digit:]   [:print:]   [:upper:]
[:blank:]   [:graph:]   [:punct:]   [:xdigit:]

  14. In the POSIX locale, a range expression shall be expressed as the starting point and the ending point separated by a <hyphen-minus> ( '-' ).
    In the following, all examples assume the POSIX locale.
    If the represented set of collating elements is empty, it is unspecified whether the expression matches nothing, or is treated as invalid.
    The interpretation of range expressions where the ending range point is also the starting range point of a subsequent range expression (for example, "[a-m-o]" ) is undefined.
    The <hyphen-minus> character shall be treated as itself if it occurs first (after an initial ‘^’, if any) or last in the list, or as an ending range point in a range expression. As examples, the expressions “[-ac]” and “[ac-]” are equivalent and match any of the characters 'a', 'c', or '-'; "[^-ac]" and "[^ac-]" are equivalent and match any characters except 'a', 'c', or '-'; the expression "[%--]" matches any of the characters between ‘%’ and ‘-’ inclusive; the expression "[--@]" matches any of the characters between ‘-’ and ‘@’ inclusive; and the expression "[a--@]" is either invalid or equivalent to ‘@’, because the letter ‘a’ follows the symbol ‘-’ in the POSIX locale.
    To use a <hyphen-minus> as the starting range point, it shall either come first in the bracket expression or be specified as a collating symbol; for example, "[][.-.]-0]", which matches either a <right-square-bracket> or any character or collating element that collates between <hyphen-minus> and 0, inclusive.
    If a bracket expression specifies both ‘-’ and ‘]’, the ‘]’ shall be placed first (after the ‘^’, if any) and the ‘-’ last within the bracket expression.
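
The placement rules for ‘]’, ‘^’ and ‘-’ can be turned into a small builder that writes a matching list for a given set of literal characters. This is a sketch under stated assumptions: single characters only, no ranges, classes or collating symbols, and the function name `bracket_for` is invented.

```python
def bracket_for(chars):
    """Write a matching list that matches exactly the characters in `chars`.

    Follows the placement rules above: ']' goes first, '-' goes last,
    and '^' is kept away from the first position.
    """
    chars = set(chars)
    if not chars:
        raise ValueError("a bracket expression needs at least one character")
    if chars == {"^"}:
        raise ValueError("a lone '^' cannot form a matching list; escape it instead")
    head = "]" if "]" in chars else ""
    caret = "^" if "^" in chars else ""
    tail = "-" if "-" in chars else ""
    # Sorted by code point, '[' comes after '.', ':' and '=', so the
    # special sequences "[.", "[:" and "[=" cannot appear by accident.
    middle = "".join(sorted(chars - set("]^-")))
    body = head + middle + caret + tail
    if body.startswith("^"):
        body = "-^"   # only '^' and '-' are present: '-' may also come first
    return "[" + body + "]"

print(bracket_for("a-]"))
print(bracket_for("^a"))
print(bracket_for("^-"))
```

Keeping the hyphen at the very end also guarantees that the builder never creates a range expression by accident.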

Note:
A future version of this standard may require that an ordinary character in the list only matches that character.
It is unspecified whether a matching list expression matches a multi-character collating element that is matched by one of the expressions.

Anchoring

The caret ‘^’ and the dollar sign ‘$’ are special characters that respectively match the empty string at the beginning and end of a line. They are termed anchors, since they force the match to be “anchored” to beginning or end of a line, respectively.

Back-references and Subexpressions

The back-reference ‘\n’, where n is a single nonzero digit, matches the substring previously matched by the nth parenthesized subexpression of the regular expression.
For example, ‘(a)\1’ matches ‘aa’. If the parenthesized subexpression does not participate in the match, the back-reference makes the whole match fail; for example, ‘(a)*\1’ fails to match ‘a’. If the parenthesized subexpression matches more than one substring, the back-reference refers to the last matched substring; for example, ‘^(ab*)*\1$’ matches ‘ababbabb’ but not ‘ababbab’. When multiple regular expressions are given with -e or from a file (‘-f file’), back-references are local to each expression.
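
The back-reference examples above use extended syntax, which Python's `re` module happens to share for these patterns, so they can be checked directly. This only shows that a backtracking engine agrees with the description in these cases; it says nothing about grep's implementation, and the helper name `matches_whole` is made up.

```python
import re

def matches_whole(pattern, text):
    """True if `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None

# The back-reference repeats what group 1 matched.
print(matches_whole(r"(a)\1", "aa"))
# Group 1 either takes the only 'a' or does not take part at all,
# so the back-reference has nothing left to match.
print(matches_whole(r"(a)*\1", "a"))
# With a repeated group, \1 refers to the last iteration ("abb").
print(matches_whole(r"(ab*)*\1", "ababbabb"))
print(matches_whole(r"(ab*)*\1", "ababbab"))
```

In the third case the group matches “ab” and then “abb”, and the back-reference consumes the final “abb”; in the fourth no split of the string leaves a suffix equal to the last iteration.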

Basic vs Extended Regular Expressions

Basic regular expressions differ from extended regular expressions in the following ways:

  1. The characters ‘?’, ‘+’, ‘{’, ‘|’, ‘(’, and ‘)’ lose their special meaning; instead use the backslashed versions ‘\?’, ‘\+’, ‘\{’, ‘\|’, ‘\(’, and ‘\)’. Also, a backslash is needed before an interval expression’s closing ‘}’.
  2. An unmatched ‘\)’ is invalid.
  3. If an unescaped ‘^’ appears neither first, nor directly after ‘\(’ or ‘\|’, it is treated like an ordinary character and is not an anchor.
  4. If an unescaped ‘$’ appears neither last, nor directly before ‘\|’ or ‘\)’, it is treated like an ordinary character and is not an anchor.
  5. If an unescaped ‘*’ appears first, or appears directly after ‘\(’ or ‘\|’ or anchoring ‘^’, it is treated like an ordinary character and is not a repetition operator.

Problematic Regular Expressions

Some strings are invalid regular expressions and cause grep/sed to issue a diagnostic and fail. For example, ‘xy\1’ is invalid because there is no parenthesized subexpression for the back-reference ‘\1’ to refer to.

Also, some regular expressions have unspecified behavior and should be avoided even if grep does not currently diagnose them. For example, ‘xy\0’ has unspecified behavior because ‘0’ is not a special character and ‘\0’ is not a special backslash expression (see Special Backslash Expressions). Unspecified behavior can be particularly problematic because the set of matched strings might be only partially specified, or not be specified at all, or the expression might even be invalid.

The following regular expression constructs are invalid on all platforms conforming to POSIX, so portable scripts can assume that grep rejects these constructs:

  1. A basic regular expression containing a back-reference ‘\n’ preceded by fewer than n closing parentheses. For example, ‘\(a\)\2’ is invalid.
  2. A bracket expression containing ‘[:’ that does not start a character class; and similarly for ‘[=’ and ‘[.’. For example, ‘[a[:b]’ and ‘[a[:ouch:]b]’ are invalid.

GNU grep treats the following constructs as invalid. However, other grep implementations might allow them, so portable scripts should not rely on their being invalid:

  1. Unescaped ‘\’ at the end of a regular expression.
  2. Unescaped ‘[’ that does not start a bracket expression.
  3. A ‘\{’ in a basic regular expression that does not start an interval expression.
  4. A basic regular expression with unbalanced ‘\(’ or ‘\)’, or an extended regular expression with unbalanced ‘(’.
  5. In the POSIX locale, a range expression like ‘z-a’ that represents zero elements. A non-GNU grep might treat it as a valid range that never matches.
  6. An interval expression with a repetition count greater than 32767. (The portable POSIX limit is 255, and even interval expressions with smaller counts can be impractically slow on all known implementations.)
  7. A bracket expression that contains at least three elements, the first and last of which are both ‘:’, or both ‘.’, or both ‘=’. For example, a non-GNU grep might treat ‘[:alpha:]’ like ‘[[:alpha:]]’, or like ‘[:ahlp]’.

The following constructs have well-defined behavior in GNU grep. However, they have unspecified behavior elsewhere, so portable scripts should avoid them:

  1. Special backslash expressions like ‘\b’, ‘\<’, and ‘\]’. See Special Backslash Expressions.
  2. A basic regular expression that uses ‘\?’, ‘\+’, or ‘\|’.
  3. An extended regular expression that uses back-references.
  4. An empty regular expression, subexpression, or alternative. For example, ‘(a|bc|)’ is not portable; a portable equivalent is ‘(a|bc)?’.
  5. In a basic regular expression, an anchoring ‘^’ that appears directly after ‘\(’, or an anchoring ‘$’ that appears directly before ‘\)’.
  6. In a basic regular expression, a repetition operator that directly follows another repetition operator.
  7. In an extended regular expression, unescaped ‘{’ that does not begin a valid interval expression. GNU grep treats the ‘{’ as an ordinary character.
  8. A null character or an encoding error in either pattern or input data. See Character Encoding.
  9. An input file that ends in a non-newline character, where GNU grep silently supplies a newline.

The following constructs have unspecified behavior, in both GNU and other grep implementations. Scripts should avoid them whenever possible.

  1. A backslash escaping an ordinary character, unless it is a back-reference like ‘\1’ or a special backslash expression like ‘\<’ or ‘\b’. See Special Backslash Expressions. For example, ‘\x’ has unspecified behavior now, and a future version of grep might specify ‘\x’ to have a new behavior.
  2. A repetition operator that appears directly after an anchor, or at the start of a complete regular expression, parenthesized subexpression, or alternative. For example, ‘+|^*(+a|?-b)’ has unspecified behavior, whereas ‘\+|^\*(\+a|\?-b)’ is portable.
  3. A range expression outside the POSIX locale. For example, in some locales ‘[a-z]’ might match some characters that are not lowercase letters, or might not match some lowercase letters, or might be invalid. With GNU grep it is not documented whether these range expressions use native code points, or use the collating sequence specified by the LC_COLLATE category, or have some other interpretation. Outside the POSIX locale, it is portable to use ‘[[:lower:]]’ to match a lower-case letter, or ‘[abcdefghijklmnopqrstuvwxyz]’ to match an ASCII lower-case letter.

If a bracket expression contains at least three list elements, where the first and last list elements are the same single-character element of <period>, <equals-sign>, or <colon>, then it is unspecified whether the bracket expression will be treated as a collating symbol, equivalence class, or character class, respectively; treated as a matching list expression; or rejected as an error.

Extended Regular Expression ERE
