8 Regular Expressions You Should Know


Regular expressions are a language of their own. When you learn a new programming language, they're this little sub-language that makes no sense at first glance. Many times you have to read another tutorial, article, or book just to understand the "simple" pattern described. Today, we'll review eight regular expressions that you should know for your next coding project.

Background Info on Regular Expressions

This is what Wikipedia has to say about them:

In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters. Regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.

Now, that doesn't really tell me much about the actual patterns. The regexes I'll be going over today contain characters such as \w, \s, \1, and many others that represent something totally different from what they look like.

If you'd like to learn a little about regular expressions before you continue reading this article, I'd suggest watching the Regular Expressions for Dummies screencast series.

The eight regular expressions we'll be going over today will allow you to match a(n): username, password, email, hex value (like #fff or #000), slug, URL, IP address, and an HTML tag. As the list goes down, the regular expressions get more and more confusing. The pictures for each regex in the beginning are easy to follow, but the last four are more easily understood by reading the explanation.

The key thing to remember about regular expressions is that they are almost read forwards and backwards at the same time. This sentence will make more sense when we talk about matching HTML tags.

Note: The delimiters used in the regular expressions are forward slashes, "/". Each pattern begins and ends with a delimiter. If a forward slash appears in a regex, we must escape it with a backslash: "\/".

1. Matching a Username

Matching a username

Pattern:

Description:

We begin by telling the parser to find the beginning of the string (^), followed by any lowercase letter (a-z), number (0-9), underscore, or hyphen. Next, {3,16} makes sure that there are at least 3 of those characters, but no more than 16. Finally, we want the end of the string ($).

String that matches:

my-us3r_n4m3

String that doesn't match:

th1s1s-wayt00_l0ngt0beausername (too long)
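
The pattern itself did not survive in this copy, so here is a minimal Python sketch of a regex rebuilt from the description above. The exact pattern and the names USERNAME_RE and is_valid_username are assumptions for illustration, not the author's original text. Python's re module takes the pattern as a plain string, so no slash delimiters are needed.

```python
import re

# Rebuilt from the description: start of string, then 3 to 16 characters
# drawn from lowercase letters, digits, underscore, or hyphen, then the end.
USERNAME_RE = re.compile(r"^[a-z0-9_-]{3,16}$")


def is_valid_username(text: str) -> bool:
    """Return True when the text fits the reconstructed username pattern."""
    return USERNAME_RE.match(text) is not None


print(is_valid_username("my-us3r_n4m3"))                     # True
print(is_valid_username("th1s1s-wayt00_l0ngt0beausername"))  # False (31 characters)
```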

2. Matching a Password

Matching a password

Pattern:

Description:

Matching a password is very similar to matching a username. The only difference is that instead of 3 to 16 letters, numbers, underscores, or hyphens, we want 6 to 18 of them ({6,18}).

String that matches:

myp4ssw0rd

String that doesn't match:

mypa$$w0rd (contains a dollar sign)
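
As a rough sketch, assuming the password pattern differs from the username one only in its {6,18} quantifier (as the description says), the check could look like this in Python. PASSWORD_RE and is_valid_password are hypothetical names.

```python
import re

# Same character class as the username pattern; only the length bounds change.
PASSWORD_RE = re.compile(r"^[a-z0-9_-]{6,18}$")


def is_valid_password(text: str) -> bool:
    """Return True when the text fits the reconstructed password pattern."""
    return PASSWORD_RE.match(text) is not None


for candidate in ("myp4ssw0rd", "mypa$$w0rd"):
    # The dollar signs fall outside the character class, so the second fails.
    print(candidate, is_valid_password(candidate))
```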

3. Matching a Hex Value

Matching a hex value

Pattern:

Description:

We begin by telling the parser to find the beginning of the string (^). Next, a number sign is optional because it is followed by a question mark. The question mark tells the parser that the preceding character (in this case a number sign) is optional, but to be "greedy" and capture it if it's there. Next, inside the first group (the first pair of parentheses), we can have two different situations. The first is any lowercase letter between a and f or a number, six times. The vertical bar tells us that we can instead have three lowercase letters between a and f or numbers. Finally, we want the end of the string ($).

I put the six-character alternative first so that the parser tries a full hex value like #ffffff before the short form. In a pattern without the end-of-string anchor, putting the three-character alternative first would make the parser pick up only #fff and leave the other three f's behind. With the $ anchor in place the order no longer changes the result, but it is a good habit to keep.

String that matches:

#a3c113

String that doesn't match:

#4d82h4 (contains the letter h)
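
The description above can be sketched in Python as follows. The pattern is a reconstruction from the prose, and HEX_RE and parse_hex are hypothetical names. The function returns the captured group so the effect of the parentheses is visible.

```python
import re
from typing import Optional

# Optional number sign, then either six or three hex characters (lowercase).
HEX_RE = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})$")


def parse_hex(text: str) -> Optional[str]:
    """Return the hex digits without the number sign, or None if no match."""
    match = HEX_RE.match(text)
    return match.group(1) if match else None


print(parse_hex("#a3c113"))  # a3c113
print(parse_hex("fff"))      # fff
print(parse_hex("#4d82h4"))  # None (h is not a hex digit)
```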

4. Matching a Slug

Matching a slug

Pattern:

Description:

You will be using this regex if you ever have to work with mod_rewrite and pretty URLs. We begin by telling the parser to find the beginning of the string (^), followed by one or more (the plus sign) letters, numbers, or hyphens. Finally, we want the end of the string ($).

String that matches:

my-title-here

String that doesn't match:

my_title_here (contains underscores)
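
A minimal sketch of the slug check in Python, rebuilt from the description. It assumes lowercase letters only, as in the other patterns in this article. SLUG_RE and is_slug are hypothetical names.

```python
import re

# One or more lowercase letters, digits, or hyphens, anchored at both ends.
SLUG_RE = re.compile(r"^[a-z0-9-]+$")


def is_slug(text: str) -> bool:
    """Return True when the text fits the reconstructed slug pattern."""
    return SLUG_RE.match(text) is not None


print(is_slug("my-title-here"))  # True
print(is_slug("my_title_here"))  # False (underscores are not in the class)
```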

5. Matching an Email

Matching an email

Pattern:

Description:

We begin by telling the parser to find the beginning of the string (^). Inside the first group, we match one or more lowercase letters, numbers, underscores, dots, or hyphens. I have escaped the dot because a non-escaped dot means any character. Directly after that, there must be an at sign. Next is the domain name, which must be one or more lowercase letters, numbers, underscores, dots, or hyphens. Then comes another (escaped) dot, with the extension being two to six letters or dots. I have 2 to 6 because of the country-specific TLDs (.ny.us or .co.uk). Finally, we want the end of the string ($).

String that matches:

john@doe.com

String that doesn't match:

john@doe.something (TLD is too long)
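
Following the description literally, a Python sketch of the email pattern might look like the one below. The regex is a reconstruction (the original image is missing), and EMAIL_RE and split_email are hypothetical names. Inside a character class the dot needs no escaping in Python, so it is written bare here.

```python
import re
from typing import Optional, Tuple

# Three groups: local part, domain name, and a 2-to-6 character extension.
EMAIL_RE = re.compile(r"^([a-z0-9_.-]+)@([a-z0-9_.-]+)\.([a-z.]{2,6})$")


def split_email(text: str) -> Optional[Tuple[str, str, str]]:
    """Return (local, domain, extension), or None if the text does not match."""
    match = EMAIL_RE.match(text)
    return match.groups() if match else None


print(split_email("john@doe.com"))        # ('john', 'doe', 'com')
print(split_email("john@doe.something"))  # None (extension is 9 letters)
```

This is a deliberately loose check in the spirit of the article; it is not a full validator for every address that mail systems accept.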

6. Matching a URL

Matching a URL

Pattern:

Description:

This regex is almost like taking the ending part of the above regex and slapping it between "http://" and some file structure at the end. It sounds a lot simpler than it really is. To start off, we search for the beginning of the line with the caret.

The first capturing group is entirely optional. It allows the URL to begin with "http://", "https://", or neither of them. I have a question mark after the s to allow URLs that have http or https. In order to make this entire group optional, I just added a question mark to the end of it.

Next is the domain name: one or more numbers, letters, dots, or hyphens, followed by another dot and then two to six letters or dots. The following section is the optional files and directories. Inside the group, we want to match any number of forward slashes, letters, numbers, underscores, spaces, dots, or hyphens. Then we say that this group can be matched as many times as we want. This allows multiple directories to be matched along with a file at the end. I have used the star instead of the question mark because the star says zero or more, not zero or one. If a question mark were used there, only one file or directory could be matched.

Then an optional trailing slash is matched. Finally, we end with the end of the line.

String that matches:

http://net.tutsplus.com/about

String that doesn't match:

http://google.com/some/file!.html (contains an exclamation point)
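
Here is a hedged Python sketch of the URL pattern, rebuilt from the description. One deliberate change: the description's repeated path group (a starred character class inside a starred group) is written as a single character class with one star. That accepts the same strings while avoiding nested quantifiers, which can backtrack heavily on inputs that fail. URL_RE and split_url are hypothetical names.

```python
import re
from typing import Optional, Tuple

URL_RE = re.compile(
    r"^(https?://)?"     # optional scheme: http:// or https://
    r"([\da-z.-]+)"      # domain name: digits, lowercase letters, dots, hyphens
    r"\.([a-z.]{2,6})"   # a dot, then a 2-to-6 character extension
    r"([/\w .-]*)"       # optional directories and file name
    r"/?$"               # optional trailing slash, then end of string
)


def split_url(text: str) -> Optional[Tuple[Optional[str], ...]]:
    """Return (scheme, domain, extension, path), or None if no match."""
    match = URL_RE.match(text)
    return match.groups() if match else None


print(split_url("http://net.tutsplus.com/about"))
# ('http://', 'net.tutsplus', 'com', '/about')
print(split_url("http://google.com/some/file!.html"))  # None
```

Python strings have no slash delimiters, so the forward slashes are left unescaped.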

7. Matching an IP Address

Matching an IP address

Pattern:

Description:

Now, I'm not going to lie, I didn't write this regex; I got it from another source. That doesn't mean that I can't rip it apart character by character.

The first capture group really isn't a captured group because ?: was placed inside, which tells the parser not to capture this group (more on this in the last regex). We also want this non-captured group to be repeated three times, hence the {3} at the end of the group. This group contains another group, a subgroup, and a literal dot. The parser looks for a match in the subgroup and then a dot before moving on.

The subgroup is also a non-capture group. It's just a bunch of character sets (things inside brackets): the string "25" followed by a number between 0 and 5; or the string "2" followed by a number between 0 and 4 and then any number; or an optional zero or one followed by two numbers, with the second being optional.

After we match three of those, it's onto the next non-capturing group. This one wants: the string "25" followed by a number between 0 and 5; or the string "2" with a number between 0 and 4 and another number at the end; or an optional zero or one followed by two numbers, with the second being optional.

We end this confusing regex with the end of the string.

String that matches:

73.60.124.136 (no, that is not my IP address :P)

String that doesn't match:

256.60.124.136 (a group that starts with "25" must end with a number between zero and five)
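
The three alternatives described above can be sketched in Python by writing the subgroup once and reusing it. This is a reconstruction from the prose; OCTET, IP_RE and is_ipv4 are hypothetical names.

```python
import re

# One number from 0 to 255: 250-255, or 200-249, or 0-199 (one to three digits).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Three "octet plus dot" repetitions in a non-capturing group, then a final octet.
IP_RE = re.compile(r"^(?:" + OCTET + r"\.){3}" + OCTET + r"$")


def is_ipv4(text: str) -> bool:
    """Return True when the text fits the reconstructed IP address pattern."""
    return IP_RE.match(text) is not None


print(is_ipv4("73.60.124.136"))   # True
print(is_ipv4("256.60.124.136"))  # False (256 is out of range)
```

Building the pattern from a shared OCTET string is a convenience of this sketch; the description suggests the original spelled the subgroup out twice.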


8. Matching an HTML Tag

Matching an HTML tag

Pattern:

Description:

One of the more useful regexes on the list. It matches any HTML tag with the content inside. As usual, we begin with the start of the line.

First comes the tag's name. It must be one or more letters long. This is the first capture group, which comes in handy when we have to grab the closing tag. Next come the tag's attributes. This is any character but a greater than sign (>). Since this is optional, but I want to match more than one character, the star is used. The plus sign makes up the attribute and value, and the star allows as many attributes as you want.

Next comes the third group, which is a non-capture group. Inside, it will contain either a greater than sign, some content, and a closing tag; or some spaces, a forward slash, and a greater than sign. The first option looks for a greater than sign followed by any number of characters and the closing tag. \1 is used, which represents the content that was captured in the first capturing group. In this case it was the tag's name. Now, if that couldn't be matched, we want to look for a self-closing tag (like an img, br, or hr tag). This needs to have one or more spaces followed by "/>".

The regex is ended with the end of the line.

String that matches:

<a href="http://net.tutsplus.com/">Nettuts+</a>

String that doesn't match:

<img src="img.jpg" alt="My image>" /> (attributes can't contain greater than signs)
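
A hedged Python sketch of the tag pattern follows, rebuilt from the description. One swap is made on purpose: the attribute group that the description builds from a plus inside a star is flattened to a single starred character class, because the nested form can backtrack exponentially on strings that fail. TAG_RE and parse_tag are hypothetical names.

```python
import re
from typing import Optional, Tuple

TAG_RE = re.compile(
    r"^<([a-z]+)"   # group 1: the tag name
    r"([^>]*)"      # group 2: attributes, anything except a greater than sign
    r"(?:"
    r">(.*)</\1>"   # either: content (group 3) and a closing tag reusing \1
    r"|\s+/>"       # or: a self-closing tag, spaces followed by />
    r")$"
)


def parse_tag(text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (tag name, inner content), or None if the text does not match."""
    match = TAG_RE.match(text)
    if match is None:
        return None
    return match.group(1), match.group(3)


print(parse_tag('<a href="http://net.tutsplus.com/">Nettuts+</a>'))  # ('a', 'Nettuts+')
print(parse_tag("<br />"))                                            # ('br', None)
print(parse_tag('<img src="img.jpg" alt="My image>" />'))             # None
```

The backreference \1 is what ties the closing tag to the opening one: the text captured by the first group must appear again, character for character.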

Conclusion

I hope that you have grasped the ideas behind regular expressions a little bit better. Hopefully you'll be using these regexes in future projects! Many times you won't need to decipher a regex character by character, but doing so sometimes helps you learn. Just remember: don't be afraid of regular expressions. It might not seem like it, but they make your life a lot easier. Just try to pull out a tag's name from a string without regular expressions! ;)


Reposted from: https://my.oschina.net/u/2363463/blog/635777
