Online practice: https://regex101.com/r/2ITLQ4/1
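# Load the packages used to display matches
library(htmltools)
library(htmlwidgets)
# Assumed setup for the examples below: stringr, dplyr and tibble (via tidyverse), plus magrittr for %T>%
library(tidyverse)
library(magrittr)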
Two useful teaching functions:
str_view(string, pattern, match = NA): stops after the first match of pattern
str_view_all(string, pattern, match = NA): shows every substring that matches pattern
inferregex::infer_regex()
## remotes::install_github("daranzolin/inferregex")
library(inferregex)
s <- "abcd-9999-ab9"
infer_regex(s)$regex
#> [1] "^[a-z]{4}-\\d{4}-[a-z]{2}\\d$"
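Strings usually hold unstructured or semi-structured data; a regular expression describes the patterns in a string in a compact language.
Regular expressions rely mainly on metacharacters. A metacharacter does not stand for its literal self; each one has a special meaning. The table below introduces some of them: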
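Metacharacter | Description |
---|---|
Character sets | |
`[ ]` | Character class. Matches any one character listed inside the brackets; `[ ]` can be read as "or". |
`[^ ]` | Negated character class. Matches any one character not listed inside the brackets. |
Repetition | |
`*` | Same as `{0,}`. Matches 0 or more repetitions of the item before `*`; greedy by default. |
`+` | Same as `{1,}`. Matches 1 or more repetitions of the item before `+`; greedy by default. |
`?` | Same as `{0,1}`. Matches 0 or 1 of the item before `?`. When it follows `*`, `+`, `?` or `{n,m}`, it instead switches that quantifier to "lazy" matching (see note 1). |
`{n,m}` | Matches num repetitions of the item before the braces, with n <= num <= m. `{n,}` means at least n. There is no `{,m}` form, because `{0,m}` already covers it. |
Groups | |
`(xyz)` | Group. Matches the exact string xyz. |
`\|` | Alternation. Matches the character or string before or after the operator. The "or" of `[ ]` only covers single characters, which is how it differs from `\|`. |
Escaping | Note: in an R string the escape character must be written as `\\`, which is the string form of the regular expression. To match a literal `\`, write `\\\\` in the string. |
`\` | Escapes characters that have a special meaning in regular expressions: `[ ] ( ) { } . * + ? ^ $ \ \|` |
Anchors | |
`^` | Matches only at the start of the string. |
`$` | Matches only at the end of the string. |
`^...$` | Strict match from start to end: the pattern must cover the whole target string. |
Lookarounds | Work somewhat like conditional matching |
`...(?=...)` | Positive lookahead. Condition: the given text follows. |
`...(?!...)` | Negative lookahead. Condition: the given text does not follow. |
`(?<=...)...` | Positive lookbehind. Condition: the given text precedes. |
`(?<!...)...` | Negative lookbehind. Condition: the given text does not precede. |
Flags | As written in JavaScript regular expressions |
`/.../g` | Global: find all matches instead of stopping at the first. |
`/.../i` | Ignore case. |
`/.../m` | Multiline matching. |
Shorthand | |
`.` | Matches any single character except a newline. |
`\w` | Short for "word". Matches letters, digits and the underscore; for ASCII text the same as `[a-zA-Z0-9_]`. |
`\W` | Matches anything that is not a letter, digit or underscore; same as `[^\w]`. ==Well suited as a delimiter for splitting text into words.== |
`\d` | Short for "digit". Matches a digit: `[0-9]`. |
`\D` | Matches a non-digit: `[^\d]`. |
`\s` | Short for "space". Matches any whitespace character; same as `[\t\n\f\r\p{Z}]`. |
`\S` | Matches any non-whitespace character: `[^\s]`. |
`\f` | Matches a form feed. |
`\n` | Matches a newline. |
`\r` | Matches a carriage return. |
`\t` | Matches a tab. |
`\v` | Matches a vertical tab. |
`\p` | Described here as matching CR/LF (`\r\n`), used for DOS line terminators. Note that in the ICU engine behind stringr, `\p{...}` selects a Unicode property class (as in `\p{Z}` above), while `\R` matches a line break. |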
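To make the escaping note in the table concrete, here is a minimal sketch showing how a regular expression is written as an R string and how two of the shorthand classes behave. The sample strings are made up for illustration and are not from the original text.

```r
library(stringr)

# The R string "\\d" holds the two characters \d, which is the actual regular expression
writeLines("\\d")

# \d+ picks out runs of digits; \w+ picks out runs of letters, digits and underscores
digits <- str_extract_all("room 12, floor 3", "\\d+")[[1]]
tokens <- str_extract_all("snake_case and kebab-case", "\\w+")[[1]]

# Four backslashes in the R string are needed to match one literal backslash
has_backslash <- str_detect("C:\\temp", "\\\\")

digits         # "12" "3"
tokens         # "snake_case" "and" "kebab" "case"
has_backslash  # TRUE
```

Note that the hyphen is not a word character, so \w+ splits kebab-case into two tokens while snake_case stays whole.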
[ ]
# Match The or the
str_view_all("The car parked in the garage.", "[Tt]he")
# Match ar followed by a literal period (inside [ ] the dot needs no escaping)
str_view_all("A garage is a good place to park a car.", "ar[.]")
[^ ]
# Match ar preceded by any character other than c
str_view_all("The car parked in the garage.", "[^c]ar")
*, +, ?
# Any run (0 or more) of lowercase letters
str_view_all("The car parked in the garage.", "[a-z]*")
# cat with 0 or more whitespace characters before it and 0 or more after it
str_view_all("The fat cat sat on the concatenation.", "\\s*cat\\s*")
# Starts with c, ends with t, with one or more arbitrary characters in between
str_view_all("The fat cat sat on the mat.", "c.+t")
# 0 or 1 T, so the T is optional: The or he
str_view_all("The car is parked in the garage.", "[T]?he")
# Match 'u' 0 or 1 times
str_view(c("color", "colour", "colo", "r"), "colou?r")
{}
# Runs of 2 to 3 digits
str_view_all("The number was 9.9997 but we rounded it off to 10.0.", "[0-9]{2,3}")
# Runs of 2 or more digits
str_view_all("The number was 9.9997 but we rounded it off to 10.0.", "[0-9]{2,}")
# Runs of exactly 3 digits
str_view_all("The number was 9.9997 but we rounded it off to 10.0.", "[0-9]{3}")
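Because str_view_all() only highlights matches in a viewer, it can help to see the same three patterns as plain values. The sketch below reuses the sentence from the examples above with str_extract_all(); the variable names are only for illustration.

```r
library(stringr)

x <- "The number was 9.9997 but we rounded it off to 10.0."

# {2,3}: greedy, so "9997" yields "999" and the leftover "7" is too short to match
two_to_three <- str_extract_all(x, "[0-9]{2,3}")[[1]]
# {2,}: no upper bound, so the whole run "9997" is taken
two_or_more <- str_extract_all(x, "[0-9]{2,}")[[1]]
# {3}: exactly three digits
exactly_three <- str_extract_all(x, "[0-9]{3}")[[1]]

two_to_three   # "999" "10"
two_or_more    # "9997" "10"
exactly_three  # "999"
```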
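? lazy matching
# As long as possible (greedy)
str_view("The cat sat on cat.", ".+at")
# As short as possible (lazy)
str_view("The cat sat on cat.", ".+?at")
x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII."
# Match a 'C' followed by 0 or 1 more 'C'; greedy, so as long as possible
str_view(x, "CC?")
# Lazy version
str_view(x, "CC??")
# Match a 'C' followed by 1 or more 'C', as long as possible
str_view(x, "CC+")
# Match 'CC' followed by 1 or more of 'C' or 'L', as long as possible
str_view(x, "CC[CL]+")
# Match exactly 2 'C'
str_view(x, "C{2}")
# Match 2 or more 'C'
str_view(x, "C{2,}")
# Match 2 to 3 'C'
str_view(x, "C{2,3}")
# Match 2 to 3 'C', as short as possible
str_view(x, "C{2,3}?")
# Match 'C' followed by exactly one 'L' or 'X' (the default)
str_view(x, "C[LX]")
# Match 'C' followed by 1 or more of 'L' or 'X', as long as possible
str_view(x, "C[LX]+")
# Match 'C' followed by 1 or more of 'L' or 'X', as short as possible
str_view(x, "C[LX]+?")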
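The difference between greedy and lazy quantifiers is easiest to see when the matched text is returned as a value. This is a small sketch using str_extract() on the same strings as above; the variable names are illustrative.

```r
library(stringr)

s <- "The cat sat on cat."
greedy <- str_extract(s, ".+at")   # .+ runs to the last possible "at"
lazy   <- str_extract(s, ".+?at")  # .+? stops at the first possible "at"

roman <- "MDCCCLXXXVIII"
greedy_c <- str_extract(roman, "C{2,3}")   # takes as many 'C' as allowed
lazy_c   <- str_extract(roman, "C{2,3}?")  # takes as few 'C' as allowed

greedy    # "The cat sat on cat"
lazy      # "The cat"
greedy_c  # "CCC"
lazy_c    # "CC"
```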
() grouping: combines several characters into a single unit. Inside () you can use | to mean "or".
str_view(c("grey", "gray"), "gr(e|a)y")
# Match car, gar or par
str_view_all("The car is parked in the garage.", "(c|g|p)ar")
# Match The, the or car
str_view_all("The car is parked in the garage.", "(T|t)he|car")
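\\ escaping
# a\. is a regular expression; "a\\." is that regular expression written as an R string
str_view(c("abc", "a.c", "bef"), "a\\.")
# fat, cat or mat followed by 0 or 1 '.'
str_view_all("The fat cat sat on the mat.", "(f|c|m)at\\.?")
# The tee pipe %T>% prints the string; the regular pipe then shows how it is matched.
# The string "\\\\" is the regular expression \\, which matches one literal '\'
"a\\b" %T>% writeLines() %>% str_view("\\\\")
#> a\b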
^ and $
# Match The or the at the start of the string
str_view_all("The car is parked in the garage.", "^(T|t)he")
# Match at. at the end of the string
str_view_all("The fat cat. sat. on the mat.", "(at\\.)$")
x <- c("apple pie", "apple", "apple cake")
str_view(x, "apple")
# Strict match of the whole string from start to end; a partial hit is no longer enough
str_view(x, "^apple$")
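Lookarounds: what is inside the () is a condition that must hold.
# Match The or the when it is immediately followed by whitespace and fat
str_view_all("The fat cat sat on the mat.", "(T|t)he(?=\\sfat)")
# Match The or the when it is not followed by whitespace and fat
str_view_all("The fat cat sat on the mat.", "(T|t)he(?!\\sfat)")
# Match fat or mat when it is immediately preceded by The or the plus whitespace
str_view_all("The fat cat sat on the mat.", "(?<=(T|t)he\\s)(fat|mat)")
# Match cat when it is not preceded by The or the plus whitespace
str_view_all("The cat sat on cat.", "(?<!(T|t)he\\s)(cat)")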
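A lookaround checks its condition without including that text in the match, which is what makes it useful for extraction. The sketch below uses a made-up sentence (not from the original) to pull out numbers depending on whether a dollar sign precedes them.

```r
library(stringr)

x <- "cost $15 and 20 items, $7 fee"

# Positive lookbehind: digits that come right after a literal '$' (the '$' is not returned)
prices <- str_extract_all(x, "(?<=\\$)\\d+")[[1]]

# Negative lookbehind: digits that start at a word boundary and are not preceded by '$'
counts <- str_extract_all(x, "(?<!\\$)\\b\\d+")[[1]]

prices  # "15" "7"
counts  # "20"
```

The \\b in the second pattern keeps the engine from matching the "5" inside "15", since that digit is preceded by "1" rather than "$".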
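Parentheses also define "groups", which you can refer to with backreferences (\1, \2 and so on). For example, the following regular expression finds all fruits whose name contains a repeated pair of letters:
# . is any character; \1 refers back to the first parenthesized group
# This regular expression matches strings of the form abab
str_view(fruit, "(..)\\1", match = TRUE)
# Form abcab
str_view(fruit, "(..)(.)\\1", match = TRUE)
# Form abcc
str_view(fruit, "(..)(.)\\2", match = TRUE)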
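As a quick check of how the backreference behaves, here is a minimal sketch on a hand-picked vector (an assumption for illustration, instead of the full fruit dataset), returning values rather than a viewer.

```r
library(stringr)

x <- c("banana", "coconut", "apple")

# TRUE where some two-character pair is immediately repeated (form abab)
has_pair <- str_detect(x, "(..)\\1")

# The repeated stretch itself; NA where there is none
pairs <- str_extract(x, "(..)\\1")

has_pair  # TRUE TRUE FALSE
pairs     # "anan" "coco" NA
```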
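str_detect()
Returns a logical vector of the same length as the input vector.
x <- c("apple", "banana", "pear")
str_detect(x, "e")
#> [1] TRUE FALSE TRUE
# How many common words start with t?
sum(str_detect(words, "^t"))
#> [1] 65
# What proportion of common words end with a vowel?
mean(str_detect(words, "[aeiou]$"))
#> [1] 0.2765306
When the logical condition is complex (for example, match a or b but not c unless d holds), break it into several smaller sub-expressions, assign the str_detect() result of each to a variable, and combine the variables with logical operators.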
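## Find words that contain none of a, e, i, o, u
# Find all common words that contain at least one vowel, then negate
no_vowels_1 <- !str_detect(words, "[aeiou]")
# Show the common words without a, e, i, o or u
words[no_vowels_1]
#> [1] "by" "dry" "fly" "mrs" "try" "why"
# Find all words made up only of consonants (non-vowels)
# [^aeiou] means any character other than a, e, i, o, u
# ^[^aeiou]+$ matches a whole string of 1 or more such characters
no_vowels_2 <- str_detect(words, "^[^aeiou]+$")
words[no_vowels_2]
#> [1] "by" "dry" "fly" "mrs" "try" "why"
identical(no_vowels_1, no_vowels_2)
#> [1] TRUE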
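The advice about splitting a complex condition into sub-expressions can be sketched as follows. The vector and the condition ("starts with a but does not end with e") are made up for illustration.

```r
library(stringr)

x <- c("apple", "avocado", "banana", "cherry")

# One simple pattern per condition, each stored in its own logical vector
starts_with_a <- str_detect(x, "^a")
ends_with_e   <- str_detect(x, "e$")

# Combine the pieces with ordinary logical operators
selected <- x[starts_with_a & !ends_with_e]

selected  # "avocado"
```

Each sub-expression stays readable and can be inspected on its own, which is usually easier than one regular expression that encodes the whole condition.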
words[str_detect(words, "x$")]
#> [1] "box" "sex" "six" "tax"
str_subset(words, "x$")
#> [1] "box" "sex" "six" "tax"
df <- tibble(
  word = words,
  i = seq_along(word)
)
df %>%
  filter(str_detect(word, "x$"))
#> # A tibble: 4 x 2
#> word i
#> <chr> <int>
#> 1 box 108
#> 2 sex 747
#> 3 six 772
#> 4 tax 841
str_count()
Returns the number of matches of pattern in each string.
x <- c("apple", "banana", "pear")
str_count(x, "a")
#> [1] 1 3 1
# On average, how many vowels are there per word?
mean(str_count(words, "[aeiou]"))
#> [1] 1.991837
df %>%
  mutate(
    vowels = str_count(word, "[aeiou]"),
    consonants = str_count(word, "[^aeiou]")
  )
#> # A tibble: 980 x 4
#> word i vowels consonants
#> <chr> <int> <int> <int>
#> 1 a 1 1 0
#> 2 able 2 2 2
#> 3 about 3 3 2
#> 4 absolute 4 4 4
#> 5 accept 5 2 4
#> 6 account 6 3 4
#> 7 achieve 7 4 3
#> 8 across 8 2 4
#> 9 act 9 1 2
#> 10 active 10 3 3
#> # ... with 970 more rows
Note that matches never overlap. For example, how many times does the pattern "aba" match in "abababa"? Regular expressions say 2, not 3:
str_count("abababa", "aba")
#> [1] 2
str_view("abababa", "aba")
str_view_all("abababa", "aba")
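If overlapping occurrences are what you actually want to count, one common workaround (not covered in the original text) is to wrap the pattern in a zero-width lookahead, so that each match consumes no characters. A minimal sketch:

```r
library(stringr)

x <- "abababa"

# Ordinary matching consumes the matched text, so matches cannot overlap
non_overlapping <- str_count(x, "aba")

# A lookahead matches at every position where "aba" starts, without consuming it
overlapping <- str_count(x, "(?=aba)")

non_overlapping  # 2
overlapping      # 3
```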
str_extract() and str_extract_all()
# The sentences dataset
head(stringr::sentences)
#> [1] "The birch canoe slid on the smooth planks."
#> [2] "Glue the sheet to the dark blue background."
#> [3] "It's easy to tell the depth of a well."
#> [4] "These days a chicken leg is a rare dish."
#> [5] "Rice is often served in round bowls."
#> [6] "The juice of lemons makes fine punch."
colors <- c(
  "red", "orange", "yellow", "green", "blue", "purple"
)
color_match <- str_c(colors, collapse = "|")
color_match
#> [1] "red|orange|yellow|green|blue|purple"
# The '|' in the string acts as "or" in the regular expression
# Keep the sentences that contain a color
has_color <- str_subset(sentences, color_match)
matches <- str_extract(has_color, color_match) # extract the matched text
matches
#> [1] "blue" "blue" "red" "red" "red" "blue" "yellow" "red"
#> [9] "red" "green" "red" "red" "blue" "red" "red" "red"
#> [17] "red" "blue" "red" "blue" "red" "green" "red" "red"
#> [25] "red" "red" "red" "red" "green" "red" "green" "red"
#> [33] "purple" "green" "red" "red" "red" "red" "red" "blue"
#> [41] "red" "blue" "red" "red" "red" "red" "green" "green"
#> [49] "green" "red" "red" "yellow" "red" "orange" "red" "red"
#> [57] "red"
# Sentences that contain more than one color
more <- sentences[str_count(sentences, color_match) > 1]
str_view_all(more, color_match)
str_extract(more, color_match)
#> [1] "blue" "green" "orange"
Note that str_extract() only extracts the first match, because a single match allows a simpler data structure. To get all matches, use str_extract_all(), which returns a list:
str_extract_all(more, color_match)
#> [[1]]
#> [1] "blue" "red"
#>
#> [[2]]
#> [1] "green" "red"
#>
#> [[3]]
#> [1] "orange" "red"
With simplify = TRUE, str_extract_all() returns a matrix in which rows with fewer matches are padded to the length of the longest:
str_extract_all(more, color_match, simplify = TRUE)
#> [,1] [,2]
#> [1,] "blue" "red"
#> [2,] "green" "red"
#> [3,] "orange" "red"
x <- c("a", "a b", "a b c")
str_extract_all(x, "[a-z]")
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] "a" "b"
#>
#> [[3]]
#> [1] "a" "b" "c"
str_extract_all(x, "[a-z]", simplify = TRUE)
#> [,1] [,2] [,3]
#> [1,] "a" "" ""
#> [2,] "a" "b" ""
#> [3,] "a" "b" "c"
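() group extraction
Once the pattern is divided into groups with (), str_match() can extract each group.
Example: suppose we want to extract nouns from the sentences. As a heuristic, we look for any word that follows a or the. Defining a "word" with a regular expression is a little tricky, so we use a simple approximation: a sequence of at least one character that is not a space:
# a or the, then a space, then one or more non-space characters
noun <- "(a|the) ([^ ]+)"
has_noun <- sentences %>%
  str_subset(noun) %>%
  head(10)
has_noun %>%
  str_extract(noun)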
#> [1] "the smooth" "the sheet" "the depth" "a chicken" "the parked"
#> [6] "the sun" "the huge" "the ball" "the woman" "a helps"
str_match() gives each individual group. Instead of a character vector it returns a matrix, with one column for the complete match followed by one column for each group:
has_noun %>%
  str_match(noun)
#> [,1] [,2] [,3]
#> [1,] "the smooth" "the" "smooth"
#> [2,] "the sheet" "the" "sheet"
#> [3,] "the depth" "the" "depth"
#> [4,] "a chicken" "a" "chicken"
#> [5,] "the parked" "the" "parked"
#> [6,] "the sun" "the" "sun"
#> [7,] "the huge" "the" "huge"
#> [8,] "the ball" "the" "ball"
#> [9,] "the woman" "the" "woman"
#> [10,] "a helps" "a" "helps"
has_noun %>%
  str_match_all(noun)
#> [[1]]
#> [,1] [,2] [,3]
#> [1,] "the smooth" "the" "smooth"
#>
#> [[2]]
#> [,1] [,2] [,3]
#> [1,] "the sheet" "the" "sheet"
#> [2,] "the dark" "the" "dark"
#>
#> [[3]]
#> [,1] [,2] [,3]
#> [1,] "the depth" "the" "depth"
#> [2,] "a well." "a" "well."
#>
#> [[4]]
#> [,1] [,2] [,3]
#> [1,] "a chicken" "a" "chicken"
#> [2,] "a rare" "a" "rare"
#>
#> [[5]]
#> [,1] [,2] [,3]
#> [1,] "the parked" "the" "parked"
#>
#> [[6]]
#> [,1] [,2] [,3]
#> [1,] "the sun" "the" "sun"
#>
#> [[7]]
#> [,1] [,2] [,3]
#> [1,] "the huge" "the" "huge"
#> [2,] "the clear" "the" "clear"
#>
#> [[8]]
#> [,1] [,2] [,3]
#> [1,] "the ball" "the" "ball"
#>
#> [[9]]
#> [,1] [,2] [,3]
#> [1,] "the woman" "the" "woman"
#>
#> [[10]]
#> [,1] [,2] [,3]
#> [1,] "a helps" "a" "helps"
#> [2,] "the evening." "the" "evening."
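The column layout of the str_match() result is easier to work with once the columns are named. The following sketch uses made-up date strings and hypothetical column names (match, year, month, day) purely for illustration.

```r
library(stringr)

dates <- c("2024-03-15", "no date here")

# Column 1 is the complete match; columns 2 to 4 are the three groups
parts <- str_match(dates, "(\\d{4})-(\\d{2})-(\\d{2})")
colnames(parts) <- c("match", "year", "month", "day")

parts[1, "year"]   # "2024"
parts[1, "day"]    # "15"
parts[2, "match"]  # NA, because the second string has no match
```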
If the data is in a tibble, tidyr::extract() is often easier to use. It works like str_match(), but requires a name for each group and places the groups in the tibble as new columns:
tibble(sentence = sentences) %>%
  tidyr::extract(
    sentence, c("article", "noun"), "(a|the) ([^ ]+)",
    remove = FALSE
  )
#> # A tibble: 720 x 3
#> sentence article noun
#> <chr> <chr> <chr>
#> 1 The birch canoe slid on the smooth planks. the smooth
#> 2 Glue the sheet to the dark blue background. the sheet
#> 3 It's easy to tell the depth of a well. the depth
#> 4 These days a chicken leg is a rare dish. a chicken
#> 5 Rice is often served in round bowls. <NA> <NA>
#> 6 The juice of lemons makes fine punch. <NA> <NA>
#> 7 The box was thrown beside the parked truck. the parked
#> 8 The hogs were fed chopped corn and garbage. <NA> <NA>
#> 9 Four hours of steady work faced us. <NA> <NA>
#> 10 Large size in stockings is hard to sell. <NA> <NA>
#> # ... with 710 more rows
str_replace(), str_replace_all()
x <- c("apple", "pear", "banana")
str_replace(x, "[aeiou]", "-")
#> [1] "-pple" "p-ar" "b-nana"
str_replace_all(x, "[aeiou]", "-")
#> [1] "-ppl-" "p--r" "b-n-n-"
x <- c("1 house", "2 cars", "3 people")
str_replace_all(x, c("1" = "one", "2" = "two", "3" = "three"))
#> [1] "one house" "two cars" "three people"
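You can also use backreferences to insert groups from the match. The code below swaps the second and third words:
# [^\\s]+ is one or more non-whitespace characters, i.e. a word
# "([^\\s]+) ([^\\s]+) ([^\\s]+)" is the first three words
sentences %>%
  str_replace("([^\\s]+) ([^\\s]+) ([^\\s]+)", "\\1 \\3 \\2") %>%
  head(5)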
#> [1] "The canoe birch slid on the smooth planks."
#> [2] "Glue sheet the to the dark blue background."
#> [3] "It's to easy tell the depth of a well."
#> [4] "These a days chicken leg is a rare dish."
#> [5] "Rice often is served in round bowls."
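The same backreference technique works for reordering fields inside a string. Here is a minimal sketch that rewrites ISO-style dates as day/month/year; the input values are made up for illustration.

```r
library(stringr)

iso <- c("2024-03-15", "1999-12-31")

# \\1, \\2, \\3 in the replacement refer to the year, month and day groups
dmy <- str_replace_all(iso, "(\\d{4})-(\\d{2})-(\\d{2})", "\\3/\\2/\\1")

dmy  # "15/03/2024" "31/12/1999"
```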
str_split()
Because each element of a character vector can contain a different number of pieces, str_split() returns a list:
sentences %>%
  head(5) %>%
  str_split(" ")
#> [[1]]
#> [1] "The" "birch" "canoe" "slid" "on" "the" "smooth"
#> [8] "planks."
#>
#> [[2]]
#> [1] "Glue" "the" "sheet" "to" "the"
#> [6] "dark" "blue" "background."
#>
#> [[3]]
#> [1] "It's" "easy" "to" "tell" "the" "depth" "of" "a" "well."
#>
#> [[4]]
#> [1] "These" "days" "a" "chicken" "leg" "is" "a"
#> [8] "rare" "dish."
#>
#> [[5]]
#> [1] "Rice" "is" "often" "served" "in" "round" "bowls."
If you are splitting a vector of length 1, simply take the first element of the list:
# The string "\\|" is the regular expression \|, which matches a literal '|'
"a|b|c|d" %>%
  str_split("\\|") %>%
  magrittr::extract2(1)
#> [1] "a" "b" "c" "d"
You can also set simplify = TRUE to get a matrix back:
sentences %>%
  head(5) %>%
  str_split(" ", simplify = TRUE)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] "The" "birch" "canoe" "slid" "on" "the" "smooth" "planks."
#> [2,] "Glue" "the" "sheet" "to" "the" "dark" "blue" "background."
#> [3,] "It's" "easy" "to" "tell" "the" "depth" "of" "a"
#> [4,] "These" "days" "a" "chicken" "leg" "is" "a" "rare"
#> [5,] "Rice" "is" "often" "served" "in" "round" "bowls." ""
#> [,9]
#> [1,] ""
#> [2,] ""
#> [3,] "well."
#> [4,] "dish."
#> [5,] ""
You can also set the maximum number of pieces:
fields <- c("Name: Hadley", "Country: NZ", "Age: 35: 1980")
fields %>% str_split(": ", simplify = TRUE)
#> [,1] [,2] [,3]
#> [1,] "Name" "Hadley" ""
#> [2,] "Country" "NZ" ""
#> [3,] "Age" "35" "1980"
fields %>% str_split(": ", n = 2, simplify = TRUE)
#> [,1] [,2]
#> [1,] "Name" "Hadley"
#> [2,] "Country" "NZ"
#> [3,] "Age" "35: 1980"
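Besides patterns, you can also split a string by character, line, sentence and word boundaries with boundary():
x <- "This is a sentence. This is another sentence."
str_view_all(x, boundary("word")) # highlight each word, as delimited by word boundaries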
str_split(x, " ")[[1]]
#> [1] "This" "is" "a" "sentence." "This" "is"
#> [7] "another" "sentence."
str_split(x, boundary("word"))[[1]]
#> [1] "This" "is" "a" "sentence" "This" "is" "another"
#> [8] "sentence"
str_locate(), str_locate_all()
These functions give the start and end position of each match. You can use str_locate() to find where the pattern matches and then str_sub() to extract or modify the matched text.
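The locate-then-extract workflow described above can be sketched like this. The input string and variable names are made up for illustration.

```r
library(stringr)

x <- "banana split"

# str_locate() returns an integer matrix with columns "start" and "end"
loc <- str_locate(x, "nana")

# Extract the located span with str_sub()
found <- str_sub(x, loc[1, "start"], loc[1, "end"])

# str_sub() also has an assignment form that replaces the located span
y <- x
str_sub(y, loc[1, "start"], loc[1, "end"]) <- "NANA"

loc    # start = 3, end = 6
found  # "nana"
y      # "baNANA split"
```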
When you pass a plain string as the pattern, stringr automatically wraps it in a call to regex():
# The usual call:
str_view(fruit, "nana")
# is shorthand for:
str_view(fruit, regex("nana"))
The other arguments of regex() control the details of matching:
ignore_case = TRUE lets characters match in either uppercase or lowercase:
bananas <- c("banana", "Banana", "BANANA")
str_view(bananas, "banana")
str_view(bananas, regex("banana", ignore_case = TRUE))
multiline = TRUE makes ^ and $ match at the start and end of each line rather than the start and end of the complete string:
x <- "Line 1\nLine 2\nLine 3"
str_extract_all(x, "^Line")[[1]]
#> [1] "Line"
str_extract_all(x, regex("^Line", multiline = TRUE))[[1]]
#> [1] "Line" "Line" "Line"
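comments = TRUE lets you add comments and whitespace to a complex regular expression to make it easier to understand. Spaces and everything after # are ignored when matching. To match a literal space, you need to escape it: "\\ ":
phone <- regex("
  \\(?     # optional opening parenthesis
  (\\d{3}) # area code
  [)- ]?   # optional closing parenthesis, dash or space
  (\\d{3}) # another three digits
  [ -]?    # optional space or dash
  (\\d{3}) # three more digits
  ", comments = TRUE)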
str_match("514-791-8141", phone)
#> [,1] [,2] [,3] [,4]
#> [1,] "514-791-814" "514" "791" "814"
dotall = TRUE makes . match every character, including \n.
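A minimal sketch of the effect of dotall, using a made-up two-line string:

```r
library(stringr)

x <- "first line\nsecond line"

# By default . stops at the newline
default_match <- str_extract(x, "first.*")

# With dotall = TRUE, . also matches \n, so .* runs to the end of the string
dotall_match <- str_extract(x, regex("first.*", dotall = TRUE))

default_match  # "first line"
dotall_match   # "first line\nsecond line"
```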
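fixed() matches the exact sequence of bytes in the string. It ignores all special characters of regular expressions and works at a very low level. That removes the need for complicated escaping, and it is much faster than a regular expression.
Be careful with fixed() on non-English data, though. It can cause problems, because the same character often has more than one representation.
microbenchmark::microbenchmark(
  fixed = str_detect(sentences, fixed("the")),
  regex = str_detect(sentences, "the"),
  times = 20
)
#> Unit: microseconds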
#> expr min lq mean median uq max neval
#> fixed 76.001 78.2010 94.0309 79.9010 87.6505 315.701 20
#> regex 265.001 271.5505 284.1209 275.5015 280.1505 443.301 20
coll() compares strings using standard collation rules, which is useful for case-insensitive matching.
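The caveat about multiple representations of the same character, and the way collation rules handle it, can be illustrated with a small sketch. The character "á" is written once as a single code point and once as "a" plus a combining accent; the variable names are illustrative.

```r
library(stringr)

a1 <- "\u00e1"   # "á" as one precomposed code point
a2 <- "a\u0301"  # "a" followed by a combining acute accent

# The two render identically but are different byte sequences
same_bytes <- a1 == a2

# fixed() compares bytes, so it misses the match
fixed_hit <- str_detect(a1, fixed(a2))

# coll() applies collation rules and treats them as the same character
coll_hit <- str_detect(a1, coll(a2))

# coll() can also ignore case
case_hit <- str_detect("BANANA", coll("banana", ignore_case = TRUE))

same_bytes  # FALSE
fixed_hit   # FALSE
coll_hit    # TRUE
case_hit    # TRUE
```

The trade-off is speed: collation-aware comparison does more work than a byte comparison, so fixed() remains the faster choice when the data is plain ASCII.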
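1. Matching is greedy by default: among all possible matches, the regular expression takes the longest string. Adding a ? after a quantifier switches it to lazy matching, which takes the shortest string instead.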