A regular expression (regex) is a powerful way of specifying a complex search.
MySQL uses Henry Spencer's implementation of regular
expressions, which is aimed at conformance with POSIX
1003.2. MySQL uses the extended version.
This is a simplistic reference that skips the details. To get more exact
information, see Henry Spencer's regex(7)
manual page that is
included in the source distribution. See section C Credits.
A regular expression describes a set of strings. The simplest regexp is
one that has no special characters in it. For example, the regexphello
matches hello
and nothing else.
Non-trivial regular expressions use certain special constructs so that
they can match more than one string. For example, the regexphello|word
matches either the string hello
or the stringword
.
As a more complex example, the regexp B[an]*s
matches any of the
strings Bananas
, Baaaaas
, Bs
, and any other string
starting with a B
, ending with an s
, and containing any
number of a
or n
characters in between.
A regular expression may use any of the following special
characters/constructs:
^
-
Match the beginning of a string.
mysql> SELECT "fo\nfo" REGEXP "^fo$"; -> 0
mysql> SELECT "fofo" REGEXP "^fo"; -> 1
$
-
Match the end of a string.
mysql> SELECT "fo\no" REGEXP "^fo\no$"; -> 1
mysql> SELECT "fo\no" REGEXP "^fo$"; -> 0
.
-
Match any character (including newline).
mysql> SELECT "fofo" REGEXP "^f.*"; -> 1
mysql> SELECT "fo\nfo" REGEXP "^f.*"; -> 1
a*
-
Match any sequence of zero or morea
characters.
mysql> SELECT "Ban" REGEXP "^Ba*n"; -> 1
mysql> SELECT "Baaan" REGEXP "^Ba*n"; -> 1
mysql> SELECT "Bn" REGEXP "^Ba*n"; -> 1
a+
-
Match any sequence of one or morea
characters.
mysql> SELECT "Ban" REGEXP "^Ba+n"; -> 1
mysql> SELECT "Bn" REGEXP "^Ba+n"; -> 0
a?
-
Match either zero or onea
character.
mysql> SELECT "Bn" REGEXP "^Ba?n"; -> 1
mysql> SELECT "Ban" REGEXP "^Ba?n"; -> 1
mysql> SELECT "Baan" REGEXP "^Ba?n"; -> 0
de|abc
-
Match either of the sequencesde
orabc
.
mysql> SELECT "pi" REGEXP "pi|apa"; -> 1
mysql> SELECT "axe" REGEXP "pi|apa"; -> 0
mysql> SELECT "apa" REGEXP "pi|apa"; -> 1
mysql> SELECT "apa" REGEXP "^(pi|apa)$"; -> 1
mysql> SELECT "pi" REGEXP "^(pi|apa)$"; -> 1
mysql> SELECT "pix" REGEXP "^(pi|apa)$"; -> 0
(abc)*
-
Match zero or more instances of the sequenceabc
.
mysql> SELECT "pi" REGEXP "^(pi)*$"; -> 1
mysql> SELECT "pip" REGEXP "^(pi)*$"; -> 0
mysql> SELECT "pipi" REGEXP "^(pi)*$"; -> 1
{1}
-
{2,3}
-
The is a more general way of writing regexps that match many
occurrences of the previous atom.a*
-
Can be written asa{0,}
. a+
-
Can be written asa{1,}
. a?
-
Can be written asa{0,1}
.
To be more precise, an atom followed by a bound containing one integeri
and no comma matches a sequence of exactlyi
matches of
the atom. An atom followed by a bound containing one integeri
and a comma matches a sequence ofi
or more matches of the atom.
An atom followed by a bound containing two integersi
andj
matches a sequence ofi
throughj
(inclusive)
matches of the atom.
Both arguments must be in the range from0
toRE_DUP_MAX
(default 255), inclusive. If there are two arguments, the second must be
greater than or equal to the first. [a-dX]
-
[^a-dX]
-
Matches
any character which is (or is not, if ^ is used) eithera
,b
,c
,d
orX
. To include a literal]
character,
it must immediately follow the opening bracket[
. To include a
literal-
character, it must be written first or last. So[0-9]
matches any decimal digit. Any character that does not have
a defined meaning inside a[]
pair has no special meaning and
matches only itself.
mysql> SELECT "aXbc" REGEXP "[a-dXYZ]"; -> 1
mysql> SELECT "aXbc" REGEXP "^[a-dXYZ]$"; -> 0
mysql> SELECT "aXbc" REGEXP "^[a-dXYZ]+$"; -> 1
mysql> SELECT "aXbc" REGEXP "^[^a-dXYZ]+$"; -> 0
mysql> SELECT "gheis" REGEXP "^[^a-dXYZ]+$"; -> 1
mysql> SELECT "gheisa" REGEXP "^[^a-dXYZ]+$"; -> 0
[[.characters.]]
-
The sequence of characters of that collating element. The sequence is a
single element of the bracket expression's list. A bracket expression
containing a multi-character collating element can thus match more than
one character, for example, if the collating sequence includes ach
collating element, then the regular expression[[.ch.]]*c
matches the
first five characters ofchchcc
.
[=character_class=]
-
An equivalence class, standing for the sequences of characters of all
collating elements equivalent to that one, including itself.
For example, ifo
and(+)
are the members of an
equivalence class, then[[=o=]]
,[[=(+)=]]
, and[o(+)]
are all synonymous. An equivalence class may not be an
endpoint of a range.
[:character_class:]
-
Within a bracket expression, the name of a character class enclosed in[:
and:]
stands for the list of all characters belonging
to that class. Standard character class names are:
Name Name Name alnum digit punct alpha graph space blank lower upper cntrl print xdigit
These stand for the character classes defined in thectype(3)
manual
page. A locale may provide others. A character class may not be used as an
endpoint of a range.
mysql> SELECT "justalnums" REGEXP "[[:alnum:]]+"; -> 1
mysql> SELECT "!!" REGEXP "[[:alnum:]]+"; -> 0
[[:<:]]
-
[[:>:]]
-
These match the null string at the beginning and end of a word
respectively. A word is defined as a sequence of word characters which
is neither preceded nor followed by word characters. A word character is
an alnum character (as defined byctype(3)
) or an underscore
(_
).
mysql> SELECT "a word a" REGEXP "[[:<:]]word[[:>:]]"; -> 1
mysql> SELECT "a xword a" REGEXP "[[:<:]]word[[:>:]]"; -> 0
mysql> SELECT "weeknights" REGEXP "^(wee|week)(knights|nights)$"; -> 1
1 comment:
This is very good information. It is usefully for beginners like me.
Thanks
Post a Comment