Firebird Documentation Index → Firebird 2.5 Language Ref. Update → Operators and predicates → SIMILAR TO |
Table of Contents
Available in: DSQL, PSQL
Added in: 2.5
Description: SIMILAR TO matches a string against an SQL regular expression
pattern. Unlike in some other languages, the pattern must match the entire string in order
to succeed – matching a substring is not enough. If any operand is
NULL
, the result is NULL
. Otherwise, the result is
TRUE
or FALSE
.
Result type: Boolean
Syntax: SIMILAR TO:
string-expression
[NOT] SIMILAR TO<pattern>
[ESCAPE<escape-char>
]<pattern>
::= an SQL regular expression<escape-char>
::= a single character
Syntax: SQL regular expressions: The following syntax defines the SQL regular expression format. It is a complete and correct top-down definition. It is also highly formal, rather long and probably perfectly fit to discourage everybody who hasn't already some experience with regular expressions (or with highly formal, rather long top-down definitions). Feel free to skip it and read the next section, Building regular expressions, which uses a bottom-up approach, aimed at the rest of us.
<regular expression>
::=<regular term>
['|'<regular term>
...]<regular term>
::=<regular factor>
...<regular factor>
::=<regular primary>
[<quantifier>
]<quantifier>
::= ? | * | + | '{'<m>
[,[<n>
]] '}'<m>
,<n>
::= unsigned int, with<m>
<=<n>
if both present<regular primary>
::=<character>
|<character class>
| % | (<regular expression>
)<character>
::=<escaped character>
|<non-escaped character>
<escaped character>
::=<escape-char>
<special character>
|<escape-char>
<escape-char>
<special character>
::= any of the characters []()|^-+*%_?{<non-escaped character>
::= any character that is not a<special character>
and not equal to<escape-char>
(if defined)<character class>
::= '_' | '['<member>
... ']' | '[^'<non-member>
... ']' | '['<member>
... '^'<non-member>
... ']'<member>
,<non-member>
::=<character>
|<range>
|<predefined class>
<range>
::=<character>
-<character>
<predefined class>
::= '[:'<predefined class name>
':]'<predefined class name>
::= ALPHA | UPPER | LOWER | DIGIT | ALNUM | SPACE | WHITESPACE
Table of Contents
Within regular expressions, most characters represent themselves. The only exceptions are the special characters below:
[ ] ( ) | ^ - + * % _ ? {
...and the escape character, if it is defined.
A regular expression that doesn't contain any special or escape characters only
matches strings that are identical to itself (subject to the collation in use). That is,
it functions just like the “=
” operator:
'Apple' similar to 'Apple' -- true 'Apples' similar to 'Apple' -- false 'Apple' similar to 'Apples' -- false 'APPLE' similar to 'Apple' -- depends on collation
The known SQL wildcards _
and %
match any
single character and a string of any length, respectively:
'Birne' similar to 'B_rne' -- true 'Birne' similar to 'B_ne' -- false 'Birne' similar to 'B%ne' -- true 'Birne' similar to 'Bir%ne%' -- true 'Birne' similar to 'Birr%ne' -- false
Notice how %
also matches the empty string.
A bunch of characters enclosed in brackets define a character class. A character in the string matches a class in the pattern if the character is a member of the class:
'Citroen' similar to 'Cit[arju]oen' -- true 'Citroen' similar to 'Ci[tr]oen' -- false 'Citroen' similar to 'Ci[tr][tr]oen' -- true
As can be seen from the second line, the class only matches a single character, not a sequence.
Within a class definition, two characters connected by a hyphen define a range. A range comprises the two endpoints and all the characters that lie between them in the active collation. Ranges can be placed anywhere in the class definition without special delimiters to keep them apart from the other elements.
'Datte' similar to 'Dat[q-u]e' -- true 'Datte' similar to 'Dat[abq-uy]e' -- true 'Datte' similar to 'Dat[bcg-km-pwz]e' -- false
The following predefined character classes can also be used in a class definition:
Latin letters a
..z
and
A
..Z
. With an accent-insensitive collation,
this class also matches accented forms of these characters.
Decimal digits 0
..9
.
Union of [:ALPHA:] and [:DIGIT:].
Uppercase Latin letters A
..Z
. Also
matches lowercase with case-insensitive collation and accented forms with
accent-insensitive collation.
Lowercase Latin letters a
..z
. Also
matches uppercase with case-insensitive collation and accented forms with
accent-insensitive collation.
Matches the space character (ASCII 32).
Matches vertical tab (ASCII 9), linefeed (ASCII 10), horizontal tab (ASCII 11), form feed (ASCII 12), carriage return (ASCII 13) and space (ASCII 32).
Including a predefined class has the same effect as including all its members. Predefined classes are only allowed within class definitions. If you need to match against a predefined class and nothing more, place an extra pair of brackets around it.
'Erdbeere' similar to 'Erd[[:ALNUM:]]eere' -- true 'Erdbeere' similar to 'Erd[[:DIGIT:]]eere' -- false 'Erdbeere' similar to 'Erd[a[:SPACE:]b]eere' -- true 'Erdbeere' similar to [[:ALPHA:]] -- false 'E' similar to [[:ALPHA:]] -- true
If a class definition starts with a caret, everything that follows is excluded from the class. All other characters match:
'Framboise' similar to 'Fra[^ck-p]boise' -- false 'Framboise' similar to 'Fr[^a][^a]boise' -- false 'Framboise' similar to 'Fra[^[:DIGIT:]]boise' -- true
If the caret is not placed at the start of the sequence, the class contains everything before the caret, except for the elements that also occur after the caret:
'Grapefruit' similar to 'Grap[a-m^f-i]fruit' -- true 'Grapefruit' similar to 'Grap[abc^xyz]fruit' -- false 'Grapefruit' similar to 'Grap[abc^de]fruit' -- false 'Grapefruit' similar to 'Grap[abe^de]fruit' -- false'3' similar to '[[:DIGIT:]^4-8]' -- true '6' similar to '[[:DIGIT:]^4-8]' -- false
Lastly, the already mentioned wildcard “_
” is a
character class of its own, matching any single character.
A question mark immediately following a character or class indicates that the preceding item may occur 0 or 1 times in order to match:
'Hallon' similar to 'Hal?on' -- false 'Hallon' similar to 'Hal?lon' -- true 'Hallon' similar to 'Halll?on' -- true 'Hallon' similar to 'Hallll?on' -- false 'Hallon' similar to 'Halx?lon' -- true 'Hallon' similar to 'H[a-c]?llon[x-z]?' -- true
An asterisk immediately following a character or class indicates that the preceding item may occur 0 or more times in order to match:
'Icaque' similar to 'Ica*que' -- true 'Icaque' similar to 'Icar*que' -- true 'Icaque' similar to 'I[a-c]*que' -- true 'Icaque' similar to '_*' -- true 'Icaque' similar to '[[:ALPHA:]]*' -- true 'Icaque' similar to 'Ica[xyz]*e' -- false
A plus sign immediately following a character or class indicates that the preceding item must occur 1 or more times in order to match:
'Jujube' similar to 'Ju_+' -- true 'Jujube' similar to 'Ju+jube' -- true 'Jujube' similar to 'Jujuber+' -- false 'Jujube' similar to 'J[jux]+be' -- true 'Jujube' sililar to 'J[[:DIGIT:]]+ujube' -- false
If a character or class is followed by a number enclosed in braces, it must be repeated exactly that number of times in order to match:
'Kiwi' similar to 'Ki{2}wi' -- false 'Kiwi' similar to 'K[ipw]{2}i' -- true 'Kiwi' similar to 'K[ipw]{2}' -- false 'Kiwi' similar to 'K[ipw]{3}' -- true
If the number is followed by a comma, the item must be repeated at least that number of times in order to match:
'Limone' similar to 'Li{2,}mone' -- false 'Limone' similar to 'Li{1,}mone' -- true 'Limone' similar to 'Li[nezom]{2,}' -- true
If the braces contain two numbers separated by a comma, the second number not smaller than the first, then the item must be repeated at least the first number and at most the second number of times in order to match:
'Mandarijn' similar to 'M[a-p]{2,5}rijn' -- true 'Mandarijn' similar to 'M[a-p]{2,3}rijn' -- false 'Mandarijn' similar to 'M[a-p]{2,3}arijn' -- true
The quantifiers ?
, *
and +
are shorthand for {0,1}
, {0,}
and
{1,}
, respectively.
Regular expression terms can be OR'ed with the |
operator. A
match is made when the argument string matches at least one of the terms:
'Nektarin' similar to 'Nek|tarin' -- false 'Nektarin' similar to 'Nektarin|Persika' -- true 'Nektarin' similar to 'M_+|N_+|P_+' -- true
One or more parts of the regular expression can be grouped into subexpressions (also called subpatterns) by placing them between parentheses. A subexpression is a regular expression in its own right. It can contain all the elements allowed in a regular expression, and can also have quantifiers added to it.
'Orange' similar to 'O(ra|ri|ro)nge' -- true 'Orange' similar to 'O(r[a-e])+nge' -- true 'Orange' similar to 'O(ra){2,4}nge' -- false 'Orange' similar to 'O(r(an|in)g|rong)?e' -- true
In order to match against a character that is special in regular expressions, that character has to be escaped. There is no default escape character; rather, the user specifies one when needed:
'Peer (Poire)' similar to 'P[^ ]+ \(P[^ ]+\)' escape '\' -- true 'Pera [Pear]' similar to 'P[^ ]+ #[P[^ ]+#]' escape '#' -- true 'Päron-Äppledryck' similar to 'P%$-Ä%' escape '$' -- true 'Pärondryck' similar to 'P%--Ä%' escape '-' -- false
The last line demonstrates that the escape character can also escape itself, if needed.
Firebird Documentation Index → Firebird 2.5 Language Ref. Update → Operators and predicates → SIMILAR TO |