SIMILAR TO

Table of Contents

Building regular expressions

Tip

Find a more recent version at Firebird 5.0 Language Reference: SIMILAR TO

Available in: DSQL, PSQL

Added in: 2.5

Description: SIMILAR TO matches a string against an SQL regular expression pattern. Unlike in some other languages, the pattern must match the entire string in order to succeed – matching a substring is not enough. If any operand is NULL, the result is NULL. Otherwise, the result is TRUE or FALSE.

Result type: Boolean

Syntax: SIMILAR TO:

string-expression [NOT] SIMILAR TO <pattern> [ESCAPE <escape-char>]

<pattern>      ::=  an SQL regular expression
<escape-char>  ::=  a single character

Syntax: SQL regular expressions: The following syntax defines the SQL regular expression format. It is a complete and correct top-down definition. It is also highly formal, rather long and probably perfectly fit to discourage everybody who hasn't already some experience with regular expressions (or with highly formal, rather long top-down definitions). Feel free to skip it and read the next section, Building regular expressions, which uses a bottom-up approach, aimed at the rest of us.

<regular expression>     ::=  <regular term> ['|' <regular term> ...]

<regular term>           ::=  <regular factor> ...

<regular factor>         ::=  <regular primary> [<quantifier>]

<quantifier>             ::=  ?
                              | *
                              | +
                              | '{' <m> [,[<n>]] '}'

<m>, <n>                 ::=  unsigned int, with <m> <= <n> if both present

<regular primary>        ::=  <character>
                              | <character class>
                              | %
                              | (<regular expression>)

<character>              ::=  <escaped character>
                              | <non-escaped character>

<escaped character>      ::=  <escape-char> <special character>
                              | <escape-char> <escape-char>

<special character>      ::=  any of the characters []()|^-+*%_?{

<non-escaped character>  ::=  any character that is not a <special character>
                              and not equal to <escape-char> (if defined)

<character class>        ::=  '_'
                              | '[' <member> ... ']'
                              | '[^' <non-member> ... ']'
                              | '[' <member> ... '^' <non-member> ... ']'

<member>, <non-member>   ::=  <character>
                              | <range>
                              | <predefined class>

<range>                  ::=  <character>-<character>

<predefined class>       ::=  '[:' <predefined class name> ':]'

<predefined class name>  ::=  ALPHA | UPPER | LOWER | DIGIT
                              | ALNUM | SPACE | WHITESPACE

Building regular expressions

Table of Contents

Characters
Wildcards
Character classes
Quantifiers
OR-ing terms
Subexpressions
Escaping special characters

Characters

Within regular expressions, most characters represent themselves. The only exceptions are the special characters below:

[ ] ( ) | ^ - + * % _ ? {

...and the escape character, if it is defined.

A regular expression that doesn't contain any special or escape characters only matches strings that are identical to itself (subject to the collation in use). That is, it functions just like the “=” operator:

'Apple' similar to 'Apple'              -- true
'Apples' similar to 'Apple'             -- false
'Apple' similar to 'Apples'             -- false
'APPLE' similar to 'Apple'              -- depends on collation

Wildcards

The known SQL wildcards _ and % match any single character and a string of any length, respectively:

'Birne' similar to 'B_rne'              -- true
'Birne' similar to 'B_ne'               -- false
'Birne' similar to 'B%ne'               -- true
'Birne' similar to 'Bir%ne%'            -- true
'Birne' similar to 'Birr%ne'            -- false

Notice how % also matches the empty string.

Character classes

A bunch of characters enclosed in brackets define a character class. A character in the string matches a class in the pattern if the character is a member of the class:

'Citroen' similar to 'Cit[arju]oen'     -- true
'Citroen' similar to 'Ci[tr]oen'        -- false
'Citroen' similar to 'Ci[tr][tr]oen'    -- true

As can be seen from the second line, the class only matches a single character, not a sequence.

Within a class definition, two characters connected by a hyphen define a range. A range comprises the two endpoints and all the characters that lie between them in the active collation. Ranges can be placed anywhere in the class definition without special delimiters to keep them apart from the other elements.

'Datte' similar to 'Dat[q-u]e'          -- true
'Datte' similar to 'Dat[abq-uy]e'       -- true
'Datte' similar to 'Dat[bcg-km-pwz]e'   -- false

The following predefined character classes can also be used in a class definition:

[:ALPHA:]: Latin letters a..z and A..Z. With an accent-insensitive collation, this class also matches accented forms of these characters.
[:DIGIT:]: Decimal digits 0..9.
[:ALNUM:]: Union of [:ALPHA:] and [:DIGIT:].
[:UPPER:]: Uppercase Latin letters A..Z. Also matches lowercase with case-insensitive collation and accented forms with accent-insensitive collation.
[:LOWER:]: Lowercase Latin letters a..z. Also matches uppercase with case-insensitive collation and accented forms with accent-insensitive collation.
[:SPACE:]: Matches the space character (ASCII 32).
[:WHITESPACE:]: Matches vertical tab (ASCII 9), linefeed (ASCII 10), horizontal tab (ASCII 11), form feed (ASCII 12), carriage return (ASCII 13) and space (ASCII 32).

Including a predefined class has the same effect as including all its members. Predefined classes are only allowed within class definitions. If you need to match against a predefined class and nothing more, place an extra pair of brackets around it.

'Erdbeere' similar to 'Erd[[:ALNUM:]]eere'     -- true
'Erdbeere' similar to 'Erd[[:DIGIT:]]eere'     -- false
'Erdbeere' similar to 'Erd[a[:SPACE:]b]eere'   -- true
'Erdbeere' similar to [[:ALPHA:]]              -- false
'E'        similar to [[:ALPHA:]]              -- true

If a class definition starts with a caret, everything that follows is excluded from the class. All other characters match:

'Framboise' similar to 'Fra[^ck-p]boise'       -- false
'Framboise' similar to 'Fr[^a][^a]boise'       -- false
'Framboise' similar to 'Fra[^[:DIGIT:]]boise'  -- true

If the caret is not placed at the start of the sequence, the class contains everything before the caret, except for the elements that also occur after the caret:

'Grapefruit' similar to 'Grap[a-m^f-i]fruit'   -- true
'Grapefruit' similar to 'Grap[abc^xyz]fruit'   -- false
'Grapefruit' similar to 'Grap[abc^de]fruit'    -- false
'Grapefruit' similar to 'Grap[abe^de]fruit'    -- false

'3' similar to '[[:DIGIT:]^4-8]'               -- true
'6' similar to '[[:DIGIT:]^4-8]'               -- false

Lastly, the already mentioned wildcard “_” is a character class of its own, matching any single character.

Quantifiers

A question mark immediately following a character or class indicates that the preceding item may occur 0 or 1 times in order to match:

'Hallon' similar to 'Hal?on'                   -- false
'Hallon' similar to 'Hal?lon'                  -- true
'Hallon' similar to 'Halll?on'                 -- true
'Hallon' similar to 'Hallll?on'                -- false
'Hallon' similar to 'Halx?lon'                 -- true
'Hallon' similar to 'H[a-c]?llon[x-z]?'        -- true

An asterisk immediately following a character or class indicates that the preceding item may occur 0 or more times in order to match:

'Icaque' similar to 'Ica*que'                  -- true
'Icaque' similar to 'Icar*que'                 -- true
'Icaque' similar to 'I[a-c]*que'               -- true
'Icaque' similar to '_*'                       -- true
'Icaque' similar to '[[:ALPHA:]]*'             -- true
'Icaque' similar to 'Ica[xyz]*e'               -- false

A plus sign immediately following a character or class indicates that the preceding item must occur 1 or more times in order to match:

'Jujube' similar to 'Ju_+'                     -- true
'Jujube' similar to 'Ju+jube'                  -- true
'Jujube' similar to 'Jujuber+'                 -- false
'Jujube' similar to 'J[jux]+be'                -- true
'Jujube' sililar to 'J[[:DIGIT:]]+ujube'       -- false

If a character or class is followed by a number enclosed in braces, it must be repeated exactly that number of times in order to match:

'Kiwi' similar to 'Ki{2}wi'                    -- false
'Kiwi' similar to 'K[ipw]{2}i'                 -- true
'Kiwi' similar to 'K[ipw]{2}'                  -- false
'Kiwi' similar to 'K[ipw]{3}'                  -- true

If the number is followed by a comma, the item must be repeated at least that number of times in order to match:

'Limone' similar to 'Li{2,}mone'               -- false
'Limone' similar to 'Li{1,}mone'               -- true
'Limone' similar to 'Li[nezom]{2,}'            -- true

If the braces contain two numbers separated by a comma, the second number not smaller than the first, then the item must be repeated at least the first number and at most the second number of times in order to match:

'Mandarijn' similar to 'M[a-p]{2,5}rijn'       -- true
'Mandarijn' similar to 'M[a-p]{2,3}rijn'       -- false
'Mandarijn' similar to 'M[a-p]{2,3}arijn'      -- true

The quantifiers ?, * and + are shorthand for {0,1}, {0,} and {1,}, respectively.

OR-ing terms

Regular expression terms can be OR'ed with the | operator. A match is made when the argument string matches at least one of the terms:

'Nektarin' similar to 'Nek|tarin'              -- false
'Nektarin' similar to 'Nektarin|Persika'       -- true
'Nektarin' similar to 'M_+|N_+|P_+'            -- true

Subexpressions

One or more parts of the regular expression can be grouped into subexpressions (also called subpatterns) by placing them between parentheses. A subexpression is a regular expression in its own right. It can contain all the elements allowed in a regular expression, and can also have quantifiers added to it.

'Orange' similar to 'O(ra|ri|ro)nge'           -- true
'Orange' similar to 'O(r[a-e])+nge'            -- true
'Orange' similar to 'O(ra){2,4}nge'            -- false
'Orange' similar to 'O(r(an|in)g|rong)?e'      -- true

Escaping special characters

In order to match against a character that is special in regular expressions, that character has to be escaped. There is no default escape character; rather, the user specifies one when needed:

'Peer (Poire)' similar to 'P[^ ]+ \(P[^ ]+\)' escape '\'    -- true
'Pera [Pear]'  similar to 'P[^ ]+ #[P[^ ]+#]' escape '#'    -- true
'Päron-Äppledryck' similar to 'P%$-Ä%' escape '$'           -- true
'Pärondryck' similar to 'P%--Ä%' escape '-'                 -- false

The last line demonstrates that the escape character can also escape itself, if needed.