Oracle PL/SQL: Regular Expressions: Regexp Cheat Sheet - With version 10g, Oracle Database offers 4 regexp functions that you can use in SQL and PL/SQL state

Quick Search:

Oracle PL/SQL: Regular Expressions: Regexp Cheat Sheet

Jump to:

Category: >> Oracle PL/SQL >> Regular Expressions: Regexp Cheat Sheet

<< last

next >>

Snippet Name: Regular Expressions: Regexp Cheat Sheet

Description: With version 10g, Oracle Database offers 4 regexp functions that you can use in SQL and PL/SQL statements. These functions implement the POSIX Extended Regular Expressions (ERE) standard. Oracle fully supports collating sequences and equivalence classes in bracket expressions. The NLS_SORT setting determines the POSIX locale used, which determines the available collating sequences and equivalence classes.

Oracle does not implement the POSIX ERE standard exactly, however. It deviates in three areas.

First, Oracle supports the backreferences 1 through 9 in the regular expression. The POSIX ERE standard does not support these, even though POSIX BRE does. In a fully compliant engine, 1 through 9 would be illegal. The POSIX standard states it is illegal to escape a character that is not a metacharacter with a backslash. Oracle allows this, and simply ignores the backslash. E.g. z is identical to z in Oracle. The result is that all POSIX ERE regular expressions can be used with Oracle, but some regular expressions that work in Oracle may cause an error in a fully POSIX-compliant engine. Obviously, if you only work with Oracle, these differences are irrelevant.

The third difference is more subtle. It won't cause any errors, but may result in different matches. As I explained in the topic about the POSIX standard, it requires the regex engine to return the longest match in case of alternation. Oracle's engine does not do this. It is a traditional NFA engine, like all non-POSIX regex flavors discussed on this website.

Also see:
» Regular Expressions: REGEXP_SUBSTR
» Regular Expressions: REGEXP_LIKE
» Regular Expressions: REGEXP_REPLACE
» Regular Expressions: RegExp examples t...
» Regular Expressions: RegExp examples t...
» Regular Expressions: RegExp examples o...
» RegExp: Append first name first letter...

Comment: (none)

Language:
Highlight Mode: HTML4STRICT
Last Modified: March 12th, 2009

Anchoring Characters are metacharacters that 
affect the position of the search.
----------------------------------------------------
Character Class      Description
^      Anchor the expression to the start of a line
$      Anchor the expression to the end of a line
 
 
 
Equivalence Classes      
----------------------------------------------------
Character Class      Description
= =      Oracle supports the equivalence classes through 
        the POSIX '[==]' syntax. A base letter and all of 
        its accented versions constitute an equivalence 
        class. For example, the equivalence class '[=a=]' 
        matches Ã¤ and Ã¢. The equivalence classes are valid 
        only inside the bracketed expression
 
 
Match Options determine if the target is treated checked for 
case-sensitivity and whether or not the target is evaluated 
line by line or as a continuous string. 
----------------------------------------------------
Character Class      Description
c      Case sensitive matching
i      Case insensitive matching
m      Treat source string as multi-line activating Anchor chars
n      Allow the period (.) to match any newline character
 
 
 
Posix Characters tend to look very ugly but have the advantage 
that also take into account the 'locale', that is, any variant 
of the local language/coding system.
----------------------------------------------------
Character Class      Description
[:digit:]      Only the digits 0 to 9
[:alnum:]      Any alphanumeric character 0 to 9 OR A to Z or a to z.
[:alpha:]      Any alpha character A to Z or a to z.
[:blank:]      Space and TAB characters only.
[:xdigit:]     Hexadecimal notation 0-9, A-F, a-f.
[:punct:]      Punctuation symbols 
                --------------------------------
                % . , " ' ? ! : # $ & ( ) * ;
                + - / < > = @ [ ] \ ^ _ { } | ~
                --------------------------------
[:print:]      Any printable character.
[:space:]      Any whitespace characters (space, tab, NL, FF, VT, CR). 
                Many system abbreviate as \s.
[:graph:]      Exclude whitespace (SPACE, TAB). Many system abbreviate as \W.
[:upper:]      Any alpha character A to Z.
[:lower:]      Any alpha character a to z.
[:cntrl:]      Control Characters NL CR LF TAB VT FF NUL SOH STX 
                EXT EOT ENQ ACK SO SI DLE DC1 DC2 DC3 DC4 NAK SYN 
                ETB CAN EM SUB ESC IS1 IS2 IS3 IS4 DEL.
 
 
Quantifier Characters control the number of times a character 
or string is found in a search.
----------------------------------------------------
Character Class      Description
*      Match 0 or more times
?      Match 0 or 1 time
+      Match 1 or more times
{m}      Match exactly m times
{m,}      Match at least m times
{m, n}      Match at least m times but no more than n times
\n      Cause the previous expression to be repeated n times
 
 
 
Alternative Matching And Grouping Characters      
----------------------------------------------------
Character Class      Description
|      Separates alternates, often used with grouping 
        operator ()
 
( )      Groups subexpression into a unit for alternations, for 
        quantifiers, or for backreferencing 
 
[char]      Indicates a character list; most metacharacters inside a
        character list are understood as literals, with the 
        exception of character classes, and the ^ 
        and - metacharacters
 
 
////////////////////////////////////////////////////
Regex Cheat Sheet (non-posix)
////////////////////////////////////////////////////
 
 
Modifiers: 
i   case-insensitive pattern matching. 
 
g   global replace, or replace all 
 
m   Treat string as multiple lines. That is, 
    change ``^'' and ``$'' from matching at only 
    the very start or end of the string to the 
    start or end of any line anywhere within the string 
 
s   Treat string as single line. That is, change ``.'' to 
    match any character whatsoever, even a newline, which 
    it normally would not match. 
 
x   Extend your pattern's legibility by permitting 
    whitespace and comments.
 
 
Special Characters:
The following should be escaped if you are trying to 
match that character:
 
 \  ^  .  $  |  (  )  [  ]
 *  +  ?  {  }  ,
 
 
Special Character Definitions:
    \   Quote the next metacharacter
    ^   Match the beginning of the line
    .   Match any character (except newline)
    $   Match the end of the line (or before newline at the end)
    |   Alternation
    ()  Grouping
    []  Character class
    *      Match 0 or more times
    +      Match 1 or more times
    ?      Match 1 or 0 times
    {n}    Match exactly n times
    {n,}   Match at least n times
    {n,m}  Match at least n but not more than m times
 
More Special Characters: 
    \t          tab                   (HT, TAB)
    \n          newline               (LF, NL)
    \r          return                (CR)
    \f          form feed             (FF)
    \a          alarm (bell)          (BEL)
    \e          escape (think troff)  (ESC)
    \033        octal char (think of a PDP-11)
    \x1B        hex char
    \c[         control char
    \l          lowercase next char (think vi)
    \u          uppercase next char (think vi)
    \L          lowercase till \E (think vi)
    \U          uppercase till \E (think vi)
    \E          end case modification (think vi)
    \Q          quote (disable) pattern metacharacters till \E
 
Even More Special Characters:
    \w  Match a "word" character (alphanumeric plus "_")
    \W  Match a non-word character
    \s  Match a whitespace character
    \S  Match a non-whitespace character
    \d  Match a digit character
    \D  Match a non-digit character
    \b  Match a word boundary
    \B  Match a non-(word boundary)
    \A  Match only at beginning of string
    \Z  Match only at end of string, or before newline at the end
    \z  Match only at end of string
    \G  Match only where previous m//g left off (works only with /g)