Snippet Name: Regular Expressions: Regexp Cheat Sheet
Description: With version 10g, Oracle Database offers 4 regexp functions that you can use in SQL and PL/SQL statements. These functions implement the POSIX Extended Regular Expressions (ERE) standard. Oracle fully supports collating sequences and equivalence classes in bracket expressions. The NLS_SORT setting determines the POSIX locale used, which determines the available collating sequences and equivalence classes.
Oracle does not implement the POSIX ERE standard exactly, however. It deviates in three areas.
First, Oracle supports the backreferences 1 through 9 in the regular expression. The POSIX ERE standard does not support these, even though POSIX BRE does. In a fully compliant engine, 1 through 9 would be illegal. The POSIX standard states it is illegal to escape a character that is not a metacharacter with a backslash. Oracle allows this, and simply ignores the backslash. E.g. z is identical to z in Oracle. The result is that all POSIX ERE regular expressions can be used with Oracle, but some regular expressions that work in Oracle may cause an error in a fully POSIX-compliant engine. Obviously, if you only work with Oracle, these differences are irrelevant.
The third difference is more subtle. It won't cause any errors, but may result in different matches. As I explained in the topic about the POSIX standard, it requires the regex engine to return the longest match in case of alternation. Oracle's engine does not do this. It is a traditional NFA engine, like all non-POSIX regex flavors discussed on this website.
Also see: » Regular Expressions: REGEXP_SUBSTR
» Regular Expressions: REGEXP_LIKE
» Regular Expressions: REGEXP_REPLACE
» Regular Expressions: RegExp examples t...
» Regular Expressions: RegExp examples t...
» Regular Expressions: RegExp examples o...
» RegExp: Append first name first letter...
Comment: (none)
Language:
Highlight Mode: HTML4STRICT
Last Modified: March 12th, 2009
|
Anchoring Characters are metacharacters that
affect the position of the search.
----------------------------------------------------
Character Class Description
^ Anchor the expression to the start of a line
$ Anchor the expression to the end of a line
Equivalence Classes
----------------------------------------------------
Character Class Description
= = Oracle supports the equivalence classes through
the POSIX '[==]' syntax. A base letter and all of
its accented versions constitute an equivalence
class. For example, the equivalence class '[=a=]'
matches ä and â. The equivalence classes are valid
only inside the bracketed expression
Match Options determine if the target is treated checked for
case-sensitivity and whether or not the target is evaluated
line by line or as a continuous string.
----------------------------------------------------
Character Class Description
c Case sensitive matching
i Case insensitive matching
m Treat source string as multi-line activating Anchor chars
n Allow the period (.) to match any newline character
Posix Characters tend to look very ugly but have the advantage
that also take into account the 'locale', that is, any variant
of the local language/coding system.
----------------------------------------------------
Character Class Description
[:digit:] Only the digits 0 to 9
[:alnum:] Any alphanumeric character 0 to 9 OR A to Z or a to z.
[:alpha:] Any alpha character A to Z or a to z.
[:blank:] Space and TAB characters only.
[:xdigit:] Hexadecimal notation 0-9, A-F, a-f.
[:punct:] Punctuation symbols
--------------------------------
% . , " ' ? ! : # $ & ( ) * ;
+ - / < > = @ [ ] \ ^ _ { } | ~
--------------------------------
[:print:] Any printable character.
[:space:] Any whitespace characters (space, tab, NL, FF, VT, CR).
Many system abbreviate as \s.
[:graph:] Exclude whitespace (SPACE, TAB). Many system abbreviate as \W.
[:upper:] Any alpha character A to Z.
[:lower:] Any alpha character a to z.
[:cntrl:] Control Characters NL CR LF TAB VT FF NUL SOH STX
EXT EOT ENQ ACK SO SI DLE DC1 DC2 DC3 DC4 NAK SYN
ETB CAN EM SUB ESC IS1 IS2 IS3 IS4 DEL.
Quantifier Characters control the number of times a character
or string is found in a search.
----------------------------------------------------
Character Class Description
* Match 0 or more times
? Match 0 or 1 time
+ Match 1 or more times
{m} Match exactly m times
{m,} Match at least m times
{m, n} Match at least m times but no more than n times
\n Cause the previous expression to be repeated n times
Alternative Matching And Grouping Characters
----------------------------------------------------
Character Class Description
| Separates alternates, often used with grouping
operator ()
( ) Groups subexpression into a unit for alternations, for
quantifiers, or for backreferencing
[char] Indicates a character list; most metacharacters inside a
character list are understood as literals, with the
exception of character classes, and the ^
and - metacharacters
////////////////////////////////////////////////////
Regex Cheat Sheet (non-posix)
////////////////////////////////////////////////////
Modifiers:
i case-insensitive pattern matching.
g global replace, or replace all
m Treat string as multiple lines. That is,
change ``^'' and ``$'' from matching at only
the very start or end of the string to the
start or end of any line anywhere within the string
s Treat string as single line. That is, change ``.'' to
match any character whatsoever, even a newline, which
it normally would not match.
x Extend your pattern's legibility by permitting
whitespace and comments.
Special Characters:
The following should be escaped if you are trying to
match that character:
\ ^ . $ | ( ) [ ]
* + ? { } ,
Special Character Definitions:
\ Quote the next metacharacter
^ Match the beginning of the line
. Match any character (except newline)
$ Match the end of the line (or before newline at the end)
| Alternation
() Grouping
[] Character class
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
More Special Characters:
\t tab (HT, TAB)
\n newline (LF, NL)
\r return (CR)
\f form feed (FF)
\a alarm (bell) (BEL)
\e escape (think troff) (ESC)
\033 octal char (think of a PDP-11)
\x1B hex char
\c[ control char
\l lowercase next char (think vi)
\u uppercase next char (think vi)
\L lowercase till \E (think vi)
\U uppercase till \E (think vi)
\E end case modification (think vi)
\Q quote (disable) pattern metacharacters till \E
Even More Special Characters:
\w Match a "word" character (alphanumeric plus "_")
\W Match a non-word character
\s Match a whitespace character
\S Match a non-whitespace character
\d Match a digit character
\D Match a non-digit character
\b Match a word boundary
\B Match a non-(word boundary)
\A Match only at beginning of string
\Z Match only at end of string, or before newline at the end
\z Match only at end of string
\G Match only where previous m//g left off (works only with /g)
|