Author Topic: REGEXP_SUBSTR Function usage  (Read 17726 times)

credmond

  • Newbie
  • *
  • Posts: 4
    • View Profile
REGEXP_SUBSTR Function usage
« on: August 10, 2011, 01:32:37 PM »
I am attempting to extract substrings out of a string (ex. '001.069012...3301').
From the sting I wish to treat the period (.) as the delimiter. As such there are 5 substrings in this string with the third and fourth having a null value. I am currently using the following code to do this:

REGEXP_SUBSTR( ALS.CONCATENATED_SEGMENTS, '[^.]+', 1, 1 ) SUBST_1

This code (which I got off of this wesite, Thanks Mike!) works great as long as there are no empty (null) substrings in the string.
Right now in order to get the 5th substring, I am actually using 3 for the occurance arguement.

Any suggestions on how to get this to work so it can handle empty (null) substrings?

Side note: for those of you familiar with MUMPS, this would work like the $PIECE function.

Thank you for a great website Mike. You have lots of great content here! :-)
« Last Edit: August 10, 2011, 02:09:42 PM by Mike »


Mike

  • Administrator
  • Hero Member
  • *****
  • Posts: 1947
    • View Profile
Re: REGEXP_SUBSTR Function usage
« Reply #1 on: August 10, 2011, 02:14:07 PM »
Hmmm, I'm not totally up on the finer points of REGEXP_SUBSTR, but here are a couple of ideas...

First, what happens when you have empty substrings
- does it throw an error. or...?

Second, you're probably aware of this, but NULL and empty aren't the same thing...you can test for NULL, but an empty string would return "not null" if you tested one.


I am attempting to extract substrings out of a string (ex. '001.069012...3301').
From the sting I wish to treat the period (.) as the delimiter. As such there are 5 substrings in this string with the third and fourth having a null value. I am currently using the following code to do this:

REGEXP_SUBSTR( ALS.CONCATENATED_SEGMENTS, '[^.]+', 1, 1 ) SUBST_1

This code (which I got off of this wesite, Thanks Mike!) works great as long as there are no empty (null) substrings in the string.
Right now in order to get the 5th substring, I am actually using 3 for the occurance arguement.

Any suggestions on how to get this to work so it can handle empty (null) substrings?

Side note: for those of you familiar with MUMPS, this would work like the $PIECE function.

Thank you for a great website Mike. You have lots of great content here! :-)

credmond

  • Newbie
  • *
  • Posts: 4
    • View Profile
Re: REGEXP_SUBSTR Function usage
« Reply #2 on: August 10, 2011, 02:43:44 PM »
Thanks Mike.
 
I am aware of the technical differences between empty and null. Sorry for mixing these terms. In this case, Oracle considers thel ack of a substring to be NULL.
 
If I choose the value 5 (because I want the fifth substring) as shown below, it returns a single row with a null value instead of the value '3301'. As mentioned before, I must use 3 as the occurance to get the fifth substring '3301'. This will become a problem when the third or fourth substrings become populated.
 

SELECT
REGEXP_SUBSTR( '001.069012...3301', '[^.]+', 1, 5 ) RESULT

FROM
DUAL

;
 
 

Mike

  • Administrator
  • Hero Member
  • *****
  • Posts: 1947
    • View Profile
Re: REGEXP_SUBSTR Function usage
« Reply #3 on: August 10, 2011, 03:15:36 PM »
Hmmm...this would be easy to solve in a second-tier language like PHP, Java, perl, etc, but I'm not sure of the best way to do this in SQL. You might have to do some preprocessing of the string to split it along the delimiter and then examine the values. I'd bet there's a better/cleaner way to do it via SQL but I'm not aware of how that would be done.

Another way might be to use regexp to look for the digits and key off of them. Something like "(/d+)" or "(/d*)" might work.

Thanks Mike.

I am aware of the technical differences between empty and null. Sorry for mixing these terms. In this case, Oracle considers thel ack of a substring to be NULL.

If I choose the value 5 (because I want the fifth substring) as shown below, it returns a single row with a null value instead of the value '3301'. As mentioned before, I must use 3 as the occurance to get the fifth substring '3301'. This will become a problem when the third or fourth substrings become populated.


SELECT
REGEXP_SUBSTR( '001.069012...3301', '[^.]+', 1, 5 ) RESULT

FROM
DUAL

;



credmond

  • Newbie
  • *
  • Posts: 4
    • View Profile
Re: REGEXP_SUBSTR Function usage
« Reply #4 on: August 10, 2011, 04:24:11 PM »
Thanks Mike,
I did some more hunting on the web and found the Oracle Forums. (I hope you don't mind me quoting other forums).
 
Here is what I found:
 
Check this out: <?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /><o:p></o:p>
[COLOR=#NaNNaNNaN]SELECT[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]REGEXP_SUBSTR([/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]'001.069012...3301'[/COLOR][COLOR=#NaNNaNNaN],[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]'([^.]*)(.|$)'[/COLOR][COLOR=#NaNNaNNaN],[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]1[/COLOR][COLOR=#NaNNaNNaN],[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]5[/COLOR][COLOR=#NaNNaNNaN],[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]NULL,[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]1[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN])[/COLOR][COLOR=#NaNNaNNaN] RESULT<o:p></o:p>[/COLOR]
[COLOR=#NaNNaNNaN]FROM[/COLOR][COLOR=#NaNNaNNaN] DUAL<o:p></o:p>[/COLOR]
[COLOR=#NaNNaNNaN];[/COLOR][COLOR=#NaNNaNNaN]<o:p></o:p>[/COLOR]
<o:p> </o:p>
This only works on 11g. <o:p></o:p>
<o:p> </o:p>
If you are on 10g, use this:<o:p></o:p>
[COLOR=#NaNNaNNaN]SELECT[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]RTRIM([/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]REGEXP_SUBSTR([/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]'001.069012...3301'[/COLOR][COLOR=#NaNNaNNaN],[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]'[^.]*(.|$)'[/COLOR][COLOR=#NaNNaNNaN],[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]1[/COLOR][COLOR=#NaNNaNNaN],[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]5[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]),[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN]'.'[/COLOR][COLOR=#NaNNaNNaN] [/COLOR][COLOR=#NaNNaNNaN])[/COLOR][COLOR=#NaNNaNNaN] RESULT<o:p></o:p>[/COLOR]
[COLOR=#NaNNaNNaN]FROM[/COLOR][COLOR=#NaNNaNNaN] DUAL<o:p></o:p>[/COLOR]
[COLOR=#NaNNaNNaN];[/COLOR]<o:p></o:p>
<o:p> </o:p>
Found at: newbielink:http://forums.oracle.com/forums/thread.jspa?messageID=9530500&#9530500 [nonactive]<o:p></o:p>
There are three other sample code responses. Two of them work and are good options.
The key (best) response where this code is listed is by Solomon Yakobson on Apr 19, 2011 11:48 AM. <o:p></o:p>

 
Thanks for the help Mike. You really helped me to think about this issue and hunt for a good solution.
 
I'll be back soon I'm sure!


credmond

  • Newbie
  • *
  • Posts: 4
    • View Profile
Re: REGEXP_SUBSTR Function usage
« Reply #5 on: August 10, 2011, 04:26:38 PM »
Sorry about that last post, it's a little ugly. I didn't expect it to expose all of the RTF code.
Here it is again:
 
Check this out:
SELECT REGEXP_SUBSTR( '001.069012...3301', '([^.]*)(.|$)', 1, 5, NULL, 1 ) RESULT
FROM DUAL
;
This only works on 11g.
If you are on 10g, use this:
SELECT RTRIM( REGEXP_SUBSTR( '001.069012...3301', '[^.]*(.|$)', 1, 5 ), '.' ) RESULT
FROM DUAL
;
Found at: newbielink:http://forums.oracle.com/forums/thread.jspa?messageID=9530500&#9530500 [nonactive]
The key (best) response where this code is listed is by Solomon Yakobson on Apr 19, 2011 11:48 AM.

Mike

  • Administrator
  • Hero Member
  • *****
  • Posts: 1947
    • View Profile
Re: REGEXP_SUBSTR Function usage
« Reply #6 on: August 10, 2011, 04:47:04 PM »
Thanks for the help Mike. You really helped me to think about this issue and hunt for a good solution.

I'll be back soon I'm sure!
Glad to be able to help. :)