It is something developers need often, and still can't believe there isn't a function that does something like this in oracle 10g...???
Some developers parse delimited text using substr(text,X,Y) for every token they need to use in their code…
My advice is: Don’t ever do this!!! Use some available tokenizer function!
I found several implementations on the internet, but those functions either does not support delimiter with length more than 1, either it does not return tokens that are NULL, or does not return table of varchar2 index by pls_integer ( what I needed), etc.
So I wrote it myself…
Anyway, here is my first implementation, written entirelly in PL/SQL:
CREATE OR REPLACE FUNCTION splitString(str VARCHAR2,delimit VARCHAR2) RETURN dbms_sql.Varchar2_Table IS inputString VARCHAR2(32000); delimiter VARCHAR2(32); delimiters dbms_sql.Number_Table; tokens dbms_sql.Varchar2_Table; array_index PLS_INTEGER := 1; pos PLS_INTEGER := 1; delimiter_pos PLS_INTEGER; delimiter_length PLS_INTEGER; inputString_length PLS_INTEGER; BEGIN IF str IS NULL THEN RAISE_APPLICATION_ERROR(-20000,'str cannot be null!'); END IF; IF delimit IS NULL OR delimit = '' THEN RAISE_APPLICATION_ERROR(-20000,'delimiter cannot be null or empty string!'); END IF; inputString := str; delimiter := delimit; delimiter_length := length(delimiter); inputString_length := length(inputString); -- Prepare inputText for parsing IF substr(inputString,inputString_length - delimiter_length + 1,delimiter_length) != delimiter THEN inputString := inputString || delimiter; END IF; IF substr(inputString,1,delimiter_length) = delimiter THEN inputString := substr(inputString,delimiter_length + 1,inputString_length); END IF; --------------------- -- find delimiter occurrences in inputText LOOP delimiter_pos := instr(inputString,delimiter,pos); EXIT WHEN delimiter_pos = 0; delimiters(array_index) := delimiter_pos; array_index := array_index + 1; pos := delimiter_pos + delimiter_length; END LOOP; -------------------- -- reset temp variables array_index := 1; pos := 1; --------------------- -- iterate through delimiters and create tokens FOR j IN delimiters.first..delimiters.last LOOP tokens(array_index) := substr(inputString,pos,delimiters(j)-pos); array_index := array_index + 1; pos := delimiters(j)+ delimiter_length; END LOOP; -------------------- RETURN tokens; END; /Function’s parameters are string that needs to be tokenized, and a delimiter ( that can have length greater than 1).
Yoy could enchance it by setting
TOKENS DBMS_SQL.VARCHAR2_TABLE;
to
TYPE t IS TABLE OF varchar2(32000) INDEX BY PLS_INTEGER;
TOKENS t;
if you are dealing with tokens of length > 2000.
Also, these are case-sensitive functions. You can make it case-insensitive very easy...
Dejan gave me an idea to write it using SQL.
So I did it, because his implementation using SQL to parse a string is cool:
CREATE OR REPLACE FUNCTION SPLITSTRING_USING_SQL(STR VARCHAR2, DELIMIT VARCHAR2) RETURN DBMS_SQL.VARCHAR2_TABLE IS INPUTSTRING VARCHAR2(32000); INPUTDELIMITER VARCHAR2(32); TOKENS DBMS_SQL.VARCHAR2_TABLE; DELIMITER_LENGTH NUMBER; INPUTSTRING_LENGTH NUMBER; BEGIN IF STR IS NULL THEN RAISE_APPLICATION_ERROR(-20000, 'str cannot be null!'); END IF; IF DELIMIT IS NULL OR DELIMIT = '' THEN RAISE_APPLICATION_ERROR(-20000, 'delimiter cannot be null or empty string!'); END IF; INPUTSTRING := STR; INPUTDELIMITER := DELIMIT; DELIMITER_LENGTH := LENGTH(INPUTDELIMITER); INPUTSTRING_LENGTH := LENGTH(INPUTSTRING); -- Prepare inputText for parsing IF SUBSTR(INPUTSTRING, INPUTSTRING_LENGTH - DELIMITER_LENGTH + 1, DELIMITER_LENGTH) != INPUTDELIMITER THEN INPUTSTRING := INPUTSTRING || INPUTDELIMITER; END IF; IF SUBSTR(INPUTSTRING, 1, DELIMITER_LENGTH) != INPUTDELIMITER THEN INPUTSTRING := INPUTDELIMITER || INPUTSTRING; END IF; SELECT SUBSTR(INPUTTEXT, INDEKS + DELIMITER_LENGTH, NAREDNI - DELIMITER_LENGTH - INDEKS) TOKEN BULK COLLECT INTO TOKENS FROM (SELECT INDEKS, LEAD(INDEKS) OVER(ORDER BY INDEKS) NAREDNI, INPUTTEXT FROM (SELECT LEVEL INDEKS, INPUTSTRING INPUTTEXT, INPUTDELIMITER DELIMITER, (CASE WHEN SUBSTR(INPUTSTRING, LEVEL, DELIMITER_LENGTH) = INPUTDELIMITER THEN 1 ELSE 0 END) IS_DELIMITER FROM DUAL CONNECT BY LEVEL <= LENGTH(INPUTSTRING)) WHERE IS_DELIMITER = 1) WHERE NAREDNI IS NOT NULL; RETURN TOKENS; END; /It’s basically very similar to his function.
The only difference is that unlike Dejan’s function, this one returns NULL tokens too…
You can test theese two functions easy:
declare input VARCHAR2(32000) := 'one***two***three***four***five***six***seven***eight***nine***ten'; delimiter varchar2(10) := '***'; result dbms_sql.Varchar2_Table; begin result := splitString(input,delimiter); --result := splitString_using_sql(input,delimiter); for i in result.first..result.last loop dbms_output.put_line(result(i)); end loop; end;
Conclusion:
There are CERTAINLLY better ways to write this function, and these are just an example how this could be done.
This is one possible approach, to return an array of tokens. For me , probably the better way would be to create an Object that does something like StringTokenizer Java class. Not to return an array, but to provide a way to iterate through tokens using it’s hasMoreTokens() and nextToken() methods.
I think this is better approach if you are dealing with very large input strings and tokens. If you are dealing with small ones, then it’s OK to use one of my funtions I provided here, or any other similar to this one, that you can find on the internet…
If you intend to use one of these, I suggest on using first one, because It is much faster than second one.
I didn’t do any optimizations ( which is obvious ) because I was pleased with speed. Of course, you are welcome to do it yourself
as a matter of fact, I did some speed testing with input strings whose length is ~ 30000, and the first one was (much) faster than many functions I found on the internet…
No comments:
Post a Comment