I posted over a year ago about how one might use regular expressions in Oracle to return SEO-friendly URLs in queries. The query I came up with at the time looked something like this:
SELECT id, title, description, pubdate, journal_name, author_name, num_reads , REGEXP_REPLACE(REGEXP_REPLACE(LOWER(title), '[^-a-z0-9/ ]', ''), '[/ ]', '-') AS url_keyword FROM articles
Since then, I’ve been looking into regular expressions quite a bit. One difficulty with the expression above is that multiple characters will be replaced with multiple hyphens, plus other punctuation will be ignored — they will simply be replaced with nothing. Another, perhaps more obvious difficulty is its complexity! A better approach would be the following:
SELECT id, title, description, pubdate, journal_name, author_name, num_reads , REGEXP_REPLACE(LOWER(title), '[^a-z0-9]+', '-') AS url_keyword FROM articles
This simply replaces consecutive non-alphanumeric characters with a single hyphen. Since Oracle supports the POSIX character classes in regular expressions, the above could also be written:
SELECT id, title, description, pubdate, journal_name, author_name, num_reads , REGEXP_REPLACE(LOWER(title), '[^[:alnum:]]+', '-') AS url_keyword FROM articles
where [[:alnum:]]
is the POSIX character class for alphanumeric characters and [^[:alnum:]]
is its negation. A third option, if you know your data has no underscores or want them to be preserved for some reason (and not transformed into hyphens) would be the following:
SELECT id, title, description, pubdate, journal_name, author_name, num_reads , REGEXP_REPLACE(LOWER(title), '\W+', '-') AS url_keyword FROM articles