\b
, at least not in Oracle 11i.
Consider the following table:
create table tq84_word_boundaries (
txt varchar2(50)
);
insert into tq84_word_boundaries values ('AFooABarAndABaz' );
insert into tq84_word_boundaries values ('A FooA BarAndABaz' );
insert into tq84_word_boundaries values ('A Foo, a Bar and a Baz');
insert into tq84_word_boundaries values ('A Foo without a Baz' );
insert into tq84_word_boundaries values ('Foo Bar, Baz' );
insert into tq84_word_boundaries values ('Is it a Foo?' );
insert into tq84_word_boundaries values ('Bar-Foo-Baz' );
Now, I want to find all records that contain the exact word Foo
.
That is, I want, for example A Foo without a Baz
(the fourth record), but I don't want
A FooA BarAndABaz
(the second record) because FooA
is not the exact word Foo
If Oracle supported \b
I could use
select * from tq84_word_boundaries
where
regexp_like(txt, '\bFoo\b');
Yet, Oracle, does not support it and no record is returned.
To improve matters, I could try
select * from tq84_word_boundaries
where
regexp_like(txt, '\sFoo\s');
This approach returns
TXT
--------------------------------------------------
A Foo without a Baz
A bit better, but far from perfect. For example, the fifth record (Foo Bar, Baz
) is not returned, because
the \s
doesn't match start of line. So, I improve the where condition:
select * from tq84_word_boundaries
where
regexp_like(txt, '(^|\s)Foo($|\s)');
The record is now returned:
TXT
--------------------------------------------------
A Foo without a Baz
Foo Bar, Baz
Yet again, this is far from perfect. I need also record records where Foo
is followed or lead by a non word character (such as ?
or -
):
select * from tq84_word_boundaries
where
regexp_like(txt, '(^|\s|\W)Foo($|\s|\W)');
This returns
TXT
--------------------------------------------------
A Foo, a Bar and a Baz
A Foo without a Baz
Foo Bar, Baz
Is it a Foo?
Bar-Foo-Baz
I think I can live with it.
Links
Source code on github
Its very helpful In mobile applications searching for product descriptions.
ReplyDeleteThanks and Regards
Kumar
Thanks for this, but it turns out that the \s's are redundant as they are subsets of the \W's so you just need '(^|\W)Foo($|\W)'. Regards.
ReplyDelete