If you are on SQL 2017 or later, you can use regular expressions in Python. The advantage of this is that regular expressions are a lot more straight forward for this kind of problems. To be able to use Python, there are some preconditions:
- The support for Machine Learning extensions and Python must be installed. (These are optional components in Setup.=)
- The configuration parameter
external scripts enabled
must be 1.
- The user must hold the database permission EXECUTE ANY EXTERNAL SCRIPT.
Here is a solution for your problem:
DROP TABLE IF EXISTS #MyTable
CREATE TABLE #MyTable(
MyCol nvarchar(100)
);
INSERT INTO #MyTable VALUES
('doc'), ('doc_1'), ('doc/1'), ('doc_mine'), ('mine_doc'), ('doc.mine'), ('mi.doc'), ('document'), ('docs');
EXEC sp_execute_external_script @language = N'Python',
@script = N'
import re, pandas
condition = InputDataSet["MyCol"].map(lambda str:
re.search(r"(\b|_)" + Word + r"(\b|_)", str) != None)
Ret = InputDataSet[condition]
',
@input_data_1 = N'SELECT MyCol FROM #MyTable',
@input_data_1_name = N'InputDataSet',
@output_data_1_name = N'Ret',
@params = N'@Word nvarchar(200)',
@Word = N'doc'
WITH RESULT SETS ((MyCol nvarchar(100)))
In regular expression, \b
is stands exactly for a word break. Except that in regular expressions the underscore is considered to be alphanumeric, so I had to add this extra.