SQL Server Query for Searching by word in a string with word breakers

abc abc 351 Reputation points
2021-11-26T14:29:31.817+00:00

How to construct a SQL Server select query to achieve below result. How to search a word in a string word breakers?

My strings are,

doc
doc_1
doc/1
doc_mine
mine_doc
doc.mine
mi.doc

If I search by word "doc", the above strings should be retrieved.

My strings are,

document
docs

If I search by word "doc", the above strings should not be retrieved.

Thanks

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,779 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,555 questions
{count} votes

4 answers

Sort by: Most helpful
  1. Viorel 112.5K Reputation points
    2021-11-26T14:46:11.497+00:00

    Check this query:

    declare @word varchar(max) = 'doc'  
      
    select *  
    from MyTable  
    where ' ' + MyColumn + ' ' like '%[^a-z]' + @word + '[^a-z]%'  
    

    It assumes that the word (@omarcanchanya variable) contains letters only.


  2. Guoxiong 8,126 Reputation points
    2021-11-26T21:32:02.043+00:00

    How about this:

    DECLARE @MyTable TABLE (
        MyCol nvarchar(100)
    );
    INSERT INTO @MyTable VALUES
    ('doc'), ('doc_1'), ('doc/1'), ('doc_mine'), ('mine_doc'), ('doc.mine'), ('mi.doc'), ('document'), ('docs');
    
    DECLARE @WordBreakers TABLE (
        Breaker nchar(1)
    );
    INSERT INTO @WordBreakers VALUES
    (''), ('_'), ('/'), ('.');
    
    DECLARE @searchWord nvarchar(20) = 'doc';
    
    ;WITH CTE AS (
        SELECT 
            CASE WHEN CHARINDEX(@searchWord, MyCol) = 1 THEN '' ELSE SUBSTRING(MyCol, CHARINDEX(@searchWord, MyCol) - 1, 1) END AS Prefix,
            MyCol,
            SUBSTRING(MyCol, CHARINDEX(@searchWord, MyCol) + LEN(@searchWord), 1) AS Suffix
        FROM @MyTable
        WHERE CHARINDEX(@searchWord, MyCol) > 0
    )
    
    SELECT MyCol 
    FROM CTE
    WHERE Prefix IN (SELECT Breaker FROM @WordBreakers) AND Suffix IN (SELECT Breaker FROM @WordBreakers);
    GO
    
    0 comments No comments

  3. Erland Sommarskog 101.4K Reputation points MVP
    2021-11-26T22:27:24.937+00:00

    If you are on SQL 2017 or later, you can use regular expressions in Python. The advantage of this is that regular expressions are a lot more straight forward for this kind of problems. To be able to use Python, there are some preconditions:

    1. The support for Machine Learning extensions and Python must be installed. (These are optional components in Setup.=)
    2. The configuration parameter external scripts enabled must be 1.
    3. The user must hold the database permission EXECUTE ANY EXTERNAL SCRIPT.

    Here is a solution for your problem:

    DROP TABLE IF EXISTS #MyTable
     CREATE TABLE #MyTable(
         MyCol nvarchar(100)
     );
     INSERT INTO #MyTable VALUES
     ('doc'), ('doc_1'), ('doc/1'), ('doc_mine'), ('mine_doc'), ('doc.mine'), ('mi.doc'), ('document'), ('docs');
    
    EXEC sp_execute_external_script @language = N'Python', 
         @script = N'
    import re, pandas
    condition = InputDataSet["MyCol"].map(lambda str:
        re.search(r"(\b|_)" + Word + r"(\b|_)", str) != None)
    Ret = InputDataSet[condition]
    ', 
         @input_data_1 = N'SELECT MyCol FROM #MyTable', 
         @input_data_1_name = N'InputDataSet',
         @output_data_1_name = N'Ret',
         @params = N'@Word nvarchar(200)',
         @Word = N'doc'
         WITH RESULT SETS ((MyCol nvarchar(100)))
    

    In regular expression, \b is stands exactly for a word break. Except that in regular expressions the underscore is considered to be alphanumeric, so I had to add this extra.

    0 comments No comments

  4. EchoLiu-MSFT 14,571 Reputation points
    2021-11-29T02:35:40.777+00:00

    Hi @abc abc

    Please also check:

    CREATE TABLE #data(string VARCHAR(15))  
    INSERT INTO #data VALUES  
    ('doc'),  
    ('doc_1'),  
    ('doc/1'),  
    ('doc_mine'),  
    ('mine_doc'),  
    ('doc.mine'),  
    ('mi.doc'),  
    ('document'),  
    ('docs')  
      
    SELECT string    
    FROM #data  
    WHERE string LIKE '%doc[^a-z]%' OR string LIKE '%doc'  
    

    Output:
    153156-image.png

    Regards,
    Echo


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

    0 comments No comments