question

SQL9-3412 avatar image
0 Votes"
SQL9-3412 asked SQL9-3412 commented

Parse address field to extract City name using t-sql

Hi All,

I have a table(@Address table) with address field. I would like to extract the city name , state and zip code from it.

I have used STRING_AGG() and STRING_SPLIT() to get the City name, state, zip. In these 3 fields City name is very important to show.

Below is the sample code I am using to get the city, state, & zip but it works fine for city names with space in between them but doesn't work for single word states(top 4 works for states with space but not for states with single word, it pulls other street info).
Also I tried with a table to filter the data based on city names from the table but don't know how to use it correctly with string_split()

Any kind of help is greatly appreciate. I am using 2017 version.


declare @Output as varchar(100) = '100 NORTH MAIN MASON CITY IL 626641104'
SELECT @output = STRING_AGG(cityname, SPACE(1)) FROM (SELECT top 4 ROW_NUMBER() Over (order by (select null)) rn, value FROM STRING_SPLIT(@Output, SPACE(1))
order by rn desc) AS cn (rn,cityname);

SELECT @output = STRING_AGG(cityname, SPACE(1)) FROM (SELECT top 4 ROW_NUMBER() Over (order by (select null)) rn, value FROM STRING_SPLIT(@output, SPACE(1))
order by rn asc) AS Cn (rn,cityname);

SELECT @output = STRING_AGG(cityname, SPACE(1)) FROM (SELECT top 4 ROW_NUMBER() Over (order by (select null)) rn, value FROM STRING_SPLIT(@output, SPACE(1))
/ WHERE value IN (SELECT cityname FROM @City ) / order by rn desc ) AS Cn (rn,cityname);
select @output

Below is the sample city table data I want to use it to filter the STRING_Split() data

declare @City table (CityName varchar(50))
insert into @City (CityName)
Select 'PLAIN CITY' union all
Select 'MASON CITY' union all
select 'BLOOMINGDALE' union all
select 'CHICAGO'

--select * from @City

declare @Address table (FullAddress varchar(150))
insert into @Address (FullAddress)
Select '5500 North AVENUE, APT. 116 plain city OH 60053' union all
Select '12536 S INDIANA AVE CHICAGO IL 60625' union all
Select 'W MAIN ST PO BOX 111 Addison VA 24293' union all
select '179 ARMY TRAIL RD BLOOMINGDALE IL 60108' union all
select '100 NORTH MAIN MASON CITY IL 626641104'

--select * from @Address

Thanks in advance
RH

sql-server-transact-sql
· 6
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi @SQL9-3412,

Could you please validate all the answers and provide any update?

If all are not working or helpful, please provide more sample data and expected output.

Please remember to accept the answers if they helped. Your action would be helpful to other users who encounter the same issue and read this thread. 

Thank you for understanding!

Best regards,
Melissa

0 Votes 0 ·

Hi Melissa,
Thanks for taking time and helping on this. The query which you have provided is almost giving same output as mine(I have posted mine). I have used String_Split() and String_Agg() to get the same, but it's showing cityname partly when cityname has space in between. Works fine when cityname is a single word.

The current query which you gave is showing part of the cityname when cityname has space in between( ex: Plain City , Mason city, but query is returning only Plain , Mason.
It's skipping "City" word from the cityname. I will post source data and output data that I need in the next reply. For some reason there is a chars limitation while posting.

Thanks,
RH

0 Votes 0 ·

Address could have cityname with "City", "Ville" or anything. I need full cityname like below output format.

My source data looks like below: I am using 2017 SQL version.
declare @Address table (FullAddress varchar(150))
insert into @Address (FullAddress)
Select '5500 North AVENUE, APT. 116 PLAIN CITY OH 60053'
union all
Select '12536 S INDIANA AVE CHICAGO IL 60625'
union all
Select 'W MAIN ST PO BOX 111 ADDISON VA 24293'
union all
select '179 ARMY TRAIL RD BLOOMINGDALE IL 60108'
union all
select '100 NORTH MAIN MASON CITY IL 626641104'
union all
select '123 West Street EDISON NJ 34561'

Output should look like in below format:

PLAIN CITY OH 60053
CHICAGO IL 60625
ADDISON VA 24293
BLOOMINGDALE IL 60108
MASON CITY IL 626641104
EDISON NJ 34561

Thanks,
RH

0 Votes 0 ·

So did you read Ronen's and Tom's answers? Or why are you still struggling to do this in T-SQL on your own?

0 Votes 0 ·

Hi @SQL9-3412,

Thanks for your update.

Please help refer the latest answers and check whether any of them is working.

Please remember to accept the answers if they helped. Your action would be helpful to other users who encounter the same issue and read this thread. 

Thank you for understanding!

Best regards,
Melissa

0 Votes 0 ·

Can all of cities in the Address table be found in the City table?

0 Votes 0 ·
MelissaMa-msft avatar image
0 Votes"
MelissaMa-msft answered SQL9-3412 commented

Hi @SQL9-3412,

Thanks for your update.

If you could list all cityname with "City", "Ville" or anything, you could refer below:

 ;with cte as (
 select  ROW_NUMBER() Over (order by (select null)) addressid,FullAddress from @Address)
 ,cte1 as (
 SELECT ROW_NUMBER() Over (partition by addressid order by (select null) ) id, value,addressid FROM cte
 cross apply STRING_SPLIT(FullAddress, SPACE(1)))
 ,cte2 as (
 select ROW_NUMBER() Over (partition by addressid order by id desc) rn ,value ,addressid
 from cte1)
 ,cte3 as (
 select a.*,b.value1 
 from cte2 a
 left join (select addressid,value value1 from cte2 where rn=3) b on a.addressid=b.addressid)
 select STRING_AGG(value,SPACE(1)) WITHIN GROUP (ORDER BY rn DESC) Result
 from cte3
 where value1 in ('city','Ville') and rn<=4
 group by addressid
 union 
 select STRING_AGG(value,SPACE(1)) WITHIN GROUP (ORDER BY rn DESC) Result
 from cte3
 where value1 not in ('city','Ville') and rn<4
 group by addressid

Output:

 Result
 ADDISON VA 24293
 BLOOMINGDALE IL 60108
 CHICAGO IL 60625
 EDISON NJ 34561
 MASON CITY IL 626641104
 PLAIN CITY OH 60053

Or you could refer another method using user-defined function.

Firstly create one function as below:

 CREATE FUNCTION dbo.GetSplitString
 (
    @List       VARCHAR(MAX),
    @Delimiter  VARCHAR(255),
    @ElementNumber int
 )
 RETURNS VARCHAR(4000)
 AS
 BEGIN
    DECLARE @result varchar(4000)    
    DECLARE @Items TABLE ( position int IDENTITY PRIMARY KEY,
                           Item VARCHAR(4000)
                          )  
    
    DECLARE @ll INT = LEN(@List) + 1, @ld INT = LEN(@Delimiter);  
    
    WITH a AS
    (
        SELECT
            [start] = 1,
            [end]   = COALESCE(NULLIF(CHARINDEX(@Delimiter, 
                        @List, @ld), 0), @ll),
            [value] = SUBSTRING(@List, 1, 
                      COALESCE(NULLIF(CHARINDEX(@Delimiter, 
                        @List, @ld), 0), @ll) - 1)
        UNION ALL
        SELECT
            [start] = CONVERT(INT, [end]) + @ld,
            [end]   = COALESCE(NULLIF(CHARINDEX(@Delimiter, 
                        @List, [end] + @ld), 0), @ll),
            [value] = SUBSTRING(@List, [end] + @ld, 
                      COALESCE(NULLIF(CHARINDEX(@Delimiter, 
                        @List, [end] + @ld), 0), @ll)-[end]-@ld)
        FROM a
        WHERE [end] < @ll
    )
    INSERT @Items SELECT [value]
    FROM a
    WHERE LEN([value]) > 0
    OPTION (MAXRECURSION 0);
    
    SELECT @result=Item
    FROM @Items
    WHERE position=@ElementNumber
       
    RETURN @result;
 END
 GO

Then call this function as below:

 SELECT case when REVERSE(dbo.GetSplitString(REVERSE(replace(FullAddress,' ',',')),',',3)) in ('city','Ville')
 then right(FullAddress,charindex(' ', REVERSE(FullAddress),charindex(' ', REVERSE(FullAddress),charindex(' ', REVERSE(FullAddress), (charindex(' ', REVERSE(FullAddress), 1))+1)+1)+1))
 else right(FullAddress,charindex(' ', REVERSE(FullAddress),charindex(' ', REVERSE(FullAddress), (charindex(' ', REVERSE(FullAddress), 1))+1)+1))
 end result
 from @Address

Best regards,
Melissa


If the answer is helpful, please click "Accept Answer" and upvote it.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Thank you Melissa, The options which you have given worked as expected. Sorry for the delay response. I appreciate your help.

Thank you,
RH

0 Votes 0 ·
pituach avatar image
0 Votes"
pituach answered pituach edited

Good day,

Your system appears to have a fundamental flaw, and now you try to deal with the result of that design instead of dealing with the real issue - your system design.

Using free text which include multiple values does not fit the relational model !
This design leads you to try to parse a string of text which can have any format and therefore any solution that you will find and will cover all your values today will not cover a different format which someone will insert the column.

You cannot count on any solution which based on parsing by format of the text, since your text can have endless options to format the data.

The only solution for parsing such free format text is using Artificial intelligence (AI) and even then it might not give you 100% correct solution for any input.

So, in your case you are using a single column to store address as free text which is the basic mistake.

Option 1 : best solution in most cases when using Tabular database like SQL Server


A much better design is to store the data in separate columns (as more detailed columns better) from the start. For example using the columns: building number, apartment number, street, town/city, state, zip code...

You should parse the information before you insert it to the database

If you are developing your own application then you can get free input of address from the user and then use Google map API or Bing map API in order to check the address and parse it.

Option 2:



Store the address in a single column using well formatted JSON value, after you parsed the data that the user wanted to insert using Google map API or Bing map

Option 3:



If must store the data in free text, then you should parse the data on-the-fly in your application.

Select the text as it is, and in your application side parse the text of the address using Google map API or Bing map API.

Option 4


If must store the data in free text and ONLY OF YOU MUST parse the data on-the-fly using in your query (and there should not be aby reason for this except bad design in most cases), then you should parse the data on-the-fly using using HttpWebRequest from your server to Google map API or Bing map API. This can be done using many ways in SQL Server on-premises including SQLCLR function, python or R solutions. Create function that send HttpWebRequest and returns the data as well formatted JSON.

Option 5 - bad solution but can work with the risk of inconsistent information and mistakes/errors


Make sure that the input (and the stored value) is well formatted in specific format using separator. Make sure using regular expression in your application side that the information fit the exact format(s) before you store it in the database.

Use simple T-SQL query to parse the data as you tried to do, but remember that the use might insert the data in different t format and you cannot cover the endless option of input which the user might use for the address!

THIS IS BAD IDEA

To provide a query that solve this case - text which specific format(s), please provide a list of exact formats of addressess which you might have and please provide the exact expected output for these formats


5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

MelissaMa-msft avatar image
0 Votes"
MelissaMa-msft answered

Hi @SQL9-3412,

Welcome to Microsoft Q&A!

According to your situation, you could consider to add one condition of value<>'city'. Then you could only choose the top 3 values.

Please refer below:

 declare @Output as varchar(100) = '100 NORTH MAIN MASON CITY IL 626641104'
 SELECT @output = STRING_AGG(cityname, SPACE(1)) FROM (SELECT top 3 ROW_NUMBER() Over (order by (select null)) rn, value FROM STRING_SPLIT(@Output, SPACE(1))
 where value<>'city'
 order by rn desc) AS cn (rn,cityname);
 select @output
 --626641104 IL MASON
    
 declare @Output1 as varchar(100) = '5500 North AVENUE, APT. 116 plain city OH 60053'
 SELECT @output1 = STRING_AGG(cityname, SPACE(1)) FROM (SELECT top 3 ROW_NUMBER() Over (order by (select null)) rn, value FROM STRING_SPLIT(@Output1, SPACE(1))
 where value<>'city'
 order by rn desc) AS cn (rn,cityname);
 select @output1
 --60053 OH plain

You could also refer below query:

 ;with cte as (
 select  ROW_NUMBER() Over (order by (select null)) addressid,FullAddress from @Address)
 ,cte1 as (
 SELECT ROW_NUMBER() Over (partition by addressid order by (select null) ) id, value,addressid FROM cte
 cross apply STRING_SPLIT(FullAddress, SPACE(1)) 
 where value<>'city')
 ,cte2 as (
 select ROW_NUMBER() Over (partition by addressid order by id desc) rn ,value ,addressid
 from cte1)
 select  STRING_AGG(value,SPACE(1)) AS Result
 from cte2
 where rn<4 
 group by addressid

Output:

 Result
 60053 OH plain
 60625 IL CHICAGO
 24293 VA Addison
 60108 IL BLOOMINGDALE
 626641104 IL MASON

Best regards,
Melissa


If the answer is helpful, please click "Accept Answer" and upvote it.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

TomPhillips-1744 avatar image
0 Votes"
TomPhillips-1744 answered

You will be extremely unhappy attempting to parse a full string address into its individual parts. You will be much better off using the USPS or 3rd party to parse them correctly and give you the exact parts.

I have used SmartyStreets for this purpose. They have an API to send the string and get the parts. https://www.smartystreets.com/

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

GuoxiongYuan-7218 avatar image
0 Votes"
GuoxiongYuan-7218 answered SQL9-3412 commented

Suppose the city name in the Address table can be found in the City table. You can try this:

 ;WITH CTE AS (
     SELECT a.FullAddress, CHARINDEX(c.CityName, a.FullAddress) AS CityStartPosition 
     FROM @Address AS a, @City AS c
     WHERE CHARINDEX(c.CityName, a.FullAddress) > 0
 )
    
 SELECT FullAddress, SUBSTRING(FullAddress, CityStartPosition, LEN(FullAddress) - CityStartPosition + 1) AS Result
 FROM CTE;
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Thank you GuoxiongYuan-7218

I tried the option which you have given and it worked. Sorry for the delay response. I appreciate your help on this and investing time to provide solution.

Thank you,
RH

0 Votes 0 ·