question

TCH-2430 asked; TimonYang-MSFT commented

Regex help for streetPattern in the C# port of AddressParser (Geo-StreetAddress-US-1.03)

All,

I need to allow symbols in the streetPattern of the AddressParser so that if a user enters "100 W @Main St", "120 W @Westin Jones St", or just "@Main", the @ is kept as part of the street instead of being stripped out. The street could be entered as "100 W Main St" or "100 N @Main St", etc. I am having trouble doing this without breaking the PREDIRECTIONAL portion. As the code stands, it extracts "Westin Jones" or "Main" as the street, but not "@Westin Jones" or "@Main". Here is the publicly available code:

           var streetPattern =
                 string.Format(
                     CultureInfo.InvariantCulture,
                     @"
                         (?:
                           # special case for addresses like 100 South Street
                           (?:(?<STREET>{0})\W+
                              (?<SUFFIX>{1})\b)
                           |
                           (?:(?<PREDIRECTIONAL>{0})\W+)?
                           (?:
                             (?<STREET>[^,]*\d)
                             (?:[^\w,]*(?<POSTDIRECTIONAL>{0})\b)
                            |
                             (?<STREET>[^,]+)
                             (?:[^\w,]+(?<POSTDIRECTIONAL>{0})\b)
                             (?:[^\w,]+(?<SUFFIX>{1})\b)
                            |
                             (?<STREET>[^,]+)
                             (?:[^\w,]+(?<SUFFIX>{1})\b)
                            |
                             (?<STREET>[^,]+)
                             (?:[^\w,]+(?<SUFFIX>{1})\b)
                             (?:[^\w,]+(?<POSTDIRECTIONAL>{0})\b)
                            |
                             (?<STREET>[^,]+?)
                             (?:[^\w,]+(?<POSTDIRECTIONAL>{0})\b)?
                             (?:[^\w,]+(?<SUFFIX>{1})\b)?
                            |
                             (?<STREET>[^,]+)
                             (?:[^\w,]+(?<SUFFIX>{1})\b)
                             (?:[^\w,]+(?<POSTDIRECTIONAL>{0})\b)?
                            |
                             (?<STREET>[^,]+?)
                             (?:[^\w,]+(?<SUFFIX>{1})\b)?
                             (?:[^\w,]+(?<POSTDIRECTIONAL>{0})\b)?
                           )
                         )
                     ",
                     directionalPattern,
                     suffixPattern);

Sample inputs: "100 N @Main St", "N Main St", "100 Willard Dairy Rd", "100 @Willard Dairy Rd", etc.

            var suffixPattern = new Regex(
                 string.Join(
                     "|",
                     new[] {
                         string.Join("|", suffixes.Keys),
                         string.Join("|", suffixes.Values.Distinct())
                     }),
                 RegexOptions.Compiled);
    
            var directionalPattern =
                 string.Join(
                     "|",
                     new[] {
                         string.Join("|", directionals.Keys),
                         string.Join("|", directionals.Values),
                         string.Join("|", directionals.Values.Select(x => Regex.Replace(x, @"(\w)", @"$1\.")))
                     });

The suffix dictionary is too large to post, so here are the directionals used for PREDIRECTIONAL and POSTDIRECTIONAL:

        public static Dictionary<string, string> directionals =
             new Dictionary<string, string>()
             {
                 { "NORTH", "N" },
                 { "NORTHEAST", "NE" },
                 { "EAST", "E" },
                 { "SOUTHEAST", "SE" },
                 { "SOUTH", "S" },
                 { "SOUTHWEST", "SW" },
                 { "WEST", "W" },
                 { "NORTHWEST", "NW" }
             };
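To make clear which values end up in {0} and {1} of streetPattern, here is a small sketch that rebuilds directionalPattern and suffixPattern the same way as the snippets above. The `Suffixes` dictionary is a hypothetical two-entry subset (the real one was not posted), and the class and method names are illustrative only. Note that suffixPattern is a `Regex`, so `string.Format` uses its `ToString()`, which returns the pattern text.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternBuildingDemo
{
    // Directionals exactly as posted in the question.
    public static readonly Dictionary<string, string> Directionals = new Dictionary<string, string>
    {
        { "NORTH", "N" }, { "NORTHEAST", "NE" }, { "EAST", "E" }, { "SOUTHEAST", "SE" },
        { "SOUTH", "S" }, { "SOUTHWEST", "SW" }, { "WEST", "W" }, { "NORTHWEST", "NW" }
    };

    // Hypothetical subset; the real suffix dictionary is much larger.
    public static readonly Dictionary<string, string> Suffixes = new Dictionary<string, string>
    {
        { "STREET", "ST" }, { "ROAD", "RD" }
    };

    // Same construction as directionalPattern: full names, abbreviations,
    // and dotted abbreviations (e.g. "NE" becomes N\.E\.).
    public static string DirectionalPattern() =>
        string.Join("|", new[]
        {
            string.Join("|", Directionals.Keys),
            string.Join("|", Directionals.Values),
            string.Join("|", Directionals.Values.Select(x => Regex.Replace(x, @"(\w)", @"$1\.")))
        });

    // Same construction as suffixPattern: keys plus distinct abbreviations.
    public static readonly Regex SuffixPattern = new Regex(
        string.Join("|", new[]
        {
            string.Join("|", Suffixes.Keys),
            string.Join("|", Suffixes.Values.Distinct())
        }));

    // string.Format calls SuffixPattern.ToString(), i.e. the raw pattern text.
    public static string Formatted() =>
        string.Format(CultureInfo.InvariantCulture, "DIR=({0}) SUF=({1})",
            DirectionalPattern(), SuffixPattern);
}
```

Neither pattern contains "@", and "@" is a non-word character, so anything in streetPattern written as `\W` can match it.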

The resulting Match is used to populate properties on an object.

Expected results:

Street = "@Main" or "Main" or "Willard Dairy" or "@Willard Dairy" per the input
PreDirectional = "N" or "" per the input
Suffix = "ST" or "RD" or "" per the input

Actual results:

Street = "Main" or "Willard Dairy" or "Willard Dairy" per the input (@ is stripped out)
PreDirectional = "N" or "" per the input
Suffix = "ST" or "RD" or "" per the input


 The actual full pattern used is:
    
               var addressPattern = string.Format(
                     CultureInfo.InvariantCulture,
                     @"
                         ^
                         # Special case for APO/FPO/DPO addresses
                         (
                             [^\w\#]*
                             (?<STREETLINE>.+?)
                             (?<CITY>[AFD]PO)\W+
                             (?<STATE>A[AEP])\W+
                             (?<ZIP>{4})
                             \W*
                         )
                         |
                         # Special case for PO boxes
                         (
                             \W*
                             (?<STREETLINE>(P[\.\ ]?O[\.\ ]?\ )?BOX\ [0-9]+)\W+
                             {3}
                             \W*
                         )
                         |
                         (
                             [^\w\#]*    # skip non-word chars except # (eg unit)
                             (  {0} )\W*
                                {1}\W+
                             (?:{2}\W+)?
                                {3}
                             \W*         # require on non-word chars at end
                         )
                         $           # right up to end of string
                     ",
                     numberPattern,
                     streetPattern,
                     allSecondaryUnitPattern,
                     placePattern,
                     zipPattern);
        
                 addressRegex = new Regex(
                     addressPattern,
                     RegexOptions.Compiled |
                     RegexOptions.Singleline |
                     RegexOptions.IgnorePatternWhitespace);
        
     Called as:
        
                    var match = addressRegex.Match(input.ToUpperInvariant());
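A minimal reproduction suggests where the @ goes. The sketch below uses a simplified, hypothetical version of the number + predirectional + street + suffix portion (only N/S/E/W and ST/RD). Because @ is a non-word character, the `\W*` after the number and the `\W+` after PREDIRECTIONAL absorb it before the STREET group starts. Swapping those separators for `[^\w@]` (non-word characters other than @) leaves the @ for STREET. This is untested against the full AddressParser pattern; every separator that can sit directly in front of the street would likely need the same change.

```csharp
using System.Text.RegularExpressions;

public static class AtSymbolRepro
{
    // Simplified stand-in for the original: \W separators swallow '@'.
    public static readonly Regex Original = new Regex(
        @"^(?<NUMBER>\d+)\W*(?:(?<PRE>N|S|E|W)\W+)?(?<STREET>[^,]+?)\W+(?<SUFFIX>ST|RD)\W*$");

    // Variant: separators before the street exclude '@', so only the
    // STREET group can consume it.
    public static readonly Regex KeepAt = new Regex(
        @"^(?<NUMBER>\d+)[^\w@]*(?:(?<PRE>N|S|E|W)[^\w@]+)?(?<STREET>[^,]+?)\W+(?<SUFFIX>ST|RD)\W*$");

    // Upper-cases the input (as the question does) and returns STREET, or null.
    public static string Street(Regex pattern, string input)
    {
        Match m = pattern.Match(input.ToUpperInvariant());
        return m.Success ? m.Groups["STREET"].Value : null;
    }
}
```

With `Original`, "100 N @Main St" yields a street of "MAIN"; with `KeepAt` it yields "@MAIN", and "100 @Willard Dairy Rd" yields "@WILLARD DAIRY", while "100 W Main St" still parses W as the predirectional.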

Hope this is enough information.


dotnet-csharp


Which values to assign to directionalPattern and suffixPattern and which text to test in order to reproduce the problem? If possible, show a complete code fragment and the expected results.


Here is the AddressParser before any changes I have added locally:

https://www.nuget.org/packages/AddressParser/1.0.3


I accidentally posted the additional information incorrectly, so I have deleted it and updated the original question accordingly.


@TCH-2430

I need to include symbols in the streetpattern of the Address Parser

If the problem is caused by this package, it might be better to raise it with the author in the package's GitHub repository.


1 Answer

TCH-2430 answered; TimonYang-MSFT commented

I handled it by pulling the symbol out before parsing and then putting it back at the beginning of the street after validation, rather than modifying the patterns.
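The workaround can be sketched roughly as follows. The parser here is a hypothetical, simplified regex stand-in (not the AddressParser API), and `StripAndRestore`/`ParseStreet` are illustrative names; the point is only the order of operations: remove the @, parse with the unmodified pattern, then prepend @ to the parsed street.

```csharp
using System.Text.RegularExpressions;

public static class StripAndRestore
{
    // Hypothetical stand-in for the unmodified parser; the number is optional
    // so that inputs like "N Main St" also match.
    private static readonly Regex SimplifiedAddress = new Regex(
        @"^(?<NUMBER>\d+)?\W*(?:(?<PRE>N|S|E|W)\W+)?(?<STREET>[^,]+?)\W+(?<SUFFIX>ST|RD)\W*$");

    // Returns the street with a leading '@' restored, or null if no match.
    public static string ParseStreet(string input)
    {
        string upper = input.ToUpperInvariant();

        // 1. Pull the symbol out so the pattern sees an ordinary address.
        bool hadSymbol = upper.Contains("@");
        string cleaned = upper.Replace("@", string.Empty);

        Match m = SimplifiedAddress.Match(cleaned);
        if (!m.Success)
        {
            return null;
        }

        // 2. After parsing, put the symbol back at the beginning of the street.
        string street = m.Groups["STREET"].Value;
        return hadSymbol ? "@" + street : street;
    }
}
```

For example, `ParseStreet("100 N @Main St")` returns "@MAIN" and `ParseStreet("100 Willard Dairy Rd")` returns "WILLARD DAIRY". This sketch assumes an @ anywhere in the input belongs to the street; if the symbol could also appear in a unit or city, its position would need to be tracked instead.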


I am glad to know you have solved this issue. You can accept your own answer to end this thread.
Have a wonderful day.
