All,
I need the streetPattern of the Address Parser to keep symbols in the street name. If a user enters "100 W @Main St", "120 W @Westin Jones St", or just "@Main", the @ should be included in the STREET portion and not stripped out. The street may be entered with or without the symbol, e.g. "100 W Main St" or "100 N @Main St". I am having trouble doing this without breaking the PREDIRECTIONAL portion. As the code stands, it pulls "Westin Jones" or "Main" as the street, but not "@Westin Jones" or "@Main". Here is the publicly available code:
var streetPattern =
    string.Format(
        CultureInfo.InvariantCulture,
        @"
            (?:
              # special case for addresses like 100 South Street
              (?:(?<STREET>{0})\W+
                 (?<SUFFIX>{1})\b)
              |
              (?:(?<PREDIRECTIONAL>{0})\W+)?
              (?:
                (?<STREET>[^,]*\d)
                (?:[^\w,]*(?<POSTDIRECTIONAL>{0})\b)
               |
                (?<STREET>[^,]+)
                (?:[^\w,]+(?<POSTDIRECTIONAL>{0})\b)
                (?:[^\w,]+(?<SUFFIX>{1})\b)
               |
                (?<STREET>[^,]+)
                (?:[^\w,]+(?<SUFFIX>{1})\b)
               |
                (?<STREET>[^,]+)
                (?:[^\w,]+(?<SUFFIX>{1})\b)
                (?:[^\w,]+(?<POSTDIRECTIONAL>{0})\b)
               |
                (?<STREET>[^,]+?)
                (?:[^\w,]+(?<POSTDIRECTIONAL>{0})\b)?
                (?:[^\w,]+(?<SUFFIX>{1})\b)?
               |
                (?<STREET>[^,]+)
                (?:[^\w,]+(?<SUFFIX>{1})\b)
                (?:[^\w,]+(?<POSTDIRECTIONAL>{0})\b)?
               |
                (?<STREET>[^,]+?)
                (?:[^\w,]+(?<SUFFIX>{1})\b)?
                (?:[^\w,]+(?<POSTDIRECTIONAL>{0})\b)?
              )
            )
        ",
        directionalPattern,
        suffixPattern);
Example inputs: "100 N @Main St", "N Main St", "100 Willard Dairy Rd", "100 @Willard Dairy Rd", etc.
The suffixPattern and directionalPattern passed to string.Format are built as follows:
var suffixPattern = new Regex(
    string.Join(
        "|",
        new[] {
            string.Join("|", suffixes.Keys),
            string.Join("|", suffixes.Values.Distinct())
        }),
    RegexOptions.Compiled);
var directionalPattern =
    string.Join(
        "|",
        new[] {
            string.Join("|", directionals.Keys),
            string.Join("|", directionals.Values),
            string.Join("|", directionals.Values.Select(x => Regex.Replace(x, @"(\w)", @"$1\.")))
        });
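To make the directional alternation concrete, here is a small sketch of what the directionalPattern construction produces. It assumes a cut-down two-entry dictionary instead of the full one, and the variable name dotted is only for illustration. The third join adds a dotted form of each abbreviation (e.g. N\.E\.) so that input like "N.E." is also recognized.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Cut-down dictionary for illustration only (assumption: the real one has eight entries).
var directionals = new Dictionary<string, string>
{
    { "NORTH", "N" },
    { "NORTHEAST", "NE" }
};

// Append an escaped dot after every word character: "NE" becomes "N\.E\."
// In a .NET replacement string the backslash is literal, so "\." is emitted as-is.
var dotted = directionals.Values
    .Select(x => Regex.Replace(x, @"(\w)", @"$1\."))
    .ToList();

// Same three-part join as in the parser: full names, abbreviations, dotted abbreviations.
var directionalPattern =
    string.Join(
        "|",
        new[] {
            string.Join("|", directionals.Keys),
            string.Join("|", directionals.Values),
            string.Join("|", dotted)
        });

// Prints a single alternation containing NORTH, NORTHEAST, N, NE, N\. and N\.E\.
Console.WriteLine(directionalPattern);
```

None of these alternatives contains the @ character, so the directional group itself never captures it.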
The suffix dictionary is too large to post, so here are the directionals used for PREDIRECTIONAL and POSTDIRECTIONAL:
public static Dictionary<string, string> directionals =
    new Dictionary<string, string>()
    {
        { "NORTH", "N" },
        { "NORTHEAST", "NE" },
        { "EAST", "E" },
        { "SOUTHEAST", "SE" },
        { "SOUTH", "S" },
        { "SOUTHWEST", "SW" },
        { "WEST", "W" },
        { "NORTHWEST", "NW" }
    };
The resulting regex Match is used to populate properties on an object.
Expected results:
Street = "@Main" or "Main" or "Willard Dairy" or "@Willard Dairy" per the input
PreDirectional = "N" or "" per the input
Suffix = "ST" or "RD" or "" per the input
Actual results:
Street = "Main" or "Willard Dairy" per the input (the @ is stripped out)
PreDirectional = "N" or "" per the input
Suffix = "ST" or "RD" or "" per the input
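To illustrate the behavior, here is a reduced sketch that reproduces the same results. The pattern is a simplified stand-in and not the parser's real pattern: the directionals and suffixes are cut down to a few hard-coded alternatives, and the names reducedPattern, reducedRegex, withDirectional and withoutDirectional are hypothetical. It keeps the same kind of \W separators around the number and the predirectional.

```csharp
using System;
using System.Text.RegularExpressions;

// Simplified stand-in for the address pattern (assumption: only a handful of
// directionals and suffixes, one STREET alternative instead of seven).
var reducedPattern = @"
    ^
    (?<NUMBER>\d+)\W*                      # separator after the number
    (?:(?<PREDIRECTIONAL>N|S|E|W)\W+)?     # separator after the predirectional
    (?<STREET>[^,]+?)
    (?:[^\w,]+(?<SUFFIX>ST|RD)\b)?
    $";

var reducedRegex = new Regex(
    reducedPattern,
    RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);

// Same call style as the parser: the input is upper-cased before matching.
var withDirectional = reducedRegex.Match("100 N @Main St".ToUpperInvariant());
var withoutDirectional = reducedRegex.Match("100 @Willard Dairy Rd".ToUpperInvariant());

Console.WriteLine(withDirectional.Groups["PREDIRECTIONAL"].Value); // N
Console.WriteLine(withDirectional.Groups["STREET"].Value);         // MAIN
Console.WriteLine(withoutDirectional.Groups["STREET"].Value);      // WILLARD DAIRY
Console.WriteLine(withoutDirectional.Groups["SUFFIX"].Value);      // RD
```

In this reduced version the @ never reaches the STREET group because \W matches it along with the spaces: the \W+ after the predirectional consumes " @" in the first input, and the \W* after the number consumes " @" in the second.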
The actual full pattern used is:
var addressPattern = string.Format(
    CultureInfo.InvariantCulture,
    @"
        ^
        # Special case for APO/FPO/DPO addresses
        (
            [^\w\#]*
            (?<STREETLINE>.+?)
            (?<CITY>[AFD]PO)\W+
            (?<STATE>A[AEP])\W+
            (?<ZIP>{4})
            \W*
        )
        |
        # Special case for PO boxes
        (
            \W*
            (?<STREETLINE>(P[\.\ ]?O[\.\ ]?\ )?BOX\ [0-9]+)\W+
            {3}
            \W*
        )
        |
        (
            [^\w\#]* # skip non-word chars except # (eg unit)
            ( {0} )\W*
            {1}\W+
            (?:{2}\W+)?
            {3}
            \W* # allow only non-word chars at end
        )
        $ # right up to end of string
    ",
    numberPattern,
    streetPattern,
    allSecondaryUnitPattern,
    placePattern,
    zipPattern);
addressRegex = new Regex(
    addressPattern,
    RegexOptions.Compiled |
    RegexOptions.Singleline |
    RegexOptions.IgnorePatternWhitespace);
Called as:
var match = addressRegex.Match(input.ToUpperInvariant());
Hope this is enough information.