question

MarkWingfield-5406 avatar image
0 Votes"
MarkWingfield-5406 asked MarkWingfield-5406 answered

Splitting age text using regex

I work with historical burial records and want to work with Ages in these documents for calculating rough birth dates. This will be within vba code.

Sometimes, these are not simply an integer but are strings showing years, months, weeks and/or days.

An example using all of these, with any spaces removed, is below.

11years3months2weeks3days

The years/months/weeks/days texts could be a list of abbreviations such as mths/m, wks/w etc.

If, as in the example, more than one of these time periods appears, they will be in that order (largest to smallest time period) and the numeric values should make sense ie up to 120 at most for years, 1-23 for months(children can have ages such as 18 months), probably up to 51 weeks, and probably up to 365 days.

However, these are all optional so there could be exams such as below:

1mth3dys
2years6months
3wks5days

I am sure regex can check these formats. I have had some ideas but am not experienced in regex.

Can a regex expression be used to, not just identify tye format but to get the numeric values for any years/months/weeks/days in the text? If so, what code could be used in a vba function to do this?

Cheers

office-vba-dev
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Viorel-1 avatar image
0 Votes"
Viorel-1 answered

If you are interested, here is another approach that uses a single expression:

 Dim example As String
 example = "11years3months2weeks4days"
    
 Dim re As New RegExp
 re.Pattern = "^\s*((\d+)\s*(year|years|yrs|y))?\s*((\d+)\s*(month|months|mths|m))?\s*((\d+)\s*(week|weeks|w))?\s*((\d+)\s*(day|days|d))?\s*$"
 re.IgnoreCase = True
    
 Dim mc As MatchCollection
 Set mc = re.Execute(example)
    
 If mc.Count = 0 Then
    
     MsgBox "Input not recognised"
        
 Else
     Dim f As Match
     Set f = mc(0)
        
     Dim y As Integer
     Dim m As Integer
     Dim w As Integer
     Dim d As Integer
        
     y = CInt(Val(f.SubMatches(1)))
     m = CInt(Val(f.SubMatches(4)))
     w = CInt(Val(f.SubMatches(7)))
     d = CInt(Val(f.SubMatches(10)))
        
     MsgBox "Years: " & y & ", month: " & m & ", weeks: " & w & ", days: " & d
        
 End If

Adjust the pattern if you find more forms of inputs.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

TvanStiphout avatar image
0 Votes"
TvanStiphout answered

I doubt it. Regex is about REGULAR expressions, and these seem very unregular (e.g. months/mth). You may be able to shoe-horn it in, but your expression would be ginormous and impossible to maintain. Hence: bad code.
I would go for simpler string parsing, perhaps aided by a table of translations: (e.g. month > m, mth > m)

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Viorel-1 avatar image
0 Votes"
Viorel-1 answered Viorel-1 edited

For example, to extract the years, try a code like this:

 Dim example As String
 example = "2years6months"
    
 Dim re As New RegExp
 re.Pattern = "(\d+)(years|year|yrs)"
 re.IgnoreCase = True
    
 Dim ms As MatchCollection
 Set ms = re.Execute(example)
    
 Dim n As Integer
 n = 0
    
 If ms.Count <> 0 Then
    
     Dim m As Match
     Set m = ms(0)
        
     n = CInt(m.SubMatches(0))
        
     MsgBox "Number: " & n
        
 Else
    
     MsgBox "Not found"
    
 End If

It also includes some MsgBox for debugging. To use RegEx in VBA, add a reference to “Microsoft VBScript Regular Expression 5.5” from Tools menu.

Adjust the pattern to extract other components. To avoid code repetition, you can create a common function, then “years|year|yrs” will be sent as a parameter.

Theoretically it is possible to write a single complex regular expression to extract all of the numbers, but the above approach is probably simpler.

However, if you also want to detect the bad inputs, you can write a single regular expression that includes all of the possible components.

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

MarkWingfield-5406 avatar image
0 Votes"
MarkWingfield-5406 answered

Thanks Viorel-1.

That second script is perfect. I now have some new knowledge of SubMatches which I did not know before.

Cheers

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.