Thursday, July 3, 2008

Sql datetime validation regex

Yesterday I wanted to create a regex that would validate a datetime to be inserted in a SQL datetime column.

The format I wanted is one that SQL always knows to interpret and it looks like:
yyyy-MM-dd hh:mm:ss.000
2008-07-17 23:59:59.015

Everything started easy, I looked on some sites but all the validation was made pretty easy and lots of exceptions could have slipped.

What I wanted was a regular expression(regex) that would validate: year from 1800+ since SQL mindate is 1753, max day of the month to be either 30 or 31 and for february to be 28 or 29 for bisect years.

A bisect year, called also LEAP year, bissextile year, etc is one that happens at each 4 years when we add 1 extra day to february to compensate the loss of 6 hours not taken into consideration at each sun cycle (one year). Shortly said, at each 4 years, february has 29 days instead of 28 days. Exception happens on years that ends in 00 where they should divide with 400.

First part was easy and then it complicated. The final regex looks like this:

^(1[8-9][2-9][0-9])((00040812162024283236404448525660646872768084889296)(\d\d))[-]((02)(01030507081012)(04060911))[-](?(6)((?(3)(0[1-9]1[0-9]2[0-9])(0[1-9]1[0-9]2[0-8])))((?(7)(0[1-9]1[0-9]2[0-9]3031)(0[1-9]1[0-9]2[0-9]30))))[ ](0[0-9]1[0-9]2[0-3]):([0-5][0-9]):([0-5][0-9])\.\d\d\d$

Oh yeah, that long. Unfortunately I had no idea how to define division by 4 of a number in regex so I had to put all bisect years.

Let me explain it a little:
^(1[8-9][2-9][0-9])((00040812162024283236404448525660646872768084889296)(\d\d))
[-]
((02)(01030507081012)(04060911))
[-]
(?(6)((?(3)(0[1-9]1[0-9]2[0-9])(0[1-9]1[0-9]2[0-8])))((?(7)(0[1-9]1[0-9]2[0-9]3031)(0[1-9]1[0-9]2[0-9]30))))
[ ]
(0[0-9]1[0-9]2[0-3])
:
([0-5][0-9])
:
([0-5][0-9])
\.
\d\d\d$

First line allows strings that starts with 18,19 OR 20 to 99.
Second line divides the under-a-century year into 2 groups, bisect years and the rest. The groups are counting for backreference of regex, and the important one is the bisect year that holds the group with index 3.
Fourth line defines months, splitted into 3 regex backreference groups as in february(group 6) months with 31 days(group 7) and the rest with 30 days(group8)
Sixth line is the killer. It uses double regex conditional clauses and does something like:
IF february (group6)
THEN
IF bisect (group 3)
THEN max 29 days
ELSE max 28 days
ELSE
IF 31 days month (group 7)
THEN max 31
ELSE max 30

The rest of them are pretty simple allowing max 59 value per minute/second or 24 per hour.

I've made a small program to make sure the regex is correct where:
- i defined the 1800-01-01 00:00:00.000
- increased that date with 1 day and validated it until it didn't validated it at year 9999+.
- increased that date with 1 second until an year passed.

All went smooth and tested it too with values as a bisect/nonbisect year with february 30/29/28 etc and it validated right, discarding the bad values.

I know it's pretty big, and might consume some CPU but it's better to have a valid clean date rather than an exception at insert or at some script run-time.

I haven't inserted the rule for 00 leap years, so 29 feb 2100 won't validate right (acceptable for the sake of shorter computation during the next 100 years). :)

Enjoy and please comment with suggestions.

3 comments:

Anonymous said...

Nice, thanks

Rashid

Anonymous said...

how to validate a sql standard date time ex:-25-May-2008 04:17 PM
if you know how to validate it then send me answer to anirudhakumar.gupta@gmail.com

Cosmin S said...

Just change the order of the groups and it should work. Of course if you need AM/PM validation that's another small addition but http://www.regular-expressions.info website should help you a lot performing that task.

Sorry, I don't have the time to compile all the possible date time combinations :)