I thought readers might like a peek at how I spent three hours of my life earlier in the week.The original task was to transfer a set of customer account numbers from a SQL SERVER box to a Teradata (a data warehouse) box in order to find all customer activities associated with those accounts before a certain cutoff date.
Within minutes, it became clear that the real task was getting Teradata to recognize a column of dates as dates. The column looks like this: 07/20/2010, 07/25/2010, ...
What's the problem? You say, of course, anyone can see these are dates. Well, Teradata disagrees - and until and unless Teradata is fully convinced I have offered it a column of dates, I couldn't proceed with my original task, which was to compare these dates to the cutoff date.
Teradata thought those were strings of characters, not dates. I tried the simplest solution first: cast(datecolumn as date). Teradata wouldn't budge, complaining that my datecolumn were "invalid dates".
From the manual, I confirmed that a valid way to use the cast function is cast('2010-07-20' as date). So, I needed a way to convert 07/20/2010 into '2010-07-20'.
I fumbled around a bit as I learned that Teradata does not support many classes of solutions I'm familar with, like regular expressions, an MDY type function (which produces a date given month, day and year inputs), a find-and-substitute function, etc. So I stopped being cute, and reluctantly did it the brute-force way, using substrings and concatenates.
First, I tested cast('2010-07-20' as date) and it created a Teradata date 07/20/2010. Yes, that looked exactly like my input data but human eyes deceive, if the database proclaimed it not to be a date, then it was not a date.
Next, I substituted '2010-07-20' with that brute-force substring-concatenate expression. To my surprise, it failed, complaining again of invalid dates. I fished out some samples of these dates. On visual inspection, they looked like dates. Smelled like dates.
Undeterred, I took out the cast-as-date function and applied the substring-concatenate expression to the full column of dates, and it ran without a hitch. Then, I put the cast-as-date function back in, which instantaneously bombed.
Now that the dumb way stumbled, I went back to being cute. Maybe I could trick Teradata by splitting into two steps, first creating a new data table with the substring-concatenate output, which seemed to have worked before, and then running the cast-as-date function.
Maybe not. As soon as I put the substring-concatenate expression together with two lines of code that generate data tables, it choked. The mystery deepened. The same code when used without the data-table generation code, succeeded in producing Teradata dates, and yet as soon as I wrapped it inside the data-table code, it stalled. The error officially had to do wtih a missing something between the date variable and the comma sign inside the substring function. It was a so-called "syntax error". Very troubling because the same code ran smoothly when the output was ported to a pop-up window but when the output was to be stored in a data table, the server apparently wanted a different syntax! In any case, I couldn't figure out what Teradata was grumbling about.
Teradata and I were not friends at the moment. What to do? Like a spurned lover, I sought out my other good friend, SQL SERVER. What if I converted the column of dates to dates first before the data got transferred into Teradata?
So I did that. After a laborious procedure, the data got moved into Teradata. Alas, the dates still showed up as strings of text. Doubled back to SQL SERVER: there, the dates were dates. This meant the program used to transfer data between the two platforms would not recognize those as dates.
My colleague suggested a Hail Mary. (Yes, now two "data scientists" were working on this glamorous problem.) Makes the dates "datetime". Datetime looks like this: 07/20/2010 00:00:00. The time component will be all zeroes since my data contains no information about time. It was a Hail Mary because we had really no rationale why the database would read datetime but not date. You just do random things when you run out of rational ideas.
It worked. It worked. It worked.
SQL Server converted the column of dates to datetime format. Teradata not only reads this correctly but reads this three times, once as a datetime, once as a date and once as a time.
And I skipped over the exasperating data transfer procedure. Importing the data into Teradata requires a special utility. Half the time, the utility will not launch properly, and when that happens, I issue a command-line instruction to reset the utility from the operating system. On this particular day, after the utility opened, it took five tries to establish a network connection. One setting must be switched from default before running the utility. Switching that setting always snips the network connection. So, it was another five tries to restore the connection. And only then could the data transfer get off the ground.
Three hours later, it worked. The "it" has morphed from finding customer activites to getting a database to see '07/20/2010' as a date not text.
The SQL SERVER box has closed up shop for the night. I still haven't found a single customer transaction. The project has a long way to go.
Every project one runs into situations like this. It's not an outlier. Welcome to the world of data science.