Tuesday, September 23, 2025

Digging Into the Data

I spend my days working with data. Sometimes I do analysis and sometimes I am the guy the analysts go to when they need help solving a problem. While on vacation last week I had one of my coworkers ask me about some of the data in our system. I explained that I didn't really want to dive into the problem until I got back to work but would thoroughly investigate her question.

Yesterday I had way too many pressing tasks and so I couldn't look into the issue until today. This is one of those types of problems I love to solve and so I attacked it first thing this morning. I am going to simplify the problem for multiple reasons, not the least of which relates to my confidentiality agreement. The problem description is that I have a database with a CUSTOMER table without a date indicating when the customer entered the database. It is always nice to know how long someone has been a customer and the analyst wanted to know when that happened.

I figured there might be another table that might have that information and so I dug around a bit and found an EVENT table. I first looked at all possible event types and discovered about a half dozen. Unfortunately none of them indicated when someone became a customer. To help make sense of the EVENT table, I looked up all possible events for a specific customer and then sorted them by the event date column. A trend quickly appeared and it became obvious that all users would have a specific event once they became a customer. That event would occur frequently but all one needs to do is look at the earliest one and you have a rough approximation of when that person became a customer.

I sent my findings to the analyst that asked the question. About an hour later I received word that my technique works for only about a third of the customers. That is fine for some preliminary analysis but if we want to have beginning dates for each customer, we will have to go back to the group sending us the data and have them add an appropriate column.

I had hoped to solve the problem for all of the customers but fell short. I did find a preliminary workaround and that will allow us to test some ideas we have, which will then justify asking for the data from the source which will take a few days. Until then, the workaround will have to suffice. 

No comments:

Post a Comment