Showing posts with label Analytics. Show all posts
Showing posts with label Analytics. Show all posts

Tuesday, September 23, 2025

Digging Into the Data

I spend my days working with data. Sometimes I do analysis and sometimes I am the guy the analysts go to when they need help solving a problem. While on vacation last week I had one of my coworkers ask me about some of the data in our system. I explained that I didn't really want to dive into the problem until I got back to work but would thoroughly investigate her question.

Yesterday I had way too many pressing tasks and so I couldn't look into the issue until today. This is one of those types of problems I love to solve and so I attacked it first thing this morning. I am going to simplify the problem for multiple reasons, not the least of which relates to my confidentiality agreement. The problem description is that I have a database with a CUSTOMER table without a date indicating when the customer entered the database. It is always nice to know how long someone has been a customer and the analyst wanted to know when that happened.

I figured there might be another table that might have that information and so I dug around a bit and found an EVENT table. I first looked at all possible event types and discovered about a half dozen. Unfortunately none of them indicated when someone became a customer. To help make sense of the EVENT table, I looked up all possible events for a specific customer and then sorted them by the event date column. A trend quickly appeared and it became obvious that all users would have a specific event once they became a customer. That event would occur frequently but all one needs to do is look at the earliest one and you have a rough approximation of when that person became a customer.

I sent my findings to the analyst that asked the question. About an hour later I received word that my technique works for only about a third of the customers. That is fine for some preliminary analysis but if we want to have beginning dates for each customer, we will have to go back to the group sending us the data and have them add an appropriate column.

I had hoped to solve the problem for all of the customers but fell short. I did find a preliminary workaround and that will allow us to test some ideas we have, which will then justify asking for the data from the source which will take a few days. Until then, the workaround will have to suffice. 

Wednesday, December 7, 2022

A Neat Trick in Excel: VLOOKUP

Lately I have been doing a lot of data analysis. One of the tools I have been using is Microsoft Excel and I recently learned a new trick using VLOOKUP. I thought I would put the information here in case I ever need it again. Oh and it may help you as well.

Most of the time I work with data, I use a relational database. That allows me to join various tables to replace cryptic values with more descriptive ones. A classic example is that of an EMPLOYEE table that uses a department number instead of the name. The database will then have a lookup table that converts the number into the name. This saves a certain amount of space and makes for much more efficient queries.

You can do the same thing in Excel using the VLOOKUP function. In my spreadsheet I have 2 tabs with the first being the employee information and the second being the department. Let's assume the employee tab has the following information:


A

B

C

1

Firstname

Lastname

Deptno

2

Matthew

Bennett

10

3

Mike

Smith

20

4

Sara

King

30

 Now let's assume that those department numbers translate to something much more readable like:


A

B

1

Deptno

Name

2

10

Research

3

20

Operations

4

30

Finance

You can use the VLOOKUP function in the 4th column of the table to include values from the department tab. It has the following syntax:

VLOOKUP(LookupCell, TabName!StartCell:EndCell, PasteColumn, Match)

So the LookupCell is that of the number that we want to convert to descriptive text. In the example above, we would use the column C values. 

The next values to fill in are that of the lookup table. While we could have them on the same page of the workbook, I find it easier to list them on a different one and call it the "department" tab. There are only 3 values and so we would start in column A on the second row. We would then run to B4.

The PasteColumn corresponds to the column number on the lookup tab that contains the value we want to show. Our descriptive text is in the 2nd column and so we would use the value "2".

The Match parameter indicates if we want to use an approximate or exact match. This example uses an exact match and so it should be set to "FALSE". I haven't tried doing an approximate match and so I am not sure of a use case for setting it to TRUE. Perhaps I can play with it for another blog entry.

Now let's put it all together. In cell D2, I would use the following value:

=VLOOKUP(C2, department!$A$2:$B$4, 2, FALSE)

I would then copy that formula and paste it into cells D3 and D4.

If those dollar signs confuse you, that is to hold the values constant regardless of what cell you paste the formula into. Otherwise Excel will adjust the reference. In the above example, the C2 reference doesn't have the dollar sign and will be converted into C3 when you paste it in the C3 cell.

This really helped me with some analysis I did and so I hope it helps you too.

Friday, November 11, 2022

Causal Analysis: The Buoy Pop

I spend a lot of my time looking at data. Sometimes I get the chance to look at trends and perform causal analysis. What is causal analysis? It is looking at data and figuring out when one event causes another. A good example is that is the old-time belief that the rooster crowing in the morning causes the sun to rise. Don't laugh, that was an actual belief. We all know that it is the rising sun that causes the rooster to crow and not the other way around.

A more complex example of causal analysis is in correlating wine drinking with good health. There was a study done that indicated people that drink a glass of wine a day are more healthy than those that don't. On the surface it would seem that everyone should drink a glass of wine every day. Unfortunately life isn't that simple. There are a number of other factors that need to be taken into account as well. The reality is that if you can afford to drink wine, you can also afford proper healthcare. It is not the glass of wine that makes someone healthy, it is affluence and the ability to get proper medical attention when needed. They never mentioned that in the study.

Yesterday I came across a very interesting article about the "Buoy Pop." There is a scientific buoy in the middle of the Pacific Ocean that can predict a snow storm in winter for Utah 2 weeks before it happens. This is important if you are a skier and want to know when the next powder storm is on its way. Meteorologists discount the buoy because the best they can do is predict storms 10 days out with only 20% accuracy. How can wave height indicate a storm 14 days away? This is where data science comes in and says there is some other phenomenon at work that meteorologists have not yet identified. Before continuing, the "Buoy Pop" is only 66% accurate, not 100%. Still that is significantly better than what professional weather forecasters can do.

In order to do proper causal analysis, I would need to gather all possible variables and apply a number of statistical algorithms to see what the actual cause of the snow storm is. All I know for now is that something that causes the storms to arrive also sends a nice wave under a specific buoy in the middle of the Pacific Ocean.

Ultimately the "Buoy Pop" is a nice statistical problem for me to think about but it doesn't effect my actions much. I purposely live in Utah near the ski resorts so that I can take advantage of powder storms as they arrive. For someone who has to travel to ski, I would definitely study when the buoy pops and plan a trip to my favorite Intermountain ski area.