In this post I’ll show you how to add batter and pitcher handedness to your spreadsheets. To do this, we’ll have to learn two new Excel formulas we have not tackled yet.
I’ve avoided doing this for a long time… But there’s just no way around it now. It’s time to say goodbye to relying exclusively on VLOOKUP. Let’s put on our big boy pants and tackle VLOOKUP’s more flexible and powerful counterpart… INDEX and MATCH.
Sometimes VLOOKUP Can’t Get the Job Done
Take this scenario. You’ve started to build a DFS spreadsheet and you’ve imported FanDuel player salaries from a CSV file. Now you want to add player handedness (Lefty/Righty) as a column to your spreadsheet.
You are also aware of the Player ID Map and know that it’s an easy way to get handedness information on players.
You look at this data above and you think, “No problem!”. FanDuel ID is in both sets of data. How hard could this be? A simple VLOOKUP and we’re done.
But you quickly realize things are not that easy. You see, the VLOOKUP has a very restrictive assumption. If you are doing a VLOOKUP from the salary information into the Player ID Map, the Excel function assumes that “FanDuelID” will be the FIRST column in the Player ID Map.
And that’s NOT the case.
Let’s look at an example VLOOKUP formula:
In this formula we’ve told Excel to go look for the “FanDuelID” column in the “PLAYERIDMAP” table and give us back the value in the 10th column.
But “FanDuelID” is not the first column of the PLAYERIDMAP. It’s the 33rd (wow… the Player ID Map is getting to be quite large). So VLOOKUP will not work.
Other Weaknesses of VLOOKUP
Not only is assuming the data you want to match is in the first column awfully restrictive, if you think about it, VLOOKUP also ties you to a left-to-right lookup. For example, if you’re trying to use Excel to VLOOKUP which team Michael Brantley plays for, his player ID must be in the first column of your data set and you are then forced in to looking only to the right.
We want a formula that will allow our lookup to be in any column and then to look to the left! For example, go find Nelson Cruz’s FanDuelID and then look to the left a few columns and give me the side of the plate he bats from.
Going back to this example formula right above, the hard coding of a “10” in the formula to return the information in the 10th column is a flimsy approach, but that’s how many people are taught to write VLOOKUP formulas.
The flimsiness comes in if you decide to insert a column somewhere in the PLAYERIDMAP. If column 10 becomes column 11, Excel will not adjust its formula accordingly. Because you are likely building a spreadsheet that you’ll use all throughout the season, it seems highly likely you’ll want to add a new piece of information to your analysis. That inevitably means adding columns to bring that new information in. You don’t want to have to hunt through your formulas to figure out what the new column number in your VLOOKUP needs to be.
Enter “INDEX” and “MATCH”
That’s right. To defeat the almighty VLOOKUP formula, we need to combine the powers of two functions (that last sentence just made me think of this). Let’s take a look.
When I’m using a function I’m unfamiliar with, I will add it through the “Insert Function” button. I like doing this because Excel will then give you a search menu to find a formula. And after locating your function you’ll get a helpful wizard that breaks down all the inputs it needs.
If you follow that approach to add the INDEX function, you’ll soon realize there are two versions of it.
I always use the first version, which allows us to locate a cell anywhere within a block of data and return the value from that cell. This function uses the following inputs:
INDEX(Array, Row_num, Column_num)
- Array – The range of cells you are searching for a value in. This could be a table or a block of cells.
- Row_num – The row within that array that the value is in. This should be a number representing the row.
- Column_num – The column within the array that the value is in. Again, this should be a number representing the column (not the letter representation of the column).
It may help to see a visual representation of the function. Assume we’re trying to find Nelson Cruz’s batting handedness. If we tell the index function to look in the PLAYERIDMAP (the “array”), in row 309 (“row_num”), and column 26 (column_num), it would return “R”.
We know the array to look in. And we can easily determine the column we want to look in. The challenge we now face is how to determine the row to look in… How do we easily determine that Nelson Cruz is listed on row 309. That’s where the “MATCH” function can help us.
The MATCH function will look for a specific value in a range of cells. The function will return a number representing where the matched item falls in the list.
Translating that into English, a realistic use for the function is to look in an entire column for a match. The function will start at the top of the column and proceed down until it locates the desired value. The function then returns where the item falls in the list, which happens to be the row the item is in.
The function uses these inputs:
MATCH(lookup_value, lookup_array, [match_type])
- Lookup_value – This is the value you are hoping to match in the array (or column). For us, this will usually be a Player ID of some sort.
- Lookup_array – This is the area you are searching for the match within. When using the MATCH function with the INDEX function, this will usually be a column of data.
- Match_type – This is an optional input telling Excel some more details about the kind of match you are looking for. You can enter a 1, 0, or -1. Entering a 1 or -1 are forms of approximate matches and are useful if you are looking up numeric values. But we are typically looking to match strings (I consider a Player ID made up of all numbers to still be a string) of text.
This means we want exact matches only. Accordingly, I always use a 0 for this argument (even though it’s optional, leaving it blank tells Excel an approximate match is acceptable).
Combining INDEX and MATCH
As I alluded to before, the power of these two formulas comes when you combine (or nest) them together. Recall that the INDEX function looks like this:
INDEX(Array, Row_num, Column_num)
If we drop the MATCH function in place of the “Row_num” argument:
INDEX(Array, MATCH(lookup_value, lookup_array, [match_type]), Column_num)
We now have a formula that is more flexible and powerful than a VLOOKUP! The combination of INDEX and MATCH can look for a value anywhere in a table of data and we are no longer tied to the first column and a right-only lookup.