This post explains the results of my tests to find Excel’s fastest lookup method.
I discussed the report workbook in A Volatile Workbook to Test Calculation Times for Excel Lookup Methods. That workbook includes a report with five columns, each with 1,000 rows of formulas. Each formula returns data from a 50,000-row database.
To time the calculations I rely on the CalcTimer.xls workbook I discussed in Use Excel VBA to Test Report Calculation Times. (You can download the workbook to test calculation times here.)
I performed these tests on an old laptop running at 1.60 GHz. Your results probably will be faster, but approximately proportional to mine.
Calculating Overhead Times
Excel takes a certain amount of time to display the randomly chosen Codes shown in column A. Although that time is consistent for all tests, it could distort the ratios between two tests.
Therefore, I started by testing the time it took to calculate 25 times while using no lookup formulas at all. All “Net Calculation Time” values below are net of the following calculation time.
Calculation Time: 0.141 seconds.
Excel Lookup Times Using Unsorted Data
Both VLOOKUP and INDEX-MATCH can be used against unsorted data. In the following summaries, I provide “key formulas”. In all cases, copy these to the remaining areas of the report as needed.
So let’s see how these lookup methods perform.
VLOOKUP, Unsorted Data
The formulas in this trial rely on VLOOKUP, which has the following form:
=VLOOKUP(lookup_value, table_array, col_index_num, range_lookup)
The range_lookup argument is optional. If it has the value of FALSE, VLOOKUP returns an exact match and can be used against unsorted data. If it’s TRUE or omitted, it returns approximate matches and must be used only against sorted data.
The key formula is:
The net calculation time below seems long…about 260 seconds. But remember that the report has 5000 lookup formulas that we calculate 25 times. So each of these 125,000 lookups takes only about .0021 seconds.
Net Calculation Time: 260.781
INDEX-MATCH in One Formula, Unsorted Data
The formulas in this trial use both INDEX and MATCH:
=INDEX(reference, row_num, column_num, area_num)
=MATCH(lookup_value, lookup_array, match_type)
MATCH’s match_type argument is optional. It can have three values:
0: Exact match. Can be used against unsorted data.
1: Approximate match. (Default value.) Must be used against data sorted in ascending order.
-1: Approximate match. Must be used against data sorted in descending order.
The INDEX and MATCH functions can be used in one formula or in separate formulas. This test used one formula:
Notice that the following Net Calculation Time is about the same as for VLOOKUP.
Net Calculation Time: 253.453
INDEX-MATCH in Two Formulas, Unsorted Data
Unlike VLOOKUP, the INDEX-MATCH method can be split into two formulas. This is very useful when several formulas need to return data from the same row or column of a database, as in this figure. With this design, one MATCH formula can do the slow work and any number of INDEX formulas can do the quick work. This speeds calculation significantly.
Here are the key formulas:
In Lotus 1-2-3 and in the early days of Excel, we were told that formulas would calculate more quickly when they referenced cells above and to the left of the formula. So if you’re an old-timer like myself, you might wonder if this report would have calculated more quickly if the Row column in the figure above were moved to the left of the Data1 column. I tried that design and the calculation time wasn’t affected at all.
As you can see from the following calculation time, this design represents best practice for unsorted data. Because we use one-fifth the number of lookup formulas that the two previous methods use, our calculation time is about one-fifth as long.
Net Calculation Time: 52.234
Excel Lookup Times Using Sorted Data
When VLOOKUP and MATCH work with sorted data, they can be set to use the binary search method, which is much faster than the methods discussed above.
Unfortunately, this method returns approximate matches for both functions. This is unfortunate, because most lookups in my experience require exact matches. That is, if we look for a value that doesn’t exist in our data, we need our formulas to return error values; we don’t want them to return a best guess instead.
Fortunately, there’s an easy way to solve this problem, as you’ll see below. Although the solution nearly doubles the calculation time, the improved accuracy will be worth that trivial cost.
VLOOKUP, Sorted Data
The key formula for this trial is short and easy, and it can be written in two ways:
B3: =VLOOKUP($A3, Data, B$1)
B3: =VLOOKUP($A3, Data, B$1, TRUE)
Notice that by sorting the data and using a binary search technique (as determined by the range_lookup argument), we reduce the calculation time from about 260 seconds to about half a second.
In fact, the search time is so short that if we double the number of lookups we would add only about half a second to the total calculation time. This gives us the freedom to modify the search formula to give us an exact match against sorted data:
Here, we first look up the code and then return the code we find. If the returned code equals the original code, we have an exact match. So we look up the code again and return the value we actually want. Otherwise, if we don’t get an exact match, we return #N/A.
Net Calculation Time for Approximate Match: 0.594
Net Calculation Time for Exact-Match Version: 0.781
INDEX-MATCH in One Formula, Sorted Data
The key formula for this version can be written in two ways:
As with VLOOKUP, we can modify this formula to give us an exact match:
B3: =IF(INDEX(Code,MATCH($A3,Code,1))=$A3, INDEX(Data,MATCH($A3,Code,1),B$1), NA())
Net Calculation Time for Approximate Match: 0.453
Net Calculation Time for Exact-Match Version: 0.688
INDEX-MATCH in Two Formulas, Sorted Data
Finally, this trial uses separate formulas for INDEX and MATCH:
Here, we can modify cell G3 to give us an exact match:
G3: =IF(INDEX(Code,MATCH($A3,Code,1))=$A3, MATCH($A3,Code,1), NA())
In other words, using the two-formula INDEX-MATCH approach against sorted data can be significantly faster than using either VLOOKUP or the one-formula INDEX-MATCH technique, and is best practice.
Net Calculation Time for Approximate Match: 0.391
Net Calculation Time for Exact-Match Version: 0.438
Making Sense of the Lookup Test Results
Here’s how I would summarize these results:
- If you want to return more than one value from one row of a table, use MATCH in one formula to find the row and then INDEX in the other formulas to return the values. This is always the fastest method.
- If your data is sorted, use the sorted-data versions illustrated above. Doing so could speed your calculation time by several orders of magnitude.
- If you need an exact match from sorted data, never rely on an approximate match. Instead, always use the dual-lookup method to make sure that Excel actually has found your lookup_value.
- At its worst, the INDEX-MATCH method is about as fast as VLOOKUP; at its best, it’s much faster.
For more information about VLOOKUP and INDEX-MATCH see: Excel’s VLOOKUP vs INDEX-MATCH Functions.