Last week DiffEngineX was used to compare two Excel 2007 spreadsheets, each with at least 750,000 rows. The comparison took approximately 4 hours 30 minutes to complete with Align Rows turned on, and the user confirmed they were pleased with the results. If the data is not already sorted, we generally recommend sorting it in Excel before a comparison, using the functionality available from its Data tab or menu.
We have compared spreadsheets with a million rows ourselves and seen results in a couple of minutes. So why do some comparisons take hours and others only minutes? The answer is that DiffEngineX is much faster when the number of differences buried in a million rows is small rather than large. With a large number of differences, more blank rows have to be inserted to get the data to line up, and this is a time-consuming, Excel-mediated operation.
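To illustrate why the number of differences drives the number of inserted blank rows, here is a minimal Python sketch. It is not DiffEngineX's actual algorithm. It assumes rows are represented as tuples, uses the standard library's difflib.SequenceMatcher as a stand-in aligner, and treats a changed row as one deletion plus one insertion. The names align_rows, blank_rows_inserted and BLANK are hypothetical.

```python
from difflib import SequenceMatcher

BLANK = None  # hypothetical marker standing in for an inserted blank row


def align_rows(left, right):
    """Pad two row lists with BLANK so that matching rows share an index."""
    out_left, out_right = [], []
    matcher = SequenceMatcher(a=left, b=right, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical runs are copied across; no padding is needed.
            out_left.extend(left[i1:i2])
            out_right.extend(right[j1:j2])
        else:
            # Rows present only on the left get blank partners on the right,
            # and rows present only on the right get blank partners on the left.
            out_left.extend(left[i1:i2])
            out_right.extend([BLANK] * (i2 - i1))
            out_left.extend([BLANK] * (j2 - j1))
            out_right.extend(right[j1:j2])
    return out_left, out_right


def blank_rows_inserted(left, right):
    """Count the blank rows needed to line the two row lists up."""
    padded_left, padded_right = align_rows(left, right)
    return padded_left.count(BLANK) + padded_right.count(BLANK)


# One missing row needs a single blank row; five changed rows need ten.
few_left = [("a", 1), ("b", 2), ("c", 3)]
few_right = [("a", 1), ("c", 3)]
many_left = [(i,) for i in range(5)]
many_right = [(i + 100,) for i in range(5)]
print(blank_rows_inserted(few_left, few_right))
print(blank_rows_inserted(many_left, many_right))
```

In this sketch the padding grows in proportion to the number of unmatched rows. In a real comparison each blank row is an insertion carried out through Excel, which is where the time goes.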
Additionally, DiffEngineX works best with Excel 2002 and above. Slower performance has been noted when DiffEngineX is used with older versions of Excel, such as Excel 2000.
This is the first report of DiffEngineX being used with such a large amount of data. The key points are that the comparison may take hours and that you should ensure the Align Rows feature is selected. Unless your data is guaranteed to be in sorted order, you should get Excel to sort it before a comparison. (This does not apply when comparing formula-based models.)
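The effect of pre-sorting can be shown with a small Python sketch. It is an illustration only, not how DiffEngineX works internally. It assumes rows are tuples and uses difflib.SequenceMatcher as a stand-in for an order-sensitive row alignment. The helper name unmatched_rows and the sample data are made up.

```python
from difflib import SequenceMatcher


def unmatched_rows(left, right):
    """Count rows that an order-sensitive alignment fails to pair up."""
    matcher = SequenceMatcher(a=left, b=right, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return (len(left) - matched) + (len(right) - matched)


# The same three rows, stored in a different order in each sheet.
left = [("C3", 30), ("A1", 10), ("B2", 20)]
right = [("A1", 10), ("B2", 20), ("C3", 30)]

before = unmatched_rows(left, right)                  # compared as stored
after = unmatched_rows(sorted(left), sorted(right))   # compared after sorting
print(before, after)
```

Here the two sheets hold identical data, yet the unsorted comparison reports unmatched rows purely because of ordering. After sorting, every row pairs up and only genuine differences would remain.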
In the days of Excel 2003, one customer told us he used DiffEngineX to compare two worksheets containing 50,000 rows apiece.
With such large amounts of data, it can be difficult to spot the differences in the color-highlighted worksheets. That is why we recommend using the Extra dialog to turn on the Hide Matching Rows feature.
Typically, with standard-sized spreadsheets (2 MB per workbook), we have seen comparison results generated in under a minute.