SQL Performance: IN vs. EXISTS
June 16, 2010 Hey, Ted
Concerning your article Update One File Based on Another File, I would agree that the IN is more intuitive than the EXISTS. When looking at the volume of data, possible number of rows to update, and the rows returned for the IN, is there a preferred method if considering performance? Does the IN or EXISTS result in better performance under certain conditions?
The prevailing wisdom is that EXISTS currently tends to outperform IN. Here are comments from two readers that support that position.
A nice consequence of going the IN route (as opposed to using EXISTS) is that, at least on V5R3, Visual Explain gives consistently better execution times.
I don’t know if they’ve fixed it recently, but I’ve steered away from the IN predicate in SQL, if implemented with a SELECT, because performance was poor compared to an EXISTS. I found the IN to be okay for small lists, but if it contained a large result set, my query or update slowed to a crawl.
Now please permit me to make a few comments. First, performance is not a static science. The database team at IBM is constantly working to improve the query engine. Whereas technique A performs better than technique B today, the reverse may be true next week. Hence the words “currently” and “tends” in my first sentence.
Second, since EXISTS has exhibited superior performance, it may pay to learn how to use it.
Third, it is not necessary to optimize every query. Concentrate on the dogs.
Fourth, it may be a good idea to sign up for the IBM DB2 for i5/OS SQL Performance Monitoring and Tuning Workshop.
Other readers wrote in with words of praise for row-value expressions. The following response is typical.
I’ve been looking for this capability in System i SQL for years. I had no idea it had been released in V5R4. Thanks for bringing this to light.
Thanks to all who took time to write. I very much appreciate it.