In the talk I gave at PHP World in DC, most controversial statement (not surprisingly) was “ORMs are evil” (slide 30). And while I had good examples and stories on performance hits from the query generation and optimization perspective, I didn’t have numbers on hand to show the impact of object construction overhead. Well, thanks to my colleague, who recently had to troubleshoot a performance issue, I now have an real-world example to share.
Consider a polling RESTful API that generates a set of data to return to a requester based on specified date range. Data records consist of various properties collected from different tables in the database. API has been in production for quite some time now, but a few days ago reports of periodic timeouts started to come in. After initial load testing, it looked like the data volume grew, and, as the return dataset approached certain number of records, the API would timeout.
Running a quick performance analysis, we saw the duration to generate a response to be upwards of a 6.5 seconds, which is not as all acceptable for a single query (or even a single page).
| METHOD | REAL | USER | SYS | PCPU | | Base ORM | 6.330 | 5.771 | 0.212 | 94.51 |
Naturally, when dealing with ORM performance issues, the first instinct is to break out of ORM model, validate the SQL (as generated by ORM) and fine-tune if necessary. The query itself, even generated by ORM in this case, ran in milliseconds, which, given the straight forward left joins, is not that surprising.
| METHOD | REAL | USER | SYS | PCPU | | SQL without objects | 0.664 | 0.274 | 0.120 | 59.35 |
What can be surprising to some, is that once the object creation was added back into the process, even with raw SQL call instead of ORM-generated, the performance plummeted back to the 6 second range.
| METHOD | REAL | USER | SYS | PCPU | | SQL with ORM objects | 6.354 | 5.797 | 0.197 | 94.34 |
So, using the base ORM functionality with simple object model (3 left joins) we benchmarked 600 millisecond query and 6 second object generation overhead. Now, extrapolate this performance hit to a more complicated data model. And if you’re feeling adventurous, consider the SQL mis-optimization as directly related to the complexity of data. What you end up with is a pretty, clean object-based code and the system that is completely unusable and, in essence, not working.
Now, despite the tone of this article, ORMs are not bad. They have their uses. You can (and should) use ORM for most non-production processes that require CRUD data manipulation. The best example would probably be admin pages/portals where simple CRUD are the core of the functionality. ORMs will save you time, add convenience (in the form of accessor functions and clean data object structure) and, more likely than not, won’t cause any business critical performance bottlenecks. However, if you choose to rely on ORM in production (read: handling real traffic and/or supporting real business) to avoid dealing with SQL or to have pretty objects available for you with no extra effort, make sure you’re acutely aware of all the hidden costs that come along with that convenience.