One of the first places I typically look to improve the performance of ColdFusion templates is the query loop. Sometimes developers, in their haste to get things done, don’t take the time to think about how they might optimize the performance of their templates. In some instances this can be justified. Maybe there aren’t enough rows returned by the database query to worry about the time it takes to process each record. But even in these cases there could be issues with scalability in the future.
How many times have we seen something like this:
<cfquery name="get_employees" datasource="#the_dsn#"> SELECT emp_id, fname, lname, dept_id FROM employees </cfquery>
<cfloop query="get_employees"> <cfquery name="get_dept_info" datasource="#the_dsn#"> SELECT dname FROM dept WHERE dept_id = <cfqueryparam cfsqltype="CF_SQL_INTEGER" value="#dept_id#" /> </cfquery> ... </cfloop>
?
Now admittedly this is an elementary example, which could be solved by using a simple SQL join in the first query:
SELECT e.emp_id, e.fname, e.lname, e.dept_id, d.dname FROM emp e, dept d WHERE e.dept_id = d.dept_id(+)
(Yes, I did use Oracle’s ancient syntax for an outer join above. I can’t get used to the ANSI standard syntax.)
What if we’ve already run the department query for some reason, however? Maybe we’d rather keep our employee query trim. In that case, we can do the following:
<cfquery name="get_depts" datasource="#the_dsn#"> SELECT dept_id, dname FROM dept </cfquery> <cfset dept_names = {} /> <!--- Use structNew() for CF8 or less ---> <cfloop query="get_depts"> <cfset dept_names[dept_id] = dname /> </cfloop>
Now when I loop over my employee query I can do the following:
<cfloop query="get_employees"> <cfset dname = "" /> <cfif structKeyExists(dept_names, dept_id)> <cfset dname = dept_names[dept_id] /> </cfif> ... </cfloop>
One area in which this can be very helpful is when one is using aggregate functions. Suppose we want to get some aggregate data (number of employees, total of salaries, etc.) for each of our departments. However, we also want to get information about each department. We could use a single query with multiple columns in the GROUP BY clause:
SELECT d.dept_id, d.dname, d.budget, d.location, d.supervisor_name , COUNT(e.emp_id) AS employee_cnt, SUM(e.salary) AS total_salary FROM dept d, emp e WHERE d.dept_id = e.dept_id(+) GROUP BY d.dept_id, d.dname, d.budget, d.location, d.supervisor_name
Now the performance of a query with a GROUP BY clause can be improved by adding index(es) on the column(s) mentioned in the GROUP BY clause (at least in Oracle). I would rather not add indexes where I don’t absolutely have to, however – I’d rather not have to think about the possible impact they may have. (Depending on the company for which you work, you might not be permitted to create indexes on your own.) What I can do in this case is break the query into two parts, one:
<cfquery name="get_dept_info" datasource="#the_dsn#"> SELECT dept_id, dname, budget, location, supervisor_name FROM dept </cfquery>
and two:
<cfquery name="get_dept_data" datasource="#the_dsn#"> SELECT dept_id, COUNT(emp_id) AS employee_cnt, SUM(salary) AS total_salary FROM emp GROUP BY dept_id </cfquery> <cfset dept_counts = {} /> <cfset dept_salary = {} /> <cfloop query="get_dept_data"> <cfset dept_counts[dept_id] = employee_cnt /> <cfset dept_salary[dept_id] = total_salary /> </cfloop>
Then when I loop over my “get_dept_data” query I have the data in those structures available, without needing to query the database further.