One of my projects uses Django, and its Paginator is used to page the logs and show them to the administrator. During testing, the page loaded very slowly once there were millions of log records. Before analyzing the problem, here is the code.
all_records = query_data_from_db(s_t, e_t, type, vid, keyword)
paginator = Paginator(all_records, page_size)
try:
    records = paginator.page(page)
    page_num = paginator.num_pages
    total = paginator.count
except:
The code is very simple: first query all the data from the DB, then page the records. At first I suspected that the DB was the bottleneck, but after a few tests it turned out that the DB was not the root cause. The problem is that paginator.page() takes a long time. Why does paginator.page() take so much time? To find out, we need to analyze its source code.
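The kind of test that separates the query step from the paging step can be sketched roughly as follows. This is only an illustration: timed(), fake_query() and fake_paging() are hypothetical stand-ins, not the project's real functions, and the numbers it prints depend on the machine.

```python
import time
from contextlib import contextmanager

timings = {}  # step name -> elapsed seconds


@contextmanager
def timed(step):
    """Record the wall-clock time spent inside the with-block under `step`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = time.perf_counter() - start


# Hypothetical stand-ins for the two steps being compared.
def fake_query():
    return range(1_000_000)  # lazy: nothing is materialized yet


def fake_paging(rows, page, page_size):
    total = len(list(rows))  # forces every row into memory just to count
    start = (page - 1) * page_size
    return list(rows[start:start + page_size]), total


with timed("query"):
    rows = fake_query()
with timed("paging"):
    records, total = fake_paging(rows, page=2, page_size=20)

for step, seconds in timings.items():
    print(f"{step}: {seconds:.4f}s")
```

Wrapping each step separately makes it visible when the time is spent in the paging step rather than in building the query.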
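The Paginator code is shown in the Django docs. The function page() is simple, but the problem is indeed caused by this function. In the code below, page() uses the property count, and count can end up being calculated by len(self.object_list): object_list.count() is tried first, and len() is the fallback when that call fails. This is the root cause. len() is not efficient when the dataset is very large, because a lazily loaded result set has to fetch every record before it can be counted!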
def page(self, number):
    """
    Returns a Page object for the given 1-based page number.
    """
    number = self.validate_number(number)
    bottom = (number - 1) * self.per_page
    top = bottom + self.per_page
    if top + self.orphans >= self.count:
        top = self.count
    return self._get_page(self.object_list[bottom:top], number, self)

def _get_count(self):
    """
    Returns the total number of objects, across all pages.
    """
    if self._count is None:
        try:
            self._count = self.object_list.count()
        except (AttributeError, TypeError):
            # AttributeError if object_list has no count() method.
            # TypeError if object_list.count() requires arguments
            # (i.e. is of type list).
            self._count = len(self.object_list)
    return self._count
count = property(_get_count)
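The difference between the two counting paths can be illustrated without Django. The sketch below is only a model of the behaviour: RowsWithoutCount and RowsWithCount are hypothetical classes invented for this example, and get_count() simply mirrors the try/except logic of _get_count() shown above.

```python
class RowsWithoutCount:
    """Hypothetical lazy result set that can only be counted via len()."""

    def __init__(self, n):
        self.n = n
        self.rows_loaded = 0  # how many rows have been materialized so far

    def __len__(self):
        # Expensive path: build every row just to count them.
        rows = [i for i in range(self.n)]
        self.rows_loaded += len(rows)
        return len(rows)


class RowsWithCount(RowsWithoutCount):
    """Same data, but with a cheap count() (comparable to SELECT COUNT(*))."""

    def count(self):
        return self.n


def get_count(object_list):
    # Mirrors the logic of _get_count(): try count() first, fall back to len().
    try:
        return object_list.count()
    except (AttributeError, TypeError):
        return len(object_list)


fast = RowsWithCount(1000)
slow = RowsWithoutCount(1000)

print(get_count(fast), fast.rows_loaded)  # count() path, no rows loaded
print(get_count(slow), slow.rows_loaded)  # len() path, every row loaded
print(get_count([1, 2, 3]))               # list.count() needs an argument -> len()
```

Under these assumptions, the object with a usable count() is counted without loading anything, while the other one has to load all of its rows first, which is the cost that grows with the size of the log table.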