How I saved huge RAM using smart Pagination?

Sep 02, 2022

Pagination is basically dividing data into different views. Suppose there is a list of 100 news items but you only want to show 10 per page: instead of fetching all 100 from the database, we fetch only 10 in one go, and when the user scrolls we load the next 10, and so on. Because we fetch less data from the DB and less data travels over the network, responses are faster and consume less RAM.

I had a use case where I needed to load data from multiple tables and apply pagination across them. For the sake of simplicity, let's assume the tables are Physician, Hospital and Clinic. Whenever a user searches for a medical facility for some condition, we fetch all Physicians, Hospitals and Clinics (applying the required filters) from the database, load them into RAM, and return a particular subset (according to the pagination):

doctors = list(Physician.objects.filter(some_filter))
hospitals = list(Hospital.objects.filter(some_filter))
clinics = list(Clinic.objects.filter(some_filter))
all_data = doctors + hospitals + clinics
return all_data[offset:offset + limit]

Limit and offset are common terms in pagination. Suppose you are showing 10 items on each page: for the first page offset=0, limit=10 and you get items 1–10; for the second page offset=10, limit=10 and you get items 11–20; and so on. For each next page, the offset increases by the page size.
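As a quick sanity check, the offset for any page can be computed from the page number and page size. A minimal sketch in plain Python (the helper name `page_window` is my own, not from the article):

```python
def page_window(page_number, page_size=10):
    """Return (offset, limit) for a 1-indexed page number."""
    offset = (page_number - 1) * page_size
    return offset, page_size

items = list(range(100))             # stand-in for 100 news records
offset, limit = page_window(2)       # second page -> offset=10, limit=10
print(items[offset:offset + limit])  # the items at indexes 10..19
```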

It worked, with one exception: performance.

Are we good now? NO

Now suppose you have 10k doctors, 5k clinics and 1k hospitals in the DB, a total of 16k items. At roughly 1 KB each, that is 1 KB × 16,000 ≈ 16 MB. In other words, you fetch 16 MB from the DB and load 16 MB into RAM on every request, when you only need about 10 KB.
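The back-of-the-envelope arithmetic, assuming roughly 1 KB per record:

```python
# Rough numbers from the scenario above (record size is an estimate).
doctors, clinics, hospitals = 10_000, 5_000, 1_000
total_items = doctors + clinics + hospitals  # 16,000 items
item_size_kb = 1                             # ~1 KB per record (assumption)

fetched_kb = total_items * item_size_kb      # ~16,000 KB, i.e. about 16 MB per request
needed_kb = 10 * item_size_kb                # one page of 10 items: about 10 KB
print(fetched_kb, needed_kb)                 # 16000 10
```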

Now you know why customers complain……

What to do now

Remember the first principle of development -> “RAM is costly; don’t fill it with unnecessary data.”

We need to figure out exactly which records to fetch from the DB. Instead of fetching all records, we can first fetch the item count of each table and then work out which part of which table needs to be loaded:

doctors_count = Physician.objects.filter(some_filter).count()
hospitals_count = Hospital.objects.filter(some_filter).count()
clinics_count = Clinic.objects.filter(some_filter).count()

results = []
for model, count in [(Physician, doctors_count),
                     (Hospital, hospitals_count),
                     (Clinic, clinics_count)]:
  if offset >= count:
    # This whole table falls before our page; skip it entirely.
    offset = offset - count
  else:
    # Slice this table starting at whatever offset is left over.
    new_results = list(model.objects.filter(some_filter)[offset:offset + limit])
    results = results + new_results
    limit = limit - len(new_results)
    offset = 0  # subsequent tables are read from their beginning
  if limit == 0:
    break
return results

Suppose we have 5 doctors, 5 hospitals and 5 clinics, and our page size is 3. We can represent them as “DDDDDHHHHHCCCCC” (D for Doctor, H for Hospital and C for Clinic). Now imagine we don’t have the full list accessible, only the counts, and we need to figure out which indexes to access from which sub-list.

  1. First Page -> Initially we need the first three elements. We check the doctor count, which is 5 (more than 3), so we can take the first 3 doctors. (We take items 1–3 from the list.)

  2. Second Page -> Now we need the 3 items after the initial 3. After skipping those 3 we have 2 doctors remaining, so we take both. Then we check the hospital count: it is 5 (more than the 3−2=1 we still need), so we load the first hospital. Load 2 doctors (range 4–5) and 1 hospital (range 1–1).

  3. Third Page -> Now we need to skip 6 items. The doctor count is 5, so we skip the whole doctors list and still need to skip 1 hospital (6−5=1). We then load 3 hospitals after skipping the first one. Load 3 hospitals (range 2–4).

  4. Fourth Page -> Now we need to skip 9 items: all 5 doctors and 4 hospitals. Only one hospital remains, so we load it. We need 2 more items, and since the clinic count is 5 (greater than 2), we load 2 clinics. Load 1 hospital (range 5–5) and 2 clinics (range 1–2).

  5. Fifth Page -> You need to figure this out yourself as homework😁😁
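The walkthrough above can be simulated in plain Python, with in-memory lists standing in for the DB tables (a sketch of the same count-then-slice logic, not the exact production code):

```python
def paginate(tables, offset, limit):
    """Slice one logical page out of several concatenated 'tables'."""
    results = []
    for table in tables:
        count = len(table)  # stands in for the .count() query
        if offset >= count:
            offset -= count  # this whole table falls before our page
        else:
            chunk = table[offset:offset + limit]  # stands in for the DB slice
            results += chunk
            limit -= len(chunk)
            offset = 0  # later tables are read from index 0
        if limit == 0:
            break
    return results

doctors = ["D1", "D2", "D3", "D4", "D5"]
hospitals = ["H1", "H2", "H3", "H4", "H5"]
clinics = ["C1", "C2", "C3", "C4", "C5"]
tables = [doctors, hospitals, clinics]

for page in range(1, 5):  # pages 1..4, matching the walkthrough
    print(page, paginate(tables, (page - 1) * 3, 3))
# 1 ['D1', 'D2', 'D3']
# 2 ['D4', 'D5', 'H1']
# 3 ['H2', 'H3', 'H4']
# 4 ['H5', 'C1', 'C2']
```

Running it for page 5 answers the homework question, so I'll leave that to you.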


Pros: Only the required data is loaded into RAM, and only that much is fetched from the DB. With the stats discussed above, we load about 10 KB instead of 16 MB, which is a huge difference.

Cons: We make 3 extra DB calls to fetch the counts, but count queries are cheap, so the impact is very low.

Your data will grow over time, but you cannot keep adding RAM, so 3 extra DB calls seem a fair trade. And when you compare overall response times, the difference is huge.

Support me as a writer by buying me a coffee. See you soon next time.
