寫過 Ruby on Rails 的人,面試肯定都會被問的其中一題

你知道什麼是 N+1 queries 嗎?

By 面試官

這篇簡介一下所謂的 N+1 queries 是什麼?在 rails 裡,我們通常會如何解決?

以下範例,可以自行下載 rails_n_plus_1_queries_problem github repo

首先我們有個 Post / Comment 的 model,彼此是 1-to many 的關係(示意圖與程式碼如下圖)

# app/models/post.rb
class Post < ApplicationRecord
  has_many :comments
end

# app/models/comment.rb

class Comment < ApplicationRecord
  belongs_to :post
end

接下來我們嘗試取出目前最新的 10 則 comment 的 post title,顯示在畫面上,因此我們嘗試這樣寫

comments = Comment.order(created_at: :desc).limit(10)

comments.each do |comment|
  puts comment.post.title
end

這時候你會看到 rails console 實際上執行這段程式碼的方式是逐一的 SQL query 每個 commentpost

Comment Load (8.9ms)  SELECT `comments`.* FROM `comments` ORDER BY `comments`.`created_at` DESC LIMIT 10
Post Load (1.7ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.5ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (3.0ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.4ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.2ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.2ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.2ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.2ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.2ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.2ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1

因此原來拉最新 10 則 comment 的 SQL 實際上變成了 11 個 SQL query

  • 1 個 comment SQL query
  • 10 個 post SQL query

以上面的例子來看,甚至 10 次的 post SQL query 其實都是相同且重複的,因此如何減少這些不必要的 query,藉此優化效能也就成為了重要的問題 🙋

接下來,我們來看看幾個解決 N+1 queries problem 的方法

  1. includes
    可以使用 includes 讓 query comment 時同時將 post query 出來,會自動判斷是否使用 preloadeager_load 機制(請見下方的另外兩種解決方式)
comments = Comment.includes(:post).order(created_at: :desc).limit(10)

comments.each do |comment|
  puts comment.post.title
end

可以發現,原本的 11 個 SQL query,減少至 2 個 SQL query

Comment Load (29.8ms)  SELECT `comments`.* FROM `comments` ORDER BY `comments`.`created_at` DESC LIMIT 10
Post Load (1.7ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100

備註:如果是更多層的關聯也可以用 nested includes(post: [:author]) 的方式

  1. preload
    使用 preload 預先載入 post
comments = Comment.preload(:post).order(created_at: :desc).limit(10)

comments.each do |comment|
  puts comment.post.title
end
Comment Load (29.8ms)  SELECT `comments`.* FROM `comments` ORDER BY `comments`.`created_at` DESC LIMIT 10
Post Load (1.7ms)  SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100

在這邊的使用情境,會讓人覺得 preload 不就跟 includes 一模一樣嗎?

沒錯,確實在這個例子來說,使用 includes 其實就等同於使用 preload 機制載入

includes 不同的是,preload 不能使用 conditions 過濾關聯資料

讓我們直接看程式碼,什麼樣條件下是不能用 preload

comments = Comment.preload(:post).where('posts.id > 0').references(:post).order(created_at: :desc).limit(10)
# An error occurred when inspecting the object: #<ActiveRecord::StatementInvalid: Mysql2::Error: Unknown column 'posts.id' in 'where clause'>

以上這個例子,你會發現在 preload post 時,會發生錯誤!

簡單來說 preload 是不能使用 conditions 在關聯資料 🙅‍♂️

  1. eager_load
    使用 eager_load 套用一個 SQL 的 LEFT JOIN 語法將 post 載入
comments = Comment.eager_load(:post).order(created_at: :desc).limit(10)

comments.each do |comment|
  puts comment.post.title
end
SQL (29.8ms)  SELECT `comments`.`id` AS t0_r0, `comments`.`post_id` AS t0_r1, `comments`.`body` AS t0_r2, `comments`.`created_at` AS t0_r3, `comments`.`updated_at` AS t0_r4, `posts`.`id` AS t1_r0, `posts`.`title` AS t1_r1, `posts`.`body` AS t1_r2, `posts`.`created_at` AS t1_r3, `posts`.`updated_at` AS t1_r4 FROM `comments` LEFT OUTER JOIN `posts` ON `posts`.`id` = `comments`.`post_id` ORDER BY `comments`.`created_at` DESC LIMIT 10

看起來使用 eager_load 使 SQL 只要 query 一次,但變得相當複雜且可以想見如果涉及資料量較大的資料表時,最終會 JOIN 出相當大的資料表,可能因此影響記憶體等效能問題。

但當你使用 eager_load 時,便可以支援 conditions 在關聯資料上

讓我們來看看上面 preload 出錯的例子在這邊會變得如何?

comments = Comment.eager_load(:post).where('posts.id > 0').references(:post).order(created_at: :desc).limit(10)
SQL (4.5ms)  SELECT `comments`.`id` AS t0_r0, `comments`.`post_id` AS t0_r1, `comments`.`body` AS t0_r2, `comments`.`created_at` AS t0_r3, `comments`.`updated_at` AS t0_r4, `posts`.`id` AS t1_r0, `posts`.`title` AS t1_r1, `posts`.`body` AS t1_r2, `posts`.`created_at` AS t1_r3, `posts`.`updated_at` AS t1_r4 FROM `comments` LEFT OUTER JOIN `posts` ON `posts`.`id` = `comments`.`post_id` WHERE (posts.id > 0) ORDER BY `comments`.`created_at` DESC LIMIT 10

可以發現 eager_load 正確的使用 LEFT JOIN 語法套用 conditions 至關聯資料並拉出結果

最後,我們來看看如果這個例子使用在 includes 會變得如何

comments = Comment.includes(:post).where('posts.id > 0').references(:post).order(created_at: :desc).limit(10)
SQL (4.5ms)  SELECT `comments`.`id` AS t0_r0, `comments`.`post_id` AS t0_r1, `comments`.`body` AS t0_r2, `comments`.`created_at` AS t0_r3, `comments`.`updated_at` AS t0_r4, `posts`.`id` AS t1_r0, `posts`.`title` AS t1_r1, `posts`.`body` AS t1_r2, `posts`.`created_at` AS t1_r3, `posts`.`updated_at` AS t1_r4 FROM `comments` LEFT OUTER JOIN `posts` ON `posts`.`id` = `comments`.`post_id` WHERE (posts.id > 0) ORDER BY `comments`.`created_at` DESC LIMIT 10

看到這裡,你應該發現了 includes 的智慧之處,其實 includes 是會自動判斷關聯資料,自動決定要使用 preloadeager_load 🎩

因此多數情況建議直接使用 includes 就好囉 ~

參考資料

Leave a Reply