寫過 Ruby on Rails 的人,面試肯定都會被問的其中一題
你知道什麼是 N+1 queries 嗎?
By 面試官
這篇簡介一下所謂的 N+1 queries 是什麼?在 rails 裡,我們通常會如何解決?
以下範例,可以自行下載 rails_n_plus_1_queries_problem github repo
首先我們有個 Post / Comment 的 model,彼此是 1-to many 的關係(示意圖與程式碼如下圖)
# app/models/post.rb
class Post < ApplicationRecord
has_many :comments
end
# app/models/comment.rb
class Comment < ApplicationRecord
belongs_to :post
end
接下來我們嘗試取出目前最新的 10 則 comment 的 post title,顯示在畫面上,因此我們嘗試這樣寫
comments = Comment.order(created_at: :desc).limit(10)
comments.each do |comment|
puts comment.post.title
end
這時候你會看到 rails console 實際上執行這段程式碼的方式是逐一的 SQL query 每個 comment 的 post
Comment Load (8.9ms) SELECT `comments`.* FROM `comments` ORDER BY `comments`.`created_at` DESC LIMIT 10
Post Load (1.7ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.5ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (3.0ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.4ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.2ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.2ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.2ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.2ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.2ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
Post Load (0.2ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100 LIMIT 1
因此原來拉最新 10 則 comment 的 SQL 實際上變成了 11 個 SQL query
-
1 個 comment SQL query
-
10 個 post SQL query
以上面的例子來看,甚至 10 次的 post SQL query 其實都是相同且重複的,因此如何減少這些不必要的 query,藉此優化效能也就成為了重要的問題 🙋
接下來,我們來看看幾個解決 N+1 queries problem 的方法
- includes
可以使用includes
讓 query comment 時同時將 post query 出來,會自動判斷是否使用preload
或eager_load
機制(請見下方的另外兩種解決方式)
comments = Comment.includes(:post).order(created_at: :desc).limit(10)
comments.each do |comment|
puts comment.post.title
end
可以發現,原本的 11 個 SQL query,減少至 2 個 SQL query
Comment Load (29.8ms) SELECT `comments`.* FROM `comments` ORDER BY `comments`.`created_at` DESC LIMIT 10
Post Load (1.7ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100
備註:如果是更多層的關聯也可以用 nested includes(post: [:author])
的方式
- preload
使用preload
預先載入 post
comments = Comment.preload(:post).order(created_at: :desc).limit(10)
comments.each do |comment|
puts comment.post.title
end
Comment Load (29.8ms) SELECT `comments`.* FROM `comments` ORDER BY `comments`.`created_at` DESC LIMIT 10
Post Load (1.7ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`id` = 100
在這邊的使用情境,會讓人覺得 preload
不就跟 includes
一模一樣嗎?
沒錯,確實在這個例子來說,使用
includes
其實就等同於使用preload
機制載入
和 includes
不同的是,preload
不能使用 conditions
過濾關聯資料
讓我們直接看程式碼,什麼樣條件下是不能用 preload
的
comments = Comment.preload(:post).where('posts.id > 0').references(:post).order(created_at: :desc).limit(10)
# An error occurred when inspecting the object: #<ActiveRecord::StatementInvalid: Mysql2::Error: Unknown column 'posts.id' in 'where clause'>
以上這個例子,你會發現在 preload
post 時,會發生錯誤!
簡單來說 preload 是不能使用 conditions 在關聯資料 🙅♂️
- eager_load
使用eager_load
套用一個 SQL 的 LEFT JOIN 語法將 post 載入
comments = Comment.eager_load(:post).order(created_at: :desc).limit(10)
comments.each do |comment|
puts comment.post.title
end
SQL (29.8ms) SELECT `comments`.`id` AS t0_r0, `comments`.`post_id` AS t0_r1, `comments`.`body` AS t0_r2, `comments`.`created_at` AS t0_r3, `comments`.`updated_at` AS t0_r4, `posts`.`id` AS t1_r0, `posts`.`title` AS t1_r1, `posts`.`body` AS t1_r2, `posts`.`created_at` AS t1_r3, `posts`.`updated_at` AS t1_r4 FROM `comments` LEFT OUTER JOIN `posts` ON `posts`.`id` = `comments`.`post_id` ORDER BY `comments`.`created_at` DESC LIMIT 10
看起來使用 eager_load
使 SQL 只要 query 一次,但變得相當複雜且可以想見如果涉及資料量較大的資料表時,最終會 JOIN 出相當大的資料表,可能因此影響記憶體等效能問題。
但當你使用 eager_load
時,便可以支援 conditions 在關聯資料上
讓我們來看看上面 preload
出錯的例子在這邊會變得如何?
comments = Comment.eager_load(:post).where('posts.id > 0').references(:post).order(created_at: :desc).limit(10)
SQL (4.5ms) SELECT `comments`.`id` AS t0_r0, `comments`.`post_id` AS t0_r1, `comments`.`body` AS t0_r2, `comments`.`created_at` AS t0_r3, `comments`.`updated_at` AS t0_r4, `posts`.`id` AS t1_r0, `posts`.`title` AS t1_r1, `posts`.`body` AS t1_r2, `posts`.`created_at` AS t1_r3, `posts`.`updated_at` AS t1_r4 FROM `comments` LEFT OUTER JOIN `posts` ON `posts`.`id` = `comments`.`post_id` WHERE (posts.id > 0) ORDER BY `comments`.`created_at` DESC LIMIT 10
可以發現 eager_load
正確的使用 LEFT JOIN 語法套用 conditions 至關聯資料並拉出結果
最後,我們來看看如果這個例子使用在 includes
會變得如何
comments = Comment.includes(:post).where('posts.id > 0').references(:post).order(created_at: :desc).limit(10)
SQL (4.5ms) SELECT `comments`.`id` AS t0_r0, `comments`.`post_id` AS t0_r1, `comments`.`body` AS t0_r2, `comments`.`created_at` AS t0_r3, `comments`.`updated_at` AS t0_r4, `posts`.`id` AS t1_r0, `posts`.`title` AS t1_r1, `posts`.`body` AS t1_r2, `posts`.`created_at` AS t1_r3, `posts`.`updated_at` AS t1_r4 FROM `comments` LEFT OUTER JOIN `posts` ON `posts`.`id` = `comments`.`post_id` WHERE (posts.id > 0) ORDER BY `comments`.`created_at` DESC LIMIT 10
看到這裡,你應該發現了 includes
的智慧之處,其實 includes
是會自動判斷關聯資料,自動決定要使用 preload
或 eager_load
🎩
因此多數情況建議直接使用 includes
就好囉 ~