Search Ratings  Facebook Data Science Interview SQL Question
Search Ratings
search_results
table
column  type 

query  varchar 
result_id  integer 
position  integer 
rating  integer 
You’re given a table that represents search results from searches on Facebook. The query column is the search term, position column represents each position the search result came in, and the rating column represents the human rating of the search result from 1 to 5 where 5 is high relevance and 1 is low relevance.

Write a query to compute a metric to measure the quality of the search results for each query.

You want to be able to compute a metric that measures the precision of the ranking system based on position. For example, if the results for dog and cat are….
query  result_id  position  rating  notes 

dog  1000  1  2  picture of hotdog 
dog  998  2  4  dog walking 
dog  342  3  1  zebra 
cat  123  1  4  picture of cat 
cat  435  2  2  cat memes 
cat  545  3  1  pizza shops 
…we would rank ‘cat’ as having a better search result ranking precision than ‘dog’ based on the correct sorting by rating.
Write a query to create a metric that can validate and rank the queries by their search result precision.
Solution
Search results will be the best when rating
(DESC) and position
(ASC) fields have the correct orders. As in, in the example above, when position
is 1
if the corresponding rating was the highest then it would be a good search result.
So I would create a new column to indicate the descending order in rating
column first, for each query.
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY query ORDER BY rating DESC) desc_rating
FROM search_results
That would give me the following table;
query  result_id  position  rating  desc_rating 

dog  1000  1  2  2 
dog  998  2  4  1 
dog  342  3  1  3 
cat  123  1  4  1 
cat  435  2  2  2 
cat  545  3  1  3 
Then I would define the precision metric either Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE). It’s always useful to explain the differences between MAE and RMSE and use the appropriate one depending on your scenario.
Similarities of Mean Absolute Error and Root Mean Squared Error
 Indifferent from direction of errors
 Expresses average model prediction error in units of the variable of interest
 Can range from 0 to ∞
 Negatively orianted scores. (Lower is better)
Differences of Mean Absolute Error and Root Mean Squared Error
 RMSE gives relatively high weight to large errors. (Since the errors are squared before they are averaged)
In our case, if we assume that the number of search results are not too many, and the ratings are from 15, it would probably not make any marginal difference from MAE to RMSE, so I will go with MAE.
WITH base AS (
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY query ORDER BY rating DESC) desc_rating
, ABS(position  desc_rating) residual
, COUNT(DISTINCT result_id) n
FROM search_results
GROUP BY 1,2,3,4
)
SELECT
*
, SUM(residual) OVER(PARTITION BY query) sum_residual
, sum_residual/n mae
FROM base