Reindexer supports hybrid search by full text and knn in one query.
SELECT * FROM ns WHERE ft_idx = "search_str" AND KNN(vec_idx, [2.4, 3.5, ...], k=100)
In hybrid search, there must be exactly one full text condition and exactly one knn condition.
In this case, full text and knn conditions must be inside the same bracket or outside the brackets:
SELECT * FROM ns WHERE (ft_idx = "search_str" AND id > 50 AND KNN(vec_idx, [2.4, 3.5, ...], k=100)) AND id < 10000
Placing full text and knn conditions in different brackets is prohibited:
SELECT * FROM ns WHERE (ft_idx = "search_str" AND id > 100) OR (KNN(vec_idx, [2.4, 3.5, ...], k=100) AND id < 100)
To recalculate the total rank based on the ranks of full text and knn conditions, the recalculation expression must be specified in the statement ORDER BY
.
By default RRF()
is used.
Reciprocal rank fusion RRF
reranking expression may be specified as follows:
SELECT * FROM ns WHERE ft_idx = "search_str" AND KNN(vec_idx, [2.4, 3.5, ...], k=100)
ORDER BY 'RRF(rank_const=120)'
rank_const
is optional, default value is 60, and minimum value is 1.
In this case the rank is calculated using the following formula:
rank = 1.0 / (rank_const + pos_ft) + 1.0 / (rank_const + pos_knn)
where pos_ft
and pos_knn
are documents’ positions in the results of the queries
SELECT * FROM ns WHERE ft_idx = "search_str"
and
SELECT * FROM ns WHERE KNN(vec_idx, [2.4, 3.5, ...], k=100)
respectively.
Another supported reranking expression is linear function based on the full text and knn ranks.
SELECT * FROM ns WHERE ft_idx = "search_str" OR KNN(vec_idx, [2.4, 3.5, ...], k=100)
ORDER BY '30 * rank(ft_idx) + 50 * rank(vec_idx, 100.0) + 100'
where rank(index_name, default_rank)
is rank of full text or knn condition, default_rank
is optional default rank in case of absence of result for a certain condition, default value is 0.0.
General form of the linear reranking expression is
A * rank(ft_idx, r1) + B * rank(vec_idx, r2) + C
where A
, B
, C
, r1
and r2
are numeric values.
Expected values range for the rank(ft_idx)
expression is [1, 255], where 1 is the least possible fulltext rank and 255 is the highest one.
The values range for the rank(vec_idx)
expressions depends on the specified similarity metric. For example, in case of the cosine
-metric it would be [-1.0, 1.0], where -1.0 corresponds to the least relevant vectors and 1.0 corresponds to the most relevant vectors.