Retrieval

Retrieval is a text-to-text search task: both the query and the documents being searched are text. The goal is to find the documents that match or relate to a given query.

Core Process

Query and/or Document Encoding: Convert the text query and/or the documents into vectors.
Similarity Computation: Calculate the distance between the query vector and each document vector.
Ranking: Return the documents ordered by distance, closest first.
Evaluation: Calculate the selected performance metrics (for example, NDCG, MAP, Recall@K) when ground truth is available.
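
The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the actual implementation: `encode` is a toy bag-of-words encoder standing in for a real embedding model, cosine distance is assumed as the distance measure, and all function names are hypothetical.

```python
import math
from collections import Counter


def encode(text, vocab):
    """Toy bag-of-words encoder; a stand-in for a real embedding model."""
    counts = Counter(text.lower().split())
    return [float(counts[token]) for token in vocab]


def cosine_distance(a, b):
    """Cosine distance: 0.0 for identical directions, 1.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return document ids ordered by ascending distance to the query."""
    return sorted(doc_vecs, key=lambda doc_id: cosine_distance(query_vec, doc_vecs[doc_id]))


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


# Illustrative data only.
docs = {
    "d1": "the cat sat on the mat",
    "d2": "stock markets fell sharply today",
    "d3": "a dog played fetch",
}
query = "cat on a mat"

# 1. Encoding
vocab = sorted({tok for text in [query, *docs.values()] for tok in text.lower().split()})
doc_vecs = {doc_id: encode(text, vocab) for doc_id, text in docs.items()}
query_vec = encode(query, vocab)

# 2 and 3. Similarity computation and ranking
ranked = rank(query_vec, doc_vecs)

# 4. Evaluation against ground truth (here: only d1 is relevant)
recall = recall_at_k(ranked, {"d1"}, k=1)
```

In a real setup the encoder would be an embedding model and the distance would typically be computed in batch, but the order of the steps stays the same.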

When an external database is used to store the documents, the core process reduces to:

Search: Query the database. The query may need to be encoded first, depending on how the external database is set up.
Evaluation: Calculate the selected metrics when ground truth is available.

For example, in a RAG setup, the documentation can be embedded and stored in a Qdrant database.
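
With an external database, the evaluation code only needs a search call and the ground truth. The sketch below illustrates that split under loud assumptions: `SearchBackend` is a hypothetical interface, and `KeywordOverlapBackend` is an in-memory stand-in that ranks by shared words. Neither reflects the Qdrant client API; a real backend would wrap the database's own search call and decide whether the query is encoded on the client or the server side.

```python
from typing import Protocol


class SearchBackend(Protocol):
    """Hypothetical interface: anything that returns ranked document ids."""

    def search(self, query: str, limit: int) -> list[str]: ...


class KeywordOverlapBackend:
    """In-memory stand-in for an external database (illustration only)."""

    def __init__(self, docs: dict[str, str]) -> None:
        self._terms = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}

    def search(self, query: str, limit: int) -> list[str]:
        query_terms = set(query.lower().split())
        # Rank by number of shared words, highest first.
        ranked = sorted(
            self._terms,
            key=lambda doc_id: len(query_terms & self._terms[doc_id]),
            reverse=True,
        )
        return ranked[:limit]


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Average of precision values at each rank where a relevant document appears."""
    hits = 0
    total = 0.0
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(
    backend: SearchBackend,
    queries: dict[str, str],
    qrels: dict[str, set[str]],
    limit: int,
) -> float:
    """MAP over all queries: search, then score each result list."""
    scores = [
        average_precision(backend.search(text, limit), qrels.get(query_id, set()))
        for query_id, text in queries.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative data only.
backend = KeywordOverlapBackend({
    "d1": "qdrant stores vectors",
    "d2": "bananas are yellow",
    "d3": "vectors power semantic search",
})
queries = {"q1": "semantic search with vectors", "q2": "yellow bananas"}
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
map_score = mean_average_precision(backend, queries, qrels, limit=2)
```

Because the evaluation function depends only on the `search` method, the same code can score any backend that returns ranked document ids.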

Data Schema Specifications

Optional metadata can be added to the datasets.

Queries Dataset Schema

Column	Type	Description	Required
`_id`	string	Unique query identifier	✓
`query`	string	Query text/question	✓

Documents Dataset Schema

Column	Type	Description	Required
`_id`	string	Unique document identifier	✓
`doc`	string	Document text content	✓

Relevances Dataset Schema

Column	Type	Description	Required
`query_id`	string	References queries._id	✓
`doc_id`	string	References documents._id	✓
`score`	int/float	Relevance score	✓
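
To make the three schemas concrete, the sketch below builds tiny in-memory datasets with the required columns, checks that every relevance row points at an existing query and document, and uses the graded `score` values to compute NDCG@K. The records, the helper names, and the linear-gain NDCG formulation are assumptions for illustration; other tools may use an exponential gain instead.

```python
import math

# Illustrative records that follow the required columns of each schema.
queries = [{"_id": "q1", "query": "What is a vector database?"}]
documents = [
    {"_id": "d1", "doc": "A vector database stores embeddings for similarity search."},
    {"_id": "d2", "doc": "Bananas are rich in potassium."},
    {"_id": "d3", "doc": "Embeddings map text to points in a vector space."},
]
relevances = [
    {"query_id": "q1", "doc_id": "d1", "score": 2},
    {"query_id": "q1", "doc_id": "d3", "score": 1},
]


def dangling_references(queries, documents, relevances):
    """Return relevance rows whose query_id or doc_id has no matching _id."""
    query_ids = {q["_id"] for q in queries}
    doc_ids = {d["_id"] for d in documents}
    return [
        row for row in relevances
        if row["query_id"] not in query_ids or row["doc_id"] not in doc_ids
    ]


def build_qrels(relevances):
    """Group relevance rows into {query_id: {doc_id: score}}."""
    qrels = {}
    for row in relevances:
        qrels.setdefault(row["query_id"], {})[row["doc_id"]] = float(row["score"])
    return qrels


def ndcg_at_k(ranked, gains, k):
    """NDCG@K with linear gain; documents without a relevance row count as 0."""
    dcg = sum(gains.get(doc_id, 0.0) / math.log2(i + 2) for i, doc_id in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(i + 2) for i, gain in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


assert dangling_references(queries, documents, relevances) == []
qrels = build_qrels(relevances)
perfect = ndcg_at_k(["d1", "d3", "d2"], qrels["q1"], k=3)   # ideal ordering
swapped = ndcg_at_k(["d3", "d1", "d2"], qrels["q1"], k=3)   # top two swapped
```

Swapping the two relevant documents lowers the score because the higher-graded document is discounted more heavily at the second rank.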

Supported Models

Only Hugging Face embedding models that are compatible with SentenceTransformer (e.g., SentenceTransformer, BERT, RoBERTa) are supported.