Repository evaluations - ox/Rag-Benchmark

Evaluations

Label datasets and evaluate model performance

Oxen.ai lets you run a model row by row over a dataset, either to label the data or to evaluate how well the model performs. Once the run completes, you can save the output to a new file or branch and compare it with the original dataset.
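As a rough illustration of what "row by row" means here, the sketch below applies a model function to each row of a JSONL dataset and writes the result into a new column. It is not Oxen.ai's implementation or API: `run_over_rows`, `echo_model`, the `prediction` column name, and the sample rows are all hypothetical stand-ins, and a real run would call an actual model instead of the stub.

```python
import json
from typing import Callable, Iterable, Iterator


def run_over_rows(
    lines: Iterable[str],
    model: Callable[[dict], str],
    output_column: str = "prediction",
) -> Iterator[dict]:
    """Apply `model` to each JSONL row and yield the row plus a new output column."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        row = json.loads(line)
        # Copy the row so the original columns stay available for comparison.
        yield {**row, output_column: model(row)}


def echo_model(row: dict) -> str:
    """Hypothetical stand-in for a real model call; it just upper-cases the query."""
    return row["query"].upper()


# Made-up sample rows, only to show the shape of the data.
sample = [
    '{"query": "what is rag?", "answer": "retrieval"}',
    '{"query": "who runs it?", "answer": "oxen"}',
]
results = list(run_over_rows(sample, echo_model))
for row in results:
    print(json.dumps(row))
```

Keeping the original columns next to the new one is what makes a later comparison against the original dataset straightforward.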

LLM As A Judge

754a9477-d216-4058-8e15-4a7dfeef53fe

Unknown/gemini-1-5-flash, text → text

1 year ago

Prompt

Are the answers equivalent? Answer "true" or "false". All lowercase.

Answer 1: {answer}
Answer 2: {prediction}
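The prompt above can be read as a template whose `{answer}` and `{prediction}` placeholders are filled from the matching columns of each row. A minimal sketch of that substitution, plus one way the judge's "true"/"false" replies might be scored, is shown below. The helper names (`build_judge_prompt`, `parse_verdict`, `accuracy`) are hypothetical, and treating any reply other than "true" or "false" as unscored is an assumption rather than documented behaviour.

```python
from typing import Iterable, Optional

# The judge prompt from this evaluation, with placeholders for two row columns.
JUDGE_PROMPT = (
    'Are the answers equivalent? Answer "true" or "false". All lowercase.\n'
    "\n"
    "Answer 1: {answer}\n"
    "Answer 2: {prediction}"
)


def build_judge_prompt(row: dict) -> str:
    """Fill the template from the row's `answer` and `prediction` columns."""
    return JUDGE_PROMPT.format(answer=row["answer"], prediction=row["prediction"])


def parse_verdict(reply: str) -> Optional[bool]:
    """Map the judge's reply to True/False; return None for anything else (assumption)."""
    text = reply.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    return None


def accuracy(verdicts: Iterable[Optional[bool]]) -> float:
    """Fraction of scored rows judged equivalent; unscored rows are ignored."""
    scored = [v for v in verdicts if v is not None]
    return sum(scored) / len(scored) if scored else 0.0


prompt = build_judge_prompt({"answer": "42", "prediction": "forty-two"})
print(prompt)
```

Normalising the reply before comparing guards against stray whitespace or capitalisation, even though the prompt asks for lowercase output.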

gemini-flash

rag_instruct_benchmark_tester.jsonl

gemini-flash-results

rag_instruct_benchmark_tester.jsonl

completed, 200 rows, 11469 tokens, $0.0002, 3 iterations

RAG Benchmark

745c13fb-05e3-4abd-80ba-408596c83c18

Unknown/gemini-1-5-flash, text → text

1 year ago

Prompt

What is the answer to the question given the context? Only reply with text that is contained in the context.

Question:
{query}

Context:
{context}

Answer:
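Since the prompt above asks the model to reply only with text contained in the context, one simple sanity check on a prediction is whether it appears verbatim in that context. The sketch below fills the `{query}` and `{context}` placeholders and performs such a check. The helper names (`build_rag_prompt`, `is_grounded`) are hypothetical, and the case- and whitespace-insensitive substring match is an assumed heuristic, not something this evaluation is known to run.

```python
# The RAG prompt from this evaluation, with placeholders for two row columns.
RAG_PROMPT = (
    "What is the answer to the question given the context? "
    "Only reply with text that is contained in the context.\n"
    "\n"
    "Question:\n"
    "{query}\n"
    "\n"
    "Context:\n"
    "{context}\n"
    "\n"
    "Answer:"
)


def build_rag_prompt(row: dict) -> str:
    """Fill the template from the row's `query` and `context` columns."""
    return RAG_PROMPT.format(query=row["query"], context=row["context"])


def _normalize(text: str) -> str:
    """Lower-case and collapse runs of whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def is_grounded(prediction: str, context: str) -> bool:
    """True if the non-empty prediction occurs in the context (assumed heuristic)."""
    needle = _normalize(prediction)
    return bool(needle) and needle in _normalize(context)


# Made-up example row, only to show the shape of the data.
row = {
    "query": "When was the invoice issued?",
    "context": "The invoice was issued on March 3 and is due in 30 days.",
}
rag_prompt = build_rag_prompt(row)
print(rag_prompt)
print(is_grounded("March 3", row["context"]))
```

A check like this only tests that the reply is extractive; whether the reply is correct is what the separate judge evaluation addresses.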

main

rag_instruct_benchmark_tester.jsonl

gemini-flash

rag_instruct_benchmark_tester.jsonl

completed, 200 rows, 63477 tokens, $0.0013, 1 iteration
