Building a Question Answering Test Collection


Change Log

When What
June 15th, 2015 Donated by Voorhees, Ellen M and Tice, Dawn M.](;


Studies who have been using the data (in any form) are required to include the following reference:

@ARTICLE {voorheesellenmticedawnm.2000,
author = "Voorhees, Ellen M and Tice, Dawn M.",
title = "Building a Question Answering Test Collection",
journal = "Proceedings of SIGIR-2000",
year = "2000",
pages = "200-207",
month = "jul"

About the Data

Overview of Data

The questions were mostly contributed by participants or developed by NIST assessors. A few questions were mined from the FAQFinder system’s logs. The test set consisted of 200 questions. Two questions, questions 131 and 184, were not used when computing final scores because it was determined during judging that the document collection did not contain an answer to those questions.

TREC-8 development set of questions with answers. This set of questions was distributed to participants so they could train their systems. The format for each question is a question number, the text of the question, and then one or more answer strings followed by the id of a document that supports the answer.


The TREC-8 Question Answering (QA) Track was the firstlarge-scale evaluation of domain-independent question answer-ing systems. In addition to fostering research on the QA task,the track was used to investigate whether the evaluation method-ology used for document retrieval is appropriate for a differentnatural language processing task. As with document relevancejudging, assessors had legitimate differences of opinionsas towhether a response actually answers a question, but comparativeevaluation of QA systems was stable despite these differences.Creating a reusable QA test collection is fundamentally moredifficult than creating a document retrieval test collection sincethe QA task has no equivalent to document identifiers.