Evaluation of Search Methods on Community Documents

Kushagra Singh Bisen et al.
Presentation @ MTSR 2022
08/11/2022
Some Background

Searching for domain-specific information on the web is,

  • Kind of Tough
  • Requires a dedicated platform
Some Background

We have different search methods over community documents, but they are,

  • Generally evaluated on efficiency, not user experience.
  • Too different in implementation to be compared directly.
Context of the Research

The research is in the context of the WikiDisability Project, which aims to make disability-specific documents more accessible to NGOs, stakeholders and people.

The documents involved are either web-based blogs or electronic documents, represented as free text (PDF) and as structured data (RDF, on a wikibase instance).

We wished to compare different search methods for the best user-experience of the stakeholders involved.

Search Methods

We chose the following search methods to compare,

  • QAnswer Search over RDF datasets → for the wikibase
  • Elastic Search over Documents → for the documents.
  • QAnswer Search over Documents → for the documents.
Evaluation Environment

The wikibase dataset and the documents were uploaded for every user.

There were 24 documents and 17 candidates for the experiment.

Two different questionnaires were provided to the user,

  • Search Instruction Questionnaire
  • User Experience Questionnaire

Search Instruction Questionnaire
  • Instructions to search were search-method agnostic.
  • A 7-point Likert scale was provided to record the relevance of the information retrieved.
  • 6 search instructions were provided per search method (5 had answers, 1 did not).
  • A 2-minute threshold was set for each search.
User Experience Questionnaire
  • Chosen as it provides a benchmark for comparison.
  • UEQ lets users express feelings, impressions and attitudes towards a method.
  • The questionnaire contains 26 items, divided into 6 scales.
  • UEQ captures pragmatic quality (goal-related aspects such as efficiency and ease of use) and hedonic quality (stimulation and novelty), alongside overall attractiveness.
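As background, UEQ scale values are commonly computed by recoding the 7-point item answers to the range -3..+3 and averaging the items of each scale. A minimal sketch (the grouping of items into scales is up to the caller, and this simplification ignores the reverse-coded items of the real UEQ):

```python
from statistics import mean

def ueq_scale_score(answers):
    """Score one UEQ scale: recode 7-point answers (1..7) to -3..+3,
    then take the mean. Assumes all items are coded in the same
    direction (the official UEQ reverse-codes some items)."""
    return mean(a - 4 for a in answers)

# Four hypothetical item answers for one scale of one participant
score = ueq_scale_score([5, 4, 3, 6])  # recodes to [1, 0, -1, 2]
```

Per-method scale scores like those in the results tables are then the mean of these participant-level scores.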
Results

ESDoc provided the most, and the most relevant, answers, followed by QAnswer KG and QADoc.

ESDoc also gave a false sense of relevance for the instruction with no answer.

Results

Likert scale scores for search methods on the relevance of information retrieved

Results

Percentage of users who found an answer

Results

The scores obtained from UEQ on different scales

Scales QAnswer KG ESDoc QADoc
Attractive -0.272 -0.114 -0.433
Perspicuity -0.014 -1.205 -0.05
Efficiency -0.22 0.014 -0.583
Dependability -0.132 -0.014 -0.266
Stimulation -0.161 0.0588 -0.1
Novelty -0.088 -0.191 0.266

The scores obtained in the experiment fall into the "bad" category for all scales in the UEQ benchmark.

Results

One way ANOVA test

Scale F-Ratio P-Value
Attractive 1.269 0.29
Perspicuity 36.20 < 0.001
Efficiency 5.284 0.008
Dependability 0.861 0.429
Stimulation 1.78 0.179
Novelty 3.2 0.049

  • No statistically significant difference between the methods on the Attractive, Dependability and Stimulation scales.
  • Statistically significant differences on the Perspicuity, Efficiency and Novelty scales.
  • We therefore ran the Tukey-Kramer test to locate the pairwise differences between methods.
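The one-way ANOVA above can be sketched in plain Python. The scores below are made-up per-user scale values, not the study data:

```python
from statistics import mean

def one_way_anova_f(groups):
    """F-ratio for a one-way ANOVA over a list of groups
    (each group is a list of per-user scores for one search method)."""
    k = len(groups)                        # number of groups
    n_total = sum(len(g) for g in groups)  # total number of scores
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each score around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    # F = mean square between / mean square within
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical per-user scores for three methods (not the study data)
f_ratio = one_way_anova_f([[1, 2, 2, 3, 1],   # e.g. QAnswer KG
                           [4, 5, 4, 5, 4],   # e.g. ESDoc
                           [2, 1, 2, 2, 3]])  # e.g. QADoc
```

The resulting F-ratio is compared against the F-distribution with (k - 1, n - k) degrees of freedom to obtain the p-values reported in the table.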

Results

qtukey values from Tukey-Kramer Test

Critical value: 3.425

Scale Groups qtukey
Perspicuity QAnswerKG vs ESDoc 10.6742
ESDoc vs QADoc 10.029
QAnswerKG vs QADoc 0.306
Efficiency QAnswerKG vs ESDoc 1.86
ESDoc vs QADoc 4.579
QAnswerKG vs QADoc 2.777
Novelty QAnswerKG vs ESDoc 0.798
ESDoc vs QADoc 3.439
QAnswerKG vs QADoc 2.665

We see that there is a significant difference between,

  • ESDoc vs QAnswerKG and ESDoc vs QADoc on the Perspicuity scale
  • ESDoc vs QADoc on both the Novelty and Efficiency scales
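The qtukey statistic for each pair is the absolute difference of the two group means divided by the standard error sqrt(MSW / n), where MSW is the within-group mean square from the ANOVA. A sketch with hypothetical scores (assuming equal group sizes):

```python
from math import sqrt
from statistics import mean

def tukey_q(group_a, group_b, ms_within):
    """Studentized-range statistic for two equal-sized groups, given
    the within-group mean square (MSW) from the one-way ANOVA."""
    n = len(group_a)  # assumes len(group_a) == len(group_b)
    return abs(mean(group_a) - mean(group_b)) / sqrt(ms_within / n)

# Hypothetical scores for two methods, with an MSW of 0.5 from the ANOVA
q = tukey_q([4, 5, 4, 5, 4], [1, 2, 2, 3, 1], ms_within=0.5)
# A pair differs significantly when q exceeds the critical value
# looked up in the studentized-range table for k groups and n - k df.
```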

Results
Pragmatic and Hedonic Values from UEQ for each search method

  • QAnswer KG was found to be the most efficient and useful.
  • QADoc was found to be the most pleasant to interact with.

Conclusion

We presented a user-experience focused evaluation of search methods on domain-specific documents.

Elastic Search over Documents provided relevant answers, but also conveyed a false sense of relevance.

For non-exploratory question answering with an exact answer, we need more than ESDoc.

QADoc was perceived as innovative, but did not perform well for information retrieval.

Conclusion

We believe that there is a need to combine various search methods for different types of questions.

We therefore developed a demo combining the different search methods, with fallback from one to the other.

We employed the wikibase to store data about the document, and QADoc for the data inside the document.

If neither returns a confident answer, we fall back to an Elastic Search with the keywords highlighted.

We further plan to introduce a new set of documents to repeat the experiment with the concluded combined search demo to evaluate the differences.

Thank you for your time, questions?

@argahsuknesib