Evaluation of Search Methods on Community Documents

Kushagra Singh Bisen et al.
Presentation @ MTSR 2022
08/11/2022
Some Background

Searching for domain-specific information on the web is,

  • Kind of Tough
  • Requires a dedicated platform
Some Background

We have different search methods over community documents, but they are,

  • Generally evaluated on efficiency, not user experience.
  • Too different in implementation to be compared directly.
Context of the Research

The research is in the context of the WikiDisability Project, which aims to make disability-specific documents more accessible to NGOs, stakeholders and people.

The documents involved are either web-based blogs or electronic documents, represented as free text (PDF) and as structured data (RDF, on a wikibase instance).

We wished to compare different search methods for the best user-experience of the stakeholders involved.

Search Methods

We chose the following search methods to compare,

  • QAnswer Search over RDF datasets → for the wikibase
  • Elastic Search over Documents → for the documents.
  • QAnswer Search over Documents → for the documents.
Evaluation Environment

The wikibase dataset and the documents were uploaded for every user.

There were 24 documents and 17 candidates for the experiment.

Two different questionnaires were provided to the user,

  • Search Instruction Questionnaire
  • User Experience Questionnaire

Search Instruction Questionnaire
  • Instructions to search were search-method agnostic.
  • A 7-point Likert scale was provided to record the relevance of the information retrieved.
  • 6 search instructions were provided per search method (5 had answers, 1 did not).
  • A 2-minute threshold was set for each search.
User Experience Questionnaire
  • Chosen as it provides a benchmark for comparison.
  • UEQ lets users express feelings, impressions and attitudes towards a method.
  • The questionnaire contains 26 items, divided into 6 scales.
  • UEQ captures pragmatic quality (goal-related aspects such as efficiency and ease of use) and hedonic quality (stimulation and novelty), alongside overall attractiveness.
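As background, UEQ scale values are commonly computed by recoding the 7-point item answers to the range -3..+3 and averaging the items of each scale. A minimal sketch (the grouping of items into scales is up to the caller, and this simplification ignores the reverse-coded items of the real UEQ):

```python
from statistics import mean

def ueq_scale_score(answers):
    """Score one UEQ scale: recode 7-point answers (1..7) to -3..+3,
    then take the mean. Assumes all items are coded in the same
    direction (the official UEQ reverse-codes some items)."""
    return mean(a - 4 for a in answers)

# Four hypothetical item answers for one scale of one participant
score = ueq_scale_score([5, 4, 3, 6])  # recodes to [1, 0, -1, 2]
```

Per-method scale scores like those in the results tables are then the mean of these participant-level scores.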
Results

ESDoc provided the most, and the most relevant, answers, followed by QAnswer KG and QADoc.

ESDoc also gave a false sense of relevance for the instruction with no answer.

Results

Likert scale scores for search methods on the relevance of information retrieved

Results

Percentage of users who found an answer

Results

The scores obtained from UEQ on different scales

Scales QAnswer KG ESDoc QADoc
Attractive -0.272 -0.114 -0.433
Perspicuity -0.014 -1.205 -0.05
Efficiency -0.22 0.014 -0.583
Dependability -0.132 -0.014 -0.266
Stimulation -0.161 0.0588 -0.1
Novelty -0.088 -0.191 0.266

The scores obtained in the experiment fall into the "bad" category for all scales in the UEQ benchmark.

Results

One way ANOVA test

Scale F-Ratio P-Value
Attractive 1.269 0.29
Perspicuity 36.20 < 0.001
Efficiency 5.284 0.008
Dependability 0.861 0.429
Stimulation 1.78 0.179
Novelty 3.2 0.049

  • No statistically significant difference between the methods on the Attractive, Dependability and Stimulation scales.
  • Statistically significant differences on the Perspicuity, Efficiency and Novelty scales.
  • We therefore ran the Tukey-Kramer test to locate the pairwise differences between methods.
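The one-way ANOVA above can be sketched in plain Python. The scores below are made-up per-user scale values, not the study data:

```python
from statistics import mean

def one_way_anova_f(groups):
    """F-ratio for a one-way ANOVA over a list of groups
    (each group is a list of per-user scores for one search method)."""
    k = len(groups)                        # number of groups
    n_total = sum(len(g) for g in groups)  # total number of scores
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each score around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    # F = mean square between / mean square within
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical per-user scores for three methods (not the study data)
f_ratio = one_way_anova_f([[1, 2, 2, 3, 1],   # e.g. QAnswer KG
                           [4, 5, 4, 5, 4],   # e.g. ESDoc
                           [2, 1, 2, 2, 3]])  # e.g. QADoc
```

The resulting F-ratio is compared against the F-distribution with (k - 1, n - k) degrees of freedom to obtain the p-values reported in the table.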

Results

qtukey values from Tukey-Kramer Test

Critical value: 3.425

Scale Groups qtukey
Perspicuity QAnswerKG vs ESDoc 10.6742
ESDoc vs QADoc 10.029
QAnswerKG vs QADoc 0.306
Efficiency QAnswerKG vs ESDoc 1.86
ESDoc vs QADoc 4.579
QAnswerKG vs QADoc 2.777
Novelty QAnswerKG vs ESDoc 0.798
ESDoc vs QADoc 3.439
QAnswerKG vs QADoc 2.665

We see that there is a significant difference between,

  • ESDoc vs QAnswerKG and ESDoc vs QADoc on the Perspicuity scale
  • ESDoc vs QADoc on both the Novelty and Efficiency scales
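The qtukey statistic for each pair is the absolute difference of the two group means divided by the standard error sqrt(MSW / n), where MSW is the within-group mean square from the ANOVA. A sketch with hypothetical scores (assuming equal group sizes):

```python
from math import sqrt
from statistics import mean

def tukey_q(group_a, group_b, ms_within):
    """Studentized-range statistic for two equal-sized groups, given
    the within-group mean square (MSW) from the one-way ANOVA."""
    n = len(group_a)  # assumes len(group_a) == len(group_b)
    return abs(mean(group_a) - mean(group_b)) / sqrt(ms_within / n)

# Hypothetical scores for two methods, with an MSW of 0.5 from the ANOVA
q = tukey_q([4, 5, 4, 5, 4], [1, 2, 2, 3, 1], ms_within=0.5)
# A pair differs significantly when q exceeds the critical value
# looked up in the studentized-range table for k groups and n - k df.
```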

Results
Pragmatic and Hedonic Values from UEQ for each search method

  • QAnswer KG was found to be the most efficient and useful.
  • QADoc was found to be the most pleasant to interact with.

Conclusion

We presented a user-experience focused evaluation of search methods on domain-specific documents.

Elastic Search over Documents provided relevant answers, but also conveyed a false sense of relevance.

For non-exploratory question answering with an exact answer, we need more than ESDoc.

QADoc was perceived as innovative, but did not perform well for information retrieval.

Conclusion

We believe that there is a need to combine various search methods for different types of questions.

We therefore developed a demo combining the different search methods, with fallback from one to the other.

We employed the wikibase to store data about the document, and QADoc for the data inside the document.

If neither returns a confident answer, we fall back to an Elastic Search with the keywords highlighted.

We further plan to introduce a new set of documents to repeat the experiment with the concluded combined search demo to evaluate the differences.

Thank you for your time, questions?

@argahsuknesib