Learning From Free-Text Human Feedback: Examples of Human-Human Dialogs

17 Jan 2024


(1) Dominic Petrak, UKP Lab, Department of Computer Science, Technical University of Darmstadt, Germany;

(2) Nafise Sadat Moosavi, Department of Computer Science, The University of Sheffield, United Kingdom;

(3) Ye Tian, Wluper, London, United Kingdom;

(4) Nikolai Rozanov, Wluper, London, United Kingdom;

(5) Iryna Gurevych, UKP Lab, Department of Computer Science, Technical University of Darmstadt, Germany.

Table of Links

Abstract & Introduction

Related Work

Datasets Examined

Manual Error Type Analysis and Taxonomies

Automatic Filtering for Potentially Relevant Dialogs

Statistical Analysis

Evaluation and Experiments


Conclusion, Limitation, Acknowledgments, and References

A Integrated Error Taxonomy – Details

B Error-Indicating Sentences And Phrases

C Automatic Filtering – Implementation

D Automatic Filtering – Sentence-Level Analysis

E Task-Oriented Dialogs – Examples

F Effectiveness Of Automatic Filtering – A Detailed Analysis

G Inter-Annotator Agreement – Detailed Analysis

H Annotation Guidelines

I Hyperparameters and Baseline Experiments

J Human-Human Dialogs – Examples

J Human-Human Dialogs – Examples

We find that it is hard to identify errors in human-human dialogs, as the tone is generally polite, even when one of the partners is wrong. The following dialog is an example from WoW (Dinan et al., 2019):

1. USER: I adore photography even though I’m bad at it. It was never a good idea to give me access to a DSLR camera...

1. SYSTEM: Well, photography is a science. A little practice and you’d get it down.

2. USER: You think so? I think most people, myself included, would consider in an art. But I suppose it’s true that there is a lot of technical detail to it.

2. SYSTEM: I personally do it as a hobby but many do in businesses, film ect. In my opinion it’s always art. i love it because it can capture what your heart see’s. If you angle it just right

Here, one partner claims that photography is a science, although it is commonly considered a visual art. Instead of correcting this directly, the other partner only suggests that the claim might be wrong. The following example is from PC (Zhang et al., 2018):

1. USER: Hi, how it is going today?

1. SYSTEM: It is going great today; everyday is great in my ford truck.

2. USER: Oh. I am eating a piece of fried chicken.

2. SYSTEM: Right on. I do not allow fried chicken in my ford truck.

3. USER: What do you do for a living?

3. SYSTEM: I work on ford trucks. Do you have a ford truck?

4. USER: No , i drive a toyota.

4. SYSTEM: I am so sorry. You should get yourself a ford truck. Mine is black .

5. USER: I am not. I love my toyota .

5. SYSTEM: Well okay. How about a ford car? They are just as good.

6. USER: Nope, hardcore toyota family. My wife drives a 4runner

6. SYSTEM: Why do they call it a 4runner?

7. USER: Because it will run forever.

7. SYSTEM: Oh, I thought it was because 4 wheeler was taken.

8. USER: Oh. That does not make much sense.

8. SYSTEM: Are you kidding me? Do you know any jokes?

In the second-to-last turn, one partner makes a joke that the other does not understand. The response is quite polite: the other partner merely states that the remark does not make much sense.

This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.
