
Making Chatbots More Conversational

Using Follow-Up Questions for Maximizing the Informational Value in Evaluation Responses

Written by Pontus Hilding

Paper category

Master Thesis






Data

This section introduces the project data. It covers the data available at the start of the project and the processes of training and validating the model, obtaining follow-up questions, and annotating sentences. It also outlines general guidelines for when to ask a follow-up question.

3.1.1 Available data

Student data set: The data available at the start of the project comes from the company described in Section 1 and consists of approximately 50,000 user responses from the education field. The respondents were mainly high school and college students living in the United States.

Each answer in the set is a response to a question that follows the start-stop-continue (SSC) review technique [8]. The SSC approach encourages users to reflect on a recent or ongoing experience by asking three related questions whose answers can help improve that experience going forward. The first question asks what should be started to improve the object, the second what should be stopped entirely, and the third what should continue. Each answer in the set therefore responds to one of the following questions:

• What should [inquirer] start doing to improve [evaluation object]?
• What should [inquirer] stop doing to improve [evaluation object]?
• What works well with [evaluation object] and should continue in the same way?

In addition, each line of data contains information about any pre-marked entities in the text. The entity pool lies entirely within the education field and therefore consists of entities such as homework, teachers, and visual objects. An example of an initial question and its answer is shown in Table 1.

Word lists: The following external word lists were available at the start of the project:

• A word list containing 527,082 English words, including American, British, and Canadian spelling variants [10].
• A list of 179 English stop words [11].
The following four data sets are used by utilities explained later in the thesis:

OntoNotes Corpus: OntoNotes 5.0 is a manually annotated corpus containing 2.9 million words. The corpus contains texts from different sources, such as news and talk shows. The annotation includes coreference, name types, and parse trees [12].

GloVe: Pre-trained 300-dimensional word vectors from the Stanford NLP group. The data is based on the 2014 edition of Wikipedia and the English Gigaword 5 corpus [13].

Google News: Pre-trained word vectors from Google. The vocabulary contains about 3 million words, each represented by a 300-dimensional vector [14].

spaCy: spaCy provides different models for tasks such as part-of-speech tagging and entity tagging. The model used in the project is the medium-sized English model en_core_web_md [15].

3.2.1 When to ask follow-up questions

Not all answers require follow-up questions. If the user answers a question and the answer contains enough information to satisfy the inquirer, the user should not be asked a follow-up question (see Objective 1, Section 1). To determine what a sufficiently satisfactory answer actually is, a set of predetermined rules or definitions must be followed.

Since the initial question is of the start-stop-continue type, the main goal is to find an improvement to the evaluation object, so the answer needs to contain some useful information. The best response states not only what needs to be improved, but also why and how to do it. This is illustrated in Figure 2. An answer that contains all three components, such as answer 3 in Table 2, is considered sufficiently detailed for the inquirer to extract the maximum value from it. The system should therefore not ask further questions of users who give answers of this caliber. In most cases, an answer containing only the first two components, such as answer 2 in the same table, should receive a follow-up question.
However, in some cases a follow-up is unnecessary even then and may only annoy the user. An example is the answer: "The homework is terrible due to repetitive tasks. The first assignment has a lot of math that I don't like, and the second one is not explained correctly in class." Here, although no specific "how" is proposed, the "why" is explained in enough detail that the inquirer can be assumed to have extracted sufficient information about the area of the object that needs improvement.

Based on this, the following definition is used to determine whether a follow-up question is needed. If one of the following conditions is true, a follow-up question is required:

• The answer does not include all of the components what, why, and how.
• If the how is missing, the why is not explained in enough detail for the inquirer to easily draw their own conclusions about the improvements needed.

... party. This was considered a safer way to obtain high-quality data in the early stages of the project, without the randomness that outsourcing the task might introduce, as described in the next section.

During this process, 1528 random answers from the data set were manually marked as 1 (follow-up required) or 0 (no follow-up required). The distribution of 1s and 0s is moderately uniform, which indicates that a data set composed mainly of answers from American high school students is well suited to this binary classification task.
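The follow-up definition given earlier in this section can be read as a small decision rule. The sketch below is one possible interpretation, not the implementation used in the thesis. The names AnswerComponents and needs_follow_up are hypothetical, and the boolean fields are assumed to come from a human annotator or an upstream classifier. It treats the second condition as an exception to the first, so that an answer like the homework example is not followed up.

```python
from dataclasses import dataclass


@dataclass
class AnswerComponents:
    """Hypothetical annotation of one answer (not a structure from the thesis)."""
    has_what: bool                 # states what should be improved
    has_why: bool                  # gives a reason
    has_how: bool                  # proposes a way to improve
    why_is_detailed: bool = False  # the why lets the inquirer infer the improvement


def needs_follow_up(answer: AnswerComponents) -> int:
    """Return 1 if a follow-up is required, else 0 (same labels as the annotation)."""
    # All three components present: nothing more to ask.
    if answer.has_what and answer.has_why and answer.has_how:
        return 0
    # Assumed exception: only the how is missing, but the why is detailed enough.
    if answer.has_what and answer.has_why and answer.why_is_detailed:
        return 0
    return 1


# The homework example: what and a detailed why, but no how.
homework = AnswerComponents(has_what=True, has_why=True, has_how=False,
                            why_is_detailed=True)
print(needs_follow_up(homework))
```

Returning 1 or 0 keeps the output aligned with the labels used for the manually annotated answers, so the rule could be compared against them directly.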