AI Meeting Monitoring

Written by Andreas Hansson

Paper category

Master Thesis


Computer Science




Thesis: Machine learning

As more and more work is done on computers, the amount of data keeps growing. Processing all this data requires big data algorithms, but developing such algorithms for a specific problem can be very difficult or even infeasible. Machine learning models can be developed where more traditional algorithms do not work. A conventional program runs a fixed set of instructions that specify how to solve its task, whereas a machine learning model derives its instructions from a set of data. Two prominent application areas of machine learning are email filtering and computer vision. Machine learning is a method in computer science for teaching computers to recognize patterns, including patterns that humans may not recognize, without explicit instructions on how to solve the task [8].

Detecting a person in a digital image or video is a biometric software application. This means that the system works with real-world data rather than artificial data such as text. The most common example of biometric software is the fingerprint reader that unlocks a mobile phone.

Audio classification is a method of labeling and classifying the different audio sources in unlabeled audio. This helps preserve important moments of a conversation and makes transcription easier and more accurate. It also makes the audio searchable, which is not currently possible. Processing audio data is a tedious task: finding information about a specific speaker takes hours and computing power. Machine learning is well suited to such problems. The input is an unlabeled audio file, and the output is the same file with a timeline showing when each participant is speaking.

Similar to voice detection, face detection uses raw video as input. The output is the video with a bounding square drawn around each detected face.
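The speaker timeline described above can be pictured as a list of labeled time segments. The sketch below is only an illustration of that output format, not the representation used in the thesis: the `Segment` class, the speaker labels, and the helper functions are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # hypothetical label, e.g. "speaker_1"
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording


def speaking_time(timeline):
    """Return the total number of seconds each speaker talks."""
    totals = {}
    for seg in timeline:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals


def speakers_at(timeline, t):
    """Return the speakers who are talking at time t (in seconds)."""
    return sorted({seg.speaker for seg in timeline if seg.start <= t < seg.end})


# Made-up timeline for a 12-second recording with two participants.
timeline = [
    Segment("speaker_1", 0.0, 4.5),
    Segment("speaker_2", 4.5, 9.0),
    Segment("speaker_1", 9.0, 12.0),
]

print(speaking_time(timeline))
print(speakers_at(timeline, 5.0))
```

Storing the result as segments rather than as a per-sample label keeps the timeline small and makes queries such as "who speaks at second 5?" a simple filter.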
2.1.1 Supervised learning and unsupervised learning

In its simplest form, a model accepts an input and returns an output. The output can be a value or, for a classification model, a class. The output is determined by a set of parameters. During training, these parameters are set so that the output is as close as possible to the desired output.

In machine learning, the two most common methods are supervised learning and unsupervised learning. In supervised learning, the algorithm builds a mathematical model from a set of inputs and their desired outputs. In computer vision, an input can be an image, and the desired output states whether or not it contains a specific object. Unsupervised learning builds a model from input data alone, with no output data. Fruit classification is an example. With supervised training, the input is an image together with a label stating what kind of fruit it shows. With unsupervised training, the input is just the images.

2.1.3 Data sets and how to train the model

In the model training process, a data set is divided into three different sets: training, validation, and test. The training set is used to train the model, which sets the weights of the model. The validation set is used to check the progress of training and, on that basis, to adjust the model structure or change the training parameters. The last set is the test set. It is used once the model is complete, to evaluate how well the model performs.

The training set is the set from which the model learns. This is where the weight matrices are calculated. In supervised learning, the input to the model is given as a vector pair containing one input and one output (the target). The model runs through the training set and, for each vector pair, produces a result that is compared with the target. The weights of the model are adjusted according to the result of the comparison.

The validation set is the second set; it provides an unbiased evaluation of the model.
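The three-way split and the supervised weight update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the thesis: the 70/15/15 split, the one-weight model `y = w * x`, the toy data, and the function names `split_dataset`, `train`, and `mean_squared_error` are all chosen for this example.

```python
import random


def split_dataset(pairs, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle (input, target) pairs and split them into train/validation/test."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train_part = shuffled[:n_train]
    val_part = shuffled[n_train:n_train + n_val]
    test_part = shuffled[n_train + n_val:]
    return train_part, val_part, test_part


def mean_squared_error(w, pairs):
    """Average squared difference between the model output and the target."""
    return sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)


def train(train_pairs, epochs=200, learning_rate=0.01):
    """Fit y = w * x by comparing each output with its target and adjusting w."""
    w = 0.0
    for _ in range(epochs):
        for x, y in train_pairs:
            error = w * x - y                # model output minus target
            w -= learning_rate * error * x   # gradient step on 0.5 * error**2
    return w


# Toy data (assumption): the target is always 3 times the input.
pairs = [(x / 10, 3.0 * x / 10) for x in range(1, 21)]
train_set, val_set, test_set = split_dataset(pairs)

w = train(train_set)                          # weights come from the training set only
val_error = mean_squared_error(w, val_set)    # used to tune epochs, learning rate, ...
test_error = mean_squared_error(w, test_set)  # looked at once, after tuning is done
print(w, val_error, test_error)
```

Only `train_set` ever changes the weight. `val_error` is the number one would watch while changing `epochs` or `learning_rate`, and `test_error` is computed once at the end so that it stays an independent estimate.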
Based on the evaluation on the validation set, the training parameters of the model are adjusted. Because this evaluation is unbiased, it shows how well the model has been fitted on the training set. In a neural network, for example, the adjustment can mean increasing or decreasing the number of hidden units (layers and layer widths). When this part of the training is over, the final model is obtained.

Depending on the learning method used, the training process may also include a third set, the test set. The test set is completely independent of the other sets and is not used to improve the model. It is used to evaluate the performance of the model without modifying it, which gives an indication of how the model will behave when deployed.

2.2 Neural network

A neural network is a concept that imitates the human brain and is used to recognize patterns. Sections 2.2.1 to 2.3.1 cover some existing neural networks. Figure 2.2 shows a simple abstraction of a neural network and how the different layers work together. The connections between the layers have weights attached to them. A positive weight indicates an excitatory connection, and a negative weight indicates an inhibitory connection. First, the input is distributed across the hidden layers, which focus on different characteristics. The outputs of the hidden layers are then combined into a single output. The output is usually normalized to the range 0 to 1, but it can also be in the range -1 to 1.

2.2.1 Recurrent Neural Network

A Recurrent Neural Network (RNN) is an artificial neural network in which nodes are connected along a sequence. Unlike feedforward networks, an RNN uses an internal state (memory) to process variable-length input sequences. With this memory, the network can use information about the sequence itself, not just the current input. This information is hidden.
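The role of the internal state can be illustrated with a small recurrent step. The sketch below assumes a simple Elman-style RNN with a `tanh` activation, which the text does not specify; the function name `rnn_forward` and the weight values are made up for this example.

```python
import math


def rnn_forward(inputs, w_xh, w_hh, b_h):
    """Run a simple RNN over a sequence and return the hidden state after each step.

    inputs: list of input vectors (the sequence may have any length)
    w_xh:   input-to-hidden weights, one row per hidden unit
    w_hh:   hidden-to-hidden weights, one row per hidden unit
    b_h:    bias, one value per hidden unit
    """
    hidden_size = len(b_h)
    h = [0.0] * hidden_size  # internal state (memory), initially empty
    states = []
    for x in inputs:
        new_h = []
        for i in range(hidden_size):
            total = b_h[i]
            total += sum(w_xh[i][j] * x[j] for j in range(len(x)))       # current input
            total += sum(w_hh[i][k] * h[k] for k in range(hidden_size))  # previous state
            new_h.append(math.tanh(total))  # keeps every value between -1 and 1
        h = new_h
        states.append(h)
    return states


# Made-up weights: one input feature, two hidden units.
w_xh = [[0.5], [-0.3]]
w_hh = [[0.1, 0.2], [0.0, 0.4]]
b_h = [0.0, 0.0]

seq_a = [[1.0], [0.0]]  # ends with input 0.0
seq_b = [[0.0], [0.0]]  # also ends with input 0.0
print(rnn_forward(seq_a, w_xh, w_hh, b_h)[-1])
print(rnn_forward(seq_b, w_xh, w_hh, b_h)[-1])
```

Both sequences end with the same input, yet the final states differ, because the state after the first step is fed back into the second step. The same function also accepts sequences of any length, since the loop simply runs once per element.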