Cognitive Search Engine Optimization

Written by Joakim Edlund

Paper category

Bachelor Thesis
Computer Science

2.1.1 Search engine

A search engine, usually a website, is an information retrieval system that collects and indexes information from different sources on the Internet. The search engine then lets users find the information they need in the collected content through a query interface. Most commonly this is a text input field; the engine interprets the input text and matches it against the indexed information. The results are typically presented to the user as a list of links pointing to the collected content that the search engine considers relevant to the query. Examples of well-known and frequently used search engines are Google, Bing, and Baidu. These search engines index the World Wide Web and make it available to anyone.

2.1.2 Types of search engines

Crawler-based search engines
Crawler-based search engines use crawling robots, or spiders, to index information into the search engine's underlying database. A crawler-based search engine usually operates in four steps:

1. Crawl information.
2. Index the documents into the search engine database.
3. Calculate the relevance of all documents.
4. Retrieve the results for incoming search queries.

This is the most common type among today's popular search engines (such as Google, Bing, and Baidu). It is also the type of search engine used in this study, and a more in-depth description of these steps can be found in the following sections.

Manual directories
In contrast to automatic crawlers, manual directories are indexed by hand. The owner of a website submits a short description of the site to the search engine database. The submission is then reviewed manually by a database administrator and either added to an appropriate category or rejected. This means that the search engine's ranking is based only on the description and keywords submitted by the website owner, without taking changes to the page content into account.
After the great success of automated engines such as Google, this type of search engine is no longer in use.

Hybrid search engines
A hybrid search engine mixes crawling and manual directories. It uses crawler-based information retrieval robots to collect information, but applies manual methods to improve the ranking of the results. For example, a hybrid search engine can display manually submitted descriptions in its search results, but rank them based on a mixture of the submitted descriptions and information crawled from the websites themselves.

Others
In addition to the types of search engines mentioned above, there are search engines dedicated to specific types of media. For example, Google has a search engine dedicated to image search. Such engines often use other techniques, since searching for media may require the engine to analyze, for example, the contextual meaning of the search query.

2.1.3 Search engine software

Many different organizations provide search engine software. The website db-engines [3] ranks search engine software by popularity and updates the list every month; the site gives a detailed description of how the ranking is calculated. At the time of writing, the top three search engine software products are the following.

Elasticsearch
Elasticsearch is a distributed open-source search engine, used by many large companies around the world to power and accelerate their search functionality. Elasticsearch provides tools for indexing documents and building complex search queries. A default installation of Elasticsearch builds an inverted index, mapping each word in the indexed documents to all the documents in which that word appears, together with its positions within those documents. By default, it uses different techniques for different data types in order to optimize search responses. It is a highly customizable search engine that can easily be adapted to specific use cases [4].
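The inverted index described above can be sketched in a few lines of Python. This is a simplified illustration of the general data structure, not Elasticsearch's actual implementation; the whitespace tokenizer and the document layout are assumptions made for the example:

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each word to the (document id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        # Naive tokenization: lowercase and split on whitespace.
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return index

docs = {
    "doc1": "search engines index the web",
    "doc2": "crawlers collect pages for the search engine",
}
index = build_inverted_index(docs)
# index["search"] → [("doc1", 0), ("doc2", 5)]
```

Looking up a query term is then a single dictionary access instead of a scan over all documents, which is what makes the inverted index the standard structure for full-text search.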
Splunk
Splunk is an enterprise software product with many pre-packaged connectors for easily indexing and importing data from various sources. Splunk is operated mainly through a web interface, where various reports can be created with advanced visualization tools, and most of the indexing is performed automatically. To extend its functionality, Splunk provides an application library called Splunkbase, a collection of configurations, knowledge objects, views, and dashboards that run on the Splunk platform [5].

Solr
Solr is an open-source search engine developed by the Apache Software Foundation. Like Elasticsearch, it is a schemaless, document-based search engine. Their feature sets are very similar, as both projects are built on the Lucene engine [6].

2.2 How does a search engine work?

2.2.1 Relevance and result ranking

A search engine returns a list of results to the user, ordered by relevance to the user's query. This is called result ranking and is one of the fundamental problems that search engines try to solve. Traditionally, relevance is expressed as a numerical aggregate of several parameters that together define how relevant a document is to a query. The resulting list of matching documents is then sorted by this relevance number, placing the documents with the highest relevance score at the top and those with the lowest at the bottom. Which parameters are collected, and how the relevance number is calculated, depend on the design of the search engine and on what its users consider relevant. A general-purpose search engine covering arbitrary web pages may focus on the number of words matched between a page and the user's query, while a search engine focused on an online store's articles might include parameters such as an article's popularity among other customers.