Forecasting GDP Growth, or How Can Random Forests Improve Predictions in Economics?
Written by N. Adriansson, I. Mattsson
3.1 The autoregressive process

When evaluating econometric models, simple linear models are often used as benchmarks. According to Marcellino (2008), the simplest linear time series models remain relevant in the era of more complex models: as long as they are well specified, they can perform well when tested against alternative models. The autoregressive model of order p, AR(p), is written

Y_t = δ + φ_1 Y_{t-1} + φ_2 Y_{t-2} + ... + φ_p Y_{t-p} + ε_t,

where δ is a constant and ε_t is a Gaussian white noise term. The key assumption of the model is that past values Y_{t-1}, ..., Y_{t-p} explain the behaviour of Y_t over time. For the model to be stable, |φ_j| < 1 must hold for all j; otherwise the series is explosive and grows without bound. This type of model is widely used in time series analysis, and it will therefore serve as the benchmark against which the performance of the RF is evaluated. For more on the properties of AR(p) processes, see Asteriou and Hall (2011).

3.2 Example: tooth growth in guinea pigs

The RF method is based on regression trees. To illustrate how a regression tree works, we start with an example, using a dataset provided in R, originally from C. I. Bliss (1952). It contains tooth-length measurements for guinea pigs at three dose levels of vitamin C (0.5, 1 and 2 mg) and two delivery methods, orange juice (OJ) and ascorbic acid (VC). Repeated observations on the same guinea pig are possible because their teeth wear down and regrow as they eat. Figure 2 shows the resulting regression tree. At each node in Figure 2 there is a split criterion: observations that satisfy the criterion go to the left, and those that do not go to the right. The node number is given at the top, and the ellipse shows the mean predicted tooth length of the observations that fall into that node. The node numbers correspond to the output table provided by R; an example is given in Table 2.
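The AR benchmark above can be sketched as a minimal simulate-and-fit exercise. This is our own illustration, not code from the thesis: the parameter values (δ = 0.5, φ = 0.6, p = 1) are arbitrary choices, and the fit is a plain OLS regression of Y_t on (1, Y_{t-1}).

```python
import numpy as np

# Minimal AR(1) sketch: simulate Y_t = delta + phi * Y_{t-1} + eps_t,
# then recover (delta, phi) by ordinary least squares.
# All numeric values here are illustrative, not from the thesis.
rng = np.random.default_rng(0)

delta, phi, n = 0.5, 0.6, 2000      # |phi| < 1 keeps the series stable
eps = rng.normal(size=n)            # Gaussian white noise
y = np.zeros(n)
for t in range(1, n):
    y[t] = delta + phi * y[t - 1] + eps[t]

# Regress y_t on a constant and its own first lag.
X = np.column_stack([np.ones(n - 1), y[:-1]])
delta_hat, phi_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
print(delta_hat, phi_hat)           # should land near 0.5 and 0.6
```

The same least-squares step generalizes to AR(p) by adding further lag columns to X.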
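The tree-reading rule just described (observations satisfying the split criterion go left, each leaf predicts the mean of its observations) can be made concrete with a small hand-built tree. The measurements below are synthetic stand-ins, not the actual values from Bliss (1952), and the split structure, first on dose < 0.75 mg and then on the supplement, follows the node-5 reading given in the text.

```python
# Hand-built depth-2 regression tree mimicking the Figure 2 layout.
# Data are synthetic stand-ins for the guinea pig measurements.
data = [
    # (dose_mg, supplement, tooth_length_mm)
    (0.5, "OJ", 13.2), (0.5, "OJ", 14.1), (0.5, "VC", 7.8),
    (0.5, "VC", 8.5),  (1.0, "OJ", 22.7), (1.0, "VC", 16.5),
    (2.0, "OJ", 26.1), (2.0, "VC", 26.9),
]

def node_mean(rows):
    """Prediction at a node: the mean response of the rows falling into it."""
    return sum(r[2] for r in rows) / len(rows)

def predict(dose, supp):
    # Root split: dose < 0.75 mg goes left, the rest goes right.
    if dose < 0.75:
        left = [r for r in data if r[0] < 0.75]
        # Second split on the supplement type (OJ vs VC).
        leaf = [r for r in left if r[1] == supp]
        return node_mean(leaf)
    right = [r for r in data if r[0] >= 0.75]
    return node_mean(right)

print(predict(0.5, "OJ"))   # mean of the low-dose OJ leaf -> 13.65
```

With real data the prediction for the low-dose OJ leaf would be the 13.23 mm reported for node 5.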
To illustrate, node number 5 is read as follows: for a vitamin C dose of less than 0.75 mg delivered as orange juice, the predicted tooth length of a guinea pig is 13.23 mm, with an MSE of 17.900.

3.3 Introduction to Random Forests

RF was proposed by Breiman (2001a) and is an extension of his earlier work on bagging (Breiman, 1996). It is an algorithm that can handle both high-dimensional classification and regression, which has made it one of the most popular methods in data mining. The method is widely used in fields such as biostatistics and finance, although it has not been applied to any greater extent in economics. From a mathematical point of view, the algorithm itself is still to some extent poorly understood, and only stylized outlines are provided in textbooks and articles (Biau and D'Elia, 2011).

3.4 A forest of trees

As the name suggests, RF is a tree-based ensemble method in which every tree depends on a set of random variables; in other words, the forest is built from many regression trees that together form an ensemble. Formally, let the p-dimensional random vector X = (X_1, X_2, ..., X_p) represent the real-valued predictor variables and let Y represent the real-valued response variable, and suppose their joint distribution P_XY(X, Y) is unknown. This is one of the advantages of RF: no distributional assumptions about the variables are needed. The purpose of the method is to find a prediction function f(X) to predict Y, which is done by computing the conditional expectation f(x) = E[Y | X = x]. In RF, the j-th base learner is a regression tree, denoted h_j(X; Θ_j), where the Θ_j, for j = 1, 2, ..., J, are independent sets of random variables. The trees in RF are binary recursive partitioning trees: they divide the predictor space through a sequence of binary partitions, or "splits", on individual variables, and these splits form the branches of the tree. The "root" node of the tree consists of the entire predictor space.
Unsplit nodes are called "terminal nodes" or "leaves", and they form the final partition of the predictor space. Each non-terminal node is split into two descendant nodes, one on the left and one on the right, according to the value of one of the predictors relative to the splitting criterion (the "split point"). Observations for which the predictor is smaller than the split point go to the left, and the rest go to the right. The split at a node is selected by considering every possible split on each predictor and then choosing the "best" according to a splitting criterion such as the within-node mean squared error, Q = (1/n) Σ_i (y_i − ȳ)², where ȳ is the average value of the response in the node. The splitting criterion provides a measure of goodness of fit, where a larger value indicates a worse fit, and vice versa. A candidate split creates two descendant nodes, one on the left and one on the right. If we denote the split criteria of the candidate descendants Q_L and Q_R, with respective sample sizes n_L and n_R, the split is chosen to minimize n_L Q_L + n_R Q_R. Finding the best possible split means sorting the values of the predictor variable and then considering a split between each pair of adjacent values. Once the best split is found, the data are divided into the two descendant nodes, which are then split in the same way as the original node. The process is recursive and stops when a stopping criterion is met; for example, this can be a specified number of unsplit nodes to be kept. When the stopping condition is met, the remaining unsplit nodes are the terminal nodes. The predicted value of the response variable is then obtained as the average of all observations in the terminal node.
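The split-selection procedure above (sort the predictor values, evaluate every candidate split point, keep the one minimizing n_L Q_L + n_R Q_R) can be sketched for a single predictor as follows. This is our own minimal illustration with hypothetical function names; the criterion Q is the within-node mean squared deviation described in the text.

```python
def q(ys):
    """Within-node criterion Q: mean squared deviation from the node mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def best_split(xs, ys):
    """Return the split point on one predictor minimizing n_L*Q_L + n_R*Q_R."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    xs = [xs[i] for i in order]
    ys = [ys[i] for i in order]
    best_score, best_cut = float("inf"), None
    # Candidate split points lie between consecutive distinct predictor values.
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue
        cut = (xs[k - 1] + xs[k]) / 2
        left, right = ys[:k], ys[k:]
        score = len(left) * q(left) + len(right) * q(right)
        if score < best_score:
            best_score, best_cut = score, cut
    return best_cut

# Observations below the returned cut-off go left, the rest go right;
# each side would then be split again recursively until a stopping rule is met.
cut = best_split([0.5, 0.5, 1.0, 1.0, 2.0, 2.0], [8, 13, 17, 22, 26, 27])
print(cut)   # -> 0.75
```

A full tree repeats `best_split` over all predictors at every node, and an RF grows many such trees on bootstrap samples and averages their predictions.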