 # GDP growth rate nowcasting and forecasting

## A system averaging model implementation

### 2017

#### Abstract

#### 2.1 Dynamic factor analysis

The idea behind factor analysis is to describe the covariance relationships among a large number of variables in terms of a few unobserved latent factors. In essence, variables with high mutual correlation are combined into a new variable, or factor, which accounts for the common variance of all the variables in the group (Johnson, 2007). Dynamic factor analysis develops this idea further by letting the factors change over time. The model describes how a vector of N observed time series, X_t, evolves over time as a function of unobserved factors plus mutually uncorrelated idiosyncratic terms that capture deviations such as measurement errors. Two ways of writing the DFM are discussed in this report: the dynamic form, in which X_t depends explicitly on lags of the factors, and the static form, in which X_t depends on the factor lags only implicitly. Each form of the DFM has advantages and disadvantages depending on the purpose of the model. Dynamic factor models belong to a larger class of methods called hidden Markov models, in which observable variables are expressed in terms of hidden, or unobserved, variables. Because a few factors can summarize the behavior of a large data set over time, dynamic factor analysis is well suited to analyzing macroeconomic data sets (Stock, 2016).

##### 2.1.1 Principal component method

The principal component method decomposes a data set into a set of principal components that have zero covariance with one another, so there is no correlation between the components. The principal components are obtained by solving the least-squares problem in equation (7), which minimizes the residual variance V_r of the data set.

##### 2.1.2 Example of factor analysis

Table (1) presents a constructed example that illustrates factor analysis. The example consists of six variables and two factors with their associated factor loadings; the results of the factor analysis are shown in Table (1).
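The thesis implements everything in Matlab; purely as an illustration, a minimal Python/NumPy sketch of the principal-component step might look as follows. The function name and toy data panel are hypothetical, not taken from the thesis.

```python
import numpy as np

def pc_factors(X, r):
    """Estimate r principal-component factors from a T-by-N data panel X.

    Each column is standardized first; the factor estimates are the first
    r principal components, which are uncorrelated by construction.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each series
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    F = U[:, :r] * S[:r]                           # T-by-r factor estimates
    L = Vt[:r].T                                   # N-by-r factor loadings
    return F, L

# Toy panel: 120 quarters of 10 series driven by 2 common factors plus noise.
rng = np.random.default_rng(0)
common = rng.standard_normal((120, 2))
panel = common @ rng.standard_normal((2, 10)) + 0.1 * rng.standard_normal((120, 10))
F, L = pc_factors(panel, r=2)
```

Because the factor estimates are principal components, their sample covariance matrix is diagonal, which is what makes them convenient regressors in the GDP regression described later.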
Factor 1 appears to be related to income and education, since it loads most heavily on variables 1-4 while loading little on the domain-specific variables that are less relevant to education. The second factor, factor 2, appears to be a regional factor: it loads most heavily on variables 4-6, all of which relate to the area in which the house is located.

##### 2.2.1 Bootstrap sampling

Assume a model is fitted to a training data set Z = (z_1, z_2, ..., z_N), where z_i = (x_i, y_i). B new data sets are sampled randomly, with replacement, from Z, each of the same size as Z. The model is then fitted to each of these B data sets, known as bootstrap samples.

In this project, Matlab is used for all coding. Data is imported into Matlab from Macrobond, a service that provides global macroeconomic data. Because of seasonal patterns in the data set, some imported data series must be seasonally adjusted; this is done with the X-11 ARIMA seasonal-adjustment procedure. Each forecast is completed at the end of a quarter, to match the GDP value for that quarter. The difficulty is that much of the data for a quarter does not become available until after the quarter has ended. This is solved by fitting an AR(1) model to each macroeconomic data series and extrapolating it to the end of the quarter. Once the dynamic factor model has extracted the principal-component factors of the data set, they are related to GDP through a linear regression of previously observed GDP values on the factors. The benchmark used for comparing results is a simple random-walk forecast, in which predicted GDP growth in one period equals GDP growth in the previous period. A simplified description of how the program works is given in Algorithm (1).

#### 3.1 Data

Because the available data sets differ between countries, the number of variables used differs between countries as well.
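As an illustration of the bootstrap procedure described in Section 2.2.1 above, here is a minimal Python sketch; the helper name and the toy "model" (a sample mean) are assumptions of mine, not the thesis's code.

```python
import numpy as np

def bootstrap_fits(Z, fit, B=100, rng=None):
    """Draw B bootstrap samples from the training set Z and fit a model to each.

    Z is a list of (x_i, y_i) pairs; `fit` maps a sample to a fitted model.
    Each bootstrap sample is drawn with replacement and has the same size as Z.
    """
    rng = rng or np.random.default_rng()
    N = len(Z)
    models = []
    for _ in range(B):
        idx = rng.integers(0, N, size=N)           # N indices, with replacement
        models.append(fit([Z[i] for i in idx]))
    return models

# Example: the "model" is the sample mean of y; the spread of the B
# fitted values estimates the sampling uncertainty of that mean.
Z = [(x, 2.0 * x) for x in range(10)]
means = bootstrap_fits(Z, fit=lambda s: np.mean([y for _, y in s]), B=200)
```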
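The forecasting steps just described, AR(1) extrapolation of each series to the end of the quarter, a regression of observed GDP growth on the factors, and a random-walk benchmark, can be sketched in Python as follows. The thesis itself uses Matlab, and all function names here are hypothetical.

```python
import numpy as np

def ar1_extend(x, steps):
    """Extrapolate series x `steps` periods ahead using a least-squares AR(1)."""
    x = np.asarray(x, dtype=float)
    phi = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])  # AR(1) coefficient
    out = list(x)
    for _ in range(steps):
        out.append(phi * out[-1])                 # iterate x_{t+1} = phi * x_t
    return np.array(out)

def factor_nowcast(F, gdp):
    """Regress observed GDP growth on contemporaneous factors, then forecast.

    F has T+1 rows: factors for the T observed quarters plus the target
    quarter; gdp holds the T observed GDP growth values.
    """
    T = len(gdp)
    X = np.column_stack([np.ones(T), F[:T]])        # add an intercept
    beta, *_ = np.linalg.lstsq(X, gdp, rcond=None)  # OLS coefficients
    return beta[0] + F[T] @ beta[1:]                # forecast for quarter T+1

def random_walk_forecast(gdp):
    """Benchmark: predicted growth equals the previous period's growth."""
    return gdp[-1]
```

The random-walk benchmark is deliberately trivial: any value the factor model adds must show up as an improvement over simply repeating last quarter's growth.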
To be included in the forecast data set, a variable must have at least 56 quarters of continuous data. Between roughly 200 and 550 variables are used, depending on the country. Some available variables must be discarded because their time series are too short, and others because they do not form a continuous data sequence. The variables have different release intervals, but the data are aggregated to form quarterly series. Some data series must also be transformed to be stationary, a necessary condition for the forecasting to work properly. Table (2) gives a brief summary of the most important statistics for the variables that are most highly correlated with GDP. That data set is from Sweden, and the typically high correlations among macroeconomic variables of this kind can be observed. This section reviews four data sets, from Sweden and the United States at two different points in time. Data sets from seven countries are available, but due to time constraints these two countries are the main focus of the project; Sweden and the United States also have good data sets, with many variables and long time series. The first data set was imported on October 27, when the GDP value for the third quarter had not yet been released. The model therefore automatically produces two forecasts, one for the third quarter and one for the fourth. The second data set was imported on December 8; here there is only one forecast, for the fourth quarter, because the third-quarter figure had been released by then.
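One illustrative way to apply the 56-quarter inclusion rule and a stationarity transform is sketched below in Python. Treating NaN as the gap marker and using first differences as the transform are assumptions of mine; the thesis does not specify these details.

```python
import numpy as np

def usable(series, min_quarters=56):
    """Keep a series only if it contains at least `min_quarters`
    consecutive non-missing observations (NaN marks a gap)."""
    longest = run = 0
    for v in series:
        run = 0 if np.isnan(v) else run + 1       # reset the run at each gap
        longest = max(longest, run)
    return longest >= min_quarters

def difference(series):
    """Illustrative stationarity transform: first differences."""
    return np.diff(np.asarray(series, dtype=float))
```

A series with a gap in the middle can still qualify if one of its unbroken stretches reaches 56 quarters, which matches the requirement of a continuous data sequence rather than a merely long one.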