World Journal of Applied Economics (2023), 9(1): 1-31
doi: 10.22440/wjae.9.1.1
Research Article

Combined Forecasts of Intermittent Demand for Stock-keeping Units (SKUs)

Aysun Kapucugil İkiz (a), Gizem Halil Utma (b)

Received: 20.11.2022; Revised: 30.12.2022; Accepted: 03.01.2023

Effective inventory management requires accurate forecasts for stock-keeping units (SKUs), especially for the strategic ones for companies’ operations and after-sales services like providing spare parts. Forecasting is a challenging task for such SKUs as they usually have intermittent demand (ID) patterns, consisting of many periods with zero demand and infrequent demand arrivals. Given the highly uncertain nature of ID for SKUs, this study developed a methodological framework for combining statistical and judgmental forecasts and assessed the performance of the proposed framework by using accuracy and bias measures. The forecasting process has several steps, including data preparation, data categorization based on demand patterns, generating statistical and judgmental forecasts, combining statistical and judgmental forecasts, and evaluating the forecast performance. These steps were illustrated on a real-world dataset that contains monthly customer demand data for after-sales spare parts. Results showed that combination is the best method for the majority of SKUs. This paper contributes to the limited literature by addressing the gap between the combined and ID forecasts. The proposed framework gives practitioners and researchers a comprehensive overview to help them make more accurate forecasts while encouraging the use of simple but structured approaches.
JEL codes: C44, C53, M11
Keywords: Statistical forecasting, Judgmental forecasting, Combining forecasts, Intermittent demand, Stock-keeping Units

(a) Corresponding author. Dokuz Eylül University, Faculty of Business, Department of Business Administration, İzmir, Turkey. email: aysun.kapucugil@deu.edu.tr ORCID: 0000-0002-8337-2111
(b) İzmir University of Economics, Faculty of Business, Department of Business Administration, İzmir, Turkey. email: gizem.halil@ieu.edu.tr ORCID: 0000-0001-5040-1329

1 Introduction

Intermittent demand (ID) is characterized by infrequent demand arrivals separated by zero-demand time intervals, and demand sizes vary greatly, from thousands of units per month to a few per year. At any point in the supply chain, such demand patterns can be used to describe spare parts and stock-keeping units (SKUs), such as finished goods or semi-finished products, in the product line (Syntetos, 2001). The ID pattern is common in sectors such as aerospace, automotive, maritime, security, information technology, industrial production, and retail (Syntetos & Boylan, 2001; Ghobbar & Friend, 2003; Willemain et al., 2004). This pattern also typically occurs for products that are at the end of their life cycle (Nagaria, 2017). Such SKUs may be fast or slow movers. Because of the intermittency of demand, companies hold a large and unnecessary amount of inventory. These SKUs might account for as much as 60% of total stock value (Johnston et al., 2003), and stock-outs (for example, of spare parts in aerospace) frequently imply enormous costs, i.e., very costly operational breakdowns (Babai et al., 2019; Ghobbar & Friend, 2003). On the other hand, holding costs can be quite high, especially given the products’ high risk of obsolescence (Saccani et al., 2017). As a result, minor improvements in forecasting demand for such SKUs may result in significant cost savings.
However, due to the complex structure of these SKUs, forecasting with traditional methods is a challenging task, and specialized methods are required to generate more accurate predictions. Several forecasting methods have been proposed to overcome the problems caused by ID and generate more accurate forecasts. Starting with the pioneering study of Croston (1972), the literature has grown with many modifications of Croston’s method (CR), such as the Syntetos-Boylan Approximation (SBA) (Syntetos & Boylan, 2001), the Teunter-Syntetos-Babai (TSB) method (Teunter & Duncan, 2009), and the Levén & Segerstedt (2004) modification. Nonparametric alternatives such as bootstrapping (Willemain et al., 2004; Hua et al., 2007; Hasni et al., 2019) and artificial neural network models (Gutierrez et al., 2008; Pour et al., 2008; Kourentzes, 2013) have also been proposed. Statistical forecasting methods, on the other hand, are incapable of capturing contextual factors, such as spare parts, maintenance schedules, equipment age, and operating conditions. Even if they are based on historical data containing contextual information, they may take a while to adapt to changes in demand brought on by context-specific dynamics.

The use of contextual knowledge is essential in judgmental forecasting methods. There are two ways of using judgments in forecasting. One involves making direct forecasts based on judgment, and the other involves building forecasts using individual judgments. These two applications of judgment are used in various fields, including macroeconomic forecasting, business forecasting, political forecasting, and sports events forecasting (Parackal et al., 2007). Researchers have been paying more attention to these forecasting techniques over the past two decades (Pinçe et al., 2021). Sanders & Ritzman (1992) indicate that as the variability of the time series data increases, the judgmental forecasts generated by practitioners may be beneficial.
The accuracy of judgmental forecasts improves when the analyst has pertinent contextual knowledge, knowledge gained through experience with the forecasting environment, and up-to-date information (Lawrence et al., 2006). Otherwise, when they have limited access to quantifiable information, forecasters excessively emphasise their subjective contributions (Sanders & Manrodt, 2003; Franses & Legerstee, 2010) or repeatedly make bold adjustments based on false information (Petropoulos et al., 2016). Therefore, judgmental forecasts may be biased and could reduce forecast accuracy.

Judgmental forecasts are used in three different settings. When statistical forecasts cannot be used because there is no data available, the only method left is judgmental forecasting. When data is available, a forecaster may use it to produce statistical forecasts; however, these predictions may later be subject to judgmental adjustment based on contextual knowledge. The final setting combines independent forecasts that were produced using statistical and judgmental methods. This last setting is the subject of this research.

As seen from review studies (Clemen, 1989; Timmermann, 2006; Wang et al., 2022), combinations of forecasts have made significant academic strides recently, emerging as a cornerstone of forecasting research. In many cases, combining forecasts is a better approach than identifying a single best forecast, as constituent forecasts often use information from different sources. Furthermore, individual forecasts are subject to model bias from unknown model misspecifications and are affected to varying degrees by structural breaks in the data generating process (Qian et al., 2019). Forecasts obtained in this way are commonly referred to as “combination forecasts” or “ensemble forecasts” (Wang et al., 2022). The literature suggests a wide range of sophisticated combination techniques. As Li et al.
(2022) stated, the simple average continually outperforms more complex weighting schemes in empirical studies like Chan & Pauwels (2018) and is still an unbeatable forecast combination technique. In the literature, this phenomenon is called a “forecast combination puzzle” (Claeskens et al., 2016; Smith & Wallis, 2009). In general, each combination method has advantages and disadvantages and which combination method should be used depends on several factors. Besides, there is still disagreement over the forecast combination technique that works best in a particular situation (Li et al., 2022; Wang et al., 2022). In the context of forecasting ID, even though there is growing research (Pinçe et al., 2021), forecast combinations have largely gone unnoticed (Li et al., 2022). Specifically, there is very limited discussion on the combination of statistical and judgmental forecasts for ID in the literature. Therefore, given the highly uncertain nature of ID for SKUs, this research aims to develop a methodological framework for combining statistical and judgmental forecasts and assess the proposed framework’s performance by using accuracy measures. It is expected to present the overall process that connects combined and ID forecasting. The rest of the paper is organized as follows. Section 2 presents relevant literature on combining forecasts and ID characteristics. The proposed framework for forecasting ID is described in a six-staged model in Section 3, and Section 4 explains how to combine statistical and judgmental forecasts. Section 5 illustrates the framework on real-world data that contains monthly customer demands for after-sales spare parts from an anonymous company operating in the electronics industry, manufacturing small home appliances. Finally, Section 6 concludes with the limitations of this study and recommendations for future research. 
2 Relevant Literature

The section begins with a review of the literature on combining forecasts, which serves as the foundation for this study. Following that, ID patterns and associated categorization schemes are presented.

2.1 Combining Forecasts

The concept of combining several individual forecasts dates back to the early 1900s, when British scientist Francis Galton visited an ox weight-judging competition in which eight hundred people competed, most of whom were non-experts with varying abilities. Galton examined the 787 valid estimates provided by the contestants and discovered that the overall mean of the estimates, which represents the collective wisdom of the crowd, was nearly perfect (Surowiecki, 2005). About 60 years later, the well-known work of Bates & Granger (1969) popularized the concept, and a voluminous literature on forecast combinations emerged (Wang et al., 2022). Subsequently, Clemen (1989), which summarizes a significant amount of research from different fields, can be seen as a milestone on this topic. Forecast combination, as a term, describes “the combination of forecasts to generate a better forecast; the component forecasts could be outcomes from model averaging, individual models, or expert forecasts” (Wang et al., 2022).

Combining forecasts can be done by using mechanical rules such as averaging the forecasts included in the combination (Lawrence et al., 2006). Clemen (1989) suggested that simply averaging the forecasts by using equal weights (i.e., 1/M, where M denotes the number of forecasting methods to be combined) should be used as a baseline when proposing more complex weighting schemes. Combination schemes range from simple methods that avoid weight estimation to sophisticated methods that tailor weights for various individual models, including model-free or model-fitting, linear or nonlinear, static or time-varying, series-specific or cross-learning, and frequentist or Bayesian (Wang et al., 2022).
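The equal-weight rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `combine_forecasts` and the example numbers are hypothetical, and the optional `weights` argument is included only to show how a weighted scheme generalizes the 1/M baseline.

```python
def combine_forecasts(forecasts, weights=None):
    """Linear combination of M point forecasts for one SKU and one period.

    With weights=None every forecast receives the equal weight 1/M,
    i.e. the simple average used as the baseline combination.
    """
    m = len(forecasts)
    if m == 0:
        raise ValueError("need at least one forecast")
    if weights is None:
        weights = [1.0 / m] * m
    if len(weights) != m:
        raise ValueError("one weight per forecast is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical monthly forecasts for one spare part (illustrative values only).
statistical, judgmental = 1.35, 2.0
print(combine_forecasts([statistical, judgmental]))                # equal weights
print(combine_forecasts([statistical, judgmental], [0.25, 0.75]))  # unequal weights
```

Because the equal-weight version needs no estimated parameters, it sidesteps the weight-estimation uncertainty that more elaborate schemes introduce.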
Complex averaging methods have been proposed for combining forecasts, but they have yet to be successful (Green & Armstrong, 2015). Empirical evidence still shows that the simple average is much more successful (Li et al., 2022). Although the simple average can reduce forecast variance and remove uncertainty in weight estimation (Palm & Zellner, 1992), it is sensitive to extreme values. As a result, robust combinations utilizing the median and trimmed means have received some attention (Petropoulos & Svetunkov, 2020). The organizers of the M5 competition, centered on sales forecasts using a large number of intermittent time series, recently used the simple average of exponential smoothing and ARIMA as the combination benchmark, which performed better than or as well as the individual methods that constituted it (Makridakis et al., 2022).

As each combination method has its own merits, which combination method should be used depends on the kind of forecasts (deterministic, probabilistic, quantile, etc.), the size and quality of the model pool, the available information, and the particular forecasting issues (Wang et al., 2022). However, there is still disagreement over the forecast combination technique that works best in a particular situation (Li et al., 2022; Wang et al., 2022).

When the constituents of a forecast combination come from judgmental and statistical methods, techniques that combine these two categories of forecasts include their simple average, judgmental bootstrapping (i.e., a type of expert system that converts expert reasoning into a set of explicit rules; Armstrong, 2001a), and statistical techniques that aim to eliminate systematic biases from judgmental forecasts (Parackal et al., 2007). Among studies using these techniques, Lawrence et al. (1986) investigated the effectiveness of combining forecasts for time series with various forecasting difficulty and seasonality levels.
They found that for series with low MAPE, the combination is more effective and that seasonality has no influence on the benefits of the combination. Measuring the difficulty of a series with the coefficient of variation instead of MAPE, Sanders (1992) also found that combining forecasts is most effective for simpler series. For harder series, combined forecasts are less accurate than judgmental forecasts because judgments can generate better forecasts than statistical models. Weinberg (1986) worked on forecasts of attendance at events. In his study, ex-ante forecasts were generated by using an econometric model, and managers also generated forecasts by using their judgment. Results showed that the econometric model provided more accurate results than managers’ judgment. However, the accuracy of the combination of the econometric model and judgment was superior to both the model and judgment alone.

Some studies show that adjusting the statistical forecast by using judgments increases accuracy in the forecasting of ID (e.g., Syntetos et al., 2009; Davydenko & Fildes, 2013). However, adjustments are subject to biases and may harm forecast accuracy (Eroğlu & Croxton, 2010). Mechanically integrating statistical and judgmental methods should be preferred to avoid such biases (Sanders & Ritzman, 2001). The combination of statistical and judgmental forecasts for ID has rarely been discussed in the literature, and only the simple averaging method has been used to improve the forecasts (Petropoulos & Kourentzes, 2015). Especially for highly intermittent demand patterns, Bates & Granger (1969) found that using weighted forecast combinations causes the covariance matrix of forecast errors to be singular. As the obtained errors may contain a large number of zero values, it is impossible to calculate this matrix’s inverse. A similar issue also applies to regression-based approaches.
Thus, many sophisticated forecasting methods are inapplicable due to the numerous zeros and non-smooth patterns present in ID.

2.2 Intermittent demand patterns

Syntetos (2001, p. 365) stated that “infrequent demand occurrences and variable demand size when demand occurs mean demand to be non-normal since demand per unit period or lead time demand cannot be represented by the normal distribution”. The literature proposes various forecasting methods for different non-normal demand patterns. Specifying and categorizing the demand patterns according to their similarities is essential to find the best-performing forecasting method, which provides the highest forecasting accuracy. Williams (1984) proposed to categorize the patterns based on the idea of variance partition of the demand during lead time as “variance of the demand sizes”, “transaction variability”, and “variance of the lead times”, and classified the SKU demand into three categories: smooth, slow-moving, and sporadic. After this study, several authors proposed different categorizations to determine the best-performing forecasting methods and inventory control parameters.

Two parameters, “the average inter-demand interval” ($p$) and “the square of the coefficient of variation” ($CV^2$), are used to determine the characteristics of demand data. $p$ shows the regularity of demand by measuring the average number of periods between two non-zero demands, and it is calculated as follows.

$$p = \frac{\sum \text{Intervals between non-zero demand periods}}{\text{Number of non-zero demand periods}} \tag{1}$$

The coefficient of variation ($CV$), on the other hand, measures the variation in the demand size and is calculated as follows.

$$CV = \frac{\text{Standard deviation of demand values}}{\text{Average demand over periods}} \tag{2}$$

Johnston & Boylan (1996) showed that Croston (CR) method outperforms the Exponentially Weighted Moving Average (EWMA) for $p$ values greater than 1.25 review periods and mentioned that these SKUs have ID patterns.
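Equations (1) and (2) can be illustrated with a short Python sketch. Several choices here are assumptions rather than details given in the text: the first interval is measured from the start of the series, the coefficient of variation is computed on the non-zero demand sizes only, and the population standard deviation is used. The function names and the example series are hypothetical.

```python
from statistics import mean, pstdev


def inter_demand_interval(demand):
    """Average inter-demand interval p, Equation (1)."""
    positions = [t for t, a in enumerate(demand, start=1) if a != 0]
    if not positions:
        raise ValueError("series contains no non-zero demand")
    # Assumption: the first interval is counted from the start of the series.
    intervals = [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]
    return sum(intervals) / len(positions)


def squared_cv(demand):
    """Squared coefficient of variation CV^2, based on Equation (2)."""
    # Assumption: only non-zero demand sizes enter the calculation.
    sizes = [a for a in demand if a != 0]
    if not sizes:
        raise ValueError("series contains no non-zero demand")
    return (pstdev(sizes) / mean(sizes)) ** 2


monthly_demand = [0, 3, 0, 0, 2, 0, 5]  # illustrative values only
print(inter_demand_interval(monthly_demand))  # intervals 2, 3, 2 over 3 demands
print(squared_cv(monthly_demand))
```

For this series the three demands arrive after intervals of 2, 3, and 2 periods, so $p = 7/3$, which is above the 1.25 threshold mentioned above.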
Johnston & Boylan (1996) were the first to formally confirm the importance of the value of $p$ as a classification parameter (van Kampen et al., 2012). Eaves (2002) reclassified the SKUs in the dataset by modifying Williams (1984)’s classification as smooth, irregular, slow-moving, mildly intermittent, and highly intermittent. However, as the cut-off values of this categorization are solely dependent on the properties of the underlying dataset and adequate subsample size considerations, it is not widely applicable.

Syntetos et al. (2005) compared EWMA, CR, and Syntetos-Boylan Approximation (SBA) methods based on the theoretical analysis of the Mean Square Error (MSE) to determine the regions of superior performance and define the demand patterns accordingly. They developed a model of four demand categories: “erratic”, “lumpy”, “smooth”, and “intermittent”, as shown in Figure 1. The categorization scheme proposed by the authors is based on $p$ and the $CV^2$ of demand sizes. Comparing the theoretical MSE values of EWMA, CR and SBA, the cut-off values are determined as $p = 1.32$ and $CV^2 = 0.49$. The Syntetos-Boylan-Croston (SBC) scheme was empirically tested using 3,000 SKUs from a company operating in the automotive industry, and its validity was confirmed. Syntetos et al. (2005) contributed to the identification of $CV^2$ as a new categorization parameter for demand forecasting purposes.

Lastly, Kostenko & Hyndman (2006) developed the KH scheme as a simpler and more accurate extension of the SBC scheme. The authors suggested using the SBA method for a smooth pattern if $CV^2 > 2 - (3/2)p$. According to the KH categorization scheme, for SKUs with a $CV^2$ value of 0.4 and a $p$ value of 1.25, SBA is used, while the SBC categorization scheme suggests using CR for the same SKUs.

Figure 1: Demand Patterns. Source: Constantino et al. (2018, p.59)

SKUs with smooth demand patterns show a regular demand over time, and the nonzero demand size shows little variation.
Infrequent demand occurrences may be defined as intermittent (or sporadic) demand. Intermittence refers to the occurrence of the demands but not the sizes of the occurring demands. For this type of demand, the average time between consecutive transactions is significantly longer than the unit period, the latter being the period for updating forecasts (Silver et al., 1998). In the case of an erratic demand pattern, demand for SKUs is highly variable and unstable. In lumpy demand, there are many periods without any demand occurring, and when the demand occurs, its size varies. Since both demand size and demand occurrences are highly variable, it is challenging to forecast SKUs with lumpy demand (Syntetos, 2001).

3 Methodology

The purpose of this study is to present a methodological framework for combining statistical and judgmental forecasts in order to obtain more accurate results for SKUs with ID patterns. The proposed framework for forecasting ID is described as a six-stage model, as shown in Figure 2. In the first stage, the data is prepared for the forecasting process. The second stage is categorizing the demand data based on the demand patterns (intermittent, smooth, lumpy and erratic) mentioned in Section 2.2 by using the categorization schemes proposed for SKUs with ID patterns. In the third stage, a statistical forecasting model is built using the appropriate ID methods. The best-performing forecasting method is selected using accuracy measures appropriate for ID data. In parallel with the statistical forecast model building, the judgmental forecasting model is built in the fourth stage. In the fifth stage, the best statistical forecasts are combined with the judgmental forecasts by using combining procedures. Finally, in the sixth stage, the accuracies of the statistical, judgmental, and combined forecasts are calculated, and the best-performing methods are selected for each SKU.
The following sections explain each stage of this framework.

Figure 2: Stages of the proposed methodological framework

3.1 Data preparation

The first stage is to prepare the data for building a forecast model. The data gathered from companies may not be suitable for direct use in the forecasting process. For instance, historical data for an SKU may have missing values, consist only of zeros, or contain a high number of consecutive zeros, especially in the beginning or ending periods, which may indicate that the SKU is a new product or is no longer in use. Such SKUs should be eliminated from the dataset before starting the forecasting process.

3.2 Categorization of the demand data

The second stage is the categorization of the demand data. Categorizing the data according to the SKUs’ demand patterns plays an essential role in selecting the best-performing forecasting method and inventory control parameters. Companies hold high numbers of SKUs, so instead of evaluating them on an individual basis, it is more effective to assess them as groups with similar characteristics. In the literature, there are several categorization schemes proposed by various researchers (e.g., Williams, 1984; Johnston & Boylan, 1996; Syntetos, 2001; Ghobbar & Friend, 2002; Eaves, 2002; Syntetos et al., 2005; Kostenko & Hyndman, 2006; Boylan et al., 2008). In this study, the Syntetos-Boylan-Croston (SBC) scheme proposed by Syntetos et al. (2005) is used since the majority of the studies reported accurate results based on this scheme (Fildes et al., 2019), and it is also easy for users to understand and interpret. The SBC scheme uses the average inter-demand interval ($p$) and the squared coefficient of variation ($CV^2$) to categorize the demand patterns into four groups: smooth, intermittent, erratic, and lumpy.
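The four-way categorization can be sketched as a small Python function that uses the cut-off values $p = 1.32$ and $CV^2 = 0.49$ reported in Section 2.2. The assignment of quadrants follows the usual layout of the SBC scheme, and the treatment of values that fall exactly on a cut-off is an assumption. The second function encodes the KH rule quoted in Section 2.2; both function names are hypothetical.

```python
P_CUTOFF = 1.32    # average inter-demand interval cut-off (Syntetos et al., 2005)
CV2_CUTOFF = 0.49  # squared coefficient of variation cut-off


def sbc_category(p, cv2):
    """Assign a demand pattern from p and CV^2 using the SBC cut-offs."""
    # Assumption: values exactly on a cut-off fall into the upper region.
    if p < P_CUTOFF:
        return "smooth" if cv2 < CV2_CUTOFF else "erratic"
    return "intermittent" if cv2 < CV2_CUTOFF else "lumpy"


def kh_prefers_sba(p, cv2):
    """KH rule from Section 2.2: use SBA when CV^2 > 2 - (3/2) p."""
    return cv2 > 2 - 1.5 * p


# The example from Section 2.2: CV^2 = 0.4 and p = 1.25.
print(sbc_category(1.25, 0.4))    # SBC places this SKU in the smooth region
print(kh_prefers_sba(1.25, 0.4))  # 0.4 > 2 - 1.875 = 0.125, so KH points to SBA
```

The last two lines reproduce the disagreement between the two schemes noted in Section 2.2: SBC treats the SKU as smooth, whereas the KH inequality is satisfied and points to SBA.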
Syntetos et al. (2005) defined the demand patterns and established the regions of superior performance by comparing the theoretical MSE values of EWMA, CR and the Syntetos-Boylan Approximation (SBA). The SBC scheme was empirically tested by using 3,000 SKUs from a company operating in the automotive industry, and its validity was confirmed. The results showed that the cut-off points are $p = 1.32$ and $CV^2 = 0.49$. Figure 3 visualizes the categorization scheme recommended by Syntetos et al. (2005).

Figure 3: SBC categorization scheme. Source: Boylan et al. (2008, p.476)

Based on these cut-off values of $p$ and $CV^2$ for each demand pattern, the SBC categorization suggests that forecasting smooth demand with CR gives the highest accuracy, while erratic, lumpy, and intermittent demand are forecasted best by using SBA.

3.3 Building a statistical forecast model

After examining the data patterns using an appropriate categorization scheme, in the third stage, statistical forecasts should be generated using the candidate models for each category and evaluated based on their accuracies to select the best-performing one. ID patterns can be modeled by using several statistical forecasting methods (Waller, 2015). This study covers three methods widely used in the literature: single exponential smoothing (SES) as a traditional method, the pioneering Croston method (CR) proposed for ID patterns, and the Syntetos-Boylan Approximation (SBA) as one of CR’s variants.

3.3.1 Single Exponential Smoothing

In practice, Single Exponential Smoothing (SES) is commonly used due to its straightforwardness and robustness. It is a parameter-based method. It provides an EWMA of all observed values. This method aims to estimate the current level to use when forecasting future values. SES is appropriate for data without a seasonal pattern or trend. SES revises an estimate by using more recent experiences.
The calculations require a predetermined parameter, $\alpha$, the so-called smoothing constant. The most recent observation receives the highest weight, and the older observations receive less weight. The smoothing constant is a value between 0 and 1 and is determined judgmentally, based on the data’s characteristics. Silver et al. (1998) suggest choosing a value for the smoothing constant between 0.1 and 0.3 if forecasting is done monthly. The general formulation of exponential smoothing is

$$Y_{t+1} = \alpha A_t + (1 - \alpha) Y_t \tag{3}$$

where $Y_t$ and $Y_{t+1}$ are the old and new smoothed values of the forecast for periods $t$ and $t + 1$, respectively, and $A_t$ is a new observation or the actual value of the series in period $t$. SES is widely used by practitioners for SKUs that have both smooth and non-smooth demand patterns. As a strong candidate among conventional time-series forecasting techniques, SES is chosen for testing its effectiveness in the context of ID forecasting.

3.3.2 Croston’s Method (CR)

Croston (1972) proved that using SES is inadequate for forecasting ID due to the biases it causes. Syntetos et al. (2015, p. 1747) stated that “In SES, data that is more recent weights more heavily. Thus, just after a demand occurs, it gives forecasts that are biased high while it gives forecasts that are biased low just before a demand”. This situation results in high replenishments and excessive stock levels. In order to address this problem, a new forecasting method, the CR model, which separately estimates the non-zero demand size and the inter-arrival time between subsequent demands by using SES, was proposed (Croston, 1972). CR is the first method proposed specifically for SKUs that have ID patterns. It assumes that demand occurs as a Bernoulli process; intervals between demands are independent and identically distributed, and demand sizes are independent and normally distributed.
According to CR, the ratio of the estimated mean size of non-zero demand ($Z_t$) to the estimated mean interval between non-zero demands ($P_t$) provides an estimate of the mean demand per period, $Y_t$, as follows.

$$Y_t = \frac{Z_t}{P_t} \tag{4}$$

The algorithm for CR, which is provided in Table 1, also uses the observed value of the demand, $A_t$, and the time interval since the last demand, $Q$.

Table 1: Croston Method’s Algorithm
If $A_t = 0$:     $Z_t = Z_{t-1}$;  $P_t = P_{t-1}$;  $Q = Q + 1$
If $A_t \neq 0$:  $Z_t = \alpha A_t + (1 - \alpha) Z_{t-1}$;  $P_t = \alpha Q + (1 - \alpha) P_{t-1}$;  $Q = 1$

Figure 4 schematizes the CR forecasting process. When there is no demand in a review period, the estimates are not changed. If the demand is zero in period $t$, the algorithm only increments the count of periods since the last positive demand (Croston, 1972). The forecasts are updated only after the occurrence of positive demand. If demand occurs every period, CR’s forecasts are the same as the forecasts that SES generates. So, it is possible to use this method both for intermittent and smooth demand patterns.

Figure 4: Forecasting process of Croston’s method. Source: Nagaria (2017)

Several studies showed the superiority of CR over traditional methods (Willemain et al., 1994; Johnston & Boylan, 1996; Ghobbar & Friend, 2003). Due to its proven success in forecasting the demand for SKUs with ID and its applicability without any additional costs for organizations, this method is also used while generating statistical forecasts.

3.3.3 Syntetos and Boylan Approximation (SBA)

Syntetos & Boylan (2001) reported that CR is biased because of a problem in its mathematical derivation. Thus, the Syntetos-Boylan Approximation (SBA) was proposed as a revised model, which approximately corrects the bias in CR’s demand estimates. The new mean demand estimator is given as follows.

$$Y_t = \left(1 - \frac{\alpha}{2}\right) \frac{Z_t}{P_t} \tag{5}$$

SBA is the most widely used variant of CR.
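The three estimators in Equations (3) to (5) and the update rules of Table 1 can be sketched together in Python. The initial values are not specified in the text, so the choices below are assumptions: SES starts from the first observation, and CR starts $Z$ and $P$ at the first non-zero demand and the number of periods until it occurs. The function names and the example series are hypothetical.

```python
def ses_forecast(demand, alpha):
    """One-step-ahead SES forecast, Equation (3)."""
    level = float(demand[0])  # assumption: initial forecast equals first observation
    for a in demand[1:]:
        level = alpha * a + (1 - alpha) * level
    return level


def croston_forecast(demand, alpha, sba=False):
    """Mean demand per period by CR (Equation 4) or SBA (Equation 5)."""
    z = p = None  # smoothed demand size and smoothed inter-demand interval
    q = 1         # periods since the last non-zero demand
    for a in demand:
        if a == 0:
            q += 1                        # Table 1, zero-demand branch
        elif z is None:
            z, p, q = float(a), float(q), 1  # assumption: initialise at first demand
        else:
            z = alpha * a + (1 - alpha) * z
            p = alpha * q + (1 - alpha) * p
            q = 1
    if z is None:
        return 0.0  # no demand observed at all
    estimate = z / p
    return (1 - alpha / 2) * estimate if sba else estimate


monthly_demand = [0, 3, 0, 0, 2, 0, 5]  # illustrative values only
print(ses_forecast(monthly_demand, 0.2))
print(croston_forecast(monthly_demand, 0.2))            # CR
print(croston_forecast(monthly_demand, 0.2, sba=True))  # SBA
```

With demand in every period, $Q$ stays at 1 and $P_t$ stays at 1, so under these initial values `croston_forecast` returns the same number as `ses_forecast`, in line with the remark above that CR reduces to SES for non-intermittent series.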
Several studies in the literature showed that SBA outperforms other methods such as SES, CR, and MA (Eaves & Kingsman, 2004; Syntetos & Boylan, 2001; Syntetos et al., 2005; Gutierrez et al., 2008). This method is selected since it improves the quality of forecasts generated by CR. In addition, none of these methods causes any additional cost for companies since they are easy to apply by using simple software like MS Excel or RStudio. Due to the success and prevalence of these methods both in literature and practice for ID forecasting, they are selected for this study.

3.3.4 Performance measures of Statistical Forecast Model

In practice, the forecasts generated from a forecast model are rarely perfect. Forecast accuracy is the most important criterion when deciding whether to use a forecasting method or not. The next step in the statistical forecasting process is the measurement of forecast accuracies by using appropriate measures to see the performance of each method and related parameters. These measures are based on the forecast error, $e_t$, the difference between the actual value, $A_t$, and the forecast, $F_t$, for a given period $t$ as follows.

$$e_t = A_t - F_t \tag{6}$$

Values of $e_t$ closer to 0 show that the forecast is close to the actual. A positive $e_t$ value shows that the forecast is smaller than the actual (underestimation), whereas a negative $e_t$ value indicates an overestimation, i.e., the forecast is larger than the actual.

The error measures are categorised into scale-dependent, scale-independent, percentage, and relative. Scale-dependent errors have the same scale as the data and are used for comparing different forecasting methods applied to the same data. It is not appropriate to use such measures to compare datasets with different scales. In scale-independent error measures, the error is scaled and becomes independent of the scale of the data.
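The sign convention of Equation (6) and the difference between scale-dependent and scaled measures can be illustrated with a short Python sketch. The scaling used here, division by the in-sample mean absolute error of the naive one-step forecast, follows the general form of the mean absolute scaled error of Hyndman & Koehler (2006); treating it as the exact definition used in this study is an assumption, and the function names and numbers are hypothetical.

```python
def forecast_errors(actual, forecast):
    """e_t = A_t - F_t for each period, Equation (6)."""
    return [a - f for a, f in zip(actual, forecast)]


def mean_error(actual, forecast):
    """Average error: positive means under-forecasting, negative over-forecasting."""
    errors = forecast_errors(actual, forecast)
    return sum(errors) / len(errors)


def mean_absolute_error(actual, forecast):
    """Scale-dependent accuracy measure in the units of the demand data."""
    errors = forecast_errors(actual, forecast)
    return sum(abs(e) for e in errors) / len(errors)


def scaled_mae(actual, forecast, history):
    """MAE divided by the in-sample MAE of the naive forecast (MASE-style scaling)."""
    naive_errors = [abs(b - a) for a, b in zip(history, history[1:])]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("history is constant, so the scale is undefined")
    return mean_absolute_error(actual, forecast) / scale


history = [0, 3, 0, 0, 2, 0, 5]  # in-sample demand, illustrative values only
actual = [0, 2, 0, 4]            # hold-out demand
forecast = [1, 1, 1, 1]          # flat forecast, as CR or SBA would produce

print(forecast_errors(actual, forecast))
print(mean_error(actual, forecast))
print(mean_absolute_error(actual, forecast))
print(scaled_mae(actual, forecast, history))
```

A positive mean error here signals that the flat forecast is, on average, below the realized demand, while the scaled value can be compared across SKUs whose demand is measured on very different scales.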
Scale-independent measures can be used to compare forecasting methods both on a single series and between different series. Relative error measures are used to determine the best-performing method by dividing the forecast errors obtained from one forecasting method by those obtained from another. This division scales the measures (Hyndman & Koehler, 2006).[1]

As the chosen forecast error measure can influence the performance ranking of forecasting methods, there is no metric that is universally best (Silver et al., 1998). The most appropriate performance measures for intermittent series are the mean absolute scaled error (MASE, as the standard measure for data with different scales and zero values; Hyndman & Koehler, 2006), the scaled mean absolute error (sMAE), scaled mean square error (sMSE), and scaled cumulative error (sCE), as traditional accuracy measures (Wallström & Segerstedt, 2010), and scaled periods in stock (sPIS), as a bias error measure (Hyndman & Koehler, 2006; Kourentzes, 2014; Petropoulos & Kourentzes, 2015). The Relative Geometric Root Mean Square Error (RGRMSE), first proposed by Fildes (1992), is also one of the most used relative error measures. Syntetos (2001) and Syntetos & Boylan (2001) recommended RGRMSE as a well-behaved accuracy measure for ID. In the recent M5 competition, the Root Mean Squared Scaled Error (RMSSE) was required to evaluate the accuracy of point forecasts for ID (Makridakis et al., 2022).

After calculating the forecast accuracies for each SKU, the results are compared, considering different measures. The final forecasts are generated using the best-performing methods and parameters for each SKU.

[1] The details of the error measures are available in the Appendix.

3.4 Making judgmental forecasts

The mental capacity of humans for processing information has reasonable boundaries. Judgmental forecasters quickly arrive at a point where more information is no longer useful in making more accurate forecasts.
Furthermore, regardless of how intelligent humans are, they are incapable of learning about complex relationships solely through experience. Thus, it is difficult for them to forecast complex, uncertain situations without the aid of structured techniques (Green & Armstrong, 2015). In general, it has been demonstrated that standardizing the techniques applied by experts will increase accuracy (Armstrong, 2001a). The jury of executive opinion, the Delphi method, analogies, and scenario forecasting are some of the judgmental methods that can be used to reflect experience and knowledge about the SKUs in the process of forecasting ID. These techniques, which are described in more detail below, can be used to produce judgmental forecasts in parallel with the statistical model construction in Stage 3 of the process.

3.4.1 The jury of executive opinion

The jury of executive opinion, as one of the most common methods used in judgmental forecasting, relies on the opinions and expertise of high-level managers, executives, or experts who have the best insights about the firm’s future situation. The final group forecast for a product is a blend of the opinions of these executives from different functional areas such as sales, marketing, and production. The opinions can be collected through either personal interviews or group meetings. Although group meetings offer the chance to discuss various viewpoints, an expert with a strong personality may dominate and thereby affect the final forecast (Wilson & Keating, 2008). In the jury of executive opinion method, companies can also use statistical models to help with the analysis (Wright, 2013).

3.4.2 Delphi method

Assuming that a group’s forecast is more accurate than an individual’s forecast, the Delphi method aims to construct consensus forecasts from a group of experts in a structured and iterative manner (Hyndman & Athanasopoulos, 2018).
The Delphi procedure, implemented and managed by a facilitator, has the following steps:
(i) Assemble a panel of between 5 and 20 experts with diverse expertise.
(ii) Set the forecasting tasks and deliver them to the experts.
(iii) Experts return preliminary forecasts and rationales, which the facilitator summarizes to provide feedback.
(iv) Deliver the feedback to the experts so that they can revise their forecasts based on the opinions of others. This step is repeated two to three times until the experts reach a satisfactory level of consensus.
(v) Construct the final forecasts by aggregating the forecasts of the experts.

This procedure has four key components: anonymity, iteration, controlled feedback, and group response aggregation. Throughout the process, all participating experts maintain their anonymity and express their opinions privately, without social and political pressure. When group-based processes are used to gather and synthesize information, anonymity can minimize the effects of dominant individuals, which are typically a problem (Dalkey, 1969). The geographical dispersion of the experts, as well as the use of electronic communication, facilitates confidentiality and reduces problems associated with group dynamics, such as manipulation or duress to accept a viewpoint (Hsu & Sandford, 2007). Iteration allows experts to change their minds without losing face in the eyes of the remaining group members (Hanke & Wichern, 2014). Usually, two or three iterations are sufficient, since experts may drop out as the number of iterations increases (Hyndman & Athanasopoulos, 2018). The Delphi method uses controlled feedback to reduce the effect of noise, i.e., group interaction that distorts the data and drifts toward other concerns rather than concentrating on the main topic. With controlled feedback, a structured description of the prior iteration is distributed to the experts.
Using the feedback, experts can generate additional insights (Hsu & Sandford, 2007). The group response is usually generated by giving each expert’s forecast equal weight.

3.4.3 Forecasting by analogies

Another judgmental forecasting method is the use of analogies. An analogy contains information about how people behaved in a similar situation in the past. The forecaster identifies a pattern that occurred in the past and applies the same pattern to a new issue. Analogies are expected to be helpful in forecasting decisions in conflict situations, such as strikes or international disputes, since they provide essential information for situations that are difficult to forecast (Green & Armstrong, 2007). However, analogies are generally used in an unstructured way when people make judgmental forecasts. Armstrong (1985) showed that structured judgmental forecasting methods provide higher accuracy than unstructured ones. The structured analogy is related to Case-Based Reasoning (CBR), which is used in cognitive science and artificial intelligence. In CBR, information about situations (cases) is stored with the intention of recalling cases that are comparable to a target problem, thereby assisting in problem-solving (Armstrong, 2001a). Green & Armstrong (2007) proposed a structured approach for forecasting with analogies, which may encourage experts to consider more information on analogies and to process it effectively. Structured analogies have five steps:
(i) The appointed administrator describes the target situation briefly and accurately.
(ii) The administrator chooses at least five experts who have knowledge about situations similar to the target situation.
(iii) Experts identify and describe as many analogies as they can. Based on each of these analogies, forecasts are generated. Green & Armstrong (2007) stated that forecasters with experience of analogous situations who identify more than two analogies generate the most accurate forecasts.
(iv) Experts list similarities and differences between their analogies and the target situation. They then rate the similarity of each analogy to the target situation on a scale.
(v) From the experts’ analogies, the administrator derives the forecasts using a set rule. This rule can be a weighted average in which the weights are guided by the experts’ ranking scores for each analogy (Hyndman & Athanasopoulos, 2018).

The lack of situations comparable to the target is a limitation of structured analogy.

3.4.4 Scenario Forecasting

Scenario-based forecasting is fundamentally different from the other judgmental forecasting methods in its methodology. This method creates forecasts using scenarios that are intended to be plausible but not necessarily most likely. These scenarios portray either a quick snapshot of the future or a believable progression from the present to the future (Bunn & Salo, 1993). Each scenario-based forecast may have a low chance of occurring, unlike forecasts from the Delphi method or from analogies, where the anticipated outcome is meant to be a likely one.

The effects and interactions of all potential factors and forecasting goals are considered when creating the scenarios. The scenarios help managers to understand the role of uncertainties better. Also, some extremes, such as the best, middle, and worst-case scenarios, can be identified when building forecasts based on scenarios. Keeping track of these extremes can help with early emergency planning (Hyndman & Athanasopoulos, 2018). Scenario techniques are found to be more popular in capital-intensive industries with long planning horizons, such as oil companies, vehicle manufacturers, and electricity suppliers.

The selection of judgmental procedures is influenced by significant changes, frequent forecasts, disagreements among decision-makers, and policy considerations (Armstrong, 2001b). If the expected changes are not significant, methods are likely to differ slightly in accuracy.
Expert forecasts, which can be adapted to the condition and prepared instantly, may also suffice for infrequent forecasts. If decision-makers expect large changes in the situation and are not in conflict, forecasts can be obtained from experts through the jury of executive opinion or the Delphi method. Scenario forecasting is an alternative when decision-makers need forecasts to examine different policies and it is difficult to find relevant analogies.

4 Combining statistical and judgmental forecasts

The fifth stage combines the forecasts from the best statistical model with the judgmental forecasts. Combining forecasts generated by different methods plays an essential role in improving overall accuracy, because each technique adds information to the forecasting process that a single technique cannot provide. Blattberg & Hoch (1990) stated that the final forecasts generated by combining judgmental and statistical forecasts are more accurate than the constituent forecasts. The opinions of the experts add contextual information, which explains unexpected issues in the data. It also helps forecasters to understand future events that would alter the patterns seen in the historical data.

Combination procedures can range from mechanical methods, such as taking a simple or weighted average of the constituent forecasts, to using judgment to determine how forecasts should be combined (Lawrence et al., 2006). The simplest approach is an equal-weight arithmetic average of the individual methods (i.e., a weight of 1/M each, where M is the number of forecasting methods to be combined), which provides improvements in accuracy for many forecasts (Clemen, 1989). More complex weighting schemes can also be determined by judgments based on which modeling approach seems strongest or by several trials considering equal or alternate weighting.
For instance, if the practitioner believes that some issues may cause one constituent forecasting method to perform better than the other, s/he can give this method a heavier weight. However, this complexity and the time it consumes would make combining forecasts less desirable for an organization (Sanders & Ritzman, 2004). Forecast errors range from 5.5% to 94% when forecasts are combined using a complex method (Duncan et al., 2001). The findings of Fildes & Petropoulos (2015) supported differential weighting when there is prior evidence on which methods provide the most accurate forecasts under the given conditions.

Armstrong (2001c) proposed some procedures for combining the forecasts. Combining should be done mechanically to obtain greater accuracy and reduce bias, and all procedures should be described in detail. When judgment is used, it should be applied in a structured manner, and the details of the procedure should be documented. When there is insufficient information about the relative accuracy of alternate forecasting sources, judgmental weights should be avoided. When subjects provide positive feedback about the accuracy of the sources, judgmental weighting is more accurate (Fischer & Harvey, 1999). Armstrong (2001c) stated that when the practitioner is uncertain about the forecasting methods to be used, each forecast should be weighted equally; differential weighting is appropriate only when there is prior knowledge of which methods produce the most accurate forecasts in the circumstances (Fildes & Petropoulos, 2015).

4.1 Evaluation

The last stage of the framework evaluates the performance of the statistical, judgmental, and combined forecasts to find the best-performing one for each SKU. In the final evaluation of this study, the performance of each model is compared based on MASE, RMSE, S-MAPE, and bias.
Bias is calculated by averaging the difference between the actual and predicted values. This value should be close to zero if the forecast is unbiased. A positive value indicates underestimation, and a negative value means that the forecasting method overestimated the values. Aside from producing the most accurate forecasts, the chosen forecasting method should also produce forecasts that are timely and easily understood by management, allowing the forecast to aid in making better decisions (Hanke & Wichern, 2014).

5 Empirical Illustration

The proposed framework of this study is illustrated on real-world data that contains monthly customer demands (60 months, from 2014 to 2018) for after-sales spare parts from an anonymous company operating in the electronics industry and manufacturing small home appliances. The company did not provide any details other than the demanded quantities of the SKUs, and all SKU names were changed for anonymity. The raw dataset contains 8,498 SKUs. Because the company withheld information about the SKUs for reasons of anonymity, it is assumed in this study that the SKUs in the dataset are independent of one another. Furthermore, it is assumed that the historical data was accurately and properly recorded.

5.1 Data preparation

The demand data is examined to prepare for the forecasting process. All calculations and data analyses are performed by using MS Excel 2019 and RStudio (version 3.6.0).² First, the historical demand dataset is examined to determine whether it is incomplete or inconsistent. SKUs with no demand at all over the 60 months are treated as phased out and removed. Also eliminated are SKUs with no or too few demand occurrences in the first and last 24 months, those with fewer than ten demand occurrences in 60 months, and products with no demand between months 24 and 36. Finally, there are 2,431 SKUs left, totaling 145,860 data points.
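The elimination rules of the data preparation step can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function name `keep_sku`, the parameter `min_edge_occurrences` (standing in for the unspecified "too few" threshold), and the reading of "between months 24 and 36" as months 25 to 36 are all assumptions made for the example.

```python
def count_occurrences(values):
    """Number of periods with non-zero demand."""
    return sum(1 for v in values if v > 0)


def keep_sku(demand, min_total_occurrences=10, min_edge_occurrences=1):
    """Return True if a 60-month demand series survives the elimination rules.

    min_edge_occurrences is a hypothetical stand-in for the "too few"
    threshold, which the text does not specify.
    """
    if len(demand) != 60:
        raise ValueError("expected 60 monthly observations")
    # Fewer than ten demand occurrences in 60 months (also covers all-zero SKUs).
    if count_occurrences(demand) < min_total_occurrences:
        return False
    # No or too few demand occurrences in the first or the last 24 months.
    if count_occurrences(demand[:24]) < min_edge_occurrences:
        return False
    if count_occurrences(demand[-24:]) < min_edge_occurrences:
        return False
    # No demand in the middle window (assumed here to be months 25 to 36).
    if count_occurrences(demand[24:36]) == 0:
        return False
    return True


# Illustrative series: one unit demanded every third month.
regular = [1 if month % 3 == 0 else 0 for month in range(60)]
print(keep_sku(regular))

# The retained dataset size quoted in the text: 2,431 SKUs x 60 months.
print(2431 * 60)
```

With the actual thresholds used by the authors, a filter of this kind would reduce the 8,498 raw SKUs to the 2,431 retained ones.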
Descriptive statistics given in Table 2 show the variety in the demand intervals, non-zero demand sizes, and demand per period. The demand intervals imply that the number of consecutive zero demands is high.

² The codes are available upon request.

Table 2: Descriptive statistics for after-sales spare parts (2,431 SKUs in total)

                                 Minimum   First quartile (25%)   Median   Third quartile (75%)   Maximum
Demand intervals    Mean         1.00      1.07                   1.52     2.63                   7.30
                    Std. Dev.    0.00      0.26                   0.85     2.07                   4.92
Demand sizes        Mean         1.11      5.24                   9.84     17.83                  44.69
                    Std. Dev.    0.79      19.51                  37.24    64.47                  149.37
Demand per period   Mean         0.63      5.39                   10.65    19.05                  50.19
                    Std. Dev.    6.24      28.61                  48.10    75.09                  169.44

5.2 Categorization of the demand data

According to the SBC categorization scheme, which considers the average inter-demand interval ($p$) and the squared coefficient of variation ($CV^2$) for spare parts, the dataset is categorized by using the idclass function of the R package “tsintermittent” (Kourentzes & Petropoulos, 2016). 389 SKUs have smooth demand patterns, while the remaining 2,042 SKUs have erratic, lumpy, or intermittent patterns, as shown in Figure 5.

Figure 5: SBC classification for after-sales spare parts

In Table 3, the distribution of the dataset according to demand categories is presented using the $p$ and $CV^2$ values for classification. The largest group of SKUs in the dataset (33.0%) has an intermittent pattern, while the smallest group (16.0%) has a smooth demand pattern.

Table 3: Demand patterns for after-sales spare parts

Pattern        Condition                        Quantity   Percentage
Smooth         p ≤ 1.32 and $CV^2$ ≤ 0.49       389        16.0
Erratic        p ≤ 1.32 and $CV^2$ > 0.49       582        23.9
Intermittent   p > 1.32 and $CV^2$ ≤ 0.49       802        33.0
Lumpy          p > 1.32 and $CV^2$ > 0.49       658        27.1

5.3 Building statistical forecast model

This stage is primarily concerned with obtaining the best statistical forecast model for monthly customer demands for after-sales spare parts.
The analysis is carried out in the following steps to obtain this model: identification of statistical forecasting methods based on the nature of the data, estimation of models, evaluation of models based on forecast accuracy, and selection of the final model to generate forecasts.

First, the statistical forecasting methods to be applied to each SKU in the dataset are determined: the traditional single exponential smoothing (SES), the pioneering Croston method (CR) proposed for the ID pattern, and the Syntetos-Boylan Approximation (SBA) as one of CR’s variants.

Each model is estimated in the second step using the proper parameters. The datasets used to build forecast models are typically split into training and test sets (Hyndman & Athanasopoulos, 2018). Out of 60 months, the first 42 are used as a training set to estimate the parameters of the statistical forecasting models, and the remaining months are used as the test set to evaluate the models’ accuracy. The “tsintermittent” package for R is used to develop the models. The package offers the “sexsm” function for building the traditional SES model and the “crost” function for the CR and SBA models. These three methods use the smoothing constant (α) for forecasting as well as initial values to start their algorithms, so the selection of these parameters directly affects how well they perform. The literature usually suggests a small α ranging from 0.05 to 0.30. For ID data, low smoothing constants between 0.10 and 0.20 are realistic (Croston, 1972; Syntetos et al., 2005). Therefore, α values between 0.05 and 0.30, in increments of 0.05, are used in this analysis. Regarding the initial values to start their algorithms, SES requires one initial value for the first estimate, whereas CR and SBA require two initial values (separately for the demand size and the interval between non-zero demand estimates).
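To make the role of α and of the initial values concrete, the following is a minimal Python sketch of the three methods in their textbook form. It is not the “tsintermittent” implementation used in the study: the function names `ses` and `croston`, the single shared α for size and interval, and the toy demand series are assumptions chosen for illustration.

```python
def ses(demand, alpha, init_level):
    """Single exponential smoothing; returns the forecast after the last period."""
    level = init_level
    for y in demand:
        level = alpha * y + (1 - alpha) * level
    return level


def croston(demand, alpha, init_size, init_interval, sba=False):
    """Croston's method with one smoothing constant for size and interval.

    With sba=True the Syntetos-Boylan correction factor (1 - alpha / 2)
    is applied to the demand-rate forecast.
    """
    size, interval = init_size, init_interval
    periods_since_demand = 1
    for y in demand:
        if y > 0:
            # Estimates are updated only in periods with non-zero demand.
            size = alpha * y + (1 - alpha) * size
            interval = alpha * periods_since_demand + (1 - alpha) * interval
            periods_since_demand = 1
        else:
            periods_since_demand += 1
    rate = size / interval
    return (1 - alpha / 2) * rate if sba else rate


# The preset smoothing constants described in the text: 0.05 to 0.30 by 0.05.
alphas = [round(0.05 * k, 2) for k in range(1, 7)]

# Toy intermittent series (hypothetical data).
toy_demand = [2, 0, 2, 0]
for alpha in alphas:
    print(alpha,
          ses(toy_demand, alpha, init_level=2.0),
          croston(toy_demand, alpha, init_size=2.0, init_interval=2.0),
          croston(toy_demand, alpha, init_size=2.0, init_interval=2.0, sba=True))
```

The two `init_` arguments of `croston` correspond to the two initial values mentioned above, one for the demand size and one for the inter-demand interval, while `ses` needs only one.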
The initial values can be determined by using the naïve method (one step ahead), calculated by taking the mean of the related values, or set judgmentally by the practitioner. Similarly, practitioners may choose to use optimal parameters rather than fixed initial values. Cost (or loss) functions are used for the optimization process, and they measure the fit of a model to the actual data. MAR (mean absolute rate), MAE (mean absolute error), MSR (mean squared rate) and MSE (mean squared error) are the cost functions used in this analysis, and the parameters minimizing the specified cost functions are chosen. The “crost” function offers an option to use a single or two optimal α parameters for CR and SBA. The parameter α can be calculated automatically by choosing one of the predetermined cost functions. In this study, besides the preset α values, forecasts are generated by using all combinations of single and double α values that optimize each cost function. For the preset forecasts, the calculations are repeated for the mean and naïve initial values. In total, for SBA and CR, 56 different forecasts are generated for each SKU. The forecast horizon is 18 months, which is equal to the length of the test set.

Once the forecasts are generated, the best-performing method and parameters for each SKU are chosen in the third step based on the forecast accuracies. For evaluating the statistical forecasting methods, scaled mean absolute error (sMAE), scaled mean squared error (sMSE), root mean squared error (RMSE), symmetric mean absolute percentage error (S-MAPE), mean absolute scaled error (MASE), scaled cumulative error (sCE), and scaled periods in stock (sPIS) are used in the study. Scaling the errors turns the accuracy measure into a scale-independent form. Scale-independent errors can be compared across the series. That is why sMAE, sMSE and MASE are selected.
Also, MASE is commonly used in evaluating ID forecasts since it is appropriate for this problematic demand pattern. RMSE is a scale-dependent error measure that is widely used in practice, so it is included in the comparison process as well. S-MAPE shows accuracy as a percentage and overcomes the division-by-zero problem that MAPE has in the ID context.

These measures cannot determine whether there is a systematic error, i.e., bias. To determine whether the forecasting methods generate unbiased results, it is essential to use bias measures together with the other accuracy measures. Since sCE and sPIS are scaled measures, they are used as bias measures to make comparisons across the series.

The forecast accuracies are obtained by using the “smooth” and “Metrics” packages for R. The “Accuracy” function of the “smooth” package provides sMAE, sMSE, sCE and sPIS, while the MASE, S-MAPE and RMSE values can be calculated with the help of the “Metrics” package.

Table 4: Accuracy and bias measures for the sample SKU-P1 (partial representation)

MODEL             MASE   SMAPE   RMSE       sMAE   sMSE   sCE     sPIS
P1.acc.m.ses020   2.07   0.87    808.02     0.75   0.52   7.47    54.97
P1.acc.m.ses025   2.43   0.93    914.73     0.88   0.66   8.76    63.38
P1.acc.m.ses030   2.82   0.99    1,035.80   1.02   0.85   10.19   72.63
P1.acc.m.sba005   1.05   0.72    429.07     0.38   0.15   1.49    16.09
P1.acc.n.sba010   1.12   0.69    518.04     0.40   0.21   3.44    28.79
P1.acc.n.sba015   1.19   0.70    549.37     0.43   0.24   3.96    32.15

All these accuracies are retrieved from the relevant R functions and then organized in an Excel file, shown partially in Table 4, for the evaluations. This accuracy table has 136,136 rows and seven columns. The first column (Model) shows an ID that describes the model performed.
This ID has five components: SKU number (e.g., P1), accuracy function (e.g., acc), initial value (m for mean, n for naïve and o for optimum), forecast method (CR, SBA or SES) and α value (0.05-0.30).

In the last step, the accuracies of the forecasting methods are checked for each SKU. Based on the different measures, the best-performing method and parameters (i.e., α and initial values) are selected for the demand dataset. For example, in Table 4, the best-performing forecasting model for the SKU coded as P1 is the SBA method with an α value of 0.05 and initial values determined by the mean. This selection considers all of the accuracy measures, since the best-performing method is not always the same for all of them. Except for sCE and sPIS, smaller values show that the forecast is more accurate. For sCE and sPIS, which are bias measures, values close to 0 are preferable; large positive or negative values indicate stock problems.

The results are presented overall based on the demand pattern of the SKUs. The number of SKUs for which each forecasting method and parameter performs best is used to assess the performance of these methods and parameters. Across all preset smoothing α values and those calculated by minimizing the cost functions, α = 0.30 is the setting under which the statistical forecasting models most often performed best in every demand pattern (Table 5). Contrary to the guidelines suggested in the literature, larger smoothing values performed better for most SKUs in this dataset. To identify the best method for determining the initial values of the selected statistical forecasting models, the number of SKUs that performed best with each of the mean value, the naïve estimate, and the optimum value obtained from the optimization process is calculated.
For 41.2% of all SKUs (1,002 out of 2,431), all of which have erratic, intermittent, or lumpy demand patterns, using the mean of a few more recent observations in the dataset as the initial values of the statistical forecasting models resulted in better accuracy; the mean is the most frequent best choice within each of these three patterns (Table 6). For a smooth demand pattern, the naïve method is better for determining the initial values.

Table 5: Optimum smoothing parameters based on the demand pattern

α      Erratic (582)   Lumpy (658)   Intermittent (802)   Smooth (389)
0.05   41    7%        88    13%     145   18%            54    14%
0.10   33    6%        47    7%      87    11%            49    13%
0.15   37    6%        47    7%      78    10%            35    9%
0.20   42    7%        67    10%     81    10%            51    13%
0.25   69    12%       61    9%      56    7%             35    9%
0.30   320   55%       303   46%     257   32%            130   33%
mae1   9     2%        19    3%      29    4%             9     2%
mae2   2     0%        11    2%      29    4%             6     2%
mar1   1     0%        3     0%      7     1%             2     1%
mar2   -     -         -     -       4     0%             1     0%
mse1   8     1%        2     0%      7     1%             3     1%
mse2   16    3%        6     1%      12    1%             7     2%
msr1   1     0%        2     0%      8     1%             2     1%
msr2   3     1%        2     0%      2     0%             5     1%

Note: For each demand pattern, the number in parentheses shows the total, and the first and second columns show the number and percent of SKUs, respectively.

Table 6: The method for identifying initial values of statistical models

Pattern              Method    # of SKUs   % of SKUs
Erratic (582)        Mean      284         49%
                     Naïve     258         44%
                     Optimum   40          7%
Intermittent (802)   Mean      362         45%
                     Naïve     342         43%
                     Optimum   98          12%
Lumpy (658)          Mean      356         54%
                     Naïve     257         39%
                     Optimum   45          7%
Smooth (389)         Mean      150         39%
                     Naïve     204         52%
                     Optimum   35          9%

When the performances of the statistical forecasting methods are compared with each other (Table 7), SBA achieved the best accuracy for the largest share of SKUs in every demand pattern (61.4% overall; 1,493 out of 2,431 SKUs). For intermittent and lumpy demand patterns, SES (39.7%; 580 out of 1,460 SKUs) also performed better than CR.
Table 7: Performance of statistical forecasting methods according to demand patterns

Pattern              Method   # of SKUs   % of SKUs
Erratic (582)        CR       48          8%
                     SBA      446         77%
                     SES      88          15%
Intermittent (802)   CR       69          9%
                     SBA      452         56%
                     SES      281         35%
Lumpy (658)          CR       45          7%
                     SBA      314         48%
                     SES      299         45%
Smooth (389)         CR       56          14%
                     SBA      281         72%
                     SES      52          13%

5.4 Making judgmental forecasts

In this study, judgmental forecasts are generated by the company by using executive opinions. The company usually relies on the judgmental forecasts generated by the executives instead of using any statistical techniques. This can be an advantage in generating accurate forecasts, as the executives have deep experience in their business context. For the SKUs under investigation, judgmental forecasts were generated by the purchasing specialists, as the experts with the best insights about the company’s future. Using their experience, they decide the number of spare parts to order from their suppliers.

Besides historical demand data, this study requires a proxy variable that can reproduce historical judgmental forecasts. For this purpose, purchasing records for 3,639 SKUs covering July 2017-December 2018 (18 months) are obtained from the company. However, only 1,205 of these records could be matched with the demand data. Thus, 1,205 SKUs are used for the remaining steps of this forecasting process. This way of obtaining judgments can be regarded as equivalent to using the jury of executive opinion method in an unstructured way to get the judgmental forecasts. It is assumed at this point that the experts made the judgmental forecasts and are aware of the variables influencing the demand for the SKUs for which they are responsible.
5.5 Combining statistical and judgmental forecasts

The next stage of the proposed framework combines the forecasts of the statistical methods selected for all SKUs in both datasets with the judgmental forecasts provided by the company. This study aims to generate more accurate forecasts for ID by combining statistical forecasts, generated by appropriate methods chosen with the guidance of the literature, with judgmental forecasts based on expert opinions. When the time series is highly variable, as in the case of SKUs with ID patterns, it is appropriate to use judgmental forecasts. Thus, it is expected that this combination process will improve forecast accuracy.

When combining the constituent forecasts for these datasets, weighted average procedures are used, as this simple method performs better than sophisticated combination strategies (Clemen, 1989; Sanders & Ritzman, 2004). Even using equal weights for the constituent forecasts (0.50 for the statistical forecast and 0.50 for the judgmental forecast) provides rather good performance (Clemen, 1989). Besides simple averaging, weights ranging from 0.10 to 0.90 are used in order to see the effect of different weights on the accuracy level. Thus, nine different combinations are implemented in the MS Excel environment. As the judgmental forecasts provided by the company cover 18 months, the combination procedure is applied only to this range of the dataset.

The statistical and judgmental forecasts are combined with the weighted average procedure. Assume the weights for the statistical and judgmental forecasts for a SKU are 0.6 and 0.4, respectively. The combined forecast for this SKU is then calculated as (0.6 × Statistical Forecast Value + 0.4 × Judgmental Forecast Value).

5.6 Evaluation

The forecast accuracy for each SKU is calculated for all of the resulting combinations.
For the best-performing statistical methods, the judgmental forecasts provided by the company, and all combinations with different weights, MASE, RMSE, and bias are calculated for each SKU by using the “Metrics” package. The forecasting method that provides the highest accuracy is selected among the combined, statistical, and judgmental forecasts.

The performance measures for all forecasting models developed in this study are partially displayed in Table 8. Based on the three performance measures, the sample SKUs perform best under different combinations and methods. The overarching standard for selecting the best method is to look for consistent results across all measures. When the measures are inconsistent, the criterion is to select the method with the lowest values of the two accuracy measures (MASE and RMSE) together with the bias closest to zero.

Table 8: Performance measures for all forecast models (partial representation)

SKU    Measure  S      J      0.1S-0.9J  0.2S-0.8J  0.3S-0.7J  0.4S-0.6J  0.5S-0.5J  0.6S-0.4J  0.7S-0.3J  0.8S-0.2J  0.9S-0.1J
P65    MASE     0.84   1.89   1.69       1.48       1.28       1.11       0.99       0.92       0.87       0.83       0.82
       RMSE     10.29  22.75  20.77      18.84      16.99      15.25      13.65      12.26      11.14      10.39      10.10
       Bias     2.01   20.39  18.15      15.91      13.67      11.43      9.19       6.95       4.71       2.47       0.23
P114   MASE     0.96   1.48   1.28       1.08       0.87       0.74       0.69       0.67       0.68       0.74       0.82
       RMSE     6.79   9.78   8.62       7.52       6.51       5.65       5.01       4.69       4.74       5.17       5.88
       Bias     5.41   6.83   5.61       4.39       3.16       1.94       0.71       0.51       1.74       2.96       4.18
P775   MASE     0.79   2.26   2.05       1.85       1.64       1.44       1.23       1.02       0.90       0.82       0.80
       RMSE     2.61   6.64   6.07       5.51       4.97       4.45       3.96       3.51       3.12       2.82       2.64
       Bias     0.15   6.11   5.49       4.86       4.23       3.61       2.98       2.35       1.73       1.10       0.47
P934   MASE     2.39   2.40   1.94       1.51       1.15       0.89       0.80       0.91       1.18       1.52       1.94
       RMSE     7.94   7.97   6.65       5.40       4.30       3.46       3.13       3.44       4.27       5.37       6.62
       Bias     7.29   7.33   5.87       4.41       2.95       1.48       0.02       1.44       2.91       4.37       5.38

Note: S stands for Statistical Model and J stands for Judgmental Forecast.
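The weighted-average combination of Section 5.5 and the comparison summarized in Table 8 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the study's Excel and R workflow: all function names and the toy data are hypothetical, MASE is scaled here by the in-sample MAE of the naïve forecast (which may differ from the scaling used by the “Metrics” package), and `pick_best` simplifies the selection criterion to "lowest MASE, ties broken by RMSE and then by absolute bias".

```python
from math import sqrt


def combine(stat, judg, w_stat):
    """Weighted average of statistical and judgmental forecasts."""
    return [w_stat * s + (1.0 - w_stat) * j for s, j in zip(stat, judg)]


def bias(actual, forecast):
    """Mean of (actual - forecast); positive values mean underestimation."""
    return sum(a - f for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mase(actual, forecast, train):
    """MAE scaled by the in-sample MAE of the naive forecast (assumed scaling)."""
    scale = sum(abs(train[t] - train[t - 1]) for t in range(1, len(train))) / (len(train) - 1)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


def evaluate(actual, stat, judg, train):
    """Score S, J and the nine weighted combinations (weights 0.1 to 0.9)."""
    candidates = {"S": stat, "J": judg}
    for k in range(1, 10):
        w = k / 10
        candidates[f"{w:.1f}S-{1 - w:.1f}J"] = combine(stat, judg, w)
    return {name: (mase(actual, f, train), rmse(actual, f), bias(actual, f))
            for name, f in candidates.items()}


def pick_best(scores):
    """Simplified rule: lowest MASE, then lowest RMSE, then bias closest to zero."""
    return min(scores, key=lambda n: (scores[n][0], scores[n][1], abs(scores[n][2])))


# Hypothetical toy data for a single SKU (not taken from the study).
train = [0, 2, 0, 4, 0, 0]
actual = [0, 4, 0, 8]
stat = [1, 3, 1, 6]
judg = [2, 6, 0, 10]

scores = evaluate(actual, stat, judg, train)
best = pick_best(scores)
print(best, scores[best])
```

Running such a comparison for every SKU and counting how often each candidate wins would yield a tally of the kind reported in Table 9.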
Table 9 lists how many SKUs were chosen as the top performers in the corresponding settings. These results show that the majority (64.2%; 774 out of 1,205) of SKUs performed better when judgmental and statistical forecasts were combined. Moreover, for many SKUs, the combinations that give statistical forecasts more weight performed better. For the pure statistical models, the best performance was observed for 33.4% (402 out of 1,205) of SKUs. On the other hand, the judgmental forecasts provided by the company’s purchasing specialists performed best for only 2.4% of SKUs, worse than the alternative approaches. Models in which statistical forecasts are given a weight greater than or equal to 0.5 performed better for 666 SKUs (55.3%). To sum up, most of the most accurate forecasts are generated by the combination process.

Table 9: Overall comparison of all models based on $p$ and $CV^2$

Model             # of SKUs   Average p   Average $CV^2$
0.1S - 0.9J       10          1.76        0.81
0.2S - 0.8J       19          2.19        0.69
0.3S - 0.7J       38          1.79        0.98
0.4S - 0.6J       41          1.89        0.71
0.5S - 0.5J       120         1.87        0.73
0.6S - 0.4J       68          1.36        0.84
0.7S - 0.3J       109         1.37        0.74
0.8S - 0.2J       162         1.30        0.74
0.9S - 0.1J       207         1.27        0.58
Judgmental        29          2.44        0.79
Statistical       402         1.44        0.56
Total / Average   1,205       1.49        0.66

To gain additional insights into the results, the average $p$ and $CV^2$ values in each sub-group are also computed. To investigate the relationship between the weights assigned to statistical forecasts and the parameters describing ID characteristics (i.e., the average inter-demand interval $p$ and the $CV^2$ values), an OLS regression analysis is performed.

The analysis showed that 70% of the variation in the weights assigned to statistical forecasts could be explained by the average inter-demand interval $p$ and the $CV^2$ values (F=29.08 with p-value<0.05). The relationship between the weights assigned to statistical forecasts and the $CV^2$ values is found to be negative (t=-3.136 with p-value<0.05).
A negative relationship between the weights assigned to statistical forecasts and the value of p is also found (t = -5.98, p-value < 0.05). For higher inter-demand intervals and CV² values, lower weights may therefore be preferred for statistical forecasts in combined models. However, this finding needs to be tested with stronger evidence in further research specifically designed to explore such connections.

6 Conclusion

Combining forecasts in the context of ID is an under-explored research area in the literature. Studies combining statistical and judgmental forecasts for the ID of SKUs are particularly scarce. Motivated by this gap, this study developed a methodological framework for combining statistical and judgmental forecasts for ID. The proposed framework is described as a six-stage model for obtaining more accurate results for SKUs with ID patterns. In line with the definition of simplicity in Green & Armstrong (2015), the study defined this framework as a set of processes that are understandable to forecast users. The framework is then tested using real-world data that includes monthly customer demand for after-sales spare parts from an anonymous company that manufactures small home appliances in the electronics industry. Results showed that combination is the best method for the majority of SKUs, as expected.

This paper contributes to the limited literature by addressing the gap between the combined forecast and the ID forecast. Companies primarily employ judgmental approaches in which their forecasts depend entirely on business context information. Such purely judgmental forecasts therefore risk being biased. The proposed framework gives practitioners and researchers a comprehensive overview to help them make more accurate forecasts while also encouraging the use of simple but structured approaches.

On the other hand, the results of this study must be interpreted in light of some limitations.
The first concerns data quality and missing information. The company provided the data without giving any details about the SKUs, such as what each SKU is, whether it is a new item or no longer in use, and whether it is similar to other SKUs. Thus, all calculations assume that the SKUs are independent of each other and were actively used during the given periods. More detail about the SKUs would help to categorize and arrange the dataset for the forecasting process, which could improve accuracy. The second limitation is the method used to generate judgmental forecasts. During the implementation, company representatives generated judgmental forecasts following the standard procedure in their operations. It is assumed that the representatives made these forecasts as experts who are aware of the variables influencing demand for the SKUs for which they are responsible. To better measure the model’s performance, this part of the implementation could use more structured and controlled methods, as suggested in this study. Future research can examine the framework’s applicability in different contexts. A toolkit, once developed, could help and motivate practitioners to put this model into action.

References

Armstrong, J. S. (1985). Long Range Forecasting: From Crystal Ball to Computer (2nd ed.). Wiley-Interscience, New York.
Armstrong, J. S. (2001a). Combining Forecasts. In J. S. Armstrong (Ed.), Principles of Forecasting: A Handbook for Researchers and Practitioners. Kluwer, Dordrecht. https://repository.upenn.edu/marketing_papers/150.
Armstrong, J. S. (2001b). Judgmental Bootstrapping: Inferring Experts’ Rules for Forecasting. In J. S. Armstrong (Ed.), Principles of Forecasting: A Handbook for Researchers and Practitioners. Kluwer, Dordrecht. https://repository.upenn.edu/marketing_papers/150.
Armstrong, J. S. (2001c). Selecting Forecasting Methods. In J. S.
Armstrong (Ed.), Principles of Forecasting: A Handbook for Researchers and Practitioners. Kluwer, Dordrecht. https://repository.upenn.edu/marketing_papers/150.
Babai, M. Z., Dallery, Y., Boubaker, S., & Kalai, R. (2019). A New Method to Forecast Intermittent Demand in the Presence of Inventory Obsolescence. International Journal of Production Economics, 209, 30-41. doi:10.1016/j.ijpe.2018.01.026
Bates, J. M., & Granger, C. W. J. (1969). The Combination of Forecasts. Journal of the Operational Research Society, 20(4), 451-468. doi:10.1057/jors.1969.103
Blattberg, R. C., & Hoch, S. J. (1990). Database Models and Managerial Intuition: 50% Model + 50% Manager. Management Science, 36(8), 887-899. doi:10.1287/mnsc.36.8.887
Boylan, J. E., Syntetos, A. A., & Karakostas, G. C. (2008). Classification for Forecasting and Stock Control: A Case Study. Journal of the Operational Research Society, 59(4), 473-481. doi:10.1057/palgrave.jors.2602312
Bunn, D. W., & Salo, A. A. (1993). Forecasting with Scenarios. European Journal of Operational Research, 68(3), 291-303. doi:10.1016/0377-2217(93)90186-Q
Chan, F., & Pauwels, L. L. (2018). Some Theoretical Results on Forecast Combinations. International Journal of Forecasting, 34(1), 64-74. doi:10.1016/j.ijforecast.2017.08.005
Claeskens, G., Magnus, J. R., Vasnev, A., & Wang, W. (2016). The Forecast Combination Puzzle: A Simple Theoretical Explanation. International Journal of Forecasting, 32(3), 754-762. doi:10.1016/j.ijforecast.2015.12.005
Clemen, R. T. (1989). Combining Forecasts: A Review and Annotated Bibliography. International Journal of Forecasting, 5(4), 559-583. doi:10.1016/0169-2070(89)90012-5
Constantino, F., Di Gravio, G., Patriarca, R., & Petrella, L. (2018). Spare parts management for irregular demand items. Omega, 81, 57-66. doi:10.1016/j.omega.2017.09.009
Croston, J. D. (1972). Forecasting and Stock Control for Intermittent Demands. Operational Research Quarterly (1970-1977), 23(3), 289-303. doi:10.2307/3007885
Dalkey, N. (1969). An Experimental Study of Group Opinion: The Delphi Method. Futures, 1(5), 408-426. doi:10.1016/S0016-3287(69)80025-X
Davydenko, A., & Fildes, R. (2013). Measuring Forecasting Accuracy: The Case of Judgmental Adjustments to SKU-level Demand Forecasts. International Journal of Forecasting, 29(3), 510-522. doi:10.1016/j.ijforecast.2012.09.002
Duncan, G., Gorr, W. L., & Szczypula, J. (2001). Forecasting analogous time series. In J. S. Armstrong (Ed.), Principles of Forecasting: A Handbook for Researchers and Practitioners. Kluwer, Dordrecht.
Eaves, A. H. (2002). Forecasting for the Ordering and Stock-holding of Consumable Spare Parts (PhD thesis). Department of Management Science, The Management School, Lancaster University.
Eaves, A. H., & Kingsman, B. G. (2004). Forecasting for the Ordering and Stock-holding of Consumable Spare Parts. Journal of the Operational Research Society, 55(4), 431-437. doi:10.1057/palgrave.jors.2601697
Eroğlu, C., & Croxton, K. L. (2010). Biases in Judgmental Adjustments of Statistical Forecasts: The Role of Individual Differences. International Journal of Forecasting, 26(1), 116-133. doi:10.1016/j.ijforecast.2009.02.005
Fildes, R. (1992). The Evaluation of Extrapolative Forecasting Methods. International Journal of Forecasting, 8(1), 81-98. doi:10.1016/0169-2070(92)90009-X
Fildes, R., Ma, S., & Kolassa, S. (2019). Retail Forecasting: Research and Practice (MPRA Working Paper No. 89356). Munich Archive. https://mpra.ub.uni-muenchen.de/89356/1/MPRA_paper_89356.pdf.
Fildes, R., & Petropoulos, F. (2015). Simple versus Complex Selection Rules for Forecasting Many Time Series. Journal of Business Research, 68(8), 1692-1701. doi:10.1016/j.jbusres.2015.03.028
Fischer, I., & Harvey, N. (1999). Combining Forecasts: What Information do Judges Need to Outperform the Simple Average? International Journal of Forecasting, 15(3), 227-246. doi:10.1016/S0169-2070(98)00073-9
Franses, P. H., & Legerstee, R. (2010). Do Experts’ Adjustments on Model-Based SKU-Level Forecasts Improve Forecast Quality? Journal of Forecasting, 29(3), 331-340. doi:10.1002/for.1129
Ghobbar, A. A., & Friend, C. H. (2002). Sources of Intermittent Demand for Aircraft Spare Parts within Airline Operations. Journal of Air Transport Management, 8(4), 221-231. doi:10.1016/S0969-6997(01)00054-0
Ghobbar, A. A., & Friend, C. H. (2003). Evaluation of Forecasting Methods for Intermittent Parts Demand in the Field of Aviation: A Predictive Model. Computers & Operations Research, 30(14), 2097-2114. doi:10.1016/S0305-0548(02)00125-9
Green, K. C., & Armstrong, J. S. (2007). Structured Analogies for Forecasting. International Journal of Forecasting, 23(3), 365-376. doi:10.1016/j.ijforecast.2007.05.005
Green, K. C., & Armstrong, J. S. (2015). Simple Versus Complex Forecasting: The Evidence. Journal of Business Research, 68(8), 1678-1685. doi:10.1016/j.jbusres.2015.03.026
Gutierrez, R. S., Solis, A. O., & Mukhopadhyay, S. (2008). Lumpy Demand Forecasting using Neural Networks. International Journal of Production Economics, 111(2), 409-420. doi:10.1016/j.ijpe.2007.01.007
Hanke, J. E., & Wichern, D. (2014). Business Forecasting (9th ed.). Pearson Education Limited, Essex.
Hasni, M., Babai, M. Z., Aguir, M. S., & Jemai, Z. (2019). An Investigation on Bootstrapping Forecasting Methods for Intermittent Demands. International Journal of Production Economics, 209, 20-29. doi:10.1016/j.ijpe.2018.03.001
Hsu, C., & Sandford, B. (2007). The Delphi Technique: Making Sense of Consensus. Practical Assessment, Research, and Evaluation, 12(10), 1-8. doi:10.7275/pdz9-th90
Hua, Z. S., Zhang, B., Yang, J., & Tan, D. S. (2007). A New Approach of Forecasting Intermittent Demand for Spare Parts Inventories in the Process Industries. Journal of the Operational Research Society, 58(1), 52-61. doi:10.1057/palgrave.jors.2602119
Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice (2nd ed.). OTexts, Melbourne.
Hyndman, R. J., & Koehler, A. B. (2006). Another Look at Measures of Forecast Accuracy. International Journal of Forecasting, 22(4), 679-688. doi:10.1016/j.ijforecast.2006.03.001
Johnston, F. R., Boylan, J., & Shale, E. (2003). An Examination of the Size of Orders from Customers, their Characterisation and the Implications for Inventory Control for Slow-moving Items. The Journal of the Operational Research Society, 54(8), 833-837. doi:10.1057/palgrave.jors.2601586
Johnston, F. R., & Boylan, J. E. (1996). Forecasting for Items with Intermittent Demand. The Journal of the Operational Research Society, 47(1), 113-121. doi:10.2307/2584256
Kostenko, A. V., & Hyndman, R. J. (2006). A Note on the Categorization of Demand Patterns. The Journal of the Operational Research Society, 57, 1256-1257. doi:10.1057/palgrave.jors.2602211
Kourentzes, N. (2013). Intermittent Demand Forecasts with Neural Networks. International Journal of Production Economics, 143(1), 198-206. doi:10.1016/j.ijpe.2013.01.009
Kourentzes, N. (2014). On Intermittent Demand Model Optimisation and Selection. International Journal of Production Economics, 156(1), 180-190. doi:10.1016/j.ijpe.2014.06.007
Kourentzes, N., & Petropoulos, F. (2016). R: Package ‘tsintermittent’ [R package manual]. https://cran.r-project.org/web/packages/tsintermittent/tsintermittent.pdf.
Lawrence, M., Goodwin, P., O’Connor, M., & Önkal, D. (2006). Judgmental Forecasting: A Review of Progress over the Last 25 Years. International Journal of Forecasting, 22(3), 493-518. doi:10.1016/j.ijforecast.2006.03.007
Lawrence, M. J., Edmundson, R. H., & O’Connor, M. J. (1986). The Accuracy of Combining Judgemental and Statistical Forecasts. Management Science, 32(12), 1521-1532. doi:10.1287/mnsc.32.12.1521
Levén, E., & Segerstedt, A. (2004). Inventory Control with a Modified Croston Procedure and Erlang Distribution. International Journal of Production Economics, 90(3), 361-367. doi:10.1016/S0925-5273(03)00053-7
Li, L., Kang, Y., Petropoulos, F., & Li, F. (2022). Feature-based Intermittent Demand Forecast Combinations: Accuracy and Inventory Implications. International Journal of Production Research. doi:10.1080/00207543.2022.2153941
Makridakis, S. (1993). Accuracy Measures: Theoretical and Practical Concerns. International Journal of Forecasting, 9(4), 527-529. doi:10.1016/0169-2070(93)90079-3
Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2022). The M5 Competition: Background, Organization, and Implementation. International Journal of Forecasting, 38(4), 1325-1336. doi:10.1016/j.ijforecast.2021.07.007
Nagaria, P. (2017). Forecasting intermittent demand: Traditional smoothing approaches versus the Croston method. https://www.oreilly.com/library/view/strata-data-conference/9781491985373/video317307.html.
Palm, F. C., & Zellner, A. (1992). To Combine or not to Combine? Issues of Combining Forecasts. Journal of Forecasting, 11(8), 687-701. doi:10.1002/for.3980110806
Parackal, M., Goodwin, P., & O’Connor, M. (2007). Judgement in Forecasting (Editorial). International Journal of Forecasting, 23(3), 343-345. doi:10.1016/j.ijforecast.2007.05.004
Petropoulos, F., Fildes, R., & Goodwin, P. (2016). Do ‘Big Losses’ in Judgmental Adjustments to Statistical Forecasts Affect Experts’ Behaviour? European Journal of Operational Research, 249(3), 842-852. doi:10.1016/j.ejor.2015.06.002
Petropoulos, F., & Kourentzes, N. (2015). Forecast Combinations for Intermittent Demand. Journal of the Operational Research Society, 66(6), 914-924. doi:10.1057/jors.2014.62
Petropoulos, F., & Svetunkov, I. (2020). A Simple Combination of Univariate Models. International Journal of Forecasting, 36(1), 110-115. doi:10.1016/j.ijforecast.2019.01.006
Pinçe, Ç., Turrini, L., & Meissner, J. (2021). Intermittent Demand Forecasting for Spare Parts: A Critical Review. Omega, 105, 1-30. doi:10.1016/j.omega.2021.102513
Pour, A. N., Tabar, B. R., & Rahimzadeh, A. (2008). A Hybrid Neural Network and Traditional Approach for Forecasting Lumpy Demand. International Journal of Industrial and Manufacturing Engineering, 2(4), 1028-1034. doi:10.5281/zenodo.1075923
Qian, W., Rolling, C., Cheng, G., & Yang, Y. (2019). On the Forecast Combination Puzzle. Econometrics, 7(39), 1-26. doi:10.3390/econometrics7030039
Saccani, N., Visintin, F., Mansini, R., & Colombi, M. (2017). Improving Spare Parts Management for Field Services: A Model and a Case Study for the Repair Kit Problem. IMA Journal of Management Mathematics, 28(2), 185-204. doi:10.1093/imaman/dpw023
Sanders, N. R. (1992). Accuracy of Judgmental Forecasts: A Comparison. Omega, 20(3), 353-364. doi:10.1016/0305-0483(92)90040-E
Sanders, N. R., & Manrodt, K. (2003). The efficacy of using judgmental versus quantitative forecasting methods in practice. Omega, 31(6), 511-522. doi:10.1016/j.omega.2003.08.007
Sanders, N. R., & Ritzman, L. P. (1992). The Need for Contextual and Technical Knowledge in Judgmental Forecasting. Journal of Behavioral Decision Making, 5(1), 39-52. doi:10.1002/bdm.3960050106
Sanders, N. R., & Ritzman, L. P. (2001). Judgmental Adjustment of Statistical Forecasts. In J. S. Armstrong (Ed.), Principles of Forecasting: A Handbook for Researchers and Practitioners. Kluwer, Dordrecht.
Sanders, N. R., & Ritzman, L. P. (2004). Integrating judgmental and quantitative forecasts: methodologies for pooling marketing and operations information. International Journal of Operations & Production Management, 24(5), 514-529. doi:10.1108/01443570410532560
Silver, E. A., Pyke, D. F., & Peterson, R. (1998). Inventory management and production planning and scheduling (3rd ed.). John Wiley & Sons, New York.
Smith, J., & Wallis, K. (2009). A simple explanation of the forecast combination puzzle. Oxford Bulletin of Economics and Statistics, 71(3), 331-355. doi:10.1111/j.1468-0084.2008.00541.x
Surowiecki, J. (2005). The wisdom of crowds. Anchor Books, New York.
Syntetos, A. A. (2001). Forecasting of intermittent demand (PhD thesis). Business School Buckinghamshire Chilterns University College, Brunel University, UK.
Syntetos, A. A., Babai, M. Z., & Gardner Jr., E. S. (2015). Forecasting intermittent inventory demands: simple parametric methods vs. bootstrapping. Journal of Business Research, 68(8), 1746-1752. doi:10.1016/j.jbusres.2015.03.034
Syntetos, A. A., & Boylan, J. E. (2001). On the bias of intermittent demand estimates. International Journal of Production Economics, 71(1-3), 457-466. doi:10.1016/S0925-5273(00)00143-2
Syntetos, A. A., Boylan, J. E., & Croston, J. D. (2005). On the categorization of demand patterns. Journal of the Operational Research Society, 56(5), 495-503. doi:10.1057/palgrave.jors.2601841
Syntetos, A. A., Nikolopoulos, K., Boylan, J. E., Fildes, R., & Goodwin, P. (2009). The effects of integrating management judgement into intermittent demand forecasts. International Journal of Production Economics, 118(1), 72-81. doi:10.1016/j.ijpe.2008.08.011
Teunter, R., & Duncan, L. (2009). Forecasting intermittent demand: a comparative study. Journal of the Operational Research Society, 60(3), 321-329. doi:10.1057/palgrave.jors.2602569
Timmermann, A. (2006). Forecast Combinations. In G. Elliott, C. Granger, & A. Timmermann (Eds.), Handbook of Economic Forecasting (Vol. 1, pp. 135-196). Elsevier. doi:10.1016/S1574-0706(05)01004-9
van Kampen, T. J., Akkerman, R., & van Donk, D. P. (2012). SKU classification: a literature review and conceptual framework. International Journal of Operations & Production Management, 32(7), 850-876. doi:10.1108/01443571211250112
Waller, D. (2015). Methods for Intermittent Demand Forecasting (Working Paper). Lancaster University. http://www.lancaster.ac.uk/pg/waller/pdfs/Intermittent_Demand_Forecasting.pdf.
Wallström, P., & Segerstedt, A. (2010). Evaluation of forecasting error measurements and techniques for intermittent demand. International Journal of Production Economics, 128(2), 625-636. doi:10.1016/j.ijpe.2010.07.013
Wang, X., Hyndman, R., Li, F., & Kang, Y. (2022). Forecast Combinations: An over 50-year Review (Preprint). arXiv. https://arxiv.org/pdf/2205.04216.
Weinberg, C. B. (1986). Arts Plan: Implementation, Evolution, and Usage. Marketing Science, 5(2), 143-158. doi:10.1287/mksc.5.2.143
Willemain, T. R., Smart, C. N., & Schwarz, H. F. (2004). A new approach to forecasting intermittent demand for service parts inventories. International Journal of Forecasting, 20(3), 375-387. doi:10.1016/S0169-2070(03)00013-X
Willemain, T. R., Smart, C. N., Shockor, J. H., & DeSautels, P. A. (1994). Forecasting intermittent demand in manufacturing: a comparative evaluation of Croston’s method. International Journal of Forecasting, 10(4), 529-538. doi:10.1016/0169-2070(94)90021-3
Williams, T. M. (1984). Stock control with sporadic and slow-moving demand. The Journal of the Operational Research Society, 35(10), 939-948. doi:10.1057/jors.1984.185
Wilson, J. H., & Keating, B. (2008). Introduction to Business Forecasting. In J. H. Wilson & B. Keating (Eds.), Business Forecasting with ForecastX (pp. 1-55). McGraw Hill-Irwin, New York.
Wright, T. C. (2013). Real Life Examples of Qualitative Forecasting. https://smallbusiness.chron.com/real-life-examples-qualitative-forecasting-72990.html.
Appendix: Details of performance measures

Scale-dependent Error Measures

| Measure | Formula | Explanation |
|---|---|---|
| Mean Error | $ME = \frac{1}{n}\sum_{t=1}^{n} e_t$ | A small value of ME does not mean that the errors are small; it indicates bias. ME takes a negative (positive) value if the forecasting method underpredicts (overpredicts) the actual demand. |
| Mean Absolute Error | $MAE = \frac{1}{n}\sum_{t=1}^{n} \lvert e_t \rvert$ | MAE shows the size of the errors regardless of whether the forecast is under or over the actual value. It averages the errors irrespective of their direction and thereby eliminates the canceling-out problem of ME. |
| Mean Square Error | $MSE = \frac{1}{n}\sum_{t=1}^{n} e_t^{2}$ | This measure shows how large the average error can be. Like MAE, it eliminates the canceling-out problem. It penalizes large forecasting errors by squaring the errors (Hanke & Wichern, 2014). MSE is widely used to compare the accuracy levels of different methods. An MSE value close to 0 is preferable. |
| Root Mean Squared Error | $RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^{2}}$ | Lower values of RMSE are better. |
| Cumulative Error | $CE = \sum_{t=1}^{n} e_t$ | This measure sums up the errors to determine the bias. |
| Periods in Stock | $PIS_i = -\sum_{h=1}^{H}\sum_{j=1}^{h}\left(Y_{N+j} - \hat{Y}_j\right)$ | H is the length of the required forecast horizon. This measure shows the total number of periods in stock or stock-out periods of the forecasted item. Positive PIS indicates stock left over, whereas negative PIS represents stock shortages. |

Scale-independent Error Measures

| Measure | Formula | Explanation |
|---|---|---|
| Mean Absolute Scaled Error | $MASE = \frac{1}{n}\sum_{t=1}^{n}\frac{\lvert e_t \rvert}{\frac{1}{n-1}\sum_{i=2}^{n}\lvert A_i - A_{i-1}\rvert}$ | A value less (higher) than 1 indicates that the actual forecast performance is better (worse) than that of the naïve method. Division by zero only occurs if all the values in the time series are equal. It is symmetrical and robust to outliers. |
| Scaled CE | $sCE = \frac{\sum_{h=1}^{H}\left(Y_{N+h} - F_h\right)}{\frac{1}{N}\sum_{t=1}^{N} Y_t}$ | N is the number of in-sample observations, $Y_{N+h}$ is the h-th out-of-sample period, and $F_h$ is the h-step-ahead forecast. sCE can be used for comparing the performance of different methods on different datasets. |
| Scaled PIS | $sPIS_i = \frac{PIS_i}{\frac{1}{N}\sum_{t=1}^{N} Y_t}$ | A scale-independent form of PIS. |
| Scaled MAE | $sAE_{i,h} = \frac{\lvert Y_{N+h} - F_h \rvert}{\frac{1}{N}\sum_{t=1}^{N} Y_t}$ | As with sCE, MAE can be scaled so that the measure can be averaged across series. sMAE is the scaled absolute error averaged over all series and horizons. |
| Scaled MSE | $sSE_{i,h} = \left(\frac{Y_{N+h} - F_h}{\frac{1}{N}\sum_{t=1}^{N} Y_t}\right)^{2}$ | sMSE (Scaled MSE) is the mean of the scaled squared error across all series and horizons. |
| Root Mean Squared Scaled Error | $RMSSE = \sqrt{\frac{\frac{1}{h}\sum_{t=n+1}^{n+h}\left(Y_t - \hat{Y}_t\right)^{2}}{\frac{1}{n-1}\sum_{t=2}^{n}\left(Y_t - Y_{t-1}\right)^{2}}}$ | RMSSE is a variant of the well-known MASE. |

Percentage Error Measures

| Measure | Formula | Explanation |
|---|---|---|
| Mean Absolute Percentage Error | $MAPE = \frac{1}{n}\sum_{t=1}^{n}\left\lvert\frac{e_t}{A_t}\right\rvert \times 100$ | If there are zero values in the denominator (actual demand), division by zero occurs and the percentage is undefined. Because of this, MAPE is not appropriate for items with ID. |
| Symmetric Mean Absolute Percentage Error | $S\text{-}MAPE = \frac{2}{n}\sum_{t=1}^{n}\frac{\lvert A_t - F_t \rvert}{A_t + F_t} \times 100$ | This measure was proposed by Makridakis (1993) to avoid the asymmetry caused by the application of MAPE. Unlike MAPE, S-MAPE has a lower and an upper bound. The formula provides results between 0 and 200%. |

Relative Error Measures

| Measure | Formula | Explanation |
|---|---|---|
| Relative Geometric Root Mean Square Error | $RGRMSE = \frac{\left(\prod_{t=1}^{n}\left(A_t - F_{B,t}\right)^{2}\right)^{1/2n}}{\left(\prod_{t=1}^{n}\left(A_t - F_{A,t}\right)^{2}\right)^{1/2n}}$ | Subscripts A and B refer to the forecasting methods. RGRMSE is a safe measure to use in ID. It can easily be calculated and interpreted and is robust to large forecast errors. |
| Percentage of times Better (PB) | | PB is the percentage of times that the method under consideration outperforms another. PB is used for comparing two estimators based on performance criteria such as mean deviation or geometric mean square error. |
| Percentage Best ($PB_t$) | | $PB_t$ is the percentage of occurrences when one method outperforms all others. When more than two estimators are compared, Percentage Best ($PB_t$) is used rather than PB. |
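Several of the appendix measures can be computed in a few lines, which may help when checking the formulas against a small series. The sketch below is illustrative and is not the study's code: the function names and the toy demand series are hypothetical. The error is taken as e_t = F_t - A_t, an assumption chosen so that under-forecasting gives a negative ME, as the appendix states. MASE is scaled by the in-sample naive error following Hyndman & Koehler (2006).

```python
import numpy as np


def _err(actual, forecast):
    # Assumed sign convention: e_t = F_t - A_t (negative when under-forecasting).
    return np.asarray(forecast, dtype=float) - np.asarray(actual, dtype=float)


def me(actual, forecast):
    return float(np.mean(_err(actual, forecast)))


def mae(actual, forecast):
    return float(np.mean(np.abs(_err(actual, forecast))))


def mse(actual, forecast):
    return float(np.mean(_err(actual, forecast) ** 2))


def rmse(actual, forecast):
    return float(np.sqrt(mse(actual, forecast)))


def pis(actual, forecast):
    # PIS = -sum_h sum_{j<=h} (Y_{N+j} - Yhat_j): the sum of running totals of (Yhat - Y).
    return float(np.sum(np.cumsum(_err(actual, forecast))))


def mase(actual, forecast, insample):
    # Scale: mean absolute one-step naive error on the in-sample series.
    scale = np.mean(np.abs(np.diff(np.asarray(insample, dtype=float))))
    return mae(actual, forecast) / scale


def smape(actual, forecast):
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    # Undefined when A_t + F_t = 0; the toy data below avoids that case.
    return float(200.0 * np.mean(np.abs(a - f) / (a + f)))


def rgrmse(actual, forecast_b, forecast_a):
    n = len(actual)
    num = np.prod(_err(actual, forecast_b) ** 2) ** (1.0 / (2 * n))
    den = np.prod(_err(actual, forecast_a) ** 2) ** (1.0 / (2 * n))
    return float(num / den)


# Toy intermittent series (hypothetical values, not from the study's dataset).
insample = [0, 2, 0, 0, 4, 0, 0, 2]
actual = [0, 3, 0, 0, 5, 0]
flat_1 = [1] * 6  # constant forecast of 1 unit per period
flat_2 = [2] * 6  # constant forecast of 2 units per period

print(me(actual, flat_1), mae(actual, flat_1), rmse(actual, flat_1))
print(pis(actual, flat_1), mase(actual, flat_1, insample), smape(actual, flat_1))
print(rgrmse(actual, flat_2, flat_1))
```

On this toy series the constant forecast of 1 under-forecasts on average, so both ME and PIS come out negative, matching the interpretation of negative PIS as a stock shortage.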