An Integrated Approach to Measuring Supply Chain Performance


    Chan and Qi (SCM 8/3 (2003) 209) developed an innovative measurement method that aggregates performance measures in a supply chain into an overall performance index. The method is useful and makes a significant contribution to supply chain management. Nevertheless, its highly complex fuzzy algorithm can make it cumbersome to compute. In aggregating the performance information, the weights used by Chan and Qi, which aim to address the imprecision of human judgments, are incompatible with the weights in additive models. Furthermore, the default assumption of linearity in its scoring procedure could lead to an inaccurate assessment of overall performance. This paper addresses these limitations by developing an alternative measurement method. This research integrates three approaches to multiple criteria decision analysis (MCDA), namely multiattribute value theory (MAVT), the swing weighting method, and the eigenvector procedure, to develop a comprehensive assessment of supply chain performance. A case study is presented to demonstrate the proposed method. The performance model used in the case study relies on the Supply Chain Operations Reference (SCOR) model level 1. With this measurement method, supply chain managers can easily benchmark the performance of the whole system and then analyze the effectiveness and efficiency of the supply chain.


    Performance Measurement, Supply Chain Management, Multiattribute Value Theory (MAVT), Swing Weight, Eigenvector Method, SCOR


    Business management has entered a period in which supply chains compete with each other (Christopher, 1998). As firms move towards supply chain management (SCM), it becomes essential to measure the performance of the supply chain. Traditional performance measurement systems (PMSs), however, cannot adequately capture the complexity of supply chain performance, for several reasons. First, they have been found to lack a balanced approach to integrating financial and non-financial performance measures. Second, they fall short in terms of the systems thinking perspective, by which a supply chain must be viewed as a whole entity and measured across the whole. Third, they lack effective techniques that can help supply chain managers interpret the overwhelming amount of supply chain performance information (Chan et al., 2006). Therefore, there is a pressing need to develop tools and measurement methods to improve the practice of supply chain performance measurement (SCPM).

    The literature on SCPM can be divided according to the three major components of a PMS: performance models, supply chain metrics, and measurement methods. The ‘performance model’ is a selected framework that links the overall performance with different levels of the decision hierarchy to meet the objectives of the organization (Simatupang and Sridharan, 2002). The term ‘metric’ includes the definition of the measure, data capture, and responsibility for calculation (Neely et al., 1995). The ‘measurement method’ is a set of rules and guidelines for measurement.

    A variety of performance models can measure supply chain performance according to different performance attributes (Beamon, 1999; Chan and Qi, 2003b; Chan, 2003), processes (Gunasekaran et al., 2001; Supply-Chain Council, 2006), management levels (Gunasekaran et al., 2001), and perspectives adapted from the balanced scorecard (Brewer and Speh, 2000; Lohman et al., 2004). The current literature tends to focus on performance models that group measures into these various perspectives.

    The literature concerning supply chain metrics suggests integrated measures (Bechtel and Jayaram, 1997; Brewer and Speh, 2000; Farris II and Hutchison, 2002; Novack and Thomas, 2004); identifies measures frequently used to guide supply chain decision making (Fawcett and Cooper, 1998; Harrison and New, 2002; Bolstorff, 2003); proposes new metrics (Lambert and Pohlen, 2001; Dasgupta, 2003); and cautions against measures of traditional logistics operations such as inventory turn (Lambert and Pohlen, 2001), logistics cost per unit (Griffis et al., 2004), capacity utilization (Hausman, 2004), and order per sales representative (Fawcett and Cooper, 1998). Such traditional logistics measures do not focus on key chain-spanning activities, do not always optimize supply chain performance, and do not motivate employees to work with a supply chain orientation (Brewer and Speh, 2000).

    When it comes to measurement methods, the analytic hierarchy process (Chan, 2003), fuzzy set theory (Chan and Qi, 2003a), and a method used in ABC inventory classification (Gunasekaran et al., 2004) are just a few of the techniques that have been proposed to help prioritize supply chain performance measures. Kleijnen and Smits (2003) suggested that multiple supply chain measures may be aggregated through scoring methods into a utility, which is the final performance measure of a system. Lohman et al. (2004) aggregated various performance measures into one number by using a metric normalization method derived from Maskell (1991). Seth et al. (2006) proposed a novel methodology that integrates statistical analysis, the quality loss function (QLF), and data envelopment analysis (DEA) to create a single performance indicator of service quality in the supply chain context, although this method has yet to be demonstrated empirically.

    In an attempt to resolve traditional PMS deficiencies, Chan and Qi (2003a) proposed an innovative measurement method that converts performance data from various measures into a meaningful composite index for a supply chain. The methodology is based on fuzzy set theory to address the imprecision in human judgments. A geometric scale of triangular fuzzy numbers proposed by Boender et al. (1989) is employed to quantify the relative weights of performance measures as triangular fuzzy numbers. Performance data are transformed into fuzzy measurement results by two successive mappings. First, the performance data are converted into performance scores by the proportional scoring technique, which involves defining and scaling the two end points of the measurement scale for each measure so that the score ranges from 0 to 10. Second, each performance score is translated into a fuzzy performance grade set, defined by triangular fuzzy numbers. The fuzzy performance grade set is defined as the fuzzy measurement result, which is denoted by a fuzzy vector {A, B, C, D, E, F}. These six grades denote the gradational measurement results ranging from perfect to worst. The weighted average method is used to aggregate the fuzzy measurement results and to defuzzify the fuzzy performance grades into a crisp (exact) number ranging from zero to ten, called the performance index.
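    The first of these mappings, proportional scoring, is a linear rescaling of each measure onto 0 to 10. The sketch below illustrates only that step, assuming a plain linear map from the bottom end point (score 0) to the perfect end point (score 10). The function name and the clipping of out-of-range values are our assumptions rather than details taken from Chan and Qi (2003a), and the fuzzy grading and defuzzification steps are not reproduced.

```python
def proportional_score(x: float, bottom: float, perfect: float) -> float:
    """Map raw performance x linearly onto 0-10 (bottom -> 0, perfect -> 10).

    Works for "higher is better" and "lower is better" measures alike,
    because only the direction from bottom to perfect matters.
    """
    if perfect == bottom:
        raise ValueError("bottom and perfect must differ")
    score = 10.0 * (x - bottom) / (perfect - bottom)
    # Assumption: values outside [bottom, perfect] are clipped to the scale ends.
    return max(0.0, min(10.0, score))


# Hypothetical measures (illustrative numbers only).
print(proportional_score(95.0, bottom=80.0, perfect=100.0))  # 7.5
print(proportional_score(90.0, bottom=120.0, perfect=60.0))  # 5.0 (lower is better)
```

    Because the mapping is a straight line, every unit of improvement is worth the same number of points anywhere on the scale, which is exactly the linearity assumption discussed later in this section.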

    Chan and Qi’s measurement method has made a significant contribution to SCM. Harland et al. (2006) regarded Chan and Qi’s (2003a) paper as one of a core set of papers concerning the development of the discipline of SCM. Chan and Qi’s measurement approach offers managers an innovative way of aggregating financial and nonfinancial performance measures into a single index for analyzing and benchmarking the overall performance of a supply chain. The performance index makes it easy for managers to comprehend the complexity of supply chain performance and to recognize all aspects of performance along the chain. The index is aimed at assisting managers in modeling, optimizing, and continuously improving the supply chain.

    The method is undoubtedly useful for SCM, yet there is room for improvement. First, supply chain practitioners may find it difficult to use Chan and Qi’s measurement method because of its very complex fuzzy set algorithm. Although the fuzzy logic-based approach is effective in making decisions and evaluations where preferences are not clearly articulated, managers who do not have the requisite academic expertise will be frustrated by the mathematical sophistication that it requires (Zanakis et al., 1995; Bozdag et al., 2003).

    Second, it is important to recognize that Chan and Qi’s measurement approach has its roots in the weighted additive model of multiattribute value theory (Keeney and Raiffa, 1976; Dyer and Sarin, 1979). Weights in such a model, however, are scaling constants, which “do not indicate relative importance” (Keeney and Raiffa, 1976, p. 273). Weights as scaling constants depend on the measurement scales (the ranges of the measures being weighted). In general, the greater the range of performance for a particular measure, the greater its weight should be. If a particular measure has only a small range between the worst and the best performance, it contributes little to discriminating between the worst and the best performance, even though the evaluator may consider it an important measure per se (von Winterfeldt and Edwards, 1986; Goodwin and Wright, 2004). Although fuzzy set theory has the advantage of capturing the imprecision of evaluators’ judgments, the geometric scale of triangular fuzzy numbers of Boender et al. (1989), adopted by Chan and Qi (2003a), does not produce weights that coincide with the meaning of weights in the weighted additive model, because it does not explicitly take the range of measurement into account. Thus, there is no guarantee that this technique will produce unbiased weights.

    Third, the relationships between measurement scales and performance scores are somewhat ad hoc because they are restricted to linear functions. In principle, these relationships should represent the extent to which the performance on particular metrics satisfies the evaluator, and they may be best represented by non-linear functions (Forman and Selly, 2001; Belton and Stewart, 2002). Belton and Stewart (2002) observed that value functions are rarely linear, and the default assumption of linearity tends to be violated in some real-world decision making circumstances (Stewart, 1996). Therefore, the measurement algorithm must be flexible enough to handle both the linear and non-linear functions that could arise. Any measurement method that always imposes or always precludes linearity may not adequately capture human preferences in reality.

    In view of the above limitations, a simple, flexible, and theoretically sound approach to SCPM is needed. The objective of this paper is therefore to introduce an alternative measurement method that possesses these desirable features. Developed from the integration of multiattribute value theory, the swing weighting method (von Winterfeldt and Edwards, 1986), and Saaty’s (1980) eigenvector procedure, the proposed measurement method is conceptually simple and comprehensible, yet flexible and rigorous enough to cope with the human evaluation process.

    The contributions of this paper are to: (1) develop a novel performance measurement method to contribute to the development of SCM, (2) point to an approach that can elicit weights in the additive aggregation model, (3) present an alternative modeling of judgments that permits both linear and non-linear value functions, and (4) provide an original case study to demonstrate the proposed approach.

    In a subsequent section, the proposed performance measurement method for SCM and its development background are described. Next, details of a case study are provided. The paper ends with conclusions and discussions.


       2.1 Background

    Various authors have proposed measures to capture many aspects of supply chain performance. Important measures of supply chain performance could be used collectively to depict overall supply chain performance, and this evaluation could be administered through techniques typical of the field of multiple criteria decision analysis (MCDA). MCDA is a collection of formal approaches that take multiple criteria into account in helping individuals or groups make good decisions (Belton and Stewart, 2002). Common MCDA techniques include multiattribute value theory (MAVT), multiattribute utility theory (MAUT), the analytic hierarchy process (AHP), goal programming, and outranking methods (Belton and Stewart, 2002).

    This study uses MAVT (Keeney and Raiffa, 1976; Dyer and Sarin, 1979) to provide a platform for integrating several measures of supply chain performance into a single indicator. MAVT is an approach that allows numerical scores (values) to represent the respondent’s preference for performance outcomes. The scores are usually derived by the construction of the respondent’s preference orderings or mathematical functions. Such a function is referred to as the ‘value function’ if the assessment of preference is not concerned with uncertainty. If the assessment involves risk and uncertainty, MAUT should be applied, and the function under uncertainty is referred to as the ‘utility function.’

    In applying MAVT to SCPM, this study underscores the importance of modeling value judgments accurately. Accordingly, its scoring method allows for non-linear relationships between performance outcomes and preference scores (values). In the literature, there has been a debate regarding the assumed shapes of value functions. Von Winterfeldt and Edwards (1986) suggested that value functions should be linear or nearly linear if the problem (the performance model) has been well structured and if appropriate scales have been selected. Belton and Stewart (2002), however, cautioned against oversimplifying the problem through the inappropriate use of linear value functions, because Stewart’s (1993, 1996) experimental simulations showed that the results of MAVT models are very sensitive to inappropriate linearization.

    A combination of non-linear value functions and fuzzy set theory could lead to an algorithm of daunting complexity for practitioners and could create ambiguity in the interpretation of inputs. Although a decision support system (DSS) could be developed to help managers make decisions without being deterred by model complexity, such modeling would be uneconomical, since the model would take as long to build as the system it represented and would be expensive to develop and control. Stewart (1992) addressed these potential limitations by suggesting that analysts apply value functions without fuzzy set theory, making the model simple, easier to use, and transparent enough to generate further insights and understanding. The success of model implementation depends on good communication between the analyst and the decision maker (Pöyhönen and Hämäläinen, 2000). Stewart (1992) stated that although attempts to apply fuzzy set theory to value functions may lead to effective models, doing so may enlarge the scope for misunderstanding between analysts and decision makers, because the inputs required from decision makers are not as straightforward as the unequivocal language of relative values. He further stated that the fuzziness of judgments is not an important matter in practical value function analyses because the decision maker can handle it by conducting sensitivity analyses. This study adopts Stewart’s (1992) suggestion by applying value measurement theory without fuzzy set theory.

    We believe that simple and understandable measurement methods contribute significantly to the important goal of improving the understanding and practice of SCPM. Likewise, research in MCDA has called for simple, understandable, and usable approaches to solving MCDA problems (Dyer et al., 1992; Chang and Yeh, 2001; Mendoza and Martines, 2006). Experiments (for example, Schoemaker and Waid, 1982; Brugha, 2004) have shown that decision makers prefer simpler methods because such methods are easier to understand and thus make them feel more in control. MAVT has several aggregation models, but this paper employs the additive aggregation model because it is the simplest and most widely used form (Belton, 1986; von Nitzsch and Weber, 1993; Belton and Stewart, 2002). According to Stewart (1992), the additive form is well justified theoretically and is easily understood because the relationship between the inputs and the output of the model is not hidden by complicated mathematical calculations.

       2.2 Weighted Additive Model of SCPM

    The weighted additive model of SCPM can be written as:

    $$V(x) = \sum_{i=1}^{m} k_i \, v_i(x_i) \qquad (1a)$$

    where the overall value (score) V(x) represents the supply chain performance index; vi is the partial value function associated with the ith measure, expressing the preference for achieving different levels of performance; xi is the performance level (outcome) on the ith measure; m is the number of measures; and ki ≥ 0 is the weight assigned to measure i, with

    $$\sum_{i=1}^{m} k_i = 1 \qquad (1b)$$

    Three assumptions must be kept in mind when applying the weighted additive model (Belton and Stewart, 2002). First, all measures have mutual preferential independence; the preference ordering in terms of one measure should not depend on the levels of performance on other measures. Second, the partial value functions are on an interval scale; only ratios of differences between values are meaningful. Third, weights are scaling constants; any method of assessing weights must be consistent with the algebraic meaning in the additive value function.

       2.3 Assessing Weights of Measures

    The weight parameters ki in the additive value function have a very specific algebraic meaning, as shown in Equation 1b (Salo and Hamalainen, 1997). Assume that a suitable range of the measurement scale, $[x_i^\circ, x_i^*]$, has been defined to cover the performance of the ith metric. It is common to normalize the value function so that the values $V(x^\circ) = V(x_1^\circ, x_2^\circ, \dots, x_m^\circ) = 0$ and $V(x^*) = V(x_1^*, x_2^*, \dots, x_m^*) = 1$ are assigned to the worst and best conceivable performance, respectively. By normalizing the partial value functions onto the [0, 1] range, the additive representation can be written as:

    $$V(x) = \sum_{i=1}^{m} k_i \, v_i(x_i)$$

    where $v_i(x_i) = [V_i(x_i) - V_i(x_i^\circ)] / [V_i(x_i^*) - V_i(x_i^\circ)] \in [0, 1]$ is the normalized score of x on the ith metric and $k_i = V_i(x_i^*) - V_i(x_i^\circ)$ is the weight of the ith metric. This expression for ki implies that if the measurement scales of the metrics are changed, the weights must change as well. Therefore, it should not be assumed that the weights are known prior to the construction of the measurement scales (Vargas, 1986; Belton and Stewart, 2002). Weight elicitation methods such as the AHP (Saaty, 1980) and the fuzzy AHP (Boender et al., 1989) do not correspond to this algebraic meaning because their weights are assessed in isolation from the ranges of the measurement scales. Such methods may therefore be prone to biased weights.
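    To make the range dependence of ki concrete, the following sketch computes ki = Vi(xi*) - Vi(xi°) and the normalized scores vi from two hypothetical, unnormalized partial value functions. The functions, ranges, and data are invented for illustration only. In the first case the ranges are chosen so that V(x°) = 0 and V(x*) = 1, hence the weights sum to 1; narrowing the range of metric 1, while leaving its value function unchanged, shrinks its share of the total weight.

```python
from typing import Callable, List, Sequence, Tuple


def weights_and_scores(
    partial_values: Sequence[Callable[[float], float]],
    ranges: Sequence[Tuple[float, float]],
    x: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (k, v) with k_i = V_i(x_i*) - V_i(x_i°) and v_i normalized to [0, 1]."""
    k: List[float] = []
    v: List[float] = []
    for V_i, (worst, best), x_i in zip(partial_values, ranges, x):
        k_i = V_i(best) - V_i(worst)             # weight = value swing over the range
        k.append(k_i)
        v.append((V_i(x_i) - V_i(worst)) / k_i)  # normalized partial score
    return k, v


# Hypothetical linear partial value functions (not from the case study).
V = [lambda t: 0.02 * t, lambda t: 0.01 * t]
x = [15.0, 10.0]

k, v = weights_and_scores(V, [(0.0, 30.0), (0.0, 40.0)], x)
overall = sum(ki * vi for ki, vi in zip(k, v))
print([round(ki, 3) for ki in k], round(overall, 3))  # [0.6, 0.4] 0.4

# Narrow metric 1's range to [10, 20]: same V_1, but its weight shrinks.
k_narrow, _ = weights_and_scores(V, [(10.0, 20.0), (0.0, 40.0)], x)
share = [ki / sum(k_narrow) for ki in k_narrow]
print([round(s, 3) for s in share])  # [0.333, 0.667]
```

    The weighted sum of normalized scores reproduces the unnormalized total V1(x1) + V2(x2), which is the sense in which ki acts as a scaling constant rather than a context-free importance rating.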

    The tradeoff procedure (Keeney and Raiffa, 1976), the standard method of eliciting weights for the additive model, has the strongest theoretical foundation (Keeney and Raiffa, 1976; Schoemaker and Waid, 1982; Weber and Borcherding, 1993), yet it is complicated and more likely to produce elicitation errors (Schoemaker and Waid, 1982; Borcherding et al., 1991; Edwards and Barron, 1994). This study therefore applies the swing weighting method (von Winterfeldt and Edwards, 1986), which also satisfies the requirement that weights depend on the measurement scales. According to Edwards and Barron (1994), this method is simpler to use and more likely to be useful.

    The swing weighting method works as follows. First, the evaluator considers a hypothetical situation in which all the metrics are at their worst possible levels. The evaluator then moves (swings) the most important metric to its best level, and this swing is assigned 100 points. The metric with the second most desirable swing, and then each remaining metric in turn, is moved to its best level and assigned fewer than 100 points. Finally, the points are normalized to sum to one to yield the final weights. The swing procedure is explained in more detail in the case study.
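    The final normalization step can be sketched in a few lines. This is a minimal illustration, assuming the swing points have already been elicited; the metric names and point values below are hypothetical placeholders, not the case study's data.

```python
def swing_weights(points: dict[str, float]) -> dict[str, float]:
    """Turn swing points into weights that sum to 1.

    `points` maps each metric to the value of swinging it from worst to best,
    with the most valuable swing scored 100 and every other swing scored
    relative to it (0 <= points <= 100).
    """
    if max(points.values()) != 100:
        raise ValueError("the most valuable swing must be scored 100")
    if any(p < 0 or p > 100 for p in points.values()):
        raise ValueError("swing points must lie in [0, 100]")
    total = sum(points.values())
    return {metric: p / total for metric, p in points.items()}


# Hypothetical three-metric example.
print(swing_weights({"A": 100, "B": 60, "C": 40}))  # {'A': 0.5, 'B': 0.3, 'C': 0.2}
```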

       2.4 Assessing Value Functions

    The value function reflects the evaluator’s preferences for different levels of achievement on the measurement scale. The first step in defining a value function is to identify its measurement scale. The second step is to establish the scale of the performance score so that the performance results from diverse measures can be combined into a meaningful figure. Next, the value function is constructed to convert the performance data into a performance score that reflects the evaluator’s strength of preference.

    2.4.1 Interval Scale of Measurement

    In the proposed method, the performance is assessed on the interval scale of measurement. To construct the interval scale, the evaluator specifies two end points of the scale. The end points can be defined in many ways (see for example, Belton and Stewart, 2002, § 5.2.1; von Winterfeldt and Edwards, 1986, § 7.3), but this study finds it useful to follow Chan and Qi’s measurement scale, set in terms of an interval [bottom, perfect]. The bottom value represents the worst conceivable performance on the particular metric, and the perfect value indicates the most satisfactory performance. Since changing the scale can be somewhat cumbersome, it is suggested that evaluators choose end points that are very likely to include any possible future performance (von Winterfeldt and Edwards, 1986).

    2.4.2 Performance Score and its Scale

    After the extreme points of the measurement scale have been specified, consideration must be given to the performance score, its scale, and how the score is to be assessed. The performance score is a number indicating the degree to which a particular level of performance satisfies the evaluator. Like Chan and Qi (2003a), this study sets the performance score on a scale of 0 to 10. The perfect point of the measurement scale is given a score of 10 and the bottom a score of 0. Other performance levels receive intermediate scores that reflect their desirability relative to the extreme points.

       2.4.3 Eigenvector Method for Assessing Value Functions

    Although several techniques are available for developing value functions, the proposed method of eliciting values relies on the eigenvector method of the analytic hierarchy process (AHP) (Saaty, 1980). The AHP is an approach to multiple criteria decision analysis that has been extensively applied in modeling the human judgment process (Lee et al., 1995). It is a theory of measurement that derives ratio scales, which reflect priorities of elements, from paired comparisons in multilevel hierarchic structures (Saaty, 1996).

    The AHP is based on three principles: decomposition, comparative judgments, and the synthesis of priorities. The decomposition principle allows problem attributes to be decomposed to form a hierarchy. The principle of comparative judgments enables the assessment of pairwise comparisons of elements within a given level with respect to their parent in the adjacent upper level. The elements are compared according to the strength of their influence, which can be made in terms of importance, preference or likelihood. These pairwise comparisons are placed into comparison matrices to calculate the ratio scales that reflect the local priorities of elements. The principle of a synthesis of priorities allows decision makers to multiply the local priorities of the elements in a cluster according to the global priority of the parent element, thus producing global priorities throughout the hierarchy. In this paper, the proposed method of eliciting values is based on the second principle of the AHP.

    Kamenetzky (1982) and Vargas (1986) have shown that it is possible to derive value functions from reciprocal pairwise comparisons and Saaty’s eigenvector method. The AHP, and the eigenvector procedure in particular, is used here to elicit values because of several useful characteristics. First, pairwise comparison judgments are easy to elicit because the evaluator considers only two elements at a time. Second, the AHP allows for inconsistency in each set of pairwise judgments and provides a measure of such inconsistency. Third, the redundancy of the information contained in the systematic pairwise comparisons contributes to the robustness of the value estimation (Kamenetzky, 1982). Finally, pairwise comparisons do not require any assumption about the form of the value function.

    Now we can take a closer look at the proposed method for developing partial value functions using Saaty’s eigenvector method. To construct the partial value function for each measure, the evaluator needs to establish the measurement scale in terms of an interval [bottom, perfect]. As the value function may be curvilinear, intermediate points on the measurement scale need to be specified to reveal the shape of the value curve. To keep the comparisons as simple as possible, these points may be chosen to be equally spaced across the measurement scale. Since at this stage we do not yet know how many points (or ‘ratings’ in AHP terminology) on the interval scale are adequate for an accurate assessment of a partial value function, we assume that there are n points. Note that the n points must include the two endpoints so that compatible MAVT performance scores can be derived later.

    The comparison between a pair of performance outcomes p and q (p, q = 1, …, n) for metric i simply takes the form: “For metric i, how strongly is outcome p preferred to outcome q?” The evaluator then responds in either numerical or verbal mode, as indicated in Table 1.

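    The response, denoted by apq, is entered into a pairwise comparison matrix [A]n×n. The judgment of element q relative to element p is the reciprocal of apq, that is, aqp = 1/apq. The comparison process continues until all pairs of the n points have been compared. The resulting matrix of pairwise comparison values [A]n×n is:

    $$[A]_{n\times n} = \begin{bmatrix} 1 & a_{12} & \cdots & a_{1n} \\ 1/a_{12} & 1 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 1/a_{1n} & 1/a_{2n} & \cdots & 1 \end{bmatrix}$$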

    Local priorities are determined by solving the following matrix equation (Saaty, 1980):

    $$[A]_{n\times n}\,[W]_{n\times 1} = \lambda_{\max}\,[W]_{n\times 1}$$

    where [W]n×1 is the normalized eigenvector and λmax is the largest eigenvalue of the matrix [A]n×n. By this equation, [W]n×1 provides the priority ordering of preference, whereas λmax is a measure of the consistency of the judgments.

    A standard measure of the consistency of the evaluator’s judgments can be computed for each matrix as a consistency ratio (C.R.), which is a function of the comparison matrix dimension (n×n), a random index (R.I.), and the principal eigenvalue (λmax), that is:

    $$\text{C.R.} = \frac{(\lambda_{\max} - n)/(n - 1)}{\text{R.I.}}$$

    Based on simulations, Saaty (1980) provided the random index for various matrix sizes, as shown in Table 2. The maximum acceptable C.R. varies with matrix size: 0.05 for a 3×3 matrix, 0.08 for a 4×4 matrix, and 0.1 for all larger matrices (n ≥ 5) (Saaty, 1994).
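    The steps from the reciprocal matrix to the priorities and the consistency check can be sketched without external libraries. This is a minimal illustration, not the paper's software: the principal eigenvector is found by power iteration (one standard way to solve the eigenvalue equation for a positive matrix), and the random index values are the commonly cited figures attributed to Saaty (1980), which readers should check against Table 2. The judgments in the example are hypothetical.

```python
from typing import List, Sequence, Tuple

# Commonly cited random indices (Saaty, 1980); check them against Table 2.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}
# Maximum acceptable C.R. (Saaty, 1994): 0.05 for n = 3, 0.08 for n = 4, else 0.1.
CR_LIMIT = {3: 0.05, 4: 0.08}


def reciprocal_matrix(upper: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build [A] from upper-triangle judgments: upper[p][k] = a_pq with q = p + 1 + k."""
    n = len(upper) + 1
    A = [[1.0] * n for _ in range(n)]
    for p, row in enumerate(upper):
        for k, a_pq in enumerate(row):
            q = p + 1 + k
            A[p][q] = float(a_pq)
            A[q][p] = 1.0 / a_pq  # reciprocal judgment a_qp = 1 / a_pq
    return A


def principal_eigen(A: List[List[float]], iters: int = 1000,
                    tol: float = 1e-12) -> Tuple[List[float], float]:
    """Power iteration for the normalized principal eigenvector W and lambda_max."""
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iters):
        Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(Aw)
        new_w = [val / total for val in Aw]  # normalize so the entries sum to 1
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(Aw[i] / w[i] for i in range(n)) / n  # estimate of lambda_max
    return w, lam


def consistency_ratio(lam: float, n: int) -> float:
    """C.R. = [(lambda_max - n) / (n - 1)] / R.I.; defined here for 3 <= n <= 10."""
    return (lam - n) / (n - 1) / RANDOM_INDEX[n]


def is_acceptable(cr: float, n: int) -> bool:
    return cr <= CR_LIMIT.get(n, 0.1)


# Hypothetical, perfectly consistent judgments for three ratings.
A = reciprocal_matrix([[2, 4], [2]])
w, lam = principal_eigen(A)
print([round(val, 3) for val in w], round(lam, 6))  # [0.571, 0.286, 0.143] 3.0
print(is_acceptable(consistency_ratio(lam, 3), 3))  # True
```

    For a perfectly consistent matrix λmax equals n and C.R. is zero; inconsistent judgments push λmax above n, which is what the ratio measures.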

    In theory, the AHP gives priorities on a ratio scale that sum to one, whereas the MAVT scores in this study are on a 0-10 interval scale. To construct the partial value function, the priorities [W]n×1 = (wj), j = 1, …, n, need to be transformed into performance scores wcj on a scale that assigns zero to the lowest priority and 10 to the highest. The scale conversion is done by a linear transformation, as recommended and used by Kamenetzky (1982), Vargas (1986), and Mustajoki and Hamalainen (2000). The converted score wcj for wj is defined as:

    $$w^c_j = 10 \times \frac{w_j - \min_k w_k}{\max_k w_k - \min_k w_k}, \qquad j = 1, \dots, n$$

    The scores wcj are then used to estimate the partial value function. After this transformation, wcj no longer has the ratio scale property, but it does have the property of an interval scale, which is sufficient to indicate the strength of preference in the value function.
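    The linear conversion from AHP priorities to 0-10 scores is short enough to show directly. The priorities below are hypothetical values for five ratings ordered from bottom to perfect; they are not taken from the case study.

```python
from typing import List, Sequence


def to_interval_scores(w: Sequence[float]) -> List[float]:
    """Linearly rescale AHP priorities so the lowest becomes 0 and the highest 10."""
    lo, hi = min(w), max(w)
    if hi == lo:
        raise ValueError("priorities must not all be equal")
    return [10.0 * (wj - lo) / (hi - lo) for wj in w]


# Hypothetical priorities for five ratings, ordered from bottom to perfect.
w = [0.05, 0.10, 0.20, 0.25, 0.40]
print([round(s, 2) for s in to_interval_scores(w)])  # [0.0, 1.43, 4.29, 5.71, 10.0]
```

    Because the map is linear, ratios of differences between priorities are preserved, which is exactly the interval-scale property the value function needs.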

    At this point, it is necessary to ensure that the value assessment involves enough points n to reveal the shape of the value function without making the elicitation too unwieldy. Kamenetzky (1982) and Pan and Rahman (1998) suggested that the above method works well when n is small. Saaty (1980) suggested that the human brain has a psychological limit of 7±2 items in a simultaneous comparison. We therefore use 5 performance ratings, the lower end of this range, to avoid complications in estimating a value function. The simulations of Stewart (1993, 1996) confirmed the robustness of analyses to the use of 5 point estimates for value functions. Thus, 5 points on the interval scale (two endpoints and three midpoints) are adequate to obtain a good approximation of a value function.

    2.4.4 Value Curve Fitting

    Having determined the five points and their corresponding scores, we can then plot them and draw a curve through them. Drawing a curve through the five individually assessed points gives some idea of the shape and a possible functional form of the value function. To standardize value analysis into a uniformly recognized form, we fit a curve through these points to determine the corresponding equation for vi(xi). Most value functions can be fitted by exponential or polynomial functions (von Winterfeldt and Edwards, 1986).

    It is simple for practitioners to use a Microsoft® Excel spreadsheet to conduct linear or non-linear regression analyses, since the spreadsheet does not require users to have an intimate understanding of the mathematics behind the curve fitting process. What is required is the ability to select the correct type of regression and to judge the goodness of fit of the estimated function. By preparing an XY (Scatter) plot and using the ‘Add Trendline’ function, the user can obtain the value curve, its mathematical equation, and its R-squared value. As the assessment of a value function is subjective, a perfect representation is not necessary (von Winterfeldt and Edwards, 1986; Clemen, 1996). A smooth curve drawn through the assessed points, together with its equation, should be an adequate representation of the value function for a particular metric. The R-squared value provides an estimate of the goodness of fit of the function to the data; the fit is most reliable when R-squared is at or near 1.
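    For readers who prefer a script to a spreadsheet, the sketch below fits a second-order polynomial to the assessed points by ordinary least squares and reports R-squared as 1 - SS_res/SS_tot. It is meant as an illustration of the kind of fit a polynomial trendline produces, not a replica of Excel's implementation; the ratings and scores are hypothetical, and an exponential form could be fitted instead where it suits the points better.

```python
from typing import List, Sequence


def fit_quadratic(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Least-squares fit of y = c0 + c1*x + c2*x**2; returns [c0, c1, c2]."""
    # Normal equations M c = b, with M[r][c] = sum(x^(r+c)) and b[r] = sum(y * x^r).
    M = [[sum(xi ** (r + c) for xi in x) for c in range(3)] for r in range(3)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 3):
                M[r][c] -= f * M[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        coef[r] = (b[r] - sum(M[r][c] * coef[c] for c in range(r + 1, 3))) / M[r][r]
    return coef


def r_squared(x: Sequence[float], y: Sequence[float], coef: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    pred = [coef[0] + coef[1] * xi + coef[2] * xi ** 2 for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical ratings on one metric and their 0-10 scores.
ratings = [80.0, 85.0, 90.0, 95.0, 100.0]
scores = [0.0, 1.43, 4.29, 5.71, 10.0]
coef = fit_quadratic(ratings, scores)
print([round(c, 4) for c in coef], round(r_squared(ratings, scores, coef), 4))
```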

       2.5 Synthesizing Information

    After the swing weights, the partial value functions, and the current performance data of the supply chain measures have been determined, the performance index can be computed by applying Equation 1a: the value score of each performance measure is multiplied by the swing weight of that measure, and the resulting values are summed. Because the values of individual measures have been assessed on a 0 to 10 scale and the weights are normalized to sum to 1, the overall supply chain performance index also lies on a 0 to 10 scale.
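    The synthesis step is a weighted sum, which the sketch below makes explicit. The two measures, their weights, the performance data, and both value functions are hypothetical; one is linear and the other convex, to show that the additive model accepts either shape. The helper `linear_value` is our own illustrative name.

```python
import math
from typing import Callable, Dict


def linear_value(bottom: float, perfect: float) -> Callable[[float], float]:
    """Hypothetical helper: a linear 0-10 value function, clipped at the ends."""
    def v(x: float) -> float:
        return max(0.0, min(10.0, 10.0 * (x - bottom) / (perfect - bottom)))
    return v


def performance_index(weights: Dict[str, float],
                      value_functions: Dict[str, Callable[[float], float]],
                      data: Dict[str, float]) -> float:
    """Equation 1a: sum over measures of weight times 0-10 value score."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must be normalized to sum to 1")
    return sum(k * value_functions[m](data[m]) for m, k in weights.items())


# Hypothetical two-measure chain; a fitted curve could replace either function.
weights = {"A": 0.6, "B": 0.4}
value_functions = {
    "A": linear_value(80.0, 100.0),                      # higher is better
    "B": lambda x: 10.0 * ((120.0 - x) / 60.0) ** 2,     # lower is better, convex
}
data = {"A": 95.0, "B": 90.0}
print(round(performance_index(weights, value_functions, data), 3))  # 5.5
```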

    Note that supply chain performance is often assessed by managers working as a group, whose information could be utilized in the evaluation process. These managers normally come from various functions and management levels and do not have equal expertise and knowledge. Since they may have different opinions, they may need an approach that aggregates individual judgments into a group judgment. To resolve their differences, they may use mathematical aggregation to combine individual judgments. Mathematical aggregation methods include calculating simple averages and weighted averages of the judgments of individual evaluators. If some evaluators are better judges than others, the aggregation process could adopt the weighted average method (Goodwin and Wright, 2004).
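    Simple and weighted averaging of individual judgments can be sketched as follows. The judgments are hypothetical normalized weight vectors from two evaluators, and the credibility weights (how much each evaluator's judgment counts) are an assumption for illustration; how such credibility weights would be set in practice is not specified here. Because the result is a convex combination, averaging normalized weight vectors yields weights that again sum to 1.

```python
from typing import List, Optional, Sequence


def aggregate_judgments(judgments: Sequence[Sequence[float]],
                        credibility: Optional[Sequence[float]] = None) -> List[float]:
    """Combine evaluators' judgments item by item.

    judgments[e][i] is evaluator e's judgment on item i. With no credibility
    weights this is the simple average; otherwise it is a weighted average.
    """
    if credibility is None:
        credibility = [1.0] * len(judgments)
    total = sum(credibility)
    n_items = len(judgments[0])
    return [sum(c * j[i] for c, j in zip(credibility, judgments)) / total
            for i in range(n_items)]


# Hypothetical normalized weight vectors from two evaluators for three measures.
judgments = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]
print([round(v, 3) for v in aggregate_judgments(judgments)])          # [0.45, 0.35, 0.2]
print([round(v, 3) for v in aggregate_judgments(judgments, [2, 1])])  # [0.467, 0.333, 0.2]
```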


    The case study selected to illustrate the proposed measurement method describes how one supply chain analyst evaluated the performance of a cement manufacturing supply chain in Thailand. Although multiple evaluators participated in our research, for the sake of brevity we report only one evaluator's assessment in this paper. The evaluator applied the Supply Chain Operations Reference (SCOR) model level 1 metrics (Supply-Chain Council, 2006) to the performance model shown in Figure 1 (see Table 3 for the metric definitions and abbreviations used in this study, and Table 4 for the monthly performance data). After examining the historical performance, the evaluator specified five performance ratings for every metric: the two endpoints of the measurement scale and three arbitrary intermediate points. The weighted additive value function describing supply chain performance was based on the SCOR level 1 metrics, as shown in the following equation:

    $$V(x_1, x_2, \ldots, x_{10}) = \sum_{i=1}^{10} k_i \, v_i(x_i), \qquad \sum_{i=1}^{10} k_i = 1$$

    The first step in developing the compound value function V(x1, x2, …, x10) was to determine the weights k1, k2, …, k10. The swing weight approach was applied by asking the evaluator to imagine a hypothetical situation in which all ten measures were at their least preferred conceivable performance (the bottom values). The evaluator was then asked: if just one of these performance measures could be moved to its best level, which would he choose? The evaluator selected POF. After this change was made, he was asked which measure he would choose next to move to its best level, and so on. The resulting ranking was: 1) POF, 2) COGS, 3) SCMC, 4) DSCA, 5) OFCT, 6) C2C, 7) ROSCFA, 8) ROWC, 9) USCA, and 10) USCF.

    POF, the highest-ranked measure, was given a weight of 100. The other weights were assessed in a series of steps. The evaluator was asked to compare a swing in COGS from its highest (worst) level to its lowest (best) level with a swing in POF from its lowest to its highest level. After some thought, he decided that the swing in COGS was 92% as important as the swing in POF, so COGS was given a weight of 92. Similarly, a swing from the worst to the best performance for SCMC was considered to be 87% as important as the corresponding swing for POF, so SCMC was assigned a weight of 87. The swing procedure was repeated for the remaining measures. The evaluator worked with a visual analogue scale like the one shown in Figure 2 to assess the relative magnitude of the swing weights. The ten weights obtained sum to 672, and it is conventional to normalize them so that they add up to 1. Normalization is achieved by simply dividing each weight by the sum of the weights (672). The normalized swing weights are shown in Figure 2.
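    The normalization step is a single division per metric. The sketch below applies it to the three raw weights stated in the text (POF 100, COGS 92, SCMC 87) using the reported total of 672, giving about 0.1488, 0.1369 and 0.1295; the other seven raw weights (393 in total) are not given in the text and are deliberately not invented here. The function name and the toy dictionary are hypothetical.

```python
def normalize_swing_weights(raw_weights):
    """Divide each raw swing weight by the sum of all raw weights."""
    total = sum(raw_weights.values())
    if total <= 0:
        raise ValueError("raw weights must have a positive sum")
    return {metric: w / total for metric, w in raw_weights.items()}


# Raw weights reported in the text, and the reported sum over all ten measures.
# The remaining seven raw weights are not stated in the text.
REPORTED_RAW = {"POF": 100, "COGS": 92, "SCMC": 87}
REPORTED_TOTAL = 672

if __name__ == "__main__":
    for metric, w in REPORTED_RAW.items():
        print(metric, round(w / REPORTED_TOTAL, 4))
    # Hypothetical toy example of the full normalization over every metric
    print(normalize_swing_weights({"X": 100, "Y": 50, "Z": 50}))
```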

    After eliciting the swing weights, the evaluator needed to develop the partial value functions v1(x1), v2(x2), …, v10(x10). The partial value function of POF, v1(x1), was obtained by asking the evaluator to compare pairs of POF performance ratings in terms of relative preference. For example, in terms of ‘Perfect Order Fulfillment,’ which performance level was preferable, 98% or 95%? And how strong was the preference difference on the verbal judgment scale? The evaluator replied that 98% was moderately preferable to 95%, and this judgment was then converted into the value 3 on the numerical scale, following the mapping in Table 1. After all performance ratings had been compared pair by pair, a pairwise comparison (judgment) matrix was formed, from which the vector of priorities, the largest eigenvalue, the consistency ratio, and the performance scores ranging from zero to ten could be calculated. Based on the evaluator’s assessment and the numerical scale in Table 1, the POF pairwise comparison matrix and its computed data are shown in Table 5. Similarly, Tables 6 to 14 summarize the pairwise comparisons and the computed data for the other metrics.
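    The eigenvector calculations can be sketched as follows, under several assumptions. Power iteration is used here as one standard way to obtain the principal eigenvector; the text does not say how it was computed. The random indices are the values commonly attributed to Saaty and should be checked against Table 2. The conversion of priorities to a 0 to 10 score is assumed to be a min-max rescaling (least preferred rating to 0, most preferred to 10), which the text does not confirm. The matrix in the usage block is hypothetical and perfectly consistent, not the evaluator's Table 5.

```python
# Average random indices commonly attributed to Saaty (compare with Table 2).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def principal_eigenvector(a, max_iter=1000, tol=1e-12):
    """Power iteration; returns the priority vector normalized to sum to 1."""
    n = len(a)
    w = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(nxt)
        nxt = [v / s for v in nxt]
        if max(abs(p - q) for p, q in zip(nxt, w)) < tol:
            return nxt
        w = nxt
    return w


def lambda_max(a, w):
    """Largest-eigenvalue estimate: mean of (A w)_i / w_i."""
    n = len(a)
    aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
    return sum(aw[i] / w[i] for i in range(n)) / n


def consistency_ratio(lam, n):
    """CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    ci = (lam - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def scores_0_to_10(w):
    """Assumed min-max rescaling: least preferred rating -> 0, most preferred -> 10."""
    lo, hi = min(w), max(w)
    return [10.0 * (v - lo) / (hi - lo) for v in w]


if __name__ == "__main__":
    # Hypothetical, perfectly consistent 5x5 matrix; ratings ordered worst -> best
    base = [1, 2, 3, 4, 5]
    a = [[bi / bj for bj in base] for bi in base]
    w = principal_eigenvector(a)
    lam = lambda_max(a, w)
    print([round(v, 4) for v in w], round(lam, 4), round(consistency_ratio(lam, 5), 4))
    print([round(s, 2) for s in scores_0_to_10(w)])
```

    For a perfectly consistent matrix the largest eigenvalue equals n and the consistency ratio is zero; real judgment matrices give a larger eigenvalue, and a ratio below about 0.1 is the usual AHP rule of thumb for acceptable consistency.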

    The partial value functions of the ten measures are given in Table 15.

    Based on the partial value functions and the swing weights, the compound value function V(x1, x2, …, x10) would look like this:


    For the purposes of illustration, the performance data presented in Table 16 are from the sample month of December 2006. Using the partial value functions v1(x1), v2(x2), …, v10(x10) in Table 15, the corresponding scores (values) were calculated; they are also shown in Table 16. Based on Equation 7, the supply chain performance index for December was 2.99.

    This index value indicates that overall supply chain performance was not satisfactory, and that the supply chain manager would need to refine supply chain operations to improve it. To monitor the progress of the supply chain, the monthly historical performance indices were calculated and plotted together with the most recent index, as shown in Figure 3.

    To compare the indices computed with the proposed measurement method against those obtained when the value functions are linear by default, all the partial value functions were then assumed to be linear between their bottom and perfect values, while the swing weights remained the same. The resulting performance indices were calculated under this linearity assumption and are plotted in Figure 3 alongside those obtained with value functions that permit non-linearity.
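    A linear partial value function of the kind assumed in this comparison can be sketched as a straight line through (bottom value, 0) and (perfect value, 10). The same formula covers metrics where lower is better (for example, cost or cycle-time measures) because the denominator then becomes negative. The function name, the clamping of out-of-range performance, and the example ranges are hypothetical, not the evaluator's ratings from Figure 1.

```python
def linear_value(x, bottom, perfect):
    """Linear partial value function on a 0-10 scale: v(bottom) = 0, v(perfect) = 10.

    Works whether higher is better (perfect > bottom) or lower is better
    (perfect < bottom). Values outside the range are clamped (an assumption).
    """
    if perfect == bottom:
        raise ValueError("bottom and perfect values must differ")
    v = 10.0 * (x - bottom) / (perfect - bottom)
    return min(10.0, max(0.0, v))


if __name__ == "__main__":
    # Hypothetical ranges, not the case-study ratings
    print(linear_value(95.0, bottom=90.0, perfect=100.0))  # higher is better: 5.0
    print(linear_value(30.0, bottom=60.0, perfect=0.0))    # lower is better: 5.0
```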

    The figure shows that the indices obtained under the linearity assumption were systematically higher than their counterparts. The average PI assuming linearity was 5.21, whereas the average PI of the proposed method was 3.68.

    The difference between the average results of the two methods is substantial: about 1.5 points, or 15.2% of the ten-point scale. Since the two methods use the same performance data and swing weights, the difference can be attributed to the shape of the value curves. This finding is consistent with evidence in the MCDA literature that the default assumption of linearity can have a significant impact on the measurement result.

    The value functions of the proposed method were mostly convex. For the same endpoints of the measurement scale, a linear function maps intermediate performance outcomes to higher scores than a convex function does. This finding has implications for the choice of value functions in real measurement problems. In practical terms, convex curves are more likely to motivate people to improve or maintain high performance, because if they do not, they could receive very low scores. Overestimated measurement results could not only lower the motivation to improve performance but could also send managers a misleading signal about the urgency of improvement.
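    The effect described above can be illustrated numerically. The sketch assumes a hypothetical quadratic value function 10·t², where t is the normalized position between the bottom and perfect values; the actual case-study functions are those in Table 15 and are not reproduced here. Because a convex curve with the same endpoints lies on or below the straight line, the same weights and performance data yield a lower index: with the hypothetical inputs below, 6.1 under linearity versus 3.83 under the convex curve.

```python
def linear_score(t):
    """Linear value on a 0-10 scale; t is the normalized position in [0, 1]."""
    return 10.0 * t


def convex_score(t, power=2.0):
    """Hypothetical convex value function with the same endpoints (0 and 10)."""
    return 10.0 * t ** power


def aggregate(weights, positions, value_fn):
    """Weighted additive index using the given value function for every metric."""
    return sum(k * value_fn(t) for k, t in zip(weights, positions))


if __name__ == "__main__":
    k = [0.5, 0.3, 0.2]  # hypothetical normalized swing weights
    t = [0.6, 0.5, 0.8]  # hypothetical normalized performance positions
    print(round(aggregate(k, t, linear_score), 2))  # 6.1
    print(round(aggregate(k, t, convex_score), 2))  # 3.83
```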


    Chan and Qi (2003a) proposed a measurement and aggregation algorithm based on fuzzy sets and linear value functions to calculate a performance index for the supply chain. Although the method is helpful in analyzing supply chain performance, its fuzzy set techniques can be quite complex because of the considerable number of calculations required. It may also produce inappropriate weights, because their meaning is not consistent with that of the weights in additive models. Moreover, the linearization of partial value functions can lead to a misleading performance index. To resolve these issues, this paper develops a user-friendly alternative measurement approach whose weighting parameters correspond to the scaling constants in the additive model. The method developed is applicable to both linear and nonlinear value functions.

    The proposed measurement method integrates the multiattribute value theory and the eigenvector method of the analytic hierarchy process, and its application is illustrated with a real-world case study. The weighted additive model is used to aggregate the performance information because it is the most widely used model. The measurement method relies on swing weights for the supply chain metrics and on the eigenvector procedure for building partial value functions. The swing weighting method is applied because it produces weights compatible with the weights in additive models. The eigenvector method provides a simple and useful tool for modeling both linear and non-linear value judgments. Once the method is fully applied, all the supply chain performance information can be aggregated into an overall performance index. Because the performance index is formulated as a compound function of quantitative SCM measures, it can also facilitate quantitative SCM research on supply chain modeling and optimization.

    The case study shows how the default assumption of linearity can affect the measurement result. It is therefore advisable to allow for non-linearity when modeling human preference. Adopting non-linearity involves additional effort: identifying additional anchor points, conducting pairwise comparisons, and performing additional calculations and regression analyses. The effort is nonetheless worthwhile, not only to guard against misleading performance indices but also to understand the current performance situation and the attitudes reflected in the value functions.

    The proposed measurement method has several advantages. First, it is flexible because it can handle both linear and non-linear value functions. Second, it is user-friendly because it is made up of simple and understandable MCDA tools. Belton and Stewart (2002) stated that the transparency, simplicity, and user-friendliness of both the simple additive model and the AHP account for their widespread popularity. The proposed method shares these characteristics.

  • 1. Beamon B. M. (1999) Measuring supply chain performance [International Journal of Operations and Production Management] Vol.19 P.275-292
  • 2. Bechtel C., Jayaram J. (1997) Supply chain management: a strategic perspective [The International Journal of Logistics Management] Vol.8 P.15-34
  • 3. Belton V. (1986) A comparison of the analytic hierarchy process and a simple multi-attribute value function [European Journal of Operational Research] Vol.26 P.7-21
  • 4. Belton V., Stewart T. J. (2002) Multiple Criteria Decision Analysis: An Integrated Approach
  • 5. Boender C. G. E., de Graan J. G., Lootsma F. A. (1989) Multi-criteria decision analysis with fuzzy pairwise comparisons [Fuzzy Sets and Systems] Vol.29 P.133-143
  • 6. Bolstorff P. (2003) Measuring the Impact of Supply Chain Performance [CLO/Chief Logistics Officer] Vol.12 P.6-11
  • 7. Borcherding K., Eppel T., von Winterfeldt D. (1991) Comparison of weighting judgments in multiattribute utility measurement [Management Science] Vol.37 P.1603-1619
  • 8. Bozdag C. E., Kahraman C., Ruan D. (2003) Fuzzy group decision making for selection among computer integrated manufacturing systems [Computers in Industry] Vol.51 P.13-29
  • 9. Brewer P. C., Speh T. W. (2000) Using the balanced scorecard to measure supply chain performance [Journal of Business Logistics] Vol.21 P.75-93
  • 10. Brugha C. M. (2004) Phased multicriteria preference finding [European Journal of Operational Research] Vol.158 P.308-316
  • 11. Chan F. T. S., Chan H. K., Qi H. J. (2006) A review of performance measurement systems for supply chain management [International Journal of Business Performance Management] Vol.8 P.110-131
  • 12. Chan F. T. S. (2003) Performance measurement in a supply chain [International Journal of Advanced Manufacturing Technology] Vol.21 P.534-548
  • 13. Chan F. T. S., Qi H. J. (2003a) An innovative performance measurement method for supply chain management [Supply Chain Management: An International Journal] Vol.8 P.209-223
  • 14. Chan F. T. S., Qi H. J. (2003b) Feasibility of performance measurement system for supply chain: a process-based approach and measures [Integrated Manufacturing Systems] Vol.14 P.179-190
  • 15. Chang Y., Yeh C. (2001) Evaluating airline competitiveness using multiattribute decision making [Omega: The International Journal of Management Science] Vol.29 P.405-415
  • 16. Christopher M. (1998) Logistics and Supply Chain Management: Strategies for Reducing Cost and Improving Service
  • 17. Clemen R. T. (1996) Making Hard Decisions: An Introduction to Decision Analysis
  • 18. Dasgupta T. (2003) Using the six-sigma metric to measure and improve the performance of a supply chain [Total Quality Management] Vol.14 P.355-366
  • 19. Dyer J. S., Sarin R. K. (1979) Measurable multiattribute value functions [Operations Research] Vol.27 P.810-822
  • 20. Dyer J. S., Fishburn P. C., Steuer R. E., Wallenius J., Zionts S. (1992) Multiple criteria decision making, multiattribute utility theory: the next ten years [Management Science] Vol.38 P.645-653
  • 21. Edwards W., Barron F. H. (1994) SMARTS and SMARTER: improved simple methods for multiattribute utility measurement [Organizational Behavior and Human Decision Processes] Vol.60 P.306-325
  • 22. Farris II M. T., Hutchison P. D. (2002) Cash-to-cash: the new supply chain management metric [International Journal of Physical Distribution and Logistics Management] Vol.32 P.288-298
  • 23. Fawcett S. E., Cooper M. B. (1998) Logistics performance measurement and customer success [Industrial Marketing Management] Vol.27 P.341-357
  • 24. Forman E., Selly M. A. (2001) Decision by objectives: how to convince others that you are right.
  • 25. Goodwin P., Wright G. (2004) Decision Analysis for Management Judgment 3rd ed
  • 26. Griffis S. E., Cooper M., Goldsby T. J., Closs D. J. (2004) Performance measurement: measure selection based upon firm goals and information reporting needs [Journal of Business Logistics] Vol.25 P.95-118
  • 27. Gunasekaran A., Patel C., Tirtiroglu E. (2001) Performance measures and metrics in a supply chain environment [International Journal of Operations and Production Management] Vol.21 P.71-87
  • 28. Gunasekaran A., Patel C., McGaughey R. E. (2004) A framework for supply chain performance measurement [International Journal of Production Economics] Vol.87 P.333-347
  • 29. Harland C. M., Lamming R. C., Walker H., Phillips W. E., Caldwell N. D., Johnsen T. E., Knight L. A., Zheng J. (2006) Supply management: is it a discipline? [International Journal of Operations and Production Management] Vol.26 P.730-753
  • 30. Harrison A., New C. (2002) The role of coherent supply chain strategy and performance management in achieving competitive advantage: an international survey [Journal of Operational Research Society] Vol.53 P.263-271
  • 31. Hausman W. H., Harrison T. P., Lee H. L., Neale J. J. (2004) Supply Chain Performance Metrics, The Practice of Supply Chain Management: where theory and application converge P.61-73
  • 32. Kamenetzky R. D. (1982) The relationship between the analytic hierarchy process and the additive value function [Decision Sciences] Vol.13 P.702-713
  • 33. Keeney R. L., Raiffa H. (1976) Decisions with Multiple Objectives: Preference and Value Tradeoffs
  • 34. Kleijnen J. P. C., Smits M. T. (2003) Performance metrics in supply chain management [Journal of Operational Research Society] Vol.54 P.507-514
  • 35. Lambert D. M., Pohlen T. L. (2001) Supply chain metrics [International Journal of Logistics Management] Vol.12 P.1-19
  • 36. Lee H., Kwak W., Han I. (1995) Developing a business performance evaluation system: an analytic hierarchical model [The Engineering Economist] Vol.40 P.343-357
  • 37. Lohman C., Fortuin L., Wouters M. (2004) Designing a performance measurement system: a case study [European Journal of Operational Research] Vol.156 P.267-286
  • 38. Maskell B. H. (1991) Performance Measurement for World Class Manufacturing: A Model for American Companies
  • 39. Mendoza G. A., Martins H. (2006) Multi-criteria decision analysis in natural resource management: a critical review of methods and new modeling paradigms [Forest Ecology and Management] Vol.230 P.1-22
  • 40. Mustajoki J., Hamalainen R. P. (2000) Web-HIPRE: global decision support by value tree and analysis [INFOR Journal: Information Systems and Operational Research] Vol.38 P.208-220
  • 41. Neely A., Gregory M., Platts K. (1995) Performance measurement system design: a literature review and research agenda [International Journal of Operations and Production Management] Vol.15 P.80-116
  • 42. Novack R. A., Thomas D. J. (2004) The challenges of implementing the perfect order concept [Transportation Journal] Vol.43 P.5-16
  • 43. Pan J., Rahman S. (1998) Multiattribute utility analysis with imprecise information: an enhanced decision support technique for the evaluation of electric generation expansion strategies [Electric Power Systems Research] Vol.46 P.101-109
  • 44. Poyhonen M., Hamalainen R. P. (2000) There is hope in attribute weighting [INFOR Journal: Information Systems and Operational Research] Vol.38 P.272-282
  • 45. Saaty T. L. (1980) Multicriteria Decision Making: The Analytic Hierarchy Process
  • 46. Saaty T. L. (1994) How to make a decision: the analytic hierarchy process [Interfaces] Vol.24 P.18-43
  • 47. Saaty T. L. (1996) Decision Making with Dependence and Feedback: The Analytic Network Process
  • 48. Salo A. A., Hamalainen R. P. (1997) On the measurement of preferences in the analytic hierarchy process [Journal of Multi-criteria Decision Analysis] Vol.6 P.309-319
  • 49. Schoemaker P. J. H., Waid C. C. (1982) An experimental comparison of different approaches to determining weights in additive utility models [Management Science] Vol.28 P.182-196
  • 50. Seth N., Deshmukh S. G., Vrat P. (2006) A framework for measurement of quality of service in supply chains [Supply Chain Management: An International Journal] Vol.11 P.82-94
  • 51. Simatupang T. M., Sridharan R. (2002) The collaborative supply chain [International Journal of Logistics Management] Vol.13 P.15-30
  • 52. Stewart T. J. (1992) A critical survey on the status of multiple criteria decision making theory and practice [Omega: International Journal of Management Science] Vol.20 P.569-586
  • 53. Stewart T. J. (1993) Use of piecewise linear value functions in interactive multicriteria decision support: a monte carlo study [Management Science] Vol.39 P.1369-1381
  • 54. Stewart T. J. (1996) Robustness of additive value function methods in MCDM [Journal of Multi-criteria Decision Analysis] Vol.5 P.301-309
  • 55. Supply-Chain Council (2006) Supply-Chain Operations Reference-Model Version 8.0
  • 56. Vargas L. G. (1986) Utility theory and reciprocal pairwise comparisons: the eigenvector method [Socio-Economic Planning Sciences] Vol.20 P.387-391
  • 57. von Nitzsch R., Weber M. (1993) The effect of attribute ranges on weights in multiattribute utility measurements [Management Science] Vol.39 P.937-943
  • 58. von Winterfeldt D., Edwards W. (1986) Decision Analysis and Behavioral Research
  • 59. Weber M., Borcherding K. (1993) Behavioral influences on weights judgments in multiattribute decision making [European Journal of Operational Research] Vol.67 P.1-12
  • 60. Zanakis S. H., Mandakovic T., Gupta S. K., Sahay S., Hong S. (1995) A review of program evaluation and fund allocation methods within the service and government sectors [Socio-Economic Planning Sciences] Vol.29 P.59-79
  • [Table 1.] Mapping from Verbal Judgments into AHP 1-9 Scales.
  • [Table 2.] The average random indices (R.I.).
  • [Figure 1.] A SCOR-based Performance Model and Performance Ratings Identified by the Evaluator.
  • [Table 3.] Definitions of SCOR Level 1 Metrics.
  • [Table 4.] SCOR Level 1 Monthly Performance Data, 2006.
  • [Figure 2.] Derivation of Swing Weights-the Graphic Representation of Scale.
  • [Table 5.] Pairwise Comparison Judgments and Values of POF Performance Ratings.
  • [Table 6.] Pairwise Comparison Judgments and Values of OFCT Performance Ratings.
  • [Table 7.] Pairwise Comparison Judgments and Values of USCF Performance Ratings.
  • [Table 8.] Pairwise Comparison Judgments and Values of USCA Performance Ratings.
  • [Table 9.] Pairwise Comparison Judgments and Values of DSCA Performance Ratings.
  • [Table 10.] Pairwise Comparison Judgments and Values of SCMC Performance Ratings.
  • [Table 11.] Pairwise Comparison Judgments and Values of COGS Performance Ratings.
  • [Table 12.] Pairwise Comparison Judgments and Values of C2C Performance Ratings.
  • [Table 13.] Pairwise Comparison Judgments and Values of ROSCFA Performance Ratings.
  • [Table 14.] Pairwise Comparison Judgments and Values of ROWC Performance Ratings.
  • [Table 15.] Partial Value Functions for SCOR Level 1 Metrics.
  • [Table 16.] Performance of the supply chain of the case study, December 2006.
  • [Table 15.] Partial Value Functions for SCOR Level 1 Metrics (Cont.).
  • [Figure 3.] Comparisons between the performance indices of the proposed method and those of the linear function method.