CART – identification of clusters with different adoption probabilities

Hypothesis 11 – Industries with ex ante better knowledge about ICT will adopt e-business technologies more rapidly

5. Static analysis of e-business adoption

5.4. CART – identification of clusters with different adoption probabilities

Up to this point, two parametric approaches were used to test technological interdependencies and their influence on adoption decisions. In this section, a novel non-parametric technique is introduced to identify clusters of firms that exhibit significant differences in their adoption probability. The analysis complements the parametric results in three ways. First, CART simultaneously identifies clusters and the significant predictor variables that characterize them. An identification of clusters is not possible with parametric regression models. The cluster results from the CART analysis are of high practical value for marketing purposes because they allow qualified predictions about the adoption probability of a firm for various technologies, based on very few (only the most relevant) parameters. Marketers of e-business technologies and consultants can use the results to evaluate for a given customer which investment into e-business technologies the customer is most likely to undertake next. Also, the identification of clusters makes it possible to focus marketing activities on those market segments that are most likely to purchase.

Second, CART identifies complex, non-linear relationships between technological and structural variables (such as firm size and market of operation) that were not captured by the parametric models presented above. As such, the trees provide unique and additional insights into the relationships that are present in the data. Third, CART relaxes important restrictions of parametric models: It does not make any assumptions about the distribution of error terms and the functional form of the explanatory variables. Also, CART is robust to outliers and invariant to monotone transformations of predictors (Gatnar 2002). Thus, the identification of relevant predictor variables and clusters is independent of any prior assumption that might not actually be justified by the data. Finally, in contrast to the parametric regression models, CART uses an out-of-sample validation procedure to guard against over-fitting of the model. For all of the above reasons, CART can be used to “double check” the results of the earlier sections: If CART detects relationships in the data that are inconsistent with the parametric regression results, this suggests that the parametric results might be erroneously influenced by their restrictive assumptions.

CART was first introduced by Breiman et al. (1984). It can loosely be described as a combination of non-parametric regression and cluster analysis. The result, a “tree” presented in graphic form, is both parsimonious and easy to interpret. CART has recently been used in numerous studies in the medical sciences (Zhang and Bracken, 1995; Zhang and Singer, 1999) with a focus on classifying patients into risk groups. For example, this is important for physicians in emergency rooms who need to decide on appropriate levels of medical care for arriving patients when only a few clinical factors are available. To my knowledge, however, the application of CART in a managerial or economic context is still novel. Therefore, a short description of the method is included at this point and a short technical introduction to CART is given in Appendix 1.

The basic idea of CART is to systematically split the dataset into homogeneous groups with respect to the dependent variable based on the best set of predictors. The final tree is computed in four steps.

In the first step, called recursive partitioning, the sample of subjects is systematically sorted into completely homogeneous subsets until a saturated tree is found. In this study, complete homogeneity means that a node contains either only adopters or non-adopters. The root node of a tree contains the sample of subjects from which the tree is grown. Then, based on the parameter value that is most predictive for the outcome, the root node is split into two daughter nodes that now form a second layer of the tree. All nodes in the same layer constitute a partition of the root node. The process of splitting nodes is continued and the partition becomes finer and finer as the layer gets deeper and deeper. For each split, CART considers the entire set of available predictor variables to determine which one maximizes the homogeneity of the following two daughter nodes. This is a hierarchical process that reveals interdependencies between covariates. Also, a predictor might show up numerous times in different parts of the tree. Each case of the sample is sorted into one of the daughter nodes at each layer of the tree, according to the splitting rule that was used. Those subsets that are not split are called terminal nodes.
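The choice of a single split can be illustrated with a minimal sketch. It assumes binary (yes/no) predictors and the Gini index as the impurity measure; the chapter does not state which impurity criterion was used, and the toy sample and variable names below are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of 0/1 class labels (0.0 = fully homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_binary_split(rows, target, predictors):
    """Return (predictor, weighted impurity) of the best yes/no split of a node."""
    best = None
    n = len(rows)
    for p in predictors:
        left = [r[target] for r in rows if r[p] == 0]
        right = [r[target] for r in rows if r[p] == 1]
        if not left or not right:
            continue  # this predictor does not split the node
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = (p, impurity)
    return best

# Hypothetical toy sample: 1 = uses the technology, 0 = does not.
firms = [
    {"share_docs": 1, "website": 1, "elearning": 1},
    {"share_docs": 1, "website": 1, "elearning": 1},
    {"share_docs": 1, "website": 0, "elearning": 1},
    {"share_docs": 0, "website": 1, "elearning": 0},
    {"share_docs": 0, "website": 1, "elearning": 0},
    {"share_docs": 0, "website": 0, "elearning": 0},
]
split = best_binary_split(firms, "elearning", ["share_docs", "website"])
```

In this toy sample, document sharing separates adopters from non-adopters perfectly, so it is selected over the website variable. Recursive partitioning would repeat the same search within each daughter node.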

When a case finally moves into a terminal subset, its predicted class is given by the class label attached to that terminal subset (e.g. “adopter {Y=1}” or “non-adopter {Y=0}” for node t). The process is continued until the nodes are completely homogeneous and cannot be split any further. This is the saturated tree. The saturated tree is usually too large to be useful. In the worst case, it is trivial because each terminal node could consist of just one case. The resulting model is subject to over-fitting problems. Therefore, one must find a nested sub-tree of the saturated tree that exhibits the best “true” classification performance and satisfies statistical inference measures.

To proceed, a series of nested optimal sub-trees of the saturated tree is generated. This second step in the process is called pruning. The cost-complexity pruning algorithm suggested by Breiman et al. (1984) is used here, which ensures that a uniquely best sub-tree can be found for any given tree complexity.
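For reference, the cost-complexity criterion behind this pruning step is commonly written as follows. The notation is the standard one from the CART literature and is not taken from this chapter's own formulas:

```latex
R_\alpha(T) = R(T) + \alpha \, |\tilde{T}|, \qquad \alpha \ge 0
```

Here $R(T)$ is the resubstitution misclassification cost of tree $T$, $|\tilde{T}|$ is its number of terminal nodes, and $\alpha$ is the complexity penalty per terminal node. Raising $\alpha$ from zero yields the nested sequence of sub-trees from which the final tree is selected.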

In a third step, one of the trees must be selected from the pruning sequence. The solution lies in finding an honest estimate of the true classification performance and selecting the sub-tree that minimizes the estimated true misclassification costs. This is usually done with an independent test sample, bootstrapping, or cross-validation. In this study, a 20-fold cross-validation procedure is employed because it makes better use of the information contained in the original dataset than the independent test sample method and outperforms bootstrapping in terms of reduced bias (Breiman et al., 1984, pp. 72-78, 311-313).²¹
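The mechanics of a 20-fold cross-validation can be sketched as follows. This is a simplified illustration, not the procedure implemented in the software used here: the function names are hypothetical, and `fit` and `misclassified` stand for caller-supplied routines that grow a tree on the training cases and count errors on the held-out cases.

```python
import random

def k_fold_indices(n_cases, k=20, seed=0):
    """Split case indices 0..n_cases-1 into k disjoint folds of near-equal size."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validated_error(n_cases, k, fit, misclassified):
    """Average out-of-fold misclassification rate.

    fit(train_idx) returns a model; misclassified(model, test_idx) returns the
    number of held-out cases the model gets wrong (both are hypothetical
    callbacks supplied by the caller).
    """
    errors = 0
    for test_idx in k_fold_indices(n_cases, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n_cases) if i not in held_out]
        model = fit(train_idx)
        errors += misclassified(model, test_idx)
    return errors / n_cases

# Fold assignment for a sample of 4,852 firms.
folds = k_fold_indices(4852, k=20)
```

Every case is used for testing exactly once and for training in the remaining 19 rounds, which is why the procedure uses the data more fully than a single independent test sample. In a full CART run, this error estimate would be computed for each sub-tree in the pruning sequence.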

Following these steps, the best classifying tree can be identified. However, because we are mainly interested in interpreting the revealed structures, it must also be ensured that the model satisfies the usual significance tests.

The final tree is computed by calculating significance tests for all splits in the tree and dropping those splits (and their successors) that are not significant at the 95% confidence level or above.

The same dataset as before is used for the CART analysis, originating from the June 2002 enterprise survey of the e-Business Market W@tch. CARTs are computed for all 11 technologies listed in Table 12. For each of the technologies, the remaining 10 technologies are included as predictors and all additional predictor variables are listed in Table 21. No abbreviation is specified for variables that did not show up as a relevant predictor in any of the 11 trees.

___________

21 Estimation was carried out with CART 5.0 by Salford Systems.

Table 21 – Predictor variables in CART models

| Predictor variable | Abbreviation |
| --- | --- |
| Country | COUNTRY |
| Sector | SECTOR |
| Sizeclass | SIZE |
| Company has a website (yes/no) | WEBSITE |
| More than one establishment (yes/no) | N/A |
| Company offers in-house computer or IT training (yes/no) | N/A |
| Company offers employees to participate in computer or IT training offered by third parties (yes/no) | N/A |
| Company provides tools for self-learning, for instance books or software (yes/no) | N/A |
| Company offers employees to use some of their working time for learning activities (yes/no) | N/A |
| Company judgement on importance of informal “learning on the job” (very, fairly, less, not important) | N/A |
| Company judgement on importance of formal training schemes (very, fairly, less, not important) | N/A |
| Company judgement on importance of self-learning activities (very, fairly, less, not important) | N/A |
| Company has recruited or tried to recruit staff with special IT skills during the last 12 months (yes/no) | IT_STAFF |
| Company has experienced difficulties in finding staff with special IT skills (yes/no) | N/A |
| To what extent has e-business changed the way in which company conducts business (significantly, somewhat, not changed) | EBIZ_CHANGE |

All firms included in the sample fulfill the necessary technological requirements to engage in e-business (use of computers, Internet, Email, and WWW). The terminal nodes can be ordered according to the ratio of adopters they contain. The numbers below the terminal nodes indicate this order, with 1 being the cluster with the highest adoption probability. The CART results for E-learning, CRM, and online sales are presented in the text. All remaining CART models are included in Appendix 2. They can easily be interpreted in a similar manner.

Figure 5 shows the final tree model for the usage of e-learning technologies. In the dataset, e-learning is defined as the usage of online, Internet-based technologies to support employee training. The final tree consists of four terminal nodes. CART uses three different predictor variables to construct the tree. Each of the terminal nodes exhibits a different fraction of e-learning users. The root node contains 4,852 firms, 17.2% of which use e-learning. The most e-learning-affine segment (number 1) contains 63.4% adopters, whereas in the least e-learning-affine segment (number 4) only 7.6% use e-learning. The terminal nodes each contain a different number of firms. Some of the nodes are rather small and describe rare, but statistically relevant sub-groups (like number 1, which contains only 153 firms or 3.2% of the sample), whereas others are very large (like number 4, which contains 2,594 firms or 53.5% of the sample). Note that the impact of each predictor variable on the ratio of adopters can be followed along the tree branches. For example, the fraction of e-learning users increases from 17.2% in the root node to 28.2% for firms that share documents online. It again increases sharply if these firms also use an Internet-based Human Resource Management system. It is interesting to observe that all co-variables in the tree are good predictors only for a specific sub-set of the sample, in interaction with other predictors, and do not turn out to be relevant in other parts of the tree. This is one of the unique insights into the data structures revealed by CART.

Figure 5 – CART for E-learning (tree diagram not reproduced; root node: 17.2% e-learning users)

The results of the tree illustrate the importance of technological interdependencies. In fact, all three relevant predictor variables in the tree directly relate to the usage of other e-business technologies.

Segment 1, which contains 63.4% e-learning users, consists of firms that are already very advanced in the usage of Internet-based technologies. The average number of other Internet technologies installed ($k_{i,-j}$) in this segment is 6.14, the highest among all terminal nodes in the tree. Segment 1 is sufficiently characterized by just three predictor variables: It includes firms that share documents online, use Internet technologies to support human resource management functions (HRM), and use Knowledge Management Systems (KMS) that run on the Internet. At least HRM and KMS can be seen as rather advanced e-business applications that are not yet used by many companies.

The probability to adopt e-learning decreases sharply if any of these technologies (Share_do, HRM, KMS) is not used by a company. The lowest adoption probability is observable in cluster 4, which contains only 7.6% e-learning users. This cluster is sufficiently characterized by just one predictor variable: it contains firms that do not use the Internet to share documents online. No additional predictor variable can improve the classification performance of this node. Cluster 4 contains a large share of the dataset (53.5%). Also, cluster 4 has the lowest average number of other Internet technologies installed ($k_{i,-j} = 1.02$). Thus, the cluster that contains the firms that are “least advanced” in the usage of e-business technologies also has the lowest probability to adopt e-learning.

Other indicators in the dataset that reflect firm heterogeneity, such as size class or sector membership, do not turn up as relevant predictors in the tree. This should not be mistaken to indicate an irrelevance of other factors leading to rank effects, such as firm size, sector, or country of origin. Indeed, several of these variables exhibited a significant impact on e-learning adoption in the logit regression (see Table 13). The reason why they do not show up in the tree lies in the CART method, which only uses the predictor that minimizes node impurity. The second best predictor, which might even be closely related, does not show up in the tree. Firm size, sector, or country of origin are candidates for such factors, based on the previous analysis.

It has to be kept in mind that the usage of other e-business technologies as explanatory variables in the tree does not imply a causal relationship. It only implies that these variables are the best predictors for whether a firm uses e-learning or not. It does not say anything about the direction of the relationship or the presence of some other factor that could cause the correlation. However, the results are consistent with our findings from above, suggesting that more advanced firms exhibit higher adoption probabilities.

Figure 6 shows the results for online sales. The final tree consists of 5 terminal nodes. CART uses only 3 predictor variables to construct the tree; one of the variables (sector membership) appears twice in the tree. The root node again contains all 4,852 subjects, and the overall ratio of companies using the Internet to sell goods or services is 15.6%. The first layer splitting variable (which is the variable with the highest classification performance with respect to the dependent variable) is whether a firm has a website or not. The probability that a company is selling online drops sharply from 15.6% to 3.3% if the company does not have a website (cluster 5). Cluster 5 has the lowest overall probability for online sales. The result is intuitive: having a website is not only a basic indicator for the “e-readiness” of a company, but also an important channel for online sales. Consequently, firms that do not exhibit this basic characteristic do not have a high probability to adopt online sales. Cluster 5 is also the cluster with the lowest average number of other adopted technologies ($k_{i,-j} = 1.28$).

The second lowest probability for online sales occurs in cluster 4. Firms in this group do have a website, but they belong to an industry sector where anonymous remote orders and delivery are generally rare (food, chemicals, metal products, machinery, electronics and electrical machinery, transport equipment, real estate, business services, health services). The ratio of firms selling online in cluster 4 is 12%, compared to 30.1% in the other daughter node, which contains all firms with a website from the remaining sectors (publishing, retail, tourism, monetary services, insurances, ICT services). The best splitting variable for this other daughter node is whether firms use the Internet to purchase online. Given that purchasing online is already quite frequently used by all firms in the sample (46.9%), this can again be interpreted as a proxy for the “e-readiness” of a company. Cluster 3, which contains those firms that do not purchase online, also exhibits a low number of other adopted technologies ($k_{i,-j} = 1.5$), and the chance of selling online in this cluster drops from 30.1% to 22.9%, whereas otherwise it increases to 35.7%. The cluster with the highest probability to sell online consists of those enterprises that have a website, purchase online, and belong to either the retail or the tourism sector. This tree structure is interesting because it points out two insights: First, the general technological development level again turns out to be a good predictor for adoption decisions (website, purchasing online). Second, some technologies, such as online sales, are clearly more valuable and relevant in some industries than in others. Thus, depending on the type of activity a firm is pursuing, some firms will probably never adopt online sales even though they might be very advanced in using e-business applications otherwise. However, the probability to adopt still increases the more advanced a firm already is, ceteris paribus.

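- Root node: 15.6% selling online (4,094 non-adopters, 758 adopters)
  - Website = no: 3.3% (1,121 non-adopters, 38 adopters), cluster 5
  - Website = yes: 19.5% (2,973 non-adopters, 720 adopters)
    - Sector = all others: 12.0% (1,903 non-adopters, 260 adopters), cluster 4
    - Sector = 2, 8, 9, 10, 11, 14: 30.1% (1,070 non-adopters, 460 adopters)
      - Purch = no: 22.9% (519 non-adopters, 154 adopters), cluster 3
      - Purch = yes: 35.7% (551 non-adopters, 306 adopters)
        - Sector = 2, 10, 11, 14: 30.5% (449 non-adopters, 197 adopters), cluster 2
        - Sector = 8, 9: 51.7% (102 non-adopters, 109 adopters), cluster 1

Figure 6 – CART for Selling online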
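The ordering of the terminal nodes by their share of adopters can be retraced from the node counts in Figure 6. The sketch below assumes that the two counts shown for each node are non-adopters and adopters, in that order; this reading is inferred from the reported percentages rather than stated in the figure, and the node descriptions are shorthand introduced here.

```python
# (non-adopters, adopters) per terminal node, read off Figure 6.
terminal_nodes = {
    "no website": (1121, 38),
    "website, other sectors": (1903, 260),
    "website, sectors 2/8/9/10/11/14, no online purchasing": (519, 154),
    "website, online purchasing, sectors 2/10/11/14": (449, 197),
    "website, online purchasing, sectors 8/9": (102, 109),
}

def adoption_share(counts):
    """Fraction of adopters in a node given (non-adopters, adopters)."""
    non_adopters, adopters = counts
    return adopters / (non_adopters + adopters)

# Rank nodes from highest to lowest share of firms selling online.
ranked = sorted(
    terminal_nodes,
    key=lambda name: adoption_share(terminal_nodes[name]),
    reverse=True,
)
for rank, name in enumerate(ranked, start=1):
    print(f"cluster {rank}: {name} ({adoption_share(terminal_nodes[name]):.1%})")
```

The resulting order reproduces the cluster numbers used in the text, from the retail and tourism firms that purchase online (cluster 1) down to the firms without a website (cluster 5).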

Figure 7 shows the CART for Customer Relationship Management (CRM). This is the most parsimonious model. The root node contains 12% CRM users, and the only relevant splitting variable is whether a firm uses a Knowledge Management System (KMS). If a firm does use KMS, the probability to adopt CRM increases to 47.8%, otherwise it drops to 8.8%. Again, this relationship does not imply any direct causality. Instead, KMS is again a proxy for the general development level of a firm: Whereas cluster 1 exhibits $k_{i,-j} = 4.76$, cluster 2 exhibits only $k_{i,-j} = 1.83$.

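- Root node: 12.0% CRM users (4,272 non-adopters, 580 adopters)
  - KMS = no: 8.8% (4,069 non-adopters, 394 adopters), cluster 2
  - KMS = yes: 47.8% (203 non-adopters, 186 adopters), cluster 1

Figure 7 – CART for CRM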
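The requirement that every split in the final tree be significant at the 95% level can be illustrated with the KMS split of Figure 7. The chapter does not name the test that was applied; the sketch below assumes a Pearson chi-square test on the 2x2 table of KMS usage against CRM adoption, and reads the node counts as non-adopters and adopters.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Rows: KMS = no / yes; columns: CRM non-adopters / adopters (Figure 7).
statistic = chi_square_2x2(4069, 394, 203, 186)

# Critical value of the chi-square distribution with 1 degree of freedom at 5%.
CRITICAL_5_PERCENT_1_DF = 3.841
significant = statistic > CRITICAL_5_PERCENT_1_DF
```

Under these assumptions the statistic lies far above the critical value, so this split would be retained in the final tree.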

All CART models together emphasize technological interdependencies. Whenever a company uses any of the technologies identified in the tree models, the probability to adopt the technology under scrutiny increases and vice versa. This is in accordance with the findings in Sections 5.2 and 5.3. Also, the CART models show interesting and diverse adoption patterns for various technologies: The adoption probability for some technologies can be best predicted solely by the technological profile of a firm, while for other technologies information about the size of the firm or its sector of operation is also important.

Besides the analytical value of this explorative analysis of data structures, the CART results have a very appealing practical value for technology providers, consultants, and marketers of e-business technologies. Provided that the collected data are fairly recent, the identified clusters can help to optimize marketing and sales strategies. For example, a technology provider might decide to focus its marketing activities on those clusters that have been identified as showing a high probability to adopt. Also, a technology provider who is already conducting business with a client might use the CART results to make an “educated guess” about what will be a likely future investment of the client. This knowledge could be used to increase the chances of getting a follow-up job. In a similar way, CART can generally be used as a market research instrument for any kind of product, service or technology where useful and preferably large datasets are available.

5.5. Discussion

This chapter introduced a theory and an empirical analysis of the adoption of related technologies. If firms have the possibility to invest into a number of related technologies that are based on joint technological principles, technological interdependencies can have a crucial impact on investment and adoption decisions. In particular, firms might experience an acceleration mechanism in adopting related technologies, if the technologies do not substitute each other in their functionalities. An acceleration effect can occur for various reasons:

- availability of joint complementary inputs, such as specialized labor, suitable organizational structures, or technological infrastructures

- learning-by-doing effects

- discounts for the purchase of more than one technology

- technological complementarity, if technologies are directly compatible and do not substitute each other in their functionalities

- better possibilities to finance investments due to successful earlier investments into related technologies

If any of the above applies, the probability to adopt a technology will increase with the number of related technologies that a firm also uses because all of the above effects are strictly increasing in their argument, without a natural point of inflection. In other words, the net present value of a technology can be higher for a firm that is already more advanced in using related technologies than for an otherwise identical firm. This suggests
