Preparing for 2007 – A Strategy for Change Based on Previous Experiences
2005-01-11 15:25:41
 

Steve Vale
Office for National Statistics
UK

Abstract

This paper describes how the UK is planning to implement the 2007 round of changes to international classifications of economic activity in its statistical business register. The strategy for managing the transition draws heavily on the lessons learned from previous classification changes, particularly the relatively minor change in 2003, which can be seen as a "dress rehearsal" for the more fundamental changes for 2007. The strategy is based on capturing and storing text descriptions of business activity, developing a suitable automatic coding tool, and constructing correlation tables to re-code less important units on a probability basis.

1. Introduction

A major revision to the international system of economic classifications is currently taking place. According to the latest timetable, the new classifications are to be implemented (at least in Europe) for the start of 2007. The introduction of a new classification causes major discontinuities for statistical outputs, which need to be carefully managed. In this respect, the quality of the implementation of the new codes in the statistical business register largely determines the success of the change. This paper focuses on implementation issues relating to business registers. It is recognised that there will be many other implementation issues for other parts of the statistical process, but these are not explicitly referred to here.

The UK Standard Industrial Classification (SIC) consists of over 700 different five-digit codes. The first four digits align with the European NACE[1] classification, which in turn aligns at the higher levels with the International Standard Industrial Classification of All Economic Activities (ISIC). The revisions to the ISIC and the NACE are close to being finalised, with the new version of the SIC to follow several months later.

The changes for 2007 are likely to affect every code. For some, it will mean a simple re-numbering, whereas others will be split and / or combined. The last major change of this nature in the UK was the introduction of the SIC(92) in 1994. There were less fundamental changes at the level of the fifth digit during the late 1990s, as well as a partial revision in 2002 resulting in the SIC(2003), the current version. Valuable lessons have been learned as a result of implementing these previous changes. These will be used to help determine the strategy for 2007, and are detailed below.

Implementation of the SIC(92)

1) The approach

The SIC(92) was introduced in 1993 to enable the UK to comply with the European Regulation that established the NACE as a compulsory classification[2]. It replaced two older classifications: the SIC(80), which was used for production businesses, and the Business Trade Code (BTC, based on the SIC(68)), which was used mainly for non-production businesses.

Businesses were reclassified by clerically re-coding the descriptions provided on register proving questionnaires from the previous few years. A specific register proving exercise was used for other businesses that made a significant contribution to the economy, and correlation tables were used to re-code the remaining (mostly small) businesses. This process took around a year, for most of which all three classifications were maintained on the business register.

2) Lessons learned

- Holding multiple classification systems

Considerable clerical effort was required to maintain three classification systems during the changeover period. There were specific problems with business activities that changed sector between the different classifications, e.g. photographic processing was included within production in the SIC(80), but within services in both the BTC and the SIC(92). Business surveys did not all change to the SIC(92) at the same time, so special care was needed to stop businesses being selected for surveys aimed at different sectors of the economy. The main lesson learned was therefore to try to avoid having more than one live classification system on the business register, or at least to try to minimise such a period. 

- Consistency

The clerical approach to re-coding inevitably led to inconsistencies and biases depending on the aptitude and background of individual coders. This underlined the importance of the move towards an automatic coding tool.

- Burden on businesses

The additional register proving placed an extra burden on businesses. This was politically unpopular, but was partly justified because it also gathered information needed for the introduction of a new version of the business register.

- Timing of changes

The change to the SIC(92) happened around the same time as the introduction of a new business register. This had the advantage of producing just one major discontinuity for surveys, but the disadvantage that it was extremely difficult to separate the impact of the new classification from the impact of the new register. On balance, it seems preferable to stagger the timing of major changes so that their individual impacts can be fully and accurately analysed.

Implementation of the SIC(2003)

1) The approach

The project to implement the change to the SIC(2003) took place during 2001 and 2002, and was partly funded by a Eurostat grant. It started by identifying the local units affected by the change. Those for which a text-based business description had already been supplied through a recent response to the annual register proving survey were then re-coded by a team of clerical coders.

When the electronic version of the agreed descriptions and associated metadata for the new codes became available, it was incorporated in a modified version of the Precision Data Coder (PDC), our automatic coding tool. The automatic and clerical coding rates and quality were then compared.

Using those results where the coding exercises gave combinations of old and new codes that were confirmed by the ONS Industrial Classifications Unit as being valid, correlation tables were then produced showing the percentage likelihood of each valid combination. Both backwards and forwards correlation tables were produced. The local unit correlation tables were used to update local units held on the business register for which no business description was present.
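The construction of forward and backward correlation tables described above can be illustrated with a minimal Python sketch. It assumes the validated coding results are available as (old code, new code) pairs; the function name and the codes OLD_A, NEW_X and so on are hypothetical placeholders, not real SIC codes or ONS software.

```python
from collections import Counter, defaultdict


def build_correlation_table(pairs):
    """Map each source code to {target code: percentage of observed pairs}."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {t: 100.0 * n / total for t, n in targets.items()}
    return table


# Hypothetical validated (old code, new code) pairs for five local units.
validated_pairs = [
    ("OLD_A", "NEW_X"),
    ("OLD_A", "NEW_X"),
    ("OLD_A", "NEW_X"),
    ("OLD_A", "NEW_Y"),
    ("OLD_B", "NEW_Y"),
]

# Forward table: likelihood of each new code given the old code.
forward = build_correlation_table(validated_pairs)
# Backward table: likelihood of each old code given the new code.
backward = build_correlation_table((new, old) for old, new in validated_pairs)
```

In this illustration the forward table gives OLD_A a 75 per cent likelihood of becoming NEW_X and 25 per cent of becoming NEW_Y, while the backward table splits NEW_Y equally between OLD_A and OLD_B.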

The results of re-coding the local units were then applied to the reporting unit level (reporting unit equals enterprise in almost all cases). This allowed the production of separate reporting unit correlation tables, which informed the conversion of data from statistical surveys[3].

2) Lessons learned

- Planning time

A major problem was the limited time between the finalisation of the SIC(2003) and the date by which it had to be implemented on the business register. This was a particular problem for data suppliers, as administrative systems require long lead times to plan for and introduce changes. Planning for implementation needs to start sooner and the full classification needs to be finalised earlier. Twelve to eighteen months are needed to ensure a successful implementation.

- Administrative sources

The ideal situation is that the administrative sources that feed the business register should convert to the new classification at exactly the same time as the business register. In practice this is very unlikely. One of the main problems encountered when dealing with suppliers during the change to the SIC(2003) was that some initially felt that this was only a statistical issue, and were reluctant to allocate resources to deal with the change. Even when the suppliers agree to the change, their timetable is likely to be driven by their own requirements and will have to fit in with other planned system changes.

As a result of the need to fit in with other system changes, the VAT system converted to the new classification several months before the business register. New codes from this source had to be stored for future use, and converted back to the SIC(92) using correlation tables until the business register changed. This also meant that late changes to the SIC(2003) could not be introduced immediately in the VAT system, resulting in their use of a classification system that was slightly different to the final SIC(2003) and necessitating the use of further correlation tables until this could be resolved.

The opposite happened with data from the UK company registration system, which introduced the SIC(2003) several months after the business register, so correlation tables were also required during this period. The main lesson learned here is that administrative sources often have even longer lead times, so need very early warning of future changes, particularly the sort of major changes envisaged for 2007.

- Business descriptions

Text descriptions of business activity are the key to changing the classification of units. They vary in quality, but form an important part of classification metadata. They allow the semi-automatic coding of a number of units, they provide a test of relevance for proposed classification changes, and they help with the modification and improvement of coding tools. It is therefore important to capture business descriptions electronically and store them for future use.

- Questionnaire design

Many of the coding issues found in this project could be traced back to the quality of data supplied by contributors. The issue of questionnaire design is key to ensuring that we get the information we need to maximise both the quality and the efficiency of our coding procedures.

- Coding tools

An automatic coding tool needs comprehensive metadata to build index entries. Some time is obviously needed to incorporate the metadata for a new classification into a coding tool, and to test that the results meet the required quality standards. This further supports the need to allow sufficient time between finalisation of the classification and its implementation.

An interesting lesson learned from the change to SIC(2003) was that automatic coding (using the Precision Data Coder[4]) gave better results than clerical coding both in terms of accuracy and consistency. This reflects the results of investment over a number of years in refining and enhancing the coding tool used.

- Many-to-one correlations

There was a specific issue relating to the changes to codes in the iron and steel area (division 27). Two SIC(92) codes were merged to form one SIC(2003) code. A correlation table was created to back-impute SIC(92) codes, supplemented in this case by a rule based on the size of the unit. Similar cases are expected in 2007, so both forward and backward correlation tables will be needed.
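One way such a back-imputation might be organised is sketched below in Python. The form of the size rule (an employment threshold), the threshold value, the function name and all codes are assumptions made purely for illustration; the actual rule used for division 27 is not specified here.

```python
def back_impute(new_code, employment, size_rules, backward_table):
    """Return an old code for a unit that only has a new code.

    size_rules maps a merged new code to a hypothetical rule of the form
    (employment threshold, old code if at or above, old code if below).
    Where no size rule exists, the most likely old code is taken from
    the backward correlation table.
    """
    rule = size_rules.get(new_code)
    if rule is not None:
        threshold, code_if_large, code_if_small = rule
        return code_if_large if employment >= threshold else code_if_small
    targets = backward_table[new_code]
    return max(targets, key=targets.get)


# Hypothetical inputs: NEW_M was formed by merging OLD_A and OLD_B.
size_rules = {"NEW_M": (250, "OLD_A", "OLD_B")}
backward_table = {"NEW_X": {"OLD_C": 80.0, "OLD_D": 20.0}}

large_unit = back_impute("NEW_M", 300, size_rules, backward_table)
small_unit = back_impute("NEW_M", 10, size_rules, backward_table)
other_unit = back_impute("NEW_X", 10, size_rules, backward_table)
```

Taking the most likely code as the fallback is a simplification chosen here; a draw in proportion to the table percentages would be an alternative.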

- Relevance of the classification

The SIC(2003) code descriptions were devised mainly by government statisticians. Unfortunately some of the terminology and distinctions used in the classification have little relevance to the way businesses describe their activities. One example is that the SIC(2003) contains a split at the five-digit level between supermarkets with a licence to sell alcohol and those without. The activity descriptions supplied by businesses do not support this split (often the description is simply "supermarket"). It would be too costly to contact all supermarkets in the UK to determine the correct code, so as a default it is assumed that they all have licences to sell alcohol. This effectively means that the classification has not been implemented in this case. To avoid this situation in future, we plan to test any proposed classification headings against the stock of business activity descriptions held on the register to determine whether the headings seem relevant to businesses.
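A very simple version of such a relevance test could measure how many stored descriptions mention the terms that distinguish a proposed split. The Python sketch below is illustrative only: the function name, the matching by substring, the distinguishing terms and the sample descriptions are all assumptions, not the method planned by the ONS.

```python
def share_supporting_split(descriptions, distinguishing_terms):
    """Fraction of descriptions mentioning at least one distinguishing term."""
    if not descriptions:
        return 0.0
    terms = [term.lower() for term in distinguishing_terms]
    hits = sum(
        1
        for text in descriptions
        if any(term in text.lower() for term in terms)
    )
    return hits / len(descriptions)


# Hypothetical stock of business activity descriptions.
descriptions = [
    "supermarket",
    "Supermarket",
    "supermarket",
    "licensed supermarket",
]
# Hypothetical terms that would separate the two proposed headings.
terms = ["licen", "alcohol"]

support = share_supporting_split(descriptions, terms)
```

A low share, as in this made-up sample where only one description in four mentions a licence, would suggest that the proposed split cannot be coded from the descriptions that businesses actually supply.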

- Impact on register quality

When classification systems change it is difficult to reclassify businesses without changing the quality of the business register frame. We found that some units moved out of the expected classification range, i.e. the combination of old and new codes was not considered valid. This raised the question of how to treat such units: either use the new code regardless of the old code, thus changing the quality and bias of the frame, or use a correlation table to convert based on the old code, ignoring any potential improvement in quality and any change to the bias.

These questions were resolved by going back to the aims of the project, which were simply to implement the change from the SIC(92) to the SIC(2003). As a result we only used the classification provided by the PDC or manual coders if it remained within the expected range, and used a correlation table to classify units that would otherwise move out of the expected range. Thus the quality of the register was not affected by this project.
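The decision rule just described can be expressed as a short Python sketch. It assumes the valid combinations are held as a set of (old code, new code) pairs and that the correlation table maps each old code to percentages for the new codes; the function name and all codes are hypothetical, and taking the most likely code as the fallback is a simplification.

```python
def resolve_new_code(old_code, coded_new, valid_pairs, forward_table):
    """Keep the coder's new code only if it stays within the expected range.

    coded_new is the code proposed by the automatic or clerical coder,
    or None if the unit could not be coded. Otherwise the new code is
    derived from the old code via the forward correlation table.
    """
    if coded_new is not None and (old_code, coded_new) in valid_pairs:
        return coded_new
    targets = forward_table[old_code]
    return max(targets, key=targets.get)


# Hypothetical valid combinations and forward correlation table.
valid_pairs = {("OLD_A", "NEW_X"), ("OLD_A", "NEW_Y"), ("OLD_B", "NEW_Y")}
forward_table = {
    "OLD_A": {"NEW_X": 75.0, "NEW_Y": 25.0},
    "OLD_B": {"NEW_Y": 100.0},
}

within_range = resolve_new_code("OLD_A", "NEW_Y", valid_pairs, forward_table)
out_of_range = resolve_new_code("OLD_B", "NEW_X", valid_pairs, forward_table)
uncoded = resolve_new_code("OLD_A", None, valid_pairs, forward_table)
```

Here the first unit keeps the coder's NEW_Y, while the second and third fall back to the correlation table because the proposed combination is invalid or missing.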

- Dealings with customers

Register customers need to be informed of changes as early as possible, to give them time to prepare. In some cases they need to adapt systems and programs, particularly if they receive electronic data flows. With sufficient notice, issues such as the impact on time-series data can then be addressed prior to the change, reducing problems when the change actually takes place.

- Training and documentation

Strong liaison is necessary between those developing the new classification and the users. Register users are likely to need some training in the use and impact of the new classification, particularly any staff who code business descriptions clerically. Coding performance should be closely monitored to assess the quality of the training.

- Costs of change

The change from the SIC(92) to the SIC(2003) resulted in a cost in staff resources (training, documentation) and system enhancements (change of systems, redesign of current systems, re-coding of units on system, change in methodology etc.) in excess of 250,000 euros. The cost of the more fundamental change in 2007 is therefore likely to be considerably higher unless the clerical inputs can be reduced.

Conclusions and plans for 2007

As a result of the lessons learned from the implementation of previous classifications outlined above, the ONS is currently formulating its strategy for 2007. For the business register, this strategy is likely to be based heavily on the automatic re-coding of existing business activity descriptions. The results of this will supply the data to construct probabilistic correlation tables, to be used to convert codes for units without business activity descriptions.
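Converting codes on a probability basis could look something like the following Python sketch, in which each unit without a description is given a new code drawn in proportion to the correlation table percentages for its old code. The function name, the table contents and the use of a seeded random generator are assumptions for illustration only.

```python
import random


def draw_new_code(old_code, forward_table, rng):
    """Draw a new code with probability proportional to its percentage."""
    targets = forward_table[old_code]
    codes = list(targets)
    weights = [targets[code] for code in codes]
    return rng.choices(codes, weights=weights, k=1)[0]


# Hypothetical forward correlation table (percentages per old code).
forward_table = {
    "OLD_A": {"NEW_X": 75.0, "NEW_Y": 25.0},
    "OLD_B": {"NEW_Y": 100.0},
}

# A seeded generator keeps the conversion reproducible between runs.
rng = random.Random(2007)

# Re-code 10,000 hypothetical units that currently hold OLD_A.
units = ["OLD_A"] * 10000
recoded = [draw_new_code(code, forward_table, rng) for code in units]
share_x = recoded.count("NEW_X") / len(recoded)
```

Across a large number of units the share receiving each new code should be close to the table percentage, so the aggregate distribution is preserved even though an individual unit may be misclassified.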

We do not currently plan to contact businesses specifically for this update, though our regular register proving survey will continue as normal throughout the implementation period. We do not intend to use any clerical coding resource other than for checking purposes. The proposed solution therefore looks to be very cost-effective, but will need to be fully evaluated afterwards to assess whether it has delivered the required level of quality.

Notes

[1] NACE is derived from the French "Nomenclature statistique des Activités économiques dans la Communauté Européenne" (Statistical classification of economic activities in the European Community)

[2] Council Regulation (EEC) No 3037/90 of 9 October 1990 on the statistical classification of economic activities in the European Community

[3] A single correlation table could not be used for all types of units as the percentage correlations between old and new classifications showed some significant differences at the different levels, particularly for head office activities.

[4] A commercial product developed by Inference Group Pty Ltd, Australia – see www.inferencegroup.com.au
