UCR. Progress report - 1999
1. PRINCIPLES AND METHODS OF DATA REGISTRATION
2. ICD and ICD-O CODING
3. DATA QUALITY CONTROL IN UKRAINIAN CANCER REGISTRY
4. DIAGNOSIS QUALITY CONTROL
5. APPLICATION OF DATA LINKAGE IN UKRAINIAN CANCER REGISTRY
1. PRINCIPLES AND METHODS OF DATA REGISTRATION
The main preconditions for developing information technology for the Ukrainian population-based cancer registry were the existing state system of cancer registration and the oncological service of Ukraine, which comprises a number of oblast, city and regional oncological clinics and dispensaries. The state system of cancer registration in Ukraine (based on paper files) has functioned since 1932. At present, cancer patients are registered by 46 oncological dispensaries, including 25 oblast dispensaries.
The basic principle of cancer registration is that all oncological information concerning a case of the disease is aggregated in the regional oncological establishment at the patient’s place of residence.
All medical documents about a cancer patient have to be sent to the oncological dispensary at the patient’s place of residence, irrespective of where the cancer was diagnosed or treated.
The basic sources of information are the following medical registration documents:
- the “Notification of the patient with cancer diagnosed for the first time in life”, which has to be filled in for each newly detected case in any medical establishment;
- the “Abstract from medical in-patient card for patient with malignant diagnosis”, which is to be filled in by any medical establishment when the cancer patient is discharged;
- data of the state statistics departments on deceased patients;
- the “Control card of dispensary follow-up for patient with malignant neoplasm”, which was filled in only by the oncological establishment, for patients registered and resident in its service region.
Thus the basic registration document was the “Notification...”, on the basis of which the annual cancer incidence reports in Ukraine were compiled.
On receiving a “Notification...” or an “Abstract...”, the oncological dispensary had to fill in the “Control card” for the patient, except for patients diagnosed after death and for so-called departmental patients.
The transition to computerised processing of oncological patients’ data revealed a number of shortcomings in the existing paper technology and indicated ways to improve it:
- Absence of a single registration document for a cancer patient. Using the “Notification...” as the registration document results in over-registration, because every medical establishment that was in contact with the patient filled one in, and did so for each tumour diagnosed. There were cases of up to 5 copies of the “Notification...” for one patient (in the Odessa area). “Control cards” were filled in only for patients under the supervision of the oncological dispensary, and not for those diagnosed after death or for departmental patients.
- The existing medical registration documents did not provide for registration of the detailed clinical diagnosis, which was frequently reduced to an ICD code (without specification of the topography and morphology of the tumour). The list of morphological types subject to registration, as authorised by the Ministry of Health of Ukraine, was restricted to 25 rather general terms (!).
- Registration documents were, as a rule, filled in by physicians who were not familiar with the basic ideas and principles of ICD coding. Manual coding of diseases frequently produces mistakes (even experienced coders make errors in 35% to 48% of cases, according to a study done in the USA).
- The official cancer incidence information of the Ministry of Health of Ukraine is based only on the data of the first year of the account. For a territory as extensive as Ukraine, however, it is not possible to receive information on all new patients within one year. Consequently the official incidence data were incomplete and were never subsequently corrected (because doing so manually every year was impossible).
- The official cancer mortality data for the population of Ukraine are based on data of the State Statistic Bureau, which also uses only the first year of the account. Moreover, the basic principle of cancer registration is the accumulation of all information about the patient in the dispensary at the place of residence, whereas the State Statistic Bureau registers deaths at the place of burial. Thus, if a patient from, for instance, Luganskaya oblast dies on the territory of Donetskaya oblast, the State Statistic Bureau counts the case in Donetskaya oblast (rather than in Luganskaya oblast, as would be appropriate for calculating cancer mortality). Each year the official cancer mortality data and the mortality data of the cancer registry differ by about 5-6 thousand patients, owing to patients who died and were counted in other oblasts. (Note that this is only the data of the first year of the account.) In addition, large divergences in cause-of-death codes are observed.
- There were no precise recommendations on the registration of multiple neoplasms, which resulted in over-registration of cases.
- Low efficiency of manual search for duplicates.
Naturally, the shortcomings identified became a matter for discussion in the Ministry of Health (HM) and were taken into account in developing an improved information technology for cancer registration. In developing the new computerised information technology (IT) we also proceeded from specific features of the present situation in Ukraine, namely understaffing of the cancer registries, frequent changes of personnel, the impossibility of regular training, etc. The new IT was therefore designed to perform control functions that supervise the observance of the existing international standards in cancer registration.
It is evident that for a territory as extensive as Ukraine, with a large number of annually registered cases (about 160 thousand) and of patients on the register (about 750 thousand), a centralised cancer registry is not reasonable. A distributed principle of cancer registration, data updating and data quality control was therefore chosen. Information on cancer cases is collected in oblast-level cancer registries in the oblast oncological dispensaries. After quality control and updating, the data are transferred to the central registry in the Ukrainian Research Institute of Oncology and Radiology. The large oblast centres, with an oblast population of 3-5 million (Dnepropetrovskaya, Donetskaya and Lvovskaya), have their own regional distributed structure. The universal software can be adjusted to the regional oncological structure, and the uniformity of the codificators makes it possible to integrate the initial cancer data at state level. At present the new IT has been introduced in 20 oblast centres and is being introduced in 3 more. The latest state of the integrated database is shown on a slide.
How did we address the shortcomings of the established state system of registration of oncological information?
- A formalised “Registration card for patient with malignant neoplasm” (RC) was developed and authorised for use by the HM in 1998.
- The card has to be filled in for all cancer patients resident on the oblast’s territory (those under follow-up in the oncological dispensary, departmental patients and those diagnosed after death alike). The “Notification...”, “Abstract...” and other registration documents serve an informative function and supplement the RC ( SL1 ).
- Description of the topography and morphological type of the tumour in the RC is obligatory.
- Computerised coding of the diagnosis in ICD and ICD-O codes has been introduced.
2. ICD and ICD-O CODING
The diagnosis is the major statistical unit for calculation of incidence, mortality and other related rates. Hence, presenting and coding the diagnosis in a cancer registry requires special accuracy. Software support for this procedure is useful, and it is necessary if coders are not sufficiently qualified. It is known that even with the most careful coding of diagnoses the error rate is quite substantial (in some cases as much as 38%, according to researchers from the USA).
The main reasons for introducing automated ICD coding in our country are:
- subjectivity and frequent errors detected in coding; manual coding was often left to physicians;
- the distributed structure of the cancer registration system in Ukraine, so that the cancer registry staff is numerous and scattered among the oblasts;
- frequent changes in the technical personnel of the oblast cancer registries, and a significant number of new cases;
- shortage of paper codificators (ICD and ICD-O books); even those available were adapted for use by general health service employees;
- the impossibility of organising adequately regular training for the personnel of the oblast cancer registries.
In accordance with the purposes of cancer registration, cancer registries must comply with certain standards of data presentation.
The programmes initiated by the WHO to ensure comparability of incidence data led to the development of the ICD and of its special adaptation for oncology (ICD-O). The ICD-O allows more detailed coding of oncological diagnoses, recording site, morphology and behaviour for each tumour.
An oncological diagnosis is quite easy to structure and abstract, which substantially facilitates work with it in a computerised system. It is always possible to identify, directly or indirectly, its topography (site of the primary) and morphology (histological type) components.
This makes it possible to present each oncological diagnosis as a structured combination of specific values, one for each of these characteristics, and to give each characteristic its own place in the diagnosis file of the cancer registry database. The specific value of each component is selected from a defined set of values admissible for that characteristic.
It often happens, however, that clinical diagnoses in medical records use quite special wordings which are not present in the ICD-O alphabetical list and whose equivalents in the ICD-O can be found only by a specialist.
Therefore, to enable coding of clinical diagnoses, special codificators for the UCR system were developed jointly with histopathologists and clinicians of the URIOR. The topography section of the codificator is a systematised, ICD-oriented list of organs and their parts, anatomical sites, tissues, etc., containing over 1600 units (including synonyms). This section may be used both for adequately detailed coding of the primary site in ICD-O and for solving other problems of the cancer registry. The topography section has a hierarchical (tree-type) structure: the leaves hold the most detailed sites, which are subordinated to more general concepts in the nodes of the tree.
For morphology coding, a quite extensive list of pathological conditions was also developed, containing various clinical and pathological wordings for oncological diagnoses and tumour-like diseases (over 2000 terminology units).
To computerise the coding of diagnoses by the ICD-O, the next step was to reconcile the UCR codificator lists with the ICD-O nomenclature lists for topography and morphology, taking the synonymy of terms into account. Consider the morphology section of the codificator as an example. Linking the two lists produced a hierarchical tree-type structure with the ICD-O morphology terms and codes in its nodes and all synonymous wordings of the corresponding morphology from the UCR codificator in its leaves (values subordinated to the nodes). The codificator list of sites was treated in a similar way. A program function that passes from a leaf value to the value of the corresponding node makes it possible to computerise the coding of an oncological diagnosis by the ICD-O.
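The leaf-to-node transition described above can be pictured with a minimal Python sketch. It is not the UCR implementation: the dictionary layout, the function names and the handful of entries are assumptions made for illustration, and the entries are not taken from the UCR codificator.

```python
# Hypothetical, simplified morphology section: each ICD-O node (code)
# owns the synonymous wordings (leaves) that should be coded to it.
MORPHOLOGY_NODES = {
    "8000/3": ["Neoplasm, malignant", "Malignant neoplasm", "Cancer"],
    "8070/3": ["Squamous cell carcinoma", "Epidermoid carcinoma"],
    "8140/3": ["Adenocarcinoma"],
}


def normalise(term):
    """Lower-case the wording and collapse repeated whitespace."""
    return " ".join(term.casefold().split())


def build_leaf_index(nodes):
    """Invert the tree: map every leaf wording to the code of its node."""
    index = {}
    for code, leaves in nodes.items():
        for leaf in leaves:
            index[normalise(leaf)] = code
    return index


LEAF_INDEX = build_leaf_index(MORPHOLOGY_NODES)


def code_morphology(wording):
    """Return the ICD-O code for a wording, or None if it is not listed."""
    return LEAF_INDEX.get(normalise(wording))
```

With such an index the operator only ever selects a wording; the code is derived from the node and is never typed by hand.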
The codificators are provided with a special service which ensures correct coding of diagnoses. The service functions are based on the rules for coding diagnoses stated in the ICD-O. We took practically all the rules into account; some of them are described in more detail below. (SL_2)
Some tumours have more than one histological type. The most frequent combinations, singled out in the ICD-O as separate morphology units, are also present in the UCR codificator with individual codes. Hence, when entering the diagnosis it is necessary to check the various possible combinations of prefixes or compound terms in order to find a suitable version. For this purpose the UCR codificator offers a contextual search for terms containing a letter combination specified by the operator at the time of input. The contextual search returns the list of all terminology units in the codificator section being used (for example, the morphology section) that contain the given text sub-string (Rule 10).
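A contextual search of this kind amounts to a case-insensitive sub-string filter over one codificator section. The sketch below assumes the section is a plain list of terms; the term list and the function name are illustrative only.

```python
# Illustrative morphology terms; not taken from the UCR codificator.
MORPHOLOGY_TERMS = [
    "Adenocarcinoma",
    "Adenosquamous carcinoma",
    "Squamous cell carcinoma",
    "Papillary adenocarcinoma",
    "Osteosarcoma",
]


def contextual_search(fragment, terms):
    """Return every term that contains the typed fragment, ignoring case."""
    needle = fragment.strip().casefold()
    if not needle:
        # An empty fragment would match everything; return nothing instead.
        return []
    return [term for term in terms if needle in term.casefold()]
```

Because the match is on any part of the word, typing "squam" also finds compound terms such as "Adenosquamous carcinoma", which is what the operator needs when checking combinations of prefixes.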
If a compound morphological type is not found in the UCR morphology codificator, multiple primaries are probable. Our cancer registry program can analyse a complex morphological type by its components, prompt the operator on how to register the given diagnosis, and determine the morphology code most suitable for the given case (Rule 11).
We believe that a most important advantage of the UCR cancer registration technology is the automatic check of whether several diagnoses in one patient are actually multiple primaries (based on the recommendations made by the IARC and IACR in 1994, which are used instead of Rule 14).
The ICD and ICD-O codes are functionally dependent: given the ICD-O codes for the topography, morphology and behaviour of a neoplasm, we can always convert them into the ICD code. The program for coding the diagnosis by ICD in the UCR system is based on the ICD-O codes, which in their turn are logically linked to the codes of the UCR codificators.
The transition to exclusively computerised coding in ICD and ICD-O is now under way. This coding technology preserves the continuity of the data in the transition to the new version, ICD-10. The transition to the new coding system will require only the preparation of a new special file and function associating ICD-10 codes with ICD-O codes, and its integration into the UCR data input program.
3. DATA QUALITY CONTROL IN UKRAINIAN CANCER REGISTRY
The automatic control of data quality in the UCR computerised system includes the following items ( SL-3 ):
- (1) control of the validity of data codes: codes may be chosen only via the UCR codificators;
- (2) check for the presence of obligatory data;
- (3) control of the validity and consistency of dates;
- (4) control of logical data consistency.
Besides, the data are analysed with the use of the following methods:
- control for duplicates;
- qualitative analysis of different indicators to assess the accuracy and validity of the cancer registry data: the proportion of histologically verified diagnoses for different sites (HV) and the proportion of cases registered by death certificate only (DCO).
Functions (1) to (4) are applied both at the time of data input and during the run of the overall control procedure for the registry records. When the check procedure has finished, it provides an interface for visual review of the errors and suspected errors found.
Practically all data entered into the UCR database are controlled. Naturally, the sets of control functions differ for hospital and population-based registries, though they often coincide.
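As an illustration of the checks for obligatory data and for date consistency, here is a minimal Python sketch. The field names, the list of obligatory fields and the specific date rules are assumptions chosen for the example, not the UCR rule set.

```python
from datetime import date

# Hypothetical list of obligatory fields for a registration card.
REQUIRED_FIELDS = ("surname", "birth_date", "diagnosis_date", "topography_code")


def check_record(record, today=None):
    """Return a list of human-readable problems found in one record."""
    today = today or date.today()
    problems = []

    # Presence of obligatory data.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing obligatory field: {field}")

    birth = record.get("birth_date")
    diagnosis = record.get("diagnosis_date")
    death = record.get("death_date")

    # Validity of single dates.
    for name, value in (("birth", birth), ("diagnosis", diagnosis), ("death", death)):
        if value and value > today:
            problems.append(f"{name} date is in the future")

    # Consistency between dates.
    if birth and diagnosis and diagnosis < birth:
        problems.append("diagnosis date precedes birth date")
    if birth and death and death < birth:
        problems.append("death date precedes birth date")

    return problems
```

Returning a list rather than raising on the first error matches the idea of a review interface: the operator sees every error and suspected error for a record at once.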
4. DIAGNOSIS QUALITY CONTROL
It is reasonable to describe the checks of type (4) that concern the diagnosis data. They are supported by the following items:
- the codificator for morphological types of tumours includes many clinical diagnoses and provides a significant service for correct coding:
  - allows contextual search for terms by part of a word (a character sub-string);
  - allows synonymous terms to be found when their basic parts are permuted (according to Rule 10);
  - helps to find the correct code for a complex morphological term (Rule 11, IARC/IACR recommendations);
  - contains dozens of non-neoplastic conditions which are frequently misinterpreted and coded as neoplasms (our program also codes them automatically in ICD);
- the codificator for the topography of the tumour:
  - allows contextual search for terms by part of a word;
  - has a tree-like structure, which allows passing from more general to more detailed anatomical sites;
- automated coding in ICD-O;
- automated coding in ICD;
- automated coding in ICD of a large number of non-neoplastic diseases, to ensure that they are not misinterpreted as neoplasms;
- check of correspondence between the components of the diagnosis:
  - correspondence between behaviour and morphology;
  - consistency of neoplasm topography and the patient’s sex;
  - consistency of neoplasm topography and the patient’s age;
  - consistency of neoplasm morphology and the patient’s sex;
  - consistency of neoplasm morphology and the patient’s age;
  - consistency of neoplasm morphology and topography;
- correctness of registering multiple primaries;
- analysis of unspecified and ill-defined primary tumours;
- presence of TNM for a histologically verified diagnosis;
- validity of TNM indices for the specific morphology and site;
- correspondence of TNM and stage.
The check functions of the UCR are continually being improved.
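One of the listed checks, consistency of topography and sex, can be sketched in a few lines of Python. The table below is a small illustrative subset of sex-specific ICD-O topography codes, and the function name and the "M"/"F" encoding are assumptions; the actual UCR tables are not reproduced here.

```python
# Illustrative subset of sex-specific sites (three-character ICD-O topography).
SEX_SPECIFIC_SITES = {
    "C53": "F",  # cervix uteri
    "C56": "F",  # ovary
    "C61": "M",  # prostate gland
    "C62": "M",  # testis
}


def site_sex_conflict(topography_code, sex):
    """Return True when the site can occur only in the opposite sex."""
    expected = SEX_SPECIFIC_SITES.get(topography_code[:3])
    return expected is not None and expected != sex
```

Sites that are not in the table are never flagged, so the check can only raise a suspicion for the operator to review; it cannot confirm that a record is correct.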
4) The analysis of the completeness of the account of cancer cases has shown that on average 93% of cases are registered in the first year, 5.7% in the second, and 0.7% and 0.4% in the third and fourth respectively. The most incomplete account is observed in the children’s age group. With the help of the cancer registry data, more precise incidence data for individual areas of Ukraine were obtained than the rates of the official statistics. As shown in the table and the diagram, the rates have changed considerably.
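The completeness figures can be accumulated to see how much a first-year-only account leaves out. The short calculation below assumes that the four percentages are shares of the same total of cases; that reading is an assumption, since the text does not state the denominator.

```python
from itertools import accumulate

# Share of cases registered in years 1-4 of the account, in percent (from the text).
yearly_share = [93.0, 5.7, 0.7, 0.4]

# Running total after each year, rounded to one decimal place.
cumulative = [round(total, 1) for total in accumulate(yearly_share)]

# Percentage points registered only after the first year.
missed_by_first_year = round(cumulative[-1] - yearly_share[0], 1)
```

Under that assumption the running totals are 93.0, 98.7, 99.4 and 99.8 percent, so figures based on the first year alone omit about 6.8 percentage points of the cases registered within four years.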
5) The comparative analysis of mortality rates from the cancer registry data and from the data of the State Statistics Bureau has shown significant divergences for the sites most probable for metastases, namely liver, lung and bones, and also for stomach and leukaemia, in both men and women. Moreover, like all official statistics in our country, these are data of the first year of the account, and subsequent information is not taken into account in calculating the mortality rates. At present, therefore, the most exact rates of mortality from malignant neoplasms can be obtained from the cancer registry data.
6) The absence of precise recommendations on the registration of multiple primary tumours resulted in a significant number of such cases in some oblasts and their complete absence (but dozens of duplicates) in others. An automated control function for multiple neoplasms (based on the rules of the IARC/IACR) was therefore developed to support the unified registration of all such cases.
7) From its first versions, the software of the Ukrainian cancer registry has had means of flagging a probable duplicate at the moment of data input.
5. APPLICATION OF DATA LINKAGE IN UKRAINIAN CANCER REGISTRY
While developing the technology of the Ukrainian cancer registry, we analysed the application of automated linkage procedures in the international practice of cancer registration. Linkage procedures are used to solve the following tasks:
- automated linking of records about the same patient in different registries. This task prevails in the majority of western registries and is solved by methods of probabilistic linkage;
- automated search for and elimination of duplicated records within one registry;
- automated transfer of data from one registry to another (data from a hospital cancer registry are transferred into a population-based cancer registry).
The methods of probabilistic linkage are applicable, for example, when there are two rather large (more than ten thousand records) independent sources of computerised personal information and records about the same patients have to be identified in both. Unfortunately, computer databases are not yet widespread enough in the public health services of Ukraine for automated linkage to be used for such cancer registry tasks as, for example, finding data about patients’ deaths.
The task of searching for duplicated records is more urgent for us, for the following reasons: (S_LINK_2)
- Firstly, there is no uniform registration number for a patient in Ukraine similar to the medical insurance number used in some countries. The passport number is usually not entered in medical documents, and moreover it can change. Patients are therefore usually identified by surname, year of birth, place of residence...
- Secondly, the cancer registry of Ukraine is a network of regional registries and, consequently, some duplicates result from population migration.
- Thirdly, during the many years of keeping paper card files, detecting duplicated records was practically impossible, which led to a tendency towards over-registration and an inflated number of patients on the register.
Applying the probabilistic linkage procedures usual in world practice is complicated by the fact that the many existing studies of probabilistic linkage concern the English language: NYSIIS coding, batching methods, etc. have been developed for it. Similar studies for Russian have either not been carried out or their results are inaccessible. Simply carrying English-language algorithms over to Russian makes the quality of linkage worse. We are attempting to adapt these algorithms, but the result so far is far from what is desired.
Besides, probabilistic linkage accepts that some of the links established will be false (links are usually accepted at a probability of 99.5%, so about 0.5% of them may be false). This is quite acceptable for linking two registries for some scientific research. However, fully automated use of these algorithms for duplicate search or automated data transfer would mean that, at such a rate of false links, 1 patient in 200 would have another patient’s data added to his or her record.
Therefore, for application in the Ukrainian cancer registry, an original linkage algorithm has been developed. Its essence is an automated search for suspected duplicates followed by interactive review of the pairs found (as shown in the picture, by phases) (S_LINK_3):
Phase 1. The procedure of searching for pairs of cards suspected of being duplicates is similar to that used in probabilistic linkage. The search is made on the patient’s surname, name, patronymic and date of birth; both complete and partial coincidence of these parameters is taken into account. At this stage the procedure gives a higher probability of detecting duplicates than the probabilistic linkage procedure that we adapted for Russian. In future, however, it may be replaced by a probabilistic procedure, or the two may be used concurrently.
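A search for suspected duplicate pairs on these four parameters might look like the Python sketch below. It is only an illustration: the report does not describe the actual matching rules, so the scoring (1.0 for a complete match, 0.5 for a partial one), the shared four-character prefix used as the "partial coincidence" rule, the threshold and all names are assumptions. Dates are assumed to be ISO strings, so that a shared four-character prefix means the same year of birth.

```python
from itertools import combinations

KEY_FIELDS = ("surname", "name", "patronymic", "birth_date")


def field_agreement(a, b):
    """Score one field: 1.0 identical, 0.5 partial match, 0.0 otherwise."""
    if not a or not b:
        return 0.0
    a, b = a.casefold(), b.casefold()
    if a == b:
        return 1.0
    # Assumed partial-coincidence rule: the first four characters agree.
    if len(a) >= 4 and len(b) >= 4 and a[:4] == b[:4]:
        return 0.5
    return 0.0


def suspected_duplicates(cards, threshold=3.0):
    """Return (id, id, score) for every pair scoring at or above the threshold."""
    pairs = []
    for left, right in combinations(cards, 2):
        score = sum(field_agreement(left.get(f), right.get(f)) for f in KEY_FIELDS)
        if score >= threshold:
            pairs.append((left["id"], right["id"], score))
    return pairs
```

Comparing every pair is quadratic in the number of cards, which is acceptable for a sketch but not for a registry of realistic size; the output is a list of suspicions for an operator to review, not a list of confirmed duplicates.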
Phase 2. The final decision on whether a pair found is a true link is taken by a person. Note that only the search for suspected duplicate pairs is automated; the responsible person at the registry draws the final conclusion about the identity or difference of the pair found. Sometimes this conclusion cannot be drawn from the computer files alone, and it is necessary to examine the primary paper forms or to consult the staff of the oncological department at the patient’s place of residence. In return, the conclusion reached may be considered practically reliable.
Phase 3. After a pair of cards is recognised as a duplicate, the information contained in both cards is joined automatically. Practically always, each of the duplicate cards contains some part of the pertinent data to be pieced together. For example, if a duplicate has appeared because of wrong registration of a multiple cancer, then each of the diagnoses is most likely on a separate card, and the resulting card should contain both. A special algorithm has been developed to analyse and transfer the information from each relevant field of the cards so that the merged card is as reliable as possible.
The card recognised by the operator as more reliable should be chosen as the source. For each record of diagnosis, treatment, etc. in the second card, a search for the corresponding record in the source card is carried out. If no such record exists, the record is transferred whole. If a similar record (in date and other content) is present in the source card, then only those fields which are empty or less detailed in the source are transferred.
For example, if some diagnosis record specifies the morphological type "Malignant neoplasm", and the duplicate card, with a similar tumour site and date of diagnosis, specifies the morphological type "Alveolar adenocarcinoma", then the result will be "Alveolar adenocarcinoma" irrespective of which card was recognised as the source. (S_LINK_4)
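The joining rule and the example above can be sketched in Python as follows. This is a simplified illustration rather than the UCR algorithm: records are matched on exactly equal site and date, all field values are strings, and "less detailed" is approximated by a three-level ranking (empty, unspecific term, anything else). The names and the list of unspecific terms are assumptions.

```python
# Hypothetical list of wordings regarded as unspecific.
UNSPECIFIC_TERMS = {"malignant neoplasm"}


def detail_level(value):
    """Rank a field value: 0 empty, 1 unspecific wording, 2 specific."""
    if not value:
        return 0
    if value.casefold() in UNSPECIFIC_TERMS:
        return 1
    return 2


def merge_record(source, other):
    """Keep the source record, taking a field from `other` only if it is more detailed."""
    merged = dict(source)
    for field, value in other.items():
        if detail_level(value) > detail_level(merged.get(field)):
            merged[field] = value
    return merged


def merge_cards(source_card, other_card):
    """Join two lists of diagnosis records, with source_card taking priority."""
    merged = [dict(record) for record in source_card]
    for other in other_card:
        key = (other["site"], other["date"])
        for i, record in enumerate(merged):
            if (record["site"], record["date"]) == key:
                merged[i] = merge_record(record, other)
                break
        else:
            # No corresponding record in the source: transfer it whole.
            merged.append(dict(other))
    return merged
```

Because a field is replaced only when the other card's value ranks strictly higher, the more specific morphology survives whichever card is chosen as the source, which is the behaviour the example describes.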
Phase 4. After the automated data transfer the unwanted card is deleted, and the operator can edit the resulting card if necessary.
Phase 5. In every case, once the data have been joined, a complete logical check of the resulting card is performed.
Phase 6. Only after the computerised check of the card has been completed and any errors corrected does the resulting card become valid in the database.
Thus, on the one hand, we automate all the labour-consuming and routine work of searching for and eliminating duplicates and, on the other hand, the registry worker controls this process and takes part in resolving the non-standard situations and logical contradictions that sometimes arise.
The mechanism applied in the duplicate search has also proved useful for automated data transfer from the Hospital Cancer Registry into the Population-based Cancer Registry. The “Abstract from medical in-patient card for patient with malignant diagnosis” is sent to the oncological dispensary at the patient’s place of residence in order to register the patient and update the information already stored in the oblast cancer registry. Many patients receive treatment in the oncological dispensary at their place of residence, where they are under follow-up. Their data are stored in the unified computerised system of the hospital cancer registry.
For the automated data transfer it was necessary to solve the following problems:
- extracting the necessary data from the database of the hospital registry;
- searching the population-based registry for cards corresponding to those being transferred;
- creating new cards for patients registered for the first time and updating the information for those already registered.
In the hospital cancer registry a procedure for creating a file of computer abstracts has been developed. The file contains the same information as the paper copies (Form 027/onco) and is created at the same moment, that is, on completion of the input of the medical in-patient card into the computerised hospital cancer registry. The electronic abstracts are stored in files of the same format as in the population-based cancer registry.
On transfer into the population-based cancer registry, the data of the electronic abstracts are placed in a so-called exchange data buffer, from which the population cancer registry workers transfer them into the database. The same preliminary search on key fields is carried out as when a new card is created. If a corresponding card is found in the database of the population registry, the same procedure as for the automated joining of duplicates is performed, with the hospital registry cards from the exchange data buffer playing the role of duplicates. When the transfer is complete, the card in the buffer is deleted and the resulting card in the population registry’s database is checked.
The automated procedures of data transfer from the hospital registry into the population one have reduced the time for handling a new patient card from the same dispensary from several minutes to several seconds. They also significantly reduce the working hours needed to run the registry and the stream of paper documents within the dispensary. In addition, the likelihood of new mistakes caused by repeated input from paper documents is reduced. All actions during the joining of data, both automated and performed by the operator, are recorded in a log (protocol).
Work on introducing the automated data exchange technology is now under way in all cancer registries of Ukraine. For this purpose we use e-mail. Last year the Ukrainian Research Institute of Oncology and Radiology transferred to the regional dispensaries not only the current electronic abstracts but also abstracts for all patients treated in it during the last 10 years. After processing these data it was found that a significant number of patients treated in the URIOR had not yet been registered at their place of residence (because the paper “Abstracts” and “Notifications” had not been sent).
The development and introduction of the automated data exchange technology has become possible owing to the development of unified software for the population-based cancer registries in all oblasts and to the wide introduction of a compatible computerised hospital cancer registry information system.
We now have an opportunity to create a common oncological information environment in Ukraine.