Data on employment by sex and detailed occupational groups
The database on employment by sex and detailed occupational groups (SEGREGAT) contains statistics for over 80 developed and developing countries, for years around 1970, 1980, 1990 and 2000. The statistics are not always comparable across countries or across points in time, given differences in the sources of data collection, worker coverage and national classifications used. However, it may be possible to compare specific and well-defined occupational groups, such as teachers, doctors and taxi drivers.
This database was first compiled in 1993 to measure and analyze occupational segregation between men and women in the world. It was also used to provide background information on women and men managers, in order to analyze why women find it hard to reach positions of power. In addition, the database has been used for many other applications, including the analysis of individual occupations. More recently, it has been used to analyze occupational segregation in the context of discrimination at work (see ILO, 2003).
The database is compiled mainly from responses to questionnaires we send to countries. It includes statistics for countries providing employment information by sex for at least 15 occupational groups. Statistics come mainly from Population Censuses and Labour Force Surveys, but in some cases also from Administrative Records and Establishment-based Surveys. Generally they refer to total employment, although in some cases they relate to the labour force (including employment and unemployment) or to salaried employment (this is usually the case when the source is an establishment-based survey). Statistics by detailed occupational groups can be presented according to different occupational classifications. Most countries use an adaptation of an international classification (ISCO-68 or ISCO-88), but many developed nations use a national classification. The occupational titles appear in English, French or Spanish. Not all occupational groups have corresponding titles, however, because countries do not always provide complete information. The database provides information about coverage, type of classification used and source for each country and reference year.
To maintain its usefulness, this unique database is regularly corrected for errors and updated. As far as possible, its country coverage is extended to improve regional representation. Although a number of checks are carried out to identify errors, we are aware that some data entry errors remain, and we would appreciate it if users pointed them out to us at email@example.com.
An outstanding feature of the database is its diversity. Statistics stem from different sources, cover different groups of workers and use different occupational classifications. These variations are common even within countries for different reference years. Most countries provided data from Population Censuses but, as expected, many results for the 2000s come from Labour Force Surveys. As requested, most countries provided data for the employed population, although some cover the labour force. By far the greatest source of variation relates to the occupational classifications used. All these factors affect comparability between countries and across time. The following paragraphs describe the main sources of variation and their effects on the data.
The source of data collection may affect the resulting statistics, because each source applies different data collection and processing strategies. Population Censuses, for example, are huge exercises in which neither the quality of the information collected nor that of the coding procedures can be controlled as effectively as in Labour Force Surveys (or other sample surveys): one can expect systematic and unknown biases to be more present when this source is used. On the other hand, Population Censuses which are coded completely are essentially free from imprecision due to sampling and can produce statistics for very detailed occupational groups. This is not the case for Labour Force Survey results, which may contain many 'non-significant' values, as may Population Censuses where only a sample of questionnaires is coded.
Another source of variation relates to worker coverage. Some data cover the employed population, others cover the labour force; some include the armed forces and others do not. It can be argued that the occupational distribution of the employed population differs from that of the labour force because the latter includes the unemployed. For the unemployed, occupation refers to the 'previous occupation' performed, which is not equivalent to the 'present occupation' measured for the employed population. Moreover, the unemployed may be concentrated in certain occupations, so their occupational distribution may differ from that of the employed population.
As noted above, the most important source of variation is the national occupational classifications used. They vary not only between countries, reflecting variations in national realities, but also across time. Countries may revise their national classification of occupations (or adopt a revised international classification) to introduce new occupations, delete obsolete occupations, or reorganise the existing conceptual framework. To allow comparability between countries and across time, the classifications need to be "mapped", i.e. occupational groups in one classification need to be assigned to the corresponding occupational group(s) in the other classification. In some cases, this will require occupational groups to be merged to create comparable groups in both classifications. Generally, reliable information and in-depth knowledge of the content and meaning of each occupational group involved are needed for this to be done correctly. We do not encourage mapping classifications across countries, and advise caution when mapping across time.
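The mapping and merging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the occupation codes, the harmonised group labels, the counts and the `harmonise` helper are all hypothetical and are not taken from ISCO, from any national classification or from the database itself. A real crosswalk would have to rest on detailed knowledge of what each group contains.

```python
from collections import defaultdict

# Hypothetical crosswalks: each source code is assigned to a harmonised group.
# Codes and labels are illustrative only, not real ISCO or national codes.
CROSSWALK_OLD = {"A1": "teachers", "A2": "teachers", "A3": "drivers"}
CROSSWALK_NEW = {"B1": "teachers", "B2": "drivers", "B3": "drivers"}


def harmonise(counts, crosswalk):
    """Sum (men, women) counts of source groups into harmonised groups.

    Source codes missing from the crosswalk are returned separately so they
    can be reviewed instead of being silently dropped.
    """
    merged = defaultdict(lambda: [0, 0])
    unmapped = []
    for code, (men, women) in counts.items():
        target = crosswalk.get(code)
        if target is None:
            unmapped.append(code)
            continue
        merged[target][0] += men
        merged[target][1] += women
    return {group: tuple(total) for group, total in merged.items()}, sorted(unmapped)


# Made-up (men, women) counts for two reference years of one country.
old_year = {"A1": (10, 30), "A2": (5, 15), "A3": (40, 2), "A9": (1, 1)}
new_year = {"B1": (12, 50), "B2": (30, 3), "B3": (8, 1)}

groups_old, unmapped_old = harmonise(old_year, CROSSWALK_OLD)
groups_new, unmapped_new = harmonise(new_year, CROSSWALK_NEW)
```

Returning the unmapped codes rather than discarding them keeps the gaps in the crosswalk visible, which matters given the caution advised above for mapping across time.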
Finally, the coding procedures used by the statistical agencies are another important source of variation in the data. Only when identical occupational classifications are used and the same coding rules and procedures are applied will the occupational groups have the same contents in practice. The coding rules also have an important effect on the size of the "ignored" or "unclassifiable" groups: their numbers will be small when sufficient information is obtained for classification, when computer-assisted coding is used, or when the coding strategy discourages their use, but large when they are used as "dump" categories, i.e. when all "difficult" or "unclear" responses are systematically coded into these occupational groups. The data give evidence of better coding strategies when Labour Force Surveys are used (i.e. the "ignored" groups are smaller) than when Population Censuses are used. In addition, the quality of the coding of individual observations may be higher in a Labour Force Survey than in a Population Census, because better information may be used as the basis for the coding, and the coders may be more experienced.
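One simple way to gauge how heavily the "ignored" or "unclassifiable" groups are used is to compute their share of total employment for each source. The sketch below assumes a hypothetical code "X" for such groups and uses made-up counts. The `unclassified_share` helper is an illustration only, not a procedure applied to the database.

```python
def unclassified_share(counts, unclassified_codes):
    """Return the fraction of all workers coded to "ignored"/"unclassifiable" groups."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total employment is zero")
    unclassified = sum(n for code, n in counts.items() if code in unclassified_codes)
    return unclassified / total


# Made-up counts; "X" is a hypothetical code for the unclassifiable group.
census = {"teachers": 400, "drivers": 350, "X": 250}
survey = {"teachers": 480, "drivers": 495, "X": 25}

census_share = unclassified_share(census, {"X"})
survey_share = unclassified_share(survey, {"X"})
```

A markedly larger share in one source than in another for the same country would suggest that the group is being used as a "dump" category there.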
 See ILO (2003): Global report under the Follow-up to the ILO Declaration on Fundamental Principles and Rights at Work. Report 1(b) to the 91st Session of the International Labour Conference. June 2003. International Labour Office, Geneva.
 "Ignored" and "unclassifiable" occupations should be distinguished from occupations "not elsewhere classified". In principle, the "not elsewhere classified" groups are intended to cover well defined occupations which are too small to deserve a separate occupational group in the classification. Unfortunately, the data seem to show that these occupational groups were also being used as "dump" categories by a number of countries.