For this 2nd edition of the Challenge4Cancer, Epidemium will provide a comprehensive dataset of demographic, economic and environmental indicators, collected from the public data repositories of international institutions.

The data is coupled with the cancer epidemiologic data used in the previous Challenge4Cancer, with the aim of exploring and revealing correlations between these factors and worldwide cancer incidence and mortality. So far, the data has been collected from three institutions: a/ the World Bank, b/ the Food and Agriculture Organization of the United Nations (FAO) and c/ the International Labour Organization (ILO). The data aggregates about 7000 indicators, spanning 40 years and 200 countries. The individual indicators are classified by source and theme.
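As an illustration of what coupling the two kinds of data can look like in practice, here is a minimal sketch that joins an indicator table to an incidence table on country and year, then computes a Pearson correlation over the matched pairs. The row layout `(country, year, value)`, the function names and the toy values are all assumptions made for this example; they do not reflect the actual schema or content of the Epidemium tables.

```python
from math import sqrt

# Hypothetical toy rows in the form (country, year, value).
# These are made-up numbers, not values from the real dataset.
indicator_rows = [
    ("A", 2000, 1.0),
    ("A", 2001, 2.0),
    ("B", 2000, 3.0),
    ("B", 2001, 4.0),
]
incidence_rows = [
    ("A", 2000, 10.0),
    ("A", 2001, 20.0),
    ("B", 2000, 30.0),
    ("B", 2001, 40.0),
    ("C", 2000, 50.0),  # no matching indicator row, dropped by the join
]


def join_on_country_year(left, right):
    """Inner join: keep only (country, year) keys present in both tables."""
    right_index = {(country, year): value for country, year, value in right}
    return [
        (value, right_index[(country, year)])
        for country, year, value in left
        if (country, year) in right_index
    ]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two matched observations")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


pairs = join_on_country_year(indicator_rows, incidence_rows)
r = pearson(pairs)
```

A correlation computed this way only measures linear association between one indicator and one outcome; with thousands of indicators, some strong correlations will appear by chance, so any result should be treated as a lead to investigate rather than as evidence of a causal link.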

To produce a ready-to-use dataset, we ran a cleaning and processing pipeline on the source files using Dataiku Data Science Studio. The documentation is available alongside the data tables.

Datasets available

Cancer incidence and mortality tables (source: WHO). From 1953 to 2007 (incidence) and 1950 to 2013 (mortality), for all types of cancer.

Data extracted from WHO and OECD from 1960 to 2012, for all types of cancer.

Relevant demographic, economic, educational, health and social data from worldwide institutions

Data extracted from the World Bank, FAO, ILO, ... More than 10 000 indicators across 200 countries, from the 1960s to today.