Welcome to wbdata's documentation!
What is wbdata?
Wbdata is a simple python interface to find and request information from the World Bank's various databases, either as a dictionary containing full metadata or as a pandas DataFrame. Currently, wbdata wraps most of the World Bank API, and also adds some convenience functions for searching and retrieving information.
Wbdata was designed to be used either in a script or in a shell. In a shell, wbdata assumes that the user will use most functions to look up the codes necessary to retrieve the information he wants. To this end, the default in shell mode for most functions is to simply print the id and human-readable name of each item in question. In a script, the default is to return the entire response from the World Bank converted to python objects.
All the functions that you need to get started are in the wbdata
module.
Finally, it should be pointed out that wbdata is in the "release early" portion of the "release early, release often" cycle, and the current test suite is pretty perfunctory. You won't end up with the wrong data, but any irregularities I haven't specifically encountered in the World Bank database have not been dealt with.
Installation
Wbdata is available on PyPi which means you can install using pip:
pip3 install -U wbdata
You can also download or get the source from GitHub.
A Typical User Session
Let's say we want to find some data for the ease of doing business in some well-off countries. I might start off by seeing what sources are available and look promising:
In [1]: import wbdata
In [2]: wbdata.get_sources()
Out[2]:
id name
---- --------------------------------------------------------------------
1 Doing Business
2 World Development Indicators
3 Worldwide Governance Indicators
5 Subnational Malnutrition Database
6 International Debt Statistics
11 Africa Development Indicators
12 Education Statistics
13 Enterprise Surveys
14 Gender Statistics
15 Global Economic Monitor
16 Health Nutrition and Population Statistics
18 IDA Results Measurement System
19 Millennium Development Goals
20 Quarterly Public Sector Debt
22 Quarterly External Debt Statistics SDDS
23 Quarterly External Debt Statistics GDDS
25 Jobs
27 Global Economic Prospects
28 Global Financial Inclusion
29 The Atlas of Social Protection: Indicators of Resilience and Equity
30 Exporter Dynamics Database – Indicators at Country-Year Level
31 Country Policy and Institutional Assessment
32 Global Financial Development
33 G20 Financial Inclusion Indicators
34 Global Partnership for Education
35 Sustainable Energy for All
37 LAC Equity Lab
38 Subnational Poverty
39 Health Nutrition and Population Statistics by Wealth Quintile
40 Population estimates and projections
41 Country Partnership Strategy for India (FY2013 - 17)
43 Adjusted Net Savings
45 Indonesia Database for Policy and Economic Research
46 Sustainable Development Goals
50 Subnational Population
54 Joint External Debt Hub
57 WDI Database Archives
58 Universal Health Coverage
59 Wealth Accounts
60 Economic Fitness
61 PPPs Regulatory Quality
62 International Comparison Program (ICP) 2011
63 Human Capital Index
64 Worldwide Bureaucracy Indicators
65 Health Equity and Financial Protection Indicators
66 Logistics Performance Index
67 PEFA 2011
68 PEFA 2016
69 Global Financial Inclusion and Consumer Protection Survey
70 Economic Fitness 2
71 International Comparison Program (ICP) 2005
73 Global Financial Inclusion and Consumer Protection Survey (Internal)
75 Environment, Social and Governance (ESG) Data
76 Remittance Prices Worldwide (Sending Countries)
77 Remittance Prices Worldwide (Receiving Countries)
78 ICP 2017
79 PEFA_GRPFM
80 Gender Disaggregated Labor Database (GDLD)
81 International Debt Statistics: DSSI
82 Global Public Procurement
83 Statistical Performance Indicators (SPI)
84 Education Policy
85 PEFA_2021_SNG
86 Global Jobs Indicators Database (JOIN)
87 Country Climate and Development Report (CCDR)
88 Food Prices for Nutrition
89 Identification for Development (ID4D) Data
Well, that "Doing Business"---source 1---looks like a winner. Let's see what we've got available to us there.
In [3]: wbdata.get_indicators(source=1)
Out[3]:
id name
------------------------------------------------- ---------------------------------------------------------------------------------------------------------------
ENF.CONT.COEN.ATDR Enforcing contracts: Alternative dispute resolution (0-3) (DB16-20 methodology)
ENF.CONT.COEN.ATFE.PR Enforcing contracts: Attorney fees (% of claim)
ENF.CONT.COEN.COST.ZS Enforcing contracts: Cost (% of claim)
ENF.CONT.COEN.COST.ZS.DFRN Enforcing contracts: Cost (% of claim) - Score
ENF.CONT.COEN.CSMG Enforcing contracts: Case management (0-6) (DB16-20 methodology)
ENF.CONT.COEN.CTAU Enforcing contracts: Court automation (0-4) (DB17-20 methodology)
ENF.CONT.COEN.CTFE.PR Enforcing contracts: Court fees (% of claim)
ENF.CONT.COEN.CTSP.DB16 Enforcing contracts: Court structure and proceedings (0-5) (DB16 methodology)
ENF.CONT.COEN.CTSP.DB1719 Enforcing contracts: Court structure and proceedings (0-5) (DB17-20 methodology)
ENF.CONT.COEN.DB0415.DFRN Enforcing contracts (DB04-15 methodology) - Score
ENF.CONT.COEN.DB16.DFRN Enforcing contracts (DB16 methodology) - Score
ENF.CONT.COEN.DB1719.DFRN Enforcing contracts (DB17-20 methodology) - Score
ENF.CONT.COEN.ENFE.PR Enforcing contracts: Enforcement fees (% of claim)
ENF.CONT.COEN.ENJU.DY Enforcing contracts: Enforcement of judgment (days)
ENF.CONT.COEN.FLSR.DY Enforcing contracts: Filing and service (days)
ENF.CONT.COEN.PROC.NO Enforcing contracts: Procedures (number)
ENF.CONT.COEN.PROC.NO.DFRN Enforcing contracts: Procedures (number) - Score
ENF.CONT.COEN.QUJP.DB16.DFRN Enforcing contracts: Quality of the judicial processes index (0-19) (DB17-20 methodology) - Score
ENF.CONT.COEN.QUJP.DB1719.DFRN Enforcing contracts: Quality of judicial processes index (0-19) (DB17-19 methodology) - Score
ENF.CONT.COEN.QUJP.XD Enforcing contracts: Quality of the judicial processes index (0-18) (DB17-20 methodology)
ENF.CONT.COEN.RK.DB19 Rank: Enforcing contracts (1=most business-friendly regulations)
ENF.CONT.COEN.TRJU.DY Enforcing contracts: Trial and judgment (days)
ENF.CONT.DURS.DY Enforcing contracts: Time (days)
ENF.CONT.DURS.DY.DFRN Enforcing contracts: Time (days) - Score
ENF.CONT.EC.QJPI Enforcing contracts: Quality of judicial administration index (0-18) (DB17-19 methodology)
IC.BUS.EASE.DFRN.DB1014 Global: Ease of doing business score (DB10-14 methodology)
IC.BUS.EASE.DFRN.DB15 Ease of doing business score (DB15 methodology)
IC.BUS.EASE.DFRN.DB16 Global: Ease of doing business score (DB15 methodology)
IC.BUS.EASE.DFRN.XQ.DB1719 Global: Ease of doing business score (DB17-20 methodology)
IC.BUS.EASE.XQ Ease of doing business index (1=most business-friendly regulations)
IC.CNST.LIR.XD.02.DB1619 Dealing with construction permits: Liability and insurance regimes index (0-2) (DB16-20 methodology)
IC.CNST.PC.XD.04.DB1619 Dealing with construction permits: Professional certifications index (0-4) (DB16-20 methodology)
IC.CNST.PRMT.BQCI.015.DB1619.DFRN Dealing with construction permits: Building quality control index (0-15) (DB16-20 methodology) - Score
IC.CNST.PRMT.COST.WRH.VAL Dealing with construction permits: Cost (% of Warehouse value)
[And more deleted for brevity]
Alrighty. There's a lot there. But let's say I'm in the early stages of developing a question and go for the most general measure, which is the "Ease of Doing Business Index" with the id "IC.BUS.EASE.XQ".
Now remember, we're only interested in high-income countries right now, because
we're elitist. So let's use the query parameter of the get_countries
function to figure out the code for the United States so we don't have to wait
for data from a bunch of other countries:
In [4]: wbdata.get_countries(query='united')
Out[4]:
id name
---- --------------------
ARE United Arab Emirates
GBR United Kingdom
USA United States
"USA". Very creative. Thank you, World Bank. But in any case, let's get our data:
In [5]: wbdata.get_data("IC.BUS.EASE.XQ", country="USA")
Out[5]:
[{'indicator': {'id': 'IC.BUS.EASE.XQ',
'value': 'Ease of doing business rank (1=most business-friendly regulations)'},
'country': {'id': 'US', 'value': 'United States'},
'countryiso3code': 'USA',
'date': '2022',
'value': None,
'unit': '',
'obs_status': '',
'decimal': 0},
{'indicator': {'id': 'IC.BUS.EASE.XQ',
'value': 'Ease of doing business rank (1=most business-friendly regulations)'},
'country': {'id': 'US', 'value': 'United States'},
'countryiso3code': 'USA',
'date': '2021',
'value': None,
'unit': '',
'obs_status': '',
'decimal': 0},
{'indicator': {'id': 'IC.BUS.EASE.XQ',
'value': 'Ease of doing business rank (1=most business-friendly regulations)'},
'country': {'id': 'US', 'value': 'United States'},
'countryiso3code': 'USA',
'date': '2020',
'value': None,
'unit': '',
'obs_status': '',
'decimal': 0},
{'indicator': {'id': 'IC.BUS.EASE.XQ',
'value': 'Ease of doing business rank (1=most business-friendly regulations)'},
'country': {'id': 'US', 'value': 'United States'},
'countryiso3code': 'USA',
'date': '2019',
'value': 6,
'unit': '',
'obs_status': '',
'decimal': 0},
{'indicator': {'id': 'IC.BUS.EASE.XQ',
'value': 'Ease of doing business rank (1=most business-friendly regulations)'},
'country': {'id': 'US', 'value': 'United States'},
'countryiso3code': 'USA',
'date': '2018',
'value': None,
'unit': '',
'obs_status': '',
'decimal': 0}]
[And so on]
And that returns a big long list of dictionaries with all the relevant data and metadata as organized by the World Bank. Now let's say we want to look at the United Kingdom as well ("GBR", see above), and only for the years 2010-2011. We can actually search using multiple countries and restrict the dates using datetime objects. Here's what that would look like:
In [6]: wbdata.get_data("IC.BUS.EASE.XQ", country=["USA", "GBR"], date=("2010", "2011"))
Out[6]:
[{'indicator': {'id': 'IC.BUS.EASE.XQ',
'value': 'Ease of doing business index (1=most business-friendly regulations)'},
'country': {'id': 'GB', 'value': 'United Kingdom'},
'countryiso3code': 'GBR',
'date': '2011',
'value': None,
'unit': '',
'obs_status': '',
'decimal': 0},
{'indicator': {'id': 'IC.BUS.EASE.XQ',
'value': 'Ease of doing business index (1=most business-friendly regulations)'},
'country': {'id': 'GB', 'value': 'United Kingdom'},
'countryiso3code': 'GBR',
'date': '2010',
'value': None,
'unit': '',
'obs_status': '',
'decimal': 0},
{'indicator': {'id': 'IC.BUS.EASE.XQ',
'value': 'Ease of doing business index (1=most business-friendly regulations)'},
'country': {'id': 'US', 'value': 'United States'},
'countryiso3code': 'USA',
'date': '2011',
'value': None,
'unit': '',
'obs_status': '',
'decimal': 0},
{'indicator': {'id': 'IC.BUS.EASE.XQ',
'value': 'Ease of doing business index (1=most business-friendly regulations)'},
'country': {'id': 'US', 'value': 'United States'},
'countryiso3code': 'USA',
'date': '2010',
'value': None,
'unit': '',
'obs_status': '',
'decimal': 0}]
And we get another list of dictionaries, which we can parse any which way we please.
So let's get a little bit more analytic. Let's say we want to fetch this same indicator, but also GDP per capita and for all high-income countries. Let's find the other indicator using the query parameter to search, and limiting ourselves to indicators from source 2, the World Development Indicators.
In [7]: wbdata.get_indicators(query="gdp per capita", source=2)
Out[7]:
id name
----------------- -------------------------------------------------------------------
NY.GDP.PCAP.CD GDP per capita (current US$)
NY.GDP.PCAP.CN GDP per capita (current LCU)
NY.GDP.PCAP.KD GDP per capita (constant 2015 US$)
NY.GDP.PCAP.KD.ZG GDP per capita growth (annual %)
NY.GDP.PCAP.KN GDP per capita (constant LCU)
NY.GDP.PCAP.PP.CD GDP per capita, PPP (current international $)
NY.GDP.PCAP.PP.KD GDP per capita, PPP (constant 2017 international $)
SE.XPD.PRIM.PC.ZS Government expenditure per student, primary (% of GDP per capita)
SE.XPD.SECO.PC.ZS Government expenditure per student, secondary (% of GDP per capita)
SE.XPD.TERT.PC.ZS Government expenditure per student, tertiary (% of GDP per capita)
Like good economists, we'll use the one that seems most impressive: GDP per capita at PPP in constant 2017 dollars, which has the id "NY.GDP.PCAP.PP.KD". But what about using high-income countries?
In [8]: wbdata.get_incomelevels()
Out[8]:
id value
---- -------------------
HIC High income
INX Not classified
LIC Low income
LMC Lower middle income
LMY Low & middle income
MIC Middle income
UMC Upper middle income
Funtastic. Finally, let's make sure we get our data into a lovely merged pandas DataFrame, suitable for analysis with that library, statsmodels, or whatever else we'd like.
In [9]: countries = [i['id'] for i in wbdata.get_countries(incomelevel='HIC')]
In [10]: indicators = {"IC.BUS.EASE.XQ": "doing_business", "NY.GDP.PCAP.PP.KD": "gdppc"}
In [11]: df = wbdata.get_dataframe(indicators, country=countries, parse_dates=True)
In [12]: df.describe()
Out[12]:
doing_business gdppc
count 58.000000 2040.000000
mean 49.534483 40524.953859
std 36.754384 21654.925324
min 1.000000 4217.814643
25% 20.500000 25921.663744
50% 41.500000 37138.923235
75% 71.000000 49218.423295
max 139.000000 157600.647353
Now we can look at the correlation:
In [13]: df.sort_index().groupby('country').last().corr()
Out[13]:
doing_business gdppc
doing_business 1.000000 -0.407761
gdppc -0.407761 1.000000
And, since lower scores on that indicator mean more business-friendly regulations, that's exactly what we would expect. Hooray!