It has been some time since I posted anything new here, owing to my ever-increasing workload. However, while working on one of my projects (a conference paper examining rival mechanisms of the diffusion of political liberalism), I came across an efficient solution to a problem that seems pervasive in comparative politics and international relations: how to reshape Freedom House data into a usable format.
I should first be clear about something: Freedom House (FH) is not usually where I go to get data. For my research I typically use the POLITY data, since I am usually interested in operationalizing democratic political institutions. POLITY was created by political scientists rather than an NGO, has a more transparent data collection process, and uses less controversial (and more stable) measures such as institutional constraints on the executive. While it is inappropriate to consider either POLITY or FH scores true measures of democracy, as many comparativists have discussed at length in the literature, political scientists who need to measure concepts like democratization usually end up using one of them. Despite its useful qualities, the POLITY project is hampered by a slow updating process, as noted in a post by political scientist Jay Ulfelder. For researchers who need the most recent data, FH can do the trick. For my diffusion project, I needed data that covered the recent unrest in the Middle East and North Africa. Since the 2014 update to FH has been released, I was in luck.
However, as many who use Freedom House scores know, the data is not formatted for the typical political science application. In order to merge FH scores with economic indicators or conflict data, you need to reshape the data so that each row is a country-year and the columns are the included measures of political rights and civil liberties. You need to attach Correlates of War country codes, and you probably also want the combined freedom score that informs FH's status measure (free, partly free, not free). Most of all, you do not want to have to do this from scratch each year when FH updates its data. Ideally, there would be a package in R that could do this for you. Since there is none that I am aware of, I am sharing the following code (adapted from Dave Armstrong's solution to this issue), which automates most of the process.
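To make the target layout concrete, here is a minimal sketch of the wide-to-long reshape on a toy table. The two countries, their scores, and the object names `wide` and `long` are invented for illustration and are not part of the FH file. Only base R's `reshape()` is used.

```r
## Toy illustration only: two hypothetical countries, two years of made-up scores
wide <- data.frame(
  country = c("Aland", "Bland"),
  PR1972 = c(1, 6), CL1972 = c(2, 6), Status1972 = c("F", "NF"),
  PR1973 = c(1, 5), CL1973 = c(1, 6), Status1973 = c("F", "NF"),
  stringsAsFactors = FALSE
)

## stats::reshape stacks each group of year-specific columns into one long column
long <- reshape(
  wide,
  idvar = "country",
  timevar = "year",
  times = c(1972, 1973),
  direction = "long",
  varying = list(
    c("PR1972", "PR1973"),
    c("CL1972", "CL1973"),
    c("Status1972", "Status1973")
  ),
  v.names = c("political.rights", "civil.liberties", "status")
)

## sort so that each country's years sit together, then drop the generated row names
long <- long[order(long$country, long$year), ]
rownames(long) <- NULL
print(long)
```

Each row of `long` is now one country-year, which is the shape needed for merging with other country-year datasets.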
Two steps must be completed before running this code. First, the most recent version of the FH data should be saved as a CSV, with the "Notes and Clarifications" section at the end of the spreadsheet removed. Since the contents of this section are unpredictable, I did not write its removal into the automated code. Second, the apostrophe must be removed from the "Cote d'Ivoire" country name, or the code will break. Once that is done, you can run the following code. The cleaned 2014 FH scores are available here for those interested.
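The apostrophe matters because `read.table()` by default treats both `"` and `'` as quote characters, so a lone apostrophe opens a quoted string that never closes. One possible alternative to editing the file by hand is to restrict the `quote` argument to the double quote. The sketch below shows this on a two-row stand-in for the real file; the rows and the names `csv_text` and `toy` are hypothetical, and this approach has not been checked against the actual FH spreadsheet.

```r
## Hypothetical two-row CSV standing in for the real Freedom House file
csv_text <- c("Cote d'Ivoire,6,6,NF",
              "Croatia,1,2,F")

## Restricting quote to the double quote makes read.table treat
## the apostrophe as an ordinary character
toy <- read.table(text = csv_text, header = FALSE, sep = ",",
                  quote = "\"", stringsAsFactors = FALSE)
print(toy)
```

If this were adopted, note that the country name would keep its apostrophe, so any later matching on country names would need to account for that.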
require(countrycode)
require(reshape)

## read in column names
fh.names <- read.table("FH1973-2013.csv", header=F, skip=5, sep=",")[1:2, -1]

## read in data
fh <- read.table("FH1973-2013.csv", header=F, skip=7, sep=",")

## years covered by FH data
years <- na.omit(unique(t(fh.names)[,1])[-1])
years[10:17] <- c(1981, 1983, 1984, 1985, 1986, 1987, 1988, 1989)

## countries in the FH data
cn <- c("country", paste(gsub(" ", "", t(fh.names)[1:123,2]), rep(years, each=3), sep=""))
fh <- fh[1:205,1:124]
colnames(fh) <- cn
rownames(fh) <- NULL

## reshape the data
redat <- reshape(fh, idvar = "country", timevar = "year", direction="long",
                 varying = list(PR = grep("^PR", colnames(fh), value=T),
                                CL = grep("^CL", colnames(fh), value=T),
                                Status = grep("^S", colnames(fh), value=T)))
rownames(redat) <- NULL

## recode the year variable, making sure to skip 1982
redat$year <- redat$year + 1971
redat$year[which(redat$year > 1981)] <- redat$year[which(redat$year > 1981)] + 1

## recode the missing value
redat[which(redat == "..", arr.ind=T)] <- NA

## rename the variables
redat <- rename(redat, c(PR1972="political.rights", CL1972="civil.liberties", Status1972="status"))

## create COW country code for matching with other datasets, manual fix for Serbia
redat$ccode <- countrycode(redat$country, "country.name", "cown")
redat$ccode[which(redat$country == "Serbia")] <- 345
redat <- redat[-which(redat$ccode == 345 & is.na(redat$status) | redat$status == ""),]

## create region using countrycode
redat$region <- factor(countrycode(redat$ccode, "cown", "region"))

## fix the S. Africa issue of dual coding in 1972
redat[which(redat$ccode == 560 & redat$year == 1972),3:5] <- c(5,6,"NF")

## set PR and CL scores to numeric and combine
redat$political.rights <- as.numeric(as.character(redat$political.rights))
redat$civil.liberties <- as.numeric(as.character(redat$civil.liberties))
redat$freedom <- (redat$political.rights + redat$civil.liberties)/2
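Once the data is in country-year form with COW codes attached, merging with other datasets is a join on `ccode` and `year`. The following is a minimal sketch using toy data frames. `fh_toy`, `econ_toy`, the `gdp.growth` column, and all values are made up for illustration, and in practice `fh_toy` would be replaced by `redat`.

```r
## Toy stand-ins: a few rows shaped like redat, plus a hypothetical indicator table
fh_toy <- data.frame(
  ccode = c(2, 2, 20),
  year = c(2012, 2013, 2013),
  freedom = c(1, 1, 1)
)
econ_toy <- data.frame(
  ccode = c(2, 20),
  year = c(2013, 2013),
  gdp.growth = c(1.5, 2.0)   # made-up numbers
)

## Left join on country-year: every FH row is kept, and
## country-years without a matching indicator get NA
merged <- merge(fh_toy, econ_toy, by = c("ccode", "year"), all.x = TRUE)
print(merged)
```

Using `all.x = TRUE` keeps FH country-years that the other dataset does not cover, which makes gaps in coverage visible rather than silently dropping rows.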