Data Science

My publications are listed on ResearchGate, where you can also see the colleagues I have collaborated with.

I also have a GitHub account where I post snippets of code I have written for working with data, usually in R. For instance, I used R to randomize PII (Personally Identifiable Information) for work here. I've reproduced some of that code below:


Code to randomize birthdates, deathdates and service dates for CIBMTR Data Back to Center datasets

Version: 2.0
Date: August 22, 2013
Author: Paul K. Courtney

library(lubridate) # Makes dealing with dates easier
library(plyr) # Tool for general data munging

The two source datasets were downloaded from the CIBMTR portal as Excel XLSX files, which were then saved as CSV files for import into R.

First, read the data into data frames. Version 2 of the CSVs means that the date columns have all been formatted in Excel as MM/DD/YYYY, since I found that R interpreted a two-digit year of "56" as "2056".

PreTED <- read.csv("PreTED2.csv")
PrePostTED <- read.csv("PostTED2.csv")
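The two-digit year problem is easy to reproduce with base R. This is a minimal sketch using a made-up date, not a value from the CIBMTR files. When parsing with `%y`, R maps the values 00 to 68 to the 2000s, so a birth year written as "56" lands a century late. The four-digit `%Y` form avoids the ambiguity.

```r
# Same calendar date written with a two-digit and a four-digit year.
# The date itself is invented for illustration.
twoDigit  <- as.Date("01/15/56",   format = "%m/%d/%y")
fourDigit <- as.Date("01/15/1956", format = "%m/%d/%Y")

format(twoDigit, "%Y")   # "2056": %y maps 00-68 to 20xx
format(fourDigit, "%Y")  # "1956": unambiguous with a four-digit year
```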

Read in the date column names for the Pre and Post datasets:

preDates <- read.csv("PreDates.csv")
postDates <- read.csv("PostDates.csv")

Then convert each of these from a one-column data frame into a character vector:

preDates <- as.character(preDates$Pre)
postDates <- as.character(postDates$Post)
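To show why a plain character vector is convenient here, the sketch below uses toy data in place of the real files. The names `preDatesDemo`, `dateCols`, `demo` and the column names are hypothetical, and base R's `as.Date` stands in for whatever date parsing the full script uses. The `as.character()` call matters because R versions before 4.0 read string columns as factors by default.

```r
# Toy stand-in for PreDates.csv: one column named "Pre" holding column names.
# (Hypothetical contents; the real file is not shown here.)
preDatesDemo <- read.csv(text = "Pre\nBirthDate\nTransplantDate")

# Pull the single column out as a plain character vector
dateCols <- as.character(preDatesDemo$Pre)

# Toy stand-in for the main dataset, with dates as MM/DD/YYYY strings
demo <- data.frame(
  ID = 1:2,
  BirthDate = c("01/15/1956", "07/04/1961"),
  TransplantDate = c("03/02/2010", "11/30/2011"),
  stringsAsFactors = FALSE
)

# Convert only the named columns to Date, leaving the others untouched
demo[dateCols] <- lapply(demo[dateCols], as.Date, format = "%m/%d/%Y")

str(demo)
```

With the names held in a vector, the same conversion (or a later randomization step) can be applied to every date column at once without listing them by hand.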