Saturday, December 5, 2015

Meaghan Procrastinates on Important Dissertation Progress By Half-Assing an R Tutorial

I (Meaghan) have spent a lot of time in R. Like, a LOT. Not really accomplishing lots and lots, mind you, but just kind of fucking around hopelessly most of the time. Meaghan of 3 months ago looked at R and thought "is there any way I can avoid using that program?" while Meaghan of today thinks "is there anything I can do with that program that will permit me to feel useful while procrastinating on something else?" As it turns out, there definitely is: Meaghan of today will now be presenting a wonderful R tutorial on how to scatterplot some shit and then make it pretty without getting entangled in ggplot2 (which is another code word for the bowels of hell).

One of the beautiful and fucking awful things about R is that for any one way of doing something, there are about 60 others. I'm going to tell you how to do things that you could probably do in other ways. These ways make sense to me, but if they cognitively don't work for you I'm sure you could find another 6+ ways of accomplishing the same goal. Also, everything I'm reporting here comes from a place of necessity: I'm sure there are other useful things we could talk about with scatterplots, but since I didn't have to think about them... I'm not going to talk about them!

R: the very basics

It's free and available on the internet, and very powerful. It isn't user-friendly, unless your user is the Lorax of computer programming.





I use RStudio, which makes R a little prettier and look more like a normal computer program. I also like it because it tells me what type of data I have and acts as a good reminder of what I have named it.

Because my normal naming schemes are pretty incomprehensible, including to myself once the coffee wears off.

In RStudio, however, the rules still apply: you have to type most things to make things happen. I like to think of R as an extremely drunk friend wandering through the woods: you need to be very specific and precise when you speak, or your friend will either ignore you or do something very unpredictable and unhelpful. So to start off, tell your drunk friend where you're going by setting your working directory.

setwd("C:/Users/Meaghan/Dropbox/Dissertation Stuffs/Chapter 1 Variation of Teeth/Data Analysis Files")

Then, you're going to need to import your data. You know how sometimes your drunk friends think that closets are toilets until they are angrily corrected? Don't let the drunk friend pee on your shoes, just tell them what the file type is to begin with. Is it an xlsx file? Use read.xlsx (which lives in an add-on package such as xlsx or openxlsx, so you'll need to install and load one of those first). What about a csv? Use read.csv, which comes with R. Also, tell them what to name it. In this case, we'll call it procrastination, since that is ultimately what all this is.

procrastination <- read.csv("procrastination.csv")

This imports a lovely dataset which includes columns for the weeks of term (1-11), the number of fucks I give about things, how many important projects I have due, the % of time I spend binge watching Netflix, and the amount I hate the world on a scale of 1-10.
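
If you don't have the CSV, a made-up stand-in works fine for following along. This is only a sketch: the names week, fucks, netflix, and hate come from the plot calls in this post, projects is a guessed name for the due-projects column, and every number is invented for illustration (none of it is the real data).

```r
# Stand-in for procrastination.csv: all values below are invented for illustration.
procrastination <- data.frame(
  week     = 1:11,                                          # weeks of term
  fucks    = c(10, 9, 9, 7, 6, 6, 4, 3, 2, 1, 0),           # number of fucks given
  projects = c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5),            # hypothetical column name
  netflix  = c(5, 10, 10, 20, 25, 35, 40, 55, 60, 75, 90),  # % of time on Netflix
  hate     = c(1, 1, 2, 3, 3, 4, 5, 6, 8, 9, 10)            # hate for the world, 1-10
)

str(procrastination)   # column names and the type of each column
head(procrastination)  # first six rows
```

str() is the typed-out version of what RStudio shows you in its environment pane.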

What a lovely table! So concise, so descriptive, so helpful!

So that's what my data looks like when imported into R. Some people like to then go and turn components into vectors for easy reference, but I work with lots of huge, different datasets and it doesn't work well for me mentally. You can tell your drunk friend where to focus by using the dollar sign – sort of literally dangling money in your friend's face until they focus. procrastination$week tells R to look at the week column in the dataset procrastination (R is case-sensitive, so the capitalization and spelling have to match exactly). Pretty simple. So let's do a basic scatterplot!

plot(procrastination$week, procrastination$fucks)

Yes, so pretty. Actually pretty hideous, I think we can all agree, so what can we do to fix it? Well first off, I hate the label names. Drunk friend is not good at naming things. So let's tell drunk friend here what to call everything, and we will be reaaaaally precise here. For the x label, I want it to say Weeks of Term. Y label should be # of Fucks Given, and the title should be # of Fucks Decreases Over Course of Term.

plot(procrastination$week, procrastination$fucks, xlab="Weeks of Term", ylab="# of Fucks Given", main="# of Fucks Decreases Over Course of Term")

But I can't see those dumb little dots. Drunk friend is still being unhelpfully obtuse. I'd like to have closed circles. To do that, tell R your exact code for your symbols. These symbol codes are called pch codes, and the one I want is 16.

plot(procrastination$week, procrastination$fucks, xlab="Weeks of Term", ylab="# of Fucks Given", main="# of Fucks Decreases Over Course of Term", pch=16)
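
If you want to see what the other codes look like before committing, one option is to plot them all at once. A minimal sketch in base R: codes 0 through 25 are the built-in symbols, and the layout choices here (one row, labels underneath) are arbitrary.

```r
codes <- 0:25  # the built-in pch symbol codes

# Draw each symbol once, left to right, with no axes cluttering things up.
plot(codes, rep(1, length(codes)), pch = codes, cex = 2,
     axes = FALSE, xlab = "", ylab = "", ylim = c(0.5, 1.5),
     main = "pch codes 0 to 25")

# Label each symbol with its code just below it.
text(codes, rep(0.8, length(codes)), labels = codes, cex = 0.7)
```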

Better, much better, but I still can't see them. How do I make the points bigger? Imagine your drunk friend comes with a pre-set volume: unfortunately, you can tell your drunk friend to be louder or quieter in relation to that volume, but you can't just tell them "be an appropriate volume, a volume which is close to silence." Same with size: drunk friend gets proportionality but not etiquette. cex is the argument, and it is a multiplier on the default size: cex=.5 makes the points 50% of their default size, while cex=2 makes them twice as large.

plot(procrastination$week, procrastination$fucks, xlab="Weeks of Term", ylab="# of Fucks Given", main="# of Fucks Decreases Over Course of Term", pch=16, cex=2)
 

Let's imagine however that you wanted to scale the size of different points by something – like, maybe the amount you hate your life. That would work a little differently. Remember to be specific: tell it the other variable you want it to scale by.

plot(procrastination$week, procrastination$fucks, xlab="Weeks of Term", ylab="# of Fucks Given", main="# of Fucks Decreases Over Course of Term", pch=16, cex=(procrastination$hate))
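
Because cex is a multiplier, a hate score of 10 means a point ten times the default size, which can swallow the whole plot. One possible workaround is to squash the variable into a friendlier range first. This is just a sketch: rescale_cex is a hypothetical helper (not a built-in), the data are invented, and the target range of 0.5 to 3 is an arbitrary choice.

```r
# Hypothetical helper: linearly map a numeric vector onto [lo, hi].
# Assumes x has at least two distinct values (otherwise it divides by zero).
rescale_cex <- function(x, lo = 0.5, hi = 3) {
  lo + (x - min(x)) / (max(x) - min(x)) * (hi - lo)
}

week  <- 1:11
fucks <- c(10, 9, 9, 7, 6, 6, 4, 3, 2, 1, 0)   # invented values
hate  <- c(1, 1, 2, 3, 3, 4, 5, 6, 8, 9, 10)   # invented values

sizes <- rescale_cex(hate)                     # smallest hate -> 0.5, largest -> 3
plot(week, fucks, pch = 16, cex = sizes)
```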

 

Color is pretty easy. You can tell R the normal name of a color, or you can use the fancy codes from the internet (hex codes like "#FF0000"). It doesn't matter too much.

plot(procrastination$week, procrastination$fucks, xlab="Weeks of Term", ylab="# of Fucks Given", main="# of Fucks Decreases Over Course of Term", pch=16, cex=(procrastination$hate), col="red")
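
For the curious, R treats a color name and its hex code as the same color. A small sketch using base R's colors() and col2rgb() to poke at that:

```r
head(colors())        # a few of the color names R knows
col2rgb("red")        # red, green, blue components of a named color
col2rgb("#FF0000")    # the same color written as a hex code

# Compare the two component by component.
same <- all(col2rgb("red") == col2rgb("#FF0000"))
same
```

So col="red" and col="#FF0000" in the plot call would give you the same dots.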


But that's boring. Why not make it a little fancier? What about if you wanted to make it a gradient of colors relating to the amount of time you spend on Netflix? Let's make a gradient to start with, a beautiful gradient that goes from blue to red. We'll call it color (lowercase, because R cares), because we are feeling highly descriptive and unoriginal.

color <- colorRampPalette(c("blue", "red"))

Next, we want to divvy up that gradient according to a variable. You want to tell it how many breaks you need as well – too many, and everything looks the same color. In this case, we will go with 10.

Netflix <- color(10)[as.numeric(cut(procrastination$netflix,breaks = 10))]
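
That line does three things at once, so it can help to take it apart. A minimal sketch with made-up Netflix percentages (not the real data); the names palette10 and bins are just for illustration. color(10) hands back ten hex codes running from blue to red, cut() sorts each value into one of ten equal-width bins, and the bin number picks the color.

```r
color <- colorRampPalette(c("blue", "red"))

netflix <- c(5, 10, 10, 20, 25, 35, 40, 55, 60, 75, 90)  # invented values

palette10 <- color(10)                          # ten hex codes, blue to red
bins <- as.numeric(cut(netflix, breaks = 10))   # bin number (1 to 10) for each value
Netflix <- palette10[bins]                      # one color per data point

palette10
bins
Netflix
```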

Now, color using our new Netflix code! Remember to capitalize: R is case-sensitive, so Netflix (the colors) and netflix (the column) are two different things!

plot(procrastination$week, procrastination$fucks, xlab="Weeks of Term", ylab="# of Fucks Given", main="# of Fucks Decreases Over Course of Term", pch=16, cex=(procrastination$hate), col=Netflix)


Now, I was going to teach you how to put in a legend but to be honest, I think that is a different blog post. Also, I've procrastinated way too much already. Like.... way too much. Shit. I guess I'd better go edit something.

Well, join our blog next time for "Meaghan Procrastinates on Important Dissertation Progress By Half-Assing an R Tutorial!"