Predictive model for the Oscars

A few years ago, as part of the graduate course Data Analysis and Report Writing in the Department of Epidemiology, Biostatistics and Occupational Health at McGill University, we explored the topic of predictive modeling using a dataset containing movies, directors and actors who were nominated for an Academy Award. The goal was to select some variables and build a predictive model for the winner in four categories: Best Picture, Best Director, Best Actor, and Best Actress. As a movie fan, this was the dream assignment: I could combine my love of movies with my love of statistics! And it paid off: I was the only one in my class to correctly predict all four winners.

Read more

Oscar 2018 predictions

For the past few years, I have tried to predict the winners in all categories at the Academy Awards. Again, I will be using statistics and data analysis to inform my decision in some categories: Best Picture, Best Director, Best Actor, Best Actress, Best Supporting Actor, and Best Supporting Actress.

As in the last three years, I stick to what the model tells me for my predictions in these categories. However, I’m skeptical about the prediction I have for Best Picture: several pundits see The Shape of Water as a front-runner, but my model only gives it a 16% chance of winning. Due to rule changes that now require a preferential ballot for Best Picture, the winner has been difficult to predict in recent years. Since The Shape of Water possibly has a broader appeal than Lady Bird and Three Billboards Outside Ebbing, Missouri, it may prevail in the end. I still believe Three Billboards Outside Ebbing, Missouri is the actual front-runner, but I do think my model is underestimating The Shape of Water’s chances and overestimating Lady Bird’s.
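To see why a preferential ballot can favour a film with broad appeal, here is a minimal Python sketch of an instant-runoff count. It is only an illustration: the function `instant_runoff` and the ballot counts are made up for this example, they are not real voting data, and the Academy's actual counting rules may differ in their details.

```python
def instant_runoff(ballots):
    """Return the winner of a simple instant-runoff count.

    Each ballot is a list of film titles ranked from most to least preferred.
    Assumes at least one non-empty ballot (hypothetical helper, not the
    Academy's official procedure).
    """
    remaining = {film for ballot in ballots for film in ballot}
    while True:
        # Count each ballot for its highest-ranked film still in the race.
        tallies = {film: 0 for film in remaining}
        for ballot in ballots:
            top = next((film for film in ballot if film in remaining), None)
            if top is not None:
                tallies[top] += 1
        total = sum(tallies.values())
        leader = max(remaining, key=lambda film: tallies[film])
        # A film wins once it holds a strict majority of the counted ballots.
        if tallies[leader] * 2 > total or len(remaining) == 1:
            return leader
        # Otherwise eliminate the film with the fewest votes
        # (ties broken alphabetically) and recount.
        loser = min(remaining, key=lambda film: (tallies[film], film))
        remaining.remove(loser)


# Made-up ballots: Three Billboards leads on first choices,
# but The Shape of Water is ranked second on many other ballots.
ballots = (
    [["Three Billboards", "The Shape of Water", "Lady Bird"]] * 40
    + [["The Shape of Water", "Lady Bird", "Three Billboards"]] * 35
    + [["Lady Bird", "The Shape of Water", "Three Billboards"]] * 25
)
winner = instant_runoff(ballots)
print(winner)
```

In this toy example, no film has a majority of first choices, so Lady Bird is eliminated and its ballots transfer to their second choice. The film that led the first round therefore does not necessarily win.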

In the next few days, I will write another post in which I’ll describe how my model works. I’ll take the opportunity to try to explain why my model is so bearish on The Shape of Water.

My predictions are below, in bold. After the Academy Awards next weekend, I will update this post and point out the winners–I will indicate them in italics.

Read more

First steps with Leaflet

I should probably be working on my thesis, but instead I started reading through the introduction to the R package leaflet. And the following got me excited: Code for America has GeoJSON data on their GitHub page for several cities around the world. In particular, they have both Saskatoon and Montreal! I used the leaflet package to draw the boundaries of each neighbourhood. You can see the results here. (I wasn’t able to host the maps directly on this webpage.) The next step would be to colour-code the neighbourhoods according to interesting statistics!

Read more

Installing multiple R versions

Sahir Bhatnagar and I are currently wrapping up the first version of our package casebase. In short, it’s an R package for survival analysis, where we use case-base sampling to fit smooth-in-time hazards. (I could write a post on this package, but there’s no need: check out the website and the four vignettes.) As part of our workflow, we perform continuous integration using Travis CI, and we test our package against both the current and development versions of R. Recently, some tests began to fail against the development version, and so I had to install R-devel on my local machine in order to debug our code. This blog post is a summary of how I did it.

To be fair, this is already documented online, and I made use of these resources; see the official R installation docs and this RStudio support post. I’m writing yet another post simply as a reference for myself and my colleagues. But I also ran into a compilation error that I wanted to document here. That error was “caused” by closely following the (amazing) book R Packages by Hadley Wickham. Stick around to learn what the problem was!

Read more

Oscar 2017 predictions

For the past few years, I have tried to predict the winners in all categories at the Academy Awards. And for the past two years, I’ve also used statistics and data analysis to inform my decision in four categories: Best Picture, Best Director, Best Actor, and Best Actress.

As in the last two years, I stick to what the model tells me for my predictions in these categories. However, I’m skeptical about the predictions I have for the acting categories. First, for Best Actor, Denzel Washington and Casey Affleck are the only two front-runners; there is no way Ryan Gosling will win this award. Second, for Best Actress, although I would argue that Emma Stone is now the favourite, Isabelle Huppert definitely has a chance too. Therefore, I expect to miss one of these categories.

Finally, this year is definitely La La Land’s show: 14 nominations, tied with Titanic and All About Eve for the most ever. I don’t think it will set a new record for wins (the current record is 11), but I expect it to win around 10 awards. I don’t expect it to run the table in the sound/music categories, which would be a first for a musical. I think the only one of these four awards it could miss is Best Sound Editing, which I predict will go to Arrival.

My predictions are below, in bold. After the Academy Awards, I will update this post and point out the winners–I will indicate them in italics.

Update (2017/02/27): Wow! What a finale! As for my predictions, I did better than last year: 16/24.

Read more