This blog is meant to be a fun take on predicting Song and Record of the Year for the 63rd Annual GRAMMY Awards.
Events in the entertainment industry look a little different this year, and the GRAMMYs are a prime example. After a postponement due to rising COVID-19 numbers in the Los Angeles area, the 63rd GRAMMY Awards will take place on Sunday, March 14th. Unlike previous years, the ceremony will be held outdoors without a formal audience. Last year, Billie Eilish was the star of the show, sweeping all four major categories (Song of the Year, Record of the Year, Album of the Year, and Best New Artist), a feat accomplished only once before. This year, while she does have a song in the running for both Song of the Year and Record of the Year, she’ll face tough competition from the likes of Beyoncé (who has nine total nominations), Dua Lipa (six), and Taylor Swift (six).
For the past couple of years (2019, 2020), I’ve enjoyed using DataRobot’s enterprise AI platform to predict who will win certain GRAMMY awards, specifically Song of the Year and, as of last year, Record of the Year. What I love about this annual side project is that it demonstrates how ubiquitous data science and machine learning can be, spanning tried-and-true traditional use cases as well as non-traditional ones like the GRAMMYs. Over the past two years, the model built with DataRobot ranked the eventual winner as the most likely to win in two out of three cases, and the winner landed in the predicted top two every time.
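The “ranked first” and “top two” checks above amount to a top-k hit rate computed per award. The sketch below is a minimal illustration, assuming each award is represented as a dictionary of predicted win probabilities plus the actual winner. The function names and the toy nominees are hypothetical, not DataRobot output or the real predictions.

```python
from typing import Dict, List, Tuple

# One award: (predicted win probability per nominee, actual winner).
Award = Tuple[Dict[str, float], str]


def winner_rank(probs: Dict[str, float], winner: str) -> int:
    """1-based rank of the actual winner when nominees are sorted
    from most to least likely to win."""
    ordered = sorted(probs, key=probs.get, reverse=True)
    return ordered.index(winner) + 1


def top_k_hit_rate(awards: List[Award], k: int) -> float:
    """Share of awards whose actual winner is in the model's top k."""
    hits = sum(winner_rank(probs, winner) <= k for probs, winner in awards)
    return hits / len(awards)


if __name__ == "__main__":
    # Hypothetical toy nominees and probabilities, for illustration only.
    toy_awards = [
        ({"A": 0.40, "B": 0.35, "C": 0.25}, "A"),  # winner ranked 1st
        ({"D": 0.50, "E": 0.30, "F": 0.20}, "E"),  # winner ranked 2nd
        ({"G": 0.45, "H": 0.30, "I": 0.25}, "G"),  # winner ranked 1st
    ]
    print(top_k_hit_rate(toy_awards, k=1))  # 2 of 3 awards
    print(top_k_hit_rate(toy_awards, k=2))  # 3 of 3 awards
```

With only a handful of awards per year, a single upset moves this rate a lot, which is one more reason to treat the predictions as a fun exercise.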
As in previous years, I want to offer a caveat: these predictions are purely a fun way to talk about the upcoming GRAMMY awards. Distilling the Recording Academy’s decision-making process into a simple set of data points and algorithms would require much more information and subject matter expertise than what I’ve gathered here. However, I think the exercise helps continue the conversation about how AI can be applied in the music industry and about the importance of domain expertise in any machine learning use case.
Now that we’ve set the stage, let’s get down to this year’s predictions.
What Data Did We Use?
The core of the dataset comes from Spotify (e.g., how danceable is a song?) and Genius (e.g., the lyrics), which supply the main characteristics (i.e., features) for each nominated track. As in my two previous GRAMMY posts, I added word counts and percentages of song lyrics associated with certain emotions, such as anger, sadness, and joy, along with an overall sentiment score. New to this year’s analysis, I augmented the final dataset with betting odds from recent awards shows (e.g., is this song a favorite to win?), information about past wins (e.g., has this artist won this award before?), and topic-modeling-based features derived from one of my other Spotify-inspired datasets.
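To make the emotion features more concrete, here is a minimal sketch of lexicon-based counting. It assumes a word-to-emotion lexicon (published emotion lexicons exist for this purpose, but the six-word `LEXICON` below is a made-up placeholder) and a simple net sentiment score defined as positive minus negative words, divided by all tagged words. The exact lexicon and scoring used for this post are not specified, so treat this as an illustration only.

```python
import re
from collections import Counter
from typing import Dict, Set

# Hypothetical mini-lexicon: word -> emotion tags. Placeholder entries only;
# a real analysis would load a full emotion lexicon instead.
LEXICON: Dict[str, Set[str]] = {
    "love": {"joy", "positive"},
    "happy": {"joy", "positive"},
    "cry": {"sadness", "negative"},
    "alone": {"sadness", "negative"},
    "hate": {"anger", "negative"},
    "fight": {"anger", "negative"},
}

TAGS = ("joy", "sadness", "anger", "positive", "negative")


def emotion_features(lyrics: str) -> Dict[str, float]:
    """Count emotion-tagged words and express each count as a share of all words."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    total = len(words) or 1  # avoid division by zero on empty lyrics
    counts = Counter()
    for w in words:
        for tag in LEXICON.get(w, ()):
            counts[tag] += 1
    features: Dict[str, float] = {}
    for tag in TAGS:
        features[f"{tag}_count"] = counts[tag]
        features[f"{tag}_pct"] = counts[tag] / total
    # Net sentiment in [-1, 1]: (positive - negative) / tagged words.
    tagged = counts["positive"] + counts["negative"]
    features["sentiment"] = (
        (counts["positive"] - counts["negative"]) / tagged if tagged else 0.0
    )
    return features


if __name__ == "__main__":
    feats = emotion_features("I love you, I cry alone")
    print(feats["sadness_pct"], feats["sentiment"])
```

Expressing counts as percentages keeps long and short songs comparable, which matters when the model compares tracks with very different lyric lengths.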
Modeling Workflow
In terms of modeling, I used the same framework as before: iterate through DataRobot’s automatically generated blueprints to identify which one does the best job of ranking the most likely winner, using the past five awards shows as the validation period. One addition to this year’s workflow was performing feature selection with FIRE (Feature Importance Rank Ensembling), a robust framework for selecting features when you’ve already built many candidate machine learning models. For simplicity, I show only the top three most likely winners according to the best model from DataRobot.
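The core idea behind rank ensembling can be sketched in a few steps. First, rank features by importance within each already-built model. Next, take each feature’s median rank across models. Finally, keep the best-ranked features. This is a simplified illustration, not DataRobot’s FIRE implementation, which has its own rules for which models to include and where to draw the cutoff. The model names, the features other than likelihood_to_win, and all scores below are hypothetical.

```python
from statistics import median
from typing import Dict, List

# model name -> {feature name: importance score}
Importances = Dict[str, Dict[str, float]]


def rank_within_model(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank features inside one model: 1 = most important."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {feat: i + 1 for i, feat in enumerate(ordered)}


def median_ranks(importances: Importances) -> Dict[str, float]:
    """Median rank of each feature across models (lower = more important).
    Assumes every model scores the same set of features."""
    per_model = [rank_within_model(s) for s in importances.values()]
    features = per_model[0].keys()
    return {f: median(r[f] for r in per_model) for f in features}


def select_features(importances: Importances, top_n: int) -> List[str]:
    """Keep the top_n features with the lowest (best) median rank."""
    med = median_ranks(importances)
    return sorted(med, key=med.get)[:top_n]


if __name__ == "__main__":
    # Hypothetical importance scores from three already-built models.
    scores = {
        "model_a": {"likelihood_to_win": 0.50, "acousticness": 0.30, "tempo": 0.20},
        "model_b": {"likelihood_to_win": 0.45, "acousticness": 0.15, "tempo": 0.40},
        "model_c": {"likelihood_to_win": 0.60, "acousticness": 0.25, "tempo": 0.15},
    }
    print(median_ranks(scores))
    print(select_features(scores, top_n=2))  # ['likelihood_to_win', 'acousticness']
```

Using the median rank instead of a single model’s ranking keeps one idiosyncratic model from dominating the selection. In the example, tempo scores well in `model_b` only, so it drops out.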
Drumroll…
For this awards ceremony, Song of the Year looks like quite a toss-up, with the model giving a slight edge to Taylor Swift’s popular ballad from her critically acclaimed album folklore. A first Song of the Year win for her this time around would be consistent with what others are predicting, especially since this award recognizes songwriting rather than production. As for Record of the Year, the model favors Dua Lipa’s dance hit, with Beyoncé as a solid runner-up.
What Is Important to Know?
The plot below shows example output from FIRE: box plots describing the distribution of importance scores for each feature, where a larger score means a more important feature. Here, we can see that likelihood_to_win (i.e., the betting odds) provides the strongest signal for determining Song of the Year winners. It is followed by how acoustic and energetic a track is and by features describing overall sentiment, including the gloom index.
Distribution of Feature Importance Scores via FIRE
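A box plot like the one described above summarizes each feature’s importance scores with a five-number summary. The minimal sketch below assumes the per-feature scores are available as plain lists, and the values in the usage example are invented. It computes that summary and orders features by median score.

```python
from statistics import quantiles
from typing import Dict, List, Tuple

# feature name -> importance scores it received across many models
Scores = Dict[str, List[float]]


def box_summary(scores: List[float]) -> Tuple[float, float, float, float, float]:
    """Five-number summary (min, Q1, median, Q3, max) behind a box plot.
    Uses linear-interpolation quartiles ('inclusive' method); older Python
    versions require at least two scores."""
    q1, med, q3 = quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, med, q3, max(scores)


def rank_by_median(all_scores: Scores) -> List[Tuple[str, float]]:
    """Order features from most to least important by median score."""
    medians = {f: box_summary(s)[2] for f, s in all_scores.items()}
    return sorted(medians.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Invented importance scores collected from four models.
    example = {
        "likelihood_to_win": [0.90, 0.80, 1.00, 0.85],
        "acousticness": [0.40, 0.50, 0.30, 0.45],
        "energy": [0.35, 0.20, 0.30, 0.25],
    }
    for feature, med in rank_by_median(example):
        print(feature, round(med, 3))
```

Sorting by the median rather than the mean mirrors how a box plot is read, and it is less sensitive to a single model that assigns an unusually high or low score.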
All in all, making predictions this year was a welcome distraction from the ongoing pandemic. It’ll be fun to see whether the Recording Academy delivers any major upsets given this year’s atypical awards show format, and whether any other surprises are in store. Tune in on March 14 at 8:00 pm ET to find out!
About the author
Data Science Evangelist at DataRobot
Taylor Larkin is a data scientist at DataRobot. Based out of Atlanta, he’s currently responsible for executing data science projects as well as enabling customers to do data science work. He has worked on machine learning projects and research articles in a variety of realms including geomagnetic storm prediction, healthcare, renewable energy, sports analytics, and wine preference. Prior to joining DataRobot, Taylor graduated from The University of Alabama with a PhD in Business Analytics and an MS in Applied Statistics.