Spotify Data? That's Music To My Ears!

This dataset was taken from the very popular TidyTuesday GitHub repo. Since I love music, a Spotify dataset was the perfect excuse to have a go at visualization.

In the spirit of “Perfect is the enemy of good”, this will be a short post aimed at answering just a couple of questions with EDA and visualization.

Datasets from TidyTuesday are usually already cleaned (or at least come with instructions or hints on where to start), so I begin by importing the data and exploring it with skimr.

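library(tidyverse)   # readr, dplyr, stringr, ggplot2
library(tidytext)    # reorder_within(), scale_x_reordered()
library(hrbrthemes)  # theme_ipsum()

spotify_songs <-
  read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv')

skimr::skim(spotify_songs)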

Table 1: Data summary
Name	spotify_songs
Number of rows	32833
Number of columns	23
_______________________
Column type frequency:
character	10
numeric	13
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
track_id	0	1	22	22	28356
track_name	5	1	1	144	23449
track_artist	5	1	2	69	10692
track_album_id	0	1	22	22	22545
track_album_name	5	1	1	151	19743
track_album_release_date	0	1	4	10	4530
playlist_name	0	1	6	120	449
playlist_id	0	1	22	22	471
playlist_genre	0	1	3	5	6
playlist_subgenre	0	1	4	25	24

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
track_popularity	1	42.48	24.98	0.00	24.00	45.00	62.00	100.00	▆▆▇▆▁
danceability	1	0.65	0.15	0.00	0.56	0.67	0.76	0.98	▁▁▃▇▃
energy	1	0.70	0.18	0.00	0.58	0.72	0.84	1.00	▁▁▅▇▇
key	1	5.37	3.61	0.00	2.00	6.00	9.00	11.00	▇▂▅▅▆
loudness	1	-6.72	2.99	-46.45	-8.17	-6.17	-4.64	1.27	▁▁▁▂▇
mode	1	0.57	0.50	0.00	0.00	1.00	1.00	1.00	▆▁▁▁▇
speechiness	1	0.11	0.10	0.00	0.04	0.06	0.13	0.92	▇▂▁▁▁
acousticness	1	0.18	0.22	0.00	0.02	0.08	0.26	0.99	▇▂▁▁▁
instrumentalness	1	0.08	0.22	0.00	0.00	0.00	0.00	0.99	▇▁▁▁▁
liveness	1	0.19	0.15	0.00	0.09	0.13	0.25	1.00	▇▃▁▁▁
valence	1	0.51	0.23	0.00	0.33	0.51	0.69	0.99	▃▇▇▇▃
tempo	1	120.88	26.90	0.00	99.96	121.98	133.92	239.44	▁▂▇▂▁
duration_ms	1	225799.81	59834.01	4000.00	187819.00	216000.00	253585.00	517810.00	▁▇▇▁▁

The data comes with many interesting audio features, including danceability, instrumentalness and valence. Full definitions can be found in the associated data dictionary.

Next, I wrangle the data: I drop duplicate tracks (the same song can appear in several playlists) and add my own label for the decade in which each track’s album was released.

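spotify <- spotify_songs %>%
  # keep one row per track/artist pair
  distinct(track_name, track_artist, .keep_all = TRUE) %>%
  # release dates look like "YYYY", "YYYY-MM" or "YYYY-MM-DD"; keep the year
  mutate(year = str_extract(track_album_release_date, "^\\d{4}"))

# right = FALSE gives bins [1960, 1970), [1970, 1980), ..., so 1970 is in the "70s"
spotify$decades <- cut(
  as.numeric(spotify$year),
  c(1956, 1960, 1970, 1980, 1990, 2000, 2010, 2021),
  labels = c("50s", "60s", "70s", "80s", "90s", "2000s", "2010s"),
  right = FALSE
)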
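How `cut()` treats a year that falls exactly on a break is easy to get wrong, so here is a small base-R check on made-up release dates (hypothetical values, not taken from the dataset). It mirrors the two steps above: `regmatches()`/`regexpr()` stand in for `stringr::str_extract()` to keep the leading four-digit year, then `cut()` bins it. With the default `right = TRUE` each interval is closed on the right, so a 1970 release is labelled "60s"; with `right = FALSE` it lands in the "70s".

```r
# Hypothetical release dates in the three formats seen in the data:
# "YYYY", "YYYY-MM" and "YYYY-MM-DD"
release_dates <- c("1957", "1960-03", "1970-01-01", "1999-12", "2010", "2020-01-17")

# Keep the leading four-digit year (base-R equivalent of str_extract)
years <- as.numeric(regmatches(release_dates, regexpr("^[0-9]{4}", release_dates)))

breaks <- c(1956, 1960, 1970, 1980, 1990, 2000, 2010, 2021)
labels <- c("50s", "60s", "70s", "80s", "90s", "2000s", "2010s")

# Default right = TRUE: intervals are (a, b], so 1960 -> "50s" and 1970 -> "60s"
closed_right <- cut(years, breaks, labels = labels)

# right = FALSE: intervals are [a, b), so 1960 -> "60s" and 1970 -> "70s"
closed_left <- cut(years, breaks, labels = labels, right = FALSE)

print(data.frame(years, closed_right, closed_left))
```

Only the boundary years (1960, 1970, 2010 here) change bins; everything else is labelled the same either way.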

Using track popularity as a gauge, how have subgenres evolved over the decades?

spotify %>%
  # keep decade/sub-genre combinations with more than 5 tracks
  add_count(decades, playlist_subgenre) %>%
  filter(n > 5) %>%
  ggplot(aes(
    # order sub-genres by popularity separately within each decade
    reorder_within(playlist_subgenre, track_popularity, decades),
    track_popularity
  )) +
  geom_boxplot(aes(fill = playlist_genre)) +
  coord_flip() +
  facet_wrap(~ decades, nrow = 2, scales = "free_y") +
  scale_x_reordered() +
  theme_ipsum() +
  labs(
    title = "Popularity of Genres Through The Decades",
    subtitle = "Recent Decades Saw An Explosion of Music Genres - Led by Rock and R&B",
    caption = "\nSource: TidyTuesday\nVisualization: Desmond Choy (Twitter @Norest)",
    fill = "Music Genres",  # legend title
    x = "Music Sub-Genres",
    y = "Track Popularity"
  ) +
  theme(
    plot.title = element_text(face = "bold", size = 25),
    plot.subtitle = element_text(size = 15),
    strip.background = element_blank(),
    strip.text = element_text(face = "bold", size = 15),
    legend.position = "top",
    legend.box = "horizontal",
    legend.text = element_text(size = 10)
  ) +
  # the legend maps `fill`, so lay that guide out on a single row
  guides(fill = guide_legend(nrow = 1))
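A rough numeric counterpart to the boxplot, sketched in base R rather than the tidyverse and run on a tiny hypothetical data frame (sub-genres "A" and "B" and all popularity values are made up), counts tracks per decade/sub-genre pair, applies the same `n > 5` cut-off, and reports the median popularity, which is the line drawn in the middle of each box.

```r
# Hypothetical toy data standing in for the `spotify` tibble
toy <- data.frame(
  decades = c(rep("90s", 8), rep("2010s", 8)),
  playlist_subgenre = c(rep("A", 6), rep("B", 2), rep("A", 2), rep("B", 6)),
  track_popularity = c(60, 65, 70, 55, 72, 68, 40, 45,
                       30, 35, 50, 55, 62, 48, 58, 66)
)

# Tracks per decade/sub-genre pair (what add_count() adds as `n`)
counts <- aggregate(track_popularity ~ decades + playlist_subgenre, data = toy, FUN = length)
names(counts)[3] <- "n"

# Median popularity per pair: the middle line of each box
medians <- aggregate(track_popularity ~ decades + playlist_subgenre, data = toy, FUN = median)

summary_tbl <- merge(counts, medians)              # joins on decades + playlist_subgenre
summary_tbl <- summary_tbl[summary_tbl$n > 5, ]    # same threshold as filter(n > 5)
summary_tbl <- summary_tbl[order(summary_tbl$decades, -summary_tbl$track_popularity), ]
print(summary_tbl)
```

The two pairs with only two tracks are dropped, which is why sparsely populated sub-genres disappear from some decade panels.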

Permanent wave stood out as a rock sub-genre whose popularity held up across the decades until the 2010s.

Trouble is… as an avid music fan, I’d never heard of permanent wave at all! Horrified, I dug into the dataset a little more, only to discover that permanent wave includes a few of my all-time favourite artists. I’ve been a closet permanent wave fan all this while!

spotify %>% 
  filter(playlist_subgenre == "permanent wave") %>% 
  count(track_artist, sort = TRUE)

## # A tibble: 471 x 2
##    track_artist              n
##    <chr>                 <int>
##  1 Muse                     19
##  2 The Smiths               19
##  3 David Bowie              13
##  4 Depeche Mode             12
##  5 The Cure                 12
##  6 Foo Fighters             11
##  7 New Order                11
##  8 Red Hot Chili Peppers    11
##  9 George Harrison           9
## 10 Oasis                     9
## # ... with 461 more rows
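`count(track_artist, sort = TRUE)` tallies rows per artist and sorts the tallies in descending order. A base-R equivalent, shown on a small hypothetical vector of artist names (not from the dataset), uses `table()` and `sort()`:

```r
# Hypothetical artist column for a handful of tracks in one sub-genre
track_artist <- c("Artist A", "Artist B", "Artist A", "Artist C", "Artist B", "Artist A")

# table() tallies rows per artist; sort(decreasing = TRUE) mirrors count(sort = TRUE)
artist_counts <- sort(table(track_artist), decreasing = TRUE)
print(artist_counts)
```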

How about some suggestions for danceable EDM tracks that I could listen to when out for a run?

To find them, I rank EDM tracks by danceability, defined as how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

spotify %>%
  select(playlist_genre, playlist_subgenre, track_name, danceability) %>%
  filter(playlist_genre == "edm") %>%
  distinct(track_name, .keep_all = TRUE) %>%
  # the 20 most danceable tracks within each EDM sub-genre (ties are kept)
  group_by(playlist_subgenre) %>%
  slice_max(danceability, n = 20) %>%
  ggplot(aes(reorder_within(track_name, danceability, playlist_subgenre), danceability)) +
  geom_point(aes(colour = playlist_subgenre), size = 3, show.legend = FALSE) +
  coord_flip() +
  facet_wrap(~ playlist_subgenre, nrow = 2, scales = "free_y") +
  scale_x_reordered() +
  theme_ipsum() +
  labs(
    title = "What are some of the most danceable EDM tracks?",
    subtitle = "Danceability describes how suitable a track is for dancing based on a combination of musical elements\nA value of 0.0 is least danceable and 1.0 is most danceable.",
    caption = "\nSource: TidyTuesday\nVisualization: Desmond Choy (Twitter @Norest)",
    x = "Track Name",
    y = "Danceability"
  ) +
  theme(
    plot.title = element_text(face = "bold", size = 25),
    plot.subtitle = element_text(size = 15),
    strip.background = element_blank(),
    strip.text = element_text(face = "bold", size = 15)
  )
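The key step here is "top n per group": keep the most danceable tracks within each sub-genre, not across all of EDM. A minimal base-R sketch of that selection, using a hypothetical helper `top_n_per_group()` on a made-up data frame with n = 2, splits by group, orders each piece by the weight column and keeps the first n rows. Unlike `top_n()`/`slice_max()`, this sketch uses `head()` and so does not keep ties.

```r
# Hypothetical toy data: three tracks in each of two sub-genres
toy <- data.frame(
  playlist_subgenre = c("big room", "big room", "big room",
                        "electro house", "electro house", "electro house"),
  track_name = c("Track 1", "Track 2", "Track 3", "Track 4", "Track 5", "Track 6"),
  danceability = c(0.61, 0.88, 0.74, 0.92, 0.55, 0.80)
)

# Hypothetical helper: split by group, sort each piece by `wt` (descending),
# keep its first n rows, then stack the pieces back together
top_n_per_group <- function(df, group, wt, n) {
  pieces <- lapply(split(df, df[[group]]), function(d) head(d[order(-d[[wt]]), ], n))
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

top2 <- top_n_per_group(toy, "playlist_subgenre", "danceability", n = 2)
print(top2)
```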

Finally, how about some curated suggestions? Based on the criteria listed below, which sub-genres should I explore?

Instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. The closer the instrumentalness value is to 1.0, the greater the likelihood the track contains no vocal content.
Acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
Valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

So my approach was to create a single score, criteria, by summing instrumentalness, acousticness and valence. The sub-genres with the highest average score would then be the picks… right?
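Since each of the three features lies between 0 and 1, the summed score ranges from 0 to 3, and averaging it per sub-genre gives one number to rank sub-genres by. A minimal base-R sketch with made-up feature values for hypothetical sub-genres "A" and "B" (not from the dataset) shows the arithmetic:

```r
# Hypothetical feature values for four tracks in two sub-genres
toy <- data.frame(
  playlist_subgenre = c("A", "A", "B", "B"),
  instrumentalness  = c(0.90, 0.00, 0.00, 0.10),
  acousticness      = c(0.80, 0.10, 0.30, 0.20),
  valence           = c(0.70, 0.50, 0.60, 0.40)
)

# Score per track: each feature lies in [0, 1], so the sum lies in [0, 3]
toy$criteria <- toy$instrumentalness + toy$acousticness + toy$valence

# Mean score per sub-genre, highest first
scores <- sort(tapply(toy$criteria, toy$playlist_subgenre, mean), decreasing = TRUE)
print(round(scores, 2))
```

Here "A" averages 1.5 (track scores 2.4 and 0.6) and "B" averages 0.8, so a single very instrumental, acoustic and upbeat track can lift a sub-genre’s mean on its own.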

spotify %>%
  # one score per track: each feature lies in [0, 1], so criteria lies in [0, 3]
  mutate(criteria = instrumentalness + acousticness + valence) %>%
  select(playlist_genre, playlist_subgenre, track_album_name, criteria) %>%
  # keep only the first track listed for each album
  distinct(track_album_name, .keep_all = TRUE) %>%
  group_by(playlist_subgenre) %>%
  summarise(criteria = mean(criteria)) %>%
  arrange(desc(criteria))

## # A tibble: 24 x 2
##    playlist_subgenre  criteria
##    <chr>                 <dbl>
##  1 hip hop               1.08 
##  2 tropical              0.945
##  3 reggaeton             0.907
##  4 neo soul              0.888
##  5 latin pop             0.866
##  6 classic rock          0.865
##  7 electro house         0.852
##  8 hip pop               0.850
##  9 urban contemporary    0.830
## 10 latin hip hop         0.823
## # ... with 14 more rows

Hip-hop?? When you think of acoustic, instrumental tunes… hip hop doesn’t quite come to mind.

spotify %>% 
  mutate(criteria = instrumentalness + acousticness + valence) %>% 
  select(playlist_genre, playlist_subgenre, track_artist, track_album_name, criteria) %>% 
  distinct(track_album_name, .keep_all = TRUE) %>% 
  filter(playlist_subgenre == "hip hop") %>% 
  arrange(desc(criteria)) %>% 
  head(20)

## # A tibble: 20 x 5
##    playlist_genre playlist_subgenre track_artist    track_album_name    criteria
##    <chr>          <chr>             <chr>           <chr>                  <dbl>
##  1 rap            hip hop           Goldenninjah    Moods                   2.86
##  2 rap            hip hop           oofoe           double oo tape          2.73
##  3 rap            hip hop           luvwn           sanya                   2.70
##  4 rap            hip hop           Brenky          Previsão                2.7 
##  5 rap            hip hop           Bluedoom        4:20 PM                 2.66
##  6 rap            hip hop           Sarah, the Ill~ Pocket Full of Cry~     2.62
##  7 rap            hip hop           Loop Schrauber  Repeat                  2.62
##  8 rap            hip hop           Chris Keys      Detour                  2.62
##  9 rap            hip hop           Ymori           Better Things           2.60
## 10 rap            hip hop           Leavv           essence                 2.58
## 11 rap            hip hop           Flynn           Cycles                  2.58
## 12 rap            hip hop           junyii.         junyii·dr!p            2.52
## 13 rap            hip hop           Smeyeul.        Bedroom Skits           2.52
## 14 rap            hip hop           Nathan Kawanis~ Yokohama                2.52
## 15 rap            hip hop           Brenky          Winter Flakes           2.52
## 16 rap            hip hop           Chill Children  bob le head             2.49
## 17 rap            hip hop           Mr Mantega      Fire to Hire            2.44
## 18 rap            hip hop           jrd.            Reflections             2.43
## 19 rap            hip hop           David Chief     Sands EP                2.42
## 20 rap            hip hop           Made in M       Flashlight              2.41

I initially thought there was an error in the data or my code. But I sampled a few of these tunes and it turns out I genuinely enjoyed all of them! This was an amazingly fruitful exploration of new music that widened my aural horizons.

Here’s a Top 20 playlist based on my criteria.

spotify %>% 
  mutate(criteria = instrumentalness + acousticness + valence) %>% 
  select(playlist_genre, playlist_subgenre, track_artist, track_album_name, criteria) %>% 
  distinct(track_album_name, .keep_all = TRUE) %>% 
  arrange(desc(criteria)) %>% 
  head(20)

## # A tibble: 20 x 5
##    playlist_genre playlist_subgenre track_artist    track_album_name    criteria
##    <chr>          <chr>             <chr>           <chr>                  <dbl>
##  1 rap            hip hop           Goldenninjah    Moods                   2.86
##  2 latin          tropical          Kavv            Cruise Control          2.77
##  3 latin          tropical          S-Ilo           Targa                   2.73
##  4 rap            hip hop           oofoe           double oo tape          2.73
##  5 rap            hip hop           luvwn           sanya                   2.70
##  6 rap            hip hop           Brenky          Previsão                2.7 
##  7 r&b            urban contempora~ Paco de Lucía   La Búsqueda (Remas~     2.68
##  8 latin          tropical          Reyna Tropical  Como Fuego              2.68
##  9 rap            hip hop           Bluedoom        4:20 PM                 2.66
## 10 rock           classic rock      Booker T. & th~ Green Onions            2.63
## 11 rap            hip hop           Sarah, the Ill~ Pocket Full of Cry~     2.62
## 12 rap            hip hop           Loop Schrauber  Repeat                  2.62
## 13 rap            hip hop           Chris Keys      Detour                  2.62
## 14 rap            hip hop           Ymori           Better Things           2.60
## 15 rap            hip hop           Leavv           essence                 2.58
## 16 rap            hip hop           Flynn           Cycles                  2.58
## 17 r&b            urban contempora~ Grey            Goodnight, Universe     2.57
## 18 pop            indie poptimism   Joe Corfield    Chillhop Essential~     2.54
## 19 latin          tropical          S-Ilo           Ascent                  2.53
## 20 rap            hip hop           junyii.         junyii·dr!p            2.52

As always, the RMarkdown document can be found on my GitHub should you wish to replicate these results.