QMUL Master’s Thesis
Automatic Playlist Generation and Music Library Visualization with Timbral Similarity Measures
The size of personal digital music collections has grown significantly over the past decade, but common consumer media players offer limited options for library browsing and navigation. Music visualization is a very active field, and there are a number of interesting projects investigating artist and song visualizations on both online platforms and prototype media players. What’s missing, however, is the ability for music lovers to explore visualizations of their own collection on their own media player. The goal of this project was to incorporate a music library visualization into an existing consumer player, and Songbird was the natural choice due to its cross-platform, open source nature and a growing community of users eager to customize their listening experience with add-ons.
If you are only interested in the features of the Songbird add-on, then check out the project page. If you want to know more about what’s going on under the hood, then read on…
First, a disclaimer: at several points in the written paper I mention that SoundBite for Songbird is the first music library visualization for an existing consumer software media player. As luck would have it, the mufin player pro, featuring a full interactive 3D song space, was released two days after my thesis was submitted. While I can no longer claim first, SoundBite for Songbird will have the advantage of being cross-platform and, more importantly, free.
I’ll also take this opportunity to plead mea culpa to my newfound inability to keep my British/US spellings straight. It’s a little difficult to move back from visualising to visualizing immediately after writing 15,000 words without the letter Z. Or Zed. Or whatever you want to call it.
How do we take a large music library and create an interactive visualization where the songs are laid out intelligently in a 2D space? By intelligently, I mean that the relative proximity of songs in the visualization space should reflect the degree of similarity between the songs. The first stage of the solution requires defining a measure to compare any two songs and extracting the information required to make the comparison. The second stage requires a method to reduce a high dimensional space of individual song-to-song similarities to a meaningful lower dimensional configuration for visualization. I focused on a solution that could be implemented to function seamlessly in the Songbird environment.
Defining song similarity is a tricky task. Content-based measures can be used to identify songs with similar musical quantities such as loudness, timbre, or rhythm, but they can’t capture any of the external factors that play such a large part in the listening process. Contextual information like genre labels can help provide more meaningful recommendations, but contextual labels can’t be applied until someone actually listens to a song and provides some form of annotation. Collaborative filtering compares user profiles (song ratings, purchasing history, listening habits, etc.) to provide recommendations, but it also requires a listener action before a new song can be recommended.
For this project, I chose to use a timbral content-based similarity measure and focus on the visualization and implementation. The timbral similarity measure essentially captures qualities that make one song sound like another song. “Sounds like” is quite obviously a very subjective and fuzzy measure, and automatic playlists using this measure can often have quite surprising results. Whether those results are errant or interesting probably depends on the listener.
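To give a flavor of how a timbral measure like this can work: a common approach in the literature is to summarize each song’s MFCC frames as a single Gaussian and compare songs with a symmetrized Kullback–Leibler divergence. The sketch below is illustrative rather than the thesis implementation; it assumes MFCC frames have already been extracted, and the function names are my own.

```python
import numpy as np

def gaussian_model(mfcc_frames):
    """Summarize a song's MFCC frames (n_frames x n_coeffs) as a single Gaussian."""
    mu = mfcc_frames.mean(axis=0)
    cov = np.cov(mfcc_frames, rowvar=False)
    return mu, cov

def kl_gaussian(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL divergence between two multivariate Gaussians."""
    d = len(mu_p)
    inv_q = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    return 0.5 * (np.trace(inv_q @ cov_p) + diff @ inv_q @ diff - d
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

def timbral_distance(frames_a, frames_b):
    """Symmetrized KL divergence between the two songs' timbre models."""
    mu_a, cov_a = gaussian_model(frames_a)
    mu_b, cov_b = gaussian_model(frames_b)
    return (kl_gaussian(mu_a, cov_a, mu_b, cov_b)
            + kl_gaussian(mu_b, cov_b, mu_a, cov_a))
```

A song compared with itself scores zero, and larger values mean the two songs’ frame statistics (and hence, roughly, their timbres) diverge more.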
The image below shows an example automatic playlist using the timbral similarity measure and “Run For Your Life” by The Beatles as a seed track. The collection is a 6,000-track personal collection heavily weighted toward rock, and this playlist is mostly homogeneous in terms of genre, era, and artist popularity.
Not all automatic playlists are this homogeneous. Using a different Beatles track, “Across The Universe,” as the seed, some of the most similar songs are from relatively unknown artists (The Hang-Ups) or completely different genres (Herbert, electronic). Again, errant or interesting is a matter of interpretation. But listening reveals that the songs do have timbral similarity, and that is all this similarity measure attempts to capture.
In the implementation, the similarity feature extractor and distance measure are both generalized interfaces, and I hope to eventually provide more similarity measures incorporating contextual and collaborative information.
For a library containing n songs, there are n² song-to-song similarity scores. Self-similarity and symmetric similarities can be ignored to help reduce the size of the similarity data, but even collections of moderate size will have a high dimensional similarity space that cannot easily be reduced to two or three dimensions for visualization. Dimensionality reduction is a well-studied problem, but many solutions have computational and/or memory requirements that make them unreasonable for implementation within a Songbird add-on.
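To make the symmetry savings concrete: dropping self-similarities and duplicate pairs cuts the n² scores down to n(n−1)/2 unique distances, which fit in a flat upper-triangle array. A small sketch (helper names are mine, not from the add-on):

```python
def num_unique_pairs(n):
    """Unique song pairs once self- and symmetric similarities are dropped."""
    return n * (n - 1) // 2

def condensed_index(i, j, n):
    """Position of pair (i, j) in a flat array holding the upper triangle."""
    if i > j:
        i, j = j, i
    return i * n - i * (i + 1) // 2 + (j - i - 1)
```

For a 6,000-track library that is 17,997,000 stored distances instead of 36,000,000 ordered scores, roughly halving the storage, but still far too many dimensions to visualize directly.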
To identify a feasible solution, I evaluated several methods on a collection of nearly 50,000 songs from Artists Without A Label. I looked at simple feature vector truncation, Principal Components Analysis, classical Multidimensional Scaling (MDS), and an efficient approximation to MDS called Landmark Multidimensional Scaling (LMDS). The quality of each method was scored by evaluating how well the distances between songs in the 2D layout reflected the higher dimensional dissimilarities. This can be observed visually by looking at scatter plots showing the relationship between each song-to-song distance in the two spaces.
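Classical MDS, one of the methods compared, is compact enough to sketch: double-center the squared distance matrix and keep the eigenvectors with the largest eigenvalues. This numpy sketch is illustrative, not the thesis code:

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed points in k dimensions from an (n, n) pairwise distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:k]             # top-k eigenvalues
    scale = np.sqrt(np.maximum(evals[idx], 0.0))  # clamp tiny negatives
    return evecs[:, idx] * scale
```

When the input distances really are low-dimensional Euclidean distances, the embedding reproduces them exactly (up to rotation and reflection); for a high-dimensional similarity space, the scatter of embedded versus original distances shows how much the layout distorts.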
For all the gritty details and many more plots, refer to chapter four of the written report. My evaluation also examined the computational resources required for each method. This resulted in the choice of LMDS for dimensionality reduction in the Songbird extension.