‘Sound Accurate?’: Exploring Spotify's recommendation engine and its effect on diversity



MUSS3140




Matthew Thomas Purbrick







Submitted in partial fulfilment of the requirements for the degree BSc Music, Multimedia and Electronics







School of Music, University of Leeds, August 2021



Abstract


Spotify's recommendation engine comprises algorithmic playlists (including radio stations) and editorial playlists.


The evidence suggests Spotify's algorithmic recommendations slightly decrease consumption diversity and subtly increase the rate of users leaving the platform. This is due to accuracy (as in accurately predicting users' tastes) being the primary focus in algorithmic design, which may be antithetical to diversity. Whilst diverse recommendations correlate with long-term user satisfaction, accurate recommendations are necessary for short-term user engagement. However, it may be possible to resolve this conflict of interests by simultaneously increasing accuracy and diversity to produce truly serendipitous algorithmic recommendations.


Spotify’s editorial playlists represent a combination of human and algorithmic decision-making. Whilst these playlists can improve the consumption diversity of individual users, they otherwise cheapen the musical experience by relegating music to the background. Furthermore, the increase in individual consumption diversity fostered by editorial playlists may not reach the users who need it most nor translate into increased diversity across Spotify’s entire platform. The prevalence of Spotify’s editorial playlists can also be seen as a return to curated musical experiences, after a short period where Shuffle (as in the mode of music playback) selected what we listened to.


A lack of diversity on Spotify’s platform extends beyond the user experience to spell out bad news for social equality. Female artists are especially underrepresented on Spotify, as they are in other areas of the music industry. Practices that constitute affirmative action have been adopted in the live sector to lessen this problem, and there is evidence for trace amounts of affirmative action within Spotify’s editorial playlists as well. Although Spotify may hold the power to solve this problem through algorithmic affirmative action, so far the starkest instance of affirmative action within Spotify’s recommendation engine (the ‘EQUAL’ hub) falls squarely into its curated editorial side.



Dedication


For my hyperpop heroes Chris, Dad, Harriet, Jack, Jim, Katie, Mum, Oscar, Raf and Suman, and my ever-supportive friends and family A. G. Cook, Bladee, Dylan Brady, Grimes, Laura Les and Oneohtrix Point Never.



Chapter 1: Introduction


Some scholars like to use histrionic language and striking turns of phrase to communicate the awesome quantity of cultural capital now possessed by recommendation algorithms. Morris (2015) declaims “in an era where content is not actually scarce, but falsely made to act scarce, organization is king.” Although the algorithms used to organize our content are certainly authoritative, perhaps kingmaker would be a more accurate word. For it is from inside impenetrable black boxes that these “curators of popular culture” pull the strings.


Similarly, Gillespie (2013) boldly states “That we are now turning to algorithms to identify what we need to know is as momentous as having relied on credentialed experts, the scientific method, common sense, or the word of God,” almost implying that algorithms have replaced religion as our lives’ guiding force, shaping our behaviours and worldviews.


However, Schwab (2018) uses Spotify specifically as an example of a place where we need not fear algorithms’ power. Schwab states the consequences of wonky recommendations are “small for the consumer,” and that “there is a big difference between a music recommendation service and a news recommendation service.” Yet Chodos (2019) argues that Spotify’s algorithms can still warp our individual musical tastes which, on a macroscale, could cause massive cultural consequences. Chodos also states that discriminatory algorithms “represent a serious social justice concern” on Spotify as much as anywhere else. Spotify’s recommendation algorithms may feel rather more important to an individual once they are aware of the gigantic gender disparity on its platform, or once they know how gloomy it feels to lose pleasure in music.


Spotify’s recommendation engine is not solely algorithmic; it also comprises Spotify’s vast number of editorial playlists. A key player in both camps is the Echo Nest which was acquired by Spotify for $100 million in 2014 and, according to Chodos, is “almost certainly” still in use on the platform today. The Echo Nest analyzes music in two ways: by processing the music’s audio signal itself to uncover technical details such as key and BPM, and through crawling the web (searching reviews, blog posts, tweets etc.) to find keywords that reveal cultural details. The classic example of what can be gleaned through this language processing, but not by analyzing the signal, is the distinction between Christian rock and regular rock music (Prey, 2017).


According to Chodos, the Echo Nest’s founder Brian Whitman “argues that by combining these two types of signal, one can come much closer to the true essence of musical meaning.” The successful cross-pollination of two distinct techniques is a common theme in this dissertation.


Chapter 2: Algorithms


Research conducted by Spotify’s scientists

Whilst a paper written by a team of Spotify’s own data scientists in collaboration with the University of Toronto (Anderson, Maystre, et al., 2020) opens by stating personalized recommendations are “strongly associated” with reduced consumption diversity, a subsequent paper written by a separate team of Spotify information scientists in collaboration with MIT (Holtz, Carterette, et al., 2020) opens with the declaration “It remains unknown whether personalized recommendations increase or decrease the diversity of content people consume.”


Whilst the latter paper references the former and ultimately comes to similar conclusions on how recommender systems “homogenize individual users’ consumption,” it is more measured in tone. By contrast, the research conducted by Anderson et al. seems flimsy, but their discussions and conclusions reach more interesting places.


The many parallels between a measure of diversity and the algorithms themselves

In outlining the complexities of their research, Anderson et al. discuss the difficulty of objectively measuring diversity (or, inversely, similarity). “To do so, one needs a consistent notion of similarity that can compare any pair of items, and also a way of scoring an arbitrary subset of items on a continuous scale from very similar to very diverse.” Anderson et al. eventually solve “the problem of quantifying content diversity in a principled way” by “[leveraging] large volumes of listening patterns to determine a consistent notion of similarity between any two songs.” In simpler terms, they base their definition of similarity entirely on streaming data (ignoring any musical, technical or socio-historical similarities between tracks), and two tracks are considered similar if there is overlap between who listens to them. Greater overlap signifies greater similarity.
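The logic of this overlap-based similarity can be sketched in a few lines of Python. This is a hypothetical simplification using the Jaccard index; Anderson et al.'s actual measure is learned from large volumes of listening data, and the listener names below are invented:

```python
# Illustrative sketch only: track similarity as the share of listeners in common.
# (Anderson et al.'s real metric is derived from large-scale listening patterns.)

def listener_overlap(listeners_a: set, listeners_b: set) -> float:
    """Jaccard index over listener sets: 0 = disjoint audiences, 1 = identical."""
    if not listeners_a or not listeners_b:
        return 0.0
    shared = listeners_a & listeners_b
    combined = listeners_a | listeners_b
    return len(shared) / len(combined)

track_x = {"ana", "ben", "cal", "dee"}
track_y = {"ben", "cal", "dee", "eli"}   # large audience overlap with track_x
track_z = {"flo", "gus"}                 # no audience overlap with track_x

print(listener_overlap(track_x, track_y))  # -> 0.6 (similar)
print(listener_overlap(track_x, track_z))  # -> 0.0 (dissimilar)
```

Greater overlap yields a higher score, exactly as the definition above requires.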


This is similar to a fundamental method of algorithmic recommendation still used by Spotify: collaborative-filtering. According to Ricci, Rokach and Shapira (2011), “the first [recommender systems] applied algorithms to leverage recommendations produced by a community of users. [...] This approach is termed collaborative-filtering and its rationale is that if the active user agreed in the past with some users, then the other recommendations coming from these similar users should be relevant as well and of interest to the active user.” In simpler terms, relevant recommendations are created by assessing what else is liked by other users with overlapping tastes (or a shared history of engaging with the same items). Again greater overlap signifies greater relevance.
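A minimal sketch of this collaborative-filtering rationale, assuming an invented set of user histories (Spotify's production systems are of course vastly more sophisticated):

```python
# Toy user-based collaborative filtering: recommend items liked by users
# whose listening histories overlap the active user's. All data is invented.

def recommend(active_user, histories, top_n=2):
    active = histories[active_user]
    scores = {}
    for other, items in histories.items():
        if other == active_user:
            continue
        overlap = len(active & items)      # shared history = taste similarity
        if overlap == 0:
            continue                       # ignore users with no shared taste
        for item in items - active:        # only items the active user lacks
            scores[item] = scores.get(item, 0) + overlap
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

histories = {
    "alice": {"song_a", "song_b", "song_c"},
    "bob":   {"song_a", "song_b", "song_d"},   # overlaps alice on two tracks
    "carol": {"song_e", "song_f"},             # no overlap with alice
}
print(recommend("alice", histories))  # -> ['song_d']
```

Bob's overlapping taste promotes his extra track to Alice, while Carol's disjoint library contributes nothing: again, greater overlap signifies greater relevance.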


Anderson et al.’s measure of similarity is also in line with accuracy: often the primary criterion used to test recommendation algorithms according to McNee, Riedl and Konstan (2006). Algorithms are typically tested for accuracy by “comparing the algorithm’s prediction against a user’s rating of an item.” This is commonly achieved via the leave-n-out approach (Breese, 1998): the algorithm is presented with a “percentage of the dataset” (e.g. a percentage of a user’s ‘Liked Songs’), and assessed on how much of the withheld data it can reproduce. Accuracy is itself aligned with the collaborative-filtering systems it is used to gauge. In the words of Zhou, Kuscsik, et al. (2010), “The focus on similarity is compounded by the metrics used to assess recommendation performance.”
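The leave-n-out procedure can be sketched as follows; the `recommender` here is a deliberately trivial stand-in, and the song names are invented:

```python
# Sketch of leave-n-out accuracy testing (after Breese, 1998): withhold part of
# a user's liked songs, show the recommender the rest, and score it on how much
# of the withheld portion it recovers. The "recommender" and songs are invented.
import random

def leave_n_out_accuracy(liked_songs, recommender, n=2, seed=0):
    rng = random.Random(seed)
    songs = sorted(liked_songs)
    withheld = set(rng.sample(songs, n))    # hidden from the algorithm
    visible = set(songs) - withheld         # what the algorithm is shown
    predictions = set(recommender(visible))
    return len(predictions & withheld) / n  # fraction of hidden likes recovered

# Toy stand-in: "recommend" the first few catalogue tracks the user lacks.
library = ["a", "b", "c", "d", "e", "f"]
recommender = lambda visible: [s for s in library if s not in visible][:3]

print(leave_n_out_accuracy({"a", "b", "c", "d"}, recommender))  # -> 1.0
```

A score of 1.0 means every withheld ‘Liked Song’ was regenerated: maximal accuracy, and, as the chapter goes on to argue, no guarantee of a good recommender.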


Since their measure of similarity/diversity has so much in common with how recommendation algorithms are built and how they are tested, it is no surprise that Anderson et al. “observed strong associations between algorithmic recommendations and long-term reductions in content diversity.” That is not to say that Anderson et al.’s research was fundamentally flawed. On the contrary, their work may be critical in understanding the perils of algorithmic listening.


Rather, it seems incredibly straightforward that algorithms optimized for finding patterns in streaming data would make these listening patterns further entrenched. McNee et al. describe this “narrow focus” on accuracy as “misguided” and even “detrimental to the field [of recommender systems].” McNee et al. use the hypothetical example of a “travel recommender system” to illustrate the inadequacy of accuracy-optimised algorithms’ recommendations. “Suppose all of the recommendations it gives to you are for places you have already traveled to. Even if the system were very good at ranking all of the places you have visited in order of preference, this still would be a poor recommender system.”

Short-term advantages versus long-term disadvantages

According to Zhang, Ó Séaghdha, et al. (2011), “An ideal recommendation system should mimic the actions of a trusted friend or expert.” This is at odds with the outputs of accuracy-optimised algorithms: for example, a friend constantly recommending films that you had already watched would become tedious. Additionally, a recommendation from an expert would be less personalized and less accurate, but far more valuable and universal, than the outputs generated by these algorithms.


However, accuracy-optimised recommendation systems are still powerful and impressive things, even if they do not broaden users’ horizons. Anderson et al. make a point of recognising “the [well-known] short-term benefits of recommender systems.” Perhaps recommender systems are somewhat mislabelled, allowing academics to win a semantic battle by comparing the algorithms’ outputs to recommendations from “a trusted friend or expert” and therefore arguing that the technology fails in its primary function.


However, Anderson et al. warn us “[recommender systems’] long-term impacts are less well understood.” A user may feel elated when Spotify auto-plays a song they love. But by selecting familiar material, Spotify is forgoing the opportunity to play the user something they have not heard before. In the short-term, it is perfectly reasonable to prefer old favourites over new discoveries. But long-term, it is harder to put a positive spin on the reduced consumption diversity caused by these highly accurate selections.


Anderson et al. observed that, amongst users with the lowest activity levels, those with the least diverse content consumption (i.e. “extreme specialists”) were 35% more likely to leave the platform than those with the most diverse content consumption (i.e. “extreme generalists”). Since “conversion to subscriptions and retention on the platform are very strongly associated with greater content diversity,” it is within Spotify’s own interests to foster consumption diversity on its platform.


Balancing accuracy and diversity

Zhang et al. believe the key to increasing diversity and escaping a “self-reinforcing cycle of opinion” is to prioritise “non-accuracy factors” in the design of recommender systems and “inject novelty, diversity, and serendipity into the recommendation process.” Similarly, McNee et al. highlight serendipity as the crucial factor in designing recommender systems that are “not only accurate and helpful, but also a pleasure to use.”


Yet algorithms with other factors prioritized over accuracy are not necessarily a magic bullet. They may not even be suitable for Spotify’s platform. Anderson et al. argue Spotify’s algorithms need to produce content that is relevant and engaging first and foremost. Algorithmic recommendations that achieve diversity may be lacking in other areas, negatively affecting users in other ways. “Recommending content that users do not engage with because they are not relevant or of low quality can have severe implications long-term” as well. Though there may be conflict between “short-term and long-term goals,” for users to be satisfied in the long-term they must first be satisfied in the short-term.


However, many proponents of serendipitous algorithms still recognise and even stress the importance of short-term engagement. Anderson et al. commend the “rich line of work proposing diversity-aware recommendation algorithms,” much of which appreciates the vital importance of accuracy, such as the “recommendation algorithm that promotes diversity without sacrificing too much accuracy” developed by “Zhou et al.”


Zhou, Kuscsik, et al. (2010) report that an accuracy-optimised algorithm (in their case “ProbS”) can be hybridized with a diversity-optimised algorithm (in their case “HeatS”) to achieve the best of both worlds. “The accuracy of ProbS can thus be maintained and even improved while simultaneously attaining diversity close to or even exceeding that of HeatS.” This is achieved by “tuning the hybridization parameter λ appropriately.” Zhou et al. even calculated optimal values of λ that “simultaneously increase both accuracy and diversity of recommendations.” Furthermore, the optimal value of λ varied depending on the dataset being tested: 0.23 for the Netflix dataset and 0.41 for the Rate Your Music dataset, “where λ = 0 gives us the pure HeatS algorithm, and λ = 1 gives us pure ProbS.” It is noteworthy that the diversity-optimised algorithm is given greater weighting across both datasets, but it is unknown what the optimal value of λ would be for a Spotify dataset or indeed across Spotify’s platform. Zhou et al. suggest that, whilst a single optimal value of λ could be set globally across an entire platform, “there is no reason why it cannot be tuned differently for each individual user [...] by the system provider.”
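The role of the hybridization parameter can be illustrated with a toy sketch. This is a simplification: Zhou et al.'s actual hybrid interpolates inside the algorithms' transition matrices rather than averaging their final scores, and the scores below are invented:

```python
# Toy sketch of a tunable accuracy/diversity blend. Zhou et al.'s real
# HeatS+ProbS hybrid works inside a single transition matrix; averaging
# final scores, as here, is only a conceptual simplification.

def hybrid_scores(probs_scores, heats_scores, lam):
    """lam = 1 -> pure ProbS (accuracy); lam = 0 -> pure HeatS (diversity)."""
    return {item: lam * probs_scores[item] + (1 - lam) * heats_scores[item]
            for item in probs_scores}

probs = {"hit_song": 0.9, "deep_cut": 0.2}   # invented accuracy-driven scores
heats = {"hit_song": 0.1, "deep_cut": 0.8}   # invented diversity-driven scores

for lam in (0.0, 0.23, 1.0):                 # 0.23 was Zhou et al.'s Netflix optimum
    scores = hybrid_scores(probs, heats, lam)
    print(lam, max(scores, key=scores.get))  # low lam favours the diverse pick
```

At λ = 0.23 the diverse pick still wins, which is the sense in which the sub-0.5 optima give the diversity-optimised algorithm greater weighting.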

Allowing users to balance accuracy and diversity themselves

However, it may be even more interesting to allow Zhou et al.’s hybridization parameter to be set “by users themselves.” Zhang et al. conducted a small study where 21 participants were each “presented with two (unlabelled) Top-20 recommendation lists” generated by two different algorithms. The majority of participants (12 against 7, with 2 undecided) preferred the algorithm that was more diversity-optimised despite also rating it as less enjoyable (by −0.39 on a 5-point Likert scale, or 9.75% less enjoyable). Though these results are not stark, and the sample size is small enough that the study may not be reproducible, it is compelling that the participants preferred the algorithm that showcased less familiar music, and would theoretically lead to greater consumption diversity in the long-term, despite also recognising this selection as less enjoyable.


In the battle of “short-term vs long-term engagement,” Anderson et al. seem to believe that the benefits of diversity-optimized algorithms take weeks or months to materialize. However, Zhang et al.’s data implies that users can immediately gauge the benefits of diversity-optimised algorithms. Instead, the issue may be that Spotify are unable to read their users’ immediate approval of diverse recommendations from users’ immediate engagement metrics. On the other hand, perhaps the participants only made the mental leap necessary to appreciate diverse recommendations from thinking unusually critically about the recommendations in the context of an academic study. On the third hand, this data is neither pronounced nor reliable enough to draw any grand conclusions beyond acknowledging that some participants preferred the diversity-optimised algorithm whilst others preferred the accuracy-optimised algorithm.


Yet all three of these cases support Zhou et al.’s proposal of users setting their own hybridization parameter, or what Zhang et al. describe as “an adaptive recommendation system, where users can individually tune the level of ‘wildness’ in recommendations.” If the first case is true, most users, if given the option, would voluntarily increase the diversity of their recommendations with tacit knowledge that this will improve their long-term experience on the platform. If the second case is true, users may come to the same conclusion, but only after the platform itself ignites this thought process. In their research on increasing the geographic diversity in the timelines of Twitter users in Chile (a highly-centralized country), Graells-Garrido, Lalmas and Baeza-Yates (2016) show how important it is to “make users aware of such diversity.” Most users in their study did not notice the increased diversity in their feeds unless the user interface was redesigned to illustrate it to them. Perhaps it would be beneficial for Spotify to concisely communicate the pros and cons of accuracy-optimized vs diversity-optimized recommender systems to its users, and allow them to make informed decisions about tuning their recommendations.

In contrast to Zhang et al.’s results, Graells-Garrido et al. found in their study that “popularity is slightly more valued than diversity.” This gives further support to the third case. It is not groundbreaking to acknowledge that different people, perhaps even different groups of people, think differently about content diversity and have more or less adventurous tastes. Therefore, it could be beneficial for Spotify to accommodate individual users’ tastes by allowing users to decide for themselves how diverse their recommendations are.


Has Spotify learnt anything from the research conducted by their scientists?

Anderson et al. are researchers on Spotify’s payroll and their 2020 article references the work of Zhou et al. (2010), Zhang et al. (2011), and Graells-Garrido et al. (2016). This means that (at least part of) the Spotify organization are fully aware of how their algorithms reduce consumption diversity, and of the many existing proposals to solve this problem. Yet Anderson et al. do not drop hints about the progress their organization has made in this area over the past ten years, and instead cite these decade-old articles as if they are as pertinent and applicable as ever. It is hard to uncover whether any of this previous research has led Spotify to alter their algorithms, or if Anderson, Maystre, et al.’s conclusions were heeded by their own employers, as the internal machinations of Spotify (the organization) are as opaque to outsiders as its black box algorithms.

Chapter 3: Editorial Playlists


Organic listening

In spite of their knowledge (and therefore Spotify’s knowledge) of the various strategies for diversifying the outputs of recommender systems without impeding accuracy, Anderson et al. still note “when users become more diverse in their listening over time, they do so by shifting away from algorithmic consumption and increasing their organic consumption.” Perhaps Spotify’s algorithmic listening experience is especially restrictive, or perhaps Spotify’s organic listening experience is especially exploratory. Chodos (2019) sings the praises of Spotify’s search box, a key starting point for organic listening, which “accepts virtually anything as valid input. Users can enter particular artists, albums, and songs, but they can also enter genres, moods, or other kinds of musical keywords.”


However, it is increasingly hard to define what counts as purely organic listening once users leave the safety of their assembled libraries. Not all the results returned from a search will be entirely organic, nor are the different types of result presented any differently. Chodos writes: “a hub for black history is now clickable in the exact same way as a radio station seeded by Parliament. So is the artist Parliament, as is their classic 1978 track, Flashlight.” To an uncritical user, there is no clear distinction between algorithmic and organic consumption, especially in the foggy grey areas. All these forms of content delivery, from organically selectable single tracks to endless algorithmic radio, are labelled and playable in the exact same way with “the same inviting play button.”


Half man, half machine, 100% Spotify playlist

More specific searches of the “genre/mood/keyword” kind (e.g. ‘dc hardcore’ or ‘calm bjork’) will likely return user playlists. In this way, the musical expertise of one user can be organically and effectively shared with many others.


Simpler and broader search terms (e.g. ‘summer’ or ‘jazz’) often lead to hubs which, according to Spotify’s Explore webpage, comprise “all the content related to a single genre, theme, or moment” such as “League of Legends, Ramadan, and Wellness” alongside more typical musical genres. Whilst some hubs have more sophisticated features, at their core they are simply collections of related editorial playlists.


These editorial playlists should be considered at least a semi-algorithmic form of content delivery since they are assembled with the powerful algorithmic tool Truffle Pig (created specially for Spotify employees by the Echo Nest) and rely more on engagement metrics than the editor’s own volition, especially once they have been published (Pierce, 2015). The name Truffle Pig implies software that can pick out exactly suitable and possibly obscure songs from amongst the overwhelming noise of Spotify’s vast catalogue: songs that a human might otherwise miss. As Jim Lucchese (then CEO of the Echo Nest) told Pierce, “Say you want high acousticness with up-tempo tracks that are aggressive up to a certain value. It’ll generate a bunch of candidates, you can listen to them there, and then drop them in and add them to your playlist.” This algorithmic assistance allows Spotify to curate a vast number of playlists: 3,000 according to Ross (2020).
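The kind of attribute query Lucchese describes might look something like the following sketch; Truffle Pig itself is not public, so the feature names, thresholds and catalogue are invented for illustration:

```python
# Hypothetical sketch of a Truffle-Pig-style attribute query: filter a
# catalogue on audio features to generate playlist candidates. The feature
# names, thresholds and tracks are invented; the real tool is not public.

catalogue = [
    {"title": "track_1", "acousticness": 0.9, "tempo": 140, "aggressiveness": 0.3},
    {"title": "track_2", "acousticness": 0.8, "tempo": 150, "aggressiveness": 0.9},
    {"title": "track_3", "acousticness": 0.2, "tempo": 100, "aggressiveness": 0.1},
]

candidates = [t["title"] for t in catalogue
              if t["acousticness"] > 0.7      # "high acousticness"
              and t["tempo"] > 120            # "up-tempo"
              and t["aggressiveness"] < 0.5]  # "aggressive up to a certain value"

print(candidates)  # -> ['track_1']
```

The editor then auditions the surviving candidates and drops the keepers into the playlist, which is what makes the result semi-algorithmic rather than purely curated.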


However, there is no guarantee that user playlists are uncontaminated by algorithms either. When a user makes a new playlist, Spotify algorithmically recommends tracks below the playlist window that the user can add by clicking. In fact, a user need only name a newly created playlist ‘Jazz Funk’ (without adding any songs to it) for Spotify to recommend tracks by Herbie Hancock and Kool & The Gang. Since 2016, when this feature was implemented (u/McCool71, 2016), it has been impossible to know how much of a Spotify user playlist was inspired by organic human thought and how much through algorithmic recommendation.


Perhaps all the music played now, even on terrestrial radio, contains at least the echoes of algorithms: a listener may never know whether they came across a track through an unbroken chain of organic human decisions, or because of algorithmic involvement somewhere in the pipeline. It may even be reductive to discern organic from algorithmic listening, as Anderson et al. did in their study. Yet as much as the Echo Nest founders “hate this stupid man versus machine dichotomy” (White, 2019), refraining from dichotomizing Spotify’s cyborgian framework may render its dissection a fruitless exercise. Perhaps the vast grey areas need to be filtered out for interesting patterns or applicable conclusions to appear.


The contextual or curatorial shift

Spotify’s approach of supplying users with (semi-algorithmic) playlists for every situation may still be considered algorithmic listening, but in a less personalized, less accurate, more curatorial, more contextual form.


Pagano, Cremonesi, et al. (2016) describe a then-imminent “move towards context-driven recommendation algorithms” or “contextual turn” which will “[give] users room to develop beyond their past needs and preferences” and “counter the dangers of hyper-personalization, i.e., the Filter Bubble problem.” This context is not the context of the music itself (historical, audiential, artistic, or otherwise). It is instead the context of the user: where they are, what they are doing, and what prompted them to open Spotify. Pagano et al. propose an underlying philosophy for prioritising this context: “people have more in common with other people in the same situation, or with the same goals, than they do with past versions of themselves.” Prey (2018) provides the example of a jogger having more in common with other joggers than with themselves at other times of the day. The jogger would therefore be better served by the ‘Workout’ hub than the ‘Focus’ hub they may have used earlier to aid their concentration.


However, Pagano et al. did not predict the arrival of the hubs feature, or anything like it, and instead envisioned context-data as being sourced from smartphones’ sensors and the Internet of Things. Yet Spotify need not analyse a device’s location and gyroscope data to calculate its best guess at the user’s activity and mood. Instead, the user can simply open the Spotify app, scroll through an array of colourful tiles with labels like ‘Workout’, ‘Chill’ and ‘Party’, and select whichever hub is most apposite. Perceptively, Chodos identifies this ability to supply “the right music for every moment” as the manifestation of a “curatorial shift” rather than a contextual shift.


No matter how it was put into practice on their platform, Prey (2018) agrees that this shift prevents Spotify’s recommendation algorithms from pigeonholing users, allowing them to enjoy very fluid identities. He offers “You are a suburban lover of smooth jazz … until you are not” as an example of this categorical fickleness. Similarly, Spotify spokesperson Ajay Kalia has assured us that Spotify respects the multitudinous aspects of its users’ wants and needs: “we believe that it’s important to recognize that a single music listener is usually many listeners” (Heath, 2015). Spotify CEO Daniel Ek also stressed the importance of Spotify delivering content to match the listener’s situational context in a 2015 speech: “If we want Spotify to be the soundtrack of our life, we need to deliver music based on who we are, what we’re doing, and how we’re feeling moment by moment, day by day” (Segall, 2015).


However, Chodos describes the same concept more gloomily as “the subtle creep of the soundtrack, the idea that music is generally supplemental to other activities and modes of consumption.” He believes applying music to every situation equates to engaging less and less with the music itself. Music is stripped of its value as art, and is instead judged by how well it “matches whatever non-musical activity you happen to be engaged in.” He singles out the hub “curiously” dedicated to Ellen DeGeneres (the ‘Ellen’ hub) and pulls apart her tagline: “music truly makes everything better. Well, music and salt.”


Chodos scathingly remarks that for Spotify, it seems music “also shares with salt the property of not being very good on its own.” Whilst Spotify’s contextual or curatorial shift may improve users’ consumption diversity, it also debases the content’s value as absolute music, downgrading it to soundtrack. Chodos also states it does nothing to reach “users who know what they like” (users who Anderson et al. may identify as “extreme specialists” in dire need of increasing their consumption diversity) who he argues would not engage with the hubs feature.


The curatorial shift’s effect on diversity

Although Pagano et al. recognise that Spotify prescribing music for users’ current contexts “gives rise to its own bubble effect,” these bubbles are easier to burst since users frequently change the context of their listening (i.e. “the situation and goal of the moment”) and can easily switch to different hubs. However, this bubble-bursting only occurs on the micro-level of the individual user. Consider that perhaps no single person would have previously known all the dozens of artists of varying popularities that are featured on an editorial playlist such as ‘Peaceful Piano’, yet hundreds of thousands of users will then listen to these same artists in identical sequence. Whilst personalized recommendations “create consumption patterns that are homogenous within users and diverse across users” (Holtz, Carterette, et al., 2020), editorial playlists do the opposite and create consumption patterns that are diverse within users but homogenous across users.


However, as we’ve seen with the Echo Nest’s hybrid approach of combining audio signal analysis with web-based context analysis (Chodos, 2019), and the demonstrable success of hybridizing accuracy-optimised algorithms and diversity-optimised algorithms (Zhou et al., 2010), perhaps a hybridization of personalized and curatorial recommendations fosters greater consumption diversity than can be wrung from either recommendation method on its own. Since 2019, the two methods have been truly cross-pollinated to create personalized editorial playlists which Spotify boasts “[increase] the number of artists featured on playlists by 30% and the number of songs listeners are discovering by 35%” alongside boosting saves and repeat listens (Spotify for Artists, 2019). Perhaps, for all its macro-homogenization and artistic debasement, the curatorial shift is made good through its parentage of a best-of-both-worlds scenario. And in spite of his caustic criticism, Chodos concedes the curatorial shift was necessary for Spotify’s survival as a business. Its previous hands-off approach may still “[hold] appeal for aficionados and professionals,” but Spotify had to attract a broader audience for whom music belongs in the background rather than the foreground.


Chapter 4: Shuffle


The power of Shuffle

Powers (2014) contextualizes discussions of recommendation algorithms with what he considers the algorithms’ predecessor: shuffle play. Rather than requiring sophisticated algorithms to evaluate and rank music according to its relevance, Shuffle uses comparatively crude code to queue up all the music in a library in a pseudo-random sequence. Shuffle’s underlying philosophy is founded on randomness, which is easy to compute, as opposed to relevance, which is far harder. Shuffle does not know or care about the web of connections between artists and fanbases; it simply takes all the songs in a user’s library and draws them out of a hat. A user’s favourite song is no more likely to come on next than their least favourite but still-somehow-on-their-iPod song. In this way, Shuffle greatly equalized playability across each user’s music collection. Shuffle also allowed challenging new music to be listened to in a manner that was less deliberate and intimidating. New albums no longer had to be listened to in single sittings; they could be introduced into a blend of mostly familiar favourites and absorbed more easily, the same way the family dog will only take her pills crushed up and mixed into her regular chow.
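That "comparatively crude code" can be illustrated with the textbook Fisher–Yates shuffle, under which every ordering of the library is equally likely; the iPod's actual implementation was never published, so this is only the canonical algorithm, not Apple's code:

```python
# Textbook Fisher-Yates shuffle: each of the n! orderings is equally likely,
# so no track is privileged over any other. (Apple's actual shuffle code was
# never published; this is the canonical algorithm, not their implementation.)
import random

def fisher_yates(tracks, rng=random):
    queue = list(tracks)
    for i in range(len(queue) - 1, 0, -1):
        j = rng.randrange(i + 1)               # uniform pick from the unshuffled prefix
        queue[i], queue[j] = queue[j], queue[i]
    return queue

library = ["favourite", "filler", "guilty_pleasure", "forgotten_b_side"]
print(fisher_yates(library))  # some permutation: every track equally likely anywhere
```

Because each permutation is equiprobable, the favourite track has exactly the same chance of coming up next as the forgotten B-side, which is precisely the equalizing property Powers describes.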


Shuffle facilitates both serendipitous music discovery and ‘lean-back’ music listening and, despite its underlying operational differences, Shuffle served a similar function to recommendation algorithms only a decade ago. Powers believes that this similarity between the user experiences of Shuffle and recommendation algorithms “has tremendous importance during this moment when we have ceded attention, status, and magic to the algorithms.” He argues that “there is far less distinction between [randomness] and more orderly organizations than we presume,” and the mania incited by Shuffle in its mid-noughties heyday yields a great pinch of salt to take with the polemics that now surround recommendation algorithms.


Contemporary criticism

Since it has been eclipsed by a more sophisticated technology, the hopes and suspicions fuelled by Shuffle now look rather quaint. For example, Shuffle did not always feel random and this made people question its basic mechanics, both sincerely and humorously. In his 2007 book on the cultural impact of the iPod, Levy wrote “My first iPod loved Steely Dan. So do I. But not as much as my iPod did. [...] Meanwhile, it began to dawn on me that there were songs, and even entire artists, that my iPod had taken a dislike to, if not a formal boycott.” Whether this phenomenon was technical or psychological, it is worth noting that there were still doubts raised around the equity of Shuffle: a far simpler, less mysterious, and seemingly fairer technology than today’s recommendation algorithms. Perhaps there is no way to recommend items that does not stir controversy. Whilst Powers grounds today’s algorithmic recommender systems with a retrospective review of Shuffle, Chander (2019) compares the obscurity of algorithmic ‘black boxes’ with the similarly obscure human alternative: “The ultimate black box is the human mind.”


Smart features

Powers posits Apple’s Smart Shuffle as their response to the accusations of favouritism levied at Shuffle, and describes the feature as allowing the user “to tinker with the frequency of songs from a particular artist or album.” However, Smart Shuffle does not actually have this functionality and instead deals with the streakiness that occurs in truly random patterns: the user sets a single slider to determine whether Shuffle is ‘more likely’ or ‘less likely’ to play sequential songs from the same artist or album (Fryer, 2008).
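Fryer’s description suggests behaviour along the following lines. This Python sketch is a guess at the mechanism (Apple’s actual code is unpublished), and only implements the streak-breaking, ‘less likely’ half of the slider:

```python
import random

def smart_shuffle(tracks, same_artist_bias=0.0):
    """A guess at Smart-Shuffle-style streak control.

    tracks: list of (title, artist) pairs.
    same_artist_bias: the slider. Only negative values ('less likely'
    to hear the same artist twice in a row) are sketched here;
    0.0 gives a plain random shuffle.
    """
    queue = tracks[:]
    random.shuffle(queue)              # start from an ordinary shuffle
    for i in range(1, len(queue)):
        same = queue[i][1] == queue[i - 1][1]
        # With probability |bias|, repair a same-artist pairing by
        # swapping in a randomly chosen track from later in the queue.
        if same and same_artist_bias < 0 and random.random() < -same_artist_bias:
            j = random.randrange(i, len(queue))
            queue[i], queue[j] = queue[j], queue[i]
    return queue
```

The ‘more likely’ half of the slider would do the opposite, seeking out a later same-artist track to pull forward.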


Powers is not alone, however, in bungling the bounds of Smart Shuffle’s functions: Fryer’s blogpost from the time complains Smart Shuffle is also misrepresented in an NPR special entitled ‘Doing the iPod Shuffle.’ Perhaps the name Smart Shuffle promises more than it delivers. The delicious combination of intelligence and randomness implies pure serendipity and describes a powerful music recommendation tool far exceeding the feature’s actual basic functionality.


Fryer goes on to commend iTunes’ Smart Playlist feature which can generate “playlists containing songs with desired attributes” such as specified genres and release years. Since algorithmic recommendation is so often encapsulated in Spotify’s playlists, perhaps iTunes’ Smart Playlist feature is an even more pertinent point of comparison, and a truer predecessor, than Shuffle. Fryer highlights Smart Playlist’s ability to “create playlists of songs which have not been listened to in a specified period of time, like the past month.” Although iTunes’ Smart Playlists operate within the limited confines of user libraries, they still have the power to sniff out the dustiest least-listened-to cuts and squeeze as much content diversity out of users’ collections as they can. Compare this to Spotify’s ‘On Repeat’ playlist (Spotify Newsroom, 2019) which, out of the 70 million tracks on the platform, feeds the user back only the tracks they already play over and over, possibly shrinking their musical comfort zone to only a handful of songs. Spotify has several other personalized playlists that further demonstrate its preoccupation with prescribing users only the songs they already listen to most, including ‘Repeat Rewind’, ‘Summer Rewind’ and of course their annual ‘Your Top Songs’ playlists.
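The rule-matching Fryer describes amounts to a simple filter over library metadata. A sketch follows; the field names are illustrative and not iTunes’ actual schema:

```python
from datetime import datetime, timedelta

def smart_playlist(library, genre=None, year=None,
                   not_played_for_days=None, now=None):
    """Rule-based selection in the spirit of iTunes' Smart Playlists.

    library: list of dicts with 'title', 'genre', 'year' and
    'last_played' (a datetime, or None if never played).
    Any rule left as None is ignored.
    """
    now = now or datetime.now()
    selected = []
    for track in library:
        if genre is not None and track["genre"] != genre:
            continue
        if year is not None and track["year"] != year:
            continue
        if not_played_for_days is not None:
            cutoff = now - timedelta(days=not_played_for_days)
            # never-played tracks (last_played is None) always qualify
            if track["last_played"] is not None and track["last_played"] > cutoff:
                continue
        selected.append(track["title"])
    return selected
```

Setting `not_played_for_days=30` reproduces Fryer’s example of resurfacing songs untouched in the past month, the inverse of ‘On Repeat’ logic.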


Perhaps Spotify can learn from Smart Shuffle’s customizable slider. At present, the only way users can adjust their experience of Spotify’s recommendation algorithms is by toggling autoplay (i.e. choosing whether Spotify stays silent or plays algorithmically selected music after each album or playlist ends). The single slider that constitutes Smart Shuffle's user interface, tucked away in 'Preferences', sets a precedent for a hypothetical feature where users "can individually tune the level of 'wildness' in recommendations” (Zhang et al., 2011) or set their own "hybridization parameter" (Zhou et al., 2010), as discussed in Chapter 2. Following Smart Shuffle's model, the slider for this proposed Spotify feature could be labelled 'more accurate' at one end and 'more diverse' at the other, with a small amount of text nearby outlining the feature's functionality. Similarly, Spotify could follow the precedent set by Smart Playlists by implementing a powerful playlist creation tool, perhaps through putting Truffle Pig (the software Spotify's editors use to create curated playlists) in the hands of its users.
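In its simplest form, such a ‘wildness’ parameter could be a linear blend of two pre-computed scores. The following is a hypothetical sketch; the field names and scoring scheme are assumptions, not Spotify’s actual systems:

```python
def rank_recommendations(candidates, wildness=0.5):
    """Rank candidates with a user-tunable accuracy/diversity slider.

    candidates: list of dicts with 'track', 'accuracy' (predicted
    taste match, 0-1) and 'diversity' (distance from the user's usual
    listening, 0-1). wildness=0.0 ranks purely by accuracy ('more
    accurate'); wildness=1.0 purely by diversity ('more diverse').
    """
    def blended(c):
        return (1 - wildness) * c["accuracy"] + wildness * c["diversity"]
    return sorted(candidates, key=blended, reverse=True)
```

This mirrors the “hybridization parameter” of Zhou et al.: one scalar, exposed to the user, decides how far recommendations stray from the familiar.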


Shuffle versus Spotify

In 2004, the New Yorker’s Alex Ross sang the praises of Shuffle: how it encourages the listener to think more open-mindedly about genre by decontextualising the music as it “goes crashing through barriers of style.” Ross declared Shuffle to be “the future” (an ironic claim now, considering Shuffle’s semi-obsolescence) and predicted that Shuffle would get teenagers listening to far more classical music. Since, crucially, Shuffle does not introduce new content and only randomizes the sequence of a user’s existing library, Ross believes that Shuffle broadens users’ tastes by teaching them “music is music” and training their ears to be universally less discriminatory of genre. Shuffle strips music of context and, as Ross expounds, “[frees it] from all fatuous self-definitions and delusions of significance.” And here Shuffle’s values clash with recommender systems, which heavily rely on whatever context they can glean. For example, according to its co-founder Brian Whitman, the Echo Nest uses “community metadata” gathered from web crawling to “[infer] descriptions and similarities for music” (Whitman and Lawrence, 2002).


However, Shuffle can still be thought of as crudely analogous to the recommender systems that strike a good balance between accuracy and diversity, such as Zhou et al.’s hybridized algorithms. Shuffle’s random selections provide the diversity whilst the confines of the user’s assembled library provide the accuracy. The original iPod Shuffle could only hold 240 songs (Beer, 2009) whilst Spotify currently has 70 million tracks on its platform (Spotify Newsroom, 2021): a pool of music almost 300 thousand times greater. Perhaps Powers’s central thesis, “that there is continuity, rather than stark division, between randomness and order,” is flawed. Perhaps it is unfair to compare algorithms designed to sift through millions of songs to a random number generator called Shuffle used to sequence merely hundreds, and incorrect to assume the former still has lessons to learn from the latter.


On the other hand, there may indeed be a place for sophisticated recommendation engines to adopt chaos and disorder as tenets. One of the Echo Nest’s software developers, Paul Lamere, developed a tool that (mis)uses its famous algorithms and database to produce “the most ill-fitting recommendations possible,” (Morris, 2015). Lamere christened this tool “The Wreckommender,” and inputting Hannah Montana, for example, “yields a playlist with tracks by Glenn Gould, Dream Theater and Al Hirt,” (Lamere, 2009). Whilst the Wreckommender is no longer online, and was created “as a novelty for a music hack day event in 2009,” Morris asserts “it confirms that the logics to which algorithms conform are editable and capable of being tuned to multiple ends.” In this way, recommender systems may take on the spirit of Shuffle and “maintain a bit of disorder,” even if the accuracy that comes for free within a user’s library must be hard-won across a streaming site’s vast catalogue. Shuffle’s valuable lessons are not hidden within its simple code, but can be taken from the enthusiastic response to how Shuffle changed the way people listened to music: what Lydon (2007) contemporaneously labelled “the cult of the random.”


Shuffle on Spotify

Having followed on from Powers’ assertion that Shuffle is the predecessor to today’s recommendation algorithms by starkly comparing the two playback systems, it is worth remembering that both currently coexist on Spotify. Although YouTube Music (one of Spotify’s chief competitors) has abandoned this simple method of mixing up content (u/histrel, 2021), Shuffle is still integral to Spotify’s user experience and inescapable on the mobile version of its Free tier. Here Shuffle operates constantly; this is as much convenience as “the major music recording companies” are willing to grant (Van Camp, 2013). Spotify then rations the number of tracks the Free tier mobile user can skip to incentivize subscription to its Premium tier. This constant-shuffle strategy is presumably effective and examining it alongside Anderson et al.’s research into the benefits of consumption diversity can lead to some interesting conclusions. Anderson et al. observe that users with greater consumption diversity are more likely to join the Premium tier, whereas users with less consumption diversity are more likely to leave. If constant shuffling increases consumption diversity and Premium access inadvertently traps users in filter bubbles, then Spotify’s imposition of Shuffle on its Free tier may keep Free users wanting more, listening more broadly, and possibly feeling more satisfied than Premium users.

Chapter 5: Affirmative Action


The lack of gender diversity in the music industry

Compared to other creative fields, women are vastly underrepresented in the music industry. Whilst women comprise 60.9% of ‘writers and authors’ in the US (Wicht, Waldfogel and Waldfogel, 2018), they comprise only 12.6% of ‘songwriters’ (Smith, Pieper, et al., 2021). Furthermore, it is difficult to pinpoint where the greatest problems occur along this leaky pipeline. In their report for the European Commission, Wicht et al. investigated whether Spotify’s editorial playlists were at fault. Yet they concluded that the underrepresentation of women among Spotify’s top artists “mainly arises from the relatively low female share of songs on the platform rather than anti-female bias in playlist decisions.”


Although this report successfully answers in the negative the question it investigated (whether Spotify’s semi-algorithmic playlisting methods exacerbate the music industry’s sexism), to take its findings at face value would tacitly relieve Spotify of any responsibility to lessen the problem. As Eldridge Cleaver, early leader of the Black Panther Party, once said: “If you are not part of the solution, you are part of the problem.”


Algorithms versus affirmative action

Chander (2017) argues that an algorithm should never exhibit racism or sexism. If the dataset the algorithm works on is tarnished with such prejudices, it is not acceptable for the algorithm’s results to echo this prejudice. Rather, it is all the more important that the algorithm’s results are non-discriminatory. Chander states “Our prescription to the problem of racist or sexist algorithms is algorithmic affirmative action.”


However, Spotify may wish to avoid drawing attention to the scale of underrepresentation on their platform, and also avoid getting their hands dirty trying to solve it. According to Chander, corporations refrain from engaging with algorithmic affirmative action “for fear that [it] might be used to argue that they intended, or at least abided, any discrimination that persists.” If Spotify openly adopted affirmative action, they might not only lose the artists and listeners who consider such measures to be unfair or discriminatory; they may also receive criticism from those who thought Spotify were not going far enough in their efforts. From Spotify’s perspective, the safest policy may be to sweep the problem under the rug. However, Chander stresses “efforts to avoid and ameliorate discrimination should be recognized as exculpatory, not incriminating.”


Nevertheless, operating with an agenda, even an agenda as noble as redressing past wrongs and promoting diversity, may seem antithetical to the utility of algorithms. As computational processes, algorithms seem immune to human error and irrationality. Their results may be cold but they should at least be objective. Yet this may be a huge misconception. Gillespie (2014) denounces this “promise of algorithmic objectivity” as a “carefully crafted fiction” and states it is impossible for the designers of any algorithm to be truly “hands-off.”


Similarly, Pasquale (2015) describes how algorithms’ opacity and inscrutability produces a mere “patina of inevitability.” Without knowing an algorithm’s inner workings, it is impossible to see how its factors could be weighted differently or its code rewritten (which may radically change its results) and whatever falls out of the black box cannot be properly disputed. Yet Chander (2017) argues extensively that the fairness of an algorithm can be assessed without examining the algorithm itself. The so-called black box is “protected from scrutiny by trade secret law,” and even if it could be cracked open, doing so may only reveal an algorithm “too complex to understand.” What is far more transparent and revealing is how the data flowing out of the algorithm compares to the data flowing in. “By focusing on inputs and outputs, we can more readily identify [an algorithm’s] disparate impact.” Chander would therefore approve of Wicht et al.’s methodology of investigating the gender bias of Spotify’s recommendation engine by comparing the prevalence of female artists in Spotify’s editorial playlists with the share of female artists on the whole platform. Pierce’s (2015) descriptions of Spotify “perfectly [blending] man and machine” also match Wicht et al.’s methodology: their research is a “study [of] the playlist editors and algorithms at Spotify” that does not attempt to distinguish the effects of human editors from algorithms.


Still, if algorithmic objectivity is a facade, it is a useful facade for businesses to hide behind. Morozov (2011) writes that Google defers to “algorithmic neutrality” whilst failing to acknowledge its algorithms as “highly political.” Gillespie (2014) defines Morozov’s arguments as centering around corporations “[deflecting] responsibility,” with Morozov attributing Google’s deflection to “the company’s growing unease with being the world’s most important information gatekeeper.” There is even a precedent of an internet giant deferring to algorithmic objectivity at the clear cost of objectivity itself: after facing criticism that its Trending feature favoured “news stories that were biased against conservatives,” Facebook fired all of its human editors and replaced them with algorithms that “immediately [posted] fake news,” (Newitz, 2016). Like other major internet platforms, Spotify may be unwilling to shatter the illusion of algorithmic objectivity, and doubly unwilling to do so by adopting a strategy as generally controversial as affirmative action.


Differing opinions

Not everyone is convinced by affirmative action’s counter-intuitive methodology of addressing discrimination by taking characteristics such as gender fully into account when making decisions. For example, John Roberts, chief justice of the United States, stated “The way to stop discrimination on the basis of race is to stop discriminating on the basis of race,” in a ruling against school districts that used race-conscious student assignment plans (Parents Involved in Community Schools v. Seattle School District No. 1, 2007). In his dissent against this infamous ruling, associate justice Stephen Breyer drew a clear distinction between being racist and being race-conscious. Whilst both justices agree that evaluating people on the basis of their race is undesirable in principle, Breyer posits that the outcomes of affirmative action must be weighed against the consequences of inaction and concludes the ends justify the means.


Whilst this legal theory cannot be directly applied to the relationship between Spotify and the self-employed artists on their platform, it still provides a good grounding in the various attitudes to affirmative action in wider society, beyond what is just technically possible or academically ideal. It is especially important to investigate these attitudes in Spotify’s native Sweden, where the Supreme Court ruled against affirmative action policies implemented by Uppsala University, with district courts subsequently ruling against other universities as well (Holmlund, 2007). Similarly, Freidenvall (2015) notes “Despite the fact that Sweden has been recognized as a model of gender equality [...] legal gender quotas have not been enacted,” and “there was considerable opposition to quotas, even among women.”


It is also important to recognise that Spotify is a multinational corporation and may therefore struggle to implement a strategy of affirmative action palatable to its entire global market. According to Archibong and Sharps (2013) “some forms of affirmative action in the United States, such as quotas, [...] would be labeled positive discrimination rather than positive action [in the United Kingdom].”


Globally, ‘tie breaking’ may be the only widely acceptable application of affirmative action. One arena where ties or near ties may occur between Spotify’s artists is on their curated playlists. Yet it is difficult to discern who or what is really in control here. Wicht et al. imply Spotify’s editors have discretion over established playlists: “Suppose an editor is choosing what rank to give a song on the New Music Friday list.” Pierce, on the other hand, states Spotify playlists “live and die by data.” Whilst human editors generate the initial idea for a playlist and select the first crop of tracks (assisted by the powerful algorithmic tool Truffle Pig), they rely on “data about music, along with constant feedback about what people are and aren't listening to, [...] to perfectly match their initial hypothesis.” However, these data on track engagement are likely to be too precise for exact ties to occur between tracks at risk of elimination or on the cusp of inclusion. Therefore, nudging up tracks from underrepresented groups that lie on the playlists’ boundaries may be considered excessive positive discrimination. On the other hand, these data are likely to be complex (collating many metrics such as streams, saves, skips et cetera) and therefore open to many different weightings and interpretations. In this case, there is enough ambiguity for the use of affirmative action to be more widely considered acceptable.
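The tie-breaking logic described above could be sketched as follows, with near-ties coarsened into buckets so that affirmative action only ever operates within the data’s margin of ambiguity. This is a hypothetical illustration, not Spotify’s ranking code:

```python
def rank_with_tiebreak(tracks, underrepresented, epsilon=0.01):
    """Rank tracks by engagement, breaking near-ties affirmatively.

    tracks: (title, artist, engagement_score) triples. Scores within
    `epsilon` of each other are treated as tied; ties are broken in
    favour of artists in the `underrepresented` set. Everything else
    is decided by the engagement score alone.
    """
    def key(track):
        title, artist, score = track
        bucket = round(score / epsilon)            # coarsen near-ties
        # sort by bucket descending, then favoured artists first
        return (-bucket, artist not in underrepresented)
    return sorted(tracks, key=key)
```

Shrinking `epsilon` towards zero recovers a purely data-driven ranking, so the single parameter also encodes how much ‘positive action’ the playlist tolerates.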


Keychange and affirmative action in the live sector

Away from the world of streaming, there is a distinct movement of affirmative action rolling through the music industry. An initiative called Keychange was founded in 2015 by Vanessa Reed, then CEO of the PRS Foundation. Through Keychange, over 500 music organisations have pledged to achieve a 50:50 gender balance. These include festivals such as We Out Here, Sea Change and Kendal Calling alongside the Royal Philharmonic Orchestra and the BBC Proms. However, there are still countless organizations who have not made this pledge.


According to Maxie Gedge, UK project manager of Keychange, “there are certain festivals that just aren’t taking responsibility, or they’re not viewing it as their responsibility when, in actuality, it’s everyone’s,” (Snapes, 2021). Nashville journalist Megan Seling (2018) echoes this sentiment, describing the passivity with which bookers, fans and bands regard the issue, all considering it somebody else’s to deal with. “It's turned into an exhausting blame game, and it's clear that nothing will ever change if no one takes responsibility for the problem.” Women are underrepresented across the entire music industry, and therefore the responsibility to solve the problem is diffused just as widely.


Even those who have committed to establishing gender equality on their own patch struggle to compensate for the failings that occur elsewhere in the music business. Emma Zillman, programme director of Kendal Calling (a festival that has signed the Keychange pledge), found it easier to book up-and-coming women and non-binary performers, but “Once you get to the level of an artist that [sells more than] 400-500 tickets regionally, it becomes a lot harder,” (Snapes, 2021). Similarly, Glastonbury organiser Emily Eavis has said she wanted to book female headliners "but the pool isn't big enough," (Savage, 2019). There are other hurdles elsewhere in the industry that make it "hard [for artists] to make the leap to that level." As Zillman puts it, festivals are “just the endpoint,” downstream from so much of the music industry, and changes need to be made in how music is made and consumed for the genders to be balanced all the way to the top of major lineups, amongst the headliners.


Streaming versus the live sector

Yet affirmative action remains simultaneously controversial and overlooked, even in the music industry’s live sector. A survey by Pirate Studios (Pirate Staff, 2021) of over 700 artists found only 7% of participants had an inclusion rider whilst only “30% of respondents had any knowledge of what an inclusion rider is.” The survey also asked for its participants’ opinions and found that “inclusion riders, though generally unknown, are incredibly divisive,” with common criticisms stating they are “unfair” and “tokenistic.” Whilst many people feel “achieving greater diversity on lineups” is of the utmost importance, others believe affirmative action is in itself discriminatory. And alongside Gedge’s and Seling’s characterizations of an industry refusing to take responsibility, most people are simply ignorant of the issue. With these results in mind, it is easier to see why affirmative action has only been semi-successful in the realm of live performance whilst algorithmic affirmative action is apparently yet to materialize in the realm of streaming.


Firstly, as has been established, affirmative action is controversial. Since the landscape of live music is cellularized into hundreds of different festivals and thousands of different events across each country, the organizational teams behind each event are likely to be small enough to reach a consensus on the subject, if it arises at all. Whilst many festivals’ committees have signed the Keychange pledge, many others have decided not to or are not aware of it. The landscape of streaming, on the other hand, is far more monopolistic which renders a ‘divide and conquer’ approach to pushing affirmative action untenable. A large music festival has an attendance of hundreds of thousands (Reality Check team, 2018) whereas a large streaming service has a user base of hundreds of millions (Spotify Newsroom, 2021).


Spotify’s largest shareholders include several “institutional investors such as Morgan Stanley” who collectively own over a third of shares, alongside the company’s co-founders Daniel Ek and Martin Lorentzon who together own 30.6% (Ingham, 2020). It seems unlikely a positive consensus on the importance of affirmative action for the platform’s recommendation systems would be reached amongst Spotify’s directors, especially when the directors are answerable to such hyper-capitalist shareholders.


Secondly, there has been far more pressure on festivals to rebalance their lineups than on Spotify to rebalance their playlists. Again according to Zillman, festivals have become “an easy target because [they] have a poster that clearly shows the hierarchy of the music industry” (Snapes, 2021). Festival posters typically detail the lineup in eye-catching and shareable fashion which allows photoshopped versions to circulate with the all-male acts removed, often leaving a great deal of negative space punctuated by very few female or mixed acts (The Guardian, 2015). “Of almost 100 bands and artists shown in the original list [for Reading and Leeds Festival 2015], only nine were left on the poster after the Photoshop wizardry,” (Barrell, 2015). Spotify’s playlists, on the other hand, are yet to be subjected to this sort of viral scrutiny.


Furthermore, Sweden has also taken a different stance on women-only festivals to other parts of the world. The Michigan Womyn's Music Festival ran in the United States for 39 years before closing on organiser Lisa Vogel’s own terms after years of controversy surrounding the festival’s decision to exclude trans women (Anderson-Minshall, 2015). Compare this to the “man-free” Statement Festival which ran only once in Sweden in 2018. Despite not enforcing its gender policy at the event itself (and being far more inclusive of trans and non-binary people than MWMF), Statement was found guilty of breaking Sweden’s gender discrimination law as they “discouraged a certain group from attending the event” (Snapes, 2018). This serves as a further example of mainstream Swedish society being less comfortable with affirmative action than other parts of the Western world, and renders the possibility of Stockholm-based Spotify embracing algorithmic affirmative action even less likely.


Trace amounts of affirmative action within Spotify’s editorial playlists

Yet Spotify is taking steps to combat the music industry’s rampant gender inequality problem. It may even already employ affirmative action on the curatorial side of its platform. Wicht et al. report the New Music Friday playlist has a slight but statistically significant pro-female bias. Tracks by female artists are ranked, on average, “about two thirds of a rank” above tracks by male artists of the same streaming performance. Whilst this is a meagre boost for female artists (less than a single rank), it is an interesting detail to analyse.


On the one hand, it implies the order of rankings suggested by streaming performance data is subtly overruled by human editors granting slight favour to underrepresented female artists. If this is in fact the case, it may be considered a small instance of affirmative action, albeit through human interference rather than the algorithmic affirmative action described by Chander. At a stretch, it could show Spotify is not opposed to boosting underrepresented groups with its recommendation engine, which would bode well for the future of artist diversity on its platform.


On the other hand, perhaps it is misleading to suggest this datum provides any insight into the relationship between Spotify’s editors and algorithms. Doing so may even be a cardinal sin within this particular realm. According to Pierce, it is wrong to “dump” the complex mechanisms that form Spotify’s recommendation engine “into two buckets: humans and machines.” Moreover, it would be naive to assume Spotify’s editors did not have access to any meaningful data on the tracks beyond their globally viewable number of streams. In fact, Pierce reports “the team sees which songs people love, which ones they save to other playlists, which ones they skip, and which ones make them ditch the playlist entirely.” Perhaps female artists (or at least the female artists considered by Wicht et al.) fare slightly better against these more complex criteria.


On the hypothetical third hand, perhaps this pro-female bias, albeit identifiable and statistically significant, is still too small to form the foundation of any serious argument. Wicht et al. themselves conclude that the most “fruitful steps toward raising the female share of successful songs” involve “[raising] the female share of artists on the platform.” Wicht et al. also point out that “women make up nearly 38 percent of musicians in the US,” suggesting that lowering Spotify’s barriers to entry or otherwise “[getting] more music onto the Spotify platform” may be an effective strategy for increasing the “female shares of streaming.”


Spotify’s other affirmative action initiatives

Spotify also funds the USC Annenberg Inclusion Initiative’s annual report entitled ‘Inclusion in the Recording Studio?’ (Smith, Pieper, et al., 2021): a comprehensive resource on gender inequality in the realm of chart-topping pop music. The report, possibly because of its benefactor, also highlights Spotify’s EQL Residency program as an “[ongoing effort] to support and encourage women’s participation in music.” Through this annual program, Spotify offers three “six-month [residencies] for female-identifying producers and engineers,” which, whilst beneficial for the three it selects each year, seems a mere drop in the ocean when the latest ‘Inclusion in the Recording Studio?’ report makes clear that only 2% of the producers behind the hit songs of 2020 were women.


In 2021, Spotify launched a new hub entitled EQUAL that features many editorial playlists of exclusively female artists, some organised by nationality, such as ‘EQUAL France’, and others by genre, such as ‘Women of Hip-Hop’ (Spotify Newsroom, 2021). The hub also includes a playlist titled ‘Created by Women’, comprising “songs that are 100% written, produced, and performed by women.” This is an excellent instance of Spotify improving representation and diversity through a curatorial shift. However, these playlists seem to lack much specification beyond incredibly broad genre categories like ‘Rock’ or ‘Electronic’. Even less coherent are the playlists organised by nationality, where many disparate styles of music are all presented under the same flag. Whilst such eclectic collections of songs would broaden any user’s consumption diversity, perhaps these playlists would benefit from being personalized or otherwise made more accurate. This would make the playlists more engaging and hopefully generate more traffic. Any content diversity lost from the playlists themselves should be compensated for by the increased gender diversity of streams across the platform.


Presently, a user may find sifting through the EQUAL hub’s crop of unpersonalized all-female playlists no less tedious a method of discovering female artists than skipping through any of Spotify’s more accurate and engaging personalized playlists to avoid all the recommended male artists. Yet, to develop accurate and engaging all-female playlists for the EQUAL hub, Spotify would presumably need a method of correctly gendering all the artists on its platform, either manually or algorithmically: perhaps assessing artists’ genders from the pronouns used in their online reviews and articles via the Echo Nest's web crawling and text analysis. This represents a substantial challenge. Although the EQUAL hub’s encapsulation may shield its affirmative action practices from controversy, producing accurate all-female playlists localized in this hub would still involve navigating all the technical, ethical and social obstacles that already stand in the way of Spotify adopting algorithmic affirmative action across its entire platform.


Spotify’s Equal campaign also comprises “a new invite-only Equal board [made] up of 15 organisations from around the world,” and a new music programme that “[builds] on the success” of programmes previously established by Spotify “like Sound Up and EQL” to provide further opportunities for women in the music industry (Paine, 2021). These industry placements and targeted funding represent a bottom-up approach to solving gender inequality, to supplement the EQUAL hub’s top-down approach of increasing the Spotify userbase’s consumption of female artists. Whilst the Equal music programme has a limited number of placements to offer, the EQUAL hub is globally available to all of Spotify’s hundreds of millions of users.


It is tempting to infer that, since Spotify has finite resources to solve the problem of gender inequality from the bottom up, it has far greater potential to effect change from the top down by remodeling its recommendation engine and wider platform. Yet Spotify may be far more comfortable forking out funding for internships than it would be instigating further affirmative action, exposing the myth of algorithmic objectivity, or significantly changing its user experience, for the reasons discussed above. Furthermore, Chodos (2019) writes that many users simply “don’t want hubs.” The EQUAL hub comprises a treasure trove of female artists available for every user to uncover, yet for many users, this content will remain hidden behind the hub’s clickable tile. However, Spotify’s hub feature does allow content such as all-female playlists, which some users may disagree with as a form of affirmative action, to be Trojan-horsed into the platform. Moreover, it may be excessively cynical to interpret the organization’s two-pronged approach to battling gender inequality as Spotify offsetting its responsibility to improve the gender diversity of its recommendation engine by throwing money at a separate part of the music industry, especially since Wicht et al. conclude that a fruitful way of improving the gender balance of Spotify’s top playlists would be “to raise the female share of artists on the platform.” Rather, it may indeed be best to attack the lack of gender diversity from two opposing angles.


A lost pilot for algorithmic affirmative action

In 2017, prior to the Equal campaign, Spotify launched the similarly named Equalizer Project, which supports “female and non-binary creators in the Nordic music industry” via “networking meetings, producer camps, and more,” (Spotify Newsroom, 2020). Spotify reports that the percentage of its top 50 songs in Sweden produced by women has increased year on year (from 0% in 2017 to 0.8% in 2018 and 4.1% in 2019) whilst acknowledging “there remains significant ground to be gained.” In 2018, the programme was sponsored by Smirnoff, rebranded as Smirnoff Equalizer and included a tool that algorithmically created a playlist based on the user’s listening habits with “a nifty slider bar [...] that allows you to adjust percentages” all the way up to “100 percent female if desired,” (Melby Clinton, 2018). However, the journalist Melby Clinton concedes that her generated playlist “was a bit dizzying, skipping from Shania Twain to John Legend to, strangely, All I Want For Christmas by Mariah Carey,” implying that the tool’s underlying algorithm needed to be better optimised for accuracy or was otherwise lacking in some way.
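As a toy illustration only (Spotify has never published the Equalizer tool’s internals), the slider Melby Clinton describes could be approximated by something as simple as the following sketch, in which the candidate pool, the gender tags and the naive fill strategy are all assumptions rather than anything known about the real tool:

```python
# Hypothetical sketch of a gender-balance slider: given candidate tracks
# tagged by artist gender, build a playlist whose share of tracks by female
# artists matches a user-set percentage. None of this reflects Spotify's
# actual implementation.

def balanced_playlist(candidates, female_share=0.5, length=10):
    """Return up to `length` tracks, ~female_share of them by female artists.

    candidates: list of (track_id, is_female_artist) tuples, assumed to be
    pre-sorted by relevance to the user (most relevant first).
    """
    n_female = round(length * female_share)
    female = [t for t, f in candidates if f][:n_female]
    other = [t for t, f in candidates if not f][:length - n_female]
    return female + other
```

With `female_share=1.0` this reproduces the “100 percent female if desired” setting; the “dizzying” jumps Melby Clinton describes would then come entirely from the quality of the candidate pool, which this sketch does nothing to improve.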


Smirnoff no longer sponsors the project, and although the programme itself lives on, the recommendation tool is unfortunately no longer available. Perhaps it proved too controversial or did not generate enough traffic to justify its worth to Spotify. Perhaps the tool was lost when Smirnoff did not renew its sponsorship, leading to tedious copyright issues not worth battling through or the tool simply being forgotten about on a doomed “smirnoff.spotify” webpage. Perhaps the tool was a mechanical turk (or artificial artificial intelligence (not a typo)) that could only pull tracks from an assembled bank of perhaps a few hundred songs by female artists, hence the odd selections, and could not possibly be scaled up to algorithmically solve the lack of gender diversity across Spotify’s entire platform. Perhaps solving the music industry’s endemic gender inequality problem is simply too much to expect from a glorified vodka advert. This is all speculation. Nevertheless, the Smirnoff Equalizer tool (if not too good to have been true) represents an incredible opportunity squandered, either deliberately or accidentally. A personalized playlist generator with a user-controllable gender-balance parameter may be the most complete realization of algorithmic affirmative action that Spotify could ever implement on its platform.

Chapter 6: Conclusion


The evidence gathered by Anderson et al. and Holtz et al. suggests Spotify’s algorithmic recommender systems indeed decrease consumption diversity. Therefore, more users are pushed into becoming specialists instead of generalists. Holtz et al. recognize specialists’ listening patterns as very distinct from one another. Whilst this balkanization can maintain diversity across Spotify’s platform, these specialist listening patterns are bad for the individual user and lead to increased rates of churn. It is therefore in Spotify’s best interests to ensure its users are consuming a balanced musical diet. Yet there is an identifiable conflict between the short-term goals (achieving engagement via accuracy) and the long-term goals (achieving generalism via diversity) of Spotify’s recommendation engine.


According to Zhou et al. and Zhang et al., the responsibility for balancing the diversity of Spotify’s recommendations might be best transferred to the user. Furthermore, Zhang et al. demonstrate the success of hybridizing accuracy-optimised algorithms and diversity-optimised algorithms: the resultant recommendations simultaneously outperform each parent algorithm at its own game, with increased accuracy and diversity.
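To illustrate the general idea (this is a sketch, not Zhang et al.’s Auralist nor anything Spotify is known to run), two recommenders can be hybridized by linearly interpolating their rankings, with the interpolation weight doubling as exactly the kind of user-facing diversity control that Zhou et al. and Zhang et al. envisage:

```python
# Illustrative rank-interpolation hybrid of two recommenders. Each parent
# recommender supplies an ordered list of track IDs; a track's hybrid score
# is a weighted sum of its positions in the two lists, so a single
# user-tunable weight trades accuracy against diversity.

def hybrid_ranking(accuracy_ranked, diversity_ranked, diversity_weight=0.3):
    """Merge two ranked lists of track IDs into one hybrid ranking.

    diversity_weight = 0.0 reproduces the accuracy ranking;
    diversity_weight = 1.0 reproduces the diversity ranking.
    """
    acc_rank = {track: i for i, track in enumerate(accuracy_ranked)}
    div_rank = {track: i for i, track in enumerate(diversity_ranked)}
    tracks = set(acc_rank) | set(div_rank)
    worst = len(tracks)  # tracks missing from one list get the worst rank

    def score(track):
        return ((1 - diversity_weight) * acc_rank.get(track, worst)
                + diversity_weight * div_rank.get(track, worst))

    return sorted(tracks, key=score)
```

Exposing `diversity_weight` as a slider would hand the accuracy/diversity trade-off directly to the listener, much as the Smart Shuffle precedent discussed below handed listeners control over randomness.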


Since 2015, there has been much seemingly human decision-making within Spotify’s recommendation engine, courtesy of the countless editorial playlists produced by Spotify to “soundtrack your life,” (Chodos, 2019). However, Pierce reports that these editorial playlists are assembled with powerful algorithmic tools and tweaked to maximise engagement by relying heavily on user metrics. The Echo Nest’s Brian Whitman (White, 2014) even thinks it is “stupid” to dichotomize man and machine in these contexts. It is indeed important to understand that curatorial recommendations are algorithmically assisted whilst algorithmic recommender systems are authored by people, but refusing to categorically separate organic and algorithmic consumption makes it impossible to usefully compare the two, and fuzzifies any conclusions drawn from inspecting Spotify’s recommendation engine as a whole.


Chodos defines the upsurge of these editorial playlists as a curatorial shift, which mirrors Pagano et al.’s description of a contextual turn, where a user’s tastes are less important than the non-musical activity at hand that they wish to “soundtrack.” Whilst this shift may help users out of their filter bubbles and increase the diversity of personal consumption, it can create filter bubbles of a different sort and decrease the diversity of the music listened to across the platform as a whole. Although this is the inverse of the listening patterns created by Spotify’s algorithmic recommender systems, the curatorial shift may not be enough to save extreme specialist users, as they already “know what they like” and “don’t want [access to] hubs,” where these editorial playlists are found.


Overall, it is difficult to ascertain if editorial playlists offset the effects of Spotify’s algorithms in terms of consumption diversity. Whilst Anderson et al.’s and Holtz et al.’s research on algorithms is empirical, the literature on editorial playlists is much more philosophical. Chodos’s biggest beef with Spotify’s curated side is the way it presents music as supplementary to other activities (e.g. exercise, study) rather than as an enriching and engrossing activity in its own right. However, he does concede that the curatorial shift was necessary for Spotify to vastly expand its userbase.


The curatorial shift can be seen as a return to curation, after only a brief stint without tastemakers. A decade ago, Shuffle's ability to keep the music playing hands-free served a similar function to today's algorithmic recommender systems. The hysteria inspired by Shuffle in its heyday is a good point of comparison for modern discussions of Spotify's recommendation engine: it shows that even the simplest and fairest ways to order music can cause controversy. And after endless discussions of accuracy metrics, it is refreshing to review how passionate people were about randomly sequenced music relatively recently. Furthermore, the changes made to Shuffle as it evolved set a historical precedent for Zhou et al.'s and Zhang et al.'s hopes for Spotify: a Spotify equivalent of Smart Shuffle could see the user setting the amount of diversity in their algorithmic recommendations.


Yet it is not entirely fair to compare algorithms to Shuffle or to assume that randomness is adjacent to order, as Powers posits, because Shuffle does not have to face the fundamental engineering challenge of sifting through the millions of songs on Spotify’s platform. However, Shuffle is still present on Spotify, especially on the mobile version of its Free tier where it operates constantly. In this way, Spotify may foster greater consumption diversity in its Free users than its Premium users, not in spite of but because of the limited experience it offers Free users.


The area most starkly lacking in diversity on Spotify’s platform is gender diversity. Spotify’s recommendation algorithms may be key to solving this problem via algorithmic affirmative action: a panacea not just for the algorithms themselves, but also “the real world on which [they operate],” (Chander, 2017). In spite of the principle’s general controversy, Wicht et al. found evidence for small amounts of affirmative action within certain editorial playlists. Spotify is also embracing affirmative action in other areas on its platform, and within the wider music industry, by funding recording studio internships and the valuable research into gender inequality conducted by Smith et al. (2021). However, in terms of widespread affirmative action, Spotify lags behind the music festival sector, which itself has a long way to go.


In 2021, Spotify made a large and public push for gender diversity with the launch of the EQUAL hub and its corresponding music programme. Whilst this hub contains numerous all-female playlists, these playlists lack personalization and accuracy, which may render them less engaging. But producing more accurate playlists for this hub might be a deceptively large task, requiring Spotify to accurately log the gender of each and every artist before algorithmic affirmative action could be instigated across the entirety of its platform. Yet there is evidence to suggest it can be done, in the form of Spotify’s 2018 Smirnoff Equalizer playlist creator. This throwaway vodka-sponsored tool could create personalized playlists crammed full of female artists, and seemed a promising pilot for algorithmic affirmative action, yet it is no longer available and currently has no clear successor.


Spotify’s recommendation engine has measurable positive, negative and even competing effects on different facets of diversity. Although the recommendation engine is complex, obscure, cyborgian and possibly beyond comprehension, it is not beyond comparison with previous technologies or other areas of the music industry. These comparisons conjure precedents for otherwise purely academic suggestions on how Spotify can improve. Whether Spotify will adopt algorithmic affirmative action to increase gender diversity, or allow its users to tune the content diversity of their own recommendations, only time will tell.





Word count: 11073

Reference List


Anderson, Ashton, Lucas Maystre, Rishabh Mehrotra, Ian Anderson, and Mounia Lalmas. ‘Algorithmic effects on the diversity of consumption on Spotify.’ Proceedings of The Web Conference 2020, (2020).


Anderson-Minshall, Diane. Op-ed: Michfest's Founder Chose to Shut Down Rather Than Change With the Times (2015) <https://www.advocate.com/commentary/2015/04/24/op-ed-michfests-founder-chose-shut-down-rather-change-times> [accessed 10 August 2021].


Archibong, Uduak, and Phyllis W. Sharps. ‘A comparative analysis of affirmative action in the United Kingdom and United States.’ Journal of Psychological Issues in Organizational Culture 3, no. S1 (2013), pp. 28-49.


Barrell, Ryan. Remove The All-Male Acts From Reading And Leeds Festival And It Would Be Almost Empty (2015) <https://www.huffingtonpost.co.uk/2015/02/25/reading-leeds-fest-surprisingly-male_n_6751664.html> [accessed 10 August 2021].


Beer, David. ‘The MP3 Player as a Mobile Digital Music Collection Portal.’ Mobile Computing: Concepts, Methodologies, Tools, and Applications, (2009), pp. 1168-1174.


Breese, John S., David Heckerman, and Carl Kadie. ‘Empirical analysis of predictive algorithms for collaborative filtering.’ Proceedings of UAI 1998, (1998), pp. 43-52.


Chander, Anupam. ‘The racist algorithm?’ Michigan Law Review 115, no. 6 (2017), pp. 1023-1045.


Chodos, Asher Tobin. ‘What does music mean to Spotify? An essay on musical significance in the era of digital curation.’ INSAM Journal of Contemporary Music, Art and Technology 1.2 (2019), pp. 36-64.


Explore - Spotify. Hubs (2021) <https://explore.spotify.com/pages/mobile-feature-discover-hubs> [accessed 10 August 2021].


Freidenvall, Lenita. ‘Gender quota spill-over in Sweden: From politics to business?’ EUI Department of Law Research Paper, 28, (2015).


Fryer, Wesley. Options for shuffling songs and podcasts on an iPod or in iTunes (2008) <https://www.speedofcreativity.org/2008/04/23/options-for-shuffling-songs-and-podcasts-on-an-ipod-or-in-itunesgen/> [accessed 10 August 2021].


Gillespie, Tarleton. ‘The relevance of algorithms.’ Media technologies: Essays on communication, materiality, and society, 167.2014 (2014), pp. 167-198.


Graells-Garrido, Eduardo, Mounia Lalmas, and Ricardo Baeza-Yates. ‘Encouraging diversity-and representation-awareness in geographically centralized content.’ Proceedings of the 21st International Conference on Intelligent User Interfaces, (2016), pp. 7-18.


Guardian, The. A man's world? Music festival posters bare after male acts are removed (2015) <https://www.theguardian.com/lifeandstyle/gallery/2015/jun/23/music-festival-posters-male-acts-removed-in-pictures> [accessed 10 August 2021].


Heath, Alex. Spotify has a secret 'taste profile' on everyone, and they showed me mine (2015) <https://www.businessinsider.com/how-spotify-taste-profiles-work-2015-9?r=US&IR=T> [accessed 10 August 2021].


Holmlund, Bertil. 'Comment on Harry J. Holzer: The economic impacts of affirmative action in the US', Swedish Economic Policy Review, 14, (2007), pp. 73-77.


Holtz, David, Benjamin Carterette, Praveen Chandar, Zahra Nazari, Henriette Cramer, and Sinan Aral. ‘The Engagement-Diversity Connection: Evidence from a Field Experiment on Spotify.’ Proceedings of the 21st ACM Conference on Economics and Computation, (2020).


Ingham, Tim. Who Really Owns Spotify? (2020) <https://www.rollingstone.com/pro/news/who-really-owns-spotify-955388/> [accessed 10 August 2021].


Keychange. Music organisations (2021) <https://www.keychange.eu/directory/music-organisations> [accessed 10 August 2021].


Lamere, Paul. Paul’s Music Wreckommender (2009) <https://musicmachinery.com/page/44/> [accessed 10 August 2021].


Levy, Steven. The perfect thing: How the iPod shuffles commerce, culture, and coolness (New York: Simon and Schuster, 2007), pp. 170-171.


Lydon, Christopher. The Age of Shuffle (2007) <https://radioopensource.org/the-age-of-shuffle/> [accessed 10 August 2021].


McNee, Sean M., John Riedl, and Joseph A. Konstan. ‘Being accurate is not enough: how accuracy metrics have hurt recommender systems.’ CHI'06 extended abstracts on Human factors in computing systems (2006).


Melby Clinton, Leah. Does Equality Extend To Our Playlists? (2018) <https://www.elle.com/culture/a19057364/smirnoff-equalizer-spotify/> [accessed 10 August 2021].


Morozov, Evgeny. Don't Be Evil (2011) <https://newrepublic.com/article/91916/google-schmidt-obama-gates-technocrats> [accessed 10 August 2021].


Morris, Jeremy Wade. ‘Curation by code: Infomediaries and the data mining of taste.’ European journal of cultural studies, 18.4-5 (2015), pp. 446-463.


Newitz, Annalee. Facebook fires human editors, algorithm immediately posts fake news (2016) <https://arstechnica.com/information-technology/2016/08/facebook-fires-human-editors-algorithm-immediately-posts-fake-news/> [accessed 10 August 2021].


Pagano, Roberto, Paolo Cremonesi, Martha Larson, Balázs Hidasi, Domonkos Tikk, Alexandros Karatzoglou, and Massimo Quadrana. ‘The contextual turn: From context-aware to context-driven recommender systems.’ Proceedings of the 10th ACM conference on recommender systems, (2016), pp. 249-252.


Paine, Andre. Spotify launches Equal initiative to support women creators (2021) <https://www.musicweek.com/digital/read/spotify-launches-equal-initiative-to-support-women-creators/082767> [accessed 10 August 2021].


Parents Involved in Community Schools v. Seattle School District No. 1, 551 US 701 (2007) <https://www.oyez.org/cases/2006/05-908> [accessed 10 August 2021].


Pasquale, Frank. The Black Box Society (Cambridge, MA: Harvard University Press, 2015).


Pierce, David. Inside Spotify's Hunt for the Perfect Playlist (2015) <https://www.wired.com/2015/07/spotify-perfect-playlist/> [accessed 10 August 2021].


Pirate Staff. Only 7% Of Artists Have an Inclusion Rider, Here’s Why You Should (2021) <https://pirate.com/en/blog/news/seven-percent-of-artists-have-an-inclusion-rider/> [accessed 10 August 2021].


Powers, Devon. ‘Lost in the shuffle: Technology, history, and the idea of musical randomness.’ Critical studies in media communication 31, no. 3 (2014), pp. 244-264.


Prey, Robert. ‘Nothing personal: Algorithmic individuation on music streaming platforms.’ Media, Culture & Society 40.7 (2018), pp. 1086-1100.


Reality Check team. Music festivals: What's the world's biggest? (2018) <https://www.bbc.co.uk/news/world-44697302> [accessed 10 August 2021].


Ricci, Francesco, Lior Rokach, and Bracha Shapira. Recommender Systems Handbook (Boston, MA: Springer, 2011), pp. 1-35.


Ross, Alex. Listen to This (2004) <https://www.newyorker.com/magazine/2004/02/16/listen-to-this> [accessed 10 August 2021].


Ross, Danny. Spotify’s Head Of Music Explains How To Get On Playlists (2020) <https://www.forbes.com/sites/dannyross1/2020/03/02/spotifys-head-of-music-explains-playlisting/#:~:text=But%20no%20one's%20still%20looking,We%20have%203%2C000%20Spotify%20playlists> [accessed 10 August 2021].


Savage, Mark. Emily Eavis: 'There aren't enough female headliners' (2019) <https://www.bbc.co.uk/news/entertainment-arts-47753067> [accessed 10 August 2021].


Schwab, Pierre-Nicolas. Workshop on fairness and ethics in recommendation systems (2017) <https://www.intotheminds.com/blog/en/workshop-on-fairness-and-ethics-in-recommendation-systems/> [accessed 10 August 2021].


Segall, Laurie. Spotify wants to be the soundtrack of your life (2015) <https://money.cnn.com/2015/05/20/technology/spotify-announcement/index.html> [accessed 10 August 2021].


Seling, Megan, Musicians Should Have Inclusion Riders Too (2018) <https://www.nashvillescene.com/music/nashvillecream/musicians-should-have-inclusion-riders-too/article_f244ecb1-4f19-5627-a5fb-4b3578cf6054.html> [accessed 10 August 2021].


Smith, Stacy L., Katherine Pieper, Marc Choueiti, Karla Hernandez, and Kevin Yao. ‘Inclusion in the Recording Studio?’ RATIO 25: 16-8, (2021).


Snapes, Laura. 'It’s a statement of exclusion': music festivals return to UK but lineups still lack women (2021) <https://www.theguardian.com/music/2021/mar/26/music-festivals-return-to-uk-but-lineups-still-lack-women> [accessed 10 August 2021].


Snapes, Laura. Swedish women-only music festival found guilty of discrimination (2018) <https://www.theguardian.com/music/2018/dec/19/statement-swedish-women-only-music-festival-guilty-gender-discrimination> [accessed 10 August 2021].


Spotify Newsroom. Company Info (2021) <https://newsroom.spotify.com/company-info/> [accessed 10 August 2021].


Spotify Newsroom. Equalizer Project, Now in Its Fourth Year, Makes Strides in Increasing Female Representation in Music (2020) <https://newsroom.spotify.com/2020-10-27/equalizer-project-now-in-its-fourth-year-makes-strides-in-increasing-female-representation-in-music/> [accessed 10 August 2021].


Spotify Newsroom. Get to Know Some of the Women Featured in Spotify’s New EQUAL Music Program (2021) <https://newsroom.spotify.com/2021-04-29/get-to-know-some-of-the-women-featured-in-spotifys-new-equal-music-program/> [accessed 10 August 2021].


Spotify Newsroom. Introducing Two New Personalized Playlists: On Repeat and Repeat Rewind (2019) <https://newsroom.spotify.com/2019-09-24/introducing-two-new-personalized-playlists-on-repeat-and-repeat-rewind/> [accessed 10 August 2021].


u/histrel. Playlist shuffle broken? (2020) <https://www.reddit.com/r/YoutubeMusic/comments/f7r4l9/playlist_shuffle_broken/> [accessed 10 August 2021].


u/McCool71. Spotify: Recommended songs - new feature! (2016) <https://www.reddit.com/r/spotify/comments/4llsft/spotify_recommended_songs_new_feature/> [accessed 10 August 2021].


Van Camp, Jeffrey. Spotify is now free on Android tablets and iPad, but phones must shuffle (2013) <https://www.digitaltrends.com/mobile/spotify-free-moble-option/> [accessed 10 August 2021].


White, Emily. The Echo Nest CTO Brian Whitman on Spotify Deal, Man Vs. Machine, Why Pandora 'Freaks' Him Out (Q&A) (2014) <https://web.archive.org/web/20171024053117/http://www.billboard.com/biz/articles/news/digital-and-mobile/5944950/the-echo-nest-cto-brian-whitman-on-spotify-deal-man-vs> [accessed 10 August 2021].


Whitman, Brian, and Steve Lawrence. ‘Inferring descriptions and similarity for music from community metadata.’ (2002).


Wicht, Luis Aguiar, Joel Waldfogel, and Sarah Waldfogel. ‘Playlisting Favorites: Is Spotify Gender-Biased?’ No. 2018-07. Joint Research Centre (Seville site), (2018).


Zhang, Yuan Cao, Diarmuid Ó. Séaghdha, Daniele Quercia, and Tamas Jambor. ‘Auralist: introducing serendipity into music recommendation.’ Proceedings of the fifth ACM international conference on Web search and data mining, (2012), pp. 13-22.


Zhou, Tao, Zoltán Kuscsik, Jian-Guo Liu, Matúš Medo, Joseph Rushton Wakeling, and Yi-Cheng Zhang. ‘Solving the apparent diversity-accuracy dilemma of recommender systems.’ Proceedings of the National Academy of Sciences 107, (2010), pp. 4511-4515.