
Outlier Detection In Music Playlists

by Elias El Khoury
May 18, 2018
in Engineering

Statistical analysis of a playlist’s mood

Playlists usually group songs under a single theme: an artist’s “Best of”, a genre or era, an activity (e.g. workout or cooking playlists), or a mood, such as melancholic or joyful playlists. We analyse the coherence of mood-focused playlists by evaluating each song’s sentiment score and using the median absolute deviation (MAD) to robustly measure the variability of the playlist’s mood.

At Anghami, we focus on delivering the best user experience and quality content for music lovers. On our path towards continuous improvement and as a data-driven company, we have turned to statistics to further enhance our customers’ listening sessions.

Converting words into numbers

Every song in a playlist had adjectives assigned to it, such as exciting, sensual, depressive or nostalgic. To analyse them, we projected the adjectives onto a single dimension with values ranging from -1 (most extreme negative) to +1 (most extreme positive).

For this task, we chose VADER Sentiment Analysis [1], an open-source lexicon and rule-based sentiment analysis tool. Words that were not in VADER’s dictionary were replaced with their closest synonyms with the help of www.thesaurus.com.
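The lookup-with-synonym-fallback step can be sketched in Python as follows. This is only an illustration: the lexicon, its scores and the synonym table below are made up for the example and are not VADER’s real dictionary or API, and the function name mood_score is hypothetical.

```python
# Toy stand-in for a sentiment lexicon. The scores are invented for
# illustration and are NOT VADER's real values.
TOY_LEXICON = {
    "exciting": 0.5,
    "happy": 0.6,
    "sad": -0.5,
    "depressing": -0.6,
}

# Hypothetical synonym table for adjectives missing from the lexicon,
# playing the role of the manual thesaurus lookup.
SYNONYMS = {
    "depressive": "depressing",
    "joyful": "happy",
    "melancholic": "sad",
}


def mood_score(adjective):
    """Return a score in [-1, +1] for an adjective, or None if unknown."""
    word = adjective.strip().lower()
    if word not in TOY_LEXICON:
        # Fall back to the closest known synonym, if one is listed.
        word = SYNONYMS.get(word, word)
    return TOY_LEXICON.get(word)


for adjective in ["exciting", "Depressive", "nostalgic"]:
    print(adjective, mood_score(adjective))
```

An adjective with no entry and no listed synonym returns None here, which makes the gaps in the dictionary visible instead of silently scoring them as neutral.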

Standard deviation & why we didn’t use it

The standard deviation (SD) measures how much the members of a group differ from the group’s mean value. A small SD indicates that the values are tightly clustered around the mean, while a large SD means the values are spread over a wide range. The following plot illustrates how the standard deviation is affected by the proximity of the values to the mean. Both populations have the same mean, but their values are spread differently.

Example of samples from two populations with the same mean but different standard deviations. Red population has mean 100 and SD 10; blue population has mean 100 and SD 50.

A simple method of detecting outliers in a set is to find all entries that are more than two standard deviations away from the mean. Let’s take a look at this set of numbers: [1 1 2 2.2 3 3.5 4.1 9]

The mean is 3.2250 and the (sample) SD is 2.5828. The last entry (9) is greater than mean+2*SD (8.3905), so we have successfully detected the outlier here. The method does not always work this well, though.

Let’s do the same for this set of numbers: [1 1 2 2.2 3 3.5 4.1 19 62]

Plot of the data set: mean = 10.867, SD = 19.972.

The red line marks mean+2*SD (50.811). We failed to identify 19, which logically should also be considered an outlier. The reason is that the standard deviation, which is based on squared distances from the mean, is greatly inflated by the large deviations of extreme outliers, in our case 62.
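The two-standard-deviation rule applied to both sets above can be reproduced with a short Python sketch. It assumes the sample standard deviation (which matches the figures quoted in the text); the function name two_sigma_outliers is ours.

```python
from statistics import mean, stdev


def two_sigma_outliers(values, k=2.0):
    """Return the entries lying more than k sample SDs from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [x for x in values if abs(x - m) > k * s]


first = [1, 1, 2, 2.2, 3, 3.5, 4.1, 9]
second = [1, 1, 2, 2.2, 3, 3.5, 4.1, 19, 62]

print(two_sigma_outliers(first))   # the single outlier 9 is caught
print(two_sigma_outliers(second))  # only 62 is caught; 19 is masked
```

In the second set the extreme value 62 inflates both the mean and the SD, so the threshold moves far enough out to hide 19.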

Median absolute deviation

The median absolute deviation (MAD) is a robust measure of statistical dispersion and is more resilient to a few extreme outliers. It is defined as the median of the absolute deviations from the data’s median: MAD = median( abs( Xᵢ – median(X) ) ).

Let’s compute the robust zScore of each of the previous data points using the MAD:

octave:1> a = [1 1 2 2.2 3 3.5 4.1 19 62];
octave:2> abs(a - median(a)) / mad(a,1)   % mad(a,1): median (not mean) absolute deviation
ans = 1.81818    1.81818    0.90909    0.72727    0.00000    0.45455    1.00000   14.54545   53.63636

Using the same cut-off factor of 2, we find that the outliers here are entries 19 and 62.

octave:3> a(find((abs(a - median(a)) / mad(a,1)) > 2))
ans = 19 62
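For readers without Octave, the same computation can be sketched in Python with only the standard library. The function names are ours, and the score is the unscaled ratio used above (no consistency constant is applied to the MAD).

```python
from statistics import median


def mad(values):
    """Median of the absolute deviations from the median."""
    med = median(values)
    return median([abs(x - med) for x in values])


def robust_z_scores(values):
    """Absolute deviation from the median, in units of the MAD."""
    med = median(values)
    spread = mad(values)
    if spread == 0:
        # More than half of the values are identical; the ratio is undefined.
        raise ValueError("MAD is zero, robust z-scores are undefined")
    return [abs(x - med) / spread for x in values]


def mad_outliers(values, cutoff=2.0):
    """Return the entries whose robust z-score exceeds the cut-off."""
    scores = robust_z_scores(values)
    return [x for x, z in zip(values, scores) if z > cutoff]


a = [1, 1, 2, 2.2, 3, 3.5, 4.1, 19, 62]
print([round(z, 5) for z in robust_z_scores(a)])
print(mad_outliers(a))
```

The zero-MAD guard is a design choice of this sketch: when most values are identical the division is undefined, and raising an error makes that case explicit.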

The MAD is not a perfect solution, but it is considerably more robust than the standard deviation.


Wrapping up

For every playlist, we translate each song’s mood into a normalized score using VADER and compute its robust zScore in terms of the median absolute deviation. The playlists are then ordered using the distance between their maximum and minimum sentiment scores and the sum of their songs’ zScores. Thus, if a playlist contains a negative song among mostly positive ones, it is surfaced first and its outlier songs are highlighted for review.
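A minimal Python sketch of this ranking step is given below. The text does not say exactly how the two criteria are combined, so sorting by the pair (sentiment range, sum of zScores) in descending order is an assumption, as are the playlist names, the scores and the handling of a zero MAD.

```python
from statistics import median


def robust_z_scores(scores):
    """Absolute deviation from the median, in units of the MAD."""
    med = median(scores)
    spread = median([abs(s - med) for s in scores])
    if spread == 0:
        # Assumption: with a zero MAD, any score off the median is extreme.
        return [0.0 if s == med else float("inf") for s in scores]
    return [abs(s - med) / spread for s in scores]


def review_key(scores):
    """Assumed ordering key: (sentiment range, sum of robust z-scores)."""
    return (max(scores) - min(scores), sum(robust_z_scores(scores)))


def rank_playlists(playlists, cutoff=2.0):
    """Order playlists for review and list each one's outlier scores."""
    ranked = sorted(playlists.items(),
                    key=lambda item: review_key(item[1]),
                    reverse=True)
    report = []
    for name, scores in ranked:
        z = robust_z_scores(scores)
        flagged = [s for s, zs in zip(scores, z) if zs > cutoff]
        report.append((name, flagged))
    return report


# Hypothetical sentiment scores, one per song.
playlists = {
    "calm_mix": [0.2, 0.25, 0.3, 0.22, 0.28],
    "happy_mix": [0.6, 0.5, 0.65, 0.45, -0.8],
}
for name, flagged in rank_playlists(playlists):
    print(name, flagged)
```

With these made-up scores, the playlist containing one strongly negative song among positive ones comes first, and that song is the only one flagged.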

What’s next?

Now that we’re able to analyse playlists over a single attribute (sentiment), we have begun looking into methods that deal efficiently with multiple dimensions. Hopefully this will be the topic of another story.


[1] Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.

Tags: Data Analysis, Research and Development, Sound, Statistics

Elias El Khoury

Ingestion & R&D Lead, joined Anghami in 2016
