
Outlier Detection In Music Playlists

by Elias El Khoury
May 18, 2018
in Engineering

Statistical analysis of a playlist’s mood

Playlists usually group songs under a single theme: an artist’s “Best of”, a genre or era, an activity (e.g. workout or cooking playlists), or a mood, such as melancholic or joyful playlists. We will analyse the coherence of mood-focused playlists by evaluating each song’s sentiment score and using the median absolute deviation (MAD) to robustly measure the variability of the playlist’s mood.

At Anghami, we focus on delivering the best user experience and quality content for music lovers. As a data-driven company committed to continuous improvement, we have turned to statistics to further enhance our customers’ listening sessions.

Converting words into numbers

Every song in a playlist has been assigned adjectives such as exciting, sensual, depressive or nostalgic. To analyse them, we created a one-dimensional projection with values ranging from -1 (most extreme negative) to +1 (most extreme positive).

For this task, we’ve chosen VADER Sentiment Analysis [1], an open-source lexicon and rule-based sentiment analysis tool. Words that aren’t in VADER’s dictionary were replaced with their closest synonyms with the help of www.thesaurus.com.
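VADER is normally called through the vaderSentiment Python package, as SentimentIntensityAnalyzer().polarity_scores(text)["compound"], which returns a score in [-1, 1].

The sketch below avoids that dependency. It only illustrates two ideas from this step:

- the synonym fallback for words missing from the dictionary;
- the final squashing step VADER applies to its compound score, score / sqrt(score² + 15).

It makes several assumptions. The lexicon values and the synonym table are hypothetical placeholders, not VADER's real entries. The post does not say how several adjectives per song were combined, so summing their valences before normalizing is also an assumption.

```python
import math

# Hypothetical valences on VADER's -4..+4 scale; illustrative only, NOT VADER's real lexicon.
LEXICON = {
    "exciting": 2.2,
    "sensual": 1.5,
    "depressing": -2.3,
    "nostalgic": 0.5,
}

# Hypothetical synonym table for words missing from the lexicon
# (the post used thesaurus.com for this step).
SYNONYMS = {"depressive": "depressing"}


def normalize(score, alpha=15.0):
    """Map an unbounded valence sum into (-1, 1), the way VADER normalizes its compound score."""
    return score / math.sqrt(score * score + alpha)


def song_sentiment(adjectives):
    """Assumed aggregation: sum the adjectives' valences, then normalize the total."""
    total = 0.0
    for word in adjectives:
        word = SYNONYMS.get(word, word)   # fall back to a synonym that is in the lexicon
        total += LEXICON.get(word, 0.0)   # words still unknown contribute nothing
    return normalize(total)
```

For example, song_sentiment(["depressive", "nostalgic"]) looks up "depressing" through the synonym table and returns a negative score. The normalization keeps any sum strictly between -1 and +1.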

Standard deviation & why we didn’t use it

The standard deviation (SD) measures how much the members of a group differ from the group’s mean. A small SD indicates that the values are tightly clustered around the mean, while a large SD means they are spread over a wide range. The following plot illustrates how the standard deviation depends on the proximity of the values to the mean: both populations have the same mean, but their values are spread differently.

Example of samples from two populations with the same mean but different standard deviations. Red population has mean 100 and SD 10; blue population has mean 100 and SD 50. Source
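For reference, these are the sample mean and sample standard deviation behind the numbers in the examples that follow. The n − 1 denominator is what Octave's std uses by default, and it reproduces the quoted values.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\mathrm{SD} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```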

A simple method of detecting outliers in a set is to flag all entries that lie more than two standard deviations away from the mean. Let’s take a look at this set of numbers: [1 1 2 2.2 3 3.5 4.1 9]

The mean is 3.2250 and the (sample) SD is 2.5828. The last entry, 9, is greater than mean + 2*SD (8.3905), so we have successfully detected the outlier here. However, the rule is easily fooled, as the next example shows.

Let’s do the same for this set of numbers: [1 1 2 2.2 3 3.5 4.1 19 62]

Plot of the second set: mean = 10.867, SD = 19.972.

The red line marks mean + 2*SD (50.811), so this rule fails to flag 19, which should logically be considered an outlier. The reason is that the standard deviation, being based on squared distances from the mean, is heavily inflated by the large deviations of extreme outliers, in our case 62.
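A minimal Python sketch of the two-standard-deviation rule used above follows. It assumes the sample SD, as in the Octave figures. Applied to both sets, it reproduces the failure: only 62 is flagged in the second set, because 62 inflates the SD enough to hide 19.

```python
from statistics import mean, stdev


def sd_outliers(values, k=2.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator), matching Octave's default std()
    return [x for x in values if abs(x - m) > k * s]


first = [1, 1, 2, 2.2, 3, 3.5, 4.1, 9]
second = [1, 1, 2, 2.2, 3, 3.5, 4.1, 19, 62]
```

Here sd_outliers(first) returns [9], while sd_outliers(second) returns only [62].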

Median absolute deviation

The median absolute deviation (MAD) is a robust measure of statistical dispersion that is far more resilient to a few extreme outliers. It is defined as the median of the absolute deviations from the data’s median: $\mathrm{MAD} = \operatorname{median}_i\left(\left|X_i - \operatorname{median}(X)\right|\right)$.

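Let’s compute the robust z-score of each of the previous data points, defined here as its absolute deviation from the median divided by the MAD. In Octave, mad(a, 1) returns the median absolute deviation (with the default option 0, mad returns the mean absolute deviation instead):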

octave:1> a = [1 1 2 2.2 3 3.5 4.1 19 62];
octave:2> abs(a - median(a)) / mad(a, 1)
ans =

    1.81818    1.81818    0.90909    0.72727    0.00000    0.45455    1.00000   14.54545   53.63636

Using the same cut-off factor of 2, we find that the outliers here are entries 19 and 62.

octave:3> a(find((abs(a - median(a)) / mad(a, 1)) > 2))
ans =

   19   62

It’s not a perfect solution, but it is considerably more robust than the standard-deviation rule.
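The same computation can be sketched in Python with only the standard library. With the default scale=1.0, robust_z mirrors the Octave expression abs(a - median(a)) / mad(a, 1).

The optional scale parameter is an addition not used in the post. A common convention is scale=1.4826, which makes the MAD comparable to the SD for normally distributed data. The zero-MAD guard is likewise an added assumption: the ratio is undefined once more than half of the values equal the median.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: median(|x_i - median(x)|), like Octave's mad(x, 1)."""
    med = median(values)
    return median(abs(x - med) for x in values)


def robust_z(values, scale=1.0):
    """Absolute deviation from the median, in units of scale * MAD."""
    med = median(values)
    spread = scale * mad(values)
    if spread == 0:
        raise ValueError("MAD is zero: more than half of the values equal the median")
    return [abs(x - med) / spread for x in values]


def mad_outliers(values, k=2.0, scale=1.0):
    """Return the values whose robust z-score exceeds the cut-off k."""
    return [x for x, z in zip(values, robust_z(values, scale)) if z > k]
```

For example, mad_outliers([1, 1, 2, 2.2, 3, 3.5, 4.1, 19, 62]) returns [19, 62], matching the Octave session. With scale=1.4826 the same two entries are flagged, since every other value's z-score stays below 2.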


Wrapping up

For every playlist, we translate each song’s mood into a normalized score using VADER and compute its robust z-score in terms of the median absolute deviation. Playlists are then ranked by the distance between their maximum and minimum sentiments and by the sum of their songs’ z-scores. Thus, a playlist containing a negative song among mostly positive ones is surfaced first, with its outlier songs highlighted for review.
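A hedged sketch of this review queue follows. The post names the two ranking criteria but not how they are combined. This version assumes a lexicographic ordering: sentiment range first, then total z-score, both descending. It assumes the playlist ids and song scores are given, and it uses a cut-off of 2 to highlight outlier songs. When the MAD is zero, any song that deviates from the median is treated as infinitely far away; this handling is an added assumption. All names here are hypothetical.

```python
import math
from statistics import median


def robust_z(scores):
    """|score - median| / MAD for each song in one playlist."""
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    if mad == 0:  # assumption: with zero MAD, any deviation from the median counts as extreme
        return [0.0 if s == med else math.inf for s in scores]
    return [abs(s - med) / mad for s in scores]


def review_queue(playlists, k=2.0):
    """Rank playlists for review; `playlists` maps a hypothetical playlist id to song scores in [-1, 1]."""
    report = []
    for pid, scores in playlists.items():
        z = robust_z(scores)
        spread = max(scores) - min(scores)                # distance between max and min sentiment
        outliers = [i for i, zi in enumerate(z) if zi > k]  # indices of songs to highlight
        report.append((spread, sum(z), pid, outliers))
    # Assumed ordering: larger spread first, ties broken by larger total z-score.
    report.sort(key=lambda r: (r[0], r[1]), reverse=True)
    return [(pid, outliers) for _, _, pid, outliers in report]
```

For example, given a steady "calm" playlist and a "joyful" playlist with one strongly negative track, the joyful one is listed first and its last song (index 4) is highlighted:

review_queue({"calm": [0.3, 0.32, 0.4, 0.45, 0.5], "joyful": [0.6, 0.7, 0.8, 0.7, -0.6]})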

What’s next?

Now that we are able to analyse playlists along a single attribute (sentiment), we have begun looking into methods that handle multiple dimensions efficiently. Hopefully this will be the topic of another story.


[1] Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.

Tags: Data Analysis, Research and Development, Sound, Statistics
Elias El Khoury


VP Information & Content Systems @ Anghami & OSN+, joined Anghami in 2016

© 2021 Anghami
