Phonemes and Frequencies, or, Why is TLC's No Scrubs So Catchy?

No, I don't want no scrub

A scrub is a guy that can't get no love from me

Hangin' out the passenger side of his best friend's ride

Tryin' to holla at me

[...]

Well a scrub checkin' me, but his game is kinda weak

And I know that he cannot approach me

'Cause I'm lookin' like class and he's lookin' like trash

Can't get wit' a deadbeat ass, so...

Who Cares?

TODO explain motivations, goals, and outline this process.

Start with a question
Inductive phase: Find an answer
- Come up with a useful framework for understanding the question
- Stare at the original problem through the lens of the framework, and reframe the question
- Infer signal features or patterns using your framework of understanding
- Come up with a testable hypothesis (or several)
Deductive phase: Devil's advocate
- Find a test set
- Large-scale testing and deductive reasoning to prove the theory
- Note limitations of the test
Conclusion phase: Did you discover something, or fool yourself?

The Jam

Let's start with the chorus:

No, I don't want your number
No, I don't wanna give you mine and
No, I don't wanna meet you nowhere
No, I don't want none of your time and

No, I don't want no scrub
A scrub is a guy that can't get no love from me
Hangin' out the passenger side of his best friend's ride
Tryin' to holla at me

Right off the bat, without having to use any computer magic, we can notice a few things:

Near-rhyme: number, nowhere
Near-rhyme: you mine, your time
Near-rhyme: no scrub, no love
Assonance: a, scrub, love, from
?: passenger side, his best friend's ride

Rhyme, near-rhyme, assonance, alliteration, huh?

What are these? Define them

One tool for making the "sounds" of words easier to analyze is the phoneme: one of a set of discrete sound units used in human language.

All About Phonemes

English has X phonemes. Here they are;

arpabet and cmudict

Some python to convert text into phonetic representation;

chorus in phonemes;

Are the patterns easier to see now?
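One way to make those patterns explicit is to compare the tail of each word, from its last stressed vowel onward. The sketch below is only one rough way to formalize "rhyme" and "near-rhyme", not a standard definition. The helper names (rhyme_tail, classify) are hypothetical, and the pronunciations are hard-coded, copied from the CMUdict output shown later in this post.

```python
# Pronunciations copied from the CMUdict output later in this post.
PRONUNCIATIONS = {
    "SCRUB": ("S", "K", "R", "AH1", "B"),
    "LOVE": ("L", "AH1", "V"),
    "SIDE": ("S", "AY1", "D"),
    "RIDE": ("R", "AY1", "D"),
    "MINE": ("M", "AY1", "N"),
    "TIME": ("T", "AY1", "M"),
}

def is_vowel(phoneme):
    # In CMUdict, only vowel phonemes carry a trailing stress digit (0, 1 or 2).
    return phoneme[-1].isdigit()

def rhyme_tail(phonemes):
    """Return the phonemes from the last stressed vowel to the end of the word."""
    for i in range(len(phonemes) - 1, -1, -1):
        if is_vowel(phonemes[i]) and phonemes[i][-1] in "12":
            return tuple(phonemes[i:])
    return tuple(phonemes)  # no stressed vowel: fall back to the whole word

def classify(word_a, word_b):
    """Assumed definitions: identical tails rhyme; same vowel only is a near-rhyme."""
    tail_a = rhyme_tail(PRONUNCIATIONS[word_a])
    tail_b = rhyme_tail(PRONUNCIATIONS[word_b])
    if tail_a == tail_b:
        return "rhyme"
    if (tail_a and tail_b
            and is_vowel(tail_a[0]) and is_vowel(tail_b[0])
            and tail_a[0][:-1] == tail_b[0][:-1]):
        return "near-rhyme"  # same vowel, different trailing consonants
    return "no match"

for pair in [("SIDE", "RIDE"), ("SCRUB", "LOVE"), ("MINE", "TIME")]:
    print(pair, classify(*pair))
```

Under these assumptions, side/ride comes out as a full rhyme, while scrub/love and mine/time share a vowel but end on different consonants.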

Convert the verse to phonemes

Let's download the most current version of CMUdict and fire up an IPython session.

Download the dictionary file from cmudict:

import urllib.request

CMUDICT_LOCATION = r"http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b"

urllib.request.urlretrieve(CMUDICT_LOCATION, r"./cmudict")

('./cmudict', <http.client.HTTPMessage at 0x7f2f78033370>)

And let's take a peek at what's inside. This command shows the last 10 lines of our downloaded file.

cat cmudict | tail -10

ZYUGANOV  Z Y UW1 G AA0 N AA0 V
ZYUGANOV(1)  Z UW1 G AA0 N AA0 V
ZYUGANOV'S  Z Y UW1 G AA0 N AA0 V Z
ZYUGANOV'S(1)  Z UW1 G AA0 N AA0 V Z
ZYWICKI  Z IH0 W IH1 K IY0
{BRACE  B R EY1 S
{LEFT-BRACE  L EH1 F T B R EY1 S
{OPEN-BRACE  OW1 P EH0 N B R EY1 S
}CLOSE-BRACE  K L OW1 Z B R EY1 S
}RIGHT-BRACE  R AY1 T B R EY1 S

As documented on Carnegie Mellon's site, the format of the file is as follows: TODO
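Judging only from the lines shown above, each entry appears to be a headword, an optional (n) suffix marking an alternate pronunciation, and then whitespace-separated phonemes, where vowels end in a stress digit. The sketch below encodes that reading; it is an inference from the sample, not the official specification, and parse_entry is a hypothetical helper name.

```python
import re

# Optional "(n)" suffix that marks an alternate pronunciation, e.g. ZYUGANOV(1)
VARIANT_RE = re.compile(r"^(?P<word>.+?)(?:\((?P<variant>\d+)\))?$")

def parse_entry(line):
    """Split one dictionary line into (word, variant_index, phonemes)."""
    head, *phonemes = line.split()
    match = VARIANT_RE.match(head)
    # Entries without a suffix are treated as variant 0 (an assumption).
    variant = int(match.group("variant")) if match.group("variant") else 0
    return match.group("word"), variant, tuple(phonemes)

print(parse_entry("ZYUGANOV(1)  Z UW1 G AA0 N AA0 V"))
print(parse_entry("ZYWICKI  Z IH0 W IH1 K IY0"))
```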

Let's look at the start of the file too:

cat cmudict | head -10

;;; # CMUdict  --  Major Version: 0.07
;;; 
;;; # $HeadURL$
;;; # $Date::                                                   $:
;;; # $Id::                                                     $:
;;; # $Rev::                                                    $: 
;;; # $Author::                                                 $:
;;;
;;; #
;;; # ========================================================================

Ah, yeah, there are comments to ignore as well. We'll drop any lines beginning with three semicolons.

Okay, so, that's cool. Let's load this data into Python by creating a map from words to pronunciations.

import re

word_to_pronunciations = {}

# Iterate across every line in the file
with open("cmudict", "r", encoding="latin-1") as fin:
    for line in fin:

        # Skip empty lines; unpacking an empty split would raise ValueError
        fields = line.split()
        if not fields:
            continue

        # Split the word off the front of the line
        word, *phonemes = fields

        # If this word starts with punctuation, skip to the next one.
        # This skips comment lines too, and also any headword that
        # begins with an apostrophe.
        first_letter_value = ord(word[0])
        if first_letter_value < ord('A') or first_letter_value > ord('Z'):
            continue

        # Remove the (1), (2) etc from the end of words with multiple pronunciations
        word = re.sub( r"\(\d+\)$", "", word )

        # Get the current pronunciation list for this word, creating it
        # if needed, and add the new pronunciation.
        word_to_pronunciations.setdefault(word, []).append(tuple(phonemes))

As a test, let's see how "tomato" is pronounced:

word_to_pronunciations[ "TOMATO" ]

[('T', 'AH0', 'M', 'EY1', 'T', 'OW2'), ('T', 'AH0', 'M', 'AA1', 'T', 'OW2')]

The debate continues, I guess.
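To pin down exactly where the two pronunciations disagree, a small comparison helps. The sketch below hard-codes the two tuples printed above. The helper name differing_positions is hypothetical, and it assumes both pronunciations have the same number of phonemes.

```python
TOMATO = [
    ("T", "AH0", "M", "EY1", "T", "OW2"),
    ("T", "AH0", "M", "AA1", "T", "OW2"),
]

def differing_positions(first, second):
    """Return (index, phoneme_a, phoneme_b) for each position where the two differ."""
    # zip() stops at the shorter sequence, so this only makes sense for equal lengths.
    return [(i, a, b) for i, (a, b) in enumerate(zip(first, second)) if a != b]

print(differing_positions(*TOMATO))
```

The only difference is the stressed vowel at index 3: EY1 versus AA1.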

Anyhoo, let's translate the verse into phonemes:

VERSE_1 = """
No, I don't want your number
No, I don't wanna give you mine and
No, I don't wanna meet you nowhere
No, I don't want none of your time and

No, I don't want no scrub
A scrub is a guy that can't get no love from me
Hangin' out the passenger side of his best friend's ride
Tryin' to holla at me
"""

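# Despite the name, this matches any single character that is NOT a letter
# or an apostrophe, i.e. the separators between words.
LETTERS_AND_APOSTROPHE_RE = r"[^a-zA-Z']"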

def text_to_phonemes( text ):
    ret = []
    for line in text.split("\n"):
        for token in re.split("(" + LETTERS_AND_APOSTROPHE_RE + ")", line):
            if token.upper() in word_to_pronunciations:
                # We have a pronunciation for this token -- add it!
                ret.append( "|".join( word_to_pronunciations[ token.upper() ][0] ) )
            else:
                ret.append( token )
        ret.append( "\n" )
    return "".join( ret )

print( text_to_phonemes( VERSE_1 ) )


N|OW1, AY1 D|OW1|N|T W|AA1|N|T Y|AO1|R N|AH1|M|B|ER0
N|OW1, AY1 D|OW1|N|T W|AA1|N|AH0 G|IH1|V Y|UW1 M|AY1|N AH0|N|D
N|OW1, AY1 D|OW1|N|T W|AA1|N|AH0 M|IY1|T Y|UW1 N|OW1|W|EH2|R
N|OW1, AY1 D|OW1|N|T W|AA1|N|T N|AH1|N AH1|V Y|AO1|R T|AY1|M AH0|N|D

N|OW1, AY1 D|OW1|N|T W|AA1|N|T N|OW1 S|K|R|AH1|B
AH0 S|K|R|AH1|B IH1|Z AH0 G|AY1 DH|AE1|T K|AE1|N|T G|EH1|T N|OW1 L|AH1|V F|R|AH1|M M|IY1
HH|AE1|NG|G|IH0|N AW1|T DH|AH0 P|AE1|S|AH0|N|JH|ER0 S|AY1|D AH1|V HH|IH1|Z B|EH1|S|T F|R|EH1|N|D|Z R|AY1|D
T|R|AY1|IH0|N T|UW1 holla AE1|T M|IY1

Let's add a pronunciation for "holla":

word_to_pronunciations[ "HOLLA" ] = [ ( "HH", "AA1", "L", "AH0" ) ]

And try again:

print( text_to_phonemes( VERSE_1 ) )


N|OW1, AY1 D|OW1|N|T W|AA1|N|T Y|AO1|R N|AH1|M|B|ER0
N|OW1, AY1 D|OW1|N|T W|AA1|N|AH0 G|IH1|V Y|UW1 M|AY1|N AH0|N|D
N|OW1, AY1 D|OW1|N|T W|AA1|N|AH0 M|IY1|T Y|UW1 N|OW1|W|EH2|R
N|OW1, AY1 D|OW1|N|T W|AA1|N|T N|AH1|N AH1|V Y|AO1|R T|AY1|M AH0|N|D

N|OW1, AY1 D|OW1|N|T W|AA1|N|T N|OW1 S|K|R|AH1|B
AH0 S|K|R|AH1|B IH1|Z AH0 G|AY1 DH|AE1|T K|AE1|N|T G|EH1|T N|OW1 L|AH1|V F|R|AH1|M M|IY1
HH|AE1|NG|G|IH0|N AW1|T DH|AH0 P|AE1|S|AH0|N|JH|ER0 S|AY1|D AH1|V HH|IH1|Z B|EH1|S|T F|R|EH1|N|D|Z R|AY1|D
T|R|AY1|IH0|N T|UW1 HH|AA1|L|AH0 AE1|T M|IY1

Well, that's hard to look at. Let's add some color by converting our output from text to HTML, and highlighting vowel phonemes.

EXPORT_BOX_STYLE = """
border: 1px solid #ccc;
box-shadow: 3px 3px 3px #eee;
padding: 8pt;
font-family: monospace;
overflow: auto;
margin: 1.2em;"""

WORD_STYLE = """
padding: 1px;
border: 1px solid black;
display: inline-block;"""

VOWEL_PHONEMES = [ "AA", "AE", "AH", "AO", "EH", "IH", "IY", "UH", "UW",
                   "AW", "AY", "ER", "EY", "OW", "OY" ]

def background_color( text, colorcode ):
    return r"""<span style="background-color:#{};">{}</span>""".format( colorcode, text )

def colorize_phonemes( *phonemes ):
    template = r"""<span style="{}">{}</span>"""
    ret = []
    for p in phonemes:
        if p[:-1] in VOWEL_PHONEMES:
            ret.append( background_color( p, "fdd" ) )
        else:
            ret.append( background_color( p, "dfd" ) )
    return template.format( WORD_STYLE, "|".join( ret ) )

def text_to_phonemes( text ):
    template = r"""<div style="{}">{}</div>"""
    ret = []
    for line in text.split("\n"):
        for token in re.split("(" + LETTERS_AND_APOSTROPHE_RE + ")", line):
            if token.upper() in word_to_pronunciations:
                # We have a pronunciation for this token -- add it!
                first_pronunciation = word_to_pronunciations[ token.upper() ][0]
                colored_phonemes = colorize_phonemes( *first_pronunciation )
                ret.append( colored_phonemes )
            else:
                ret.append( token )
        ret.append( "<br>\n" )
    return template.format( EXPORT_BOX_STYLE, "".join( ret ) )

print( text_to_phonemes( VERSE_1 ) )

N|OW1, AY1 D|OW1|N|T W|AA1|N|T Y|AO1|R N|AH1|M|B|ER0

N|OW1, AY1 D|OW1|N|T W|AA1|N|AH0 G|IH1|V Y|UW1 M|AY1|N AH0|N|D

N|OW1, AY1 D|OW1|N|T W|AA1|N|AH0 M|IY1|T Y|UW1 N|OW1|W|EH2|R

N|OW1, AY1 D|OW1|N|T W|AA1|N|T N|AH1|N AH1|V Y|AO1|R T|AY1|M AH0|N|D

N|OW1, AY1 D|OW1|N|T W|AA1|N|T N|OW1 S|K|R|AH1|B

AH0 S|K|R|AH1|B IH1|Z AH0 G|AY1 DH|AE1|T K|AE1|N|T G|EH1|T N|OW1 L|AH1|V F|R|AH1|M M|IY1

HH|AE1|NG|G|IH0|N AW1|T DH|AH0 P|AE1|S|AH0|N|JH|ER0 S|AY1|D AH1|V HH|IH1|Z B|EH1|S|T F|R|EH1|N|D|Z R|AY1|D

T|R|AY1|IH0|N T|UW1 HH|AA1|L|AH0 AE1|T M|IY1

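Now we're cooking with gas! Let's add the syllable count after each word, and change up the colors to something more appealing. In the plain-text rendering of the output, the count is the final digit after each word, so "N|OW11" reads as "N|OW1" followed by a count of 1.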

EXPORT_BOX_STYLE = """
border: 1px solid #ccc;
box-shadow: 3px 3px 3px #eee;
padding: 8pt;
font-family: monospace;
font-size: 13px;
overflow: auto;
margin: 1.2em;"""

WORD_STYLE = """
padding: 1px;
border: 1px solid black;
display: inline-block;"""

VOWEL_PHONEMES = [ "AA", "AE", "AH", "AO", "EH", "IH", "IY", "UH", "UW",
                   "AW", "AY", "ER", "EY", "OW", "OY" ]

def background_color( text, colorcode ):
    return r"""<span style="background-color:#{};">{}</span>""".format( colorcode, text )

def word_to_html( word ):
    template = r"""<span title="{}" style="{}">{}</span>{}"""
    ret = []
    phonemes = word_to_pronunciations[ word.upper() ][0]
    for p in phonemes:
        if p[:-1] in VOWEL_PHONEMES:
            ret.append( background_color( p, "fdd" ) )
        else:
            ret.append( background_color( p, "dfd" ) )
    syllable_count = len( [ p for p in phonemes if p[:-1] in VOWEL_PHONEMES ] )
    return template.format( word, WORD_STYLE, "|".join( ret ), syllable_count )

def text_to_phonemes( text ):
    template = r"""<div style="{}">{}</div>"""
    ret = []
    for line in text.split("\n"):
        for token in re.split("(" + LETTERS_AND_APOSTROPHE_RE + ")", line):
            if token.upper() in word_to_pronunciations:
                # We have a pronunciation for this token -- add it!
                colored_phonemes = word_to_html( token )
                ret.append( colored_phonemes )
            else:
                ret.append( token )
        ret.append( "<br>\n" )
    return template.format( EXPORT_BOX_STYLE, "".join( ret ) )

print( text_to_phonemes( VERSE_1 ) )

N|OW11, AY11 D|OW1|N|T1 W|AA1|N|T1 Y|AO1|R1 N|AH1|M|B|ER02

N|OW11, AY11 D|OW1|N|T1 W|AA1|N|AH02 G|IH1|V1 Y|UW11 M|AY1|N1 AH0|N|D1

N|OW11, AY11 D|OW1|N|T1 W|AA1|N|AH02 M|IY1|T1 Y|UW11 N|OW1|W|EH2|R2

N|OW11, AY11 D|OW1|N|T1 W|AA1|N|T1 N|AH1|N1 AH1|V1 Y|AO1|R1 T|AY1|M1 AH0|N|D1

N|OW11, AY11 D|OW1|N|T1 W|AA1|N|T1 N|OW11 S|K|R|AH1|B1

AH01 S|K|R|AH1|B1 IH1|Z1 AH01 G|AY11 DH|AE1|T1 K|AE1|N|T1 G|EH1|T1 N|OW11 L|AH1|V1 F|R|AH1|M1 M|IY11

HH|AE1|NG|G|IH0|N2 AW1|T1 DH|AH01 P|AE1|S|AH0|N|JH|ER03 S|AY1|D1 AH1|V1 HH|IH1|Z1 B|EH1|S|T1 F|R|EH1|N|D|Z1 R|AY1|D1

T|R|AY1|IH0|N2 T|UW11 HH|AA1|L|AH02 AE1|T1 M|IY11
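With a count per word, a natural next step is a total per line, plus a tally of which vowels carry the line. This is a minimal sketch that works on the pipe-delimited text form printed earlier rather than on the HTML. The helper name line_stats is hypothetical, and it assumes (as in CMUdict) that every vowel phoneme ends in a stress digit.

```python
from collections import Counter

# Two lines of the chorus in the pipe-delimited form produced earlier.
LINE_5 = "N|OW1, AY1 D|OW1|N|T W|AA1|N|T N|OW1 S|K|R|AH1|B"
LINE_6 = ("AH0 S|K|R|AH1|B IH1|Z AH0 G|AY1 DH|AE1|T K|AE1|N|T "
          "G|EH1|T N|OW1 L|AH1|V F|R|AH1|M M|IY1")

def line_stats(phoneme_line):
    """Return (syllable_count, Counter of vowels with stress digits removed)."""
    vowels = Counter()
    # Treat commas as word separators, then split each word into phonemes.
    for word in phoneme_line.replace(",", " ").split():
        for phoneme in word.split("|"):
            if phoneme and phoneme[-1].isdigit():  # vowels end in a stress digit
                vowels[phoneme[:-1]] += 1
    return sum(vowels.values()), vowels

for line in (LINE_5, LINE_6):
    syllables, vowels = line_stats(line)
    print(syllables, vowels.most_common(3))
```

For the second line, five of the twelve syllables are AH, which lines up with the assonance on a, scrub, love, from that we noticed by ear.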

Let's Hear It!

Only half the story – lyrics do not capture the intensity or emphasis given