Phonemes and Frequencies, or, Why is TLC's No Scrubs So Catchy?
No, I don't want no scrub
A scrub is a guy that can't get no love from me
Hangin' out the passenger side of his best friend's ride
Tryin' to holla at me
[...]
Well a scrub checkin' me, but his game is kinda weak
And I know that he cannot approach me
'Cause I'm lookin' like class and he's lookin' like trash
Can't get wit' a deadbeat ass, so...
Who Cares?
TODO explain motivations, goals, and outline this process.
- Start with a question
- Inductive Phase: Find an answer
- Come up with a useful framework for understanding the question
- Stare at the original problem through the lens of the framework, and reframe the question
- Infer signal features or patterns using your framework of understanding
- Come up with a testable hypothesis (or more)
- Deductive phase: Devil's advocate
- Find a test set
- Large scale testing and deductive reasoning to prove theory
- Note limitations of the test
- Conclusion phase: Did you discover something, or fool yourself?
The Jam
Let's start with the chorus:
No, I don't want your number
No, I don't wanna give you mine and
No, I don't wanna meet you nowhere
No, I don't want none of your time and
No, I don't want no scrub
A scrub is a guy that can't get no love from me
Hangin' out the passenger side of his best friend's ride
Tryin' to holla at me
Right off the bat, without having to use any computer magic, we can notice a few things:
- Near-rhyme: number, nowhere
- Near-rhyme: you mine, your time
- Near-rhyme: No scrub, no love
- Assonance: a, scrub, love, from
- ?: passenger side, his best friend's ride
Rhyme, near rhyme, assonance, alliteration, huh?
What are these? Define them
One tool we have for making the "sounds" of words easier to analyze is the "phoneme": one of a set of discrete sounds used in human language.
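As a rough sketch of why this helps, here are three words from the chorus written as phoneme lists in the ARPAbet notation covered in the next section (the same transcriptions the CMU dictionary produces later in this post). The helper name `vowel_sounds` is made up for this illustration. Spelling hides the fact that "scrub", "love", and "from" share a vowel; the phonemes make it hard to miss.

```python
# Phoneme transcriptions (ARPAbet) for three words from the chorus.
# The trailing digit on a vowel marks stress; consonants carry no digit.
PHONEMES = {
    "scrub": ["S", "K", "R", "AH1", "B"],
    "love": ["L", "AH1", "V"],
    "from": ["F", "R", "AH1", "M"],
}

def vowel_sounds(phonemes):
    """Return the vowels in a pronunciation, with stress digits stripped."""
    return [p[:-1] for p in phonemes if p[-1].isdigit()]

for word, phonemes in PHONEMES.items():
    print(word, vowel_sounds(phonemes))
```

All three words come back with the single vowel `AH`, which is the assonance noted in the list above.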
All About Phonemes
English has X phonemes. Here they are;
arpabet and cmudict
Some python to convert text into phonetic representation;
chorus in phonemes;
Are the patterns easier to see now?
Convert the verse to phonemes
Let's download the most current version of CMUDict & fire up an IPython session.
Download the dictionary file from cmudict:
import urllib.request

CMUDICT_LOCATION = r"http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict-0.7b"
urllib.request.urlretrieve(CMUDICT_LOCATION, r"./cmudict")
('./cmudict', <http.client.HTTPMessage at 0x7f2f78033370>)
And let's take a peek at what's inside. This command shows the last 10 lines of our downloaded file.
cat cmudict | tail -10
ZYUGANOV Z Y UW1 G AA0 N AA0 V
ZYUGANOV(1) Z UW1 G AA0 N AA0 V
ZYUGANOV'S Z Y UW1 G AA0 N AA0 V Z
ZYUGANOV'S(1) Z UW1 G AA0 N AA0 V Z
ZYWICKI Z IH0 W IH1 K IY0
{BRACE B R EY1 S
{LEFT-BRACE L EH1 F T B R EY1 S
{OPEN-BRACE OW1 P EH0 N B R EY1 S
}CLOSE-BRACE K L OW1 Z B R EY1 S
}RIGHT-BRACE R AY1 T B R EY1 S
As documented on Carnegie Mellon's site, the format of the file is as follows; TODO
Let's look at the start of the file too:
cat cmudict | head -10
;;; # CMUdict -- Major Version: 0.07
;;;
;;; # $HeadURL$
;;; # $Date:: $:
;;; # $Id:: $:
;;; # $Rev:: $:
;;; # $Author:: $:
;;;
;;; #
;;; # ========================================================================
Ah, yeah, there are comments to ignore as well. We'll drop any lines beginning with three semicolons.
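Judging only from the two excerpts above, each entry appears to be a word, an optional `(1)`, `(2)` suffix marking an alternate pronunciation, and then a space-separated list of phonemes. Here is a minimal sketch of parsing a single line under that assumption. The function name `parse_cmudict_line` and its return shape are hypothetical, chosen for this illustration.

```python
import re

def parse_cmudict_line(line):
    """Parse one dictionary line into (word, variant, phonemes).

    Returns None for blank lines and ';;;' comment lines.
    """
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    entry, *phonemes = line.split()
    # Alternate pronunciations carry a numeric suffix, e.g. "ZYUGANOV(1)".
    match = re.fullmatch(r"(.+?)\((\d+)\)", entry)
    if match:
        word, variant = match.group(1), int(match.group(2))
    else:
        word, variant = entry, 0
    return word, variant, tuple(phonemes)

print(parse_cmudict_line("ZYUGANOV(1) Z UW1 G AA0 N AA0 V"))
print(parse_cmudict_line(";;; # CMUdict -- Major Version: 0.07"))
```

Unlike the loader used in the rest of this post, this sketch keeps punctuation entries such as `{BRACE`; it only drops blank lines and comments.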
Okay, so, that's cool. Let's load this data into Python by creating a map from words to pronunciations.
import re

word_to_pronunciations = {}

# Iterate across every line in the file
with open("cmudict", "r", encoding="latin-1") as fin:
    for line in fin:
        # Split the line on whitespace, skipping empty lines
        # (unpacking an empty list below would raise a ValueError)
        tokens = line.split()
        if not tokens:
            continue

        # Split the word off the front of the line
        word, *phonemes = tokens

        # If this word is punctuation, skip to the next one
        # This skips comment lines too
        first_letter_value = ord(word[0])
        if first_letter_value < ord('A') or first_letter_value > ord('Z'):
            continue

        # Remove the (1), (2) etc. from the end of words with multiple pronunciations
        word = re.sub( r"\(\d+\)$", "", word )

        # Get the current pronunciation list for this word, creating it
        # if needed, and add the new pronunciation.
        word_to_pronunciations.setdefault( word, [] ).append( tuple(phonemes) )
As a test, let's see how "tomato" is pronounced:
word_to_pronunciations[ "TOMATO" ]
[('T', 'AH0', 'M', 'EY1', 'T', 'OW2'), ('T', 'AH0', 'M', 'AA1', 'T', 'OW2')]
The debate continues, I guess.
Anyhoo, let's translate the verse into phonemes:
VERSE_1 = """
No, I don't want your number
No, I don't wanna give you mine and
No, I don't wanna meet you nowhere
No, I don't want none of your time and
No, I don't want no scrub
A scrub is a guy that can't get no love from me
Hangin' out the passenger side of his best friend's ride
Tryin' to holla at me
"""

# Matches any single character that is NOT a letter or an apostrophe,
# i.e. the separators between words
LETTERS_AND_APOSTROPHE_RE = r"[^a-zA-Z']"

def text_to_phonemes( text ):
    ret = []
    for line in text.split("\n"):
        for token in re.split("(" + LETTERS_AND_APOSTROPHE_RE + ")", line):
            if token.upper() in word_to_pronunciations:
                # We have a pronunciation for this token -- add it!
                ret.append( "|".join( word_to_pronunciations[ token.upper() ][0] ) )
            else:
                ret.append( token )
        ret.append( "\n" )
    return "".join( ret )

print( text_to_phonemes( VERSE_1 ) )
N|OW1, AY1 D|OW1|N|T W|AA1|N|T Y|AO1|R N|AH1|M|B|ER0
N|OW1, AY1 D|OW1|N|T W|AA1|N|AH0 G|IH1|V Y|UW1 M|AY1|N AH0|N|D
N|OW1, AY1 D|OW1|N|T W|AA1|N|AH0 M|IY1|T Y|UW1 N|OW1|W|EH2|R
N|OW1, AY1 D|OW1|N|T W|AA1|N|T N|AH1|N AH1|V Y|AO1|R T|AY1|M AH0|N|D
N|OW1, AY1 D|OW1|N|T W|AA1|N|T N|OW1 S|K|R|AH1|B
AH0 S|K|R|AH1|B IH1|Z AH0 G|AY1 DH|AE1|T K|AE1|N|T G|EH1|T N|OW1 L|AH1|V F|R|AH1|M M|IY1
HH|AE1|NG|G|IH0|N AW1|T DH|AH0 P|AE1|S|AH0|N|JH|ER0 S|AY1|D AH1|V HH|IH1|Z B|EH1|S|T F|R|EH1|N|D|Z R|AY1|D
T|R|AY1|IH0|N T|UW1 holla AE1|T M|IY1
Everything converted except "holla", which isn't in the dictionary. Let's add a pronunciation for it:
word_to_pronunciations[ "HOLLA" ] = [ ( "HH", "AA1", "L", "AH0" ) ]
And try again:
print( text_to_phonemes( VERSE_1 ) )
N|OW1, AY1 D|OW1|N|T W|AA1|N|T Y|AO1|R N|AH1|M|B|ER0
N|OW1, AY1 D|OW1|N|T W|AA1|N|AH0 G|IH1|V Y|UW1 M|AY1|N AH0|N|D
N|OW1, AY1 D|OW1|N|T W|AA1|N|AH0 M|IY1|T Y|UW1 N|OW1|W|EH2|R
N|OW1, AY1 D|OW1|N|T W|AA1|N|T N|AH1|N AH1|V Y|AO1|R T|AY1|M AH0|N|D
N|OW1, AY1 D|OW1|N|T W|AA1|N|T N|OW1 S|K|R|AH1|B
AH0 S|K|R|AH1|B IH1|Z AH0 G|AY1 DH|AE1|T K|AE1|N|T G|EH1|T N|OW1 L|AH1|V F|R|AH1|M M|IY1
HH|AE1|NG|G|IH0|N AW1|T DH|AH0 P|AE1|S|AH0|N|JH|ER0 S|AY1|D AH1|V HH|IH1|Z B|EH1|S|T F|R|EH1|N|D|Z R|AY1|D
T|R|AY1|IH0|N T|UW1 HH|AA1|L|AH0 AE1|T M|IY1
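With the phonemes in hand, the rhymes spotted by eye earlier can be checked mechanically. The sketch below assumes one common working definition: two words rhyme when their sounds match from the last primary-stressed vowel onward, and near-rhyme when only that vowel matches. The names `rhyme_part`, `classify`, and `ENDINGS` are hypothetical, and the transcriptions are copied from the output above.

```python
def rhyme_part(phonemes):
    """Return the phonemes from the last primary-stressed vowel to the end."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i].endswith("1"):
            return tuple(phonemes[i:])
    return tuple(phonemes)  # no primary stress found: fall back to the whole word

# Line-ending words, transcribed as in the output above.
ENDINGS = {
    "mine": ("M", "AY1", "N"),
    "time": ("T", "AY1", "M"),
    "side": ("S", "AY1", "D"),
    "ride": ("R", "AY1", "D"),
}

def classify(a, b):
    """Label a pair of words as a rhyme, a near-rhyme, or no match."""
    tail_a, tail_b = rhyme_part(ENDINGS[a]), rhyme_part(ENDINGS[b])
    if tail_a == tail_b:
        return "rhyme"
    if tail_a[0] == tail_b[0]:
        return "near-rhyme (same stressed vowel)"
    return "no match"

print("side/ride:", classify("side", "ride"))
print("mine/time:", classify("mine", "time"))
```

Under this definition "side" and "ride" come out as a full rhyme, while "mine" and "time" share only the stressed vowel `AY1`, which matches the near-rhyme label given to "you mine, your time" at the start.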
Well, that's hard to look at. Let's add some color by converting our output from text to HTML, and highlighting vowel phonemes.
EXPORT_BOX_STYLE = """
    border: 1px solid #ccc;
    box-shadow: 3px 3px 3px #eee;
    padding: 8pt;
    font-family: monospace;
    overflow: auto;
    margin: 1.2em;"""

WORD_STYLE = """
    padding: 1px;
    border: 1px solid black;
    display: inline-block;"""

VOWEL_PHONEMES = [ "AA", "AE", "AH", "AO", "EH", "IH", "IY", "UH", "UW", "AW", "AY", "ER", "EY", "OW", "OY" ]

def background_color( text, colorcode ):
    return r"""<span style="background-color:#{};">{}</span>""".format( colorcode, text )

def colorize_phonemes( *phonemes ):
    template = r"""<span style="{}">{}</span>"""
    ret = []
    for p in phonemes:
        # Vowels end in a stress digit; strip it before looking the phoneme up
        if p[:-1] in VOWEL_PHONEMES:
            ret.append( background_color( p, "fdd" ) )
        else:
            ret.append( background_color( p, "dfd" ) )
    return template.format( WORD_STYLE, "|".join( ret ) )

def text_to_phonemes( text ):
    template = r"""<div style="{}">{}</div>"""
    ret = []
    for line in text.split("\n"):
        for token in re.split("(" + LETTERS_AND_APOSTROPHE_RE + ")", line):
            if token.upper() in word_to_pronunciations:
                # We have a pronunciation for this token -- add it!
                first_pronunciation = word_to_pronunciations[ token.upper() ][0]
                colored_phonemes = colorize_phonemes( *first_pronunciation )
                ret.append( colored_phonemes )
            else:
                ret.append( token )
        # End each lyric line with an HTML line break
        ret.append( "<br>\n" )
    return template.format( EXPORT_BOX_STYLE, "".join( ret ) )

print( text_to_phonemes( VERSE_1 ) )
Now we're cooking with gas! Let's add the syllable count after each word, and change up the colors to something more appealing.
EXPORT_BOX_STYLE = """
    border: 1px solid #ccc;
    box-shadow: 3px 3px 3px #eee;
    padding: 8pt;
    font-family: monospace;
    font-size: 13px;
    overflow: auto;
    margin: 1.2em;"""

WORD_STYLE = """
    padding: 1px;
    border: 1px solid black;
    display: inline-block;"""

VOWEL_PHONEMES = [ "AA", "AE", "AH", "AO", "EH", "IH", "IY", "UH", "UW", "AW", "AY", "ER", "EY", "OW", "OY" ]

def background_color( text, colorcode ):
    return r"""<span style="background-color:#{};">{}</span>""".format( colorcode, text )

def word_to_html( word ):
    template = r"""<span title="{}" style="{}">{}</span>{}"""
    ret = []
    phonemes = word_to_pronunciations[ word.upper() ][0]
    for p in phonemes:
        # Vowels end in a stress digit; strip it before looking the phoneme up
        if p[:-1] in VOWEL_PHONEMES:
            ret.append( background_color( p, "fdd" ) )
        else:
            ret.append( background_color( p, "dfd" ) )
    # One syllable per vowel phoneme
    syllable_count = len( [ p for p in phonemes if p[:-1] in VOWEL_PHONEMES ] )
    return template.format( word, WORD_STYLE, "|".join( ret ), syllable_count )

def text_to_phonemes( text ):
    template = r"""<div style="{}">{}</div>"""
    ret = []
    for line in text.split("\n"):
        for token in re.split("(" + LETTERS_AND_APOSTROPHE_RE + ")", line):
            if token.upper() in word_to_pronunciations:
                # We have a pronunciation for this token -- add it!
                colored_phonemes = word_to_html( token )
                ret.append( colored_phonemes )
            else:
                ret.append( token )
        # End each lyric line with an HTML line break
        ret.append( "<br>\n" )
    return template.format( EXPORT_BOX_STYLE, "".join( ret ) )

print( text_to_phonemes( VERSE_1 ) )
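Per-word syllable counts also add up to a per-line total, which gives a rough feel for the meter. As a small sketch, the snippet below counts syllables straight from the plain-text phoneme output printed earlier, on the assumption that every vowel phoneme (and only a vowel phoneme) ends in a stress digit 0, 1, or 2. The names `PHONEME_LINES` and `count_syllables` are made up for this illustration.

```python
import re

# First two lines of the plain-text phoneme output printed earlier.
PHONEME_LINES = [
    "N|OW1, AY1 D|OW1|N|T W|AA1|N|T Y|AO1|R N|AH1|M|B|ER0",
    "N|OW1, AY1 D|OW1|N|T W|AA1|N|AH0 G|IH1|V Y|UW1 M|AY1|N AH0|N|D",
]

def count_syllables(phoneme_line):
    """Count vowel phonemes: runs of capital letters followed by a stress digit."""
    return len(re.findall(r"[A-Z]+[012]", phoneme_line))

for line in PHONEME_LINES:
    print(count_syllables(line), line)
```

The first line comes to 7 syllables and the second to 9.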
Let's Hear It!
The lyrics are only half the story: they do not capture the intensity or emphasis the singers give them.