[ C H R O N I C L E ]
“In My Life” is a nostalgic ballad by the Beatles, from the album Rubber Soul, “written mainly by John Lennon” with the help of Paul McCartney. I have put quotation marks around this attribution, taken from Wikipedia, because the two men remember their respective roles in writing the song differently, as they also do for another of the group’s hits, “Eleanor Rigby”. For years, journalists searched in vain for material evidence (sheet music, eyewitness accounts) to support one attribution or the other. A more scientific approach has recently been proposed by a small group of statisticians in North America.
At the beginning of August 2018, at the Joint Statistical Meetings (JSM 2018) in Vancouver, Mark Glickman (glicko.net), a statistician at Harvard University in the United States, presented a preliminary study carried out with Jason Brown of Dalhousie University in Canada. Their work applies to music the analysis techniques normally used to attribute a text to one of several candidate authors. The most famous example of this kind of analysis is the identification of the American terrorist known as the Unabomber as the former mathematician Theodore Kaczynski.
The first step is to build a “profile” of each author. From texts known to be theirs, the probability of each word or expression appearing is calculated (excluding articles, pronouns and a few other very common words). This gives each author a “bag of words” that reflects their vocabulary. The disputed text is then examined, starting from the assumption that each author is equally likely to have written it (the a priori, or prior, probability). These probabilities are then updated using a simple probability rule, Bayes’ formula: each word in the text shifts the balance toward the author whose bag of words gives that word the higher probability. Intuitively, if a word that is rare in general is used often by one author and never by the other, its presence increases the probability that the text was written by the first author. Once the whole text has been processed, the resulting (posterior) probability answers the original attribution question. This general principle of Bayesian classification is widely used by software that identifies and filters out unwanted emails.
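The attribution procedure described above can be sketched as a small naive Bayes classifier. This is a minimal illustration under stated assumptions, not the statisticians’ code: the stop-word list, the whitespace tokenisation and the add-one (Laplace) smoothing, which keeps a word never seen for one author from ruling that author out entirely, are all choices made here, and the names `build_profile` and `posterior` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stop-word list: the chronicle only says that articles,
# pronouns and a few common words are left out.
STOP_WORDS = {"the", "a", "an", "i", "you", "he", "she", "it", "we", "they", "and", "of"}


def tokenize(text):
    """Lower-case words split on whitespace, with stop words removed."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_profile(texts):
    """An author's 'bag of words': word counts over their known texts."""
    counts = Counter()
    for t in texts:
        counts.update(tokenize(t))
    return counts


def posterior(mystery, profiles, priors=None):
    """Apply Bayes' formula word by word and return P(author | text)."""
    authors = list(profiles)
    if priors is None:  # equal a priori probabilities for every author
        priors = {a: 1 / len(authors) for a in authors}
    words = tokenize(mystery)
    vocab = set(words).union(*profiles.values())
    log_scores = {}
    for a in authors:
        total = sum(profiles[a].values())
        score = math.log(priors[a])
        for w in words:
            # Add-one smoothing: unseen words get a small, non-zero probability.
            score += math.log((profiles[a][w] + 1) / (total + len(vocab)))
        log_scores[a] = score
    # Normalise in log space to avoid numerical underflow on long texts.
    top = max(log_scores.values())
    weights = {a: math.exp(s - top) for a, s in log_scores.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}
```

With two invented toy corpora, `profiles = {"A": build_profile(["whilst the rain falls softly", "whilst we wait"]), "B": build_profile(["while the sun shines", "while you sleep"])}`, the call `posterior("whilst the night falls", profiles)` gives author A a probability of roughly 0.83, because “whilst” and “falls” appear only in A’s bag of words.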
The statisticians’ idea was to treat the melody of “In My Life” as a text and to compare it with other “texts” (songs) reliably attributed to John or to Paul. You do not need to be a musician to see that a melody contains no actual words, so an equivalent of words and expressions has to be found. The statisticians used several ways of slicing music into elements that play this role (some of these choices are somewhat arbitrary): chords, pairs of consecutive chords, and groups of four notes, here called “contours”. Each slicing is used in turn to refine the attribution probabilities: the probabilities obtained from a first classification serve as the starting (prior) values for a classification based on the second slicing, whose result in turn serves as the prior for the third, and so on.
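The chained classification can be sketched in the same spirit. Everything concrete here is an assumption made for illustration: songs are hypothetical dicts holding a chord list and a melody as MIDI pitch numbers, a four-note contour is reduced to the up/down/same direction of its three intervals, and each slicing is scored with a smoothed naive Bayes update. The point of the sketch is the loop in the hypothetical `attribute` function, where the posterior from one slicing becomes the prior for the next.

```python
import math
from collections import Counter

# Hypothetical song format: {"chords": [...], "notes": [MIDI pitches]}.


def chords(song):
    """First slicing: individual chords."""
    return list(song["chords"])


def chord_pairs(song):
    """Second slicing: pairs of consecutive chords."""
    c = song["chords"]
    return list(zip(c, c[1:]))


def contours(song):
    """Third slicing: four-note groups, reduced (an assumption) to the
    direction of their three intervals: 'u' up, 'd' down, 's' same."""
    n = song["notes"]

    def step(a, b):
        return "u" if b > a else "d" if b < a else "s"

    return ["".join(step(n[i + k], n[i + k + 1]) for k in range(3))
            for i in range(len(n) - 3)]


SLICINGS = [chords, chord_pairs, contours]


def update(prior, tokens, profiles):
    """One Bayes update over a list of tokens, with add-one smoothing."""
    vocab = set(tokens).union(*profiles.values())
    logs = {}
    for a, counts in profiles.items():
        total = sum(counts.values())
        logs[a] = math.log(prior[a]) + sum(
            math.log((counts[t] + 1) / (total + len(vocab))) for t in tokens)
    top = max(logs.values())
    w = {a: math.exp(s - top) for a, s in logs.items()}
    z = sum(w.values())
    return {a: v / z for a, v in w.items()}


def attribute(mystery, corpus):
    """corpus maps each author to a list of songs reliably attributed to them."""
    prob = {a: 1 / len(corpus) for a in corpus}  # equal starting prior
    for slicing in SLICINGS:
        profiles = {a: Counter(t for s in songs for t in slicing(s))
                    for a, songs in corpus.items()}
        # The posterior from this slicing becomes the prior for the next one.
        prob = update(prob, slicing(mystery), profiles)
    return prob
```

In this sketch, chaining the updates treats the three slicings as independent pieces of evidence, which is a simplification since they all derive from the same harmony and melody; a side effect is that the order of the slicings does not change the final probabilities.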
The result of the study is unambiguous: John Lennon is very probably the author of the music. The authors also applied their method to a single eight-bar passage rather than to the whole song, a passage that sounds more in the style of Paul McCartney. The conclusion is the same. The probabilities indicate that the music of this song about John’s memories of Liverpool is by… John. It remains to be understood why Paul remembers this composition differently; for a clear answer, I suspect we need to look instead to chemistry, or pharmacology…
Roger Mansuy teaches at the Lycée Louis-le-Grand in Paris, and is a member of the French Committee for Mathematics Teaching (CFEM).