Can content analytics provide the answer to who wrote To Kill a Mockingbird and put an old rumor to rest?
Was the person who wrote this…
When he was nearly thirteen, my brother Jem got his arm badly broken at the elbow. When it healed, and Jem’s fears of never being able to play football were assuaged, he was seldom self-conscious about his injury. His left arm was somewhat shorter than his right; when he stood or walked, the back of his hand was at right angles to his body, his thumb parallel to his thigh. He couldn’t have cared less, so long as he could pass and punt. When enough years had gone by to enable us to look back on them, we sometimes discussed the events leading to his accident. I maintain that the Ewells started it all, but Jem, who was four years my senior, said it started long before that. He said it began the summer Dill came to us, when Dill first gave us the idea of making Boo Radley come out.
The same person who wrote this?...
Since Atlanta, she had looked out the dining-car window with a delight almost physical. Over her breakfast coffee, she watched the last of Georgia’s hills recede and the red earth appear, and with it tin-roofed houses set in the middle of swept yards, and in the yards the inevitable verbena grew, surrounded by whitewashed tires. She grinned when she saw her first TV antenna atop an unpainted Negro house; as they multiplied, her joy rose.
Or the person who wrote this?...
The village of Holcomb stands on the high wheat plains of western Kansas, a lonesome area that other Kansans call ‘out there.’ Some seventy miles east of the Colorado border, the countryside, with its hard blue skies and desert-clear air, has an atmosphere that is rather more Far West than Middle West. The local accent is barbed with a prairie twang, and ranch-hand nasalness, and the men, many of them, wear narrow frontier trousers, Stetsons, and high-heeled boots with pointed toes. The land is flat, and the views are awesomely extensive; horses, herds of cattle, a white cluster of grain elevators rising as gracefully as Greek temples are visible long before a traveler reaches them.
Those who love books like I do will, of course, recognize quote 1 from To Kill a Mockingbird by Harper Lee, quote 2 from Go Set a Watchman by Harper Lee, and quote 3 from In Cold Blood by Truman Capote. (Memo to file – I need to reread In Cold Blood in a couple of weeks while on vacation.)
Before turning to text analytics, author attribution algorithms, and the long-simmering (and now rekindled) conversation about whether Ms. Lee actually wrote Mockingbird (or whether it was written by Mr. Capote), let me assume for a moment that both Watchman and Mockingbird were written by Ms. Lee (which I think is the case).
As lots of people have written (spoiler alert), Watchman is basically the first draft of Mockingbird. It is written from the perspective of an adult Scout and describes, shall we say, a less flattering later version of Atticus than the one in Mockingbird. The story goes that Ms. Lee’s editor liked some of the elements of the story in Watchman and asked her to rewrite the novel from the perspective of the young Scout. The rest is an amazing story of instant fame, a Pulitzer Prize and an Academy Award. The speed of that fame is even more amazing given that this tsunami occurred in the early 1960s, long before social media.
And from there, overwhelmed by it all, Ms. Lee escaped to Monroeville, Alabama, never to publish again. There is controversy over the release of the book now, given that it was facilitated by Ms. Lee’s lawyer just a short time after the death of Alice Lee, Ms. Lee’s sister and long-time protector.
My personal amateur book reviewer’s opinion is that the book is OK. Given how much I love Mockingbird, it couldn’t help but be a disappointment. My guess is that a lot of greed is at the heart of the release right now, and that releasing it was not the right thing to do, which saddens me.
But enough of the American Literature class. This is a content management and content analytics blog!
Reading the book got me thinking about applying text analytics to the author attribution question. A Google search yielded some interesting scholarly work on determining author attribution. This one – Determining if Two Documents Are By the Same Author by Moshe Koppel and Yaron Winter – contained an equation which highlighted for me that any hope of retaining even limited knowledge from my three-quarters of a math/computer science degree from 35 years ago is long gone.
So I looked for something a bit more my speed. There was a good blog post by Ellen Gamerman (@wsjspeakeasy), Data Miners Dig for Answers About Harper Lee, Truman Capote and ‘Go Set a Watchman,’ that reviews the author controversy and points to a new study by text mining sleuths Maciej Eder and Jan Rybicki:
…the developers of a computerized text-analysis tool ran the long-awaited novel and Ms. Lee’s Pulitzer-Prize winning “To Kill a Mockingbird” through an algorithm that searched for signs of heavy editing, frequent rewriting and other influences. The findings, which attempt to shed light on a book that has sparked world-wide attention by an author who has famously declined to discuss her work, show Ms. Lee as the undisputed author of both novels but suggest that her style as a writer was more consistent in “Watchman” than “Mockingbird.”
Here is the direct link to the study, which is fun reading for those of us in the content analytics and semantic technology space, and also proof that academics can have a sense of humor -- Go Set A Watchman while we Kill the Mockingbird In Cold Blood.
The conclusion directly from their research:
This brings us to the Lee/Capote question, which is probably best answered by another method. We have already seen that they are (stylistically) very close to each other. Are they similar because they read the same books, or is there some degree of actual literary collaboration involved? Traces of mutual inspiration, copy-editing, and other ways of collaborative authorship have already been suggested. Since it is difficult to see overlapping stylometric signals in an entire novel, one can see much more when the novel is split into smaller fragments. The idea is simple. First, imagine a centipede. To inspect it using a microscope, we need to slice it into segments (it was already dead when we found it, of course). This allows us to see what’s inside the particular segments. Now, we go back to texts. The goal is to slice a given text – in our case, the Mockingbird – into equal-sized blocks and to apply the usual stylometric procedure to particular slices.
They then contrasted Watchman with Mockingbird and with Capote’s The Grass Harp for “stylometric consistency” to determine the dominant voice in Mockingbird. Their conclusion: “As it turns out, the claims about Capote’s alleged contribution to the Mockingbird are (mostly) unfounded, since a vast majority of segments are clearly classified to Lee.”
But note the “mostly.”
No matter which parameters are used, at the end of the novel there appear a number of…segments which clearly suggest that in this passage, Lee is more similar to Capote than to herself. Even more interesting is the fact that the passage in question exactly coincides with Chapter 28, which is... the climax of the novel: Scout, dressed up in a Halloween costume, is attacked by Bob Ewell; quite luckily for Scout, she survives and accidentally Ewell dies with a kitchen knife stuck under his ribs.
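For the curious, here is a rough Python sketch of the slice-and-classify idea described above. It is not Eder and Rybicki's code, and this post does not say which classifier or settings they used. As a stand-in, the sketch compares each slice with each author's reference text using a simple Delta-style distance over the most frequent words. The function names, the `slice_size` and `n_words` parameters, and the author labels are all made up for illustration.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and return a list of word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def slice_tokens(tokens, size):
    """Cut tokens into consecutive equal-sized blocks, dropping a short tail."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def rel_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total for w in vocab]


def classify_slices(target_text, references, slice_size=5000, n_words=100):
    """Label each slice of target_text with the stylistically closest author.

    references maps an author label to a sample of that author's text.
    (Hypothetical helper: an illustration, not the study's procedure.)
    """
    ref_tokens = {name: tokenize(text) for name, text in references.items()}
    slices = slice_tokens(tokenize(target_text), slice_size)

    # Vocabulary: the most frequent words pooled over the reference texts.
    pooled = Counter()
    for toks in ref_tokens.values():
        pooled.update(toks)
    vocab = [w for w, _ in pooled.most_common(n_words)]

    ref_profiles = {n: rel_freqs(t, vocab) for n, t in ref_tokens.items()}
    slice_profiles = [rel_freqs(s, vocab) for s in slices]

    # Standardise every word's frequency across all profiles (z-scores),
    # so that very common words do not dominate the distance.
    columns = list(zip(*(list(ref_profiles.values()) + slice_profiles)))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # avoid division by zero

    def z(profile):
        return [(f - m) / s for f, m, s in zip(profile, mus, sigmas)]

    ref_z = {name: z(p) for name, p in ref_profiles.items()}

    labels = []
    for profile in slice_profiles:
        zp = z(profile)
        # Delta-style distance: mean absolute difference of z-scores.
        dist = {
            name: mean(abs(a - b) for a, b in zip(zp, rz))
            for name, rz in ref_z.items()
        }
        labels.append(min(dist, key=dist.get))
    return labels
```

Called with the full text of Mockingbird as `target_text` and reference samples keyed by labels such as "Lee" and "Capote", this returns one label per slice, in reading order. A run of "Capote" labels near the end of that list would be the kind of signal the study describes, although a toy distance like this one should not be expected to reproduce their results.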
So there you have it. Content and text analytics to the rescue of a long-simmering controversy!
Check out Go Set A Watchman while we Kill the Mockingbird In Cold Blood – there’s more than I’ve summarized here, and I could actually (mostly) understand it.