A Lexicographic Analysis of Similarities in the Pauline Epistles

Papyrus 46, showing 2 Cor 11:33-12:9 (cropped)

Using Python 3

I did a computer analysis of the word frequencies in the Greek New Testament to examine the following question:
Which Pauline Epistles were actually written by Paul?

By examining sometimes minute differences in the frequencies of the most common words (prepositions, conjunctions, and pronouns - the Greek equivalents of the, for, is, I, etc.) I was able to uncover remarkable agreements with modern critical New Testament scholarship regarding stylistically similar and dissimilar Pauline epistles. My analysis reaffirms the authenticity of books like Romans, 1 Corinthians, 2 Corinthians, and Galatians while simultaneously confirming the possible pseudepigraphical ("forged") nature of Ephesians, Colossians, 1 Timothy, and Titus.
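The general idea of comparing common-word frequencies can be sketched roughly as follows. This is only an illustration, not the code in nt.py: the function names, the choice of twenty words, and the distance measure (a sum of absolute frequency differences) are all assumptions made for the example, and the toy tokens stand in for real Greek text.

```python
from collections import Counter


def relative_frequencies(words):
    """Map each word to its share of the total word count."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def profile_distance(words_a, words_b, top_n=20):
    """Hypothetical dissimilarity score between two texts.

    Takes the top_n most common words of the two texts combined and sums
    the absolute differences of their relative frequencies. A smaller
    score means more similar usage of common words. This metric is an
    assumption for illustration, not the a-value used in the analysis.
    """
    freq_a = relative_frequencies(words_a)
    freq_b = relative_frequencies(words_b)
    combined = Counter(words_a) + Counter(words_b)
    common_words = [word for word, _ in combined.most_common(top_n)]
    return sum(
        abs(freq_a.get(word, 0.0) - freq_b.get(word, 0.0))
        for word in common_words
    )


if __name__ == "__main__":
    # Toy token lists (transliterated placeholders, not real data).
    text_a = ["kai", "de", "en", "kai", "ho", "kai", "de"]
    text_b = ["kai", "gar", "en", "ho", "ho", "gar", "gar"]
    print(profile_distance(text_a, text_a))  # identical texts
    print(profile_distance(text_a, text_b))  # different texts
```

Restricting the comparison to the most common words keeps the score focused on function words, which an author tends to use habitually regardless of topic.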

However, my analysis contradicts modern scholarship on the authenticity of 2 Timothy and suggests that it is far more Pauline than it is currently believed to be. (The analysis also contradicts modern scholarship on Philemon but this is most likely the fault of the analysis.)

Entire Greek New Testament with Strong's numbers. Source (github) | CSV file (9.1 MB)
reader.py (1.8 kB) - script for importing the csv file above
nt.py (5.7 kB) - primary script for all the computations, uses reader.py
twenty.txt (16.6 kB) - the contents of appendix A (twenty most common words in each book, listed with the respective frequencies)
similarities.txt (21.5 kB) - the contents of appendix B (contains the a-values for all pairings of NT books, ranked by book)
nt.tex (78.7 kB) - LaTeX file for the pdf above

Proof of concept

If you're skeptical as to whether this method works, I have done a simple proof-of-concept using books written in English.

The first test: I compared five books by Malcolm Gladwell (Blink, David and Goliath, Outliers, Talking to Strangers, and The Tipping Point) with three books by Michael Lewis (Moneyball, The Big Short, and The Fifth Risk). Both are non-fiction authors. The analysis worked well - correctly identifying the three Lewis books as the most "non-Gladwellian" ones.

The second test: I looked at two books by Hemingway: The Old Man and the Sea and A Farewell to Arms. I selected five 5,000-word chunks from the first book and three 5,000-word chunks from the second. The analysis worked well - correctly determining the odd ones out.
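Splitting a book into fixed-size word chunks could look something like the sketch below. How the chunks were actually selected is not described here, so this simply takes consecutive, non-overlapping chunks from the start of the text; split_into_chunks and its parameters are hypothetical names chosen for the example.

```python
def split_into_chunks(words, chunk_size=5000, max_chunks=None):
    """Split a list of words into consecutive full-size chunks.

    A trailing remainder shorter than chunk_size is dropped so that all
    chunks are the same length. If max_chunks is given, only the first
    max_chunks chunks are returned.
    """
    chunks = [
        words[start:start + chunk_size]
        for start in range(0, len(words) - chunk_size + 1, chunk_size)
    ]
    if max_chunks is not None:
        chunks = chunks[:max_chunks]
    return chunks


if __name__ == "__main__":
    # Toy example: 12 placeholder tokens split into chunks of 5 words.
    tokens = [f"w{i}" for i in range(12)]
    for chunk in split_into_chunks(tokens, chunk_size=5):
        print(chunk)
```

Each chunk can then be treated as if it were a separate "book" and compared against the others.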

The third test: I compared several works of 20th century American fiction - five books by Hemingway, The Great Gatsby (Fitzgerald), and The Catcher in the Rye (Salinger). The analysis failed - the works of Hemingway were much less similar to one another than expected: several were more similar to The Great Gatsby or The Catcher in the Rye than they were to the other Hemingway works. I chalked this failure up to the tendency of fiction authors to vary their writing styles significantly.

Disclaimer 1: This was mostly a personal project to practice my programming skills. I was surprised by the results, especially since I did not need to optimize anything to attain them. You may be equally surprised! However, I did not do any research into how other people have analyzed the New Testament with computer methods, so it is possible that my methods are either too simple or too wrong to be considered any kind of relevant contribution to the complex, divided field of New Testament studies.

Disclaimer 2: Any information about current scholarly consensus should be taken with a grain of salt. First of all, most outside information in this paper came from Wikipedia. At best, relying on it is bad research practice, and at worst, the information is wrong. Also, please note that both Wikipedia and modern critical analysis are skewed against conservative religious thought due to the general overrepresentation of progressive scholarship in universities. Modern scholars are extremely quick to reject claims to authorship on the basis of occasionally minute details.

Disclaimer 3: For the Christian, rejecting the teachings of certain books because they are pseudepigraphic is dangerous business. Many times, books are considered pseudepigraphic in large part due to the presence of "inconvenient" doctrines (misogyny is the biggest culprit) or simply because they seem different. "If we lose inerrancy, then it becomes a very slippery slope then to making the Bible be whatever we want it to be. We just say that everything we don't like is not authentic. In that case the Bible is no longer the Word of God but just a source to defend what you already believe." --Larry Lin (pastor at the Village Church Hampden in Baltimore, MD)

What is the Bible?

If you are not familiar with the Bible, I can give a short introduction here.

The Bible is the authoritative religious text of Christianity. Protestants, Catholics, and Latter-day Saints all use a Bible (though the versions differ slightly). I'll be talking about the Protestant Bible here.

Christians believe that the Bible is divinely inspired: that it was written by men in their own languages but inspired by their God. The Bible is split into two parts: the Old Testament and the New Testament, each of which consists of many individual books with different authors.

The Old Testament is also, approximately, the primary religious text of Judaism. Its original language is Hebrew. The Old Testament consists of the creation myth (Genesis), the Jewish law, the history of the Jewish people from Abraham through David and onward, various books of poetry and wisdom, and various books of prophecy. Some of this prophecy was fulfilled before the life of Christ, some of it was fulfilled by the life of Christ (at least this is what Christians believe), and some of it has not yet been fulfilled.

Christian faith relies solely on the deity of Jesus Christ, and the Old Testament is first and foremost a Jewish text. Even so, the Old Testament is relevant because Christianity and Judaism are uniquely intertwined.

The New Testament is not shared by Judaism and Christianity. It is written in Greek. It begins with the "gospel", four books recounting what is essentially the same history: the ministry of Jesus Christ in Palestine, his death by crucifixion as ordered by the Roman governor Pontius Pilate (in office 26 - 36 CE), and his resurrection and appearances to many people following his death.

That is followed by the Book of Acts, a historical account of the founding of the Christian church and its spread throughout the Roman Empire within the span of a few decades. Of particular importance are the missionary journeys of the Apostle Paul.

The latter half of the New Testament consists of letters from church leaders like Paul, James, Peter, and John to churches throughout the Mediterranean. They contain instructions for Christian living, clarifications and restatements of theological doctrine, and general encouragement. The majority are written by the Apostle Paul, and these are termed the Pauline Epistles.

The final book is a book of apocalyptic prophecy, called Revelation.