‘I know no speck so troublesome as self’: Finding Middlemarch through Corpus Linguistics

Dr Rosalind White (@DrRosalindWhite on Twitter), research associate at the University of Birmingham’s Centre for Corpus Research and on #FindingMiddlemarch at Royal Holloway, University of London, proposes a way into George Eliot’s Middlemarch using corpus linguistics.

In this blog post, I’d like to explore how corpus linguistic tools can be used to illuminate the semantic texture of George Eliot’s writing. I make use of the CLiC Web App (Mahlberg et al. 2020). From the outset, George Eliot frames Middlemarch: A Study of Provincial Life (1871) as a pseudo-scientific sample of the complex human dynamics that can be found in a provincial town. Using the metaphor of an optical microscope, the narrator vows to concentrate ‘all the light [they] can command’ on ‘unravelling certain human lots and seeing how they [are] woven and interwoven’ (102).

Even with a microscope directed on a water-drop we find ourselves making interpretations which turn out to be rather coarse; for whereas under a weak lens you may seem to see a creature exhibiting an active voracity […] a stronger lens reveals to you certain tiniest hairlets which make vortices for these victims while the swallower waits passively at his receipt of custom.

A ‘strong lens applied to Mrs Cadwallader’s match-making’, Eliot suggests, will show a parallel ‘play of minute causes’ (41). This metaphorical microscope reappears throughout the narrative. Ineffectual intellectual Edward Casaubon, who ‘dreams in footnotes’, is so single-minded that Mrs Cadwallader suspects a drop of his blood under a slide would reveal ‘all semicolons and parentheses’ (49). Ego, Eliot implies, can act as a ‘tiny speck very close to our vision’ that ‘blot[s] out the glory of the world and leave[s] only a margin by which we see the blot’, for there is ‘no speck so troublesome as self’ (589).

Fig. 1 Middlemarch (Edinburgh and London: William Blackwood & Sons, 1871-72) with original green decorated wrapper. Fig. 2 Middlemarch, Fair Copy, Manuscript Add MS 34034, British Library.

Corpus linguistics can be seen as a form of “distant reading” (Moretti 2013, Mahlberg & Wiegand 2020, Froehlich 2018). It is a method that allows us to momentarily look past the haziness of our own subjectivity and obtain a panoramic perspective by making the large-scale aspects of literature more visible. Digital tools like CLiC can be used to help us map themes, characterisation and cultural trends across a given corpus. 

Reconciling quantitative data with qualitative analysis is a method in keeping with George Eliot’s characteristic use of empathy to transport her readers beyond the margins of their own subjectivity. As Eliot’s narrator famously puts it:

That element of tragedy which lies in the very fact of frequency, has not yet wrought itself into the coarse emotion of mankind; and perhaps our frames could hardly bear much of it. If we had a keen vision and feeling of all ordinary human life, it would be like hearing the grass grow and the squirrel’s heartbeat, and we should die of that roar which lies on the other side of silence. As it is, the quickest of us walk about well wadded with stupidity. [emphasis mine]

Middlemarch is a novel crammed with characters blighted by a narrowness of vision, from the myopic Dorothea who ‘always see what nobody else sees’ but ‘never see what is quite plain’ (23) to Dr Lydgate, blinded by ambition to the ‘hampering threadlike pressure of small social conditions’ (132). It is therefore a text uniquely receptive to the use of such “distant” methods.

Provincialism on the Page

Using CLiC to run a concordance on the word ‘Middlemarch’, we can observe that the town itself is frequently employed as an adjective — from ‘Middlemarch habits’, ‘Middlemarch politics’ and ‘Middlemarch gossip’ to more peculiar phrases like ‘in a Middlemarch light’, ‘Middlemarch phraseology’ and ‘the limits of Middlemarch perception’. That something as idiosyncratic as one’s perception could be considered quintessentially ‘Middlemarch’ speaks to the sheer embeddedness of Eliot’s characters in the culture of their town. Even more mundane articles (like ‘Middlemarch medicine’ or ‘Middlemarch lodgings’) are presented as wholly inextricable from their locality.

Fig. 3 A sample of concordance lines highlighting collocates to the left of ‘Middlemarch’ that denote departure, via CLiC v. 2.1.2., collocates highlighted with the KWICGrouper.

Figure 3 is a concordance for the word Middlemarch, i.e., the word is displayed in the centre with a certain amount of context on the left and the right. Interestingly, Middlemarch repeatedly occurs alongside collocates that denote departure like leave, leaving, left, quit, and quitting (collocates are words that occur repeatedly on the left or right of a search word). This is despite the fact that (barring the epilogue ‘finale’ and the honeymoon trip to Rome) there is no point at which a character expressly departs from the town.
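For readers who would like to try the same idea outside CLiC, the following is a minimal Python sketch of a keyword-in-context concordance that keeps only the lines with a departure word in the left context. It is not CLiC's implementation or its KWICGrouper: the function names, the five-word span and the sample sentences are all assumptions made for illustration, and the sample sentences are invented rather than quoted from the novel.

```python
import re

# Departure words listed in the post.
DEPARTURE = {"leave", "leaving", "left", "quit", "quitting"}

def tokenize(text):
    """Lowercase word tokens; straight apostrophes inside words are kept."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def kwic(tokens, node, span=5):
    """Return (left, node, right) context windows for every hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            lines.append((left, tok, right))
    return lines

def group_by_left_collocate(lines, collocates):
    """Keep only the lines whose left context contains one of `collocates`."""
    return [line for line in lines if collocates & set(line[0])]

# Invented sample sentences (hypothetical, NOT quotations from Middlemarch).
sample = (
    "He had resolved to quit Middlemarch before the winter. "
    "She admired Middlemarch manners. "
    "They were leaving Middlemarch for good."
)

lines = kwic(tokenize(sample), "middlemarch", span=4)
grouped = group_by_left_collocate(lines, DEPARTURE)

for left, node, right in grouped:
    print(f"{' '.join(left):>30} | {node} | {' '.join(right)}")
```

Run on a full text of the novel, the same functions would list every occurrence of the town's name and pick out those preceded by a departure word. The simple tokeniser here ignores sentence boundaries and typographic apostrophes, so counts may differ slightly from those CLiC reports.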


Fig. 4 Percentage of Coventry women in 1851 born less than 10km away. Just 10.44% of women and 11.71% of men in Coventry were born more than 50km away from their current residence; in contrast, as many as 73.49% of women and 71.51% of men in Coventry resided less than 10km away from their birthplace. Data via PopulationsPast.Org.

In her twenties, George Eliot compared her own provincial existence to the ‘walled-in world’ of a David Wilkie genre painting (Eliot 2010: 76). In later life as an author, however, she effectively reframes the humdrum particulars of provincial life by bringing them into sharper focus. As Ruth Livesey has put it, Eliot’s mode of realism is laced with a radical ‘insistence that a picture of the commonplace world is entitled to full colour’ (Livesey 2020: 11). Using the nineteenth-century reference corpus provided by CLiC to generate a list of key words, this ‘luminous detail’ (Livesey 2020: 1) is immediately apparent. Amongst character names like Dodo and “plot” related words like vote or medical, more domestic words like furniture appear at a higher frequency. There are 47 instances of furniture in Middlemarch, whereas Charles Dickens’ Great Expectations, for example, uses furniture 7 times (so 147.78 per million words vs 37.81 per million words). Solely from the term furniture, a miniature narrative emerges in which honest, hardworking characters like Mr Farebrother or the Garths are unconcerned by the state of their furniture (lines 1-4) while those with ‘spots of commonness’ like Dr Lydgate agonise over the subject (lines 5-8).


Fig. 5 A sample of 8 of the 47 concordance lines for furniture in Middlemarch, via CLiC v. 2.1.2.
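The per-million figures quoted for furniture are normalised frequencies: the raw count divided by the size of the text, scaled to a million words, so that novels of different lengths can be compared. A small Python sketch of that arithmetic follows. The raw counts (47 and 7) come from the post, but the token totals are assumptions, back-calculated from the post's own per-million figures rather than taken from CLiC.

```python
def per_million(count, corpus_tokens):
    """Relative frequency of a word, scaled to occurrences per million tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return count / corpus_tokens * 1_000_000

# Raw counts are from the post. The token totals are ASSUMED values,
# back-calculated from the post's per-million figures, not CLiC's own counts.
middlemarch = per_million(47, 318_040)
great_expectations = per_million(7, 185_136)

print(round(middlemarch, 2), round(great_expectations, 2))

# Ratio of the two normalised frequencies.
print(round(middlemarch / great_expectations, 1))
```

On these assumed totals, furniture is roughly four times as frequent in Middlemarch as in Great Expectations once the difference in length is taken into account.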

Also notable is the word light, which is used in both a literal and metaphorical sense over a hundred times: from the ‘wondrous modulations of light and shadow’ on an old, thatched roof ‘full of mossy hills and valleys’ to Will Ladislaw’s impish smile described as ‘a gush of inward light’.

Tracing the Trappings of Gender

Examining concordance lines that include female pronouns like she, her, and hers, in comparison to the male pronouns he, him, and his, the tethering of women to a certain sphere of existence is immediately noticeable. There are 22 instances of her on the left of marriage, but only 12 examples of the cluster ‘his marriage’. The phrase ‘her husband’ is used 153 times (largely in reference to the internal ruminations of Rosamond and Dorothea), but ‘his wife’ is used only 62 times. Significantly, the cluster ‘her husband’ frequently presents in reference to a female character carefully observing, or even micro-managing, the emotional state of her husband. As Figure 6 shows, Dorothea’s own emotions are inextricably wedded to those of her husband: from her delight ‘at seeing her husband less weary than usual’ (line 10), to her private vow to vanish the morning’s gloom ‘if she could see her husband glad at her presence’ (line 7).

Fig. 6 Concordance lines generated by ‘her husband’, via CLiC v. 2.1.2. Note the many references to a wife observing her husband’s emotions.
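Counting clusters such as ‘her husband’ and ‘his wife’ amounts to counting adjacent word pairs. The Python sketch below shows one way this might be done; it is an illustration rather than CLiC's cluster tool, the function names are hypothetical, and the sample sentences are invented, not quoted from the novel.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; straight apostrophes inside words are kept."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def cluster_counts(tokens, n=2):
    """Count every contiguous n-word cluster in a token list.

    Sentence boundaries are ignored, so a cluster may span two sentences.
    """
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Invented sample sentences (hypothetical, NOT quotations from Middlemarch).
sample = (
    "She watched her husband closely. "
    "Her husband said nothing of his wife. "
    "She feared her husband was weary."
)

counts = cluster_counts(tokenize(sample))
print(counts[("her", "husband")], counts[("his", "wife")])
```

Applied to the whole novel, the two counts printed at the end would correspond to the frequencies compared above, although differences in tokenisation may shift the totals slightly.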

There is a parallel discrepancy in the adjectives used by various characters to refer to the opposite sex: plain, single, and foolish all collocate exclusively with ‘women’, while professional, medical, intellectual and clever collocate exclusively with ‘men’. 

There is a visible difference between ‘money’ collocating with male pronouns vs female pronouns. His repeatedly occurs in the first position to the left of ‘money’, but her only does so on one occasion. On another occasion, her is separated from money by the word ‘superfluous’ (line 3), which comes in as a modifier of money, and the only example of her presenting to the immediate left of money refers to Dorothea speculating on how she may help a future husband. Upon closer inspection, every example of money collocating with a female pronoun also refers to a female character’s husband, father, or brother: from Mary handing her hard-earned money over to Caleb Garth (line 8) to Rosamond and Lydgate’s joint financial difficulties (line 7). (For more on corpus methods to describe the gendered world in nineteenth-century fiction also see Cermakova & Mahlberg 2022 & forthcoming).

Fig. 7 A comparison of male and female pronouns collocating with ‘money’, via CLiC v. 2.1.2.

Finding Empathy & Authorial Allegiance

To conclude, I’d like to draw attention to the use of corpus linguistics as a skeleton key that quickly provides us with access into a character’s mind. Instances in which the word but presents to the immediate left of a character’s name can easily be used for this purpose (Fig. 8). In line 1, for example, ‘but’ directs us to the romance taking place purely in Rosamond’s mind: where ‘every look and word’ is the subject of ‘eager meditation’.

Fig. 8 Concordance lines in which ‘but’ occurs to the immediate left of a character’s name, via CLiC v. 2.1.2.

This also serves as a way to track authorial allegiance to various characters (Dorothea, Lydgate and Rosamond appear in this configuration the most frequently). Emotional epiphanies, self-reflection, and inward anxieties are all expressed under these conditions.

Fig. 9 Characters in Middlemarch that collocate with ‘poor’, via CLiC v. 2.1.2.

The word felt (occurring to the immediate right of a character’s name) can be tracked in a similar way, as can the word poor (to the immediate left). Dorothea, Rosamond and Lydgate accrue the most empathy in this manner, closely followed by Casaubon and Fred Vincy. Interestingly, despite the fact they collocate frequently with felt and but, neither Mr. Bulstrode nor Will Ladislaw collocates with poor. Moreover, both Lydgate and Casaubon collocate considerably less with poor than they do with felt or but. As is evident in Figure 9, the word poor is often used by Eliot as a direct appeal to the reader for empathy (lines 10-11 & 14-17). Conscious, perhaps, of the fact that male readers might find it more difficult to connect with women, it is a stylistic device that Eliot seems to keep in reserve for her female characters (Rosamond and Dorothea collocate with poor 17 and 16 times respectively).

Fig. 10 A line graph tracking the rate at which characters collocate to the left of ‘felt’ or to the right of ‘poor’ or ‘but’.
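The three patterns tracked here (but or poor immediately before a name, felt immediately after it) can be tallied with a simple positional count. The Python sketch below is one possible way of doing so, not the procedure used to produce the figures in this post. The function name and its parameters are hypothetical, the sample sentences are invented, and only single-word names are handled, so a name like Fred Vincy would need extra work.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; straight apostrophes inside words are kept."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def adjacent_counts(tokens, names, before=("but", "poor"), after=("felt",)):
    """Count how often each name is immediately preceded or followed by a cue word."""
    counts = {name: Counter() for name in names}
    for i, tok in enumerate(tokens):
        if tok not in counts:
            continue
        # Cue word in the first position to the left of the name.
        if i > 0 and tokens[i - 1] in before:
            counts[tok][tokens[i - 1]] += 1
        # Cue word in the first position to the right of the name.
        if i + 1 < len(tokens) and tokens[i + 1] in after:
            counts[tok][tokens[i + 1]] += 1
    return counts

# Invented sample sentences (hypothetical, NOT quotations from Middlemarch).
sample = (
    "But Dorothea felt otherwise. Poor Rosamond! "
    "But Lydgate felt the pressure, and poor Dorothea said nothing."
)

result = adjacent_counts(tokenize(sample), ["dorothea", "rosamond", "lydgate"])
for name, tally in result.items():
    print(name, dict(tally))
```

Plotting the resulting tallies per character would give a chart of the same kind as the line graph above.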

Concluding Remarks

Scholars of nineteenth-century literature have long regarded “close reading” as the cornerstone of literary analysis (such methods rest on the belief that one can extract the intrinsic themes of a text by zooming in on certain passages). This post has, I hope, demonstrated what can be gained from reconciling “close” and “distant” reading methods through corpus tools. Middlemarch is a famously tightly woven novel that subtly knits together the narratives of multiple disparate individuals into a collective whole. It is a novel that is best observed at multiple scales; for as Eliot herself noted in The Mill on the Floss, ‘there is nothing petty to the mind that has a large vision of relations.’


This post is part of the AHRC-funded project ‘Finding Middlemarch in Coventry 2021-22’ led by Professor Ruth Livesey (Royal Holloway, University of London) and Professor Redell Olsen (Royal Holloway, University of London). The project will culminate in ‘Of that Roar Which…’, an experimental short film by Redell Olsen.

Tickets for ‘The Great Middlemarch Mystery’, an immersive multi-location theatre experience researched and co-developed by Professor Livesey and produced by Dash Arts, are available now! The play will take place in Coventry’s Cathedral Quarter from Thursday 7 to Sunday 10 April 2022.

Join the conversation via our blog on Twitter with #FindingMiddlemarch 

Please cite this post as follows: White, R. (2022) ‘I know no speck so troublesome as self’: Finding Middlemarch through Corpus Linguistics [Blog post]. CLiC Fiction Blog, University of Birmingham. Retrieved from [https://blog.bham.ac.uk/clic-dickens/2022/03/31/finding-middlemarch-through-corpus-linguistics/]