Reflecting

I'm having another sorting/throwing-out blitz and today came across some notes I made on a course I did on language in 1997, back when external university courses were affordable. John Simpson, then editor of the Oxford English Dictionary, was talking about the frequency of word usage in English, based on the British National Corpus, a 100-million-word collection of samples of written and spoken language, collected between 1991 and 1994 and designed to represent a wide cross-section of British English*.

I'd forgotten that he told us that the pronoun 'he' is the ninth most commonly used word in the Corpus while 'she' is the 42nd.

I hoped I could see how things have changed in the last 22 years, but the Corpus has been only minimally updated. So instead I had a look at Google Ngram - an analysis of all the books (published between 1800 and 2008) it has digitised. What's especially fun about this is how easy it is to compare different words.

Let's unpack this graph a little:
He does much more than she does. It is more often done to her than to him (probably - 'her' can also be equivalent to 'his' so this is not a direct comparison). Most things and attributes seem to be his, not hers.

Well, fancy that!

* The written part of the BNC (90%) includes, for example, extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, school and university essays, among many other kinds of text. The spoken part (10%) consists of orthographic transcriptions of unscripted informal conversations (recorded by volunteers selected from different age, region and social classes in a demographically balanced way) and spoken language collected in different contexts, ranging from formal business or government meetings to radio shows and phone-ins.


Getting distracted like this makes the tidying extremely slow.
