In a volume that is now safely back in Cambridge University Library, I read something that annoyed me slightly in what was otherwise a mildly interesting paper about the use of prepositions instead of declension in the letters of Braulio of Saragossa. I realise that sounds unlikely to be interesting, but it provides a useful way to assess the supposed decline of Classical Latin if you take one of the great minds of Visigothic Spain as the marker standard. (He was pretty solid but there are particular cases, especially the use of per + acc. for agency rather than ab + abl., where he shows that things had changed since Cicero, since you ask. Oh, didn’t you? Well, thank heavens I only gave one example then.) So anyway, what’s the problem, Jarrett? Well, the problem lies in this line about that very phenomenon:
These data are, it seems to us, evidence of the growing use of per to introduce the agent, although the classical construction with ab is still by far dominant (at 87%, 53 citations).1
I won’t blame you if you don’t see a problem there. They don’t teach historians statistics, and even when historians learn statistics it’s often apparently only so as to mislead and bamboozle. I remember when I had to be told this myself, in fact, and it wasn’t so long ago. But the thing is, for a percentage to be meaningful you need a hundred of something. If your 87% is made of fewer than 87 units, you shouldn’t be using a percentage at all.
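The quotation gives only the 87% and the 53 citations, but that is enough to work out roughly how many units sit behind the percentage. The minimal Python sketch below assumes that the 87% was rounded to the nearest whole percent and that the 53 citations are the instances of ab; `implied_totals` is a hypothetical helper written for illustration, not anything from García Sanchidrián's paper.

```python
def implied_totals(count, reported_pct, max_total=1000):
    """Return every total n for which count/n, expressed as a percentage
    and rounded to the nearest whole number, equals reported_pct.

    Assumption: the published percentage was rounded to a whole number.
    """
    return [n for n in range(count, max_total + 1)
            if round(100 * count / n) == reported_pct]


# 53 citations of ab + abl., reported as 87% of the agent constructions
totals = implied_totals(53, 87)
print(totals)  # [61]

n = totals[0]
print(f"53/{n} = {100 * 53 / n:.1f}%")           # 53/61 = 86.9%
print(f"one citation = {100 / n:.1f} points")    # one citation = 1.6 points
# One more example of per + acc. raises the total but not the ab count
print(f"53/{n + 1} = {100 * 53 / (n + 1):.1f}%")  # 53/62 = 85.5%
```

Only a total of 61 fits, so on these assumptions the 87% rests on about sixty examples, and each single citation moves the figure by roughly 1·6 percentage points.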
But surely, Jonathan, this is mere pedantry, I hear you cry. But it isn’t; it can matter. It’s an issue of significance and, in malicious cases, spin. Calling something a percentage implies that it can be compared reliably with other things that are called percentages. Let me give you an example from my own work. One of the many papers currently being rallied between me and reviewers is about persons with Arabic names in Christian León.2 In it I compare the presence of these people in the documents of three separate archives, this being a rough way to approximate a distinction by geography that would be hard to make by any more precise criterion. So, at one point I write:
At Oviedo, then, in the period of these charts, 3 (out of 8 surviving) documents feature Arabic names; at Sahagún 81 out of 134 do (60%); and at León the proportion is 166 out of 226 (73%).
The reviewer whom I have still to satisfy wanted to know at this point why I hadn’t given a percentage for Oviedo, and my response, not as expressed in e-mail but as shouted at an empty room when I read the comments, was roughly, “Because it wouldn’t mean anything!” Consider. The percentage, in the sense of ‘what you get if you divide 3 by 8 and multiply by 100’, is 37·5. Now. If I had a round two hundred documents featuring Arabic names from some fourth archive that I had included (and Celanova may even have that many), then if someone found one more, it would make a flybite of difference, half a percent at most, and my figures would be pretty safe.3 At León, one extra would make less difference even than that, barely a tenth of a percentage point (73·5% to 73·6%). At Sahagún it would go from 60% to 61%, a change but hardly a paper-destroying one. But suppose that at Oviedo they opened a secret chamber and found some originals of documents that Bishop Pelayo altered, that one of them was from the relevant period, and that out of thirty witnesses at one hearing just one was called Zeiti or something. The ‘percentage’ that I’d given at Oviedo would then be 44%, a leap of 7 percentage points, where the same discovery at León or Sahagún would shift the figure by less than one. To put it another way, giving a percentage at Oviedo would imply that those three people in that tiny sample, years or decades apart from each other, are as significant to our understanding of this group as eighty-five people from León, whereas in fact, because they are statistically so insignificant, they probably don’t tell us anything. They’re so odd that each can probably be explained as a one-off: for example, two of them are frequently-appearing royal courtiers who also appear in the León documents. By contrast, 85 people with such names at León do tell us something about what kind of people were at court, perhaps where they’d come from and what it was fashionable to do or be there.
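The comparison between archives can be made concrete with a short Python sketch that uses only the counts quoted above. It assumes, as in the Oviedo scenario, that the newly found document does feature an Arabic name, so numerator and denominator both go up by one; the helper name `shift_from_one_more` is made up for illustration.

```python
# Counts as given in the text: (documents with Arabic names, surviving documents)
ARCHIVES = {
    "Oviedo": (3, 8),
    "Sahagún": (81, 134),
    "León": (166, 226),
}


def shift_from_one_more(hits, total):
    """Percentage before and after adding one document that does feature
    an Arabic name, plus the resulting shift in percentage points."""
    before = 100 * hits / total
    after = 100 * (hits + 1) / (total + 1)
    return before, after, after - before


for name, (hits, total) in ARCHIVES.items():
    before, after, shift = shift_from_one_more(hits, total)
    print(f"{name:8} {before:5.1f}% -> {after:5.1f}%  (+{shift:.2f} points)")
```

Before any rounding, one new document moves Oviedo by about 6.94 points, Sahagún by about 0.29 and León by about 0.12, which is why 3 out of 8 is better left as a bare fraction.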
So, next time you’re tempted to use that pair of circles and a slash, consider whether a fraction might serve you better, in case a mathematician happens to be in the audience…
1. Maria Luisa García Sanchidrián, “Del sistema casual a las preposiciones: una muestra en Braulio de Zaragoza” in Maurilio Pérez González (ed.), Actas, II Congreso Hispánico de latín medieval (León, 11-14 de Noviembre de 1997) vol. I (León 1998), pp. 483-491, quote at p. 491.
2. Jonathan Jarrett, “Arabic-named communities in ninth- and tenth-century Asturias and León, at court and at home” in Journal of Medieval Iberian Studies Vol. 2 (London forthcoming).
3. And actually Celanova was the first of these groups to be studied, by Richard Hitchcock, “Arabic Proper Names in the Becerro de Celanova” in David Hook & B. Taylor (edd.), Cultures in Contact in Medieval Spain: historical and literary essays presented to L. P. Harvey (London 1990), pp. 111-26, now much expanded in Hitchcock, Mozarabs in Medieval and Early Modern Spain (Aldershot 2008), pp. 53-68. The Celanova documents have also now been edited in José Miguel Andrade Cernadas (ed.), O Tombo de Celanova: estudio introductorio, edición e índices (ss. IX-XII) (Santiago de Compostela 1995), though lots of problems with this edition have been noted.