Conferring in Naples, II: papers in the p. m.

Sala Conferenze in the Palazzo degli Uffici, Università degli Studi Federico II, Naples

The reason I had all the time I had to go sight-seeing in Naples was that the conference I was at didn’t start till the somewhat advanced hour of three o’clock the same afternoon. It then ran on till after eight, mind, but if you’re used to Mediterranean time-keeping that’s actually perfectly sensible. Anyway, I arrived slightly early and things started slightly late, so I had the chance to catch up with old acquaintances here and there before the papers commenced. The venue for the first two days was the newer, less splendid part of the Università degli Studi di Napoli Federico II in the Palazzo degli Uffici, but on the other hand it was one of the most technically well-equipped conference centres I’ve been lucky enough to present in, with full PA, three projection screens and so on. One thing to its detriment was that the designers seemed to have expected people to be using all three screens at once; everyone who was on only one found, I think, that the detail of their slides was hard to see at the size it finished up at. This may be another way of saying that all digital diplomatists pack their slides too full.

Frontage of the Università degli Studi di Napoli Federico II

The old building's kind of nice too, mind, and much easier to find

Anyway, there was an opening address that I understood only a third of (the third that was in French) and then there were some sessions. I’m conscious that much of the below is a bit technical, combining as it does computers and diplomatic, neither of them fields averse to labelling things with their own new words, so feel free to tune in next post if you’d rather. I’m guessing, though, that there are some people who would like to hear about it so I’m not skimping what’s necessary to make it meaningful for them. (Abstracts of all the papers are online, too, from this page which also has links to all our slides, so I’ll link the abstracts from the titles. Almost like being there!)

Linguistical Statistics

  • Nicolas Perreaux, “From Accumulation to Exploitation? Experiments and Proposals for Indexing and for the Use of Diplomatics Databases”, opened up with a dilemma that we would keep coming back to again and again through the conference: there are now some really impressive digital corpora of charters online or otherwise available, about 150,000 documents at the time of presentation, with which some incredible work ought to be possible, and very few projects actually exploiting them.1 He suggested that it might be because it still all needs sorting out and indexing, but really wanted to tell us about the testing of traditional diplomatic categories of document that he’d done in trying to work out if this could be done automatically. Basically, by lexical analysis as he did it, looking for distinct groups of words, only notitiae and episcopal acta are typologically distinct.2 Even this, he thought, was a start in separating things out, but I thought that actually he’d shown that the categories themselves are pretty much bunk. This fits with a documented tendency of mine to ignore work coming from the German Rechtschule however, and it’s hard to say which of us is thinking more progressively here. The advantage of it as a technique is that it’s interested in what the sources themselves emphasise, rather than looking for what we think they meant, but in that case I think we should be ready to define new categories. But then would anyone outside the digital field ever use the stuff? He ended, however, with another big point, which I’ve noted as: “There is no perfect software: what questions do you have?” This, also, many people would pick up, and I thought this paper was a marvellous choice for an opener as it really turned out to encapsulate some of the conference’s agenda.
  • I’ve gone on in detail about this one, because I resonated with a lot of it as you can tell (both in and out of phase), but I must be briefer with the rest.

  • Olivier Canteaut & Frédéric Glorieux, “Essai de classification automatique des actes royaux français (XIVe-XVe siècles)”, was, as you can see, related, and was trying to get at the chancery practice revealed by a 13,000-document sample without being swamped by it. Their first problem was how much Latin inflection messes up searches for word groupings; their results were much better with French-language documents. This is going to have to be dealt with: surely we can come up with a Latin parser that rolls words back to their stems for these purposes? But it’s a bit of an overhead. They were instead going to come back round by assessing institutional culture and then matching the phenomena there to linguistic data, but again this seems to me to be giving up and using external models rather than letting the data speak for itself. Some useful issues highlighted all the same, however!
  • Michael Gervers & Gelila Tilahun, “Statistical Methods for Dating Collections of Medieval Documents”, was talking about the DEEDS project at the University of Toronto. I’ve heard a lot of flak for this project from outside so it was interesting to get a view from inside on what it was actually doing, of which however this was only a tiny part. Basically, DEEDS have the problem that most of their documents aren’t dated, so they were using linguistic analysis to see if they could do a sort of linguistical palæography and date them by language use.3 Here again the key turned out to be very small word-groups, ‘shingles’ as they were calling them, and with an analysis based on two-word shingles deployed on a test set with known dates they were able to get a mean error in dating down as small as nine years, although in some periods when documents were fewer the results were a lot less certain. Still far better than ‘undated’ however! Professor Gervers was happy to admit that the method still needed work but that it works at all was fun to see.
  • The session was split over a coffee break, which was the first time I came across an important regional phenomenon. I am told that the further south you go in Italy, the smaller and stronger ‘a coffee’ gets. The next day on the way to the conference I was introduced to Neapolitan coffee as served in street kiosks and well, yes, it’s about four thimblefuls of espresso for which you pay a Euro twenty, but you don’t need more that soon. My then-companion described it as “punctuation for the day”, which I loved. However, a lot of people at this conference were from a lot further north and expected larger servings. It went quickly. I had two little plastic mugs and then found myself a bit shaky (and as I type this up I’m off caffeine briefly so now I shiver with envy of my autumn self). But I was in no danger of nodding off!

  • Els de Paermentier, “Diplomatica Belgica. Analysing medieval charter texts (dictamen) through a quantitative approach: the case of Flanders and Hainaut (1191-1244)”, was trying something more directed than simple significance-fishing, deliberately looking for ways to distinguish documents that were made by the comital chanceries of her two target counties from those that the recipients drafted and just got signed off. She was more or less able to do this, and thus establish that the chancery sometimes issued documents on behalf of relatives of the counts and lesser comital officials as well as just the counts. She also noticed that between 1200 and 1225 the chanceries stood out for their modifications of phrases that everyone used into their own special versions, which I thought was quite interesting as the assumption is usually that documentary forms are set top-down, but of course over as short a period as that certain individuals could be entirely to blame. I actually don’t think that upsets the point…
  • Robin Sutherland-Harris, “Applications of the DEEDS database to Somerset Charters: dating, diplomatics, and historical context”, was a paper by another member of the DEEDS team but showing the way forward by effectively having used DEEDS as a test set for undigitised records she was using for her own project, which were likewise broadly undated (“thirteenth-century”, sort of level). The thing that Robin said that most caught my ear was a mention of a word only two of her charters use for the action of enclosing land, ‘perprisere’; that is a word I know and was surprised to see here.4 Since her sample was 45 documents, however, it also seemed to me that any changes she is picking up must be personal choices, and she was kind enough to agree that that would be a next step.
  • Timo Korkiakangas, “Challenges for the Linguistic Annotation of an Early Medieval Charter Corpus”, was an analysis of the charters from eighth- and ninth-century Tuscany, and was looking for software that would automate an analysis he was doing of variation from formulae, trying to sort willing alteration from mistakes. So far he had not found such software though he had tested a few! This was, I think, a good thing for the conference, because it caught again this mismatch between tools and historical enquiry that we had already come up against in the first paper.
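
For anyone wondering what dating by two-word ‘shingles’ might look like in practice, here is a minimal sketch in Python. It is emphatically not the DEEDS method, which I only know from the paper; it is a naive stand-in that assumes each document is a plain string, that the dated test set is a list of (year, text) pairs, and that a simple average of the years attached to shared shingles is a tolerable estimate. The function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def shingles(text, size=2):
    """Split a text into overlapping word groups ('shingles') of the given size."""
    words = text.lower().split()
    return [tuple(words[i:i + size]) for i in range(len(words) - size + 1)]


def build_index(dated_docs, size=2):
    """Map each shingle to the years of the dated documents that contain it.

    dated_docs is an iterable of (year, text) pairs (an assumed input format).
    """
    index = defaultdict(list)
    for year, text in dated_docs:
        # Count each shingle once per document, however often it repeats.
        for shingle in set(shingles(text, size)):
            index[shingle].append(year)
    return index


def estimate_year(text, index, size=2):
    """Average the years attached to every shingle shared with the dated set.

    Returns None when the document shares no shingle with the dated set.
    """
    years = []
    for shingle in set(shingles(text, size)):
        years.extend(index.get(shingle, []))
    if not years:
        return None
    return sum(years) / len(years)
```

Fed a dated set and an undated text, `estimate_year(text, build_index(dated_docs))` gives back a year that falls somewhere between the dates of the documents whose phrasing the text shares. A real system would need to weight shingles by how distinctive they are and to cope with Latin inflection, which is exactly the overhead complained of above.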

Key Note

    The closing presentation of the evening was given by Benoît-Michel Tock, under the title, “Digital Diplomatics, Magic Diplomatics”. This was essentially a retrospective from a man who has seen almost all of this stuff happen, having been involved for a long time with the ARTEM project that eventually went online via TELMA, putting all French original charters from before 1121 on the web. So he knows, and could also tell us that the copies are about to follow the originals up. The problem, of course, is in using these resources together: not all are full-text, different standards of mark-up have been used,5 and of course not everything is very relevant to everything else anyway. Little problems like the fact that many projects only digitise the faces of their charters leave certain projects no more possible than they were before.6 Even something as basic as lists of what material there is (and ideally, where it’s been written about) is often lacking, although weirdly (my point not his) this is where Anglo-Saxon diplomatic began the process, even though its sample is so much smaller. (Perhaps because of that!) There was, too, the very salient and often bitter point that it is both more interesting to do and easier to fund new projects than the maintenance of old ones, something anyone who has worked on an old database which people actually use knows all too well. Also, and this I thought was notable in a very digital conference, he pled for the continuation of publication of printed editions of charter material, arguing that anyone who has worked with eighteenth- or nineteenth-century editions knows that they are far from useless and that they will outlast many a format change and institutional bankruptcy. As long, of course, as we don’t ‘kill all the libraries’… I think, quite frankly, we should be seeing print now primarily as a stable archive format, which is something we really do not have in the digital world, and so entirely agreed with this point, hence my emphasis here.

And then there was a very excellent dinner, in which I met some very cool people (despite the geekiness of the subject, there were a lot of cool people here), got to tell my father’s story about a friend of his who turned off a god’s electricity and so on, and it was all very good. Nonetheless, when I get to the second day’s report, it’s going to have to be briefer, isn’t it? Sorry about that…


1. To name only some of the resources that he did (i. e. those I knew well enough to note): the numerous databases hanging off TELMA : traitement électronique des manuscrits et des archives, MŌM: Europe’s virtual documents online (a. k. a. Monasterium.net), the various online publications of the Fundació Noguera (on which more here), DEEDS as above and Chartae Burgundiae Medii Ævi (already lauded here).

2. If by some chance you don’t understand these terms but would like to, I’m afraid there is little of use online or (non-exclusive) in English (though it’s possible that there is stuff of use online not in English: anybody know some?) and you should probably start with either Olivier Guyotjeannin, Benoît-Michel Tock and Michel Pycke, Diplomatique Médiévale, L’Atelier du Médiéviste 2 (Turnhout 1993) or now Reinhard Härtel, Notarielle und kirchliche Urkunden im frühen und hohen Mittelalter, Historische Hilfswissenschaften (Wien 2011), whichever you’re more linguistically comfortable with, though note that Härtel does not cover royal documents; he gives plentiful reference to those who do. If you are stuck with only English there’s a short piece that by now really needs replacing in the form of Leonard E. Boyle, “Diplomatics” in James M. Powell (ed.), Medieval Studies (Syracuse 1976), pp. 82-113, and that at least is online (as far as I can see completely and free of thumbs) via Google Books.

3. And having heard this paper I must really now get round to reading my copy of the presumably related Michael Gervers (ed.), Dating Undated Medieval Charters (Woodbridge 2000), which was something of an aspirational purchase I-won’t-say-how-long ago.

4. See J. Jarrett, “Settling the Kings’ Lands: aprisio in Catalonia in perspective” in Early Medieval Europe Vol. 18 (Oxford 2010), pp. 320-342 at pp. 335-336.

5. There were some jokes made in the papers the next day about the number of different Encoding Initiatives there seem to be in our worlds, all spinning off the Text Encoding Initiative, a project that was begun to develop an International Standard for the digital encoding of texts, but it was an odd truth that the Charters Encoding Initiative, started because TEI didn’t then quite supply the mark-up it was felt charters needed, was not in use by any of the people presenting at this conference, almost all of whom had opted to use TEI instead so as to increase interoperability with other projects.

6. I think that the first team to break the mould here were probably the guys publishing the facsimile editions of the St Gallen material, which very often has preliminary versions drafted on the dorse of its charters, but they may have just been the first ones I noticed: Peter Erhart (ed.), Chartae Latinae Antiquiores (2) 100: Switzerland 3 (Zürich 2006), Erhart & Bernhard Zeller with Karl Heidecker (edd.), … 101: Switzerland 4 (2008), Erhart, Zeller & Heidecker, … 102: Switzerland 5 (2009), eidem (edd.), … 103: Switzerland 6 (2010) and eidem (edd.), … 104: Switzerland 7 (2011), with vol 105, Switzerland 8, to follow. But of course this is neither online nor open-access.

6 responses to “Conferring in Naples, II: papers in the p. m.”

  1. Me! Me! I’m interested…

    In fact, I’m still only halfway through a deep reading of the detail you’ve provided, but in terms of the problems people may have with text recognition in Latin (and other inflected languages) they may want to consult a team down here at Monash, a collaboration between historians and computer scientists, who have adapted student plagiarism software into a butt-kicking Latin text recognition package. They built it for identifying the sources of the Speculum Morale, but it’s now being extended into a wider tool, and members of the ‘public’ can upload texts to it on a temporary basis to run them through. (Sadly the server space just isn’t big enough for everyone to store all their pet stuff on there permanently, and in any case there may be IP issues if they’re using published texts in large chunks…) But ANYWAAAAAY… The software is called Factotum, and URL for those interested is:

    http://viper.infotech.monash.edu.au:4277/

    I’m not up with the actual programming side of it, so for technical queries, contact the project team (emails on the website).

    • That’s interesting stuff, though I can see greater application for it still as a plagiarism detector than for source work, just because for a number of things one simple approach is to fling text strings at the Patrologia Latina database and see what sticks… But there must be wider use cases where that older work-around wouldn’t serve. The inflection parsing is still the question though, isn’t it? I may have to ask it of them. Thank you for the lead (and the interest)!

      • Yes, do. The *really* cool feature is that not only can it detect inflected words but it can also accommodate a certain degree of syntactical change, within margins set by the operator. It can lead to a few false positives, but many more true hits than you would otherwise achieve.

