So, term started, and there was a short hiatus, for most of which this post was in draft. It’s actually a little hard to work out how to address the papers given at the Digital Diplomatics 2011 conference briefly. I don’t want to go on at the length of the previous post, so ordinarily I’d start by listing the programme, but since it, the abstracts and indeed the slideshows from the papers are all already online, you’d presumably already have gone there if you wanted. Still, I can’t think of another structure, and maybe the few things I want to say will spark your interest, so I’m going to use my usual one anyway, but with a cut at the halfway mark because, well, this goes on a bit.
- Jeroen Deploige & Guy de Tré, “When Were Medieval Benefactors Generous? Time Modelling in the Development of the Database Diplomatica Belgica“
- Žarko Vujošević, “The Medieval Serbian Chancery: challenge of digital diplomatics”
- Richard Higgins, “Cataloguing medieval charters: a repository perspective”
This first session had been supposed to feature Christian Emil Ore, but he had now been moved to a slot later in the programme, and Mr Vujošević moved up to compensate, because a later speaker was not available as planned. The organisers were keen on keeping papers together that could talk to each other. Dr Higgins’s was however, I think, always going to be an outlier: he hails from Durham University Library, which has a charter or two, and although his primary concern was, like most others’, getting stuff on the web so it could be used, he was trying to do so as part of a much larger project of which very little else was charters, and much of what he said about trying to find data schemes that would do it all struck close to my old experiences. It helped explain to the more hardcore audience, I think, why libraries so rarely seem to do things with charters the way that digital diplomatists might wish. The paper by Deploige and de Tré, meanwhile, showed the kind of thing that we should be able to do with large-scale diplomatic corpora (for example, did people give more to the Church when they were rich and there was peace, or when the Black Death was right around the corner?) but was actually more about quite how difficult it is to digitise medieval dates into something computers can actually compare. They had the compromise of a reference date, computer-readable and therefore unhistorically precise for the most part, and a text field always displayed with it showing the range of possible dates. This is a kludge, as I know because I do it myself, and it leads to sorting of documents that may be completely awry; they had a range of improvements they were hoping to try. And Mr Vujošević, meanwhile, spoke almost as a voice in the wilderness, because although Serbian medieval charters are plentiful they are very variably edited, if at all, and much of his work had turned into battles to simply get the texts out of archives and into a single uniformly-featured database.
All the speakers were therefore giving work-in-progress reports on fairly intractable technical and archival problems, but I’m not sure this was the theme the organisers had expected to emerge.
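For anyone wondering why the reference-date compromise sorts so badly, here is a minimal sketch of the problem. It is my own illustration and not the Diplomatica Belgica data model: the `CharterDate` record, its field names and the three sample charters are all invented, and I use Python's `datetime.date` (a proleptic Gregorian calendar) purely for convenience.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CharterDate:
    """Hypothetical record: one sortable reference date plus the possible range."""
    label: str
    reference: date  # single machine-readable date, falsely precise
    earliest: date   # earliest date the charter could have been issued
    latest: date     # latest date the charter could have been issued


def definitely_before(a: CharterDate, b: CharterDate) -> bool:
    """True only if every possible date of a precedes every possible date of b."""
    return a.latest < b.earliest


def order_is_uncertain(a: CharterDate, b: CharterDate) -> bool:
    """True if the two ranges overlap, so neither charter provably comes first."""
    return not definitely_before(a, b) and not definitely_before(b, a)


# Invented sample data: A is dated only to a 26-year window, B to a single day.
charters = [
    CharterDate("A", date(1150, 1, 1), date(1150, 1, 1), date(1175, 12, 31)),
    CharterDate("B", date(1160, 6, 15), date(1160, 6, 15), date(1160, 6, 15)),
    CharterDate("C", date(1180, 1, 1), date(1180, 1, 1), date(1180, 12, 31)),
]

# Sorting on the reference date alone yields a confident-looking order...
by_reference = sorted(charters, key=lambda c: c.reference)

# ...but the ranges show that A and B cannot actually be ordered.
for i, a in enumerate(by_reference):
    for b in by_reference[i + 1:]:
        status = "uncertain" if order_is_uncertain(a, b) else "secure"
        print(f"{a.label} before {b.label}: {status}")
```

The reference-date sort puts A first, yet A could be anything up to fifteen years later than B. Comparing the ranges instead of the single dates at least flags which orderings are secure and which are guesses, though it gives only a partial order and so is no help if all you want is a sorted list.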
Coffee, however, restored our spirits, and I was able to swap stories as well as some useful software tips with Dr Higgins, so the sessions resumed in good order.
- Pierluigi Feliciati, “Descrizione digitale e digitalizzazione di pergamene e sigilli nel contesto di un sistema informativo archivistico nazionale: l’esperienza del SIAS”
- Francesca Capochiani, Chiara Leoni & Roberto Rosselli Del Turco, “Open Source Tools for Online Publication of Charters”
- François Bougard, Antonella Ghignoli & Wolfgang Huschner, “Il progetto ‘Italia Regia’ & il suo sistema informatico”
The latter two of these papers were given in Italian, or so my notes suggest, whereas the first one, with an Italian title, was presented in English! Figure that one out. Anyway, I don’t speak Italian, and though I was surprised by how much I could muddle out of it by reading the English abstracts at the same time as they spoke, nonetheless I didn’t get much. I will just note that the second paper was actually presented by all three authors, in segments, whereas the last was presented by Ghignoli alone, a pity, as I’d like to have met M. Bougard, who does things that interest me. The first paper, although I did understand it, was essentially a verbal poster for this SIAS programme, which is slowly chomping through Italy’s national archives and cataloguing them all. Since some 20-25% apparently don’t have indices for their charters at all, some exciting stuff will doubtless come out of this, but that wasn’t what the paper was about. The second paper I could follow more or less because it was essentially a how-to guide on publishing such material, a presentation that may have missed its audience here. The third was where my language really just wasn’t up to it, and I don’t know if what was being said was a demonstration of a remarkable project or just another one, but the project is a digital database with images of all Italian royal charters, seventh to twentieth centuries, and if you wonder as do I about what the later end of that might even be I guess we can go look…
By now we were running some way behind, and there was a brief attempt to cancel the next coffee break, which had already been over-run. This was largely ignored—punctuation for the day, as I had that morning been told—but with some grumbling things were got going with some time clawed back, and we continued.
- Camille Desenclos & Vincent Jolivet, “Diple, propositions pour la convergence de schémas XML/TEI dédiés à l’édition de sources diplomatiques”
- Daniel Piñol Alabart, “Proyecto ARQUIBANC. Digitalización de archivos privados catalanes: una herramienta para la investigación”
The former of these papers was notable for containing more acronyms and programming languages I think than any other at the conference, but this was partly because it was trying to explain the sheer variety of data schemas in use for charter material out there. By the end of this conference I think it was fairly clear to us all how this was happening: either new researchers don’t realise that there’s a toolset and a set of standards available to them and build their own, or, much more frequently it seemed (but then the former sort largely wouldn’t know about the conference, either…) they are aware of the tools but find them inadequate for their precise enquiry or sample and so modify them for their own purposes. The presenters argued that the widespread use of the TEI standard (explained last post but one) was making this easier for people to do, but that it also made it easier to link things back up again. The other paper, meanwhile, gave me great glee because it had my sort of material in it, documents in happily-familiar scripts and layouts, but what it also alerted me to was that for the period from when records begin in Catalonia to now, as a whole, a full 70% of surviving documentary material (of all kinds) is in private hands. Getting people to let the state digitise it, the point of the ARQUIBANC project, thus presents a number of problems, starting with arrant distrust and moving onto uncatalogued archives and getting scanners into somebody’s attic. Where this has been done, medieval material does come out, as indeed I knew from reading of the Catalunya Carolíngia for Osona and Manresa, where four of the tenth-century documents were revealed precisely by going and knocking on the doors of really old manors, but the size of the project as compared to the resources makes their considerable successes seem puny.
You will also imagine that I had much to ask Senyor Piñol, in shaky Catalan, afterwards on the subject of private archives, and he was helpful, but before very long we were being shuffled off to lunch, where I ate more pizza margherita than even I would have thought plausible in excellent company and felt pretty good about both these things on returning for the poster session and the last six papers.
Now I have pictures of the posters that interested me most, but I’m not quite sure what the ethics are about reproducing them. This is not quite like reporting on a public presentation, this is a lot more like reproducing someone’s actual publication. Figuring that the project staff probably won’t object to the advertising, however, I’ve tentatively embedded them below in illegibly small sizes but using the full-size pictures, loaded up the alt-titles with more information and linked the pictures through to the project sites. I do this not least because I want to talk about some of them, so I hope this will be forgiven and of course I will take down or replace with actual reduced-quality versions any that project owners may wish.
The thing about the posters was that they almost all reported on finished projects. In some cases these were quite some while finished; the Volterra project (not the Volterra Project, obviously) at top right, for example, uses stuff from a Verona digital palæography project about which I first heard reports in 2008, in the very same room where I first met Georg Vogeler, which was kind of how I came to be at this conference. They seem to have wrapped it up quite well. As for the Toulouse-Sorbonne project at bottom-left, you’ve all read about that, at least if you were reading this blog in 2008 (again), because it’s the big one using mathematical graph visualisations to study medieval social networks that generated so much discussion here. Since I’d just two months before got one of the project team to present at Leeds, it was nice now to meet another one, Nathalie Villa-Vialaneix, and that was all jolly good. The fourth poster is more contentious; not only had it apparently got damaged in transit, leading people to suspect that this wasn’t its first outing, but as you will know from my previous rant, the thing it was advertising didn’t actually work at that point in time, and this was a point I made to several people. I thought it disingenuous and still do.
But anyway. Over the course of this interval pretty much everybody’s coffee levels were now at an acceptable mark, and so we charged on into the afternoon.
- Paul Bertrand & Maria Gurrado, “Analyse formelle quantitative de la paléographie médiévale. Enquête sur un ensemble de « quittances » médiévales (comté de Flandre, 1275-1325)”
- Jinna Smit, “Automatic Writer Identification: the palaeographer’s new best friend?”
- Dominique Stutzmann, “Diplomatique et paléographie numérique : de nouveaux instruments de datation et d’attribution?”
Paul Bertrand and Jinna Smit – whom it was a pleasure to run into again and who cracked more jokes in her paper than anyone else, all of which did something for her point – were chasing the same things with their papers: when a palæographer detects differences or similarities between written scripts, is what they are detecting subjective or objective, can it be measured? And if so, would machines not be more accurate? Bertrand headed, tactfully, for questions that computers would obviously help with, large-scale comparisons that let him say things like, script is demonstrably getting more closely-packed and dense in Flanders over the thirteenth century, and ecclesiastical scribes are the most regularised. This was a perfect demonstration of how to talk about both methods and what they get you in the same paper, and had some extremely convincing visuals (as you’d expect, really). Jinna’s paper was more of a head-on attack on the computers-vs.-humans problem, however, having applied legal software used for handwriting recognition called GIWIS to a medieval sample, also from Flanders, in search of individual scribes. As a result she’d had to question quite a lot of the old scribal attributions in her corpus. She ended with a plea for projects to share their results, which was another thing that the conference was making clear the need for. Lastly Dominique Stutzmann, using I think the ARTEM database, spent some time demonstrating that scribes were not as consistent as we tend to expect, being ready to mix letter forms at whim; this and other subjective issues in the ways that writing was actually generated make it very necessary to have a good institutional context within which to try automated dating and attribution of scripts, but he thought it could be done well within those constraints with current tools.
MOAR COFFEE and then:
- Christian Emil Ore, “Interlinking Source Text Collections – a Norwegian Example”
- Redmer Alma, “CARTAGO and the semantic web: Digital Oorkondeboek Groningen en Drenthe“
- Aleksandrs Ivanovs & Aleksey Varfolomeyev, “Semantic Publications of Charter Corpora (the case of a diplomatic edition of the complex of Old Russian charters Moscowita-Ruthenia)”
This was the session into which Dr Ore had been moved, and this was because Johan Åhlfeldt, maintainer of Regnum Francorum Online, was not able to be present. This was a disappointment to me as I’ve had an eye on that site for a long time, but Dr Ore gave a clear and comprehensible description of his chosen solutions to dealing with an absolutely huge diplomatic corpus, 19,000 items in one of the three collections he was trying to unify alone. It was odd to hear someone else going along very similar lines to a project (partly) at the Fitzwilliam from which I still have a forthcoming paper, the one that took me to Vienna and which you can read about here if you’d like, but with completely different material; the problem, joining up several catalogues, nonetheless had the same solution in each case.1 Alma meanwhile made the excellent user-experience point that the searcher doesn’t know what’s in your database and that this means you must make sensible search terms explicit, responsive to human language and so on. But which language? People now don’t speak much Latin or, in his documents’ case, thirteenth-century Dutch. That makes semantic web technology not quite as simple as you’d like… There were a lot of good lines here, including a cri de cœur for a Name Encoding Initiative, but also the trite-but-true closer, “If the semantic web is anything… it is us.” Lastly came the Russian paper, which performed the odd feat of making several of these technical issues seem easy to fix and others so insoluble that they needed working round completely. The kind of loose structure that could be turned to many purposes which they were advocating seemed sensible but unambitious to me until Professor Ivanovs said that what he was now thinking of was a sort of “semantic Wiki system” to avoid the heavy work of encoding in detail, at which point I sat bolt upright or should have done: my notes at this point add “[THAT'S WHAT I WANT]” in capitals. 
They also had some interesting things to say about using ‘controlled English’ as a way to deal with some of the issues of normalisation, collapsing spoken or written languages down into more manipulable and uniform sense units, but it was the Wiki bit that, well, bit me. So I spent a while afterwards trying to extract software tips from Professor Ivanovs to deal with my persistent problem with my own data storage. He was kind enough to give me several, and I on the other hand have been too busy and hidebound to actually try using them, but the obvious one appears to be Semantic MediaWiki, and if I actually get round to it that may solve a good few pressing issues and get me away from Microsoft dependency, for all of which I shall have a lot of thanks due to this paper, this session and this conference.
There was no dinner organised this evening, and so Nathalie and I formed a plan to seek out local pasta. This succeeded, despite the restaurateurs having neither French nor English and we no Italian, and then, pleasantly full, I fell in with a Université de Nancy conference contingent out looking for beer, which we were still gently getting through when Napoli beat Inter Milan, which as I said made for a certain amount of local colour and noise for a while afterwards. The day had been quite heavy going in places; I’d spoken at least three languages, listened to five (and understood maybe two and two halves), parsed innumerable acronyms and drunk a lot of strong coffee, which may account for the poster anger, but I’d also got some damn useful tips, eaten pretty well and been extremely entertained by Naples and its denizens, so although it’s not how I’d want to spend every day of my life, I’m calling this one of the better ones. Now, next day it was my turn at the front, but I have to do some work before I get that far I fear…