Digital cooperation and charters

So on Monday night at Leeds Allan McKinley was observing to me how perverse it was that we charter specialists all have our various pet archives that we know well (he Wissembourg and Worcester, I Vic and Sant Joan de les Abadesses, Wendy Davies Redon, Llandaff, León, Sahagún and heaven knows how many more), and yet whenever we talk to each other or present research, one of the others in the audience perks up and goes “oh, I’ve got something like that but I’ve always thought it was weird” or “oh no, I get that all the time, I read it as a factor of such-and-such a phenomenon”, and generally we learn a lot from the comparison. Case in point: several of the court judgements Wendy was recently talking about survive only because the impossibly high compensation was silently paid with a land grant, and we later see the land transferred to a church. That doesn’t explain all of the court cases preserved in my stuff that don’t apparently feature land, but I bet it explains some of them. So, Allan asked, why aren’t we in proper touch, asking each other these questions more? Why isn’t there a mailing list or something? And that seemed like a very good point.

King Alfons I of Aragón-Catalonia and his Prime Minister conferring in the Arxiu Comtal de Barcelona

Then on Tuesday morning, as I described, Georg Vogeler was observing, with feeling, how perverse it was that so much charter stuff was going on to the web, and he gave an extensive list that I’ll come to in a moment, and yet there was no intent to share all this data. Instead people are publishing in their own name as the industry demands, and the power of digitization is damped by all these people with their pet archives putting them out piecemeal using a unique standard that works only for them. Not only should this data be shared, he was arguing, because of the possibilities of being able to make comparisons like that all over Europe, but when people are doing encoding of charters they should do it to a standard, so that this data can easily be shared in future. And he had such a standard ready, which he and his collaborators are trying to get to a point where it can be submitted for ISO recognition. And they do have a mailing list, though I’m not sure how much German-language XML chatter I can handle—and that may be unfair because everything around the website of this initiative suggests that it’s being conducted in English. But anyway. I feel sure these two impulses can be brought together in some way, because they both seem to be dedicated to getting into this state where we can find out easily what otherwise it takes conference papers and years of networking or else reading thousands and thousands of charters to find out, that is, whether this phenomenon we’re looking at is an aberration or an explanation.

Because there really are thousands and thousands; this is one of the things that came out clearly from Dr Vogeler’s presentation. I stop much too early for most of these corpora (I try and make my cut-off date 1030), but even before that there are seven thousand-odd documents from Catalonia, 1700 from Cluny alone, rather more than that though much less informative from Lorsch, probably three or four thousand from Italy, and *ahem* maybe 1100 from England. (Sorry guys.) I wouldn’t even want to guess how many from France or Germany as wholes. And, in addition, a great many of these are online and the number is increasing. Heck, the Tablettes Albertini are online, after all; how hard can actual charters be? Here the Italians seem to be leading the way: the online Codice Diplomatico della Lombardia Medievale (secoli VIII-XII) covers a fairly small area of Italy but is already up to more than 5,000 documents. Central Europe is reasonably covered already by Monasterium.net: it’s a scattering, but it’s there, and searching for pre-1000 stuff I discover that they have Passau, Kremsmünster and Salzburg in already, and these are not negligible corpora. France seems to be rather behind, with a site with great intentions at Chartae Galliae, but as yet it doesn’t seem to matter what search you put in, you get the same eight twelfth-century results back. And the really important ARTEM project is still keeping its data for sale only. On the other hand, a rather more old-fashioned digitization process of scanning and converting to PDF images has long been underway and the results are often gathered at Ménestrel. Of course, you can’t search that, but one might be able to OCR it remotely; not mark it up into XML, though. At least it’s not locked in a library in a different country.

The chained library of Wenchoster Cathedral (so the site claims)

But yes, there is a lot, and of what I haven’t mentioned here there is rather more gathered by Dr Vogeler & friends at his home base in München. And he was urging anyone listening (though I rather suspect I was the only diplomatist present; I was certainly the only one who identified themselves) to consider, when they edit their charters for digital purposes, using the Charters Encoding Initiative scheme, so that one day it can all be combined in this glorious superdatabase for all to use. Well, what would that mean? I don’t like XML, instinctively: it is flabby, ugly to parse and does a wide range of jobs badly, but because of that range it is irreplaceable (much like Microsoft Access versus most other commercial database packages: it does a lot of things inadequately that no other single program does together). And none of the professionals I know like it, and they wail every time someone else adopts it because it’s one more obstacle to replacing it with something better that has yet to be constructed. Instead we have several things which might be better but for which no-one will sacrifice accessibility until lots of other people are using them. It’s like the whole Windows thing in miniature but without the crushing commercial imperative behind it; in fact, Microsoft’s own implementation of XML is so peculiar that it is actually preventing people adopting it, because the accessibility advantage would be lost for its users. So XML is what we have to work with, and whoever replaces it will have to do so using something that can read XML and convert it, because by the time it’s accepted, there will be so much data marked up in XML that the conversion work otherwise would be entirely prohibitive.

ANYWAY. Take this back to the charters, Jarrett. How difficult is it to encode a charter as they want it? They have some digital examples up to guide future scholars, though even these are loaded with comments about needed improvements. So I had at it with one of the odder bits of my corpus, that abbreviated trial from Sant Pere de Casserres I discussed a while ago. Almost immediately there are issues of interpretation, which is why I chose a court case, they’re hard enough to break down using my own schema. The person who claims the land, but in the end doesn’t get it: he is not a recipient, and he isn’t really a petitioner either, you know? Does it really do any good to lump him with people who are requesting the king make precepts for their pet monasteries? Even once you’ve decided, though, it only marks such people up with rôles in the abstract, not the actual text; so the witnesses and scribe and neighbours aren’t qualified as such in this scheme. That would be much more work of course, but without it there’s essentially no point in marking this stuff up at all except for generating summaries for editions. At this point I more or less gave up with it; there is loads to do here, and I suppose the correct thing to do therefore is join in the work, but for the moment, they don’t have a standard to attract people to, and if we’re to share data it may need to be some other way. Is a mailing list so terribly outdated now, I wonder? Would a bulletin board be better? Another blog? I shall have to think, but I’m open to ideas…
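
To make the rôle problem concrete, here is a minimal Python sketch of the kind of in-text markup I mean, where each person is tagged where they appear in the text rather than only in the abstract. The element name person, the role attribute and the sample text are all invented for illustration; none of them is taken from the CEI scheme, and the names are placeholders, not people from the Casserres document.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample, not a real charter. The element and attribute names are
# hypothetical and are NOT taken from the CEI schema.
SAMPLE = """
<charter id="sample-1">
  <text>
    <person role="claimant">Placeholder A</person> claimed the land before
    <person role="judge">Placeholder B</person>, but it was confirmed to
    <person role="recipient">Placeholder C</person>.
    Witnessed by <person role="witness">Placeholder D</person> and
    <person role="witness">Placeholder E</person>;
    written by <person role="scribe">Placeholder F</person>.
  </text>
</charter>
"""


def people_by_role(xml_source):
    """Group the text of every <person> element by its role attribute."""
    root = ET.fromstring(xml_source)
    roles = defaultdict(list)
    for person in root.iter("person"):
        # Untagged roles are collected under "unspecified" rather than dropped.
        roles[person.get("role", "unspecified")].append(person.text.strip())
    return dict(roles)


if __name__ == "__main__":
    for role, names in people_by_role(SAMPLE).items():
        print(role, names)
```

Giving the unsuccessful claimant his own rôle, rather than filing him as recipient or petitioner, is just one possible choice; the point is only that once the rôles sit in the text itself, pulling out all the witnesses or scribes is a few lines of work.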

P. S. That Diocese of Wenchoster site is one of the strangest things I’ve ever seen, in a very subtle way; what a lot of effort to put into such an unclear goal… I think I shall have to calm down with the latest issue of the Framley Examiner.

8 responses to “Digital cooperation and charters”

  1. And they do have a mailing list, though I’m not sure how much German-language XML chatter I can handle

    As far as I know, the members of the list come from a very international background, so the language is usually English. According to my experience it’s a rather low volume list with occasional peaks of activity. Unfortunately I didn’t find any archives online.

    Though I am not an active member of the list, I think a contribution about your court case and the problems of its markup might constitute very valuable input.

  2. I shall certainly sign up in due course, and ask a few dumb questions, but I’m loath to pile in with examples till I’m sure that what they’re trying to do is actually what I imagine such a group would want to do…

  3. In a related field, there are various library projects which are starting to look at how you might encode texts automatically (or with some automatic elements). For one example see here: the library jargon is automatic generation of metadata. This is due to the realisation that expecting people to sit down and do all the coding by hand is too slow and boring for words.

    Given that charter records tend to have a certain amount of structure imposed on them as they are edited, I don’t think it would be impossible to ‘train’ a computer to pick up summaries, dates, possibly even broad type of charter (since the summaries often use fairly formulaic vocabulary) from any particular digitised cartulary, and maybe if you have place or name indices, even mark occurrences of those within a text. Whether you could do this training accurately and quickly enough to be useful I’m not sure. If you cut out the dispute settlements, which do all seem to be written as one-offs, how much variety is there in ‘bog-standard’ charters, or are there no such things?

  4. From bottom to top: there certainly are ‘bog-standard’ charters, though in some places and times more than in others. Even the bog-standard ones have variations, and traditional diplomatic doesn’t always help one schematise this, but last Leeds I was saying how most of my documents clearly responded to an idea of how a charter went, so closely that one could reconstruct it as if it were a formula. Of course the ones where it varies are much more interesting to read, and Allan’s work on Wissembourg has tended to ask why it varies when it does, which is interesting; but if what you want is bulk data, then the variation is more of a problem and less of a possibility.

    But it can certainly be done. Some very subtle programming at the outset could have a computerised parser reading charters, I’m sure; at the Fitzwilliam the SCBI database was built out of such parsing of printed catalogues which fell into a fairly uniform layout. The problem is checking it: we had two people employed for a total of about eight months full-time proofing the conversion of the 40-odd volumes. And those were in English: medieval rural Latin would be a different thing again. But the actual parsing tech., that must be there, it was so nearly there when we were building our own nearly five years ago.

    The metadata stuff is interesting for entirely more workaday reasons and I won’t try and make it blogworthy, but I’m grateful for the link. This is a wheel we’re currently trying to invent ourselves, so seeing other cavemen on the case is useful.

  5. Have you seen the DEEDS project? It has 9500 dated English charters and it just became available to the internet at large in June.

  6. I was dimly aware of that from the job adverts for people to make this happen; I’m glad it has, but it’s much too late for my own interests…

  7. 1. To reduce your anxiety about the “only German” work: The Charters Encoding Initiative (http://www.cei.lmu.de) did all its work in English (although I have to admit that it isn’t the best choice from the diplomatic viewpoint, as the small English terminology in the Vocabulaire International de Diplomatique shows). There are enough non-German members in the CEI to switch to the modern lingua franca.

    Only our comments in the Virtual Library “Historical Auxiliary Sciences” (http://www.vl.ghw.lmu.de) are from German colleagues. But why not change it? Any hint and contribution to this site is welcome!

    2. But now to the hard stuff concerning your “anxiety” on XML:
    a) You’re right: We don’t have something better than XML so we have to use it. But it is not only “something we cannot avoid”; it is something that fits into scholarly work (and fits better than typesetting with any current typesetting program): Marking up a text is a traditional practice we have followed for a long time: printing parts of the text in italics, putting signs for “here starts/ends the part in elongated script”, inserting notes (a-a) to identify a passage that has been added later, crossing out etc. That’s where the XML concept is impressive: It does the same thing, but it does it in a way that a human (!) can read the ASCII printout and a computer can still work with it (e.g. convert it to something like a traditional printed edition).

    That does not affect your considerations on semantics and particularly on interpretation (is he a recipient or a petitioner?). There are suggestions for how to put role identification and other concepts into the encoded text (e.g. RDF/ontologies). But the question remains: how to do it in a way that other scholars can share the information, using their computer and their access to the net. An XML-encoded file on the web, referring to some external definitions (standard) could do a lot, couldn’t it?

    b) Another thing is the selection of names for the markup (or any other concept of digital representation of charters): The idea of Michele Ansani several years ago struck me: Why do the diplomatists have a “Vocabulaire International de Diplomatique” but not a common language for encoding digital representations of medieval charters?

    That led to the initial impetus of the CEI: try to establish a common terminology that can be used to name XML tags, and thus can be used to build an XML schema or to name generic elements in another XML schema or even to name your tables and columns in a relational database.

    At the moment I would prefer to build an ontology out of the Vocabulaire International de Diplomatique that can be used e.g. for RDF-XML markup (as the DEEDS people do).

    c) And finally some kind of advertisement: It is definitely a drawback of any computer program to be used by diplomatists if it is peppered with angle brackets, slashes, equals signs etc. That’s why the monasterium people developed an XML editor without XML (http://www.editmom.uni-koeln.de). The basic principle was: No angle bracket for the end user (not fully achieved, but almost!). Does it meet the needs of the diplomatists?
    To test it: create a user account on the editmom website, select me (or somebody else) as your personal moderator, select a charter to which you want to contribute new information (like transcriptions etc.) and then test the software. I’m so curious what kind of experiences “real life users” (=real diplomatists) have and how it could be improved. Please do not hesitate to bombard me with questions and suggestions!

    3. And finally: As I already mentioned in Leeds: The cei-l (http://www.cei.lmu.de/cei-l.html) has already attracted some diplomatists with an open mind for computing. I don’t think this community would resist a conversion of the objectives of the list from “encoding charters with XML” to “scholarly work with charters in the era of the internet”.

  8. Georg, very happy to hear from you and I’m sorry I hadn’t yet got round to making contact. I should stress that it’s not that I think you shouldn’t be conducting this discussion in German, merely that my German probably isn’t that good… There is also some issue with the English take-up of the Vocabulaire International de Diplomatique; it is as you say an obvious and vital template for this kind of work, but English-language scholars, as you may have noticed, and particularly Anglo-Saxonists, often seem to do without it. I’m as guilty as they, because of liking non-diplomatists to understand my papers; so I tend not to use sanctio where I can use ‘penalty clause’ and so on. However, for these purposes it’s nice to know that we’re all talking about the same thing.

    I also agree that XML is obviously useful, it’s just very very flabby and difficult to read. The ‘no angle-bracket’ aim sounds very laudable. I will, when I get one or two things finished and get to my Sant Pere de Casserres charters, which haven’t been edited before, see if they work in this schema and system with a bit more than ten minutes’ effort. This is a little way off yet though :-(

    I will also at least sign up to the CEI-L, but again, once some other stuff is dealt with that means I can free both time and mind to adjust to that level of discourse. Blogging keeps me simplifying, which is both good and bad…
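
The suggestion in comment 3, that a computer might pick up dates and the broad type of a charter from formulaic editorial summaries and mark indexed names in a text, could start from something as crude as the Python sketch below. It is rule-based rather than trained, the keyword table is an assumption and not drawn from any real edition, and the function names classify_summary and mark_names are made up for the example. As comment 4 says, the expensive part would be proofing the output, which this does nothing to solve.

```python
import re

# Hypothetical keyword table: this vocabulary is an assumption for the
# example, not taken from any real edition's summaries.
TYPE_KEYWORDS = {
    "sale": ("sells", "sell", "sale"),
    "donation": ("gives", "donates", "donation"),
    "exchange": ("exchanges", "exchange"),
}

# Naive assumption: the first standalone 3- or 4-digit number is the year.
YEAR_PATTERN = re.compile(r"\b(\d{3,4})\b")


def classify_summary(summary):
    """Guess a broad charter type and a year from an editor's summary."""
    words = set(re.findall(r"[a-z]+", summary.lower()))
    charter_type = "unknown"
    for label, keywords in TYPE_KEYWORDS.items():
        if words.intersection(keywords):
            charter_type = label
            break
    match = YEAR_PATTERN.search(summary)
    year = int(match.group(1)) if match else None
    return {"type": charter_type, "year": year}


def mark_names(text, index_names):
    """Wrap each name from an index that occurs in the text in <name> tags."""
    if not index_names:
        return text
    # Longest names first, so a longer name wins over a shorter one it contains.
    ordered = sorted(index_names, key=len, reverse=True)
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(re.escape(n) for n in ordered))
    return pattern.sub(lambda m: "<name>%s</name>" % m.group(0), text)


if __name__ == "__main__":
    print(classify_summary("995, March 3. Placeholder A sells a vineyard."))
    print(mark_names("Placeholder A sells a vineyard.", ["Placeholder A"]))
```

Anything that fails to match comes back as "unknown" or with no year, which at least tells whoever is proofing which documents, the dispute settlements probably among them, need a human reader.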
