Long-term Preservation of Electronic Publications -
A challenge for a new kind of co-operation between publishers and libraries
Bernd Wegner (TU Berlin)
After approximately 10 years of electronic publishing, the vast amount of material now available electronically has led to a situation where more concrete answers to the archiving problem must be given. As a consequence, several projects started in 2001 that try to tackle this question and to develop models for the archiving of electronic content. In contrast to storing printed publications, preserving electronic publications requires considerable care for a diverse range of material. This starts with the display format and ends with the compatibility of the offer with current web browsers, whatever they will be in 10 or 20 years.
The Electronic Mathematics Archives Network Initiative (EMANI) was designed as a special project for mathematics. Having in mind that a distributed architecture would be more suitable and would reduce the load on the partners, a network is proposed, which might also be a more open approach for extending the project from an initially restricted solution to a more comprehensive enterprise. For the core of the network, a co-operative system of reference libraries and content providers such as publishers and editors has been set up. In the ideal final version, they should serve a long list of purposes, covering the major needs of mathematicians for accessing all publications in their domain.
This paper is an updated version of a talk given at LIBCOM 2001. The variety of aspects to be covered by projects in long-term preservation of electronic documents and the rapid development of activities related to this subject mean that the update differs considerably from the version presented at LIBCOM 2001. EMANI serves as a model to demonstrate these developments. Special emphasis is given to the role of libraries in this context.
1. Electronic offers and their providers
The impact of electronic devices on the daily life of researchers, teachers and other professionals results from a variety of tools and offers installed on local machines or made accessible through the Internet. The part in which libraries are mainly involved consists of electronic publications, or more precisely, electronic versions of printed publications. Some libraries have already developed digital repositories of retrodigitised publications, obtained by scanning printed articles and books. But offers that can be published only in electronic form are also becoming more and more important. In addition, researchers and teachers increasingly take advantage of computer algebra systems and other computing software, and visualization techniques using graphics software and image processing tools have become the basis for most of their presentations and publications. Finally, we should not forget that the Internet has been used to establish a communication infrastructure which greatly facilitates their daily work and extends the possibilities for co-operation between distributed sites.
There is a wide range of providers of these offers, from commercial publishers and learned societies to volunteers and single authors. The list of distributors and information brokers is also a long one: libraries, databases and indexing services, internet portals of different types, web browsers et al. In contrast to the "old world" of printed publications, these providers have different aims, and it is not always clear to users what they can really expect from these services when searching for information or an article of interest. Clearly, libraries try to transfer the system they have developed for their printed holdings to these new publications, and hence they still seem to be the most reliable information providers with respect to electronic offers as well. But this role has to be acknowledged more widely, and the offer has to be improved.
There are good reasons why libraries will be able to maintain their central role in the distribution and storage of scientific information and succeed in extending it to electronic media. They have developed precise and reliable access structures. Their service is free for their specific group of users, and this group is a large one in most cases. Even for external users they have developed a good network of exchange facilities, which enables scientists to make their work accessible to a wide community of users and to read the work of their colleagues without facing significant commercial barriers. Libraries commonly cover a broad range of subjects and try to be relatively comprehensive within it. Regardless of how often they are used, these holdings have been preserved and kept accessible with great care. On the one hand, the objectives of science libraries are user-oriented; on the other hand, libraries feel obliged to protect the treasure of knowledge they have accumulated in their collections. This also makes them the best choice for solving the problem of the long-term preservation of electronic publications.
2. Current problems and activities
In the "paper world" the long-term preservation of publications seemed simple at first sight, though on closer inspection a lot of problems had to be handled. They mainly came from the deterioration of the paper or the binding of a book or journal, and they appeared after a comparatively long period in which the physical state of the document could be considered stable. In addition, the wide distribution of documents to several locations world-wide was a factor of stability, protecting them against all being destroyed simultaneously by wars and similar events.
For digital publications this period of stability has turned out to be extremely short. What everybody has experienced with files from old releases of Word has meanwhile become true for readers of PDF files, for example. Without conversions, if they exist at all, or the simultaneous installation of several versions of Acrobat Reader, a whole range of PDF files produced over the period in which Acrobat Reader has been offered can no longer be read. But this is only one problem. Another is the stability of the physical carrier on which the data are stored. A third is the variety of plug-ins: documents may depend on additional software that has to be offered with the electronic document, and current releases of this software may have a short lifetime. What should we do with the document afterwards?
Obviously there is no complete solution available for these problems, and work on such a solution has to involve all parties: publishers, libraries and authors. In 2001 alone, several such co-operations started, attacking the problem with different models. Most of them involve libraries. Here are some examples.
The most prominent is the co-operation between Elsevier and the libraries at Yale University, which addresses the digital preservation of all publications of this large publisher. Harvard University works on the same task with Wiley, Blackwell Science and the University of Chicago Press. LOCKSS is a system of archiving sites co-ordinated by Stanford University. Through the project Harvest, Cornell University is involved in the archiving of publications in agriculture. MIT has dedicated some effort to a special type of electronic publication, the dynamic document. The New York Public Library is working on the digital preservation of arts journals. The American Institute of Physics and the American Physical Society have established an archiving system for their publications, which involves the automatic conversion of files when a new release of the reader is distributed. What all of these projects have in common is that they represent only a first approach and that nobody has a comprehensive solution.
Such a solution will be even more complicated for documents in mathematics, because they are particularly likely to include software-dependent enhancements. That is the reason why another approach, called EMANI, is described in detail in the next section.
3. The EMANI project
Over a period of approximately 10 years, electronic publications in mathematics have developed from a few offers in pioneering, freely accessible journals into a first-class publication facility with enhanced services compared to traditional printed publications. Since older publications are still of great value for research in mathematics, retrospective digitization projects have considerably increased the digital content in mathematics. One of the major projects of this kind is ERAM (see  or ).
In the first half of 2001, the Electronic Mathematics Archives Network Initiative (EMANI) was designed as a special project to develop models for the archiving of electronic content in mathematics. Having in mind that a distributed architecture would be more suitable and would reduce the load on the partners, a network is proposed, which might also be a more open approach for extending the project from an initially restricted solution to a more comprehensive enterprise.
Though some of the following arguments may apply to all sciences, they are of particular importance for mathematics. Mathematicians and professionals applying mathematics need quick, reliable and integrated access to mathematical publications. Long-term availability of publications is a particular need in mathematics. Digitising print-only publications and adjusting these offers to the current facilities for electronic publications lead to an additional series of problems to be solved. Electronic publishing offers a variety of additional information in mathematics, which has to be integrated into the access to the traditional types of publications. EMANI will have to deal with all these aspects as well as with the general problems mentioned in the previous section. The consequence is a large set of requirements, leading to quite ambitious aims and goals for the project.
For the core of the network, a co-operative system of reference libraries and content providers such as publishers and editors will be set up. In the ideal final version, they should serve a long list of purposes. The basic action will be to store the digital content in mathematics from the content providers at the reference libraries. This will be complemented by retrodigitising, at the reference libraries, all printed publications in mathematics from the content providers, so that eventually a large part of the publications in mathematics will be covered by electronic versions. On this basis, first measures can be taken to ensure the long-term preservation of this content in readable form.
But having the content stored somewhere will not be sufficient. Retrospective digitization may yield scanned images only, which hopefully can be accessed in some repository. As an important enhancement, it will be necessary to improve the usability of the retrodigitized publications by introducing advanced linking and searching facilities, and to provide convenient and affordable access to the stored content for mathematicians and professionals using mathematics world-wide.
The reference libraries may even serve as a reference system for other libraries that want to store and provide part of the content or refresh their existing offers with updated material. Given the long time span of the publications provided through the network, ranging from 19th-century articles to current publications, a system of distribution agents will be needed. This may be a good reason to develop new business models for the distribution of mathematical publications in a joint enterprise between reference libraries and content providers.
4. The starting point of EMANI
Such a complex enterprise can only be started on a smaller, well-controllable scale at first. Once the architecture and the action plan have been made more precise, an extension may be considered. The current partners on the library side collaborating in the first steps of implementing the initiative are:
- The Cornell University Library, Ithaca, N.Y.: They have a good tradition in retrospective digitisation projects and are also involved in the archiving discussion for other sciences. In particular, they are building up a bundle of electronic journals in mathematics through Project Euclid. They serve as a mirror site for EMIS (see ).
- The State and University Library Goettingen: Important retrospective digitisation projects such as ERAM (see  or ) and DIEPER are pursued there as well. In addition, the SUB Goettingen is obliged to collect all publications in mathematics. In this role they have a high reputation as a center for access to mathematical publications. Moreover, they also serve as a mirror site for EMIS.
- The Tsinghua University Library, Beijing: This library has experience with the digitisation of Chinese publications. They are a Chinese center of excellence for installing and offering electronic publications.
- The Orsay Mathematical Library, Paris: They co-ordinate a quite comprehensive consortium of French mathematical libraries and act as speaker for a group of three French partners. The other two are retrodigitization centres in Grenoble and Lyon.
The content providers for the start are Springer-Verlag, Birkhaeuser Verlag, Teubner Verlag, Vieweg Verlag and the electronic library ELibM offered through EMIS, the European Mathematical Information Service. The four publishers look back on a long tradition in publishing mathematics and are in charge of several of the best journals in mathematics. In contrast, ELibM is a co-operation of several journals and editors on a voluntary basis, bundling electronic offers in a world-wide system of WWW servers (see ).
There is a common understanding that storing the content in a repository will have priority in the near future, and that, as a safety measure, copies of the content stored in the system should in general be deposited at all reference libraries. Later on, refreshed versions of the content should also be exchanged accordingly. It has also been agreed that the partners of the initiative will contribute their own achievements to support the aims mentioned above as far as possible, but this will become important only in a later phase of the project.
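The safety rule of depositing a copy of every item at each reference library can be sketched in a few lines. This is a minimal illustration under stated assumptions, not EMANI's actual procedure: each library's holding is modeled as a local directory, copies are verified with SHA-256 checksums, and the helper `deposit_everywhere` and the directory names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deposit_everywhere(source: Path, libraries: list[Path]) -> dict[str, str]:
    """Copy one file into every library directory and verify each copy.

    Hypothetical helper (not part of EMANI): returns a mapping
    library name -> checksum, raising ValueError if any copy differs
    from the source file.
    """
    expected = sha256_of(source)
    results = {}
    for library in libraries:
        library.mkdir(parents=True, exist_ok=True)
        target = library / source.name
        shutil.copy2(source, target)
        actual = sha256_of(target)
        if actual != expected:
            raise ValueError(f"fixity check failed at {library.name}")
        results[library.name] = actual
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        article = root / "article.pdf"
        article.write_bytes(b"%PDF-1.3 example content")
        # Hypothetical directories standing in for the four reference libraries.
        sites = [root / name for name in ("cornell", "goettingen", "tsinghua", "orsay")]
        report = deposit_everywhere(article, sites)
        print(len(report), len(set(report.values())))  # 4 1
```

Comparing checksums rather than trusting the copy operation means that a later exchange of refreshed versions could re-run the same check to detect silent corruption at any single site.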
Hence, the first phase of the initiative, now in preparation, consists of the stepwise transfer of the available electronic content from the content providers to the reference libraries. There, the files will be checked for whether they can still be used for archiving, files unsuitable for this purpose will be adjusted, and recommendations will be developed on how the content providers can ensure a more convenient delivery in the future.
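The check of delivered files in this first phase could begin by identifying each file's format from its leading bytes. The sketch below assumes a hypothetical policy: PDF at or above an arbitrarily chosen minimum version and PostScript are accepted as-is, while DVI and unrecognised files are flagged for adjustment or return to the provider. The names `check_for_archiving` and `IngestReport` and the version threshold are illustrative assumptions, not criteria agreed by the EMANI partners.

```python
import re
from dataclasses import dataclass

# Hypothetical acceptance policy, keyed on the leading "magic" bytes of a file.
PDF_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")  # e.g. b"%PDF-1.4"
POSTSCRIPT_MAGIC = b"%!PS"
DVI_MAGIC = b"\xf7\x02"  # TeX DVI preamble: opcode 247 ("pre"), format id 2


@dataclass
class IngestReport:
    name: str
    detected_format: str
    accepted: bool
    note: str


def check_for_archiving(name: str, data: bytes,
                        min_pdf: tuple[int, int] = (1, 2)) -> IngestReport:
    """Classify a delivered file and decide whether it can be archived as-is.

    The minimum PDF version is an assumed threshold for illustration only.
    """
    match = PDF_HEADER.match(data)  # match() anchors at the start of the data
    if match:
        version = (int(match.group(1)), int(match.group(2)))
        label = f"PDF {version[0]}.{version[1]}"
        if version >= min_pdf:
            return IngestReport(name, label, True, "ok")
        return IngestReport(name, label, False,
                            "older PDF version: convert before archiving")
    if data.startswith(POSTSCRIPT_MAGIC):
        return IngestReport(name, "PostScript", True, "ok")
    if data.startswith(DVI_MAGIC):
        # DVI references fonts externally, so it is not self-contained.
        return IngestReport(name, "DVI", False,
                            "depends on external fonts: ask provider for PDF or source")
    return IngestReport(name, "unknown", False,
                        "unrecognised format: return to provider")


if __name__ == "__main__":
    deliveries = [
        ("new.pdf", b"%PDF-1.4\n%sample"),
        ("old.pdf", b"%PDF-1.1\n%sample"),
        ("paper.dvi", b"\xf7\x02\x01\x83\x92\xc0\x1c\x3b"),
    ]
    for name, data in deliveries:
        report = check_for_archiving(name, data)
        print(report.name, report.detected_format, report.accepted, report.note)
```

Identifying formats by their magic bytes rather than by file extension avoids trusting names assigned at delivery time; a real ingest workflow would need much deeper validation than this header check.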
 Hans Becker, Bernd Wegner: ERAM - Digitisation of Classical Mathematical Publications, Proc. ECDL 2000, Lecture Notes in Computer Science 1923, 424-427 (2000).
 Bernd Wegner: ERAM - Digitalisation of Classical Mathematical Publications. Seventh International Conference Crimea 2000: Libraries and Associations in the Transient World: New Technologies and New Forms of Cooperation. Conference Proceedings. Sudak, Autonomous Republic of Crimea, Ukraine, June 3-11, 2000, Volume 1: 268-272.
 Bernd Wegner: ELibM in EMIS - A Model for Distributed Low-Cost Electronic Publishing. Eighth International Conference Crimea 2001: Libraries and Associations in the Transient World: New Technologies and New Forms of Cooperation. Conference Proceedings. Sudak, Autonomous Republic of Crimea, Ukraine, June 9-17, 2001, Volume 1: 317-320.
Prof. Dr. Bernd Wegner
Fakultaet II, Institut fuer Mathematik
TU Berlin, Sekr. MA 8-1
Strasse des 17. Juni 135
D - 10623 Berlin
Copyright © 2002. International Library Information and Analytical Center.
All rights reserved.