An Overview of our Production Processes

Producing functional and durable electronic texts requires a consistent and carefully controlled sequence of transcription and encoding, known as the 'Critical Path'. For the Newton Project, it falls into six interrelated phases:

  1. Primary Research and Administration
  2. Transcription and Encoding of Source Texts
  3. Development of Technical Means of Outputting Encoded Material
  4. Image Capture and Output
  5. Synchronisation of Text and Image
  6. Enhancement of Previously Released Material

As far as Newton's theological papers are concerned, Stage 1 is virtually complete, Stages 2 to 4 are under way, Stage 5 has been tentatively begun and Stage 6 has yet to be tackled.

1) Primary Research and Administration

This stage involves identifying which manuscripts and other materials are to be transcribed, establishing their whereabouts, and obtaining permission and copyright clearance from the holding institutions to produce and release images and transcripts. It is also necessary to establish whether the manuscript in question has been published in any form before, and if so whether this raises other copyright issues. Equally crucial in this first phase is the establishment of a well-defined but not inflexible procedure for transcription and encoding.

The principal outcomes of Phase One in our case were the online catalogue of Newton's theological, alchemical and Mint papers, and the development of our Transcription and Tagging Guidelines.

2) Transcription and Encoding of Source Texts

Since the vast majority of Newton's manuscript legacy is available in the Chadwyck-Healey microfilm edition, it is usually a fairly straightforward (if time-consuming) matter to print out copies for transcribers to work from. Where microfilm copies are inadequate or non-existent, however, it is necessary to order new digitised images or to obtain access to the original manuscript before transcription can begin. The appointed transcriber then sets to work, entering the transcribed text and its XML structure and formatting tags at the same time.

Untagged transcripts of some texts had already been produced before the current critical path was established. In these instances, tagging has to be applied retrospectively by one of our transcribers or editors.

Once transcription and tagging are complete, the tagged text is proofread by a second researcher and its XML syntax is validated by the Technical Manager or the Transcription and Tagging Manager. At this stage the text is deemed fit for initial online release, though in due course the transcription will undergo a final check by a third researcher. Ideally, this check is made against the original manuscript rather than a reproduction; where access to the original is unduly problematic, we have to make do with electronic scans of it.
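The Project does not say which tools are used to check XML syntax. As a minimal sketch using only Python's standard library, the hypothetical helper `check_well_formed` below reports whether a tagged transcript is well-formed XML, that is, whether every tag is properly opened, closed and nested. This is a narrower check than validation against a schema (a DTD or RELAX NG grammar, for instance), which would require a schema-aware tool.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str) -> tuple[bool, str]:
    """Hypothetical helper: return (True, "") if xml_text parses as XML,
    otherwise (False, <parser error message>).

    Checks well-formedness only (matching, properly nested tags);
    it does not validate against any schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return False, str(err)
    return True, ""
```

For example, `check_well_formed("<p>the <hi>true</hi> God</p>")` returns `(True, "")`, while a transcript with an unclosed `<hi>` returns `False` together with the parser's message pointing at the offending line and column.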

3) Development of Technical Means of Outputting Encoded Material

Tagged transcripts are rendered readable on the Newton Project website by means of a program called a stylesheet, which translates the dense electronic markup into a visual display comprehensible to the human eye. Since the Transcription and Tagging Guidelines are themselves being continuously refined and upgraded, the stylesheet also has to be modified to keep pace.
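A production stylesheet of this kind would typically be written in XSLT. As a rough stand-in using only the Python standard library, the sketch below walks a transcription and maps each tag to an HTML element. The tag names (`hi`, `del`) and the `TAG_MAP` table are hypothetical illustrations, not the Project's actual scheme, and attributes are simply dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from transcription tags to HTML tags.
# Unknown tags fall back to a neutral <span>.
TAG_MAP = {"div": "div", "p": "p", "hi": "em", "del": "s"}


def render(elem: ET.Element) -> ET.Element:
    """Recursively copy a transcription element into an HTML element,
    preserving text and tails (text that follows a child element)."""
    out = ET.Element(TAG_MAP.get(elem.tag, "span"))
    out.text = elem.text
    for child in elem:
        rendered = render(child)
        rendered.tail = child.tail
        out.append(rendered)
    return out


def to_html(xml_text: str) -> str:
    """Render a transcription fragment as an HTML string."""
    return ET.tostring(render(ET.fromstring(xml_text)), encoding="unicode")
```

For instance, `to_html("<p>the <hi>true</hi> God</p>")` produces `<p>the <em>true</em> God</p>`. Keeping the tag-to-display rules in one table illustrates why the stylesheet has to keep pace with the guidelines: each new or changed tag means a change to the rendering rules.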

4) Image Capture and Output

Obtaining high-quality digitised images of original manuscripts is a complex, expensive and time-consuming process, sometimes involving lengthy negotiations with holding institutions which, of course, have a duty to ensure that their manuscripts are handled and disseminated with due care and respect. We aim ultimately to present facsimile images as well as transcriptions of all the material in our database, but since we are dealing with a large number of institutions spread across Europe, North America and Asia, this is far less straightforward than it may sound.

5) Synchronisation of Transcribed Text and Image

Once transcriptions are complete and images of the original manuscript have been captured and cleared for release, the next step is to link the two so that users can compare them, either by toggling between views or by viewing both at once on a split screen. A few of our shorter documents can already be seen in such a split-screen view, but we intend in due course to make the display much more sophisticated and dynamic.
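One common way to link text and image is through page-break markers. TEI, which many editing projects follow, records a page break with an empty `<pb/>` element whose `facs` attribute can point to a facsimile image. Assuming a transcript marked up that way (the Newton Project's own conventions may differ), the hypothetical function below splits the text at each page break and pairs each page with its image, the raw material for a split-screen view.

```python
import xml.etree.ElementTree as ET
from typing import Iterator


def _walk(elem: ET.Element) -> Iterator[tuple[str, str]]:
    """Yield ("pb", image) and ("text", chunk) events in document order."""
    if elem.tag == "pb":
        yield ("pb", elem.get("facs", ""))
    elif elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from _walk(child)
        if child.tail:  # text after a child belongs to the current page
            yield ("text", child.tail)


def pages_with_images(xml_text: str) -> list[tuple[str, str]]:
    """Pair each page's text with the image named in its <pb facs="...">.

    Text appearing before the first page break is ignored in this sketch.
    """
    pages: list[list[str]] = []
    for kind, value in _walk(ET.fromstring(xml_text)):
        if kind == "pb":
            pages.append([value, ""])
        elif pages:
            pages[-1][1] += value
    # Collapse runs of whitespace left over from the markup.
    return [(facs, " ".join(text.split())) for facs, text in pages]
```

Because page breaks can fall mid-paragraph, the walker follows element tails as well as element text, so a sentence that runs over a page is split correctly between the two images.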

6) Enhancement of Previously Released Material

This final stage entails applying a new level of markup to the encoded text to capture not just formatting information (what the text looks like) but content information (what it means). Names, places, dates, concepts, obsolete or technical terms and so forth will all be individually tagged so that they can be linked to explanatory apparatus. This also makes it possible to search documents by theme and concept as well as by explicit textual content. For instance, a paper with strong anti-Trinitarian implications may well not include the expression 'anti-Trinitarian' as such, but users searching for texts on that subject will still be able to find it because the concept is recorded in the markup. We will also start to supply our own editorial commentaries on the material we have released, add translations of non-English text, and provide hyperlinks to other relevant online material.
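To illustrate how concept markup changes what a search can find, here is a minimal sketch. The `<term key="...">` element, the concept keys and the two sample sentences are all invented for the example (they are not quotations from Newton or the Project's tagging scheme). A plain text search sees only the visible words, while a concept search reads the markup.

```python
import xml.etree.ElementTree as ET

# Invented sample documents; <term key="..."> is a hypothetical tagging scheme.
DOCS = {
    "doc-a": "<text><p>The Father alone is <term key='anti-trinitarian'>"
             "the one true God</term>.</p></text>",
    "doc-b": "<text><p>On the <term key='prophecy'>little horn</term>"
             " of Daniel.</p></text>",
}


def search_by_text(docs: dict[str, str], phrase: str) -> list[str]:
    """Case-insensitive search over visible text only (markup ignored)."""
    phrase = phrase.lower()
    return [doc_id for doc_id, xml_text in docs.items()
            if phrase in "".join(ET.fromstring(xml_text).itertext()).lower()]


def search_by_concept(docs: dict[str, str], concept: str) -> list[str]:
    """Find documents containing a <term> tagged with the given concept key."""
    return [doc_id for doc_id, xml_text in docs.items()
            if any(t.get("key") == concept
                   for t in ET.fromstring(xml_text).iter("term"))]
```

Here `search_by_text(DOCS, "anti-trinitarian")` finds nothing, because the phrase never appears in the visible text, whereas `search_by_concept(DOCS, "anti-trinitarian")` returns `["doc-a"]`.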

At this point, electronic markup becomes a highly specialised academic discipline requiring the taggers to have a thorough knowledge and understanding of the primary material they are dealing with and of the secondary material that can be used to elucidate it.

The end result will be to provide users not just with the Newton Project's own realisation of, and commentary on, Newtonian texts, but also with a portal to the wealth of other relevant online and printed material.

© 2014 The Newton Project

Professor Rob Iliffe
Director, AHRC Newton Papers Project

Scott Mandelbrote
Fellow & Perne librarian, Peterhouse, Cambridge

Sponsored by:

  • University of Sussex
  • Arts and Humanities Research Council
  • JISC
  • CORDIS