In August 2017 we announced the installation of Goobi workflow management software to aid our digitisation processes. Since then, Goobi has helped us manage several complex, very large-scale digitisation projects. In this blog we look at our progress with Goobi so far.
Goobi is a unique platform based on open-source technology. It manages the entire digitisation process, from the movement and transportation of materials through to image capture, OCR, quality assurance and linking metadata to images, before exporting all the data ready to upload into any content management system.
Goobi represents, quite literally, a revolution in the way heritage digitisation is delivered, and we are proud to be the only digitisation company in the UK that uses the software. Goobi has ensured that all our on-site and off-site digitisation is managed effectively, whilst allowing our customers to have a live view of their project at all times. This has proved to be a very effective methodology for both us and our customers.
Since we started using Goobi it has helped us control and manage our digitisation process whilst eliminating the room for error in any project. It also allows our customers to view progress at an item-by-item level, live online. Goobi uses barcoded identifiers to track each object through the process, from the initial identification through to the movement of the object to the digitisation area, scanning, image verification, 100% image quality assurance, file conversion and feeding the content into software for OCR. The entire process is managed at an object level, and the final step exports the digitised and extracted data into any content management system. The software is web-based, runs on a secure network and provides instant reporting and a live dashboard showing the status of the entire project.
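To make the idea of barcode-based tracking more concrete, here is a minimal sketch in Python. It is not Goobi's own code or API: the stage names, the `TrackedObject` class and the `dashboard` function are all hypothetical, chosen only to illustrate how an object can be followed through an ordered set of steps and summarised in a status view.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical stage names, loosely based on the steps described above.
STAGES = [
    "identified", "moved_to_scanning", "scanned", "image_verified",
    "quality_assured", "converted", "ocr", "exported",
]


@dataclass
class TrackedObject:
    barcode: str
    history: list = field(default_factory=list)  # (stage, timestamp) pairs

    @property
    def stage(self):
        """Most recent stage, or None if the object has not started yet."""
        return self.history[-1][0] if self.history else None

    def advance(self, stage):
        """Record a barcode scan at `stage`, refusing skipped or repeated stages."""
        if len(self.history) == len(STAGES):
            raise ValueError(f"{self.barcode}: already exported")
        expected = STAGES[len(self.history)]
        if stage != expected:
            raise ValueError(f"{self.barcode}: expected {expected!r}, got {stage!r}")
        self.history.append((stage, datetime.now(timezone.utc)))


def dashboard(objects):
    """Count objects per current stage, as a simple status report might."""
    return Counter(obj.stage or "not_started" for obj in objects)


# Example: two objects at different points in the process.
first = TrackedObject("GEN-0001")
second = TrackedObject("GEN-0002")
for name in STAGES[:3]:
    first.advance(name)
second.advance("identified")
print(dashboard([first, second]))
```

Because every scan is appended to a per-object history, the same records can drive both the item-level view and the project-wide totals.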
Metadata Made Easy
The core strength of Goobi is the way that it works with, and is driven by, metadata.
In Goobi, metadata can be identified and assigned using several methodologies. We define the best approach for each material type according to the customer’s requirements. The first level of metadata is at inventory level, where objects and files are assigned metadata at the beginning of the process according to the customer’s object identifiers. After digitisation we use OCR technology to capture all printed text and make it searchable. The OCR software can capture large amounts of metadata, from the document structure to the text content itself.
The captured metadata (both structural and content-based) is stored separately from the digital images to enable quicker searching in the final database configuration. We then store this metadata in whatever format the customer requires, although we recommend standards-based metadata schemas such as METS, MARC, EAD, ISAD(G) or Dublin Core for long-term portability and digital preservation. Any other schema of the customer’s choice can also be used.
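As an illustration of metadata held separately from the images, the sketch below builds a small Dublin Core record that refers to image files by name rather than embedding them. It uses only Python's standard library. The `record` wrapper element, the function name and the use of `dc:relation` to point at image files are our assumptions for this example, not a description of how Goobi exports data.

```python
import xml.etree.ElementTree as ET

# The Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(identifier, title, date, image_files):
    """Build a Dublin Core record that points at, but does not embed, the images."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in (("identifier", identifier), ("title", title), ("date", date)):
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    for path in image_files:
        # Assumption: each image is referenced through dc:relation.
        ET.SubElement(root, f"{{{DC_NS}}}relation").text = path
    return ET.tostring(root, encoding="unicode")


print(dublin_core_record(
    "GEN-0001", "Parish register", "1850",
    ["GEN-0001_0001.tif", "GEN-0001_0002.tif"],
))
```

Keeping the record in its own file means the metadata can be indexed and searched without opening the much larger image files.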
One of the key functions within the Goobi workflow is Layout Wizzard, which allows us to process many images with minimal intervention. Layout Wizzard contains three elements that apply settings to enhance the presentation of the image. The first stage is the deskewing of all images. In the rare instance where this hasn’t been done automatically, we can make a manual adjustment. The second stage is cropping: we set the crop area on the first image and apply this setting to all other images in the file set. The final step is Cut Book Spine, where we apply a digital cut to the spine side of the page/image. Again, we apply the setting to the first image and then apply it to all images in the file set.
The end result is a file set that maintains a consistent look throughout with minimal input.
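The "set once, apply to all" idea can be sketched in a few lines of Python. This is not how Layout Wizzard is implemented. The `LayoutSettings` class and both functions are hypothetical, deskewing is left out, and we assume that pages alternate between spine-on-the-left and spine-on-the-right, which will not hold for every file set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutSettings:
    crop: tuple      # (left, top, right, bottom) in pixels, chosen on the first image
    spine_cut: int   # pixels to trim from the spine side of each page


def crop_box(settings, spine_side):
    """Return the final (left, top, right, bottom) box for one image."""
    left, top, right, bottom = settings.crop
    if spine_side == "left":
        left += settings.spine_cut
    elif spine_side == "right":
        right -= settings.spine_cut
    else:
        raise ValueError(f"unknown spine side: {spine_side!r}")
    return (left, top, right, bottom)


def apply_to_file_set(settings, file_names, first_spine_side="left"):
    """Apply one set of settings to every image, alternating the spine side."""
    if first_spine_side == "left":
        sides = ("left", "right")
    else:
        sides = ("right", "left")
    return {name: crop_box(settings, sides[i % 2]) for i, name in enumerate(file_names)}


settings = LayoutSettings(crop=(100, 50, 2100, 3050), spine_cut=40)
for name, box in apply_to_file_set(settings, ["0001.tif", "0002.tif", "0003.tif"]).items():
    print(name, box)
```

The resulting boxes could then be passed to whichever imaging library performs the actual crop.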
Digitisation Processes Powered by Goobi
To us, digitisation is the process of converting an analogue object into a digital format with associated metadata to make it discoverable in a database system. As always, we recommend that digital assets are stored in digital preservation formats with separate metadata (again in standard formats) to enable data portability between systems in the long term.
The digitisation process is an end-to-end activity that runs from the initial inventory through to the upload of the content and metadata to a database system and the return of the digitised object to its allocated storage location. We advocate an approach which monitors every stage in this process to ensure that all materials are logged, monitored and verified at each stage. Typically, a digitisation process will consist of the following stages. (N.B. Depending on the objects to be digitised, not all stages are required.)
- Inventory generation (logging all objects with a unique identifier at the beginning of the process; this could involve barcoding objects or object containers, assigning unique identifiers and logging storage locations)
- Movement (moving objects or groups of objects from storage to the Genus scanning location)
- Verification (checking all objects are present)
- Scanning (creating digital images from the objects)
- Verification (system-based verification that all content has been captured to the correct specification)
- QA (100% checking of all content by operators for clarity, focus, etc.)
- Layout Wizzard (image analysis and post-processing)
- Automatic data extraction (OCR-based content extraction, analysis and verification, including structural information)
- Metadata enhancement (adding further metadata according to the specification as needed)
- Data verification (verification of metadata compliance to standards and file format integrity)
- Upload (uploading of metadata and digital assets to line of business systems)
- Movement (returning the object to store)
- Verification (verifying all digitised objects are back in the store and are accounted for)
All the above processes are managed centrally in Goobi, the Genus workflow management software, which can report on each of them and can be accessed by customers at all times.
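Two ideas from the stage list lend themselves to a short sketch: leaving out stages a project does not need, and verifying that every object in the inventory is accounted for. The Python below is illustrative only. The stage identifiers and the `build_workflow` and `reconcile` functions are hypothetical names of our own, not part of Goobi.

```python
# Hypothetical identifiers for the stages listed above, in order.
ALL_STAGES = [
    "inventory", "movement_out", "verify_present", "scanning",
    "verify_capture", "qa", "layout_wizzard", "data_extraction",
    "metadata_enhancement", "data_verification", "upload",
    "movement_back", "verify_returned",
]


def build_workflow(skip=()):
    """Return the stage list for one project, leaving out stages it does not need."""
    unknown = set(skip) - set(ALL_STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    return [stage for stage in ALL_STAGES if stage not in skip]


def reconcile(inventory, seen):
    """Compare inventory barcodes with those scanned at a verification step."""
    inventory, seen = set(inventory), set(seen)
    return {
        "missing": sorted(inventory - seen),
        "unexpected": sorted(seen - inventory),
    }


# Example: a project with no OCR requirement, and one object not yet returned.
workflow = build_workflow(skip={"data_extraction"})
print(workflow)
print(reconcile(["A1", "A2", "A3"], ["A1", "A3"]))
```

The same `reconcile` check applies at each verification stage: after movement to the scanning location and again once objects are back in store.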
To summarise, our experience with Goobi so far has been a positive one: our workflow has improved and digitisation processes as a whole have been made easier. If you are taking on a large-scale digitisation project, we would highly recommend using Goobi. Why not try Goobi-to-go for a smaller project to see how it can improve workflow at your organisation?