Goobi is a post-processing and workflow application for digitisation projects. It is completely configurable with the ability to add or remove modules according to the workflow requirements of the organisation. Goobi is employed globally by many institutions to provide a robust and seamless control system for mass digitisation projects. It is a web-based application that supports external connections to facilitate remote working.
In most cases, digitisation projects must process an enormous volume of data every day. High-resolution digitised collections can easily occupy several gigabytes of storage space and contain hundreds of individual files. In addition, users often work with several derivatives of the original master images. Given the enormous volume of data, it is important to maintain a very structured approach to avoid losing the overview of the entire project and the common thread that unites the data.
At Genus, we use Goobi to control all aspects of the post-capture workflow, from cropping, de-skewing and pagination through image derivative creation to throughput control and project management. It lets us give our clients access to live data about their project at any time, and allows remote workers to perform tasks via a VPN while the images remain safely within our network environment. All of this ensures we output consistent jobs across all types of original content, from different sources and for all clients. Other organisations use Goobi for different sets of tasks, from controlling the capture element of digitisation through to enriching descriptive and structural metadata.
Initially, a project is specified, and a workflow is built to deliver the requirement using our standard building blocks of processes and tasks within Goobi. Our Project Managers can build the workflows by modifying, adding or removing processes as necessary. Every Child Record (item, book, photograph, document etc.) within a collection is represented by a Process in Goobi. Goobi has an intuitive Graphical User Interface backed up by comprehensive scripts, meaning complex tasks can be managed in a pre-arranged sequence. Once the workflow has been built, it ensures that all images ingested into the Goobi environment follow the same path, so that the output meets our defined quality standards. We can build as many different Projects as required, all with different workflows if necessary. We can run these Projects concurrently, with different teams of Users assigned based on Task, Project or a combination of both.
Goobi enables us to automate repetitive manual tasks for greater consistency and accuracy. As a result, each item can pass through the entire workflow much more rapidly than is the case using manual workflow techniques.
Once the image files have been uploaded into Goobi via the Batch Upload tool, the server automatically verifies the TIFF headers to ensure the files are readable and not corrupt in any way. Once this is complete, they are presented to the user for the first time via the Task Manager window.
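As an illustration of the simplest part of such a check, the sketch below tests whether a file starts with a well-formed TIFF header. It is not Goobi's own validation code, which is not described here; the function name has_valid_tiff_header is hypothetical, and a real check would go on to read the image directories as well.

```python
import struct


def has_valid_tiff_header(data: bytes) -> bool:
    """Return True if the leading bytes look like a TIFF or BigTIFF header.

    Hypothetical helper for illustration only; not part of Goobi.
    """
    if len(data) < 4:
        return False
    byte_order = data[:2]
    if byte_order == b"II":      # little-endian ("Intel") byte order
        fmt = "<H"
    elif byte_order == b"MM":    # big-endian ("Motorola") byte order
        fmt = ">H"
    else:
        return False
    # The next two bytes hold the magic number: 42 for TIFF, 43 for BigTIFF.
    (magic,) = struct.unpack(fmt, data[2:4])
    return magic in (42, 43)
```

In practice the function would be given the first few bytes read from each uploaded file, and any file that fails would be held back from the workflow.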
Our Project Managers typically have access to all workflow steps in the projects they are responsible for, but Users are set up with permissions and task assignments that are precisely tailored to each person's skills and project membership. As a result, users are not faced with tasks for which they are not responsible, either because they do not have the necessary skills or because they have not been assigned to the project in question. All users focus exclusively on their own work without having to concern themselves with the tasks performed by other project members before or after their own contribution, or with the rest of the workflow for an item.
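The filtering described above can be pictured with a small sketch. It is a simplified model written for this article, not Goobi's data model or API: the Task and User classes, their fields and the visible_tasks function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    process_id: str   # the item (Process) the task belongs to
    project: str
    step: str         # workflow step, e.g. "Image QA"


@dataclass(frozen=True)
class User:
    name: str
    projects: frozenset   # projects the user is a member of
    steps: frozenset      # workflow steps the user is qualified for


def visible_tasks(user, open_tasks):
    """Return only the open tasks that match the user's projects and skills."""
    return [
        task for task in open_tasks
        if task.project in user.projects and task.step in user.steps
    ]
```

A user assigned to one project and qualified only for Image QA would see just the Image QA tasks of that project, whatever else is open on the server.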
The User selects a task from the list presented to them. The first task in our normal workflows is Image QA. This plugin allows the operator to do a quick check of all the images for orientation and general Quality Assurance.
Once accepted, the process goes server side for the fully automated task Auto-Image Analysis. Goobi automatically inspects every image individually and calculates how much de-skew is necessary, whether a book spine is present, how much border to leave after cropping and so on. All of these actions are fully user definable.
Once the images have been analysed, the process appears once again in the Task Manager at the LayoutWizzard stage. In this task the User is presented with a preview of every image with its automatically calculated de-skew and cropping parameters applied. If the server has not analysed the image correctly, the User can adjust these parameters.
In the following example, a lightly damaged page has been analysed: it has been rotated by 1.40°, cropped correctly and the book spine identified. All the user needs to do is accept the image skew and crop parameters as being correct.
The following image illustrates the accuracy with which Goobi analyses more seriously damaged content, again without the user having to make any manual adjustments.
All the task settings are of course configurable at a project level. For example, the threshold (degree of accuracy) for the auto-crop and the amount of border remaining visible on an image are both completely definable. Usually the accuracy of Goobi in detecting the correct parameters for the cropping of images is so high that the user simply needs to accept the crop action.
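To make the border setting concrete, here is a minimal sketch of how a configurable border might be added around a detected page area. It is an assumption-laden illustration, not the calculation Goobi or LayoutWizzard performs: the function crop_box_with_border and its parameters are hypothetical, and coordinates are taken to be pixels with the origin at the top left.

```python
def crop_box_with_border(page_box, border, image_size):
    """Grow a detected page box by a border, clamped to the image edges.

    page_box   -- (left, top, right, bottom) of the detected page, in pixels
    border     -- number of pixels of background to keep visible on each side
    image_size -- (width, height) of the full image, in pixels
    """
    left, top, right, bottom = page_box
    width, height = image_size
    return (
        max(0, left - border),
        max(0, top - border),
        min(width, right + border),
        min(height, bottom + border),
    )
```

Clamping matters when a page sits close to the edge of the capture: the requested border is applied where there is room and reduced where there is not.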
Again, we are not restricted by the physical location of Users. Goobi has enabled us to support home working by our staff, as all these tasks can be performed from anywhere in the world using an internet browser.
Once an item has passed through LayoutWizzard, the server automatically applies the chosen parameters and then presents the images in the Task Manager at the Pagination stage.
Pagination is the step where we can apply differing page numbering schemas to the image files if required. It is also an opportunity to carry out a final Quality Check on the images and verify the server crop and de-skew parameters have been applied as expected. We can view thumbnails, single pages or a zoomable view to check focus and detail.
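One common numbering schema, lower-case Roman numerals for the front matter followed by Arabic numerals for the main text, can be sketched as follows. This is an illustration only and does not use Goobi's pagination editor or its API; the functions to_roman and paginate are hypothetical.

```python
def to_roman(number):
    """Convert a positive integer to a lower-case Roman numeral."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in numerals:
        while number >= value:
            parts.append(symbol)
            number -= value
    return "".join(parts)


def paginate(image_count, front_matter_pages):
    """Label images: Roman numerals for front matter, then Arabic from 1."""
    labels = []
    for index in range(image_count):
        if index < front_matter_pages:
            labels.append(to_roman(index + 1))
        else:
            labels.append(str(index - front_matter_pages + 1))
    return labels
```

For a six-image item with four pages of front matter, paginate(6, 4) gives the labels i, ii, iii, iv, 1 and 2.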
Once this process is accepted, the server again takes over the workflow and produces the image derivatives as required. Our standard outputs are Uncropped Master TIFF, Cropped TIFF, Compressed JPEG, and a multipage PDF/A. These formats are completely configurable, and often not every format is required.
At any time during the workflow, we can obtain a snapshot of what stage the Processes are at.
The image derivatives are finally delivered out of the server workflow. This can be to a shared directory on our Network Attached Storage devices or directly to an FTP server. Direct delivery to a client FTP server is particularly useful because it ensures regular, prompt delivery of files to the customer. It avoids periods of inactivity followed by a bulk delivery of files which may need ingesting or client-side QC.
The management tab gives us access to project statistics, so we can measure job progress and decide whether we need to assign more staff to a bottleneck in production.
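The idea behind spotting a bottleneck can be shown with a short sketch that counts how many items are waiting at each workflow step. It is a simplified illustration and not how Goobi's statistics are produced; the functions stage_snapshot and busiest_stage, and the dictionary of process IDs to step names, are hypothetical.

```python
from collections import Counter


def stage_snapshot(current_steps):
    """Count how many processes are currently waiting at each workflow step.

    current_steps maps a process ID to the name of its current step.
    """
    return Counter(current_steps.values())


def busiest_stage(current_steps):
    """Return the step with the most waiting processes, or None if empty."""
    snapshot = stage_snapshot(current_steps)
    if not snapshot:
        return None
    return snapshot.most_common(1)[0][0]
```

If most open items are sitting at one step, that step is the likely candidate for extra staff.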
Multiple projects can be compared to facilitate resource management and to give a Management level overview of ongoing project status.
We can either choose to view headline figures or drill into specific tasks within projects to gain a granular understanding of our performance. Output is available as charts, tables and CSV files, and of course we can provide open access to this facility for our clients. Ultimately this facility also enables us to be transparent when it comes to billing, as the client has access to the same data as we do.
Goobi has proved to be an invaluable tool in the Genus studios for the management of Cultural Heritage Digitisation projects. It provides Imaging Technicians with a clearly defined amount of work to process, gives Project Managers complete visibility across different projects containing many thousands of images, and provides our clients with consistent, high-quality images and transparency for progress and billing purposes.