Searching for visual content online is a bit like being in a tiny fishing boat atop a vast ocean. Sometimes the net fills with small fry which have to be thrown back; and sometimes it’s like casting with a rod and line into the deep with nothing but hope in your heart.
It would be easy to say that keywording is the answer and leave it at that. But solutions to content retrieval are multi-layered, and although keywording is important, it’s just one part of a package of factors you need to address if you want users to find your content.
In this blog I’ll discuss the criteria for effective retrieval, and how to go about planning your keywording. At a time when budgets are squeezed to the limit, collections need more than ever to be informed by the digital principle of think first, do once and use many times. This applies equally to keywording and retrieval.
The first step, as always, is to understand what your users want. Who are they and what are they searching for? Become a user yourself and note down what works and what doesn’t, recording both the pleasures and the irritations. Look at other search sites for comparison. Investigating how search systems work will then help you analyse what’s going on in your own search system, and indicate your best path forward.
The difference between text-based and image-based searches
Keywording generally means tagging visual content with words that help users to retrieve it. Describing content, as we do in a caption or content description, is a text-based approach which involves not only descriptions of what you can see, but often also context information. For example, a caption might say ‘Rock Musician Rocky Rockville onstage in London’. If this caption content is used to retrieve images, a general search for London will find this image, which may be a close-up of the singer crooning over a mike. This is probably not the result you want if you’re illustrating a guidebook with sights of London: here London was used only as context in the description. This speaks to the perennial problem of distinguishing between images in and images of a location. Similarly, if you run a search for rock, meaning the geographical term, you will retrieve the rock star among the stones.
When you add specific visual retrieval keywords to an image, you’ll want to use terms searchers will use. You may add Rock Star, Singer, Singing, Rock music, Mike, Microphone, Energetic, Music Industry, and so on. Whether you add the location will depend on what other mechanisms you have for finding images, and this is where some forethought about how to structure your data becomes a critical factor.
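To make the difference concrete, here is a minimal Python sketch of the two approaches. The record layout, the field names (caption, keywords), the second image and the function names are all assumptions made for illustration, not a description of any particular system.

```python
import re

# Hypothetical image records; the field names are illustrative only.
IMAGES = [
    {
        "id": "img-001",
        "caption": "Rock Musician Rocky Rockville onstage in London",
        "keywords": ["rock star", "singer", "singing", "rock music", "microphone"],
    },
    {
        "id": "img-002",
        "caption": "Tower Bridge at dusk",
        "keywords": ["london", "bridge", "landmark"],
    },
]


def words(text):
    """Split free text into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_captions(term, images):
    """Text-based search: match the term against any word in the caption."""
    return [img["id"] for img in images if term.lower() in words(img["caption"])]


def search_keywords(term, images):
    """Keyword search: match the term only against assigned keywords."""
    return [img["id"] for img in images if term.lower() in img["keywords"]]


caption_hits = search_captions("london", IMAGES)
keyword_hits = search_keywords("london", IMAGES)
print(caption_hits)  # the singer, who is merely in London
print(keyword_hits)  # the landmark, which is an image of London
```

In this toy data the caption search returns the close-up of the singer, while the keyword search returns the image that actually shows London.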
Image search experiences are littered with examples like the above: irrelevant words getting in the way of successful search results. Time is money for users, and they don’t want to have to wade through thousands of irrelevant images. Before you rush off to sign up volunteers or keywording companies to do your keywording, there are a few crucial areas you need to dig into.
Data Structure and Search methods
The first key question is this. When you enter a word in the quick search box, what fields are being searched? Is your system searching on captions or keywords, or both? If you have a mixture of keyworded and non-keyworded content you may want a search on both, but you might choose to prioritise the keyworded content so those results appear at the top of the search. With a search over keyword and caption fields, you cast a wider net, but you will retrieve more irrelevant results. For targeted results, assigned keywords are best, but if most of your collection isn’t keyworded, as is the case for most cultural heritage organisations, you will have to act strategically.
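One way to picture a search over both fields that still prioritises keyworded content is a simple weighted score. The sketch below is an assumption-laden illustration: the weights, the field names and the sample records are invented, and real search software will have its own ranking options.

```python
import re

# Hypothetical records: one keyworded for London, two that only mention it in the caption.
COLLECTION = [
    {"id": "a-singer", "caption": "Rock musician onstage in London",
     "keywords": ["singer", "rock music"]},
    {"id": "b-bridge", "caption": "Tower Bridge at dusk",
     "keywords": ["london", "bridge"]},
    {"id": "c-uncatalogued", "caption": "Street market, London, 1935",
     "keywords": []},
]


def ranked_search(term, images, keyword_weight=2, caption_weight=1):
    """Search captions and keywords, listing keyword matches first."""
    term = term.lower()
    scored = []
    for img in images:
        score = 0
        if term in [k.lower() for k in img.get("keywords", [])]:
            score += keyword_weight  # assigned keyword: strong signal
        if term in re.findall(r"[a-z0-9]+", img.get("caption", "").lower()):
            score += caption_weight  # caption mention: weaker signal
        if score:
            scored.append((score, img["id"]))
    # Highest score first; ties are ordered by id so results are stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [img_id for _, img_id in scored]


results = ranked_search("London", COLLECTION)
print(results)
```

The wider net still catches the caption-only records, but the keyworded image appears at the top of the results.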
You need to know whether terms like geographical location and proper names are separately recorded and separately searchable in your database (in an advanced search, for example). Does your software support authority or pick lists that allow the user to avoid the pitfall of spelling mistakes? You may not need to add location terms in the keyword field in your own system, but you need to know how that data will be exported if your images pass to agents or other organisations.
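As a rough illustration of what an authority or pick list does for spelling, the sketch below checks a typed location against a controlled list and offers near matches. The list contents, the function name and the similarity cutoff are assumptions; the fuzzy matching uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Hypothetical authority list of approved location names.
LOCATION_AUTHORITY = ["London", "Edinburgh", "Cardiff", "Belfast"]


def resolve_location(user_input, authority=LOCATION_AUTHORITY):
    """Return (approved_name, suggestions) for a typed location.

    approved_name is None when the input is not on the list; suggestions
    then holds close spellings the user can pick from.
    """
    lookup = {name.lower(): name for name in authority}
    key = user_input.strip().lower()
    if key in lookup:
        return lookup[key], []
    close = difflib.get_close_matches(key, list(lookup), n=3, cutoff=0.8)
    return None, [lookup[name] for name in close]


exact = resolve_location("london")
misspelt = resolve_location("Edinburg")
print(exact)
print(misspelt)
```

A pick list in a real interface would simply present the approved names, so the user never types a misspelling in the first place; the fuzzy suggestion here stands in for that behaviour.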
Finally, have you sorted your images into meaningful categories? These can give a very quick route to users who want to find their way around your search system, and narrow down your search results very effectively.
The example of the rock star above illustrates what can happen if caption words are indexed for search. But sometimes keywords are added over-liberally so that even the minutest details in the image are recorded as keywords. This can drive the seasoned user to distraction.
For example, if I’m searching for a stock image of a saucepan, I won’t be pleased to see hundreds of images where saucepans are minutely and incidentally shown in a corner of the image. Unless, that is, this is a historical archive where researchers may depend on finding tiny details for a picture of life in other times. The strategy depends on the content and the needs of the users.
Some difficulties in content retrieval are down to the glorious flexibility and ambiguity of language itself. We rely on words for much of our retrieval, and some words (homographs) can have several meanings. In the first example above, which kind of rock am I searching for, the music or the geographical feature? Is my search for trunk aimed at the storage trunk, tree trunk, elephant trunk or car trunk?
Added to this, an idea or concept can be expressed in different ways. If I’m searching for an image with people wearing wellies, I may have to search for Wellie, Wellies, Wellington (but I don’t want the man), Wellingtons, Wellington Boot, Boot (please don’t give me car boots), Boots and so on. I may run out of steam if the system doesn’t help me on my way.
There are ways to handle these issues, using combinations of predictive text, keyword linking (more with this keyword), thesaurus (which links words together in a hierarchy) and visual search techniques. How you proceed will depend on your content, your priorities, your software capability and your resources, and again, it’s always worth seeing how other similar collections do it.
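One possible way to handle homographs, sketched below, is to store keywords with a qualifier and let the search box offer the available senses before running the search. The qualified terms, image ids and function names are hypothetical and only illustrate the idea.

```python
# Hypothetical sense lists for ambiguous words.
SENSES = {
    "rock": ["rock (music)", "rock (geology)"],
    "trunk": ["trunk (luggage)", "trunk (tree)", "trunk (elephant)", "trunk (car)"],
}

# Hypothetical images tagged with qualified keywords.
TAGGED = {
    "img-101": {"rock (music)", "singer"},
    "img-102": {"rock (geology)", "coastline"},
    "img-103": {"trunk (tree)", "forest"},
}


def senses_for(term):
    """Offer the known senses of a word, or the word itself if unambiguous."""
    term = term.lower()
    return SENSES.get(term, [term])


def search_sense(qualified_term):
    """Return the ids of images tagged with one specific sense."""
    return sorted(i for i, tags in TAGGED.items() if qualified_term in tags)


choices = senses_for("Rock")
geology_hits = search_sense("rock (geology)")
print(choices)
print(geology_hits)
```

The user who types rock is asked which rock they mean, and the rock star stays out of the geology results.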
Another issue to resolve is how to record ideas or concepts that people may search for. These may not be contained in the textual description of the content, but can be interpreted from the content by a keyworder who understands the needs of the users. So an image of someone wearing a facemask on the other side of a pane of glass can denote sadness or isolation or loneliness, or indeed ill health, but this may not appear in the factual description which might be ‘Resident in nursing home meeting her daughter for the first time in 6 weeks.’ Interpretation is a large part of the way we use images. They symbolise thoughts, feelings and situations. While relevant conceptual keywords are routinely added to content in the stock image industry, other collections in the cultural sector may have relevant content which holds only the factual description of the content, without any visual interpretation. These collections may want to adopt conceptual keywording for parts of their collection, to make more of their content available for general searching.
Using a Thesaurus
A hierarchical thesaurus for visual content follows the principle of doing the thinking upfront, and brings significant productivity gains to both keywording and image search. Essentially, the idea is to allow both keyworder and user to enter any word they can think of to describe an image. The interconnection of words already there in the structured hierarchy ensures that all the other relevant words are linked to it in the background. If you take the example of the Wellington Boot above, the keyworder may tag an image with Wellie, but the database user can enter any of the terms above and still retrieve the image. It also means that the content will be found under a more general search, like a search for Boot, or Footwear, because of the vertical hierarchical connections in the thesaurus.
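The mechanics can be sketched in a few lines of Python. This is a toy under stated assumptions: the two tables, the image ids and the function names are invented, and a real visual thesaurus holds tens of thousands of terms with richer relationships.

```python
# Hypothetical synonym table: every variant points to one preferred term.
PREFERRED = {
    "wellie": "wellington boot",
    "wellies": "wellington boot",
    "wellingtons": "wellington boot",
    "wellington boot": "wellington boot",
    "boots": "boot",
}

# Hypothetical hierarchy: each preferred term points to its broader term.
BROADER = {"wellington boot": "boot", "boot": "footwear"}


def preferred(word):
    """Map any variant to its preferred term (unknown words map to themselves)."""
    word = word.lower()
    return PREFERRED.get(word, word)


def expand_tag(word):
    """Return the preferred term plus every broader term above it."""
    term = preferred(word)
    chain = [term]
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def index_image(index, image_id, tag):
    """File the image under the tag and all its broader terms."""
    for term in expand_tag(tag):
        index.setdefault(term, set()).add(image_id)


def search(index, word):
    """Look up any variant the user types."""
    return sorted(index.get(preferred(word), set()))


index = {}
index_image(index, "img-201", "Wellie")  # the keyworder enters one word
index_image(index, "img-202", "Boots")

print(search(index, "Wellingtons"))  # a different variant still finds img-201
print(search(index, "Footwear"))     # a general search finds both images
print(search(index, "Wellington"))   # the man is not linked, so nothing comes back
```

The keyworder enters a single word, and the links already in the thesaurus do the rest, both sideways across variants and upwards through the hierarchy.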
Keywording companies all use thesauruses to support their productivity, so they can provide a full list of relevant words to tag your images. But the best solution is to have thesaurus functionality in your own database, which means you control how the words are stacked and defined in your own specialisation. It’s best to buy in a thesaurus and modify it, as it’s a long and arduous task to set up a hierarchy of the fifty thousand words or so you need in a general visual thesaurus. But don’t be fooled into thinking that the thesaurus offered by your academic departments will do the job in the same way. A subject thesaurus may be useful as part of a general thesaurus, but it doesn’t do the job of classifying terms for general visual content.
What about AI?
AI is hugely useful in content retrieval, but, as always, its use comes with a health warning. The fact that you have to train AI to recognise content meant that the first decent AI search results in the image stock industry were returned, some years back, from items like beaches and palm trees (if only!) which were commonly used at the time. These days, AI systems will probably start to recognise people’s faces under their masks. Computers can recognise objects and shapes and faces once they are trained with the data, and they need a lot of data to learn. If you run a specialised collection, you may need to train the AI systems with your specific types of image to get good results from AI tagging. How does the system know all the different shapes and categories of amulet, for example, unless it has a lot of specialist data to learn from?
Objects are relatively well defined and easy for AI to learn, but concepts and emotions are more complex. References to concepts like war, love and loneliness will differ both within and across cultures. What looks like a Garden shed in one culture can look like Housing in another. I’ve seen anti-poaching images retrieved from a search on War, for example, because the AI was seeing camouflage clothing.
So the best way to think about AI at the moment is to see it as an aid in those areas where it can learn quickly and for those collections that have suitable content. Human QA should of course be part of the plan.
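A simple way to build human QA into the plan is to accept only confident object tags automatically and send everything else to a reviewer. The sketch below assumes a hypothetical AI tagger that returns (tag, confidence) pairs; the threshold and the list of concept terms are illustrative choices, not recommendations.

```python
# Concept terms that always go to a human, however confident the AI is (illustrative list).
CONCEPT_TERMS = frozenset({"war", "love", "loneliness"})


def triage_ai_tags(suggestions, auto_accept=0.9, concept_terms=CONCEPT_TERMS):
    """Split AI tag suggestions into auto-accepted tags and tags for human review.

    suggestions is a list of (tag, confidence) pairs, as a hypothetical
    AI tagging service might return them.
    """
    accepted, review = [], []
    for tag, confidence in suggestions:
        if tag in concept_terms or confidence < auto_accept:
            review.append(tag)
        else:
            accepted.append(tag)
    return accepted, review


# Invented suggestions for an anti-poaching image.
suggestions = [("camouflage clothing", 0.97), ("war", 0.95), ("ranger", 0.62)]
accepted, review = triage_ai_tags(suggestions)
print(accepted)
print(review)
```

Here the object tag goes straight through, while the concept War and the uncertain tag wait for a person who knows the collection.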
Bringing it all together
In any discussion of content retrieval we should review a number of techniques to approach the result we want for our users, depending on the resources we have to hand. The stock image industry has always been hot on focused search results, as the business depends on relevant hits, not on quantity of returns. The cultural sector, with a broader remit, can learn from the image licensing business how to approach people who don’t know precisely what they want and are not specialists in the subject matter.
This brings me to the Google image search, which is commonly used by picture researchers to find content online. It’s now possible to filter search results by licence (commercial or CC) using the new Google Licensable Badge, which will improve results for both users and collections. But the algorithms used by Google can never replace the data provided by intimate knowledge of the collections themselves. If you want users of your collection to retrieve focused search results from your collection, you will have to do the work.
In summary, first know your collection and the way your users search, then analyse your data structure and current search software. You are then ready to investigate what search features are available to you now, and what investment you may need to make in new or updated software, and the human resources to achieve those ends.