In yesterday’s post, I introduced the idea of authority control and showed how it is similar to the idea of disambiguation in Wikipedia. Today, I want to go into authority control in more depth.
The purpose of authority control is to have one term – and one term only – to represent a concept, name, or place. (This post will focus on concepts, but you’ll be able to extrapolate this idea to names and places.) This official term is the authorized heading. The Library of Congress publishes lists of these authorized headings, most famously the Library of Congress Subject Headings (LCSH).
Why is this important?
If you want to get the best search results, then you’ll want to search with the authorized subject headings.
So, for instance, if you want to search for works about feminist philosophy, you should know that the official term is not “feminist philosophy”, nor is it “women and philosophy”. Instead, the official subject heading is “feminist theory”.
As an aside, the Library of Congress has been accused on more than a few occasions of using racist, sexist, homophobic, demeaning, or just downright stupid subject headings. If you ever want to discuss it, contact me. We’ll go grab a cup of coffee, and I’ll tell you about Sandynistas.
For now, we will leave aside the issue of whether or not you like the subject heading. The point is that there is one and only one subject heading for a given concept, and if you know it, your catalog searches will be awesome, because you will be able to find everything in the catalog that has that subject heading.
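If you like to think in code, here is a minimal Python sketch of that idea, built on a few loud assumptions. The "authority file" is just a dictionary. The variant terms are the ones from the feminist theory example above; I’m not claiming LCSH lists exactly these as cross-references. The catalog records ("Book A" and so on) are made up. Whichever variant you type, the lookup steers you to the one authorized heading. Then every record carrying that heading comes back.

```python
from typing import Optional

# Toy authority file: every variant term points to ONE authorized heading.
# The variants come from the example in this post; real LCSH authority
# records are much richer than a dictionary (assumption for illustration).
AUTHORITY_FILE = {
    "feminist theory": "Feminist theory",       # the authorized heading itself
    "feminist philosophy": "Feminist theory",   # variant -> authorized heading
    "women and philosophy": "Feminist theory",  # variant -> authorized heading
}

# Hypothetical catalog records; titles and subject lists are invented.
CATALOG = [
    {"title": "Book A", "subjects": ["Feminist theory"]},
    {"title": "Book B", "subjects": ["Feminist theory", "Ethics"]},
    {"title": "Book C", "subjects": ["Ethics"]},
]


def authorized_heading(term: str) -> Optional[str]:
    """Return the authorized heading for a term, or None if the term is unknown."""
    return AUTHORITY_FILE.get(term.strip().lower())


def search_by_subject(term: str, catalog: list) -> list:
    """Return the titles of every record that carries the term's authorized heading."""
    heading = authorized_heading(term)
    if heading is None:
        return []
    return [record["title"] for record in catalog if heading in record["subjects"]]


for query in ["feminist philosophy", "Women and philosophy", "feminist theory"]:
    print(query, "->", search_by_subject(query, CATALOG))
```

All three queries print the same list, `['Book A', 'Book B']`, because they all resolve to the same heading. That is the whole point of having one and only one heading per concept.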
Before I get into the complicated part, let’s think about the power of this for a second. When you search Google or an article database, you are searching the full text of bazillions of documents. You can find lots of stuff using keywords. But you have probably had this experience when using Google: you enter some keywords, and then the results aren’t what you want. Then you change the keywords and see if you can get different results. You might do this a few times until you find something that looks useful to you.
The truth is, that’s a pretty ineffective search technique. There is no way for you to know whether you got the best sites in your results, the most useful sites, or even a significant number of the sites on the topic you want. For instance, let’s say you hit the lottery and you want to hit the fresh powder at Vail. So first you Google ‘Vail fresh powder’. You’ll find some stuff. Then you try ‘skiing Vail’. That looks better. Then you think about what you’re really looking for, and you try ‘Vail ski vacation package’. Now you’ve got some really helpful results. But what you can never know is whether you’ve found all the results on ski vacation packages at Vail.
Imagine you could look up an official term, like “Skiing–Vacation packages–Colorado–Vail”, and that putting it into Google would bring you only the appropriate results, and all of them.
Google doesn’t do that, but the catalog does.
Why? Because Google bases its searches on the full text of websites. There is so much stuff on the web, and language is a messy thing. There is no standard way of talking about anything. (Did you ever read Ernest Hemingway’s “Hills Like White Elephants”? It’s a short story about abortion that never once mentions abortion. It doesn’t even mention pregnancy. If you did a keyword search on abortion, the story would never show up in the results, since the word ‘abortion’ is not in it.)
The catalog is different. When you search the catalog, you are not searching the full text of the items in the catalog. Instead, you are searching catalog records that have been created by professionally trained catalogers.
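To make the Hemingway example concrete, here is a small Python sketch contrasting the two kinds of searching. Everything in it is a stand-in. `STORY_TEXT` is a one-line placeholder I wrote, not the story’s actual text. The record and its “Abortion--Fiction” heading are hypothetical, meant to suggest the kind of subject heading a cataloger might assign. A keyword search looks for the word in the text itself. A subject search looks in the headings the cataloger added to the record.

```python
import re

# Placeholder "full text": NOT Hemingway's words, just a stand-in that,
# like the real story, never uses the word "abortion".
STORY_TEXT = "A man and a girl wait for a train and talk around a simple operation."

# Hypothetical catalog record; the subject heading is assumed for illustration.
STORY_RECORD = {
    "title": "Hills Like White Elephants",
    "author": "Hemingway, Ernest",
    "subjects": ["Abortion--Fiction"],
}


def keyword_search(word: str, text: str) -> bool:
    """Full-text style search: does the word literally appear in the text?"""
    words = re.findall(r"[a-z']+", text.lower())
    return word.lower() in words


def subject_search(word: str, record: dict) -> bool:
    """Catalog style search: does any assigned heading begin with this term?
    Subdivisions are separated by '--', so compare against the main heading."""
    return any(h.split("--")[0].lower() == word.lower() for h in record["subjects"])


print("keyword search:", keyword_search("abortion", STORY_TEXT))
print("subject search:", subject_search("abortion", STORY_RECORD))
```

The keyword search prints `False` and the subject search prints `True`. The word was never in the text. A person read the story and put the concept into the record.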
I’ll pick up tomorrow by traipsing through a catalog record in detail, to show you where the expert searching power lives in the catalog.