Introducing the Redesigned New Catalog

A redesigned version of the new catalog was publicly released on April 25. A test version of this new design was made available to library staff in early March. The Forward Development Task Force (FDTF) spent much of the previous two months improving the new design based on staff and user feedback and usability testing.

The new design incorporates the branding and colors of the Libraries’ website and other UW sites and is “skinnable”, meaning other UW System campuses can personalize it with their logos and colors. Responsive Web Design ensures that the new catalog dynamically adapts to a user’s viewing environment, tailoring itself to smartphones and tablets.

Overview of New Design

Home

The new catalog homepage embraces a clean and simple design aesthetic. The inclusion of cover art and images from the University of Wisconsin Digital Collections Center adds visual interest.

Usability testing confirmed that the new design meets user expectations. The design was almost universally praised and participants compared it to popular commercial web destinations, like Amazon.

Results List

Prominent availability buttons are located in almost every record in the search results list when a search is scoped to UW-Madison. The green “Available” button appears if the record includes one item and that item is available for checkout. The orange “View record” button is displayed if a record includes more than one item (i.e., a library owns multiple copies of the item or more than one library holds the same item). The red “Not Available” button is displayed if UW-Madison owns one copy of an item and that item is currently checked out.
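
The button rules described above can be summarized in a short sketch. It is illustrative only: the function name and the way items are represented (one boolean per item, true when the item can be checked out) are assumptions, not the catalog's actual code.

```python
from typing import List, Optional


def availability_button(items_available: List[bool]) -> Optional[str]:
    """Return the button label for one record in a UW-Madison scoped search.

    items_available is a hypothetical representation: one boolean per item
    on the record, True when that item is available for checkout.
    """
    if not items_available:
        # The post says "almost every" record has a button, so a record
        # with no items is assumed here to show none.
        return None
    if len(items_available) > 1:
        # Multiple copies, or more than one library holds the item.
        return "View record"
    return "Available" if items_available[0] else "Not Available"


print(availability_button([True]))
print(availability_button([True, False]))
print(availability_button([False]))
```

A record with several items always gets “View record”, whether or not any copy is available, which is why users have to open the record to see the details.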

A “Libraries” facet appears beneath “Search Scope” under Refine Your Search, enabling users to easily switch between UW System and UW-Madison searching and limit a search to a particular location at UW-Madison. Currently selected facet “bubbles” at the top of the page allow users to quickly refine and remix search queries.

Users consistently told us that limiting by location and format is extremely important to them. The “Libraries” facet and the conspicuous search scope functionality were informed by this feedback. Users also indicated that they want to see availability information up front; they don’t want to click into a record only to discover that the item isn’t available. We plan to get more feedback on the logic of the current availability buttons in the coming months to ensure this functionality is meeting user expectations.

Item Records

Availability information is now clearly displayed in a table on the right side of item records. The UW-Madison branded “Find It” button appears in records for journals.

Buttons were created for common user actions, such as cite, email, and print, and placed in a toolbar at the top of the item record.

User testing has demonstrated that, within the item record, availability and location information for items at UW-Madison is intuitive and easy to determine. Feedback from users also indicated that the “View Physical Copies at Other Campuses” link to find availability information at other UW System schools and the “Sign In to Place Request” button need to be more visible. The FDTF is working to make these critical functionalities more evident.

Advanced Search

Links to Advanced Search are available in multiple locations: next to the basic search box and in the main navigation bar. Within Advanced Search, the “Library” limit, formerly the “Location” limit, is displayed first to reflect user behavior.

Usability testing revealed that Advanced Search is more commonly used than previously suspected. The FDTF is reviewing the examples beneath each search field to ensure they provide accurate and unambiguous guidance.

Browse

The A-Z navigation now includes a secondary list to help improve user discovery of a desired subject term. For example, if a user is looking for items on presidents and clicks on the “P”, a sub-menu with the second letter for all subject terms beginning with P appears. The user could then click on “Pr” to narrow their browse. The “Pr” is also displayed in the “Filter by” box, directing users to use this functionality to further narrow their result set.
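
The secondary list can be pictured as the set of two-letter prefixes found among the subject terms for the chosen letter. The sketch below is only an illustration of that idea; the function name and the sample terms are made up, and the catalog may build its sub-menu differently.

```python
def second_level_prefixes(terms, letter):
    """List the distinct two-letter prefixes of terms that start with `letter`."""
    prefixes = {
        term[:2]
        for term in terms
        if len(term) >= 2 and term[0].upper() == letter.upper()
    }
    return sorted(prefixes)


# Hypothetical subject terms, used only to show the shape of the result.
subjects = ["Pesticides", "Poetry", "Politics and government", "Presidents", "Printing"]
print(second_level_prefixes(subjects, "P"))
```

Clicking “Pr” in such a sub-menu would then limit the browse list to terms like “Presidents” and “Printing”.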

The FDTF is aware that significant usability issues still exist with this portion of the application. Namely, the “Filter by” box remains case sensitive. User testing indicated that the A-Z list and sub-list make sense to users and, to some extent, direct them to effectively use the “Filter by” box; however, participants are still perplexed by the number of links on this page and aren’t sure why they would use this feature over keyword searching.

User Feedback

This semester, the Web Services Usability and Assessment Team (WSUAT) completed two rounds of usability testing and met with the DoIT Student Advisory Board. For more detailed information on user feedback and the FDTF response, please see the complete first round report and the response from the design team.

The full report from the second round of testing and notes from the DoIT Student Advisory Board meeting will be made available shortly.

Usability testing helps us identify trends in user behavior and red flag issues, but this information is collected from a small sample of users in a test environment and doesn’t completely represent how users will interact with the new catalog. We also need to hear from you! Please let us know what you think of the new design by using the blue “Share Your Feedback” button at the bottom of every page in the new catalog.

Browse and the Catalog – Part 2

Browse – The Tab

Besides being a fine tool for finding content from within a record, the Browse feature of the catalog can also be invoked from the Browse tab.

When a researcher starts at the Browse tab (instead of getting there from within a record), she or he is presented with the ability to browse by Subject, by Digital Collections, by Languages or by Author/Creator Names. These are all metadata elements in our catalog. The list is based on the same metadata that we use for refining and limiting. Here’s what it looks like:

Browse Tab

When you choose Topics (under Subjects), the resulting page lists the 30 most common topical subjects in our catalog. When you choose Places (under Subjects), the resulting page lists the 30 most common geographic subjects in our catalog. Choose Languages to see the 30 most common languages used. Choose Author/Creator Names, and the 30 most common names appear. Here’s the Topics page:

Browse Tab Topics

For all of these types of Browses, there is an A-Z bar, and the ability to see the list in alphabetical order instead of by frequency, which is the default. A researcher can also move past the first 30 using the Next button. There is also a “Filter by:” box to refine the list. More on that feature later.

Subject words in the Browse list can be found anywhere in a full subject heading in a record. For example, in the above screen is the subject “Politics and government” with 187,752 records. Records in this set might have subject headings like:

Middle East — Politics and government
Taipei (Taiwan) — Politics and government
Scotland — Politics and government — 17th century

Browsing >> Topics

If you know any part of a subject heading, you can Browse it. Suppose the researcher knows that “health care reform” is a subject term. Start at the Browse tab, and choose Topics (under Subjects). In the “Filter by:” box, type: Health care reform. NOTE: Subject browses are case-sensitive! The screen will look like this:

Browse Health Care Reform

When you choose the first set of records, you’ll get a screen like this one:

Browse Health Care

Records in this set of 1,014 have subject headings like this:

Health care reform — Economic aspects — Massachusetts
Health care reform — Government policy — United States

In the above screen, the searcher can take a look at all the records, or use the Related Topics section to narrow down the search. From the Related Topics section, the researcher might choose “Law and legislation” to narrow the search. The screen would look like this:

Browse Health Care Reform Law

 

Another example:

If you know that one of the subject subheadings used in records is “Environmental aspects”, you can use that to Browse: (Remember that subject browse terms are case sensitive!)

Browse Environmental aspects

Records in this set have subject headings like this:

Economic development — Environmental aspects
Pesticides — Environmental aspects — United States

Choose that heading to see the environmental aspects of many different topics:

Browse Environmental aspects 2

A researcher can narrow this search here. Choose Pollution, for example, and that leads to this screen:

Browse Environmental pollution

A researcher can further narrow this by choosing Water:

Browse Environmental pollution

 

Browse >> Places

Use Browse >> Places to find records for the geographic area you are interested in. For example, suppose you want to see everything we have where New Zealand is part of the subject. Choose Browse >> Places, and then use the “N” from the alphabetical list to find New Zealand. Note: You could also use the “Filter by:” box and type New Zealand.

Browse New Zealand

Choose New Zealand (4,086 records!) and see this list:

Browse New Zealand List

If you are interested in economics, you might narrow the search by choosing “Economic conditions” in the Related Topics column. If you want a guidebook to New Zealand, choose Guidebooks under Content Types.

Browse >> Author/Creator Names

The Browse – Author/Creator names feature is still in the early stages. It will be far more productive to search for authors and creators using the Search features.

Browse and the Catalog – Part 1

The best way to learn how the Browse feature works in the catalog is to read Steve Meyer’s blog post, here:

http://forward.library.wisconsin.edu/moving-forward/?p=742

Browse – From a Record

The Browse feature was meant to be invoked from within a record.  A researcher does a search, looks at the list of records, and selects one of interest.  That record generally has several subject headings (or names).  Click on one of those headings, and the researcher gets to the Browse feature.

From this Browse page, a researcher can look for similar subject headings, expand or limit subject headings, explore the catalog based on language or use the authors/creators portion.

Sample Searching Number 1:

Do an Anywhere search: homeless women

http://forward.library.wisconsin.edu/catalog?q=homeless+women&qt=search&local=true

One of the records near the top of the list is the title: “Transitional programs for homeless women with children :” shown below:

http://forward.library.wisconsin.edu/catalog/ocm39951810

Homeless Women record

There are three LC subject headings for this record.  Click on the first one “Homeless women–Services for–United States”.  That takes you to the Browse page, here:

Browse Homeless Women

There were 6 records retrieved with the subject terms “Homeless women” AND “Services for” AND “United States”.

Perhaps this is too narrow. Click on the red “minus” sign next to the words “Services for” as you see in the shot below:

Browse Homeless Women Minus

and the page changes to this shot:

Browse Homeless Women Minus 2

Notice that this widens your search so there are 19 records of interest.  If you get rid of the “United States” part of the original browse, the result list expands to 74 records.
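
The effect of the “minus” sign can be sketched as an AND over the parts of a subject heading: every part still selected must be present, so removing a part can only widen the result. The records below are invented for illustration, and the function name and data layout are assumptions rather than the catalog's implementation.

```python
def matching_records(records, required_terms):
    """Return the records whose subject heading contains every required term."""
    required = set(required_terms)
    return [r for r in records if required <= set(r["subject_parts"])]


# Hypothetical records; real counts in the catalog are much larger.
records = [
    {"id": 1, "subject_parts": ["Homeless women", "Services for", "United States"]},
    {"id": 2, "subject_parts": ["Homeless women", "United States"]},
    {"id": 3, "subject_parts": ["Homeless women", "Services for", "Canada"]},
    {"id": 4, "subject_parts": ["Homeless women"]},
]

narrow = matching_records(records, ["Homeless women", "Services for", "United States"])
wider = matching_records(records, ["Homeless women", "United States"])
widest = matching_records(records, ["Homeless women"])
print(len(narrow), len(wider), len(widest))
```

Each term removed drops one condition from the AND, which is why the counts grow as the browse is loosened.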

Sample Searching Number 2:

Do a different anywhere search:  health care law

http://forward.library.wisconsin.edu/catalog?&local=true&q=health+care+law&qt=search

The record with the title “Health care law and ethics” is pretty interesting.

http://forward.library.wisconsin.edu/catalog/ocm50920633

Browse Health Care Law

Choose the subject heading “Medical care–Law and legislation–United States–Cases”.  That choice leads to this browse page:

http://forward.library.wisconsin.edu/subjects/topic?f[content_types_facet][]=Cases&f[geographic_subjects_facet][]=United+States&f[subjects_facet][]=Law+and+legislation&main_topic=Medical+care

SPECIAL NOTE:   Remember that whenever you get to any page in the catalog, you can use its URL as a sort of “canned” search.
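
Because every page has a stable URL, a “canned” search can also be assembled by hand or by a script. The sketch below rebuilds the two URL shapes shown in this post using Python's standard library. The helper names are hypothetical, and the parameter names are simply copied from the example URLs above; nothing here is an official API of the catalog.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://forward.library.wisconsin.edu"


def search_url(query, query_type="search"):
    """Build a regular search URL in the shape shown in the examples above."""
    params = [("q", query), ("qt", query_type), ("local", "true")]
    return f"{BASE}/catalog?{urlencode(params)}"


def browse_topic_url(main_topic, facets):
    """Build a subject browse URL from a main topic and (facet, value) pairs."""
    params = [(f"f[{name}][]", value) for name, value in facets]
    params.append(("main_topic", main_topic))
    # urlencode percent-encodes the square brackets; the decoded
    # parameters are the same as in the hand-written URL.
    return f"{BASE}/subjects/topic?{urlencode(params)}"


print(search_url("homeless women"))
url = browse_topic_url(
    "Medical care",
    [
        ("content_types_facet", "Cases"),
        ("geographic_subjects_facet", "United States"),
        ("subjects_facet", "Law and legislation"),
    ],
)
print(parse_qs(urlsplit(url).query))
```

Dropping the `content_types_facet` pair from the list corresponds to removing the “Cases” content type in the browse page.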

Browse Medical Care Page

In this case, there are 14 records of interest where the subject heading is “medical care” AND “law and legislation” AND “united states” AND “cases”.

Get rid of the Content Type, “Cases” and the search broadens quite significantly, to the page below:

Browse Medical Care Page Minus

Sample Searching Number 3:

Do an anywhere search with the words:  social responsibility

http://forward.library.wisconsin.edu/catalog?&local=true&q=social+responsibility&qt=search

The title “Social responsibility : corporate governance issues” looks interesting.  Choose it.

http://forward.library.wisconsin.edu/catalog/ocm51779661

Browse Social responsibility

Click on the subject heading “Social responsibility of business”. That will take you to the Browse page below:

Browse Social Response Business

This time there are 657 records.  Narrow this browse by choosing, for example, “Environmental aspects”.  That leads to this browse page:

Browse Social Response Business Environ

This might be a workable list, or you could narrow this further.

 

Searching the Catalog


A researcher has several different ways to find things in the library catalog: a regular search, an advanced search and a browse.  This document describes regular and advanced.  There’s another document that describes browse.

There is another post called “Indexing Details” that shows which MARC fields are indexed in the Solr/Lucene index used for the catalog. Please refer to it to help you understand the details of which fields are used when you search.

When you type a single word into the catalog’s search box, the search engine looks for all records that have that word.  It gives each record a score to determine its relevance, and then presents the most relevant at the head of the results list.  This is true for all types of searches.

Regular Search – Anywhere

When a researcher first enters the catalog, the default search is a regular search and the default search type is an “Anywhere” search.

Type:  polymer

The search engine looks for all records that have the word “polymer” in any of the indexed fields, assigns each record a relevance score, and then presents a list of results. Records that have the exact word “polymer” in a title are presented first. See the Relevance document for how this part works.

When a researcher types more than one word, the search engine treats those as a Boolean ANDed search.

Type:  business ethics

The search engine looks for all records that contain both “business” AND “ethics” in any of the indexed fields, assigns a relevance score and presents those most relevant first in the results list. Those records that have both words in the title are presented first.

In addition, we use a rule called mm (min-match), which sets the minimum number of clauses that must match. Note: a clause generally means either a word or a phrase in quotation marks. When a researcher uses a large number of terms, the rules don’t require that ALL of the terms appear in every record in the results list.

For “Anywhere” searches, the rule is:

“mm” = 5<-1 7<-2 10<-3

The translation of this is:

- queries with 5 or fewer clauses must match all of them: a 5-clause query must match all 5, and queries with 2, 3 or 4 clauses must match 2, 3 or 4

- queries with 6 clauses must match at least 5 (the 5<-1 part)

- queries with 7 clauses must match at least 6 (the 5<-1 part)

- queries with 8, 9 or 10 clauses must match at least 6, 7 or 8 of the clauses, respectively (the 7<-2 part)

- queries with 11 or more clauses must match all but 3 of the clauses (the 10<-3 part)

This means that if you type the words:

Gertrude stein bibliography

Each record in the result set of records must have all 3 words somewhere in it in order to be considered a match.

If you type:

Gertrude stein women literature criticism

The system must match all 5 of these words in order to match.
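
The translation of the min-match rule above can be checked with a small calculation. This is a minimal sketch that only handles rules of the form used here (ascending thresholds with negative whole-number adjustments); the function name is hypothetical and it is not the search engine's own parser.

```python
def min_should_match(num_clauses, spec="5<-1 7<-2 10<-3"):
    """Return how many clauses must match for a query with num_clauses clauses.

    Each part "N<-K" means: when the query has more than N clauses,
    all but K of them must match. The last part that applies wins.
    """
    required = num_clauses  # at or below the lowest threshold, all clauses are required
    for part in spec.split():
        threshold, adjustment = part.split("<")
        if num_clauses > int(threshold):
            required = num_clauses + int(adjustment)  # adjustment is negative
    return required


for n in (3, 5, 6, 7, 8, 9, 10, 11):
    print(n, "clauses ->", min_should_match(n), "must match")
```

For the two Gertrude Stein examples, the 3-word and 5-word queries both fall at or below the first threshold, so every word is required.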

Regular Search – Title

The default search in the library catalog is the “Anywhere” search.  A researcher can also choose to search by Title, Author and Subject.

Title searching is matched against only the “titles” and “alternate titles” fields of the catalog’s records. The details of those fields are in the Indexing table. The basic search rules still apply.

Choose Title from the drop-down area in the search box, and type:  midnight

The search engine will find all records which have the word “midnight” as a word in any of the title index fields, give a relevance score to each record, and then present them in a results list.  Records which have only the word “midnight” as their title will appear first in the list as most relevant.  Records which have the word “midnight” somewhere in the 245 field will appear next.  After that, the word “midnight” may appear in any of the records’ titles fields.

Choose Title from the drop-down area in the search box and type:  thirteen stories

The search engine will find all records that have both words in the title fields, ANDed together.  Those records that have only those two words will appear at the top of the list.

One of the fields that is indexed for titles is the Table of Contents of a record.  If you know the title of a short story that’s in an anthology (and the record for that anthology has a Table of Contents), you can find those books that contain that short story.

From the Title drop-down, type, in quotes: “hills like white elephants”

The search engine will find that exact phrase in the title fields, and it finds about 6 records, all of which have this as a short story title.

Regular Search – Author

Author searching is matched against only the “names” or Author/Creator fields of the records in the catalog.  The details of which fields of a record are included as names are in the indexing post.  The basic search rules still apply.

In the Author drop-down of a regular search box, type:  fitzgerald, ella (you can also type ella fitzgerald)

The search engine will find all records that have both “fitzgerald” AND “ella” anywhere in the names fields of the records, assign a relevance score and return a results list.  Those records that have both words together (one of the relevance rules) will appear near the top of the list.

Regular Search – Subject

Subject searching is matched against only the “subjects” and “geographic subjects” fields of the catalog’s records. The details of those fields are in the indexing post. The basic search rules still apply.

In the Subject drop-down of a regular search box, type:  anthropology

The search engine will look for all records that have the word “anthropology” in any of their subject fields.  It will assign relevance scores to each record and return a results list.

In the Subject drop-down of a regular search box, type:  anthropology ethnology

In this case, the search engine will look for all records that have both words “anthropology” AND “ethnology” anywhere in their subject fields, assign the records relevance scores and return a results list.

In the Subject drop-down of a regular search box, type, in quotes: “people with disabilities”

The search engine will look for all records that have that phrase in any of their subject fields.  Those records that have the exact phrase will be near the top.  Those that have subject terms that contain that phrase will also be retrieved.

Search tips:

1.  Use quotation marks around a word or a group of words to force an exact match.

Examples:

“war of attrition”

“Bernice bobs her hair”

“educational change”

2.  If you know the full title, type the whole thing *including* initial articles:

a beautiful mind

the house at pooh corner

an evening in paris

the china syndrome

ADVANCED SEARCHING

With the Advanced search feature of the library catalog, a researcher can more specifically define search terms by mixing and matching the fields used to search and by using Boolean logic within searches.  As with all library catalog searching, the search engine will find the terms searched within the indexes used, assign relevance scores to the records found, and return a results screen in relevance order.

In Advanced search, the min-match rule is 100%.  This means that whatever terms are used in a search, ALL the terms must appear in the indexes used to be considered a match.

Advanced search includes the same 4 indexes as basic (Anywhere, Title, Author and Subject).   It also has several more fields to search.

Identifiers

Use the Identifiers search when you know the ISSN, ISBN or record ID.  Most of the time, a record ID is an OCLC number.

In Advanced search, in the Identifiers box, type: ocm01327794

This retrieves the single record, “My life and times”.

Note: for OCLC numbers, include the whole string, including the prefix.  ocm01327794 works but 01327794 does not.

In the Advanced search, Identifiers box, type:  0397507984 (an ISBN)

This retrieves the single record “Physical therapy”

In the Advanced search, Identifiers box, type:  0022-2917 (an ISSN)

This retrieves the single record “Journal of music therapy”

Note: include the hyphen in an ISSN search. 0022-2917 works but 0022 2917 does not.

For any identifier search, the term to use is the exact term that is in the record’s 001 or 020 or 022 field.
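
The formatting notes above (keep the OCLC prefix, keep the ISSN hyphen) can be turned into a quick pre-check before searching. This is a rough sketch under stated assumptions: it only recognizes the “ocm” prefix shown in this post, unhyphenated 10- or 13-digit ISBNs, and hyphenated ISSNs, and it does not verify check digits. The function name is made up.

```python
import re


def identifier_kind(text):
    """Guess which kind of identifier a string is, or return None."""
    value = text.strip()
    if re.fullmatch(r"ocm\d+", value):
        return "record ID (OCLC number)"
    if re.fullmatch(r"\d{4}-\d{3}[\dXx]", value):
        return "ISSN"
    if re.fullmatch(r"\d{9}[\dXx]|\d{13}", value):
        return "ISBN"
    return None  # not in a form this sketch recognizes


for example in ("ocm01327794", "01327794", "0397507984", "0022-2917", "0022 2917"):
    print(example, "->", identifier_kind(example))
```

The two forms the post says do not work (the OCLC number without its prefix and the ISSN with a space) are the ones this check rejects.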

Call Numbers

The index used for call numbers comes from the records’ 852 call number fields.

In the Call Numbers box, type:  QD501 C49 2004

The system will retrieve the title “Chemical reactions : quantitative level …”

Try:  TP372.5

The system will retrieve a list of records, all of which have “TP372.5” in the first part (subfield h) of their call number. Call number Advanced searching is not a browse search – the results are not in shelf-list order – but all of the records will contain the part of the call number you entered.

Years

One way that the Advanced search is different from a regular catalog search is that it’s possible to combine searches by years.  At this point, “Years” is defined as the publishing date.

While it’s possible to just do a Years search by typing, for example 2012 in the first box and 2012 in the second box to see all records published in 2012, the Years  box is meant to be used with other types of searches.  For example, to see the 2010 edition of the Springer handbook of nanotechnology, do this:

Title: springer handbook of nanotechnology

Years:  2010 – 2012

In this case, you get the 3rd (2010) edition and the 2010 3rd and revised edition.

Using Boolean Operators and Logic

The library catalog’s Advanced search supports Boolean operators and Boolean logic for searches. The first part of this feature is set at the top of the Advanced screen in a section called

“Match All/Any of the fields below”

By default, this toggle is set to “All”. When you put terms in more than one field of an advanced search, those fields are ANDed together for your search.

When you set the Match rule to “Any”, OR is the operator.  When you put terms in more than one field of an advanced search, those fields are ORed together for your search.
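
The All/Any toggle amounts to choosing between AND and OR across the fields that were filled in. A minimal sketch, assuming each field's search has already been evaluated to true or false for a given record (the function name and the dictionary layout are hypothetical):

```python
def record_matches(field_results, match="All"):
    """Combine per-field results the way the Match All/Any toggle is described.

    field_results maps a field name to True/False for one record.
    """
    if match == "All":
        return all(field_results.values())  # fields are ANDed
    if match == "Any":
        return any(field_results.values())  # fields are ORed
    raise ValueError("match must be 'All' or 'Any'")


# A record whose title matches "hamlet" but whose author does not match.
result = {"title": True, "author/creator": False}
print(record_matches(result, "All"))
print(record_matches(result, "Any"))
```

With “All”, this record is excluded; with “Any”, it is retrieved because one field matched.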

Advanced search can accommodate simple and sophisticated Boolean searching.  Here are examples:

1.
keyword:  (violence OR abuse OR battered) AND (family OR domestic)

2.
keyword:  cryogenics
Subjects:  low temperature

3.
title: hamlet
author/creator: Shakespeare

4.
keywords: chemistry OR chemical
subjects:  academic dissertations

Advanced Searching with  Limits

There is a post about “Refine and Limits” here: http://forward.library.wisconsin.edu/moving-forward/?p=904

Here are more examples about Advanced searching using the Limit feature:

1. Books about bird behavior in Steenbock:

Limit: Books AND Steenbock
keywords: birds
subjects: behavior OR behaviour

2. Thelonious Monk’s works in the Music Library:

Limit: sound recordings OR music scores AND Mills Music
Author/creator: thelonious monk

 

 

 

Indexing Details

This post supersedes the one posted December 2010. A group of people from the Catalog Review Group is currently reviewing these indexes. Once they have finished their review, we will update this table with any changes.

TITLE searching

Type of title           Field number                  Subfields
“title”                 245                           abkfnp
“alternate_titles”:
uniform titles          130                           ahmnpr
                        240                           ahmnpr
                        243                           ahmnpr
                        730                           ahmnprt
varying form of title   210                           a
                        246                           ahnp
former title            247                           ahnp
series title            400                           tv
                        410                           ptv
                        411                           tv
                        440                           anpv
                        490                           av
                        760, 762                      t
                        800, 810, 811                 kptv
                        830                           akptv
added entry title       700, 710                      mnprt
                        711                           npt
                        740                           ahnp
                        767, 772, 773, 776, 780, 785  t
contents notes          505                           at
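
To make the table concrete, here is a sketch of how the listed subfields could be pulled out of a single MARC field for indexing. The field is represented as a simple list of (subfield code, value) pairs, which is an assumption made for this example, as are the names used; the real indexing code may work differently.

```python
# A few rows copied from the table above: MARC tag -> indexed subfield codes.
TITLE_INDEX_SUBFIELDS = {
    "245": "abkfnp",  # "title"
    "246": "ahnp",    # varying form of title
    "505": "at",      # contents notes
}


def indexed_text(tag, subfields, index_map=TITLE_INDEX_SUBFIELDS):
    """Join the values of the subfields that the table lists for this tag."""
    wanted = index_map.get(tag, "")
    return " ".join(value for code, value in subfields if code in wanted)


# Hypothetical 245 field; subfield c is not in "abkfnp", so it is skipped.
field_245 = [
    ("a", "Social responsibility :"),
    ("b", "corporate governance issues /"),
    ("c", "statement of responsibility"),
]
print(indexed_text("245", field_245))
```

A tag that is not in the map contributes nothing, which mirrors the idea that only the fields in the table are searched by a Title search.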

SUBJECT searching

Note: Subject searching is based on the field tag and subfield.  The indexing makes no distinction among the various thesauri used (indicator values) for the subject terms.

Type of subject         Field number                  Subfields
“subjects”:
names                   600                           abcdtx
                        610                           abtx
                        611                           atx
uniform title           630                           ax
topical terms           650                           ax
                        651                           x
                        690                           ax
“geographic_subjects”:
geographic terms        600, 610, 611, 630, 650, 690  z
                        651                           a

 

AUTHOR/CREATOR or NAMES searching

Type of author/creator  Field number        Subfields
“names”:
personal names          100                 aq
                        400                 ab
                        700                 aq
                        800                 aq
corporate names         110                 ab
                        410                 abn
                        710                 ab
                        810                 a
conference names        111, 411, 711, 811  a

 

ANYWHERE searching

In addition to all fields and subfields of the Title, Subject and Author/Creator searches, an ANYWHERE search includes these fields:

Name of field     Field number                            Subfields
call numbers      852 (MARC holdings)                     hi
content types     600, 610, 611, 630, 650, 651, 655, 690  v
                  655                                     a
dates             260                                     c
identifiers       001
                  022                                     a
imprints          260                                     abc
languages         008                                     bytes 35-37
                  041                                     ad
notes             500, 502, 508, 511, 520, 538, 546, 586  a
                  505                                     art
time periods      600, 610, 611, 630, 651, 655, 690       y
                  648                                     a
                  650                                     dy
isbns             020                                     a
years             260                                     c
series statement  400                                     abdtv
                  410                                     abdfnptv
                  411                                     atv
                  440                                     anpv
                  490                                     av
                  800                                     abcdfklmnpqstv
                  810                                     abcdfklnptv
                  811                                     acdefklnpqtv
                  830                                     adfklnpstv

Relevance – Rules and Technical Details

Relevance

When you type search terms into the catalog’s search box, the Solr/Lucene search engine looks for all records that have those terms. It gives each record a score to determine its relevance, and then presents the most relevant at the head of the results list.

It is difficult to illustrate each of the relevance rules since all of them may be applied to each query a researcher enters.   The rules are (numbered for ease of reading):

1.  The more times a search term appears in a document, the higher the score.

2.  Matches on rarer terms count more than matches on common terms.

3.  If there are multiple terms in a query, the more terms that match, the higher the score.

NOTE: When a researcher types more than one word as a search query, we use rules called min-match to analyze the query. This is explained more fully in the “Searching the Catalog” document.

4.  Matches on a smaller field score higher than matches on a larger field.

5.  If a boost was specified for a document at index time, scores for searches that match that document will be boosted.   Note: We don’t currently do any index time boosting.

6.  A user may explicitly boost the contribution of one part of a query over another (by using the + next to a term).

7.   Our request-handlers (the pieces of code that are used for each type of request, like anywhere, title, subject, author) use query-time boosting.  Boosts or weights are applied at query time to different fields in the index.  In some cases, there are multiple versions of fields which we use in these boosting rules.  The Appendix to this document shows the different searches and the way that queries using those searches are boosted.
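
Rules 1 through 4 can be illustrated with a toy score. This is deliberately not the search engine's real formula; it is a simplified sketch in which term frequency, term rarity, the number of matching terms and field length each push the score in the direction the rules describe. All names and numbers are hypothetical.

```python
import math


def toy_score(query_terms, field_tokens, doc_freq, num_docs):
    """Toy relevance score for one field of one record.

    doc_freq maps a term to the number of records containing it.
    """
    if not field_tokens:
        return 0.0
    score = 0.0
    for term in query_terms:
        tf = field_tokens.count(term)  # rule 1: more occurrences, higher score
        if tf == 0:
            continue
        # rule 2: rarer terms (low doc_freq) count more
        idf = math.log(num_docs / doc_freq.get(term, 1)) + 1.0
        score += tf * idf  # rule 3: each additional matching term adds to the score
    # rule 4: a match in a shorter field scores higher
    return score / math.sqrt(len(field_tokens))


freq = {"business": 500, "ethics": 5}
print(toy_score(["ethics"], ["business", "ethics"], freq, 1000))
print(toy_score(["business"], ["business", "ethics"], freq, 1000))
print(toy_score(["business", "ethics"], ["business", "ethics"], freq, 1000))
```

In this toy data, the rare word “ethics” contributes more than the common word “business”, and matching both words scores highest.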

Aside: We distinguish between terms that are *not* stemmed and those that *are* stemmed. A *not* stemmed word is exactly the word as the researcher typed it. If a researcher types the word “explosive”, we want to boost that exact word higher than any of its stems. For more detail about stemming, see the document “Stemming and the Catalog”.

For the boosting part of the rules this means if I type the word “beloved” (without the quotes) in the regular search “Anywhere” box, all the records that have just that word as the main part of the title (245 subfield “a”) will get boosted to the top of the list.

Next will be those records that have that exact word in the title.

Next will be those records that have that word (or any of its stems) anywhere in the main title field.

Next will be those records that have the word in any of the alternate title fields.

Ranked after those are records where the word “beloved” (again, no quotes) is in the names, then subjects, then anywhere in any of the other fields we index (“keywords”).
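
The ordering just described follows from the boost values in the Appendix. The sketch below uses a subset of the single-word “Anywhere” boosts and assumes, as a simplification, that a record is ranked by the largest boost among the fields where the word matched. The real engine combines these boosts with the other relevance rules, and the value of 1 used for unlisted fields such as keywords is an assumption.

```python
# Subset of the single-word "Anywhere" boosts listed in the Appendix.
ANYWHERE_SINGLE_WORD_BOOSTS = {
    "title_no_subtitle": 10_000_000,
    "title_no_stem": 1_000_000,
    "title": 100_000,
    "alternate_titles": 50_000,
    "names": 1_000,
    "subjects": 250,
}


def best_boost(matching_fields, boosts=ANYWHERE_SINGLE_WORD_BOOSTS):
    """Return the largest boost among the fields that matched (1 if unlisted)."""
    return max((boosts.get(field, 1) for field in matching_fields), default=0)


# Hypothetical records: which fields contain the word "beloved".
records = {
    "A": ["keywords"],
    "B": ["title", "keywords"],
    "C": ["title_no_subtitle", "title_no_stem", "title", "keywords"],
    "D": ["subjects"],
}
ranked = sorted(records, key=lambda rec: best_boost(records[rec]), reverse=True)
print(ranked)
```

The record whose main title is exactly the word comes first, then the other title match, then the subject match, then the keyword-only match.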

There is a second part of each query analysis called phrase boosting. A second pass is made through the search results, which boosts the query as if it were a phrase. When a researcher types more than one word, those records that contain those words together are boosted higher than those that have each of the words somewhere in the record.

For example: a researcher does a title search:

minnesota poetry

The records at the top of the list will have those exact words in that order. Here’s the search:

http://forward.library.wisconsin.edu/catalog?q=minnesota+poetry&qt=search_title&local=true
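
Phrase boosting can be sketched as a second pass that adds a bonus when the query words appear next to each other in order. The titles, the bonus value and the function names below are invented for illustration; only the idea of boosting adjacent words comes from the description above.

```python
def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens appears in tokens as a contiguous run, in order."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))


def rank_with_phrase_boost(titles, query, phrase_boost=10.0):
    """Keep titles containing every query word; boost those with the exact phrase."""
    query_tokens = query.lower().split()
    scored = []
    for title in titles:
        tokens = title.lower().split()
        if not all(word in tokens for word in query_tokens):
            continue  # the words are ANDed, so every word must be present
        score = 1.0
        if contains_phrase(tokens, query_tokens):
            score += phrase_boost  # hypothetical bonus for the second pass
        scored.append((score, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [title for _, title in scored]


# Hypothetical titles.
titles = ["Poetry of northern Minnesota", "Minnesota poetry anthology", "Minnesota poetry"]
print(rank_with_phrase_boost(titles, "minnesota poetry"))
```

All three titles contain both words, but the two with the words together in that order move ahead of the one where they are separated.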

 

SAMPLE SEARCHING TO SHOW RELEVANCE

Anywhere search – single word:

Mandela

Carousel

aliens

Anywhere search – phrase:

Galapagos islands

stem cell research

endangered species

Try each of the above searches as title searches, too, to see how relevance changes somewhat.

Remember:  the results of any search reflect the analysis of each record and its score based on *all* the rules.   (Except perhaps that single-word searches don’t use rule 3).

**********************************

APPENDIX: Query Handlers Boosting Rules:  REGULAR SEARCH

Note: we made special rules for the parts of a MARC record that contain the vernacular for special languages, because these languages treat words differently from other languages.

Query Type = search (Anywhere) – single words

Field Name Boost by: notes
title_no_subtitle 10,000,000 only 245, subfield a
title_no_stem 1,000,000 exact word
title 100,000
alternate_titles 50,000
thai_vernacular_titles 50,000
chinese_vernacular_titles 50,000
cjk_vernacular_titles 50,000
names 1,000
vernacular_names 1,000
thai_vernacular_names 1,000
chinese_vernacular_names 1,000
cjk_vernacular_names 1,000
subjects 250
vernacular_subjects 250
thai_vernacular_subjects 250
chinese_vernacular_subjects 250
cjk_vernacular_subjects 250
keywords
cjk_keywords
thai_keywords
chinese_keywords

 

Query Type = search (Anywhere) – phrase

Field Name Boost by: notes
title_no_subtitle 10,000,000 only 245, subfield a
title_no_stem 1,000,000 exact words
title 1,000,000
alternate_titles 500,000
thai_vernacular_titles 750,000
chinese_vernacular_titles 750,000
cjk_vernacular_titles 750,000
names 10,000
vernacular_names 10,000
thai_vernacular_names 20,000
chinese_vernacular_names 20,000
cjk_vernacular_names 20,000
subjects 2,500
vernacular_subjects 2,500
thai_vernacular_subjects 5,000
chinese_vernacular_subjects 5,000
cjk_vernacular_subjects 5,000
cjk_keywords 100
thai_keywords 100
chinese_keywords 100
keywords 10

Query Type = search_title (Title) – single words

Field Name Boost by: notes
title_no_subtitle 10,000,000 only 245, subfield a
title_no_stem 1,000,000 exact word
title 100,000
alternate_titles_no_stem 75,000 exact words
alternate_titles 50,000
thai_vernacular_titles 50,000
chinese_vernacular_titles 50,000
cjk_vernacular_titles 50,000

 

Query Type = search_title (Title) – phrase

Field Name Boost by: notes
title_no_subtitle 100,000,000 only 245, subfield a
title_no_stem 10,000,000 exact words
title 1,000,000
alternate_titles_no_stem 750,000 exact words
alternate_titles 500,000
thai_vernacular_titles 500,000
chinese_vernacular_titles 500,000
cjk_vernacular_titles 500,000

Query Type = search_author (Author) – single words

Field Name Boost by: notes
names_no_stem 1,000 exact word
names 250
vernacular_names 250
thai_vernacular_names 250
chinese_vernacular_names 250
cjk_vernacular_names 250

Query Type = search_author (Author) – phrase

Field Name Boost by: notes
names_no_stem 10,000 exact words
names 2,500
vernacular_names 2,500
thai_vernacular_names 2,500
chinese_vernacular_names 2,500
cjk_vernacular_names 2,500

 

Query Type = search_subject (Subject) – single words

Field Name Boost by: notes
subjects_no_stem 5,000 exact word
subjects 1,000
geographic_subjects 1,000
vernacular_subjects 1,000
thai_vernacular_subjects 1,000
chinese_vernacular_subjects 1,000
cjk_vernacular_subjects 1,000

Query Type = search_subject (Subject) – phrase

Field Name Boost by: notes
subjects_no_stem 50,000 exact words
subjects 10,000
geographic_subjects 10,000
vernacular_subjects 10,000
thai_vernacular_subjects 10,000
chinese_vernacular_subjects 10,000
cjk_vernacular_subjects 10,000

********************************************

APPENDIX: Query Handlers Boosting Rules:  ADVANCED SEARCHING

Query type=advanced

Query Type = qf_title (Title) – single words

Field Name Boost by: notes
title_no_subtitle 500
title_no_stem 200
title 50
alternate_titles_no_stem 25
alternate_titles 10
thai_vernacular_titles 10
chinese_vernacular_titles 10
cjk_vernacular_titles 10

Query Type = pf_title (Title) – phrase

Field Name Boost by: notes
title_no_subtitle 10,000
title_no_stem 5,000
title 3,000
alternate_titles_no_stem 1,000
alternate_titles 500
thai_vernacular_titles 500
chinese_vernacular_titles 500
cjk_vernacular_titles 500

Query Type = qf_names (Authors/Creators) – single words

Field Name Boost by: notes
names_no_stem 200
names 50
vernacular_names 50
thai_vernacular_names 50
chinese_vernacular_names 50
cjk_vernacular_names 50

Query Type = pf_names (Authors/Creators) – phrase

Field Name Boost by: notes
names_no_stem 5,000
names 3,000
vernacular_names 3,000
thai_vernacular_names 3,000
chinese_vernacular_names 3,000
cjk_vernacular_names 3,000

Query Type = qf_subjects (Subjects) – single words

Field Name Boost by: notes
subjects_no_stem 200
subjects 100
geographic_subjects 100
thai_vernacular_subjects 100
chinese_vernacular_subjects 100
cjk_vernacular_subjects 100

Query Type = pf_subjects (Subjects) – phrase

Field Name Boost by: notes
subjects_no_stem 5,000
subjects 1,000
geographic_subjects 1,000
vernacular_subjects 1,000
thai_vernacular_subjects 1,000
chinese_vernacular_subjects 1,000
cjk_vernacular_subjects 1,000

 

Query Type = qf_identifiers (Identifiers) – single words

Field Name Boost by: notes
id 500
raw_identifier 200
identifiers 200
collection_ids 200
isbns 200
call_numbers 200
call_numbers_no_stem 500

 

Query Type = pf_identifiers (Identifiers) – phrase

Field Name Boost by: notes
id 5,000
raw_identifier 2,000
identifiers 2,000
collection_ids 2,000
isbns 2,000
call_numbers 2,000
call_numbers_no_stem 5,000

Query Type = qf_call_numbers (Call Numbers) – single words

Field Name Boost by: notes
call_numbers_no_stem 200
call_numbers 100

Query Type = pf_call_numbers (Call Numbers) – phrase

Field Name Boost by: notes
call_numbers_no_stem 5,000
call_numbers 1,000

 

Query Type = qf_keywords (Keywords) – single words

Field Name Boost by: notes
keywords
cjk_keywords
chinese_keywords
thai_keywords

Note: single keywords in advanced search are not boosted.

 

Query Type = pf_keywords (Keywords) – phrase

Field Name Boost by: notes
keywords 10
cjk_keywords 100
chinese_keywords 100
thai_keywords 100

Refine and Limit – What they are and how they are different

REFINE and LIMITS

When you do a search in the catalog, it returns a list of records.  On the left side of the results page you will see ways to “Refine” your search.  Many people call these “facets”.  Facets are generated based on the content of the records that have been retrieved. We call this section “Refine” to make it clear that our researchers might use these terms to refine or narrow down their search results.

There are four ways to Refine: by Formats, by Subjects, by Languages and by Geographic Subjects.

Note:  There is another document called “Searching the Catalog” which explains the indexing details.

Refine by Formats

The Formats category shows the type of material of a record: book, video, microform or kit, for example.  Each record has at least one format.  Some records have more than one format.  For example, there can be records that have both print and microform holdings, and records that have a print holding and an online (electronic resource) holding.

When we index records, we take the leader of each record (where format data is encoded) and any 007 of each record (where more detail on the physical description of a record is encoded) and assign it one or more formats based on the content of those two parts of the MARC record.

Note: using Refine does not affect the relevance rating of the documents; it just narrows down the list of records in the results list.  You might think of Refine as an “ANDed” search.
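
A minimal sketch of that idea, using made-up records and scores: refining behaves like a filter over the results list, so scores and ordering stay as they were. The data and the refine function are hypothetical and are not the catalog's code.

```python
# Hypothetical results list: (record id, relevance score, formats on the record).
results = [
    ("rec1", 9.5, {"Books"}),
    ("rec2", 7.2, {"Maps, Atlases"}),
    ("rec3", 3.1, {"Books", "Maps, Atlases"}),
]


def refine(results, fmt):
    """Keep only records carrying the chosen format; scores and order are untouched."""
    return [record for record in results if fmt in record[2]]


refined = refine(results, "Maps, Atlases")
print(refined)
```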

Searching examples (Search only UW Madison):

Anywhere: Adirondacks, refined by Maps, Atlases

go.wisc.edu/z0sz3b

Anywhere: Wisconsin, refined by Maps, Atlases

go.wisc.edu/37j002

Anywhere: water pollution, refined by Journals, Magazines, Newspapers:

go.wisc.edu/k8158t

Anywhere: middle east policy, refined by Books

go.wisc.edu/u6lv15

One of the things you can do when you use the Refine feature is find e-books.  It’s not foolproof, but if your researcher wants e-books, this helps.

Do a search, refine by “Books” and then refine again by “Electronic resources”.  Here is an example:

Anywhere: quantum mechanics, refined first by Books, and then by Electronic resources:

go.wisc.edu/a8h0s2

Here are some single record examples to show the encoding:

Here’s an example where the encoding shows both a book (printed material), with the leader “am”, and a map, with an 007 beginning “a” (Maps, Atlases):

http://forward.library.wisconsin.edu/catalog/ocm01997440

“The geographic regions of Vermont: a study in maps”

Here’s an example where the encoding shows both a book (with the leader “am”) and the 007 for Electronic Resource:

http://forward.library.wisconsin.edu/catalog/ocm42330179

“China’s unfinished economic revolution”
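
The encoding in these two examples can be sketched in a few lines. This is an illustration under stated assumptions, not the indexer itself: the two lookup tables, the formats_for function and the sample leader and 007 values are hypothetical, and the real indexer handles many more codes than the three shown here.

```python
# Assumed mappings from MARC codes to the format labels used in Refine.
LEADER_FORMATS = {"am": "Books"}           # leader positions 06-07
FIELD_007_FORMATS = {
    "a": "Maps, Atlases",                  # 007 position 00 = map
    "c": "Electronic resources",           # 007 position 00 = electronic resource
}


def formats_for(leader, fields_007):
    """Collect format labels from leader positions 06-07 and each 007's first character."""
    found = []
    label = LEADER_FORMATS.get(leader[6:8])
    if label:
        found.append(label)
    for value in fields_007:
        label = FIELD_007_FORMATS.get(value[:1])
        if label and label not in found:
            found.append(label)
    return found


leader = "00000nam a2200000 a 4500"   # made-up leader with "am" at positions 06-07
print(formats_for(leader, ["aj canzn"]))   # made-up 007 that starts with "a"
```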

Refine by Language

The Language category shows the language in which an item was written or translated.  A record can have more than one language category.  When we index records, we take the language code of the record’s 008 field and the contents of the MARC tag 041 to determine the language.  The 041 Language code field is defined in the MARC21 Format for Bibliographic Data as: “Codes for languages associated with an item when the language code in field 008/35-37 of the record is insufficient to convey full information”.

Searching examples: (Search only UW Madison)

Anywhere: literature Portugal, refined by Portuguese

go.wisc.edu/m96y3j

Anywhere: harry potter, refined by Spanish

go.wisc.edu/g1x8jc

To see the encoding, here are some single record examples:

http://forward.library.wisconsin.edu/catalog/ocm27408656

“Logos”  (Russian)

http://forward.library.wisconsin.edu/catalog/ocm17601251

“L’urbs : espace urbain et histoire” (Italian)

This example shows that the record is encoded as German (the 008 part) but that it is also in English and in French (the 041 part):

http://forward.library.wisconsin.edu/catalog/ocm58045758

“Berlin, Deutschland = Berlin, Germany = Berlin, Allemagne”

This example shows that the record is encoded as French (the 008 part), but is also in Italian (the 041 part):

http://forward.library.wisconsin.edu/catalog/ocn318871517

“Rome et la science moderne”
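
As a rough sketch of how the two sources combine, the function below reads positions 35-37 of the 008 field and then adds any codes found in the 041. The function name and the sample 008 value are made up for illustration, and real 041 fields can be more complicated than a simple list of codes.

```python
def language_codes(field_008, codes_041):
    """Combine the 008/35-37 language code with any codes from field 041."""
    codes = []
    primary = field_008[35:38].strip()
    if primary:
        codes.append(primary)
    for code in codes_041:
        if code not in codes:
            codes.append(code)
    return codes


# Made-up 008 value (40 characters): only positions 35-37 matter for this sketch.
field_008 = "0" * 35 + "ger" + " d"
print(language_codes(field_008, ["ger", "eng", "fre"]))
```

With these inputs the record would be listed under German, English and French, as in the Berlin example above.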

Refine by Subjects

The Subjects category shows all of the subject terms from topical subfields from all the records in the results list.  When we index records we take all of the subject fields and subfields of each record and assign subject categories.

Sample searches:

Anywhere:  drug war, refined by subject heading “Drug traffic”

go.wisc.edu/nzgtyy

Anywhere:  terrorism, refined by subject heading “National security”

go.wisc.edu/hb4qon

Anywhere: peace movement, refined by subject heading “Protest movements”

go.wisc.edu/50kgv4

Refine by Geographic Subjects

The Geographic Subjects category shows all of the geographic subject terms used in the records in the results list.  When we index records we take all the geographic subject fields and the geographic subfields of other subject fields of a record and assign these as geographic subjects.

Sample searches:

Anywhere:  working class, refined by geographic subject heading “France”

go.wisc.edu/w5ch8f

Anywhere:  low income housing, refined by geographic subject heading “Wisconsin”

go.wisc.edu/35kuc2

Anywhere: political parties, refined by geographic subject heading “India”

go.wisc.edu/f0m47e
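
Here is a small sketch of how geographic subjects might be gathered from a record. It assumes the standard MARC 21 locations: tag 651 for geographic subject headings and subfield z for geographic subdivisions of other subject fields. The text above does not list the exact tags the indexer reads, so the function and the sample record are hypothetical.

```python
def geographic_subjects(subject_fields):
    """Pull geographic terms from 651 subfield a and from subfield z of any subject field."""
    terms = []
    for tag, subfields in subject_fields:
        for code, value in subfields:
            is_geographic = (tag == "651" and code == "a") or code == "z"
            if is_geographic and value not in terms:
                terms.append(value)
    return terms


# Hypothetical record with one topical heading and one geographic heading.
fields = [
    ("650", [("a", "Working class"), ("z", "France")]),
    ("651", [("a", "Wisconsin"), ("x", "Politics and government")]),
]
print(geographic_subjects(fields))
```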

****************************************************

Advanced Search and Limits

When you go to the Advanced Search page, you see a section called “Limits” on the right side of the search page.  We call this section “Limits” to distinguish it from “Refine” because in the Advanced Search, researchers can modify their entire search using the three types of limits at any point, not just after a search has begun.  In the Advanced Search it is possible, before any search terms are entered, to see all of the formats, languages and locations in the entire catalog collection by starting a search with a Limit to any Format, Language or Location.

The limits on the advanced search screen are a list of all the values for formats, languages and locations that are in our collection.  If a researcher sees “Abkhaz” in the list of languages, there is at least one Abkhaz item in the UW System.  If a researcher sees a Format of 3-D Objects, Models, Sculpture, that means there is at least one record of that Format.

When a researcher enters terms in any of the search boxes, and then uses the Limits feature, the search engine will find records with that search term AND which have one or more of the Format, Language or Location Limits which the researcher has checked.

It is also true that when a researcher enters terms in any of the search boxes and then uses more than one Limit, the search that is returned will contain records that have that search term and which have *any* of the limits.   Limits are OR-ed together in the Advanced Search.
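
The logic described in the last two paragraphs can be sketched as: term AND (limit 1 OR limit 2 OR ...). The records, field names and the advanced_search function below are made up for illustration and are not the catalog's code.

```python
# Hypothetical records: the words they contain and their format/language/location values.
records = [
    {"id": "r1", "words": {"global", "warming"}, "values": {"Books", "English"}},
    {"id": "r2", "words": {"global", "warming"}, "values": {"Videos, Slides, Films"}},
    {"id": "r3", "words": {"housing"}, "values": {"Books"}},
]


def advanced_search(records, term, checked_limits):
    """Return ids of records that match the term AND at least one checked limit."""
    hits = []
    for record in records:
        has_term = term in record["words"]
        # With nothing checked, no limit is applied.
        has_limit = not checked_limits or bool(record["values"] & checked_limits)
        if has_term and has_limit:
            hits.append(record["id"])
    return hits


print(advanced_search(records, "warming", {"Books", "Videos, Slides, Films"}))
```

Checking a second limit can only add records to the result, never remove them, which is the practical effect of OR-ing the limits.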

SAMPLE SEARCHES

LIMIT FIRST

A researcher wants to have a list of all records in Hebrew in our catalog.   Do this search:

go.wisc.edu/79226j

A researcher wants to know all the Music Scores available in the Music Library.  Do this search:

go.wisc.edu/6jq7i6

A researcher wants to see all Sound Recordings (LPs, DVDs and CDs are all included here) in Spanish in our catalog.  Do this search:

go.wisc.edu/4227ot

A researcher wants to see everything that’s in the CCBC.  Do this search:

go.wisc.edu/367555

 

SEARCH, THEN LIMIT

Try these searches to see how to search and then to limit:

Subject:  global warming, limited by Videos, Slides, Films

go.wisc.edu/168f8a

Keywords: business ethics, limited by Format: Books AND Location: Business Library

go.wisc.edu/gcvlls

Subject: housing AND poverty, limited by Books, English, Memorial Library

go.wisc.edu/m3v460

Title: technical reports, limited by Wendt Library

go.wisc.edu/042si2

 

Stemming – What’s that?

Stemming

In linguistics and in information retrieval, stemming is the process of reducing some kinds of words to their stem. There are a variety of stemmers available. The one we use is an algorithmic stemmer called the Snowball English (or Porter2) stemmer, which derives from the original Porter stemmer. Other stemmers use dictionaries instead of algorithms. Think of stemming as suffix-stripping, but with some specific rules about what constitutes a suffix. Here is a paper on the Porter2 stemmer for those who want more.

http://snowball.tartarus.org/texts/introduction.html

And a chart with examples of the Porter2 stemmer is here:

http://snowball.tartarus.org/algorithms/english/stemmer.html

And here is a site where you can try for yourself to see what constitutes a stem and what does not:

http://qaa.ath.cx/porter_js_demo.html

Here you will see that if you type the word “greens”, its stem is “green”. But if you type the word “greener”, it has no stem other than “greener” itself. Similarly, the words “searching”, “search” and “searches” all stem to “search”, but “searcher” does not; it stems to “searcher”.

Another way to think about this is that for some fields, we broaden recall by searching for words that share a common stem. However, we substantially boost precision in our relevancy algorithm by giving higher scores when the user’s query matches the unstemmed version of the data. This extra boost is applied to our most important fields: titles, names, subjects and call numbers.
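
To make the recall and precision trade-off concrete, here is a small sketch. A lookup table of the stems quoted above stands in for the real Porter2 stemmer, and the two boosts come from the title and title_no_stem rows of the search_title single-word table. Adding the boosts together is an illustrative simplification, since Solr's actual scoring is more involved, and the title_score function is hypothetical.

```python
# Stems quoted in the text above; this table stands in for the real stemmer.
STEMS = {
    "greens": "green", "green": "green", "greener": "greener",
    "searching": "search", "searches": "search", "search": "search",
    "searcher": "searcher",
}

TITLE_BOOST = 100_000             # stemmed field: title
TITLE_NO_STEM_BOOST = 1_000_000   # unstemmed field: title_no_stem


def title_score(query_word, title_words):
    """Score one title: a stemmed match counts, an exact-word match counts extra."""
    score = 0
    stem = STEMS.get(query_word, query_word)
    if any(STEMS.get(word, word) == stem for word in title_words):
        score += TITLE_BOOST
    if query_word in title_words:
        score += TITLE_NO_STEM_BOOST
    return score


print(title_score("searching", ["searches"]))    # stem match only
print(title_score("searching", ["searching"]))   # stem match plus exact match
print(title_score("searcher", ["searching"]))    # different stems, no match
```

A title containing “searches” is still found for the query “searching”, but a title containing the exact word scores much higher.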

Two things of note here. One is that stemming is just one of the tools we use when we create our index and when we analyze a user’s query. Others are determining what a “word” is (a string surrounded by white space, except in specific languages), setting all terms to lowercase, and changing CamelCase words to lower case (where CamelCase becomes the word camelcase). The other is that the newest version of Solr/Lucene includes a dictionary stemmer for over 50 languages. We’ll be looking at that after our new catalog has gone live.
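
The word-splitting and lowercasing steps can be pictured with a few lines of code. This is only a sketch: the analyze function is hypothetical, it ignores the special handling for specific languages, and it leaves out stemming.

```python
def analyze(text):
    """Split on white space and lowercase each term (no stemming in this sketch)."""
    return [word.lower() for word in text.split()]


print(analyze("CamelCase Words in Titles"))
```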

Notes from the DoIT Student Advisory Board Meeting

I presented our findings from the DoIT Student Advisory Board meeting at the Forward Forum on November 30. You can find the full report from this very informative session here.

We would like to repeat this activity with a couple more groups at the beginning of next semester. If you know of a group that may be interested, such as your student assistants, please get in touch for more details.

Advanced Search Update

Forward’s advanced search has gone through a major overhaul in the last few weeks. The form has been updated and new functionality has been added:

  • Boolean support: all advanced search fields can be used with AND/OR/NOT boolean operators and parenthetical nesting
  • Limits: pre-limit your search to specific formats, languages or library locations

Blacklight upgrade

This work was made possible by upgrading the Blacklight software that runs Forward. Upgrading Blacklight allowed us to install the Blacklight Advanced Search plugin developed by Jonathan Rochkind of Johns Hopkins University Libraries.