Search engines

From HLWIKI Canada
Revision as of 10:19, 9 June 2012 by Dean

[Image: search engine wheel]



Introduction

See also Google scholar | Microsoft Academic Search in beta | Open access | Open search tools | Scirus

Google, Bing and Yahoo are the three top search engines of the digital age. As such, they are directly linked to the work of librarians and information specialists and to trends such as open access and web 2.0. Search engines are popular because they offer a quick way to search across the web. They can, however, introduce a number of problems for efficient information retrieval, especially in subject-based searching. Many health librarians point to poor recall, inconsistency and a lack of authority control in search engines, although they are deemed acceptable for most queries because they provide quick access and point to popular content.

Browsing & precision trends

It was inevitable that searching would take on some of the features of web 2.0. Many of the earliest algorithms were based on link popularity, a kind of wisdom of the crowds. Now, information needs in workplaces arise in the context of collaboration and participation. At one end, health teams write collaboratively and participate in trials where comprehensive (high-recall) searches are needed, though they may tolerate low precision to begin with. At the other end, medical students and nurses work together to retrieve a few good articles. These health professionals do not require high recall but high precision, and ranking algorithms coupled with other forms of recommended websites produce acceptable results.

As the web scales in size, new requirements emerge. Customized social search - offered by Google Health - is likely to become more important as a means of offering targeted searching among a group of sites. These websites will be recommended or compiled by other workers, collaborators or other experts. However, with the rise of search tools and social search there has been a concomitant decline in the use of traditional databases. This may be due to the simple truth that high recall and precision are not required for most queries. Health library users, for example, have different requirements for literature reviews than for basic information, and how much imprecision health professionals will tolerate is directly related to how much recall they need.

In the Internet age, the notion of complete recall as an indicator of success seems outdated and unrealistic; exhaustive searching is not always needed. The idea of leading users to an acceptable number of papers has led some librarians to suggest proportional recall (or relative recall), where success is expressed as the number of relevant documents retrieved over the number of relevant documents required. A pharmacist may need five relevant documents, but her search retrieves only three; proportional recall is therefore three-fifths, or 60%. This measure, while appealing, is artificial in that few health library users can specify what they really need before searching, let alone how many documents they will need.
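The proportional (relative) recall measure described above reduces to a simple ratio; the function name below is illustrative, not from the article, but the pharmacist example is the one given in the text:

```python
def proportional_recall(relevant_retrieved, relevant_required):
    """Relative recall: relevant documents retrieved / relevant documents required."""
    if relevant_required <= 0:
        raise ValueError("relevant_required must be positive")
    return relevant_retrieved / relevant_required

# The pharmacist example: five relevant documents needed, three retrieved.
print(proportional_recall(3, 5))  # → 0.6, i.e. 60%
```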

Challenges faced by search engines

  • The web is growing faster than present technology can index it. According to anecdotal evidence, the major search engines have become slower to index new Web pages. The overlap in sites indexed among the top three search engines is less than 10%.
  • Many web pages are updated frequently, which makes it necessary to revisit them daily.
  • Queries are limited to searching for keywords, which may result in false drops, especially using the default page-wide search. Better results might be achieved by using a proximity-search option with a search bracket to limit matches to within a paragraph or phrase, rather than matching random words scattered across large pages. Another alternative is to rely on human operators to do the research for the user, as with so-called organic search engines.
  • Dynamically generated sites may be slow or difficult to index, or may produce excessive results, perhaps generating 500 times more Web pages than average. For example, for a dynamic Web page whose content changes based on entries drawn from a database, a search engine might be asked to index 50,000 static Web pages for 50,000 different parameter values passed to that dynamic page.
  • Many dynamically generated sites are not indexable by search engines; this phenomenon is known as the deep, "dark" or invisible web.
  • Some search engines do not rank results by relevance, but by the amount of money the matching Web sites pay.
  • In the past year, search engine optimization (SEO) has become big business, and some techniques conspire to undermine organic results. This leads to link spam and bait-and-switch pages that contain little information about the matched phrases. Some observers suggest that relevant Web pages are being pushed down in results due to SEO.
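The difference between the default page-wide keyword match and the proximity-search option mentioned above can be sketched as follows. This is a minimal illustration of window-limited matching under assumed parameters (a 50-word window standing in for a paragraph-level bracket), not any particular engine's implementation:

```python
def match_anywhere(terms, words):
    """Default page-wide match: every term appears somewhere on the page."""
    wordset = set(words)
    return all(t in wordset for t in terms)

def match_within_window(terms, words, window=50):
    """Proximity match: every term occurs within `window` words of an
    occurrence of the first term, approximating a paragraph-level bracket."""
    positions = {t: [i for i, w in enumerate(words) if w == t] for t in terms}
    if any(not p for p in positions.values()):
        return False  # some term is missing entirely
    for anchor in positions[terms[0]]:
        if all(any(abs(p - anchor) < window for p in positions[t]) for t in terms):
            return True
    return False

# A page where the two query terms are far apart: a page-wide search
# matches (a likely false drop), while the proximity search does not.
page = ("heart disease " + "filler " * 200 + "treatment options").split()
print(match_anywhere(["heart", "treatment"], page))       # True
print(match_within_window(["heart", "treatment"], page))  # False
```

The point of the sketch is that restricting matches to a window filters out pages where the query terms co-occur only by coincidence, at the cost of some recall.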

General search engines

Social search

Science search engines

Health specific search tools

References

  • Google scholar bibliography
  • Eurekster builds mini-search engines that aggregate information on particular topics from sites they choose. In the Search Engine Journal, Greg Sterling said: "Eurekster... serves up progressively more relevant results (than general Web search) on the basis of communal search behavior..."
  • Johnson BE. Bing or bust: can Microsoft cure 'search overload syndrome'? Computers in Libraries. 2009;29(10):36-40.
    • Microsoft's Bing is promoted as a new kind of search engine. A departure from Google's minimalist user interface, it instead integrates several navigational features that bring human- and computer-generated metadata to the front end of the search process. This article examines the extent to which the new approach is an advance over existing search engines, and the impact on information professionals.