Semantic web

  25 October 2016


"I have a dream ... [where computers] become capable of analyzing all the data on the web -- content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize." — Sir Tim Berners-Lee, 1999

The semantic web is a contested term that covers techniques and tools intended to improve the web's organization and usability. According to Tim Berners-Lee, "...The Semantic Web is an extension of the Web through standards by the World Wide Web Consortium (W3C). The standards promote common data formats and exchange protocols on the Web, most fundamentally the Resource Description Framework (RDF)..." The semantic web will therefore offer more structure: documents and websites will be interconnected through descriptive metadata (i.e., linked data). Proponents regard linked data as a key feature of the future web, one that should improve our understanding of what is on the web and support the semantic inferences needed across billions of items.

The semantic web should ultimately make materials easier to find, because they will have been described more accurately through the use of thesauri and various markup languages. Descriptive language will also help the web's automated systems and bots crawl the web and bring similar documents together for retrieval.

Many ideas behind the semantic web originate with Sir Tim Berners-Lee, a British computer scientist and key figure in the evolution of the web, who is widely viewed as its 'inventor'. The concept has been debated for years and remains controversial; critics say it will be impossible to achieve and that Google's PageRank algorithm (and others like it) will be required to find anything for the foreseeable future.
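To make the idea of linked data described above more concrete, here is a minimal sketch of how RDF-style statements can be modelled as subject-predicate-object triples and then queried to bring related documents together. It is an illustration only, not an RDF implementation: the example.org URIs, the document titles, and the helper names match and documents_about are all hypothetical, and a real system would use an RDF library and a query language such as SPARQL.

```python
# A toy triple store: each statement is a (subject, predicate, object) tuple.
# All example.org URIs and titles are hypothetical, used only for illustration.
DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace
EX = "http://example.org/"         # hypothetical namespace for our documents

triples = [
    (EX + "doc/1", DCT + "title", "Asthma care guidelines"),
    (EX + "doc/1", DCT + "subject", EX + "topic/asthma"),
    (EX + "doc/2", DCT + "title", "Inhaler technique in children"),
    (EX + "doc/2", DCT + "subject", EX + "topic/asthma"),
    (EX + "doc/3", DCT + "title", "Diabetes screening"),
    (EX + "doc/3", DCT + "subject", EX + "topic/diabetes"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def documents_about(store, topic_uri):
    """Co-locate documents that are linked to the same subject URI."""
    hits = match(store, predicate=DCT + "subject", obj=topic_uri)
    return sorted(s for (s, _, _) in hits)


asthma_docs = documents_about(triples, EX + "topic/asthma")
print(asthma_docs)
```

Because both asthma documents point at the same subject URI rather than at a free-text keyword, they can be retrieved together without guessing at synonyms, which is the kind of structure linked data is meant to provide.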

Why is the semantic web necessary?

The rise of the Internet has made information plentiful. Search engines such as Google, Bing and Yahoo have made it possible to search across a top layer of the web in less than a second. Without these large search systems, the Internet age would have been far more chaotic. However, relying on search engines raises two serious problems if we want to maintain some control over our search and retrieval activities:

  • Too many irrelevant documents: In information retrieval, two measures guide the development of search strategies for locating all relevant documents in a database: precision (the share of retrieved documents that are relevant) and recall (the share of relevant documents that are retrieved). Searching the web, with Google Scholar for example, generally yields results that are higher in recall but lower in precision. Even where the most relevant items are found, many others are missed or will never be found.
  • Not enough relevant documents: Search engines do not find all relevant materials, and we may need to try synonyms to describe what we are looking for; this is called insufficient or low recall. Low recall may not trouble the current generation of web users, but compare web searches with those conducted in a database that uses controlled vocabularies, where controlled terms (or semantically similar queries) return similar results and co-locate documents based on their content.
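The two measures discussed above can be made concrete with a small calculation. The sketch below uses the standard definitions (precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that were retrieved). The document identifiers and the function name precision_recall are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one search.

    retrieved: identifiers the search returned
    relevant:  identifiers that are actually relevant in the collection
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 10 results, of which 4 are relevant,
# while the collection holds 8 relevant documents in total.
retrieved = [f"doc{i}" for i in range(1, 11)]  # doc1 .. doc10
relevant = ["doc1", "doc2", "doc3", "doc4", "doc11", "doc12", "doc13", "doc14"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.40 recall=0.50
```

In this made-up case, six of the ten results are noise (the "too many irrelevant documents" problem) and four relevant documents were never retrieved (the "not enough relevant documents" problem).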

Metadata will help search tools find similar materials, much as library databases and catalogues do today. Defined simply as 'data about data', metadata is an important tool. Two important library standards for describing materials on the web are Dublin Core and Resource Description and Access (RDA). Health librarians can think of metadata as the kind of information used in the bibliographic records in our catalogues: information that describes documents or other intellectual works according to professional standards. The Dublin Core metadata element set is defined by NISO Standard Z39.85-2007. For more background, see the Semantic MediaWiki.
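As a rough illustration of what such descriptive metadata can look like in practice, the sketch below builds a small Dublin Core record and renders it as HTML meta tags using the common "DC." name prefix. It assumes the fifteen elements of the Dublin Core element set; the function name dc_meta_tags and the sample subject and language values are hypothetical and are not taken from any particular catalogue.

```python
from html import escape

# The fifteen elements of the Dublin Core metadata element set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dc_meta_tags(record):
    """Render a dict of Dublin Core elements as HTML <meta> tag strings.

    Values may be a single string or a list of strings (repeatable elements).
    """
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")

    tags = []
    for element in sorted(record):
        values = record[element]
        if isinstance(values, str):
            values = [values]
        for value in values:
            content = escape(value, quote=True)  # keep the attribute well-formed
            tags.append(f'<meta name="DC.{element}" content="{content}">')
    return tags


# Illustrative record; the subject and language values are assumptions.
record = {
    "title": "Semantic web",
    "date": "2016-10-25",
    "subject": ["Semantic web", "Metadata"],
    "language": "en",
}

tags = dc_meta_tags(record)
print("\n".join(tags))
```

Rejecting unknown element names is a design choice made here to mirror the discipline of a controlled schema; a production tool would more likely validate against the published standard itself.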

The semantic web (and its associated semantic-aware applications) attempts to address the poor integration, findability and organization of the current web. The semantic web and Web 3.0 are often used synonymously because they share many of the same goals and objectives. However, the semantic web is a specific set of trends and technologies expected to reach maturity in the next ten to fifteen years. Web 3.0 takes its name from the third decade of the web's evolution, from 2010 to 2020. Web expert Nova Spivack has already offered predictions about the features of Web 4.0.

A 2001 Scientific American article by Berners-Lee describes a vision for the web that is quite different from the one we have at present. A more recent article by Berners-Lee and colleagues stated: "This simple idea is still in process. Perhaps intelligent agents - smarter 'bots' that crawl websites for useful information and connect it - that have been touted for ages will finally materialize."

Interesting quotes by the semantic web guru

"People keep asking what web 3.0 is. I think maybe when you've got ...[the] semantic Web integrated across a huge space of data, you'll have access to an unbelievable resource [in web 3.0]." - A 'more revolutionary' Web - Sir Tim Berners-Lee "I have a dream for the web [where computers] become capable of analyzing all the data on the web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines." "Semantic publishing will benefit greatly from the semantic web. In particular, the semantic web is expected to revolutionize academic and scientific publishing, such as real-time publishing and sharing of experimental data on the Internet. This simple but radical idea is now being explored by W3C HCLS group's Scientific Publishing Task Force."

Why should librarians care?

Librarians need to articulate a vision for change and find a secure place for themselves in the digital age (see Fiona Bradley's Semantic Library). A web built on the principles of description and analysis will be vastly different from the one we know today; much of human knowledge may be built into the web itself. What is remarkable about semantic technologies is that they will not affect the look or feel of our 'web experiences' and will probably perform their tasks behind the scenes, without our users' knowledge. In other words, users will be unaware of the filters and tools we have devised to connect them to the vast networks of information across the world. Given our users' expectations for seamless delivery of information and their demands for instant access via handheld devices such as the iPad, it is not a matter of if this will happen, but when.

Semantic websites

