I was scanning journal tables of contents as usual this week and it occurred to me that there must be a better way to find relevant and timely research information that would be of interest to Sciencebase readers…and, of course, out pops the following title:
Technically approaching the semantic web bottleneck
Sounded perfect…kind of…but what’s the semantic web, why is there a bottleneck, and what can be done to lube the tube?
Tim Berners-Lee’s original vision for the semantic web was that information would be just as readable (and understandable) to a machine as to a person. Digital objects, whether a web page, image, video, or some other file, would have metadata embedded within them that would provide context for the content and allow software to extract meaning from the file.
Some software currently has a limited understanding of simple metadata, although any SEO will tell you that Google largely ignores web page metadata these days. That point aside, there is so much that might be done if the web were effectively self-aware (I'm not talking notions of the singularity here, just making it all more useful and easier to use). So, I asked the paper’s author, Nikolaos Konstantinou, for a few examples of how the semantic web, often referred to as Web 3.0 (although you might call it Web 2.1 or Web 2.0++), might benefit us. The first benefit, he told me, would be more intelligent searches, either across the web or in large-scale data repositories, where “intelligent” is meant in contrast to the conventional keyword-based methods employed by the search engines.
“For instance, performing a search in Google for e.g. ‘renaissance paintings’ you will notice that among the first pages of the results returned, the vast majority contains the keywords ‘renaissance paintings’ in the respective page text (or image HTML image ‘alt’ tag),” he says. “That is because the search engine does not process the content available semantically and therefore, the results although they will be accurate, will be far from being complete. This will cause an arts student, for instance, to spend too much time finding relevant content. She would probably have to visit certain museum pages and collect the results on her own.”
This is where the semantic web would come into play, Konstantinou adds. “The vision is to get a list of what you asked for even in the case when your keyword does not exist in the web page. In the example above, a page with Leonardo da Vinci’s paintings will not be considered relevant if the words ‘renaissance paintings’ do not exist in the page. In the semantic web world the system would ‘know’ that Leonardo da Vinci is an artist of the Renaissance and therefore his works would be returned to the user performing the query.”
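To make the contrast concrete, here is a minimal sketch in Python of a keyword search next to a search that also consults a small table of background knowledge. Everything in it is an assumption made up for illustration: the three toy pages, the `ARTIST_PERIOD` table, and the function names. It is not how Google or the system described in the paper works.

```python
# Hypothetical mini-corpus: page name -> page text.
PAGES = {
    "gallery": "A guide to renaissance paintings in European galleries",
    "leonardo": "Mona Lisa and The Last Supper by Leonardo da Vinci",
    "monet": "Water Lilies by Claude Monet",
}

# Hypothetical background knowledge: which period each artist belongs to.
ARTIST_PERIOD = {
    "Leonardo da Vinci": "renaissance",
    "Claude Monet": "impressionism",
}


def keyword_search(query):
    """Return pages whose text literally contains the query string."""
    q = query.lower()
    return sorted(name for name, text in PAGES.items() if q in text.lower())


def semantic_search(period):
    """Return keyword hits plus pages that mention an artist of the period."""
    hits = set(keyword_search(f"{period} paintings"))
    for name, text in PAGES.items():
        for artist, artist_period in ARTIST_PERIOD.items():
            if artist_period == period and artist.lower() in text.lower():
                hits.add(name)
    return sorted(hits)


print(keyword_search("renaissance paintings"))  # misses the Leonardo page
print(semantic_search("renaissance"))           # includes the Leonardo page
```

The keyword search misses the Leonardo page because the words "renaissance paintings" never appear on it. The second search finds it only because the system has been told which period the artist belongs to.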
A second benefit would be new knowledge inferred from what the system already holds. A system built using semantic web technologies, with the support of reasoning procedures, could logically deduce information, explains Konstantinou.
“The most classic example about inference is that from the statements ‘all men are mortal’ and ‘Socrates is a man’, we can deduce that ‘Socrates is mortal’. This property (transitive property) in combination with a wider set of properties can augment the knowledge inserted in a system, without requiring human insertion of each and every fact, which avoids errors and reduces the workload.”
Put simply, if you state 5 facts to a system that has an ontology (a glossary) and a reasoner, the system might be able to deduce 15 facts by applying logic rules (reasoning). This is in fact what allows the intelligent queries mentioned in the Renaissance example. Such a system, when asked “is Socrates mortal?”, will return YES, while without reasoning the answer would be NO (or UNKNOWN in other cases). Similarly, Socrates would be included in a search like “tell me all the mortals in the system”. “This is, in fact, what is meant by ‘machine understandable’ information, the ability for a machine to process information,” adds Konstantinou.
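The Socrates example can be sketched as a tiny forward-chaining reasoner in Python. This is only an illustration under stated assumptions: facts are stored as (subject, predicate, object) triples, "all men are mortal" is modelled as a `subclass_of` fact, and the two rules and the names `infer` and `ask` are invented here. Real semantic web systems use standard ontology languages and dedicated reasoners.

```python
# Stated facts as (subject, predicate, object) triples.
# Assumption: "all men are mortal" is modelled as Man subclass_of Mortal.
FACTS = {
    ("Socrates", "is_a", "Man"),
    ("Man", "subclass_of", "Mortal"),
}


def infer(facts):
    """Apply two simple rules repeatedly until no new facts appear."""
    known = set(facts)
    while True:
        new = set()
        for (s, p, o) in known:
            for (s2, p2, o2) in known:
                if o != s2:
                    continue
                # Rule 1: subclass_of is transitive.
                if p == "subclass_of" and p2 == "subclass_of":
                    new.add((s, "subclass_of", o2))
                # Rule 2: an instance of a class is an instance of its superclass.
                if p == "is_a" and p2 == "subclass_of":
                    new.add((s, "is_a", o2))
        if new <= known:  # nothing new was derived, so stop
            return known
        known |= new


def ask(facts, subject, predicate, obj):
    """Answer YES if the triple is present, otherwise UNKNOWN."""
    return "YES" if (subject, predicate, obj) in facts else "UNKNOWN"


print(ask(FACTS, "Socrates", "is_a", "Mortal"))         # without reasoning
print(ask(infer(FACTS), "Socrates", "is_a", "Mortal"))  # with reasoning

# "Tell me all the mortals in the system"
mortals = sorted(s for (s, p, o) in infer(FACTS) if p == "is_a" and o == "Mortal")
print(mortals)
```

Without the reasoning step, the query comes back UNKNOWN because nobody ever typed in "Socrates is mortal". With it, the derived fact is there to be found, and Socrates turns up in the list of mortals.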
Now…how do I apply that logic to scanning tables of contents for worthy news items?
Nikolaos Konstantinou, Dimitrios-Emmanuel Spanos, Periklis Stavrou, & Nikolas Mitrou (2010). Technically approaching the semantic web bottleneck. Int. J. Web Engineering and Technology, 6 (1), 83-111.