Thursday, December 18, 2008

An Interview with Dr. Rudi Studer on Semantic Search Technologies

DECEMBER 16, 2008

Dr. Rudi Studer is no stranger to the world of semantic search. A full professor in Applied Informatics at the University of Karlsruhe, Dr. Studer is also director of the Karlsruhe Service Research Institute, an interdisciplinary center designed to spur new concepts and technologies for a services-based economy. His areas of research include ontology management, semantic web services, and knowledge management. He is a past president of the Semantic Web Science Association and has served as Editor-in-Chief of the Journal of Web Semantics.

In addition to his duties as director of the KSRI, Dr. Studer is a vice president of the Semantic Technologies Institute International and helped found ontoprise GmbH, an enterprise software company built around deploying semantic technologies. Dr. Studer recently gave a talk at Yahoo! about semantic technologies, and he was kind enough to answer a set of follow-up questions about the future of semantic search.

Yahoo! (Y!): Could you please tell us about your research on semantic search at the University of Karlsruhe?

Rudi Studer (RS): We look at semantic search as a process of information access, where one or several activities can be supported by semantic technologies. These activities include preprocessing and extraction of information, the interpretation of user information needs, the actual query processing, the presentation of results, and finally the processing of user feedback to generate improved refinements for subsequent queries. Semantic technologies can be exploited in all of these steps. For example, with respect to interpreting user information needs, we work on techniques to automatically translate information needs, expressed either as natural language queries or as keyword-based queries, into expressive queries specified in structured query languages such as SPARQL.
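To make the keyword-to-SPARQL idea concrete, here is a minimal sketch; it is not the Karlsruhe group's actual technique. It assumes a hypothetical lexicon (`LEXICON`) that maps keywords to ontology classes or instances under a made-up `http://example.org/` namespace, and the function name `keywords_to_sparql` is likewise illustrative. Class keywords constrain the type of the result; instance keywords must be connected to the result by some property.

```python
from typing import Dict, List, Tuple

EX = "http://example.org/"  # hypothetical namespace, not a real ontology

# Hypothetical lexicon mapping keywords to ontology elements.
# A real system would derive this from the ontology's labels and the data.
LEXICON: Dict[str, Tuple[str, str]] = {
    "professor": ("class", EX + "Professor"),
    "karlsruhe": ("instance", EX + "Karlsruhe"),
    "ontologies": ("instance", EX + "Ontologies"),
}


def keywords_to_sparql(query: str) -> str:
    """Turn a keyword query into a SPARQL SELECT over a single variable ?x."""
    patterns: List[str] = []
    for i, word in enumerate(query.lower().split()):
        match = LEXICON.get(word)
        if match is None:
            continue  # unmapped keywords are simply dropped in this sketch
        kind, uri = match
        if kind == "class":
            # A class keyword constrains the type of the result.
            patterns.append(f"?x a <{uri}> .")
        else:
            # An instance keyword must be linked to the result by some property;
            # each pattern gets its own property variable so they stay independent.
            patterns.append(f"?x ?p{i} <{uri}> .")
    if not patterns:
        raise ValueError("no keyword could be mapped to the ontology")
    body = "\n  ".join(patterns)
    return f"SELECT DISTINCT ?x WHERE {{\n  {body}\n}}"


print(keywords_to_sparql("Professor Karlsruhe ontologies"))
```

A real translator would generate several candidate interpretations and rank them; this sketch commits to the single reading the lexicon allows.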

Y!: Early on, semantic technologies drew criticism for overestimating their own short-term impact and failing to embrace some of the realities of the Web. In what ways do you think the semantic web community has matured since then?

RS: It’s true that the Semantic Web community has put a lot of emphasis on semantics rather than on Web aspects. But it is important to note that semantic technologies are not only about the Web. Many of these technologies, e.g. in the context of Enterprise Information Integration, have indeed been successful in closed and controlled environments. Now we’re beginning to see these technologies applied more and more to open Web environments as well.

Of course, there have also been many developments that focus specifically on Web aspects. When Web 2.0 and Semantic Web technologies are combined, the Web is the central point. In terms of short-term impact, Web 2.0 has clearly overtaken the Semantic Web, but in the long run Semantic Web technologies have a lot to contribute. We see especially promising advances in developing and deploying lightweight semantic approaches.

Y!: In principle, semantic technologies should be able to help search engines more precisely match the user’s intent with the content on the page. But again, this has proven to be harder to realize than originally expected. Are we getting closer to the solution?

RS: No one ever said that it was going to be easy! But yes, we are getting closer. As I indicated before, many of the technologies today work well in closed environments (e.g. Enterprise scenarios), but do not necessarily scale to the Web (yet). But of course there is improvement on that side as well. Powerset (acquired by Microsoft this year), for example, is a good indicator of where we’re headed and certainly a proof point that we’re getting closer.

Y!: The semantic web suffers from a chicken-and-egg problem, where developers are unwilling to create applications due to a lack of metadata, and publishers are unwilling to expose metadata due to a lack of applications. What are some of the ways to break out of this deadlock?

RS: There are two solutions to this: First, we need to make it easier for publishers to produce semantic metadata and second, we need to make the benefits more obvious for the application developers.

With regard to the first aspect, a lot of data is already available in structured form (e.g. in the databases of the deep web) and is technically straightforward to expose as RDF. The Open Linked Data Initiative is a good example: a large number of data sources have already been published as RDF data. Then there is unstructured data. Technologies like semantic wikis (e.g. Semantic MediaWiki) allow semantic metadata to be constructed easily and seamlessly as the content is produced.
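Why is exposing database content as RDF "technically straightforward"? A minimal sketch makes the point: each row becomes a resource, and each non-null column becomes one triple in N-Triples syntax. The base namespace `http://example.org/`, the `vocab/table_column` naming scheme, and the function names are assumptions for illustration, not the convention of any particular tool or of the Open Linked Data Initiative.

```python
from typing import Iterator, Mapping
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for the published data
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value: object) -> str:
    """Escape a value for use inside an N-Triples string literal."""
    text = str(value)
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def row_to_ntriples(table: str, key: str, row: Mapping[str, object]) -> Iterator[str]:
    """Publish one row as N-Triples: one resource per row, one triple per non-null column.

    Table and column names are assumed to be URI-safe identifiers.
    """
    subject = f"<{BASE}{table}/{quote(str(row[key]), safe='')}>"
    yield f"{subject} <{RDF_TYPE}> <{BASE}vocab/{table}> ."
    for column, value in row.items():
        if column == key or value is None:
            continue  # the key is already in the subject URI; NULLs produce no triple
        yield f'{subject} <{BASE}vocab/{table}_{column}> "{escape_literal(value)}" .'


row = {"id": 1, "name": "Rudi Studer", "city": "Karlsruhe"}
for line in row_to_ntriples("person", "id", row):
    print(line)
```

Everything here is plain literals; linking a value like "Karlsruhe" to a shared URI used by other data sources is the step that turns exported data into Linked Data.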

The benefits of semantic metadata are becoming more and more obvious. At this year’s ISWC, the Billion Triple Challenge brought out a number of useful applications that show the benefits of intelligently combining existing Semantic Web data sources.
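Part of what makes combining sources easy is that RDF graphs which share URIs, and contain no blank nodes, can be merged by simple set union; a query can then follow links that no single source contains. The sketch below uses two small hypothetical sources with made-up `ex:` identifiers and illustrative function names; it is not taken from any Billion Triple Challenge entry.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Two hypothetical sources that agree on URIs such as ex:UniKarlsruhe.
source_a: Set[Triple] = {
    ("ex:KSRI", "ex:partOf", "ex:UniKarlsruhe"),
    ("ex:Karlsruhe", "ex:locatedIn", "ex:Germany"),
}
source_b: Set[Triple] = {
    ("ex:UniKarlsruhe", "ex:city", "ex:Karlsruhe"),
}

# Without blank nodes, merging RDF graphs is plain set union.
merged: Set[Triple] = source_a | source_b


def objects(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def country_of(graph: Set[Triple], org: str) -> Set[str]:
    """Follow partOf -> city -> locatedIn; this path only exists in the merged graph."""
    countries: Set[str] = set()
    for parent in objects(graph, org, "ex:partOf"):
        for city in objects(graph, parent, "ex:city"):
            countries |= objects(graph, city, "ex:locatedIn")
    return countries


print(country_of(merged, "ex:KSRI"))    # {'ex:Germany'}
print(country_of(source_a, "ex:KSRI"))  # set()
```

Graphs with blank nodes need their blank-node labels kept apart before the union, which this sketch does not handle.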

Y!: How do you think major search engines supporting semantic technologies might contribute to the growth of the semantic web?

RS: Once search engines index Semantic Web data, the benefits will be even more obvious and immediate to the end user. Yahoo!’s SearchMonkey is a good example of this. In turn, if there is a benefit for the end user, content providers will make their data available using Semantic Web standards.

Y!: What do you think are some of the commercial opportunities left to be explored by semantic technologies?

RS: So far, semantic technologies have been used in commercial products for data integration, enterprise semantic search, content management, and similar tasks. I expect this area to grow, but looking ahead I see more and more potential for business opportunities in combining the social web with semantic technologies, as well as in the context of mashups. Another area that is still largely unexplored is advertising in the context of semantic search.

Y!: What are some of the pitfalls that developers run into when they first start investigating or deploying semantic metadata?

RS: One problem in the early days was that tool support was not as mature as for other technologies. This has changed over the years, and we now have a stable tooling infrastructure. This also becomes apparent when looking at this year’s Semantic Web Challenge.

Another aspect is the complexity of some of the technologies. For example, understanding the foundations of languages such as OWL, which is based on Description Logics, is not trivial. At the same time, doing useful things does not require being an expert in logic: many things can already be done using only a small subset of the language features.
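As an illustration of how far a small subset goes, the sketch below implements only class hierarchies: it applies the RDFS entailment rules rdfs9 (an instance of a class is an instance of its superclasses) and rdfs11 (subClassOf is transitive) until nothing changes. The class names under `ex:` and the function name `infer_types` are hypothetical, and this is plain RDFS-style reasoning, not an OWL reasoner.

```python
from typing import Dict, Set


def infer_types(subclass_of: Dict[str, Set[str]],
                asserted: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Compute all rdf:type facts entailed by rdfs:subClassOf (rules rdfs9 and rdfs11)."""
    inferred = {inst: set(types) for inst, types in asserted.items()}  # leave input untouched
    changed = True
    while changed:  # iterate to a fixpoint; terminates because the sets only grow
        changed = False
        for types in inferred.values():
            supers: Set[str] = set()
            for cls in types:
                supers |= subclass_of.get(cls, set())
            if not supers <= types:
                types |= supers  # repeated passes follow subClassOf chains transitively
                changed = True
    return inferred


# Hypothetical class hierarchy and a single asserted type.
hierarchy = {
    "ex:FullProfessor": {"ex:Professor"},
    "ex:Professor": {"ex:Academic"},
    "ex:Academic": {"ex:Person"},
}
print(sorted(infer_types(hierarchy, {"ex:studer": {"ex:FullProfessor"}})["ex:studer"]))
```

Cycles in the hierarchy are harmless here: the loop stops as soon as a pass adds no new type.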

Y!: If you’re a front-end developer who’s interested in finding out more about semantic metadata, where should you get started?

RS: There are now numerous books out there, e.g. Antoniou/van Harmelen: A Semantic Web Primer; Davies et al. (eds.): Semantic Web Technologies; and Staab/Studer (eds.): Handbook on Ontologies. There is also a large collection of video lectures at

Of course, the W3C recommendations for RDF, OWL, and SPARQL are a useful reference. For inspiration, I recommend looking at some of the sites exploiting semantic technologies, e.g. semanticweb.org, Twine, or Freebase.

1 comment:

kidehen said...

Take two (i.e., I posted yesterday):

I would suggest you include DBpedia in the list of practical, live use cases that demonstrate the virtues of RDF-based Linked Data.