Introduction to the Semantic Web: Concept and Implementation in XHTML


19 Mar  

Assume that you want to know about the python snake, so you go to Google and type ‘python’ as the search query. Some of the results will be about the programming language ‘Python’. However good a search engine Google is, it can’t tell precisely which pages are about the snake. Google does use many algorithms to find relationships between words in a document and to group related results together, which is why it offers suggestions when you enter ‘python’ as the search query.

This problem can be solved only if computers can understand human language. Natural language processing (NLP) algorithms are employed in many popular search engines, but these tools still cannot reliably resolve this kind of ambiguity.

This is why a semantic web matters: a web in which computers can understand relationships between words and objects. This edition is about the idea behind semantic networks and their implementation.

 

RDF

Resource Description Framework (RDF) is a data model standardized by the World Wide Web Consortium (W3C) for describing resources in a machine-readable way, which is what makes semantic networks possible. We can say that the idea has its roots in tagging. Here is a sample RDF description:

 

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="http://aasisvinayak.com/">
        <dc:title>Aasis Vinayak</dc:title>
        <dc:publisher>aasisvinayak.com</dc:publisher>
    </rdf:Description>
</rdf:RDF>

 

This states that ‘Aasis Vinayak’ is the title of the page at that URL and that its publisher is aasisvinayak.com. It is a very simple case, but it conveys the idea to a computer (or a search engine crawler) in a semantic way.
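To see how a program might read such a description, here is a minimal sketch in Python using only the standard library. It is not a full RDF parser: it assumes the simple shape shown above (one rdf:Description with literal child elements), and the function name read_descriptions is made up for this illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The sample description from this post.
RDF_XML = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="http://aasisvinayak.com/">
        <dc:title>Aasis Vinayak</dc:title>
        <dc:publisher>aasisvinayak.com</dc:publisher>
    </rdf:Description>
</rdf:RDF>"""


def read_descriptions(rdf_xml):
    """Return (subject, property URI, value) tuples from simple RDF/XML.

    Hypothetical helper: handles only rdf:Description elements whose
    children are plain literal properties.
    """
    root = ET.fromstring(rdf_xml)
    statements = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for child in desc:
            # ElementTree stores qualified names as "{namespace}local".
            namespace, local = child.tag[1:].split("}")
            statements.append((subject, namespace + local, child.text))
    return statements


statements = read_descriptions(RDF_XML)
for statement in statements:
    print(statement)
```

Each printed tuple pairs the page URL with one property and its value, which is the same information a crawler would take from the description.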

 

RDFa

 

RDF is only an abstract model; to implement the idea inside a web page we can use RDFa (Resource Description Framework in Attributes), which expresses RDF through attributes on XHTML elements. Two concepts are useful for understanding RDFa:

 

CURIE: You might know that URI stands for Uniform Resource Identifier. The most common kind of URI is the URL (Uniform Resource Locator), which is the address of a web page. You may have noticed that many URLs are very long and hard to remember (like this one: http://techblog.aasisvinayak.com/a-twitter-application-using-java-and-swing-tutorial/ ). In RDFa, we use an abbreviated form of URI called a Compact URI (CURIE):

 

foaf:name

where the first part (the prefix, here ‘foaf’) is shorthand for another URI, and the part after the colon is appended to that URI to form the full identifier.
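The expansion of a CURIE can be sketched in a few lines of Python. This is only an illustration: the function name expand_curie and the PREFIXES table are made up here, and the ‘foaf’ prefix is mapped to the FOAF namespace URI that is used later in this post.

```python
# Prefix table: each prefix stands for a longer URI.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' into a full URI using the prefix table."""
    prefix, separator, reference = curie.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference


expanded = expand_curie("foaf:name", PREFIXES)
print(expanded)  # http://xmlns.com/foaf/0.1/name
```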

 

[Figure: CURIE used in RDFa]

 

N3 notation or Notation3: This is a compact text notation for RDF in which every statement is divided into three parts: subject, predicate and object (just like we did in 1st grade!). Such a statement is called a triple, and RDFa encodes the same triples in attributes. Let’s take an example:

 

Vinayak loves GNU

Here ‘Vinayak’ is the subject, ‘loves’ is the predicate and ‘GNU’ is the object.

In N3, this is expressed as:

 

@prefix pref: <http://aasisvinayak.com/vocabulary#> .

<#vinayak> pref:loves <#GNU> .

 

@prefix declares a CURIE prefix: here ‘pref’ stands for the URI given in angle brackets. Every statement ends with a ‘.’ (period). Here is another example expressed in tree form:

 

 

 

[Figure: semantic tree]
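A subject-predicate-object statement such as ‘Vinayak loves GNU’ maps naturally onto a three-field record. The sketch below is one possible way to hold it in Python; the Triple class and the to_n3 helper are made up for this illustration, and the output spells the predicate URI out in full instead of using a prefix.

```python
from typing import NamedTuple

# Vocabulary URI from the example in this post.
VOCAB = "http://aasisvinayak.com/vocabulary#"


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


def to_n3(triple):
    """Write a triple as one N3 statement with full URIs in angle brackets."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.object}> ."


statement = Triple("#vinayak", VOCAB + "loves", "#GNU")
line = to_n3(statement)
print(line)  # <#vinayak> <http://aasisvinayak.com/vocabulary#loves> <#GNU> .
```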

 

Vocabulary

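The previous example made use of a ‘vocabulary’: a set of predefined terms, published at a URI, that is used to describe a subject or an object. Say, if ‘vinayak’ is linked to a value through the ‘name’ term of a vocabulary, it means that the value is its name. Here is a more concrete example using the FOAF (Friend of a Friend) vocabulary: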

 

@prefix pref: <http://xmlns.com/foaf/0.1/> .

<#vinayak> pref:name "Aasis Vinayak" .

 

This means that the resource ‘vinayak’ has a name, and that the complete name is ‘Aasis Vinayak’.

 

How to implement this?

 

We have seen the syntax for writing RDF statements; now let’s see how to put this into practice. For this I’m going to reuse the previous example. In XHTML, I can write:

 

<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <span about="#vinayak" property="foaf:name">
        Aasis Vinayak
    </span>
</body>

 

This represents the same statement. If I change the human-readable part to:

 

<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <span about="#vinayak" property="foaf:name">
        Aasis Vinayak   PG
    </span>
</body>

 

The machine will still understand the idea – that ‘Aasis Vinayak   PG’ is the full name.

There is another attribute called ‘typeof’, which can be used to specify what kind of thing the subject is:

 

<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <span about="#vinayak" typeof="foaf:Person" property="foaf:name">
        Aasis Vinayak       
    </span>
</body>

 

This means that the resource ‘#vinayak’ is a person and that ‘Aasis Vinayak’ is its name. In short, ‘Aasis Vinayak’ is the ‘name of a person’. In this way, a computer will ‘understand’ the idea.
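To make the extraction step concrete, here is a minimal sketch of how a crawler might pull statements out of the markup above, using Python’s standard html.parser module. It is not a conforming RDFa processor: the SimpleRDFaParser class is made up for this illustration, it handles only flat elements carrying ‘about’ together with ‘typeof’, ‘property’ or ‘rel’/‘resource’, it leaves relative URIs such as ‘#vinayak’ unresolved, and it collapses whitespace in literals for readability.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

DOC = """<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <span about="#vinayak" typeof="foaf:Person" property="foaf:name">
        Aasis Vinayak
    </span>
</body>"""


class SimpleRDFaParser(HTMLParser):
    """Collects triples from the small RDFa subset used in this post."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}    # prefix -> URI, from xmlns:* attributes
        self.triples = []     # (subject, predicate, object) tuples
        self._pending = None  # (subject, predicate) waiting for its text
        self._text = []

    def _expand(self, curie):
        # Replace a known prefix with its URI; leave unknown ones untouched.
        prefix, _, reference = curie.partition(":")
        return self.prefixes.get(prefix, prefix + ":") + reference

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        subject = attrs.get("about")
        if subject is None:
            return
        if "typeof" in attrs:
            self.triples.append((subject, RDF_TYPE, self._expand(attrs["typeof"])))
        if "rel" in attrs and "resource" in attrs:
            self.triples.append((subject, self._expand(attrs["rel"]), attrs["resource"]))
        if "property" in attrs:
            # The object is the element's text, which arrives in handle_data.
            self._pending = (subject, self._expand(attrs["property"]))
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            subject, predicate = self._pending
            literal = " ".join("".join(self._text).split())
            self.triples.append((subject, predicate, literal))
            self._pending = None


parser = SimpleRDFaParser()
parser.feed(DOC)
parser.close()
triples = parser.triples
for triple in triples:
    print(triple)
```

The run yields two statements: one saying that ‘#vinayak’ is of type foaf:Person, and one giving its foaf:name.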

Let’s go one step further and describe a relationship between two people, which is what the FOAF (‘friend of a friend’) vocabulary is designed for. Say,

Jane knows Mac

[Figure: Jane knows Mac, shown as a semantic relationship]

 

(I assume that you now know how to divide this sentence into subject, predicate and object.) In RDFa, it can be expressed in the following way:

 

<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <span about="#jane" typeof="foaf:Person" property="foaf:name">
       Jane Blah      
    </span>
    <span about="#mac" typeof="foaf:Person" property="foaf:name">
       Mac Blah Blah     
    </span>   
    <span about="#jane" rel="foaf:knows" resource="#mac">
      Jane knows Mac   
    </span>    
</body>

 

Here you can see that we first give the full names of two people; the third span then links them using the relationship ‘foaf:knows’, where the ‘rel’ attribute names the predicate and ‘resource’ names the object. Now the computer ‘knows’ that ‘Jane knows Mac’.
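Once these statements have been extracted, a program can answer questions by matching patterns against them. The sketch below writes the five triples from the markup above by hand and looks up whom Jane knows; the helper names match and names_known_by are made up for this illustration, and a real application would more likely use an RDF library and a query language such as SPARQL.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The statements carried by the three spans, written out by hand.
TRIPLES = [
    ("#jane", RDF_TYPE, FOAF + "Person"),
    ("#jane", FOAF + "name", "Jane Blah"),
    ("#mac", RDF_TYPE, FOAF + "Person"),
    ("#mac", FOAF + "name", "Mac Blah Blah"),
    ("#jane", FOAF + "knows", "#mac"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def names_known_by(triples, person):
    """Follow foaf:knows from a person, then look up each foaf:name."""
    known = [o for _, _, o in match(triples, subject=person, predicate=FOAF + "knows")]
    return [
        name
        for resource in known
        for _, _, name in match(triples, subject=resource, predicate=FOAF + "name")
    ]


known_names = names_known_by(TRIPLES, "#jane")
print(known_names)  # ['Mac Blah Blah']
```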

Now imagine if we did this on every page: Google could then tell precisely which pages talk about the python snake!
