{"id":26,"date":"2004-08-01T16:31:12","date_gmt":"2004-08-01T22:31:12","guid":{"rendered":"http:\/\/www.khaitan.org\/blog\/2004\/08\/the-vision-of-semantic-web-part-i-search-engines-and-web-content\/"},"modified":"2004-08-01T16:31:12","modified_gmt":"2004-08-01T22:31:12","slug":"the-vision-of-semantic-web-part-i-search-engines-and-web-content","status":"publish","type":"post","link":"https:\/\/www.khaitan.org\/blog\/2004\/08\/the-vision-of-semantic-web-part-i-search-engines-and-web-content\/","title":{"rendered":"The Vision of Semantic Web: Part I (Search Engines and Web content)"},"content":{"rendered":"<p><a href=\"http:\/\/www.m-w.com\/cgi-bin\/dictionary?book=Dictionary&#038;va=semantic\">Semantic<\/a> 1 : of or relating to meaning in language<br \/>\nThat&#8217;s the dictionary definition of Semantic. Applied to the Web, it means relating content by its meaning, that is, <i>semantically<\/i>, and not just by the literal words it contains. Let us take the example of a keyword search on Google. I type in <a href=\"http:\/\/www.google.com\/search?hl=en&#038;ie=UTF-8&#038;q=Blog\">Blog<\/a>, take a snapshot of the results and then key in <a href=\"http:\/\/www.google.com\/search?hl=en&#038;ie=UTF-8&#038;q=Weblog\">Weblog<\/a>. Only one result appears in the top 10 of both searches.<br \/>\nBlog and Weblog; don&#8217;t we use these interchangeably? Don&#8217;t they mean the same thing? Semantically, to a human&#8211;YES; to the search engine indexing the web content&#8211;NO. That&#8217;s exactly the vision of the Semantic Web: search engines, and information retrieval systems in general, that extract data the way humans do.<br \/>\nWell, in the above example of &#8220;Blog&#8221; vs. &#8220;Weblog&#8221;, it&#8217;s not entirely the search engine&#8217;s fault for failing to index the content in a desirable manner. To some extent the problem also lies in the HTML page, which uses the terms &#8220;Blog&#8221; and &#8220;Weblog&#8221; without indicating what they mean. 
What if the HTML page header declared that all the terms in the page conform to a certain taxonomy? This is not uncommon; it is exactly what we do with a DTD or an XML Schema document. Take, for example, the &lt;P&gt; tag. The tag is defined in the HTML DTD and is well understood by the browser&#8217;s parsing and rendering engine. A browser semantically understands this tag as&#8211;&#8220;the text which comes after this tag is a paragraph and should be rendered as such&#8221;. In the case of HTML the vocabulary is limited: a P tag is always a P tag. In the English language, however, a &#8220;Blog&#8221; is a &#8220;Weblog&#8221; which is an &#8220;Online Journal&#8221; which is&#8230; the list continues.<br \/>\nEstablishing these relationships is not trivial. A well-defined set of terms, related to one another as peers, parent and child nodes, and attributes, is essentially an ontology: a way of representing and conceptualizing knowledge.<br \/>\nOne good example of where this association works is a robot programmed to recognize fruits. The robot&#8217;s master writes the word &#8220;Mango&#8221; on the whiteboard. The robot quickly scans his ontology (assuming that the robot in our example uses an ontology for knowledge representation) for a match. He finds an exact match for the word M-A-N-G-O. Then he traverses: Mango &#8211;> Mangifera Indica (attribute type Scientific Name) &#8211;> Fruit (Parent node). The robot then concludes, &#8220;Mango is a Fruit&#8221;. But how does he find out whether the fruit is sweet or sour, is grown in a tropical climate, has a large seed, grows on trees, or is rich in Vitamin C, Folate, Selenium and Pantothenic Acid? The answer lies within the ontology, which could represent this extended knowledge as well.<br \/>\nGoing back to the search example, there are a couple of ways to solve this problem:<\/p>\n<ol>\n<li>While indexing the page, instead of indexing the terms, index the generic id as retrieved from a &#8220;super&#8221; ontology. 
The hard part is locating the ontology.\n<li>Let the web page authors expose the terms with some metadata around them. A hypothetical example:<br \/>\n&lt;p&gt;This is my &lt;so:onto id=&quot;757893&quot; contextid=&quot;222&quot;&gt;Weblog&lt;\/so:onto&gt;&lt;\/p&gt;\n<li>Convert the search term itself. For example, if I search for Weblog, two queries are made, one for &#8220;Blog&#8221; and one for &#8220;Weblog&#8221;, and the search results are de-duplicated and presented.\n<\/ol>\n<p>Some work is already being done in the <a href=\"http:\/\/tap.stanford.edu\/\">TAP Project<\/a>. TAP is the successor to Alpiri, which was founded by RV Guha and Rob McCool, the same people behind TAP.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Semantic 1 : of or relating to meaning in language That&#8217;s the dictionary definition of Semantic. When applied to the Web&#8211;it means content which is semantically related to the content. Let us take the example of a keyword search on Google. I type in Blog, take a snapshot of the results and then key in 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[9],"tags":[],"_links":{"self":[{"href":"https:\/\/www.khaitan.org\/blog\/wp-json\/wp\/v2\/posts\/26"}],"collection":[{"href":"https:\/\/www.khaitan.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.khaitan.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.khaitan.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.khaitan.org\/blog\/wp-json\/wp\/v2\/comments?post=26"}],"version-history":[{"count":0,"href":"https:\/\/www.khaitan.org\/blog\/wp-json\/wp\/v2\/posts\/26\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.khaitan.org\/blog\/wp-json\/wp\/v2\/media?parent=26"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.khaitan.org\/blog\/wp-json\/wp\/v2\/categories?post=26"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.khaitan.org\/blog\/wp-json\/wp\/v2\/tags?post=26"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}