Subject: ME768: Project Report
Year/Sem: 1999-2000, IInd Semester
Instructor: Amitabha Mukherjee
Author: Apurva Sharma
An agent, here, means someone who acts on your behalf. Information agents are loosely analogous to travel agents, insurance agents, etc.
Software agent technology has seen a lot of development recently. As a result we see search agents, auction agents, bidding agents, chatting agents, etc. To quote from http://www.siliconalleyreporter.com (23 December 1998):
"Today the e-commerce companies are paying portals a heavy ransom for traffic. In the future, they will pay intelligent agent services (most of which will be owned by the portals) for traffic. The successful portals of the future will combine a trusted brand, strong editorial, and data leveraging services like comparison shopping."
From my review of a paper on ShopBot, I got the idea for a comparison-shopping agent. The importance of Internet agents is stressed by the following quote:
"From the consumer's point of view, it is virtually impossible to find the small set of pages that list a specific product for sale. From the vendor's point of view, it is extremely difficult to attract qualified buyers to their site."
The job of the proposed "BestPrix" agent would be to find the prices of a particular commodity from a few online vendors and return a comparison list to the user, thereby assisting with the purchase.
Bargain Finder Agent helps with comparison shopping for music CDs.
Reel.com suggests movies "similar" to the one specified from a specified period.
Airfare.com finds the lowest fares in a market before you book.
Price-Search is similar to Jango. Given a product, it searches its resources for the best price on offer.
Besides these, there are many other software agents for the Internet (called bots).
A procedure which accesses the vendor site to look for a given product and returns a set of strings, each corresponding to a product description returned by the vendor.
Example:
<html>
<body>
<form name=search action="/cgi-bin/search.cgi">
<b>Artist:</b> <input type=text name=ar>
</form>
</body>
</html>
The tree is actually of depth only 3. The HTML tag is the root. All forms come at the level below it. Each form has its structure at the level below that. Each node houses its attributes. We need this structure to extract some contextual information. For instance, from the text preceding the text field in the above example we conclude that this field is meant for the artist name. An analysis of the form tag also suggests that the only meaningful sub-tags of a search form are the text field, checkbox, radio button, and select box. Password fields, textareas, and others are irrelevant, and forms containing these can be safely rejected. Of the remaining forms we still have to figure out the most suitable candidate.
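The form analysis described above might be sketched as follows. This is only an illustration using Python's standard `html.parser`, not the agent's actual code; the class name `FormCollector`, the helper `extract_search_forms`, and the decision to silently ignore submit and hidden inputs are assumptions made for the example.

```python
from html.parser import HTMLParser

# Input types counted as meaningful search fields (<select> is handled separately).
SEARCH_INPUT_TYPES = {"text", "checkbox", "radio"}
# Assumption: only these cause a form to be rejected; other inputs are ignored.
REJECTING_INPUT_TYPES = {"password"}
REJECTING_TAGS = {"textarea"}


class FormCollector(HTMLParser):
    """Collects candidate search forms with their fields and preceding text."""

    def __init__(self):
        super().__init__()
        self.forms = []     # accepted candidate forms
        self._form = None   # form currently being parsed, if any
        self._text = ""     # text seen since the last field (used as its label)

    def _add_field(self, name, kind):
        label = self._text.strip().rstrip(":").strip()
        self._form["fields"].append({"name": name, "type": kind, "label": label})
        self._text = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._form = {"attrs": attrs, "fields": [], "rejected": False}
            self._text = ""
        elif self._form is not None:
            if tag == "input":
                kind = (attrs.get("type") or "text").lower()
                if kind in REJECTING_INPUT_TYPES:
                    self._form["rejected"] = True
                elif kind in SEARCH_INPUT_TYPES:
                    self._add_field(attrs.get("name"), kind)
            elif tag == "select":
                self._add_field(attrs.get("name"), "select")
            elif tag in REJECTING_TAGS:
                self._form["rejected"] = True

    def handle_data(self, data):
        if self._form is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "form" and self._form is not None:
            if not self._form["rejected"]:
                self.forms.append(self._form)
            self._form = None


def extract_search_forms(html):
    """Return the forms that survive the rejection rule, in document order."""
    collector = FormCollector()
    collector.feed(html)
    collector.close()
    return collector.forms
```

Applied to the example form above, this would yield one form with a single text field named `ar` whose label is the preceding text, "Artist".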
Each form is given a weight attribute. Initially the weight is set to the number of attributes from the domain model for which corresponding fields are found in the form. The forms are then tried out in decreasing order of weight. The learner fills in each form by matching its fields against the sample result it already has and submits it to the host site for processing. The returned results are parsed and matched against the sample results. The form whose results best match the sample is selected as the desired candidate.
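A minimal sketch of the initial weighting and ordering step, under stated assumptions: the attribute list `DOMAIN_ATTRIBUTES`, the representation of a form as a dictionary with a `labels` list, and the case-insensitive substring match are all hypothetical choices for illustration. The later step of submitting each form and comparing results is not shown.

```python
# Hypothetical domain model for the music CD domain.
DOMAIN_ATTRIBUTES = ("artist", "album", "song")


def form_weight(field_labels, domain_attributes=DOMAIN_ATTRIBUTES):
    """Count the domain attributes for which some field label matches."""
    labels = [label.lower() for label in field_labels]
    return sum(
        1 for attr in domain_attributes
        if any(attr in label for label in labels)
    )


def rank_forms(forms):
    """Order forms by decreasing weight; ties keep their original order."""
    return sorted(forms, key=lambda f: form_weight(f["labels"]), reverse=True)


candidates = [
    {"name": "newsletter", "labels": ["Email"]},
    {"name": "quick", "labels": ["Artist:"]},
    {"name": "advanced", "labels": ["Artist name", "Album title"]},
]
ordered = [f["name"] for f in rank_forms(candidates)]
```

Here the form with fields for both artist and album would be tried first, and the form with no recognised field last.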
Parsing of results works as follows. Individual result fields are assumed to start from paragraph separators: <td>, <li>, <p>, <br>, <tr>.
<TR>
<TD></TD>
<TD VALIGN=TOP>
<FONT FACE="Verdana, Helvetica, Sans-Serif" SIZE="2" COLOR="#333333"></FONT></TD>
<TD><FONT FACE="Verdana, Helvetica, Sans-Serif" SIZE="2" COLOR="#333333">
<B><A href="/cgi-bin/mserver/SID=1061910195/pagename=/RP/CDN/FIND/album.html/artistid=AEROSMITH/itemid=298729">
Big Ones<IMG src="http://gs.cdnow.com/graphics/speaker.gif" ALT="Sound Sample" border=0 width=11 height=11></A></B><BR>1994<BR>
<FONT face='Verdana,Helvetica,Sans-Serif' size='1' color='#333333'><B>All Music Guide Pick: Best of Artist</B></FONT></TD>
<TD ALIGN=RIGHT><FONT FACE="Verdana, Helvetica, Sans-Serif" SIZE="2" COLOR="#333333">
<IMG SRC="http://gs.cdnow.com/RP/GR/eng/30_off_right.gif" HEIGHT=11 WIDTH=104><BR>
CD <A href="/cgi-bin/mserver/SID=1061910195/pagename=/RP/CDN/ACCT/cart.html/widgetid=23784">
<FONT COLOR="#CC6600">$13.28</FONT></A><BR>Tape
<A href="/cgi-bin/mserver/SID=1061910195/pagename=/RP/CDN/ACCT/cart.html/widgetid=23785">
<FONT COLOR="#CC6600">$12.49</FONT></A><BR></FONT></TD>
</TR>
The above hypertext shows the search results from http://www.cdnow.com for the Aerosmith album Big Ones. The positions of the individual product attributes are identified in the search result, and a regular expression is emitted that consists of the result hypertext with the attribute values replaced by symbolic names for the product attributes. The shopper uses this by replacing the symbolic names with actual product attributes to find other products. An error page is also obtained and saved for each vendor. This allows the shopper to recognize when it has failed to find a particular product on the vendor's website.
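The idea of turning a sample result into a reusable pattern can be illustrated roughly as below. This is a simplified sketch, not the learner's actual procedure: the function name `learn_template` is hypothetical, the sample line is a shortened stand-in for the CDnow markup, and the symbolic names are expressed as named regular-expression groups rather than substituted strings.

```python
import re


def learn_template(sample, attribute_values):
    """Build a regex from one sample result line.

    Each known attribute value in `sample` is replaced by a named group;
    the surrounding hypertext is kept as literal text. Attribute names
    must be valid Python identifiers.
    """
    spans = []
    for name, value in attribute_values.items():
        start = sample.find(value)  # first occurrence only
        if start == -1:
            raise ValueError(f"value for {name!r} not found in sample")
        spans.append((start, start + len(value), name))
    spans.sort()

    parts, pos = [], 0
    for start, end, name in spans:
        if start < pos:
            raise ValueError("attribute values overlap in the sample")
        parts.append(re.escape(sample[pos:start]))
        parts.append(f"(?P<{name}>[^<]+?)")  # value runs up to the next tag
        pos = end
    parts.append(re.escape(sample[pos:]))
    return re.compile("".join(parts))


# Shortened, hypothetical result line modelled on the listing above.
sample = '<B>Big Ones</B><BR>1994<BR>CD <FONT COLOR="#CC6600">$13.28</FONT>'
template = learn_template(
    sample, {"title": "Big Ones", "year": "1994", "price": "13.28"}
)

other = '<B>Get a Grip</B><BR>1993<BR>CD <FONT COLOR="#CC6600">$11.99</FONT>'
match = template.search(other)
```

A literal template like this breaks as soon as item-specific markup (such as the `itemid` in the links above) appears between the attributes, so a practical learner would also have to generalise those parts.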
The shopper obtains the request for a product from the user via the GUI, goes in parallel to each online vendor's searchable index, and fills out and submits the forms. For each resulting page that does not match the vendor's failure template, it strips off the header and trailer and looks in the remaining code for any results, that is, any logical lines matching the learned product description format. It then sorts the results in ascending order of price and generates a summary for the user.
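The shopper's result handling might look roughly like the sketch below. It assumes the pages have already been fetched (the parallel form submission is omitted), compares against the failure page by exact equality, and skips header and trailer stripping; the function names, the vendor dictionary layout, and the per-vendor patterns are all hypothetical.

```python
import re

# Paragraph separators at which a new logical line is assumed to start.
SEPARATORS = re.compile(r"<(?:td|li|p|br|tr)\b[^>]*>", re.IGNORECASE)


def logical_lines(html):
    """Split a page into logical lines at the paragraph separators."""
    return [chunk.strip() for chunk in SEPARATORS.split(html) if chunk.strip()]


def extract_offers(vendor, page, failure_page, pattern):
    """Return the offers found on one vendor page, or [] for a failure page."""
    if page.strip() == failure_page.strip():  # simplification: exact match
        return []
    offers = []
    for line in logical_lines(page):
        m = pattern.search(line)
        if m:
            offers.append({
                "vendor": vendor,
                "title": m.group("title").strip(),
                "price": float(m.group("price")),
            })
    return offers


def compare(pages, vendors):
    """Collect offers from all vendors and sort them by ascending price."""
    offers = []
    for name, page in pages.items():
        info = vendors[name]
        offers.extend(
            extract_offers(name, page, info["failure_page"], info["pattern"])
        )
    return sorted(offers, key=lambda offer: offer["price"])
```

Each entry in `vendors` is expected to hold a compiled `pattern` with `title` and `price` groups and the saved `failure_page` for that vendor.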
There are things still remaining to be looked into. These shortcomings are mainly due to the heuristics used to reduce the search space and model the domain. To give an example of a scenario not catered for by the present agent implementation:
Consider the music CD domain itself. If the first search field (the key one) is taken to be the record label, the results give an altogether different picture: an extra level of indirection is added. The results give a list of the albums offered, and one has to follow each of these to reach the actual page from which the price and other details can be obtained. The current agent cannot do so, simply because its model does not include this.
@Article{Doorenbos/Etzioni/Weld:1997,
  author = {Doorenbos, Robert B. and Etzioni, Oren and Weld, Daniel S.},
  keywords = {AGENTS WWW},
  institution = {UWASH-CSE},
  title = {A Scalable Comparison-Shopping Agent for the WWW},
  journal = {Autonomous Agents},
  year = {1997},
  e-mail = {bobd@cs.washington.edu, etzioni@cs.washington.edu, weld@cs.washington.edu},
  url = {ftp://ftp.cs.washington.edu:21/pub/etzioni/softbots/agents97.ps},
  annote = {This paper describes a domain-independent comparison-shopping agent named ShopBot. Given the home pages of several online stores, ShopBot autonomously learns how to shop at those vendors. After learning, it is able to speedily visit a dozen software and CD vendors, extract product information, and summarize the results for the user. ShopBot achieves this without sophisticated natural language processing, and requires only minimal knowledge about different product domains. Instead ShopBot relies on a combination of heuristic search, pattern matching, and inductive learning techniques. ShopBot is unique in its ability to learn to extract information from the semi-structured text published by Web vendors. The most important regularity exploited is that vendors structure their store fronts for easy navigation and use a uniform format for product descriptions. The major limitations are that it works only with stores that provide a searchable index and that it relies heavily on HTML. The paper clearly describes the heuristics used at different stages, with justification. Results of trials of ShopBot are also provided, which support the authors' claims about the utility of the software.}
}