BestPrix - A price comparison agent for WWW

Subject: ME768 : Project Report

Year/Sem: 1999-2000, IInd Semester

Instructor: Amitabha Mukherjee

Author: Apurva Sharma

CONTENTS:

Motivation

Relation To Past Work

Methodology

Results

Future Work

Bibliography/WebInfo

MOTIVATION

Computer technology has dramatically enhanced our ability to generate, deliver and store information. Unfortunately, our tools for locating, filtering, and analyzing information have not kept pace. A popular solution is intelligent agents.

An agent, here, means someone who acts on your behalf. Information agents are loosely analogous to travel agents, insurance agents, etc.

Software agent technology has developed rapidly in recent years. As a result, we now see search agents, auction agents, bidding agents, chat agents, etc. To quote http://www.siliconalleyreporter.com (23 December 1998):

Today the e-commerce companies are paying portals a heavy ransom for traffic. In the future, they will pay intelligent agent services (most of which will be owned by the portals) for traffic. The successful portals of the future will combine a trusted brand, strong editorial, and data leveraging services like comparison shopping.

The idea of a comparison-shopping agent came from my review of a paper on ShopBot (Doorenbos, Etzioni and Weld, 1997). The importance of Internet agents is stressed by the following quote:

From the consumer's point of view, it is virtually impossible to find the small set of pages that list a specific product for sale. From the vendor's point of view, it is extremely difficult to attract qualified buyers to their site.

The job of the proposed "BestPrix" agent would be to find the prices of a particular commodity at a few online vendors and return a comparison list to the user, thereby assisting him in his purchase.

RELATION TO PAST WORK

My work draws much inspiration from ShopBot, which was developed at the University of Washington to demonstrate a domain-independent comparison-shopping agent. Its commercial successor is Jango (http://jango.excite.com).

Bargain Finder Agent performs comparison shopping for music CDs.

Reel.com suggests movies from a specified period that are "similar" to a given movie.

Airfare.com finds the lowest fares in a market before you book.

Price-Search is similar to Jango: given a product, it searches its resources for the best price on offer.

Besides these, there are many other software agents for the Internet (called bots).

METHODOLOGY

The "BestPrix" agent has the following two problems to solve:

The Extraction Procedure Learning Problem

Given:

Incomplete domain model:

Example products: P₁, P₂, ..., Pₙ.
Attributes of the products (e.g., manufacturer(P₁) = Microsoft, name(P₁) = Encarta, ...).

The URL for the home page of a vendor.

Determine:

A procedure which accesses the vendor site to look for a given product and returns a set of strings, each corresponding to a product description returned by the vendor.
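The inputs of this learning problem can be written down as plain data types. The Python sketch below is only an illustration: the class names `Product` and `DomainModel`, the method `known_attributes` and the attribute names `artist` and `album` are hypothetical, while the attribute values are the examples used in this report.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """One example product P_i: attribute name -> attribute value."""
    attributes: dict[str, str]


@dataclass
class DomainModel:
    """The learner's input: a few example products and one vendor home page."""
    examples: list[Product]
    vendor_url: str

    def known_attributes(self) -> set[str]:
        """Every attribute name mentioned by at least one example product."""
        names: set[str] = set()
        for product in self.examples:
            names.update(product.attributes)
        return names


# Hypothetical instances built from the examples in the text.
encarta = Product({"manufacturer": "Microsoft", "name": "Encarta"})
big_ones = Product({"artist": "Aerosmith", "album": "Big Ones"})
music = DomainModel(examples=[big_ones], vendor_url="http://www.cdnow.com")
```

The set returned by `known_attributes` is what a learner would compare against the fields it finds in a vendor's forms.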

Example:

Artist e.g. Aerosmith
Album Title e.g. Big Ones
Song Title
Record Label
Video Title
Actor/Director

Procedure:

Artist

<html>
<body>
    <form name=search action="/cgi-bin/search.cgi">
    <b>Artist:</b> <input type=text name=ar>
    </form>
</body>
</html>

Each form is given a weight. Initially, the weight is set to the number of domain-model attributes for which a corresponding field is found in the form. The forms are then tried in decreasing order of weight. The learner fills in each form with attribute values from the example products it already knows and submits it to the vendor's site for processing. The returned pages are parsed and compared with the sample results, and the form whose results best match the samples is selected as the desired candidate.
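The weighting step can be sketched with Python's standard `html.parser`. This is a minimal illustration, not the BestPrix code: the report does not say how a form field is judged to "correspond" to an attribute, so the keyword table `ATTRIBUTE_KEYWORDS` and the substring test in `form_weight` are assumptions, and only text input fields are considered. Filling in the top-ranked forms, submitting them and comparing the answers with the sample results are left out because they need live vendor sites.

```python
from html.parser import HTMLParser

# Hypothetical keyword table: words suggesting that a form field is meant
# for a given domain attribute. Not part of the original report.
ATTRIBUTE_KEYWORDS = {
    "artist": {"artist"},
    "album": {"album"},
    "label": {"label"},
}


class FormCollector(HTMLParser):
    """Collects every <form> with its action, visible text and text-input names."""

    def __init__(self):
        super().__init__()
        self.forms = []        # list of {"action", "text", "inputs"} dicts
        self._current = None   # form currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "", "text": [], "inputs": []}
        elif tag == "input" and self._current is not None:
            kind = (attrs.get("type") or "text").lower()
            if kind == "text" and attrs.get("name"):
                self._current["inputs"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"].append(data.lower())


def form_weight(form, attributes):
    """Number of domain attributes for which the form appears to have a field."""
    haystack = " ".join(form["text"] + [name.lower() for name in form["inputs"]])
    return sum(
        1 for attr in attributes
        if any(word in haystack for word in ATTRIBUTE_KEYWORDS.get(attr, {attr}))
    )


def rank_forms(html, attributes):
    """Return the page's forms in decreasing order of weight."""
    collector = FormCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.forms, key=lambda f: form_weight(f, attributes), reverse=True)
```

Because Python's sort is stable, forms with equal weight keep their order of appearance on the page.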

Results are parsed as follows. Individual result fields are assumed to start at paragraph separators: <td>, <li>, <p>, <br> and <tr>. A sample result row from a CDnow search for Aerosmith is shown below.

<TR>
<TD></TD>
<TD VALIGN=TOP>
<FONT FACE="Verdana, Helvetica, Sans-Serif" SIZE="2" COLOR="#333333"></FONT></TD>
<TD><FONT FACE="Verdana, Helvetica, Sans-Serif" SIZE="2" COLOR="#333333">
<B><A href="/cgi-bin/mserver/SID=1061910195/pagename=/RP/CDN/FIND/album.html/artistid=AEROSMITH/itemid=298729">
Big Ones<IMG src="http://gs.cdnow.com/graphics/speaker.gif" ALT="Sound Sample" border=0 width=11 height=11></A></B><BR>1994<BR>
<FONT face='Verdana,Helvetica,Sans-Serif' size='1' color='#333333'><B>All Music Guide Pick: Best of Artist</B></FONT></TD>
<TD ALIGN=RIGHT><FONT FACE="Verdana, Helvetica, Sans-Serif" SIZE="2" COLOR="#333333">
<IMG  SRC="http://gs.cdnow.com/RP/GR/eng/30_off_right.gif" HEIGHT=11 WIDTH=104><BR>
CD <A href="/cgi-bin/mserver/SID=1061910195/pagename=/RP/CDN/ACCT/cart.html/widgetid=23784">
<FONT COLOR="#CC6600">$13.28</FONT></A><BR>Tape
<A href="/cgi-bin/mserver/SID=1061910195/pagename=/RP/CDN/ACCT/cart.html/widgetid=23785">
<FONT COLOR="#CC6600">$12.49</FONT></A><BR></FONT></TD>
</TR>

Source: http://www.cdnow.com
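The splitting rule can be sketched with two regular expressions. This is an illustrative Python version, assuming that only opening separator tags start a new field and that every other tag is simply dropped; the function name `split_fields` is hypothetical. Applied to a trimmed copy of the CDnow row above, it separates the album title, the year and the two format/price pairs.

```python
import re

# Tags the report treats as the start of a new result field.
SEPARATORS = ("td", "li", "p", "br", "tr")

# An opening separator tag such as <TD ALIGN=RIGHT>; \b keeps <pre> from matching <p>.
_SPLIT_RE = re.compile(r"<(?:%s)\b[^>]*>" % "|".join(SEPARATORS), re.IGNORECASE)
# Any remaining tag, which is replaced by a space.
_TAG_RE = re.compile(r"<[^>]+>")


def split_fields(html):
    """Split an HTML fragment at paragraph separators and return the
    non-empty plain-text pieces with whitespace normalised."""
    fields = []
    for chunk in _SPLIT_RE.split(html):
        text = " ".join(_TAG_RE.sub(" ", chunk).split())
        if text:
            fields.append(text)
    return fields


row = ('<TR><TD></TD><TD><B><A href="/x">Big Ones</A></B><BR>1994<BR></TD>'
       '<TD ALIGN=RIGHT>CD <A href="/y"><FONT COLOR="#CC6600">$13.28</FONT></A><BR>'
       'Tape <A href="/z"><FONT COLOR="#CC6600">$12.49</FONT></A><BR></TD></TR>')
print(split_fields(row))
# ['Big Ones', '1994', 'CD $13.28', 'Tape $12.49']
```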

Output:

The URL of a page containing a form for a searchable index.
A function mapping product attributes to fields of that form.
Functions for extracting product data from pages returned by the index:

A function that recognizes failure pages
A function that strips header and trailer information from successful pages.
A function that extracts a set of individual product descriptions from the remaining text on a successful page.
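These outputs can be bundled into a single object per vendor. The Python sketch below is an assumption about how the pieces fit together, not the report's actual data structure; `VendorProcedure` and its field names are hypothetical. The form action `/cgi-bin/search.cgi` and the field name `ar` come from the form fragment above, and GET is used because that form declares no `method`. Pairing them with the CDnow base URL, the "no matches" failure test and the comment markers used by `strip` are invented placeholders for what the learner would actually discover.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlencode, urljoin


@dataclass
class VendorProcedure:
    """Hypothetical container for what the learner outputs for one vendor."""
    form_url: str                         # URL of the page holding the search form
    form_action: str                      # the form's action attribute
    field_map: dict[str, str]             # domain attribute -> form field name
    is_failure: Callable[[str], bool]     # recognises failure pages
    strip: Callable[[str], str]           # removes header and trailer
    extract: Callable[[str], list[str]]   # splits the body into product descriptions

    def query_url(self, product: dict[str, str]) -> str:
        """GET URL for the form filled in with the product's known attributes."""
        params = {self.field_map[a]: v for a, v in product.items() if a in self.field_map}
        return urljoin(self.form_url, self.form_action) + "?" + urlencode(params)

    def descriptions(self, page: str) -> list[str]:
        """Apply the three learned extraction functions to a returned page."""
        if self.is_failure(page):
            return []
        return self.extract(self.strip(page))


# Placeholder instance; the markers below stand in for learned templates.
cdnow = VendorProcedure(
    form_url="http://www.cdnow.com/",
    form_action="/cgi-bin/search.cgi",
    field_map={"artist": "ar"},
    is_failure=lambda page: "no matches" in page.lower(),
    strip=lambda page: page.split("<!--results-->", 1)[-1].split("<!--end-->", 1)[0],
    extract=lambda body: [line.strip() for line in body.splitlines() if line.strip()],
)

print(cdnow.query_url({"artist": "Aerosmith", "album": "Big Ones"}))
# http://www.cdnow.com/cgi-bin/search.cgi?ar=Aerosmith
```

Attributes with no matching form field (here `album`) are simply left out of the query.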

Problem of finding the prices

At query time, the agent obtains a product request from the user via the GUI, goes in parallel to each online vendor's searchable index, and fills out and submits the learned form. For each resulting page that does not match the vendor's failure template, it strips off the header and trailer and looks in the remaining text for results, i.e. logical lines matching the learned product-description format. It then sorts the results in ascending order of price and generates a summary for the user.
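The query-time loop can be sketched with a thread pool. In this hypothetical Python version each vendor is represented by a plain function returning that vendor's product descriptions; in a real agent that function would submit the learned form and apply the failure, strip and extract functions. The dollar-amount regular expression, and reading only the first price in each description, are assumptions. A vendor whose search raises an exception is skipped rather than aborting the whole comparison.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Assumed price format: a dollar sign followed by digits, e.g. "$13.28".
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)")


def find_prices(product, vendors):
    """Query all vendors in parallel and return (price, vendor, description)
    tuples sorted in ascending order of price.

    `vendors` maps a vendor name to a function that takes the product request
    and returns that vendor's product descriptions (empty for a failure page).
    """
    results = []
    with ThreadPoolExecutor(max_workers=max(1, len(vendors))) as pool:
        futures = {name: pool.submit(search, product) for name, search in vendors.items()}
        for name, future in futures.items():
            try:
                descriptions = future.result()
            except Exception:
                continue  # an unreachable or broken vendor contributes nothing
            for description in descriptions:
                match = PRICE_RE.search(description)  # first price only
                if match:
                    results.append((float(match.group(1)), name, description))
    results.sort(key=lambda result: result[0])
    return results
```

Descriptions without any recognisable price are dropped, since they cannot be placed in the comparison list.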

RESULTS

I have been able to implement the basic architecture of the BestPrix agent. So far, only the music CD domain is supported, and two vendors are included: http://www.cdnow.com and http://www.cduniverse.com.

Some issues still remain to be looked into. These shortcomings are mainly due to the heuristics used to reduce the search space and to model the domain. As an example of a scenario not catered for by the present implementation:

Consider the music CD domain itself. If the first (key) search field is taken to be Record Label, the results look altogether different, because an extra level of indirection is added: the results are a list of the albums offered, and each of these has to be followed to reach the page that actually gives the price and other details. The current agent cannot do this, simply because its model does not include this case.

FUTURE WORK

Future work in this area can take two directions. One possibility is to strengthen the model to handle more complex scenarios. However, I believe this would complicate the design of the agent and adversely affect its efficiency and ease of use. That was why I rejected suggestions such as using a dictionary, word stemming or a human-administered learner.
The other direction is related to the current progress in XML. Trends suggest that future websites will increasingly be based on XML, which makes it easy for automated agents to parse and understand the structure of a website. So if someone designs a DTD and an associated automated agent capable of understanding it, and several vendors use that DTD to publish an agent-compliant image of their websites, web shoppers could use this agent to obtain the best prices for their chosen products automatically, and perhaps even make the purchase through the agent.
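To illustrate why a shared DTD would help, here is a Python sketch that reads vendor catalogues written against a purely hypothetical schema (`catalog` with a `vendor` attribute, `product`, `artist`, `title`, `price`); no such DTD exists in this project, and the sample catalogue is made up. Once the structure is fixed by the DTD, extraction reduces to a few `ElementTree` lookups, and no form, failure-page or header/trailer learning is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue following an imagined shared DTD.
SAMPLE = """<catalog vendor="example-vendor">
  <product>
    <artist>Aerosmith</artist>
    <title>Big Ones</title>
    <price currency="USD">13.28</price>
  </product>
</catalog>"""


def offers(xml_text, artist):
    """Return (price, vendor, title) for every product by `artist` in one catalogue."""
    root = ET.fromstring(xml_text)
    vendor = root.get("vendor", "unknown")
    found = []
    for product in root.iter("product"):
        price = product.findtext("price")
        if price is None:
            continue  # a product without a price cannot be compared
        if (product.findtext("artist") or "").strip().lower() == artist.lower():
            found.append((float(price), vendor, product.findtext("title")))
    return found


def best_prices(catalogues, artist):
    """Merge the offers from all vendors' catalogues, cheapest first."""
    merged = []
    for xml_text in catalogues:
        merged.extend(offers(xml_text, artist))
    return sorted(merged, key=lambda offer: offer[0])
```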

BIBLIOGRAPHY/WEBINFO

Bibliography

@Article{Doorenbos/Etzioni/Weld:1997,
  author=      { Doorenbos, Robert B. and Etzioni, Oren and Weld, Daniel S. },
  keywords=    { AGENTS WWW },
  institution= { UWASH-CSE },
  title=       { A Scalable Comparison-Shopping Agent for the WWW },
  journal=     { Autonomous Agents },
  year=        { 1997 },
  e-mail=      { bobd@cs.washington.edu, etzioni@cs.washington.edu, weld@cs.washington.edu },
  url=         { ftp://ftp.cs.washington.edu:21/pub/etzioni/softbots/agents97.ps },
  annote=      {
                 This paper describes a domain-independent comparison-shopping
                 agent named ShopBot. Given the home pages of several online
                 stores, ShopBot autonomously learns how to shop at those
                 vendors. After learning, it is able to speedily visit a dozen
                 software and CD vendors, extract product information, and
                 summarize the results for the user. ShopBot achieves this
                 without sophisticated natural language processing, and requires
                 only minimal knowledge about different product domains. Instead,
                 ShopBot relies on a combination of heuristic search, pattern
                 matching, and inductive learning techniques. ShopBot is unique
                 in its ability to learn to extract information from the
                 semi-structured text published by Web vendors.
                 The most important regularity exploited is that vendors
                 structure their store fronts for easy navigation and use a
                 uniform format for product descriptions.
                 Its major limitations are that it works only with stores that
                 provide a searchable index and that it relies heavily on HTML.

                 The paper clearly describes the heuristics used at different
                 stages, with justification. Results of trials of ShopBot
                 are also provided, which support the authors' claims about the
                 utility of the software. }

WebInfo