
Step-by-Step Guide on Creating Data Services Using Web Scraping

Introduction

WSO2 Data Services Server can scrape information from web pages and expose it as a service. This guide uses the Yahoo Weather Forecasts page as the source and extracts weather information from it.


Figure 1: Weather Forecast

Creating the Config


Before creating the service, create a configuration file that describes the web resource to scrape, and an XSLT file that defines how the scraped content is transformed into the response.

Sample Configuration File

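<?xml version="1.0" encoding="UTF-8"?>
<config>
 <!-- 'weatherInfo' is the variable the data service query reads as its Scraper Variable -->
 <var-def name='weatherInfo'>
  <xslt>
   <xml>
    <!-- Fetch the page and convert its HTML into well-formed XML -->
    <html-to-xml>
     <http method='get' url='http://weather.yahoo.com/'/>
    </html-to-xml>
   </xml>
   <stylesheet>
    <!-- XSLT file that shapes the scraped XML into the response -->
    <file path="/media/data/web/template.xsl"/>
   </stylesheet>
  </xslt>
 </var-def>
</config>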

Sample XSLT file

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
  <WeatherInfo>
   <xsl:for-each select="//div[@id='MediaWeatherFeaturedLocations']/div[@class='bd']/ul/li">
     <Weather>
      <Region><xsl:value-of select="div[@class='forecast']/h4"/></Region>
      <Temp><xsl:value-of select="div[@class='forecast']/p[@class='temp-f']/span[@class='now']"/></Temp>
     </Weather>
   </xsl:for-each>
  </WeatherInfo>
</xsl:template>
</xsl:stylesheet>
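
As a rough illustration of what this stylesheet produces, the value stored in the weatherInfo variable would have a shape like the sketch below. The region and temperature values are placeholders, not real scraped data, and the number of Weather elements depends on how many list items the page contains when it is fetched.

```xml
<!-- Sketch only: REGION_NAME_n and TEMPERATURE_n are placeholders -->
<WeatherInfo>
 <!-- one Weather element per li matched by the for-each -->
 <Weather>
  <Region>REGION_NAME_1</Region>
  <Temp>TEMPERATURE_1</Temp>
 </Weather>
 <Weather>
  <Region>REGION_NAME_2</Region>
  <Temp>TEMPERATURE_2</Temp>
 </Weather>
</WeatherInfo>
```

The element names WeatherInfo, Weather, Region and Temp are the ones that the query and its output mappings refer to in the later steps.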

Step 1: Start by giving a name


Begin creating the data service by clicking on the Data Service link under Services/Add in the left menu. You are asked to enter a name for the data service. Name it WeatherInfoService and click on "Next".


Figure 2: Create Data Service

Step 2: Enter details about your web data source


The second step is to enter details about the data source used to create the service. You can provide the configuration in one of two ways: save it in a file such as configuration.xml and provide the file path, or enter it as an inline configuration. Give a suitable data source name and select web data source as the data source type. This sample provides the above configuration using a file path, as shown in Figure 3.


Figure 3: Configure Web Source, by giving the file path


Figure 3.1: Configure Web Source, by inline configuration
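
For reference, a data source configured with a file path is expected to end up in the data service descriptor roughly as sketched below. Treat this as a sketch under assumptions: the id WeatherDS and the path /media/data/web/configuration.xml are hypothetical, and the property name web_harvest_config is recalled from the descriptor format rather than taken from this guide, so compare it with the XML shown under "Edit Data Service (XML Edit)" once the service is deployed.

```xml
<!-- Sketch only: the id and the file path are hypothetical -->
<config id="WeatherDS">
 <!-- path to the scraper configuration file created earlier -->
 <property name="web_harvest_config">/media/data/web/configuration.xml</property>
</config>
```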


Step 3: Create Query


To extract information, create a query along with the structure of the response.

Give the query a name, and select the data source created in the previous step from the drop-down list. The Scraper Variable must match the name of the variable that holds the output in the configuration. For example, here it should be 'weatherInfo', which is the var-def name in the configuration.

Enter a name for Grouped By Element and Row Name. You can also give a namespace if you wish. Click on Add Output Mappings to map the response to the output XML.


Figure 4: Create New Query

Click on the "Add Output Mapping" button to define how the output should look. Once you have entered the output mapping details, click on "Main Configuration" and then click on the "Save" button.


Figure 5: Add Output Mappings
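
The query and output mappings entered in this step could be expected to appear in the data service descriptor along the lines of the sketch below. The ids weatherQuery and WeatherDS are hypothetical, the xs:string types are an assumption, and the element names are recalled from the descriptor format, so verify them against the XML Edit view of the deployed service.

```xml
<!-- Sketch only: the query id and data source id are hypothetical -->
<query id="weatherQuery" useConfig="WeatherDS">
 <!-- must match the var-def name in the scraper configuration -->
 <scraperVariable>weatherInfo</scraperVariable>
 <!-- element = Grouped By Element, rowName = Row Name -->
 <result element="WeatherInfo" rowName="Weather">
  <!-- one output mapping per element produced by the stylesheet -->
  <element name="Region" column="Region" xsdType="xs:string"/>
  <element name="Temp" column="Temp" xsdType="xs:string"/>
 </result>
</query>
```

Here the Grouped By Element and Row Name reuse WeatherInfo and Weather from the sample XSLT file, and each output mapping reads a column named after the element that the stylesheet emits.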

The created query will be listed as follows. Click on "Next" to create the operation.


Figure 6: Created Query

Step 4: Create a web service operation


Enter a name for the operation and select the query from the drop down list. Click on "Save".


Figure 7: Add operation

Once you click on "Finish", your data service will be created and deployed.
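
An operation that simply calls a query would probably be represented in the data service descriptor as in the short sketch below. The operation name getWeatherInfo and the query id weatherQuery are hypothetical; use whatever names you entered in the wizard.

```xml
<!-- Sketch only: the operation name and query id are hypothetical -->
<operation name="getWeatherInfo">
 <!-- href refers to the id of the query selected from the drop-down list -->
 <call-query href="weatherQuery"/>
</operation>
```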

Step 5: Service deployed


You can see the deployed service by clicking on "List" under Manage/Services in the left menu. The deployed WeatherInfoService will be listed as follows.


Figure 8: Deployed Services

Click on "WeatherInfoService". You will be directed to the Service Dashboard (WeatherInfoService). Click on "Edit Data Service (XML Edit)" to view the created data service as XML.


Figure 9: Edit data service

Step 6: Try your service


Click on the "Try It" link to invoke the service.


Figure 10: Invoke service using Try-it