<function name="function_name"> function body </function>
Name | Required | Default | Description |
---|---|---|---|
name | yes | | The name of the user-defined function. |
<function name="download-multipage-list">
    <return>
        <!-- Loop while there is a next page, at most ${maxloops} times -->
        <while condition="${pageUrl.toString().trim() != ''}" maxloops="${maxloops}" index="i">
            <empty>
                <!-- Download the current page and convert it to XML -->
                <var-def name="content">
                    <html-to-xml>
                        <http url="${pageUrl}"/>
                    </html-to-xml>
                </var-def>

                <!-- Find the link to the next page, if there is one -->
                <var-def name="nextLinkUrl">
                    <xpath expression="${nextXPath}">
                        <var name="content"/>
                    </xpath>
                </var-def>

                <!-- Resolve the link against the current page URL -->
                <var-def name="pageUrl">
                    <template>${sys.fullUrl(pageUrl, nextLinkUrl)}</template>
                </var-def>
            </empty>

            <!-- Items found on the current page become the loop result -->
            <xpath expression="${itemXPath}">
                <var name="content"/>
            </xpath>
        </while>
    </return>
</function>

<var-def name="imgLinks">
    <call name="download-multipage-list">
        <call-param name="pageUrl">
            http://images.google.com/images?q=harvest&amp;hl=en&amp;btnG=Search+Images&amp;nojs=1
        </call-param>
        <call-param name="nextXPath">
            //a[@shape='rect' and .='Next']/@href
        </call-param>
        <call-param name="itemXPath">
            //img[contains(@src, 'images?q=tbn')]/@src
        </call-param>
        <call-param name="maxloops">
            5
        </call-param>
    </call>
</var-def>
Here the function named download-multipage-list is defined so that it can serve multiple extractions. It collects link URLs from a series of pages, using an XPath expression parameter to find the URL of the next page, if one exists. This situation is typical for lists of products or search results that span multiple web pages. The function is then called with specific parameters to collect image links from Google image search, limiting the number of pages fetched to 5.
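Because the page URL, both XPath expressions and the loop limit are all passed as parameters, the same function can presumably be reused for a different site without changing its body. The following is only a sketch of such a second call: the variable name productLinks, the example.com URL and both XPath expressions are hypothetical placeholders, not taken from a real site. It assumes that the download-multipage-list function has already been defined earlier in the same configuration.

```xml
<!-- Hypothetical second use of the same function. The URL and the two
     XPath expressions are illustrative placeholders only. -->
<var-def name="productLinks">
    <call name="download-multipage-list">
        <!-- First page of the (assumed) paginated product list -->
        <call-param name="pageUrl">http://www.example.com/products</call-param>
        <!-- Assumed markup: the "next page" link carries class="next" -->
        <call-param name="nextXPath">//a[@class='next']/@href</call-param>
        <!-- Assumed markup: each product is a link inside div.product -->
        <call-param name="itemXPath">//div[@class='product']//a/@href</call-param>
        <!-- Stop after at most 10 pages -->
        <call-param name="maxloops">10</call-param>
    </call>
</var-def>
```

Note that each call-param name has to match, including letter case, the variable name that the function body refers to (pageUrl, nextXPath, itemXPath, maxloops).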