org.encog.bot.browse
Class LoadWebPage

java.lang.Object
  extended by org.encog.bot.browse.LoadWebPage

public class LoadWebPage
extends Object

Loads a web page by reading its HTML and generating the corresponding DocumentRange objects.

Author:
jheaton

Constructor Summary
LoadWebPage(URL theBase)
          Construct a web page loader with the specified base URL.
 
Method Summary
 int findEndTag(int index, Tag tag)
          Find the end tag that matches the beginning tag.
 WebPage load(InputStream is)
          Load a web page from the specified stream.
 WebPage load(String str)
          Load the web page from a string that contains HTML.
protected  void loadContents()
          Load the contents of the web page from the data units, which must already have been loaded.
protected  void loadDataUnits(InputStream is)
          Load the data units.
protected  void loadForm(int index, Tag tag)
          Called by loadContents to load a form on the page.
protected  void loadInput(int index, Tag tag)
          Called by loadContents to load an input tag on the form.
protected  void loadLink(int index, Tag tag)
          Called by loadContents to load a link on the page.
protected  void loadTitle(int index, Tag tag)
          Called by loadContents to load the title of the page.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LoadWebPage

public LoadWebPage(URL theBase)
Construct a web page loader with the specified base URL.

Parameters:
theBase - The base URL to use when loading.
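
A base URL is typically what lets relative links found in a page be turned into absolute addresses. The following is a minimal sketch of that idea using only the standard library; it assumes (the page above does not state this) that the base is used for resolving relative hrefs. BaseUrlSketch and resolve are hypothetical names, not part of this class, and the sketch uses java.net.URI instead of the URL type the constructor accepts.

```java
import java.net.URI;

final class BaseUrlSketch {
    // Hypothetical helper: resolves an href found in a page against a base address.
    // Not part of LoadWebPage; shown only to illustrate what a base URL is for.
    static String resolve(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/docs/index.html";
        // Relative to the base document's directory.
        System.out.println(resolve(base, "page2.html"));
        // Relative to the host root.
        System.out.println(resolve(base, "/about.html"));
        // Already absolute, so the base is ignored.
        System.out.println(resolve(base, "http://other.example.org/"));
    }
}
```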

Method Detail

findEndTag

public final int findEndTag(int index,
                            Tag tag)
Find the end tag that matches the beginning tag.

Parameters:
index - The index at which to start the search; this identifies the starting data unit.
tag - The beginning tag that we are seeking the end tag for.
Returns:
The index at which the ending tag was found, or -1 if no matching end tag is found.
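
The contract above (scan forward from a starting data unit, return the index of the matching end tag, or -1) can be illustrated with a small depth-counting search. This is a sketch under stated assumptions, not this class's implementation: data units are modeled as plain strings, EndTagSketch is a hypothetical name, and the handling of nested tags of the same name is an assumption about what "matching" means.

```java
import java.util.Arrays;
import java.util.List;

final class EndTagSketch {
    // Hypothetical model: each data unit is a string such as "<div>", "</div>", or text.
    // Assumes the unit at 'index' is the begin tag whose end tag is being sought.
    static int findEndTag(List<String> units, int index, String tagName) {
        String open = "<" + tagName + ">";
        String close = "</" + tagName + ">";
        int depth = 0;
        for (int i = index; i < units.size(); i++) {
            String unit = units.get(i);
            if (unit.equals(open)) {
                depth++;                 // entering a (possibly nested) tag of the same name
            } else if (unit.equals(close)) {
                depth--;                 // leaving one level
                if (depth == 0) {
                    return i;            // this end tag closes the starting tag
                }
            }
        }
        return -1;                       // no matching end tag
    }

    public static void main(String[] args) {
        List<String> units = Arrays.asList(
                "<div>", "text", "<div>", "inner", "</div>", "</div>");
        System.out.println(findEndTag(units, 0, "div"));               // 5
        System.out.println(findEndTag(units, 2, "div"));               // 4
        System.out.println(findEndTag(units.subList(0, 4), 0, "div")); // -1
    }
}
```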

load

public final WebPage load(InputStream is)
Load a web page from the specified stream.

Parameters:
is - The input stream to load from.
Returns:
The loaded web page.

load

public final WebPage load(String str)
Load the web page from a string that contains HTML.

Parameters:
str - A string containing HTML.
Returns:
The loaded WebPage.
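
A String overload alongside an InputStream overload is commonly implemented by wrapping the string in an in-memory stream and delegating. The sketch below shows that pattern with the standard library only. Whether this class does the same is an assumption; StringLoadSketch is a hypothetical name, the stand-in load methods return the text read rather than a WebPage, and the UTF-8 encoding is chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StringLoadSketch {
    // Stand-in for the stream overload: reads the whole stream and returns its text.
    static String load(InputStream is) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = is.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for the String overload: wraps the text in an in-memory stream and delegates.
    static String load(String str) {
        return load(new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo</title></head><body></body></html>";
        // The text survives the round trip through the stream overload.
        System.out.println(load(html).equals(html)); // true
    }
}
```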

loadContents

protected final void loadContents()
Load the contents of the web page from the data units, which must already have been loaded. This includes the title, links, and forms; div and span tags are also processed.
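
The method descriptions on this page suggest a single pass over the data units in which each recognized begin tag is handed, together with its index, to a dedicated loader (loadTitle, loadLink, loadForm, loadInput). A rough sketch of such a dispatch loop follows. It is an assumption about the control flow, not the actual implementation: data units are modeled as plain strings, LoadContentsSketch is a hypothetical name, and each "loader" here only records what it was called with.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LoadContentsSketch {
    // Hypothetical dispatch: walks the units once and records which loader
    // would be invoked, and at which index, for each recognized begin tag.
    static List<String> loadContents(List<String> units) {
        List<String> calls = new ArrayList<>();
        for (int index = 0; index < units.size(); index++) {
            switch (units.get(index)) {
                case "<title>":
                    calls.add("title@" + index);  // stands in for loadTitle(index, tag)
                    break;
                case "<a>":
                    calls.add("link@" + index);   // stands in for loadLink(index, tag)
                    break;
                case "<form>":
                    calls.add("form@" + index);   // stands in for loadForm(index, tag)
                    break;
                case "<input>":
                    calls.add("input@" + index);  // stands in for loadInput(index, tag)
                    break;
                default:
                    break;                        // text and other tags are skipped here
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> units = Arrays.asList(
                "<html>", "<title>", "Demo", "</title>",
                "<a>", "Next", "</a>",
                "<form>", "<input>", "</form>", "</html>");
        System.out.println(loadContents(units)); // [title@1, link@4, form@7, input@8]
    }
}
```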


loadDataUnits

protected final void loadDataUnits(InputStream is)
Load the data units. Once the lower-level data units have been loaded, the contents can be loaded.

Parameters:
is - The input stream that the data units are loaded from.

loadForm

protected final void loadForm(int index,
                              Tag tag)
Called by loadContents to load a form on the page.

Parameters:
index - The index to begin loading at.
tag - The beginning tag.

loadInput

protected final void loadInput(int index,
                               Tag tag)
Called by loadContents to load an input tag on the form.

Parameters:
index - The index to begin loading at.
tag - The beginning tag.

loadLink

protected final void loadLink(int index,
                              Tag tag)
Called by loadContents to load a link on the page.

Parameters:
index - The index to begin loading at.
tag - The beginning tag.

loadTitle

protected final void loadTitle(int index,
                               Tag tag)
Called by loadContents to load the title of the page.

Parameters:
index - The index to begin loading at.
tag - The beginning tag.


Copyright © 2014. All Rights Reserved.