public interface ISCrawlerInterface
Interface of the main Crawler class of the Web search engine. It is used to start and stop the Crawler, to reset the engine, and to control crawling parameters.
See Also:
Runnable, Thread, InetAddress, URL, HttpURLConnection, InputStreamReader, BufferedReader, Exception

| Field Summary | |
|---|---|
| static int | RUNNING - The Running state of the current thread |
| static int | STOPPED - The Idle state of the current thread |
| Method Summary | |
|---|---|
| void | addLink(java.net.URL link) - Adds a new link to the URL queue, if the link is not yet visited. |
| java.net.URL | getBest() - Returns the best candidate to be visited next. |
| java.lang.String | getContentType(java.net.URLConnection urlConnection) - Returns the ContentType of the current document. |
| int | getCrawlingDepth() - Returns the current maximum allowed crawling depth. |
| ISDocumentInterface | getCurrentDocument() - Returns the last document visited by the Crawler. |
| java.net.URL | getCurrentURL() - Returns the last URL visited by the Crawler. |
| int | getMaxQueueSize() - Returns the maximum allowed size of the URL queue. |
| java.net.URL | getNextURL() - Returns the next URL to be searched. |
| int | getQueueSize() - Returns the current size of the URL queue. |
| int | getState() - Returns the current state of the crawler. |
| int | getTimeout() - Returns the current timeout of the crawler. |
| boolean | isDataStructureEmpty() - Checks if our data structure is empty or not. |
| boolean | isVisited(java.net.URL doc) - Checks if the URL of the given document is already visited by the crawler. |
| void | reset() - Resets the crawler. |
| boolean | robotSafe(java.net.URL url) - Checks whether a robots.txt exists on the server and whether it contains a "Disallow:" entry. |
| ISDocumentInterface | runParser(java.io.Reader r) - Starts the parser. |
| void | setCrawlingDepth(int depth) - Sets the maximum allowed crawling depth. |
| void | setCurrentDocument(ISDocumentInterface isd) - Sets the last document visited by the Crawler. |
| void | setQueueMaxSize(int m) - Sets the maximum allowed size of the URL queue. |
| void | setState(int state_code) - Sets the current state of the crawler. |
| void | setTimeout(int t) - Sets the current timeout of the crawler. |
| void | start() - Starts the thread of the crawler and changes the engine state to RUNNING. |
| void | stop() - Stops the crawler. |
| Methods inherited from interface java.lang.Runnable |
|---|
| run |
| Field Detail |
|---|
static final int RUNNING
The Running state of the current thread.
static final int STOPPED
The Idle state of the current thread.
| Method Detail |
|---|
void start()
Starts the thread of the crawler and changes the engine state to RUNNING.
void stop()
Stops the crawler and changes the engine state to STOPPED.
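The start and stop behaviour described above can be sketched as follows. This is only an illustration under stated assumptions, not the engine's implementation: the class name `SketchCrawler`, the constant values 1 and 0, the helper `awaitStop()` and the empty work loop are all hypothetical. The documentation itself only says that the constants are `static final int` and that `run` is inherited from `java.lang.Runnable`.

```java
// Hypothetical sketch: class name, constant values and work loop are assumptions.
class SketchCrawler implements Runnable {
    static final int RUNNING = 1; // assumed value
    static final int STOPPED = 0; // assumed value

    private volatile int state = STOPPED;
    private Thread worker;

    /** Starts the crawler thread and switches the state to RUNNING. */
    public synchronized void start() {
        if (state == RUNNING) {
            return; // already running, nothing to do
        }
        state = RUNNING;
        worker = new Thread(this, "crawler");
        worker.start();
    }

    /** Asks the crawler thread to finish and switches the state to STOPPED. */
    public synchronized void stop() {
        state = STOPPED;
        if (worker != null) {
            worker.interrupt(); // wake the loop if it is sleeping
        }
    }

    public int getState() {
        return state;
    }

    @Override
    public void run() {
        // Placeholder loop: a real crawler would fetch and parse URLs here.
        while (state == RUNNING) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                return; // stop() interrupted the sleep
            }
        }
    }

    /** Hypothetical helper: waits until the worker thread has ended. */
    void awaitStop() {
        Thread t;
        synchronized (this) {
            t = worker;
        }
        if (t == null) {
            return;
        }
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SketchCrawler crawler = new SketchCrawler();
        crawler.start();
        System.out.println("running: " + (crawler.getState() == RUNNING));
        crawler.stop();
        crawler.awaitStop();
        System.out.println("stopped: " + (crawler.getState() == STOPPED));
    }
}
```

The state field is `volatile` so that the worker thread sees the change made by `stop()` without extra locking.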
void reset()
Resets the crawler (see STOPPED).
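void addLink(java.net.URL link)
Adds a new link to the URL queue, if the link is not yet visited.
Parameters:
link - The URL representing the new target
int getState()
Returns the current state of the crawler. Possible states are RUNNING and STOPPED.
Returns:
RUNNING or STOPPED
void setState(int state_code)
Sets the current state of the crawler. Possible states are RUNNING and STOPPED.
Parameters:
state_code - The current state of the crawler, RUNNING or STOPPED
int getTimeout()
Returns the current timeout of the crawler.
void setTimeout(int t)
Sets the current timeout of the crawler.
Parameters:
t - The current timeout of the crawler in ms.
int getQueueSize()
Returns the current size of the URL queue.
void setQueueMaxSize(int m)
Sets the maximum allowed size of the URL queue.
Parameters:
m - The maximum allowed queue size
int getMaxQueueSize()
Returns the maximum allowed size of the URL queue.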
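How the queue-related methods (addLink, isVisited, getQueueSize, setQueueMaxSize, getMaxQueueSize) might fit together can be sketched as below. This is a minimal sketch under assumptions, not the engine's data structure: the class `SketchUrlQueue`, the helper `url(...)`, the default limit of 100 and the first-in-first-out order are hypothetical. The real getBest() may rank candidates differently.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a bounded URL queue; not the engine's actual data structure.
class SketchUrlQueue {
    private final Deque<URL> queue = new ArrayDeque<>();
    // Keys are URL strings, because URL.equals() may trigger host name lookups.
    private final Set<String> queued = new HashSet<>();
    private final Set<String> visited = new HashSet<>();
    private int maxSize = 100; // assumed default, not taken from the documentation

    void setQueueMaxSize(int m) { maxSize = m; }
    int getMaxQueueSize() { return maxSize; }
    int getQueueSize() { return queue.size(); }
    boolean isDataStructureEmpty() { return queue.isEmpty(); }
    boolean isVisited(URL doc) { return visited.contains(doc.toExternalForm()); }

    /** Queues the link unless it was visited, is already queued, or the queue is full. */
    void addLink(URL link) {
        String key = link.toExternalForm();
        if (visited.contains(key) || queued.contains(key) || queue.size() >= maxSize) {
            return;
        }
        queue.addLast(link);
        queued.add(key);
    }

    /** Hands out the oldest queued URL and marks it visited; null if the queue is empty. */
    URL getNextURL() {
        URL next = queue.pollFirst();
        if (next != null) {
            String key = next.toExternalForm();
            queued.remove(key);
            visited.add(key);
        }
        return next;
    }

    /** Hypothetical helper: builds a URL without a checked exception. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        SketchUrlQueue q = new SketchUrlQueue();
        q.setQueueMaxSize(2);
        q.addLink(url("http://example.com/a"));
        q.addLink(url("http://example.com/a")); // duplicate, ignored
        q.addLink(url("http://example.com/b"));
        q.addLink(url("http://example.com/c")); // queue full, ignored
        System.out.println(q.getQueueSize());   // 2
        System.out.println(q.getNextURL());     // http://example.com/a
        System.out.println(q.isVisited(url("http://example.com/a"))); // true
    }
}
```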
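void setCrawlingDepth(int depth)
Sets the maximum allowed crawling depth.
Parameters:
depth - The maximum allowed crawling depth.
int getCrawlingDepth()
Returns the current maximum allowed crawling depth.
java.net.URL getBest()
Returns the best candidate to be visited next.
Returns:
The best candidate URL, or null if the queue is empty.
boolean isVisited(java.net.URL doc)
Checks if the URL of the given document is already visited by the crawler.
Returns:
true if the engine was able to recognize the given URL as already visited, false otherwise.
ISDocumentInterface getCurrentDocument()
Returns the last document visited by the Crawler.
void setCurrentDocument(ISDocumentInterface isd)
Sets the last document visited by the Crawler.
Parameters:
isd - The last visited document as an object that implements ISDocumentInterface (and contains all extracted links, words and their stems).
java.net.URL getCurrentURL()
Returns the last URL visited by the Crawler.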
java.net.URL getNextURL()
Returns the next URL to be searched (see getBest()).
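boolean isDataStructureEmpty()
Checks if our data structure is empty or not.
ISDocumentInterface runParser(java.io.Reader r)
Starts the parser.
Parameters:
r - Reader that supplies the content of the URL to be parsed.
Returns:
An object that implements ISDocumentInterface.
boolean robotSafe(java.net.URL url)
Checks whether a robots.txt exists on the server and whether it contains a "Disallow:" entry.
Parameters:
url - URL which should be checked.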
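A robots.txt check of the kind robotSafe describes could look roughly like the sketch below. It is a simplified illustration, not the engine's code: the class `SketchRobots` and both method signatures are hypothetical, the robots.txt content is passed in as a string instead of being downloaded, and only `Disallow:` lines in groups for `User-agent: *` are honoured (no `Allow:` rules or wildcards).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified robots.txt check; Allow rules and wildcards are ignored.
class SketchRobots {
    /** Collects the Disallow prefixes of all groups that apply to every user agent ("*"). */
    static List<String> disallowedPrefixes(Reader robotsTxt) throws IOException {
        List<String> prefixes = new ArrayList<>();
        boolean appliesToAll = false;
        boolean lastWasAgent = false;
        BufferedReader in = new BufferedReader(robotsTxt);
        String line;
        while ((line = in.readLine()) != null) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip comments
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines belong to the same group.
                boolean star = value.equals("*");
                appliesToAll = lastWasAgent ? (appliesToAll || star) : star;
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                if (field.equals("disallow") && appliesToAll && !value.isEmpty()) {
                    prefixes.add(value);
                }
            }
        }
        return prefixes;
    }

    /** Returns true if no collected Disallow prefix matches the given path. */
    static boolean robotSafe(String robotsTxt, String path) {
        String p = path.isEmpty() ? "/" : path;
        try {
            for (String prefix : disallowedPrefixes(new StringReader(robotsTxt))) {
                if (p.startsWith(prefix)) {
                    return false;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\n";
        System.out.println(robotSafe(robots, "/index.html"));      // true
        System.out.println(robotSafe(robots, "/private/a.html"));  // false
    }
}
```

In a crawler, the text would typically come from the `/robots.txt` resource of the host of the given URL, and the path from `URL.getPath()`.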
java.lang.String getContentType(java.net.URLConnection urlConnection)
Returns the ContentType of the current document.
Parameters:
urlConnection - The current document given by a URLConnection.
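One way to obtain a content type from a URLConnection is sketched below. It is an assumption about how such a method could work, not the engine's implementation: the class `SketchContentType` and the helper `mediaType` are hypothetical, and the documentation does not say whether the real method strips parameters such as the charset. `URLConnection.getContentType()` is the standard JDK call and returns null when the type is unknown.

```java
import java.net.URLConnection;
import java.util.Locale;

// Hypothetical sketch; whether the real method normalizes the header is not documented.
class SketchContentType {
    /** Reduces a Content-Type header such as "text/html; charset=UTF-8" to "text/html". */
    static String mediaType(String header) {
        if (header == null) {
            return null; // content type unknown
        }
        int semi = header.indexOf(';');
        String type = semi >= 0 ? header.substring(0, semi) : header;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** Reads the Content-Type header from the connection and normalizes it. */
    static String getContentType(URLConnection urlConnection) {
        return mediaType(urlConnection.getContentType());
    }

    public static void main(String[] args) {
        System.out.println(mediaType("text/html; charset=UTF-8")); // text/html
    }
}
```

A crawler could use the result to skip documents that its parser cannot handle, for example anything other than `text/html`.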