This article will help you resolve the following issue:
1) If you want to crawl a website with the Nutch crawler without indexing the HTML content into Solr, here are the changes you need to make to the crawl script that ships with the Nutch package.
Remove the following pieces of code from the crawl script:
SOLRURL="$3"

if [ "$SOLRURL" = "" ]; then
    echo "Missing SOLRURL : crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>"
    exit -1;
fi

echo "Indexing $SEGMENT on SOLR index -> $SOLRURL"
$bin/nutch index -D solr.server.url=$SOLRURL $CRAWL_PATH/crawldb -linkdb $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT
if [ $? -ne 0 ]
then exit $?
fi

echo "Cleanup on SOLR index -> $SOLRURL"
$bin/nutch clean -D solr.server.url=$SOLRURL $CRAWL_PATH/crawldb
if [ $? -ne 0 ]
then exit $?
fi
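After removing SOLRURL="$3", check how your script reads the number of rounds. If your copy still contains LIMIT="$4" (an assumption based on the Nutch 1.x crawl script layout, so verify it in your version), the round count is still expected as the fourth argument, and you would have to pass a placeholder in the third position. Below is a minimal sketch of the argument handling without the Solr URL. The function name parse_crawl_args is hypothetical; the variable names SEEDDIR, CRAWL_PATH and LIMIT are assumed to match the stock script.

```sh
#!/bin/sh
# Sketch only: argument handling for a crawl script with the Solr steps removed.
# Assumption: the stock script used SEEDDIR="$1", CRAWL_PATH="$2",
# SOLRURL="$3" and LIMIT="$4". With <solrURL> gone, LIMIT moves to "$3".
# parse_crawl_args is a hypothetical helper name, not part of Nutch.
parse_crawl_args() {
    SEEDDIR="$1"
    CRAWL_PATH="$2"
    LIMIT="$3"   # previously "$4", when "$3" held the Solr URL

    # Fail with a usage message if any required argument is missing.
    if [ "$SEEDDIR" = "" ] || [ "$CRAWL_PATH" = "" ] || [ "$LIMIT" = "" ]; then
        echo "Usage: crawl <seedDir> <crawlDir> <numberOfRounds>" >&2
        return 1
    fi
    return 0
}
```

In the script you would call it as parse_crawl_args "$@" || exit 1 in place of the original assignments and checks, and then run the crawl as crawl <seedDir> <crawlDir> <numberOfRounds>. Remember to update any other usage messages in the script that still mention <solrURL>.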
Hope this helps!