Sunday, July 27, 2014

Crawl your website using Nutch Crawler without Indexing the HTML content into SOLR

This article shows how to crawl a website with the Nutch Crawler without indexing the HTML content into SOLR. To do that, you need to edit the crawl script of the Nutch package.

Remove the following pieces of code from the script:

SOLRURL="$3"

if [ "$SOLRURL" = "" ]; then
    echo "Missing SOLRURL : crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>"
    exit -1;
fi


echo "Indexing $SEGMENT on SOLR index -> $SOLRURL"
$bin/nutch index -D solr.server.url=$SOLRURL $CRAWL_PATH/crawldb -linkdb $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$SEGMENT

if [ $? -ne 0 ]
then exit $?
fi

echo "Cleanup on SOLR index -> $SOLRURL"
$bin/nutch clean -D solr.server.url=$SOLRURL $CRAWL_PATH/crawldb

if [ $? -ne 0 ]
then exit $?
fi
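
Once the SOLRURL lines are gone, the remaining arguments may need to shift by one position. The sketch below is only an illustration and is not taken from the Nutch script: it assumes your copy of the script reads the number of rounds from the fourth argument, and the function name parse_crawl_args and the variable names SEEDDIR and LIMIT are assumptions made for this example. Check them against your own script before changing anything.

```shell
#!/bin/sh
# Hypothetical sketch of argument handling without the <solrURL> argument.
# SEEDDIR and LIMIT are assumed names; CRAWL_PATH is the name used above.
parse_crawl_args() {
  SEEDDIR="$1"
  CRAWL_PATH="$2"
  LIMIT="$3"   # assumed to have been "$4" while <solrURL> was the third argument

  # Fail with a usage message when any of the three arguments is missing.
  if [ -z "$SEEDDIR" ] || [ -z "$CRAWL_PATH" ] || [ -z "$LIMIT" ]; then
    echo "Usage: crawl <seedDir> <crawlDir> <numberOfRounds>" >&2
    return 1
  fi
}
```

With this change the script would be called with three arguments instead of four, for example: crawl urls/ crawl/ 2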

Hope This Helps!!!

Sunday, July 13, 2014

[SOLR] RELOAD solrconfig.xml and schema.xml without restarting the SOLR

Most of the time, the answer we get is that the SOLR instance must be restarted after a change to schema.xml or solrconfig.xml.

From SOLR 4.0 onwards, however, this can be achieved with the RELOAD command.

Command:
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0

If you change your solrconfig.xml or schema.xml files and want the changes to take effect without stopping and restarting your SOLR instance, just execute the RELOAD command on your core.
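
The RELOAD request can also be sent from a shell, for example right after editing the configuration files. The following is a minimal sketch and not part of SOLR: reload_url and reload_core are hypothetical helper names, the base URL and core name default to the values from the command above, and curl is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helpers; host, port and core name default to the example above.

# Build the CoreAdmin RELOAD URL for a given SOLR base URL and core name.
reload_url() {
  base="${1:-http://localhost:8983/solr}"
  core="${2:-core0}"
  printf '%s/admin/cores?action=RELOAD&core=%s\n' "$base" "$core"
}

# Send the RELOAD request; curl -f turns HTTP errors into a non-zero exit status.
reload_core() {
  curl -sS -f "$(reload_url "$1" "$2")"
}
```

Calling reload_core with no arguments reloads core0 on localhost; pass a base URL and a core name to target another core.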

NOTE:
However, a few configuration changes still require a restart of the SOLR instance:
1) IndexWriter related settings in <indexConfig>
2) Change in <dataDir> location

Hope this Helps!!!

Reference:
https://wiki.apache.org/solr/CoreAdmin#RELOAD



Sunday, July 6, 2014

Unix Commands : Tricks and Tips


  1. Update a file in a JAR file

To update the application.properties file inside myproject.jar, run:

jar -uf myproject.jar application.properties

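This command updates application.properties in myproject.jar.
Note that -u replaces an entry that already exists in the JAR under the same path, and adds the file if no such entry exists. The entry is stored under the path given on the command line, so run the command from the directory that matches the file's location inside the JAR.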
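
To confirm that the file ended up in the JAR, the contents can be listed afterwards with jar -tf. The helper below is a small sketch for illustration: update_jar_entry is a hypothetical name, and it assumes the jar tool from a JDK is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: update one file in a JAR, then print the matching entry.
update_jar_entry() {
  jar_file="$1"
  entry="$2"

  # Update the JAR in place; stop if jar reports an error.
  jar -uf "$jar_file" "$entry" || return 1

  # List the JAR contents and print only the exact entry name, if present.
  jar -tf "$jar_file" | grep -F -x "$entry"
}
```

For example, update_jar_entry myproject.jar application.properties prints the entry name when the file is present in the JAR after the update.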

Work-Life Balance in IT Industry


In the IT industry, this is what we call work-life balance.





Password Protected Solr Admin Page

As we all know, the Solr Admin Page is not password protected, and anyone can get into it. However, this article will ...