Sunday, June 29, 2014

Zookeeper Cluster Setup

Today we are going to create a zookeeper cluster with 5 instances.

1) Required / Installed software
    1.a) Java - jdk1.7.* (/usr/java)
    1.b) Zookeeper – zookeeper-3.4.6 (/opt/install/zookeeper)
    1.c) Port selection for zookeeper deployment

          1.c.i ) On Different Servers

          Server ID   Server IP    Client Port   Quorum Port   Leader Election Port
          1           10.10.10.1   2181          2888          3888
          2           10.10.10.2   2181          2888          3888
          3           10.10.10.3   2181          2888          3888
          4           10.10.10.4   2181          2888          3888
          5           10.10.10.5   2181          2888          3888


          1.c.ii ) On Same Server

          Server ID   Server IP    Client Port   Quorum Port   Leader Election Port
          1           10.10.10.1   2181          2888          3888
          2           10.10.10.1   2182          2889          3889
          3           10.10.10.1   2183          2890          3890
          4           10.10.10.1   2184          2891          3891
          5           10.10.10.1   2185          2892          3892
     


2) Installation folder structure
    2.a) Zookeeper Server
                /opt/install/zookeeper/zookeeper1/
                /opt/install/zookeeper/zookeeper2/
                /opt/install/zookeeper/zookeeper3/
                /opt/install/zookeeper/zookeeper4/
                /opt/install/zookeeper/zookeeper5/

    2.b) Data Directory
                /opt/install/zookeeper/data/zookeeper1/
                /opt/install/zookeeper/data/zookeeper2/
                /opt/install/zookeeper/data/zookeeper3/
                /opt/install/zookeeper/data/zookeeper4/
                /opt/install/zookeeper/data/zookeeper5/

    2.c) Logs Directory
                /opt/install/zookeeper/logs/zookeeper1/
                /opt/install/zookeeper/logs/zookeeper2/
                /opt/install/zookeeper/logs/zookeeper3/
                /opt/install/zookeeper/logs/zookeeper4/
                /opt/install/zookeeper/logs/zookeeper5/
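
The folder structure above can be created in one go. This is a minimal sketch, assuming all five instances share one install root as in the listing. The function name create_zk_dirs and its COUNT argument are only illustrative and are not part of Zookeeper.

```shell
#!/bin/sh
# create_zk_dirs BASE [COUNT]
# Creates the server, data and logs directory of each instance under BASE.
# (Illustrative helper, not shipped with Zookeeper.)
create_zk_dirs() {
    base="$1"
    count="${2:-5}"
    i=1
    while [ "$i" -le "$count" ]; do
        mkdir -p "$base/zookeeper$i" \
                 "$base/data/zookeeper$i" \
                 "$base/logs/zookeeper$i" || return 1
        i=$((i + 1))
    done
}

# Example (needs write access to /opt/install):
#   create_zk_dirs /opt/install/zookeeper 5
```

When the instances run on different servers, each server only needs the three folders of its own instance.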

3) Download Zookeeper
    3.a) This setup uses zookeeper-3.4.6.
    3.b) Download it from http://zookeeper.apache.org/releases.html
    3.c) Untar the downloaded archive to /tmp/zookeeper.
    3.d) Repeat the following steps for every zookeeper instance (shown here for instance 1).
    3.e) Copy /tmp/zookeeper/zookeeper-3.4.6/* to /opt/install/zookeeper/zookeeper1/
    3.f) Copy $ZOO_HOME1/conf/zoo_sample.cfg as zoo.cfg, where $ZOO_HOME1 is /opt/install/zookeeper/zookeeper1
    3.g) Edit zoo.cfg and set the following properties

tickTime=2000
initLimit=10
syncLimit=5
# data directory (make sure the directory is present)
dataDir=/opt/install/zookeeper/data/zookeeper1/
# the port at which the clients will connect
clientPort=2181
# transaction log directory (make sure the directory is present)
dataLogDir=/opt/install/zookeeper/logs/zookeeper1/
# server.<Server ID>=<Server IP>:<Quorum Port>:<Leader Election Port>
# (for instances on the same server, use the ports from table 1.c.ii)
server.1=10.10.10.1:2888:3888
server.2=10.10.10.2:2888:3888
server.3=10.10.10.3:2888:3888
server.4=10.10.10.4:2888:3888
server.5=10.10.10.5:2888:3888
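
For the same-server layout (table 1.c.ii), every instance needs its own clientPort, dataDir and dataLogDir, and the server.N lines must use the distinct quorum and election ports. The sketch below generates such a zoo.cfg. It assumes the ports follow the pattern of that table (2180 + ID, 2887 + ID, 3887 + ID); the function name write_zoo_cfg is hypothetical.

```shell
#!/bin/sh
# write_zoo_cfg BASE ID [COUNT] [IP]
# Writes BASE/zookeeperID/conf/zoo.cfg for an ensemble running on one server.
# (Illustrative helper; port numbers follow table 1.c.ii.)
write_zoo_cfg() {
    base="$1"; id="$2"; count="${3:-5}"; ip="${4:-10.10.10.1}"
    cfg="$base/zookeeper$id/conf/zoo.cfg"
    mkdir -p "$base/zookeeper$id/conf" || return 1
    {
        echo "tickTime=2000"
        echo "initLimit=10"
        echo "syncLimit=5"
        echo "dataDir=$base/data/zookeeper$id/"
        echo "clientPort=$((2180 + id))"
        echo "dataLogDir=$base/logs/zookeeper$id/"
        # One server.N line per ensemble member: IP:quorum port:election port
        n=1
        while [ "$n" -le "$count" ]; do
            echo "server.$n=$ip:$((2887 + n)):$((3887 + n))"
            n=$((n + 1))
        done
    } > "$cfg"
}

# Example:
#   for i in 1 2 3 4 5; do write_zoo_cfg /opt/install/zookeeper "$i"; done
```

Only the first three path and port lines differ between instances; the list of server.N lines is identical in every zoo.cfg.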
                               
4) Create a “myid” file in the data directory of each instance.
    4.a) Edit it and write only “1” on server1
    4.b) Edit it and write only “2” on server2
    4.c) Edit it and write only “3” on server3
    4.d) Edit it and write only “4” on server4
    4.e) Edit it and write only “5” on server5
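
The number in myid must match the N of that instance's server.N line in zoo.cfg. A small sketch for writing the file; the helper name write_myid is only illustrative.

```shell
#!/bin/sh
# write_myid DATA_DIR ID
# Writes ID as the only content of DATA_DIR/myid.
# (Illustrative helper, not part of Zookeeper.)
write_myid() {
    mkdir -p "$1" || return 1
    echo "$2" > "$1/myid"
}

# Example for the same-server layout:
#   for i in 1 2 3 4 5; do
#       write_myid "/opt/install/zookeeper/data/zookeeper$i" "$i"
#   done
```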

5) Start the zookeeper. 

        (run $ZOO_HOME/bin/zkServer.sh start on each zookeeper instance)
        $ /opt/install/zookeeper/zookeeper1/bin/zkServer.sh start
        $ /opt/install/zookeeper/zookeeper2/bin/zkServer.sh start
        $ /opt/install/zookeeper/zookeeper3/bin/zkServer.sh start
        $ /opt/install/zookeeper/zookeeper4/bin/zkServer.sh start
        $ /opt/install/zookeeper/zookeeper5/bin/zkServer.sh start

        Sample Output:
        JMX enabled by default
        Using config: /opt/install/zookeeper/zookeeper1/bin/../conf/zoo.cfg
        Starting zookeeper ... STARTED


6) To STOP the Zookeeper instances
        $ /opt/install/zookeeper/zookeeper1/bin/zkServer.sh stop
        $ /opt/install/zookeeper/zookeeper2/bin/zkServer.sh stop
        $ /opt/install/zookeeper/zookeeper3/bin/zkServer.sh stop
        $ /opt/install/zookeeper/zookeeper4/bin/zkServer.sh stop
        $ /opt/install/zookeeper/zookeeper5/bin/zkServer.sh stop


7) To Check the Zookeeper Status
        $ /opt/install/zookeeper/zookeeper1/bin/zkServer.sh status
        $ /opt/install/zookeeper/zookeeper2/bin/zkServer.sh status
        $ /opt/install/zookeeper/zookeeper3/bin/zkServer.sh status
        $ /opt/install/zookeeper/zookeeper4/bin/zkServer.sh status
        $ /opt/install/zookeeper/zookeeper5/bin/zkServer.sh status

        Sample Output (one instance reports leader, the others report follower)
        JMX enabled by default
        Using config: /opt/install/zookeeper/zookeeper2/bin/../conf/zoo.cfg
        Mode: leader

        JMX enabled by default
        Using config: /opt/install/zookeeper/zookeeper2/bin/../conf/zoo.cfg
        Mode: follower
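
When all instances run on the same server, the status check can be looped. This is a rough sketch, assuming the folder layout from step 2; the function names parse_mode and zk_modes and the text "no mode reported" are made up for this example.

```shell
#!/bin/sh
# parse_mode: reads "zkServer.sh status" output on stdin, prints only the mode.
parse_mode() {
    sed -n 's/^[[:space:]]*Mode:[[:space:]]*//p'
}

# zk_modes BASE [COUNT]
# Prints one line per instance with the mode it reports.
# (Illustrative helper; assumes BASE/zookeeperN/bin/zkServer.sh exists.)
zk_modes() {
    base="$1"; count="${2:-5}"; i=1
    while [ "$i" -le "$count" ]; do
        mode=$("$base/zookeeper$i/bin/zkServer.sh" status 2>/dev/null | parse_mode)
        echo "zookeeper$i: ${mode:-no mode reported}"
        i=$((i + 1))
    done
}

# Example:
#   zk_modes /opt/install/zookeeper 5
```

An instance that prints no "Mode:" line is usually stopped or has not joined the quorum yet.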


Sunday, June 22, 2014

Passwordless SSH and SCP to Unix Server

In this article we will see how to SCP and SSH to a remote server without entering a password.
This is useful when writing bash scripts that use ssh and scp.

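Run this command on LocalMachine
$ ssh-keygen -t dsa
Note: OpenSSH 7.0 and later disable DSA keys by default. On such systems use "-t rsa" or "-t ed25519" instead and replace id_dsa with id_rsa or id_ed25519 in the file names below.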

Add the following lines to your .bash_profile on your LocalMachine:
# So that ssh will work, take care with X logins - see .xsession
[[ -z $SSH_AGENT_PID && -z $DISPLAY ]] &&
exec -l ssh-agent $SHELL -c "bash --login"

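Run this command on LocalMachine. It appends your public key to ~/.ssh/authorized_keys on RemoteServer:
$ ssh-copy-id RemoteServer

If ssh-copy-id is not available, copy the key manually instead. Run this command on LocalMachine:
$ scp ~/.ssh/id_dsa.pub RemoteServer:.ssh/id_dsa.pub.LocalMachine

Then run these commands on RemoteServer:
$ cat ~/.ssh/id_dsa.pub.LocalMachine >> ~/.ssh/authorized_keys
$ rm ~/.ssh/id_dsa.pub.LocalMachine
$ chmod 600 ~/.ssh/authorized_keys
$ chmod 700 ~/.ssh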

After executing the above commands we will be able to SCP or SSH to RemoteServer from LocalMachine without entering a password.
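
To confirm the setup from a script, ssh can be told never to prompt. A minimal sketch using the BatchMode and ConnectTimeout client options; the function name check_passwordless is only illustrative.

```shell
#!/bin/sh
# check_passwordless HOST
# Returns 0 if HOST accepts a key-based login without prompting.
# BatchMode=yes makes ssh fail instead of asking for a password.
# (Illustrative helper; the 5 second timeout is an arbitrary choice.)
check_passwordless() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Example:
#   check_passwordless RemoteServer && echo "key login works"
```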

Sunday, June 15, 2014

SCP file TO and FROM remote server

1) To copy files from a remote server to the local machine
$ scp user@from-host:source-file local-destination-folder

from-host :-> Hostname from which we want to copy the files
source-file :-> File that we want to copy
local-destination-folder :-> Local folder into which we want to copy the files


2) To copy files from the local machine to a remote server
$ scp source-file user@to-host:remote-destination-folder

source-file :-> File that we want to copy
to-host :-> Hostname to which we want to copy the files
remote-destination-folder :-> Exact folder on the remote machine where you want to copy the files

3) To copy a folder
$ scp -r source-folder user@to-host:remote-destination-folder

-r :-> Copies the folder recursively, including everything inside it.
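
In a bash script the three forms can be wrapped in small functions. This is only a sketch: the account deploy@10.10.10.1 and the function names are made-up placeholders.

```shell
#!/bin/sh
# Hypothetical remote account; replace with your own user and host.
REMOTE="deploy@10.10.10.1"

# pull_file REMOTE_FILE LOCAL_FOLDER : remote server -> local machine
pull_file() { scp "$REMOTE:$1" "$2"; }

# push_file LOCAL_FILE REMOTE_FOLDER : local machine -> remote server
push_file() { scp "$1" "$REMOTE:$2"; }

# push_dir LOCAL_FOLDER REMOTE_FOLDER : copy a whole folder with -r
push_dir() { scp -r "$1" "$REMOTE:$2"; }

# Example:
#   push_dir conf /opt/install/zookeeper/zookeeper1/
```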

Sunday, June 8, 2014

How to configure Nutch in Eclipse for SOLR

Checkout and Build Nutch:
1.    Get the latest source code from SVN using the terminal.
For Nutch 1.x (i.e. trunk) run this:
svn co https://svn.apache.org/repos/asf/nutch/trunk


2.    Add “http.agent.name” and “http.robots.agents” with appropriate values in “conf/nutch-site.xml”.
To do this, first rename the nutch-site.xml.template file to nutch-site.xml and then make the changes.
See conf/nutch-default.xml for the description of these properties.

3.    Also, add “plugin.folders” and set it to {PATH_TO_NUTCH_CHECKOUT}/build/plugins. E.g. if Nutch is present at "/home/Desktop/2.x",
set the property to:


<property>
   <name>plugin.folders</name>
   <value>/home/Desktop/2.x/build/plugins</value>
</property>

There is no /build/plugins folder yet. It is created in {PATH_TO_NUTCH_CHECKOUT} when you run the "ant eclipse" command in the next step.
That is why the property is set to the absolute path {PATH_TO_NUTCH_CHECKOUT}/build/plugins.
Do not give a relative path here, as it won't work.
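
The three properties from steps 2 and 3 can also be written from the terminal. This is a rough sketch that creates a fresh conf/nutch-site.xml rather than renaming the template, so it overwrites any existing file. The function name write_nutch_site and the agent name in the example are made up, and the checkout path must be absolute.

```shell
#!/bin/sh
# write_nutch_site CHECKOUT_DIR AGENT_NAME
# Writes CHECKOUT_DIR/conf/nutch-site.xml with the three properties.
# CHECKOUT_DIR must be an absolute path. Overwrites an existing file.
# (Illustrative helper, not part of Nutch.)
write_nutch_site() {
    checkout="$1"; agent="$2"
    mkdir -p "$checkout/conf" || return 1
    cat > "$checkout/conf/nutch-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>$agent</value>
  </property>
  <property>
    <name>http.robots.agents</name>
    <value>$agent,*</value>
  </property>
  <property>
    <name>plugin.folders</name>
    <value>$checkout/build/plugins</value>
  </property>
</configuration>
EOF
}

# Example:
#   write_nutch_site /home/Desktop/2.x MyNutchCrawler
```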

4.    Run this command:
ant eclipse



5.    Load project in Eclipse
5.1.    In Eclipse, click on “File” -> “Import...”

5.2.    Select “Existing Projects into Workspace”

5.3.    In the next window, set the root directory to the location where you checked out Nutch 2.x (or trunk). Click “Finish”.

5.4.    You will now see a new project named 2.x (or trunk) being added in the workspace. Wait for a moment until Eclipse refreshes its SVN cache and builds its workspace. You can see the status at the bottom right corner of Eclipse.

5.5.    In Package Explorer, right click on the project “2.x” (or trunk), select “Build Path” -> “Configure Build Path”

5.6.    In the “Order and Export” tab, scroll down and select “2.x/conf” (or trunk/conf). Click the “Top” button. Eclipse will build the workspace again, but this time it won’t take long.


6.    Download the following jar file:
http://mvnrepository.com/artifact/org.elasticsearch/elasticsearch/0.90.1
Add this jar file to the project's build path in Eclipse.

7.    You will get one error for “ElasticsearchException”. Change it to “ElasticSearchException” (capital S).


8.    Now you are ready to run the nutch code in eclipse:
8.1.    Let's start off with the inject operation.

8.2.    Right click on the project in “Package Explorer” -> select “Run As” -> select “Run Configurations”.

8.3.    Create a new configuration. Name it as "inject".
For 1.x (i.e. trunk): set the main class to org.apache.nutch.crawl.Injector
For 2.x: set the main class to org.apache.nutch.crawl.InjectorJob

8.4.    In the arguments tab, for program arguments, provide the path of the input directory which has seed urls.

8.5.    Set VM Arguments to “-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log”

8.6.    Click "Apply" and then click "Run".

8.7.    If everything is set up correctly, you should see the inject operation progressing in the console.



Classes in Nutch 1.x (i.e. trunk)
inject :- org.apache.nutch.crawl.Injector
generate :- org.apache.nutch.crawl.Generator
fetch :- org.apache.nutch.fetcher.Fetcher
parse :- org.apache.nutch.parse.ParseSegment
updatedb :- org.apache.nutch.crawl.CrawlDb


Classes in Nutch 2.x
inject :- org.apache.nutch.crawl.InjectorJob
generate :- org.apache.nutch.crawl.GeneratorJob
fetch :- org.apache.nutch.fetcher.FetcherJob
parse :- org.apache.nutch.parse.ParserJob
updatedb :- org.apache.nutch.crawl.DbUpdaterJob
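
The two lists above can be kept as a small lookup so the right main class for a run configuration is easy to find. A sketch only; the function name nutch_main_class is made up, and the class names are the ones listed above.

```shell
#!/bin/sh
# nutch_main_class VERSION COMMAND
# VERSION is 1 (Nutch 1.x, trunk) or 2 (Nutch 2.x).
# Prints the main class for COMMAND, or returns 1 if it is not listed.
nutch_main_class() {
    case "$1:$2" in
        1:inject)   echo org.apache.nutch.crawl.Injector ;;
        1:generate) echo org.apache.nutch.crawl.Generator ;;
        1:fetch)    echo org.apache.nutch.fetcher.Fetcher ;;
        1:parse)    echo org.apache.nutch.parse.ParseSegment ;;
        1:updatedb) echo org.apache.nutch.crawl.CrawlDb ;;
        2:inject)   echo org.apache.nutch.crawl.InjectorJob ;;
        2:generate) echo org.apache.nutch.crawl.GeneratorJob ;;
        2:fetch)    echo org.apache.nutch.fetcher.FetcherJob ;;
        2:parse)    echo org.apache.nutch.parse.ParserJob ;;
        2:updatedb) echo org.apache.nutch.crawl.DbUpdaterJob ;;
        *) echo "not listed: Nutch $1.x $2" >&2; return 1 ;;
    esac
}

# Example:
#   nutch_main_class 2 fetch
```

The printed name goes into the "Main class" field described in step 8.3.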


HOPE THIS HELPS!!!!

Password Protected Solr Admin Page

As we all know Solr Admin Page is not password protected and anyone can get into Solr Admin Page. However this article will ...