Thursday, March 7, 2013

Change IP or Hostname for an already installed Cloudera Manager 4.x (CM 4.0)



I installed Cloudera Manager 4 on a default 1GbE setup on eth0/bond0 and now I need to use 10GbE, InfiniBand (IB), or 40GbE without a reinstall. Here is how I did it!!

The assumption here is that you already have CM 4.x installed with the default embedded PostgreSQL DB on a Linux server. If you want to learn how, just ask...


Let's start here with my 3-node cluster setup, yosemite001 - yosemite003

 hostname = yosemite00[1-3].somedomain.com = CM Installed on 1GbE


  1. Shut down all services
  2. service cloudera-scm-agent stop on all nodes (see the loop sketch below)
  3. service cloudera-scm-server stop on the CM server
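If you would rather not log in to each node by hand, a loop over ssh does the same thing. A minimal sketch, assuming passwordless root ssh and the yosemite00[1-3] naming used here:

# Stop the agent on every node, then the server on the CM host itself
for n in yosemite001 yosemite002 yosemite003; do
    ssh root@${n}.somedomain.com "service cloudera-scm-agent stop"
done
service cloudera-scm-server stop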

From here on, if you have a paid or supported version, you may be about to void it (or maybe they will say "Cool, good job!!"). Proceed at your own risk...

From the CM Server:
As root, run su - postgres (if you installed on a different database, log in using those credentials)

psql -h localhost -p 7432 -U scm
When asked for the password, open another terminal and run:
[root@yosemite001 ~]# grep password /etc/cloudera-scm-server/db.properties
com.cloudera.cmf.db.password=TVkDZxuNCw
Paste the password and you should see the scm prompt.
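If you want to skip the copy/paste, you can hand the password straight to psql. A minimal sketch, assuming the db.properties layout shown above:

# Pull the scm password out of db.properties and let psql pick it up via PGPASSWORD
export PGPASSWORD=$(grep '^com.cloudera.cmf.db.password=' /etc/cloudera-scm-server/db.properties | cut -d= -f2)
psql -h localhost -p 7432 -U scm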
scm=> select host_id,host_identifier,name,ip_address from hosts;
(Check the current config and save these values in a file/notepad; you will need them if you have to flip back.)

 host_id |      host_identifier       |            name            |  ip_address
---------+----------------------------+----------------------------+--------------
       2 | yosemite001.somedomain.com | yosemite001.somedomain.com | 192.168.0.11
       3 | yosemite002.somedomain.com | yosemite002.somedomain.com | 192.168.0.12
       4 | yosemite003.somedomain.com | yosemite003.somedomain.com | 192.168.0.13
(3 rows)

scm=> update hosts set (host_identifier,name,ip_address) = ('yosemite001-10g.somedomain.com','yosemite001-10g.somedomain.com','192.168.10.11') where host_id=2;
UPDATE 1

Update all the other rows the same way, as shown below.
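For this cluster that means host_id 3 and 4. A sketch, assuming the new addresses follow the same 192.168.10.x pattern (substitute your actual 10GbE IPs):

scm=> update hosts set (host_identifier,name,ip_address) = ('yosemite002-10g.somedomain.com','yosemite002-10g.somedomain.com','192.168.10.12') where host_id=3;
UPDATE 1
scm=> update hosts set (host_identifier,name,ip_address) = ('yosemite003-10g.somedomain.com','yosemite003-10g.somedomain.com','192.168.10.13') where host_id=4;
UPDATE 1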

Check that the updates went through:

scm=> select host_id,host_identifier,name,ip_address from hosts;
 host_id |        host_identifier         |              name              |   ip_address
---------+--------------------------------+--------------------------------+---------------
       3 | yosemite002.somedomain.com     | yosemite002.somedomain.com     | 192.168.0.12
       4 | yosemite003.somedomain.com     | yosemite003.somedomain.com     | 192.168.0.13
       2 | yosemite001-10g.somedomain.com | yosemite001-10g.somedomain.com | 192.168.10.11
(3 rows)

Exit the tool with "\q".

Edit /etc/cloudera-scm-agent/config.ini on all nodes and update the server_host entry plus the listening_ip & listening_hostname entries to the new interface's IP and hostname.
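On yosemite001 the relevant part of config.ini would end up looking roughly like this (a sketch; leave the rest of the file as shipped):

# /etc/cloudera-scm-agent/config.ini (relevant entries only)
server_host=yosemite001-10g.somedomain.com
# IP Address that Agent listens on
listening_ip=192.168.10.11
# Hostname that Agent reports as its hostname
listening_hostname=yosemite001-10g.somedomain.com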

Edit /etc/sysconfig/network with the new hostname for the interface.
Run hostname yosemite001-10g (update the hostname on all the servers the same way).
Run "exec bash" and verify the hostname change took effect.
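Spelled out for one node, that amounts to something like this (a sketch for RHEL/CentOS-style systems, where /etc/sysconfig/network lives):

# Persist the new hostname across reboots (RHEL/CentOS)
sed -i 's/^HOSTNAME=.*/HOSTNAME=yosemite001-10g.somedomain.com/' /etc/sysconfig/network
# Apply it to the running system, then re-exec the shell and confirm
hostname yosemite001-10g
exec bash
hostname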

Run on the CM Server
chkconfig cloudera-scm-server on
service cloudera-scm-server start

Run the following on all the nodes

chkconfig cloudera-scm-agent on
service cloudera-scm-agent start

Log in to the GUI and verify the host changes and "Good health".

Force CM to rerun the configuration to register the changes (mimic a change and revert it to force CM to rerun the config):

Go to the HDFS service-wide configuration, make any minimal change and save, then revert back to the original and save again.
Do the same for MapReduce as well.
Restart the services

Verify client functionality by running TeraSort.
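For reference, a typical MRv1 TeraSort smoke test on CDH4 looks roughly like this (the examples-jar path is the usual CDH4 location and the HDFS paths are illustrative; adjust to your setup):

# Generate ~1 GB of input (10M rows x 100 bytes), sort it, then validate
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar teragen 10000000 /tmp/teragen
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar terasort /tmp/teragen /tmp/terasort
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar teravalidate /tmp/terasort /tmp/teravalidate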


9 comments:

  1. Wow! Thanks. I don't like this move from "simple" configuration files to this series of incantations :(

    ReplyDelete
  2. Thank you very much, it really helped us

    ReplyDelete
  3. Thanks so much!!! That useful information helped solve many connection problems that led to job failures....

    ReplyDelete
  4. Hi,

    I have managed to screw up my HDFS (CDH 4) installation. Are you able to provide paid/unpaid help regarding this?

    Thanks

    ReplyDelete
  5. Yes. Could you explain the issue with HDFS?

    ReplyDelete
  6. We have a 4-node CDH 4.1 cluster with HDFS on Amazon EC2 (1 x Name Node + 3 x Data Nodes).

    I took a snapshot of the data node servers at T1 but forgot to take a snapshot of the name node server as well.

    18 hours later (T1 + 18) I realized that sometime during the last 18 hours the disk was full and took another snapshot of the 3 data nodes.

    I increased the size of the disks, but when I tried to bring the cluster back up, the Name Node wouldn't come out of safe mode because a number of blocks were missing.

    I do have a backup of the name node which ran sometime around T1 + 12 hours.

    I have tried to bring the cluster back up with different combinations of the Name Node backup and Data Nodes backup but I get anywhere from 150 to 600 blocks missing ( out of ~1200 total ).

    Is there a way to recover the data, or am I screwed?

    ReplyDelete
    Replies
    1. ps40,

      Were you able to recover your files? If you need any help, post back and we can sync up.

      Delete
  7. Hey, when I run the select query I get something like this:
    scm=> select host_id,host_identifier,name,ip_address from hosts;
     host_id |           host_identifier            |         name          |  ip_address
    ---------+--------------------------------------+-----------------------+---------------
           8 | dc7c25c8-958a-4332-b78e-1780820333a4 | yamazaki.lunexa.local | 192.168.2.222
           6 | 52a9d844-0f16-42b8-9f1b-2572dbbc04fa | woodford.lunexa.local | 192.168.2.224

    The host_identifier seems odd; do I still need to change that part? Mind you, I am running this select query while scm is still up; I just wanted to see how this would work when porting the machines to a new network/domain. Any ideas/help on this issue?

    ReplyDelete
  8. It should be fairly straightforward, and the earlier blog should help you transition over to the new network. If you are using Hive, then it gets a little complicated. The simpler option, if you are running 4.4 or above (it might work on an earlier version, but I have not tried it): there is a file /etc/cloudera-scm-agent/config.ini -- in this file, uncomment the following lines
    # listening_ip=
    # Hostname that Agent reports as its hostname
    # listening_hostname=
    and fill these in with the new network or domain once you have moved over, but before you start any agents/services. This is very effective if you have multiple interfaces. Also, if you need to force it from the config side, you can use dfs.datanode.hostname in hdfs-site.xml and mapred.tasktracker.dns.nameserver or slave.host.name in mapred-site.xml (see the sketch at the end of this reply).

    Hopefully that should help you migrate to the new network/domain. Regarding the host identifier, you might get new records added for the new network, which is fine, and you can delete the old network names from the cluster. As always, try this on a test site if you have one before trying it in production :-)
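    For reference, the hdfs-site.xml piece of that would look roughly like this (the hostname value is purely illustrative; the mapred.* properties go into mapred-site.xml the same way):

    <!-- hdfs-site.xml: pin the hostname the DataNode advertises (value is illustrative) -->
    <property>
      <name>dfs.datanode.hostname</name>
      <value>yamazaki-10g.lunexa.local</value>
    </property>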

    ReplyDelete