I installed Cloudera Manager 4 on a 1GbE default setup (eth0/bond0) and now need to move to 10GbE, InfiniBand (IB), or 40GbE without a reinstall. Here's how I did it!!
The assumption here is that you already have CM 4.x installed with the default embedded PostgreSQL DB on a Linux server. If you want to learn how to set that up, just ask...
Let's start here with my 3-node cluster, yosemite001 - yosemite003:
hostname = yosemite00[1-3].somedomain.com = CM installed on 1GbE
- Shut down all services
- service cloudera-scm-agent stop on all nodes
- service cloudera-scm-server stop on the CM server (see the loop sketch below for running this across the cluster)
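If you have more than a handful of nodes, a quick loop saves some typing. A minimal sketch, assuming passwordless SSH as root to the nodes (the host list below is just my example cluster):
# stop the agent on every node, then the server on the CM host
for h in yosemite001 yosemite002 yosemite003; do ssh root@$h "service cloudera-scm-agent stop"; done
service cloudera-scm-server stop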
From here on, if you have a paid or supported version, you may void your support (or maybe they'll say "Cool, good job!!"). Proceed at your own risk...
From the CM server, as root run su - postgres (if you installed on a different database, then log in using those credentials), then:
psql -h localhost -p 7432 -U scm
When asked for the password, open another terminal and run:
[root@yosemite001 ~]# grep password /etc/cloudera-scm-server/db.properties
com.cloudera.cmf.db.password=TVkDZxuNCw
Paste the password and you should see the scm prompt.
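If you'd rather avoid the copy/paste dance, psql will read the password from libpq's PGPASSWORD variable. A minimal sketch, assuming the default embedded DB on port 7432:
# pull the scm password out of db.properties and hand it to psql
export PGPASSWORD=$(grep '^com.cloudera.cmf.db.password=' /etc/cloudera-scm-server/db.properties | cut -d= -f2)
psql -h localhost -p 7432 -U scm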
scm=> select host_id,host_identifier,name,ip_address from hosts;
Check the current config and save the values in a file/notepad; you will need this if you need to flip back.
 host_id |      host_identifier       |            name            |  ip_address
---------+----------------------------+----------------------------+--------------
       2 | yosemite001.somedomain.com | yosemite001.somedomain.com | 192.168.0.11
       3 | yosemite002.somedomain.com | yosemite002.somedomain.com | 192.168.0.12
       4 | yosemite003.somedomain.com | yosemite003.somedomain.com | 192.168.0.13
(3 rows)
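An easy way to keep that "flip back" copy is to dump the query straight to a file instead of copying it by hand. A sketch, assuming the same connection details as above (the output filename is just something I made up):
psql -h localhost -p 7432 -U scm -c "select host_id,host_identifier,name,ip_address from hosts;" > /root/hosts_before_10g.txt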
scm=> update hosts set (host_identifier,name,ip_address) = ('yosemite001-10g.somedomain.com','yosemite001-10g.somedomain.com','192.168.10.11') where host_id=2;
UPDATE 1
Update all the other rows.
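For completeness, the other two rows would look like this. The -10g hostnames and 192.168.10.x addresses here are assumptions following my naming pattern, so substitute whatever your new interface actually uses:
scm=> update hosts set (host_identifier,name,ip_address) = ('yosemite002-10g.somedomain.com','yosemite002-10g.somedomain.com','192.168.10.12') where host_id=3;
scm=> update hosts set (host_identifier,name,ip_address) = ('yosemite003-10g.somedomain.com','yosemite003-10g.somedomain.com','192.168.10.13') where host_id=4;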
Check if the updates went through:
scm=> select host_id,host_identifier,name,ip_address from hosts;
 host_id |        host_identifier         |              name              |   ip_address
---------+--------------------------------+--------------------------------+----------------
       3 | yosemite002.somedomain.com     | yosemite002.somedomain.com     | 192.168.0.12
       4 | yosemite003.somedomain.com     | yosemite003.somedomain.com     | 192.168.0.13
       2 | yosemite001-10g.somedomain.com | yosemite001-10g.somedomain.com | 192.168.10.11
(3 rows)
Exit the tool with “\q”.
Edit /etc/cloudera-scm-agent/config.ini on all nodes and update the server host and the listening IP & hostname entries to the new interface's IP and hostname.
Edit /etc/sysconfig/network with the new hostname for the interface.
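On RHEL/CentOS that file ends up looking something like this (again, the -10g name is my assumed naming convention):
NETWORKING=yes
HOSTNAME=yosemite001-10g.somedomain.com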
Run hostname yosemite001-10g (update the hostname on all the servers the same way).
Run “exec bash” and verify the hostname change is done.
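It is worth confirming the new names resolve before bringing anything back up, since the agents heartbeat by hostname. A quick check, assuming DNS or /etc/hosts already carries the new 10GbE entries:
hostname -f                                  # should print yosemite001-10g.somedomain.com
getent hosts yosemite001-10g.somedomain.com  # should return 192.168.10.11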
Run on the CM server:
chkconfig cloudera-scm-server on
service cloudera-scm-server start
Run the following on all the nodes:
chkconfig cloudera-scm-agent on
service cloudera-scm-agent start
Log in to the GUI and verify the host changes and “Good health”.
Force CM to re-run the configuration so it registers the changes (make a change and revert it to force CM to regenerate the config).
Go to the HDFS service-wide configuration, make any minimal change and save, then revert back to the original and save again.
Do the same for MapReduce as well.
Restart the services
Verify client functionality by running TeraSort.
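The smoke test I use: generate a small dataset with TeraGen and sort it. A sketch for an MRv1 CDH4 install; the examples jar path and row count are assumptions, so adjust them for your layout:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar teragen 1000000 /tmp/tera-in
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar terasort /tmp/tera-in /tmp/tera-out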
Wow! Thanks. I don't like this move from "simple" configuration files to this series of incantations :(
Thank you very much, it really helped us
Thanks so much!!! That useful information helped solve many connection problems that led to failures during job runs....
Hi,
I have managed to screw up my HDFS (CDH 4) installation. Are you able to provide paid/unpaid help regarding this?
Thanks
Yes. Could you explain the issue with HDFS?
We have a 4-node CDH 4.1 cluster with HDFS on it on Amazon EC2 (1 x NameNode + 3 x DataNodes).
I took a snapshot of the data node servers at T1 but forgot to take a snapshot of the name node server as well.
18 hours later (T1 + 18) I realized that sometime during the last 18 hours the disk had filled up, and I took another snapshot of the 3 data nodes.
I increased the size of the disks, but when I tried to bring the cluster back up, the NameNode wouldn't come out of safe mode because a number of blocks were missing.
I do have a backup of the NameNode which ran sometime around T1 + 12 hours.
I have tried to bring the cluster back up with different combinations of the NameNode backup and DataNode backups but I get anywhere from 150 to 600 blocks missing (out of ~1200 total).
Is there a way to recover the data, or am I screwed?
ps40,
were you able to recover your files? If you need any help post back and we can sync up.
Hey, when I run the select query I get something like this:
scm=> select host_id,host_identifier,name,ip_address from hosts;
host_id | host_identifier | name | ip_address
---------+--------------------------------------+------------------------+---------------
8 | dc7c25c8-958a-4332-b78e-1780820333a4 | yamazaki.lunexa.local | 192.168.2.222
6 | 52a9d844-0f16-42b8-9f1b-2572dbbc04fa | woodford.lunexa.local | 192.168.2.224
The host_identifier seems odd. Do I still need to change that part? Mind you, I am running this select query while scm is still up; I just wanted to see how this would work when porting the machines to a new network/domain. Any ideas/help on this issue?
It should be fairly straightforward, and the earlier blog should help you transition over to the new network. If you are using Hive, then it gets a little complicated. The simpler option, if you are running 4.4 or above (it might work on an earlier version but I have not tried it): there is a file /etc/cloudera-scm-agent/config.ini -- in this file, uncomment the following lines
# listening_ip=
# Hostname that Agent reports as its hostname
# listening_hostname=
and fill these in with the new network or domain once you have moved over, but before you start any agents/services. This is very effective if you have multiple interfaces. Also, if you need to force it from the config side, you can use dfs.datanode.hostname in hdfs-site.xml and mapred.tasktracker.dns.nameserver or slave.host.name in mapred-site.xml.
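For example, the hdfs-site.xml override would look roughly like this on each DataNode (the hostname value here is an assumption; use whatever that node's new interface resolves to):
<!-- hdfs-site.xml, per DataNode -->
<property>
  <name>dfs.datanode.hostname</name>
  <value>yosemite002-10g.somedomain.com</value>
</property>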
Hopefully that should help you migrate to the new network/domain. Regarding the host identifier, you might get new records added for the new network, which is fine; you can delete the old network names from the cluster. As always, try this on a test site if you have one before trying it on production :-)