Cloudera部署SUSE

操作系统:SUSE11 SP3
CM版本:CM5.11.0
CDH版本:CDH5.10.1
使用root用户对集群进行部署

服务器配置

  • 关闭防火墙
  • 配置时钟同步服务
  • 检查数据盘是否挂载

安装的机器ip:
192.168.65.40,192.168.65.41,192.168.65.42,192.168.65.43,192.168.65.44,192.168.65.45,192.168.65.46,192.168.65.47,192.168.65.48,192.168.65.49,

  1. 安装操作系统,建议对操作系统盘做RAID1
  2. 如果不能连接互联网,先创建OS的repository,以便yum或zypper可以访问OS镜像
  3. 为了使集群中各个节点之间能互相通信,需要静态或动态配置节点的IP地址。如果使用动态配置,请安装DHCP和DNS服务器,具体请参见对应软件的安装文档,此不赘述;如果使用静态IP地址,请正确配置各节点的IP,并在/etc/hosts配置所有节点的静态DNS解析。
    以cm节点为例:/etc/sysconfig/network-scripts/ifcfg-eth0

    DEVICE="bond0"
    BOOTPROTO="static"
    IPV6INIT="no"
    MTU="1500"
    NM_CONTROLLED="no"
    ONBOOT="yes"
    TYPE="Ethernet"
    IPADDR=192.168.60.25
    NETMASK=255.255.255.0
    GATEWAY=192.168.60.1

    /etc/hosts样例

    192.168.60.24 ddp-cm
    192.168.60.25 ddp-hnn-01
    192.168.60.26 ddp-dn-001
    192.168.60.27 ddp-dn-002

  4. 如果机器配置有双网卡,可以做双网卡绑定;

  5. 关闭并禁用防火墙
    /sbin/rcSuSEfirewall2 stop
    chkconfig SuSEfirewall2_setup off
  6. 关闭SELinux
    service boot.apparmor stop
    chkconfig boot.apparmor off
  7. 重启网络服务,并初始化网络
    /etc/init.d/network restart
  8. 修改transparent_hugepage参数,这一参数默认值可能会导致CDH性能下降
    在/etc/init.d/after.local中增加一行:
    if test -f /sys/kernel/mm/transparent_hugepage/enabled; then echo never > /sys/kernel/mm/transparent_hugepage/enabled fi
  9. 禁用交换内存

    $> vim /etc/sysctl.conf
    增加一行:vm.swappiness=0
    echo 0 >/proc/sys/vm/swappiness

  10. 修改/etc/security/limits.conf或者在/etc/security/limits.d下增加相应的配置文件,可以设置一些硬限制和软限制;Cloudera Manager节点会为所有节点自动做这些修改。

  11. 在需要作为Repo库的节点上安装必要的软件,包含HTTP服务和Repo创建工具
  12. SUSE11 SP3 安装会有缺失的依赖需要手动安装rpm包
    记得先查看是否有失效的zypper源,删除后添加内网配置的zypper源

    zypper ar http://192.168.58.55/suse/11/x86_64/ main
    zypper ref
    zypper in postgresql91

    之后即可安装所缺的依赖包 python-psycopg
    sudo rpm -ivh python-psycopg2-2.6.2-3.2.x86_64.rpm

MySQL建表语句:

CREATE USER 'hive'@'%' IDENTIFIED BY 'xxxxxxxx';
create database metastore_r default character set utf8;
GRANT ALL PRIVILEGES ON metastore_r. * TO 'hive'@'%';
create user 'amon'@'%' identified by 'xxxxxxxx'
create database amon_r default character set utf8;
grant all privileges on amon_r.* to 'amon'@'%';
create user 'rman'@'%' identified by 'xxxxxxxx';
create database rman_r default character set utf8;
grant all privileges on rman_r.* to 'rman'@'%';
create user 'sentry'@'%' identified by 'xxxxxxxx';
create database sentry_r default character set utf8;
grant all privileges on sentry_r.* to 'sentry'@'%';
create user 'nav'@'%' identified by 'xxxxxxxx';
create database nav_r default character set utf8;
grant all privileges on nav_r.* to 'nav'@'%';
create user 'navms'@'%' identified by 'xxxxxxxx';
create database navms_r default character set utf8;
grant all privileges on navms_r.* to 'navms'@'%';
create user 'cm'@'%' identified by 'xxxxxxxx';
create database cm_r default character set utf8;
grant all privileges on cm_r.* to 'cm'@'%';
CREATE USER 'hue'@'%' IDENTIFIED BY 'xxxxxxxx';
create database hue_r default character set utf8;
GRANT ALL PRIVILEGES ON hue_r.* TO 'hue'@'%';
CREATE USER 'oozie'@'%' IDENTIFIED BY 'xxxxxxxx';
create database oozie_r default character set utf8;
GRANT ALL PRIVILEGES ON oozie_r.* TO 'oozie'@'%';
FLUSH PRIVILEGES;

Mysql 修改密码

use mysql:
5.7之前版本
update user set password=passworD(“xxxxxxxx”) where user=’hive’;
update user set password=password(“xxxxxxxx”) where user=’hue’;
update user set password=passworD(“xxxxxxxx”) where user=’oozie’;
5.7之后版本
update user set authentication_string = password(‘xxxxxxxx’) where user = ‘hue’;

CDH软件下载与配置(Cloudera管理器节点)

  1. 下载Cloudera管理器需要的rpm包
  2. 下载Parcel包(包含了CDH中的Hadoop组件)
    选择合适版本的parcel包,包括manifest.json文件
  3. 下载后将下载的RPM包放置在一个文件夹后,createrepo .
  4. 执行完后,在cm目录下生成目录repodata
  5. 将文件移动到特定的目录,确保可以通过HTTP协议进行访问
    Apache2服务安装zypper install apache2

http服务侦听配置:修改/etc/apache2/listen.conf 配置HTTP服务器侦听IP和端口,如:
Listen 192.168.65.50:80
http共享目录配置:修改/etc/apache2/default-server.conf 配置一个共享目录,如:

DocumentRoot "/srv/www/htdocs"
<Directory "/srv/www/htdocs">
Options Indexes FollowSymLinks
AllowOverride None
Order allow,deny
Allow from all
</Directory>

启动apache2服务:

/etc/init.d/apache2 start
将下载的CM和Parcels介质文件存放到/srv/www/htdocs/下的自定义目录下
把cm的rpm包移到/var/www/htdocs/cm5.11
把parcels包移到/srv/www/htdocs/parcels/
能通过浏览器通过HTTP URL访问介质目录内容即可,如:
http://<IP>/cm5.11/
http://<IP>/parcels/
新建/etc/zypp/repos.d/myrepo.repo

[myrepo]
name=repo
baseurl=http://192.168.65.40/cm5.11/
enabled=true
gpgcheck=false

安装Cloudera管理器

  1. 安装JDK zypper install oracle-j2sdk1.7
  2. 安装Cloudera管理器服务器 zypper install cloudera-manager-daemons cloudera-manager-server
  3. 在需要访问MySQL的节点上安装mysql-connector-java

    cp mysql-connector-java-5.1.34.jar /usr/share/java/
    ln –s mysql-connector-java-5.1.34.jar mysql-connector-java.jar

  4. 为Cloudera管理器配置外部数据库

    /usr/share/cmf/schema/scm_prepare_database.sh -h <MYSQL_HOST> <DB_TYPE> <DATABASE> <USERNAME> <PASSWORD>
    -h <MYSQL_HOST>可以不指定,默认是localhost
    DB_TYPE可以是mysql, oracle
    DATABASE即为之前为Cloudera Manager配置的数据库
    USERNAME/PASSWORD 即为可以访问这个数据库的用户
    /usr/share/cmf/schema/scm_prepare_database.sh -h 192.168.60.173 mysql 'cm_r' cm 'xxxxxxxx'

  5. 启动Cloudera管理器服务器 service cloudera-scm-server start 启动后就可以访问Cloudera管理器页面了,Cloudera管理器的监听端口为7180

安装CDH集群

  1. 输入账户密码admin/admin,点击“登录”
  2. 选择要安装的集群版本,点击“继续
  3. 了解CDH支持的Hadoop组件信息,点击“继续”
  4. 查找并选择需要安装CDH的机器,点击“继续”
  5. 点击“使用Parcels(建议)”右侧的“更多选项”按钮,在弹出框中设置CDH Parcel包的URL,点击“确定”
  6. 选择5.10.1-1.cdh5.10.1.p0.10,在“自定义存储库”填写依赖包的URL,点击“继续”
  7. 选择需要的JDK,点击“继续”
  8. 输入集群机器的登录密码,点击“继续”
  9. 集群依赖包安装,安装完后点击“继续”
  10. Parcel包安装,安装完后点击“继续”
  11. 检查主机正确性,如果检查出现任何潜在问题,你可以到集群中进行修复,修复后点击“重新运行”重新检查。解决所有问题后,点击“完成”
  12. 选择要安装的服务套装,点击“继续”
  13. 选择具体的角色分配
  14. 设置数据库,设置完毕后点击“测试连接”,测试全部通过后点击“继续”
  15. 配置集群组件的相关参数,点击“继续”
  16. 启动集群,完成后点击“继续”
  17. 点击“完成”
  18. 访问集群
  19. 搞定…

彻底删除CM和CDH步骤,以便重来

service cloudera-scm-agent stop
umount /var/run/cloudera-scm-agent/process
rm -rf /usr/share/cmf /var/lib/cloudera* /var/cache/zypp/packages/cloudera-* /var/cache/zypp/raw/cloudera-* /var/cache/zypp/solv/cloudera-* /var/log/cloudera-scm-* /var/run/cloudera-scm-* /etc/cloudera-scm-server*
rpm -qa |grep cloudera
rpm -e cloudera-manager-agent-5.11.1-1.cm5111.p0.9.sles11
rpm -e cloudera-manager-server-5.11.1-1.cm5111.p0.9.sles11
rpm -e cloudera-manager-daemons-5.11.1-1.cm5111.p0.9.sles11
rm -rf /var/lib/hadoop-* /var/lib/impala /var/lib/solr /var/lib/zookeeper /var/lib/hue /var/lib/oozie /var/lib/pgsql /var/lib/sqoop2 /data/dfs/ /data/impala/ /data/yarn/ /dfs/ /impala/ /yarn/ /var/run/hadoop-*/ /var/run/hdfs-*/ /usr/bin/hadoop* /usr/bin/zookeeper* /usr/bin/hbase* /usr/bin/hive* /usr/bin/hdfs /usr/bin/mapred /usr/bin/yarn /usr/bin/sqoop* /usr/bin/oozie /etc/hadoop* /etc/zookeeper* /etc/hive* /etc/hue /etc/impala /etc/sqoop* /etc/oozie /etc/hbase* /etc/hcatalog
rm -rf /opt/cloudera/parcel-cache /opt/cloudera/parcels
ps aux|grep supervisord |grep -v grep | awk '{print "kill -9 "$2}'|sh
umount /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1
查看磁盘占用:fuser -m -v /mnt/test

sh formatdisk.sh
sh mountdisk.sh

删除创建的数据库

drop database amon_r;
drop database hue_r;
drop database metastore_r;
drop database nav_r;
drop database navms_r;
drop database oozie_r;
drop database rman_r;
drop database sentry_r;


Running a MapReduce Job

  1. Log into a cluster host.
  2. Run the Hadoop PiEstimator example:

    sudo -u hdfs hadoop jar \
    /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    pi 10 100

  3. View the result of running the job by selecting the following from the top navigation bar in the Cloudera Manager Admin Console: Clusters > Cluster Name > YARN Applications.