
P1

211275022田昊东

Single-Machine Pseudo-Distributed Installation

  • The installation is performed on Ubuntu 18.04.

  • Hadoop relies on SSH for task-plan distribution, heartbeat monitoring, task management, multi-tenant management, and similar functions, so install SSH first and configure a public/private key pair.

  • Install it: sudo apt-get install openssh-server

  • Generate and install the key pair:

    • cd ~/.ssh/
    • ssh-keygen -t rsa
    • cat ./id_rsa.pub >> ./authorized_keys
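  • A quick check that passwordless login now works (the first connection may ask you to accept the host key):

```bash
ssh localhost   # should log in without a password prompt
exit            # return to the original shell
```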
  • Since Hadoop is written in Java, set up a Java environment first.

  • sudo apt-get install default-jre default-jdk

  • Edit the environment variables

    • vim ~/.bashrc (sudo is unnecessary when editing a file in your own home directory)

```bash
export JAVA_HOME=/usr/lib/jvm/default-java
export HADOOP_HOME=/usr/local/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```

    • source ~/.bashrc
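  • To verify the variables took effect (the exact JDK version printed will vary):

```bash
echo $JAVA_HOME   # should print /usr/lib/jvm/default-java
java -version     # should print the installed JDK version
```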

  • Download and extract Hadoop

  • sudo tar -zxf /home/hadoop/Downloads/hadoop-3.3.6.tar.gz -C /usr/local
  • Note that the archive extracts to /usr/local/hadoop-3.3.6 while the paths below assume /usr/local/hadoop, so rename the directory (see the sketch below).
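  • A minimal sketch of the rename plus an install check (the hadoop user in the chown step is an assumption based on the home path above):

```bash
sudo mv /usr/local/hadoop-3.3.6 /usr/local/hadoop   # match HADOOP_HOME
sudo chown -R hadoop /usr/local/hadoop              # assumed user "hadoop"
/usr/local/hadoop/bin/hadoop version                # should report 3.3.6
```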

  • Write the Java environment into Hadoop's configuration file

  • cd /usr/local/hadoop/etc/hadoop

  • sudo vim hadoop-env.sh
  • Add export JAVA_HOME=/usr/lib/jvm/default-java
  • Apply the configuration: source hadoop-env.sh

  • Edit the configuration files

  • core-site.xml

    • sudo gedit core-site.xml

```xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
    <description>Abase for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

    • This sets the temporary-storage directory and the host and port of the default filesystem.

  • hdfs-site.xml

    • sudo gedit hdfs-site.xml

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/data</value>
  </property>
</configuration>
```

    • This configures the number of block replicas and the local storage paths for the namenode and datanode.

  • mapred-site.xml

    • sudo gedit mapred-site.xml

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

  • yarn-site.xml

    • sudo gedit yarn-site.xml

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```
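  • As a quick check that the four files parse and the values are picked up, hdfs getconf prints the effective configuration:

```bash
hdfs getconf -confKey fs.defaultFS      # expect hdfs://localhost:9000
hdfs getconf -confKey dfs.replication   # expect 1
```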

  • Format HDFS: /usr/local/hadoop/bin/hdfs namenode -format

  • Start the system: start-all.sh

  • Check the running daemons with jps
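  • For a pseudo-distributed setup started with start-all.sh, jps should list the following daemons (PIDs will differ):

```bash
jps
# Expected processes:
#   NameNode
#   DataNode
#   SecondaryNameNode
#   ResourceManager
#   NodeManager
#   Jps
```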

  • Run a test

  • Create the input directory and remove any stale output from earlier runs:
    • hdfs dfs -mkdir -p input
    • hdfs dfs -rm -r output
  • Prepare the test input: hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml input
  • Run the test: hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar grep input output 'dfs[a-z.]+'
  • Print the results: hdfs dfs -cat output/*
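  • The grep example emits one line per matched string with its occurrence count; with the files above the output should contain entries along these lines (exact matches and counts depend on the uploaded xml files):

```bash
hdfs dfs -cat output/*
# e.g.
#   1   dfsadmin
#   1   dfs.replication
```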

Problems and Solutions

  • The test job would not run


  • Use the hadoop classpath command to obtain Hadoop's classpath

  • Add the following to yarn-site.xml; it tells the YARN ResourceManager which classpath to include when launching applications:

```xml
<property>
  <name>yarn.application.classpath</name>
  <value>...</value>
</property>
```
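  • A minimal sketch of applying the fix (restarting YARN afterwards is an assumption; any restart of the daemons works):

```bash
hadoop classpath
# paste the colon-separated output between the <value> tags of
# yarn.application.classpath in yarn-site.xml, then restart YARN:
stop-yarn.sh && start-yarn.sh
```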

Docker Cluster Installation

  • Install Docker: curl https://get.docker.com/ | bash -s docker --mirror Aliyun
  • Check its status: sudo systemctl status docker
  • Start it: sudo systemctl start docker
  • Add the user to the docker group (log out and back in for this to take effect): sudo usermod -aG docker hadoop
  • Pull the Docker image: docker pull kiwenlau/hadoop:1.0
  • Create the hadoop bridge network for Docker: sudo docker network create --driver=bridge hadoop
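  • A quick check that the image and the network are in place:

```bash
docker images | grep hadoop       # the pulled image should be listed
docker network ls | grep hadoop   # the bridge network should be listed
```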
  • Prepare the four Hadoop configuration files and the build script (Dockerfile); a build sketch follows the compose file below
  • Write the node startup configuration docker-compose.yml

```yaml
version: '3'
services:
  master1:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
    hostname: hadoop330
    ports:
      - "9000:9000"
      - "9870:9870"
      - "8088:8088"
  master2:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker1:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker2:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
  worker3:
    image: hadoop
    stdin_open: true
    tty: true
    command: bash
```
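  • The compose file references a local image named hadoop, so build it from the prepared Dockerfile first (a sketch; the tag is taken from the compose file above):

```bash
docker build -t hadoop .   # run in the directory containing the Dockerfile
```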

  • Start the cluster: docker-compose up -d


  • Check that the containers are running

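  • For example, via docker-compose (plain docker ps works too):

```bash
docker-compose ps   # all five services should show State "Up"
```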

  • Enter the master node: docker exec -it job-big-data_master1_1 bash

  • Format HDFS: ./bin/hdfs namenode -format
  • Start the cluster: ./sbin/start-all.sh
  • Check each node's status with jps


  • Run a test

```bash
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/root
$ bin/hdfs dfs -mkdir input
$ bin/hdfs dfs -put /etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar grep input output 'dfs[a-z.]+'
```


Problems and Solutions

  • Dependencies could not be downloaded when building from the Dockerfile

```bash
> [2/9] RUN yum install -y openssh-server sudo:
#0 0.865 CentOS Linux 8 - AppStream   71 B/s | 38 B   00:00
#0 0.867 Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist
```

  • Switch the package source: add RUN sed -i 's/archive.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list to the Docker build script
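  • After editing the build script, rebuild without the cache so the failed layer is not reused (the image tag is assumed from the compose file above):

```bash
docker build --no-cache -t hadoop .
```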