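Bigtop Notes

This post covers:

  • Building Hadoop with Bigtop

Introduction to Bigtop

Bigtop is an Apache Software Foundation project for infrastructure engineers and data scientists who need comprehensive packaging, testing, and configuration of the leading open source big data components. Bigtop supports a wide range of components.
To build with Bigtop, the project recommends using the Docker images it provides.
Example: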

docker run --rm -u jenkins:jenkins -v `pwd`:/ws --workdir /ws bigtop/slaves:trunk-ubuntu-20.04 bash -l -c './gradlew allclean ; ./gradlew bigtop-groovy-pkg'

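What the command does:

  1. docker run: runs a Docker container.
  2. --rm: removes the container automatically once it exits, so nothing is left behind.
  3. -u jenkins:jenkins: sets the user and group that run the command inside the container; both are jenkins here.
  4. -v `pwd`:/ws: mounts the current working directory (the output of pwd) at /ws inside the container, so files on the host are visible in the container.
  5. --workdir /ws: sets the container's working directory to /ws, the mounted host directory.
  6. bigtop/slaves:trunk-ubuntu-20.04: the Docker image to use, here the bigtop/slaves image with the tag trunk-ubuntu-20.04.
  7. bash -l -c './gradlew allclean ; ./gradlew bigtop-groovy-pkg': the command executed inside the container. It runs two Gradle tasks in sequence: allclean, then bigtop-groovy-pkg.
  8. bash -l: starts Bash as a login shell, so the login profile files are loaded and the environment is set up correctly.
  9. -c: passes the command string for Bash to execute.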
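
The same invocation can be wrapped in a small script so that the image tag and the Gradle task are easy to swap. This is only a sketch: the function name bigtop_docker_cmd and the variables BIGTOP_IMAGE and BIGTOP_TASK are illustrative names, not part of Bigtop, and the defaults are simply the values from the command above. The function only prints the command, so it can be reviewed before it is run. It assumes the current directory path contains no spaces.

```shell
#!/usr/bin/env bash
# Sketch: assemble the docker command shown above from two variables.
# BIGTOP_IMAGE and BIGTOP_TASK are illustrative names, not Bigtop settings.
BIGTOP_IMAGE="${BIGTOP_IMAGE:-bigtop/slaves:trunk-ubuntu-20.04}"
BIGTOP_TASK="${BIGTOP_TASK:-bigtop-groovy-pkg}"

bigtop_docker_cmd() {
  # $(pwd) is the modern equivalent of the backtick form `pwd`.
  printf 'docker run --rm -u jenkins:jenkins -v %s:/ws --workdir /ws %s bash -l -c %s\n' \
    "$(pwd)" "$BIGTOP_IMAGE" "'./gradlew allclean ; ./gradlew $BIGTOP_TASK'"
}

# Print the command for review.
bigtop_docker_cmd
```

To build a different package, set BIGTOP_TASK before calling the function, for example BIGTOP_TASK=hadoop-rpm.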

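Bigtop build steps:

  1. On first launch, Bigtop downloads its own dependencies.
  2. It initializes the task parameters, including environment variables and the version of each component.
  3. It downloads the component source. By default this comes from APACHE_MIRROR = "https://apache.osuosl.org" or APACHE_ARCHIVE = "https://archive.apache.org/dist".
    1. A git repository can be specified instead, in which case Bigtop clones the code from git.
  4. It compiles the code.
  5. It builds the rpm packages.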
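
Step 3 can be pictured with the following sketch, which builds the two candidate download URLs for the Hadoop source tarball from the mirror and archive base URLs. The path and file name patterns follow the hadoop entry in bigtop.bom. The function name hadoop_source_urls and the way the file name is appended to the base path are assumptions made for illustration, not Bigtop's actual Gradle code.

```shell
#!/usr/bin/env bash
# Sketch: derive the candidate source tarball URLs for Hadoop.
APACHE_MIRROR="https://apache.osuosl.org"
APACHE_ARCHIVE="https://archive.apache.org/dist"

hadoop_source_urls() {
  local name="hadoop" version="$1"
  # Follows download_path = "/$name/common/$name-${version.base}" from the BOM;
  # the leading slash is dropped here to avoid a double slash in the URL.
  local download_path="${name}/common/${name}-${version}"
  # Follows tarball source = "${name}-${version.base}-src.tar.gz" from the BOM.
  local tarball="${name}-${version}-src.tar.gz"
  echo "${APACHE_MIRROR}/${download_path}/${tarball}"
  echo "${APACHE_ARCHIVE}/${download_path}/${tarball}"
}

hadoop_source_urls 3.3.4
```

The first line printed is the mirror URL and the second is the archive URL.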

Notes:

  1. Bigtop task parameters are configured in bigtop.bom. For example, the entry for building hadoop:
    'hadoop' {
          name    = 'hadoop'
          rpm_pkg_suffix = "_" + bigtop.base_version.replace(".", "_")
          relNotes = 'Apache Hadoop'
          version { base = '3.3.4'; pkg = base; release = 1 }
          tarball { destination = "${name}-${version.base}.tar.gz"
                    source      = "${name}-${version.base}-src.tar.gz" }
          url     { download_path = "/$name/common/$name-${version.base}"
                    site = "${apache.APACHE_MIRROR}/${download_path}"
                    archive = "${apache.APACHE_ARCHIVE}/${download_path}" }
        }
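  2. Bigtop packs the downloaded source into a tarball under bigtop/dl and writes the rpm packages under bigtop/output/hadoop. The source is unpacked into bigtop/build/hadoop/rpm/SOURCES, and compilation runs in bigtop/build/hadoop/rpm/BUILD. Once the source has been downloaded, later runs do not download it again and reuse the rpm package under bigtop/output/hadoop. Bigtop also deletes everything under bigtop/build/hadoop and regenerates it at the start of each build, so editing the unpacked source directly has no effect.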
    ./gradlew hadoop-rpm -PparentDir=/usr/bigtop
    To build from your own modified source, Bigtop also accepts a git repository. For example:
    ./gradlew hadoop-rpm -Pgit_repo=https://github.com/gujincheng/hadoop.git -Pgit_ref=main -Pbase_version=3.3.4 --info
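    Things to watch out for:
  3. Cloning the official source directly from GitHub and building it with Bigtop fails with an error like this:
    /usr/bin/cat /home/gujc/bigtop/build/hadoop/rpm/SOURCES/patch3-fix-broken-dir-detection.diff | /usr/bin/patch  -p1  --fuzz=0
    The cause was not clear at first. One workaround:
    First download the source with ./gradlew hadoop-rpm -PparentDir=/usr/bigtop, push that source to git, and then build from git. The source can then be modified.

    The actual cause: Bigtop keeps several .diff files under bigtop/bigtop-packages/src/common/hadoop. During the build it applies them as patches to the source, and a patch that does not match the source fails.
    Alternative fix: rm -rf bigtop/bigtop-packages/src/common/hadoop/*.diff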
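
Before deleting the patches, it can help to see which of them actually fail against your source tree. The sketch below dry-runs each .diff file with the same -p1 --fuzz=0 options as the failing command. The function name check_bigtop_patches and its two arguments are hypothetical, and the sketch assumes GNU patch is installed.

```shell
#!/usr/bin/env bash
# Sketch: report which Bigtop patches apply cleanly to a source tree.
# Usage: check_bigtop_patches <source-dir> <patch-dir>
check_bigtop_patches() {
  local src_dir="$1" patch_dir="$2" diff_file failed=0
  for diff_file in "$patch_dir"/*.diff; do
    [ -e "$diff_file" ] || continue   # the directory holds no .diff files
    # --dry-run tests the patch without modifying any file;
    # -d makes patch work inside the source tree.
    if patch -d "$src_dir" -p1 --fuzz=0 --dry-run --batch < "$diff_file" > /dev/null 2>&1; then
      echo "OK   $(basename "$diff_file")"
    else
      echo "FAIL $(basename "$diff_file")"
      failed=1
    fi
  done
  return "$failed"
}
```

A typical call would pass the git checkout as the first argument and bigtop/bigtop-packages/src/common/hadoop as the second. Only the patches reported as FAIL need attention.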

Building Hadoop with Bigtop

  1. Download Bigtop
    git clone https://github.com/apache/bigtop.git -b release-3.2.0
  2. Install the dependencies Hadoop needs
    yum -y install fuse-devel cmake cmake3 lzo-devel openssl-devel
  3. Build Hadoop with Bigtop
    ./gradlew hadoop-rpm -Pgit_repo=https://github.com/gujincheng/hadoop.git -Pgit_ref=main -Pbase_version=3.3.4 -PparentDir=/usr/bigtop --info

    Errors encountered:

  1. Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.11.2:yarn (yarn install) on project hadoop-yarn-applications-catalog-webapp: Failed to run task: 'yarn ' failed. org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1) -> [Help 1]
    Cause: the Node version on the system does not match the version in the pom.
    Locate the module that failed to build: hadoop-yarn-applications-catalog-webapp (cd hadoop-3.3.4-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp)

Edit pom.xml so that the Node version (v14.21.3 here) matches the version installed on the system.
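
This check and edit can be scripted. The following sketch reads the nodeVersion element from a pom.xml, compares it with a target version, and rewrites it when they differ. The function names are hypothetical, and the sketch assumes the pom contains a single nodeVersion element written on one line.

```shell
#!/usr/bin/env bash
# Sketch: align <nodeVersion> in a pom.xml with a chosen Node.js version.

pom_node_version() {
  # Print the text inside the first <nodeVersion>...</nodeVersion> element.
  sed -n 's:.*<nodeVersion>\(.*\)</nodeVersion>.*:\1:p' "$1" | head -n 1
}

set_pom_node_version() {
  local pom="$1" wanted="$2" tmp
  # Nothing to do when the pom already names the wanted version.
  [ "$(pom_node_version "$pom")" = "$wanted" ] && return 0
  tmp="$(mktemp)"
  # Write to a temp file first, because `sed -i` differs between GNU and BSD sed.
  # Copying the content back with cat keeps the permissions of the original pom.
  sed "s:<nodeVersion>.*</nodeVersion>:<nodeVersion>${wanted}</nodeVersion>:" "$pom" > "$tmp" &&
    cat "$tmp" > "$pom"
  rm -f "$tmp"
}

# Example: use the locally installed version (node --version prints a value such as v14.21.3).
# set_pom_node_version pom.xml "$(node --version)"
```

Keeping the read and write steps separate makes it possible to print the current value first and decide whether the pom or the installed Node should change.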

  2. org.codehaus.mojo:exec-maven-plugin:1.3.1:exec (ember build) on project hadoop-yarn-ui: Command execution failed.
    /home/gujc/bigtop/build/hadoop/rpm/BUILD/hadoop-3.3.4-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp/node_modules/temp/lib/temp.js:273
    exports.dir               = path.resolve(os.tmpDir());
                                                ^
    
    TypeError: os.tmpDir is not a function
        at Object.<anonymous> (/home/gujc/bigtop/build/hadoop/rpm/BUILD/hadoop-3.3.4-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/webapp/node_modules/temp/lib/temp.js:273:45)
    Cause: os.tmpDir() has been deprecated since Node.js 7.0.0 and was removed in Node.js 14.0.0, so it is no longer available in Node.js 14.0.0 and later.
    Fix:
    1. Switch Node to a version earlier than v14.0: edit hadoop-yarn-api/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/pom.xml and change nodeVersion to v13.14.0.
    2. Add --ignore-engines:
    <execution>
       <phase>generate-resources</phase>
       <id>yarn install</id>
       <goals>
          <goal>yarn</goal>
       </goals>
       <configuration>
          <arguments>install --ignore-engines</arguments>
       </configuration>
    </execution>
  3. com.github.eirslett:frontend-maven-plugin:1.11.2:yarn (yarn install) on project hadoop-yarn-applications-catalog-webapp
    [INFO] error triple-beam@1.4.1: The engine "node" is incompatible with this module. Expected version ">= 14.0.0". Got "13.14.0"
    [INFO] error Found incompatible module.
    [INFO] error Found incompatible module.info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
    Adding --ignore-engines here disables yarn's check of the Node.js engine version that the packages require.
    The modified execution:
    <execution>
        <phase>generate-resources</phase>
        <id>yarn install</id>
        <goals>
          <goal>yarn</goal>
        </goals>
        <configuration>
            <arguments>install --ignore-engines</arguments>
        </configuration>
    </execution>
    The build finally succeeds:
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-hdfs-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-yarn-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-mapreduce-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-hdfs-namenode-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-hdfs-secondarynamenode-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-hdfs-zkfc-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-hdfs-journalnode-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-hdfs-datanode-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-hdfs-dfsrouter-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-httpfs-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-kms-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-yarn-resourcemanager-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-yarn-nodemanager-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-yarn-proxyserver-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-yarn-timelineserver-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-mapreduce-historyserver-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-yarn-router-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-client-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-conf-pseudo-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-doc-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-libhdfs-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-libhdfs-devel-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-libhdfspp-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-libhdfspp-devel-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-hdfs-fuse-3.3.4-1.el7.x86_64.rpm
    Wrote: /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64/hadoop-debuginfo-3.3.4-1.el7.x86_64.rpm
    Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.eo0qaR
    + umask 022
    + cd /home/gujc/bigtop/build/hadoop/rpm//BUILD
    + cd hadoop-3.3.4-src
    + /usr/bin/rm -rf /home/gujc/bigtop/build/hadoop/rpm/BUILDROOT/hadoop-3.3.4-1.el7.x86_64
    + exit 0
    Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.cwdAe5
    + umask 022
    + cd /home/gujc/bigtop/build/hadoop/rpm//BUILD
    + rm -rf hadoop-3.3.4-src
    + exit 0
    [ant:touch] Creating /home/gujc/bigtop/build/hadoop/.rpm
    :hadoop-rpm (Thread[Execution worker for ':',5,main]) completed. Took 38 mins 23.831 secs.
    
    BUILD SUCCESSFUL in 39m 14s
    31 actionable tasks: 5 executed, 26 up-to-date
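
As a quick sanity check after a successful build, the number of generated packages can be compared with the log. The sketch below counts the .rpm files under a directory; the function name count_rpms is hypothetical, and the directory passed to it would be the RPMS path from the log above.

```shell
#!/usr/bin/env bash
# Sketch: count the .rpm files below a directory.
# Usage: count_rpms <directory>
count_rpms() {
  # tr strips the padding that some wc implementations add.
  find "$1" -type f -name '*.rpm' | wc -l | tr -d ' '
}
```

For the run shown here, count_rpms /home/gujc/bigtop/build/hadoop/rpm/RPMS/x86_64 should report 27, one per package file in the log, provided the directory holds nothing from earlier builds.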