
Building Impala 3.4 from Source

This article covers:

  • Building Impala 3.4 from source

Building Impala 3.4 from Source

I have recently been integrating Impala with Hudi. Impala can read Hudi tables only from version 3.4 onward, so I am upgrading Impala here. The build steps follow.

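Apache Impala is released as source code only, so you have to build it yourself on the target platform. Pick a machine whose environment matches the cluster.
Build Impala by following the "Building Impala without Test Data (for testing Impala)" section of the documentation: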
https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala

Download the Impala source

git clone --single-branch --branch 3.4.1 https://github.com/apache/impala.git impala-3.4
cd impala-3.4

Install the build dependencies with the bin/bootstrap_system.sh script:

export IMPALA_HOME=`pwd`
./bin/bootstrap_system.sh

Pitfalls encountered with bin/bootstrap_system.sh

  • package make-1:3.82-24.el7.x86_64 is already installed
    ## The package is already installed; remove it and rerun the script
    yum remove make-1:3.82-24.el7.x86_64
  • Download of apache-ant-1.9.14-bin.tar.gz fails
    Cleaning repos: base docker-ce-stable epel extras updates
    Cleaning up list of fastest mirrors
    ++ redhat sudo wget -nv https://downloads.apache.org/ant/binaries/apache-ant-1.9.14-bin.tar.gz
    ++ [[ true == true ]]
    ++ sudo wget -nv https://downloads.apache.org/ant/binaries/apache-ant-1.9.14-bin.tar.gz
    https://downloads.apache.org/ant/binaries/apache-ant-1.9.14-bin.tar.gz:
    2022-05-05 13:36:24 ERROR 404: Not Found.
    The fix:
    # Change the Ant download URL in bootstrap_system.sh
    vim $IMPALA_HOME/bin/bootstrap_system.sh
    
    # Lines 244-247 of the script after the change:
      https://downloads.apache.org/ant/binaries/apache-ant-1.10.12-bin.tar.gz
    redhat sha512sum -c - <<< '2287dc5cfc21043c14e5413f9afb1c87c9f266ec2a9ba2d3bf2285446f6e4ccb59b558bf2e5c57911a05dfa293c7d5c7ad60ac9f744ba11406f4e6f9a27b2403  apache-ant-1.10.12-bin.tar.gz'
    redhat sudo tar -C /usr/local -xzf apache-ant-1.10.12-bin.tar.gz
    redhat sudo ln -s /usr/local/apache-ant-1.10.12/bin/ant /usr/local/bin
  • Data directory is not empty!
    Hint: the preferred way to do this is now "postgresql-setup initdb"
    Data directory is not empty!
    This happens because bin/bootstrap_system.sh was run more than once, so PostgreSQL initialization was attempted repeatedly. Delete the data directory and initdb.log under /var/lib/pgsql,
    then rerun bin/bootstrap_system.sh.
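
The Ant and PostgreSQL fixes above can also be scripted rather than done by hand. The following is a minimal sketch, not part of Impala: patch_ant_version and reset_pgsql_data are hypothetical helper names. The first only rewrites the version string, so the sha512 literal in the script still has to be updated manually. The second moves the data directory aside instead of deleting it, which is a more cautious variant of the fix described above.

```shell
# Hypothetical helpers; names and arguments are illustrative only.

# Replace the Ant version string everywhere in a script.
# $1 = script path, $2 = old version, $3 = new version.
# The sha512 checksum literal is NOT touched and must be edited by hand.
patch_ant_version() (
  script="$1"; old="$2"; new="$3"
  # Escape the dots so sed matches the old version literally.
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  # -i.bak edits in place and keeps a backup copy next to the script.
  sed -i.bak "s/apache-ant-$old_re/apache-ant-$new/g" "$script"
)

# Clear a half-initialised PostgreSQL data directory.
# $1 = PostgreSQL root (assumed default: /var/lib/pgsql).
reset_pgsql_data() (
  pg_root="${1:-/var/lib/pgsql}"
  stamp=$(date +%Y%m%d%H%M%S)
  if [ -d "$pg_root/data" ]; then
    # Move aside rather than delete, so the old contents can be inspected.
    mv "$pg_root/data" "$pg_root/data.bak.$stamp" || exit 1
  fi
  rm -f "$pg_root/initdb.log"
)
```

On the build machine these would be called as patch_ant_version "$IMPALA_HOME/bin/bootstrap_system.sh" 1.9.14 1.10.12 and, with root privileges, reset_pgsql_data /var/lib/pgsql, before rerunning bin/bootstrap_system.sh.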

If Impala has been built on this machine before, you can skip the step above and build directly:

source $IMPALA_HOME/bin/impala-config.sh
$IMPALA_HOME/buildall.sh -noclean -notests -release

Verify that the build succeeded

[root@V-NJ-2-220 impala]# ll -h be/build/latest/service/impalad fe/target/impala-frontend-0.1-SNAPSHOT.jar
-rwxr-xr-x. 1 root root 530M May  6 16:39 be/build/latest/service/impalad
-rw-r--r--. 1 root root 7.5M May  6 16:41 fe/target/impala-frontend-0.1-SNAPSHOT.jar
[root@V-NJ-2-220 impala]# strings be/build/latest/service/impalad | grep 3.4.1-
3.4.1-RELEASE
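
The same check can be wrapped in a small function so that it fails loudly when an artifact is missing. This is a minimal sketch under stated assumptions: check_build is a hypothetical helper name, and it uses grep -a -F on the binary instead of strings piped into grep, so that the dots in the version are matched literally and no extra tool is needed.

```shell
# check_build is a hypothetical helper, not part of Impala.
# $1 = Impala build tree, $2 = expected version string (for example 3.4.1-)
check_build() (
  home="$1"; want="$2"
  impalad="$home/be/build/latest/service/impalad"
  jar="$home/fe/target/impala-frontend-0.1-SNAPSHOT.jar"

  [ -x "$impalad" ] || { echo "missing or not executable: $impalad" >&2; exit 1; }
  [ -f "$jar" ]     || { echo "missing: $jar" >&2; exit 1; }

  # -a reads the binary as text, -F treats the version as a fixed string.
  grep -a -F -q -- "$want" "$impalad" ||
    { echo "version string $want not found in $impalad" >&2; exit 1; }

  echo "build OK: $want"
)
```

Calling check_build "$IMPALA_HOME" 3.4.1- returns a non-zero status if either file is absent or the version string does not appear in the binary.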

Impala is statically linked by default, but it still has some dynamic dependencies. Inspect them with ldd:

[root@V-NJ-2-220 impala]# ldd be/build/latest/service/impalad
	linux-vdso.so.1 =>  (0x00007fff13bda000)
	/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64/jre/lib/amd64/libjsig.so (0x00007f03f6fbe000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f03f6da2000)
	libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007f03f6b85000)
	libjvm.so => /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so (0x00007f03f5ada000)
	libkudu_client.so.0 => /root/gujc/impala/toolchain/kudu-4ed0dbbd1/release/lib64/libkudu_client.so.0 (0x00007f03f53f7000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f03f51ef000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f03f4feb000)
	libssl.so.10 => /lib64/libssl.so.10 (0x00007f03f4d79000)
	libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f03f4916000)
	libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f03f462d000)
	libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f03f43e0000)
	libstdc++.so.6 => /root/gujc/impala/toolchain/snappy-1.1.4/lib/libstdc++.so.6 (0x00007f03f40d6000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f03f3dd4000)
	libgcc_s.so.1 => /root/gujc/impala/toolchain/snappy-1.1.4/lib/libgcc_s.so.1 (0x00007f03f3bbd000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f03f37ef000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f03f71c2000)
	libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f03f35d5000)
	libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f03f339e000)
	libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f03f316b000)
	libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f03f2f67000)
	libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f03f2d57000)
	libz.so.1 => /lib64/libz.so.1 (0x00007f03f2b41000)
	libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f03f293d000)
	libfreebl3.so => /lib64/libfreebl3.so (0x00007f03f273a000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f03f2513000)
	libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f03f22b1000)

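Most of these .so files ship with the system or are already installed, so only the ones tied to the Impala version need to be copied, such as libkudu_client.so.0. The rest can stay where they are.
"Copying" here means placing the dependencies that are not system-installed into the CM installation directory. In the ldd output above, libkudu_client.so.0 resolves into the freshly built source tree. libstdc++.so.6 and libgcc_s.so.1 also resolve into the toolchain directory, but the deployment steps below leave every .so file other than libkudu_client.so.0 untouched. The remaining libraries come from system packages.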
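
Picking the build-tree libraries out of a long ldd listing by eye is error-prone. The sketch below filters ldd output by path prefix. libs_under is a hypothetical helper name, and the prefix to pass is whatever directory the build lives in.

```shell
# libs_under is a hypothetical helper, not part of Impala.
# Reads ldd output on stdin and prints the resolved library paths
# that start with the prefix given as $1.
libs_under() {
  # ldd lines look like: "<name> => <path> (<address>)".
  # index(...) == 1 means the path begins with the prefix.
  awk -v prefix="$1" '$2 == "=>" && index($3, prefix) == 1 { print $3 }'
}
```

With the paths shown above, ldd be/build/latest/service/impalad | libs_under /root/gujc/impala/toolchain lists only the libraries that resolve into the toolchain directory.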

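Deploying the new Impala version

We create a directory with the same structure as /opt/cloudera/parcels/CDH/lib/impala and then make CM use it by setting the IMPALA_HOME environment variable.
The file layout is not covered in detail here; see the article "在CDH6.3中单独升级IMPALA到APACHE IMPALA 3.4" (upgrading only Impala to Apache Impala 3.4 on CDH 6.3).

Create the new Impala directory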

cd /opt/cloudera/parcels/CDH/lib
cp -r impala impala-3.4
cd impala-3.4

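Then follow these steps:

  1. Delete all the jar files in the lib directory, leaving only the .so files.
  2. Replace libkudu_client.so.0 with the one used when building Impala 3.4. The ldd output above shows it at /root/gujc/impala/toolchain/kudu-4ed0dbbd1/release/lib64/libkudu_client.so.0. Leave the other .so files alone.
  3. Copy the jars that Impala 3.4 depends on into this lib directory. They are in the build tree under $IMPALA_HOME/fe/target/dependency/ (here $IMPALA_HOME is the build tree).
  4. Copy the impala-frontend-0.1-SNAPSHOT.jar built from Impala 3.4 into the lib directory. Its path in the build tree is fe/target/impala-frontend-0.1-SNAPSHOT.jar.
  5. Copy the impala-data-source-api-1.0-SNAPSHOT.jar built from Impala 3.4 into the lib directory. Its path in the build tree is ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar.
  6. Replace impalad in the sbin-retail directory with the impalad built from Apache Impala 3.4. Its path in the build tree is be/build/latest/service/impalad.
  7. Copy the new Impala directory to every machine and make sure the copies are identical.
  8. Change the CM configuration: in CM go to Impala -> Configuration -> env and add the environment variable IMPALA_HOME=/opt/cloudera/parcels/CDH/lib/impala-3.4
  9. Restart the entire Impala cluster.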
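
Steps 1 to 6 are plain file operations, so they can be collected into one function and replayed on a test copy first. This is a minimal sketch, not a tested deployment tool: stage_impala is a hypothetical helper name, and its three arguments (build tree, copied impala-3.4 directory, path to the libkudu_client.so.0 reported by ldd) are assumptions about how you lay things out. Steps 7 to 9 (distribution, CM configuration, restart) are not covered.

```shell
# stage_impala is a hypothetical helper, not part of Impala or CDH.
# $1 = Impala build tree
# $2 = the copied impala-3.4 directory (contains lib/ and sbin-retail/)
# $3 = libkudu_client.so.0 used by the build (path from the ldd output)
stage_impala() (
  build="$1"; target="$2"; kudu_lib="$3"

  # Step 1: drop the old jars from lib/, keep the .so files.
  find "$target/lib" -maxdepth 1 -name '*.jar' -exec rm -f {} + &&

  # Step 2: swap in the Kudu client library from the build.
  # Remove first so an existing symlink is replaced, not written through.
  rm -f "$target/lib/libkudu_client.so.0" &&
  cp "$kudu_lib" "$target/lib/libkudu_client.so.0" &&

  # Step 3: jars that the new frontend depends on.
  cp "$build"/fe/target/dependency/*.jar "$target/lib/" &&

  # Steps 4 and 5: the frontend and data-source API jars.
  cp "$build/fe/target/impala-frontend-0.1-SNAPSHOT.jar" "$target/lib/" &&
  cp "$build/ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar" "$target/lib/" &&

  # Step 6: the new impalad binary.
  rm -f "$target/sbin-retail/impalad" &&
  cp "$build/be/build/latest/service/impalad" "$target/sbin-retail/impalad"
)
```

The function stops at the first failing copy and returns a non-zero status, so a missing artifact does not leave a silently incomplete directory.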

Connect to Impala with beeline and check the log output:

[root@V-NJ-2-220 ~]# beeline -u "jdbc:impala://xxxx:21050;AuthMech=3;UID=root;PWD=;UseSasl=0"
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:impala://xxxx:21050;AuthMech=3;UID=root;PWD=;UseSasl=0
Connected to: Impala (version 3.4.1-RELEASE)
Driver: ImpalaJDBC (version 02.06.03.1004)
Error: [Cloudera][JDBC](11975) Unsupported transaction isolation level: 4. (state=HY000,code=11975)
Beeline version 2.1.1-cdh6.2.0 by Apache Hive
0: jdbc:impala://xxxx:21050>

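The output shows that Impala is now version 3.4.1-RELEASE.
For details, see the article "CDH6.3.2升级impala3.2至impala3.4详细步骤" (detailed steps for upgrading Impala 3.2 to Impala 3.4 on CDH 6.3.2).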

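The Impala source pins Hudi to 0.5.0-incubating by default. I tried changing the Hudi version to 0.10.1, and the build failed.
Will Impala keep maintaining the 3.x line? 4.x drops Hive 2 and supports only Hive 3, which makes upgrading more costly.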

IMPALA_HBASE_VERSION    = 2.1.0-cdh6.x-SNAPSHOT
IMPALA_HUDI_VERSION     = 0.5.0-incubating
IMPALA_SENTRY_VERSION   = 2.1.0-cdh6.x-SNAPSHOT
IMPALA_KUDU_VERSION     = 4ed0dbbd1
IMPALA_KUDU_JAVA_VERSION= 1.12.0-SNAPSHOT
IMPALA_RANGER_VERSION   = 2.0.0.7.0.2.0-212
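
When comparing builds it helps to pull a single value out of a listing like the one above. A minimal sketch, assuming the listing has been captured to a file or a pipe in the same NAME = value shape; config_value is a hypothetical helper name.

```shell
# config_value is a hypothetical helper, not part of Impala.
# Reads "NAME = value" lines on stdin and prints the value for the name in $1.
config_value() {
  awk -F'=' -v name="$1" '
    {
      key = $1
      gsub(/[ \t]/, "", key)        # strip the padding around the name
    }
    key == name {
      val = $2
      sub(/^[ \t]+/, "", val)       # trim leading whitespace
      sub(/[ \t]+$/, "", val)       # trim trailing whitespace
      print val
    }'
}
```

For example, piping the listing into config_value IMPALA_HUDI_VERSION prints only the Hudi version that the build is pinned to.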