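This article covers:
Building Impala 3.4 from source
I have been integrating Impala with Hudi recently, but Impala can only read Hudi tables from version 3.4 onward, so this is an attempt to upgrade Impala. Without further ado, here are the build steps.
Apache Impala is released as source code, so you have to build it yourself on the target platform. Use a machine whose environment matches the cluster.
Follow the "Building Impala without Test Data (for testing Impala)" section of the documentation to build Impala: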
https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala
Download the Impala source
git clone --single-branch --branch 3.4.1 https://github.com/apache/impala.git impala-3.4
cd impala-3.4
Install the build dependencies with the bin/bootstrap_system.sh script:
export IMPALA_HOME=`pwd`
./bin/bootstrap_system.sh
Pitfalls with bin/bootstrap_system.sh
- package make-1:3.82-24.el7.x86_64 is already installed
The package is clearly already installed. Remove it, then run the script again:
yum remove make-1:3.82-24.el7.x86_64
- apache-ant-1.9.14-bin.tar.gz fails to download
The log shows a 404 for the Ant 1.9.14 tarball:
Cleaning repos: base docker-ce-stable epel extras updates
Cleaning up list of fastest mirrors
++ redhat sudo wget -nv https://downloads.apache.org/ant/binaries/apache-ant-1.9.14-bin.tar.gz
++ [[ true == true ]]
++ sudo wget -nv https://downloads.apache.org/ant/binaries/apache-ant-1.9.14-bin.tar.gz
https://downloads.apache.org/ant/binaries/apache-ant-1.9.14-bin.tar.gz:
2022-05-05 13:36:24 ERROR 404: Not Found.
The fix is to change the Ant download address in bootstrap_system.sh (lines 244 to 247 in this checkout) to a version that is still available, here 1.10.12:
vim $IMPALA_HOME/bin/bootstrap_system.sh
https://downloads.apache.org/ant/binaries/apache-ant-1.10.12-bin.tar.gz
redhat sha512sum -c - <<< '2287dc5cfc21043c14e5413f9afb1c87c9f266ec2a9ba2d3bf2285446f6e4ccb59b558bf2e5c57911a05dfa293c7d5c7ad60ac9f744ba11406f4e6f9a27b2403 apache-ant-1.10.12-bin.tar.gz'
redhat sudo tar -C /usr/local -xzf apache-ant-1.10.12-bin.tar.gz
redhat sudo ln -s /usr/local/apache-ant-1.10.12/bin/ant /usr/local/bin
- Data directory is not empty!
Hint: the preferred way to do this is now "postgresql-setup initdb"
Data directory is not empty!
This happens because bin/bootstrap_system.sh was run more than once, so PostgreSQL initialization was attempted again on an existing data directory. Delete the data folder and initdb.log under /var/lib/pgsql, then run bin/bootstrap_system.sh again.
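The two manual fixes above (the Ant download address and the leftover PostgreSQL data directory) can also be scripted. The following is only a sketch under assumptions: the function names patch_ant_download and reset_pgsql_initdb are made up for illustration, and the sed patterns assume the Ant tarball name and its 128-character SHA-512 sit on the same lines as in the excerpt above. Check the patched script by hand before running it.

```shell
# Hypothetical helper: switch bootstrap_system.sh from one Ant version to another.
# $1 = path to bootstrap_system.sh, $2 = old version, $3 = new version,
# $4 = SHA-512 of the new tarball. A backup is kept as <script>.bak.
patch_ant_download() (
  script="$1"; old="$2"; new="$3"; sha512="$4"
  cp -- "$script" "$script.bak" || exit 1
  # First rename every apache-ant-<old> occurrence, then swap the checksum,
  # but only on the sha512sum line that mentions the Ant tarball.
  sed -e "s/apache-ant-$old/apache-ant-$new/g" \
      -e "/sha512sum.*apache-ant-/s/[0-9a-f]\{128\}/$sha512/" \
      "$script.bak" > "$script"
)

# Hypothetical helper: remove what an earlier initdb left behind.
# $1 = PostgreSQL directory, defaulting to /var/lib/pgsql as named above.
# Only the data folder and initdb.log are removed.
reset_pgsql_initdb() (
  pgsql_dir="${1:-/var/lib/pgsql}"
  rm -rf -- "$pgsql_dir/data" || exit 1
  rm -f -- "$pgsql_dir/initdb.log"
)
```

For example, patch_ant_download "$IMPALA_HOME/bin/bootstrap_system.sh" 1.9.14 1.10.12 followed by the checksum quoted above reproduces the manual edit, and reset_pgsql_initdb with no argument (run as root) clears /var/lib/pgsql before the script is run again.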
If Impala has been built on this machine before, you can skip the step above and go straight to the build:
source $IMPALA_HOME/bin/impala-config.sh
$IMPALA_HOME/buildall.sh -noclean -notests -release
Check that the build succeeded
[root@V-NJ-2-220 impala]# ll -h be/build/latest/service/impalad fe/target/impala-frontend-0.1-SNAPSHOT.jar
-rwxr-xr-x. 1 root root 530M May  6 16:39 be/build/latest/service/impalad
-rw-r--r--. 1 root root 7.5M May  6 16:41 fe/target/impala-frontend-0.1-SNAPSHOT.jar
[root@V-NJ-2-220 impala]# strings be/build/latest/service/impalad | grep 3.4.1-
3.4.1-RELEASE
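The same check can be wrapped in a small function, which is handy when repeating it on several hosts. This is a sketch, not part of the Impala tooling: the name embedded_release_version is made up, and it uses grep -a -o (available in GNU and BSD grep) instead of strings, matching any X.Y.Z-RELEASE string rather than only 3.4.1.

```shell
# Hypothetical helper: print the first X.Y.Z-RELEASE string found in a binary.
# $1 = path to the binary, e.g. be/build/latest/service/impalad
embedded_release_version() (
  # -a treats the binary as text, -o prints only the matching part.
  grep -a -o -E '[0-9]+\.[0-9]+\.[0-9]+-RELEASE' "$1" | head -n 1
)
```

Calling embedded_release_version be/build/latest/service/impalad from the build directory should print the version string shown above if the build is the expected one.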
Impala is statically linked by default, but it still has some dynamic dependencies. Use ldd to list them:
[root@V-NJ-2-220 impala]# ldd be/build/latest/service/impalad
linux-vdso.so.1 => (0x00007fff13bda000)
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64/jre/lib/amd64/libjsig.so (0x00007f03f6fbe000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f03f6da2000)
libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007f03f6b85000)
libjvm.so => /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so (0x00007f03f5ada000)
libkudu_client.so.0 => /root/gujc/impala/toolchain/kudu-4ed0dbbd1/release/lib64/libkudu_client.so.0 (0x00007f03f53f7000)
librt.so.1 => /lib64/librt.so.1 (0x00007f03f51ef000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f03f4feb000)
libssl.so.10 => /lib64/libssl.so.10 (0x00007f03f4d79000)
libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f03f4916000)
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f03f462d000)
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f03f43e0000)
libstdc++.so.6 => /root/gujc/impala/toolchain/snappy-1.1.4/lib/libstdc++.so.6 (0x00007f03f40d6000)
libm.so.6 => /lib64/libm.so.6 (0x00007f03f3dd4000)
libgcc_s.so.1 => /root/gujc/impala/toolchain/snappy-1.1.4/lib/libgcc_s.so.1 (0x00007f03f3bbd000)
libc.so.6 => /lib64/libc.so.6 (0x00007f03f37ef000)
/lib64/ld-linux-x86-64.so.2 (0x00007f03f71c2000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f03f35d5000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f03f339e000)
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f03f316b000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f03f2f67000)
libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f03f2d57000)
libz.so.1 => /lib64/libz.so.1 (0x00007f03f2b41000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f03f293d000)
libfreebl3.so => /lib64/libfreebl3.so (0x00007f03f273a000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f03f2513000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f03f22b1000)
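Most of these .so files ship with the system or are already installed, so only the ones tied to the Impala version need to be copied, such as libkudu_client.so.0. The rest do not need to be copied.
"Copying" here means placing the dependencies that are not installed system-wide into the CM installation directory. In the output above, libkudu_client.so.0 resolves into the freshly built source tree. libstdc++.so.6 and libgcc_s.so.1 also resolve to toolchain paths, but the steps below leave every .so other than libkudu_client.so.0 as it is, and the remaining libraries come from the system.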
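Picking the non-system libraries out of a long ldd listing by eye is error-prone, so a filter helps. The sketch below is an illustration only: the function name non_system_libs is made up, and treating /lib, /lib64, /usr/lib and /usr/lib64 as "system" directories is an assumption that fits the listing above (the JVM under /usr/lib/jvm counts as system here).

```shell
# Hypothetical helper: read ldd output on stdin and print "name path" for
# every library that resolves outside the assumed system directories.
non_system_libs() {
  awk '$2 == "=>" && $3 ~ /^\// && $3 !~ /^\/(lib|lib64|usr\/lib|usr\/lib64)\// { print $1, $3 }'
}
```

Usage: ldd be/build/latest/service/impalad | non_system_libs. Lines without "=>" and entries with no resolved path, such as linux-vdso.so.1, are skipped.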
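Deploying the new Impala version
We create a directory with the same layout as /opt/cloudera/parcels/CDH/lib/impala and then point CM at it by setting the IMPALA_HOME environment variable.
The file layout is not covered in detail here; see the article "在CDH6.3中单独升级IMPALA到APACHE IMPALA 3.4" (Upgrading only Impala to Apache Impala 3.4 on CDH 6.3).
Create the new Impala directory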
cd /opt/cloudera/parcels/CDH/lib
cp -r impala impala-3.4
cd impala-3.4
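Then carry out the following steps:
- Delete all the jars in the lib directory, leaving only the .so files.
- Replace libkudu_client.so.0 with the one used when building Impala 3.4. The ldd output above shows it at /root/gujc/impala/toolchain/kudu-4ed0dbbd1/release/lib64/libkudu_client.so.0. Leave the other .so files alone.
- Copy the jars that impala-3.4 depends on into this lib directory. They are in the build tree under $IMPALA_HOME/fe/target/dependency/
- Put the impala-frontend-0.1-SNAPSHOT.jar built from impala-3.4 into the lib directory. Its path in the build tree is fe/target/impala-frontend-0.1-SNAPSHOT.jar
- Put the impala-data-source-api-1.0-SNAPSHOT.jar built from impala-3.4 into the lib directory. Its path in the build tree is ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar
- Replace impalad in the sbin-retail directory with the impalad built from Apache Impala 3.4. Its path in the build tree is be/build/latest/service/impalad
- Copy the new Impala directory to every machine and make sure the copies are identical.
- Change the CM configuration: in CM go to Impala -> Configuration -> env and add the environment variable IMPALA_HOME=/opt/cloudera/parcels/CDH/lib/impala-3.4
- Then restart the whole Impala cluster.
Connect to Impala through beeline and check the output: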
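The file-copying steps in the list above can be collected into one function so they run the same way every time. This is a sketch under assumptions, not a tested deployment script: the name assemble_impala_dir and its three arguments are made up for illustration, it only covers the local steps (distribution to other hosts and the CM change stay manual), and it refuses to run if the target directory already exists.

```shell
# Hypothetical helper: build <cdh_lib>/impala-3.4 from <cdh_lib>/impala.
# $1 = Impala build tree (the IMPALA_HOME used for the build)
# $2 = CDH lib directory, e.g. /opt/cloudera/parcels/CDH/lib
# $3 = path to the libkudu_client.so.0 reported by ldd
assemble_impala_dir() (
  build_dir="$1"; cdh_lib="$2"; kudu_so="$3"
  new_dir="$cdh_lib/impala-3.4"
  [ ! -e "$new_dir" ] || { echo "refusing to overwrite $new_dir" >&2; exit 1; }

  # Start from a copy of the existing Impala directory.
  cp -r "$cdh_lib/impala" "$new_dir" || exit 1
  # Drop the old jars; the .so files stay.
  rm -f "$new_dir"/lib/*.jar || exit 1
  # Swap in the Kudu client library used by the new build.
  cp "$kudu_so" "$new_dir/lib/libkudu_client.so.0" || exit 1
  # Add the dependency jars and the two jars built from the Impala source.
  cp "$build_dir"/fe/target/dependency/*.jar "$new_dir/lib/" || exit 1
  cp "$build_dir/fe/target/impala-frontend-0.1-SNAPSHOT.jar" "$new_dir/lib/" || exit 1
  cp "$build_dir/ext-data-source/api/target/impala-data-source-api-1.0-SNAPSHOT.jar" "$new_dir/lib/" || exit 1
  # Replace the daemon binary.
  cp "$build_dir/be/build/latest/service/impalad" "$new_dir/sbin-retail/impalad" || exit 1
)
```

With the paths from this article the call would be assemble_impala_dir "$IMPALA_HOME" /opt/cloudera/parcels/CDH/lib /root/gujc/impala/toolchain/kudu-4ed0dbbd1/release/lib64/libkudu_client.so.0. The original impala directory is left untouched, so rolling back only requires removing the IMPALA_HOME variable in CM.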
[root@V-NJ-2-220 ~]# beeline -u "jdbc:impala://xxxx:21050;AuthMech=3;UID=root;PWD=;UseSasl=0"
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:impala://xxxx:21050;AuthMech=3;UID=root;PWD=;UseSasl=0
Connected to: Impala (version 3.4.1-RELEASE)
Driver: ImpalaJDBC (version 02.06.03.1004)
Error: [Cloudera][JDBC](11975) Unsupported transaction isolation level: 4. (state=HY000,code=11975)
Beeline version 2.1.1-cdh6.2.0 by Apache Hive
0: jdbc:impala://xxxx:21050>
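This shows that Impala is now running version 3.4.1-RELEASE.
For details, see "CDH6.3.2升级impala3.2至impala3.4详细步骤" (detailed steps for upgrading Impala 3.2 to Impala 3.4 on CDH 6.3.2).
The Impala source pins Hudi to 0.5.0-incubating by default. I tried changing the Hudi version to 0.10.1, and the build failed.
Will Impala keep maintaining the 3.x line? 4.x drops Hive 2 and supports only Hive 3, which makes the upgrade more costly.
For reference, the dependency versions pinned in the source: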
IMPALA_HBASE_VERSION = 2.1.0-cdh6.x-SNAPSHOT
IMPALA_HUDI_VERSION = 0.5.0-incubating
IMPALA_SENTRY_VERSION = 2.1.0-cdh6.x-SNAPSHOT
IMPALA_KUDU_VERSION = 4ed0dbbd1
IMPALA_KUDU_JAVA_VERSION= 1.12.0-SNAPSHOT
IMPALA_RANGER_VERSION = 2.0.0.7.0.2.0-212
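To see where one of these versions is set before trying a different one, the value can be read straight from the build configuration. The sketch below rests on an assumption that is not confirmed by this article: that the versions are defined as "export NAME=value" lines in a shell file such as $IMPALA_HOME/bin/impala-config.sh. The function name pinned_version is made up for illustration.

```shell
# Hypothetical helper: print the value of the first "export NAME=value" line.
# $1 = variable name, e.g. IMPALA_HUDI_VERSION
# $2 = config file to search (assumed: $IMPALA_HOME/bin/impala-config.sh)
pinned_version() (
  name="$1"; config="$2"
  # Strip the "export NAME=" prefix and print what follows it.
  sed -n "s/^[[:space:]]*export[[:space:]]\{1,\}$name=//p" "$config" | head -n 1
)
```

Usage: pinned_version IMPALA_HUDI_VERSION "$IMPALA_HOME/bin/impala-config.sh". If nothing is printed, the variable is defined in another file or in another form, and a plain grep for the name across the source tree is the safer way to locate it.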