The title of this post is a bit of mouthful and requires a bit more explanation.
I am running a pure open-source version of Kafka (currently running 2.7) and am using SASL/GSSAPI connections between all of the brokers and ZooKeeper. Currently, the whole system, including ZooKeeper, is running Java 8 and it is long-overdue to be upgraded to Java 11.
Upgrading Kafka to Java 11 causes the server to send an incorrect authenticationID
String to ZooKeeper which results in the ACLs on the znodes being set to the hostname of the first Kafka server that connects to ZooKeeper. This results in only one of the Kafka hosts being able to r/w the znodes.
Prior to upgrading the following are the logs from one of the ZooKeeper nodes:
2021-05-17 14:47:35,345 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.19.65.22:41698
2021-05-17 14:47:35,357 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /172.19.65.22:41698
2021-05-17 14:47:35,367 [myid:1] - INFO [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x1797ac46a0f0000 with negotiated timeout 18000 for client /172.19.65.22:41698
2021-05-17 14:47:35,470 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@118] - Successfully authenticated client: authenticationID=kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET; authorizationID=kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET.
2021-05-17 14:47:35,474 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@134] - Setting authorizedID: kafka
2021-05-17 14:47:35,474 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@964] - adding SASL authorization for authorizationID: kafka
The following is the corresponding logs from the Kafka host making the connection:
[2021-05-17 14:47:37,512] INFO Successfully authenticated client: authenticationID=kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET; authorizationID=kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET. (org.apache.kafka.common.secur
ity.authenticator.SaslServerCallbackHandler)
The most important part of this is that ZooKeeper sees the authenticationID
as kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET
. ZooKeeper is configured with the following options so that the host and realm will be removed from the authenticationID
. This will leave the service name, ‘kafka‘, and result in all of the Kakfa znodes with kafka as the owner of the node. This then enables all of the hosts in the Kafka cluster r/w access to those znodes. Following are the configurations in ZooKeeper that enable this behavior.
kerberos.removeHostFromPrincipal=true
kerberos.removeRealmFromPrincipal=true
The resulting znodes ACLs are
[zk: localhost:2181(CONNECTED) 5] getAcl /brokers
'world,'anyone
: r
'sasl,'kafka
: cdrwa
System Configurations
Following are all of the configurations for both the ZooKeeper and Kafka cluster:
ZooKeeper
zoo.cfg
maxClientCnxns=50
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/data_dir
dataLogDir=/zk/data_log_dir
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
jaasLoginRenew=3600000
clientPort=2181
kerberos.removeHostFromPrincipal=true
kerberos.removeRealmFromPrincipal=true
server.1=zk2-01.dv.quasar.nadops.net:2888:3888
server.2=zk2-02.dv.quasar.nadops.net:2888:3888
server.3=zk2-03.dv.quasar.nadops.net:2888:3888
jaas.conf
Server {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/etc/zookeeper/conf/vmwqsrdvz201.dv.quasar.nadops.net.keytab"
storeKey=true
useTicketCache=false
principal="zookeeper/vmwqsrdvz201.dv.quasar.nadops.net@EXAMPLE.NET";
};
Kafka
broker-jaas.conf
KafkaServer {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/usr/local/kafka/config/vmwqsrdvk01.dv.quasar.nadops.net.keytab"
storeKey=true
principal="kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET";
};
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/usr/local/kafka/config/vmwqsrdvk01.dv.quasar.nadops.net.keytab"
storeKey=true
serviceName="zookeeper"
principal="kafka/vmwqsrdvk01.dv.quasar.nadops.net@EXAMPLE.NET";
};
server.properties
Only the relevant configs are included below.
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL,BROKER_LISTENER:SASL_SSL
inter.broker.listener.name=BROKER_LISTENER
listeners=SASL_SSL://:9092,BROKER_LISTENER://:9093
ssl.keystore.location=/usr/local/kafka/config/kafka01.dv.quasar.nadops.net-keystore.jks
ssl.keystore.password=***
ssl.key.password=***
ssl.truststore.location=/usr/local/kafka/config/kafka01.dv.quasar.nadops.net-truststore.jks
ssl.truststore.password=***
sasl.kerberos.service.name=kafka
sasl.mechanism.inter.broker.protocol=GSSAPI
sasl.enabled.mechanisms=GSSAPI
zookeeper.set.acl=true
super.users=User:kafka
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
allow.everyone.if.no.acl.found=false
Upgrading Kafka to Java 11
Once Kafka (leaving ZooKeeper on Java 8) is upgraded to Java 11 Kafka no longer sends the expected authorizationID String to ZooKeeper.
Following are the ZooKeeper logs of the same authentication:
2021-05-18 14:31:01,627 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.19.65.23:53374
2021-05-18 14:31:01,631 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /172.19.65.23:53374
2021-05-18 14:31:01,636 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer@617] - Established session 0x3797fd27e300004 with negotiated timeout 18000 for client /172.19.65.23:53374
2021-05-18 14:31:01,728 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@118] - Successfully authenticated client: authenticationID=vmwqsrdvk02@EXAMPLE.NET; authorizationID=kafka/vmwqsrdvk02.dv.quasar.nadops.net@EXAMPLE.NET.
2021-05-18 14:31:01,728 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@134] - Setting authorizedID: vmwqsrdvk02
2021-05-18 14:31:01,728 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@964] - adding SASL authorization for authorizationID: vmwqsrdvk02
Following are the logs from Kafka that results in an exception being thrown because it is unable to write to the ZooKeeper znode:
[2021-05-18 14:31:01,410] INFO Initiating client connection, connectString=vmwqsrdvz201.dv.quasar.nadops.net:2181,vmwqsrdvz202.dv.quasar.nadops.net:2181,vmwqsrdvz203.dv.quasar.nadops.net:2181 sessionTimeout=18000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@39b43d60 (org.apache.zookeeper.ZooKeeper)
[2021-05-18 14:31:01,414] INFO jute.maxbuffer value is 4194304 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2021-05-18 14:31:01,419] INFO zookeeper.request.timeout value is 0. feature enabled= (org.apache.zookeeper.ClientCnxn)
[2021-05-18 14:31:01,421] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2021-05-18 14:31:01,556] INFO Client successfully logged in. (org.apache.zookeeper.Login)
[2021-05-18 14:31:01,573] INFO TGT refresh thread started. (org.apache.zookeeper.Login)
[2021-05-18 14:31:01,577] INFO Client will use GSSAPI as SASL mechanism. (org.apache.zookeeper.client.ZooKeeperSaslClient)
[2021-05-18 14:31:01,604] INFO TGT valid starting at: Tue May 18 14:31:01 UTC 2021 (org.apache.zookeeper.Login)
[2021-05-18 14:31:01,604] INFO TGT expires: Wed May 19 00:31:01 UTC 2021 (org.apache.zookeeper.Login)
[2021-05-18 14:31:01,605] INFO TGT refresh sleeping until: Tue May 18 22:32:58 UTC 2021 (org.apache.zookeeper.Login)
[2021-05-18 14:31:01,606] INFO Opening socket connection to server vmwqsrdvz203.dv.quasar.nadops.net/172.19.65.21:2181. Will attempt to SASL-authenticate using Login Context section 'Client' (org.apache.zookeeper.ClientCnxn)
[2021-05-18 14:31:01,614] INFO Socket connection established, initiating session, client: /172.19.65.23:53374, server: vmwqsrdvz203.dv.quasar.nadops.net/172.19.65.21:2181 (org.apache.zookeeper.ClientCnxn)
[2021-05-18 14:31:01,622] INFO Session establishment complete on server vmwqsrdvz203.dv.quasar.nadops.net/172.19.65.21:2181, sessionid = 0x3797fd27e300004, negotiated timeout = 18000 (org.apache.zookeeper.ClientCnxn)
[2021-05-18 14:31:01,626] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
[2021-05-18 14:31:01,731] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /brokers/ids
at org.apache.zookeeper.KeeperException.create(KeeperException.java:120)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
at kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:564)
at kafka.zk.KafkaZkClient.createRecursive(KafkaZkClient.scala:1662)
at kafka.zk.KafkaZkClient.makeSurePersistentPathExists(KafkaZkClient.scala:1560)
at kafka.zk.KafkaZkClient.$anonfun$createTopLevelPaths$1(KafkaZkClient.scala:1552)
at kafka.zk.KafkaZkClient.$anonfun$createTopLevelPaths$1$adapted(KafkaZkClient.scala:1552)
at scala.collection.immutable.List.foreach(List.scala:333)
at kafka.zk.KafkaZkClient.createTopLevelPaths(KafkaZkClient.scala:1552)
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:467)
at kafka.server.KafkaServer.startup(KafkaServer.scala:233)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
at kafka.Kafka$.main(Kafka.scala:82)
at kafka.Kafka.main(Kafka.scala)
[2021-05-18 14:31:01,733] INFO shutting down (kafka.server.KafkaServer)
The authorizationID now provided by Kafka is now vmwqsrdvk02@EXAMPLE.NET
, only the hostname and realm, and no longer the String that ZooKeeper expects. The resulting, derived authorizationID is vmwqsrdvk0
2 which means that only that Kafka host has r/w to any of the znodes which it creates.
Solution
After exhausting search options and finding multiple pages indicating that I have everything configured correctly, even for JDK 11, I posted to the #apache-kafka channel on chat.freenode.net and got a hint. Evidently, there are some JDKs that have bugs in their Kerberos implementation.
I was running java-11-openjdk-11.0.8.10-1.el7.x86_64
from CentOS 7 Updates. I double-checked and there was an updated JDK available.
After updating to the latest Java 11 JDK everything worked as expected. If the particular JDK that you are using is still not working try AdoptOpenJDK.