Analytics and Data Summit 2018: Serverless and Machine Learning + Open Source Big Data in the Cloud

The year has just started and here is the first “good news” already: my presentation about “Serverless Architectures and Machine Learning” was accepted for the Analytics and Data Summit 2018 (formerly the BIWA conference). The presentation will include a live demo with the Fn Project.

In addition, I will give another presentation together with Edelweiss Kammermann about Open Source Big Data in the Cloud (with live demos of Hadoop, Hive, Spark and Kafka). IMHO, two fabulous topics. I am looking forward to seeing you there!

Java One 2017: Open Source Big Data in the Cloud (Hadoop, Hive, Spark, Kafka)

It’s true. I always said “presenting at Java One is like playing in the Champions League”. Last month I had the great pleasure to present at the Java One 2017 conference in San Francisco together with Edelweiss Kammermann about Open Source Big Data used in the cloud. The presentation included four live demos covering Apache Hadoop with MapReduce, Apache Hive, Apache Spark and Kafka, all using Oracle Big Data Cloud Service – Compute Edition (aka BDCS-CE) and the Oracle Event Hub Service. The presentation was recorded, so you can watch it from anywhere in the world.

For your convenience, the slides are available on SlideShare.

Purge / Empty / Drain a Kafka Topic in Oracle Event Hub Service (or any other Kafka broker)

I did not find this solution myself, but I am also not sure where I discovered it. Just a note to myself.

It becomes useful once you have enabled client access to your Oracle Event Hub Cloud Service, since the web-based console does not expose every function that Kafka provides.

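# PURGE topic
# We drain the topic by letting its messages expire. ZKADR is the Zookeeper host:port.

# 1. Set a very short retention (1 second) so that existing messages become eligible for deletion
./kafka-topics --zookeeper ZKADR --alter --topic topic_name --config retention.ms=1000

# 2. Wait until the broker has removed the expired messages, then delete
#    the override so the topic falls back to the broker's default retention
./kafka-topics --zookeeper ZKADR --alter --topic topic_name --delete-config retention.ms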
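The two commands can be wrapped in a small shell function so the wait between them is not forgotten. This is only a sketch under assumptions: `purge_topic`, `KAFKA_TOPICS_CMD` and `DRY_RUN` are names I made up for illustration, and the default wait of 300 seconds is a guess based on Kafka's default retention check interval (`log.retention.check.interval.ms`, 5 minutes); your broker may be configured differently.

```shell
# Hypothetical helper: purge_topic ZK_ADDRESS TOPIC [WAIT_SECONDS]
# KAFKA_TOPICS_CMD and DRY_RUN are illustrative variables, not Kafka settings.
purge_topic() {
    zk="$1"
    topic="$2"
    wait_s="${3:-300}"                       # assumed default, see note above
    cmd="${KAFKA_TOPICS_CMD:-./kafka-topics}"
    run=""
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"  # dry run: print instead of execute

    # 1. Shorten the retention so existing messages expire.
    $run "$cmd" --zookeeper "$zk" --alter --topic "$topic" --config retention.ms=1000 || return 1

    # 2. Give the broker time to delete the expired log segments.
    [ "${DRY_RUN:-0}" = "1" ] || sleep "$wait_s"

    # 3. Remove the override so the topic uses the broker default again.
    $run "$cmd" --zookeeper "$zk" --alter --topic "$topic" --delete-config retention.ms
}
```

Setting `DRY_RUN=1` before calling `purge_topic ZKADR topic_name` prints both commands without touching the topic, which is a cheap way to double-check the arguments first.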

On second thought: what if the Event Hub web console offered a way to drain a topic? Maybe it could also display the number of messages stored in a topic.
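Until the console shows such a number, a rough message count can be derived on the client side from the partition offsets. The sketch below assumes that `kafka.tools.GetOffsetShell` is available through `./kafka-run-class` in your Kafka distribution and prints lines of the form `topic:partition:offset`; `sum_offsets`, `count_messages`, `KAFKA_RUN_CLASS_CMD` and `BROKER_LIST` are hypothetical names. The result is the difference between latest and earliest offsets, so treat it as an approximation (for compacted topics it will be too high).

```shell
# Sums the offset column of GetOffsetShell-style lines ("topic:partition:offset").
sum_offsets() {
    awk -F: 'NF >= 3 { total += $NF } END { print total + 0 }'
}

# Hypothetical wrapper: approximate count = sum(latest) - sum(earliest).
# BROKER_LIST is a placeholder for the broker connect string (host:port).
count_messages() {
    topic="$1"
    cmd="${KAFKA_RUN_CLASS_CMD:-./kafka-run-class}"
    # --time -1 asks for the latest offsets, --time -2 for the earliest ones
    latest=$("$cmd" kafka.tools.GetOffsetShell --broker-list "$BROKER_LIST" --topic "$topic" --time -1 | sum_offsets)
    earliest=$("$cmd" kafka.tools.GetOffsetShell --broker-list "$BROKER_LIST" --topic "$topic" --time -2 | sum_offsets)
    echo $((latest - earliest))
}
```

Right after a successful purge, `count_messages topic_name` should report 0 once the broker has cleaned up.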

Access Oracle Event Hub Kafka from External Kafka Client or Tool

Access Oracle Event Hub from external Tool or Command-Line Client

Oracle Event Hub provides a managed Kafka PaaS solution. To access it from an on-premises client, you have to open the ports of the Event Hub Zookeeper and the Kafka broker by creating access rules.

Access to Kafka Broker

First, let's enable access to the Kafka broker. To do so, look up the connect string in the OPC Event Hub service.

Create Event Hub Broker Access Rule

Then create a new access rule. Warning: In general, you should not allow public access to your Event Hub service! This is just for demo purposes to make the tool work. If in doubt, create a rule restricted to your own IP address and talk to your friendly security officer first.

The creation of the rule might take a few seconds.

Create Zookeeper Access Rule

Once the rule for the Kafka broker is created, we need to create a rule for Zookeeper, which uses port 2181.
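Before starting any tool, it can save time to check from the client machine that both ports are really reachable. A minimal sketch, assuming an `nc` (netcat) build that supports `-z` and `-w`; `split_endpoint` and `check_endpoint` are hypothetical helper names, and the addresses in the usage comments are placeholders for the values from your own service.

```shell
# Splits "host:port" into "host port" (IPv4 addresses or host names only).
split_endpoint() {
    printf '%s %s\n' "${1%:*}" "${1##*:}"
}

# Returns 0 if a TCP connection to host:port succeeds within 5 seconds.
# Assumes an nc (netcat) build that supports -z and -w.
check_endpoint() {
    set -- $(split_endpoint "$1")
    nc -z -w 5 "$1" "$2"
}

# Usage with placeholder addresses; take the broker port from your connect string:
#   check_endpoint ZK_IP:2181 && echo "Zookeeper reachable"
#   check_endpoint BROKER_IP:BROKER_PORT && echo "Kafka broker reachable"
```

If one of the checks fails, the access rule is probably not active yet or does not cover your client IP.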

Explore Kafka Tool (or other)

Now let's start our Kafka tool (for demonstration purposes only), configure the connection details with the Zookeeper IP and port, and then try to connect to the Oracle Event Hub Service.

Voila, it is working 🙂 You can explore your topics or even create new ones. Note that Oracle Event Hub uses a special naming convention for topics.

Oracle Event Hub Cloud Service: What you need to know about Topic Names

There are a number of things related to topics in Oracle Event Hub service that everybody should be aware of:

  • Oracle Event Hub topics created with the web console are automatically prefixed with the OPC ID domain.
  • Event Hub topics can also be created with the Kafka command-line tools from any host (assuming you allow the clients to access Event Hub CS). These topics are not prefixed with the OPC ID domain.
  • Topics created with the CLI (without ID domain prefix) are not shown in the service console.

IMHO, this behaviour is not very useful for various reasons:

  • If you are planning to use Event Hub as a drop-in replacement for another Kafka installation, you won’t be able to create the proper topic names for already existing topics with the service console.
  • You have to add the ID domain prefix in every client. This is particularly bad, e.g., for a Java producer: hard-coded ID domains will show up sooner or later in the source code.
  • Being forced to use the ID domain prefix in every client might turn out to be a security issue. Did you notice that most bloggers blacken their ID domains in screenshots when writing about OPC?
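One way to keep the ID domain out of scripts and source code is to read it from the environment and build the full topic name in a single place. The sketch below only illustrates that idea: `qualified_topic`, `OPC_ID_DOMAIN` and `OPC_TOPIC_SEPARATOR` are hypothetical names, and the `-` separator between ID domain and topic name is my assumption, so compare it with a topic name shown in your own service console.

```shell
# Hypothetical helper: builds the full topic name from an externally
# configured ID domain. The "-" separator is an assumption, not confirmed here.
qualified_topic() {
    name="$1"
    domain="${OPC_ID_DOMAIN:-}"
    sep="${OPC_TOPIC_SEPARATOR:--}"
    if [ -z "$domain" ]; then
        # No ID domain configured: use the plain name (e.g. a CLI-created topic)
        printf '%s\n' "$name"
    else
        printf '%s%s%s\n' "$domain" "$sep" "$name"
    fi
}

# Usage (ZKADR is the Zookeeper host:port placeholder):
#   ./kafka-topics --zookeeper ZKADR --describe --topic "$(qualified_topic orders)"
```

The same pattern applies to a Java producer: read the prefix from configuration instead of hard-coding it, so it never ends up in the repository or in a screenshot.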