Java One 2017: Open Source Big Data in the Cloud (Hadoop, Hive, Spark, Kafka)

It’s true: I always said “presenting at Java One is like playing in the Champions League”. Last month I had the great pleasure of presenting at the Java One 2017 conference in San Francisco, together with Edelweiss Kammermann, about open source big data in the cloud. The presentation included four live demos covering Apache Hadoop with MapReduce, Apache Hive, Apache Spark, and Apache Kafka, all using Oracle Big Data Cloud Service – Compute Edition (aka BDCS-CE) and the Oracle Event Hub Service. The presentation was recorded, so you can enjoy it from anywhere in the world.

For your convenience the slides are available on slideshare:

Purge / Empty / Drain a Kafka Topic in Oracle Event Hub Service (or any other Kafka broker)

I did not find this solution myself, but I am also not sure where I discovered it. Just a note to myself.

This becomes useful once you have enabled client access to your Oracle Event Hub Cloud Service, since the web-based console does not expose every function that Kafka provides.

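# PURGE topic
# We drain the topic by letting its messages expire.

# 1. Temporarily set the retention time to one second
./kafka-topics --zookeeper ZKADR --alter --topic topic_name --config retention.ms=1000
# 2. Once the broker has deleted the expired messages, remove the override
#    so the topic falls back to the default retention
./kafka-topics --zookeeper ZKADR --alter --topic topic_name --delete-config retention.ms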
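If you need this more than once, the two commands can be wrapped in a small shell function. This is only a minimal sketch: the function name `purge_topic`, the variables `KAFKA_TOPICS` and `WAIT_SECONDS`, and the default wait of 300 seconds are my own assumptions. How long the broker actually needs depends on its retention check interval, so adjust the wait to your setup.

```shell
# Sketch: purge a topic by shortening retention, waiting, then restoring the default.
# KAFKA_TOPICS and WAIT_SECONDS are hypothetical names; adjust them to your installation.
KAFKA_TOPICS="${KAFKA_TOPICS:-./kafka-topics}"
WAIT_SECONDS="${WAIT_SECONDS:-300}"

purge_topic() {
  local zk="$1" topic="$2"

  # 1. Let every message older than one second expire.
  "$KAFKA_TOPICS" --zookeeper "$zk" --alter --topic "$topic" \
    --config retention.ms=1000 || return 1

  # 2. Give the broker time to delete the expired messages.
  sleep "$WAIT_SECONDS"

  # 3. Remove the override so the topic uses the default retention again.
  "$KAFKA_TOPICS" --zookeeper "$zk" --alter --topic "$topic" \
    --delete-config retention.ms
}
```

Call it as `purge_topic ZKADR topic_name`. If the first command fails, the function stops before waiting, so the retention override is never removed from a topic it was not set on.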

On second thought: what if the Event Hub console let you drain a topic directly in the web console? And maybe it could also display the number of messages stored in a topic.
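Until then, the message count can be approximated from the command line with Kafka's `GetOffsetShell` tool, which prints one `topic:partition:offset` line per partition. The sketch below subtracts the earliest offsets from the latest ones. The names `sum_offsets`, `count_messages` and `KAFKA_RUN_CLASS` are my own, the path `./kafka-run-class` is an assumption that follows the naming of `./kafka-topics` above, and the result is only an estimate (for example on compacted topics).

```shell
# Sketch: estimate the number of messages in a topic from its partition offsets.
# KAFKA_RUN_CLASS is a hypothetical variable; point it at your kafka-run-class script.
KAFKA_RUN_CLASS="${KAFKA_RUN_CLASS:-./kafka-run-class}"

# Sum the last field of "topic:partition:offset" lines read from stdin.
sum_offsets() {
  awk -F: '{ total += $NF } END { print total + 0 }'
}

# Latest offsets (--time -1) minus earliest offsets (--time -2), summed over partitions.
count_messages() {
  local broker="$1" topic="$2" latest earliest
  latest=$("$KAFKA_RUN_CLASS" kafka.tools.GetOffsetShell \
    --broker-list "$broker" --topic "$topic" --time -1 | sum_offsets)
  earliest=$("$KAFKA_RUN_CLASS" kafka.tools.GetOffsetShell \
    --broker-list "$broker" --topic "$topic" --time -2 | sum_offsets)
  echo $((latest - earliest))
}
```

Call it as `count_messages BROKER_HOST:BROKER_PORT topic_name`, taking the broker address from the Event Hub connect string. Right after a successful purge the result should be 0.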

Access Oracle Event Hub Kafka from External Kafka Client or Tool

Access Oracle Event Hub from external Tool or Command-Line Client

Oracle Event Hub provides a managed Kafka PaaS solution. To access it from an on-premises client, you have to open the ports for the Event Hub Zookeeper and the Kafka broker.

Access to Kafka Broker

First, let's enable access to the Kafka broker. To do so, look up the connect string in the OPC Event Hub service.

Create Event Hub Broker Access Rule

Then create a new access rule. Warning: in general, you should not allow public access to your Event Hub service! This is just for demo purposes to make the tool work. If in doubt, create a rule for your own IP address only, and talk to your friendly security officer first.

The creation of the rule might take a few seconds:

Create Zookeeper Access Rule

Once the rule for the Kafka broker is created, we need to create a rule for Zookeeper, which uses port 2181:
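Once both rules are active, a quick reachability check from the client machine can confirm that the ports are really open before you start any tool. This is a minimal sketch that assumes `nc` (netcat) with the `-z` and `-w` options is installed on the client. The function name `check_port` is my own.

```shell
# Sketch: test whether a TCP port is reachable from this machine.
# Assumes a netcat that supports -z (scan only) and -w (timeout in seconds).
check_port() {
  local host="$1" port="$2"
  if nc -z -w 5 "$host" "$port" 2>/dev/null; then
    echo "$host:$port reachable"
  else
    echo "$host:$port not reachable"
    return 1
  fi
}
```

Call it as `check_port ZK_IP 2181` for Zookeeper, and once more with the broker host and port taken from the connect string. If a port is reported as not reachable, the access rule is probably still being created or does not cover your IP address.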

Explore Kafka Tool (or other)

Now let's start our Kafka tool (for demonstration purposes only), configure the connection details with the Zookeeper IP and port, and then try to connect to the Oracle Event Hub Service:

Voilà, it is working 🙂 You can explore your topics or even create new ones. Note that Oracle Event Hub uses a special naming convention for topics.

Feedback: Big Data Training: Hadoop, Spark, Kafka, Cassandra, Oracle and the Cloud

Guys, thanks for attending my DOAG training day about Cloudera/Hadoop and Oracle Big Data. I am delighted by your amazing feedback!

Statistics about Big Data Training Day

About the course

Attendees gave the following opinions about the course when asked right after the training:

  • 100% of those who answered would recommend the course
  • 100% found it interesting
  • 81% found that the content matched their experience
  • 86% were happy or very happy with the course (these are the 2 highest grades possible)
  • 0% were unhappy
  • Everyone except one person (that is 20 people) found the level of difficulty okay.
  • Everyone except one person found that they were engaged enough

About myself

The following is the feedback attendees gave about me (there were no explicit questions about Edelweiss, so unfortunately she is not included). Multiple answers were possible, and answers weren’t mandatory. This means that someone who ticked, for example, “informative” sincerely meant it.

  • 86% found me interesting
  • 40% entertaining 🙂
  • 90% informative
  • 71% demonstrative and clear

Training Day: Cloudera Hadoop Stack with Kafka and Cassandra and Oracle Big Data / BI and the Cloud

Right after the DOAG conference 2016 in Nürnberg / Germany we will be running a big data training day. Meet the big 4: Hadoop, Spark, Kafka (all 3 from the Cloudera distribution), and Cassandra. Plus Oracle Big Data on top. For details in German see here.

Topics of the workshop are the following (subject to change):
Open Source / Cloudera Stack 

Oracle Big Data Products

Oracle Big Data Appliance

OBIEE

Oracle Data Integrator (ODI)

Oracle Big Data Discovery

 

The training day will include live demos, some of which run in the cloud. So stay tuned!

This event will be co-hosted by Edelweiss Kammermann, Oracle ACE Director from Uruguay and BI expert. Edelweiss will mostly present on Oracle Big Data / Business Intelligence, whereas I will try to cover the open source Hadoop / Cloudera part.
