Declarative Data Lake using Apache Hudi

Harsha Teja Kanna
2 min read · Feb 20, 2022

Follow up post: https://www.ekalavya.dev/how-to-run-apache-hudi-deltastreamer-kubevela-addon/

I have been working on operationalizing Apache Hudi recently and have tried to build some user-friendly tooling for adopting it, like here. I had a table service in mind but did not get a chance to complete my implementation.

But now I want to write some tutorials about Apache Hudi while it is still fresh in my mind, and I need a simple demo setup for readers.

I also read a blog post about the fastest way to try out Apache Iceberg on a laptop, so I want to create my own:

Kubernetes, Spark, and Hudi: The Fastest Way to Try Apache Hudi!

I am making slow progress towards it. Maybe I will start posting the tutorial blogs starting next week. It works something like this:

Set up local environment

kind create cluster --config kind.yaml
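
For readers who have not used kind before, here is a minimal sketch of what a `kind.yaml` for this step could look like. It is not the file from this post: the cluster name and the single worker node are assumptions chosen for illustration, and the real setup may need more (for example, port mappings or mounted volumes).

```yaml
# kind.yaml: hypothetical example, not the actual file used in this post
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: hudi-demo            # assumed cluster name
nodes:
  - role: control-plane
  - role: worker           # assumption: one worker to host the Spark pods
```

When it is time to clean up, `kind delete cluster --name hudi-demo` removes the cluster (the name here matches the assumed one above).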

Set up hudi-operator

kubectl apply -k hudi-operator/config

Create hudi table

kubectl apply -f hudi_table.yaml

Run hudi query

kubectl apply -f hudi_query.yaml

So once the first 2 steps are done, we can try out many Apache Hudi features using 2 commands.

To accomplish this, I am also implementing the Hudi lock configuration using Kubernetes to demo the whole gamut of Hudi features, so it is taking some time.

This may sound like going a little overboard just for a blog, but I am generally kind of obsessed with developer experience (devx).
