🪴 无人之路

Last updated Dec 17, 2022

# Hands on Zeppelin: Step by Step

To build a data playground where engineers can explore data collected by autonomous cars, I am deploying Apache Zeppelin as a component of our data platform.

# Introduction

Apache Zeppelin is: “Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, Python, R and more.”

# Install

Following the installation document, I chose to install Zeppelin from the official Docker image on a server (http://10.10.32.4):

$ mkdir Zeppelin && cd Zeppelin
$ docker run -u $(id -u) -p 8080:8080 --rm -v $PWD/logs:/logs -v $PWD/notebook:/notebook \
           -e ZEPPELIN_LOG_DIR='/logs' -e ZEPPELIN_NOTEBOOK_DIR='/notebook' \
           --name zeppelin apache/zeppelin:0.10.0

Unable to find image 'apache/zeppelin:0.10.0' locally
0.10.0: Pulling from apache/zeppelin
16ec32c2132b: Pull complete
aafd5bdc2bb7: Pull complete
0bb58b150809: Pull complete
68d71ea3a296: Pull complete
9c7277321f0c: Downloading [=============================================>     ]   2.59GB/2.816GB
6be3e4488900: Download complete
622d30c2f649: Download complete
d10a38bf471f: Download complete
4006c4346d45: Download complete
4f4fb700ef54: Download complete

...

ERROR [2022-12-13 08:13:15,873] ({main} ZeppelinServer.java[main]:262) - Error while running jettyServer
java.lang.Exception: A MultiException has 2 exceptions.  They are:
1. java.io.IOException: Creating directories for /notebook/.git failed
2. java.lang.IllegalStateException: Unable to perform operation: create on org.apache.zeppelin.notebook.repo.NotebookRepoSync

	at org.apache.zeppelin.server.ZeppelinServer.main(ZeppelinServer.java:256)

Both errors come from a volume-mapping permission problem: the user inside the container cannot create files in the mounted notebook directory. One fix is to pass -u 0 to docker run so that the container runs as user 0 (root):

$ docker run -u 0 -p 8080:8080 --rm -v $PWD/logs:/logs -v $PWD/notebook:/notebook \
           -e ZEPPELIN_LOG_DIR='/logs' -e ZEPPELIN_NOTEBOOK_DIR='/notebook' \
           --name zeppelin apache/zeppelin:0.10.0

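Alternatively, change the owner of the logs and notebook directories so that the container user can write to them. The 1000:1000 below should match the uid and gid the container runs as; check yours with id -u and id -g: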

$ sudo chown -R 1000:1000 notebook
$ sudo chown -R 1000:1000 logs
$ docker run -u $(id -u) -p 8080:8080 --rm -v $PWD/logs:/logs -v $PWD/notebook:/notebook \
           -e ZEPPELIN_LOG_DIR='/logs' -e ZEPPELIN_NOTEBOOK_DIR='/notebook' \
           --name zeppelin apache/zeppelin:0.10.0
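A likely explanation for the permission error, which is my assumption rather than something the log states: the mkdir step creates only the top-level Zeppelin directory, and when a host path given to -v does not exist, Docker creates it owned by root. Creating logs and notebook yourself before the first docker run should avoid that. Here is a minimal sketch; the helper name prepare_mount_dirs is made up for illustration:

```python
import os
from pathlib import Path


def prepare_mount_dirs(base, names=("logs", "notebook")):
    """Create the bind-mount directories under `base` as the current user.

    Returns a dict mapping each directory name to the uid that owns it,
    so the caller can compare it with the uid passed to `docker run -u`.
    """
    owners = {}
    for name in names:
        path = Path(base) / name
        # No-op if the directory already exists (ownership is left unchanged).
        path.mkdir(parents=True, exist_ok=True)
        owners[name] = path.stat().st_uid
    return owners
```

Calling prepare_mount_dirs(".") from inside the Zeppelin directory and comparing the returned uids with os.getuid() shows whether a chown is still needed for directories that Docker created earlier.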

Then visit http://10.10.32.4:8080/ to use Zeppelin.

image-20221213165658796
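To confirm that the server is up without opening a browser, Zeppelin's REST API can be queried. The sketch below assumes the /api/version endpoint and the "status" field of its JSON reply; the function names are my own:

```python
import json
from urllib.request import urlopen


def version_url(base_url):
    """Build the URL of the version endpoint from the server's base URL."""
    return base_url.rstrip("/") + "/api/version"


def is_ok(payload):
    """Return True if a JSON reply (as text) reports status OK."""
    data = json.loads(payload)
    return isinstance(data, dict) and data.get("status") == "OK"


def zeppelin_is_up(base_url, timeout=5):
    """Query the version endpoint; network, HTTP, and JSON errors count as down."""
    try:
        with urlopen(version_url(base_url), timeout=timeout) as resp:
            return is_ok(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        return False
```

For the server in this post the call would be zeppelin_is_up("http://10.10.32.4:8080").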

# Create New Note

yshJJIkNnq

Create a new note from either of these two entry points, give it a name, and select a default interpreter.

image-20221214150836370

You can use multiple interpreters in one Zeppelin note to support different languages. I learned this by asking ChatGPT; see the “Q&A (with ChatGPT on Zeppelin)” section for details.

image-20221214151356720

I wrote three snippets in three different languages: Markdown, Python, and SQL. Running them gives:

image-20221214151758144

You can see:

# Interpreters

We can see and manage Zeppelin’s interpreters via the ‘interpreter’ menu below:

image-20221214151959692

All interpreters are here:

image-20221214154255076

# Add a SQL Interpreter (for ADB)

To access data in MySQL, add a SQL interpreter:

Reference:

If you want to connect other databases such as Mysql, Redshift and Hive, you need to edit the property values. You can also use Credential for JDBC authentication. If default.user and default.password properties are deleted(using X button) for database connection in the interpreter setting page, the JDBC interpreter will get the account information from Credential.

First, add a credential, which safely stores the username and password for the MySQL connection:

image-20221214164153294

Second, add a new interpreter named adbsql, configured as follows:

image-20221214164336089

Remember to add the MySQL JDBC driver as a dependency so that Zeppelin can locate its jar:

image-20221214164508044
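For readers who cannot see the screenshots, the interpreter settings can be summarized as a small property map. This is only a sketch: the property names default.driver, default.url, default.user, and default.password are those of Zeppelin's JDBC interpreter, com.mysql.jdbc.Driver is the driver class of Connector/J 5.1.x, and the host, port, and database in the usage note are hypothetical placeholders rather than the values I used.

```python
def mysql_jdbc_properties(host, port, database, user=None, password=None):
    """Build the property map for a MySQL-backed JDBC interpreter.

    If `user` is None, default.user and default.password are left out,
    which is the case where the interpreter falls back to the Credential
    entry, as the quoted reference describes.
    """
    props = {
        "default.driver": "com.mysql.jdbc.Driver",
        "default.url": f"jdbc:mysql://{host}:{port}/{database}",
    }
    if user is not None:
        props["default.user"] = user
        props["default.password"] = password or ""
    return props
```

For example, mysql_jdbc_properties("db.example.internal", 3306, "cars") returns only the driver and URL entries, matching the Credential-based setup described above.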

# Access Data using SQL

Then I can query the data in MySQL with SQL like this:

image-20221214171612554

# Add MySQL Interpreter

image-20221215144609848

The dependency should be mysql:mysql-connector-java:5.1.44 rather than the 5.1.41 used before, because the older driver fails with:

java.sql.SQLException: Unknown system variable 'query_cache_size'

query_cache_size was removed in MySQL 8. Check the docs.

It works with JDBC driver 5.1.44.

image-20221215170718539

If mysql:mysql-connector-java:5.1.41 is used, the query_cache_size error is raised:

image-20221215171148557

Then you can access the data in the MySQL database:

image-20221215171439961

# Store Zeppelin Notes to GitLab Repo

How To: Store Zeppelin Notes in GitHub repo

You can make the local directory for Zeppelin notes a Git repository and use Zeppelin’s version control feature to manage the versions of a note.

Add a new version and commit

image-20221215110158669

image-20221215110238902

Then git log shows the version history:

$ git log
commit 1f029d8073928a3a9e67e7728cd293472595d94a (HEAD -> master)
Author: zeppelin <zeppelin@cffe349e8d4c>
Date:   Thu Dec 15 03:03:04 2022 +0000

    add dist stat

commit 836ca519affbc28ac97d998676b060f826b6073b (origin/master)
Author: zeppelin <zeppelin@cffe349e8d4c>
Date:   Thu Dec 15 02:43:25 2022 +0000

    test

commit a24ac32075750272a696cb0641693ff0040290ab
Author: zeppelin <zeppelin@cffe349e8d4c>
Date:   Thu Dec 15 01:40:06 2022 +0000

    test

commit f26c7f85d4b36a59bb7ec79152cbfc389c907a68
Author: zeppelin <zeppelin@cffe349e8d4c>
Date:   Wed Dec 14 09:17:09 2022 +0000

    Move note 2HPWKGXXY from test_note_of_zeppelin_2HPWKGXXY.zpln to 数据访问demo_2HPWKGXXY.zpln

Use git push to push the local versions to the remote repo.
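The push step can also be scripted. The sketch below only wraps two ordinary git commands; the remote name origin and the branch master come from the git log output above, while the remote URL in the usage note is a hypothetical placeholder and the function names are my own.

```python
import subprocess


def publish_commands(remote_url, remote="origin", branch="master"):
    """Return the git commands (as argv lists) that link the local notebook
    repository to a remote and push the committed versions to it."""
    return [
        ["git", "remote", "add", remote, remote_url],
        ["git", "push", "-u", remote, branch],
    ]


def publish(notebook_dir, remote_url):
    """Run the commands inside notebook_dir.

    Raises subprocess.CalledProcessError if a command fails, for example
    when the remote has already been added.
    """
    for argv in publish_commands(remote_url):
        subprocess.run(argv, cwd=notebook_dir, check=True)
```

A call such as publish("notebook", "git@gitlab.example.com:team/zeppelin-notes.git") is needed only once; later versions committed from the Zeppelin UI can be sent with a plain git push.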

# Q&A (with ChatGPT on Zeppelin)

# Can I use python and sql in one zeppelin note?

ChatGPT:

Yes, it is possible to use Python and SQL together in a Zeppelin notebook. In fact, many data analysis tasks can benefit from combining the capabilities of both languages.

To use Python and SQL together in a Zeppelin notebook, you will first need to make sure that you have the appropriate interpreters installed and configured. In Zeppelin, an interpreter is a plug-in that enables the notebook to execute code in a specific programming language. For Python and SQL, you will need to install and configure the Python and SQL interpreters, respectively.

Once the interpreters are installed and configured, you can use the %python and %sql magic commands in your Zeppelin notebook to switch between Python and SQL code blocks. For example, you could use Python to load and clean your data, and then use SQL to analyze and visualize the cleaned data.

Here is an example of how you might use Python and SQL together in a Zeppelin notebook:

%python
# load and clean data using Python

import pandas as pd

df = pd.read_csv("data.csv")
df = df.dropna()

%sql
-- analyze and visualize data using SQL

SELECT * FROM df LIMIT 10

I hope this helps! Let me know if you have any other questions.
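A caveat on this answer, as far as I can tell: a pandas DataFrame created in a %python paragraph is not automatically visible to a %sql paragraph, because %sql is normally served by a different interpreter (Spark SQL or JDBC). Zeppelin's Python interpreter documents a %python.sql paragraph type for querying pandas DataFrames, which needs pandasql installed. The underlying idea of cleaning in Python and then querying with SQL can be sketched with the standard library alone; the CSV content, column names, and in-memory SQLite table below are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for data.csv; the second data row has a missing value.
RAW_CSV = """car_id,speed_kmh
A1,42.5
A2,
A3,61.0
"""


def load_clean_rows(text):
    """Parse CSV text and drop rows with a missing field (like df.dropna())."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        row for row in reader
        if all(value not in ("", None) for value in row.values())
    ]


def query_rows(rows, sql):
    """Load the rows into an in-memory SQLite table named df and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE df (car_id TEXT, speed_kmh REAL)")
        conn.executemany("INSERT INTO df VALUES (:car_id, :speed_kmh)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


rows = load_clean_rows(RAW_CSV)
result = query_rows(rows, "SELECT car_id, speed_kmh FROM df ORDER BY speed_kmh LIMIT 10")
print(result)  # [('A1', 42.5), ('A3', 61.0)]
```

The row for A2 is dropped during cleaning, so the SQL query only sees the two complete rows.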

# SQL: how to add a sql interpreter to apache zeppelin

image-20221214155408956

# Java: hello world in apache zeppelin

Java interpreter for Apache Zeppelin

ChatGPT:

image-20221214152806322