December 10, 2022

Blog @ Munaf Sheikh

CockroachDB CDC With Hadoop Ozone S3 Gateway and Docker Compose – Part 4

This is the fourth in the series of tutorials on CockroachDB and Docker Compose.

Today, we’re going to evaluate the Hadoop Ozone object store for viability as a CockroachDB object-store sink. A word of caution: this article only explores the art of the possible; use the ideas here at your own risk! Hadoop Ozone is a new object store the Hadoop community is working on. It exposes an S3-compatible API backed by HDFS and can scale to billions of files on-premises!

You can find the older posts here: Part 1, Part 2, Part 3.

  • Information on CockroachDB can be found here.
  • Information on Docker Compose can be found here.
  • Information on Hadoop Ozone can be found here.
  1. Download the Ozone 0.4.1 distribution.
wget -O hadoop-ozone-0.4.1-alpha.tar.gz https://www-us.apache.org/dist/hadoop/ozone/ozone-0.4.1-alpha/hadoop-ozone-0.4.1-alpha.tar.gz
tar xvzf hadoop-ozone-0.4.1-alpha.tar.gz
  2. Modify the compose file for Ozone to include CRDB.
cd ozone-0.4.1-alpha/compose

Notice the plethora of compose recipes available here!

We will focus on ozones3, as we need the S3 gateway. As a homework exercise, try ozones3-haproxy once you’re done with this tutorial. I can see a lot of interesting use cases with that!

cd ozones3

Edit the file and add Cockroach:

   crdb:
      image: cockroachdb/cockroach:v21.2.3
      container_name: crdb-1
      ports:
         - "26257:26257"
         - "8080:8080"
      command: start-single-node --insecure
      volumes:
         - ${PWD}/cockroach-data/data:/cockroach/cockroach-data:rw

The whole docker-compose file should now look like this:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3"
services:
   datanode:
      image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
      volumes:
        - ../..:/opt/hadoop
      ports:
        - 9864
      command: ["ozone","datanode"]
      env_file:
        - ./docker-config
   om:
      image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
      volumes:
         - ../..:/opt/hadoop
      ports:
         - 9874:9874
      environment:
         ENSURE_OM_INITIALIZED: /data/metadata/om/current/VERSION
      env_file:
          - ./docker-config
      command: ["ozone","om"]
   scm:
      image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
      volumes:
         - ../..:/opt/hadoop
      ports:
         - 9876:9876
      env_file:
          - ./docker-config
      environment:
          ENSURE_SCM_INITIALIZED: /data/metadata/scm/current/VERSION
      command: ["ozone","scm"]
   s3g:
      image: apache/ozone-runner:${HADOOP_RUNNER_VERSION}
      volumes:
         - ../..:/opt/hadoop
      ports:
         - 9878:9878
      env_file:
          - ./docker-config
      command: ["ozone","s3g"]
   crdb:
      image: cockroachdb/cockroach:v21.2.3
      container_name: crdb-1
      ports:
         - "26257:26257"
         - "8080:8080"
      command: start-single-node --insecure
      volumes:
         - ${PWD}/cockroach-data/data:/cockroach/cockroach-data:rw
  3. Start docker-compose with CRDB and Ozone.

By default, Ozone starts with a single data node; we’re going to start it with three data nodes at once.

docker-compose up -d --scale datanode=3
Creating network "ozones3_default" with the default driver
Creating ozones3_s3g_1      ... done
Creating ozones3_om_1       ... done
Creating ozones3_datanode_1 ... done
Creating ozones3_datanode_2 ... done
Creating ozones3_datanode_3 ... done
Creating crdb-1             ... done
Creating ozones3_scm_1      ... done
  4. Check the logs for om and s3g.
docker logs ozones3_s3g_1
docker logs ozones3_om_1

Make sure everything works and that the S3 gateway, as well as the Ozone Manager, are up:

2020-01-06 16:30:42 INFO  BaseHttpServer:207 - HTTP server of S3GATEWAY is listening at http://0.0.0.0:9878
2020-01-06 16:30:50 INFO  BaseHttpServer:207 - HTTP server of OZONEMANAGER is listening at http://0.0.0.0:9874
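Rather than eyeballing the logs, you can also poll the two HTTP endpoints directly. Here is a minimal Python sketch; the ports are the ones mapped in the compose file above, and the injectable `fetch` parameter is only there to make the probe easy to test offline:

```python
import urllib.request

# Ports as mapped in the compose file above:
# 9878 is the S3 gateway, 9874 the Ozone Manager UI.
ENDPOINTS = {
    "s3g": "http://localhost:9878",
    "om": "http://localhost:9874",
}

def probe(url, fetch=None):
    """Return True if the HTTP server at `url` answers the request."""
    fetch = fetch or urllib.request.urlopen
    try:
        with fetch(url, timeout=5) as resp:
            # Any non-5xx answer means the server is listening.
            return 200 <= resp.status < 500
    except OSError:
        return False

def check_all(endpoints=ENDPOINTS, fetch=None):
    return {name: probe(url, fetch) for name, url in endpoints.items()}
```

Once both containers are listening, `check_all()` should return `{'s3g': True, 'om': True}`.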
  5. Browse the UI.

Ozone exposes a few UIs via HTTP, specifically: the Ozone Manager at http://localhost:9874, the SCM at http://localhost:9876, and the S3 gateway at http://localhost:9878 (the ports mapped in the compose file above).

After the bucket is created in the next step, you can browse to it:

http://localhost:9878/ozonebucket?browser

  6. Create a bucket.
aws s3api --endpoint http://localhost:9878/ create-bucket --bucket=ozonebucket
{
    "Location": "http://localhost:9878/ozonebucket"
}
  7. Upload a file to the bucket.
touch test
aws s3 --endpoint http://localhost:9878 cp test s3://ozonebucket/test
artem@Artems-MBP ozones3 % aws s3 --endpoint http://localhost:9878 cp test s3://ozonebucket/test
upload: ./test to s3://ozonebucket/test

You can browse the bucket using UI, hit refresh if necessary.

http://localhost:9878/ozonebucket?browser

You can also use aws API:

aws s3 ls --endpoint http://localhost:9878 s3://ozonebucket
artem@Artems-MBP ozones3 % aws s3 ls --endpoint http://localhost:9878 s3://ozonebucket
2020-01-06 12:59:39          0 test
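Under the hood, `aws s3 ls` issues a GET against the bucket URL and parses the standard S3 `ListBucketResult` XML. If you’d rather skip the AWS CLI, a small Python sketch can do the same parsing; the namespace below is the standard S3 one, and this is illustrative only:

```python
import xml.etree.ElementTree as ET

# Standard S3 XML namespace used in ListBucketResult responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

def list_keys(xml_text):
    """Extract object keys from an S3 ListBucketResult document."""
    root = ET.fromstring(xml_text)
    # Tolerate responses emitted with or without the S3 namespace.
    ns = S3_NS if root.tag.startswith("{") else ""
    return [c.findtext(f"{ns}Key") for c in root.iter(f"{ns}Contents")]
```

Feeding it the response body of `GET http://localhost:9878/ozonebucket` should yield `['test']` at this point in the tutorial.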
  8. Set up a changefeed in CRDB to point to Ozone.

The steps here are not much different from the Minio changefeed described in the previous post.

Access the cockroach CLI.

docker exec -it crdb-1 ./cockroach sql --insecure
SET CLUSTER SETTING cluster.organization = '<organization name>';

SET CLUSTER SETTING enterprise.license = '<secret>';

SET CLUSTER SETTING kv.rangefeed.enabled = true;

CREATE DATABASE cdc_demo;

SET DATABASE = cdc_demo;

CREATE TABLE office_dogs (
     id INT PRIMARY KEY,
     name STRING);

INSERT INTO office_dogs VALUES
   (1, 'Petee'),
   (2, 'Carl');

UPDATE office_dogs SET name = 'Petee H' WHERE id = 1;
  9. Create an Ozone-specific changefeed.
CREATE CHANGEFEED FOR TABLE office_dogs INTO 'experimental-s3://ozonebucket/dogs?AWS_ACCESS_KEY_ID=dummy&AWS_SECRET_ACCESS_KEY=dummy&AWS_ENDPOINT=http://ozones3_s3g_1:9878' with updated;
root@:26257/cdc_demo> CREATE CHANGEFEED FOR TABLE office_dogs INTO 'experimental-s3://ozonebucket/dogs?AWS_ACCESS_KEY_ID=dummy&AWS_SECRET_ACCESS_KEY=dummy&AWS_ENDPOINT=http://ozones3_s3g_1:9878' with updated;
        job_id
+--------------------+
  518597966522974209
(1 row)

Time: 20.3764ms

The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set to dummy values just to satisfy the changefeed; Ozone needs to run in Kerberos mode to enforce real AWS secrets (see the Ozone security documentation for details).
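Once you move past dummy credentials, hand-assembling that sink URI gets fiddly, since real secrets often contain characters like `/` or `+` that must be percent-encoded. A sketch of building it safely with Python’s `urlencode` (the bucket, path, and endpoint are the ones used above):

```python
from urllib.parse import urlencode

# Query parameters for the changefeed sink; real keys may contain
# characters that need percent-encoding, which urlencode handles.
params = {
    "AWS_ACCESS_KEY_ID": "dummy",
    "AWS_SECRET_ACCESS_KEY": "dummy",
    "AWS_ENDPOINT": "http://ozones3_s3g_1:9878",
}
sink_uri = f"experimental-s3://ozonebucket/dogs?{urlencode(params)}"
statement = f"CREATE CHANGEFEED FOR TABLE office_dogs INTO '{sink_uri}' WITH updated;"
```

CockroachDB accepts percent-encoded values in the sink URI query string, so the generated statement is equivalent to the hand-written one above.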

At this point, go back to the S3 UI and make sure the dogs directory was created. Indeed, the directory is there, and if you browse to the deepest child directory, you will notice the JSON file.

Again, modifying the rows in the table will produce new files on the filesystem.

UPDATE office_dogs SET name = 'Beathoven' WHERE id = 1;

Clicking on the file will open a new browser tab with the following data:

{"after": {"id": 1, "name": "Beathoven"}, "key": [1], "updated": "1578334166605122300.0000000000"}
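The `updated` field is a CockroachDB hybrid-logical-clock timestamp: nanoseconds since the Unix epoch, then a dot and a logical counter. A small Python sketch to unpack one of these envelopes:

```python
import json
from datetime import datetime, timezone

def parse_envelope(line):
    """Return (row, wall_time) from one changefeed JSON line."""
    env = json.loads(line)
    # "updated" is "<nanos-since-epoch>.<logical-counter>".
    nanos, _, _logical = env["updated"].partition(".")
    wall = datetime.fromtimestamp(int(nanos) / 1e9, tz=timezone.utc)
    return env["after"], wall

row, wall = parse_envelope(
    '{"after": {"id": 1, "name": "Beathoven"}, '
    '"key": [1], "updated": "1578334166605122300.0000000000"}'
)
```

Here `row` is the post-update image of the row, and `wall` lands on 2020-01-06 UTC, consistent with the bucket listing timestamps.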

We can also confirm the files are there with CLI:

artem@Artems-MBP ozones3 % aws s3 ls --endpoint http://localhost:9878 s3://ozonebucket/dogs/2020-01-06/
2020-01-06 13:05:45        191 202001061805395834869000000000000-aa12c96bd4b5919c-1-2-00000000-office_dogs-1.ndjson
2020-01-06 13:09:28         99 202001061808465775434000000000001-aa12c96bd4b5919c-1-2-00000001-office_dogs-1.ndjson
aws s3 cp --quiet --endpoint http://localhost:9878 s3://ozonebucket/dogs/2020-01-06/202001061808465775434000000000001-aa12c96bd4b5919c-1-2-00000001-office_dogs-1.ndjson /dev/stdout
{"after": {"id": 1, "name": "Beathoven"}, "key": [1], "updated": "1578334166605122300.0000000000"}
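Note that the emitted file names begin with a fixed-width timestamp, so plain lexicographic order matches commit order; that makes replaying the files downstream a one-liner (file names copied from the listing above):

```python
# File names copied from the bucket listing; the leading fixed-width
# timestamp makes lexicographic order equal commit order.
files = [
    "202001061808465775434000000000001-aa12c96bd4b5919c-1-2-00000001-office_dogs-1.ndjson",
    "202001061805395834869000000000000-aa12c96bd4b5919c-1-2-00000000-office_dogs-1.ndjson",
]
replay_order = sorted(files)  # oldest change first
```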

Hope you enjoyed this tutorial and come back for more! Please share your feedback in the comments.

This article only scratches the surface; for everything there is to learn about Hadoop and Ozone, head to their respective websites.

#CockroachDB #CDC #Hadoop #Ozone #Gateway #Docker #Compose #Part