December 1, 2022

Blog @ Munaf Sheikh

Latest news from tech-feeds around the world.

CockroachDB CDC Using Minio as Cloud Storage Sink – Part 3

Great post from our friends at Source link

This is the third in the series of tutorials on CockroachDB and Docker Compose. Today, we’re going to explore CDC capability in CockroachDB Enterprise Edition using Minio object store as a sink. To achieve this, we’re going to reuse the compose file from the first two tutorials and finally bring this to a close. Without further ado

You can find the first post here and the second post here.

  • Information on CockroachDB can be found here.
  • Information on Docker Compose can be found here.
  • Information on Minio can be found here.
  • Information on Changefeed can be found here.

1- Start the docker-compose if you haven’t already.

docker-compose up -d

2- Browse to the Minio Web UI to create our first bucket.

As covered in the previous post as well as noted in the minio container logs, you can access the Web UI using http://127.0.0.1:9000. Again, from our compose script, we set environment variables for an access key, secret key, and region name.

     - MINIO_ACCESS_KEY=miniominio
     - MINIO_SECRET_KEY=miniominio13

Plugin those keys into the browser to access the UI. Once you’re in, you are greeted with a Minio file browser.

  1. Create a bucket by clicking the plus sign in the lower-right-hand corner.

Hit enter after typing out the name.

  1. Set up a changefeed in CockroachDB.

Access the cockroach CLI:

docker exec -it crdb-1 ./cockroach sql --insecure

Follow along in this tutorial to get an enterprise changefeed working. Specifically, set your organization name and license, then enable changefeed.

SET CLUSTER SETTING cluster.organization = '<organization name>';

SET CLUSTER SETTING enterprise.license="<secret>";

SET CLUSTER SETTING kv.rangefeed.enabled = true;

CREATE DATABASE cdc_demo;

SET DATABASE = cdc_demo;

CREATE TABLE office_dogs (
     id INT PRIMARY KEY,
     name STRING);

INSERT INTO office_dogs VALUES
   (1, 'Petee'),
   (2, 'Carl');

UPDATE office_dogs SET name="Petee H" WHERE id = 1;
  1. Create Minio specific changefeed
CREATE CHANGEFEED FOR TABLE office_dogs INTO 's3://miniobucket/dogs?AWS_ACCESS_KEY_ID=miniominio&AWS_SECRET_ACCESS_KEY=miniominio13&AWS_ENDPOINT=http://minio:9000&AWS_REGION=us-east-1' with updated;
CREATE CHANGEFEED FOR TABLE office_dogs INTO 's3://miniobucket/dogs?AWS_ACCESS_KEY_ID=miniominio&AWS_SECRET_ACCESS_KEY=miniominio13&AWS_ENDPOINT=http://minio:9000&AWS_REGION=us-east-1' with updated;
        job_id
+--------------------+
  517728810522771457
(1 row)

Time: 13.6961ms

You can now monitor your changefeed with ID 517728810522771457 in the CRDB UI or with SHOW JOBS SQL command.

Let’s inspect this statement a bit. First, there’s a scheme s3://, object storage sink. Then we mention miniobucket and directory within the bucket dogs. We are using Minio as our S3 object store and will plugin the AWS_KEY_ID and AWS_SECRET_ACCESS_KEY with properties, we set for Minio in the compose file. Additionally, we have to set AWS_ENDPOINT so that Minio can serve as our AWS endpoint. We’re also adding an optional AWS_REGION but for the sake of this tutorial it can be left out. You can probably see a pattern, prepend any of the site-specific properties with &PROPERTY

Feel free to review changefeed documentation for additional features, like the one I just learned where changefeed can be set up for multiple tables, e.g. CREATE CHANGEFEED FOR TABLE table1, table2, table3 INTO.... It will maintain a single changefeed for multiple tables, which happens to be a best practice over a single changefeed per table.

Now that changefeed is actually working, browse to the Minio UI, and within the miniobucket you will find a directory called dogs.

Here you will find a timestamped directory and within it a JSON file. This file will include all changes to the table. This CDC can also be set up with Kafka sink, again, feel free to get acquainted with changefeed documentation linked above. Possibilities are endless!

Finally, let’s browse the file and peek into it using a file browser. Recall in the compose file, we started Minio with a docker volume mapped to the current directory ${PWD}/data:/data. Navigating to the current directory where compose file resides, you will find a /data directory. Browse to the root of the child directory to find the JSON payload.

cd data/miniobucket/dogs/2020-01-03

This is where all of the CDC changes to the records in the table will be stored in JSON format.

artem@Artems-MBP 2020-01-03 % ls
202001031638241981334000000000000-8e419d24c147ac20-1-8-00000000-office_dogs-1.ndjson
artem@Artems-MBP 2020-01-03 % head 202001031638241981334000000000000-8e419d24c147ac20-1-8-00000000-office_dogs-1.ndjson
{"after": {"id": 1, "name": "Petee H"}, "key": [1], "updated": "1578069504198133400.0000000000"}
{"after": {"id": 2, "name": "Carl"}, "key": [2], "updated": "1578069504198133400.0000000000"}

From this point on, executing updates on the table will create new files.

UPDATE office_dogs SET name="Beathoven" WHERE id = 1;
root@:26257/cdc_demo> select * from office_dogs;
  id |   name
+----+-----------+
   1 | Beathoven
   2 | Carl
(2 rows)

Time: 1.4273ms

And on the filesystem:

artem@Artems-MBP 2020-01-03 % ls
202001031638241981334000000000000-8e419d24c147ac20-1-8-00000000-office_dogs-1.ndjson
202001031639259077295000000000001-8e419d24c147ac20-1-8-00000001-office_dogs-1.ndjson
artem@Artems-MBP 2020-01-03 % head *
==> 202001031638241981334000000000000-8e419d24c147ac20-1-8-00000000-office_dogs-1.ndjson <==
{"after": {"id": 1, "name": "Petee H"}, "key": [1], "updated": "1578069504198133400.0000000000"}
{"after": {"id": 2, "name": "Carl"}, "key": [2], "updated": "1578069504198133400.0000000000"}

==> 202001031639259077295000000000001-8e419d24c147ac20-1-8-00000001-office_dogs-1.ndjson <==
{"after": {"id": 1, "name": "Beathoven"}, "key": [1], "updated": "1578069607039550700.0000000000"}

This is a wrap for CDC sink to an object store with Cockroach and Minio. Hope you enjoyed this tutorial. Please share your feedback in the comments.

#CockroachDB #CDC #Minio #Cloud #Storage #Sink #Part