December 7, 2022

Blog @ Munaf Sheikh


ShardingSphere: Develop Distributed SQL Statement


In the previous articles “An Introduction to DistSQL” and “Integrating SCTL Into DistSQL’s RAL— Making Apache ShardingSphere Perfect for Database Management”, the Apache ShardingSphere committers shared the motivations behind the development of DistSQL, explained its syntax system, and showed how a single SQL statement can create a sharded table.

Today, to help you gain a better understanding of DistSQL and develop your own DistSQL syntax, our community author analyzes the design and development process of DistSQL and shows how you can implement a brand new DistSQL grammar across the four stages of the development life cycle: demand analysis, design, development, and testing.

What is DistSQL?

DistSQL (Distributed SQL) is a built-in SQL language unique to ShardingSphere. It is used like standard SQL and provides incremental functional capabilities beyond it. Its design purpose is to bring SQL operation capabilities to resource and rule management. For more information about DistSQL, please read “Build a data sharding service with DistSQL.”

Why Do You Need DistSQL?

Redefining the boundary between middleware and database, and allowing developers to leverage Apache ShardingSphere as if they were using a database natively, is DistSQL’s design goal.

Therefore, to avoid a steep learning curve, DistSQL provides a syntax structure and syntax validation system similar to standard SQL. Another advantage of DistSQL is its ability to manage resources and rules at the SQL level without configuration files.

How Do You Develop DistSQL?

Preparation

Toolkit:

  1. ANTLR4 is a parser generator that translates your grammar into a parser/lexer in Java (or another target language). You need to configure it before starting to develop your DistSQL. Want to get started with ANTLR4? Please refer to this ANTLR4 Concise Tutorial.
  2. IntelliJ IDEA is an integrated development environment written in Java for developing computer software. You also need the ANTLR v4 plug-in to test the grammar rules defined with ANTLR4:
  • First, choose the right Test Rule.

  • Then, input the statement to be verified in ANTLR Preview.

You need to know the DistSQL execution process as well as the basics of syntax and plug-ins. The complete DistSQL execution process is complicated, but the architecture of ShardingSphere allows developers to develop DistSQL features without having to pay attention to the whole process.

However, you still need to take care of the following core procedures:

[Image: core procedures]

Note: Here, we take data sharding as an example. Be aware that different features have different visitors.

Practice

Now that you understand the execution process of DistSQL, you can work through a practical case and learn how to develop your first DistSQL statement.

In the previous article “An Introduction to DistSQL”, the author showcased how you can leverage DistSQL to create a sharding table rule and view it with the statement show sharding table rules.

Now, we have a new request: how can you use a DistSQL statement to quickly query the number of shards of each sharded table? The designed syntax is as follows:

show sharding tables count [from schema] ;

Preparation

  1. MySQL containing databases and tables for sharding.
  2. ZooKeeper, used as the Registry Center.
  3. ShardingSphere-Proxy 5.0.0

Demonstration

1. Define the Statement

Add the following statement definition into the file: src/main/antlr4/imports/sharding/RQLStatement.g4. When it’s done, you can use ANTLR v4 to test it.

[Image: statement defined]

Please ensure that all keywords in that statement are defined. For example, COUNT is an undefined keyword here, so you need to define it in src/main/antlr4/imports/sharding/Keyword.g4.

After you define the statement, you also need to add it to the file ShardingDistSQLStatement.g4, which acts as the router for parsing.

[Image: statement added to ShardingDistSQLStatement.g4]

Now, it’s time to compile the module shardingsphere-sharding-distsql-parser to generate the relevant objects.

2. Parse the Definition

Then, you also need to add a DistSQLStatement object of the definition in shardingsphere-distsql-statement to save the variable attributes of the statement. For example, the schemaName of the statement definition needs to be saved to the object DistSQLStatement.

[Image: save variable attributes of the statement]
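A minimal sketch of such a statement object, assuming a plain Java class rather than the real ShardingSphere base types. The class name ShowShardingTablesCountStatementSketch and its constructor are illustrative only; schemaName is the one attribute taken from the statement definition above.

```java
import java.util.Optional;

// Illustrative stand-in for a DistSQL statement object; it does not extend
// any real ShardingSphere class.
public class ShowShardingTablesCountStatementSketch {

    // Value of the optional "from schema" clause; null when the clause is omitted.
    private final String schemaName;

    public ShowShardingTablesCountStatementSketch(final String schemaName) {
        this.schemaName = schemaName;
    }

    // Empty when the statement did not name a schema.
    public Optional<String> getSchemaName() {
        return Optional.ofNullable(schemaName);
    }

    public static void main(final String[] args) {
        ShowShardingTablesCountStatementSketch withSchema =
                new ShowShardingTablesCountStatementSketch("sharding_db");
        ShowShardingTablesCountStatementSketch withoutSchema =
                new ShowShardingTablesCountStatementSketch(null);
        System.out.println(withSchema.getSchemaName().isPresent());
        System.out.println(withoutSchema.getSchemaName().isPresent());
    }
}
```

Returning an Optional makes the "schema omitted" case explicit for whatever handler consumes the statement later.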

Since ShardingSphere uses ANTLR’s Visitor pattern to handle definitions, you need to override visitShowShardingTableCount in ShardingDistSQLStatementVisitor. The purpose of this method is to create a ShowShardingTablesCountStatement object and save the related variable attributes to the DistSQLStatement object.

[Image: ShowShardingTablesCountStatement]
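The visitor step can be sketched in self-contained Java as follows. This is not the generated ANTLR code or the real ShardingDistSQLStatementVisitor: the context class, the visitor interface, and the accessor names are hypothetical stand-ins, kept only to show how a visit method turns a parse-tree node into a statement object.

```java
import java.util.Optional;

// Self-contained illustration of the Visitor pattern; every type here is a
// hypothetical stand-in, not ANTLR-generated or ShardingSphere code.
public class DistSQLVisitorSketch {

    // Stand-in for the parse-tree node ANTLR would generate for the grammar rule.
    public static final class ShowShardingTableCountContext {
        private final String schemaName; // null when "from schema" is absent

        public ShowShardingTableCountContext(final String schemaName) {
            this.schemaName = schemaName;
        }

        public String getSchemaName() {
            return schemaName;
        }

        // Double dispatch: the node calls the visitor method that matches its type.
        public <T> T accept(final StatementVisitor<T> visitor) {
            return visitor.visitShowShardingTableCount(this);
        }
    }

    // Stand-in for the generated visitor interface.
    public interface StatementVisitor<T> {
        T visitShowShardingTableCount(ShowShardingTableCountContext ctx);
    }

    // Stand-in for the statement object that stores the variable attributes.
    public static final class ShowShardingTablesCountStatement {
        private final String schemaName;

        public ShowShardingTablesCountStatement(final String schemaName) {
            this.schemaName = schemaName;
        }

        public Optional<String> getSchemaName() {
            return Optional.ofNullable(schemaName);
        }
    }

    // The overridden visit method copies the attribute from the node to the statement.
    public static final class ShardingStatementVisitor
            implements StatementVisitor<ShowShardingTablesCountStatement> {
        @Override
        public ShowShardingTablesCountStatement visitShowShardingTableCount(
                final ShowShardingTableCountContext ctx) {
            return new ShowShardingTablesCountStatement(ctx.getSchemaName());
        }
    }

    public static void main(final String[] args) {
        ShowShardingTablesCountStatement statement =
                new ShowShardingTableCountContext("sharding_db").accept(new ShardingStatementVisitor());
        System.out.println(statement.getSchemaName().orElse("(current schema)"));
    }
}
```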

shardingsphere-sharding-distsql-parser actually depends on shardingsphere-distsql-statement, so it’s necessary to compile shardingsphere-distsql-statement first.

3. Handle Data and Return Results

Data handling is managed by the execute method of Handler or Executor, and getRowData returns the results. Different types of statement definitions focus on different things. For instance, when DistSQLResultSet is used as the result storage object, result data is assembled in the method execute.

The execute method and the DistSQLResultSet are shown in the image below:

[Image: execution method]

In ShardingTablesCountResultSet, init gets and assembles data, and getRowData returns row data. The class also implements getType, which belongs to the TypedSPI interface. To complete the SPI injection, ShardingTablesCountResultSet therefore needs to be registered in the file org.apache.shardingsphere.infra.distsql.query.DistSQLResultSet under the directory src/main/resources/META-INF/services of the current module. The path and content are as follows:

[Image: developed feature of statement definition]
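The division of work between init, getRowData, and getType might look roughly like the sketch below. It is a simplified stand-in, not the real ShardingTablesCountResultSet: it assumes init receives a plain map from table name to data nodes instead of ShardingSphere metadata, and the column names, the next method, and the type key are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for a result set class; it implements no ShardingSphere interface.
public class ShardingTablesCountResultSetSketch {

    private Iterator<Map.Entry<String, Integer>> rows;

    private Map.Entry<String, Integer> current;

    // Gets and assembles the data: one (table, shard count) pair per sharded table.
    // Assumption: the caller passes table name -> data nodes directly.
    public void init(final Map<String, List<String>> dataNodesByTable) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : dataNodesByTable.entrySet()) {
            counts.put(entry.getKey(), entry.getValue().size());
        }
        rows = counts.entrySet().iterator();
    }

    // Hypothetical column names for the result header.
    public Collection<String> getColumnNames() {
        return Arrays.asList("table", "count");
    }

    // Moves to the next row; returns false once all rows are consumed.
    public boolean next() {
        if (!rows.hasNext()) {
            return false;
        }
        current = rows.next();
        return true;
    }

    // Returns the row the cursor currently points at.
    public Collection<Object> getRowData() {
        return Arrays.<Object>asList(current.getKey(), current.getValue());
    }

    // Hypothetical type key that an SPI lookup could use to find this class.
    public String getType() {
        return "ShowShardingTablesCountStatement";
    }
}
```

Assembling everything in init keeps getRowData trivial: it only reads the row that next has already selected.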

Now, you have successfully developed the feature of this statement definition.

4. Finish Your Unit Test and Parse Test

When you complete the basic feature development, you need to add test cases to the new class or method, and to complete parse tests for the new syntax. This ensures its continuous usability. The following code block is the unit test of ShardingTablesCountResultSet.

[Image: unit test of ShardingTablesCountResultSet]

In addition to the unit test, you are also required to complete a parsing test for the grammar definition in shardingsphere-parser-test. The purpose is to parse the input DistSQL into a DistSQLStatement and then compare the parsed statement with your expected TestCase object. The steps are as follows:

  • Add the SQL you want to test in src/main/resources/sql/supported/rql/show.xml
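The idea behind the parse test, parsing the input and comparing the result with an expected object, can be illustrated with the sketch below. It does not use shardingsphere-parser-test or its XML test cases: a regular expression stands in for the ANTLR parser, and ParsedStatement, parse, and assertParsesTo are hypothetical names introduced only for this example.

```java
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of "parse, then compare with the expected case".
// A regular expression stands in for the real ANTLR-based parser.
public class ParseTestSketch {

    // Hypothetical parsed form of the statement, comparable by value.
    public static final class ParsedStatement {
        private final String schemaName; // null when "from schema" is absent

        public ParsedStatement(final String schemaName) {
            this.schemaName = schemaName;
        }

        public String getSchemaName() {
            return schemaName;
        }

        @Override
        public boolean equals(final Object other) {
            return other instanceof ParsedStatement
                    && Objects.equals(schemaName, ((ParsedStatement) other).schemaName);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(schemaName);
        }
    }

    // Accepts: show sharding tables count [from schema] ;
    private static final Pattern SYNTAX = Pattern.compile(
            "\\s*show\\s+sharding\\s+tables\\s+count(?:\\s+from\\s+(\\w+))?\\s*;?\\s*",
            Pattern.CASE_INSENSITIVE);

    // Parses the input or rejects it when it does not match the syntax.
    public static ParsedStatement parse(final String sql) {
        Matcher matcher = SYNTAX.matcher(sql);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unsupported statement: " + sql);
        }
        return new ParsedStatement(matcher.group(1));
    }

    // Fails when the parsed statement differs from the expected case.
    public static void assertParsesTo(final String sql, final ParsedStatement expected) {
        ParsedStatement actual = parse(sql);
        if (!expected.equals(actual)) {
            throw new AssertionError("Parsed statement does not match the expected case for: " + sql);
        }
    }

    public static void main(final String[] args) {
        assertParsesTo("show sharding tables count from sharding_db;", new ParsedStatement("sharding_db"));
        assertParsesTo("SHOW SHARDING TABLES COUNT ;", new ParsedStatement(null));
        System.out.println("parse checks passed");
    }
}
```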
