When Doris Learns to "Speak Dialects"
The SQL dialect conversion feature of Apache Doris understands more than ten SQL dialects, including Presto, Trino, Hive, and ClickHouse.
Join the DZone community and get the full member experience.
Join For FreeData migration is like moving house — every data engineer has faced this headache: a pile of SQL statements that need rewriting, as if you have to disassemble and reassemble all the furniture.
Different systems' SQL syntax is like different dialects. Although they all speak the SQL language, each has its own "accent" and habits. "If only there were a 'translator'!" This is probably the wish of every engineer who has experienced system migration.
Today, I want to introduce a magical "translator" — Apache Doris's SQL dialect conversion feature. It can understand more than ten SQL dialects, including Presto, Trino, Hive, ClickHouse, and Oracle, and can automatically complete the conversion for you!
Doris SQL Dialect Compatibility: Smooth Data Migration Like Silk
"Facing system migration, SQL rewriting is like playing Tetris — one wrong move and you're in trouble."
This sentence voices the sentiment of many data engineers. As data scales grow and businesses evolve, companies often need to migrate data from one system to another. The most painful part of this process is undoubtedly the compatibility of SQL syntax.
Each data system has its unique SQL dialect, just like each place has its own dialect. Although they all speak SQL, each has its own "accent." When you need to migrate data from Presto/Trino, ClickHouse, or Hive to Doris, hundreds or even thousands of SQL statements need to be rewritten, which is undoubtedly a huge project.
Apache Doris understands this pain. In version 2.1, Doris introduced the SQL dialect compatibility feature, supporting more than ten mainstream SQL dialects, including Presto, Trino, Hive, ClickHouse, and Oracle. Users only need to set a simple session variable to let Doris directly understand and execute the SQL syntax of other systems.
Compatibility tests show that in some users' actual business scenarios, Doris' compatibility with Presto SQL reaches as high as 99.6%, and with the ClickHouse dialect, it reaches 98%. This means that the vast majority of SQL statements can run directly in Doris without modification.
For data engineers, it is like holding a universal translator. No matter which SQL "dialect" it is, it can be automatically converted into a language that Doris can understand. System migration no longer requires manually rewriting a large number of SQL statements, greatly reducing the cost and risk of migration.
From "Dialect Dilemma" to "Language Master"
Zhang Gong is an experienced data engineer who recently received a challenging task — to migrate the company's data analysis platform from ClickHouse to Apache Doris. Faced with hundreds of SQL statements, he couldn't help but rub his temples.
"If only there were a tool to directly convert ClickHouse SQL to Doris," Zhang Gong muttered to himself. It was then that he discovered Doris' SQL dialect compatibility feature.
Let's follow Zhang Gong's steps to see how he solved this problem:
First, download the latest version of the SQL dialect conversion tool.
On any FE node, start the service with the following commands:
# config port
vim apiserver/conf/config.conf
# start SQL Converter for Apache Doris
sh apiserver/bin/start.sh
# webserver
vim webserver/conf/config.conf
# webserver start
sh webserver/bin/start.sh
Start the Doris cluster (version 2.1 or higher), and after the service is started, set the SQL conversion service address in Doris:
set global sql_converter_service_url = "http://127.0.0.1:5001/api/v1/convert"
Then, switch the SQL dialect with just one command:
set sql_dialect=clickhouse;
That's it! Zhang Gong found that SQL statements that originally needed to be manually rewritten could now be executed directly in Doris:
mysql> select toString(start_time) as col1,
arrayCompact(arr_int) as col2,
arrayFilter(x -> x like '%World%',arr_str)as col3,
toDate(value) as col4,
toYear(start_time)as col5,
addMonths(start_time, 1)as col6,
extractAll(value, '-.')as col7,
JSONExtractString('{"id": "33"}' , 'id')as col8,
arrayElement(arr_int, 1) as col9,
date_trunc('day',start_time) as col10
FROM test_sqlconvert
where date_trunc('day',start_time)= '2024-05-20 00:00:00'
order by id;
+---------------------+-----------+-----------+------------+------+---------------------+-------------+------+------+---------------------+
| col1 | col2 | col3 | col4 | col5 | col6 | col7 | col8 | col9 | col10 |
+---------------------+-----------+-----------+------------+------+---------------------+-------------+------+------+---------------------+
| 2024-05-20 13:14:52 | [1, 2, 3] | ["World"] | 2024-01-14 | 2024 | 2024-06-20 13:14:52 | ['-0','-1'] | "33" | 1 | 2024-05-20 00:00:00 |
+---------------------+-----------+-----------+------------+------+---------------------+-------------+------+------+---------------------+
1 row in set (0.02 sec)
"This is simply amazing!" Zhang Gong was pleasantly surprised to find that this seemingly complex ClickHouse SQL statement was perfectly executed.
Not only that, but he also discovered that Doris provides a visual interface that supports both text input and file upload modes. For a single SQL statement, users can directly input text in the web interface. If there are a large number of existing SQL statements, you can upload files for one-click batch conversion of multiple SQL statements:
Through the visual interface, Zhang Gong can upload SQL files in batches and complete the conversion with one click. "This is like having a universal translator that can seamlessly switch between ClickHouse and other SQL dialects," Zhang Gong exclaimed.
What's more, he was delighted to find that the accuracy of this "translator" is quite high. In actual testing, the compatibility with Presto SQL reaches 99.6%, and with ClickHouse, it reaches 98%. This means that the vast majority of SQL statements can be used directly, greatly improving migration efficiency.
The pressure of the data migration project was greatly reduced, and Zhang Gong could finally get a good night's sleep. However, he still had a small concern: "What if there are unsupported syntaxes?"
At this point, he found that Doris' development team values user feedback highly. Through communities, Ask forums, GitHub Issues, or mailing lists, users can provide feedback anytime to promote the continuous optimization and improvement of the SQL dialect conversion feature.
This open and user feedback-oriented attitude gives Zhang Gong great confidence for the future. "Next time I encounter a data migration project, I know which 'magic tool' to use!"
Stay tuned for more interesting, useful, and valuable content in the next issue!
Opinions expressed by DZone contributors are their own.
Comments