ClickHouse is a distributed, high-performance SQL analytics database. Yandex runs production systems with hundreds of servers. ClickHouse supports full replication or sharding of tables across a cluster. Queries run in multiple threads and can be spread over multiple servers. With features such as compression and multi-threading, ClickHouse can process billions of rows per second on some queries. This speed translates into lower resource usage, massive scaling, longer data retention periods and enhanced functionality for clients.
ClickHouse is a column-oriented database system, with all tables partitioned by default, which leads to high performance and flexibility. A choice of compression and data encoding algorithms gives flexibility in how data is compressed. Optimizations such as LowCardinality allow Enum-like behaviour without having to specify the Enum values in advance. It supports arrays and other complex data types, views, and most other standard advanced SQL functionality. As an open-source database, ClickHouse is freely available. Active development means that new features are added frequently.
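To make the LowCardinality idea more concrete, here is a minimal Python sketch of dictionary encoding: each distinct string is stored once and rows hold only small integer codes, with new values added as they appear rather than declared up front. This is a conceptual illustration only; the class name DictionaryEncodedColumn and its methods are hypothetical and do not reflect how ClickHouse implements the type internally.

```python
from typing import Dict, List


class DictionaryEncodedColumn:
    """Toy dictionary-encoded string column (illustrative, not ClickHouse code)."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}   # value -> integer code
        self._values: List[str] = []     # integer code -> value (the dictionary)
        self._codes: List[int] = []      # one integer code per row

    def append(self, value: str) -> None:
        # Unseen values are added to the dictionary on the fly,
        # so nothing has to be declared in advance (unlike an Enum).
        code = self._ids.get(value)
        if code is None:
            code = len(self._values)
            self._ids[value] = code
            self._values.append(value)
        self._codes.append(code)

    def __getitem__(self, row: int) -> str:
        # Decode a row by looking its code up in the dictionary.
        return self._values[self._codes[row]]

    def __len__(self) -> int:
        return len(self._codes)

    @property
    def dictionary(self) -> List[str]:
        return list(self._values)

    @property
    def codes(self) -> List[int]:
        return list(self._codes)


column = DictionaryEncodedColumn()
for country in ["DE", "US", "DE", "FR", "US", "DE"]:
    column.append(country)
```

In this sketch the six rows are stored as six small integers plus a three-entry dictionary, which is why the approach pays off when a column has few distinct values relative to its row count.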
As scaling and performance increasingly become an issue with database systems, an alternative with greater scalability is required. ClickHouse requires far less CPU than other databases and is easy to scale to multiple servers. Increasing the size of the batches delivered to ClickHouse at ingest significantly improves performance at the cost of a small amount of latency. Tiered storage and JBOD-style disk management within ClickHouse itself reduce management overhead and improve performance. ClickHouse is easy to integrate with other systems as it has HTTP, MySQL and PostgreSQL frontends.
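The batch-size versus latency trade-off can be sketched with a small client-side buffer. The Python example below is a minimal sketch, not a ClickHouse client: BatchingWriter, its parameters (max_rows, max_delay_s, clock) and the sink callable are all hypothetical names, and the sink stands in for whatever actually sends a batch to the database. Rows are held until the buffer is full or the oldest buffered row has waited too long, so a larger max_rows means fewer, bigger inserts and a longer possible wait.

```python
import time
from typing import Callable, List, Optional, Sequence


class BatchingWriter:
    """Buffers rows and hands them to `sink` in batches (illustrative sketch)."""

    def __init__(
        self,
        sink: Callable[[Sequence[dict]], None],
        max_rows: int = 10_000,
        max_delay_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if max_rows < 1:
            raise ValueError("max_rows must be at least 1")
        self._sink = sink                # hypothetical: sends one batch to the database
        self._max_rows = max_rows        # flush once this many rows are buffered
        self._max_delay_s = max_delay_s  # upper bound on added latency, checked on write
        self._clock = clock
        self._buffer: List[dict] = []
        self._first_row_at: Optional[float] = None

    def write(self, row: dict) -> None:
        if not self._buffer:
            # Remember when the oldest row in the current batch arrived.
            self._first_row_at = self._clock()
        self._buffer.append(row)
        waited = self._clock() - self._first_row_at
        # The delay is only checked when a row arrives; a real writer
        # would also need a timer to flush an idle buffer.
        if len(self._buffer) >= self._max_rows or waited >= self._max_delay_s:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._first_row_at = None
        self._sink(batch)


batches: List[Sequence[dict]] = []
writer = BatchingWriter(sink=batches.append, max_rows=3, max_delay_s=60.0)
for i in range(7):
    writer.write({"id": i})
writer.flush()  # send whatever is left over
```

Here seven rows reach the sink as three calls instead of seven. Raising max_rows reduces the number of inserts further, while max_delay_s caps how long a row can sit in the buffer while writes keep arriving.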