Questions tagged [clickhouse]

Use this tag when referencing Yandex's ClickHouse database management system.

ClickHouse is an open-source, column-oriented database management system capable of generating analytical data reports in real time using SQL queries. It was originally designed to power Yandex.Metrica.

ClickHouse is column-oriented and supports the concept of table engines, which address different workloads and provide different functionality. It also supports many features typical of relational databases, such as views and materialized views. It is designed to scale horizontally natively.
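As a sketch of what the table-engine and materialized-view concepts look like in practice (all table and column names here are illustrative, not taken from any question below):

```sql
-- A MergeTree table stores raw events, sorted by the primary key.
CREATE TABLE hits
(
    event_time DateTime,
    user_id    UInt64,
    url        String
)
ENGINE = MergeTree
ORDER BY (user_id, event_time);

-- A materialized view keeps a pre-aggregated rollup up to date on insert.
CREATE MATERIALIZED VIEW hits_per_day
ENGINE = SummingMergeTree
ORDER BY day AS
SELECT toDate(event_time) AS day, count() AS hits
FROM hits
GROUP BY day;
```

The choice of engine (MergeTree, SummingMergeTree, ReplicatedMergeTree, and so on) is what tailors a table to a given workload.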

In a true column-oriented DBMS, no "garbage" is stored alongside the values. Among other things, this means fixed-length values must be supported, so that the length of each value does not have to be stored next to it. For example, a billion UInt8 values should consume about 1 GB uncompressed; anything more strongly affects CPU use. It is very important to store data compactly (without any "garbage") even when uncompressed, because the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.
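The 1 GB figure above is simply row count times fixed width: 10^9 rows × 1 byte per UInt8 value = 10^9 bytes. An illustrative back-of-the-envelope check (not taken from the documentation):

```sql
-- 10^9 rows × 1 byte per UInt8 ≈ 1 GB (10^9 bytes) uncompressed
SELECT formatReadableSize(1000000000 * 1) AS uncompressed_size;
```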

References

ClickHouse Main Site

16 questions
3 votes · 2 answers

More efficient accumulator in SQL?

I'm writing a ledger system where every transaction can have multiple classifications. For example, if someone purchases a widget for $50, I can categorize that transaction as having an account of "Revenue" and an SKU of "SKU1". Users can then…
poundifdef · 141 · 3
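A running accumulator in ClickHouse-flavoured SQL is usually expressed as a window sum; a sketch (the table and column names are illustrative, not from the question):

```sql
-- Running balance per account, accumulated in transaction order.
SELECT
    account,
    txn_time,
    amount,
    sum(amount) OVER (
        PARTITION BY account
        ORDER BY txn_time
    ) AS running_balance
FROM ledger
ORDER BY account, txn_time;
```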
3 votes · 3 answers

Continuously move data from server A to server B while deleting the data in server A?

I'm developing an ad server that is expected to handle ad impressions/clicks in the billions per day. The most difficult challenge I am facing is moving data from one server to another. Basically the flow is like this: multiple front-facing load balancers…
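One way to move-then-delete between two ClickHouse servers is the remote() table function plus a partition drop; a sketch (hosts, credentials, and the partition expression are all illustrative, and the drop assumes the table is partitioned by date):

```sql
-- On server B, pull one day's slice from server A:
INSERT INTO local_events
SELECT *
FROM remote('serverA:9000', 'db', 'events', 'user', 'password')
WHERE event_date = yesterday();

-- Then, on server A, drop the slice that was moved:
ALTER TABLE db.events DROP PARTITION '2022-09-08';
```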
2 votes · 1 answer

In what cases is using ClickHouseDb and the like a necessity?

An open-source project for website analytics: https://github.com/plausible/analytics. It uses PostgreSQL and ClickHouseDb. When it comes to web analytics, there are tons of events that need to be tracked. From the point of view of the database, why is…
2 votes · 1 answer

Correlated subqueries. Count Visits after Last Purchase Date

I'm pretty new to SQL and have been trying to solve this task for a while... still no luck. I would appreciate it if someone here could help me out. I have a database with columns: ClientID, VisitID, Date, PurchaseID (array), etc. What I'm…
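ClickHouse has historically lacked correlated subqueries, so this kind of query is usually rewritten as a join against a per-client aggregate; a sketch (the column names follow the excerpt above, the table name is illustrative):

```sql
-- Count each client's visits after their last purchase date.
SELECT v.ClientID, count() AS visits_after_last_purchase
FROM visits AS v
INNER JOIN
(
    SELECT ClientID, max(Date) AS last_purchase
    FROM visits
    WHERE notEmpty(PurchaseID)   -- rows where the purchase array is non-empty
    GROUP BY ClientID
) AS p ON v.ClientID = p.ClientID
WHERE v.Date > p.last_purchase
GROUP BY v.ClientID;
```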
1 vote · 1 answer

ClickHouse OPTIMIZE performance for deduplication

I want to try and understand the performance of the OPTIMIZE query in ClickHouse. I am planning to use it to remove duplicates right after a bulk insert into a MergeTree table, so my options are: OPTIMIZE TABLE db.table DEDUPLICATE or OPTIMIZE…
AmyChodorowski · 113 · 1 · 5
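The two variants the question compares differ in whether a full merge is forced; a sketch (the database and table names are illustrative):

```sql
-- Merge only what the server chooses to merge, dropping exact duplicate rows:
OPTIMIZE TABLE db.table DEDUPLICATE;

-- Force a merge down to one part per partition, then deduplicate;
-- heavier, but it can remove duplicates that live in different parts:
OPTIMIZE TABLE db.table FINAL DEDUPLICATE;
```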
1 vote · 1 answer

How to back up ClickHouse over SSH?

In PostgreSQL, I usually run this command to back up and compress (since my country has really low bandwidth) from server to local: mkdir -p tmp/backup ssh sshuser@dbserver -p 22 "cd /tmp; pg_dump -U dbuser -Fc -C dbname | xz - -c" \ | pv -r -b >…
Kokizzu · 1,403 · 6 · 18 · 35
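A common pattern is to stream a native-format dump through SSH, analogous to the pg_dump pipeline in the excerpt; a sketch (the host, user, database, and table names are illustrative):

```shell
# Dump one table in ClickHouse's native format on the remote host,
# compress it there to save bandwidth, and stream it to a local file.
ssh sshuser@dbserver -p 22 \
  "clickhouse-client --query 'SELECT * FROM dbname.tablename FORMAT Native' | xz -c" \
  > backup/tablename.native.xz
```

Restoring is the reverse pipeline into `clickhouse-client --query 'INSERT INTO … FORMAT Native'`.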
1 vote · 1 answer

Mounting ClickHouse data directory to another partition: DB::Exception: Settings profile `default` not found

I'm trying to move the ClickHouse data directory to another partition, /dev/sdb1. So here's what I've done: sudo systemctl stop clickhouse-server mv /var/lib/clickhouse /var/lib/clickhouse-orig mkdir /var/lib/clickhouse chown clickhouse:clickhouse…
Kokizzu · 1,403 · 6 · 18 · 35
1 vote · 1 answer

ClickHouse Replication without Sharding

How do I set up replication (1 master, 2 slaves, for example) in ClickHouse without sharding? All the examples I can find always include sharding: Altinity Presentation, Docker Compose Example, ProgrammerSought Blog, QuidQuid Blog, FatalErrors…
Kokizzu · 1,403 · 6 · 18 · 35
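Replication without sharding is just a single shard whose tables are Replicated; a sketch (the ZooKeeper/Keeper path and the `{replica}` macro are illustrative and must match the server configuration):

```sql
-- On each replica server, create the same table with a shared
-- coordination path and a per-server replica name:
CREATE TABLE events
(
    event_time DateTime,
    payload    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/events', '{replica}')
ORDER BY event_time;
```

With a single shard, every replica holds the full data set, and an insert on any replica is copied to the others; note that ClickHouse replication is multi-master rather than master/slave.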
0 votes · 0 answers

PeerDB Initial Snapshot Performance Impact on Standby PostgreSQL

I have set up a Change Data Capture (CDC) pipeline using PeerDB to mirror tables from a PostgreSQL standby read replica to ClickHouse.
  • The PostgreSQL database contains terabytes of data.
  • The initial snapshot of the existing data needs to be…
0 votes · 1 answer

How to check user settings in ClickHouse

I have a user created by an SQL command. I know there is the getSetting() function to check user settings, e.g.: SELECT getSetting('async_insert'); But how do I check other users' settings if I am a DBA? Is there any view/function/request for this…
Mikhail Aksenov · 430 · 1 · 7 · 19
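ClickHouse exposes per-user settings through system tables; a sketch of where a DBA could look (exact columns vary by version, and the user name is illustrative):

```sql
-- Settings attached directly to users or to settings profiles:
SELECT * FROM system.settings_profile_elements;

-- The profiles themselves:
SELECT * FROM system.settings_profiles;

-- Or inspect one user's definition, including any SETTINGS clauses:
SHOW CREATE USER some_user;
```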
0 votes · 0 answers

Calculate the sum of minutes between statuses in ClickHouse

There is a table in ClickHouse that is constantly updated, format:

date_time           | shop_id | item_id | status | balance
--------------------|---------|---------|--------|--------
2022-09-09 13:00:01 | abc     | 1234    | 0      | …
Kirill_K · 1 · 1
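One way to approach "time spent in each status" in ClickHouse is a window function over each item's history; a sketch (the column names follow the excerpt above, the table name is illustrative):

```sql
-- Minutes in each status = gap to the next row for the same shop/item.
-- The last row of each item has no next row, so leadInFrame returns a
-- zero date there and the result should be filtered or treated as open.
SELECT
    shop_id,
    item_id,
    status,
    dateDiff(
        'minute',
        date_time,
        leadInFrame(date_time) OVER (
            PARTITION BY shop_id, item_id
            ORDER BY date_time
            ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
        )
    ) AS minutes_in_status
FROM shop_items
ORDER BY shop_id, item_id, date_time;
```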
0 votes · 0 answers

How do I design a schema, with the proper DB engine, to accumulate data for this need in ClickHouse or any other database?

We're a new adtech company, and I was planning to design a database where I'll pull all the data into a single table and then build new tables with materialized views for others to generate multiple reports. Say we have Inventory, impression, views…
0 votes · 1 answer

Is Azure Managed Disks enough to ensure high-durability for a database?

I want to set up a database in a high durability set-up on Azure. I've previously relied on DB-as-a-service offerings, but can't do that in this case, so I'd like your feedback on the plan below. Is this enough to ensure reliable storage of…
0 votes · 1 answer

ClickHouse: create database structure for JSON data

New to ClickHouse and stuck on the database creation structure for importing nested JSON data. Take for example JSON data that looks like the following when populated: "FirewallMatchesActions": [ "allow" ], …
p4guru · 296 · 1 · 6
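Nested JSON arrays like the excerpt's FirewallMatchesActions typically map to Array columns, and the JSONEachRow format handles the import; a sketch (the table name and the extra column are illustrative):

```sql
CREATE TABLE firewall_events
(
    event_time             DateTime,
    FirewallMatchesActions Array(String)
)
ENGINE = MergeTree
ORDER BY event_time;

-- Each input line is one JSON object; array fields map to Array columns.
INSERT INTO firewall_events FORMAT JSONEachRow
{"event_time": "2022-01-01 00:00:00", "FirewallMatchesActions": ["allow"]}
```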
-1 votes · 1 answer

How to create pre-computed tables in order to speed up queries

One of the issues I am encountering at present is that we have certain very large tables (>10 million rows). When we reference these large tables or create joins, queries are extremely slow. One of the hypotheses for solving the issue…
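A common ClickHouse pattern for this is to pre-aggregate into a target table that queries hit instead of the raw tables; a sketch (all names are illustrative):

```sql
-- Pre-computed rollup, refreshed by a periodic INSERT ... SELECT
-- (or kept current with a materialized view on the source table):
CREATE TABLE daily_sales_rollup
(
    day     Date,
    sku     String,
    revenue Float64
)
ENGINE = SummingMergeTree
ORDER BY (day, sku);

INSERT INTO daily_sales_rollup
SELECT toDate(order_time) AS day, sku, sum(price) AS revenue
FROM orders
GROUP BY day, sku;
```

Reports then join against the small rollup rather than the >10-million-row source tables.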