Questions tagged [cassandra]

Apache Cassandra is an open source distributed database management system. It is designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL solution.

Apache Cassandra is a highly scalable, eventually consistent, distributed, structured row/column store. Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google's BigTable. Like Dynamo, Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.

Cassandra's Dynamo-based cluster model provides linear scalability and fault tolerance on commodity hardware or cloud infrastructure. Its support for replicating across multiple data centers is best-in-class, providing low latency and the ability to survive entire data center outages.

Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates and powerful built-in caching. Its write performance is among the fastest of comparable database solutions, which makes it a compelling option for big data processing. Nodes can be added or removed on the fly without downtime, and capacity scales linearly as the cluster grows.
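To make the Dynamo-style cluster model more concrete, here is a minimal sketch of a consistent-hash ring in Python. It is a toy illustration, not Cassandra's implementation: the `Ring` class, the `token` helper, the node names, and the use of MD5 with 64 virtual nodes per server are all assumptions chosen for the example. It shows why a node can join without reshuffling the whole data set: only the keys that fall into the new node's token ranges change owner.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 here, purely for illustration)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class Ring:
    """Toy consistent-hash ring: each node owns several tokens (virtual nodes)."""

    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes
        self._tokens = []   # sorted token positions
        self._owner = {}    # token -> node name

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            self._owner[t] = node
            bisect.insort(self._tokens, t)

    def remove_node(self, node: str) -> None:
        self._tokens = [t for t in self._tokens if self._owner[t] != node]
        self._owner = {t: n for t, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        # The first token clockwise from the key's position owns the key.
        i = bisect.bisect_right(self._tokens, token(key)) % len(self._tokens)
        return self._owner[self._tokens[i]]


ring = Ring()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

keys = [f"row-{i}" for i in range(10_000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # grow the cluster without touching existing nodes
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved")
```

Every key that changes owner moves to the newly added node, and removing that node again restores the original placement. A real cluster also has to stream the affected data and honor the replication factor, which this sketch leaves out.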

Cassandra was open-sourced by Facebook in 2008 and quickly became a top-level Apache project. Today, it's widely used by companies in many markets.

Official links:

Documentation

596 questions
85 votes, 5 answers

Which database could handle storage of billions/trillions of records?

We are looking at developing a tool to capture and analyze netflow data, which we gather in tremendous amounts. Each day we capture about 1.4 billion flow records, which would look like this in JSON format: { "tcp_flags": "0", "src_as":…
somecallmemike
20 votes, 5 answers

Infrastructure for Highly Concurrent, High Write DB

My requirements are: 3000 connections, 70-85% write vs. read. Currently, we are maxing out a High-CPU Extra Large instance at 700 connections. All 8 cores are maxed. We think it's the number of concurrent connections, as the memory is fine. The write…
Justin
17 votes, 2 answers

Why does Cassandra recommend against creating an index on high-cardinality columns?

The Cassandra documentation states: "Do not use an index in these situations: On high-cardinality columns because you then query a huge volume of records for a small number of results. See Problems using a high-cardinality column index below." It…
Thanatos
15 votes, 3 answers

Do you have to run nodetool repair on every node?

Do you have to run nodetool repair on every node in a cluster, or do you only need to run it on one node, and from there Cassandra will take care of the rest?
2rs2ts
15 votes, 1 answer

Is it safe to add a new node to a Cassandra cluster while a repair is running?

I'm getting ready to expand an existing Cassandra cluster. I have repairs scheduled to run on a recurring basis. Do I need to disable repairs when adding a new node to a cluster, or can I bootstrap new nodes while repairs are running elsewhere…
Gene
13 votes, 1 answer

Cassandra multi-datacenter configuration with 1 external IP

I'm trying to set up a multi-datacenter Cassandra cluster. The problem is that my datacenters have only 1 external IP (WAN IP). I can set up port forwarding on the datacenters' switches to access each node from the outside world using a different port,…
Sergio Ayestarán
12 votes, 2 answers

What is a good way to copy data from one Cassandra ColumnFamily to another on the same Keyspace (like SQL's INSERT INTO)?

Trying to find a way to easily transfer all the rows from a Cassandra ColumnFamily/Table to another. The COPY command, as I understand, is a good option. However, as it dumps all the data to .csv on disk and then loads it back, I can't help but…
Juan Carlos Coto
11 votes, 2 answers

What are the penalties of using many (thousands) of column families or keyspaces in Cassandra?

I am in the process of evaluating the best design for our Cassandra installation. There is not so much information out there on the Internet about using the first two levels of access that Cassandra provides—keyspaces and column families. I am…
favo
10 votes, 2 answers

Cassandra: maintenance

I am inexperienced with Cassandra, but I have some experience with SQL-based relational databases. I have been unable to find best practices information about how to maintain Cassandra once deployed. Is it necessary to VACUUM the database? I…
Mayur Patel
10 votes, 2 answers

Cassandra - Query a column with collection type

I am pretty new to Cassandra, so pardon me if this turns out to be a silly question. I have a table structure as below: CREATE TABLE data_points ( id text PRIMARY KEY, created_at timestamp, previous_event_id varchar, properties…
Rohit
10 votes, 1 answer

What are the practical limitations on a column family in Cassandra?

In Cassandra, it's not recommended to have more than a few thousand column families, let's say 2,000 for the sake of argument. In cases where more than 2,000 types of data need to be persisted, one approach would be to shard multiple unrelated types…
Andrew Swan
9 votes, 2 answers

NoSQL: What is unstructured data?

We are currently running at the edge of resources with our MSSQL Server-based solution. We now have many traditional options regarding the next move to tackle the load: buy faster CPUs and IO, split some customers to a separate server, move db to…
thst
8 votes, 2 answers

Modeling Graph Data in Cassandra DB

I want to use Apache Cassandra to store a large amount of graph data according to a property graph model. The model contains the following entities: Vertices: Contains a map of key/value pairs (properties). Some keys should be indexed for querying…
ThePhysicist
8 votes, 1 answer

How can I get nodetool without Cassandra?

I have a Cassandra pool across machines in a LAN. I need nodetool on a machine without Cassandra installed. How can I get it? Is installing Cassandra and disabling the db service the best way?
Dmitry Belaventsev
8 votes, 3 answers

"phpMyAdmin" for Cassandra

Is there a tool like phpMyAdmin (for MySQL) for a Cassandra DB? I know that less "runtime" configuration is possible (column families cannot be edited at runtime, etc.). Nevertheless, it would be very helpful to have a GUI to inspect the…
strauberry