Questions tagged [data-warehouse]

A database system optimised for reporting, particularly in aggregate. Often, but not always implemented using a star schema.

A Data Warehouse is a specialised database system that is optimised for reporting or at least easy extraction of data. One important point is that the data in a data warehouse is usually loaded from external systems, and the database design has significant differences to those used for transactional systems. Data warehouse systems often have several characteristic features:

  • Use of star schemas with central fact tables joining to dimension tables. These facilitate fast aggregate reporting and simple query plans. Sometimes other designs such as snowflake schemas or relatively normalised operational data store databases are employed.

  • Storage of historical data - often data warehouse systems are used for analytical queries that examine historical data, or trends in aggregate figures over time.

  • ETL processes to load the data from external sources. Data warehouse systems often function to aggregate data from multiple sources.

  • Conformed data - data from multiple sources is often transformed into a common format, allowing data from multiple sources to be queried at the same time.

431 questions
28
votes
4 answers

Compare two similar Postgres databases for differences

I occasionally download publicly available data sets in the form of Postgres dBs. These datasets are updated/modified/expanded over time by the repository host. Is there a Postgres command or tool (ideally FOSS) that can show the differences…
CuriousGorge
  • 291
  • 1
  • 3
  • 4
28
votes
5 answers

What are some ways to implement a many-to-many relationship in a data warehouse?

The dominant topologies of Data Warehouse modelling (Star, Snowflake) are designed with one-to-many relationships in mind. Query readability, performance, and structure degrades severely when faced with a many-to-many relationship in these modelling…
Brian Ballsun-Stanton
  • 4,731
  • 2
  • 30
  • 36
25
votes
3 answers

What are the arguments in favor of using ELT process over ETL?

I realized that my company uses an ELT (extract-load-transform) process instead of using an ETL (extract-transform-load) process. What are the differences in the two approaches and in which situations would one be "better" than the other? It would…
HelloWorld1
  • 797
  • 2
  • 8
  • 16
23
votes
2 answers

What are Measures and Dimensions in Cubes

I'm new to Microsoft SQL Server Business Intelligence and Analysis Service (but I've been programming for years with SQL Server). Can any one describe Measures and Dimensions in Cubes in simple words (If it's possible with images)?
DooDoo
  • 203
  • 1
  • 3
  • 13
21
votes
3 answers

Clustered columnstore indexes and foreign keys

I am performance tuning a data warehouse using indexes. I am fairly new to SQL Server 2014.Microsoft describes the following: "We view the clustered columnstore index as the standard for storing large data warehousing fact tables, and expect it…
20
votes
2 answers

Difference between star schema and data cube?

I am involved in a new project where I have to create a data cube from the existing relational database system. I understood that the existing system is not properly designed, and I am not sure where to start. My question are: What is difference…
Rathish Kumar B
  • 2,320
  • 5
  • 24
  • 36
18
votes
1 answer

Query strategies using SQL Server 2016 system-versioned temporal tables for Slowly-Changing Dimensions

When using a system-versioned temporal table (new in SQL Server 2016), what are the query authoring and performance implications when this feature is used to handle Slowly Changing Dimensions in a large relational data warehouse? For example, assume…
17
votes
2 answers

Open Source Business Intelligence/DWH solutions

I wonder that this question hasn't already been asked. Google only has very few results for me that don't show a high quality tool What are some Open Source (also free is ok) solutions for Data Warehouses and more specifically Business Intelligence…
14
votes
2 answers

Handling time zones in data mart/warehouse

We are starting to design the building blocks of a data mart/warehouse and we need to be able to support all time zones (our clients are from all over the world). From reading discussions online (and in books), a common solution seems to be to have…
14
votes
2 answers

ETL: extracting from 200 tables - SSIS data flow or custom T-SQL?

Based on my analysis, a complete dimensional model for our data warehouse will require extraction from over 200 source tables. Some of these tables will be extracted as part of an incremental load and others will be a full load. To note, we have…
8kb
  • 2,639
  • 2
  • 32
  • 36
13
votes
2 answers

Alternative to EAV for dynamic fields in a star schema data warehouse

I need to support dynamic fields and values in a big datawarehouse for storing API requests log, my user case is that I need to store all API requests query string and able to perform query against them in the future (so it is not just storage, so I…
Howard
  • 249
  • 1
  • 5
  • 10
13
votes
1 answer

How much RAM should I get for a cloud-hosted PostgreSQL data warehouse?

I'm looking at migrating a current PostgreSQL data warehouse to a cloud host with SSD storage and RAM as one of the main sizing variables. The bulkiest data we're dealing with at the moment will live on monthly partitioned tables. Each month is…
raphael
  • 599
  • 2
  • 4
  • 20
12
votes
1 answer

Should I disable "auto update statistics" in a data warehousing scenario?

I have 200 GB data warehouse in SQL Server. I have been experiencing really slow execution times for some queries; for example 12 hours for a simple delete query with an inner join. After doing some research with the execution plans, I've updated…
saso
  • 243
  • 3
  • 9
12
votes
1 answer

How do you calculate how much hardware resources you need for a database?

We are in the process of scaling our database servers. I was wondering how should we calculate how much hardware resources we need for our databases? Here is a little bit information about our current database server: MySQL Database InnoDB for all…
Mahdi
  • 185
  • 1
  • 1
  • 8
11
votes
1 answer

Limit the degree of parallelism (DOP) available to any query

On Oracle Exadata (11gR2), we have a relatively beefy database. cpu_count is 24 parallel_server_instances is 2 parallel_threads_per_cpu is 2 We noted, through observation in Oracle Enterprise Manager (OEM), that performance was terrible due to…
grenade
  • 211
  • 1
  • 2
  • 6
1
2 3
28 29