Questions tagged [etl]

'Extract, Transform and Load': Data load processing, normally in the context of data warehouse systems.

ETL (Extract, Transform and Load) processing is used to load a database from an external source. Normally used in the context of data warehouse systems, an ETL process has the three eponymous functions

  • Extract: data is extracted from the source system. The source could be a database, file, or API exposed by the source system.

  • Transform: The data from the source system may not be in the same format as the destination. Transformation is the process of changing the data so it will load correctly into the destination.

  • Load: The data is loaded into the destination, often through some sort of bulk load facility.

ETL processing can be done with hand-built code in SQL or a client-side language, or it can be done with tools specifically designed for the job (usually referred to as 'ETL tools'). Examples of such tools include SQL Server Integration Services and Informatica Powercentre.

257 questions
39
votes
3 answers

Are regular VACUUM ANALYZE still recommended under 9.1?

I'm using PostgreSQL 9.1 on Ubuntu. Are scheduled VACUUM ANALYZE still recommended, or is autovacuum enough to take care of all needs? If the answer is "it depends", then: I have a largish database (30 GiB compressed dump size, 200 GiB data…
François Beausoleil
  • 1,463
  • 3
  • 15
  • 24
31
votes
4 answers

What is a scalable way to simulate HASHBYTES using a SQL CLR scalar function?

As part of our ETL process, we compare rows from staging against the reporting database to figure out if any of the columns have actually changed since the data was last loaded. The comparison is based on the unique key of the table and some kind of…
Joe Obbish
  • 32,976
  • 4
  • 74
  • 153
29
votes
6 answers

How do I run a large script with many inserts without running out of memory?

Question: I have a script with around 45 thousand insert from select statements. When I try and run it, I get an error message stating that I have run out of memory. How can I get this script to run? Context: Added some new data fields to make an…
spaghetticowboy
  • 425
  • 1
  • 4
  • 9
26
votes
7 answers

Any Open Source / free ETL out there?

I was using Pentaho Data Integration even before Pentaho bought it and call it that. I have the last free version. I went on their website recently to see if they had released another version only to find out my favorite open source etl is not much…
Nicolas de Fontenay
  • 1,875
  • 2
  • 18
  • 37
25
votes
3 answers

What are the arguments in favor of using ELT process over ETL?

I realized that my company uses an ELT (extract-load-transform) process instead of using an ETL (extract-transform-load) process. What are the differences in the two approaches and in which situations would one be "better" than the other? It would…
HelloWorld1
  • 797
  • 2
  • 8
  • 16
15
votes
4 answers

Quick way to validate two tables against each other

We're doing an ETL process. When all is said and done there are a bunch of tables that should be identical. What is the quickest way to verify that those tables (on two different servers) are in fact identical. I'm talking both schema and data. …
RThomas
  • 3,446
  • 6
  • 30
  • 48
14
votes
2 answers

ETL: extracting from 200 tables - SSIS data flow or custom T-SQL?

Based on my analysis, a complete dimensional model for our data warehouse will require extraction from over 200 source tables. Some of these tables will be extracted as part of an incremental load and others will be a full load. To note, we have…
8kb
  • 2,639
  • 2
  • 32
  • 36
11
votes
2 answers

PostgreSQL for high volume transactions and for Data warehousing

Am quite new to PostgreSQL, I have never done a large deployment using it before. But, I have good experience in enterprise solutions and I want to try and apply some of what I learned using PostgreSQL. I have site which is sized to handle large…
Mo J. Mughrabi
  • 211
  • 1
  • 2
  • 4
10
votes
4 answers

Is there a standard language/interface for programmatic ETL in SQL Server?

I'm currently in the process of creating ETLs for our data warehouse. We're using SSIS 2008, but we're running into issues, the biggest of which is the difficulty in re-using components. We have separate packages for each table and each package…
kubi
  • 201
  • 1
  • 6
9
votes
5 answers

Copying a table (and all of its data) from one server to another?

I have a massive table, let's say 500,000 rows. I want to copy it (schema and data) from one server to another. This is not an upsert or any kind of update; It's a one-off straight copy and paste. What are the idiomatic approaches to this? I've…
J. Mini
  • 1,161
  • 8
  • 32
9
votes
6 answers

Dimensional Modeling and ETL in Redshift

I have been researching Amazon's Redshift database as a possible future replacement for our data warehouse. My experience has always been in using dimensional modeling and Ralph Kimball's methods, so it was a little weird to see that Redshift…
njkroes
  • 655
  • 4
  • 7
  • 15
7
votes
2 answers

Importing a large column (8000 chars) from Excel using SSIS

I'm trying to set up a regular import of an excel spreadsheet that we get from a vendor. I'm using SQL 2008 R2 SSIS to import it into a table. The problem connection manager is an OLE DB connection to the spreadsheet. The spreadsheet is Excel…
Kenneth Fisher
  • 24,307
  • 13
  • 63
  • 116
7
votes
3 answers

Tablespace on ephemeral storage

For performance reasons in some scenarios, e.g. Amazon EC2, you have access to a faster and cheaper storage device, which loses all its data on reboots, so it is called "ephemeral". This question is about taking advantage of such type of storage in…
Andrea Ratto
  • 171
  • 3
7
votes
1 answer

Duplicating some rows of a data flow in SSIS

I have a data flow to migrate data from an old database to a new one. The old design had all the data and historical information (changes) stored in a single table with a "version" (incrementing integer) against the row. The new design has two…
Tony
  • 378
  • 1
  • 4
  • 20
7
votes
2 answers

SQL Staging Tables: Primary Key Clustered or Heap

We are taking legacy flat txt files and inserting them into stage tables with SSIS.The question arose whether table should have primary clustered key index or not. This is direct flat file import with no transformation. create table…
1
2 3
17 18