0

I take a Derived Column Transformation to replace a value as soon as a condition on two other columns is met. How can I see in the data flow how often a value changes by this replacement? Is there a way to show it on the output arrow, or how else can I show the changes inside the data flow?

PS (not needed for the Q/A)

I want to check whether the string input is the same as the lookup output. But sometimes, names (first names and last names) have more than one name in the value, like Taylor-Smith against Smith, or Bob Thomas against Bob. People might marry or might in times fill their two first names so that data might not match over two master data accounts.

(DT_BOOL)(FINDSTRING(name,[Lookup.name],1)) ? 1 : [score_name]

The value of column score_name is replaced by 1 as soon as the name is found inside the Lookup.name, else the value is kept which is between 0 and 1.

"Name in Lookup?":

enter image description here

enter image description here

You could take any other example like Replace column values?, the question does not depend on what is done.

questionto42
  • 366
  • 1
  • 2
  • 12

1 Answers1

0

Best choice: Making new columns is easier than replacing columns

Instead of replacing columns, it is much easier to make new Derived Columns for each condition without any Conditional Split and then show the number of changes of the values between old and new after a Multicast and a Conditional Split. If you want to replace a column, just drop the old one and keep the new one.

Not recommended: Conditional Split and Merge

Make a Conditional Split and split the data by the condition:

(DT_BOOL)(FINDSTRING(name,[Lookup.name],1) > 0 && [score_name] < 1)

There are at least two grey output arrows:

  • one for the condition
  • and one for the rest;
  • if you have more than one condition, you have more than two grey arrows.

The output arrows show the numbers of rows of the split in the data flow, which is the aim in question, and the split avoids replacing a 1 by a 1 (in my example) so that you can see how many values change by the replacement. For the replacement, make a Derived Column for the one output arrow where the condition is met and replace the value by 1 (in my example). There is no condition needed anymore since that was done by the Conditional Split. Afterwards, sort both by the same key and take the Merge component to append them together again.

If you have three grey output arrows, you will need two Merge components in a row to get everything together in one dataset again.

I had 8 conditions that were built on top of four main conditions, and I split them into four rounds, and this setup became a mess of components. I cannot recommend it.

questionto42
  • 366
  • 1
  • 2
  • 12