Friday 1 February 2013

Understanding of Transformer Looping & Caching


 
Understanding of Transformer Looping & Caching:-

 Power of transformer looping is introduced from IIS Data stage 8.5 and above versions. Transformer looping is made developers life easy in developing complex scenarios which was almost difficult in earlier versions.

 loop variables:-

You can use loop variables when a loop condition is defined for the Transformer stage. When a loop is defined, the Transformer stage can output multiple rows for every row input to the stage. Loop variables are evaluated every time that the loop is iterated, and so can change their value for every output row. Such variables are accessible only from the Transformer stage in which they are declared. You cannot use a loop variable in a stage variable derivation.

Loop variables can be used as follows:

·                     They can be assigned values by expressions.

·                     They can be used in expressions which define an output column derivation.

·                     Expressions evaluating a variable can include other loop variables or stage variables             or the variable being evaluated itself.

 

@ITERATION: - System variable used for transformer looping

 What is Caching in the Transformer?

 The Transformer Cache is an in-memory storage mechanism that is available from within the Transformer stage and is used to help solve complex data integration scenarios. The cache is a first-in/first-out (i.e. FIFO) construct and is accessible to the developer via two new functions:

 
  SaveInputRecord: stores an input row to back of the cache

  GetInputRecord: retrieves a saved row from the front of the cache

 
I will try to explain the transformer caching and looping using below scenario:-

 

Below is my input data:-

 

Customer 
 Store
 Items
Value
600785011
103
10
100
600785011
104
20
200
600785011
109
30
300


Output:-

 

For each customer get a list of the stores he shopped, with this list output every combination of stores dropping customer from the output.

 

Home Store
 Items
 Value
Away Store
Items
Value
103
10
100
103
10
100
103
10
100
104
20
200
103
10
100
109
30
300
104
20
200
103
10
100
104
20
200
104
20
200
104
20
200
109
30
300
109
30
300
103
10
100
109
30
300
104
20
200
109
30
300
109
30
300

 

Below is Job Design:-

  SeqFile -------> TRFM1-------> TRFM2--------->TempDataset
 

 
TSFM1:-

 Define below stage variables in transformer:-

 NumSavedRows= SaveInputRecord

IsBreak= LastRowInGroup(to_tsfm1.Customer)

svCurrStore= If IsBreak Then to_tsfm1.Store:"|": svPrevStore Else to_tsfm1.Store

svPrevStore= If IsBreak Then to_tsfm1.Items else  to_tsfm1.Items :"|": svPrevItem svCurrItem= If IsBreak Then to_tsfm1.Items :"|": svPrevItem Else to_tsfm1.Items

svPrevItem= If IsBreak Then to_tsfm1.Items else  to_tsfm1.Items :"|": svPrevItem

svCurrValue= If IsBreak Then to_tsfm1.Value :"|": svPrevValue Else to_tsfm1.Values

svPrevValue= If IsBreak Then to_tsfm1.Value else  to_tsfm1.Value :"|": svPrevValue

NumRows= If IsBreak Then NumSavedRows Else 0          

 Define below loop conditions & Derivation:-

                       @ITERATION <= NumRows

OutputRecord= GetSavedInputRecord()

 
Output Derivations:-

Store=Inputlink.Store

Item=InputLink.Item

Value=InputLink.Value

ConcatStore=svCurrStore

ConcatItem=svCurrItem

ConcatValue=svCurrValue

Customer=InputCustomer
 

The logic of this transformer is as follows:

1 input data is read and stored in the cache via the "SaveInputRecord" function.
2.key breaks are tested via the "LastRowInGroup.
3.svCurrStore, svCurrItem, svCurrValue will concatenate with pipl all stores,Items & values in the same group.
4.if there is a key break, the " NumRows " variable is set to the number of records on the cache and we'll skip to step 6
5.if there is no key break(NumRows=0) the Loop Condition will be false.  This will cause the next input record to be read and processed .
6.finally, the transformer will loop while "@ITERATION <= NumRows " and read the cache records.

 
TSFM2:-
Define below Stage Variable:-

                         RowNum=Count(InputLink1.ConcatStore,”|”)

 Define below loop conditions :-

                          @ITERATION <= RowNum

 Output Derivation:-

Store=InputLink1.Store

Item=InputLink.Item

Value=InputLink.Value

AwayStore= Field(to_tsfm2.ConcatStore,"|",@ITERATION)

Item= Field(to_tsfm2.ConcatItems,"|",@ITERATION)

Value= Field(to_tsfm2.ConcatValue,"|",@ITERATION)

 
 

Note:- I have just tried to explain better understanding of Transformer looping using this scenario in my blog,may be it can be done in different logics.I will appriciate if you can bring other logics too.
 
Thanks
Prasoon

 




Monday 28 January 2013

Datastage Admin Commands

Using DataStage Administrator commands to find:
Command DescriptionCommand
DS Project Object countsSELECT COUNT(*) FROM DS_JOBS;
SELECT COUNT(*) FROM DS_JOBOBJECTS;
  
List all jobs in a DS projectSELECT JOBNO, @ID, CATEGORY, NAME FROM DS_JOBS
  
Clear Log From Administratorselect JOBNO from DS_JOBS where NAME='Name of the job'
CLEAR.FILE RT_Logxx where xx is JOBNO.
  
CLEAR HASH FILE.CLEAR.FILE <HASHED FILE NAME>
  
Find how many Jobs are using given Hashed file as INPUTselect DS_JOBS.NAME AS JOB_NAME FMT '35L' from DS_JOBOBJECTS,DS_JOBS where DS_JOBOBJECTS.OLETYPE = 'CHashedInput' and DS_JOBOBJECTS.OBJIDNO = DS_JOBS.JOBNO and EVAL DS_JOBOBJECTS."@RECORD<6>" = 'hashed file name' group by JOB_NAME; 
  
Find how many Jobs are using given Hashed file as OUTPUselect DS_JOBS.NAME AS JOB_NAME FMT '35L' from DS_JOBOBJECTS,DS_JOBS where DS_JOBOBJECTS.OLETYPE = 'CHashedOutput' and DS_JOBOBJECTS.OBJIDNO = DS_JOBS.JOBNO and EVAL DS_JOBOBJECTS."@RECORD<6>" = 'hashed file name' group by JOB_NAME; 
  
Find how many Jobs are using given Hashed file as INPUT and OUTPUTselect DS_JOBS.NAME AS JOB_NAME FMT '35L' from DS_JOBOBJECTS, DS_JOBS where DS_JOBOBJECTS.OLETYPE in ('CHashedOutput','CHashedInput') and DS_JOBOBJECTS.OBJIDNO = DS_JOBS.JOBNO and EVAL DS_JOBOBJECTS."@RECORD<6>" = 'Hashed file name' group by JOB_NAME;
  
View column data in a hashed fileSELECT COLUMN_NAME FROM HASHED_FILE_NAME
  
List metadata of hashfileSELECT "FIELD", "FIELD.TYPE", "FIELD.DEF", "FIELD.CONV", "FIELD.HEAD", "FIELD.FMT", "FIELD.ASSOC" FROM DICT hashedfile ORDER BY "FIELD.NO", "TYPE";
  
Insert data into the hashfileINSERT INTO HASHED_FILE_NAME (COLUMN_NAME) VALUES ('VALUE');
  
Find which user modified the job lastSELECT INSTANCE, DTM, MODIFIER FROM DS_AUDIT WHERE CLASS = '2' AND INSTANCE LIKE 'JOB NAME'; 
  
List all the projects in DataStage.. With OwnerSELECT @ID, NAME FMT '40L' FROM UV.ACCOUNT;
  
Reset surrogate key generated using Keymgmt function in DatastageUPDATE SDKSequences USING DICT VOC SET F1 = '1' WHERE @ID = 'Your_Sequence_Name';
  
List all the surrogate key variables used in datastage projSELECT * FROM SDKSequences;
  
List Datastage error message decsription, given error meSELECT * FROM SYS.MESSAGE WHERE @ID = '081002';
  
List all jobs using a TABLESELECT DS_JOBS.NAME AS JOB_NAME, DS_JOBS.CATEGORY, DS_JOBOBJECTS.NAME AS OBJECT_NAME, DS_JOBOBJECTS.OLETYPE, EVAL DS_JOBOBJECTS."if index(upcase(@RECORD),'TABLE_NAME',1) > 0 then 'FOUND' else ''" AS FOUND FMT '5L' FROM DS_JOBS, DS_JOBOBJECTS WHERE DS_JOBS.JOBNO = DS_JOBOBJECTS.OBJIDNO and FOUND = 'FOUND' GROUP BY JOB_NAME, DS_JOBS.CATEGORY, OBJECT_NAME, DS_JOBOBJECTS.OLETYPE, FOUND; 
  
  
List Sequencer jobs calling a specified job in the project  @DsServe >> /etl/Ascential/Datastage/DSEngine/bin/dssearch -ljobs -usedby -r -oc -oj <projectname> <jobname>
List all jobs called in specified Sequencer job  @DsServe >> /etl/Ascential/Datastage/DSEngine/bin/dssearch -ljobs -uses -r -oc -oj
<projectname> <SeqJobname>

Friday 25 January 2013

Unix Sed Command & hand-on


SED (Stream Editor)

 
Syntex:-

                        Sed options ‘address action’ file(s)

Sed uses instruction to act on text. An instruction combines an address for selecting lines,with an action to be taken on them.

Command  : s for Substitution

Sed has several commands, but most people only learn the substitute command: s. The substitute command changes all occurrences of the regular expression into a new value. A simple example is changing "day" in the "old" file to "night" in the "new" file:

                                   sed s/day/night/ old >new

 

If you have meta-characters in the command, quotes are necessary. And if you aren't sure, it's a good habit, and I will henceforth quote future examples to emphasize the "best practice." Using the strong (single quote) character, that would be:

sed 's/day/night/'  old >new

There are four parts to this substitute command:

s         Substitute command
/../../   Delimiter
day       Regular Expression Pattern Search Pattern
night     Replacement string

Slash as a delimiter:-

If you want to change a pathname that contains a slash - say /usr/local/bin to /common/bin you could use “|” to quote the slash:-

sed 's|/usr/local/bin|/common/bin|' <old >new

Using & as a matched String:-

Sometimes you want to search for a pattern and add some characters, like parenthesis, around or near the pattern you found. It is easy to do this if you are looking for a particular string:

sed 's/abc/(abc)/' <old >new               This will not work

The solution requires the special character "&." It corresponds to the pattern found.

sed 's/[a-z]*/(&)/' <old >new

 

Substitution flag:-

/g - Global Replacement:-

Most Unix utilties work on files, reading a line at a time. Sed, by default, is the same way. If you tell it to change a word, it will only change the first occurrence of the word on a line. You may want to make the change on every word on the line instead of the first.If you want it to make changes for every word, add a "g" after the last delimiter and use the work-around:

sed 's/[^ ][^ ]*/(&)/g' <old >new

 
/1,/2,etc. Specifying Which occurance:-

 

If you want to modify a particular pattern that is not the first one on the line,if you want to delete the second word on the line then use this command:-

 

sed 's/[a-zA-Z]* //2' <old >new

 

You can add a number after the substitution command to indicate you only want to match that particular pattern.

 

You can combine a number with the g (global) flag. For instance, if you want to leave the first world alone , but change the second, third, etc. to DELETED, use /2g:

 

sed 's/[a-zA-Z]* /DELETED /2g' <old >new

 
Write To a File with /w filename:-

There is one more flag that can follow the third delimiter. With it, you can specify a file that will receive the modified data. An example is the following, which will write all lines that start with an even number to the file even:

 

sed -n 's/^[0-9]*[02468] /&/w even' <file

 

You must have exactly one space between the w and the filename.
Ex:-

1)      sed –n ‘1,500w foo1           ---Saves 1 to 500 to foo1

501,$w foo2’ foo.main        ----- from 501 to last line to foo2

 

Multiple Command with –e command:-

One method of combining multiple commands is to use a -e before each command:

sed -e 's/a/A/' -e 's/b/B/' <old >new
 

Filenames On the Command Line:-

You can specify files on the command line if you wish.

 

sed -e 's/^#.*//' f1 f2 f3 | grep -v '^$' | wc –l

 

The sed substitute command changes every line that starts with a "#" into a blank line.

Grep was used to filter out empty lines.

Wc counts the number of lines left.

 

Note :-^ sign shows start line.

            $ shows end of the line.

Sed –n Option:-No Printing 

The "-n" option will not print anything unless an explicit request to print is found.
Eg:-

          sed –n ‘1,2p’ <filename>

This command will print the first two lines of file( head -2 <filename>)

          sed –n ‘$p’  <filename >

Print the last line .( tail -1 <filename>)
        

Negating the Action:-

     Sed –n ‘3,$!p’ <filename>        Don’t print from line 3 to end
 

Deleting Line:-  sed use –d (delete) command to emulate grep –v option of selection lines not containing the pattern .

e.g.

Sed ‘/director/d’ emp.lst >olist

Sed –n ‘/director/!p’ emp.lst > olist     ( -n option is not used with d )

 -> select all lines except those containing  director and save in olist.

Pattern:-

Sed uses the same convention, provided you terminate the expression with a slash.

To delete the first number on all lines that start with a "#," use:

sed '/^#/  s/[0-9][0-9]*//'

 

Suppose you wanted to search for the string "/usr/local/bin" and you wanted to change it for "/common/all/bin." You could use the backslash to escape the slash:

 
   sed '/\/usr\/local\/bin/ s/\/usr\/local/\/common\/all/'

   sed 's/foo/bar/g' filename         # standard replace command

   sed '/foo/ s/foo/bar/g' filename   # executes more quickly
 

Range By Line Number:-

 You can specify a range on line numbers by inserting a comma between the numbers. To restrict a substitution to the first 100 lines, you can use:

sed '1,100 s/A/a/'

Range By Pattern:-

sed '/start/,/stop/ s/#.*//'

This example will remove comments everywhere except the lines between the two keywords:

sed -e '1,/start/ s/#.*//' -e '/stop/,$ s/#.*//'

eg:-
sed -e 's/#.*//' -e '/^$/ d'   (Removing comments and blank lines takes two commands. The first removes every character from the "#" to the end of the line, and the second deletes all blank lines: )

 
Relationship with d  ,  p & !

 
  Sed      Range   Command   Results
  --------------------------------------------------------
  sed -n   1,10    p         Print first 10 lines (head -10)
  sed -n   11,$    !p        Print first 10 lines
  sed      1,10    !d        Print first 10 lines
  sed      11,$    d         Print first 10 lines
  --------------------------------------------------------
  sed -n   1,10    !p        Print last 10 lines (tail -10 )
  sed -n   11,$    p         Print last 10 lines
  sed      1,10    d         Print last 10 lines
  sed      11,$    !d        Print last 10 lines
  --------------------------------------------------------
  sed -n   1,10    d         Nothing printed
  sed -n   1,10    !d        Nothing printed
  sed -n   11,$    d         Nothing printed
  sed -n   11,$    !d        Nothing printed
  --------------------------------------------------------
  sed      1,10    p         Print first 10 lines twice,
                            Then next 10 lines once
  sed      11,$    !p        Print first 10 lines twice,
                            Then last 10 lines once
  --------------------------------------------------------
  sed      1,10    !p        Print first 10 lines once,
                            Then last 10 lines twice
  sed      11,$    p         Print first 10 lines once,
                            then last 10 lines twice

Hand-on sed:-


 # double space a file

 sed G


 # triple space a file

 sed 'G;G'


 # count lines (emulates "wc -l")

 sed -n '$='

 

# Remove ^M character

#in bash/tcsh, press Ctrl-V then Ctrl-M

sed '1,$  s/^M//g'  filename   

 

# delete leading whitespace (spaces, tabs) from front of each line

 # aligns all text flush left

 sed 's/^[ \t]*//'                    # see note on '\t' at end of file

 
# delete trailing whitespace (spaces, tabs) from end of each line

 sed 's/[ \t]*$//'                    # see note on '\t' at end of file

 

# delete BOTH leading and trailing whitespace from each line

 sed 's/^[ \t]*//;s/[ \t]*$//'

 
# substitute (find and replace) "foo" with "bar" on each line

 sed 's/foo/bar/'             # replaces only 1st instance in a line

 sed 's/foo/bar/4'            # replaces only 4th instance in a line

 sed 's/foo/bar/g'            # replaces ALL instances in a line

 sed 's/\(.*\)foo\(.*foo\)/\1bar\2/' # replace the next-to-last case

 sed 's/\(.*\)foo/\1bar/'            # replace only the last case

 
# substitute "foo" with "bar" ONLY for lines which contain "baz"

 sed '/baz/s/foo/bar/g'

 

# substitute "foo" with "bar" EXCEPT for lines which contain "baz"

 sed '/baz/!s/foo/bar/g'

 

 

 # print the line immediately before a regexp, but not the line

 # containing the regexp

 sed -n '/regexp/{g;1!p;};h'

 

 

 # print the line immediately after a regexp, but not the line

 # containing the regexp

 sed -n '/regexp/{n;p;}'

 # print only lines of 65 characters or longer

 sed -n '/^.\{65\}/p'

 

# print only lines of less than 65 characters

 sed -n '/^.\{65\}/!p'


 # print line number 52

 sed -n '52p'                 # method 1

 sed '52!d'                   # method 2

 sed '52q;d'                  # method 3, efficient on large files