DSSharing

Friday, 1 February 2013

Understanding of Transformer Looping & Caching

Understanding of Transformer Looping & Caching:-

Power of transformer looping is introduced from IIS Data stage 8.5 and above versions. Transformer looping is made developers life easy in developing complex scenarios which was almost difficult in earlier versions.

loop variables:-

You can use loop variables when a loop condition is defined for the Transformer stage. When a loop is defined, the Transformer stage can output multiple rows for every row input to the stage. Loop variables are evaluated every time that the loop is iterated, and so can change their value for every output row. Such variables are accessible only from the Transformer stage in which they are declared. You cannot use a loop variable in a stage variable derivation.

Loop variables can be used as follows:

· They can be assigned values by expressions.

· They can be used in expressions which define an output column derivation.

· Expressions evaluating a variable can include other loop variables or stage variables or the variable being evaluated itself.

@ITERATION: - System variable used for transformer looping

What is Caching in the Transformer?

The Transformer Cache is an in-memory storage mechanism that is available from within the Transformer stage and is used to help solve complex data integration scenarios. The cache is a first-in/first-out (i.e. FIFO) construct and is accessible to the developer via two new functions:

• SaveInputRecord: stores an input row to back of the cache

• GetInputRecord: retrieves a saved row from the front of the cache

I will try to explain the transformer caching and looping using below scenario:-

Below is my input data:-

Customer	Store	Items	Value
600785011	103	10	100
600785011	104	20	200
600785011	109	30	300

Output:-

For each customer get a list of the stores he shopped, with this list output every combination of stores dropping customer from the output.

Home Store	Items	Value	Away Store	Items	Value
103	10	100	103	10	100
103	10	100	104	20	200
103	10	100	109	30	300
104	20	200	103	10	100
104	20	200	104	20	200
104	20	200	109	30	300
109	30	300	103	10	100
109	30	300	104	20	200
109	30	300	109	30	300

Below is Job Design:-

SeqFile -------> TRFM1-------> TRFM2--------->TempDataset

TSFM1:-

Define below stage variables in transformer:-

NumSavedRows= SaveInputRecord

IsBreak= LastRowInGroup(to_tsfm1.Customer)

svCurrStore= If IsBreak Then to_tsfm1.Store:"|": svPrevStore Else to_tsfm1.Store

svPrevStore= If IsBreak Then to_tsfm1.Items else to_tsfm1.Items :"|": svPrevItem svCurrItem= If IsBreak Then to_tsfm1.Items :"|": svPrevItem Else to_tsfm1.Items

svPrevItem= If IsBreak Then to_tsfm1.Items else to_tsfm1.Items :"|": svPrevItem

svCurrValue= If IsBreak Then to_tsfm1.Value :"|": svPrevValue Else to_tsfm1.Values

svPrevValue= If IsBreak Then to_tsfm1.Value else to_tsfm1.Value :"|": svPrevValue

NumRows= If IsBreak Then NumSavedRows Else 0

Define below loop conditions & Derivation:-

@ITERATION <= NumRows

OutputRecord= GetSavedInputRecord()

Output Derivations:-

Store=Inputlink.Store

Item=InputLink.Item

Value=InputLink.Value

ConcatStore=svCurrStore

ConcatItem=svCurrItem

ConcatValue=svCurrValue

Customer=InputCustomer

The logic of this transformer is as follows:

1 input data is read and stored in the cache via the "SaveInputRecord" function.
2.key breaks are tested via the "LastRowInGroup.
3.svCurrStore, svCurrItem, svCurrValue will concatenate with pipl all stores,Items & values in the same group.
4.if there is a key break, the " NumRows " variable is set to the number of records on the cache and we'll skip to step 6
5.if there is no key break(NumRows=0) the Loop Condition will be false. This will cause the next input record to be read and processed .
6.finally, the transformer will loop while "@ITERATION <= NumRows " and read the cache records.

TSFM2:-

Define below Stage Variable:-

RowNum=Count(InputLink1.ConcatStore,”|”)

Define below loop conditions :-

@ITERATION <= RowNum

Output Derivation:-

Store=InputLink1.Store

Item=InputLink.Item

Value=InputLink.Value

AwayStore= Field(to_tsfm2.ConcatStore,"|",@ITERATION)

Item= Field(to_tsfm2.ConcatItems,"|",@ITERATION)

Value= Field(to_tsfm2.ConcatValue,"|",@ITERATION)

Note:- I have just tried to explain better understanding of Transformer looping using this scenario in my blog,may be it can be done in different logics.I will appriciate if you can bring other logics too.

Thanks

Prasoon

Monday, 28 January 2013

Datastage Admin Commands

Using DataStage Administrator commands to find:

Command Description	Command
DS Project Object counts	*SELECT COUNT() FROM DS_JOBS;**
	*SELECT COUNT() FROM DS_JOBOBJECTS;**

List all jobs in a DS project	SELECT JOBNO, @ID, CATEGORY, NAME FROM DS_JOBS

Clear Log From Administrator	select JOBNO from DS_JOBS where NAME='Name of the job'
	CLEAR.FILE RT_Logxx where xx is JOBNO.

CLEAR HASH FILE.	CLEAR.FILE <HASHED FILE NAME>

Find how many Jobs are using given Hashed file as INPUT	select DS_JOBS.NAME AS JOB_NAME FMT '35L' from DS_JOBOBJECTS,DS_JOBS where DS_JOBOBJECTS.OLETYPE = 'CHashedInput' and DS_JOBOBJECTS.OBJIDNO = DS_JOBS.JOBNO and EVAL DS_JOBOBJECTS."@RECORD<6>" = 'hashed file name' group by JOB_NAME;

Find how many Jobs are using given Hashed file as OUTPU	select DS_JOBS.NAME AS JOB_NAME FMT '35L' from DS_JOBOBJECTS,DS_JOBS where DS_JOBOBJECTS.OLETYPE = 'CHashedOutput' and DS_JOBOBJECTS.OBJIDNO = DS_JOBS.JOBNO and EVAL DS_JOBOBJECTS."@RECORD<6>" = 'hashed file name' group by JOB_NAME;

Find how many Jobs are using given Hashed file as INPUT and OUTPUT	select DS_JOBS.NAME AS JOB_NAME FMT '35L' from DS_JOBOBJECTS, DS_JOBS where DS_JOBOBJECTS.OLETYPE in ('CHashedOutput','CHashedInput') and DS_JOBOBJECTS.OBJIDNO = DS_JOBS.JOBNO and EVAL DS_JOBOBJECTS."@RECORD<6>" = 'Hashed file name' group by JOB_NAME;

View column data in a hashed file	SELECT COLUMN_NAME FROM HASHED_FILE_NAME

List metadata of hashfile	SELECT "FIELD", "FIELD.TYPE", "FIELD.DEF", "FIELD.CONV", "FIELD.HEAD", "FIELD.FMT", "FIELD.ASSOC" FROM DICT hashedfile ORDER BY "FIELD.NO", "TYPE";

Insert data into the hashfile	INSERT INTO HASHED_FILE_NAME (COLUMN_NAME) VALUES ('VALUE');

Find which user modified the job last	SELECT INSTANCE, DTM, MODIFIER FROM DS_AUDIT WHERE CLASS = '2' AND INSTANCE LIKE 'JOB NAME';

List all the projects in DataStage.. With Owner	SELECT @ID, NAME FMT '40L' FROM UV.ACCOUNT;

Reset surrogate key generated using Keymgmt function in Datastage	UPDATE SDKSequences USING DICT VOC SET F1 = '1' WHERE @ID = 'Your_Sequence_Name';

List all the surrogate key variables used in datastage proj	*SELECT FROM SDKSequences;**

List Datastage error message decsription, given error me	*SELECT FROM SYS.MESSAGE WHERE @ID = '081002';**

List all jobs using a TABLE	SELECT DS_JOBS.NAME AS JOB_NAME, DS_JOBS.CATEGORY, DS_JOBOBJECTS.NAME AS OBJECT_NAME, DS_JOBOBJECTS.OLETYPE, EVAL DS_JOBOBJECTS."if index(upcase(@RECORD),'TABLE_NAME',1) > 0 then 'FOUND' else ''" AS FOUND FMT '5L' FROM DS_JOBS, DS_JOBOBJECTS WHERE DS_JOBS.JOBNO = DS_JOBOBJECTS.OBJIDNO and FOUND = 'FOUND' GROUP BY JOB_NAME, DS_JOBS.CATEGORY, OBJECT_NAME, DS_JOBOBJECTS.OLETYPE, FOUND;


List Sequencer jobs calling a specified job in the project	@DsServe >> /etl/Ascential/Datastage/DSEngine/bin/dssearch -ljobs -usedby -r -oc -oj <projectname> <jobname>
List all jobs called in specified Sequencer job	@DsServe >> /etl/Ascential/Datastage/DSEngine/bin/dssearch -ljobs -uses -r -oc -oj <projectname> <SeqJobname>

Friday, 25 January 2013

Unix Sed Command & hand-on

SED (Stream Editor)

Syntex:-

Sed options ‘address action’ file(s)

Sed uses instruction to act on text. An instruction combines an address for selecting lines,with an action to be taken on them.

Command : s for Substitution

Sed has several commands, but most people only learn the substitute command: s. The substitute command changes all occurrences of the regular expression into a new value. A simple example is changing "day" in the "old" file to "night" in the "new" file:

sed s/day/night/ old >new

If you have meta-characters in the command, quotes are necessary. And if you aren't sure, it's a good habit, and I will henceforth quote future examples to emphasize the "best practice." Using the strong (single quote) character, that would be:

sed 's/day/night/' old >new

There are four parts to this substitute command:

s         Substitute command

/../../   Delimiter

day       Regular Expression Pattern Search Pattern

night     Replacement string

Slash as a delimiter:-

If you want to change a pathname that contains a slash - say /usr/local/bin to /common/bin you could use “|” to quote the slash:-

sed 's|/usr/local/bin|/common/bin|' <old >new

Using & as a matched String:-

Sometimes you want to search for a pattern and add some characters, like parenthesis, around or near the pattern you found. It is easy to do this if you are looking for a particular string:

sed 's/abc/(abc)/' <old >new This will not work

The solution requires the special character "&." It corresponds to the pattern found.

sed 's/[a-z]*/(&)/' <old >new

Substitution flag:-

/g - Global Replacement:-

Most Unix utilties work on files, reading a line at a time. Sed, by default, is the same way. If you tell it to change a word, it will only change the first occurrence of the word on a line. You may want to make the change on every word on the line instead of the first.If you want it to make changes for every word, add a "g" after the last delimiter and use the work-around:

sed 's/[^ ][^ ]*/(&)/g' <old >new

/1,/2,etc. Specifying Which occurance:-

If you want to modify a particular pattern that is not the first one on the line,if you want to delete the second word on the line then use this command:-

sed 's/[a-zA-Z]* //2' <old >new

You can add a number after the substitution command to indicate you only want to match that particular pattern.

You can combine a number with the g (global) flag. For instance, if you want to leave the first world alone , but change the second, third, etc. to DELETED, use /2g:

sed 's/[a-zA-Z]* /DELETED /2g' <old >new

Write To a File with /w filename:-

There is one more flag that can follow the third delimiter. With it, you can specify a file that will receive the modified data. An example is the following, which will write all lines that start with an even number to the file even:

sed -n 's/^[0-9]*[02468] /&/w even' <file

You must have exactly one space between the w and the filename.

Ex:-

1) sed –n ‘1,500w foo1 ---Saves 1 to 500 to foo1

501,$w foo2’ foo.main ----- from 501 to last line to foo2

Multiple Command with –e command:-

One method of combining multiple commands is to use a -e before each command:

sed -e 's/a/A/' -e 's/b/B/' <old >new

Filenames On the Command Line:-

You can specify files on the command line if you wish.

sed -e 's/^#.*//' f1 f2 f3 | grep -v '^$' | wc –l

The sed substitute command changes every line that starts with a "#" into a blank line.

Grep was used to filter out empty lines.

Wc counts the number of lines left.

Note :-^ sign shows start line.

$ shows end of the line.

Sed –n Option:-No Printing

The "-n" option will not print anything unless an explicit request to print is found.

Eg:-

sed –n ‘1,2p’ <filename>

This command will print the first two lines of file( head -2 <filename>)

sed –n ‘$p’ <filename >

Print the last line .( tail -1 <filename>)

Negating the Action:-

Sed –n ‘3,$!p’ <filename> Don’t print from line 3 to end

Deleting Line:- sed use –d (delete) command to emulate grep –v option of selection lines not containing the pattern .

e.g.

Sed ‘/director/d’ emp.lst >olist

Sed –n ‘/director/!p’ emp.lst > olist ( -n option is not used with d )

-> select all lines except those containing director and save in olist.

Pattern:-

Sed uses the same convention, provided you terminate the expression with a slash.

To delete the first number on all lines that start with a "#," use:

sed '/^#/ s/[0-9][0-9]*//'

Suppose you wanted to search for the string "/usr/local/bin" and you wanted to change it for "/common/all/bin." You could use the backslash to escape the slash:

sed '/\/usr\/local\/bin/ s/\/usr\/local/\/common\/all/'

sed 's/foo/bar/g' filename # standard replace command

sed '/foo/ s/foo/bar/g' filename # executes more quickly

Range By Line Number:-

You can specify a range on line numbers by inserting a comma between the numbers. To restrict a substitution to the first 100 lines, you can use:

sed '1,100 s/A/a/'

Range By Pattern:-

sed '/start/,/stop/ s/#.*//'

This example will remove comments everywhere except the lines between the two keywords:

sed -e '1,/start/ s/#.*//' -e '/stop/,$ s/#.*//'

eg:-
sed -e 's/#.*//' -e '/^$/ d' (Removing comments and blank lines takes two commands. The first removes every character from the "#" to the end of the line, and the second deletes all blank lines: )

Relationship with d , p & !

  Sed      Range   Command   Results

  --------------------------------------------------------

  sed -n   1,10    p         Print first 10 lines (head -10)

  sed -n   11,$    !p        Print first 10 lines

  sed      1,10    !d        Print first 10 lines

  sed      11,$    d         Print first 10 lines

  --------------------------------------------------------

  sed -n   1,10    !p        Print last 10 lines (tail -10 )

  sed -n   11,$    p         Print last 10 lines

  sed      1,10    d         Print last 10 lines

  sed      11,$    !d        Print last 10 lines

  --------------------------------------------------------

  sed -n   1,10    d         Nothing printed

  sed -n   1,10    !d        Nothing printed

  sed -n   11,$    d         Nothing printed

  sed -n   11,$    !d        Nothing printed

  --------------------------------------------------------

  sed      1,10    p         Print first 10 lines twice,

                            Then next 10 lines once

  sed      11,$    !p        Print first 10 lines twice,

                            Then last 10 lines once

  --------------------------------------------------------

  sed      1,10    !p        Print first 10 lines once,

                            Then last 10 lines twice

  sed      11,$    p         Print first 10 lines once,

                            then last 10 lines twice

Hand-on sed:-

# double space a file

sed G

# triple space a file

sed 'G;G'

# count lines (emulates "wc -l")

sed -n '$='

# Remove ^M character

#in bash/tcsh, press Ctrl-V then Ctrl-M

sed '1,$ s/^M//g' filename

# delete leading whitespace (spaces, tabs) from front of each line

# aligns all text flush left

sed 's/^[ \t]*//' # see note on '\t' at end of file

# delete trailing whitespace (spaces, tabs) from end of each line

sed 's/[ \t]*$//' # see note on '\t' at end of file

# delete BOTH leading and trailing whitespace from each line

sed 's/^[ \t]*//;s/[ \t]*$//'

# substitute (find and replace) "foo" with "bar" on each line

sed 's/foo/bar/' # replaces only 1st instance in a line

sed 's/foo/bar/4' # replaces only 4th instance in a line

sed 's/foo/bar/g' # replaces ALL instances in a line

sed 's/$.*$foo$.*foo$/\1bar\2/' # replace the next-to-last case

sed 's/$.*$foo/\1bar/' # replace only the last case

# substitute "foo" with "bar" ONLY for lines which contain "baz"

sed '/baz/s/foo/bar/g'

# substitute "foo" with "bar" EXCEPT for lines which contain "baz"

sed '/baz/!s/foo/bar/g'

# print the line immediately before a regexp, but not the line

# containing the regexp

sed -n '/regexp/{g;1!p;};h'

# print the line immediately after a regexp, but not the line

# containing the regexp

sed -n '/regexp/{n;p;}'

# print only lines of 65 characters or longer

sed -n '/^.\{65\}/p'

# print only lines of less than 65 characters

sed -n '/^.\{65\}/!p'

# print line number 52

sed -n '52p' # method 1

sed '52!d' # method 2

sed '52q;d' # method 3, efficient on large files