Thursday, 3 November 2016

Find all columns with only nulls in SQL Server table

Find all column fields with only nulls in SQL Server table


The piece of code below will give list of all columns that have only nulls in a give table. Replace YourTableName with your tablename in the code below. This is variation of code that I found here (
http://stackoverflow.com/questions/63291/sql-select-columns-with-null-values-only) 

CREATE TABLE #NullColumns (ColumnName Varchar(100))
DECLARE @TableName Varchar(100)
SET @TableName='YourTableName'
DECLARE crs CURSOR LOCAL FAST_FORWARD FOR SELECT name FROM syscolumns WHERE id=OBJECT_ID(@TableName)
OPEN crs
DECLARE @name sysname
FETCH NEXT FROM crs INTO @name
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC('INSERT INTO #NullColumns (ColumnName) SELECT ''' + @name + ''' WHERE NOT EXISTS (SELECT * FROM ' + @TableName + ' WHERE ' + @name + ' IS NOT NULL)')
    FETCH NEXT FROM crs INTO @name
END
CLOSE crs
DEALLOCATE crs
SELECT * FROM #NullColumns
DROP TABLE #NullColumns

0

Wednesday, 26 October 2016

Static vs Dynamic Lookup in Informatica

Difference between Static vs Dynamic Lookup In Informatica

Static Lookups: 

These are are used when you want to lookup a value on a static table or a SQL query based on certain matching conditions. It caches the values when the static lookup in created at the begining of the execution of the session. If the values in the lookup tables are changing and you want the changed values to be reflected during execution of the session then you need to use dynamic cache. 
When you create a lookup transformation it by defaults creates a static cache. Static lookups can be connected or unconnected depending on weather you want multiple or single value to be returned from the lookup.  Check out unconnected transformation here.

Dynamic Lookups: 

These kind of lookups are required when you want the changes happening in the lookup table to be reflected during the execution of the mapping/session. You can create dynamic lookups by setting Dynamic lookup Cache property in the properties section of the lookup transformation. When you make the lookup dynamic, you will observe in ports tab that a new field called NewLookupRow is created automatically. This NewlookupRow port will indicate if a row has been inserted or updated in the cache. In the dynamic lookup the records gets inserted or updated in the cache depending on weather the rows existed or not existed in the cached. 


Use of NewLookupRow in Dynamic Cache:

NewLookupRow port can have the following values:-
0 - Indicates that powercenter did not update or insert the row in the cache.
1 - Indicates that powecenter inserted the row into the cache because it is a new row
2-  Indicates that powecenter updated the row in the cache because the lookup value has changed.


Using the above values you can decide using update or filter transformation what you want to do with those rows coming of the source. Like for example if you want to insert it into the target table or update it? If NewLookupRow=1 then insert it to the target table or if it equal to 2 then update the record in the target table.




0

Monday, 24 October 2016

How to Loop in SQL server?

How to Loop in SQL server? 


The below example shows how to loop in microsoft SQL server and insert records in chunks: This makes inserts faster.

SET IDENTITY_INSERT dbo.table1 ON /* This is required if you are adding data to Identity columns */

DECLARE @cnt INT = 2006;

WHILE @cnt < 2016
BEGIN
select * from table1 where [Reporting_Period_SID]= @cnt
  SET @cnt = @cnt + 1
 END

SET IDENTITY_INSERT dbo.table1 OFF

Selecting top 3 (n) or Max 3 (n) records for each group by condition


Use the example to select top 3 records for each group. The row number is broken by the Sales_Reporting_Period and then is used to filter the top 3 records.

SELECT Company_code, Run_sid, Curr_ind, Sales_Reporting_Period from
(SELECT          AF.Company_Code,
                BB.Run_SID ,
              BB.Curr_Ind
              ,BB.Sales_Reporting_Period , ROW_NUMBER()
    over (
        PARTITION BY Sales_Reporting_Period
        order by Sales_Reporting_Period, BB.Run_Sid Desc
    ) AS RowNo
         FROM
                       [dbo].[Run_Dim] BB,
                       [dbo].[My_Reserve_Fact] AF
                       Where
              AF.Run_SID = BB.Run_SID
              group by
                      AF.Company_Code,
                     BB.SalesReporting_Period
                  ,BB.Curr_Ind
                     ,BB.Run_SID)Q1 where RowNo <=3


0

Thursday, 6 October 2016

SQL server and azure troubleshooting - common issues

 SQL server and azure troubleshooting - common issues

1) Msg 245, Level 16, State 1, Line 2

Conversion failed when converting the varchar value '000005EW84' to data type int.


Resolution: this issue happens when there is character in the column that you are converting to integer. Use a case statement like below to remove those records that have a column. In the example below policy_number is the column that has characters.

case when isnumeric(policy_number) = 1 then cast(policy_number as int) else 0 end as policy_num


2) Cannot insert explicit value for identity column in table 'Mytable_Dim' when IDENTITY_INSERT is set to OFF.

Resolution: You're inserting values for OperationId that is an identity column.

You can turn on identity insert on the table like this so that you can specify your own identity values.

SET IDENTITY_INSERT Table1 ON

INSERT INTO Table1
/*Note the column list is REQUIRED here, not optional*/
            (Mydata_SID,
             MyDate_Desc)
VALUES      (20,
             ''My data")

SET IDENTITY_INSERT Table1 OFF

3)Msg 515, Level 16, State 2, Line 45

Cannot insert the value NULL into column 'Party1_SID', table 'dbo.Party_Dim'; column does not allow nulls. INSERT fails.


Resolution: The column Party1_Sid does not accept null values.  Insert a value of -1 to the column whenever the value is null. Isnull() function can be used to test if a column value is null.

4) Msg 120, Level 15, State 1, Line 45 The select list for the INSERT statement contains fewer items than the insert list.


Resolution:The number of SELECT values must match the number of INSERT columns.
Match the number of columns on select and insert statements..

5) Msg 264, Level 16, State 1, Line 45

The column name 'Party1_SID' is specified more than once in the SET clause or column list of an INSERT.



Resolution: Remove any duplication of columns in the query. A column cannot be assigned more than one value in the same clause. Modify the clause to make sure that a column is updated only once. If this statement updates or inserts columns into a view, column aliasing can conceal the duplication in your code.

6) Arithmetic overflow error converting numeric to data type numeric.


Resolution: Increase the numeric precision. If there are datatype such as  numeric(11,2) you might have to make it high so that all the values can fit into the datatype. For example, make it numeric(20,2).


7) Msg 1033, Level 15, State 1, Procedure test_vw, Line 56
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

 Resolution: Remove order by from the view statement.

 

0

Thursday, 25 August 2016

Improving performance of Inserts, updates and deletes in SQL server

How to make Inserts , updates and deletes faster in SQL server  - Microsoft Azure?

I tried to delete some records from a table with 100 millions records and it took forever. One of the reason being there was no index on the column I used in the where clause of the delete statement and other reason being the table was so big. So deleting a big chunk of the data from such big tables can take forever since the data gets written to transaction logs and the table has to scan for all the records that match the where clause condition.

How to delete records quickly from huge tables in sql server on Azure? 

1) Use only the column that has index defined in the where clause condition. May be this will help find the records to be deleted quickly.
2) Delete records in small chunks using the loop below. This prevents transaction logs from growing rapidly.  @@rowcount always returns the number of records affected from delete.

DELETE TOP (100000) from dbo.mytable WHERE name='dwbi'
WHILE @@rowcount > 0
BEGIN
 
  DELETE TOP (100000) from dbo.mytable
    WHERE name='dwbi'
 
END
 
3)  Insert data into temp table with no index on it using the columns that need not be deleted , delete the original table, rename the temp table back to the original table and add back the index on the newly renamed temp table.

select * into tmp_mytable from mytable where name!='dwbi'
drop mytable
sp_rename 'tmp_mytable', 'mytable'


How to insert records quickly from huge tables in sql server on Azure? 

My records to insert records into a huge table improved a lot after I disabled the non clustered indexes and rebuild it after the insert. The steps are below:

ALTER INDEX [IX_MYTABLE_User_SID] ON dbo.[MYTABLE] DISABLE
Insert into [dbo].[mytable] select * from [dbo].[stg_table]

ALTER INDEX [IX_MYTABLE_UserSID] ON dbo.[MYTABLE] REBUILD

How to update records quickly from huge tables in sql server on Azure? 

My  usual way of updating records used to be something like this:

update dbo.mytablename set name= (
select dbo.TMP_table_name.name
from TMP_table_name
where dbo.mytablename.party_sid=TMP_table_name.party_sid)

The above update statement takes a lot of time because it updates all the records even those that do not match. The easier and faster way to update is using the statement below:  This will update only the matching records and it is pretty quicker.

update dbo.mytablename set
dbo.mytablename.name=dbo.TMP_table_name.name
from TMP_table_name
where dbo.mytablename.party_sid=TMP_table_name.party_sid


0

Friday, 8 July 2016

Performance tuning of Informatica Big Data Edition Mapping

Performance tuning of Informatica Big Data Edition Mapping



Below are the list of performance tuning steps that can be done in Informatica Big Data Edition:


1)  When using a look up transformation only when the lookup table is small. Lookup data is copied to each node and hence it is slow.


2) Use Joiners instead of lookup for large data sets.


3) Join large data sets before small datasets. Reduce the number of times the large datasets are joined in Informatica BDE.


4) Since Hadoop does not allow updates, you will have to rebuild the target table whenever the record is updated in a target table. Instead of rebuilding  the whole table, consider rebuilding only the impacted partitions.


5) Hive slower with any non string data type. It needs to create temp tables to do the conversion to and from the non string data type to string data type. Use non string data type only when required.


6) Use the data type precision close the actual data. Using higher precision slows down the performance of Informatica BDE.


7) Map only the ports that are required in the mapping transformation or loaded to target. Less number of ports means better performance and less data reads.









0

Workarounds for Mapping variables and parameters and sequence generators and sorters in BDE

Mapping variables and parameters and sequence generators and sorters in BDE


Since there are no mapping variables/parameters and sequence generators in BDE, you can use the following workarounds:

For mapping variables and parameters, you can use a control table or a files instead and read the control tables or files in the mapping and use the content of the table/files in your mapping. Create a look up on the control table to get values for all the parameters defined in the control table. You can update the control table if the parameter needs to be updated at the end of the run.



For Sequence generator, you can use UUID (Unique Identified) functions instead. These UUID functions are alphanumeric and if you need numeric only then use Java functions.

HSQL does sorting by default. i.e Hadoop does the sorting and so you do not need sorter unless you are using it with a aggregator that has a sorted input. In this case you need to add a sorter to validate the mapping.
0

Thursday, 7 July 2016

Useful Queries for troubleshooting amazon redshift

USEFUL QUERIES FOR TROUBLESHOOTING IN AMAZON REDSHIFT 

Here are some of my queries for troubleshooting in amazon redshift. I have collected this from different sources.

TO CHECK LIST OF RUNNING QUERIES AND USERNAMES:

select a.userid, cast(u.usename as varchar(100)), a.query, a.label, a.pid, a.starttime, b.duration,
b.duration/1000000 as duration_sec, b.query as querytext
from stv_inflight a, stv_recents b, pg_user u
where a.pid = b.pid and a.userid = u.usesysid




select pid, trim(user_name), starttime, substring(query,1,20) from stv_recents where status='Running'

TO CANCEL A RUNNING QUERY:

cancel <pid>


You can get pid from one of the queries above used to check running queries.


TO LOOK FOR ALERTS:

select * from STL_ALERT_EVENT_LOG
where query = 1011
order by event_time desc
limit 100;


TO CHECK TABLE SIZE:

select trim(pgdb.datname) as Database, trim(pgn.nspname) as Schema,
trim(a.name) as Table, b.mbytes, a.rows
from ( select db_id, id, name, sum(rows) as rows from stv_tbl_perm a group by db_id, id, name ) as a
join pg_class as pgc on pgc.oid = a.id
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (select tbl, count(*) as mbytes
from stv_blocklist group by tbl) b on a.id=b.tbl
order by b.mbytes desc, a.db_id, a.name;


TO CHECK FOR TABLE COMPRESSION:

analyze <tablename>;
analyze compression <tablename>;



TO ANALYZE ENCODING:

select "column", type, encoding
from pg_table_def where tablename = 'biglist';



TO CHECK LIST OF FILES COPIED:

select * from stl_load_errors

select * from stl_load_commits


select query, trim(filename) as file, curtime as updated, *
from stl_load_commits
where query = pg_last_copy_id();


TO CHECK LOAD ERRORS

select d.query, substring(d.filename,14,20),
d.line_number as line,
substring(d.value,1,16) as value,
substring(le.err_reason,1,48) as err_reason
from stl_loaderror_detail d, stl_load_errors le
where d.query = le.query
and d.query = pg_last_copy_id();


TO CHECK FOR DISKSPACE USED IN REDSHIFT:

select owner as node, diskno, used, capacity
from stv_partitions
order by 1, 2, 3, 4;
select query, trim(querytxt) as sqlquery
from stl_query
order by query desc limit 5;


SOME IMPORTANT AWS COMMANDS:

To resize the redshift cluster (node type and number of nodes always required):

aws  redshift modify-cluster --cluster-identifier <cluster name> --node-type dw2.8xlarge --number-of-nodes 3

To get filelist on S3:

aws s3 ls $BUCKET/  > ./filecount.out

To get status of cluster and other information of cluster in text format:

aws redshift describe-clusters --output text   


0

Thursday, 28 April 2016

How to run stored procedure in Informatica cloud

How to run stored procedure in Informatica cloud?

There are multiple ways to run stored procedure in Informatica cloud.

If you want to call a stored procedure by passing an input value and reading the output from the stored procedure and process it in a mapping then it is better to use SQL transformation in Informatica cloud mapping designer as shown in picture below:


If you want to read the output of a stored procedure as a source in a mapping then you can call the stored procedure a query in the source as shown in screenshot below. The exec sp_help stored procedure is called in the source query. The stored procedure should return some data for this method to work. You might have to define the output variables in the stored procedure.



If you want to just execute a stored procedure once before or after executing a mapping then you can call the stored procedure a pre or post processing command in the mapping configuration as shown in the screenshot below. SQLCMD of sql server is used to execute the stored procedure. The output of the stored procedure is written to an output file. -b flag is used to return error code and make the preprocessing command fail when the stored procedure fails. Use session property  Preprocessing command failure and set it to stop if you want failure to stop the processing of other jobs.




0

Tuesday, 29 March 2016

Importing Cobol copy book definition , and reading and troubleshooting cobol file issues in Informatica

Importing Cobol copy book definition in Informatica and troubleshooting cobol file issues

 To read a cobol file, first you need to create the source definition and then read the file the data from the cobol input file. Source definitions are created from cobol copy book definitions.  Cobol copy book has definition of all the fields present in the data file along with the level and occurence information.

How to import a cobol copy book in Informatica?


Create source definition using Source-> Importing From Cobol  file option to create the source definition from cobol copy book.

Insert the source definition into mapping. Instead of source qualifier it is going to create normalizer transformation.Read from the normalizer and write to any target.


Details on how to load data from cobol copy book is in this video that I found on youtube:

https://www.youtube.com/watch?v=WeACh8AWQv8 
 

Troubleshooting copy book import in informatica:

Reading from cobol file is not always straightforward. You will have to sometimes edit the copy book to get it working. Some of the messages I had while creating source definitions are below:

Error messages seen:

05 Error at line 176 : parse error

Error: FD FILE-ONE ignored - invalid or incomplete.

05 Error: Record not starting with 01 level at line 7


Resolution for copy book import issues In Informatica:

1) Add the headers and footers shown in blue below even if they are blank

        identification division.
        program-id. mead.
        environment division.
        select file-one assign to "fname".
       data division.
       file section.
       fd  FILE-ONE.

        01  HOSPITAL-REC.
           02  DEPT-ID                PIC 9(15).
           02  DEPT-NM                PIC X(25).
           02  DEPT-REC                OCCURS 2 TIMES.
               03  AREA-ID             PIC 9(5).
               03  AREA-NM             PIC X(25).
               03  FLOOR-REC       OCCURS 5 TIMES.
                   04  FLOOR-ID       PIC 9(15).                                
                   04  FLOOR-NM       PIC X(25).
      working-storage section.
       procedure division.
           stop run.


2)  All the fields should start from column 8. Create 8 spaces before each field.
3) Make sure the field names do not start with *. It has to have a level number at the beginning.
4) Level 01 has to be defined as the first record. If level 01 not there then it would not import the cbl file.
5) Special levels such as 88 , 66 might not get imported. This apparently is limitation of powercenter. I could not find a way around it. 


Resolution for cobol file issues In Informatica:

I had such a hard time reading a cobol file that was in EBCDIC format with comp-3 fields in it. I thought the issues are mostly in importing the cobol copy book but even more issues when reading the cobol file. It is supposed to be straight forward in Informatica powercenter. The list of issues I faced.

1) When reading from cobol mainfram file the comp-3 fields were not read properly.
2) The alignments of the fields were not correct. The values were moving to next columns and had a cascading effect.

To resolve the above two issues I asked the mainframe file to be sent in binary format. Apparently files ftp'ed in ascii format from mainframe causes all the above issue. The comp-3 fields are not transferred properly in Ascii mode of ftp. 

Now with the binary file and below properties shown in screen it was supposed to work correctly. It was not. It was all garbage because of a weird issue. The properties I set for code page in source properties in source designer was for some reason not getting reflected in the session properties for that source. I had to manually change in the session- mapping -> source- file properties -advanced tab to code page IBM EBCDIC US English IBM037 and Number of Bytes to skip between records to 0 and it finally worked.

   
 To summarize when reading cobol file:
a) Ask for the file to be ftp'ed in binary format from the mainframe.
2) Set the code page and number of bytes to skip appropriately as shown in the screenshot in the session properties.

This should solve most of your problems with comp-3 fields and alignments ,etc. 

If not try unchecking IBM comp-3 option in the source properties from source designer. Try changing the code page and number of bytes to skip,  etc and hopefully it works.    

If you are seeing dots at the end of some comp-3 fields then most likely you are writing numeric data into varchar field. If you write it to numeric fields in the database those dots should disappear. 

Another issue with Informatica power center is that it can read only fixed width files. if you have occurs section with depending on condition then most likely you have a variable length file and Informatica power center does not work correctly with variable length files. Ask the mainframe guys to convert it to fixed width files. 

  

0