Monday, December 08, 2008
Create Insert Command for SQL tables
set @columnnames = ''
set @tablename = 'planresource'
set @whereclause = ' where planid in (select planid from [plan] where planid >= 1500) '
set @tmptablename = '##tmp__'+@tablename
set @str1 = 'select * into '+@tmptablename+' from ['+@tablename+'] '+@whereclause
--print @str1
exec sp_executesql @str1
set @str1 = 'alter table '+@tmptablename+' add insertcommand nvarchar(3000)'
exec sp_executesql @str1
select @columnnames = @columnnames + '['+column_name+'], ' from information_schema.columns where table_name = @tablename
--print @columnnames
--print left(@columnnames , len(@columnnames)-1)
set @columnnames = left(@columnnames , len(@columnnames)-1)
--print @columnnames
set @insertcommand = 'insert into ['+@tablename+'] ('+@columnnames+') values ('
--print @insertcommand
set @str1 = 'update '+@tmptablename+' set insertcommand = '''+@insertcommand +''''
--print @str1
exec sp_executesql @str1
declare @sqlstring2 nvarchar(4000), @colname nvarchar(100)
set @sqlstring2 = 'update '+@tmptablename+' set insertcommand = insertcommand + '''
DECLARE tmptable_name CURSOR FOR select column_name from information_schema.columns where table_name = @tablename
OPEN tmptable_name
FETCH NEXT FROM tmptable_name INTO @colname
WHILE @@FETCH_STATUS = 0
BEGIN
--set @sqlstring2 = @sqlstring2 + ' convert(nvarchar(1000),isnull(replace(['+@colname+'],'''''''',''''''''''''),''''))+'''''')'' '
set @sqlstring2 = @sqlstring2 + '''''''+ convert(nvarchar(1000),isnull(replace(['+@colname+'],'''''''',''''''''''''),''''))+'''''', '
FETCH NEXT FROM tmptable_name INTO @colname
end
--print @sqlstring2
CLOSE tmptable_name
DEALLOCATE tmptable_name
set @sqlstring2 = left(@sqlstring2 , len(@sqlstring2)-1)+')'''
--print @sqlstring2
exec sp_executesql @sqlstring2
--select insertcommand from ##tmp__landingpage
--select insertcommand from #tmp
declare @string3 nvarchar(1000)
set @string3 =
'
declare @sqlstring nvarchar(4000)
DECLARE tmptable_name CURSOR FOR select insertcommand from '+@tmptablename+'
OPEN tmptable_name
FETCH NEXT FROM tmptable_name INTO @sqlstring
WHILE @@FETCH_STATUS = 0
BEGIN
print @sqlstring
print ''go''
FETCH NEXT FROM tmptable_name INTO @sqlstring
end
CLOSE tmptable_name
DEALLOCATE tmptable_name
'
--print @string3
exec sp_executesql @string3
set @string3 = 'drop table '+@tmptablename
exec sp_executesql @string3
Tuesday, November 18, 2008
sample for "execute as"
go
-- current user
SELECT SUSER_NAME(), USER_NAME();
select top 10 * from audit
go
-- deny select for a user
DENY SELECT ON [dbo].[Audit] TO [reportuser]
GO
-- change execute as
EXECUTE AS USER = 'reportuser';
SELECT SUSER_NAME(), USER_NAME();
-- this will fail
select top 10 * from audit
REVERT;
Go
-- revoke select for a user
REVOKE SELECT ON [dbo].[Audit] TO [reportuser]
GO
EXECUTE AS USER = 'reportuser';
SELECT SUSER_NAME(), USER_NAME();
-- this will run
select top 10 * from audit
REVERT;
Go
-- this is logged in user again
select top 10 * from audit
SELECT SUSER_NAME(), USER_NAME();
go
Tuesday, November 04, 2008
Date ETL Routines
Most of the DBAs are involved in daily routine jobs such as checking backups, security issues, performance tunings and many other tasks that are defiled to be line of DBA responsibilities.
On the other hand Data Warehousing is a challenge for database developers and programmers. Import, export and processing data is part of each DW task. I found myself very interested and amused when I’m involved in such a task. Most of the small companies don’t have a dedicated DW developer, so the requests are directed to DBA, and the expectation is that we (as DBAs and Data guys) should be able to complete any requests in a short amount of time.
In a world that each company is trying to expand and inquires another companies and competitors, it’s obvious that synchronizing the technologies of those companies are challenges for all IT personal, and when it comes to Data and information synchronization, then DBA’s and DW developers have important role to be able to provide solid data synchronization and reporting solutions.
I was involved in many projects that importing and processing the data from other databases and even from other platforms were essential for business owners. We had multi platform production database servers and DW farms. It’s obvious that reports are of the most imports piece for management. We had to create a solid DW and reporting solution which be able to provide real-time and financial reports on both sides. We start by building a new DW farm, utilizing MSSQL 2005 SSAS, SSRS and SSIS. Loading data from MySQL and Oracle databases, in a daily or hourly base, becomes a challenge for me and after many attempts and different approaches; I was able to build and maintain a solid DW farm.
Most of the schemas I was importing had INSERTDATETIME (Timestamp) field. I took 2 different approaches and used each in different scenarios. I will try to describe each method below and list the pros and cons for each one.
Importing the records within a certain period of time
Importing records based on the previous import’s timestamp
1- Importing the records within a certain period of time
In this method, I built a table where I keep the history of the pull (or push) . StartTime / EndTime is basically the range that records are affected. The insert query will look like this:
Insert into “TargetTable”
(Field1, Field2,Field3)
Select Field1, Field2,Field3 from “SourceTable” where InsertDateTime >= @StartTime and InsertDateTime < @EndTime As you can see only records in the range are selected. StartTime is always calculated based on the previous successful history, plus the desired Interval. Select @interval = interval from PackageInformation Select top 1 @StartTime = EndTime , @EndTime = EndTime + @interval from PackageExecutionHostory order by InsertDateTime desc After defining the processing date range, package will execute and upon successful completion the new date range will be recorded. This will become the start point for next time. 2-Importing records based on the previous import’s timestamp In the method, I record only the max of timestamp, which is processed. Select top 1 @StartTime = LastMaxDate from PackageExecutionHistory order by InsertDateTime desc The insert query will look like this: Insert into “TargetTable” (Field1, Field2,Field3) Select Field1, Field2,Field3 from “SourceTable” where InsertDateTime >= @StartTime
For recording successful transactions history, will need to run another query as :
Select @LastMaxDate = max(TimeStamp) from TargetTable
Inert into PackageExecutionHistory (LastMaxDate, InsertDateTime) values (@lastMaxDate,getdate())
I need to mention that each of these processes are depending on Job Schedule and will be executed from a Job.
Pros & Cons of each method.
Importing records based on the previous import’s timestamp
No matter when you run the job, all the records, since last execution, will be pulled (or inserted)
No control for pulling records between a certain periods of time. If there are high volumes of transactions, this might be a slow process. Inserting and selecting large number of records is always a bottleneck.
If any record in target table is getting inserted, while the select statement is trying to prepare the data set, then there is potential deadlock situation.
It’s possible to not capture some of the records if using select statement with (nolock) option to prevent performance impact.
Importing the records within a certain period of time
Certain data range to be selected
The data size is mostly predictable and by tuning interval period, it’s possible to minimize the impact of loading large data set
If the SQL job is not executed, the next job will pull only the records which are within the range of selection, which were supposed to be pulled by earlier job. This is a potential problem if the SQL jobs are not executed for a while, and will cause delay on data delivery.
Need more controlling and trapping procedures to prevent false execution on not valid date range.