A co-worker recently asked me why she was getting a timeout error when updating a column in a big table. The column needed to be updated to 0 (zero) because a default constraint of 0 (zero) had just been added to the table.
The table has over 40 million records. Although I didn’t ask the exact size in MB, I am pretty sure that a table with over 40 million records is not a small table anymore.
I wrote a blog post a few weeks ago that talked about 4 different scenarios for deleting or updating data on a large table.
I didn’t refer my co-worker to that post, because it is really too long to read in one minute and many of the scenarios are not applicable to her issue. Scenario #3 in that post, “Updating a Nonpartitioned Table – Switch with a Temp Table”, is the closest to the issue my co-worker was trying to resolve.
So I summarized a few steps for my co-worker. When we need to “update” a big table, we should avoid using the UPDATE command directly on the main table. Instead, we want to take advantage of the bulk load optimizations in T-SQL with the INSERT … SELECT method or the SELECT … INTO method.
Update a large table
“Updating” a large table may be time-consuming.
Here is a summary of steps that use the INSERT … SELECT method:
1. Create a temporary table with the same table schema.
2. Copy all the data from the actual table to the temporary table using the INSERT … SELECT method. Make sure that the column that needs to be updated to 0 (zero) has the value 0 (zero) in the SELECT clause.
3. Drop the actual table. If the large table has constraints and indexes, make sure that you have the scripts to re-create them.
4. Rename the temporary table to the name of the actual table. Recreate any constraints and indexes.
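The four steps above can be sketched in T-SQL as follows. The table and column names (dbo.BigTable, SomeKey, ColToZero) are hypothetical placeholders; adapt them to the real schema.

```sql
-- 1. Create an empty staging table with the same schema (no rows copied).
SELECT *
INTO dbo.BigTable_Staging
FROM dbo.BigTable
WHERE 1 = 0;

-- 2. Bulk-copy the data, forcing the target column to 0.
--    TABLOCK helps the insert qualify for minimal logging under the
--    simple or bulk-logged recovery model.
INSERT INTO dbo.BigTable_Staging WITH (TABLOCK) (SomeKey, ColToZero)
SELECT SomeKey, 0
FROM dbo.BigTable;

-- 3. Drop the original table (script out its constraints and indexes first).
DROP TABLE dbo.BigTable;

-- 4. Rename the staging table, then re-create constraints and indexes.
EXEC sp_rename 'dbo.BigTable_Staging', 'BigTable';
```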
I like the SELECT … INTO method. So here are the steps:
1. Create a temporary table with the same table schema using the SELECT … INTO method. The data is copied to the temporary table at the same time the temporary table is created. Make sure that the column that needs to be updated to 0 (zero) has the value 0 (zero) in the SELECT clause.
2. Truncate the actual table.
3. Use the INSERT … SELECT method to insert the data from the temporary table back into the actual table.
4. Drop the temporary table.
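Here is a minimal sketch of the SELECT … INTO variant, with the same hypothetical names as before:

```sql
-- 1. Create and populate the staging table in a single, minimally
--    logged statement, forcing the target column to 0.
SELECT SomeKey, CAST(0 AS int) AS ColToZero
INTO dbo.BigTable_Staging
FROM dbo.BigTable;

-- 2. Empty the original table (fast and minimally logged).
TRUNCATE TABLE dbo.BigTable;

-- 3. Bulk-insert the data back into the original table.
INSERT INTO dbo.BigTable WITH (TABLOCK) (SomeKey, ColToZero)
SELECT SomeKey, ColToZero
FROM dbo.BigTable_Staging;

-- 4. Drop the staging table.
DROP TABLE dbo.BigTable_Staging;
```

This variant keeps the original table’s constraints and indexes in place, at the cost of moving the data twice.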
My co-worker took my suggestion and eventually “updated” the column to all 0 (zero) values.
But it puzzled me how she got the timeout error. “Updating” a big table might take a long time, but where did the timeout come from?
Then it dawned on me that she was not “updating” the table using T-SQL; rather, she was modifying the big table by using the table designer in SQL Server Management Studio.
Then I found this Microsoft article.
Although I prefer coding everything in T-SQL, there are programmers who use SQL graphical tools instead. Information from this article can be helpful.
Here is a summary from the article.
When you try to modify a large table by using the table designer in Microsoft SQL Server Management Studio, you may receive an error message that is similar to the following:
Unable to modify table.
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
This behavior occurs because of the transaction time-out setting for the table designer and for the database designer in SQL Server Management Studio. By default, this setting is 30 seconds.
From Tools → Options → Designers → Table and Database Designers, you can change the ‘Transaction time-out after’ setting to a higher number.
Notice that this setting differs from the setting in the Execution time-out box in SQL Server Management Studio. By default, the setting in the Execution time-out box for Query Editor in SQL Server Management Studio is zero. Therefore, Query Editor waits infinitely for the query to finish and never times out.
To resolve this by-design behavior, use one of the following methods:
- Click to clear the Override connection string time-out value for table designer updates check box for the table designer and for the database designer in SQL Server Management Studio. See the option in the first screenshot.
- Specify a high setting in the Transaction time-out after box for the table designer and for the database designer in SQL Server Management Studio. Also see the option in the first screenshot.
- Modify the large table by using Transact-SQL statements in Query Editor in SQL Server Management Studio. This is my recommendation to my co-worker. Make sure you don’t use the UPDATE statement directly on the large table; instead, use a temporary table and take advantage of the bulk load optimizations in T-SQL with the INSERT … SELECT method or the SELECT … INTO method.
Now I wonder how my co-worker reacted when I sent her the T-SQL script to “update” the table to resolve the time-out issue.
Calculated Field has existed in Reporting Services since version 2005. SSRS 2005 did not have the most attractive user interface design, so this pretty useful feature stayed hidden from me until version 2008.
Even in SSRS 2008, I did not pay much attention to it until I started to write reports in MDX queries on SSAS cubes.
Once I started using them, I found myself liking them more and more.
One of the challenges in SSRS reports is to set parameter defaults dynamically. It’s an even bigger challenge if you want to dynamically set parameter defaults to multiple values, such as (Select All). Don’t read this statement wrong. Setting parameter defaults to multiple values, such as (Select All), is not hard. What is hard is to do it dynamically, meaning the situation where you want to set the default or not depends on the selection of another parameter.
I recently ran into this issue when working on a Data-Driven Subscription project. The idea is that we’d use just one SSRS report for both online ad-hoc reporting, and for a daily data extraction that can be sent to users via E-Mail.
For the online ad-hoc reporting, we certainly do not want to default all parameters to (Select All). For the daily data extraction though, that’s exactly what we’d like to do.
Since I am using a Data-Driven Subscription and an SSIS package to trigger the subscription, you would think that I could manipulate the parameter defaults in either SQL code or SSIS scripting. But I highly recommend against that if your goals are
1) To use just one SSRS report for both online ad-hoc reporting, and for a daily data extraction
2) Not to replicate all the queries you already have in the SSRS report
3) Not to waste your valuable time wrestling with Reporting Services. When it comes to parameters with defaults or cascading parameters, Reporting Services is not very forgiving in terms of dynamic settings. If you have tried that before, you probably know what I mean.
Using the Calculated Field feature in SSRS 2005 and above is what I’d recommend.
Suppose you have a hidden parameter, Subscription.
Parameter Subscription has two values, 1 and 0.
Its default is 0 (not the 1 shown).
When Subscription is 1, we want to set the default of parameter Geography to (Select All).
We’d create a Calculated Field for DataSet_Geography. Call it GeographyKey_Dynamic.
In the Expression Editor for GeographyKey_Dynamic, enter an IIF() expression. This expression will basically use the GeographyKey value if the Subscription is 1, otherwise, it will be Nothing.
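Assuming the parameter and field names above, the expression would look something like this:

```
=IIF(Parameters!Subscription.Value = 1, Fields!GeographyKey.Value, Nothing)
```

When Subscription is 0, the calculated field evaluates to Nothing for every row, so the Geography parameter gets no default; when Subscription is 1, it returns every GeographyKey value, which effectively defaults the parameter to (Select All).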
Now we are ready to use this new Calculated Field GeographyKey_Dynamic as the default value for parameter Geography.
DataSet_Geography is the same dataset for Available Values. The only difference is that in Available Values, we’d always use GeographyKey, which is a SQL column. For the Default Values, we’d use GeographyKey_Dynamic instead.
1) Watch out if you like Calculated Fields too much and use them excessively. SSRS reports typically go through Retrieval, Processing and Rendering phases at run time. If you have ever queried a Report Server database, you will find that the Processing and Rendering phases take more time than you’d like. Having too many Calculated Fields will no doubt increase the Processing time. So think about distributing the burden between queries on the servers and calculations in memory.
2) For data-driven subscription, see my blog at
Recently I had the privilege of reviewing the book “SQL Server Interview Questions and Answers”. The two authors, Pinal Dave and Vinod Kumar, are well-respected names in the SQL Server community. I decided to write a book review with the hope that database professionals can take full advantage of this book and ace that SQL Server interview.
I have worked with database applications for most of my career, and have been a SQL Server database developer for the last fifteen years, with a specialization in Microsoft Business Intelligence development since 2006. I always thought SQL database development was my career of choice. However, by 2005, due to numerous family relocations that crossed two countries and 5 states and provinces in 4 years, I started to struggle with my career. I bumped from one contract to another, and endured (and failed) many grueling job interviews.
In one such interview, I was ushered into a conference room where 6 young IT professionals were already sitting at the long conference table with papers in front of them. One of them, the leader I assumed, started the introductions. Before the lengthy introduction ended, I realized that these 6 young but serious-looking IT professionals weren’t the only interviewers I had to face. There were 4 other interviewers whom I would not be able to see face to face, but I could hear them on the speaker phone. Three of them were developers specializing in certain areas, and one was the hiring manager.
Needless to say, I failed that interview miserably. It wasn’t until several more failed job interviews that I began to see a pattern in the interview questions: the interviewers don’t ask tricky questions. They ask about SQL Server database concepts, and they expect accurate and complete answers with real-life examples.
I immediately started to collect every SQL interview question I could find online, and immersed myself in questions that I could not articulate very well, and expanded on the questions that I was personally interested in.
Looking back several years later, I wish I had had a book like “SQL Server Interview Questions and Answers”. It has exactly what most interviewers are looking for in your answers: SQL Server database concepts, with accurate and complete answers and real-life examples.
Study the Interview Questions and Answers for Job Interviews
While talking to a co-worker who had to start the job interview process again, she mentioned to me that interviewers are really looking for textbook-style answers. She decided to spend a few months studying for a Microsoft SQL Server certification exam, hoping that by the time she passed the exam, she would be able to ace that job interview. I wish I had known about the book at the time so I could have recommended it to her.
Where can you get a copy?
Enjoy and good luck!
SSAS has two major functions:
1. a high-performance cubing technology, and
2. a data-mining technology
Here are the steps you can take to create an SSAS database using BIDS.
1. Create an SSAS project in BIDS.
2. Create a data source (DS) that connects to your data warehouse in star schema.
3. Create a data source view (DSV) to act as a layer between your data source and the data warehouse and to protect you from changes.
4. Run the Cube Wizard to create the first cube and dimensions.
5. Finally, modify the cube and dimensions to scale better and be properly formatted.
A data warehouse is made of fact and dimension tables.
- Fact tables: hold measures, which are the numeric columns that you want to put on the report.
- Dimension tables: hold the columns by which you want to filter the data.
SSAS adopts many of the same items but has changed their names.
- Measure: an individual numeric column in a fact table.
- Measure Group: in SSAS, measure groups are the collections of measures from a single fact table.
- Attributes: in SSAS, attributes are the columns in dimensions.
- Members: the unique values in an attribute. When we process a dimension, SSAS queries the underlying dimension table in the data warehouse to retrieve all the distinct values on an attribute-by-attribute basis. These distinct values are called members.
- User Hierarchy: If users consistently view the data in a given way in a dimension, we can create a user hierarchy. These hierarchies give the user the means to find the data quickly. Most date dimensions have a hierarchy for Year –> Quarter –> Month –> Date. Creating the hierarchy allows the user to drag over a single element that contains all those attributes.
- Attribute Relationships: Attribute relationships tie multiple attributes in a single SSAS dimension together, much as foreign keys tie various tables together in SQL Server. This helps with the cube’s performance in query time and processing time. It also helps with security and user experience.
- Cube: the intersections of dimensions; all aggregations are pre-stored in the cube.
Assume now that we’ve loaded our data warehouse. The rest of my notes are all about SSAS.
- SSAS is a service that can pre-aggregate data and make it faster to query. According to the workshop presenter, it took only 2 seconds to query 24 terabytes of data (in the grocery industry).
- SSAS reduces report backlogs because it allows your users to create complex ad hoc reports without having to involve IT.
- With SSAS, users can create report via familiar interfaces like Excel or Reporting Services with drag & drop.
- With SSAS, all the reporting logic is centralized.
- From a developer’s perspective, SSAS really means no more bad SQL queries and no more waiting 2 minutes for a report.