In Part I, I introduced the new index structure in a SQL Server 2012 “ColumnStore” index, examined the storage mechanism, and illustrated how this index helped us to boost the query performance in a data warehouse. (http://blog.gnetgroup.com/bi/2012/10/22/introduction-to-sql-server-2012-columnstore-index-part-i/) Today, some related topics will be covered, including considerations in loading data into the “ColumnStore” index-enabled table, typical scenarios using and not using a “ColumnStore,” and some constraints in a “ColumnStore” index.
Considerations for Loading Data
According to the design of a “ColumnStore” index, it makes the table read-only, which means you cannot write data into this table in a normal way; including insertion, updating and deleting. While this sounds bad, it is fine for tables in a data warehouse. As we all know, one of the characteristics in a data warehouse is that tables are mainly for reading by end users and data loading happens usually in a fixed window of time, for example, nightly or weekly. Regardless, how do we load data into a read-only table?
Don’t worry! There are two basic approaches to help us walk around this issue.
Disabling Index and re-enabling Index
The first approach is frequently used in many ETL design. Before loading the data, disable the index and then re-enable the index. This is deemed as a best practice for loading data into a big table because it can speed up the whole ETL process and the implementation is as simple as ABC. The detailed steps can follow as you disable a “ColumnStore” index, load data and re-enable the index.
Compared to the first approach, partition switching especially fits for the big data loading, which has outstanding performance benefits.
The steps are below:
- Load data into a staging table
- Create a “ColumnStore” Index on a stage table
- Split one empty partition on the main table
- Switch the stage table into the empty partition
You can refer to the scripts created in Table 1 (assuming your main table has been designed as a partition table).
insert into [stg].[FactOnlineSales]
select * from [dbo].[FactOnlineSales] where OnlineSalesKey>=31000000
–Create CS index on partition table
select * from [STG].[FactOnlineSales]
–Step 2, Create CS Index on stg
CREATE NONCLUSTERED COLUMNSTORE INDEX [NonClusteredColumnStoreIndex_Demo_STG] ON [STG].[FactOnlineSales]
)WITH (DROP_EXISTING = Off)
—Create one more parition on the base table
ALTER PARTITION SCHEME [ps_fact_sales]
NEXT USED [Primary]
ALTER PARTITION FUNCTION [pf_range_fact]()
MERGE RANGE (30500000)
ALTER PARTITION FUNCTION [pf_range_fact]()
SPLIT RANGE (31000000)
partition = $PARTITION.[pf_range_fact]([OnlineSalesKey])
,rows = COUNT(*)
,min = MIN([OnlineSalesKey])
,max = MAX([OnlineSalesKey])
GROUP BY $PARTITION.[pf_range_fact]([OnlineSalesKey])
ORDER BY PARTITION
–Switch the data into the new parition
ALTER TABLE [STG].[FactOnlineSales]
SWITCH TO [dbo].[FactOnlineSales_Prtition] PARTITION 6
Table 1: Partition Switching
Typical Scenarios When Building a “ColumnStore” Index
Generally speaking, a “ColumnStore” index especially fits for the big tables in a data warehouse. The main reason, which has been mentioned before, is that tables in a data warehouse system for end users are read-only (query intensive). On the contrary, if your system is mainly for read-and-write, then this is not the choice for you.
Specifically, we can consider using a “ColumnStore” index in following scenarios. First of all, if you have the report query running on the data warehouse system directly and the performance is good enough for end users, plugging this magic index on the main fact tables is probably a good idea. Secondly, if you are experiencing long times for cube processing, enabling this magic index on the fact table would be a good panacea for you. Then, in your EDW system, if your cube is designed as ROLAP mode in order to accelerate the data analysis for your business users, a “ColumnStore” index is a smart choice. Next, if you have deployed the Tabular BI sematic model in a “Direct Query” mode, you should try this magic index. These are the typical cases that should be considered when using a “ColumnStore” index.
On the other hand, in some cases a “ColumnStore” index does not fit. For example, if your fact table is required to be frequently updated (if your cube enables the feature of writing back), as we said the fact table with a “ColumnStore” index is read-only, so we cannot use a “ColumnStore” index on this kind of fact tables. Additionally, if your table cannot be implemented by partition switching or index rebuilding, you also cannot consider using this index. Last but not least, if your query is highly-selective enough, then your common index helps and a “ColumnStore” is not necessary.
In this release, SQL Server 2012, there are still some constraints in “ColumnStore” index. So far, you only can create one and only one “ColumnStore” index per table. Further, only a non-clustered “ColumnStore” index is supported and you cannot create a clustered one. A “ColumnStore” index also cannot be created with an indexed view. Moreover, you cannot blend a “ColumnStore” with a “filter” index. Lastly, not every column type is supported to be included in a “ColumnStore” index. The unsupported data types are Binary, BLOB, UniqueIdentifier, CLR, decimal(>18 digits), (n)varchar(max), datetime2.
In this post (Part II), I have shared two best practices on how to move data into the fact table with a “ColumnStore” index. Also, it points out the typical scenarios for using and not suing a “ColumnStore” index. In the end, we also examine some constraints for a “ColumnStore” in this release.