Tuesday 18 March 2014

Partitioning in SQL Server 2012 Databases - Part 2

View Partition
The concept of Partitioned View started with SQL 7, but Microsoft introduced updatable Partitioned View from SQL Server 2000.
To work with Partitioned View, we need to split the large table into smaller member tables. The data is partitioned between the member tables based on ranges of data values in one of the columns from main table. The data ranges for each member table are defined in a CHECK constraint specified on the partitioning column.
How to do it? – Let’s say you have a SalesOrderDetail tables, which storing data for all the country, now you want to split into a table for each country. You create tables for each country and accordingly store the data into it.
Creating the Partition View – We created partition tables that include the correct primary keys and constraints, SQL Server can access the correct partition table through a partition view.
A partition view that uses UNION ALL to vertically merge data from all the member tables into a single result set. When SELECT statements referencing the view specify a search condition on the partition column, the query optimizer uses the CHECK constraint definitions to determine which member table contains the rows.
You can partition a view locally, which means that all the underlying tables are stored on the same SQL Server instance.
But what, if your member tables residing across multiple servers? - You can also create a view using member tables from different server, which is known as a distributed partitioned view. The databases are said to be federated / merged. You can use linked servers to union all the member table data through views.
With the data split into several member tables, if you want you can be directly queried.  But best and flexible approach is to access the whole set of data by querying a partitioned view that unites all the member tables.
If a query accesses the union of all the member tables, the query processor can retrieve data only from the required member tables. This is the feature of SQL Server query processor to handle such a partitioned view.
A partitioned view not only handles selects, but also data can be inserted, updated, and deleted.
Important: The individual tables underneath the partitioned view are called member tables, not to be confused with partitioned tables, a completely different technology.

What are Table Partitions? - Partitioning a database improves performance and simplifies maintenance. By splitting a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to scan.
Maintenance tasks, such as rebuilding indexes or backing up a table, can run more quickly.
Partitioned tables are tables whose data is horizontally / vertically divided into units which may be spread across more than one filegroup in a database.
Partitioning makes large tables or indexes more manageable by letting you access or manage subsets of data quickly and efficiently.
Under a partitioning scenario, operations such as loading data from an OLTP to an OLAP system will be very efficient.
Maintenance operations that are performed on subsets of data are also performed more efficiently because they target only the data that is required, instead of the whole table.
Partitioned tables support all the properties and features associated with designing and querying standard tables, including constraints, defaults, identity and timestamp values, triggers, and indexes.
Note: To the DBA and to the end user it looks like there is only one table, but based on the partition scheme the underlying data will be stored in a different partitions and not in one large table.  This makes all of the existing code you have in place work without any changes and you get the advantage of having smaller objects to manage and maintain.
Neither vertical partitioning nor horizontal partitioning is automated in SQL Server 2012. Granted, you can automate the techniques by writing your own script or application, but SQL Server does not provide functions for such partitioning. If you want to automate the partitioning of your tables, you need to leave manual table creation behind and look at partitioning with functions and schemes.
Automated table partitioning is based on SQL Server 2012 features called partition functions and partition schemes. The partition function and scheme must exist before the table can be created as a partitioned table.
The first object that must be created is the partition function. The scheme depends on the function, and the table depends on the scheme. The partition function is created using the CREATE PARTITION FUNCTION command.
If your database contains very large tables, you may benefit from partitioning those tables onto separate filegroups. This technology, introduced in SQL Server 2005, allows you to spread data onto different physical disks, leveraging the concurrent performance of those disks to optimize query performance.
In earlier versions of SQL Server this was achieved by creating separate tables on different file groups of database, and then creating a view by using UNION on these tables to make it look like a single table.
First, let us create a Test database for partitioning with more than one file groups.
Here I am going to create [MyPartitionDB] database with two file groups – [PRIMARY] and [FileGroup2], both of these groups contain a data file:
--Create Database with filegroup
CREATE DATABASE [MyPartitionDB] ON
PRIMARY
      (     NAME = N'MyPartitionDB',
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\MyPartitionDB.mdf' ,
            SIZE = 3072KB , FILEGROWTH = 1024KB
      ),
FILEGROUP [FileGroup2]
      (     NAME = N'MyPartitionDB_2',
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\MyPartitionDB_2.ndf' ,
            SIZE = 3072KB , FILEGROWTH = 1024KB
      )
LOG ON
      (     NAME = N'MyPartitionDB_log',
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\MyPartitionDB_log.ldf' ,
            SIZE = 9216KB , FILEGROWTH = 10%
      )
--I created both files on the same disk, it is recommended to create files on separate disks for optimal performance.
Partitioning a SQL Server database table is a three-step process:
1.       Create the partition function
2.       Create the partition scheme
3.       Partition the table

Step 1: Creating a Partition Function – Define How to Split Data
This is the first step in creating a partitioned table. The partition function defines how you want SQL Server to partition the data. At this point, we’re not concerned with any particular table, we’re just generically defining a technique for splitting data.
You can define the partitions by specifying the boundaries of each partition.
For example, suppose we have an Internet User table that contains information on all of our Internet Users, identified by a unique number, ranging from 1 to 1,000,000. We might decide to partition that table into four equally spaced partitions, using the following partition function:
--Step 1: Creating a Partition Function
CREATE PARTITION FUNCTION InternetUser_PartFunction (int)
AS RANGE RIGHT
FOR VALUES (250000, 500000, 750000)
The above boundaries define four partitions. The first contains all values less than 250,000. The second contains values between 250,000 and 499,999. The third contains values between 500,000 and 749,999. All values greater than or equal to 750,000 go in the fourth partition.
Notice that I used the "RANGE RIGHT" clause in this example. This indicates that the boundary value itself should go in the partition on the right side.
Alternatively, if I had used "RANGE LEFT", the first partition would have included all values less than or equal to 250,000; the second partition would have included values between 250,001 and 500,000, and so on.
Tip: Three catalog views expose information about partition function: syspartition_functions, syspartition_function_range_values, and syspartition_parameters.
Step 2: Creating a Partition Scheme - Defines where to store each range of Data
Once you have a partition function describing how you want to split your data, you need to create a partition scheme defining where you want to partition it.
This is a straightforward process that links partitions to filegroups. In above example we created a Database with only one file group, if you want, you can create four file groups and assign each partition to specific file group. Here we will assign these partition to primary and FileGroup2 as follows:
--Step 2: Creating a Partition Scheme
CREATE PARTITION SCHEME InternetUser_PartScheme
AS PARTITION InternetUser_PartFunction
TO ([FileGroup2], [FileGroup2], [FileGroup2], [PRIMARY])
Notice that we now link a partition function to the partition scheme, but we still haven’t linked the partition scheme to any specific database table. That’s where the power of reusability comes into play. We could use this partition scheme (or just the partition function, if we desired) on any number of database tables.
Tip: To examine information about partition schemes programmatically, query sys.partition_schemes.
Step 3: Partitioning a Table
After defining a partition scheme, you’re now ready to create a partitioned table. This is the simplest step of the process. You simply add an "ON" clause to the table creation statement specifying the partition scheme and the table column to which it applies. Only partition scheme need to be specified, you don not need to specify a partition function as this is already defined in partition scheme:
--Step 3: Partitioning a Table
--For example, if you wanted to create a Internet Users table using our partition scheme,
--you would use the following Transact-SQL statement:
CREATE TABLE InternetUsers
(UserID int, FirstName varchar(40), LastName varchar(40), Email varchar(50))
ON InternetUser_PartScheme (UserID)
That’s everything you need to know about partitioning tables in Microsoft SQL Server. Remember to leverage the power of reusability by writing generic partition functions and partition schemes that might apply to multiple tables.
Note: The performance benefit of partitioned tables doesn't visible until the table is extremely large (billion-row per table) in terabyte size databases. In some testing, partitioned tables actually hurt performance on smaller tables with less than a million rows. Maybe that's why table partitioning isn't included in Standard Edition.
On the other hand, even for tables with fewer than one million rows, partitioning can be an effective part of archiving old data into one partition while keeping current data in another partition.
How to partition an existing table?
It’s very simple, create a CLUSTERED INDEX on the table.
While creating a clustered index you need to specify the partition scheme to be used in ON clause of CREATE INDEX statement.
We still need to create a partition function and a partition scheme, we have learned earlier about how to create the same.

CREATE CLUSTERED INDEX [IndexName]
ON [dbo].[TableName]
(
      [ColumnName] ASC
) ON PartitionScheme(PartitionColumnName)
Lab: To create and use the Table Partition, Please view the file ‘15. Large Database.sql’.
Tip: To see information about how partitions are being used, look at sys.partitions.

Querying Metadata Information - Partition
To query metadata information about your partitioned tables and indexes SQL Server provides you with the $PARTITION() function and several catalog views. The $PARTITION() function is used in conjunction with the partition function name ($PARTITION.partition_function_name()).
The function's purpose is to return the target partition number of a given input value that represents a partitioning column value.
For example, our tables and indexes were partitioned using the InternetUser_PartFunction() function, which accepts a integer value as input. To find the target partition number for the value 2458, use the following code:
SELECT $PARTITION.InternetUser_PartFunction(2458)
You'll get 1 as the output because the target partition for the given input date is 1. Note that the input value doesn't necessarily have to represent a row's order date that already resides in the table. You can use the function to find out a row's target partition number with a certain order date before you insert it into the table. You can also use the function as part of a query to check in which partition each row resides as shown below:
SELECT UserID, FirstName, LastName, Email,
$PARTITION.InternetUser_PartFunction(UserID)AS 'Partition'
You can also query many catalog views to get information about the partitioned tables, indexes, the partition functions, and partition schemes. Here's the list of catalog views available and a brief description of each:
         sys.partition_functions - returns information about partition functions
         sys.partition_parameters - returns information about parameters of partition functions
         sys.partition_range_values - returns information about partition function's boundary values
         sys.partition_parameters - returns information about parameters of a partition function
         sys.partition_range_values - returns information about boundary values of a partition function
         sys.partition_schemes - returns information about partition schemes
         sys.data_spaces - returns information about partition schemes
         sys.destination_data_spaces - returns information about individual partition schemes
         sys.partitions - returns information about partitions
         sys.tables - returns information about partitioning of tables
         sys.indexes - returns information about partitioning of indexes
         sys.index_columns - returns information about partitioning of indexes
For example, following query gives you partition numbers and an approximate number of rows in each partition of the Orders table:
SELECT      partition_id, object_id, partition_number, rows
FROM        sys.partitions
WHERE       object_id = OBJECT_ID('InternetUsers')

The difference between using this query as opposed to an aggregate query against the 'InternetUsers' table with the $PARTITION() function is that the former gives you the result immediately while the latter actually counts the rows. The row counts in sys.partitions view aren't updated with each modification submitted against a partition.
So if you need a rough estimate you can query the sys.partitions view, but if you need an accurate number, you need to use the $PARTITION() function and actually count.
Another example of querying catalog views for partitioning information is the following query, which returns the 'InternetUser_PartFunction function's boundary values:
SELECT * FROM sys.partition_range_values
WHERE function_id = (SELECT function_id
       FROM sys.partition_functions WHERE name = 'InternetUser_PartFunction')
How to Alter partition Table?
The Alter commands are simple, modifying the design of partition tables never executes very quickly, as you can imagine.
Merging Partitions
Merge and split surgically modify the table partition design. The ALTER PARTITIONMERGE RANGE command effectively removes one of the boundaries from the partition function and merges two partitions.
For example, to remove the boundary between 25000 and 5000 in the InternetUser_PartFunction partition function, and combine the data from 1 to 25000 and 25001 to 50000 into a single partition, use the following ALTER command:
--Merging Partitions
ALTER PARTITION FUNCTION [InternetUser_PartFunction]()
MERGE RANGE (250000);
Note: If multiple tables share the same partition scheme and partition function being modified, then multiple tables will be affected by these changes.
Splitting Partitions
To split an existing single partition, the first step is to designate the next filegroup to be used by the partition scheme. This is done using the ALTER PARTITION…NEXT USED command. If you specify too many filegroups when creating a scheme, you will get a message that the next filegroup used is the extra filegroup you specified. Then the partition function can be modified to specify the new boundary using the ALTER PARTITION…SPLIT RANGE command to insert a new boundary into the partition function. It's the ALTER FUNCTION command that actually performs the work.
In above example we merge the range, now we will split using the below code:
--Splitting Partitions
ALTER PARTITION SCHEME [InternetUser_PartScheme]
NEXT USED [FileGroup2];
ALTER PARTITION FUNCTION [InternetUser_PartFunction]()
SPLIT RANGE (25000);
Can we move new (entire) Table into a partition?
Yes, SQL Server switching tables is allow to move an entire table into a partition within a partitioned table, or to remove a single partition so that it becomes a stand-alone table.
But this cool and useful functionality comes with some restrictions:
         Every index for the partition table must be a partitioned index.
         The new table must have the same columns (excluding identity columns), indexes, and constraints (including foreign keys) as the partition table, except that the new table cannot be partitioned.
         The source partition table cannot be the target of a foreign key.
         Neither table can be published using replication, or have schema-bound views.
         The new table must have check constraint restricting the data range to the new partition, so SQL Server doesn't have to re-verify the data range.
         Both the stand-alone table and the partition that will receive the stand-alone table must be on the same filegroup.
         The receiving partition or table must be empty.
Performing Switching In
The ALTER TABLE…SWITCH TO command will move the new table into a specific partition. To determine the empty target partition, select the database Summary page à Disk Usage report:
--Switching New Tables in Partition
ALTER TABLE InternetUsersNew
SWITCH TO InternetUsers1 PARTITION 4
Performing Switching Out
The same technology can be used to switch a partition out of the partition table so that it becomes a stand-alone table. Because no merger is taking place, this is much easier than switching in. The following code takes the first partition out of the InternetUsers1 partition table and reconfigures the database metadata so it becomes its own table:
--Performing Switching Out
ALTER TABLE InternetUsers1
SWITCH PARITION 4 to InternetUsersArchive
How to implement sliding window partitioning?
Now we know, Partitioning is the internal storage separation for the subset of the data in the entity. Think about it as about one “virtual” table and multiple “internal” tables which store and combine data seamlessly to the client.
Let’s think about specific functional use-cases when it’s beneficial:
Sliding window scenario: Assuming you have production database and you need to keep one year worth of the data in the table. Every 1st day of the month you want to purge the data that is more than 1 year old. Without partitioning, you’ll have to use delete statement that would be time consuming and introduce extensive locking in the database (There are a couple ways to avoid it though).
With partitioning, you simply switch oldest partition to the temporary table and truncate it. This is metadata operation that takes milliseconds. We will talk about it in this section.
Production/historical tables pattern: Think about the previous example when you need to move data to historical table in the same database. Similar approach – you technically move the partition from production to historical tables.
Bulk data load: Sometimes it can work in the opposite direction. The data can be loaded to the temporary table and next, that table becomes the new partition. This minimizes the locking during the load. This one has the limited usage though – when system rarely loads the data. You don’t want to do it on daily basis and have hundreds of partitions.
Sliding / Rolling Partition
Implementing of sliding window scenario is one of the biggest benefits you can get from table partitioning. This is quite simple process – can be used in 2 different scenario:
·         Create the new partition for the future data
·         Switch the oldest partition to the temporary table. After that you can either truncate or archive that data. Possibly even switch this temporary table as the new partition in the historical table.
Sliding Window / Rolling partitions are useful for time-based partition functions such as partitioning a year of data into months. Each month, the rolling partition expands for a new month. To build a 13-month rolling partition, perform these steps each month:
·         Add a new boundary.
·         Point the boundary to the next used filegroup.
·         Merge the oldest two partitions to keep all the data.
Please find the below links, which will give you more information about Sliding Partition:
How to remove partition?
To remove the partitioning of any table, drop the clustered index and add a new clustered index without the partitioning ON clause.
When dropping the clustered index, you must add the MOVE TO option to actually consolidate the data onto the specified filegroup, thus removing the partitioning from the table:
DROP INDEX Index_Name
ON [dbo].[TableName] WITH (MOVE TO [Primary]);
Do you know about Data Driven Partitioning?
This is not related to any Microsoft partitioning technology.
This is purely based on architectural pattern that DBA used in large, heavy transaction databases.
Very simple, but very fast.
A data-driven partitioning scheme segments the data into different servers based on a partition key. Each server has the same database schema, but the data stored is only the required data partition key or ranges.
For example, server A could hold accounts 1–999. Server B could hold accounts 1,000–1,999. Server C could hold all accounts greater than or equal to 2,000.
A partition mapping table stores the server name for each partition key value or range of values. The partition key table would hold from and to account numbers and the server name.
The middle tier reads and caches the partition mapping table, and for every database access it checks the partition mapping table to determine which server holds the needed data.
Important Points Related to Table Partitioned:
·         Partitions cannot apply for servers or instances.
·         Partitions have to be in the same instance and in the same database. Partitioning therefore is a scale-up solution.
·         A scale-out solution for SQL Server can be implemented through distributed partitioned views hosted on Federated Database Servers. This is an advanced solution.
·         Partitions must be created on different filegroups – False.
The partition scheme definition defines on which filegroup a partition resides. It's a common misconception that you have to spread your partitions out among multiple filegroups.
The only reason, in my opinion, that you would want multiple filegroups is if those filegroups reside on physically separate drives and you were doing this to improve your I/O for large range queries.
I have not seen any performance benefit to having multiple filegroups located on the same drive.
·         To partition a non-partitioned table you will need to drop and recreate table again – False.
You can partition an existing table by creating a clustered index (or rebuilding an existing clustered index) on your new Partition Scheme. This will effectively partition your data as the leaf level of a clustered index is essentially the data. 
Please have a look above example ‘Partition an Existing Table’.
·         Partitioning an existing table is a strictly offline operation - True
As we know rebuilding or creating a clustered index is indeed an offline operation.
Your table will not be available for querying during this operation.
However, partitioning is an Enterprise feature, so we have the online index rebuild feature available to use.
The ONLINE = ON option allows us to still query the table while under the covers the partitioning operation is going on.
SQL Server does this by using an internal snapshot of the data. Obviously, there is a performance hit and I don't recommend you do this during a busy time but if you have a requirement for 24x7 availability then this is a possible solution.
CREATE CLUSTERED INDEX [IndexName]
ON [dbo].[TableName]
WITH (DROP_EXISTING = ON, ONLINE = ON)
(
      [ColumnName] ASC
)
ON PartitionScheme(PartitionColumnName)
·         Altering a partition function is a metadata only operation - False
In practice you may find out that a MERGE or SPLIT operation may take much longer than the few seconds expected. Altering a partition function is an offline operation and can also result in movement of data on disk and so become extremely resource intensive.
·         Query Performance on Partitioned Tables: Partition Elimination and Beyond
SQL Server tries to identify when it can use limited parts of a partitioned table. The SQL Server query optimizer may direct a query to only a single partition, multiple partitions, or the whole table.
Using fewer partitions than the entire table is called “partition elimination.”
·         How to decide if we should use table partitioning?
Please follow the below link for detail:

7 comments:

  1. Sometime few educational blogs become very helpful while getting relevant and new information related to your targeted area. As I found this blog and appreciate the information delivered to my database.
    หิ้งพระโต๊ะหมู่บูชา

    ReplyDelete
  2. Hi,

    Can you please tell me if vertical partitioning is available in SQL 2014 .
    The MSDN link mentions only horizontal partitioning.

    https://docs.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-2014

    Can you please confirm.

    ReplyDelete