Monday, 17 March 2014

New and Enhanced Features - SQL Server 2012 (What's New in SQL Server 2012) - Part 5

Beyond Relational Enhancements

As many of you are familiar with the concept of beyond relational, basically beyond relational is data and services that go above and beyond the traditional table data in SQL Server.
With SQL Server 2012 a tremendous number of enhancements have been made to the programmability and beyond-relational paradigm. Some of the beyond-relational enhancements in SQL Server 2012 positively affecting the ability to build applications managing all data include refinements to existing technology features such as full-text search, spatial data, and FILESTREAM.
SQL Server 2012 also introduces brand new capabilities, such as FileTables and statistical semantic searches.
·         Why Microsoft having more concern for Beyond Relation Data? - Working with both relational and nonrelational data in single application or project is an extremely complex task for any organizations in current business trend.
There are many two regions for this complexity: First, it is challenging to integrate structured and non-structured data because the data is altogether different. Second, structured data usually resides in a relational database whereas unstructured data is stored in file folders or shares on a server. This creates many issues for organizations e.g. a database admin will need to manage two different sets of storage spaces: a relational database and files shares that host unstructured data.
Here DBA requires two different approaches to achieve both high availability and disaster recovery, and there isn’t an easy way to keep the structured and unstructured synchronized data. Finally, it’s challenging to obtain a single search experience to obtain relevant data across structured and non-structured formats.
Before moving to the answer, here one more important point, in the current industry trends the volume of beyond-relational data (not structured data, is not stored in databases, is not managed, and cannot scale) is more than the relational data.
Now we can understand the reasons for investing in the beyond-relational paradigm in SQL Server 2012, SQL Server can give organizations a way to manage unstructured data while providing a rich application experience and the opportunity to gain valuable insight from the data. In addition, this solution can scale to support several hundred million documents.
·         Take a look on previous release of SQL Server: From last few release of SQL Server, Microsoft focus on beyond-relational capabilities, if you remember in the past, you had to use binary large objects (BLOBs) such as varbinary max columns to store unstructured data in tables within a database. This approach provided integration; however, the solution lacked the same streaming speed organizations typically received when documents were stored directly in the Windows file system.
To address these concerns, FILESTREAM was introduced with SQL Server 2008. FILESTREAM enabled SQL Server–based applications to store unstructured data, such as documents and images, on the file system. Applications could leverage the rich streaming APIs and performance of the file system while maintaining transactional consistency between the unstructured data and corresponding structured data.
However, FILESTREAM still left a gap for some Windows applications that required the Windows 32 Directory Hierarchy file system to function, because you needed to provide a transactional context to the FileOpen command that you could achieve only with the FileStream APIs. Those applications still could not leverage the FILESTREAM feature.
Another new capability introduced in SQL Server 2008 to address unstructured data was Remote BLOB Store (RBS). The RBS standardized API set allowed for a binary large object (BLOB), such as an office document or video, to be stored within a vendor-independent API. RBS provides a vendor independent API to store BLOB IDs of documents stored in remote BLOB storage inside the database and to manage the link consistency.
The actual size and location of the files and documents were stored in the database instead of by its structure. RBS allowed database administrators to tie the relational data with the RBS data in a loosely coupled manner.
RBS does not integrate with full-text search, nor does it provide full transactional integration between the two stores, but it does allow database administrators to build applications in a standardized way, allowing a store to pass data back and forth.
Do you know how Microsoft Outlook works with Microsoft Exchange? – You many know that these applications store both structured data and unstructured data, and they provide traditional search capabilities that are similar to semantic search by finding related documents across all of the elements in the application. In general, each email message received, including its date and priority, is stored in a structured database, whereas the actual content of the message - including attachments such as pictures and Microsoft Office Documents are considered to be unstructured data. By using the search inbox tool, we have the ability to achieve both structured and unstructured search capability across messages with specific keywords.
Now we will discuss in details about new enhancements:
1.       File Table: FileTable is one of best exciting feature provided by SQL Server 2012. The FileTable feature builds on top of the SQL Server FILESTREAM technology. The major advantage of FileTable is it has Windows API compatibility for file data stored within an SQL Server database. This lets an application integrate its storage and data management components, and provides integrated SQL Server services (including full-text search and semantic search) over unstructured data and metadata, along with easy policy management and administration.
FileTable is a user table that gets created within a database in the Database Engine. FileTable has a fixed schema and contains FILESTREAM and file attributes. Users can define indexes, constraints, and triggers; however, columns and system-defined constraints cannot be altered or dropped. Each row within the table represents a file or a directory, and the tree integrity is maintained by system-defined constraints. This is because there is a need to enforce the Windows semantics to ensure the solution works the way NTFS expects.
In summary, FileTable provides a specialized table that makes use of the FILESTREAM feature. An organization can now store files and documents in special tables in SQL Server and access them from Windows applications as if they were stored in the file system, without making any changes to the Windows applications, while providing a Server Message Block (SMB) share view and full Win32 application compatibility.
FileTables are a great way for organizations to remove barriers to using SQL Server to store unstructured data that is currently residing as files on file servers.
Managing and securing FileTable is similar to a traditional table stored within the SQL Server Database Engine. FileTable data supports backup and restore operations and SQL Server 2012 AlwaysOn Availability Groups for both high availability and disaster recovery. For more detail, visit the link: http://msdn.microsoft.com/en-us/library/ff929144.aspx
2.     FILESTREAM Enhancements: As you know, FILESTREAM enables SQL Server- based applications to store unstructured data, such as documents and images, on the file system. With regard to FILESTREAM in SQL Server 2008 R2, only one storage container per FILESTREAM file group is supported. This limits storage capacity scaling and I/O scaling.
SQL Server 2012 has made investments in FILESTREAM by adding support for multiple storage containers, to achieve maximum scale-up functionality. In addition, the following list articulates other new features associated with FILESTREAM:
Support for multiple storage containers and file groups:
      Data Definition Language (DDL) changes to Create/Alter Database statements
      Ability to set max_size for the containers
      Database Console Commands (DBCC) Shrinkfile, Emptyfile support
      Storage scaling by adding additional storage drives
      I/O scaling with multiple disks
3.      Full-Text Search: Microsoft has been invested huge time and money to improve the Full-Text Search in SQL Server 2012. The improved full-text search improved in performance and scale, new functionality, including a new semantic-similarity search capability. From a performance and scale perspective, full-text search in SQL Server 2012 now scales over 100 million documents; some test cases have even reached 350 million documents.
      Property Search: Beginning in SQL Server 2012, you can configure a full-text index to support property-scoped searching on properties, such as Author and Title, which are emitted by IFilters. This form of searching is known as property searching. Whether property searching is possible on a given type of document depends on whether the corresponding filter (IFilter) extracts search properties during full-text indexing. Among IFilters that extract a number of document properties are the IFilters for Microsoft Office 2007 document file types, such as .docx, .xlsx, and .pptx
      Customizable NEAR: Beginning in SQL Server 2012, you can customize a proximity search by using the new custom NEAR option of the CONTAINS predicate or CONTAINSTABLE function. Custom NEAR enables you to optionally specify the maximum number of non-search terms that separate the first and last search terms in a match. Custom NEAR also enables you to optionally specify that words / phrases are matched only if they occur in order in which specify them.
New Word Breakers and Stemmers: All the word breakers and stemmers used by Full-Text Search and Semantic Search, with the exception of the Korean language, are updated in this release.
4.      Semantic Search {New}: Statistical semantic search extends the full-text search capability by providing semantic insight into textual content. Statistical Semantic Search provides deep insight into unstructured documents stored in SQL Server databases by extracting and indexing statistically relevant key phrases. Then it also uses these key phrases to identify and index documents that are similar or related. You query these semantic indexes by using three Transact-SQL rowset functions to retrieve the results as structured data.
You can now include automatic tag extraction, related content discovery, and hierarchical navigation across similar content. For example, you can query the key phrases index to build the taxonomy for an organization or a corpus of documents. Similarly, you might want to query the document similarity index to identify resumes that match a job description.
Before an organization can index documents with semantic search and return relevant results, the documents must be stored in a SQL Server database. The FileTable and FILESTREAM feature in Microsoft SQL Server 2012 is a prerequisite.
To leverage the full-text semantic search capability within SQL Server, a database administrator must ensure the Full-Text and Semantic Extractions for Search feature was installed.
5.     Spatial Enhancements: Spatial data typically refers to the geographic location of features and boundaries in relation to the Earth, including but not limited to land masses, oceans, and natural or constructed features.
Spatial data is stored as coordinates and topology in either a raster or vector data format that can be mapped.
SQL Server 2008 supports geometry and geography data types so that organizations can store spatial vector data. These geometry and geography data types support methods and properties allowed for the creation, comparison, analysis, and retrieval of spatial data.
Improvements in spatial support continue with SQL Server 2012, especially with regard to spatial type and performance.
With the launch of SQL Server 2012, new spatial type features are introduced. Specifically, there are new subtypes for geometry and geography data types that support circular arc segments such as circular strings, compound curves, and curve polygons. All of the methods support circular arcs, including circular-arc support on an ellipsoid.
Note: SQL Server 2012 is the first commercial database system to support spherical circular arcs.
The geography data type has achieved parity with the geometry data type in the functionality and the variety of methods that it supports. Overall performance, from spatial indexes to methods, has significantly improved. These and other improvements to spatial data support represent a significant step forward in the spatial capabilities of SQL Server.
6.      ODBC for Linux: The Microsoft SQL Server ODBC Driver for Linux allows native C and C++ applications to leverage the standard ODBC API and connect directly to SQL Server 2012.
This offers database administrators and developers greater flexibility when building and managing applications that run on Linux and co-exist with the SQL Server environment.
The highlights of this release are native connectivity from Linux to SQL Server via ODBC, support for 64-bit Red Hat Enterprise Linux 5 and 6 and tools such as BCP and SQLCMD.
7.     SQL Azure Enhancements: These don't really go directly with the release of SQL 2012, but Microsoft is making some key enhancements to SQL Azure.
Reporting Services for Azure will be available, along with backup to the Windows Azure data store, which is a huge enhancement. The maximum size of an Azure database is now up to 150G.
Also Azure data sync allows a better hybrid model of cloud and on-premise solutions
8.    Big Data Support: I saved the biggest for last, introduced at the PASS (Professional Association for SQL Server) conference last year; Microsoft announced a partnership with Hadoop provider Cloudera.
One part of this involves MS releasing a ODBC driver for SQL Server that will run on a Linux platform. Additionally, Microsoft is building connectors for Hadoop, which is an extremely popular NoSQL platform.
        With this announcement, Microsoft has made a clear move into this very rapidly growing space.

No comments:

Post a Comment