Beyond
Relational Enhancements
As many of you are familiar with the concept of beyond relational, basically beyond
relational is data and services that go above and beyond the traditional
table data in SQL Server.
With SQL Server 2012 a tremendous number of enhancements
have been made to the programmability and beyond-relational paradigm. Some of
the beyond-relational enhancements in SQL Server 2012 positively affecting the
ability to build applications managing all data include refinements to existing
technology features such as full-text
search, spatial data, and FILESTREAM.
SQL Server 2012 also introduces brand new
capabilities, such as FileTables and
statistical semantic searches.
·
Why
Microsoft having more concern for Beyond Relation Data? - Working
with both relational and nonrelational data in single application or project is
an extremely complex task for any
organizations in current business trend.
There are
many two regions for this complexity: First, it is challenging to integrate
structured and non-structured data because the data is altogether different.
Second, structured data usually resides in a relational database whereas
unstructured data is stored in file
folders or shares on a server. This creates many issues for organizations
e.g. a database admin will need to manage two different sets of storage spaces:
a relational database and files shares that host unstructured
data.
Here DBA
requires two different approaches to achieve both high availability and
disaster recovery, and there isn’t an easy way to keep the structured and
unstructured synchronized data. Finally, it’s challenging to obtain a single
search experience to obtain relevant data across structured and non-structured formats.
Before
moving to the answer, here one more important point, in the current industry
trends the volume of beyond-relational
data (not structured data, is not stored in databases, is not managed, and
cannot scale) is more than the
relational data.
Now we
can understand the reasons for investing in the beyond-relational paradigm in
SQL Server 2012, SQL Server can give organizations a way to manage unstructured data while providing a rich
application experience and the opportunity to gain valuable insight from
the data. In addition, this solution can scale to support several hundred
million documents.
·
Take a
look on previous release of SQL Server: From last few release of SQL
Server, Microsoft focus on beyond-relational capabilities, if you remember in
the past, you had to use binary large
objects (BLOBs) such as varbinary
max columns to store unstructured data in tables within a database. This
approach provided integration;
however, the solution lacked the same streaming speed organizations typically
received when documents were stored directly in the Windows file system.
To
address these concerns, FILESTREAM
was introduced with SQL Server 2008. FILESTREAM enabled SQL Server–based
applications to store unstructured data, such as documents and images, on the
file system. Applications could leverage the rich streaming APIs and
performance of the file system while maintaining transactional consistency
between the unstructured data and corresponding structured data.
However,
FILESTREAM still left a gap for some Windows applications that required the Windows 32 Directory
Hierarchy file system to function, because you needed to provide a
transactional context to the FileOpen command that you could achieve only with
the FileStream APIs. Those
applications still could not leverage
the FILESTREAM feature.
Another
new capability introduced in SQL Server 2008 to address unstructured data was Remote BLOB Store (RBS). The RBS standardized API set allowed for a binary large object (BLOB), such as an
office document or video, to be stored within a vendor-independent API. RBS
provides a vendor independent API to store BLOB IDs of documents stored in
remote BLOB storage inside the database and to manage the link consistency.
The
actual size and location of the files and documents were stored in the database
instead of by its structure. RBS allowed database administrators to tie the relational data with the RBS data in a
loosely coupled manner.
RBS does
not integrate with full-text search, nor does it provide full transactional
integration between the two stores, but it does allow database administrators
to build applications in a standardized way, allowing a store to pass data back
and forth.
Do you
know how Microsoft Outlook works with Microsoft Exchange? – You
many know that these applications store both structured data and unstructured
data, and they provide traditional search capabilities that are similar to
semantic search by finding related documents across all of the elements in the
application. In general, each email
message received, including its date and priority, is stored in a structured database, whereas the actual
content of the message - including attachments such as pictures and Microsoft Office
Documents are considered to be
unstructured data. By using the search inbox tool, we have the ability to
achieve both structured and unstructured search capability across messages with
specific keywords.
Now we
will discuss in details about new enhancements:
1. File Table:
FileTable is one of best exciting feature provided by SQL Server 2012. The FileTable feature builds on top of the
SQL Server FILESTREAM technology. The major advantage of FileTable is it has
Windows API compatibility for file data stored within an SQL Server database.
This lets an application integrate its storage and data management components,
and provides integrated SQL Server services (including full-text search and semantic
search) over unstructured data and metadata, along with easy policy
management and administration.
FileTable is a
user table that gets created within a database in the Database Engine.
FileTable has a fixed schema and contains FILESTREAM and file attributes. Users
can define indexes, constraints, and triggers; however, columns and
system-defined constraints cannot be altered or dropped. Each row within the
table represents a file or a directory, and the tree integrity is maintained by
system-defined constraints. This is because there is a need to enforce the
Windows semantics to ensure the solution works the way NTFS expects.
In
summary, FileTable provides a specialized table that makes use of the FILESTREAM
feature. An organization can now store files and documents in special tables in
SQL Server and access them from Windows applications as if they were stored in
the file system, without making any changes to the Windows applications, while
providing a Server Message Block (SMB) share view and full Win32 application compatibility.
FileTables
are a great way for organizations to remove barriers to using SQL Server to
store unstructured data that is currently residing as files on file servers.
Managing
and securing FileTable is similar to a traditional table stored within the SQL
Server Database Engine. FileTable data supports backup and restore operations
and SQL Server 2012 AlwaysOn Availability Groups for both high availability and
disaster recovery. For more
detail, visit the link: http://msdn.microsoft.com/en-us/library/ff929144.aspx
2. FILESTREAM Enhancements: As you know, FILESTREAM enables
SQL Server- based applications to store unstructured
data, such as documents and images, on the file system. With regard to
FILESTREAM in SQL Server 2008 R2, only one storage container per FILESTREAM
file group is supported. This limits storage capacity scaling and I/O scaling.
SQL
Server 2012 has made investments in FILESTREAM by adding support for multiple storage containers, to achieve
maximum scale-up functionality. In addition, the following list articulates
other new features associated with FILESTREAM:
Support
for multiple storage containers and file groups:
•
Data Definition Language (DDL) changes to
Create/Alter Database statements
•
Ability to set max_size for the containers
•
Database Console Commands (DBCC) Shrinkfile,
Emptyfile support
•
Storage scaling by adding additional storage drives
•
I/O scaling with multiple disks
3. Full-Text Search: Microsoft
has been invested huge time and money to improve
the Full-Text Search in SQL Server 2012. The improved full-text search
improved in performance and scale, new functionality, including a new semantic-similarity
search capability. From a performance and scale perspective, full-text search
in SQL Server 2012 now scales over 100
million documents; some test cases have even reached 350 million documents.
•
Property
Search: Beginning in SQL Server 2012, you can configure a
full-text index to support
property-scoped searching on properties, such as Author and Title, which
are emitted by IFilters. This form of searching is known as property searching. Whether property
searching is possible on a given type of document depends on whether the
corresponding filter (IFilter) extracts search properties during full-text
indexing. Among IFilters that extract a number of document properties are the
IFilters for Microsoft Office 2007
document file types, such as .docx, .xlsx, and .pptx
•
Customizable
NEAR: Beginning in SQL Server 2012, you can customize a
proximity search by using the new custom NEAR
option of the CONTAINS predicate or
CONTAINSTABLE function. Custom NEAR enables you to optionally specify the
maximum number of non-search terms
that separate the first and last search terms in a match. Custom NEAR also
enables you to optionally specify that words / phrases are matched only if they
occur in order in which specify them.
New Word
Breakers and Stemmers: All the word breakers and stemmers used by
Full-Text Search and Semantic Search, with the exception of the Korean
language, are updated in this release.
4. Semantic Search {New}:
Statistical semantic search extends the full-text search capability by
providing semantic insight into textual content. Statistical Semantic Search
provides deep insight into unstructured
documents stored in SQL Server databases by extracting and indexing
statistically relevant key phrases. Then it also uses these key phrases to
identify and index documents that are similar or related. You query these
semantic indexes by using three Transact-SQL rowset functions to retrieve the
results as structured data.
You can
now include automatic tag extraction,
related content discovery, and hierarchical navigation across similar content.
For example, you can query the key phrases index to build the taxonomy for an
organization or a corpus of documents. Similarly, you might want to query the
document similarity index to identify resumes
that match a job description.
Before an
organization can index documents with semantic search and return relevant
results, the documents must be stored in a SQL Server database. The FileTable and FILESTREAM feature in
Microsoft SQL Server 2012 is a prerequisite.
To
leverage the full-text semantic search capability within SQL Server, a database
administrator must ensure the Full-Text
and Semantic Extractions for Search feature was installed.
For more
detail, visit the link: http://msdn.microsoft.com/en-us/library/gg492075.aspx
5. Spatial Enhancements: Spatial
data typically refers to the geographic
location of features and boundaries
in relation to the Earth, including but not limited to land masses, oceans,
and natural or constructed features.
Spatial
data is stored as coordinates and
topology in either a raster or vector data format that can be mapped.
SQL
Server 2008 supports geometry and geography data types so that organizations
can store spatial vector data. These geometry and geography data types support
methods and properties allowed for the creation, comparison, analysis, and
retrieval of spatial data.
Improvements
in spatial support continue with SQL Server 2012, especially with regard to spatial type and performance.
With the
launch of SQL Server 2012, new
spatial type features are introduced. Specifically, there are new subtypes for
geometry and geography data types that support circular arc segments such as circular
strings, compound curves, and curve polygons. All of the methods
support circular arcs, including circular-arc support on an ellipsoid.
Note: SQL Server 2012 is the first commercial database system to support
spherical circular arcs.
The
geography data type has achieved parity with the geometry data type in the
functionality and the variety of methods that it supports. Overall performance,
from spatial indexes to methods, has significantly
improved. These and other improvements to spatial data support represent a
significant step forward in the spatial capabilities of SQL Server.
6. ODBC for Linux: The
Microsoft SQL Server ODBC Driver for
Linux allows native C and C++ applications to leverage the standard ODBC
API and connect directly to SQL Server 2012.
This
offers database administrators and developers greater flexibility when building
and managing applications that run on Linux and co-exist with the SQL Server
environment.
The
highlights of this release are native connectivity from Linux to SQL Server via
ODBC, support for 64-bit Red Hat Enterprise Linux 5 and 6 and tools such as BCP
and SQLCMD.
7. SQL Azure Enhancements: These
don't really go directly with the release of SQL 2012, but Microsoft is making
some key enhancements to SQL Azure.
Reporting
Services for Azure will be available, along with backup to the Windows Azure
data store, which is a huge enhancement. The maximum size of an Azure database
is now up to 150G.
Also
Azure data sync allows a better hybrid model of cloud and on-premise solutions
8. Big Data Support: I saved the biggest for last, introduced at
the PASS (Professional Association for SQL Server) conference last year;
Microsoft announced a partnership with Hadoop provider
Cloudera.
One part
of this involves MS releasing a ODBC driver for SQL Server that will run on a
Linux platform. Additionally, Microsoft is building connectors for Hadoop,
which is an extremely popular NoSQL platform.
With this announcement, Microsoft has made a clear
move into this very rapidly growing space.
No comments:
Post a Comment