SQL

In-Memory OLTP is WAY, way cool. NoSQL for the masses!

I’m still playing around with In-Memory OLTP on SQL Server 2014, and already I’m impressed. It looks like my default approach in the future – for BI solutions, or for transactional systems with a ton of locking issues – to magically resolve contention and dramatically improve throughput and performance. We’re talking 5-10x performance gains, for little or no cost in terms of redesign. With sprocs, we’re sometimes seeing 10-30x improvement. Awesome!

Use Case Scenarios

Folks, this new feature is meant for workloads with a lot of contention. If you have a data warehouse, or a highly structured reporting database, it may not be a good fit. In tables with a lot of singleton-type actions – atomic point reads and writes – it’s a natural.

The thing to keep in mind is that Hadoop/NoSQL doesn’t fit every scenario. NoSQL is not meant as a replacement for OLTP and has inherent latency issues.

If you want to scale up efficiently, you need to handle transactional and machine-generated data efficiently. You can now buy a 50-core machine with 500GB of memory for $50K – chump change, really – and memory is scaling up into the terabyte range. CPU clock speed, because of inherent heat issues, has stalled out in comparison. So if we’re going to see improvements in SQL performance, it has to come from the one area that has really taken off in the past decade: memory. Frankly speaking, the rules that were in place when SQL Server 7/2000 were written no longer apply when we’re talking about a TB or so of memory. The old rules assumed every page access was a potential read from disk – but today’s lower-cost memory means we need a new costing algorithm. Waits for log writes, latch availability and lock releases can be ignored or minimized.

And I found it very transparent. You’re managing memory in the same space, and the change is nearly painless – just some modest changes to the primary key and some database settings, and I was good to go. For example, check out the snapshot below: half of the tables are in-memory, half aren’t. Can you tell the difference? I can’t.

Tables and indexes now live entirely in memory, with stream-based storage for durability. Access is lock- and latch-free, and there’s no runtime interpretation of sprocs – sprocs and business logic are natively compiled, meaning a huge performance gain.
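As a sketch of what native compilation looks like in SQL Server 2014 – note the table and column names here are hypothetical, not from this post:

```sql
-- Minimal natively compiled stored procedure (SQL Server 2014 syntax).
-- dbo.OrderHeader is a hypothetical memory-optimized table.
CREATE PROCEDURE dbo.usp_InsertOrder @OrderID int, @CustomerID int
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- the whole body runs as one atomic block; no explicit BEGIN/COMMIT TRAN
    INSERT INTO dbo.OrderHeader (OrderID, CustomerID)
    VALUES (@OrderID, @CustomerID);
END
```

NATIVE_COMPILATION requires SCHEMABINDING, EXECUTE AS, and the ATOMIC block with an explicit isolation level and language – the compiler needs everything bound at create time.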

Migration Path

It isn’t just a matter of moving indexed tables into memory, and it isn’t just pinning the table in the buffer pool (i.e. the venerable and now-deprecated DBCC PINTABLE). There are some changes you’ll have to make to your table structure when handling a migration. This has been described as NoSQL for SQL Server – without having to dip into the bowels of MapReduce algorithms.

Some things to keep in mind:

  • You will have to make some changes to your database structure to get in-memory OLTP working for you.
  • Data is (or can be) durable and highly available – meaning you won’t lose data after a server crash.
  • Row-based structure – forget about table/index structures as we know them today. A row is an atomic thing; there’s no concept of a page as you get with an on-disk engine. Rows are versioned, and there are no in-place updates. (!)
  • You can create a non-durable table, with transient data, using the SCHEMA_ONLY option. (Think staging tables, or managing session state.)
  • You must have at least one index. There are two types: the nonclustered hash index (good for point lookups) and the memory-optimized nonclustered index (for ranges). The engine removes lock/latch contention. We’ve also seen much-reduced transaction log bandwidth, though log latency does remain.
  • One big issue for me – Entity Framework uses ad hoc queries, meaning you won’t get the In-Memory OLTP benefit of natively compiled sprocs.
  • With transactions –
    • Declare the isolation level (no locking hints)
    • Use retry logic to handle conflicts/validation failures
    • Avoid long-running transactions (duh)
    • Session-level SNAPSHOT, REPEATABLE READ and SERIALIZABLE aren’t supported when accessing memory-optimized tables; the table-level SNAPSHOT, REPEATABLEREAD and SERIALIZABLE hints are all good. For data migration, use SNAPSHOT (for example, wherever you use READCOMMITTED today).
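The retry guidance above can be sketched like this – a minimal pattern, assuming the dbo.t3_inmem demo table from the sample scripts at the end of this post:

```sql
-- Retry loop for write conflicts / validation failures on a
-- memory-optimized table (interop T-SQL, SNAPSHOT table hint).
DECLARE @retry int = 5;
WHILE (@retry > 0)
BEGIN
    BEGIN TRY
        BEGIN TRAN;
        UPDATE dbo.t3_inmem WITH (SNAPSHOT)
            SET c2 = 'updated'
            WHERE c1 = 1;
        COMMIT TRAN;
        SET @retry = 0;  -- success, exit the loop
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRAN;
        SET @retry -= 1;
        -- 41302/41305/41325/41301 are the conflict/validation errors
        -- specific to In-Memory OLTP; anything else should surface
        IF @retry = 0 OR ERROR_NUMBER() NOT IN (41301, 41302, 41305, 41325)
            THROW;
    END CATCH
END
```

The same pattern belongs in the application tier if you’re calling natively compiled sprocs – the conflict can surface on commit, so the whole transaction is what gets retried.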

Looking Before You Leap

There’s a management data warehouse tool that will analyze your current application for potential migration, and give you a quadrant view of whether it’s a good fit.

To run this, I right-clicked the Management > Data Collection node in SSMS and selected the Configure Management Data Warehouse option. After a splash screen, I selected my target imoltp database, and left the users/roles unmapped on the final screen. Lots of spinning wheels at this point while SQL Server thought things over and spun up my MDW. This gave me a nice graphical view of whether migrating my tables was a good thing. Here are the best practices for analyzing whether you should make the leap:

(note – above is from MSDN article here, http://msdn.microsoft.com/en-us/library/dn205133.aspx )

 

I recommend experimenting with this on a copy of your current database. Run the report – right-click at the DB level or on individual tables – and see whether you’d get significant benefits.

So for highly transactional systems – or for the staging layer feeding a data warehouse – think of this as a solution.

 

References

Great article by Kalen Delaney on the internals – a must-read: http://t.co/6IzfRnJLcu

A great MSDN overview: http://msdn.microsoft.com/en-us/library/dn133186(v=sql.120).aspx

Some Sample Scripts To Play With

 

 

CREATE TABLE [dbo].[t3_disk](
    [c1] [int] IDENTITY(1,1) NOT NULL,
    [c2] [char](100) NOT NULL,
    CONSTRAINT [PK_t3_disk] PRIMARY KEY CLUSTERED ([c1] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

 

 

SET STATISTICS TIME OFF;
SET NOCOUNT ON;

-- inserts - 1 at a time
DECLARE @starttime datetime2 = sysdatetime();
DECLARE @timems INT;
DECLARE @rowcount INT = 100000;

-- disk-based table and interpreted Transact-SQL
DECLARE @i int = 0;
DECLARE @c varchar(300);

BEGIN TRAN
WHILE (@i < @rowcount)
BEGIN
    SET @c = RIGHT(CAST((replicate('1',100) + CAST(@i as VARCHAR(300))) as VARCHAR(300)), 100);
    -- name the target column so the identity column c1 is skipped
    INSERT INTO t3_disk(c2) VALUES(@c);
    SET @i = @i + 1;
END
COMMIT

SET @timems = datediff(ms, @starttime, sysdatetime());
SELECT 'Disk-based table insertion, 100000 records: ' + CAST(@timems AS VARCHAR(10)) + ' ms';

-- ok 225 ms.
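Note that before the in-memory version below will run, the database needs a memory-optimized filegroup – here’s a one-time setup sketch, assuming the imoltp database already exists (the filegroup name and file path are made up):

```sql
-- A memory-optimized filegroup is required before any memory-optimized
-- table can be created in the database.
ALTER DATABASE imoltp
    ADD FILEGROUP imoltp_mod CONTAINS MEMORY_OPTIMIZED_DATA;
GO
ALTER DATABASE imoltp
    ADD FILE (NAME = 'imoltp_mod1',
              FILENAME = 'C:\Data\imoltp_mod1')   -- a folder, not an .mdf file
    TO FILEGROUP imoltp_mod;
GO
```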

 

 

-- Interop Hash
-- for bucket size, see this article - http://msdn.microsoft.com/en-us/library/dn494956.aspx
-- in general you want the bucket count to be 1-2x the number of unique values in the index

USE [imoltp]
GO

SET ANSI_PADDING ON
GO

DROP TABLE dbo.t3_inmem
GO

CREATE TABLE [dbo].[t3_inmem]
(
    [c1] int IDENTITY(1,1) NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 20000),
    [c2] [char](100) NOT NULL
    -- could experiment as well with PRIMARY KEY NONCLUSTERED
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO

 

 

SET STATISTICS TIME OFF;
SET NOCOUNT ON;

-- inserts - 1 at a time
DECLARE @starttime datetime2 = sysdatetime();
DECLARE @timems INT;
DECLARE @rowcount INT = 100000;

DECLARE @i int = 0;
DECLARE @c varchar(300);

BEGIN TRAN
WHILE (@i < @rowcount)
BEGIN
    SET @c = RIGHT(CAST((replicate('1',100) + CAST(@i as VARCHAR(300))) as VARCHAR(300)), 100);
    INSERT INTO t3_inmem(c2) VALUES(@c);
    SET @i = @i + 1;
END
COMMIT

SET @timems = datediff(ms, @starttime, sysdatetime());
SELECT 'In-memory OLTP-based table insertion, 100000 records: ' + CAST(@timems AS VARCHAR(10)) + ' ms';

-- 105 ms.

 


 


Code snippet – expanding hierarchies in SQL Server

I seem to remember posting on this subject back in December – but I can’t find it. Sigh. (Note – just found the post, here – it’s down a ways, but still there.) I have a table called Symptom that contains a hierarchical structure – i.e. an ID field and a ParentID field, in this implementation. (Yes, I could have used HierarchyID – we opted not to, after looking at all our options.)

Note we could have used recursion; if this were more than 10 levels or so, I would. But SQL is great at joins, and they execute very fast, so we’ll use this approach for now. See http://www.codeproject.com/Articles/16032/Expanding-SQL-Hierarchies-The-Dualistic-Approach or the Microsoft SQL Server Bible 2008 by Paul Nielsen for more on expanding hierarchies and the adjacency pattern.


-- Retrieve the whole hierarchy from level 1 to level 5
declare @t table (
l1 int, l2 int, l3 int,  l4 int, l5 int, id int, depth int)

-- Populate @t: expand msg hierarchy for levels 1-5
insert into @t
select m1.SymptomID, m2.SymptomID, m3.SymptomID, m4.SymptomID, m5.SymptomID, null, 1
from Symptom as m1
left outer join Symptom m2 on m1.SymptomID=m2.ParentID
left outer join Symptom m3 on m2.SymptomID=m3.ParentID
left outer join Symptom m4 on m3.SymptomID=m4.ParentID
left outer join Symptom m5 on m4.SymptomID=m5.ParentID
where m1.ParentID is NULL

-- Calculate node level for each node and get tree depth
declare @depth int
update @t set depth = depth + 1 where l2 is not null
update @t set depth = depth + 1 where l3 is not null
update @t set depth = depth + 1 where l4 is not null
update @t set depth = depth + 1 where l5 is not null
select @depth = max(depth) from @t

-- Since we have made several joins, @t so far holds only full
-- root-to-leaf paths. Add missing nodes of level 1
insert into @t select distinct l1, NULL, NULL, NULL, NULL, NULL, 1 from @t where l2 is not NULL
-- Add missing leaf nodes of level 2
insert into @t select distinct l1, l2, NULL, NULL, NULL, NULL, 2 from @t where l3 is not NULL
-- Add missing leaf nodes of level 3
insert into @t select distinct l1, l2, l3, NULL, NULL, NULL, 3 from @t where l4 is not NULL
-- Add missing leaf nodes of level 4
insert into @t select distinct l1, l2, l3, l4, NULL, NULL, 4 from @t where l5 is not NULL

-- Populate id field, get the rightmost msg id from @t
update @t set id=coalesce(l5, l4, l3, l2, l1)

select id
, depth
, SymptomFullDesc = 
CASE WHEN depth = 1 THEN l1.SymptomDesc 
WHEN depth = 2 THEN l1.SymptomDesc + ' > ' + l2.SymptomDesc
WHEN depth = 3 THEN l1.SymptomDesc + ' > ' + l2.SymptomDesc + ' > ' + l3.SymptomDesc 
WHEN depth = 4 THEN l1.SymptomDesc + ' > ' + l2.SymptomDesc + ' > ' + l3.SymptomDesc + ' > ' + l4.SymptomDesc
WHEN depth = 5 THEN l1.SymptomDesc + ' > ' + l2.SymptomDesc + ' > ' + l3.SymptomDesc + ' > ' + l4.SymptomDesc + ' > ' + l5.SymptomDesc
END
from @t t
left outer join Symptom l1 on t.l1 = l1.SymptomID
left outer join Symptom l2 on t.l2 = l2.SymptomID
left outer join Symptom l3 on t.l3 = l3.SymptomID
left outer join Symptom l4 on t.l4 = l4.SymptomID
left outer join Symptom l5 on t.l5 = l5.SymptomID
ORDER BY depth, SymptomFullDesc
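For comparison, here’s roughly what the recursive version mentioned above would look like – a sketch against the same assumed Symptom(SymptomID, ParentID, SymptomDesc) schema, with no fixed depth limit:

```sql
-- Recursive CTE alternative: builds the full breadcrumb path per node.
;WITH h AS (
    -- anchor: root nodes
    SELECT SymptomID, ParentID,
           CAST(SymptomDesc AS varchar(max)) AS SymptomFullDesc,
           1 AS depth
    FROM Symptom
    WHERE ParentID IS NULL
    UNION ALL
    -- recursive step: append each child to its parent's path
    SELECT c.SymptomID, c.ParentID,
           h.SymptomFullDesc + ' > ' + c.SymptomDesc,
           h.depth + 1
    FROM Symptom c
    JOIN h ON c.ParentID = h.SymptomID
)
SELECT SymptomID AS id, depth, SymptomFullDesc
FROM h
ORDER BY depth, SymptomFullDesc;
```

The CAST to varchar(max) in the anchor matters – without it, the concatenation in the recursive member fails with a type-mismatch error.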

EntityFramework – “The provider did not return a ProviderManifest instance”

Quick post for the day. I was trying to configure a data source in a listview – converting over to EntityFramework from a SQL data source – and I got this funky message: “The provider did not return a ProviderManifest instance”.

Hmmm. Why is this? Well, I had to open up my data models project and change the ProviderManifestToken="2012" attribute to "2008".


And voila. Changing that one key – and this is a known bug with EntityFramework 6 that the team has on their list to fix – got me where I needed to go. I had my lot tracking admin screen up and going in about 5 minutes.

Webforms Versus MVC – Honestly, Who Cares?

P.s. – Webforms, rightly, have a bad name – because of bloated codebehind. But what happens when you drag on a listview, and click on the little arrow in the Design pane – and point to an EF data source with a PK? Suddenly, you have a fully functioning page – with NO codebehind – and all your upd/ins/del statements are coming straight from EF. That seems pretty clean to me.

To me – and to many others – ASP.NET Webforms and MVC are equivalent forms. If your webform has no bloated codebehind, and is running from either an EF-generated model or, better, knockoutJS data binding – don’t you have a clean and easy-to-maintain architecture?

Webforms are bad if you have bloated codebehind. They’re good and equivalent to MVC without that code smell. MVVM – well, that is a superior pattern to both – but that’s for a future post.

Admin database revisited…

I’ve attached a fresh copy of the Admin database SQL. I found this an easy-peasy method of adding backups and making databases self-service.

Admin database setup

Here’s the steps – there’s a more detailed blog article on this elsewhere on my site –

1. Create a database called ‘Admin’.

2. Set up a weekly job running the following steps – either as a sproc or as a set of steps (recommended):

    1. EXEC dbo.Daily_Data_Collection – fills daily report

    2. EXEC GetLargeDBs (index fragmentation pt 1)

    3. EXEC usp_CollectFragmentation – index fragmentation pt 2

    4. exec usp_IndexMaintenance – can run once a day, or weekly

    5. EXEC usp_RunCheckDb – runs checkdb on all db’s over 50 gb

    6. exec usp_UpdateStat_OnAllDbs – update database statistics


3. Get backups going:

    1. Enable advanced options, if not enabled, on MASTER:

        EXEC sp_configure 'show advanced options', 1

        RECONFIGURE

        EXEC sp_configure 'xp_cmdshell', 1

        RECONFIGURE

    2. Then set up a job weekly for System –

        EXEC usp_Backup_db @path = 'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Backup\', @dbType = 'System', @bkpType = 'Full', @retention = 1, @liteSpeed = 'N'

        And a similar one for User:

        EXEC usp_Backup_db @path = 'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Backup\', @dbType = 'User', @bkpType = 'Full', @retention = 1, @liteSpeed = 'N'

Do these things and you’ll have an Admin database that tracks the health of your system, and a more healthy set of indexes/backups.

Learning from my mistakes on SQL…

Here’s some design mistakes I’ve made in the past… learn from my example and shun these anti-patterns!

  1. Normalization is here for a reason. Normalization defines a set of methods to break down tables to their constituent parts until each table represents one and only one “thing”, and its columns serve to fully describe only the one “thing” that the table represents. If you find yourself using string operators in WHERE clauses a lot (SUBSTRING, CHARINDEX, LIKE) to parse out a value in a column – you’ve missed the boat design-wise, and your data will become less and less searchable as you move away from the SQL paradigm. (And don’t get me started on COLDESC1, COLDESC2, etc.)
  2. Natural keys are the thing, man. Think carefully before you use an identity or a GUID as your only key. You should reconsider your design if you can’t pick out the row you want from a table without knowledge of the surrogate key. In general, each of your tables should have a natural key that means something to the user and uniquely identifies each row in the table – i.e. go with PartNumber over an integer PartID or a GUID. In the extremely rare event that there’s no natural key (as with an audit table) – then and only then use a surrogate key alone.
  3. Protect your data, dammit. Your tables should have constraints on nullability, string length, FK assignment, etc.
  4. Naming standards are actually important. For example, think about X304E_DESCR. It’s important to be CONSISTENT above all and be descriptive – this is your first line of defense when it comes to documentation. Make your names clear, simple and descriptive – and don’t include metadata in the column names.
  5. Don’t use one table to hold all your domain values. Remember that SQL works in sets, according to normalization rules. If you’re doing joins like “AND customer.TableName = ‘Customer’ AND customer.RelatedToColumn = ‘descr’”, well, jeez… that’s a performance killer, and it will quickly make your queries maddeningly hard to write and update. Remember, tables should represent one ‘thing’ – and one thing only.
  6. Avoid dynamic SQL unless there’s truly no other choice. The same reasoning applies to generic T-SQL objects – such as accepting a table name or a set of column values as a parameter. It seems neat and tidy initially, since one sproc can handle multiple tables – but we don’t think of the performance implications of misusing sp_executesql in this way. Sometimes there’s no shortcut to writing CRUD statements against the entities directly – or using EF.
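To illustrate point 5, here’s a hypothetical sketch (all names invented) of the one-lookup-table-to-rule-them-all anti-pattern versus a proper per-domain table:

```sql
-- Anti-pattern: one generic table holding every domain's values.
-- Every join needs extra string predicates, and the optimizer gets
-- one set of statistics for wildly different data.
-- CREATE TABLE LookupValue (TableName varchar(50), Code varchar(20), Descr varchar(100));

-- Better: one narrow, constrained table per domain. Joins stay simple,
-- and foreign keys can actually protect the data.
CREATE TABLE dbo.OrderStatus (
    OrderStatusCode varchar(20)  NOT NULL PRIMARY KEY,
    Descr           varchar(100) NOT NULL
);

ALTER TABLE dbo.[Order]  -- hypothetical parent table
    ADD CONSTRAINT FK_Order_OrderStatus
    FOREIGN KEY (OrderStatusCode) REFERENCES dbo.OrderStatus (OrderStatusCode);
```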

Looking through this article on the interwebs, I’m thinking a lot of my experience here isn’t unique. Tip of the hat to that blogger and the great books from Celko and the Guru series from Ken Henderson.