The joys of importing Excel data into SQL 2008 R2

One of the things I find myself doing every day is teaching apps developers how to do stuff with these principles in mind:

  1. Least privilege
  2. Smallest attack surface

Unfortunately, most apps developers in my workplace:

  1. Are administrators on both their workstations (I'm OK with this) and their development servers (not so OK with this), which means they:
  2. Install heaps of shit they don't need, particularly on the dev server
  3. Configure that shit to run under their own administrative user account (!!!!)
  4. Forget they've installed that shit
  5. After a period of time, don't remember if that shit is necessary for their code to run
  6. Install all the server-side components of their client/server app onto one single dev server.  This blurs the lines.  Is it the web part of the app that performs task X, or is it the SQL part?  Who knows.  The apps developers don't.

So when the time comes to provision their production environment, I get requests like this:

  1. Install Excel on the SQL box and the web server box, and actually any other box that our app uses, because we can't remember which box actually needs it
  2. Enable xp_cmdshell on the SQL box.  We're not really sure why, but we need it.
  3. And while you're at it, we need you to create scheduled tasks on all the boxes (because we can't remember what box does what), and that task needs admin rights because otherwise it doesn't work.  But we don't really know why it doesn't work.

I wish I was making this shit up.  But this is what I've spent the last week extracting from a particular group of people.  Of course, when I say NO to this sort of crap, it becomes my problem to come up with an alternative that uses least privilege, whilst installing the smallest possible amount of stuff to make it work.

I'll write a separate post about the xp_cmdshell thing, but for now I want to deal with the Excel thing.  I don't believe that any SQL server should have anything installed on it aside from SQL.  Excel is a user app, not a server app.  And I don't believe for a second that SQL "needs" it to import Excel data.

So.  Getting onto the heart of this post.  How do we get that Excel data into a SQL server database?

It would be nice if something like this would just work.  It doesn't:

SELECT * INTO XLImport3 FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=c:\downloads\ImportTest-XLS.xls;Extended Properties=Excel 8.0')...[Sheet1$]

This code was snaffled from a MS article, BTW, http://support.microsoft.com/kb/321686.  The article also makes reference to linked servers, but for my purposes I just want to run ad hoc queries on transient spreadsheets.

We get this error message:

Msg 7308, Level 16, State 1, Line 1
OLE DB provider 'Microsoft.Jet.OLEDB.4.0' cannot be used for distributed queries because the provider is configured to run in single-threaded apartment mode.

 

Hmm.  WTF does that mean?  Believe it or not, it's not because you need to do something about apartment mode, whatever the hell that is.  It's actually SQL's obscure way of complaining about the data source driver.  For 2007 and 2010 versions of Excel files, you need to install a provider that knows how to deal with these files.  By default, SQL 2008 R2 doesn't have one:

 

 

Get a hold of the Microsoft Access Database Engine 2010 Redistributable (don't worry about the "Access" in the title, just download it).  Make sure you get the 64-bit version if you're running SQL 2008 R2.

Install it.  If you get this message, it's because you have a 32-bit installation of Office or one of its components.  Naughty!  Uninstall it.

Uninstalling the Office components might require a reboot.  My experience with the Access driver is that you'll need to reboot your server again (or possibly just restart the SQL services – I haven't tried this in my test environment yet).  So if you're doing this on a production box, be sure to schedule an outage.

You'll see a new provider in the list:

Let's try running that TSQL code again:

SELECT * INTO XLImport3 FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=c:\downloads\ImportTest-XLS.xls;Extended Properties=Excel 8.0')...[Sheet1$]

We get the same error message again because now we are using the wrong provider.  Let's change that:

SELECT * FROM OPENROWSET(
'Microsoft.ACE.OLEDB.12.0',
'EXCEL 12.0;Database=c:\downloads\ImportTest-XLS.xls;HDR=YES;IMEX=1',
'SELECT * FROM [sheet1$]');

Now we get another error message, though thankfully less obscure:

Msg 15281, Level 16, State 1, Line 1
SQL Server blocked access to STATEMENT 'OpenRowset/OpenDatasource' of component 'Ad Hoc Distributed Queries' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'Ad Hoc Distributed Queries' by using sp_configure. For more information about enabling 'Ad Hoc Distributed Queries', see "Surface Area Configuration" in SQL Server Books Online.

 

OK.  Let's enable that, shall we?

sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
GO

Let's try again:

Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "(null)" reported an error. Access denied.
Msg 7350, Level 16, State 2, Line 1
Cannot get the column information from OLE DB provider "Microsoft.ACE.OLEDB.12.0" for linked server "(null)".

Oh for Christ's sake.  What now?

There are some things that need to be enabled on the provider itself:

USE [master]
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'DynamicParameters', 1
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'NestedQueries', 1
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'NonTransactedUpdates', 1
GO

Most articles I've seen recommend just the first two settings, but in all my experiments I just couldn't get it to work.  I found an article that mentioned the last two settings and had success with that.  Your mileage may vary.

Let's try it again!

SELECT * FROM OPENROWSET(
'Microsoft.ACE.OLEDB.12.0',
'EXCEL 12.0;Database=c:\downloads\ImportTest-XLS.xls;HDR=YES;IMEX=1',
'SELECT * FROM [sheet1$]');

Holy snapping duckshit!  It WORKS!

First name Last Name
Firstname1 Lastname1
Firstname1 Lastname2
Firstname3 Lastname3

You'll also be able to read .XLSX files:

SELECT * FROM OPENROWSET(
'Microsoft.ACE.OLEDB.12.0',
'EXCEL 12.0;Database=c:\downloads\ImportTest-XLSX.xlsx;HDR=YES;IMEX=1',
'SELECT * FROM [sheet1$]');

And CSV files, though I suspect you'd be able to do this without the driver:

SELECT * FROM OPENROWSET(
'Microsoft.ACE.OLEDB.12.0',
'Text;Database=C:\Downloads\;HDR=Yes;',
'SELECT * FROM [ImportTest-CSV.csv]');

Note the syntax for a CSV is slightly different.  Also note that in all of them, the last parameter makes reference to the sheet name.  So make sure you've got that last part right.

So, in summary:

  1. Remove any existing Office components and reboot.
  2. Install the Microsoft Access Database Engine 2010 Redistributable and reboot
  3. Run the following TSQL commands:
USE [master]
GO

sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
GO

EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'DynamicParameters', 1
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'NestedQueries', 1
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'NonTransactedUpdates', 1
GO

 

And you are good to go!!!!!!!!!!!!!!!!  laugh

 

Now that you can actually read the damn files, you can import them into a table:

SELECT * INTO SomeTable
FROM OPENROWSET(
'Microsoft.ACE.OLEDB.12.0',
'EXCEL 12.0;Database=c:\downloads\ImportTest-XLS.xls;HDR=YES;IMEX=1',
'SELECT * FROM [sheet1$]');

Note that the only change in the query is the "select into" part.  The rest is the same, and this applies to the XLSX or CSV import.

The SELECT INTO will overwrite your table, so if the plan is to add rows to it, use this sort of query:

INSERT INTO SomeTable
SELECT * FROM
OPENROWSET(
'Microsoft.ACE.OLEDB.12.0',
'EXCEL 12.0;Database=c:\downloads\ImportTest-XLSX.xlsx;HDR=YES;IMEX=1',
'SELECT * FROM [sheet1$]');