Getting Documents Out of a Sharepoint (MOSS) Database

April 9, 2008

Our migration to the new MOSS farm has gone really well. We had a small hiccup with our CRM integration code, but it was quickly solved and I was able to leave last night a little after 11pm.

So a little background before I get to my utility. Our CRM system contains a custom entity that, on creation, uses a callout to interact with the MOSS web service and create a new site collection that we use to store documents and the like related to that entity. It works really well when MOSS is responding – but last year we had lots of timeout problems, and when that happened the developer wrote a utility to try to create the site again.

It looks like MOSS has somehow become confused about the nature of those sites that were recreated, because they are not browsable in the new farm. Users were getting to them in the old farm, and the documents are all in the database that was copied to the new farm…you just “can’t get there from here.”

Our solution will be to create new sites for those that we know are problems using our existing template and security process, but we needed to get the files out of the database with the folder structure intact so that we could just drag and drop them into the new site shared document lists. Full credit: this is largely taken from a blog post here – the only real change I made was to use a command line argument to select the shared document list for a specific site without having to recompile the program.

Here we go…and I apologize, wordpress “vanilla” doesn’t do a very good job of preserving all my pretty resharper-assisted formatting from Visual Studio.

using System;
using System.Data.SqlClient;
using System.IO;
namespace SharepointFiles
{
class Program
{
static void Main(string[] args)
{
// Connection string to your Sharepoint content database - substitute Server and DB names
string dbConnString = "server=servername;database=WSS_Content_DatabaseName;trusted_connection=true";

// If no command line argument then quit
if(args.Length == 0)
{
Console.WriteLine("ERROR - No Command Line Argument Provided");
return;
}

// The ProgramId is the internal identifier for the site I want to get from the system.
string programId = args[0];

// Set up the SQL Connection and Command
SqlConnection con = new SqlConnection(dbConnString);
con.Open();
SqlCommand cmd = con.CreateCommand();

// Query that will return the file path, name and content from the database - Note the use of the formatted string.
cmd.CommandText =
string.Format(
"SELECT d.DirName, d.LeafName, ds.Content FROM AllDocs d JOIN AllDocStreams ds on d.Id = ds.Id
AND d.Level = ds.Level where d.DirName like 'sites/{0}/shared documents%'
and d.DirName not like '%shared documents/forms%' order by d.DirName, d.LeafName",
programId);

// Execute a datareader for the command
SqlDataReader reader = cmd.ExecuteReader();
while(reader.Read())
{
string dirname = (string) reader["dirname"];
string leafname = (string) reader["leafname"];
// Note that the directory and files will be created from the directory that the exe file is located
if(!Directory.Exists(dirname))
{
Directory.CreateDirectory(dirname);
Console.WriteLine("Creating Directory: " + dirname);
}

// Filestream is used for the file binary data
FileStream fs = new FileStream(dirname + "/" + leafname,FileMode.Create,FileAccess.Write);
BinaryWriter writer = new BinaryWriter(fs);

int bufferSize = 1000000;
long startindex = 0;
long retval = 0;
byte[] outbyte = new byte[bufferSize];

// Create the file
do
{
retval = reader.GetBytes(2, startindex, outbyte, 0, bufferSize);
startindex += bufferSize;
writer.Write(outbyte, 0, (int) retval);
writer.Flush();
} while (retval == bufferSize);

// Finish writing the file
writer.Close();
fs.Close();

Console.WriteLine("Finished writing file: " + leafname);

}

reader.Close();
con.Close();

}
}
}
Advertisements

SharePOINT!!!!!!

April 8, 2008

Oh, how I hate you, Sharepoint. So, so much.

I’ve had nothing but problems with Sharepoint since the 2003 version. And I can’t say that it’s all Sharepoint’s fault. I probably just don’t know what I’m doing wrong, and it seems like we’ve had the three stooges of the Sharepoint contracting community in here to do things.

Our Sharepoint 2003 implementation was actually pretty stable, but for some dumb reason it used our domain account for the SQL Server Agent in some critical areas. It managed to plug along without any real problems, but we never really used it for more than a document repository.

Then along comes MOSS. We decided to upgrade in conjunction with our switch to Microsoft CRM 3 last summer. BIG MISTAKE. Not because of MOSS per-se, but because we couldn’t find anyone competent here locally to help us. One guy essentially disappeared in the middle of the project, and the next one decided to do a reinstall using his own domain account instead of any of the service accounts created for him. In addition, they both managed to completely under-spec the installation, leaving us with a solution that was hobbled due to a bad service account reference AND could hardly crawl its own content.

I found another Sharepoint resource and recommended it to our company as someone who could help reimplement our MOSS farm “for realz,” but the cost ended up being too high for our budget. It was too bad, because I’m certain those folks would have kicked butts and taken names. Luckily we did get some very good training in the process of trying to “smarten ourselves up” about Sharepoint, and we found a local person (a former employee) who was able to do what we needed.

So we have rebuilt our MOSS farm into a 3-server structure – one for front end, one front end for crawling and indexing and a back-end for the database. We have around 120gb of content in MOSS so far and it’ll just keep growing. Hopefully this reimplementation will work out, but as of right now I’m trying to determine if our interfacing systems – including the aforementioned CRM – can cope with the new MOSS structure. Right now CRM seems to be barfing in our custom callout that seemed to work in Staging, so I’m not certain what the problem is. Hopefully the developer will be able to get that fixed so that QA can finish and we can all go to bed…but that may be wishful thinking.

The problem is that I think MOSS is a really cool technology, and if leveraged properly it could really improve our ability to collaborate and interact. I just hope that our most recent iteration will work out – maybe the third time will be the charm.