Wednesday, October 24, 2012

Tips to get accurate list of user created documents exclude system ones from SharePoint database



There are some use cases and scenarios where you need to retrieve SharePoint site collection information quickly without performance suffer through object model that will iterate all sub-sites and lists. One of the use cases is to evaluate third party search crawling index process accuracy on the site collection. We would like to get the accurate list of all documents exclude system ones in a site collection to compare the crawling result. The key here is to exclude system documents.

There are many different ways including using server object model, client object model, PowerShell, search API, SOAP/REST web service, or 3rd party tools. Another way, and a really quick one, is querying directly the SP content db tables or views. Since reading from a SharePoint DB (with the exception of the logging database) might leave it in an unsupported state indicated in KB article, I would suggest doing this in non-production environment. You could also confirm the script with Microsoft before run on production.

If you read the SP content db schema, you might find several SQL scripts to get list of documents from SharePoint database. Some of them are from SP 2007 that will no longer work and some of them will list many system files/documents. I modified the SQL script and added several enhancements so we could get accurate list of user created documents exclude system ones from SharePoint database without blocking other queries. Here are some tips.

  1. Use Docs view instead of AllDocs table to simplify query
  2. Join Docs view with UserData view to exclude system files/documents 
  3. All (NOLOCK) hint to both Docs and UserData views to prevent performance degradation
  4. Filter out file type to exclude system like files. Example is LeafName LIKE '%.stp' to exclude solution files
  5. Filter out directory type to exclude system like files. Example is filesDirName LIKE '%_catalogs%' as branding files
 Here is the script and you might need to exclude more additional system like files/documents.

SELECT

distinct DirName AS Directory,

UserData.nvarchar7 AS Title,

LeafName AS DocName,

Docs.Size AS Size,

Docs.TimeCreated AS Time

FROM Docs AS Docs WITH (NOLOCK) INNER JOIN UserData(NOLOCK) as UserData ON (Docs.DoclibRowId = UserData.tp_ID) AND Docs.ListId = UserData.tp_ListId

WHERE (DoclibRowId > 0)

AND Docs.ListID = UserData.tp_ListID
AND Docs.DoclibRowId = UserData.tp_ID
AND Docs.Type <> 1
AND (LeafName NOT LIKE '%.stp') 
AND (LeafName NOT LIKE '%.aspx')
AND (LeafName NOT LIKE '%.xfp')
AND (LeafName NOT LIKE '%.dwp')
AND (LeafName NOT LIKE '%template%')
AND (LeafName NOT LIKE '%.inf')
AND (LeafName NOT LIKE '%.css')
AND (LeafName NOT LIKE '%.master')
AND (LeafName NOT LIKE '%.xml')
AND (Docs.Size > 1)
AND (DirName NOT LIKE '%_catalogs%')
AND (DirName NOT LIKE 'personal%')
AND (DirName NOT LIKE '%Style%')
AND (DirName NOT LIKE '%SiteCollectionImages')
AND (DirName NOT LIKE 'mysites%')
AND (DirName NOT LIKE '%Publishing%')
AND (DirName NOT LIKE '%OntolicaStyles%')
AND (DirName NOT LIKE '%Data Connections for PerformancePoint%')
AND (DirName NOT LIKE '%ProjectBICenter/Templates%')
AND (DirName NOT LIKE '%Data Connections%')
AND (DirName NOT LIKE '%Reporting Templates%')
AND (DirName NOT LIKE '%FormServerTemplates%')
AND (DirName NOT LIKE '%ProjectBICenter/Sample Reports%')
AND (DirName NOT LIKE '%Pages%')

ORDER BY Directory


There are some other useful SQL query on users on the site collection and list data you could play around. Please remember to use these on non-production environment and consultant with Microsoft support before execute on production.

No comments:

Post a Comment