Friday, September 14, 2012

Attivio SharePoint 2010 Search: What You Need to Know as a SharePoint Architect and Administrator



As part of an enterprise search initiative, our company started evaluating the Attivio search engine, which is meant to add world-class content analysis, search, and navigation. Attivio's Active Intelligence Engine™ (AIE) indexes content and metadata, performs advanced linguistic and contextual analysis, and delivers permission-aware, relevance-ranked search and content navigation. One of our search integration targets is SharePoint 2010. Because there are few architecture diagrams or documents describing how Attivio integrates with SharePoint and how the integration affects SharePoint, this post covers the Attivio architecture, the Attivio SharePoint connector, and the limitations of the integration, so you can use it as a reference when managing Attivio with SharePoint.

The Attivio architecture has three major layers: the Endpoint API layer (ingestion), which crawls all content; the Universal Index layer, which builds the indexes; and the Query API layer, which exposes search. Other services include ingestion services and asynchronous workflows that cleanse and enrich content before it is persisted in the Universal Index, system services for backup and logging, and a Transport Layer that enables workflow communication and distribution across one or many nodes. The detailed architecture diagram is shown below.




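To make the layered data flow concrete, here is a toy Python model of it: documents enter through an ingestion step, pass through cleansing and enrichment workflow steps, are persisted in an index, and are then queried. It is only a sketch; Document, cleanse, enrich, ToyIndex and ingest are hypothetical names, not Attivio APIs, and the real engine adds security trimming, linguistics, and distribution across nodes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

# Toy model of the layered flow only; none of these names are Attivio APIs.


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


Step = Callable[[Document], Document]


def cleanse(doc: Document) -> Document:
    # Stand-in cleansing step: collapse runs of whitespace.
    return Document(doc.doc_id, " ".join(doc.text.split()), dict(doc.metadata))


def enrich(doc: Document) -> Document:
    # Stand-in enrichment step: record a word count as metadata.
    meta = dict(doc.metadata, word_count=str(len(doc.text.split())))
    return Document(doc.doc_id, doc.text, meta)


class ToyIndex:
    """Inverted index standing in for the Universal Index layer."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}
        self.postings: Dict[str, Set[str]] = {}

    def persist(self, doc: Document) -> None:
        self.docs[doc.doc_id] = doc
        for term in doc.text.lower().split():
            self.postings.setdefault(term, set()).add(doc.doc_id)

    def query(self, term: str) -> List[str]:
        # Stand-in for the Query API layer: sorted ids containing the term.
        return sorted(self.postings.get(term.lower(), set()))


def ingest(docs: Iterable[Document], workflow: List[Step], index: ToyIndex) -> None:
    # Stand-in for the ingestion layer: run every workflow step, then persist.
    for doc in docs:
        for step in workflow:
            doc = step(doc)
        index.persist(doc)


index = ToyIndex()
ingest([Document("a", "Quarterly   report"), Document("b", "project report draft")],
       [cleanse, enrich], index)
print(index.query("Report"))  # ['a', 'b']
```

Keeping the workflow as a plain list of steps mirrors the idea that cleansing and enrichment happen before anything reaches the index.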
The Attivio SharePoint connector supports all SharePoint lists, including document libraries, calendars, tasks, issues, and discussion boards, as well as all other SharePoint objects, and it provides read/write capabilities. It also gives users access to all site collections in a farm, including subsites.

Installing the Attivio SharePoint connector is very simple: a single WSP solution named entropysoft.sharepoint.webservice.wsp is deployed to the SharePoint farm. Because there is very little documentation on how the connector works, we will dig into which components are deployed so we can understand how it works.

After the Attivio SharePoint connector is installed, the following changes are made to the SharePoint farm:


1. One farm solution named entropysoft.sharepoint.webservice.wsp, deployed globally
2. Four DLL files deployed to the global assembly cache (GAC):
  • Entropysoft.Sharepoint.WebService
  • Entropysoft.WebConfModif
  • log4net.dll
  • Microsoft.Web.Services2.dll
3. Two web service files deployed to the ISAPI folder:
  • sharepointConnector.asmx
  • sharepointConnectorwsdl.aspx
4. One web service entry, shown below, added to the configuration sections of all web.config files:


  <location path="_vti_bin/sharepointConnector.asmx">
    <system.web>
      <authorization>
        <deny users="?" />
      </authorization>
      <webServices>
        <soapExtensionTypes>
          <add type="Entropysoft.Sharepoint.Webservice.ExceptionSoapExtension, Entropysoft.Sharepoint.Webservice, Version=4.5.91.0, Culture=neutral, PublicKeyToken=08ab0f4d3c6ea37b" priority="2" group="0" />
          <add type="Microsoft.Web.Services2.WebServicesExtension, Microsoft.Web.Services2,Version=2.0.3.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" priority="1" group="0" />
        </soapExtensionTypes>
      </webServices>
    </system.web>
  </location>
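If you want to confirm this web.config change on each web front end, a short Python script can parse the file and look for the connector's location entry. This is a minimal sketch that assumes the elements carry no XML namespace; connector_entry_status is a hypothetical helper name, and it only inspects the elements shown in the entry above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Location path of the connector entry, as it appears in web.config.
CONNECTOR_PATH = "_vti_bin/sharepointConnector.asmx"


def connector_entry_status(web_config: Union[str, bytes]) -> Dict[str, object]:
    """Report whether the connector's <location> entry is present and complete.

    Pass the raw file contents; bytes are safest because the parser then
    honours the XML declaration and any byte-order mark.
    """
    root = ET.fromstring(web_config)
    found = deny_anonymous = False
    extensions: List[str] = []
    for loc in root.iter("location"):
        if loc.get("path") != CONNECTOR_PATH:
            continue
        found = True
        # <deny users="?" /> is what blocks anonymous callers.
        deny_anonymous = deny_anonymous or any(
            d.get("users") == "?" for d in loc.iter("deny"))
        for types in loc.iter("soapExtensionTypes"):
            for add in types.findall("add"):
                # Keep the class name; drop the assembly-qualified suffix.
                extensions.append(add.get("type", "").split(",")[0].strip())
    return {"location": found, "deny_anonymous": deny_anonymous,
            "soap_extensions": extensions}
```

Point it at the web.config of each web application (by default under C:\inetpub\wwwroot\wss\VirtualDirectories\<port>, though your paths may differ), reading the file in binary mode. A complete entry reports both SOAP extension classes.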



After installing the connector, you can verify it through its web service at http://<servername>/<sites>/_vti_bin/sharepointConnector.asmx. You can view the WSDL by appending ?wsdl to the URL, as you would for any other SharePoint web service.
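The check can also be scripted. This Python sketch builds the .asmx and ?wsdl URLs from a site URL and requests the WSDL anonymously. Because the web.config entry denies anonymous users and Python's standard library does not handle Windows (NTLM/Kerberos) authentication, a 401 is the likely answer and only shows that the server wants credentials. The helper names connector_urls, interpret_status and probe are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Tuple

# Path of the connector's web service, as given in this post.
CONNECTOR_ASMX = "_vti_bin/sharepointConnector.asmx"


def connector_urls(site_url: str) -> Tuple[str, str]:
    """Return the .asmx endpoint URL and its ?wsdl URL for a site URL."""
    asmx = f"{site_url.rstrip('/')}/{CONNECTOR_ASMX}"
    return asmx, asmx + "?wsdl"


def interpret_status(code: int) -> str:
    # Rough reading of an anonymous probe's HTTP status code.
    if code == 200:
        return "WSDL returned"
    if code == 401:
        return "server requires authentication; retry with Windows credentials to confirm"
    if code == 404:
        return "not found; the connector files may not be deployed on this server"
    return f"unexpected HTTP status {code}"


def probe(site_url: str, timeout: float = 10.0) -> str:
    """Request the WSDL anonymously. Network errors (e.g. DNS failures) propagate."""
    _, wsdl_url = connector_urls(site_url)
    try:
        with urllib.request.urlopen(wsdl_url, timeout=timeout) as resp:
            code = resp.status
    except urllib.error.HTTPError as err:
        code = err.code
    return interpret_status(code)
```

For example, connector_urls("http://server/sites/hr") (a hypothetical site) gives the .asmx and ?wsdl URLs to open. Treat probe() as a reachability check only, and confirm the WSDL itself in a browser signed in with a domain account.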


If you grant a service account read access to all site collections in the web application through Central Administration (for example, with a Full Read web application user policy) and give Attivio only the web application's root site collection URL, the SharePoint-side installation and configuration is complete. The Attivio crawler calls the GetSiteCollectionsUrls web method to retrieve all site collections in the web application, then calls other web methods to index all content and metadata in each site collection. After the first full crawl, the connector uses the GetChanges web method to index subsequent changes.
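The crawl sequence (enumerate site collections, run a full crawl, then pull only changes) can be sketched in Python as follows. Only the web method names GetSiteCollectionsUrls and GetChanges come from the connector; the ConnectorClient wrapper, its parameters and return types, the get_all_items method, and the per-site change token are assumptions for illustration. FakeClient is an in-memory stand-in so the sketch runs without a farm.

```python
from typing import Dict, List, Protocol, Tuple


class ConnectorClient(Protocol):
    """Hypothetical wrappers around the connector's web methods.

    Only GetSiteCollectionsUrls and GetChanges are real method names;
    parameters, return types and get_all_items are assumptions.
    """

    def get_site_collections_urls(self, webapp_url: str) -> List[str]: ...
    def get_all_items(self, site_url: str) -> Tuple[List[str], str]: ...
    def get_changes(self, site_url: str, token: str) -> Tuple[List[str], str]: ...


def crawl(client: ConnectorClient, webapp_url: str,
          tokens: Dict[str, str], index: List[str]) -> Dict[str, str]:
    """Full crawl for unseen site collections, incremental crawl for the rest."""
    for site in client.get_site_collections_urls(webapp_url):
        if site in tokens:
            items, token = client.get_changes(site, tokens[site])
        else:
            items, token = client.get_all_items(site)
        index.extend(items)
        tokens[site] = token  # where the next GetChanges-style call starts
    return tokens


class FakeClient:
    """In-memory stand-in so crawl() can run without a SharePoint farm."""

    def __init__(self) -> None:
        self.items = {"http://webapp": ["home.aspx"],
                      "http://webapp/sites/hr": ["policy.docx"]}
        self.pending: Dict[str, List[str]] = {}
        self.version = 0

    def get_site_collections_urls(self, webapp_url: str) -> List[str]:
        return sorted(self.items)

    def get_all_items(self, site_url: str) -> Tuple[List[str], str]:
        return list(self.items[site_url]), f"v{self.version}"

    def get_changes(self, site_url: str, token: str) -> Tuple[List[str], str]:
        return self.pending.pop(site_url, []), f"v{self.version}"


client, index = FakeClient(), []
tokens = crawl(client, "http://webapp", {}, index)        # first (full) crawl
client.pending["http://webapp/sites/hr"] = ["policy-v2.docx"]
client.version = 1
crawl(client, "http://webapp", tokens, index)             # incremental crawl
print(index)  # ['home.aspx', 'policy.docx', 'policy-v2.docx']
```

Re-enumerating site collections on every run is a design choice of this sketch: a site collection created after the first crawl has no token yet, so it gets a full crawl while existing ones are crawled incrementally.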


As a SharePoint administrator, you may be concerned about the performance impact on the farm, especially during the first full crawl. Test the crawl's performance in a non-production environment first, and schedule it outside business hours in production.

You should now feel comfortable managing the installation, configuration, and support of the Attivio SharePoint 2010 connector. We will look at some of the issues in the next post.

