SharePoint 2013 Web Applications not creating on all servers in the farm, farm solutions not deploying…


I figured this was worth sharing after coming across some rather odd behaviour while provisioning a new SharePoint farm.

Now before I start, I will say that it's not me who builds and configures the servers.

We use a fully scripted setup via PowerShell for our SharePoint farm provisioning process: it configures a farm, joins servers to it, sets up our services, configures the search topology, and gets everything the way we want it for our system. In this particular environment the other servers all joined the farm without errors, but the web application in IIS was only being created on one of the three WFE servers in the farm, rather than on all of them. The server that did get the IIS site created was the one where the PowerShell script was being run.
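For context, the web application creation step in the script is just the standard cmdlet, something along these lines (every name, account, and URL below is a placeholder, not our real configuration):

```powershell
# Hedged sketch: create a web application from one server in the farm.
# SharePoint is then meant to provision the matching IIS site on the
# other farm servers itself.
New-SPWebApplication -Name "Intranet" `
    -Port 80 `
    -HostHeader "intranet.contoso.com" `
    -ApplicationPool "IntranetAppPool" `
    -ApplicationPoolAccount (Get-SPManagedAccount "CONTOSO\spAppPool") `
    -DatabaseName "SP_Content_Intranet" `
    -AuthenticationProvider (New-SPAuthenticationProvider)
```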

I know from experience that SharePoint creates its IIS web applications on the other servers in the farm via a "Web Application Provisioning" timer job that runs on each server joined to the farm, so the problem had to be something to do with this job not running on all the servers. Usually a web application appears on all WFEs simultaneously, or at least within a minute or so of issuing the PowerShell commands from one server to create it.
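If you want to check that job from PowerShell, the timer job definitions can be listed on each server; this is a hedged sketch, and the exact job name can vary between builds:

```powershell
# List timer jobs whose name mentions provisioning, with their last run time.
Get-SPTimerJob |
    Where-Object { $_.Name -like "*provisioning*" } |
    Select-Object Name, LastRunTime |
    Format-Table -AutoSize

# And confirm the timer service itself is running on the box.
Get-Service SPTimerV4
```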

So I checked the event logs, looking for errors from the timer service on the other WFE servers: nothing. I looked into the ULS logs for errors from OWSTIMER.EXE: nothing. I scratched my head and then slept on the problem. The next morning I was none the wiser, but checking the problematic environment I noticed that the missing web applications had appeared in IIS while I was away overnight.

Hmm. I put this one down to experience and moved on, thinking nothing more of it until we tried to deploy some custom farm solutions, again via PowerShell, on the same farm. On the WFE server the PowerShell was running on, the WSPs deployed straight away, DLLs were put in the GAC, etc.; on the other two servers nothing was deployed to the GAC, and the farm solutions were stuck at the deploying phase…

WSPs are again deployed to all servers via a timer job process, so thinking about the two issues, I started to wonder what the commonality between them was.
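You can watch a solution get stuck this way from PowerShell; the properties below are standard on the SPSolution objects the cmdlet returns:

```powershell
# Show each farm solution, whether it is deployed, and its current state.
# A solution stuck at the deploying phase sits with a deployment job
# that never completes.
Get-SPSolution |
    Select-Object Name, Deployed, DeploymentState, LastOperationResult |
    Format-Table -AutoSize
```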

It was at this point, as I switched from server to server via Remote Desktop Connection Manager, that I spotted the problem: the time on the Windows desktop was different between servers. Somehow the time zone had been mismatched during the server builds between the WFE server the PowerShell scripts were running on and the other two WFE servers; the other two WFEs were two hours behind, on UTC+8 rather than UTC+10.
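An easier way to compare time zones across the farm than RDPing to each box is a quick remoting one-liner (the server names are placeholders, and this assumes PowerShell remoting is enabled):

```powershell
# Query the configured time zone on each farm server.
$servers = "WFE01", "WFE02", "WFE03"
Invoke-Command -ComputerName $servers -ScriptBlock {
    "{0}: {1}" -f $env:COMPUTERNAME, [TimeZoneInfo]::Local.Id
}
```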

I adjusted the time zone of all the servers to match, and then the WSP files started deploying correctly and the IIS web sites were created without issue.

So it looks to me that job scheduling via the timer service doesn't use UTC; it uses local server time to decide when to run a job. If your servers have been configured with different time zones, you will get some strange behaviour from SharePoint timer jobs. A nice little thing to add to the farm server build QA checklist: ensure the servers are all set to the same time zone. :-)
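To see why a time zone mismatch turns into a two-hour delay, note that the same local wall-clock time maps to different UTC instants on differently configured servers (the zone IDs below are just examples of a UTC+8 and a UTC+10 zone with no DST):

```powershell
# "02:00 local" on a UTC+8 server is two hours later, in real terms,
# than "02:00 local" on a UTC+10 server.
$local = [DateTime]::SpecifyKind([DateTime]"2015-01-10 02:00:00", "Unspecified")
$utc8  = [TimeZoneInfo]::FindSystemTimeZoneById("W. Australia Standard Time")  # UTC+8
$utc10 = [TimeZoneInfo]::FindSystemTimeZoneById("E. Australia Standard Time")  # UTC+10

# The difference comes out as a two-hour TimeSpan.
[TimeZoneInfo]::ConvertTimeToUtc($local, $utc8) - [TimeZoneInfo]::ConvertTimeToUtc($local, $utc10)
```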

Thanks for reading.

SharePoint 2013 Configuration Database Registry Key Corruption


Not sure if you've ever suffered from this problem, but if you have, it can be puzzling to work out what is causing it.

From what I've found out, versions of SharePoint 2013 before the November 2014 CU have an issue that can cause the following registry key to become corrupt.

HKLM\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\15.0\secure\ConfigDB
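On a healthy server, this key's dsn value holds the full connection string, so a quick check from PowerShell looks like this:

```powershell
# Read the config DB connection string from the registry. On a corrupt
# server this value is truncated rather than containing the full
# "Data Source=..." string.
$key = "HKLM:\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\15.0\Secure\ConfigDB"
(Get-ItemProperty -Path $key).dsn
```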

This causes your WFE or Application server to then act as though it has been removed from the SharePoint farm.

Any attempt to run PSConfig.exe (via PowerShell) or the Products and Technologies Configuration Wizard to join the server back to the farm ends up with an error message, either on the console or in the PSCDiagnostics file in your logs folder, with a stack trace that looks similar to:

System.ArgumentNullException: Value cannot be null.
Parameter name: server
at Microsoft.SharePoint.Administration.SPSqlConnectionString.GetSqlConnectionString(String server, String failoverServer, String instance, String database, String userName, String password, String connectTimeout)
at Microsoft.SharePoint.Administration.SPSqlConnectionString..ctor(String connectionString)
at Microsoft.SharePoint.PostSetupConfiguration.InitializeTask.GetSetupType()
at Microsoft.SharePoint.PostSetupConfiguration.InitializeTask.Validate(Int32 nextExecutionOrder)
at Microsoft.SharePoint.PostSetupConfiguration.TasksQueue.Validate(Boolean useDefaultExecutionOrder)

What is happening here is that the details of the connection string to the config DB are lost, and the key in the registry ends up with just a "Connect Timeout=XXX" value rather than the full string, which should read something like "Data Source=SQLServer;Initial Catalog=SP_Config;Integrated Security=True;Enlist=False;Pooling=True;Min Pool Size=0;Max Pool Size=100;Connect Timeout=45".

I believe the issue is related to a non-thread-safe dictionary object being used to rebuild the connection string; this rebuild process can occur during an app pool recycle.

There appears to be a fix in the November 2014 CU that resolves the problem, so if you are experiencing issues with the registry key becoming corrupt, you have two options.

Either deploy the November 2014 CU (or later), or, if this isn't feasible due to the complexities of updating multiple production servers, you can remove the write permissions of all users on the registry key, so that they can read it but not update it.
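As a hedged sketch of that lock-down (you could equally do it through regedit's Permissions dialog), a deny rule on value writes might look something like this; test it carefully before touching a production server:

```powershell
# Deny everyone the right to set values under the ConfigDB key,
# while leaving read access intact.
$key = "HKLM:\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\15.0\Secure\ConfigDB"
$acl = Get-Acl -Path $key
$rule = New-Object System.Security.AccessControl.RegistryAccessRule(
    "Everyone",
    [System.Security.AccessControl.RegistryRights]::SetValue,
    [System.Security.AccessControl.AccessControlType]::Deny)
$acl.AddAccessRule($rule)
Set-Acl -Path $key -AclObject $acl
```

Remember to remove the deny rule before patching or re-running PSConfig, since setup itself needs to write to this key.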

This latter step is a workaround, and may cause issues if you attempt to remove and re-add the server to another SharePoint farm in the future, but it will prevent the issue from recurring until you can update your SharePoint 2013 platforms with the appropriate CU or later.

Thanks for reading.