SCCM – Management Point Installation Troubleshooting

October 4, 2019   //   Cloud Microsoft System Center

System Center Configuration Manager (SCCM) can be a great endpoint management solution for your on-premises IT infrastructure. Yet as many SCCM admins can attest, the software is quite complex, and there are many subtle places where things can go wrong. In this blog we’ll explore some troubleshooting tips that can be used to diagnose and remediate challenges with the SCCM Management Point (MP) role.

The MP is required for status messages and policy data to be passed between clients and the Primary Site Server. Hence, if your MP isn’t healthy – and especially if you only have one MP – your overall SCCM hierarchy will have “gone gray” (more on this later). To mitigate the risk of individual MP failure, you can collocate the MP role on Distribution Points in different sites; this is a bit less heavy-handed than employing a Secondary Site Server.

MP Signing

Each MP uses a “Server Authentication” certificate to sign its requests. Evidence of a certificate problem can manifest very early in the PXE process while “looking for policy”; it hangs at “Waiting for Approval”.

You’ll see something reported back in the smspxe.log like “RequestMpKeyInformation Send() failed”.

Assuming you do get into WinPE, an expired or recently changed MP certificate can cause “Retrieving policy for this computer…” to time out after you enter the PXE password. In this case, WinPE never presents a list of available Task Sequences, and smsts.log shows error 0x80004005. Note that 0x80004005 is the generic E_FAIL (“Unspecified error”) code rather than a true access-denied code (that would be 0x80070005), so treat it as a symptom and not a diagnosis.
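Since an expired certificate is one of the triggers described above, a quick date check can rule it in or out. The following is a minimal sketch, not a definitive tool: the function names and the 30-day warning window are assumptions made for illustration, and the input is the notAfter text as Python's ssl module reports it.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return the days left before a certificate's notAfter timestamp.

    not_after uses the text form reported by Python's ssl module,
    for example "Oct  4 12:00:00 2020 GMT".
    """
    if now is None:
        now = time.time()
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now) / 86400.0


def describe(not_after, now=None, warn_days=30):
    """Classify a certificate as expired, expiring soon, or ok.

    warn_days is an illustrative threshold, not an SCCM setting.
    """
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        return "expired"
    if remaining < warn_days:
        return "expiring soon"
    return "ok"
```

The notAfter string can be copied from the certificate details, or taken from the dictionary returned by getpeercert() on a validated TLS connection to the MP's IIS site.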

To fix this condition, generate a new certificate and bind it to the IIS website. It is recommended to use Active Directory Certificate Services (ADCS) for this, so the certificate will be automatically trusted throughout the domain.

After switching out the MP certificate, wait 30 minutes for the MP to send a site system update notification from its outbox. Then restart its service and check mpcontrol.log to make sure it is clean and that the MP selected the certificate you intended.

Query the service to confirm it is running.
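One way to script that check is to run the Windows sc query command and read its STATE line. This is a rough sketch under assumptions: the helper names are made up for illustration, and the service name you pass in (SMS_EXECUTIVE is used in the usage note as an assumed example) depends on which service you restarted.

```python
import re
import subprocess


def parse_sc_state(output):
    """Extract the state name (for example RUNNING) from `sc query` output.

    Returns None when no STATE line is present, such as when the
    service does not exist.
    """
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", output, re.MULTILINE)
    return match.group(1) if match else None


def service_state(name):
    """Run `sc query <name>` (Windows only) and return the state or None."""
    result = subprocess.run(
        ["sc", "query", name], capture_output=True, text=True
    )
    return parse_sc_state(result.stdout)
```

On the MP itself, service_state("SMS_EXECUTIVE") would be expected to return "RUNNING" once the restart has completed; anything else is a cue to look at the logs again.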

You can try rebuilding the boot images at this point, which will re-inject the new MP certificate into the WIM, and then retry PXE booting. But in either scenario, you should also check the Monitoring section of the SCCM console. If there’s a health issue beyond just the certificate, it will likely show SMS_MP_CONTROL_MANAGER with a red status.

Assuming a certificate swap doesn’t correct the issue, you must then proceed to repair the MP.

MP Repair

The Management Point is likely broken in some way if a majority (or all) of your clients have “gone gray”, showing an X in their icons. One reason this might happen is if the MP couldn’t reinstall itself after a minor site update (KB) was applied. If the MP is out of service for a longer period, such as when it has been uninstalled and left to sit without the role overnight, all clients will have transitioned to a gray question mark.

CompMon.log shows the MP isn’t accessible because it’s pending reinstall.

The Monitoring section shows more detail.

In this case, investigation revealed that the Primary Site Server didn’t have permissions on the MP, even though its computer account was in the local Administrators group via nesting; the fix was to put the Primary Site Server’s computer object in the group directly and reboot. Then wait and see whether SiteComp can make progress.

Even after doing this, the installation may still be stalled because the MP requires CcmExec to be stopped before it can proceed with its Custom Actions; sometimes the service is busy and unable to stop itself. If that’s the case, MpSetup.log will stop advancing at that point. Manually trigger the service stop or kill the process; MpMsi.log will resume once the installation moves along. Along the way you may see a message like this:

<06/05/19 10:47:17> SqlNativeClient is already installed (Product Code: {9D93D367-A2CC-4378-BD63-79EF3FE76C78}). But to support TLS1.2, a new version with Product Code: {B9274744-8BAE-4874-8E59-2610919CD419} needs to be manually installed.

This message is logged even if you’ve already installed the SQL Server 2012 SP4 Native Client (sqlncli_GDR_11.4.7462.6_x64.msi), which does indeed support TLS 1.2, as per Microsoft’s “TLS 1.2 support for Microsoft SQL Server” article. You should update to this version as a security best practice.

Hopefully your MP will start working again. But if not, there’s one other thing to check on: the SMS Notification Server.

SMS Notification Server

Even after all that, if all clients are still showing a gray X – although the MP appears otherwise healthy – then you probably have a problem with the SMS Notification Server. Note that the above repair only performed an upgrade, not a full uninstall/reinstall. It seems that in rare cases (such as after an in-place OS upgrade or when the computer account properties change significantly in AD) the SMSBGB service’s identity can become corrupted – which is exactly what had happened in this instance. The telltale sign of this bad state is the message:

“Notification Server on MYSCCM.MYDOMAIN.LOCAL failed to initialize. The operating system reported error 2147500058: The server process could not be started because the configured identity is incorrect. Check the username and password.”

Review these logs for additional detail: BgbServer.log, BgbMgr.log, BgbIsapiMsi.log, BgbSetup.log, DmpMsi.log.

You may also see: “Error: Failed to create BgbServerController instance 8000401a”
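The two messages above report the same failure in different notations: 2147500058 in decimal is 8000401a in hexadecimal. A small conversion helper makes it easier to match codes across logs. This is just a sketch; the function name is illustrative.

```python
def to_hresult_hex(code):
    """Format an error code as the 8-digit hex form seen in SCCM logs.

    Accepts unsigned values (2147500058) as well as the signed 32-bit
    form some tools print (-2147467259).
    """
    return format(code & 0xFFFFFFFF, "08x")


print(to_hresult_hex(2147500058))   # 8000401a
print(to_hresult_hex(-2147467259))  # 80004005
```

Searching for the hex form usually gives better results, since that is how Windows error references tend to list these codes.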

This error condition can only be remedied by doing a full uninstall of the MP role. It will warn you that clients will lose the ability to communicate with the server, but since that is already the case when things have “gone gray”, just say yes and commence the removal.

After the MP uninstall finished, it took about 15 more minutes before the BGB deinstall kicked off via SiteComp, so be sure to wait for that thread to exit before attempting to reinstall the role. Restart SiteComp if you are impatient, but you still need to wait for the log to show “SMSBGB deinstall exited with return code 0”.
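Rather than refreshing the log by hand, the wait can be scripted as a simple poll for that marker text. The sketch below makes a few assumptions: the function names, the 30-minute timeout, and the 30-second interval are arbitrary, and the log path is whatever file SiteComp writes to on your site server.

```python
import time


def log_contains(path, marker):
    """Return True if any line of the log file contains marker."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return any(marker in line for line in handle)


def wait_for_marker(path, marker, timeout=1800, interval=30):
    """Poll the log until marker appears or timeout (seconds) elapses.

    Returns True if the marker was found, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if log_contains(path, marker):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Called with the marker “SMSBGB deinstall exited with return code 0”, a True result means it should be safe to move on to the reinstall.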

If you notice that, after uninstalling the role, the service has not been deleted – that’s the problem!

C:\Windows\system32\dllhost.exe /Processid:{ABDD4508-6A7D-4F27-8D91-1496133A3F57}
And the service still won’t start…

While troubleshooting, it was observed that triggering the MP role removal is supposed to delete the service, while a reinstall of the role re-adds it. All this points back to the initial identity error message. To fix the identity, explicitly delete the leftover service after uninstalling the role.

In this instance the service first went into a “disabled” state. After a reboot, it was finally deleted by the system, and the fix could be retried.

Now reinstall the MP role (ensure SMS_NOTIFICATION_SERVER is gone from services.msc before you do).

After the MP itself is reinstalled, first it also removes/reinstalls the DMP sub-role, before finally getting around to BGB.

Finally, after about 15 minutes, BGB was installed correctly, could be restarted as needed, and stayed running!

There may still be some issues getting the CCM agent reinstalled after that, but at this point we’re pretty much out of the MP troubleshooting realm and back to the usual CcmExec troubleshooting.

If stuck with a repeating “MP Certificate Maintenance” for more than 15-30 minutes, first try restarting CcmExec and SmsExec a few times. Be patient and wait another 30 minutes because sometimes the process takes a while to kick off.

If you are still stuck in an endless “MP Certificate Maintenance” loop, reports from other admins suggest running CcmSetup.exe /uninstall on the MP and then attempting the MP role uninstall/reinstall again. That should get you out of it.

Checking MpControl.log will show Status Code 500 internal server error when it is stuck in MP certificate maintenance. When it comes back to health after removing the CCM client, it flips to status code 200 – OK and updates the registry.
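To watch for that 500-to-200 flip without reading the whole file, you can pull out just the status codes. This is a hedged sketch: it assumes only that each relevant log line contains the words “status code” followed by a three-digit number, and the helper names are invented for illustration.

```python
import re

# Matches "status code 200", "Status Code 500", and similar.
STATUS_RE = re.compile(r"status code (\d{3})", re.IGNORECASE)


def status_codes(lines):
    """Return every HTTP status code mentioned in the lines, in order."""
    return [
        int(match.group(1))
        for line in lines
        for match in STATUS_RE.finditer(line)
    ]


def latest_is_healthy(lines):
    """True when the most recent status code in the log is 200."""
    codes = status_codes(lines)
    return bool(codes) and codes[-1] == 200
```

Feeding it the lines of MpControl.log gives a compact history, for example [500, 500, 200] once the MP has recovered.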

We’ve made some progress: the service finally starts and stays running. You’ll see that this is the service listening on TCP 10123 for client notifications about their connectivity status.
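A plain TCP connection attempt is enough to confirm that something is listening on that port. The sketch below uses an illustrative function name and a 3-second default timeout chosen arbitrarily; it only proves the port accepts connections, not that the Notification Server behind it is healthy.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False
```

For example, is_port_open("MYSCCM.MYDOMAIN.LOCAL", 10123) run from a client subnet also helps rule out a firewall blocking the notification channel.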

What you want to see is the MP and Notification Service both running truly healthy and picking up active clients. If you see the message “the queue for BGB server doesn’t exist”, just ignore it if it starts to pick up clients 300 seconds later (this was a transient issue caused by force-killing the service and went away afterwards).

Hopefully this will enable you to get your SCCM Management Points back on track! If you need more help with your System Center infrastructure, SWC Technology Partners has a team of Microsoft Certified Engineers that can provide comprehensive support for all your on-premises and in-cloud Information Technology platforms.