An Adventure in Troubleshooting the Intune Connector

Note: The procedures below are also published here in a more step-by-step format (less backstory/discussion): http://social.technet.microsoft.com/wiki/contents/articles/29847.troubleshooting-the-intune-connector-for-system-center-configuration-manager.aspx

 

Recently, I was troubleshooting a device enrolment problem in my test environment. I had been trying to enrol an Android device into Intune; enrolment completed on the device but I couldn't get the device to appear in Intune. I followed the usual steps I walk through to troubleshoot the Intune Connector on Configuration Manager, but to no avail. I started to think it wasn't my environment and went down a typical "if it’s not me, then it must be you" route of thought. It turns out it was my environment, and it was replication related (but not DRS). Read on to find out more.

The backstory

I had a hardware failure on my Hyper-V server over the holiday that caused some corruption in my VMs (it was a weird issue where the disks in a Storage Pool slowly failed but while failing bounced up and down). Luckily, I had a backup and proceeded to restore the impacted ConfigMgr primary site and that fixed things (or so I believed at the time). Everything was working just fine in the environment - objects were replicating up in the environment and I was doing some testing of some Android scenarios. I could get object metadata to flow into Intune and appear on the device so I assumed all was well.

I then tried to enrol an Android device. That worked just fine, but the device didn't appear in the console. I assumed that I was being impatient, walked off and got busy with my day job and life in general. I came back, found it still wouldn't enrol, and did more troubleshooting.

The troubleshooting - the device

1. I began by un-enrolling the device, waiting for the deletion to replicate and settle, and then re-enrolling it.

2. I then tried enrolling various Android devices to isolate whether it was a device issue…nothing worked.

3. I then tried enrolling Windows Phones to see if there was a platform-specific issue. That didn't work either.

The troubleshooting - the connector

At this stage, I needed to move on to the Intune Connector. I knew that it was likely a download issue, as I could get objects into the service.

1. Checked dmpdownloader.log to see if the Intune Connector's download component was able to pull messages from the service. That looked just fine, as I’d expect it to be. Messages like the following were the last thing in the log:

Received 1 messages  SMS_DMP_DOWNLOADER 23/01/2015 15:38:45 5912 (0x1718)

Received 3 messages  SMS_DMP_DOWNLOADER 23/01/2015 15:48:53 3784 (0x0EC8)

Received 3 messages  SMS_DMP_DOWNLOADER 23/01/2015 16:49:31 2836 (0x0B14)

Received 3 messages  SMS_DMP_DOWNLOADER 23/01/2015 17:55:15 5964 (0x174C)

2. Checked CloudUserSync.log (authorizes users to enrol devices in Intune) and OutgoingContentManager.log (uploads application files to Intune). They shouldn’t have had anything to do with the problem, but I looked at them as a sanity check.

3. Checked dmpuploader.log to see if the Intune Connector's upload component was able to send messages to the service. Again, this shouldn’t have been the problem but I just needed to rule it out. This also looked just fine, as I’d expect it to be. Messages like the following were the last thing in the log:

StartUpload for replication group CloudDmp last sync version 12170413 ...     SMS_DMP_UPLOADER 23/01/2015 16:55:32 6656 (0x1A00)

Startload succeeded with transmission ID a36266f9-7663-4625-be03-71f9f89b6a43  SMS_DMP_UPLOADER 23/01/2015 16:55:33 6656 (0x1A00)

Expecting sync data or sync end message, however message type is DRS_SyncPing  SMS_DMP_UPLOADER 23/01/2015 16:55:34 6656 (0x1A00)

EndUpload transmission a36266f9-7663-4625-be03-71f9f89b6a43 final data version 12170518 succeeded      SMS_DMP_UPLOADER 23/01/2015 16:55:35 6656 (0x1A00)

Found sync start for replication group CloudDmp   SMS_DMP_UPLOADER 23/01/2015 18:00:35 6656 (0x1A00)

StartUpload for replication group CloudDmp last sync version 12170518 ...     SMS_DMP_UPLOADER 23/01/2015 18:00:35 6656 (0x1A00)

Startload succeeded with transmission ID cee9ce04-3c6d-4f24-97fb-c66355250d31  SMS_DMP_UPLOADER 23/01/2015 18:00:36 6656 (0x1A00)

Expecting sync data or sync end message, however message type is DRS_SyncPing  SMS_DMP_UPLOADER 23/01/2015 18:00:37 6656 (0x1A00)

EndUpload transmission cee9ce04-3c6d-4f24-97fb-c66355250d31 final data version 12170624 succeeded      SMS_DMP_UPLOADER 23/01/2015 18:00:38 6656 (0x1A00)
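Eyeballing the tail of these two logs works, but the same checks can be sketched in a few lines of Python. This is only a rough illustration: it assumes the lines are laid out exactly as pasted in this post (message, component, date, time, thread), which is how a log viewer copies them and not necessarily how the raw files look on disk. The function names are made up for the example.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts in this post:
#   <message>  <component> dd/mm/yyyy hh:mm:ss <thread> (<thread hex>)
LINE_RE = re.compile(
    r"^(?P<message>.*?)\s+(?P<component>SMS_[A-Z_]+)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<thread>\d+)\s+\(0x[0-9A-Fa-f]+\)\s*$"
)
# "Startload" is spelled the way it appears in the dmpuploader.log excerpt.
START_RE = re.compile(r"Startload succeeded with transmission ID ([0-9a-f-]{36})")
END_RE = re.compile(r"EndUpload transmission ([0-9a-f-]{36}) final data version (\d+) succeeded")


def parse_line(line):
    """Split one pasted log line into its fields, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(m["date"] + " " + m["time"], "%d/%m/%Y %H:%M:%S")
    return {"message": m["message"], "component": m["component"], "when": when}


def download_summary(lines):
    """Return (total messages received, time of the last 'Received N messages' entry)."""
    total, last = 0, None
    for entry in filter(None, map(parse_line, lines)):
        received = re.match(r"Received (\d+) messages", entry["message"])
        if entry["component"] == "SMS_DMP_DOWNLOADER" and received:
            total += int(received.group(1))
            last = entry["when"]
    return total, last


def unfinished_uploads(lines):
    """Return (transmission IDs started but never ended, {ended ID: final data version})."""
    started, finished = [], {}
    for entry in filter(None, map(parse_line, lines)):
        start = START_RE.search(entry["message"])
        end = END_RE.search(entry["message"])
        if start:
            started.append(start.group(1))
        elif end:
            finished[end.group(1)] = int(end.group(2))
    return [t for t in started if t not in finished], finished
```

If the downloader's last entry is recent and the uploader has no unfinished transmissions, the connector itself is probably not where the problem lies, which is the conclusion I reached by reading the logs.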

The troubleshooting - the site

At this stage, I’d ruled out the Intune Connector installed on my CAS and started to get worried that something was wrong in my environment (“It’s not you, it’s me”). :)

  1. I took a look at DRS but this looked fine under Monitoring > Database Replication.
  2. I decided to run Replication Link Analyzer on each site locally with local admin permissions on SQL Server and the Site Server. That returned nothing wrong on either site.
  3. I then decided to ensure that my primary site could process the messages, so I walked through the various log files:
     1. ddm.log (processes the incoming device registration and gives the initial basic information you see in the console)
     2. dataldr.log (processes the incoming hardware inventory being sent by the service via the connector)
     3. statsys.log (processes the incoming state messages sent by the service via the connector that gives information about device compliance and app deployment)
     4. mpfdm.log (at the primary site processes and routes messages from the connector to components)
     5. hman.log (at the CAS (if you have one) routes messages to each device’s assigned primary site for already enrolled clients or to the site configured in the Connector for new devices)
     • All of the above checked out 100% and appeared as you’d expect on a normal site.

At this stage, all the usual stuff looked fine. DRS replication was healthy after the recovery and all my components worked just fine. However, I did notice that one of my secondary sites was stuck setting itself up post recovery (I ignored this). It was now time to troubleshoot file replication.

File what?!

Background on the Intune Connector

Now, you might ask… “I thought all data moves using DRS in 2012? Why are you troubleshooting file replication!?” It does… but remember that Site Data flows up the hierarchy to the CAS and is owned by the primary site. The primary site owns the device records in hybrid MDM. In order for the data to be stored in the database and for DRS to replicate it, we need a primary site to process it first. To do that, the CAS (or site hosting the Intune Connector) needs to replicate the messages from the service to a primary site. Which primary site receives them depends on the following:

A. For new devices: messages are sent to the site configured in the Intune Connector, and those devices then become assigned to that site.

B. For already registered devices: messages for that device are sent to the site to which the device is registered. 
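The two rules above can be written down as a tiny lookup. This is only an illustration of the routing rule as described here, not how Configuration Manager implements it; the function name, the dictionary of assignments and the site codes are all hypothetical.

```python
def target_site(device_id, assigned_sites, connector_site):
    """Pick the primary site that should receive messages for a device.

    assigned_sites maps device IDs to the site code they are registered to
    (hypothetical structure for this sketch); connector_site is the site
    configured in the Intune Connector.
    """
    # Rule B: an already registered device goes to its assigned site.
    if device_id in assigned_sites:
        return assigned_sites[device_id]
    # Rule A: a new device goes to the connector's configured site,
    # and from then on is assigned to that site.
    assigned_sites[device_id] = connector_site
    return connector_site


# Example with made-up site codes and device IDs.
assignments = {"device-1": "PR2"}
print(target_site("device-1", assignments, "PR1"))  # already registered
print(target_site("device-2", assignments, "PR1"))  # new device
```

The practical consequence is that a problem on one primary site can stop new enrolments from appearing even when the site hosting the connector looks perfectly healthy.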

So…I went to go troubleshoot file replication in my environment like it was Configuration Manager 2007. Ahh…the memories! :)

The troubleshooting – file replication?

I discovered this looked fine from the CAS down to the primary site. I went through all the usual logs:

· schedule.log (schedules the transmission and ensures that the sender schedule currently allows communication)

· sender.log (actually performs the SMB transmission)

The endgame

I was perplexed…everything was as it should be, so I decided to look at my primary site more closely. I then noticed that my CM installation directory was massive (more than 150GB in size)! That’s definitely not normal. The files were in the despooler inbox on the primary site: loads of backlogged PCK files that related to some software update packages in my environment.
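A backlog like this is easy to miss unless you go looking at folder sizes. As a rough sketch, the check can be done by walking a folder and adding up file sizes while counting the package files. The function name is made up for the example, and the folder you point it at depends on where your site is installed.

```python
import os


def inbox_backlog(root, extension=".pck"):
    """Return (total size in bytes, number of files ending in `extension`) under root."""
    total_bytes = 0
    matching = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file was moved or locked while we were walking; skip it.
                continue
            total_bytes += size
            if name.lower().endswith(extension):
                matching += 1
    return total_bytes, matching
```

Run against the despooler inbox folder on the primary site, a result in the tens of gigabytes with many PCK files would point to the same kind of backlog I found.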

I opened up despool.log (this process is responsible for despooling (yes, autocorrect, despooling and not despoiling!) incoming messages that have been queued up from other sites). I noticed that the log file was saying that it wasn’t processing any files because the site was in maintenance mode.

Site currently is in maintenance mode and D:\Program Files\Microsoft Configuration Manager\inboxes\despoolr.box\receive\204kocaz.sni is not DRS init or certificate exchange package, backup it until site becomes active.      SMS_DESPOOLER 21/01/2015 11:11:06 2816 (0x0B00)

Waiting for ready instruction file....    SMS_DESPOOLER 21/01/2015 11:11:06 2816 (0x0B00)

Site currently is in maintenance mode and D:\Program Files\Microsoft Configuration Manager\inboxes\despoolr.box\receive\204kncaz.sni is not DRS init or certificate exchange package, backup it until site becomes active.      SMS_DESPOOLER 21/01/2015 11:14:02 2816 (0x0B00)
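If the log is long, the held files can be pulled out with a small script. This is a minimal sketch that matches only the exact wording of the two entries above; the function name is made up, and other versions of the product may word the message differently.

```python
import ntpath
import re

# Matches the wording of the despool.log entries shown above.
HELD_RE = re.compile(
    r"Site currently is in maintenance mode and (?P<path>.+?) "
    r"is not DRS init or certificate exchange package"
)


def held_files(lines):
    """Return the file names despooler reported as held back during maintenance mode."""
    names = []
    for line in lines:
        m = HELD_RE.search(line)
        if m:
            # ntpath handles Windows-style paths regardless of where this runs.
            names.append(ntpath.basename(m.group("path")))
    return names
```

Any non-empty result means the site considers itself in maintenance mode and is queuing incoming files rather than processing them.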

Hmm…DRS replication was just fine both from the CAS and the primary. Then I remembered that the secondary site wasn’t healthy. I’d ignored this because I wasn’t concerned about the secondary site at that particular moment, as it didn’t participate in the hybrid MDM work the Intune Connector was doing.

The cause

The secondary site setup had caused the primary site to pause replication while DRS is set up, which should only take minutes to a couple of hours at most, not days or weeks. Clearly my secondary site wasn’t in a good state. Luckily for me, I didn’t need it at the moment or for any tests, so I deleted it from the hierarchy and decided I could live without it until I get time to either troubleshoot it further or build a new one. Within a matter of seconds, despooler sprang to life and started processing my CAS packages, and eventually processed my Intune Connector messages as well. These were routed to the right location. From there, within a few minutes I had the devices showing up in the console.

Hopefully my pain will be useful to you in the future and prevent or minimise any trouble you hit.