Azure Percept DK loses network connection after 15-20 minutes of operating

BLAzure 1 Reputation point
2022-03-07T19:38:20.123+00:00

Hello again,

After some setup issues, I updated to the latest version via UUU tool which allowed setup to complete (https://learn.microsoft.com/en-us/answers/questions/760976/percept-setup-doesn39t-complete.html)

Device was working great and was accessible in Azure Percept Studio and etc.

However, after about 15-20 minutes device network connection seems to drop and device cannot be contacted. This is not a one off, is consistent now following a restart device will become inaccessible after about 15-20 minutes with the following symptoms:

  • Device is powered on
  • LED light on device is solid white
  • Connection in Azure Percept Studio changes to "Disconnected"
  • Connection graph in IOT hub drops from 1 to 0.
  • Cannot SSH to device anymore from local WIFI network
  • If I press the OFF button it will go through graceful shutdown process

Logs on device itself are still being generated and can access these via SSH following restart so appears has lost network access rather than just full crashed.

Device edgeHub_log.txt shows the following:

<6> 2022-03-07 08:20:11.390 +00:00 [INF] - Processing subscriptions TwinResponse, Methods, ModuleMessages, DesiredPropertyUpdates for client zeperceptdevice1/devmmclientmodule.  
<6> 2022-03-07 08:20:11.393 +00:00 [INF] - Set subscriptions from session state for zeperceptdevice1/devmmclientmodule  
<4> 2022-03-07 08:20:15.332 +00:00 [WRN] - Error creating cloud connection for client zeperceptdevice1/azureeyemodule  
Microsoft.Azure.Devices.Client.Exceptions.IotHubCommunicationException: Transient network error occurred, please retry.  
 ---> System.Net.Sockets.SocketException (11): Resource temporarily unavailable  
   at Microsoft.Azure.Devices.Client.Transport.AmqpIoT.AmqpIoTTransport.InitializeAsync(TimeSpan timeout)  
   at Microsoft.Azure.Devices.Client.Transport.Amqp.AmqpIoTConnector.OpenConnectionAsync(TimeSpan timeout)  
   at Microsoft.Azure.Devices.Client.Transport.Amqp.AmqpConnectionHolder.EnsureConnectionAsync(TimeSpan timeout)  
   at Microsoft.Azure.Devices.Client.Transport.Amqp.AmqpConnectionHolder.OpenSessionAsync(DeviceIdentity deviceIdentity, TimeSpan timeout)  
   at Microsoft.Azure.Devices.Client.Transport.AmqpIoT.AmqpUnit.EnsureSessionAsync(TimeSpan timeout)  
   at Microsoft.Azure.Devices.Client.Transport.AmqpIoT.AmqpUnit.OpenAsync(TimeSpan timeout)  
   at Microsoft.Azure.Devices.Client.Transport.Amqp.AmqpTransportHandler.OpenAsync(CancellationToken cancellationToken)  
   at Microsoft.Azure.Devices.Client.Transport.ProtocolRoutingDelegatingHandler.OpenAsync(CancellationToken cancellationToken)  
   at Microsoft.Azure.Devices.Client.Transport.ErrorDelegatingHandler.<>c__DisplayClass23_0.<<ExecuteWithErrorHandlingAsync>b__0>d.MoveNext()  
--- End of stack trace from previous location where exception was thrown ---  
   at Microsoft.Azure.Devices.Client.Transport.ErrorDelegatingHandler.ExecuteWithErrorHandlingAsync[T](Func`1 asyncOperation)  
   --- End of inner exception stack trace ---  
   at Microsoft.Azure.Devices.Client.Transport.ErrorDelegatingHandler.ExecuteWithErrorHandlingAsync[T](Func`1 asyncOperation)  
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass33_0.<<OpenInternalAsync>b__0>d.MoveNext()  
--- End of stack trace from previous location where exception was thrown ---  
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.EnsureOpenedAsync(CancellationToken cancellationToken)  
   at Microsoft.Azure.Devices.Client.InternalClient.OpenAsync()  
   at Microsoft.Azure.Devices.Edge.Util.TaskEx.TimeoutAfter(Task task, TimeSpan timeout) in /home/vsts/work/1/s/edge-util/src/Microsoft.Azure.Devices.Edge.Util/TaskEx.cs:line 144  
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ModuleClientWrapper.OpenAsync() in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ModuleClientWrapper.cs:line 63  
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.<>c__DisplayClass30_0.<<InvokeFunc>b__0>d.MoveNext() in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 187  
--- End of stack trace from previous location where exception was thrown ---  
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.InvokeFunc[T](Func`1 func, String operation, Boolean useForConnectivityCheck) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 180  
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.InvokeFunc[T](Func`1 func, String operation, Boolean useForConnectivityCheck) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 180  
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.OpenAsync() in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 64  
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection.ConnectToIoTHub(ITokenProvider newTokenProvider) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudConnection.cs:line 134  
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection.CreateNewCloudProxyAsync(ITokenProvider newTokenProvider) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudConnection.cs:line 117  
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection.Create(IIdentity identity, Action`2 connectionStatusChangedHandler, ITransportSettings[] transportSettings, IMessageConverterProvider messageConverterProvider, IClientProvider clientProvider, ICloudListener cloudListener, ITokenProvider tokenProvider, TimeSpan idleTimeout, Boolean closeOnIdleTimeout, TimeSpan operationTimeout, String productInfo, Option`1 modelId) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudConnection.cs:line 99  
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnectionProvider.<>c__DisplayClass17_1.<<Connect>b__2>d.MoveNext() in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudConnectionProvider.cs:line 162  
--- End of stack trace from previous location where exception was thrown ---  
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnectionProvider.Connect(IIdentity identity, Action`2 connectionStatusChangedHandler) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudConnectionProvider.cs:line 178  

docket.txt shows that the device is still running and does a graceful shutdown when I reboot:

Mar 07 08:35:40 apd-d8c0a6a82c29 dockerd[1245]: time="2022-03-07T08:35:40.961752744Z" level=warning msg="[resolver] connect failed: dial udp 8.8.8.8:53: connect: network is unreachable"  
Mar 07 08:35:40 apd-d8c0a6a82c29 dockerd[1245]: time="2022-03-07T08:35:40.961847739Z" level=warning msg="[resolver] connect failed: dial udp 8.8.8.8:53: connect: network is unreachable"  
Mar 07 08:35:41 apd-d8c0a6a82c29 systemd[1]: Stopping Docker Application Container Engine...  
Mar 07 08:35:41 apd-d8c0a6a82c29 dockerd[1245]: time="2022-03-07T08:35:41.124130902Z" level=info msg="Processing signal 'terminated'"  
Mar 07 08:35:41 apd-d8c0a6a82c29 dockerd[1245]: time="2022-03-07T08:35:41.128800354Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby  
Mar 07 08:35:41 apd-d8c0a6a82c29 dockerd[1245]: time="2022-03-07T08:35:41.131750129Z" level=info msg="Daemon shutdown complete"  
Mar 07 08:35:41 apd-d8c0a6a82c29 systemd[1]: Stopped Docker Application Container Engine.  
-- Reboot --  

Any thoughts?

UPDATE:

> cat /etc/os-release

NAME="Common Base Linux Mariner"
VERSION="1.0.20220127"
ID=mariner
VERSION_ID=1.0
PRETTY_NAME="CBL-Mariner/Linux"
ANSI_COLOR="1;34"
HOME_URL="https://aka.ms/cbl-mariner"
BUG_REPORT_URL="https://aka.ms/cbl-mariner"
SUPPORT_URL="https://aka.ms/cbl-mariner"

> cat /etc/os-subrelease

BUILDER_NAME=Microsoft.AED
BUILD_DATE="2022-02-09T02:49:50Z"
ID=santacruz
VERSION_ID=165650
NAME="Santa Cruz Mariner"
VERSION="Drop 165650"
BRANCH="refs/heads/releases/public_preview_1.0"
COMMIT_ID="dffb37184b8cb0feb32bf2ad40bc0cd8980ea662"

> cat /etc/adu-version

2022.102.109.102

> Run the following command and provide us the network_log.txt to examine?

Start up:

Mar 08 17:39:48 apd-d8c0a6a82c29 ztpd[709]: set pattern='0 0 0 0' for led '/sys/class/leds/white:wlan'  
Mar 08 17:39:48 apd-d8c0a6a82c29 ztpd[709]: set pattern='255 0 255 0' for led '/sys/class/leds/white:wlan'  
Mar 08 17:39:48 apd-d8c0a6a82c29 systemd-networkd[488]: wlan0: DHCPv4 address 192.168.86.58/24 via 192.168.86.1  
Mar 08 17:39:48 apd-d8c0a6a82c29 systemd-networkd[488]: wlan0: Configured  

Below: Only events that happen in the network log between start up and shutdown. Device went offline around this time. Seems to correlate with the device stopping the access point:

Mar 08 17:42:46 apd-d8c0a6a82c29 ztpd[709]: [wlan0] event=<3>CTRL-EVENT-BSS-REMOVED 0 70:3a:cb:97:c8:7f  
Mar 08 17:42:46 apd-d8c0a6a82c29 ztpd[709]: [wlan0] event=<3>CTRL-EVENT-BSS-REMOVED 2 12:13:31:da:48:a7  
Mar 08 17:42:46 apd-d8c0a6a82c29 ztpd[709]: [wlan0] event=<3>CTRL-EVENT-BSS-REMOVED 3 10:13:31:da:48:9f  
Mar 08 17:42:46 apd-d8c0a6a82c29 ztpd[709]: [wlan0] event=<3>CTRL-EVENT-BSS-REMOVED 4 12:13:31:da:48:a9  
Mar 08 18:05:19 apd-d8c0a6a82c29 systemd[1]: hostapd.service: Unit not needed anymore. Stopping.  
Mar 08 18:05:19 apd-d8c0a6a82c29 hostapd[853]: wlan1: interface state ENABLED->DISABLED  
Mar 08 18:05:19 apd-d8c0a6a82c29 hostapd[853]: wlan1: AP-DISABLED  
Mar 08 18:05:19 apd-d8c0a6a82c29 hostapd[853]: wlan1: CTRL-EVENT-TERMINATING  
Mar 08 18:05:19 apd-d8c0a6a82c29 hostapd[853]: nl80211: deinit ifname=wlan1 disabled_11b_rates=0  
Mar 08 18:05:19 apd-d8c0a6a82c29 ztpd[709]: inotify path=/var/run/wpa_supplicant target=hostapd mask=0x40000200  
Mar 08 18:05:19 apd-d8c0a6a82c29 systemd[1]: Stopping hostapd...  
Mar 08 18:05:19 apd-d8c0a6a82c29 systemd[1]: Stopped hostapd.  

Start of shutdown/reboot

Mar 08 18:20:03 apd-d8c0a6a82c29 ztpd[709]: inotify path=/var/run/wpa_supplicant target=sshd.pid mask=0x00000200  
Mar 08 18:20:03 apd-d8c0a6a82c29 ztpd[709]: failed to retrieve pending sd-bus message (-104)  
Mar 08 18:20:03 apd-d8c0a6a82c29 ztpd[709]: failed to update sd-bus fd settings for epoll (9)  
Azure Percept
Azure Percept
A comprehensive Azure platform with added security for creating edge artificial intelligence solutions.
70 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Quinn Tran 191 Reputation points
    2022-03-15T17:14:36.817+00:00

    Hello @BLAzure ,
    As we can’t repro the issue, we think that this is likely a network environment issue. Could you provide more details about your network environment?

    1. Do you apply any specific configurations on the Percept DK, such as iptables rules, or something related to network?
    2. You happened to modify something on Percept DK, does this issue happen with pure image under identical network environment?
    3. Does this issue happen with Ethernet connection? Could you try to connect an Ethernet cable from one of the Wi-Fi AP to make sure the IP domain is identical?
    4. Could you also verify again with 5G/LTE hotspot connection?
      Preproduce steps:
      1. Enable hotspot function on a mobile phone.
      2. Power on Percept DK, and establish Wi-Fi connection with the hotspot’s SSID.
      3. Reproduce this issue again.

    Thank you!