How to troubleshoot issues with the Log Analytics agent for Linux
This article provides help troubleshooting errors you might experience with the Log Analytics agent for Linux in Azure Monitor and suggests possible solutions to resolve them.
Log Analytics Troubleshooting Tool
The Log Analytics Agent Linux Troubleshooting Tool is a script designed to help find and diagnose issues with the Log Analytics Agent. It is automatically included with the agent upon installation. Running the tool should be the first step in diagnosing an issue.
How to Use
The Troubleshooting Tool can be run by pasting the following command into a terminal window on a machine with the Log Analytics agent:
sudo /opt/microsoft/omsagent/bin/troubleshooter
Manual Installation
The Troubleshooting Tool is automatically included upon installation of the Log Analytics Agent. However, if installation fails in any way, it can also be installed manually by following the steps below.
- Ensure that the GNU Project Debugger (GDB) is installed on the machine since the troubleshooter relies on it.
- Copy the troubleshooter bundle onto your machine:
wget https://raw.github.com/microsoft/OMS-Agent-for-Linux/master/source/code/troubleshooter/omsagent_tst.tar.gz
- Unpack the bundle:
tar -xzvf omsagent_tst.tar.gz
- Run the manual installation:
sudo ./install_tst
Scenarios Covered
Below is a list of scenarios checked by the Troubleshooting Tool:
- Agent is unhealthy, heartbeat doesn't work properly
- Agent doesn't start, can't connect to the Log Analytics service
- Agent syslog isn't working
- Agent has high CPU / memory usage
- Agent having installation issues
- Agent custom logs aren't working
- Collect Agent logs
For more details, please check out our GitHub documentation.
Note
Please run the Log Collector tool when you experience an issue. Having the logs up front will greatly help our support team troubleshoot your issue more quickly.
Purge and Re-Install the Linux Agent
We've seen that a clean reinstall of the agent fixes most issues. In fact, this may be the first suggestion from our support team to get the agent into an uncorrupted state. Running the troubleshooter, collecting logs, and attempting a clean reinstall will help solve issues more quickly.
- Download the purge script:
$ wget https://raw.githubusercontent.com/microsoft/OMS-Agent-for-Linux/master/tools/purge_omsagent.sh
- Run the purge script (with sudo permissions):
$ sudo sh purge_omsagent.sh
Important log locations and Log Collector tool
| File | Path |
|---|---|
| Log Analytics agent for Linux log file | /var/opt/microsoft/omsagent/<workspace id>/log/omsagent.log |
| Log Analytics agent configuration log file | /var/opt/microsoft/omsconfig/omsconfig.log |
We recommend that you use our log collector tool to retrieve important logs for troubleshooting or before submitting a GitHub issue. You can read more about the tool and how to run it here.
Important configuration files
| Category | File Location |
|---|---|
| Syslog | /etc/syslog-ng/syslog-ng.conf or /etc/rsyslog.conf or /etc/rsyslog.d/95-omsagent.conf |
| Performance, Nagios, Zabbix, Log Analytics output and general agent | /etc/opt/microsoft/omsagent/<workspace id>/conf/omsagent.conf |
| Additional configurations | /etc/opt/microsoft/omsagent/<workspace id>/conf/omsagent.d/*.conf |
Note
Edits to the configuration files for performance counters and Syslog are overwritten if collection is configured from the Agents configuration in the Azure portal for your workspace. To disable configuration for all agents, disable collection from Agents configuration. To disable it for a single agent, run the following:
sudo /opt/microsoft/omsconfig/Scripts/OMS_MetaConfigHelper.py --disable && sudo rm /etc/opt/omi/conf/omsconfig/configuration/Current.mof* /etc/opt/omi/conf/omsconfig/configuration/Pending.mof*
Installation error codes
| Error Code | Meaning |
|---|---|
| NOT_DEFINED | Because the necessary dependencies are not installed, the auoms auditd plugin will not be installed. Installation of auoms failed, install package auditd. |
| 2 | Invalid option provided to the shell bundle. Run sudo sh ./omsagent-*.universal*.sh --help for usage. |
| 3 | No option provided to the shell bundle. Run sudo sh ./omsagent-*.universal*.sh --help for usage. |
| 4 | Invalid package type OR invalid proxy settings; omsagent-rpm.sh packages can only be installed on RPM-based systems, and omsagent-deb.sh packages can only be installed on Debian-based systems. It is recommended that you use the universal installer from the latest release. Also verify your proxy settings. |
| 5 | The shell bundle must be executed as root OR a 403 error was returned during onboarding. Run your command using sudo. |
| 6 | Invalid package architecture OR a non-200 HTTP error was returned during onboarding; omsagent-*x64.sh packages can only be installed on 64-bit systems, and omsagent-*x86.sh packages can only be installed on 32-bit systems. Download the correct package for your architecture from the latest release. |
| 17 | Installation of OMS package failed. Look through the command output for the root failure. |
| 18 | Installation of OMSConfig package failed. Look through the command output for the root failure. |
| 19 | Installation of OMI package failed. Look through the command output for the root failure. |
| 20 | Installation of SCX package failed. Look through the command output for the root failure. |
| 21 | Installation of Provider kits failed. Look through the command output for the root failure. |
| 22 | Installation of bundled package failed. Look through the command output for the root failure. |
| 23 | SCX or OMI package already installed. Use --upgrade instead of --install to install the shell bundle. |
| 30 | Internal bundle error. File a GitHub Issue with details from the output. |
| 55 | Unsupported openssl version OR Cannot connect to Azure Monitor OR dpkg is locked OR missing curl program. |
| 61 | Missing Python ctypes library. Install the Python ctypes library or package (python-ctypes). |
| 62 | Missing tar program, install tar. |
| 63 | Missing sed program, install sed. |
| 64 | Missing curl program, install curl. |
| 65 | Missing gpg program, install gpg. |
Onboarding error codes
| Error Code | Meaning |
|---|---|
| 2 | Invalid option provided to the omsadmin script. Run sudo sh /opt/microsoft/omsagent/bin/omsadmin.sh -h for usage. |
| 3 | Invalid configuration provided to the omsadmin script. Run sudo sh /opt/microsoft/omsagent/bin/omsadmin.sh -h for usage. |
| 4 | Invalid proxy provided to the omsadmin script. Verify the proxy and see our documentation for using an HTTP proxy. |
| 5 | 403 HTTP error received from Azure Monitor. See the full output of the omsadmin script for details. |
| 6 | Non-200 HTTP error received from Azure Monitor. See the full output of the omsadmin script for details. |
| 7 | Unable to connect to Azure Monitor. See the full output of the omsadmin script for details. |
| 8 | Error onboarding to Log Analytics workspace. See the full output of the omsadmin script for details. |
| 30 | Internal script error. File a GitHub Issue with details from the output. |
| 31 | Error generating agent ID. File a GitHub Issue with details from the output. |
| 32 | Error generating certificates. See the full output of the omsadmin script for details. |
| 33 | Error generating metaconfiguration for omsconfig. File a GitHub Issue with details from the output. |
| 34 | Metaconfiguration generation script not present. Retry onboarding with sudo sh /opt/microsoft/omsagent/bin/omsadmin.sh -w <Workspace ID> -s <Workspace Key>. |
Enable debug logging
OMS output plugin debug
FluentD supports plugin-specific logging levels, so you can specify different log levels for inputs and outputs. To specify a different log level for OMS output, edit the general agent configuration at /etc/opt/microsoft/omsagent/<workspace id>/conf/omsagent.conf.
In the OMS output plugin section, near the end of the configuration file, change the log_level property from info to debug:
<match oms.** docker.**>
type out_oms
log_level debug
num_threads 5
buffer_chunk_limit 5m
buffer_type file
buffer_path /var/opt/microsoft/omsagent/<workspace id>/state/out_oms*.buffer
buffer_queue_limit 10
flush_interval 20s
retry_limit 10
retry_wait 30s
</match>
Debug logging allows you to see batched uploads to Azure Monitor separated by type, number of data items, and time taken to send:
Example debug enabled log:
Success sending oms.nagios x 1 in 0.14s
Success sending oms.omi x 4 in 0.52s
Success sending oms.syslog.authpriv.info x 1 in 0.91s
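The per-type upload lines above can be totaled to see which data types dominate upload volume and send time. The following is a minimal sketch and not part of the agent: summarize_oms_uploads is a hypothetical helper name, and it assumes each upload line contains the text "Success sending <type> x <count> in <seconds>s" exactly as in the example, possibly preceded by timestamp fields.

```shell
# Hypothetical helper (not shipped with the agent): totals items and send time
# per data type from "Success sending <type> x <count> in <seconds>s" lines
# read on stdin. The line format is assumed from the example log above.
summarize_oms_uploads() {
  awk '
    {
      # Scan for the phrase so that leading timestamp fields are tolerated.
      for (i = 1; i + 6 <= NF; i++) {
        if ($i == "Success" && $(i + 1) == "sending" && $(i + 3) == "x" && $(i + 5) == "in") {
          tag = $(i + 2)
          secs = $(i + 6)
          sub(/s$/, "", secs)   # drop the trailing "s" from values such as 0.14s
          items[tag] += $(i + 4)
          seconds[tag] += secs
          break
        }
      }
    }
    END {
      for (tag in items) printf "%s items=%d seconds=%.2f\n", tag, items[tag], seconds[tag]
    }
  ' | sort
}
```

To use it, feed the agent log on stdin, for example summarize_oms_uploads < /var/opt/microsoft/omsagent/<workspace id>/log/omsagent.log, replacing <workspace id> with your own value.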
Verbose output
Instead of using the OMS output plugin you can also output data items directly to stdout, which is visible in the Log Analytics agent for Linux log file.
In the Log Analytics general agent configuration file at /etc/opt/microsoft/omsagent/<workspace id>/conf/omsagent.conf, comment out the OMS output plugin by adding a # in front of each line:
#<match oms.** docker.**>
# type out_oms
# log_level info
# num_threads 5
# buffer_chunk_limit 5m
# buffer_type file
# buffer_path /var/opt/microsoft/omsagent/<workspace id>/state/out_oms*.buffer
# buffer_queue_limit 10
# flush_interval 20s
# retry_limit 10
# retry_wait 30s
#</match>
Below the output plugin, uncomment the following section by removing the # in front of each line:
<match **>
type stdout
</match>
Issue: Unable to connect through proxy to Azure Monitor
Probable causes
- The proxy specified during onboarding was incorrect
- The Azure Monitor and Azure Automation Service Endpoints are not included in the approved list in your datacenter
Resolution
- Reonboard to Azure Monitor with the Log Analytics agent for Linux by using the following command with the option -v enabled. It allows verbose output of the agent connecting through the proxy to Azure Monitor.
/opt/microsoft/omsagent/bin/omsadmin.sh -w <Workspace ID> -s <Workspace Key> -p <Proxy Conf> -v
- Review the section Update proxy settings to verify you have properly configured the agent to communicate through a proxy server.
- Double-check that the endpoints outlined in the Azure Monitor network firewall requirements list are added to an allow list correctly. If you use Azure Automation, the necessary network configuration steps are linked above as well.
Issue: You receive a 403 error when trying to onboard
Probable causes
- Date and Time is incorrect on Linux Server
- Workspace ID and Workspace Key used are not correct
Resolution
- Check the time on your Linux server with the command date. If the time differs from the current time by more than 15 minutes, onboarding fails. To correct this, update the date and/or time zone of your Linux server.
- Verify you have installed the latest version of the Log Analytics agent for Linux. The newest version now notifies you if time skew is causing the onboarding failure.
- Reonboard using correct Workspace ID and Workspace Key following the installation instructions earlier in this article.
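The 15-minute rule above can be expressed as a small check. This is a sketch under stated assumptions, not agent functionality: clock_skew_ok is a hypothetical helper, both arguments are Unix timestamps in seconds, and obtaining a trustworthy reference timestamp (for example from a machine known to be time-synchronized) is left to you.

```shell
# Hypothetical helper: succeeds when two Unix timestamps differ by no more than
# the allowed skew in seconds (default 900, the 15-minute limit described above).
# Usage: clock_skew_ok <local_epoch> <reference_epoch> [max_skew_seconds]
clock_skew_ok() {
  skew_max=${3:-900}
  skew_diff=$(( $1 - $2 ))
  if [ "$skew_diff" -lt 0 ]; then
    skew_diff=$(( 0 - skew_diff ))   # absolute value
  fi
  [ "$skew_diff" -le "$skew_max" ]
}
```

For example, with a reference timestamp stored in a variable named reference_epoch (an illustrative name), clock_skew_ok "$(date -u +%s)" "$reference_epoch" || echo "Clock skew exceeds 15 minutes" prints a warning when the server clock is too far off.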
Issue: You see a 500 and 404 error in the log file right after onboarding
This is a known issue that occurs on first upload of Linux data into a Log Analytics workspace. This does not affect data being sent or service experience.
Issue: You see omiagent using 100% CPU
Probable causes
A regression in nss-pem package v1.0.3-5.el7 caused a severe performance issue that has been reported frequently on Red Hat/CentOS 7.x distributions. To learn more about this issue, check the following documentation: Bug 1667121 Performance regression in libcurl.
Performance-related bugs don't happen all the time, and they are very difficult to reproduce. If you experience such an issue with omiagent, use the script omiHighCPUDiagnostics.sh, which collects the stack trace of omiagent when it exceeds a certain CPU threshold.
- Download the script:
wget https://raw.githubusercontent.com/microsoft/OMS-Agent-for-Linux/master/tools/LogCollector/source/omiHighCPUDiagnostics.sh
- Run diagnostics for 24 hours with 30% CPU threshold:
bash omiHighCPUDiagnostics.sh --runtime-in-min 1440 --cpu-threshold 30
- The call stack will be dumped in the omiagent_trace file. If you notice many Curl and NSS function calls, follow the resolution steps below.
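To get a rough sense of how many Curl and NSS calls appear in the trace, you can count the matching lines. This is only a sketch: count_curl_nss_lines is a hypothetical helper, and it assumes the trace is plain text in which function or library names containing "curl" or "nss" appear on the relevant lines. The exact layout of omiagent_trace is not described in this article.

```shell
# Hypothetical helper: counts lines on stdin that mention curl or NSS
# (case-insensitive). The trace layout is an assumption, so treat the number
# as a rough signal rather than an exact frame count.
count_curl_nss_lines() {
  # grep -c prints 0 but exits non-zero when nothing matches; normalize that.
  grep -ciE 'curl|nss' || true
}
```

For example, count_curl_nss_lines < omiagent_trace prints the number of matching lines, which you can compare with the total from wc -l < omiagent_trace.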
Resolution (step by step)
- Upgrade the nss-pem package to v1.0.3-5.el7_6.1.
sudo yum upgrade nss-pem
- If nss-pem is not available for upgrade (mostly happens on CentOS), then downgrade curl to 7.29.0-46. If by mistake you run "yum update", then curl will be upgraded to 7.29.0-51 and the issue will happen again.
sudo yum downgrade curl libcurl
- Restart OMI:
sudo scxadmin -restart
Issue: You are not seeing forwarded Syslog messages
Probable causes
- The configuration applied to the Linux server does not allow collection of the sent facilities and/or log levels
- Syslog is not being forwarded correctly to the Linux server
- The number of messages being forwarded per second is too great for the base configuration of the Log Analytics agent for Linux to handle
Resolution
- Verify the configuration in the Log Analytics workspace for Syslog has all the facilities and the correct log levels. Review configure Syslog collection in the Azure portal.
- Verify the native syslog messaging daemons (rsyslog, syslog-ng) are able to receive the forwarded messages.
- Check firewall settings on the Syslog server to ensure that messages are not being blocked.
- Simulate a Syslog message to Log Analytics using the logger command:
logger -p local0.err "This is my test message"
Issue: You are receiving Errno address already in use in omsagent log file
You see the following error in omsagent.log:
[error]: unexpected error error_class=Errno::EADDRINUSE error=#<Errno::EADDRINUSE: Address already in use - bind(2) for "127.0.0.1" port 25224>
Probable causes
This error indicates that the Linux Diagnostic extension (LAD) is installed side by side with the Log Analytics Linux VM extension, and it is using the same port for syslog data collection as omsagent.
Resolution
- As root, execute the following commands (note that 25224 is an example and it is possible that in your environment you see a different port number used by LAD):
/opt/microsoft/omsagent/bin/configure_syslog.sh configure LAD 25229
sed -i -e 's/25224/25229/' /etc/opt/microsoft/omsagent/LAD/conf/omsagent.d/syslog.conf
- You then need to edit the correct rsyslogd or syslog_ng config file and change the LAD-related configuration to write to port 25229.
- If the VM is running rsyslogd, the file to be modified is: /etc/rsyslog.d/95-omsagent.conf (if it exists, else /etc/rsyslog). If the VM is running syslog_ng, the file to be modified is: /etc/syslog-ng/syslog-ng.conf.
- Restart omsagent:
sudo /opt/microsoft/omsagent/bin/service_control restart
- Restart syslog service.
Issue: You are unable to uninstall omsagent using purge option
Probable causes
- Linux Diagnostic Extension is installed
- Linux Diagnostic Extension was installed and uninstalled, but you still see an error about omsagent being used by mdsd and cannot be removed.
Resolution
- Uninstall the Linux Diagnostic Extension (LAD).
- Remove Linux Diagnostic Extension files from the machine if they are present in the following locations: /var/lib/waagent/Microsoft.Azure.Diagnostics.LinuxDiagnostic-<version>/ and /var/opt/microsoft/omsagent/LAD/.
Issue: You cannot see any Nagios data
Probable causes
- Omsagent user does not have permissions to read from Nagios log file
- Nagios source and filter have not been uncommented from omsagent.conf file
Resolution
- Grant the omsagent user permission to read the Nagios file by following these instructions.
- In the Log Analytics agent for Linux general configuration file at /etc/opt/microsoft/omsagent/<workspace id>/conf/omsagent.conf, ensure that both the Nagios source and filter are uncommented.
<source>
type tail
path /var/log/nagios/nagios.log
format none
tag oms.nagios
</source>
<filter oms.nagios>
type filter_nagios_log
</filter>
Issue: You are not seeing any Linux data
Probable causes
- Onboarding to Azure Monitor failed
- Connection to Azure Monitor is blocked
- Virtual machine was rebooted
- OMI package was manually upgraded to a newer version compared to what was installed by Log Analytics agent for Linux package
- OMI is frozen, blocking OMS agent
- DSC resource logs class not found error in omsconfig.log log file
- Data on the Log Analytics agent is backed up
- DSC logs "Current configuration does not exist. Execute Start-DscConfiguration command with -Path parameter to specify a configuration file and create a current configuration first." in omsconfig.log log file, but no log message exists about PerformRequiredConfigurationChecks operations.
Resolution
- Install all dependencies like auditd package.
- Check if onboarding to Azure Monitor was successful by checking if the following file exists: /etc/opt/microsoft/omsagent/<workspace id>/conf/omsadmin.conf. If it was not, reonboard using the omsadmin.sh command line instructions.
- If using a proxy, check proxy troubleshooting steps above.
- In some Azure distribution systems, omid OMI server daemon does not start after the virtual machine is rebooted. This will result in not seeing Audit, ChangeTracking, or UpdateManagement solution-related data. The workaround is to manually start omi server by running sudo /opt/omi/bin/service_control restart.
- After OMI package is manually upgraded to a newer version, it has to be manually restarted for Log Analytics agent to continue functioning. This step is required for some distros where OMI server does not automatically start after it is upgraded. Run sudo /opt/omi/bin/service_control restart to restart OMI.
- In some situations, OMI can become frozen. The OMS agent may enter a blocked state waiting for OMI, blocking all data collection. The OMS agent process will be running but there will be no activity, evidenced by no new log lines (such as sent heartbeats) present in omsagent.log. Restart OMI with sudo /opt/omi/bin/service_control restart to recover the agent.
- If you see DSC resource class not found error in omsconfig.log, run sudo /opt/omi/bin/service_control restart.
- In some cases, when the Log Analytics agent for Linux cannot talk to Azure Monitor, data on the agent is backed up to the full buffer size: 50 MB. The agent should be restarted by running the following command: /opt/microsoft/omsagent/bin/service_control restart.
Note
This issue is fixed in Agent version 1.1.0-28 or later.
- If omsconfig.log log file does not indicate that PerformRequiredConfigurationChecks operations are running periodically on the system, there might be a problem with the cron job/service. Make sure cron job exists under /etc/cron.d/OMSConsistencyInvoker. If needed run the following commands to create the cron job:
sudo mkdir -p /etc/cron.d/
echo "*/15 * * * * omsagent /opt/omi/bin/OMSConsistencyInvoker >/dev/null 2>&1" | sudo tee /etc/cron.d/OMSConsistencyInvoker
Also, make sure the cron service is running. You can use service cron status with Debian, Ubuntu, SUSE, or service crond status with RHEL, CentOS, Oracle Linux to check the status of this service. If the service does not exist, you can install the binaries and start the service using the following:
Ubuntu/Debian
# To install the service binaries
sudo apt-get install -y cron
# To start the service
sudo service cron start
SUSE
# To install the service binaries
sudo zypper in cron -y
# To start the service
sudo systemctl enable cron
sudo systemctl start cron
RHEL/CentOS
# To install the service binaries (the package that provides crond is cronie)
sudo yum install -y cronie
# To start the service
sudo service crond start
Oracle Linux
# To install the service binaries
sudo yum install -y cronie
# To start the service
sudo service crond start
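Two of the checks in the steps above are simple file-existence tests: the onboarding configuration file and the OMSConsistencyInvoker cron job. The following sketch bundles them. check_agent_files is a hypothetical helper, not part of the agent, and its optional second argument (a path prefix) is an assumption added only so the check can be pointed at a copy of the file system.

```shell
# Hypothetical helper: reports whether the files named in the steps above exist.
# $1 is the workspace ID. $2 is an optional path prefix (an illustrative
# addition, not an agent concept) prepended to each path before testing.
check_agent_files() {
  agent_ws=$1
  agent_prefix=${2:-}
  agent_status=0
  for agent_path in \
    "/etc/opt/microsoft/omsagent/$agent_ws/conf/omsadmin.conf" \
    "/etc/cron.d/OMSConsistencyInvoker"
  do
    if [ -f "$agent_prefix$agent_path" ]; then
      echo "present: $agent_path"
    else
      echo "missing: $agent_path"
      agent_status=1
    fi
  done
  # Non-zero when at least one file is missing.
  return "$agent_status"
}
```

Run it as check_agent_files <workspace id>, replacing <workspace id> with your own value. A missing omsadmin.conf points to the reonboarding step, and a missing cron file points to the cron job commands above.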
Issue: When configuring collection from the portal for Syslog or Linux performance counters, the settings are not applied
Probable causes
- The Log Analytics agent for Linux has not picked up the latest configuration
- The changed settings in the portal were not applied
Resolution
Background: omsconfig is the Log Analytics agent for Linux configuration agent that looks for new portal-side configuration every five minutes. This configuration is then applied to the Log Analytics agent for Linux configuration files located at /etc/opt/microsoft/omsagent/conf/omsagent.conf.
- In some cases, the Log Analytics agent for Linux configuration agent might not be able to communicate with the portal configuration service resulting in latest configuration not being applied.
- Check that the omsconfig agent is installed by running dpkg --list omsconfig or rpm -qi omsconfig. If it is not installed, reinstall the latest version of the Log Analytics agent for Linux.
- Check that the omsconfig agent can communicate with Azure Monitor by running the following command: sudo su omsagent -c 'python /opt/microsoft/omsconfig/Scripts/GetDscConfiguration.py'. This command returns the configuration that agent receives from the service, including Syslog settings, Linux performance counters, and custom logs. If this command fails, run the following command: sudo su omsagent -c 'python /opt/microsoft/omsconfig/Scripts/PerformRequiredConfigurationChecks.py'. This command forces the omsconfig agent to talk to Azure Monitor and retrieve the latest configuration.
Issue: You are not seeing any custom log data
Probable causes
- Onboarding to Azure Monitor failed.
- The setting Apply the following configuration to my Linux Servers has not been selected.
- omsconfig has not picked up the latest custom log configuration from the service.
- Log Analytics agent for Linux user omsagent is unable to access the custom log due to permissions or not being found. You may see the following errors:
[DATETIME] [warn]: file not found. Continuing without tailing it.
[DATETIME] [error]: file not accessible by omsagent.
- Known Issue with Race Condition fixed in Log Analytics agent for Linux version 1.1.0-217
Resolution
- Verify onboarding to Azure Monitor was successful by checking if the following file exists: /etc/opt/microsoft/omsagent/<workspace id>/conf/omsadmin.conf. If not, either:
  - Reonboard using the omsadmin.sh command line instructions.
  - Under Advanced Settings in the Azure portal, ensure that the setting Apply the following configuration to my Linux Servers is enabled.
- Check that the omsconfig agent can communicate with Azure Monitor by running the following command: sudo su omsagent -c 'python /opt/microsoft/omsconfig/Scripts/GetDscConfiguration.py'. This command returns the configuration that agent receives from the service, including Syslog settings, Linux performance counters, and custom logs. If this command fails, run the following command: sudo su omsagent -c 'python /opt/microsoft/omsconfig/Scripts/PerformRequiredConfigurationChecks.py'. This command forces the omsconfig agent to talk to Azure Monitor and retrieve the latest configuration.
Background: Instead of running as the privileged root user, the Log Analytics agent for Linux runs as the omsagent user. In most cases, explicit permission must be granted to this user in order for certain files to be read. To grant permission to the omsagent user, run the following commands:
- Add the omsagent user to a specific group:
sudo usermod -a -G <GROUPNAME> <USERNAME>
- Grant universal read access to the required file:
sudo chmod -R ugo+rx <FILE DIRECTORY>
There is a known issue with a race condition in Log Analytics agent for Linux versions earlier than 1.1.0-217. After updating to the latest agent, run the following command to get the latest version of the output plugin: sudo cp /etc/opt/microsoft/omsagent/sysconf/omsagent.conf /etc/opt/microsoft/omsagent/<workspace id>/conf/omsagent.conf.
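To confirm whether the permission or path problem listed under probable causes applies, you can search the agent log for the two error messages quoted there. A minimal sketch follows: find_custom_log_errors is a hypothetical helper name, and the matched strings are taken verbatim from the errors shown in this section.

```shell
# Hypothetical helper: prints omsagent.log lines (read from stdin) that contain
# either of the two custom-log errors quoted in this section.
find_custom_log_errors() {
  # -F matches the messages as fixed strings; "|| true" keeps a clean exit
  # status when no line matches.
  grep -F \
    -e 'file not found. Continuing without tailing it.' \
    -e 'file not accessible by omsagent.' || true
}
```

For example, find_custom_log_errors < /var/opt/microsoft/omsagent/<workspace id>/log/omsagent.log prints any matching lines. No output suggests the cause lies elsewhere, such as configuration not yet picked up by omsconfig.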
Issue: You are trying to reonboard to a new workspace
When you try to reonboard an agent to a new workspace, the Log Analytics agent configuration needs to be cleaned up before reonboarding. To clean up old configuration from the agent, run the shell bundle with --purge:
sudo sh ./omsagent-*.universal.x64.sh --purge
Or
sudo sh ./onboard_agent.sh --purge
You can continue to reonboard after using the --purge option.
Issue: Log Analytics agent extension in the Azure portal is marked with a failed state: Provisioning failed
Probable causes
- Log Analytics agent has been removed from the operating system
- Log Analytics agent service is down, disabled, or not configured
Resolution
Perform the following steps to correct the issue.
- Remove extension from Azure portal.
- Install the agent following the instructions.
- Restart the agent by running the following command:
sudo /opt/microsoft/omsagent/bin/service_control restart
- Wait several minutes and the provisioning state changes to Provisioning succeeded.
Issue: The Log Analytics agent upgrade on-demand
Probable causes
The Log Analytics agent packages on the host are outdated.
Resolution
Perform the following steps to correct the issue.
- Check for the latest release on the GitHub releases page.
- Download install script (1.4.2-124 as example version):
wget https://github.com/Microsoft/OMS-Agent-for-Linux/releases/download/OMSAgent_GA_v1.4.2-124/omsagent-1.4.2-124.universal.x64.sh
- Upgrade packages by executing sudo sh ./omsagent-*.universal.x64.sh --upgrade.
Issue: Installation is failing saying Python2 cannot support ctypes, even though Python3 is being used
Probable causes
There is a known issue where, if the VM's language isn't English, a check will fail when verifying which version of Python is being used. This leads to the agent always assuming Python2 is being used, and failing if there is no Python2.
Resolution
Change the VM's environmental language to English:
export LANG=en_US.UTF-8