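Cloudera Manager supports three alerting methods: email alerts, SNMP trap forwarding, and custom alert scripts. This article shows how to publish CM alerts through a custom alert script.
1. Create the directory
Create a directory on the node that runs the Alert Publisher service: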
mkdir /opt/cloudera/alert_script
2. Create the scripts
Because the Alert Publisher service runs the custom script through bash, first write an alert.sh script that receives the raw alert data:
vim alert.sh
#!/bin/bash
# Raw alert file: its path is passed in as the first argument
OG_ALERT_FILE=$1
# Pipe the alert data to the custom Python script for parsing
cat "$OG_ALERT_FILE" | python /opt/cloudera/alert_script/alert.py
alert.sh forwards the data it receives to a Python script for parsing. The following function in alert.py parses the alert data:
def get_alert_data():
    """
    Parse the raw alert data received on stdin.
    """
    org_alert_file = sys.stdin
    alert_data = json.load(org_alert_file)
    alert_data_list = []
    for item in alert_data:
        alert = item["body"]["alert"]['attributes']
        cluster = alert['CLUSTER'][0]
        hostname = alert.get('HOSTS', [None])[0]
        # epochMs is in milliseconds; convert it to seconds
        _time = int(item["body"]["alert"]['timestamp']['epochMs']) // 1000
        _time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(_time))
        alert_service = alert['HEALTH_TEST_RESULTS'][0]['testName']
        alert_event_code = alert['HEALTH_TEST_RESULTS'][0]['eventCode']
        alert_severity = alert['HEALTH_TEST_RESULTS'][0]['severity']
        alert_content = alert['HEALTH_TEST_RESULTS'][0]['content']
        # msg = f"Cluster: {cluster}, host: {hostname}, severity: {alert_severity}, time: {_time}, " \
        #       f"service: {alert_service}, event code: {alert_event_code}, details: {alert_content}"
        msg = [cluster, hostname, alert_severity, _time, alert_service, alert_event_code, alert_content]
        alert_data_list.append(msg)
    return alert_data_list
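To make the expected input easier to picture, here is a minimal sketch that parses one hand-written alert. The JSON layout is inferred only from the keys that get_alert_data() reads, and every value in it (cluster name, host, test name, event code) is invented for illustration. Unlike the function above, the sketch formats the time in UTC so that the result does not depend on the machine's time zone.

```python
import json
import time

# Hypothetical sample: shaped after the keys get_alert_data() reads.
# All values are invented; real CM alerts carry many more fields.
SAMPLE_ALERTS = json.dumps([
    {
        "body": {
            "alert": {
                "timestamp": {"epochMs": 1600000000000},
                "attributes": {
                    "CLUSTER": ["test-cluster"],
                    "HOSTS": ["node01.example.com"],
                    "HEALTH_TEST_RESULTS": [
                        {
                            "testName": "EXAMPLE_HEALTH_TEST",
                            "eventCode": "EV_EXAMPLE",
                            "severity": "CRITICAL",
                            "content": "Example health test is bad.",
                        }
                    ],
                },
            }
        }
    }
])


def parse_alerts(raw_json):
    """Parse a JSON string of alerts into rows, mirroring get_alert_data()."""
    rows = []
    for item in json.loads(raw_json):
        alert = item["body"]["alert"]
        attrs = alert["attributes"]
        result = attrs["HEALTH_TEST_RESULTS"][0]
        # epochMs is in milliseconds; strftime needs seconds
        seconds = alert["timestamp"]["epochMs"] // 1000
        # gmtime (UTC) keeps the output deterministic; the article uses localtime
        when = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(seconds))
        rows.append([
            attrs["CLUSTER"][0],
            attrs.get("HOSTS", [None])[0],  # HOSTS may be absent
            result["severity"],
            when,
            result["testName"],
            result["eventCode"],
            result["content"],
        ])
    return rows


rows = parse_alerts(SAMPLE_ALERTS)
print(rows[0])
```

Each row has the same column order as the list built in get_alert_data(): cluster, host, severity, time, service, event code, details.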
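Once the data is parsed, the alert can be delivered by email, a DingTalk bot, a WeCom (企业微信) bot, and so on. This article uses email as the example; the complete alert.py script follows: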
#!/usr/bin/env python
# coding=utf-8
# Author: Yujichang
# desc: custom alert script for CDH
import json
import sys
import time
import smtplib
from email.mime.text import MIMEText
class SendMail(object):
    """Send custom alert emails."""
    def __init__(self):
        self.mail_host = 'smtp.mxhichina.com'
        self.mail_user = 'yujichang@deepexi.com'
        self.mail_passwd = '*****'
        self.mail_name = 'yujichang@deepexi.com'
        self.mail_to_user = ['yujichang@deepexi.com']
    def send_mail(self, content):
        sender = "CDHMonitor" + "<" + self.mail_name + ">"
        msg = MIMEText(content, 'html', 'utf-8')
        msg['Subject'] = 'CDH service status check'
        msg['From'] = sender
        msg['To'] = ",".join(self.mail_to_user)
        try:
            # Pass host and port to the constructor so the TLS handshake
            # knows the server hostname (a bare SMTP_SSL() followed by
            # connect() fails on newer Python 3 versions)
            s = smtplib.SMTP_SSL(self.mail_host, 465)
            s.login(self.mail_user, self.mail_passwd)
            s.sendmail(sender, self.mail_to_user, msg.as_string())
            s.close()
            return True
        except Exception as e:
            print(e)
            return False
def get_alert_data():
    """
    Parse the raw alert data received on stdin.
    """
    org_alert_file = sys.stdin
    alert_data = json.load(org_alert_file)
    alert_data_list = []
    for item in alert_data:
        alert = item["body"]["alert"]['attributes']
        cluster = alert['CLUSTER'][0]
        hostname = alert.get('HOSTS', [None])[0]
        # epochMs is in milliseconds; convert it to seconds
        _time = int(item["body"]["alert"]['timestamp']['epochMs']) // 1000
        _time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(_time))
        alert_service = alert['HEALTH_TEST_RESULTS'][0]['testName']
        alert_event_code = alert['HEALTH_TEST_RESULTS'][0]['eventCode']
        alert_severity = alert['HEALTH_TEST_RESULTS'][0]['severity']
        alert_content = alert['HEALTH_TEST_RESULTS'][0]['content']
        # msg = f"Cluster: {cluster}, host: {hostname}, severity: {alert_severity}, time: {_time}, " \
        #       f"service: {alert_service}, event code: {alert_event_code}, details: {alert_content}"
        msg = [cluster, hostname, alert_severity, _time, alert_service, alert_event_code, alert_content]
        alert_data_list.append(msg)
    return alert_data_list
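def conversion_to_html(data: list):
    """
    Convert the parsed alert data to HTML so that the alert email can be sent in HTML format.
    """
    import pandas as pd
    # pandas below v1.0 needs the column width set to -1 to avoid truncated cells;
    # on v1.0 and later, pass None instead of -1
    pd.set_option('display.max_colwidth', -1)
    columns = ['Cluster', 'Host', 'Severity', 'Time', 'Service', 'Event code', 'Details']
    alert_data_html = pd.DataFrame(data, columns=columns).to_html(escape=False)
    head = \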
"""
<head>
<meta charset="utf-8">
<STYLE TYPE="text/css" MEDIA=screen>
table.dataframe {
border-collapse: collapse;
border: 2px solid #a19da2;
/* center the whole table */
margin: auto;
}
table.dataframe thead {
border: 2px solid #91c6e1;
background: #f1f1f1;
padding: 10px 10px 10px 10px;
color: #333333;
}
table.dataframe tbody {
border: 2px solid #91c6e1;
padding: 10px 10px 10px 10px;
}
table.dataframe tr {
}
table.dataframe th {
vertical-align: top;
font-size: 14px;
padding: 10px 10px 10px 10px;
color: #105de3;
font-family: arial;
text-align: center;
}
table.dataframe td {
text-align: center;
padding: 10px 10px 10px 10px;
}
body {
font-family: 宋体;
}
h1 {
color: #5db446
}
div.header h2 {
color: #0002e3;
font-family: 黑体;
}
div.content h2 {
text-align: center;
font-size: 28px;
text-shadow: 2px 2px 1px #de4040;
color: #fff;
font-weight: bold;
background-color: #008eb7;
line-height: 1.5;
margin: 20px 0;
box-shadow: 10px 10px 5px #888888;
border-radius: 5px;
}
h3 {
font-size: 22px;
background-color: rgba(0, 2, 227, 0.71);
text-shadow: 2px 2px 1px #de4040;
color: rgba(239, 241, 234, 0.99);
line-height: 1.5;
}
h4 {
color: #e10092;
font-family: 楷体;
font-size: 20px;
text-align: center;
}
td img {
/*width: 60px;*/
max-width: 300px;
max-height: 300px;
}
</STYLE>
</head>
"""
    body = \
"""
<body>
<div align="center" class="header">
<!-- header section -->
<h1 align="center">CDH cluster service alert</h1>
</div>
<hr>
<div class="content">
<!-- main content -->
<h2> </h2>
<div>
<h4></h4>
{df_html}
</div>
<hr>
<p style="text-align: center">
</p>
</div>
</body>
""".format(df_html=alert_data_html)
    html_msg = "<html>" + head + body + "</html>"
    # Return a str: MIMEText encodes it with the charset it is given,
    # and passing bytes to MIMEText fails on Python 3
    html_msg = html_msg.replace('\n', '')
    return html_msg
if __name__ == '__main__':
    alert_data = get_alert_data()
    alert_data_html = conversion_to_html(alert_data)
    s = SendMail()
    s.send_mail(alert_data_html)
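The article mentions DingTalk bots as another delivery channel. A rough sketch of that path might look like the following, assuming a DingTalk custom robot that accepts plain text messages. The webhook token is a placeholder, the function names are hypothetical, and robots configured with keyword or signature security need extra handling that is not shown here.

```python
import json
import urllib.request

# Placeholder: replace YOUR_TOKEN with the token of your own DingTalk robot.
DINGTALK_WEBHOOK = "https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN"

LABELS = ["Cluster", "Host", "Severity", "Time", "Service", "Event code", "Details"]


def build_dingtalk_payload(rows):
    """Turn parsed alert rows into a text-message payload (UTF-8 bytes)."""
    lines = []
    for row in rows:
        # Pair each value with its label, e.g. "Cluster: c1, Host: h1, ..."
        lines.append(", ".join("%s: %s" % (k, v) for k, v in zip(LABELS, row)))
    message = {
        "msgtype": "text",
        "text": {"content": "CDH alert\n" + "\n".join(lines)},
    }
    return json.dumps(message).encode("utf-8")


def send_to_dingtalk(rows, webhook=DINGTALK_WEBHOOK, timeout=10):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook,
        data=build_dingtalk_payload(rows),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In alert.py this would replace the SendMail call: pass the list returned by get_alert_data() to send_to_dingtalk(). Building the payload separately from sending it keeps the formatting testable without network access.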
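Note: the email alerting in alert.py requires the pandas module to be installed. The script also uses Python 3 syntax (type annotations), so the python command called by alert.sh must resolve to a Python 3 interpreter.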
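If installing pandas on the Alert Publisher node is not an option, the table could be built with the standard library instead. The sketch below is one possible substitute for the DataFrame.to_html() call, not part of the original script. The function name is hypothetical, and unlike to_html(escape=False) it escapes cell values, which is usually safer for alert text. It keeps class="dataframe" so the CSS in the script still applies.

```python
from html import escape

COLUMNS = ["Cluster", "Host", "Severity", "Time", "Service", "Event code", "Details"]


def rows_to_html_table(rows, columns=COLUMNS):
    """Render alert rows as an HTML table using only the standard library."""
    header = "".join("<th>%s</th>" % escape(str(c)) for c in columns)
    body = []
    for row in rows:
        # str() also handles None, e.g. when an alert has no HOSTS entry
        cells = "".join("<td>%s</td>" % escape(str(v)) for v in row)
        body.append("<tr>%s</tr>" % cells)
    return (
        '<table border="1" class="dataframe">'
        "<thead><tr>%s</tr></thead><tbody>%s</tbody></table>"
        % (header, "".join(body))
    )
```

The returned string can be passed to .format(df_html=...) in place of the pandas output.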
3. Change ownership and permissions
chown cloudera-scm:cloudera-scm -R /opt/cloudera/alert_script
chmod +x /opt/cloudera/alert_script/alert.sh
chmod +x /opt/cloudera/alert_script/alert.py
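4. Configure the CM alert service
In CM, go to Cloudera Management Service > Configuration, search for alert.script.path, and set it to the script created above (/opt/cloudera/alert_script/alert.sh).
Then restart the Alert Publisher service.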
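5. Verify
To verify, kill a relevant process at the operating system level. For this article the Hue Load Balancer process was killed manually, and the corresponding alert email arrived shortly afterwards.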
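Killing a service is not always convenient, so a dry run can be a gentler first check. The sketch below writes one invented alert to a temporary file and calls alert.sh with that file path as its first argument, which is how the script in this article reads its input. The sample values and function names are hypothetical, and running dry_run() on a configured node sends a real email.

```python
import json
import subprocess
import tempfile

SCRIPT = "/opt/cloudera/alert_script/alert.sh"  # path used in this article


def write_sample_alert_file():
    """Write one invented alert to a temp file and return the file path."""
    sample = [{
        "body": {"alert": {
            "timestamp": {"epochMs": 1600000000000},
            "attributes": {
                "CLUSTER": ["test-cluster"],
                "HOSTS": ["node01.example.com"],
                "HEALTH_TEST_RESULTS": [{
                    "testName": "EXAMPLE_HEALTH_TEST",
                    "eventCode": "EV_EXAMPLE",
                    "severity": "CRITICAL",
                    "content": "Dry-run alert, please ignore.",
                }],
            },
        }}
    }]
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
        json.dump(sample, fh)
        return fh.name


def dry_run(script=SCRIPT):
    """Run the alert script with the sample file as $1; return its exit code."""
    path = write_sample_alert_file()
    return subprocess.run(["bash", script, path], check=False).returncode
```

Running it as the cloudera-scm user also confirms that the ownership and execute permissions set in step 3 are correct.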