Prometheus + Grafana Monitoring: Study Notes

1. Installing Prometheus Server

1.1 Create the service user

groupadd prometheus
useradd -g prometheus -m -d /opt/prometheus/ -s /sbin/nologin prometheus

1.2 Install Prometheus server

wget https://github.com/prometheus/prometheus/releases/download/v2.25.2/prometheus-2.25.2.linux-amd64.tar.gz
# Unpack directly into /opt/prometheus so the paths match the systemd unit in 1.6
tar xzf prometheus-2.25.2.linux-amd64.tar.gz -C /opt/prometheus --strip-components=1
cd /opt/prometheus

1.3 Validate the Prometheus configuration

Validate the configuration after every change, so that a syntax error does not prevent the Prometheus server from starting.

./promtool check config prometheus.yml

1.4 Start Prometheus

For now, start the Prometheus server with the default configuration to take a look at the web UI. Monitoring Linux servers is covered later.

./prometheus --config.file=prometheus.yml

1.5 Access Prometheus from a browser

Open http://ip:9090 in a browser, where ip is the address of the Prometheus server.
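
Before opening the browser, it can help to confirm from the command line that the server is actually up. The sketch below polls Prometheus' /-/ready management endpoint with curl. The function name wait_for_prometheus and the PROM_URL variable are illustrative, not part of Prometheus, and the retry counts are arbitrary defaults.

```shell
# Poll the /-/ready endpoint until it answers with an HTTP success status.
# PROM_URL and wait_for_prometheus are illustrative names.
PROM_URL="${PROM_URL:-http://localhost:9090}"

wait_for_prometheus() {
  local tries="${1:-30}"   # maximum number of attempts
  local delay="${2:-1}"    # seconds to wait between attempts
  local i=1
  while [ "$i" -le "$tries" ]; do
    # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null "${PROM_URL}/-/ready"; then
      echo "Prometheus is ready after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Prometheus did not become ready after ${tries} attempts" >&2
  return 1
}
```

For example, wait_for_prometheus 30 1 waits up to roughly 30 seconds and returns non-zero if the server never becomes ready.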

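The Targets page lists only the Prometheus server itself, because no other targets have been added yet. The sections below add Linux hosts; later posts will cover monitoring Redis, RabbitMQ, Kafka, nginx, Java and other common services.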

The default Prometheus configuration:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    scrape_interval: 10s
    static_configs:
    - targets: ['localhost:9090']

1.6 Run Prometheus as a systemd service and enable it at boot

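touch /usr/lib/systemd/system/prometheus.service
# The service runs as the prometheus user, so that user must own the install and data directory (the unit file itself stays owned by root)
chown -R prometheus:prometheus /opt/prometheus
vi /usr/lib/systemd/system/prometheus.service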

Write the following into prometheus.service:

[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
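# --storage.tsdb.path is optional. /opt/prometheus/ is the directory holding the executables; adjust it to your actual path. By default the data directory is ./data under the working directory.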
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --web.enable-lifecycle --storage.tsdb.path=/opt/prometheus/data --storage.tsdb.retention=60d
Restart=on-failure

[Install]
WantedBy=multi-user.target

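Prometheus startup flags

  • --config.file -- path to the Prometheus configuration file
  • --web.enable-lifecycle -- enables the HTTP endpoints for reload and shutdown, so configuration changes can be hot-reloaded
  • --storage.tsdb.path -- directory where the monitoring data is stored
  • --storage.tsdb.retention -- how long data is kept (deprecated in newer releases in favor of --storage.tsdb.retention.time)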

Enable at boot

systemctl daemon-reload
systemctl enable prometheus.service
systemctl restart prometheus.service
systemctl status prometheus.service

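Note: since Prometheus 2.0, the HTTP endpoint for reloading the configuration is disabled by default. Without it, a configuration change requires restarting the Prometheus server (or sending it a SIGHUP signal) to take effect, which is unacceptable for production monitoring, so we enable hot reloading of the configuration.

Starting Prometheus with the --web.enable-lifecycle flag enables hot reloading. After changing the configuration, reload it with: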

curl -X POST  http://localhost:9090/-/reload

2. Monitoring other Linux hosts with Prometheus (run the following on the hosts to be monitored)

2.1 Install and configure node_exporter

# Create the service user
groupadd prometheus
# /usr/local/node_exporter/ is where node_exporter will live; -M skips creating it so the symlink below can take that name
useradd -g prometheus -M -d /usr/local/node_exporter/ -s /sbin/nologin prometheus
# Download node_exporter (see https://prometheus.io/download/#node_exporter)
wget https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz

# Unpack, move into place, and delete the downloaded archive
tar -zxf node_exporter-1.1.2.linux-amd64.tar.gz
mv node_exporter-1.1.2.linux-amd64 /usr/local/
ln -sv /usr/local/node_exporter-1.1.2.linux-amd64 /usr/local/node_exporter
rm -f node_exporter-1.1.2.linux-amd64.tar.gz

# Set up node_exporter as a systemd service
touch /usr/lib/systemd/system/node_exporter.service
# Only the node_exporter files need to belong to the service user; the unit file stays owned by root
chown -R prometheus:prometheus /usr/local/node_exporter*
vi /usr/lib/systemd/system/node_exporter.service

Add the following to node_exporter.service:

[Unit]
Description=node_exporter
After=network.target
[Service]
Type=simple
User=prometheus
# /usr/local/node_exporter/node_exporter is the path of the executable; adjust it to your setup
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target

Start the node_exporter service and enable it at boot

systemctl daemon-reload
systemctl enable node_exporter.service
systemctl start node_exporter.service
systemctl status node_exporter.service
# Other commands, for reference (not part of the setup sequence):
# systemctl restart node_exporter.service
# systemctl stop node_exporter.service

Once node_exporter is running, its metrics are available at the URL below (replace node_exporter_server_ip with the IP address of the node_exporter host and open it in a browser).

http://node_exporter_server_ip:9100/metrics
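
The /metrics page is plain text: comment lines starting with # (HELP and TYPE) followed by one sample per line, optionally with labels in braces. As a rough sketch of how to pull a single metric out of that output on the command line, the helper below filters samples by metric name. The function name metric_samples is made up for this example.

```shell
# Print every sample line of one metric from Prometheus text exposition
# format read on stdin. metric_samples is an illustrative helper name.
metric_samples() {
  awk -v name="$1" '
    /^#/ { next }                       # skip HELP/TYPE comment lines
    NF == 0 { next }                    # skip empty lines
    {
      metric = $1
      brace = index(metric, "{")        # labels, if any, start at the first "{"
      if (brace > 0) metric = substr(metric, 1, brace - 1)
      if (metric == name) print $0
    }
  '
}
```

Used against a live exporter, this would look like curl -s http://node_exporter_server_ip:9100/metrics | metric_samples node_load1, which prints the 1-minute load average sample.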

For a better presentation, we next add this endpoint to the Prometheus server and display the data in Grafana.

Add node_exporter to prometheus.yml. The complete prometheus.yml looks like this:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
  # The job below is new; adjust the path of the yml file to your setup
  - job_name: 'Linux'
    file_sd_configs:
    - files: ['/opt/prometheus/linux.yml']
      refresh_interval: 5s

Then write the following to /opt/prometheus/linux.yml:

- targets: ['192.168.64.202:9100']
  labels:
    name: 'linux-node01'

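If promtool rejects the configuration even though you followed the steps above, the file most likely contains a syntax error and is not valid YAML (often an indentation problem). Check it carefully. If you have questions, leave a comment below.

The advantage of this layout is that it makes monitoring configuration easier to automate and standardize: each class of targets lives in its own file, which is easier to maintain.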

# Validate the configuration
./promtool check config prometheus.yml
# Reload the Prometheus configuration
curl -X POST  http://localhost:9090/-/reload
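
Since the check and the reload always go together, they can be wrapped in one small function so a broken configuration is never sent to the running server. This is only a sketch: safe_reload and the three variables are names chosen for this example, and the default paths assume the /opt/prometheus install directory used in the systemd unit from 1.6.

```shell
# Reload Prometheus only if the configuration passes promtool's check.
# PROMTOOL, PROM_CONFIG and PROM_URL are illustrative variables.
PROMTOOL="${PROMTOOL:-/opt/prometheus/promtool}"
PROM_CONFIG="${PROM_CONFIG:-/opt/prometheus/prometheus.yml}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

safe_reload() {
  if ! "$PROMTOOL" check config "$PROM_CONFIG"; then
    echo "config check failed, not reloading" >&2
    return 1
  fi
  # -f makes curl return non-zero if the server rejects the request
  # (for example when --web.enable-lifecycle is not set)
  curl -fsS -X POST "${PROM_URL}/-/reload" && echo "reload requested"
}
```

Calling safe_reload after editing prometheus.yml then either reloads the server or leaves it untouched with an error message.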

3. Installing and configuring Grafana for data display (on the main server)

Download page: https://grafana.com/grafana/download

wget https://dl.grafana.com/oss/release/grafana-7.4.5-1.x86_64.rpm
sudo yum install grafana-7.4.5-1.x86_64.rpm

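Grafana listens on port 3000 by default; open http://localhost:3000/ in a browser.

The initial Grafana login is admin/admin; the password can be changed afterwards.

If the page is unreachable, start the service manually: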

# Start the service
systemctl start grafana-server
# Enable it at boot
systemctl enable grafana-server
# Check the service status
systemctl status grafana-server

Configure the data source
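
The data source can be added in the Grafana web UI. As an alternative that can be scripted, Grafana also reads data source definitions from provisioning files at startup. The sketch below writes such a file for the Prometheus server set up earlier. The function name write_grafana_datasource is made up, the default path assumes the standard /etc/grafana/provisioning layout of the RPM package, and the data source name "Prometheus" is an arbitrary choice.

```shell
# Write a Grafana provisioning file that registers Prometheus as a data source.
# write_grafana_datasource is an illustrative helper; the default output path
# is an assumption about the package layout.
write_grafana_datasource() {
  local out="${1:-/etc/grafana/provisioning/datasources/prometheus.yaml}"
  local prom_url="${2:-http://localhost:9090}"
  cat > "$out" <<EOF
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${prom_url}
    isDefault: true
EOF
}
```

After running write_grafana_datasource, restart Grafana with systemctl restart grafana-server so the file is picked up.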

Adding more Linux machines

Repeat the node_exporter installation steps from section 2.1 on each new machine.

Then add the new target to /opt/prometheus/linux.yml:

- targets: ['192.168.64.202:9100']
  labels:
    name: 'linux-node01'
# Newly added machine
- targets: ['192.168.64.203:9100']
  labels:
    name: 'linux-node02'
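
With more than a handful of machines, editing linux.yml by hand gets error-prone, so the file can be generated instead. The following is a minimal sketch, assuming a host list of "ip name" pairs on stdin and the node_exporter port 9100 used in this article. The function name write_linux_targets and the input format are made up for this example.

```shell
# Generate a file_sd target file from "ip name" pairs read on stdin.
# The output goes to a temporary file first and is then renamed, so
# Prometheus never picks up a half-written file.
write_linux_targets() {
  local out="${1:-/opt/prometheus/linux.yml}"
  local tmp="${out}.tmp"
  local ip name
  : > "$tmp"
  while read -r ip name; do
    if [ -z "$ip" ]; then continue; fi   # skip blank lines
    {
      printf '%s\n' "- targets: ['${ip}:9100']"
      printf '%s\n' "  labels:"
      printf '%s\n' "    name: '${name}'"
    } >> "$tmp"
  done
  mv "$tmp" "$out"
}
```

For example, printf '%s\n' '192.168.64.202 linux-node01' '192.168.64.203 linux-node02' | write_linux_targets reproduces the file shown above (without the comment line). Because the Linux job uses file_sd_configs, Prometheus notices the changed file on its own and no reload is needed.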

Eric chan
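Copyright: original article on this site, published by Eric chan on 2021-03-25, 5926 characters in total.
Reprinting: unless otherwise stated, articles on this site are released under the CC-4.0 license; please credit the source when reprinting.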