Articles mentioning "Prometheus"

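…M, and may even fail; - repeated OOMs, which can also be triggered while replaying the WAL; - downtime caused by OOM, leading to monitoring gaps or potential loss of metrics. After an evaluation, they decided to migrate from Prometheus to VictoriaMetrics. The benefits: - better resource utilization: CPU usage down 50%, memory usage kept below 35%, and stable performance; - better query performance; - lower…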

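…ium](https://valyala.medium.com/why-irate-from-prometheus-doesnt-capture-spikes-45f9896d7832) The author argues that Prometheus's irate cannot reliably capture spikes: irate only uses the last two points in the current window, so it can miss spikes elsewhere in the window, and whether it catches one depends on the start, end, and step values. The author suggests using…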
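The "last two points" behaviour described above can be sketched in a few lines of Python. This is a simplified model, not Prometheus code: the `irate` and `avg_rate` helpers, the 10-second scrape interval, and the sample data are all assumptions made for illustration, and counter resets and extrapolation are ignored.

```python
def window_points(samples, t_end, window):
    """Samples with timestamps in (t_end - window, t_end]."""
    return [(t, v) for t, v in samples if t_end - window < t <= t_end]


def irate(samples, t_end, window):
    """Per-second rate computed from only the last two samples in the window."""
    pts = window_points(samples, t_end, window)
    if len(pts) < 2:
        return None
    (t0, v0), (t1, v1) = pts[-2], pts[-1]
    return (v1 - v0) / (t1 - t0)


def avg_rate(samples, t_end, window):
    """Average per-second increase between the first and last sample in the window."""
    pts = window_points(samples, t_end, window)
    if len(pts) < 2:
        return None
    (t0, v0), (t1, v1) = pts[0], pts[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter scraped every 10 s: +10 per scrape, except one
# spike of +1000 between t=40 and t=50.
samples = []
value = 0
for t in range(0, 121, 10):
    samples.append((t, value))
    value += 1000 if t == 40 else 10

if __name__ == "__main__":
    # A graph with step=60 evaluates at t=60 and t=120 only.
    for t_end in (60, 120):
        print(t_end, irate(samples, t_end, 60), avg_rate(samples, t_end, 60))
    # An evaluation that happens to land right after the spike does see it.
    print(50, irate(samples, 50, 60))
```

With a 60-second step, neither evaluation point has the spike in its last two samples, so the `irate` series stays flat, while the window average at t=60 still reflects it. Evaluating at t=50 makes the spike visible, which is the dependence on start, end, and step mentioned above.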

…urce. How many hours do your company’s engineers spend staring at dashboards? Companion reading on how GitLab uses Prometheus for anomaly detection: [How to use Prometheus for anomaly detection in GitLab](https://about.gitlab.com/blog/2019/…

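…g.palark.com/vector-to-export-pgsql-logs-into-prometheus/) Uses Vector to expose PostgreSQL error logs to Prometheus. Vector alone handles reading, parsing, structuring, and filtering the logs and turning them into an exporter. The metric ultimately exposed is the number of error log entries, and an alert fires by checking whether that number has changed within the last 30 minutes. Recently I had a requirement…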

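…tps://tomscii.sig7.se/2022/07/uMon-stupid-simple-monitoring) Going against the trend, the author chose RRDtool to build their own monitoring system. Their complaints about Prometheus & Grafana: a TSDB that takes up a lot of disk space, memory consumption, complex configuration, network traffic (noise on an otherwise idle system), a complicated frontend, and the DSL. --- [Don't trust default…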

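…gs, Metrics, Traces. Log levels and formats are chaotic, with no consensus. Logs are used for everything from debugging to business analytics, which makes them critical; storage is expensive, yet most logs are never used. Metrics are good, but the current Prometheus ecosystem is not that easy to scale later on. --- [Backup and Restore Containers With Kubernetes Checkpointing API | by Martin…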

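…Prometheus Summary and Histogram: Prometheus's Summary and Histogram types. When we implemented network monitoring internally, we used a Histogram. Bucket granularity matters: if the buckets are poorly chosen, the error in the result can be…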
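To illustrate why bucket granularity matters, here is a minimal Python sketch of estimating a quantile from cumulative buckets by linear interpolation inside the bucket that contains the target rank, which is the general approach `histogram_quantile` takes for classic histograms. The helper names, bucket boundaries, and latency data are hypothetical and chosen only for illustration.

```python
def cumulative_buckets(observations, upper_bounds):
    """Cumulative counts per upper bound, with a final +Inf bucket (like `le` buckets)."""
    bounds = list(upper_bounds) + [float("inf")]
    counts = [sum(1 for x in observations if x <= b) for b in bounds]
    return bounds, counts


def estimate_quantile(q, bounds, counts):
    """Estimate quantile q by linear interpolation within the matching bucket."""
    rank = q * counts[-1]
    # Find the first bucket whose cumulative count reaches the rank.
    i = next(idx for idx, c in enumerate(counts) if c >= rank)
    if bounds[i] == float("inf"):
        # Nothing to interpolate against; fall back to the highest finite bound.
        return bounds[i - 1]
    lower = bounds[i - 1] if i > 0 else 0.0
    prev = counts[i - 1] if i > 0 else 0
    return lower + (bounds[i] - lower) * (rank - prev) / (counts[i] - prev)


# Hypothetical latencies in seconds: most requests take 0.11 s, a few take 0.9 s.
latencies = [0.11] * 90 + [0.9] * 10

coarse = cumulative_buckets(latencies, [0.1, 1.0])
fine = cumulative_buckets(latencies, [0.1, 0.125, 0.15, 0.5, 1.0])

if __name__ == "__main__":
    print("coarse p50:", estimate_quantile(0.5, *coarse))
    print("fine   p50:", estimate_quantile(0.5, *fine))
```

With only the 0.1 and 1.0 boundaries, the estimated median lands around 0.55 even though every observation near the median is 0.11, because the estimate assumes values are spread evenly across the whole bucket. Adding boundaries near the values of interest brings the estimate close to the real median.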

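…ever](https://drafts.damnever.com/2022/some-important-notes-about-victoriametrics.html) I only use Prometheus in very simple ways, so the article did not leave a particularly deep impression on me; here is a comment from a teammate instead: > On the WAL part, the article by the VictoriaMetrics author says that data corruption is more serious and harder to deal with than data loss; since power loss will always happen, you have to pick one of the two, wa…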

…overflow'" --- Prometheus 数据存储那些事儿 - luozhiyuns Blog An introduction to Prometheus data storage. --- How do Nix builds work?…