Putting Google DevOps Experience into Practice: SRE Slides (.pptx)
Uploaded to 163文库 by user 晟晟文业.
What the heck is SRE?

Google SRE 07-14: YouTube
- Streaming video: transcoding, streaming, storage (1 PB/month)
- Global CDN network (10K nodes, peaking at 10 Tbps egress)

Google SRE 07-14: Google Cloud Platform
- Machine lifecycle management (X clusters globally, Y machines)
- Borg, Omega (X million jobs scheduled every week)

SITE RELIABILITY ENGINEERING: put simply, it's the same thing as DevOps

Site: the production-line administrators
- Ensure user-visible uptime and service quality
- Authority over the production environment

Growing along with the site
- Steep learning curve, mostly due to complexity
- Continuous retraining, since sites are always being improved

Infrastructure
- Specializations for shared infrastructure
- Ensure those components have good reliability

Reliability: it just works
- Service Level Objective (SLO)
- Monitoring / deployment
- Capacity planning

One against a hundred
- The team manages monitoring and develops automation
- This implies the use of scripting and data analysis tools
- Most failures need automated recoveries in place

Firefighter and arsonist rolled into one
- Elevated risk during convenient working hours
- Learn of age-mortality risk during the preceding workday
- Infant mortality ideally also avoids meals

Engineering: coders, not administration

Heavy (addicted) users of the alerting system
- Holes may cause an outage before notification occurs
- Routinely use multiple layers, levels and viewpoints
- Design the manual and automatic escalation paths

Responsible for the future
- Responsible for enabling growth and scaling
- Plan for requirements, identify inefficiencies
- File bugs and, where appropriate, fix them too

Who are SREs?

Programmers who veered off course
- A 50-50 mix of software background and systems engineering background

Severe OCD, perfectionist Virgos
- "a team of people who fundamentally will not accept doing things over and over by hand." (Ben Treynor)

Thick-skinned: the eternal DEV/OPS conflict
- DEV: The incentive of the development team is to get features launched and to get users to adopt the product.
- OPS: The incentive of a team with operational duties is to ensure that the thing doesn't blow up on their watch.
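The slides list the Service Level Objective (SLO) as a core reliability tool but give no numbers. The following is a minimal sketch of checking measured availability against an SLO. It assumes a request-based SLO, where availability means good requests divided by total requests, and a hypothetical 99.9% target. The `SloWindow` type and both function names are illustrative, not part of any real library.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts observed over one measurement window (hypothetical schema)."""
    good_requests: int
    total_requests: int


def availability(window: SloWindow) -> float:
    """Fraction of requests that succeeded; an idle window counts as fully available."""
    if window.total_requests == 0:
        return 1.0
    return window.good_requests / window.total_requests


def meets_slo(window: SloWindow, target: float = 0.999) -> bool:
    """True if measured availability is at or above the SLO target (assumed 99.9%)."""
    return availability(window) >= target


if __name__ == "__main__":
    week = SloWindow(good_requests=999_500, total_requests=1_000_000)
    print(f"availability={availability(week):.4%} meets_slo={meets_slo(week)}")
```

With 500 failures out of a million requests, availability is 99.95%, so the assumed 99.9% target is met. At 2,000 failures it would not be.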
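The alerting slide says to design both manual and automatic escalation paths. One way to picture such a path is as an ordered list of tiers with acknowledgement timeouts. This is a hedged sketch, not Google's scheme. The tiers, the 15-minute timeouts, and the names `Tier`, `ESCALATION_PATH` and `current_tier` are all hypothetical. The paging system escalates automatically through the first two tiers, and the final step is a manual call.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One step of a hypothetical escalation path."""
    target: str           # who is notified at this step
    automatic: bool       # True if the paging system escalates on its own
    ack_timeout_min: int  # minutes to wait for an acknowledgement before moving on


# Hypothetical path: two automatic paging tiers, then a manual escalation.
ESCALATION_PATH = (
    Tier("primary on-call", automatic=True, ack_timeout_min=15),
    Tier("secondary on-call", automatic=True, ack_timeout_min=15),
    Tier("team lead (manual call)", automatic=False, ack_timeout_min=0),
)


def current_tier(minutes_unacknowledged: int,
                 path: tuple[Tier, ...] = ESCALATION_PATH) -> Tier:
    """Return the tier that owns an alert left unacknowledged for the given minutes.

    Each tier holds the alert for its ack timeout; the last tier keeps it indefinitely.
    """
    deadline = 0
    for tier in path[:-1]:
        deadline += tier.ack_timeout_min
        if minutes_unacknowledged < deadline:
            return tier
    return path[-1]


if __name__ == "__main__":
    for minutes in (0, 20, 40):
        tier = current_tier(minutes)
        kind = "automatic" if tier.automatic else "manual"
        print(f"{minutes:>2} min unacknowledged -> {tier.target} ({kind})")
```

Keeping the path as data rather than code makes it easy to review. The same table can also be used to check for the "holes" the slide warns about, such as a tier nobody staffs.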
The org structure at a glance
[Org chart labels: Boss; product lines; sub-boss; arts; development teams; production; App SRE; Infrastructure SRE; datacenter operations; supply chain]

Organizational structure
- Built around each product line as a loose, learning organization
- Get incentives right
- SRE is a privilege, not a right
- Free to move; free to leave a bad service

What SRE does

SRE has the final say
- Production Readiness Review (PRR)

ROI matters most for SRE
- SRE resources are limited
- Favor work with high marginal benefit

Early phase
- SRE gives guidance in automating routine tasks, reducing workload by eliminating administrivia
- SRE points out errors and omissions in documents; the developer might then beg others for assistance
- SRE suggests additional long-term monitors, which fill in coverage gaps and track performance; administrators need sufficient, trustworthy monitoring

Mature phase
- Decisions become progressively longer term
- The daily task workload for a site keeps shrinking
- Software improvements consist of tuning and analysis
- The developer still has a short-term viewpoint: working on the next release, fixing known bugs
- The old live releases start to be a distraction
- This creates an obvious incentive to request that the site be transferred to SRE

On-call phase
- On call is more than quick fixes; SRE team members take turns
- Fix any problem whose solution is not yet automated
- Accumulate occurrence counts to identify priorities
- Document the effective diagnostics and solutions
- The permanent solution takes a lot more time: file a bug, develop a patch, test, code review, submit; then schedule it for integration, release and deployment
- Why spend many hours or days doing all that?

Deployment model: following the sun
- Only one engineer responds to any given alert
- Use a p…
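The on-call advice to accumulate occurrence counts to identify priorities can be sketched as follows. This is a minimal illustration that assumes each page is logged with a short problem label. The labels and the `automation_priorities` name are hypothetical. Problems that already have automated recovery are skipped, so the most frequent manual fixes rise to the top as candidates for a permanent solution.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def automation_priorities(incidents: Iterable[str], automated: set[str],
                          top_n: int = 3) -> list[tuple[str, int]]:
    """Rank manually fixed problems by how often they occurred on call.

    incidents: one problem label per on-call occurrence (hypothetical labels).
    automated: labels that already have an automated recovery and are skipped.
    """
    counts = Counter(label for label in incidents if label not in automated)
    # most_common orders by count; ties keep first-seen order.
    return counts.most_common(top_n)


if __name__ == "__main__":
    log = ["disk_full", "stuck_job", "disk_full", "cert_expiry",
           "disk_full", "stuck_job", "bad_push"]
    print(automation_priorities(log, automated={"bad_push"}))
    # [('disk_full', 3), ('stuck_job', 2), ('cert_expiry', 1)]
```

The count is what answers the slide's closing question. A fix that recurs every week justifies the days spent on the patch, review, and release cycle.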
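The "following the sun" deployment model is only named on the slide, and the rest of the text is cut off. The sketch below shows how each alert could be routed to exactly one engineer. It assumes three hypothetical regional rotations, each covering an eight-hour UTC window, with the primary rotating weekly within a region. The regions, hours, engineer names and the `responder` function are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Rotation:
    """One regional on-call rotation covering [start_hour, end_hour) UTC (hypothetical)."""
    region: str
    start_hour: int
    end_hour: int
    engineers: tuple[str, ...]


# Hypothetical three-site split of the 24-hour day.
ROTATIONS = (
    Rotation("APAC", 0, 8, ("ana", "bo")),
    Rotation("EMEA", 8, 16, ("chen", "dev")),
    Rotation("AMER", 16, 24, ("eve", "finn")),
)


def responder(alert_time: datetime, week_index: int) -> str:
    """Return the single engineer who owns an alert fired at alert_time.

    The region is chosen by UTC hour (follow the sun); within a region the
    primary rotates weekly, so exactly one person is paged per alert.
    Naive datetimes are interpreted as local time by astimezone().
    """
    hour = alert_time.astimezone(timezone.utc).hour
    for rot in ROTATIONS:
        if rot.start_hour <= hour < rot.end_hour:
            return rot.engineers[week_index % len(rot.engineers)]
    raise ValueError(f"no rotation covers hour {hour}")


if __name__ == "__main__":
    t = datetime(2024, 1, 10, 9, 30, tzinfo=timezone.utc)
    print(responder(t, week_index=1))  # EMEA window, week 1 -> dev
```

Because the windows partition the day, no alert can page two regions at once. That matches the slide's rule that only one engineer responds to any given alert.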