The Design and Architecture of the Microsoft Cluster Service (selected courseware, PPT)

The Design and Architecture of the Microsoft Cluster Service (MSCS)
W. Vogels et al.
ECE 845 Presentation
By Sandeep Tamboli
April 18, 2000

Outline
- Prerequisites
- Introduction
- Design Goals
- Cluster Abstractions
- Cluster Operation
- Cluster Architecture
- Implementation Examples
- Summary

Prerequisites
- Availability = MTTF / (MTTF + MTTR)
  - MTTF: Mean Time To Failure
  - MTTR: Mean Time To Repair
- High Availability
  - Modern taxonomy of High Availability: A system having sufficient redundancy in components to mask certain defined faults has High Availability (HA).
  - IBM High Availability Services: The goals of high availability solutions are to minimize both the number of service interruptions and the time needed to recover when an outage does occur. High availability is not a specific technology nor a quantifiable attribute; it is a goal to be reached. This goal is different for each system and is based on the specific needs of the business the system supports.
  - The presenter: May have degraded performance while a component is down

MSCS (a.k.a. Wolfpack)
- Extension of Windows NT to improve availability
- First phase of implementation
  - Scalability limited to 2 nodes
- MSCS features
  - Failover
  - Migration
  - Automated restart
- Differences from previous HA solutions
  - Simpler user interface
  - More sophisticated modeling of applications
  - Tighter integration with the OS (NT)

MSCS (2)
- Shared-nothing cluster model
  - Each node owns a subset of cluster resources
  - Only one node may own a resource at a time
  - On failure, another node may take over ownership of the resource

Design Goals
- Commodity
  - Commercial off-the-shelf nodes
  - Windows NT Server
  - Standard Internet protocols
- Scalability
- Transparency
  - Presented as a single system to the clients
  - System management tools manage the cluster as if it were a single server
  - Service and system execution information available in a single cluster-wide log

Design Goals (2)
- Availability
  - On failure detection
    - Restart the application on another node
    - Migrate ownership of other resources
  - Restart policy can specify the availability requirements of the application
  - Hardware/software upgrades possible in a phased manner

Cluster Abstractions
- Node
  - Runs an instance of the Cluster Service
  - Defined and active
- Resource
  - Functionality offered at a node
  - Physical: printer
  - Logical: IP address
  - Applications implement logical resources
    - Exchange mail database
    - SAP applications
- Quorum Resource
  - Persistent storage for the Cluster Configuration Database
  - Arbitration mechanism to control membership
  - Partition on a fault-tolerant shared SCSI disk

Cluster Abstractions (2)
- Resource Dependencies
  - Dependency trees: sequence in which to bring resources online
- Resource Groups
  - Unit of migration
- Virtual servers
  - Application runs within a virtual server environment
  - Illusion to applications, administrators, and clients of a single stable environment
  - Client connects using the virtual server name
  - Enables many application instances to run on the same physical node

Cluster Abstractions (3)
- Cluster Configuration Database
  - Replicated at each node
  - Accessed through the NT registry
  - Updates applied using the Global Update Protocol

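A dependency tree fixes the order in which a group's resources are brought online. The following is a minimal sketch of that ordering, not MSCS code: the four-resource group, the resource names, and the `online_order` / `offline_order` helpers are all hypothetical, and taking resources offline in the reverse order is an assumption rather than something the slides state. It uses Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical resource group: each resource maps to the resources it depends on.
DEPENDENCIES = {
    "disk": set(),
    "ip_address": set(),
    "network_name": {"ip_address"},
    "database": {"disk", "network_name"},
}


def online_order(dependencies):
    """Return an order in which every resource comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def offline_order(dependencies):
    """Assumption: resources go offline in the reverse of the online order."""
    return list(reversed(online_order(dependencies)))


if __name__ == "__main__":
    print("online :", online_order(DEPENDENCIES))
    print("offline:", offline_order(DEPENDENCIES))
```

Because the whole group is the unit of migration, the same ordering could be reused on whichever node the group lands on.
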
Cluster Membership Operation
[Figure: state diagram of a node's cluster membership. States: Offline, Initializing, Member Search, Quorum Disk Search, Joining, Forming, Online, Paused, Exiting, Sleeping. Transition labels: Start Cluster Service, Cluster Service Started, Fails, Found Online Member, Search Fails, Quorum Disk Found, Join Succeeds, Join Fails, Synchronize Succeeds, Timeout, Retries Exceeded, Pause, Resume, Evict or Leave Cluster, Shutdown System, Stop Cluster Service, Complete Rundown. Key: externally visible state, internal state. The edges between states are not recoverable from the extracted text.]

Member Join
- Sponsor broadcasts the identity of the joining node
- Sponsor informs the joining node about
  - Current membership
  - Cluster configuration database
- Joining member's heartbeats start
- Sponsor waits for the first heartbeat
- Sponsor signals the other nodes to consider the joining node a full member
- Acknowledgement is sent to the joining node
- On failure
  - Join operation aborted
  - Joining node removed from the membership

Member Regroup
- Upon suspicion that an active node has failed, the member regroup operation is executed to detect any membership changes
- Reasons for suspicion
  - Missing heartbeats
  - Power failures
- The regroup algorithm moves each node through 6 stages
- Each node sends periodic messages to all other nodes, indicating which stage it has finished
  - Barrier synchronization

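The barrier synchronization used by regroup can be pictured as a simple check each node runs on the status messages it has collected. This is only a sketch under stated assumptions: `may_advance`, its parameters, and the integer stage numbering are hypothetical, and treating a timeout as releasing the barrier follows the "all responses collected or timeout occurs" wording of the regroup algorithm rather than any documented MSCS interface.

```python
NUM_STAGES = 6  # the slides state that regroup moves each node through 6 stages


def may_advance(stage, finished, members, timed_out=False):
    """Decide whether a node that has finished `stage` may enter the next one.

    finished: dict mapping node id -> highest stage that node reported finishing.
    members:  ids of the nodes this node is still waiting on.
    Assumption: a timeout also releases the barrier, so a silent node
    cannot block the others forever.
    """
    if stage >= NUM_STAGES:
        return False  # already at the final stage, nothing to advance to
    if timed_out:
        return True
    return all(finished.get(node, 0) >= stage for node in members)


if __name__ == "__main__":
    members = {1, 2, 3}
    print(may_advance(1, {1: 1, 2: 1}, members))        # node 3 has not reported
    print(may_advance(1, {1: 1, 2: 1, 3: 1}, members))  # everyone finished stage 1
```
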
Regroup Algorithm
- Activate: After a local clock tick, each node sends and collects status messages. A node advances if all responses are collected or a timeout occurs.
- Closing: It is determined whether partitions exist and whether the current node's partition should survive.
- Pruning: All nodes that are pruned for lack of connectivity halt.
- Cleanup phase one: All the surviving nodes
  - Install the new membership
  - Mark the halted nodes as inactive
  - Inform the cluster network manager to filter out the halted nodes' messages
  - Make the event manager invoke local callback handlers announcing node failures
- Cleanup phase two: A second cleanup callback is invoked to allow a coordinated two-phase cleanup.
- Stabilized: The regroup has finished.

Partition Survival
A partition survives if any of the following is satisfied:
- n(new membership) > 1/2 * n(original membership)
- The following three conditions are satisfied together
  - n(new membership) = 1/2 * n(original membership)
  - n(new membership) ≥ 2
  - tiebreaker node ∈ (new membership)
- The following three conditions are satisfied together
  - n(original membership) = 2
  - n(new membership) = 1
  - quorum disk ∈ (new membership)

Resource Management
- Resource control DLL for each type of resource
- Polymorphic design allows easy management of varied resource types
- Resource state transition diagram
  [Figure: states Offline, Online-pending, Online, Offline-pending, Failed. Transition labels: Request to online, Init complete, Init failed, Request to offline (appears twice), Shutdown complete. The edges are not recoverable from the extracted text.]

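The partition survival rule can be written as a single predicate. The sketch below reads the three conditions as: more than half of the original membership; or exactly half, at least two nodes, and the tiebreaker node present; or an original two-node cluster reduced to one node that holds the quorum disk. The comparison operators are an assumption where the slide text does not show them, and `partition_survives` and its parameters are hypothetical names, not an MSCS API.

```python
def partition_survives(original, new, tiebreaker, owns_quorum_disk):
    """Return True if the partition `new` should survive a regroup.

    original:         set of node ids in the membership before the failure
    new:              set of node ids in the partition being tested
    tiebreaker:       id of the designated tiebreaker node
    owns_quorum_disk: whether this partition holds the quorum disk
    """
    n_orig, n_new = len(original), len(new)

    # Condition 1: strictly more than half of the original membership.
    if 2 * n_new > n_orig:
        return True

    # Condition 2: exactly half, at least two nodes, tiebreaker included.
    if 2 * n_new == n_orig and n_new >= 2 and tiebreaker in new:
        return True

    # Condition 3: a two-node cluster reduced to one node with the quorum disk.
    if n_orig == 2 and n_new == 1 and owns_quorum_disk:
        return True

    return False


if __name__ == "__main__":
    four = {1, 2, 3, 4}
    print(partition_survives(four, {1, 2}, tiebreaker=1, owns_quorum_disk=False))
    print(partition_survives(four, {3, 4}, tiebreaker=1, owns_quorum_disk=False))
```

Under this reading, an even split of a four-node cluster leaves exactly one surviving side (the one containing the tiebreaker), which is the point of the rule.
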
Resource Migration: Pushing a group
- Executed when
  - A resource fails at the original node
  - The resource group prefers to execute at another node
  - An administrator moves the group
- Steps involved
  - All resources are taken to the offline state
  - A new active host node is selected
  - The group is brought online at the new node

Resource Migration: Pulling a group
- Executed when
  - The original node fails
- Steps involved
  - A new active host node is selected
  - The group is brought online at the new node
- Nodes can determine the new owner hosts without communicating with each other, with the help of the replicated cluster database

Resource Migration: Fail-back
- No automatic migration to the preferred owner
- Constrained by the fail-back window
  - How long the node must be up and running
  - Blackout periods
  - Fail-back deferred for cost or availability reasons

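The claim under "Pulling a group", that survivors can agree on a new owner without exchanging messages, holds whenever every node applies the same deterministic rule to the same replicated data. The sketch below illustrates that idea only. The per-group preferred-owner list, the fallback to the lowest node id, and the `new_owner` helper are assumptions made for illustration; the slides do not say what rule MSCS actually applies.

```python
def new_owner(preferred_owners, active_nodes):
    """Pick a new owner for a resource group after its node has failed.

    preferred_owners: ordered list of node ids (assumed to be stored in the
                      replicated cluster database, identical on every node)
    active_nodes:     set of node ids in the membership after regroup
    Returns the chosen node id, or None if no node is active.
    """
    # First choice: the earliest preferred owner that is still active.
    for node in preferred_owners:
        if node in active_nodes:
            return node
    # Hypothetical fallback: the lowest active node id, so that every
    # survivor still reaches the same answer independently.
    return min(active_nodes) if active_nodes else None


if __name__ == "__main__":
    replicated_db = {"sql_group": ["n1", "n2", "n3"]}  # hypothetical contents
    survivors = {"n2", "n3"}                            # n1 has failed
    print(new_owner(replicated_db["sql_group"], survivors))
```

Since the inputs are the agreed membership and the replicated database, each survivor computes the same owner locally, and only that node brings the group online.
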
Cluster Architecture
Components of the Cluster Service

Component              Functionality
Event processor        Provides intra-component event delivery service
Object manager         A simple object management system for the object collections in the Cluster Service
Node manager           Controls the quorum Form and Join process, generates node failure notifications, and manages network and node objects
Membership manager     Handles the dynamic cluster membership changes
Global Update manager  A distributed atomic update service for the volatile global cluster state variables
Database manager       Implements the Cluster Configuration Database
Checkpoint manager     Stores the current state of a resource (in general its registry entries) on persistent storage
Log manager            Provides structured logging to persistent storage and a lightweight transaction mechanism
Resource manager       Controls the configuration and state of resources and resource dependency trees; monitors active resources to see if they are still online
Failover manager       Controls the placement of resource groups at cluster nodes; responds to configuration changes and failure notifications by migrating resource groups
Network manager        Provides inter-node communication among cluster members

Global Update Management
- Atomic broadcast protocol
  - If one surviving member receives an update, all the surviving members eventually receive the update
- Locker node has a central role
- Steps in normal execution
  - A node wanting to start a global update contacts the locker
  - When accepted by the locker, the sender RPCs to each active node to install the update, in node-ID order starting with the node immediately after the locker
  - Once the global update is over, the sender sends the locker an unlock request to indicate successful termination

Failure Conditions
- If all the nodes that received the update fail, the update is treated as if it never occurred
- If the sender fails during the update operation
  - Locker reconstructs the update and sends it to each active node
  - Nodes ignore the duplicate update
- If the sender and locker both fail after the sender installed the update at any node beyond the locker
  - The next node in the update list is assigned as the new locker
  - The new locker will complete the update

Support Components
- Cluster Network: extension to the basic OS
  - Heartbeat management
- Cluster Disk Driver: extension to the basic OS
  - Shared SCSI bus
- Cluster-wide Event Logging
  - Events sent via RPC to all other nodes (periodically)
- Time Service
  - Clock synchronization

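The normal-case steps of the global update protocol can be sketched in a few lines. This is a toy, single-process model, not the MSCS implementation: the `Node` class, the sequence numbers used to detect duplicates, and the function names are hypothetical, and it assumes the locker already holds the update from the lock request, which is why the locker is left out of the RPC order.

```python
class Node:
    """One cluster member; remembers which updates it has installed."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.installed = {}  # sequence number -> update payload

    def install(self, seq, update):
        """Install an update once; a repeated sequence number is ignored."""
        if seq in self.installed:
            return False  # duplicate, e.g. re-sent by the locker after a failure
        self.installed[seq] = update
        return True


def update_order(active_ids, locker_id):
    """Node ids in ascending order, starting just after the locker and wrapping.

    Assumption: the locker is excluded because it received the update
    together with the lock request.
    """
    ring = sorted(active_ids)
    pos = ring.index(locker_id)
    return ring[pos + 1:] + ring[:pos]


def global_update(nodes, locker_id, seq, update):
    """Run one update: locker first, then every other active node in order."""
    nodes[locker_id].install(seq, update)      # lock request carries the update
    order = update_order(nodes.keys(), locker_id)
    for node_id in order:
        nodes[node_id].install(seq, update)    # stands in for the RPC
    return order                               # the unlock request would follow


if __name__ == "__main__":
    cluster = {i: Node(i) for i in (1, 2, 3, 4)}
    print(global_update(cluster, locker_id=2, seq=1, update="set x=1"))
```

Because `install` ignores a sequence number it has already seen, a locker that replays the update after a sender failure does no harm, which matches the "nodes ignore the duplicate update" rule.
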
Implementation Examples
- MS SQL Server
  - A SQL Server resource group is configured as a virtual server
  - A 2-node cluster can have 2 or more HA SQL Servers
- Oracle servers
  - Oracle Parallel Server
    - Shared-disk model
    - Uses MSCS to track cluster organization and membership notifications
  - Oracle Fail-Safe server
    - Each instance of a Fail-Safe database is a virtual server
    - Upon failure
      - The virtual server migrates to the other node
      - The clients reconnect under the same name and address

Implementation Examples (2)
- SAP R/3
  - Three-tier client/server system
  - Normal operation
    - One node hosts the database virtual server
    - The other provides application components combined in a server
  - Upon failure
    - The failed virtual server migrates to the surviving node
    - The application servers are failover aware
    - Migration of the application server needs a new login session

Scalability
- Issues: join latency, regroup messages, GUP latency, GUP throughput

Summary
- A highly available 2-node cluster design using commodity components
- Cluster is managed in 3 tiers
  - Cluster abstractions
  - Cluster operation
  - Cluster Service components (interaction with OS)
- Design not scalable beyond about 16 nodes

Relevant URLs
- A Modern Taxonomy of High Availability: interlog/resnick/HA.htm
- An Overview of Clustering in Windows NT Server 4.0, Enterprise Edition: microsoft/ntserver/ntserverenterprise/exec/overview/clustering.asp
- Scalability of MSCS: cs.cornell.edu/rdc/mscs/nt98/
- IBM High Availability Services: as.ibm/asus/highavail2.html
- High-Availability Linux Project: linux-ha.org/

Discussion Questions
- Is clustering the only choice for HA systems?
- Why is MSCS in use today despite its scalability concerns?
- Does performance suffer because of HA provisions? Why?
- Are geographical HA solutions needed (in order to take care of site disasters)?
- This is good for transaction-oriented services. What about, say, scientific computing?
- Hierarchical clustering?

Glossary
- NetBIOS: Short for Network Basic Input Output System, an application programming interface (API) that augments the DOS BIOS by adding special functions for local-area networks (LANs). Almost all LANs for PCs are based on NetBIOS. Some LAN manufacturers have even extended it, adding additional network capabilities. NetBIOS relies on a message format called Server Message Block (SMB).
- SMB: Short for Server Message Block, a message format used by DOS and Windows to share files, directories and devices. NetBIOS is based on the SMB format, and many network products use SMB. These SMB-based networks include LAN Manager, Windows for Workgroups, Windows NT, and LAN Server. There are also a number of products that use SMB to enable file sharing among different operating system platforms. A product called Samba, for example, enables UNIX and Windows machines to share directories and files.
