Operation and Maintenance System Architecture Design and Practice for Cloud-network Integration

As cloud-network integration, 5G and 6G, all-optical networks, and IP-based services evolve, the operational systems of operators face new challenges. Current operational systems suffer from inconsistent technical architectures, insufficient data sharing, a lack of decoupling capability, and inadequate intelligence, leading to a high maintenance workload and poor customer experience. This paper analyzes operators' requirements for intelligent operation systems and proposes an architecture and design scheme for intelligent operational systems based on cloud-network integration. It also summarizes practical experience with the new generation of cloud-network operation and maintenance systems and proposes improvements and an outlook for the next generation of operational management systems.


Introduction
As cloud-network convergence, 5G and 6G, all-optical networks, and IP-based services continue to develop, users demand an increasingly high quality of experience from network operators. They are more inclined to choose operators based on experience factors such as network quality, speed, and convenience [1] [2]. At the same time, operators want to reduce operational costs and improve the efficiency of network operations, and these requirements can no longer be met by procuring independent systems from operational system developers. First, because different systems are built by different developers, the infrastructure is not uniform, and the deployment, monitoring, and operation of systems cannot be unified. This results in slow iteration, repeated development of the same functions, an inability to share capabilities between systems, and low security. Second, because the systems are independent and their data models are inconsistent, data cannot be unified and shared. Finally, because of the isolation between systems, the significant differences in their intelligence capabilities, and the weakest-link effect, the level of operation and maintenance intelligence is low, and so is efficiency.
International standardization organizations and operators are actively exploring new operational system architectures and technologies. The TM Forum was the first to propose the NG OSS (Next Generation Operations Support System) functional model, eTOM (enhanced Telecom Operations Map), which has been widely accepted by international telecommunications operators, equipment manufacturers, and operation and maintenance system developers [3]. eTOM is the standard for managing IT services in the telecommunications industry [4] [5]. In recent years, most global telecommunications operators have referred to the eTOM model when designing and developing operation and maintenance systems that fit their own business and network characteristics and their operation and maintenance needs [6]. The ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) has proposed a new layered management architecture, AITOM (Artificial Intelligence-enhanced Telecom Operations and Management) [7]. The architecture is divided into an infrastructure management layer, a data fusion management layer, a management service capability layer, a scenario application layer, and a customer-oriented market layer. It also clearly outlines the positioning of artificial intelligence within the overall management architecture and its interaction with the other layers.
Based on the research and continuous practice of global operators in operation technology, the architecture of operation support systems can be divided into three main domains: the operation support system (OSS), the business support system (BSS), and the management support system (MSS). The OSS domain is mainly composed of network management systems from various equipment manufacturers, which provide configuration management, performance management, and fault management for each professional network. The BSS domain is developed and operated by independent operation support system developers and mainly covers customer management, product management, and billing. The MSS domain covers strategic, financial, and human resources management and is mainly designed and developed by enterprise management software developers. Because the OSS, BSS, and MSS domains are relatively separate, with only limited data and interface interactions between them, operators' operation and support systems currently suffer from long processes, low efficiency, and inconsistent data. This paper elaborates on the functional and technical architecture design of a new generation of cloud-network integrated operation and maintenance system that integrates BSS, OSS, and MSS. The design draws on the eTOM process model, the layered architecture of intelligence and application capability centers in the AITOM model, and the separation of design state and running state in the ONAP system, combined with the operational requirements of cloud-network integration. Furthermore, the practical experience of the new generation of cloud-network integrated operation and maintenance system at China Telecom is summarized, and improvements and prospects for the next generation of intelligent operation and maintenance systems are proposed.

Requirements of the cloud-network integration operation and maintenance system
The rise of China's new infrastructure construction represents a new stage of digital transformation for industries. This poses new demands for information infrastructure construction, requiring a new infrastructure system that supports digital transformation, intelligent upgrading, and integrated innovation. Therefore, the cloud-network infrastructure of operators needs to meet the development requirements of the new infrastructure by providing new capabilities such as digitization, intelligence, and IT/CT integration, aggregating and opening up these capabilities, empowering all industries, and promoting high-quality social and economic development. Cloud-network integration can help telecom operators transform, moving from a service concept built on independent products such as cloud services and network services to cloud-network integrated products. Service content will extend from the bottom up and strengthen core competitiveness, and operators will become comprehensive intelligent information service providers. The operation system is an important foundation for achieving cloud-network integration, but operators' existing operation systems suffer from problems such as low operating efficiency, slow product launch, long service activation cycles, and poor customer perception, and so cannot effectively support the comprehensive promotion of cloud-network integration. Currently, operators have the following improvement requirements for their operation and maintenance systems:

Network capability decoupling: Traditional methods provide services directly from the basic infrastructure, and business capabilities are dispersed across various business systems, resulting in long response cycles for new businesses and weak comprehensive cross-cloud-network business capabilities. Business systems urgently need to be decoupled from the basic infrastructure, with a middle platform formed between the front and back office, so that the middle platform becomes the link through which the basic infrastructure empowers new businesses.

Cloud computing and network capability integration: Standalone cloud services or network services can hardly meet customer needs, so one-stop cloud/network/edge/terminal services need to be provided. The operation system needs to perceive capabilities globally and flexibly combine various cloud-network capabilities on demand.

Rapid deployment of new business: Users should be given a what-you-see-is-what-you-get experience of new business. Cloud-network services are scheduled automatically and activated with one click, and business status is perceived in real time and safeguarded intelligently. The demand for new businesses has changed the cloud-network operation mode, requiring the IT system to evolve from a traditional support system into a "brain" command system that abstracts digital cloud-network capabilities from the cloud-network infrastructure and aggregates, extends, and opens up capabilities through the intelligent middle platform to meet business requirements such as ToC and ToB [9].

Capability aggregation, system middle-platform: Traditional operation support systems need to be segmented horizontally, and their general capabilities need to be transformed into microservices. New systems such as big data and AI can adopt a microservices architecture directly, and general capabilities can be aggregated into the middle platform.

Based on these new requirements for operational support systems, it is crucial to innovate the architecture design and technological foundation of operators' current operational systems. In terms of functional architecture, it is necessary to achieve functional integration, professional decoupling, and capability reuse. In terms of technical architecture, it is necessary to achieve technical unity, rapid development iteration, and security monitoring.

Functional architecture design of new generation cloud-network integration operation and maintenance system
Based on the requirements of cloud-network integration and intelligent operation and maintenance, we integrate the network management capabilities of the core network, wireless network, bearer network, transmission network, cloud, NFV, and other domains, break the chimney-style (siloed) construction mode of the individual disciplines, and build a comprehensive new cloud-network integrated operation and maintenance system that achieves end-to-end collaborative operation and maintenance of cloud and network. The functional architecture of the new generation cloud-network operation system is shown in Figure 2. Its most important feature is the complete integration of network operation support systems, business operation support systems, and management operation support systems, achieving full integration and sharing of network capabilities, operation capabilities, operation data, and network data. Drawing on the layered application capability center design of the AITOM model, the new generation cloud-network integrated operation and maintenance system adopts the design principle of separating capabilities from applications: the core cloud-network capabilities that run through the entire enterprise operation, service, and management process are consolidated into capability centers, on top of which upper-layer scenario applications for customers, partners, and employees are flexibly constructed.
Capability centers are units that carry specific business logic, generate business data, have business value, and offer reusable capabilities that can be opened up. Capability centers encapsulate capabilities into OpenAPIs according to the principles of "standardization, service-orientation, and reusability," and open them to the outside through an atomic capability platform. Each capability center has clear boundaries and independent services. By sharing related capabilities, capability centers aim to reduce duplicate development and construction, improve data consistency, enhance enterprise resource utilization and demand response efficiency, and unify customer experience. The new generation cloud-network operation and maintenance business system covers five capability domains (marketing, operation, billing, management support, and basic capability), encompassing 24 capability centers. It mainly realizes the digitization of cloud-network operation elements and processes, providing a digital twin foundation for upper-layer applications. The capability centers can subsequently be adjusted dynamically according to actual business needs.
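
To make the relationship between capability centers and the atomic capability platform concrete, the following is a minimal sketch, not the system's actual implementation. The class names `CapabilityCenter` and `AtomicCapabilityPlatform`, the path layout, and the `query_port` capability are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class CapabilityCenter:
    """Hypothetical capability center: a bounded unit that owns its capabilities."""
    name: str
    domain: str  # one of the capability domains, e.g. "operation" or "billing"
    capabilities: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def capability(self, api_name: str):
        """Decorator that registers a function as a reusable capability."""
        def register(func: Callable[..., Any]) -> Callable[..., Any]:
            self.capabilities[api_name] = func
            return func
        return register


class AtomicCapabilityPlatform:
    """Hypothetical single entry point through which capabilities are opened."""

    def __init__(self) -> None:
        self._routes: Dict[str, Callable[..., Any]] = {}

    def publish(self, center: CapabilityCenter) -> None:
        # Expose every capability of the center under a standardized path.
        for api_name, func in center.capabilities.items():
            path = f"/{center.domain}/{center.name}/{api_name}"
            if path in self._routes:
                raise ValueError(f"capability already published: {path}")
            self._routes[path] = func

    def call(self, path: str, **params: Any) -> Any:
        if path not in self._routes:
            raise KeyError(f"unknown capability: {path}")
        return self._routes[path](**params)

    def catalog(self) -> List[str]:
        return sorted(self._routes)


# Illustrative use: a resource center opens one capability (stubbed data).
resource_center = CapabilityCenter(name="resource", domain="operation")


@resource_center.capability("query_port")
def query_port(device_id: str) -> dict:
    return {"device_id": device_id, "free_ports": 4}


platform = AtomicCapabilityPlatform()
platform.publish(resource_center)
result = platform.call("/operation/resource/query_port", device_id="olt-001")
```

In this sketch, callers depend only on the published path, so a center can change its internals without affecting the applications that reuse its capabilities.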
Among the 24 capability centers, the collection control center and the resource center serve as the foundation of the cloud-network integrated operation and maintenance system, realizing the collection, interaction, rapid construction, and sharing of cloud-network data. The collection control center is the only channel for real-time collection and business interaction. It is mainly used for the unified collection of cloud and network data, model conversion and storage, the pre-processing of data from the various professional networks, and the encapsulation and opening of professional capabilities such as network operation control and maintenance management. The resource center provides an important foundation for standardized end-to-end network data services and business service capabilities. It provides dynamic maintenance, data services, and subscription management for the entire lifecycle of cloud and network resources, and opens standardized end-to-end cloud-network integrated business resource service capabilities to the outside. It connects the new generation cloud-network integrated operation and maintenance system directly to the data center and pushes platform resources, network operations, and other related data, achieving rapid subscription and sharing of network resource data. Because the cloud and network data and controls required by the other centers are collected and exercised through the collection control center and the resource center, network capabilities and data can be collected, controlled, and managed in a unified way. This avoids the wasted investment caused by repeatedly building separate systems for each professional network and improves operational efficiency.
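
The unified collection, model conversion, and subscription flow described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the `ResourceRecord` fields, the two vendor record formats, and the adapter names are hypothetical and are not taken from the system described in this paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ResourceRecord:
    """Unified data model shared by all centers (field names are assumptions)."""
    domain: str       # e.g. "transmission" or "cloud"
    element_id: str
    status: str       # normalized to "up" or "down"


def from_transmission_nms(raw: dict) -> ResourceRecord:
    # Hypothetical vendor format: {"ne": "...", "oper_state": "UP"}
    return ResourceRecord("transmission", raw["ne"], raw["oper_state"].lower())


def from_cloud_platform(raw: dict) -> ResourceRecord:
    # Hypothetical vendor format: {"vm_id": "...", "running": True}
    return ResourceRecord("cloud", raw["vm_id"], "up" if raw["running"] else "down")


class CollectionControlCenter:
    """Single channel that converts source-specific data to the unified model."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Callable[[dict], ResourceRecord]] = {}
        self._subscribers: List[Callable[[ResourceRecord], None]] = []

    def register_adapter(self, source: str, adapter: Callable[[dict], ResourceRecord]) -> None:
        self._adapters[source] = adapter

    def subscribe(self, callback: Callable[[ResourceRecord], None]) -> None:
        self._subscribers.append(callback)

    def collect(self, source: str, raw: dict) -> ResourceRecord:
        record = self._adapters[source](raw)   # model conversion
        for notify in self._subscribers:       # push to subscribed centers
            notify(record)
        return record


class ResourceCenter:
    """Keeps the latest unified record for each cloud or network element."""

    def __init__(self) -> None:
        self.inventory: Dict[str, ResourceRecord] = {}

    def on_record(self, record: ResourceRecord) -> None:
        self.inventory[record.element_id] = record


collector = CollectionControlCenter()
collector.register_adapter("transmission_nms", from_transmission_nms)
collector.register_adapter("cloud_platform", from_cloud_platform)

resources = ResourceCenter()
collector.subscribe(resources.on_record)

collector.collect("transmission_nms", {"ne": "otn-7", "oper_state": "UP"})
collector.collect("cloud_platform", {"vm_id": "vm-42", "running": False})
```

Supporting a new professional network in this sketch means registering one more adapter; the subscribers only ever see the unified model.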
A scenario application is a lightweight system portal that supports a specific business scenario of the enterprise and is flexibly constructed on demand. It directly provides convenient and friendly operation interfaces and self-service capabilities for customers, employees, and partners. The scenario application calls the core capabilities of the capability centers through the atomic capability platform, flexibly combines service APIs according to the needs of the business scenario, and iterates quickly. Mature and reusable functions in a scenario application can also be consolidated back into the capability centers [10]. The main scenario applications include 5GC cloud-network "intelligent driving," which builds on the common base capabilities of the new generation cloud-network operation and business system, runs through the business platform and the data platform, crosses BMO, and realizes the digital linkage of network, business, and operation and maintenance. We construct the "Intelligent Driving" digital operation application scenario for the 5GC cloud network. With the goal of zero waiting time, zero contact, and zero faults for 5G mobile services, the capability to operate the business, covering activation and maintenance operations, is realized. With the aim of creating a self-configuring, self-healing, and self-optimizing 5GC cloud intelligent network, the capability to operate and maintain the 5GC network is achieved, including the digitization of network asset lifecycles, self-healing, and intelligent operation and maintenance.
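
As an illustration of how a scenario application might combine service APIs into one flow, consider the sketch below. It is a simplified assumption rather than the design used in the system: the three capability functions stand in for APIs that would be reached through the atomic capability platform, and their names and the fields they add are hypothetical.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Capability = Callable[[Context], Context]


# Hypothetical stand-ins for APIs exposed by capability centers.
def check_resources(ctx: Context) -> Context:
    return {**ctx, "resources_ok": True}


def activate_service(ctx: Context) -> Context:
    if not ctx.get("resources_ok"):
        raise RuntimeError("resources were not verified before activation")
    return {**ctx, "service_id": f"svc-{ctx['customer']}"}


def start_monitoring(ctx: Context) -> Context:
    return {**ctx, "monitored": True}


class ScenarioApplication:
    """Lightweight application that only composes existing capabilities."""

    def __init__(self, name: str, steps: List[Capability]) -> None:
        self.name = name
        self.steps = steps

    def run(self, request: Context) -> Context:
        ctx = dict(request)
        for step in self.steps:
            ctx = step(ctx)  # each step enriches the shared context
        return ctx


one_click_activation = ScenarioApplication(
    "one-click-activation",
    [check_resources, activate_service, start_monitoring],
)
outcome = one_click_activation.run({"customer": "acme"})
```

Because the application holds no business logic of its own, a new scenario in this sketch is a new list of steps, which is one way to read the paper's point about fast iteration.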

Technical architecture design of new generation cloud-network integration operation and maintenance system
Driven by business and operational needs, the construction of a standardized, normalized new-generation cloud architecture promotes the standardization of software development and coding, automated security testing, agile process control, software delivery requirements, and intensive intelligent operation and maintenance requirements. This will gradually make the development and operation processes of the new-generation cloud-network integrated operation and maintenance system standardized, agile, efficient, secure, and stable.
The software architecture is designed to be cloud-native. It meets the technical requirements of front-end and back-end separation, application and data decoupling, centralization, microservice design, stateless design, application and configuration separation, unified logging, horizontal expansion, rapid startup and graceful shutdown, containerized deployment, and agile delivery of applications. The cloud platform can scale out and in horizontally and supports online and grey-scale deployment, which maintains service continuity and provides high availability that matches service level agreements [11].

Software development is standardized, agile, secure, and reliable. Grounded in the DevOps concept, the technical points of the software development life cycle, including development, testing, packaging, integration, publishing, and deployment, are clearly defined. Relying on cloud platforms and the agile Scrum model, software quality is controlled quantitatively, and software development and iteration are agile and efficient. This enhances the readability and maintainability of code for the operator's own employees and ensures security and stability from the architecture design down to the source code [12].

Centralized architecture and microservices design means that the system is built as centers according to business functions, achieving architectural decoupling. A microservices architecture is adopted within each center, separating modules with different business iteration cycles and keeping services lightweight [13]. Services are integrated in an event-driven manner, which reduces mutual dependencies and avoids coordinated upgrades of multiple service systems for each business release. Center-to-center service calls must implement timeout, circuit-breaking, and retry mechanisms according to business scenario requirements. Where conditions permit, graceful degradation of services can be implemented. Applications in containers must support TCP and HTTP health checks and may provide services externally only after they have started successfully.

As part of the operation system architecture design, the system is required to have integrated monitoring and operation features. It needs to meet the requirements of customer-oriented, network-wide, integrated operation and maintenance for IaaS, PaaS, and SaaS [14]. Based on a unified monitoring platform, the system aims to promote centralized management of operation and maintenance resources and digitized intelligent monitoring. This ensures timely and efficient handling of alarms and faults, as well as standardized and unified change management. By clarifying business goals and establishing a quantitative evaluation system, the system aims to make operation and maintenance visible, manageable, and controllable.

The technical architecture of the new-generation cloud-network integrated operation and maintenance system is shown in Figure 3. Based on these technical characteristics and business requirements, different resource types are adapted to different scenarios. The system divides basic resources into the data core domain and the business application domain. The capability centers are deployed in the data core domain, and application-facing business is deployed in the business application domain, achieving resource isolation, component isolation, data isolation, and permission isolation. A flexible deployment mode is provided for the PaaS component cluster, taking into account the importance of the integrated system, access volume, data volume, protection level, security isolation, operation and maintenance mode, the components' own permissions, resource isolation, and other characteristics. Resource planning for core PaaS components gives priority to factors such as load and security, reuses host resources as far as possible, and shares resources within the cluster while achieving instance-level isolation. The cluster can be expanded or split into multiple clusters as needed. Application deployment provides business capabilities, and its resource planning is prioritized based on factors such as load, security, and internal control. Resource, data, and permission isolation are achieved through means such as namespaces and host tags. Overall, the architecture of the system is secure, resource management is efficient, and code development is agile, realizing the integration of system operation and development.
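
The requirement above that center-to-center calls implement timeout, circuit-breaking, and retry mechanisms, with graceful degradation where possible, can be sketched as follows. This is a minimal sketch, not the system's implementation: the `CenterClient` class, its parameter names, and its default values are hypothetical, and the wrapped callable is assumed to accept a `timeout` keyword and to raise an exception on failure or timeout.

```python
import time
from typing import Any, Callable, Optional

_NO_FALLBACK = object()  # sentinel so that None can be a valid fallback value


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and no fallback was supplied."""


class CenterClient:
    """Wraps a call to another center with timeout, retry, and circuit breaking."""

    def __init__(self, call: Callable[..., Any], timeout_s: float = 0.5,
                 max_retries: int = 2, backoff_s: float = 0.0,
                 failure_threshold: int = 3, reset_after_s: float = 30.0) -> None:
        self._call = call
        self.timeout_s = timeout_s
        self.max_retries = max_retries
        self.backoff_s = backoff_s
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._failures = 0                      # consecutive failed invocations
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        if time.monotonic() - self._opened_at >= self.reset_after_s:
            # Cool-down elapsed: close the breaker and allow a trial call.
            self._opened_at = None
            self._failures = 0
            return False
        return True

    def invoke(self, *args: Any, fallback: Any = _NO_FALLBACK) -> Any:
        if self.is_open:
            if fallback is not _NO_FALLBACK:
                return fallback                 # graceful degradation
            raise CircuitOpenError("circuit is open; call skipped")

        last_error: Optional[Exception] = None
        for attempt in range(1 + self.max_retries):
            try:
                result = self._call(*args, timeout=self.timeout_s)
            except Exception as exc:            # timeout or any other failure
                last_error = exc
                time.sleep(self.backoff_s * (2 ** attempt))
                continue
            self._failures = 0
            return result

        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = time.monotonic()  # trip the breaker
        if fallback is not _NO_FALLBACK:
            return fallback
        assert last_error is not None
        raise last_error


# Illustrative use with stubbed downstream centers.
attempts = {"n": 0}


def flaky_query(order_id: str, timeout: float) -> dict:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("resource center did not answer in time")
    return {"order_id": order_id, "status": "activated"}


def always_down(order_id: str, timeout: float) -> dict:
    raise ConnectionError("billing center unreachable")


recovered = CenterClient(flaky_query, max_retries=2).invoke("order-1")

down = CenterClient(always_down, max_retries=0, failure_threshold=2)
degraded = [down.invoke("order-2", fallback={"status": "unknown"}) for _ in range(3)]
```

In the second example the third call never reaches the failing center, because the breaker has already opened after two consecutive failures. Which calls are safe to retry, and which fallback values are acceptable, depends on the business scenario, as the text notes.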

The practical experience of cloud-network integrated operation and maintenance system
The new generation cloud-network integrated operation and maintenance system has been put into practice by China Telecom, with the results shown in Table 1. The system is adapted to the development of 5G and 6G and features intelligent and agile cloud-network integrated supply, operation, and service capabilities. It is characterized by comprehensive cloudification, capability decoupling, data integration, and automated intelligence. The system has achieved the following improvements.

We break down professional barriers to realize horizontal connections across all modules.

We achieve, for the first time, capability decoupling across all professional networks, fully opening cloud and network capabilities, and have largely completed the service-oriented transformation of existing networks.

Integration of the OSS, BSS, and MSS domains at the operational level combines the capability and application architectures, enabling efficient service design and orchestration as well as agile business support.

End-to-end automatic activation and intelligent maintenance of cloud and network digital security provide customers with self-service monitoring, on-demand business, and security awareness capabilities.

Currently, as shown in Table 2, more than 30,000 standardized API interfaces have been opened in the new generation of cloud-network integrated operation and maintenance system, and the APIs are called 4.81 million times per month, reducing the cost of capability and system development. New access equipment is plug-and-play, with unified modeling of cloud and network devices and network elements, and leading capabilities in dynamic cloud-network characterization and AI reasoning.

Challenges and prospects
Although China Telecom's new generation cloud-network integrated operation and maintenance system has been largely established and has improved China Telecom's operation and maintenance efficiency and the speed of product launch and delivery to customers, many challenges remain in building the system and making it intelligent.

First, the operator's self-developed operation system is built on customized requirements, which leads to high development costs and to functionality that can only be used internally and cannot be sold externally. Additionally, the vendors of the original operation and maintenance systems are often reluctant to cooperate, which reduces development efficiency and increases costs.

Second, the decoupling of capabilities means that what used to be accomplished by one system now depends on underlying technology components and requires multiple API calls between the business microservice modules of multiple systems. If any of these API interfaces encounters a problem, it directly affects the business functions on the user-facing front end.

Third, the integration of professional network management capabilities such as core networks, wireless networks, bearer networks, transmission networks, clouds, and NFV does not match the organizational structure of operators, which is centered on the maintenance of individual professional networks. A new cloud-network operation and management system is required to match and integrate people with the system.

In order to meet the needs of user experience upgrades and digital intelligent transformation, 5G-Advanced and 6G networks need to evolve in both architecture and technology, continuously enhancing network capabilities and achieving the integration of cloud, computing, and network [15] [16]. The development of intelligent technology will bring significant changes to the operation and maintenance tasks of telecom operators [17]. As the scale of clouds and networks gradually expands, more efficient and intelligent operation and maintenance solutions need to be explored to improve customer service quality and reduce operators' own operating costs. The development of digital twin technology will also promote the transformation of cloud and network operation systems as resource sharing and unified scheduling among different networks and systems become possible [18]. In response to these new challenges and demands, a new flexible and efficient cloud network operating system (CNOS) is being explored, so that the new generation of cloud-network operation and maintenance systems gradually evolves into a cloud and network operating system [19].

Conclusion
This paper analyzes the architecture design of operators' operation and maintenance systems and their current requirements, and elaborates on the architecture and technical characteristics of China Telecom's new generation cloud-network integrated operation and maintenance system. The integration of the BSS, OSS, and MSS systems forms a new architecture in which services can be reused, orchestrated, and combined into new services, enabling a quick response to the market as well as business agility and innovation. The cloud-native technical architecture makes the system more flexible and secure. Compared with traditional operation and maintenance systems, the new generation of cloud-network integrated operation and maintenance system can scale out and in rapidly, improves the efficiency of system development and operation, and supports rapid business growth. In line with the needs of 6G technology development and operation, we will strengthen the evolution towards a cloud-network operating system in aspects such as adaptability to multi-source heterogeneous networks, unified scheduling of multiple factors, cloud-network resource twinning, and system-intrinsic intelligence, in order to improve the operator's intelligent operation capabilities and provide customers with a better service experience. This will also promote the management of cloud and network infrastructure.
AT&T (American Telephone and Telegraph) has, through independent research and cooperation, built a next-generation operating platform, ECOMP (Enhanced Control, Coordination, Management, and Policy), based on cloud, open sharing, and ICT (Information and Communications Technology) integration, and has worked with Linux Foundation Networking to jointly develop the ONAP (Open Network Automation Platform) architecture based on ECOMP. The ONAP architecture has been designed from the outset to achieve closed-loop automation. ONAP provides an automated, integrated operations platform for network orchestration and business operations. The ONAP architecture provides design tools, technologies, and repositories for resources, services, and products, executes the rules and policies defined at design time, and distributes them to the runtime environment, while managing controllers for physical and virtual networks. The Active and Available Inventory (A&AI) component provides a real-time view of system resources, services, products, and their relationships. The ONAP open-source architecture has been widely used by operators. The ONAP architecture is shown in the figure [8].

Figure 2. New Generation Cloud-network Integration Operation and Maintenance System Functional Architecture.

Figure 3. The New Generation Cloud-network Integration Operation and Maintenance System Technical Architecture.

Table 1. Practical Results of the New Generation Cloud Network Comprehensive Operation and Maintenance System.

The ubiquitous IaaS computing network, the full-stack PaaS platform, and the full range of professional cloud and network capabilities are fully utilized to form an ecological foundation. At the same time, the new generation of cloud-network integrated operation and maintenance system is evolving towards a cloud and network operating system. It supports digital applications in various industries, and the operator becomes a new information infrastructure service provider, an enabler of interconnection between industries, and an empowerer of the digital economy.

Table 2. Cloud-network integration capability invocation. Columns: Region; Number of online APIs; Number of called APIs; API capability times.