Internet Research Task Force                                     C. Janz
Internet-Draft                                              Huawei Canada
Intended status: Informational                                    D. King
Expires: 4 September 2024                            Lancaster University
                                                             3 March 2024


      Telemetry Methodologies for Analog Measurement Instrumentation
              draft-janzking-nmrg-telemetry-instrumentation-01

Abstract

   Evolution toward network operations automation requires systems encompassing software-based analytics and decision-making.  Network-based instrumentation provides crucial data for these components and processes.  However, the proliferation of such instrumentation, and the need to migrate the data it generates from the physical network to "off-the-network" software, pose challenges.  In particular, analog measurement instrumentation, which produces time-continuous real-number data, may generate significant data volumes.  Methodologies for handling analog measurement instrumentation data will need to be identified and discussed, informed in part by consideration of requirements for the operation of network digital twins, which may be important software-realm consumers of such data.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 4 September 2024.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   3
   3.  Background  . . . . . . . . . . . . . . . . . . . . . . . . .   3
   4.  Optical Network Measurement Instrumentation  . . . . . . . .   5
   5.  Telemetry Use Cases . . . . . . . . . . . . . . . . . . . . .   5
   6.  Analog Measurement Requirements . . . . . . . . . . . . . . .   6
     6.1.  Sampling  . . . . . . . . . . . . . . . . . . . . . . . .   6
     6.2.  Time Precision  . . . . . . . . . . . . . . . . . . . . .   7
     6.3.  Reduction and Other Pre-Processing  . . . . . . . . . . .   7
     6.4.  Compression . . . . . . . . . . . . . . . . . . . . . . .   7
     6.5.  Programmable Streaming  . . . . . . . . . . . . . . . . .   9
     6.6.  Streaming versus Polling  . . . . . . . . . . . . . . . .   9
     6.7.  Communication Protocols . . . . . . . . . . . . . . . . .  10
     6.8.  Data Models . . . . . . . . . . . . . . . . . . . . . . .  10
   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  11
   8.  Operational Considerations  . . . . . . . . . . . . . . . . .  11
   9.  Security Considerations . . . . . . . . . . . . . . . . . . .  11
   10. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  12
   11. References  . . . . . . . . . . . . . . . . . . . . . . . . .  12
     11.1.  Normative References . . . . . . . . . . . . . . . . . .  12
     11.2.  Informative References . . . . . . . . . . . . . . . . .  12
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  13

1.  Introduction

   Existing studies of network telemetry typically deal with packet-oriented measurements that generate packet traffic, path, discard, latency and other data [RFC7799], [OPSAWG-IFIT-FRAMEWORK].  However, some networking equipment and network operations scenarios feature or use more physically-oriented measurement instrumentation that generates data of a different character.  Here, the particularities of data generated by such "analog" instrumentation are examined, and telemetry methodologies suitable for such data are considered.  This consideration is informed by the requirements of specific use cases, including network digital twins.

   Optical networks, which are increasingly rich in analog instrumentation, are used as a specific example here.  But the telemetry methodologies discussed may apply to instrumentation and telemetry across a wide variety of networks and their related operational software, for example, in support of digital twins that provide modeling of radio-based transmission, thermal characteristics or energy consumption.

   This document presents telemetry methodologies tailored for analog measurement instruments, aiming to enhance data accuracy, transmission efficiency, and real-time monitoring capabilities for network digital twins.  The findings underscore the potential of these methodologies to inform best practices for telemetry in network digital twins that rely on analog measurement instruments.  The document provides a state-of-the-art summary, including gaps and possible areas for further research.

2.  Terminology

   Network Digital Twin:  A Network Digital Twin is a virtual replica of a physical network system that allows for the simulation, monitoring, and analysis of the network's behavior under various conditions without impacting the actual network.

   Network Measurement Instrumentation:  Network Measurement Instrumentation refers to the tools, techniques, and systems used to collect, monitor, and analyze data about the performance and behavior of a network.  This instrumentation is crucial for understanding how well the network is functioning, identifying problems, and making informed decisions to optimize network performance and reliability.

3.  Background

   Photonic networks, which transmit data through light signals via fiber optic cables, are fundamental to telecommunications, internet services, data center operations, and many other critical aspects of modern digital infrastructure.  A range of measurement instruments are routinely used in the deployment and maintenance of these networks.  Key examples include:
   *  Optical Time Domain Reflectometers (OTDRs): These devices are used to test the integrity of fiber optic cables by sending a series of light pulses into the fiber and measuring the light that is scattered or reflected back.  OTDRs can detect and locate faults, splices, and bends in fiber optic cables, and are crucial for both installation and troubleshooting;

   *  Optical Spectrum Analyzers (OSAs): OSAs measure the power spectrum of optical signals, analyzing the distribution of light across wavelength or frequency.  They are vital for characterizing the performance of components like lasers and optical amplifiers within the network;

   *  Optical Power Meters and Light Sources: Used in tandem, these instruments measure the loss or attenuation in optical fibers and verify the power levels to ensure that signals are transmitted with sufficient strength without exceeding the damage threshold of the network components;

   *  Network Analyzers and Bit Error Rate Testers (BERTs): These tools assess the overall performance of the optical network by analyzing parameters such as signal integrity, bit error rates, and network latency.  They help in ensuring that the network can reliably handle the intended data loads;

   *  Wavelength Division Multiplexing (WDM) Analyzers: WDM technology combines multiple optical carrier signals on a single optical fiber by using different wavelengths.  WDM analyzers are specialized tools for testing and maintaining these systems, ensuring that each channel is transmitted efficiently without interference;

   *  Dispersion Analyzers: These are used to measure chromatic and polarization mode dispersion in fiber optic cables, which can affect the quality and speed of data transmission.  Managing dispersion is crucial for long-distance and high-data-rate optical communications.

   These instruments play a critical role in the characterization, deployment, optimization, and troubleshooting of optical networks.  But their use tends to be restricted to specific operational phases, requires manual operation, and is generally not compatible with application to operating facilities.  The term instrumentation refers more properly to "embedded" capability that is both operable on active infrastructure and capable of continuous measurement operation.  Such instrumentation is a necessary foundation for telemetry.

4.  Optical Network Measurement Instrumentation

   Optical network instrumentation has typically focused on detecting transmission performance degradation, through measurement of error correction rates in FEC engines, counting of errored OTN frames, etc.  Such measurements are typically executed on network elements through time-interval-based counting.  The resulting counts may be forwarded to or collected by software on a subscription or polling basis.  The data consists of series of integer numbers, or series of time stamp-integer number couplets.

   In recent years, however, the nature and scope of optical network instrumentation has broadened and deepened [JIANG].  The idea has been to instrument the optical network more richly to support more effective operations management, including using software-based analytics and modeling.
   Implicated network operations include network and connection planning and configuration, network and connection fault management (fault and impairment detection, classification, localization, preemption, correction), and others.  The optical network is a high-performance analog transmission network, so, unsurprisingly, much of this new instrumentation is analog; that is, it produces time-continuous real-number data or data sets.  Examples include optical loss, optical power (total, channel peak, etc.), optical spectra (narrow-band-filtered power measured at a series of center wavelengths), differential group delay (DGD), polarization mode dispersion (PMD), polarization dependent loss (PDL), Stokes vector components reflecting state of polarization (SOP), linear optical signal-to-noise ratio (OSNR) and generalized optical signal-to-noise ratio (GSNR).  Many of these measurements are synthesized by coherent receivers across the network, while some may be synthesized by in-span elements such as amplifiers and ROADMs.

5.  Telemetry Use Cases

   One application of this data in the software realm is with optical network digital twins (NDTs), used for transmission performance modeling [JANZ], [NMRG-PODTS].  Such NDTs constitute an important class of analytical engine supporting optical network and service planning and other operations, and they rely heavily on data from network instrumentation to enable accurate modeling of optical transmission performance on targeted variations of the actual network and service configuration, state and condition.  A default expectation would be that all instrumentation measurements are reflected continuously in the software realm for use by optical NDTs.  However, at best only an approximation to this can be achieved (e.g., only a series of sampled measurements may in fact be streamed from the network), so the imperative is to find efficient ways to support sufficiently accurate approximations.  This imperative grows more compelling the greater the scale of the network and the greater the richness of embedded instrumentation.

   A second example application lies in the fault management domain, wherein analysis of rich data, concentrated around the time of a detected evolution in transmission conditions, may be used to classify and localize the origin of the observed evolution [HAHN].  Transient evolutions of transmission performance are commonplace on optical networks and have myriad causes, including extrinsic causes such as lightning strikes, earthworks and construction, weather, road and rail traffic, fires, etc., as well as intrinsic causes including continuous or discrete deteriorations to equipment or fibre plant.  Detection, classification, and localization of transmission performance evolutions permit assessment of the likelihood, expected severity, and rate of further deterioration, and planning of timely and cost-effective corrective interventions where indicated.  However, successful analysis may depend on the availability in software of richer data sets than may be supported by continuous streaming or required by other applications.

6.  Analog Measurement Requirements

   [RFC9232] provides a framework for considering concepts, constructs and developments in network telemetry.  Many of the methods and mechanisms it discusses or suggests are invoked here.
6.1.  Sampling

   An analog-to-digital conversion process typically converts analog signals into digital data that can be transmitted, stored, and processed more efficiently.  This often involves sampling the signal at a certain rate and quantizing the amplitude into digital values.

   The "mirroring" (transmission for replication at a different place) of continuous-time real number data, generated by in-network instrumentation, begins with sampling and representing measured values by a scalar or vector of finite-decimal-place numbers.  As neither sampling at fixed intervals, nor fixed time alignment or offset among measurement points in the network or between such points and the off-network software realm, can generally be assumed, it is useful that instrumentation should generate, as primary data, a series of couplets or vectors consisting of sample time stamps and corresponding measured data values.

6.2.  Time Precision

   Inadequate sampling frequency and quantization error are both potential sources of error in the - literal or effective - "reconstruction" of the original time-continuous measurement in the software realm.  It is possible that sampling frequencies might be varied in response to evolving temporal characteristics of measured parameters; this is one strategy for data reduction (and one reason why sampling may not occur at fixed-period intervals).

   Requirements on the precision of reconstructed data, its time basis, and the alignment in time of different reconstructed measurements are determined by the operational role played by the analytical functions that consume the data.  Some operations of interest, such as network and service planning or fault and impairment management, may impose only relatively relaxed requirements on time synchronization among measurement instruments, and between those instruments and the software realm.  Other applications, e.g., those concerning operations tending toward closed loop control, may require tighter temporal data alignment among different measurement sources.  These considerations have implications in terms of source and synchronization of clocks producing time stamps; but in general, requirements on clock synchronization and precision are far from those required for bit-level operations: i.e., they are generally more like "network time" than "digital time".  Similarly, requirements on the absolute or relative (i.e., among different measurement instruments) precision of reconstructed measured data values may be application-dependent.  In many cases, relative precision, or precision consistency, may be more important than absolute precision.

6.3.  Reduction and Other Pre-Processing

   With telemetric data volume a primary potential challenge, methods for reducing the data volume associated with analog measurement instrumentation are of evident interest.  Signals may also be filtered to remove noise and unwanted frequencies to improve the data quality.

6.4.  Compression

   Data compression is an obvious candidate methodology for bandwidth reduction.  Methods for lossless compression of series of numerical data have been widely studied, e.g., [RATANAWORABHAN].

   Obviously, such compression must be implemented as a "pre-processing" function executed by the telemetric instrumentation itself, or some proxy to it.
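   As an illustration only, and not a recommendation of any particular algorithm, the following sketch (in Python) shows one simple form of such pre-processing under assumed conventions: time stamp-value couplets already quantized to integer units (per Section 6.1) are delta-encoded, serialized and passed through a general-purpose lossless codec.  The choice of JSON and zlib, and the units implied by the names, are assumptions made for the example only.

      # Illustrative sketch only: delta-encode integer (time stamp, value)
      # couplets, then apply general-purpose lossless compression.
      import json
      import zlib

      def compress_samples(samples):
          """samples: list of (t_microseconds, value_milliunits) integers."""
          first_t, first_v = samples[0]
          deltas = [[first_t, first_v]]      # first couplet carried absolute
          prev_t, prev_v = first_t, first_v
          for t, v in samples[1:]:
              deltas.append([t - prev_t, v - prev_v])
              prev_t, prev_v = t, v
          return zlib.compress(json.dumps(deltas).encode("utf-8"))

      def decompress_samples(blob):
          deltas = json.loads(zlib.decompress(blob).decode("utf-8"))
          t, v = deltas[0]
          samples = [(t, v)]
          for dt, dv in deltas[1:]:
              t, v = t + dt, v + dv
              samples.append((t, v))
          return samples

   Because slowly-varying measurements produce small deltas, the serialized stream compresses well.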
   Similarly, decompression must be implemented as a "post-processing" function within the software realm.  Where time stamps are uncompressed, it may be possible, depending on the compression methodology employed, to support selective decompression of data, e.g., only on selected time intervals.  This might allow for application-driven "as-required" post-processing (decompression) of more limited volumes of telemetric data.

   The compressibility of time-based data depends on its evolution in data-entropic terms, resulting in streamed data flows of varying volume or rate.  The effective transmission and reception rates of data samples thus may vary and differ at any point from the rate of data generation.  This is another reason why data samples may require time stamps.

   Other forms of effective data reduction through pre-processing may also be useful, or preferred:

   *  Thresholding: Data samples are transmitted only if and when a measured value, or a derivative of the measured value, crosses a threshold.  Possible examples include: a) exceeding some absolute or proportional variation from the last transmitted sample value; b) exceeding a previously observed and transmitted maximum or minimum value; or, c) exceeding some time rate-of-change of the measured value.  Post-processing of threshold-driven data may or may not be required by applications.  For example, an application may generate a scenario for behavioral analysis by an NDT that requires the "current" data from network instrumentation.  To whatever precision is effectively reflected in the details of the operating thresholding mechanisms, that data is simply the most recently transmitted sample from network measurement instruments.  Another application, however, perhaps one dealing with fault or impairment management, might require a regular and continuous time series presentation of measured data.  In that case, e.g., interpolation or other post-processing of received data samples might be needed.  Other kinds of pre-processing may also be of interest, including normalization of data, frequency domain conversion, and computation of statistics.

   *  Triggering: An extension or variation of thresholding, triggering may refer to, e.g., the transmission of a series of samples - from a defined set of measurement instruments, over a defined period of time and at defined time intervals - on crossing of a particular threshold (i.e., that threshold crossing "triggers" the transmission of the defined data series).  Triggering of this kind may be useful in, e.g., fault and impairment management.  The detection by instrumentation of some pre-defined circumstance or occurrence - e.g., observation of an unusually large or rapid change in an optical power level or channel SOP - would trigger the transmission of a pre-defined, "rich" set of data covering a time interval around the triggering observation.  That data could then be subjected to various forms of "forensic" analysis in software to support detection, classification or localization of transmission performance-impacting events.  Required pre-processing includes processing of triggers, and sliding storage of instrumentation data sample values sufficient to cover the targeted data capture time "window" as well as trigger processing and transmission intervals.  A simple sketch of such a mechanism is given below.
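   As an illustration only, the following Python sketch shows, under assumed parameters, the kind of pre-processing described above: a sliding buffer of recent samples plus a simple trigger rule (here, an abrupt step in the measured value), with a batch of pre- and post-trigger samples handed to a transmit() callback that stands in for whatever streaming mechanism is in use.  All names, window sizes and the trigger rule are assumptions made for the example, not part of any standard.

      # Illustrative sketch only: sliding storage plus trigger-driven
      # batch capture of (time stamp, value) couplets.
      import collections

      class TriggeredCapture:
          def __init__(self, transmit, history_len=1000,
                       post_trigger_len=200, step_threshold=3.0):
              self.transmit = transmit          # callback toward subscribers
              self.history = collections.deque(maxlen=history_len)
              self.post_trigger_len = post_trigger_len
              self.step_threshold = step_threshold
              self.remaining_post = 0           # post-trigger samples still due

          def on_sample(self, timestamp, value):
              """Called for every new sample produced by the instrument."""
              triggered = (len(self.history) > 0 and
                           abs(value - self.history[-1][1]) > self.step_threshold)
              self.history.append((timestamp, value))
              if self.remaining_post > 0:
                  self.remaining_post -= 1
                  if self.remaining_post == 0:
                      # Window complete: emit pre- and post-trigger samples.
                      self.transmit(list(self.history))
              elif triggered:
                  self.remaining_post = self.post_trigger_len

   A subscribing client would then receive, in one batch, the samples surrounding the triggering observation, for subsequent "forensic" analysis in software.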
6.5.  Programmable Streaming

   As discussed in [RFC9232], in-network pre-processing of telemetry data may usefully be "programmed" by telemetry clients (i.e., software applications that are consumers of instrumentation data), including dynamically.  The range and nature of software applications and their data requirements may vary among systems, may evolve with time within any given system - based on experience and learning (automated or not) or with the deployment of new capabilities - and may also vary as a function of available instrumentation capabilities on a given network, which themselves may evolve.

6.6.  Streaming versus Polling

   Streaming - i.e., subscription-based push - is, as identified in [RFC9232] and other works, and as suggested by the discussion above, expected to be the principal, if not exclusive, operational modality for telemetry, including analog instrumentation telemetry.  Software clients consume data generated by the network and, having identified which data they require and from where within the network, use subscriptions to place themselves in a position to receive it, on an ongoing basis, without continuing operational steps.  Triggered transmission of "batched" data is aligned with a streaming paradigm, as the telemetry server (i.e., instrumentation) must detect the trigger conditions and react by capturing and transmitting data to subscribing clients.  It is worth considering, however, whether polling can or should be completely dispensed with, or whether it might retain some utility in some cases or circumstances.

   The discussion so far supports a view that the data needs of NDTs can be satisfied by, and in fact probably are best served by, streaming.  However, polling could be used if NDT-based analyses are required relatively infrequently, do not require very rapid execution, and do not draw arbitrarily on historical data.  Polling might also be useful as a complementary mechanism to streaming.  For example, to reduce data transmission and handling volumes, an NDT might choose to unsubscribe from telemetry that it has observed to change little with time.  However, for particularly critical analyses, the NDT might want to ensure that all available telemetry data is up to date, by polling the unsubscribed instrumentation.  Further, if certain kinds of data compression are used, decompression processes can enter into errored regimes, e.g., through transmission loss of telemetry data.  Periodic polling may be useful to "re-set" absolute data values in such cases.  In fact, as suggested in [RFC7799], the possibility of transmission loss of streamed telemetry packets, a concern particularly if unreliable transport paradigms such as UDP are used, may provide a general reason to enable polling as a "failsafe" mechanism.

6.7.  Communication Protocols

   Communication protocols facilitate reliable data exchange between telemetry devices and control systems.  Depending on the method used - streaming, polling, or both - various messaging protocols exist to provide efficient delivery of instrumentation data.  A simple sketch of a client combining both modes follows below.
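   As an illustration only, the following Python sketch shows a client-side arrangement combining the two modes discussed in Section 6.6: streamed (pushed) samples are consumed as they arrive, and an instrument whose stream has gone quiet is polled as a failsafe.  The subscribe() and poll() callables stand in for whatever protocol binding is actually used; they, the time stamp convention (seconds since the epoch) and the staleness threshold are assumptions made for the example, not any specific protocol's API.

      # Illustrative sketch only: consume streamed telemetry, with polling
      # as a failsafe for stale or unsubscribed instruments.
      import time

      class TelemetryClient:
          def __init__(self, subscribe, poll, stale_after=60.0):
              self.poll = poll                # poll(instrument_id) -> (t, value)
              self.stale_after = stale_after  # seconds without a pushed update
              self.latest = {}                # instrument_id -> (t, value)
              subscribe(self.on_push)         # register for streamed updates

          def on_push(self, instrument_id, timestamp, value):
              """Invoked by the streaming machinery for each pushed sample."""
              self.latest[instrument_id] = (timestamp, value)

          def refresh_if_stale(self, instrument_id):
              """Failsafe: poll an instrument whose streamed data looks stale."""
              entry = self.latest.get(instrument_id)
              if entry is None or (time.time() - entry[0]) > self.stale_after:
                  self.latest[instrument_id] = self.poll(instrument_id)
              return self.latest[instrument_id]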
6.8.  Data Models

   A complete framework for analog instrumentation telemetry might require data models supporting:

   *  Identification of instrumentation-equipped and telemetry-capable network equipment, the latter's available instrumentation, its available pre-processing, and what aspects of available pre-processing are programmable;

   *  Subscription to streaming from specific instrumentation;

   *  Programming (or re-programming) of pre-processing on specific subscriptions and instrumentation, including type of pre-processing, applicable thresholds or triggers, and definition of trigger-associated data sets (included data and start/stop interval limits vs. triggering events);

   *  Transmission of applicable time stamp-data value couplets, vectors or batches.

7.  IANA Considerations

   This document makes no requests for action by IANA.

8.  Operational Considerations

   Operational considerations for Optical Network Measurement Instrumentation involve a range of factors to ensure accurate, reliable, and efficient performance of the optical networks.  These considerations are critical for deploying, maintaining, and troubleshooting fiber optic systems.  Key operational considerations include:

   *  Calibration and Signal Integrity

   *  Dynamic Range and Sensitivity

   *  Resolution and Accuracy

   *  Scalability

   *  Bandwidth and storage of instrumentation data

   Future versions of this document will expand on the topics above and increase the scope of operational considerations.

9.  Security Considerations

   The security implications of optical network telemetry are critical, given the increasing reliance on optical networks for data transmission in various sectors.  Ensuring the security and integrity of these networks and the telemetry instrumentation used to measure and maintain them is paramount to prevent unauthorized access, data breaches, potential service disruptions, and use as possible threat vectors and attack surfaces.  Key security considerations include:

   *  Encryption of sensitive telemetry data

   *  Secure configuration and management of telemetry functions

   *  Network monitoring and anomaly detection

   *  Secure data handling and storage

   Future versions of this document will expand on the topics above and increase the scope of security considerations.

10.  Acknowledgements

   Thanks to the Network Digital Twin discussions in the Network Management Research Group, which provided further input into this work.

   This work is supported by the UK Department for Science, Innovation and Technology under the Future Open Networks Research Challenge project TUDOR (Towards Ubiquitous 3D Open Resilient Network).  The views expressed are those of the authors and do not necessarily represent those of the project.

11.  References

11.1.  Normative References

11.2.  Informative References

   [HAHN]     Optical Fiber Communications, "On the Spatial Resolution of Location-Resolved Performance Monitoring by Correlation Method", 1 March 2023.

   [JANZ]     IEEE/IFIP Network Operations and Management Symposium, Workshop on Technologies for Network Twins, "Digital Twin for the Optical Network: Key Technologies and Enabled Automation Applications", 1 April 2022.

   [JIANG]    Journal of Lightwave Technology, vol. 40, no. 10, pp. 3128-3136, "Progresses of Pilot Tone Based Optical Performance Monitoring in Coherent Systems", 1 October 2023.
   [NMRG-PODTS]
              IETF, "Performance-Oriented Digital Twins for Packet and Optical Networks", 1 October 2023.

   [OPSAWG-IFIT-FRAMEWORK]
              IETF, "Framework for In-Situ Flow Information Telemetry", 1 October 2023.

   [RATANAWORABHAN]
              Data Compression Conference, "Fast Lossless Compression of Scientific Floating-Point Data", 1 May 2006.

   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799, May 2016.

   [RFC9232]  Song, H., Qin, F., Martinez-Julia, P., Ciavaglia, L., and A. Wang, "Network Telemetry Framework", RFC 9232, DOI 10.17487/RFC9232, May 2022.

Authors' Addresses

   Chris Janz
   Huawei Canada

   Email: christopher.janz@huawei.com

   Daniel King
   Lancaster University

   Email: d.king@lancaster.ac.uk