Enhanced architecture and implementation of spectrum shaping codes

Bingrui Wang; Zhaopeng Xie; Xingang Zhang

doi:10.7717/peerj-cs.1883


Bingrui Wang¹, Zhaopeng Xie², Xingang Zhang¹

1Henan Engineering Research Center of Service and Guarantee for Intelligent Emergency, Nanyang Normal University, Nanyang, Henan, China

2The School of Advanced Manufacturing, Fuzhou University, Jinjiang, Fujian, China


Published: 2024-02-21
Accepted: 2024-01-29
Received: 2023-11-28

Academic Editor: Sedat Akleylek

Subject Areas: Algorithms and Analysis of Algorithms, Computer Networks and Communications
Keywords: Spectrum shaping, K-constraint, Guided scrambling, Accumulated signal power, Spectrum null

Copyright: © 2024 Wang et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

Cite this article: Wang B, Xie Z, Zhang X. 2024. Enhanced architecture and implementation of spectrum shaping codes. PeerJ Computer Science 10:e1883 https://doi.org/10.7717/peerj-cs.1883

The authors have chosen to make the review history of this article public.

Abstract

Spectral shaping codes are modulation codes widely used in communication and data storage systems. This research enhances the algorithms used to construct spectral shaping codes for hardware implementation. First, we present a parallel scrambling computation with a time complexity of O(1). Second, in the minimum accumulated signal power (MASP) module, the sine-cosine accumulation relies on a remainder operation and has a time complexity of O(n²). We reduce the MASP computation by using short bit-width data, ROM storage, and addition pipelines, which removes the remainder operation and reduces the accumulation complexity to O(1). Third, we present a search algorithm that generates segmented lines to replace the square operations in the MASP module. By combining the search algorithm with shift operations, we reduce the complexity of the square from O(n²) to O(1). The implementation results show that the original and proposed MASPs yield nearly identical spectrum nulls. The encoder-decoder of the spectral shaping codes with the proposed approaches consumes just 6% of the hardware resources when implemented on a Spartan-6 XC6SLX25.

Introduction

Spectrum shaping codes are a class of modulation codes that were initially applied to digital communications, where transformers couple two lines. Transformers cannot transfer signals without significant distortion if the power spectral densities of the signals contain low-frequency components. Shaping codes are designed to adapt the source data to the characteristics of the communication channel. These codes are also employed in digital recording systems to translate an arbitrary data sequence into a sequence with the particular characteristics required by the system (Immink & Cai, 2021). More recently, the concept of integrated microwave photonics spectral shaping has been introduced, opening avenues to advanced functionalities (Daulay et al., 2021).

Spectrum shaping technologies are used in a variety of fields. (1) In information processing and transmission, Chai et al. (2014) discuss the practical obstacles to implementing dynamic spectrum access (DSA) devices and offer solutions. To accommodate DSA in commercial off-the-shelf wireless devices, they also propose a general per-frame spectrum-shaping protocol. A simple spectrum shaping technique based on switching among three loads has been presented for backscatter modulation-based Internet of Things (IoT) systems (Nagaraj, 2017). Danila (2021) describes a theoretical study, in the terahertz G-band, of a piezoelectrically responsive ring-cone element metasurface composed of polyvinylidene fluoride (PVDF)/silicon and PVDF/silica glass. Exploiting the longitudinal piezoelectric effect of PVDF, this study examines the spectrum shaping ability of a polymer-based metasurface. Three distinct filter functions, namely Fano-like resonances, wavelength interleaving, and variable resonance mode splitting, are realized in Arianfard et al. (2021). The results theoretically validate the proposed device as a compact, multifunctional photonic filter for adjustable spectral shaping. Dobre et al. (2021) developed spectrum-skirt-filled pulse-shaping filters that match a spectral mask response. The proposed system design achieves higher data rates in a dispersive microwave propagation environment than conventional transmission using Nyquist pulse shaping. Nasarre et al. (2021) present a novel frequency-domain spectral shaping (FDSS) scheme with spectral extension to enhance uplink (UL) coverage in 5G New Radio (NR), based on discrete Fourier transform spread orthogonal frequency-division multiplexing. The results demonstrate that spectrally extended FDSS is a highly effective solution for improving 5G NR UL coverage. 
Furthermore, dependable systems can be built by integrating modern modulation techniques with rate-diverse error-correcting codes (Fang et al., 2023; Chen et al., 2023; Lin et al., 2023). (2) In information storage, spectrum shaping codes have spectrum nulls at specific frequencies (Pelusi et al., 2015). In addition, shaping codes are expected to enhance the performance of dedicated servo recording systems (Ng et al., 2015; Yuan et al., 2015), a promising technology for ultra-mobile hard disk drives. Shaping codes with spectrum nulls at non-zero frequencies effectively reduce interference between data signals and narrow-band signals. In a dedicated servo recording system, there are two servo signal frequencies: f₁ on even tracks and f₂ on odd tracks. Besides avoiding interference between data and servo signals, the shaping code also permits filtering of low-frequency disc noise. Moreover, such recording systems require a run-length-limited constraint (Tandon, Motani & Varshney, 2019), also known as the k-constraint. Kahlman & Immink (1995) address the spectral shaping of both embedded pilot tones and spectral nulls in digital magnetic video tape recording. The spectral notches are essential to prevent interference between the written data and the servo detection mechanism. (3) In medicine, Greffier et al. (2020) investigate the influence of tin filter-based spectral shaping in computed tomography (CT) on image quality and radiation dose for use in ultralow-dose CT protocols. Tin filtering improves the quality of the X-ray beam and the image quality characteristics of phantom images. Baldi et al. (2020) use spectral shaping with a third-generation dual-source multidetector CT scanner to evaluate osteolytic lesions caused by multiple myeloma. The outcome confirms the benefits of whole-body low-dose CT for diagnosing patients with multiple myeloma. Agostini et al. (2021) investigate the role of third-generation iterative reconstruction (ADMIRE3) in a dual-source, high-pitch chest CT protocol with spectral shaping at 100 kVp for coronavirus disease 2019 (COVID-19). Low-dose CT with spectral shaping and ADMIRE3 provides acceptable image quality for evaluating COVID-19 patients while significantly reducing radiation dose and motion artifacts. By hardening the X-ray beam, tin prefiltration is an established technique for imaging high-contrast subjects in energy-integrating detector CT (EID-CT) (Grunz et al., 2022). That study examines the potential dose-saving effect of spectral shaping via tin prefiltration in photon-counting detector CT (PCD-CT) of the temporal bone. At matched image noise, high-voltage scan protocols with tin prefiltration enable greater dose savings in EID-CT. However, the superior inherent denoising of PCD-CT reduces the dose reduction potential of spectral shaping.

Building on the strong performance reported by Cai et al. (2017), this study presents an implementation of spectrum shaping codes on a field-programmable gate array (FPGA). The shaping code architecture consists of scrambling and descrambling, k-constrained encoding and decoding (Immink, 2012), and a minimum accumulated signal power (MASP) module. We provide simplified approaches for these modules that are suitable for hardware implementation. Scrambling is a widely used and effective technique (Park & Son, 2020; Xiao et al., 2020; Liu et al., 2021). The proposed scrambling and descrambling use only XOR (exclusive OR) logical operations and no other arithmetic operations. We then describe the algorithm for k-constrained encoding and decoding. Furthermore, we propose improved calculations that reduce parameter storage and processing complexity in the MASP module.

The remainder of this article is organized as follows. ‘Shaping Code Algorithms’ describes the overall architecture of the FPGA system implementation and presents the algorithms of spectrum shaping codes. ‘The enhanced algorithms’ presents the improved algorithms. ‘FPGA implementation of a spectrum shaping code’ demonstrates a specific hardware implementation of the shaping algorithms with reduced computations; the shaping code is synthesized and its resource consumption is analyzed. ‘Discussion and conclusion’ discusses the results and concludes the article.

Shaping Code Algorithms

Figure 1 illustrates a block diagram of an encoder and a decoder for spectrum shaping codes. In the encoder, the first step is to generate the 2^p integers 0, 1, …, 2^p − 1 and convert them to a binary matrix A of size 2^p × p, where p denotes the length of the scrambler. We then append each row of A to the m-bit user data, generating a matrix B of size 2^p × (m + p), that is, B = [B(0), B(1), …, B(2^p − 1)]. Second, B is fed into a guided scrambler module, which produces the scrambled matrix C of size 2^p × (m + p). Third, each row of C is encoded by a k-constrained encoder, yielding a matrix D of size 2^p × (m + p + 1). Finally, the accumulated signal power is calculated for each candidate D(0) to D(2^p − 1), and the candidate T with the least power is chosen and sent. In the decoding process, the received signal R with a bit width of (m + p + 1) is fed into the k-constrained decoder, which produces the data Y with a bit width of (m + p). Descrambling Y yields the data Z with a bit width of (m + p). The original user data are recovered by removing the p redundant bits.

Figure 1: Schematic diagram of the spectrum shaping code with guided scrambling.


DOI: 10.7717/peerjcs.1883/fig-1

Simplified scrambling and unscrambling algorithms

In this study, the guided scrambler (GS) polynomial is

$$g(x) = g_0 + g_1 x + \dots + g_{p-2} x^{p-2} + g_{p-1} x^{p-1} + x^{p}, \tag{1}$$

where g₀ = 1 and each g_i (0 < i < p) is a binary coefficient equal to 1 or 0. Let $\{b_0, b_1, b_2, \dots, b_{n-2}\}$ be the bit set to be scrambled and $\{c_0, c_1, c_2, \dots, c_{n-2}\}$ the scrambled bit set, where b_i and c_i are binary bits, 0 ≤ i ≤ n − 2. Each of $c_0, c_1, \dots, c_{p-1}$ is initialized to zero. The bits $c_p, c_{p+1}, \dots, c_{n-2}$ are generated by the encoding

$$c_i = b_i + \sum_{k=0}^{p-1} g_{p-k-1}\, c_{i-k-1}, \quad p \le i \le n-2, \tag{2}$$

where all additions are modulo 2.

Since b_i, c_i, and g_j are binary values (0 ≤ i ≤ n − 2, 0 ≤ j < p), each product $g_{p-k-1} c_{i-k-1}$ is also binary. Moreover, the logical XOR operation can replace the addition in Eq. (2). XOR compares two bits and returns 1 if they differ and 0 if they are equal (Qiu et al., 2024); an XOR between a variable and 0 returns the variable itself. Letting ⊕ denote bitwise XOR, we can rewrite Eq. (2) as

$$c_i = g_0 c_{i-p} \oplus g_1 c_{i-p+1} \oplus \dots \oplus g_{p-1} c_{i-1} \oplus b_i, \quad p \le i \le n-2. \tag{3}$$

Since Eq. (3) is recursive, we use a serial implementation, which takes n clock cycles to complete. Thus, the time complexity of GS encoding is O(n).

Next, we show the GS decoding process that restores the original bit set $\{b_0, b_1, b_2, \dots, b_{n-2}\}$ from the encoded bit set $\{c_0, c_1, c_2, \dots, c_{n-2}\}$. GS encoding and decoding involve only XOR operations. To obtain b_i from c_i, we XOR both sides of Eq. (2) (equivalently, Eq. (3)) with b_i ⊕ c_i, which gives

$$\begin{aligned} b_i &= b_i \oplus \sum_{k=0}^{p-1} g_{p-k-1} c_{i-k-1} \oplus b_i \oplus c_i \\ &= c_i \oplus \sum_{k=0}^{p-1} g_{p-k-1} c_{i-k-1}, \quad p \le i \le n-2. \end{aligned} \tag{4}$$

Comparing Eqs. (2) and (4), we see that decoding simply swaps the roles of c_i and b_i. Because all encoded bits c_i are available at the decoder, each b_i can be computed independently, so GS decoding can be implemented in parallel.
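A minimal Python model of Eqs. (3) and (4) is sketched below. It is an illustrative software sketch, not the authors' FPGA design: the function names `gs_encode` and `gs_decode` and the coefficient list format `g = [g_0, ..., g_{p-1}]` are our own assumptions, and c₀, …, c_{p−1} are kept at zero as stated above, so the decoder returns b_p, …, b_{n−2}.

```python
from typing import List


def gs_encode(b: List[int], g: List[int]) -> List[int]:
    """Serial GS encoding, Eq. (3); g = [g_0, ..., g_{p-1}] with g_0 = 1."""
    p = len(g)
    c = [0] * len(b)  # c_0 .. c_{p-1} stay zero, as in the text
    for i in range(p, len(b)):
        acc = b[i]
        # XOR of g_{p-k-1} AND c_{i-k-1}: needs earlier outputs, hence serial
        for k in range(p):
            acc ^= g[p - k - 1] & c[i - k - 1]
        c[i] = acc
    return c


def gs_decode(c: List[int], g: List[int]) -> List[int]:
    """GS decoding, Eq. (4); returns b_p .. b_{n-2}."""
    p = len(g)
    b = []
    for i in range(p, len(c)):
        # Reads only received bits c, so every i could be computed in parallel
        acc = c[i]
        for k in range(p):
            acc ^= g[p - k - 1] & c[i - k - 1]
        b.append(acc)
    return b


if __name__ == "__main__":
    g = [1, 0, 1]  # hypothetical p = 3 polynomial 1 + x^2 + x^3
    bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    coded = gs_encode(bits, g)
    print(coded, gs_decode(coded, g) == bits[len(g):])
```

The encoder loop reads `c[i - k - 1]` values it has just produced, which is why it is serial, whereas each decoder iteration reads only received bits.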

The algorithm of k-constrained encoding and decoding

The k-constrained encoding algorithm is described as follows:

Step 1: Prepend a bit 1 to the scrambled bit set $c = \{c_0, c_1, \dots, c_{n-2}\}$ to generate $e = \{e_0, \dots, e_{n-1}\}$, where e₀ = 1 and (e₁, e₂, …, e_{n−1}) = (c₀, c₁, …, c_{n−2}).

Step 2: Split the bit set e into L blocks of q bits, where $L = 2^{q-1}$ and n = L·q. The blocks are $L_0 = (e_0, e_1, \dots, e_{q-1})$, $L_1 = (e_q, e_{q+1}, \dots, e_{2q-1})$, …, $L_i = (e_{iq}, e_{iq+1}, \dots, e_{(i+1)q-1})$, …, $L_{2^{q-1}-1} = (e_{n-q}, e_{n-q+1}, \dots, e_{n-1})$, $0 \le i \le 2^{q-1} - 1$.

Step 3: If $L_i = [0]_{bin}$, then set $L_i = L_u$, where $0 \le u \le i \le 2^{q-1} - 1$ and $[x]_{bin}$ denotes the q-bit binary representation of x.

Step 4: Set $L_u = [i]_{bin}$. Repeat Steps 3 and 4 L − 1 times.

Step 5: If $L_i = [2^q - 1]_{bin}$ and the first bit of $L_u$ is 1, then set $L_i = [0]_{bin}$. Repeat Step 5 L times.

The k-constrained decoding algorithm is presented as follows:

Step 1: If $L_i = [0]_{bin}$, set $L_i = [2^q - 1]_{bin}$, $0 \le i \le 2^{q-1} - 1$. Repeat Step 1 L times.

Step 2: If the first bit of $L_i$ is 0, let $z = [L_i]_{dec}$ and set $L_z = [0]_{bin}$, where $[x]_{dec}$ denotes the decimal value of the binary block x. Repeat Step 2 L times. Note that if the first bit e₀ is 1, Step 2 is executed only once, and the decoded data are obtained by simply removing e₀.
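To make the block-replacement idea concrete, the following Python sketch implements one possible reading of encoding Steps 1 to 4 together with a matching decoder. It is our interpretation, not the authors' exact procedure: we assume u is the block that was replaced most recently (starting from u = 0), so the all-zero blocks form a pointer chain that starts at L₀ and ends at the block holding the original L₀, and the decoder follows this chain only from L₀. Step 5 (the all-ones rule) is omitted, and all function names are hypothetical.

```python
from typing import List


def to_blocks(bits: List[int], q: int) -> List[int]:
    """Group bits into q-bit integers; the first bit of each block is its MSB."""
    return [int("".join(map(str, bits[i:i + q])), 2) for i in range(0, len(bits), q)]


def from_blocks(blocks: List[int], q: int) -> List[int]:
    return [(blk >> (q - 1 - t)) & 1 for blk in blocks for t in range(q)]


def k_encode(c: List[int], q: int) -> List[int]:
    """Steps 1-4 (our reading): remove all-zero q-bit blocks via a pointer chain."""
    L = 2 ** (q - 1)
    e = [1] + list(c)               # Step 1: e_0 = 1
    assert len(e) == L * q          # Step 2: n = L * q
    blocks = to_blocks(e, q)
    u = 0                           # block currently holding the original L_0
    for i in range(1, L):
        if blocks[i] == 0:          # Step 3: the zero block takes over L_u
            blocks[i] = blocks[u]
            blocks[u] = i           # Step 4: L_u = [i]_bin, first bit 0 since i < 2^(q-1)
            u = i
    return from_blocks(blocks, q)


def k_decode(e: List[int], q: int) -> List[int]:
    """Inverse of k_encode: follow pointers (first bit 0) starting from L_0."""
    blocks = to_blocks(e, q)
    u = 0
    while (blocks[u] >> (q - 1)) == 0:  # first bit 0 -> pointer to the next block
        nxt = blocks[u]
        blocks[u] = 0                   # chain blocks were all-zero (L_0 restored below)
        u = nxt
    if u != 0:
        blocks[0] = blocks[u]           # original L_0 sits at the end of the chain
        blocks[u] = 0
    return from_blocks(blocks, q)[1:]   # drop e_0
```

For example, with q = 5 and an all-zero 79-bit input, the sketch produces the 16 nonzero blocks 1, 2, …, 15, 16, and decoding walks the chain back to the original data.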

The algorithm of minimum accumulated signal power

As given in Cai et al. (2017), the MASP criterion is

$$\sum_{s=1}^{t} \Big| \sum_{i=1}^{n(l-1)} w_i^{*} e^{-j 2\pi f_s i} + \sum_{i=(l-1)n+1}^{nl} w_i e^{-j 2\pi f_s i} \Big|^{2}, \tag{5}$$

where $j = \sqrt{-1}$, t is the number of spectrum nulls at frequencies f₁, f₂, …, f_t, n is the length of one codeword, and l is the number of codewords that need to be computed. w denotes the current unencoded (candidate) codeword and w^∗ the previously encoded codewords.

Let

$$\begin{aligned} Re_s^{l-1} &= \Big[\sum_{i=1}^{n(l-1)} w_i^{*} e^{-j 2\pi f_s i}\Big]_{\text{real part}} = \sum_{i=1}^{n(l-1)} w_i^{*} \cos(2\pi f_s i), \\ Im_s^{l-1} &= -\Big[\sum_{i=1}^{n(l-1)} w_i^{*} e^{-j 2\pi f_s i}\Big]_{\text{imaginary part}} = \sum_{i=1}^{n(l-1)} w_i^{*} \sin(2\pi f_s i), \end{aligned} \tag{6}$$

where $Re_s^{l-1}$ and $Im_s^{l-1}$ are the cosine and sine accumulations of the first l − 1 codewords, which have already been calculated. Thus, the left term inside the magnitude in Eq. (5) is already known and can be added directly to the right term. Applying Euler's formula, the MASP of the current unencoded codeword is computed as

$$\begin{aligned} &\sum_{s=1}^{t} \Big| Re_s^{l-1} - j\, Im_s^{l-1} + \sum_{i=(l-1)n+1}^{nl} w_i \big[\cos(2\pi f_s i) - j \sin(2\pi f_s i)\big] \Big|^{2} \\ ={}& \sum_{s=1}^{t} \Big| Re_s^{l-1} + \sum_{i=(l-1)n+1}^{nl} w_i \cos(2\pi f_s i) - j \Big[ Im_s^{l-1} + \sum_{i=(l-1)n+1}^{nl} w_i \sin(2\pi f_s i) \Big] \Big|^{2} \\ ={}& \sum_{s=1}^{t} \Big\{ \Big[ Re_s^{l-1} + \sum_{i=(l-1)n+1}^{nl} w_i \cos(2\pi f_s i) \Big]^{2} + \Big[ Im_s^{l-1} + \sum_{i=(l-1)n+1}^{nl} w_i \sin(2\pi f_s i) \Big]^{2} \Big\}. \end{aligned} \tag{7}$$
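The equivalence between Eq. (5) and the incremental form of Eq. (7) can be checked with the Python sketch below. It assumes bipolar codewords (w_i ∈ {−1, +1}, as stated later in Improvement 4) and bit indices that start at 1; the function names are hypothetical. Once the running sums of Eq. (6) are known, only the n bits of each candidate are processed, and the candidate with the smallest value would be selected.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def masp_direct(prev: Sequence[int], cur: Sequence[int], freqs: Sequence[float]) -> float:
    """Eq. (5): recompute the complex sums over every bit sent so far."""
    seq = list(prev) + list(cur)
    total = 0.0
    for f in freqs:
        z = sum(w * cmath.exp(-2j * math.pi * f * i) for i, w in enumerate(seq, start=1))
        total += abs(z) ** 2
    return total


def running_sums(prev: Sequence[int], freqs: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Eq. (6): Re_s^{l-1} and Im_s^{l-1} of the already-encoded bits."""
    re_sum = [sum(w * math.cos(2 * math.pi * f * i) for i, w in enumerate(prev, start=1)) for f in freqs]
    im_sum = [sum(w * math.sin(2 * math.pi * f * i) for i, w in enumerate(prev, start=1)) for f in freqs]
    return re_sum, im_sum


def masp_incremental(re_sum: Sequence[float], im_sum: Sequence[float], cur: Sequence[int],
                     offset: int, freqs: Sequence[float]) -> float:
    """Eq. (7): add only the current candidate's bits; offset = n(l - 1)."""
    total = 0.0
    for s, f in enumerate(freqs):
        r = re_sum[s] + sum(w * math.cos(2 * math.pi * f * i) for i, w in enumerate(cur, start=offset + 1))
        m = im_sum[s] + sum(w * math.sin(2 * math.pi * f * i) for i, w in enumerate(cur, start=offset + 1))
        total += r * r + m * m
    return total


if __name__ == "__main__":
    freqs = [1 / 90, 1 / 60]                      # servo frequencies of the FPGA example
    prev = [1, -1] * 40                           # hypothetical previously sent codeword
    candidates = [[1] * 80, [1, -1, -1, 1] * 20]  # hypothetical candidates
    re_sum, im_sum = running_sums(prev, freqs)
    best = min(candidates, key=lambda w: masp_incremental(re_sum, im_sum, w, len(prev), freqs))
    print(candidates.index(best))
```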

The enhanced algorithms

Parallel scrambling algorithms

We use the GS scrambler polynomial 1 + x², that is, p = 2, g₀ = 1, and g₁ = 0. From the scrambling Eq. (2), the encoding is

$$c_i = c_{i-2} \oplus b_i, \quad 2 \le i \le n-2, \tag{8}$$

where c₀ and c₁ are initially set to zero. This calculation is recursive and executes serially: one clock cycle produces one c_i, so computing all c_i takes n clock cycles. Thus, the time complexity of the original scrambling is O(n). To reduce this time, we unroll the recursion of Eq. (8) into

$$\begin{cases} c_{2j} = c_0 \oplus b_2 \oplus b_4 \oplus \dots \oplus b_{2j}, & 2 \le 2j \le n-2, \\ c_{2j+1} = c_1 \oplus b_3 \oplus b_5 \oplus \dots \oplus b_{2j+1}, & 3 \le 2j+1 \le n-2. \end{cases} \tag{9}$$

Eq. (9) is not recursive, because its right-hand side depends only on the input data b and on c₀ and c₁, all of which are known in advance. Therefore, c₂, c₃, …, c_{n−2} can be computed independently and in parallel in one clock cycle, reducing the time complexity from O(n) for Eq. (8) to O(1).

The corresponding decoding is

$$\begin{aligned} b_i &= c_{i-2} \oplus b_i \oplus c_{i-2} \\ &= c_i \oplus c_{i-2}, \quad 2 \le i \le n-2. \end{aligned} \tag{10}$$

Recall that the encoding and decoding of scrambling only use XOR operations.
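The following Python sketch checks that the unrolled form of Eq. (9) matches the serial recursion of Eq. (8) and that Eq. (10) inverts it. The helper names are hypothetical; c₀ and c₁ are passed in as the known initial values (zero by default, as in the text). In hardware, each iteration of the outer loop in `parallel_scramble` would correspond to an independent XOR tree.

```python
from typing import List


def serial_scramble(b: List[int], c0: int = 0, c1: int = 0) -> List[int]:
    """Eq. (8): c_i = c_{i-2} XOR b_i, one output bit per clock."""
    c = [c0, c1] + [0] * (len(b) - 2)
    for i in range(2, len(b)):
        c[i] = c[i - 2] ^ b[i]
    return c


def parallel_scramble(b: List[int], c0: int = 0, c1: int = 0) -> List[int]:
    """Eq. (9): each c_i is c_0 or c_1 XORed with same-parity inputs; no recursion."""
    c = [c0, c1] + [0] * (len(b) - 2)
    for i in range(2, len(b)):          # iterations are independent of each other
        acc = c0 if i % 2 == 0 else c1
        for k in range(2 + i % 2, i + 1, 2):
            acc ^= b[k]
        c[i] = acc
    return c


def descramble(c: List[int]) -> List[int]:
    """Eq. (10): b_i = c_i XOR c_{i-2}; returns b_2 .. b_{n-2}."""
    return [c[i] ^ c[i - 2] for i in range(2, len(c))]


if __name__ == "__main__":
    b = [(t * t + 1) % 3 % 2 for t in range(79)]   # 79 bits, as in the FPGA example
    print(serial_scramble(b) == parallel_scramble(b), descramble(parallel_scramble(b)) == b[2:])
```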

Improved MASP with remainder operation

Evaluating the sine and cosine terms is a critical step in Eq. (7). We therefore propose a minimum accumulated signal power algorithm with remainder (MASP-R), which proceeds as follows.

Step 1: Convert the radian value 2πf_si to the degree value h₁, (l − 1)n + 1 ≤ i ≤ ln.

Step 2: Since h₁ may exceed 360 degrees, we apply a modulo-360° (remainder) operation to h₁. Furthermore, since sin(h₁) = cos(h₁ + 270°), we also compute the remainder of h₁ + 270° modulo 360°. Thus, the cosine and sine of 2πf_s i are given by

$$\begin{aligned} h_{r1} &= \operatorname{mod}(h_1, 360^\circ), & \cos(2\pi f_s i) &= \cos h_{r1}, & 0^\circ &\le h_{r1} < 360^\circ, \\ h_{r2} &= \operatorname{mod}(h_1 + 270^\circ, 360^\circ), & \sin(2\pi f_s i) &= \sin h_1 = \cos h_{r2}, & 0^\circ &\le h_{r2} < 360^\circ. \end{aligned} \tag{11}$$

Step 3: We construct a ROM and store the cosine values from 0 degrees to 359 degrees in the ROM. Determine the cosine values of h_r1 and h_r2 from the ROM.

Moreover, by quadrant symmetry, the cosine values in the other three quadrants can be obtained from those in the first quadrant. The ROM therefore only needs to store 91 values (0 to 90 degrees) instead of 360, saving about 3/4 of the storage space. When MASP evaluates sine and cosine, it evaluates $\cos h_{r1}$ and $\cos h_{r2}$. For example, the modified cosine evaluation for h_{r1} is

$$\cos h_{r1} = \begin{cases} \cos h_{r1}, & 0^\circ \le h_{r1} \le 90^\circ, \\ -\cos(180^\circ - h_{r1}), & 90^\circ < h_{r1} \le 180^\circ, \\ -\cos(h_{r1} - 180^\circ), & 180^\circ < h_{r1} \le 270^\circ, \\ \cos(360^\circ - h_{r1}), & 270^\circ < h_{r1} < 360^\circ. \end{cases} \tag{12}$$
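A Python sketch of the MASP-R lookup with a 91-entry quarter-wave table follows. It assumes integer degree angles, which holds for the frequencies used later in the FPGA example (f₁ = 1/90 and f₂ = 1/60 give h₁ = 4i and h₁ = 6i). `ROM`, `cos_deg`, and `sin_deg` are illustrative names, and floating-point values stand in for the fixed-point ROM contents.

```python
import math

# Quarter-wave table: cos(0 deg) .. cos(90 deg), 91 entries instead of 360.
ROM = [math.cos(math.radians(d)) for d in range(91)]


def cos_deg(h: int) -> float:
    """cos(h degrees) for integer h: remainder as in Eq. (11), symmetry as in Eq. (12)."""
    hr = h % 360
    if hr <= 90:
        return ROM[hr]
    if hr <= 180:
        return -ROM[180 - hr]
    if hr <= 270:
        return -ROM[hr - 180]
    return ROM[360 - hr]


def sin_deg(h: int) -> float:
    """sin(h) = cos(h + 270 deg), Eq. (11)."""
    return cos_deg(h + 270)


if __name__ == "__main__":
    f1 = 1 / 90
    for i in range(1, 4):
        h1 = round(360 * f1 * i)   # 2*pi*f1*i converted to degrees; here h1 = 4i
        print(i, cos_deg(h1), sin_deg(h1))
```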

Improved MASP with no remainder and square

Remove the remainder operation

Next, we propose an improved MASP algorithm with no remainder and square (MASP-NRS). With the MASP formula Eq. (7), each term requires a 360-degree remainder, a lookup of the corresponding sine or cosine value, and an addition. A sine-cosine accumulation requires n clock cycles: parallel execution of the accumulation is complex, so serial operation is used instead. The time complexity of the remainder operation is O(n), and a sine-cosine accumulation requires n remainders, which leads to an accumulation time complexity of O(n²). To reduce this time complexity, we propose the following improvements, which remove the remainder operation.

Improvement 1: Reduce the number of codewords involved in accumulation. The sums $\sum_{i=(l-1)n+1}^{nl} w_i \cos(2\pi f_s i)$ and $\sum_{i=(l-1)n+1}^{nl} w_i \sin(2\pi f_s i)$ of the current l-th codeword are added to the sums $\sum_{i=1}^{n(l-1)} w_i^{*} \cos(2\pi f_s i)$ and $\sum_{i=1}^{n(l-1)} w_i^{*} \sin(2\pi f_s i)$ of the previous l − 1 codewords, giving

$$\begin{aligned} Re_s^{l} &= \sum_{i=1}^{n(l-1)} w_i^{*} \cos(2\pi f_s i) + \sum_{i=(l-1)n+1}^{nl} w_i \cos(2\pi f_s i), \\ Im_s^{l} &= \sum_{i=1}^{n(l-1)} w_i^{*} \sin(2\pi f_s i) + \sum_{i=(l-1)n+1}^{nl} w_i \sin(2\pi f_s i). \end{aligned} \tag{13}$$

The magnitudes of $Re_s^{l}$ and $Im_s^{l}$ can grow as the number of codewords increases. To bound them, we reset $Re_s^{l}$ and $Im_s^{l}$ to 0 after every 64 codewords.

Improvement 2: Eliminate the remainder operation and use ROM storage instead. Suppose the shaping code places spectrum nulls at two servo frequencies, f₁ and f₂. The n bits of a codeword w are multiplied by four groups of sine-cosine values, cos(2πf₁i), sin(2πf₁i), cos(2πf₂i), and sin(2πf₂i), 0 ≤ i ≤ n − 1, each group containing n values. Storing one set of four groups for each of the 64 codewords in an accumulation period (Improvement 1) requires a total of 64 × 4 × n sine-cosine values.

Improvement 3: Adopt a small bit width. The sine-cosine values must be converted from fractional values to integers for computation on the FPGA. We multiply each sine-cosine value by 15 and round it to the nearest integer, which is approximately equivalent to shifting the value left by four bits (multiplying by 16). As a result, the four groups of sine-cosine values are integers in the range −15 to 15, which fit in 5 bits including the sign bit.

Improvement 4: Use parallel operation. Each element of codeword w is either −1 or 1, so multiplying w_i by a sine or cosine value reduces to selecting either the value or its negation. We read the n sine-cosine values from ROM simultaneously and perform the selection operations in parallel, completing the sine-cosine accumulation in a single clock cycle. Thus, we eliminate the remainder operation and reduce the accumulation time complexity from O(n²) to O(1).
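Improvements 2 to 4 can be modeled in Python as a precomputed table plus a sign-select accumulation. This is a sketch under assumptions: the parameters are taken from the FPGA example (n = 80, f₁ = 1/90, f₂ = 1/60), the bit index inside the 64-codeword period is assumed to run from 1 to 64n as in Eq. (5), and Python's `round()` (which rounds ties to even) stands in for the unspecified hardware rounding rule. All names are hypothetical.

```python
import math
from typing import List, Sequence

N = 80                    # codeword length n (FPGA example)
FREQS = (1 / 90, 1 / 60)  # f_1, f_2 (FPGA example)
PERIOD = 64               # codewords per accumulation period (Improvement 1)
SCALE = 15                # Improvement 3: values scaled by 15 and rounded


def build_rom(n: int = N, freqs: Sequence[float] = FREQS, period: int = PERIOD) -> List[List[List[int]]]:
    """rom[slot][group][i]; groups are cos f1, sin f1, cos f2, sin f2."""
    rom = []
    for slot in range(period):
        idx = range(slot * n + 1, slot * n + n + 1)  # assumed 1-based global bit index
        groups = []
        for f in freqs:
            groups.append([round(SCALE * math.cos(2 * math.pi * f * k)) for k in idx])
            groups.append([round(SCALE * math.sin(2 * math.pi * f * k)) for k in idx])
        rom.append(groups)
    return rom


def select_accumulate(w: Sequence[int], values: Sequence[int]) -> int:
    """Improvement 4: w_i is +1 or -1, so each 'product' is a select of +v or -v."""
    return sum(v if wi > 0 else -v for wi, v in zip(w, values))


ROM = build_rom()
ROM_BITS = PERIOD * 2 * len(FREQS) * N * 5   # 5-bit signed entries
```

With these parameters the table holds 102,400 bits, which matches the 12.5 KB figure given in the FPGA section.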

Remove the square operation

Equation (7) involves square operations, which cost O(n²) for an n-bit operand and are expensive to compute in hardware. We provide a segmented line search algorithm with dynamic error. The search algorithm finds segmented points, which are connected to form a segmented (piecewise-linear) curve, and this curve gives an approximation of the square. Evaluating the curve involves only fixed shifts and additions/subtractions, with a complexity of O(1), compared with O(n²) for the square operation. The key features of the algorithm are its dynamic error threshold and a balanced-coefficient mean square error. The search algorithm is described in Algorithm 1.

 
_____________________________________________________________________________
Algorithm 1 Segmented line search algorithm with dynamical error
Require: f(x): the square function; x: the independent variable;
Ensure: a set of segmented points;
  1:  x ∈ [x_0, x_1, ⋯, x_{n−1}], k = 1;
  2:  x_b = x_0, x_v = x_2, sp_0 = x_0;
  3:  for j = 2; j ≤ n − 1; j++ do
  4:      while x_b < x_i < x_v do
  5:          compute $\hat{f}(x_i) = \frac{f(x_v) - f(x_b)}{x_v - x_b}(x_i - x_b) + f(x_b)$;
  6:      end while
  7:      compute $error = \frac{\sum_{i=b+1}^{v-1} [\hat{f}(x_i) - f(x_i)]^{2}}{(v - b - 1)\, e^{|\hat{f}(x_i) - f(x_i)|^{\alpha} / \beta}}$;
  8:      if error ≥ |f(x_v) − f(x_b)| μ then
  9:          sp_k = x_{v−1};
 10:          x_b = x_v;
 11:          x_v = x_{v+2};
 12:          k = k + 1;
 13:      else
 14:          x_b = x_b;
 15:          x_v = x_j;
 16:      end if
 17:  end for
 18:  return sp = [sp_0, sp_1, sp_2, ...];
_____________________________________________________________________________

The mean square error is commonly computed as

$$error = \frac{\sum_{i=1}^{n} [\hat{f}(x_i) - f(x_i)]^{2}}{n}, \tag{14}$$

where $\hat{f}(x_i)$ represents the predicted value and f(x_i) the actual value.

In Eq. (14), terms with large deviations dominate the error, while terms with small deviations have little influence. Therefore, in Algorithm 1, we use the balanced-coefficient mean square error in Eq. (15) to better reflect the importance of each term:

$$error^{*} = \frac{\sum_{i=1}^{n} [\hat{f}(x_i) - f(x_i)]^{2}}{n\, e^{|\hat{f}(x_i) - f(x_i)|^{\alpha} / \beta}}, \tag{15}$$

where α and β are called the fast and slow decay factors, respectively.

Given the segmented points, the line segments of Eq. (16) can be generated. Each product of x with a slope k₀, k₁, … can be replaced by shift operations on x combined with additions. Whereas computing x² costs O(n²), evaluating Eq. (16) with shift operations reduces the complexity to O(1). As a result, we can rewrite Eq. (7) as Eq. (17).

$$\hat{f}(x) = \begin{cases} k_0 x + b_0, & sp_0 \le x \le sp_1, \\ k_1 x + b_1, & sp_1 < x \le sp_2, \\ k_2 x + b_2, & sp_2 < x \le sp_3, \\ \ \vdots & \end{cases} \tag{16}$$

$$\sum_{s=1}^{t} \Big\{ \hat{f}\Big[ Re_s^{l-1} + \sum_{i=(l-1)n+1}^{nl} w_i \cos(2\pi f_s i) \Big] + \hat{f}\Big[ Im_s^{l-1} + \sum_{i=(l-1)n+1}^{nl} w_i \sin(2\pi f_s i) \Big] \Big\} \tag{17}$$

Although the cosine and sine values in Eq. (17) lie between −1 and 1, the terms $[Re_s^{l-1} + \sum_{i=(l-1)n+1}^{nl} w_i \cos(2\pi f_s i)]^{2}$ and $[Im_s^{l-1} + \sum_{i=(l-1)n+1}^{nl} w_i \sin(2\pi f_s i)]^{2}$ can become large when the codeword length n is large and the bits w_i are all +1. Note that the MASP value is used only to compare candidate codewords. Thus, we can divide both accumulated sums by the same constant to shrink them without affecting the comparison, since scaling every term by the same positive factor preserves the ordering of the exact squared sums. We can then modify Eq. (17) as

$$\sum_{s=1}^{t} \Big\{ \hat{f}\Big[ \frac{Re_s^{l-1} + \sum_{i=(l-1)n+1}^{nl} w_i \cos(2\pi f_s i)}{num} \Big] + \hat{f}\Big[ \frac{Im_s^{l-1} + \sum_{i=(l-1)n+1}^{nl} w_i \sin(2\pi f_s i)}{num} \Big] \Big\}, \tag{18}$$

where num is a positive integer.

FPGA implementation of a spectrum shaping code

Here, we use a specific shaping code as an example of FPGA implementation. Let the lengths of the shaping code and the original message be n = 80 and 77 bits, respectively. Since n = L·q with $L = 2^{q-1}$, we obtain q = 5 and L = 16. The GS scrambler polynomial is 1 + x², so the scrambler length is p = 2. Two bits chosen from the binary set {00, 01, 10, 11} are appended to the original message. The resulting 79 bits are then scrambled with the parallel scrambling algorithm described in ‘Parallel scrambling algorithms’. After that, a bit 1 is prepended to the scrambled 79 bits to create 80 bits. The k-constrained and MASP algorithms are then executed.

The implementation of MASP-NRS

The implementation of removing the remainder operation

Let the shaping code use two servo frequencies, f₁ = 1/90 and f₂ = 1/60. Each sine-cosine group contains 80 values. We store 64 × 4 × 80 sine-cosine values of 5 bits each, requiring a total of 64 × 4 × 80 × 5 = 102,400 bits, that is, 12.5 KB.

Next, using a pipelined operation, we implement the sine-cosine accumulation in Eq. (18). Figure 2 illustrates the pipeline structure.

Figure 2: The pipeline of sine-cosine accumulation.


DOI: 10.7717/peerjcs.1883/fig-2

Step 1: In Fig. 2, we use r_f_kcos_w_i and r_f_ksin_w_i to denote the product of w_i with cos(2πf_ki) and sin(2πf_ki), k = 1, 2, and 0 ≤ i ≤ 79. According to the value of w_i, we use selectors to determine the 80 values of r_f₁cos_w_i, 0 ≤ i ≤ 79.

Step 2: Accumulate r_f₁cos_w_i. If the 80 values are added in pairs, the first four stages of accumulation need 40, 20, 10, and 5 additions, respectively, leaving five operands after the fourth stage, which cannot be conveniently added in pairs. Because each r_f₁cos_w_i has a small five-bit width, we instead construct the segmented first-stage accumulation in Eq. (19), in which some adders take three operands. With this equation, the first-stage accumulation of the 80 values requires only 32 addition operations.

$$\begin{cases} \mathrm{r\_1\_a}_i = \mathrm{r\_f}_1\mathrm{cos\_w}_{2i} + \mathrm{r\_f}_1\mathrm{cos\_w}_{2i+1}, & 0 \le i \le 15, \\ \mathrm{r\_1\_a}_i = \mathrm{r\_f}_1\mathrm{cos\_w}_{2i} + \mathrm{r\_f}_1\mathrm{cos\_w}_{2i+1} + \mathrm{r\_f}_1\mathrm{cos\_w}_{i+48}, & 16 \le i \le 31. \end{cases} \tag{19}$$

Step 3: In Fig. 2, the variables r_1_a0, …, r_1_a31, r_2_a0, …, r_2_a15, r_3_a0, …, r_3_a7, r_4_a0, …, r_4_a3, r_5_a0, r_5_a1, and r_6_a form a six-stage sine-cosine accumulation pipeline. After the first stage, pairwise additions are performed, for example,

$$\begin{cases} \mathrm{r\_2\_a}_0 = \mathrm{r\_1\_a}_0 + \mathrm{r\_1\_a}_1, \\ \mathrm{r\_3\_a}_0 = \mathrm{r\_2\_a}_0 + \mathrm{r\_2\_a}_1, \\ \mathrm{r\_4\_a}_0 = \mathrm{r\_3\_a}_0 + \mathrm{r\_3\_a}_1, \\ \mathrm{r\_5\_a}_0 = \mathrm{r\_4\_a}_0 + \mathrm{r\_4\_a}_1. \end{cases} \tag{20}$$

The accumulated result is r_6_a = r_5_a0 + r_5_a1. Each r_6_a of the current codeword is added to the accumulated values of the previous codewords (denoted by Re₁, Im₁, Re₂, and Im₂) to yield r_7_Re₁, r_7_Im₁, r_7_Re₂, and r_7_Im₂.

Step 4: After accumulation, an operation that replaces the square is performed, as introduced in the next section. The accumulation of one encoded codeword is completed after 14 cycles, and the calculation of the next encoded codeword starts at the 5th clock. In Fig. 2, each buffer denotes a one-clock delay.
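The adder tree of Eqs. (19) and (20) can be checked in software with the sketch below (hypothetical names). Each list returned by `adder_tree` corresponds to one pipeline stage, r_1 to r_6; registers and clock timing are not modeled.

```python
from typing import List


def first_stage(x: List[int]) -> List[int]:
    """Eq. (19): 80 operands -> 32 sums (16 two-input and 16 three-input adders)."""
    assert len(x) == 80
    r1 = [x[2 * i] + x[2 * i + 1] for i in range(16)]
    r1 += [x[2 * i] + x[2 * i + 1] + x[i + 48] for i in range(16, 32)]
    return r1


def adder_tree(x: List[int]) -> List[List[int]]:
    """Return stages r_1 .. r_6; after stage 1, neighbours are added pairwise, Eq. (20)."""
    stages = [first_stage(x)]
    while len(stages[-1]) > 1:
        prev = stages[-1]
        stages.append([prev[2 * i] + prev[2 * i + 1] for i in range(len(prev) // 2)])
    return stages


if __name__ == "__main__":
    x = [((7 * t) % 31) - 15 for t in range(80)]   # hypothetical 5-bit signed inputs
    print([len(s) for s in adder_tree(x)], adder_tree(x)[-1][0] == sum(x))
```

The three-input adders in the second half of Eq. (19) absorb the last 16 operands (indices 64 to 79), so every later stage halves cleanly.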

The implementation of removing square operation

In Eq. (18), a symbol (codeword) contains 80 bits. Because of the k-constrained algorithm, a symbol contains no run of five consecutive 1's or five consecutive 0's, so at most 80 × 80% = 64 of its bits are 1's. In addition, the sine-cosine values are integers with magnitudes from 0 to 15. In the extreme case, all 64 1's are multiplied by these sine-cosine values. Taking the mean magnitude of 7.5 for the values involved, the product is 80 × 80% × 7.5 = 480. The result of the current symbol is added to that of the previous symbol, so the accumulated result can reach 80 × 80% × 7.5 × 2 = 960. To simplify squaring such large numbers, num in Eq. (18) is set to 16, giving 960/16 = 60; division by 16 is implemented as a 4-bit right shift. To prevent accumulated results divided by 16 from exceeding 60, we add overflow control: any result greater than 60 is set to 60.
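A small Python sketch of the scaling and overflow control described above is given below. Taking the absolute value before shifting is our assumption (the text does not state how negative accumulations are handled; since only the square is needed, the magnitude suffices), and the name `prescale` is hypothetical.

```python
WORST_CASE = (80 * 4 // 5) * 15   # 64 ones x mean magnitude 7.5 x 2 symbols = 960


def prescale(acc: int, shift: int = 4, limit: int = 60) -> int:
    """Divide by num = 16 with a 4-bit right shift, then saturate at 60.

    abs() is an assumption: only the square of the value is used downstream.
    """
    return min(abs(acc) >> shift, limit)


if __name__ == "__main__":
    for acc in (15, -33, 959, 960, 1500):
        print(acc, prescale(acc))
```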

In Algorithm 1, we set the slow decay factor β = 1/2, the fast decay factor α = 4, and μ = 0.06, and obtain the segmented line equation

$$\hat{f}(x) = \begin{cases} 2x, & sp_0 \le x \le sp_1, \\ 6x - 8, & sp_1 < x \le sp_2, \\ 11x - 28, & sp_2 < x \le sp_3, \\ 18x - 77, & sp_3 < x \le sp_4, \\ 26x - 165, & sp_4 < x \le sp_5, \\ 35x - 300, & sp_5 < x \le sp_6, \\ 46x - 520, & sp_6 < x \le sp_7, \\ 58x - 832, & sp_7 < x \le sp_8, \\ 70x - 1216, & sp_8 < x \le sp_9, \\ 83x - 1710, & sp_9 < x \le sp_{10}, \\ 98x - 2385, & sp_{10} < x \le sp_{11}, \\ 113x - 3180, & sp_{11} < x \le sp_{12} \end{cases} \tag{21}$$

where sp₀, sp₁, sp₂, ..., sp₁₂ are 0, 2, 4, 7, 11, 15, 20, 26, 32, 38, 45, 53, and 60. Other suitable breakpoints could be explored with optimization algorithms such as particle swarm optimization (Chen et al., 2022). Replacing each multiplication in Eq. (21) with shift-and-add operations, Eq. (21) becomes

(22) $\hat{f}(x) = \begin{cases} (x \ll 1), & sp_{0} \leq x \leq sp_{1} \\ (x \ll 2) + (x \ll 1) - 8, & sp_{1} < x \leq sp_{2} \\ (x \ll 3) + (x \ll 1) + x - 28, & sp_{2} < x \leq sp_{3} \\ (x \ll 4) + (x \ll 1) - 77, & sp_{3} < x \leq sp_{4} \\ (x \ll 4) + (x \ll 3) + (x \ll 1) - 165, & sp_{4} < x \leq sp_{5} \\ (x \ll 5) + (x \ll 1) + x - 300, & sp_{5} < x \leq sp_{6} \\ (x \ll 5) + (x \ll 3) + (x \ll 2) + (x \ll 1) - 520, & sp_{6} < x \leq sp_{7} \\ (x \ll 5) + (x \ll 4) + (x \ll 3) + (x \ll 1) - 832, & sp_{7} < x \leq sp_{8} \\ (x \ll 6) + (x \ll 2) + (x \ll 1) - 1216, & sp_{8} < x \leq sp_{9} \\ (x \ll 6) + (x \ll 4) + (x \ll 1) + x - 1710, & sp_{9} < x \leq sp_{10} \\ (x \ll 6) + (x \ll 5) + (x \ll 1) - 2385, & sp_{10} < x \leq sp_{11} \\ (x \ll 6) + (x \ll 5) + (x \ll 4) + x - 3180, & sp_{11} < x \leq sp_{12} \end{cases}$

where $x \ll k$ denotes shifting x left by k bits, i.e., multiplying it by $2^{k}$.
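Each segment of Eq. (21) is the chord of x² between consecutive breakpoints a = sp_i and b = sp_{i+1}: its slope is (b² − a²)/(b − a) = a + b and its offset is ab (for example, 6x − 8 for a = 2, b = 4). The shift-and-add terms of Eq. (22) are the set bits of each slope, e.g., 46 = 2⁵ + 2³ + 2² + 2¹. The Python sketch below rebuilds the coefficients from the breakpoints and evaluates the estimate with shifts and additions only. The function names are hypothetical, and the bit loop is only a software model; in hardware the shift amounts would be fixed wiring.

```python
# Minimal sketch (hypothetical helper names); breakpoints are those listed above.
SP = [0, 2, 4, 7, 11, 15, 20, 26, 32, 38, 45, 53, 60]


def chord_coefficients(sp):
    """(slope, offset) of the chord of x**2 through (a, a**2) and (b, b**2): y = slope*x - offset."""
    return [(a + b, a * b) for a, b in zip(sp, sp[1:])]


def shift_add_multiply(x: int, k: int) -> int:
    """k * x using only left shifts and additions, one term per set bit of k."""
    total, bit = 0, 0
    while k:
        if k & 1:
            total += x << bit
        k >>= 1
        bit += 1
    return total


def f_hat(x: int, sp=SP) -> int:
    """Segmented-line estimate of x**2 following Eqs. (21) and (22)."""
    if not sp[0] <= x <= sp[-1]:
        raise ValueError("x is outside [sp_0, sp_12]")
    for (a, b), (slope, offset) in zip(zip(sp, sp[1:]), chord_coefficients(sp)):
        # The first segment whose upper breakpoint b satisfies x <= b matches
        # [sp_0, sp_1] for the first piece and (sp_i, sp_{i+1}] afterwards.
        if x <= b:
            return shift_add_multiply(x, slope) - offset


if __name__ == "__main__":
    print(chord_coefficients(SP)[:3])  # [(2, 0), (6, 8), (11, 28)]
    print(f_hat(30), 30 * 30)          # 908 900
```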

As shown in Fig. 3, we compare the segmented line function $\hat{f}(x)$ with the square function f(x). The two curves almost coincide. Using Eq. (23), the correlation coefficient rela between the estimated and actual square values equals 1 after rounding.

(23) $rela = \frac{\sum_{i=1}^{n} \{f(x_{i}) - E[f(x_{i})]\}\{\hat{f}(x_{i}) - E[\hat{f}(x_{i})]\}}{\sqrt{\sum_{i=1}^{n} \{f(x_{i}) - E[f(x_{i})]\}^{2} \sum_{i=1}^{n} \{\hat{f}(x_{i}) - E[\hat{f}(x_{i})]\}^{2}}}$

where E[f(x_i)] denotes the mean of the actual values and $E[\hat{f}(x_{i})]$ denotes the mean of the estimated values.

(24) $td = \frac{rela}{\sqrt{\frac{1 - rela^{2}}{n - 2}}}$
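Eqs. (23) and (24) can be reproduced numerically. The sketch below assumes the evaluation points x_i are the integers 0 to 60 (the text does not state which points were used), writes each segment of Eq. (21) in its chord form (a + b)x − ab, and uses only the Python standard library; the function names are hypothetical. Under this assumption the coefficient is extremely close to, but strictly below, 1, and td is very large, which is consistent with a p-value that is effectively 0.

```python
import math

SP = [0, 2, 4, 7, 11, 15, 20, 26, 32, 38, 45, 53, 60]  # breakpoints of Eq. (21)


def f_hat(x: int) -> int:
    # Chord of x**2 on the segment containing x: (a + b) * x - a * b.
    for a, b in zip(SP, SP[1:]):
        if x <= b:
            return (a + b) * x - a * b
    raise ValueError("x is outside [0, 60]")


def rela(actual, estimate) -> float:
    """Pearson correlation coefficient, Eq. (23)."""
    n = len(actual)
    mean_a, mean_e = sum(actual) / n, sum(estimate) / n
    cov = sum((u - mean_a) * (v - mean_e) for u, v in zip(actual, estimate))
    var_a = sum((u - mean_a) ** 2 for u in actual)
    var_e = sum((v - mean_e) ** 2 for v in estimate)
    return cov / math.sqrt(var_a * var_e)


def td(r: float, n: int) -> float:
    """t statistic of Eq. (24), with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r * r) / (n - 2))


if __name__ == "__main__":
    xs = range(61)  # assumed evaluation points
    actual = [x * x for x in xs]
    estimate = [f_hat(x) for x in xs]
    r = rela(actual, estimate)
    print(f"rela = {r:.6f}, td = {td(r, len(xs)):.1f}")
```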

Figure 3: The comparison of segmented line prediction and actual square.


DOI: 10.7717/peerjcs.1883/fig-3

Then, we compute the statistic td according to Eq. (24) and, consulting the t-distribution table, obtain a p-value of approximately 0, which is below the significance level of 0.05. The correlation coefficient rela is therefore statistically significant, and $\hat{f}(x)$ and f(x) are almost perfectly correlated.

(25) $R^{2} = 1 - \frac{\sum_{i=1}^{n} [\hat{f}(x_{i}) - f(x_{i})]^{2}}{\sum_{i=1}^{n} [\bar{f} - f(x_{i})]^{2}}$

where $\bar{f}$ is the mean of the actual values f(x_i).

Next, we evaluate the coefficient of determination R² between $\hat{f}(x)$ and f(x) as defined in Eq. (25). The calculated R² is approximately 1, i.e., the residual sum of squares between $\hat{f}(x)$ and f(x) is approximately 0% of the total variation of f(x). Hence $\hat{f}(x)$ and f(x) are very close in value.
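Under the same assumption as before (integer evaluation points 0 to 60, which the text does not specify), the sketch below evaluates Eq. (25) and the largest absolute error. Because x² is convex, each chord lies on or above it, with $\hat{f}(x) - x^{2} = (x - a)(b - x) \leq ((b - a)/2)^{2}$ on a segment (a, b]; the widest segment, (45, 53], therefore gives the maximum error of 16 at x = 49. The function names are hypothetical.

```python
SP = [0, 2, 4, 7, 11, 15, 20, 26, 32, 38, 45, 53, 60]  # breakpoints of Eq. (21)


def f_hat(x: int) -> int:
    # Chord of x**2 on the segment containing x: (a + b) * x - a * b.
    for a, b in zip(SP, SP[1:]):
        if x <= b:
            return (a + b) * x - a * b
    raise ValueError("x is outside [0, 60]")


def r_squared(actual, estimate) -> float:
    """Coefficient of determination, Eq. (25): 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((v - u) ** 2 for u, v in zip(actual, estimate))
    ss_tot = sum((mean_a - u) ** 2 for u in actual)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    xs = range(61)  # assumed evaluation points
    actual = [x * x for x in xs]
    estimate = [f_hat(x) for x in xs]
    worst = max(xs, key=lambda x: estimate[x] - actual[x])
    print(f"R^2 = {r_squared(actual, estimate):.6f}")
    print(f"max error {estimate[worst] - actual[worst]} at x = {worst}")  # max error 16 at x = 49
```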

Implementation result

We implement the design on a Spartan6 XC6SLX25 FPGA. Table 1 lists the resources consumed by the spectrum shaping encoder based on MASP-R and on MASP-NRS. Both MASPs use the same decoding technique, and the hardware resources of the decoder are detailed in Table 2. The encoder consumes more resources than the decoder because it implements the MASP-R/MASP-NRS algorithm: it uses roughly 1,100 (MASP-NRS) to 2,100 (MASP-R) more slice registers than the decoder, and about 2.5 to 3.1 times as many slice LUTs, because MASP-R/MASP-NRS requires combinational logic such as adders. In particular, the encoder with MASP-R employs two DSPs to calculate the remainder and square operations in Eq. (7), whereas the encoder with MASP-NRS needs no DSP. Because MASP-NRS eliminates the remainder operation by storing the quantized sine-cosine values in ROM, its encoder occupies 24 Block RAMs (46%). On the Spartan6 XC6SLX25, the encoder and decoder with MASP-NRS can operate at 121.560 MHz and 164.401 MHz, respectively.

Table 1:

The hardware resources of the spectrum shaping encoder based on MASP-R and MASP-NRS.

Logic utilization	Method	Used	Available	Utilization
Slice registers	MASP-R	2,966	30,064	9%
Slice registers	MASP-NRS	1,965	30,064	6%
Slice LUTs	MASP-R	5,250	15,032	34%
Slice LUTs	MASP-NRS	4,273	15,032	28%
Block RAM/FIFO	MASP-R	0	52	0%
Block RAM/FIFO	MASP-NRS	24	52	46%
BUFG/BUFGCTRLs	MASP-R	1	16	6%
BUFG/BUFGCTRLs	MASP-NRS	1	16	6%
DSP48E1s	MASP-R	2	38	5%
DSP48E1s	MASP-NRS	0	38	0%

DOI: 10.7717/peerjcs.1883/table-1

Table 2:

The hardware resources of the spectrum shaping decoder.

Logic utilization	Used	Available	Utilization
Slice registers	864	30,064	2%
Slice LUTs	1,686	15,032	11%
Block RAM/FIFO	0	52	0%
BUFG/BUFGCTRLs	1	16	6%
DSP48E1s	0	38	0%

DOI: 10.7717/peerjcs.1883/table-2

Figure 4 compares the power spectral densities of the same spectrum shaping code obtained with the two methods. The dashed curve is the result of the initial MASP given in Eq. (7), and the solid curve is the result of MASP-NRS. Both use a code length of 80 bits, and their encoding and decoding methods are identical except for the accumulated signal power computation and the scrambling. As Fig. 4 shows, the MASP generates spectrum nulls of −22.8 dB at frequency 1/90 and −20.0 dB at frequency 1/60, whereas MASP-NRS obtains nulls of −22.5 dB and −19.4 dB at the same two frequencies. The null depths of MASP-NRS are thus 98.7% and 97.0% of those of the MASP; the losses of 1.3% and 3.0% are due to the truncation operations in MASP-NRS.

Figure 4: Comparison of the power spectrum density of MASP and MASP-NRS.


DOI: 10.7717/peerjcs.1883/fig-4

Discussion and conclusion

In this research, we enhance the encoder–decoder algorithms for spectrum shaping codes to facilitate hardware implementation. We improve the scrambling algorithm and provide a mathematical description of the k-constrained algorithm. For both scrambling and descrambling, we employ parallel operations that execute within a single clock cycle. We propose an enhanced MASP-R algorithm that computes the remainder operations for sine-cosine accumulation; however, it is difficult to execute in parallel because of its high time complexity. We therefore present the MASP-NRS algorithm, which quantizes the sine-cosine values with a short bit width and stores them in ROM, eliminating the remainder operation. MASP-NRS performs the sine-cosine accumulation in parallel within a single clock, resolving the parallelization issue of the initial MASP. Furthermore, we put forward a search algorithm based on two approaches, dynamical error and balanced-coefficient mean square error, which generates a segmented line $\hat{f}(x)$ approximating the square function f(x). Correlation and R² analyses show that f(x) and $\hat{f}(x)$ are almost equivalent. Substituting $\hat{f}(x)$ for the square operation in MASP reduces its complexity from O(n²) to O(1). Finally, the encoder–decoder of the shaping codes is implemented on the Spartan6 XC6SLX25. The synthesis results show that the decoder is simpler than the encoder because it does not have to calculate the accumulated signal power. We also demonstrate that the initial MASP and MASP-NRS perform nearly identically, yielding spectrum nulls of −22.8 dB and −22.5 dB, respectively, at frequency 1/90, which confirms the accuracy of the proposed algorithm.

Supplemental Information