
Figure 2.21: Data-path structure (control module, dual port RAM, register, multipliers, squarers, adders, and multiplexer, connected by control signals and address lines).

a host processor interface module and the other one will be connected through a multiplexer to the arithmetic modules.

Multiplexer

There are three groups of data buses in the processor, namely:

– input data to the dual port RAM, which are also the outputs of the arithmetic units,

– output data from the dual port RAM,

– input data to the arithmetic units.

Storing data in and loading data from the dual port RAM is always time consuming: it requires at least two clock cycles, so it is sometimes more efficient to load an arithmetic unit directly from the output of another module or of the same module. The multiplexer decides which of the first two groups of data lines is used to load each arithmetic module.
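The trade-off the multiplexer resolves can be sketched as a small cost model. This is a minimal illustration, not the ECCo logic: the names `Source`, `select_source`, and `round_trip_cycles` and the operand labels are hypothetical, the 2-cycle round trip is the lower bound quoted above, and it is assumed that forwarding through the multiplexer adds no cycles beyond the normal buffer load.

```python
from enum import Enum


class Source(Enum):
    """Hypothetical select values of the mux in front of one arithmetic input."""
    DPRAM = 0     # output data bus of the dual port RAM
    UNIT_OUT = 1  # output bus of the arithmetic units (= DPRAM input bus)


# From the text: a store-and-reload through the DPRAM takes at least 2 cycles.
# Assumption: forwarding through the mux adds no cycles beyond the buffer load.
RAM_ROUND_TRIP = 2


def select_source(operand: str, fresh_results: set[str], forwarding: bool = True) -> Source:
    """Take an operand straight from a unit output if it was just produced."""
    if forwarding and operand in fresh_results:
        return Source.UNIT_OUT
    return Source.DPRAM


def round_trip_cycles(operands: list[str], fresh_results: set[str], forwarding: bool = True) -> int:
    """Cycles spent writing fresh results to the DPRAM and reading them back."""
    return sum(
        RAM_ROUND_TRIP
        for op in operands
        if op in fresh_results
        and select_source(op, fresh_results, forwarding) is Source.DPRAM
    )
```

For example, if a multiplier has just produced `T1` and the next operation needs `T1` and a stored value `X1`, `round_trip_cycles(["T1", "X1"], {"T1"})` costs nothing extra, while the same call with `forwarding=False` spends two cycles routing `T1` through the RAM.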

Adders

This module consists of two adders, each with two input buffers and one output buffer. Each input buffer requires one clock cycle to be loaded. Addition, which is just a bitwise XOR, is done in one clock cycle. The code uses “generic” parameters.
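A behavioural sketch of one adder, assuming field elements in polynomial basis stored as m-bit integers (bit i is the coefficient of $x^i$). The class name `AdderModel` and the constructor argument `m`, standing in for a generic field-size parameter, are hypothetical; the cycle costs follow the text, and loading the two input buffers one after the other is an assumption.

```python
class AdderModel:
    """Hypothetical model of one adder: two input buffers, one output buffer.

    Cycle costs follow the text: 1 cycle per input-buffer load and 1 cycle
    for the XOR. Loading the buffers sequentially is an assumption.
    """

    def __init__(self, m: int) -> None:
        self.mask = (1 << m) - 1   # field elements are m-bit vectors
        self.a = self.b = self.out = 0
        self.cycles = 0

    def load_a(self, value: int) -> None:
        self.a = value & self.mask
        self.cycles += 1

    def load_b(self, value: int) -> None:
        self.b = value & self.mask
        self.cycles += 1

    def add(self) -> int:
        # GF(2^m) addition: coefficient-wise addition mod 2, i.e. bitwise XOR
        self.out = self.a ^ self.b
        self.cycles += 1
        return self.out
```

In GF(2^4), adding $x^3 + x + 1$ (`0b1011`) and $x^2 + x$ (`0b0110`) gives $x^3 + x^2 + 1$ (`0b1101`) after three cycles in this model.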

2.5. FPGA-Based Co-Processor 55

Squarers

There are two squarers, which can be used in parallel. Their input and output structure is the same as that of the adders. They are generated by a code generator for each field extension.
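Squaring in a binary field is cheap because it is linear: in characteristic 2 the cross terms cancel, so $a(x)^2 = \sum_i a_i x^{2i}$, followed by reduction modulo the field polynomial $f(x)$. For a fixed $f$ this becomes a fixed XOR network, which is presumably what a per-field code generator unrolls. The sketch below is a generic software version under that assumption; the function names are hypothetical, and the test uses the toy field GF(2^4) with $f(x) = x^4 + x + 1$, not a field from this design.

```python
def spread_bits(a: int) -> int:
    """a(x)^2 before reduction: coefficient a_i moves to position 2i."""
    r = 0
    i = 0
    while a:
        if a & 1:
            r |= 1 << (2 * i)
        a >>= 1
        i += 1
    return r


def reduce_mod(c: int, f: int) -> int:
    """Reduce polynomial c modulo the field polynomial f over GF(2)."""
    m = f.bit_length() - 1                  # degree of f
    while c.bit_length() - 1 >= m:
        c ^= f << (c.bit_length() - 1 - m)  # cancel the current leading term
    return c


def gf2m_square(a: int, f: int) -> int:
    """Square a field element in polynomial basis."""
    return reduce_mod(spread_bits(a), f)
```

With $f = x^4 + x + 1$ (`0b10011`), squaring $x^3$ gives $x^6 \equiv x^3 + x^2$ (`0b1100`).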

Multipliers

As we have already seen, at most two multipliers can be used at the same time in the Montgomery algorithm. The multipliers are the most time- and area-consuming elements of our design. They are LFSR multipliers produced by a code generator, and they are flexible with respect to both the polynomial length and the degree of parallelism. Hence, if more space is available on the target FPGA, the word length of the multipliers can be increased. Note, however, that this structure spends extra clock cycles loading operands from and saving results to the register files, so it remains effective only as long as each multiplication takes several clock cycles.
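The dependence of latency on the degree of parallelism can be illustrated with a software model of an MSB-first digit-serial multiplier, the usual generalization of the bit-serial LFSR multiplier. This is a sketch under assumptions, not the generated hardware: the digit size `d` stands in for the parallelism degree, one digit is assumed to be processed per clock cycle, and all names are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two polynomials given as bit vectors."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_mod(c: int, f: int) -> int:
    """Reduce c modulo the field polynomial f (degree m) over GF(2)."""
    m = f.bit_length() - 1
    while c.bit_length() - 1 >= m:
        c ^= f << (c.bit_length() - 1 - m)
    return c


def digit_serial_mul(a: int, b: int, f: int, d: int) -> tuple[int, int]:
    """MSB-first digit-serial product a*b mod f, processing d bits of b per cycle.

    d = 1 models the classic bit-serial (LFSR-style) multiplier; a larger d
    models a higher degree of parallelism. Returns (product, clock cycles).
    """
    m = f.bit_length() - 1
    n_digits = -(-m // d)                  # ceil(m / d): one digit per cycle
    mask = (1 << d) - 1
    acc = 0
    for j in range(n_digits - 1, -1, -1):  # most significant digit first
        digit = (b >> (j * d)) & mask
        # Horner step: acc <- acc * x^d + a * digit (mod f)
        acc = reduce_mod((acc << d) ^ clmul(a, digit), f)
    return acc, n_digits
```

Only the $\lceil m/d \rceil$ term shrinks as `d` grows, while the cycles for loading and saving operands stay fixed; this is why the structure pays off only while a multiplication still takes several cycles.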

Control module

This is probably the most complicated module in our ECCo. It controls the overall point multiplication and consists of several submodules, so we devote a complete section to it.

2.5.2 Control Module

This part is responsible for performing the Montgomery point multiplication algorithm. The required sequence of point additions and doublings is shown in Algorithm 3, in which $k = \lceil \log_2 n \rceil$. The point additions and doublings can use any coordinate representation, and the y-coordinate needs to be computed only in the last stage. We use the additions and doublings given in Figures 2.18 and 2.19.

This module consists of a state machine that performs Algorithm 3 and communicates with several submodules, as shown in Figure 2.22. These submodules are described as follows:

Algorithm 3 The Montgomery point multiplication algorithm expressed at the point level.

Input: An elliptic curve with a fixed point $Q$ on it, together with the binary representation of the scalar multiplier $m$ as $(m_{k-1} m_{k-2} \ldots m_1 m_0)_2$.

Output: $mQ$

1: $Q_1 \leftarrow Q$, $Q_2 \leftarrow 2Q$
2: for $i$ from $k-2$ downto $0$ do
3:   if $m_i = 1$ then
4:     $Q_1 \leftarrow Q_1 + Q_2$, $Q_2 \leftarrow 2Q_2$
5:   else
6:     $Q_2 \leftarrow Q_1 + Q_2$, $Q_1 \leftarrow 2Q_1$
7:   end if
8: end for
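A minimal sketch of Algorithm 3, generic over the group operation. The callables `add` and `dbl` are placeholders for the point addition and doubling of Figures 2.18 and 2.19. The test uses integer addition as a stand-in group, so $mQ$ is simply $m \cdot Q$. As in Algorithm 3, both assignments in steps 4 and 6 use the old values of $Q_1$ and $Q_2$. The sketch assumes $m \ge 1$ and takes $k$ as the bit length of $m$, so that $m_{k-1} = 1$.

```python
from typing import Callable, TypeVar

P = TypeVar("P")


def montgomery_ladder(m: int, q: P, add: Callable[[P, P], P], dbl: Callable[[P], P]) -> P:
    """Algorithm 3: compute m*Q, scanning m from bit k-2 down to bit 0.

    Assumes m >= 1 and k = m.bit_length(), so the leading bit m_{k-1} is 1.
    """
    if m < 1:
        raise ValueError("m must be positive")
    k = m.bit_length()
    q1, q2 = q, dbl(q)                      # step 1
    for i in range(k - 2, -1, -1):          # step 2
        if (m >> i) & 1:                    # step 3
            q1, q2 = add(q1, q2), dbl(q2)   # step 4 (both use old values)
        else:
            q2, q1 = add(q1, q2), dbl(q1)   # step 6 (both use old values)
    return q1                               # output: mQ
```

Every iteration performs exactly one addition and one doubling, and the invariant $Q_2 = Q_1 + Q$ holds after each step. Because the difference of the two points is always the known point $Q$, the additions can be carried out on x-coordinates alone, with the y-coordinate recovered at the end.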

Figure 2.22: The structure of the control module in ECCo (control module state machine with counter and shift register; Add, Double, and Compute Y state machines; control line multiplexer; address multiplexer fed by hardwired and indirect addresses and driving the DPRAM address; control and address lines).


Counter

The counter in the control module ensures that Algorithm 3 performs exactly $\lceil \log_2 n \rceil - 1$ iterations, where $n$ is the order of the group of points on the elliptic curve.

Shift register

This register is loaded directly from the dual port RAM and holds the scalar multiplier $m$. In each iteration of Algorithm 3 the register is shifted one position to the right, and its LSB serves as the decision criterion for the control module state machine.
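The counter and the shift register together can be sketched as a generator that yields one decision bit per iteration. Algorithm 3 scans $m$ from bit $m_{k-2}$ down to $m_0$, so a register that is shifted right with its LSB as the decision bit must hold these bits in reversed order. The sketch assumes exactly that, and also assumes that $m_{k-1} = 1$ with $k = \lceil \log_2 n \rceil$. The function name and the load step are hypothetical.

```python
def ladder_decisions(m: int, n: int):
    """Yield m_{k-2}, ..., m_0 as a right-shifting register read at its LSB.

    Assumptions (illustrative): k = ceil(log2 n), the leading bit m_{k-1} is 1,
    and the register holds the remaining bits in reversed order.
    """
    k = (n - 1).bit_length()             # ceil(log2 n) for n >= 1
    if m >> (k - 1) != 1:
        raise ValueError("sketch assumes m has exactly k bits with m_{k-1} = 1")
    reg = 0
    for j in range(k - 1):               # load: bit j holds m_{k-2-j}
        reg |= ((m >> (k - 2 - j)) & 1) << j
    counter = k - 1                      # exactly ceil(log2 n) - 1 iterations
    while counter:
        yield reg & 1                    # LSB decides which point is doubled
        reg >>= 1                        # shift right once per iteration
        counter -= 1
```

For a group order $n = 13$ we get $k = 4$, so $m = 11 = (1011)_2$ yields the three decisions $0, 1, 1$.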

Control module state machine

This state machine controls the overall operation of the processor. It starts the other modules, hands control over to them, and waits for them to finish.

Add, Double, and Compute Y

These state machines perform point addition, point doubling, and the computation of the y-coordinate, respectively. The last one is activated only once per point multiplication. Each operation is started by a command from the control module state machine, which at the same time hands control of all processor elements to the corresponding module. When a module has finished, it asserts a signal to the main state machine, which then takes control back by changing the selected address of the control line multiplexer.

Control line multiplexer

There is a single control bus in the processor, consisting of the address lines of the dual port RAM, the commands to the arithmetic units, and their ready status signals.

The control module state machine changes the master of this bus by selecting the corresponding address on this multiplexer.
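The hand-over of the control bus between the main state machine and the Add, Double, and Compute Y state machines can be sketched as follows. This models only the protocol described above. The class and function names are hypothetical, the command strings are placeholders, and the "done" signal is reduced to the sub-FSM returning after its last command.

```python
from enum import Enum


class Master(Enum):
    """Possible masters of the single control bus (names are illustrative)."""
    MAIN = 0
    ADD = 1
    DOUBLE = 2
    COMPUTE_Y = 3


class ControlLineMux:
    """Only the master selected on the mux may drive the control bus."""

    def __init__(self) -> None:
        self.select = Master.MAIN
        self.log: list[tuple[str, str]] = []   # (master, command) trace

    def drive(self, who: Master, command: str) -> None:
        if who is not self.select:
            raise RuntimeError(f"{who.name} is not the current bus master")
        self.log.append((who.name, command))


def run_sub_fsm(mux: ControlLineMux, who: Master, commands: list[str]) -> None:
    """Main FSM starts a sub-FSM, lends it the bus, and takes it back when done."""
    mux.select = who              # hand control of all processor elements over
    for cmd in commands:          # the sub-FSM issues its own command sequence
        mux.drive(who, cmd)
    # The sub-FSM now asserts its done signal; the main FSM reacts by
    # switching the control line mux back to itself.
    mux.select = Master.MAIN
```

One ladder iteration would then be `run_sub_fsm(mux, Master.ADD, [...])` followed by `run_sub_fsm(mux, Master.DOUBLE, [...])`, with the bus returning to `Master.MAIN` in between.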

Address multiplexer

As stated above, there is a single control bus in the processor. The point additions and doublings in Algorithm 3 perform the same operations on different variables: depending on $m_i$, the sum is written to $Q_1$ or $Q_2$, and the other point is doubled. The results must therefore be written back to different addresses.

This is handled by indirect addressing. The control module state machine places the addresses of the arguments and return values on the indirect address inputs of the multiplexer. The module currently controlling the processor can then select these addresses, instead of the hardwired ones, by activating the indirect address line of this multiplexer.
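A sketch of how indirect addressing lets a single Add routine serve both branches of Algorithm 3. All addresses, slot names, and the routine itself are hypothetical, and each point is simplified to one storage slot. String entries in the routine stand for indirect operand slots filled in by the control module state machine, and integers stand for hardwired addresses.

```python
# Hypothetical DPRAM addresses; each point is reduced to one storage slot here.
Q1_ADDR, Q2_ADDR, SCRATCH_ADDR = 0x10, 0x20, 0x30

# The Add routine is written once. String entries are indirect operand slots,
# integers are hardwired addresses.
ADD_PROGRAM = [
    ("load", "IND_A"),         # first argument
    ("load", "IND_B"),         # second argument
    ("store", SCRATCH_ADDR),   # intermediate result at a hardwired address
    ("load", SCRATCH_ADDR),
    ("store", "IND_R"),        # return value
]


def indirect_addresses(bit: int) -> dict[str, int]:
    """Control FSM: choose argument/result addresses from the current bit m_i."""
    if bit:   # step 4: Q1 <- Q1 + Q2, Q2 <- 2 Q2
        return {"IND_A": Q1_ADDR, "IND_B": Q2_ADDR, "IND_R": Q1_ADDR, "IND_D": Q2_ADDR}
    # step 6: Q2 <- Q1 + Q2, Q1 <- 2 Q1
    return {"IND_A": Q1_ADDR, "IND_B": Q2_ADDR, "IND_R": Q2_ADDR, "IND_D": Q1_ADDR}


def resolve(program, table):
    """Address mux: indirect slots come from the table, hardwired ones pass through."""
    return [(op, table[a] if isinstance(a, str) else a) for op, a in program]
```

With $m_i = 1$ the final store of the Add routine resolves to `Q1_ADDR`, and with $m_i = 0$ it resolves to `Q2_ADDR`, while the hardwired scratch address is the same in both cases.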