Background

I have recently been looking into how to introduce the Wishbone bus protocol into my Principles of Computer Composition course, so I took the opportunity to study the protocol itself.

Bus

What is a bus? A bus usually connects the CPU to its peripherals. For better compatibility and reusability, it makes sense to design a unified protocol in which the CPU is the party that initiates requests (the master) and the peripheral is the party that receives them (the slave). Adding a peripheral or replacing the CPU implementation then becomes much simpler and saves a lot of adaptation work.

So, let's think about what a bus protocol needs to include. From the CPU's point of view, a program reads and writes memory, and each read or write requires the following signals to be transferred to memory.

  1. address (addr): for example, a 32-bit processor uses 32-bit addresses; alternatively, the width of the address lines can be derived from the size of the memory
  2. data (w_data and r_data): write data and read data respectively, usually 32 or 64 bits wide, i.e. the amount of data that can be transferred in one clock cycle
  3. write enable (we): high means write, low means read
  4. byte enable (be): one bit per byte of the data bus; for example, to write a single byte, w_data may still be 32 bits wide, but only the byte whose be bit is set is actually written
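The byte-enable rule can be made concrete with a small Python model of how a memory might merge a masked write into an existing 32-bit word. It is a sketch under one assumption not stated above: bit i of be selects byte i (bits 8i+7 to 8i) of the data bus, and the byte being written already sits in that lane of w_data. The function name apply_byte_enable is hypothetical.

```python
def apply_byte_enable(old_word: int, w_data: int, be: int, width_bytes: int = 4) -> int:
    """Merge w_data into old_word, replacing only the bytes whose be bit is 1.

    Assumption: bit i of be controls byte i, i.e. bits [8*i+7 : 8*i] of the word.
    """
    result = old_word
    for i in range(width_bytes):
        if (be >> i) & 1:
            mask = 0xFF << (8 * i)          # select byte lane i
            result = (result & ~mask) | (w_data & mask)
    return result


# Write only byte 1 (bits 15:8) of a 32-bit word; the other three bytes keep their old values.
print(hex(apply_byte_enable(0x11223344, 0xAABBCCDD, 0b0010)))  # 0x1122cc44
```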

In addition to the content of the request, a valid signal indicates that the CPU wants to send a request: high means a request is being sent, low means there is no request. In many cases the peripheral is slow and cannot process a request every cycle, so it provides a ready signal: when valid=1 && ready=1, the request is sent and processed; when valid=1 && ready=0, the peripheral is not ready, so the CPU must keep valid=1 (with the same request) until the peripheral becomes ready, and the request takes effect in the first cycle with valid=1 && ready=1.
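As a sketch of the valid/ready rule, the hypothetical function below steps through clock cycles with a queue of requests on the master side and a given per-cycle ready value on the slave side. It assumes the master always has its next request available and that the slave's ready does not depend on valid; a request stays at the head of the queue, and therefore on the bus, until a cycle where both signals are high.

```python
from collections import deque


def run_handshake(requests, ready_pattern):
    """Cycle-by-cycle model of the valid/ready handshake.

    requests: requests the master wants to issue, in order.
    ready_pattern: ready value (0 or 1) the slave drives in each cycle.
    Returns a list of (cycle, request) pairs, one per completed transfer.
    """
    pending = deque(requests)
    completed = []
    for cycle, ready in enumerate(ready_pattern):
        valid = 1 if pending else 0          # master asserts valid while it has work
        # With valid=1 and ready=0 nothing is popped, so the same request is held.
        if valid and ready:
            completed.append((cycle, pending.popleft()))
    return completed


print(run_handshake(["write 0x01", "read 0x02"], [1, 0, 1]))
# [(0, 'write 0x01'), (2, 'read 0x02')]
```

The read is presented in cycle 1 but only completes in cycle 2, once ready goes high.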

To summarize these requirements, we can list the signals on the master side and on the slave side. Using the suffix _o for outputs and _i for inputs, the master side (CPU side) has the following signals.

  1. clock_i: clock input
  2. valid_o: high means the master wants to send a request
  3. ready_i: high means the slave is ready to process the request
  4. addr_o: the address the master wants to read from or write to
  5. we_o: whether the master wants to read (0) or write (1)
  6. data_o: the data the master wants to write
  7. be_o: the master's read/write byte enable, used to implement single-byte writes, etc.
  8. data_i: the data the slave returns to the master on a read

Apart from the clock, which is an input on both sides, every signal simply flips direction, which gives the signals on the slave side (peripheral side).

  1. clock_i: clock input
  2. valid_i: high means the master wants to send a request
  3. ready_o: high means the slave is ready to process the request
  4. addr_i: the address the master wants to read from or write to
  5. we_i: whether the master wants to read (0) or write (1)
  6. data_i: the data the master wants to write
  7. be_i: the master's read/write byte enable, used to implement single-byte writes, etc.
  8. data_o: the data the slave returns to the master on a read
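Because the slave side is just the master side with every non-clock direction flipped, the mapping can be written down mechanically. This small Python helper (the names are illustrative, not from any library) derives the slave-side names from the master-side list.

```python
MASTER_SIGNALS = ["clock_i", "valid_o", "ready_i", "addr_o",
                  "we_o", "data_o", "be_o", "data_i"]


def slave_side(name: str) -> str:
    """Return the slave-side name of a master-side signal.

    The clock is an input on both sides; every other signal flips direction.
    """
    if name == "clock_i":
        return name
    base, direction = name.rsplit("_", 1)     # e.g. "addr_o" -> ("addr", "o")
    return f"{base}_{'i' if direction == 'o' else 'o'}"


print([slave_side(s) for s in MASTER_SIGNALS])
# ['clock_i', 'valid_i', 'ready_o', 'addr_i', 'we_i', 'data_i', 'be_i', 'data_o']
```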

Using the custom bus designed above, we can draw the following waveform (showing the master's signals).

[Figure: waveform of the custom bus, master side, cycles a to h]

  • cycle a: valid_o=1 && ready_i=1, so a request takes place; we_o=1 means it is a write, the write address is addr_o=0x01 and the data written is data_o=0x12
  • cycle b: valid_o=0 && ready_i=0, so nothing happens
  • cycle c: valid_o=1 && ready_i=0: the master wants to read (we_o=0) from address 0x02 (addr_o=0x02), but the slave does not accept the request (ready_i=0)
  • cycle d: valid_o=1 && ready_i=1, so the request takes place: the master reads (we_o=0) from address 0x02 (addr_o=0x02), and the data read is 0x34 (data_i=0x34)
  • cycle e: valid_o=0 && ready_i=0, so nothing happens
  • cycle f: valid_o=1 && ready_i=1: the master writes (we_o=1) to address 0x03 (addr_o=0x03), and the data written is 0x56 (data_o=0x56)
  • cycle g: valid_o=1 && ready_i=1: the master reads (we_o=0) from address 0x01 (addr_o=0x01), and the data read is 0x12 (data_i=0x12), the value written in cycle a
  • cycle h: valid_o=1 && ready_i=1: the master writes (we_o=1) to address 0x02 (addr_o=0x02), and the data written is 0x9a (data_o=0x9a)
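To check this reading of the waveform, cycles a to h can be replayed against a simple memory-backed slave in Python. This is an illustrative model, not a hardware description: idle-cycle values that the waveform leaves undefined are filled with 0 or None, and the initial value 0x34 at address 0x02 is an assumption, since the waveform does not show how it got there.

```python
# One entry per cycle a..h, seen from the master side:
# (cycle, valid_o, ready_i, we_o, addr_o, data_o); data_o is None for reads and idle cycles.
TRACE = [
    ("a", 1, 1, 1, 0x01, 0x12),
    ("b", 0, 0, 0, None, None),
    ("c", 1, 0, 0, 0x02, None),
    ("d", 1, 1, 0, 0x02, None),
    ("e", 0, 0, 0, None, None),
    ("f", 1, 1, 1, 0x03, 0x56),
    ("g", 1, 1, 0, 0x01, None),
    ("h", 1, 1, 1, 0x02, 0x9A),
]


def replay(trace, memory):
    """Apply every completed transfer to memory; return {cycle: data read on data_i}."""
    reads = {}
    for name, valid, ready, we, addr, data_o in trace:
        if not (valid and ready):
            continue                     # no transfer this cycle; bus contents are ignored
        if we:
            memory[addr] = data_o        # write
        else:
            reads[name] = memory[addr]   # read: the slave drives this value on data_i
    return reads


mem = {0x02: 0x34}  # assumption: address 0x02 already holds 0x34
print({k: hex(v) for k, v in replay(TRACE, mem).items()})
# {'d': '0x34', 'g': '0x12'}
```

Cycle c is skipped because ready_i=0, and the read in cycle g sees the value written in cycle a.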

From the waveform above, several observations can be made.

  1. When the master wants to initiate a request, it sets valid_o=1; when the slave can accept a request, it drives ready high (ready_i=1 on the master side); each cycle with valid_o=1 && ready_i=1 completes one request
  2. If the master initiates a request that the slave cannot accept yet, i.e. valid_o=1 && ready_i=0, the master must keep valid_o=1 and hold addr_o, we_o, data_o and be_o unchanged until the request completes
  3. When the master does not initiate a request, i.e. valid_o=0, the signals on the bus are invalid and should not be processed; for reads, the data on data_i is valid only when valid_o=1 && ready_i=1
  4. Requests can complete in consecutive cycles, i.e. valid_o=1 && ready_i=1 holds for several cycles in a row (cycles f, g and h above); this is the ideal case and achieves the highest transfer rate of the bus
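Observation 2 lends itself to a mechanical check. The hypothetical Python function below scans a per-cycle trace (dicts with keys valid, ready, addr, we, data, be, a format chosen for this sketch) and reports every cycle that follows a stalled request without keeping valid high and the request fields unchanged.

```python
def check_hold_while_stalled(trace):
    """Return the indices of cycles that break rule 2.

    After a cycle with valid=1 and ready=0, the next cycle must keep valid=1
    and present the same addr, we, data and be.
    """
    fields = ("addr", "we", "data", "be")
    violations = []
    for i in range(1, len(trace)):
        prev, cur = trace[i - 1], trace[i]
        if prev["valid"] and not prev["ready"]:          # previous request was stalled
            if not cur["valid"] or any(cur[f] != prev[f] for f in fields):
                violations.append(i)
    return violations


ok = [
    dict(valid=1, ready=0, addr=0x02, we=0, data=0, be=0xF),
    dict(valid=1, ready=1, addr=0x02, we=0, data=0, be=0xF),
]
bad = [
    dict(valid=1, ready=0, addr=0x02, we=0, data=0, be=0xF),
    dict(valid=1, ready=1, addr=0x03, we=0, data=0, be=0xF),  # address changed while stalled
]
print(check_hold_while_stalled(ok), check_hold_while_stalled(bad))  # [] [1]
```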

Wishbone Classic Standard

First, let's look at the simplest version of Wishbone, Wishbone Classic Standard. It is very similar to the custom bus above; here are its signals on the master side (CPU side).

  1. CLK_I: clock input, i.e. clock_i in the custom bus
  2. STB_O: high means the master wants to send a request, i.e. valid_o in the custom bus
  3. ACK_I: high means the slave acknowledges (completes) the request, i.e. ready_i in the custom bus
  4. ADR_O: the address the master wants to read from or write to, i.e. addr_o in the custom bus
  5. WE_O: whether the master wants to read or write, i.e. we_o in the custom bus
  6. DAT_O: the data the master wants to write, i.e. data_o in the custom bus
  7. SEL_O: the master's byte select (byte enable), i.e. be_o in the custom bus
  8. DAT_I: the data the master reads from the slave, i.e. data_i in the custom bus
  9. CYC_O: bus cycle signal, high while the master is using the bus; it has no counterpart in the custom bus

There are also some optional signals, which I won't go into here. As you can see, apart from the last one, CYC_O, these signals are exactly those of the custom bus we just designed. CYC_O can be thought of as the master claiming the slave's bus interface; in common usage it is simply tied to STB_O (CYC_O=STB_O). It serves the following purposes:

  1. to claim the slave's bus interface so that other masters cannot access it
  2. to simplify the implementation of the interconnect
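Since the two protocols differ mainly in names, a custom-bus trace can be translated signal by signal. The sketch below assumes the simple case described above, CYC_O=STB_O; the dictionary and function names are illustrative. Note that in Wishbone a slave should only assert ACK in response to an active strobe, so the renaming is faithful for traces like the one above, where ready is low in idle cycles.

```python
# Custom-bus name -> Wishbone Classic Standard name (master side).
CUSTOM_TO_WB = {
    "clock_i": "CLK_I",
    "valid_o": "STB_O",
    "ready_i": "ACK_I",
    "addr_o": "ADR_O",
    "we_o": "WE_O",
    "data_o": "DAT_O",
    "be_o": "SEL_O",
    "data_i": "DAT_I",
}


def to_wishbone(cycle: dict) -> dict:
    """Rename one cycle's custom-bus signals and derive CYC_O.

    Uses the common simplification CYC_O = STB_O (the bus is not held between requests).
    """
    wb = {CUSTOM_TO_WB[name]: value for name, value in cycle.items()}
    wb["CYC_O"] = wb["STB_O"]
    return wb


# Cycle a of the custom-bus waveform: write 0x12 to address 0x01.
print(to_wishbone({"valid_o": 1, "ready_i": 1, "we_o": 1, "addr_o": 0x01, "data_o": 0x12}))
```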

Redrawing the waveform of the custom bus above with Wishbone Classic Standard signals gives the following.

[Figure: the same waveform drawn with Wishbone Classic Standard signals]

Wishbone Classic Pipelined

The Wishbone Classic Standard protocol above is very simple, but it runs into a problem. Suppose the slave is an SRAM controller with one cycle of read latency, i.e. the address is given in one cycle and the result is not available until the next cycle. With Wishbone Classic Standard, the following waveform appears.

[Figure: Wishbone Classic Standard reads from an SRAM with one cycle of read latency, cycles a to d]

  • cycle a: the master presents read address 0x01 and the SRAM controller starts reading, but the data is not back yet, so ACK_I=0
  • cycle b: the SRAM finishes reading, puts the read data 0x12 on DAT_I and sets ACK_I=1
  • cycle c: the master presents the next read address 0x02, and the SRAM has to start reading again
  • cycle d: the SRAM finishes the second read, puts the read data 0x34 on DAT_I and sets ACK_I=1

Functionally there is nothing wrong with this waveform, but only one read completes every two cycles, which falls short of the best achievable performance. How can this be solved? The first address is given in cycle a and the first data arrives in cycle b; if the second address could also be given in cycle b, the second data would arrive in cycle c. In this pipelined fashion, one read could complete every cycle. However, in Wishbone Classic Standard the first request does not finish until its ACK_I in cycle b, so the master cannot present the second address in that cycle. We therefore need to modify the protocol to support pipelined requests.
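The two-cycles-per-read behaviour can be reproduced with a small cycle-by-cycle Python model. It assumes an SRAM slave that latches ADR_O in the first cycle it sees a request and raises ACK_I one cycle later with the data, and a master that only moves on to the next address after ACK_I; the function name is hypothetical.

```python
def classic_standard_reads(addresses, sram):
    """Model Wishbone Classic Standard reads from an SRAM with one cycle of read latency.

    Returns (data_read, cycles_used).
    """
    data = []
    i = 0              # master: index of the address currently on ADR_O
    reading = False    # slave: SRAM read started in the previous cycle
    cycles = 0
    while i < len(addresses):
        # Master keeps CYC_O=STB_O=1 and ADR_O=addresses[i] until it sees ACK_I.
        ack = reading  # slave: ACK_I only once the SRAM data is ready
        if ack:
            data.append(sram[addresses[i]])  # DAT_I is valid in the ACK cycle
            i += 1                           # request done; next address appears next cycle
            reading = False
        else:
            reading = True                   # SRAM latches ADR_O this cycle
        cycles += 1
    return data, cycles


sram = {0x01: 0x12, 0x02: 0x34}
print(classic_standard_reads([0x01, 0x02], sram))  # ([18, 52], 4)
```

For n reads this model takes 2n cycles, matching cycles a to d above.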

The idea is simple: since Wishbone Classic Standard considers the first request still pending in cycle b, let the first request complete earlier, in cycle a, even though its data does not arrive until cycle b. In effect, a read is split into two parts: first, the master sends the read request to the slave, which completes in cycle a; then the slave sends the read result back to the master, which completes in cycle b. To achieve this, we make the following changes.

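  • Add a STALL_I signal: CYC_O=1 && STB_O=1 && STALL_I=0 means the slave accepts a request in this cycle
  • Change the meaning of ACK_I: CYC_O=1 && ACK_I=1 means one response to a previously accepted request (for a read, DAT_I carries the data); STB_O no longer needs to be high, as cycle c in the waveform below shows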
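These two conditions can be tracked as a running count of outstanding requests. The checker below is a sketch under assumptions consistent with this article: a request is accepted when CYC_O=1 && STB_O=1 && STALL_I=0, a response arrives when CYC_O=1 && ACK_I=1 (STB_O may already be low), and CYC_O should only drop once every accepted request has been answered. The trace format and function name are illustrative.

```python
def check_pipelined_trace(trace):
    """Count accepted requests and responses in a Classic Pipelined trace.

    trace: one dict per cycle with keys CYC_O, STB_O, STALL_I, ACK_I (0 or 1).
    Returns (requests, responses); raises ValueError on an inconsistent trace.
    """
    outstanding = requests = responses = 0
    for n, c in enumerate(trace):
        if not c["CYC_O"]:
            if outstanding:
                raise ValueError(f"cycle {n}: CYC_O dropped with {outstanding} request(s) outstanding")
            continue
        if c["STB_O"] and not c["STALL_I"]:   # slave accepts a request this cycle
            outstanding += 1
            requests += 1
        if c["ACK_I"]:                        # one response to an accepted request
            if outstanding == 0:
                raise ValueError(f"cycle {n}: ACK_I without an accepted request")
            outstanding -= 1
            responses += 1
    if outstanding:
        raise ValueError("trace ended with requests still outstanding")
    return requests, responses


# Cycles a to d of the two-read waveform.
WAVEFORM = [
    dict(CYC_O=1, STB_O=1, STALL_I=0, ACK_I=0),  # a: read 0x01 accepted
    dict(CYC_O=1, STB_O=1, STALL_I=0, ACK_I=1),  # b: data 0x12 returned, read 0x02 accepted
    dict(CYC_O=1, STB_O=0, STALL_I=0, ACK_I=1),  # c: data 0x34 returned
    dict(CYC_O=0, STB_O=0, STALL_I=0, ACK_I=0),  # d: bus released
]
print(check_pipelined_trace(WAVEFORM))  # (2, 2)
```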

With the above modifications, we have the Wishbone Classic Pipelined bus protocol. The waveforms of the two consecutive read operations above are shown below.

[Figure: Wishbone Classic Pipelined, two consecutive reads, cycles a to d]

  • cycle a: the master requests a read from address 0x01, and the slave accepts the request (STALL_I=0)
  • cycle b: the slave returns the result of the first read, 0x12, and sets ACK_I=1; at the same time the master requests a read from address 0x02, and the slave accepts it (STALL_I=0)
  • cycle c: the slave returns the result of the second read, 0x34, and sets ACK_I=1; the master issues no further request and sets STB_O=0
  • cycle d: all requests have completed, so the master sets CYC_O=0

In this way, the slave completes one read per cycle after an initial cycle of latency, twice the throughput of the Classic Standard version.
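The pipelined behaviour can be compared against the Classic Standard case with a matching cycle-by-cycle Python model. This sketch assumes the same SRAM with one cycle of read latency, a slave that never stalls (STALL_I=0), and a master that keeps STB_O high until it has issued every address; the function name is hypothetical.

```python
def classic_pipelined_reads(addresses, sram):
    """Model Wishbone Classic Pipelined reads from a never-stalling SRAM slave
    with one cycle of read latency. Returns (data_read, cycles_with_CYC_high)."""
    data = []
    i = 0              # master: index of the next address to issue
    pending = None     # slave: address accepted last cycle, answered this cycle
    cycles = 0
    while i < len(addresses) or pending is not None:
        # Slave: answer the read accepted in the previous cycle (ACK_I=1, DAT_I valid).
        if pending is not None:
            data.append(sram[pending])
            pending = None
        # Master: STB_O=1 while addresses remain; STALL_I=0, so each is accepted at once.
        if i < len(addresses):
            pending = addresses[i]
            i += 1
        cycles += 1
    return data, cycles


sram = {0x01: 0x12, 0x02: 0x34}
print(classic_pipelined_reads([0x01, 0x02], sram))  # ([18, 52], 3)
```

Two reads finish in three cycles with CYC_O high (cycles a to c), and n reads take n+1 cycles, versus 2n with the Classic Standard handshake.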