ELEC 350 EZ-KIT LITE TUTORIAL
Rev.1.1 - Raymond Li - Sep.96
===========================================================

Introduction
============

The purpose of this tutorial is to supplement the documentation supplied
with the EZ-KIT Lite. It is meant to give you a fundamental understanding
of the ADSP-2181 and AD1847 codec combination so that you can quickly
implement DSP algorithms.

The reader should have at least a copy of the EZ-KIT Lite Reference Manual.
Optional in-depth documentation includes:

Data Sheets Packaged with EZ-KIT Lite:
   ADSP-2100 Family DSP Microcomputers ADSP-21xx
   DSP Microcomputer ADSP-2181
   Serial-Port 16-Bit SoundPort Stereo Codec AD1847

PDF (Portable Document Format) Documentation from Analog Devices Web Site
(www.analog.com):
   ADSP 2100 Family User's Manual
   DSP Applications Using the ADSP-2100 Family, Vol. 1
   DSP Applications Using the ADSP-2100 Family, Vol. 2

Rutgers University DSP Course Lab Manual by [prof name] [url]

Bound Documentation from Analog Devices
   ADSP-2100 Family Assembler Tools & Simulator Manual
   ADSP-2181 User's Manual

Template and Batch Files by Author
   ez_shell.dsp : template for DSP coding
   ez_init.dsp  : initialization module
   ez_core.dsp  : DSP algorithm module
   ez_end.dsp   : wrapup module
   eza.bat      : quick assembly and link
   ezl.bat      : quick upload to kit
   ezs.bat      : quick simulation

I recommend that you complete these preliminaries before attempting to
program the DSP:

1.
Read the EZ-KIT Lite Reference Manual

   Chp1 : o all
   Chp2 : o skim
   Chp3 : o note EZ-KIT Lite board components
   Chp4 : o note subdirectories containing sample DSP programs
          o note JP2 is used to select MIC or LINE level input
   Chp5 : o note the design procedure
          o gives only a brief assembly language introduction
            (this tutorial addresses that need without your having to
            read the 400+ page User's Manual)
          o note sample code listing (compare with ez_shell.dsp by author)
          o note assembler, linker, and simulator calling
            (see batch files by author)
   Chp6 : o try each demonstration
          o note that the simulator does not run in a DOS box under
            Win 3.x or Win95; simpler to use DOS loader (i.e. ezfast.com)
   Chp7 : o note program and data memory restrictions
   Chp8 : o all
   PQR  : o note these sections
               Development Software Invocation Commands
               Instruction Set Summary
                  ALU
                  MAC
                  Shifter
                  Data Move
                  Program Flow
               Control/Status Registers
               Memory Maps
               Interrupt Vector Tables

2. Go through this tutorial

3. Skim through sample code listings
   o demo programs installed with the EZ-KIT Lite software
   o ez_shell.dsp by author

The tutorial is divided into 3 sections: [to complete]

   Section A: Description of DSP
      A.1 Registers
      A.2 Computational Units
      A.3 Numeric Format
      A.4 Memory
      A.5 Program Control
      A.6 Data Transfer
      A.7 Multifunction Instructions
   Section B: Programming DSP
   Section C: Notes and Hints

In order to program the DSP you'll need to be comfortable with assembly
language. In other words, you should be familiar with these concepts:

   o binary and hexadecimal to decimal conversion
   o instruction set (bit, logic, and arithmetic functions)
   o memory addressing modes
   o data & program memory management
   o registers (control, data, and status)
   o program counter
   o stack, stack pointer
   o interrupts, interrupt vector table, interrupt service routines
   o accumulator

First, you'll be introduced to the assembly language characteristics
particular to the ADSP-2181.
You'll quickly notice that its syntax and architecture are quite different
from 68k, HC11, 80x86, and other microprocessors/controllers. The ADSP-2181
syntax is algebraic-like and, with references at hand, is not too hard to
understand.

Next, many simple example code fragments will be given to illustrate the
syntax of the most common instructions. After a few examples, you'll see
the pattern of the instruction set and be able to sequence the instructions
into your algorithm.

Specifically, code syntax comparisons will be made between ADSP-2181 and
MATLAB. MATLAB's scripting language is high-level enough for anyone with
some programming experience to understand. As well, MATLAB is perfectly
suitable for implementing DSP algorithms and will in fact be an aid when
debugging your algorithm. The code syntax comparisons will include:

   o variable declaration (and restrictions)
   o data transfer & assignment
   o arithmetic
   o flow control statements
   o loops

Hopefully this tutorial will be sufficient to get you started in DSP
programming. Any corrections/suggestions are welcome.

Raymond Li
UVic EE Comm
r.li@ieee.ca


Section A: Description of DSP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Registers: Used to Hold Data Values
===================================

Registers are groups of flip-flops used for memory storage.
Here is a list of the available registers on the ADSP-2181 (detailed
explanations follow):

ADSP-Register           Register Name(s)
----------------------------------------------------------------------------
ax0, ax1, ay0, ay1      ALU Inputs
ar                      ALU Result
af                      ALU Feedback
mx0, mx1, my0, my1      Multiplier Inputs
mr0, mr1, mr2           Multiplier Results (3 parts)
mf                      Multiplier Feedback
si                      Shifter Input
se                      Shifter Exponent
sr0, sr1                Shifter Result (2 parts)
sb                      Shifter Block (for block floating-point format)
px                      PMD-DMD Bus Exchange
i0-i7                   DAG Index Registers
m0-m7                   DAG Modify Registers
l0-l7                   DAG Length Registers (for circular buffers)
pc                      Program Counter
cntr                    Counter for Loops
astat                   Arithmetic Status
mstat                   Mode Status
sstat                   Stack Status
imask                   Interrupt Mask
icntl                   Interrupt Control Modes
rx0, rx1                Receive Data Registers (not on ADSP-2100)
tx0, tx1                Transmit Data Registers (not on ADSP-2100)

The ADSP-2181 has quite a few registers, which is typical for RISC-type
CPUs. This is a common tradeoff for the increased computational speed.
You'll find that a large number of registers is needed because each
instruction is restricted in the operands (registers) it can use.

Register Wordlength   [complete this]
-------------------

Along with the types of registers available, you'll need to know their
lengths:

   16 bits: mx0, mx1, my0, my1, mr0, mr1, ax0, ax1, ay0, ay1, ar,
            sr0, sr1, si
   40 bits: mr (consists of mr2, mr1, mr0)
   32 bits: sr (consists of sr1, sr0)
    8 bits: mr2, se

Reserved Registers
------------------

The following is very important. Since the ADSP-2181 is interfaced to the
AD1847 codec, some resources are consumed. These include the 2 serial
ports and these registers:

   i0, l0, i1, l1

This is because these registers are used during the initialization code
for the EZ-KIT Lite:

   i0 = ^rx_buf;        {point to start of buffer}
   l0 = %rx_buf;        {initialize length register}
   i1 = ^tx_buf;
   l1 = %tx_buf;
   i3 = ^init_cmds;
   l3 = %init_cmds;

These lines are in the example code listing on page 5-7 of the Reference
Manual.
Notice that i3 and l3 are available after the initialization, while the
others are used (needed) throughout the program. (Note the algebraic-like
syntax, semicolon delimiters, and braces... more on syntax later.)

Computational Units: Gives Computational Functionality
======================================================

There are 3 computational units in the ADSP-2181. These 3 units form the
basis of the instruction set and allow you to perform a variety of
arithmetic, logical, and bit manipulation functions. Note that most
instructions operate with two operands, xop and yop, and that there are
unique restrictions on what these operands can be. Check what the
permissible xops and yops are (i.e. which registers) in the Reference
Manual.

The breakdown of the functions for each unit follows. Do become familiar
with the instruction set summary in the Appendix of the Reference Manual,
as you will very often need to refer to it (pgs 10-17).

ALU (Arithmetic Logic Unit)
   Add / Add with carry
   Subtract X-Y / Subtract X-Y with borrow
   Subtract Y-X / Subtract Y-X with borrow
   AND, OR, XOR
   PASS, CLEAR
   Negate
   NOT
   Absolute Value
   Increment
   Decrement
   Divide
   Bit Operations

MAC
   Multiply
   Multiply / Accumulate
   Multiply / Subtract
   Transfer MR
   Clear
   Conditional MR Saturation

Shifter
   Arithmetic Shift
   Logical Shift
   Normalize
   Derive Exponent
   Block Exponent Adjust
   Arithmetic Shift Immediate
   Logical Shift Immediate

Program Control: Loops, Branching, and Subroutines
==================================================

The other aspect of algorithm design is the ability to control program
flow.
The ADSP-2181 control flow instructions include the following (pg 20):

   Do Until
   Jump
   Call Subroutine
   Jump/Call on Flag In Pin
   Modify Flag Out Pin
   Return from Subroutine
   Return from Interrupt Service Routine
   Idle

Data Transfer: Program and Data Management
==========================================

One of the distinctions between high-level and assembly language
programming is the level of detail needed for data transfers in assembly.
Instead of data types, you now deal with addressing modes, allocation, and
transfers. Here are the kinds of data transfers you can do with the
ADSP-2181 (pg 18):

   Register-to-Register Move
   Load Register Immediate
   Read Overlay Register
   Write Overlay Register
   Data Memory Read (Direct Address)
   I/O Read (Direct Address)
   Data Memory Read (Indirect Address)
   Program Memory Read (Indirect Address)
   Data Memory Write (Direct Address)
   Writes Contents of Overlay Registers to Data Memory
   I/O Write (Direct Address)
   Data Memory Write (Indirect Address)
   Program Memory Write (Indirect Address)

Multifunction Instructions
==========================

This will be new to people without previous DSP programming experience.
DSPs are distinguished from normal CPUs by their speed and their ability
to execute MAC (multiply and accumulate) instructions. As it turns out,
this instruction is used very often in DSP algorithms. To increase
processing speed, the ADSP-2181 has multifunction instructions. Since
almost all instructions execute in 1 clock cycle (30ns), executing several
functions in a single instruction greatly increases the computation speed
of your DSP algorithm.
These instructions include (pg 19):

   Computation with Register-to-Register Move
   Computation with Memory Read
   Computation with Memory Write
   Data & Program Memory Read
   ALU/MAC with Data & Program Memory Read

Miscellaneous Instructions
==========================

For completeness, here are the miscellaneous instructions (pg 21):

   NOP
   Modify Address Register
   Stack Control
   Mode Control
   Put Processor in Idle State
   Put Processor in Idle State and Slow Clock by a Factor of n

Numeric Formats
===============

The ADSP-2181 is a 16-bit fixed-point DSP microprocessor, which means it
has 16 bits of precision to represent numeric values. There are two
general classes of DSPs: fixed-point or integer DSPs and floating-point
DSPs. Fixed-point/integer DSPs are often cheaper, faster, and consume less
power, whereas floating-point DSPs are simpler to program.

Floating-point DSPs differ from fixed-point/integer DSPs in that they use
an exponent to indicate the radix point (decimal point) of the numeric
value. You can see the benefits of floating-point representation from
using exponential or scientific notation - you get more dynamic range by
not being limited by the position of the radix point.

For example: if you only had 4 decimal digits, the smallest positive
non-zero number you can represent is 0.0001 in fixed-point fractional
format or 1 in integer format. In floating-point format with a 3 digit
exponent, it would be 1E-999. Floating point allows you to omit the
non-significant digits to increase the precision of the numeric value.

Since numbers are represented in bits and not in digits in the DSP, each
bit is weighted according to its position from the radix point.
For example, 101.1010 in binary is 5.625 in decimal:

   101.1010b = 1*2^2 + 0*2^1 + 1*2^0 + 1*2^-1 + 0*2^-2 + 1*2^-3 + 0*2^-4
             = 4 + 0 + 1 + 0.5 + 0 + 0.125 + 0
             = 5.625d

With the 16-bit ADSP-2181, you can have 16 different fixed-point formats
by varying the position of the radix point: 1.15, 2.14, 3.13, 4.12, 5.11,
6.10, 7.9, 8.8, 9.7, 10.6, 11.5, 12.4, 13.3, 14.2, 15.1, and 16.0. The
above example is in 3.4 format, where 3 is the number of integer bits and
4 is the number of fractional bits.

You can calculate a table showing the largest positive, largest negative,
and LSB values of the 16 formats. For example:

   Format   Largest Positive        Largest Negative   LSB Value
   1.15         0.999969482421875       -1             0.000030517578125
   16.0     32767.000000000000000   -32768             1.000000000000000

In general, the range of an a.b fractional format number is:

   -2^(a-1) <= x <= 2^(a-1) - 2^(-b)

   where a is the number of integer bits
         b is the number of fractional bits
         x is the numeric value

In binary multiplication, the product of two 16-bit numbers is a 32-bit
number. More specifically, M.N multiplied by P.Q gives a (M+P).(N+Q)
format number. For example, the product of two 13.3 numbers is a 26.6
number and the product of two 1.15 numbers is a 2.30 number.

The product of two twos-complement numbers has 2 sign bits. They are
identical, so one of them is redundant. (Remember twos-complement
representation? Ones-complement is bit inversion; twos-complement is bit
inversion followed by adding 1.) Since one bit is redundant, you can left
shift the product by one bit. Additionally, if one of the inputs was a
1.15 number, the left shift causes the result to have the same format as
the other input (with 16 bits of additional precision). For example,
multiplying a 1.15 number by a 5.11 number yields a 6.26 number. When
shifted left one bit, the result is a 5.27 number, or a 5.11 number plus
16 LSBs. [REF]

The ADSP-2181 has two modes: fractional and integer.
In fractional mode, which is the default on reset, the multiplier result
is always shifted left one bit before being written to the result
register. For two 1.15 inputs, the left shift makes the multiplier result
a 1.31 number, which can be rounded to 1.15. As a result, the 1.15 format
is the most convenient to use. In integer mode, the left shift does not
occur. The choice of mode is controlled by a bit in the MSTAT register.

1.15 numbers are conveniently represented in hex notation. For example:

   1.15 Number   Decimal Equivalent
   0x0001         0.000030517578125
   0x7FFF         0.999969482421875
   0xFFFF        -0.000030517578125
   0x8000        -1.000000000000000

Cycle Times
===========

The ADSP-2181 normally executes all instructions in one cycle (30ns). If
an instruction causes a data fetch from program memory, an extra cycle is
required since the processor cannot pre-fetch the next instruction in the
same cycle. This overhead cycle usually occurs inside loops and only once.

Available Memory
================

The ADSP-2181 has 80K bytes of on-chip memory. 16K words (24 bits wide)
are allocated as program RAM and 16K words (16 bits wide) are allocated
for data memory as shown:

   Total Memory:   80K bytes = 80 * 1024 * 8  = 655,360 bits
   Program Memory: 16K * 24  = 16 * 1024 * 24 = 393,216 bits
   Data Memory:    16K * 16  = 16 * 1024 * 16 = 262,144 bits

   393,216 + 262,144 = 655,360 bits = 80K bytes

This is a significant amount of RAM compared to other DSPs from TI and
Motorola.

The memory maps are on page 30. However, they don't show the memory taken
up by the Monitor Program. The monitor program is essentially the OS for
the EZ-KIT Lite and is loaded from EPROM to its RAM locations on reset.
Note the memory restrictions on pg 7-1 and the revised memory map below:

Data Memory Map with Monitor (Words are 16 bits wide)
-----------------------------------------------------

        Data Memory        Address
   +---------------------+--------+
   | 32 Memory Mapped    | 0x3FFF |
   | Registers           | 0x3FE0 |
   +---------------------+--------+
   | 480 Monitor         | 0x3FDF |
   | Operating Variables | 0x3E00 |
   +---------------------+--------+
   | 7680 Available      | 0x3DFF |
   | Internal Words      | 0x2000 |
   +---------------------+--------+
   |                     | 0x1FFF |
   |                     |        |
   |                     |        |
   | 8K Available        |        |
   | Internal Words      |        |
   |                     |        |
   |                     |        |
   |                     |        |
   |                     | 0x0000 |
   +---------------------+--------+

As you can see, this leaves 15,872 data words available.

Program Memory Map with Monitor (Words are 24 bits wide)
--------------------------------------------------------

       Program Memory      Address
   +---------------------+--------+
   | 2048 Monitor        | 0x3FFF |
   | Program Words       | 0x3800 |
   +---------------------+--------+
   |                     | 0x37FF |
   |                     |        |
   | 6144 Available      |        |
   | Internal Words      |        |
   |                     |        |
   |                     | 0x2000 |
   +---------------------+--------+
   |                     | 0x1FFF |
   |                     |        |
   |                     |        |
   | 8K Available        |        |
   | Internal Words      |        |
   |                     |        |
   |                     |        |
   |                     |        |
   |                     | 0x0000 |
   +---------------------+--------+

The Monitor takes up 2K words of memory, leaving 14K or 14,336 words
available.

This is plenty of memory for most DSP algorithms, but you will need to
keep the restrictions in mind when allocating large data buffers.

Shell Program
=============

The ADSP-2181 and AD1847 interface requires a fair amount of setup, as
shown in the example listing starting on pg 5-5. I've merged several
versions of this "template" program into a listing called "ez_shell.dsp"
which comes with this tutorial. Do read the additional comments to gain a
better understanding of how it works and of the syntax of the
instructions.

Batch Files
===========

The development software executes under DOS with command line options.
To simplify the process, 3 batch files were written to speed up the
iterative process:

   eza.bat : quick assembly and link
   ezl.bat : quick upload to kit
   ezs.bat : quick simulation

Take a look at the invocation of the programs in the batch files and
modify them to suit your needs.

Other Resources
===============

Further to the resources listed in the Introduction, here are some
additional resources if you are especially keen on DSPing: [complete]

   DSP FAQ
   TI & Motorola web sites
   DSPNet
   newsgroup

Code & S/W Update
=================

An update for the EZ-KIT Lite software is available on Analog Devices'
web site. You can find it under [directions]

Instruction Set Syntax
======================

   o variable declaration (and restrictions)
   o data transfer & assignment
   o arithmetic
   o flow control statements
   o loops

{=========================================================================}
{=========================================================================}

Instruction Set Overview
------------------------

MAC Registers
-------------
mx0, mx1, my0, my1, mr, mr0, mr1, mr2

Where:
   mr0   16 LSB bits
   mr1   16 MSB bits
   mr2   overflow bits

MAC Instructions
----------------
mr = 0;
mr = xop * yop (ss);             (ss) 1.15 signed numbers
mr = xop * yop (rnd);            (rnd) round 32 bit result into
                                 16 MSB in mr1
mr = mr + xop * yop (ss);
mr = mr + xop * yop (rnd);
mr = mr - xop * yop (ss);
mr = mr - xop * yop (rnd);
if mv sat mr;                    saturates mr to its largest (positive
                                 or negative) value whenever the
                                 overflow flag mv is raised

Where:
   xop: mx0, mx1, mr0, mr1, mr2, ar, sr0, sr1
   yop: my0, my1

ALU Registers
-------------
ax0, ax1, ay0, ay1, ar

ALU Instructions
----------------
ar = xop + yop;
ar = xop - yop;
ar = yop - xop;

Where:
   xop: ax0, ax1, ar, mr0, mr1, mr2, sr0, sr1
   yop: ay0, ay1

Shifter Registers
-----------------
sr, sr0, sr1, si, se

Shifter Instructions
--------------------
sr = ashift xop by exp (hi);     scale xop by 2^exp into sr;
                                 sr1 contains 16 MSB bits
sr = ashift xop (hi);            exp preloaded into se;
                                 se is the 8-bit exponent register

Where:
   xop: si, sr0, sr1, ar, mr0, mr1, mr2
   exp: any signed integer, such as 1, -1, 2, -2, ...

DAG (Data Address Generator) Registers
--------------------------------------
DAG1 only points to DM memory
   Index Registers:  i0, i1, i2, i3   {14 bit registers}
   Modify Registers: m0, m1, m2, m3
   Length Registers: l0, l1, l2, l3

DAG2 points to DM or PM memory
   Index Registers:  i4, i5, i6, i7
   Modify Registers: m4, m5, m6, m7
   Length Registers: l4, l5, l6, l7

mr1 = dm(i2, m2);     {write into mr1 the contents of the DM memory
                       location pointed to by i2 and then change i2
                       by an amount m2}
dm(i2, m2) = mr1;     {write into the DM memory location pointed to by
                       i2 the contents of mr1 and then change i2 by
                       an amount m2}
modify(i2,m2);        {modify i2 by amount m2 without data access}

{=========================================================================}
{=========================================================================}

Example: Constant and Variable Declaration
------------------------------------------
.const a = 0x6000;    {a = 0.75 in decimal format}
.const D = 3;         {D = 3, an integer; can't use this in a register!}
.var/dm w[D+1];       {4-dimensional linear buffer w[i], i=0,1,2,3}
.var/dm x, y;         {temporary variables}

Example: Data Transfer
----------------------
mr1 = 0;              {load mr1 with zero}
my1 = a;              {load my1 with the constant a}
ax1 = 0x4000;         {load ax1 with the value 0x4000 = 0.50 in decimal}
ar = sr1;             {load ar with content of sr1}
mx1 = ay0;            {load mx1 with content of ay0}
mr1 = dm(x);          {load mr1 with content of DM location x}
dm(y) = my1;          {load DM location y with the content of my1}
mr1 = dm(w);          {load mr1 with content of buffer location w[0]}
                      {note syntax for buffer}
mr1 = dm(w+1);        {load mr1 with content of buffer location w[1]}
dm(w+2) = mr1;        {load buffer location w[2] with content of mr1}

Example: Linear Buffer
----------------------
.const D = 100;
.var/dm w[D+1];       {placed in DM memory}
i2 = ^w;              {i2 points to beginning of w}
l2 = 0;               {l2 must be set to 0 for a linear buffer; does
                       not autowrap}

Example: Circular Delay-Line Buffer
-----------------------------------
.const D = 100;
.var/dm/circ w[D+1];  {circular buffer length 101 placed in DM memory}
i2 = ^w;              {i2 points to beginning of w; note ^ operator}
l2 = %w;              {l2 is set equal to the length of w; note %
                       operator}
m2 = 1;               {post increment i2 by one}

Example: Concatenated Circular Buffers
--------------------------------------
{This declaration defines an extended circular buffer of double-length
 2(M+1). The DAG pointer i4 will traverse both buffers a and b before
 wrapping around to the beginning.}

.const M = 100;
.var/pm/circ a[M+1], b[M+1];
i4 = ^a;
l4 = 2*(M+1);

Example: Do Loop
----------------
cntr = l2;
do zero until ce;               {repeat until counter expires}
zero:   dm(i2, m2) = 0;         {put 0 in dm(i2, m2) and point to next}

Example: Getting a Circular Buffer Value
----------------------------------------
m2 = d;   modify(i2, m2);       {go to location pointed to by i2 + d}
m2 = -d;  mr1 = dm(i2, m2);     {put its content in mr1, then restore i2}

Example: First Order Filter
---------------------------
Theory: y(n)=ay(n-1)+bx(n), where a=0.75, b=0.25
        output=fn(past output, present input, coefficients)

Assign internal states:
        w1(n)   = y(n-1)
        w1(n+1) = y(n)

.const a = 0x6000;              {a=0.75}
.const b = 0x2000;              {b=0.25}
.var/dm w1;                     {filter's internal state}

ax0 = 0;                        {ax0 is used to hold the constant 0
                                 because there is no instruction to
                                 write an immediate data value to memory
                                 using an immediate address}
dm(w1) = ax0;                   {initialize w1 to zero}

my0 = dm(rx_buf + 2);           {get right input from codec}
                                {the x value}
mx0 = b;                        {filter coefficient b}
mr = mx0 * my0 (ss);            {mr=b*x}
mx0 = a;                        {filter coefficient a}
my0 = dm(w1);                   {get internal state from DM}
mr = mr + mx0 * my0 (rnd);      {mr = y = a*w1+b*x = output sample}
                                {rounded to 16 MSB}
dm(w1) = mr1;                   {update state, w1=y}
dm(tx_buf + 2) = mr1;           {send right output to codec}

Example: 3rd Order FIR Filter
-----------------------------
Theory: y(n)=2x(n)-3x(n-1)-2x(n-2)+x(n-3)    {dot product of
                                              input & coefficient vectors}
        output=fn(past inputs, present input, coefficients)

Algorithm: for each input x do:
              *p = s0 = x
              s1 = tap(^w,1,1,p)
              s2 = tap(^w,1,2,p)
              s3 = tap(^w,1,3,p)
              y  = 2*s0-3*s1-2*s2+s3
              cdelay()

Coefficients in [-4,4], scaled down by 4 to 1.15 format:
   h=[2,-3,-2,1] -> [0.50, -0.75, -0.50, 0.25]
                 -> [0x4000, 0xa000, 0xc000, 0x2000]
The final sum will then need to be scaled up by 4.

Code:

.const M = 3;                   {filter order}
.var/dm/circ w[M+1];            {delay-line buffer placed in DM}
.var/pm/circ h[M+1];            {filter coefficient buffer placed in PM}
.init h: 0x4000, 0xa000, 0xc000, 0x2000;     {can be entered as 3.13}

i2 = ^w; l2 = %w;               {delay-line buffer pointer and length}
i4 = ^h; l4 = %h;               {coefficient buffer pointer and length}

zero(i2, m2, l2);               {clear delay-line buffer to zero}

mx1 = dm(rx_buf + 2);           {read right input from codec}
tapin(i2, m2, mx1);             {put mx1 into tap-0 of delay line}

{---Dot Product of internal states with filter coefficients---}

m2 = 1; m4 = 1;                 {set increments to 1}

mr = 0, mx0 = dm(i2,m2), my0 = pm(i4,m4);
                                {example of multifunction instructions}
                                {clear, fetch, increment}
                                {executes in one cycle = 30ns}
                                {fetch s0,h0, point to s1,h1}

mr = mr + mx0 * my0 (ss), mx0 = dm(i2,m2), my0 = pm(i4,m4);
                                {1st partial sum}
                                {psum = s0*h0}
                                {fetch s1,h1, point to s2,h2}

mr = mr + mx0 * my0 (ss), mx0 = dm(i2,m2), my0 = pm(i4,m4);
                                {2nd partial sum}
                                {psum = psum + s1*h1}
                                {fetch s2,h2, point to s3,h3}

mr = mr + mx0 * my0 (ss), mx0 = dm(i2,m2), my0 = pm(i4,m4);
                                {3rd partial sum}
                                {psum = psum + s2*h2}
                                {fetch s3,h3, point to s0,h0}
                                {wrap around to s0,h0}

mr = mr + mx0 * my0 (rnd);      {mr = y; final sum}
                                {sum = psum + s3*h3}
                                {sum in mr1}

if mv sat mr;                   {check for saturation}

cdelay(i2, m2);                 {update delay}

sr = ashift mr1 by 2 (hi);      {scale output by factor of 2^2 = 4}
dm(tx_buf + 2) = sr1;           {write right output to codec}

{---Using Do-Loop for Multifunction Instructions---}

m2 = 1; m4 = 1;
mr = 0, mx0 = dm(i2,m2), my0 = pm(i4,m4);
cntr = M;                       {M = filter order}
do dotloop until
   ce;
dotloop: mr = mr + mx0 * my0 (ss), mx0 = dm(i2,m2), my0 = pm(i4,m4);
mr = mr + mx0 * my0 (rnd);
if mv sat mr;

{---Replace Multifunction Instructions with DOT.DSP Macro---}

mx1 = dm(rx_buf + 2);           {read right input from codec}
tapin(i2, m2, mx1);             {put mx1 into tap-0 of delay line}
dot(M, i4, m4, i2, m2);         {compute output into mr1}
cdelay(i2, m2);                 {update delay}
sr = ashift mr1 by 2 (hi);      {scale output by factor of 2^2 = 4}
dm(tx_buf + 2) = sr1;           {write right output to codec}

{---Replace Instructions with CFIR.DSP Macro---}

mx1 = dm(rx_buf + 2);           {read right input from codec}
cfir(M, i4, m4, i2, m2, mx1);   {input from mx1, output in mr1}
sr = ashift mr1 by 2 (hi);      {scale output by factor of 2^2 = 4}
dm(tx_buf + 2) = sr1;           {write right output to codec}

{=========================================================================}
{=========================================================================}

{zero.dsp - initialize delay line buffer to zero.

 Junior DSP Lab - Rutgers ECE Dept - S. J. Orfanidis - Jan 1996.

   %0 = pointer to delay-line buffer, e.g., I2
   %1 = M-register to use with buffer, e.g., M2
   %2 = length of buffer, e.g., L2

 typical usage:
 --------------
   zero(i2, m2, L2);       i2 cycles back to its initial value

 internal operation:
 -------------------
   cntr = L2;
   m2 = 1;
   do loop until ce;
   loop: dm(i2, m2) = 0;
}

.macro zero(%0, %1, %2);
.local loop;
   cntr = %2;
   %1 = 1;
   do loop until ce;
   loop: dm(%0, %1) = 0;
.endmacro;

{=========================================================================}
{=========================================================================}

{tap.dsp - tap outputs of circular delay line.

 Junior DSP Lab - Rutgers ECE Dept - S. J. Orfanidis - Jan 1996.

 Based on tap.c and tap2.c of Introduction to Signal Processing.

   %0 = pointer to delay-line buffer, e.g., i2
   %1 = M-register to use with buffer, e.g., m2
   %2 = d, for d-th tap content, where d=1, ...
        ,D
   %3 = data register for result, e.g.,
        ax0, ax1, ay0, ay1, ar, mx0, mx1, my0, my1, mr1, sr1

 typical usage:
 --------------
   tap(i2, m2, d, sr1);    put d-th tap content into SR1
                           note: i2 is not changed

 internal operation:
 -------------------
   m2 = d;   modify(i2, m2);       point to d-th tap
   m2 = -d;  sr1 = dm(i2, m2);     put d-th tap in data register and
                                   restore i2 to its entry value
}

.macro tap(%0, %1, %2, %3);
   %1 = %2;  modify(%0, %1);       {point to d-th tap}
   %1 = -%2; %3 = dm(%0, %1);      {put d-th tap in data register}
.endmacro;

{=========================================================================}
{=========================================================================}

{tapin.dsp - put input sample into tap-0 of delay line.

 Junior DSP Lab - Rutgers ECE Dept - S. J. Orfanidis - Jan 1996.

   %0 = pointer to delay-line buffer, e.g., I2
   %1 = M-register to use with buffer, e.g., M2
   %2 = data register holding input, e.g.,
        ax0, ax1, ay0, ay1, ar, mx0, mx1, my0, my1, mr1, sr0, sr1

 typical usage:
 --------------
   tapin(i2, m2, mx1);     put value from MX1 into 0-th tap
                           note: i2 is not changed

 internal operation:
 -------------------
   m2 = 0;
   dm(i2, m2) = mx1;
}

.macro tapin(%0, %1, %2);
   %1 = 0;
   dm(%0, %1) = %2;        {put value from dreg %2 into delay line}
.endmacro;

{=========================================================================}
{=========================================================================}

{cdelay.dsp - update circular delay-line buffer.

 Junior DSP Lab - Rutgers ECE Dept - S. J. Orfanidis - Jan 1996.

 Based on cdelay.c and cdelay2.c of Introduction to Signal Processing.
   %0 = pointer to delay-line buffer, e.g., i2
   %1 = m-register to use with buffer, e.g., m2

 typical usage:
 --------------
   cdelay(i2, m2);

 internal operation:
 -------------------
   m2 = -1;
   modify(i2, m2);         (i.e., backshift pointer i2)
}

.macro cdelay(%0, %1);
   %1 = -1;
   modify(%0, %1);         {backshift pointer}
.endmacro;

{=========================================================================}
{=========================================================================}

{dot.dsp - dot product of a DM with a PM circular buffer of length M+1.

 Junior DSP Lab - Rutgers ECE Dept - S. J. Orfanidis - Jan 1996.

   %0 = filter order M, i.e., length L = M+1
   %1 = pointer to filter taps buffer in PM, e.g., i4 (not modified)
   %2 = m-register to use with tap buffer, e.g., m4
   %3 = pointer to delay-line buffer in DM, e.g., i2 (not modified)
   %4 = m-register to use with delay buffer, e.g., m2

 result is returned in MR1; i2, i4 are not modified -
 they cycle around to their entry values

 typical usage:
 --------------
   dot(M, i4, m4, i2, m2);

 internal operation:
 -------------------
   m2 = 1;
   m4 = 1;
   mr = 0, mx0 = dm(i2, m2), my0 = pm(i4, m4);
   cntr = M;
   do loop until ce;
   loop: mr = mr + mx0 * my0 (ss), mx0 = dm(i2, m2), my0 = pm(i4, m4);
   mr = mr + mx0 * my0 (rnd);
   if mv sat mr;
}

.macro dot(%0, %1, %2, %3, %4);
.local loop;
   %2 = 1;
   %4 = 1;
   mr = 0, mx0 = dm(%3, %4), my0 = pm(%1, %2);
   cntr = %0;
   do loop until ce;
   loop: mr = mr + mx0 * my0 (ss), mx0 = dm(%3, %4), my0 = pm(%1, %2);
   mr = mr + mx0 * my0 (rnd);
   if mv sat mr;
.endmacro;

{=========================================================================}
{=========================================================================}

{cfir.dsp - direct-form FIR filter of order M using circular buffers.

 Junior DSP Lab - Rutgers ECE Dept - S. J. Orfanidis - Jan 1996.

 Based on cfir.c and cfir2.c of Introduction to Signal Processing.
 In book: y = cfir(M, h, w, &p, x);

   %0 = filter order M, so that filter length is L = M+1
   %1 = pointer to filter taps buffer in PM, e.g., i4
   %2 = m-register to use with tap buffer, e.g., m4
   %3 = pointer to delay-line buffer in DM, e.g., i2
   %4 = m-register to use with delay buffer, e.g., m2
   %5 = data register holding input, e.g.,
        ax0, ax1, ay0, ay1, ar, mx0, mx1, my0, my1, mr1, sr0, sr1

 the filter output is returned in MR1 and the delay-line pointer i2
 is updated, that is, backshifted

 typical usage:
 --------------
   cfir(M, i4, m4, i2, m2, mx1);

 internal operation:
 -------------------
   tapin(i2, m2, mx1);        put input from MX1 into tap-0
   dot(M, i4, m4, i2, m2);    compute dot-product output
   cdelay(i2, m2);            update delay line
}

.macro cfir(%0, %1, %2, %3, %4, %5);
   tapin(%3, %4, %5);         {read input sample into delay line}
   dot(%0, %1, %2, %3, %4);   {compute filter output into MR1}
   cdelay(%3, %4);            {update delay line}
.endmacro;
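
Host-Side Check of the cfir Macro (sketch)
------------------------------------------
The tapin/dot/cdelay sequence above can be modeled on a PC to work out
the output words you should expect before running on the kit. The
following is a minimal sketch in Python, not part of the original lab
files. It assumes fractional mode (each product is shifted left one bit),
models the (rnd) step as "add 0x8000 and truncate" rather than the chip's
exact rounding, approximates "if mv sat mr" by clamping the rounded
result to 16 bits, and models the final scale-by-4 as a plain 16-bit
shift with no overflow handling. The names to_q15, from_q15, signed16,
CircularFir, and scale_by_4 are hypothetical helpers introduced only for
this illustration.

```python
def to_q15(x):
    """Convert a value in [-1, 1) to a 16-bit 1.15 word (0..0xFFFF)."""
    n = int(round(x * 32768))
    n = max(-32768, min(32767, n))       # clamp to the 1.15 range
    return n & 0xFFFF


def signed16(word):
    """Interpret a 16-bit word as a twos-complement integer."""
    return word - 0x10000 if word & 0x8000 else word


def from_q15(word):
    """Interpret a 16-bit word as a signed 1.15 fraction."""
    return signed16(word) / 32768.0


class CircularFir:
    """Simplified model of cfir: tapin, dot, then cdelay."""

    def __init__(self, taps_q15):
        self.h = [signed16(t) for t in taps_q15]   # coefficient buffer
        self.w = [0] * len(self.h)                 # delay line, cleared
        self.p = 0                                 # plays the role of i2

    def step(self, x_word):
        length = len(self.w)
        # tapin: write the input into tap-0 without moving the pointer
        self.w[self.p] = signed16(x_word)
        # dot: accumulate products, each shifted left one bit
        # (fractional mode); the index wraps like the circular buffer
        acc = 0
        for i in range(length):
            acc += (self.h[i] * self.w[(self.p + i) % length]) << 1
        # (rnd), simplified: round the 1.31 sum to its upper 16 bits
        mr1 = (acc + 0x8000) >> 16
        # rough stand-in for "if mv sat mr"
        mr1 = max(-32768, min(32767, mr1))
        # cdelay: backshift the pointer with wrap-around
        self.p = (self.p - 1) % length
        return mr1 & 0xFFFF


def scale_by_4(word):
    """Model 'sr = ashift mr1 by 2 (hi)' as a plain 16-bit left shift."""
    return (signed16(word) << 2) & 0xFFFF


if __name__ == "__main__":
    fir = CircularFir([0x4000, 0xA000, 0xC000, 0x2000])
    for x in [0.125, 0, 0, 0, 0]:
        y = scale_by_4(fir.step(to_q15(x)))
        print(hex(y), from_q15(y))
```

With the 3rd order example coefficients, an impulse of 0.125 followed by
zeros should come out as 0.25, -0.375, -0.25, 0.125, 0.0, that is, 0.125
times h=[2,-3,-2,1]. This is a quick way to confirm that the scale-down
by 4 of the coefficients and the scale-up by 4 of the output cancel.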