The C64 operates at roughly 1 MHz. Because the address/data bus is multiplexed, you can use the bus for only half of each cycle, or about 500 ns.
The AVR32 operates at 66 MHz (some of the parts operate faster, but I'm using your example). That works out to ~66 instructions per 6510 cycle, and ~33 per half cycle.
Now, in actuality, you have to wait until the VIC-II gets off the bus, and you have to get off right as PHI2 transitions back, so the actual time window is < 50%. I took 30 cycles as a generous estimate.
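To make the arithmetic above concrete, here is a minimal C sketch of the cycle budget. The function names are mine, and the 45% figure in the usage note is only an illustrative stand-in for "a bit less than 50%"; it also assumes roughly one instruction per AVR32 clock.

```c
#include <assert.h>
#include <stdint.h>

/* AVR32 clock cycles that fit into one C64 bus cycle. */
static uint32_t cycles_per_bus_cycle(uint32_t cpu_hz, uint32_t bus_hz)
{
    assert(bus_hz != 0);
    return cpu_hz / bus_hz;
}

/* AVR32 clock cycles that fit into `percent` of one C64 bus cycle.
 * `percent` is the share of the bus cycle you actually get to use. */
static uint32_t cycles_in_window(uint32_t cpu_hz, uint32_t bus_hz,
                                 uint32_t percent)
{
    assert(percent <= 100);
    return cycles_per_bus_cycle(cpu_hz, bus_hz) * percent / 100;
}
```

With a 66 MHz part and a 1 MHz bus, `cycles_in_window(66000000, 1000000, 50)` gives the 33 cycles of the ideal half cycle, and a hypothetical 45% window comes out just under the 30 cycles I used as a generous estimate.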
The fastest way to emulate a device on the bus would be to spin looking at PHI2. When it transitions high, grab the data and address, do your work, and decide how to respond. Such a spin might take a few cycles, so you're < 30 cycles at that point.
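A rough sketch of that spin loop, written so it can be tried on a PC. Everything here is hypothetical: `bus_sample_t`, `wait_phi2_rising` and the simulated bus are names I made up, and on a real AVR32 the `read_bus` callback would be replaced by direct reads of whatever GPIO port the bus transceivers are wired to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the expansion port signals. */
typedef struct {
    bool     phi2;   /* PHI2 clock level */
    bool     rw;     /* true = 6510 is reading */
    uint16_t addr;
    uint8_t  data;
} bus_sample_t;

typedef bus_sample_t (*bus_read_fn)(void *ctx);

/* Poll until PHI2 goes from low to high, then hand back that sample.
 * Returns false if no rising edge was seen within max_polls reads. */
static bool wait_phi2_rising(bus_read_fn read_bus, void *ctx,
                             uint32_t max_polls, bus_sample_t *out)
{
    bool prev = true;  /* assume high, so a low must be seen first */
    for (uint32_t i = 0; i < max_polls; i++) {
        bus_sample_t s = read_bus(ctx);
        if (s.phi2 && !prev) {
            *out = s;
            return true;
        }
        prev = s.phi2;
    }
    return false;
}

/* Simulated bus for testing on a host: PHI2 toggles every half_period polls. */
typedef struct {
    uint32_t tick;
    uint32_t half_period;
} sim_bus_t;

static bus_sample_t sim_bus_read(void *ctx)
{
    sim_bus_t *sim = ctx;
    assert(sim->half_period != 0);
    bus_sample_t s = {0};
    s.phi2 = ((sim->tick / sim->half_period) % 2) == 1;
    s.rw   = true;
    s.addr = 0xDE00;  /* arbitrary example address */
    sim->tick++;
    return s;
}
```

Each pass through the loop is a port read, a compare and a branch, which is where the few cycles of spin overhead come from. The real thing would be tighter than this, since the function pointer call is only there to make the sketch testable.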
For a write operation, assuming you made the bus transceivers a bit smarter so that they could simply capture the address and data information, you can take the full 66 cycles in 1 us to do the work. For a read operation, though, you have to deliver something on the bus in about 20 cycles, so that the 6510 has enough setup time to latch the data. The REU might be even more picky, I do not know.
So, by the time you factor in:
- 50% of a 6510 cycle is effectively your window of usage
- you'll need either 10-20 cycles for an IRQ service, or 3-6 cycles for a spin loop on PHI2 transition
- you must place the read data on the bus early enough to stay within the setup requirements of the 6510
You're going to be struggling to satisfy timing requirements on the bus interface. Mind you, the other 33 cycles can be used to do actual work (Ethernet, USB, etc.), but the bus interface will be tough to implement.
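Putting rough numbers on the read case: the sketch below assumes the ~20 cycle delivery deadline is counted from the PHI2 transition, and that whatever it takes to notice the transition (spin loop or IRQ entry) comes straight out of that. Both are my assumptions, and `read_cycles_left` is a made-up helper.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles left to decode the address and fetch the byte for a read,
 * after spending detect_cycles noticing the PHI2 transition.
 * Returns 0 when detection alone uses up the whole deadline. */
static uint32_t read_cycles_left(uint32_t deadline_cycles,
                                 uint32_t detect_cycles)
{
    assert(deadline_cycles > 0);
    if (detect_cycles >= deadline_cycles) {
        return 0;
    }
    return deadline_cycles - detect_cycles;
}
```

With a 3-6 cycle spin loop that leaves 14-17 cycles to come up with the data. With a 10-20 cycle IRQ entry it leaves somewhere between 10 and nothing.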
Jim