Why would a NES game use an undocumented 1-byte or 2-byte NOP in production?

77

16

Reading the NESdev wiki page on CPU unofficial opcodes, I see a few games use an undocumented 2-byte NOP instruction in production: Puzznic, F-117A Stealth Fighter, and Infiltrator use $89 #i. Beauty and the Beast uses $80 #i. Additionally, Dynowarz uses the 1-byte NOPs $DA and $FA.

Why would the devs do this? What benefit do these instructions provide when developing for the 6502?

JAL

Posted 2016-12-04T04:21:22.730

Reputation: 5 992

Good question. Normally, you'd use an undocumented NOP for timing ($04, $44, and $64 are three cycles long), but $DA and $FA take the same one byte of code and two execution cycles as the official $EA NOP, while $80 and $89 are two bytes and two cycles. – Mark – 2016-12-04T07:29:38.817


@ThisClark The No Operation assembly instruction.

– JAL – 2016-12-04T16:33:09.137

Are you sure those are really NO-OPs? Undocumented means just that; look at the information from Mark on some well-known but undocumented instructions. Maybe those are actually STA or LDA or something, and they are used to misdirect pirates trying to remove the copy protection code. This is a bit far-fetched, and the other answers, especially the ones about leaving some room for expansion or about relative addressing that would be messed up between versions of the code base, are probably better. But keep in mind that undocumented opcodes are just that. They may not be 'NO-OP' at all – Andyz Smith – 2016-12-05T16:18:50.633

@AndyzSmith Am I absolutely sure? No, because I didn't work on the processor architecture and I don't have a processor spec. But if you look online, guides and instruction documentation list these instructions as NOP (or DOP for the 2-byte NOP). – JAL – 2016-12-05T16:40:00.877

@JAL Yeah, I know, but that is unofficial, and like you say, nobody really knows what they did when they did the processor architecture, except MAYBE some real deep techs who write these intensely complicated F117 simulators in assembly, or people who were involved in the design of the processor die itself... so they just MIGHT know that certain 'undocumented' NOP op-codes are actually LDA or STA and use that to their company's benefit in creating obscure copy protection schemes. – Andyz Smith – 2016-12-05T17:05:08.440

@AndyzSmith It's also possible that these are mistakes (see this answer). $89 could have been STA #ii, but that makes no sense.

– JAL – 2016-12-05T18:40:27.573

@JAL I would say that, somewhere deep in the internal op-code pipelining and physical ALU on-die wiring, it was more practical to simply have duplicate op-codes that share a similar bit pattern than to use extra die wiring or die layers to discriminate between two bit patterns when there was no need for the op-code space. So I'd look at it and say: OK, LDA is 11111110 or whatever, and an undocumented, unapproved, technically illegal op-code is 01111110, right? They just didn't bother to discriminate and waste wire, so now 01111110 is actually exactly the same as LDA, but it's 'secret'. – Andyz Smith – 2016-12-05T19:10:45.777

@JAL I guess there could be mistakes, but that kind of big mistake would usually cause an error, wouldn't you think? A JMP to Fishkill or some arithmetic error that causes the entire graphics to suddenly be inconsistent. – Andyz Smith – 2016-12-05T19:33:33.650

@AndyzSmith The 6502 has been thoroughly reverse-engineered from die photos. We can tell exactly what effect each instruction has on every bit of processor state. I don't think there's a question of whether they're really NOPs. I don't think there was a question even in the 90s. It's such a tiny chip, there are only so many bits of state that any instruction could actually affect. – hobbs – 2016-12-05T22:17:44.810

@hobbs Ok yeah, I'm not really an expert and that WIKI goes beyond me as well. If I can ask, what's your vote on the mindset/intention of F117 Sim. authors in using those? – Andyz Smith – 2016-12-05T22:39:52.883

@AndyzSmith I think it's very hard to say. My first guess would be exactly what Mitchell Spector wrote in his answer (needed to nop-out a two-byte instruction without relocating everything), but it could be any number of things, and the answer is probably unknowable without getting one of the original developers, and maybe even then. – hobbs – 2016-12-05T22:44:26.923

Note that sticking to the contract didn't matter much in those days - you coded for one machine. Your code never ran on subtly different machines, so if you had something that worked, you didn't care whether it was documented (and part of the contract) or not. In contrast, today your code runs on millions of weird combinations of hardware and software, so sticking to the contract rather than the implementation is the only way to have at least a chance that your application is going to work across most of them (and just as importantly, across time). – Luaan – 2016-12-07T12:09:55.890

Answers

104

One use is as a copyright mechanism. Many distributors would steal or copy programs and sell pirated or derivative copies. By changing the text strings inside the code and reordering the blocks, they made it hard to prove the code had been stolen.

By placing NOPs of different types, you could embed a signature sequence that was much easier to detect and hard to hide. A particular piece of working code could be argued to be an "accidental" match, but the same argument was not possible for a sequence of NOPs. Spreading 32 NOPs through 4096 bytes of code makes the "accidental" argument 4 billion times weaker while using less than 1% extra memory.
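A quick check of that arithmetic, under an assumption the answer doesn't spell out: each of the 32 signature NOPs carries one independent bit (for instance, the official $EA versus an unofficial one-byte NOP at that spot), so an accidental match has odds of 1 in 2^32.

```python
# Back-of-the-envelope check of the watermark figures.
# Assumption: each signature NOP carries one independent bit
# (e.g. official $EA vs. an unofficial one-byte NOP at that spot).
CODE_BYTES = 4096
SIGNATURE_NOPS = 32          # each one-byte NOP costs one byte of ROM

combinations = 2 ** SIGNATURE_NOPS      # accidental match: 1 chance in this many
overhead = SIGNATURE_NOPS / CODE_BYTES  # fraction of extra memory used

print(f"accidental match odds: 1 in {combinations:,}")  # 1 in 4,294,967,296
print(f"extra memory: {overhead:.2%}")                  # 0.78%
```

Under this one-bit-per-NOP assumption, 2^32 is where the "4 billion" figure comes from; richer choices per slot (several unofficial NOPs, or varying positions) would push the odds even further.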

LOIS 16192

Posted 2016-12-04T04:21:22.730

Reputation: 1 036

@JohnKugelman - The Alsys Ada compiler for x86 used unexpected nops (e.g., mov cx,cx in a context where the flag bits don't matter) and also the x86 direction bit (as an alternate instruction encoding for a reg-reg op) for both this reason - copyright protection - and also for encoding static information necessary for the runtimes to handle exceptions properly at different points in the code. The x86 instruction set had a bunch of ways to encode information at no cost due to its irregular and redundant encodings. (I worked on that compiler at Alsys, but this stuff was undocumented of course.) – davidbak – 2017-12-01T23:44:44.120


Welcome to Retrocomputing Stack Exchange. You seem to know about assembly and software development: perhaps you'll be interested in some of these questions.

– wizzwizz4 – 2016-12-04T11:00:02.543

Indeed! Half my rep overnight! Nice work, and welcome to the site :) – Matt Lacey – 2016-12-05T11:20:39.357


Comments are not for extended discussion; this conversation has been moved to chat.

– wizzwizz4 – 2016-12-06T07:35:49.890

Do you have any quotes or references which you can add to back up this answer? – John Kugelman – 2016-12-06T19:02:00.103

@JohnKugelman The only source I could find for "watermarking" is on the NESdev wiki, but my guess is that this answer comes from personal experience.

– JAL – 2016-12-07T15:12:07.303

I don't know about this particular use of alternate instruction encodings, but one example I do know of was a shareware DOS assembler back in the early 90s that (according to its documentation -- I never did analyse it and see if it was true) used a number of alternate encodings in a fixed pattern so that the author could grab anybody's publicly distributed program and find out if they'd assembled it using his assembler. Seems like a somewhat similar idea. – Jules – 2016-12-07T19:59:32.897

67

The NES was also from the era when some sound and graphics resources were also executable code. (Typically, this worked the other way around: developers would identify a needed sound and listen to chunks of the binary to find a reasonable candidate.) Injecting NOPs can improve the look or sound derived from a section of executable code.

Example: "One of the more-challenging aspects of the development was searching the code for byte sequences that could also be reused as sound effects data."

This causes no end of difficulties for recompiling these executables to target modern CPUs since you can either have the original instructions (with correct sounds and graphics) or you can have working instructions (with garbled sound and graphics).

Eric Towers

Posted 2016-12-04T04:21:22.730

Reputation: 909

Or you take a third option and include the original byte sequences as well (changing offsets where necessary, and hoping that you've caught any self-modifying code, etc.) – TLW – 2018-07-23T00:11:10.973

Welcome to Retrocomputing! Great first post! – JAL – 2016-12-05T03:33:00.360


The "neutral zone" in Yars' Revenge was the program code itself, and made a neat white noise effect: http://www.2600connection.com/interviews/howard_scott_warshaw/interview_howard_scott_warshaw.html

– Chris Gregg – 2016-12-05T06:14:14.877

33

I'm just speculating here, but one possible reason for using a 2-byte NOP would be if you wanted to change an existing 2-byte instruction into a NOP (to fix a bug, for instance) without changing the byte count for the instruction. (An undocumented 2-byte NOP such as $80 ii takes two cycles, versus four cycles for two standard 1-byte NOPs in succession.)

You might do this to avoid changing the addresses of other instructions (maybe there's an already-prepared table with those addresses, or there's a JMP or a JSR that you can't change, or an indirect JMP where you compute an address and you don't want to have to change the computation, or there's some relative addressing that would be messed up by the change, etc.). You might also want to just patch existing machine-language code without going through the assembler (or compiler) again.
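A minimal sketch of such an in-place patch on a hypothetical ROM image held as a bytearray. The fragment, the offset, and the helper name `nop_out_two_bytes` are illustrative only. The unofficial route keeps the operand byte and swaps just the opcode for $80, so every later address stays put; the cycle figures are the usual NMOS 6502 timings (2 for $80 ii, 2 for each $EA).

```python
# Hypothetical in-place patch: turn a 2-byte instruction into a NOP
# without moving anything that follows it.
NOP_IMM = 0x80       # unofficial NOP #ii: 2 bytes, 2 cycles (NMOS 6502)
NOP_OFFICIAL = 0xEA  # official NOP: 1 byte, 2 cycles

def nop_out_two_bytes(rom: bytearray, offset: int, use_unofficial: bool = True) -> int:
    """Overwrite rom[offset:offset+2] with a NOP; return the cycles it now costs."""
    if use_unofficial:
        rom[offset] = NOP_IMM  # operand byte at offset+1 is fetched and ignored
        return 2
    rom[offset:offset + 2] = bytes([NOP_OFFICIAL, NOP_OFFICIAL])
    return 4                   # two official NOPs, 2 cycles each

# Hypothetical fragment: LDA #$05 / STA $0200 / RTS
rom = bytearray([0xA9, 0x05, 0x8D, 0x00, 0x02, 0x60])
before = len(rom)
cycles = nop_out_two_bytes(rom, 0)
print(rom.hex(" "), cycles)  # 80 05 8d 00 02 60 2
```

Either variant leaves the STA and RTS at their original addresses; the unofficial one simply saves two cycles on every pass through the patched spot.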

Mitchell Spector

Posted 2016-12-04T04:21:22.730

Reputation: 566

Or maybe you just want padding for some code-layout reason, and the NOP is in a place that will be executed. It seems reasonable to expect that a single multi-byte NOP is more efficient than multiple short NOPs. That's the case on modern x86 CPUs, where it's possible to create single-instruction NOPs from one to 15 bytes long. – Peter Cordes – 2016-12-05T17:09:26.753

29

A mistake?

The instruction $89 on the 6502 is a two-byte NOP. Based on adjacent instructions in the opcode matrix, especially LDA #ii ($A9 ii), it would have been STA #ii, a store to an immediate value, which makes no sense. On the 65C02, this instruction is changed to BIT #ii, which almost behaves as a two-byte NOP. One hypothesis is that a programmer working on both NES projects and projects for some 65C02-based system forgot that the original 6502 lacked BIT #ii, but because the instruction does so little anyway, the programmer didn't notice any difference.
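To see why $89 sits where STA #ii "would" be, split the opcode into the aaabbbcc fields commonly used to describe the 6502's regular instruction groups. This sketch decodes only the cc = 01 group (ORA, AND, EOR, ADC, STA, LDA, CMP, SBC); the field tables are the conventional description of the NMOS 6502 opcode matrix, not something stated in this answer.

```python
# Decode 6502 "group one" opcodes (low two bits = 01) using the
# conventional aaabbbcc field split of the opcode byte.
OPS = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]               # aaa
MODES = ["(zp,X)", "zp", "#imm", "abs", "(zp),Y", "zp,X", "abs,Y", "abs,X"]  # bbb

def decode_group_one(opcode: int) -> str:
    aaa, bbb, cc = opcode >> 5, (opcode >> 2) & 0b111, opcode & 0b11
    if cc != 0b01:
        raise ValueError(f"${opcode:02X} is not a group-one opcode")
    return f"{OPS[aaa]} {MODES[bbb]}"

for op in (0xA9, 0xC9, 0x89):
    print(f"${op:02X} -> {decode_group_one(op)}")
# $89 lands on "STA #imm": storing into an immediate operand is meaningless,
# and the NMOS 6502 ends up executing $89 as a 2-byte NOP.
```

$A9 (LDA #imm) and $89 differ only in the aaa field, which is what "adjacent in the opcode matrix" means here.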

Clockslide

A clockslide is a sequence of instructions that wastes a small constant number of cycles plus one cycle per executed byte, no matter whether it's entered on an odd or even address. With official instructions, one can construct a clockslide from CMP instructions: ... C9 C9 C9 C9 C5 EA:

  • Disassemble from the start and you get CMP #$C9 CMP #$C9 CMP $00EA (6 bytes, 7 cycles).
  • Disassemble one byte in and you get CMP #$C9 CMP #$C5 NOP (5 bytes, 6 cycles).
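To check the arithmetic in the two bullets, here is a small Python walk over the same byte string. It assumes straight-line execution with the standard NMOS 6502 lengths and cycle counts for the three opcodes involved and ignores the cost of jumping into the slide. It also tries entry points the bullets don't list; for this slide the "one cycle per byte, plus one" pattern holds at every offset.

```python
# Walk a clockslide from any entry offset and total up bytes and cycles.
# Opcode table: (length in bytes, cycles) on the NMOS 6502.
TIMING = {
    0xC9: (2, 2),  # CMP #imm
    0xC5: (2, 3),  # CMP zp
    0xEA: (1, 2),  # NOP
}

def run_slide(code: bytes, start: int) -> tuple[int, int]:
    """Execute code[start:] linearly; return (bytes executed, cycles taken)."""
    pc, cycles = start, 0
    while pc < len(code):
        length, cost = TIMING[code[pc]]
        pc += length
        cycles += cost
    return pc - start, cycles

slide = bytes([0xC9] * 4 + [0xC5, 0xEA])
for entry in range(len(slide)):
    executed, cycles = run_slide(slide, entry)
    print(f"enter at +{entry}: {executed} bytes, {cycles} cycles")
```

Entering at +0 and +1 reproduces the two bullets (6 bytes/7 cycles and 5 bytes/6 cycles); a longer run of $C9 bytes before the final C5 EA keeps the same pattern.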

A calculated start address into a clockslide can be used with indirect jumps (JMP (aaaa), or LDA highbyte PHA LDA lowbyte PHA RTS, where the pushed address must be one less than the target because RTS adds 1 to it) to precisely control timing, such as when playing PCM audio or sending video register changes to the PPU in a raster effect. It's even more important on the Atari 2600, where the whole screen is a raster effect.
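Because each executed byte costs one cycle plus a constant, choosing the delay is just address arithmetic. This sketch assumes a slide built like the C9 ... C5 EA example that ends at a hypothetical address `SLIDE_END` (one past its final $EA byte) and is `SLIDE_LEN` bytes long; both values and the helper names are illustrative. For the RTS variant, it pushes the target minus one, high byte first, since RTS pulls the low byte, then the high byte, and adds 1.

```python
# Pick an entry point into a clockslide for a requested delay.
# Hypothetical layout: the slide's final byte ($EA) sits just before SLIDE_END.
SLIDE_END = 0xC100   # illustrative address, one past the final byte
SLIDE_LEN = 64       # illustrative slide length in bytes

def entry_for_delay(cycles: int) -> int:
    """Entry address that burns `cycles` cycles (the slide costs bytes + 1)."""
    executed = cycles - 1
    if not 1 <= executed <= SLIDE_LEN:
        raise ValueError(f"slide can delay 2..{SLIDE_LEN + 1} cycles, not {cycles}")
    return SLIDE_END - executed

def rts_push_bytes(target: int) -> tuple[int, int]:
    """(high, low) bytes to push, in that order, so that RTS lands on target."""
    ret = (target - 1) & 0xFFFF   # RTS adds 1 to the address it pulls
    return ret >> 8, ret & 0xFF

entry = entry_for_delay(10)
hi, lo = rts_push_bytes(entry)
print(f"enter at ${entry:04X}; push ${hi:02X} then ${lo:02X}, then RTS")
# enter at $C0F7; push $C0 then $F6, then RTS
```

The totals here cover only the slide itself; a real routine would subtract the fixed cost of the dispatch code (the pushes and RTS, or the JMP) from the requested delay.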

CMP has the side effect of overwriting the N, Z, and C processor status flags, but unofficial instructions that skip one byte can be used to preserve them. For example, replace $C9 (CMP) with $89 or $80, which skips one immediate byte, and replace $C5 with $04, $44, or $64, which reads a byte from zero page and ignores it.
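The substitution works because each unofficial opcode has exactly the same length and cycle count as the CMP it replaces, so the slide's timing at every entry point is unchanged while no flags are written. A small table check, with timings and flag effects as commonly documented for the NMOS 6502; the table layout is just for illustration:

```python
# Check that swapping the CMPs in a clockslide for unofficial NOPs keeps
# length and cycle count identical while leaving the status flags alone.
# opcode -> (mnemonic, bytes, cycles, flags written), NMOS 6502
OPCODES = {
    0xC9: ("CMP #imm", 2, 2, "NZC"),
    0xC5: ("CMP zp",   2, 3, "NZC"),
    0x89: ("NOP #imm", 2, 2, ""),
    0x80: ("NOP #imm", 2, 2, ""),
    0x04: ("NOP zp",   2, 3, ""),
    0x44: ("NOP zp",   2, 3, ""),
    0x64: ("NOP zp",   2, 3, ""),
}
SUBSTITUTES = {0xC9: (0x89, 0x80), 0xC5: (0x04, 0x44, 0x64)}

for original, replacements in SUBSTITUTES.items():
    name, size, cycles, _ = OPCODES[original]
    for new in replacements:
        new_name, new_size, new_cycles, new_flags = OPCODES[new]
        same_timing = (size, cycles) == (new_size, new_cycles)
        print(f"${original:02X} {name} -> ${new:02X} {new_name}: "
              f"timing {'matches' if same_timing else 'DIFFERS'}, "
              f"flags written: {new_flags or 'none'}")
```

Since every byte keeps its length and cost, a flag-preserving slide such as 89 89 89 89 04 EA decodes into the same pattern of instruction boundaries and cycle totals as the CMP version.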

Watermarking

As LOIS 16192 mentioned, the official NOP instruction ($EA) can be inserted at random places in a particular subroutine that isn't an inner loop. This can identify authorship of a piece of code in a way similar to trap streets. But it adds even more entropy to use unofficial NOPs ($1A, $3A, $5A, $7A, $DA, or $FA), two-byte NOPs ($80 ii, $82 ii, $89 ii, $C2 ii, $E2 ii), or two-byte NOPs that read the zero page ($04 dd, $44 dd, or $64 dd). And now that NES games are manufactured with flash memory instead of mask ROM, each cartridge can have a slightly different pattern of NOPs. This can help identify exactly which copy of a game was leaked to the warez scene.
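A minimal sketch of the per-copy idea, assuming a hypothetical ROM image with a few reserved one-byte NOP slots at offsets chosen by the developer. Each slot holds one of seven interchangeable one-byte NOPs (the official $EA plus the six unofficial ones listed above), so a copy's serial number can be written as base-7 digits, one per slot. The slot offsets, helper names, and encoding scheme are illustrative assumptions, not a documented practice.

```python
# Per-copy NOP watermark: encode a serial number as a choice of one-byte NOPs.
# Every candidate is a 1-byte, 2-cycle NOP on the NMOS 6502, so execution
# timing is identical no matter which serial is embedded.
NOPS = [0xEA, 0x1A, 0x3A, 0x5A, 0x7A, 0xDA, 0xFA]  # digit value = list index
SLOTS = [0x0123, 0x0456, 0x0789, 0x0ABC, 0x0DEF]   # hypothetical slot offsets

def embed_serial(rom: bytearray, serial: int) -> None:
    """Write `serial` into the NOP slots as base-7 digits, least significant first."""
    if not 0 <= serial < len(NOPS) ** len(SLOTS):
        raise ValueError("serial does not fit in the available slots")
    for offset in SLOTS:
        serial, digit = divmod(serial, len(NOPS))
        rom[offset] = NOPS[digit]

def read_serial(rom: bytes) -> int:
    """Recover the serial from a (possibly leaked) ROM image."""
    serial = 0
    for offset in reversed(SLOTS):
        serial = serial * len(NOPS) + NOPS.index(rom[offset])
    return serial

rom = bytearray([0xEA] * 0x1000)   # stand-in for a real program image
embed_serial(rom, 1234)
print(read_serial(rom))            # 1234
```

Five slots give 7^5 = 16,807 distinct copies; adding the two-byte and zero-page NOPs, or more slots, scales that up without changing what the code does.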

As with clockslides, watermarking can also be done without unofficial instructions. But because of the cost of copying a mask ROM, this sort of watermarking wasn't actually used in games during the original commercial era of the Famicom and NES (1983 to 1996). It may be in use in homebrew-era games (2010 and later).

Sources

Damian Yerrick

Posted 2016-12-04T04:21:22.730

Reputation: 986

Great answer, and great first post! Thank you for stopping by to share your knowledge! – JAL – 2016-12-05T17:56:41.553

I hadn't seen clockslides used on the NES, but I've used them on the Atari 2600; even on that platform they're not terribly common, since HMPxx have five clock cycles' worth of range--enough to accommodate "sbc #5 / bcs lp". – supercat – 2017-05-09T15:29:52.043

@supercat Audio playback in Big Bird's Hide and Speak uses a NOP slide, which isn't quite a clockslide but is close. – Damian Yerrick – 2017-05-09T21:54:25.740

12

On the 6502, it's pretty common for code to use the BIT instruction to skip over bits of code; the most common usage pattern is skipping over a 2-byte instruction using a 3-byte BIT, but the approach could also work skipping over a one-byte instruction with a 2-byte BIT. For example:

EnterWithCarrySet:
    sec
    db  $24   ; BIT zp opcode: swallows the following clc ($18) as its operand
EnterWithCarryClear:
    clc
MainEntry:

The only effects of the BIT instruction are to perform a read of the indicated address and update the N, V, and Z flags. In most cases, setting the flags will be harmless even if it's not particularly desirable. The ABS and ZP forms of "NOP" are similar, except they can be used in cases where it's desirable to leave the flags alone. Additionally, there is a "NOP immediate" instruction which fetches the byte following the opcode but does not perform a subsequent access to the location indicated thereby, thus saving a cycle.
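To make the byte-sharing concrete, here is a tiny Python model of the assembly fragment in this answer: it decodes the bytes 38 24 18 (sec, the $24 BIT zp opcode, clc) from either entry point and tracks only the carry flag. That BIT leaves the carry alone is standard 6502 behavior; the model itself is a deliberately minimal illustration, not an emulator.

```python
# Model of the "db $24" trick: $38 = SEC, $24 = BIT zp, $18 = CLC.
# Only the carry flag is modeled, since that is what the two entries differ in.
CODE = bytes([0x38, 0x24, 0x18])
ENTRIES = {"EnterWithCarrySet": 0, "EnterWithCarryClear": 2}

def carry_at_main_entry(start: int, carry: bool) -> tuple[bool, list[str]]:
    """Run from `start` to MainEntry; return (carry flag, executed instructions)."""
    pc, trace = start, []
    while pc < len(CODE):
        op = CODE[pc]
        if op == 0x38:      # SEC: 1 byte
            carry, pc = True, pc + 1
            trace.append("SEC")
        elif op == 0x18:    # CLC: 1 byte
            carry, pc = False, pc + 1
            trace.append("CLC")
        elif op == 0x24:    # BIT zp: 2 bytes; the next byte is an address, not code
            trace.append(f"BIT ${CODE[pc + 1]:02X}")  # sets N, V, Z; carry untouched
            pc += 2
        else:
            raise ValueError(f"unmodeled opcode ${op:02X}")
    return carry, trace

for name, start in ENTRIES.items():
    for incoming in (False, True):
        carry, trace = carry_at_main_entry(start, incoming)
        print(f"{name} (C={int(incoming)} on entry): {' / '.join(trace)} -> C={int(carry)}")
```

Entering at the top executes SEC then BIT $18 (the clc byte is read as an operand and never executed), so both entries reach MainEntry with the intended carry regardless of its value on entry.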

supercat

Posted 2016-12-04T04:21:22.730

Reputation: 6 224

9

It's been more than 30 years since I last developed anything for an 8-bit computer, and specifically for a 6502 processor, but recompiling as we understand it nowadays was not possible. You had to code directly against fixed memory addresses, and relocating blocks of code was a feature that only advanced tools had.

Sometimes I left 'gaps' between pieces of code, filled with NOPs, just in case I needed to insert something in the middle. The problem was when you did not have enough 'gaps', or when, because of performance restrictions, several NOPs would hurt the overall performance of your game. Then I started to play with undocumented instructions until I was able to balance performance and 'gaps'.

Maybe those guys had the same problem I had when I developed for Oric computers...

Diego Parrilla

Posted 2016-12-04T04:21:22.730

Reputation: 99

Welcome to Retrocomputing! Great first post, I think your experience will be valuable here. I hope you stick around to answer more questions, and make sure to check out the tour to earn your first badge.

– JAL – 2016-12-05T14:48:26.887

In addition, you may want to elaborate on why an undocumented NOP was used instead of the regular instruction. – JAL – 2016-12-05T14:57:20.487

It has been a very long time... the computer on which I learnt 6502 assembler was an ORIC-1, very popular in some European countries. It lacked any kind of advanced custom chip for graphics, and things like vertical and horizontal sync were quite rudimentary. – Diego Parrilla – 2016-12-05T15:10:35.533

Development tools were primitive, and all the young boys developing games at that time had to squeeze that machine so hard that a single NOP in the main event loop of a game could decide whether the graphics flowed smoothly or stuttered. Everything had to be under control, and hardware interrupts could kill performance. Some undocumented instructions could 'delay' some actions of the CPU and give you more performance. – Diego Parrilla – 2016-12-05T15:15:15.620

The 6502 has undocumented 3- and 4-cycle NOP instructions, but the specific ones in the question are two cycles, just like the official NOP. – Mark – 2016-12-05T20:25:00.913

-1

NOPs are used for synchronization of IO in many 8-bit architectures that lack more sophisticated instructions and a capability to control the bus effectively.

Examples include a lack of wait states for memory or IO, serial communications, lack of direct memory transfer, etc.

Cartridge based systems are rife with NOPs, as are devices that have need of user input devices, such as a controller.

In x86 serial communications, a popular macro that included NOPs was called "punt".

Michael

Posted 2016-12-04T04:21:22.730

Reputation: 17

Ok, so why would a software developer use an undocumented NOP? Are you saying that the undocumented instructions are just macros for NOP and DOP? – JAL – 2016-12-05T17:28:56.863

The 6502 has an official NOP instruction you can use if all you need is a two-cycle delay somewhere. The question is why you'd use an unofficial instruction that has the same effect as the official one. – Mark – 2016-12-05T20:23:04.017