The key question is not really whether such software exists, but rather whether such software is possible, and if so whether it would work very well.
While a printer is more than just stepper motors, those are one of the trickiest parts, so it makes sense to look primarily at that.
Back in the old days of personal computing, it was not uncommon to generate stepper motor signals from the CPU of a personal computer, using individually settable bits on either a special purpose interface (that is in fact basically how the head in a floppy drive was typically moved) or borrowing another available interface such as a parallel printer port.
But then two things happened: computers got faster but more isolated from the outside world, and the operating systems in common use became much stricter about what they permitted.
To move a high resolution stepper at a decent speed, you need to generate either step pulses or winding activation signals at a fairly high rate. And to accelerate and decelerate a motor under load, you need to finely vary their timing. Back when I/O ports hung directly off processor buses, and operating systems could not prevent programs from speeding up the system hardware timer in order to run a stepper routine rapidly, this worked to a degree. But today:
Most PC class processors have little, if any, directly coupled I/O - especially by the time you get outside the box. An interface like USB is great for moving a large amount of data per unit time, but it is absolutely horrible for accomplishing a trivial task with precise, frequent timing - it is a freight train, not a bicycle courier. Many of the things which let a processor operate quickly internally work specifically by decoupling it from an outside world that often cannot keep up - memory caches, bus exchange units, etc. If you do find a parallel port today, it is likely to be on the far side of a PCI bus bridge at the least, and to have a different low-level interface than a legacy one.
Modern operating systems have a time-slice scheduler which "owns" the CPU(s) and hands out small chunks of processing time to ordinary programs. These programs typically get to run often enough to appear responsive to the user, but not frequently or predictably enough to accurately drive stepper motors. Various schemes have been tried, for example creating a "hard real time" scheduler which owns the processor and allows a motor control task to register for the precise time slices it needs - then, with whatever time is left over, a Linux or Windows or similar kernel is allowed to run and divide up the remaining time among ordinary programs through its own scheduling rules. Of course, such a scheme tends to need revising each time the conventional operating system underneath it has a new major release.
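To put rough numbers on the timing problem, here is a small Python sketch of the step intervals for one axis accelerating from rest and then cruising. Everything in it is an illustrative assumption rather than a figure from any particular machine: 80 steps/mm, 1000 mm/s^2 acceleration, 100 mm/s cruise speed, and the hypothetical helper name `step_intervals`.

```python
import math

def step_intervals(steps_per_mm, accel_mm_s2, cruise_mm_s, n_steps):
    """Delay in seconds before each of the first n_steps step pulses of a
    move that starts at rest, accelerates at a constant rate, then cruises."""
    accel = accel_mm_s2 * steps_per_mm                     # steps per second^2
    cruise_interval = 1.0 / (cruise_mm_s * steps_per_mm)   # seconds per step
    intervals = []
    t_prev = 0.0
    for n in range(1, n_steps + 1):
        # Under constant acceleration, step n is reached when n = a * t^2 / 2.
        t = math.sqrt(2.0 * n / accel)
        # Once the ramp would go faster than cruise speed, hold the cruise interval.
        intervals.append(max(t - t_prev, cruise_interval))
        t_prev = t
    return intervals

# Assumed example values, not taken from any specific printer.
intervals = step_intervals(steps_per_mm=80, accel_mm_s2=1000.0,
                           cruise_mm_s=100.0, n_steps=500)
first_us = intervals[0] * 1e6
cruise_us = intervals[-1] * 1e6
steps_per_ms = 0.001 / intervals[-1]

print(f"first step interval:  {first_us:.0f} us")
print(f"cruise step interval: {cruise_us:.0f} us")
print(f"steps due during a 1 ms stall at cruise: {steps_per_ms:.0f}")
```

Under these assumptions the interval shrinks from milliseconds to about an eighth of a millisecond within the first few hundred steps, and every one of those intervals has to be hit accurately. A program that is held off by the scheduler for even a millisecond at cruise speed is already several steps late.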
While there are ways around these issues, they tend to require atypical hardware and deep changes to the operating system installation - making them neither inexpensive nor easy to set up for end users.
Instead, it is generally simpler and more cost effective (not even $10 these days) to put an embedded processor on an external circuit board, and have it act as a delegate to execute precise-timing tasks on behalf of the host processor. Somewhat extending from the idea of industrial CNC machines that originally read punched paper tape, and were later updated with a scheme where an ordinary computer "drip feeds" G-code commands over a serial port, modern 3D printers tend to deliver G-code (or other) command data a little bit in advance of when it is needed, so that the latency of a USB or serial connection doesn't really matter. Normally enough data is buffered on the printer for it to keep running, but even if not, it would only pause briefly between the complete moves which are transmitted, rather than experiencing motor stuttering as it would if the USB link were trying to deliver each individual step pulse.
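The effect of that buffering can be shown with a toy simulation. This is only a sketch under my own assumptions, not the protocol of any real firmware: moves execute in order, the host sends one move at a time, a move can only be sent when the printer has a free buffer slot, and a slot is freed when a move finishes executing. The function name `total_stall` and all the numbers are made up for illustration.

```python
def total_stall(move_durations, host_delays, buffer_size):
    """Total seconds the printer sits idle mid-print waiting for the next move.

    move_durations[i]: seconds the printer needs to execute move i.
    host_delays[i]:    seconds the link takes to deliver move i once sent.
    buffer_size:       moves the printer can hold, including the one running.
    """
    arrival = []   # time each move reaches the printer
    finish = []    # time each move finishes executing
    stall = 0.0
    for i, (duration, delay) in enumerate(zip(move_durations, host_delays)):
        # The host sends after the previous move has been delivered...
        send_start = arrival[i - 1] if i > 0 else 0.0
        # ...and only once a buffer slot has been freed by a finished move.
        if i >= buffer_size:
            send_start = max(send_start, finish[i - buffer_size])
        arrival.append(send_start + delay)
        if i == 0:
            start = arrival[0]          # initial wait is not counted as a stall
        else:
            start = max(arrival[i], finish[i - 1])
            stall += max(0.0, arrival[i] - finish[i - 1])
        finish.append(start + duration)
    return stall

# Twenty 50 ms moves; the link normally takes 10 ms but hiccups once for 200 ms.
durations = [0.05] * 20
delays = [0.01] * 20
delays[10] = 0.2

stall_unbuffered = total_stall(durations, delays, buffer_size=1)
stall_buffered = total_stall(durations, delays, buffer_size=8)
print(f"stall with 1-move buffer: {stall_unbuffered * 1000:.0f} ms")
print(f"stall with 8-move buffer: {stall_buffered * 1000:.0f} ms")
```

With a single slot, every bit of link latency shows up as a pause between moves. With a handful of moves queued ahead, the same 200 ms hiccup is absorbed entirely in this model, and in either case the pauses fall between whole moves rather than in the middle of a step sequence.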
As for why an Arduino - probably mostly the history of who built the machines which kicked off the enthusiast printing trend. If someone from an industrial background were tasked with building something like an FDM printer or a machine with similar motion needs in isolation today, chances are they would end up with an ARM processor that would be a bit faster, more flexible, and with more resources, and likely cost a little less. But in actual history, the early affordable machines were built by those in the maker community, who were already familiar with the availability of the Arduino, and willing to put some cleverness into getting good motion out of its limitations. RAMPS in particular seems designed to be a coarse-pitch through-hole bridge that a hobbyist could build themselves, and then buy the slightly trickier to work with surface mount processor and motor drive chips preassembled in the form of an Arduino Mega and stepper drive modules. That even fairly proprietary machines maintain these basic parts choices is probably an indication of the utility of not "reinventing the wheel" - if you want to develop a printer, you can start from available components and customize them only one by one as you choose, rather than not being able to run your development prototype until you get a working circuit board designed and fabbed, a working software base developed, etc.