One important aspect of Quantum Mechanics that frequently gets lost in discussions avoiding the mathematics is that there is no absolute distinction between "pure" and "superposed" states: it is all, so to say, in the eye of the beholder, and that beholder is typically not some human experimenter but rather a physical observable (technically: a self-adjoint operator on the space of quantum states). @GuyInchbald hints at this in his answer by pointing out that designating some states as pure is the same thing as choosing a basis for the space of quantum states, but I fear the point needs to be stressed much more strongly when raised in a philosophical context.
The case of spin
A system providing a good example of this arbitrariness of what to label as "pure" is that of the spin of an electron. The coarse popular science description is that the electron has the spin states 'up' and 'down', mathematically denoted +1/2 and −1/2 (for technical reasons that make sense in calculations). The slightly more refined description, which one encounters when wishing to discuss quantum superposition, is that the electron can also be in arbitrary superpositions of these states. After this comes the more rarely seen (but really illuminating) further refinement that spin is really tied to a direction in physical space: we speak about 'spin up' and 'spin down' because there is a convention to use the eigenstates for spin in the Z direction as the standard basis when describing spin, but one could just as well use any other direction to define the basis states (those designated as "pure").
What is nice about the spin is that it is quite straightforward to observe, as is done in the Stern–Gerlach experiment: when an electron travels through a suitable nonhomogeneous magnetic field (ideally constant in time and direction, but strength varying between different points in space), its path will be slightly diverted according to the direction of its spin, very much as classically a rotating charged particle would be (hence the name 'spin'). Practically the experiment is conducted with a beam of electrons: passing through the magnetic field splits the beam according to the spins of the individual electrons, and that is what in the more abstract discussion of QM is described as making a measurement of their spins. (IMHO there are problems with the way that the 'measurement' concept is used in discussions of QM, but more on that later.)
From the classical physics point of view, the outcome of the Stern–Gerlach experiment is weird, because the beam is always split into two subbeams: each electron is either shifted a certain distance in one direction, or an equal distance in the opposite direction, corresponding to spins of +1/2 and −1/2 respectively. For a classical rotating charge the distance would depend on the speed of rotation, so an initial interpretation could be that all electrons seem to always rotate at the exact same speed, but it's actually a lot weirder, because the classical effect would also depend on how well the axis of rotation aligns with the direction of the magnetic field; getting two distinct beams out would classically only happen if all charges rotate at the same speed and all have axes of rotation parallel to the magnetic field. In the Stern–Gerlach experiment you get two beams no matter what direction the magnetic field has, so it seems like electrons have an axis of rotation parallel to every direction there is! But that was just because we were trying to make a strained classical interpretation; the modern consensus is that electrons aren't rotating as such, they just happen to have a property known as spin which (among other things) causes them to interact with magnetic fields in a way that resembles how rotating charged particles would interact. That spin is quantised is in itself no weirder than that electrons "orbiting" a nucleus only have a discrete set of possible orbits; we may gladly think of 'up' and 'down' as the two possible values of the physical quantity 'spin', as long as we only measure spin in the Z direction.
Things get weirder if we repeat the experiment with one of the subbeams coming out of the Stern–Gerlach apparatus, for example that of 'up' electrons. Feeding that through another Stern–Gerlach apparatus with magnetic field in Z direction will not split the beam, because these electrons are all in the same spin state, and are thus shifted the exact same amount. Feeding the beam through another Stern–Gerlach apparatus with magnetic field in Y direction will however split the beam 50/50 into two subbeams, because even though measuring the spin of any particular electron always produces a value of either +1/2 or −1/2, measuring the spin in the Y direction is something different than measuring it in the Z direction; one does not determine the other. Yet the actual spin state of the electron is always just a superposition of two basis states, which may be chosen as the 'up' and 'down' eigenstates in Z direction, but equally well as the two distinct eigenstates of spin in the Y direction, or the again two distinct eigenstates of spin in any other direction. None is more fundamental than any other, so it is perfectly fine to consider 'up' and 'down' as superpositions of their Y direction counterparts |σ_y = +½⟩ and |σ_y = −½⟩ (curse the lack of formula rendering on this SE!).
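Since formulas don't render here, the claim that 'up' is just a superposition of the Y direction eigenstates can at least be checked numerically. The following is a minimal sketch in plain Python, assuming the usual convention |σ_y = ±½⟩ = ( |↑⟩ ± i|↓⟩ )/√2; the helper names `inner` and `expand` are made up for this illustration.

```python
import math

# Spin states as (amplitude of 'up', amplitude of 'down') in the Z basis.
UP = (1, 0)
DOWN = (0, 1)
S = 1 / math.sqrt(2)
Y_PLUS = (S, 1j * S)    # |σ_y = +½⟩ = (|↑⟩ + i|↓⟩)/√2  (assumed convention)
Y_MINUS = (S, -1j * S)  # |σ_y = −½⟩ = (|↑⟩ − i|↓⟩)/√2


def inner(a, b):
    """Inner product ⟨a|b⟩, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def expand(state, basis):
    """Coefficients of `state` with respect to an orthonormal basis."""
    return [inner(b, state) for b in basis]


# Coefficients of 'up' in the Y basis.
c_plus, c_minus = expand(UP, (Y_PLUS, Y_MINUS))

# Rebuild 'up' from its two Y-basis components to confirm nothing was lost.
rebuilt = tuple(c_plus * p + c_minus * m for p, m in zip(Y_PLUS, Y_MINUS))

print(abs(c_plus) ** 2, abs(c_minus) ** 2)  # weights of the two Y components
print(rebuilt)                              # should match UP up to rounding
```

Swapping the roles (expanding `Y_PLUS` in the basis `(UP, DOWN)`) works exactly the same way, which is the point: neither basis is privileged by the arithmetic.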
What happens if you place several Stern–Gerlach apparatuses in series (and always use only one output beam as input to the next) is that the beam splits if the magnetic field directions are different (in proportions depending on the angle between them; for a right angle the split is equal, whereas for a smaller angle there is a bias towards the spin closer to that of the input beam) whereas it doesn't split if the directions are the same. There is in particular no "memory" going back further than that, so if you split Z, Y, Z then 1/8 of the original beam will come out 'up' from the last Z and 1/8 will come out 'down', even though all electrons in those subbeams came out in (say) the 'up' subbeam from the first Z. The standard way of doing the math here is to say that when the beam gets to the Y splitter, the relevant basis to use is that of the Y direction spin eigenstates. The electrons coming in are all in the Z direction spin eigenstate of 'up', but since 'up' in the Y basis is an equal absolute value superposition of the two basis states, each electron will have an equal 50% probability of going into the +½ subbeam or the −½ subbeam, after which they will then be in either of those two spin states instead. Then coming to the last Z splitter, everything is the same but with Z and Y interchanged, so again each electron has an equal 50% chance of going into either subbeam. It's all very neat, but just not what a training in classical physics would have led you to expect.
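The bookkeeping for such a chain of splitters is short enough to write out. This is only a sketch: it uses the standard spin-½ result that a beam polarised along one direction splits in proportions cos²(θ/2) and sin²(θ/2) at a splitter whose field makes the angle θ with that direction, and it assumes the original beam is unpolarised, so that the first splitter divides it 50/50. The function names are invented for the example.

```python
import math


def split(theta):
    """Fractions of a spin-polarised beam that go into the subbeam 'along'
    and 'against' the new field direction, where theta (in radians) is the
    angle between the new field and the beam's current spin direction."""
    return math.cos(theta / 2) ** 2, math.sin(theta / 2) ** 2


def chain_fraction(angles, first_split=0.5):
    """Fraction of the original beam left after a chain of splitters in
    which only the 'along' subbeam is passed on each time.

    angles[i] is the angle between consecutive field directions;
    first_split is the share kept at the first splitter (assumed 0.5,
    i.e. an unpolarised input beam)."""
    fraction = first_split
    for theta in angles:
        fraction *= split(theta)[0]
    return fraction


right_angle = math.pi / 2
print(split(right_angle))        # perpendicular fields: equal split
print(split(math.radians(30)))   # small angle: biased towards 'along'
print(split(0.0))                # same direction: no split at all
print(chain_fraction([right_angle, right_angle]))  # the Z, Y, Z chain
```

Note that `chain_fraction` only ever looks at the angle to the immediately preceding splitter, which is the "no memory" property in code form.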
In particular it has been seen as inconceivable that the electrons would randomly fall into one subbeam or the other, even though that is what the experiments suggest. The aim of the various "hidden variable" theories has often been to try and restore determinism by expanding the state of an electron to make it predetermined what it would do when subjected to e.g. a particular sequence of Stern–Gerlach experiments (1/8 of the original electrons being predetermined to go up–left–up when subject to a Z,Y,Z split, and another 1/8 being predetermined to go up–left–down in the same experiment, etc.), but those theories, at least the local ones, have failed to match experimental results (especially when entanglement and interference enter the picture, but that's another level of complications).
More general reflections
Popular descriptions of QM often paint a picture where physical systems normally reside in one or another classical pure state, even though on microscopic scales you can temporarily create these weird quantum states that are superpositions of several pure states, but luckily those superpositions quickly "collapse" back to pure states, even though it is nondeterministic which pure state will be the result. This picture is seriously flawed.
First, the "pure" states are not classical at all. The states one designates as pure are typically chosen to be easy to comprehend (to the extent that is possible), and one approach for such choices is to require that some observable (or quantity, in the classical terminology) has a distinct value; mathematically this means picking the eigenstates of that observable as the states to designate as pure (basis) states. But merely letting one observable have a definite value does not make the state classical — from the point of view of another observable, the state is typically a superposition of some other set of states where that second observable has definite values. A classical state would be one where both position and momentum of all particles have definite values, and such states simply do not exist.
Second, as mentioned above, what is "pure" or "superposition" is merely a matter of how we choose to describe the space of quantum states, not an aspect of the reality. Though that said, one should also be aware that some descriptions are more strained than others; there are states that would be labelled as superpositions in any humanly sensible description of the system.
Third, one should not make the mistake of believing that classical physics is always clear and intuitive whereas quantum mechanics is weird, because they both harbour plenty of things that are weird from the everyday layman perspective: shadows that are objectively less dark in the middle for one thing (by classical wave theory of light), or the finer points of celestial mechanics. But since the early writers on QM were classically trained physicists, they were biased in that these were familiar phenomena, not this new quantum weirdness. Even today the physics curriculum first deals with the classical material before addressing the quantum view, because the classical material is "easier". (Some parts surely are, but other parts I'm not so sure about; you can do a lot of QM with just linear algebra, whereas classical physics relies heavily on PDEs.)
Another thing that frequently gets misrepresented is the matter of "measurements" in QM. From classical physics, we're used to the idea that a measurement reveals what is already there, a fact about the world that was true regardless of whether we knew it or not. Those in the business of actually making measurements know that it is not always quite that easy; a voltmeter has a large but finite impedance, so the mere act of connecting it to two points of an electrical circuit will slightly change the currents and therefore also the voltages in that circuit. In that case, however, the distortion is typically small enough that it may be ignored. For measuring other quantities we may be less lucky, but for the sake of philosophy it is common to disregard practical aspects such as imperfections of measurement devices (among other things, because it makes the discussion much messier).
Either way, measurements in QM aren't that easy. A common description (which IMHO is misleading) of what is going on is that the observable again has a definite value (as in the above classical model), because that is true in the pure states, but since the state at hand unfortunately is a superposition this definite value degrades into a random variable. By "collapsing the wavefunction", the act of measuring forces this random variable to pick a definite value and thus reveal its underlying truth. This description is correct insofar as one can use it to carry out the calculations, but it is not so useful for philosophical enquiries. There is even a strong version of this description according to which the superposed state is unknowable, unlike the pure states which can be known, but that strong version is simply wrong and the spin example gives a good explanation of why.
If in the repeated Stern–Gerlach experiment we pick the 'left' subbeam coming out of the Y direction splitter, then the spin states of the electrons in that beam are simply 'left'; we know this, because we just measured this to be the case — there are no ifs or buts. For then calculating the effect of the subsequent Z direction splitter, the above description would ask us to instead view this 'left' state as the equivalent superposition of 'up' and 'down'; according to QM, this is just the same thing. By measuring the spin in the Z direction, we then force each electron to pick one of those two possibilities, or at least that is one way to interpret the calculations. A seemingly similar, but according to QM (and experiments) incorrect, probabilistic interpretation is that half the electrons going in are predetermined to become 'up' electrons and the other half predetermined to become 'down' electrons; that erroneous conclusion is however easy to reach if one believes "a measurement reveals what is already there".
A better picture of measurements in QM is that a measurement of an observable of a quantum system is an interaction with that system which forces it into a state where said observable has a definite value, subject to certain rules relating the probabilities of the possible outcomes to the amplitudes of the corresponding pure components in the pre-measurement state. This may sound strange, but it matches what many measurement processes actually do: to measure the vertical/horizontal polarisation of a photon, one aims it at a polarisation filter (of e.g. vertical polarisation), and if the photon gets through it has been measured to be polarised vertically, whereas if it is reflected it has been measured to have the complementary polarisation of horizontal. Even if the photon was in fact known to be polarised at a 45° angle before encountering the polarisation filter, it will be either vertical or horizontal when leaving it. Interactions between individual particles and pieces of macroscopic experimental apparatus have a definite tendency to behave in this manner; something as simple as having a particle pass through a particular hole constitutes a measurement of the fact that the particle was at the position of that hole, i.e., a measurement of its position. Measurements can be rather subtle.
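To make "an interaction that forces the system into a state where the observable has a definite value" a bit more concrete, here is a toy simulation of the polarisation filter. It is a sketch under simplifying assumptions: the photon is linearly polarised, its state is represented just by the angle from vertical, and the probability of passing is cos² of that angle (Malus's law, which is the Born rule for this setup). The function name `measure_polarisation` is hypothetical.

```python
import math
import random


def measure_polarisation(angle_deg, rng=random):
    """Send one photon, linearly polarised at angle_deg from vertical,
    at a vertical polarisation filter.

    Returns (outcome, new_angle_deg): the measurement result together
    with the polarisation the photon has afterwards."""
    p_vertical = math.cos(math.radians(angle_deg)) ** 2  # Born rule
    if rng.random() < p_vertical:
        return "vertical", 0.0      # passed: now vertically polarised
    return "horizontal", 90.0       # rejected: now horizontally polarised


rng = random.Random(0)  # fixed seed so the run is repeatable

# Photons known to be polarised at 45 degrees: each outcome is random.
outcomes = [measure_polarisation(45.0, rng)[0] for _ in range(10_000)]
print(outcomes.count("vertical") / len(outcomes))

# Measuring again in the post-measurement state repeats the first answer.
first, new_angle = measure_polarisation(45.0, rng)
second, _ = measure_polarisation(new_angle, rng)
print(first, second)
```

The second part is the important one for the argument: the definite value found by the repeated measurement was produced by the first interaction, it was not there in the 45° state to begin with.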
On the other hand, it is subjective whether a measurement has in fact occurred, because measurements happen when you extract classical information from a quantum system; when describing an experiment, there is at least in principle always an option to delay the point at which the measurement takes place, by expanding what you regard to be the quantum system (to include more of the experimental apparatus, particularly detectors and registration)! Concretely, if your quantum state only includes the spins of the electrons, then the Stern–Gerlach apparatus performs a measurement of the spins by forcing each electron to go either into one subbeam or the other. However if you expand the quantum state to include the position of the electron, then the apparatus simply performs the reversible state transformations
|↑,0⟩ ⟼ |↑,+d⟩ (spin up at position 0 goes to spin up at position +d)
|↓,0⟩ ⟼ |↓,−d⟩ (spin down at position 0 goes to spin down at position −d)
What happens when the spin is instead |σ_y=−½⟩ = ( |↑⟩ − i|↓⟩ )/√2 (the imaginary unit i here is significant – without it we would instead have the state |σ_x=−½⟩) is that the transformation acts independently on the spin up and spin down terms, mapping
( |↑,0⟩ − i|↓,0⟩ )/√2 ⟼ ( |↑,+d⟩ − i|↓,−d⟩ )/√2
The electron coming out of the apparatus is thus in a state that is a superposition of 'spin up at position +d' and 'spin down at position −d', rather than in just one of the two. Because this is an entangled state, we now have the extra option of deducing the spin from measuring the position, but the Stern–Gerlach apparatus itself has not measured the spin. It would (at least in theory; I'm not versed well enough in the experimental aspects to be sure of how practical it is to do with electrons) be quite possible to recombine the two beams using a second Stern–Gerlach apparatus with the magnetic field in the opposite direction, and thereby recover the original state of the electron. The measurement does not happen until the electron hits a detector not part of the quantum system, and theoretically there is no problem of redefining that detector and its records as being part of an even larger quantum system, in which case the superposition persists until someone looks at those records to see what the detector registered.
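The state transformations above are simple enough to play with directly. In the following sketch a state is a Python dict mapping (spin, position) pairs to amplitudes, and the apparatus is modelled as nothing more than the position shift described above; the parameter `sign` for reversing the field direction and the helper names are assumptions made for the illustration, not a model of a real apparatus.

```python
import math


def stern_gerlach(state, d=1, sign=+1):
    """Idealised splitter as a reversible map on basis states: spin up is
    shifted by +d and spin down by −d (the other way round if sign = −1).

    `state` maps (spin, position) -> complex amplitude. Amplitudes are
    carried along unchanged, so the map only relabels basis states."""
    shift = {"up": sign * d, "down": -sign * d}
    return {(s, x + shift[s]): a for (s, x), a in state.items()}


def position_probabilities(state):
    """Probability of finding the electron at each position."""
    probs = {}
    for (s, x), a in state.items():
        probs[x] = probs.get(x, 0.0) + abs(a) ** 2
    return probs


S = 1 / math.sqrt(2)
# |σ_y = −½⟩ at position 0, i.e. ( |↑,0⟩ − i|↓,0⟩ )/√2
before = {("up", 0): S, ("down", 0): -1j * S}

after = stern_gerlach(before)               # entangles spin with position
recombined = stern_gerlach(after, sign=-1)  # opposite field undoes the shift

print(after)
print(position_probabilities(after))
print(recombined == before)
```

Nothing in `stern_gerlach` picks an outcome: both terms of the superposition survive with their amplitudes (including the relative phase −i) intact, which is why the reversed apparatus can restore the original state.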
I believe the above is all established physics. What I haven't seen is anyone drawing the (IMHO) obvious conclusions of the above points with respect to more philosophical concepts such as determinism/nondeterminism, so the following rather counts more as my own opinions. It is however perfectly possible that this just happens to coincide with one of the standard interpretations of QM — a lot of them seem to have rather illogical names, so it is quite likely that I would fail to find this even if I spent a month looking for it.
There is a paradox present in the ordinary description of quantum mechanics, in that quantum systems are supposed to evolve unitarily — a property stronger than deterministically in that not only is the future completely determined by the present, but the past is also so determined, since everything is reversible (no information is ever created or destroyed, just rearranged) — until a measurement occurs, at which point the system makes a random transition that creates new information (result of measurement) and destroys old (actual state before measurement). This is a paradox because the physics laboratories in which such measurements happen are built up from matter that supposedly interacts in ways that obey the unitary laws of quantum mechanics — if at the micro scale everything is unitary, then how can it fail to be so also at the macro scale?!?
One solution is apparent in the idea that measurements happen when you extract classical information from a quantum system. The catch is that in a quantum mechanical universe, there is no such thing as classical information, although at macroscopic scales you can get darn good approximations of it (or at least: so it seems). Consequently measurements cannot exist either (which resolves the paradox), although in interactions between micro and macro systems there has to be something which manages to produce very good approximations of them. Maybe the superpositions do not in fact collapse, but rather one of the outcomes gets heavily suppressed (somewhat like in Grover's algorithm)? Or more likely, it's down to entanglement with the environment: certainly if you had a quantum system running the usual experiments to test whether the laws of quantum mechanics hold you would get an overwhelming probability for the conclusion 'yes', but low probabilities for any particular outcome in many intermediate stages which anyway aren't important. This is speculation, but speculation that could potentially be examined mathematically: would unitary interactions between macroscopic and microscopic quantum systems behave in ways that approximate the QM laws for "measurements" (for macro systems that approximate classical information processing)?
If they do, this also puts an interesting spin on the matter of QM randomness, since it philosophically comes out pretty much the same as the deterministic pseudorandomness used for cryptography: both rely on external entropy sources (in the quantum case: entangling with the environment) to produce results that come out as effectively random. It's only that in the quantum case this happens spontaneously, whereas in classical computers we need fancy hash algorithms to achieve similar effects.