Why don't we force miners to embed the height of the TX Merkle tree in the first two bytes of the 4-byte block header version?
That would be a sane softfork, but it would need to take into account that, since BIP65, the block version (a signed 32-bit integer) must be at least 4, which restricts it to 2^31-4 possible values. Maintaining compatibility with BIP9 and possibly BIP320, which assign meaning to certain bits in the version number, would complicate things further.
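To make the arithmetic concrete, here is a minimal sketch of those constraints. The BIP9 mask and top-bits constants are the ones from BIP9; `embed_depth_high16` is a purely hypothetical layout (the proposal does not pin down which two bytes would carry the depth), included only to show how such a field could collide with the BIP9 bit pattern.

```python
# Sketch only. embed_depth_high16 is a hypothetical layout, not a proposal.
INT32_MAX = 2**31 - 1       # nVersion is a signed 32-bit integer
MIN_VERSION = 4             # lower versions are rejected since BIP65 activated

BIP9_TOP_MASK = 0xE0000000  # top three bits of nVersion
BIP9_TOP_BITS = 0x20000000  # BIP9 expects those bits to be 001


def count_valid_versions() -> int:
    """Number of version values in [MIN_VERSION, INT32_MAX]."""
    return INT32_MAX - MIN_VERSION + 1


def is_valid_version(version: int) -> bool:
    """True if the version passes the 'at least 4' rule and fits in int32."""
    return MIN_VERSION <= version <= INT32_MAX


def is_bip9_version(version: int) -> bool:
    """True if the top three bits match the BIP9 pattern 001."""
    return (version & BIP9_TOP_MASK) == BIP9_TOP_BITS


def embed_depth_high16(depth: int, low16: int) -> int:
    """Hypothetical: store the Merkle tree depth in the high 16 bits."""
    if not 0 <= depth < 2**15:   # keep the sign bit clear
        raise ValueError("depth does not fit")
    if not 0 <= low16 < 2**16:
        raise ValueError("low16 does not fit")
    return (depth << 16) | low16
```

Under this assumed layout, a realistic depth such as 12 yields a version that is valid but no longer carries the BIP9 top bits; only depth values between 0x2000 and 0x3FFF would, which is the kind of complication mentioned above.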
Also, it's hard to convince miners to softfork in a change that complicates matters for them, so absent successful UASF-style pressure, it's unlikely this would be adopted.
It'd fix the leaf-node brute force weakness (CVE-2017-12842), which is currently mitigated only by standardness rules, not by consensus rules.
Indeed! I think a consensus change that enforces a minimal transaction size instead is probably far less invasive.
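For readers unfamiliar with the weakness, the underlying ambiguity can be sketched in a few lines: an inner Merkle node is the double-SHA256 of two concatenated 32-byte hashes, so its preimage is exactly 64 bytes, and a 64-byte transaction serialization hashes the same way. The helper names below are illustrative, and the size check is only an assumption about what a minimum-size rule would need to exclude.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for txids and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_parent(left: bytes, right: bytes) -> bytes:
    """Hash of an inner node built from two 32-byte child hashes."""
    return sha256d(left + right)


def txid_of(raw_tx: bytes) -> bytes:
    """Hash of a serialized transaction (internal byte order)."""
    return sha256d(raw_tx)


def is_ambiguous_size(raw_tx: bytes) -> bool:
    """A 64-byte serialization is indistinguishable from an inner-node preimage."""
    return len(raw_tx) == 64


# Two arbitrary 32-byte child hashes (placeholders, not real transactions).
left = sha256d(b"child a")
right = sha256d(b"child b")

# The same 64 bytes, read as a "transaction", hash to the inner node's value.
assert merkle_parent(left, right) == txid_of(left + right)
```

A rule that rejects transactions whose serialization is exactly 64 bytes removes this overlap without touching the block header.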
Similarly to the description in BIP 141, we could introduce a type of node that is neither a pruned node nor a complete full node, but that keeps a txindex (txindex=1) for transactions with unspent outputs. Such a node would first store full blocks. When a new block arrives, it would use its txindex to look up the block containing a transaction being spent, find that transaction, and check whether all of its outputs are now spent. If so, it would remove the transaction from its stored copy of that block and keep only its hash. This would save a lot of space, since it seems to me that most queries to a txindex-enabled node would be getrawtransaction calls on transactions with unspent outputs?
I don't see how this change is related to this potential mode of operation. Whatever you suggest blocks commit to in their header can instead just be remembered by nodes when they first validate a block, without a consensus change. Locally, once a node has itself validated how many transactions there actually are in a block, nothing will ever convince it otherwise.
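As a rough illustration of "just remember it locally": the tree depth follows directly from the transaction count a node sees during validation, so it can be stored alongside the block hash. `LocalDepthIndex` and its methods are hypothetical names, not anything that exists in Bitcoin Core.

```python
def merkle_depth(tx_count: int) -> int:
    """Levels between the leaves and the root: ceil(log2(tx_count))."""
    if tx_count < 1:
        raise ValueError("a block has at least one transaction")
    return (tx_count - 1).bit_length()


class LocalDepthIndex:
    """Hypothetical per-node store of depths learned during validation."""

    def __init__(self) -> None:
        self._depth_by_block: dict[bytes, int] = {}

    def record_validated_block(self, block_hash: bytes, tx_count: int) -> None:
        # Called once, after the node has fully validated the block itself.
        self._depth_by_block[block_hash] = merkle_depth(tx_count)

    def branch_length_ok(self, block_hash: bytes, branch_len: int) -> bool:
        # A Merkle branch for a real leaf must have exactly `depth` hashes.
        return self._depth_by_block.get(block_hash) == branch_len
```

For example, after `record_validated_block(h, 2000)` the index would accept only branches of length 11 for block `h`, with no header commitment involved.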
I also don't think this mode has any benefits. It combines the slowness of the pre-UTXO validation model(*) with the inability to serve full blocks (like current pruned nodes), while consuming far more disk space (it would need to keep in full every transaction that has at least one unspent output, rather than just the unspent outputs themselves).
(*) Before Bitcoin 0.8, instead of a blockchain + UTXO set, there was a blockchain + an index into that chain for every transaction + a boolean for every output indicating whether it was spent or not. This was slow, because it meant that the working set (data frequently accessed by the code) was effectively the whole blockchain (doing many random seeks into it for every input being spent). In the UTXO model introduced in 0.8, the UTXO set is kept independently from the blockchain (it does not contain pointers into the blockchain; it contains a copy of all the UTXOs), enabling efficient caching of just the UTXOs, and pretty much making the blockchain unnecessary for normal operation (except serving blocks to other peers, or rescanning). That model is what enabled pruning too, as the blockchain itself is no longer used for validation.
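The difference between the two layouts can be shown with a toy model. This is a heavily simplified sketch, not how either version of the software stores data: the class names are made up, outputs are plain `(value, script)` tuples, and a counter stands in for disk seeks into block storage.

```python
class LegacyModel:
    """Toy pre-0.8 layout: spent flags that point into block storage."""

    def __init__(self) -> None:
        self.block_storage = {}  # txid -> list of outputs (stands in for blocks on disk)
        self.spent = {}          # txid -> list of per-output spent flags
        self.block_reads = 0     # counts simulated seeks into block storage

    def add_tx(self, txid, outputs) -> None:
        self.block_storage[txid] = list(outputs)
        self.spent[txid] = [False] * len(outputs)

    def spend(self, txid, index):
        if self.spent[txid][index]:
            raise ValueError("output already spent")
        self.block_reads += 1    # the full transaction must be fetched
        self.spent[txid][index] = True
        return self.block_storage[txid][index]


class UtxoModel:
    """Toy 0.8-style layout: a standalone copy of every unspent output."""

    def __init__(self) -> None:
        self.utxos = {}          # (txid, index) -> output

    def add_tx(self, txid, outputs) -> None:
        for index, out in enumerate(outputs):
            self.utxos[(txid, index)] = out

    def spend(self, txid, index):
        # Removing the entry is the whole update; no block data is touched.
        try:
            return self.utxos.pop((txid, index))
        except KeyError:
            raise ValueError("output missing or already spent") from None
```

In the toy legacy model every spend touches block storage and the transaction data has to stay around, while the UTXO model shrinks as outputs are spent and never consults the blocks, which is why the blocks can be pruned.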