There are two things going on here:
The first is that you are not as familiar with French and Spanish, so their speech appears to be faster. This occurs with pretty much every language.
The second is that, yes, your intuition is (tentatively) correct: some languages are spoken faster. A recent study explored this, and I'll link to an article in TIME summarizing it. I'll recap it here, and since you asked here, I'll try to give a slightly more technical explanation.
The reason has to do with information density, which is basically the information content of something divided by its size. For example, 'male human' is not as dense as 'man', because the latter conveys the same information in one syllable, whereas the former takes three.
We can give a roughly mathematical account by fixing the unit of meaning to be the meaning of some arbitrary expression. This allows us to express the information density of a language as a ratio: meaning/syllables (how much meaning each syllable carries). Two other important ratios here are syllables/time (how many syllables a speaker of the language uses per unit of time) and meaning/time (how much meaning a speaker expresses per unit of time). Notice that syllables/time * meaning/syllables = meaning/time. That is, the three quantities are tied together: once you know any two, the third is determined.
When researchers looked at all of these, they found that meaning/time was roughly constant. No matter what language was used, speakers took about the same amount of time to express the same text. What differed between languages was how many syllables they used to express that meaning. This is where the perception of faster speech comes in: Spanish, for instance, used more syllables than English in the same amount of time, so there is a real sense in which Spanish is 'faster' than English. By the same measure, Japanese is perceived as very fast, and Chinese as slower.
So what is different about Spanish? Look at the equation syllables/time * meaning/syllables = meaning/time again. Notice that if meaning/time is constant (which is what the study found), then syllables/time and meaning/syllables must be inversely related: when one goes up, the other goes down.
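To make the inverse relationship concrete, here is a minimal Python sketch. The numbers are made up purely for illustration (they are not figures from the study), and the names `MEANING_PER_SECOND` and `syllables_per_second` are my own.

```python
# Hypothetical values, chosen only to illustrate the identity
#   syllables/time * meaning/syllables = meaning/time
# They are NOT measurements from the study.
MEANING_PER_SECOND = 1.0  # assumed to be the same for every language


def syllables_per_second(meaning_per_syllable: float) -> float:
    """Syllable rate needed to keep meaning/time fixed at MEANING_PER_SECOND."""
    return MEANING_PER_SECOND / meaning_per_syllable


# Three imaginary languages with decreasing information density.
for density in (0.5, 0.25, 0.125):
    rate = syllables_per_second(density)
    print(f"meaning/syllable = {density}: {rate} syllables per second")
```

Each time the density halves, the syllable rate has to double to keep meaning/time the same, which is the "one goes up, the other goes down" pattern.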
Basically, Spanish syllables have less information in them than English syllables, so it takes more of them to express the same meaning. Why is that?
It probably has to do with what syllables are allowed in a language. Different languages allow different combinations of sounds in their syllables. English is fairly permissive: "stretch", for instance, is one syllable, but it is a very complex combination of sounds. Other languages aren't as permissive. Chinese (Mandarin, at least) doesn't allow many consonants in the coda (the last part) of its syllables, but it makes up for that with four tones, which multiply the number of distinct syllables available.
To take an extreme example, imagine a language with only two syllables: /ba/ and /da/. Different words are expressed with different sequences of /ba/ and /da/. "No" could be /ba/, "Yes" could be /da/, "man" could be /badaba/, etc. Eventually we have "bachelor" expressed as /bababadabadabadababadabadadadababada/. As you can see, it will take a long time to get anything expressed in that language. To generalize a bit, the more distinct syllables a language allows, the fewer it will need to express any given meaning.
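The /ba/-/da/ thought experiment comes down to counting. As a rough sketch: if a language has N possible syllables and every word were exactly L syllables long, it could have at most N ** L distinct words. The function below finds the smallest L that covers a given vocabulary. The fixed-length assumption, the vocabulary size of 10,000, and the name `min_syllables` are all simplifications of mine, not anything from the study.

```python
def min_syllables(inventory_size: int, vocabulary_size: int) -> int:
    """Smallest word length L (in syllables) with inventory_size ** L >= vocabulary_size.

    Simplifying assumption: every word has the same length, and every
    sequence of syllables is a possible word.
    """
    if inventory_size < 2 or vocabulary_size < 1:
        raise ValueError("need at least 2 syllables and 1 word")
    length = 1
    capacity = inventory_size  # number of distinct words of the current length
    while capacity < vocabulary_size:
        capacity *= inventory_size
        length += 1
    return length


# Hypothetical vocabulary of 10,000 words, with different syllable inventories.
for inventory in (2, 100, 10_000):
    print(inventory, "syllables available ->", min_syllables(inventory, 10_000), "per word")
```

With only /ba/ and /da/ you need 14 syllables per word; with 100 possible syllables you need 2; with 10,000 you need just 1. Real languages are much messier than this, but the direction of the effect is the same.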
What specific reasons Spanish has for having low information density, I can't tell you.
Article: Slow Down! Why Some Languages Sound So Fast.