When we make a voiced sound, for example a vowel sound, our vocal cords vibrate. This gives the sound pitch. We can make voiced sounds with a high pitch or a low pitch. For this reason, you can sing a tune using a [z] sound. You cannot do this, for example, with an unvoiced sound like [s]. We don't vibrate our vocal folds for an [s] sound, so it has no musical pitch. All vowel sounds are voiced, of course.
When the voiceless plosives [p, t] or [k] occur at the beginning of a stressed syllable in English, they have an effect on the following voiced sounds. In English there are rules about what types of sound can occur after these consonants when they occur at the beginning of a syllable (these are called phonotactic rules or phonotactic constraints). English only allows the consonants /r, l, w, j/ or a vowel after /p, t/ or /k/ at the start of a syllable.
Now, the phonemes /r, l, w, j/ and the vowel sounds are voiced. This means that they have pitch and that our vocal cords vibrate when we make them. However, after [p, t, k] at the beginning of a stressed syllable, something strange happens. There is a delay before our vocal cords start vibrating. So, for example, in the word pool /pu:l/, there is a gap between our lips opening for the [p] and our vocal cords vibrating for the following [u]. This gap is called the Voice Onset Time. The Voice Onset Time in English is quite long in this situation. After we release our lips for the [p] in pool, air starts rushing out of our mouths as it is pushed up from the lungs. During the gap between the release of the [p] and our vocal folds starting to vibrate, we can hear this air rushing out of our mouths. It sounds like an [h]. In a word like pool, where the sound after the plosive is a vowel sound, we call this [h]-like quality at the beginning of the vowel aspiration. Really what we are hearing is a devoiced vowel. Our mouths are already making the vowel shape but the vocal cords have not started vibrating yet.
Exactly the same thing happens when the sound after the [p, t] or [k] is one of the approximant sounds [r, l, w, j]. There is a gap after the release of the [p, t] or [k] before the vocal folds start vibrating. So in the word clean /kli:n/, for example, there is a gap after the [k] sound before the vocal folds start vibrating for the voiced [l]. During this gap our mouths are already making the necessary [l] shape but there is no vocal fold vibration. We will hear the air rushing out of the mouth from the lungs. It will have an [h]-like quality, but it will have an [l]-like quality too. In this situation we say that the [l] has become devoiced. However, we do not call the hissing noise that we hear aspiration when it occurs during a consonant sound like [r, l, w, j].
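For readers who like to see a rule written out, the pattern described above can be sketched as a small Python function. This is only an illustration under simplifying assumptions: the function name `label_onset`, the symbol sets, and the tiny vowel inventory are all made up for this example, and the plosive is assumed to be at the beginning of a stressed syllable.

```python
# Simplified symbol sets for illustration only; not a full phonology of English.
VOICELESS_PLOSIVES = {"p", "t", "k"}
APPROXIMANTS = {"r", "l", "w", "j"}
VOWELS = {"a", "e", "i", "o", "u"}  # assumed, heavily reduced vowel inventory


def label_onset(plosive: str, following: str) -> str:
    """Name the effect of the voicing delay on the sound after the plosive.

    Assumes the plosive is at the beginning of a stressed syllable.
    """
    if plosive not in VOICELESS_PLOSIVES:
        raise ValueError(f"{plosive!r} is not one of p, t, k")
    if following in VOWELS:
        # The [h]-like noise at the start of a vowel is called aspiration.
        return "aspiration"
    if following in APPROXIMANTS:
        # The same noise during r, l, w, j is described as devoicing.
        return "devoicing"
    # The phonotactic rule: nothing else may follow p, t, k in an onset.
    raise ValueError(f"{following!r} cannot follow {plosive!r} at the start of a syllable")


print(label_onset("p", "u"))  # pool  -> aspiration
print(label_onset("k", "l"))  # clean -> devoicing
```

In both cases the underlying event is the same delay in voicing; the function only records which name is conventionally used for the result.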
The Original Poster's question
When [p, t] or [k] occur at the beginning of a stressed syllable, they cause a delay in the voicing of the following sound. This delay is known as the voice onset time. During the delay we will hear a devoiced version of the following vowel or consonant. There will be a hissing noise as the air escapes from the lungs before the vocal cords start to vibrate. This happens regardless of whether the following sound is a vowel or a consonant such as [r, l, w, j]. When the following sound is a vowel we call this hissing noise aspiration but when the following sound is a consonant such as [r, l, w] or [j], we just say that the consonant has become devoiced.
You can hear a nice example of this type of devoicing on this page by Geoff Lindsey.
I have used the [r] symbol to represent the sound we make when we say an English /r/. This is to make the post easily readable. However, the standard English /r/ sound is technically not [r], but [ɹ].