The difference is huge. I'll spare you the LMGTFY answer, although you really should try that sometime.
In a cluster, each machine is largely independent of the others in terms of memory, disk, etc. They are interconnected using some variation on normal networking. The cluster exists mostly in the mind of the programmer and in how they choose to distribute the work.
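To make "the programmer chooses how to distribute the work" concrete, here's a minimal sketch using MPI, the de facto message-passing library on clusters. The problem (summing integers) and the sizes are just placeholders; the point is that each process works on its own private memory and results only move by explicit messages over the network:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which machine am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many machines total? */

    /* Each node computes a partial result in its own local memory --
       nothing is shared; everything else travels over the network. */
    long local = 0;
    for (long i = rank; i < 1000000; i += size)
        local += i;

    /* Explicitly ship the partial results to rank 0 and combine them. */
    long total = 0;
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %ld\n", total);

    MPI_Finalize();
    return 0;
}
```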
In a Massively Parallel Processor (MPP), there really is only one machine, with thousands of CPUs tightly interconnected. MPPs have exotic memory architectures that allow extremely high-speed exchange of intermediate results with neighboring processors.
The major variants are SIMD (Single Instruction, Multiple Data) and MIMD (Multiple Instruction, Multiple Data). In a SIMD system, every processor is executing the same instruction at the same time, only on different bits of memory. Essentially, there is only one Program Counter (PC). In a MIMD machine, each CPU has its own PC.
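A rough way to see the two models on an ordinary machine (purely illustrative, nothing MPP about it): the OpenMP `simd` pragma below asks for lockstep SIMD lanes inside a single instruction stream, while the `parallel` region gives each thread its own program counter, free to diverge, which is the MIMD picture:

```c
#include <omp.h>
#include <stdio.h>

#define N 1024
float a[N], b[N], c[N];

int main(void) {
    /* SIMD: one instruction stream, one program counter; each
       vector instruction operates on several array elements at once. */
    #pragma omp simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    /* MIMD: several independent instruction streams; each thread has
       its own program counter and can take a different code path. */
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        if (id % 2 == 0)
            printf("thread %d is summing\n", id);
        else
            printf("thread %d is off doing something else\n", id);
    }
    return 0;
}
```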
MPPs can be a bitch to program and are of use only on algorithms that are embarrassingly parallel (that's actually what they call it). However, if you have such a problem, then an MPP can be shockingly fast. They are also incredibly expensive.
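"Embarrassingly parallel" just means the work splits into pieces that need essentially no communication, the classic toy case being a Monte Carlo estimate of pi: every worker throws its own darts, and only the final counts get combined. A quick sketch (OpenMP on one machine, sample count arbitrary):

```c
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const long darts = 10000000;   /* arbitrary sample count */
    long hits = 0;

    /* Each thread throws darts with its own RNG state; there is no
       communication at all until the single reduction at the end --
       that's what makes it "embarrassing". */
    #pragma omp parallel reduction(+:hits)
    {
        unsigned seed = 1234u + omp_get_thread_num();
        #pragma omp for
        for (long i = 0; i < darts; i++) {
            double x = rand_r(&seed) / (double)RAND_MAX;
            double y = rand_r(&seed) / (double)RAND_MAX;
            if (x * x + y * y <= 1.0)
                hits++;
        }
    }
    printf("pi ~= %f\n", 4.0 * hits / darts);
    return 0;
}
```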
A cluster is a bunch of machines, usually on an Ethernet interconnect (read: a network), each running its own separate copy of an OS, which together happen to serve a single purpose.
An MPP supercomputer usually implies a proprietary, very fast interconnect (e.g. SGI NUMALink) that supports either Distributed Shared Memory (processes running on different MPP nodes use shared memory over the fast interconnect to share data as if they were running on a single computer) or even a Single System Image (a single instance of an operating system, mostly Linux, running across all the nodes at once as if on a single machine - e.g. "ps aux" on any node will show you all the processes running on the whole MPP).
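To the programmer, DSM just looks like ordinary shared memory stretched across nodes. As a single-machine stand-in (the "/mpp_demo" name is made up), here's the plain POSIX shared-memory idiom; on an SSI system the same idiom would hold even if the two processes landed on different physical nodes:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* Create a shared memory object; "/mpp_demo" is an arbitrary name. */
    int fd = shm_open("/mpp_demo", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(int));
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);

    if (fork() == 0) {           /* child: writes into shared memory */
        *shared = 42;
        _exit(0);
    }
    wait(NULL);                  /* parent: sees the child's write */
    printf("parent read %d\n", *shared);

    shm_unlink("/mpp_demo");
    return 0;
}
```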
As you can see, the definition is quite fluid; it's more a question of scale than of clear-cut differences.