I grew up on Z80 and x86 assembly. I happily moved data into my registers from anywhere in memory and back without ever giving it a second thought. Life was nice and simple.
It wasn't until years later, when I started dealing with caches, vector units, and different architectures that data alignment became a big deal. Today, data alignment is an inescapable reality, and we all have to deal with it one way or another.
Data alignment refers to where data is located in memory. All data is at least 1-byte aligned, meaning that it starts at any one byte in memory. Data that is n-byte aligned is located somewhere with a memory address that is an exact multiple of n bytes.
We care about alignment for a single reason: Performance.
In modern hardware, memory can only be accessed on particular boundaries. Trying to read data from an unaligned memory address can result in two reads from main memory plus some logic to combine the data and present it to the user.
Considering how slow main memory access is, that can be a major performance hit. Some platforms such as the PowerPC choose not to hide such an inefficient use of hardware and simply raise a hardware exception flagging the invalid memory access. Neither scenario is particularly attractive.
Even when reading from fast, level-one cache memory, unaligned access to data will use up more cache lines, making poor use of the available memory and evicting data that might be accessed again. As a result, sections of your code can thrash the cache and become major performance bottlenecks.
Vector operations usually force programmers to deal with data alignment explicitly. To squeeze more performance without increasing the frequency of CPU cores, hardware designers have been putting more emphasis on vector operations (which are just a type of SIMD instruction -- single instruction multiple data).
One vector instruction can operate on four or more data points at once, considerably improving the performance of a program that operates over each data point serially. Vector units often have a restriction that they can only work on data aligned on a particular boundary.
For example, Intel's and AMD's SSE vector instructions operate on data aligned on 16-byte boundaries, so it is the programmer's responsibility to align the data properly before feeding it to the vector units.
In general, the more you interface with the hardware directly, the more careful you will have to be about the alignment of your data.