Sunday, December 27, 2009

Why is it good that the index of the first element in arrays is zero in a lot of programming languages?

The answers before mine are pretty decent, except for their atrocious pointer math! :-)





address + (i * 32) -- How many bits in a byte? Addresses count bytes, so a 32-bit element needs a stride of 4, not 32. Pointer math with arrays of structures can also be a little tricky, because the compiler may insert extra bytes of padding between members for alignment reasons.





v[i] == *(v + i) -- This is only correct for an array of bytes.





In C/C++, using sizeof() is a much better practice than using magic numbers for type sizes in pointer math expressions. When working at the byte level, the following is true for any element type, since sizeof already includes any padding:

(char *)&v[i] == (char *)v + (sizeof(*v) * i)





But back to the original question, why is it good? Well, it's good when things are intuitive and consistent -- wait, who said it was good?





I think it's good in C/C++ because you always know where you stand, all arrays are zero-index-based, no ambiguity.





On the other hand, I also think it's good that VB6 arrays are so flexible; they can be declared with any integer as the index base:





Dim a(47 To 100) As Integer





That declares an array with base index of 47. But by the same token, it always strikes me as a little freaky that this array:





Dim a(100) As Integer





has 101 elements, not 100, because the default base is 0, and the number 100 refers to the upper-bound of the array, not the number of elements.





So I'm not sure that "good" or "not good" strictly applies. It seems that many if not most recently developed languages are going with zero-based indexing; my guess is that the primary motivation is just plain old consistency.
Addendum: Note that my statement:





>> v[i] == *(v + i) -- This is only correct for an array of bytes. <<





Is not, in fact, correct. The compiler automatically scales pointer addition and subtraction by the size of the pointed-to type. My bad. Apologies for any confusion this may cause.
Why is it good that the index of the first element in arrays is zero in a lot of programming languages?
Because that is how the first number is represented in binary.





It is a relic of a time when resources were extremely limited, and so programmers had to work around them. There was no room for making it user friendly.
In C, an array variable behaves much like a pointer: in most expressions it yields the address of the first element in the array. An array itself is a contiguous memory area, i.e., in memory, element 1 is right after element 0, element 2 is right after element 1, etc.





It is easier to use a 0-based element list because all you have to do is take the starting address and add the index multiplied by however much space each element takes.





So, for example, if I have an array of 32-bit integers, and my array starts at some (phony decimal equivalent) address of 1000, then element 0 is at 1000 + (0 * 32) = 1000, element 1 is at 1000 + (1 * 32) = 1032, element 2 is 1000 + (2 * 32) = 1064, etc.





This holds true even if I have an array of some odd structure or object.
Is 0-based indexing good?





I wonder. In the '70s and '80s, some BASIC interpreters had an OPTION BASE statement that let you switch the base (OPTION BASE 0 or OPTION BASE 1). Early FORTRAN (before FORTRAN 77) used base 1; FORTRAN 77 lets you decide (per array index).





C was the language that really made base 0 indexing popular: the equivalence of v[i] and *(v + i) made it clear. 'v' was no more than the address of the base of the vector/array.





I tend (mostly) to prefer 1-based indexing -- after all, we refer to the FIRST character of a string, and the FIRST chapter of a book. It tends to make the code a bit more self-documenting.





Of course there are some "computer purists" who insist on "Chapter 0"! However, imagine an editor that needs to represent a character position: what is the ordinal name of the position BEFORE the first character? Generally, I would refer to THAT as 0, and not the character itself. This reasoning leads one to basing on 1. After all, the language should model the problem domain. After that, the next step is to allow independent specification of the index range. It's in FORTRAN 77, damn it, and THAT was 30 years ago!





The only language in which 0-based indexing is defensible is C (and, by extension, C++), but those are almost "to the iron" languages.





Hope this helps,
