/article/discussion/arrayIndexes.php
Array Indexes Must Start At 1, Otherwise Cripled
- historical development
- mathematical extensions
- and ergonomics.
- I will also consider argumentation of others.
Performance difference
Even the low-level assembler is a compiled language - which neutralizes the dilemma completely because the
compiler can take care of the translation. If a language really should allow tackling problems relying on
such
micro-optimizations, syntactical constructs like [0 ]
and [1
]
can be used to address this.
Extendability of concept
Accessing items from the end of the array
If we begin array with a 0
, we have no choice, but to take -1
as the index of
first item from the end. Thinking in 0-based paradigm from the start and in 1-based paradigm from the
end -
that requires to be skilled in thinking in both. To start indexing with the 1
, allows a
simpler, symmetrical extension from domain of unsigned to the domain of signed numbers.
+0
and -0
do exist in certain types of numerical encodings1. This representation exists at the level of source code, but there is no concept which would capture this distinction at runtime.
And so 0-based arrays could hold the line, somehow3, yet the extensibility concept does not end there. An array does not consist all just of neatly ordered items.
Addressing space between items
Inserting an item between some other two is exactly that, addressing a gap. Usually, inserting is treated as a special operation, where the index passed as parameter, talks about inserting before the item of that index2. Yet there are two different semantics: inserting before and after.
The same goes for addressing slices. Does the starting index include the element, is the ending index "up to, not including"? Now there are four options, when one has no choice but to stick to integers. Floating points offer a solution.
The concept of 1-based array can straightforwardly encompass this - defining an item to be placed at
positions i+0.5
or i-0.5
is just that. The extension, is from the domain of
signed numbers to the domain of floating-point numbers.
The 0-based concept breaks down here. In it's native, the positive domain, inserting item
before 0
, would mean to insert it at -0.5
. Worse
than counterintuitive, collision occurs, when inserting item in the negative domain after the end, at
the index -1 + 0.5
.
Further numerical extensions
One basedness allows 0
to have special meaning, but also +Inf
,
-Inf
and NaN
can be difined to have a meaning. Languages might want to begin supporting
+0
and -0
floating point expressions.
2 There is no other choice in 0-baseness, explained in a later paragraph.
3 As well, IEEE floating-point supports
+0
and -0
.
Notation Comparism
Here are all different notations of iteration upon zero- and one-based arrays. I only list iterating through the positive indexes, as it is always simpler and using the negative domain does not lead to any advantages.
There are always two alternatives on how to notate an iteration: having a sharp and unsharp comparism. I included both for completeness, just note that the upper one of the pairs is the better/usual way to notate the range.
form
problems
special chars variation
(i=0; i<n; i++)
1
in 1 cluster
and 1 logical group
(i=0; i<=n-1; i++)
3
in 2 clusters
and 2 logical groups
(i=n-1; i>=0; i--)
3
in 2 clusters
and 2 logical groups
(i=n-1; i>-1; i--)
more groups than clusters
invalid index: -1
3
in 2 clusters
and 3 logical groups
(i=1; i<=n; i++)
2
in 1 cluster
and 1 logical group
(i=1; i<n+1; i++)
visually cumbersome: n+1
2
in 2 clusters
and 2 logical groups
(i=n; i>=1; i--)
2
in 1 cluster
and 1 logical group
(i=n; i>0; i--)
1
in 1 cluster
and 1 logical group
1-basedness iterations are beautifully symmetrical, using the same unsharp compare operator (larger-equal /
smaller-equal). This also shows how practical is, when the upper index is the same as element count. Imagine
every n
would be expressed as count(arr)
.
Visual simplicity makes reading and understanding simpler, especially boundaries which are in-sync with how they are expressed. This aids natural perception and direct expectancy fulfilment.
v.s.
Dijkstra
I found it was a rather quickly written note.
Dijkstra rooted himself in the first section about intervals, didn't put the two variations against each other anymore in the later parts and thus had no choice but arriving at 0-based index as the more natural one.
We sail apart right at the first argument, because having constants readable on the monitor is what I consider as expected, instead of thinking in length of runs. There are two subtractions which give correct length (expanding either the lower side or the upper side of the interval). Yet visually in-sync, is only one variation. The empirical evidence is 35 years old without a reference to inspect it.
My argumentation worked itself through multiple areas independently, and in each section found 1-based array to be a better variation in all areas.
Others
Most discussion pro 0-baseness contains reference to Dijkstra and simple arguments without analysis (like a naive argument on obvious performance).
1-based arguments often contain references to mathematical notation and pedagogical reasons reasoned by expectations and simplicity.
Conclusion
Well defined concepts which provide sound extensions - are in mathematics and it's applied areas, like programming, what it is all about. To me, 0-based arrays seem like a missed decision on decoupling of assembler language from binary code - which was bound on the hardware and it's structure by necessity. We are struggling with that ever since.
Links
Wiki of Cunningham & Cunningham: Zero And One Based Indexes
Wikipedia: Zero based numbering
Waiting for approval.
Waiting for approval.
Waiting for approval.
Waiting for approval.
Waiting for approval.
Waiting for approval.
Waiting for approval.
Waiting for approval.
Waiting for approval.
Waiting for approval.