Anyone familiar with Lua knows that Lua allows you to modify and delete elements in a for ... pairs
loop. pairs` loops to modify and delete elements in a table. There is nothing wrong with the following code:
However, if we delete elements and add new elements while traversing, there will be a problem. For example:
Running the above code will give you this error:
Anyone familiar with Lua will be familiar with this error. The solution is to avoid adding new elements to the table during the traversal. The official Lua documentation also makes it clear: if any value is assigned to a field in the table that does not exist during the traversal, the behavior of next
is undefined. But the error itself is interesting, as it involves many aspects of Lua’s mechanics, and I think it’s a good opportunity to understand Lua’s internal implementation.
Puzzling behavior
Why is it interesting? First of all this error doesn’t always occur (the documentation also says “behavior undefined”), for example the following code doesn’t have this problem:
On the other hand, we know that the for ... pairs
loop essentially calls next
at each iteration, passing in the table and the key of the previous iteration to get the key-value pair for the current loop.pairs loop is essentially calling next
at each iteration, passing in the table and the key of the previous iteration, and getting the key-value pair of the current loop. The meaning of the next(t, k)
function is that, for a given table t
, it returns the next key-value pair adjacent to the given key k
. If k
is the last element of t
, then nil
is returned; if k
is nil
, then the first element of t
is returned. What if k
is not in t
? Naturally, an error should be reported:
The familiar error. In that case, the code at the beginning of this article should also report an error, because when v
equals 1, the corresponding key is deleted, causing the next iteration to call next
to try to find the next element for a key that does not exist in the table. But not only does it work, the following code also works:
The second next
call returns normally, as if a
were still in t
. The error occurs when we delete and then insert a new element:
Of course there’s more fun, to get this error, you don’t have to insert a new element, sometimes it comes out with a GC:
It’s written in a strange way to accommodate certain Lua mechanisms. If you change the above code a bit, for example by changing next(t, 'a'...' b')
to next(t, 'ab')
, the error will not appear.
Implementation of next function
When calling next(t, k)
, k
should exist in t
; but it seems to work if k
has just been deleted. How can this be done? Let’s look at the implementation of the next
function:
|
|
You can see that luaH_next
first calls findindex
to find the position of a given key
in table t
, and then looks backwards from that position to the next element. In findindex
, several things are done:
- Line 5 determines that if
key
isnil
, it returns 0; - Lines 6 to 8, if
key
is a positive integer and is in the array field, return the array index; - Otherwise, look in the hash field. First line 11 gets the primary location of
key
; - In the loop from line 12 to 25, lines 14 to 16 check if the value of the current position is equal to
key
, if so, the lookup succeeds, line 19 returns directly; otherwise, line 21 checks the next position; - If the node equal to
key
is not found until the end of the chain, the search fails and an errorinvalid key to 'next'
is thrown on line 23.
As we saw earlier, calling next
immediately after an element is set to nil
, returns it normally. This means that findindex
has successfully found the position of the element. Notice the comment on line 13: “key may be dead already, but it is ok to use it in ’next’”, which means that a deleted key is fine for next
? With this in mind, let’s first see what happens when we execute t[k] = nil
.
|
|
We know that t[k] = nil
executes the OP_SETTABLE
instruction, so we find the lvm.c
where the luaV_execute
function executes OP_SETTABLE
, which is lines 27 to 32 above, and see that it does several things:
- Get the operands,
ra
,rb
, andrc
for table, key, and value, respectively; - Line 30 calls the macro
settableProtected
; - line 15
settableProtected
first tries to callluaV_fastset
; luaV_fastset
determines thatluaV_fastset
fails whent
is not a table (line 4), ork
is not int
(line 7), and returns 0. Otherwise, the value is assigned directly in line 9;- If
luaV_fastset
fails, thenluaV_finishset
is called instead at line 16.
When we delete an element using t[k] = nil
, k
is obviously in t
, so it will execute to line 9, assigning a value directly to slot
. Notice that slot
is actually the return value of luaH_get
, which will be equal to the value of k
. So when t[k] = nil
is executed, it will only assign the value of k
to nil
, and will not change the key. This also gives the next
function an opportunity to do so.
Let’s go back to luaH_next
. It does this in findindex
to determine if the current position is equal to key
. If our key is not a GCObject, such as a number or a boolean, it can be accessed even if it is deleted, and luaV_rawequalobj
always gives the correct result; even if the key is a GCObject, it can be compared properly as long as it is not GC’d. However, if the position of the key is occupied by an element inserted later, the lookup will fail. This is why it is possible to get an error like “invalid key to ’next’” when traversing both deleted and new elements.
When using the for ... pairs
loop, you don’t have to worry about the key being GC’d, because the iterator always holds a reference to the current key. So in a for ... pairs
loop, you can safely perform the delete operation.
Lua’s GC mechanism
(ttisdeadkey(gkey(n)) && iscollectable(key) && deadvalue(gkey(n)) == gcvalue(key))
is used to handle the case when the key is GC’d. If the current location is a deadkey, and key
is a GCObject, compare deadvalue(gkey(n)) == gcvalue(key)
, i.e. compare the address of the key at the current location (i.e. gkey(n)
) and the incoming key
to see if they are equal. So what is deadkey? This brings us to Lua’s GC mechanism.
Lua GC works by stringing all the GCObjects into a chain; then periodically scanning and marking all the objects in the VM, and the ones that are not marked are the ones that need to be recycled. Let’s look at the function that scans the table:
|
|
To be precise, this is a function that scans non-weak tables. This function is relatively straightforward, and it does the following:
- Lines 6 to 7 scan the array fields, all marked;
- Lines 8 to 17 scan the hash field, where:
- lines 10 to 11 check if the value of the current position is
nil
, and if so, callremoveentry
; - Otherwise lines 13 to 15 mark the key and value of the current location.
- lines 10 to 11 check if the value of the current position is
Objects that are not marked are reclaimed, so if the value of an element is nil
, it will be reclaimed in this GC. But before it is reclaimed, removeentry
is also called, and what does it do?
|
|
It will mark the key of the element as dead. Note that this marker actually sets the object’s type tt_
field to LUA_TDEADKEY
. This way when calling luaV_rawequalobj
to compare a deadkey with a regular object it always returns false. This way when deadvalue(gkey(n)) == gcvalue(key)
is executed, the key of element n
must have been GC’d.
For table, function, thread, userdata, two objects are equal if and only if their addresses are equal. When an object is GC’d, it doesn’t have any references, so it can’t be passed into next
.
This is not the case with strings, where the equality of two strings depends on the content of the string, not on its address. So when a string is GC’d, we can still construct a string that is equal to it, but with a different address. Passing the newly constructed string into next
, will cause an error. In the previous example, the reason for using 'a'...' b'' is because we want him to construct a different object. Just change any of the ''a'...' b'
anywhere in it, then the string ab'
will be in the constant table, so it won’t be GC’d; Lua has a short string reuse mechanism, so 'a'...' b'
and 'ab'
are actually the same object, so this error will not occur.
Conclusion
I was surprised when I first discovered that next
could pass in a key that had just been deleted, because a deleted key should be something like a wild pointer and should not be used anymore. But after a closer look, everything is fine. This made me feel the subtlety of Lua design once again. This little problem involves the Lua table implementation, the GC mechanism, the constant table mechanism, and short string reuse. It is rather boring to gnaw on Lua source code, so we can explore Lua implementation from some interesting phenomena.