I think the best idea would be to run multiple Lua instances, one per cog, and provide a library that allows for message passing between them.
This merits its own thread. Of course, you've mentioned the need for message passing elsewhere. And this is definitely something the P2 should handle well. So the question here is: what options do we have right now?
Like the P1, we can do Hub slots/mailboxes. Also like the P1, random access carries a timing penalty. Unlike the P1, we can use SETQ to transfer large messages more efficiently. You can also build a queue, which lets you send messages asynchronously. Because the Hub is a shared resource, multiple cogs can interact with the same message(s), either intentionally or unintentionally.
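To make the queue idea concrete, here is a minimal host-side sketch in Python, not P2 code. A plain list stands in for a region of Hub RAM, and the `HubQueue` class, its method names, and the one-sender/one-receiver restriction are all assumptions made for illustration. It only shows the bookkeeping a Hub-based queue would need; it says nothing about real Hub timing.

```python
# Host-side model only: a Python list stands in for a region of Hub RAM.
# HubQueue and its methods are hypothetical names, not a P2 or Lua API.
class HubQueue:
    """Single-sender, single-receiver ring buffer of LONG-sized slots."""

    def __init__(self, slots):
        # One extra slot stays empty so that head == tail always means "empty".
        self.ram = [0] * (slots + 1)
        self.head = 0  # next index to write; only the sender updates this
        self.tail = 0  # next index to read; only the receiver updates this

    def send(self, value):
        """Try to enqueue one LONG. Returns False if the queue is full."""
        nxt = (self.head + 1) % len(self.ram)
        if nxt == self.tail:
            return False
        self.ram[self.head] = value & 0xFFFFFFFF  # truncate to 32 bits
        self.head = nxt  # publish the slot only after the data is written
        return True

    def receive(self):
        """Try to dequeue one LONG. Returns None if the queue is empty."""
        if self.head == self.tail:
            return None
        value = self.ram[self.tail]
        self.tail = (self.tail + 1) % len(self.ram)
        return value
```

Because each index is written by only one side, this particular layout would not need a lock between one sender and one receiver. With several senders on the same queue you are back to the shared-resource problem described above.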
Like the P1, we can use locks as flags. Also like the P1, these are slow hub operations. Unlike the P1, there are 16 locks instead of 8. Also unlike the P1, cogs can efficiently INT/WAIT/POLL them via selectable events. Unlike Hub messaging, locks are limited to flags/events (1-bit messages) and cannot be used asynchronously (for the purpose of "message passing").
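As a rough illustration of "locks as flags", here is a small Python model of a 16-entry lock pool. This is a sketch under assumptions: `LockPool`, `try_lock`, and `release` are made-up names, and the rule that only the holding cog can release a lock is part of this model rather than something stated above. The point is just that each lock carries one bit of information (taken or free).

```python
# Host-side model only; LockPool and its methods are hypothetical names.
class LockPool:
    NUM_LOCKS = 16  # the P2 count given above; the P1 has 8

    def __init__(self):
        # None means the lock is free; otherwise it holds the owning cog's id.
        self.owner = [None] * self.NUM_LOCKS

    def try_lock(self, lock_id, cog):
        """Take the lock if it is free. Returns True on success."""
        if self.owner[lock_id] is None:
            self.owner[lock_id] = cog
            return True
        return False

    def release(self, lock_id, cog):
        """Free the lock. In this model only the holding cog may do so."""
        if self.owner[lock_id] == cog:
            self.owner[lock_id] = None
            return True
        return False
```

A cog that fails `try_lock` learns only that someone else holds the flag, which is why this works for signalling but not for carrying a payload.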
For paired cogs, we can do message passing using the shared LUT. This can be treated similarly to Hub message passing, but without the overhead cost. Further, if a message fits into a single LONG, you can use selectable events to INT/WAIT/POLL for messages. Shared LUT also has the added advantage that its contents are protected from being read or modified by other cogs. However, there is less overall space for messages than with Hub messaging, and you are limited to communicating with your paired cog.
The new ATN functionality also provides a simple form of messaging. Since ATN can be detected by INT/WAIT/POLL, it can be used by itself, or in combination with Hub or Shared LUT messaging. On its own, ATN has limitations similar to those of locks, but without the overhead cost.
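The combination of ATN with a Hub mailbox can be sketched as "write the payload, then raise attention". The Python below is only a host-side model: `Cog`, `send`, `wait_message`, and `demo` are hypothetical names, a `threading.Event` stands in for the attention flag, and an ordinary attribute stands in for the mailbox slot.

```python
import threading

# Host-side model only; none of these names are P2 or Lua APIs.
class Cog:
    """Stand-in for one cog: an attention flag plus one mailbox slot."""

    def __init__(self):
        self.atn = threading.Event()  # models the 1-bit attention signal
        self.mailbox = None           # models a slot in Hub RAM


def send(target, payload):
    target.mailbox = payload  # write the payload first
    target.atn.set()          # then signal that something is waiting


def wait_message(cog, timeout=1.0):
    """Block until attention is raised, then read the mailbox."""
    if not cog.atn.wait(timeout):
        return None           # nothing arrived within the timeout
    cog.atn.clear()
    return cog.mailbox


def demo():
    receiver = Cog()
    results = []
    worker = threading.Thread(
        target=lambda: results.append(wait_message(receiver))
    )
    worker.start()            # the receiver starts waiting on its flag
    send(receiver, "hello")
    worker.join()
    return results
```

The signal itself still carries no data, so the receiver has to know where to look. Here that is a fixed mailbox per cog, which is one assumption among several possible layouts.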
You can even use the smart pins! In this case, each pin can store a LONG. You can then use INT/WAIT/POLL and read them. The advantage of using these over Hub is that they are fast. The advantage of using these over Shared LUT is that you can pass messages to any cog, or even multiple cogs. With 64 smart pins, it is fairly likely that you will have some unused smart pins that could be repurposed for messaging. However, this is still a limited resource, and a smart pin is not available for messaging if it is otherwise in use.
So... is there any other way that I've missed?