Hi folks. As some of you will know, we’ve recently started doing OpenSimulator load tests for the second OpenSimulator Community Conference (OSCC 2014) taking place on the weekend of November 8th/9th.
In this context, a load test means getting as many people as possible to log on to the keynote regions (where the keynote speeches are given) so that we can see how the system performs under heavy load. The keynote regions are
- cc.opensimulator.org:8002:keynote 1
- cc.opensimulator.org:8002:keynote 2
- cc.opensimulator.org:8002:keynote 3
- cc.opensimulator.org:8002:keynote 4
The tests take place every Tuesday at 8pm British Summer Time (12pm Pacific Time, 3pm US Eastern Time, 9pm Central European Summer Time) up until the conference. Getting real people to attend is very valuable since they (and their viewers) can behave in all sorts of ways that one doesn’t expect. However, if there aren’t enough real people available then the load can be supplemented with bots.
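If you want to help supply bots yourself, pCampbot can log in a batch of them from the command line. Something like the invocation below should work, though I’m quoting the flags from memory, so do check pCampbot’s own help output; the bot names and password are placeholders:

```
mono pCampbot.exe -botcount 20 -loginuri http://cc.opensimulator.org:8002 -firstname LoadBot -lastname Tester -password secret
```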
This year, I hope to write up some reports of these tests when I get the time. I think they will show the challenges faced in reliably reaching higher avatar concurrency, and the tools and code changes that evolve to meet those challenges.
One thing I wanted to do this year was to improve the data that we get from these tests for later analysis. In a system as complex as OpenSimulator, identifying why something went wrong under load is as big a challenge as actually fixing the problem.
We already have experimental statistical logging in OpenSimulator that can capture a range of statistics (e.g. threads available in the threadpool, root agents logged in) at regular intervals.
However, the challenge is then to analyze this data. To this end, I spent some time last week writing a set of Python scripts (known as Osta) that can produce summaries and graphs from statistical logs. The tools are still very rough around the edges but they were enough to give some good insights into the behaviour of the simulator during the load test.
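Osta isn’t polished enough to publish properly yet, but the heart of it is simple: pull a per-sample series out of a log and plot it. Here’s a minimal sketch of that step, assuming the stats have already been exported to CSV with one row per 5-second sample and a column per statistic (the file and column names here are made up for illustration; the real scripts parse OpenSimulator’s own log format):

```python
import csv

import matplotlib.pyplot as plt

def load_stat(path, stat_name):
    """Read one named statistic from a CSV export of the stats log."""
    with open(path, newline="") as f:
        return [float(row[stat_name]) for row in csv.DictReader(f)]

# Hypothetical file and column names
frame_ms = load_stat("keynote1-stats.csv", "SceneFrameTimeMs")

plt.plot(frame_ms)
plt.xlabel("sample (5 second intervals)")
plt.ylabel("frame time (ms)")
plt.title("Keynote 1 scene frame time")
plt.savefig("keynote1-frame-time.png")
```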
In the graphs below, samples are plotted along the x axis, with one sample taken every 5 seconds (so 720 samples correspond to one hour). The test period was just over an hour.
To proceed to the analysis itself: our peak number of connected avatars for this test was 291. The vast majority of these, over 200, were bots.
Connections were distributed fairly evenly, though practically all real users were on Keynote 1.
You can see from this and the previous graph that on occasion large numbers of avatars (around 50) disconnected simultaneously. This happened because instances of the bot test program (pCampbot in the OpenSimulator distribution) crashed with failures at the Mono level, causing large numbers of bot connections to time out at once. This is one problem that needs to be tackled.
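These mass disconnections are easy to spot mechanically by summing the root agent counts across the four regions and looking at the per-sample change. A sketch, again assuming hypothetical CSV exports with a RootAgents column:

```python
import pandas as pd

# Hypothetical CSV exports, one per keynote region, one row per 5-second sample
frames = [pd.read_csv("keynote%d-stats.csv" % n) for n in range(1, 5)]
total = sum(df["RootAgents"] for df in frames)

print("peak concurrency:", int(total.max()))

# Large negative per-sample changes mark mass disconnection events
drops = total.diff()
print(drops[drops <= -20])
```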
It’s interesting to correlate the root agents graph with the scene frame times for each keynote region.
The peaks represent points at which users would have experienced extreme movement lag because of the time that the scene loop was taking to complete each frame. Judging by eye, there may be some correlation with the mass bot timeouts. However, the analysis tools need to be improved to tell whether the high frame times occurred exactly when the disconnections happened, or slightly before or after.
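One obvious improvement along these lines is a lagged correlation check between frame time and the change in root agent count. This isn’t part of Osta yet; a sketch of the idea, assuming both series are NumPy arrays of per-sample values:

```python
import numpy as np

def lagged_correlation(a, b, max_lag):
    """Correlation between two equal-length per-sample series at offsets
    of -max_lag..+max_lag samples (5 seconds per sample). A peak at a
    positive lag means movements in `a` tend to precede those in `b`."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    corr = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[:len(a) - lag], b[lag:]
        else:
            x, y = a[-lag:], b[:len(b) + lag]
        corr[lag] = float((x * y).mean())
    return corr

# e.g. lagged_correlation(frame_time[1:], np.diff(root_agents), max_lag=24)
# scans a window of +/- 2 minutes; frame_time is trimmed by one sample so
# the two series stay the same length after the diff.
```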
Another interesting graph is one which shows how many inbound UDP packets were being received from connected viewers.
As you can see, the number of inbound packets processed dropped to zero at two points for Keynote 1. Again, this is very strongly correlated with the peaks in frame time on that same region. It’s also correlated with a peak in threads used from the SmartThreadPool for Keynote 1, as shown below.
In fact, the maximum number of threads allocated to this pool was 128, and we hit this limit at about sample 500 (roughly 42 minutes into the test). This would stop many inbound UDP packets from being processed, leading to an extremely laggy region from the user’s point of view.
Whilst this was certainly a problem, and the number of STP threads has been increased for the next load test, it doesn’t really explain why there was a simultaneous crash in IncomingUDPReceivesCount. This stat is recorded at a very low level of inbound UDP processing and uses the runtime IOCP threadpool (which has no capacity problem) rather than the STP threads. It will be interesting to see whether the same problem occurs in today’s load test now that the STP limit has been raised.
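For reference, raising the STP limit is a one-line configuration change. If memory serves, the relevant setting is MaxPoolThreads in the [Startup] section of OpenSim.ini; the value below is illustrative rather than the exact figure chosen for the next test:

```ini
[Startup]
    ; Maximum number of threads in the SmartThreadPool used for inbound
    ; packet handling and other asynchronous work. We hit the previous
    ; limit of 128 during this load test.
    MaxPoolThreads = 256
```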