Switch over to BSP2. I'm not sure why so many people overlook this; perhaps it wasn't emphasized enough in the documentation. BSP2 is much more efficient at handling millions of instances.
Using my 3-4 year old 32-bit system with 2 GB of RAM, I can easily render over 600 instanced copies of your tree mesh.

(I stopped the render because I have work to do and couldn't tie up my PC for the full render).
Bottom line: when using a lot of instanced proxy objects, you should be using BSP2 for optimal results.