Fixes a race condition when multiple threads try to acquire the GIL
before `detail::internals` have been initialized. `gil_scoped_release`
is now tasked with initializing `internals` (guaranteed single-threaded)
to ensure the safety of subsequent `acquire` calls from multiple threads.
Py_Finalize could potentially invoke code that calls `get_internals()`,
which could create a new internals object if one didn't exist.
`finalize_interpreter()` didn't catch this because it only used the
pre-finalize interpreter pointer status; if this happens, it results in
the internals pointer not being properly destroyed with the interpreter,
which leaks, and also causes a `get_internals()` under a future
interpreter to return an internals object that is wrong in various ways.
At this point, there is only a single test for interpreter basics.
Apart from embedding itself, having a C++ test framework will also
benefit the C++-side features by allowing them to be tested directly.