Save game size reduction

Posted: Sat Sep 04, 2021 9:19 am
by Cjkjvfnby
I want to reduce the save game size. It affects save game loading and saving (autosave is enabled by default).
Along the way, I also want to reduce the game state size, so gameplay performance could benefit from it too.

The save for turn 284 is 8 MB compressed and 172 MB uncompressed.

Let's start with something simple:

Do we disable pretty XML format in a compressed state?

When XML compression is disabled, the output is formatted with line breaks and tabs.
Here are the most frequent characters in the uncompressed file (turn 284). Guess the most common tag in the save :)

Code: Select all

'\t', 56401955
'<', 9210448
'>', 9210448
'e', 8553221
'0', 8056579
't', 7018587
'i', 6510353
'\n', 6118236
's', 4878465
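
A count like the one above can be reproduced with a small byte histogram. This is a minimal sketch, not the tool that produced the numbers in this post; `TopBytes` is a hypothetical helper name, and reading the save file into a string is left out.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Count how often each byte value occurs in `data` and return the `top_n`
// most frequent ones, most frequent first (ties broken by byte value).
std::vector<std::pair<unsigned char, std::size_t>>
TopBytes(const std::string& data, std::size_t top_n)
{
    std::array<std::size_t, 256> counts{};  // zero-initialized histogram
    for (unsigned char ch : data)
        ++counts[ch];

    // Keep only the byte values that actually occur.
    std::vector<std::pair<unsigned char, std::size_t>> result;
    for (std::size_t b = 0; b < counts.size(); ++b)
        if (counts[b] > 0)
            result.emplace_back(static_cast<unsigned char>(b), counts[b]);

    std::sort(result.begin(), result.end(),
              [](const auto& lhs, const auto& rhs) {
                  if (lhs.second != rhs.second)
                      return lhs.second > rhs.second;  // higher count first
                  return lhs.first < rhs.first;        // then lower byte value
              });
    if (result.size() > top_n)
        result.resize(top_n);
    return result;
}
```

Taken together, the tabs and newlines listed above add up to 62,520,191 bytes, which is roughly a third of the 172 MB uncompressed file.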

Re: Save game size reduction

Posted: Sat Sep 04, 2021 10:22 am
by Geoff the Medio
Cjkjvfnby wrote: Sat Sep 04, 2021 9:19 amDo we disable pretty XML format in a compressed state?
I'm not sure what you mean by that. The XML format text before compression is generated by Boost Serialization. I don't think (but am unsure) there is an option to change its format to be less verbose, so before compression it will look like

Code: Select all

<m_meters>
	<count>25</count>
	<item_version>0</item_version>
	<item>
		<first>0</first>
		<second>
			<c>0.000000000e+00</c>
			<i>0.000000000e+00</i>
		</second>
	</item>
	<item>
		<first>1</first>
		<second>
			<c>0.000000000e+00</c>
			<i>0.000000000e+00</i>
		</second>
	</item>
	<item>
		<first>2</first>
		<second>
			<c>0.000000000e+00</c>
			<i>0.000000000e+00</i>
		</second>
	</item>
...
It's probably possible to switch to another serialization library, but that will be a somewhat large task. Binary saves are also an option if you don't need portability, but they tend to be bigger than the compressed XML.

Re: Save game size reduction

Posted: Sat Sep 04, 2021 4:57 pm
by Cjkjvfnby
Geoff the Medio wrote: Sat Sep 04, 2021 10:22 am
I'm not sure what you mean by that.
I mean that this XML could be written on a single line, without line breaks and indentation. XML printers usually have a setting for that. Could you point me to the place where we write XML? This could save us about 30% of the save file.
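
To make the idea concrete: the sketch below removes line breaks and leading indentation as a post-processing step on the generated text. It is not a Boost.Serialization option (whether one exists is not settled in this thread), and `StripIndentation` is a hypothetical name. It assumes that whitespace at the start of a line is never significant, which matches the indented output shown earlier but is not true of XML in general.

```cpp
#include <string>

// Remove line breaks and the indentation that follows them.
// Assumption: whitespace at the start of a line is never significant
// (this would break multi-line text content, for example).
std::string StripIndentation(const std::string& xml)
{
    std::string out;
    out.reserve(xml.size());
    bool at_line_start = true;
    for (char ch : xml) {
        if (ch == '\n' || ch == '\r') {
            at_line_start = true;   // drop the line break itself
            continue;
        }
        if (at_line_start && (ch == '\t' || ch == ' '))
            continue;               // drop indentation
        at_line_start = false;      // first real character of the line
        out += ch;
    }
    return out;
}
```

Spaces inside element content, such as an object name, are kept because only whitespace at the start of a line is dropped.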

Re: Save game size reduction

Posted: Sat Sep 04, 2021 5:30 pm
by Geoff the Medio
Cjkjvfnby wrote: Sat Sep 04, 2021 4:57 pmCould you point me to the place where we write XML?
Not really... it's a bit complicated.

Individual classes that are serialized have their own serialize functions, which can in turn call other nested classes' serialize functions. An innermost such class that is relevant to the XML above is Meter::serialize:

Code: Select all

template <typename Archive>
void Meter::serialize(Archive& ar, const unsigned int version)
{
    if (Archive::is_loading::value && version < 1) {
        ar  & BOOST_SERIALIZATION_NVP(m_current_value)
            & BOOST_SERIALIZATION_NVP(m_initial_value);
    } else {
        // use minimum size NVP label to reduce archive size bloat for very-often serialized meter values...
        ar  & boost::serialization::make_nvp("c", m_current_value)
            & boost::serialization::make_nvp("i", m_initial_value);
    }
}
But note that it's templated on the Archive type, which will be a particular Boost.Serialization archive class; instances of these are created, e.g., here:

Code: Select all

freeorion_xml_oarchive xoa(s_sink);
where freeorion_xml_oarchive is a typedef of boost::archive::xml_oarchive. In typical Boost / C++ (debatably overengineered) fashion, that's a complicated inherited type that is difficult to dig much deeper into. Archive is a template type because the serialize functions above are used for several different types of serialization, including writing to disk / network, reading from disk / network, and into/from binary or xml-text formats. For the XML text case, you'd just need to investigate the xml archive classes, or perhaps some of the serialize functions.
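
The pattern of one templated serialize function serving several archive types can be illustrated without Boost. Everything in the sketch below (`ToyNvp`, `MakeToyNvp`, `ToyXmlOut`, `ToyTextOut`, `ToyMeter`) is a hypothetical stand-in written for illustration, not the Boost or FreeOrion implementation, and it only covers saving.

```cpp
#include <sstream>
#include <string>

// Toy name-value pair: a label plus a reference to the value.
template <typename T>
struct ToyNvp {
    const char* name;
    T& value;
};

template <typename T>
ToyNvp<T> MakeToyNvp(const char* name, T& value)
{ return ToyNvp<T>{name, value}; }

// Toy archive that writes each pair as <name>value</name>.
struct ToyXmlOut {
    std::ostringstream os;
    template <typename T>
    ToyXmlOut& operator&(const ToyNvp<T>& nvp) {
        os << '<' << nvp.name << '>' << nvp.value << "</" << nvp.name << '>';
        return *this;
    }
};

// Toy archive that writes only the values; the names are ignored.
struct ToyTextOut {
    std::ostringstream os;
    template <typename T>
    ToyTextOut& operator&(const ToyNvp<T>& nvp) {
        os << nvp.value << ' ';
        return *this;
    }
};

struct ToyMeter {
    float current = 0.0f;
    float initial = 0.0f;

    // One function body is instantiated for every archive type.
    template <typename Archive>
    void serialize(Archive& ar) {
        ar & MakeToyNvp("c", current)
           & MakeToyNvp("i", initial);
    }
};
```

In this toy version the output layout is decided entirely inside the archive's operator&, which is why changing the XML formatting means looking at the archive class rather than at the individual serialize functions.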

Maybe this will help: https://www.boost.org/doc/libs/1_69_0/l ... ppers.html

Re: Save game size reduction

Posted: Sat Sep 04, 2021 9:53 pm
by Cjkjvfnby
Geoff the Medio wrote: Sat Sep 04, 2021 5:30 pm
Cjkjvfnby wrote: Sat Sep 04, 2021 4:57 pmCould you point me to the place where we write XML?
Not really... it's a bit complicated.
Agreed, it's complicated and weird. Looks like we don't have a good way to control it.
Do we use the same serialization for sending state to AI?

Re: Save game size reduction

Posted: Sat Sep 04, 2021 10:49 pm
by Geoff the Medio
Cjkjvfnby wrote: Sat Sep 04, 2021 9:53 pmAgreed, it's complicated and weird.
Complicated yes, though weird is debatable. Doing very general serialization that can handle various primitive, (smart) pointer, polymorphic, container, and composite types with multiple different formats on multiple different operating systems is not easy.
Looks like we don't have a good way to control it.
We could in theory write our own Archive class, but I certainly don't want to. There is also a semi-supported portable binary format archive, but it doesn't support floating point types, which are very difficult to get working portably.
Do we use the same serialization for sending state to AI?
Yes, sort of. If the AI or human clients are running on the same operating system as the server, then it will try to send the gamestate info as binary serialized data, not XML, since in that case there shouldn't be portability issues with the faster binary format.

Re: Save game size reduction

Posted: Tue Sep 07, 2021 5:52 am
by Cjkjvfnby
Geoff the Medio wrote: Sat Sep 04, 2021 10:49 pm We could in theory write our own Archive class, but I certainly don't want to.
Maybe we could try that for sequences, since this is the most used class in the XML. And XML is used for server-client communication in multiplayer. Chars "items" took about 25 MB in the file. We could replace it with "i" and save 20 of them (0.5 MB in packed xml). Looks like it should not be so hard to do it https://stackoverflow.com/questions/430 ... 2#43055862

Re: Save game size reduction

Posted: Tue Sep 07, 2021 5:57 am
by Cjkjvfnby
I am not sure whether we use, or could benefit from, multithreading for this task. In theory, processing some sections in parallel and writing them in sequence could speed up saving a bit.

Code: Select all

ar & make_nvp(a) & make_nvp(b) -> x, y = parallel(make_nvp(a), make_nvp(b)); ar & x & y

Re: Save game size reduction

Posted: Tue Sep 07, 2021 6:02 am
by Cjkjvfnby
Last but not least, we should review all serialized objects and their properties, and check whether we can remove some of them or use a less memory-consuming type.

I think it's better to have a call and do it together. Geoff, what do you think?

Re: Save game size reduction

Posted: Tue Sep 07, 2021 6:11 am
by Cjkjvfnby
For example, the universe object's m_x, a double (64 bits), could be changed to a float (32 bits). It could even be multiplied by 1000 and stored as a short (16 bits).

Re: Save game size reduction

Posted: Tue Sep 07, 2021 9:50 am
by Geoff the Medio
Cjkjvfnby wrote: Tue Sep 07, 2021 5:52 amAnd XML is used for server-client communication in multiplayer.
Non-binary-compatible client-server messages could be sent with compressed XML instead of raw text XML as well, if volume of data being sent is an issue. This should produce a smaller message size than changing the XML label sizes, at the cost of a bit of time unpacking the message.
Chars "items" took about 25 MB in the file. We could replace it with "i" and save 20 of them (0.5 MB in packed xml).
Specifically the <item> XML tags are one of the more difficult cases to change. These appear in numerous complicated bits of the Boost templated serialization code for standard containers, which I suspect would be error prone to change. I could just copy it all and change that string for a template specialization, but I'd be worried it would lead to compatibility issues with different versions of Boost.
Looks like it should not be so hard to do it https://stackoverflow.com/questions/430 ... 2#43055862
The example there is a much simpler case: serialization of std::pair. That would probably be possible, as there is no versioning of pair itself and no tracking of the number of elements it contains, as is needed for the container classes (which use "item" a lot). The "first" and "second" can probably be changed to "f" and "s", and would probably substantially reduce the size of the XML text, since those two appear a lot, for all associative containers. I did this some time ago for the similar Meter class.
Cjkjvfnby wrote: Tue Sep 07, 2021 5:57 amI am not sure whether we use, or could benefit from, multithreading for this task. In theory, processing some sections in parallel and writing them in sequence could speed up saving a bit.

Code: Select all

ar & make_nvp(a) & make_nvp(b) -> x, y = parallel(make_nvp(a), make_nvp(b)); ar & x & y
It's not that simple. The make_nvp calls are just creating a temporary local pair of pointers, not doing any substantial (de)serialization work (ie. converting to text or creating objects in memory and assigning values after converting back from text). All that real work happens in operator&, so that's what would need to be done in parallel. But operator& modifies the archive it's called on, essentially to append serialized info to the archive or to consume serialized info to recreate the previously-serialized data. It thus can't be called in parallel on the same archive. Rather, one would need to have two separate archive objects and serialize stuff to and from them independently in parallel, probably starting at a high level, rather than the low level as suggested in your snippet. I started testing that last month: https://github.com/freeorion/freeorion/ ... lSerialize
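
A minimal sketch of the "separate archive per section" idea, using plain string buffers in place of Boost archives. `SectionWriter` and `SerializeSectionsInParallel` are hypothetical names, and this is not the code on the branch linked above; it only shows that each task needs its own private output object and that the results are joined sequentially afterwards.

```cpp
#include <functional>
#include <future>
#include <sstream>
#include <string>
#include <vector>

// A hypothetical top-level section: a function that writes itself to a stream.
using SectionWriter = std::function<void(std::ostream&)>;

// Serialize every section into its own buffer in parallel, then join the
// buffers in the original order so the output is deterministic.
std::string SerializeSectionsInParallel(const std::vector<SectionWriter>& sections)
{
    std::vector<std::future<std::string>> pending;
    pending.reserve(sections.size());
    for (const SectionWriter& write : sections) {
        pending.push_back(std::async(std::launch::async, [&write] {
            std::ostringstream buffer;   // private to this task, never shared
            write(buffer);
            return buffer.str();
        }));
    }

    std::string result;
    for (auto& fut : pending)
        result += fut.get();             // sequential, order-preserving append
    return result;
}
```

The sections are assumed to be independent of each other; anything that relies on state shared across sections would need extra care.
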
Cjkjvfnby wrote: Tue Sep 07, 2021 6:11 amFor example universe object m_x double(64) could be changed to float(32).
I'd be hesitant, as lowering the precision of positions can lead to weirdness for objects further from the centre of the universe coordinate space. Most other object state is tracked with float rather than double, but this specific case was a bit different due to the way positions are used. The choice of double instead of float for object positions was made considering main memory requirements, rather than size of serialized text representation, though.
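
One likely source of that weirdness is that the gap between adjacent representable floating point values grows with magnitude. The sketch below measures that gap; the function names are hypothetical and the coordinate 1024 is only an illustrative value, not a known FreeOrion universe size.

```cpp
#include <cmath>
#include <limits>

// Gap between `value` and the next representable float above it.
float FloatSpacingAt(float value)
{ return std::nextafter(value, std::numeric_limits<float>::infinity()) - value; }

// Same measurement for double.
double DoubleSpacingAt(double value)
{ return std::nextafter(value, std::numeric_limits<double>::infinity()) - value; }
```

Near 1024 the float spacing is 2^-13 (about 0.00012), against 2^-42 (about 2.3e-13) for double, and the gap doubles each time the coordinate crosses a power of two.
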
Or even it could be multiplied by 1000 and changed to short (16).
Converting to and from a fixed-point representation just for serialization doesn't make sense, I think, as it would lead to actually different results between client and server or before and after saving and loading. Switching to a fixed point representation of object positions for the game mechanics internally might work, though I'm hesitant to do that as it's not trivial to implement and makes a lot of calculations more tricky to get right without a bunch of awkward conversions to and from floating point types. Maybe it'd be enough to store the positions internally as ints and have them be converted and scaled to and from floats whenever the rest of the game logic uses the values, but that's a lot of ifs for questionable space savings vs. just compressing the text before sending over the network...

Edit: did a test implementation of storing positions internally as scaled int but using double values for all calculations of object positions: https://github.com/freeorion/freeorion/commits/IntPos which makes UniverseObject XML look like

Code: Select all

<UniverseObject class_id="87" tracking_level="0" version="3">
    <m_id>0</m_id>
    <m_name>Alnath α</m_name>
    <x>145544</x>
    <y>240571</y>
    <m_owner_empire_id>-1</m_owner_empire_id>
    <m_system_id>-1</m_system_id>
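
A rough sketch of the "store as scaled int, calculate with double" approach follows. It is not the code on the IntPos branch: the class name, the accessor names and the scale factor of 1000 are assumptions made for illustration (the scale actually used is not stated in this thread).

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical scale: 1000 stored units per coordinate unit.
constexpr double kPositionScale = 1000.0;

class ScaledPosition {
public:
    // Round to the nearest stored unit when a position is assigned.
    void Set(double x)
    { m_x = static_cast<std::int32_t>(std::llround(x * kPositionScale)); }

    // Game logic reads the value back as a double.
    double Get() const { return m_x / kPositionScale; }

    // The integer that would be written to the save file.
    std::int32_t Stored() const { return m_x; }

private:
    std::int32_t m_x = 0;
};

// Largest coordinate a signed 16-bit value could hold at this scale.
constexpr double kMaxInt16Coordinate = 32767 / kPositionScale;
```

Because the stored integer is the authoritative value, saving and loading reproduces it exactly, so client and server see the same position. At a scale of 1000, a 16-bit short would only reach 32.767, so a wider integer type is used here.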

Re: Save game size reduction

Posted: Sun Sep 12, 2021 8:20 am
by Cjkjvfnby
The "first" and "second" can probably be changed to "f" and "s", and would probably substantially reduce the size of the XML text, since those two appear a lot, for all associative containers.
Looks like an easy way to get a bit more size reduction.

Some stats on tag counts:

Code: Select all

{
    "item": 905343,
    "first": 659055,
    "second": 659055,
    "count": 250661,
    "item_version": 250640,
    "i": 233983,
    "c": 230651,
    "px": 112590,
    "CombatEvent": 90113,
    "events": 63153,
}
PS. In Python, we often use k, v as names.
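
A back-of-the-envelope estimate of what the rename would save in the uncompressed text, as a small sketch. `BytesSavedByRename` is a hypothetical helper; it assumes every counted element has both an opening and a closing tag, and it says nothing about the compressed size.

```cpp
#include <cstddef>
#include <string>

// Bytes saved in uncompressed XML by renaming a tag: each element has an
// opening and a closing tag, so the name appears twice per element.
std::size_t BytesSavedByRename(std::size_t element_count,
                               const std::string& old_name,
                               const std::string& new_name)
{
    if (new_name.size() >= old_name.size())
        return 0;   // the new name is not shorter, nothing is saved
    return element_count * 2 * (old_name.size() - new_name.size());
}
```

With the counts above, "first" to "f" gives 659055 * 2 * 4 = 5,272,440 bytes and "second" to "s" gives 659055 * 2 * 5 = 6,590,550 bytes. That is about 11.9 MB in total, or roughly 7% of the 172 MB uncompressed file.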

Re: Save game size reduction

Posted: Sun Sep 12, 2021 8:38 am
by Cjkjvfnby
Rather, one would need to have two separate archive objects and serialize stuff to and from them independently in parallel,
I'll try to describe the same thing in different words.

A zip archive can hold multiple files. Technically, it stores the packed files plus a table of contents with names and offsets. So we could compose one file from multiple sources (in our case, the first-level tags). Writing to it is still sequential, but we could prepare each inner file in a separate thread.

The same goes for reading: we read the files one by one from the archive and schedule jobs on a thread pool. We could play with the job order (read the biggest section first, or hardcode some section to be first).

PS. Probably we could use the same thread pool for serialization and deserialization.
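
The container layout described above can be sketched as follows. This is not a real zip implementation and involves no compression; `TocEntry`, `Container`, `PackSections` and `ReadSection` are hypothetical names. It only shows a payload of back-to-back sections plus a table of contents with names, offsets and sizes.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One table-of-contents entry: where a named section lives in the payload.
struct TocEntry {
    std::string name;
    std::size_t offset;
    std::size_t size;
};

struct Container {
    std::vector<TocEntry> toc;
    std::string payload;   // all sections, back to back
};

// Append already-serialized sections one after another (the sequential step)
// and record each one's offset and size.
Container PackSections(const std::vector<std::pair<std::string, std::string>>& sections)
{
    Container c;
    for (const auto& section : sections) {
        c.toc.push_back(TocEntry{section.first, c.payload.size(), section.second.size()});
        c.payload += section.second;
    }
    return c;
}

// Extract one section by name without touching the others; returns an empty
// string if the name is not in the table of contents.
std::string ReadSection(const Container& c, const std::string& name)
{
    for (const TocEntry& entry : c.toc)
        if (entry.name == name)
            return c.payload.substr(entry.offset, entry.size);
    return std::string();
}
```

Since every section can be located from the table of contents alone, a reader is free to hand sections to worker threads in any order, for example the largest first.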

Re: Save game size reduction

Posted: Sun Sep 12, 2021 8:54 am
by Cjkjvfnby
How hard would it be to change orbits from a list to a number plus a list of planets?

Each m_orbits tag has a count of 7.
```
<m_orbits><count>7</count>
```
So instead of a fixed-size list, we could store a number and a list of the actual planets.
It won't save any space, but it could make things a bit simpler.
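
One way to read this proposal, as a sketch only: keep the orbit count and store just the occupied orbits. The type and function names are hypothetical, -1 is assumed to mark an empty orbit, and the orbit index is kept with each planet so that the original list can be rebuilt.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

constexpr int kNoObject = -1;   // assumed marker for an empty orbit

struct CompactOrbits {
    std::size_t orbit_count;                           // e.g. 7
    std::vector<std::pair<std::size_t, int>> planets;  // (orbit index, planet id)
};

// Keep only the occupied orbits, remembering which slot each planet is in.
CompactOrbits Compact(const std::vector<int>& orbits)
{
    CompactOrbits compact{orbits.size(), {}};
    for (std::size_t i = 0; i < orbits.size(); ++i)
        if (orbits[i] != kNoObject)
            compact.planets.emplace_back(i, orbits[i]);
    return compact;
}

// Rebuild the fixed-size list, filling unoccupied slots with kNoObject.
std::vector<int> Expand(const CompactOrbits& compact)
{
    std::vector<int> orbits(compact.orbit_count, kNoObject);
    for (const auto& entry : compact.planets)
        if (entry.first < orbits.size())   // ignore out-of-range indices
            orbits[entry.first] = entry.second;
    return orbits;
}
```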

Re: Save game size reduction

Posted: Sun Sep 12, 2021 9:01 am
by Cjkjvfnby
Some values could be calculated, so there is no need to save them.

The number of planets could be calculated from the planet list (orbits):

Code: Select all

<m_orbits>
    <count>7</count>
    <item_version>0</item_version>
    <item>-1</item>
    <item>-1</item>
    <item>-1</item>
    <item>-1</item>
    <item>-1</item>
    <item>-1</item>
    <item>-1</item>
</m_orbits>
<m_planets>
    <count>0</count>
    <item_version>0</item_version>
</m_planets>
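
A sketch of deriving the planet set on load instead of saving it. It assumes that m_planets holds exactly the IDs found in occupied orbits and that -1 marks an empty orbit; neither is confirmed in this thread, and `PlanetsFromOrbits` is a hypothetical name.

```cpp
#include <set>
#include <vector>

// Collect the IDs of all occupied orbits; -1 is assumed to mean "empty".
std::set<int> PlanetsFromOrbits(const std::vector<int>& orbits)
{
    std::set<int> planets;
    for (int id : orbits)
        if (id != -1)
            planets.insert(id);
    return planets;
}
```

For the system shown above, all seven orbits are -1, so the derived set is empty, matching the saved count of 0.
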
Ships in the system can be populated from fleets in the system:

Code: Select all

<m_fleets>
    <count>1</count>
    <item_version>0</item_version>
    <item>112026</item>
</m_fleets>
<m_ships>
    <count>1</count>
    <item_version>0</item_version>
    <item>112015</item>
</m_ships>