Save game size reduction

Programmers discuss here anything related to FreeOrion programming. Primarily for the developers to discuss.

Moderator: Committer

Message
Author
User avatar
Cjkjvfnby
AI Contributor
Posts: 539
Joined: Tue Jun 24, 2014 9:55 pm

Re: Save game size reduction

#16 Post by Cjkjvfnby »

Looks like data in the `universe/objects` is already present in the `universe/empire_latest_known_objects`

Could we replace objects with just ids?

Unpacked XML: objects 15 mb, empire_latest_known_objects 57 mb.
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

User avatar
Geoff the Medio
Programming, Design, Admin
Posts: 13586
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Save game size reduction

#17 Post by Geoff the Medio »

Cjkjvfnby wrote: Sun Sep 12, 2021 9:14 am Looks like data in the `universe/objects` is already present in the `universe/empire_latest_known_objects`

Could we replace objects with just ids?
No, because the latest known information about an object for a particular empire is not necessarily the same as the actual state of that object on a given turn, if the empire hasn't had visibility of the object on the latest turn. There's probably a possibility to have a sentinel value or other way to mark that the current latest known state is the current state in the cases where they do match, but it would require some extra logic to handle.

User avatar
Geoff the Medio
Programming, Design, Admin
Posts: 13586
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Save game size reduction

#18 Post by Geoff the Medio »

Cjkjvfnby wrote: Sun Sep 12, 2021 8:20 am
The "first" and "second" can probably be changed to "f" and "s", and would probably substantially reduce the size of the XML text, since those two appear a lot, for all associative containers.
Looks like an easy way to get a bit more size reduction.
For uncompressed strings, perhaps. For the compressed strings, it would probably make no difference, assuming the compression algorithm is reasonably good. So, for space on disk, changing these labels will no do much. And as above, for network communication, the messages could also be compressed for probably much more effectively data volume reduction than anything that would come from changing the text labels in the XML.
Some stats to tags:

Code: Select all

{
    "item": 905343,
    "first": 659055,
    "second": 659055,
    "count": 250661,
    "item_version": 250640,
    "i": 233983,
    "c": 230651,
    "px": 112590,
    "CombatEvent": 90113,
    "events": 63153,
}
"item", "count", and "item_version" are not stuff that FreeOrion names, but rather internal stuff to how the serialization code tracks what it's doing, which would be difficult to change. "first" and "second" were discussed above for standalone std::pair. I'm not sure if or how much writing a custom serializer for std::pair would automatically also apply to serialized contents of maps, though, which is probably most of those cases, or if it would be complicated to do that for stuff within containers. "i", "c", and "px" are already as or almost as small as possible. "CombatEvents" and "events" are probably changable without much trouble.

User avatar
Cjkjvfnby
AI Contributor
Posts: 539
Joined: Tue Jun 24, 2014 9:55 pm

Re: Save game size reduction

#19 Post by Cjkjvfnby »

For the compressed strings, it would probably make no difference, assuming the compression algorithm is reasonably good.
This is the same kind of the optimization as for `f, s, px`. So if previous was beneficial, this should be beneficial too.

When you deserialize data you work with the uncompressed file, so size probably affects parsing speed.
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

User avatar
Geoff the Medio
Programming, Design, Admin
Posts: 13586
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Save game size reduction

#20 Post by Geoff the Medio »

Cjkjvfnby wrote: Sun Sep 12, 2021 10:15 pmSo if previous was beneficial, this should be beneficial too.
What is "previous" and was it substantially beneficial when using compressed data?
When you deserialize data you work with the uncompressed file, so size probably affects parsing speed.
It would slightly reduce the memory consumption for the text buffer when deserializing, which would slightly speed up deserialization since it wouldn't need to fetch as much data from main memory when processing it. But the main thing taking time probably isn't memory copying, but rather allocating and constructing all the objects that it's deserializing. When serializing, we don't know how much memory the serialized text will take, so pre-allocate a hopefully larger than needed buffer. In a few cases it might make a difference if the text is shorter and that means it doesn't need to re-allocate a buffer when it uses up the pre-allocated one, but that shouldn't often be an issue and wouldn't be a huge speed difference if it just happens a couple times, which it should if the buffer grows exponentially, which they normally do.

User avatar
Geoff the Medio
Programming, Design, Admin
Posts: 13586
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Save game size reduction

#21 Post by Geoff the Medio »

Cjkjvfnby wrote: Sun Sep 12, 2021 8:54 am How hard would it be to change orbits from the list to the number and list of planets?

Each m_orbits tag has a count of 7.
```
<m_orbits><count>7</count>
```
So instead of a list of fixed size, we could use a number and a list of real planets.
It won't save any space, but this could make things a bit simplier.
I don't understand what you're suggesting. Right now it lists how many orbits there are and what planet object ID is in each. It uses a fixed size list because it tracks which orbit slot each planet is in, and orbital position can matter in some calculations.

User avatar
Geoff the Medio
Programming, Design, Admin
Posts: 13586
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Save game size reduction

#22 Post by Geoff the Medio »

Cjkjvfnby wrote: Sun Sep 12, 2021 9:01 am Some values could be calculated and no need to save them.

The number of planets could be calculated from the planet list (orbits)

Code: Select all

<m_orbits>
    <count>7</count>
    <item_version>0</item_version>
    <item>-1</item>
    <item>-1</item>
    <item>-1</item>
    <item>-1</item>
    <item>-1</item>
    <item>-1</item>
    <item>-1</item>
</m_orbits>
<m_planets>
<count>0</count>
<item_version>0</item_version>
</m_planets>
Ships in the system can be populated from fleets in the system:

Code: Select all

<m_fleets>
    <count>1</count>
    <item_version>0</item_version>
    <item>112026</item>
</m_fleets>
<m_ships>
<count>1</count>
    <item_version>0</item_version>
    <item>112015</item>
</m_ships>
It's probably possible, but it would complicate serialization and deserialization quite a bit. Almost all the data and its organization in the serialized text reflects the way the gamestate is stored in memory. There is some redundancy in that, with for example a system tracking all the objects in it, including all the ships and fleets, and then those fleets also tracking what ships they contain, and the ships and fleets tracking what system they are in and the ships what fleet they are in. Some of this data could be omitted from the serialized state, but then would need to be recalculated when deserializing.

User avatar
Cjkjvfnby
AI Contributor
Posts: 539
Joined: Tue Jun 24, 2014 9:55 pm

Re: Save game size reduction

#23 Post by Cjkjvfnby »

Geoff the Medio wrote: Mon Sep 13, 2021 6:46 am
Cjkjvfnby wrote: Sun Sep 12, 2021 10:15 pmSo if previous was beneficial, this should be beneficial too.
What is "previous" and was it substantially beneficial when using compressed data?
void Meter::serialize https://github.com/freeorion/freeorion/ ... eter.h#L64
Geoff the Medio wrote: Mon Sep 13, 2021 6:46 am
When you deserialize data you work with the uncompressed file, so size probably affects parsing speed.
It would slightly reduce the memory consumption for the text buffer when deserializing, which would slightly speed up deserialization since it wouldn't need to fetch as much data from main memory when processing it. But the main thing taking time probably isn't memory copying, but rather allocating and constructing all the objects that it's deserializing. When serializing, we don't know how much memory the serialized text will take, so pre-allocate a hopefully larger than needed buffer. In a few cases it might make a difference if the text is shorter and that means it doesn't need to re-allocate a buffer when it uses up the pre-allocated one, but that shouldn't often be an issue and wouldn't be a huge speed difference if it just happens a couple times, which it should if the buffer grows exponentially, which they normally do.
What is the idea behind adding Meter::serialize? Was it optimization?
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

User avatar
Cjkjvfnby
AI Contributor
Posts: 539
Joined: Tue Jun 24, 2014 9:55 pm

Re: Save game size reduction

#24 Post by Cjkjvfnby »

Geoff the Medio wrote: Mon Sep 13, 2021 7:11 am It's probably possible, but it would complicate serialization and deserialization quite a bit. Almost all the data and its organization in the serialized text reflects the way the gamestate is stored in memory. There is some redundancy in that, with for example a system tracking all the objects in it, including all the ships and fleets, and then those fleets also tracking what ships they contain, and the ships and fleets tracking what system they are in and the ships what fleet they are in. Some of this data could be omitted from the serialized state, but then would need to be recalculated when deserializing.
Or we could add lazy init for these variables. We don't get the result until we will measure it.

I got the idea behind the orbits.
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

User avatar
Geoff the Medio
Programming, Design, Admin
Posts: 13586
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Save game size reduction

#25 Post by Geoff the Medio »

Cjkjvfnby wrote: Mon Sep 13, 2021 10:59 pmWhat is the idea behind adding Meter::serialize? Was it optimization?
There needs to be a Meter::serialize function to serialize and deserialize Meter objects. Since it was written by me (rather than being provided by Boost like is the case for most C++ standard library contains like std::map or std::vector, it was trivial to specify the single-char tokens to use in the XML.
Cjkjvfnby wrote: Mon Sep 13, 2021 11:07 pmwe could add lazy init for these variables
I suppose, but that would complicate the code that uses them quite a bit, particularly in multi-threaded contexts.

User avatar
Cjkjvfnby
AI Contributor
Posts: 539
Joined: Tue Jun 24, 2014 9:55 pm

Re: Save game size reduction

#26 Post by Cjkjvfnby »

Took me some time to understand what compressed.xml means.

We create inner xml, zip it, encode to base64, add to outer xml and save it. If we replace it with the binary zip archive it should save some space and time (base64 encoding/decoding is not free).

Experiments with file zipping.
Screenshot 2021-09-16 at 09.34.08.png
Screenshot 2021-09-16 at 09.34.08.png (47.33 KiB) Viewed 1884 times
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

User avatar
Geoff the Medio
Programming, Design, Admin
Posts: 13586
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Save game size reduction

#27 Post by Geoff the Medio »

Cjkjvfnby wrote: Thu Sep 16, 2021 6:35 amIf we replace it with the binary zip archive it should save some space and time (base64 encoding/decoding is not free).
Part of the point of having an outer uncompressed XML structure for save files is that the header information can be read and deserialized without unzipping the whole archive, which should be substantially faster, particularly for large saves. This is important when populating the list of save files to pick from.

User avatar
Cjkjvfnby
AI Contributor
Posts: 539
Joined: Tue Jun 24, 2014 9:55 pm

Re: Save game size reduction

#28 Post by Cjkjvfnby »

Geoff the Medio wrote: Thu Sep 16, 2021 12:48 pm
Cjkjvfnby wrote: Thu Sep 16, 2021 6:35 amIf we replace it with the binary zip archive it should save some space and time (base64 encoding/decoding is not free).
Part of the point of having an outer uncompressed XML structure for save files is that the header information can be read and deserialized without unzipping the whole archive, which should be substantially faster, particularly for large saves. This is important when populating the list of save files to pick from.
It is not a problem to get first bytes of the file packed to the zip archive from disc. Zip is designed for that.

You read last 22 bytes: End of central directory record (EOCD). You will get the start point of the Central directory file header.
From the Central directory file header you will get the absolute offset of your file in archive.
You read and unpack it as a stream.
So basically you will need about 3-5 random access reads to get that information. (You could put the header as a separate file, to make deserialization a bit easier)

More details about zip structure: https://en.wikipedia.org/wiki/ZIP_(file ... #Structure
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

User avatar
Cjkjvfnby
AI Contributor
Posts: 539
Joined: Tue Jun 24, 2014 9:55 pm

Re: Save game size reduction

#29 Post by Cjkjvfnby »

One more vector of optimization is `map<..., ...>

A map is a data structure for search. So basically it consumes more memory than data inside, to give you a faster search. https://en.cppreference.com/w/cpp/container/map.

Let's look into examples:

Code: Select all

Univers/universe.h: typedef std::map<Visibility, int>               VisibilityTurnMap; 
This structure could be represented by class(?) with 3 fields last_turn_for_partial, last_turn_for_basic, last_turn_for_full.

In most cases, we use direct access to attributes: visTurnMap[Visibility::VIS_PARTIAL_VISIBILITY], but for random access, we could add a method with a switch statement. visTurnMap::getTurn(Visibility visibility)

Cons:
  • Less memory usage. We don't maintain extra data structure for search
  • Faster access: attribute access is much faster than access item in a map
  • A bit more convenient API, it's a bit more readable when you access attributes when keys
  • A smaller footprint in the save game. We have an instance for each empire for each object it could see.
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

User avatar
Geoff the Medio
Programming, Design, Admin
Posts: 13586
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Save game size reduction

#30 Post by Geoff the Medio »

Cjkjvfnby wrote: Sun Sep 19, 2021 5:13 am
  • Less memory usage. We don't maintain extra data structure for search
  • Faster access: attribute access is much faster than access item in a map
There are many uses of map, set, and their unordered_ equivalents in the gamestate code. Most of these could theoretically be replaced with flat_ equivalents to have faster lookup or memory use, particularly for smaller containers with single or double-digit numbers of items in them when the index is an int or double. But doing that is a lot of work and not something I'm currently prioritizing unless the profilers suggest it might be in the hot path for processing time or UI lagginess issues. Saving a few tens of bytes, even for thousands of objects, isn't THAT important when RAM is measured in GB and disk space in TB. They are also not trivially substitable in order to guarantee better performance. The flat_ variants in particular are good for small set lookups due to cache locality, but if there are also lots of insertions or deletions of items from the sets, then it gets less clear whether they'd be a performance win. So similarly, while using a few member variables instead of a map might be faster, that's not a reason enough to go to the trouble to change how things are set up currently. For the memory use when serializing in particular, switching to different containers probably doesn't make a huge difference either. There is some overhead for the map state tracking, but it's not a huge amount, really, I suspect. But a case of just 3 visibility levels, the map probably is not really beneficial, you're right. If you're really keen on switching it, we can, but I wouldn't prioritize it otherwise.

Post Reply