[RESOLVED] Condition::Contains implementation

Message

cami · #1 Post by **cami** » Sun Oct 02, 2011 8:10 am

Following my new responsibility =), I created another callgrind using r4304. This time, it contains two turns instead of one to reduce the impact of game loading. It appears the next candidate for improvement is Condition::Contains, where 50% of time is spent, mostly performing std::set<>::insert().

#2 Post by **Geoff the Medio** » Sun Oct 02, 2011 8:41 am

cami wrote:It appears the next candidate for improvement is Condition::Contains, where 50% of time is spent, mostly performing std::set<>::insert().

What happens if you just swap in unordered_set for Condition::ObjectSet and Effect::TargetSet typedefs?

#3 Post by **tzlaine** » Mon Oct 03, 2011 1:19 pm

I hope we can do even better than hashing by just using a std::vector<> to represent the UniverseObject sets (this assumes that the index into the vector is the object ID, and that the highest object ID never gets too high for use to use this trick). We should definitely try hashing first though.

cami · #4 Post by **cami** » Mon Oct 03, 2011 5:06 pm

std::vector is probably a sensible choice - however, it requires a careful evaluation of existing code, due to the iterator invalidation problem.

What do you think about non-destructive Eval() signatures?
Many insertions are simply copies of an existing container, just for later moving obects from the copy to a new container. I think it might be very helpful to provide a variant of Eval() that populates a container (two containers?) but leaving the other one unchanged. That would also eleminate the iterator invalidation problem. suggested signature:

Code: Select all

void Condition::Contains::Eval(const ScriptingContext& parent_context, ObjectSet* matches = NULL, ObjectSet* non_matches = NULL, const ObectSet& candidates, SearchDomain search_domain = NON_MATCHES) const

or if you dont like the pointers and default arguments:

Code: Select all

void Condition::Contains::Eval(const ScriptingContext& parent_context, ObjectSet & matches, ObjectSet & non_matches, const ObectSet& candidates, SearchDomain search_domain = NON_MATCHES) const;
void Condition::Contains::Eval(const ScriptingContext& parent_context, ObjectSet & results, const ObectSet& candidates, SearchDomain search_domain = NON_MATCHES) const; /* results is matches or non_matches, depending on search_domain*/

templates/metaprogramming can help avoiding code repetition and/or construction of unused containers here. For instance, I could imagine a "NULL" container that is always empty and does nothing on any operation taking the place of unused output parameters.

#5 Post by **tzlaine** » Mon Oct 03, 2011 7:30 pm

cami wrote:std::vector is probably a sensible choice - however, it requires a careful evaluation of existing code, due to the iterator invalidation problem.

There should be no invalidation problem if there are no insertions or deletions (and there shouldn't be!).

I mean something like this:

Code: Select all

// at most one of matches[i] and non_matches[i] contains a pointer to the object with ID i
std::vector<UniverseObject*> matches(objects.size()), 0;
std::vector<UniverseObject*> non_matches(objects.size()); // initialize somewhere so that matches[i] = <address of object w/ID i

if (/*match found*/) {
    matches[i] = non_matches[i]; // was matches.insert(...);
    non_matches[i] = 0; // was non_matches.erase(...);
}

But this has the problem that the set of objects being searched never gets smaller; it only gets more null members. If this proves to be too expensive, we can always use a rope implementation. For rope, deletions and insertions at any point are O(1), but storage is discontinuous.

#6 Post by **tzlaine** » Mon Oct 03, 2011 7:31 pm

cami wrote:What do you think about non-destructive Eval() signatures?
Many insertions are simply copies of an existing container, just for later moving obects from the copy to a new container. I think it might be very helpful to provide a variant of Eval() that populates a container (two containers?) but leaving the other one unchanged. That would also eleminate the iterator invalidation problem. suggested signature:
Code: Select all
void Condition::Contains::Eval(const ScriptingContext& parent_context, ObjectSet* matches = NULL, ObjectSet* non_matches = NULL, const ObectSet& candidates, SearchDomain search_domain = NON_MATCHES) const
or if you dont like the pointers and default arguments:
Code: Select all
void Condition::Contains::Eval(const ScriptingContext& parent_context, ObjectSet & matches, ObjectSet & non_matches, const ObectSet& candidates, SearchDomain search_domain = NON_MATCHES) const;
void Condition::Contains::Eval(const ScriptingContext& parent_context, ObjectSet & results, const ObectSet& candidates, SearchDomain search_domain = NON_MATCHES) const; /* results is matches or non_matches, depending on search_domain*/
templates/metaprogramming can help avoiding code repetition and/or construction of unused containers here. For instance, I could imagine a "NULL" container that is always empty and does nothing on any operation taking the place of unused output parameters.

I'm afraid I don't get what you're suggesting yet. Could you explain it a different way?

cami · #7 Post by **cami** » Mon Oct 03, 2011 9:23 pm

I'll try to be more detailed. The major problem actually isn't slow containers, but the tremendous amount of container operations - although I admit, that copying a std::vector should be substantially faster than copying an equally-dimensioned std::set. Many, if not most (by number of calls) of those operations are caused by Condition::*::Eval being a destructive operation: its parameters "matches" and "non_matches" are in-out, they both provide the set of candidates and store the result of the condition evaluation. "Destructive" here means that the original value of matches and non_matches is lost and cannot be used for further calls to Eval(). Often, Eval() is called in loops that need to provide the same input values to matches and non_matches in each iteration, but since the value is "destroyed" by Eval, it needs to create a copy each iteration.

What I suggest is a strict separation of "in" and "out" parameters, in particular Condition::*::Eval() receives three sets instead of two:

a set of candidates to search within. This parameters is "in" and must not be manipulated by Eval(), but can be inspected in any way as appropriate.
a reference to a set to store matches found in. This parameter is "out", Eval() may only add new obects to the set and not inspect it in any other way.
a reference to a set to store candidates that didn't match in. This parameter is "out", Eval() may only add new objects to the set and not inspect it in any other way.

In most cases however, we won't be interested in both matches and non_matches. Thus, creating the unneeded set can be wasteful (especially the non_matches set can be almost as large as the candidate set -- often the whole universe). However, a special implementation for each case (only matches wanted, only non-matches wanted, or both wanted) would create duplicate code. That's why I suggest a templated signature of Condition::*::Eval:

Code: Select all

template <class Matches, class NonMatches> Eval(Matches& matches, NonMatches& non_matches, const ObjectSet& candidates)

The templated implementation can assume that Matches and NonMatches are containers of UniverseObject * (like std::set), and will store the results of its computation inside those. Now, we create a special Container type "NullContainer<ValueType>", that behaves similar to a Container (in particular, it should implement the interface necessary for Eval()), with the exception that its mutators simply do nothing. With inlining enabled, instantiating Eval() with a NullContainer<UniverseObject *> out-parameter will result in an Eval() version that does not waste time for filling this container, as the compiler will optimize-away all manipulations of this parameter. In particular, we can provide an additional signature for Eval() that is common to all Conditions with almost no additional code:

Code: Select all

void Condition::ConditionBase::Eval(ObjectSet& results, const ObjectSet& candidates, SearchDomain return_what = MATCHES)
{
	NullContainer<const UniverseObject *> ignored; // does nothing whatever method you call
	
	switch(return_what)
	{
		case MATCHES:
			Eval(results, ignored, candidates);
			break;
		case NON_MATCHES:
			Eval(ignored, results, candidates);
			break;
	}
}

This is pure convencience, but in fact most current code only needs the matches and can benefit from the candidate list remaining unchanged. The template instantiation hidden in the first switch case optimizes-away all code that fills the "non_matches" container. Likewise, the second switch case doesn't waste any time constructing a set of matches, but uses the same templated code. Whenever you needed both, you could simply call Eval() with all three parameters being an ObjectSet:

Code: Select all

ObjectSet temp_matches temp_non_matches;
// stores matches in temp_matches and (candidates - matches) in temp_non_matches
Eval(temp_matches, temp_non_matches, candidates);

If the template implementation is smart enough to even behave correctly when non_matches and candidates are references to the same object and also when matches and candidates are references to the same object, then we can even reconstruct the old destructive behavior by supplying the search domain as "candidates" parameter:

Code: Select all

void Condition::ConditionBase::Eval(ObjectSet& matches, ObjectSet& non_matches, SearchDomain search_domain = NON_MATCHES)
{
	switch(search_domain)
	{
		case MATCHES:
			Eval(matches, non_matches, matches);
			break;
		case NON_MATCHES:
			Eval(matches, non_matches, non_matches);
			break;
	}
}

#8 Post by **Geoff the Medio** » Tue Oct 04, 2011 1:31 am

Regardless of whether Eval is destructive or preserving of the input candidates set, perhaps using std::list instead of a std::set or std::vector would work for ObjectSet / TargetSet?

We presently don't need to locate a particular UniverseObject in the ObjectSet, so using a vector or map indexed by object ID seems unnecessary. I'm also a bit concerned about the potential to have lots of 30kB+ mostly-null vectors of pointers being constructed and destructed repeatedly during a compound-condition evaluation.

cami · #9 Post by **cami** » Tue Oct 04, 2011 7:37 am

Allocation isn't as bad as one might expect. GCC libstdc++ uses allocators with deferred deallocation and object recycling techniques. The callgrind shows that allocation and deallocation causes <7% of total running time, 13% if you include initialization and finalization. While this is a fairly high value, static allocation won't bring a major speedup here at the moment. As a comparison: the total cost of inserts and deletes is about 64%. Switching to std::vector will, however, also dramatically speedup insertion and deletion, in most cases it will result in a simple memory access. I suspect this will be a quick and very effective change. Update: std::list probably plays in the same league, here, being slightly slower in general but avoiding "gaps" in the list that would occur when setting entries to NULL in std::vector. In any case, I believe we should implement Conditions in a way that switching containers is only a typedef change (this might require a little metaprogramming for efficiency). Then we can experiment what suits our needs best.

Nevertheless, I'd personally prefer avoiding as many insertions and deletions (the latter are primarily caused by destructors, so no problem here for std::vector I guess) as possible altogether. I think that moving to a non-destructive Eval() would not only speedup, but also remove quite some boilerplate (of temporary set construction) from the code. If done well, it could give us full freedom in container choice without big performance penalties. It requires a bit of refactoring however, and as such is more error-prone.

Edit: sorry, misread your post. Updated to answer your question.

#10 Post by **tzlaine** » Tue Oct 04, 2011 2:56 pm

cami wrote:I'll try to be more detailed. The major problem actually isn't slow containers, but the tremendous amount of container operations - although I admit, that copying a std::vector should be substantially faster than copying an equally-dimensioned std::set. Many, if not most (by number of calls) of those operations are caused by Condition::*::Eval being a destructive operation: its parameters "matches" and "non_matches" are in-out, they both provide the set of candidates and store the result of the condition evaluation.

Agreed. Though I'd say that the slowdown comes not from the mutation of these objects per se, but the fact that a single element mutation is relatively expensive. It's k * O(log(N)), with a fairly high k, but we'd like it to be k * O(1), with a very low k.

"Destructive" here means that the original value of matches and non_matches is lost and cannot be used for further calls to Eval().

This is by design. The case we hoped would be most common is that of refining a set down to a smaller and smaller subset of itself -- you need the sets to be mutated at each step for this greediness to work.

Often, Eval() is called in loops that need to provide the same input values to matches and non_matches in each iteration, but since the value is "destroyed" by Eval, it needs to create a copy each iteration.

This goes against the greedy design I just mentioned. We may need to special-case this.

What I suggest is a strict separation of "in" and "out" parameters, in particular Condition::*::Eval() receives three sets instead of two:

a set of candidates to search within. This parameters is "in" and must not be manipulated by Eval(), but can be inspected in any way as appropriate.

a reference to a set to store matches found in. This parameter is "out", Eval() may only add new obects to the set and not inspect it in any other way.

a reference to a set to store candidates that didn't match in. This parameter is "out", Eval() may only add new objects to the set and not inspect it in any other way.
In most cases however, we won't be interested in both matches and non_matches.

I don't think this is true (or at least you can't know if it's true within a given Condition's code) because of the existence of Condition::Not. Not inverts the match and non_match sets. If you only mutated the set you thought was important in a given Condition, and then that Condition was nested inside a Not, the Not would select the unmutated, incorrect set.

I therefore think that the best we can do initially would be to replace std::set with something cheaper to copy and insert() and remove() from.

A more in-depth change that would allow for the kind of optimization you suggest here would be to:

1) make all Conditions aware of not-logic, making the SearchDomain a data member instead of a parameter to Eval(), and
2) have a Not-collapsing pass that gets performed on Condition trees, that would remove Nots, and modify each Not's nested Condition to invert its SearchDomain member

#11 Post by **tzlaine** » Tue Oct 04, 2011 3:07 pm

Geoff the Medio wrote:Regardless of whether Eval is destructive or preserving of the input candidates set, perhaps using std::list instead of a std::set or std::vector would work for ObjectSet / TargetSet?

We presently don't need to locate a particular UniverseObject in the ObjectSet, so using a vector or map indexed by object ID seems unnecessary. I'm also a bit concerned about the potential to have lots of 30kB+ mostly-null vectors of pointers being constructed and destructed repeatedly during a compound-condition evaluation.

We do need to locate a particular UniverseObject in the general case. ConditionBase::Eval() does just this. If it finds a match, it adds the match-object to the to_set, and removes it from the from_set. If you can't do fast removal by ID or object address, the latter operation gets really expensive. For this reason, std::list won't work.

There will be something like 4 allocated arrays in existence at any one time, IIRC, so I don't think this will be a memory hit. Allocating a single block of memory of a fixed size and setting it to all 0's is super fast (which is all that std::vector<UniverseObject*>'s ctor does), especially compared to copying a node-based container like std::set. Copying it is likewise very fast.

cami · #12 Post by **cami** » Tue Oct 04, 2011 7:26 pm

tzlaine wrote:Though I'd say that the slowdown comes not from the mutation of these objects per se, but the fact that a single element mutation is relatively expensive. It's k * O(log(N)), with a fairly high k, but we'd like it to be k * O(1), with a very low k.

It's a matter of perspective. If cost is A*B, and you can reduce both A and B substantially, doing either will result in a substantial overall gain. I guess we should consider both, for even more substantial gain

tzlaine wrote:
cami wrote:In most cases however, we won't be interested in both matches and non_matches.
I don't think this is true (or at least you can't know if it's true within a given Condition's code) because of the existence of Condition::Not. Not inverts the match and non_match sets. If you only mutated the set you thought was important in a given Condition, and then that Condition was nested inside a Not, the Not would select the unmutated, incorrect set.

I understand your design, and although I see its beauty, I think it's hurting more than it helps, the way the "real world" conditions in the game currently structured. The two sets usually have vastly different sizes, and we almost always want the smaller. Nevertheless, there is no need to replace the current mechanism, we just shouldn't enforce it. The situations where the current signature hurts most is at the top level, either when evaluating subconditions (typically with the whole universe as candidate set!), or at the real top level when retrieving effect target sets. That said, it's important to have a constant complete-universe object set for a non-destructive signature to develop its full potential.

tzlaine wrote:I therefore think that the best we can do initially would be to replace std::set with something cheaper to copy and insert() and remove() from.

Agreeing with that.

tzlaine wrote:A more in-depth change that would allow for the kind of optimization you suggest here would be to:

1) make all Conditions aware of not-logic, making the SearchDomain a data member instead of a parameter to Eval(), and
2) have a Not-collapsing pass that gets performed on Condition trees, that would remove Nots, and modify each Not's nested Condition to invert its SearchDomain member

As we have the fully parsed conditions permanently available, that's probably the smartest solution. At the moment it doesn't look necessary because of the remarks above, but I'm not the one to foresee whether it's going to stay like that.

#13 Post by **Geoff the Medio** » Tue Oct 04, 2011 7:42 pm

tzlaine wrote:We do need to locate a particular UniverseObject in the general case.

UniverseObject, or UniverseObject* ? (Big difference in size and cost to allocate, I suspect...)

ConditionBase::Eval() does just this. If it finds a match, it adds the match-object to the to_set, and removes it from the from_set. If you can't do fast removal by ID or object address, the latter operation gets really expensive. For this reason, std::list won't work.

ConditionBase::Eval doesn't erase by address or ID; it erases by iterator, so there's no need to be able to erase quickly by ID or address:

Code: Select all

    for ( ; it != end_it; ) {
        ObjectSet::iterator temp = it++;
        bool match = Match(ScriptingContext(parent_context, *temp));
        if ((search_domain == MATCHES && !match) || (search_domain == NON_MATCHES && match)) {
            to_set.insert(*temp);
            from_set.erase(temp);
        }
    }

temp is an iterator, not a bare pointer. Insert uses *temp which is a pointer, but for a list this becomes just push_back which is very fast. Perhaps not as fast as setting *temp = 0 for a vector, but lists avoid the need to iterate over a large sparsely-non-null vector of pointers to find objects process for each Eval recursion.

There will be something like 4 allocated arrays in existence at any one time

If there are nested conditions (eg. Contains ContainedBy Source), each level will need to keep its own set of matches to test its candidates with, won't it?

Allocating a single block of memory of a fixed size and setting it to all 0's is super fast (which is all that std::vector<UniverseObject*>'s ctor does), especially compared to copying a node-based container like std::set. Copying it is likewise very fast.

I'm contrasting with a list, which as above would need only fast push_back and fast by-iterator only-local-invalidating erase, and avoids the need for iterating over the whole set at each recursion with nested conditions. List copying would be slower than vectors for large sets, compared with vector, in the case of a full (of all UniverseObject) ObjectSet, but not as bad as a set.

cami wrote:That said, it's important to have a constant complete-universe object set for a non-destructive signature to develop its full potential.

Doing this properly will be a bit tricky, as you've got to update that ObjectSet whenever the universe gains or loses an object, so that subsequent calls to Eval on that ObjectSet, or copies of it, will have the right input. This is more complicated than the requirement for parallelism that the Universe doesn't change during evaluation of effect TargetSets, as conditions can be used at any time in code, including during effect exectuion which is repeatedly modifying the universe, necessitating frequent updating of that common contains-everything ObjectSet.

Also, if copying ObjectSets is made fast and a common starting point contains-everything ObjectSet is maintained, then that could be copied and then used as input for destructive Eval, with most of the benefits in cases when the starting set isn't rapidly changing.

#14 Post by **tzlaine** » Tue Oct 04, 2011 7:56 pm

cami wrote:Nevertheless, there is no need to replace the current mechanism, we just shouldn't enforce it.

I don't quite understand this sentence, but if it means you don't think we need to maintain both lists as long as each Condition does not know if it will be negated by being nested inside a Not, I think you're mistaken. Otherwise, I think we're on the same page.

At the moment it doesn't look necessary because of the remarks above, but I'm not the one to foresee whether it's going to stay like that.

Again, it sounds like you're saying we can apply the optimization you've suggested without the Not-collapsing transformation, but that's not possible. The only optimization we could do would be to switch from std::set to std::vector or std::list.

#15 Post by **tzlaine** » Tue Oct 04, 2011 7:59 pm

ConditionBase::Eval doesn't erase by address or ID; it erases by iterator, so there's no need to be able to erase quickly by ID or address:
Code: Select all
    for ( ; it != end_it; ) {
        ObjectSet::iterator temp = it++;
        bool match = Match(ScriptingContext(parent_context, *temp));
        if ((search_domain == MATCHES && !match) || (search_domain == NON_MATCHES && match)) {
            to_set.insert(*temp);
            from_set.erase(temp);
        }
    }

Good point! We can probably use std::list after all.

There will be something like 4 allocated arrays in existence at any one time
If there are nested conditions (eg. Contains ContainedBy Source), each level will need to keep its own set of matches to test its candidates with, won't it?

Ok, so maybe it will be ~10 arrays... I'm just saying that the memory hit is not probably an issue even then.

FreeOrion

[RESOLVED] Condition::Contains implementation

[RESOLVED] Condition::Contains implementation

Re: Condition::Contains implementation

Re: Condition::Contains implementation

Re: Condition::Contains implementation

Re: Condition::Contains implementation

Re: Condition::Contains implementation

Re: Condition::Contains implementation

Re: Condition::Contains implementation

Re: Condition::Contains implementation

Re: Condition::Contains implementation

Re: Condition::Contains implementation

Re: Condition::Contains implementation

Re: Condition::Contains implementation

Re: Condition::Contains implementation

Re: Condition::Contains implementation