[RESOLVED] Condition::Contains implementation
Moderator: Committer
[RESOLVED] Condition::Contains implementation
Following my new responsibility =), I created another callgrind using r4304. This time, it contains two turns instead of one to reduce the impact of game loading. It appears the next candidate for improvement is Condition::Contains, where 50% of time is spent, mostly performing std::set<>::insert().
Last edited by cami on Sun Jan 26, 2014 3:05 am, edited 3 times in total.
Yesterday, we were still on the brink. Fortunately, today we have come one step further.
- Geoff the Medio
- Programming, Design, Admin
- Posts: 13603
- Joined: Wed Oct 08, 2003 1:33 am
- Location: Munich
Re: Condition::Contains implementation
What happens if you just swap in unordered_set for Condition::ObjectSet and Effect::TargetSet typedefs?cami wrote:It appears the next candidate for improvement is Condition::Contains, where 50% of time is spent, mostly performing std::set<>::insert().
Re: Condition::Contains implementation
I hope we can do even better than hashing by just using a std::vector<> to represent the UniverseObject sets (this assumes that the index into the vector is the object ID, and that the highest object ID never gets too high for use to use this trick). We should definitely try hashing first though.
Re: Condition::Contains implementation
std::vector is probably a sensible choice - however, it requires a careful evaluation of existing code, due to the iterator invalidation problem.
What do you think about non-destructive Eval() signatures?
Many insertions are simply copies of an existing container, just for later moving obects from the copy to a new container. I think it might be very helpful to provide a variant of Eval() that populates a container (two containers?) but leaving the other one unchanged. That would also eleminate the iterator invalidation problem. suggested signature:
or if you dont like the pointers and default arguments:
templates/metaprogramming can help avoiding code repetition and/or construction of unused containers here. For instance, I could imagine a "NULL" container that is always empty and does nothing on any operation taking the place of unused output parameters.
What do you think about non-destructive Eval() signatures?
Many insertions are simply copies of an existing container, just for later moving obects from the copy to a new container. I think it might be very helpful to provide a variant of Eval() that populates a container (two containers?) but leaving the other one unchanged. That would also eleminate the iterator invalidation problem. suggested signature:
Code: Select all
void Condition::Contains::Eval(const ScriptingContext& parent_context, ObjectSet* matches = NULL, ObjectSet* non_matches = NULL, const ObectSet& candidates, SearchDomain search_domain = NON_MATCHES) const
Code: Select all
void Condition::Contains::Eval(const ScriptingContext& parent_context, ObjectSet & matches, ObjectSet & non_matches, const ObectSet& candidates, SearchDomain search_domain = NON_MATCHES) const;
void Condition::Contains::Eval(const ScriptingContext& parent_context, ObjectSet & results, const ObectSet& candidates, SearchDomain search_domain = NON_MATCHES) const; /* results is matches or non_matches, depending on search_domain*/
Yesterday, we were still on the brink. Fortunately, today we have come one step further.
Re: Condition::Contains implementation
There should be no invalidation problem if there are no insertions or deletions (and there shouldn't be!).cami wrote:std::vector is probably a sensible choice - however, it requires a careful evaluation of existing code, due to the iterator invalidation problem.
I mean something like this:
Code: Select all
// at most one of matches[i] and non_matches[i] contains a pointer to the object with ID i
std::vector<UniverseObject*> matches(objects.size()), 0;
std::vector<UniverseObject*> non_matches(objects.size()); // initialize somewhere so that matches[i] = <address of object w/ID i
if (/*match found*/) {
matches[i] = non_matches[i]; // was matches.insert(...);
non_matches[i] = 0; // was non_matches.erase(...);
}
Re: Condition::Contains implementation
I'm afraid I don't get what you're suggesting yet. Could you explain it a different way?cami wrote:What do you think about non-destructive Eval() signatures?
Many insertions are simply copies of an existing container, just for later moving obects from the copy to a new container. I think it might be very helpful to provide a variant of Eval() that populates a container (two containers?) but leaving the other one unchanged. That would also eleminate the iterator invalidation problem. suggested signature:
or if you dont like the pointers and default arguments:Code: Select all
void Condition::Contains::Eval(const ScriptingContext& parent_context, ObjectSet* matches = NULL, ObjectSet* non_matches = NULL, const ObectSet& candidates, SearchDomain search_domain = NON_MATCHES) const
templates/metaprogramming can help avoiding code repetition and/or construction of unused containers here. For instance, I could imagine a "NULL" container that is always empty and does nothing on any operation taking the place of unused output parameters.Code: Select all
void Condition::Contains::Eval(const ScriptingContext& parent_context, ObjectSet & matches, ObjectSet & non_matches, const ObectSet& candidates, SearchDomain search_domain = NON_MATCHES) const; void Condition::Contains::Eval(const ScriptingContext& parent_context, ObjectSet & results, const ObectSet& candidates, SearchDomain search_domain = NON_MATCHES) const; /* results is matches or non_matches, depending on search_domain*/
Re: Condition::Contains implementation
I'll try to be more detailed. The major problem actually isn't slow containers, but the tremendous amount of container operations - although I admit, that copying a std::vector should be substantially faster than copying an equally-dimensioned std::set. Many, if not most (by number of calls) of those operations are caused by Condition::*::Eval being a destructive operation: its parameters "matches" and "non_matches" are in-out, they both provide the set of candidates and store the result of the condition evaluation. "Destructive" here means that the original value of matches and non_matches is lost and cannot be used for further calls to Eval(). Often, Eval() is called in loops that need to provide the same input values to matches and non_matches in each iteration, but since the value is "destroyed" by Eval, it needs to create a copy each iteration.
What I suggest is a strict separation of "in" and "out" parameters, in particular Condition::*::Eval() receives three sets instead of two:
The templated implementation can assume that Matches and NonMatches are containers of UniverseObject * (like std::set), and will store the results of its computation inside those. Now, we create a special Container type "NullContainer<ValueType>", that behaves similar to a Container (in particular, it should implement the interface necessary for Eval()), with the exception that its mutators simply do nothing. With inlining enabled, instantiating Eval() with a NullContainer<UniverseObject *> out-parameter will result in an Eval() version that does not waste time for filling this container, as the compiler will optimize-away all manipulations of this parameter. In particular, we can provide an additional signature for Eval() that is common to all Conditions with almost no additional code:
This is pure convencience, but in fact most current code only needs the matches and can benefit from the candidate list remaining unchanged. The template instantiation hidden in the first switch case optimizes-away all code that fills the "non_matches" container. Likewise, the second switch case doesn't waste any time constructing a set of matches, but uses the same templated code. Whenever you needed both, you could simply call Eval() with all three parameters being an ObjectSet:
If the template implementation is smart enough to even behave correctly when non_matches and candidates are references to the same object and also when matches and candidates are references to the same object, then we can even reconstruct the old destructive behavior by supplying the search domain as "candidates" parameter:
What I suggest is a strict separation of "in" and "out" parameters, in particular Condition::*::Eval() receives three sets instead of two:
- a set of candidates to search within. This parameters is "in" and must not be manipulated by Eval(), but can be inspected in any way as appropriate.
- a reference to a set to store matches found in. This parameter is "out", Eval() may only add new obects to the set and not inspect it in any other way.
- a reference to a set to store candidates that didn't match in. This parameter is "out", Eval() may only add new objects to the set and not inspect it in any other way.
Code: Select all
template <class Matches, class NonMatches> Eval(Matches& matches, NonMatches& non_matches, const ObjectSet& candidates)
Code: Select all
void Condition::ConditionBase::Eval(ObjectSet& results, const ObjectSet& candidates, SearchDomain return_what = MATCHES)
{
NullContainer<const UniverseObject *> ignored; // does nothing whatever method you call
switch(return_what)
{
case MATCHES:
Eval(results, ignored, candidates);
break;
case NON_MATCHES:
Eval(ignored, results, candidates);
break;
}
}
Code: Select all
ObjectSet temp_matches temp_non_matches;
// stores matches in temp_matches and (candidates - matches) in temp_non_matches
Eval(temp_matches, temp_non_matches, candidates);
Code: Select all
void Condition::ConditionBase::Eval(ObjectSet& matches, ObjectSet& non_matches, SearchDomain search_domain = NON_MATCHES)
{
switch(search_domain)
{
case MATCHES:
Eval(matches, non_matches, matches);
break;
case NON_MATCHES:
Eval(matches, non_matches, non_matches);
break;
}
}
Yesterday, we were still on the brink. Fortunately, today we have come one step further.
- Geoff the Medio
- Programming, Design, Admin
- Posts: 13603
- Joined: Wed Oct 08, 2003 1:33 am
- Location: Munich
Re: Condition::Contains implementation
Regardless of whether Eval is destructive or preserving of the input candidates set, perhaps using std::list instead of a std::set or std::vector would work for ObjectSet / TargetSet?
We presently don't need to locate a particular UniverseObject in the ObjectSet, so using a vector or map indexed by object ID seems unnecessary. I'm also a bit concerned about the potential to have lots of 30kB+ mostly-null vectors of pointers being constructed and destructed repeatedly during a compound-condition evaluation.
We presently don't need to locate a particular UniverseObject in the ObjectSet, so using a vector or map indexed by object ID seems unnecessary. I'm also a bit concerned about the potential to have lots of 30kB+ mostly-null vectors of pointers being constructed and destructed repeatedly during a compound-condition evaluation.
Re: Condition::Contains implementation
Allocation isn't as bad as one might expect. GCC libstdc++ uses allocators with deferred deallocation and object recycling techniques. The callgrind shows that allocation and deallocation causes <7% of total running time, 13% if you include initialization and finalization. While this is a fairly high value, static allocation won't bring a major speedup here at the moment. As a comparison: the total cost of inserts and deletes is about 64%. Switching to std::vector will, however, also dramatically speedup insertion and deletion, in most cases it will result in a simple memory access. I suspect this will be a quick and very effective change. Update: std::list probably plays in the same league, here, being slightly slower in general but avoiding "gaps" in the list that would occur when setting entries to NULL in std::vector. In any case, I believe we should implement Conditions in a way that switching containers is only a typedef change (this might require a little metaprogramming for efficiency). Then we can experiment what suits our needs best.
Nevertheless, I'd personally prefer avoiding as many insertions and deletions (the latter are primarily caused by destructors, so no problem here for std::vector I guess) as possible altogether. I think that moving to a non-destructive Eval() would not only speedup, but also remove quite some boilerplate (of temporary set construction) from the code. If done well, it could give us full freedom in container choice without big performance penalties. It requires a bit of refactoring however, and as such is more error-prone.
Edit: sorry, misread your post. Updated to answer your question.
Nevertheless, I'd personally prefer avoiding as many insertions and deletions (the latter are primarily caused by destructors, so no problem here for std::vector I guess) as possible altogether. I think that moving to a non-destructive Eval() would not only speedup, but also remove quite some boilerplate (of temporary set construction) from the code. If done well, it could give us full freedom in container choice without big performance penalties. It requires a bit of refactoring however, and as such is more error-prone.
Edit: sorry, misread your post. Updated to answer your question.
Yesterday, we were still on the brink. Fortunately, today we have come one step further.
Re: Condition::Contains implementation
Agreed. Though I'd say that the slowdown comes not from the mutation of these objects per se, but the fact that a single element mutation is relatively expensive. It's k * O(log(N)), with a fairly high k, but we'd like it to be k * O(1), with a very low k.cami wrote:I'll try to be more detailed. The major problem actually isn't slow containers, but the tremendous amount of container operations - although I admit, that copying a std::vector should be substantially faster than copying an equally-dimensioned std::set. Many, if not most (by number of calls) of those operations are caused by Condition::*::Eval being a destructive operation: its parameters "matches" and "non_matches" are in-out, they both provide the set of candidates and store the result of the condition evaluation.
This is by design. The case we hoped would be most common is that of refining a set down to a smaller and smaller subset of itself -- you need the sets to be mutated at each step for this greediness to work."Destructive" here means that the original value of matches and non_matches is lost and cannot be used for further calls to Eval().
This goes against the greedy design I just mentioned. We may need to special-case this.Often, Eval() is called in loops that need to provide the same input values to matches and non_matches in each iteration, but since the value is "destroyed" by Eval, it needs to create a copy each iteration.
I don't think this is true (or at least you can't know if it's true within a given Condition's code) because of the existence of Condition::Not. Not inverts the match and non_match sets. If you only mutated the set you thought was important in a given Condition, and then that Condition was nested inside a Not, the Not would select the unmutated, incorrect set.What I suggest is a strict separation of "in" and "out" parameters, in particular Condition::*::Eval() receives three sets instead of two:
In most cases however, we won't be interested in both matches and non_matches.
- a set of candidates to search within. This parameters is "in" and must not be manipulated by Eval(), but can be inspected in any way as appropriate.
- a reference to a set to store matches found in. This parameter is "out", Eval() may only add new obects to the set and not inspect it in any other way.
- a reference to a set to store candidates that didn't match in. This parameter is "out", Eval() may only add new objects to the set and not inspect it in any other way.
I therefore think that the best we can do initially would be to replace std::set with something cheaper to copy and insert() and remove() from.
A more in-depth change that would allow for the kind of optimization you suggest here would be to:
1) make all Conditions aware of not-logic, making the SearchDomain a data member instead of a parameter to Eval(), and
2) have a Not-collapsing pass that gets performed on Condition trees, that would remove Nots, and modify each Not's nested Condition to invert its SearchDomain member
Re: Condition::Contains implementation
We do need to locate a particular UniverseObject in the general case. ConditionBase::Eval() does just this. If it finds a match, it adds the match-object to the to_set, and removes it from the from_set. If you can't do fast removal by ID or object address, the latter operation gets really expensive. For this reason, std::list won't work.Geoff the Medio wrote:Regardless of whether Eval is destructive or preserving of the input candidates set, perhaps using std::list instead of a std::set or std::vector would work for ObjectSet / TargetSet?
We presently don't need to locate a particular UniverseObject in the ObjectSet, so using a vector or map indexed by object ID seems unnecessary. I'm also a bit concerned about the potential to have lots of 30kB+ mostly-null vectors of pointers being constructed and destructed repeatedly during a compound-condition evaluation.
There will be something like 4 allocated arrays in existence at any one time, IIRC, so I don't think this will be a memory hit. Allocating a single block of memory of a fixed size and setting it to all 0's is super fast (which is all that std::vector<UniverseObject*>'s ctor does), especially compared to copying a node-based container like std::set. Copying it is likewise very fast.
Re: Condition::Contains implementation
It's a matter of perspective. If cost is A*B, and you can reduce both A and B substantially, doing either will result in a substantial overall gain. I guess we should consider both, for even more substantial gaintzlaine wrote:Though I'd say that the slowdown comes not from the mutation of these objects per se, but the fact that a single element mutation is relatively expensive. It's k * O(log(N)), with a fairly high k, but we'd like it to be k * O(1), with a very low k.
I understand your design, and although I see its beauty, I think it's hurting more than it helps, the way the "real world" conditions in the game currently structured. The two sets usually have vastly different sizes, and we almost always want the smaller. Nevertheless, there is no need to replace the current mechanism, we just shouldn't enforce it. The situations where the current signature hurts most is at the top level, either when evaluating subconditions (typically with the whole universe as candidate set!), or at the real top level when retrieving effect target sets. That said, it's important to have a constant complete-universe object set for a non-destructive signature to develop its full potential.tzlaine wrote:I don't think this is true (or at least you can't know if it's true within a given Condition's code) because of the existence of Condition::Not. Not inverts the match and non_match sets. If you only mutated the set you thought was important in a given Condition, and then that Condition was nested inside a Not, the Not would select the unmutated, incorrect set.cami wrote:In most cases however, we won't be interested in both matches and non_matches.
Agreeing with that.tzlaine wrote:I therefore think that the best we can do initially would be to replace std::set with something cheaper to copy and insert() and remove() from.
As we have the fully parsed conditions permanently available, that's probably the smartest solution. At the moment it doesn't look necessary because of the remarks above, but I'm not the one to foresee whether it's going to stay like that.tzlaine wrote:A more in-depth change that would allow for the kind of optimization you suggest here would be to:
1) make all Conditions aware of not-logic, making the SearchDomain a data member instead of a parameter to Eval(), and
2) have a Not-collapsing pass that gets performed on Condition trees, that would remove Nots, and modify each Not's nested Condition to invert its SearchDomain member
Yesterday, we were still on the brink. Fortunately, today we have come one step further.
- Geoff the Medio
- Programming, Design, Admin
- Posts: 13603
- Joined: Wed Oct 08, 2003 1:33 am
- Location: Munich
Re: Condition::Contains implementation
UniverseObject, or UniverseObject* ? (Big difference in size and cost to allocate, I suspect...)tzlaine wrote:We do need to locate a particular UniverseObject in the general case.
ConditionBase::Eval doesn't erase by address or ID; it erases by iterator, so there's no need to be able to erase quickly by ID or address:ConditionBase::Eval() does just this. If it finds a match, it adds the match-object to the to_set, and removes it from the from_set. If you can't do fast removal by ID or object address, the latter operation gets really expensive. For this reason, std::list won't work.
Code: Select all
for ( ; it != end_it; ) {
ObjectSet::iterator temp = it++;
bool match = Match(ScriptingContext(parent_context, *temp));
if ((search_domain == MATCHES && !match) || (search_domain == NON_MATCHES && match)) {
to_set.insert(*temp);
from_set.erase(temp);
}
}
If there are nested conditions (eg. Contains ContainedBy Source), each level will need to keep its own set of matches to test its candidates with, won't it?There will be something like 4 allocated arrays in existence at any one time
I'm contrasting with a list, which as above would need only fast push_back and fast by-iterator only-local-invalidating erase, and avoids the need for iterating over the whole set at each recursion with nested conditions. List copying would be slower than vectors for large sets, compared with vector, in the case of a full (of all UniverseObject) ObjectSet, but not as bad as a set.Allocating a single block of memory of a fixed size and setting it to all 0's is super fast (which is all that std::vector<UniverseObject*>'s ctor does), especially compared to copying a node-based container like std::set. Copying it is likewise very fast.
Doing this properly will be a bit tricky, as you've got to update that ObjectSet whenever the universe gains or loses an object, so that subsequent calls to Eval on that ObjectSet, or copies of it, will have the right input. This is more complicated than the requirement for parallelism that the Universe doesn't change during evaluation of effect TargetSets, as conditions can be used at any time in code, including during effect exectuion which is repeatedly modifying the universe, necessitating frequent updating of that common contains-everything ObjectSet.cami wrote:That said, it's important to have a constant complete-universe object set for a non-destructive signature to develop its full potential.
Also, if copying ObjectSets is made fast and a common starting point contains-everything ObjectSet is maintained, then that could be copied and then used as input for destructive Eval, with most of the benefits in cases when the starting set isn't rapidly changing.
Re: Condition::Contains implementation
I don't quite understand this sentence, but if it means you don't think we need to maintain both lists as long as each Condition does not know if it will be negated by being nested inside a Not, I think you're mistaken. Otherwise, I think we're on the same page.cami wrote:Nevertheless, there is no need to replace the current mechanism, we just shouldn't enforce it.
Again, it sounds like you're saying we can apply the optimization you've suggested without the Not-collapsing transformation, but that's not possible. The only optimization we could do would be to switch from std::set to std::vector or std::list.At the moment it doesn't look necessary because of the remarks above, but I'm not the one to foresee whether it's going to stay like that.
Re: Condition::Contains implementation
Good point! We can probably use std::list after all.ConditionBase::Eval doesn't erase by address or ID; it erases by iterator, so there's no need to be able to erase quickly by ID or address:Code: Select all
for ( ; it != end_it; ) { ObjectSet::iterator temp = it++; bool match = Match(ScriptingContext(parent_context, *temp)); if ((search_domain == MATCHES && !match) || (search_domain == NON_MATCHES && match)) { to_set.insert(*temp); from_set.erase(temp); } }
Ok, so maybe it will be ~10 arrays... I'm just saying that the memory hit is not probably an issue even then.If there are nested conditions (eg. Contains ContainedBy Source), each level will need to keep its own set of matches to test its candidates with, won't it?There will be something like 4 allocated arrays in existence at any one time