Source:orientdb_query_execution.py
# Copyright 2018-present Kensho Technologies, LLC.
"""Workarounds for OrientDB scheduler issue that causes poor query planning for certain queries.

For purposes of query planning, the OrientDB query planner ignores "where:" clauses
that hit indexes but do not use the "=" operator. For example, "CONTAINS" can be used to check
that a field covered by an index is in a specified list of values, and can therefore be covered
by an index, but OrientDB will ignore this. When no equality ("=") checks on indexed columns
are present, OrientDB will generate a query plan that starts execution at the class with
lowest cardinality, which can lead to excessive numbers of scanned and discarded records.

Assuming the query planner creates a query plan where a location with CONTAINS is
the first in the execution order, the execution system will apply indexes
to speed up this operation. Therefore, it's sufficient to trick the query planner into
always creating such a query plan, even though it thinks indexes cannot be used in the query.

Valid query execution start points for the OrientDB query planner must satisfy the following:
    - Must not be "optional: true".
    - Must not have a "while:" clause nor follow a location that has one.
    - Must have a "class:" defined. This class is used for cardinality estimation, and to
      look for available indexes that may cover any "where:" clause that may be present.

The optimizations in this file improve performance by enabling execution start points according
to the following assumptions:
    1. Start points with "where:" clauses that reference only local fields (i.e. not tagged values
       from other query locations) are always better than start points without a "where:".
       This is because the filter will have to be applied one way or the other, so we might as well
       apply it early.
    2. If no such start points are available, we'd like to make available as many start points
       as possible, since we'd like OrientDB to start at the start point whose class has
       the lowest possible cardinality.

The process of applying the optimizations is as follows:
    - Exclude and ignore all query steps that are inside a fold, optional, or recursion scope,
      or have a "where:" clause that references a non-local (i.e. tagged) field.
    - Find all remaining query steps with "where:" clauses that reference only local fields.
    - If any are found, we guide our actions from assumption 1 above:
        - Ensure they have a defined "class:" -- i.e. the OrientDB scheduler will consider them
          valid start points.
        - Then, prune all other query steps (ones without such "where:" clauses) by removing their
          "class:" clause, making them invalid as query start points for OrientDB's scheduler.
    - If none are found, we guide our actions from assumption 2 above:
        - Ensure that all query points not inside fold, optional, or recursion scope contain
          a "class:" clause. That increases the number of available query start points,
          so OrientDB can choose the start point of lowest cardinality.
"""
from ..blocks import CoerceType, Filter, QueryRoot, Recurse, Traverse
from ..expressions import (
    BinaryComposition,
    ContextField,
    ContextFieldExistence,
    Literal,
    LocalField,
)
from ..helpers import get_only_element_from_collection


def _is_local_filter(filter_block):
    """Return True if the Filter block references no non-local fields, and False otherwise."""
    is_local_filter = True
    filter_predicate = filter_block.predicate

    def visitor_fn(expression):
        """Expression visitor function that looks for uses of non-local fields."""
        nonlocal is_local_filter
        non_local_expression_types = (ContextField, ContextFieldExistence)
        if isinstance(expression, non_local_expression_types):
            is_local_filter = False
        # Don't change the expression.
        return expression

    filter_predicate.visit_and_update(visitor_fn)
    return is_local_filter


def _classify_query_locations(match_query):
    """Classify query locations into three groups: preferred, eligible, ineligible.

    - Ineligible locations are ones that cannot be the starting point of query execution.
      These include locations within recursions, locations that are the target of
      an optional traversal, and locations with an associated "where:" clause with non-local filter.
    - Preferred locations are ones that are eligible to be the starting point, and also have
      an associated "where:" clause that references no non-local fields -- only local fields,
      literals, and variables.
    - Eligible locations are all locations that do not fall into either of these two categories.

    Args:
        match_query: MatchQuery object describing the query being analyzed for optimization

    Returns:
        tuple (preferred, eligible, ineligible) where each element is a set of Location objects.
        The three sets are disjoint.
    """
    preferred_locations = set()
    eligible_locations = set()
    ineligible_locations = set()

    # Any query must have at least one traversal with at least one step.
    # The first step in this traversal must be a QueryRoot.
    first_match_step = match_query.match_traversals[0][0]
    if not isinstance(first_match_step.root_block, QueryRoot):
        raise AssertionError(
            "First step of first traversal unexpectedly was not QueryRoot: "
            "{} {}".format(first_match_step, match_query)
        )

    # The first step in the first traversal cannot possibly be inside an optional, recursion,
    # or fold. Its location is always an eligible start location for a query.
    # We need to determine whether it is merely eligible, or actually a preferred location.
    if first_match_step.where_block is not None:
        if _is_local_filter(first_match_step.where_block):
            preferred_locations.add(first_match_step.as_block.location)
        else:
            # TODO(predrag): Fix once we have a proper fix for tag-and-filter in the same scope.
            #                Either the locally-scoped tag will have to generate a LocalField
            #                instead of a ContextField, or we'll have to rework the local filter
            #                detection code in this module.
            raise AssertionError(
                "The first step of the first traversal somehow had a non-local "
                "filter. This should not be possible, since there is nowhere "
                "for the tagged value to have come from. Values: {} {}".format(
                    first_match_step, match_query
                )
            )
    else:
        eligible_locations.add(first_match_step.as_block.location)

    # This loop will repeat the analysis of the first step of the first traversal.
    # QueryRoots other than the first are required to always be at a location whose status
    # (preferred / eligible / ineligible) is already known. Since we already processed
    # the first QueryRoot above, the rest of the loop can assume all QueryRoots are like that.
    for current_traversal in match_query.match_traversals:
        for match_step in current_traversal:
            current_step_location = match_step.as_block.location

            if isinstance(match_step.root_block, QueryRoot):
                already_encountered_location = any(
                    (
                        current_step_location in preferred_locations,
                        current_step_location in eligible_locations,
                        current_step_location in ineligible_locations,
                    )
                )

                if not already_encountered_location:
                    raise AssertionError(
                        "Unexpectedly encountered a location in QueryRoot whose "
                        "status has not been determined: {} {} {}".format(
                            current_step_location, match_step, match_query
                        )
                    )

                at_eligible_or_preferred_location = (
                    current_step_location in preferred_locations
                    or current_step_location in eligible_locations
                )

                # This location has already been encountered and processed.
                # Other than setting the "at_eligible_or_preferred_location" state for the sake of
                # the following MATCH steps, there is nothing further to be done.
                continue
            elif isinstance(match_step.root_block, Recurse):
                # All Recurse blocks cause locations within to be ineligible.
                at_eligible_or_preferred_location = False
            elif isinstance(match_step.root_block, Traverse):
                # Optional Traverse blocks cause locations within to be ineligible.
                # Non-optional Traverse blocks do not change the eligibility of locations within:
                # if the pre-Traverse location was eligible, so will the location within,
                # and if it was not eligible, neither will the location within.
                if match_step.root_block.optional:
                    at_eligible_or_preferred_location = False
            else:
                raise AssertionError(
                    "Unreachable condition reached: {} {} {}".format(
                        match_step.root_block, match_step, match_query
                    )
                )

            if not at_eligible_or_preferred_location:
                ineligible_locations.add(current_step_location)
            elif match_step.where_block is not None:
                if _is_local_filter(match_step.where_block):
                    # This location has a local filter, and is not otherwise ineligible (it's not
                    # in a recursion etc.). Therefore, it's a preferred query start location.
                    preferred_locations.add(current_step_location)
                else:
                    # Locations with non-local filters are never eligible locations, since they
                    # depend on another location being executed before them.
                    ineligible_locations.add(current_step_location)
            else:
                # No local filtering (i.e. not preferred), but also not ineligible. Eligible it is.
                eligible_locations.add(current_step_location)

    return preferred_locations, eligible_locations, ineligible_locations


def _calculate_type_bound_at_step(match_step):
    """Return the GraphQL type bound at the given step, or None if no bound is given."""
    current_type_bounds = []

    if isinstance(match_step.root_block, QueryRoot):
        # The QueryRoot start class is a type bound.
        current_type_bounds.extend(match_step.root_block.start_class)

    if match_step.coerce_type_block is not None:
        # The CoerceType target class is also a type bound.
        current_type_bounds.extend(match_step.coerce_type_block.target_class)

    if current_type_bounds:
        # A type bound exists. Assert that there is exactly one bound, defined in precisely one way.
        return get_only_element_from_collection(current_type_bounds)
    else:
        # No type bound exists at this MATCH step.
        return None


def _assert_type_bounds_are_not_conflicting(
    current_type_bound, previous_type_bound, location, match_query
):
    """Ensure that the two bounds either are an exact match, or one of them is None."""
    if all(
        (
            current_type_bound is not None,
            previous_type_bound is not None,
            current_type_bound != previous_type_bound,
        )
    ):
        raise AssertionError(
            "Conflicting type bounds calculated at location {}: {} vs {} "
            "for query {}".format(location, previous_type_bound, current_type_bound, match_query)
        )


def _expose_only_preferred_locations(
    match_query, location_types, coerced_locations, preferred_locations, eligible_locations
):
    """Return a MATCH query where only preferred locations are valid as query start locations."""
    preferred_location_types = dict()
    eligible_location_types = dict()

    new_match_traversals = []
    for current_traversal in match_query.match_traversals:
        new_traversal = []
        for match_step in current_traversal:
            new_step = match_step
            current_step_location = match_step.as_block.location

            if current_step_location in preferred_locations:
                # This location is preferred. We have to make sure that at least one occurrence
                # of this location in the MATCH query has an associated "class:" clause,
                # which would be generated by a type bound at the corresponding MATCH step.
                current_type_bound = _calculate_type_bound_at_step(match_step)
                previous_type_bound = preferred_location_types.get(current_step_location, None)
                if previous_type_bound is not None:
                    # The location is already valid. If so, make sure that this step either does
                    # not have any type bounds (e.g. via QueryRoot or CoerceType blocks),
                    # or has type bounds that match the previously-decided type bound.
                    _assert_type_bounds_are_not_conflicting(
                        current_type_bound, previous_type_bound, current_step_location, match_query
                    )
                else:
                    # The location is not yet known to be valid. If it does not have
                    # a type bound in this MATCH step, add a type coercion to the type
                    # registered in "location_types".
                    if current_type_bound is None:
                        current_type_bound = location_types[current_step_location].name
                        new_step = match_step._replace(
                            coerce_type_block=CoerceType({current_type_bound})
                        )

                    preferred_location_types[current_step_location] = current_type_bound
            elif current_step_location in eligible_locations:
                # This location is eligible, but not preferred. We have to make sure
                # none of the MATCH steps with this location have type bounds, and therefore
                # will not produce a corresponding "class:" clause in the resulting MATCH query.
                current_type_bound = _calculate_type_bound_at_step(match_step)
                previous_type_bound = eligible_location_types.get(current_step_location, None)
                if current_type_bound is not None:
                    # There is a type bound here that we need to neutralize.
                    _assert_type_bounds_are_not_conflicting(
                        current_type_bound, previous_type_bound, current_step_location, match_query
                    )

                    # Record the deduced type bound, so that if we encounter this location again,
                    # we ensure that we again infer the same type bound.
                    eligible_location_types[current_step_location] = current_type_bound

                    # Remove blocks that would emit a "class:" clause
                    if isinstance(match_step.root_block, QueryRoot):
                        new_root_block = None
                    else:
                        # The root_block can be a QueryRoot, Traverse, Recurse or Backtrack.
                        new_root_block = match_step.root_block
                    new_coerce_type_block = None
                    new_where_block = match_step.where_block

                    # If needed, add a type bound that emits an INSTANCEOF in the "where:" clause
                    if previous_type_bound is None and current_type_bound is not None:
                        instanceof_predicate = BinaryComposition(
                            "INSTANCEOF", LocalField("@this", None), Literal(current_type_bound)
                        )
                        if match_step.where_block:
                            # TODO(bojanserafimov): This branch needs test coverage
                            new_where_block = Filter(
                                BinaryComposition(
                                    "&&", instanceof_predicate, match_step.where_block.predicate
                                )
                            )
                        else:
                            new_where_block = Filter(instanceof_predicate)

                    new_step = match_step._replace(
                        root_block=new_root_block,
                        coerce_type_block=new_coerce_type_block,
                        where_block=new_where_block,
                    )
            else:
                # This location is neither preferred nor eligible.
                # No action is necessary at this location.
                pass

            new_traversal.append(new_step)
        new_match_traversals.append(new_traversal)

    return match_query._replace(match_traversals=new_match_traversals)


def _expose_all_eligible_locations(match_query, location_types, eligible_locations):
    """Return a MATCH query where all eligible locations are valid as query start locations."""
    eligible_location_types = dict()

    new_match_traversals = []
    for current_traversal in match_query.match_traversals:
        new_traversal = []
        for match_step in current_traversal:
            new_step = match_step
            current_step_location = match_step.as_block.location

            if current_step_location in eligible_locations:
                # This location is eligible. We need to make sure it has an associated type bound,
                # so that it produces a "class:" clause that will make it a valid query start
                # location. It either already has such a type bound, or we can use the type
                # implied by the GraphQL query structure to add one.
                current_type_bound = _calculate_type_bound_at_step(match_step)
                previous_type_bound = eligible_location_types.get(current_step_location, None)
                if current_type_bound is None:
                    current_type_bound = location_types[current_step_location].name
                    new_coerce_type_block = CoerceType({current_type_bound})
                    new_step = match_step._replace(coerce_type_block=new_coerce_type_block)
                else:
                    # There is a type bound here. We simply ensure that the bound is not conflicting
                    # with any other type bound at a different MATCH step with the same location.
                    _assert_type_bounds_are_not_conflicting(
                        current_type_bound, previous_type_bound, current_step_location, match_query
                    )

                # Record the deduced type bound, so that if we encounter this location again,
                # we ensure that we again infer the same type bound.
                eligible_location_types[current_step_location] = current_type_bound
            else:
                # This function may only be called if there are no preferred locations. Since this
                # location cannot be preferred, and is not eligible, it must be ineligible.
                # No action is necessary in this case.
                pass

            new_traversal.append(new_step)
        new_match_traversals.append(new_traversal)

    return match_query._replace(match_traversals=new_match_traversals)


def expose_ideal_query_execution_start_points(
    compound_match_query, location_types, coerced_locations
):
    """Ensure that OrientDB only considers desirable query start points in query planning."""
    new_queries = []

    for match_query in compound_match_query.match_queries:
        location_classification = _classify_query_locations(match_query)
        preferred_locations, eligible_locations, _ = location_classification

        if preferred_locations:
            # Convert all eligible locations into non-eligible ones, by removing
            # their "class:" clause. The "class:" clause is provided either by having
            # a QueryRoot block or a CoerceType block in the MatchStep corresponding
            # to the location. We remove it by converting the class check into
            # an "INSTANCEOF" Filter block, which OrientDB is unable to optimize away.
            new_query = _expose_only_preferred_locations(
                match_query,
                location_types,
                coerced_locations,
                preferred_locations,
                eligible_locations,
            )
        elif eligible_locations:
            # Make sure that all eligible locations have a "class:" clause by adding
            # a CoerceType block that is a no-op as guaranteed by the schema. This merely
            # ensures that OrientDB is able to use each of these locations as a query start point,
            # and will choose the one whose class is of lowest cardinality.
            new_query = _expose_all_eligible_locations(
                match_query, location_types, eligible_locations
            )
        else:
            raise AssertionError(
                "This query has no preferred or eligible query start locations. "
                "This is almost certainly a bug: {}".format(match_query)
            )

        new_queries.append(new_query)
...
Source:emit_match.py
"""Convert lowered IR basic blocks to MATCH query strings."""
from collections import deque

import six

from .blocks import Filter, MarkLocation, QueryRoot, Recurse, Traverse
from .expressions import TrueLiteral
from .helpers import get_only_element_from_collection, validate_safe_string


def _get_vertex_location_name(location):
    """Get the location name from a location that is expected to point to a vertex."""
    mark_name, field_name = location.get_location_name()
    if field_name is not None:
        raise AssertionError(u"Location unexpectedly pointed to a field: {}".format(location))
    return mark_name


def _first_step_to_match(match_step):
    """Transform the very first MATCH step into a MATCH query string."""
    parts = []

    if match_step.root_block is not None:
        if not isinstance(match_step.root_block, QueryRoot):
            raise AssertionError(
                u"Expected None or QueryRoot root block, received: "
                u"{} {}".format(match_step.root_block, match_step)
            )
        match_step.root_block.validate()
        start_class = get_only_element_from_collection(match_step.root_block.start_class)
        parts.append(u"class: %s" % (start_class,))

    if match_step.coerce_type_block is not None:
        raise AssertionError(u"Invalid MATCH step: {}".format(match_step))

    if match_step.where_block:
        match_step.where_block.validate()
        parts.append(u"where: (%s)" % (match_step.where_block.predicate.to_match(),))

    if match_step.as_block is None:
        raise AssertionError(
            u"Found a MATCH step without a corresponding Location. "
            u"This should never happen: {}".format(match_step)
        )
    else:
        match_step.as_block.validate()
        parts.append(u"as: %s" % (_get_vertex_location_name(match_step.as_block.location),))

    return u"{{ %s }}" % (u", ".join(parts),)


def _subsequent_step_to_match(match_step):
    """Transform any subsequent (non-first) MATCH step into a MATCH query string."""
    if not isinstance(match_step.root_block, (Traverse, Recurse)):
        raise AssertionError(
            u"Expected Traverse root block, received: "
            u"{} {}".format(match_step.root_block, match_step)
        )

    is_recursing = isinstance(match_step.root_block, Recurse)
    match_step.root_block.validate()

    traversal_command = u".%s('%s')" % (match_step.root_block.direction, match_step.root_block.edge_name)

    parts = []
    if match_step.coerce_type_block:
        coerce_type_set = match_step.coerce_type_block.target_class
        if len(coerce_type_set) != 1:
            raise AssertionError(
                u"Found MATCH type coercion block with more than one target class:"
                u" {} {}".format(coerce_type_set, match_step)
            )
        coerce_type_target = list(coerce_type_set)[0]
        parts.append(u"class: %s" % (coerce_type_target,))

    if is_recursing:
        parts.append(u"while: ($depth < %d)" % (match_step.root_block.depth,))

    if match_step.where_block:
        match_step.where_block.validate()
        parts.append(u"where: (%s)" % (match_step.where_block.predicate.to_match(),))

    if not is_recursing and match_step.root_block.optional:
        parts.append(u"optional: true")

    if match_step.as_block:
        match_step.as_block.validate()
        parts.append(u"as: %s" % (_get_vertex_location_name(match_step.as_block.location),))

    return u"%s {{ %s }}" % (traversal_command, u", ".join(parts))


def _represent_match_traversal(match_traversal):
    """Emit MATCH query code for an entire MATCH traversal sequence."""
    output = []
    output.append(_first_step_to_match(match_traversal[0]))
    for step in match_traversal[1:]:
        output.append(_subsequent_step_to_match(step))
    return u"".join(output)


def _represent_fold(fold_location, fold_ir_blocks):
    """Emit a LET clause corresponding to the IR blocks for a @fold scope."""
    start_let_template = u"$%(mark_name)s = %(base_location)s"
    traverse_edge_template = u'.%(direction)s("%(edge_name)s")'
    base_template = start_let_template + traverse_edge_template

    edge_direction, edge_name = fold_location.get_first_folded_edge()
    mark_name, _ = fold_location.get_location_name()
    base_location_name, _ = fold_location.base_location.get_location_name()

    validate_safe_string(mark_name)
    validate_safe_string(base_location_name)
    validate_safe_string(edge_direction)
    validate_safe_string(edge_name)

    template_data = {"mark_name": mark_name, "base_location": base_location_name, "direction": edge_direction, "edge_name": edge_name}
    final_string = base_template % template_data

    for block in fold_ir_blocks:
        if isinstance(block, Filter):
            final_string += u"[" + block.predicate.to_match() + u"]"
        elif isinstance(block, Traverse):
            template_data = {"direction": block.direction, "edge_name": block.edge_name}
            final_string += traverse_edge_template % template_data
        elif isinstance(block, MarkLocation):
            pass
        else:
            raise AssertionError(
                u"Found an unexpected IR block in the folded IR blocks: "
                u"{} {} {}".format(type(block), block, fold_ir_blocks)
            )

    final_string += ".asList()"
    return final_string


def _construct_output_to_match(output_block):
    """Transform a ConstructResult block into a MATCH query string."""
    output_block.validate()

    selections = (u"%s AS `%s`" % (output_block.fields[key].to_match(), key) for key in sorted(output_block.fields.keys()))
    return u"SELECT %s FROM" % (u", ".join(selections),)


def _construct_where_to_match(where_block):
    """Transform a Filter block into a MATCH query string."""
    if where_block.predicate == TrueLiteral:
        raise AssertionError(u"Received WHERE block with TrueLiteral predicate: {}".format(where_block))
    return u"WHERE " + where_block.predicate.to_match()


def emit_code_from_single_match_query(match_query):
    """Return a MATCH query string from a list of IR blocks."""
    query_data = deque([u"MATCH "])

    if not match_query.match_traversals:
        raise AssertionError(
            u"Unexpected falsy value for match_query.match_traversals received: "
            u"{} {}".format(match_query.match_traversals, match_query)
        )

    match_traversal_data = [_represent_match_traversal(x) for x in match_query.match_traversals]

    query_data.append(match_traversal_data[0])
    for traversal_data in match_traversal_data[1:]:
        query_data.append(u", ")
        query_data.append(traversal_data)

    query_data.appendleft(u" (")
    query_data.append(u"RETURN $matches)")

    fold_data = sorted([_represent_fold(fold_location, fold_ir_blocks) for fold_location, fold_ir_blocks in six.iteritems(match_query.folds)])
    if fold_data:
        query_data.append(u" LET ")
        query_data.append(fold_data[0])
        for fold_clause in fold_data[1:]:
            query_data.append(u", ")
            query_data.append(fold_clause)

    query_data.appendleft(_construct_output_to_match(match_query.output_block))

    if match_query.where_block is not None:
        query_data.append(_construct_where_to_match(match_query.where_block))

    return u" ".join(query_data)


def emit_code_from_multiple_match_queries(match_queries):
    """Return a MATCH query string from a list of MatchQuery namedtuples."""
    optional_variable_base_name = "$optional__"
    union_variable_name = "$result"

    query_data = deque([u"SELECT EXPAND(", union_variable_name, u")", u" LET "])

    optional_variables = []
    sub_queries = [emit_code_from_single_match_query(match_query) for match_query in match_queries]
    for (i, sub_query) in enumerate(sub_queries):
        variable_name = optional_variable_base_name + str(i)
        variable_assignment = variable_name + u" = ("
        sub_query_end = u"),"
        query_data.append(variable_assignment)
        query_data.append(sub_query)
        query_data.append(sub_query_end)
        optional_variables.append(variable_name)

    query_data.append(union_variable_name)
    query_data.append(u" = UNIONALL(")
    query_data.append(u", ".join(optional_variables))
    query_data.append(u")")

    return u" ".join(query_data)


def emit_code_from_ir(schema_info, compound_match_query):
    """Return a MATCH query string from a CompoundMatchQuery."""
    match_queries = compound_match_query.match_queries
    if len(match_queries) == 1:
        query_string = emit_code_from_single_match_query(match_queries[0])
    elif len(match_queries) > 1:
        query_string = emit_code_from_multiple_match_queries(match_queries)
    else:
        raise AssertionError(
            u"Received CompoundMatchQuery with an empty list of MatchQueries: "
            u"{}".format(match_queries)
        )
...
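The per-step emitters in this file all follow the same pattern: collect the applicable clauses into a list of parts and join them inside a `{{ ... }}` vertex pattern. A toy re-creation of that assembly, with a plain dict standing in for a MATCH step (the dict keys here are illustrative, not the real MatchStep fields):

```python
# A simplified sketch of the string assembly in _first_step_to_match.
# "step" is a plain dict standing in for the compiler's MatchStep namedtuple.
def first_step_to_match_sketch(step):
    parts = []
    if step.get("start_class") is not None:
        # The QueryRoot start class becomes the "class:" clause.
        parts.append(u"class: %s" % step["start_class"])
    if step.get("where") is not None:
        # The filter predicate becomes the "where:" clause.
        parts.append(u"where: (%s)" % step["where"])
    # The MarkLocation name becomes the "as:" clause.
    parts.append(u"as: %s" % step["as_name"])
    return u"{{ %s }}" % u", ".join(parts)
```

For example, a start class of `Animal`, a predicate `name = 'Nate'`, and mark name `Animal___1` would render as `{{ class: Animal, where: (name = 'Nate'), as: Animal___1 }}` — the doubled braces are emitted literally, as OrientDB's MATCH syntax requires.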
Source:main_tournament_results.py
import pandas as pd
from functions import get_soup
import re


# Get tournament days
def get_tournament_days(step):
    base_url_hit = 'https://bwf.tournamentsoftware.com/sport/matches.aspx?id='
    url_hit = base_url_hit + step
    soup_html = get_soup(url_hit)

    t_days = soup_html.find('ul', attrs={'class': 'tournamentcalendar'})
    test = t_days.find_all('li')

    base_hit = 'https://bwf.tournamentsoftware.com'
    url_hit_tourney_days = []
    for i in test:
        url_hit_tourney_days.append(base_hit + re.search('href="(.*)"><span', str(i)).group(1))
    return url_hit_tourney_days


def get_game_results(soup_tournament_day, game_scores):
    match_table = soup_tournament_day.find('table', attrs={'class': 'ruler matches'})
    match_table = match_table.find_all('tr')

    for step_match in match_table:
        # step_match = match_table[8]
        match_type = len(step_match.find_all('tr'))
        if match_type > 0:
            match_det = step_match
            match_step = match_det.find_all('tr')
            if match_type == 2:
                match_type = 'singles'
                p1a = match_step[0].text.split('[')[0].replace('\n', '')
                p2a = match_step[1].text.split('] ')[1].replace('\n', '')
                p1b = 'none'
                p2b = 'none'
                match_duration = match_det.find_all('td')[13].text
                # a = match_det.find_all('td')[10].text.split(' ')
            elif match_type == 4:
                match_type = 'doubles'
                p1a = match_step[0].text.split('[')[0].replace('\n', '')
                p1b = match_step[1].text.split('[')[0].replace('\n', '')
                p2a = match_step[2].text.split('] ')[1].replace('\n', '')
                p2b = match_step[3].text.split('] ')[1].replace('\n', '')
                match_duration = match_det.find_all('td')[17].text
                # a = match_det.find_all('td')[14].text.split(' ')

            if match_det.find('span', attrs={'class': 'score'}) is None:
                a = 'Walkover'
            else:
                a = match_det.find('span', attrs={'class': 'score'}).text

            if (a == 'Walkover') | (a == 'No match'):
                if 'strong' in str(match_step[0]):
                    scores = ['Walkover Win', 'Walkover Loss']
                else:
                    scores = ['Walkover Loss', 'Walkover Win']
                build_list = {
                    'p1a': p1a,
                    'p1b': p1b,
                    'p2a': p2a,
                    'p2b': p2b,
                    'p1_scores': scores[0],
                    'p2_scores': scores[1],
                    'tournament_id': tournament_id,
                    'date': tournament_date,
                    'duration': match_duration,
                    'type': match_type
                }
                game_scores.append(build_list)
            else:
                a = re.sub(r"\s+", " ", a.strip())
                a = a.split(' ')
                for i in a:
                    if i == 'Retired':
                        if 'strong' in str(match_step[0]):
                            scores = ['Walkover Win', 'Walkover Loss']
                        else:
                            scores = ['Walkover Loss', 'Walkover Win']
                    else:
                        scores = i.split('-')
                    build_list = {
                        'p1a': p1a,
                        'p1b': p1b,
                        'p2a': p2a,
                        'p2b': p2b,
                        'p1_scores': scores[0],
                        'p2_scores': scores[1],
                        'tournament_id': tournament_id,
                        'date': tournament_date,
                        'duration': match_duration,
                        'type': match_type
                    }
                    game_scores.append(build_list)
    return game_scores


tournament_list = pd.read_csv('data/00_source/tournament_list.csv', sep="|")
skip_list = [14, 21, 32, 52, 54, 57, 61, 63, 67, 69, 88, 93, 94, 95, 96, 116, 121, 122, 126, 129, 136, 138, 149, 155, 171, 177, 178, 179, 180, 183, 188, 194, 198, 200, 201, 204, 231, 233, 237, 241, 268, 270, 271, 282, 285, 293, 298, 324, 325, 340, 343, 362, 363, 367, 371, 375, 377, 383, 387, 405, 425, 430, 437, 453, 461, 465, 471, 474, 476, 479, 491, 492, 503, 518, 520, 522, 526, 533, 534, 537, 539, 547, 559, 570, 584, 590, 594, 600, 613, 623, 624, 647, 657, 658, 659, 685, 686, 702, 723, 760, 764, 807, 815, 818, 823, 831, 835, 836, 849, 850, 858, 875, 876, 878, 881, 890, 891, 898, 936, 942, 947, 948, 972, 983, 1036, 1069, 1071, 1083, 1085, 1091, 1108, 1109, 1124, 1153, 1172, 1175, 1228, 1229, 1230, 1231, 1232, ]

for step in range(max(skip_list), len(tournament_list)):
    # step = 32
    if step not in skip_list:
        # print(step)
        tournament_id = tournament_list['tournament_id'][step]
        print('{}: {}'.format(step, tournament_id))
        game_scores = []
        url_hit_tourney_days = get_tournament_days(tournament_id)

        test_t = get_soup(url_hit_tourney_days[0])

        if 'printonly flag' in str(test_t):
            for tourney_day in range(0, len(url_hit_tourney_days)):
                hit_tourney = url_hit_tourney_days[tourney_day]
                tournament_date = hit_tourney.split(';d=')[1]
                soup_tournament_day = get_soup(hit_tourney)
                game_scores = get_game_results(soup_tournament_day, game_scores)

            game_scores_df = pd.DataFrame(game_scores)
            filename = "data/00_source/match_results_{}.csv".format(tournament_id)
            game_scores_df.to_csv(filename, index=False, sep='|', header=True)
        else:
            skip_list.append(step)
...
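The heart of get_game_results is turning a raw score string like "21-19 18-21 21-15" into per-game (p1, p2) pairs, with "Retired" handled as a special case. A standalone sketch of just that parsing step (the function name and return shape are illustrative, not part of the original script):

```python
import re


def parse_score_string(raw_score):
    """Split a badminton score string into per-game (p1_score, p2_score) pairs.

    Whitespace is normalized first, mirroring the re.sub(r"\s+", " ", ...) call
    in get_game_results. "Retired" tokens are skipped here; the original script
    records them as walkover wins/losses instead.
    """
    games = re.sub(r"\s+", " ", raw_score.strip()).split(" ")
    results = []
    for game in games:
        if game == "Retired":
            continue  # handled separately as a walkover in the original
        p1_score, p2_score = game.split("-")
        results.append((p1_score, p2_score))
    return results
```

For example, "21-19 18-21 21-15" yields three pairs, one per game, with scores kept as strings just as the original script stores them.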
Source:eval_pprint.py
#!/usr/bin/env python
from __future__ import print_function
import json
import copy
import sys
import argparse
import re
import numpy as np

basedir = "tmp_data/evaluation"


def parse_args():
    desc = 'Find highest and worst performing clusters'
    parser = argparse.ArgumentParser(description=desc)
    parser.add_argument(
        'k', type=str, default='600',
        help='Number of K to check or a bash-style range (start..end[..step])' \
             ' e.g., 100..1000..10, meaning 100 through 1000 k, with a step of 10.' \
             ' Note that step is optional and will default to 1.'
    )
    parser.add_argument(
        '--output', type=str, default='all',
        help='Output type: all or combined'
    )
    args = parser.parse_args()
    return args


def parse_range(range_str):
    if '..' not in range_str:
        return [int(range_str)]
    match_step = re.match(r'([0-9]+)\.\.([0-9]+)\.\.([0-9]+)', range_str)
    match_range = re.match(r'([0-9]+)\.\.([0-9]+)', range_str)
    if not match_range and not match_step:
        raise ValueError('Invalid range supplied %s' % range_str)
    start_k = None
    end_k = None
    step = 1
    if match_step:
        start_k, end_k, step = match_step.groups()
    elif match_range:
        start_k, end_k = match_range.groups()
    return range(int(start_k), int(end_k), int(step))


def log(msg):
    sys.stderr.write("%s\n" % msg)


def load_object(filename, postprocess=json.loads):
    log("Loading object %s" % filename)
    with open(filename, 'r') as f:
        return postprocess(f.read())


def process_csv(csv_string):
    csv_lines = csv_string.strip().split('\n')
    lines = filter(lambda x: x, csv_string.split('\n'))
    data = {}
    for line in lines:
        cluster, words = line.split(',')
        data[cluster.strip()] = filter(lambda x: x, words.strip().split())
    return data


if __name__ == "__main__":
    args = parse_args()
    k_range = parse_range(args.k)
    models = [
        "fasttext_spherical",
        "word2vec_spherical",
        "fasttext_kmeans",
        "word2vec_kmeans",
        "lda_lda"
    ]
    template = {
        "eval": "%(base)s/fulltext_%(model)s_%(k)s_clusters_evaluation.json",
        "clust": "tmp_data/fulltext_%(model)s_%(k)s_clusters/clusters.csv"
    }
    cluster_scores = []
    for k in k_range:
        for model in models:
            filename_args = {
                "k": k,
                "base": basedir,
                "model": model
            }
            evalfile = template["eval"] % (filename_args)
            clustfile = template["clust"] % (filename_args)
            evals = load_object(evalfile)
            clusters = load_object(clustfile, postprocess=process_csv)
            for cluster in evals.keys():
                score = evals[cluster]['mean']
                cluster_labels = ", ".join(clusters[cluster])
                # print('cluster_eval %s' % cluster_eval)
                # print('cluster_labels %s' % cluster_labels)
                score_name = '%s-%s' % (model, k)
                if np.isnan(score):
                    continue
                cluster_scores.append([score, score_name, cluster_labels])
    seen_identifiers = {}
    for score, cluster_name, labels in sorted(cluster_scores):
        if args.output == 'all':
            print('%s %s %s' % (score, cluster_name, labels))
            continue
        unique_identifier = '%s %s' % (score, labels)
        if unique_identifier not in seen_identifiers:
            seen_identifiers[unique_identifier] = []
        seen_identifiers[unique_identifier].append(cluster_name)
    if args.output == 'combined':
        for identifier in sorted(seen_identifiers.keys()):
            score, labels = identifier.split(' ', 1)
            cluster_names = seen_identifiers[identifier]
...
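The range semantics of parse_range are worth making explicit: it tries the three-group "start..end..step" pattern first, falls back to "start..end", and hands the pieces to Python's range(), so the end value is exclusive even though the help text says "through". A condensed, stdlib-only restatement of the same logic:

```python
import re


def parse_range(range_str):
    """Parse '600', '100..130', or '100..130..10' into a range of k values.

    Mirrors the parse_range in eval_pprint.py: like Python's range(),
    the end value is exclusive.
    """
    if '..' not in range_str:
        return [int(range_str)]
    # Try the step-ful form first; '100..130..10' also matches the
    # two-group pattern, so order matters here.
    match_step = re.match(r'([0-9]+)\.\.([0-9]+)\.\.([0-9]+)', range_str)
    match_range = re.match(r'([0-9]+)\.\.([0-9]+)', range_str)
    if not match_range and not match_step:
        raise ValueError('Invalid range supplied %s' % range_str)
    step = 1
    if match_step:
        start_k, end_k, step = match_step.groups()
    else:
        start_k, end_k = match_range.groups()
    return range(int(start_k), int(end_k), int(step))
```

So '100..130..10' produces 100, 110, 120 but not 130; callers who want an inclusive bash-style range would need to pass end + step.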