frex.utils package

Submodules

frex.utils.class_generator module

class frex.utils.class_generator.ClassGenerator(*, onto_file: str, save_dir: pathlib.Path)

Bases: object

The ClassGenerator utility is used to generate python dataclasses based on an ontology’s data models. These generated dataclasses should be suitable for use with the DomainKgQueryService, as its basic query implementation relies on some properties that are automatically included in classes generated by this tool.

Dataclasses generated by this utility have type hints, but the hints are not very detailed. In particular, when a property is known to point to another domain object, this utility simply adds a type hint of “URIRef”. This is partly because some restrictions on property ranges in OWL are difficult to parse in a meaningful way, and partly because we can’t guarantee that a user would want to fully resolve the URIs that a given property points to. Re-querying and converting results to objects could also cause cycles if URIs point to each other through certain properties, so stopping once a property refers to a URI simplifies the process.

add_restriction(*, p: owlready2.class_construct.Restriction, properties: List)

Add properties based on owl restrictions. Restrictions should correspond to class restrictions in owl, such as requiring some property be filled for a class to be valid.

Parameters
  • p – The target class restriction to parse into a property for code generation

  • properties – The ongoing list of properties for the current class that is being updated

convert_to_py_class(c: owlready2.entity.ThingClass) → Tuple[str, List[str]]

Produce a string to generate a python dataclass corresponding to the input owl class.

Parameters

c – The target owl class to generate code for

Returns

A tuple containing the string that will be output to a file for the class and a list of superclasses that the generated class will inherit from. The superclasses are necessary to ensure that import ordering is correct and that circular import errors aren’t caused down the line.

generate_classes()

Generate python dataclasses based on classes present in an ontology.

get_inner_restrictions(*, p: owlready2.class_construct, properties: List)

Parse restrictions that are nested within a class construct. This should be called when a restriction that is a logical construct (an AND or OR type) occurs.

Parameters
  • p – The target class construct to parse into a property for code generation

  • properties – The ongoing list of properties for the current class that is being updated

get_property_names_and_types(c: owlready2.entity.ThingClass) → List[Tuple[str, Any, str]]

For the target owl class, extract the property names and types that the class should have. These properties are based on owl restrictions that define the class.

Parameters

c – The target class to extract property names for

Returns

A list of tuples, ordered as (prop_name, prop_type, prop_iri).

get_superclass_names(c: owlready2.entity.ThingClass) → List[str]

Identify all superclasses of a given owl class and, if those classes are present in the main ontology, return a list of their names.

Parameters

c – The target owl class to get superclasses for

Returns

a list of class names that are valid superclasses of the target class

populate_template(*, name: str, superclasses: List[str], properties: List[Tuple[str, Any, str]]) → str

Populate a template for producing generated Python dataclasses. The current template is based on implementations in Python 3.8; later versions introduce some minor features (such as keyword-only dataclasses, added in Python 3.10) that may call for changes. For the moment, templates are populated assuming that none of the dataclass’s properties have default values; instead, the querying service is assumed to add default values when the appropriate properties weren’t returned as part of a SPARQL query.

Parameters
  • name – The name of the class to generate

  • superclasses – A list of superclass names that the generated class should inherit from. All superclasses in this list are expected to also be generated by this same code generation script.

  • properties – A list of tuples, ordered as (prop_name, prop_type, prop_iri), corresponding to the properties that this dataclass should include.

Returns

A string containing the content of the new Python dataclass that will be written to a file
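As a rough illustration of what populate_template returns, the sketch below renders dataclass source from a name, a superclass list, and (prop_name, prop_type, prop_iri) tuples. The template layout, the trailing IRI comment, and the example IRIs are assumptions rather than the actual frex template; property types are rendered by name, so a property that points to another domain object can simply carry a “URIRef” hint with no default value.

```python
from typing import Any, List, Tuple


def populate_template(*, name: str, superclasses: List[str],
                      properties: List[Tuple[str, Any, str]]) -> str:
    """Hypothetical template: render the source of a dataclass as a string."""
    header = f"class {name}({', '.join(superclasses)}):" if superclasses else f"class {name}:"
    lines = ["@dataclass", header]
    if not properties:
        lines.append("    pass")
    for prop_name, prop_type, prop_iri in properties:
        # Types may be real classes (int) or plain names ("URIRef").
        type_name = getattr(prop_type, "__name__", str(prop_type))
        # No default values: the query service is assumed to fill in missing fields.
        lines.append(f"    {prop_name}: {type_name}  # {prop_iri}")
    return "\n".join(lines) + "\n"


source = populate_template(
    name="Recipe",
    superclasses=["Food"],
    properties=[
        ("has_ingredient", "URIRef", "http://example.org/hasIngredient"),
        ("calories", int, "http://example.org/calories"),
    ],
)
print(source)
```

The returned string still needs `from dataclasses import dataclass` plus the URIRef and superclass imports at the top of the output file, which is consistent with convert_to_py_class returning superclass names alongside the code.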

to_snake_case(name: str) → str
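to_snake_case carries no description. Assuming it converts camelCase or PascalCase ontology names into snake_case Python identifiers, a common regex-based approach looks like the following; this is a sketch, not the frex implementation.

```python
import re


def to_snake_case(name: str) -> str:
    # Split before a capitalized word ("sIngredient" -> "s_Ingredient").
    s = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split a lowercase letter or digit followed by a capital ("eC" -> "e_C").
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.lower()


print(to_snake_case("hasIngredient"))     # has_ingredient
print(to_snake_case("HTTPResponseCode"))  # http_response_code
```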

frex.utils.common module

frex.utils.common.rgetattr(obj, attr, *args)

Get an attribute of an object, allowing for the target attribute to be an attribute of some sub-object.
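A recursive getattr of this shape is commonly written as a functools.reduce over a dot-separated attribute path. The sketch below assumes a dotted string such as "domain_object.calories" and forwards an optional default through *args to every lookup; the example objects are hypothetical.

```python
import functools
from types import SimpleNamespace


def rgetattr(obj, attr, *args):
    """Resolve a dotted attribute path such as "domain_object.calories"."""
    def _getattr(current, name):
        # Forward the optional default to every lookup along the path.
        return getattr(current, name, *args)
    return functools.reduce(_getattr, [obj] + attr.split("."))


candidate = SimpleNamespace(domain_object=SimpleNamespace(calories=250))
print(rgetattr(candidate, "domain_object.calories"))    # 250
print(rgetattr(candidate, "domain_object.cost", None))  # None
```

Because the default is applied at each step, a missing intermediate attribute also yields the default instead of raising AttributeError.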

frex.utils.constraint_solver module

class frex.utils.constraint_solver.ConstraintSolver(*, scaling: int = 1)

Bases: object

A class to perform constraint solving to produce a final solution of items using constraints on the overall set of items.

add_item_selection_constraint(*, item_a_uri: rdflib.term.URIRef, item_b_uri: rdflib.term.URIRef, constraint_type: frex.models.constraints.constraint_type.ConstraintType)

Require that candidates chosen in the final solution satisfy some relationship based on the constraint, e.g., EQ to ensure that item_a and item_b are either both selected or both not selected, or LEQ to ensure that if item_a is selected then item_b must also be selected.

Parameters
  • item_a_uri – The domain object’s URI of the first item

  • item_b_uri – The domain object’s URI of the second item

  • constraint_type – The type of constraint to apply for how the final items are selected
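In a 0/1 integer program, each item gets a selection variable x (1 if selected, 0 otherwise), so these pairwise constraints become linear relations between two variables: EQ is x_a = x_b and LEQ is x_a ≤ x_b. The sketch enumerates which selections each relation allows. The ConstraintType enum here is a hypothetical stand-in, and GEQ is assumed by symmetry with the ==, <=, and >= types listed for add_overall_item_constraint.

```python
from enum import Enum
from itertools import product


class ConstraintType(Enum):
    """Hypothetical stand-in for frex.models.constraints.constraint_type.ConstraintType."""
    EQ = "=="
    LEQ = "<="
    GEQ = ">="


def selection_allowed(x_a: int, x_b: int, constraint_type: ConstraintType) -> bool:
    # x_a and x_b are 0/1 selection variables for item_a and item_b.
    if constraint_type is ConstraintType.EQ:
        return x_a == x_b   # both selected or both not selected
    if constraint_type is ConstraintType.LEQ:
        return x_a <= x_b   # selecting item_a forces item_b
    return x_a >= x_b       # GEQ: selecting item_b forces item_a


for ct in ConstraintType:
    allowed = [(a, b) for a, b in product((0, 1), repeat=2) if selection_allowed(a, b, ct)]
    print(ct.name, allowed)
# EQ [(0, 0), (1, 1)]
# LEQ [(0, 0), (0, 1), (1, 1)]
# GEQ [(0, 0), (1, 0), (1, 1)]
```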

add_overall_count_constraint(*, min_count: Optional[int] = None, max_count: Optional[int] = None, exact_count: Optional[int] = None)

Set constraints on the total number of items chosen for the solution. This function checks for an exact count first; if one is given, it creates only a single constraint requiring the number of items assigned to the target section to equal that quantity. Otherwise, both a minimum and a maximum count of items assigned to a section can be specified.

Parameters
  • min_count – The minimum number of items to assign to the target section

  • max_count – The maximum number of items to assign to the target section

  • exact_count – An exact number of items to assign to the target section
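The precedence rule described above can be sketched as a helper that returns (operator, bound) pairs. count_constraints and count_ok are hypothetical names that only mirror the documented logic; they are not the solver's internal representation, and treating "given" as "not None" is an assumption.

```python
import operator
from typing import List, Optional, Tuple

OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le}


def count_constraints(*, min_count: Optional[int] = None,
                      max_count: Optional[int] = None,
                      exact_count: Optional[int] = None) -> List[Tuple[str, int]]:
    # An exact count takes precedence: only the equality constraint is created.
    if exact_count is not None:
        return [("==", exact_count)]
    constraints = []
    if min_count is not None:
        constraints.append((">=", min_count))
    if max_count is not None:
        constraints.append(("<=", max_count))
    return constraints


def count_ok(n_selected: int, constraints: List[Tuple[str, int]]) -> bool:
    """Check a selected-item count against every (operator, bound) pair."""
    return all(OPS[op](n_selected, bound) for op, bound in constraints)


print(count_constraints(min_count=2, max_count=5, exact_count=3))  # [('==', 3)]
print(count_constraints(min_count=2, max_count=5))                 # [('>=', 2), ('<=', 5)]
```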

add_overall_item_constraint(*, attribute_name: str, constraint_type: frex.models.constraints.constraint_type.ConstraintType, constraint_value: int) → frex.utils.constraint_solver.ConstraintSolver

Add a constraint to be applied to the entire solution. E.g., a constraint on the cost of all items chosen across all sections of the solution.

Parameters
  • attribute_name – The domain_object’s attribute to apply the constraint to

  • constraint_type – The type of constraint, i.e. ==, <=, or >=

  • constraint_value – The value to constrain the solution to

Returns

self, with a new Constraint added to the overall_constraints list
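An overall item constraint compares the sum of one attribute across every selected item, in all sections, with a bound. The sketch checks that relation for a fixed selection. The Item dataclass, its cost attribute, and the URNs are hypothetical; a real solver would presumably encode the same sum as a linear constraint over selection variables instead of checking it afterwards.

```python
import operator
from dataclasses import dataclass
from typing import Sequence

OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge}


@dataclass
class Item:
    """Hypothetical domain object with a numeric attribute to constrain."""
    uri: str
    cost: int


def overall_constraint_holds(selected: Sequence[Item], attribute_name: str,
                             op: str, value: int) -> bool:
    # Sum the attribute across all selected items, regardless of section.
    total = sum(getattr(item, attribute_name) for item in selected)
    return OPS[op](total, value)


plan = [Item("urn:a", 3), Item("urn:b", 4)]
print(overall_constraint_holds(plan, "cost", "<=", 10))  # True  (3 + 4 = 7)
print(overall_constraint_holds(plan, "cost", ">=", 10))  # False
```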

add_required_item_selection(*, target_uri: rdflib.term.URIRef)

Require that the final solution selects a candidate whose domain object has the target URI.

Parameters

target_uri – the URI of the item that must be included in the final solution

set_candidates(*, candidates: Tuple[frex.models.candidate.Candidate, ...])

Set the candidates that will be used to produce the solution. Candidates are expected to be produced as the output of some pipeline, which handles scoring. The solver will not handle any sort of scoring for the candidates, but rather it will produce an optimized solution based on the Candidates’ total_score (which should be computed by a pipeline) and other constraints.

Parameters

candidates – A tuple of Candidate objects, with corresponding domain_objects and scores

Returns

self, with an updated list of candidates

set_section_set_constraints(*, section_sets: Tuple[frex.models.constraints.section_set_constraint.SectionSetConstraint, ...])

Set all the SectionSetConstraints that need to be solved to produce a valid solution.

Parameters

section_sets – A tuple of SectionSetConstraints that will be applied to the solution

solve(*, output_uri: rdflib.term.URIRef) → Optional[frex.models.constraints.constraint_solution.ConstraintSolution]

Perform integer programming to solve constraints and maximize an objective function based on the total scores applied to candidates. This function expects candidates that are the result of some recommendation pipeline (i.e., candidates have scores, and problematic candidates have already been filtered out).

This will produce outputs assigning candidates to ‘sections’. A section can be thought of as e.g. a day in a meal plan, or a semester in a student’s plan-of-study.

Currently assumes that (1) the objective function is always to maximize the total score of the final output, (2) each candidate can only be a part of one section, (3) each section must have an exact number of candidates assigned to it, and (4) the order of sections does not matter.

Parameters

output_uri – The URI to attach to the output constraint solution
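On a tiny instance, the assignment problem that solve describes can be checked exhaustively. The sketch below is a brute-force stand-in for the integer program, not the frex solver: it maximizes the total score under assumptions (2) and (3), using a plain dict of hypothetical candidate scores. Since section order does not matter (assumption 4), it visits equivalent orderings redundantly, which costs time but not correctness.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Solution = Tuple[float, List[Tuple[str, ...]]]


def solve_sections(scores: Dict[str, float], n_sections: int,
                   per_section: int) -> Optional[Solution]:
    """Exhaustively assign candidates to sections, maximizing the total score."""
    best: Optional[Solution] = None

    def assign(remaining: List[str], sections: List[Tuple[str, ...]], total: float) -> None:
        nonlocal best
        if len(sections) == n_sections:
            if best is None or total > best[0]:
                best = (total, list(sections))
            return
        # Each section takes exactly per_section candidates not used elsewhere.
        for group in combinations(remaining, per_section):
            rest = [c for c in remaining if c not in group]
            assign(rest, sections + [group], total + sum(scores[c] for c in group))

    assign(sorted(scores), [], 0.0)
    return best  # None if there are too few candidates to fill every section


# Hypothetical scores, e.g. two days of a meal plan with one meal each.
scores = {"meal_a": 5.0, "meal_b": 4.0, "meal_c": 1.0, "meal_d": 3.0}
print(solve_sections(scores, n_sections=2, per_section=1))
# (9.0, [('meal_a',), ('meal_b',)])
```

The search grows combinatorially, which is why an integer-programming solver is used for realistic candidate sets.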

frex.utils.vector_similarity_utils module

class frex.utils.vector_similarity_utils.VectorSimilarityUtils

Bases: object

static cosine_sim(*, comparison_vector: numpy.array, comparison_matrix: numpy.array) → numpy.array

Return the cosine similarity between a given vector and the rows of a matrix.

Parameters
  • comparison_vector – The vector to serve as the source of comparison

  • comparison_matrix – A matrix containing rows with which the comparison_vector will be compared

Returns

An array of cosine similarities between the comparison_vector and each row of the comparison_matrix
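Cosine similarity divides the dot product of each row with the vector by the product of their L2 norms. A NumPy sketch, assuming dense inputs; how the real method handles sparse inputs or zero vectors is not documented here, and in this sketch a zero row yields nan.

```python
import numpy as np


def cosine_sim(*, comparison_vector: np.ndarray, comparison_matrix: np.ndarray) -> np.ndarray:
    v = np.asarray(comparison_vector, dtype=float).ravel()
    m = np.asarray(comparison_matrix, dtype=float)
    # Row-wise dot products divided by (row norm * vector norm).
    return (m @ v) / (np.linalg.norm(m, axis=1) * np.linalg.norm(v))


sims = cosine_sim(
    comparison_vector=np.array([1, 0]),
    comparison_matrix=np.array([[1, 0], [0, 1], [1, 1]]),
)
print(sims)  # approximately 1.0, 0.0, 0.7071
```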

static get_item_vector_similarity(*, target_item: Any, target_vector: numpy.array, comparison_items: List[Any], comparison_contents: List[numpy.array]) → List[Tuple[Any, float]]

Convert a list of comparison_items and their corresponding vectors into a matrix and return a list of items and scores. The shape of each item’s content vector is expected to be (1, N). The target item and its vector should not be contained in comparison_items or comparison_contents.

Parameters
  • target_item – The item to get similarities for. Currently unused.

  • target_vector – A vector representing the target_item.

  • comparison_items – A list of other items to compare the target_item with.

  • comparison_contents – A list of vectors that represent each item in comparison_items

Returns

A list of tuples (x, y) where x is an item and y is the similarity of that item and the target_item
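With (1, N) content vectors, the conversion described above amounts to stacking them with np.vstack into a (len(items), N) matrix and scoring each row. The sketch assumes cosine similarity as the measure, since the docstring does not name one, and like the documented method it ignores target_item; the item names are hypothetical.

```python
import numpy as np
from typing import Any, List, Tuple


def get_item_vector_similarity(*, target_item: Any, target_vector: np.ndarray,
                               comparison_items: List[Any],
                               comparison_contents: List[np.ndarray]) -> List[Tuple[Any, float]]:
    # target_item is unused, matching the documented behaviour.
    matrix = np.vstack(comparison_contents)  # shape (len(items), N)
    vector = np.asarray(target_vector, dtype=float).ravel()
    sims = (matrix @ vector) / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(vector))
    return list(zip(comparison_items, sims.tolist()))


pairs = get_item_vector_similarity(
    target_item="oatmeal",
    target_vector=np.array([[1.0, 0.0]]),
    comparison_items=["granola", "steak"],
    comparison_contents=[np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])],
)
print(pairs)  # [('granola', 1.0), ('steak', 0.0)]
```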

static get_top_n_candidates(*, candidate_score_dict: List[Tuple[Any, float]], top_n: int) → List[Tuple[Any, float]]

Get the top N candidates out of a list of tuples, where the second index of the tuple is the item’s score. This score should typically be something like a similarity score, e.g. what comes out of the get_item_vector_similarity function.

Parameters
  • candidate_score_dict – A list of tuples (x, y) where x is an item and y is some score for that item

  • top_n – The number of items to return

Returns

A list of the top N items from candidate_score_dict in descending order
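Selecting the N highest-scored pairs in descending order is what heapq.nlargest does when keyed on the score. A minimal sketch with hypothetical items; it is equivalent to sorting in reverse and slicing, but avoids a full sort when N is small.

```python
import heapq
from typing import Any, List, Tuple


def get_top_n_candidates(*, candidate_score_dict: List[Tuple[Any, float]],
                         top_n: int) -> List[Tuple[Any, float]]:
    # nlargest keyed on the score returns items in descending score order.
    return heapq.nlargest(top_n, candidate_score_dict, key=lambda pair: pair[1])


scored = [("granola", 0.2), ("muesli", 0.9), ("porridge", 0.5)]
print(get_top_n_candidates(candidate_score_dict=scored, top_n=2))
# [('muesli', 0.9), ('porridge', 0.5)]
```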

static jaccard_sim(*, comparison_vector: numpy.array, comparison_matrix: numpy.array) → numpy.array

Return the jaccard similarity between a given vector and the rows of a matrix.

Parameters
  • comparison_vector – The vector to serve as the source of comparison

  • comparison_matrix – A matrix containing rows with which the comparison_vector will be compared

Returns

An array of jaccard similarities between the comparison_vector and each row of the comparison_matrix
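For binary (0/1) vectors, Jaccard similarity is the size of the intersection divided by the size of the union. The sketch assumes binary inputs, which the docstring does not state; a row whose union with the vector is empty would produce nan.

```python
import numpy as np


def jaccard_sim(*, comparison_vector: np.ndarray, comparison_matrix: np.ndarray) -> np.ndarray:
    v = np.asarray(comparison_vector).ravel().astype(bool)
    m = np.asarray(comparison_matrix).astype(bool)
    # Broadcast the vector against every row of the matrix.
    intersection = (m & v).sum(axis=1)
    union = (m | v).sum(axis=1)
    return intersection / union


sims = jaccard_sim(
    comparison_vector=np.array([1, 1, 0, 0]),
    comparison_matrix=np.array([[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]]),
)
print(sims)  # approximately 1.0, 0.333, 0.0
```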

static sparse_l2_norm(*, matrix: scipy.sparse.csr.csr_matrix) → numpy.array

Return the L2 norm of an input CSR sparse matrix. This is significantly faster and less memory intensive than simply passing the matrix to NumPy.
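Assuming the norm is taken per row (the docstring does not say), it can be computed from a CSR matrix’s stored values without densifying the matrix. A SciPy csr_matrix exposes these arrays as .data and .indptr; the sketch takes them directly so that it depends only on NumPy. csr_row_l2_norms is a hypothetical name, and the input is assumed to be in canonical form (no duplicate entries).

```python
import numpy as np


def csr_row_l2_norms(data: np.ndarray, indptr: np.ndarray) -> np.ndarray:
    """Row-wise L2 norms from the data/indptr arrays of a canonical CSR matrix."""
    n_rows = len(indptr) - 1
    # Row index of every stored value, recovered from the row-pointer array.
    rows = np.repeat(np.arange(n_rows), np.diff(indptr))
    # Sum the squared stored values per row; rows with no stored values get 0.
    squared_sums = np.bincount(rows, weights=np.asarray(data, dtype=float) ** 2,
                               minlength=n_rows)
    return np.sqrt(squared_sums)


# CSR form of [[3, 4, 0], [0, 0, 0], [0, 0, 5]].
data = np.array([3.0, 4.0, 5.0])
indptr = np.array([0, 2, 2, 3])
print(csr_row_l2_norms(data, indptr))  # [5. 0. 5.]
```

Only the stored nonzeros are touched, so memory use scales with the number of nonzeros rather than with rows × columns.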

Module contents