Filtering (masking) practice with visual feedback (AKA logic pop-it)#
This notebook gives some examples for filtering the data, i.e., selecting a subset of the data based on some query or search criteria. For example, given a dataframe of grades for the class, a typical query would be to ask Python to give us the subset of the class that earned an A, or B, etc.
Before running the code, make sure to import the needed packages. If you get an error that the packages is not found, install it using your Conda or Python installation.
import pandas as pd
import tkinter as tk
Now that we loaded the needed packages, we also need to run the following cell so that we have access to the functions we will be using. Make sure to run the following cell.
def highlight_true_false(val, mask_fill):
    color = mask_fill["mask_true_fill"] if val is True else mask_fill["mask_false_fill"]
    # color = 'green' if val is True else 'red'
    return 'background-color: {}'.format(color)
def on_rectangle_click(event, fill_colors):
    
    # color dict for background shading
    color_dict = {element: ind for ind, element in enumerate(list(fill_colors.values())[:2])}
    # get clicked object id
    item_id = event.widget.find_closest(event.x, event.y)[0]
    # get current color
    current_color = canvas.itemcget(item_id, "fill")
    # change to the complement of the current color
    new_color = list(color_dict.keys())[1 - color_dict[current_color]]
    # # change text to the complement of the current cell state    
    # txt_id = canvas.find_withtag(str(item_id)+ '_txt')
    # # then set the new text
    # new_text = bool(color_dict[list(color_dict.keys())[1 - color_dict[current_color]]])
    # # modify the text
    # canvas.itemconfig(txt_id, text=new_text, font=("Arial", 12), justify="center")
    # change the color of the clicked rectangle on the canvas
    canvas.itemconfig(item_id, fill=new_color)
def create_grid(canvas, rows, cols, data_vec, width, height, fill_colors):
    rectangle_colors = {0: fill_colors["logical_false_fill"], 1: fill_colors["indices_columns_fill"]}
    cell_width = width // cols
    cell_height = height // rows
    for row in range(rows):
        for col in range(cols):
            # find lower left corner (x,y) coordinates for the row, col cell
            x1 = col * cell_width
            y1 = row * cell_height
            # find the top right corner (x,y) coordinates for the row, col cell
            x2 = x1 + cell_width
            y2 = y1 + cell_height
            # assign an id to the created rectangle
            fill_color = rectangle_colors[(row < 1) or (col < 1)]
            rect_id = canvas.create_rectangle(x1, y1, x2, y2, fill=fill_color, outline="black")
            # add text
            text_x, text_y = (x1 + x2) // 2, (y1 + y2) // 2
            # get text value from the data vector
            text_value = data_vec[row * cols + col]
            # add text to canvas
            canvas.create_text(text_x, text_y, text=text_value, font=("Arial", 12), justify="center", tags=str(rect_id) + '_txt')
            # add the created rectangle to the canvas
            if (row > 0) and (col > 0):
                canvas.tag_bind(rect_id, "<Button-1>", lambda event, arg=fill_colors: on_rectangle_click(event, fill_colors=fill_colors))
Next we will define the data we will be working with. This is weather data for the lasning area that contains the average temparature, feels like, minimum temperature, maximum temperature, humidity, and wind speed for the years 2015-2024. Run the cell below to get the data. Later, you can come back and experiment with other types of data (see ‘Challenge Mode’ at the end of this notebook).
# weather data
weather_data = {'temp':[48, 50, 49, 48, 47, 49, 49, 48, 50, 51], 'feels_like':[44,46,46,44,43,45,46,44,46,48], 
                'temp_min': [47,48,48,46,45,47,48,46,48,49], 'temp_max': [49,51,51,50,49,51,51,50,51,52], 
                'humidity': [71,71,70,73,76,73,75,74,75,76], 'wind_speed': [8,9,9,8,9,9,8,8,8,8]}
# the index vector
ind_vec = [i for i in range(2015, 2025)]
# create the data frame
df = pd.DataFrame(weather_data, index=ind_vec)
We will create a grid that you can click on and manually perform the masking for some given query similar to the provided screenshot. You will also ask Python to do the same query. Compare the output that you’re expecting with the output you obtain from the query you coded. Are they the same? If they’re not, go back and edit your query.
Enter your query for each of the follwoing in the cell below and run it. Then run the following cell to obtain an interface to manually perform the same query. Compare the two outputs: if they’re the same, great! If not, go back and refine your query.
Write a conditional statement that will return true for all wind speeds larger than 8.
Write a conditional statement that will return true for all temperature (temp, feels_like, temp_min, temp_max) that are above 47.
Write a conditional statement that will return true for all the entries where the temperature (temp) is larger than 48, and the humidity level is larger than 70.
Note: When you run the following cell, a data frame will be shown with the mask that you created used to create different fill colors for the True/False parts of the data frame. The resulting columns will say ‘mask’ at the top because sometimes your mask combines multiple cells and outputs one columns of True/False. Feel free to change the column names of the resulting mask to something more meaningful such as the original column names, or some descriptoing of the created mask, e.g., ‘temp above 47.’
# Customization cell! Write your maks in the following cell, not this one.
# this cell allows you to customize the colors of the background fill for your mask to you liking.
# The 'False' entries will take the fill color that corresponds to the key 'mask_false_fill', and analogously for the 'True entries'
# You can use any CSS named colors here: https://developer.mozilla.org/en-US/docs/Web/CSS/named-color
mask_fill = {"mask_false_fill": "orange",   # this will be the color of the False entries of the mask
             "mask_true_fill": "blue"}  # this will be the color of the True entries of the mask
# Enter your mask here
# Remember that df[[column_name]] with two brackets returns a data frame, while a single set of brackets df[column_name] returns a pandas Series
# Query 1
mask1 = df['wind_speed']  > 8
df1 = mask1.to_frame(name='mask')
# Query 2
# Query 3
# Replace df1 with your specific dataframe name
df1.style.applymap(highlight_true_false,mask_fill=mask_fill)
# Note: If you get the warning "FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead." replace the above line with:
# df1.style.map(highlight_true_false,mask_fill=mask_fill)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[19], line 4
      1 # Enter your mask here
      2 # Remember that df[[column_name]] with two brackets returns a data frame, while a single set of brackets df[column_name] returns a pandas Series
      3 # Query 1
----> 4 mask1 = df['wind_speed']  > 8
      5 df1 = mask1.to_frame(name='mask')
      7 # Query 2
      8 
      9 
   (...)
     12 
     13 # Replace df1 with your specific dataframe name
File ~\anaconda3\Lib\site-packages\pandas\core\ops\common.py:81, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
     77             return NotImplemented
     79 other = item_from_zerodim(other)
---> 81 return method(self, other)
File ~\anaconda3\Lib\site-packages\pandas\core\arraylike.py:56, in OpsMixin.__gt__(self, other)
     54 @unpack_zerodim_and_defer("__gt__")
     55 def __gt__(self, other):
---> 56     return self._cmp_method(other, operator.gt)
File ~\anaconda3\Lib\site-packages\pandas\core\series.py:6096, in Series._cmp_method(self, other, op)
   6093 rvalues = extract_array(other, extract_numpy=True, extract_range=True)
   6095 with np.errstate(all="ignore"):
-> 6096     res_values = ops.comparison_op(lvalues, rvalues, op)
   6098 return self._construct_result(res_values, name=res_name)
File ~\anaconda3\Lib\site-packages\pandas\core\ops\array_ops.py:293, in comparison_op(left, right, op)
    290     return invalid_comparison(lvalues, rvalues, op)
    292 elif is_object_dtype(lvalues.dtype) or isinstance(rvalues, str):
--> 293     res_values = comp_method_OBJECT_ARRAY(op, lvalues, rvalues)
    295 else:
    296     res_values = _na_arithmetic_op(lvalues, rvalues, op, is_cmp=True)
File ~\anaconda3\Lib\site-packages\pandas\core\ops\array_ops.py:82, in comp_method_OBJECT_ARRAY(op, x, y)
     80     result = libops.vec_compare(x.ravel(), y.ravel(), op)
     81 else:
---> 82     result = libops.scalar_compare(x.ravel(), y, op)
     83 return result.reshape(x.shape)
File ~\anaconda3\Lib\site-packages\pandas\_libs\ops.pyx:107, in pandas._libs.ops.scalar_compare()
TypeError: '>' not supported between instances of 'str' and 'int'
# prepare the data frame for the gui
df_gui = df
df_gui.reset_index(names="year", inplace=True)
# add column names as a row
df_gui.loc[-1] = df_gui.columns
df_gui.index = df_gui.index + 1
df_gui = df.sort_index()
# flatten data frame to pass to the canvas boxes later
data_vec = df_gui.to_numpy().flatten()
When you click run or hit shift_enter, you should have a window pop up that looks like this (screenshot provided). You can click on the different values to explore how your mask will work and affect the dataframe! We recommend that you drag the resulting window next to the output of the mask creation cell above so that you can compare the mask you created manuall using the window pop up with the mask that you coded using conditional statements.
# Customize the colors for the pop-it window!
# enter the color names strings using this link: https://cs111.wellesley.edu/archive/cs111_fall14/public_html/labs/lab12/tkintercolor.html
fill_colors = {"logical_false_fill": "coral", 
               "logical_true_fill": "pale green",
               "indices_columns_fill": "sky blue"}
# create the window
window = tk.Tk()
window.title("Clickable Rectangle Grid")
# set canvas width and height
canvas_width = 700
canvas_height = 300
# create canvas
canvas = tk.Canvas(window, width=canvas_width, height=canvas_height)
canvas.pack()
# set the dimensions of the grid
rows = df_gui.shape[0]
cols = df_gui.shape[1]
# create the grid
create_grid(canvas, rows, cols, data_vec, canvas_width, canvas_height, fill_colors)
# create the gui window
window.mainloop()
Challenge Mode: Try the same process but by reading in another small dataset from a file instead of using the given data. You might need to experiment with canvas_width and canvas_height to get the pop up window to display correctly.