Apply Python Filter Using Lambda Function With Multiple Arguments

Python's built-in filter(func, iterable) lets you quickly pick out the rows you want by passing it a function, often a lambda. However, in all the examples I found online, the lambda takes only one argument, which is the current element of the iterable. That is not much help when you need a dynamic filter that takes several arguments. Say you have a list of dicts and want to filter it on one of the keys in each dict, passing both the key name and the value to match into the lambda. As a simple example, take a list like the following:

rows = [{'capacityUsed': 2, 'quantitySent': 3, 'storeId': '234'},
        {'capacityUsed': 1, 'quantitySent': 2, 'storeId': '123'}]
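
One way to get there, shown as a minimal sketch rather than the only approach: filter() always calls its function with a single argument, so a three-argument lambda can be wrapped in a one-argument lambda that fixes the key name and value. The names match and filter_rows are hypothetical helpers introduced for this illustration.

```python
rows = [{'capacityUsed': 2, 'quantitySent': 3, 'storeId': '234'},
        {'capacityUsed': 1, 'quantitySent': 2, 'storeId': '123'}]

# A lambda with three arguments: the row, plus the key name and value to match.
match = lambda row, key, value: row.get(key) == value

def filter_rows(rows, key, value):
    # filter() passes only the current element, so wrap the three-argument
    # lambda in a one-argument lambda that closes over key and value.
    return list(filter(lambda row: match(row, key, value), rows))

result = filter_rows(rows, 'storeId', '123')
print(result)
```

functools.partial(match, key='storeId', value='123') would work just as well as the inner lambda, since it also produces a callable that takes only the row.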

Evaluate Classifier Performance

An important concept in evaluating a classifier is the confusion matrix. A confusion matrix is a square matrix showing the number of correct and incorrect classifications on a set of test data for which the true values are known.
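
To make the idea concrete, here is a small sketch that builds a confusion matrix in plain Python. The function name, the label names, and the toy data are all made up for illustration; rows are indexed by the true label and columns by the predicted label, which is one common convention.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    # Count every (true label, predicted label) pair.
    counts = Counter(zip(y_true, y_pred))
    # Rows follow the true label, columns the predicted label.
    return [[counts[(t, p)] for p in labels] for t in labels]

# Hypothetical test data with known true values.
y_true = ['cat', 'cat', 'cat', 'dog', 'dog']
y_pred = ['cat', 'dog', 'dog', 'dog', 'dog']
labels = ['cat', 'dog']

matrix = confusion_matrix(y_true, y_pred, labels)

# Correct classifications sit on the diagonal.
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / len(y_true)

print(matrix)    # [[1, 2], [0, 2]]
print(accuracy)  # 0.6
```

Everything off the diagonal is a misclassification: here two cats were labelled as dogs, and no dogs were labelled as cats.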


Useful Retry Function

In data ETL or query operations, API requests are frequently made to a server. However, servers can be busy or heavily loaded, in which case exceptions are often raised and your pipeline breaks. Below is a useful retry decorator that retries an API request if it fails on the first try, without breaking your pipeline.
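
A minimal sketch of such a decorator follows. It is an illustration under assumptions, not necessarily the version from the full post: the parameter names (max_tries, delay, backoff, exceptions) and the flaky_request function are hypothetical, and this version re-raises the last exception once all tries are used up, so the caller decides what to do next.

```python
import functools
import time

def retry(max_tries=3, delay=1.0, backoff=2.0, exceptions=(Exception,)):
    """Retry the wrapped function when it raises one of `exceptions`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Out of tries: let the last exception propagate.
                    if attempt == max_tries:
                        raise
                    time.sleep(wait)
                    wait *= backoff  # wait longer before each new try
        return wrapper
    return decorator

# Stand-in for an API request that fails twice, then succeeds.
calls = {'count': 0}

@retry(max_tries=3, delay=0.01, exceptions=(ConnectionError,))
def flaky_request():
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionError('server busy')
    return 'ok'

response = flaky_request()
print(response, calls['count'])
```

Restricting exceptions to the errors you expect from a busy server keeps the decorator from silently retrying genuine bugs.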


Working With BigQuery

BigQuery is Google’s fully managed, petabyte-scale, low-cost enterprise data warehouse for analytics. Queries on stored datasets can serve as back-end services for creating beautiful reports in your front-end UI. In my daily work, I save whatever gets posted back to our clients, as well as the messages we consume from clients, to Avro files. At the end of the day, those Avro files are loaded automatically into BigQuery, and scheduled queries on those BigQuery datasets are triggered to continuously track our KPIs.
