PyNoon Plus Lesson 1 - Tutorial

This tutorial covers using AI models from Python, loading data from text files, and constructing DataFrames.

Setup

  1. Make a new notebook for this lesson
  2. What’s the first thing to do? RENAME IT!
  3. Name it pynoon_plus_1.ipynb

Using AI models from Python

Normally we’d have to install transformers and its dependency torch using pip, but Colab already has these installed.

from transformers import pipeline

classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')
classifier(
    'one day I will see the world',
    candidate_labels=['travel', 'cooking', 'technology'],
)

We can wrap the classifier in a function that returns only the highest-scoring label:

def classify_text(text_to_classify):
    result = classifier(
        text_to_classify,
        candidate_labels=['travel', 'cooking', 'dancing'],
    )
    return result['labels'][0]

classify_text('one day I will see the world')
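classify_text relies on the shape of the dictionary that the zero-shot pipeline returns: the original text under 'sequence', plus 'labels' and 'scores' lists ordered from highest to lowest score. As a minimal sketch, the dictionary below is a hand-written stand-in with made-up scores (not real model output), and top_label_with_score is a hypothetical helper, so it runs without downloading the model:

```python
# Hand-written stand-in for a pipeline result; the scores are made up
# for illustration and are NOT real model output.
example_result = {
    'sequence': 'one day I will see the world',
    'labels': ['travel', 'dancing', 'cooking'],
    'scores': [0.9, 0.06, 0.04],
}

def top_label_with_score(result):
    # Hypothetical helper: 'labels' and 'scores' are parallel lists, best first
    return result['labels'][0], result['scores'][0]

label, score = top_label_with_score(example_result)
print(label, score)
```

Returning the score alongside the label can help when deciding whether to trust a classification.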

Processing a text file to produce a DataFrame

with open('titles.txt') as titles_file:
    titles = titles_file.readlines()

.readlines() returns a list of strings, one per line in the file. Each string keeps its trailing newline character:

titles
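If the trailing newline characters get in the way, one option is to strip them from each string. The sketch below uses a small temporary file with made-up titles in place of titles.txt, so the file name and its contents are assumptions for illustration only:

```python
import os
import tempfile

# Write a small stand-in file; these titles are made up for illustration
with tempfile.TemporaryDirectory() as temp_dir:
    example_path = os.path.join(temp_dir, 'example_titles.txt')
    with open(example_path, 'w') as example_file:
        example_file.write('A Tour of Italy\nBaking Bread at Home\n')

    with open(example_path) as example_file:
        raw_titles = example_file.readlines()

# Each string still ends with its newline character
print(raw_titles)

# .strip() removes leading and trailing whitespace, including the newline
clean_titles = []
for raw_title in raw_titles:
    clean_titles.append(raw_title.strip())
print(clean_titles)
```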

We can use a list comprehension to transform each value in a list:

[classify_text(title) for title in titles]
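A list comprehension is shorthand for a loop that appends to a new list. The sketch below shows the equivalence using a hypothetical stand-in function, shout, in place of classify_text, so it runs without the model:

```python
def shout(text):
    # Hypothetical stand-in for classify_text: upper-cases the text
    return text.upper()

example_titles = ['a tour of italy', 'baking bread at home']

# Loop version: build the list one item at a time
loop_result = []
for title in example_titles:
    loop_result.append(shout(title))

# Comprehension version: the same result in a single expression
comprehension_result = [shout(title) for title in example_titles]

print(loop_result == comprehension_result)
```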

We can use a list comprehension to construct a list of dictionaries, where each dictionary contains the title and its label:

title_details = [
    {
        'title': title,
        'label': classify_text(title),
    }
    for title in titles
]
title_details

pandas can build a DataFrame directly from a list of dictionaries:

import pandas as pd

title_df = pd.DataFrame(title_details)
title_df
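To see how the dictionaries map onto the table, here is a sketch with hand-written rows (the titles and labels are made up, not classifier output). Each dictionary becomes one row, and each key becomes a column:

```python
import pandas as pd

# Made-up rows for illustration; in the tutorial these come from classify_text
example_details = [
    {'title': 'A Tour of Italy', 'label': 'travel'},
    {'title': 'Baking Bread at Home', 'label': 'cooking'},
]

example_df = pd.DataFrame(example_details)

# Column names come from the dictionary keys
print(example_df.columns.tolist())

# (rows, columns): one row per dictionary
print(example_df.shape)
```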