Python: Pickle and Unpickle Tree Classifier with Hashing Vectorizer

I pulled this piece of code out of a project I am working on. I wanted to predict the tag from keywords in the body of the text. So I take the text, apply a hashing vectorizer, and pass the hashed features into an AdaBoostClassifier that uses a DecisionTreeClassifier as its base estimator. I wanted to train it once and reuse it over and over, so I used pickle to save the trained model to the file system.

This code assumes you already have a populated dataframe with a 'Text' column.

Includes:

from sklearn.feature_extraction.text import HashingVectorizer
import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
import pickle
import os.path

Set up the file system paths and parameters:

resetPickle = False
foundPickle = False
# This is where you would load the dataset
df_tags = pd.DataFrame()
pick_model_path1='pickles/modelAdaDecTreeClassifier.pickle'
pick_model_tags_root_pre = 'pickles/model_tag_'
pick_model_tags_root_post = '_DecTreeClassifier.pickle'
tag_pickle_path = pick_model_tags_root_pre + 'PIC' + pick_model_tags_root_post

Create the HashingVectorizer. Setting ngram_range=(1, 2) means it will use both single words and two-word sequences as tokens, such as “Richmond” and “Richmond VA”:

vctrizr_tag = HashingVectorizer(ngram_range=(1, 2))
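
To make the ngram_range=(1, 2) setting concrete, here is a small pure-Python sketch of which tokens a (1, 2) range produces. It is a simplified stand-in for illustration, not scikit-learn's own tokenizer, and word_ngrams is a hypothetical helper name:

```python
import re


def word_ngrams(text, ngram_range=(1, 2)):
    """Return lowercase word n-grams for every n in ngram_range (inclusive)."""
    # Simplified tokenizer (an assumption): lowercase words of two or more characters.
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    min_n, max_n = ngram_range
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens across the text.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


print(word_ngrams("Richmond VA weather"))
# ['richmond', 'va', 'weather', 'richmond va', 'va weather']
```

The hashing vectorizer then maps each of these tokens to a column index with a hash function instead of storing a vocabulary.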

This checks whether the pickle exists and, if it does, loads it into the model:

if not resetPickle and os.path.isfile(tag_pickle_path):
    # Only unpickle files you created yourself; pickle can run arbitrary code on load
    with open(tag_pickle_path, 'rb') as pickle_in:
        model_tag = pickle.load(pickle_in)
    foundPickle = True

If the pickle does not exist, it trains the AdaBoostClassifier and saves it to the pickle:

if not foundPickle:
    # 'Tag' is an assumed name for the label column; change it to match your dataframe
    y_tag = df_tags['Tag']
    # HashingVectorizer is stateless, so transform() works without a fit() step
    X_tag = vctrizr_tag.transform(df_tags['Text'])
    X_train_tag, X_test_tag, y_train_tag, y_test_tag = train_test_split(X_tag, y_tag, test_size=0.2, random_state=1)
    model_tag = AdaBoostClassifier(DecisionTreeClassifier(max_depth=44), n_estimators=25)
    model_tag = model_tag.fit(X_train_tag, y_train_tag)
    score = model_tag.score(X_test_tag, y_test_tag)
    print('score', score)
    # The 'pickles' folder must already exist
    with open(tag_pickle_path, 'wb') as f:
        pickle.dump(model_tag, f)
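
The load-or-train pattern above can be wrapped in one reusable function. This is a minimal sketch using only the standard library. The names load_or_build and fake_train are hypothetical, and a plain dict stands in for the trained model so the example runs without scikit-learn:

```python
import os
import pickle
import tempfile


def load_or_build(path, build, reset=False):
    """Return the object pickled at path, or build it, pickle it, and return it."""
    if not reset and os.path.isfile(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    obj = build()
    # Create the target folder if needed so the dump does not fail.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return obj


# Demo: a plain dict stands in for a trained model.
calls = []


def fake_train():
    calls.append(1)
    return {"weights": [1, 2, 3]}


demo_path = os.path.join(tempfile.mkdtemp(), "pickles", "demo.pickle")
first = load_or_build(demo_path, fake_train)   # builds and saves
second = load_or_build(demo_path, fake_train)  # loads from disk, no retraining
print(first == second, len(calls))  # True 1
```

Passing reset=True plays the same role as resetPickle in the code above: it forces a rebuild even when the file exists.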

All together now:

from sklearn.feature_extraction.text import HashingVectorizer
import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
import pickle
import os.path
resetPickle = False
foundPickle = False
"""This is where you would load the dataset"""
df_tags = pd.DataFrame()
 
pick_model_path1='pickles/modelAdaDecTreeClassifier.pickle'
pick_model_tags_root_pre = 'pickles/model_tag_'
pick_model_tags_root_post = '_DecTreeClassifier.pickle'
tag_pickle_path = pick_model_tags_root_pre + 'PIC' + pick_model_tags_root_post
vctrizr_tag = HashingVectorizer(ngram_range=(1, 2))
if not resetPickle and os.path.isfile(tag_pickle_path):
    # Only unpickle files you created yourself; pickle can run arbitrary code on load
    with open(tag_pickle_path, 'rb') as pickle_in:
        model_tag = pickle.load(pickle_in)
    foundPickle = True
if not foundPickle:
    # 'Tag' is an assumed name for the label column; change it to match your dataframe
    y_tag = df_tags['Tag']
    # HashingVectorizer is stateless, so transform() works without a fit() step
    X_tag = vctrizr_tag.transform(df_tags['Text'])
    X_train_tag, X_test_tag, y_train_tag, y_test_tag = train_test_split(X_tag, y_tag, test_size=0.2, random_state=1)
    model_tag = AdaBoostClassifier(DecisionTreeClassifier(max_depth=44), n_estimators=25)
    model_tag = model_tag.fit(X_train_tag, y_train_tag)
    score = model_tag.score(X_test_tag, y_test_tag)
    print('score', score)
    # The 'pickles' folder must already exist
    with open(tag_pickle_path, 'wb') as f:
        pickle.dump(model_tag, f)

Setup Geany for Haskell Development

Setup the build commands:
Toolbar > Build > Set Build Commands

Label: "HaskellBuild"
Command*: /opt/ghc/7.8.4/bin/ghc %f
Working directory: (BLANK)

* I used the full path to ghc because it was not resolving the path on its own.

Under “Execute commands” at the bottom of the build commands, you should see:

Label: "Execute"
Command: ./%e
Working directory: (BLANK)

Now, you should be able to select:
Toolbar > Build > HaskellBuild (F8)
Toolbar > Build > Execute (F5)

Using SQL Server Management Objects (SMO) in C# with Setup

First, add the references. Mine are located at:

C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies
- Microsoft.SqlServer.Smo.dll
- You may have to add more than just this DLL. Try adding one at a time until it works.

Then you should be able to add the using directives:

using Microsoft.SqlServer.Management.Common;
using Microsoft.SqlServer.Management.Smo;

After this, you should be set up to transfer databases and script them out from one place to another, as if you were using SQL Server Management Studio. All the options should be available.

ServerConnection sourceConnection = new ServerConnection("SOURCEPATH");
Server sourceServer = new Server(sourceConnection);
//sourceServer.ConnectionContext.LoginSecure = false;
//sourceServer.ConnectionContext.Login = "3tier";
//sourceServer.ConnectionContext.Password = "3tier";
Database sourceDatabase = sourceServer.Databases["SourceDB"];
 
ServerConnection destinationConnection = new ServerConnection("localhost");
Server destinationServer = new Server(destinationConnection);
Database destinationDatabase;
if (destinationServer.Databases.Contains("SourceDB"))
{
    //destinationServer.ConnectionContext.LoginSecure = false;
    //destinationServer.ConnectionContext.Login = "3tier2";
    //destinationServer.ConnectionContext.Password = "3tier2";
    destinationDatabase = destinationServer.Databases["SourceDB"];
}
else
{
    destinationDatabase = new Database(destinationServer, "SourceDB");
    destinationDatabase.Create();
}
 
//Microsoft.SqlServer.Management.Smo
Transfer transfer = new Transfer(sourceDatabase);
transfer.CopyAllObjects = false; //turn off because we just want tables and sp
transfer.CopyAllTables = true;
transfer.CopyAllStoredProcedures = true;
transfer.CopySchema = true;
transfer.CopyData = false; //schema only, no rows
 
transfer.DropDestinationObjectsFirst = true; //destructive: drops matching objects on the destination
transfer.UseDestinationTransaction = true;
transfer.CreateTargetDatabase = true;
 
transfer.Options.Indexes = true;
transfer.Options.DriAll = true;
transfer.Options.AnsiFile = true;
transfer.Options.SchemaQualify = true;
transfer.Options.ContinueScriptingOnError = true;
transfer.Options.WithDependencies = false;
transfer.Options.ScriptDrops = true;
 
transfer.DestinationServer = destinationServer.Name;
transfer.DestinationDatabase = destinationDatabase.Name;
//transfer.DestinationLoginSecure = false;
//transfer.DestinationLogin = "3tier2";
//transfer.DestinationPassword = "3tier2";
 
transfer.Options.IncludeIfNotExists = true;
transfer.TransferData();

Be sure to double- and triple-check the destination server and database names, because this can wipe out the destination database without you realizing it.

I hope this helps you out!

How To Know If CloudFlare Is Working

  1. Log in to your Cloudflare console.
  2. Click the gear next to your URL and choose “Page Rules”.
  3. Check “Forwarding”.
  4. URL Pattern: http://yoursite.com/cloudflare
  5. Destination: http://yoursite.com/ison
  6. Add the rule and wait a few minutes.
  7. Go to http://yoursite.com/cloudflare, and you should be redirected to your “ison” URL.
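
If you would rather check the last step from a script than in the browser, the sketch below requests the pattern URL without following redirects and looks at the Location header. It uses only the Python standard library. The function names are hypothetical, and yoursite.com is the same placeholder as in the steps above:

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_status_and_location(url, timeout=10):
    """Request url without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def forwarding_rule_active(status, location, expected_destination):
    """True if the response is a redirect pointing at the expected destination."""
    return status in REDIRECT_CODES and location == expected_destination


# Usage (needs network access and your real domain):
# status, location = fetch_status_and_location("http://yoursite.com/cloudflare")
# print(forwarding_rule_active(status, location, "http://yoursite.com/ison"))
```

A redirect to the “ison” URL means the request passed through Cloudflare; a 404 or your normal page suggests traffic is still going straight to your host.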

 

GTD with Evernote

This is how I have successfully done GTD for a year. Even though I don’t do some of the bigger items, like large projects or scanning every document and putting it into a folder, I still feel very accomplished when I crank through small items and close them.

I read Getting Things Done by David Allen last summer around this time and have been using many of his methods ever since.

I am not going to go into the details of GTD, but here are the 5 Principles:

Principle 1: Use your brain for processing not storage.
Principle 2: All items must be processed.
Principle 3: Master Workflow.
Principle 4: Five Steps of Project Planning.
Principle 5: The Power of the Next-Action Decision.

Evernote Setup:

Bookmark it on your desktop computers
Download the app for your smartphone
Install the Evernote widgets (Android) so you can quickly insert items into Evernote.

Create Notebooks:
@@Inbox (where items will await processing)
@Active (where items will stay till complete)
@Inactive (or closed/complete)

Create Custom Searches:
@Active tag:web
@Active tag:home
@Active tag:call

I have yet to transform every aspect of my life with GTD, but I have taken control of the smaller to-do list type items. Here are the principles I focus on:

Principle 1: Use your brain for processing and not storage.

As soon as I think of an item, I will insert a very brief reminder into an Evernote notebook called @@Inbox.
The @ prefix makes the notebook appear first in the list, and I set it as the default notebook.

This allows me to never have a moment when I have to say to myself “What was it I wanted to do/lookup/read/errand/remind someone?”

The items in @@Inbox will be processed later when I am not at the store or driving.

Principle 2: All items must be processed.

I open my @@Inbox notebook and go through each and every item.

Then I tag. Examples: home, web, errand, call.

Add reminders to anything you plan on doing at a later date.

Move the item to the notebook called @Active.
– Note: complete any items that take less than two minutes.

Principle 3: Master Workflow

This step is a lot more involved than I use it for, but for simple to-do list activities this is what I do:

Get reminders that I set in Principle 2

Or, Process Backlog:

If I am at home, I will pull up the Custom Search “@Active tag:home” first. This will give me all the backlog items at home.

You can bookmark this custom search so it will show up in the app at all times or you can add it to the widget.

If I just want to go through all items, I pull up @Active and look for items.
When finished, move the items to @Inactive.

This is how I have successfully done GTD for a year. Even though I don’t do some of the bigger items, like large projects or scanning every document and putting it into a folder, I still feel very accomplished when I crank through small items, close them, and move them to @Inactive.

New Domain: dan.folkes.me

I went ahead and took the plunge! Our family has a new domain!

I purchased folkes.me and set up subdomains for:
dan.folkes.me and micaela.folkes.me.

If you go to folkes.me, I just made a pretty lame looking landing page.

I debated doing danfolk.es, but I wanted Micaela and all of our other children and animals to be able to have subdomains. So, I look forward to making a fat cat homepage at pyro.folkes.me.

😀

I set up a free account with CloudFlare too! It should increase speed and reduce server load. I need this because all of this is hosted on a free hosting account at Freehostia (I love Freehostia).

Get a Server’s IP that is Always Changing

The IP address of my server at home is always changing. Sometimes it sits there broadcasting my development server’s website to the world on an IP that I don’t know.

So, when this happens, I have to go get physical access and bring up canyouseeme.org or something like it.

There are a few solutions to this problem:

  • DynDNS – A good solution, but I hate that they will cancel your service when you don’t log in.
  • Get a static IP from your ISP – Costs extra money
  • Somehow, get the server to broadcast its address to a static place

I went with the last option. You could do this in a multitude of ways. You could have your server send its address in an email to you periodically or on boot. You could FTP it to a free FTP server. You get the idea.

I did this:
* danfolkes.com : a host that keeps its IP. It has PHP.
* lab.danfolkes.com : a host that changes its IP.

I set up a cron job on lab.danfolkes.com to run every 8 hours and open a PHP page:

0 */8 * * * /usr/bin/curl "http://danfolkes.com/path/to/file.php"
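
The hour field `*/8` in that line means "every eighth hour, counting from 0", and the leading `0` pins the minute. As a quick illustration, this sketch expands a step field into the values it matches. The helper name expand_step_field is hypothetical, and it handles only the `*/N` form:

```python
def expand_step_field(field, low, high):
    """Expand a cron field of the form '*/N' into the values it matches."""
    if not field.startswith("*/"):
        raise ValueError("only '*/N' fields are handled in this sketch")
    step = int(field[2:])
    return list(range(low, high + 1, step))


# Hours matched by the schedule above (hours run from 0 to 23).
print(expand_step_field("*/8", 0, 23))  # [0, 8, 16]
```

So the job runs three times a day, at 00:00, 08:00, and 16:00 server time.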

The page on danfolkes.com will record the last 50 IPs sent to it in a logfile:

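	<?php
	// danfolkes.com/path/to/file.php:
	$file = 'log/file.txt';
	$max = 50; // number of entries to keep
	// Newest entry: timestamp plus the address the request came from
	$content = "\n<br/>" . date("Y-m-d H:i:s") . " " . $_SERVER['REMOTE_ADDR'];
 
	// Read the existing log (empty if the file does not exist yet)
	$filecontents = file_exists($file) ? file_get_contents($file) : '';
	$filecontents = $content . $filecontents;
	// The leading separator yields an empty first element, so slice $max + 1 pieces to keep $max entries
	$filecontents = implode("\n<br/>", array_slice(explode("\n<br/>", $filecontents), 0, $max + 1));
	echo $filecontents;
 
	file_put_contents($file, $filecontents);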

So, if my server’s IP ever changes, there will be a nice little record of the last known IPs. I can go into my DNS settings for lab.danfolkes.com and set them to the new IP.

<!-- Log File -->
<br/>2013-07-30 12:56:09 24.125.92.189
<br/>2013-07-30 04:56:10 24.125.92.189
<br/>2013-07-29 20:56:21 24.125.92.189
<br/>2013-07-29 20:56:19 24.125.92.189
<br/>2013-07-29 20:56:15 24.125.92.189
<br/>2013-07-29 20:56:12 24.125.92.189
<br/>2013-07-29 20:55:48 24.125.92.189
<br/>2013-07-29 20:55:32 24.125.92.189
<br/>2013-07-29 20:50:04 24.125.92.189
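
For anyone whose static host runs Python rather than PHP, the rolling-log part can be sketched like this. It shows only the "newest first, keep the last 50" logic, with no web framework. The name prepend_entry, the t0..t59 labels, and the 203.0.113.7 address are placeholders for illustration:

```python
def prepend_entry(log_lines, timestamp, ip, max_entries=50):
    """Return a new newest-first list with the entry added and old ones dropped."""
    entry = f"{timestamp} {ip}"
    return ([entry] + list(log_lines))[:max_entries]


# Simulate 60 check-ins; only the newest 50 survive.
log = []
for i in range(60):
    log = prepend_entry(log, f"t{i}", "203.0.113.7")

print(len(log), log[0])  # 50 t59 203.0.113.7
```

In a real handler the timestamp would come from the clock and the address from the incoming request, just as the PHP page does.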

Quick Weather App in Your URL

I wanted to make a good little weather app for myself for Richmond, VA.

I couldn’t find what I wanted, but I was able to write a little bookmarklet using Chrome and Firefox’s ability to have data:text urls.

Here is the URL:
data:text/html,<html><head><style>body,img {white-space:nowrap;float:left;}</style></head><body><img src = "http://forecast.weather.gov/wtf/meteograms/Plotter.php?lat=37.53819&lon=-77.46955&wfo=AKQ&zcode=VAZ071&gset=20&gdiff=10&unit=0&tinfo=EY5&ahour=0&pcmd=101001100100000000000000000000000000000000000000000000000&lg=en&indu=1!1!1&dd=0&bw=0&hrspan=48&pqpfhr=6&psnwhr=6" /><img src = "http://forecast.weather.gov/wtf/meteograms/Plotter.php?lat=37.53819&lon=-77.46955&wfo=AKQ&zcode=VAZ071&gset=20&gdiff=10&unit=0&tinfo=EY5&ahour=48&pcmd=101001100100000000000000000000000000000000000000000000000&lg=en&indu=1!1!1&dd=0&bw=0&hrspan=48&pqpfhr=6&psnwhr=6" /><img src = "http://forecast.weather.gov/wtf/meteograms/Plotter.php?lat=37.53819&lon=-77.46955&wfo=AKQ&zcode=VAZ071&gset=20&gdiff=10&unit=0&tinfo=EY5&ahour=96&pcmd=101001100100000000000000000000000000000000000000000000000&lg=en&indu=1!1!1&dd=0&bw=0&hrspan=48&pqpfhr=6&psnwhr=6" /></body></html>

Just paste that into your address bar and you will get a good Richmond weather graph.

The pictures are taken from:
http://forecast.weather.gov/MapClick.php?lat=37.58036&lon=-77.4874049&unit=0&lg=english&FcstType=graphical
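
The three images in that URL differ only in the ahour parameter (0, 48, and 96), so the whole thing can be generated. Here is a small sketch that rebuilds the data: URL from the pieces above. The function name weather_data_url is hypothetical; to adapt it to another location you would swap in the lat, lon, wfo, and zcode values from your own forecast page:

```python
PLOTTER = "http://forecast.weather.gov/wtf/meteograms/Plotter.php"
STYLE = "<style>body,img {white-space:nowrap;float:left;}</style>"
# Query string copied from the URL above, with the start hour left as a placeholder.
QUERY = (
    "lat=37.53819&lon=-77.46955&wfo=AKQ&zcode=VAZ071&gset=20&gdiff=10"
    "&unit=0&tinfo=EY5&ahour={ahour}"
    "&pcmd=101001100100000000000000000000000000000000000000000000000"
    "&lg=en&indu=1!1!1&dd=0&bw=0&hrspan=48&pqpfhr=6&psnwhr=6"
)


def weather_data_url(start_hours=(0, 48, 96)):
    """Build a data:text/html URL with one 48-hour meteogram per start hour."""
    images = "".join(
        f'<img src = "{PLOTTER}?{QUERY.format(ahour=hour)}" />'
        for hour in start_hours
    )
    return f"data:text/html,<html><head>{STYLE}</head><body>{images}</body></html>"


print(weather_data_url())
```

Each image covers 48 hours (hrspan=48), so the three start hours together give roughly six days of forecast.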

Android – Large File Transfer – Samsung Galaxy S3 SGS

This is one way to transfer large files to Android when other methods do not work:

  • Turn on Developer Mode and Debugging Tools
  • Install the Android SDK on your computer
  • Plug in the phone, open the command prompt, and type:

    PATH-TO\android-sdk-windows\platform-tools\adb.exe push PATH-TO\FILE.zip /mnt/sdcard/
  • This usually takes a second. I would first try transferring a small file to make sure you have the correct path.

Note: The ‘/mnt/sdcard’ may be specific to my device.  If you can’t seem to find a writable folder do this:

PATH-TO\android-sdk-windows\platform-tools\adb.exe shell

Then, using the ls and cd commands, you can move around and try to find the path to your writable directory.
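
If you push files often, the adb command can be wrapped in a short script. This is a sketch under the same assumptions as above: the PATH-TO placeholders are left for you to fill in, /mnt/sdcard/ may differ on your device, and the function names are hypothetical:

```python
import subprocess


def adb_push_command(adb_path, local_file, remote_dir="/mnt/sdcard/"):
    """Build the argument list for 'adb push'."""
    return [adb_path, "push", local_file, remote_dir]


def push_file(adb_path, local_file, remote_dir="/mnt/sdcard/"):
    """Run adb push and raise CalledProcessError if adb reports a failure."""
    subprocess.run(adb_push_command(adb_path, local_file, remote_dir), check=True)


# Show the command that would run; call push_file(...) with real paths to transfer.
print(adb_push_command(r"PATH-TO\android-sdk-windows\platform-tools\adb.exe",
                       r"PATH-TO\FILE.zip"))
```

Passing the arguments as a list avoids quoting problems when a path contains spaces.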

 

Hope this helps someone!