JavaScript app developer

One year of experience as a JavaScript app developer

With a headache that kept me from even getting up, I was lying in bed when I realized that I have already been working as a JavaScript app developer for a year. So maybe it's time to write some of it down, and that time is now 😛

Using React Native is actually quite easy. You will run into trouble for sure, but none of it will stop you from getting things done. In our experience, React Native has worked on every phone we have needed to support, and we expect that to hold for the phones coming in 2017.

Situation

We are a small team (4 people) with only 2 front-end developers. We knew nothing about Android or iOS native development, but we already knew that a WebView app was not going to feel "native" easily.

Two choices

We were looking for every possible way to avoid having too much to learn at the beginning. You don't want your head to explode before the project has even started.

But after some research, there were only two choices: NativeScript or React Native. I think there are two reasons why we didn't pick NativeScript:

Community

React Native is clearly the more famous one, and having a company like Facebook behind it just makes you feel safe. The community is very active, and many React Native libraries can be found on GitHub.

Debug

I don't know what the situation is now, but a year ago when I looked at NativeScript, the debugging tools were clearly not useful at all. It felt like if I picked that framework for our project, I would have to debug JavaScript with the "alert();" function, if you know what I mean.

Surprise

So our project started with React Native. Surprisingly, it went well, especially when using Redux with it: painful to write, easy to maintain.

A year ago, we knew no native development at all. A year later, I am sure I still know only a little. I never touched the native side, although there were always problems: the Android emulator could not access the host's hosts file, the Android emulator added "UTF-8" to the Content-Type of REST API requests, iOS could not access the user's cell phone number, iOS could not send SMS in the background, and so on.

But we learned fast, and there are many solutions you can find on the internet. So now our app is already in the App Store and will be on Google Play soon.

Finally, thanks to Kartenmacherei for giving me this opportunity to learn React Native and to work with talented teammates to bring the app live.
The journey is full of challenges and joy.

How to hack time and determine one’s Time Value

In today’s post, I want to hypothesize how to maximize one’s time value both in the short-term and the long-term. So let’s get right into it!

Each day, we have 24 hours to live, earn money, and create value. A back-of-the-envelope calculation provides the following:

24 hours = 8 hours of sleep + 8 hours of economic activity + 8 hours of family/friends/self time

So, in the short term, we spend about 8 hours a day (excluding weekends), or 40 hours a week, driving economic benefit for our lives. If a person earns $4000/month, he or she is earning about $1000/week, $200/day, or $25 per working hour ($8.33 if you spread it over all 24 hours of the day).

So the person's daily economic value is $200, and the hourly value, spread across the whole day, is $8.33. As your salary increases, so does your short-term (ST) time value.
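As a minimal sketch of the back-of-the-envelope math above, a few lines of shell reproduce the numbers (the $4000/month salary is just the example figure used in this post):

#!/bin/sh
# Back-of-the-envelope time value from the example salary above.
MONTHLY=4000                          # example salary, $/month
WEEKLY=$(( MONTHLY / 4 ))             # roughly $1000/week
DAILY=$(( WEEKLY / 5 ))               # roughly $200 per work day
echo "weekly:        \$${WEEKLY}"
echo "daily:         \$${DAILY}"
# Value per work hour (8 hours) and per hour of the whole 24-hour day.
echo "per work hour: \$$(echo "$DAILY 8"  | awk '{printf "%.2f", $1/$2}')"
echo "per 24h hour:  \$$(echo "$DAILY 24" | awk '{printf "%.2f", $1/$2}')"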

It can further be enhanced with passive drivers such as savings and investment.

The remaining 16 hours of our day contribute to the long-term value of our time. Maintaining a regular schedule of healthy activities contributes greatly to the long-term value of time, including lifespan and happiness. These 16 hours create a positive loop: if you are healthy, you are happy; if you are happy, you are healthy.

We can conclude that if a person spends 8 hours a day driving economic benefit, the remaining hours can be allocated to sleep, family, friends and me-time while still maintaining his or her salary level and lifestyle.

The key is efficient time management and fungibility between each of the 24 hours we are gifted each day.

The term "fungible" is often associated with gold and currency (and, by association, cryptocurrency). If something, in our example time, is fungible, one unit of it can be traded for another unit and still bear the same value. If I borrow a $1 note from someone, with a serial number ending in 1, it is still worth $1 no matter what medium I use to repay it: a different note, coins, or a digital transfer.

Thus, if time is fungible, i.e. any one hour of my time is the same as any other hour of my time, the ideas presented in this post hold true.

Perl Scripts to Extract Rows and Columns from Many Excel Files

I had to aggregate rows and columns from about 100 MS Excel spreadsheet files, pulling from specific sheets inside each workbook. They were reports filed by a hundred different offices, all in the same format. Each extract was produced in tab-delimited format to throw into another spreadsheet. The first script combines rows from many Excel files into one file. The second script combines columns from many Excel files into one file. Why is this a common problem? MS Excel is a cheap and common reporting and analysis tool; everyone knows Excel and therefore everyone uses it. It is a common solution to have a bunch of people fill out a spreadsheet, and it is just as common to need to combine the spreadsheets into one file to analyze them.

The code is quick, dirty and ugly to boot. But it worked like a charm: in a couple hours of scripting I saved some people many, many hours of screwing around with 100 spreadsheets, cutting and pasting and making errors.
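Both scripts print tab-delimited lines to standard output, so they can be redirected into a file and pasted into another spreadsheet. A minimal sketch of running them from the shell (the script file names here are placeholders, not from the original post):

# Assumes the Spreadsheet::ParseExcel::Simple module is installed,
# e.g. via: cpan Spreadsheet::ParseExcel::Simple
# Run the row extractor and the column extractor, capturing tab-delimited output.
perl get_rows.pl    > all_rows.tsv
perl get_columns.pl > all_columns.tsv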

#! /usr/local/bin/perl -w
# Get rows from many Excel spreadsheets in a directory
###################################
    use strict;
    use Spreadsheet::ParseExcel::Simple;

my $excel_directory = 'Budget';
my $out_directory = 'xxxout';
opendir(EXCELDIR, $excel_directory) || die ("no excel directory");
my @excelfiles = readdir(EXCELDIR);
closedir(EXCELDIR);

chdir($excel_directory);
       my $LPHname;    # String to hold Local Public Health Name.
       my @sheetarray; # Array to hold the row.
       my $sheetcount; # Array element in the row.
       my $sheetname; # Name of the Excel spreadsheet.
       my $sheettemp;  # Temporary string to hold row for join.
       my $cellnumber;  # Cell number in the row.
       my $cellwanted;  # Cell number in the row.
       my $rowwanted;  # Row number wanted.
       my $county_namecell;  # Cell for county name.
       my $county_namerow;  # Row for county name.
foreach my $exxfilename (@excelfiles){
   if ($exxfilename =~ /^\.+.*/) { next; }
   my $xls = Spreadsheet::ParseExcel::Simple->read($exxfilename);
   foreach my $sheet ($xls->sheets) {
       $sheetname= $sheet->{sheet}->{Name}; # Sheet Name
       if ($sheetname !~ '2007 Budget') { next; }
       $sheetcount=0;
       $county_namecell=11;
       $county_namerow=1;
#      $cellwanted=4;
       $rowwanted=11;

       while ($sheet->has_data) {
            my @data = $sheet->next_row;
            $sheetcount++;
         if ($sheetcount==$county_namerow){
            $cellnumber=0;
            foreach my $ttcell (@data) {
                $cellnumber++;
                if ($cellnumber != $county_namecell ){next;};
                 $sheettemp=$sheetarray[$sheetcount];
#                 $sheetarray[$sheetcount]=join("\t",$sheettemp,$ttcell);
                 $LPHname=$ttcell;
            }
         } 

#        if (($sheetcount < ($rowwanted-1)) ||
#            ($sheetcount > ($rowwanted+7))){next;}
         if ($sheetcount != $rowwanted){next;};
         $cellnumber=0;
         $sheetarray[$sheetcount]=join("\t",$sheettemp,$LPHname);
         foreach my $ttcell (@data) {
            $cellnumber++;
#            if ($cellnumber != $cellwanted ){next;};
            $sheettemp=$sheetarray[$sheetcount];
            $sheetarray[$sheetcount]=join("\t",$sheettemp,$ttcell);
         }
       }
    }
    foreach my $sheetline (@sheetarray){
        print $sheetline,"\n";
    }
}
exit


###############################################################
# Column extract.
# Get columns from many Excel spreadsheets in a directory
###############################################################

#! /usr/local/bin/perl -w
    use strict;
    use Spreadsheet::ParseExcel::Simple;


my $excel_directory = 'TEST';
opendir(EXCELDIR, $excel_directory) || die ("no excel directory");
my @excelfiles = readdir(EXCELDIR);
closedir(EXCELDIR);

chdir($excel_directory);
       my @sheetarray; # Array to hold the row.
       my $sheetcount; # Array element in the row.
       my $sheetname; # Name of the Excel spreadsheet.
       my $sheettemp;  # Temporary string to hold row for join.
       my $cellnumber;  # cell number in the row.
       my $cellwanted;  # cell number in the row.
       my $rowwanted;  # row number wanted.
       my $county_namecell;  # cell for county name.
       my $county_namerow;  # row for county name.
foreach my $exxfilename (@excelfiles){
    if ($exxfilename =~ /^\.+.*/) { next; }
    my $xls = Spreadsheet::ParseExcel::Simple->read($exxfilename);
    foreach my $sheet ($xls->sheets) {
       $sheetname= $sheet->{sheet}->{Name};


# Name the sheet to take stuff out of.
       if ($sheetname !~ '2007 Budget') { next; }
       $sheetcount=0;
       $county_namecell=11;
       $county_namerow=1;
       $cellwanted=2;
       $rowwanted=5;

       while ($sheet->has_data) {
            my @data = $sheet->next_row;
            $sheetcount++;
          if ($sheetcount==$county_namerow){
             $cellnumber=0;
             foreach my $ttcell (@data) {
                $cellnumber++;
                if ($cellnumber != $county_namecell ){next;};
                $sheettemp=$sheetarray[$sheetcount];
                $sheetarray[$sheetcount]=join("\t",$sheettemp,$ttcell);
             }
          }
          if (($sheetcount < ($rowwanted)) || ($sheetcount > ($rowwanted+5)))
             {next;}
#column boundary starting from rowwanted and getting cellwanted column.
#         if ($sheetcount != $rowwanted){next;};
          $cellnumber=0;
          foreach my $ttcell (@data) {
             $cellnumber++;
             if ($cellnumber != $cellwanted ){next;};
             $sheettemp=$sheetarray[$sheetcount];
             $sheetarray[$sheetcount]=join("\t",$sheettemp,$ttcell);
          }
       }
    }
}
foreach my $sheetline (@sheetarray){
    print $sheetline,"\n";
}
exit

Example: Wrap Oracle SQL in Unix Shell

The basic idea is to wrap SQL in a Unix shell script so the shell can generate the SQL on the fly. Use scripts to write more scripts, yeah…

#!/bin/ksh
#
# Make a two dimension array of time interval and successful 
# course registrations during the interval.  Intervals can be 
# second, minute, hour, day.

# Count up timestamps with associative array then filter with sort command.
counter () {
    /bin/nawk '
        {time_array[$0]++}
        END {for (time in time_array) print time, time_array[time] }' |
    sort -nr
}

# Usage message.
usage () {
   echo
   echo Usage: $0 's m h d date-string'
   echo "s=second, m=minute, h=hour, d=day"
   echo "date-string=11-MAR-98"
   echo Example: $0 'h 11-SEP-98'
   echo
   exit
} 

#######################################################
# Main

export ORACLE_HOME=/home/oracle/app/oracle/product/7.3.2
export ORACLE_SID=p_web

COMMENT=""
DAY=\'$2\'

case $1 in
  s) INTERVAL=\'YYYY-MM-DD:HH24:MI:SS\';;
  m) INTERVAL=\'YYYY-MM-DD:HH24:MI\';;
  h) INTERVAL=\'YYYY-MM-DD:HH24\';;
  d) INTERVAL=\'YYYY-MM-DD\'; COMMENT="--" ;;
  *) usage;;
esac

case $2 in
  "") usage;;
  *) ;;
esac

# format string in SQL: select to_char(entry_date,'YYYY-MM-DD:HH24:MI:SS')
sqlplus << EOF | counter
name/passwd  
    set pages 0 feed off echo off
    select to_char(entry_date,$INTERVAL)
            from feestmt
$COMMENT    where to_char(entry_date) = $DAY
            order by 1;
EOF
#################### cut here ######################
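A hypothetical run, assuming the script above is saved as regcount.ksh (the file name is a placeholder) and the Oracle login line is filled in:

# Count successful registrations per hour for one day.
chmod +x regcount.ksh
./regcount.ksh h 11-SEP-98

# Count per day across all dates; the 'd' interval sets COMMENT="--",
# which comments out the WHERE clause in the generated SQL.
./regcount.ksh d 11-SEP-98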

Script to Find Referenced Files in HTML, ColdFusion, and Other Languages

The Problem: Missing References in ColdFusion Code

Some rip-off contractors dumped a pile of trash-filled ColdFusion code on some totally naive (or crooked) project managers. Not a single process ran without errors, and the web server logs were full of 404 errors. The project manager told me to make it work. The code came in a big wad of multiple directory trees and many thousands of files of many types, with no installation package and no build tools.

I decided to run a bunch of lint-type searches on the ColdFusion files for each type of error (like missing references) to document the bugs. I found thousands of errors: missing file references, unused files, empty files, empty directories, as well as many coding errors that indicated the contractor never tested the code.

A project that uses common software engineering processes (specifications, testing to specifications, bug tracking, build tools, packaging code for release, regression testing and release testing) would not find much with a lint script; usually lint-type programs give out a bunch of false positives. But in this bunch of code there were so many real errors I would have been happy to see a false positive.

Here is an example of a shell script to find missing referenced files from img tags in HTML. It is easy to modify for other file reference tags, like "link" tags for CSS, "script" tags for JavaScript, "form" tags, "a" tags, ColdFusion "cfinclude" tags, etc. I did just that and found hundreds of missing references for every type of tag I checked. It is a quick and dirty script, but it was useful for the situation.

Missing File Reference Script for Tag Based Languages

#!/bin/ksh
#######################

# Find files that reference an image file via an img tag.
# Uses dirname to build the path to each referenced file.

# Root directory of code files.
LOCATION=docs

find_img () {

cd ${LOCATION}
pwd

CHECK_FILES=$(find . -name "*.htm*" -exec grep -ci '<img ' {} \; -print | grep '^\.')

for FILE in ${CHECK_FILES}
do
     REFERENCES="$(nawk '/<!--/ , /-->/    {next;}
     /<[Ii][Mm][Gg] /, />/ {
                              sub(/^.*[Ss][Rr][Cc]=/,"");
                              $0=$1;
                              gsub(/"/,"");
                              print;}' ${FILE} |
                                 egrep -i 'gif|jpg|jpeg|png' | sort -u )"

     for REF_FILE in ${REFERENCES}
     do
         DIRECTORYROOT=$(dirname ${FILE})
         ls ${DIRECTORYROOT}/${REF_FILE} >/dev/null 2>&1
         [ $? != 0 ] && print "file:${FILE}     missing:${REF_FILE}";
     done
done  
}

find_img

# Here are some other type of file references that can use
# a modified form of this script:
#cfinclude template = "../cfdocs/dochome.htm"
#link rel="prefetch" href="/images/big.jpeg"
#link rel="StyleSheet" href="CSS/default.css" type="text/css" 
#form method="get" action="/some/form_script/form_stuff.php"
#script src="missing_javascript_file.js" type="text/javascript"
#a href="missing_link" ...
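As a rough sketch of how the same pattern might be adapted to one of these other tags, here is a hypothetical find_cfinclude function for ColdFusion cfinclude template= references. It is not from the original post; it just reuses the same dirname check as find_img above and assumes it is run from the top of the code tree.

find_cfinclude () {
# Run from the top of the code tree (or cd ${LOCATION} first, as in find_img).
CHECK_FILES=$(find . -name "*.cfm" -exec grep -ci '<cfinclude ' {} \; -print | grep '^\.')
for FILE in ${CHECK_FILES}
do
     # Pull the template= value out of each cfinclude tag.
     REFERENCES="$(nawk '/<[Cc][Ff][Ii][Nn][Cc][Ll][Uu][Dd][Ee] /, />/ {
                              sub(/^.*[Tt][Ee][Mm][Pp][Ll][Aa][Tt][Ee][ ]*=[ ]*/,"");
                              $0=$1;
                              gsub(/"/,"");
                              print;}' ${FILE} | sort -u )"
     for REF_FILE in ${REFERENCES}
     do
         DIRECTORYROOT=$(dirname ${FILE})
         ls ${DIRECTORYROOT}/${REF_FILE} >/dev/null 2>&1
         [ $? != 0 ] && print "file:${FILE}     missing:${REF_FILE}"
     done
done
}

find_cfinclude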

Restart Baan licd License Daemon

Baan has design flaws in its license scheme: there is no way to free a license left by a hung or disconnected session that did not go through the logout process. Because licenses are finite and expensive, after a while all of them are used up. The Baan software uses this “feature” to sell more licenses to ignorant customers. The solution is to restart the license daemon, licd, to free all the licenses, but this has the disadvantage of booting everyone off the Baan system for several minutes until the daemon is restarted.

When this shell script is used, the sessions are rubbed out and unable to connect for a couple of minutes as the licd port connections take a while to die off. It was used on SunOS, and is usually run when no one should be using the system, such as during a planned shutdown.

###################### cut here ##########################
#!/bin/ksh
# Restart the licd6.1
#
# Baan has design flaws in its license scheme.  
# Licenses stay in a used state and can only be freed by 
# restarting the license daemon, licd.
# NOTE: run as root


# Check if the Baan licd process exists, and print the pid.
check_licd () {
   test $(ps -ef | grep ${LICD_NAME} | grep -v grep | /usr/bin/nawk '{print $2}') &&
   print "PID of ${LICD_NAME}: " "$(ps -ef | grep ${LICD_NAME} | grep -v grep | /usr/bin/nawk '{print $2}')"
}


# Check if the TCP port licd uses is not connected to processes.
check_port () {
   test $(netstat -a | egrep "${LICD_HOST}\.${LICD_PORT} " | wc -l) -ne 0 && return 0
}

###########################################
# Main

export BSE=/apps/baan/bse
export BSE_TMP=/apps/baan/bse/tmp
export LICD_NAME=licd6.1              # Name of the license process
export LIC_MONITOR_NAME=licmon6.1     # Name of the license monitor process
export LICD_PORT=6003              # Port that the licd uses
export LICD_HOST=$(hostname)       # Host of licd

${BSE}/bin/${LIC_MONITOR_NAME} -k;    # Kill the licd 

print "Checking ports every 20 seconds..."
while check_port
do
 sleep 20
 print "Number of connections: $(netstat -a | egrep "${LICD_HOST}\.${LICD_PORT} " | wc -l)"

done

print "Start the licd"

${BSE}/bin/${LICD_NAME};
check_licd

###################### cut here ##########################

ColdFusion SQL String In CFQUERY Tag Single Quote Problem

Escaped single quotes mess up SQL used in the cfquery tag as described in ColdFusion single quote string screwup.

The behavior described in the page above also affects ColdFusion MX 7: when a string variable is put into a cfquery tag, any single quotes are changed to TWO single quotes (note: not a double quote). This makes creating SQL strings difficult because they do not work inside the cfquery tag. For example, a typical SQL string should have single quotes around a string value, but inside the cfquery tag the string becomes:

select * from mytable where first_name=''Joe Bob''

NOTE: those are 2 single quotes, not a "double quote", and this does not work as SQL. Worse yet, it looks like a "double quote" problem, so it is a masked error that is harder to figure out.

This error is one of the most annoying bugs I have seen in ColdFusion; it was in ColdFusion MX 6.1 years ago and it still plagues every release. What is wrong with Adobe Systems that they let a fundamental task like building SQL from string variables, used in almost every ColdFusion application, be such a burden to developers for years and years? I always wondered why the construction of a simple string for SQL was such a convoluted pile of worm crap in all the Adobe docs for ColdFusion. The docs hide that it does not pass a single quote properly; no wonder PHP cleaned their clock and took the web scripting market years ago. So what that CF limps along as a JVM based language, it is a crappy JVM based language. Switch to JRuby if you want a JVM based language.

ColdFusion SQL String Single Quote Hack

My method: use the string function REReplace to change the two single quotes, '', back to one single quote, ', but this must happen inside the cfquery tag after the string is evaluated, not outside of the cfquery tag where the two single quotes are not yet present.

<cfset string_variable="select * from mytable where first_name='Joe Bob'">

<cfquery name="string_with_single_quote" datasource="schema_user" result="my_result">
            #REReplace(string_variable,"''","'","ALL")#
</cfquery>

I prefer the regular expression replace to the PreserveSingleQuotes function in ColdFusion because at least I have some idea what the regular expression is doing instead of trusting a black box. Big whoop, now you can make a string to pass SQL to an RDBMS with ColdFusion. Just do not forget this lame crutch or your application will be impossible to create. The convoluted ColdFusion documentation just does not cover this basic problem.

Redesign of MNsure.org that uses MN Dept. of Revenue to Prescreen All Citizens

MNsure.org Website Failed and Never Improved

No testing of the site occurred before the Oct. 1, 2013 launch. Many people could not create accounts or finish applications. Completed applications could not be processed, and insurance companies could not get information from MNsure.

The worst problem was that nothing improved. The software process is a shambles, the design complex and unworkable, the contractors incompetent and venal, and still no testing has ever occurred. MNsure's Director April Todd-Malmlov resigned in mid-December in an uproar of scandal.

The error-filled, difficult-to-use, incomprehensible website, and the disincentive of asking people to come to that awful website on their own initiative and fill out pages and pages and pages of jargonized information that is then "verified" against slightly different information already collected, is an exercise in futility and a guarantee of low participation. Few people can complete any transaction on the current website. It is like a Republican voting scheme: a design proven to turn people away.

MN Department of Revenue and Department of Human Services Already Have Information Needed to Get Most Enrolled.

The information is already collected. Social Security Number, dependents, income, address, food stamps, etc. Why collect it again?

Of the 450,000 people estimated to be without health insurance in Minnesota maybe 300k or more would qualify for MA and MNCare. The MN Department of Revenue has both Federal IRS and Minnesota tax forms. People could be pre-screened for Medical Assistance (Medicaid), MNCare and QHP (20% profit ripoff private insurance) tax subsidies.

County services and the MN Dept. of Human Services (DHS) would then have a list of people that qualify for MA/MNcare but do not yet have health care. People would then be contacted to verify information and finish enrollment by County workers and DHS through a website or paper form. Limited information would be needed to verify or update the prescreened information.

UPDATE: 3/31/2014
And here I thought I was so smart. West Virginia and 3 other states had already scanned their citizens' existing information and signed them up. From Charles Gaba's ACASignups.net blog on ACA numbers: 73% of the uninsured in West Virginia were signed up, with no crush of people at the website:
“the reason West Virginia nearly doubled the projected number of enrollment was by identifying potential participants using information on existing food stamps and Medicaid applications. Only one of four states with this kind of auto enrollment, the process garnered around 118,000 enrollees.[in W. Virginia]”

The stupid, wasteful advertising expenses of MNsure could be cut; we would already know who qualifies. Verification time would be shorter than it is now (40 days is common) and most people could get enrolled without a problem.

Simplified Health Program Eligibility Would Speed the Process

Because of the complex, obtuse eligibility rules for health programs, and because of updates to information at Revenue, there will be a fraction of people that prescreening cannot easily verify. Simplified eligibility rules for MA and MNcare, tailored to Department of Revenue information and update cycles, would help keep that fraction small. Because of prescreening, many of the difficult cases would be known beforehand and special handling could be prepared.

Pre-Screened Tax Subsidies Would Even Sell Ripoff 20% Profit Private Insurance (QHP)

With everyone prescreened, people could check the website with only limited information to see their eligibility for subsidies and the amounts for qualifying policies, without filling out pages and pages and pages of forms.

Single Payer Health Care Would Be Even Easier

Of course the most obvious improvement to health care for all is extending Medicare for all. No complex eligibility, no one left out.

MNsure.org Enrollment Lags Compared To Successful State Exchanges

The MNsure.org site is known as a website wobbling on the edge of failure, not as a successful state health insurance exchange, given its history of enrollment problems, Director resignations, shoddy venal contractors, a site never tested before opening for business, and a lack of improvement until March 2014, when they started fixing missing pages and removing broken scripts. A few thousand users overwhelm the site, and then users get maintenance screens and unauthorized errors. An outside contractor, Optum, hired to assess the website and software, raised the possibility of scrapping the software altogether and starting over, calling the software process broken and MNsure unable to make fixes to the site as of January 2014. The website can only run 18 hours per day and has had many outages, some as long as 5 days. 20% of enrollment is from paper forms, a process that really only got started in January 2014. The backlog of completed applications was as high as 27,000 in March 2014, and for many people the wait can be measured in months.

Maryland's website software is going to be abandoned, and Maryland will use Kentucky's or Connecticut's version for its exchange. This is notable because Maryland shares contractors and licensed software from IBM Curam and EngagePoint with MNsure. Maryland was able to get an effective paper application process started and enrolled 295,000.

The terrible website has definitely stifled enrollment; the question is how much has been suppressed. My data is from ACASignups.net, the only blogger tracking national ACA enrollment state by state. I am looking at the percent of the uninsured enrolled and the percent of the total population enrolled.

  • Kentucky, population 4.4 million, 350,000 people have enrolled by March 27, 18% uninsured about 792,000. 44% enroll/uninsured, 8% enroll/population
  • Washington, population 6.9 million, 500,000 enrollments by March 28, uninsured is 1/7 or 14% or about 942,000. 53% enroll/uninsured, 5.5% enroll/population
  • Connecticut, population 3.6 million, 178,600 enrollments March 27, 337,000 uninsured. 53% enroll/uninsured, 7% enroll/population
  • Rhode Island, population 1 million, 69,400 enrollments March 8, 16% uninsured 140,000. 50% enroll/uninsured, 7% enroll/population
  • Minnesota, population 5.4 million, 150,000 enrollments March 27, 9.1% uninsured, about 490,000. 31% enroll/uninsured, 3% enroll/population

Enrollment Suppression Numbers at MNsure

Minnesota's enrolled/uninsured and enrolled/population numbers are quite a bit lower than those of the successful states; I am guessing it is the terrible website. Here is what enrollment would be if Minnesota matched the proportions of the successful state exchanges:

At 40% enrolled/uninsured enrollment would be 196,000 or 46,000 suppressed.
At 50% enrolled/uninsured enrollment would be 245,000 or 95,000 suppressed.

At 4% enrolled/population enrollment would be 216,000 or 66,000 suppressed.
At 5% enrolled/population enrollment would be 270,000 or 120,000 suppressed.
At 6% enrolled/population enrollment would be 324,000 or 174,000 suppressed.
At 7% enrolled/population enrollment would be 378,000 or 228,000 suppressed.

So the bad website suppressed enrollment by somewhere between 46,000 and 228,000. These numbers are only estimates, but it is fairly certain that tens of thousands did not enroll at the website because of the screwups at MNsure. And remember, 20% of the finished applications are on paper, not on the website; I am not counting those against MNsure.

4 States Auto Enrolled People

See Prescreening People to Enroll in Health Insurance to see how W. Virginia doubled projected enrollment by using data already gathered from food stamp programs and previous Medicaid applications to auto-enroll citizens using HealthCare.gov, the Federal website.
West Virginia, population 1.85 million, 105,000 Medicaid only enrollments, 14% uninsured 259,000, 41% enrolled/uninsured, 6% enrolled/population.

I do not know the other 3 auto-enrolling states that used pre-screening processes; let me know if you find out.

MNsure Lags on Processing Paper Applications Too

Maryland, population 5.9 million, used 200 additional clerks to process forms and enrolled 295,000; MNsure, with a similar population, had only 50 clerks and 170,000 enrollments. The lack of a robust paper process also suppressed MNsure enrollment. (Minnesota had the fewest additional clerks, per April 3, 2014 testimony to the Congressional committee chaired by Rep. Lankford on the exchanges from CA, HI, MN, OR, MD and MA.)

Single Payer Health Care Would Be 100% Enrollment

Of course the most obvious improvement to the website would be extending Medicare for all. No complex eligibility rules, no one left out, may not even need a web site to sign up.

Shell script to find processes with lots of CPU time

Here it is, a shell script.

###################### cut here ########################
#!/bin/sh
# By Steven.Hauser-1@tc.umn.edu
# Find non root processes with more than TLIMIT minutes of CPU time.
# For machine Sun Solaris 2.6 

# MINUTES equals the first parameter or 5 if no parameter.
MINUTES=$1
MINUTES=${MINUTES:=5}

AWK=/usr/bin/nawk

ps -ef |
$AWK 'BEGIN {TLIMIT='$MINUTES'}
      $7 ~ ":"  { split ($7,t,":"); 
                  if (t[1] >= TLIMIT) {print $0 }}
      $8 ~ ":"  { split ($8,t,":");
                  if (t[1] >= TLIMIT) {print $0}}' | 
grep -v root | sort 

###################### cut here ########################

Oracle Database Tuning, MySql, Postgresql, Informix: Quick and Dirty Database Performance Tuning the RDBMS

Quick results are the point of this article, not optimum effect; here are quick and dirty database performance tuning methods. Oracle database tuning, Informix tuning, DB2 tuning, Postgres or MySQL tuning? It does not matter which RDBMS you use these days, because database performance tuning is pretty much the same, or at least this method of finding and fixing database performance problems is the same. I would use this same approach for Sybase tuning and SQL Server tuning, though I have never administered either product. RDBMS tuning is pretty much all the same because database software is pretty much the same these days. Any feature that is useful in one RDBMS is quickly copied by the other database software, and database tuning is similar across the products. All RDBMS (Relational Database Management Systems) have an engine, an optimizer, SQL, an OS they run on, and an application interface to the RDBMS, and most have stored procedures, so they are more similar than different.

What Is Addressed In This Article.

Database configuration, disk layout and the operating system are usually under the configuration management control of the database administrator and the system administrator, and this article addresses what you can control. Many applications are controlled by third-party vendors and subject to processes outside the administrator's control, so applications are not addressed in detail in this article.

Here are the heuristics listed in order of effort expended:

Analyze the Whole System, Not Just the Database.

Switch your point of view to examine the whole system, not just the database. This may be the easiest and most effective tuning technique.

You may not have to tune the database at all. At one customer site, shipping labels were not generated fast enough; the multiple CPU Unix system was maxed for hours using all the I/O and CPU resources. “The database is too slow,” the customer told me. “Go tune it.” I looked at the database statistics and also the operating system statistics. That’s when I noticed many processes spawning per minute from the Unix “at” command (a basic batch scheduling utility).

It was clear that the database was not the cause of this problem. The application scheduled a batch job for each shipping label with the “at” command, which had the Unix OS thrashing when it tried to generate and schedule a hundred thousand batch jobs to make labels. After changing a few lines in a shell script to generate the labels without the “at” utility the label generation took a tenth of the time. The database was not changed.

Look at network traffic, OS stats, DB stats and I/O stats. The system is a whole, so compare the stats together to get a view of what the system is doing over an interval of time. The statistics often reflect the boundary between layers of software: I/O stats may show database activity as seen by the OS, and CPU context switches may show how other processes affect database processes. It is the whole system that must perform well, not just the database, so make a well tuned system by taking a wide look at what is going on.

The Database is Only One Layer in the System, Check the Layer Interfaces.

Keep in mind that the application is separate from the database, the database engine is separate from the OS it runs on, and that OS connects to a network. This layered architecture is how I think about the performance problems that tuning will fix. Many problems lie where the layers interface, or in how well the system is integrated.

Track All the Problems that Affect Database Performance.

Make a list of down times, batch job failure, backup failure, slow performance, etc., in the network, OS, applications and the database. Look for patterns and relationships in the statistics and the failures. A system is a house of cards that needs a solid foundation. Lack of backups, lack of OS resources, network problems, a lack of security and poor configuration management may mask or compound any database performance problems. Tuning just the database is futile if the system crashes every hour. From the problem tracking list find the easiest things to fix with the most payoff. Usually there are a couple quick and effective things that can be done to get performance improvements.

Many times the quick and effective fixes are not database problems but a problem, or mix of problems, in other parts of the system or in configuration management. Examples I have seen: OS file systems that are too full, shared memory misconfiguration, and software integration problems between the OS, database and third-party applications, such as missing software patches or other dependencies; these can be seen as software layer interface problems. Lack of configuration management of software going into production systems is a problem that many overlook. Adding software to production systems without controls can degrade performance or stop systems altogether.

Check Configuration Files for Obvious Problems.

Again, check OS, network, application and user configurations, not just the database configuration. Look up parameters that you do not understand. I have seen old configuration files from previous releases of software that used out-of-date parameters or lacked the new parameters that make the software perform well. One system had new, fast NIC cards set to run at an old, low network speed; they forgot to add the new, faster settings in the configuration file.

Read the release notes that come with the database software for the OS configuration parameters on the hardware being used, and note the needed OS and application patches for that particular configuration. Then make sure the database and OS match the release notes of the applications run on the system. This chain of dependencies can cripple performance if every link is not in place.

Run the Database Statistics Jobs to Help RDBMS Optimizer Performance.

One basic performance improvement in databases such as PostgreSQL, Oracle, DB2 and Informix comes from the cost-based optimizers that generate query plans by looking at statistics from the data, such as the number of rows in a table and the distribution of values. But the statistics used by the optimizers are not usually generated by default; many times I have seen no statistics jobs run in a database installation, and poor performance is the result.

Check that the database optimizer statistics are run as recommended. The Oracle "ANALYZE" (release 7) and "DBMS_STATS" (8+), Informix "UPDATE STATISTICS", and PostgreSQL "VACUUM ANALYZE" commands create the statistics used by the SQL optimizers to choose fast access methods to the data. Statistics only have to be gathered when the data changes enough to make a difference. Running statistics jobs is resource intensive, so remember to run them only when needed; the object is to improve and tune database performance, not cripple it with excess statistics processes.
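As a sketch, in the wrap-the-SQL-in-a-shell-script style used elsewhere on this site, an off-hours Oracle statistics job might look like this (the schema name, credentials and estimate percentage are placeholders; Informix and PostgreSQL use their own commands as noted above):

#!/bin/ksh
# Gather optimizer statistics for one schema (Oracle 8+); run off-hours from cron.
sqlplus -s name/passwd << EOF
begin
    dbms_stats.gather_schema_stats(
        ownname          => 'APPSCHEMA',
        estimate_percent => 20,
        cascade          => TRUE);
end;
/
exit;
EOF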

SQL Can Affect Performance by Magnitudes of 10.

Most RDBMS collect statistics from the RDBMS process memory to monitor the database while it is running. These statistics are accessed with functions that are presented as views and can be accessed with SQL. With PostgreSQL, the monitoring statistics functions and views have information about the SQL statements run, the number of times a table or index is scanned, and more. Same with Oracle, DB2, Informix and the rest. Use these monitoring statistics to find the SQL statements that are the real system problems. Typically, plenty of SQL statements perform badly but only a few are the real resource problems; find those few that cause the most problem and put your effort into fixing those.
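For example, on Oracle a quick way to spot the worst offenders is to sort the SQL area by resource use. This is just a sketch with placeholder credentials; PostgreSQL has analogous views such as pg_stat_user_tables and, if installed, pg_stat_statements.

#!/bin/ksh
# List the top 10 SQL statements by disk reads from Oracle's V$SQLAREA view.
sqlplus -s name/passwd << EOF
set pages 100 lines 200
select * from (
    select disk_reads, buffer_gets, executions, sql_text
      from v\$sqlarea
     order by disk_reads desc )
 where rownum <= 10;
exit;
EOF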

Start by identifying what SQL is run and why the SQL is run. Sometimes reports are run at the wrong times on an OLTP application system or a batch SQL job can be easily rescheduled to not conflict with another application that is using lots of resources on a server.

Identify the SQL statements that are slow and schedule them to run at times that do not crush the rest of the system. Run 'set explain on' (Informix) or 'set autotrace on; set timing on' (Oracle) to see what the optimizer is doing with the SQL you are checking.
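A sketch of checking one suspect statement in Oracle with autotrace (the table name and credentials are placeholders; on Informix the 'set explain on' output goes to an sqexplain.out file instead):

#!/bin/ksh
# Show the execution plan and run statistics for one suspect query (Oracle).
# Note: autotrace needs the PLAN_TABLE and the PLUSTRACE role set up.
sqlplus -s name/passwd << EOF
set autotrace on
set timing on
select count(*)
  from big_table
 where entry_date > sysdate - 7;
exit;
EOF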

Fix Indexes That Affect the Bad SQL Code

Sometimes indexes are controlled by applications and cannot be administered easily. But usually there is no problem manipulating the indexing. The first thing is to check that indexes are set up for the slow SQL statements; many times no index is used by the problem SQL statement because the application or the data distribution has changed but the indexing on the affected tables has not. I have seen performance improve by a factor of 10 or more with the addition of a good index or an added attribute on an existing index.

There are lots of quick index fixes that can be checked. Does the index contain an attribute that has nulls in its domain? Then the index will not be as effective in comparisons; maybe a default value can be used to get rid of the nulls. Attributes that have a small domain, an uneven distribution of values, or nulls are not usually good choices for index attributes.

Here is an Oracle script to check the values of index attributes to evaluate performance. Adding an attribute with a larger domain of values, dropping attributes that are not useful, or even dropping an index to reduce overhead are all options. Sometimes indexes need rebuilding if they are on tables that have changed a lot over an interval of time.
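A rough sketch of that kind of check, counting total rows, distinct values and nulls for a candidate index column (the owner, table and column names are placeholders), could look like:

#!/bin/ksh
# Rough check of a candidate index column: total rows, distinct values and nulls.
sqlplus -s name/passwd << EOF
select count(*)                   total_rows,
       count(distinct status)     distinct_values,
       count(*) - count(status)   null_values
  from appschema.orders;
exit;
EOF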

There are different types of indexes and index organizations for attacking performance problems. Bitmap indexes can be used for attributes with low cardinality, especially if there is an uneven distribution across the attribute domain. Because updates to them are costly, they are mostly used in tables that are read-only rather than updated, such as in a data warehouse application. I have had problems with the Oracle 7.3.4 optimizer not recognizing a bitmap index without a hint, which makes it almost worthless without changing code in an application that may not be under the database administrator's control.

Parallelism is Usually Default “Off” in a Database. Turn It On?

If you have multiple CPUs and many disks, some degree of parallelism can easily be configured, but there is too much per-database detail to cover here. Check your manuals, environment variables and configuration files for both sorting and SQL parallelism. I have seen many database installations where easy-to-use parallelism features noticeably improved performance after a couple-line configuration file change and a restart. Some parallelism features can cause performance problems, though; evaluate them carefully and check whether a particular parallel feature can be turned on for a particular time or set of transactions and turned off at other times.

Table Partitioning can Make a Large I/O Problem Small.

Table partitioning is a method of storing a table in separate parts on disk to reduce the size of the data that needs to be accessed. Analysis of the SQL that accesses large tables may reveal a table or index partitioning scheme that can remove a bottleneck by shrinking access to a small part of the table, while the rest is partitioned into rarely accessed pieces that are basically dormant. An example is partitioning a table on a "date" attribute for queries that access data by date: if the table is partitioned by year or month and the usual query looks for recent dates, then the rest of the table is not searched. I have seen response times improve by a factor of 10 or more for specific SQL statements using this technique. The more developed RDBMS products have several table and index partitioning methods. There is a cost to administering more partitions, so evaluate this technique carefully.
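A sketch of the date-partitioning idea in Oracle syntax (the table, column and partition names are placeholders; other RDBMS have their own partitioning clauses):

#!/bin/ksh
# Range-partition a table by month so queries on recent dates touch one partition.
sqlplus -s name/passwd << EOF
create table fee_stmt (
    stmt_id     number,
    entry_date  date,
    amount      number )
partition by range (entry_date) (
    partition p_2014_01 values less than (to_date('2014-02-01','YYYY-MM-DD')),
    partition p_2014_02 values less than (to_date('2014-03-01','YYYY-MM-DD')),
    partition p_rest    values less than (maxvalue) );
exit;
EOF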

Spread the Database Over Many Disks and I/O Channels for Parallel I/O.

– If you have a huge Hitachi or EMC disk array get a lot of cache on the disk array. Make sure you do not contend with other applications pounding the i/o on the array. Evenly cover the array with everything to spread the i/o and use the full array i/o capacity.

– If you have a smaller set of disks then do the traditional database disk configuration. Separate the system catalogs from the various types of logs in different database partitions (Informix: logical and physical logs; Oracle: rollback, redo and archive logs); there are similar schemes for the rest of the RDBMS products. Separate very active tables from each other. Separate indexes from tables. Table and index partitioning over multiple disks will help parallelism and I/O performance; however, disk reorganization is a lot of work. Separate disk space for the database logs probably has the biggest payoff, followed by the most active tables, but this move should be researched carefully to make sure the return justifies the effort.

Prevention is Easier Than Fixing Problems.

Help the developers stop future problems by getting involved in design and code reviews. Look for such things as many connections being made and terminated to the database over a network when one connection, or a connection pooling scheme, could be used; lack of database error trapping; and huge joins with gobs of I/O sent over the network to fat clients for additional processing. Show the developers how to analyze SQL for performance and how to avoid long transactions.

Your Perfectly Tuned System Just Changed

Systems change. Data changes, software is upgraded, new hardware and applications are added and old parts of the system are removed. A whole system approach to RDBMS tuning and administration helps you be aware of the changes, calculate the risks and anticipate problems.

An older version of this article: Quick and Dirty RDBMS Tuning is on O’Reilly’s Onlamp.com site.

Export Oracle Data Definition Language (DDL) Scripts and Extract Oracle DDL

Oracle was very anal retentive about letting usable and complete data definition language (DDL) out of its database servers before release 9i, and it still provides very limited schema utilities. This was written to extract the DDL for database development projects and configuration management; the end result is a runnable DDL script to create a database or a user schema.

How to do it:
Create an Oracle exp (export) file without the data.
Use the Oracle imp (import) utility with the show option to get the DDL from the export file into a text file.
Then run the shell-awk filter below on that text file to create a DDL script that can be used by sqlplus or some other SQL interpreter.

#!/bin/ksh
# Make DDL usable for Oracle svrmgrl or sqlplus from the Oracle 
# export/import utilities.
# More stuff at http://www.sofbot.com and http://www.tc.umn.edu/~hause011/

# How to do it: Use Oracle imp (import) utility to get the DDL from  
# an Oracle exp (export) file.  Then filter textfile with this awk. 
# The resulting DDL can be used by sqlplus or used to convert 
# to a different database product.
 
# Example:
# exp name/password full=y rows=n ...
# imp name/password show=y full=y file=xxxnamexx.dmp > textfile
# ugly_DDL.ksh textfile > bunch_of_SQL.sql

# Note: to remove storage declarations uncomment the line:
# '# remove_tablespace' near the bottom of this script.

# NOTE: Configure which awk to use right after "# Main" below. 

# Known Bugs: 
# DATE: 1/3/1999 
# PROBLEM: Line too long for awk to process
# FIX: Use gawk instead of awk or nawk. Or convert to perl.

# DATE: 1/4/1999 
# PROBLEM: arnold.schommer@io-computer.de has shown that for 
#     PL/SQL code blocks, especially Package declarations,
#     if there are comments starting with -- , in some - not all - cases, 
#     the script joins the following row.
# FIX: None at this time.  A work-around would be to edit the output of the
#     PL/SQL code before running it in an interpreter.
#


# Find lines with beginning 'space doublequote' and length < 78.
# Remove beginning 'space doublequote' and end doublequote, add space at end
# as they end at whole tokens.
# Find lines with beginning 'space doublequote' and length = 78.
# Remove beginning 'space doublequote' and end doublequote
# Break at the beginning of DDL statements. 
get_statements () {
    $AWK '
      /^ "/ { if (length($0)==78) {gsub("\"$","",$0);} 
                 else  {gsub("\"$"," ",$0);};  
              gsub("^ \"","",$0); 
              sub("^CREATE ","\nCREATE ",$0);
              sub("^ALTER ","\nALTER ",$0);
              sub("^COMMENT ","\nCOMMENT ",$0);
#              sub("--","\n--",$0);
#              sub("PROCEDURE ","\nPROCEDURE ",$0);
#              sub("FUNCTION ","\nFUNCTION ",$0);
              sub("^GRANT ","\nGRANT ",$0);
              print }
    ' 
}

# Grab whole blank line delimited records, put a semicolon at the
# end of a record then join the broken tokens by removing \n in the records.
join_tokens () {
    $AWK '
          BEGIN { RS="" } 
          { gsub("\n","",$0); }
          { print $0";\n" }
    ' 
}

# Remove tablespace and extent declarations.
remove_tablespace () {
    $AWK '
         { sub(" PCTFREE.*;",";",$0); }
         { print }
    ' 
}

# Fix line lengths; sqlplus and vi have problems with long lines.
semicolon_break () {
    $AWK '
         /;.*/ && /CREATE TRIGGER/ { gsub(";",";\n",$0); }
         /;.*/ && /CREATE PACKAGE/ { gsub(";",";\n",$0); }
         /;.*/ && /CREATE FUNCTION/ { gsub(";",";\n",$0); }
         { print }
    ' 
}

# Fix line lengths; sqlplus and vi have problems with long lines.
comma_break () {
    $AWK '
         length($0)> 1024  { gsub(",",",\n",$0); }
         { print }
    ' 
}

# Usage message.
usage () {
    echo;
    echo 'Usage: '$0 ' textfile'
    echo;
    exit 
}


###################################
# Main
#
AWK=/usr/bin/awk; export AWK

test $# -eq 1 || usage

cat $1|
get_statements |
join_tokens |
#remove_tablespace |
semicolon_break |
comma_break

#############################################

MNsure.org Simple Design Problems and Solutions

First, the MNsure.org site is a terrible design; everyone who has used it knows that from their own experience, and what is more, it has never been user tested to fix the user interface problems. After using the site and reading the complaints I have seen online, here are a few serious problems:

  • People are having trouble creating an account with the rigamarole they put you through with bogus security questions sucked from your credit history.
  • Bad code that fails on some web browsers and not others.
  • There are many questions about insurance jargon, policy coverage, comparing health coverage, and eligibility.

Finding Quick Fixes to Increase Enrollment

Getting a few user tests done by some local user interface professionals could happen in a few days. Some basic user goals and scenarios could be tested and improved within a week. A couple of suggestions to get started:

  • finding a navigator to assist enrollment
  • getting the paper form to use instead of the website
  • finding out the status of an application without having to call
  • shopping for insurance without creating an account.

The same approach can be taken to improve website speed and performance. Local web site performance consultants can be hired for a day or two to assess problems, propose quick fixes and get some improvements.

Will this approach fix all the problems at MNsure? NO. There are difficult software problems of billing, interfaces to insurance companies, process flow of the website, bugs in the code, etc. But there are quick improvements that can be made; the Federal site, HealthCare.gov, has already done some of these.

Quick Examples of Scenarios and Some Heuristic Fixes

Finding Navigators to Help Apply

For many people the web site is just too much of a barrier and they need alternative methods to sign up and/or help from Navigators.

There is a prominent link on the front page to become a "Navigator", and a teeny link that goes to a page that goes to another teeny link for the list of "assisters", which are the same as navigators but now called something else. Many citizens need help negotiating the rules to get health coverage. When I look on Google for "MNsure Navigator List" I do not see the list at the top of the results page; the list is not apparent and not among the choices on the page.

  • Get rid of the “assister” label and call them all Navigators or the other way around. Calling navigators assisters and assisters navigators just makes it hard to find the list.
  • Instead of just a big button on the home page for joining as a Navigator add a big button to point to the list of Navigators so people get help enrolling by actually finding Navigators.

Finding Paper Forms to Apply

At least if you look up “MNsure paper form” on Google you find it number one on the result page. But on the home page it is located under a teeny little link. It needs a great big button on the home page.

Get Rid of the Big Pictures and Popup Window on the Home Page

Put important info “above the fold” instead of below worthless pictures. The popup window is an annoyance that should go away and its info should be on the home page.

Single Payer Health Care Would Be Even Easier

Of course the most obvious improvement to the website would be extending Medicare for all. No complex eligibility rules, no one left out, may not even need a web site to sign up.

Create a Database Instance in Oracle

Before 8i there was no easy way to get the scripts to create databases in Oracle. Like it was a state secret or something. Larry, you are a moron.

Oracle lacks basic utilities, like schema extraction. Some of this DDL can be seen in a "full" database export, but gee whiz, Oracle seems to revel in making it difficult to use its expensively licensed product.

#################### cut here ##########################
REM * Database instance create script used with Oracle 8.0.5
REM * Start the instance (ORACLE_SID here must be set to umnp).
REM * Use svrmgrl so admin procedures get run correctly.
-- Shutdown instance, remove all datafiles, control files, lockfiles,
-- logfiles to rerun and recreate the same instance

set termout on
set echo on
spool create.log

connect internal
startup nomount pfile=/apps/oracle/product/8.0.5/dbs/initumnp.ora

create database "umnp"
    maxinstances 8
    maxlogfiles  32
    maxdatafiles 400
    character set "US7ASCII"
    datafile
	'/vol01/oradata/umnp/system01.dbf'	size  100M
    logfile 
	('/vol05/oradata/umnp/redo01.01',
	'/vol06/oradata/umnp/redo01.02') size 20M,
        group 2
        ('/vol05/oradata/umnp/redo02.01',
        '/vol06/oradata/umnp/redo02.02') size 20M,
        group 3
        ('/vol05/oradata/umnp/redo03.01',
        '/vol06/oradata/umnp/redo03.02') size 20M,
        group 4
        ('/vol05/oradata/umnp/redo04.01',
        '/vol06/oradata/umnp/redo04.02') size 20M,
        group 5
        ('/vol05/oradata/umnp/redo05.01',
        '/vol06/oradata/umnp/redo05.02') size 20M ;


@/apps/oracle/product/8.0.5/rdbms/admin/catalog.sql
@/apps/oracle/product/8.0.5/rdbms/admin/catproc.sql
@/apps/oracle/product/8.0.5/rdbms/admin/catparr.sql


create rollback segment temprbs
   tablespace system
   storage (initial 25K next 25K minextents 2 maxextents 99);

alter rollback segment temprbs online;

drop tablespace rollback01;
create tablespace rollback01 
 datafile '/vol07/oradata/umnp/rollback01_01.dbf' size 128M reuse
  default storage (initial 1m next 1m maxextents unlimited pctincrease 0);

drop tablespace temp01 ;
create tablespace temp01 
 datafile '/vol01/oradata/umnp/temp01_01.dbf' size 128M reuse
  default storage (initial 2m next 2m maxextents unlimited pctincrease 0) 
  temporary; 

drop tablespace user01 ;
create tablespace user01 
 datafile '/vol02/oradata/umnp/user01_01.dbf' size 100M reuse
  default storage (initial 100k next 100k maxextents unlimited pctincrease 0); 

create rollback segment R01
 tablespace rollback01
 storage (initial 1M next 1M optimal 10M
          minextents 20 maxextents 250);

create rollback segment R02
 tablespace rollback01
 storage (initial 1M next 1M optimal 10M
          minextents 20 maxextents 250);

create rollback segment R03
 tablespace rollback01
 storage (initial 1M next 1M optimal 10M
          minextents 20 maxextents 250);

create rollback segment R04
 tablespace rollback01
 storage (initial 1M next 1M optimal 10M
          minextents 20 maxextents 250);

create rollback segment R05
 tablespace rollback01
 storage (initial 1M next 1M optimal 10M
          minextents 20 maxextents 250);


alter rollback segment r01 online;
alter rollback segment r02 online;
alter rollback segment r03 online;
alter rollback segment r04 online;
alter rollback segment r05 online;

alter rollback segment temprbs offline;

REM * Alter SYS and SYSTEM users, because Oracle will make SYSTEM
REM * the default and temporary tablespace by default, and we don't
REM * want that.
REM *
alter user sys temporary tablespace temp01;
alter user system default tablespace user01 temporary tablespace temp01;

REM * Now run the Oracle-supplied scripts we need for this DB
REM *
@/apps/oracle/product/8.0.5/rdbms/admin/catblock.sql 
@/apps/oracle/product/8.0.5/rdbms/admin/catio.sql 
@/apps/oracle/product/8.0.5/rdbms/admin/dbmspool.sql

REM * All done, so close the log file and exit.
REM *
spool off
exit

################## cut here ###################################

MNsure.org PR Propaganda and Enrollment Projections

One of the first actions of the new Director, Scott Leitz, was to hire a full crew of PR hacks to improve the tone of the media coverage of MNsure. They have tried several propaganda pushes of blatantly fake stories, while missing opportunities to tell of actual successes.

MNsure.org PR Propaganda On Website Uptime

In the MNsure Board reports, website uptime is reported for many weeks as 100%, even though MNsure.org is closed 6 hours every day, which is 25% of the day.

When I do the math, this means the website is open for business 75% of the time. In October through December, MNsure.org was closed most of the weekend as well as 8 hours a day on weekdays. The standard for a website is 24/7, not bankers' hours. The metrics should show the improvement from a horrible October to a slightly better March, not a phony "100%" for almost every time period from October to March. Other ACA state websites and the Federal HealthCare.gov have no downtime scheduled every day, so this is a deliberate fogging of the IT problems at MNsure.org.

MNsure.org PR Propaganda Line On MNcare Enrollment

Another fake story is the claim that MNcare (state subsidized) enrollments "should be counted as QHP" [private, for-profit ripoff insurance] enrollments because the program is not available in all the other states. The population served by MNcare is at up to 200% of the Federal poverty measure. There is no way people can pay for high-deductible, 20%-profit insurance at that income level. The PR hacks should have gone for the "Big Lie" and declared that the expanded Medicaid (MA) enrollments, not available in the Republican-controlled states, should also be counted as QHP enrollments; then they would have met or exceeded MNsure's QHP enrollment goals. They could have had another "100%" success story.

The disturbing part of the MNcare story is the plateau of enrollments and the backlog of 27,000 finished applications. It seems that MNsure is trying to drive MNcare and Medicaid applicants into the QHP markets by holding up applications instead of processing finished applications.

MNsure.org PR Propaganda Line On QHP Enrollment and March 31 Deadline

Another PR problem is the relentless concentration on QHP enrollment, without a website or support program that makes that enrollment possible, and the PR line that enrollment stops March 31st.

The reality is that the QHP market is small and consists of people who are jobless or low income, work independently, or do not get health insurance as part of a job. Of the ~500K people without insurance in Minnesota, over 1/2 qualify for Medicaid or MNCare. The uncertain tax subsidies are not enough inducement at higher incomes to keep customers once they try the horrible website and compare that experience to other private markets for insurance.

The terrible website, the complex QHP tax-subsidy rules badly implemented at MNsure, the long backlog of finished applications, no paper application option until about January, the lack of phone support, and delays in training and deploying adequate navigator support all mean that most higher-income individuals in the individual insurance market have gone to private insurance brokers to buy policies. Only in the last few weeks has phone support been adequate to deal with the crappy website, and only in the last few weeks have there been enough Navigators and Navigator marketing to assist people signing up.

Actually there is no March 31st deadline for most people. MA and MNcare have NO DEADLINE and will continue signing up citizens! It is only private insurance that will stop signing up suckers, I mean customers.

The missed success story is the larger-than-expected Medicaid (MA) and MNcare enrollment, which grows as a proportion of enrollments every month and is now almost 72% of enrollments. This is nearly Single Payer insurance, and the critical mass of people signing up (300,000+?) will make Single Payer a reality; combined with the VA and Medicare, Single Payer will begin to control the market in Minnesota.

I expect that MA/MNcare will be 90-95% of enrollments by next October, when QHP enrollment opens again, and that few will sign up for QHPs then. People without insurance usually do not have the money to buy private 20%-profit ripoff insurance. In a few years, as employers drop insurance benefits from more and more jobs and changes to the system take place, I expect Single Payer MA/MNcare enrollments to increase as the middle class demands Single Payer.

Backlog is the Hidden Story With MNsure PR

The huge backlog of unprocessed finished applications is the story ignored by MNsure and the media, even though the Legislative Auditor has repeatedly criticized the Dept. of Human Services (DHS) for its years-long backlogs in Health Services. MNsure seems to be using this backlog to drive citizens into the private insurance market to collect its fees, as citizens eligible for MA and MNcare grow desperate to avoid tax penalties and get health coverage.

Single Payer Health Care Would Be Even Easier

Of course the most obvious improvement to health care for all is extending Medicare for all. No complex eligibility, no one left out.

70% Enrolled In MA And MNCare Single Payer. People Reject Private For Profit Ripoff Insurance.

Data is from an MNsure press release of Feb 21, 2014. Citizens reject private ripoff insurance:

    31,088 Minnesotans selected private health plans
    21,574 Minnesotans enrolled in MinnesotaCare
    48,682 Minnesotans enrolled in Medical Assistance (MA)
    101,344 total enrollments

MA + MNCare = Single Payer, or (48,682 + 21,574) / 101,344 = 69%.
It is obvious that people reject the crooked private insurance that can extract a 20% profit from citizens. This rate is up from 66% Single Payer on Feb 12; more and more people choose the single payer alternatives.

1/5 Finished Applications Stuck, 2/5 Accounts Never Get Health Care From MNsure

From Feb 12 document: https://www.mnsure.org/images/Bd-2014-02-12-Dashboard.pdf page 3 labeled “Applications and Enrollment through MNsure”:

“161,589 accounts created”
“118,2668 applications submitted” (a typo on the web page; my guess is 118,268)
“92,498 enrollments”

From this information we have the following conclusions:

118,268 – 92,498 = 25,770 completed applications have not yet been enrolled, or 25,770 / 118,268 = 21.8%; over 1/5 cannot enroll after completing an application. That is a massive failure rate for enrolling completed applications, with no reasons given.

161,589 – 92,498 = 69,091 accounts have not yet enrolled, or 69,091 / 161,589 = 42.8%; over 2/5 have not made it through the process.

Something is seriously wrong with MNsure when 2/5 fail to get health care. $150 million in resources was thrown away on non-working software and expensive IT contractors at $350/hr for non-working processes. For $20 million, 200 case workers could have manually enrolled the people, including a simple site to let people submit application forms online, for a savings of $100+ million. It is unbelievable that after 3 years the leadership of MNsure is allowed to continue to work for the state and do this to citizens. April Todd-Malmlov, the Director who resigned in disgrace, did not work alone. Plenty of other idiots helped destroy this program and should pay with their careers.

Simple Single Payer Must Be the Solution

I would rather have a simple single payer health care system: no eligibility rules, no 20%-profit insurance sold by crooks, everyone paying in just like Medicare, like the rest of the civilized world.

Oracle Export and Import Compress Scripts

Here are two scripts that export and import Oracle databases through compressed files. They were tested on Oracle 8.0.5 and Sun Solaris 2.7. If the resulting compressed file is too big for the file system, you can run it through the Unix utility “split” to make smaller files (an example follows the scripts).

#!/bin/ksh
# Export with flags (full, consistent) to a compressed file that 
# is named by SID-month-day using Oracle exp
##############################################
# Main

# Configure environment
. oracle_environment.ksh   # Oracle environment.
export ORACLE_SID=XXX1

DIRECTORY=/apps/home/oracle/local; export DIRECTORY
FILENAME=${DIRECTORY}/Exports/${ORACLE_SID}-`date +%d`.dmp.gz; export FILENAME
LOG=${DIRECTORY}/Exports/${ORACLE_SID}-`date +%d`.log; export LOG
PIPE=${DIRECTORY}/export_pipe.dmp; export PIPE

test -p ${PIPE} || mknod ${PIPE} p
chmod +rw ${PIPE}

cat $PIPE | /usr/local/bin/gzip > ${FILENAME} &
${ORACLE_HOME}/bin/exp log=${LOG} buffer=10485760 file=${PIPE} consistent=yes full=yes << EOF
system/password 
EOF
#############################################################



#!/bin/ksh
# Oracle imp import for gzipped Oracle exp file.
##############################################
# Main

# Path of exp file.
FILENAME=$1; export FILENAME

PIPE=export_pipe.dmp; export PIPE
test -p ${PIPE} || mknod ${PIPE} p
chmod +rw ${PIPE}

/usr/local/bin/zcat ${FILENAME} > ${PIPE} &
${ORACLE_HOME}/bin/imp buffer=10485760 file=${PIPE} full=yes log=impzip.log << EOF
system/password
EOF
#########################################################
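
As noted above, if the compressed dump grows too large for one file system, the Unix “split” utility can cut it into pieces and “cat” can glue them back together before the import. A minimal sketch, assuming a dump named as the export script above would name it (e.g. XXX1-01.dmp.gz) and a split that accepts byte sizes with -b:

# Cut the dump into 500 MB pieces named XXX1-01.dmp.gz.aa, .ab, ...
split -b 500m XXX1-01.dmp.gz XXX1-01.dmp.gz.

# Later, rebuild the single file before running the import script above:
cat XXX1-01.dmp.gz.* > XXX1-01.dmp.gz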

Free Unix Shell Statistical Spam Filter and Whitelist

Client Side Unix Shell – AWK with updating email address “Whitelist”

I now use a “Statistical Spam Filter”. Wow, the scummy sewer of internet mail is cleansed, refreshed and usable again. Just using the delete button was getting too difficult; I got 8 to 10 spams for every good piece of mail. As a spam detector I am not as good a filter as you might think: the subject and address are not always enough, and I would occasionally open a spam to my great annoyance.

My interpretation of Paul Graham’s Spam Article

My filter was inspired by Paul Graham’s article about a Naive Bayesian spam filter, “A Plan for Spam”. He basically says that you gather statistics on how often tokens show up in two bodies of mail (spam and good), and then calculate a statistical value for how likely a single mail is to be spam by looking at the tokens in it. The more mail in the good and spam bodies, the better the filter is “trained”. Jeez, he made it sound so easy. And it is. I slapped an anti-spam tool together as a ksh and awk script for use as a personal filter on a Unix-type system, hooked in through the ~/.forward file. The code is at the bottom of the article: less than 100 lines for the training script and less than 100 lines for the filter, less than 200 lines total including comments, and under 6000 characters.

This filter differs in lots of ways from the Paul Graham article. I took out some of the biases he describes and simplified it; maybe it is too simple. What I find most interesting is that the differences do not seem to matter much: I still filter out 96+% of spam. I got those results with a spam sample of at least 500 emails and a good-email sample of at least 700 emails. With smaller training samples or a different mail mix it may not do as well, or it may do better. Note: I later changed the training body to match the proportion of real spam to good mail, which is much more spam than good mail (about 8-10 spams to every good mail received), and the anti-spam tool worked better.

How the Spam Filter Works in Unix

First I run the training script on two bodies of mail: ~/Mail/received (good mail) and ~/Mail/junk (saved spam). The ~/Mail/received file already exists on my unix box and holds mail that I have read and not deleted. The training script finds all the tokens in the emails and gives each one a probability depending on how often the token appears in the “spam mail” versus the “good mail”. The training script also creates the whitelist of addresses from the “good mail.” As mail flows through the system, the training script “learns” a little more each time it is run.
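
To make the token probability concrete, here is a toy calculation using the same formula as the spaminess() function in the training script below (the counts are invented for illustration): if a token appears 50 times among 10,000 spam tokens and 2 times among 20,000 good tokens, its spam probability is (50/10000) / ((2/20000) + (50/10000)) = 0.005 / 0.0051 ≈ 0.98. A token that appears only in the good mail gets 0.01, and a token that appears only in the spam gets 0.99.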

I run the actual spam filter script from the .forward file, which lets a user process mail before it hits the inbox. (Look up “man forward” at the shell prompt for further information on the .forward file.) The script first checks the whitelist for a good address; if one is found, the mail passes the filter. If the address is not found, the mail is passed to the statistical spam filter, the tokens are checked, and the email is given a spaminess value. Above a certain value the email is classified as spam and put in the ~/Mail/junk file; below that value it passes to /var/spool/mail/mylogin, where I read it as god intended email to be read, with a creaky old unix client. However, I can still read it with any other client I want, POP or IMAP.
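
The hookup itself is a one-line .forward file, the same line shown in the header comment of the filter script below (the path is a placeholder):

"| /Your/path/to/bin/spamfilter"

Note that some delivery agents (sendmail, for example) will ignore a .forward file that is group- or world-writable.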

Testing the Spam Filter

I included a little test script below that I used to check my results. I just split emails into files and run them one at a time and check the value the filter gives.

Testing on email that has been used to train the filter will give results that are very good but not valid, so I tested on email not seen by the training script. The filter gets much better as the training sample gets bigger, just like other statistical spam filters. For example, at lower sample sizes (trained with 209 good mails and 301 spam mails) the filter was pretty bad: when the average spam value cutoff was raised to .51 so that no good mail was blocked, 44% of the spam passed through on a test set of 320 spam and 683 good emails. Even so, that means 56% of the spam was blocked. Small sample sizes are not perfect, but they are usable, and I began using the mail filter with a sample set of about 600 good mails and 300 spams. As the training sample increased, the results improved. As I changed the mail mix to reflect the real spam proportions it got even better, around 96-98% of the spam blocked. I think the lower early results were because of the proportions of spam to good email; they should reflect the real proportion received on the system where the filter runs.

Paul Graham or others may have superior filters and better mathematics for anti-spam algorithms, but I am not sure it matters all that much; the amount of spam that gets through is small enough not to bother me.

Filter Performance

I used gawk in the filter and checked it with the gawk profiler to look for performance problems. The largest performance constraint is creating the spam-probability associative array in memory, the key-value pairs of tokens and the spam values I assign to them. Building this associative array takes more than 95% of the time to process an email through the filter, and it gets worse as the set of tokens grows. Perl and other languages can get around this with DBM file interfaces, which are not available to my gawk filter.
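
For what it is worth, profiling the awk body is straightforward. This sketch assumes the filter’s awk program has been copied into a file called spamfilter.awk and a saved message into sample_mail (both names are placeholders), and that the probability file it reads already exists:

# Older gawks ship the profiler as a separate binary; it writes a
# pretty-printed copy of the program with per-statement execution
# counts to awkprof.out.
pgawk -f spamfilter.awk < sample_mail

# Newer gawks have profiling built in:
gawk --profile -f spamfilter.awk < sample_mail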

White List Filter Improves Performance and Cuts Errors

I added a “whitelist” of good email addresses, a feature that helps keep good email from being misclassified and improves performance enormously (by a factor of at least 100), since a whitelisted message never reaches the statistical filter. The whitelist is not one of the “challenge-response” schemes that annoy me so much that I toss any such email away; it simply learns from the email used to train the filter, saving the addresses from mail that has passed the filter and landed in my “received” file. I figure that if I receive a good email from someone, chances are 100% that I want to receive email from that address. Note there is a place in the whitelist script to remove commonly forged email addresses, like your own address.
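
For reference, the ~/Mail/address_whitelist produced by the training script below is just a sorted, de-duplicated list of lower-cased sender addresses, one per line, something like this (addresses invented):

alice@example.com
bob@lists.example.org
news@example.net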

Why Differences With Bayes Filters Do Not Matter

The main concept put forward by Paul Graham holds true and seems ungodly robust: applying statistics to filter spam works very well compared to lame rule sets and black lists. My program just proves the robustness of the solution; apparently any half-baked formula (like what I used) seems to work as long as the base probability of the tokens is computed.

Here are some of the many differences between this filter and the filter in the Paul Graham article in no particular order of importance:

– I do not lower-case the tokens; one result is that the minimum token frequency for inclusion in the spam-probability associative array is three instead of five. I think that “case” is an important token characteristic.

– “Number of mails” is replaced with “number of tokens.” My explanation is that I am looking at token frequency within an interval of a stream of tokens. It seems simpler to think of it that way instead of as a number of mails, and when I tried “number of mails” the formula I used gave the same result values on the messages.

– “Interesting tokens” are tokens in the message with a spam statistic greater than 0.95 or less than 0.05, which is easy to implement. I did not pick out the fifteen most interesting tokens, the limit used by Paul Graham. As a result, most of my mail has more than 15 interesting tokens and a few have fewer, which could be a weakness but does not seem to matter much.

– Paul Graham’s Naive Bayesian formula goes to 0 or 1 pretty quickly, which is fine; I tried it out in awk too. But now I just sum the “interesting token” probabilities and divide by the number of “interesting tokens” per message. Yes, it is just the average probability of the “interesting tokens”; it is easy to implement and spreads the values over a 0-1 interval, spam towards 1 and good mail towards 0 (see the small worked example after this list). I did this to get some spam filtering going as soon as possible. Even with a small sample of mail I was able to adjust the cutoff value up to keep all the good mail and still get rid of a good proportion of spam. As I acquired more sample mail the filter caught more spam and I adjusted the cutoff value down.

– I have a “training” program that generates the token probabilities and the address “whitelist”, run as a batch job at intervals (say once a day or week), and a separate filter program run out of “.forward”.

– I did not double the frequency value of the “good tokens” to bias them in the base spam probability calculation of each token.

– Tokens the filter has not seen before are ignored. Paul Graham gives them a 0.4 probability of spaminess. Most other ways of assigning a probability to unknown tokens would be ignored by my formula anyway, since the values fall outside the “interesting token” ranges.

– I noticed that Paul Graham ignores HTML comments. When I looked at some of the spam I found out why: some spammers load the recipient address and common words into HTML comments spread through the text to get past rule filters. The statistical spam filter seems to catch them anyway, so I include tags, comments, everything.
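
A toy example of the scoring described above (token probabilities invented for illustration): if a message contains three “interesting tokens” with probabilities 0.99, 0.97 and 0.02, its score is (0.99 + 0.97 + 0.02) / 3 = 0.66, which is above the 0.50 cutoff used in the filter script below, so the message is filed as spam.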

Try out the Spam Filter

############################## cut here ##################

#!/bin/ksh
# Script to Test the SpamFilter
# Note: Do not test mail that has been used to train the filter,
#       test mail not seen by the training program.
filter_test () {
  # Split a file of unix email into many mail files (good0000, good0001, ...)
  # in the current directory:
  cat ~/Mail/rece* | csplit -k -f good -n 4 - '/^From /' {900}

  # Run a modified filter that displays the spam value for each mail file.
  # I just commented out the last part of the filter and added a
  # print statement of the Subject line and spam value the filter found.
  for I in good*
  do
     cat $I | [filter_program-that_shows_the_value_only]
  done | sort -n
}

#########
# Main
filter_test

############################## cut here ##################
#!/bin/ksh
# Training script for the SpamFilter.
# Call from the command line or in a crontab file.
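# Example crontab entry (the script name and paths are placeholders) to
# retrain every night at 02:15:
#   15 2 * * * /Your/path/to/bin/spamtrain >> /Your/home/Mail/train.log 2>&1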

number_of_tokens (){
  zcat $1 | cat $2 - | wc -w
}

# Note: Get rid of addresses that are commonly forged at the
#       "My-Own-Address" string.
address_white_list (){
  zcat $1 | 
  cat $2 - | 
  egrep '^From |^Return-Path: ' | 
  nawk '{print tolower($2)}'| 
  nawk '{gsub ("<",""); gsub (">","");print;}'| 
  grep -v 'My-Own-Address'| 
  sort -u > ~/Mail/address_whitelist
}

# Create a hash with probability of spaminess per token.
#       Words only in good hash get .01, words only in spam hash get .99
spaminess () {
nawk 'BEGIN {goodnum=ENVIRON["GOODNUM"]; junknum=ENVIRON["JUNKNUM"];}
       FILENAME ~ "spamwordfrequency" {bad_hash[$1]=$2}
       FILENAME ~ "goodwordfrequency" {good_hash[$1]=$2}

    END    {
    for (word in good_hash) {
        if (word in bad_hash) { print word, 
            (bad_hash[word]/junknum)/ \
            ((good_hash[word]/goodnum)+(bad_hash[word]/junknum)) }
        else { print word, "0.01"}
    }
    for (word in bad_hash) {
        if (word in good_hash) { done="already"}
        else { print word, "0.99"}
    }}' ~/Mail/spamwordfrequency ~/Mail/goodwordfrequency 

}

# Print list of word frequencies
frequency (){
  nawk ' { for (i = 1; i <= NF; i++)
        freq[$i]++ }
    END    {
    for (word in freq){
        if (freq[word] > 2) {
          printf "%s\t%d\n", word, freq[word];
        }
    } 
  }'
}
# Note: I store the email in compressed files to keep my storage space small,
#       so I have the gzipped mail that I run through the filter training 
#       script as well as current uncompressed "good" and spam files.
#       
prepare_data () {
  export JUNKNUM=$(number_of_tokens '/Your/home/Mail/*junk*.gz' '/Your/home/Mail/junk')
  export GOODNUM=$(number_of_tokens '/Your/home/Mail/*received*.gz' '/Your/home/Mail/received')
  address_white_list '/Your/home/Mail/*received*.gz' '/Your/home/Mail/received'

  echo $JUNKNUM $GOODNUM

  zcat ~/Mail/*junk*.gz | cat ~/Mail/junk - |
    frequency|
    sort -nr -k 2,2 > ~/Mail/spamwordfrequency
  zcat ~/Mail/*received*.gz | cat ~/Mail/received - |
    frequency|
    sort -nr -k 2,2 > ~/Mail/goodwordfrequency

  spaminess| 
    sort -nr -k 2,2 > ~/Mail/spamprobability
  # Clean up files
  rm ~/Mail/spamwordfrequency ~/Mail/goodwordfrequency 
}

#########
# Main

prepare_data
exit

########################### Cut Here ####################
#!/bin/ksh
# Spamfilter using statistical filtering.
# Inspired by the Paul Graham article "A Plan for Spam" www.paulgraham.com
#
# Implement in the .forward file like so:
#      "| /Your/path/to/bin/spamfilter"

# If mail is spam then put in a spam file
# else put in the good mail file. 

spamly () {
/usr/bin/nawk '

   { message[k++]=$0; }

   END { if (k==0) {exit;} # empty message or was in the whitelist.

         good_mail_file="/usr/spool/mail/your_user";
         spam_mail_file="/Your/home/Mail/junk";
         spam_probability_file="/Your/home/Mail/spamprobability";
         total_tokens=0.01;

         while (getline < spam_probability_file)
            bad_hash[$1]=$2; close(spam_probability_file);

         for (line in message){
           token_number=split(message[line],tokens);
           # split() fills tokens[1..token_number]
           for (i = 1; i <= token_number; i++){
             if (tokens[i] in bad_hash) {
               # Only "interesting" token probabilities count toward the score.
               if (bad_hash[tokens[i]] <= 0.06 || bad_hash[tokens[i]] >= 0.94){
                  total_tokens+=1;
                  spamtotal+=bad_hash[tokens[i]];
                }
              }
            }
         }

         # Score = average probability of the "interesting" tokens;
         # above the cutoff the message is filed as spam.
         if (spamtotal/total_tokens > 0.50) {
            for (j = 0; j < k; j++){ print message[j] >> spam_mail_file}
            print "\n\n" >> spam_mail_file;
         }
         else {
            for (j = 0; j < k; j++){ print message[j] >> good_mail_file}
            print "\n\n" >> good_mail_file;
         }
   }'
}

# Check whitelist for good address. 
# if in whitelist then put in good_mail_file
#   else Pass message through filter.
whitelister () {
  /usr/bin/nawk '
      BEGIN { whitelist_file="/Your/home/Mail/address_whitelist";
              good_mail_file="/usr/spool/mail/your_user";
              found="no";
              while (getline < whitelist_file)
              whitelist[$1]="address"; close(whitelist_file);
      }
      { message[k++]=$0;}
      /^From / {sender=tolower($2);
            gsub ("<","",sender);
            gsub (">","",sender);
            if (whitelist[sender]) { found="yes";}
      }
      /^Return-Path: / {sender=tolower($2);
            gsub ("<","",sender);
            gsub (">","",sender);
            if (whitelist[sender]) { found="yes";}
      }
      END { if (found=="yes") { 
               for (j = 0; j < k; j++){ print message[j] >> good_mail_file}
               print "\n\n" >> good_mail_file;
            }
            else {
               for (j = 0; j < k; j++){ print message[j];}
            }
      }'
}

#####################################
# Main
# The mail is first checked by the white list, if it is not found in the
# white list it is piped to the spam filter.
whitelister | spamly 
exit

KSH – AWK CGI Script Example

Yeah, you can use just about anything to code a CGI, and this proves it: a Korn shell and awk CGI for a document search on a small site.

How to Download the Script

To get the full script, save the source of the original page, because a browser renders the HTML bits inside the print statements and “what you see is not what you want.” The shell skeleton is:

# Korn shell cgi for a grep search of files in directories.

# log the search host and query.
log_search () {
  print 'QUERY_STRING='$QUERY_STRING >> ${LOGFILE}
  print 'REMOTE_ADDR='$REMOTE_ADDR >> ${LOGFILE}
  print 'REQUEST_URI='$REQUEST_URI >> ${LOGFILE}
}

# HTML for top of search result page.
html_header () {
  print 'Content-type: text/html\n\n'
  print ''
  print 'HOME'
  FORM_STRING=$( echo ${WORDS} | ${AWK} '{gsub( "\\|","\+");print $0}' )
  cat << EOF

EOF
}

# Result html links and words in context.
html_links_and_context () {
  ${AWK} 'BEGIN {home="http://www.your.domain/"}
    {path=$0; sub (/\/your\/path\//,"",path); print ""path"";}' |
  sort -u -t '&' -k 0,1
}

# Get the query string from one of the environment variables.
get_query () {
  ${AWK} '{ n=split ($0,EnvString,"&");
    for (i = n; i>0; i--) {
      # debug print EnvString[i];
      m=split (EnvString[i],formlist,"=");
      valuelist[formlist[1]]=formlist[2];
    };
    # debug for (y in valuelist) { print y, valuelist[y] };
    x=gsub(/\%../," ",valuelist["query"]);   # remove funny stuff.
    x=gsub(/[^ |a-zA-Z0-9]/," ",valuelist["query"]);
    x=gsub(/[\*\+]/," ",valuelist["query"]);
    print valuelist["query"];
  }'
}

# Search information for usage.
search_info () {
  print 'Search Tips: Only letters [A-Z] [a-z] [0-9] and space '
  print 'can be used in the search, other characters are removed.'
}

# If the search fails get the search words.
search_fail () {
  print 'Fail to find search: '${REQUEST_URI} >> ${LOGFILE}
  print 'NO RESULT: try each term separately, fewer terms or partial words.'
  for term in $WORDS
  do
    print '   '$term''
  done
  print ''
  exit
}

######################
# Main

AWK=/usr/bin/nawk     # Configure to awk, nawk, gawk
export CODEDIR='/your/path/to/search'
export ARTICLEDIR='/path/to/search'
export LOGFILE='/path/log/search.log'

log_search
WORDS=$(print ${REQUEST_URI} | get_query)
html_header
search_info

export FOUND_FILES="${CODEDIR}/* ${ARTICLEDIR}/*"

find_file () {
  for term in ${WORDS}
  do
    export FOUND_FILES=$( grep -il "$term" ${FOUND_FILES} )
    ${FOUND_FILES:=search_fail}     #Check for success
  done
}

find_file

print 'Search Results'
for i in ${FOUND_FILES}
do
  print $i | html_links_and_context
done
print ''

Database Management: Optimizer, Statistics and Performance

Ask yourself: what is the purpose of optimizer statistics? It is to gather statistics about the data in the RDBMS so the optimizer can create a query plan that finds a fast path to the data; it is part of database management. Statistics are created by running ‘ANALYZE’ in Oracle, ‘UPDATE STATISTICS’ in Informix, ‘VACUUM ANALYZE’ in PostgreSQL, and other utilities in other RDBMSs. But a performance problem can arise when gathering the statistics itself takes a significant amount of time in a large application.

Sometimes people see large increases in performance from statistics and conclude that they must run the statistics jobs after every change in the data. This is usually not needed; the optimizer will probably pick the same method to get data from a table with a million rows as it does when the table has a million and a half, or even ten million. So what do you do when the time spent gathering statistics is itself crippling application performance? If table sizes and the distribution of values do not change drastically, the easiest way to cut the runtime of the statistics job is to run it less often. Instead of five times a week, run it once a week or once a month, or split the job to run a few tables per day.

Or run it just once, if the size and distribution of the data are more or less static and no indexes change. If a table fluctuates in size by orders of magnitude, or the distribution of attribute values shifts a lot over time, run the statistics job on that particular table more often, provided it makes a difference in application performance. If an index is added to a table, also re-run the statistics on that table. Some RDBMS utilities can gather statistics on a sample of the data, or only a subset of the statistics, and the statistics job will then run faster.
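
As a minimal sketch of that idea, in the same ksh style as the export script earlier in this document (the SID, password and table names are placeholders, not taken from any real system), a nightly job could analyze only a couple of volatile tables from a sample instead of computing full statistics on everything:

#!/bin/ksh
# Estimate statistics on a few volatile tables from a 20 percent sample.
. oracle_environment.ksh     # Oracle environment, as in the export script.
export ORACLE_SID=XXX1

${ORACLE_HOME}/bin/sqlplus -s system/password << EOF
analyze table app.orders     estimate statistics sample 20 percent;
analyze table app.order_log  estimate statistics sample 20 percent;
exit
EOF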

The idea is that some statistics, even if not complete or totally accurate, will help the optimizer enough to get a fast path to the data.