Coding Data

Repository for Coding Data and comments about working up the data for analysis.

Initial reconnaissance.

Before getting too carried away, I wanted to reconnoiter the first column of the spreadsheet since it seemed to have the most diverse assortment of information in it.

Here are some command-line operations I performed in Terminal.app on the Mac after selecting the first column of FINALS.xlsx and copying to the clipboard:

Scale of stuff to look at

pbpaste | cat | wc
   65536    9807  101150
  • pbpaste takes the contents of the clipboard and sends it to stdout
  • | is the pipe character, which "pipes" the stdout of one command to the stdin of the following command
  • cat concatenates its input to stdout. I use cat here defensively: cat seems to do some smart things with encodings, "conditioning" the text for use by other utilities. I have encountered situations in which subsequent processing of the clipboard contents from pbpaste behaved better when I inserted cat. In this simple pipeline it could have been omitted with identical results. It may be superfluous voodoo.
  • wc performs a word count, reporting the number of lines, the number of words, and the number of bytes (the same as the number of characters for plain ASCII text)
  • The clipboard apparently has
    • 65536 lines, more than I want to look at. Most are probably empty. That number is 2^16 and probably represents the maximum number of possible rows.
    • 9807 words
    • 101150 characters
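
The same counts can be tried on a machine without a clipboard. This is a minimal sketch, assuming a hypothetical sample_column function that stands in for pbpaste and prints a few made-up rows, some of them blank. The grep -c step for counting non-empty rows is an addition of mine, not something done in the pipeline above.

```shell
# sample_column is a hypothetical stand-in for pbpaste; the rows are made up.
sample_column() {
    printf '%s\n' 'Chapter 1' '1' '2' '' 'Chapter 2' '1' '' ''
}

# wc with no flags reports lines, words, and bytes, in that order.
sample_column | wc

# Individual counts; tr strips the padding some wc implementations add.
total_lines=$(sample_column | wc -l | tr -d ' ')
total_words=$(sample_column | wc -w | tr -d ' ')

# Inserting cat into the pipeline should not change the counts.
with_cat=$(sample_column | cat | wc -l | tr -d ' ')

# grep -c . counts lines containing at least one character (non-empty rows).
nonempty_lines=$(sample_column | grep -c .)

echo "lines=$total_lines words=$total_words non-empty=$nonempty_lines with_cat=$with_cat"
```

Comparing the total line count with the non-empty count is one way to check the guess that most of the 65536 lines are blank.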

Scale of unique stuff

pbpaste | cat | sort | uniq -c | wc
     967    2571   14253
  • sort sorts the lines
  • uniq -c collapses adjacent identical lines into one, which is why sort comes first; the -c flag prefixes each line with a count of how many times it occurred
  • I actually wanted to see the unique lines, but by starting with wc I got an idea of how much stuff I was going to need to look at: here 967 lines, nearly 1000.
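
To see why sort has to come before uniq, here is a small sketch on made-up rows. The sample_column function is again a hypothetical stand-in for pbpaste, and the sed step only strips the padding that uniq -c puts in front of each count.

```shell
# Made-up rows standing in for the clipboard contents (pbpaste is macOS-only).
sample_column() {
    printf '%s\n' 'b' 'a' 'b' 'a' 'b'
}

# Without sort, uniq only merges adjacent duplicates, so nothing collapses here.
unsorted_groups=$(sample_column | uniq -c | wc -l | tr -d ' ')

# With sort, identical lines become adjacent and each distinct value appears once.
sorted_groups=$(sample_column | sort | uniq -c | wc -l | tr -d ' ')

# Strip the leading padding from the count column so the output is easy to compare.
counts=$(sample_column | sort | uniq -c | sed 's/^ *//')

echo "groups without sort: $unsorted_groups"
echo "groups with sort:    $sorted_groups"
echo "$counts"
```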

The unique stuff

pbpaste | cat | sort | uniq -c | less
  • same as above, except wc is replaced with less, which lets me page backwards and forwards through the output.

The unique stuff of likely interest that isn't a problem number

pbpaste | cat | grep '^[A-Z]' | sort | uniq -c | less
  • similar to above, but only show lines which start with a capital letter
  • grep (named after the ed command g/re/p: globally search for a regular expression and print) looks at lines and passes the ones which match to stdout, discarding non-matches
    • ^ anchors to the start of the line
    • [A-Z] matches any single character in the given range
    • single quotes to protect the search pattern from interpretation by the shell
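
A small sketch of the filter on made-up rows, with a hypothetical sample_column function standing in for pbpaste. I set LC_ALL=C here as an assumption of my own: in some locales a range like [A-Z] can also match lowercase letters, and the C locale avoids that.

```shell
# Made-up rows standing in for the clipboard contents (pbpaste is macOS-only).
sample_column() {
    printf '%s\n' 'Chapter 1' '12' 'Lesson 3' '' ' Indented' '7a' 'Chapter 1'
}

# Keep only lines whose first character is a capital letter, then count repeats.
# ' Indented' is dropped because ^ anchors the match to the very first character.
headings=$(sample_column | LC_ALL=C grep '^[A-Z]' | sort | uniq -c | sed 's/^ *//')

echo "$headings"
```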

Check the other stuff

pbpaste | cat | grep -v '^[A-Z]' | sort | uniq -c | less
  • same as above, except the -v flag tells grep to invert the match and send the lines which do not match to stdout
  • Why? To see if I missed anything of interest.
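
One way to confirm that the two filters together cover everything is to count both halves. This is a sketch under the same assumptions as before: sample_column is a hypothetical stand-in for pbpaste with made-up rows, and the counting with grep -c is my addition rather than part of the session above.

```shell
# Made-up rows standing in for the clipboard contents (pbpaste is macOS-only).
sample_column() {
    printf '%s\n' 'Chapter 1' '12' '' 'Lesson 3' '7a' ''
}

total=$(sample_column | wc -l | tr -d ' ')

# Lines that start with a capital letter, and lines that do not.
# Blank lines land in the second group because they cannot match the pattern.
matched=$(sample_column | LC_ALL=C grep -c '^[A-Z]')
unmatched=$(sample_column | LC_ALL=C grep -vc '^[A-Z]')

echo "total=$total matched=$matched unmatched=$unmatched"
[ "$((matched + unmatched))" -eq "$total" ] && echo "every line is accounted for"
```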

Comments

  • UCSMP Data Chapters 1-8 here. -- DickFurnas - 02 Nov 2009
  • UCSMP Data Chapters 1-8 attached. -- DickFurnas - 02 Nov 2009
  • Dick- here is an initial attempt of the spreadsheet (including all Core Plus data). I'll talk to you about filling in the missing parameters soon. -- GabrielDobbs - 02 Nov 2009
  • Thanks Dick, I just had to reregister. I just uploaded some sample files as well. -- GabrielDobbs - 13 Oct 2009
  • Hi Gabriel. Here are the data sets I've gotten so far. -- DickFurnas - 18 Sep 2009
