Coding Data

Repository for Coding Data and comments about working up the data for analysis.

  • I separated Book (as referenced in the files) from ISBN because, after some searching and reading, ISBNs turn out to be messier than I had hoped. I had thought that an ISBN uniquely identifies a book, edition, and likely pagination, but I'm no longer so sure. Its use as an identifier in a bibliographic reference turns out to be discouraged. -- DickFurnas - 03 Nov 2009
  • FINALS_Glencoe.ods attached. Has nearly everything parsed out into separate columns. Missing is file name and a column for the date of original source for data... and where does section number come from? the file name? -- DickFurnas - 03 Nov 2009
  • UCSMP Data Chapters 1-8 attached. -- DickFurnas - 02 Nov 2009
  • Dick, here is an initial attempt at the spreadsheet (including all Core Plus data). I'll talk to you about filling in the missing parameters soon. -- GabrielDobbs - 02 Nov 2009
  • Thanks Dick, I just had to reregister. I just uploaded some sample files as well. -- GabrielDobbs - 13 Oct 2009
  • Hi Gabriel. Here are the data sets I've gotten so far. -- DickFurnas - 18 Sep 2009
  • First Post :-) -- DickFurnas - 18 Sep 2009

Initial reconnaissance.

Before getting too carried away, I wanted to reconnoiter the first column of the FINALS.xlsx spreadsheet since it seemed to have the most diverse assortment of information in it.

Here are some command-line operations I performed on the Mac after selecting the first column of FINALS.xlsx and copying it to the clipboard:

Scale of stuff to look at

pbpaste | cat | wc
   65536    9807  101150
  • pbpaste takes the contents of the clipboard and sends them to stdout
  • | is the pipe character, which "pipes" the stdout of one command to the stdin of the following command
  • cat concatenates its input to stdout. I use cat here defensively: in this simple pipeline it could have been omitted with identical results, but I have encountered situations in which subsequent processing of the clipboard contents behaved better when I inserted cat after pbpaste. It seems to "condition" the text for use by other utilities, perhaps something to do with encodings. It may be superfluous voodoo :-)
  • wc performs a word count, reporting the number of lines, words, and bytes (the same as characters for plain ASCII text)
  • The clipboard apparently has
    • 65536 lines, more than I want to look at :-) Most are probably empty. That number is 2^16 and probably represents the maximum number of possible rows.
    • 9807 words
    • 101150 characters (strictly, bytes)
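
The guess that most of the 65536 lines are empty can be checked by counting non-empty lines separately. Here is a minimal sketch, assuming a small made-up sample in place of the real clipboard; the function names sample_column, count_lines and count_nonempty are hypothetical, and on the Mac sample_column would be replaced by pbpaste.

```shell
# Stand-in for `pbpaste`: a tiny made-up column with blank lines in it.
sample_column() {
  printf '%s\n' 'Chapter 1' '1a' '' '2b' '' ''
}

# Count all lines; tr strips the padding some wc implementations add.
count_lines() {
  wc -l | tr -d ' '
}

# Count only lines that contain at least one character.
count_nonempty() {
  grep -c '.'
}

sample_column | count_lines      # total lines, blank ones included
sample_column | count_nonempty   # lines with some content
echo $((1 << 16))                # 2^16, the row count reported above
```

Comparing the two counts shows how much of the column is actually worth looking at.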

Scale of unique stuff

pbpaste | cat | sort | uniq -c | wc
     967    2571   14253
  • sort sorts the lines, which puts identical lines next to each other
  • uniq -c collapses each run of adjacent identical lines into a single line (hence the sort first); the -c flag says to count how many instances of each line occurred
  • I actually wanted to see the unique lines, but starting with wc gave me an idea of how much stuff I was going to need to look at: here, nearly 1000 lines.
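
The reason sort comes before uniq can be seen on a small example. This is only a sketch on made-up data; sample_column and the two count_distinct functions are hypothetical names, not part of the original workflow.

```shell
# Made-up sample: the same value appears in non-adjacent rows.
sample_column() {
  printf '%s\n' '1a' 'Chapter 1' '1a' '2b' '1a'
}

# Without sort, uniq only merges duplicates that sit next to each other.
count_distinct_unsorted() {
  uniq | wc -l | tr -d ' '
}

# With sort, identical lines become adjacent, so each survives once.
count_distinct_sorted() {
  sort | uniq | wc -l | tr -d ' '
}

sample_column | count_distinct_unsorted     # overcounts: nothing is adjacent
sample_column | count_distinct_sorted       # true number of distinct lines
sample_column | sort | uniq -c | sort -rn   # counts, most frequent first
```

The trailing sort -rn is an optional extra that orders the counted lines by frequency, which can be handy when paging through nearly 1000 of them.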

The unique stuff

pbpaste | cat | sort | uniq -c | less
  • same as above, except wc is replaced with less, which lets me page backwards and forwards through the output.

The unique stuff of likely interest that isn't a problem number

pbpaste | cat | grep '^[A-Z]' | sort | uniq -c | less
  • similar to above, but only show lines which start with a capital letter
  • grep (named after the ed editor command g/re/p, "globally search for a regular expression and print") looks at each line and passes the ones which match to stdout, discarding non-matches
    • ^ anchors to the start of the line
    • [A-Z] matches any single character in the given range
    • single quotes to protect the search pattern from interpretation by the shell
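
A small sketch of the capital-letter filter on made-up data follows. The names sample_column and starts_with_capital are hypothetical, and setting LC_ALL=C is a defensive assumption of mine rather than something the original commands did: in some locales a range such as [A-Z] may match more than the ASCII capitals.

```shell
# Made-up sample mixing headings, problem numbers and a blank line.
sample_column() {
  printf '%s\n' 'Chapter 1' '1a' 'Section 2' '' 'review' '12c'
}

# Keep only lines whose first character is an ASCII capital letter.
# LC_ALL=C (an assumption, see above) pins [A-Z] to the ASCII range.
starts_with_capital() {
  LC_ALL=C grep '^[A-Z]'
}

sample_column | starts_with_capital
```

Only the lines 'Chapter 1' and 'Section 2' pass; 'review' is dropped because its first letter is lowercase.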

Check the other stuff

pbpaste | cat | grep -v '^[A-Z]' | sort | uniq -c | less
  • same as above, except the -v flag tells grep to invert the match: lines which do not match are sent to stdout
  • Why? To see if I missed anything of interest.
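
One way to confirm that the two filters together cover everything is to count each side with grep -c and compare against the total. A sketch on made-up data, with sample_column and the variable names being hypothetical:

```shell
# Made-up sample mixing headings, problem numbers and a blank line.
sample_column() {
  printf '%s\n' 'Chapter 1' '1a' 'Section 2' '' 'review' '12c'
}

total=$(sample_column | wc -l | tr -d ' ')
matched=$(sample_column | LC_ALL=C grep -c '^[A-Z]')        # start with a capital
unmatched=$(sample_column | LC_ALL=C grep -v -c '^[A-Z]')   # everything else

# Every line lands in exactly one of the two groups.
echo "$total = $matched + $unmatched"
```

If the two counts do not add up to the total, something other than the pattern is dropping lines.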

-- DickFurnas - 2009-09-18

Topic attachments
| Type | Attachment | History | Size | Date | Who | Comment |
| Compressed Zip archive (zip) | | r1 | 271.3 K | 2009-10-28 - 21:21 | MaryAnnHuntley | Core-Plus Mathematics Project |
| Compressed Zip archive (zip) | | r1 | 670.0 K | 2009-09-18 - 02:33 | DickFurnas | Zip file of data Sets collected to date |
| Compressed Zip archive (zip) | | r1 | 1585.4 K | 2009-10-13 - 20:28 | GabrielDobbs | |
| Microsoft Excel Spreadsheet (xlsx) | FINALS.xlsx | r1 | 714.3 K | 2009-11-02 - 02:15 | GabrielDobbs | Data Spreadsheet |
| Unknown file format (ods) | FINALS_Glencoe.ods | r1 | 313.7 K | 2009-11-03 - 00:58 | DickFurnas | |
| Compressed Zip archive (zip) | | r1 | 13.0 K | 2009-10-27 - 01:51 | GabrielDobbs | |
| Compressed Zip archive (zip) | | r1 | 132.7 K | 2009-11-02 - 14:24 | DickFurnas | UCSMP Data Chapters 1-8 here |
Topic revision: r9 - 2009-11-03 - DickFurnas
This site is powered by the TWiki collaboration platform. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.