"Arc Flow" data. Arriving via a DVD came file Wolack2.zip of size 2,278,665,373 bytes (that would have been better as all_arc.7z of size 1,306,057,660 bytes) which when expanded produced all_arc.csv of size 23,778,762,633 bytes and on reformatting and splitting the files used 11,562,960,536 bytes for the directory of split files, and which can be compressed via 7-zip to 1,261,313,940 bytes as a comparison. File all_arc.csv proved to have a brainless layout, as follows: ARC_ID,TRADING_DAY,TRADING_PERIOD,RUN_TYPE,FLOW_INTO_ARC,FLOW_OUT_OF_ARC,SHADOW_PRICE,RUN_TIME,MWMAX KIK_STK3.1,1996-10-23,34,F,-1.091000000000000e+000,-1.105000000000000e+000,0.000000000000000e+000,1996-11-17 07:46:58, KIK_T1.H1,1996-10-23,34,F,1.653900000000000e+001,1.653900000000000e+001,0.000000000000000e+000,1996-11-17 07:46:58, KIK_T1.L1,1996-10-23,34,F,-8.200000000000000e-001,-8.220000000000001e-001,0.000000000000000e+000,1996-11-17 07:46:58, KIK_T1.M1,1996-10-23,34,F,-1.571700000000000e+001,-1.571700000000000e+001,0.000000000000000e+000,1996-11-17 07:46:58, KIN_T2.T2,1996-10-23,34,F,-1.518200000000000e+001,-1.521400000000000e+001,0.000000000000000e+000,1996-11-17 07:46:58, KIN_T4.T4,1996-10-23,34,F,-1.579200000000000e+001,-1.583300000000000e+001,0.000000000000000e+000,1996-11-17 07:46:58, If these data were to have been supplied in a format corresponding to that long established by the "final prices" data files and many others, it would be recognised directly and could be read by Gnash (without trouble other than whatever data errors might be encountered), as follows: ARC_ID,TRADING_DAY,TRADING_PERIOD,FLOW_INTO_ARC,FLOW_OUT_OF_ARC,SHADOW_PRICE,MWMAX,RUN_TIME,RUN_TYPE KIK_STK3.1,23/10/1996,34,-1.091,-1.105,0,?,17/11/1996 07:46:58,F KIK_T1.H1,23/10/1996,34,16.539,16.539,0,?,17/11/1996 07:46:58,F KIK_T1.L1,23/10/1996,34,-.82,-.822,0,?,17/11/1996 07:46:58,F KIK_T1.M1,23/10/1996,34,-15.717,-15.717,0,?,17/11/1996 07:46:58,F KIN_T2.T2,23/10/1996,34,-15.182,-15.214,0,?,17/11/1996 07:46:58,F 
KIN_T4.T4,23/10/1996,34,-15.792,-15.833,0,?,17/11/1996 07:46:58,F

For the benefit of those incapable of developing a generalisation via the exercise of thought, following the established style for these data puts the identifying fields first (name, day, period), then all the data values, then the run time and the run type last, as reformatted; not the run type ahead of the data values with a further value trailing after the run time, as supplied. And notice that the date is written with slashes and is day/month/year, not year-month-day (though the latter is a declared standard). An omitted value is indicated by a ?, but a null field is also acceptable.

Hand-editing such a monster file is unworkable, so instead a special-purpose programme was devised to reformat the file and split the data into monthly blobs while it was at it. This split is not absolute, as the data are not supplied in chronological order. A split was made whenever the incoming record bore a month later than that assigned to the current output file. Subsequent lines might "reach back" into the previous month, but no matter, the split was made and they stay in the later file. Some four hours of computer time later, the following schedule was completed.

Split   Start rec.
1996m10          2
1996m11    1121932
1996m12    2212012
1997m 1    3333910
1997m 2    4458806
1997m 3    5484278
1997m 4    6621148
1997m 5    7722566
1997m 6    8872790
1997m 7    9969306
1997m 8   11129946
1997m 9   12291066
1997m10   13415706
1997m11   14576272
1997m12   15700912
1998m 1   16863040
1998m 2   18025168
1998m 3   19078454
1998m 4   19833812
1998m 5   20961332
1998m 6   22127924
1998m 7   23256884
1998m 8   24426452
1998m 9   25596020
1998m10   26630900
1998m11   27776606
1998m12   28886846
1999m 1   30034094
1999m 2   31175390
1999m 3   32206238
1999m 4   33348876
1999m 5   34451916
1999m 6   35591724
1999m 7   36694764
1999m 8   37842012
1999m 9   38992236
1999m10   40105356
1999m11   41254034
1999m12   42367154
2000m 1   43517378
2000m 2   44667602
2000m 3   45742178
2000m 4   46890968
2000m 5   48004520
2000m 6   49169672
2000m 7   50301944
2000m 8   51484616
2000m 9   52665128
2000m10   53801528
2000m11   54963580
2000m12   56089660
2001m 1   57253276
2001m 2   58421452
2001m 3   59472652
2001m 4   60638364
2001m 5   61769916
2001m 6   62947116
2001m 7   64089036
2001m 8   65269740
2001m 9   66456972
2001m10   67602780
2001m11   68788660
2001m12   69939364
2002m 1   71129764
2002m 2   72321700
2002m 3   73390036
2002m 4   74571506
2002m 5   75713714
2002m 6   76907234
2002m 7   78084818
2002m 8   79317218
2002m 9   80542514
2002m10   81741602
2002m11   83000780
2002m12   84234668
2003m 1   85570124
2003m 2   86882588
2003m 3   88035548
2003m 4   89283430
2003m 5   90491782
2003m 6   91739350
2003m 7   92932726
2003m 8   94180342
2003m 9   95457238
2003m10   96730006
2003m11   98170390
2003m12   99727846
2004m 1  101340262
2004m 2  102967942
2004m 3  104499766
2004m 4  105871368
2004m 5  107197674
2004m 6  108521946
2004m 7  109841034
2004m 8  111242202
2004m 9  112650378
2004m10  114019818
2004m11  115253652
2004m12  116392780
2005m 1  117570080
2005m 2  118747344
2005m 3  119811792
2005m 4  120991296
2005m 5  122131776
2005m 6  123311864
2005m 7  124455224
2005m 8  125636696
2005m 9  126818168
2005m10  127961888
2005m11  129143787
2005m12  130292343
2006m 1  131480775
2006m 2  132669687
2006m 3  133743543
2006m 4  134935733
2006m 5  136096397
2006m 6  137300405
2006m 7  138468701
2006m 8  139672861
2006m 9  140874421
2006m10  142037941
2006m11  143237981
2006m12  144403958
2007m 1  145609206
2007m 2  146814486
2007m 3  147903918
2007m 4  149111426
2007m 5  150279314
2007m 6  151488442
2007m 7  152654098
2007m 8  153858538
2007m 9  155067450
2007m10  156238534
2007m11  157451151
2007m12  158626191
2008m 1  159851031
2008m 2  161074167
2008m 3  162222151
2008m 4  163450015
2008m 5  164641107
2008m 6  165873459
2008m 7  167074467
2008m 8  168313803
2008m 9  169551939
2008m10  170745527
2008m11  171984095
2008m12  173187215
2009m 1  174435647
2009m 2  175684871
2009m 3  176819967
2009m 4  178075983
2009m 5  179301085
2009m 6  180561421

181212563 read.

    Count      Minimum       Maximum  Name.
181212563  -700.750000   1369.399000  FLOW_INTO_ARC
181212563  -700.750000   1369.399000  FLOW_OUT_OF_ARC
181212563  -158.730000    338.150000  SHADOW_PRICE
102613331     0.000000  10000.000000  MWMAX

So that's some 153 files of size 70-90MB each, totalling 11GB, which is about half the size of the original.
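
The reformatting and the month-split rule described above can be sketched in a few lines of Python. This is not the special-purpose programme itself, which is not shown here; it is only an illustration under stated assumptions. The function names (short_number, short_date, reformat, split_by_month) are invented for the example, rounding to nine decimal places is a guess at what turns -8.220000000000001e-001 into -.822, and the splits are held in memory, whereas a real run over 23GB would write each split to its own file as it went.

```python
import csv
from datetime import datetime

# Field orders taken from the two header lines quoted in the text.
SUPPLIED = ["ARC_ID", "TRADING_DAY", "TRADING_PERIOD", "RUN_TYPE", "FLOW_INTO_ARC",
            "FLOW_OUT_OF_ARC", "SHADOW_PRICE", "RUN_TIME", "MWMAX"]
WANTED = ["ARC_ID", "TRADING_DAY", "TRADING_PERIOD", "FLOW_INTO_ARC", "FLOW_OUT_OF_ARC",
          "SHADOW_PRICE", "MWMAX", "RUN_TIME", "RUN_TYPE"]
NUMERIC = ["FLOW_INTO_ARC", "FLOW_OUT_OF_ARC", "SHADOW_PRICE", "MWMAX"]


def short_number(text):
    """Compact form of a number; an empty field becomes '?'."""
    text = text.strip()
    if not text:
        return "?"
    # Assumption: nine decimal places are enough to shed the binary noise.
    s = f"{float(text):.9f}".rstrip("0").rstrip(".")
    if s == "-0":
        return "0"
    if s.startswith("0."):
        return s[1:]            # 0.82 -> .82
    if s.startswith("-0."):
        return "-" + s[2:]      # -0.82 -> -.82
    return s


def short_date(text):
    """year-month-day to day/month/year, keeping any time of day."""
    text = text.strip()
    if not text:
        return "?"
    day, _, clock = text.partition(" ")
    out = datetime.strptime(day, "%Y-%m-%d").strftime("%d/%m/%Y")
    return f"{out} {clock}" if clock else out


def reformat(row):
    """Reorder one supplied record into the established layout."""
    rec = dict(zip(SUPPLIED, row))
    for name in NUMERIC:
        rec[name] = short_number(rec[name])
    rec["TRADING_DAY"] = short_date(rec["TRADING_DAY"])
    rec["RUN_TIME"] = short_date(rec["RUN_TIME"])
    return ",".join(rec[name] for name in WANTED)


def split_by_month(lines):
    """Return (schedule, splits); the header line counts as record 1."""
    reader = csv.reader(lines)
    next(reader)                       # skip the supplied header
    current = None                     # (year, month) of the open split
    schedule, splits = [], {}
    for recno, row in enumerate(reader, start=2):
        year, month = int(row[1][0:4]), int(row[1][5:7])
        # Split only on a later month; records that reach back stay put.
        if current is None or (year, month) > current:
            current = (year, month)
            label = f"{year}m{month:2d}"
            schedule.append((label, recno))
            splits[label] = [",".join(WANTED)]
        splits[label].append(reformat(row))
    return schedule, splits
```

Passing an open file (or any list of lines) to split_by_month gives back the schedule as (split, start record) pairs in the style of the table above, along with the reformatted lines for each split. The comparison on (year, month) is what makes a record that reaches back into the previous month land in the later split rather than reopening the earlier one.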