Linux Lunacy 2
25 Oct 2002

Application Development with Python
Guido van Rossum

A simple main Python module should enable traceback, then import the real module and start it. This allows the module to be pre-compiled independently of main(), which saves the time to compile it ad hoc.

Python and wikis: Guido is a wiki fan, but thinks that HTML in wikis is bad. Real Python wikis include:
  MoinMoin  http://moin.sf.net
  ZWiki (Python & Zope wiki)  http://zwiki.sf.net

"I like to write my Python modules top-down, so the main module is at the top. The only thing that doesn't work with this is class declarations, because the ancestor class must exist before its descendants."

The traceback -- in purple -- prints every variable used in the stack frame at the point of the problem.

"This, by the way, explains why I hate regular expressions." -- this from a seeming bug where a WikiWord was created when it looked like it shouldn't have been. The trouble turned out to be that the regex was checking for the presence of any WikiWord anywhere in the passed expression.

Always use the path manipulation functions in Python rather than simply building a path with string functions. It's so nice that this runs on Linux, PC, and Mac, and you don't get that if you use string functions.

The store() function writes the edited page to disk. It writes the entire page in one write call, then writes another "\n" after it. This is much cheaper than appending the "\n" to the string, because we don't have to copy the entire string to another buffer. Plus, the "\n" will flush the file to disk.

Spreadsheet example: the interface is done with Tcl/Tk, but the Python module "tkinter" hides this. There are many ways to store a sparse 2D array; here we use a dictionary indexed by (x, y) tuples.

reset -- check first for the existence of a reset method with "if hasattr(self, 'reset')". Proposed syntaxes like "if self.?reset()" or "if self has reset()" have not made it into the language.
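The sparse-array trick and the hasattr() probe described above can be sketched like this (the class and method names are my own illustration, not Guido's actual spreadsheet code):

```python
class Sheet:
    """Sparse spreadsheet: only non-empty cells take up memory."""

    def __init__(self):
        self.cells = {}                      # keyed by (x, y) tuples

    def put(self, x, y, value):
        self.cells[(x, y)] = value

    def get(self, x, y):
        return self.cells.get((x, y), "")    # empty cells read as ""

    def clear(self):
        # Stand-in for the proposed "self.?reset()" syntax: probe for
        # an optional reset method before calling it.
        if hasattr(self, "reset"):
            self.reset()
        self.cells.clear()


sheet = Sheet()
sheet.put(3, 7, 42)
print(sheet.get(3, 7))    # -> 42
print(sheet.get(0, 0))    # -> empty string
```

A dictionary keyed by tuples costs memory only for occupied cells, so a mostly-empty million-cell sheet stays small.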
Python up to and including 2.2 has the GUI integrated with the program it runs, so a program error will cause the GUI to crash. "I've been experimenting with an IDL to separate the GUI and the underlying program, but it's very slow due to an inefficient RPC mechanism."

*********************** End ************************

Paranoia, Mucking Out, and Cleaning Up
Steve Oualline

I deal with telephone code for Nokia: 10 million lines of C code written in 62 different countries.

Outline:
  Why software rots
  Tools for cleanup
  The cleanup process
  Paranoia
  How to keep things clean

Why software rots -- a picture of an 1890 cable car which we were stripping the paint from to see if there was wood or rot underneath. Mostly rot, alas.

Developer priorities:
  1. Code
  2. Debug
  3. Add comments
  4. Think about the project and design it

Management priorities: give developers a new job before they finish #1.

Steve's job is flying troubleshooter and gadfly. You must define the problem and write it down. Think, then code. I've said to management: it will take me six months to add this feature to the existing code, or three months to rewrite it from scratch and have it do what you want.

I once wrote some code to control raster imaging in a 7-nozzle print head for an HP printer. It was complicated because the nozzles were in a vertical line but horizontally offset from each other. I wrote three pages of comments for the routine: a page of design, a page of explanation, and a page of ASCII art showing how it should work. When I got to writing the code, it worked the first time I tried it.

Standards are important, even if they're arbitrary. For example, books are always printed with the table of contents in front and the index in the back. There's no reason for this except convention. Troff even prints the TOC at the back, because it gathers the information for the TOC in one pass. The Microsoft API is the result of standards inside groups, but no standards between them.

Define units.
Or else:

  /* I have no idea what the input units are
     nor do I have any idea what the output units
     but I do know if I divide by three
     the plots look about the right size. */

Emissions are measured in grams/mile. Not a metric standard, not an English standard -- a government standard.

Design documents are crucial. Code is more frequently undercommented than overcommented, although overcommenting does happen.

The only real improvement in code presentation I've seen over 80-column Courier type has been lowercase. I've experimented with POD, but not anything like javadocs. POD is pretty cool.

How many obsolete machines must you support? I've removed support for several legacy platforms from our code, including CP/M and VAX.

Language differences: a sister who knew German but not programming translated "is called by" as "is shouted at".

Produce standards and enforce them.

*28* Open source rots less. Peer pressure: Linus once said, "I don't like a particular subsystem, so I'll release a better version of it and let everyone on the kernel mailing list flame [the original author] until he goes away."

You can use :grep from inside vim to do a search, then save the results of the search into a buffer. There's also a 'quickfix' mode which will put you at each found string, much as many IDEs position your cursor at a compile error. grep -r will do a recursive search, but it won't group the results by filename. Anyway, we have all kinds of miscellaneous text files containing parameters, input to program generators, and other strange stuff. I don't even know all their extensions.

swift -- like glimpse, but index generation is very, very slow.

Jeff believed that it was much easier to debug if the maximum amount of code was on the screen at once, so his code featured no whitespace except the ' ' character. I used indent(1) to fix his code up into something readable.

I've also used snavigator -- Source Navigator -- which tries to be an entire IDE; it may be buggy and is slow.
Very impressive -- it does subroutine cross-reference, finds where data is declared, et cetera. There's also lxr, a Linux cross-referencer at "http://www.lxr.org", which works with a 10-million-line C code project. It requires a web server, and you shouldn't try to set it up without some knowledge of Perl.

vim's "set scrollbind" binds a pair of windows together.

It's not too hard to make an assert fail in a program. It takes threads to make two asserts fail at the same time, thus starting up two distinct copies of the debugger.

Trust no one. Jeff taught me this, as he swore up and down that his program was giving me parameters in the correct range. Alas, he was not correct. I've had people call me up and ask me to remove the assert()s from my library routines because they couldn't get their code started.

ox_gen.pl and ox.pl find symbols in object files, so you don't have to figure out the #ifdefs in the source. These are available from my website.

Use asserts. Provide a "debug-me" module which catches an assert() and cranks up the debugger.

"Leave the code better than you find it. It's usually not that hard!"

Resources:
  glimpse           http://webglimpse.net
  indent            http://www.gnu.org
  vim               http://www.vim.org
  lxr               http://lxr.linux.no
  ox                http://www.oualline.com
  Source Navigator  http://sourcenav.sourceforge.net

*********************** End ************************

Code Reviews
Steve Oualline

I used the word "that" two or three too many times on each page of Practical C. My copy editor sent me a blue pencil, and I spent a long time removing 1500 "that"s from the page proofs of the book. I have never made that mistake again.

Checklists for code: quick, simple, often a large return on investment, and they can be surprisingly good. Before we can create a checklist, we need something to check: style sheets and programming rules. When we started doing checklists, I had to add a new category of bug because we were finding so many things I didn't expect. Checklist rules must be simple.
I aim for a checklist no more than two pages long. I think my current one is three. But you must make it simple enough to be easily manageable. Precedence, for example, is too complicated. I've been programming in C for years, and the only thing I know about precedence is that multiplication and division come before addition and subtraction. For everything else, I use parens.

A ground point is an essential measurement in software designed to exploit 3D satellite photos to build a single scene that an airplane can make a simulated flight through. How many ways can you spell ground point? ground_point, grnd_pt, gpt, grndPt, groundPoint . . . ?

Come up with rules to prevent errors. There should be no debate on them. "Comments should be clear" is a bad rule. "Every variable declaration must be followed by a comment" is a good rule.

Use coding templates, to ensure that everyone is doing things the same way.

Prefer snprintf (bounds-checked printf) to sprintf.

In San Diego we have a beach at the bottom of a cliff. Ask a programmer how to get to the beach and he'll say, "Just jump off the cliff." Why? It's the fastest way. C lets you do lots and lots of things. Don't do them all.

Consistency is so important in code. It's interesting that in writing you should not use the same patterns over and over, but in coding you must. I want boring code that works!

V3 -- no int declarations. Use "INT16", "INT32", et cetera. This is very significant for the embedded market, because the size of types matters there.

I like a word to mean just one thing. A guy in California asked for three different choices for a custom license plate: 'sail', 'sailing', and 'none'. Of course, he got 'none', so he now gets several hundred traffic tickets a month.

If a phone has a memory leak, it will be returned, and may have to be recalled. I work for Nokia, who can say, 'We will only sell 10 million of these phones...'
*96* Report metrics on checklist defects, so that you can improve your checklists and watch for sudden upswings in bugs not detected by static analysis due to bad procedure changes.

*97* I have seen a case where the defect rate spiked up after a procedure change made the static checking methods less effective.

*98* Explaining your program to a good programmer can help.

*99* Walkthroughs.

*100* Meetings can be hazardous. Provide code to be inspected ahead of time.

*102* The programmer is the one who's paid for her work, so she's the one who gets to decide whether something is really a defect which should be fixed.

*103* I went to MIT, where we had two Nobel Prize winners in physics: Millikan and Feynman. If either one of them was on your thesis committee, you were doomed. One guy by the luck of the draw got them _both_. We waited outside the thesis room on the 'death watch' for three hours -- the average thesis defense takes an hour. Finally the guy came out and said, "It was great. They'd ask just one question, then immediately start arguing with each other..."

*105* Data flow analysis is most useful for protocol stacks and memory problems. Memory is very significant on a mobile phone. If you ask for more of it, you'll be told, "That's 50 cents more per phone. If we do that on 10 million phones, that's 5 million dollars and our bottom line will go up in smoke, the economy of Finland will go into a tailspin, we'll go to war with the United States and it'll be World War 3, all because you wanted an extra few K of memory..."

Develop software to implement coding rules? Some software to do this is out there, but it may be better to do manual reviews; you can find errors which are not on the checklist that way.

A programmer shouldn't work on less than a procedure. If he doesn't understand that much of the code, he probably shouldn't be working on it. And if he does understand the procedure, he should fix the entire thing.
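A rule checker of the sort just mentioned can be mechanically simple. Here is a sketch, assuming just two rules drawn from the checklist discussion above (the rule wording, function names, and sample code are my own illustration):

```python
import re

# Two mechanical checklist rules: no bare "int" declarations (use
# INT16/INT32 so type sizes are explicit), and no unbounded sprintf.
RULES = [
    (re.compile(r"\bint\b"),     'use INT16/INT32 instead of bare "int"'),
    (re.compile(r"\bsprintf\b"), "use snprintf, which is bounds-checked"),
]

def check(lines):
    """Return (line_number, message) for every rule violation found."""
    hits = []
    for num, line in enumerate(lines, 1):
        for regex, message in RULES:
            if regex.search(line):
                hits.append((num, message))
    return hits

sample = [
    "INT32 total;",
    "int count;",
    'sprintf(buf, "%d", count);',
]
for num, msg in check(sample):
    print("line %d: %s" % (num, msg))
```

A checker like this can't find design errors, which is why the manual review stays; it just frees the reviewers from hunting for the mechanical stuff.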
lint is also a good tool, especially lclint. Many of the functions of the older lint have been subsumed into ANSI compilers.

filegrind -- does code reformatting?

*********************** End ************************

Digital Forensics
Brian Carrier
carrier@atstake.com
617-768-2756

Forensics: the acquisition and analysis of data to find evidence. It's used in both legal and corporate environments. Tools are used to:
  -- make a snapshot
  -- process the partition data
  -- preserve the partition data
  -- search the partition data

Acquiring an image -- use dd(1) to save an entire hard drive to a file:

  dd if=/dev/hda of=/mnt/hda1.dd

You can also use netcat (nc):

  dd if=/dev/hda | nc 10.0.0.1 9000

On the receiving machine:

  nc -l -p 9000 > hda1.dd

One worm was purely memory-based, and the trend is toward grabbing memory as well as disk, but tools for memory analysis are still very crude.

Acquisition: grab both allocated and unallocated space. Don't use backup tools; they may modify access times. Tom's Root Boot or some other trusted environment is useful for this.

Always verify your image first thing -- run "md5sum hda1.dd" or similar and write the result down on a piece of paper. Later (e.g. in court) you can demonstrate that you have the same file by re-running the checksum. Use chain-of-custody forms. And do analysis on a copy of the 'original' data.

Goals:
  Was the system compromised? How? Who?
  Did someone have bad stuff?
  Recover deleted files if necessary.

Difficulties:
  Large disk sizes (needle-in-haystack problem)
  Logging not turned on or misconfigured
  Unknown baseline (what is normal?)

Approaches:
  Timeline of system activity: chart inode modification/access/change times of files; integrate IDS alerts and log entries.
  Keyword searches for known data: filenames, docs, binaries.
  File and directory analysis: check startup scripts and log files; on Windows, the system registry.
  Use the NIST NSRL -- checksums of system binaries. Or use cron to run periodic checksums of /bin/* (a poor man's tripwire).
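The image-verification step described above is easy to script. A minimal sketch in Python (the chunked read is so a multi-gigabyte image needn't fit in memory; the filenames are illustrative):

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 of a file in 1 MB chunks -- the same digest
    that md5sum(1) prints."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()

# At acquisition time, write the digest down on paper:
#   recorded = md5_of("/mnt/hda1.dd")
# Before analysis (or in court), recompute on the working copy and compare:
#   assert md5_of(working_copy) == recorded, "image has changed!"
```

The same function, pointed at /bin/* from a cron job, gives you the poor man's tripwire mentioned above.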
Solaris has a fingerprint database, and Red Hat/SuSE/Mandrake RPMs contain checksum information as well.

Forensic examination software: ILEC is available only to law enforcement. Another brand will sell only to law enforcement and the Big 5. The dominant tool on Windows is EnCase. The license fee is $2500 per seat, and you are locked into the tool once you start using it, although recently they have started to publish their file formats. Testing closed-source forensics tools is done with random test cases, since there's no access to the source code.

Daubert -- legal guidelines for scientific evidence, which specify that science involves peer-reviewed journals, et cetera. Whether digital evidence is considered 'scientific' and falls under Daubert is still in play.

*15* debugfs allows you to view unallocated inodes in ext2 and ext3 filesystems.

*16* TASK and Autopsy: I wrote most of TASK and all of Autopsy. They're built on TCT (Dan Farmer's The Coroner's Toolkit) and TCTUTILS: 15 command-line tools that do specific things. Autopsy is an HTML interface to these tools.

*18* Layers:
  File system layer
  Content layer
  Metadata layer (inode layer)
  File name layer (also FAT)

These tools are independent of any drivers and can handle NTFS, FFS, FAT, EXT2FS, and EXT3FS.

Slide 21 didn't make it to the handout.

TASK includes a sorter perl script that calls the other TASK tools, then sorts files based on their file types. It identifies files with extension mismatches, e.g. executables ending in ".txt" or picture files ending in ".sh". It also looks up and throws out files which match a database of md5 checksums, allowing for data reduction.

*27* Because of its nature as an HTML tool, you can actually do live remote data analysis with Autopsy. This isn't recommended, of course, because it's awfully hard to prove later that you found what you say you found.

The demonstration involved finding the file "/dev/ptyp", which contained a list of names.
From there he demonstrated finding the string "/dev/ptyp" in the ps(1) binary, thus showing that ps had been compromised to omit certain running processes from its report. This is a common technique in rootkits.

*28* Other tools: foremost, developed by the Air Force Office of Special Investigations (USAF OSI), then released open source. Interesting that the government is releasing open-source tools now.

*29* Lazarus -- very slow, somewhat like foremost. It goes through a disk image and produces a separate 1024-byte file for each block on the disk. It's your problem to put them all back together, though.

Another tool, fatback, does FAT file recovery; it was written by the DoD, open-sourced, and runs on Linux.

Open-source digital forensics is still small and under-supported. My open-source tools have also been turned to the black-hat side, I've been told: a version exists which opens a disk read/write instead of read-only, so that you can hide evidence contained in deleted files by erasing the contents of deallocated inodes.

SMART -- from ASR Data. Runs on Linux and BeOS only; a closed-source, commercial, all-in-one tool.

Thus Endeth Linux Lunacy 2.