The Master Plan

Please start by reading the About page for this blog.

This post will be regularly updated, and remain at a very high level.  Detailed discussions will go into other posts, most of which will be linked to from this page.  Any link below beginning with “>” is to another post of this blog.


Dual boot with UEFI

It’s been a few years since I posted here, and during that time we obtained our first UEFI computers. I bought them from Dell with Windows 8 pre-installed, and when I tried to make them dual boot I ran into a million problems. I moved the laptop hard drive out of the laptop and onto my desktop several times because I couldn’t figure out how to boot USB under UEFI, and I screwed up the Windows file system and had to restore it from a backup image dozens of times. Once I figured out what to do, though, it wasn’t any harder than the old way, just different.

The problems I had mostly boil down to two:

  • My existing bootable USB sticks would not boot; I needed to make UEFI-bootable USB sticks.
  • For unknown reasons, if I shrank the NTFS Windows file system using the Linux tool ntfsresize, Windows could not boot it; I had to use Windows tools to shrink the Windows file system.

UEFI is a fairly new standard for computer manufacturers, a replacement for the BIOS standard. It no longer uses the first sector of the primary hard drive as a boot sector; instead, a dedicated partition of the hard drive (the EFI System Partition, which I call the UEFI partition below) contains boot information, and the disk must be partitioned using the GUID Partition Table (GPT) convention, which is different from the old “msdos” partition convention. You may read on the internet that you have to use parted because fdisk does not work with GPT. However, that is no longer true; recent releases of fdisk work fine with GPT. When you buy a UEFI computer, it will already have a GPT-partitioned disk with a UEFI partition. If the UEFI partition has some room left, you won’t have to mess with the partition; you can just add some files to it.

The whole UEFI boot procedure is different from the old BIOS boot procedure. It starts with some firmware on the computer, called the Boot Manager. The Boot Manager is aware of Boot Loaders on the UEFI partition. If there is more than one, it lets the user choose which one to boot. You can write to the firmware; from Linux, use the efibootmgr command. You will need this if you want to make Linux the default boot loader instead of Windows.

Here is the final procedure I worked out – I did this on a Dell Inspiron 15 3531; your mileage may vary.

  1. Boot Windows and see how much free space there is on your C: drive. Then go to shut it down, and hold down SHIFT as you click Restart. That will give you some debug options after the OS exits. Navigate until you get a shell prompt.
  2. Start the diskpart program, and issue the following commands:
    • LIST DISK
    • SELECT DISK 1
    • LIST VOLUME
    • SELECT VOLUME C
    • SHRINK DESIRED=200000

    Note: the SHRINK command tells how much to shrink by, not how large the resulting filesystem should be, and DESIRED is given in megabytes. So if there were a 300 GB Windows filesystem, and you issued SHRINK DESIRED=100000, it would shrink by about 100 GB and leave roughly a 200 GB Windows filesystem. You can’t shrink by more than the amount of free space on the volume.

  3. I did all the following with secure boot turned off; I don’t know if that’s necessary. Secure boot is a specialization of the UEFI boot process which only boots OSes that are digitally signed. To turn off secure boot, I had to go into the computer setup menus (what we used to call “the BIOS”) and find the correct option. On my Dell, I did that by powering on and hitting F2 several times until setup started, going to the Boot menu, setting “BOOT LIST OPTION” to “UEFI”, and disabling both “Secure Boot” and “Load Legacy Option Rom”.
  4. UEFI machines are supposed to be able to boot old BIOS-style disks if you set the appropriate “LEGACY” mode in the computer setup menus. However, on my computer (despite hours of effort) I never got that to work. So we need to get a UEFI-bootable CD or make a UEFI-bootable USB stick. Some recent LIVE CDs are UEFI-bootable (and also BIOS-bootable), but some are not; look for an EFI subdirectory in the CD’s root directory. To make the USB stick, find a UEFI-bootable ISO and, on some other computer running Linux, run ‘yum install livecd-tools’ and then: livecd-iso-to-disk --format --reset-mbr --efi %PATH_TO_ISO% /dev/%USB_DEVICE%
  5. Boot the target machine to a linux LIVE disk and run the installer.

This will create a boot loader for Linux on the EFI partition, without erasing the Windows boot loader. Note that grub configuration and extra images now live in the EFI partition filesystem rather than in /boot/grub of the Linux filesystem. The next time you boot, you will be given a choice of the two boot loaders, Windows or Linux. If you want to make Linux the default, read up on the efibootmgr linux command (or you may be able to do it from your computer’s setup menus).
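As a rough illustration of what changing the default involves, here is a minimal Python sketch that reads efibootmgr-style output and builds the command that would put the Linux entry first. The sample output, the entry labels, and the helper names are made up for illustration, and real output on your machine will look different. The sketch only builds the `efibootmgr -o` command (which sets the boot order); it does not run it.

```python
import re

# Hypothetical sample of what `efibootmgr` prints; real output will differ.
SAMPLE_OUTPUT = """\
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0000,0001
Boot0000* Windows Boot Manager
Boot0001* Fedora
"""

# Matches entry lines such as "Boot0001* Fedora" (but not "BootOrder: ...").
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})\*?\s+(.*)$")


def parse_efibootmgr(text):
    """Return (boot_order, {entry_number: label}) from efibootmgr-style text."""
    order, labels = [], {}
    for line in text.splitlines():
        if line.startswith("BootOrder:"):
            numbers = line.split(":", 1)[1].split(",")
            order = [n.strip() for n in numbers if n.strip()]
            continue
        match = ENTRY_RE.match(line)
        if match:
            # Some versions append a device path after a tab; keep the label only.
            labels[match.group(1)] = match.group(2).split("\t")[0].strip()
    return order, labels


def prefer(order, labels, wanted):
    """Move the first entry whose label contains `wanted` to the front."""
    for number in order:
        if wanted.lower() in labels.get(number, "").lower():
            return [number] + [n for n in order if n != number]
    raise LookupError(f"no boot entry matching {wanted!r}")


def boot_order_command(new_order):
    """Build (but do not run) the command that would set the new boot order."""
    return ["efibootmgr", "-o", ",".join(new_order)]


if __name__ == "__main__":
    order, labels = parse_efibootmgr(SAMPLE_OUTPUT)
    print(" ".join(boot_order_command(prefer(order, labels, "fedora"))))
```

On a real machine you would feed the function the actual output of efibootmgr and run the resulting command as root, after checking that the entry it picked is really the one you want.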

Fedora 14 and ClearOS LDAP

I had a hard time getting the client machines to allow LDAP logins.  The solution I finally came up with is not ideal, because the communication is taking place in the clear.  If my students were better hackers, they might be able to steal passwords, including passwords that give administrator access to the server.

In the process, I learned a lot about how login and some other processes work on Linux.

In the beginning, the only way to define a user was in the /etc/passwd file of the local hard drive.  Now there are lots of ways.  Linux has abstracted the process into an /etc/nsswitch.conf file.  (NSS stands for Name Service Switch.) Using that file, you can tell the system what to do when it is confronted with a login request.  For example, your recipe may be to first check the local files (i.e. /etc/passwd); and if that fails to check LDAP.  People with new user directory schemes can write NSS plugins to implement their scheme.

There is a getent command which will tell you what wound up being the value of an entity after taking nsswitch.conf into consideration.  For example, if you type ‘getent passwd rnolty’ it will return a passwd-file-like entry for user rnolty.  (Don’t worry, it doesn’t actually contain the user’s password.  Neither has the /etc/passwd file for the past 20 years or so….)  Maybe that entry comes out of the local /etc/passwd file, or maybe it comes from LDAP.  getent doesn’t tell you where the info comes from; but if ‘getent passwd username’ returns nothing, then username will not be able to log on.
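The same lookup can be made from a program. As a minimal sketch (the helper name is my own), Python’s standard pwd module calls the system’s getpwnam, which goes through the same Name Service Switch machinery as getent, so the answer may come from /etc/passwd or from whatever other sources nsswitch.conf lists.

```python
import pwd


def lookup_user(name):
    """Return a passwd-style line for `name`, or None if no source knows the user."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        # Same situation as `getent passwd name` printing nothing.
        return None
    fields = [
        entry.pw_name,
        entry.pw_passwd,      # normally just "x", not a real password
        str(entry.pw_uid),
        str(entry.pw_gid),
        entry.pw_gecos,
        entry.pw_dir,
        entry.pw_shell,
    ]
    return ":".join(fields)


if __name__ == "__main__":
    print(lookup_user("root"))
```

Like getent, this does not say which source supplied the entry; a result of None means that user will not be able to log on.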

Rather than using LDAP directly, I wanted to use it under SSS (System Security Services; see my description here).  SSS (among many other things) caches LDAP credentials locally, so that after a user has logged on once, that user can log on again in the future even if the machine is unable to connect to the LDAP server.  So the relevant lines in nsswitch.conf were:

passwd: files sss

shadow: files sss

group: files sss
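To make the meaning of those lines concrete, here is a small Python sketch (function name and sample text are my own) that reads nsswitch.conf-style text and returns, for each database, the sources in the order they will be tried. It ignores comments and bracketed actions such as [NOTFOUND=return], so it is a simplification of what the real resolver does.

```python
import re

# The three lines from above, as they would appear in /etc/nsswitch.conf.
SAMPLE = """\
# user and group lookups
passwd: files sss
shadow: files sss
group:  files sss
"""


def parse_nsswitch(text):
    """Map each database name to its list of sources, in lookup order."""
    result = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line or ":" not in line:
            continue
        database, sources = line.split(":", 1)
        # Remove action items like [NOTFOUND=return]; keep only source names.
        sources = re.sub(r"\[[^\]]*\]", " ", sources)
        result[database.strip()] = sources.split()
    return result


if __name__ == "__main__":
    for database, sources in parse_nsswitch(SAMPLE).items():
        print(database, "->", ", then ".join(sources))
```

With the lines above, every lookup tries the local files first and falls through to SSS only if the local files have no answer.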

I also discovered the system-config-auth program, which was reachable by menus under System/Administration/Authorization.  When I told it I wanted to use ldap, it set it up to be under SSS.  (SSS was developed by Fedora people, and has been installed by default in recent releases of Fedora.)

The settings that allowed LDAP logins (although over unencrypted connection) looked like this:

You can achieve the same thing from the command line by typing:

# authconfig --enablelocauthorize --enableldap --enableldapauth --ldapserver=ldaps://192.168.3.1/ --passalgo=sha512 --ldapbasedn=dc=gateway,dc=paja --enablecachecreds --enablemkhomedir --nostart --updateall

Running system-config-auth rewrites both your /etc/nsswitch.conf and your /etc/sssd/sssd.conf files (and, I forget, maybe some /etc/pam.d files).

ClearOS LDAP makes the main group for every user the allusers group.  However, I had also defined a teachers group and a students group on ClearOS; so a user was either a member of allusers and teachers, or a member of allusers and students.  But after running system-config-auth, the clients did not know who was in the teachers or students group.  If I ran getent on the client, I got

# getent group teachers

teachers:x:63000:

That is, the teachers group had no members.  To fix this, I had to add the following to /etc/sssd/sssd.conf in the default stanza:

ldap_schema = rfc2307bis

This tells SSS what standard this flavor of LDAP is using to store the member information in the group directory entry.  After that getent returned:

teachers:x:63000:teacher1,teacher2,teacher3
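My understanding of why the setting matters: under plain RFC 2307, a group entry lists its members by login name in memberUid attributes, while under RFC 2307bis it lists them by full DN in member attributes. A client looking for the wrong attribute sees an empty group. The Python sketch below illustrates that difference on a made-up entry. The DN layout (ou=Users and so on) is hypothetical, and taking the first component of the DN as the login name is a shortcut; the real SSS resolves the member entries properly.

```python
# A made-up group entry in the RFC 2307bis style: members stored as DNs.
TEACHERS = {
    "cn": ["teachers"],
    "gidNumber": ["63000"],
    "member": [
        "cn=teacher1,ou=Users,dc=gateway,dc=paja",   # hypothetical DN layout
        "cn=teacher2,ou=Users,dc=gateway,dc=paja",
        "cn=teacher3,ou=Users,dc=gateway,dc=paja",
    ],
}


def group_members(entry, schema):
    """Return member names as a client using the given schema would see them."""
    if schema == "rfc2307":
        # Members are plain login names in memberUid.
        return list(entry.get("memberUid", []))
    if schema == "rfc2307bis":
        # Members are DNs in member; take the value of the first component.
        names = []
        for dn in entry.get("member", []):
            first_component = dn.split(",", 1)[0]
            _, _, value = first_component.partition("=")
            names.append(value.strip())
        return names
    raise ValueError(f"unknown schema: {schema}")


def getent_style(entry, schema):
    """Format the entry the way `getent group` prints it."""
    members = ",".join(group_members(entry, schema))
    return f"{entry['cn'][0]}:x:{entry['gidNumber'][0]}:{members}"


if __name__ == "__main__":
    print(getent_style(TEACHERS, "rfc2307"))      # empty member list
    print(getent_style(TEACHERS, "rfc2307bis"))   # members filled in
```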

For LDAP login to work, obviously the client has to connect to the network before the first user logs in.  As far as I can make out, the ability to connect to a wireless AP is controlled by gnome.  When you use the NetworkManager applet to connect to a wireless AP and type in the wireless key, by default gnome stores that on a per-user basis.  I guess this makes sense — you might want to have some accounts that connect to the network and others that are not allowed to.  The problem is it doesn’t know how to connect to the wireless until after you  log in — big problem if you need a network connection to log in!  However, in the NetworkManager dialog there was a checkbox that said, “Make this connection available to all users.”  If I checked that, next time I booted it would go ahead and connect to the wireless even before anyone logged in.

NOTES ON DEBUGGING THE SETUP

After I first used system-config-auth to set up LDAP, I still could not log in LDAP users or see them in getent.

There is an ldapsearch command, but it works with openldap, not with the SSS implementation of LDAP.  So I made sure that /etc/openldap/*.conf files were essentially consistent with /etc/sssd/sssd.conf with regard to LDAP.  Then I could run commands like

$ ldapsearch -x '(cn=rnolty)' uid

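to see the user ID number for user rnolty in the LDAP directory.  I couldn’t get it to work without the -x.  Since -x selects simple authentication instead of SASL, that tells me SASL was not set up; by itself it does not say anything about whether the TLS certificate stuff is working.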

I did a lot of strace-ing to verify what was really going on under the hood.  I did ‘strace getent passwd rnolty’.  I saw the program read through /etc/passwd (but I knew rnolty wasn’t in /etc/passwd) and then send a request down a named pipe to the SSS daemon.  Actually, there are three SSS daemons: one to handle requests from NSS applications, one to handle PAM authentication requests, and one to talk to the LDAP server.  After a little while, a response came back from SSS over the pipe, and then getent printed its output.  I then used ‘strace -p <pid>’ to connect to the already-running sssd_nss process, while I did the getent command in another window.  I saw the (presumptive) getent process connect and send some data in; then sssd_nss sent a request up a socket to the sssd_be process.  After a bit a response came back from sssd_be, and sssd_nss sent a reply to getent.  Finally, I used strace -p to connect to the running sssd_be process.  I saw it get a request, presumably from sssd_nss, and make a TCP connection to the LDAP server.  The response from the LDAP server came trickling back in over several read calls, and then sssd_be sent a response to sssd_nss.  Surprisingly, even all of this spying gave me no clue why things weren’t working when they weren’t working.

Note that, because SSS caches credentials locally, if you want to force it to go to the server for a user that has logged in in the past, you have to clear the cache.  So I did a lot of

# service sssd stop

# rm /var/lib/sss/db/cache_default.ldb

# service sssd start

Another thing I tried was, on the ClearOS server, stopping slapd (the LDAP daemon), and starting it on the command line with high debug levels.  But that didn’t help either.

Finally, about the 10th time I ran system-config-auth on the client, it reported to me that two of my parameters were incompatible with each other.  The final system-config-auth that works is in the screenshot above.

ClearOS configuration

ClearOS is a variant of Red Hat Enterprise Linux optimized to serve as a gateway, sitting between your LAN and your internet connection.

I installed a second ethernet card in the box, and during install configured the box as a gateway.  eth0 is connected to my DSL modem, while eth1 is connected to the LAN.

After installation,  ClearOS boots into a graphical shell called ratpoison, in which every window is fullscreen and you switch between windows by clicking tabs at the top of the screen.  The initial window is running a version of firefox that shows you the screen, but has no menu, addressbar, etc.  So all you can do is click around.  It is initially visiting localhost:82, a web server process called ‘webconfig’, which gives you access to all the most-commonly-used configuration parameters for your ClearOS box.  Some links open a new window, which in ratpoison means a new tab appears at the top; get back to the original window by clicking the leftmost tab.

There are a couple of things below that cannot be set by webconfig, and must be done from the command line.  To do this, either (a) on the ClearOS machine, type Ctrl-Alt-F2 to get a login prompt, and log in as root.  (If you want to see webconfig again, type Ctrl-Alt-F7.)  OR (b) from a client machine, SSH into the ClearOS machine as root.  With DNS configured as below, from a Linux client command line, I can type ‘ssh root@gateway’.

The first thing to do is to use webconfig to get your IP Settings right.  I made eth0 have the External role, and eth1 the LAN role.  I was flummoxed for a day or two by a very weird problem — my AT&T DSL modem will set up its DHCP client to be on 192.168.1.64; if I specified a static IP address on eth0 that was different (even though it was on 192.168.1) the DSL modem refused to talk to it.  So I just let eth0 be configured by DHCP.  For eth1, I set a static IP of 192.168.3.1.

The LAN port should run a DHCP server for all my clients to use.  So on the DHCP Serving settings page, I made sure it was running.  I enabled it only for eth1, and set an IP range from 192.168.3.100 to 192.168.3.250.  The last thing I did in the Network family of settings was on the Local DNS Server page, where I added to the 192.168.3.1 record — I added hostnames of gateway and wpad (more on wpad later).  At this point, I could boot a computer on the LAN, and it would get configured and have internet access through the ClearOS box.  Also, the computer ‘gateway’ was defined in DNS to point to the ClearOS box.  One very weird thing is that if I use a client computer to access webconfig, if I just type ‘gateway’ (or, equivalently, http://gateway/) into a browser, it inexplicably redirects me to an error page telling me to fix my proxy settings (even if they are already correct).  To access webconfig from a client machine, you must type ‘gateway/admin’.
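A quick way to sanity-check numbers like these is with Python’s ipaddress module. The sketch below is only an illustration: the /24 netmask is my assumption (the settings above do not state one), and the function name is made up. It checks that the gateway and the pool are on the same network, that the gateway’s static address is not inside the pool, and counts how many leases the pool can hand out.

```python
import ipaddress


def check_dhcp_pool(network, gateway, first, last):
    """Return (pool_size, problems) for a DHCP pool on the given network."""
    net = ipaddress.ip_network(network)
    gw = ipaddress.ip_address(gateway)
    lo = ipaddress.ip_address(first)
    hi = ipaddress.ip_address(last)

    problems = []
    if lo > hi:
        problems.append("pool start is above pool end")
    for label, addr in (("gateway", gw), ("pool start", lo), ("pool end", hi)):
        if addr not in net:
            problems.append(f"{label} {addr} is outside {net}")
    if lo <= gw <= hi:
        problems.append("gateway address is inside the pool")

    size = int(hi) - int(lo) + 1 if lo <= hi else 0
    return size, problems


if __name__ == "__main__":
    # 192.168.3.0/24 is an assumed netmask for the LAN side (eth1).
    size, problems = check_dhcp_pool(
        "192.168.3.0/24", "192.168.3.1", "192.168.3.100", "192.168.3.250"
    )
    print(size, "addresses in the pool;", problems or "no problems found")
```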

I added some users and groups under the Directory family of settings.

The first thing I wanted to do was get content filtering working.  This was a several-step process.  Under the Gateway family of settings, I went to the Web Proxy page.  I made it Running and Automatic.  I first enabled Transparent Mode, and ensured that my clients could access the web.  What is happening here is that when any outgoing request for Port 80 comes into ClearOS, ClearOS routes the request through its local web proxy (a program called squid).  The only value of this is that if several clients request the same resource (say a YouTube video), the video will only come through the DSL modem once and be cached in squid.

To get content filtering working, I had to leave transparent mode (i.e. disable it).  I enabled Content Filter and User Authentication.  On the Content Filter page, I made sure it was Running and Automatic.  For the Default filter group I left all config as it was.

When using a proxy but not in transparent mode, you have to tell the client browsers to explicitly use the proxy.  There are a few ways to do this. If I want to use Automatic Configuration, I have to set up wpad.  I have already defined the wpad address in DNS (above).  Now I need to make that machine serve a wpad.dat file.  From the ClearOS command line, I use an editor such as ‘vi’ to create the file /var/webconfig/htdocs/wpad.dat, and in that file I type

function FindProxyForURL(url, host)
{
    return "PROXY 192.168.3.1:8080";
}
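The string returned by FindProxyForURL is a semicolon-separated list of directives such as PROXY host:port or DIRECT, which the browser tries in order. As a small illustration of that format (the parser and its name are my own, and it handles only the PROXY, SOCKS and DIRECT forms), here is a Python sketch that takes such a string apart:

```python
def parse_pac_result(result):
    """Split a FindProxyForURL return string into (kind, host, port) tuples."""
    routes = []
    for part in result.split(";"):
        fields = part.split()
        if not fields:
            continue                      # tolerate a trailing semicolon
        kind = fields[0].upper()
        if kind == "DIRECT" and len(fields) == 1:
            routes.append(("DIRECT", None, None))
        elif kind in ("PROXY", "SOCKS") and len(fields) == 2:
            host, _, port = fields[1].rpartition(":")
            if not host or not port.isdigit():
                raise ValueError(f"expected host:port, got {fields[1]!r}")
            routes.append((kind, host, int(port)))
        else:
            raise ValueError(f"unrecognized directive: {part.strip()!r}")
    return routes


if __name__ == "__main__":
    # The value returned by the wpad.dat file above.
    print(parse_pac_result("PROXY 192.168.3.1:8080"))
```

The file above always sends traffic to the one proxy; a string like "PROXY 192.168.3.1:8080; DIRECT" would let the browser fall back to a direct connection, which is probably not what you want when the point is filtering.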

On a Windows client, go to Internet Options->Connections->LAN Settings.  Because wpad (Web Proxy Auto-Discovery) is defined in DNS, you can just click “Automatically detect settings”.   When the browser first tries to access a page on the internet, it will download the wpad.dat file from the wpad computer and that will set up its proxy settings. On Linux, you can either set the proxy options globally (for example, under GNOME, right-click on the network icon in the notification bar, choose Network Settings, and in the popup click on Network Proxy) or set it specifically in the browser.  In Firefox, go to Edit-> Preferences-> Advanced -> Network -> Connection Settings; you can choose “Use system proxy settings” if you have set it system-wide, or just click “Auto-detect proxy settings for this network.”  Chrome is supposed to always use the system settings, though on my client that did not work.  The only thing I found that worked was to launch Chrome from the command line with ‘chromium-browser --set-proxy=gateway:8080’ (and even that only worked part of the time :-/)

The proxy settings in effect (when Content Filtering is turned on) will be 192.168.3.1:8080 (or, equivalently, gateway:8080).  Now, if you want to visit google.com, the web browser does not contact google.com directly; instead, it sends a message to gateway:8080 asking the proxy to give it the google.com page.  Because I turned on User Authentication in the Web Proxy settings on the ClearOS box, the first time the browser requests a page from the proxy, the proxy requires the user to log in.  At that point, the user must enter a username and password as defined in the Directory section of ClearOS settings.  Now, if I type ‘nude-girls.com’ into the browser, I get a page from DansGuardian (the name of the content filter being used by ClearOS) saying that the page is denied.  (Irony: I had to turn off “Block IP Domains” in the content filter, because when the content filter denies a page, it redirects to 192.168.3.1:82, which gets further denied because it is an IP address, so you wind up with an almost-infinite loop of denial messages, and the one that finally appears doesn’t tell you anything about what the original denial was about.)

After using the proxy for several days in my everyday computing, I’ve found that my web browser works very reliably; I have been able to visit every page on the internet that I legitimately want, without any spurious content filter denials.  (One super-ironic exception: one of the help pages at clearfoundation.org about content filtering is denied by the content filter!)  However, other internet-enabled programs may have trouble because they do not know about proxying.  For example, whenever I run yum on a Linux box to download software, I see lots of failure messages trying to fetch the software from the internet, though in the end it always succeeds (I think because it eventually tries an FTP mirror).  Bottom line: web proxy is a pain, but I don’t know a better way.
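For programs you write yourself, the proxy has to be named explicitly, just as in the browser. As a minimal sketch, this is how a Python script could be pointed at the proxy using the standard urllib module. The helper names are my own, and the sketch leaves out the username and password that the proxy will demand when User Authentication is turned on, so against my setup an unauthenticated request would be refused.

```python
import urllib.request


def proxy_map(host, port):
    """Build the scheme-to-proxy mapping that urllib's ProxyHandler expects."""
    return {"http": f"http://{host}:{port}"}


def make_opener(host="gateway", port=8080):
    """Return an opener that sends plain HTTP requests through the proxy."""
    handler = urllib.request.ProxyHandler(proxy_map(host, port))
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = make_opener()
    # opener.open("http://example.com/") would now go via gateway:8080;
    # it is not called here because it needs the proxy to be reachable.
    print(proxy_map("gateway", 8080))
```

Many command-line tools read the same information from the http_proxy environment variable instead, which may be an easier route than configuring each program separately.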

Next I turned attention to the Directory service (LDAP) which allows logon and password information to be stored on the ClearOS machine, and used to log on to any of the client machines.  Under the Directory family of settings, the Domain and LDAP page, I set the domain to gateway.paja (paja is the abbreviation for my school, the Peace and Justice Academy).  Inexplicably, the ldap service on ClearOS is bound only to 127.0.0.1; from a client machine I can run “telnet gateway 389” and get “connection refused”.  I can change this by editing /etc/sysconfig/ldap on the ClearOS machine.  On the ClearOS command line (see above), I use an editor such as ‘vi’ to change the last line of /etc/sysconfig/ldap from “BIND_POLICY=localhost” to “BIND_POLICY=lan”.  Then on the command line I do ‘service ldap restart’ and now LDAP communication with the client can occur.
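The telnet test above just checks whether anything accepts TCP connections on the LDAP port. If telnet is not installed on the client, the same check can be sketched in a few lines of Python (the function name is my own; 389 is the standard unencrypted LDAP port used above):

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unresolvable host names.
        return False


if __name__ == "__main__":
    # Expect False while slapd is bound only to 127.0.0.1, True after the fix.
    print(port_is_open("gateway", 389))
```

This only shows that the port is reachable; it says nothing about whether the directory will answer queries.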

On the client machines, LDAP configuration is detailed in a separate post.

The eHub

The administrators really want an e-hub to centrally manage information like assignments and grades.  For example, using the e-hub, a teacher could create an assignment for a class.  Initially, the status of all students for that assignment is incomplete.  Students, parents, and teachers or volunteers in Practice and Review periods could access the list of incomplete assignments for a student.

Disk Freezing

Disk Freezing means that the file system will roll back to a fixed state periodically (for example, on every reboot), forgetting every change that was made since the last rollback.  Thus, if students have obtained a virus, or installed a program, it will soon be forgotten.

Internet

We’ve been skating by without any internet filtering, but that’s got to change.

>>ClearOS is intriguing to me — a free, open source OS designed to serve as a gateway/firewall.  I don’t yet understand the difference between the free apps and the paid services.


The Student Machines

Clearly the first decision is the OS to install on the student laptops.  Last year we had Windows 7.  It seems to me most of the major headaches I had last year would be solved if we used linux.  But of course Windows has a lot of advantages too.


The System Server

As a linux guy, last year I found administering the school’s Windows Server 2003 much less painful than administering the Windows laptops.  So I’m not anti-Windows for the server.  However, I’m leaning toward linux for several reasons:

Read the rest of this entry