The Web on the Console

On 2010年12月13日, in tips, by netoearth

Most people think “graphical interfaces” when they think of surfing the Web. And, under X11, there are lots of great programs, like Firefox or Chrome. But, the console isn’t the wasteland it might seem. Lots of utilities are available for surfing the Web and also for downloading or uploading content.

Let’s say you want to surf the Web and find some content. The first utility to look at is also one of the oldest, the venerable Lynx. Lynx actually was my first Web browser, running on a machine that couldn’t handle X11. In its most basic form, you simply run it on the command line and give it a filename or a URL. So, if you wanted to hit Google, you would run:

lynx http://www.google.com

Lynx then asks you whether you want to accept a cookie Google is trying to set. Once you either accept or reject the cookie, Lynx loads the Web page and renders it. As you will no doubt notice, there are no images. But, all the links and the text box for entering search queries are there. You can navigate from link to link with the arrow keys. Because the layout is very simple and text-based, items are in very different locations on the screen from what you would see when using a graphical browser.

Several options to Lynx might be handy to know. You can hand in more than one URL when you launch Lynx. Lynx adds all of those URLs to the history of your session and renders the last URL and displays it. When you tested loading Google above, Lynx asked about whether or not to accept a cookie. Most sites these days use cookies, so you may not want to hear about every cookie. Use the option -accept_all_cookies to avoid those warning messages. You can use Lynx to process Web pages into a readable form with the option -dump, which takes the rendered output from Lynx and writes it to standard out. This way, you can process Web pages to a readable format and dump them into a file for later viewing. You can choose what kind of key mapping to use with the options -vikeys or -emacskeys, so shortcut keys will match your editor of choice.

Lynx does have a few issues. It has a hard time with HTML table rendering, and it doesn’t handle frames. So, let’s look at the Links browser. Links not only works in text mode on the command line, but it also can be compiled to use a graphics display. The graphics systems supported include X11, SVGA and framebuffer. You can select one of these graphics interfaces with the option -g. Links also can write the rendered Web pages to standard output with the -dump option. If you need to use a proxy, tell Links which to use with the option -http-proxy host:port. Links also is able to deal with buggy Web servers. Several Web servers claim to be compliant with a particular HTTP version but aren’t. To compensate for this, use the -http-bugs.* options. For example, -http-bugs.http10 1 forces Links to use HTTP 1.0, even when a server claims to support HTTP 1.1.

If you are looking for a strictly text replacement for the venerable Lynx, there is ELinks. ELinks supports colors, table rendering, frames, background downloading and tabbed browsing. One possibly useful option is -anonymous 1. This option disables local file browsing and downloads, among other things. Another interesting option is -lookup. When you use this, ELinks prints out all the resolved IP addresses for a given domain name.

Now that you can look at Web content from the command line, how can you interact with the Web? What I really mean is, how do you upload and download from the Web? Say you want an off-line copy of some content from the Web, so you can read it at your leisure by the lake where you don’t have Internet access. You can use curl to do that. curl can transfer data to or from a server on the Internet using HTTP, FTP, SFTP and even LDAP. It can do things like HTTP POST, SSL connections and cookies. You can specify form name/value pairs so that the Web server thinks you are submitting a form by using the option -F name=value. One really interesting option is the ability to use multiple URLs through ranges. For example, you can specify multiple hosts with:

curl http://site.{one,two,three}.com

which hits all three sites. You can go through alphanumeric ranges with square brackets. The command:

curl http://www.site.com/text[1-10].html

downloads the files text1.html to text10.html.

What if you want a copy of an entire site for off-line browsing? The wget tool can help here. In this case, you likely will want to use the command:

wget -k -r -p http://www.site.com

The -r option recurses through the site’s links starting at http://www.site.com/index.html. The -k option rewrites the downloaded files so that links from page to page are all relative, allowing you to navigate correctly through the downloaded pages. The -p option downloads all extra content on the page, such as images. This way, you can get a mirror of a site on your desktop. wget also handles proxies, cookies and HTTP authentication, along with many other conditions.

If you’re uploading content to the Web, use wput. wput pushes content up using FTP, with an interface like wget.

Now you should be able to interact with the Internet without ever having to use a graphical interface—yet another reason to keep you on the command line.

Tagged with:  

Comments are closed.