Understanding HTML
HTML is a markup language used to describe the structure of a web page.
Consider a snippet of text with no markup:
HTML HyperText Markup Language (HTML) is a markup language used to describe the structure of a web page. We can use it to differentiate such content as headings lists links images Want to https://www.packtpub.com/web-development Learn more about web development.
The preceding snippet of text makes some sense. It may also raise some questions. Why does the snippet begin with the word HTML? Why is there a URL in the middle of a sentence? Is this one paragraph?
Using HTML, we can differentiate several bits of content to give them greater meaning. We could mark the word HTML as a heading, <h1>HTML</h1>, or we could mark a link to another web page using <a href="https://www.packtpub.com/web-development">Learn more about web development</a>.
There have been several versions of HTML since its first release in 1993 at the beginning of the web. Throughout the rest of this chapter, and indeed the rest of this book, we will be looking at and working with the current version of the HTML language, HTML5, which is the 5th major version of HTML. When we use the term HTML, we will refer specifically to HTML5 and if we need to talk about a different version we will do so explicitly (e.g., HTML 4.01).
In the next section, we will look at the syntax of HTML in more detail.
Syntax
The syntax of HTML is made up of tags (with angle brackets, <>) and attributes. HTML provides a set of tags that can be used to mark the beginning and end of a bit of content. The opening tag, closing tag, and all content within those bounds represent an HTML element. The following figures show the HTML element representation without and with tag attributes respectively:
Figure 1.4: HTML element representation without tag attributes
Figure 1.5: HTML element representation with tag attributes
A tag has a name (for instance, p, img, or h1), and that name combined with attributes will describe how the browser should handle the content. Many elements have a start tag and an end tag with some content in between, but some elements don’t expect any content, and these can be self-closing.
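As a rough illustration of the difference, the sketch below places an element with content next to two elements that take no content. The text and the photo.png file name are placeholders invented for this example.

```html
<!-- An element with a start tag, text content, and an end tag -->
<p>This paragraph has content between its tags.</p>

<!-- Elements that expect no content are written without an end tag -->
<br>
<img src="photo.png" alt="A placeholder photo">

<!-- A trailing slash is also accepted on these elements and is sometimes used to show self-closing -->
<br />
```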
An opening tag can have any number of attributes associated with it. These are modifiers of the element. An attribute is a name-value pair. For example, href="https://www.packtpub.com/web-development" is an attribute with the name of href and the value of https://www.packtpub.com/web-development. An href attribute represents a hypertext reference or a URL, and when this attribute is added to an anchor element, <a>, it creates a hyperlink that the user can click in the browser to navigate to that URL.
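To make the name-value idea concrete, here is a small sketch of the same anchor with one attribute and then with several. The title text is an assumption made up for this example; href, title, target, and rel are standard attributes of the anchor element.

```html
<!-- One attribute: href gives the link its destination -->
<a href="https://www.packtpub.com/web-development">Learn more about web development</a>

<!-- Several attributes on one start tag, separated by spaces -->
<a href="https://www.packtpub.com/web-development"
   title="Web development books (hypothetical title text)"
   target="_blank"
   rel="noopener">
  Learn more about web development
</a>
```

Attributes only ever appear on the start tag; the end tag, </a>, carries none.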
To provide information within an HTML document that the browser will not render and that is not shown to the end user, you can add comments. These are useful for notes and documentation to aid anyone who might read or amend the source of the HTML document. A comment begins with <!-- and ends with -->. Comments, in HTML, can be single or multiline. The following are some examples:
<!-- Comment on a single line -->
<!--
  This comment is over multiple lines.
  Comments can be used to inform and for detailed documentation.
-->
You can use comments to provide helpful hints to other developers working on the web page, but the browser will not display them when rendering the page.
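The following sketch shows two common uses of comments: leaving a note for another developer and temporarily hiding a piece of markup. The heading and paragraph text are placeholders invented for this example.

```html
<!-- TODO: replace this placeholder heading once the copy is agreed -->
<h1>Placeholder heading</h1>

<!--
  The paragraph below is commented out, so the browser will not display it:
  <p>This paragraph is hidden from the rendered page.</p>
-->
<p>This paragraph is displayed as normal.</p>
```

Comments remain visible to anyone who views the page source, so they are not a place for anything confidential.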
Let’s see what the previous snippet of text content looks like when it is given some meaning with HTML:
<h1>HTML</h1>
<p>
  HyperText Markup Language (HTML) is a markup language used to describe the structure of a web page.
</p>
<p>
  We can use it to differentiate such content as:
</p>
<ul>
  <li>headings</li>
  <li>lists</li>
  <li>links</li>
  <li>images</li>
</ul>
<p>
  Want to <a href="https://www.packtpub.com/web-development">learn more about web development?</a>
</p>
If we were to look at this HTML code rendered in a browser, it would look like the following figure:
Figure 1.6: HTML rendered in the Google Chrome web browser
The first line shows the HTML text content with a start tag, <h1>, and an end tag, </h1>. This tells the browser to treat the text content as an h1 heading element.
The next line of our code snippet has a <p> start tag, which means the content until the corresponding end tag, </p> (on the last line), will be treated as a paragraph element. We then have another paragraph and then an unordered list element that starts with the <ul> start tag and ends with the </ul> end tag. The unordered list has four child elements, which are all list item elements (from the <li> start tag to the </li> end tag).
The last element in the example is another paragraph element, which combines text content and an anchor element. The anchor element, starting from the <a> start tag and ending at the </a> end tag, has the learn more about web development? text content and an href attribute. The href attribute turns the anchor element into a hyperlink, which a user can click to navigate to the URL given as the value of the href attribute.
As with our example, the contents of a paragraph element might be text but can also be other HTML elements, such as an anchor tag, <a>. The relationship between the anchor and paragraph elements is a parent-child relationship.
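The parent-child relationship can be read directly from how the tags nest. The sketch below annotates a variation of the last paragraph of the example; the <strong> element is an addition made here purely to show a second level of nesting.

```html
<!-- The <p> element is the parent; its text and the <a> element are its children -->
<p>
  Want to
  <a href="https://www.packtpub.com/web-development">
    <!-- The <strong> element is a child of <a> and a descendant of <p> -->
    learn <strong>more</strong> about web development?
  </a>
</p>
```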
HTML elements
HTML5 defines more than a hundred tags that we can use to mark up parts of an HTML document. These include the following:
- The document root element: <html>
- Metadata elements: <base>, <head>, <link>, <meta>, <style>, and <title>
- Content sectioning elements: <address>, <article>, <aside>, <body>, <footer>, <header>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <main>, <nav>, and <section>
- Block text elements: <blockquote>, <dd>, <details>, <dialog>, <div>, <dl>, <dt>, <figcaption>, <figure>, <hr>, <li>, <menu>, <ol>, <p>, <pre>, <summary>, and <ul>
- Inline text elements: <a>, <abbr>, <b>, <bdi>, <bdo>, <br>, <cite>, <code>, <data>, <dfn>, <em>, <i>, <kbd>, <mark>, <q>, <rp>, <rt>, <ruby>, <s>, <samp>, <small>, <span>, <strong>, <sub>, <sup>, <del>, <ins>, <time>, <u>, <var>, and <wbr>
- Media elements: <area>, <audio>, <img>, <canvas>, <map>, <track>, <video>, <embed>, <iframe>, <object>, <picture>, <portal>, <source>, <svg>, and <math>
- Scripting elements: <noscript> and <script>
- Table elements: <caption>, <col>, <colgroup>, <table>, <tbody>, <td>, <tfoot>, <th>, <thead>, and <tr>
- Form elements: <button>, <datalist>, <fieldset>, <form>, <input>, <label>, <legend>, <meter>, <optgroup>, <option>, <output>, <progress>, <select>, and <textarea>
- Web component elements: <slot> and <template>
We don’t have to know all of these tags to use HTML well; some fulfill more common use cases than others. Each has a distinct purpose and provides a different semantic meaning. Throughout this book, we will go into some detail about how to use these elements.
Content types
When starting with HTML, it can be easy to find the number and variety of elements overwhelming. It may be helpful to think about HTML in terms of content types.
The following table has a description and example of the different content types that can describe an element:
| Type | Description | Example |
| --- | --- | --- |
| Metadata | Content hosted in the head of an HTML document. Doesn’t appear in the web page directly but is used to describe a web page and its relationship to other external resources. | |
| Flow | Text and all elements that can appear as content in the body of an HTML document. | |
| Sectioning | Used to structure the content of a web page and to help with layout. Elements in this category are described in Chapter 2, Structure and Layout. | |
| Phrasing | Elements such as those used for marking up content within a paragraph element. Chapter 3, Text and Typography, will be largely concerned with this content type. | |
| Heading | Elements used to define the headings of a section of an HTML document. The | |
| Embedded | Embedded content includes media, such as video, audio, and images. | |
| Interactive | Elements that a user can interact with, which include media elements with controls, form inputs, buttons, and links. | |
Table 1.1: Different content types
Let’s run through an example of how an element can fit into these category types using the <img> element.
If we want to embed an image in our web page, the simplest way is to use the img element. If we want to create an img element, an example of the code looks like this: <img src="media/kitten.png" alt="A cute kitten">.
We set the src attribute on the img element to an image URL; this is the source of the image that will be embedded in the web page.
Unless your image has no value other than as a decoration, it is a very good idea to include an alt attribute. The alt attribute provides an alternative description of the image as text, which can be read out by screen readers, shown if the image does not load, or displayed in a non-graphical browser.
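As a brief sketch of the two cases described above, an informative image gets a descriptive alt value, while a purely decorative image is commonly given an empty one so that screen readers skip it. The divider.png file name is hypothetical.

```html
<!-- Informative image: the alt text describes what the image shows -->
<img src="media/kitten.png" alt="A cute kitten">

<!-- Purely decorative image: an empty alt value signals that there is nothing to announce -->
<img src="media/divider.png" alt="">
```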
Note
A screen reader is a software application that allows people who are visually impaired or blind to access and interact with a computer. The screen reader allows a user to navigate a web page with a keyboard and will output the content as speech. We will look further at accessibility in Chapter 9.
An img element is a form of embedded content because it embeds an image in an HTML document. It can appear in the body of an HTML document as the child element of the body element, so it would be categorized as flow content.
An image can be included as content in a paragraph, so it is a type of phrasing content. For example, we could have inline images appear in the flow of a paragraph:
<p>
  Kittens are everywhere on the internet. The best thing about kittens is that they are cute. Look here's a kitten now: <img src="media/kitten.jpg" alt="A cute kitten">. See, cute isn't it?
</p>
This code would render the following figure, with the image embedded in the paragraph and the rest of the text flowing around it:
Figure 1.7: Image with text flowing around it
In certain circumstances, an img element is a type of interactive content. For this to be the case, the image must have a usemap attribute. The usemap attribute allows you to specify an image map, which defines areas of an image that are treated as hyperlinks. This makes the image interactive.
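A minimal sketch of an image map follows. The map name, the image dimensions, the coordinates, and the two target pages are all assumptions invented for this example; the usemap attribute and the map and area elements are standard HTML.

```html
<!-- usemap refers to the map element with the matching name (note the leading #) -->
<img src="media/kitten.png" alt="A cute kitten" usemap="#kitten-map" width="400" height="300">

<map name="kitten-map">
  <!-- Each area is a clickable region; for shape="rect", coords are left, top, right, bottom in pixels -->
  <area shape="rect" coords="0,0,200,300" href="left.html" alt="Left half of the kitten picture">
  <area shape="rect" coords="200,0,400,300" href="right.html" alt="Right half of the kitten picture">
</map>
```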
An img element does not act as metadata and it does not provide a sectioning structure to an HTML document. Nor is it a heading.
Elements can appear in more than one category and there is some overlap between the relationships of the categories. Some of these elements are very common and are used often, but some of these elements have very specific purposes and you may never come across a use case for them.
The content types can be useful for understanding how elements work together and which elements are valid where. For further reference, we can see where each available element is categorized in the WHATWG HTML Living Standard: https://html.spec.whatwg.org/multipage/dom.html#kinds-of-content.
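One practical use of the content types is checking whether one element may be nested inside another. A paragraph element accepts phrasing content only, so the sketch below contrasts a valid paragraph with an invalid one; the text is a placeholder.

```html
<!-- Valid: <p> accepts phrasing content such as <em> and <img> -->
<p>Kittens are <em>very</em> cute: <img src="media/kitten.jpg" alt="A cute kitten"></p>

<!-- Not valid: <div> is flow content but not phrasing content, so it cannot be a child of <p>.
     A browser will end the paragraph before the <div> rather than nest it. -->
<p>Kittens are <div>very</div> cute.</p>
```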
The HTML document
A web page is made up of an HTML document. The document represents a hierarchical tree structure similar to a family tree. Starting from a root element, the relationship between an element and its contents can be seen as that of a parent element and a child element. An element that is at the same level of the hierarchy as another element is a sibling to that element. We can describe elements within a branch of the tree as ancestors and descendants.
This structure can be represented as a tree diagram to get a better idea of the relationship between elements.
Take, for example, this simple HTML document:
<html>
  <head>
    <title>HTML Document structure</title>
  </head>
  <body>
    <div>
      <h1>Heading</h1>
      <p>First paragraph of text.</p>
      <p>Second paragraph of text.</p>
    </div>
  </body>
</html>
Here, the root is an html element. It has two children: a head element (containing a title) and a body element containing some more content. It can be represented as a tree diagram as follows:
Figure 1.8: A representation of the HTML document as a tree diagram
In the browser, this code would render the following web page:
Figure 1.9: HTML rendered in the Google Chrome web browser
The <html> element is the parent of both the <head> and <body>, which (as children of the same parent) are siblings. <body> has one child, a <div> tag, and that has three children: an <h1> element and two <p> elements. The <h1> element is a descendant of <body> but not of <head>.
Understanding this structure will become more important when we look at CSS selectors and how we target parts of the HTML document later in this chapter.
Structuring an HTML document
An HTML5 document normally starts with a doctype declaration and has a root html element with two children – the head element and the body element.
The doctype declaration tells the browser it is dealing with an HTML5 document. The doctype is <!DOCTYPE html> and appears as the first line of the document. It is recommended to always add a doctype to make sure your HTML document renders as expected.
Note
The doctype declaration is not case sensitive, so variations such as <!doctype html> and <!DOCTYPE HTML> are equally valid.
One of the nice things about HTML5 is that it simplifies doctype declaration. Before HTML5, there were two commonly used variations of web markup – HTML4 and XHTML1 – and they both had strict, transitional, and frameset versions of their doctype declarations. For example, the HTML 4 strict declaration looked like this: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">.
After the doctype, we have the html element, which is the root of the HTML document.
It is strongly recommended that you add a lang attribute to your html element to allow the browser, screen readers, and other technologies, such as translation tools, to better understand the text content of your web page.
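As a small sketch, the lang attribute takes a language code such as en for English or fr for French. It can also be set on an individual element when part of the page is in a different language; the French phrase here is an example invented for illustration.

```html
<!doctype html>
<!-- lang="en" declares English as the main language of the page -->
<html lang="en">
  <head><title>Page Title</title></head>
  <body>
    <p>
      The French for "hello" is
      <!-- lang can also be set on any element whose language differs from the rest of the page -->
      <span lang="fr">bonjour</span>.
    </p>
  </body>
</html>
```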
The two children of the html element are as follows:
- The head element, which includes the title and metadata providing information about assets to load and how web crawlers and search engines should handle the page.
- The body element, which mostly represents the content rendered for a human browser user to consume. This includes articles, images, and navigation.
In code, the structure we have described looks like this:
<!doctype html>
<html lang="en">
  <head><title>Page Title</title></head>
  <body></body>
</html>
This code would result in a blank web page: the body has no content, and the only metadata is the page title.
Metadata
The head is home to most machine-read information in an HTML document. The browser, screen readers, and web crawlers can get a lot of information from metadata and handle the web page differently depending on that information.
The following elements are considered metadata content:
- base: This lets you set a base URL
- link: This determines the relationship between a page and a resource (such as an external style sheet)
- meta: This is a catch-all for metadata
- title: This is the name of your web page as it appears in the browser tab and search results and is announced by screen readers
The meta element can represent many different types of metadata, including some used by social networks to represent a web page.
Some common usages include the following:
- Setting character encoding for a page – <meta charset="utf-8">
- Setting the viewport for a browser on a mobile device – <meta name="viewport" content="width=device-width, initial-scale=1">
These elements give web developers ways to tell a browser how to handle the HTML document and how it relates to its environment. We can describe our web page for other interested parties (such as search engines and web crawlers) using metadata.
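To show how these metadata elements sit together, here is a sketch of a head element that uses each of them. The description text, the example.com base URL, and the styles.css file name are hypothetical values chosen for illustration.

```html
<head>
  <!-- Character encoding for the page -->
  <meta charset="utf-8">
  <!-- Viewport settings for browsers on mobile devices -->
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <!-- The page name shown in the browser tab and in search results -->
  <title>Page Title</title>
  <!-- A short description of the page for search engines (hypothetical text) -->
  <meta name="description" content="A hypothetical description of this page.">
  <!-- Relative URLs in the page resolve against this base URL (hypothetical) -->
  <base href="https://www.example.com/">
  <!-- Relates the page to an external style sheet (hypothetical file) -->
  <link rel="stylesheet" href="styles.css">
</head>
```

The base element is placed before the link element so that the style sheet's relative URL is resolved against it.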
Our first web page
In our first example, we will create a very simple web page. This will help us to understand the structure of an HTML document and where we put different types of content.
Exercise 1.01 – creating a web page
In this exercise, we will create our first web page. This will be the minimal foundation upon which future chapters can build.
Note
The complete code for this exercise can be found at https://packt.link/SduQx.
The steps are as follows:
- To start, we want to create a new folder, chapter_1, and then open that folder in Visual Studio Code (File | Open Folder…).
- Next, we will create a new plain text file and save it as index.html.
- In index.html, we start by adding the doctype declaration for HTML5:
  <!DOCTYPE html>
- Next, we add an HTML tag (the root element of the HTML document):
  <html lang="en">
  </html>
- In between the opening and closing tags of the html element, we add a head tag. This is where we can put metadata content. For now, the head tag will contain a title:
  <head>
    <title>HTML and CSS</title>
  </head>
- Below the head tag and above the closing html tag, we can then add a body tag. This is where we will put the majority of our content. For now, we will render a heading and a paragraph:
  <body>
    <h1>HTML and CSS</h1>
    <p>
      How to create a modern, responsive website with HTML and CSS
    </p>
  </body>
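Putting the steps above together, index.html should look roughly like the following, assuming each snippet is placed where its step describes. The indentation is a stylistic choice and does not change how the page renders.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <title>HTML and CSS</title>
  </head>
  <body>
    <h1>HTML and CSS</h1>
    <p>
      How to create a modern, responsive website with HTML and CSS
    </p>
  </body>
</html>
```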
The result of this exercise should look like the following figure when opened in a browser:
Figure 1.10: The web page as displayed in the Chrome web browser
Activity 1.01 – video store page template
We’ve been tasked with creating a website for an online on-demand film store called Films On Demand. We don’t have designs yet but want to set up a web page boilerplate that we can use for all the pages on the site.
We will use comments as placeholders to know what needs to change for each page that is built on top of the boilerplate template. For visible content in the body element, we will use lorem ipsum to get an idea of how content will flow.
The steps are as follows:
- Create a file named template.html.
- We want the page to be a valid HTML5 document. So, we will need to add:
  - The correct doctype definition.
  - Elements to structure the document: the html element, the head element, and the body element.
  - A title element that combines the Films On Demand brand with some specifics about the current page.
  - Metadata to describe the site: We’ll set this to Buy films from our great selection. Watch movies on demand.
  - Metadata for the page character set and a viewport tag to help make the site render better on mobile browsers.
- We want to add placeholders for a heading (an h1 element) for the page, which we will populate with lorem ipsum, and a paragraph for the content flow, which we will also populate with the following lorem ipsum text:
  "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam quis scelerisque mauris. Curabitur aliquam ligula in erat placerat finibus. Mauris leo neque, malesuada et augue at, consectetur rhoncus libero. Suspendisse vitae dictum dolor. Vestibulum hendrerit iaculis ipsum, ac ornare ligula. Vestibulum efficitur mattis urna vitae ultrices. Nunc condimentum blandit tellus ut mattis. Morbi eget gravida leo. Mauris ornare lorem a mattis ultricies. Nullam convallis tincidunt nunc, eget rhoncus nulla tincidunt sed. Nulla consequat tellus lectus, in porta nulla facilisis eu. Donec bibendum nisi felis, sit amet cursus nisl suscipit ut. Pellentesque bibendum id libero at cursus. Donec ac viverra tellus. Proin sed dolor quis justo convallis auctor sit amet nec orci. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus."
Note
The solution to this activity can be found at https://packt.link/WbEPx
In this section, we’ve looked at HTML, the markup language that structures and gives context to the content of a web page. We have looked at the syntax of HTML, created our first web page, and learned about the structure of an HTML document. When we’ve looked at our web page in a browser, it has been rendered with the default styling provided by the browser. In the next section, we will look at how we can customize the styling of our web page using CSS. We will learn how to add styles, how to specify what parts of a page they apply to, and some of the properties we can style.