mirror of
https://github.com/AetherDroid/android_kernel_samsung_on5xelte.git
synced 2025-09-05 07:57:45 -04:00
Fixed MTP to work with TWRP
commit
f6dfaef42e
50820 changed files with 20846062 additions and 0 deletions
69
Documentation/ia64/IRQ-redir.txt
Normal file
@@ -0,0 +1,69 @@
IRQ affinity on IA64 platforms
------------------------------
07.01.2002, Erich Focht <efocht@ess.nec.de>


By writing to /proc/irq/IRQ#/smp_affinity the interrupt routing can be
controlled. The behavior on IA64 platforms is slightly different from
that described in Documentation/IRQ-affinity.txt for i386 systems.

Because of the usage of SAPIC mode and physical destination mode the
IRQ target is one particular CPU and cannot be a mask of several
CPUs. Only the first non-zero bit is taken into account.


Usage examples:

The target CPU has to be specified as a hexadecimal CPU mask. The
first non-zero bit is the selected CPU. This format has been kept for
compatibility reasons with i386.

Set the delivery mode of interrupt 41 to fixed and route the
interrupts to CPU #3 (logical CPU number) (2^3=0x08):
     echo "8" >/proc/irq/41/smp_affinity

Set the default route for IRQ number 41 to CPU 6 in lowest priority
delivery mode (redirectable):
     echo "r 40" >/proc/irq/41/smp_affinity

The output of the command
     cat /proc/irq/IRQ#/smp_affinity
gives the target CPU mask for the specified interrupt vector. If the CPU
mask is preceded by the character "r", the interrupt is redirectable
(i.e. lowest priority mode routing is used), otherwise its route is
fixed.
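The mask arithmetic in the examples above (CPU n selects bit 2^n, with an
optional "r " prefix for redirectable routing) can be sketched in C. This
is only an illustration: format_affinity() is a hypothetical helper name,
not part of any kernel interface, and it only builds the string that would
be written to smp_affinity.

```c
#include <stdio.h>

/*
 * Build the string to write to /proc/irq/IRQ#/smp_affinity.
 * cpu is the logical CPU number; a non-zero redirectable adds the
 * "r " prefix (lowest priority delivery mode).
 * Returns the string length, or -1 if cpu does not fit in the mask.
 * format_affinity() is a hypothetical name used for illustration.
 */
int format_affinity(char *buf, size_t len, unsigned int cpu, int redirectable)
{
	unsigned long mask;

	if (cpu >= 8 * sizeof(mask))
		return -1;

	mask = 1UL << cpu;	/* exactly one bit set: CPU n -> 2^n */
	return snprintf(buf, len, "%s%lx", redirectable ? "r " : "", mask);
}
```

Called with cpu 3 and redirectable 0 this yields the string from the first
example; cpu 6 with redirectable 1 yields the string from the second.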



Initialization and default behavior:

If the platform features IRQ redirection (info provided by SAL), all
IO-SAPIC interrupts are initialized with CPU#0 as their default target
and the routing is the so-called "lowest priority mode" (actually
fixed SAPIC mode with hint). The XTP chipset registers are used as hints
for the IRQ routing. Currently in Linux XTP registers can have three
values:
	- minimal for an idle task,
	- normal if any other task runs,
	- maximal if the CPU is going to be switched off.
The IRQ is routed to the CPU with the lowest XTP register value; the
search begins at the default CPU. Therefore most of the interrupts
will be handled by CPU #0.
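A toy model may help to see why CPU #0 ends up with most interrupts. The
sketch below is not chipset or kernel code: pick_target() and the xtp[]
array are made up for illustration, and the tie-breaking rule (the first
CPU found with the lowest value wins) is an assumption drawn from the
statement that the search begins at the default CPU.

```c
/*
 * Toy model of "lowest priority" routing. Starting at default_cpu,
 * scan every CPU once (wrapping around) and return the first CPU
 * holding the lowest XTP value. Illustrative only.
 */
int pick_target(const unsigned char *xtp, int ncpus, int default_cpu)
{
	int best = default_cpu;
	int i;

	for (i = 1; i < ncpus; i++) {
		int cpu = (default_cpu + i) % ncpus;

		/* strict comparison: on a tie the earlier CPU is kept */
		if (xtp[cpu] < xtp[best])
			best = cpu;
	}
	return best;
}
```

When all CPUs run ordinary tasks their XTP values are equal, so under this
model the default CPU is always chosen.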

If the platform doesn't feature interrupt redirection, IO-SAPIC fixed
routing is used. The target CPUs are distributed in a round robin
manner. IRQs will be routed only to the selected target CPUs. Check
with
        cat /proc/interrupts



Comments:

On large (multi-node) systems it is recommended to route the IRQs to
the node to which the corresponding device is connected.
For systems like the NEC AzusA we get IRQ node-affinity for free. This
is because usually the chipsets on each node redirect the interrupts
only to their own CPUs (as they cannot see the XTP registers on the
other nodes).
5
Documentation/ia64/Makefile
Normal file
@@ -0,0 +1,5 @@
# List of programs to build
hostprogs-y := aliasing-test

# Tell kbuild to always build the programs
always := $(hostprogs-y)
43
Documentation/ia64/README
Normal file
@@ -0,0 +1,43 @@
        Linux kernel release 2.4.xx for the IA-64 Platform

   These are the release notes for Linux version 2.4 for the IA-64
   platform. This document provides information specific to IA-64
   ONLY; for additional information about the Linux kernel, also
   read the original Linux README provided with the kernel.

INSTALLING the kernel:

 - IA-64 kernel installation is the same as on other platforms; see
   the original README for details.


SOFTWARE REQUIREMENTS

   Compiling and running this kernel requires an IA-64 compliant GCC
   compiler, as well as various software packages that were also
   compiled with an IA-64 compliant GCC compiler.


CONFIGURING the kernel:

   Configuration is the same; see the original README for details.


COMPILING the kernel:

 - Compiling this kernel doesn't differ from other platforms, so read
   the original README for details, BUT make sure you have an IA-64
   compliant GCC compiler.

IA-64 SPECIFICS

 - General issues:

    o Hardly any performance tuning has been done. Obvious targets
      include the library routines (IP checksum, etc.). Less
      obvious targets include making sure we don't flush the TLB
      needlessly, etc.

    o SMP locks cleanup/optimization

    o IA32 support. Currently experimental. It mostly works.
263
Documentation/ia64/aliasing-test.c
Normal file
@@ -0,0 +1,263 @@
/*
 * Exercise /dev/mem mmap cases that have been troublesome in the past
 *
 * (c) Copyright 2007 Hewlett-Packard Development Company, L.P.
 *	Bjorn Helgaas <bjorn.helgaas@hp.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>
#include <fcntl.h>
#include <fnmatch.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/pci.h>

int sum;

static int map_mem(char *path, off_t offset, size_t length, int touch)
{
	int fd, rc;
	void *addr;
	int *c;

	fd = open(path, O_RDWR);
	if (fd == -1) {
		perror(path);
		return -1;
	}

	if (fnmatch("/proc/bus/pci/*", path, 0) == 0) {
		rc = ioctl(fd, PCIIOC_MMAP_IS_MEM);
		if (rc == -1)
			perror("PCIIOC_MMAP_IS_MEM ioctl");
	}

	addr = mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, offset);
	if (addr == MAP_FAILED) {
		close(fd);	/* don't leak the descriptor on failure */
		return 1;
	}

	if (touch) {
		/* Read every word so the mapping is actually exercised. */
		c = (int *) addr;
		while (c < (int *) ((char *) addr + length))
			sum += *c++;
	}

	rc = munmap(addr, length);
	if (rc == -1) {
		perror("munmap");
		close(fd);
		return -1;
	}

	close(fd);
	return 0;
}

static int scan_tree(char *path, char *file, off_t offset, size_t length, int touch)
{
	struct dirent **namelist;
	char *name, *path2;
	int i, n, r, rc = 0, result = 0;
	struct stat buf;

	n = scandir(path, &namelist, 0, alphasort);
	if (n < 0) {
		perror("scandir");
		return -1;
	}

	for (i = 0; i < n; i++) {
		name = namelist[i]->d_name;

		if (fnmatch(".", name, 0) == 0)
			goto skip;
		if (fnmatch("..", name, 0) == 0)
			goto skip;

		path2 = malloc(strlen(path) + strlen(name) + 3);
		strcpy(path2, path);
		strcat(path2, "/");
		strcat(path2, name);

		if (fnmatch(file, name, 0) == 0) {
			rc = map_mem(path2, offset, length, touch);
			if (rc == 0)
				fprintf(stderr, "PASS: %s 0x%lx-0x%lx is %s\n", path2, offset, offset + length, touch ? "readable" : "mappable");
			else if (rc > 0)
				fprintf(stderr, "PASS: %s 0x%lx-0x%lx not mappable\n", path2, offset, offset + length);
			else {
				fprintf(stderr, "FAIL: %s 0x%lx-0x%lx not accessible\n", path2, offset, offset + length);
				return rc;
			}
		} else {
			r = lstat(path2, &buf);
			if (r == 0 && S_ISDIR(buf.st_mode)) {
				rc = scan_tree(path2, file, offset, length, touch);
				if (rc < 0)
					return rc;
			}
		}

		result |= rc;
		free(path2);

skip:
		free(namelist[i]);
	}
	free(namelist);
	return result;
}

char buf[1024];

static int read_rom(char *path)
{
	int fd, rc;
	size_t size = 0;

	fd = open(path, O_RDWR);
	if (fd == -1) {
		perror(path);
		return -1;
	}

	/* Writing "1" to the sysfs rom file enables reading the ROM. */
	rc = write(fd, "1", 2);
	if (rc <= 0) {
		close(fd);
		perror("write");
		return -1;
	}

	/* Read until EOF or error, counting the bytes returned. */
	do {
		rc = read(fd, buf, sizeof(buf));
		if (rc > 0)
			size += rc;
	} while (rc > 0);

	close(fd);
	return size;
}

static int scan_rom(char *path, char *file)
{
	struct dirent **namelist;
	char *name, *path2;
	int i, n, r, rc = 0, result = 0;
	struct stat buf;

	n = scandir(path, &namelist, 0, alphasort);
	if (n < 0) {
		perror("scandir");
		return -1;
	}

	for (i = 0; i < n; i++) {
		name = namelist[i]->d_name;

		if (fnmatch(".", name, 0) == 0)
			goto skip;
		if (fnmatch("..", name, 0) == 0)
			goto skip;

		path2 = malloc(strlen(path) + strlen(name) + 3);
		strcpy(path2, path);
		strcat(path2, "/");
		strcat(path2, name);

		if (fnmatch(file, name, 0) == 0) {
			rc = read_rom(path2);

			/*
			 * It's OK if the ROM is unreadable.  Maybe there
			 * is no ROM, or some other error occurred.  The
			 * important thing is that no MCA happened.
			 */
			if (rc > 0)
				fprintf(stderr, "PASS: %s read %d bytes\n", path2, rc);
			else {
				fprintf(stderr, "PASS: %s not readable\n", path2);
				return rc;
			}
		} else {
			r = lstat(path2, &buf);
			if (r == 0 && S_ISDIR(buf.st_mode)) {
				rc = scan_rom(path2, file);
				if (rc < 0)
					return rc;
			}
		}

		result |= rc;
		free(path2);

skip:
		free(namelist[i]);
	}
	free(namelist);
	return result;
}

int main(void)
{
	int rc;

	if (map_mem("/dev/mem", 0, 0xA0000, 1) == 0)
		fprintf(stderr, "PASS: /dev/mem 0x0-0xa0000 is readable\n");
	else
		fprintf(stderr, "FAIL: /dev/mem 0x0-0xa0000 not accessible\n");

	/*
	 * It's not safe to blindly read the VGA frame buffer.  If you know
	 * how to poke the card the right way, it should respond, but it's
	 * not safe in general.  Many machines, e.g., Intel chipsets, cover
	 * up a non-responding card by just returning -1, but others will
	 * report the failure as a machine check.
	 */
	if (map_mem("/dev/mem", 0xA0000, 0x20000, 0) == 0)
		fprintf(stderr, "PASS: /dev/mem 0xa0000-0xc0000 is mappable\n");
	else
		fprintf(stderr, "FAIL: /dev/mem 0xa0000-0xc0000 not accessible\n");

	if (map_mem("/dev/mem", 0xC0000, 0x40000, 1) == 0)
		fprintf(stderr, "PASS: /dev/mem 0xc0000-0x100000 is readable\n");
	else
		fprintf(stderr, "FAIL: /dev/mem 0xc0000-0x100000 not accessible\n");

	/*
	 * Often you can map all the individual pieces above (0-0xA0000,
	 * 0xA0000-0xC0000, and 0xC0000-0x100000), but can't map the whole
	 * thing at once.  This is because the individual pieces use different
	 * attributes, and there's no single attribute supported over the
	 * whole region.
	 */
	rc = map_mem("/dev/mem", 0, 1024*1024, 0);
	if (rc == 0)
		fprintf(stderr, "PASS: /dev/mem 0x0-0x100000 is mappable\n");
	else if (rc > 0)
		fprintf(stderr, "PASS: /dev/mem 0x0-0x100000 not mappable\n");
	else
		fprintf(stderr, "FAIL: /dev/mem 0x0-0x100000 not accessible\n");

	scan_tree("/sys/class/pci_bus", "legacy_mem", 0, 0xA0000, 1);
	scan_tree("/sys/class/pci_bus", "legacy_mem", 0xA0000, 0x20000, 0);
	scan_tree("/sys/class/pci_bus", "legacy_mem", 0xC0000, 0x40000, 1);
	scan_tree("/sys/class/pci_bus", "legacy_mem", 0, 1024*1024, 0);

	scan_rom("/sys/devices", "rom");

	scan_tree("/proc/bus/pci", "??.?", 0, 0xA0000, 1);
	scan_tree("/proc/bus/pci", "??.?", 0xA0000, 0x20000, 0);
	scan_tree("/proc/bus/pci", "??.?", 0xC0000, 0x40000, 1);
	scan_tree("/proc/bus/pci", "??.?", 0, 1024*1024, 0);

	return rc;
}
221
Documentation/ia64/aliasing.txt
Normal file
@@ -0,0 +1,221 @@
                 MEMORY ATTRIBUTE ALIASING ON IA-64

                           Bjorn Helgaas
                       <bjorn.helgaas@hp.com>
                            May 4, 2006


MEMORY ATTRIBUTES

    Itanium supports several attributes for virtual memory references.
    The attribute is part of the virtual translation, i.e., it is
    contained in the TLB entry. The ones of most interest to the Linux
    kernel are:

        WB              Write-back (cacheable)
        UC              Uncacheable
        WC              Write-coalescing

    System memory typically uses the WB attribute. The UC attribute is
    used for memory-mapped I/O devices. The WC attribute is uncacheable
    like UC is, but writes may be delayed and combined to increase
    performance for things like frame buffers.

    The Itanium architecture requires that we avoid accessing the same
    page with both a cacheable mapping and an uncacheable mapping[1].

    The design of the chipset determines which attributes are supported
    on which regions of the address space. For example, some chipsets
    support either WB or UC access to main memory, while others support
    only WB access.

MEMORY MAP

    Platform firmware describes the physical memory map and the
    supported attributes for each region. At boot-time, the kernel uses
    the EFI GetMemoryMap() interface. ACPI can also describe memory
    devices and the attributes they support, but Linux/ia64 currently
    doesn't use this information.

    The kernel uses the efi_memmap table returned from GetMemoryMap() to
    learn the attributes supported by each region of physical address
    space. Unfortunately, this table does not completely describe the
    address space because some machines omit some or all of the MMIO
    regions from the map.

    The kernel maintains another table, kern_memmap, which describes the
    memory Linux is actually using and the attribute for each region.
    This contains only system memory; it does not contain MMIO space.

    The kern_memmap table typically contains only a subset of the system
    memory described by the efi_memmap. Linux/ia64 can't use all memory
    in the system because of constraints imposed by the identity mapping
    scheme.

    The efi_memmap table is preserved unmodified because the original
    boot-time information is required for kexec.

KERNEL IDENTITY MAPPINGS

    Linux/ia64 identity mappings are done with large pages, currently
    either 16MB or 64MB, referred to as "granules." Cacheable mappings
    are speculative[2], so the processor can read any location in the
    page at any time, independent of the programmer's intentions. This
    means that to avoid attribute aliasing, Linux can create a cacheable
    identity mapping only when the entire granule supports cacheable
    access.

    Therefore, kern_memmap contains only full granule-sized regions that
    can be referenced safely by an identity mapping.

    Uncacheable mappings are not speculative, so the processor will
    generate UC accesses only to locations explicitly referenced by
    software. This allows UC identity mappings to cover granules that
    are only partially populated, or populated with a combination of UC
    and WB regions.
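The granule rule above can be illustrated with a small range check. This
is a sketch, not kernel code: struct region, the ATTR_* bits and
range_supports() are hypothetical names, and the map is assumed to be
sorted and non-overlapping. A cacheable identity mapping would be allowed
only if the function returns 1 for the whole granule with ATTR_WB.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute bits for this sketch. */
enum mem_attr { ATTR_WB = 1, ATTR_UC = 2, ATTR_WC = 4 };

/* One entry of a simplified, sorted, non-overlapping memory map. */
struct region {
	uint64_t start;		/* first byte of the region */
	uint64_t end;		/* one past the last byte */
	unsigned int attrs;	/* bitmask of supported mem_attr values */
};

/*
 * Return 1 if every byte of [start, start + size) is described by the
 * map and supports attr, 0 otherwise. Holes count as unsupported.
 */
int range_supports(const struct region *map, size_t n,
		   uint64_t start, uint64_t size, unsigned int attr)
{
	uint64_t pos = start, end = start + size;
	size_t i;

	for (i = 0; i < n && pos < end; i++) {
		if (map[i].end <= pos)
			continue;	/* region lies below the cursor */
		if (map[i].start > pos)
			return 0;	/* hole in the map */
		if (!(map[i].attrs & attr))
			return 0;	/* attribute not supported here */
		pos = map[i].end;
	}
	return pos >= end;
}
```

With a map whose first megabyte contains a UC-only VGA hole, a 16MB
granule starting at 0 fails the ATTR_WB check, so under this model it
could not be covered by a cacheable identity mapping.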

USER MAPPINGS

    User mappings are typically done with 16K or 64K pages. The smaller
    page size allows more flexibility because only 16K or 64K has to be
    homogeneous with respect to memory attributes.

POTENTIAL ATTRIBUTE ALIASING CASES

    There are several ways the kernel creates new mappings:

    mmap of /dev/mem

        This uses remap_pfn_range(), which creates user mappings. These
        mappings may be either WB or UC. If the region being mapped
        happens to be in kern_memmap, meaning that it may also be mapped
        by a kernel identity mapping, the user mapping must use the same
        attribute as the kernel mapping.

        If the region is not in kern_memmap, the user mapping should use
        an attribute reported as being supported in the EFI memory map.

        Since the EFI memory map does not describe MMIO on some
        machines, this should use an uncacheable mapping as a fallback.
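The three rules for mmap of /dev/mem form a simple priority chain, which
can be sketched as follows. The enum and devmem_mmap_attr() are
hypothetical names for illustration only; the actual kernel code looks
the attributes up in its own tables.

```c
/* Hypothetical attribute values; MA_NONE means "not described". */
enum map_attr { MA_NONE = 0, MA_WB, MA_UC };

/*
 * kern_attr: attribute of the region in kern_memmap, or MA_NONE.
 * efi_attr:  attribute reported by the EFI memory map, or MA_NONE.
 */
enum map_attr devmem_mmap_attr(enum map_attr kern_attr, enum map_attr efi_attr)
{
	if (kern_attr != MA_NONE)
		return kern_attr;	/* must match the kernel identity mapping */
	if (efi_attr != MA_NONE)
		return efi_attr;	/* firmware says this is supported */
	return MA_UC;			/* undescribed MMIO: uncacheable fallback */
}
```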

    mmap of /sys/class/pci_bus/.../legacy_mem

        This is very similar to mmap of /dev/mem, except that legacy_mem
        only allows mmap of the one megabyte "legacy MMIO" area for a
        specific PCI bus. Typically this is the first megabyte of
        physical address space, but it may be different on machines with
        several VGA devices.

        "X" uses this to access VGA frame buffers. Using legacy_mem
        rather than /dev/mem allows multiple instances of X to talk to
        different VGA cards.

        The /dev/mem mmap constraints apply.

    mmap of /proc/bus/pci/.../??.?

        This is an MMIO mmap of PCI functions, which additionally may or
        may not be requested as using the WC attribute.

        If WC is requested, and the region in kern_memmap is either WC
        or UC, and the EFI memory map designates the region as WC, then
        the WC mapping is allowed.

        Otherwise, the user mapping must use the same attribute as the
        kernel mapping.

    read/write of /dev/mem

        This uses copy_from_user(), which implicitly uses a kernel
        identity mapping. This is obviously safe for things in
        kern_memmap.

        There may be corner cases of things that are not in kern_memmap,
        but could be accessed this way. For example, registers in MMIO
        space are not in kern_memmap, but could be accessed with a UC
        mapping. This would not cause attribute aliasing. But
        registers typically can be accessed only with four-byte or
        eight-byte accesses, and the copy_from_user() path doesn't allow
        any control over the access size, so this would be dangerous.

    ioremap()

        This returns a mapping for use inside the kernel.

        If the region is in kern_memmap, we should use the attribute
        specified there.

        If the EFI memory map reports that the entire granule supports
        WB, we should use that (granules that are partially reserved
        or occupied by firmware do not appear in kern_memmap).

        If the granule contains non-WB memory, but we can cover the
        region safely with kernel page table mappings, we can use
        ioremap_page_range() as most other architectures do.

        Failing all of the above, we have to fall back to a UC mapping.
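The ioremap() fallback order above can be written down as a decision
function. This is a sketch under stated assumptions: struct ioremap_facts
and ioremap_choice() are invented names, and the three inputs stand in
for lookups the kernel would do against kern_memmap, the EFI memory map
and its page tables.

```c
/* Which kind of mapping the rules above would select. */
enum ioremap_kind {
	IOREMAP_KERN_ATTR,	/* reuse the attribute from kern_memmap */
	IOREMAP_WB_GRANULE,	/* whole granule supports WB */
	IOREMAP_PAGE_TABLE,	/* kernel page table mappings */
	IOREMAP_UC		/* last resort: uncacheable */
};

/* Hypothetical summary of what is known about the region. */
struct ioremap_facts {
	int in_kern_memmap;	/* region appears in kern_memmap */
	int granule_all_wb;	/* EFI reports WB for the entire granule */
	int page_table_ok;	/* region can be covered by page tables */
};

enum ioremap_kind ioremap_choice(const struct ioremap_facts *f)
{
	if (f->in_kern_memmap)
		return IOREMAP_KERN_ATTR;
	if (f->granule_all_wb)
		return IOREMAP_WB_GRANULE;
	if (f->page_table_ok)
		return IOREMAP_PAGE_TABLE;
	return IOREMAP_UC;
}
```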

PAST PROBLEM CASES

    mmap of various MMIO regions from /dev/mem by "X" on Intel platforms

        The EFI memory map may not report these MMIO regions.

        These must be allowed so that X will work. This means that
        when the EFI memory map is incomplete, every /dev/mem mmap must
        succeed. It may create either WB or UC user mappings, depending
        on whether the region is in kern_memmap or the EFI memory map.

    mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled

        The EFI memory map reports the following attributes:
            0x00000-0x9FFFF WB only
            0xA0000-0xBFFFF UC only (VGA frame buffer)
            0xC0000-0xFFFFF WB only

        This mmap is done with user pages, not kernel identity mappings,
        so it is safe to use WB mappings.

        The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
        which uses a granule-sized UC mapping. This granule will cover some
        WB-only memory, but since UC is non-speculative, the processor will
        never generate an uncacheable reference to the WB-only areas unless
        the driver explicitly touches them.

    mmap of 0x0-0xFFFFF legacy_mem by "X"

        If the EFI memory map reports that the entire range supports the
        same attributes, we can allow the mmap (and we will prefer WB if
        supported, as is the case with HP sx[12]000 machines with VGA
        disabled).

        If EFI reports the range as partly WB and partly UC (as on sx[12]000
        machines with VGA enabled), we must fail the mmap because there's no
        safe attribute to use.

        If EFI reports some of the range but not all (as on Intel firmware
        that doesn't report the VGA frame buffer at all), we should fail the
        mmap and force the user to map just the specific region of interest.

    mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled

        The EFI memory map reports the following attributes:
            0x00000-0xFFFFF WB only (no VGA MMIO hole)

        This is a special case of the previous case, and the mmap should
        fail for the same reason as above.

    read of /sys/devices/.../rom

        For VGA devices, this may cause an ioremap() of 0xC0000. This
        used to be done with a UC mapping, because the VGA frame buffer
        at 0xA0000 prevents use of a WB granule. The UC mapping causes
        an MCA on HP sx[12]000 chipsets.

        We should use WB page table mappings to avoid covering the VGA
        frame buffer.

NOTES

    [1] SDM rev 2.2, vol 2, sec 4.4.1.
    [2] SDM rev 2.2, vol 2, sec 4.4.6.
128
Documentation/ia64/efirtc.txt
Normal file
@@ -0,0 +1,128 @@
EFI Real Time Clock driver
-------------------------------
S. Eranian <eranian@hpl.hp.com>
March 2000

I/ Introduction

This document describes the efirtc.c driver provided for
the IA-64 platform.

The purpose of this driver is to supply an API for kernel and user applications
to get access to the Time Service offered by EFI version 0.92.

EFI provides 4 calls one can make once the OS is booted: GetTime(),
SetTime(), GetWakeupTime(), SetWakeupTime() which are all supported by this
driver. We describe those calls as well as the design of the driver in the
following sections.

II/ Design Decisions

The original idea was to provide a very simple driver to get access to,
at first, the time of day service. This is required in order to access, in a
portable way, the CMOS clock. A program like /sbin/hwclock uses such a clock
to initialize the system view of the time during boot.

Because we wanted to minimize the impact on existing user-level apps using
the CMOS clock, we decided to expose an API that was very similar to the one
used today with the legacy RTC driver (drivers/char/rtc.c). However, because
EFI provides simpler services, not all ioctl()s are available. Also,
new ioctl()s have been introduced for things that EFI provides but the
legacy driver does not.

EFI uses a slightly different way of representing the time; notably,
the reference date is different. The year uses the full 4-digit format.
The Epoch is January 1st 1998. For backward compatibility reasons we don't
expose this new way of representing time. Instead we use something very
similar to the struct tm, i.e. struct rtc_time, as used by hwclock.
One of the reasons for doing it this way is to allow for EFI to still evolve
without necessarily impacting any of the user applications. The decoupling
enables flexibility and permits writing wrapper code in case things change.

The driver exposes two interfaces: one via the device file and a set of
ioctl()s, the other read-only via the /proc filesystem.

As of today we don't offer a /proc/sys interface.

To allow for a uniform interface between the legacy RTC and EFI time service,
we have created the include/linux/rtc.h header file to contain only the
"public" API of the two drivers. The specifics of the legacy RTC are still
in include/linux/mc146818rtc.h.


III/ Time of day service

This part of the driver gives access to the time of day service of EFI.
Two ioctl()s, compatible with the legacy RTC calls, are provided:

	Read the CMOS clock: ioctl(d, RTC_RD_TIME, &rtc);

	Write the CMOS clock: ioctl(d, RTC_SET_TIME, &rtc);

The rtc is a pointer to a data structure defined in rtc.h which is close
to a struct tm:

struct rtc_time {
	int tm_sec;
	int tm_min;
	int tm_hour;
	int tm_mday;
	int tm_mon;
	int tm_year;
	int tm_wday;
	int tm_yday;
	int tm_isdst;
};

The driver takes care of converting back and forth between the EFI time and
this format.
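The kind of conversion involved can be sketched as follows. This is not
the driver's code: both structures are reduced, hypothetical stand-ins,
and the sketch relies only on the struct tm conventions (tm_year counts
years since 1900, tm_mon runs from 0 to 11) and on EFI storing the full
4-digit year with months numbered from 1.

```c
/* Hypothetical, reduced stand-in for the EFI time record. */
struct efi_time_sketch {
	unsigned short year;	/* full four-digit year, e.g. 2000 */
	unsigned char month;	/* 1..12 */
	unsigned char day;	/* 1..31 */
	unsigned char hour, minute, second;
};

/* Hypothetical, reduced stand-in for struct rtc_time. */
struct rtc_time_sketch {
	int tm_sec, tm_min, tm_hour, tm_mday;
	int tm_mon;		/* 0..11, as in struct tm */
	int tm_year;		/* years since 1900, as in struct tm */
};

void efi_to_rtc(const struct efi_time_sketch *e, struct rtc_time_sketch *r)
{
	r->tm_sec = e->second;
	r->tm_min = e->minute;
	r->tm_hour = e->hour;
	r->tm_mday = e->day;
	r->tm_mon = e->month - 1;	/* 1-based -> 0-based */
	r->tm_year = e->year - 1900;	/* full year -> offset from 1900 */
}

void rtc_to_efi(const struct rtc_time_sketch *r, struct efi_time_sketch *e)
{
	e->second = (unsigned char) r->tm_sec;
	e->minute = (unsigned char) r->tm_min;
	e->hour = (unsigned char) r->tm_hour;
	e->day = (unsigned char) r->tm_mday;
	e->month = (unsigned char) (r->tm_mon + 1);
	e->year = (unsigned short) (r->tm_year + 1900);
}
```

Keeping the conversion in one place is what lets the user-visible format
stay fixed even if the EFI representation changes.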

Those two ioctl()s can be exercised with the hwclock command:

For reading:
# /sbin/hwclock --show
Mon Mar 6 15:32:32 2000 -0.910248 seconds

For setting:
# /sbin/hwclock --systohc

Root privileges are required to be able to set the time of day.
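For programs that do not go through hwclock, a read of the clock might
look like the sketch below. read_rtc() is a hypothetical helper, and the
device path is left to the caller because this document does not name the
device file; /dev/rtc would be the usual guess.

```c
#include <fcntl.h>
#include <linux/rtc.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Read the clock through the RTC_RD_TIME ioctl.
 * Returns 0 on success and -1 if the device cannot be opened or the
 * ioctl fails. read_rtc() is an illustrative name.
 */
int read_rtc(const char *dev, struct rtc_time *out)
{
	int fd, rc;

	fd = open(dev, O_RDONLY);
	if (fd == -1)
		return -1;

	rc = ioctl(fd, RTC_RD_TIME, out);
	close(fd);
	return rc == -1 ? -1 : 0;
}
```

On success the caller finds the time in the usual struct tm style fields,
for example out->tm_year + 1900 for the calendar year.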

IV/ Wakeup Alarm service

EFI provides an API by which one can program when a machine should wake up,
i.e. reboot. This is very different from the alarm provided by the legacy
RTC which is some kind of interval timer alarm. For this reason we don't use
the same ioctl()s to get access to the service. Instead we have
introduced 2 new ioctl()s to the interface of an RTC.

We have added 2 new ioctl()s that are specific to the EFI driver:

	Read the current state of the alarm
	ioctl(d, RTC_WKALM_RD, &wkt)

	Set the alarm or change its status
	ioctl(d, RTC_WKALM_SET, &wkt)

The wkt structure encapsulates a struct rtc_time + 2 extra fields to get
status information:

struct rtc_wkalrm {

	unsigned char enabled; /* =1 if alarm is enabled */
	unsigned char pending; /* =1 if alarm is pending */

	struct rtc_time time;
};

As of today, none of the existing user-level apps supports this feature.
However, writing such a program should not be hard using just those two
ioctl()s.

Root privileges are required to be able to set the alarm.
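As a rough idea of what such a program could look like, the sketch below
arms the alarm with RTC_WKALM_SET. set_wakeup() is a hypothetical helper,
the device path is an assumption left to the caller, and error handling
is kept to the bare minimum.

```c
#include <fcntl.h>
#include <linux/rtc.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Program the wakeup alarm for the given time and enable it.
 * Returns 0 on success and -1 if the device cannot be opened or the
 * ioctl fails (for example when not running as root).
 */
int set_wakeup(const char *dev, const struct rtc_time *when)
{
	struct rtc_wkalrm wkt;
	int fd, rc;

	memset(&wkt, 0, sizeof(wkt));
	wkt.enabled = 1;	/* arm the alarm */
	wkt.time = *when;	/* wakeup time in rtc_time format */

	fd = open(dev, O_RDONLY);
	if (fd == -1)
		return -1;

	rc = ioctl(fd, RTC_WKALM_SET, &wkt);
	close(fd);
	return rc == -1 ? -1 : 0;
}
```

Reading the alarm back with RTC_WKALM_RD follows the same pattern, with
the enabled and pending fields reporting its state.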

V/ References.

Check out the following Web site for more information on EFI:

http://developer.intel.com/technology/efi/
1068
Documentation/ia64/err_inject.txt
Normal file
File diff suppressed because it is too large
286
Documentation/ia64/fsys.txt
Normal file
@@ -0,0 +1,286 @@
-*-Mode: outline-*-

		Light-weight System Calls for IA-64
		-----------------------------------

		        Started: 13-Jan-2003
		    Last update: 27-Sep-2003

		      David Mosberger-Tang
		      <davidm@hpl.hp.com>

Using the "epc" instruction effectively introduces a new mode of
execution to the ia64 linux kernel. We call this mode the
"fsys-mode". To recap, the normal states of execution are:

  - kernel mode:
	Both the register stack and the memory stack have been
	switched over to kernel memory. The user-level state is saved
	in a pt_regs structure at the top of the kernel memory stack.

  - user mode:
	Both the register stack and the memory stack are in
	user memory. The user-level state is contained in the
	CPU registers.

  - bank 0 interruption-handling mode:
	This is the non-interruptible state which all
	interruption-handlers start execution in. The user-level
	state remains in the CPU registers and some kernel state may
	be stored in bank 0 of registers r16-r31.

In contrast, fsys-mode has the following special properties:

  - execution is at privilege level 0 (most-privileged)

  - CPU registers may contain a mixture of user-level and kernel-level
    state (it is the responsibility of the kernel to ensure that no
    security-sensitive kernel-level state is leaked back to
    user-level)

  - execution is interruptible and preemptible (an fsys-mode handler
    can disable interrupts and avoid all other interruption-sources
    to avoid preemption)

  - neither the memory-stack nor the register-stack can be trusted while
    in fsys-mode (they point to the user-level stacks, which may
    be invalid, or completely bogus addresses)

In summary, fsys-mode is much more similar to running in user-mode
than it is to running in kernel-mode. Of course, given that the
privilege level is at level 0, this means that fsys-mode requires some
care (see below).


* How to tell fsys-mode

Linux operates in fsys-mode when (a) the privilege level is 0 (most
privileged) and (b) the stacks have NOT been switched to kernel memory
yet. For convenience, the header file <asm-ia64/ptrace.h> provides
three macros:

	user_mode(regs)
	user_stack(task,regs)
	fsys_mode(task,regs)

The "regs" argument is a pointer to a pt_regs structure. The "task"
argument is a pointer to the task structure to which the "regs"
pointer belongs. user_mode() returns TRUE if the CPU state pointed
to by "regs" was executing in user mode (privilege level 3).
user_stack() returns TRUE if the state pointed to by "regs" was
executing on the user-level stack(s). Finally, fsys_mode() returns
TRUE if the CPU state pointed to by "regs" was executing in fsys-mode.
The fsys_mode() macro is equivalent to the expression:

	!user_mode(regs) && user_stack(task,regs)
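The relationship between the three predicates can be modeled in plain C.
This is a user-space sketch and not the kernel macros: struct
cpu_state_sketch and the *_sketch() functions are invented for
illustration and reduce pt_regs and the task structure to the two facts
that matter here.

```c
/* Hypothetical reduction of the saved CPU state. */
struct cpu_state_sketch {
	int privilege_level;	/* 0 = most privileged, 3 = user */
	int on_user_stack;	/* non-zero until stacks switch to kernel memory */
};

int user_mode_sketch(const struct cpu_state_sketch *s)
{
	return s->privilege_level == 3;
}

int user_stack_sketch(const struct cpu_state_sketch *s)
{
	return s->on_user_stack != 0;
}

/* Privileged, but still running on the user-level stacks. */
int fsys_mode_sketch(const struct cpu_state_sketch *s)
{
	return !user_mode_sketch(s) && user_stack_sketch(s);
}
```

Of the three combinations described in the text, only "privilege level 0
on the user stacks" satisfies fsys_mode_sketch().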

* How to write an fsyscall handler

The file arch/ia64/kernel/fsys.S contains a table of fsyscall-handlers
(fsyscall_table). This table contains one entry for each system call.
By default, a system call is handled by fsys_fallback_syscall(). This
routine takes care of entering (full) kernel mode and calling the
normal Linux system call handler. For performance-critical system
calls, it is possible to write a hand-tuned fsyscall_handler. For
example, fsys.S contains fsys_getpid(), which is a hand-tuned version
of the getpid() system call.
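The table-with-default arrangement can be pictured with a small C model.
The real table lives in assembly in fsys.S; everything below (the table
size, the syscall number, the function names and their return values) is
made up for illustration, and empty slots stand in for entries that would
name fsys_fallback_syscall explicitly.

```c
#include <stddef.h>

#define NR_SYSCALLS_SKETCH	8	/* illustrative table size */
#define NR_GETPID_SKETCH	3	/* illustrative syscall number */

typedef long (*fsys_handler_t)(void);

/* Stands in for fsys_fallback_syscall(): take the full syscall path. */
long fallback_sketch(void)
{
	return -1;
}

/* Stands in for a hand-tuned handler such as fsys_getpid(). */
long getpid_sketch(void)
{
	return 42;
}

/* One slot per system call; unset slots mean "use the fallback". */
static fsys_handler_t table_sketch[NR_SYSCALLS_SKETCH] = {
	[NR_GETPID_SKETCH] = getpid_sketch,
};

fsys_handler_t lookup_handler(unsigned int nr)
{
	if (nr >= NR_SYSCALLS_SKETCH || table_sketch[nr] == NULL)
		return fallback_sketch;
	return table_sketch[nr];
}
```

Adding a new light-weight handler then amounts to filling in one more
slot; every other system call keeps going through the fallback.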
|
||||
|
||||
The entry and exit-state of an fsyscall handler is as follows:
|
||||
|
||||
** Machine state on entry to fsyscall handler:
|
||||
|
||||
- r10 = 0
|
||||
- r11 = saved ar.pfs (a user-level value)
|
||||
- r15 = system call number
|
||||
- r16 = "current" task pointer (in normal kernel-mode, this is in r13)
|
||||
- r32-r39 = system call arguments
|
||||
- b6 = return address (a user-level value)
|
||||
- ar.pfs = previous frame-state (a user-level value)
|
||||
- PSR.be = cleared to zero (i.e., little-endian byte order is in effect)
|
||||
- all other registers may contain values passed in from user-mode
|
||||
|
||||
** Required machine state on exit from fsyscall handler:
|
||||
|
||||
- r11 = saved ar.pfs (as passed into the fsyscall handler)
|
||||
- r15 = system call number (as passed into the fsyscall handler)
|
||||
- r32-r39 = system call arguments (as passed into the fsyscall handler)
|
||||
- b6 = return address (as passed into the fsyscall handler)
|
||||
- ar.pfs = previous frame-state (as passed into the fsyscall handler)
|
||||
|
||||
Fsyscall handlers can execute with very little overhead, but with that
|
||||
speed comes a set of restrictions:
|
||||
|
||||
o Fsyscall-handlers MUST check for any pending work in the flags
|
||||
member of the thread-info structure and if any of the
|
||||
TIF_ALLWORK_MASK flags are set, the handler needs to fall back on
|
||||
doing a full system call (by calling fsys_fallback_syscall).
|
||||
|
||||
o Fsyscall-handlers MUST preserve incoming arguments (r32-r39, r11,
|
||||
r15, b6, and ar.pfs) because they will be needed in case of a
|
||||
system call restart. Of course, all "preserved" registers also
|
||||
must be preserved, in accordance with the normal calling conventions.
|
||||
|
||||
o Fsyscall-handlers MUST check argument registers for containing a
|
||||
NaT value before using them in any way that could trigger a
|
||||
NaT-consumption fault. If a system call argument is found to
|
||||
contain a NaT value, an fsyscall-handler may return immediately
|
||||
with r8=EINVAL, r10=-1.
|
||||
|
||||
o Fsyscall-handlers MUST NOT use the "alloc" instruction or perform
|
||||
any other operation that would trigger mandatory RSE
|
||||
(register-stack engine) traffic.
|
||||
|
||||
o Fsyscall-handlers MUST NOT write to any stacked registers because
|
||||
it is not safe to assume that user-level called a handler with the
|
||||
proper number of arguments.
|
||||
|
||||
o Fsyscall-handlers need to be careful when accessing per-CPU variables:
|
||||
unless proper safe-guards are taken (e.g., interruptions are avoided),
|
||||
execution may be pre-empted and resumed on another CPU at any given
|
||||
time.
|
||||
|
||||
o Fsyscall-handlers must be careful not to leak sensitive kernel
|
||||
information back to user-level. In particular, before returning to
|
||||
user-level, care needs to be taken to clear any scratch registers
|
||||
that could contain sensitive information (note that the current
|
||||
task pointer is not considered sensitive: it's already exposed
|
||||
through ar.k6).
|
||||
|
||||
o Fsyscall-handlers MUST NOT access user-memory without first
|
||||
validating access-permission (this can be done typically via
|
||||
probe.r.fault and/or probe.w.fault) and without guarding against
|
||||
memory access exceptions (this can be done with the EX() macros
|
||||
defined by asmmacro.h).
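The order of the first checks in the list above can be sketched in C. This
is only a model under stated assumptions: real handlers are IA-64 assembly
in fsys.S, and every name here (the sketch_ identifiers, the mask value) is
hypothetical. It shows the two early exits: fall back when work is pending,
and return r8=EINVAL, r10=-1 when an argument is NaT.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_ALLWORK_MASK 0x0fu  /* illustrative value, not the kernel's */

enum sketch_path { SKETCH_FAST, SKETCH_FALLBACK, SKETCH_ERROR };

struct sketch_result {
	enum sketch_path path;
	long r8;   /* return value or error code */
	long r10;  /* 0 on success, -1 on error */
};

static struct sketch_result sketch_fsys_entry(unsigned int ti_flags,
					      bool arg_is_nat, long fast_value)
{
	struct sketch_result res = { SKETCH_FAST, fast_value, 0 };

	if (ti_flags & SKETCH_ALLWORK_MASK) {
		/* Pending work: hand over to the full system call path. */
		res.path = SKETCH_FALLBACK;
		res.r8 = 0;
	} else if (arg_is_nat) {
		/* NaT argument: report EINVAL without consuming the value. */
		res.path = SKETCH_ERROR;
		res.r8 = EINVAL;
		res.r10 = -1;
	}
	return res;
}
```

The fast path is taken only when neither condition holds, which mirrors why
the hand-tuned handlers test the thread-info flags before doing anything else.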
|
||||
|
||||
The above restrictions may seem draconian, but remember that it's
|
||||
possible to trade off some of the restrictions by paying a slightly
|
||||
higher overhead. For example, if an fsyscall-handler could benefit
|
||||
from the shadow register bank, it could temporarily disable PSR.i and
|
||||
PSR.ic, switch to bank 0 (bsw.0) and then use the shadow registers as
|
||||
needed. In other words, following the above rules yields extremely
|
||||
fast system call execution (while fully preserving system call
|
||||
semantics), but there is also a lot of flexibility in handling more
|
||||
complicated cases.
|
||||
|
||||
* Signal handling
|
||||
|
||||
The delivery of (asynchronous) signals must be delayed until fsys-mode
|
||||
is exited. This is accomplished with the help of the lower-privilege
|
||||
transfer trap: arch/ia64/kernel/process.c:do_notify_resume_user()
|
||||
checks whether the interrupted task was in fsys-mode and, if so, sets
|
||||
PSR.lp and returns immediately. When fsys-mode is exited via the
|
||||
"br.ret" instruction that lowers the privilege level, a trap will
|
||||
occur. The trap handler clears PSR.lp again and returns immediately.
|
||||
The kernel exit path then checks for and delivers any pending signals.
|
||||
|
||||
* PSR Handling
|
||||
|
||||
The "epc" instruction doesn't change the contents of PSR at all. This
|
||||
is in contrast to a regular interruption, which clears almost all
|
||||
bits. Because of that, some care needs to be taken to ensure things
|
||||
work as expected. The following discussion describes how each PSR bit
|
||||
is handled.
|
||||
|
||||
PSR.be Cleared when entering fsys-mode. A srlz.d instruction is used
|
||||
to ensure the CPU is in little-endian mode before the first
|
||||
load/store instruction is executed. PSR.be is normally NOT
|
||||
restored upon return from an fsys-mode handler. In other
|
||||
words, user-level code must not rely on PSR.be being preserved
|
||||
across a system call.
|
||||
PSR.up Unchanged.
|
||||
PSR.ac Unchanged.
|
||||
PSR.mfl Unchanged. Note: fsys-mode handlers must not write-registers!
|
||||
PSR.mfh Unchanged. Note: fsys-mode handlers must not write-registers!
|
||||
PSR.ic Unchanged. Note: fsys-mode handlers can clear the bit, if needed.
|
||||
PSR.i Unchanged. Note: fsys-mode handlers can clear the bit, if needed.
|
||||
PSR.pk Unchanged.
|
||||
PSR.dt Unchanged.
|
||||
PSR.dfl Unchanged. Note: fsys-mode handlers must not write-registers!
|
||||
PSR.dfh Unchanged. Note: fsys-mode handlers must not write-registers!
|
||||
PSR.sp Unchanged.
|
||||
PSR.pp Unchanged.
|
||||
PSR.di Unchanged.
|
||||
PSR.si Unchanged.
|
||||
PSR.db Unchanged. The kernel prevents user-level from setting a hardware
|
||||
breakpoint that triggers at any privilege level other than 3 (user-mode).
|
||||
PSR.lp Unchanged.
|
||||
PSR.tb Lazy redirect. If a taken-branch trap occurs while in
|
||||
fsys-mode, the trap-handler modifies the saved machine state
|
||||
such that execution resumes in the gate page at
|
||||
syscall_via_break(), with privilege level 3. Note: the
|
||||
taken branch would occur on the branch invoking the
|
||||
fsyscall-handler, at which point, by definition, a syscall
|
||||
restart is still safe. If the system call number is invalid,
|
||||
the fsys-mode handler will return directly to user-level. This
|
||||
return will trigger a taken-branch trap, but since the trap is
|
||||
taken _after_ restoring the privilege level, the CPU has already
|
||||
left fsys-mode, so no special treatment is needed.
|
||||
PSR.rt Unchanged.
|
||||
PSR.cpl Cleared to 0.
|
||||
PSR.is Unchanged (guaranteed to be 0 on entry to the gate page).
|
||||
PSR.mc Unchanged.
|
||||
PSR.it Unchanged (guaranteed to be 1).
|
||||
PSR.id Unchanged. Note: the ia64 linux kernel never sets this bit.
|
||||
PSR.da Unchanged. Note: the ia64 linux kernel never sets this bit.
|
||||
PSR.dd Unchanged. Note: the ia64 linux kernel never sets this bit.
|
||||
PSR.ss Lazy redirect. If set, "epc" will cause a Single Step Trap to
|
||||
be taken. The trap handler then modifies the saved machine
|
||||
state such that execution resumes in the gate page at
|
||||
syscall_via_break(), with privilege level 3.
|
||||
PSR.ri Unchanged.
|
||||
PSR.ed Unchanged. Note: This bit could only have an effect if an fsys-mode
|
||||
handler performed a speculative load that gets NaTted. If so, this
|
||||
would be the normal & expected behavior, so no special treatment is
|
||||
needed.
|
||||
PSR.bn Unchanged. Note: fsys-mode handlers may clear the bit, if needed.
|
||||
Doing so requires clearing PSR.i and PSR.ic as well.
|
||||
PSR.ia Unchanged. Note: the ia64 linux kernel never sets this bit.
|
||||
|
||||
* Using fast system calls
|
||||
|
||||
To use fast system calls, userspace applications need simply call
|
||||
__kernel_syscall_via_epc(). For example
|
||||
|
||||
-- example fgettimeofday() call --
-- fgettimeofday.S --

#include <asm/asmmacro.h>

GLOBAL_ENTRY(fgettimeofday)
.prologue
.save ar.pfs, r11
mov r11 = ar.pfs
.body

mov r2 = 0xa000000000020660;;	// gate address
				// found by inspection of System.map for the
				// __kernel_syscall_via_epc() function. See
				// below for how to do this for real.

mov b7 = r2
mov r15 = 1087			// gettimeofday syscall
;;
br.call.sptk.many b6 = b7
;;

.restore sp

mov ar.pfs = r11
br.ret.sptk.many rp;;		// return to caller
END(fgettimeofday)

-- end fgettimeofday.S --
|
||||
|
||||
In reality, getting the gate address is accomplished by two extra
|
||||
values passed via the ELF auxiliary vector (include/asm-ia64/elf.h)
|
||||
|
||||
o AT_SYSINFO : is the address of __kernel_syscall_via_epc()
|
||||
o AT_SYSINFO_EHDR : is the address of the kernel gate ELF DSO
|
||||
|
||||
The ELF DSO is a pre-linked library that is mapped in by the kernel at
|
||||
the gate page. It is a proper ELF shared object so, with a dynamic
|
||||
loader that recognises the library, you should be able to make calls to
|
||||
the exported functions within it as with any other shared library.
|
||||
AT_SYSINFO points into the kernel DSO at the
|
||||
__kernel_syscall_via_epc() function for historical reasons (it was
|
||||
used before the kernel DSO) and as a convenience.
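As a sketch of how userspace might read these two values, the C fragment
below uses getauxval() from <sys/auxv.h>. That is an assumption about the C
library in use: getauxval() is a glibc interface that is newer than this
text, and older libraries had to walk the auxiliary vector themselves. The
struct and function names are hypothetical.

```c
#include <elf.h>       /* AT_SYSINFO, AT_SYSINFO_EHDR */
#include <sys/auxv.h>  /* getauxval() */

struct gate_info {
	unsigned long sysinfo;       /* entry point address, 0 if not provided */
	unsigned long sysinfo_ehdr;  /* base of the kernel gate DSO, 0 if absent */
};

/* Read both gate-related entries from the ELF auxiliary vector. */
static struct gate_info lookup_gate(void)
{
	struct gate_info g;

	g.sysinfo = getauxval(AT_SYSINFO);
	g.sysinfo_ehdr = getauxval(AT_SYSINFO_EHDR);
	return g;
}
```

getauxval() returns 0 for an entry the kernel did not supply, so a caller
should check for 0 before branching to the returned address.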
|
83
Documentation/ia64/kvm.txt
Normal file
|
@ -0,0 +1,83 @@
|
|||
Currently, the kvm module is in an EXPERIMENTAL stage on IA64. This means that
its interfaces are not yet stable enough to rely on, so please don't run
critical applications in a virtual machine.
We will try our best to improve it in future versions!
|
||||
|
||||
Guide: How to boot up guests on kvm/ia64
|
||||
|
||||
This guide describes how to enable kvm support on IA-64 systems.
|
||||
|
||||
1. Get the kvm source from git.kernel.org.
|
||||
Userspace source:
|
||||
git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-userspace.git
|
||||
Kernel Source:
|
||||
git clone git://git.kernel.org/pub/scm/linux/kernel/git/xiantao/kvm-ia64.git
|
||||
|
||||
2. Compile the source code.
|
||||
2.1 Compile userspace code:
|
||||
(1)cd ./kvm-userspace
|
||||
(2)./configure
|
||||
(3)cd kernel
|
||||
(4)make sync LINUX=$kernel_dir (kernel_dir is the directory of the kernel source.)
|
||||
(5)cd ..
|
||||
(6)make qemu
|
||||
(7)cd qemu; make install
|
||||
|
||||
2.2 Compile kernel source code:
|
||||
(1) cd ./$kernel_dir
|
||||
(2) make menuconfig
|
||||
(3) Enter the virtualization option, and choose kvm.
|
||||
(4) make
|
||||
(5) Once (4) is done, make modules_install
(6) Build an initrd, and reboot the host machine with the new kernel.
(7) Once (6) is done, cd $kernel_dir/arch/ia64/kvm
|
||||
(8) insmod kvm.ko; insmod kvm-intel.ko
|
||||
|
||||
Note: For step 2, please make sure that the host page size == TARGET_PAGE_SIZE of qemu; otherwise, it may fail.
|
||||
|
||||
3. Get the guest firmware, named Flash.fd, and put it in the right place:
|
||||
(1) If you have the guest firmware (binary) released by Intel Corp for Xen, use it directly.
|
||||
|
||||
(2) If you have no firmware at hand, download its source with
hg clone http://xenbits.xensource.com/ext/efi-vfirmware.hg
You can get the firmware's binary in the directory efi-vfirmware.hg/binaries.
|
||||
|
||||
(3) Rename the firmware you have to Flash.fd, and copy it to /usr/local/share/qemu
|
||||
|
||||
4. Boot up Linux or Windows guests:
|
||||
4.1 Create or install an image for guest boot. If you have Xen experience, it should be easy.
|
||||
|
||||
4.2 Boot up guests using the following command.
|
||||
/usr/local/bin/qemu-system-ia64 -smp xx -m 512 -hda $your_image
|
||||
(xx is the number of virtual processors for the guest; the maximum value is currently 4)
|
||||
|
||||
5. Known possible issue on some platforms with old firmware.

In the event of strange host crashes, try one of the following:

(1): Upgrade your firmware to the latest version.

(2): Apply the patch below to the kernel source.
|
||||
diff --git a/arch/ia64/kernel/pal.S b/arch/ia64/kernel/pal.S
index 0b53344..f02b0f7 100644
--- a/arch/ia64/kernel/pal.S
+++ b/arch/ia64/kernel/pal.S
@@ -84,7 +84,8 @@ GLOBAL_ENTRY(ia64_pal_call_static)
 	mov ar.pfs = loc1
 	mov rp = loc0
 	;;
-	srlz.d		// serialize restoration of psr.l
+	srlz.i		// serialize restoration of psr.l
+	;;
 	br.ret.sptk.many b0
 END(ia64_pal_call_static)
|
||||
|
||||
6. Bug report:
|
||||
If you find any issues when using kvm/ia64, please post the bug info to the kvm-ia64-devel mailing list.
|
||||
https://lists.sourceforge.net/lists/listinfo/kvm-ia64-devel/
|
||||
|
||||
Thanks for your interest! Let's work together, and make kvm/ia64 stronger and stronger!
|
||||
|
||||
|
||||
Xiantao Zhang <xiantao.zhang@intel.com>
|
||||
2008.3.10
|
194
Documentation/ia64/mca.txt
Normal file
|
@ -0,0 +1,194 @@
|
|||
An ad-hoc collection of notes on IA64 MCA and INIT processing. Feel
|
||||
free to update it with notes about any area that is not clear.
|
||||
|
||||
---
|
||||
|
||||
MCA/INIT are completely asynchronous. They can occur at any time, when
|
||||
the OS is in any state. Including when one of the cpus is already
|
||||
holding a spinlock. Trying to get any lock from MCA/INIT state is
|
||||
asking for deadlock. Also the state of structures that are protected
|
||||
by locks is indeterminate, including linked lists.
|
||||
|
||||
---
|
||||
|
||||
The complicated ia64 MCA process. All of this is mandated by Intel's
|
||||
specification for ia64 SAL, error recovery and unwind, it is not as
|
||||
if we have a choice here.
|
||||
|
||||
* MCA occurs on one cpu, usually due to a double bit memory error.
|
||||
This is the monarch cpu.
|
||||
|
||||
* SAL sends an MCA rendezvous interrupt (which is a normal interrupt)
|
||||
to all the other cpus, the slaves.
|
||||
|
||||
* Slave cpus that receive the MCA interrupt call down into SAL, they
|
||||
end up spinning disabled while the MCA is being serviced.
|
||||
|
||||
* If any slave cpu was already spinning disabled when the MCA occurred
|
||||
then it cannot service the MCA interrupt. SAL waits ~20 seconds then
|
||||
sends an unmaskable INIT event to the slave cpus that have not
|
||||
already rendezvoused.
|
||||
|
||||
* Because MCA/INIT can be delivered at any time, including when the cpu
|
||||
is down in PAL in physical mode, the registers at the time of the
|
||||
event are _completely_ undefined. In particular the MCA/INIT
|
||||
handlers cannot rely on the thread pointer, PAL physical mode can
|
||||
(and does) modify TP. It is allowed to do that as long as it resets
|
||||
TP on return. However MCA/INIT events expose us to these PAL
|
||||
internal TP changes. Hence curr_task().
|
||||
|
||||
* If an MCA/INIT event occurs while the kernel was running (not user
|
||||
space) and the kernel has called PAL then the MCA/INIT handler cannot
|
||||
assume that the kernel stack is in a fit state to be used. Mainly
|
||||
because PAL may or may not maintain the stack pointer internally.
|
||||
Because the MCA/INIT handlers cannot trust the kernel stack, they
|
||||
have to use their own, per-cpu stacks. The MCA/INIT stacks are
|
||||
preformatted with just enough task state to let the relevant handlers
|
||||
do their job.
|
||||
|
||||
* Unlike most other architectures, the ia64 struct task is embedded in
|
||||
the kernel stack[1]. So switching to a new kernel stack means that
|
||||
we switch to a new task as well. Because various bits of the kernel
|
||||
assume that current points into the struct task, switching to a new
|
||||
stack also means a new value for current.
|
||||
|
||||
* Once all slaves have rendezvoused and are spinning disabled, the
|
||||
monarch is entered. The monarch now tries to diagnose the problem
|
||||
and decide if it can recover or not.
|
||||
|
||||
* Part of the monarch's job is to look at the state of all the other
|
||||
tasks. The only way to do that on ia64 is to call the unwinder,
|
||||
as mandated by Intel.
|
||||
|
||||
* The starting point for the unwind depends on whether a task is
|
||||
running or not. That is, whether it is on a cpu or is blocked. The
|
||||
monarch has to determine whether or not a task is on a cpu before it
|
||||
knows how to start unwinding it. The tasks that received an MCA or
|
||||
INIT event are no longer running, they have been converted to blocked
|
||||
tasks. But (and it's a big but), the cpus that received the MCA
|
||||
rendezvous interrupt are still running on their normal kernel stacks!
|
||||
|
||||
* To distinguish between these two cases, the monarch must know which
|
||||
tasks are on a cpu and which are not. Hence each slave cpu that
|
||||
switches to an MCA/INIT stack, registers its new stack using
|
||||
set_curr_task(), so the monarch can tell that the _original_ task is
|
||||
no longer running on that cpu. That gives us a decent chance of
|
||||
getting a valid backtrace of the _original_ task.
|
||||
|
||||
* MCA/INIT can be nested, to a depth of 2 on any cpu. In the case of a
|
||||
nested error, we want diagnostics on the MCA/INIT handler that
|
||||
failed, not on the task that was originally running. Again this
|
||||
requires set_curr_task() so the MCA/INIT handlers can register their
|
||||
own stack as running on that cpu. Then a recursive error gets a
|
||||
trace of the failing handler's "task".
|
||||
|
||||
[1] My (Keith Owens) original design called for ia64 to separate its
|
||||
struct task and the kernel stacks. Then the MCA/INIT data would be
|
||||
chained stacks like i386 interrupt stacks. But that required
|
||||
radical surgery on the rest of ia64, plus extra hard wired TLB
|
||||
entries with its associated performance degradation. David
|
||||
Mosberger vetoed that approach. Which meant that separate kernel
|
||||
stacks meant separate "tasks" for the MCA/INIT handlers.
|
||||
|
||||
---
|
||||
|
||||
INIT is less complicated than MCA. Pressing the nmi button or using
|
||||
the equivalent command on the management console sends INIT to all
|
||||
cpus. SAL picks one of the cpus as the monarch and the rest are
|
||||
slaves. All the OS INIT handlers are entered at approximately the same
|
||||
time. The OS monarch prints the state of all tasks and returns, after
|
||||
which the slaves return and the system resumes.
|
||||
|
||||
At least that is what is supposed to happen. Alas there are broken
|
||||
versions of SAL out there. Some drive all the cpus as monarchs. Some
|
||||
drive them all as slaves. Some drive one cpu as monarch, wait for that
|
||||
cpu to return from the OS then drive the rest as slaves. Some versions
|
||||
of SAL cannot even cope with returning from the OS, they spin inside
|
||||
SAL on resume. The OS INIT code has workarounds for some of these
|
||||
broken SAL symptoms, but some simply cannot be fixed from the OS side.
|
||||
|
||||
---
|
||||
|
||||
The scheduler hooks used by ia64 (curr_task, set_curr_task) are layer
|
||||
violations. Unfortunately MCA/INIT start off as massive layer
|
||||
violations (can occur at _any_ time) and they build from there.
|
||||
|
||||
At least ia64 makes an attempt at recovering from hardware errors, but
|
||||
it is a difficult problem because of the asynchronous nature of these
|
||||
errors. When processing an unmaskable interrupt we sometimes need
|
||||
special code to cope with our inability to take any locks.
|
||||
|
||||
---
|
||||
|
||||
How is ia64 MCA/INIT different from x86 NMI?
|
||||
|
||||
* x86 NMI typically gets delivered to one cpu. MCA/INIT gets sent to
|
||||
all cpus.
|
||||
|
||||
* x86 NMI cannot be nested. MCA/INIT can be nested, to a depth of 2
|
||||
per cpu.
|
||||
|
||||
* x86 has a separate struct task which points to one of multiple kernel
|
||||
stacks. ia64 has the struct task embedded in the single kernel
|
||||
stack, so switching stack means switching task.
|
||||
|
||||
* x86 does not call the BIOS so the NMI handler does not have to worry
|
||||
about any registers having changed. MCA/INIT can occur while the cpu
|
||||
is in PAL in physical mode, with undefined registers and an undefined
|
||||
kernel stack.
|
||||
|
||||
* i386 backtrace is not very sensitive to whether a process is running
|
||||
or not. ia64 unwind is very, very sensitive to whether a process is
|
||||
running or not.
|
||||
|
||||
---
|
||||
|
||||
What happens when MCA/INIT is delivered while a cpu is running user
|
||||
space code?
|
||||
|
||||
The user mode registers are stored in the RSE area of the MCA/INIT stack on
|
||||
entry to the OS and are restored from there on return to SAL, so user
|
||||
mode registers are preserved across a recoverable MCA/INIT. Since the
|
||||
OS has no idea what unwind data is available for the user space stack,
|
||||
MCA/INIT never tries to backtrace user space. Which means that the OS
|
||||
does not bother making the user space process look like a blocked task,
|
||||
i.e. the OS does not copy pt_regs and switch_stack to the user space
|
||||
stack. Also the OS has no idea how big the user space RSE and memory
|
||||
stacks are, which makes it too risky to copy the saved state to a user
|
||||
mode stack.
|
||||
|
||||
---
|
||||
|
||||
How do we get a backtrace on the tasks that were running when MCA/INIT
|
||||
was delivered?
|
||||
|
||||
mca.c::ia64_mca_modify_original_stack(). That identifies and
|
||||
verifies the original kernel stack, copies the dirty registers from
|
||||
the MCA/INIT stack's RSE to the original stack's RSE, copies the
|
||||
skeleton struct pt_regs and switch_stack to the original stack, fills
|
||||
in the skeleton structures from the PAL minstate area and updates the
|
||||
original stack's thread.ksp. That makes the original stack look
|
||||
exactly like any other blocked task, i.e. it now appears to be
|
||||
sleeping. To get a backtrace, just start with thread.ksp for the
|
||||
original task and unwind like any other sleeping task.
|
||||
|
||||
---
|
||||
|
||||
How do we identify the tasks that were running when MCA/INIT was
|
||||
delivered?
|
||||
|
||||
If the previous task has been verified and converted to a blocked
|
||||
state, then sos->prev_task on the MCA/INIT stack is updated to point to
|
||||
the previous task. You can look at that field in dumps or debuggers.
|
||||
To help distinguish between the handler and the original tasks,
|
||||
handlers have _TIF_MCA_INIT set in thread_info.flags.
|
||||
|
||||
The sos data is always in the MCA/INIT handler stack, at offset
|
||||
MCA_SOS_OFFSET. You can get that value from mca_asm.h or calculate it
|
||||
as KERNEL_STACK_SIZE - sizeof(struct pt_regs) - sizeof(struct
|
||||
ia64_sal_os_state), with 16 byte alignment for all structures.
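The calculation described above can be sketched in C as follows. The sizes
are caller-supplied placeholders rather than the real kernel values, the
function names are hypothetical, and rounding down to a 16-byte boundary at
each step is this sketch's reading of "16 byte alignment for all
structures"; mca_asm.h remains the authoritative definition.

```c
#include <stddef.h>

/* Round down to a 16-byte boundary. */
static size_t align16_down(size_t x)
{
	return x & ~(size_t)15;
}

/*
 * Offset of the sos data from the base of the handler stack, following
 * the formula in the text: the pt_regs area sits at the top of the
 * stack and the sos data sits directly below it.
 */
static size_t sketch_sos_offset(size_t stack_size, size_t pt_regs_size,
				size_t sos_size)
{
	size_t pt_regs_off = align16_down(stack_size - pt_regs_size);

	return align16_down(pt_regs_off - sos_size);
}
```

With a made-up 32768-byte stack, a 400-byte pt_regs and a 100-byte sos
structure, the intermediate offset is 32368 and the final offset is 32256.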
|
||||
|
||||
Also the comm field of the MCA/INIT task is modified to include the pid
|
||||
of the original task, for humans to use. For example, a comm field of
|
||||
'MCA 12159' means that pid 12159 was running when the MCA was
|
||||
delivered.
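A tool that post-processes dumps could recover the original pid from such a
comm field with a few lines of C. This is a minimal sketch that assumes the
"<name> <pid>" layout shown in the example above; the function name is
hypothetical and nothing like it exists in the kernel sources.

```c
#include <stdio.h>

/*
 * Split a handler comm string of the form "<name> <pid>" (for example
 * "MCA 12159"). Returns 1 on success, 0 if the string does not match.
 */
static int sketch_parse_comm(const char *comm, char name[16], int *pid)
{
	return sscanf(comm, "%15s %d", name, pid) == 2;
}
```

A comm field without a trailing number, such as that of an ordinary task,
makes the function return 0 and leaves *pid untouched.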
|
137
Documentation/ia64/paravirt_ops.txt
Normal file
|
@ -0,0 +1,137 @@
|
|||
Paravirt_ops on IA64
|
||||
====================
|
||||
21 May 2008, Isaku Yamahata <yamahata@valinux.co.jp>
|
||||
|
||||
|
||||
Introduction
|
||||
------------
|
||||
The aim of this documentation is to help with maintainability and/or to
|
||||
encourage people to use paravirt_ops/IA64.
|
||||
|
||||
paravirt_ops (pv_ops for short) is a mechanism for virtualization support in
the Linux kernel on x86. Several approaches to virtualization support were
proposed; paravirt_ops was the one adopted.
Meanwhile, there are now several IA64 virtualization
technologies, such as kvm/IA64, xen/IA64 and many academic IA64
hypervisors, so it is worthwhile to add generic virtualization
infrastructure to Linux/IA64.
|
||||
|
||||
|
||||
What is paravirt_ops?
|
||||
---------------------
|
||||
It has been developed on x86 as virtualization support via API, not ABI.
|
||||
It allows each hypervisor to override operations which are important for
|
||||
hypervisors at API level. And it allows a single kernel binary to run on
|
||||
all supported execution environments including native machine.
|
||||
Essentially paravirt_ops is a set of function pointers which represent
|
||||
operations corresponding to low level sensitive instructions and high
|
||||
level functionalities in various areas. But one significant difference
|
||||
from usual function pointer table is that it allows optimization with
|
||||
binary patch. It is because some of these operations are very
|
||||
performance sensitive and indirect call overhead is not negligible.
|
||||
With binary patch, indirect C function call can be transformed into
|
||||
direct C function call or in-place execution to eliminate the overhead.
|
||||
|
||||
Thus, operations of paravirt_ops are classified into three categories.
|
||||
- simple indirect call
|
||||
These operations correspond to high level functionality so that the
|
||||
overhead of indirect call isn't very important.
|
||||
|
||||
- indirect call which allows optimization with binary patch
|
||||
Usually these operations correspond to low level instructions. They
|
||||
are called frequently and performance critical. So the overhead is
|
||||
very important.
|
||||
|
||||
- a set of macros for hand written assembly code
|
||||
Hand-written assembly code (.S files) also needs paravirtualization
because it includes sensitive instructions or because some of its code
paths are very performance critical.
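The first category, a simple indirect call through a table of function
pointers, can be sketched in C as below. Every name here is hypothetical
and the returned values are placeholders; this is not the layout of the
structures in the IA64 headers, and it does not attempt the binary patching
used by the second category.

```c
/* Hypothetical ops table; not the structures in the IA64 headers. */
struct sketch_pv_ops {
	const char *name;
	unsigned long (*get_psr)(void);
};

/* Placeholder implementations standing in for native and hypervisor code. */
static unsigned long sketch_native_get_psr(void) { return 0x1000; }
static unsigned long sketch_hyper_get_psr(void)  { return 0x2000; }

/* The table defaults to native; a hypervisor overrides entries early in boot. */
static struct sketch_pv_ops sketch_ops = { "native", sketch_native_get_psr };

static void sketch_install_hypervisor(void)
{
	sketch_ops.name = "hypervisor";
	sketch_ops.get_psr = sketch_hyper_get_psr;
}

/* Callers always go through the table, so one binary serves both cases. */
static unsigned long sketch_get_psr(void)
{
	return sketch_ops.get_psr();
}
```

The cost of this scheme is the extra indirection on every call, which is the
overhead that patching the call site is meant to remove.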
|
||||
|
||||
|
||||
The relation to the IA64 machine vector
|
||||
---------------------------------------
|
||||
Linux/IA64 has the IA64 machine vector functionality which allows the
|
||||
kernel to switch implementations (e.g. initialization, ipi, dma api...)
|
||||
depending on executing platform.
|
||||
We can replace some implementations very easily defining a new machine
|
||||
vector. Thus another approach for virtualization support would be
|
||||
enhancing the machine vector functionality.
|
||||
But paravirt_ops approach was taken because
|
||||
- virtualization support needs wider support than machine vector does.
|
||||
e.g. low level instruction paravirtualization. It must be
|
||||
initialized very early before platform detection.
|
||||
|
||||
- virtualization support needs more functionality like binary patch.
|
||||
Probably the calling overhead might not be very large compared to the
|
||||
emulation overhead of virtualization. However in the native case, the
|
||||
overhead should be eliminated completely.
|
||||
A single kernel binary should run on each environment including native,
|
||||
and the overhead of paravirt_ops on native environment should be as
|
||||
small as possible.
|
||||
|
||||
- for full virtualization technology, e.g. KVM/IA64 or
|
||||
Xen/IA64 HVM domain, the result would be
|
||||
(the emulated platform machine vector. probably dig) + (pv_ops).
|
||||
This means that the virtualization support layer should be under
|
||||
the machine vector layer.
|
||||
|
||||
Possibly it might be better to move some function pointers from
|
||||
paravirt_ops to machine vector. In fact, Xen domU case utilizes both
|
||||
pv_ops and machine vector.
|
||||
|
||||
|
||||
IA64 paravirt_ops
|
||||
-----------------
|
||||
In this section, the concrete paravirt_ops will be discussed.
|
||||
Because of the architecture difference between ia64 and x86, the
|
||||
resulting set of functions is very different from x86 pv_ops.
|
||||
|
||||
- C function pointer tables
|
||||
They are not very performance critical so that simple C indirect
|
||||
function call is acceptable. The following structures are defined at
|
||||
this moment. For details see linux/include/asm-ia64/paravirt.h
|
||||
- struct pv_info
|
||||
This structure describes the execution environment.
|
||||
- struct pv_init_ops
|
||||
This structure describes the various initialization hooks.
|
||||
- struct pv_iosapic_ops
|
||||
This structure describes hooks to iosapic operations.
|
||||
- struct pv_irq_ops
|
||||
This structure describes hooks to irq related operations
|
||||
- struct pv_time_op
|
||||
This structure describes hooks to steal time accounting.
|
||||
|
||||
- a set of indirect calls which need optimization
|
||||
Currently this class of functions correspond to a subset of IA64
|
||||
intrinsics. At this moment the optimization with binary patch isn't
|
||||
implemented yet.
|
||||
struct pv_cpu_op is defined. For details see
|
||||
linux/include/asm-ia64/paravirt_privop.h
|
||||
Mostly they correspond to ia64 intrinsics 1-to-1.
|
||||
Caveat: Now they are defined as C indirect function pointers, but in
|
||||
order to support binary patch optimization, they will be changed
|
||||
using GCC extended inline assembly code.
|
||||
|
||||
- a set of macros for hand written assembly code (.S files)
|
||||
For maintainability, the approach taken for .S files is to keep a single
source file and compile it multiple times with different macro definitions.
Each pv_ops instance must define those macros to compile.
|
||||
The important thing here is that sensitive, but non-privileged
|
||||
instructions must be paravirtualized and that some privileged
|
||||
instructions also need paravirtualization for reasonable performance.
|
||||
Developers who modify .S files must be aware of that. At this moment
|
||||
an easy checker is implemented to detect paravirtualization breakage.
|
||||
But it doesn't cover all the cases.
|
||||
|
||||
Sometimes this set of macros is called pv_cpu_asm_op. But there is no
|
||||
corresponding structure in the source code.
|
||||
Those macros mostly correspond 1:1 to a subset of privileged
instructions. See linux/include/asm-ia64/native/inst.h.
Some functions written in assembly also need to be overridden, so
each pv_ops instance has to define some macros. Again see
linux/include/asm-ia64/native/inst.h.
|
||||
|
||||
|
||||
Those structures must be initialized very early, before start_kernel.
They would probably be initialized in head.S using multiple entry points or some other trick.
For the native case implementation see linux/arch/ia64/kernel/paravirt.c.
|
151
Documentation/ia64/serial.txt
Normal file
|
@ -0,0 +1,151 @@
|
|||
SERIAL DEVICE NAMING
|
||||
|
||||
As of 2.6.10, serial devices on ia64 are named based on the
|
||||
order of ACPI and PCI enumeration. The first device in the
|
||||
ACPI namespace (if any) becomes /dev/ttyS0, the second becomes
|
||||
/dev/ttyS1, etc., and PCI devices are named sequentially
|
||||
starting after the ACPI devices.
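The naming rule above amounts to a small index calculation, sketched here
in C. The enum and function names are hypothetical and are not taken from
the serial driver; "pos" is the zero-based position of a device within its
own enumeration and "n_acpi" is the number of ACPI serial devices found.

```c
#include <stdio.h>

enum sketch_bus { SKETCH_ACPI, SKETCH_PCI };

/* ACPI devices are numbered first; PCI devices follow them. */
static int sketch_tty_index(enum sketch_bus bus, int pos, int n_acpi)
{
	return bus == SKETCH_ACPI ? pos : n_acpi + pos;
}

/* Format the resulting device node name into buf. */
static void sketch_tty_name(char *buf, size_t len, enum sketch_bus bus,
			    int pos, int n_acpi)
{
	snprintf(buf, len, "/dev/ttyS%d", sketch_tty_index(bus, pos, n_acpi));
}
```

For a machine with one ACPI serial device, the first PCI device would be
named /dev/ttyS1 and the second /dev/ttyS2 under this rule.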
|
||||
|
||||
Prior to 2.6.10, there were confusing exceptions to this:
|
||||
|
||||
- Firmware on some machines (mostly from HP) provides an HCDP
|
||||
table[1] that tells the kernel about devices that can be used
|
||||
as a serial console. If the user specified "console=ttyS0"
|
||||
or the EFI ConOut path contained only UART devices, the
|
||||
kernel registered the device described by the HCDP as
|
||||
/dev/ttyS0.
|
||||
|
||||
- If there was no HCDP, we assumed there were UARTs at the
|
||||
legacy COM port addresses (I/O ports 0x3f8 and 0x2f8), so
|
||||
the kernel registered those as /dev/ttyS0 and /dev/ttyS1.
|
||||
|
||||
Any additional ACPI or PCI devices were registered sequentially
|
||||
after /dev/ttyS0 as they were discovered.
|
||||
|
||||
With an HCDP, device names changed depending on EFI configuration
|
||||
and "console=" arguments. Without an HCDP, device names didn't
|
||||
change, but we registered devices that might not really exist.

    For example, an HP rx1600 with a single built-in serial port
    (described in the ACPI namespace) plus an MP[2] (a PCI device) has
    these ports:

                                pre-2.6.10      pre-2.6.10
                    MMIO       (EFI console    (EFI console
                   address      on builtin)     on MP port)    2.6.10
                  ==========    ==========      ==========     ======
      builtin     0xff5e0000      ttyS0           ttyS1         ttyS0
      MP UPS      0xf8031000      ttyS1           ttyS2         ttyS1
      MP Console  0xf8030000      ttyS2           ttyS0         ttyS2
      MP 2        0xf8030010      ttyS3           ttyS3         ttyS3
      MP 3        0xf8030038      ttyS4           ttyS4         ttyS4
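
The 2.6.10 naming rule can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not kernel code: the function name name_serial_devices is hypothetical, and the order of the MP ports within the PCI list is assumed to match the rx1600 table.

```python
def name_serial_devices(acpi_devices, pci_devices):
    """Assign /dev/ttyS<n> names: ACPI devices first, then PCI devices.

    Both arguments are lists already sorted in enumeration order
    (an assumption of this sketch).
    """
    ordered = list(acpi_devices) + list(pci_devices)
    return {device: "/dev/ttyS%d" % index for index, device in enumerate(ordered)}


# rx1600 example: one ACPI port, then the MP ports (PCI).
names = name_serial_devices(
    ["builtin"],
    ["MP UPS", "MP Console", "MP 2", "MP 3"],
)
```

With this rule the names no longer depend on the EFI console setting or on "console=" arguments, which is the point of the 2.6.10 column in the table.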

CONSOLE SELECTION

    EFI knows what your console devices are, but it doesn't tell the
    kernel quite enough to actually locate them. The DIG64 HCDP
    table[1] does tell the kernel where potential serial console
    devices are, but not all firmware supplies it. Also, EFI supports
    multiple simultaneous consoles and doesn't tell the kernel which
    should be the "primary" one.

    So how do you tell Linux which console device to use?

      - If your firmware supplies the HCDP, it is simplest to
        configure EFI with a single device (either a UART or a VGA
        card) as the console. Then you don't need to tell Linux
        anything; the kernel will automatically use the EFI console.

        (This works only in 2.6.6 or later; prior to that you had
        to specify "console=ttyS0" to get a serial console.)

      - Without an HCDP, Linux defaults to a VGA console unless you
        specify a "console=" argument.

    NOTE: Don't assume that a serial console device will be /dev/ttyS0.
    It might be ttyS1, ttyS2, etc. Make sure you have the appropriate
    entries in /etc/inittab (for getty) and /etc/securetty (to allow
    root login).

EARLY SERIAL CONSOLE

    The kernel can't start using a serial console until it knows where
    the device lives. Normally this happens when the driver enumerates
    all the serial devices, which can happen a minute or more after the
    kernel starts booting.

    2.6.10 and later kernels have an "early uart" driver that works
    very early in the boot process. The kernel will automatically use
    this if the user supplies an argument like "console=uart,io,0x3f8",
    or if the EFI console path contains only a UART device and the
    firmware supplies an HCDP.
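
To make the shape of the early-console argument concrete, here is a small Python sketch that splits it into its fields. It is not the kernel's parser. The text above only shows the "io" form; accepting "mmio" and an optional trailing options field are assumptions of this sketch, and parse_early_uart is a hypothetical name.

```python
def parse_early_uart(arg):
    """Split 'console=uart,<space>,<address>[,<options>]' into its fields.

    Returns None when the argument is not in that form. The "mmio"
    space and the options field are assumptions, not taken from the
    text above.
    """
    key, _, value = arg.partition("=")
    if key != "console":
        return None
    fields = value.split(",")
    if len(fields) < 3 or fields[0] != "uart" or fields[1] not in ("io", "mmio"):
        return None
    try:
        address = int(fields[2], 16)  # accepts an optional 0x prefix
    except ValueError:
        return None
    options = fields[3] if len(fields) > 3 else None
    return {"space": fields[1], "address": address, "options": options}


early = parse_early_uart("console=uart,io,0x3f8")
```

Note that the fields are separated by commas only; a stray space inside the argument (for example "uart, io") makes this sketch reject it, and on a real command line it would split the argument in two.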

TROUBLESHOOTING SERIAL CONSOLE PROBLEMS

    No kernel output after elilo prints "Uncompressing Linux... done":

      - You specified "console=ttyS0" but Linux changed the device
        to which ttyS0 refers. Configure exactly one EFI console
        device[3] and remove the "console=" option.

      - The EFI console path contains both a VGA device and a UART.
        EFI and elilo use both, but Linux defaults to VGA. Remove
        the VGA device from the EFI console path[3].

      - Multiple UARTs selected as EFI console devices. EFI and
        elilo use all selected devices, but Linux uses only one.
        Make sure only one UART is selected in the EFI console
        path[3].

      - You're connected to an HP MP port[2] but have a non-MP UART
        selected as EFI console device. EFI uses the MP as a
        console device even when it isn't explicitly selected.
        Either move the console cable to the non-MP UART, or change
        the EFI console path[3] to the MP UART.

    Long pause (60+ seconds) between "Uncompressing Linux... done" and
    start of kernel output:

      - No early console because you used "console=ttyS<n>". Remove
        the "console=" option if your firmware supplies an HCDP.

      - If you don't have an HCDP, the kernel doesn't know where
        your console lives until the driver discovers serial
        devices. Use "console=uart,io,0x3f8" (or appropriate
        address for your machine).

    Kernel and init script output works fine, but no "login:" prompt:

      - Add getty entry to /etc/inittab for console tty. Look for
        the "Adding console on ttyS<n>" message that tells you which
        device is the console.

    "login:" prompt, but can't login as root:

      - Add entry to /etc/securetty for console tty.

    No ACPI serial devices found in 2.6.17 or later:

      - Turn on CONFIG_PNP and CONFIG_PNPACPI. Prior to 2.6.17, ACPI
        serial devices were discovered by 8250_acpi. In 2.6.17,
        8250_acpi was replaced by the combination of 8250_pnp and
        CONFIG_PNPACPI.
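
The two login-related checks above (getty entry and securetty entry) can be scripted. The following Python sketch is a rough helper, not an existing tool: the function names are hypothetical, it assumes the usual id:runlevels:action:process layout of /etc/inittab, and it relies only on the "Adding console on ttyS<n>" prefix of the kernel message (the rest of the sample log line is made up).

```python
import re


def console_tty(kernel_log):
    """Return the tty named in the "Adding console on ttyS<n>" message, or None."""
    match = re.search(r"Adding console on (ttyS\d+)", kernel_log)
    return match.group(1) if match else None


def login_problems(tty, inittab_text, securetty_text):
    """List what is missing for a login prompt and root login on tty."""
    problems = []
    has_getty = False
    for line in inittab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":", 3)  # id:runlevels:action:process
        if len(fields) == 4 and tty in fields[3].split():
            has_getty = True
    if not has_getty:
        problems.append("add a getty entry for %s to /etc/inittab" % tty)
    listed = [entry.strip() for entry in securetty_text.splitlines()]
    if tty not in listed:
        problems.append("add %s to /etc/securetty" % tty)
    return problems


sample_log = "Adding console on ttyS1 at MMIO 0xf8030000\n"  # illustrative line
sample_inittab = "id:3:initdefault:\nS1:2345:respawn:/sbin/agetty -L ttyS1 9600 vt100\n"
```

In practice the log text would come from dmesg and the other two arguments from the contents of /etc/inittab and /etc/securetty.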


[1] http://www.dig64.org/specifications/agreement
    The table was originally defined as the "HCDP" for "Headless
    Console/Debug Port." The current version is the "PCDP" for
    "Primary Console and Debug Port Devices."

[2] The HP MP (management processor) is a PCI device that provides
    several UARTs. One of the UARTs is often used as a console; the
    EFI Boot Manager identifies it as "Acpi(HWP0002,700)/Pci(...)/Uart".
    The external connection is usually a 25-pin connector, and a
    special dongle converts that to three 9-pin connectors, one of
    which is labelled "Console."

[3] EFI console devices are configured using the EFI Boot Manager
    "Boot option maintenance" menu. You may have to interrupt the
    boot sequence to use this menu, and you will have to reset the
    box after changing console configuration.

183
Documentation/ia64/xen.txt
Normal file
@ -0,0 +1,183 @@
       Recipe for getting/building/running Xen/ia64 with pv_ops
       --------------------------------------------------------

This recipe describes how to get the xen-ia64 source, build it,
and run domU with pv_ops.

============
Requirements
============

  - python
  - mercurial
    Mercurial (aka "hg") is open-source source code
    management software. See:
    http://www.selenic.com/mercurial/wiki/
  - git
  - bridge-utils

=================================
Getting and Building Xen and Dom0
=================================

  My environment is:
    Machine    : Tiger4
    Domain0 OS : RHEL5
    DomainU OS : RHEL5

 1. Download source
    # hg clone http://xenbits.xensource.com/ext/ia64/xen-unstable.hg
    # cd xen-unstable.hg
    # hg clone http://xenbits.xensource.com/ext/ia64/linux-2.6.18-xen.hg

 2. # make world

 3. # make install-tools

 4. copy kernels and xen
    # cp xen/xen.gz /boot/efi/efi/redhat/
    # cp build-linux-2.6.18-xen_ia64/vmlinux.gz \
      /boot/efi/efi/redhat/vmlinuz-2.6.18.8-xen

 5. make initrd for Dom0/DomU
    # make -C linux-2.6.18-xen.hg ARCH=ia64 modules_install \
      O=$(/bin/pwd)/build-linux-2.6.18-xen_ia64
    # mkinitrd -f /boot/efi/efi/redhat/initrd-2.6.18.8-xen.img \
      2.6.18.8-xen --builtin mptspi --builtin mptbase \
      --builtin mptscsih --builtin uhci-hcd --builtin ohci-hcd \
      --builtin ehci-hcd

================================
Making a disk image for guest OS
================================

 1. make file
    # dd if=/dev/zero of=/root/rhel5.img bs=1M seek=4096 count=0
    # mke2fs -F -j /root/rhel5.img
    # mount -o loop /root/rhel5.img /mnt
    # cp -ax /{dev,var,etc,usr,bin,sbin,lib} /mnt
    # mkdir /mnt/{root,proc,sys,home,tmp}

    Note: Some device files may be missing after the copy. If so,
    create them with mknod, or use tar instead of cp.

 2. modify DomU's fstab
    # vi /mnt/etc/fstab
       /dev/xvda1  /            ext3    defaults        1 1
       none        /dev/pts     devpts  gid=5,mode=620  0 0
       none        /dev/shm     tmpfs   defaults        0 0
       none        /proc        proc    defaults        0 0
       none        /sys         sysfs   defaults        0 0

 3. modify inittab
    set runlevel to 3 to avoid X trying to start
    # vi /mnt/etc/inittab
       id:3:initdefault:
    Start a getty on the hvc0 console
       X0:2345:respawn:/sbin/mingetty hvc0
    tty1-6 mingetty can be commented out

 4. add hvc0 into /etc/securetty
    # vi /mnt/etc/securetty (add hvc0)

 5. umount
    # umount /mnt

FYI, virt-manager can also make a disk image for the guest OS.
It is a GUI tool and makes this easy.
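
The dd invocation in step 1 above writes no data (count=0); it only seeks 4096 blocks of 1 MiB, so the result is a 4 GiB file that is normally sparse. The same effect can be sketched in Python as below. The helper name make_sparse_image is hypothetical, and whether the file is actually stored sparsely depends on the filesystem.

```python
import os
import tempfile


def make_sparse_image(path, size_mib):
    """Create (or reset) a file of size_mib MiB without writing data blocks."""
    with open(path, "wb") as image:
        image.truncate(size_mib * 1024 * 1024)  # extend the file to the target size
    return os.path.getsize(path)


# Demo with a small 4 MiB image in a temporary directory; the recipe's
# image would be make_sparse_image("/root/rhel5.img", 4096).
with tempfile.TemporaryDirectory() as tmp:
    demo_size = make_sparse_image(os.path.join(tmp, "demo.img"), 4)
```

Because no blocks are written up front, disk space is consumed only as mke2fs and the later copy fill the image.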

==================
Boot Xen & Domain0
==================

 1. replace elilo
    The elilo shipped with RHEL5 can boot Xen and Dom0.
    If you use an older elilo (e.g. from RHEL4), please download a
    newer one from
    http://elilo.sourceforge.net/cgi-bin/blosxom
    and copy it into /boot/efi/efi/redhat/
    # cp elilo-3.6-ia64.efi /boot/efi/efi/redhat/elilo.efi

 2. modify elilo.conf (like the below)
    # vi /boot/efi/efi/redhat/elilo.conf
     prompt
     timeout=20
     default=xen
     relocatable

     image=vmlinuz-2.6.18.8-xen
             label=xen
             vmm=xen.gz
             initrd=initrd-2.6.18.8-xen.img
             read-only
             append=" -- rhgb root=/dev/sda2"

The append options before "--" are for the Xen hypervisor;
the options after "--" are for dom0.

FYI, your machine may need console options like
"com1=19200,8n1 console=vga,com1". For example,
append="com1=19200,8n1 console=vga,com1 -- rhgb console=tty0 \
console=ttyS0 root=/dev/sda2"
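
The "--" convention in the append line can be illustrated with a short Python sketch that separates hypervisor options from dom0 options. It is not how elilo or Xen actually parse the line; split_append is a hypothetical helper, and treating a missing "--" as an error is a choice made for this sketch.

```python
def split_append(append):
    """Split an elilo append string into (xen_options, dom0_options)."""
    tokens = append.split()
    if "--" not in tokens:
        raise ValueError('no "--" separator found')
    separator = tokens.index("--")
    return tokens[:separator], tokens[separator + 1:]


xen_options, dom0_options = split_append(
    "com1=19200,8n1 console=vga,com1 -- rhgb console=tty0 console=ttyS0 root=/dev/sda2"
)
```

For the first example, append=" -- rhgb root=/dev/sda2", the hypervisor list is empty and everything goes to dom0.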

=====================================
Getting and Building domU with pv_ops
=====================================

 1. get pv_ops tree
    # git clone http://people.valinux.co.jp/~yamahata/xen-ia64/linux-2.6-xen-ia64.git/

 2. git branch (if necessary)
    # cd linux-2.6-xen-ia64/
    # git checkout -b your_branch origin/xen-ia64-domu-minimal-2008may19
    (Note: The current branch is xen-ia64-domu-minimal-2008may19,
    but you may find a newer branch. Use "git branch -r" to list
    the branches.
    http://people.valinux.co.jp/~yamahata/xen-ia64/for_eagl/linux-2.6-ia64-pv-ops.git/
    is also available. The tree is based on
    git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 test)


 3. copy .config for pv_ops of domU
    # cp arch/ia64/configs/xen_domu_wip_defconfig .config

 4. make kernel with pv_ops
    # make oldconfig
    # make

 5. install the kernel and initrd
    # cp vmlinux.gz /boot/efi/efi/redhat/vmlinuz-2.6-pv_ops-xenU
    # make modules_install
    # mkinitrd -f /boot/efi/efi/redhat/initrd-2.6-pv_ops-xenU.img \
      2.6.26-rc3xen-ia64-08941-g1b12161 --builtin mptspi \
      --builtin mptbase --builtin mptscsih --builtin uhci-hcd \
      --builtin ohci-hcd --builtin ehci-hcd

========================
Boot DomainU with pv_ops
========================

 1. make config of DomU
   # vi /etc/xen/rhel5
     kernel = "/boot/efi/efi/redhat/vmlinuz-2.6-pv_ops-xenU"
     ramdisk = "/boot/efi/efi/redhat/initrd-2.6-pv_ops-xenU.img"
     vcpus = 1
     memory = 512
     name = "rhel5"
     disk = [ 'file:/root/rhel5.img,xvda1,w' ]
     root = "/dev/xvda1 ro"
     extra= "rhgb console=hvc0"

 2. After booting Xen and dom0, start xend
   # /etc/init.d/xend start
   (For debugging: # XEND_DEBUG=1 xend trace_start)

 3. start domU
   # xm create -c rhel5
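
The DomU config in step 1 above is written in Python syntax, so it can be loaded and sanity-checked before running "xm create". The sketch below is not part of the Xen tools: load_domu_config and check_domu_config are hypothetical helpers, and the rule that the root device must match a guest device named in "disk" is an assumption drawn from the values used in this recipe (xvda1 in fstab, disk and root).

```python
CONFIG = '''
kernel = "/boot/efi/efi/redhat/vmlinuz-2.6-pv_ops-xenU"
ramdisk = "/boot/efi/efi/redhat/initrd-2.6-pv_ops-xenU.img"
vcpus = 1
memory = 512
name = "rhel5"
disk = [ 'file:/root/rhel5.img,xvda1,w' ]
root = "/dev/xvda1 ro"
extra= "rhgb console=hvc0"
'''


def load_domu_config(text):
    """Evaluate the config text and return its settings as a dict."""
    settings = {}
    exec(text, {"__builtins__": {}}, settings)  # plain assignments only
    return settings


def check_domu_config(settings):
    """Return a list of problems found in the settings (empty if none)."""
    problems = []
    for key in ("kernel", "ramdisk", "name", "disk", "root"):
        if key not in settings:
            problems.append("missing setting: %s" % key)
    # Each disk entry is 'backend,guest_device,mode'.
    guest_devices = [entry.split(",")[1] for entry in settings.get("disk", [])]
    root_fields = settings.get("root", "").split()
    if root_fields and root_fields[0].split("/")[-1] not in guest_devices:
        problems.append("root device %s is not listed in disk" % root_fields[0])
    return problems


settings = load_domu_config(CONFIG)
problems = check_domu_config(settings)
```

Only run a check like this on config files you trust, since loading the file executes it.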

=========
Reference
=========
- Wiki of Xen/IA64 upstream merge
  http://wiki.xensource.com/xenwiki/XenIA64/UpstreamMerge

Written by Akio Takebe <takebe_akio@jp.fujitsu.com> on 28 May 2008