From: Andi Kleen Big x86-64 merge for 2.6.8rc2-mm1. This is for post 2.6.8 mainline, but it would be good to already get some more testing in -mm. This patch has only the x86-64 specific bits. I'm sending the other stuff separately. Mainly it is lots of smaller bug fixes. A bigger change was to finally convert the IOMMU code over to work with generic devices. This helps for dealing with ISA-like devices. Everything should work with gcc 3.5-mainline now. This needed some restructuring in a few system call entry points to avoid misoptimization. I changed the "fnclex" fix inherited from i386 to use a faster lazy kernel math exception checking method now. i386 probably should do this too at some point. Again the machine check code has more fixes. Hopefully it is nearly perfect now. Main change is that it will log left over machine checks from before the last reboot. This allows to catch machine checks that cause the system to reboot. "lazy" scatter list merging is enabled by default now (concept nicked from ppc64), and the IOMMU code allows DAC in more cases now. This could improve block IO performance in some cases. And the command line should not get truncated anymore. And various other stuff, see below. Detailed change log: - Add 32bit emulation for PTRACE_GETEVENTMSG - Fix kernel_fpu_{begin,end} for preemptive kernels (Alexander Nyberg) - Readd proper check for biomerge (got lost) - Set up 32bit vsyscall page for ptrace early - Add 32bit emulation for lookup_dcookie() for oprofile - Export copy_page / clear_page - Use rex prefix in save_init_fpu fxsave (Jan Beulich) - Make it compile again - Fix handling of hwdev == NULL (= ISA/LPC devices) in swiotlb - Convert PCI DMA code to dma devices - Change IOMMU code to use dummy fallback device instead of hardcoded NULL tests everywhere. - Test iommu_sac_force instead of nommu for DAC supported macro (will cause more drivers to use DAC) - Harden non IOMMU dma_alloc_consistent code to fail less likely. - Drop ISA dependencies on IRDA drivers - Fix NMI watchdog documentation (suggested by Alexander Nyberg) - Remove use of strsep in option parsers - Remove duplicated exports (Arjan van der Ven) - Fix EFAULT checking in ptrace (John Blackwood) - Update defconfig - Remove dead URL from boot/setup.S (R.J. Wysocki) - Support iommu=force for swiotlb too. - Use compat_sigval_t instead of sigval_t32 (Al Viro) - Nanooptimization in 32bit ptregs calls - Fix gcc 3.5 compilation in mtrr.h - Pass pt_regs as pointer to avoid illegal pass by reference (for gcc 3.5) - Make set_bit take int not long (Harald Dunkel) - Avoid panic on pci_map_sg and pci_alloc_consistent overflow in GART IOMMU - Handle large lost time delays in HPET code (Suresh B. Siddha) - Work around theoretical bugs in prefetch handling (suggested by Jamie Lokier) - Remove mtrr_strings declaration for gcc 3.5 - Set KBUILD_IMAGE for make rpm (William Lee Irwin III) - Add iommu=noaperture to not touch the aperture - Clean up argument parsing for iommu= option - Export symbols for xchgadd based rwsems (still disabled) - Define iommu_bio_merge for !CONFIG_GART_IOMMU - Don't use backwards rep ; movsb for memmove - Out line bitmap search functions (saves 8k .text, from i386) - Convert bitmap search functions to 64bit accesses and optimize them a bit. - Handle corrupted page tables in page fault handler - Set iommu_merge (without force) to on by default again. - Don't do bio merging by default for iommu=merge. This should make it safe to use again - Add iommu=biomerge option to enable BIO merging (like old iommu=merge) - Fix iommu=memaper=... parsing - More MCE fixes (based on a patch by Eric Morton, heavily changed by me) - Fix check for banks causing exceptions - Allow to reinit MCEs later even after mce=off, fix wrong use of __initdata to disable at boot, but reenable later. - Log left over machine checks after boot and resume - Fix missing prototype warning with CPU_FREQ on - Fix parsing of noexec=on (Ian Hastie) - Fix warning in ia32_binfmt.c - Resync time variable cpu frequency handling with i386 - Resync msr.c with i386 - Add 0x60 level 1 intel cache descriptor (from i386) - Remove duplicated 32bit ioctls (Arnd Bergmann) - Enable -msoft-float (from i386) - Use faster version of FPU hang fix - handle the exception * a bit experimental, if you see "kernel ... math error" events in the log please report. Signed-off-by: Andrew Morton --- 25-akpm/Documentation/x86_64/boot-options.txt | 20 + 25-akpm/arch/x86_64/Makefile | 2 25-akpm/arch/x86_64/defconfig | 57 ++-- 25-akpm/arch/x86_64/ia32/ia32_binfmt.c | 3 25-akpm/arch/x86_64/ia32/ia32_ioctl.c | 15 - 25-akpm/arch/x86_64/ia32/ia32_signal.c | 33 +- 25-akpm/arch/x86_64/ia32/ia32entry.S | 33 +- 25-akpm/arch/x86_64/ia32/ptrace32.c | 6 25-akpm/arch/x86_64/ia32/sys_ia32.c | 21 + 25-akpm/arch/x86_64/kernel/aperture.c | 23 - 25-akpm/arch/x86_64/kernel/early_printk.c | 17 - 25-akpm/arch/x86_64/kernel/entry.S | 16 - 25-akpm/arch/x86_64/kernel/ioport.c | 6 25-akpm/arch/x86_64/kernel/mce.c | 117 +++++---- 25-akpm/arch/x86_64/kernel/msr.c | 315 ++++++++++++-------------- 25-akpm/arch/x86_64/kernel/pci-dma.c | 10 25-akpm/arch/x86_64/kernel/pci-gart.c | 259 +++++++++++---------- 25-akpm/arch/x86_64/kernel/pci-nommu.c | 52 ++-- 25-akpm/arch/x86_64/kernel/process.c | 14 - 25-akpm/arch/x86_64/kernel/ptrace.c | 14 - 25-akpm/arch/x86_64/kernel/setup.c | 3 25-akpm/arch/x86_64/kernel/setup64.c | 26 +- 25-akpm/arch/x86_64/kernel/signal.c | 23 - 25-akpm/arch/x86_64/kernel/time.c | 96 ++++++- 25-akpm/arch/x86_64/kernel/traps.c | 54 +++- 25-akpm/arch/x86_64/kernel/x8664_ksyms.c | 24 + 25-akpm/arch/x86_64/lib/Makefile | 2 25-akpm/arch/x86_64/lib/bitops.c | 136 +++++++++++ 25-akpm/arch/x86_64/lib/bitstr.c | 3 25-akpm/arch/x86_64/lib/memmove.c | 16 - 25-akpm/arch/x86_64/mm/fault.c | 77 ++++-- 25-akpm/arch/x86_64/mm/init.c | 18 + 25-akpm/include/asm-x86_64/bitops.h | 130 ---------- 25-akpm/include/asm-x86_64/dma-mapping.h | 133 ++++++++++ 25-akpm/include/asm-x86_64/i387.h | 24 + 25-akpm/include/asm-x86_64/ia32.h | 10 25-akpm/include/asm-x86_64/io.h | 7 25-akpm/include/asm-x86_64/mtrr.h | 2 25-akpm/include/asm-x86_64/pci.h | 253 +------------------- 25-akpm/include/asm-x86_64/proto.h | 2 25-akpm/include/asm-x86_64/swiotlb.h | 36 ++ 41 files changed, 1140 insertions(+), 968 deletions(-) diff -puN arch/x86_64/defconfig~x86-64-merge-for-268rc2-mm1 arch/x86_64/defconfig --- 25/arch/x86_64/defconfig~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.970008248 -0700 +++ 25-akpm/arch/x86_64/defconfig 2004-07-31 16:54:51.029999128 -0700 @@ -35,6 +35,7 @@ CONFIG_IKCONFIG_PROC=y # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y +CONFIG_KALLSYMS_EXTRA_PASS=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_IOSCHED_NOOP=y @@ -104,8 +105,8 @@ CONFIG_ACPI_FAN=y CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_THERMAL=y # CONFIG_ACPI_ASUS is not set -CONFIG_ACPI_TOSHIBA=y -CONFIG_ACPI_DEBUG=y +# CONFIG_ACPI_TOSHIBA is not set +# CONFIG_ACPI_DEBUG is not set CONFIG_ACPI_BUS=y CONFIG_ACPI_EC=y CONFIG_ACPI_POWER=y @@ -115,7 +116,20 @@ CONFIG_ACPI_SYSTEM=y # # CPU Frequency scaling # -# CONFIG_CPU_FREQ is not set +CONFIG_CPU_FREQ=y +CONFIG_CPU_FREQ_PROC_INTF=y +CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y +# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set +CONFIG_CPU_FREQ_GOV_PERFORMANCE=y +CONFIG_CPU_FREQ_GOV_POWERSAVE=y +CONFIG_CPU_FREQ_GOV_USERSPACE=y +# CONFIG_CPU_FREQ_24_API is not set +CONFIG_CPU_FREQ_TABLE=y + +# +# CPUFreq processor drivers +# +CONFIG_X86_POWERNOW_K8=y # # Bus options (PCI etc.) @@ -123,6 +137,7 @@ CONFIG_ACPI_SYSTEM=y CONFIG_PCI=y CONFIG_PCI_DIRECT=y CONFIG_PCI_MMCONFIG=y +CONFIG_PCI_USE_VECTOR=y # CONFIG_PCI_LEGACY_PROC is not set # CONFIG_PCI_NAMES is not set @@ -144,6 +159,7 @@ CONFIG_UID16=y # # Generic Driver Options # +CONFIG_PREVENT_FIRMWARE_BUILD=y # CONFIG_DEBUG_DRIVER is not set # @@ -171,7 +187,7 @@ CONFIG_BLK_DEV_FD=y CONFIG_BLK_DEV_LOOP=y # CONFIG_BLK_DEV_CRYPTOLOOP is not set # CONFIG_BLK_DEV_NBD is not set -# CONFIG_BLK_DEV_CARMEL is not set +# CONFIG_BLK_DEV_SX8 is not set CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_SIZE=4096 CONFIG_BLK_DEV_INITRD=y @@ -186,10 +202,10 @@ CONFIG_BLK_DEV_IDE=y # # Please see Documentation/ide.txt for help/info on IDE drives # +# CONFIG_BLK_DEV_IDE_SATA is not set # CONFIG_BLK_DEV_HD_IDE is not set CONFIG_BLK_DEV_IDEDISK=y CONFIG_IDEDISK_MULTI_MODE=y -# CONFIG_IDEDISK_STROKE is not set CONFIG_BLK_DEV_IDECD=y # CONFIG_BLK_DEV_IDETAPE is not set # CONFIG_BLK_DEV_IDEFLOPPY is not set @@ -234,7 +250,7 @@ CONFIG_BLK_DEV_PIIX=y # CONFIG_BLK_DEV_SIS5513 is not set # CONFIG_BLK_DEV_SLC90E66 is not set # CONFIG_BLK_DEV_TRM290 is not set -# CONFIG_BLK_DEV_VIA82CXXX is not set +CONFIG_BLK_DEV_VIA82CXXX=y # CONFIG_IDE_ARM is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set @@ -273,16 +289,17 @@ CONFIG_BLK_DEV_SD=y # SCSI low-level drivers # CONFIG_BLK_DEV_3W_XXXX_RAID=y +# CONFIG_SCSI_3W_9XXX is not set # CONFIG_SCSI_ACARD is not set # CONFIG_SCSI_AACRAID is not set # CONFIG_SCSI_AIC7XXX is not set # CONFIG_SCSI_AIC7XXX_OLD is not set # CONFIG_SCSI_AIC79XX is not set -# CONFIG_SCSI_ADVANSYS is not set # CONFIG_SCSI_MEGARAID is not set CONFIG_SCSI_SATA=y # CONFIG_SCSI_SATA_SVW is not set CONFIG_SCSI_ATA_PIIX=y +# CONFIG_SCSI_SATA_NV is not set # CONFIG_SCSI_SATA_PROMISE is not set # CONFIG_SCSI_SATA_SX4 is not set # CONFIG_SCSI_SATA_SIL is not set @@ -290,7 +307,6 @@ CONFIG_SCSI_ATA_PIIX=y CONFIG_SCSI_SATA_VIA=y # CONFIG_SCSI_SATA_VITESSE is not set # CONFIG_SCSI_BUSLOGIC is not set -# CONFIG_SCSI_CPQFCTS is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_EATA is not set # CONFIG_SCSI_EATA_PIO is not set @@ -393,6 +409,7 @@ CONFIG_IPV6=y # QoS and/or fair queueing # # CONFIG_NET_SCHED is not set +# CONFIG_NET_CLS_ROUTE is not set # # Network testing @@ -443,8 +460,8 @@ CONFIG_FORCEDETH=y # CONFIG_FEALNX is not set # CONFIG_NATSEMI is not set # CONFIG_NE2K_PCI is not set -CONFIG_8139CP=m -CONFIG_8139TOO=m +# CONFIG_8139CP is not set +CONFIG_8139TOO=y # CONFIG_8139TOO_PIO is not set # CONFIG_8139TOO_TUNE_TWISTER is not set # CONFIG_8139TOO_8129 is not set @@ -453,6 +470,7 @@ CONFIG_8139TOO=m # CONFIG_EPIC100 is not set # CONFIG_SUNDANCE is not set # CONFIG_VIA_RHINE is not set +# CONFIG_VIA_VELOCITY is not set # # Ethernet (1000 Mbit) @@ -603,6 +621,9 @@ CONFIG_AGP_AMD64=y # CONFIG_DRM is not set # CONFIG_MWAVE is not set CONFIG_RAW_DRIVER=y +CONFIG_HPET=y +# CONFIG_HPET_RTC_IRQ is not set +# CONFIG_HPET_NOMMAP is not set CONFIG_MAX_RAW_DEVS=256 CONFIG_HANGCHECK_TIMER=y @@ -704,17 +725,8 @@ CONFIG_USB_OHCI_HCD=y # CONFIG_USB_BLUETOOTH_TTY is not set # CONFIG_USB_MIDI is not set # CONFIG_USB_ACM is not set -CONFIG_USB_PRINTER=y -CONFIG_USB_STORAGE=y -# CONFIG_USB_STORAGE_DEBUG is not set -# CONFIG_USB_STORAGE_DATAFAB is not set -# CONFIG_USB_STORAGE_FREECOM is not set -# CONFIG_USB_STORAGE_ISD200 is not set -# CONFIG_USB_STORAGE_DPCM is not set -# CONFIG_USB_STORAGE_HP8200e is not set -# CONFIG_USB_STORAGE_SDDR09 is not set -# CONFIG_USB_STORAGE_SDDR55 is not set -# CONFIG_USB_STORAGE_JUMPSHOT is not set +# CONFIG_USB_PRINTER is not set +# CONFIG_USB_STORAGE is not set # # USB Human Interface Devices (HID) @@ -910,7 +922,7 @@ CONFIG_DEBUG_KERNEL=y # CONFIG_DEBUG_SLAB is not set CONFIG_MAGIC_SYSRQ=y # CONFIG_DEBUG_SPINLOCK is not set -# CONFIG_INIT_DEBUG is not set +CONFIG_INIT_DEBUG=y # CONFIG_DEBUG_INFO is not set # CONFIG_FRAME_POINTER is not set # CONFIG_IOMMU_DEBUG is not set @@ -928,5 +940,6 @@ CONFIG_MAGIC_SYSRQ=y # # Library routines # +# CONFIG_CRC_CCITT is not set CONFIG_CRC32=y # CONFIG_LIBCRC32C is not set diff -puN arch/x86_64/ia32/ia32_binfmt.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/ia32/ia32_binfmt.c --- 25/arch/x86_64/ia32/ia32_binfmt.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.972007944 -0700 +++ 25-akpm/arch/x86_64/ia32/ia32_binfmt.c 2004-07-31 16:54:51.030998976 -0700 @@ -301,6 +301,9 @@ MODULE_AUTHOR("Eric Youngdale, Andi Klee #define elf_addr_t __u32 +#undef TASK_SIZE +#define TASK_SIZE 0xffffffff + static void elf32_init(struct pt_regs *); #include "../../../fs/binfmt_elf.c" diff -puN arch/x86_64/ia32/ia32entry.S~x86-64-merge-for-268rc2-mm1 arch/x86_64/ia32/ia32entry.S --- 25/arch/x86_64/ia32/ia32entry.S~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.973007792 -0700 +++ 25-akpm/arch/x86_64/ia32/ia32entry.S 2004-07-31 16:54:51.031998824 -0700 @@ -270,35 +270,32 @@ quiet_ni_syscall: ret CFI_ENDPROC - .macro PTREGSCALL label, func + .macro PTREGSCALL label, func, arg .globl \label \label: leaq \func(%rip),%rax + leaq -ARGOFFSET+8(%rsp),\arg /* 8 for return address */ jmp ia32_ptregs_common .endm - PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn - PTREGSCALL stub32_sigreturn, sys32_sigreturn - PTREGSCALL stub32_sigaltstack, sys32_sigaltstack - PTREGSCALL stub32_sigsuspend, sys32_sigsuspend - PTREGSCALL stub32_execve, sys32_execve - PTREGSCALL stub32_fork, sys_fork - PTREGSCALL stub32_clone, sys32_clone - PTREGSCALL stub32_vfork, sys_vfork - PTREGSCALL stub32_iopl, sys_iopl - PTREGSCALL stub32_rt_sigsuspend, sys_rt_sigsuspend + PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn, %rdi + PTREGSCALL stub32_sigreturn, sys32_sigreturn, %rdi + PTREGSCALL stub32_sigaltstack, sys32_sigaltstack, %rdx + PTREGSCALL stub32_sigsuspend, sys32_sigsuspend, %rcx + PTREGSCALL stub32_execve, sys32_execve, %rcx + PTREGSCALL stub32_fork, sys_fork, %rdi + PTREGSCALL stub32_clone, sys32_clone, %rdx + PTREGSCALL stub32_vfork, sys_vfork, %rdi + PTREGSCALL stub32_iopl, sys_iopl, %rsi + PTREGSCALL stub32_rt_sigsuspend, sys_rt_sigsuspend, %rdx ENTRY(ia32_ptregs_common) CFI_STARTPROC popq %r11 SAVE_REST - movq %r11, %r15 call *%rax - movq %r15, %r11 RESTORE_REST - leaq ia32_sysret(%rip),%r11 - pushq %r11 - ret + jmp ia32_sysret /* misbalances the return cache */ CFI_ENDPROC .data @@ -332,7 +329,7 @@ ia32_sys_call_table: .quad sys_getuid16 .quad sys_stime /* stime */ /* 25 */ .quad sys32_ptrace /* ptrace */ - .quad sys_alarm /* XXX sign extension??? */ + .quad sys_alarm .quad sys_fstat /* (old)fstat */ .quad sys_pause .quad compat_sys_utime /* 30 */ @@ -558,7 +555,7 @@ ia32_sys_call_table: .quad sys_fadvise64 /* 250 */ .quad quiet_ni_syscall /* free_huge_pages */ .quad sys_exit_group - .quad sys_lookup_dcookie + .quad sys32_lookup_dcookie .quad sys_epoll_create .quad sys_epoll_ctl /* 255 */ .quad sys_epoll_wait diff -puN arch/x86_64/ia32/ia32_ioctl.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/ia32/ia32_ioctl.c --- 25/arch/x86_64/ia32/ia32_ioctl.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.975007488 -0700 +++ 25-akpm/arch/x86_64/ia32/ia32_ioctl.c 2004-07-31 16:54:51.031998824 -0700 @@ -171,23 +171,8 @@ struct ioctl_trans ioctl_start[] = { COMPATIBLE_IOCTL(HDIO_SET_KEEPSETTINGS) COMPATIBLE_IOCTL(HDIO_SCAN_HWIF) COMPATIBLE_IOCTL(BLKRASET) -COMPATIBLE_IOCTL(BLKFRASET) COMPATIBLE_IOCTL(0x4B50) /* KDGHWCLK - not in the kernel, but don't complain */ COMPATIBLE_IOCTL(0x4B51) /* KDSHWCLK - not in the kernel, but don't complain */ -COMPATIBLE_IOCTL(RTC_AIE_ON) -COMPATIBLE_IOCTL(RTC_AIE_OFF) -COMPATIBLE_IOCTL(RTC_UIE_ON) -COMPATIBLE_IOCTL(RTC_UIE_OFF) -COMPATIBLE_IOCTL(RTC_PIE_ON) -COMPATIBLE_IOCTL(RTC_PIE_OFF) -COMPATIBLE_IOCTL(RTC_WIE_ON) -COMPATIBLE_IOCTL(RTC_WIE_OFF) -COMPATIBLE_IOCTL(RTC_ALM_SET) -COMPATIBLE_IOCTL(RTC_ALM_READ) -COMPATIBLE_IOCTL(RTC_RD_TIME) -COMPATIBLE_IOCTL(RTC_SET_TIME) -COMPATIBLE_IOCTL(RTC_WKALM_SET) -COMPATIBLE_IOCTL(RTC_WKALM_RD) COMPATIBLE_IOCTL(FIOQSIZE) /* And these ioctls need translation */ diff -puN arch/x86_64/ia32/ia32_signal.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/ia32/ia32_signal.c --- 25/arch/x86_64/ia32/ia32_signal.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.976007336 -0700 +++ 25-akpm/arch/x86_64/ia32/ia32_signal.c 2004-07-31 16:54:51.032998672 -0700 @@ -115,7 +115,8 @@ int ia32_copy_siginfo_from_user(siginfo_ } asmlinkage long -sys32_sigsuspend(int history0, int history1, old_sigset_t mask, struct pt_regs regs) +sys32_sigsuspend(int history0, int history1, old_sigset_t mask, + struct pt_regs *regs) { sigset_t saveset; @@ -126,11 +127,11 @@ sys32_sigsuspend(int history0, int histo recalc_sigpending(); spin_unlock_irq(¤t->sighand->siglock); - regs.rax = -EINTR; + regs->rax = -EINTR; while (1) { current->state = TASK_INTERRUPTIBLE; schedule(); - if (do_signal(®s, &saveset)) + if (do_signal(regs, &saveset)) return -EINTR; } } @@ -138,7 +139,7 @@ sys32_sigsuspend(int history0, int histo asmlinkage long sys32_sigaltstack(const stack_ia32_t __user *uss_ptr, stack_ia32_t __user *uoss_ptr, - struct pt_regs regs) + struct pt_regs *regs) { stack_t uss,uoss; int ret; @@ -155,7 +156,7 @@ sys32_sigaltstack(const stack_ia32_t __u } seg = get_fs(); set_fs(KERNEL_DS); - ret = do_sigaltstack(uss_ptr ? &uss : NULL, &uoss, regs.rsp); + ret = do_sigaltstack(uss_ptr ? &uss : NULL, &uoss, regs->rsp); set_fs(seg); if (ret >= 0 && uoss_ptr) { if (!access_ok(VERIFY_WRITE,uoss_ptr,sizeof(stack_ia32_t)) || @@ -274,9 +275,9 @@ badframe: return 1; } -asmlinkage long sys32_sigreturn(struct pt_regs regs) +asmlinkage long sys32_sigreturn(struct pt_regs *regs) { - struct sigframe __user *frame = (struct sigframe __user *)(regs.rsp-8); + struct sigframe __user *frame = (struct sigframe __user *)(regs->rsp-8); sigset_t set; unsigned int eax; @@ -294,20 +295,23 @@ asmlinkage long sys32_sigreturn(struct p recalc_sigpending(); spin_unlock_irq(¤t->sighand->siglock); - if (ia32_restore_sigcontext(®s, &frame->sc, &eax)) + if (ia32_restore_sigcontext(regs, &frame->sc, &eax)) goto badframe; return eax; badframe: - signal_fault(®s, frame, "32bit sigreturn"); + signal_fault(regs, frame, "32bit sigreturn"); return 0; } -asmlinkage long sys32_rt_sigreturn(struct pt_regs regs) +asmlinkage long sys32_rt_sigreturn(struct pt_regs *regs) { - struct rt_sigframe __user *frame = (struct rt_sigframe __user *)(regs.rsp - 4); + struct rt_sigframe __user *frame; sigset_t set; unsigned int eax; + struct pt_regs tregs; + + frame = (struct rt_sigframe __user *)(regs->rsp - 4); if (verify_area(VERIFY_READ, frame, sizeof(*frame))) goto badframe; @@ -320,16 +324,17 @@ asmlinkage long sys32_rt_sigreturn(struc recalc_sigpending(); spin_unlock_irq(¤t->sighand->siglock); - if (ia32_restore_sigcontext(®s, &frame->uc.uc_mcontext, &eax)) + if (ia32_restore_sigcontext(regs, &frame->uc.uc_mcontext, &eax)) goto badframe; - if (sys32_sigaltstack(&frame->uc.uc_stack, NULL, regs) == -EFAULT) + tregs = *regs; + if (sys32_sigaltstack(&frame->uc.uc_stack, NULL, &tregs) == -EFAULT) goto badframe; return eax; badframe: - signal_fault(®s,frame,"32bit rt sigreturn"); + signal_fault(regs,frame,"32bit rt sigreturn"); return 0; } diff -puN arch/x86_64/ia32/ptrace32.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/ia32/ptrace32.c --- 25/arch/x86_64/ia32/ptrace32.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.978007032 -0700 +++ 25-akpm/arch/x86_64/ia32/ptrace32.c 2004-07-31 16:54:51.033998520 -0700 @@ -249,8 +249,8 @@ asmlinkage long sys32_ptrace(long reques case PTRACE_GETFPREGS: case PTRACE_SETFPXREGS: case PTRACE_GETFPXREGS: + case PTRACE_GETEVENTMSG: break; - } child = find_target(request, pid, &ret); @@ -363,6 +363,10 @@ asmlinkage long sys32_ptrace(long reques break; } + case PTRACE_GETEVENTMSG: + ret = put_user(child->ptrace_message,(unsigned int __user *)(u64)data); + break; + default: ret = -EINVAL; break; diff -puN arch/x86_64/ia32/sys_ia32.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/ia32/sys_ia32.c --- 25/arch/x86_64/ia32/sys_ia32.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.979006880 -0700 +++ 25-akpm/arch/x86_64/ia32/sys_ia32.c 2004-07-31 16:54:51.034998368 -0700 @@ -1125,7 +1125,7 @@ long sys32_ustat(unsigned dev, struct us } asmlinkage long sys32_execve(char __user *name, compat_uptr_t __user *argv, - compat_uptr_t __user *envp, struct pt_regs regs) + compat_uptr_t __user *envp, struct pt_regs *regs) { long error; char * filename; @@ -1134,20 +1134,21 @@ asmlinkage long sys32_execve(char __user error = PTR_ERR(filename); if (IS_ERR(filename)) return error; - error = compat_do_execve(filename, argv, envp, ®s); + error = compat_do_execve(filename, argv, envp, regs); if (error == 0) current->ptrace &= ~PT_DTRACE; putname(filename); return error; } -asmlinkage long sys32_clone(unsigned int clone_flags, unsigned int newsp, struct pt_regs regs) +asmlinkage long sys32_clone(unsigned int clone_flags, unsigned int newsp, + struct pt_regs *regs) { - void __user *parent_tid = (void __user *)regs.rdx; - void __user *child_tid = (void __user *)regs.rdi; + void __user *parent_tid = (void __user *)regs->rdx; + void __user *child_tid = (void __user *)regs->rdi; if (!newsp) - newsp = regs.rsp; - return do_fork(clone_flags & ~CLONE_IDLETASK, newsp, ®s, 0, + newsp = regs->rsp; + return do_fork(clone_flags & ~CLONE_IDLETASK, newsp, regs, 0, parent_tid, child_tid); } @@ -1337,6 +1338,12 @@ long sys32_quotactl(void) return -ENOSYS; } +long sys32_lookup_dcookie(u32 addr_low, u32 addr_high, + char __user * buf, size_t len) +{ + return sys_lookup_dcookie(((u64)addr_high << 32) | addr_low, buf, len); +} + cond_syscall(sys32_ipc) static int __init ia32_init (void) diff -puN arch/x86_64/kernel/aperture.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/aperture.c --- 25/arch/x86_64/kernel/aperture.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.981006576 -0700 +++ 25-akpm/arch/x86_64/kernel/aperture.c 2004-07-31 16:54:51.035998216 -0700 @@ -31,6 +31,8 @@ int iommu_aperture_allowed __initdata = int fallback_aper_order __initdata = 1; /* 64MB */ int fallback_aper_force __initdata = 0; +int fix_aperture __initdata = 1; + /* This code runs before the PCI subsystem is initialized, so just access the northbridge directly. */ @@ -202,7 +204,7 @@ void __init iommu_hole_init(void) u64 aper_base; int valid_agp = 0; - if (iommu_aperture_disabled) + if (iommu_aperture_disabled || !fix_aperture) return; printk("Checking aperture...\n"); @@ -241,20 +243,15 @@ void __init iommu_hole_init(void) /* Got the aperture from the AGP bridge */ } else if ((!no_iommu && end_pfn >= 0xffffffff>>PAGE_SHIFT) || force_iommu || - valid_agp || + valid_agp || fallback_aper_force) { - /* When there is a AGP bridge in the system assume the - user wants to use the AGP driver too and needs an - aperture. However this case (AGP but no good - aperture) should only happen with a more broken than - usual BIOS, because it would even break Windows. */ - - printk("Your BIOS doesn't leave a aperture memory hole\n"); - printk("Please enable the IOMMU option in the BIOS setup\n"); - printk("This costs you %d MB of RAM\n", 32 << fallback_aper_order); - + printk("Your BIOS doesn't leave a aperture memory hole\n"); + printk("Please enable the IOMMU option in the BIOS setup\n"); + printk("This costs you %d MB of RAM\n", + 32 << fallback_aper_order); + aper_order = fallback_aper_order; - aper_alloc = allocate_aperture(); + aper_alloc = allocate_aperture(); if (!aper_alloc) { /* Could disable AGP and IOMMU here, but it's probably not worth it. But the later users cannot deal with diff -puN arch/x86_64/kernel/early_printk.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/early_printk.c --- 25/arch/x86_64/kernel/early_printk.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.982006424 -0700 +++ 25-akpm/arch/x86_64/kernel/early_printk.c 2004-07-31 16:54:51.035998216 -0700 @@ -99,18 +99,17 @@ static void early_serial_write(struct co #define DEFAULT_BAUD 9600 -static __init void early_serial_init(char *opt) +static __init void early_serial_init(char *s) { unsigned char c; unsigned divisor; unsigned baud = DEFAULT_BAUD; - char *s, *e; + char *e; - if (*opt == ',') - ++opt; + if (*s == ',') + ++s; - s = strsep(&opt, ","); - if (s != NULL) { + if (*s) { unsigned port; if (!strncmp(s,"0x",2)) { early_serial_base = simple_strtoul(s, &e, 16); @@ -124,6 +123,9 @@ static __init void early_serial_init(cha port = 0; early_serial_base = bases[port]; } + s += strcspn(s, ","); + if (*s == ',') + s++; } outb(0x3, early_serial_base + LCR); /* 8n1 */ @@ -131,8 +133,7 @@ static __init void early_serial_init(cha outb(0, early_serial_base + FCR); /* no fifo */ outb(0x3, early_serial_base + MCR); /* DTR + RTS */ - s = strsep(&opt, ","); - if (s != NULL) { + if (*s) { baud = simple_strtoul(s, &e, 0); if (baud == 0 || s == e) baud = DEFAULT_BAUD; diff -puN arch/x86_64/kernel/entry.S~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/entry.S --- 25/arch/x86_64/kernel/entry.S~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.983006272 -0700 +++ 25-akpm/arch/x86_64/kernel/entry.S 2004-07-31 16:54:51.036998064 -0700 @@ -323,19 +323,20 @@ int_restore_rest: * Certain special system calls that need to save a complete full stack frame. */ - .macro PTREGSCALL label,func + .macro PTREGSCALL label,func,arg .globl \label \label: leaq \func(%rip),%rax + leaq -ARGOFFSET+8(%rsp),\arg /* 8 for return address */ jmp ptregscall_common .endm - PTREGSCALL stub_clone, sys_clone - PTREGSCALL stub_fork, sys_fork - PTREGSCALL stub_vfork, sys_vfork - PTREGSCALL stub_rt_sigsuspend, sys_rt_sigsuspend - PTREGSCALL stub_sigaltstack, sys_sigaltstack - PTREGSCALL stub_iopl, sys_iopl + PTREGSCALL stub_clone, sys_clone, %r8 + PTREGSCALL stub_fork, sys_fork, %rdi + PTREGSCALL stub_vfork, sys_vfork, %rdi + PTREGSCALL stub_rt_sigsuspend, sys_rt_sigsuspend, %rdx + PTREGSCALL stub_sigaltstack, sys_sigaltstack, %rdx + PTREGSCALL stub_iopl, sys_iopl, %rsi ENTRY(ptregscall_common) CFI_STARTPROC @@ -385,6 +386,7 @@ ENTRY(stub_rt_sigreturn) CFI_STARTPROC addq $8, %rsp SAVE_REST + movq %rsp,%rdi FIXUP_TOP_OF_STACK %r11 call sys_rt_sigreturn movq %rax,RAX(%rsp) # fixme, this could be done at the higher layer diff -puN arch/x86_64/kernel/ioport.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/ioport.c --- 25/arch/x86_64/kernel/ioport.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.985005968 -0700 +++ 25-akpm/arch/x86_64/kernel/ioport.c 2004-07-31 16:54:51.037997912 -0700 @@ -83,9 +83,9 @@ asmlinkage long sys_ioperm(unsigned long * code. */ -asmlinkage long sys_iopl(unsigned int level, struct pt_regs regs) +asmlinkage long sys_iopl(unsigned int level, struct pt_regs *regs) { - unsigned int old = (regs.eflags >> 12) & 3; + unsigned int old = (regs->eflags >> 12) & 3; if (level > 3) return -EINVAL; @@ -94,6 +94,6 @@ asmlinkage long sys_iopl(unsigned int le if (!capable(CAP_SYS_RAWIO)) return -EPERM; } - regs.eflags = (regs.eflags &~ 0x3000UL) | (level << 12); + regs->eflags = (regs->eflags &~ 0x3000UL) | (level << 12); return 0; } diff -puN arch/x86_64/kernel/mce.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/mce.c --- 25/arch/x86_64/kernel/mce.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.986005816 -0700 +++ 25-akpm/arch/x86_64/kernel/mce.c 2004-07-31 16:54:51.039997608 -0700 @@ -24,7 +24,8 @@ #define MISC_MCELOG_MINOR 227 #define NR_BANKS 5 -static int mce_disabled __initdata; +static int mce_dont_init; + /* 0: always panic, 1: panic if deadlock possible, 2: try to avoid panic, 3: never panic or exit (for testing only) */ static int tolerant = 1; @@ -114,9 +115,8 @@ static void mce_panic(char *msg, struct static int mce_available(struct cpuinfo_x86 *c) { - return !mce_disabled && - test_bit(X86_FEATURE_MCE, &c->x86_capability) && - test_bit(X86_FEATURE_MCA, &c->x86_capability); + return test_bit(X86_FEATURE_MCE, &c->x86_capability) && + test_bit(X86_FEATURE_MCA, &c->x86_capability); } /* @@ -128,8 +128,9 @@ void do_machine_check(struct pt_regs * r struct mce m, panicm; int nowayout = (tolerant < 1); int kill_it = 0; - u64 mcestart; + u64 mcestart = 0; int i; + int panicm_found = 0; if (regs) notify_die(DIE_NMI, "machine check", regs, error_code, 255, SIGKILL); @@ -139,17 +140,11 @@ void do_machine_check(struct pt_regs * r memset(&m, 0, sizeof(struct mce)); m.cpu = hard_smp_processor_id(); rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus); - if (!regs && (m.mcgstatus & MCG_STATUS_MCIP)) - return; if (!(m.mcgstatus & MCG_STATUS_RIPV)) kill_it = 1; - if (regs) { - m.rip = regs->rip; - m.cs = regs->cs; - } rdtscll(mcestart); - mb(); + barrier(); for (i = 0; i < banks; i++) { if (!bank[i]) @@ -157,52 +152,62 @@ void do_machine_check(struct pt_regs * r m.misc = 0; m.addr = 0; + m.bank = i; + m.tsc = 0; rdmsrl(MSR_IA32_MC0_STATUS + i*4, m.status); if ((m.status & MCI_STATUS_VAL) == 0) continue; - /* Should be implied by the banks check above, but - check it anyways */ - if ((m.status & MCI_STATUS_EN) == 0) - continue; - /* Did this bank cause the exception? */ - /* Assume that the bank with uncorrectable errors did it, - and that there is only a single one. */ - if (m.status & MCI_STATUS_UC) { - panicm = m; - } else { - m.rip = 0; - m.cs = 0; + if (m.status & MCI_STATUS_EN) { + /* In theory _OVER could be a nowayout too, but + assume any overflowed errors were no fatal. */ + nowayout |= !!(m.status & MCI_STATUS_PCC); + kill_it |= !!(m.status & MCI_STATUS_UC); } - /* In theory _OVER could be a nowayout too, but - assume any overflowed errors were no fatal. */ - nowayout |= !!(m.status & MCI_STATUS_PCC); - kill_it |= !!(m.status & MCI_STATUS_UC); - m.bank = i; - if (m.status & MCI_STATUS_MISCV) rdmsrl(MSR_IA32_MC0_MISC + i*4, m.misc); if (m.status & MCI_STATUS_ADDRV) rdmsrl(MSR_IA32_MC0_ADDR + i*4, m.addr); - rdtscll(m.tsc); + if (regs && (m.mcgstatus & MCG_STATUS_RIPV)) { + m.rip = regs->rip; + m.cs = regs->cs; + } else { + m.rip = 0; + m.cs = 0; + } + + if (error_code != -1) + rdtscll(m.tsc); wrmsrl(MSR_IA32_MC0_STATUS + i*4, 0); mce_log(&m); + + /* Did this bank cause the exception? */ + /* Assume that the bank with uncorrectable errors did it, + and that there is only a single one. */ + if ((m.status & MCI_STATUS_UC) && (m.status & MCI_STATUS_EN)) { + panicm = m; + panicm_found = 1; + } } - wrmsrl(MSR_IA32_MCG_STATUS, 0); /* Never do anything final in the polling timer */ if (!regs) - return; + goto out; + + /* If we didn't find an uncorrectable error, pick + the last one (shouldn't happen, just being safe). */ + if (!panicm_found) + panicm = m; if (nowayout) - mce_panic("Machine check", &m, mcestart); + mce_panic("Machine check", &panicm, mcestart); if (kill_it) { int user_space = 0; if (m.mcgstatus & MCG_STATUS_RIPV) - user_space = m.rip && (m.cs & 3); + user_space = panicm.rip && (panicm.cs & 3); /* When the machine was in user space and the CPU didn't get confused it's normally not necessary to panic, unless you @@ -215,18 +220,15 @@ void do_machine_check(struct pt_regs * r (unsigned)current->pid <= 1) mce_panic("Uncorrected machine check", &panicm, mcestart); - /* do_exit takes an awful lot of locks and has as slight risk - of deadlocking. If you don't want that don't set tolerant >= 2 */ + /* do_exit takes an awful lot of locks and has as + slight risk of deadlocking. If you don't want that + don't set tolerant >= 2 */ if (tolerant < 3) do_exit(SIGBUS); } -} -static void mce_clear_all(void) -{ - int i; - for (i = 0; i < banks; i++) - wrmsrl(MSR_IA32_MC0_STATUS + i*4, 0); + out: + /* Last thing done in the machine check exception to clear state. */ wrmsrl(MSR_IA32_MCG_STATUS, 0); } @@ -269,22 +271,25 @@ static void mce_init(void *dummy) int i; rdmsrl(MSR_IA32_MCG_CAP, cap); - if (cap & MCG_CTL_P) - wrmsr(MSR_IA32_MCG_CTL, 0xffffffff, 0xffffffff); - banks = cap & 0xff; if (banks > NR_BANKS) { printk(KERN_INFO "MCE: warning: using only %d banks\n", banks); banks = NR_BANKS; } - mce_clear_all(); + /* Log the machine checks left over from the previous reset. + This also clears all registers */ + do_machine_check(NULL, -1); + + set_in_cr4(X86_CR4_MCE); + + if (cap & MCG_CTL_P) + wrmsr(MSR_IA32_MCG_CTL, 0xffffffff, 0xffffffff); + for (i = 0; i < banks; i++) { wrmsrl(MSR_IA32_MC0_CTL+4*i, bank[i]); wrmsrl(MSR_IA32_MC0_STATUS+4*i, 0); } - - set_in_cr4(X86_CR4_MCE); } /* Add per CPU specific workarounds here */ @@ -308,7 +313,9 @@ void __init mcheck_init(struct cpuinfo_x mce_cpu_quirks(c); - if (test_and_set_bit(smp_processor_id(), &mce_cpus) || !mce_available(c)) + if (mce_dont_init || + test_and_set_bit(smp_processor_id(), &mce_cpus) || + !mce_available(c)) return; mce_init(NULL); @@ -412,15 +419,16 @@ static struct miscdevice mce_log_device static int __init mcheck_disable(char *str) { - mce_disabled = 1; + mce_dont_init = 1; return 0; } -/* mce=off disable machine check */ +/* mce=off disables machine check. Note you can reenable it later + using sysfs */ static int __init mcheck_enable(char *str) { if (!strcmp(str, "off")) - mce_disabled = 1; + mce_dont_init = 1; else printk("mce= argument %s ignored. Please use /sys", str); return 0; @@ -436,7 +444,6 @@ __setup("mce", mcheck_enable); /* On resume clear all MCE state. Don't want to see leftovers from the BIOS. */ static int mce_resume(struct sys_device *dev) { - mce_clear_all(); on_each_cpu(mce_init, NULL, 1, 1); return 0; } @@ -494,7 +501,7 @@ static __init int mce_init_device(void) if (!err) err = sysdev_register(&device_mce); if (!err) { - /* could create per CPU objects, but is not worth it. */ + /* could create per CPU objects, but it is not worth it. */ sysdev_create_file(&device_mce, &attr_bank0ctl); sysdev_create_file(&device_mce, &attr_bank1ctl); sysdev_create_file(&device_mce, &attr_bank2ctl); diff -puN arch/x86_64/kernel/msr.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/msr.c --- 25/arch/x86_64/kernel/msr.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.988005512 -0700 +++ 25-akpm/arch/x86_64/kernel/msr.c 2004-07-31 16:54:51.040997456 -0700 @@ -46,234 +46,229 @@ static inline int wrmsr_eio(u32 reg, u32 eax, u32 edx) { - int err; + int err; - asm volatile( - "1: wrmsr\n" - "2:\n" - ".section .fixup,\"ax\"\n" - "3: movl %4,%0\n" - " jmp 2b\n" - ".previous\n" - ".section __ex_table,\"a\"\n" - " .align 8\n" - " .quad 1b,3b\n" - ".previous" - : "=&bDS" (err) - : "a" (eax), "d" (edx), "c" (reg), "i" (-EIO), "0" (0)); + asm volatile ("1: wrmsr\n" + "2:\n" + ".section .fixup,\"ax\"\n" + "3: movl %4,%0\n" + " jmp 2b\n" + ".previous\n" + ".section __ex_table,\"a\"\n" + " .align 8\n" " .quad 1b,3b\n" ".previous":"=&bDS" (err) + :"a"(eax), "d"(edx), "c"(reg), "i"(-EIO), "0"(0)); - return err; + return err; } static inline int rdmsr_eio(u32 reg, u32 *eax, u32 *edx) { - int err; + int err; - asm volatile( - "1: rdmsr\n" - "2:\n" - ".section .fixup,\"ax\"\n" - "3: movl %4,%0\n" - " jmp 2b\n" - ".previous\n" - ".section __ex_table,\"a\"\n" - " .align 8\n" - " .quad 1b,3b\n" - ".previous" - : "=&bDS" (err), "=a" (*eax), "=d" (*edx) - : "c" (reg), "i" (-EIO), "0" (0)); + asm volatile ("1: rdmsr\n" + "2:\n" + ".section .fixup,\"ax\"\n" + "3: movl %4,%0\n" + " jmp 2b\n" + ".previous\n" + ".section __ex_table,\"a\"\n" + " .align 8\n" + " .quad 1b,3b\n" + ".previous":"=&bDS" (err), "=a"(*eax), "=d"(*edx) + :"c"(reg), "i"(-EIO), "0"(0)); - return err; + return err; } #ifdef CONFIG_SMP struct msr_command { - int cpu; - int err; - u32 reg; - u32 data[2]; + int cpu; + int err; + u32 reg; + u32 data[2]; }; static void msr_smp_wrmsr(void *cmd_block) { - struct msr_command *cmd = (struct msr_command *) cmd_block; - - if ( cmd->cpu == smp_processor_id() ) - cmd->err = wrmsr_eio(cmd->reg, cmd->data[0], cmd->data[1]); + struct msr_command *cmd = (struct msr_command *)cmd_block; + + if (cmd->cpu == smp_processor_id()) + cmd->err = wrmsr_eio(cmd->reg, cmd->data[0], cmd->data[1]); } static void msr_smp_rdmsr(void *cmd_block) { - struct msr_command *cmd = (struct msr_command *) cmd_block; - - if ( cmd->cpu == smp_processor_id() ) - cmd->err = rdmsr_eio(cmd->reg, &cmd->data[0], &cmd->data[1]); + struct msr_command *cmd = (struct msr_command *)cmd_block; + + if (cmd->cpu == smp_processor_id()) + cmd->err = rdmsr_eio(cmd->reg, &cmd->data[0], &cmd->data[1]); } static inline int do_wrmsr(int cpu, u32 reg, u32 eax, u32 edx) { - struct msr_command cmd; - int ret; + struct msr_command cmd; + int ret; + + preempt_disable(); + if (cpu == smp_processor_id()) { + ret = wrmsr_eio(reg, eax, edx); + } else { + cmd.cpu = cpu; + cmd.reg = reg; + cmd.data[0] = eax; + cmd.data[1] = edx; - preempt_disable(); - if ( cpu == smp_processor_id() ) { - ret = wrmsr_eio(reg, eax, edx); - } else { - cmd.cpu = cpu; - cmd.reg = reg; - cmd.data[0] = eax; - cmd.data[1] = edx; - - smp_call_function(msr_smp_wrmsr, &cmd, 1, 1); - ret = cmd.err; - } - preempt_enable(); - return ret; + smp_call_function(msr_smp_wrmsr, &cmd, 1, 1); + ret = cmd.err; + } + preempt_enable(); + return ret; } -static inline int do_rdmsr(int cpu, u32 reg, u32 *eax, u32 *edx) +static inline int do_rdmsr(int cpu, u32 reg, u32 * eax, u32 * edx) { - struct msr_command cmd; - int ret; + struct msr_command cmd; + int ret; + + preempt_disable(); + if (cpu == smp_processor_id()) { + ret = rdmsr_eio(reg, eax, edx); + } else { + cmd.cpu = cpu; + cmd.reg = reg; + + smp_call_function(msr_smp_rdmsr, &cmd, 1, 1); - preempt_disable(); - if ( cpu == smp_processor_id() ) { - ret = rdmsr_eio(reg, eax, edx); - } else { - cmd.cpu = cpu; - cmd.reg = reg; - - smp_call_function(msr_smp_rdmsr, &cmd, 1, 1); - - *eax = cmd.data[0]; - *edx = cmd.data[1]; - - ret = cmd.err; - } - preempt_enable(); - return ret; + *eax = cmd.data[0]; + *edx = cmd.data[1]; + + ret = cmd.err; + } + preempt_enable(); + return ret; } -#else /* ! CONFIG_SMP */ +#else /* ! CONFIG_SMP */ static inline int do_wrmsr(int cpu, u32 reg, u32 eax, u32 edx) { - return wrmsr_eio(reg, eax, edx); + return wrmsr_eio(reg, eax, edx); } static inline int do_rdmsr(int cpu, u32 reg, u32 *eax, u32 *edx) { - return rdmsr_eio(reg, eax, edx); + return rdmsr_eio(reg, eax, edx); } -#endif /* ! CONFIG_SMP */ +#endif /* ! CONFIG_SMP */ static loff_t msr_seek(struct file *file, loff_t offset, int orig) { - loff_t ret = -EINVAL; - lock_kernel(); - switch (orig) { - case 0: - file->f_pos = offset; - ret = file->f_pos; - break; - case 1: - file->f_pos += offset; - ret = file->f_pos; - } - unlock_kernel(); - return ret; -} - -static ssize_t msr_read(struct file * file, char __user * buf, - size_t count, loff_t *ppos) -{ - char __user *tmp = buf; - u32 data[2]; - size_t rv; - u32 reg = *ppos; - int cpu = iminor(file->f_dentry->d_inode); - int err; - - if ( count % 8 ) - return -EINVAL; /* Invalid chunk size */ - - for ( rv = 0 ; count ; count -= 8 ) { - err = do_rdmsr(cpu, reg, &data[0], &data[1]); - if ( err ) - return err; - if ( copy_to_user(tmp,&data,8) ) - return -EFAULT; - tmp += 8; - } + loff_t ret = -EINVAL; + + lock_kernel(); + switch (orig) { + case 0: + file->f_pos = offset; + ret = file->f_pos; + break; + case 1: + file->f_pos += offset; + ret = file->f_pos; + } + unlock_kernel(); + return ret; +} + +static ssize_t msr_read(struct file *file, char __user * buf, + size_t count, loff_t * ppos) +{ + u32 __user *tmp = (u32 __user *) buf; + u32 data[2]; + size_t rv; + u32 reg = *ppos; + int cpu = iminor(file->f_dentry->d_inode); + int err; + + if (count % 8) + return -EINVAL; /* Invalid chunk size */ + + for (rv = 0; count; count -= 8) { + err = do_rdmsr(cpu, reg, &data[0], &data[1]); + if (err) + return err; + if (copy_to_user(tmp, &data, 8)) + return -EFAULT; + tmp += 2; + } - return tmp - buf; + return ((char __user *)tmp) - buf; } -static ssize_t msr_write(struct file * file, const char __user * buf, +static ssize_t msr_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { - const char __user *tmp = buf; - u32 data[2]; - size_t rv; - u32 reg = *ppos; - int cpu = iminor(file->f_dentry->d_inode); - int err; - - if ( count % 8 ) - return -EINVAL; /* Invalid chunk size */ - - for ( rv = 0 ; count ; count -= 8 ) { - if ( copy_from_user(&data,tmp,8) ) - return -EFAULT; - err = do_wrmsr(cpu, reg, data[0], data[1]); - if ( err ) - return err; - tmp += 8; - } + const u32 __user *tmp = (const u32 __user *)buf; + u32 data[2]; + size_t rv; + u32 reg = *ppos; + int cpu = iminor(file->f_dentry->d_inode); + int err; + + if (count % 8) + return -EINVAL; /* Invalid chunk size */ + + for (rv = 0; count; count -= 8) { + if (copy_from_user(&data, tmp, 8)) + return -EFAULT; + err = do_wrmsr(cpu, reg, data[0], data[1]); + if (err) + return err; + tmp += 2; + } - return tmp - buf; + return ((char __user *)tmp) - buf; } static int msr_open(struct inode *inode, struct file *file) { - int cpu = iminor(file->f_dentry->d_inode); - struct cpuinfo_x86 *c = &(cpu_data)[cpu]; - - if (cpu >= NR_CPUS || !cpu_online(cpu)) - return -ENXIO; /* No such CPU */ - if ( !cpu_has(c, X86_FEATURE_MSR) ) - return -EIO; /* MSR not supported */ - - return 0; + unsigned int cpu = iminor(file->f_dentry->d_inode); + struct cpuinfo_x86 *c = &(cpu_data)[cpu]; + + if (cpu >= NR_CPUS || !cpu_online(cpu)) + return -ENXIO; /* No such CPU */ + if (!cpu_has(c, X86_FEATURE_MSR)) + return -EIO; /* MSR not supported */ + + return 0; } /* * File operations we support */ static struct file_operations msr_fops = { - .owner = THIS_MODULE, - .llseek = msr_seek, - .read = msr_read, - .write = msr_write, - .open = msr_open, + .owner = THIS_MODULE, + .llseek = msr_seek, + .read = msr_read, + .write = msr_write, + .open = msr_open, }; int __init msr_init(void) { - if (register_chrdev(MSR_MAJOR, "cpu/msr", &msr_fops)) { - printk(KERN_ERR "msr: unable to get major %d for msr\n", - MSR_MAJOR); - return -EBUSY; - } - - return 0; + if (register_chrdev(MSR_MAJOR, "cpu/msr", &msr_fops)) { + printk(KERN_ERR "msr: unable to get major %d for msr\n", + MSR_MAJOR); + return -EBUSY; + } + + return 0; } void __exit msr_exit(void) { - unregister_chrdev(MSR_MAJOR, "cpu/msr"); + unregister_chrdev(MSR_MAJOR, "cpu/msr"); } module_init(msr_init); diff -puN arch/x86_64/kernel/pci-dma.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/pci-dma.c --- 25/arch/x86_64/kernel/pci-dma.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.989005360 -0700 +++ 25-akpm/arch/x86_64/kernel/pci-dma.c 2004-07-31 16:54:51.041997304 -0700 @@ -24,12 +24,12 @@ * Device ownership issues as mentioned above for pci_map_single are * the same here. */ -int pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, +int dma_map_sg(struct device *hwdev, struct scatterlist *sg, int nents, int direction) { int i; - BUG_ON(direction == PCI_DMA_NONE); + BUG_ON(direction == DMA_NONE); for (i = 0; i < nents; i++ ) { struct scatterlist *s = &sg[i]; BUG_ON(!s->page); @@ -40,13 +40,13 @@ int pci_map_sg(struct pci_dev *hwdev, st return nents; } -EXPORT_SYMBOL(pci_map_sg); +EXPORT_SYMBOL(dma_map_sg); /* Unmap a set of streaming mode DMA translations. * Again, cpu read rules concerning calls here are the same as for * pci_unmap_single() above. */ -void pci_unmap_sg(struct pci_dev *dev, struct scatterlist *sg, +void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, int dir) { int i; @@ -58,4 +58,4 @@ void pci_unmap_sg(struct pci_dev *dev, s } } -EXPORT_SYMBOL(pci_unmap_sg); +EXPORT_SYMBOL(dma_unmap_sg); diff -puN arch/x86_64/kernel/pci-gart.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/pci-gart.c --- 25/arch/x86_64/kernel/pci-gart.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.990005208 -0700 +++ 25-akpm/arch/x86_64/kernel/pci-gart.c 2004-07-31 16:54:51.045996696 -0700 @@ -31,12 +31,6 @@ #include #include -#ifdef CONFIG_PREEMPT -#define preempt_atomic() in_atomic() -#else -#define preempt_atomic() 1 -#endif - dma_addr_t bad_dma_address; unsigned long iommu_bus_base; /* GART remapping area (physical) */ @@ -54,7 +48,7 @@ int force_iommu = 1; int panic_on_overflow = 0; int force_iommu = 0; #endif -int iommu_merge = 0; +int iommu_merge = 1; int iommu_sac_force = 0; /* If this is disabled the IOMMU will use an optimized flushing strategy @@ -64,6 +58,10 @@ int iommu_sac_force = 0; also seen with Qlogic at least). */ int iommu_fullflush = 1; +/* This tells the BIO block layer to assume merging. Default to off + because we cannot guarantee merging later. */ +int iommu_bio_merge = 0; + #define MAX_NB 8 /* Allocation bitmap for the remapping area */ @@ -104,8 +102,16 @@ AGPEXTERN __u32 *agp_gatt_table; static unsigned long next_bit; /* protected by iommu_bitmap_lock */ static int need_flush; /* global flush state. set for each gart wrap */ -static dma_addr_t pci_map_area(struct pci_dev *dev, unsigned long phys_mem, - size_t size, int dir); +static dma_addr_t dma_map_area(struct device *dev, unsigned long phys_mem, + size_t size, int dir, int do_panic); + +/* Dummy device used for NULL arguments (normally ISA). Better would + be probably a smaller DMA mask, but this is bug-to-bug compatible to i386. */ +static struct device fallback_dev = { + .bus_id = "fallback device", + .coherent_dma_mask = 0xffffffff, + .dma_mask = &fallback_dev.coherent_dma_mask, +}; static unsigned long alloc_iommu(int size) { @@ -146,7 +152,7 @@ static void free_iommu(unsigned long off /* * Use global flush state to avoid races with multiple flushers. */ -static void flush_gart(struct pci_dev *dev) +static void flush_gart(struct device *dev) { unsigned long flags; int flushed = 0; @@ -174,30 +180,29 @@ static void flush_gart(struct pci_dev *d } /* - * Allocate memory for a consistent mapping. + * Allocate memory for a coherent mapping. */ -void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size, - dma_addr_t *dma_handle) +void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, + unsigned gfp) { void *memory; - int gfp = preempt_atomic() ? GFP_ATOMIC : GFP_KERNEL; unsigned long dma_mask = 0; u64 bus; - if (hwdev) - dma_mask = hwdev->dev.coherent_dma_mask; + if (!dev) + dev = &fallback_dev; + dma_mask = dev->coherent_dma_mask; if (dma_mask == 0) dma_mask = 0xffffffff; /* Kludge to make it bug-to-bug compatible with i386. i386 - uses the normal dma_mask for alloc_consistent. */ - if (hwdev) - dma_mask &= hwdev->dma_mask; + uses the normal dma_mask for alloc_coherent. */ + dma_mask &= *dev->dma_mask; again: memory = (void *)__get_free_pages(gfp, get_order(size)); if (memory == NULL) - return NULL; + return NULL; { int high, mmu; @@ -223,28 +228,29 @@ void *pci_alloc_consistent(struct pci_de } } - *dma_handle = pci_map_area(hwdev, bus, size, PCI_DMA_BIDIRECTIONAL); + *dma_handle = dma_map_area(dev, bus, size, PCI_DMA_BIDIRECTIONAL, 0); if (*dma_handle == bad_dma_address) goto error; - flush_gart(hwdev); + flush_gart(dev); return memory; error: if (panic_on_overflow) - panic("pci_alloc_consistent: overflow %lu bytes\n", size); + panic("dma_alloc_coherent: IOMMU overflow by %lu bytes\n", size); free: free_pages((unsigned long)memory, get_order(size)); + /* XXX Could use the swiotlb pool here too */ return NULL; } /* - * Unmap consistent memory. + * Unmap coherent memory. * The caller must ensure that the device has finished accessing the mapping. */ -void pci_free_consistent(struct pci_dev *hwdev, size_t size, +void dma_free_coherent(struct device *dev, size_t size, void *vaddr, dma_addr_t bus) { - pci_unmap_single(hwdev, bus, size, 0); + dma_unmap_single(dev, bus, size, 0); free_pages((unsigned long)vaddr, get_order(size)); } @@ -280,7 +286,7 @@ void dump_leak(void) #define CLEAR_LEAK(x) #endif -static void iommu_full(struct pci_dev *dev, size_t size, int dir) +static void iommu_full(struct device *dev, size_t size, int dir, int do_panic) { /* * Ran out of IOMMU space for this operation. This is very bad. @@ -293,14 +299,14 @@ static void iommu_full(struct pci_dev *d */ printk(KERN_ERR - "PCI-DMA: Out of IOMMU space for %lu bytes at device %s[%s]\n", - size, dev ? pci_pretty_name(dev) : "", dev ? dev->slot_name : "?"); + "PCI-DMA: Out of IOMMU space for %lu bytes at device %s\n", + size, dev->bus_id); - if (size > PAGE_SIZE*EMERGENCY_PAGES) { + if (size > PAGE_SIZE*EMERGENCY_PAGES && do_panic) { if (dir == PCI_DMA_FROMDEVICE || dir == PCI_DMA_BIDIRECTIONAL) - panic("PCI-DMA: Memory will be corrupted\n"); + panic("PCI-DMA: Memory would be corrupted\n"); if (dir == PCI_DMA_TODEVICE || dir == PCI_DMA_BIDIRECTIONAL) - panic("PCI-DMA: Random memory will be DMAed\n"); + panic("PCI-DMA: Random memory would be DMAed\n"); } #ifdef CONFIG_IOMMU_LEAK @@ -308,9 +314,9 @@ static void iommu_full(struct pci_dev *d #endif } -static inline int need_iommu(struct pci_dev *dev, unsigned long addr, size_t size) +static inline int need_iommu(struct device *dev, unsigned long addr, size_t size) { - u64 mask = dev ? dev->dma_mask : 0xffffffff; + u64 mask = *dev->dma_mask; int high = addr + size >= mask; int mmu = high; if (force_iommu) @@ -323,9 +329,9 @@ static inline int need_iommu(struct pci_ return mmu; } -static inline int nonforced_iommu(struct pci_dev *dev, unsigned long addr, size_t size) +static inline int nonforced_iommu(struct device *dev, unsigned long addr, size_t size) { - u64 mask = dev ? dev->dma_mask : 0xffffffff; + u64 mask = *dev->dma_mask; int high = addr + size >= mask; int mmu = high; if (no_iommu) { @@ -339,8 +345,8 @@ static inline int nonforced_iommu(struct /* Map a single continuous physical area into the IOMMU. * Caller needs to check if the iommu is needed and flush. */ -static dma_addr_t pci_map_area(struct pci_dev *dev, unsigned long phys_mem, - size_t size, int dir) +static dma_addr_t dma_map_area(struct device *dev, unsigned long phys_mem, + size_t size, int dir, int do_panic) { unsigned long npages = to_pages(phys_mem, size); unsigned long iommu_page = alloc_iommu(npages); @@ -349,8 +355,8 @@ static dma_addr_t pci_map_area(struct pc if (!nonforced_iommu(dev, phys_mem, size)) return phys_mem; if (panic_on_overflow) - panic("pci_map_area overflow %lu bytes\n", size); - iommu_full(dev, size, dir); + panic("dma_map_area overflow %lu bytes\n", size); + iommu_full(dev, size, dir, do_panic); return bad_dma_address; } @@ -363,44 +369,44 @@ static dma_addr_t pci_map_area(struct pc } /* Map a single area into the IOMMU */ -dma_addr_t pci_map_single(struct pci_dev *dev, void *addr, size_t size, int dir) -{ +dma_addr_t dma_map_single(struct device *dev, void *addr, size_t size, int dir) +{ unsigned long phys_mem, bus; - BUG_ON(dir == PCI_DMA_NONE); + BUG_ON(dir == DMA_NONE); -#ifdef CONFIG_SWIOTLB if (swiotlb) - return swiotlb_map_single(&dev->dev,addr,size,dir); -#endif + return swiotlb_map_single(dev,addr,size,dir); + if (!dev) + dev = &fallback_dev; phys_mem = virt_to_phys(addr); if (!need_iommu(dev, phys_mem, size)) return phys_mem; - bus = pci_map_area(dev, phys_mem, size, dir); + bus = dma_map_area(dev, phys_mem, size, dir, 1); flush_gart(dev); return bus; } -/* Fallback for pci_map_sg in case of overflow */ -static int pci_map_sg_nonforce(struct pci_dev *dev, struct scatterlist *sg, +/* Fallback for dma_map_sg in case of overflow */ +static int dma_map_sg_nonforce(struct device *dev, struct scatterlist *sg, int nents, int dir) { int i; #ifdef CONFIG_IOMMU_DEBUG - printk(KERN_DEBUG "pci_map_sg overflow\n"); + printk(KERN_DEBUG "dma_map_sg overflow\n"); #endif for (i = 0; i < nents; i++ ) { struct scatterlist *s = &sg[i]; unsigned long addr = page_to_phys(s->page) + s->offset; if (nonforced_iommu(dev, addr, s->length)) { - addr = pci_map_area(dev, addr, s->length, dir); + addr = dma_map_area(dev, addr, s->length, dir, 0); if (addr == bad_dma_address) { if (i > 0) - pci_unmap_sg(dev, sg, i, dir); + dma_unmap_sg(dev, sg, i, dir); nents = 0; sg[0].dma_length = 0; break; @@ -414,7 +420,7 @@ static int pci_map_sg_nonforce(struct pc } /* Map multiple scatterlist entries continuous into the first. */ -static int __pci_map_cont(struct scatterlist *sg, int start, int stopat, +static int __dma_map_cont(struct scatterlist *sg, int start, int stopat, struct scatterlist *sout, unsigned long pages) { unsigned long iommu_start = alloc_iommu(pages); @@ -452,7 +458,7 @@ static int __pci_map_cont(struct scatter return 0; } -static inline int pci_map_cont(struct scatterlist *sg, int start, int stopat, +static inline int dma_map_cont(struct scatterlist *sg, int start, int stopat, struct scatterlist *sout, unsigned long pages, int need) { @@ -462,14 +468,14 @@ static inline int pci_map_cont(struct sc sout->dma_length = sg[start].length; return 0; } - return __pci_map_cont(sg, start, stopat, sout, pages); + return __dma_map_cont(sg, start, stopat, sout, pages); } /* * DMA map all entries in a scatterlist. * Merge chunks that have page aligned sizes into a continuous mapping. - */ -int pci_map_sg(struct pci_dev *dev, struct scatterlist *sg, int nents, int dir) + */ +int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, int dir) { int i; int out; @@ -477,19 +483,14 @@ int pci_map_sg(struct pci_dev *dev, stru unsigned long pages = 0; int need = 0, nextneed; -#ifdef CONFIG_SWIOTLB - if (swiotlb) - return swiotlb_map_sg(&dev->dev,sg,nents,dir); -#endif - - BUG_ON(dir == PCI_DMA_NONE); + BUG_ON(dir == DMA_NONE); if (nents == 0) return 0; -#ifdef CONFIG_SWIOTLB if (swiotlb) - return swiotlb_map_sg(&dev->dev,sg,nents,dir); -#endif + return swiotlb_map_sg(dev,sg,nents,dir); + if (!dev) + dev = &fallback_dev; out = 0; start = 0; @@ -508,19 +509,19 @@ int pci_map_sg(struct pci_dev *dev, stru boundary and the new one doesn't have an offset. */ if (!iommu_merge || !nextneed || !need || s->offset || (ps->offset + ps->length) % PAGE_SIZE) { - if (pci_map_cont(sg, start, i, sg+out, pages, + if (dma_map_cont(sg, start, i, sg+out, pages, need) < 0) goto error; out++; pages = 0; start = i; } - } + } need = nextneed; pages += to_pages(s->offset, s->length); } - if (pci_map_cont(sg, start, i, sg+out, pages, need) < 0) + if (dma_map_cont(sg, start, i, sg+out, pages, need) < 0) goto error; out++; flush_gart(dev); @@ -530,34 +531,32 @@ int pci_map_sg(struct pci_dev *dev, stru error: flush_gart(NULL); - pci_unmap_sg(dev, sg, nents, dir); + dma_unmap_sg(dev, sg, nents, dir); /* When it was forced try again unforced */ if (force_iommu) - return pci_map_sg_nonforce(dev, sg, nents, dir); + return dma_map_sg_nonforce(dev, sg, nents, dir); if (panic_on_overflow) - panic("pci_map_sg: overflow on %lu pages\n", pages); - iommu_full(dev, pages << PAGE_SHIFT, dir); + panic("dma_map_sg: overflow on %lu pages\n", pages); + iommu_full(dev, pages << PAGE_SHIFT, dir, 0); for (i = 0; i < nents; i++) sg[i].dma_address = bad_dma_address; return 0; } /* - * Free a PCI mapping. + * Free a DMA mapping. */ -void pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, +void dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size, int direction) { unsigned long iommu_page; int npages; int i; -#ifdef CONFIG_SWIOTLB if (swiotlb) { - swiotlb_unmap_single(&hwdev->dev,dma_addr,size,direction); + swiotlb_unmap_single(dev,dma_addr,size,direction); return; } -#endif if (dma_addr < iommu_bus_base + EMERGENCY_PAGES*PAGE_SIZE || dma_addr >= iommu_bus_base + iommu_size) @@ -574,22 +573,25 @@ void pci_unmap_single(struct pci_dev *hw /* * Wrapper for pci_unmap_single working with scatterlists. */ -void pci_unmap_sg(struct pci_dev *dev, struct scatterlist *sg, int nents, - int dir) +void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, int dir) { int i; + if (swiotlb) { + swiotlb_unmap_sg(dev,sg,nents,dir); + return; + } for (i = 0; i < nents; i++) { struct scatterlist *s = &sg[i]; if (!s->dma_length || !s->length) break; - pci_unmap_single(dev, s->dma_address, s->dma_length, dir); + dma_unmap_single(dev, s->dma_address, s->dma_length, dir); } } -int pci_dma_supported(struct pci_dev *dev, u64 mask) +int dma_supported(struct device *dev, u64 mask) { /* Copied from i386. Doesn't make much sense, because it will - only work for pci_alloc_consistent. + only work for pci_alloc_coherent. The caller just has to use GFP_DMA in this case. */ if (mask < 0x00ffffff) return 0; @@ -605,22 +607,31 @@ int pci_dma_supported(struct pci_dev *de Assume all masks <= 40 bits are of this type. Normally this doesn't make any difference, but gives more gentle handling of IOMMU overflow. */ if (iommu_sac_force && (mask >= 0xffffffffffULL)) { - printk(KERN_INFO "%s: Force SAC with mask %Lx\n", dev->slot_name,mask); + printk(KERN_INFO "%s: Force SAC with mask %Lx\n", dev->bus_id,mask); return 0; } return 1; } -EXPORT_SYMBOL(pci_unmap_sg); -EXPORT_SYMBOL(pci_map_sg); -EXPORT_SYMBOL(pci_map_single); -EXPORT_SYMBOL(pci_unmap_single); -EXPORT_SYMBOL(pci_dma_supported); +int dma_get_cache_alignment(void) +{ + return boot_cpu_data.x86_clflush_size; +} + +EXPORT_SYMBOL(dma_unmap_sg); +EXPORT_SYMBOL(dma_map_sg); +EXPORT_SYMBOL(dma_map_single); +EXPORT_SYMBOL(dma_unmap_single); +EXPORT_SYMBOL(dma_supported); EXPORT_SYMBOL(no_iommu); EXPORT_SYMBOL(force_iommu); EXPORT_SYMBOL(bad_dma_address); -EXPORT_SYMBOL(iommu_merge); +EXPORT_SYMBOL(iommu_bio_merge); +EXPORT_SYMBOL(iommu_sac_force); +EXPORT_SYMBOL(dma_get_cache_alignment); +EXPORT_SYMBOL(dma_alloc_coherent); +EXPORT_SYMBOL(dma_free_coherent); static __init unsigned long check_iommu_size(unsigned long aper, u64 aper_size) { @@ -747,7 +758,7 @@ static int __init pci_iommu_init(void) if (swiotlb) { no_iommu = 1; - printk(KERN_INFO "PCI-DMA: Using software bounce buffering for IO (SWIOTLB)\n"); + printk(KERN_INFO "PCI-DMA: Using software bounce buffering for IO (SWIOTLB)\n"); return -1; } @@ -851,7 +862,7 @@ static int __init pci_iommu_init(void) fs_initcall(pci_iommu_init); /* iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]][,merge] - [,forcesac][,fullflush][,nomerge] + [,forcesac][,fullflush][,nomerge][,biomerge] size set size of iommu (in bytes) noagp don't initialize the AGP driver and use full aperture. off don't use the IOMMU @@ -859,60 +870,73 @@ fs_initcall(pci_iommu_init); memaper[=order] allocate an own aperture over RAM with size 32MB^order. noforce don't force IOMMU usage. Default. force Force IOMMU. - merge Do SG merging. Implies force (experimental) + merge Do lazy merging. This may improve performance on some block devices. + Implies force (experimental) + biomerge Do merging at the BIO layer. This is more efficient than merge, + but should be only done with very big IOMMUs. Implies merge,force. nomerge Don't do SG merging. forcesac For SAC mode for masks <40bits (experimental) fullflush Flush IOMMU on each allocation (default) nofullflush Don't use IOMMU fullflush allowed overwrite iommu off workarounds for specific chipsets. soft Use software bounce buffering (default for Intel machines) + noaperture Don't touch the aperture for AGP. */ -__init int iommu_setup(char *opt) +__init int iommu_setup(char *p) { int arg; - char *p = opt; - - for (;;) { - if (!memcmp(p,"noagp", 5)) + + while (*p) { + if (!strncmp(p,"noagp",5)) no_agp = 1; - if (!memcmp(p,"off", 3)) + if (!strncmp(p,"off",3)) no_iommu = 1; - if (!memcmp(p,"force", 5)) { + if (!strncmp(p,"force",5)) { force_iommu = 1; iommu_aperture_allowed = 1; } - if (!memcmp(p,"allowed",7)) + if (!strncmp(p,"allowed",7)) iommu_aperture_allowed = 1; - if (!memcmp(p,"noforce", 7)) { + if (!strncmp(p,"noforce",7)) { iommu_merge = 0; force_iommu = 0; } - if (!memcmp(p, "memaper", 7)) { + if (!strncmp(p, "memaper", 7)) { fallback_aper_force = 1; p += 7; - if (*p == '=' && get_option(&p, &arg)) - fallback_aper_order = arg; + if (*p == '=') { + ++p; + if (get_option(&p, &arg)) + fallback_aper_order = arg; + } } - if (!memcmp(p, "panic", 5)) + if (!strncmp(p, "biomerge",8)) { + iommu_bio_merge = 4096; + iommu_merge = 1; + force_iommu = 1; + } + if (!strncmp(p, "panic",5)) panic_on_overflow = 1; - if (!memcmp(p, "nopanic", 7)) + if (!strncmp(p, "nopanic",7)) panic_on_overflow = 0; - if (!memcmp(p, "merge", 5)) { + if (!strncmp(p, "merge",5)) { iommu_merge = 1; force_iommu = 1; } - if (!memcmp(p, "nomerge", 7)) + if (!strncmp(p, "nomerge",7)) iommu_merge = 0; - if (!memcmp(p, "forcesac", 8)) + if (!strncmp(p, "forcesac",8)) iommu_sac_force = 1; - if (!memcmp(p, "fullflush", 9)) + if (!strncmp(p, "fullflush",8)) iommu_fullflush = 1; - if (!memcmp(p, "nofullflush", 11)) + if (!strncmp(p, "nofullflush",11)) iommu_fullflush = 0; - if (!memcmp(p, "soft", 4)) + if (!strncmp(p, "soft",4)) swiotlb = 1; + if (!strncmp(p, "noaperture",10)) + fix_aperture = 0; #ifdef CONFIG_IOMMU_LEAK - if (!memcmp(p,"leak", 4)) { + if (!strncmp(p,"leak",4)) { leak_trace = 1; p += 4; if (*p == '=') ++p; @@ -922,10 +946,9 @@ __init int iommu_setup(char *opt) #endif if (isdigit(*p) && get_option(&p, &arg)) iommu_size = arg; - do { - if (*p == ' ' || *p == 0) - return 0; - } while (*p++ != ','); + p += strcspn(p, ","); + if (*p == ',') + ++p; } return 1; } diff -puN arch/x86_64/kernel/pci-nommu.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/pci-nommu.c --- 25/arch/x86_64/kernel/pci-nommu.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.992004904 -0700 +++ 25-akpm/arch/x86_64/kernel/pci-nommu.c 2004-07-31 16:54:51.046996544 -0700 @@ -1,8 +1,11 @@ +/* Fallback functions when the main IOMMU code is not compiled in. This + code is roughly equivalent to i386. */ #include #include #include #include #include +#include int iommu_merge = 0; EXPORT_SYMBOL(iommu_merge); @@ -10,21 +13,30 @@ EXPORT_SYMBOL(iommu_merge); dma_addr_t bad_dma_address; EXPORT_SYMBOL(bad_dma_address); +int iommu_bio_merge = 0; +EXPORT_SYMBOL(iommu_bio_merge); + +int iommu_sac_force = 0; +EXPORT_SYMBOL(iommu_sac_force); + /* * Dummy IO MMU functions */ -void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size, - dma_addr_t *dma_handle) +void *dma_alloc_coherent(struct device *hwdev, size_t size, dma_addr_t *dma_handle, + unsigned gfp) { void *ret; - int gfp = GFP_ATOMIC; - - if (hwdev == NULL || - end_pfn > (hwdev->dma_mask>>PAGE_SHIFT) || /* XXX */ - (u32)hwdev->dma_mask < 0xffffffff) - gfp |= GFP_DMA; - ret = (void *)__get_free_pages(gfp, get_order(size)); + u64 mask; + + do { + ret = (void *)__get_free_pages(gfp, get_order(size)); + + if (hwdev) + mask = hwdev->coherent_dma_mask & *hwdev->dma_mask; + else + mask = 0xffffffff; + } while (((unsigned long)ret & ~mask) && !(gfp & GFP_DMA)); if (ret != NULL) { memset(ret, 0, size); @@ -32,35 +44,43 @@ void *pci_alloc_consistent(struct pci_de } return ret; } +EXPORT_SYMBOL(dma_alloc_coherent); -void pci_free_consistent(struct pci_dev *hwdev, size_t size, +void dme_free_coherent(struct device *hwdev, size_t size, void *vaddr, dma_addr_t dma_handle) { free_pages((unsigned long)vaddr, get_order(size)); } +EXPORT_SYMBOL(dma_free_coherent); -int pci_dma_supported(struct pci_dev *hwdev, u64 mask) +int dma_supported(struct device *hwdev, u64 mask) { /* * we fall back to GFP_DMA when the mask isn't all 1s, * so we can't guarantee allocations that must be * within a tighter range than GFP_DMA.. - * RED-PEN this won't work for pci_map_single. Caller has to - * use GFP_DMA in the first place. + * RED-PEN this won't work for pci_map_single. Caller has to + * use GFP_DMA in the first place. */ if (mask < 0x00ffffff) return 0; return 1; } +EXPORT_SYMBOL(dma_supported); -EXPORT_SYMBOL(pci_dma_supported); +int dma_get_cache_alignment(void) +{ + return boot_cpu_data.x86_clflush_size; +} +EXPORT_SYMBOL(dma_get_cache_alignment); static int __init check_ram(void) { if (end_pfn >= 0xffffffff>>PAGE_SHIFT) { - printk(KERN_ERR "WARNING more than 4GB of memory but no IOMMU.\n" - KERN_ERR "WARNING 32bit PCI may malfunction.\n"); + printk( + KERN_ERR "WARNING more than 4GB of memory but IOMMU not compiled in.\n" + KERN_ERR "WARNING 32bit PCI may malfunction.\n"); } return 0; } diff -puN arch/x86_64/kernel/process.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/process.c --- 25/arch/x86_64/kernel/process.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.993004752 -0700 +++ 25-akpm/arch/x86_64/kernel/process.c 2004-07-31 16:54:51.047996392 -0700 @@ -554,16 +554,16 @@ void set_personality_64bit(void) clear_thread_flag(TIF_IA32); } -asmlinkage long sys_fork(struct pt_regs regs) +asmlinkage long sys_fork(struct pt_regs *regs) { - return do_fork(SIGCHLD, regs.rsp, ®s, 0, NULL, NULL); + return do_fork(SIGCHLD, regs->rsp, regs, 0, NULL, NULL); } -asmlinkage long sys_clone(unsigned long clone_flags, unsigned long newsp, void __user *parent_tid, void __user *child_tid, struct pt_regs regs) +asmlinkage long sys_clone(unsigned long clone_flags, unsigned long newsp, void __user *parent_tid, void __user *child_tid, struct pt_regs *regs) { if (!newsp) - newsp = regs.rsp; - return do_fork(clone_flags & ~CLONE_IDLETASK, newsp, ®s, 0, + newsp = regs->rsp; + return do_fork(clone_flags & ~CLONE_IDLETASK, newsp, regs, 0, parent_tid, child_tid); } @@ -577,9 +577,9 @@ asmlinkage long sys_clone(unsigned long * do not have enough call-clobbered registers to hold all * the information you need. */ -asmlinkage long sys_vfork(struct pt_regs regs) +asmlinkage long sys_vfork(struct pt_regs *regs) { - return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs.rsp, ®s, 0, + return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs->rsp, regs, 0, NULL, NULL); } diff -puN arch/x86_64/kernel/ptrace.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/ptrace.c --- 25/arch/x86_64/kernel/ptrace.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.994004600 -0700 +++ 25-akpm/arch/x86_64/kernel/ptrace.c 2004-07-31 16:54:51.048996240 -0700 @@ -433,30 +433,32 @@ asmlinkage long sys_ptrace(long request, break; case PTRACE_GETREGS: { /* Get all gp regs from the child. */ - if (!access_ok(VERIFY_WRITE, (unsigned __user *)data, FRAME_SIZE)) { + if (!access_ok(VERIFY_WRITE, (unsigned __user *)data, + sizeof(struct user_regs_struct))) { ret = -EIO; break; } + ret = 0; for (ui = 0; ui < sizeof(struct user_regs_struct); ui += sizeof(long)) { - __put_user(getreg(child, ui),(unsigned long __user *) data); + ret |= __put_user(getreg(child, ui),(unsigned long __user *) data); data += sizeof(long); } - ret = 0; break; } case PTRACE_SETREGS: { /* Set all gp regs in the child. */ unsigned long tmp; - if (!access_ok(VERIFY_READ, (unsigned __user *)data, FRAME_SIZE)) { + if (!access_ok(VERIFY_READ, (unsigned __user *)data, + sizeof(struct user_regs_struct))) { ret = -EIO; break; } + ret = 0; for (ui = 0; ui < sizeof(struct user_regs_struct); ui += sizeof(long)) { - __get_user(tmp, (unsigned long __user *) data); + ret |= __get_user(tmp, (unsigned long __user *) data); putreg(child, ui, tmp); data += sizeof(long); } - ret = 0; break; } diff -puN arch/x86_64/kernel/setup64.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/setup64.c --- 25/arch/x86_64/kernel/setup64.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.996004296 -0700 +++ 25-akpm/arch/x86_64/kernel/setup64.c 2004-07-31 16:54:51.049996088 -0700 @@ -60,12 +60,12 @@ noforce (default) Don't enable by defaul */ static int __init nonx_setup(char *str) { - if (!strncmp(str, "on",3)) { + if (!strcmp(str, "on")) { __supported_pte_mask |= _PAGE_NX; do_not_nx = 0; vm_data_default_flags &= ~VM_EXEC; vm_stack_flags &= ~VM_EXEC; - } else if (!strncmp(str, "noforce",7) || !strncmp(str,"off",3)) { + } else if (!strcmp(str, "noforce") || !strcmp(str, "off")) { do_not_nx = (str[0] == 'o'); if (do_not_nx) __supported_pte_mask &= ~_PAGE_NX; @@ -91,26 +91,28 @@ Valid options: compat (default) Imply PROT_EXEC for PROT_READ */ - static int __init nonx32_setup(char *str) + static int __init nonx32_setup(char *s) { - char *s; - while ((s = strsep(&str, ",")) != NULL) { - if (!strcmp(s, "all") || !strcmp(s,"on")) { + while (*s) { + if (!strncmp(s, "all", 3) || !strncmp(s,"on",2)) { vm_data_default_flags32 &= ~VM_EXEC; vm_stack_flags32 &= ~VM_EXEC; - } else if (!strcmp(s, "off")) { + } else if (!strncmp(s, "off",3)) { vm_data_default_flags32 |= VM_EXEC; vm_stack_flags32 |= VM_EXEC; - } else if (!strcmp(s, "stack")) { + } else if (!strncmp(s, "stack", 5)) { vm_data_default_flags32 |= VM_EXEC; vm_stack_flags32 &= ~VM_EXEC; - } else if (!strcmp(s, "force")) { + } else if (!strncmp(s, "force",5)) { vm_force_exec32 = 0; - } else if (!strcmp(s, "compat")) { + } else if (!strncmp(s, "compat",5)) { vm_force_exec32 = PROT_EXEC; } - } - return 1; + s += strcspn(s, ","); + if (*s == ',') + ++s; + } + return 1; } __setup("noexec32=", nonx32_setup); diff -puN arch/x86_64/kernel/setup.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/setup.c --- 25/arch/x86_64/kernel/setup.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.997004144 -0700 +++ 25-akpm/arch/x86_64/kernel/setup.c 2004-07-31 16:54:51.050995936 -0700 @@ -79,8 +79,10 @@ unsigned long pci_mem_start = 0x10000000 unsigned long saved_video_mode; +#ifdef CONFIG_SWIOTLB int swiotlb; EXPORT_SYMBOL(swiotlb); +#endif /* * Setup options @@ -765,6 +767,7 @@ static struct _cache_table cache_table[] { 0x43, LVL_2, 512 }, { 0x44, LVL_2, 1024 }, { 0x45, LVL_2, 2048 }, + { 0x60, LVL_1_DATA, 16 }, { 0x66, LVL_1_DATA, 8 }, { 0x67, LVL_1_DATA, 16 }, { 0x68, LVL_1_DATA, 32 }, diff -puN arch/x86_64/kernel/signal.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/signal.c --- 25/arch/x86_64/kernel/signal.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:50.999003840 -0700 +++ 25-akpm/arch/x86_64/kernel/signal.c 2004-07-31 16:54:51.051995784 -0700 @@ -40,7 +40,7 @@ void ia32_setup_frame(int sig, struct k_ sigset_t *set, struct pt_regs * regs); asmlinkage long -sys_rt_sigsuspend(sigset_t __user *unewset, size_t sigsetsize, struct pt_regs regs) +sys_rt_sigsuspend(sigset_t __user *unewset, size_t sigsetsize, struct pt_regs *regs) { sigset_t saveset, newset; @@ -59,21 +59,22 @@ sys_rt_sigsuspend(sigset_t __user *unews spin_unlock_irq(¤t->sighand->siglock); #ifdef DEBUG_SIG printk("rt_sigsuspend savset(%lx) newset(%lx) regs(%p) rip(%lx)\n", - saveset, newset, ®s, regs.rip); + saveset, newset, regs, regs->rip); #endif - regs.rax = -EINTR; + regs->rax = -EINTR; while (1) { current->state = TASK_INTERRUPTIBLE; schedule(); - if (do_signal(®s, &saveset)) + if (do_signal(regs, &saveset)) return -EINTR; } } asmlinkage long -sys_sigaltstack(const stack_t __user *uss, stack_t __user *uoss, struct pt_regs regs) +sys_sigaltstack(const stack_t __user *uss, stack_t __user *uoss, + struct pt_regs *regs) { - return do_sigaltstack(uss, uoss, regs.rsp); + return do_sigaltstack(uss, uoss, regs->rsp); } @@ -134,13 +135,13 @@ badframe: return 1; } -asmlinkage long sys_rt_sigreturn(struct pt_regs regs) +asmlinkage long sys_rt_sigreturn(struct pt_regs *regs) { struct rt_sigframe __user *frame; sigset_t set; long eax; - frame = (struct rt_sigframe __user *)(regs.rsp - 8); + frame = (struct rt_sigframe __user *)(regs->rsp - 8); if (verify_area(VERIFY_READ, frame, sizeof(*frame))) { goto badframe; } @@ -154,7 +155,7 @@ asmlinkage long sys_rt_sigreturn(struct recalc_sigpending(); spin_unlock_irq(¤t->sighand->siglock); - if (restore_sigcontext(®s, &frame->uc.uc_mcontext, &eax)) { + if (restore_sigcontext(regs, &frame->uc.uc_mcontext, &eax)) { goto badframe; } @@ -162,13 +163,13 @@ asmlinkage long sys_rt_sigreturn(struct printk("%d sigreturn rip:%lx rsp:%lx frame:%p rax:%lx\n",current->pid,regs.rip,regs.rsp,frame,eax); #endif - if (do_sigaltstack(&frame->uc.uc_stack, NULL, regs.rsp) == -EFAULT) + if (do_sigaltstack(&frame->uc.uc_stack, NULL, regs->rsp) == -EFAULT) goto badframe; return eax; badframe: - signal_fault(®s,frame,"sigreturn"); + signal_fault(regs,frame,"sigreturn"); return 0; } diff -puN arch/x86_64/kernel/time.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/time.c --- 25/arch/x86_64/kernel/time.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.000003688 -0700 +++ 25-akpm/arch/x86_64/kernel/time.c 2004-07-31 16:54:51.053995480 -0700 @@ -11,7 +11,6 @@ * Copyright (c) 2002 Vojtech Pavlik * Copyright (c) 2003 Andi Kleen * RTC support code taken from arch/i386/kernel/timers/time_hpet.c - * */ #include @@ -42,6 +41,10 @@ u64 jiffies_64 = INITIAL_JIFFIES; EXPORT_SYMBOL(jiffies_64); +#ifdef CONFIG_CPU_FREQ +static void cpufreq_delayed_get(void); +#endif + extern int using_apic_timer; spinlock_t rtc_lock = SPIN_LOCK_UNLOCKED; @@ -82,7 +85,7 @@ static inline void rdtscll_sync(unsigned * timer interrupt has happened already, but vxtime.trigger wasn't updated yet. * This is not a problem, because jiffies hasn't updated either. They are bound * together by xtime_lock. - */ + */ static inline unsigned int do_gettimeoffset_tsc(void) { @@ -119,7 +122,7 @@ void do_gettimeofday(struct timeval *tv) usec = xtime.tv_nsec / 1000; /* i386 does some correction here to keep the clock - monotonus even when ntpd is fixing drift. + monotonous even when ntpd is fixing drift. But they didn't work for me, there is a non monotonic clock anyways with ntp. I dropped all corrections now until a real solution can @@ -214,7 +217,7 @@ static void set_rtc_mmss(unsigned long n * overflow. This avoids messing with unknown time zones but requires your RTC * not to be off by more than 15 minutes. Since we're calling it only when * our clock is externally synchronized using NTP, this shouldn't be a problem. - */ + */ real_seconds = nowtime % 60; real_minutes = nowtime / 60; @@ -297,15 +300,15 @@ EXPORT_SYMBOL(monotonic_clock); static irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs) { static unsigned long rtc_update = 0; - unsigned long tsc, lost = 0; - int delay, offset = 0; + unsigned long tsc; + int delay, offset = 0, lost = 0; /* * Here we are in the timer irq handler. We have irqs locally disabled (so we * don't need spin_lock_irqsave()) but we don't know if the timer_bh is running * on the other CPU, so we need a lock. We also need to lock the vsyscall * variables, because both do_timer() and us change them -arca+vojtech - */ + */ write_seqlock(&xtime_lock); @@ -354,12 +357,27 @@ static irqreturn_t timer_interrupt(int i (((long) offset << 32) / vxtime.tsc_quot) - 1; } - if (lost) { + if (lost > 0) { + static long lost_count; + if (report_lost_ticks) { - printk(KERN_WARNING "time.c: Lost %ld timer " + printk(KERN_WARNING "time.c: Lost %d timer " "tick(s)! ", lost); print_symbol("rip %s)\n", regs->rip); } + + if (lost_count == 100) { + printk(KERN_WARNING + "warning: many lost ticks.\n" + KERN_WARNING "Your time source seems to be instable or some driver is hogging interupts\n"); + } else + lost_count++; + + if ((lost_count % 25) == 0) { +#ifdef CONFIG_CPU_FREQ + cpufreq_delayed_get(); +#endif + } jiffies += lost; } @@ -509,6 +527,34 @@ unsigned long get_cmos_time(void) Should fix up last_tsc too. Currently gettimeofday in the first tick after the change will be slightly wrong. */ +#include + +static unsigned int cpufreq_delayed_issched = 0; +static unsigned int cpufreq_init = 0; +static struct work_struct cpufreq_delayed_get_work; + +static void handle_cpufreq_delayed_get(void *v) +{ + unsigned int cpu; + for_each_online_cpu(cpu) { + cpufreq_get(cpu); + } + cpufreq_delayed_issched = 0; +} + +/* if we notice lost ticks, schedule a call to cpufreq_get() as it tries + * to verify the CPU frequency the timing core thinks the CPU is running + * at is still correct. + */ +static void cpufreq_delayed_get(void) +{ + if (cpufreq_init && !cpufreq_delayed_issched) { + cpufreq_delayed_issched = 1; + printk(KERN_DEBUG "Losing some ticks... checking if CPU frequency changed.\n"); + schedule_work(&cpufreq_delayed_get_work); + } +} + static unsigned int ref_freq = 0; static unsigned long loops_per_jiffy_ref = 0; @@ -518,14 +564,18 @@ static int time_cpufreq_notifier(struct void *data) { struct cpufreq_freqs *freq = data; - unsigned long *lpj; + unsigned long *lpj, dummy; + lpj = &dummy; + if (!(freq->flags & CPUFREQ_CONST_LOOPS)) #ifdef CONFIG_SMP lpj = &cpu_data[freq->cpu].loops_per_jiffy; #else lpj = &boot_cpu_data.loops_per_jiffy; #endif + + if (!ref_freq) { ref_freq = freq->old; loops_per_jiffy_ref = *lpj; @@ -538,7 +588,8 @@ static int time_cpufreq_notifier(struct cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new); cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new); - vxtime.tsc_quot = (1000L << 32) / cpu_khz; + if (!(freq->flags & CPUFREQ_CONST_LOOPS)) + vxtime.tsc_quot = (1000L << 32) / cpu_khz; } set_cyc2ns_scale(cpu_khz_ref / 1000); @@ -549,6 +600,18 @@ static int time_cpufreq_notifier(struct static struct notifier_block time_cpufreq_notifier_block = { .notifier_call = time_cpufreq_notifier }; + +static int __init cpufreq_tsc(void) +{ + INIT_WORK(&cpufreq_delayed_get_work, handle_cpufreq_delayed_get, NULL); + if (!cpufreq_register_notifier(&time_cpufreq_notifier_block, + CPUFREQ_TRANSITION_NOTIFIER)) + cpufreq_init = 1; + return 0; +} + +core_initcall(cpufreq_tsc); + #endif /* @@ -725,8 +788,8 @@ void __init time_init(void) cpu_khz = hpet_calibrate_tsc(); timename = "HPET"; } else { - pit_init(); - cpu_khz = pit_calibrate_tsc(); + pit_init(); + cpu_khz = pit_calibrate_tsc(); timename = "PIT"; } @@ -742,11 +805,6 @@ void __init time_init(void) setup_irq(0, &irq0); set_cyc2ns_scale(cpu_khz / 1000); - -#ifdef CONFIG_CPU_FREQ - cpufreq_register_notifier(&time_cpufreq_notifier_block, - CPUFREQ_TRANSITION_NOTIFIER); -#endif } void __init time_init_smp(void) @@ -1038,6 +1096,8 @@ irqreturn_t hpet_rtc_interrupt(int irq, } #endif + + static int __init nohpet_setup(char *s) { nohpet = 1; diff -puN arch/x86_64/kernel/traps.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/traps.c --- 25/arch/x86_64/kernel/traps.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.002003384 -0700 +++ 25-akpm/arch/x86_64/kernel/traps.c 2004-07-31 16:54:51.054995328 -0700 @@ -355,7 +355,7 @@ void __die(const char * str, struct pt_r #ifdef CONFIG_DEBUG_PAGEALLOC printk("DEBUG_PAGEALLOC"); #endif - printk("\n"); + printk("\n"); notify_die(DIE_OOPS, (char *)str, regs, err, 255, SIGSEGV); show_registers(regs); /* Executive summary in case the oops scrolled away */ @@ -516,7 +516,7 @@ asmlinkage void do_general_protection(st tsk->thread.error_code = error_code; tsk->thread.trap_no = 13; force_sig(SIGSEGV, tsk); - return; + return; } /* kernel gp */ @@ -673,7 +673,6 @@ clear_dr7: return regs; clear_TF_reenable: - printk("clear_tf_reenable\n"); set_tsk_thread_flag(tsk, TIF_SINGLESTEP); clear_TF: @@ -684,16 +683,43 @@ clear_TF: return regs; } +static int kernel_math_error(struct pt_regs *regs, char *str) +{ + const struct exception_table_entry *fixup; + fixup = search_exception_tables(regs->rip); + if (fixup) { + regs->rip = fixup->fixup; + return 1; + } + notify_die(DIE_GPF, str, regs, 0, 16, SIGFPE); +#if 0 + /* This should be a die, but warn only for now */ + die(str, regs, 0); +#else + printk(KERN_DEBUG "%s: %s at ", current->comm, str); + printk_address(regs->rip); + printk("\n"); +#endif + return 0; +} + /* * Note that we play around with the 'TS' bit in an attempt to get * the correct behaviour even in the presence of the asynchronous * IRQ13 behaviour */ -void math_error(void __user *rip) +asmlinkage void do_coprocessor_error(struct pt_regs *regs) { + void __user *rip = (void __user *)(regs->rip); struct task_struct * task; siginfo_t info; unsigned short cwd, swd; + + conditional_sti(regs); + if ((regs->cs & 3) == 0 && + kernel_math_error(regs, "kernel x87 math error")) + return; + /* * Save the info for the exception handler and clear the error. */ @@ -743,23 +769,23 @@ void math_error(void __user *rip) force_sig_info(SIGFPE, &info, task); } -asmlinkage void do_coprocessor_error(struct pt_regs * regs) -{ - conditional_sti(regs); - math_error((void __user *)regs->rip); -} - asmlinkage void bad_intr(void) { printk("bad interrupt"); } -static inline void simd_math_error(void __user *rip) +asmlinkage void do_simd_coprocessor_error(struct pt_regs *regs) { + void __user *rip = (void __user *)(regs->rip); struct task_struct * task; siginfo_t info; unsigned short mxcsr; + conditional_sti(regs); + if ((regs->cs & 3) == 0 && + kernel_math_error(regs, "simd math error")) + return; + /* * Save the info for the exception handler and clear the error. */ @@ -802,12 +828,6 @@ static inline void simd_math_error(void force_sig_info(SIGFPE, &info, task); } -asmlinkage void do_simd_coprocessor_error(struct pt_regs * regs) -{ - conditional_sti(regs); - simd_math_error((void __user *)regs->rip); -} - asmlinkage void do_spurious_interrupt_bug(struct pt_regs * regs) { } diff -puN arch/x86_64/kernel/x8664_ksyms.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/kernel/x8664_ksyms.c --- 25/arch/x86_64/kernel/x8664_ksyms.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.003003232 -0700 +++ 25-akpm/arch/x86_64/kernel/x8664_ksyms.c 2004-07-31 16:54:51.055995176 -0700 @@ -108,11 +108,8 @@ EXPORT_SYMBOL(pcibios_penalize_isa_irq); EXPORT_SYMBOL(pci_mem_start); #endif -#ifdef CONFIG_X86_USE_3DNOW -EXPORT_SYMBOL(_mmx_memcpy); -EXPORT_SYMBOL(mmx_clear_page); -EXPORT_SYMBOL(mmx_copy_page); -#endif +EXPORT_SYMBOL(copy_page); +EXPORT_SYMBOL(clear_page); EXPORT_SYMBOL(cpu_pda); #ifdef CONFIG_SMP @@ -182,10 +179,17 @@ EXPORT_SYMBOL_NOVERS(memcpy); EXPORT_SYMBOL_NOVERS(__memcpy); EXPORT_SYMBOL_NOVERS(memcmp); -/* syscall export needed for misdesigned sound drivers. */ -EXPORT_SYMBOL(sys_read); -EXPORT_SYMBOL(sys_lseek); -EXPORT_SYMBOL(sys_open); +#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM +/* prototypes are wrong, these are assembly with custom calling functions */ +extern void rwsem_down_read_failed_thunk(void); +extern void rwsem_wake_thunk(void); +extern void rwsem_downgrade_thunk(void); +extern void rwsem_down_write_failed_thunk(void); +EXPORT_SYMBOL(rwsem_down_read_failed_thunk); +EXPORT_SYMBOL(rwsem_wake_thunk); +EXPORT_SYMBOL(rwsem_downgrade_thunk); +EXPORT_SYMBOL(rwsem_down_write_failed_thunk); +#endif EXPORT_SYMBOL(empty_zero_page); @@ -211,8 +215,6 @@ EXPORT_SYMBOL(init_level4_pgt); extern unsigned long __supported_pte_mask; EXPORT_SYMBOL(__supported_pte_mask); -EXPORT_SYMBOL(clear_page); - #ifdef CONFIG_SMP EXPORT_SYMBOL(flush_tlb_page); EXPORT_SYMBOL_GPL(flush_tlb_all); diff -puN /dev/null arch/x86_64/lib/bitops.c --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ 25-akpm/arch/x86_64/lib/bitops.c 2004-07-31 16:54:51.056995024 -0700 @@ -0,0 +1,136 @@ +#include +#include + +/** + * find_first_zero_bit - find the first zero bit in a memory region + * @addr: The address to start the search at + * @size: The maximum size to search + * + * Returns the bit-number of the first zero bit, not the number of the byte + * containing a bit. + */ +inline long find_first_zero_bit(const unsigned long * addr, unsigned long size) +{ + long d0, d1, d2; + long res; + + if (!size) + return 0; + asm volatile( + " repe; scasq\n" + " je 1f\n" + " xorq -8(%%rdi),%%rax\n" + " subq $8,%%rdi\n" + " bsfq %%rax,%%rdx\n" + "1: subq %[addr],%%rdi\n" + " shlq $3,%%rdi\n" + " addq %%rdi,%%rdx" + :"=d" (res), "=&c" (d0), "=&D" (d1), "=&a" (d2) + :"0" (0ULL), "1" ((size + 63) >> 6), "2" (addr), "3" (-1ULL), + [addr] "r" (addr) : "memory"); + return res; +} + +/** + * find_next_zero_bit - find the first zero bit in a memory region + * @addr: The address to base the search on + * @offset: The bitnumber to start searching at + * @size: The maximum size to search + */ +long find_next_zero_bit (const unsigned long * addr, long size, long offset) +{ + unsigned long * p = ((unsigned long *) addr) + (offset >> 6); + unsigned long set = 0; + unsigned long res, bit = offset&63; + + if (bit) { + /* + * Look for zero in first word + */ + asm("bsfq %1,%0\n\t" + "cmoveq %2,%0" + : "=r" (set) + : "r" (~(*p >> bit)), "r"(64L)); + if (set < (64 - bit)) + return set + offset; + set = 64 - bit; + p++; + } + /* + * No zero yet, search remaining full words for a zero + */ + res = find_first_zero_bit ((const unsigned long *)p, + size - 64 * (p - (unsigned long *) addr)); + return (offset + set + res); +} + +static inline long +__find_first_bit(const unsigned long * addr, unsigned long size) +{ + long d0, d1; + long res; + + asm volatile( + " repe; scasq\n" + " jz 1f\n" + " subq $8,%%rdi\n" + " bsfq (%%rdi),%%rax\n" + "1: subq %[addr],%%rdi\n" + " shlq $3,%%rdi\n" + " addq %%rdi,%%rax" + :"=a" (res), "=&c" (d0), "=&D" (d1) + :"0" (0ULL), + "1" ((size + 63) >> 6), "2" (addr), + [addr] "r" (addr) : "memory"); + return res; +} + +/** + * find_first_bit - find the first set bit in a memory region + * @addr: The address to start the search at + * @size: The maximum size to search + * + * Returns the bit-number of the first set bit, not the number of the byte + * containing a bit. + */ +long find_first_bit(const unsigned long * addr, unsigned long size) +{ + return __find_first_bit(addr,size); +} + +/** + * find_next_bit - find the first set bit in a memory region + * @addr: The address to base the search on + * @offset: The bitnumber to start searching at + * @size: The maximum size to search + */ +long find_next_bit(const unsigned long * addr, long size, long offset) +{ + const unsigned long * p = addr + (offset >> 6); + unsigned long set = 0, bit = offset & 63, res; + + if (bit) { + /* + * Look for nonzero in the first 64 bits: + */ + asm("bsfq %1,%0\n\t" + "cmoveq %2,%0\n\t" + : "=r" (set) + : "r" (*p >> bit), "r" (64L)); + if (set < (64 - bit)) + return set + offset; + set = 64 - bit; + p++; + } + /* + * No set bit yet, search remaining full words for a bit + */ + res = find_first_bit (p, size - 64 * (p - addr)); + return (offset + set + res); +} + + +EXPORT_SYMBOL(find_next_bit); +EXPORT_SYMBOL(find_first_bit); +EXPORT_SYMBOL(find_first_zero_bit); +EXPORT_SYMBOL(find_next_zero_bit); diff -puN arch/x86_64/lib/bitstr.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/lib/bitstr.c --- 25/arch/x86_64/lib/bitstr.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.004003080 -0700 +++ 25-akpm/arch/x86_64/lib/bitstr.c 2004-07-31 16:54:51.056995024 -0700 @@ -1,3 +1,4 @@ +#include #include /* Find string of zero bits in a bitmap */ @@ -23,3 +24,5 @@ find_next_zero_string(unsigned long *bit } return n; } + +EXPORT_SYMBOL(find_next_zero_string); diff -puN arch/x86_64/lib/Makefile~x86-64-merge-for-268rc2-mm1 arch/x86_64/lib/Makefile --- 25/arch/x86_64/lib/Makefile~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.006002776 -0700 +++ 25-akpm/arch/x86_64/lib/Makefile 2004-07-31 16:54:51.057994872 -0700 @@ -8,7 +8,7 @@ obj-y := io.o lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \ usercopy.o getuser.o putuser.o \ - thunk.o clear_page.o copy_page.o bitstr.o + thunk.o clear_page.o copy_page.o bitstr.o bitops.o lib-y += memcpy.o memmove.o memset.o copy_user.o lib-$(CONFIG_HAVE_DEC_LOCK) += dec_and_lock.o diff -puN arch/x86_64/lib/memmove.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/lib/memmove.c --- 25/arch/x86_64/lib/memmove.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.007002624 -0700 +++ 25-akpm/arch/x86_64/lib/memmove.c 2004-07-31 16:54:51.057994872 -0700 @@ -10,18 +10,10 @@ void *memmove(void * dest,const void *sr if (dest < src) { __inline_memcpy(dest,src,count); } else { - /* Could be more clever and move longs */ - unsigned long d0, d1, d2; - __asm__ __volatile__( - "std\n\t" - "rep\n\t" - "movsb\n\t" - "cld" - : "=&c" (d0), "=&S" (d1), "=&D" (d2) - :"0" (count), - "1" (count-1+(const char *)src), - "2" (count-1+(char *)dest) - :"memory"); + char *p = (char *) dest + count; + char *s = (char *) src + count; + while (count--) + *--p = *--s; } return dest; } diff -puN arch/x86_64/Makefile~x86-64-merge-for-268rc2-mm1 arch/x86_64/Makefile --- 25/arch/x86_64/Makefile~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.008002472 -0700 +++ 25-akpm/arch/x86_64/Makefile 2004-07-31 16:54:51.057994872 -0700 @@ -59,6 +59,7 @@ endif # -funit-at-a-time shrinks the kernel .text considerably # unfortunately it makes reading oopses harder. CFLAGS += $(call check_gcc,-funit-at-a-time,) +CFLAGS += -msoft-float head-y := arch/x86_64/kernel/head.o arch/x86_64/kernel/head64.o arch/x86_64/kernel/init_task.o @@ -77,6 +78,7 @@ boot := arch/x86_64/boot all: bzImage BOOTIMAGE := arch/x86_64/boot/bzImage +KBUILD_IMAGE := $(BOOTIMAGE) bzImage: vmlinux $(Q)$(MAKE) $(build)=$(boot) $(BOOTIMAGE) diff -puN arch/x86_64/mm/fault.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/mm/fault.c --- 25/arch/x86_64/mm/fault.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.010002168 -0700 +++ 25-akpm/arch/x86_64/mm/fault.c 2004-07-31 16:54:51.059994568 -0700 @@ -58,16 +58,17 @@ void bust_spinlocks(int yes) /* Sometimes the CPU reports invalid exceptions on prefetch. Check that here and ignore. Opcode checker based on code by Richard Brunner */ -static int is_prefetch(struct pt_regs *regs, unsigned long addr) +static noinline int is_prefetch(struct pt_regs *regs, unsigned long addr, + unsigned long error_code) { unsigned char *instr = (unsigned char *)(regs->rip); int scan_more = 1; int prefetch = 0; unsigned char *max_instr = instr + 15; - /* Avoid recursive faults for this common case */ - if (regs->rip == addr) - return 0; + /* If it was a exec fault ignore */ + if (error_code & (1<<4)) + return 0; /* Code segments in LDT could have a non zero base. Don't check when that's possible */ @@ -218,6 +219,18 @@ int unhandled_signal(struct task_struct (tsk->sighand->action[sig-1].sa.sa_handler == SIG_DFL); } +static noinline void pgtable_bad(unsigned long address, struct pt_regs *regs, + unsigned long error_code) +{ + oops_begin(); + printk(KERN_ALERT "%s: Corrupted page table at address %lx\n", + current->comm, address); + dump_pagetable(address); + __die("Bad pagetable", regs, error_code); + oops_end(); + do_exit(SIGKILL); +} + int page_fault_trace; int exception_trace = 1; @@ -268,11 +281,32 @@ asmlinkage void do_page_fault(struct pt_ mm = tsk->mm; info.si_code = SEGV_MAPERR; - /* 5 => page not present and from supervisor mode */ - if (unlikely(!(error_code & 5) && - ((address >= VMALLOC_START && address <= VMALLOC_END) || - (address >= MODULES_VADDR && address <= MODULES_END)))) - goto vmalloc_fault; + + /* + * We fault-in kernel-space virtual memory on-demand. The + * 'reference' page table is init_mm.pgd. + * + * NOTE! We MUST NOT take any locks for this case. We may + * be in an interrupt or a critical region, and should + * only copy the information from the master page table, + * nothing more. + * + * This verifies that the fault happens in kernel space + * (error_code & 4) == 0, and that the fault was not a + * protection error (error_code & 1) == 0. + */ + if (unlikely(address >= TASK_SIZE)) { + if (!(error_code & 5)) + goto vmalloc_fault; + /* + * Don't take the mm semaphore here. If we fixup a prefetch + * fault we could otherwise deadlock. + */ + goto bad_area_nosemaphore; + } + + if (unlikely(error_code & (1 << 3))) + goto page_table_corruption; /* * If we're in an interrupt or have no user @@ -351,18 +385,18 @@ bad_area: bad_area_nosemaphore: #ifdef CONFIG_IA32_EMULATION - /* 32bit vsyscall. map on demand. */ - if (test_thread_flag(TIF_IA32) && + /* 32bit vsyscall. map on demand. */ + if (test_thread_flag(TIF_IA32) && address >= 0xffffe000 && address < 0xffffe000 + PAGE_SIZE) { - if (map_syscall32(mm, address) < 0) - goto out_of_memory2; - return; - } + if (map_syscall32(mm, address) < 0) + goto out_of_memory2; + return; + } #endif /* User mode accesses just cause a SIGSEGV */ if (error_code & 4) { - if (is_prefetch(regs, address)) + if (is_prefetch(regs, address, error_code)) return; /* Work around K8 erratum #100 K8 in compat mode @@ -376,7 +410,7 @@ bad_area_nosemaphore: return; if (exception_trace && unhandled_signal(tsk, SIGSEGV)) { - printk(KERN_INFO + printk(KERN_INFO "%s[%d]: segfault at %016lx rip %016lx rsp %016lx error %lx\n", tsk->comm, tsk->pid, address, regs->rip, regs->rsp, error_code); @@ -407,7 +441,7 @@ no_context: * Hall of shame of CPU/BIOS bugs. */ - if (is_prefetch(regs, address)) + if (is_prefetch(regs, address, error_code)) return; if (is_errata93(regs, address)) @@ -481,10 +515,8 @@ vmalloc_fault: * is really there and when yes flush the local TLB. */ pgd = pgd_offset_k(address); - if (pgd != current_pgd_offset_k(address)) - BUG(); if (!pgd_present(*pgd)) - goto bad_area_nosemaphore; + goto bad_area_nosemaphore; pmd = pmd_offset(pgd, address); if (!pmd_present(*pmd)) goto bad_area_nosemaphore; @@ -495,4 +527,7 @@ vmalloc_fault: __flush_tlb_all(); return; } + +page_table_corruption: + pgtable_bad(address, regs, error_code); } diff -puN arch/x86_64/mm/init.c~x86-64-merge-for-268rc2-mm1 arch/x86_64/mm/init.c --- 25/arch/x86_64/mm/init.c~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.012001864 -0700 +++ 25-akpm/arch/x86_64/mm/init.c 2004-07-31 16:54:51.060994416 -0700 @@ -416,6 +416,8 @@ int devmem_is_allowed(unsigned long page } +extern int swiotlb_force; + static struct kcore_list kcore_mem, kcore_vmalloc, kcore_kernel, kcore_modules, kcore_vsyscall; @@ -425,7 +427,10 @@ void __init mem_init(void) int tmp; #ifdef CONFIG_SWIOTLB - if (!iommu_aperture && end_pfn >= 0xffffffff>>PAGE_SHIFT) + if (swiotlb_force) + swiotlb = 1; + if (!iommu_aperture && + (end_pfn >= 0xffffffff>>PAGE_SHIFT || force_iommu)) swiotlb = 1; if (swiotlb) swiotlb_init(); @@ -616,7 +621,16 @@ static struct vm_area_struct gate32_vma struct vm_area_struct *get_gate_vma(struct task_struct *tsk) { - return test_tsk_thread_flag(tsk, TIF_IA32) ? &gate32_vma : &gate_vma; +#ifdef CONFIG_IA32_EMULATION + if (test_tsk_thread_flag(tsk, TIF_IA32)) { + /* lookup code assumes the pages are present. set them up + now */ + if (map_syscall32(tsk->mm, 0xfffe000) < 0) + return NULL; + return &gate32_vma; + } +#endif + return &gate_vma; } int in_gate_area(struct task_struct *task, unsigned long addr) diff -puN Documentation/x86_64/boot-options.txt~x86-64-merge-for-268rc2-mm1 Documentation/x86_64/boot-options.txt --- 25/Documentation/x86_64/boot-options.txt~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.013001712 -0700 +++ 25-akpm/Documentation/x86_64/boot-options.txt 2004-07-31 16:54:51.060994416 -0700 @@ -136,17 +136,25 @@ PCI IOMMU - iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]][,soft] + iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]][,merge] + [,forcesac][,fullflush][,nomerge][,noaperture] size set size of iommu (in bytes) noagp don't initialize the AGP driver and use full aperture. off don't use the IOMMU leak turn on simple iommu leak tracing (only when CONFIG_IOMMU_LEAK is on) memaper[=order] allocate an own aperture over RAM with size 32MB^order. noforce don't force IOMMU usage. Default. - force Force IOMMU - soft Use software bounce buffering for non 32bit IO. Default on Intel - machines. + force Force IOMMU. + merge Do SG merging. Implies force (experimental) + nomerge Don't do SG merging. + forcesac For SAC mode for masks <40bits (experimental) + fullflush Flush IOMMU on each allocation (default) + nofullflush Don't use IOMMU fullflush + allowed overwrite iommu off workarounds for specific chipsets. + soft Use software bounce buffering (default for Intel machines) + noaperture Don't touch the aperture for AGP. - swiotlb=pages + swiotlb=pages[,force] - Prereserve that many 4K pages for the software IO bounce buffering. + pages Prereserve that many 128K pages for the software IO bounce buffering. + force Force all IO through the software TLB. diff -puN include/asm-x86_64/bitops.h~x86-64-merge-for-268rc2-mm1 include/asm-x86_64/bitops.h --- 25/include/asm-x86_64/bitops.h~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.015001408 -0700 +++ 25-akpm/include/asm-x86_64/bitops.h 2004-07-31 16:54:51.062994112 -0700 @@ -25,10 +25,10 @@ * Note that @nr may be almost arbitrarily large; this function is not * restricted to acting on a single-word quantity. */ -static __inline__ void set_bit(long nr, volatile void * addr) +static __inline__ void set_bit(int nr, volatile void * addr) { __asm__ __volatile__( LOCK_PREFIX - "btsq %1,%0" + "btsl %1,%0" :"=m" (ADDR) :"dIr" (nr) : "memory"); } @@ -254,128 +254,10 @@ static __inline__ int variable_test_bit( #undef ADDR -/** - * find_first_zero_bit - find the first zero bit in a memory region - * @addr: The address to start the search at - * @size: The maximum size to search - * - * Returns the bit-number of the first zero bit, not the number of the byte - * containing a bit. - */ -static __inline__ int find_first_zero_bit(const unsigned long * addr, unsigned size) -{ - int d0, d1, d2; - int res; - - if (!size) - return 0; - __asm__ __volatile__( - "movl $-1,%%eax\n\t" - "xorl %%edx,%%edx\n\t" - "repe; scasl\n\t" - "je 1f\n\t" - "xorl -4(%%rdi),%%eax\n\t" - "subq $4,%%rdi\n\t" - "bsfl %%eax,%%edx\n" - "1:\tsubq %%rbx,%%rdi\n\t" - "shlq $3,%%rdi\n\t" - "addq %%rdi,%%rdx" - :"=d" (res), "=&c" (d0), "=&D" (d1), "=&a" (d2) - :"1" ((size + 31) >> 5), "2" (addr), "b" (addr) : "memory"); - return res; -} - -/** - * find_next_zero_bit - find the first zero bit in a memory region - * @addr: The address to base the search on - * @offset: The bitnumber to start searching at - * @size: The maximum size to search - */ -static __inline__ int find_next_zero_bit (const unsigned long * addr, int size, int offset) -{ - unsigned long * p = ((unsigned long *) addr) + (offset >> 6); - unsigned long set = 0; - unsigned long res, bit = offset&63; - - if (bit) { - /* - * Look for zero in first word - */ - __asm__("bsfq %1,%0\n\t" - "cmoveq %2,%0" - : "=r" (set) - : "r" (~(*p >> bit)), "r"(64L)); - if (set < (64 - bit)) - return set + offset; - set = 64 - bit; - p++; - } - /* - * No zero yet, search remaining full words for a zero - */ - res = find_first_zero_bit ((const unsigned long *)p, size - 64 * (p - (unsigned long *) addr)); - return (offset + set + res); -} - - -/** - * find_first_bit - find the first set bit in a memory region - * @addr: The address to start the search at - * @size: The maximum size to search - * - * Returns the bit-number of the first set bit, not the number of the byte - * containing a bit. - */ -static __inline__ int find_first_bit(const unsigned long * addr, unsigned size) -{ - int d0, d1; - int res; - - /* This looks at memory. Mark it volatile to tell gcc not to move it around */ - __asm__ __volatile__( - "xorl %%eax,%%eax\n\t" - "repe; scasl\n\t" - "jz 1f\n\t" - "leaq -4(%%rdi),%%rdi\n\t" - "bsfl (%%rdi),%%eax\n" - "1:\tsubq %%rbx,%%rdi\n\t" - "shll $3,%%edi\n\t" - "addl %%edi,%%eax" - :"=a" (res), "=&c" (d0), "=&D" (d1) - :"1" ((size + 31) >> 5), "2" (addr), "b" (addr) : "memory"); - return res; -} - -/** - * find_next_bit - find the first set bit in a memory region - * @addr: The address to base the search on - * @offset: The bitnumber to start searching at - * @size: The maximum size to search - */ -static __inline__ int find_next_bit(const unsigned long * addr, int size, int offset) -{ - const unsigned long * p = addr + (offset >> 6); - unsigned long set = 0, bit = offset & 63, res; - - if (bit) { - /* - * Look for nonzero in the first 64 bits: - */ - __asm__("bsfq %1,%0\n\t" - "cmoveq %2,%0\n\t" - : "=r" (set) - : "r" (*p >> bit), "r" (64L)); - if (set < (64 - bit)) - return set + offset; - set = 64 - bit; - p++; - } - /* - * No set bit yet, search remaining full words for a bit - */ - res = find_first_bit (p, size - 64 * (p - addr)); - return (offset + set + res); -} +extern long find_first_zero_bit(const unsigned long * addr, unsigned long size); +extern long find_next_zero_bit (const unsigned long * addr, long size, long offset); +extern long find_first_bit(const unsigned long * addr, unsigned long size); +extern long find_next_bit(const unsigned long * addr, long size, long offset); /* * Find string of zero bits in a bitmap. -1 when not found. diff -puN include/asm-x86_64/dma-mapping.h~x86-64-merge-for-268rc2-mm1 include/asm-x86_64/dma-mapping.h --- 25/include/asm-x86_64/dma-mapping.h~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.016001256 -0700 +++ 25-akpm/include/asm-x86_64/dma-mapping.h 2004-07-31 16:54:51.062994112 -0700 @@ -1,6 +1,137 @@ #ifndef _X8664_DMA_MAPPING_H #define _X8664_DMA_MAPPING_H 1 -#include +/* + * IOMMU interface. See Documentation/DMA-mapping.txt and DMA-API.txt for + * documentation. + */ + +#include +#include +#include +#include + +extern dma_addr_t bad_dma_address; +#define dma_mapping_error(x) \ + (swiotlb ? swiotlb_dma_mapping_error(x) : ((x) == bad_dma_address)) + +void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, + unsigned gfp); +void dma_free_coherent(struct device *dev, size_t size, void *vaddr, + dma_addr_t dma_handle); + +#ifdef CONFIG_GART_IOMMU + +extern dma_addr_t dma_map_single(struct device *hwdev, void *ptr, size_t size, + int direction); +extern void dma_unmap_single(struct device *dev, dma_addr_t addr,size_t size, + int direction); + +#else + +/* No IOMMU */ + +static inline dma_addr_t dma_map_single(struct device *hwdev, void *ptr, + size_t size, int direction) +{ + dma_addr_t addr; + + if (direction == DMA_NONE) + out_of_line_bug(); + addr = virt_to_bus(ptr); + + if ((addr+size) & ~hwdev->dma_mask) + out_of_line_bug(); + return addr; +} + +static inline void dma_unmap_single(struct device *hwdev, dma_addr_t dma_addr, + size_t size, int direction) +{ + if (direction == DMA_NONE) + out_of_line_bug(); + /* Nothing to do */ +} + +#endif + +#define dma_map_page(dev,page,offset,size,dir) \ + dma_map_single((dev), page_address(page)+(offset), (size), (dir)) + +static inline void dma_sync_single_for_cpu(struct device *hwdev, + dma_addr_t dma_handle, + size_t size, int direction) +{ + if (direction == DMA_NONE) + out_of_line_bug(); + + if (swiotlb) + return swiotlb_sync_single_for_cpu(hwdev,dma_handle,size,direction); + + flush_write_buffers(); +} + +static inline void dma_sync_single_for_device(struct device *hwdev, + dma_addr_t dma_handle, + size_t size, int direction) +{ + if (direction == DMA_NONE) + out_of_line_bug(); + + if (swiotlb) + return swiotlb_sync_single_for_device(hwdev,dma_handle,size,direction); + + flush_write_buffers(); +} + +static inline void dma_sync_sg_for_cpu(struct device *hwdev, + struct scatterlist *sg, + int nelems, int direction) +{ + if (direction == DMA_NONE) + out_of_line_bug(); + + if (swiotlb) + return swiotlb_sync_sg_for_cpu(hwdev,sg,nelems,direction); + + flush_write_buffers(); +} + +static inline void dma_sync_sg_for_device(struct device *hwdev, + struct scatterlist *sg, + int nelems, int direction) +{ + if (direction == DMA_NONE) + out_of_line_bug(); + + if (swiotlb) + return swiotlb_sync_sg_for_device(hwdev,sg,nelems,direction); + + flush_write_buffers(); +} + +extern int dma_map_sg(struct device *hwdev, struct scatterlist *sg, + int nents, int direction); +extern void dma_unmap_sg(struct device *hwdev, struct scatterlist *sg, + int nents, int direction); + +#define dma_unmap_page dma_unmap_single + +extern int dma_supported(struct device *hwdev, u64 mask); +extern int dma_get_cache_alignment(void); +#define dma_is_consistent(h) 1 + +static inline int dma_set_mask(struct device *dev, u64 mask) +{ + if (!dev->dma_mask || !dma_supported(dev, mask)) + return -EIO; + *dev->dma_mask = mask; + return 0; +} + +static inline void dma_cache_sync(void *vaddr, size_t size, enum dma_data_direction dir) +{ + flush_write_buffers(); +} #endif diff -puN include/asm-x86_64/i387.h~x86-64-merge-for-268rc2-mm1 include/asm-x86_64/i387.h --- 25/include/asm-x86_64/i387.h~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.017001104 -0700 +++ 25-akpm/include/asm-x86_64/i387.h 2004-07-31 16:54:51.063993960 -0700 @@ -39,16 +39,25 @@ static inline int need_signal_i387(struc * FPU lazy state save handling... */ -#define kernel_fpu_end() stts() - #define unlazy_fpu(tsk) do { \ if ((tsk)->thread_info->status & TS_USEDFPU) \ save_init_fpu(tsk); \ } while (0) +/* Ignore delayed exceptions from user space */ +static inline void tolerant_fwait(void) +{ + asm volatile("1: fwait\n" + "2:\n" + " .section __ex_table,\"a\"\n" + " .align 8\n" + " .quad 1b,2b\n" + " .previous\n"); +} + #define clear_fpu(tsk) do { \ if ((tsk)->thread_info->status & TS_USEDFPU) { \ - asm volatile("fnclex ; fwait"); \ + tolerant_fwait(); \ (tsk)->thread_info->status &= ~TS_USEDFPU; \ stts(); \ } \ @@ -116,6 +125,7 @@ static inline int save_i387_checking(str static inline void kernel_fpu_begin(void) { struct thread_info *me = current_thread_info(); + preempt_disable(); if (me->status & TS_USEDFPU) { asm volatile("rex64 ; fxsave %0 ; fnclex" : "=m" (me->task->thread.i387.fxsave)); @@ -125,9 +135,15 @@ static inline void kernel_fpu_begin(void clts(); } +static inline void kernel_fpu_end(void) +{ + stts(); + preempt_enable(); +} + static inline void save_init_fpu( struct task_struct *tsk ) { - asm volatile( "fxsave %0 ; fnclex" + asm volatile( "rex64 ; fxsave %0 ; fnclex" : "=m" (tsk->thread.i387.fxsave)); tsk->thread_info->status &= ~TS_USEDFPU; stts(); diff -puN include/asm-x86_64/ia32.h~x86-64-merge-for-268rc2-mm1 include/asm-x86_64/ia32.h --- 25/include/asm-x86_64/ia32.h~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.018000952 -0700 +++ 25-akpm/include/asm-x86_64/ia32.h 2004-07-31 16:54:51.064993808 -0700 @@ -78,12 +78,6 @@ struct stat64 { unsigned long long st_ino; } __attribute__((packed)); - -typedef union sigval32 { - int sival_int; - unsigned int sival_ptr; -} sigval_t32; - typedef struct siginfo32 { int si_signo; int si_errno; @@ -102,7 +96,7 @@ typedef struct siginfo32 { struct { int _tid; /* timer id */ int _overrun; /* overrun count */ - sigval_t32 _sigval; /* same as below */ + compat_sigval_t _sigval; /* same as below */ int _sys_private; /* not to be passed to user */ int _overrun_incr; /* amount to add to overrun */ } _timer; @@ -111,7 +105,7 @@ typedef struct siginfo32 { struct { unsigned int _pid; /* sender's pid */ unsigned int _uid; /* sender's uid */ - sigval_t32 _sigval; + compat_sigval_t _sigval; } _rt; /* SIGCHLD */ diff -puN include/asm-x86_64/io.h~x86-64-merge-for-268rc2-mm1 include/asm-x86_64/io.h --- 25/include/asm-x86_64/io.h~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.020000648 -0700 +++ 25-akpm/include/asm-x86_64/io.h 2004-07-31 16:54:51.064993808 -0700 @@ -299,11 +299,8 @@ out: #define flush_write_buffers() -/* Disable vmerge for now. Need to fix the block layer code - to check for non iommu addresses first. - When the IOMMU is force it is safe to enable. */ -extern int iommu_merge; -#define BIO_VMERGE_BOUNDARY (iommu_merge ? 4096 : 0) +extern int iommu_bio_merge; +#define BIO_VMERGE_BOUNDARY iommu_bio_merge #endif /* __KERNEL__ */ diff -puN include/asm-x86_64/mtrr.h~x86-64-merge-for-268rc2-mm1 include/asm-x86_64/mtrr.h --- 25/include/asm-x86_64/mtrr.h~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.021000496 -0700 +++ 25-akpm/include/asm-x86_64/mtrr.h 2004-07-31 16:54:51.064993808 -0700 @@ -71,8 +71,6 @@ struct mtrr_gentry #ifdef __KERNEL__ -extern char *mtrr_strings[MTRR_NUM_TYPES]; - /* The following functions are for use by other drivers */ # ifdef CONFIG_MTRR extern int mtrr_add (unsigned long base, unsigned long size, diff -puN include/asm-x86_64/pci.h~x86-64-merge-for-268rc2-mm1 include/asm-x86_64/pci.h --- 25/include/asm-x86_64/pci.h~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.023000192 -0700 +++ 25-akpm/include/asm-x86_64/pci.h 2004-07-31 16:54:51.066993504 -0700 @@ -44,81 +44,25 @@ int pcibios_set_irq_routing(struct pci_d #include #include -struct pci_dev; - extern int iommu_setup(char *opt); -extern dma_addr_t bad_dma_address; -#define pci_dma_mapping_error(x) ((x) == bad_dma_address) - -/* Allocate and map kernel buffer using consistent mode DMA for a device. - * hwdev should be valid struct pci_dev pointer for PCI devices, - * NULL for PCI-like buses (ISA, EISA). - * Returns non-NULL cpu-view pointer to the buffer if successful and - * sets *dma_addrp to the pci side dma address as well, else *dma_addrp - * is undefined. - */ -extern void *pci_alloc_consistent(struct pci_dev *hwdev, size_t size, - dma_addr_t *dma_handle); - -/* Free and unmap a consistent DMA buffer. - * cpu_addr is what was returned from pci_alloc_consistent, - * size must be the same as what as passed into pci_alloc_consistent, - * and likewise dma_addr must be the same as what *dma_addrp was set to. - * - * References to the memory and mappings associated with cpu_addr/dma_addr - * past this call are illegal. - */ -extern void pci_free_consistent(struct pci_dev *hwdev, size_t size, - void *vaddr, dma_addr_t dma_handle); - -#ifdef CONFIG_SWIOTLB -extern int swiotlb; -extern dma_addr_t swiotlb_map_single (struct device *hwdev, void *ptr, size_t size, - int dir); -extern void swiotlb_unmap_single (struct device *hwdev, dma_addr_t dev_addr, - size_t size, int dir); -extern void swiotlb_sync_single_for_cpu (struct device *hwdev, - dma_addr_t dev_addr, - size_t size, int dir); -extern void swiotlb_sync_single_for_device (struct device *hwdev, - dma_addr_t dev_addr, - size_t size, int dir); -extern void swiotlb_sync_sg_for_cpu (struct device *hwdev, - struct scatterlist *sg, int nelems, - int dir); -extern void swiotlb_sync_sg_for_device (struct device *hwdev, - struct scatterlist *sg, int nelems, - int dir); -extern int swiotlb_map_sg(struct device *hwdev, struct scatterlist *sg, - int nents, int direction); -extern void swiotlb_unmap_sg(struct device *hwdev, struct scatterlist *sg, - int nents, int direction); - -#endif - #ifdef CONFIG_GART_IOMMU - -/* Map a single buffer of the indicated size for DMA in streaming mode. - * The 32-bit bus address to use is returned. +/* The PCI address space does equal the physical memory + * address space. The networking and block device layers use + * this boolean for bounce buffer decisions * - * Once the device is given the dma address, the device owns this memory - * until either pci_unmap_single or pci_dma_sync_single_for_cpu is performed. + * On AMD64 it mostly equals, but we set it to zero to tell some subsystems + * that an IOMMU is available. */ -extern dma_addr_t pci_map_single(struct pci_dev *hwdev, void *ptr, size_t size, - int direction); - - -void pci_unmap_single(struct pci_dev *hwdev, dma_addr_t addr, - size_t size, int direction); +#define PCI_DMA_BUS_IS_PHYS (no_iommu ? 1 : 0) /* - * pci_{map,unmap}_single_page maps a kernel page to a dma_addr_t. identical - * to pci_map_single, but takes a struct page instead of a virtual address + * x86-64 always supports DAC, but sometimes it is useful to force + * devices through the IOMMU to get automatic sg list merging. + * Optional right now. */ - -#define pci_map_page(dev,page,offset,size,dir) \ - pci_map_single((dev), page_address(page)+(offset), (size), (dir)) +extern int iommu_sac_force; +#define pci_dac_dma_supported(pci_dev, mask) (!iommu_sac_force) #define DECLARE_PCI_UNMAP_ADDR(ADDR_NAME) \ dma_addr_t ADDR_NAME; @@ -133,113 +77,12 @@ void pci_unmap_single(struct pci_dev *hw #define pci_unmap_len_set(PTR, LEN_NAME, VAL) \ (((PTR)->LEN_NAME) = (VAL)) -static inline void pci_dma_sync_single_for_cpu(struct pci_dev *hwdev, - dma_addr_t dma_handle, - size_t size, int direction) -{ - BUG_ON(direction == PCI_DMA_NONE); - -#ifdef CONFIG_SWIOTLB - if (swiotlb) - return swiotlb_sync_single_for_cpu(&hwdev->dev,dma_handle,size,direction); -#endif - - flush_write_buffers(); -} - -static inline void pci_dma_sync_single_for_device(struct pci_dev *hwdev, - dma_addr_t dma_handle, - size_t size, int direction) -{ - BUG_ON(direction == PCI_DMA_NONE); - -#ifdef CONFIG_SWIOTLB - if (swiotlb) - return swiotlb_sync_single_for_device(&hwdev->dev,dma_handle,size,direction); -#endif - - flush_write_buffers(); -} - -static inline void pci_dma_sync_sg_for_cpu(struct pci_dev *hwdev, - struct scatterlist *sg, - int nelems, int direction) -{ - BUG_ON(direction == PCI_DMA_NONE); - -#ifdef CONFIG_SWIOTLB - if (swiotlb) - return swiotlb_sync_sg_for_cpu(&hwdev->dev,sg,nelems,direction); -#endif - flush_write_buffers(); -} - -static inline void pci_dma_sync_sg_for_device(struct pci_dev *hwdev, - struct scatterlist *sg, - int nelems, int direction) -{ - BUG_ON(direction == PCI_DMA_NONE); - -#ifdef CONFIG_SWIOTLB - if (swiotlb) - return swiotlb_sync_sg_for_device(&hwdev->dev,sg,nelems,direction); -#endif - flush_write_buffers(); -} - -/* The PCI address space does equal the physical memory - * address space. The networking and block device layers use - * this boolean for bounce buffer decisions - * - * On AMD64 it mostly equals, but we set it to zero to tell some subsystems - * that an IOMMU is available. - */ -#define PCI_DMA_BUS_IS_PHYS (no_iommu ? 1 : 0) - -/* We lie slightly when the IOMMU is forced to get the device to - use SAC instead of DAC. */ -#define pci_dac_dma_supported(pci_dev, mask) (force_iommu ? 0 : 1) - #else -static inline dma_addr_t pci_map_single(struct pci_dev *hwdev, void *ptr, - size_t size, int direction) -{ - dma_addr_t addr; +/* No IOMMU */ - if (direction == PCI_DMA_NONE) - out_of_line_bug(); - addr = virt_to_bus(ptr); - - /* - * This is gross, but what should I do. - * Unfortunately drivers do not test the return value of this. - */ - if ((addr+size) & ~hwdev->dma_mask) - out_of_line_bug(); - return addr; -} - -static inline void pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, - size_t size, int direction) -{ - if (direction == PCI_DMA_NONE) - out_of_line_bug(); - /* Nothing to do */ -} - -static inline dma_addr_t pci_map_page(struct pci_dev *hwdev, struct page *page, - unsigned long offset, size_t size, int direction) -{ - dma_addr_t addr; - if (direction == PCI_DMA_NONE) - out_of_line_bug(); - addr = page_to_pfn(page) * PAGE_SIZE + offset; - if ((addr+size) & ~hwdev->dma_mask) - out_of_line_bug(); - return addr; -} +#define PCI_DMA_BUS_IS_PHYS 1 +#define pci_dac_dma_supported(pci_dev, mask) 1 -/* pci_unmap_{page,single} is a nop so... */ #define DECLARE_PCI_UNMAP_ADDR(ADDR_NAME) #define DECLARE_PCI_UNMAP_LEN(LEN_NAME) #define pci_unmap_addr(PTR, ADDR_NAME) (0) @@ -247,74 +90,9 @@ static inline dma_addr_t pci_map_page(st #define pci_unmap_len(PTR, LEN_NAME) (0) #define pci_unmap_len_set(PTR, LEN_NAME, VAL) do { } while (0) -/* Make physical memory consistent for a single - * streaming mode DMA translation after a transfer. - * - * If you perform a pci_map_single() but wish to interrogate the - * buffer using the cpu, yet do not wish to teardown the PCI dma - * mapping, you must call this function before doing so. At the - * next point you give the PCI dma address back to the card, you - * must first perform a pci_dma_sync_for_device, and then the - * device again owns the buffer. - */ -static inline void pci_dma_sync_single_for_cpu(struct pci_dev *hwdev, - dma_addr_t dma_handle, - size_t size, int direction) -{ - if (direction == PCI_DMA_NONE) - out_of_line_bug(); -} - -static inline void pci_dma_sync_single_for_device(struct pci_dev *hwdev, - dma_addr_t dma_handle, - size_t size, int direction) -{ - if (direction == PCI_DMA_NONE) - out_of_line_bug(); - flush_write_buffers(); -} - -/* Make physical memory consistent for a set of streaming - * mode DMA translations after a transfer. - * - * The same as pci_dma_sync_single_* but for a scatter-gather list, - * same rules and usage. - */ -static inline void pci_dma_sync_sg_for_cpu(struct pci_dev *hwdev, - struct scatterlist *sg, - int nelems, int direction) -{ - if (direction == PCI_DMA_NONE) - out_of_line_bug(); -} - -static inline void pci_dma_sync_sg_for_device(struct pci_dev *hwdev, - struct scatterlist *sg, - int nelems, int direction) -{ - if (direction == PCI_DMA_NONE) - out_of_line_bug(); - flush_write_buffers(); -} - -#define PCI_DMA_BUS_IS_PHYS 1 - -#define pci_dac_dma_supported(pci_dev, mask) 1 #endif -extern int pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, int direction); -extern void pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, int direction); - -#define pci_unmap_page pci_unmap_single - -/* Return whether the given PCI device DMA address mask can - * be supported properly. For example, if your device can - * only drive the low 24-bits during PCI bus mastering, then - * you would pass 0x00ffffff as the mask to this function. - */ -extern int pci_dma_supported(struct pci_dev *hwdev, u64 mask); +#include static inline dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev, struct page *page, unsigned long offset, int direction) @@ -359,7 +137,6 @@ static inline void pcibios_add_platform_ /* generic pci stuff */ #ifdef CONFIG_PCI #include -#include #endif #endif /* __x8664_PCI_H */ diff -puN include/asm-x86_64/proto.h~x86-64-merge-for-268rc2-mm1 include/asm-x86_64/proto.h --- 25/include/asm-x86_64/proto.h~x86-64-merge-for-268rc2-mm1 2004-07-31 16:54:51.024000040 -0700 +++ 25-akpm/include/asm-x86_64/proto.h 2004-07-31 16:54:51.067993352 -0700 @@ -103,6 +103,8 @@ extern int fallback_aper_force; extern int iommu_aperture; extern int iommu_aperture_disabled; extern int iommu_aperture_allowed; +extern int fix_aperture; +extern int force_iommu; extern void smp_local_timer_interrupt(struct pt_regs * regs); diff -puN /dev/null include/asm-x86_64/swiotlb.h --- /dev/null 2003-09-15 06:40:47.000000000 -0700 +++ 25-akpm/include/asm-x86_64/swiotlb.h 2004-07-31 16:54:51.067993352 -0700 @@ -0,0 +1,36 @@ +#ifndef _ASM_SWIOTLB_H +#define _ASM_SWTIOLB_H 1 + +#include + +/* SWIOTLB interface */ + +extern dma_addr_t swiotlb_map_single(struct device *hwdev, void *ptr, size_t size, + int dir); +extern void swiotlb_unmap_single(struct device *hwdev, dma_addr_t dev_addr, + size_t size, int dir); +extern void swiotlb_sync_single_for_cpu(struct device *hwdev, + dma_addr_t dev_addr, + size_t size, int dir); +extern void swiotlb_sync_single_for_device(struct device *hwdev, + dma_addr_t dev_addr, + size_t size, int dir); +extern void swiotlb_sync_sg_for_cpu(struct device *hwdev, + struct scatterlist *sg, int nelems, + int dir); +extern void swiotlb_sync_sg_for_device(struct device *hwdev, + struct scatterlist *sg, int nelems, + int dir); +extern int swiotlb_map_sg(struct device *hwdev, struct scatterlist *sg, + int nents, int direction); +extern void swiotlb_unmap_sg(struct device *hwdev, struct scatterlist *sg, + int nents, int direction); +extern int swiotlb_dma_mapping_error(dma_addr_t dma_addr); + +#ifdef CONFIG_SWIOTLB +extern int swiotlb; +#else +#define swiotlb 0 +#endif + +#endif _